Max Planck Institute for Psycholinguistics

WebCelex Help - Restrictions screen



Within the restrictions screen one can put restrictions on the columns one has selected. The selected columns are shown in the Columns section. Columns which have the 'hidden' status are shown in grey to indicate that they will not show up in the final output. One can also add a user-defined column at this stage by pressing the Add Column button. Columns of this type show up in light blue in the Columns section and can be used to perform specific functions on the static columns and display the results. See the String Functions or Numeric Functions section for more information on the use of user-defined columns. Further topics are Operators, Variables, Regular Expressions, File Uploading, Special Characters and The Query Field.

There are two standard ways to add a restriction. The first is to build it with the buttons. One selects the column to restrict by pressing its button and immediately afterwards selects the operation to perform on that column by pressing the button for that operation. A popup screen appears in which the required values can be entered. Pressing Ok adds the restriction to the Query field. For instance, to display only the results where the values for column one (i.e. Inl) are above 100, one presses the Inl button and then the > button. The popup screen displays 'Inl is larger than' and expects a value in the textfield. After filling in '100' and pressing Ok, the query field contains the line (Inl > 100). With the && (logical AND) and || (logical OR) buttons one can create increasingly complex restrictions.

The second way is to type the query directly into the Query field. Beware: the query is handed to a Perl program and must therefore be valid Perl. For the Perl syntax needed for creating queries, see the short introduction to Perl. A simple way to learn some Perl syntax is to create queries with the buttons and study the resulting Perl code in the query field. Tip: if one presses a column button twice, the name of that column is inserted into the query field, which helps to create queries more quickly.
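The effect of a restriction such as (Inl > 100) can be pictured as a filter over rows. The following Python sketch is only an illustration: WebCelex itself evaluates the query as Perl, and the sample rows, their frequencies and the helper name `apply_restriction` are invented for this example.

```python
# Hypothetical sample rows; real rows come from the selected lexicon.
rows = [
    {"Head": "appel", "Inl": 250},
    {"Head": "aardbei", "Inl": 40},
    {"Head": "banaan", "Inl": 120},
]

def apply_restriction(rows, predicate):
    """Keep only the rows for which the restriction evaluates to true."""
    return [row for row in rows if predicate(row)]

# Counterpart of the query (Inl > 100)
frequent = apply_restriction(rows, lambda r: r["Inl"] > 100)

# Counterpart of (Inl > 100) && (Head ne 'appel'): both parts must hold
combined = apply_restriction(
    rows, lambda r: r["Inl"] > 100 and r["Head"] != "appel"
)

print([r["Head"] for r in frequent])
print([r["Head"] for r in combined])
```

Replacing `and` by `or` in the second predicate would correspond to the || button: a row is then kept when either part holds.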

Operators

There are ten different operators within the restrictions screen. The six comparison operators are displayed in the query field in one form for numeric data and in another form for alphabetical data:

  • < is equivalent to 'is less than' and displays '<' for numeric data and 'lt' for alphabetical data.
  • <= is equivalent to 'is less than or equal to' and displays '<=' for numeric data and 'le' for alphabetical data.
  • != is equivalent to 'is not equal to' and displays '!=' for numeric data and 'ne' for alphabetical data.
  • == is equivalent to 'is equal to' and displays '==' for numeric data and 'eq' for alphabetical data.
  • >= is equivalent to 'is greater than or equal to' and displays '>=' for numeric data and 'ge' for alphabetical data.
  • > is equivalent to 'is greater than' and displays '>' for numeric data and 'gt' for alphabetical data.

In addition, ( and ) can be used for changing operator precedence, and && (logical AND) and || (logical OR) can be used to combine statements.
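Why there are separate numeric and alphabetical forms can be seen by comparing the same values as numbers and as text. The sketch below uses Python rather than Perl, and the sample values are invented; it only illustrates that comparing text character by character gives a different order from comparing numbers.

```python
# The same sample frequencies, once as numbers and once as text.
as_numbers = [9, 10, 100]
as_text = ["9", "10", "100"]

# Numeric comparison (the role of '>' in a query): 10 and 100 exceed 9.
numeric_hits = [n for n in as_numbers if n > 9]

# Alphabetical comparison (the role of 'gt' in a query): text is compared
# character by character, and "1" sorts before "9", so nothing exceeds "9".
text_hits = [s for s in as_text if s > "9"]

print(numeric_hits)
print(text_hits)
```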

Variables

Variables can be added by a user. They are meant to make life easier for users who create very complex queries. By clicking the Add Variable button, one can specify a variable name and its associated value in two popup windows. After the name and value have been specified, the page reloads and a button containing the name of the variable appears in the Variables section. Clicking this button adds the associated value to the Query field. These buttons are persistent: they reappear in later sessions. Each user can create their own variables. A variable can be removed by clicking the Delete Variable button. Variables expire after one year, which means that if no changes are made in the Variables section for over one year, the buttons are removed automatically.

Regular Expressions

There are five different regular expression buttons, which admittedly cover only a very small subset of the possible uses of regular expressions. The introduction to Perl has a section on regular expressions. The match button is the standard regular expression match. If the popup screen states 'Head matches regular expression' and one fills in 'ab' in the textfield and presses Ok, the query field will contain '(Head =~ /ab/)', which evaluates to true if and only if the Head column contains the sequence 'ab'. The unmatch button is the negation of the match button and evaluates to true if and only if a sequence is not present within a column. The begins with button evaluates to true if and only if a column starts with a certain sequence, and the ends with button evaluates to true if and only if a column ends with a certain sequence.

The substitute button replaces every occurrence of a certain sequence X in a column with a certain sequence Y. For instance, if we replace the sequence 'as' with 'ce' within 'Word', the query field will contain '((Word =~ s/as/ce/g)||1)'. Here s/ stands for the substitution operator, 'as' is sequence X, 'ce' is sequence Y, and g ensures that the substitution is global instead of applying to the first match only. The sequence '||1' ensures that even if the substitution fails, the row on which the operation is performed is shown (the expression evaluates to true).

There are two extra checkboxes: 'Ignore case', which occurs in every regular expression popup screen, and 'Exclude failed substitutions', which is specific to the substitute popup screen. 'Ignore case' does what it says: it ignores whether characters are in upper- or lowercase whenever a match or substitution is performed. 'Exclude failed substitutions' displays only the rows where sequence X has actually been replaced with sequence Y. If it is inactive (the default), every row is shown, including the rows where no substitution has been done.
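The behaviour of the five buttons and the two checkboxes can be sketched as follows. This is a Python model, not the Perl that WebCelex generates. The helper names `matches` and `substitute` are hypothetical, and writing 'begins with' and 'ends with' as the anchors ^ and $ is an assumption about how those buttons work.

```python
import re

def matches(pattern, value, ignore_case=False):
    """True if the pattern occurs anywhere in the value (like =~ /pattern/)."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(pattern, value, flags) is not None

def substitute(value, old, new, ignore_case=False, exclude_failed=False):
    """Replace every occurrence of old with new (like s/old/new/g).

    Returns the new value and whether the row is kept. Without
    exclude_failed the row is always kept, which is the role of '||1'.
    """
    flags = re.IGNORECASE if ignore_case else 0
    result, count = re.subn(old, new, value, flags=flags)
    keep_row = count > 0 or not exclude_failed
    return result, keep_row

head = "Cabal"  # invented sample value
print(matches("ab", head))                     # match
print(not matches("xy", head))                 # unmatch
print(matches("^ca", head))                    # begins with, case sensitive
print(matches("^ca", head, ignore_case=True))  # begins with, 'Ignore case' on
print(matches("al$", head))                    # ends with
print(substitute("kassa", "as", "ce"))
print(substitute("boom", "as", "ce", exclude_failed=True))
```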

String Functions

String functions are only available if one has added a user-defined column. User-defined columns are meant for displaying data that results from operations on the static columns. For instance, if one wants to know how many times a certain sequence occurs within a column, that number can be shown in a user-defined column. Or if one wants to have a certain column in uppercase, one can display the uppercase form in a user-defined column. One can essentially perform any function that does not make use of other variables (variables are not allowed for security reasons) and that fits into one statement. If a function is needed which cannot be created in this manner, or if one is unsure whether or not it can be created, one can contact the developer within the lexicon screen.

There are five predefined string functions. substring displays a substring of a static column. To use this function, one first clicks a user-defined column, clicks the substring button, selects a static column within the popup screen, specifies the starting position (0 is the first character) and specifies the length of the substring. The reverse function displays the reversed form of a static column. The substitute function is the same as the substitute button in the regular expressions section, but instead of performing the substitution within the static column itself, it displays the substituted form within the selected user-defined column. The combine function combines the static columns selected within the popup screen into a string which is displayed within the user-defined column. The count sequence function displays the number of times a certain sequence occurs within a static column. Tip: if one wants to use a user-defined column to add restrictions, one can create a query in which the user-defined column is first filled (e.g. with the number of times a sequence occurs) and is subsequently tested with boolean operators or regular expressions. For instance, suppose we want to obtain the rows where 'Head' contains the sequence 'al' one or more times, and we want to display the number of occurrences within a user-defined column. The following procedure will satisfy this need:

  1. Select and accept 'Head' within the column selection screen.
  2. Press 'Add Column' within the restrictions screen.
  3. Fill in the name of the user-defined column (e.g. MyCount) and press 'Ok'.
  4. Click 'MyCount'.
  5. Click 'count sequence'.
  6. Click 'Head'.
  7. Fill in 'al' into the second textfield.
  8. Press 'Ok'.
  9. Click '&&'.
  10. Click 'MyCount'.
  11. Click '>'.
  12. Fill in '0' into the textfield.
  13. Press 'Ok'.
The query field should now contain:

((MyCount=(split/al/,' '.Head.' ')-1)||1) && (MyCount > '0')

and ensures that upon retrieval, only those rows are retrieved where the number of occurrences of 'al' is larger than zero. Voila.

  • Please note that, since the user-defined column is assumed to be of the character type, quotes appear around the 0 in the last part of the query. To avoid strange behaviour, these quotes should be removed manually; otherwise the ASCII values of the characters in MyCount are compared with the ASCII value of 0, instead of their numeric values.
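The counting trick in the generated query (split the padded value on the sequence and subtract one from the number of pieces) can be illustrated with a short Python sketch. The function name `count_sequence` and the sample headwords are invented here. Unlike the Perl split, which treats 'al' as a regular expression, this sketch treats the sequence as literal text.

```python
def count_sequence(value, sequence):
    """Count occurrences of sequence by splitting and counting the pieces."""
    # Padding with spaces mirrors ' '.Head.' ' in the query, so that an
    # occurrence at the very start or end still separates two pieces.
    padded = " " + value + " "
    return len(padded.split(sequence)) - 1

heads = ["alkali", "banaan", "al"]  # hypothetical headwords
counts = {head: count_sequence(head, "al") for head in heads}

# Counterpart of the restriction MyCount > 0, compared as a number
selected = [head for head in heads if counts[head] > 0]

print(counts)
print(selected)
```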

Numeric Functions

Numeric functions are only available for user-defined columns. Please read 'String Functions' if it is not clear what user-defined columns are. There are six predefined numeric functions. round rounds a numeric column to a specified precision. binary converts a column from a decimal to a binary representation. random creates a column with random values (0 or 1 for the integer representation and a probability with a specific precision for the floating point representation). minimum returns the minimum value of the static columns selected within the popup screen (alphabetic or numeric value). maximum returns the maximum value of the static columns selected within the popup screen (alphabetic or numeric value). length returns the length of a selected static column.

Example:
Suppose we have selected 'Head' and want to compute the length of each headword. Instead of using the static 'HeadCnt' column, we can adopt the following procedure.

  1. Select and accept 'Head' within the column selection screen.
  2. Press 'Add Column' within the restrictions screen.
  3. Fill in the name of the user-defined column (e.g. MyCount) and press 'Ok'.
  4. Click 'MyCount'.
  5. Click 'length'.
  6. Click 'Head'.
  7. Press 'Ok'.
The query field should now contain:

(MyCount=(length(Head))||1)

and ensures that upon retrieval the lengths of the headwords are displayed within the 'MyCount' column.
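As a rough illustration of what the numeric functions compute, the sketch below shows Python counterparts applied to invented sample values. It models the descriptions above and is not the Perl code that WebCelex generates. The precision values are arbitrary choices for this example.

```python
import random

head = "aardbei"                          # hypothetical headword

length = len(head)                        # length: number of characters
rounded = round(3.14159, 2)               # round: precision of 2 decimals
binary = format(10, "b")                  # binary: decimal 10 as binary text
smallest = min(12, 7, 30)                 # minimum over numeric values
largest = max("appel", "banaan")          # maximum over alphabetic values
coin = random.randint(0, 1)               # random, integer form: 0 or 1
probability = round(random.random(), 3)   # random, floating point form

print(length, rounded, binary, smallest, largest)
```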

File Uploading

File uploading is a very powerful option. A user may want to retrieve data for only a subset of the database, for instance only for some words or only for some specific frequencies. Instead of having to put a restriction in the query field such as (Word eq 'appel' || Word eq 'aardbei' || Word eq 'banaan'...) or (Inl==2 || Inl==10 || Inl==12...), one can use the file upload mechanism. Within the file upload textfield one can specify the name of the file containing the elements which should (or should not) appear in the results. One can use the Browse button to search for the file. Within the Column? pull-down menu, one must select the column on which the file operates. If Exclude is checked, the lines which do NOT contain the elements in the file are retrieved. If it is not checked, the lines which DO contain the elements in the file are retrieved. The match once option ensures that, if Exclude is inactive, every element in the file is displayed only once. This is useful when the elements in the file are unique (such as words or id-numbers), because it stops the search as soon as no more elements remain to be retrieved. The file should contain one element per line, and elements are matched exactly (i.e. the match is case sensitive).

An example:
Imagine that a user has retrieved a list of words via WebCelex. The user now decides that he wants to retrieve extra information on these words. He selects the list of words and puts them in a file mywords in his home directory, which looks like this:
appel
aardbei
banaan
.
.
.

Within the restrictions screen he browses for the file mywords, which upon selection appears within the textfield. He then selects the column Word from his selection of columns and checks the box Match once. When the results are retrieved, his list of words will appear together with any extra columns he has selected.
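The combined effect of the uploaded file, Exclude and Match once can be modelled as below. This Python sketch reflects one reading of the description above, not the WebCelex implementation. The function `filter_by_file`, its parameters and the sample rows are all hypothetical, and the file contents are simulated with a string instead of a real upload.

```python
def filter_by_file(rows, column, wanted, exclude=False, match_once=False):
    """Select rows whose column value is (or is not) listed in the file."""
    remaining = set(wanted)  # elements not yet found, used by match once
    selected = []
    for row in rows:
        value = row[column]
        if exclude:
            # Exclude: keep only rows NOT listed in the file.
            if value not in wanted:
                selected.append(row)
        elif match_once:
            # Match once: keep the first hit per element, stop when done.
            if value in remaining:
                selected.append(row)
                remaining.discard(value)
                if not remaining:
                    break
        elif value in wanted:
            selected.append(row)
    return selected

# One element per line, matched exactly (case sensitive).
file_contents = "appel\nbanaan\n"
wanted = set(file_contents.splitlines())

rows = [{"Word": w} for w in ["appel", "Appel", "banaan", "appel", "citroen"]]

included = [r["Word"] for r in filter_by_file(rows, "Word", wanted)]
once = [r["Word"] for r in filter_by_file(rows, "Word", wanted, match_once=True)]
excluded = [r["Word"] for r in filter_by_file(rows, "Word", wanted, exclude=True)]

print(included, once, excluded)
```

In this model 'Appel' is never matched by 'appel', which mirrors the case-sensitive exact match, and Match once stops scanning as soon as both listed words have been found.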

Special Characters

Special characters are characters which are present within the database but not on a standard keyboard. In order to search for items containing those characters, the special characters can be selected directly and used within the query field or the popup screens. The special characters differ from database to database.

The Query Field

The query field is the textfield that holds the resulting query. The textfield can be filled by using the predefined buttons, by typing a restriction directly into the field, or by a combination of the two. One should be aware that the query field is evaluated as standard Perl code, in the form 'if (eval(TheQueryIHaveTyped)) PrintTheRow else DoNothing'. The query should therefore be interpretable by Perl. Some Perl keywords are forbidden for reasons of security. If such keywords are used, a message like the following is displayed:

Unexpected Error
Illegal character(s) encountered : 'the illegal character(s)'