
Max Planck Institute for Psycholinguistics

WebCelex Help - Export screen



The export screen is used for formatting the output. The items will be discussed in the following order:
Title
Title specifies the name of the lexicon to be generated. By default this is the name of the database chosen in the lexicon screen, but it can be changed at will. When the retrieve mode is email, this name also appears in the subject header.

Retrieve Mode
Retrieve Mode specifies where the output is sent. WWW sends the output to an HTML page. Email sends the output to the email address specified in Address. The default address is the one associated with the person who has logged in, but it can be changed at will.

Format
Format selects one of two predefined output formats.

Default shows data in the following form:
Column1\Column2\Column3Alternative1%Column3Alternative2\Column4

LISP shows data in the following form:
((Column1),(Column2),(Column3Alternative1,Column3Alternative2),Column4)
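If you post-process a saved file, you may need to split the Default format back into columns. The sketch below assumes a backslash between columns and "%" between alternatives, as in the example above. It also assumes that neither character occurs inside the data values. The function name parse_default_line is hypothetical and is not part of WebCelex.

```python
# Minimal sketch: split one Default-format line into columns and alternatives.
# Assumption: "\" separates columns and "%" separates alternatives, as in the
# example above, and neither character occurs inside the data values.

COLUMN_SEP = "\\"        # a single backslash
ALTERNATIVE_SEP = "%"


def parse_default_line(line: str) -> list[list[str]]:
    """Return the line as a list of columns; each column is a list of alternatives."""
    line = line.rstrip("\n")
    return [column.split(ALTERNATIVE_SEP) for column in line.split(COLUMN_SEP)]


example = r"Column1\Column2\Column3Alternative1%Column3Alternative2\Column4"
print(parse_default_line(example))
# -> [['Column1'], ['Column2'], ['Column3Alternative1', 'Column3Alternative2'], ['Column4']]
```

A column without alternatives comes back as a one-element list. This keeps the shape uniform, whether or not Alternatives was selected.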

Table Format
If Table Format is selected, the results are shown as a table. This works reliably in WWW mode, but the table layout is not always preserved in email. If Borders is checked, borders are drawn around each cell (WWW retrieve mode only). Note, however, that a table can only be displayed once all results have been retrieved. If you want to see results quickly in WWW mode, turn Table Format off.

BOC
Beginning of column; the character in this field is shown at the beginning of every column.

EOC
End of column; the character in this field is shown at the end of every column.

BOL
Beginning of line; the character in this field is shown at the beginning of every line.

EOL
End of line; the character in this field is shown at the end of every line.

Delimiter
Delimiter; the character in this field is shown between columns to separate them.
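The five settings above work together to frame each output line. The sketch below shows one plausible way they combine. The ordering is an assumption: BOL first, then each column wrapped in BOC and EOC, then the columns joined by Delimiter, and EOL last. The helper format_line is hypothetical, and the backslash default is taken from the Default format shown earlier.

```python
# Minimal sketch of combining BOL, BOC, EOC, Delimiter and EOL into one line.
# Assumption: BOL + (BOC value EOC, joined by Delimiter) + EOL; the exact
# order used by WebCelex is not confirmed here. format_line is hypothetical.

def format_line(columns, bol="", boc="", eoc="", delimiter="\\", eol=""):
    """Return one output line built from a list of column values."""
    wrapped = [f"{boc}{value}{eoc}" for value in columns]
    return bol + delimiter.join(wrapped) + eol


print(format_line(["Column1", "Column2", "Column3"],
                  bol="<", boc="[", eoc="]", delimiter=",", eol=">"))
# -> <[Column1],[Column2],[Column3]>
```

With all framing characters left empty, only the Delimiter remains between the values.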

Alternatives
If Alternatives is selected, the alternative values of a column are shown as well. Some columns do not hold one specific value but several. The Alternative Delimiter specifies how alternatives are separated. Searches can only be performed on the first value in a column, not on the alternatives. Furthermore, choose delimiters that do not clash with the information within the column.
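The warning about clashing delimiters matters once the output is split again. For example, if "-" is chosen as the Alternative Delimiter and a value is itself hyphenated, the hyphenated value is split into spurious alternatives. The hypothetical helpers below refuse such a delimiter. They also extract the first alternative, which, according to the text above, is the only one that restrictions apply to. The "%" default is taken from the Default format example.

```python
# Minimal sketch; join_alternatives and first_alternative are hypothetical helpers.

def join_alternatives(values, alt_delimiter="%"):
    """Join alternatives, rejecting a delimiter that also occurs inside a value."""
    for value in values:
        if alt_delimiter in value:
            raise ValueError(f"delimiter {alt_delimiter!r} occurs inside value {value!r}")
    return alt_delimiter.join(values)


def first_alternative(cell, alt_delimiter="%"):
    """Return the first alternative of a cell (the one restrictions apply to)."""
    return cell.split(alt_delimiter, 1)[0]


print(join_alternatives(["Alt1", "Alt2"]))    # -> Alt1%Alt2
print(first_alternative("Alt1%Alt2"))         # -> Alt1
try:
    join_alternatives(["half-way", "halfway"], alt_delimiter="-")
except ValueError as err:
    print(err)  # the hyphen clashes with the data
```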

Rows
Within Rows one can specify which lines are retrieved. Random searches only a random percentage of the database and thus generates less output. Interval searches only a subset of the database, and Maximum specifies the maximum number of lines to be retrieved. Random, Interval and Maximum can be used in combination. Maximum is activated automatically when data is retrieved via email, to prevent the files from growing too large.

For Interval one can specify either a numeric or an alphabetic range. A numeric range specifies the line numbers to be retrieved. For instance, from 1 to 100 step 2 retrieves 50 lines, namely 1, 3, 5, 7, ..., 99. For an alphabetic range, the Begin field takes the initial characters of the first words to be retrieved and the End field takes the initial characters of the last words to be retrieved. Only the first two characters are indexed, and diacritics should not be used. For example, to retrieve all lines starting at 'aren' and ending at 'azijn', enter 'aren' in the Begin field and 'azijn' in the End field.

Because interval ranges are indexed, it is worth using Interval whenever appropriate. Note, however, that for some databases the alphabetic range does not always work properly. This is caused by the structure of CELEX, in which some words are not sorted correctly. Faulty behavior has been observed in the German and Dutch Wordforms databases.

Tip: with a restriction such as Word ge 'beer', always enter something like 'beer' in the interval's Begin field and 'zzz' in its End field, so that the fast indexed method is used.
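The numeric interval arithmetic above can be checked with a short sketch. It treats the range as inclusive of both ends, which matches the "1 to 100 step 2 gives 50 lines" example. The order in which WebCelex applies Interval, Random and Maximum is not stated here. The sketch assumes interval first, then the random percentage, then the maximum. All function names are hypothetical.

```python
import random

# Minimal sketch of the Rows options; numeric_interval, random_subset and
# apply_maximum are hypothetical, and the order of application is an assumption.

def numeric_interval(start, end, step=1):
    """Line numbers from start to end (inclusive), in increments of step."""
    return list(range(start, end + 1, step))


def random_subset(lines, percentage, seed=None):
    """Keep each line with a probability of percentage/100."""
    rng = random.Random(seed)
    return [n for n in lines if rng.random() * 100 < percentage]


def apply_maximum(lines, maximum):
    """Keep at most `maximum` lines."""
    return lines[:maximum]


selected = numeric_interval(1, 100, 2)
print(len(selected), selected[:4])   # -> 50 [1, 3, 5, 7]
sample = apply_maximum(random_subset(selected, 20, seed=1), 5)
```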

Note: when a query takes longer than an hour to finish, the retrieval of results is aborted automatically. If a query runs for close to an hour, its results may therefore be incomplete. (A query that needs more than an hour to return its results may itself be worth reconsidering.)

Word
Word adds the lexicon word, without diacritics, at the beginning of the table; restrictions cannot be set on this column. The words link to external databases. Clicking a word gives access to dictionary information; clicking with the second mouse button opens the link in a new window. The English words are linked to information provided by the WordNet project at Princeton University. The German words are linked to information provided by LEO - Link Everything Online, an online service of Informatik der Technischen Universität München. These external links are not maintained by the MPI and may stop working at any time. If a link no longer works, please report it to the person responsible for maintaining WebCelex. There is currently no link to a Dutch dictionary. Please also contact the WebCelex maintainer with suggestions for Dutch dictionaries, or for alternatives to the current English and German dictionaries.

Column Names
Column Names inserts the column names, as shown in the column selection screen, at the top of the table.

Counter
Counter adds a row number at the beginning of each row of the table.

Query
Query shows, at the beginning of the page, the columns and restrictions specified on the column selection page and the restrictions page. When the data is saved, this makes it possible to determine later what the data represents.

Special Characters
If Special Characters is active, characters with diacritics are shown in their proper form; otherwise the database representation is shown. For example, instead of ä, "a is shown. Turning this option off is generally faster, because the characters do not have to be converted.
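If Special Characters was turned off for speed, the umlaut codes can still be converted afterwards. This page confirms only one mapping, "a for ä. The sketch below extends the same double-quote prefix to the other vowels and to capitals by analogy, which is an assumption. Other CELEX diacritic codes are not handled at all. The function decode_umlauts is hypothetical.

```python
import re

# Minimal sketch: convert the database representation of umlauts to proper characters.
# Only '"a' -> 'ä' is confirmed above; the other entries are assumed by analogy,
# and other diacritic codes used by CELEX are deliberately not covered.
UMLAUTS = {"a": "ä", "o": "ö", "u": "ü", "e": "ë", "i": "ï",
           "A": "Ä", "O": "Ö", "U": "Ü", "E": "Ë", "I": "Ï"}


def decode_umlauts(text):
    """Replace each '"' + vowel pair with the corresponding umlauted vowel."""
    return re.sub(r'"([aouieAOUIE])', lambda m: UMLAUTS[m.group(1)], text)


print(decode_umlauts('M"adchen'))   # -> Mädchen
```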