
Max Planck Institute for Psycholinguistics

WebCelex Help - Project Documentation screen

This document gives an overview of all the files that make up this web-based application. Each file (or class of files) is described separately, together with its functionality. All the files presented in this document should be available on CD as a backup of the full application.


.char class

The .char class is a set of indexes used by interval when an alphabetic range is chosen; it speeds up the lookup considerably. For each database, it stores byte offsets keyed on the first two characters of the word.
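The Python sketch below illustrates the idea behind a .char-style index; it is not the WebCelex code. It assumes, hypothetically, that each database row is a backslash-separated line with the word in the second field, that rows are sorted on that word, and that the file is Latin-1 encoded. The names `build_char_index` and `scan_range` are illustrative only.

```python
import bisect
import io

def word_of(line):
    # Assumed row layout: IdNum\Word\..., one row per line, Latin-1 encoded.
    return line.rstrip(b"\n").split(b"\\")[1].decode("latin-1")

def build_char_index(lines):
    """Map each two-character word prefix to the byte offset of its first row."""
    index, offset = {}, 0
    for line in lines:
        index.setdefault(word_of(line)[:2], offset)  # keep first occurrence only
        offset += len(line)
    return index

def scan_range(f, index, low, high):
    """Yield the words in [low, high], seeking past all rows before low[:2]."""
    prefixes = sorted(index)
    i = bisect.bisect_left(prefixes, low[:2])
    if i == len(prefixes):
        return  # every indexed prefix sorts before the requested range
    f.seek(index[prefixes[i]])
    for line in f:
        word = word_of(line)
        if word > high:
            break  # rows are sorted, so nothing further can match
        if word >= low:
            yield word

# Toy data, not real CELEX rows.
data = b"1\\aal\\x\n2\\aap\\x\n3\\baan\\x\n4\\bad\\x\n5\\zee\\x\n"
char_index = build_char_index(data.splitlines(keepends=True))
print(list(scan_range(io.BytesIO(data), char_index, "ab", "bb")))  # ['baan', 'bad']
```

Because the offsets are byte positions, a lookup costs one seek plus a short sequential read instead of a scan from the top of the file.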

.num class

The .num class is a set of indexes used by interval when a numeric range is chosen; it also speeds up the lookup considerably. For each database, it stores byte offsets in steps of 500 rows.
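A minimal sketch of a sparse, every-500-rows offset index follows. It assumes, as a simplification, that the numeric range has already been translated into 0-based row positions; how interval maps a user's numeric range onto rows is not described here. The function names are hypothetical.

```python
import io

STEP = 500  # one index entry every 500 rows, as described above

def build_num_index(f, step=STEP):
    """Byte offsets of rows 0, step, 2*step, ... (0-based row numbers)."""
    offsets, offset = [], 0
    for row, line in enumerate(f):
        if row % step == 0:
            offsets.append(offset)
        offset += len(line)
    return offsets

def read_rows(f, offsets, first, last, step=STEP):
    """Rows first..last (inclusive): one seek, then at most step - 1 skipped rows."""
    block = first // step
    if block >= len(offsets):
        return []
    f.seek(offsets[block])
    rows = []
    for row, line in enumerate(f, start=block * step):
        if row > last:
            break
        if row >= first:
            rows.append(line.rstrip(b"\n"))
    return rows

# Toy file with a tiny step so the effect is visible.
data = b"r0\nr1\nr2\nr3\nr4\n"
num_index = build_num_index(io.BytesIO(data), step=2)
print(num_index)                                             # [0, 6, 12]
print(read_rows(io.BytesIO(data), num_index, 3, 4, step=2))  # [b'r3', b'r4']
```

A step of 500 keeps the index small while bounding the sequential read after each seek.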

.idx class

The .idx class provides an index that maps wordform IdNums to lemma IdNums. This allows lemma information to be presented in the wordforms section: for a given wordform IdNum, the index gives the byte offsets into the lemma databases.
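The sketch below shows one way such a wordform-to-lemma index could be built and used. It handles a single lemma file and assumes the dfw.cd field order given in the dfw.cd section of this document (IdNum, Word, IdNumLemma, ...), with backslash-separated fields; the function names are illustrative, not the application's own.

```python
import io

def lemma_offsets(lemma_file):
    """Byte offset of each lemma row, keyed on its IdNum (first field)."""
    offsets, pos = {}, 0
    for line in lemma_file:
        offsets[line.split(b"\\")[0]] = pos
        pos += len(line)
    return offsets

def build_idx(wordform_file, lemma_pos):
    """Couple each wordform IdNum to the byte offset of its lemma row."""
    idx = {}
    for line in wordform_file:
        fields = line.rstrip(b"\n").split(b"\\")
        idx[fields[0]] = lemma_pos[fields[2]]  # field 3 is IdNumLemma
    return idx

def lemma_row(lemma_file, idx, wordform_id):
    """Fetch the lemma row for a wordform with a single seek."""
    lemma_file.seek(idx[wordform_id])
    return lemma_file.readline().rstrip(b"\n")

# Toy data: wordforms 1 and 2 belong to lemma 1, wordform 3 to lemma 2.
lemmas = b"1\\aanbod\\10\n2\\bad\\5\n"
wordforms = b"1\\aanbod\\1\\7\n2\\aanboden\\1\\3\n3\\baden\\2\\2\n"
idx = build_idx(io.BytesIO(wordforms), lemma_offsets(io.BytesIO(lemmas)))
print(lemma_row(io.BytesIO(lemmas), idx, b"3"))  # b'2\\bad\\5'
```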

.dat class

The .dat class is a set of files that supply the output for the Familysize and Familyfrequency columns. These data could have been added to the .cd files, but then all the index files would have had to be recreated, which is an immense task. The files have therefore been added separately. Dirty, but quick.
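The following sketch illustrates why keeping the extra columns in side files is cheap: they are joined onto a row only at output time, so the .cd files and their byte offsets are never touched. The .dat layout assumed here (IdNum, Familysize, Familyfrequency, backslash-separated) is hypothetical.

```python
def load_dat(lines):
    """Read a .dat file assumed to hold IdNum\\Familysize\\Familyfrequency rows."""
    extra = {}
    for line in lines:
        idnum, size, freq = line.rstrip("\n").split("\\")
        extra[idnum] = (size, freq)
    return extra

def add_family_columns(row, extra):
    """Append Familysize and Familyfrequency to a .cd row at output time.

    The .cd row itself is not modified, so the byte offsets stored in the
    .char, .num and .idx indexes stay valid.
    """
    fields = row.rstrip("\n").split("\\")
    size, freq = extra.get(fields[0], ("", ""))  # empty if no family data
    return fields + [size, freq]

extra = load_dat(["12\\3\\250\n"])
print(add_family_columns("12\\aanbod", extra))  # ['12', 'aanbod', '3', '250']
```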

Login.html

Login.html is the entry point to the WebCelex application. Users enter their account name and password, as specified in the .passwords file in the /cgi directory.

Cols.html class

HTML files ending in cols.html are the left frames of the column selection page. Their parent is the CGI script colstart.pl. Each file specifies the items that can be selected from a specific database. Three hidden fields are submitted: info, containing the user account, password and e-mail address; cnames, containing the column names; and groups, containing the group to which each column name belongs.

Dia.html class

HTML files ending in dia.html are the right frames of the column selection page. Their parent is the CGI script colstart.pl. They contain the descriptions of the columns that can be selected in the corresponding cols.html file.

Helpfiles

File name              Contains
dab.html               help on Dutch Abbreviations
dabn.html              help on Dutch New Abbreviations
dct.html               help on Dutch Corpustypes
dlemmas.html           help on Dutch Lemmas
dsyl.html              help on Dutch Syllables
dudb.html              help on Dutch Uit den Boogaart
dwordforms.html        help on Dutch Wordforms
gct.html               help on German Corpustypes
glemmas.html           help on German Lemmas
gsyl.html              help on German Syllables
gwordforms.html        help on German Wordforms
ect.html               help on English Corpustypes
elemmas.html           help on English Lemmas
esyl.html              help on English Syllables
ewordforms.html        help on English Wordforms
help01.html            the first overview help screen
helpCode.html          this document
helpCohorts.html       help on Toolbox Cohorts
helpColumn.html        help on Column Selection
helpExport.html        help on Export Screen
helpLexicon.html       help on Lexicon Screen
helpLogin.html         help on Login Screen
helpNeighbours.html    help on Toolbox Neighbours
helpNgrams.html        help on Toolbox N-grams
helpRestrictions.html  help on Restrictions Screen
helpSorting.html       help on Toolbox Sorting
helpStatistics.html    help on Toolbox Statistics
helpToolbox.html       help on Toolbox
helpUniqueness.html    help on Toolbox Uniqueness Points


Entry.pl

Entry.pl validates the login and password. The user can then select a database, which calls colstart.pl with the corresponding language and database type; colstart.pl is the parent of all cols.html and dia.html files. Alternatively, the user can type a message, which is passed to the mailit.pl script.

Export.pl

Export.pl is the CGI script that calls the results script get.pl. It specifies how the results are formatted.

Get.pl class

The get.pl class of files consists of the scripts that produce the result data for a specific database. These are the most complex scripts in the application.

Mailit.pl

The mailit.pl script mails a message specified by the user in entry.pl to the developer.

Restrictions.pl

The restrictions.pl file is responsible for the restrictions screen in which restrictions on the data can be set.

Colstart.pl

The colstart.pl file is the parent that provides the frameset for the column selection screen. It passes the login and password to its frames.

.passwords file

The .passwords file specifies the login, password and e-mail address of each user.
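The exact layout of the .passwords file is not documented here. The sketch below assumes, hypothetically, one user per line in the form login:password:email and shows how a login check like the one in entry.pl could use it. `hmac.compare_digest` is the standard-library constant-time comparison; the other names are illustrative.

```python
import hmac

def load_passwords(lines):
    """Parse .passwords lines, assumed here to be login:password:email."""
    users = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        login, password, email = line.split(":", 2)
        users[login] = (password, email)
    return users

def validate(users, login, password):
    """Return the user's e-mail address if login and password match, else None."""
    entry = users.get(login)
    if entry is None:
        return None
    stored, email = entry
    if hmac.compare_digest(stored.encode(), password.encode()):
        return email
    return None

users = load_passwords(["jan:geheim:jan@example.org\n", "\n"])
print(validate(users, "jan", "geheim"))  # jan@example.org
```

The returned e-mail address is the kind of value that would travel on in the hidden info field described above.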


Introduction

This application uses the databases as specified on the CELEX CD-ROM, with some differences.

First, on the CD-ROM the alternatives for certain columns are appended at the end of a row (the fields repeat themselves). In this application, all alternatives are stored in the same field, separated by a % sign.

Second, some databases on the CD-ROM come with variant files: the lines concerned were too long to be processed by the awk scripts on the CD-ROM, so they were put into separate files. In this application, the variants are reintegrated into their original database.

Third, many fields occur in several databases, but the application uses each such field from only one database (e.g., it uses Inl from the frequency database, but not Inl from the orthography or morphology databases). Removing the unused copies from the databases would speed up the application, but would have severe implications for the get.pl files.

Finally, many fields are not stored in the databases but can be derived from them. This is done in the get.pl files, where the original awk scripts have been converted and are applied at run time when necessary.
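The conversion from repeated trailing fields to %-separated fields can be sketched as follows. This is an illustration under stated assumptions, not the conversion actually used: it assumes rows are split on a backslash, and that the number of non-repeated leading fields (`n_fixed`) and the size of each repeated group (`group_size`) are known for the database in question.

```python
def merge_alternatives(fields, n_fixed, group_size):
    """Collapse repeated trailing field groups into %-separated fields.

    fields:     one CD-ROM row, already split on the field separator
    n_fixed:    number of leading fields that are never repeated (assumed known)
    group_size: number of fields in each repeated group (assumed known)
    """
    fixed, rest = fields[:n_fixed], fields[n_fixed:]
    if len(rest) % group_size:
        raise ValueError("trailing fields do not form complete groups")
    groups = [rest[i:i + group_size] for i in range(0, len(rest), group_size)]
    # Field j of the merged row holds field j of every group, joined by '%'.
    return fixed + ["%".join(g[j] for g in groups) for j in range(group_size)]

def split_alternatives(field):
    """Query-time view: all alternatives are stored in one field."""
    return field.split("%")

row = "1\\aanbod\\2\\A\\x\\B\\y".split("\\")
print(merge_alternatives(row, 3, 2))  # ['1', 'aanbod', '2', 'A%B', 'x%y']
```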

dab.cd

The dab.cd file contains the following fields:
  1. Abbr
  2. AbbrDia
  3. Meaning
  4. Inl
  5. InlDev
  6. InlMln
  7. InlLog
  8. OrthoCnt

dabn.cd

The dabn.cd file contains the following fields:
  1. IdNum
  2. OrthoCnt
  3. OrthoNum
  4. AbbrChg
  5. NewOrthoStatus
  6. OldOrthoStatus
  7. Abbr
  8. AbbrDia
  9. Meaning
  10. Inl
  11. InlDev
  12. InlMln
  13. InlLog
  14. AbbrImage

dct.cd

The dct.cd file contains the following fields:
  1. Type
  2. Freq
  3. Disp
  4. Indet
  5. Status

dfl.cd

The dfl.cd file contains the following fields:
  1. IdNum
  2. Head
  3. Inl
  4. InlDev
  5. InlMln
  6. InlLog
  7. Dict

dfs.cd

The dfs.cd file contains the following fields:
  1. Syllable
  2. SylPos
  3. SylInlMln
  4. SylTotInlMln

dfw.cd

The dfw.cd file contains the following fields:
  1. IdNum
  2. Word
  3. IdNumLemma
  4. Inl
  5. InlDev
  6. InlMln
  7. InlLog

dml2.cd

Index  DML-fields in CELEX  DML-fields in CD-file
1 IdNum Column 1
2 Head Column 2
3 MorphStatus Column 4
4 MorphCnt Column 5
5 MorphNum obsolete
6 DerComp Column 6
7 Comp Column 7
8 Def Column 8
9 Imm Column 9
10 ImmClass ConvertVerbNumbersToV(ImmSubCat);
11 ImmSubCat Column 10
12 ImmSA ConvertImmWordClassToSAPattern(ImmSubCat);
13 ImmAllo Column 11
14 ImmSubst Column 12
15 Flat StripStructureMarkers(StrucLab);
16 FlatClass ExtractWordClass(StrucLab);
17 FlatSA ConvertFlatWordClassToSAPattern(StrucLab);
18 Struc StripClassLabels(StrucLab);
19 StrucLab Column 13
20 StrucBrackLab StripOrthographicInformation(StrucLab);
21 StrucAllo Column 14
22 StrucSubst Column 15
23 CompCnt CountMorpComponents(ImmSubCat);
24 MorCnt CountMorphemes(StrucLab);
25 LevelCnt CountLevels(StrucLab);
26 Sepa Column 16

There are 11 fields which can have alternative parsings. These parsings are separated by a % sign. These eleven fields are:

dol3.cd

Index  DOL-fields in CELEX  DOL-fields in CD-file
1 IdNum Column 1
2 OrthoCnt Column 4
3 OrthoNum obsolete
4 OrthoStatus Column 5
5 InlSpellFreq Column 6
6 InlSpellDev Column 7
7 Head StripDiacritics(HeadDia);
8 HeadRev ReverseString(HeadDia);
9 HeadDia Column 2
10 HeadLow ToLower(HeadDia);
11 HeadLowSort SortString(HeadDia);
12 HeadCnt CountCharacters(HeadDia);
13 HeadSyl StripDiacritics(HeadSylDia);
14 HeadSylDia Column 8
15 HeadSylCnt CountSyllables(HeadSylDia);
16 Stem StripDiacritics(StemDia);
17 StemRev ReverseString(StemDia);
18 StemDia Column 9
19 StemCnt CountCharacters(StemDia);
20 StemSyl StripDiacritics(StemSylDia);
21 StemSylDia Column 10
22 StemSylCnt CountSyllables(StemSylDia);
23 AbStem StripDiacritics(AbStemDia);
24 AbStemDia Column 11
25 AbStemCnt CountCharacters(AbStemDia);

There are 8 fields which can have alternative parsings. These parsings are separated by a % sign. These eight fields are:

The original doln file has been merged with the original dol file to create dol3.cd. As a result, the fields HeadDiaNew, SpellChg and HeadSylDiaNew are the last three fields of dol3.cd.
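The derived orthographic fields in the table above (StripDiacritics, ReverseString, ToLower, SortString, CountCharacters, CountSyllables) can be sketched as simple string functions. This is not the get.pl code: the set of diacritic marker characters and the hyphen as syllable boundary are assumptions, and the functions are applied here to the stripped form.

```python
DIACRITIC_MARKS = set('#`"^,@~')  # assumed marker characters written after a letter
SYLLABLE_SEP = "-"                # assumed syllable boundary in *SylDia fields

def strip_diacritics(s):
    """StripDiacritics: drop the (assumed) diacritic marker characters."""
    return "".join(ch for ch in s if ch not in DIACRITIC_MARKS)

def derive_head_fields(head_dia, head_syl_dia):
    """Derive the computed dol3.cd Head* fields from HeadDia and HeadSylDia."""
    head = strip_diacritics(head_dia)
    return {
        "Head": head,
        "HeadRev": head[::-1],                         # ReverseString
        "HeadLow": head.lower(),                       # ToLower
        "HeadLowSort": "".join(sorted(head.lower())),  # SortString
        "HeadCnt": len(head),                          # CountCharacters
        "HeadSyl": strip_diacritics(head_syl_dia),     # StripDiacritics
        "HeadSylCnt": len(head_syl_dia.split(SYLLABLE_SEP)),  # CountSyllables
    }

print(derive_head_fields("aanbod", "aan-bod"))
```

For a field holding %-separated alternatives, the same functions would be applied to each alternative in turn.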

dow3.cd

Index  DOW-fields in CELEX  DOW-fields in CD-file
1 IdNum Column 1
2 OrthoCnt Column 5
3 OrthoNum obsolete
4 OrthoStatus Column 6
5 InlSpellFreq Column 7
6 InlSpellDev Column 8
7 Word StripDiacritics(WordDia);
8 WordRev ReverseString(WordDia);
9 WordDia Column 2
10 WordLow ToLower(WordDia);
11 WordLowSort SortString(WordDia);
12 WordCnt CountCharacters(WordDia);
13 WordSyl StripDiacritics(WordSylDia);
14 WordSylDia Column 9
15 WordSylCnt CountSyllables(WordSylDia);
16 IdNumLemma Column 4

There are 5 fields which can have alternative parsings. These parsings are separated by a % sign. These five fields are:

The original down file has been merged with the original dow file to create dow3.cd. As a result, the fields WordDiaNew, SpellChg and WordSylDiaNew are the last three fields of dow3.cd.

dpl.cd

Index  DPL-fields in CELEX  DPL-fields in CD-file
1 IdNum Column 1
2 PhonSAM PhoneticTranscriptions(PhonStrsDISC) SP
3 PhonCLX PhoneticTranscriptions(PhonStrsDISC) CX
4 PhonCPA PhoneticTranscriptions(PhonStrsDISC) CP
5 PhonDISC PhoneticTranscriptions(PhonStrsDISC)
6 PhonCnt NumOfChar(PhonStrsDISC)
7 PhonSylSAM PhonSylTranscriptions(PhonStrsDISC) SP
8 PhonSylCLX PhonSylTranscriptions(PhonStrsDISC) CX
9 PhonSylBCLX Column 6
10 PhonSylCPA PhonSylTranscriptions(PhonStrsDISC) CP
11 PhonSylDISC PhonSylTranscriptions(PhonStrsDISC)
12 SylCnt CountSyllables(PhonStrsDISC)
13 PhonStrsSAM PhonStrsTranscriptions(PhonStrsDISC) SP
14 PhonStrsCLX PhonStrsTranscriptions(PhonStrsDISC) CX
15 PhonStrsCPA PhonStrsTranscriptions(PhonStrsDISC) CP
16 PhonStrsDISC Column 4
17 StrsPat MakeStressPattern(PhonStrsDISC)
18 PhonStSAM PhoneticTranscriptions(PhonStrsStDISC) SP
19 PhonStCLX PhoneticTranscriptions(PhonStrsStDISC) CX
20 PhonStCPA PhoneticTranscriptions(PhonStrsStDISC) CP
21 PhonStDISC PhoneticTranscriptions(PhonStrsStDISC)
22 PhonStCnt NumOfChar(PhonStrsStDISC)
23 PhonSylStSAM PhonSylTranscriptions(PhonStrsStDISC) SP
24 PhonSylStCLX PhonSylTranscriptions(PhonStrsStDISC) CX
25 PhonSylStBCLX Column 9
26 PhonSylStCPA PhonSylTranscriptions(PhonStrsStDISC) CP
27 PhonSylStDISC PhonSylTranscriptions(PhonStrsStDISC)
28 StSylCnt CountSyllables(PhonStrsStDISC)
29 PhonStrsStSAM PhonStrsTranscriptions(PhonStrsStDISC) SP
30 PhonStrsStCLX PhonStrsTranscriptions(PhonStrsStDISC) CX
31 PhonStrsStCPA PhonStrsTranscriptions(PhonStrsStDISC) CP
32 PhonStrsStDISC Column 7
33 StStrsPat MakeStressPattern(PhonStrsStDISC)
34 PhonCV ConvertBrackets(PhonCVBr)
35 PhonCVBr Column 5
36 PhonStCV ConvertBrackets(PhonStCVBr)
37 PhonStCVBr Column 8
38 PhonolCLX Column 10
39 PhonolCPA Column 11
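Most phonological fields above are derived from PhonStrsDISC. A minimal sketch of four of these derivations follows. It assumes DISC conventions in which syllables are separated by "-", primary stress is marked with ' and secondary stress with " at the start of a syllable; DISC uses one character per phoneme, so PhonCnt is a character count. The function names mirror the table but are illustrative, not the get.pl implementation.

```python
PRIMARY, SECONDARY, SYL_SEP = "'", '"', "-"  # assumed DISC stress/syllable marks

def syllables(phon_strs_disc):
    return phon_strs_disc.split(SYL_SEP)

def make_stress_pattern(phon_strs_disc):
    """StrsPat: one digit per syllable, 1 = primary, 2 = secondary, 0 = none."""
    digits = []
    for syl in syllables(phon_strs_disc):
        if PRIMARY in syl:
            digits.append("1")
        elif SECONDARY in syl:
            digits.append("2")
        else:
            digits.append("0")
    return "".join(digits)

def phon_syl_disc(phon_strs_disc):
    """PhonSylDISC: drop stress marks, keep syllable boundaries."""
    return phon_strs_disc.replace(PRIMARY, "").replace(SECONDARY, "")

def phon_disc(phon_strs_disc):
    """PhonDISC: drop stress marks and syllable boundaries."""
    return phon_syl_disc(phon_strs_disc).replace(SYL_SEP, "")

def count_syllables(phon_strs_disc):
    return len(syllables(phon_strs_disc))

def num_of_char(phon_strs_disc):
    """PhonCnt: one DISC character per phoneme, so count characters."""
    return len(phon_disc(phon_strs_disc))

word = "\"Ek-s@-'trA"  # toy transcription, not a real CELEX entry
print(make_stress_pattern(word), phon_disc(word), count_syllables(word))  # 201 Eks@trA 3
```

The SAM, CELEX and CPA variants would additionally need a DISC-to-alphabet mapping table, which is not sketched here.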

dpw.cd

Index  DPW-fields in CELEX  DPW-fields in CD-file
1 IdNum Column 1
2 PhonSAM PhoneticTranscriptions(PhonStrsDISC) SP
3 PhonCLX PhoneticTranscriptions(PhonStrsDISC) CX
4 PhonCPA PhoneticTranscriptions(PhonStrsDISC) CP
5 PhonDISC PhoneticTranscriptions(PhonStrsDISC)
6 PhonCnt NumOfChar(PhonStrsDISC)
7 PhonSylSAM PhonSylTranscriptions(PhonStrsDISC) SP
8 PhonSylCLX PhonSylTranscriptions(PhonStrsDISC) CX
9 PhonSylBCLX Column 7
10 PhonSylCPA PhonSylTranscriptions(PhonStrsDISC) CP
11 PhonSylDISC PhonSylTranscriptions(PhonStrsDISC)
12 SylCnt CountSyllables(PhonStrsDISC)
13 PhonStrsSAM PhonStrsTranscriptions(PhonStrsDISC) SP
14 PhonStrsCLX PhonStrsTranscriptions(PhonStrsDISC) CX
15 PhonStrsCPA PhonStrsTranscriptions(PhonStrsDISC) CP
16 PhonStrsDISC Column 5
17 StrsPat MakeStressPattern(PhonStrsDISC)
18 PhonCV ConvertBrackets(PhonCVBr)
19 PhonCVBr Column 6

dsl.cd

The dsl.cd file contains the following fields:
  1. IdNum
  2. Head
  3. Inl
  4. ClassNum
  5. GendNum
  6. DeHetNum
  7. PropNum
  8. AuxNum
  9. SubClassVNum
  10. SubCatNum
  11. AdvNum
  12. CardOrdNum
  13. SubClassPNum

dvl2.cd

The dvl file contains alternative parsings for its complementation frames. For the meaning of the fields in this database, see the associated README file on the CD-ROM.

ect.cd

The ect.cd file contains the following fields:
  1. Type
  2. Freq
  3. FreqW
  4. FreqWB
  5. FreqWA
  6. FreqWU
  7. FreqS
  8. FreqSB
  9. FreqSU

efl.cd

The efl.cd file contains the following fields:
  1. IdNum
  2. Head
  3. Cob
  4. CobDev
  5. CobMln
  6. CobLog
  7. CobW
  8. CobWMln
  9. CobWLog
  10. CobS
  11. CobSMln
  12. CobSLog

efs.cd

The efs.cd file contains the following fields:
  1. Syllable
  2. SylPos
  3. SylInlMln
  4. SylTotInlMln

eml3.cd

Index  EML-fields in CELEX  EML-fields in CD-file
1 IdNum Column 1
2 Head Column 2
3 MorphStatus Column 4
4 Lang Column 5
5 MorphCnt Column 6
6 MorphNum obsolete
7 NVAffComp Column 7
8 Der Column 8
9 Comp Column 9
10 DerComp Column 10
11 Def Column 11
12 Imm Column 12
13 ImmClass ConvertVerbNumbersToV(ImmSubCat);
14 ImmSubCat Column 13
15 ImmSA Column 14
16 ImmAllo Column 15
17 ImmSubst Column 16
18 ImmOpac Column 17
19 TransDer Column 18
20 ImmInfix Column 19
21 ImmRevers Column 20
22 Flat StripStructureMarkers(StrucLab);
23 FlatClass ExtractWordClass(StrucLab);
24 FlatSA Column 21
25 Struc StripClassLabels(StrucLab);
26 StrucLab Column 22
27 StrucBrackLab StripOrthographicInformation(StrucLab);
28 StrucAllo Column 23
29 StrucSubst Column 24
30 StrucOpac Column 25
31 CompCnt CountMorpComponents(ImmSubCat);
32 MorCnt CountMorphemes(StrucLab);
33 LevelCnt CountLevels(StrucLab);

There are 19 fields which can have alternative parsings. These parsings are separated by a % sign. These nineteen fields are:

Six variants have been reintegrated into the database.

emw.cd

The emw.cd file contains the following fields:
  1. IdNum
  2. Word
  3. Cob
  4. IdNumLemma
  5. FlectType
  6. TransInfl
The awk directory contains the script type2fea.awk (TypeToInflectionalFeatures(String)), which reconstructs the 13 inflectional feature fields described in the CELEX User Guide:
  1. Sing
  2. Plu
  3. Pos
  4. Comp
  5. Sup
  6. Inf
  7. Part
  8. Pres
  9. Past
  10. Sin1
  11. Sin2
  12. Sin3
  13. Rare
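The sketch below shows the general shape of such a reconstruction: a FlectType string is expanded into the 13 Y/N feature fields listed above. The one-character codes in `CODE_TO_FEATURE` are hypothetical placeholders chosen for illustration; the authoritative mapping is the one in type2fea.awk.

```python
FEATURES = ["Sing", "Plu", "Pos", "Comp", "Sup", "Inf", "Part",
            "Pres", "Past", "Sin1", "Sin2", "Sin3", "Rare"]

# Hypothetical one-character codes; consult type2fea.awk for the real mapping.
CODE_TO_FEATURE = {
    "S": "Sing", "P": "Plu", "b": "Pos", "c": "Comp", "s": "Sup",
    "i": "Inf", "p": "Part", "e": "Pres", "a": "Past",
    "1": "Sin1", "2": "Sin2", "3": "Sin3", "r": "Rare",
}

def type_to_features(flect_type):
    """Expand a FlectType string into 13 Y/N flags, in the order listed above."""
    present = {CODE_TO_FEATURE[c] for c in flect_type if c in CODE_TO_FEATURE}
    return {name: ("Y" if name in present else "N") for name in FEATURES}

print(type_to_features("e3S")["Sin3"])  # Y
```

Keeping only FlectType in emw.cd and expanding it at run time avoids storing 13 mostly-"N" columns per row.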

eol2.cd

Index  EOL-fields in CELEX  EOL-fields in CD-file
1 IdNum Column 1
2 OrthoCnt Column 4
3 OrthoNum obsolete
4 OrthoStatus Column 5
5 CobSpellFreq Column 6
6 CobSpellDev Column 7
7 Head StripDiacritics(HeadDia);
8 HeadRev ReverseString(HeadDia);
9 HeadDia Column 2
10 HeadLow ToLower(HeadDia);
11 HeadLowSort SortString(HeadDia);
12 HeadCnt CountCharacters(HeadDia);
13 HeadSyl StripDiacritics(HeadSylDia);
14 HeadSylDia Column 8
15 HeadSylCnt CountSyllables(HeadSylDia);

There are 4 fields which can have alternative parsings. These parsings are separated by a % sign. These four fields are:

eow2.cd

Index  EOW-fields in CELEX  EOW-fields in CD-file
1 IdNum Column 1
2 OrthoCnt Column 5
3 OrthoNum obsolete
4 OrthoStatus Column 6
5 CobSpellFreq Column 7
6 CobSpellDev Column 8
7 Word StripDiacritics(WordDia);
8 WordRev ReverseString(WordDia);
9 WordDia Column 2
10 WordLow ToLower(WordDia);
11 WordLowSort SortString(WordDia);
12 WordCnt CountCharacters(WordDia);
13 WordSyl StripDiacritics(WordSylDia);
14 WordSylDia Column 9
15 WordSylCnt CountSyllables(WordSylDia);
16 IdNumLemma Column 4

epl3.cd

Index  EPL-fields in CELEX  EPL-fields in CD-file
1 IdNum Column 1
2 PronCnt Column 4
3 PronNum obsolete
4 PronStatus Column 5
5 PhonSAM PhoneticTranscriptions(PhonStrsDISC) SP
6 PhonCLX PhoneticTranscriptions(PhonStrsDISC) CX
7 PhonCPA PhoneticTranscriptions(PhonStrsDISC) CP
8 PhonDISC PhoneticTranscriptions(PhonStrsDISC)
9 PhonCnt NumOfChar(PhonStrsDISC)
10 PhonSylSAM PhonSylTranscriptions(PhonStrsDISC) SP
11 PhonSylCLX PhonSylTranscriptions(PhonStrsDISC) CX
12 PhonSylBCLX Column 8
13 PhonSylCPA PhonSylTranscriptions(PhonStrsDISC) CP
14 PhonSylDISC PhonSylTranscriptions(PhonStrsDISC)
15 SylCnt CountSyllables(PhonStrsDISC)
16 PhonStrsSAM PhonStrsTranscriptions(PhonStrsDISC) SP
17 PhonStrsCLX PhonStrsTranscriptions(PhonStrsDISC) CX
18 PhonStrsCPA PhonStrsTranscriptions(PhonStrsDISC) CP
19 PhonStrsDISC Column 6
20 StrsPat MakeStressPattern(PhonStrsDISC)
21 PhonCV ConvertBrackets(PhonCVBr)
22 PhonCVBr Column 7

There are 4 fields which can have alternative parsings. These parsings are separated by a % sign. These four fields are:

Sixteen variants have been reintegrated into the database.

epw3.cd

Index  EPW-fields in CELEX  EPW-fields in CD-file
1 IdNum Column 1
2 PronCnt Column 5
3 PronNum obsolete
4 PronStatus Column 6
5 PhonSAM PhoneticTranscriptions(PhonStrsDISC) SP
6 PhonCLX PhoneticTranscriptions(PhonStrsDISC) CX
7 PhonCPA PhoneticTranscriptions(PhonStrsDISC) CP
8 PhonDISC PhoneticTranscriptions(PhonStrsDISC)
9 PhonCnt NumOfChar(PhonStrsDISC)
10 PhonSylSAM PhonSylTranscriptions(PhonStrsDISC) SP
11 PhonSylCLX PhonSylTranscriptions(PhonStrsDISC) CX
12 PhonSylBCLX Column 9
13 PhonSylCPA PhonSylTranscriptions(PhonStrsDISC) CP
14 PhonSylDISC PhonSylTranscriptions(PhonStrsDISC)
15 SylCnt CountSyllables(PhonStrsDISC)
16 PhonStrsSAM PhonStrsTranscriptions(PhonStrsDISC) SP
17 PhonStrsCLX PhonStrsTranscriptions(PhonStrsDISC) CX
18 PhonStrsCPA PhonStrsTranscriptions(PhonStrsDISC) CP
19 PhonStrsDISC Column 7
20 StrsPat MakeStressPattern(PhonStrsDISC)
21 PhonCV ConvertBrackets(PhonCVBr)
22 PhonCVBr Column 8

There are 4 fields which can have alternative parsings. These parsings are separated by a % sign. These four fields are:

Forty-eight variants have been reintegrated into the database.

esl.cd

The esl.cd file contains the following fields:
  1. IdNum
  2. Head
  3. Cob
  4. ClassNum
  5. C_N
  6. Unc_N
  7. Sing_N
  8. Plu_N
  9. GrC_N
  10. GrUnc_N
  11. Attr_N
  12. PostPos_N
  13. Voc_N
  14. Proper_N
  15. Exp_N
  16. Trans_V
  17. TransComp_V
  18. Intrans_V
  19. Ditrans_V
  20. Link_V
  21. Phr_V
  22. Prep_V
  23. PhrPrep_V
  24. Exp_V
  25. Ord_A
  26. Attr_A
  27. Pred_A
  28. PostPos_A
  29. Exp_A
  30. Ord_ADV
  31. Pred_ADV
  32. PostPos_ADV
  33. Comb_ADV
  34. Exp_ADV
  35. Card_NUM
  36. Ord_NUM
  37. Exp_NUM
  38. Pers_PRON
  39. Dem_PRON
  40. Poss_PRON
  41. Refl_PRON
  42. Wh_PRON
  43. Det_PRON
  44. Pron_PRON
  45. Exp_PRON
  46. Cor_C
  47. Sub_C

gct.cd

The gct.cd file contains the following fields:
  1. Type
  2. Freq
  3. Disp
  4. FreqW
  5. DispW
  6. FreqS
  7. DispS

gfl.cd

The gfl.cd file contains the following fields:
  1. IdNum
  2. Head
  3. Mann
  4. MannDev
  5. MannMln
  6. MannLog
  7. MannW
  8. MannWMln
  9. MannWLog
  10. MannS
  11. MannSMln
  12. MannSLog

gfs.cd

The gfs.cd file contains the following fields:
  1. Syllable
  2. SylPos
  3. SylInlMln
  4. SylTotInlMln

gfw.cd

The gfw.cd file contains the following fields:
  1. IdNum
  2. Word
  3. IdNumLemma
  4. Mann
  5. MannDev
  6. MannMln
  7. MannLog
  8. MannW
  9. MannWMln
  10. MannWLog
  11. MannS
  12. MannSMln
  13. MannSLog

gml2.cd

Index  GML-fields in CELEX  GML-fields in CD-file
1 IdNum Column 1
2 MorphStatus Column 4
3 MorphCnt Column 5
4 MorphNum obsolete
5 DerComp Column 6
6 Comp Column 7
7 Def Column 8
8 Imm Column 9
9 ImmClass Column 10
10 ImmSA ConvertImmWordClassToSAPattern(ImmClass);
11 ImmAllo Column 11
12 ImmOpac Column 12
13 ImmUml Column 13
14 Flat StripStructureMarkers(StrucLab);
15 FlatClass ExtractWordClass(StrucLab);
16 FlatSA ConvertFlatWordClassToSAPattern(StrucLab);
17 Struc StripClassLabels(StrucLab);
18 StrucLab Column 14
19 StrucBracLab StripOrthographicInformation(StrucLab);
20 StrucAllo Column 15
21 StrucOpac Column 16
22 StrucUml Column 17
23 CompCnt CountMorpComponents(ImmClass);
24 MorCnt CountMorphemes(StrucLab);
25 LevelCnt CountLevels(StrucLab);
26 Sepa Column 18
27 InflPar Column 19
28 InflVar Column 20

There are 15 fields which can have alternative parsings. These parsings are separated by a % sign. These fifteen fields are:

gmw.cd

The gmw.cd file contains the following fields:
  1. IdNum
  2. Word
  3. Mann
  4. IdNumLemma
  5. FlectType
The awk directory contains the script type2fea.awk (TypeToInflectionalFeatures(String)), which reconstructs the 29 inflectional feature fields described in the CELEX User Guide:
  1. Sepa
  2. Sing
  3. Plu
  4. Nom
  5. Gen
  6. Dat
  7. Acc
  8. Pos
  9. Comp
  10. Sup
  11. Inf
  12. ZuInf
  13. Part
  14. Pres
  15. Past
  16. Sin1
  17. Sin2
  18. Sin3
  19. Plu13
  20. Plu2
  21. Ind
  22. Sub
  23. Imp
  24. Suff_e
  25. Suff_en
  26. Suff_er
  27. Suff_em
  28. Suff_es
  29. Suff_s

gol.cd

Index  GOL-fields in CELEX  GOL-fields in CD-file
1 IdNum Column 1
2 Head StripDiacritics(HeadDia);
3 HeadRev ReverseString(HeadDia);
4 HeadDia Column 2
5 HeadLow ToLower(HeadDia);
6 HeadLowSort SortString(HeadDia);
7 HeadLowSortDia obsolete (not derivable, as diacritics appear separately)
8 HeadCnt CountCharacters(HeadDia);
9 HeadSyl StripDiacritics(HeadSylDia);
10 HeadSylDia Column 4
11 HeadSylChg Column 5
12 HeadSylCnt CountSyllables(HeadSylDia);
13 Stem StripDiacritics(StemDia);
14 StemRev ReverseString(StemDia);
15 StemDia Column 6
16 StemCnt CountCharacters(StemDia);
17 StemSyl StripDiacritics(StemSylDia);
18 StemSylDia Column 7
19 StemSylChg Column 8
20 StemSylCnt CountSyllables(StemSylDia);

gow.cd

Index  GOW-fields in CELEX  GOW-fields in CD-file
1 IdNum Column 1
2 Word StripDiacritics(WordDia);
3 WordRev ReverseString(WordDia);
4 WordDia Column 2
5 WordLow ToLower(WordDia);
6 WordLowSort SortString(WordDia);
7 WordLowSortDia obsolete (not derivable, as diacritics appear separately)
8 WordCnt CountCharacters(WordDia);
9 WordSyl StripDiacritics(WordSylDia);
10 WordSylDia Column 5
11 WordSylChg Column 6
12 WordSylCnt CountSyllables(WordSylDia);
13 IdNumLemma Column 4

gpl.cd

Index  GPL-fields in CELEX  GPL-fields in CD-file
1 IdNum Column 1
2 PhonSAM PhoneticTranscriptions(PhonStrsDISC) SP
3 PhonCLX PhoneticTranscriptions(PhonStrsDISC) CX
4 PhonCPA PhoneticTranscriptions(PhonStrsDISC) CP
5 PhonDISC PhoneticTranscriptions(PhonStrsDISC)
6 PhonCnt NumOfChar(PhonStrsDISC)
7 PhonSylSAM PhonSylTranscriptions(PhonStrsDISC) SP
8 PhonSylCLX PhonSylTranscriptions(PhonStrsDISC) CX
9 PhonSylBCLX Column 5
10 PhonSylCPA PhonSylTranscriptions(PhonStrsDISC) CP
11 PhonSylDISC PhonSylTranscriptions(PhonStrsDISC)
12 SylCnt CountSyllables(PhonStrsDISC)
13 PhonStrsSAM PhonStrsTranscriptions(PhonStrsDISC) SP
14 PhonStrsCLX PhonStrsTranscriptions(PhonStrsDISC) CX
15 PhonStrsCPA PhonStrsTranscriptions(PhonStrsDISC) CP
16 PhonStrsDISC Column 4
17 StrsPat MakeStressPattern(PhonStrsDISC)
18 PhonStSAM PhoneticTranscriptions(PhonStrsStDISC) SP
19 PhonStCLX PhoneticTranscriptions(PhonStrsStDISC) CX
20 PhonStCPA PhoneticTranscriptions(PhonStrsStDISC) CP
21 PhonStDISC PhoneticTranscriptions(PhonStrsStDISC)
22 PhonStCnt NumOfChar(PhonStrsStDISC)
23 PhonSylStSAM PhonSylTranscriptions(PhonStrsStDISC) SP
24 PhonSylStCLX PhonSylTranscriptions(PhonStrsStDISC) CX
25 PhonSylStBCLX Column 7
26 PhonSylStCPA PhonSylTranscriptions(PhonStrsStDISC) CP
27 PhonSylStDISC PhonSylTranscriptions(PhonStrsStDISC)
28 StSylCnt CountSyllables(PhonStrsStDISC)
29 PhonStrsStSAM PhonStrsTranscriptions(PhonStrsStDISC) SP
30 PhonStrsStCLX PhonStrsTranscriptions(PhonStrsStDISC) CX
31 PhonStrsStCPA PhonStrsTranscriptions(PhonStrsStDISC) CP
32 PhonStrsStDISC Column 6
33 StStrsPat MakeStressPattern(PhonStrsStDISC)
34 PhonCV ConvertBrackets(PhonCVBr)
35 PhonCVBr Column 8
36 PhonStCV ConvertBrackets(PhonStCVBr)
37 PhonStCVBr Column 9
38 PhonolSAM Column 10
39 PhonolCLX Column 11

gpw.cd

Index  GPW-fields in CELEX  GPW-fields in CD-file
1 IdNum Column 1
2 PhonSAM PhoneticTranscriptions(PhonStrsDISC) SP
3 PhonCLX PhoneticTranscriptions(PhonStrsDISC) CX
4 PhonCPA PhoneticTranscriptions(PhonStrsDISC) CP
5 PhonDISC PhoneticTranscriptions(PhonStrsDISC)
6 PhonCnt NumOfChar(PhonStrsDISC)
7 PhonSylSAM PhonSylTranscriptions(PhonStrsDISC) SP
8 PhonSylCLX PhonSylTranscriptions(PhonStrsDISC) CX
9 PhonSylBCLX Column 6
10 PhonSylCPA PhonSylTranscriptions(PhonStrsDISC) CP
11 PhonSylDISC PhonSylTranscriptions(PhonStrsDISC)
12 SylCnt CountSyllables(PhonStrsDISC)
13 PhonStrsSAM PhonStrsTranscriptions(PhonStrsDISC) SP
14 PhonStrsCLX PhonStrsTranscriptions(PhonStrsDISC) CX
15 PhonStrsCPA PhonStrsTranscriptions(PhonStrsDISC) CP
16 PhonStrsDISC Column 5
17 StrsPat MakeStressPattern(PhonStrsDISC)
18 PhonCV ConvertBrackets(PhonCVBr)
19 PhonCVBr Column 7

gsl.cd

The gsl.cd file contains the following fields:
  1. IdNum
  2. Head
  3. Mann
  4. ClassNum
  5. GendNum
  6. PropNum
  7. SingTant
  8. PlurTant
  9. AuxNum
  10. SubClassVNum
  11. CompComp
  12. CompEsSubj
  13. CompSubj
  14. CompAcc
  15. CompSecAcc
  16. CompDat
  17. CompGen
  18. CompPrep
  19. CompSecPrep
  20. CompAdv
  21. Grad
  22. CardOrdNum
  23. SubClassPNum
  24. Case

udb.cd

The udb.cd file contains the following fields:
  1. Word1
  2. Code1
  3. Word2
  4. Code2
  5. FreqW
  6. FreqS