data/README.md
hello!
here is the data that gets compressed and compiled into the word models that compromise uses to understand text.
there are some things to note:
after making a change, run npm run pack to see it take effect.
lexicon words are lowercased and compressed with efrt, so some characters are reserved -[0-9,;!:|¦]
be careful adding ambiguous words - 'ray' should not be a #Person - it's a better fit for ./switches/person-date.js
many word-lists have conjugations automatically applied to them - #Singular words are pluralized, etc.
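
as a rough sketch of the lowercasing and reserved-character notes above, here is one way to sanity-check a word before adding it to a word-list. checkWord is a hypothetical helper, not part of compromise or efrt. it also assumes the leading '-' in the reserved-characters note is a dash and not a reserved character itself.

```javascript
// hypothetical helper - not part of compromise or efrt
// reserved characters copied from the note above: digits and , ; ! : | ¦
// (assumption: the leading '-' in that note is a dash, not a reserved character)
const reserved = /[0-9,;!:|¦]/

// trim and lowercase a word, then report the first reserved character, if any
function checkWord(word) {
  const clean = word.trim().toLowerCase()
  const bad = clean.match(reserved)
  return { word: clean, ok: bad === null, found: bad ? bad[0] : null }
}

console.log(checkWord('Toronto')) // { word: 'toronto', ok: true, found: null }
console.log(checkWord('R2D2')) // { word: 'r2d2', ok: false, found: '2' }
```

a check like this only catches reserved characters. it can't tell you whether a word is too ambiguous for its tag.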
the lexicon output data can be found in ./src/2-two/preTagger/model/lexicon/_data.js
and the word-conjugation data can be found in ./src/2-two/preTagger/model/models/_data.js
for more information, see the compromise-lexicon docs.