examples/README.md
For spaCy v3 we've converted many of the v2 example scripts into end-to-end spaCy projects workflows. The workflows include all the steps to go from data to packaged spaCy models.
The simplest demos for training a single pipeline component are in the pipelines category, including:
pipelines/ner_demo: Train a named entity recognizer
pipelines/textcat_demo: Train a text classifier
pipelines/parser_intent_demo: Train a dependency parser for custom semantics
The tutorials category includes examples that work through specific NLP use cases end-to-end:
tutorials/textcat_goemotions: Train a text classifier to categorize emotions in Reddit posts
tutorials/nel_emerson: Use an entity linker to disambiguate mentions of the same name
Check out the projects documentation and browse through the available projects!
The pipelines/ner_demo project converts the spaCy v2 train_ner.py demo script into a spaCy v3 project.
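For reference, the v2 train_ner.py script kept its training data as (text, {"entities": [(start, end, label)]}) tuples, where start and end are character offsets. If you are bringing your own data in that style, it can help to check that every offset pair really slices out the intended entity text before conversion, since an entity whose offsets don't match the text can't be converted correctly. The snippet below is a minimal stdlib sketch of such a check under that assumption: `entity_texts` is a hypothetical helper that is not part of the project, the sentences are illustrative samples, and the JSON layout that the project's scripts/convert.py reads from assets/ may differ from these tuples.

```python
from typing import Dict, List, Tuple

Entity = Tuple[int, int, str]  # (start_char, end_char, label)

# Illustrative sentences in the v2 (text, {"entities": [...]}) layout.
TRAIN_DATA: List[Tuple[str, Dict[str, List[Entity]]]] = [
    ("Who is Shaka Khan?", {"entities": [(7, 17, "PERSON")]}),
    ("I like London and Berlin.", {"entities": [(7, 13, "LOC"), (18, 24, "LOC")]}),
]


def entity_texts(text: str, entities: List[Entity]) -> List[Tuple[str, str]]:
    """Hypothetical helper: return (span_text, label) pairs, rejecting bad offsets."""
    spans = []
    last_end = 0
    # Sorting by start offset lets us detect overlaps with a single pass.
    for start, end, label in sorted(entities):
        if not 0 <= start < end <= len(text):
            raise ValueError(f"offsets {start}:{end} out of range for {text!r}")
        if start < last_end:
            raise ValueError(f"span {start}:{end} overlaps an earlier entity")
        span = text[start:end]
        # Offsets that include surrounding spaces usually point one char off.
        if span != span.strip():
            raise ValueError(f"span {span!r} has leading or trailing whitespace")
        spans.append((span, label))
        last_end = end
    return spans


for text, annotations in TRAIN_DATA:
    print(entity_texts(text, annotations["entities"]))
# [('Shaka Khan', 'PERSON')]
# [('London', 'LOC'), ('Berlin', 'LOC')]
```

Rejecting overlaps mirrors the fact that a token can belong to at most one named entity.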
Clone the project:
python -m spacy project clone pipelines/ner_demo
Install requirements and download any data assets:
cd ner_demo
python -m pip install -r requirements.txt
python -m spacy project assets
Run the default workflow to convert, train and evaluate:
python -m spacy project run all
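If you'd rather drive these steps from Python (for example in a CI job), one option is to run the same commands with subprocess. This is a minimal sketch, assuming (as the `cd ner_demo` step above implies) that the clone lands in a directory named after the last part of the project path; `project_steps` and `run_steps` are hypothetical helper names, not part of spaCy.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional, Tuple

# Each step is (argv, working directory); None means the current directory.
Step = Tuple[List[str], Optional[str]]


def project_steps(project: str = "pipelines/ner_demo") -> List[Step]:
    """Return the README's clone/install/assets/run steps as argv lists."""
    # Assumption: the clone directory is the last path component ("ner_demo").
    project_dir = Path(project).name
    # Use the current interpreter so the commands hit the same virtualenv.
    py = [sys.executable, "-m"]
    return [
        (py + ["spacy", "project", "clone", project], None),
        (py + ["pip", "install", "-r", "requirements.txt"], project_dir),
        (py + ["spacy", "project", "assets"], project_dir),
        (py + ["spacy", "project", "run", "all"], project_dir),
    ]


def run_steps(steps: List[Step]) -> None:
    """Run each step in order, stopping at the first command that fails."""
    for argv, cwd in steps:
        print("$", " ".join(argv))
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling `run_steps(project_steps())` executes the clone, install, asset download and default workflow in that order. Keeping the command list separate from the runner makes it easy to print or inspect the steps without running anything.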
Sample output:
ℹ Running workflow 'all'
================================== convert ==================================
Running command: /home/user/venv/bin/python scripts/convert.py en assets/train.json corpus/train.spacy
Running command: /home/user/venv/bin/python scripts/convert.py en assets/dev.json corpus/dev.spacy
=============================== create-config ===============================
Running command: /home/user/venv/bin/python -m spacy init config --lang en --pipeline ner configs/config.cfg --force
ℹ Generated config template specific for your use case
- Language: en
- Pipeline: ner
- Optimize for: efficiency
- Hardware: CPU
- Transformer: None
✔ Auto-filled config with all values
✔ Saved config
configs/config.cfg
You can now add your data and train your pipeline:
python -m spacy train config.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy
=================================== train ===================================
Running command: /home/user/venv/bin/python -m spacy train configs/config.cfg --output training/ --paths.train corpus/train.spacy --paths.dev corpus/dev.spacy --training.eval_frequency 10 --training.max_steps 100 --gpu-id -1
ℹ Using CPU
=========================== Initializing pipeline ===========================
[2021-03-11 19:34:59,101] [INFO] Set up nlp object from config
[2021-03-11 19:34:59,109] [INFO] Pipeline: ['tok2vec', 'ner']
[2021-03-11 19:34:59,113] [INFO] Created vocabulary
[2021-03-11 19:34:59,113] [INFO] Finished initializing nlp object
[2021-03-11 19:34:59,265] [INFO] Initialized pipeline components: ['tok2vec', 'ner']
✔ Initialized pipeline
============================= Training pipeline =============================
ℹ Pipeline: ['tok2vec', 'ner']
ℹ Initial learn rate: 0.001
E    #       LOSS TOK2VEC  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE
---  ------  ------------  --------  ------  ------  ------  ------
  0       0          0.00      7.90    0.00    0.00    0.00    0.00
 10      10          0.11     71.07    0.00    0.00    0.00    0.00
 20      20          0.65     22.44   50.00   50.00   50.00    0.50
 30      30          0.22      6.38   80.00   66.67  100.00    0.80
 40      40          0.00      0.00   80.00   66.67  100.00    0.80
 50      50          0.00      0.00   80.00   66.67  100.00    0.80
 60      60          0.00      0.00  100.00  100.00  100.00    1.00
 70      70          0.00      0.00  100.00  100.00  100.00    1.00
 80      80          0.00      0.00  100.00  100.00  100.00    1.00
 90      90          0.00      0.00  100.00  100.00  100.00    1.00
100     100          0.00      0.00  100.00  100.00  100.00    1.00
✔ Saved pipeline to output directory
training/model-last
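The training table prints one row per evaluation (every 10 steps here, per --training.eval_frequency 10). ENTS_F is the harmonic mean of ENTS_P and ENTS_R, e.g. 2 × 66.67 × 100 / (66.67 + 100) ≈ 80.00 at step 30, and in this run SCORE equals ENTS_F / 100. The sketch below parses a few rows copied from the sample output and checks both relationships. `parse_rows` and `f_score` are hypothetical helpers, and the column layout is assumed from the sample output above rather than from a documented log format.

```python
from typing import Dict, List

# Column order as printed in the sample output (E, #, LOSS TOK2VEC, ...).
COLUMNS = ["epoch", "step", "loss_tok2vec", "loss_ner",
           "ents_f", "ents_p", "ents_r", "score"]

# Rows copied from the sample training output.
LOG = """\
E    #       LOSS TOK2VEC  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE
---  ------  ------------  --------  ------  ------  ------  ------
 20      20          0.65     22.44   50.00   50.00   50.00    0.50
 30      30          0.22      6.38   80.00   66.67  100.00    0.80
 60      60          0.00      0.00  100.00  100.00  100.00    1.00
"""


def parse_rows(log: str) -> List[Dict[str, float]]:
    """Keep only lines made of exactly eight numeric fields."""
    rows = []
    for line in log.splitlines():
        parts = line.split()
        if len(parts) != len(COLUMNS):
            continue  # header (10 words) or unrelated log line
        try:
            values = [float(p) for p in parts]
        except ValueError:
            continue  # the "---" separator line
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def f_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall, on the log's 0-100 scale."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


rows = parse_rows(LOG)
best = max(rows, key=lambda row: row["score"])
print(int(best["step"]), best["score"])  # 60 1.0
for row in rows:
    # ENTS_F matches the harmonic mean of ENTS_P and ENTS_R up to rounding.
    assert abs(f_score(row["ents_p"], row["ents_r"]) - row["ents_f"]) < 0.01
    # In this run, SCORE is ENTS_F rescaled to 0-1.
    assert abs(row["score"] * 100 - row["ents_f"]) < 1e-6
```

In this run SCORE first reaches 1.00 at step 60 and stays there for the remaining evaluations.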
Package the model:
python -m spacy project run package
Visualize the model's output with Streamlit:
python -m spacy project run visualize-model