extra/example_data/textcat_example_data/README.md
spaCy JSON training files were generated from JSONL with:

    python textcatjsonl_to_trainjson.py -m en file.jsonl .

`cooking.json` is an example with mutually exclusive classes and two labels:

* `baking`
* `not_baking`

`jigsaw-toxic-comment.json` is an example with multiple labels per instance:

* `insult`
* `obscene`
* `severe_toxic`
* `toxic`

Data sources:

* `cooking.jsonl`: https://cooking.stackexchange.com. The meta IDs link to the
  original question as `https://cooking.stackexchange.com/questions/ID`, e.g.,
  `https://cooking.stackexchange.com/questions/2` for the first instance.
* `jigsaw-toxic-comment.jsonl`: Jigsaw Toxic Comments Classification
  Challenge

Data licenses:

* `cooking.jsonl`: CC BY-SA 4.0 (`CC_BY-SA-4.0.txt`)
* `jigsaw-toxic-comment.jsonl`:
  * CC BY-SA 3.0 (`CC_BY-SA-3.0.txt`)
  * CC0 (`CC0.txt`)
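
The difference between the two label schemes, and the way a cooking meta ID maps to its question URL, can be sketched in a few lines of Python. This is a rough illustration only: the shape of the label dictionaries (label name mapped to a score), the 0.5 threshold, and the helper names `is_mutually_exclusive` and `question_url` are assumptions made for this example, not something this README or the conversion script specifies.

```python
# Hypothetical label dictionaries: label name -> score.
# The actual field layout of the JSONL files is not described here.
cooking_cats = {"baking": 1.0, "not_baking": 0.0}
jigsaw_cats = {"insult": 1.0, "obscene": 1.0, "severe_toxic": 0.0, "toxic": 1.0}


def is_mutually_exclusive(cats, threshold=0.5):
    """Return True if exactly one label scores above the (assumed) threshold."""
    positives = [label for label, score in cats.items() if score > threshold]
    return len(positives) == 1


def question_url(meta_id):
    """Build the original question URL from a cooking.jsonl meta ID."""
    return f"https://cooking.stackexchange.com/questions/{meta_id}"


print(is_mutually_exclusive(cooking_cats))  # one positive label
print(is_mutually_exclusive(jigsaw_cats))   # several positive labels
print(question_url(2))
```

A check like this only inspects a single instance. Whether a whole dataset is mutually exclusive depends on every instance having exactly one positive label.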