docs/datasets/vertical_and_multilingual_datasets.en.md
Here we have collected commonly used vertical and multilingual OCR datasets. The list is updated continuously, and contributions of new datasets are welcome.
Data source: CCPD
Data introduction: It contains more than 250,000 vehicle license plate images, annotated for license plate detection and recognition. The images are grouped into the following subsets, each covering a different scene:
CCPD-Base: Ordinary license plate images
CCPD-DB: The license plate area is bright, dark, or unevenly lit
CCPD-FN: The license plate is relatively far from or close to the camera
CCPD-Rotate: The license plate is rotated (20~50 degrees horizontally, -10~10 degrees vertically)
CCPD-Tilt: The license plate is rotated (15~45 degrees horizontally, 15~45 degrees vertically)
CCPD-Blur: The license plate is blurred because of camera shake
CCPD-Weather: The license plate is photographed on rainy, snowy, or foggy days
CCPD-Challenge: Some of the most challenging images to date for license plate detection and recognition
CCPD-NP: Pictures of new cars without license plates.
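The rotation ranges quoted for CCPD-Rotate and CCPD-Tilt can be written down as a small lookup, which may help when sorting or filtering images by plate pose. This is only an illustrative sketch: the names `ROTATION_RANGES` and `subset_for_angles` are hypothetical, and treating the ranges as inclusive bounds is an assumption that the descriptions above do not confirm.

```python
from typing import Optional

# (horizontal_min, horizontal_max, vertical_min, vertical_max) in degrees,
# copied from the subset descriptions above. Inclusive bounds are assumed.
ROTATION_RANGES = {
    "CCPD-Rotate": (20, 50, -10, 10),
    "CCPD-Tilt": (15, 45, 15, 45),
}


def subset_for_angles(horizontal: float, vertical: float) -> Optional[str]:
    """Return the subset whose rotation range contains the given angles."""
    for name, (h_lo, h_hi, v_lo, v_hi) in ROTATION_RANGES.items():
        if h_lo <= horizontal <= h_hi and v_lo <= vertical <= v_hi:
            return name
    # The angles fall outside both documented ranges.
    return None
```

The two vertical ranges do not overlap, so a pair of angles matches at most one of the two subsets.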
Download address
Data source: source
Data introduction: There are three types of training data:
1. Sample card data from China Merchants Bank: card images and annotation data, 618 images in total.
2. Single-character data: images and annotation data, 37 images in total.
3. Cards from other banks: no further details are provided, 50 images in total.
A demo image is shown below. The annotations are stored in an Excel file, and the demo image is annotated as follows:
First 8 digits of the card number: 62257583
Card type: card of our bank
End of validity: 07/41
Cardholder name in pinyin: MICHAEL
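As a rough illustration of how one annotation record could be represented after it is read from the Excel file, the sketch below stores the four fields listed above and runs basic format checks. The class name, field names, and validation rules are hypothetical; the actual column layout of the Excel file is not described here.

```python
import re
from dataclasses import dataclass


@dataclass
class CardAnnotation:
    """One annotation record; field names are illustrative, not the real columns."""

    card_prefix: str  # first 8 digits of the card number
    card_type: str
    valid_thru: str   # end of validity, assumed to be MM/YY
    holder_name: str  # cardholder name as printed on the card

    def validate(self) -> None:
        """Raise ValueError if a field does not match the assumed format."""
        if not re.fullmatch(r"\d{8}", self.card_prefix):
            raise ValueError(f"expected 8 digits, got {self.card_prefix!r}")
        if not re.fullmatch(r"(0[1-9]|1[0-2])/\d{2}", self.valid_thru):
            raise ValueError(f"expected MM/YY, got {self.valid_thru!r}")


# The demo record quoted above.
demo = CardAnnotation("62257583", "card of our bank", "07/41", "MICHAEL")
demo.validate()
```

Checks like these can catch malformed rows before the annotations are used as recognition labels.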
Download address: cmb2017-2.zip
Data source: captcha
Data introduction: This is a data synthesis toolkit that generates captcha images from input text. Several demo images generated with the toolkit are shown below.
Download address: The dataset is generated on demand, so there is no download address.