
# Configuration

## 1. Optional Parameter List

The following options can be viewed by running a PaddleOCR script with `--help`:

| FLAG | Supported script | Use | Defaults | Note |
| ---- | ---------------- | --- | -------- | ---- |
| -c | ALL | Specify the configuration file to use | None | Please refer to the parameter introduction for configuration file usage |
| -o | ALL | Set configuration options | None | Options set with -o have higher priority than the configuration file selected with -c. E.g.: `-o Global.use_gpu=false` |
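
For example, training can be launched with a config file while a single option is overridden on the command line. The script and config paths below assume the standard PaddleOCR v2.x repository layout:

```bash
# Train with a config file; -o overrides Global.use_gpu from the command line
python3 tools/train.py \
    -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml \
    -o Global.use_gpu=false
```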

## 2. Introduction to Global Parameters of Configuration File

Take `rec_chinese_lite_train_v2.0.yml` as an example.

### Global

| Parameter | Use | Defaults | Note |
| --------- | --- | -------- | ---- |
| use_gpu | Set whether to use GPU | true | |
| epoch_num | Maximum training epoch number | 500 | |
| log_smooth_window | Log queue length; the median value in the queue is printed each time | 20 | |
| print_batch_step | Set print log interval | 10 | |
| save_model_dir | Set model save path | output/{algorithm_name} | |
| save_epoch_step | Set model save interval | 3 | |
| eval_batch_step | Set the model evaluation interval | 2000 or [1000, 2000] | 2000 means evaluation runs every 2000 iterations; [1000, 2000] means evaluation runs every 2000 iterations after the 1000th iteration |
| cal_metric_during_train | Set whether to evaluate the metric during training; the metric is computed on the current batch | true | |
| load_static_weights | Set whether the pre-trained model is saved in static graph mode (currently only required by the detection algorithm) | true | |
| pretrained_model | Set the path of the pre-trained model | ./pretrain_models/CRNN/best_accuracy | |
| checkpoints | Set model parameter path | None | Used to load parameters to resume training after an interruption |
| use_visualdl | Set whether to enable VisualDL for visual log display | False | Tutorial |
| use_wandb | Set whether to enable W&B for visual log display | False | Documentation |
| infer_img | Set inference image path or folder path | ./infer_img | |
| character_dict_path | Set dictionary path | ./ppocr/utils/ppocr_keys_v1.txt | If character_dict_path is None, the model can only recognize numbers and lowercase letters |
| max_text_length | Set the maximum length of text | 25 | |
| use_space_char | Set whether to recognize spaces | True | |
| label_list | Set the angles supported by the direction classifier | ['0','180'] | Only valid for the angle classifier model |
| save_res_path | Set the save path of the test model results | ./output/det_db/predicts_db.txt | Only valid for the text detection model |
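
Assembled from the parameters above, the `Global` block of a recognition config might look like the following sketch (values mirror the listed defaults; the shipped `rec_chinese_lite_train_v2.0.yml` may differ slightly, and the save path is a placeholder):

```yaml
Global:
  use_gpu: true
  epoch_num: 500
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec_chinese_lite_v2.0   # placeholder path
  save_epoch_step: 3
  # evaluate every 2000 iterations, starting after the 1000th iteration
  eval_batch_step: [1000, 2000]
  cal_metric_during_train: true
  pretrained_model: ./pretrain_models/CRNN/best_accuracy
  checkpoints:
  character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt
  max_text_length: 25
  use_space_char: true
```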

### Optimizer (ppocr/optimizer)

| Parameter | Use | Defaults | Note |
| --------- | --- | -------- | ---- |
| name | Optimizer class name | Adam | Currently supports Momentum, Adam, RMSProp; see ppocr/optimizer/optimizer.py |
| beta1 | Set the exponential decay rate for the 1st moment estimates | 0.9 | |
| beta2 | Set the exponential decay rate for the 2nd moment estimates | 0.999 | |
| clip_norm | The maximum norm value | - | |
| lr | Set the learning rate decay method | - | |
| name | Learning rate decay class name | Cosine | Currently supports Linear, Cosine, Step, Piecewise; see ppocr/optimizer/learning_rate.py |
| learning_rate | Set the base learning rate | 0.001 | |
| regularizer | Set network regularization method | - | |
| name | Regularizer class name | L2 | Currently supports L1, L2; see ppocr/optimizer/regularizer.py |
| factor | Regularizer coefficient | 0.00001 | |
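
The second and third `name` entries above are nested under the `lr` and `regularizer` sub-blocks. A sketch with the listed defaults (exact values in the shipped config may differ):

```yaml
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
  regularizer:
    name: L2
    factor: 0.00001
```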

### Architecture (ppocr/modeling)

In PaddleOCR, the network is divided into four stages: Transform, Backbone, Neck and Head.

| Parameter | Use | Defaults | Note |
| --------- | --- | -------- | ---- |
| model_type | Network type | rec | Currently supports rec, det, cls |
| algorithm | Model name | CRNN | See algorithm_overview for the supported list |
| Transform | Set the transformation method | - | Currently only supported by recognition algorithms; see ppocr/modeling/transforms for details |
| name | Transformation class name | TPS | Currently supports TPS |
| num_fiducial | Number of TPS control points | 20 | Ten on the top and ten on the bottom |
| loc_lr | Localization network learning rate | 0.1 | |
| model_name | Localization network size | small | Currently supports small, large |
| Backbone | Set the network backbone class name | - | See ppocr/modeling/backbones |
| name | Backbone class name | ResNet | Currently supports MobileNetV3, ResNet |
| layers | ResNet layers | 34 | Currently supports 18, 34, 50, 101, 152, 200 |
| model_name | MobileNetV3 network size | small | Currently supports small, large |
| Neck | Set network neck | - | See ppocr/modeling/necks |
| name | Neck class name | SequenceEncoder | Currently supports SequenceEncoder, DBFPN |
| encoder_type | SequenceEncoder encoder type | rnn | Currently supports reshape, fc, rnn |
| hidden_size | Number of rnn internal units | 48 | |
| out_channels | Number of DBFPN output channels | 256 | |
| Head | Set the network head | - | See ppocr/modeling/heads |
| name | Head class name | CTCHead | Currently supports CTCHead, DBHead, ClsHead |
| fc_decay | CTCHead regularization coefficient | 0.0004 | |
| k | DBHead binarization coefficient | 50 | |
| class_dim | ClsHead output category number | 2 | |
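
For a CRNN recognition model, the four stages might be wired together as in this sketch, built from the defaults above (a real config may omit `Transform` entirely or use different backbone sizes):

```yaml
Architecture:
  model_type: rec
  algorithm: CRNN
  Transform:            # optional; recognition algorithms only
  Backbone:
    name: MobileNetV3
    model_name: small
  Neck:
    name: SequenceEncoder
    encoder_type: rnn
    hidden_size: 48
  Head:
    name: CTCHead
    fc_decay: 0.0004
```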

### Loss (ppocr/losses)

| Parameter | Use | Defaults | Note |
| --------- | --- | -------- | ---- |
| name | Loss class name | CTCLoss | Currently supports CTCLoss, DBLoss, ClsLoss |
| balance_loss | Whether to balance the number of positive and negative samples in DBLoss (using OHEM) | True | |
| ohem_ratio | The negative/positive sample ratio of OHEM in DBLoss | 3 | |
| main_loss_type | The loss used by shrink_map in DBLoss | DiceLoss | Currently supports DiceLoss, BCELoss |
| alpha | The coefficient of shrink_map_loss in DBLoss | 5 | |
| beta | The coefficient of threshold_map_loss in DBLoss | 10 | |
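
For a detection model using DBLoss, the block might look like this sketch with the defaults above (a recognition model typically needs only `name: CTCLoss`):

```yaml
Loss:
  name: DBLoss
  balance_loss: true
  main_loss_type: DiceLoss
  alpha: 5
  beta: 10
  ohem_ratio: 3
```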

### PostProcess (ppocr/postprocess)

| Parameter | Use | Defaults | Note |
| --------- | --- | -------- | ---- |
| name | Post-processing class name | CTCLabelDecode | Currently supports CTCLabelDecode, AttnLabelDecode, DBPostProcess, ClsPostProcess |
| thresh | Threshold for binarization of the segmentation map in DBPostProcess | 0.3 | |
| box_thresh | Threshold for filtering output boxes in DBPostProcess; boxes below this threshold are not output | 0.7 | |
| max_candidates | Maximum number of text boxes output by DBPostProcess | 1000 | |
| unclip_ratio | Unclip ratio of the text box in DBPostProcess | 2.0 | |
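
Since all four tunable thresholds above apply to DBPostProcess, a detection config's block might look like this sketch with the listed defaults:

```yaml
PostProcess:
  name: DBPostProcess
  thresh: 0.3
  box_thresh: 0.7
  max_candidates: 1000
  unclip_ratio: 2.0
```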

### Metric (ppocr/metrics)

| Parameter | Use | Defaults | Note |
| --------- | --- | -------- | ---- |
| name | Metric method name | RecMetric | Currently supports DetMetric, RecMetric, ClsMetric |
| main_indicator | Main indicator, used to select the best model | acc | For the detection method it is hmean; for the recognition and classification methods it is acc |
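
For a recognition model the block is small; a detection config would instead pair DetMetric with hmean. A sketch:

```yaml
Metric:
  name: RecMetric
  main_indicator: acc
```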

### Dataset (ppocr/data)

| Parameter | Use | Defaults | Note |
| --------- | --- | -------- | ---- |
| dataset | Returns one sample per iteration | - | - |
| name | Dataset class name | SimpleDataSet | Currently supports SimpleDataSet, LMDBDataSet |
| data_dir | Image folder path | ./train_data | |
| label_file_list | Ground-truth file path | ["./train_data/train_list.txt"] | This parameter is not required when dataset is LMDBDataSet |
| ratio_list | Sampling ratio of the datasets | [1.0] | If there are two train_lists in label_file_list and ratio_list is [0.4, 0.6], 40% of the samples are drawn from train_list1 and 60% from train_list2 to form the full dataset |
| transforms | List of methods to transform images and labels | [DecodeImage, CTCLabelEncode, RecResizeImg, KeepKeys] | See ppocr/data/imaug |
| loader | Dataloader settings | - | |
| shuffle | Whether to shuffle the dataset each epoch | True | |
| batch_size_per_card | Per-card batch size during training | 256 | |
| drop_last | Whether to discard the last incomplete mini-batch when the number of samples is not divisible by batch_size | True | |
| num_workers | Number of sub-processes used to load data; if 0, no sub-processes are started and data is loaded in the main process | 8 | |
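
Put together, a `Train` block combining `dataset` and `loader` might look like this sketch (the `transforms` list and its per-transform arguments are omitted for brevity; see the shipped configs for the full pipeline):

```yaml
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/
    label_file_list: ["./train_data/train_list.txt"]
    ratio_list: [1.0]
  loader:
    shuffle: True
    batch_size_per_card: 256
    drop_last: True
    num_workers: 8
```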

### Weights & Biases (W&B)

| Parameter | Use | Defaults | Note |
| --------- | --- | -------- | ---- |
| project | Project to which the run is logged | uncategorized | |
| name | Alias/name of the run | Randomly generated by wandb | |
| id | ID of the run | Randomly generated by wandb | |
| entity | User or team to which the run is logged | The logged-in user | |
| save_dir | Local directory in which all the models and other data are saved | wandb | |
| config | Model configuration | None | |
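
These options live in a top-level `wandb` block and take effect when `Global.use_wandb` is enabled; the project and entity names below are placeholders:

```yaml
Global:
  use_wandb: True

wandb:
  project: my_ocr_project   # placeholder project name
  entity: my_team           # placeholder user/team
  save_dir: ./wandb_logs
```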

## 3. Multilingual Config File Generation

PaddleOCR currently supports recognition for 80 languages besides Chinese. A multi-language configuration file template is provided under `configs/rec/multi_language`: `rec_multi_language_lite_train.yml`.

There are two ways to create the required configuration file:

  1. Automatically generated by script

The script `generate_multi_language_configs.py` can help you generate configuration files for multi-language models.

- Take Italian as an example, if your data is prepared in the following format:

  ```text
  |-train_data
      |- it_train.txt # train_set label
      |- it_val.txt # val_set label
      |- data
          |- word_001.jpg
          |- word_002.jpg
          |- word_003.jpg
          | ...
  ```

    You can use the default parameters to generate a configuration file:

  ```bash
  # The code needs to be run in the specified directory
  cd PaddleOCR/configs/rec/multi_language/
  # Set the language of the configuration file to be generated through the -l or --language parameter.
  # This command will write the default parameters into the configuration file
  python3 generate_multi_language_configs.py -l it
  ```
- If your data is placed in another location, or you want to use your own dictionary, you can generate the configuration file by specifying the relevant parameters (note that a comment after a line-continuation backslash would break the command, so the flag descriptions are kept above it):

  ```bash
  # -l or --language field is required
  # --train to modify the training set
  # --val to modify the validation set
  # --data_dir to modify the data set directory
  # --dict to modify the dict path
  # -o to modify the corresponding default parameters
  cd PaddleOCR/configs/rec/multi_language/
  python3 generate_multi_language_configs.py -l it \
      --train {path/of/train_label.txt} \
      --val {path/of/val_label.txt} \
      --data_dir {train_data/path} \
      --dict {path/of/dict} \
      -o Global.use_gpu=False
  ...
  ```

Italian is written with Latin letters, so after executing the command you will get `rec_latin_lite_train.yml`.

  2. Manually modify the configuration file

    You can also manually modify the following fields in the template:

```yaml
Global:
  use_gpu: True
  epoch_num: 500
  ...
  character_dict_path: {path/of/dict} # path of dict

Train:
  dataset:
    name: SimpleDataSet
    data_dir: train_data/ # root directory of training data
    label_file_list: ["./train_data/train_list.txt"] # train label path
  ...

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: train_data/ # root directory of val data
    label_file_list: ["./train_data/val_list.txt"] # val label path
  ...
```

Currently, the multi-language algorithms supported by PaddleOCR are:

| Configuration file | Algorithm name | backbone | trans | seq | pred | language |
| ------------------ | -------------- | -------- | ----- | --- | ---- | -------- |
| rec_chinese_cht_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | chinese traditional |
| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | English (case sensitive) |
| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | French |
| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | German |
| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Japanese |
| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Korean |
| rec_latin_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Latin |
| rec_arabic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | arabic |
| rec_cyrillic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | cyrillic |
| rec_devanagari_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | devanagari |

For more supported languages, please refer to: Multi-language model

The multi-language model training method is the same as for the Chinese model. The training dataset consists of 1 million synthetic images; a small number of fonts and some test data can be downloaded in the following two ways.