
# Configuration

## 1. Optional Parameter List

The following options can be viewed by running a PaddleOCR script with `--help`:

| FLAG | Supported script | Use | Defaults | Note |
| ---- | ---------------- | --- | -------- | ---- |
| -c | ALL | Specify the configuration file to use | None | Please refer to the parameter introduction for configuration file usage |
| -o | ALL | Set configuration options | None | Options set with -o have higher priority than the configuration file selected with -c. E.g.: `-o Global.use_gpu=false` |
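
For example, training can be launched with a config file while a single option is overridden on the command line. The script and config paths below assume the standard PaddleOCR v2.x repository layout:

```bash
# Train with a config file; -o overrides Global.use_gpu from the command line
python3 tools/train.py \
    -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml \
    -o Global.use_gpu=false
```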

## 2. Introduction to Global Parameters of Configuration File

Take `rec_chinese_lite_train_v2.0.yml` as an example.

### Global

| Parameter | Use | Defaults | Note |
| --------- | --- | -------- | ---- |
| use_gpu | Set whether to use GPU | true | |
| epoch_num | Maximum training epoch number | 500 | |
| log_smooth_window | Log queue length; the median value in the queue is printed each time | 20 | |
| print_batch_step | Set print log interval | 10 | |
| save_model_dir | Set model save path | output/{algorithm_name} | |
| save_epoch_step | Set model save interval | 3 | |
| eval_batch_step | Set the model evaluation interval | 2000 or [1000, 2000] | 2000 means evaluation runs every 2000 iterations; [1000, 2000] means evaluation runs every 2000 iterations after the 1000th iteration |
| cal_metric_during_train | Set whether to evaluate the metric during training; the metric is computed on the current batch | true | |
| load_static_weights | Set whether the pre-trained model is saved in static graph mode (currently only required by the detection algorithm) | true | |
| pretrained_model | Set the path of the pre-trained model | ./pretrain_models/CRNN/best_accuracy | |
| checkpoints | Set model parameter path | None | Used to load parameters to resume training after an interruption |
| use_visualdl | Set whether to enable VisualDL for visual log display | False | Tutorial |
| use_wandb | Set whether to enable W&B for visual log display | False | Documentation |
| infer_img | Set inference image path or folder path | ./infer_img | |
| character_dict_path | Set dictionary path | ./ppocr/utils/ppocr_keys_v1.txt | If character_dict_path is None, the model can only recognize numbers and lowercase letters |
| max_text_length | Set the maximum length of text | 25 | |
| use_space_char | Set whether to recognize spaces | True | |
| label_list | Set the angles supported by the direction classifier | ['0','180'] | Only valid for the angle classifier model |
| save_res_path | Set the save path of the test model results | ./output/det_db/predicts_db.txt | Only valid for the text detection model |
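
Assembled from the parameters above, the `Global` block of a recognition config might look like the following sketch (values mirror the listed defaults; the shipped `rec_chinese_lite_train_v2.0.yml` may differ slightly, and the save path is a placeholder):

```yaml
Global:
  use_gpu: true
  epoch_num: 500
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec_chinese_lite_v2.0   # placeholder path
  save_epoch_step: 3
  # evaluate every 2000 iterations, starting after the 1000th iteration
  eval_batch_step: [1000, 2000]
  cal_metric_during_train: true
  pretrained_model: ./pretrain_models/CRNN/best_accuracy
  checkpoints:
  character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt
  max_text_length: 25
  use_space_char: true
```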

### Optimizer (ppocr/optimizer)

| Parameter | Use | Defaults | Note |
| --------- | --- | -------- | ---- |
| name | Optimizer class name | Adam | Currently supports Momentum, Adam, RMSProp; see ppocr/optimizer/optimizer.py |
| beta1 | Set the exponential decay rate for the 1st moment estimates | 0.9 | |
| beta2 | Set the exponential decay rate for the 2nd moment estimates | 0.999 | |
| clip_norm | The maximum norm value | - | |
| lr | Set the learning rate decay method | - | |
| name | Learning rate decay class name | Cosine | Currently supports Linear, Cosine, Step, Piecewise; see ppocr/optimizer/learning_rate.py |
| learning_rate | Set the base learning rate | 0.001 | |
| regularizer | Set network regularization method | - | |
| name | Regularizer class name | L2 | Currently supports L1, L2; see ppocr/optimizer/regularizer.py |
| factor | Regularizer coefficient | 0.00001 | |
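
The second and third `name` entries above are nested under the `lr` and `regularizer` sub-blocks. A sketch with the listed defaults (exact values in the shipped config may differ):

```yaml
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
  regularizer:
    name: L2
    factor: 0.00001
```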

### Architecture (ppocr/modeling)

In PaddleOCR, the network is divided into four stages: Transform, Backbone, Neck and Head.

| Parameter | Use | Defaults | Note |
| --------- | --- | -------- | ---- |
| model_type | Network type | rec | Currently supports rec, det, cls |
| algorithm | Model name | CRNN | See algorithm_overview for the supported list |
| Transform | Set the transformation method | - | Currently only supported by recognition algorithms; see ppocr/modeling/transforms for details |
| name | Transformation class name | TPS | Currently supports TPS |
| num_fiducial | Number of TPS control points | 20 | Ten on the top and ten on the bottom |
| loc_lr | Localization network learning rate | 0.1 | |
| model_name | Localization network size | small | Currently supports small, large |
| Backbone | Set the network backbone class name | - | See ppocr/modeling/backbones |
| name | Backbone class name | ResNet | Currently supports MobileNetV3, ResNet |
| layers | ResNet layers | 34 | Currently supports 18, 34, 50, 101, 152, 200 |
| model_name | MobileNetV3 network size | small | Currently supports small, large |
| Neck | Set network neck | - | See ppocr/modeling/necks |
| name | Neck class name | SequenceEncoder | Currently supports SequenceEncoder, DBFPN |
| encoder_type | SequenceEncoder encoder type | rnn | Currently supports reshape, fc, rnn |
| hidden_size | Number of rnn internal units | 48 | |
| out_channels | Number of DBFPN output channels | 256 | |
| Head | Set the network head | - | See ppocr/modeling/heads |
| name | Head class name | CTCHead | Currently supports CTCHead, DBHead, ClsHead |
| fc_decay | CTCHead regularization coefficient | 0.0004 | |
| k | DBHead binarization coefficient | 50 | |
| class_dim | ClsHead output category number | 2 | |
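
For a CRNN recognition model, the four stages might be wired together as in this sketch, built from the defaults above (a real config may omit `Transform` entirely or use different backbone sizes):

```yaml
Architecture:
  model_type: rec
  algorithm: CRNN
  Transform:            # optional; recognition algorithms only
  Backbone:
    name: MobileNetV3
    model_name: small
  Neck:
    name: SequenceEncoder
    encoder_type: rnn
    hidden_size: 48
  Head:
    name: CTCHead
    fc_decay: 0.0004
```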

### Loss (ppocr/losses)

| Parameter | Use | Defaults | Note |
| --------- | --- | -------- | ---- |
| name | Loss class name | CTCLoss | Currently supports CTCLoss, DBLoss, ClsLoss |
| balance_loss | Whether to balance the number of positive and negative samples in DBLoss (using OHEM) | True | |
| ohem_ratio | The negative/positive sample ratio of OHEM in DBLoss | 3 | |
| main_loss_type | The loss used by shrink_map in DBLoss | DiceLoss | Currently supports DiceLoss, BCELoss |
| alpha | The coefficient of shrink_map_loss in DBLoss | 5 | |
| beta | The coefficient of threshold_map_loss in DBLoss | 10 | |
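
For a detection model using DBLoss, the block might look like this sketch with the defaults above (a recognition model typically needs only `name: CTCLoss`):

```yaml
Loss:
  name: DBLoss
  balance_loss: true
  main_loss_type: DiceLoss
  alpha: 5
  beta: 10
  ohem_ratio: 3
```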

### PostProcess (ppocr/postprocess)

| Parameter | Use | Defaults | Note |
| --------- | --- | -------- | ---- |
| name | Post-processing class name | CTCLabelDecode | Currently supports CTCLabelDecode, AttnLabelDecode, DBPostProcess, ClsPostProcess |
| thresh | Threshold for binarization of the segmentation map in DBPostProcess | 0.3 | |
| box_thresh | Threshold for filtering output boxes in DBPostProcess; boxes below this threshold are not output | 0.7 | |
| max_candidates | Maximum number of text boxes output by DBPostProcess | 1000 | |
| unclip_ratio | Unclip ratio of the text box in DBPostProcess | 2.0 | |
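
Since all four tunable thresholds above apply to DBPostProcess, a detection config's block might look like this sketch with the listed defaults:

```yaml
PostProcess:
  name: DBPostProcess
  thresh: 0.3
  box_thresh: 0.7
  max_candidates: 1000
  unclip_ratio: 2.0
```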

### Metric (ppocr/metrics)

| Parameter | Use | Defaults | Note |
| --------- | --- | -------- | ---- |
| name | Metric method name | RecMetric | Currently supports DetMetric, RecMetric, ClsMetric |
| main_indicator | Main indicator, used to select the best model | acc | For the detection method it is hmean; for the recognition and classification methods it is acc |
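
For a recognition model the block is small; a detection config would instead pair DetMetric with hmean. A sketch:

```yaml
Metric:
  name: RecMetric
  main_indicator: acc
```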

### Dataset (ppocr/data)

| Parameter | Use | Defaults | Note |
| --------- | --- | -------- | ---- |
| dataset | Returns one sample per iteration | - | - |
| name | Dataset class name | SimpleDataSet | Currently supports SimpleDataSet, LMDBDataSet |
| data_dir | Image folder path | ./train_data | |
| label_file_list | Ground-truth file path | ["./train_data/train_list.txt"] | This parameter is not required when dataset is LMDBDataSet |
| ratio_list | Sampling ratio of the datasets | [1.0] | If there are two train_lists in label_file_list and ratio_list is [0.4, 0.6], 40% of the samples are drawn from train_list1 and 60% from train_list2 to form the full dataset |
| transforms | List of methods to transform images and labels | [DecodeImage, CTCLabelEncode, RecResizeImg, KeepKeys] | See ppocr/data/imaug |
| loader | Dataloader settings | - | |
| shuffle | Whether to shuffle the dataset each epoch | True | |
| batch_size_per_card | Per-card batch size during training | 256 | |
| drop_last | Whether to discard the last incomplete mini-batch when the number of samples is not divisible by batch_size | True | |
| num_workers | Number of sub-processes used to load data; if 0, no sub-processes are started and data is loaded in the main process | 8 | |
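
Put together, a `Train` block combining `dataset` and `loader` might look like this sketch (the `transforms` list and its per-transform arguments are omitted for brevity; see the shipped configs for the full pipeline):

```yaml
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/
    label_file_list: ["./train_data/train_list.txt"]
    ratio_list: [1.0]
  loader:
    shuffle: True
    batch_size_per_card: 256
    drop_last: True
    num_workers: 8
```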

### Weights & Biases (W&B)

| Parameter | Use | Defaults | Note |
| --------- | --- | -------- | ---- |
| project | Project to which the run is logged | uncategorized | |
| name | Alias/name of the run | Randomly generated by wandb | |
| id | ID of the run | Randomly generated by wandb | |
| entity | User or team to which the run is logged | The logged-in user | |
| save_dir | Local directory in which all the models and other data are saved | wandb | |
| config | Model configuration | None | |
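
These options live in a top-level `wandb` block and take effect when `Global.use_wandb` is enabled; the project and entity names below are placeholders:

```yaml
Global:
  use_wandb: True

wandb:
  project: my_ocr_project   # placeholder project name
  entity: my_team           # placeholder user/team
  save_dir: ./wandb_logs
```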

## 3. Multilingual Config File Generation

PaddleOCR currently supports recognition for 80 languages besides Chinese. A multi-language configuration file template is provided under `configs/rec/multi_language`: `rec_multi_language_lite_train.yml`.

There are two ways to create the required configuration file:

  1. Automatically generated by script

The script `generate_multi_language_configs.py` can help you generate configuration files for multi-language models.

- Take Italian as an example, if your data is prepared in the following format:

  ```text
  |-train_data
      |- it_train.txt # train_set label
      |- it_val.txt # val_set label
      |- data
          |- word_001.jpg
          |- word_002.jpg
          |- word_003.jpg
          | ...
  ```

    You can use the default parameters to generate a configuration file:

  ```bash
  # The code needs to be run in the specified directory
  cd PaddleOCR/configs/rec/multi_language/
  # Set the language of the configuration file to be generated through the -l or --language parameter.
  # This command will write the default parameters into the configuration file
  python3 generate_multi_language_configs.py -l it
  ```
- If your data is placed in another location, or you want to use your own dictionary, you can generate the configuration file by specifying the relevant parameters (note that a comment after a line-continuation backslash would break the command, so the flag descriptions are kept above it):

  ```bash
  # -l or --language field is required
  # --train to modify the training set
  # --val to modify the validation set
  # --data_dir to modify the data set directory
  # --dict to modify the dict path
  # -o to modify the corresponding default parameters
  cd PaddleOCR/configs/rec/multi_language/
  python3 generate_multi_language_configs.py -l it \
      --train {path/of/train_label.txt} \
      --val {path/of/val_label.txt} \
      --data_dir {train_data/path} \
      --dict {path/of/dict} \
      -o Global.use_gpu=False
  ...
  ```

Italian is written with Latin letters, so after executing the command you will get `rec_latin_lite_train.yml`.

  2. Manually modify the configuration file

    You can also manually modify the following fields in the template:

```yaml
Global:
  use_gpu: True
  epoch_num: 500
  ...
  character_dict_path: {path/of/dict} # path of dict

Train:
  dataset:
    name: SimpleDataSet
    data_dir: train_data/ # root directory of training data
    label_file_list: ["./train_data/train_list.txt"] # train label path
  ...

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: train_data/ # root directory of val data
    label_file_list: ["./train_data/val_list.txt"] # val label path
  ...
```

Currently, the multi-language algorithms supported by PaddleOCR are:

| Configuration file | Algorithm name | backbone | trans | seq | pred | language |
| ------------------ | -------------- | -------- | ----- | --- | ---- | -------- |
| rec_chinese_cht_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | chinese traditional |
| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | English (case sensitive) |
| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | French |
| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | German |
| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Japanese |
| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Korean |
| rec_latin_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Latin |
| rec_arabic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | arabic |
| rec_cyrillic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | cyrillic |
| rec_devanagari_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | devanagari |

For more supported languages, please refer to: Multi-language model

The multi-language model training method is the same as for the Chinese model. The training dataset consists of 1 million synthetic images; a small number of fonts and some test data can be downloaded in the following two ways.