When an Angel application completes, every PSModel is saved in binary format to improve speed and save space. Because some users need their PSModels in plaintext format, Angel provides the ModelConverter class, which converts a PSModel binary file to plaintext.
Before conversion, Angel's binary model file has the following default layout:
After conversion, Angel's plaintext model file has the following default layout:
rowIndex=0
0:-0.004235138405748639
1:-0.003367253227582031
3:-0.003988846053264014
6:0.001803243020660425
8:1.9413353447408782E-4
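The plaintext layout above can be read back with a few lines of code. The following is a minimal sketch, not part of Angel: it assumes that each row starts with a `rowIndex=<n>` line and is followed by `index:value` lines, exactly as in the sample, and the function name `parse_plaintext_model` is hypothetical.

```python
from typing import Dict, Iterable


def parse_plaintext_model(lines: Iterable[str]) -> Dict[int, Dict[int, float]]:
    """Parse converted model text into {row_index: {column_index: value}}.

    Assumes the layout shown in the sample: a "rowIndex=<n>" header line
    followed by "index:value" lines belonging to that row.
    """
    rows: Dict[int, Dict[int, float]] = {}
    current_row = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("rowIndex="):
            current_row = int(line.split("=", 1)[1])
            rows.setdefault(current_row, {})
        else:
            if current_row is None:
                raise ValueError("value line found before any rowIndex header")
            index, value = line.split(":", 1)
            # float() accepts the exponent notation used in the sample
            rows[current_row][int(index)] = float(value)
    return rows


# The sample from the text above.
sample = [
    "rowIndex=0",
    "0:-0.004235138405748639",
    "1:-0.003367253227582031",
    "3:-0.003988846053264014",
    "6:0.001803243020660425",
    "8:1.9413353447408782E-4",
]
rows = parse_plaintext_model(sample)
```

Indices missing from the sample (2, 4, 5, 7) are simply absent from the resulting dictionary, so a caller can decide how to treat them.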
Currently, Angel supports two converter modes: stand-alone mode and distributed mode.
In stand-alone mode, conversion is done on the machine that runs the script (typically the gateway machine used to submit the Angel job). This mode is easy to use and is therefore generally recommended. The submit command is shown below:
./bin/angel-model-convert \
--angel.load.model.path ${hdfs_path} \
--angel.save.model.path ${hdfs_path} \
--angel.modelconverts.model.names ${models} \
--angel.modelconverts.serde.class ${SerdeClass}
Although stand-alone mode is easy to use, it is limited by the gateway machine's resources and can perform poorly because of its CPU and network I/O consumption, especially when the model is large. Distributed mode is provided to address this issue.
Distributed mode starts an Angel job on YARN, where conversion is done within the worker, taking advantage of Angel's distributed processing capability. This mode puts little load on the gateway machine and improves conversion speed. The submit command is shown below:
./bin/angel-submit \
--angel.app.submit.class com.tencent.angel.ml.toolkits.modelconverter.ModelConverterRunner \
--angel.load.model.path ${hdfs_path} \
--angel.save.model.path ${hdfs_path} \
--angel.modelconverts.model.names ${models} \
--angel.modelconverts.serde.class ${SerdeClass}
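The two submit commands differ only in the launcher script and the extra `--angel.app.submit.class` option, which the sketch below makes explicit. It is an illustrative helper, not something shipped with Angel: the function `build_convert_command`, its argument names, and the example paths and serde class are hypothetical, while the script names, option names, and runner class are copied from the commands above.

```python
import shlex
from typing import List

# Runner class copied from the distributed-mode command above.
RUNNER_CLASS = "com.tencent.angel.ml.toolkits.modelconverter.ModelConverterRunner"


def build_convert_command(load_path: str, save_path: str, model_names: str,
                          serde_class: str, distributed: bool = False) -> List[str]:
    """Assemble the argument list for either converter mode (hypothetical helper)."""
    if distributed:
        # Distributed mode goes through angel-submit with the converter runner.
        cmd = ["./bin/angel-submit", "--angel.app.submit.class", RUNNER_CLASS]
    else:
        # Stand-alone mode uses the dedicated conversion script.
        cmd = ["./bin/angel-model-convert"]
    cmd += [
        "--angel.load.model.path", load_path,
        "--angel.save.model.path", save_path,
        # model_names is passed through unchanged, as in ${models} above.
        "--angel.modelconverts.model.names", model_names,
        "--angel.modelconverts.serde.class", serde_class,
    ]
    return cmd


# Placeholder values for illustration only.
standalone_cmd = build_convert_command(
    "hdfs://example/model/binary", "hdfs://example/model/text",
    "my_model", "com.example.MySerde")
distributed_cmd = build_convert_command(
    "hdfs://example/model/binary", "hdfs://example/model/text",
    "my_model", "com.example.MySerde", distributed=True)

# Render as a shell-quoted string for inspection.
print(shlex.join(distributed_cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the Angel installation directory; building a list rather than a single string avoids shell-quoting problems in paths.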
Parameters: