Angel stores its models matrix by matrix. Each matrix has a folder named after it under the model storage path, and that folder holds the matrix's metadata file and data files. A matrix has exactly one metadata file but usually several data files, as most of Angel's models are written out by the Parameter Server.
Metadata is stored in JSON format. A matrix's metadata consists mainly of matrix features, partition indices, and row-related indices, which are described by the MatrixFilesMeta, MatrixPartitionMeta, and RowPartitionMeta classes respectively.
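As a rough illustration of the folder-per-matrix layout, the following sketch lists every matrix folder under a model path together with the files it contains. The function name `list_matrices` is made up for this example, and the sketch assumes the model sits on a local filesystem; it makes no assumption about how the metadata and data files are named.

```python
from pathlib import Path


def list_matrices(model_dir):
    """Map each matrix folder name to the sorted names of the files inside it.

    Assumption: the model is on a local filesystem. Entries that are not
    folders are ignored, since each matrix is described as one folder.
    """
    layout = {}
    for entry in sorted(Path(model_dir).iterdir()):
        if entry.is_dir():
            layout[entry.name] = sorted(
                p.name for p in entry.iterdir() if p.is_file()
            )
    return layout
```

Calling `list_matrices("/path/to/model")` returns a dictionary with one key per matrix, which is a quick way to check that each matrix folder holds its metadata file and data files.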
MatrixFilesMeta
Purpose: matrix-related information
Included Fields:
MatrixPartitionMeta
Purpose: metadata of a partition block
Included Fields:
RowPartitionMeta
Purpose: metadata of a specific row slice in a partition block
Included Fields:
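Because the metadata is plain JSON, its nesting can be inspected without knowing the field names in advance. The sketch below is only an inspection aid, not part of Angel: `key_paths` and `describe_metadata` are hypothetical helpers that print the dotted path of every key found in a JSON document.

```python
import json


def key_paths(node, prefix=""):
    """Yield a dotted path for every key in a nested JSON value."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            yield path
            yield from key_paths(value, path)
    elif isinstance(node, list):
        # The first element is enough to show the shape of a list.
        for item in node[:1]:
            yield from key_paths(item, prefix + "[]")


def describe_metadata(json_text):
    """Return the key paths of a metadata document given as JSON text."""
    return list(key_paths(json.loads(json_text)))
```

Running `describe_metadata` on the text of a matrix's metadata file shows which keys sit at the matrix level and which are nested under partitions and rows.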
Angel 2.0 supports user-defined model formats: the model output format can be customized to fit practical needs. The default output formats are generally sufficient. Because they are relatively simple, most models stored in a default format can be parsed directly, without the metadata files.
Angel provides 8 default model output formats:
ValueBinaryRowFormat, ColIdValueBinaryRowFormat, RowIdColIdValueBinaryRowFormat, ValueTextRowFormat, ColIdValueTextRowFormat, RowIdColIdValueTextRowFormat, BinaryColumnFormat and TextColumnFormat. These eight formats are described below.
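Value formats (ValueBinaryRowFormat, ValueTextRowFormat) store only the values:
value
value
value
ColIdValue formats (ColIdValueBinaryRowFormat, ColIdValueTextRowFormat) store a column index with each value:
index,value
index,value
index,value
RowIdColIdValue formats (RowIdColIdValueBinaryRowFormat, RowIdColIdValueTextRowFormat) store a row ID and a column index with each value:
rowid,index,value
rowid,index,value
rowid,index,value
Column formats (BinaryColumnFormat, TextColumnFormat) store a column index followed by that column's value in each row:
index,row1 value,row2 value,...
index,row1 value,row2 value,...
index,row1 value,row2 value,...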
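The record layouts listed above are simple enough to parse by hand. The following is a minimal sketch for the text layouts, under several assumptions that the text does not confirm: one record per line, a comma as the field separator, integer row and column indices, and values that parse as floats. All function names are made up for this example.

```python
def parse_col_value(line, sep=","):
    """Parse an 'index,value' record into (index, value)."""
    index, value = line.strip().split(sep)
    return int(index), float(value)


def parse_row_col_value(line, sep=","):
    """Parse a 'rowid,index,value' record into (row_id, index, value)."""
    row_id, index, value = line.strip().split(sep)
    return int(row_id), int(index), float(value)


def parse_column(line, sep=","):
    """Parse an 'index,row1 value,row2 value,...' record.

    Returns the column index and the list of per-row values.
    """
    fields = line.strip().split(sep)
    return int(fields[0]), [float(v) for v in fields[1:]]


def load_sparse_row(lines, sep=","):
    """Collect 'index,value' records into a dict, skipping blank lines."""
    return dict(parse_col_value(line, sep) for line in lines if line.strip())
```

If a model was saved with a different separator, pass it through `sep` rather than changing the parsers.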
Algorithms in Angel are currently implemented on a new computational graph framework, in which the output format of each layer can be set individually. By default, SimpleInputLayer uses ColIdValueTextRowFormat, the Embedding layer uses TextColumnFormat, and FCLayer uses RowIdColIdValueTextRowFormat.
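When writing a loader for a saved model, it can help to record these per-layer defaults in one place. The sketch below does that with a plain lookup table. The `overrides` argument is a hypothetical stand-in for whatever format a user has configured and does not correspond to an actual Angel parameter.

```python
# Default output format of each layer, as stated in the text above.
DEFAULT_LAYER_FORMATS = {
    "SimpleInputLayer": "ColIdValueTextRowFormat",
    "Embedding": "TextColumnFormat",
    "FCLayer": "RowIdColIdValueTextRowFormat",
}


def format_for(layer, overrides=None):
    """Return the output format name for a layer.

    `overrides` (hypothetical) maps layer names to user-chosen formats
    and takes precedence over the defaults.
    """
    overrides = overrides or {}
    return overrides.get(layer, DEFAULT_LAYER_FORMATS[layer])
```

For example, `format_for("Embedding")` yields `TextColumnFormat`, which tells the loader to expect one column per record.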
Of course, if you don't want to use the default format, you can configure your model output format via the following parameters:
Furthermore, if the 8 output formats provided by Angel cannot meet your requirements, you can extend the RowFormat or ColumnFormat class to implement your own format. The implementation is straightforward, and the 8 existing formats serve as useful references. After implementing it, compile and package the new format, add it to Angel's dependencies using the parameters Angel provides, and configure the four parameters mentioned above to use your custom output format.