Model Config Details En

Task Types, Worker, and Network Configurations

| Property Name | Default | Meaning |
| --- | --- | --- |
| train | train | Angel task type: train a model |
| predict | predict | Angel task type: use a model to predict |
| inctrain | inctrain | Angel task type: incremental training of an existing model |
| ml.matrix.dot.use.parallel.executor | false | whether to use a parallel executor for the dot product of dense matrices |
| angel.worker.thread.num | 1 | number of threads in a worker |
| angel.compress.bytes | 8 | low-precision compression; the size of each floating-point number can be set within [1, 8] |

Data Parameters

| Property Name | Default | Meaning |
| --- | --- | --- |
| ml.data.type | libsvm | Angel input data type; supported: libsvm, dense, dummy |
| ml.data.splitor | `\s+` | input data separator; customizable |
| ml.data.has.label | true | whether the input data has a label; by default it does |
| ml.data.label.trans.class | NoTrans | label transformation applied to the input data; supported: NoTrans (no transformation), PosNegTrans(threshold) (converts the label to +1 or -1: +1 when the label is greater than the threshold, otherwise -1), ZeroOneTrans(threshold) (converts the label to 1 or 0: 1 when the label is greater than the threshold, otherwise 0), AddOneTrans (adds 1 to every label), SubOneTrans (subtracts 1 from every label) |
| ml.data.label.trans.threshold | 0 | threshold for the label transformation; used with PosNegTrans and ZeroOneTrans in ml.data.label.trans.class |
| ml.data.validate.ratio | 0.05 | proportion of the data used for validation (no validation when set to 0) |
| ml.feature.index.range | -1 | dimension of the input data. Because hashed features do not fill the entire hash space and leave many gaps, this value is the size of the hash space, that is, the maximum feature ID + 1. When set to -1, feature IDs can be mapped to (0, Long.max) |
| ml.block.size | 1000000 | size of each block after partitioning the matrix: number of rows * number of columns <= block.size; the purpose is to partition the matrix evenly |
| ml.data.use.shuffle | false | whether to shuffle the data |
| ml.data.posneg.ratio | -1 | sampling ratio between positive and negative samples; -1 turns sampling off; otherwise a positive real number in (0, 1). Useful when the numbers of positive and negative samples differ greatly (for example, by a factor of 5 or more) |
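
The label transformations selected by ml.data.label.trans.class can be sketched in a few lines of Python. This is an illustration of the behavior described in the table, not Angel's implementation: the function name `make_label_transform` is hypothetical, and the sketch assumes ZeroOneTrans maps labels above the threshold to 1 and all others to 0.

```python
def make_label_transform(name, threshold=0.0):
    """Return a function that maps a raw label to its transformed value.

    `name` mirrors the values of ml.data.label.trans.class and `threshold`
    mirrors ml.data.label.trans.threshold (only used by the two
    threshold-based transformations).
    """
    transforms = {
        "NoTrans": lambda y: y,
        "PosNegTrans": lambda y: 1.0 if y > threshold else -1.0,
        # Assumption: labels above the threshold become 1, the rest 0.
        "ZeroOneTrans": lambda y: 1.0 if y > threshold else 0.0,
        "AddOneTrans": lambda y: y + 1.0,
        "SubOneTrans": lambda y: y - 1.0,
    }
    if name not in transforms:
        raise ValueError(f"unknown label transformation: {name}")
    return transforms[name]


if __name__ == "__main__":
    to_pos_neg = make_label_transform("PosNegTrans", threshold=0.5)
    print([to_pos_neg(y) for y in (0.2, 0.5, 0.9)])
```

Note that a label exactly equal to the threshold is not "greater than" it, so in this sketch it falls into the -1 (or 0) branch.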

Data Types

| Name | Description |
| --- | --- |
| libsvm | Each line of text represents a sample, in the format "y index1:value1 index2:value2 index3:value3 ...", where index is the feature ID and value is the corresponding feature value. In training data, y is the class of the sample and takes one of two values, 1 or -1; in prediction data, y is the ID of the sample. For example, a positive-class sample [2.0, 3.1, 0.0, 0.0, -1, 2.2] is represented as "1 0:2.0 1:3.1 4:-1 5:2.2", where "1" is the class and "0:2.0" indicates that the value of the 0th feature is 2.0. Similarly, a negative-class sample [2.0, 0.0, 0.1, 0.0, 0.0, 0.0] is represented as "-1 0:2.0 2:0.1" |
| dense | Each line of text represents a sample, in the format "y value1 value2 value3 ...". In training data, y is the class of the sample and takes one of two values, 1 or -1; in prediction data, y is the ID of the sample. For example, a positive-class sample [2.0, 3.1, 0.0, 0.0, -1, 2.2] is represented as "1 2.0 3.1 0.0 0.0 -1 2.2", where "1" is the class and "2.0" is the value of the 0th feature. Similarly, a negative-class sample [2.0, 0.0, 0.1, 0.0, 0.0, 0.0] is represented as "-1 2.0 0.0 0.1 0.0 0.0 0.0" |
| dummy | Each line of text represents a sample, in the format "y index1 index2 index3 ...", where index is the feature ID. In training data, y is the class of the sample and takes one of two values, 1 or -1; in prediction data, y is the ID of the sample. For example, a positive-class sample [2.0, 3.1, 0.0, 0.0, -1, 2.2] is represented as "1 0 1 4 5", where "1" is the class and "0 1 4 5" indicates that the values of the 0th, 1st, 4th, and 5th dimensions of the feature vector are non-zero. Similarly, a negative-class sample [2.0, 0.0, 0.1, 0.0, 0.0, 0.0] is represented as "-1 0 2" |
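
To make the three formats concrete, the following sketch encodes one sample vector in each of them. The helper names (`to_libsvm`, `to_dense`, `to_dummy`) are hypothetical and not part of Angel. The sketch assumes a single space as separator, and assumes the dense format lists every value positionally, zeros included.

```python
def to_libsvm(label, values):
    """Encode as 'y index:value ...', keeping only non-zero features."""
    pairs = " ".join(f"{i}:{v}" for i, v in enumerate(values) if v != 0)
    return f"{label} {pairs}".rstrip()


def to_dense(label, values):
    """Encode as 'y value1 value2 ...', listing every value in order."""
    return " ".join([str(label)] + [str(v) for v in values])


def to_dummy(label, values):
    """Encode as 'y index1 index2 ...', listing indices of non-zero features."""
    indices = " ".join(str(i) for i, v in enumerate(values) if v != 0)
    return f"{label} {indices}".rstrip()


if __name__ == "__main__":
    sample = [2.0, 3.1, 0.0, 0.0, -1, 2.2]
    print(to_libsvm(1, sample))
    print(to_dense(1, sample))
    print(to_dummy(1, sample))
```

The dummy format drops the feature values entirely, so it only suits data where the presence of a feature is all that matters.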

Model Parameters

| Property Name | Default | Meaning |
| --- | --- | --- |
| ml.model.class.name | "" | name of the Angel model class |
| ml.model.size | -1 | size of the model; when set to -1, the range is (0, +Long.max) |
| ml.model.type | RowType.T_FLOAT_DENSE.toString | Angel model type; 28 types are supported |
| ml.model.is.classification | true | whether the model is a classification model |
| ml.epoch.num | 30 | number of epochs (iterations) |
| ml.batch.sample.ratio | 1.0 | fraction of the overall data used in each batch |
| ml.learn.rate | 0.5 | learning rate |
| ml.num.update.per.epoch | 10 | number of parameter updates in one epoch |
| ml.opt.decay.class.name | StandardDecay | name of the decay class; options: StandardDecay (used with alpha), WarmRestarts (used with alpha), CorrectionDecay (used with alpha and beta), ConstantLearningRate (constant learning rate) |
| ml.opt.decay.on.batch | false | whether to decay per batch |
| ml.opt.decay.intervals | 100 | decay interval |
| ml.opt.decay.alpha | 0.001 | alpha used in decay |
| ml.opt.decay.beta | 0.001 | beta used in decay |
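
As a rough illustration of how these defaults combine with user-supplied values, the sketch below merges a set of overrides into a subset of the defaults listed above. The function names and the `key=value` rendering are hypothetical and chosen only for illustration; they are not Angel's configuration loader or its required syntax.

```python
# Subset of the defaults from the Model Parameters table.
MODEL_DEFAULTS = {
    "ml.epoch.num": 30,
    "ml.batch.sample.ratio": 1.0,
    "ml.learn.rate": 0.5,
    "ml.num.update.per.epoch": 10,
    "ml.opt.decay.class.name": "StandardDecay",
    "ml.opt.decay.alpha": 0.001,
    "ml.opt.decay.beta": 0.001,
}


def resolve_model_config(overrides):
    """Return the defaults with `overrides` applied; reject unknown keys."""
    unknown = sorted(set(overrides) - set(MODEL_DEFAULTS))
    if unknown:
        raise KeyError(f"unknown properties: {unknown}")
    merged = dict(MODEL_DEFAULTS)
    merged.update(overrides)
    return merged


def render_properties(config):
    """Render a config as sorted 'key=value' lines (illustrative format)."""
    return "\n".join(f"{key}={config[key]}" for key in sorted(config))


if __name__ == "__main__":
    config = resolve_model_config({"ml.epoch.num": 10, "ml.learn.rate": 0.1})
    print(render_properties(config))
```

Rejecting unknown keys catches typos in property names early, which is easy to get wrong with long dotted names.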

Model Types

| Model Type | Value | Meaning |
| --- | --- | --- |
| RowType.T_DOUBLE_DENSE | 0 | model data row is a dense Double type with index values in the Int range |
| RowType.T_DOUBLE_DENSE_COMPONENT | 1 | model data row is a combined dense Double type with index values in the Int range |
| RowType.T_DOUBLE_DENSE_LONGKEY_COMPONENT | 2 | model data row is a combined dense Double type with index values in the Long range |
| RowType.T_DOUBLE_SPARSE | 3 | model data row is a sparse Double type with index values in the Int range |
| RowType.T_DOUBLE_SPARSE_COMPONENT | 4 | model data row is a combined sparse Double type with index values in the Int range |
| RowType.T_DOUBLE_SPARSE_LONGKEY | 5 | model data row is a sparse Double type with index values in the Long range |
| RowType.T_DOUBLE_SPARSE_LONGKEY_COMPONENT | 6 | model data row is a combined sparse Double type with index values in the Long range |
| RowType.T_FLOAT_DENSE | 7 | model data row is a dense Float type with index values in the Int range |
| RowType.T_FLOAT_DENSE_COMPONENT | 8 | model data row is a combined dense Float type with index values in the Int range |
| RowType.T_FLOAT_DENSE_LONGKEY_COMPONENT | 9 | model data row is a combined dense Float type with index values in the Long range |
| RowType.T_FLOAT_SPARSE | 10 | model data row is a sparse Float type with index values in the Int range |
| RowType.T_FLOAT_SPARSE_COMPONENT | 11 | model data row is a combined sparse Float type with index values in the Int range |
| RowType.T_FLOAT_SPARSE_LONGKEY | 12 | model data row is a sparse Float type with index values in the Long range |
| RowType.T_FLOAT_SPARSE_LONGKEY_COMPONENT | 13 | model data row is a combined sparse Float type with index values in the Long range |
| RowType.T_LONG_DENSE | 14 | model data row is a dense Long type with index values in the Int range |
| RowType.T_LONG_DENSE_COMPONENT | 15 | model data row is a combined dense Long type with index values in the Int range |
| RowType.T_LONG_DENSE_LONGKEY_COMPONENT | 16 | model data row is a combined dense Long type with index values in the Long range |
| RowType.T_LONG_SPARSE | 17 | model data row is a sparse Long type with index values in the Int range |
| RowType.T_LONG_SPARSE_COMPONENT | 18 | model data row is a combined sparse Long type with index values in the Int range |
| RowType.T_LONG_SPARSE_LONGKEY | 19 | model data row is a sparse Long type with index values in the Long range |
| RowType.T_LONG_SPARSE_LONGKEY_COMPONENT | 20 | model data row is a combined sparse Long type with index values in the Long range |
| RowType.T_INT_DENSE | 21 | model data row is a dense Int type with index values in the Int range |
| RowType.T_INT_DENSE_COMPONENT | 22 | model data row is a combined dense Int type with index values in the Int range |
| RowType.T_INT_DENSE_LONGKEY_COMPONENT | 23 | model data row is a combined dense Int type with index values in the Long range |
| RowType.T_INT_SPARSE | 24 | model data row is a sparse Int type with index values in the Int range |
| RowType.T_INT_SPARSE_COMPONENT | 25 | model data row is a combined sparse Int type with index values in the Int range |
| RowType.T_INT_SPARSE_LONGKEY | 26 | model data row is a sparse Int type with index values in the Long range |
| RowType.T_INT_SPARSE_LONGKEY_COMPONENT | 27 | model data row is a combined sparse Int type with index values in the Long range |
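
The 28 names follow a regular pattern: four element types (Double, Float, Long, Int), each with the same seven layouts in the same order. The sketch below decodes a name into its parts and recomputes the numeric value from that pattern. The regularity is read off the table above; it is not a documented guarantee of Angel's `RowType` enum, and the helper names are hypothetical.

```python
# Order of element types and layouts as they appear in the table.
ELEMENT_TYPES = ["DOUBLE", "FLOAT", "LONG", "INT"]
LAYOUTS = [
    "DENSE",
    "DENSE_COMPONENT",
    "DENSE_LONGKEY_COMPONENT",
    "SPARSE",
    "SPARSE_COMPONENT",
    "SPARSE_LONGKEY",
    "SPARSE_LONGKEY_COMPONENT",
]


def split_row_type(name):
    """Split 'RowType.T_FLOAT_DENSE' into ('FLOAT', 'DENSE')."""
    short = name.split(".")[-1]              # drop the optional 'RowType.' prefix
    _, element, layout = short.split("_", 2)  # 'T', element type, layout
    return element, layout


def row_type_value(name):
    """Recompute the table's numeric value from the name."""
    element, layout = split_row_type(name)
    return ELEMENT_TYPES.index(element) * len(LAYOUTS) + LAYOUTS.index(layout)


def describe_row_type(name):
    """Return the properties encoded in a row type name."""
    element, layout = split_row_type(name)
    return {
        "element": element.capitalize(),
        "storage": "dense" if layout.startswith("DENSE") else "sparse",
        "index_range": "Long" if "LONGKEY" in layout else "Int",
        "combined": layout.endswith("COMPONENT"),
    }


if __name__ == "__main__":
    print(row_type_value("RowType.T_FLOAT_DENSE"))
    print(describe_row_type("RowType.T_DOUBLE_SPARSE_LONGKEY_COMPONENT"))
```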

Optimizer Configuration

| Property Name | Default | Meaning |
| --- | --- | --- |
| ml.fclayer.optimizer | Momentum | optimizer for the fully connected layer; options: Momentum, AdaDelta, AdaGrad, Adam, FTRL |
| ml.embedding.optimizer | Momentum | optimizer for the embedding layer; options: Momentum, AdaDelta, AdaGrad, Adam, FTRL |
| ml.inputlayer.optimizer | Momentum | optimizer for the input layer; options: Momentum, AdaDelta, AdaGrad, Adam, FTRL |
| ml.fclayer.matrix.output.format | classOf[RowIdColIdValueTextRowFormat].getCanonicalName | output format of the fully connected layer |
| ml.embedding.matrix.output.format | classOf[TextColumnFormat].getCanonicalName | output format of the embedding layer |
| ml.simpleinputlayer.matrix.output.format | classOf[ColIdValueTextRowFormat].getCanonicalName | output format of the simple input layer |
| ml.reg.l2 | 0.0 | coefficient of the L2 penalty |
| ml.reg.l1 | 0.0 | coefficient of the L1 penalty |

Momentum

| Property Name | Default | Meaning |
| --- | --- | --- |
| ml.opt.momentum.momentum | 0.9 | momentum |
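
For readers unfamiliar with the role of the momentum coefficient, here is a minimal sketch of one common textbook form of the momentum update. It is an assumption that this form resembles what Angel does internally; the function name is hypothetical, and the default arguments simply reuse the documented defaults of ml.learn.rate and ml.opt.momentum.momentum.

```python
def momentum_step(weight, velocity, grad, learn_rate=0.5, momentum=0.9):
    """Apply one textbook momentum update to a single scalar weight.

    The velocity accumulates a decaying sum of past gradients, so a larger
    `momentum` keeps more of the previous update direction.
    """
    velocity = momentum * velocity + learn_rate * grad
    weight = weight - velocity
    return weight, velocity


if __name__ == "__main__":
    w, v = 1.0, 0.0
    for _ in range(3):
        w, v = momentum_step(w, v, grad=1.0)
        print(w, v)
```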

AdaDelta

| Property Name | Default | Meaning |
| --- | --- | --- |
| ml.opt.adadelta.alpha | 0.9 | alpha |
| ml.opt.adadelta.beta | 0.9 | beta |

AdaGrad

| Property Name | Default | Meaning |
| --- | --- | --- |
| ml.opt.adagrad.beta | 0.9 | beta |

Adam

| Property Name | Default | Meaning |
| --- | --- | --- |
| ml.opt.adam.gamma | 0.99 | gamma |
| ml.opt.adam.beta | 0.9 | beta |

FTRL

| Property Name | Default | Meaning |
| --- | --- | --- |
| ml.opt.ftrl.alpha | 0.1 | alpha |
| ml.opt.ftrl.beta | 1.0 | beta |

Layer and Model Parameter Configuration

Embedding Parameter Configuration

| Property Name | Default | Meaning |
| --- | --- | --- |
| ml.fm.field.num | -1 | feature dimension; -1 means all features, and the range can be mapped to (0, +Long.max) |
| ml.fm.rank | 8 | length of the embedding vector |

(MLP) Layer Parameter Configuration

| Property Name | Default | Meaning |
| --- | --- | --- |
| ml.num.class | 2 | number of classes |

(MLR) Layer Parameter Configuration

| Property Name | Default | Meaning |
| --- | --- | --- |
| ml.mlr.rank | 5 | number of fields |

RobustRegression Parameter Configuration

| Property Name | Default | Meaning |
| --- | --- | --- |
| ml.robustregression.loss.delta | 1.0 | residual value at which the piecewise loss switches segments |
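
One way to read this parameter is as the switch point of a Huber-style loss: quadratic for small residuals, linear for large ones. The sketch below shows the standard Huber loss under that assumption. Whether Angel's RobustRegression uses exactly this form is not stated in this document, and the function name is hypothetical.

```python
def huber_loss(residual, delta=1.0):
    """Standard Huber loss for a single residual.

    Quadratic when |residual| <= delta, linear beyond it, so large
    residuals (outliers) contribute less than under squared error.
    """
    magnitude = abs(residual)
    if magnitude <= delta:
        return 0.5 * magnitude * magnitude
    return delta * (magnitude - 0.5 * delta)


if __name__ == "__main__":
    for r in (0.5, 1.0, 2.0, -2.0):
        print(r, huber_loss(r, delta=1.0))
```

The two branches agree at |residual| = delta, so the loss is continuous at the switch point.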

Kmeans Parameter Configuration

| Property Name | Default | Meaning |
| --- | --- | --- |
| ml.kmeans.center.num | 5 | number of clusters |
| ml.kmeans.c | 0.1 | learning rate |

GBDT Parameter Configuration

| Property Name | Default | Meaning |
| --- | --- | --- |
| ml.gbdt.task.type | classification | task type; options: classification, regression |
| ml.gbdt.class.num | 2 | number of classes |
| ml.gbdt.tree.num | 10 | number of trees |
| ml.gbdt.tree.depth | 5 | maximum tree depth |
| ml.gbdt.max.node.num | (none) | maximum number of nodes |
| ml.gbdt.split.num | 5 | maximum size of the grad/hess histograms for each feature |
| ml.gbdt.sample.ratio | 1 | proportion of features selected for training |
| ml.gbdt.min.child.weight | 0.01 | minimum child weight |
| ml.gbdt.reg.alpha | 0 | L1 regularization coefficient |
| ml.gbdt.reg.lambda | 1.0 | L2 regularization coefficient |
| ml.gbdt.thread.num | 20 | number of threads |
| ml.gbdt.batch.size | 10000 | size of a batch |
| ml.gbdt.server.split | false | if true, use two-stage tree splitting |
| ml.gbdt.cate.feat | none | categorical features, in the format "feature id:feature range" (for instance "0:2,1:3"); "none" means there are no categorical features |
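
The ml.gbdt.cate.feat value is a comma-separated list of "feature id:feature range" pairs. The sketch below parses such a string into a dictionary. The function name is hypothetical, and treating "none" (or an empty string) as "no categorical features" is an assumption based on the default value.

```python
def parse_cate_feat(spec):
    """Parse '0:2,1:3' into {0: 2, 1: 3}.

    Returns an empty dict for 'none' or an empty string (assumed to mean
    that no feature is categorical).
    """
    spec = spec.strip()
    if spec == "" or spec.lower() == "none":
        return {}
    result = {}
    for item in spec.split(","):
        feat_id, feat_range = item.split(":")
        result[int(feat_id)] = int(feat_range)
    return result


if __name__ == "__main__":
    print(parse_cate_feat("0:2,1:3"))
    print(parse_cate_feat("none"))
```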

Evaluation

| Property Name | Default | Meaning |
| --- | --- | --- |
| train.loss | (none) | loss on the training data |
| validate.loss | (none) | loss on the validation data |
| log.likelihood | (none) | log likelihood |
| train.error | (none) | error on the training data |
| validate.error | (none) | error on the validation data |