Linear Regression is a regression model that uses the least squares function to model the relationship between one or more independent variables and a dependent variable. It is a common prediction model.
Linear regression is a simple regression method. Given a data set of n statistical units, a linear regression model assumes that the relationship between the dependent variable y and the vector of regressors X is linear. This relationship is modeled through a disturbance term or error variable ε, an unobserved random variable that adds "noise" to the linear relationship between the dependent variable and the regressors. The model is expressed in the following form:

$$y_i = x_i^T w + \varepsilon_i, \quad i = 1, \dots, n$$

where $x_i$ is the regressor vector of the i-th unit, $y_i$ is its dependent variable, and $w$ is the weight vector.
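The objective of linear regression is to minimize the sum of squared residuals:

$$\min_w \sum_{i=1}^{n} \left(y_i - x_i^T w\right)^2$$

where $(x_i, y_i)$, $i = 1, \dots, n$, is a group of samples and $w$ is the weight vector.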
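As a small illustration of this objective, the Python sketch below computes the sum of squared residuals for a candidate weight vector on toy data. The function names `predict` and `sum_squared_residuals` and the data are hypothetical and are not part of Angel.

```python
def predict(w, x):
    """Linear prediction: the dot product of weights w and features x."""
    return sum(wj * xj for wj, xj in zip(w, x))


def sum_squared_residuals(w, samples):
    """Objective value: sum over all samples of (y - w.x)^2."""
    return sum((y - predict(w, x)) ** 2 for x, y in samples)


# Toy data generated by y = 2*x1 + 1*x2 without noise,
# so the weights [2, 1] fit every sample exactly.
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], 1.0), ([1.0, 1.0], 3.0)]

loss_exact = sum_squared_residuals([2.0, 1.0], samples)  # residuals are all zero
loss_zero = sum_squared_residuals([0.0, 0.0], samples)   # 2^2 + 1^2 + 3^2
```

Training searches for the weights that make this value as small as possible; on the toy data the all-zero weights give a clearly larger value than the exact ones.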
The linear regression algorithm can be abstracted as a 1×N PSModel, denoted by w, where $w = (w_1, w_2, \dots, w_N)$, as shown in the following figure:
Angel MLLib provides a linear regression algorithm trained with mini-batch gradient descent.
* Worker:
  In each iteration, the worker pulls the up-to-date w from the PS, computes the update △w using mini-batch gradient descent, and pushes △w back to the PS.
* PS:
  In each iteration, the PS receives △w from all workers and adds their average to w, obtaining a new model.
Flow:
Algorithm:
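The worker and PS steps described above can be sketched as a single-process simulation. This is a minimal illustration in plain Python, not Angel's implementation: the functions `minibatch_delta`, `ps_apply`, and `train`, the fixed learning rate, and the toy data are all assumptions made for the example, and regularization and learning-rate decay are left out.

```python
import random


def minibatch_delta(w, batch, lr):
    """Worker step: one mini-batch gradient descent update, returned as a delta.

    The gradient of the mean squared residual over the batch is
    (2 / |batch|) * sum((w.x - y) * x); the delta is -lr * gradient.
    """
    grad = [0.0] * len(w)
    for x, y in batch:
        err = sum(wj * xj for wj, xj in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * err * xj / len(batch)
    return [-lr * g for g in grad]


def ps_apply(w, deltas):
    """PS step: add the average of the workers' deltas to w."""
    n = len(deltas)
    return [wj + sum(d[j] for d in deltas) / n for j, wj in enumerate(w)]


def train(partitions, dim, lr=0.1, iterations=300, batch_size=4, seed=0):
    rng = random.Random(seed)
    w = [0.0] * dim  # the model held by the simulated PS
    for _ in range(iterations):
        deltas = []
        for data in partitions:  # each partition plays the role of one worker
            pulled = list(w)     # pull the up-to-date w
            batch = rng.sample(data, min(batch_size, len(data)))
            deltas.append(minibatch_delta(pulled, batch, lr))  # push the delta
        w = ps_apply(w, deltas)  # PS averages the deltas and updates w
    return w


# Toy data from y = 3*x1 - 2*x2, split across two simulated workers.
data_rng = random.Random(1)
points = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(40)]
data = [(x, 3.0 * x[0] - 2.0 * x[1]) for x in points]
w = train([data[:20], data[20:]], dim=2)
```

Because every simulated worker starts each iteration from the same pulled w, averaging the deltas on the PS amounts to one gradient step over the union of the workers' mini-batches.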
Decaying learning rate
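The learning rate decays over the iterations. In the submission commands below, the initial learning rate is set with `ml.learn.rate` and the decay rate with `ml.learn.decay`.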
Algorithm Parameters
I/O Parameters
Resource Parameters
* **Training Job**

```java
./bin/angel-submit \
--action.type=train \
--angel.app.submit.class=com.tencent.angel.ml.core.graphsubmit.GraphRunner \
--ml.model.class.name=com.tencent.angel.ml.regression.LinearRegression \
--angel.train.data.path=$input_path \
--angel.save.model.path=$model_path \
--angel.log.path=$log_path \
--ml.data.is.classification=false \
--ml.model.is.classification=false \
--ml.epoch.num=10 \
--ml.feature.index.range=$featureNum+1 \
--ml.data.validate.ratio=0.1 \
--ml.learn.rate=0.1 \
--ml.learn.decay=1 \
--ml.reg.l2=0.001 \
--ml.num.update.per.epoch=10 \
--ml.worker.thread.num=4 \
--ml.data.type=libsvm \
--ml.model.type=T_FLOAT_DENSE \
--angel.workergroup.number=2 \
--angel.worker.memory.mb=5000 \
--angel.worker.task.number=1 \
--angel.ps.number=2 \
--angel.ps.memory.mb=5000 \
--angel.job.name=linearReg_network \
--angel.output.path.deleteonexist=true
```

* **IncTraining Job**

```java
./bin/angel-submit \
--action.type=inctrain \
--angel.app.submit.class=com.tencent.angel.ml.core.graphsubmit.GraphRunner \
--ml.model.class.name=com.tencent.angel.ml.regression.LinearRegression \
--angel.train.data.path=$input_path \
--angel.load.model.path=$model_path \
--angel.save.model.path=$model_path \
--angel.log.path=$log_path \
--ml.model.is.classification=false \
--ml.data.is.classification=false \
--ml.epoch.num=10 \
--ml.feature.index.range=$featureNum+1 \
--ml.data.validate.ratio=0.1 \
--ml.learn.rate=0.1 \
--ml.learn.decay=1 \
--ml.reg.l2=0.001 \
--ml.num.update.per.epoch=10 \
--ml.worker.thread.num=4 \
--ml.data.type=libsvm \
--ml.model.type=T_FLOAT_DENSE \
--angel.workergroup.number=2 \
--angel.worker.memory.mb=5000 \
--angel.worker.task.number=1 \
--angel.ps.number=2 \
--angel.ps.memory.mb=5000 \
--angel.job.name=linearReg_network \
--angel.output.path.deleteonexist=true
```
* **Prediction Job**
```java
./bin/angel-submit \
--action.type=predict \
--angel.app.submit.class=com.tencent.angel.ml.core.graphsubmit.GraphRunner \
--ml.model.class.name=com.tencent.angel.ml.regression.LinearRegression \
--angel.predict.data.path=$input_path \
--angel.load.model.path=$model_path \
--angel.predict.out.path=$predict_path \
--angel.log.path=$log_path \
--ml.feature.index.range=$featureNum+1 \
--ml.data.type=libsvm \
--ml.model.type=T_FLOAT_DENSE \
--ml.worker.thread.num=4 \
--angel.workergroup.number=2 \
--angel.worker.memory.mb=5000 \
--angel.worker.task.number=1 \
--angel.ps.number=2 \
--angel.ps.memory.mb=5000 \
--angel.job.name=linearReg_network_predict \
--angel.output.path.deleteonexist=true
```
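
The commands above read data with `--ml.data.type=libsvm` and set `--ml.feature.index.range=$featureNum+1`. As a rough illustration only, the Python sketch below parses the standard libsvm text layout (`label index:value ...`) and derives a candidate value for `$featureNum` as the largest feature index observed. The helper names are hypothetical, taking `$featureNum` to be the largest index is an assumption, and how Angel itself interprets the indices is not covered here.

```python
def parse_libsvm_line(line):
    """Parse 'label idx:val idx:val ...' into (label, {index: value})."""
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        idx, val = item.split(":")
        features[int(idx)] = float(val)
    return label, features


def max_feature_index(lines):
    """Largest feature index seen in the data; 0 if no features are present."""
    parsed = (parse_libsvm_line(line)[1] for line in lines)
    return max((max(f) for f in parsed if f), default=0)


lines = ["1.5 1:0.2 4:1.0", "-0.3 2:0.7 3:0.1"]
label, features = parse_libsvm_line(lines[0])

feature_num = max_feature_index(lines)  # assumed meaning of $featureNum
index_range = feature_num + 1           # mirrors $featureNum+1 in the commands
```
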
### Performance
* Data: E2006-tfidf, 1.5×10^5 features, 1.6×10^4 samples
* Resources:
  * Angel: executor: 2, 5G memory, 1 task; ps: 2, 5G memory
* Time of 100 epochs:
  * Angel: 25 min