
Config Details En

docs/deploy/config_details_en.md


Available Properties of I/O Path, Running Mode, Deploy Mode

| Property Name | Default | Meaning |
| --- | --- | --- |
| action.type | train | Angel task type; supports "train" for model training and "predict" for generating predictions from a model |
| angel.train.data.path | (none) | Path of the training data, used when action.type is "train" |
| angel.predict.data.path | (none) | Path of the data to predict on, used when action.type is "predict" |
| angel.inputformat | org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat | Format of the training data, mainly used for data partitioning and reading; customized formats are supported |
| angel.predict.out.path | (none) | Save path for the "predict" results, only used when action.type is "predict" |
| angel.save.model.path | (none) | Save path for the model, only used when action.type is "train" |
| angel.load.model.path | (none) | Path for loading a model; used by both "train" and "predict". When action.type is "train", Angel loads the old model and performs incremental learning. When action.type is "predict", it loads the model used for prediction |
| angel.deploy.mode | YARN | Deploy mode; currently supports "YARN" and "LOCAL". "LOCAL" mode currently supports starting only one worker and one PS |
| angel.running.mode | ANGEL_PS_WORKER | Running mode; currently supports "ANGEL_PS_WORKER" and "ANGEL_PS". "ANGEL_PS_WORKER" starts both the PS and worker components, so Angel completes the computation of the entire job by itself. "ANGEL_PS" starts only the PS, providing a PS service to third-party frameworks (such as Spark) |
| angel.job.libjars | (none) | Jars that the Angel application depends on; separate multiple jars with commas |
| queue | (none) | Resource pool (queue) used by the Angel application for resource management in YARN mode |
| angel.job.name | angel app | Name of the Angel application |
| angel.app.config.file | (none) | Config file. Angel's parameters can be set either on the command line or in a config file; the config file is uploaded in XML format |
| angel.app.submit.class | com.tencent.angel.utils.DefaultAppSubmitter | Class used to submit the Angel application |
| angel.task.user.task.class | com.tencent.angel.worker.task.BaseTask | Task class that the Angel worker runs; it can be customized as long as it inherits com.tencent.angel.worker.task.BaseTask |
| angel.staleness | 0 | Maximum staleness among tasks: 0 means BSP, >0 means SSP, <0 means ASYNC |
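
The angel.staleness row packs three synchronization models into one integer. The sketch below shows one way to read it. The class names `SyncMode` and `StalenessPolicy` are hypothetical and not part of Angel's API. The `mayProceed` gate follows the usual SSP rule: a task may run at most `staleness` iterations ahead of the slowest task. That is an assumption about how Angel enforces the setting, not taken from its source.

```java
// Hypothetical helper, not part of Angel's API: interprets angel.staleness
// as described in the table (0 = BSP, > 0 = SSP, < 0 = ASYNC).
enum SyncMode { BSP, SSP, ASYNC }

final class StalenessPolicy {
    private final int staleness;

    StalenessPolicy(int staleness) {
        this.staleness = staleness;
    }

    SyncMode mode() {
        if (staleness == 0) {
            return SyncMode.BSP;
        }
        return staleness > 0 ? SyncMode.SSP : SyncMode.ASYNC;
    }

    // Assumed SSP-style gate: a task at iteration myClock may continue only if
    // it is at most `staleness` iterations ahead of the slowest task.
    // BSP is the special case staleness = 0; ASYNC never waits.
    boolean mayProceed(int myClock, int slowestClock) {
        if (mode() == SyncMode.ASYNC) {
            return true;
        }
        return myClock - slowestClock <= staleness;
    }
}
```

With `angel.staleness` set to 3, a task at iteration 8 may proceed while the slowest task is at iteration 5. At iteration 9 it would wait.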

Available Properties of Resource and Runtime Environment

| Property Name | Default | Meaning |
| --- | --- | --- |
| angel.am.env | (none) | Environment variables for the Angel master, in the form Key1=Val1, Key2=Val2, ... |
| angel.worker.env | (none) | Environment variables for Angel workers, in the form Key1=Val1, Key2=Val2, ... |
| angel.ps.env | (none) | Environment variables for Angel PSs, in the form Key1=Val1, Key2=Val2, ... |
| angel.workergroup.number | 1 | Number of workergroups to start. Currently each workergroup contains exactly one worker, so the number of workers equals the number of workergroups |
| angel.worker.task.number | 1 | Number of tasks run on each worker |
| angel.ps.number | 1 | Number of PSs to start |
| angel.am.java.opts | (none) | JVM parameters for the Angel master process |
| angel.am.resource.cpu-vcores | 1 | Number of CPU vcores used by the Angel master |
| angel.worker.memory.mb | 1024 | Memory used by a single worker (MB) |
| angel.worker.java.opts | (none) | JVM parameters for the worker process |
| angel.worker.cpu.vcores | 1 | Number of vcores used by a single worker |
| angel.ps.memory.mb | 1024 | Memory used by a single PS (MB) |
| angel.ps.java.opts | (none) | JVM parameters for the PS process |
| angel.ps.cpu.vcores | 1 | Number of vcores used by a single PS |
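
The worker and PS settings multiply together, which makes it easy to underestimate a job's footprint. The sketch below does that arithmetic. `ResourceEstimate` is a hypothetical helper, not an Angel class. It follows the table's statement that each workergroup has exactly one worker. It leaves out the master's memory, which this table does not list. YARN may also round each container up to its minimum allocation, so treat the result as a lower bound.

```java
// Hypothetical sketch (not an Angel API): totals the resources requested by
// the worker, PS and master settings in the table above.
final class ResourceEstimate {
    // One worker per workergroup, per the angel.workergroup.number description.
    static int workers(int workergroupNumber) {
        return workergroupNumber;
    }

    // Total tasks = workers * angel.worker.task.number.
    static int totalTasks(int workergroupNumber, int tasksPerWorker) {
        return workers(workergroupNumber) * tasksPerWorker;
    }

    // Worker and PS memory only; the master's memory is not covered here.
    static long totalMemoryMb(int workergroupNumber, int workerMemoryMb,
                              int psNumber, int psMemoryMb) {
        return (long) workers(workergroupNumber) * workerMemoryMb
                + (long) psNumber * psMemoryMb;
    }

    // Worker, PS and master vcores.
    static int totalVcores(int workergroupNumber, int workerVcores,
                           int psNumber, int psVcores, int amVcores) {
        return workers(workergroupNumber) * workerVcores
                + psNumber * psVcores
                + amVcores;
    }
}
```

For example, 10 workergroups with 4 tasks each give 40 tasks. Adding 4096 MB per worker and 5 PSs at 8192 MB each requests 81920 MB for workers and PSs together.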

Available Properties of RPC

| Property Name | Default | Meaning |
| --- | --- | --- |
| angel.am.heartbeat.interval.ms | 5000 | Interval (ms) between heartbeats sent from the Angel master to the YARN ResourceManager, to prevent connection timeout |
| angel.worker.heartbeat.interval.ms | 5000 | Interval (ms) between heartbeats sent from an Angel worker to the Angel master |
| angel.ps.heartbeat.interval.ms | 5000 | Interval (ms) between heartbeats sent from an Angel PS to the Angel master |
| angel.netty.matrixtransfer.client.sndbuf | 1048576 | Send buffer size (bytes) of the Netty client used for transferring matrix data. Matrix data between the PSAgent and the PS is transferred with the Netty framework; the PSAgent is the client, so the Netty client runs on the PSAgent side |
| angel.netty.matrixtransfer.client.rcvbuf | 1048576 | Receive buffer size (bytes) of the Netty client |
| angel.netty.matrixtransfer.client.eventgroup.threadnum | 24 | Number of threads of the Netty client (the number of threads used by its NioEventLoopGroup) |
| angel.netty.matrixtransfer.client.usedirectbuffer | true | If true, the Netty client uses direct buffers; if false, it uses heap buffers |
| angel.netty.matrixtransfer.server.sndbuf | 1048576 | Send buffer size (bytes) of the Netty server. Matrix data between the PSAgent and the PS is transferred with the Netty framework; the PS is the server |
| angel.netty.matrixtransfer.server.rcvbuf | 1048576 | Receive buffer size (bytes) of the Netty server |
| angel.netty.matrixtransfer.server.eventgroup.threadnum | 24 | Number of threads of the Netty server (the number of threads used by its NioEventLoopGroup) |
| angel.netty.matrixtransfer.server.usedirectbuffer | true | If true, the Netty server uses direct buffers; if false, it uses heap buffers |
| angel.netty.matrixtransfer.max.message.size | 104857600 | Maximum message size (bytes) supported by Netty matrix transfer |
| angel.matrixtransfer.max.requestnum.perserver | 4 | Maximum number of concurrent requests that the PSAgent sends to any single PS. It keeps too many concurrent requests from piling up on one PS, which would usually also delay requests to the other PSs |
| angel.matrixtransfer.max.requestnum | 64 | Maximum number of concurrent requests that the PSAgent sends to all PSs combined, used for flow control |
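
The last two rows describe two limits that apply together: one cap per PS and one cap across all PSs. The sketch below models them with `java.util.concurrent.Semaphore`. `RequestLimiter` is a hypothetical class, not Angel's PSAgent code. It assumes a request needs one permit from each pool.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical sketch, not Angel's PSAgent implementation: a request needs one
// permit from the global pool (angel.matrixtransfer.max.requestnum) and one
// from its target PS's pool (angel.matrixtransfer.max.requestnum.perserver).
final class RequestLimiter {
    private final Semaphore global;
    private final int perServerLimit;
    private final Map<Integer, Semaphore> perServer = new ConcurrentHashMap<>();

    RequestLimiter(int maxRequests, int maxRequestsPerServer) {
        this.global = new Semaphore(maxRequests);
        this.perServerLimit = maxRequestsPerServer;
    }

    private Semaphore serverPool(int psId) {
        return perServer.computeIfAbsent(psId, id -> new Semaphore(perServerLimit));
    }

    // Returns true if a request to psId may be sent now; the caller must call
    // release(psId) once the response has arrived.
    boolean tryAcquire(int psId) {
        Semaphore server = serverPool(psId);
        if (!server.tryAcquire()) {
            return false; // this PS already has too many requests in flight
        }
        if (!global.tryAcquire()) {
            server.release(); // give back the per-server permit
            return false;     // too many requests in flight overall
        }
        return true;
    }

    void release(int psId) {
        global.release();
        serverPool(psId).release();
    }
}
```

With the defaults (64 and 4), a fifth concurrent request to the same PS is held back, while requests to other PSs can still go out.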

Available Properties for System Availability

| Property Name | Default | Meaning |
| --- | --- | --- |
| angel.am.max-attempts | 2 | Maximum number of attempts of the master |
| angel.worker.max-attempts | 4 | Maximum number of attempts of a worker |
| angel.ps.max-attempts | 4 | Maximum number of attempts of a PS |
| angel.am.appstate.timeout.ms | 30000 | Maximum time (ms) the application may stay in a state other than RUNNING or COMMITTING before it exits |
| angel.am.wait.matrix.timeout.ms | 30000 | Timeout (ms) for the master to wait for the matrix configuration submitted by the client |
| angel.am.write.state.interval.ms | 10000 | Interval (ms) at which the master writes the job status to HDFS |
| angel.worker.heartbeat.interval.ms | 5000 | Interval (ms) between heartbeats sent from a worker to the master |
| angel.worker.heartbeat.timeout.ms | 60000 | How long (ms) the master waits for a heartbeat from a worker before timing out |
| angel.ps.heartbeat.interval.ms | 5000 | Interval (ms) between heartbeats sent from a PS to the master |
| angel.ps.heartbeat.timeout.ms | 60000 | How long (ms) the master waits for a heartbeat from a PS before timing out |
| angel.ps.backup.interval.ms | 60000 | Interval (ms) at which each PS writes its loaded parameters to HDFS; these periodic backups allow a PS to recover after it shuts down |
| angel.matrixtransfer.retry.interval.ms | 3000 | Interval (ms) between retries of a failed matrix transfer |
| angel.matrixtransfer.request.timeout.ms | 30000 | Timeout (ms) for a matrix transfer request |
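
A heartbeat interval and its timeout work as a pair. With the defaults (5000 ms and 60000 ms), a worker or PS can miss roughly 12 consecutive heartbeats before the master treats it as lost. The sketch below illustrates that check. `HeartbeatMonitor` is hypothetical and not Angel's master code. Timestamps are passed in explicitly so the logic is deterministic.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Angel's master implementation: tracks the last
// heartbeat of each worker/PS and reports those that exceeded the timeout.
final class HeartbeatMonitor {
    private final long timeoutMs;
    private final Map<String, Long> lastSeen = new HashMap<>();

    HeartbeatMonitor(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Records a heartbeat from nodeId at time nowMs.
    void onHeartbeat(String nodeId, long nowMs) {
        lastSeen.put(nodeId, nowMs);
    }

    // Nodes whose last heartbeat is older than the timeout. What happens next
    // (for example a new attempt, bounded by the max-attempts settings) is an
    // assumption and is not modeled here.
    Set<String> expired(long nowMs) {
        Set<String> lost = new HashSet<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMs - e.getValue() > timeoutMs) {
                lost.add(e.getKey());
            }
        }
        return lost;
    }

    // Number of consecutive heartbeats that can be missed before a timeout.
    static long missedHeartbeatsTolerated(long intervalMs, long timeoutMs) {
        return timeoutMs / intervalMs;
    }
}
```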

Available Properties for Log

| Property Name | Default | Meaning |
| --- | --- | --- |
| angel.am.log.level | INFO | Master log output level: DEBUG, INFO, WARN, ERROR |
| angel.worker.log.level | INFO | Worker log output level: DEBUG, INFO, WARN, ERROR |
| angel.ps.log.level | INFO | PS log output level: DEBUG, INFO, WARN, ERROR |

Available Properties for Cache

| Property Name | Default | Meaning |
| --- | --- | --- |
| angel.psagent.cache.sync.timeinterval.ms | 50 | Interval (ms) at which the PSAgent-side cache is updated. To prefetch matrix parameters, a cache of them is maintained on the PSAgent side and synchronized with the data on the PS side at this interval |
| angel.psagent.sync.policy.class | com.tencent.angel.psagent.matrix.cache.DefalutPolicy | Cache synchronization policy on the PSAgent side |
| angel.task.data.storage.level | memory_disk | Storage level for the training data preprocessed by a task; supports memory, memory_disk and disk. memory keeps all training data in memory and is recommended if there is enough memory. disk keeps all training data on local disk. memory_disk splits the data between memory and local disk, with the memory part limited by angel.task.memorystorage.max.mb |
| angel.task.memorystorage.max.mb | 1000 | Memory size (MB) used by each task for storing training data; only used when the storage level is memory_disk |
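
The last two rows determine how much of a task's training data stays in memory. The sketch below assumes that memory_disk fills memory up to angel.task.memorystorage.max.mb and spills the rest to local disk. The source only states that the limit applies to memory_disk, so the exact spill behavior is an assumption. `StorageLevel` and `StoragePlan` are hypothetical names, not Angel classes.

```java
// Hypothetical sketch: assumes memory_disk keeps training data in memory up to
// angel.task.memorystorage.max.mb and spills the remainder to local disk.
enum StorageLevel { MEMORY, MEMORY_DISK, DISK }

final class StoragePlan {
    final long memoryMb;
    final long diskMb;

    private StoragePlan(long memoryMb, long diskMb) {
        this.memoryMb = memoryMb;
        this.diskMb = diskMb;
    }

    // Parses a property value such as "memory", "memory_disk" or "disk".
    static StorageLevel parse(String value) {
        return StorageLevel.valueOf(value.trim().toUpperCase(java.util.Locale.ROOT));
    }

    // Splits dataMb of training data between memory and local disk.
    static StoragePlan of(StorageLevel level, long dataMb, long memoryLimitMb) {
        switch (level) {
            case MEMORY:
                return new StoragePlan(dataMb, 0);     // everything in memory
            case DISK:
                return new StoragePlan(0, dataMb);     // everything on disk
            case MEMORY_DISK:
            default:
                long inMemory = Math.min(dataMb, memoryLimitMb);
                return new StoragePlan(inMemory, dataMb - inMemory);
        }
    }
}
```

Under this assumption, with the default 1000 MB limit, a task holding 2500 MB of preprocessed data would keep 1000 MB in memory and write 1500 MB to disk.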