<!-- markdown-link-check-disable -->

Configuration

Preface

This document explains the DolphinScheduler application configurations.

Directory Structure

The directory structure of DolphinScheduler is as follows:

```
├── LICENSE
│
├── NOTICE
│
├── licenses                                    directory of licenses
│
├── bin                                         directory of DolphinScheduler application commands and configuration scripts
│   ├── dolphinscheduler-daemon.sh              script to start or shut down the DolphinScheduler application
│   ├── env                                     directory of scripts to load environment variables
│   │   ├── dolphinscheduler_env.sh             script to export environment variables (e.g. JAVA_HOME, HADOOP_HOME, HIVE_HOME ...) when you start or stop a service using script `dolphinscheduler-daemon.sh`
│
├── alert-server                                directory of DolphinScheduler alert-server commands, configuration scripts and libs
│   ├── bin
│   │   ├── start.sh                            script to start DolphinScheduler alert-server
│   │   └── jvm_args_env.sh                     script to set JVM args of DolphinScheduler alert-server
│   ├── conf
│   │   ├── application.yaml                    configurations of alert-server
│   │   ├── bootstrap.yaml                      configurations for Spring Cloud bootstrap; usually you don't need to modify this
│   │   ├── common.properties                   configurations of common-service like storage, credentials, etc.
│   │   ├── dolphinscheduler_env.sh             script to load environment variables for alert-server
│   │   └── logback-spring.xml                  configurations of alert-service log
│   └── libs                                    directory of alert-server libs
│
├── api-server                                  directory of DolphinScheduler api-server commands, configuration scripts and libs
│   ├── bin
│   │   ├── start.sh                            script to start DolphinScheduler api-server
│   │   └── jvm_args_env.sh                     script to set JVM args of DolphinScheduler api-server
│   ├── conf
│   │   ├── application.yaml                    configurations of api-server
│   │   ├── bootstrap.yaml                      configurations for Spring Cloud bootstrap; usually you don't need to modify this
│   │   ├── common.properties                   configurations of common-service like storage, credentials, etc.
│   │   ├── dolphinscheduler_env.sh             script to load environment variables for api-server
│   │   └── logback-spring.xml                  configurations of api-service log
│   ├── libs                                    directory of api-server libs
│   └── ui                                      directory of api-server related front-end web resources
│
├── master-server                               directory of DolphinScheduler master-server commands, configuration scripts and libs
│   ├── bin
│   │   ├── start.sh                            script to start DolphinScheduler master-server
│   │   └── jvm_args_env.sh                     script to set JVM args of DolphinScheduler master-server
│   ├── conf
│   │   ├── application.yaml                    configurations of master-server
│   │   ├── bootstrap.yaml                      configurations for Spring Cloud bootstrap; usually you don't need to modify this
│   │   ├── common.properties                   configurations of common-service like storage, credentials, etc.
│   │   ├── dolphinscheduler_env.sh             script to load environment variables for master-server
│   │   └── logback-spring.xml                  configurations of master-service log
│   └── libs                                    directory of master-server libs
│
├── standalone-server                           directory of DolphinScheduler standalone-server commands, configuration scripts and libs
│   ├── bin
│   │   ├── start.sh                            script to start DolphinScheduler standalone-server
│   │   └── jvm_args_env.sh                     script to set JVM args of DolphinScheduler standalone-server
│   ├── conf
│   │   ├── application.yaml                    configurations of standalone-server
│   │   ├── bootstrap.yaml                      configurations for Spring Cloud bootstrap; usually you don't need to modify this
│   │   ├── common.properties                   configurations of common-service like storage, credentials, etc.
│   │   ├── dolphinscheduler_env.sh             script to load environment variables for standalone-server
│   │   ├── logback-spring.xml                  configurations of standalone-service log
│   │   └── sql                                 .sql files to create or upgrade DolphinScheduler metadata
│   ├── libs                                    directory of standalone-server libs
│   └── ui                                      directory of standalone-server related front-end web resources
│
├── tools                                       directory of DolphinScheduler metadata tools commands, configuration scripts and libs
│   ├── bin
│   │   └── upgrade-schema.sh                   script to initialize or upgrade DolphinScheduler metadata
│   ├── conf
│   │   ├── application.yaml                    configurations of tools
│   │   └── common.properties                   configurations of common-service like storage, credentials, etc.
│   ├── libs                                    directory of tool libs
│   └── sql                                     .sql files to create or upgrade DolphinScheduler metadata
│
├── worker-server                               directory of DolphinScheduler worker-server commands, configuration scripts and libs
│   ├── bin
│   │   ├── start.sh                            script to start DolphinScheduler worker-server
│   │   └── jvm_args_env.sh                     script to set JVM args of DolphinScheduler worker-server
│   ├── conf
│   │   ├── application.yaml                    configurations of worker-server
│   │   ├── bootstrap.yaml                      configurations for Spring Cloud bootstrap; usually you don't need to modify this
│   │   ├── common.properties                   configurations of common-service like storage, credentials, etc.
│   │   ├── dolphinscheduler_env.sh             script to load environment variables for worker-server
│   │   └── logback-spring.xml                  configurations of worker-service log
│   └── libs                                    directory of worker-server libs
│
└── ui                                          directory of front-end web resources
```

Configurations in Detail

dolphinscheduler-daemon.sh [start up or shut down the DolphinScheduler application]

dolphinscheduler-daemon.sh is responsible for starting and stopping DolphinScheduler services. start-all.sh and stop-all.sh start and stop the cluster by calling dolphinscheduler-daemon.sh. DolphinScheduler ships with only a basic JVM configuration, so tune the JVM options to the resources available on your hosts.

Default simplified parameters are:

```bash
export DOLPHINSCHEDULER_OPTS="
-server
-Xmx16g
-Xms1g
-Xss512k
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:+UseFastAccessorMethods
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=70
"
```

"-XX:+DisableExplicitGC" is not recommended because it may lead to a memory leak (DolphinScheduler depends on Netty for communication). Adding "-Djava.net.preferIPv6Addresses=true" makes the JVM prefer IPv6 addresses, and adding "-Djava.net.preferIPv4Stack=true" makes it use IPv4. If neither parameter is set, either IPv4 or IPv6 may be used.

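As an illustration of tuning these options per host, the sketch below assembles a DOLPHINSCHEDULER_OPTS value from a heap size and an IP stack preference. The helper build_jvm_opts and its arguments are hypothetical, the heap sizes are placeholders rather than recommendations, and the IP flags assume the standard Java networking properties java.net.preferIPv4Stack and java.net.preferIPv6Addresses.

```bash
# Hypothetical helper (not part of DolphinScheduler): build a JVM options string.
# $1 = max heap (default 4g), $2 = initial heap (default 1g), $3 = ipv4 | ipv6 | auto
build_jvm_opts() {
  local heap_max="${1:-4g}"
  local heap_min="${2:-1g}"
  local ip_stack="${3:-auto}"
  local opts="-server -Xmx${heap_max} -Xms${heap_min} -Xss512k"

  case "$ip_stack" in
    ipv4) opts="$opts -Djava.net.preferIPv4Stack=true" ;;
    ipv6) opts="$opts -Djava.net.preferIPv6Addresses=true" ;;
    auto) ;;  # leave the choice to the JVM and the operating system
    *)
      echo "unknown ip stack: $ip_stack" >&2
      return 1
      ;;
  esac

  printf '%s\n' "$opts"
}

# Example values only; size the heap to the memory actually available.
export DOLPHINSCHEDULER_OPTS="$(build_jvm_opts 8g 2g ipv4)"
echo "$DOLPHINSCHEDULER_OPTS"
```
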
DolphinScheduler uses Spring Hikari to manage database connections. The configuration files are located at:

| Service | Configuration file |
|---|---|
| Master Server | master-server/conf/application.yaml |
| Api Server | api-server/conf/application.yaml |
| Worker Server | worker-server/conf/application.yaml |
| Alert Server | alert-server/conf/application.yaml |

The default configuration is as follows:

| Parameters | Default value | Description |
|---|---|---|
| spring.datasource.driver-class-name | org.postgresql.Driver | datasource driver |
| spring.datasource.url | jdbc:postgresql://127.0.0.1:5432/dolphinscheduler | datasource connection url |
| spring.datasource.username | root | datasource username |
| spring.datasource.password | root | datasource password |
| spring.datasource.hikari.connection-test-query | select 1 | SQL statement run to validate a connection |
| spring.datasource.hikari.minimum-idle | 5 | minimum number of idle connections in the pool |
| spring.datasource.hikari.auto-commit | true | whether to auto-commit |
| spring.datasource.hikari.pool-name | DolphinScheduler | name of the connection pool |
| spring.datasource.hikari.maximum-pool-size | 50 | maximum number of connections in the pool |
| spring.datasource.hikari.connection-timeout | 30000 | connection timeout (milliseconds) |
| spring.datasource.hikari.idle-timeout | 600000 | maximum idle connection survival time (milliseconds) |
| spring.datasource.hikari.leak-detection-threshold | 0 | connection leak detection threshold (milliseconds) |
| spring.datasource.hikari.initialization-fail-timeout | 1 | connection pool initialization failure timeout (milliseconds) |

Note that DolphinScheduler also supports database configuration through bin/env/dolphinscheduler_env.sh.
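
One way to picture the environment-based configuration is through Spring Boot's relaxed binding, which maps a property name to an environment variable by replacing dots with underscores, removing dashes and upper-casing the result. The sketch below assumes that convention applies here; the helper to_env_name is hypothetical, and the exported values are simply the defaults from the table above.

```bash
# Hypothetical helper: derive the environment variable name that Spring Boot's
# relaxed binding associates with a property name.
to_env_name() {
  printf '%s\n' "$1" \
    | sed -e 's/-//g' -e 's/\./_/g' \
    | tr '[:lower:]' '[:upper:]'
}

to_env_name spring.datasource.url                       # prints SPRING_DATASOURCE_URL
to_env_name spring.datasource.hikari.maximum-pool-size  # prints SPRING_DATASOURCE_HIKARI_MAXIMUMPOOLSIZE

# Assumed usage in bin/env/dolphinscheduler_env.sh, with the default values listed above.
export SPRING_DATASOURCE_URL="jdbc:postgresql://127.0.0.1:5432/dolphinscheduler"
export SPRING_DATASOURCE_USERNAME="root"
export SPRING_DATASOURCE_PASSWORD="root"
```

Values set this way are intended to take precedence over application.yaml, so it is worth keeping only one of the two sources authoritative to avoid confusion.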

DolphinScheduler uses ZooKeeper for cluster management, fault tolerance, event monitoring and other functions. The configuration files are located at:

| Service | Configuration file |
|---|---|
| Master Server | master-server/conf/application.yaml |
| Api Server | api-server/conf/application.yaml |
| Worker Server | worker-server/conf/application.yaml |

The default configuration is as follows:

| Parameters | Default value | Description |
|---|---|---|
| registry.zookeeper.namespace | dolphinscheduler | namespace of ZooKeeper |
| registry.zookeeper.connect-string | localhost:2181 | the connection string of ZooKeeper |
| registry.zookeeper.retry-policy.base-sleep-time | 60ms | time to wait between subsequent retries |
| registry.zookeeper.retry-policy.max-sleep | 300ms | maximum time to wait between subsequent retries |
| registry.zookeeper.retry-policy.max-retries | 5 | maximum number of retries |
| registry.zookeeper.session-timeout | 30s | session timeout |
| registry.zookeeper.connection-timeout | 30s | connection timeout |
| registry.zookeeper.block-until-connected | 600ms | waiting time to block until the connection succeeds |
| registry.zookeeper.digest | {username}:{password} | digest used to access znodes in ZooKeeper; works only when ACL is enabled. For more details, see the [Apache ZooKeeper doc](https://zookeeper.apache.org/doc/r3.4.14/zookeeperAdmin.html) |

Note that DolphinScheduler also supports zookeeper related configuration through bin/env/dolphinscheduler_env.sh.
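
Before exporting a connection string, a quick format check can catch typos early. The sketch below is a simplified illustration: is_valid_connect_string is a hypothetical helper that accepts only comma-separated host:port pairs (no chroot suffix, no IPv6 literals), and REGISTRY_ZOOKEEPER_CONNECT_STRING is assumed to be the environment form of registry.zookeeper.connect-string.

```bash
# Hypothetical helper: accept comma-separated host:port pairs such as "zk1:2181,zk2:2181".
is_valid_connect_string() {
  printf '%s\n' "$1" \
    | grep -Eq '^[A-Za-z0-9._-]+:[0-9]+(,[A-Za-z0-9._-]+:[0-9]+)*$'
}

# Assumed variable name; falls back to the default from the table above.
export REGISTRY_ZOOKEEPER_CONNECT_STRING="${REGISTRY_ZOOKEEPER_CONNECT_STRING:-localhost:2181}"

if is_valid_connect_string "$REGISTRY_ZOOKEEPER_CONNECT_STRING"; then
  echo "connect string looks well-formed: $REGISTRY_ZOOKEEPER_CONNECT_STRING"
else
  echo "malformed connect string: $REGISTRY_ZOOKEEPER_CONNECT_STRING" >&2
fi
```

The check validates the format only; it does not confirm that the ZooKeeper ensemble is reachable.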

For ETCD Registry, please see more details on link. For JDBC Registry, please see more details on link.

common.properties [Hadoop, S3, YARN config properties]

Currently, common.properties mainly holds the Hadoop- and S3A-related configuration. The configuration files are located at:

| Service | Configuration file |
|---|---|
| Master Server | master-server/conf/common.properties |
| Api Server | api-server/conf/common.properties, api-server/conf/aws.yaml |
| Worker Server | worker-server/conf/common.properties, worker-server/conf/aws.yaml |
| Alert Server | alert-server/conf/common.properties |

The default configuration is as follows:

| Parameters | Default value | Description |
|---|---|---|
| data.basedir.path | /tmp/dolphinscheduler | local directory used to store temp files |
| resource.storage.type | NONE | type of resource storage: HDFS, S3, OSS, GCS, ABS, NONE |
| resource.upload.path | /dolphinscheduler | storage path of resource files |
| hdfs.root.user | hdfs | user with the corresponding permissions if the storage type is HDFS |
| fs.defaultFS | hdfs://mycluster:8020 | If resource.storage.type=S3, the request url is similar to 's3a://dolphinscheduler'. Otherwise, if resource.storage.type=HDFS and Hadoop supports HA, copy core-site.xml and hdfs-site.xml into the 'conf' directory |
| hadoop.security.authentication.startup.state | false | whether Hadoop grants Kerberos permission |
| java.security.krb5.conf.path | /opt/krb5.conf | Kerberos config directory |
| login.user.keytab.username | [email protected] | Kerberos username |
| login.user.keytab.path | /opt/hdfs.headless.keytab | Kerberos user keytab |
| kerberos.expire.time | 2 | Kerberos expire time, an integer in hours |
| yarn.resourcemanager.ha.rm.ids | 192.168.xx.xx,192.168.xx.xx | the YARN ResourceManager addresses. If ResourceManager supports HA, enter the HA IP addresses (separated by commas); enter null for standalone |
| yarn.application.status.address | http://ds1:8088/ws/v1/cluster/apps/%s | keep the default if ResourceManager supports HA or is not used; replace ds1 with the corresponding hostname if ResourceManager runs in standalone mode |
| development.state | false | whether in development state |
| dolphin.scheduler.network.interface.preferred | NONE | display name of the network card to use |
| dolphin.scheduler.network.interface.restrict | docker0 | display name of the network card that must not be used |
| dolphin.scheduler.network.priority.strategy | default | IP acquisition strategy: give priority to finding the internal network or the external network |
| resource.manager.httpaddress.port | 8088 | the port of the resource manager |
| yarn.job.history.status.address | http://ds1:19888/ws/v1/history/mapreduce/jobs/%s | job history status url of YARN |
| datasource.encryption.enable | false | whether to enable datasource encryption |
| datasource.encryption.salt | !@#$%^&* | the salt of the datasource encryption |
| support.hive.oneSession | false | whether Hive SQL is executed in the same session |
| sudo.enable | true | whether to enable sudo |
| zeppelin.rest.url | http://localhost:8080 | the RESTful API url of Zeppelin |
| appId.collect | log | way to collect the applicationId. To use AOP, change the value from log to aop and uncomment the applicationId auto-collection configuration in bin/env/dolphinscheduler_env.sh. Note: the AOP way does not support submitting YARN jobs on a remote host in client mode (such as Beeline), and it fails if the applicationId collection-related environment configuration in dolphinscheduler_env.sh is overridden |
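
When checking which values a service will actually pick up, it can help to read a key straight out of common.properties. The sketch below is a minimal illustration: get_property is a hypothetical helper that handles plain key=value lines and comment lines only (no line continuations or escapes), and the sample file is a temporary excerpt built from the defaults above rather than a real installation file.

```bash
# Hypothetical helper: print the value of a key from a .properties file.
# $1 = file, $2 = key. Returns non-zero if the key is absent.
get_property() {
  local file="$1"
  local key="$2"
  awk -v k="$key" '
    /^[ \t]*[#!]/ { next }            # skip comment lines
    index($0, "=") == 0 { next }      # skip lines without a key=value pair
    {
      name = substr($0, 1, index($0, "=") - 1)
      value = substr($0, index($0, "=") + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      gsub(/^[ \t]+|[ \t]+$/, "", value)
      if (name == k) { result = value; found = 1 }   # the last occurrence wins
    }
    END { if (!found) exit 1; print result }
  ' "$file"
}

# Temporary excerpt for demonstration, using defaults from the table above.
demo_file="$(mktemp)"
trap 'rm -f "$demo_file"' EXIT
cat > "$demo_file" <<'EOF'
# excerpt of common.properties
data.basedir.path=/tmp/dolphinscheduler
resource.storage.type=NONE
resource.upload.path=/dolphinscheduler
EOF

get_property "$demo_file" resource.storage.type   # prints NONE
```

In practice the first argument would point at the common.properties file of the service being inspected, for example the one under api-server/conf.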

Location: api-server/conf/application.yaml

| Parameters | Default value | Description |
|---|---|---|
| server.port | 12345 | api service communication port |
| server.servlet.session.timeout | 120m | session timeout |
| server.servlet.context-path | /dolphinscheduler/ | request path |
| spring.servlet.multipart.max-file-size | 1024MB | maximum file size |
| spring.servlet.multipart.max-request-size | 1024MB | maximum request size |
| server.jetty.max-http-post-size | 5000000 | jetty maximum post size |
| spring.banner.charset | UTF-8 | message encoding |
| spring.jackson.time-zone | UTC | time zone |
| spring.jackson.date-format | "yyyy-MM-dd HH:mm:ss" | time format |
| spring.messages.basename | i18n/messages | i18n config |
| security.authentication.type | PASSWORD | authentication type |
| security.authentication.ldap.user.admin | read-only-admin | admin user account when you log in with LDAP |
| security.authentication.ldap.urls | ldap://ldap.forumsys.com:389/ | LDAP urls |
| security.authentication.ldap.base.dn | dc=example,dc=com | LDAP base dn |
| security.authentication.ldap.username | cn=read-only-admin,dc=example,dc=com | LDAP username |
| security.authentication.ldap.password | password | LDAP password |
| security.authentication.ldap.user.identity-attribute | uid | LDAP user identity attribute |
| security.authentication.ldap.user.email-attribute | mail | LDAP user email attribute |
| security.authentication.ldap.user.not-exist-action | CREATE | action when the LDAP user does not exist. Default value: CREATE. Optional values: CREATE, DENY |
| security.authentication.ldap.ssl.enable | false | LDAP ssl switch |
| security.authentication.ldap.ssl.trust-store | ldapkeystore.jks | LDAP jks file absolute path |
| security.authentication.ldap.ssl.trust-store-password | password | LDAP jks password |
| security.authentication.casdoor.user.admin | | admin user account when you log in with Casdoor |
| casdoor.endpoint | | Casdoor server url |
| casdoor.client-id | | id in Casdoor |
| casdoor.client-secret | | secret in Casdoor |
| casdoor.certificate | | certificate in Casdoor |
| casdoor.organization-name | | organization name in Casdoor |
| casdoor.application-name | | application name in Casdoor |
| casdoor.redirect-url | | DolphinScheduler login url |
| api.traffic.control.global.switch | false | traffic control global switch |
| api.traffic.control.max-global-qps-rate | 300 | global max request number per second |
| api.traffic.control.tenant-switch | false | traffic control tenant switch |
| api.traffic.control.default-tenant-qps-rate | 10 | default tenant max request number per second |
| api.traffic.control.customize-tenant-qps-rate | | customize tenant max request number per second |
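
To see how server.port and server.servlet.context-path combine, the sketch below builds the base URL that clients would address. The helper api_base_url is hypothetical, the host name is a placeholder, and plain HTTP is assumed.

```bash
# Hypothetical helper: compose the API base URL from host, port and context path.
# Defaults follow the table above (port 12345, context path /dolphinscheduler/).
api_base_url() {
  local host="$1"
  local port="${2:-12345}"
  local context_path="${3:-/dolphinscheduler/}"
  local trimmed

  # Normalise the context path so it starts and ends with exactly one slash.
  trimmed="$(printf '%s' "$context_path" | sed -e 's#^/*##' -e 's#/*$##')"
  if [ -n "$trimmed" ]; then
    context_path="/$trimmed/"
  else
    context_path="/"
  fi

  printf 'http://%s:%s%s\n' "$host" "$port" "$context_path"
}

api_base_url localhost   # prints http://localhost:12345/dolphinscheduler/
```

If either setting is changed in api-server/conf/application.yaml, every client URL and reverse-proxy rule that points at the API server needs the same change.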

Location: master-server/conf/application.yaml

| Parameters | Default value | Description |
|---|---|---|
| master.listen-port | 5678 | master listen port |
| master.logic-task-config.task-executor-thread-count | 2 * CPU + 1 | the number of threads used to execute logic tasks |
| master.worker-load-balancer-configuration-properties.type | DYNAMIC_WEIGHTED_ROUND_ROBIN | Master uses each worker's CPU, memory and thread pool usage to calculate the worker load; a worker with a lower load has a higher chance of being dispatched tasks |
| master.max-heartbeat-interval | 10s | master max heartbeat interval |
| master.server-load-protection.enabled | true | if set to true, master overload protection is enabled |
| master.server-load-protection.max-system-cpu-usage-percentage-thresholds | 0.8 | Master max system CPU usage. When the master's system CPU usage is smaller than this value, the master server can execute workflows |
| master.server-load-protection.max-jvm-cpu-usage-percentage-thresholds | 0.8 | Master max JVM CPU usage. When the master's JVM CPU usage is smaller than this value, the master server can execute workflows |
| master.server-load-protection.max-system-memory-usage-percentage-thresholds | 0.8 | Master max system memory usage. When the master's system memory usage is smaller than this value, the master server can execute workflows |
| master.server-load-protection.max-disk-usage-percentage-thresholds | 0.8 | Master max disk usage. When the master's disk usage is smaller than this value, the master server can execute workflows |
| master.server-load-protection.max-concurrent-workflow-instances | 2147483647 | Master max concurrent workflow instances. When the master's workflow instance count reaches or exceeds this value, the master server is marked as busy |
| master.server-load-protection.max-workflow-instance-runtime | 0m | Maximum allowed running time for a workflow instance. If the running duration exceeds this value, the instance is killed. The default value of 0m indicates no limit; the minimum value is 1m |
| master.server-load-protection.max-task-instance-runtime | 0m | Maximum allowed running time for a task instance. If the running duration exceeds this value, the instance is killed. The default value of 0m indicates no limit; the minimum value is 1m |
| master.worker-group-refresh-interval | 10s | the interval at which worker groups are refreshed from the database into memory |
| master.command-fetch-strategy.type | ID_SLOT_BASED | the command fetch strategy; only ID_SLOT_BASED is supported |
| master.command-fetch-strategy.config.id-step | 1 | the auto-increment step of the id of t_ds_command in the database |
| master.command-fetch-strategy.config.fetch-size | 10 | the number of commands fetched by the master |
| master.task-dispatch-policy.dispatch-timeout-enabled | false | whether the dispatch timeout checking mechanism is enabled |
| master.task-dispatch-policy.max-task-dispatch-duration | 1h | the maximum duration a task may wait in the dispatch queue before being assigned to a worker |
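
The load-protection thresholds can be read as a simple admission check: work is accepted only while every usage figure stays below its threshold. The sketch below mirrors that wording for illustration; can_accept_work and its arguments are hypothetical, a single shared threshold is assumed for brevity, and this is not the master's actual implementation.

```bash
# Hypothetical check: succeed only if every usage value is below the threshold.
# $1..$4 = system CPU, JVM CPU, system memory and disk usage as fractions (0 to 1)
# $5     = threshold shared by all four metrics (default 0.8, as in the table above)
can_accept_work() {
  awk -v cpu="$1" -v jvm="$2" -v mem="$3" -v disk="$4" -v max="${5:-0.8}" 'BEGIN {
    ok = (cpu + 0 < max + 0) && (jvm + 0 < max + 0) && (mem + 0 < max + 0) && (disk + 0 < max + 0)
    exit(ok ? 0 : 1)
  }'
}

if can_accept_work 0.55 0.30 0.62 0.41; then
  echo "below all thresholds: workflows can be executed"
else
  echo "overloaded: the server would be treated as busy"
fi
```

Note that a usage value equal to the threshold already counts as overloaded here, because the description requires usage to be smaller than the configured value.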

Location: worker-server/conf/application.yaml

| Parameters | Default value | Description |
|---|---|---|
| worker.listen-port | 1234 | worker-service listen port |
| worker.max-heartbeat-interval | 10s | worker-service max heartbeat interval |
| worker.host-weight | 100 | worker host weight used when dispatching tasks |
| worker.server-load-protection.enabled | true | if set to true, worker overload protection is enabled |
| worker.server-load-protection.max-system-cpu-usage-percentage-thresholds | 0.8 | Worker max system CPU usage. When the worker's system CPU usage is smaller than this value, the worker server can be dispatched tasks |
| worker.server-load-protection.max-jvm-cpu-usage-percentage-thresholds | 0.8 | Worker max JVM CPU usage. When the worker's JVM CPU usage is smaller than this value, the worker server can be dispatched tasks |
| worker.server-load-protection.max-system-memory-usage-percentage-thresholds | 0.8 | Worker max system memory usage. When the worker's system memory usage is smaller than this value, the worker server can be dispatched tasks |
| worker.server-load-protection.max-disk-usage-percentage-thresholds | 0.8 | Worker max disk usage. When the worker's disk usage is smaller than this value, the worker server can be dispatched tasks |
| worker.registry-disconnect-strategy.strategy | stop | used when the worker disconnects from the registry. Default value: stop. Optional values: stop, waiting |
| worker.registry-disconnect-strategy.max-waiting-time | 100s | used when the worker disconnects from the registry and the disconnect strategy is waiting. The worker waits up to this long to reconnect to the registry and stops itself if it still cannot connect afterwards. If the value is 0s, it waits indefinitely |
| worker.physical-task-config.task-executor-thread-size | 100 | the number of threads used to execute physical tasks |
| worker.tenant-config.auto-create-tenant-enabled | true | A tenant corresponds to a system user, which the worker uses to submit jobs. If the system does not have this user, it is created automatically when this parameter is true |
| worker.tenant-config.default-tenant-enabled | false | if set to true, the worker bootstrap user is used as the tenant to execute tasks when the tenant is default |
Location: alert-server/conf/application.yaml

| Parameters | Default value | Description |
|---|---|---|
| server.port | 50053 | the port of Alert Server |
| alert.port | 50052 | the port of alert |

This part describes the Quartz configuration. Adjust it to your environment and available resources.

| Service | Configuration file |
|---|---|
| Master Server | master-server/conf/application.yaml |
| Api Server | api-server/conf/application.yaml |

The default configuration is as follows:

| Parameters | Default value |
|---|---|
| spring.quartz.properties.org.quartz.jobStore.isClustered | true |
| spring.quartz.properties.org.quartz.jobStore.class | org.springframework.scheduling.quartz.LocalDataSourceJobStore |
| spring.quartz.properties.org.quartz.scheduler.instanceId | AUTO |
| spring.quartz.properties.org.quartz.jobStore.tablePrefix | QRTZ_ |
| spring.quartz.properties.org.quartz.jobStore.acquireTriggersWithinLock | true |
| spring.quartz.properties.org.quartz.scheduler.instanceName | DolphinScheduler |
| spring.quartz.properties.org.quartz.jobStore.useProperties | false |
| spring.quartz.properties.org.quartz.jobStore.misfireThreshold | 60000 |
| spring.quartz.properties.org.quartz.scheduler.makeSchedulerThreadDaemon | true |
| spring.quartz.properties.org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.PostgreSQLDelegate |
| spring.quartz.properties.org.quartz.jobStore.clusterCheckinInterval | 5000 |

The above configuration items are the same in Master Server and Api Server, but their Quartz Scheduler thread pool configurations differ.

The default quartz threadpool configuration in Master Server is as follows:

| Parameters | Default value |
|---|---|
| spring.quartz.properties.org.quartz.threadPool.makeThreadsDaemons | true |
| spring.quartz.properties.org.quartz.threadPool.threadCount | 25 |
| spring.quartz.properties.org.quartz.threadPool.threadPriority | 5 |
| spring.quartz.properties.org.quartz.threadPool.class | org.quartz.simpl.SimpleThreadPool |

Since Api Server acts only as a client and does not start a Quartz Scheduler instance, its thread pool is configured as QuartzZeroSizeThreadPool, which has zero threads. The default configuration is as follows:

| Parameters | Default value |
|---|---|
| spring.quartz.properties.org.quartz.threadPool.class | org.apache.dolphinscheduler.scheduler.quartz.QuartzZeroSizeThreadPool |

dolphinscheduler_env.sh [load environment variables configs]

When tasks are submitted through the shell, DolphinScheduler exports environment variables from bin/env/dolphinscheduler_env.sh. The main configuration includes JAVA_HOME and other environment paths.

```bash
# JAVA_HOME, will use it to start DolphinScheduler server
export JAVA_HOME=${JAVA_HOME:-/opt/soft/java}

# Tasks related configurations, need to change the configuration if you use the related tasks.
export HADOOP_HOME=${HADOOP_HOME:-/opt/soft/hadoop}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/opt/soft/hadoop/etc/hadoop}
export SPARK_HOME=${SPARK_HOME:-/opt/soft/spark}
export PYTHON_LAUNCHER=${PYTHON_LAUNCHER:-/opt/soft/python/bin/python3}
export HIVE_HOME=${HIVE_HOME:-/opt/soft/hive}
export FLINK_HOME=${FLINK_HOME:-/opt/soft/flink}
export DATAX_LAUNCHER=${DATAX_LAUNCHER:-/opt/soft/datax/bin/datax.py}

export PATH=$HADOOP_HOME/bin:$SPARK_HOME/bin:$PYTHON_LAUNCHER:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_LAUNCHER:$PATH

# applicationId auto collection related configuration, the following configurations are unnecessary if setting appId.collect=log
export HADOOP_CLASSPATH=`hadoop classpath`:${DOLPHINSCHEDULER_HOME}/tools/libs/*
export SPARK_DIST_CLASSPATH=$HADOOP_CLASSPATH:$SPARK_DIST_CLASSPATH
export HADOOP_CLIENT_OPTS="-javaagent:${DOLPHINSCHEDULER_HOME}/tools/libs/aspectjweaver-1.9.7.jar":$HADOOP_CLIENT_OPTS
export SPARK_SUBMIT_OPTS="-javaagent:${DOLPHINSCHEDULER_HOME}/tools/libs/aspectjweaver-1.9.7.jar":$SPARK_SUBMIT_OPTS
export FLINK_ENV_JAVA_OPTS="-javaagent:${DOLPHINSCHEDULER_HOME}/tools/libs/aspectjweaver-1.9.7.jar":$FLINK_ENV_JAVA_OPTS
```
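
Each path in the script uses the shell's `${VAR:-default}` expansion, which keeps a value that is already set in the environment and otherwise falls back to the default after `:-`. The sketch below demonstrates the three cases; DEMO_HOME and resolve_demo_home are hypothetical names used only for this illustration.

```bash
# DEMO_HOME is a hypothetical variable; it is not read by DolphinScheduler.
resolve_demo_home() {
  printf '%s\n' "${DEMO_HOME:-/opt/soft/demo}"
}

unset DEMO_HOME
resolve_demo_home            # unset: prints /opt/soft/demo

DEMO_HOME=""
resolve_demo_home            # empty: prints /opt/soft/demo as well

DEMO_HOME="/usr/lib/demo"
resolve_demo_home            # set: prints /usr/lib/demo
```

This means a value exported before the script runs takes precedence, so the defaults only need editing on hosts where the variable is not already defined.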

The log configuration files are located at:

| Service | Configuration file |
|---|---|
| Master Server | master-server/conf/logback-spring.xml |
| Api Server | api-server/conf/logback-spring.xml |
| Worker Server | worker-server/conf/logback-spring.xml |
| Alert Server | alert-server/conf/logback-spring.xml |