Aliyun EMR Serverless Spark

docs/docs/en/guide/task/aliyun-serverless-spark.md


Introduction

The Aliyun EMR Serverless Spark task plugin submits Spark jobs to the Aliyun EMR Serverless Spark service.

Create Connections

  • Click Datasource -> Create Datasource -> ALIYUN_SERVERLESS_SPARK to create a connection.

  • Fill in the Datasource Name, Access Key Id, Access Key Secret, and Region Id, then click Confirm.

Create Tasks

  • Click Project -> Workflow Definition -> Create Workflow and drag the ALIYUN_SERVERLESS_SPARK task to the canvas.

  • Fill in the task parameters and click Confirm to create the task node.

Task Parameters

| Parameters | Description |
|---|---|
| Datasource types | The type of datasource the task uses; should be ALIYUN_SERVERLESS_SPARK. |
| Datasource instances | The instance of the ALIYUN_SERVERLESS_SPARK datasource. |
| workspace id | The Aliyun Serverless Spark workspace id. |
| resource queue id | The Aliyun Serverless Spark resource queue used to submit the Spark job. |
| code type | The Aliyun Serverless Spark code type; one of JAR, PYTHON, or SQL. |
| job name | The Aliyun Serverless Spark job name. |
| entry point | The location of the job code, such as a jar package, Python file, or SQL file. OSS locations are supported. |
| entry point arguments | Arguments passed to the job's main program. |
| spark submit parameters | spark-submit-style parameters. |
| engine release version | The Spark engine release version. |
| is production | Whether the Spark job runs in the production or development environment. |

Examples

Submit Jar tasks

| Parameters | Example Values / Operations |
|---|---|
| region id | cn-hangzhou |
| access key id | <your-access-key-id> |
| access key secret | <your-access-key-secret> |
| resource queue id | root_queue |
| code type | JAR |
| job name | ds-emr-spark-jar |
| entry point | oss://datadev-oss-hdfs-test/spark-resource/examples/jars/spark-examples_2.12-3.3.1.jar |
| entry point arguments | 100 |
| spark submit parameters | --class org.apache.spark.examples.SparkPi --conf spark.executor.cores=4 --conf spark.executor.memory=20g --conf spark.driver.cores=4 --conf spark.driver.memory=8g --conf spark.executor.instances=1 |
| engine release version | esr-2.1-native (Spark 3.3.1, Scala 2.12, Native Runtime) |
| is production | Turn on the switch |
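
The `spark submit parameters` field is a single space-separated string. If you generate it from structured values, a small helper like the following (a hypothetical sketch, not part of the plugin) keeps the flags consistent; the class name and `--conf` values below come from the example above.

```python
from typing import Optional


def build_spark_submit_params(main_class: Optional[str], conf: dict) -> str:
    """Assemble a spark-submit-style parameter string from structured values.

    Hypothetical helper for illustration only; the plugin simply accepts
    the resulting string in its 'spark submit parameters' field.
    """
    parts = []
    if main_class:
        parts.append(f"--class {main_class}")
    # Each conf entry becomes a separate --conf key=value flag.
    parts.extend(f"--conf {key}={value}" for key, value in conf.items())
    return " ".join(parts)


params = build_spark_submit_params(
    "org.apache.spark.examples.SparkPi",
    {
        "spark.executor.cores": 4,
        "spark.executor.memory": "20g",
        "spark.driver.cores": 4,
        "spark.driver.memory": "8g",
        "spark.executor.instances": 1,
    },
)
```

The resulting `params` string matches the `spark submit parameters` value shown in the table above.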

Submit SQL tasks

| Parameters | Example Values / Operations |
|---|---|
| region id | cn-hangzhou |
| access key id | <your-access-key-id> |
| access key secret | <your-access-key-secret> |
| resource queue id | root_queue |
| code type | SQL |
| job name | ds-emr-spark-sql-1 |
| entry point | Any non-empty string |
| entry point arguments | -e#show tables;show tables; |
| spark submit parameters | --class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver --conf spark.executor.cores=4 --conf spark.executor.memory=20g --conf spark.driver.cores=4 --conf spark.driver.memory=8g --conf spark.executor.instances=1 |
| engine release version | esr-2.1-native (Spark 3.3.1, Scala 2.12, Native Runtime) |
| is production | Turn on the switch |
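
Judging by the examples, the entry point argument packs a CLI flag and its payload into one string, with `#` acting as the separator: `-e#<inline SQL>` for inline statements and `-f#<OSS path>` for a SQL file. Assuming that interpretation, splitting such an argument is a one-liner:

```python
def split_sql_arg(arg: str):
    """Split an entry-point argument like '-e#show tables;' into (flag, value).

    Assumes '#' separates the flag from its payload, as the SQL examples
    in this guide suggest; this is an illustrative sketch, not plugin code.
    """
    flag, _, value = arg.partition("#")
    return flag, value


flag, sql = split_sql_arg("-e#show tables;show tables;")
```

Here `flag` is `-e` and `sql` is the inline statement list; the `-f#oss://...` form in the next example splits the same way.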

Submit SQL tasks located in OSS

| Parameters | Example Values / Operations |
|---|---|
| region id | cn-hangzhou |
| access key id | <your-access-key-id> |
| access key secret | <your-access-key-secret> |
| resource queue id | root_queue |
| code type | SQL |
| job name | ds-emr-spark-sql-2 |
| entry point | Any non-empty string |
| entry point arguments | -f#oss://datadev-oss-hdfs-test/spark-resource/examples/sql/show_db.sql |
| spark submit parameters | --class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver --conf spark.executor.cores=4 --conf spark.executor.memory=20g --conf spark.driver.cores=4 --conf spark.driver.memory=8g --conf spark.executor.instances=1 |
| engine release version | esr-2.1-native (Spark 3.3.1, Scala 2.12, Native Runtime) |
| is production | Turn on the switch |

Submit PySpark Tasks

| Parameters | Example Values / Operations |
|---|---|
| region id | cn-hangzhou |
| access key id | <your-access-key-id> |
| access key secret | <your-access-key-secret> |
| resource queue id | root_queue |
| code type | PYTHON |
| job name | ds-emr-spark-python |
| entry point | oss://datadev-oss-hdfs-test/spark-resource/examples/src/main/python/pi.py |
| entry point arguments | 100 |
| spark submit parameters | --conf spark.executor.cores=4 --conf spark.executor.memory=20g --conf spark.driver.cores=4 --conf spark.driver.memory=8g --conf spark.executor.instances=1 |
| engine release version | esr-2.1-native (Spark 3.3.1, Scala 2.12, Native Runtime) |
| is production | Turn on the switch |
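
The `pi.py` entry point above is Spark's stock Monte Carlo pi example, and the `100` argument is its partition count. A pure-Python sketch of the sampling logic it distributes across executors (no Spark required, seeded for reproducibility):

```python
import random


def estimate_pi(samples: int, seed: int = 42) -> float:
    """Monte Carlo pi estimate: the fraction of random points in the unit
    square that fall inside the unit circle approaches pi/4.

    This mirrors the per-partition sampling in Spark's examples/pi.py;
    the real job spreads the samples across executors and sums the hits.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x = rng.random() * 2 - 1
        y = rng.random() * 2 - 1
        if x * x + y * y <= 1:
            inside += 1
    return 4.0 * inside / samples
```

With enough samples the estimate converges toward 3.14159...; in the Spark version, more partitions (the entry point argument) simply means more parallel sampling.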