Sqoop Node

Overview

Sqoop task type for executing a Sqoop application. Workers run the sqoop command to execute Sqoop tasks.

Create Task

  • Click Project Management -> Project Name -> Workflow Definition, and click the Create Workflow button to enter the DAG editing page.
  • Drag the SQOOP task node from the toolbar onto the canvas.

Task Parameters

| Parameter | Description |
|---|---|
| Job Name | Map-reduce job name. |
| Direct | (1) import: imports an individual table from an RDBMS to HDFS or Hive. (2) export: exports a set of files from HDFS or Hive back to an RDBMS. |
| Hadoop Params | Custom Hadoop parameters for the Sqoop job. |
| Sqoop Advanced Parameters | Advanced Sqoop parameters for the Sqoop job. |
| Data Source - Type | Select the corresponding data source type. |
| Data Source - Datasource | Select the corresponding data source. |
| Data Source - ModelType | (1) Form: synchronize data from a table; fill in the Table and ColumnType. (2) SQL: synchronize the result set of an SQL query; fill in the SQL Statement (see the sketch after this table). |
| Data Source - Table | Sets the table name to use when importing to Hive. |
| Data Source - ColumnType | (1) All Columns: import all fields of the selected table. (2) Some Columns: import only the specified fields of the selected table; fill in the Column. |
| Data Source - Column | Fill in the field names, separated by commas. |
| Data Source - SQL Statement | Fill in the SQL query statement. |
| Data Source - Map Column Hive | Override the mapping from SQL to Hive types for the configured columns. |
| Data Source - Map Column Java | Override the mapping from SQL to Java types for the configured columns. |
| Data Target - Type | Select the corresponding data target type. |
| Data Target - Database | Fill in the Hive database name. |
| Data Target - Table | Fill in the Hive table name. |
| Data Target - CreateHiveTable | Import a table definition into Hive. If set, the job fails if the target Hive table already exists. |
| Data Target - DropDelimiter | Drop \n, \r, and \01 from string fields when importing to Hive. |
| Data Target - OverWriteSrc | Overwrite existing data in the Hive table. |
| Data Target - Hive Target Dir | Explicitly choose the target directory for the Hive import. |
| Data Target - ReplaceDelimiter | Replace \n, \r, and \01 in string fields with a user-defined string when importing to Hive. |
| Data Target - Hive partition Keys | Fill in the Hive partition key names, separated by commas. |
| Data Target - Hive partition Values | Fill in the Hive partition values, separated by commas. |
| Data Target - Target Dir | Fill in the HDFS target directory. |
| Data Target - DeleteTargetDir | Delete the target directory if it already exists. |
| Data Target - CompressionCodec | Choose the Hadoop compression codec. |
| Data Target - FileType | Choose the storage type. |
| Data Target - FieldsTerminated | Sets the field separator character. |
| Data Target - LinesTerminated | Sets the end-of-line character. |
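
When ModelType is SQL, Sqoop runs in --query mode, which requires the statement to contain the literal token $CONDITIONS so the query can be split across mappers. A minimal sketch of the equivalent command line (the host, credentials, and paths are illustrative placeholders, not values the task generates):

```shell
# SQL ModelType sketch: Sqoop substitutes $CONDITIONS with a per-mapper
# split predicate; --split-by names the column used to partition the query.
sqoop import \
  --connect jdbc:mysql://mysql-host:3306/test \
  --username sqoop_user -P \
  --query 'SELECT id, name FROM example WHERE $CONDITIONS' \
  --split-by id \
  --target-dir /tmp/sqoop/example
```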

Task Example

This example demonstrates importing data from MySQL into Hive. The MySQL database name is test and the table name is example. The following figure shows sample data.

Configuring the Sqoop environment

If you are using the Sqoop task type in a production environment, you must ensure that the worker can execute the sqoop command.
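
A quick sanity check, assuming sqoop is on the worker's PATH and the JDBC driver for your source database sits in Sqoop's lib directory (the host and username below are placeholders):

```shell
# Run on each worker host before relying on the Sqoop task type.
sqoop version
# Optional connectivity check against the source database; -P prompts for the password.
sqoop list-databases \
  --connect jdbc:mysql://mysql-host:3306 \
  --username sqoop_user -P
```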

Configuring Sqoop Task Node

You can configure the node content by following the steps in the diagram below.

The key configuration in this sample is shown in the following table.

| Parameter | Value |
|---|---|
| Job Name | sqoop_mysql_to_hive_test |
| Data Source - Type | MYSQL |
| Data Source - Datasource | MYSQL MyTestMySQL (you can change MyTestMySQL to a name of your choice) |
| Data Source - ModelType | Form |
| Data Source - Table | example |
| Data Source - ColumnType | All Columns |
| Data Target - Type | HIVE |
| Data Target - Database | tmp |
| Data Target - Table | example |
| Data Target - CreateHiveTable | true |
| Data Target - DropDelimiter | false |
| Data Target - OverWriteSrc | true |
| Data Target - Hive Target Dir | (No need to fill in) |
| Data Target - ReplaceDelimiter | , |
| Data Target - Hive partition Keys | (No need to fill in) |
| Data Target - Hive partition Values | (No need to fill in) |
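
As a cross-check, the configuration above corresponds roughly to the Sqoop invocation below. This is a hedged sketch: DolphinScheduler assembles the actual command itself, and the connection host and username are placeholders standing in for whatever the MyTestMySQL datasource defines.

```shell
# Rough command-line equivalent of the task configuration above.
sqoop import \
  -D mapred.job.name=sqoop_mysql_to_hive_test \
  --connect jdbc:mysql://mysql-host:3306/test \
  --username sqoop_user -P \
  --table example \
  --hive-import \
  --hive-database tmp \
  --hive-table example \
  --create-hive-table \
  --hive-overwrite \
  --hive-delims-replacement ','
```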

View run results