CREATE SOURCE

A source describes an external system you want Materialize to read data from, and provides details about how to decode and interpret that data.

Syntax summary

{{% include-example file="examples/create_source_postgres" example="syntax" %}}

For details, see CREATE SOURCE: PostgreSQL (New Syntax). {{< /tab >}}

{{% include-example file="examples/create_source_postgres_legacy" example="syntax" %}}

For details, see CREATE SOURCE: PostgreSQL (Legacy). {{< /tab >}} {{< tab "MySQL" >}}

{{% include-example file="examples/create_source_mysql" example="syntax" %}}

For details, see CREATE SOURCE: MySQL. {{< /tab >}}

{{% include-example file="examples/create_source_sql_server" example="syntax" %}}

For details, see CREATE SOURCE: SQL Server (New Syntax).

{{% include-example file="examples/create_source_sql_server_legacy" example="syntax" %}}

For details, see CREATE SOURCE: SQL Server (Legacy).

{{% include-example file="examples/create_source_kafka" example="syntax-avro" %}}

{{% include-example file="examples/create_source_kafka" example="syntax-json" %}}

{{% include-example file="examples/create_source_kafka" example="syntax-text-bytes" %}}

{{% include-example file="examples/create_source_kafka" example="syntax-csv" %}}

{{% include-example file="examples/create_source_kafka" example="syntax-protobuf" %}} {{< /tab >}}

{{% include-example file="examples/create_source_kafka" example="syntax-key-value-format" %}}

For details, see CREATE SOURCE: Kafka/Redpanda. {{< /tab >}}

{{% include-example file="examples/create_source_webhook" example="syntax" %}}

For details, see CREATE SOURCE: Webhook. {{< /tab >}}

Privileges

The privileges required to execute CREATE SOURCE are:

{{% include-headless "/headless/sql-command-privileges/create-source" %}}

Available guides

The following guides step you through setting up sources:

{{< include-md file="shared-content/multilink-box-native-connectors.md" >}}

Best practices

Separate cluster(s) for sources

In production, if possible, use a dedicated cluster for sources; that is, avoid placing sources on a cluster that also hosts compute objects or sinks, or that serves queries.

{{% include-from-yaml data="best_practices_details" name="architecture-upsert-source" %}}
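As a minimal sketch of the dedicated-cluster recommendation, the statements below create a cluster used only for ingestion and place a Kafka source in it. The names `ingest_cluster`, `events_source`, `kafka_connection`, and the topic `events` are hypothetical, the connection is assumed to exist already, and `'100cc'` is only an example size; the size names available to you depend on your deployment.

```sql
-- Hypothetical cluster reserved for sources; '100cc' is an example size
-- and may not be available in your deployment.
CREATE CLUSTER ingest_cluster (SIZE = '100cc');

-- Place the source in the dedicated cluster rather than in a cluster
-- that also hosts compute objects, sinks, or query serving.
-- Assumes a Kafka connection named kafka_connection already exists.
CREATE SOURCE events_source
  IN CLUSTER ingest_cluster
  FROM KAFKA CONNECTION kafka_connection (TOPIC 'events')
  FORMAT JSON;
```

Views, indexes, and sinks that consume `events_source` would then be created in a separate cluster, so ingestion does not compete with them for CPU and memory.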

Sizing a source

Some sources are low traffic and require relatively few resources to handle data ingestion, while others are high traffic and require hefty resource allocations. The cluster in which you place a source determines the amount of CPU, memory, and disk available to the source.

It's a good idea to size up the cluster hosting a source when:

You want to increase throughput. Larger clusters typically ingest data faster, as there is more CPU available to read and decode data from the upstream external system.
You are using the upsert envelope or Debezium envelope, and your source contains many unique keys. These envelopes maintain state proportional to the number of unique keys in the upstream external system. Larger cluster sizes can store more unique keys.
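Sizing up in either case amounts to resizing the cluster that hosts the source. A minimal sketch, assuming a hypothetical cluster named `ingest_cluster` and an example target size of `'200cc'` (size names vary by deployment):

```sql
-- List clusters and their replicas to check the current size.
SHOW CLUSTERS;

-- Resize the hypothetical cluster that hosts the source.
-- '200cc' is an example; use a size that exists in your deployment.
ALTER CLUSTER ingest_cluster SET (SIZE = '200cc');
```

Because the size belongs to the cluster rather than to the source, the change applies to every object in that cluster.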

Sources share the resource allocation of their cluster with all other objects in the cluster. Colocating multiple sources on the same cluster can be more resource efficient when you have many low-traffic sources that occasionally need some burst capacity.