COPY FROM

doc/user/content/sql/copy-from.md


COPY FROM copies data into a table using the Postgres COPY protocol.

Syntax

{{< tabs >}}

{{< tab "Copy from STDIN" >}} {{% include-syntax file="examples/copy_from" example="syntax" %}} {{< /tab >}}

{{< tab "Copy from S3 and S3 compatible services" >}} {{% include-syntax file="examples/copy_from_s3" example="syntax" %}} {{< /tab >}}

{{< /tabs >}}

Details

S3 Bucket IAM Policies

To use COPY FROM with S3, you need to allow the following actions in your IAM policy:

| Action type | Action name     | Action description                                               |
| ----------- | --------------- | ---------------------------------------------------------------- |
| Read        | `s3:GetObject`  | Grants permission to retrieve an object from a bucket.           |
| List        | `s3:ListBucket` | Grants permission to list some or all of the objects in a bucket. |

{{< note >}} For S3-compatible object storage services (e.g., Google Cloud Storage, Cloudflare R2, MinIO), you need to enable equivalent permissions on the service you are using. The specific configuration steps will vary by provider, but the access credentials must allow the same read and list operations on the target bucket. {{< /note >}}
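As a sketch, an IAM policy granting the two actions above might look like the following. The bucket name is a placeholder; note that `s3:ListBucket` applies to the bucket itself while `s3:GetObject` applies to the objects within it:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::example_bucket",
        "arn:aws:s3:::example_bucket/*"
      ]
    }
  ]
}
```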

Text formatting

As described in the Text Format section of PostgreSQL's documentation.

CSV formatting

As described in the CSV Format section of PostgreSQL's documentation, except that:

  • More than one layer of escaped quote characters returns the wrong result.

  • Quote characters must immediately follow a delimiter to be treated as expected.

  • Single-column rows containing quoted end-of-data markers (e.g. "\.") will be treated as end-of-data markers despite being quoted. In PostgreSQL, this data would be escaped and would not terminate the data processing.

  • Quoted null strings will be parsed as nulls, despite being quoted. In PostgreSQL, this data would be escaped.

    To ensure proper null handling, we recommend specifying a unique string for null values, and ensuring it is never quoted.

  • Unterminated quotes are allowed, i.e. they do not generate errors. In PostgreSQL, every opening unescaped quote must have a matching closing unescaped quote; otherwise an error is generated.
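For example, to make null handling explicit, you might pick a sentinel string that never appears quoted in your data. This sketch assumes the PostgreSQL-style `NULL` option is accepted; the table name and sentinel are illustrative:

```mzsql
COPY t FROM STDIN (FORMAT CSV, NULL '\N');
```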

PARQUET formatting

Supported PARQUET compression formats

  • snappy
  • gzip
  • brotli
  • zstd
  • lz4

{{< comment >}} TODO:

  • Text can be imported as text, JSON/JSONB, or a map. Do we document casting rules/make a whole section for casting?
  • Parquet has an interval type - can Materialize import it? {{< /comment >}}
| Arrow type | Parquet primitive type | Parquet logical type | Materialize type |
| ---------- | ---------------------- | -------------------- | ---------------- |
| `bool` | `BOOLEAN` | | `boolean` |
| `date32` | `INT32` | `DATE` | `date` |
| `decimal128[38, 10 or max-scale]` | `FIXED_LEN_BYTE_ARRAY` | `DECIMAL` | `numeric` |
| `fixed_size_binary(16)` | `FIXED_LEN_BYTE_ARRAY` | | `bytea` |
| `float32` | `FLOAT` | | `real` |
| `float64` | `DOUBLE` | | `double precision` |
| `int16` | `INT32` | `INT(16, true)` | `smallint` |
| `int32` | `INT32` | | `integer` |
| `int64` | `INT64` | | `bigint` |
| `large_binary` | `BYTE_ARRAY` | | `bytea` |
| `large_utf8` | `BYTE_ARRAY` | | `jsonb` |
| `list` | Nested | | `list` |
| `struct` | Nested | | Arrays (`[]`) |
| `timestamp[microsecond]` | `INT64` | `TIMESTAMP[isAdjustedToUTC = false, unit = MICROS]` | `timestamp` |
| `timestamp[microsecond]` | `INT64` | `TIMESTAMP[isAdjustedToUTC = true, unit = MICROS]` | `timestamp with time zone` |
| `time64[nanosecond]` | `INT64` | `TIME[isAdjustedToUTC = false, unit = NANOS]` | `time` |
| `uint16` | `INT32` | `INT(16, false)` | `uint2` |
| `uint32` | `INT32` | `INT(32, false)` | `uint4` |
| `uint64` | `INT64` | `INT(64, false)` | `uint8` |
| `utf8` or `large_utf8` | `BYTE_ARRAY` | `STRING` | `text` |
| `map` | | | unsupported |
| `interval` | | | unsupported |
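As an illustration of the mapping above, a table covering a few of these types might be created and bulk-loaded along these lines. The table, bucket, and connection names are hypothetical:

```mzsql
CREATE TABLE parquet_table (id bigint, name text, price numeric, created_at timestamp);
COPY INTO parquet_table FROM 's3://example_bucket' (FORMAT PARQUET, AWS CONNECTION = example_aws_conn, PATTERN = '*.parquet');
```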

Limits

You can copy up to 10 GiB of data at a time. If you need to copy more than that, please contact support.

When importing Parquet files, entire row groups are held in memory at once, so ensure that your Materialize instance has enough available memory to accommodate your Parquet files' row groups. If you encounter memory issues and are unable to reduce the size of your row groups, please contact support.

Examples

From STDIN

```mzsql
COPY t FROM STDIN WITH (DELIMITER '|');
```

```mzsql
COPY t FROM STDIN (FORMAT CSV);
```

```mzsql
COPY t FROM STDIN (DELIMITER '|');
```

From AWS S3

Using AWS connection

Perform bulk import:

Using FILES option:

```mzsql
COPY INTO csv_table FROM 's3://example_bucket' (FORMAT CSV, AWS CONNECTION = example_aws_conn, FILES = ['example_data.csv']);
```

Using the full S3 URI:

```mzsql
COPY INTO csv_table FROM 's3://example_bucket/example_data.csv' (FORMAT CSV, AWS CONNECTION = example_aws_conn);
```

Using PATTERN option:

```mzsql
COPY INTO parquet_table FROM 's3://example_bucket' (FORMAT PARQUET, AWS CONNECTION = example_aws_conn, PATTERN = '*parquet*');
```

Using S3-compatible object storage

You can use COPY FROM with any S3-compatible object storage service, such as Google Cloud Storage, Cloudflare R2, or MinIO. First, create an AWS connection for S3-compatible storage, then use it in the COPY command. Make sure your credentials have the necessary permissions as described in S3 Bucket IAM Policies.
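As a hypothetical sketch, a connection for Google Cloud Storage's S3-compatible endpoint might be created along these lines. The endpoint, credentials, and secret name are placeholders; consult the CREATE CONNECTION documentation for the options your provider requires:

```mzsql
CREATE SECRET gcs_secret_access_key AS '<secret access key>';
CREATE CONNECTION gcs_connection TO AWS (
    ENDPOINT = 'https://storage.googleapis.com',
    ACCESS KEY ID = '<access key id>',
    SECRET ACCESS KEY = SECRET gcs_secret_access_key
);
```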

```mzsql
COPY INTO csv_table FROM 's3://my_bucket/my_data.csv' (FORMAT CSV, AWS CONNECTION = gcs_connection);
```

Using presigned URL

```mzsql
COPY INTO csv_table FROM '<s3 presigned URL>' (FORMAT CSV);
```

Privileges

The privileges required to execute this statement are:

{{% include-headless "/headless/sql-command-privileges/copy-from" %}}