doc/user/content/sql/copy-from.md
`COPY FROM` copies data into a table using the PostgreSQL `COPY` protocol.
{{< tabs >}}
{{< tab "Copy from STDIN" >}} {{% include-syntax file="examples/copy_from" example="syntax" %}} {{< /tab >}}
{{< tab "Copy from S3 and S3 compatible services" >}} {{% include-syntax file="examples/copy_from_s3" example="syntax" %}} {{< /tab >}}
{{< /tabs >}}
To use `COPY FROM` with S3, you need to allow the following actions in your IAM policy:
| Action type | Action name | Action description |
|---|---|---|
| Read | `s3:GetObject` | Grants permission to retrieve an object from a bucket. |
| List | `s3:ListBucket` | Grants permission to list some or all of the objects in a bucket. |
{{< note >}} For S3-compatible object storage services (e.g., Google Cloud Storage, Cloudflare R2, MinIO), you need to enable equivalent permissions on the service you are using. The specific configuration steps will vary by provider, but the access credentials must allow the same read and list operations on the target bucket. {{< /note >}}
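For example, a minimal sketch assuming role-based authentication: attach a policy granting these actions to an IAM role, then reference that role in the AWS connection used by `COPY FROM`. The ARN, role name, and connection name below are placeholders.

```sql
-- The role behind this ARN must allow s3:GetObject and s3:ListBucket
-- on the target bucket; ARN and connection name are placeholders.
CREATE CONNECTION example_aws_conn TO AWS (
    ASSUME ROLE ARN = 'arn:aws:iam::123456789012:role/MaterializeS3Reader'
);
```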
TEXT-formatted data is parsed as described in the Text Format section of PostgreSQL's documentation.
CSV-formatted data is parsed as described in the CSV Format section of PostgreSQL's documentation, except that:

- More than one layer of escaped quote characters returns the wrong result.
- Quote characters must immediately follow a delimiter to be treated as expected.
- Single-column rows containing quoted end-of-data markers (e.g., `"\."`) are treated as end-of-data markers despite being quoted. In PostgreSQL, this data would be escaped and would not terminate data processing.
- Quoted null strings are parsed as nulls, despite being quoted. In PostgreSQL, this data would be escaped. To ensure proper null handling, we recommend specifying a unique string for null values and ensuring it is never quoted (see the sketch after this list).
- Unterminated quotes are allowed, i.e., they do not generate errors. In PostgreSQL, every unescaped opening quote must have a matching unescaped closing quote, or an error is generated.
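For example, a minimal sketch of the null-handling recommendation above, assuming the PostgreSQL-style `NULL` option; the sentinel `__NULL__` is an arbitrary placeholder:

```sql
-- Only unquoted __NULL__ fields are parsed as SQL NULL; choose a
-- sentinel that never appears (quoted or otherwise) in real data.
COPY t FROM STDIN (FORMAT CSV, NULL '__NULL__');
```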
Supported PARQUET compression formats
Parquet files are decoded to Materialize types according to the following mapping:
| Arrow type | Parquet primitive type | Parquet logical type | Materialize type |
|---|---|---|---|
| `bool` | `BOOLEAN` | | `boolean` |
| `date32` | `INT32` | `DATE` | `date` |
| `decimal128[38, 10 or max-scale]` | `FIXED_LEN_BYTE_ARRAY` | `DECIMAL` | `numeric` |
| `fixed_size_binary(16)` | `FIXED_LEN_BYTE_ARRAY` | | `bytea` |
| `float32` | `FLOAT` | | `real` |
| `float64` | `DOUBLE` | | `double precision` |
| `int16` | `INT32` | `INT(16, true)` | `smallint` |
| `int32` | `INT32` | | `integer` |
| `int64` | `INT64` | | `bigint` |
| `large_binary` | `BYTE_ARRAY` | | `bytea` |
| `large_utf8` | `BYTE_ARRAY` | | `jsonb` |
| `list` | Nested | | `list` |
| `struct` | Nested | | Arrays (`[]`) |
| `timestamp[microsecond]` | `INT64` | `TIMESTAMP[isAdjustedToUTC = false, unit = MICROS]` | `timestamp` |
| `timestamp[microsecond]` | `INT64` | `TIMESTAMP[isAdjustedToUTC = true, unit = MICROS]` | `timestamp with time zone` |
| `time64[nanosecond]` | `INT64` | `TIME[isAdjustedToUTC = false, unit = NANOS]` | `time` |
| `uint16` | `INT32` | `INT(16, false)` | `uint2` |
| `uint32` | `INT32` | `INT(32, false)` | `uint4` |
| `uint64` | `INT64` | `INT(64, false)` | `uint8` |
| `utf8` or `large_utf8` | `BYTE_ARRAY` | `STRING` | `text` |
| `map` | unsupported | | |
| `interval` | unsupported | | |
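For example, a hypothetical target table whose column types follow this mapping for a Parquet file with an `int64` column, a `utf8` column, and a UTC `timestamp[microsecond]` column; all names are placeholders:

```sql
-- bigint <- int64, text <- utf8 (STRING),
-- timestamp with time zone <- timestamp[microsecond, isAdjustedToUTC = true]
CREATE TABLE events (id bigint, name text, created_at timestamp with time zone);

COPY INTO events FROM 's3://example_bucket'
  (FORMAT PARQUET, AWS CONNECTION = example_aws_conn, PATTERN = '*.parquet');
```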
You can copy up to 10 GiB of data at a time. If you need to copy more than that, please contact support.
When importing Parquet files, entire row groups are held in memory at once, so ensure that your Materialize instance has enough available memory to accommodate your Parquet files. If you encounter memory issues and are unable to reduce the size of your row groups, please contact support.
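If a dataset exceeds the 10 GiB limit, one hypothetical workaround, assuming your object names allow glob-style partitioning, is to split the import into several smaller `COPY` statements:

```sql
-- Each statement stays under the per-statement limit; the patterns
-- and names are placeholders for your own object layout.
COPY INTO big_table FROM 's3://example_bucket'
  (FORMAT PARQUET, AWS CONNECTION = example_aws_conn, PATTERN = 'part-00*.parquet');
COPY INTO big_table FROM 's3://example_bucket'
  (FORMAT PARQUET, AWS CONNECTION = example_aws_conn, PATTERN = 'part-01*.parquet');
```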
Copy data into table `t` from STDIN, specifying a custom delimiter or CSV format (the `WITH` keyword is optional):

```sql
COPY t FROM STDIN WITH (DELIMITER '|');
COPY t FROM STDIN (FORMAT CSV);
COPY t FROM STDIN (DELIMITER '|');
```
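Clients drive these statements over the COPY protocol. As a usage sketch, psql's `\copy` meta-command streams a local file as STDIN; the file name is a placeholder:

```sql
\copy t FROM 'data.csv' WITH (FORMAT CSV)
```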
Perform a bulk import from S3:
Using the `FILES` option:

```sql
COPY INTO csv_table FROM 's3://example_bucket'
  (FORMAT CSV, AWS CONNECTION = example_aws_conn, FILES = ['example_data.csv']);
```
Using the full S3 URI:

```sql
COPY INTO csv_table FROM 's3://example_bucket/example_data.csv'
  (FORMAT CSV, AWS CONNECTION = example_aws_conn);
```
Using the `PATTERN` option:

```sql
COPY INTO parquet_table FROM 's3://example_bucket'
  (FORMAT PARQUET, AWS CONNECTION = example_aws_conn, PATTERN = '*parquet*');
```
You can use `COPY FROM` with any S3-compatible object storage service, such as Google Cloud Storage, Cloudflare R2, or MinIO. First, create an AWS connection for S3-compatible storage, then use it in the `COPY` command. Make sure your credentials have the necessary permissions as described in S3 Bucket IAM Policies.
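For example, a sketch for Google Cloud Storage via its S3-interoperable XML API; the endpoint, region, and HMAC credentials are assumptions to adapt for your provider:

```sql
CREATE SECRET gcs_secret AS '<gcs-hmac-secret>';

CREATE CONNECTION gcs_connection TO AWS (
    ENDPOINT = 'https://storage.googleapis.com',
    REGION = 'auto',
    ACCESS KEY ID = '<gcs-hmac-access-id>',
    SECRET ACCESS KEY = SECRET gcs_secret
);
```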
Then use the connection in the `COPY` command:

```sql
COPY INTO csv_table FROM 's3://my_bucket/my_data.csv' (FORMAT CSV, AWS CONNECTION = gcs_connection);
```
Alternatively, you can copy from an S3 presigned URL, in which case no AWS connection is needed:

```sql
COPY INTO csv_table FROM '<s3 presigned URL>' (FORMAT CSV);
```
The privileges required to execute this statement are:
{{% include-headless "/headless/sql-command-privileges/copy-from" %}}