This documentation describes the integration of MindsDB with Amazon S3, an object storage service that offers industry-leading scalability, data availability, security, and performance.
<Tip>
This data source integration is thread-safe, utilizing a connection pool where each thread is assigned its own connection. When handling requests in parallel, threads retrieve connections from the pool as needed.
</Tip>

Before proceeding, ensure that MindsDB is installed locally via Docker or Docker Desktop.
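If MindsDB is not yet running, one way to start it locally is via the official Docker image (a minimal sketch; the container name is a placeholder, and the port follows the standard MindsDB Docker setup):

```shell
# Pull and run the MindsDB Docker image, exposing the default GUI/API port.
docker run --name mindsdb_container -p 47334:47334 mindsdb/mindsdb
```

Once the container is running, the MindsDB editor is available in the browser, where the SQL commands below can be executed.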
Establish a connection to your Amazon S3 bucket from MindsDB by executing the following SQL command:
```sql
CREATE DATABASE s3_datasource
WITH
    engine = 's3',
    parameters = {
        "aws_access_key_id": "AQAXEQK89OX07YS34OP",
        "aws_secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
        "bucket": "my-bucket"
    };
```
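When connecting with temporary security credentials (for example, credentials issued by AWS STS), the session token must be supplied as well. A sketch following the same parameter format, with placeholder credential values:

```sql
CREATE DATABASE s3_temp_datasource
WITH
    engine = 's3',
    parameters = {
        "aws_access_key_id": "AQAXEQK89OX07YS34OP",
        "aws_secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
        "aws_session_token": "FwoGZXIvYXdzEXAMPLETOKEN",
        "bucket": "my-bucket"
    };
```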
Required connection parameters include the following:

* `aws_access_key_id`: The AWS access key that identifies the user or IAM role.
* `aws_secret_access_key`: The AWS secret access key that identifies the user or IAM role.

Optional connection parameters include the following:

* `aws_session_token`: The AWS session token that identifies the user or IAM role. This becomes necessary when using temporary security credentials.
* `bucket`: The name of the Amazon S3 bucket. If not provided, all available buckets can be queried; however, this can affect performance, especially when listing all of the available objects.

Retrieve data from a specified object (file) in an S3 bucket by providing the integration name and the object key:
```sql
SELECT *
FROM s3_datasource.`my-file.csv`
LIMIT 10;
```
<Tip>
Wrap the object key in backticks (`) to avoid any issues parsing the provided SQL statements. This is especially important when the object key contains spaces, special characters, or prefixes, such as `my-folder/my-file.csv`.

At the moment, the supported file formats are CSV, TSV, JSON, and Parquet.
</Tip>
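For instance, an object stored under a prefix can be queried by wrapping the full key, including the prefix, in backticks (a sketch; `my-folder/my-file.csv` is a placeholder key):

```sql
SELECT *
FROM s3_datasource.`my-folder/my-file.csv`
LIMIT 10;
```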
<Note>
The above examples utilize `s3_datasource` as the datasource name, which is defined in the `CREATE DATABASE` command.
</Note>

The special `files` table can be used to list all objects available in the specified bucket, or in all buckets if the bucket name is not provided:
```sql
SELECT *
FROM s3_datasource.files
LIMIT 10;
```
The content of files can also be retrieved by explicitly requesting the `content` column. This column is empty by default to avoid unnecessary data transfer:
```sql
SELECT path, content
FROM s3_datasource.files
LIMIT 10;
```
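Assuming standard SQL filtering applies to the `files` table (an assumption; verify against your MindsDB version), the content of a single object could be fetched by filtering on its path:

```sql
-- Fetch the content of one object only; 'my-file.csv' is a placeholder key.
SELECT path, content
FROM s3_datasource.files
WHERE path = 'my-file.csv';
```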