marimo lets you mix and match Python and SQL: Use SQL to query Python dataframes (or databases like SQLite and Postgres), and get the query result back as a Python dataframe.
For a video overview on how to use SQL in marimo, watch our YouTube tutorial.
To create a SQL cell, you first need to install additional dependencies, including duckdb:
/// tab | install with pip

```bash
pip install "marimo[sql]"
```

///

/// tab | install with uv

```bash
uv add "marimo[sql]"
```

///

/// tab | install with conda

```bash
conda install -c conda-forge marimo duckdb polars
```

///
!!! example "Examples"

    For example notebooks, check out
    [`examples/sql/` on GitHub](https://github.com/marimo-team/marimo/tree/main/examples/sql/).
In this example notebook, we have a Pandas dataframe and a SQL cell that queries it. Notice that the query result is returned as a Python dataframe and usable in subsequent cells.
<iframe src="https://marimo.app/l/38dxkd?embed=true" class="demo xlarge" height="800px" frameBorder="0"> </iframe>

You can create SQL cells in one of three ways:
<figcaption>Add SQL Cell</figcaption>
This creates a "SQL" cell for you, which is syntactic sugar for Python code. The underlying code looks like:
```python
output_df = mo.sql(f"SELECT * FROM my_table LIMIT {max_rows.value}")
```
Notice the `output_df` variable in the cell. It holds the query result as a
Polars DataFrame (if you have Polars installed) or a Pandas DataFrame (if you
don't); at least one of the two must be installed in order to interact with
the query result.
The SQL statement itself is an f-string, letting you
interpolate Python values into the query with `{}`. In particular, this means
your SQL queries can depend on the values of UI elements or other Python values,
and they participate in marimo's reactive dataflow graph.
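Because the query is an f-string, interpolation happens in Python before the SQL engine ever sees the statement. A minimal sketch in plain Python (no marimo; `max_rows_value` is a hypothetical stand-in for a UI element's `.value`):

```python
# Hypothetical stand-in for a marimo UI element's .value
max_rows_value = 10

# Python renders the f-string first, so the SQL engine
# receives a fully concrete statement.
query = f"SELECT * FROM my_table LIMIT {max_rows_value}"
print(query)  # SELECT * FROM my_table LIMIT 10
```

In marimo, changing the UI element's value re-renders this string and reactively re-runs the query.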
marimo supports different output types for SQL queries, which is particularly useful when working with large datasets. You can configure this in your application configuration in the top right of the marimo editor.
The available options are:

- `native`: uses DuckDB's native lazy relation (recommended for best performance)
- `lazy-polars`: returns a lazy Polars DataFrame
- `pandas`: returns a Pandas DataFrame
- `polars`: returns an eager Polars DataFrame
- `auto`: automatically chooses based on installed packages (tries Polars first, then Pandas)

For best performance with large datasets, we recommend using `native` to avoid loading the entire result set into memory and to more easily chain SQL cells together. By default, only the first 10 rows are displayed in the UI to prevent memory issues.
???+ tip "Set a default"

    The default output type is currently `auto`, but we recommend explicitly setting the output type to `native` for best performance with large datasets, or `polars` if you need to work with the results in Python code. You can configure this in your application settings.
You can reference a local dataframe in your SQL cell by using the name of the Python variable that holds the dataframe. If you have a database connection with a table of the same name, the database table will be used instead.
<div align="center"> <figure> <figcaption>Reference a dataframe</figcaption> </figure> </div>
Since the output dataframe variable (`_df`) has a leading underscore, it is private to its cell and cannot be referenced from other cells.
Defining a non-private (non-underscored) output variable in the SQL cell allows you to reference the resulting dataframe in other Python and SQL cells.
<div align="center"> <figure> <figcaption>Reference the SQL result</figcaption> </figure> </div>
In the above example, you may have noticed we queried an HTTP endpoint instead of a local dataframe. We are not only limited to querying local dataframes; we can also query files, databases such as Postgres and SQLite, and APIs:
```sql
SELECT * FROM 's3://my-bucket/file.parquet';
-- or
SELECT * FROM read_csv('path/to/example.csv');
-- or
SELECT * FROM read_parquet('path/to/example.parquet');
```
For a full list of supported sources, check out the DuckDB extensions. You can also check out our examples on GitHub.
Our "SQL" cells are really just Python under the hood, which keeps notebooks as pure Python scripts. By default, we use f-strings for SQL strings, which allows for parameterized SQL like `SELECT * FROM table WHERE value < {min}`.

To escape literal `{`/`}` characters that you don't want interpolated, use double braces, `{{...}}`:

```sql
SELECT unnest([{{'a': 42, 'b': 84}}, {{'a': 100, 'b': NULL}}]);
```
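This escaping rule is just standard f-string behavior: `{{` and `}}` render as literal braces, while single braces interpolate. A quick check in plain Python (the `threshold` variable is made up for illustration):

```python
threshold = 42

# Double braces survive as literal { } (e.g. for DuckDB struct syntax);
# single braces are interpolated by Python before the engine sees the query.
rendered = f"SELECT unnest([{{'a': {threshold}}}]) WHERE {threshold} > 0"
print(rendered)  # SELECT unnest([{'a': 42}]) WHERE 42 > 0
```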
There are two ways to connect to a database in marimo:
Click the "Add Database Connection" button in your notebook to connect to PostgreSQL, MySQL, SQLite, DuckDB, Snowflake, or BigQuery databases. The UI will guide you through entering your connection details securely. Environment variables picked up from your dotenv can be used to fill out the database configuration fields.
<figcaption>Add a database connection through the UI</figcaption>
If you'd like to connect to a database that isn't supported by the UI, you can use the code method below, or submit a feature request.
You can bring your own database via a connection engine created with one of the following libraries. By default, marimo uses an in-memory DuckDB connection.
??? info "List of supported databases"

    Updated: 2025-04-30. This list is not exhaustive.

    | Database                   | Library                            |
    | -------------------------- | ---------------------------------- |
    | Amazon Athena              | `sqlalchemy`, `sqlmodel`, `ibis`   |
    | Amazon Redshift            | `sqlalchemy`, `sqlmodel`           |
    | Apache Drill               | `sqlalchemy`, `sqlmodel`           |
    | Apache Druid               | `sqlalchemy`, `sqlmodel`, `ibis`   |
    | Apache Hive and Presto     | `sqlalchemy`, `sqlmodel`           |
    | Apache Solr                | `sqlalchemy`, `sqlmodel`           |
    | BigQuery                   | `sqlalchemy`, `sqlmodel`, `ibis`   |
    | ClickHouse                 | `clickhouse_connect`, `chdb`       |
    | CockroachDB                | `sqlalchemy`, `sqlmodel`           |
    | Databricks                 | `sqlalchemy`, `sqlmodel`, `ibis`   |
    | dlt                        | `ibis`                             |
    | Datafusion                 | `ibis`                             |
    | DuckDB                     | `duckdb`                           |
    | EXASolution                | `sqlalchemy`, `sqlmodel`, `ibis`   |
    | Elasticsearch (readonly)   | `sqlalchemy`, `sqlmodel`           |
    | Firebolt                   | `sqlalchemy`, `sqlmodel`           |
    | Flink                      | `ibis`                             |
    | Google Sheets              | `sqlalchemy`, `sqlmodel`           |
    | Impala                     | `sqlalchemy`, `sqlmodel`, `ibis`   |
    | Microsoft Access           | `sqlalchemy`, `sqlmodel`           |
    | Microsoft SQL Server       | `sqlalchemy`, `sqlmodel`, `ibis`   |
    | MonetDB                    | `sqlalchemy`, `sqlmodel`           |
    | MySQL                      | `sqlalchemy`, `sqlmodel`, `ibis`   |
    | OpenGauss                  | `sqlalchemy`, `sqlmodel`           |
    | Oracle                     | `sqlalchemy`, `sqlmodel`, `ibis`   |
    | PostgreSQL                 | `sqlalchemy`, `sqlmodel`, `ibis`   |
    | PySpark                    | `ibis`                             |
    | RisingWave                 | `ibis`                             |
    | SAP HANA                   | `sqlalchemy`, `sqlmodel`           |
    | Snowflake                  | `sqlalchemy`, `sqlmodel`, `ibis`   |
    | SQLite                     | `sqlalchemy`, `sqlmodel`, `ibis`   |
    | Teradata Vantage           | `sqlalchemy`, `sqlmodel`           |
    | TimePlus                   | `sqlalchemy`, `sqlmodel`           |
    | Trino                      | `sqlalchemy`, `sqlmodel`, `ibis`   |
Define the engine as a Python variable in a cell:
/// tab | SQLAlchemy

```python
import sqlalchemy

# Create an in-memory SQLite database with SQLAlchemy
sqlite_engine = sqlalchemy.create_engine("sqlite:///:memory:")
```

///

/// tab | SQLModel

```python
import sqlmodel

# Create an in-memory SQLite database with SQLModel
sqlite_engine = sqlmodel.create_engine("sqlite:///:memory:")
```

///

/// tab | Ibis

```python
import ibis

# Create an in-memory SQLite database with Ibis
sqlite_engine = ibis.connect("sqlite:///:memory:")
```

///

/// tab | DuckDB

```python
import duckdb

# Create a DuckDB connection
duckdb_conn = duckdb.connect("file.db")
```

///

/// tab | ClickHouse Connect

ClickHouse Connect enables remote connections to ClickHouse databases. Refer to the official docs for more configuration options.

```python
import clickhouse_connect

engine = clickhouse_connect.get_client(
    host="localhost", port=8123, username="default", password="password"
)
```

///

/// tab | chDB

!!! warning

    chDB is still new. You may experience issues with your queries. We recommend only using one connection at a time.
    Refer to [chDB docs](https://github.com/orgs/chdb-io/discussions/295) for more information.

```python
import chdb

connection = chdb.connect(":memory:")

# Supported connection-string formats:
# ":memory:"                                  in-memory database
# "test.db"                                   relative path
# "file:test.db"                              explicit file protocol
# "/path/to/test.db"                          absolute path
# "file:/path/to/test.db"                     absolute path with protocol
# "file:test.db?param1=value1&param2=value2"  with query parameters
# "file::memory:?verbose&log-level=test"      in-memory with parameters
# "///path/to/test.db?param1=value1"          triple-slash absolute path
```

///
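For intuition, the `sqlite:///:memory:` URL used by the SQLAlchemy and SQLModel examples ultimately opens the same kind of in-memory database you can reach with Python's standard-library `sqlite3` module. A marimo-independent sketch (table name and values are made up):

```python
import sqlite3

# Stdlib analogue of the in-memory SQLite engines above (no marimo involved).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (value INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?)", [(1,), (2,), (3,)])

# A parameterized query against the in-memory table
rows = conn.execute("SELECT value FROM my_table WHERE value < ?", (3,)).fetchall()
print(rows)  # [(1,), (2,)]
conn.close()
```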
marimo will auto-discover the engine and let you select it in the SQL cell's connection dropdown.
<div align="center"> <figure> <figcaption>Choose a custom database connection</figcaption> </figure> </div>
marimo will automatically discover the database connection and display the databases, schemas, tables, and columns in the Data Sources panel. This panel lets you quickly navigate your database schema and reference tables and columns to pull into your SQL queries.
<div align="center"> <figure> <figcaption>Data Sources panel</figcaption> </figure> </div>
???+ note

    By default, marimo auto-discovers databases and schemas, but not tables and columns (to avoid performance issues with large databases). You can configure this behavior in your `pyproject.toml` file. Options are `true`, `false`, or `"auto"`. `"auto"` will determine whether to auto-discover based on the type of database (e.g. when the value is `"auto"`, Snowflake and BigQuery will not auto-discover tables and columns, while SQLite, Postgres, and MySQL will):

    ```toml title="pyproject.toml"
    [tool.marimo.datasources]
    auto_discover_schemas = true    # Default: true
    auto_discover_tables = "auto"   # Default: "auto"
    auto_discover_columns = "auto"  # Default: false
    ```
marimo supports connecting to Iceberg catalogs. You can click the "+" button in the Datasources panel or manually create a PyIceberg Catalog connection. PyIceberg supports a variety of catalog implementations including REST, SQL, Glue, DynamoDB, and more.
```python
from pyiceberg.catalog.rest import RestCatalog

catalog = RestCatalog(
    name="catalog",
    warehouse="1234567890",
    uri="https://my-catalog.com",
    token="my-token",
)
```
Catalogs will appear in the Datasources panel, but they cannot be used as an engine in SQL cells. However, you can still load the table and use it in subsequent Python or SQL cells.
```python
df = catalog.load_table(("my-namespace", "my-table")).to_polars()
```

```sql
SUMMARIZE df;
```
marimo provides a few utilities when working with SQL.

SQL Linter

marimo lints your SQL code, providing better autocompletions and error highlighting.

<div align="center"> <figure> </figure> </div>

To disable the linter, set the `sql_linter` configuration to `false` in your `pyproject.toml` file, or disable it in the marimo editor's settings menu.
SQL Formatting
Click on the paint roller icon at the bottom right of the SQL cell to format your SQL code.
<div align="center"> <figure> </figure> </div>

SQL Mode

For in-memory DuckDB, marimo offers a Validate mode that validates your SQL as you write it.

<figure> <video autoplay muted loop playsinline width="600px" align="center"> <source src="/_static/docs-sql-validate-mode.mp4" type="video/mp4"> </video> </figure>

Under the hood, this runs a debounced query in `EXPLAIN` mode and returns the parsed errors.
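The validate-by-`EXPLAIN` idea can be sketched with the standard library's `sqlite3` (marimo does the equivalent against DuckDB, with debouncing, but the principle is the same: prefix the query with `EXPLAIN`, run it, and catch parse errors without executing the full query):

```python
import sqlite3


def validate_sql(conn, query):
    """Return None if the query parses and plans, else the error message.

    Sketch of EXPLAIN-based validation, not marimo's actual implementation.
    EXPLAIN asks the engine to plan the query without running it fully,
    so syntax and name-resolution errors surface cheaply.
    """
    try:
        conn.execute(f"EXPLAIN {query}")
        return None
    except sqlite3.Error as exc:
        return str(exc)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")

ok = validate_sql(conn, "SELECT x FROM t")    # valid: returns None
bad = validate_sql(conn, "SELEC x FROM t")    # typo: returns an error message
```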
For an interactive tutorial, run

```bash
marimo tutorial sql
```

at your command-line.
Check out our examples on GitHub.