apps/www/_blog/2022-12-15-postgres-foreign-data-wrappers-rust.mdx
Today we're releasing Supabase Wrappers, a framework for building Postgres Foreign Data Wrappers (FDW) which connects Postgres to external systems.
Foreign Data Wrappers are a core feature of PostgreSQL. We've extended this feature to query other databases or any other external system (including third-party APIs), using SQL.
We're releasing Wrappers today in Alpha, with support for Firebase and Stripe. Wrappers for ClickHouse, BigQuery, and Airtable are under development.
Let's step through a full example using the Stripe Wrapper.
First, let's give your Postgres database some authentication details to access your Stripe account:
```sql
create server stripe_server
  foreign data wrapper stripe_wrapper
  options (api_key 'sk_test_xxx');
```
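Depending on your setup, the wrapper itself may need to be registered before you can create a server for it. A hedged sketch, assuming the extension is named `wrappers` and the handler and validator functions follow the `stripe_fdw_*` naming convention (verify against your installation):

```sql
-- Enable the Wrappers extension (extension name assumed)
create extension if not exists wrappers;

-- Register the Stripe FDW; handler/validator names are assumptions
create foreign data wrapper stripe_wrapper
  handler stripe_fdw_handler
  validator stripe_fdw_validator;
```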
Now we can map your Stripe data to Foreign Tables, which are just like normal tables except that the data exists outside of your database.
```sql
-- Create a foreign table for your Stripe products
create foreign table products (
  id text,
  name text,
  description text,
  default_price text
)
server stripe_server
options ( object 'products' );
```
After setting up your foreign table, you can query your Stripe products directly from your database:
```sql
-- Fetch all your Stripe products in Postgres
select *
from products
limit 10;
```
Or from your application, using one of our client libraries:
```ts
import { createClient } from '@supabase/supabase-js'

const SUPABASE_URL = 'https://xyzcompany.supabase.co'
const SUPABASE_KEY = 'public-anon-key'

const supabase = createClient(SUPABASE_URL, SUPABASE_KEY)

const { data: stripeProducts, error } = await supabase
  .from('products')
  .select('id, name, description, default_price')
  .limit(10)
```
Note: we kept this example simple; however, for API security and code organization, you should store your foreign tables in a separate schema.
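For example, you could isolate the Stripe tables in their own schema so they aren't exposed alongside your public API tables (the schema name `stripe` here is illustrative):

```sql
-- Keep foreign tables in a dedicated schema, separate from API-exposed tables
create schema stripe;

create foreign table stripe.products (
  id text,
  name text,
  description text,
  default_price text
)
server stripe_server
options ( object 'products' );
```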
As we add more Wrappers, they will unlock a range of new possibilities.
In their 2017 paper, researchers from the University of Bologna investigated an approach to on-demand ETL:
In traditional OLAP systems, the ETL process loads all available data in the data warehouse before users start querying them. In some cases, this may be either inconvenient (because data are supplied from a provider for a fee) or unfeasible (because of their size); on the other hand, directly launching each analysis query on source data would not enable data reuse, leading to poor performance and high costs. The alternative investigated in this paper is that of fetching and storing data on-demand.
Their paper outlines the foundation for QETL (pronounced "kettle"): Query, Extract, Transform, Load. This differs from a more traditional ETL/ELT approach, where data is moved from one place to another. In QETL, the “Query” function allows data engineers to access data in a remote system even before moving it. This approach reduces the reliance on data engineering infrastructure, allowing teams to access operational insights faster.
We've built upon this concept using PostgreSQL's FDW system. This is a new tool for developers and data engineers, with several benefits:
How does this look in action? Assuming that all of your analytical data is stored in Snowflake, you could create a foreign table inside your Supabase database:
```sql
create foreign table snowflake.order_history (
  id bigint,
  ts timestamptz,
  event text,
  user_id uuid
)
server my_snowflake_warehouse
options (table 'order_history', rowid_column 'id');
```
Now from your Supabase database you can query your Snowflake data directly:
```sql
select * from snowflake.order_history limit 1000;
```
You can even join remote data with your local tables to enrich existing operational data. For example, to figure out how many times a user has purchased something from your store:
```sql
select
  users.id,
  count(order_history.event)
from snowflake.order_history
join auth.users on auth.users.id = snowflake.order_history.user_id
where order_history.event = 'purchase'
  and order_history.user_id = '<some_user_id>'
group by users.id;
```
We can run these queries on demand or, for better query performance, run them in the background (using something like pg_cron) and materialize the data into a local table.
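As a sketch, assuming pg_cron is installed and a local `order_history_cache` table exists with the same columns as the foreign table, a nightly refresh might look like:

```sql
-- Refresh a local copy of the remote data every night at 03:00
-- (assumes pg_cron; 'order_history_cache' is a hypothetical local table)
select cron.schedule(
  'refresh-order-history',
  '0 3 * * *',
  $$
    truncate order_history_cache;
    insert into order_history_cache
    select * from snowflake.order_history;
  $$
);
```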
This gives us the basis of QETL: `select` statements on external systems, either on demand or on a recurring basis.

This is a two-way process. It's equally useful to offload large datasets from your Postgres database to your Data Warehouse. With FDWs, this can be as simple as:
```sql
insert into snowflake.analytics
select * from analytics
where ts > (now() - interval '1 day');
```
On-demand ETL is a strong complement to current ETL practices, and another tool in the toolbelt for data engineers and developers. It works immediately with any tool that interfaces with Postgres.
In a recent Software Engineering episode, Andy Pavlo (database professor at Carnegie Mellon and co-founder of OtterTune) explored the trade-off between “better databases” and “better interfaces” [00:37:18]:
<Quote img="andy-pavlo.jpeg" caption="Andy Pavlo">Specialized engines are always going to outperform general-purpose ones. The question is whether the specialized engine is going to have such a significant difference in performance that it can overcome the limitations of a general purpose one.
...
The challenge oftentimes is this: is the benefit you're getting from a specialized engine because the engine is different or because the API is different?
</Quote>

He goes on to explore the benefits of a graph database versus a relational database.
Our recent release of pg_graphql closes the gap on the graph use-case by building a GraphQL API directly into Postgres as an extension. While a specialized graph database might provide performance benefits over Postgres, perhaps one of the largest benefits is simply the Graph API which makes it easier to reason about the data.
With the introduction of Wrappers, we're hoping to close the gap on even more of these types of workloads.
An exciting part of the FDW approach is that it provides a common interface to the world: SQL. While it has many shortcomings, SQL is the lingua franca of data. Postgres' FDWs transform any API into a data set with a common interface. This “interface aggregator” is similar to the promise of GraphQL engines. The benefit of embedding this functionality in the database is that it exists at the lowest level of the tech stack. Everything that is built on top can leverage it. While Postgres cannot easily access the functionality of a GraphQL server, a GraphQL server can easily access the functionality of Postgres.
The FDW interface also future-proofs Postgres. Rather than chasing the latest technological advances, Postgres can act as an interface to them as they develop. The recent progress in AI and ML is a great example: the technology is evolving faster than the time it would take to build a new “AI database”. With an FDW, Postgres can become the interface to this technology and to many other advances in the future.
Postgres has a built-in “Postgres FDW” which allows querying one Postgres database from another. We've extended this functionality to support a variety of data sources, from data warehouses to APIs. This release includes two initial wrappers: Stripe and Firebase.
| Integration | Platform | Self-hosted | <span style={{fontWeight: "normal"}}>select</span> | <span style={{fontWeight: "normal"}}>insert</span> | <span style={{fontWeight: "normal"}}>update</span> | <span style={{fontWeight: "normal"}}>delete</span> |
|---|---|---|---|---|---|---|
| Firebase | 🚧 | ✅ | ✅ | 🚧 | 🚧 | 🚧 |
| Postgres | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Stripe | 🚧 | ✅ | ✅ | 🚧 | 🚧 | 🚧 |
With several more under development:
| Integration | <span style={{fontWeight: "normal"}}>select</span> | <span style={{fontWeight: "normal"}}>insert</span> | <span style={{fontWeight: "normal"}}>update</span> | <span style={{fontWeight: "normal"}}>delete</span> |
|---|---|---|---|---|
| Airtable | ✅ | 🚧 | 🚧 | 🚧 |
| BigQuery | ✅ | ✅ | ✅ | ✅ |
| ClickHouse | ✅ | ✅ | ✅ | ✅ |
Wrappers uses pgx, a Postgres extension framework written in Rust, extending it with FDW support. Wrappers is similar to Steampipe and Multicorn. We opted to develop our own framework for several reasons:
Wrappers supports a variety of types, including: `bool`, `i8`, `i16`, `f32`, `i32`, `f64`, `i64`, `String`, `Date`, `Timestamp`, and `JsonB`.
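These Rust types correspond to Postgres column types in the natural way; the mapping below is our assumption rather than an official table, and the foreign table itself is hypothetical:

```sql
-- Illustrative column types for a hypothetical foreign table;
-- the Rust-to-Postgres type mapping in the comments is assumed
create foreign table example_events (
  is_active boolean,       -- bool
  retries smallint,        -- i16
  score real,              -- f32
  quantity integer,        -- i32
  ratio double precision,  -- f64
  id bigint,               -- i64
  label text,              -- String
  created_on date,         -- Date
  created_at timestamp,    -- Timestamp
  payload jsonb            -- JsonB
)
server my_server
options ( object 'events' );
```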
Foreign Data Wrappers have a concept of “push down”. This means the FDW can run parts of the query on the remote data source. This is useful for performance: the remote source can often execute a filter or aggregation more efficiently than Postgres can after fetching every row. Push down is also useful for security, as the remote data source can enforce access control. Limited push-down support has been added as a proof of concept, but this will be a key focus of Wrappers.
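To check whether a condition is pushed down, you can inspect the query plan. The exact plan output depends on the wrapper, so treat this as a sketch:

```sql
-- With push down, the where clause is forwarded to the remote source
-- instead of Postgres filtering rows after they arrive
explain verbose
select id, name
from products
where id = 'prod_xxx';
```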
You can follow development of all the Wrappers in the official GitHub Repo.
We're not "officially" releasing Wrappers onto the platform yet, although the brave and curious might be able to figure out how to use it "unofficially". Caveat emptor: there will be breaking changes.
We're excited to see what the community does with Wrappers. We're hoping that Wrappers will help to accelerate the adoption of Postgres as a data hub. If you're interested in getting involved or building your own Wrapper, don't hesitate to jump into the code and start developing with us.