Semantic Text Deduplication - Supabase

This guide will walk you through a "Semantic Text Deduplication" example using Colab and Supabase Vecs. You'll learn how to find similar movie reviews using embeddings, and remove any that seem like duplicates. You will:

Launch a Postgres database that uses pgvector to store embeddings
Launch a notebook that connects to your database
Load the IMDB dataset
Use the sentence-transformers/all-MiniLM-L6-v2 model to create an embedding representing the semantic meaning of each review
Search for all duplicates
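The last two steps can be sketched without a database. The snippet below is a minimal illustration, not the notebook's code: it compares every pair of vectors by cosine similarity and reports pairs at or above a threshold. The function names, the toy 3-dimensional vectors, and the 0.95 threshold are assumptions made for the example; real embeddings from all-MiniLM-L6-v2 have 384 dimensions.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find_duplicate_pairs(embeddings, threshold=0.95):
    """Return (i, j, similarity) for every pair at or above the threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(embeddings), 2):
        sim = cosine_similarity(a, b)
        if sim >= threshold:
            pairs.append((i, j, sim))
    return pairs


# Hypothetical toy "embeddings" standing in for real review embeddings.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.99, 0.05, 0.0],  # nearly parallel to the first vector
    [0.0, 1.0, 0.0],    # orthogonal to the first vector
]

duplicates = find_duplicate_pairs(toy_embeddings, threshold=0.95)
print(duplicates)
```

Comparing every pair grows quadratically with the number of reviews, which is one reason to store the embeddings in a vector database and query for nearest neighbours instead.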

<$Partial path="database_setup.mdx" />

Launching a notebook

Launch our semantic_text_deduplication notebook in Colab:

<a className="w-64" href="https://colab.research.google.com/github/supabase/supabase/blob/master/examples/ai/semantic_text_deduplication.ipynb">
  Open in Colab
</a>

At the top of the notebook, you'll see a Copy to Drive button. Click it to copy the notebook to your Google Drive.

Connecting to your database

In the notebook, find the cell that defines DB_CONNECTION. It contains code like this:

```python
import vecs

DB_CONNECTION = "postgresql://<user>:<password>@<host>:<port>/<db_name>"

# create vector store client
vx = vecs.create_client(DB_CONNECTION)
```

Replace the value of DB_CONNECTION with your own connection string. You can find the connection string on your project dashboard by clicking Connect.

<Admonition type='note'>

SQLAlchemy requires the connection string to start with postgresql:// (instead of postgres://). Remember to change this prefix after copying the string from the dashboard.

</Admonition>

<Admonition type='note'>

You must use the "connection pooling" string (domain ending in *.pooler.supabase.com) with Google Colab since Colab does not support IPv6.

</Admonition>
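Both connection string requirements can be checked in code before creating the client. The helpers below are a small sketch using only the Python standard library; their names and the placeholder connection string (user, password, host, and port) are made up for illustration and are not part of Vecs or the notebook.

```python
from urllib.parse import urlsplit


def normalize_connection_string(conn: str) -> str:
    """Rewrite a postgres:// prefix to the postgresql:// form SQLAlchemy expects."""
    prefix = "postgres://"
    if conn.startswith(prefix):
        return "postgresql://" + conn[len(prefix):]
    return conn


def uses_pooler(conn: str) -> bool:
    """Return True when the host is a subdomain of pooler.supabase.com."""
    host = urlsplit(conn).hostname or ""
    return host.endswith(".pooler.supabase.com")


# Hypothetical placeholder values, not real credentials.
example = "postgres://user:secret@example-host.pooler.supabase.com:6543/postgres"
fixed = normalize_connection_string(example)
print(fixed)
print(uses_pooler(fixed))
```

The normalized string can then be assigned to DB_CONNECTION and passed to vecs.create_client as in the cell above.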

Stepping through the notebook

Now all that's left is to step through the notebook. You can do this by clicking the "execute" button (Ctrl+Enter) at the top left of each code cell. The notebook guides you through the process of creating a collection, adding data to it, and querying it.
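The querying step can be pictured roughly as follows. This is a sketch under assumptions rather than the notebook's code. It assumes the collection's query method accepts data, limit, measure, and include_value, and returns (id, distance) tuples when include_value is True; check the Vecs documentation for the exact signature. So that the sketch runs without a database, it uses a hypothetical InMemoryCollection stand-in instead of a real Vecs collection. The find_near_duplicates helper, the record ids, and the 0.05 distance cutoff are also made up for illustration.

```python
import math


class InMemoryCollection:
    """Hypothetical stand-in mimicking the small part of a collection used here."""

    def __init__(self):
        self._records = {}

    def upsert(self, records):
        # Each record is an (id, vector, metadata) tuple.
        for record_id, vector, metadata in records:
            self._records[record_id] = (vector, metadata)

    def query(self, data, limit=10, measure="cosine_distance", include_value=False):
        # The stand-in only implements cosine distance and ignores `measure`.
        def distance(vector):
            dot = sum(x * y for x, y in zip(data, vector))
            return 1.0 - dot / (math.hypot(*data) * math.hypot(*vector))

        ranked = sorted(
            ((rid, distance(vec)) for rid, (vec, _) in self._records.items()),
            key=lambda pair: pair[1],
        )[:limit]
        return ranked if include_value else [rid for rid, _ in ranked]


def find_near_duplicates(collection, embeddings, max_distance=0.1, limit=5):
    """Map each record id to the ids of other records within max_distance."""
    duplicates = {}
    for record_id, vector in embeddings.items():
        matches = collection.query(
            data=vector, limit=limit, measure="cosine_distance", include_value=True
        )
        close = [
            other_id
            for other_id, dist in matches
            if other_id != record_id and dist <= max_distance
        ]
        if close:
            duplicates[record_id] = close
    return duplicates


# Toy vectors; real all-MiniLM-L6-v2 embeddings have 384 dimensions.
vectors = {
    "review_0": [1.0, 0.0, 0.0],
    "review_1": [0.99, 0.05, 0.0],
    "review_2": [0.0, 1.0, 0.0],
}
reviews = InMemoryCollection()
reviews.upsert(records=[(rid, vec, {}) for rid, vec in vectors.items()])
result = find_near_duplicates(reviews, vectors, max_distance=0.05)
print(result)
```

Each record is queried against the collection and its own id is filtered out of the results, so a review only counts as a duplicate when a different review falls within the distance cutoff.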

You can view the inserted items in the Table Editor by selecting the vecs schema from the schema dropdown.

<$Partial path="ai/quickstart_hf_deployment.mdx" />

Next steps

You can now start building your own applications with Vecs. Check our examples for ideas.