llama-index-integrations/readers/llama-index-readers-database/README.md
Database Reader is a tool designed to query and load data from databases efficiently.
SQLDatabase, SQLAlchemy Engine, full URI, or discrete credentialsschema selection (namespace)metadata_cols) and text exclusion (excluded_text_cols)id_ generation function (document_id)lazy_load_data) and async (aload_data)You can install Database Reader via pip:
pip install llama-index-readers-database
from llama_index.readers.database import DatabaseReader
# Initialize DatabaseReader with the SQL database connection details
reader = DatabaseReader(
sql_database="<SQLDatabase Object>", # Optional: SQLDatabase object
engine="<SQLAlchemy Engine Object>", # Optional: SQLAlchemy Engine object
uri="<Connection URI>", # Optional: Connection URI
scheme="<Scheme>", # Optional: Scheme
host="<Host>", # Optional: Host
port="<Port>", # Optional: Port
user="<Username>", # Optional: Username
password="<Password>", # Optional: Password
dbname="<Database Name>", # Optional: Database Name
)
# Load data from the database using a query
documents = reader.load_data(
query="<SQL Query>" # SQL query parameter to filter tables and rows
)
# Initialize DatabaseReader with the SQL connection string and custom database schema
from llama_index.readers.database import DatabaseReader
reader = DatabaseReader(
uri="postgresql+psycopg2://user:pass@localhost:5432/mydb",
schema="warehouse", # optional namespace
)
# Streaming variant, excluded id from text_resource
for doc in reader.lazy_load_data(
query="SELECT * FROM warehouse.big_table", excluded_text_cols={"id"}
):
process(doc)
# Async variant, added region to metadata
docs_async = await reader.aload_data(
query="SELECT * FROM warehouse.big_table", metadata_cols=["region"]
)
# Advanced usage with custom named metadata columns, columns excluded from the `Document.text_resource`, and a dynamic `Document.id_` generated from row data and a fstring template
from llama_index.readers.database import DatabaseReader
reader_media = DatabaseReader(
uri="postgresql+psycopg2://user:pass@localhost:5432/mydb",
schema="media", # optional namespace
)
docs = reader_media.load_data(
query="SELECT id, title, body, updated_at FROM media.articles",
metadata_cols=[
("id", "article_id"),
"updated_at",
], # map / include in metadata
excluded_text_cols=["updated_at"], # omit from text
document_id=lambda row: f"media-articles-{row['id']}", # custom document id
)
This loader is designed to be used as a way to load data into LlamaIndex.