llama-index-integrations/tools/llama-index-tools-waii/README.md
This tool connects to database connections managed by Waii, allowing you to run generic SQL queries, analyze query performance, describe a SQL query, and more.
First you need to create a waii.ai account; you can request one at waii.ai.
Initialize the tool with your account credentials:
```python
from llama_index.tools.waii import WaiiToolSpec

waii_tool = WaiiToolSpec(
    url="https://tweakit.waii.ai/api/",
    # API Key of Waii (not OpenAI API key)
    api_key="...",
    # Connection key of the Waii-connected database, see https://github.com/waii-ai/waii-sdk-py#get-connections
    database_key="...",
)
```
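If you don't have a connection key yet, you can list the connections registered with your Waii account via the Waii Python SDK. A minimal sketch, assuming the `get_connections()` response exposes a `connectors` list with `key` fields (see the waii-sdk-py link above for details):

```python
# Sketch: list available connection keys with the Waii Python SDK (waii-sdk-py).
# The response attribute names (connectors, key) are assumptions based on the
# SDK README and may differ between SDK versions.
from waii_sdk_py import WAII

WAII.initialize(url="https://tweakit.waii.ai/api/", api_key="...")
for connection in WAII.Database.get_connections().connectors:
    print(connection.key)
```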
The tools available are:

- `get_answer`: Get the answer to a natural language question (generates a SQL query, runs it, and explains the result)
- `describe_query`: Describe a SQL query
- `performance_analyze`: Analyze the performance of a SQL query (by query_id)
- `diff_query`: Compare two SQL queries
- `describe_dataset`: Describe a dataset, such as a table, schema, etc.
- `transcode`: Transcode a SQL query to another SQL dialect
- `get_semantic_contexts`: Get semantic contexts of a SQL query
- `generate_query_only`: Generate a SQL query without running it
- `run_query`: Run a SQL query

You can also load data directly by calling `load_data`:
```python
from llama_index.core import VectorStoreIndex

documents = waii_tool.load_data("Get all tables with their number of columns")
index = VectorStoreIndex.from_documents(documents).as_query_engine()
print(index.query("Which table contains most columns?"))
```
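Each tool listed above is also a method on `WaiiToolSpec`, so you can call it directly without an agent. A minimal sketch, assuming `generate_query_only` and `run_query` each take a single string argument (check the method docstrings for the exact signatures):

```python
# Sketch: call tool methods directly on the spec (argument shapes assumed).
sql = waii_tool.generate_query_only(
    "Get all tables with their number of columns"
)
print(sql)

# run_query executes the SQL against the connected database.
print(waii_tool.run_query(sql))
```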
The tools can also be plugged into an agent:

```python
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI

agent = FunctionAgent(
    tools=waii_tool.to_tool_list(), llm=OpenAI(model="gpt-4-1106-preview")
)
```
Ask questions about the data; a `Context` keeps the conversation state across calls:

```python
from llama_index.core.workflow import Context

ctx = Context(agent)

print(
    await agent.run(
        "Give me top 3 countries with the most number of car factory", ctx=ctx
    )
)
print(
    await agent.run("What are the car factories of these countries", ctx=ctx)
)
```
Analyze query performance:

```python
from llama_index.core.workflow import Context

ctx = Context(agent)

print(
    await agent.run(
        "Give me top 3 longest running queries, and their duration.", ctx=ctx
    )
)
print(await agent.run("analyze the 2nd-longest running query", ctx=ctx))
```
Compare two queries:

```python
previous_query = """
SELECT
    employee_id,
    department,
    salary,
    AVG(salary) OVER (PARTITION BY department) AS department_avg_salary,
    salary - AVG(salary) OVER (PARTITION BY department) AS diff_from_avg
FROM
    employees;
"""
current_query = """
SELECT
    employee_id,
    department,
    salary,
    MAX(salary) OVER (PARTITION BY department) AS department_max_salary,
    salary - AVG(salary) OVER (PARTITION BY department) AS diff_from_avg
FROM
    employees
LIMIT 100;
"""
print(
    await agent.run(
        f"tell me difference between {previous_query} and {current_query}",
        ctx=ctx,
    )
)
```
Describe the dataset:

```python
print(await agent.run("Summarize the dataset", ctx=ctx))
print(
    await agent.run(
        "Give me questions which I can ask about this dataset", ctx=ctx
    )
)
```
Describe a query:

```python
q = """
SELECT
    employee_id,
    department,
    salary,
    AVG(salary) OVER (PARTITION BY department) AS department_avg_salary,
    salary - AVG(salary) OVER (PARTITION BY department) AS diff_from_avg
FROM
    employees;
"""
print(await agent.run(f"what this query can do? {q}", ctx=ctx))
```
Transcode a PySpark query to Snowflake SQL:

```python
q = """
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, col
from pyspark.sql.window import Window

# Initialize Spark session
spark = SparkSession.builder.appName("example").getOrCreate()

# Assuming you have a DataFrame called 'employees'
# If not, you need to read your data into a DataFrame first

# Define window specification
windowSpec = Window.partitionBy("department")

# Perform the query
result = employees.select(
    col("employee_id"),
    col("department"),
    col("salary"),
    avg("salary").over(windowSpec).alias("department_avg_salary"),
    (col("salary") - avg("salary").over(windowSpec)).alias("diff_from_avg"),
)

# Show the result
result.show()
"""
print(
    await agent.run(f"translate this pyspark query {q}, to Snowflake", ctx=ctx)
)
```
You can also use the Waii API directly through the Python SDK (https://github.com/waii-ai/waii-sdk-py).
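A rough sketch of generating and running a query with the SDK; the request/response class and field names below (`QueryGenerationRequest`, `RunQueryRequest`, `.query`, `.rows`) follow the waii-sdk-py README and may differ between SDK versions:

```python
# Sketch: use waii-sdk-py directly, without going through LlamaIndex.
from waii_sdk_py import WAII
from waii_sdk_py.query import QueryGenerationRequest, RunQueryRequest

WAII.initialize(url="https://tweakit.waii.ai/api/", api_key="...")
# Activate the database connection to query (same key as database_key above).
WAII.Database.activate_connection("<connection key>")

# Generate a SQL query from a natural-language question, then run it.
generated = WAII.Query.generate(
    QueryGenerationRequest(ask="Get all tables with their number of columns")
)
print(generated.query)

result = WAII.Query.run(RunQueryRequest(query=generated.query))
print(result.rows)
```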