examples/semantic_layer_csv.ipynb
In this notebook, we show how to create a semantic layer on top of a CSV file. The semantic layer acts as a bridge between the raw data and the natural language interface: it tells the model what each column means.
import pandasai as pai
For this example, we will use a small dataset of heart disease patients from Kaggle.
# Load the heart disease dataset
file_df = pai.read_csv("./dataheart.csv")
# Display the first few rows
file_df.head()
Requirements for the semantic layer:
- path: Must be in the format 'organization/dataset'
- name: A descriptive name for the dataset
- df: A dataframe
- description: Brief overview of the dataset
- columns: List of dictionaries with the format:
{
"name": "column_name",
"type": "column_type", # string, integer, float, datetime, boolean
"description": "column_description"
}
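The requirements above can be checked before calling pai.create(). The following is a minimal sketch, not part of PandasAI: validate_path, validate_columns, and the ALLOWED_TYPES set are hypothetical names introduced here for illustration, and the set of allowed types is an assumption that should be adjusted to whatever your installed PandasAI version accepts.

```python
# Hypothetical helpers (not part of PandasAI) that check the inputs
# described above before they are passed to pai.create().

# Assumption: adjust this set to the types your PandasAI version accepts.
ALLOWED_TYPES = {"string", "integer", "float", "datetime", "boolean"}
REQUIRED_KEYS = {"name", "type", "description"}


def validate_path(path):
    """Return True if path looks like 'organization/dataset'."""
    parts = path.split("/")
    return len(parts) == 2 and all(part.strip() for part in parts)


def validate_columns(columns):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    seen = set()
    for i, col in enumerate(columns):
        missing = REQUIRED_KEYS - col.keys()
        if missing:
            problems.append(f"column {i}: missing keys {sorted(missing)}")
            continue
        if col["type"] not in ALLOWED_TYPES:
            problems.append(f"column {col['name']}: unknown type {col['type']!r}")
        if col["name"] in seen:
            problems.append(f"column {col['name']}: duplicate name")
        seen.add(col["name"])
    return problems
```

Running validate_columns on the list you intend to pass as columns and confirming that the result is empty catches typos in keys or type names early, before anything is saved.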
dataset = pai.create(path="organization/heart",
name="Heart",
description="Heart Disease Dataset",
df=file_df,
columns=[
{
"name": "Age",
"type": "integer",
"description": "Age of the patient in years"
},
{
"name": "Sex",
"type": "string",
"description": "Gender of the patient (M: Male, F: Female)"
},
{
"name": "ChestPainType",
"type": "string",
"description": "Type of chest pain (ATA: Atypical Angina, NAP: Non-Anginal Pain, ASY: Asymptomatic, TA: Typical Angina)"
},
{
"name": "RestingBP",
"type": "integer",
"description": "Resting blood pressure in mm Hg"
},
{
"name": "Cholesterol",
"type": "integer",
"description": "Serum cholesterol in mg/dl"
},
{
"name": "FastingBS",
"type": "integer",
"description": "Fasting blood sugar (1: if FastingBS > 120 mg/dl, 0: otherwise)"
},
{
"name": "RestingECG",
"type": "string",
"description": "Resting electrocardiogram results (Normal, ST: having ST-T wave abnormality, LVH: showing probable or definite left ventricular hypertrophy)"
},
{
"name": "MaxHR",
"type": "integer",
"description": "Maximum heart rate achieved"
},
{
"name": "ExerciseAngina",
"type": "string",
"description": "Exercise-induced angina (Y: Yes, N: No)"
},
{
"name": "Oldpeak",
"type": "float",
"description": "ST depression induced by exercise relative to rest"
},
{
"name": "ST_Slope",
"type": "string",
"description": "Slope of the peak exercise ST segment (Up, Flat, Down)"
},
{
"name": "HeartDisease",
"type": "integer",
"description": "Target variable - Heart disease presence (1: heart disease, 0: normal)"
}
])
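The column list above is written by hand. For wider files, a first draft of the name and type fields could be generated from the CSV itself and the descriptions filled in afterwards. The sketch below uses only the standard library; infer_type and draft_columns are hypothetical helpers, the sample rows are made up for illustration, and the inference is deliberately naive (a column with any empty or non-numeric value falls back to "string").

```python
import csv
import io


def infer_type(values):
    """Guess 'integer', 'float', or 'string' from a column's raw text values."""
    def all_parse(cast):
        try:
            for value in values:
                cast(value)
            return True
        except ValueError:
            return False

    if not values:
        return "string"
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    return "string"


def draft_columns(csv_text):
    """Build name/type entries from CSV text; descriptions are left blank."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    names = list(rows[0].keys()) if rows else []
    return [
        {"name": n, "type": infer_type([r[n] for r in rows]), "description": ""}
        for n in names
    ]


# Made-up sample rows for illustration; not taken from the real dataset.
sample = "Age,Sex,Oldpeak\n40,M,0.0\n49,F,1.5\n"
draft = draft_columns(sample)
for column in draft:
    print(column)
```

The descriptions still need to be written by someone who knows the data, since they are what gives the semantic layer its meaning.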
Once you have saved the dataframe with its semantic layer, you can load it in any session using the load() method, without redefining the columns.
# Load the semantically enhanced dataset
df = pai.load("organization/heart")
You can now ask questions about your data in natural language using the dataframe's chat() method. PandasAI works with several LLMs; this example uses LiteLLM.
from pandasai_litellm.litellm import LiteLLM
# Initialize LiteLLM with your OpenAI model
llm = LiteLLM(model="gpt-4.1-mini", api_key="YOUR_OPENAI_API_KEY")
# Configure PandasAI to use this LLM
pai.config.set({
"llm": llm
})
response = df.chat("What is the correlation between age and cholesterol?")
print(response)
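Since the answer comes from an LLM-backed pipeline, it can be worth cross-checking a numeric result independently. The sketch below computes the Pearson correlation coefficient in plain Python; pearson is a hypothetical helper, and the two short lists are made-up values standing in for the Age and Cholesterol columns, not rows from the real dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only.
ages = [40, 49, 37, 48, 54]
cholesterol = [289, 180, 283, 214, 195]
r = pearson(ages, cholesterol)
print(f"Pearson r = {r:.3f}")
```

With the real data, the same check could be run on the Age and Cholesterol columns of the loaded dataframe and compared with the value reported by chat().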