
Semantic Layer on CSV

examples/semantic_layer_csv.ipynb



This notebook shows how to create a semantic layer on top of a CSV file. The semantic layer acts as a bridge between the raw data and the natural language layer by describing what each column contains.

Why use a Semantic Layer?

  • Adds context and meaning to data columns
  • Makes it easier for the large language model (LLM) to interpret the data
  • Is defined once and reused across multiple sessions

Import PandasAI

python
import pandasai as pai

Read raw data

For this example, we will use a small dataset of heart disease patients from Kaggle.

python
# Load the heart disease dataset
file_df = pai.read_csv("./data/heart.csv")

# Display the first few rows
file_df.head()

Create the Semantic Layer

Requirements for the semantic layer:

  • path: Must be in the format 'organization/dataset'
  • name: A descriptive name for the dataset
  • df: The dataframe holding the raw data
  • description: A brief overview of the dataset
  • columns: A list of dictionaries, one per column, in the format:
    python
    {
        "name": "column_name",
        "type": "column_type",  # e.g. string, integer, float, datetime
        "description": "column_description"
    }
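Before calling pai.create(), it can help to check a column spec against the requirements listed above. The sketch below uses a hypothetical helper, validate_semantic_layer, written for this notebook only; it is not part of PandasAI, which performs its own validation. The set of allowed type names is an assumption based on the types used in this example, so consult the PandasAI documentation for the authoritative list.

```python
import re

# Assumption: these type names are the ones used in this notebook plus
# "datetime"; they are not an authoritative list of PandasAI column types.
ALLOWED_TYPES = {"string", "integer", "float", "datetime"}
REQUIRED_KEYS = {"name", "type", "description"}

# 'organization/dataset': exactly two non-empty segments, one slash, no spaces.
PATH_PATTERN = re.compile(r"[^/\s]+/[^/\s]+")


def validate_semantic_layer(path, columns):
    """Hypothetical helper: return a list of problems (empty list = looks valid)."""
    problems = []
    if not PATH_PATTERN.fullmatch(path):
        problems.append(f"path {path!r} is not in 'organization/dataset' format")
    seen = set()
    for i, col in enumerate(columns):
        missing = REQUIRED_KEYS - set(col)
        if missing:
            # Skip the remaining checks for a column that lacks required keys.
            problems.append(f"column #{i} is missing keys: {sorted(missing)}")
            continue
        if col["type"] not in ALLOWED_TYPES:
            problems.append(f"column {col['name']!r} has unknown type {col['type']!r}")
        if col["name"] in seen:
            problems.append(f"column {col['name']!r} is declared twice")
        seen.add(col["name"])
    return problems


print(validate_semantic_layer("organization/heart", [
    {"name": "Age", "type": "integer", "description": "Age of the patient in years"},
    {"name": "Sex", "type": "text", "description": "Sex of the patient"},
]))
# → ["column 'Sex' has unknown type 'text'"]
```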
    
python
dataset = pai.create(
    path="organization/heart",
    name="Heart",
    description="Heart Disease Dataset",
    df=file_df,
    columns=[
        {
            "name": "Age",
            "type": "integer",
            "description": "Age of the patient in years"
        },
        {
            "name": "Sex",
            "type": "string",
            "description": "Sex of the patient (M: Male, F: Female)"
        },
        {
            "name": "ChestPainType",
            "type": "string",
            "description": "Type of chest pain (ATA: Atypical Angina, NAP: Non-Anginal Pain, ASY: Asymptomatic, TA: Typical Angina)"
        },
        {
            "name": "RestingBP",
            "type": "integer",
            "description": "Resting blood pressure in mm Hg"
        },
        {
            "name": "Cholesterol",
            "type": "integer",
            "description": "Serum cholesterol in mg/dl"
        },
        {
            "name": "FastingBS",
            "type": "integer",
            "description": "Fasting blood sugar (1: if FastingBS > 120 mg/dl, 0: otherwise)"
        },
        {
            "name": "RestingECG",
            "type": "string",
            "description": "Resting electrocardiogram results (Normal, ST: having ST-T wave abnormality, LVH: showing probable or definite left ventricular hypertrophy)"
        },
        {
            "name": "MaxHR",
            "type": "integer",
            "description": "Maximum heart rate achieved"
        },
        {
            "name": "ExerciseAngina",
            "type": "string",
            "description": "Exercise-induced angina (Y: Yes, N: No)"
        },
        {
            "name": "Oldpeak",
            "type": "float",
            "description": "ST depression induced by exercise relative to rest"
        },
        {
            "name": "ST_Slope",
            "type": "string",
            "description": "Slope of the peak exercise ST segment (Up, Flat, Down)"
        },
        {
            "name": "HeartDisease",
            "type": "integer",
            "description": "Target variable - Heart disease presence (1: heart disease, 0: normal)"
        }
    ])
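The semantic layer describes columns by name, so a typo or a capitalization mismatch means a description no longer lines up with the data. A minimal sketch of a check for this, using a hypothetical compare_columns helper (not a PandasAI function), that lists declared columns missing from the data and data columns left undeclared:

```python
from __future__ import annotations

from typing import Iterable


def compare_columns(actual: Iterable[str], declared: list[dict]) -> tuple[list[str], list[str]]:
    """Hypothetical helper comparing dataframe columns with semantic-layer columns.

    Returns (declared_but_missing, present_but_undeclared). Matching is exact
    and case-sensitive; both lists keep their input order.
    """
    actual_names = list(actual)
    declared_names = [col["name"] for col in declared]
    missing = [name for name in declared_names if name not in actual_names]
    undeclared = [name for name in actual_names if name not in declared_names]
    return missing, undeclared


# Toy example: "sex" (lowercase) does not match the "Sex" header.
header = ["Age", "Sex", "Cholesterol"]
spec = [
    {"name": "Age", "type": "integer", "description": "Age of the patient in years"},
    {"name": "sex", "type": "string", "description": "Sex of the patient"},
]
print(compare_columns(header, spec))  # → (['sex'], ['Sex', 'Cholesterol'])
```

With the notebook's objects you could call compare_columns(file_df.columns, columns_spec), assuming file_df exposes pandas-style .columns and that you first assign the list passed as columns= to a variable (columns_spec is a hypothetical name).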

Load Semantic Dataframe

Once pai.create() has saved the dataframe together with its semantic layer, you can load it in any later session with pai.load(), passing the same path. This allows you to:

  • Maintain data context across sessions
  • Ask questions about your data in natural language
  • Generate more accurate analysis and visualizations

python
# Load the semantically enhanced dataset
df = pai.load("organization/heart")
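PandasAI handles storage itself. The sketch below is only a toy illustration of the idea behind defining the semantic layer once and reusing it across sessions: the schema is persisted under a key derived from the 'organization/dataset' path and looked up again by that key. The JSON format, the schema.json filename, the directory layout, and the save_schema/load_schema helpers are all assumptions of this sketch, not PandasAI's actual storage format.

```python
import json
import tempfile
from pathlib import Path


def save_schema(root: Path, path: str, schema: dict) -> Path:
    """Write a schema dict to <root>/<organization>/<dataset>/schema.json."""
    org, dataset = path.split("/")  # raises ValueError unless exactly one slash
    target = root / org / dataset / "schema.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(schema, indent=2))
    return target


def load_schema(root: Path, path: str) -> dict:
    """Read back the schema saved for 'organization/dataset'."""
    org, dataset = path.split("/")
    return json.loads((root / org / dataset / "schema.json").read_text())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    schema = {
        "name": "Heart",
        "description": "Heart Disease Dataset",
        "columns": [
            {"name": "Age", "type": "integer", "description": "Age of the patient in years"}
        ],
    }
    save_schema(root, "organization/heart", schema)
    # A later session only needs the path string to get the context back.
    print(load_schema(root, "organization/heart") == schema)  # → True
```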

Chat with your dataframe

You can now ask your dataframe questions in natural language with its chat() method. PandasAI works with several LLMs; this example uses LiteLLM, which is provided by the separate pandasai-litellm package.

python
from pandasai_litellm.litellm import LiteLLM

# Initialize LiteLLM with your OpenAI model
llm = LiteLLM(model="gpt-4.1-mini", api_key="YOUR_OPENAI_API_KEY")

# Configure PandasAI to use this LLM
pai.config.set({
    "llm": llm
})

response = df.chat("What is the correlation between age and cholesterol?")

print(response)
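Because the answer is generated by an LLM, it can be worth cross-checking a numeric result like this one yourself. The question does not say which correlation is meant; assuming Pearson's r, the sketch below computes it in plain Python as the covariance divided by the square root of the product of the two variances (the function name pearson is hypothetical). pandas' Series.corr also defaults to Pearson, so if file_df behaves like a pandas DataFrame (an assumption here), file_df["Age"].corr(file_df["Cholesterol"]) should give the same value.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of co-deviations and squared deviations (the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy values, not taken from the heart dataset:
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # → 1.0 (perfectly linear)
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))  # → -1.0 (perfectly inverse)
```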