
🚀 Gretel to Opik Integration: Creating Q&A Datasets for Model Evaluation

The Story: You need high-quality Q&A datasets to evaluate your AI models, but creating them manually is time-consuming and expensive. This cookbook shows you how to use Gretel's synthetic data generation to create diverse, realistic Q&A datasets and import them into Opik for model evaluation and optimization.

What you'll accomplish:

  1. Generate synthetic Q&A data using Gretel Data Designer
  2. Convert it to Opik format
  3. Import into Opik for model evaluation
  4. See your dataset in the Opik UI

📋 Prerequisites

  • Gretel Account: Sign up at gretel.ai and get your API key
  • Comet Account: Sign up at comet.com for Opik access

Let's get started! 🎯

๐Ÿ› ๏ธ Two Approaches Available

This cookbook demonstrates two methods for generating synthetic data with Gretel:

  1. Data Designer (recommended for custom datasets): Create datasets from scratch with precise control
  2. Safe Synthetics (recommended for existing data): Generate synthetic versions of existing datasets

We'll start with Data Designer, then show Safe Synthetics as an alternative.

💾 Step 1: Install Required Packages

We'll install the Gretel client and Opik SDK:

python
%pip install gretel-client opik pandas --upgrade --quiet

๐Ÿ” Step 2: Authentication Setup

Let's authenticate with both Gretel and Opik:

python
import os
import getpass
import opik
import pandas as pd

print("๐Ÿ” Setting up authentication...")

# Set up Gretel API key
if "GRETEL_API_KEY" not in os.environ:
    os.environ["GRETEL_API_KEY"] = getpass.getpass("Enter your Gretel API key: ")

# Set up Opik (will prompt for API key if not configured)
opik.configure()

print("โœ… Authentication completed!")

📊 Step 3: Generate Q&A Dataset with Gretel Data Designer

Now we'll use Gretel Data Designer to generate synthetic Q&A data. We'll create questions and answers about AI and machine learning:

python
from gretel_client.navigator_client import Gretel  # Data Designer is accessed via the navigator client
from gretel_client.data_designer import columns as C
from gretel_client.data_designer import params as P

print("๐Ÿค– Setting up Q&A dataset generation with Gretel Data Designer...")

# Initialize Data Designer using the navigator_client and factory method
gretel_navigator = Gretel()  # This creates the navigator client
dd = gretel_navigator.data_designer.new(model_suite="apache-2.0")

# Add topic column (categorical sampler)
dd.add_column(
    C.SamplerColumn(
        name="topic",
        type=P.SamplerType.CATEGORY,
        params=P.CategorySamplerParams(
            values=[
                "neural networks", "deep learning", "machine learning", "NLP", 
                "computer vision", "reinforcement learning", "AI ethics", "data science"
            ]
        )
    )
)

# Add difficulty column
dd.add_column(
    C.SamplerColumn(
        name="difficulty",
        type=P.SamplerType.CATEGORY,
        params=P.CategorySamplerParams(
            values=["beginner", "intermediate", "advanced"]
        )
    )
)

# Add question column (LLM-generated)
dd.add_column(
    C.LLMTextColumn(
        name="question",
        prompt=(
            "Generate a challenging, specific question about {{ topic }} "
            "at {{ difficulty }} level. The question should be clear, focused, "
            "and something a student or practitioner might actually ask."
        )
    )
)

# Add answer column (LLM-generated)
dd.add_column(
    C.LLMTextColumn(
        name="answer",
        prompt=(
            "Provide a clear, accurate, and comprehensive answer to this {{ difficulty }}-level "
            "question about {{ topic }}: '{{ question }}'. The answer should be educational "
            "and directly address all aspects of the question."
        )
    )
)

print("๐Ÿ“Š Generating Q&A dataset...")

# Generate the dataset
workflow_run = dd.create(num_records=20, wait_until_done=True)
synthetic_df = workflow_run.dataset.df

print(f"โœ… Generated {len(synthetic_df)} Q&A pairs!")
print(f"\n๐Ÿ“Š Dataset shape: {synthetic_df.shape}")
print(f"๐Ÿ“‹ Columns: {list(synthetic_df.columns)}")

# Display first few rows
print("\n๐Ÿ“„ Sample data:")
synthetic_df.head(3)

🔄 Step 4: Convert to Opik Format

Let's convert our Gretel-generated data to the format Opik expects:

python
def convert_to_opik_format(df):
    """Convert Gretel Q&A data to Opik dataset format"""
    opik_items = []
    
    for _, row in df.iterrows():
        # Create Opik dataset item
        item = {
            "input": {
                "question": row["question"]
            },
            "expected_output": row["answer"],
            "metadata": {
                "topic": row.get("topic", "AI/ML"),
                "difficulty": row.get("difficulty", "unknown"),
                "source": "gretel_navigator"
            }
        }
        opik_items.append(item)
    
    return opik_items

print("๐Ÿ”„ Converting to Opik format...")

opik_data = convert_to_opik_format(synthetic_df)

print(f"โœ… Converted {len(opik_data)} items to Opik format!")
print("\n๐Ÿ“‹ Sample converted item:")
import json
print(json.dumps(opik_data[0], indent=2))

📤 Step 5: Push Dataset to Opik

Now let's upload our dataset to Opik where it can be used for model evaluation:

python
print("๐Ÿ“ค Pushing dataset to Opik...")

# Initialize Opik client
opik_client = opik.Opik()

# Create the dataset
dataset_name = "gretel-ai-qa-dataset"
dataset = opik_client.get_or_create_dataset(
    name=dataset_name,
    description="Synthetic Q&A dataset generated using Gretel Data Designer for AI/ML evaluation"
)

# Insert the data
dataset.insert(opik_data)

print(f"โœ… Successfully created dataset: {dataset.name}")
print(f"๐Ÿ†” Dataset ID: {dataset.id}")
print(f"๐Ÿ“Š Total items: {len(opik_data)}")

The dataset can now be viewed in the Opik UI.

✅ Step 6: Verify Your Dataset

Let's confirm the dataset was created successfully and see how to use it:

python
print("๐Ÿ” Verifying dataset creation...")

# Try to retrieve the dataset
try:
    retrieved_dataset = opik_client.get_dataset(dataset_name)
    print(f"โœ… Dataset verified: {retrieved_dataset.name}")
    print(f"๐Ÿ†” Dataset ID: {retrieved_dataset.id}")
    
    print(f"\n๐ŸŽฏ Next steps:")
    print(f"1. Go to https://www.comet.com")
    print(f"2. Navigate to Opik โ†’ Datasets")
    print(f"3. Find your dataset: {dataset_name}")
    print(f"4. Use it to evaluate your AI models!")
    
except Exception as e:
    print(f"โŒ Could not verify dataset: {e}")
    print("Please check your Opik configuration and try again.")

🧪 Step 7: Example Model Evaluation

Here's how you can use your new dataset to evaluate a model with Opik:

python
# Example: Simple Q&A model evaluation
@opik.track
def simple_qa_model(input_data):
    """A simple example model that generates responses to questions"""
    question = input_data.get('question', '')
    
    # This is just an example - replace with your actual model
    if 'neural network' in question.lower():
        return "A neural network is a computational model inspired by biological neural networks."
    elif 'machine learning' in question.lower():
        return "Machine learning is a subset of AI that enables systems to learn from data."
    else:
        return "This is a complex AI/ML topic that requires detailed explanation."

print("๐Ÿงช Example model evaluation setup:")
print(f"Dataset: {dataset_name}")
print("Model: simple_qa_model (replace with your actual model)")
print("\n๐Ÿ’ก To run evaluation, uncomment and run the following code:")
print("\n๐ŸŽ‰ Integration complete! Your Gretel-generated dataset is ready for model evaluation in Opik.")

Congratulations! 🎉 You've successfully:

  1. Generated synthetic Q&A data using Gretel Data Designer's advanced column types
  2. Converted the data to Opik's expected format
  3. Created a dataset in Opik for model evaluation
  4. Set up the foundation for AI model testing and optimization

The key advantage of Gretel Data Designer is its modular approach: you define exactly the data you want using samplers (for categorical fields) and LLM columns (for generated text), giving you precise control over your synthetic dataset. The sketch below shows how the same pattern extends to additional columns.
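
For example, a short sketch that reuses only the column types demonstrated in Step 3; the `audience` and `hint` columns are hypothetical additions:

python
# Hypothetical extension of the Step 3 design: one more categorical sampler
# plus an LLM column that references earlier columns via the same template syntax.
dd.add_column(
    C.SamplerColumn(
        name="audience",  # hypothetical column
        type=P.SamplerType.CATEGORY,
        params=P.CategorySamplerParams(values=["student", "engineer", "researcher"]),
    )
)

dd.add_column(
    C.LLMTextColumn(
        name="hint",  # hypothetical column
        prompt=(
            "Write a one-sentence hint that helps a {{ audience }} begin answering "
            "this {{ difficulty }}-level question about {{ topic }}: '{{ question }}'."
        ),
    )
)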


🔗 Next Steps

  • View your dataset: Go to your Comet workspace → Opik → Datasets
  • Evaluate models: Use the dataset to test your Q&A models
  • Optimize prompts: Use Opik's Agent Optimizer with your synthetic data
  • Scale up: Generate larger datasets for more comprehensive testing


Happy evaluating! 🚀

🔄 Alternative: Using Gretel Safe Synthetics

If you have an existing Q&A dataset and want to create a synthetic version, you can use Gretel Safe Synthetics instead:

python
%%capture
%pip install -U gretel-client

Step A: Prepare Sample Data

python
import pandas as pd
from gretel_client.navigator_client import Gretel

# Initialize Gretel client
gretel = Gretel(api_key="prompt")

# Option 1: Use Gretel's sample ecommerce dataset (200+ records; shown for reference, not used below)
my_data_source = "https://gretel-datasets.s3.us-west-2.amazonaws.com/ecommerce_customers.csv"

# Option 2: Create your own Q&A dataset (needs 200+ records for holdout)
# For demonstration, we'll create a larger dataset
sample_questions = [
    'What is machine learning?',
    'How do neural networks work?',
    'What is the difference between AI and ML?',
    'Explain deep learning concepts',
    'What are the applications of NLP?'
] * 50  # Repeat to get 250 records

sample_answers = [
    'Machine learning is a subset of AI that enables systems to learn from data.',
    'Neural networks are computational models inspired by biological neural networks.',
    'AI is the broader concept while ML is a specific approach to achieve AI.',
    'Deep learning uses multi-layer neural networks to model complex patterns.',
    'NLP applications include chatbots, translation, sentiment analysis, and text generation.'
] * 50  # Repeat to get 250 records

sample_data = {
    'question': sample_questions,
    'answer': sample_answers,
    'topic': (['ML', 'Neural Networks', 'AI/ML', 'Deep Learning', 'NLP'] * 50),
    'difficulty': (['beginner', 'intermediate', 'beginner', 'advanced', 'intermediate'] * 50)
}

original_df = pd.DataFrame(sample_data)
print(f"๐Ÿ“„ Original dataset: {len(original_df)} records")
print(original_df.head())

# Important: Gretel requires at least 200 records to use a holdout
if len(original_df) < 200:
    print("⚠️ Warning: Dataset has fewer than 200 records. Holdout will be disabled.")

Step B: Generate Synthetic Version

python
# For a quick demo, disable the holdout and generate just a handful of records
synthetic_dataset = gretel.safe_synthetic_dataset \
    .from_data_source(original_df, holdout=None) \
    .synthesize(num_records=5) \
    .create()

# Wait for completion and get results
synthetic_dataset.wait_until_done()
synthetic_df_safe = synthetic_dataset.dataset.df

print(f"โœ… Generated {len(synthetic_df_safe)} synthetic Q&A pairs using Safe Synthetics!")
print(synthetic_df_safe.head())

Step C: View Results and Quality Report

python
# Preview synthetic data
print("๐Ÿ” Synthetic dataset preview:")
print(synthetic_dataset.dataset.df.head())

# View quality report table
print("๐Ÿ“Š Quality Report Summary:")
print(synthetic_dataset.report.table)

# View detailed HTML report in notebook
# synthetic_dataset.report.display_in_notebook()

# Access workflow details
print("\n๐Ÿ”ง Workflow Configuration:")
print(synthetic_dataset.config_yaml)

# List all workflow steps
print("\n๐Ÿ“‹ Workflow Steps:")
for step in synthetic_dataset.steps:
    print(f"- {step.name}")

Step D: Convert to Opik and Upload

python
# Same helper as in Step 4, repeated so this alternative section runs standalone
def convert_to_opik_format(df):
    """Convert Gretel Q&A data to Opik dataset format"""
    opik_items = []
    
    for _, row in df.iterrows():
        # Create Opik dataset item
        item = {
            "input": {
                "question": row["question"]
            },
            "expected_output": row["answer"],
            "metadata": {
                "topic": row.get("topic", "AI/ML"),
                "difficulty": row.get("difficulty", "unknown"),
                "source": "gretel_navigator"
            }
        }
        opik_items.append(item)
    
    return opik_items

# Initialize Opik client if not already defined
opik_client = opik.Opik()
# Convert and upload to Opik (same process as before)
opik_data_safe = convert_to_opik_format(synthetic_df_safe)

# Create dataset in Opik
dataset_safe = opik_client.get_or_create_dataset(
    name="gretel-safe-synthetics-qa-dataset",
    description="Synthetic Q&A dataset generated using Gretel Safe Synthetics"
)

dataset_safe.insert(opik_data_safe)
print(f"โœ… Safe Synthetics dataset created: {dataset_safe.name}")

The dataset can now be viewed in the Opik UI.

🚨 Important: Dataset Size Requirements

| Dataset Size | Holdout Setting | Example |
|---|---|---|
| < 200 records | holdout=None | from_data_source(df, holdout=None) |
| 200+ records | Default (5%) or custom | from_data_source(df) or from_data_source(df, holdout=0.1) |
| Large datasets | Custom percentage or count | from_data_source(df, holdout=250) |
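
The same settings expressed as code, as a sketch (`df` stands for your source DataFrame; each call returns a builder you would chain with .synthesize() and .create() as in Step B):

python
# Sketch of the holdout settings from the table above (df = your DataFrame).

# Fewer than 200 records: the holdout must be disabled explicitly.
gretel.safe_synthetic_dataset.from_data_source(df, holdout=None)

# 200+ records: the default 5% holdout applies.
gretel.safe_synthetic_dataset.from_data_source(df)

# Custom holdout: a fraction (10%) or a fixed record count.
gretel.safe_synthetic_dataset.from_data_source(df, holdout=0.1)
gretel.safe_synthetic_dataset.from_data_source(df, holdout=250)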

🤔 When to Use Which Approach?

| Use Case | Recommended Approach | Why |
|---|---|---|
| Creating new datasets from scratch | Data Designer | More control, custom column types, guided generation |
| Synthesizing existing datasets | Safe Synthetics | Preserves statistical relationships, privacy-safe |
| Custom data structures | Data Designer | Flexible column definitions, template system |
| Production data replication | Safe Synthetics | Maintains data utility while ensuring privacy |

Both approaches integrate seamlessly with Opik for model evaluation! 🎯