Hugging Face Datasets is a library that provides easy access to thousands of datasets for machine learning and natural language processing tasks.
This guide explains how to integrate Opik with Hugging Face Datasets to convert and import datasets into Opik for model evaluation and optimization.
Comet provides a hosted version of the Opik platform: simply create an account and grab your API key.
You can also run the Opik platform locally; see the installation guide for more information.
To use Hugging Face Datasets with Opik, install the `datasets` and `opik` packages along with a few supporting libraries:

```bash
pip install opik datasets transformers pandas tqdm huggingface_hub
```
Configure the Opik Python SDK for your deployment type; see the Python SDK Configuration guide for detailed instructions. You can configure the SDK from the command line with `opik configure`, or from Python with `opik.configure()`. To access private datasets on Hugging Face, you will also need a Hugging Face token. You can create and manage your Hugging Face tokens on this page.
You can set it as an environment variable:
```bash
export HUGGINGFACE_HUB_TOKEN="YOUR_TOKEN"
```
Or set it programmatically:
```python
import os
import getpass

if "HUGGINGFACE_HUB_TOKEN" not in os.environ:
    os.environ["HUGGINGFACE_HUB_TOKEN"] = getpass.getpass("Enter your Hugging Face token: ")

# Set project name for organization
os.environ["OPIK_PROJECT_NAME"] = "huggingface-datasets-integration-demo"
```
The integration provides a utility class to convert Hugging Face datasets to Opik format:
```python
from datasets import load_dataset, Dataset as HFDataset
from opik import Opik
from typing import Optional, Dict, Any, List
import json
from tqdm import tqdm
import warnings
import numpy as np
import pandas as pd

warnings.filterwarnings("ignore")


class HuggingFaceToOpikConverter:
    """Utility class to convert Hugging Face datasets to Opik format."""

    def __init__(self, opik_client: Opik):
        self.opik_client = opik_client

    def load_hf_dataset(
        self,
        dataset_name: str,
        split: Optional[str] = None,
        config: Optional[str] = None,
        subset_size: Optional[int] = None,
        **kwargs,
    ) -> HFDataset:
        """
        Load a dataset from Hugging Face Hub.

        Args:
            dataset_name: Name of the dataset on HF Hub
            split: Specific split to load (train, validation, test)
            config: Configuration/subset of the dataset
            subset_size: Limit the number of samples
            **kwargs: Additional arguments for load_dataset

        Returns:
            Loaded Hugging Face dataset
        """
        print(f"📥 Loading dataset: {dataset_name}")
        if config:
            print(f"   Config: {config}")
        if split:
            print(f"   Split: {split}")

        # Load the dataset
        dataset = load_dataset(
            dataset_name,
            name=config,
            split=split,
            **kwargs,
        )

        # Limit dataset size if specified
        if subset_size and len(dataset) > subset_size:
            dataset = dataset.select(range(subset_size))
            print(f"   Limited to {subset_size} samples")

        print(f"   ✅ Loaded {len(dataset)} samples")
        print(f"   Features: {list(dataset.features.keys())}")

        return dataset
```
Here's how to load a Hugging Face dataset and convert it to Opik format:
```python
# Initialize the converter
opik_client = Opik()
converter = HuggingFaceToOpikConverter(opik_client)

# Load a dataset from Hugging Face
dataset = converter.load_hf_dataset(
    dataset_name="squad",
    split="validation",
    subset_size=100,  # Limit for demo
)

# Convert to Opik format
opik_data = converter.convert_to_opik_format(
    dataset=dataset,
    input_columns=["question"],
    output_columns=["answers"],
    metadata_columns=["id", "title"],
    dataset_name="squad-qa-dataset",
    description="SQuAD question answering dataset converted from Hugging Face",
)

print(f"✅ Converted {len(opik_data)} items to Opik format!")
```
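The converted items are plain dictionaries; they still need to be uploaded before they appear in Opik. A minimal sketch, assuming the SDK's `get_or_create_dataset` and `insert` methods (check your installed `opik` version for the exact API). The `validate_opik_items` helper is hypothetical:

```python
from typing import Any, Dict, List


def validate_opik_items(items: List[Dict[str, Any]]) -> bool:
    """Check that every item has the keys convert_to_opik_format produces."""
    required = {"input", "expected_output", "metadata"}
    return all(required <= set(item) for item in items)


def upload_items_to_opik(items: List[Dict[str, Any]], dataset_name: str, description: str = "") -> int:
    """Upload converted items to an Opik dataset and return the item count."""
    # Imported inside the function so the sketch can be read without the SDK installed.
    from opik import Opik

    client = Opik()
    dataset = client.get_or_create_dataset(name=dataset_name, description=description)
    dataset.insert(items)
    return len(items)
```

Call `upload_items_to_opik(opik_data, "squad-qa-dataset")` after validating the items to make them visible in the Opik UI.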
The converter provides a method to transform Hugging Face datasets into Opik's expected format:
```python
def convert_to_opik_format(
    self,
    dataset: HFDataset,
    input_columns: List[str],
    output_columns: List[str],
    metadata_columns: Optional[List[str]] = None,
    dataset_name: str = "huggingface-dataset",
    description: str = "Dataset converted from Hugging Face",
) -> List[Dict[str, Any]]:
    """
    Convert a Hugging Face dataset to Opik format.

    Args:
        dataset: Hugging Face dataset
        input_columns: List of column names to use as input
        output_columns: List of column names to use as expected output
        metadata_columns: Optional list of columns to include as metadata
        dataset_name: Name for the Opik dataset
        description: Description for the Opik dataset

    Returns:
        List of Opik dataset items
    """
    opik_items = []

    for row in tqdm(dataset, desc="Converting to Opik format"):
        # Extract input data
        input_data = {}
        for col in input_columns:
            if col in dataset.features:
                input_data[col] = self._extract_field_value(row, col)

        # Extract expected output
        expected_output = {}
        for col in output_columns:
            if col in dataset.features:
                expected_output[col] = self._extract_field_value(row, col)

        # Extract metadata
        metadata = {}
        if metadata_columns:
            for col in metadata_columns:
                if col in dataset.features:
                    metadata[col] = self._extract_field_value(row, col)

        # Create Opik dataset item
        item = {
            "input": input_data,
            "expected_output": expected_output,
            "metadata": metadata,
        }
        opik_items.append(item)

    return opik_items
```
Use the @track decorator to create comprehensive traces when working with your converted datasets:
```python
from opik import track

@track
def evaluate_qa_model(dataset_item):
    """Evaluate a Q&A model using Hugging Face dataset."""
    question = dataset_item["input"]["question"]

    # Your model logic here (replace with actual model)
    if "what" in question.lower():
        response = "This is a question asking for information."
    elif "how" in question.lower():
        response = "This is a question asking for a process or method."
    else:
        response = "This is a general question that requires analysis."

    return {
        "question": question,
        "response": response,
        "expected": dataset_item["expected_output"],
        "metadata": dataset_item["metadata"],
    }

# Evaluate on your dataset
for item in opik_data[:5]:  # Evaluate first 5 items
    result = evaluate_qa_model(item)
    print(f"Question: {result['question'][:50]}...")
```
```python
# Load SQuAD dataset
squad_dataset = converter.load_hf_dataset(
    dataset_name="squad",
    split="validation",
    subset_size=50,
)

# Convert to Opik format
squad_opik = converter.convert_to_opik_format(
    dataset=squad_dataset,
    input_columns=["question"],
    output_columns=["answers"],
    metadata_columns=["id", "title"],
    dataset_name="squad-qa-dataset",
    description="SQuAD question answering dataset",
)
```
```python
# Load GLUE SST-2 dataset
sst2_dataset = converter.load_hf_dataset(
    dataset_name="glue",
    config="sst2",  # load_hf_dataset's parameter is named `config`
    split="validation",
    subset_size=100,
)

# Convert to Opik format
sst2_opik = converter.convert_to_opik_format(
    dataset=sst2_dataset,
    input_columns=["sentence"],
    output_columns=["label"],
    metadata_columns=["idx"],
    dataset_name="sst2-sentiment-dataset",
    description="SST-2 sentiment analysis dataset from GLUE",
)
```
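SST-2 stores labels as integer class ids (0 = negative, 1 = positive), so the converted `expected_output` values are ints. A small hypothetical helper to make them human-readable before or after conversion:

```python
SST2_LABELS = {0: "negative", 1: "positive"}


def readable_sst2_label(label: int) -> str:
    """Map a GLUE SST-2 integer class id to its string name."""
    return SST2_LABELS.get(label, "unknown")
```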
```python
# Load Common Crawl dataset. Select an explicit split so load_dataset
# returns a Dataset rather than a DatasetDict (which has no .select()).
cc_dataset = converter.load_hf_dataset(
    dataset_name="common_crawl",
    split="train",
    subset_size=200,
)

# Convert to Opik format
cc_opik = converter.convert_to_opik_format(
    dataset=cc_dataset,
    input_columns=["text"],
    output_columns=["language"],
    metadata_columns=["url", "timestamp"],
    dataset_name="common-crawl-dataset",
    description="Common Crawl text classification dataset",
)
```
Once your Hugging Face datasets are converted and uploaded to Opik, you can view them in the Opik UI. Each dataset item contains the input fields, expected outputs, and metadata columns you selected during conversion.
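For reference, a single converted SQuAD item has this shape (the values below are illustrative):

```python
import json

item = {
    "input": {"question": "What is the capital of France?"},
    "expected_output": {"answers": {"text": ["Paris"], "answer_start": [0]}},
    "metadata": {"id": "item-0", "title": "France"},
}

# Items are JSON-serializable, which is what Opik expects on upload.
print(json.dumps(item, indent=2))
```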
Once your Hugging Face datasets are in Opik, you can evaluate your LLM applications using Opik's evaluation framework:
```python
from opik.evaluation import evaluate
from opik.evaluation.metrics import Hallucination

# Define your evaluation task. The Hallucination metric scores
# "input", "output", and "context" keys from the task output.
def evaluation_task(x):
    return {
        "input": x["input"]["question"],
        "output": your_model(x["input"]["question"]),  # replace with your model call
        "context": [str(x["expected_output"]["answers"])],
    }

# Create the Hallucination metric
hallucination_metric = Hallucination()

# Run the evaluation. The dataset must be an Opik dataset object,
# so upload the converted items to Opik first.
evaluation_results = evaluate(
    experiment_name="huggingface-datasets-evaluation",
    dataset=squad_opik,
    task=evaluation_task,
    scoring_metrics=[hallucination_metric],
    project_name="my-project",
)
```
Make sure to set the following environment variables:
```bash
# Hugging Face Configuration (optional, for private datasets)
export HUGGINGFACE_HUB_TOKEN="your-huggingface-token"

# Opik Configuration
export OPIK_PROJECT_NAME="your-project-name"
export OPIK_WORKSPACE="your-workspace-name"
```
Use the `subset_size` parameter to limit large datasets during development.

Once you have Hugging Face Datasets integrated with Opik, you can explore the rest of the platform: trace your LLM calls over the converted data, run evaluations with Opik's metrics, and optimize your prompts against the datasets you imported.