docs/en/datasets/explorer/api.md
!!! warning "Community Note ⚠️"
As of **`ultralytics>=8.3.12`**, Ultralytics Explorer has been removed. To use Explorer, install `pip install ultralytics==8.3.11`. Similar (and expanded) dataset exploration features are available in [Ultralytics Platform](https://platform.ultralytics.com/).
<a href="https://colab.research.google.com/github/ultralytics/ultralytics/blob/main/docs/en/datasets/explorer/explorer.ipynb"></a> The Explorer API is a Python API for exploring your datasets. It supports filtering and searching your dataset using SQL queries, vector similarity search, and semantic search.
<p align="center"> <iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/3VryynorQeo?start=279" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen> </iframe><strong>Watch:</strong> Ultralytics Explorer API Overview
</p>Explorer depends on external libraries for some of its functionality. These are automatically installed when you use Explorer. To manually install these dependencies, use the following command:
pip install ultralytics[explorer]
from ultralytics import Explorer
# Create an Explorer object
explorer = Explorer(data="coco128.yaml", model="yolo26n.pt")
# Create embeddings for your dataset
explorer.create_embeddings_table()
# Search for similar images to a given image/images
df = explorer.get_similar(img="path/to/image.jpg")
# Or search for similar images to a given index/indices
df = explorer.get_similar(idx=0)
!!! note
[Embeddings](https://www.ultralytics.com/glossary/embeddings) table for a given dataset and model pair is only created once and reused. These use [LanceDB](https://lancedb.github.io/lancedb/) under the hood, which scales on-disk, so you can create and reuse embeddings for large datasets like COCO without running out of memory.
In case you want to force update the embeddings table, you can pass force=True to create_embeddings_table method.
You can directly access the LanceDB table object to perform advanced analysis. Learn more about it in the Working with Embeddings Table section
Similarity search is a technique for finding similar images to a given image. It is based on the idea that similar images will have similar embeddings. Once the embeddings table is built, you can get run semantic search in any of the following ways:
exp.get_similar(idx=[1,10], limit=10)exp.get_similar(img=["path/to/img1", "path/to/img2"], limit=10)In case of multiple inputs, the aggregate of their embeddings is used.
You get a pandas DataFrame with the limit number of most similar data points to the input, along with their distance in the embedding space. You can use this dataset to perform further filtering.
!!! example "Semantic Search"
=== "Using Images"
```python
from ultralytics import Explorer
# create an Explorer object
exp = Explorer(data="coco128.yaml", model="yolo26n.pt")
exp.create_embeddings_table()
similar = exp.get_similar(img="https://ultralytics.com/images/bus.jpg", limit=10)
print(similar.head())
# Search using multiple indices
similar = exp.get_similar(
img=["https://ultralytics.com/images/bus.jpg", "https://ultralytics.com/images/bus.jpg"],
limit=10,
)
print(similar.head())
```
=== "Using Dataset Indices"
```python
from ultralytics import Explorer
# create an Explorer object
exp = Explorer(data="coco128.yaml", model="yolo26n.pt")
exp.create_embeddings_table()
similar = exp.get_similar(idx=1, limit=10)
print(similar.head())
# Search using multiple indices
similar = exp.get_similar(idx=[1, 10], limit=10)
print(similar.head())
```
You can also plot the similar images using the plot_similar method. This method takes the same arguments as get_similar and plots the similar images in a grid.
!!! example "Plotting Similar Images"
=== "Using Images"
```python
from ultralytics import Explorer
# create an Explorer object
exp = Explorer(data="coco128.yaml", model="yolo26n.pt")
exp.create_embeddings_table()
plt = exp.plot_similar(img="https://ultralytics.com/images/bus.jpg", limit=10)
plt.show()
```
=== "Using Dataset Indices"
```python
from ultralytics import Explorer
# create an Explorer object
exp = Explorer(data="coco128.yaml", model="yolo26n.pt")
exp.create_embeddings_table()
plt = exp.plot_similar(idx=1, limit=10)
plt.show()
```
This feature lets you filter your dataset using natural language, without writing SQL. The AI-powered query generator converts your prompt into a query and returns matching results. For example, you can ask: "show me 100 images with exactly one person and 2 dogs. There can be other objects too" and it will generate the query and show you those results. Note: This feature uses LLMs, so results are probabilistic and may be inaccurate.
!!! example "Ask AI"
```python
from ultralytics.data.explorer import plot_query_result
from ultralytics import Explorer
# create an Explorer object
exp = Explorer(data="coco128.yaml", model="yolo26n.pt")
exp.create_embeddings_table()
df = exp.ask_ai("show me 100 images with exactly one person and 2 dogs. There can be other objects too")
print(df.head())
# plot the results
plt = plot_query_result(df)
plt.show()
```
You can run SQL queries on your dataset using the sql_query method. This method takes a SQL query as input and returns a pandas DataFrame with the results.
!!! example "SQL Query"
```python
from ultralytics import Explorer
# create an Explorer object
exp = Explorer(data="coco128.yaml", model="yolo26n.pt")
exp.create_embeddings_table()
df = exp.sql_query("WHERE labels LIKE '%person%' AND labels LIKE '%dog%'")
print(df.head())
```
You can also plot the results of a SQL query using the plot_sql_query method. This method takes the same arguments as sql_query and plots the results in a grid.
!!! example "Plotting SQL Query Results"
```python
from ultralytics import Explorer
# create an Explorer object
exp = Explorer(data="coco128.yaml", model="yolo26n.pt")
exp.create_embeddings_table()
# plot the SQL Query
exp.plot_sql_query("WHERE labels LIKE '%person%' AND labels LIKE '%dog%' LIMIT 10")
```
You can also work with the embeddings table directly. Once the embeddings table is created, you can access it using the Explorer.table
!!! tip
Explorer works on [LanceDB](https://lancedb.github.io/lancedb/) tables internally. You can access this table directly, using `Explorer.table` object and run raw queries, push down pre- and post-filters, etc.
```python
from ultralytics import Explorer
exp = Explorer()
exp.create_embeddings_table()
table = exp.table
```
Here are some examples of what you can do with the table:
!!! example
```python
from ultralytics import Explorer
exp = Explorer()
exp.create_embeddings_table()
table = exp.table
embeddings = table.to_pandas()["vector"]
print(embeddings)
```
!!! example
```python
from ultralytics import Explorer
exp = Explorer(model="yolo26n.pt")
exp.create_embeddings_table()
table = exp.table
# Dummy embedding
embedding = [i for i in range(256)]
rs = table.search(embedding).metric("cosine").where("").limit(10)
```
When using large datasets, you can also create a dedicated vector index for faster querying. This is done using the create_index method on LanceDB table.
table.create_index(num_partitions=..., num_sub_vectors=...)
You can use the embeddings table to perform a variety of exploratory analysis. Here are some examples:
Explorer comes with a similarity_index operation:
max_dist to the current image in the generated embedding space, considering top_k similar images at a time.It returns a pandas DataFrame with the following columns:
idx: Index of the image in the datasetim_file: Path to the image filecount: Number of images in the dataset that are closer than max_dist to the current imagesim_im_files: List of paths to the count similar images!!! tip
For a given dataset, model, `max_dist` & `top_k` the similarity index once generated will be reused. In case, your dataset has changed, or you simply need to regenerate the similarity index, you can pass `force=True`.
!!! example "Similarity Index"
```python
from ultralytics import Explorer
exp = Explorer()
exp.create_embeddings_table()
sim_idx = exp.similarity_index()
```
You can use similarity index to build custom conditions to filter out the dataset. For example, you can filter out images that are not similar to any other image in the dataset using the following code:
import numpy as np
sim_count = np.array(sim_idx["count"])
sim_idx["im_file"][sim_count > 30]
You can also visualize the embedding space using the plotting tool of your choice. For example here is a simple example using matplotlib:
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
# Reduce dimensions using PCA to 3 components for visualization in 3D
pca = PCA(n_components=3)
reduced_data = pca.fit_transform(embeddings)
# Create a 3D scatter plot using Matplotlib Axes3D
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection="3d")
# Scatter plot
ax.scatter(reduced_data[:, 0], reduced_data[:, 1], reduced_data[:, 2], alpha=0.5)
ax.set_title("3D Scatter Plot of Reduced 256-Dimensional Data (PCA)")
ax.set_xlabel("Component 1")
ax.set_ylabel("Component 2")
ax.set_zlabel("Component 3")
plt.show()
Start creating your own CV dataset exploration reports using the Explorer API. For inspiration, check out the VOC Exploration Example.
Try our GUI Demo based on Explorer API
The Ultralytics Explorer API is designed for comprehensive dataset exploration. It allows users to filter and search datasets using SQL queries, vector similarity search, and semantic search. This powerful Python API can handle large datasets, making it ideal for various computer vision tasks using Ultralytics models.
To install the Ultralytics Explorer API along with its dependencies, use the following command:
pip install ultralytics[explorer]
This will automatically install all necessary external libraries for the Explorer API functionality. For additional setup details, refer to the installation section of our documentation.
You can use the Ultralytics Explorer API to perform similarity searches by creating an embeddings table and querying it for similar images. Here's a basic example:
from ultralytics import Explorer
# Create an Explorer object
explorer = Explorer(data="coco128.yaml", model="yolo26n.pt")
explorer.create_embeddings_table()
# Search for similar images to a given image
similar_images_df = explorer.get_similar(img="path/to/image.jpg")
print(similar_images_df.head())
For more details, please visit the Similarity Search section.
LanceDB, used under the hood by Ultralytics Explorer, provides scalable, on-disk embeddings tables. This ensures that you can create and reuse embeddings for large datasets like COCO without running out of memory. These tables are only created once and can be reused, enhancing efficiency in data handling.
The Ask AI feature allows users to filter datasets using natural language queries. This feature leverages LLMs to convert these queries into SQL queries behind the scenes. Here's an example:
from ultralytics import Explorer
# Create an Explorer object
explorer = Explorer(data="coco128.yaml", model="yolo26n.pt")
explorer.create_embeddings_table()
# Query with natural language
query_result = explorer.ask_ai("show me 100 images with exactly one person and 2 dogs. There can be other objects too")
print(query_result.head())
For more examples, check out the Ask AI section.