scientific-skills/imaging-data-commons/references/index_tables_guide.md
Tested with: idc-index 0.11.14 (IDC data version v23)
This guide covers the structure and access patterns for IDC index tables: programmatic schema discovery, DataFrame access, and join column references. For the overview of available tables and their purposes, see the "Index Tables" section in the main SKILL.md.
Complete index table documentation: https://idc-index.readthedocs.io/en/latest/indices_reference.html
Load this guide when you need to:
For SQL query examples (filter discovery, finding annotations, size estimation), see references/sql_patterns.md.
pip install --upgrade idc-index
from idc_index import IDCClient
client = IDCClient()
# Query the primary index (always available)
results = client.sql_query("SELECT * FROM index WHERE Modality = 'CT' LIMIT 10")
# Fetch and query additional indices
client.fetch_index("collections_index")
collections = client.sql_query("SELECT collection_id, CancerTypes, TumorLocations FROM collections_index")
client.fetch_index("analysis_results_index")
analysis = client.sql_query("SELECT * FROM analysis_results_index LIMIT 5")
# Primary index (always available after client initialization)
df = client.index
# Fetch and access on-demand indices
client.fetch_index("sm_index")
sm_df = client.sm_index
The indices_overview dictionary contains complete schema information for all tables. Always consult this when writing queries or exploring data structure.
DICOM attribute mapping: Many columns are populated directly from DICOM attributes in the source files. The column description in the schema indicates when a column corresponds to a DICOM attribute (e.g., "DICOM Modality attribute" or references a DICOM tag). This allows leveraging DICOM knowledge when querying — standard DICOM attribute names like PatientID, StudyInstanceUID, Modality, BodyPartExamined work as expected.
from idc_index import IDCClient
client = IDCClient()
# List all available indices with descriptions
for name, info in client.indices_overview.items():
print(f"\n{name}:")
print(f" Installed: {info['installed']}")
print(f" Description: {info['description']}")
# Get complete schema for a specific index (columns, types, descriptions)
schema = client.indices_overview["index"]["schema"]
print(f"\nTable: {schema['table_description']}")
print("\nColumns:")
for col in schema['columns']:
desc = col.get('description', 'No description')
# Description indicates if column is from DICOM attribute
print(f" {col['name']} ({col['type']}): {desc}")
# Find columns that are DICOM attributes (check description for "DICOM" reference)
dicom_cols = [c['name'] for c in schema['columns'] if 'DICOM' in c.get('description', '').upper()]
print(f"\nDICOM-sourced columns: {dicom_cols}")
Alternative: use get_index_schema() method:
schema = client.get_index_schema("index")
# Returns same schema dict: {'table_description': ..., 'columns': [...]}
Most common columns in the primary index table (use indices_overview for complete list and descriptions):
| Column | Type | DICOM | Description |
|---|---|---|---|
collection_id | STRING | No | IDC collection identifier |
analysis_result_id | STRING | No | If applicable, indicates what analysis results collection given series is part of |
source_DOI | STRING | No | DOI linking to dataset details; use for learning more about the content and for attribution (see citations below) |
PatientID | STRING | Yes | Patient identifier |
StudyInstanceUID | STRING | Yes | DICOM Study UID |
SeriesInstanceUID | STRING | Yes | DICOM Series UID — use for downloads/viewing |
Modality | STRING | Yes | Imaging modality (CT, MR, PT, SM, SEG, ANN, RTSTRUCT, etc.) |
BodyPartExamined | STRING | Yes | Anatomical region |
SeriesDescription | STRING | Yes | Description of the series |
Manufacturer | STRING | Yes | Equipment manufacturer |
StudyDate | STRING | Yes | Date study was performed |
PatientSex | STRING | Yes | Patient sex |
PatientAge | STRING | Yes | Patient age at time of study |
license_short_name | STRING | No | License type (CC BY 4.0, CC BY-NC 4.0, etc.) |
series_size_MB | FLOAT | No | Size of series in megabytes |
instanceCount | INTEGER | No | Number of DICOM instances in series |
SOPClassUID | STRING | Yes | DICOM SOP Class UID (identifies the object/service class, e.g., CT Image Storage) |
TransferSyntaxUID | STRING | Yes | DICOM Transfer Syntax UID (encoding/compression method) |
DICOM = Yes: Column value extracted from the DICOM attribute with the same name. Refer to the DICOM standard for numeric tag mappings. Use standard DICOM knowledge for expected values and formats.
Use this table to identify join columns between index tables. Always call client.fetch_index("table_name") before using a table in SQL.
| Table A | Table B | Join Condition |
|---|---|---|
index | collections_index | index.collection_id = collections_index.collection_id |
index | sm_index | index.SeriesInstanceUID = sm_index.SeriesInstanceUID |
index | seg_index | index.SeriesInstanceUID = seg_index.segmented_SeriesInstanceUID |
index | ann_index | index.SeriesInstanceUID = ann_index.SeriesInstanceUID |
ann_index | ann_group_index | ann_index.SeriesInstanceUID = ann_group_index.SeriesInstanceUID |
index | clinical_index | index.collection_id = clinical_index.collection_id (then filter by patient) |
index | contrast_index | index.SeriesInstanceUID = contrast_index.SeriesInstanceUID |
index | volume_geometry_index | index.SeriesInstanceUID = volume_geometry_index.SeriesInstanceUID |
index | rtstruct_index | index.SeriesInstanceUID = rtstruct_index.SeriesInstanceUID |
rtstruct_index | index (source images) | rtstruct_index.referenced_SeriesInstanceUID = index.SeriesInstanceUID |
For complete query examples using these joins, see references/sql_patterns.md.
Issue: Column not found in table
client.indices_overview["table_name"]["schema"]["columns"] to list available columnsIssue: DataFrame access returns None
client.fetch_index(), then access via property matching the index namereferences/sql_patterns.md for query examples using these tablesreferences/clinical_data_guide.md for clinical data workflowsreferences/digital_pathology_guide.md for pathology-specific indices