Back to Claude Scientific Skills

Index Tables Guide for IDC

scientific-skills/imaging-data-commons/references/index_tables_guide.md

2.38.07.2 KB
Original Source

Index Tables Guide for IDC

Tested with: idc-index 0.11.14 (IDC data version v23)

This guide covers the structure and access patterns for IDC index tables: programmatic schema discovery, DataFrame access, and join column references. For the overview of available tables and their purposes, see the "Index Tables" section in the main SKILL.md.

Complete index table documentation: https://idc-index.readthedocs.io/en/latest/indices_reference.html

When to Use This Guide

Load this guide when you need to:

  • Discover table schemas and column types programmatically
  • Access index tables as pandas DataFrames (not via SQL)
  • Understand key columns and join relationships between tables

For SQL query examples (filter discovery, finding annotations, size estimation), see references/sql_patterns.md.

Prerequisites

bash
pip install --upgrade idc-index

Accessing Index Tables

python
from idc_index import IDCClient
client = IDCClient()

# Query the primary index (always available)
results = client.sql_query("SELECT * FROM index WHERE Modality = 'CT' LIMIT 10")

# Fetch and query additional indices
client.fetch_index("collections_index")
collections = client.sql_query("SELECT collection_id, CancerTypes, TumorLocations FROM collections_index")

client.fetch_index("analysis_results_index")
analysis = client.sql_query("SELECT * FROM analysis_results_index LIMIT 5")

As pandas DataFrames (direct access)

python
# Primary index (always available after client initialization)
df = client.index

# Fetch and access on-demand indices
client.fetch_index("sm_index")
sm_df = client.sm_index

Discovering Table Schemas

The indices_overview dictionary contains complete schema information for all tables. Always consult this when writing queries or exploring data structure.

DICOM attribute mapping: Many columns are populated directly from DICOM attributes in the source files. The column description in the schema indicates when a column corresponds to a DICOM attribute (e.g., "DICOM Modality attribute" or references a DICOM tag). This allows leveraging DICOM knowledge when querying — standard DICOM attribute names like PatientID, StudyInstanceUID, Modality, BodyPartExamined work as expected.

python
from idc_index import IDCClient
client = IDCClient()

# List all available indices with descriptions
for name, info in client.indices_overview.items():
    print(f"\n{name}:")
    print(f"  Installed: {info['installed']}")
    print(f"  Description: {info['description']}")

# Get complete schema for a specific index (columns, types, descriptions)
schema = client.indices_overview["index"]["schema"]
print(f"\nTable: {schema['table_description']}")
print("\nColumns:")
for col in schema['columns']:
    desc = col.get('description', 'No description')
    # Description indicates if column is from DICOM attribute
    print(f"  {col['name']} ({col['type']}): {desc}")

# Find columns that are DICOM attributes (check description for "DICOM" reference)
dicom_cols = [c['name'] for c in schema['columns'] if 'DICOM' in c.get('description', '').upper()]
print(f"\nDICOM-sourced columns: {dicom_cols}")

Alternative: use get_index_schema() method:

python
schema = client.get_index_schema("index")
# Returns same schema dict: {'table_description': ..., 'columns': [...]}

Key Columns Reference

Most common columns in the primary index table (use indices_overview for complete list and descriptions):

ColumnTypeDICOMDescription
collection_idSTRINGNoIDC collection identifier
analysis_result_idSTRINGNoIf applicable, indicates what analysis results collection given series is part of
source_DOISTRINGNoDOI linking to dataset details; use for learning more about the content and for attribution (see citations below)
PatientIDSTRINGYesPatient identifier
StudyInstanceUIDSTRINGYesDICOM Study UID
SeriesInstanceUIDSTRINGYesDICOM Series UID — use for downloads/viewing
ModalitySTRINGYesImaging modality (CT, MR, PT, SM, SEG, ANN, RTSTRUCT, etc.)
BodyPartExaminedSTRINGYesAnatomical region
SeriesDescriptionSTRINGYesDescription of the series
ManufacturerSTRINGYesEquipment manufacturer
StudyDateSTRINGYesDate study was performed
PatientSexSTRINGYesPatient sex
PatientAgeSTRINGYesPatient age at time of study
license_short_nameSTRINGNoLicense type (CC BY 4.0, CC BY-NC 4.0, etc.)
series_size_MBFLOATNoSize of series in megabytes
instanceCountINTEGERNoNumber of DICOM instances in series
SOPClassUIDSTRINGYesDICOM SOP Class UID (identifies the object/service class, e.g., CT Image Storage)
TransferSyntaxUIDSTRINGYesDICOM Transfer Syntax UID (encoding/compression method)

DICOM = Yes: Column value extracted from the DICOM attribute with the same name. Refer to the DICOM standard for numeric tag mappings. Use standard DICOM knowledge for expected values and formats.

Join Column Reference

Use this table to identify join columns between index tables. Always call client.fetch_index("table_name") before using a table in SQL.

Table ATable BJoin Condition
indexcollections_indexindex.collection_id = collections_index.collection_id
indexsm_indexindex.SeriesInstanceUID = sm_index.SeriesInstanceUID
indexseg_indexindex.SeriesInstanceUID = seg_index.segmented_SeriesInstanceUID
indexann_indexindex.SeriesInstanceUID = ann_index.SeriesInstanceUID
ann_indexann_group_indexann_index.SeriesInstanceUID = ann_group_index.SeriesInstanceUID
indexclinical_indexindex.collection_id = clinical_index.collection_id (then filter by patient)
indexcontrast_indexindex.SeriesInstanceUID = contrast_index.SeriesInstanceUID
indexvolume_geometry_indexindex.SeriesInstanceUID = volume_geometry_index.SeriesInstanceUID
indexrtstruct_indexindex.SeriesInstanceUID = rtstruct_index.SeriesInstanceUID
rtstruct_indexindex (source images)rtstruct_index.referenced_SeriesInstanceUID = index.SeriesInstanceUID

For complete query examples using these joins, see references/sql_patterns.md.

Troubleshooting

Issue: Column not found in table

  • Cause: Column name misspelled or doesn't exist in that table
  • Solution: Use client.indices_overview["table_name"]["schema"]["columns"] to list available columns

Issue: DataFrame access returns None

  • Cause: Index not fetched or property name incorrect
  • Solution: Fetch first with client.fetch_index(), then access via property matching the index name

Resources