scientific-skills/imaging-data-commons/references/dicomweb_guide.md
IDC provides DICOMweb access through Google Cloud Healthcare API DICOM stores. This guide covers the implementation specifics and usage patterns.
Use DICOMweb when you need:
For most use cases, idc-index is simpler and recommended. Use DICOMweb when you specifically need the DICOMweb protocol.
- IDC public proxy (no authentication): `https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb`
- Google Healthcare API (requires authentication): `https://healthcare.googleapis.com/v1/projects/nci-idc-data/locations/us-central1/datasets/idc/dicomStores/idc-store-v{VERSION}/dicomWeb`
Replace {VERSION} with the IDC release number. To find the current version:
```python
from idc_index import IDCClient

client = IDCClient()
print(client.get_idc_version())  # IDC release used by idc-index, e.g. v23
```
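
Because the store name embeds the release number, it can help to build the Google Healthcare URL in one place. This is a minimal sketch, assuming the version may arrive as a bare number (`23`, `"23"`) or with a leading `v` (`"v23"`); `gcp_dicomweb_url` is a hypothetical helper, not part of idc-index.

```python
# Template copied from the Google Healthcare endpoint listed above.
GCP_DICOMWEB_TEMPLATE = (
    "https://healthcare.googleapis.com/v1/projects/nci-idc-data/locations/"
    "us-central1/datasets/idc/dicomStores/idc-store-v{VERSION}/dicomWeb"
)


def gcp_dicomweb_url(version) -> str:
    """Return the Google Healthcare DICOMweb base URL for an IDC release.

    Hypothetical helper: accepts 23, "23", or "v23" (assumed input formats).
    """
    # Normalize the input and drop an optional leading "v".
    number = str(version).strip().lower().lstrip("v")
    if not number.isdigit():
        raise ValueError(f"Unexpected IDC version: {version!r}")
    return GCP_DICOMWEB_TEMPLATE.format(VERSION=number)


print(gcp_dicomweb_url("v23"))
```

With idc-index installed, `gcp_dicomweb_url(client.get_idc_version())` would target the store matching the idc-index release.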
The Google Healthcare endpoint serves only data from the `idc-open-data` bucket, so it is missing the roughly 4% of IDC data stored in other buckets. See the coverage and authentication notes below.
Important: The two DICOMweb endpoints have different data coverage. The IDC public proxy contains MORE data than the authenticated Google Healthcare endpoint.
| Endpoint | Coverage | Missing Data |
|---|---|---|
| IDC Public Proxy | 100% | None |
| Google Healthcare API | ~96% | ~4% (two buckets not replicated) |
The Google Healthcare DICOM store only replicates data from the idc-open-data S3 bucket. It does not include data from two additional buckets:
- `idc-open-data-cr`
- `idc-open-data-two`

These missing buckets typically contain several thousand series each, representing approximately 4% of total IDC data. The exact counts vary by IDC version.
See cloud_storage_guide.md for details on bucket organization, file structure, and direct access methods.
After a new IDC version is published, the Google Healthcare DICOM store can lag behind for 1-2 weeks. Between releases, both endpoints are current.
Warning from IDC documentation: "Google-hosted DICOM store may not contain the latest version of IDC data!" Keep this in mind during the weeks following a new release.
Use IDC Public Proxy when:
Use Google Healthcare API when:
Before choosing an endpoint, verify whether your data might be in the missing buckets:
```python
from idc_index import IDCClient

client = IDCClient()

# Count series per storage bucket for your collection.
# series_aws_url looks like s3://<bucket>/<uuid>/*, so the bucket is
# the 3rd '/'-separated part of the URL.
results = client.sql_query("""
SELECT split_part(series_aws_url, '/', 3) AS bucket, COUNT(*) AS series_count
FROM index
WHERE collection_id = 'your_collection_id'
GROUP BY bucket
""")
print(results)
# Rows for 'idc-open-data-cr' or 'idc-open-data-two' are series that
# are NOT available from the Google Healthcare endpoint.
```
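
The same check can be done client-side on any list of `series_aws_url` values, for example a column from an `sql_query` result. A sketch, assuming URLs of the form `s3://<bucket>/<uuid>/*`; the helper names are hypothetical, and the bucket set is copied from the list of non-replicated buckets in this guide.

```python
from collections import Counter
from urllib.parse import urlparse

# Buckets that, per this guide, are NOT replicated to the Google Healthcare store.
NOT_IN_GOOGLE_STORE = {"idc-open-data-cr", "idc-open-data-two"}


def bucket_of(series_aws_url: str) -> str:
    """Extract the bucket from an s3:// URL such as 's3://idc-open-data/<uuid>/*'."""
    return urlparse(series_aws_url).netloc


def google_store_coverage(series_aws_urls):
    """Return (series count per bucket, number of series missing from the Google store)."""
    per_bucket = Counter(bucket_of(url) for url in series_aws_urls)
    missing = sum(n for bucket, n in per_bucket.items() if bucket in NOT_IN_GOOGLE_STORE)
    return per_bucket, missing
```

If `missing` is non-zero for your selection, the public proxy is the endpoint that covers all of it.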
IDC DICOMweb is provided through Google Cloud Healthcare API DICOM stores. The implementation follows DICOM PS3.18 Web Services with specific characteristics documented in the Google Healthcare DICOM conformance statement.
| Service | Description | Supported |
|---|---|---|
| QIDO-RS | Search for DICOM objects | Yes |
| WADO-RS | Retrieve DICOM objects and metadata | Yes |
| STOW-RS | Store DICOM objects | No (IDC is read-only) |
Not supported: URI Service, Worklist Service, Non-Patient Instance Service, Capabilities Transactions
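
QIDO-RS (search) and WADO-RS (retrieve) share the same study/series/instance resource hierarchy. The sketch below maps UIDs to the relative paths used by the requests in this guide; the helper names are hypothetical, and the study-level metadata path is the standard DICOM PS3.18 resource rather than one exercised in the examples here.

```python
def qido_path(study_uid=None, series_uid=None) -> str:
    """Return the QIDO-RS search path for the level implied by the UIDs given."""
    if series_uid is not None and study_uid is None:
        raise ValueError("series_uid requires study_uid")
    if study_uid is None:
        return "/studies"  # search for studies
    if series_uid is None:
        return f"/studies/{study_uid}/series"  # search series within one study
    return f"/studies/{study_uid}/series/{series_uid}/instances"  # search instances


def wado_metadata_path(study_uid, series_uid=None) -> str:
    """Return the WADO-RS metadata path for a whole study or a single series."""
    if series_uid is None:
        return f"/studies/{study_uid}/metadata"
    return f"/studies/{study_uid}/series/{series_uid}/metadata"
```

Appending either path to the base URL of an endpoint gives the full request URL, e.g. `f"{base_url}{qido_path(study_uid)}"`.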
The implementation supports a limited set of searchable tags:
| Level | Searchable Tags |
|---|---|
| Study | StudyInstanceUID, PatientName, PatientID, AccessionNumber, ReferringPhysicianName, StudyDate |
| Series | All study tags + SeriesInstanceUID, Modality |
| Instance | All series tags + SOPInstanceUID |
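
Because only these attributes can be matched, a client-side check can catch unsupported search keys before a request is sent. A sketch, assuming the `limit`/`offset`/`includefield` options are passed alongside search keys as in the examples in this guide; `check_qido_params` is hypothetical, and the option list may be incomplete for the Google Healthcare implementation.

```python
# Searchable attributes per query level, copied from the table above.
STUDY_KEYS = {
    "StudyInstanceUID", "PatientName", "PatientID",
    "AccessionNumber", "ReferringPhysicianName", "StudyDate",
}
SEARCHABLE = {
    "study": STUDY_KEYS,
    "series": STUDY_KEYS | {"SeriesInstanceUID", "Modality"},
    "instance": STUDY_KEYS | {"SeriesInstanceUID", "Modality", "SOPInstanceUID"},
}
# Assumption: generic QIDO-RS query options that are not attribute matches.
QUERY_OPTIONS = {"limit", "offset", "includefield"}


def check_qido_params(level: str, params: dict) -> dict:
    """Raise ValueError if params use attributes not searchable at this level."""
    allowed = SEARCHABLE[level] | QUERY_OPTIONS
    unsupported = sorted(key for key in params if key not in allowed)
    if unsupported:
        raise ValueError(f"Not searchable at {level} level: {', '.join(unsupported)}")
    return params
```

For example, `requests.get(url, params=check_qido_params("series", {"Modality": "CT", "limit": 10}))` would pass, while a key such as `BodyPartExamined` would be rejected locally.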
Important: Only exact matching is supported, except for:
All examples use the public proxy endpoint. For authenticated access to Google Healthcare, see the authentication section.
Use idc-index to discover data, then use DICOMweb for metadata access:
```python
from idc_index import IDCClient

client = IDCClient()

# Find studies of interest
results = client.sql_query("""
SELECT StudyInstanceUID, SeriesInstanceUID, PatientID, Modality
FROM index
WHERE collection_id = 'tcga_luad' AND Modality = 'CT'
LIMIT 5
""")

# Use these UIDs with DICOMweb
study_uid = results.iloc[0]['StudyInstanceUID']
series_uid = results.iloc[0]['SeriesInstanceUID']
print(f"Study: {study_uid}")
print(f"Series: {series_uid}")
```
```python
import requests

base_url = "https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb"

# Search for a specific study (QIDO-RS)
study_uid = "1.3.6.1.4.1.14519.5.2.1.6450.9002.307623500513044641407722230440"
response = requests.get(
    f"{base_url}/studies",
    params={"StudyInstanceUID": study_uid},
    headers={"Accept": "application/dicom+json"},
)
if response.status_code == 200:
    studies = response.json()
    print(f"Found {len(studies)} study(ies)")
```
```python
import requests

base_url = "https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb"
study_uid = "1.3.6.1.4.1.14519.5.2.1.6450.9002.307623500513044641407722230440"

# List the series in a study (QIDO-RS)
response = requests.get(
    f"{base_url}/studies/{study_uid}/series",
    headers={"Accept": "application/dicom+json"},
)
if response.status_code == 200:
    series_list = response.json()
    for series in series_list:
        # DICOM tags are returned as hex codes
        series_uid = series.get("0020000E", {}).get("Value", [None])[0]
        modality = series.get("00080060", {}).get("Value", [None])[0]
        description = series.get("0008103E", {}).get("Value", [""])[0]
        print(f"{modality}: {description}")
```
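
The inline `.get(...).get("Value", [None])[0]` pattern works for simple values, but in the DICOM JSON model Person Name (PN) attributes arrive as objects such as `{"Alphabetic": "Doe^Jane"}`. A small helper can centralize value extraction; `first_value` is hypothetical, not part of any library.

```python
def first_value(dataset: dict, tag: str, default=None):
    """Return the first value of a tag in a DICOM JSON dataset, or default."""
    values = dataset.get(tag, {}).get("Value")
    if not values:
        return default  # attribute absent, or present without a value
    value = values[0]
    # Person Name values are objects; return the alphabetic representation.
    if isinstance(value, dict) and "Alphabetic" in value:
        return value["Alphabetic"]
    return value
```

Inside the loop above, `first_value(series, "0008103E", "")` would replace the description lookup.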
```python
import requests

base_url = "https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb"
study_uid = "1.3.6.1.4.1.14519.5.2.1.6450.9002.307623500513044641407722230440"
series_uid = "1.3.6.1.4.1.14519.5.2.1.6450.9002.217441095430480124587725641302"

# List up to 10 instances in a series (QIDO-RS)
response = requests.get(
    f"{base_url}/studies/{study_uid}/series/{series_uid}/instances",
    params={"limit": 10},
    headers={"Accept": "application/dicom+json"},
)
if response.status_code == 200:
    instances = response.json()
    print(f"Found {len(instances)} instances")
    for inst in instances[:3]:
        sop_uid = inst.get("00080018", {}).get("Value", [None])[0]
        print(f"  SOPInstanceUID: {sop_uid}")
```
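
When a series has more instances than one page holds, the QIDO-RS `limit` parameter can be combined with `offset` to walk the full result set. This sketch assumes the server honors `offset` and may return fewer results than requested, so it stops only on an empty page; `iter_paged` and the `fetch_page` callback are hypothetical.

```python
from typing import Callable, Iterator, List


def iter_paged(
    fetch_page: Callable[[int, int], List[dict]], page_size: int = 100
) -> Iterator[dict]:
    """Yield QIDO-RS results page by page using limit/offset.

    fetch_page(limit, offset) must return one page as a list (empty when done).
    """
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        if not page:
            break  # an empty page means there is nothing left to read
        yield from page
        offset += len(page)  # advance by what was actually returned
```

A `fetch_page` for the instances query above could call `requests.get` with `params={"limit": limit, "offset": offset}` and return `response.json()` when the status is 200, or `[]` when it is 204 (no matches). Stopping on an empty page costs one extra request but stays correct if the server caps the page size below `page_size`.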
```python
import requests

base_url = "https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb"
study_uid = "1.3.6.1.4.1.14519.5.2.1.6450.9002.307623500513044641407722230440"
series_uid = "1.3.6.1.4.1.14519.5.2.1.6450.9002.217441095430480124587725641302"

# Retrieve metadata for every instance in a series (WADO-RS)
response = requests.get(
    f"{base_url}/studies/{study_uid}/series/{series_uid}/metadata",
    headers={"Accept": "application/dicom+json"},
)
if response.status_code == 200:
    instances = response.json()
    print(f"Retrieved metadata for {len(instances)} instances")

    # Extract image dimensions from first instance
    if instances:
        inst = instances[0]
        rows = inst.get("00280010", {}).get("Value", [None])[0]
        cols = inst.get("00280011", {}).get("Value", [None])[0]
        print(f"Image dimensions: {rows} x {cols}")
```
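
The example above reads only the first instance in the response. If you need the instances in a stable order, this sketch sorts them by InstanceNumber (tag `00200013`, VR IS, which is not in the tag table of this guide) instead of relying on response order. The helper names are hypothetical; for 3D volumes, spatial ordering would normally use ImagePositionPatient instead.

```python
INSTANCE_NUMBER = "00200013"  # InstanceNumber (VR IS)


def instance_number(inst: dict):
    """Return InstanceNumber as int, or None if the attribute is absent."""
    values = inst.get(INSTANCE_NUMBER, {}).get("Value")
    if not values:
        return None
    return int(values[0])  # IS may be encoded as a JSON number or a numeric string


def sort_by_instance_number(instances: list) -> list:
    """Sort instance metadata by InstanceNumber; instances without one go last."""
    def key(inst):
        number = instance_number(inst)
        return (number is None, number if number is not None else 0)

    return sorted(instances, key=key)
```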
```python
from idc_index import IDCClient
import requests

# Use idc-index for efficient discovery
idc = IDCClient()
results = idc.sql_query("""
SELECT StudyInstanceUID, SeriesInstanceUID, Modality, SeriesDescription
FROM index
WHERE collection_id = 'nlst' AND Modality = 'CT'
LIMIT 1
""")
study_uid = results.iloc[0]['StudyInstanceUID']
series_uid = results.iloc[0]['SeriesInstanceUID']
print(f"Found: {results.iloc[0]['SeriesDescription']}")

# Use DICOMweb to stream metadata without downloading files
base_url = "https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb"
response = requests.get(
    f"{base_url}/studies/{study_uid}/series/{series_uid}/metadata",
    headers={"Accept": "application/dicom+json"},
)
if response.status_code == 200:
    metadata = response.json()
    print(f"Retrieved metadata for {len(metadata)} instances without downloading files")
```
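
Streamed metadata is often enough for a quick sanity check before deciding whether to download a series. As a sketch, a hypothetical `distinct_values` helper can confirm whether all instances share the same image dimensions:

```python
import json


def distinct_values(instances: list, tag: str) -> set:
    """Collect the distinct first values of a tag across instance metadata."""
    found = set()
    for inst in instances:
        values = inst.get(tag, {}).get("Value")
        if not values:
            continue  # attribute absent or empty in this instance
        value = values[0]
        if isinstance(value, (dict, list)):
            # e.g. Person Name objects: serialize them so they are hashable
            value = json.dumps(value, sort_keys=True)
        found.add(value)
    return found
```

With `metadata` from the request above, `len(distinct_values(metadata, "00280010")) == 1` holds when every instance has the same number of rows.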
DICOMweb JSON responses key each attribute by its 8-digit hexadecimal tag rather than by keyword. Common tags:
| Tag | Name | Description |
|---|---|---|
| 00080018 | SOPInstanceUID | Unique instance identifier |
| 00080020 | StudyDate | Date study was performed |
| 00080060 | Modality | Imaging modality (CT, MR, PT, etc.) |
| 0008103E | SeriesDescription | Description of series |
| 00100020 | PatientID | Patient identifier |
| 0020000D | StudyInstanceUID | Unique study identifier |
| 0020000E | SeriesInstanceUID | Unique series identifier |
| 00280010 | Rows | Image height in pixels |
| 00280011 | Columns | Image width in pixels |
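
To work with keywords instead of hex codes, the table above can be turned into a lookup. This is a minimal sketch covering only the tags listed here; `to_keywords` is hypothetical. For full conversion of arbitrary attributes, pydicom's `Dataset.from_json` is an alternative if pydicom is available.

```python
# Keyword -> tag mapping copied from the table above (not a full DICOM dictionary).
TAGS = {
    "SOPInstanceUID": "00080018",
    "StudyDate": "00080020",
    "Modality": "00080060",
    "SeriesDescription": "0008103E",
    "PatientID": "00100020",
    "StudyInstanceUID": "0020000D",
    "SeriesInstanceUID": "0020000E",
    "Rows": "00280010",
    "Columns": "00280011",
}
KEYWORDS = {tag: name for name, tag in TAGS.items()}


def to_keywords(dataset: dict) -> dict:
    """Flatten a DICOM JSON dataset into {keyword: first value} for known tags."""
    out = {}
    for tag, element in dataset.items():
        name = KEYWORDS.get(tag.upper())  # tolerate lowercase hex digits
        values = element.get("Value")
        if name and values:
            out[name] = values[0]
    return out
```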
To use the Google Healthcare endpoint with higher quotas:
```python
from google.auth import default
from google.auth.transport.requests import Request
import requests

# Get Application Default Credentials (requires `gcloud auth application-default login`)
credentials, project = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(Request())  # obtain an access token

# Build authenticated request
base_url = "https://healthcare.googleapis.com/v1/projects/nci-idc-data/locations/us-central1/datasets/idc/dicomStores/idc-store-v23/dicomWeb"
response = requests.get(
    f"{base_url}/studies",
    params={"limit": 5},
    headers={
        "Authorization": f"Bearer {credentials.token}",
        "Accept": "application/dicom+json",
    },
)
response.raise_for_status()
print(f"Found {len(response.json())} studies")
```
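Prerequisites:

- Google Cloud SDK installed (`gcloud`)
- Application Default Credentials configured with `gcloud auth application-default login`

Troubleshooting:

- Use `idc-index` to discover UIDs first, then query DICOMweb with specific UIDs.
- Authentication errors: run `gcloud auth application-default login` and ensure your account has access.
- Quota limits: use smaller `limit` values, or use the authenticated endpoint for higher quotas.
- UIDs not found: confirm them with an `idc-index` query first.
- Data missing from the Google Healthcare endpoint: check whether it comes from the `idc-open-data-cr` or `idc-open-data-two` buckets, which are not available in the Google Healthcare endpoint.
- Large responses: use the `limit` parameter on instance queries, or query specific instances by SOPInstanceUID.

IDC Documentation: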
DICOMweb Standards and Tools:
Related Guides:
- `cloud_storage_guide.md` - Direct bucket access, file organization, CRDC UUIDs, and versioning
- `bigquery_guide.md` - Advanced metadata queries with full DICOM attributes