scientific-skills/bids/references/conversion_tools.md
This reference covers detailed workflows for converting DICOM and other raw data formats to BIDS using the three main conversion tools.
HeuDiConv is the most flexible DICOM-to-BIDS converter. It supports three usage modes — from fully automatic turnkey conversion to fully custom heuristics — and handles duplicates, provenance tracking, and sourcedata archiving out of the box.
Repository: https://github.com/nipy/heudiconv Docs: https://heudiconv.readthedocs.io/ Tutorials: https://heudiconv.readthedocs.io/en/latest/tutorials.html
uv pip install heudiconv
# HeuDiConv wraps dcm2niix for the actual conversion
# dcm2niix is usually installed as a dependency, but can also be installed via:
# conda install -c conda-forge dcm2niix
# or: apt-get install dcm2niix
If scanner protocol names follow the ReproIn naming convention, conversion is fully automatic with no heuristic file to write. ReproIn is a setup for automatic generation of sharable, version-controlled BIDS datasets directly from MR scanners.
# Turnkey conversion — just point at DICOMs, HeuDiConv does the rest
heudiconv --files dicom/001 -o data -f reproin --bids --minmeta
Protocol names encode BIDS entities directly. Format: <seqtype>[-<suffix>][_<entity>-<label>]...
| Protocol name at scanner | BIDS output |
|---|---|
anat-T1w or just anat | sub-XX/anat/sub-XX_T1w.nii.gz |
func-bold_task-rest or func_task-rest | sub-XX/func/sub-XX_task-rest_bold.nii.gz |
dwi_dir-AP | sub-XX/dwi/sub-XX_dir-AP_dwi.nii.gz |
fmap_dir-PA or fmap-epi_dir-PA | sub-XX/fmap/sub-XX_dir-PA_epi.nii.gz |
fmap_acq-4mm | sub-XX/fmap/sub-XX_acq-4mm_epi.nii.gz |
Key features:
anat defaults to T1w, func to bold, fmap to epi — so they can be omittedanat-scout_ses-pre) and ReproIn propagates it to all sequences in that scanner Program/Patientrun-01, run-02, ...) when the same protocol is run multiple times--locator).tgz files under sourcedata/ for reproducibilitySee also:
If you already have collected data with non-ReproIn protocol names (or cannot control scanner naming), you can write a thin heuristic that maps your protocol names into ReproIn conventions. This gives you all ReproIn benefits (automatic entity handling, duplicate management, sourcedata archiving) while accommodating arbitrary scanner naming.
See https://github.com/repronim/reproin/issues/18 for a brief HOWTO on this approach.
The idea is to write a heuristic whose infotodict returns keys that follow ReproIn naming patterns, so the ReproIn machinery handles the rest.
For studies with complex mappings or non-standard requirements, write a full Python heuristic file. This is the most common workflow for retrospective conversion of existing datasets.
# -f convertall: built-in heuristic that lists all series without converting
# -c none: don't convert, just generate dicominfo.tsv
heudiconv \
--files dicom/219/itbs/*/*.dcm \
-s 219 \
-f convertall \
-c none \
-o Nifti/
This creates .heudiconv/219/info/dicominfo.tsv containing one row per DICOM series with columns:
series_id, sequence_name, protocol_name, series_descriptiondim1-dim4 (image dimensions), TR, TE, image_typeis_derived, is_motion_corrected — important for filteringReview this TSV (open in a spreadsheet) to understand what was acquired and plan the mapping to BIDS names. Step 1 only needs to be done once per project.
"""HeuDiConv heuristic for a typical fMRI study.
Study design:
- T1w MPRAGE anatomical
- Resting-state BOLD
- Task BOLD (n-back working memory)
- DWI with two phase-encoding directions
- Fieldmap (phase-difference)
"""
def create_key(template, outtype=('nii.gz',), annotation_classes=None):
if template is None or not template:
raise ValueError('Template must be a valid format string')
return template, outtype, annotation_classes
def infotodict(seqinfo):
"""Heuristic evaluator for determining which runs belong where.
Parameters
----------
seqinfo : list of namedtuples
Each namedtuple has fields: .series_id, .sequence_name,
.protocol_name, .series_description, .dim1, .dim2, .dim3, .dim4,
.TR, .TE, .is_derived, .is_motion_corrected, .image_type, etc.
Returns
-------
info : dict
Keys are tuples from create_key(), values are lists of series_id
"""
# Define BIDS output templates
t1w = create_key(
'sub-{subject}/{session}/anat/sub-{subject}_{session}_T1w'
)
rest_bold = create_key(
'sub-{subject}/{session}/func/sub-{subject}_{session}_task-rest_bold'
)
# {item:02d} auto-numbers runs when the same protocol is run multiple times
nback_bold = create_key(
'sub-{subject}/{session}/func/sub-{subject}_{session}_task-nback_run-{item:02d}_bold'
)
dwi_AP = create_key(
'sub-{subject}/{session}/dwi/sub-{subject}_{session}_dir-AP_dwi'
)
dwi_PA = create_key(
'sub-{subject}/{session}/dwi/sub-{subject}_{session}_dir-PA_dwi'
)
fmap_phasediff = create_key(
'sub-{subject}/{session}/fmap/sub-{subject}_{session}_phasediff'
)
fmap_mag1 = create_key(
'sub-{subject}/{session}/fmap/sub-{subject}_{session}_magnitude1'
)
fmap_mag2 = create_key(
'sub-{subject}/{session}/fmap/sub-{subject}_{session}_magnitude2'
)
info = {
t1w: [], rest_bold: [], nback_bold: [],
dwi_AP: [], dwi_PA: [],
fmap_phasediff: [], fmap_mag1: [], fmap_mag2: [],
}
for s in seqinfo:
protocol = s.protocol_name.lower()
series_desc = s.series_description.lower() if s.series_description else ''
# Anatomical — filter by dim3 to exclude localizers
if ('mprage' in protocol or 't1w' in protocol) and s.dim3 > 100:
info[t1w].append(s.series_id)
# Functional — filter by dim4 and exclude MOCO series
elif 'rest' in protocol and s.dim4 > 10 and not s.is_motion_corrected:
info[rest_bold].append(s.series_id)
elif 'nback' in protocol and s.dim4 > 10 and not s.is_motion_corrected:
info[nback_bold].append(s.series_id)
# Diffusion
elif ('dti' in protocol or 'dwi' in protocol) and s.dim4 > 1:
if 'ap' in protocol or 'ap' in series_desc:
info[dwi_AP].append(s.series_id)
elif 'pa' in protocol or 'pa' in series_desc:
info[dwi_PA].append(s.series_id)
# Fieldmaps
elif 'field' in protocol or 'fmap' in protocol:
if 'ph' in s.image_type_text.lower():
info[fmap_phasediff].append(s.series_id)
elif s.series_description and 'e1' in s.series_description.lower():
info[fmap_mag1].append(s.series_id)
elif s.series_description and 'e2' in s.series_description.lower():
info[fmap_mag2].append(s.series_id)
return info
# Convert with custom heuristic
heudiconv \
--files dicom/219/itbs/*/*.dcm \
-s 219 \
-ss itbs \
-f Nifti/code/heuristic.py \
-c dcm2niix \
--bids \
--minmeta \
-o Nifti/
# Or using -d template for batch conversion of multiple subjects
heudiconv \
-d /path/to/dicoms/{subject}/*/*/*.dcm \
-s 01 02 03 04 05 \
-f my_heuristic.py \
-c dcm2niix \
--bids \
--minmeta \
-o /path/to/bids_output
# Key flags:
# --files : point to specific DICOM files/directories
# -d : DICOM path template ({subject}, {session} are replaced)
# -s : subject label(s)
# -ss : session label
# -f : heuristic file path, or built-in name (reproin, convertall)
# -c : converter (dcm2niix, none)
# --bids / -b : output BIDS structure (creates JSON sidecars, etc.)
# --minmeta : prevent excess DICOM metadata from overflowing JSON sidecars
# -o : output directory
# --overwrite : re-run conversion overwriting existing files
Every conversion creates/updates a .heudiconv/ hidden directory alongside the output:
.heudiconv/<subject>/info/dicominfo.tsv — DICOM series metadata.heudiconv/<subject>/info/<heuristic>.py — copy of the heuristic usedImportant: If you re-run conversion for a subject/session that was already processed, HeuDiConv silently reuses cached conversion info from .heudiconv/. If troubleshooting, delete the subject's entry from .heudiconv/ (or the whole directory) and re-run.
Keep .heudiconv/ with your data — together with code/ it provides valuable provenance information.
--minmeta to prevent excess DICOM metadata from overflowing JSON sidecars — fMRIPrep and MRIQC may crash on bloated JSON files{item:02d} in templates for auto-numbering runs: if multiple series match, they get run-01, run-02, etc. Without this, later runs silently overwrite earlier onesdim3/dim4 to exclude localizers (small dim3) and single-volume scouts (dim4 == 1)s.is_motion_corrected to exclude scanner-generated MOCO series (e.g., if not s.is_motion_corrected)s.is_derived to skip other derived/processed seriescode/ for reproducibility--files when DICOM organization doesn't follow a clean {subject} template patterndcm2bids uses JSON configuration files instead of Python heuristics. Simpler for straightforward datasets.
Repository: https://github.com/UNFmontreal/Dcm2Bids Docs: https://unfmontreal.github.io/Dcm2Bids/
uv pip install dcm2bids
# Also installs dcm2niix
dcm2bids_scaffold -o /path/to/bids_output
Creates the basic BIDS structure with dataset_description.json, README, .bidsignore, etc.
dcm2bids_helper -d /path/to/dicom_dir -o /path/to/bids_output
Creates tmp_dcm2bids/helper/ with converted NIfTI files and JSON sidecars. Review the JSON files to find distinguishing metadata fields.
{
"descriptions": [
{
"id": "id_t1w",
"datatype": "anat",
"suffix": "T1w",
"criteria": {
"SeriesDescription": "*MPRAGE*",
"ImageType": ["ORIGINAL", "PRIMARY", "M", "ND", "NORM"]
}
},
{
"id": "id_bold_rest",
"datatype": "func",
"suffix": "bold",
"custom_entities": "task-rest",
"criteria": {
"SeriesDescription": "*REST*BOLD*",
"ImageType": ["ORIGINAL", "PRIMARY", "M", "ND", "MOSAIC"]
},
"sidecar_changes": {
"TaskName": "rest"
}
},
{
"id": "id_bold_nback",
"datatype": "func",
"suffix": "bold",
"custom_entities": "task-nback",
"criteria": {
"SeriesDescription": "*NBACK*",
"EchoTime": 0.03
},
"sidecar_changes": {
"TaskName": "nback"
}
},
{
"id": "id_dwi",
"datatype": "dwi",
"suffix": "dwi",
"custom_entities": "dir-AP",
"criteria": {
"SeriesDescription": "*DTI*AP*"
}
},
{
"id": "id_fmap_phasediff",
"datatype": "fmap",
"suffix": "phasediff",
"criteria": {
"SeriesDescription": "*field*map*",
"EchoTime1": 0.00492,
"EchoTime2": 0.00738
},
"sidecar_changes": {
"IntendedFor": [
"bids::sub-{subject}/func/sub-{subject}_task-rest_bold.nii.gz",
"bids::sub-{subject}/func/sub-{subject}_task-nback_bold.nii.gz"
]
}
}
]
}
Configuration file fields:
datatype: BIDS datatype (anat, func, dwi, fmap, etc.)suffix: BIDS suffix (T1w, bold, dwi, etc.)custom_entities: additional BIDS entities (task-rest, dir-AP, acq-highres, etc.)criteria: dictionary of DICOM/JSON metadata fields to match (supports wildcards *)sidecar_changes: fields to add/modify in the output JSON sidecarid: arbitrary identifier for the description (for logging)# Single subject
dcm2bids -d /path/to/dicom_dir -p 01 -c dcm2bids_config.json -o /path/to/bids_output
# With session
dcm2bids -d /path/to/dicom_dir -p 01 -s pre -c dcm2bids_config.json -o /path/to/bids_output
# Flags:
# -d : DICOM source directory
# -p : participant label
# -s : session label (optional)
# -c : configuration file
# -o : output BIDS directory
# --auto_extract_entities : auto-detect run numbers from DICOM
# --force_dcm2bids : overwrite existing conversions
dcm2bids_helper first to see exactly what metadata dcm2niix extracts*) and is case-sensitivesidecar_changes can inject any BIDS metadata (useful for TaskName, IntendedFor)code/dcm2bids_config.json for reproducibilityBIDScoin provides a graphical interface and YAML-based configuration. Good for users who prefer visual mapping.
Repository: https://github.com/Donders-Institute/bidscoin Docs: https://bidscoin.readthedocs.io/
uv pip install bidscoin
# Optional: install with all plugin dependencies
uv pip install "bidscoin[all]"
# Step 1: Create a bidsmap template by scanning DICOMs
bidsmapper /path/to/raw /path/to/bids
# Step 2: Edit the bidsmap (launches GUI)
bidseditor /path/to/bids
# Step 3: Convert using the finalized bidsmap
bidscoiner /path/to/raw /path/to/bids
bidseditor shows DICOM metadata alongside BIDS mapping| Feature | HeuDiConv | dcm2bids | BIDScoin |
|---|---|---|---|
| Configuration | Python heuristic | JSON config | YAML + GUI |
| Flexibility | Highest (full Python) | Medium (criteria matching) | Medium (plugin system) |
| Learning curve | Steeper (Python) | Moderate | Gentlest (GUI) |
| Batch processing | Excellent | Good | Good |
| ReproIn support | Built-in | No | No |
| DataLad integration | Built-in | No | No |
| Best for | Complex studies, automation | Simple-to-moderate studies | Visual learners, multi-site |
| Active development | Yes | Yes | Yes |
After converting DICOM to BIDS with any tool:
bids-validator /path/to/bids_outputRepetitionTime, TaskName, SliceTiming, PhaseEncodingDirection)participants.tsv with demographic dataREADME describing the datasetbids-validator again after any manual modificationsdim4 (number of volumes) matches expectationsSliceTiming is correct for the multiband factorEchoTime1 < EchoTime2 in the phasediff sidecarInPlanePhaseEncodingDirection → BIDS PhaseEncodingDirectionrun-01, run-02){item:02d} placeholder--auto_extract_entities or manually specify runsImageType containing DERIVED or is_derived flag in HeuDiConv