skills/nextflow/references/language.md
The complete Nextflow scripting language: processes, channels, operators, workflows, and modules. Nextflow is a Groovy-based DSL; DSL2 is the default and only DSL (DSL1 is removed, so nextflow.enable.dsl=2 is unnecessary). Source: https://www.nextflow.io/docs/latest/
Current syntax conventions (a strict-syntax parser, NXF_SYNTAX_PARSER=v2, is opt-in in 25.x and becomes the default in 26.04 — write to it now, it also runs on the legacy parser):
channel.of(...) (lowercase namespace) is canonical; Channel.of(...) still works but is discouraged.{ v -> v * 2 }) are preferred over the implicit it.emit:; scale resources with task.attempt.output {} + publish: (full feature in 25.10) is the new declarative way to publish results; publishDir still works and remains dominant in nf-core — both are shown below.for/while loops (use each/collect), import/custom class (use functions or lib/), process shell: (use script:), include … addParams().A Nextflow script (.nf) mixes process/workflow definitions with Groovy. Every script enables DSL2 implicitly (it is the default since 22.03). A run begins at the unnamed workflow {} block (the entry workflow).
#!/usr/bin/env nextflow
params.input = 'data/*.fastq' // pipeline parameter with a default
process FASTQC { /* ... */ } // a process definition
workflow { // entry point
reads = channel.fromPath(params.input)
FASTQC(reads)
}
Run with nextflow run main.nf --input 'data/*.fastq'. Parameters declared as params.x are overridable on the CLI (--x), in nextflow.config, or in a -params-file.
A process defines a task: a (usually Bash) script executed in its own isolated work directory. Nextflow stages declared inputs in and declared outputs out, so processes never read/write each other's files directly — they communicate only through channels.
process ALIGN {
tag "$meta.id" // label shown in the log/trace
label 'process_high' // maps to resources in config
container 'quay.io/biocontainers/bwa:0.7.17--hed695b0_7'
publishDir "${params.outdir}/bam", mode: 'copy'
input:
tuple val(meta), path(reads) // a sample: metadata map + file(s)
path index // a shared reference (value channel)
output:
tuple val(meta), path("*.bam"), emit: bam
path "versions.yml", emit: versions
when:
meta.run_alignment != false // skip task if false
script:
def prefix = task.ext.prefix ?: meta.id
def args = task.ext.args ?: '' // extra flags injected from config
"""
bwa mem $args -t $task.cpus $index ${reads} | samtools sort -o ${prefix}.bam
cat <<-END_VERSIONS > versions.yml
"${task.process}":
bwa: \$(bwa 2>&1 | sed -n 's/Version: //p')
END_VERSIONS
"""
stub:
"""
touch ${meta.id}.bam
touch versions.yml
"""
}
Declared one per line under input:. Each input consumes one item from a channel.
| Qualifier | Meaning |
|---|---|
val(x) | Any value (string, number, map) |
path(f) | A file/dir; staged into the work dir. Use path (not the old file) |
tuple val(meta), path(reads) | A composite item — the nf-core standard: a meta map + files |
env(NAME) | Value exposed as an environment variable |
stdin | Feed the channel item to the script's stdin |
each x | Repeat the process once per value in x (combinatorial) |
Inputs are positional and matched to channels in call order: ALIGN(reads_ch, index_ch).
Declared under output:; each becomes a channel. Use emit: to name outputs so callers can reference ALIGN.out.bam instead of positional ALIGN.out[0].
| Form | Meaning |
|---|---|
path "*.bam" | Files matched by glob in the work dir after the script runs |
tuple val(meta), path("*.bam") | Carry metadata forward with the file |
val x | Emit a value computed in the process |
stdout | Capture the script's stdout as the output |
eval('cmd') | Capture the stdout of a command run in the task env (24.04+) — used for tool versions |
path "out", emit: name | Named output channel (reference as PROC.out.name) |
..., topic: versions | Also route this output to a named topic channel |
optional true | Output may be absent without erroring |
script: (default) — a multi-line string run as Bash. Nextflow variables interpolate with $var/${expr}; escape shell vars you do NOT want Nextflow to touch as \$var.shell: — like script but Nextflow vars use !{var}, leaving $ for the shell. Deprecated as of 25.04 — use script: with \$ to escape shell vars.exec: — native Groovy, no external process (for in-line computation).#!/usr/bin/env python).process PY {
input: val x
output: stdout
script:
"""
#!/usr/bin/env python
print(${x} ** 2)
"""
}
when: — skip the task when the expression is false (prefer filtering channels upstream when possible).script: can branch with normal Groovy if/else returning different command strings.stub: — an alternate minimal script run with -stub-run to test pipeline wiring without the real tool. nf-core requires stubs.Set inside a process (or globally via config). Most-used:
| Directive | Purpose |
|---|---|
cpus, memory, time, disk | Resource requests (e.g. memory '8.GB', time '2.h') |
container | Container image for this process |
conda | Conda packages/env for this process |
publishDir | Copy/link outputs to a results dir (`mode: 'copy' |
tag | Human-readable label per task in logs/trace |
label | Group processes (target with withLabel: in config) |
errorStrategy | 'terminate' (default), 'ignore', 'retry', 'finish' |
maxRetries, maxErrors | Retry limits |
cache | true/'lenient'/'deep'/false — caching behavior |
scratch | Run in node-local scratch then stage out |
stageInMode/stageOutMode | 'symlink'/'copy'/'link' staging |
beforeScript/afterScript | Commands wrapping the task script |
accelerator | GPU request (e.g. accelerator 1, type: 'nvidia-tesla-v100') |
array | Submit as a job array (HPC/cloud), e.g. array 100 |
ext | Free-form map (ext.args, ext.prefix) injected from config |
pod | Kubernetes pod options |
module | Load an HPC environment module |
maxForks | Cap parallel tasks for this process |
Access the resolved values at runtime via task.* (task.cpus, task.memory, task.attempt, task.process, task.ext.args).
Channels are the asynchronous queues connecting processes. Two kinds:
channel.value(), by operators like collect/first, or implicitly from a single value. A process input bound to a value channel is reused for every task.channel.of(1, 2, 3) // emit given values (ranges expand: 1..23)
channel.fromList([1, 2, 3]) // emit list items
channel.value('ref.fa') // singleton value channel
channel.fromPath('data/*.bam') // one item per matching file
channel.fromPath('data/**.fastq', checkIfExists: true) // also: arity:'1', type:'file', hidden:true
channel.fromFilePairs('data/*_{1,2}.fastq.gz') // -> [id, [r1, r2]] for paired reads
channel.topic('versions') // collect values emitted to a named topic (24.04+)
channel.empty() // emits nothing
channel.fromSRA(...)exists but is deprecated as of 26.04 — prefer a samplesheet (splitCsv) over fetching reads by accession.
fromFilePairs is the idiomatic way to group paired-end reads; it yields [ sampleId, [read1, read2] ], which you typically map into the nf-core [ meta, [reads] ] shape.
Operators transform/combine channels. Chain with .; the dataflow graph is built from these connections.
| Operator | Purpose |
|---|---|
map { } | Transform each item |
filter { } | Keep items matching a condition/type/regex |
flatten | Flatten nested emissions into individual items |
collect | Gather all items into a single list (→ value channel) |
toList / toSortedList | Collect into one (sorted) list |
groupTuple | Group tuples by key (e.g. by meta) — often needs groupTuple(by: 0) |
join | Inner-join two channels by a matching key |
combine | Cartesian product (optionally by: a key) |
cross | Combine matching keyed items |
mix | Merge multiple channels into one stream |
concat | Emit one channel fully, then the next, in order |
branch { } | Route items into multiple named sub-channels by condition |
multiMap { } | Emit to several channels from one pass |
splitCsv / splitText / splitFasta / splitFastq | Split file contents into items |
collectFile | Write items into one or more files |
unique / distinct | De-duplicate |
first / last / take / until | Select subsets |
set { ch } | Name the resulting channel (alternative to ch =) |
view { } | Print items for debugging (returns the channel unchanged) |
ifEmpty | Provide a default if the channel is empty |
dump(tag:'x') | Debug-print when run with -dump-channels x |
// Build the nf-core [meta, reads] shape from a samplesheet
channel
.fromPath(params.input)
.splitCsv(header: true)
.map { row -> tuple([id: row.sample, single_end: row.fastq_2 ? false : true],
row.fastq_2 ? [file(row.fastq_1), file(row.fastq_2)] : [file(row.fastq_1)]) }
.set { reads_ch }
// Group per-sample results, then join two channels by meta
counts.groupTuple()
.join(metadata) // matches on the first (key) element
.view()
A workflow composes processes and other workflows. The unnamed workflow is the entry point. Named workflows are reusable (sub)workflows.
workflow RNASEQ {
take: // typed inputs (channels)
reads
index
main: // pipeline logic
FASTQC(reads)
ALIGN(reads, index)
QUANT(ALIGN.out.bam)
emit: // named outputs
bam = ALIGN.out.bam
counts = QUANT.out.counts
versions = FASTQC.out.versions.mix(ALIGN.out.versions)
}
workflow { // entry: wire inputs and call the named workflow
reads = channel.fromFilePairs(params.reads)
index = channel.value(file(params.index))
RNASEQ(reads, index)
RNASEQ.out.counts.view()
}
ALIGN(reads, index). Outputs are on .out (use emit: names: ALIGN.out.bam).include it again under an alias.reads | FASTQC.publish: section and describe them in a top-level output {} block — the recommended replacement for the publishDir directive (which still works and dominates nf-core):workflow {
main:
ch = ANALYZE(input)
publish:
results = ch // name the published channel
}
output {
results { path 'analysis' } // -> <outputDir>/analysis (default outputDir: results/)
}
Modules are .nf files whose processes/workflows are imported with include. This is the basis of nf-core's reusable components.
include { FASTQC } from './modules/fastqc/main.nf'
include { ALIGN as ALIGN_TUMOR;
ALIGN as ALIGN_NORMAL } from './modules/align/main.nf'
include { RNASEQ } from './subworkflows/rnaseq.nf'
as aliases let you include the same component multiple times..nf extension optional.Make pipelines robust by retrying failures with more resources instead of over-provisioning everything. task.attempt increments on each retry.
process BIG_JOB {
label 'process_high'
cpus { 4 * task.attempt }
memory { 8.GB * task.attempt }
time { 4.h * task.attempt }
errorStrategy { task.exitStatus in [137, 140, 143] ? 'retry' : 'terminate' }
maxRetries 3
// ...
}
errorStrategy 'ignore' lets the pipeline continue past a failed task; 'finish' stops launching new tasks but lets running ones complete.conf/base.config keyed on process_* labels (see references/developing.md)."${x}"). In script: blocks, escape shell variables as \$VAR.def inside script:/closures to avoid leaking globals.[ id: 'x', single_end: false ]; access as meta.id.collect) for things consumed by many tasks (like a reference index).groupTuple may emit before all items arrive unless sizes are known; provide size: or use groupTuple(by:) carefully.include ... as.output: match the work directory, not publishDir.when: for clarity and caching.nextflow lint and a stricter parser; nf-core is migrating pipelines to it. Keep scripts to documented DSL2 constructs and avoid deprecated DSL1 idioms (Channel.create(), .into{} overuse, top-level file() for inputs).