# AI Tasks
This document describes each AI task available in darktable, including model requirements, I/O specifications, and integration details.
See AI.md for the architecture overview and how to add new tasks.
## Mask

Interactive object masking using SAM / SAM 2 / SegNext models.

- Task key: `"mask"`
- API: `src/common/ai/segmentation.h`
- Consumer: `src/develop/masks/object.c`
Supported architectures:

| Architecture | `arch` in `config.json` | Encoder Outputs | Mask Candidates | Box Prompts |
|---|---|---|---|---|
| SAM 2.1 | "sam2" | 3 tensors | 3 (multi-mask) | yes |
| SegNext | "segnext" | 2 tensors | 1 (single-mask) | no |
### Encoder

Input:

| Tensor | Shape | Type | Description |
|---|---|---|---|
| Input 0 | [1, 3, 1024, 1024] | float32 | preprocessed image |
Preprocessing (applied by segmentation.c):
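The exact steps live in `segmentation.c`; as an illustration only, SAM-family encoders conventionally expect a 1024x1024 RGB tensor normalized with SAM's pixel statistics. The constants below are the standard SAM values, assumed rather than taken from darktable's code:

```python
import numpy as np

# Standard SAM pixel mean/std on the 0-255 scale -- an assumption,
# not verified against segmentation.c.
MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
STD = np.array([58.395, 57.12, 57.375], dtype=np.float32)

def preprocess(rgb):
    # rgb: (1024, 1024, 3) uint8 -> (1, 3, 1024, 1024) float32 NCHW
    x = (rgb.astype(np.float32) - MEAN) / STD
    return np.ascontiguousarray(x.transpose(2, 0, 1))[None]
```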
Encoder outputs (typical):
| Output | Shape | Description |
|---|---|---|
| 0 | [1, 256, 64, 64] | image embeddings |
| 1 | [1, 32, 256, 256] | high-resolution features |
| 2 (SAM2) | [1, 64, 128, 128] | mid-resolution features |
### Decoder

Inputs:
| Index | Name | Shape | Description |
|---|---|---|---|
| 0..E | encoder outputs | varies | passed through from encoder |
| E+1 | point_coords | [1, N+1, 2] | point coords (N prompts + 1 SAM padding) |
| E+2 | point_labels | [1, N+1] | 1=foreground, 0=background, -1=padding |
| E+3 | mask_input | [1, 1, 256, 256] | previous low-res mask |
| E+4 | has_mask_input | [1] | 0.0 (first click) or 1.0 (refinement) |
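For example, the padded point tensors for two clicks can be built like this (a sketch; the helper name is hypothetical):

```python
import numpy as np

def make_point_prompts(points, labels):
    """points: list of (x, y) in the 1024x1024 input space;
    labels: 1 = foreground, 0 = background.
    Appends the SAM padding point with label -1."""
    coords = np.array(list(points) + [(0.0, 0.0)], dtype=np.float32)[None]
    labs = np.array(list(labels) + [-1.0], dtype=np.float32)[None]
    return coords, labs  # shapes (1, N+1, 2) and (1, N+1)
```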
Outputs:
| Index | Name | Shape | Description |
|---|---|---|---|
| 0 | masks | [1, M, 1024, 1024] | mask logits (pre-sigmoid) |
| 1 | iou_predictions | [1, M] | predicted IoU per mask |
| 2 | low_res_masks | [1, M, 256, 256] | for iterative refinement |
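Given these outputs, a consumer typically keeps the candidate with the highest predicted IoU and thresholds its logits; a minimal numpy sketch (function name hypothetical):

```python
import numpy as np

def pick_best_mask(masks, iou_predictions, threshold=0.5):
    # masks: (1, M, H, W) logits; iou_predictions: (1, M)
    best = int(np.argmax(iou_predictions[0]))
    prob = 1.0 / (1.0 + np.exp(-masks[0, best]))  # sigmoid
    return prob > threshold, best
```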
Mask selection and refinement:

- Mask probabilities come from a sigmoid over the logits: `mask = 1 / (1 + exp(-logits))`.
- On the first click, pass `has_mask_input = 0.0`; the decoder ignores `mask_input`.
- On refinement clicks, pass `has_mask_input = 1.0` with the previous `low_res_masks` as `mask_input`.
- `dt_seg_reset_prev_mask()` clears the cached mask without clearing image embeddings.
- `dt_seg_reset_encoding()` clears everything (call on image change).

Example `config.json`:

```json
{
    "id": "mask-object-sam21-small",
    "name": "mask sam2.1 hiera small",
    "description": "Segment Anything 2.1 (Hiera Small) for interactive masking",
    "task": "mask",
    "arch": "sam2",
    "backend": "onnx"
}
```
Directory layout:

```
mask-object-sam21-small/
    config.json
    encoder.onnx
    decoder.onnx
```
Conversion scripts are maintained in the darktable-ai repository. Requirements for the decoder export:
- no `orig_im_size` input
- `masks` output at a fixed 1024x1024 (include `F.interpolate` in the graph)
- `low_res_masks` output at 256x256
- the point prompt axes may be dynamic (`num_points`, `num_labels`)

## Denoise

Removes noise from developed images using neural network inference.
- Task key: `"denoise"`
- API: `src/common/ai/restore.h` (`dt_restore_load_denoise`)
- Consumer: `src/libs/neural_restore.c`
Single-input (blind) model:

| Tensor | Name | Shape | Type | Description |
|---|---|---|---|---|
| Input 0 | input | [1, 3, H, W] | float32 | sRGB image, NCHW planar layout, values [0,1] |
| Output 0 | output | [1, 3, H, W] | float32 | denoised sRGB image, same layout |
Two-input model with an explicit noise-level map:

| Tensor | Name | Shape | Type | Description |
|---|---|---|---|---|
| Input 0 | input | [1, 3, H, W] | float32 | sRGB image |
| Input 1 | sigma | [1, 1, H, W] | float32 | noise level map, values = sigma / 255.0 |
| Output 0 | output | [1, 3, H, W] | float32 | denoised image |
For the two-input variant, set `"num_inputs": 2` in `config.json`.
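The noise-level map can be as simple as a constant plane; a sketch (function name hypothetical):

```python
import numpy as np

def uniform_sigma_map(height, width, sigma):
    # sigma given on the 0-255 scale, normalized as sigma / 255.0
    return np.full((1, 1, height, width), sigma / 255.0, dtype=np.float32)
```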
Models operate in sRGB; the restore module converts the pipeline buffer to sRGB before inference and back afterwards.
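darktable's pipeline details aside, the sRGB transfer function itself looks like this (a generic sketch, not the restore module's code):

```python
import numpy as np

def linear_to_srgb(x):
    # piecewise sRGB encoding (IEC 61966-2-1)
    x = np.clip(x, 0.0, 1.0)
    return np.where(x <= 0.0031308, 12.92 * x,
                    1.055 * np.power(x, 1.0 / 2.4) - 0.055)

def srgb_to_linear(s):
    # inverse of the above
    s = np.clip(s, 0.0, 1.0)
    return np.where(s <= 0.04045, s / 12.92,
                    np.power((s + 0.055) / 1.055, 2.4))
```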
Luminance detail recovery is based on a DWT (discrete wavelet transform).
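As a conceptual illustration only (not darktable's wavelet code), a single-level Haar-style blend that keeps the denoised low-pass band and re-injects the original's high-frequency detail:

```python
import numpy as np

def haar_detail_blend(original, denoised, strength=1.0):
    """Keep the denoised low-pass band; re-inject `strength` times the
    original's high-frequency detail. Expects 2D arrays with even sides."""
    def split(x):
        # low-pass: 2x2 block averages; high-pass: residual
        lo = (x[0::2, 0::2] + x[1::2, 0::2]
              + x[0::2, 1::2] + x[1::2, 1::2]) / 4.0
        hi = x - np.repeat(np.repeat(lo, 2, axis=0), 2, axis=1)
        return lo, hi
    lo_d, _ = split(denoised)
    _, hi_o = split(original)
    return np.repeat(np.repeat(lo_d, 2, axis=0), 2, axis=1) + strength * hi_o
```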
Example `config.json`:

```json
{
    "id": "denoise-nind",
    "name": "denoise nind",
    "description": "UNet denoiser trained on NIND dataset",
    "task": "denoise",
    "backend": "onnx",
    "num_inputs": 1
}
```
ONNX export uses dynamic spatial axes so any image size can be processed:

```python
import torch

# model: the trained denoiser, in eval mode; the dummy size is arbitrary
dummy_input = torch.randn(1, 3, 256, 256)
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"],
                  output_names=["output"],
                  dynamic_axes={
                      "input": {2: "height", 3: "width"},
                      "output": {2: "height", 3: "width"},
                  })
```
## Upscale

Super-resolution upscaling of developed images (2x or 4x).

- Task key: `"upscale"`
- API: `src/common/ai/restore.h` (`dt_restore_load_upscale_x2`, `dt_restore_load_upscale_x4`)
- Consumer: `src/libs/neural_restore.c`
Same pipeline as denoise, but the output dimensions are scaled:

- 2x: `[1, 3, H*2, W*2]`
- 4x: `[1, 3, H*4, W*4]`

A single model package can provide both scales via separate ONNX files:

- `model_x2.onnx` for 2x upscale
- `model_x4.onnx` for 4x upscale

| Tensor | Name | Shape | Type | Description |
|---|---|---|---|---|
| Input 0 | input | [1, 3, H, W] | float32 | sRGB image, NCHW layout |
| Output 0 | output | [1, 3, H*S, W*S] | float32 | upscaled sRGB image (S = scale factor) |
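The shape contract can be checked against a trivial nearest-neighbour reference upscaler (illustration only, not a real model):

```python
import numpy as np

def nearest_upscale(x, scale):
    # x: (1, 3, H, W) float32 -> (1, 3, H*scale, W*scale)
    return np.repeat(np.repeat(x, scale, axis=2), scale, axis=3)
```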
Example `config.json`:

```json
{
    "id": "upscale-bsrgan",
    "name": "upscale bsrgan",
    "description": "BSRGAN 2x and 4x blind super-resolution",
    "task": "upscale",
    "github_asset": "upscale-bsrgan.dtmodel",
    "default": true
}
```
Directory layout:

```
upscale-bsrgan/
    config.json
    model_x2.onnx
    model_x4.onnx
```