This example demonstrates a minimal ML backend that performs time series segmentation.
It trains a small LSTM neural network on labeled CSV data and predicts segments
for new tasks. The backend expects the labeling configuration to use the
`<TimeSeries>` and `<TimeSeriesLabels>` tags.
Set `LABEL_STUDIO_HOST` and `LABEL_STUDIO_API_KEY` in `docker-compose.yml`
so the backend can download labeled tasks for training.

```bash
# build and run
docker-compose up --build
```
A small example CSV is available in `tests/time_series.csv`.

Connect the model from the Model page in your project settings. The default
URL is `http://localhost:9090`.
Use a configuration similar to the following:
```xml
<View>
  <TimeSeriesLabels name="label" toName="ts">
    <Label value="Run"/>
    <Label value="Walk"/>
  </TimeSeriesLabels>
  <TimeSeries name="ts" valueType="url" value="$csv_url">
    <Channel column="value" />
  </TimeSeries>
</View>
```
The backend reads the time column and channels to build feature vectors. Each
CSV referenced by `csv_url` must contain the time column and the channel
columns.
You can use the following data for tests with this labeling configuration.
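The sample data itself is not reproduced here; a hypothetical CSV matching the configuration above (a time column plus the `value` channel column) would look like:

```csv
time,value
0,0.12
1,0.15
2,0.93
3,0.88
4,0.10
```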
The backend supports two types of time series segmentation:

- **Range annotations**: start ≠ end, `instant=false`
- **Instant annotations**: start = end, `instant=true`

Note: Instant labels often create highly imbalanced datasets since they represent brief moments within long time series. The model's balanced learning approach is specifically designed to handle this challenge effectively.
Training starts automatically when annotations are created or updated. The model uses a PyTorch-based LSTM neural network with proper temporal modeling and balanced learning to handle imbalanced time series data effectively.
The model follows these steps during training:

1. Fetch labeled tasks from Label Studio.
2. Process annotations, prioritizing ground truth.
3. Generate training samples with background and event labels.
4. Train the PyTorch LSTM network and validate it.
5. Save the model to `MODEL_DIR` and cache it in memory.

You can customize training behavior with these environment variables:
**Basic configuration:**

- `START_TRAINING_EACH_N_UPDATES`: how often to retrain (default: `1`, trains on every annotation)
- `TRAIN_EPOCHS`: number of training epochs (default: `1000`)
- `SEQUENCE_SIZE`: sliding window size for temporal context (default: `50`)
- `HIDDEN_SIZE`: LSTM hidden layer size (default: `64`)

**Balanced learning (for imbalanced data):**

- `BALANCED_ACCURACY_THRESHOLD`: stop training when balanced accuracy exceeds this value (default: `0.85`)
- `MIN_CLASS_F1_THRESHOLD`: stop training when the minimum per-class F1 exceeds this value (default: `0.70`)
- `USE_CLASS_WEIGHTS`: enable class-weighted loss function (default: `true`)

The balanced learning approach is especially important when using instant labels (created by double-clicking on the time series), as these often create highly imbalanced datasets where background periods vastly outnumber event instances.
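A backend might read these settings at startup along the lines of the following sketch (`env_int` and `env_float` are hypothetical helper names, not the backend's actual code):

```python
import os

def env_int(name, default):
    """Read an integer setting from the environment, falling back to a default."""
    return int(os.environ.get(name, default))

def env_float(name, default):
    """Read a float setting from the environment, falling back to a default."""
    return float(os.environ.get(name, default))

TRAIN_EPOCHS = env_int("TRAIN_EPOCHS", 1000)
SEQUENCE_SIZE = env_int("SEQUENCE_SIZE", 50)
BALANCED_ACCURACY_THRESHOLD = env_float("BALANCED_ACCURACY_THRESHOLD", 0.85)
USE_CLASS_WEIGHTS = os.environ.get("USE_CLASS_WEIGHTS", "true").lower() == "true"
```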
Time series data is often highly imbalanced, especially when using instant labels.

The problem: background timesteps vastly outnumber event timesteps, so a naive model can achieve high accuracy simply by predicting background everywhere and never detecting events.

Our solution:
```
Class Weights: Automatically calculated inverse frequency weights
├── Background (Class 0): Low weight (e.g., 0.1x)
├── Run (Class 1): High weight (e.g., 5.0x)
└── Walk (Class 2): High weight (e.g., 4.0x)

Early Stopping: Dual criteria prevent premature stopping
├── Balanced Accuracy ≥ 85% (macro-averaged across classes)
└── Minimum Class F1 ≥ 70% (worst-performing class must be decent)

Metrics: Focus on per-class performance
├── Balanced Accuracy: Equal weight to each class
├── Macro F1: Average F1 across all classes
└── Per-class F1: Individual class performance tracking
```
This ensures the model learns to detect actual events rather than just predicting background.
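As a rough sketch of the ideas above (hypothetical helpers, not the backend's exact implementation), inverse-frequency class weights and the dual stopping criteria could look like this:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by total / (n_classes * count): rare classes get
    larger weights, so the loss penalizes missing them more heavily."""
    counts = Counter(labels)
    total, n_classes = len(labels), len(counts)
    return {cls: total / (n_classes * count) for cls, count in counts.items()}

def should_stop(balanced_accuracy, per_class_f1,
                acc_threshold=0.85, f1_threshold=0.70):
    """Dual early-stopping criterion: both balanced accuracy and the
    worst per-class F1 must clear their thresholds."""
    return balanced_accuracy >= acc_threshold and min(per_class_f1) >= f1_threshold

# 90% background (class 0), rare "Run"/"Walk" events (classes 1 and 2)
labels = [0] * 90 + [1] * 6 + [2] * 4
weights = inverse_frequency_weights(labels)  # background gets the smallest weight

should_stop(0.90, [0.95, 0.80, 0.72])  # True: both criteria met
should_stop(0.90, [0.95, 0.80, 0.50])  # False: worst class F1 below 0.70
```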
When multiple annotations exist for the same task, the model prioritizes ground truth annotations over regular ones.
The model processes new time series data by applying the trained LSTM classifier with sliding window temporal context. Only meaningful event segments are returned to Label Studio, filtering out background periods automatically.
For each task, the model performs these steps:

1. Load the trained model for the task's project.
2. Read the task CSV and extract feature vectors.
3. Run the LSTM over sliding windows and average overlapping predictions.
4. Filter out background predictions and group consecutive event timesteps into segments.
5. Return segments labeled with `instant=true` for point events (start = end, one-sample events that you can label using a double click) and `instant=false` for ranges.

The model provides several quality indicators, such as the confidence score calculated for each returned segment.
This approach ensures that predictions focus on actual events rather than forcing labels on every timestep.
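The filtering-and-grouping step can be sketched as follows (a hypothetical `group_segments` helper, assuming class `0` is background):

```python
def group_segments(labels, background=0):
    """Group consecutive identical non-background labels into
    (start_index, end_index, label) segments; start == end maps to an
    instant event, start != end to a range."""
    segments, start = [], None
    for i, lab in enumerate(labels):
        if lab == background or (start is not None and lab != labels[start]):
            # Close the open segment (if any) before background or a new label.
            if start is not None:
                segments.append((start, i - 1, labels[start]))
            start = None if lab == background else i
        elif start is None:
            start = i
    if start is not None:
        segments.append((start, len(labels) - 1, labels[start]))
    return segments

group_segments([0, 0, 1, 1, 0, 2, 0])  # [(2, 3, 1), (5, 5, 2)]
```

Here `(2, 3, 1)` is a range segment and `(5, 5, 2)` a single-sample instant event; background timesteps never appear in the output.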
The backend automatically handles multiple Label Studio projects by maintaining separate trained models for each project. This ensures proper isolation and prevents cross-project interference.
**Model storage:** each project gets its own model file, `model_project_{project_id}.pt` (for example, Project 47 → `model_project_47.pt`, Project 123 → `model_project_123.pt`); when no project ID is available, `model_project_0.pt` is used as a fallback.

**Model training and prediction:** both resolve the project ID from the incoming request, so each project's model is trained, cached, and loaded independently.
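The naming scheme can be sketched as (a hypothetical helper mirroring the file pattern above):

```python
import os

def model_path(model_dir, project_id=None):
    """Build the per-project model file path; fall back to project 0
    when no project ID is available."""
    pid = project_id if project_id is not None else 0
    return os.path.join(model_dir, f"model_project_{pid}.pt")

model_path("/app/models", 47)   # '/app/models/model_project_47.pt'
model_path("/app/models")       # '/app/models/model_project_0.pt'
```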
This architecture provides several key advantages:

- **Data isolation:** annotations from one project never influence another project's model.
- **Performance independence:** retraining or updating one project's model does not affect the others.
- **Scalability:** each new project simply gets its own model file, with no shared state between projects.
No additional configuration is required: project isolation works automatically. The backend determines the project context from the project ID included in each training and prediction request from Label Studio.
This seamless multi-tenant support makes the backend suitable for enterprise Label Studio deployments where multiple teams or clients need isolated ML models.
Training flow:

```mermaid
flowchart TD
    A[Annotation Event] --> B{Training Trigger?}
    B -- no --> C[Skip Training]
    B -- yes --> D[Fetch Labeled Tasks]
    D --> E[Process Annotations]
    E --> F{Ground Truth?}
    F -- yes --> G[Priority Processing]
    F -- no --> H[Standard Processing]
    G --> I[Generate Samples]
    H --> I
    I --> J[Background + Event Labels]
    J --> K[PyTorch LSTM Training]
    K --> L[Model Validation]
    L --> M[Save Model]
    M --> N[Cache in Memory]
```
Prediction flow:

```mermaid
flowchart TD
    T[Prediction Request] --> U[Load PyTorch Model]
    U --> V[Read Task CSV]
    V --> W[Extract Features]
    W --> X[Sliding Window LSTM]
    X --> Y[Overlap Averaging]
    Y --> Z[Filter Background]
    Z --> AA[Group Event Segments]
    AA --> BB[Calculate Confidence]
    BB --> CC[Return Segments]
```
Predicted segments distinguish point events (`instant=true`) from duration events (`instant=false`).

Edit `docker-compose.yml` to set environment variables for your specific use case:
```yaml
environment:
  - LABEL_STUDIO_HOST=http://localhost:8080
  - LABEL_STUDIO_API_KEY=your_api_key_here
  - MODEL_DIR=/app/models
  - START_TRAINING_EACH_N_UPDATES=1
  - TRAIN_EPOCHS=1000
  - SEQUENCE_SIZE=50
  - HIDDEN_SIZE=64
```
```yaml
environment:
  # ... basic config above ...
  - BALANCED_ACCURACY_THRESHOLD=0.85
  - MIN_CLASS_F1_THRESHOLD=0.70
  - USE_CLASS_WEIGHTS=true
```
For instant labels (point events):

- Keep class weights enabled (`USE_CLASS_WEIGHTS=true`)
- Lower the minimum F1 threshold (`MIN_CLASS_F1_THRESHOLD=0.60`) for very rare events
- Increase the number of epochs (`TRAIN_EPOCHS=2000`) for better minority class learning

For range annotations with balanced data:

- Disable class weights (`USE_CLASS_WEIGHTS=false`) if classes are roughly equal

For short time series:

- Reduce the window size (`SEQUENCE_SIZE=20`) for sequences shorter than 50 timesteps
- Reduce the hidden size (`HIDDEN_SIZE=32`) to prevent overfitting
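To see why shorter series need a smaller window, consider a sketch of the window count for a step of 1 (a hypothetical helper; the backend's exact windowing and padding behavior may differ):

```python
def num_windows(series_len, window_size, step=1):
    """Number of full sliding windows a series of the given length yields."""
    if series_len < window_size:
        return 0
    return (series_len - window_size) // step + 1

num_windows(40, 50)  # 0  -> a 40-step series yields no full default-size windows
num_windows(40, 20)  # 21 -> reducing SEQUENCE_SIZE restores usable windows
```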