plugins/arrow-flight-rpc/docs/metrics.md
The Arrow Flight RPC plugin provides comprehensive metrics to monitor the performance and health of the transport. These metrics are available through the Flight Stats API.
Metrics can be accessed using the Flight Stats API:
GET /_flight/stats
This returns metrics for all nodes. To get metrics for a specific node:
GET /_flight/stats/{node_id}
Streaming transport tasks can be monitored using the existing Tasks API:
curl "localhost:9200/_cat/tasks?v"
Streaming tasks are identified by the stream-transport type:
action task_id parent_task_id type start_time timestamp running_time ip node
indices:data/read/search TVk0SciMQtSwplV6rQwyMA:2165 - transport 1754082449785 21:07:29 169.5ms 127.0.0.1 node-1
indices:data/read/search[phase/query] TVk0SciMQtSwplV6rQwyMA:2166 TVk0SciMQtSwplV6rQwyMA:2165 stream-transport 1754082449786 21:07:29 168.4ms 127.0.0.1 node-1
Metrics are organized into the following categories:
Metrics related to client-side calls:
| Metric | Description |
|---|---|
started | Number of client calls started |
completed | Number of client calls completed |
duration | Duration statistics for client calls (min, max, avg, sum) |
request_bytes | Size statistics for requests sent by clients (min, max, avg, sum) |
response | Total size of responses received by clients (with human-readable format) |
response_bytes | Total size of responses received by clients (in bytes) |
Metrics related to client-side batch operations:
| Metric | Description |
|---|---|
requested | Number of batches requested by clients |
received | Number of batches received by clients |
received_bytes | Size statistics for batches received by clients (min, max, avg, sum) |
processing_time | Time statistics for processing received batches (min, max, avg, sum) |
Metrics related to server-side calls:
| Metric | Description |
|---|---|
started | Number of server calls started |
completed | Number of server calls completed |
duration | Duration statistics for server calls (min, max, avg, sum) |
request_bytes | Size statistics for requests received by servers (min, max, avg, sum) |
response | Total size of responses sent by servers (with human-readable format) |
response_bytes | Total size of responses sent by servers (in bytes) |
Metrics related to server-side batch operations:
| Metric | Description |
|---|---|
sent | Number of batches sent by servers |
sent_bytes | Size statistics for batches sent by servers (min, max, avg, sum) |
processing_time | Time statistics for processing and sending batches (min, max, avg, sum) |
Metrics related to call status codes:
| Metric | Description |
|---|---|
client.{status} | Count of client calls completed with each status code (OK, CANCELLED, UNAVAILABLE, etc.) |
server.{status} | Count of server calls completed with each status code (OK, CANCELLED, UNAVAILABLE, etc.) |
Metrics related to resource usage:
| Metric | Description |
|---|---|
arrow_allocated | Current Arrow memory allocation (human-readable format) |
arrow_allocated_bytes | Current Arrow memory allocation in bytes |
arrow_peak | Peak Arrow memory allocation (human-readable format) |
arrow_peak_bytes | Peak Arrow memory allocation in bytes |
direct_memory | Current direct memory usage (human-readable format) |
direct_memory_bytes | Current direct memory usage in bytes |
client_threads_active | Number of active client threads |
client_threads_total | Total number of client threads |
server_threads_active | Number of active server threads |
server_threads_total | Total number of server threads |
client_channels_active | Number of active client channels |
server_channels_active | Number of active server channels |
The API also provides cluster-level aggregated metrics that combine data from all nodes:
GET /_flight/stats
The response includes a cluster_stats section with aggregated metrics for:
Note: All duration and size fields include both human-readable formats (e.g., "1s", "24.6kb") and raw values in nanoseconds/bytes.
{
"cluster_name": "opensearch",
"nodes": {
"node_id": {
"name": "node_name",
"streamAddress": "localhost:9400",
"flight_metrics": {
"client_calls": {
"started": 6,
"completed": 6,
"duration": {
"count": 6,
"sum": "1s",
"sum_nanos": 1019,
"min": "9ms",
"min_nanos": 9,
"max": "743.7ms",
"max_nanos": 743,
"avg": "169.8ms",
"avg_nanos": 169
},
"request_bytes": {
"count": 6,
"sum": "5.9kb",
"sum_bytes": 6132,
"min": "1022b",
"min_bytes": 1022,
"max": "1022b",
"max_bytes": 1022,
"avg": "1022b",
"avg_bytes": 1022
},
"response": "24.6kb",
"response_bytes": 25276
},
"client_batches": {
"requested": 6,
"received": 6,
"received_bytes": {
"count": 6,
"sum": "24.6kb",
"sum_bytes": 25276,
"min": "3.3kb",
"min_bytes": 3477,
"max": "4.2kb",
"max_bytes": 4361,
"avg": "4.1kb",
"avg_bytes": 4212
},
"processing_time": {
"count": 6,
"sum": "12.1ms",
"sum_nanos": 12,
"min": "352micros",
"min_nanos": 0,
"max": "9.5ms",
"max_nanos": 9,
"avg": "2ms",
"avg_nanos": 2
}
},
"server_calls": {
"started": 3,
"completed": 3,
"duration": {
"count": 3,
"sum": "147.9ms",
"sum_nanos": 147,
"min": "6ms",
"min_nanos": 6,
"max": "135.7ms",
"max_nanos": 135,
"avg": "49.3ms",
"avg_nanos": 49
},
"request_bytes": {
"count": 3,
"sum": "2.9kb",
"sum_bytes": 3066,
"min": "1022b",
"min_bytes": 1022,
"max": "1022b",
"max_bytes": 1022,
"avg": "1022b",
"avg_bytes": 1022
},
"response": "12.7kb",
"response_bytes": 13083
},
"server_batches": {
"sent": 3,
"sent_bytes": {
"count": 3,
"sum": "12.7kb",
"sum_bytes": 13083,
"min": "4.2kb",
"min_bytes": 4361,
"max": "4.2kb",
"max_bytes": 4361,
"avg": "4.2kb",
"avg_bytes": 4361
},
"processing_time": {
"count": 3,
"sum": "6.4ms",
"sum_nanos": 6,
"min": "525.4micros",
"min_nanos": 0,
"max": "5.3ms",
"max_nanos": 5,
"avg": "2.1ms",
"avg_nanos": 2
}
},
"status": {
"client": {
"OK": 6
},
"server": {
"OK": 3
}
},
"resources": {
"arrow_allocated": "0b",
"arrow_allocated_bytes": 0,
"arrow_peak": "48kb",
"arrow_peak_bytes": 49152,
"direct_memory": "120.7mb",
"direct_memory_bytes": 126642920,
"client_threads_active": 0,
"client_threads_total": 0,
"server_threads_active": 0,
"server_threads_total": 0,
"client_channels_active": 2,
"server_channels_active": 1
}
}
}
},
"cluster_stats": {
"client": {
"calls": {
"started": 6,
"completed": 6,
"duration": "1s",
"duration_nanos": 1019,
"avg_duration": "169.8ms",
"avg_duration_nanos": 169,
"request": "5.9kb",
"request_bytes": 6132,
"response": "24.6kb",
"response_bytes": 25276
},
"batches": {
"requested": 6,
"received": 6,
"received_size": "24.6kb",
"received_bytes": 25276,
"avg_processing_time": "2ms",
"avg_processing_time_nanos": 2
}
},
"server": {
"calls": {
"started": 6,
"completed": 6,
"duration": "556ms",
"duration_nanos": 556,
"avg_duration": "92.6ms",
"avg_duration_nanos": 92,
"request": "5.9kb",
"request_bytes": 6132,
"response": "24.6kb",
"response_bytes": 25276
},
"batches": {
"sent": 6,
"sent_size": "24.6kb",
"sent_bytes": 25276,
"avg_processing_time": "34.6ms",
"avg_processing_time_nanos": 34
}
}
}
}
duration metrics for client and server callsarrow_allocated_bytes and arrow_peak_bytesclient_thread_utilization_percent and server_thread_utilization_percentstatus.client and status.serverCANCELLED status countsRESOURCE_EXHAUSTED status countsclient_calls.started and server_calls.started ratesclient_calls.request_bytes and server_batches.sent_bytes ratesclient_batches.received with client_batches.requested