Elasticsearch

Plugin: go.d.plugin Module: elasticsearch

Overview

This collector monitors the performance and health of the Elasticsearch cluster.

It uses Cluster APIs to collect metrics.

Used endpoints:

Endpoint	Description	API
`/`	Node info
`/_nodes/stats`	Nodes metrics	Nodes stats API
`/_nodes/_local/stats`	Local node metrics	Nodes stats API
`/_cluster/health`	Cluster health stats	Cluster health API
`/_cluster/stats`	Cluster metrics	Cluster stats API

This collector is supported on all platforms.

This collector supports collecting metrics from multiple instances of this integration, including remote instances.


Default Behavior

Auto-Detection

By default, it detects instances running on localhost by attempting to connect to port 9200.

Limits

By default, this collector monitors only the node it is connected to. To monitor all cluster nodes, set the cluster_mode configuration option to yes.

Performance Impact

The default configuration for this integration is not expected to impose a significant performance impact on the system.

Setup

You can configure the elasticsearch collector in two ways:

Method	Best for	How to
UI	Fast setup without editing files	Go to Nodes → Configure this node → Collectors → Jobs, search for elasticsearch, then click + to add a job.
File	If you prefer configuring via file, or need to automate deployments (e.g., with Ansible)	Edit `go.d/elasticsearch.conf` and add a job.

:::important

UI configuration requires a paid Netdata Cloud plan.

:::

Prerequisites

No action required.

Configuration

Options

The following options can be defined globally: update_every, autodetection_retry.

<details open><summary>Config options</summary>

Group	Option	Description	Default	Required
Collection	update_every	Data collection interval (seconds).	5	no
	autodetection_retry	Autodetection retry interval (seconds). Set 0 to disable.	0	no
Target	url	Target endpoint URL.	http://127.0.0.1:9200	yes
	timeout	HTTP request timeout (seconds).	2	no
Metrics Selection	cluster_mode	Collect metrics for all nodes in the cluster (yes) or only the local node (no).	no	no
	collect_node_stats	Collect node metrics.	yes	no
	collect_cluster_health	Collect cluster health metrics.	yes	no
	collect_cluster_stats	Collect cluster stats metrics.	yes	no
	collect_indices_stats	Collect index metrics.	no	no
HTTP Auth	username	Username for Basic HTTP authentication.		no
	password	Password for Basic HTTP authentication.		no
	bearer_token_file	Path to a file containing a bearer token (used for `Authorization: Bearer`).		no
TLS	tls_skip_verify	Skip TLS certificate and hostname verification (insecure).	no	no
	tls_ca	Path to CA bundle used to validate the server certificate.		no
	tls_cert	Path to client TLS certificate (for mTLS).		no
	tls_key	Path to client TLS private key (for mTLS).		no
Proxy	proxy_url	HTTP proxy URL.		no
	proxy_username	Username for proxy Basic HTTP authentication.		no
	proxy_password	Password for proxy Basic HTTP authentication.		no
Request	method	HTTP method to use.	GET	no
	body	Request body (e.g., for POST/PUT).		no
	headers	Additional HTTP headers (one per line as key: value).		no
	not_follow_redirects	Do not follow HTTP redirects.	no	no
	force_http2	Force HTTP/2 (including h2c over TCP).	no	no
Functions	functions.top_queries.disabled	Disable the top-queries function.	no	no
	functions.top_queries.timeout	Query timeout (seconds). Uses collector timeout if not set.		no
	functions.top_queries.limit	Maximum number of queries to return.	500	no
Virtual Node	vnode	Associates this data collection job with a Virtual Node.		no

</details>

via UI

Configure the elasticsearch collector from the Netdata web interface:

1. Go to Nodes.
2. Select the node where you want the elasticsearch data-collection job to run and click the :gear: (Configure this node). That node will run the data collection.
3. The Collectors → Jobs view opens by default.
4. In the Search box, type elasticsearch (or scroll the list) to locate the elasticsearch collector.
5. Click the + next to the elasticsearch collector to add a new job.
6. Fill in the job fields, then click Test to verify the configuration and Submit to save.
   - Test runs the job with the provided settings and shows whether data can be collected.
   - If it fails, an error message appears with details (for example, connection refused, timeout, or command execution errors), so you can adjust and retest.

via File

The configuration file name for this integration is go.d/elasticsearch.conf.

The file format is YAML. Generally, the structure is:

```yaml
update_every: 1
autodetection_retry: 0
jobs:
  - name: some_name1
  - name: some_name2
```
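
As a minimal sketch of how global and per-job settings might combine, the fragment below sets `update_every` globally and overrides it for one job. It assumes that a job-level value takes precedence over the global one; the `remote` job, its address, and the chosen intervals are illustrative only.

```yaml
# Global defaults, applied to every job unless a job overrides them.
update_every: 5
autodetection_retry: 0

jobs:
  # Uses the global update_every (5 seconds).
  - name: local
    url: http://127.0.0.1:9200

  # Hypothetical remote node, polled less often and given a longer HTTP timeout.
  - name: remote
    url: http://192.0.2.1:9200
    update_every: 10
    timeout: 5
```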

You can edit the configuration file using the edit-config script from the Netdata config directory.

```bash
cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
sudo ./edit-config go.d/elasticsearch.conf
```

Examples

Basic single node mode

A basic example configuration.

```yaml
jobs:
  - name: local
    url: http://127.0.0.1:9200
```

Cluster mode

Cluster mode example configuration.

<details open><summary>Config</summary>

```yaml
jobs:
  - name: local
    url: http://127.0.0.1:9200
    cluster_mode: yes
```

</details>

HTTP authentication

Basic HTTP authentication.

<details open><summary>Config</summary>

```yaml
jobs:
  - name: local
    url: http://127.0.0.1:9200
    username: username
    password: password
```

</details>
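
The options table also lists `bearer_token_file` for token-based authentication. A minimal sketch follows; the file path is hypothetical, and it is assumed that the file holds only the token and is readable by the user the collector runs as.

```yaml
jobs:
  - name: local
    url: http://127.0.0.1:9200
    # Hypothetical path. The token in this file is sent as "Authorization: Bearer <token>".
    bearer_token_file: /path/to/elasticsearch.token
```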

HTTPS with self-signed certificate

Elasticsearch with HTTPS enabled and a self-signed certificate.

<details open><summary>Config</summary>

```yaml
jobs:
  - name: local
    url: https://127.0.0.1:9200
    tls_skip_verify: yes
```

</details>
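
Because `tls_skip_verify` disables certificate and hostname verification, it may be preferable to validate the server against a CA bundle instead. The sketch below uses the `tls_ca`, `tls_cert`, and `tls_key` options from the options table; all file paths are placeholders, and the client certificate and key are only needed if the cluster requires mTLS.

```yaml
jobs:
  - name: local
    url: https://127.0.0.1:9200
    # Placeholder path: CA bundle used to validate the server certificate.
    tls_ca: /path/to/ca.pem
    # Placeholder paths: client certificate and key, only for mTLS.
    tls_cert: /path/to/client.pem
    tls_key: /path/to/client.key
```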

Multi-instance

Note: When you define multiple jobs, their names must be unique.

Collecting metrics from local and remote instances.

<details open><summary>Config</summary>

```yaml
jobs:
  - name: local
    url: http://127.0.0.1:9200

  - name: remote
    url: http://192.0.2.1:9200
```

</details>
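
A remote instance may only be reachable through an HTTP proxy. The following sketch combines the `proxy_*` and `headers` options from the options table. The proxy address, credentials, and header are invented for illustration, and writing `headers` as a YAML mapping is an assumption based on the "key: value" description.

```yaml
jobs:
  - name: remote_via_proxy
    url: http://192.0.2.1:9200
    # Hypothetical proxy endpoint and credentials.
    proxy_url: http://proxy.example.internal:3128
    proxy_username: proxyuser
    proxy_password: proxypass
    # Hypothetical extra header, assumed to be given as a key: value mapping.
    headers:
      X-Example-Header: example-value
```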

Alerts

The following alerts are available:

Alert name	On metric	Description
elasticsearch_node_indices_search_time_query	elasticsearch.node_indices_search_time	search performance is degraded, queries run slowly.
elasticsearch_node_indices_search_time_fetch	elasticsearch.node_indices_search_time	search performance is degraded, fetches run slowly.
elasticsearch_cluster_health_status_red	elasticsearch.cluster_health_status	cluster health status is red.
elasticsearch_cluster_health_status_yellow	elasticsearch.cluster_health_status	cluster health status is yellow.
elasticsearch_node_index_health_red	elasticsearch.node_index_health	node index $label:index health status is red.

Metrics

Metrics grouped by scope.

The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.

Per node

These metrics refer to the cluster node.

Labels:

Label	Description
cluster_name	Name of the cluster. Based on the Cluster name setting.
node_name	Human-readable identifier for the node. Based on the Node name setting.
host	Network host for the node, based on the Network host setting.

Metrics:

Metric	Dimensions	Unit
elasticsearch.node_indices_indexing	index	operations/s
elasticsearch.node_indices_indexing_current	index	operations
elasticsearch.node_indices_indexing_time	index	milliseconds
elasticsearch.node_indices_search	queries, fetches	operations/s
elasticsearch.node_indices_search_current	queries, fetches	operations
elasticsearch.node_indices_search_time	queries, fetches	milliseconds
elasticsearch.node_indices_refresh	refresh	operations/s
elasticsearch.node_indices_refresh_time	refresh	milliseconds
elasticsearch.node_indices_flush	flush	operations/s
elasticsearch.node_indices_flush_time	flush	milliseconds
elasticsearch.node_indices_fielddata_memory_usage	used	bytes
elasticsearch.node_indices_fielddata_evictions	evictions	operations/s
elasticsearch.node_indices_segments_count	segments	segments
elasticsearch.node_indices_segments_memory_usage_total	used	bytes
elasticsearch.node_indices_segments_memory_usage	terms, stored_fields, term_vectors, norms, points, doc_values, index_writer, version_map, fixed_bit_set	bytes
elasticsearch.node_indices_translog_operations	total, uncommitted	operations
elasticsearch.node_indices_translog_size	total, uncommitted	bytes
elasticsearch.node_file_descriptors	open	fd
elasticsearch.node_jvm_heap	inuse	percentage
elasticsearch.node_jvm_heap_bytes	committed, used	bytes
elasticsearch.node_jvm_buffer_pools_count	direct, mapped	pools
elasticsearch.node_jvm_buffer_pool_direct_memory	total, used	bytes
elasticsearch.node_jvm_buffer_pool_mapped_memory	total, used	bytes
elasticsearch.node_jvm_gc_count	young, old	gc/s
elasticsearch.node_jvm_gc_time	young, old	milliseconds
elasticsearch.node_thread_pool_queued	generic, search, search_throttled, get, analyze, write, snapshot, warmer, refresh, listener, fetch_shard_started, fetch_shard_store, flush, force_merge, management	threads
elasticsearch.node_thread_pool_rejected	generic, search, search_throttled, get, analyze, write, snapshot, warmer, refresh, listener, fetch_shard_started, fetch_shard_store, flush, force_merge, management	threads
elasticsearch.node_cluster_communication_packets	received, sent	pps
elasticsearch.node_cluster_communication_traffic	received, sent	bytes/s
elasticsearch.node_http_connections	open	connections
elasticsearch.node_breakers_trips	requests, fielddata, in_flight_requests, model_inference, accounting, parent	trips/s

Per cluster

These metrics refer to the cluster.

Labels:

Label	Description
cluster_name	Name of the cluster. Based on the Cluster name setting.

Metrics:

Metric	Dimensions	Unit
elasticsearch.cluster_health_status	green, yellow, red	status
elasticsearch.cluster_number_of_nodes	nodes, data_nodes	nodes
elasticsearch.cluster_shards_count	active_primary, active, relocating, initializing, unassigned, delayed_unassigned	shards
elasticsearch.cluster_pending_tasks	pending	tasks
elasticsearch.cluster_number_of_in_flight_fetch	in_flight_fetch	fetches
elasticsearch.cluster_indices_count	indices	indices
elasticsearch.cluster_indices_shards_count	total, primaries, replication	shards
elasticsearch.cluster_indices_docs_count	docs	docs
elasticsearch.cluster_indices_store_size	size	bytes
elasticsearch.cluster_indices_query_cache	hit, miss	events/s
elasticsearch.cluster_nodes_by_role_count	coordinating_only, data, data_cold, data_content, data_frozen, data_hot, data_warm, ingest, master, ml, remote_cluster_client, voting_only	nodes

Per index

These metrics refer to the index.

Labels:

Label	Description
cluster_name	Name of the cluster. Based on the Cluster name setting.
index	Name of the index.

Metrics:

Metric	Dimensions	Unit
elasticsearch.node_index_health	green, yellow, red	status
elasticsearch.node_index_shards_count	shards	shards
elasticsearch.node_index_docs_count	docs	docs
elasticsearch.node_index_store_size	store_size	bytes
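
According to the options table, `collect_indices_stats` defaults to `no`, so these per-index metrics are presumably not collected unless the option is enabled. A minimal sketch of a job that turns them on:

```yaml
jobs:
  - name: local
    url: http://127.0.0.1:9200
    # Index metrics are off by default; enable them explicitly.
    collect_indices_stats: yes
```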

Live Data

This collector exposes real-time functions for interactive troubleshooting in the Live tab.

Top Queries

Retrieves currently running search tasks from the Elasticsearch Tasks API.

This function queries the /_tasks endpoint filtered for search actions (*search), providing a real-time snapshot of all active search operations across all nodes in the cluster.

Use cases:

- Identify long-running search queries that may be impacting cluster performance
- Monitor active search workload distribution across cluster nodes
- Debug slow or stuck search operations in real-time

Aspect	Description
Name	`Elasticsearch:top-queries`
Require Cloud	yes
Performance	Queries the `/_tasks` API filtered for search actions:
• Lightweight operation with minimal cluster overhead
• Returns only currently active search tasks, typically a small result set
Security	Task descriptions may contain query details including potentially sensitive information:
• Index names and search patterns
• Query terms and filter values
• Access should be restricted to authorized personnel only
Availability	Available when:
• The collector has successfully connected to Elasticsearch/OpenSearch
• The user has `monitor` or `manage` cluster privileges
• Returns HTTP 503 if collector is still initializing
• Returns HTTP 500 if the Tasks API query fails
• Returns HTTP 504 if the query times out

Prerequisites

Ensure access to Tasks API

The user must have appropriate privileges to access the Tasks API.

For secured clusters, grant the monitor or manage cluster privilege:
```json
{
  "cluster": ["monitor"]
}
```

Verify access to the Tasks API:

```bash
curl -u user:password "http://localhost:9200/_tasks?actions=*search"
```

:::info

- The Tasks API returns only currently running tasks; completed tasks are not stored.
- Search tasks can be cancelled using POST /_tasks/{task_id}/_cancel if they are cancellable.
- Works with both Elasticsearch and OpenSearch clusters.

:::
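
The function itself can be tuned with the `functions.top_queries.*` options from the options table. The sketch below assumes that the dotted option names map to nested YAML keys; the timeout and limit values are arbitrary examples.

```yaml
jobs:
  - name: local
    url: http://127.0.0.1:9200
    functions:
      top_queries:
        # Set to yes to turn the function off for this job.
        disabled: no
        # Query timeout in seconds; the collector timeout is used if unset.
        timeout: 5
        # Return at most this many tasks (the documented default is 500).
        limit: 100
```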

Parameters

Parameter	Type	Description	Required	Default	Options
Filter By	select	Select the primary sort column. Options include running time, start time, and task ID. Defaults to running time to show longest-running searches first.	yes	runningTime

Returns

Real-time snapshot of currently executing search tasks across all cluster nodes. Each row represents a single active search operation.

Column	Type	Unit	Visibility	Description
Task ID	string		hidden	Unique identifier for the task in format `nodeId:taskId`. Can be used with the Task Management API to cancel long-running tasks.
Node ID	string			Internal identifier of the node executing this search task.
Node Name	string			Human-readable name of the node executing the search. Useful for identifying workload distribution across the cluster.
Action	string			The search action being performed (e.g., `indices:data/read/search`). Indicates the type of search operation.
Type	string		hidden	Task type classification (typically `transport` for search tasks).
Description	string			Detailed description of the search task including indices being searched and query details. Truncated to 4096 characters.
Start Time	timestamp			Timestamp when the search task started executing.
Running Time	duration	milliseconds		Time elapsed since the search started. High values indicate long-running searches that may need investigation or cancellation.
Cancellable	boolean		hidden	Whether the task supports cancellation via the Task Management API.
Cancelled	boolean		hidden	Whether a cancellation request has been issued for this task.

Troubleshooting

Debug Mode

Important: Debug mode is not supported for data collection jobs created via the UI using the Dyncfg feature.

To troubleshoot issues with the elasticsearch collector, run the go.d.plugin with the debug option enabled. The output should give you clues as to why the collector isn't working.

Navigate to the plugins.d directory, usually at /usr/libexec/netdata/plugins.d/. If that's not the case on your system, open netdata.conf and look for the plugins setting under [directories].

```bash
cd /usr/libexec/netdata/plugins.d/
```

Switch to the netdata user.

```bash
sudo -u netdata -s
```

Run the go.d.plugin to debug the collector:

```bash
./go.d.plugin -d -m elasticsearch
```

To debug a specific job:

```bash
./go.d.plugin -d -m elasticsearch -j jobName
```

Getting Logs

If you're encountering problems with the elasticsearch collector, follow these steps to retrieve logs and identify potential issues:

Run the command specific to your system (systemd, non-systemd, or Docker container).
Examine the output for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem.

System with systemd

Use the following command to view logs generated since the last Netdata service restart:

```bash
journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata --grep elasticsearch
```

System without systemd

Locate the collector log file, typically at /var/log/netdata/collector.log, and use grep to filter for the collector's name:

```bash
grep elasticsearch /var/log/netdata/collector.log
```

Note: This method shows logs from all restarts. Focus on the latest entries for troubleshooting current issues.

Docker Container

If your Netdata runs in a Docker container named "netdata" (replace if different), use this command:

```bash
docker logs netdata 2>&1 | grep elasticsearch
```