To be the best tool for monitoring and observability.
See case studies and articles.
See the full list of Prominent Features.
Yes. See these benchmarks.
See the list of technical articles on VictoriaMetrics components:
Follow the Quick Start guide.
See the Contributing guide.
Yes. Learn more in the High Availability docs for both single-node and cluster setups.
Yes. See Replication and data safety for details.
The single-node version scales vertically. It can handle up to 100 million active time series and 2 million samples per second (based on real usage).
The cluster version scales both vertically and horizontally. It can handle billions of active time series and hundreds of millions of samples per second (based on real usage).
See performance comparison with other solutions.
Yes, in most cases. VictoriaMetrics can substitute Prometheus in the following aspects:
While both vmagent and Prometheus may scrape Prometheus targets (aka /metrics pages),
read Prometheus-compatible scrape configs
and send data to multiple remote storage systems, vmagent has the following additional features:
It uses an independent buffer for each configured remote storage system (see -remoteWrite.url).
This means that slow or temporarily unavailable storage doesn't prevent it from sending data to healthy storage in parallel.
Prometheus uses a single shared buffer for all the configured remote storage systems (see remote_write->url) with a hardcoded retention of 2 hours.
Both vmagent and Prometheus agent serve the same purpose: to efficiently scrape Prometheus-compatible targets at the edge. They have the following differences:
Yes. Prometheus continues to write data to local storage after enabling remote write, so all the existing local storage data and new data are available for querying via Prometheus as usual.
It is recommended to use vmagent for scraping Prometheus targets and writing data to VictoriaMetrics.
VictoriaMetrics has no limitation on backfilling of old (historical) or out-of-order metrics while they're within the specified retention period. See more about backfilling.
The following articles and talks provide additional details:
VictoriaMetrics also uses less RAM than Thanos components.
Grafana Mimir is a Cortex fork, so it has the same differences as Cortex. See the differences between VictoriaMetrics and Cortex.
See also Grafana Mimir vs VictoriaMetrics benchmark.
VictoriaMetrics is similar to Cortex in the following aspects:
The main differences between Cortex and VictoriaMetrics:
See How to migrate from InfluxDB to VictoriaMetrics.
No. VictoriaMetrics core is written in Go from scratch by fasthttp's author. The architecture is optimized for storing and querying large amounts of time series data with high cardinality. VictoriaMetrics storage uses certain ideas from ClickHouse. Special thanks to Alexey Milovidov.
The following versions are open source and free:
We provide commercial support for both versions. Contact us for pricing.
VictoriaMetrics Cloud – the most cost-efficient hosted monitoring platform, operated by VictoriaMetrics core team.
The remote read API requires transferring all the raw data for all the requested metrics over the given time range. For instance,
if a query covers 1000 metrics with 10K values each, then the remote read API has to return 1000*10K=10M metric values to Prometheus.
This is slow and expensive.
Prometheus' remote read API isn't intended for querying foreign data – aka global query view. See this issue for details.
Instead, query VictoriaMetrics directly via vmui, the Prometheus Querying API or via Prometheus datasource in Grafana.
Yes, VictoriaMetrics can deduplicate data from Prometheus instances scraping the same targets (aka HA pairs). See these docs for details.
Source code for VictoriaMetrics can be found in the following locations:
VictoriaMetrics is able to handle data from hundreds of millions of IoT sensors and industrial sensors. It supports high cardinality data, scales well vertically on a single node and scales horizontally to multiple nodes in the cluster setup. It also supports an option for reducing the index size for IoT data; see these docs.
Both the single-node and cluster versions of VictoriaMetrics are built on the same core code, so they share many features. That said, here are the key differences between them:
The single-node VictoriaMetrics runs on a single host, while the cluster version of VictoriaMetrics can scale to many hosts. The single-node VictoriaMetrics scales vertically, i.e. its capacity and performance scale almost linearly when increasing available CPU, RAM, disk IO and disk space. See an article about vertical scalability of a single-node VictoriaMetrics.
The cluster version of VictoriaMetrics supports multitenancy, but single-node VictoriaMetrics does not.
The cluster version of VictoriaMetrics supports data replication, while single-node VictoriaMetrics relies on the durability
of the persistent storage pointed to by the -storageDataPath command-line flag.
See these docs for details.
The single-node version of VictoriaMetrics delivers higher capacity and performance than the cluster version when running on the same hardware with equal CPU and RAM, as it avoids the overhead of network data transfers between cluster components.
See also which type of VictoriaMetrics is recommended to use.
Questions about VictoriaMetrics can be asked via the following channels:
See the full list of community channels.
File bugs and feature requests in our GitHub Issues.
See these docs. Multitenancy is supported only by the cluster version of VictoriaMetrics.
All VictoriaMetrics components provide command-line flags to control the size of internal buffers and caches:
-memory.allowedPercent and -memory.allowedBytes (pass -help to any VictoriaMetrics component in order to see the description for these flags).
These limits don't take into account additional memory, which may be needed for processing incoming queries.
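The interaction between the two flags can be sketched as follows. This is a minimal illustration that assumes only the behavior described in the flags' -help output (a non-zero -memory.allowedBytes overrides -memory.allowedPercent). The function name allowedCacheMemory is hypothetical, and the memory size is passed in explicitly rather than detected the way VictoriaMetrics does it.

```go
package main

import "fmt"

// allowedCacheMemory is a hypothetical helper mirroring the documented
// relationship between the flags: a non-zero allowedBytes
// (-memory.allowedBytes) overrides allowedPercent (-memory.allowedPercent).
// systemMemory is an explicit input here; VictoriaMetrics detects it itself.
func allowedCacheMemory(systemMemory, allowedBytes uint64, allowedPercent float64) uint64 {
	if allowedBytes > 0 {
		return allowedBytes
	}
	return uint64(float64(systemMemory) * allowedPercent / 100)
}

func main() {
	const gib = 1 << 30
	const mib = 1 << 20
	// 16 GiB of memory, -memory.allowedPercent=60 (example value), no -memory.allowedBytes.
	fmt.Println(allowedCacheMemory(16*gib, 0, 60)/mib, "MiB for caches")
	// -memory.allowedBytes=4GiB takes precedence over the percentage.
	fmt.Println(allowedCacheMemory(16*gib, 4*gib, 60)/mib, "MiB for caches")
}
```

The resulting value is a target for internal caches and buffers, not a hard cap on the process, which is why OS-level limits are still needed for enforcement.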
Hard limits may be enforced only by the OS via cgroups,
Docker (see these docs) or
Kubernetes (see these docs).
Memory usage for VictoriaMetrics components can be tuned according to the following docs:
VictoriaMetrics is included in OpenBSD and FreeBSD ports, so just install it from there or use pre-built binaries from the releases page.
Yes. See these docs.
A time series is uniquely identified by its name plus a set of its labels. For example, temperature{city="NY",country="US"} and temperature{city="SF",country="US"}
are two distinct series, since they differ by the city label. A time series is considered active if it received at least a single sample during the last hour.
The number of active time series is displayed on the official Grafana dashboard for VictoriaMetrics - see these docs for details.
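The identity and activity rules above can be illustrated with a small sketch. It assumes a canonical key built from the metric name plus label pairs sorted by name, and a map of last-sample timestamps per series; seriesKey and activeSeries are hypothetical helpers, not VictoriaMetrics internals.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// seriesKey builds a canonical identifier from a metric name and its labels.
// Labels are sorted by name, so label order does not affect identity.
func seriesKey(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	b.WriteString(name)
	b.WriteByte('{')
	for i, k := range keys {
		if i > 0 {
			b.WriteByte(',')
		}
		fmt.Fprintf(&b, "%s=%q", k, labels[k])
	}
	b.WriteByte('}')
	return b.String()
}

// activeSeries counts series that received at least one sample during the last hour.
func activeSeries(lastSample map[string]time.Time, now time.Time) int {
	n := 0
	for _, ts := range lastSample {
		if now.Sub(ts) <= time.Hour {
			n++
		}
	}
	return n
}

func main() {
	ny := seriesKey("temperature", map[string]string{"city": "NY", "country": "US"})
	sf := seriesKey("temperature", map[string]string{"country": "US", "city": "SF"})
	fmt.Println(ny, sf, "distinct:", ny != sf)

	now := time.Now()
	last := map[string]time.Time{
		ny: now.Add(-10 * time.Minute), // sample within the last hour: active
		sf: now.Add(-2 * time.Hour),    // no samples during the last hour: inactive
	}
	fmt.Println("active series:", activeSeries(last, now))
}
```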
If old time series are constantly substituted by new time series at a high rate,
then such a state is called high churn rate. High churn rate has the following negative consequences:
Increased size of the inverted index, which is stored at <-storageDataPath>/indexdb, since the inverted index contains entries for every label of every time series with at least a single ingested sample.
The main reason for high churn rate is a metric label with frequently changed value. Examples of such labels:
queryid, which changes with each query at postgres_exporter.
pod, which changes with each new deployment in Kubernetes.
A label derived from the current time, such as timestamp, minute or hour.
A hash or uuid label, which changes frequently.
The solution to high churn rate is to identify and eliminate labels with frequently changed values. Cardinality explorer can help determine these labels. If labels can't be removed, try pre-aggregating data with stream aggregation before it gets ingested into the database.
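The effect of a frequently changing pod label can be simulated by counting series that appear for the first time in each interval. This is a hypothetical sketch (churnTracker is not a VictoriaMetrics API) that treats the number of newly registered series per hour as the churn rate; the metric names and the hourly redeploy are made up.

```go
package main

import "fmt"

// churnTracker remembers every series it has seen and counts how many
// series appeared for the first time in the current interval.
type churnTracker struct {
	seen     map[string]bool
	newCount int
}

func newChurnTracker() *churnTracker {
	return &churnTracker{seen: make(map[string]bool)}
}

// observe records a sample for the series identified by key.
func (c *churnTracker) observe(key string) {
	if !c.seen[key] {
		c.seen[key] = true
		c.newCount++
	}
}

// reset returns the number of new series since the previous call.
func (c *churnTracker) reset() int {
	n := c.newCount
	c.newCount = 0
	return n
}

func main() {
	c := newChurnTracker()
	metrics := []string{"http_requests_total", "process_cpu_seconds_total"}
	for hour := 0; hour < 3; hour++ {
		// Hypothetical: a new deployment every hour gives the pod a new name.
		pod := fmt.Sprintf("app-%d", hour)
		for scrape := 0; scrape < 120; scrape++ { // one scrape every 30s
			for _, m := range metrics {
				c.observe(fmt.Sprintf("%s{pod=%q}", m, pod))
			}
		}
		fmt.Printf("hour %d: new series=%d, registered series=%d\n", hour, c.reset(), len(c.seen))
	}
}
```

Each simulated deployment produces a fresh set of series while the previous ones stop receiving samples, so the number of registered series (and the index holding them) keeps growing even though the number of active series stays constant.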
The official Grafana dashboards for VictoriaMetrics contain graphs for churn rate - see these docs for details.
High cardinality usually means a high number of active time series. High cardinality may lead to high memory usage and/or to a high percentage of slow inserts. The source of high cardinality is usually a label with a large number of unique values, which accounts for a big share of the ingested time series. Examples of such labels:
user_id
url
ip
The solution is to identify and remove the source of high cardinality with the help of cardinality explorer.
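Locating the label responsible for high cardinality boils down to counting distinct values per label name, which is the kind of information cardinality explorer surfaces. The sketch below does this over an in-memory list of hypothetical series; labelCardinality is an illustrative helper, not part of VictoriaMetrics.

```go
package main

import (
	"fmt"
	"sort"
)

// labelCardinality returns the number of distinct values per label name
// across the given series.
func labelCardinality(series []map[string]string) map[string]int {
	values := make(map[string]map[string]struct{})
	for _, labels := range series {
		for name, value := range labels {
			if values[name] == nil {
				values[name] = make(map[string]struct{})
			}
			values[name][value] = struct{}{}
		}
	}
	counts := make(map[string]int, len(values))
	for name, set := range values {
		counts[name] = len(set)
	}
	return counts
}

func main() {
	var series []map[string]string
	// Hypothetical data: 1000 distinct users hitting two endpoints.
	for i := 0; i < 1000; i++ {
		series = append(series, map[string]string{
			"job":     "api",
			"path":    []string{"/login", "/logout"}[i%2],
			"user_id": fmt.Sprint(i),
		})
	}
	counts := labelCardinality(series)
	names := make([]string, 0, len(counts))
	for name := range counts {
		names = append(names, name)
	}
	// Labels with the most distinct values are the likely culprits.
	sort.Slice(names, func(i, j int) bool { return counts[names[i]] > counts[names[j]] })
	for _, name := range names {
		fmt.Printf("%s: %d unique values\n", name, counts[name])
	}
}
```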
The official Grafana dashboards for VictoriaMetrics contain graphs, which show the number of active time series - see these docs for details.
VictoriaMetrics maintains an in-memory cache for mapping of active time series into internal series ids.
The cache size depends on the available memory for VictoriaMetrics in the host system. If the information about all the active time series doesn't fit into the cache,
then VictoriaMetrics needs to read and unpack the information from disk on every incoming sample for time series missing in the cache.
This operation is much slower than the cache lookup, so such an insert is called a slow insert.
A high percentage of slow inserts on the official dashboard for VictoriaMetrics indicates
a memory shortage for the current number of active time series. Such a condition usually leads
to a significant slowdown for data ingestion and to significantly increased disk IO and CPU usage.
The solution is to add more memory or to reduce the number of active time series.
Cardinality explorer can be helpful for locating the source of a high number of active time series.
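A toy model makes the slow-insert effect visible: a bounded cache in front of a slower lookup, with every cache miss counted as a slow insert. The FIFO eviction policy, the cache capacity and the names tsidCache and simulate are assumptions for illustration only; the real VictoriaMetrics cache is more sophisticated and degrades less abruptly.

```go
package main

import "fmt"

// tsidCache is a toy bounded cache from series keys to internal series ids.
// disk stands in for the on-disk index, which always has the full mapping.
type tsidCache struct {
	capacity int
	cached   map[string]uint64
	order    []string // insertion order, used for FIFO eviction
	disk     map[string]uint64
	lookups  int
	slow     int
}

func newTSIDCache(capacity int) *tsidCache {
	return &tsidCache{
		capacity: capacity,
		cached:   make(map[string]uint64),
		disk:     make(map[string]uint64),
	}
}

// lookup returns the internal id for key. A cache miss is counted as a slow
// insert, since the id has to be read from the (simulated) disk index.
func (c *tsidCache) lookup(key string) uint64 {
	c.lookups++
	if id, ok := c.cached[key]; ok {
		return id
	}
	c.slow++
	id, ok := c.disk[key]
	if !ok {
		id = uint64(len(c.disk) + 1) // first sample ever: register a new series
		c.disk[key] = id
	}
	if len(c.order) >= c.capacity {
		delete(c.cached, c.order[0])
		c.order = c.order[1:]
	}
	c.cached[key] = id
	c.order = append(c.order, key)
	return id
}

// simulate sends one sample per active series per round and returns the
// percentage of slow inserts.
func simulate(activeSeries, capacity, rounds int) float64 {
	c := newTSIDCache(capacity)
	for r := 0; r < rounds; r++ {
		for i := 0; i < activeSeries; i++ {
			c.lookup(fmt.Sprintf("series_%d", i))
		}
	}
	return 100 * float64(c.slow) / float64(c.lookups)
}

func main() {
	fmt.Printf("50 active series, cache for 100: %.0f%% slow inserts\n", simulate(50, 100, 10))
	fmt.Printf("150 active series, cache for 100: %.0f%% slow inserts\n", simulate(150, 100, 10))
}
```

In the first case only the initial registration of each series misses the cache. In the second case, with this FIFO policy and a cyclic access pattern, every lookup misses, which mirrors the memory-shortage condition described above in an exaggerated form.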
See this article.
VictoriaMetrics also provides query tracer and cardinality explorer, which can help during query optimization.
See also troubleshooting slow queries.
Both single-node VictoriaMetrics and VictoriaMetrics cluster are production-ready.
See Scalability limits of VictoriaMetrics.
Single-node VictoriaMetrics requires less CPU and RAM for handling the same workload compared to the cluster version of VictoriaMetrics, since it doesn't need to pass the encoded data over the network between cluster components.
The performance of a single-node VictoriaMetrics scales almost perfectly with the available CPU, RAM and disk IO resources on the host where it runs - see this article.
Single-node VictoriaMetrics is easier to set up and operate compared to the cluster version of VictoriaMetrics.
Given the facts above it is recommended to use single-node VictoriaMetrics in the majority of cases.
Cluster version of VictoriaMetrics may be preferred over single-node VictoriaMetrics in the following relatively rare cases:
If multitenancy support is needed, since single-node VictoriaMetrics doesn't support multitenancy. However, it is possible to run multiple single-node VictoriaMetrics instances (one per tenant) and route incoming requests from a particular tenant to the corresponding VictoriaMetrics instance via vmauth.
If the current workload cannot be handled by a single-node VictoriaMetrics. For example, if you are going to ingest hundreds of millions of active time series at ingestion rates exceeding a million samples per second, then it is better to use cluster version of VictoriaMetrics, since its capacity can scale horizontally with the number of nodes in the cluster.
Don't choose cluster unless you have to.
The single-node version of VictoriaMetrics stores data on disk in a slightly different format compared to the cluster version of VictoriaMetrics.
This makes it impossible to just copy the on-disk data from -storageDataPath directory from single-node VictoriaMetrics to a vmstorage node in VictoriaMetrics cluster.
If you need to migrate data from a single-node VictoriaMetrics to the cluster version, then follow these instructions.
MetricsQL provides a better user experience than PromQL. It fixes a few annoying issues in PromQL, which prevents MetricsQL from being 100% compatible with PromQL. See this article for details.
Please see these docs.
Please see these docs.
Please see these docs.
Please use the whisper-to-graphite tool for reading data from Graphite and pushing them to VictoriaMetrics via Graphite's import API.
There could be a slight difference in stored values for time series. Due to different compression algorithms, VictoriaMetrics may reduce the precision of float values with more than 12 significant decimal digits. Please see this article.
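To get a feel for what keeping 12 significant decimal digits means, the sketch below rounds values to a given number of significant digits using Go's strconv package. It only illustrates the magnitude of the precision loss; it is not the VictoriaMetrics compression algorithm, and roundToSignificant is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
)

// roundToSignificant keeps at most digits significant decimal digits of v
// by formatting it with the 'g' verb and parsing it back.
func roundToSignificant(v float64, digits int) float64 {
	s := strconv.FormatFloat(v, 'g', digits, 64)
	r, err := strconv.ParseFloat(s, 64)
	if err != nil {
		panic(err) // cannot happen for output produced by FormatFloat
	}
	return r
}

func main() {
	for _, v := range []float64{0.30000000000000004, 1234567.891234567, 42.5} {
		fmt.Printf("original=%v rounded=%v\n", v, roundToSignificant(v, 12))
	}
}
```

Values with 12 or fewer significant digits, such as 42.5, pass through unchanged; only the trailing digits beyond the 12th are affected.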
The query engine may behave differently for some functions. Please see this article.
Deduplication is a special case of zero-offset downsampling. So, if both downsampling and deduplication are enabled, then deduplication is replaced by zero-offset downsampling.
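The deduplication rule can be sketched for a single time series: within each discrete -dedup.minScrapeInterval interval, only the sample with the biggest timestamp is kept. This is a simplified illustration, not the VictoriaMetrics implementation. It assumes samples are sorted by timestamp, aligns intervals at multiples of the interval duration (VictoriaMetrics may place the boundaries slightly differently) and, as an assumption of this sketch, keeps the bigger value when timestamps are equal.

```go
package main

import "fmt"

// sample is a single raw sample of one time series.
type sample struct {
	ts    int64 // Unix timestamp in milliseconds
	value float64
}

// dedup keeps one sample per discrete interval of intervalMs milliseconds:
// the one with the biggest timestamp (the bigger value wins on equal
// timestamps). samples must be sorted by timestamp.
func dedup(samples []sample, intervalMs int64) []sample {
	var out []sample
	for _, s := range samples {
		n := len(out)
		if n > 0 && out[n-1].ts/intervalMs == s.ts/intervalMs {
			last := &out[n-1]
			if s.ts > last.ts || (s.ts == last.ts && s.value > last.value) {
				*last = s
			}
			continue
		}
		out = append(out, s)
	}
	return out
}

func main() {
	// Hypothetical HA pair: two Prometheus replicas scrape the same target
	// every 15s, one second apart, and both write to VictoriaMetrics.
	merged := []sample{
		{1000, 10}, {2000, 10},
		{16000, 11}, {17000, 11},
		{31000, 12}, {32000, 12},
	}
	for _, s := range dedup(merged, 15000) {
		fmt.Println(s.ts, s.value)
	}
}
```

Viewed this way, deduplication is downsampling that keeps the last sample per interval without any offset, which is why it is subsumed when zero-offset downsampling is also configured.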
Single-node VictoriaMetrics cannot be restarted, upgraded or downgraded without downtime, since it needs to be gracefully shut down and then started again. See how to upgrade VictoriaMetrics.
The cluster version of VictoriaMetrics can be restarted, upgraded or downgraded without downtime according to these instructions.
VictoriaMetrics doesn't rebalance data between vmstorage nodes when new vmstorage nodes are added to the cluster.
This means that newly added vmstorage nodes will have less data at -storageDataPath compared to the older vmstorage nodes
until the historical data is removed from the old vmstorage nodes when it goes outside the configured retention.
The automatic re-balancing is the process of moving data between vmstorage nodes, so every node eventually contains the same amount of data.
It is disabled by default because it may consume additional CPU, network bandwidth and disk IO at vmstorage nodes for long periods of time,
which, in turn, can negatively impact VictoriaMetrics cluster availability.
Additionally, it is unclear how to handle the automatic re-balancing if cluster configuration changes while the re-balancing is in progress.
The amount of data stored at vmstorage nodes becomes equal among old and new vmstorage nodes
after historical data is removed from the old vmstorage nodes because it goes outside the configured retention.
The data ingestion load becomes even between old vmstorage nodes and new vmstorage nodes almost immediately
after adding new vmstorage nodes to the cluster, since vminsert nodes evenly distribute incoming time series
among the nodes specified in -storageNode command-line flag. The newly added vmstorage nodes may experience
increased load during the first couple of minutes because they need to register active time series.
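The even distribution of incoming series, and the fact that most existing series keep their node when nodes are added, can be illustrated with consistent hashing. The sketch below uses jump consistent hashing (Lamping and Veach) over an FNV-1a hash of the series name; this scheme is a stand-in chosen for brevity, and the exact hashing used by vminsert may differ. The series names are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// jumpHash maps key to one of numNodes buckets using jump consistent hashing.
// When numNodes grows by one, a key either keeps its bucket or moves to the new one.
func jumpHash(key uint64, numNodes int) int {
	b, j := int64(-1), int64(0)
	for j < int64(numNodes) {
		b = j
		key = key*2862933555777941757 + 1
		j = int64(float64(b+1) * (float64(int64(1)<<31) / float64((key>>33)+1)))
	}
	return int(b)
}

// seriesHash hashes the canonical series name (metric name plus labels).
func seriesHash(series string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(series)) // Write on a hash.Hash never returns an error
	return h.Sum64()
}

func main() {
	const total = 100000
	perNode := make([]int, 4)
	moved := 0
	for i := 0; i < total; i++ {
		h := seriesHash(fmt.Sprintf(`up{instance="host-%d"}`, i))
		before, after := jumpHash(h, 3), jumpHash(h, 4)
		perNode[after]++
		if before != after {
			moved++
		}
	}
	fmt.Println("series per node with 4 nodes:", perNode)
	fmt.Printf("series routed to another node after adding the 4th: %.1f%%\n",
		100*float64(moved)/total)
}
```

With this scheme, adding a fourth node reroutes roughly a quarter of the series, all of them to the new node, so the new node immediately gets its share of the ingestion load while the other nodes keep most of their series.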
The query load becomes even between old vmstorage nodes and new vmstorage nodes after most queries are executed
over time ranges with data covered by new vmstorage nodes. Usually most queries are received
from alerting and recording rules, which query data on limited time ranges
such as a few hours to a few days at most. This means that the query load between old vmstorage nodes and new vmstorage nodes
should become even within a few hours or days after adding new vmstorage nodes.
See also rebalancing docs at VictoriaMetrics cluster.
VictoriaMetrics doesn't restore replication factor
when some of vmstorage nodes are removed from the cluster because of the following reasons:
Automatic replication factor recovery needs to copy non-trivial amounts of data between the remaining vmstorage nodes.
This additional copying requires additional CPU, disk IO and network bandwidth at vmstorage nodes. This may negatively impact
VictoriaMetrics cluster availability during extended periods of time.
It is unclear when the automatic replication factor recovery must be started. How to distinguish the expected temporary
vmstorage node unavailability because of maintenance, upgrade or config changes from permanent loss of data at the vmstorage node?
It is recommended to read the replication and data safety docs for more details.
VictoriaMetrics stores index data in the <-storageDataPath>/indexdb subdirectory,
while the data itself is stored in the <-storageDataPath>/data subdirectory
(<-storageDataPath> is the value of the corresponding command-line flag, which points to the directory where VictoriaMetrics stores all its data).
The size of the indexdb subdirectory is exposed
via vm_data_size_bytes{type="indexdb/file"} metric, while the size of the data subdirectory is exposed via vm_data_size_bytes{type="storage/big"}
and vm_data_size_bytes{type="storage/small"} metrics.
The size of the indexdb subdirectory can exceed the size of the data subdirectory in cases of high churn rate
when old time series are replaced by new time series at a high rate.
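Whether indexdb has outgrown the data can be checked from the metrics listed above. The sketch parses Prometheus text exposition output, such as the content of the /metrics page, with deliberately minimal parsing that assumes one `name{labels} value` sample per line without timestamps. dataSizes and indexToDataRatio are hypothetical helpers and the sample values are made up.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// dataSizes sums vm_data_size_bytes samples by their type label.
func dataSizes(metrics string) (map[string]float64, error) {
	const prefix = "vm_data_size_bytes{"
	const typeLabel = `type="`
	sizes := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(metrics))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, prefix) {
			continue
		}
		i := strings.Index(line, typeLabel)
		if i < 0 {
			continue
		}
		rest := line[i+len(typeLabel):]
		j := strings.IndexByte(rest, '"')
		if j < 0 {
			continue
		}
		fields := strings.Fields(line)
		v, err := strconv.ParseFloat(fields[len(fields)-1], 64)
		if err != nil {
			return nil, fmt.Errorf("cannot parse value in %q: %w", line, err)
		}
		sizes[rest[:j]] += v
	}
	return sizes, sc.Err()
}

// indexToDataRatio returns the indexdb size divided by the data size.
func indexToDataRatio(sizes map[string]float64) float64 {
	data := sizes["storage/big"] + sizes["storage/small"]
	if data == 0 {
		return 0
	}
	return sizes["indexdb/file"] / data
}

func main() {
	// Made-up values; in practice read them from the /metrics page.
	metrics := `
vm_data_size_bytes{type="indexdb/file"} 6.0e9
vm_data_size_bytes{type="storage/big"} 2.5e9
vm_data_size_bytes{type="storage/small"} 0.5e9
`
	sizes, err := dataSizes(metrics)
	if err != nil {
		panic(err)
	}
	fmt.Printf("indexdb is %.1fx the size of data\n", indexToDataRatio(sizes))
}
```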
VictoriaMetrics stores various index entries in indexdb for each label
of each registered time series in order to speed up searching for these time series by label filters.
So the size of the indexdb grows proportionally to the total number of time series registered in VictoriaMetrics,
and proportionally to the total length of all the labels seen across all the registered time series.
Typical monitoring in Kubernetes generates moderate-to-high churn rate for time series because every restart of the pod creates a new set of time series
for all the metrics exposed by that pod, with a new pod label.
The number of labels and the total length of label=value pairs per time series in Kubernetes is quite large
(~30-40 labels with ~1KB total length of label=value pairs per time series). This contributes to quick growth of the indexdb over time,
so its size may exceed the size of the data folder by up to 2x in typical production cases.
There are the following workarounds, which can reduce the growth rate of the indexdb:
Drop unneeded long labels from the ingested metrics before they are stored in VictoriaMetrics. See how to drop unneeded labels from scrape targets and how to drop unneeded labels from metrics.
Aggregate multiple time series into a single output time series before storing them in VictoriaMetrics. The aggregation can be performed via recording rules in vmalert using MetricsQL aggregate functions, or via streaming aggregation according to these docs.
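Dropping long labels shrinks the total length of label=value pairs, which is what indexdb size grows with. In practice this is done with relabeling rules (for example, the Prometheus-compatible labeldrop action); the Go sketch below only illustrates the effect on a single hypothetical Kubernetes-style series. dropLabels and labelBytes are illustrative helpers, and labelBytes is only a rough proxy for the index footprint.

```go
package main

import "fmt"

// dropLabels returns a copy of labels without the given label names.
func dropLabels(labels map[string]string, names ...string) map[string]string {
	drop := make(map[string]bool, len(names))
	for _, n := range names {
		drop[n] = true
	}
	out := make(map[string]string, len(labels))
	for k, v := range labels {
		if !drop[k] {
			out[k] = v
		}
	}
	return out
}

// labelBytes returns the total length of all name=value pairs.
func labelBytes(labels map[string]string) int {
	n := 0
	for k, v := range labels {
		n += len(k) + len("=") + len(v)
	}
	return n
}

func main() {
	// Hypothetical series; label names and values are made up.
	series := map[string]string{
		"__name__":                 "container_cpu_usage_seconds_total",
		"namespace":                "prod",
		"pod":                      "api-7d9f8c6b5-x2k4q",
		"container":                "api",
		"controller_revision_hash": "7d9f8c6b5",
		"pod_template_hash":        "7d9f8c6b5",
		"annotation_checksum":      "sha256:0f1e2d3c4b5a69788796a5b4c3d2e1f00f1e2d3c4b5a69788796a5b4c3d2e1f0",
	}
	trimmed := dropLabels(series, "controller_revision_hash", "pod_template_hash", "annotation_checksum")
	fmt.Printf("labels: %d -> %d, label bytes: %d -> %d\n",
		len(series), len(trimmed), labelBytes(series), labelBytes(trimmed))
}
```

Only drop labels that are not needed to tell series apart: series that differ solely by a dropped label become indistinguishable after dropping.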
VictoriaMetrics also adds per-day entries into indexdb for time series seen during a particular day, in order to speed up searches for time series seen on that day.
This gradually increases indexdb size over time even if time series remain the same over multiple days. If the set of monitored time series
in your case is constant over many days, then it is a good idea to disable the per-day index and rely only on global index during queries.
This reduces indexdb growth rate. See how to disable per-day index.
Note that the deduplication
and downsampling
may reduce the number of raw samples
per stored time series, but they do not reduce the number of stored time series, so they cannot reduce indexdb size.
See also how to calculate the needed disk space for VictoriaMetrics for the given workload.