docs/content/v2024.1/deploy/checklist.md
A YugabyteDB cluster consists of two distributed services: the YB-TServer service and the YB-Master service. Because the YB-Master service serves as the cluster metadata manager, it should be brought up first, followed by the YB-TServer service. To bring up these distributed services, the respective servers (YB-Master or YB-TServer) need to be started across different nodes. There are a number of topics to consider and recommendations to follow when starting these services.
Set the appropriate system limits using ulimit on each node running a YugabyteDB server.
YugabyteDB internally replicates data in a consistent manner using the Raft consensus protocol to survive node failure without compromising data correctness. This distributed consensus replication is applied at a per-shard (also known as tablet) level, similar to Google Spanner.
The replication factor (RF) corresponds to the number of copies of the data. You need at least as many nodes as the RF, which means one node for RF 1, three nodes for RF 3, and so on. With an RF of 3, your cluster can tolerate one node failure. With an RF of 5, it can tolerate two node failures. More generally, if RF is n, YugabyteDB can survive floor((n - 1) / 2) failures without compromising correctness or availability of data.
See Fault tolerance for more information.
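The floor((n - 1) / 2) rule above can be sketched as a small shell helper. The function name `tolerable_failures` is hypothetical and not part of any YugabyteDB tooling; it only illustrates the arithmetic.

```sh
# Sketch: node failures an RF-n cluster can tolerate, floor((n - 1) / 2).
# tolerable_failures is a hypothetical helper name used for illustration.
tolerable_failures() {
  # Shell integer division truncates, which equals floor for n >= 1.
  echo $(( ($1 - 1) / 2 ))
}

for n in 1 3 5 7; do
  echo "RF $n tolerates $(tolerable_failures "$n") node failure(s)"
done
```

Note that an even RF gains nothing over the odd value below it under this formula (RF 4 tolerates one failure, the same as RF 3).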
When deploying a cluster, keep in mind the following:
Specify the RF using the --replication_factor flag when bringing up the YB-Master servers.
Note that YugabyteDB works with both hostnames and IP addresses. The latter are preferred at this point, as they are more extensively tested.
See the yb-master command reference for more information.
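As a rough illustration of passing the --replication_factor flag, the following sketch builds a master address list and prints (without running) a yb-master command line. The IP addresses, the data directory, and the helper name `master_addresses` are hypothetical; the sketch also assumes the default YB-Master RPC port of 7100 and one YB-Master per node to match an RF of 3. Check the yb-master command reference for the authoritative set of flags.

```sh
# Sketch: join node IPs into a comma-separated address list.
# Assumes the default YB-Master RPC port (7100); helper name is hypothetical.
master_addresses() {
  port=7100
  out=""
  for ip in "$@"; do
    out="${out:+$out,}$ip:$port"
  done
  echo "$out"
}

# Hypothetical node IPs: three YB-Masters for RF 3.
addrs=$(master_addresses 10.0.0.1 10.0.0.2 10.0.0.3)

# Print the command for the first node instead of executing it.
echo "yb-master --master_addresses $addrs --rpc_bind_addresses 10.0.0.1:7100 --fs_data_dirs /mnt/d0 --replication_factor 3"
```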
YugabyteDB is designed to run on bare-metal machines, virtual machines (VMs), and containers.
You should allocate adequate CPU and RAM. YugabyteDB has sensible defaults for running on a wide range of machines, and has been tested on machines from 2 to 64 cores and with up to 200 GB of RAM.
Minimum requirement
Production requirement
Add more CPU (compared to adding more RAM) to improve performance.
Additional considerations
For typical Online Transaction Processing (OLTP) workloads, YugabyteDB performance improves with more aggregate CPU in the cluster. You can achieve this by using larger nodes or adding more nodes to a cluster. Note that if you do not have enough CPUs, this will manifest itself as higher latencies and eventually dropped requests.
Memory depends on your application query pattern. Writes require memory but only up to a certain point (for example, 4GB, but if you have a write-heavy workload you may need a little more). Beyond that, more memory generally helps improve the read throughput and latencies by caching data in the internal cache. If you do not have enough memory to fit the read working set, then you will typically experience higher read latencies because data has to be read from disk. Having a faster disk could help in some of these cases.
YugabyteDB explicitly manages a block cache, and does not need the entire data set to fit in memory. It does not rely on the OS to keep data in its buffers. If you provide YugabyteDB sufficient memory, data accessed and present in block cache stays in memory.
YugabyteDB requires the SSE2 instruction set support, which was introduced into Intel chips with the Pentium 4 in 2001 and AMD processors in 2003. Most systems produced in the last several years are equipped with SSE2.
In addition, YugabyteDB requires SSE4.2.
To verify that your system supports SSE2, run the following command:
cat /proc/cpuinfo | grep sse2
To verify that your system supports SSE4.2, run the following command:
cat /proc/cpuinfo | grep sse4.2
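The two checks above can be combined into one script that reports each flag explicitly. This is a minimal sketch for Linux x86 systems: the helper name `has_cpu_flag` and its optional second argument (an alternate cpuinfo path, useful for testing) are hypothetical. In /proc/cpuinfo the SSE4.2 flag is spelled `sse4_2`; the `grep sse4.2` command above still matches it because `.` matches any character.

```sh
# Sketch: succeed if a CPU flag appears as a whole word on the "flags" line.
# The optional second argument is a hypothetical override of the cpuinfo path.
has_cpu_flag() {
  flag=$1
  cpuinfo=${2:-/proc/cpuinfo}
  grep -m1 '^flags' "$cpuinfo" 2>/dev/null | grep -qw "$flag"
}

for f in sse2 sse4_2; do
  if has_cpu_flag "$f"; then
    echo "$f: supported"
  else
    echo "$f: NOT supported"
  fi
done
```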
SSDs (solid state disks) are required.
Both local and remote attached storage work with YugabyteDB. Because YugabyteDB internally replicates data for fault tolerance, remote attached storage that does its own additional replication is not a requirement. Local disks often offer better performance at a lower cost.
Multi-disk nodes:
Specify the data directories on each disk using the --fs_data_dirs flag.
Mount settings:
Use the noatime setting when mounting the data drives.
YugabyteDB does not require any form of RAID, but runs optimally on a JBOD (just a bunch of disks) setup. YugabyteDB can also leverage multiple disks per node and has been tested beyond 20 TB of storage per node.
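One way to confirm the noatime recommendation on a Linux node is to inspect the mount options of each data drive. The sketch below reads /proc/mounts, whose fourth field holds the comma-separated mount options. The helper name `mount_has_noatime`, its optional second argument (an alternate mounts file, useful for testing), and the mount points /mnt/d0 and /mnt/d1 are hypothetical.

```sh
# Sketch: succeed if the given mount point is mounted with noatime.
# The optional second argument is a hypothetical override of /proc/mounts.
mount_has_noatime() {
  mount_point=$1
  mounts_file=${2:-/proc/mounts}
  # Field 2 is the mount point, field 4 the comma-separated options.
  awk -v mp="$mount_point" '$2 == mp { print $4 }' "$mounts_file" 2>/dev/null \
    | tr ',' '\n' | grep -qx noatime
}

# Hypothetical data-drive mount points.
for dir in /mnt/d0 /mnt/d1; do
  if mount_has_noatime "$dir"; then
    echo "$dir: noatime set"
  else
    echo "$dir: noatime missing (or not a mount point)"
  fi
done
```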
Write-heavy applications usually require more disk IOPS (especially if the size of each record is larger), so the total IOPS that a disk can support matters. On the read side, if the data does not fit into the cache and must be read from disk to satisfy queries, disk performance (latency and IOPS) starts to matter.
YugabyteDB uses per-tablet size-tiered compaction. Therefore, the typical space amplification in YugabyteDB tends to be in the 10-20% range.
YugabyteDB stores data compressed by default. The effectiveness of compression depends on the data set. For example, if the data has already been compressed, then the additional compression at the storage layer of YugabyteDB will not be very effective.
It is recommended to plan for about 20% headroom on each node to allow space for miscellaneous overheads such as temporary additional space needed for compactions, metadata overheads, and so on.
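The sizing guidance above can be turned into a back-of-the-envelope estimate. The following sketch is an assumption-laden illustration, not an official sizing formula: it assumes data is spread evenly across nodes, treats the 20% headroom as the fraction of each disk kept free, and takes the compressed data size as input. The function name `per_node_disk_gb` and all input values are hypothetical.

```sh
# Sketch: estimated disk needed per node, in GB.
# Args: compressed data size (GB), RF, node count,
#       space amplification (fraction), headroom kept free (fraction).
# Assumed formula: (data * RF / nodes) * (1 + amplification) / (1 - headroom)
per_node_disk_gb() {
  awk -v data="$1" -v rf="$2" -v nodes="$3" -v amp="$4" -v headroom="$5" \
    'BEGIN { printf "%.0f\n", (data * rf / nodes) * (1 + amp) / (1 - headroom) }'
}

# Hypothetical cluster: 1000 GB of data, RF 3, 6 nodes,
# 20% space amplification, 20% headroom.
echo "Disk per node: $(per_node_disk_gb 1000 3 6 0.2 0.2) GB"
```

With these inputs, each node holds 500 GB of replicated data, which grows to 600 GB with amplification and needs a 750 GB disk to keep 20% free.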
The following is a list of default ports along with the network access required for using YugabyteDB:
Each of the nodes in the YugabyteDB cluster must be able to communicate with each other using TCP/IP on the following ports:
To view the cluster dashboard, you need to be able to navigate to the following ports on the nodes:
To access the database from applications or clients, the following ports need to be accessible from the applications or CLI:
This deployment uses YugabyteDB default ports.
YugabyteDB Anywhere has its own port requirements. Refer to Networking.
For YugabyteDB to maintain strict data consistency, clock drift and clock skew across all nodes must be tightly controlled and kept within defined bounds. Any deviation can impact node availability, as YugabyteDB prioritizes consistency over availability and will shut down servers if necessary to maintain integrity. Clock synchronization software, such as NTP or chrony, allows you to reduce clock skew and drift by continuously synchronizing system clocks across nodes in a distributed system like YugabyteDB. The following are some recommendations on how to configure clock synchronization.
Set a safe value for the maximum clock skew flag (--max_clock_skew_usec) for YB-TServers and YB-Masters when starting the YugabyteDB servers. The recommended value is two times the expected maximum clock skew between any two nodes in your deployment.
For example, if the maximum clock skew across nodes is expected to be no more than 250 milliseconds, then set the parameter to 500000 (--max_clock_skew_usec=500000).
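The doubling rule above, including the millisecond-to-microsecond conversion, can be sketched as follows. The helper name `max_clock_skew_usec` is hypothetical; only the --max_clock_skew_usec flag itself comes from the text.

```sh
# Sketch: flag value = 2 x expected maximum skew, converted from ms to usec.
# max_clock_skew_usec is a hypothetical helper name.
max_clock_skew_usec() {
  echo $(( $1 * 2 * 1000 ))
}

# Expected maximum skew of 250 ms between any two nodes.
echo "--max_clock_skew_usec=$(max_clock_skew_usec 250)"
```

For an expected skew of 250 ms this prints `--max_clock_skew_usec=500000`, matching the example above.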
The maximum clock drift on any node should be bounded to no more than 500 PPM (or parts per million). This means that the clock on any node should drift by no more than 0.5 ms per second. Note that 0.5 ms per second is the standard assumption of clock drift in Linux.
In practice, the clock drift would have to be orders of magnitude higher in order to cause correctness issues.
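To make the PPM figure concrete, the sketch below converts a drift rate into accumulated drift over an interval. One PPM corresponds to one microsecond of drift per second, so the result in milliseconds is ppm x seconds / 1000. The helper name `drift_ms` is hypothetical, and the calculation assumes a constant worst-case drift with no synchronization during the interval.

```sh
# Sketch: accumulated drift in ms for a drift rate (PPM) over an interval (s).
# 1 PPM = 1 microsecond of drift per second; drift_ms is a hypothetical name.
drift_ms() {
  awk -v ppm="$1" -v secs="$2" 'BEGIN { printf "%g\n", ppm * secs / 1000 }'
}

echo "Drift at 500 PPM over 1 second: $(drift_ms 500 1) ms"
echo "Drift at 500 PPM over 1 hour:   $(drift_ms 500 3600) ms"
```

At the 500 PPM bound, an unsynchronized clock could move by up to 0.5 ms each second, which is why continuous synchronization with NTP or chrony is recommended.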
For a list of best practices, see security checklist.
YugabyteDB can run on a number of public clouds.
On Google Cloud, recommended instance types are n2-highcpu-16 and n2-highcpu-32.