testsuite/cluster-test/README.md
Cluster test is framework that introduces different failures to system and verifies that some degree of liveliness and safety is preserved during those experiments.
Cluster test works with real AWS cluster and uses root ssh access and AWS api to introduce failures.
Major components of cluster test include:
Reboot{SomeNode}RebootRandomValidators experiment would generate some number of Reboot effects for subset of nodes in cluster.LivenessHealthCheck that verifies that validators produce commits and CommitHistoryHealthCheck that verifies safety, in terms that validators do not produce contradicting commits.Normally experiment lifecycle consist of multiple stages. This lifecycle is managed by test runner:
Experiment is running it also reports set of validators that affected by it through Experiment::affected_validators(). We still verify that liveness and safety is not violated for any other validators. For example, when rebooting 3 validators we make sure that all other validators still make progress.Normally we run cluster_test on linux machine in AWS. In order to build linux binary on mac laptop we have cross compilation script:
docker/cluster_test/build.sh
This script requires docker for mac and starts docker container with build environment.
Build in this container is incremental, first build takes a lot of time but second build is much faster.
As a result, build script produces binary by default. Running it with --build-docker-image will also produce docker image.