ADRs/0010 - Test module consolidation.md
Proposed
Proposed by: Adam Gibson (13th December 2021)
Testing historically on a large code base like deeplearning4j often involves platform specific code with several categories as documented in [the Test Architectures ADR](./0006 - Test architecture.md)
There are multiple levels of testing which test ever larger chunks of the application.
Unit tests typically test code at the smallest possible unit. In Java this usually encompasses just a single class.
Component tests are meant to test a component consisting of multiple units working together. A logical component usually consists of a few classes at most and don't cross the boundary between two components.
Integration tests are meant to test how components integrate with each other. Their most important job is to ensure that those components properly interface with each other.
End-to-End tests, often also called system tests, are meant to test the entire system. They interface with the application through the same UI as a regular user does.
Regression tests are meant to mimic a specific behavior or usage that results in a bug. They are created before the bug fix and need to reproduce the bug but expect the correct behavior, i.e. they should fail at first. Once the bug is fixed, they should pass without any change in the test definition. These test cases accumulate as bug reports come in and guard us from recreating that particular bug in that particular situation.
The Eclipse Deeplearning4j project has mostly what we would call End-To-End tests. We want to run a set of tests on different classifiers (eg: cuda version + cudnn, cuda version + non cudnn, cpu, arm32,arm64,..) in order to verify platform specific behavior works as intended.
When testing, we generally have a few things we test the behavior of:
Verifying behavior across these different backends even at release time is time-consuming and error-prone taking hours to run with some tests being inconsistent (oftentimes spark and multi threading clashing with OMP math threads causing crahses/slowdowns)
We put anything that is considered an end-to-end test requiring platform specific behavior in to its own module. These tests would already be tagged. We would have an accompanying pom.xml that accommodated downloading snapshots to allow us to run specified tests on different classifiers.
The goal would be to allow specifying the following parameters from the command line:
Future work may extend this behavior to add performance tests as well.
The intended workflow would be to allow the following steps:
This allows easy configuration on CI and creation of different scripts for validation along the lines of behavior we want to run. Examples include:
These distinctions would be achieved through a mix of test tags and test name filters.