Tests come in a variety of formats. I've seen all kinds of tests, in varying numbers. Here are the main types I've encountered, from the broadest scope to the narrowest.
Of course, one might not need all levels; this list contains the levels I've encountered. Feel free to add missing levels in the comments.
Martin Fowler described the test pyramid using only three test types (UI, Service, Unit). People sometimes attach different meanings to each test type; here are mine.
A human tester uses the application as an end-user would.
A robot uses the application as an end-user would.
Many instances of robots use the application or at least part of it simultaneously in order to simulate a production-like load on the system.
**NB:** Performance tests usually benchmark a specific feature and don't need a full-fledged deployed environment, but since they are close to load tests, I've put them at this level, although they could sit at practically any level.
A test suite is run on a running system with its external dependencies (e.g. a database, a message broker, external APIs).
A test suite is run on a running system whose external dependencies are stubbed.
A test suite is run directly on code with internal or external dependencies mocked.
A test suite is run directly on a single unit of code (e.g. a class, a package, or a single file).
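To make the two lowest levels concrete, here is a minimal sketch in Python. The names `price_with_tax`, `CheckoutService` and the gateway's `charge` method are hypothetical and only serve the illustration; the mock comes from the standard library's `unittest.mock`.

```python
from unittest.mock import Mock


def price_with_tax(net: float, rate: float) -> float:
    """A single unit of code with no dependencies (hypothetical example)."""
    return round(net * (1 + rate), 2)


class CheckoutService:
    """Code that depends on an external payment gateway (hypothetical)."""

    def __init__(self, gateway):
        self.gateway = gateway

    def pay(self, net: float, rate: float) -> bool:
        total = price_with_tax(net, rate)
        return self.gateway.charge(total)


def test_unit_price_with_tax():
    # Unit level: one function, nothing to replace.
    assert price_with_tax(100.0, 0.2) == 120.0


def test_checkout_with_mocked_gateway():
    # Dependency mocked: no real payment system is contacted.
    gateway = Mock()
    gateway.charge.return_value = True
    service = CheckoutService(gateway)

    assert service.pay(100.0, 0.2) is True
    gateway.charge.assert_called_once_with(120.0)


if __name__ == "__main__":
    test_unit_price_with_tax()
    test_checkout_with_mocked_gateway()
```

The first test touches nothing but the function itself, while the second exercises a class whose collaborator has been replaced, so it also checks how the two interact.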
Purpose of Tests
Tests have a purpose. At a minimum, they tell you that a feature no longer works. That is good to know, but the real goal of tests is to indicate precisely when and where things went wrong.
Cost and value
Each type of test comes at a cost. This is the essence of the test pyramid. Higher-level tests take longer to run but cover more of the application, whereas lower-level tests are faster but each covers only a small part of it. Higher-level tests also have a greater maintenance cost.
Changes in the application have a bigger impact on higher-level tests. It's not uncommon to change a small feature and have to update many higher-level tests. Higher-level tests are more fragile because they cover so much.
Higher-level tests suffer from:
- Longer runs
- Longer feedback loop
Higher-level tests benefit from:
- Great coverage
- Best coverage per line of code ratio
- Closest to user experience
Overall, tests should be relatively fast to run. This is why there should be few high-level tests, while there can be many lower-level tests. Fast-running tests are good because they will be run often, which favors a quick and constant feedback loop.
Too many UI Tests
I've seen many applications with a lot of UI tests. These tests have immense value because they are the closest to the user's perspective. But since they can take several hours to run, they are run at night. With a regular-sized team producing features every day, it can be hard to spot which change affected the test results. Sometimes the cause is obvious, but more often the analysis is time-consuming. This is why having a large number of UI tests is not recommended. The coverage is great, but tests are not merely there to stamp a release; they should also provide guidance as to exactly where things went wrong.
Don't get me wrong, UI tests have a lot of value, as their large coverage may fill in the gaps left by other tests. They also take into account the graphical aspects, which are not covered by other levels.
Unit Tests are brittle
Unit tests aim at verifying a class's behavior. When a refactoring needs to change the class's contract, the unit test is often thrown away. This is why unit tests are much more brittle than contract and acceptance tests, which are usually considered more stable.
Acceptance Tests and Unit Tests
TDD is not only applicable to unit tests; it should be applied to acceptance tests as well. ATDD helps use acceptance tests to drive the design. These can be combined with unit tests using outside-in TDD, also known as double-loop TDD. Acceptance tests are more stable and should be relied on when building a feature, while unit tests can be used for class design and should be removed when a refactoring renders them useless.
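As a rough sketch of the two loops, assume a hypothetical checkout feature with a discount rule. The acceptance test states the requirement through the feature's entry point (`checkout_total`), while the unit test pins down a class (`DiscountPolicy`) that only exists because of the current design. All names and numbers are invented for the illustration.

```python
from typing import Iterable


class DiscountPolicy:
    """Inner-loop design detail; may disappear in a later refactoring."""

    def __init__(self, threshold_cents: int, percent: int):
        self.threshold_cents = threshold_cents
        self.percent = percent

    def discount_for(self, subtotal_cents: int) -> int:
        if subtotal_cents < self.threshold_cents:
            return 0
        return subtotal_cents * self.percent // 100


def checkout_total(prices_cents: Iterable[int]) -> int:
    """Feature entry point that the acceptance test talks to."""
    subtotal = sum(prices_cents)
    policy = DiscountPolicy(threshold_cents=10_000, percent=10)
    return subtotal - policy.discount_for(subtotal)


def test_acceptance_large_orders_get_ten_percent_off():
    # Outer loop: phrased as a requirement, unaware of DiscountPolicy.
    assert checkout_total([6_000, 5_000]) == 9_900
    assert checkout_total([4_000]) == 4_000


def test_unit_discount_policy_threshold():
    # Inner loop: tied to the class's contract, so it is disposable.
    policy = DiscountPolicy(threshold_cents=10_000, percent=10)
    assert policy.discount_for(9_999) == 0
    assert policy.discount_for(10_000) == 1_000


if __name__ == "__main__":
    test_acceptance_large_orders_get_ten_percent_off()
    test_unit_discount_policy_threshold()
```

If `DiscountPolicy` were later inlined or split, the unit test would have to go, but the acceptance test would keep passing unchanged as long as the behavior is preserved.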
So my test pyramid would look more like this.
Reversing the bottom of the pyramid is inspired by Thomas Pierrain's work on Rid Me Of Those Testing Pyramids.
Hexagonal architecture is a pattern in which the domain code is isolated from infrastructure code. The upside is that the domain code is simple and responds solely to business needs.
Acceptance tests fit perfectly in such an architecture because they correspond to domain requirements. They are less likely to change during design than unit tests, and they can be trusted to validate domain requirements. A unit test only validates a class, not a domain function.
Using double-loop TDD, one can rely on an acceptance test, write unit tests to help during design, and not be reluctant to dispose of them when the design needs a refactoring.
The test pyramid remains valid for higher-level tests, for reasons of maintenance cost and execution time. This doesn't mean that unit tests should rule the pyramid. Acceptance tests should be predominant, and unit tests should help with design and coverage. Unit tests should lean on acceptance tests as the safety net, which means they can be more disposable.
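Here is a minimal sketch of what such an acceptance test might look like, assuming a hypothetical library-loan domain. `LoanRepository` plays the role of a port, `InMemoryLoans` is a test adapter standing in for real infrastructure, and `MAX_LOANS` is an invented business rule.

```python
from typing import Protocol


class LoanRepository(Protocol):
    """Port: what the domain needs from the infrastructure."""

    def count_for(self, member: str) -> int: ...

    def add(self, member: str, book: str) -> None: ...


class InMemoryLoans:
    """Test adapter standing in for a real database adapter."""

    def __init__(self):
        self._loans = []

    def count_for(self, member: str) -> int:
        return sum(1 for m, _ in self._loans if m == member)

    def add(self, member: str, book: str) -> None:
        self._loans.append((member, book))


MAX_LOANS = 2  # hypothetical business rule


def borrow(loans: LoanRepository, member: str, book: str) -> bool:
    """Domain function: business rule only, no infrastructure code."""
    if loans.count_for(member) >= MAX_LOANS:
        return False
    loans.add(member, book)
    return True


def test_member_cannot_exceed_loan_limit():
    # Acceptance test: expressed in domain terms, run against the port.
    loans = InMemoryLoans()
    assert borrow(loans, "alice", "Dune") is True
    assert borrow(loans, "alice", "Emma") is True
    assert borrow(loans, "alice", "Ulysses") is False
    assert loans.count_for("alice") == 2


if __name__ == "__main__":
    test_member_cannot_exceed_loan_limit()
```

Because the test only goes through the domain function and the port, the classes behind `borrow` can be reshaped freely without touching it.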