Tests come in a variety of formats. I've seen all kinds of tests, in varying numbers. Here are the main types I've encountered, from the broadest in scope to the narrowest.

[Figure: the test pyramid]

Of course, one might not need all of these levels; this list simply contains the ones I've encountered. Feel free to add missing levels in the comments.

Test Types

Martin Fowler described the test pyramid using only three test types (UI, Service, Unit). People sometimes give different meanings to each test type, so here are mine.

Manual Tests

A human tester uses the application as an end-user would.

UI Tests

A robot uses the application as an end-user would.

Load/Performance Tests

Many instances of robots use the application or at least part of it simultaneously in order to simulate a production-like load on the system.

NB: Performance tests usually benchmark a specific feature and don't need a full-fledged deployed environment. Since they are close to load tests, I've put them at this level, although they could sit at practically any level.

Integration Tests

A test suite is run on a running system together with its real external dependencies (e.g. a database, a message broker, external APIs).

Contract Tests

A test suite is run on a running system whose external dependencies are stubbed.

Acceptance Tests

A test suite is run directly on the code, with internal or external dependencies mocked.

Unit Tests

A test suite is run directly on a single unit of code (e.g. a class, a package, or a single file).
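To make the difference between the last two levels concrete, here is a minimal sketch in Python. The names (`PriceCalculator`, `OrderService`, the payment gateway) are hypothetical and only serve as an illustration: the unit test exercises one class, while the acceptance test exercises a whole feature on code with its external dependency mocked.

```python
from unittest.mock import Mock


class PriceCalculator:
    """A single unit of code: pure pricing logic (amounts in cents)."""

    def total(self, unit_price_cents: int, quantity: int) -> int:
        return unit_price_cents * quantity


class OrderService:
    """A feature combining a unit with one external dependency."""

    def __init__(self, calculator: PriceCalculator, payment_gateway) -> None:
        self.calculator = calculator
        self.payment_gateway = payment_gateway

    def place_order(self, unit_price_cents: int, quantity: int) -> str:
        amount = self.calculator.total(unit_price_cents, quantity)
        self.payment_gateway.charge(amount)
        return "CONFIRMED"


def test_unit_price_calculator_total() -> None:
    # Unit test: one class, in isolation.
    assert PriceCalculator().total(999, 3) == 2997


def test_acceptance_customer_can_place_an_order() -> None:
    # Acceptance test: the feature as a whole, run on code,
    # with the external payment gateway replaced by a mock.
    gateway = Mock()
    service = OrderService(PriceCalculator(), gateway)
    assert service.place_order(999, 3) == "CONFIRMED"
    gateway.charge.assert_called_once_with(2997)
```

A contract test would look similar from the outside, except that it would target the running system and stub the gateway at the network boundary rather than in code.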

Purpose of Tests

Tests have a purpose. They are here to say that a feature no longer works. It's good to know when a feature ceases to work, but the goal of tests, in the end, is to indicate precisely when and where things went wrong.

Cost and value

Each type of test comes at a cost. This is the essence of the test pyramid. Higher level tests take longer to run but cover larger parts of the application, whereas lower level tests are faster but cover smaller parts of it. Higher level tests also have a greater maintenance cost.

Changes in the application have a bigger impact on higher level tests. It's not uncommon to change a small feature and have to update many higher level tests. Higher level tests are more fragile because they cover so much of the application.

Higher level tests suffer from:

  • Maintenance
  • Longer runs
  • Longer feedback loop

Higher level tests benefit from:

  • Great coverage
  • Best coverage per line of code ratio
  • Closest to user experience

Feedback Loop

Overall, tests should be relatively fast to run. This is why there should be few high-level tests, while there can be many lower level tests. Fast-running tests are good because they will be run often, which favors a quick and constant feedback loop.
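A quick back-of-the-envelope calculation shows how much the shape of the suite matters. The per-test durations and test counts below are made-up figures chosen for illustration, not measurements.

```python
# Hypothetical average duration of a single test at each level, in seconds.
SECONDS_PER_TEST = {"ui": 30.0, "acceptance": 0.05, "unit": 0.01}


def suite_duration(test_counts: dict[str, int]) -> float:
    """Return the total sequential run time of a suite, in seconds."""
    return sum(SECONDS_PER_TEST[level] * count for level, count in test_counts.items())


# Two hypothetical suites: one dominated by UI tests, one pyramid-shaped.
ui_heavy = {"ui": 400, "acceptance": 50, "unit": 200}
pyramid_shaped = {"ui": 20, "acceptance": 300, "unit": 2000}

print(f"UI-heavy suite: {suite_duration(ui_heavy) / 60:.1f} min")
print(f"Pyramid-shaped suite: {suite_duration(pyramid_shaped) / 60:.1f} min")
```

With these assumed numbers, the UI-heavy suite takes a little over three hours, while the pyramid-shaped suite finishes in about eleven minutes despite containing far more tests.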

[Figure: the feedback loop]

Too many UI Tests

I've seen many applications with a lot of UI tests. These tests have immense value because they are the closest to the user's perspective. But since they can take several hours to run, they are run at night. With a regular-sized team producing features every day, it can be hard to spot which change impacted the test results. Sometimes the cause is quite obvious, but more often the analysis is time-consuming. This is why having a large number of UI tests is not recommended. The coverage is great, but tests are not merely there to stamp a release; they should also provide guidance as to exactly where things went wrong.

Don't get me wrong: UI tests have a lot of value, as their large coverage may fill the gaps left by other tests. They also take into account the graphical aspects, which are not covered by other levels.

Unit Tests are brittle

Unit tests are great. Combined with methods like Test-Driven Development, they aid during the design phase and enforce the YAGNI principle.

Unit tests aim at verifying a class's behavior. When a refactoring needs to change the class's contract, the unit tests for that class often have to be thrown away. This is why unit tests are much more brittle than contract and acceptance tests, which are usually considered more stable.
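The sketch below illustrates this brittleness with a hypothetical checkout feature, shown before and after a refactoring that removes a helper class. All names are invented for the example. The unit test is tied to the helper's contract, so it has nothing left to test after the refactoring, while the acceptance test only knows about the feature and passes against both versions.

```python
class DiscountPolicy:
    """Original helper class; its contract is apply(amount_cents, is_member)."""

    def apply(self, amount_cents: int, is_member: bool) -> int:
        return amount_cents * 90 // 100 if is_member else amount_cents


def checkout_v1(amount_cents: int, is_member: bool) -> int:
    """The feature before the refactoring: delegates to DiscountPolicy."""
    return DiscountPolicy().apply(amount_cents, is_member)


# After the refactoring, DiscountPolicy is replaced by a simple rate table.
DISCOUNT_PERCENT = {True: 10, False: 0}


def checkout_v2(amount_cents: int, is_member: bool) -> int:
    """The same feature after the refactoring: no DiscountPolicy involved."""
    return amount_cents * (100 - DISCOUNT_PERCENT[is_member]) // 100


def test_unit_discount_policy() -> None:
    # Tied to DiscountPolicy's contract: it becomes useless
    # as soon as the class is removed or its signature changes.
    assert DiscountPolicy().apply(1000, True) == 900


def acceptance_members_get_ten_percent_off(checkout) -> None:
    # Tied to the feature only: holds for any implementation of checkout.
    assert checkout(1000, True) == 900
    assert checkout(1000, False) == 1000


# The acceptance test survives the refactoring unchanged.
acceptance_members_get_ten_percent_off(checkout_v1)
acceptance_members_get_ten_percent_off(checkout_v2)
```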

Acceptance Tests and Unit Tests

TDD is not only applicable to unit tests; it should be applied to acceptance tests as well. ATDD (Acceptance Test-Driven Development) uses acceptance tests to drive the design. These can be combined with unit tests using outside-in TDD, also called double loop TDD. Acceptance tests are more stable and should be relied on when building a feature, while unit tests can be used for class design and should be removed when a refactoring renders them useless.

So my test pyramid would look more like this.

[Figure: the revised test pyramid]

Reversing the bottom of the pyramid is inspired by Thomas Pierrain's work on Rid Me Of Those Testing Pyramids.

Hexagonal architecture

Hexagonal architecture is a pattern in which the domain code is isolated from infrastructure code. The upside is that the domain code is simple and responds solely to business needs.

Acceptance tests fit perfectly in such an architecture because they correspond to domain requirements. They are less likely to change during design than unit tests, and they can be trusted for validating domain requirements. A unit test only validates a class, not a domain function.

Using double loop TDD, one can rely on an acceptance test, write unit tests to help during design, and not be reluctant to dispose of them when the design needs a refactoring.
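As a rough sketch of how this can look in code, the example below uses a hypothetical money transfer use case. The port (`AccountRepository`), the in-memory adapter and the use case are all assumptions made for illustration. The acceptance test talks to the domain through its port only, so it stays valid however the classes inside the domain are reshaped.

```python
from typing import Protocol


class AccountRepository(Protocol):
    """Port: what the domain needs from the infrastructure."""

    def balance_of(self, account_id: str) -> int: ...

    def save_balance(self, account_id: str, balance: int) -> None: ...


class InMemoryAccountRepository:
    """Test adapter standing in for a real database adapter."""

    def __init__(self, balances: dict[str, int]) -> None:
        self._balances = dict(balances)

    def balance_of(self, account_id: str) -> int:
        return self._balances[account_id]

    def save_balance(self, account_id: str, balance: int) -> None:
        self._balances[account_id] = balance


class TransferMoney:
    """Domain use case: depends only on the port, never on infrastructure."""

    def __init__(self, accounts: AccountRepository) -> None:
        self._accounts = accounts

    def execute(self, source: str, target: str, amount: int) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        if self._accounts.balance_of(source) < amount:
            raise ValueError("insufficient funds")
        self._accounts.save_balance(source, self._accounts.balance_of(source) - amount)
        self._accounts.save_balance(target, self._accounts.balance_of(target) + amount)


def test_acceptance_transfer_moves_money_between_accounts() -> None:
    # Outer loop of double loop TDD: a domain requirement, stated once.
    accounts = InMemoryAccountRepository({"alice": 100, "bob": 20})
    TransferMoney(accounts).execute("alice", "bob", 30)
    assert accounts.balance_of("alice") == 70
    assert accounts.balance_of("bob") == 50
```

Any unit tests written in the inner loop, for example around a helper class extracted from `TransferMoney`, can be deleted when the design changes, because the acceptance test keeps guarding the requirement.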

Conclusion TL;DR

The test pyramid remains valid for higher level tests, because of their maintenance cost and execution time. This doesn't mean that unit tests should rule the pyramid. Acceptance tests should be predominant, and unit tests should help with design and coverage. Unit tests should lean on acceptance tests as the safety net, which means they can be more disposable.