11 Jan Automated tests and the confidence they breed
“Danger breeds best on too much confidence.” – Pierre Corneille, Le Cid
A view about testing that I’ve often encountered concerns the confidence that certain kinds of tests provide. One of my teammates recently referred to the test pyramid as a confidence pyramid, the implication being that (as has been suggested elsewhere) tests instill more confidence as you go towards the top of the pyramid, because those are the end-to-end tests. (Note: I use the term “end-to-end” here to mean the same, in principle, as “UI” in the original test automation pyramid.) While that’s largely true, I think it’s true only from a narrow point of view. In my view, end-to-end tests generally provide confidence of only one type: that the system is “potentially shippable” (and even that only partially; see the section about comprehensiveness below).
Unit and service tests fail utterly at providing that kind of confidence. A unit test might tell you you’ve inched towards shippability, while a service test might tell you you’ve taken a stride towards it. However, nothing tells you that you’ve arrived like an end-to-end test does.
What about other kinds of confidence, though? For instance:
Confidence in the design of the system
As J. B. Rainsberger has beautifully put it, smaller tests put positive pressure on your design. The bigger the tests, the larger the number of designs that can pass those tests. So, bigger (end-to-end) tests can’t provide feedback about the design nearly as well as smaller (unit) tests.
Confidence in being able to find and fix issues efficiently
A bigger test, by definition, covers more of your code than a smaller one. When a big test fails, the area in which the source of the failure (the bug) could be hiding is larger than the area covered by a smaller test. So, if your system is a sports field, with a bug lurking in it after sunset, end-to-end tests are gonna (not so) helpfully shine a floodlight on the whole field, service tests will shine a spotlight on the patch of the field where the bug is, while unit tests will beam a flashlight on the spot around the bug. No prizes for guessing which kind of test will make it easier to hunt down the bug, a crucial component of the true value of tests.
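To make the floodlight-versus-flashlight contrast concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: a tiny checkout pipeline made of three functions, one of which (`apply_discount`) has a deliberately planted bug. The same bug is then checked once end to end and once per function.

```python
def parse_quantity(text: str) -> int:
    """Turn user input such as ' 3 ' into an integer quantity."""
    return int(text.strip())


def line_total(unit_price_cents: int, quantity: int) -> int:
    """Price of one order line, in cents."""
    return unit_price_cents * quantity


def apply_discount(total_cents: int, percent: int) -> int:
    """Apply a percentage discount.

    Deliberate bug for illustration: the discount is added, not subtracted.
    """
    return total_cents + total_cents * percent // 100


def checkout(quantity_text: str, unit_price_cents: int, discount_percent: int) -> int:
    """The 'whole field': every function above, wired together."""
    quantity = parse_quantity(quantity_text)
    return apply_discount(line_total(unit_price_cents, quantity), discount_percent)


def failing_checks() -> list[str]:
    """Run one end-to-end check and three unit checks; return the names that fail."""
    checks = {
        "end_to_end: checkout": checkout(" 3 ", 1000, 10) == 2700,
        "unit: parse_quantity": parse_quantity(" 3 ") == 3,
        "unit: line_total": line_total(1000, 3) == 3000,
        "unit: apply_discount": apply_discount(3000, 10) == 2700,
    }
    return [name for name, passed in checks.items() if not passed]


if __name__ == "__main__":
    for name in failing_checks():
        print("FAILED", name)
```

Both the end-to-end check and the `apply_discount` unit check fail, but only the second one names the function to look at. The end-to-end failure says merely that something between the input and the final total is wrong.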
Confidence in the reliability of the tests
Tests that purport to check the behavior of a system, but fail for reasons that have nothing to do with that behavior (e.g. problems with the network, a database, a third-party service, etc.) create unnecessary noise at best. More generally, they lower confidence in the system and/or the tests, and lead to the adoption of various questionable practices like rerunning, ignoring or disabling flaky tests. The solution, as LinkedIn discovered from experience, is eradicating non-determinism in tests, by testing smaller, more isolated parts of the system.
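One common way to take a flaky dependency out of a test is to pass it in, so that the test can substitute a fixed stand-in. The sketch below illustrates that general idea only and is not a description of what LinkedIn did. The names `convert_cents`, `fetch_rate` and `fixed_rate` are made up for the example, and the "third-party service" is imagined to be an exchange-rate lookup.

```python
from typing import Callable

# A rate source takes (source currency, target currency) and returns a multiplier.
RateSource = Callable[[str, str], float]


def convert_cents(amount_cents: int, src: str, dst: str, fetch_rate: RateSource) -> int:
    """Convert an amount using whatever rate source the caller supplies.

    In production the caller would pass a function that calls the real
    (hypothetical) rate service; the logic here never touches the network.
    """
    return round(amount_cents * fetch_rate(src, dst))


def fixed_rate(src: str, dst: str) -> float:
    """Test stand-in: always returns the same rate, so the test cannot flake."""
    return 0.5


def test_convert_cents_uses_supplied_rate() -> None:
    # Passes or fails only because of convert_cents itself, never because
    # the network, a database or a third-party service misbehaved.
    assert convert_cents(1000, "USD", "EUR", fixed_rate) == 500


if __name__ == "__main__":
    test_convert_cents_uses_supplied_rate()
    print("ok")
```

The trade-off is that this test says nothing about whether the real rate service is reachable or returns sensible data; that remains a job for a small number of larger tests.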
Confidence in the comprehensiveness of the tests
I said above that end-to-end tests give you confidence that a system is potentially shippable. That’s a simplification. The whole, and somewhat embarrassing, truth is that their stamp of approval is only as good as the number of scenarios they check. Now, that’s not any different from any other kind of test, except that you can’t really hope to have enough end-to-end tests to check very many of those scenarios. See The Forgotten Layer of the Test Automation Pyramid for a description of the wastefulness, and Integrated Tests Are a Scam for a computation of the futility.
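A back-of-the-envelope version of that kind of arithmetic, with made-up numbers, can be sketched as follows. The assumption (mine, for illustration, and not necessarily the exact computation in the article cited above) is that a request passes through several components in sequence, each with a handful of distinct code paths. Covering every combination end to end multiplies those counts, while covering each component in isolation only adds them.

```python
from math import prod


def end_to_end_scenarios(paths_per_component: list[int]) -> int:
    """Scenarios needed to exercise every combination of paths through all components."""
    return prod(paths_per_component)


def isolated_scenarios(paths_per_component: list[int]) -> int:
    """Scenarios needed to exercise every path of each component on its own."""
    return sum(paths_per_component)


if __name__ == "__main__":
    # Hypothetical system: three components with 4, 5 and 3 code paths.
    paths = [4, 5, 3]
    print(end_to_end_scenarios(paths))  # 4 * 5 * 3 = 60
    print(isolated_scenarios(paths))    # 4 + 5 + 3 = 12
```

With only three small components the gap is already 60 versus 12, and every added component multiplies the first number while merely adding to the second.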
Confidence in the feedback
a) More specific and direct feedback is more useful than less specific or indirect feedback. So, tests that simply tell you about the presence of a bug while not helping you find/fix it are like a parent who simply yells at a child for being a “bad boy”, but won’t tell them about a specific bad behavior or action, nor about what they should change and how. (System under test: “But what did I do?” End-to-end test: “Oh, what did you do? You were a bad boy – that’s what you did!”)
b) Quick feedback is more useful than delayed feedback. So, tests that take several minutes or hours to run, rather than a few seconds or minutes, are like a parent who admonishes a child for a misdemeanor weeks after it first occurred. By then, the child can barely recollect the situation, let alone reconstruct it and analyze or correct their behavior. Worse, they may’ve repeated similar behavior multiple times between then and now, without even realizing they were doing something wrong. If one of those instances has landed them in trouble, the child’s gonna have a hard time figuring out which one and trying to make amends.
So, for all of these other types of confidence, I think the confidence goes down as you go up the pyramid. The way out? Well, it’s neither “more end-to-end tests” nor “no more end-to-end tests”, but “just enough end-to-end tests (and no more)”.