Delete tests

When (and What) to Delete

  • Many see value in deleting bad tests: obsolete ones, pure implementation-detail tests (“called the API N times”), duplicated coverage, and tests that only enforce code structure.
  • Several argue that flaky tests nobody has time to fix effectively lie about whether the code works, and may be worse than no test at all, especially when they block CI or are routinely ignored.
  • Others counter that deletion should be a last resort: flaky or noisy tests usually signal real issues (races, nondeterminism, fragile APIs) and should be fixed or refactored, not removed.

Flaky, Slow, and Brittle Suites

  • Flakiness often stems from sleeps, timing assumptions, shared state, poor synchronization, or bad fixtures. These accumulate until suites become both slow and unreliable.
  • Suggested mitigations:
    • Replace sleeps with “wait until condition or timeout.”
    • Separate flaky tests into their own suite; run them nightly and fix/retire them gradually.
    • Categorize tests (fast/slow; unit/integration/E2E) and run subsets on PRs, full suites on main.
  • Several stories describe huge, partially broken suites where engineers stop trusting results; strategy there is to promote a small “evergreen” subset and prune or repair the rest.
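The “wait until condition or timeout” mitigation above can be sketched as a small polling helper. This is a minimal illustration, not code from the thread; the `wait_until` name and its parameters are my own choices.

```python
import time

def wait_until(condition, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns truthy or `timeout` seconds elapse.

    Returns the final truthiness, so callers can `assert wait_until(...)`.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    # One last check so a condition that became true right at the
    # deadline is not reported as a failure.
    return bool(condition())
```

Instead of `time.sleep(2); assert queue.empty()`, a test writes `assert wait_until(queue.empty)`: it passes as soon as the condition holds and only pays the full timeout on genuine failure.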

Unit vs Integration vs E2E (Big Argument)

  • One camp: integration/E2E tests exercise real behavior and make most unit tests redundant; unit tests that over-specify internal details are brittle, mock-heavy, and miss real failures at component boundaries.
  • Opposing camp: unit tests are essential for fast TDD cycles, edge cases, algorithms, parsers, and complex pure functions; integration tests alone are too slow, incomplete, and harder to debug.
  • Several note that “unit/integration/system/functional” terminology is inconsistent across organizations; some prefer definitions based on size/cost (e.g., Google’s Small/Medium/Large).
  • Broad middle ground:
    • Use unit tests for stable, well-bounded logic and contracts.
    • Use integration tests for component interaction and database/protocol contracts.
    • Reserve E2E for “does the system actually work” checks and happy-path/regression coverage.
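The middle-ground split above might look like the following sketch, with a unit test pinning down a pure helper and an integration test exercising a real database contract. The `normalize_email` helper and the `users` schema are hypothetical, invented only for illustration.

```python
import sqlite3
import unittest

def normalize_email(raw: str) -> str:
    # Hypothetical pure helper: stable, well-bounded logic,
    # a natural target for a fast unit test.
    return raw.strip().lower()

class TestNormalizeEmail(unittest.TestCase):
    # Unit test: no I/O, runs in microseconds, covers edge cases.
    def test_strips_and_lowercases(self):
        self.assertEqual(normalize_email("  Bob@Example.COM "), "bob@example.com")

class TestUserStore(unittest.TestCase):
    # Integration test: exercises the actual database contract
    # (here an in-memory SQLite) rather than a mock of it.
    def setUp(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE users (email TEXT PRIMARY KEY)")

    def test_duplicate_emails_rejected(self):
        self.db.execute("INSERT INTO users VALUES (?)", ("bob@example.com",))
        with self.assertRaises(sqlite3.IntegrityError):
            self.db.execute("INSERT INTO users VALUES (?)", ("bob@example.com",))
```

The unit test would survive a rewrite of the storage layer; the integration test would survive a rewrite of the normalization logic. That independence is what keeps refactors from touching both at once.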

Test Maintenance, Coverage, and Tooling

  • Tests are code with real maintenance cost; suites that require touching 150 tests per refactor are seen as poorly designed and overly “white-box.”
  • Heavy reliance on coverage metrics can incentivize low‑value tests (e.g., massive mocking just to bump percentages).
  • Some advocate measuring “test maintenance overhead vs bugs caught,” and explicitly deleting tests that cause frequent false positives relative to their value.
  • A few teams report success using strong types, design-by-contract–style assertions, property-based testing, and AI tools (e.g., LLMs picking relevant E2E tests from a diff) to reduce test burden.
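To make the property-based-testing idea concrete, here is a hand-rolled sketch: generate many random inputs and assert invariants that must hold for all of them. Real suites would use a library such as Hypothesis; the `slugify` function and `check_property` runner here are invented for illustration.

```python
import random
import string

def slugify(text: str) -> str:
    # Hypothetical function under test: lowercase and join words with "-".
    return "-".join(text.lower().split())

def check_property(prop, cases=200, seed=0):
    """Minimal property runner: feed random strings to `prop`,
    which asserts an invariant and raises on violation."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + "  "  # letters plus spaces
    for _ in range(cases):
        s = "".join(rng.choice(alphabet) for _ in range(rng.randrange(0, 40)))
        prop(s)

# Invariants that must hold for *every* input, not just hand-picked examples:
check_property(lambda s: slugify(slugify(s)) == slugify(s))  # idempotent
check_property(lambda s: " " not in slugify(s))              # no spaces survive
```

One property like “slugify is idempotent” can replace a pile of near-duplicate example tests, which is exactly the maintenance-overhead reduction described above.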

Overall Reaction to the Article

  • Many readers read the piece’s framing as clickbait; the practical takeaway they agree with is “delete useless or harmful tests,” not “delete tests in general.”
  • Persistent theme: the right answer is usually to fix the tests or raise their abstraction level, and to treat test design as seriously as production code.