I've stopped using box plots (2021)
Debate over box plots’ usefulness
- Many agree box plots are easily misread and often hide important structure (gaps, multimodality, tight clusters).
- Several argue they’re a relic of paper-era “data compression by hand”; computers remove that constraint.
- Others strongly defend them as a compact way to show location and spread (median, quartiles, outliers), especially for comparing multiple groups.
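To make the "hidden structure" complaint concrete, here is a minimal Python sketch using only the standard library. The sample values and the helper names `five_number_summary` and `distance_to_nearest_point` are illustrative assumptions, not taken from the thread. It computes the five numbers a box plot draws for a two-cluster sample, then measures how far the median sits from any actual observation.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), the values a box plot encodes."""
    ordered = sorted(data)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

def distance_to_nearest_point(data, value):
    """Distance from `value` to the closest observation in `data`."""
    return min(abs(x - value) for x in data)

# Hypothetical sample: two tight clusters with nothing in between.
bimodal = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 8.5, 8.6, 8.7, 8.8, 8.9, 9.0]

summary = five_number_summary(bimodal)
median = summary[2]
gap = distance_to_nearest_point(bimodal, median)

print("five-number summary:", summary)
print("distance from median to nearest observation:", gap)
```

In this sample the median falls in the empty region between the two clusters, and the box spans that region even though it contains no observations. Nothing in the five numbers signals the gap.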
Audience understanding vs “education problem”
- A major theme: plots are communication tools; if many readers misinterpret box plots, they’re poor choices for most audiences.
- Some say this is just a training issue and reject the idea of dropping box plots merely because “people aren’t educated.”
- Counterpoint: some misperceptions (e.g., “longer shape = more data”) are cognitive, not easily fixed by explanation.
Alternatives: violin, strip, jitter, beeswarm, heatmaps
- Frequently suggested replacements: jittered strip plots, beeswarm plots, sina plots, violin plots, stacked or side-by-side histograms, ECDFs, and “distribution heatmaps.”
- Several favor “box + overlaid raw points” as a pragmatic compromise.
- Critics of violin plots note sensitivity to KDE bandwidth, oversmoothing, poor comparability between groups, and visual clutter; some find their shape aesthetically or socially awkward.
- Raincloud/half-violin and ridge plots are mentioned as hybrids.
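The bandwidth criticism of violin plots can be illustrated with a small hand-rolled Gaussian kernel density estimate. This is a sketch under stated assumptions: the data, the bandwidth values, and the helper names `gaussian_kde` and `count_modes` are hypothetical, and real plotting libraries choose bandwidths with their own rules. It counts how many peaks the estimated density has at each bandwidth.

```python
import math

def gaussian_kde(data, bandwidth, x):
    """Density at x: the average of Gaussian bumps centred on each point."""
    norm = len(data) * bandwidth * math.sqrt(2 * math.pi)
    total = sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
    return total / norm

def count_modes(data, bandwidth, grid_size=2000):
    """Count local maxima of the KDE on an evenly spaced grid."""
    lo = min(data) - 3 * bandwidth
    hi = max(data) + 3 * bandwidth
    xs = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    density = [gaussian_kde(data, bandwidth, x) for x in xs]
    # A grid point is a peak if it rises above its left neighbour and does
    # not fall below its right neighbour (so a flat top is counted once).
    return sum(
        1
        for i in range(1, grid_size - 1)
        if density[i] > density[i - 1] and density[i] >= density[i + 1]
    )

# Hypothetical sample: two clusters of four points each.
clusters = [1.0, 1.2, 1.4, 1.6, 4.0, 4.2, 4.4, 4.6]

for h in (0.05, 0.3, 3.0):
    print(f"bandwidth={h}: {count_modes(clusters, h)} mode(s)")
```

The same eight points look spiky, bimodal, or unimodal depending only on the bandwidth, which is the sensitivity critics point to: the outline of a violin reflects a smoothing choice as well as the data.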
Statistical assumptions and misunderstandings
- A long subthread argues over whether box plots “assume” Gaussian or unimodal data, or are fully nonparametric (just quartiles and whisker rules).
- There is confusion even among commenters about how quartiles and whiskers are defined, and about links to the central limit theorem.
- Some note that for multimodal or heavy‑tailed distributions, box plots can be actively misleading.
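Since the thread shows confusion about how quartiles and whiskers are defined, a short sketch of one common convention may help. It assumes Tukey's 1.5 × IQR rule, with whiskers drawn to the most extreme observations inside the fences; plotting libraries may use other quartile or whisker definitions. The function name `tukey_box_stats` and the sample are hypothetical. Quartiles come from the standard library's `statistics.quantiles`, which offers two interpolation methods.

```python
import statistics

def tukey_box_stats(data, method="inclusive", k=1.5):
    """Box plot statistics under Tukey's k * IQR whisker rule."""
    ordered = sorted(data)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method=method)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Whiskers end at the most extreme observations inside the fences;
    # anything beyond the fences is drawn as an individual outlier.
    inside = [x for x in ordered if lower_fence <= x <= upper_fence]
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }

# Hypothetical sample with one extreme value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]

inclusive_stats = tukey_box_stats(sample, method="inclusive")
exclusive_stats = tukey_box_stats(sample, method="exclusive")
print("inclusive:", inclusive_stats)
print("exclusive:", exclusive_stats)
```

Nothing here refers to a Gaussian or any other distribution, which supports the "nonparametric" side of the argument. The two quartile methods still give different box edges for the same data, so part of the disagreement may come from different definitions.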
Use-case-driven defenses of box plots
- Defenders cite cases where stakeholders explicitly care about specific percentiles (e.g., 15th/85th, 25th/75th) and want simple comparisons across many groups.
- Box plots are seen as working best when distributions are roughly unimodal, the audience is statistically trained, and the focus is on a small set of summary statistics rather than full shape.
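For the percentile-focused use case, the numbers can be computed directly. The sketch below assumes stakeholders want the 15th, 50th, and 85th percentiles per group; the group names, data, and the helper `percentile_table` are hypothetical, and the "inclusive" interpolation method is one choice among several.

```python
import statistics

def percentile_table(groups, percentiles=(15, 50, 85)):
    """Map each group name to its requested percentiles (integers 1-99)."""
    table = {}
    for name, values in groups.items():
        # n=100 yields 99 cut points; index p - 1 holds the p-th percentile.
        cuts = statistics.quantiles(values, n=100, method="inclusive")
        table[name] = {p: cuts[p - 1] for p in percentiles}
    return table

# Hypothetical groups, purely for illustration.
groups = {
    "group_a": list(range(1, 101)),
    "group_b": [2 * x for x in range(1, 101)],
}

for name, row in percentile_table(groups).items():
    print(name, row)
```

A table like this, or a box drawn from these exact percentiles, answers the stakeholder question without claiming to show the full shape of each distribution.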
Meta: visualization goals and human factors
- Recurrent point: the priority should be clearest insight for the intended audience, not loyalty to a traditional chart type.
- Some conclude that disagreement and confusion in the thread itself bolster the case for favoring simpler, more literal distribution plots in most situations.