I've stopped using box plots (2021)

Debate over box plots’ usefulness

  • Many agree box plots are easily misread and often hide important structure (gaps, multimodality, tight clusters).
  • Several argue they’re a relic of paper-era “data compression by hand”; computers remove that constraint.
  • Others strongly defend them as a compact way to show location and spread (median, quartiles, outliers), especially for comparing multiple groups.
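
A toy illustration of the "hidden structure" complaint, not taken from the thread: the two made-up samples below share an identical five-number summary, so a basic box plot would draw them the same way, yet one is evenly spread and the other sits in tight clusters separated by gaps. The sample values and the `five_number_summary` helper are assumptions for illustration. Quartiles come from Python's `statistics.quantiles` with the "inclusive" method, which is only one of several common quartile conventions.

```python
import statistics
from collections import Counter

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a basic box plot draws."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (data[0], q1, median, q3, data[-1])

# Hypothetical samples chosen so that their summaries coincide.
evenly_spread = [0, 2, 3, 4, 5, 6, 7, 8, 10]
clustered = [0, 3, 3, 3, 5, 7, 7, 7, 10]

print(five_number_summary(evenly_spread))
print(five_number_summary(clustered))

# Counting repeated values exposes the clusters that the summary hides.
print(Counter(evenly_spread).most_common(2))
print(Counter(clustered).most_common(2))
```

With only nine points per sample this is a sketch rather than a realistic dataset, but the same effect appears at any sample size.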

Audience understanding vs “education problem”

  • A major theme: plots are communication tools; if many readers misinterpret box plots, they’re poor choices for most audiences.
  • Some say this is just a training issue and object to abandoning box plots merely because “people aren’t educated.”
  • Counterpoint: some misperceptions (e.g., “longer shape = more data”) are cognitive, not easily fixed by explanation.

Alternatives: violin, strip, jitter, beeswarm, heatmaps

  • Frequently suggested replacements: jittered strip plots, beeswarm plots, sina plots, violin plots, stacked/side-by-side histograms, ECDFs, and “distribution heatmaps.”
  • Several favor “box + overlaid raw points” as a pragmatic compromise.
  • Critics of violin plots note sensitivity to KDE bandwidth, oversmoothing, poor comparability between groups, and visual clutter; some find them aesthetically or socially awkward.
  • Raincloud/half-violin and ridge plots are mentioned as hybrids.
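
The bandwidth criticism of violin plots can be sketched with a hand-rolled Gaussian kernel density estimate. This is a minimal illustration under stated assumptions, not how any particular plotting library works: the sample, the two bandwidth values, the evaluation grid, and the helper names `gaussian_kde` and `count_modes` are all hypothetical. The same two-cluster data shows two peaks with a narrow bandwidth and a single peak once the bandwidth is large enough to oversmooth.

```python
import math

def gaussian_kde(data, bandwidth, grid):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)
        for x in grid
    ]

def count_modes(density):
    """Count interior local maxima of a sampled density curve."""
    return sum(
        1
        for i in range(1, len(density) - 1)
        if density[i] > density[i - 1] and density[i] >= density[i + 1]
    )

# Hypothetical sample with two tight clusters, near 1 and near 5.
sample = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
grid = [-2.0 + 0.1 * i for i in range(101)]  # evaluation points from -2 to 8

for bandwidth in (0.5, 3.0):
    modes = count_modes(gaussian_kde(sample, bandwidth, grid))
    print(f"bandwidth={bandwidth}: {modes} mode(s)")
```

Real libraries usually pick the bandwidth automatically with a rule of thumb, so the reader of a violin plot often cannot tell which of these two pictures they are looking at.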

Statistical assumptions and misunderstandings

  • A long subthread debates whether box plots “assume” Gaussian or unimodal data, or are fully nonparametric (just quartiles and whisker rules).
  • There is confusion even among commenters about how quartiles and whiskers are defined, and about links to the central limit theorem.
  • Some note that for multimodal or heavy‑tailed distributions, box plots can be actively misleading.
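
Since the thread shows confusion about how quartiles and whiskers are defined, here is a minimal sketch of one common convention, the Tukey-style rule in which whiskers extend to the most extreme data points within 1.5 times the interquartile range of the box. The helper name `box_plot_stats`, the sample values, and the choice of the "inclusive" quartile method are assumptions for illustration; other tools may use different quartile or whisker rules. Nothing in the calculation relies on a Gaussian assumption; it uses only order statistics.

```python
import statistics

def box_plot_stats(values, whisker_factor=1.5):
    """Compute box plot elements under a Tukey-style whisker rule."""
    data = sorted(values)
    # Quartile conventions differ between tools; "inclusive" is one option.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker_factor * iqr
    upper_fence = q3 + whisker_factor * iqr
    # Whiskers end at actual data points inside the fences, not at the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }

# Hypothetical sample with one extreme value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
stats = box_plot_stats(sample)
print(stats)
```

For this sample the quartiles are 3.25, 5.5, and 7.75, the whiskers end at 1 and 9, and 30 is drawn as an outlier.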

Use-case-driven defenses of box plots

  • Defenders cite cases where stakeholders explicitly care about specific percentiles (e.g., 15th/85th, 25th/75th) and want simple comparisons across many groups.
  • Box plots are seen as working best when distributions are roughly unimodal, the audience is statistically trained, and the focus is on a small set of summary statistics rather than full shape.

Meta: visualization goals and human factors

  • Recurrent point: the priority should be clearest insight for the intended audience, not loyalty to a traditional chart type.
  • Some conclude that disagreement and confusion in the thread itself bolster the case for favoring simpler, more literal distribution plots in most situations.