One in five genetics papers contains errors thanks to Excel (2016)

Scope of the Problem

  • Gene names like SEPT2 and MARCH1 being auto-converted to dates or numbers in Excel has been known since at least 2004.
  • Studies and follow-up work show these errors are widespread in genetics papers and other domains (e.g., CUSIPs, ZIP codes, business names like “7/11”).
  • Commenters note similar issues with locale-dependent number formats, scientific notation, and CSV imports that silently mangle data.

Is Excel or the User at Fault?

  • One camp says this is fundamentally user error: Excel is a general-purpose spreadsheet, not a genetics tool, and users must understand its quirks, set column types, or use better-suited tools (R, Python, SPSS, SAS, databases).
  • Another camp argues that blaming users is counterproductive: Excel’s defaults and auto-conversions violate “principle of least surprise,” make silent, hard-to-detect changes, and encourage mistakes even among careful users.
  • There’s debate over responsibility: some emphasize UX designers and vendors should remove “footguns,” others stress user training and technical competence.

Design, Defaults, and Workarounds

  • Excel’s “General” type and auto-detection for dates/scientific notation are seen as major problems; users want a global “treat everything as text” or “trust my input” mode.
  • Newer Excel versions reportedly add options to limit auto-conversions, but availability and behavior across versions is unclear.
  • Common workarounds: pre-set all cells/columns to text, use the data import wizard with explicit types, never save when just inspecting CSVs, or avoid editing in Excel altogether.

Broader Reflections on Spreadsheets

  • Spreadsheets are praised as uniquely powerful, accessible “bicycles for the mind” and often better than oversimplified SaaS tools.
  • Others argue they’re misused as databases and analysis pipelines; errors scale badly, and there’s little support for testing or reproducibility.
  • Some note institutional lock-in: journals requiring XLS, coworkers standardizing on Excel, and lack of good, user-friendly database alternatives.
  • There are calls for more constrained or code-backed tabular tools, and for bringing modern software practices (testing, type systems) into spreadsheet environments.

Related Technical Quirks

  • A long subthread discusses operator precedence (e.g., -3^2 = 9 vs -9) across spreadsheets and languages, illustrating how legacy design choices persist and confuse users.