One in five genetics papers contains errors thanks to Excel (2016)
Scope of the Problem
- Gene names like SEPT2 and MARCH1 being auto-converted to dates or numbers in Excel has been known since at least 2004.
- Studies and follow-up work show these errors are widespread in genetics papers and other domains (e.g., CUSIPs, ZIP codes, business names like “7/11”).
- Commenters note similar issues with locale-dependent number formats, scientific notation, and CSV imports that silently mangle data.
Is Excel or the User at Fault?
- One camp says this is fundamentally user error: Excel is a general-purpose spreadsheet, not a genetics tool, and users must understand its quirks, set column types, or use better-suited tools (R, Python, SPSS, SAS, databases).
- Another camp argues that blaming users is counterproductive: Excel’s defaults and auto-conversions violate “principle of least surprise,” make silent, hard-to-detect changes, and encourage mistakes even among careful users.
- There’s debate over responsibility: some emphasize UX designers and vendors should remove “footguns,” others stress user training and technical competence.
Design, Defaults, and Workarounds
- Excel’s “General” type and auto-detection for dates/scientific notation are seen as major problems; users want a global “treat everything as text” or “trust my input” mode.
- Newer Excel versions reportedly add options to limit auto-conversions, but availability and behavior across versions is unclear.
- Common workarounds: pre-set all cells/columns to text, use the data import wizard with explicit types, never save when just inspecting CSVs, or avoid editing in Excel altogether.
Broader Reflections on Spreadsheets
- Spreadsheets are praised as uniquely powerful, accessible “bicycles for the mind” and often better than oversimplified SaaS tools.
- Others argue they’re misused as databases and analysis pipelines; errors scale badly, and there’s little support for testing or reproducibility.
- Some note institutional lock-in: journals requiring XLS, coworkers standardizing on Excel, and lack of good, user-friendly database alternatives.
- There are calls for more constrained or code-backed tabular tools, and for bringing modern software practices (testing, type systems) into spreadsheet environments.
Related Technical Quirks
- A long subthread discusses operator precedence (e.g.,
-3^2= 9 vs -9) across spreadsheets and languages, illustrating how legacy design choices persist and confuse users.