An entire Herculaneum scroll has been read for the first time
Technical approach & ML role
- Scrolls are CT-scanned at a powerful synchrotron beamline; raw data can reach ~260 TB for a large scroll.
- Workflow: segment and virtually unwrap layers → render surfaces → detect ink using ML and physically based rendering.
- Ink is usually carbon-based, so it shows almost no X-ray contrast against the carbonized papyrus; ML models work on small 3D CT chunks and detect the subtle textural differences that ink leaves behind.
- Team stresses that training data quality (manual annotations of papyrus surfaces and ink) matters more than model choice.
- ML can “hallucinate” at the stroke level (slightly extended lines, filled gaps), but it does not produce whole passages of grammatical Greek or Latin; philologists remain essential.
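
As a rough illustration of the "small 3D CT chunks" idea above, here is a minimal Python/NumPy sketch. It is not the team's pipeline: the function names, the 16-voxel chunk size, and the three hand-made texture statistics are all assumptions made for illustration, and the ML models described above would learn their own features from annotated data rather than use fixed statistics like these.

```python
import numpy as np


def extract_chunk(volume, center, size):
    """Return a cubic sub-volume of edge `size` around `center` (z, y, x).

    Returns None when the cube would cross the volume boundary.
    (Hypothetical helper, not part of any real scroll-reading toolkit.)
    """
    half = size // 2
    starts = [c - half for c in center]
    stops = [s + size for s in starts]
    out_of_bounds = any(s < 0 for s in starts) or any(
        e > dim for e, dim in zip(stops, volume.shape)
    )
    if out_of_bounds:
        return None
    z, y, x = starts
    return volume[z:z + size, y:y + size, x:x + size]


def texture_features(chunk):
    """Toy texture descriptors: mean, standard deviation, mean gradient magnitude."""
    chunk = chunk.astype(np.float32)
    gz, gy, gx = np.gradient(chunk)  # finite differences along each axis
    grad_mag = np.sqrt(gz ** 2 + gy ** 2 + gx ** 2)
    return np.array([chunk.mean(), chunk.std(), grad_mag.mean()], dtype=np.float32)


def features_along_surface(volume, surface_points, size=16):
    """Compute features for every surface point whose chunk fits in the volume."""
    kept, feats = [], []
    for point in surface_points:
        chunk = extract_chunk(volume, point, size)
        if chunk is None:
            continue  # skip points too close to the edge of the scan
        kept.append(point)
        feats.append(texture_features(chunk))
    return kept, np.asarray(feats, dtype=np.float32).reshape(-1, 3)


# Usage on synthetic data standing in for a CT volume.
rng = np.random.default_rng(0)
demo_volume = rng.random((32, 32, 32), dtype=np.float32)
demo_points = [(8, 8, 8), (16, 16, 16), (0, 0, 0)]  # the last one is out of bounds
demo_kept, demo_feats = features_along_surface(demo_volume, demo_points)
print(demo_kept, demo_feats.shape)
```

In a learned setup, each chunk would be paired with a manual ink/no-ink label and fed to a model, which is why the annotation quality mentioned above weighs so heavily on the result.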
Scope, progress & limitations
- About 30 scrolls have been scanned so far; the collection is estimated at ~350 mostly intact scrolls and ~1000 damaged ones, plus many fragments; the vast majority remain in Italy.
- Scanning a big scroll takes days; human refinement and annotation take months even with automation.
- Funding and beamtime costs are major bottlenecks; current work is paid for by donations and private funding, not government grants.
- Estimates of how long reading the whole collection will take are described as highly uncertain and dependent on the individual scroll.
What was actually “read”
- The newly announced scroll (PHerc. 1667) is the compact inner core of a roll badly damaged by earlier physical opening attempts.
- Some commenters argue “entire scroll” is misleading since outer layers are gone; others clarify the team means “the entire surviving core.”
- The text is a philosophical treatise on ethics, probably Stoic (2nd c. BC), discussing human nature, impulse, moral progress, and naming Aristocreon.
- Translation in the paper is partial and fragmentary; several columns are heavily damaged with only scattered phrases.
Potential impact & expected content
- Many hope for lost works (e.g., early Greek philosophers, historians, technical treatises) or alternative political perspectives suppressed by later transmission.
- Others think the main effect will be filling in details rather than radically overturning our picture of antiquity.
- Early expectations that the library is “just Epicurean” are questioned; only one of three newly read texts appears clearly Epicurean so far.
Broader debates & reactions
- Strong enthusiasm: seen as one of the clearest examples of ML/AI tangibly advancing scholarship, with overlap with medical imaging techniques.
- Some skepticism about hype, title accuracy, and whether the focus on Herculaneum crowds out attention to other large corpora (e.g., Mesopotamian tablets).
- Extended side-discussion on translation philosophy, bias, and why bilingual editions matter; consensus that all translation involves interpretation.
- Archaeology’s destructiveness and whether to delay excavation for future tech are debated; current practice often leaves portions unexcavated intentionally.