Elsevier embeds a hash in the PDF metadata that is unique for each download (2022)
Technical tracking and watermarking methods
- Elsevier embeds a unique hash in PDF metadata for each download; some suggest it may be a MAC tied to the buyer, used to identify leaked copies.
- Commenters note metadata is “low-hanging fruit”; more sophisticated schemes can also alter spacing, kerning, fonts, colors, invisible characters, or diagrams to uniquely tag each copy.
- Several note that with multiple differently watermarked copies, one could compare and “re-anonymize,” though this is non-trivial.
- Some see the chosen hash-in-metadata approach as crude compared to steganography; others assume Elsevier likely uses multiple techniques.
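The comparison idea mentioned above can be illustrated with a minimal sketch. It assumes two copies of the same document whose bytes line up position by position, which will not hold if a per-copy tag changes length or sits inside a compressed stream. The function name `differing_ranges` is hypothetical and is not taken from the discussion.

```python
def differing_ranges(a: bytes, b: bytes) -> list[tuple[int, int]]:
    """Return half-open (start, end) byte ranges where the two inputs differ.

    This is a naive positional comparison: it only yields meaningful results
    when both copies are byte-aligned. Any length difference is reported as
    one trailing range.
    """
    ranges: list[tuple[int, int]] = []
    start = None
    common = min(len(a), len(b))

    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # a new run of differing bytes begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        ranges.append((start, common))

    longest = max(len(a), len(b))
    if longest > common:
        # Treat the extra tail of the longer input as differing bytes,
        # merging it with a run that reaches the end of the common prefix.
        if ranges and ranges[-1][1] == common:
            ranges[-1] = (ranges[-1][0], longest)
        else:
            ranges.append((common, longest))
    return ranges


if __name__ == "__main__":
    # Toy stand-ins for two downloads; real PDFs would be read with open(path, "rb").
    copy_a = b"%PDF /ID <aaaa> body"
    copy_b = b"%PDF /ID <bbbb> body"
    print(differing_ranges(copy_a, copy_b))
```

Finding where copies differ is only the first step. As the commenters point out, producing a copy that carries no identifying signal is considerably harder, especially if several watermarking techniques are layered.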
Proposed countermeasures and tools
- Common suggestions:
  - “Print to PDF,” or print–scan–OCR pipelines to strip metadata and hidden structure.
  - Convert pages to images, rebuild a PDF, then run OCR.
  - Use Ghostscript / ps2pdf pipelines, or custom scripts to re-render and compress PDFs.
  - Use sandboxing/sanitizing tools (QubesOS PDF sanitizer, Dangerzone, docleaner, pdfparanoia, pdf-redact-tools).
- Skeptics warn that:
  - Print–scan destroys accessibility/tagging and may still preserve some steganographic signals.
  - Zone.Identifier alternate data streams (on Windows/NTFS) and other OS-level metadata can also leak source info.
  - A general “one-click anonymizer” is hard because watermark schemes vary and can be deeply embedded.
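As a rough illustration of the Ghostscript re-rendering suggestion, the sketch below builds a command line for the `pdfwrite` device and runs it. The helper names `ghostscript_rerender_cmd` and `rerender` are hypothetical, and the sketch assumes a `gs` binary is on the PATH. Re-rendering this way is not guaranteed to remove identifying data: `pdfwrite` may carry document information over to the output, and watermarks embedded in the page content would survive.

```python
import subprocess
from typing import Optional


def ghostscript_rerender_cmd(
    src: str,
    dst: str,
    gs_binary: str = "gs",
    pdf_settings: Optional[str] = None,
) -> list[str]:
    """Build a Ghostscript command that re-renders src into dst via pdfwrite."""
    cmd = [
        gs_binary,
        "-sDEVICE=pdfwrite",  # write a new PDF rather than rasterizing
        "-dNOPAUSE",          # do not pause between pages
        "-dBATCH",            # exit after processing the input
        "-dQUIET",            # suppress routine startup messages
    ]
    if pdf_settings is not None:
        # Optional compression preset such as "/screen" or "/ebook".
        cmd.append(f"-dPDFSETTINGS={pdf_settings}")
    cmd.append(f"-sOutputFile={dst}")
    cmd.append(src)
    return cmd


def rerender(src: str, dst: str, pdf_settings: Optional[str] = None) -> None:
    """Run Ghostscript; raises CalledProcessError if gs exits with an error."""
    subprocess.run(
        ghostscript_rerender_cmd(src, dst, pdf_settings=pdf_settings),
        check=True,
    )
```

Calling `rerender("in.pdf", "out.pdf", pdf_settings="/ebook")` would re-render and compress the file. Checking the output for leftover metadata afterwards remains advisable, in line with the skeptics' warnings above.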
Access, piracy, and Sci-Hub
- Many call Elsevier and similar publishers “parasites” or rent-seekers, profiting from publicly funded research and unpaid peer review while imposing high paywalls and open-access fees.
- Sci-Hub and Library Genesis are widely praised as essential for democratizing access, even by users with institutional subscriptions, who cite their ease of use.
- Concern is raised that Sci-Hub is under legal pressure and no longer scraping recent papers, cutting off newer research.
Systemic incentives and responsibility
- Some place the blame on academia and governments: hiring, tenure, and funding systems heavily reward publication in prestigious, often paywalled journals, sustaining companies like Elsevier.
- Others argue Elsevier’s behavior is “fair business” within the existing copyright system; if researchers dislike the terms, they could use alternatives.
- Counterargument: individual researchers face career pressure and inertia; prestige metrics and peer-review gatekeeping make opting out costly.
- Proposed actions: refuse unpaid peer review for for‑profit paywalled journals, support open-access and arXiv-style models, and develop community-run alternatives.
Ethics, legality, and broader comparisons
- Some view watermarking as preferable to DRM; others see both as unjustified control over knowledge.
- A question is raised about EU legality; one reply suggests Elsevier’s jurisdiction likely shields it, but details remain unclear.
- Comparisons are drawn to extreme DRM schemes for standards (ISO/ANSI) and to past cases where media stores embedded sensitive buyer data into files.