Elsevier embeds a hash in the PDF metadata that is unique for each download (2022)

Technical tracking and watermarking methods

  • Elsevier embeds a unique hash in PDF metadata for each download; some suggest it may be a message authentication code (MAC) tied to the buyer, used to identify leaked copies.
  • Commenters note metadata is “low-hanging fruit”; more sophisticated schemes can also alter spacing, kerning, fonts, colors, invisible characters, or diagrams to uniquely tag each copy.
  • Several note that with multiple differently watermarked copies, one could compare and “re-anonymize,” though this is non-trivial.
  • Some see the chosen hash-in-metadata approach as crude compared to steganography; others assume Elsevier likely uses multiple techniques.
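
To make the two ideas above concrete (a per-download tag tied to the buyer, and comparing differently tagged copies), here is a minimal Python sketch. It is not Elsevier's scheme, which the thread does not document: the HMAC construction, the `make_download_tag` and `differing_ranges` helpers, the `/Tag` field, and the buyer and document identifiers are all assumptions made up for illustration.

```python
import hashlib
import hmac


def make_download_tag(secret: bytes, buyer_id: str, doc_id: str, nonce: str) -> str:
    """Hypothetical per-download tag: HMAC-SHA256 over buyer, document, and nonce."""
    message = "|".join([buyer_id, doc_id, nonce]).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def differing_ranges(a: bytes, b: bytes) -> list[tuple[int, int]]:
    """Return half-open (start, end) byte ranges where two equal-length copies differ."""
    if len(a) != len(b):
        raise ValueError("this naive comparison assumes copies of equal length")
    ranges = []
    start = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y and start is None:
            start = i                      # a differing run begins here
        elif x == y and start is not None:
            ranges.append((start, i))      # the run ended just before i
            start = None
    if start is not None:
        ranges.append((start, len(a)))     # run extends to the end of the data
    return ranges


# Two simulated downloads of the same document by different (made-up) buyers.
secret = b"publisher-side secret (hypothetical)"
template = b"%PDF-1.7 ... /Title (Paper) /Tag ({tag}) ... %%EOF"  # not a real PDF
tag_a = make_download_tag(secret, "buyer-a", "doc-1", "n1")
tag_b = make_download_tag(secret, "buyer-b", "doc-1", "n2")
copy_a = template.replace(b"{tag}", tag_a.encode("ascii"))
copy_b = template.replace(b"{tag}", tag_b.encode("ascii"))

# Every differing range falls inside the tag field, which is what gives it away.
for start, end in differing_ranges(copy_a, copy_b):
    print(start, end, copy_a[start:end], copy_b[start:end])
```

A byte-level diff like this only exposes tags stored as plain metadata. Watermarks hidden in compressed streams, glyph spacing, or rendered images would need comparison at the rendering level, which is the "non-trivial" part commenters mention.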

Proposed countermeasures and tools

  • Common suggestions:
    • “Print to PDF,” or print–scan–OCR pipelines to strip metadata and hidden structure.
    • Convert pages to images, rebuild a PDF, then run OCR.
    • Use Ghostscript / ps2pdf pipelines, or custom scripts to re-render and compress PDFs.
    • Use sandboxing/sanitizing tools (QubesOS PDF sanitizer, Dangerzone, docleaner, pdfparanoia, pdf-redact-tools).
  • Skeptics warn that:
    • Print–scan destroys accessibility/tagging and may still preserve some steganographic signals.
    • Zone.Identifier streams (the alternate data streams Windows attaches to downloaded files) and other OS-level metadata can also leak source info.
    • A general “one-click anonymizer” is hard because watermark schemes vary and can be deeply embedded.
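
As a rough illustration of the Ghostscript re-render suggestion, the Python sketch below builds a `pdfwrite` command and adds a crude check for common metadata markers. The helper names (`build_gs_command`, `rerender`, `find_metadata_markers`) and the marker list are assumptions made for this example. It also assumes a `gs` executable on the PATH (on Windows it is typically named `gswin64c`).

```python
import subprocess


def build_gs_command(src: str, dst: str, preset: str = "/ebook") -> list[str]:
    """Build a Ghostscript command that re-renders src into a new PDF at dst."""
    return [
        "gs",
        "-dSAFER", "-dBATCH", "-dNOPAUSE", "-dQUIET",
        "-sDEVICE=pdfwrite",            # write a fresh PDF rather than copy the file
        f"-dPDFSETTINGS={preset}",      # compression/quality preset
        f"-sOutputFile={dst}",
        src,
    ]


def rerender(src: str, dst: str) -> None:
    """Run Ghostscript; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_gs_command(src, dst), check=True)


# Byte patterns that often accompany document metadata in an uncompressed PDF.
METADATA_MARKERS = (
    b"/Author", b"/Creator", b"/Producer", b"/Keywords", b"/Metadata", b"<x:xmpmeta",
)


def find_metadata_markers(data: bytes) -> list[str]:
    """Return the markers that appear verbatim in the raw file bytes."""
    return [m.decode("ascii") for m in METADATA_MARKERS if m in data]
```

One might call `rerender("in.pdf", "out.pdf")` and then run `find_metadata_markers` on the bytes of both files to see what changed. Two caveats match the skeptics' points: a re-render can carry some document information over to the output, and a raw byte scan misses anything stored in compressed object streams or encoded visually on the page, so an empty result is not proof that a copy is clean.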

Access, piracy, and Sci-Hub

  • Many call Elsevier and similar publishers “parasites” or rent-seekers, profiting from publicly funded research and unpaid peer review while imposing high paywalls and open-access fees.
  • Sci-Hub and Library Genesis are widely praised as essential to democratizing access, even by users with institutional subscriptions, who cite ease of use.
  • Concern is raised that Sci-Hub is under legal pressure and no longer scraping recent papers, cutting off newer research.

Systemic incentives and responsibility

  • Some place the blame on academia and governments: hiring, tenure, and funding systems heavily reward publication in prestigious, often paywalled journals, sustaining companies like Elsevier.
  • Others argue Elsevier’s behavior is “fair business” within the existing copyright system; if researchers dislike the terms, they could use alternatives.
  • Counterargument: individual researchers face career pressure and inertia; prestige metrics and peer-review gatekeeping make opting out costly.
  • Proposed actions: refuse unpaid peer review for for‑profit paywalled journals, support open-access and arXiv-style models, and develop community-run alternatives.

Ethics, legality, and broader comparisons

  • Some view watermarking as preferable to DRM; others see both as unjustified control over knowledge.
  • A question is raised about EU legality; one reply suggests Elsevier’s jurisdiction likely shields it, but details remain unclear.
  • Comparisons are drawn to extreme DRM schemes for standards (ISO/ANSI) and to past cases where media stores embedded sensitive buyer data into files.