llms.txt

Purpose of llms.txt

  • Proposed as a small text/Markdown file at site root listing AI-friendly docs and key links.
  • Intended mainly for end-users and tools (e.g., IDEs, “projects” features) to assemble good LLM context about a library/site, especially for content created after model training cutoffs.
  • Not pitched as a training-data spec, but as a way to curate minimal, well-structured context for inference-time use (a sketch of the proposed shape follows this list).
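
  For illustration, a minimal sketch of the kind of file the proposal describes: an H1 project name, a one-line blockquote summary, then sections of annotated links (the proposal reserves an “Optional” section for skippable ones). The project name, URLs, and descriptions here are invented.

      # ExampleLib

      > ExampleLib is a small parsing library. The links below are the pages most useful as LLM context.

      ## Docs

      - [Quickstart](https://example.com/docs/quickstart.md): install and first parse
      - [API reference](https://example.com/docs/api.md): every public function, with examples

      ## Optional

      - [Changelog](https://example.com/changelog.md): release history, safe to skip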

Incentives and Value for Site Owners

  • Supporters:
    • Useful for documentation-heavy projects and open source libraries that want LLMs to help users quickly.
    • Might act as a forcing function to write clear, concise summaries that are also helpful to humans.
  • Skeptics:
    • Little direct benefit; mostly helps AI products, not authors.
    • Could reduce traffic to the original content and leave it feeling “obsolete” as users stay inside LLM interfaces.

Scraping, Control, and Compensation

  • Strong frustration that LLM companies profit from scraped content without attribution or payment; some compare it to theft.
  • Calls for mechanisms to declare prices or enforce a “right_to_be_un_vectorized”, though these are acknowledged as aspirational.
  • General sentiment that robots.txt / ai.txt-style signals are weak: bad actors ignore them outright, and even some major AI crawlers disregard crawl delays (see the sketch below).
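
  For reference, the signals in question look like the sketch below, using published crawler user agents (GPTBot is OpenAI’s, CCBot is Common Crawl’s); compliance is entirely voluntary, and Crawl-delay is itself a non-standard extension that not all crawlers honor.

      User-agent: GPTBot
      Disallow: /

      User-agent: CCBot
      Crawl-delay: 10
      Disallow: /drafts/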

Technical Design Debates

  • Many argue this should be a .well-known resource or an extension of robots.txt / existing metadata, not yet another root file; a fetch-fallback sketch follows this list.
  • Some question why Markdown is used at all; plain text, HTML, or existing formats (OpenAPI, man pages, etc.) already work.
  • Concern that LLMs should be able to parse normal HTML/docs; needing llms.txt is seen as a symptom of poor site structure or weak models.
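
  To make the .well-known argument concrete, a Python sketch of what a consuming tool might do, assuming it tries the RFC 8615 location first and falls back to the root path; nothing here is standardized, and the helper name is invented.

      import urllib.request
      import urllib.error

      # Candidate locations in preference order: the .well-known path critics
      # suggest, then the root path from the original proposal.
      CANDIDATE_PATHS = ["/.well-known/llms.txt", "/llms.txt"]

      def fetch_llms_txt(origin: str) -> str | None:
          """Return the first document found at a candidate path, else None."""
          for path in CANDIDATE_PATHS:
              try:
                  with urllib.request.urlopen(origin + path, timeout=10) as resp:
                      return resp.read().decode("utf-8", errors="replace")
              except (urllib.error.URLError, TimeoutError):
                  continue  # 404s and connection failures both land here
          return None

      print(fetch_llms_txt("https://example.com"))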

Manipulation, Poisoning, and Security

  • Multiple commenters note llms.txt could be abused to poison models or present LLM-only misleading content.
  • Others argue this risk already exists via any normal page (e.g., user-agent cloaking, sketched below); llms.txt doesn’t fundamentally change the attack surface.
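
  To illustrate that counter-argument, a minimal user-agent cloaking sketch (Flask; the user-agent substrings are illustrative and trivially spoofed) showing that serving machine-only content never required llms.txt.

      from flask import Flask, request

      app = Flask(__name__)

      # Illustrative substrings; real crawler agents vary and can be faked.
      AI_AGENT_HINTS = ("GPTBot", "ClaudeBot", "CCBot")

      @app.route("/docs")
      def docs():
          ua = request.headers.get("User-Agent", "")
          if any(hint in ua for hint in AI_AGENT_HINTS):
              # A page only crawlers see; the same trick works on any URL,
              # with or without llms.txt in the picture.
              return "Machine-facing copy of the docs."
          return "Human-facing copy of the docs."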

Broader Web & UX Concerns

  • Fear that this further optimizes the web for machines over humans, instead of fixing confusing, marketing-heavy sites.
  • Parallels drawn to the Semantic Web and prior machine-readable metadata efforts, with mixed historical success.
  • Some say they would read llms.txt themselves, as an ad-free, concise, human-readable “real” docs page.