llms.txt
Purpose of llms.txt
- Proposed as a small text/Markdown file at site root listing AI-friendly docs and key links.
- Intended mainly for end-users and tools (e.g., IDEs, “projects” features) to assemble good LLM context about a library/site, especially for content created after model training cutoffs.
- Not pitched as a training-data spec, but as a way to curate minimal, well-structured context for inference-time use.
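To make the proposal concrete, a minimal sketch of what such a file might look like, following the structure the proposal suggests (an H1 title, a blockquote summary, then sections of annotated links); the project name and URLs here are illustrative, not from the discussion:

```markdown
# ExampleLib

> ExampleLib is a small HTTP client library. This file lists concise,
> LLM-friendly documentation intended for inference-time context.

## Docs

- [Quick start](https://example.com/docs/quickstart.md): install and first request
- [API reference](https://example.com/docs/api.md): full public API

## Optional

- [Changelog](https://example.com/changelog.md): release history
```

The "Optional" section signals links a tool can skip when context space is tight.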
Incentives and Value for Site Owners
- Supporters:
  - Useful for documentation-heavy projects and open source libraries that want LLMs to help users quickly.
  - Might act as a forcing function to write clear, concise summaries that are also helpful to humans.
- Skeptics:
  - Little direct benefit; mostly helps AI products, not authors.
  - Could make original content less visited and more “obsolete” as users stay in LLM interfaces.
Scraping, Control, and Compensation
- Strong frustration that LLM companies profit from scraped content without attribution or payment; some compare it to theft.
- Calls for mechanisms to declare prices or enforce a “right_to_be_un_vectorized,” though these are acknowledged as aspirational.
- General sentiment that robots.txt / ai-txt-style signals are weak; bad actors ignore them, and even some big AI crawlers disregard crawl delays.
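The weakness described here is structural: robots.txt-style signals are purely advisory. A typical opt-out looks like the sketch below (GPTBot is OpenAI's crawler; Crawl-delay is a widely used but non-standard extension, and per the thread even some large crawlers disregard it):

```
# robots.txt — advisory only; compliant crawlers honor it, bad actors ignore it
User-agent: GPTBot
Disallow: /

User-agent: *
Crawl-delay: 10
```

Nothing enforces compliance; the file expresses intent, and any ai.txt- or llms.txt-style sibling inherits the same limitation.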
Technical Design Debates
- Many argue this should be a .well-known resource or an extension of robots.txt / existing metadata, not another root file.
- Some question why Markdown is used at all; plain text, HTML, or existing formats (OpenAPI, man pages, etc.) already work.
- Concern that LLMs should be able to parse normal HTML/docs; needing llms.txt is seen as a symptom of poor site structure or weak models.
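The .well-known debate boils down to where a client should look. A minimal sketch of the lookup order a tool could use, assuming it prefers the RFC 8615 .well-known location and falls back to the root path the proposal actually specifies (the function name and preference order are illustrative, not part of any spec):

```python
from urllib.parse import urljoin

# Candidate locations, in a hypothetical preference order:
# the RFC 8615 ".well-known" convention first, then the root-level
# file the llms.txt proposal defines.
CANDIDATE_PATHS = ["/.well-known/llms.txt", "/llms.txt"]


def candidate_urls(base_url: str) -> list[str]:
    """Return the URLs a client could probe for the file, in order."""
    return [urljoin(base_url, path) for path in CANDIDATE_PATHS]
```

A client would fetch each URL in turn and use the first that returns content; standardizing on one location would make this probing unnecessary.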
Manipulation, Poisoning, and Security
- Multiple commenters note llms.txt could be abused to poison models or to serve misleading content that only LLMs ever see.
- Others argue this risk already exists via normal pages; llms.txt doesn’t fundamentally change the attack surface.
Broader Web & UX Concerns
- Fear that this further optimizes the web for machines over humans, instead of fixing confusing, marketing-heavy sites.
- Parallels drawn to the Semantic Web and prior machine-readable metadata efforts, with mixed historical success.
- Some would use llms.txt themselves as an ad-free, concise human-readable “real” docs page.