Perplexity AI is lying about their user agent
What Perplexity Is Alleged to Be Doing
- Article shows Perplexity fetching a blocked URL on demand using a generic Chrome-like User-Agent, not the documented PerplexityBot.
- Many see this as deceptive, because Perplexity documents a special UA for their crawler but uses an indistinguishable browser UA for some requests.
- Some argue this suggests they’ll also evade blocking for large-scale crawling; others think the article only proves behavior for “summarize this URL” queries.
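To see why the choice of UA matters, consider a minimal server-side filter that blocks requests by User-Agent substring. This is a hypothetical sketch: the token list and both example UA strings are illustrative assumptions, not copied from Perplexity's documentation. A client that declares a bot token is caught, while one sending a stock Chrome UA passes, which is the crux of the complaint.

```python
# Hypothetical blocklist of crawler tokens a site owner might match on (lowercase).
BLOCKED_UA_TOKENS = ("perplexitybot", "gptbot")

# Illustrative UA strings, assumed for this sketch rather than taken from any vendor's docs.
DECLARED_BOT_UA = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"
GENERIC_CHROME_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent contains any blocklisted token (case-insensitive)."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_UA_TOKENS)


if __name__ == "__main__":
    print(is_blocked(DECLARED_BOT_UA))    # True: the client identifies itself as a bot
    print(is_blocked(GENERIC_CHROME_UA))  # False: indistinguishable from a real browser
```

UA matching only works when the client cooperates; a client that sends a browser string is invisible to it, which is why the thread turns to other signals further down.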
Crawling vs User‑Initiated Fetch
- One camp: robots.txt and special UAs are for crawlers (systematically traversing sites). A one‑off fetch at explicit user request is morally equivalent to a browser visit, so robots.txt shouldn’t apply.
- Opposing view: any automated access by a third-party service is a “bot” and should honor robots.txt and site policies, regardless of whether it’s bulk crawling or on-demand summarization.
- Related nuance: some point to OpenAI’s split between GPTBot (training) and ChatGPT-User (retrieval) as a better model; Perplexity is faulted for not doing something similar.
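The split matters because robots.txt rules are grouped by user-agent token, so distinct tokens let a site set different policies for training and for retrieval. Below is a minimal sketch using Python's standard `urllib.robotparser`; the tokens are the ones named above, but the `example.com` policy and its rules are invented for illustration. With a single crawler token, the same file could not express this distinction.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com: opt out of training crawls (GPTBot)
# while still allowing on-demand, user-initiated fetches (ChatGPT-User).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: *
Disallow: /drafts/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "https://example.com/article"
    print(rp.can_fetch("GPTBot", url))        # False: training crawler opted out
    print(rp.can_fetch("ChatGPT-User", url))  # True: user-initiated retrieval allowed
```

Python's parser compares rules against the text before the first `/` in the agent string passed to `can_fetch`, so the sketch passes bare product tokens rather than full User-Agent headers.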
Ethics of User Agents & Blocking
- Many say lying about the UA is long‑standing practice (browsers themselves “lie” for compatibility), so it is weak moral ground for attacking Perplexity on that alone.
- Others reply that explicitly publishing a UA for opt‑out while routinely using a disguised one crosses from legacy quirk into bad faith.
- There’s tension between site owners wanting to block AI tools and users wanting agents that can act “as their browser.”
Copyright, Fair Use, and “Theft”
- Strong disagreement over whether training/summarization is akin to:
  - Fair-use reading/transforming, or
  - Unpaid commercial exploitation that undercuts original creators.
- Some stress moral rights (misrepresentation, “mutilation” of works) and licenses (CC, GPL, etc.) that AI models almost never respect.
- Others argue anything publicly served is fair game to consume and transform, with enforcement realistically limited to paywalls and contracts.
Impact on Creators & Incentives
- Publishers report huge, often abusive bot traffic since the “LLM explosion.”
- Fear: zero‑click AI answers (Perplexity, search AI snippets) will kill traffic, ad revenue, and data/analytics, undermining incentives to create original content.
- Counterpoint: much public web content is already SEO/ad slop; AI tools that “strip the sludge” are seen as user‑aligned.
Proposed Responses
- Technical: hard CAPTCHAs, blocking cloud IP ranges, trap URLs in robots.txt, poisoning content for LLMs.
- Legal/contractual: prominent licenses forbidding ML use; collective lawsuits; DMCA/CCPA/GDPR angles (scope and enforceability disputed).
- Philosophical split: some call for stronger creator control over downstream machine use; others see that as incompatible with an open web.
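Two of the technical responses, trap URLs and cloud IP blocking, can be sketched together. This is a hypothetical filter, not any particular product: the trap path, the `BotFilter` class, and the CIDR list are all assumptions, and the ranges are IETF documentation prefixes standing in for real cloud allocations (a real deployment would load the providers' published range lists). The matching robots.txt would contain `User-agent: *` and `Disallow: /do-not-crawl/`, so only a client that ignores robots.txt should ever request that path.

```python
import ipaddress

# Hypothetical trap path, listed as Disallow in robots.txt and never linked visibly.
TRAP_PATH = "/do-not-crawl/"

# Illustrative stand-ins for cloud-provider ranges: documentation prefixes
# (RFC 5737 for IPv4, RFC 3849 for IPv6), NOT real cloud allocations.
CLOUD_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32")
]


class BotFilter:
    """Hypothetical request filter combining a robots.txt trap with IP-range blocking."""

    def __init__(self, cloud_ranges):
        self.cloud_ranges = list(cloud_ranges)
        self.banned = set()  # addresses that requested the trap path

    def allow(self, client_ip: str, path: str) -> bool:
        addr = ipaddress.ip_address(client_ip)
        if addr in self.banned:
            return False
        if path.startswith(TRAP_PATH):
            # Only a client ignoring robots.txt should get here; ban it from now on.
            self.banned.add(addr)
            return False
        # Membership tests across IP versions simply return False, so mixed lists are safe.
        if any(addr in net for net in self.cloud_ranges):
            return False
        return True


if __name__ == "__main__":
    f = BotFilter(CLOUD_RANGES)
    print(f.allow("203.0.113.7", "/article"))        # True: ordinary client
    print(f.allow("198.51.100.20", "/article"))      # False: listed "cloud" range
    print(f.allow("203.0.113.7", "/do-not-crawl/"))  # False: trap hit, address banned
    print(f.allow("203.0.113.7", "/article"))        # False: still banned
```

A trap only catches clients that wander into it, so a fetcher that retrieves just the URL a user pasted would not trip it; and range blocking also catches VPN and proxy users, which is part of why the enforceability of these measures is disputed.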