2024-06-15

Perplexity AI is lying about their user agent

What Perplexity Is Alleged to Be Doing

The linked article shows Perplexity fetching a blocked URL on demand with a generic Chrome-like User-Agent rather than the documented PerplexityBot one.
Many see this as deceptive, because Perplexity documents a dedicated UA for its crawler but sends a UA indistinguishable from a browser's for some requests.
Some argue this suggests Perplexity will also evade blocks when crawling at scale; others think the article only demonstrates the behavior for “summarize this URL” queries.

Crawling vs User‑Initiated Fetch

One camp: robots.txt and special UAs are for crawlers (programs that systematically traverse sites). A one‑off fetch at a user's explicit request is morally like a browser request, so robots.txt shouldn’t apply.
Opposing view: any automated access by a third-party service is a “bot” and should honor robots.txt and site policies, regardless of whether it’s bulk crawling or on-demand summarization.
Related nuance: some point to OpenAI’s split between GPTBot (training) and ChatGPT-User (retrieval) as a better model; Perplexity is faulted for not offering a similar split.
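
To make the disagreement concrete, here is a minimal sketch of how per-agent robots.txt rules behave, using Python's standard `urllib.robotparser`. The robots.txt content, the example URL, and the Chrome-like string are illustrative assumptions, not taken from any real site or from Perplexity's actual traffic.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of two crawlers, allow a retrieval agent.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: *
Allow: /
"""

# Illustrative browser-style UA string (an assumption, not a captured header).
CHROME_LIKE_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
)


def build_parser(text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser, user_agent, url="https://example.com/post"):
    """Ask whether the given agent may fetch the URL under these rules.

    robotparser matches on the text before the first "/" in the agent
    string, so named bots are passed here as bare product tokens.
    """
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
for agent in ("PerplexityBot", "GPTBot", "ChatGPT-User", CHROME_LIKE_UA):
    print(agent.split("/")[0], is_allowed(parser, agent))
```

Under these assumed rules, the named crawlers are refused while the Chrome-like string falls through to the wildcard group and is allowed. That is the mechanical reason a robots.txt opt-out only works if the client identifies itself, and it only ever works for clients that choose to consult the file at all.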

Ethics of User Agents & Blocking

Many say misreporting the UA is a long‑standing practice (browsers themselves “lie” for compatibility), so on its own it is weak moral ground for attacking Perplexity.
Others reply that explicitly publishing a UA for opt‑out while routinely using a disguised one crosses from legacy quirk into bad faith.
There’s tension between site owners wanting to block AI tools and users wanting agents that can act “as their browser.”

Copyright, Fair Use, and “Theft”

Strong disagreement over whether training/summarization is akin to:
- Fair-use reading/transforming, or
- Unpaid commercial exploitation that undercuts original creators.
Some stress moral rights (misrepresentation, “mutilation” of works) and licenses (CC, GPL, etc.) that AI models almost never respect.
Others argue anything publicly served is fair game to consume and transform, with enforcement realistically limited to paywalls and contracts.

Impact on Creators & Incentives

Publishers report huge, often abusive bot traffic since the “LLM explosion.”
Fear: zero‑click AI answers (Perplexity, search AI snippets) will kill traffic, ad revenue, and data/analytics, undermining incentives to create original content.
Counterpoint: much public web content is already SEO/ad slop; AI tools that “strip the sludge” are seen as user‑aligned.

Proposed Responses

Technical: hard CAPTCHAs, blocking cloud IP ranges, trap URLs in robots.txt, poisoning content for LLMs.
Legal/contractual: prominent licenses forbidding ML use; collective lawsuits; DMCA/CCPA/GDPR angles (scope and enforceability disputed).
Philosophical split: some call for stronger creator control over downstream machine use; others see that as incompatible with an open web.
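
Two of the technical responses listed above, trap URLs and blocking cloud IP ranges, can be sketched as follows. This is a rough illustration under stated assumptions: the trap path, the `Request` record, and the network list are hypothetical, and the addresses come from reserved documentation ranges rather than any real provider.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical trap path: listed under Disallow in robots.txt and linked
# nowhere a human would click, so only rule-ignoring bots request it.
TRAP_PREFIX = "/do-not-crawl/"

# Documentation range standing in for a cloud provider's published ranges.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


@dataclass(frozen=True)
class Request:
    """Minimal stand-in for one access-log entry."""
    ip: str
    path: str
    user_agent: str


def in_blocked_network(ip, networks=BLOCKED_NETWORKS):
    """True if the address falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


def find_trap_visitors(requests, trap_prefix=TRAP_PREFIX):
    """Collect client IPs that requested a trap URL."""
    return {r.ip for r in requests if r.path.startswith(trap_prefix)}


def should_block(request, flagged_ips):
    """Block clients that hit the trap or come from a blocked network."""
    return request.ip in flagged_ips or in_blocked_network(request.ip)


log = [
    Request("192.0.2.10", "/post/1", "Mozilla/5.0"),
    Request("198.51.100.7", "/do-not-crawl/page", "Mozilla/5.0"),
    Request("203.0.113.5", "/post/2", "Mozilla/5.0"),
]
flagged = find_trap_visitors(log)
for entry in log:
    print(entry.ip, should_block(entry, flagged))
```

Neither check looks at the User-Agent header, which is the point: both are attempts to identify automated clients by behavior or origin when the declared identity cannot be trusted. Both also risk blocking legitimate users, such as people browsing through a VPN hosted in a cloud range.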
