Perplexity AI is lying about their user agent

What Perplexity Is Alleged to Be Doing

  • The article shows Perplexity fetching a blocked URL on demand with a generic Chrome-like User-Agent rather than the documented PerplexityBot.
  • Many see this as deceptive, because Perplexity documents a special UA for its crawler but sends some requests with a UA indistinguishable from a regular browser's.
  • Some argue this suggests they’ll also evade blocking for large-scale crawling; others think the article only proves behavior for “summarize this URL” queries.
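
To make the complaint concrete, here is a minimal sketch of the kind of User-Agent filter a site owner might rely on. Both UA strings are illustrative assumptions, not captured traffic, and `is_blocked` is a hypothetical helper. It shows why a rule keyed on the documented token cannot catch a request that presents a generic browser UA.

```python
# Hypothetical server-side filter: block requests whose User-Agent
# contains a documented bot token.
BLOCKED_UA_TOKENS = ("PerplexityBot",)

def is_blocked(user_agent: str) -> bool:
    """Return True if the UA string contains any blocked token (case-insensitive)."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in BLOCKED_UA_TOKENS)

# Illustrative UA strings (assumed for this example, not taken from real logs).
declared_ua = "Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://example.invalid/bot)"
generic_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36"
)

if __name__ == "__main__":
    for label, ua in (("declared", declared_ua), ("generic", generic_ua)):
        print(label, "blocked" if is_blocked(ua) else "allowed")
```

The filter only works when the client identifies itself honestly, which is the core of the dispute.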

Crawling vs User‑Initiated Fetch

  • One camp: robots.txt and special UAs are meant for crawlers (programs that systematically traverse sites). A one‑off fetch at a user’s explicit request is morally like a browser request, so robots.txt shouldn’t apply.
  • Opposing view: any automated access by a third-party service is a “bot” and should honor robots.txt and site policies, regardless of whether it’s bulk crawling or on-demand summarization.
  • Related nuance: some point to OpenAI’s split between GPTBot (training) and ChatGPT-User (retrieval) as a better model; Perplexity is faulted for not offering a similar split.
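
The value of separate agent names can be sketched with Python's standard `urllib.robotparser`. The robots.txt content and the `allowed` helper below are assumptions for illustration: a site that disallows the training crawler, permits user-initiated retrieval, and hides `/private/` from everyone else.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block training crawls, allow on-demand retrieval.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: *
Disallow: /private/
"""

def allowed(user_agent: str, url: str) -> bool:
    """Check whether a well-behaved client with this UA may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    url = "https://example.com/article"
    for agent in ("GPTBot", "ChatGPT-User", "SomeOtherAgent"):
        print(agent, allowed(agent, url))
```

This only expresses the site's wishes. Nothing in robots.txt enforces them, so the scheme depends on each client sending the agent name it documents.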

Ethics of User Agents & Blocking

  • Many say lying about the UA is a long‑standing practice (browsers themselves “lie” for compatibility), so UA spoofing alone is weak moral ground for attacking Perplexity.
  • Others reply that explicitly publishing a UA for opt‑out while routinely using a disguised one crosses from legacy quirk into bad faith.
  • There’s tension between site owners wanting to block AI tools and users wanting agents that can act “as their browser.”

Copyright, Fair Use, and “Theft”

  • Strong disagreement over whether training/summarization is akin to:
    • Fair-use reading/transforming, or
    • Unpaid commercial exploitation that undercuts original creators.
  • Some stress moral rights (misrepresentation, “mutilation” of works) and licenses (CC, GPL, etc.) that AI models almost never respect.
  • Others argue anything publicly served is fair game to consume and transform, with enforcement realistically limited to paywalls and contracts.

Impact on Creators & Incentives

  • Publishers report huge, often abusive bot traffic since the “LLM explosion.”
  • Fear: zero‑click AI answers (Perplexity, search AI snippets) will kill traffic, ad revenue, and data/analytics, undermining incentives to create original content.
  • Counterpoint: much public web content is already SEO/ad slop; AI tools that “strip the sludge” are seen as user‑aligned.

Proposed Responses

  • Technical: hard CAPTCHAs, blocking cloud IP ranges, trap URLs in robots.txt, poisoning content for LLMs.
  • Legal/contractual: prominent licenses forbidding ML use; collective lawsuits; DMCA/CCPA/GDPR angles (scope and enforceability disputed).
  • Philosophical split: some call for stronger creator control over downstream machine use; others see that as incompatible with an open web.
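
Two of the technical responses above, trap URLs in robots.txt and blocking cloud IP ranges, can be combined in a short sketch. Everything named here is an assumption for illustration: the trap path, the `should_block` helper, and the blocked network, which is a reserved documentation range rather than a real cloud provider's range.

```python
import ipaddress

# Hypothetical trap path, listed as disallowed in robots.txt and linked nowhere
# visible. Only a client that ignores robots.txt should ever request it.
TRAP_PATH = "/trap-do-not-follow/"
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PATH}\n"

# Placeholder for cloud provider ranges (203.0.113.0/24 is a documentation range).
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]

# Client IPs that have requested the trap path at least once.
flagged_ips: set[str] = set()

def in_blocked_range(ip: str) -> bool:
    """Return True if the IP falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETWORKS)

def should_block(ip: str, path: str) -> bool:
    """Flag clients that hit the trap, then block flagged or range-listed IPs."""
    if path.startswith(TRAP_PATH):
        flagged_ips.add(ip)
    return ip in flagged_ips or in_blocked_range(ip)
```

A real deployment would need expiry for flagged addresses and care around shared IPs such as corporate proxies or VPNs, since both techniques can block legitimate readers.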