Feed readers which don't take "no" for an answer
HTTP status codes and API semantics
- Debate over whether HTTP status codes are good design for app-level errors.
- Some argue app-specific error payloads should dominate, with HTTP codes only indicating transport-level success/failure.
- Others insist layered design makes sense: HTTP handles resource/transport status (e.g., 404, 429), app errors go in the body.
- Disagreement over using 404 for “resource not found in DB” vs “endpoint doesn’t exist”; some see both as 404, others prefer 200 with an empty/“no results” payload.
Feed reader behavior & conditional requests
- Central complaint: many RSS/Atom readers poll too frequently with unconditional GETs of large feeds.
- Proper behavior cited: send If-Modified-Since / If-None-Match and respect 304 Not Modified.
- Some readers do this correctly; others hammer feeds every few minutes and ignore caching semantics, effectively wasting bandwidth.
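The conditional-request behavior described above can be sketched as a pair of small helpers (function names are hypothetical; the header semantics follow RFC 9110): the client replays the ETag and Last-Modified validators from its previous fetch, and a 304 response means the cached copy is still current.

```python
def conditional_headers(etag=None, last_modified=None):
    """Build request headers for a conditional GET from cached validators."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def handle_response(status, response_headers, cache):
    """Update the validator cache; returns True only if new content arrived."""
    if status == 304:
        return False  # unchanged; reuse the cached feed body, nothing downloaded
    if status == 200:
        # Remember the server's validators for the next poll.
        cache["etag"] = response_headers.get("ETag")
        cache["last_modified"] = response_headers.get("Last-Modified")
        return True
    return False
```

A well-behaved reader pays for the full feed body only when it actually changed; every other poll costs a few hundred bytes of headers.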
Aggressive rate limiting and 429 responses
- The blog in question returns 429 and advises a 24‑hour retry for clients that repeatedly fetch unconditionally.
- Supporters: servers owe clients neither unlimited requests nor special treatment; 429 + Retry-After is a clear signal, and misbehaving clients should fix caching.
- Critics: blocking after 2 hits in 20 minutes for a 500KB RSS feed is “hostile” and punishes end users, especially behind shared IPs or when testing new readers.
- Semantic dispute over whether 429 is “rate limiting” vs “blocking,” but practical effect is the same: no content during the window.
Bandwidth, feed design, and caching
- The feed contains 100 full posts (500KB). Some say that's excessive and should be trimmed (e.g., fewer items, summaries only).
- Others defend full-content, long-history feeds; the real waste is clients re-downloading unchanged content instead of using conditional requests.
- Examples given where individual readers account for noticeable percentages of a site’s yearly egress.
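A back-of-the-envelope calculation shows how a single misbehaving reader reaches those percentages. The feed size is from the discussion; the 5-minute poll interval is an assumption for illustration:

```python
def yearly_egress_bytes(feed_bytes, poll_minutes):
    """Bytes per year for one client fetching the full feed unconditionally
    at a fixed interval (i.e., never receiving a 304)."""
    polls_per_day = 24 * 60 // poll_minutes
    return feed_bytes * polls_per_day * 365


# One reader pulling a 500 KB feed every 5 minutes (interval assumed):
total = yearly_egress_bytes(500 * 1024, 5)
print(f"{total / 1e9:.1f} GB/year")  # roughly 54 GB/year for one client
```

With conditional requests, the same polling schedule would cost only header-sized 304 responses except on the handful of days the feed actually changes.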
Bots, LLM scrapers, and infrastructure
- Several report big increases in bot and LLM-related traffic, often ignoring robots.txt and faking user agents.
- Approaches mentioned: blocking datacenter IPs, “bot motels” (trapping crawlers in junk content), poisoning indexes.
- Some suggest CDNs, WebSub/pubsubhubbub, or third-party hubs to offload polling; others resist CDNs as corrosive to an open, independently hosted web.
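The WebSub alternative mentioned above replaces polling with push: the subscriber POSTs a form-encoded request to the hub advertised in the feed, and the hub delivers updates to a callback URL. A minimal sketch of the subscription body (helper name is hypothetical; the `hub.*` parameters are from the W3C WebSub spec):

```python
from urllib.parse import urlencode


def websub_subscribe_body(callback, topic, mode="subscribe"):
    """Form-encoded body for a WebSub subscription request.

    POST this to the hub URL advertised by the feed (e.g., via a
    rel="hub" link); the hub then pushes updates to `callback`.
    """
    return urlencode({
        "hub.mode": mode,          # "subscribe" or "unsubscribe"
        "hub.callback": callback,  # subscriber endpoint that receives pushes
        "hub.topic": topic,        # the feed URL being subscribed to
    })
```

After a successful subscription, clients stop polling entirely, which sidesteps both the bandwidth and the rate-limiting disputes, at the cost of depending on a hub.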
Miscellaneous tangents
- Grammar digression on “which” vs “that.”
- Reflections on falling traffic for small sites, search downranking, paywalls, and monopoly/antitrust politics.