Show HN: Kage – Shadow any website to a single binary for offline viewing
Purpose and core idea
- Tool mirrors entire websites (not just single pages) into offline-friendly bundles.
- Captures the rendered DOM after JS execution, strips scripts, and rewrites resources so snapshots work without network access.
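The strip-and-rewrite step described above can be sketched with Python's standard library. This is a minimal illustration, not Kage's actual implementation: the `SnapshotRewriter` class, the `rewrite_snapshot` helper, and the `url_map` argument are hypothetical names, and the input is assumed to be HTML already rendered by a browser.

```python
from html import escape
from html.parser import HTMLParser


class SnapshotRewriter(HTMLParser):
    """Re-serialize HTML, dropping <script> elements and remapping URLs.

    Hypothetical helper for illustration; comments are dropped as a side effect.
    """

    URL_ATTRS = {"src", "href"}

    def __init__(self, url_map):
        # Keep character references as written so they round-trip unchanged.
        super().__init__(convert_charrefs=False)
        self.url_map = url_map
        self.out = []
        self.in_script = False

    def _render(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            if value is None:  # boolean attribute such as "disabled"
                parts.append(name)
                continue
            if name in self.URL_ATTRS:
                # Point the attribute at the local copy when one is known.
                value = self.url_map.get(value, value)
            parts.append(f'{name}="{escape(value, quote=True)}"')
        return "<" + " ".join(parts)

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            return
        self.out.append(self._render(tag, attrs) + ">")

    def handle_startendtag(self, tag, attrs):
        if tag != "script":
            self.out.append(self._render(tag, attrs) + "/>")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.in_script:  # skip inline script bodies
            self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def rewrite_snapshot(html_text, url_map):
    """Return html_text without scripts and with mapped URLs made local."""
    rewriter = SnapshotRewriter(url_map)
    rewriter.feed(html_text)
    rewriter.close()
    return "".join(rewriter.out)
```

A real crawler would also need to handle `srcset`, CSS `url(...)` references, and inline event handlers, which this sketch ignores.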
Comparisons to existing tools
- Seen as a modern alternative to HTTrack/wget for JS-heavy sites (e.g., SPAs, Next.js).
- Compared frequently with SingleFile/SingleFileZ and archiveweb.page:
  - SingleFile is praised for high-fidelity single-page saves and base64 bundling.
  - SingleFile already supports scoped recursive crawls; some suggest borrowing its techniques for complex features (shadow DOMs, iframes, websockets, deduping).
- Also mentioned alongside grab-site, browsertrix, Kiwix/ZIM, gwtar, mhtml, Teleport Pro, curl, and “Print to PDF.”
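For readers unfamiliar with base64 bundling, the general idea is to embed each resource directly in the page as a data URI, so the saved file needs no external assets. The sketch below shows that generic technique only; it is not taken from SingleFile's code, and the `to_data_uri` name is hypothetical.

```python
import base64
import mimetypes


def to_data_uri(data: bytes, filename: str) -> str:
    """Encode raw resource bytes as a data URI, guessing the MIME type from the name."""
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"  # fallback for unknown extensions
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The resulting string can replace an image's `src` value. The trade-off is size: base64 inflates the payload by roughly a third, and a resource shared by many pages is embedded again in each one.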
Output formats & distribution
- Currently outputs:
  - Static HTML folder.
  - ZIM files (for Kiwix readers on multiple platforms).
  - Self-contained executables that embed a webview and archive.
- Suggestions:
  - Single self-contained HTML with client-side routing.
  - Self-extracting archive that opens in the user’s browser (CHM-like).
  - Markdown folder for use with tools like Obsidian/Logseq; EPUB export.
Use cases
- Offline reading of blogs, docs, company wikis, Confluence, and dev docs (e.g., Apple docs, Snowflake).
- Travel/offline scenarios (flights, trains, sites without connectivity).
- Long-term personal archives and research access.
Technical behavior & limitations
- Uses Chrome/Chromium; Docker recommended for easier setup.
- Current gaps/requests: throttling to reduce site load, excluding images/videos, partial crawls, cookie/auth support, handling paywalls, and embedded videos.
- Serving via a local HTTP server avoids file:// CORS/origin quirks; commenters disagree on how restrictive browsers actually are for pages opened via file://.
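Serving a static snapshot folder over HTTP takes only a few lines with Python's standard library. This is a generic sketch, not how Kage serves its archives; the `serve_snapshot` name and its arguments are hypothetical.

```python
import functools
import http.server
import threading


def serve_snapshot(directory, port=0):
    """Serve `directory` on 127.0.0.1 in a background thread.

    port=0 asks the OS for any free port; read the chosen one from
    server.server_address[1]. Call server.shutdown() to stop serving.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Pages loaded this way get an ordinary http:// origin, so relative fetches and module imports behave as they would on a live site. Binding to 127.0.0.1 keeps the snapshot off the local network.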
Security & trust concerns
- Use of --no-sandbox with Chrome in Docker raised as a concern; later constrained to Docker via env/flags.
- Some distrust storing archives as binaries; preference for HTML/ZIM for sharing.
- Antivirus false positive reported for a related binary.
- A few readers strongly dislike the README style, calling it “LLM slop,” and question whether code quality matches.
Archival ecosystem & long-term concerns
- Discussion of WARC vs mitmproxy captures, HTTP/2/WebSockets, and potential new archival formats.
- Interest in compatibility with established formats (WARC, ZIM), and unresolved questions about whether the “keep it for a decade” goal holds up over time.