I failed to recreate the 1996 Space Jam website with Claude

Web tech & the original Space Jam site

  • Several comments note the 1996 site actually used table-based layout, not CSS absolute positioning; early versions even used server-side image maps before moving to static tables.
  • People suggest prompting the model explicitly to use <table> layouts and other 1990s-era techniques, though others counter that, in practice, tables and (later) CSS were the only layout methods that ever mattered.
  • Some nostalgia and technical detail about 90s browser quirks (font metrics, gamma differences, nested tables, 1×1 spacer GIFs, sliced images, Dreamweaver/Photoshop workflows); a sketch of the table-and-spacer technique follows this list.
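
As a concrete illustration of those era techniques, here is a minimal sketch (not taken from the actual site; every filename is a made-up placeholder) that generates the table-and-spacer pattern from Python:

```python
# A 1996-style layout, generated from Python purely to show the technique:
# a rigid <table> grid, 1x1 transparent spacer GIFs forcing exact cell
# sizes, and a large graphic sliced into pieces and reassembled cell by
# cell. All filenames are hypothetical placeholders.
SPACER = '<img src="spacer.gif" width="{w}" height="{h}" border="0">'
SLICE = '<img src="{src}" width="{w}" height="{h}" border="0">'

html = f"""<table width="480" border="0" cellpadding="0" cellspacing="0">
  <tr>
    <td>{SLICE.format(src='logo_left.gif', w=240, h=120)}</td>
    <td>{SLICE.format(src='logo_right.gif', w=240, h=120)}</td>
  </tr>
  <tr><!-- spacer row pins the column widths -->
    <td>{SPACER.format(w=240, h=1)}</td>
    <td>{SPACER.format(w=240, h=1)}</td>
  </tr>
</table>"""
print(html)
```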

Why multimodal LLMs struggle here

  • Multiple commenters say current multimodal LLMs don’t “see pixels”: images are chopped into fixed-size patches and embedded into a semantic vector space, destroying precise geometry (a short sketch of that patch step follows this list).
  • Pixel-perfect tasks, exact coordinates, and spatial layouts (ASCII art, circles, game UIs) are repeatedly cited as consistent weak spots, even when models are strong at general coding.
  • Someone points out that models often parse 2D-structured content poorly even when it is presented as plain text.
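
A rough sketch of that patch-and-embed step, using NumPy with shapes borrowed from a common ViT configuration (the projection matrix here is random, purely to show where coordinate information gets folded away):

```python
# Why the model doesn't "see pixels": ViT-style encoders cut the image
# into fixed patches and map each patch to a single embedding vector,
# so sub-patch geometry survives only implicitly.
import numpy as np

img = np.random.rand(224, 224, 3)        # stand-in for a screenshot
P = 16                                   # a common ViT patch size

# Reshape into a 14x14 grid of 16x16x3 patches.
patches = img.reshape(224 // P, P, 224 // P, P, 3).swapaxes(1, 2)
tokens = patches.reshape(-1, P * P * 3)  # (196, 768): one row per patch

# A stand-in for the learned projection (random here): each 768-dim
# patch collapses to one embedding; exact pixel coordinates are no
# longer explicitly represented anywhere.
W = np.random.rand(P * P * 3, 512)
embeddings = tokens @ W                  # (196, 512)
print(embeddings.shape)
```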

Suggested better approaches

  • Strong theme: don’t one-shot. Use iterative, agentic workflows:
    • Have the model write image-processing tools (OpenCV, template matching) to locate assets and measure offsets (see the template-matching sketch after this list).
    • Use Playwright or browser tooling to render, screenshot, diff against the target, and loop until tests pass.
    • Treat this as TDD: first write a test that compares rendered output to the screenshot, then have the model satisfy the test (see the test sketch after this list).
  • Several people report getting much closer or essentially perfect results with this tooling+feedback setup, though often with hacks (e.g., using the screenshot itself as a background).
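
A minimal sketch of the measuring-tools idea, assuming OpenCV’s Python bindings and hypothetical file names: template matching finds where a known asset sits in the target screenshot and prints CSS offsets for it:

```python
# Template matching with OpenCV: find where a known asset sits in the
# target screenshot, then emit the CSS offsets. File names are
# hypothetical; both images must share the same color format.
import cv2

screenshot = cv2.imread("spacejam_original.png")
asset = cv2.imread("planet_logo.png")  # one of the site's sliced images

result = cv2.matchTemplate(screenshot, asset, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)

if max_val > 0.9:  # match-confidence threshold, tuned by hand
    x, y = max_loc
    print(f"img {{ position: absolute; left: {x}px; top: {y}px; }}")
```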
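
And the screenshot-diff loop framed as a test, assuming Playwright’s Python API plus Pillow/NumPy for the comparison (paths, viewport, and thresholds are placeholders a model would iterate against):

```python
# The screenshot-diff loop as a test: render the candidate page with
# Playwright, screenshot it, and compare pixels against the target.
import numpy as np
from PIL import Image
from playwright.sync_api import sync_playwright

def render(html_path: str, out: str = "candidate.png") -> str:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": 800, "height": 600})
        page.goto(f"file://{html_path}")
        page.screenshot(path=out, full_page=True)
        browser.close()
    return out

def diff_ratio(a_path: str, b_path: str) -> float:
    a = np.asarray(Image.open(a_path).convert("RGB"), dtype=np.int16)
    b = np.asarray(Image.open(b_path).convert("RGB"), dtype=np.int16)
    if a.shape != b.shape:
        return 1.0  # wrong canvas size counts as a total mismatch
    # Fraction of RGB channel values that differ noticeably.
    return float(np.mean(np.abs(a - b) > 10))

def test_recreation_matches_target():
    shot = render("/tmp/recreation/index.html")
    # The model edits the HTML and reruns until this assertion passes.
    assert diff_ratio(shot, "spacejam_original.png") < 0.02
```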

Benchmark value & realism

  • Some see the task as contrived (“just download the HTML”); others note it mirrors real workflows where developers implement UIs from static mocks or screenshots.
  • Many say the exercise usefully maps the boundary: models are good at “make something like X” but bad at “recreate X exactly.”

Trust, overconfidence, and tool role

  • Commenters stress that LLMs are overconfident and their failure modes are opaque; juniors may not recognize subtle mistakes.
  • Debate over whether a tool that needs checking is “bad” or simply incomplete but still useful if it does 80–90% of the work.
  • Several frame LLMs as cheap, fallible interns that require supervision and external verification rather than as autonomous programmers.