I failed to recreate the 1996 Space Jam website with Claude

Web tech & the original Space Jam site

  • Several comments note the 1996 site actually used table-based layout, not CSS absolute positioning; early versions even used server-side image maps before moving to static tables.
  • People suggest prompting the model explicitly to use <table> layouts and other 1990s-era techniques, though others counter that, in practice, tables and (later) CSS were the only layout methods that ever mattered.
  • Some nostalgia and technical detail about 90s browser quirks (font metrics, gamma differences, nested tables, 1×1 spacer GIFs, sliced images, Dreamweaver/Photoshop workflows); a sketch of the table-and-spacer technique follows this list.
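
As a concrete illustration of those era techniques, here is a minimal sketch (not taken from the actual site; every filename is a made-up placeholder) that generates the table-and-spacer pattern from Python:

```python
# A 1996-style layout, generated from Python purely to show the technique:
# a rigid <table> grid, 1x1 transparent spacer GIFs forcing exact cell
# sizes, and a large graphic sliced into pieces and reassembled cell by
# cell. All filenames are hypothetical placeholders.
SPACER = '<img src="spacer.gif" width="{w}" height="{h}" border="0">'
SLICE = '<img src="{src}" width="{w}" height="{h}" border="0">'

html = f"""<table width="480" border="0" cellpadding="0" cellspacing="0">
  <tr>
    <td>{SLICE.format(src='logo_left.gif', w=240, h=120)}</td>
    <td>{SLICE.format(src='logo_right.gif', w=240, h=120)}</td>
  </tr>
  <tr><!-- spacer row pins the column widths -->
    <td>{SPACER.format(w=240, h=1)}</td>
    <td>{SPACER.format(w=240, h=1)}</td>
  </tr>
</table>"""
print(html)
```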

Why multimodal LLMs struggle here

  • Multiple commenters say current multimodal LLMs don’t “see pixels”: images are chopped into fixed-size patches and embedded into a semantic vector space, destroying precise geometry (a short sketch of that patch step follows this list).
  • Pixel-perfect tasks, exact coordinates, and spatial layouts (ASCII art, circles, game UIs) are repeatedly cited as consistent weak spots, even when models are strong at general coding.
  • Someone points out that models often parse 2D-structured content poorly even when it is presented as plain text.
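
A rough sketch of that patch-and-embed step, using NumPy with shapes borrowed from a common ViT configuration (the projection matrix here is random, purely to show where coordinate information gets folded away):

```python
# Why the model doesn't "see pixels": ViT-style encoders cut the image
# into fixed patches and map each patch to a single embedding vector,
# so sub-patch geometry survives only implicitly.
import numpy as np

img = np.random.rand(224, 224, 3)        # stand-in for a screenshot
P = 16                                   # a common ViT patch size

# Reshape into a 14x14 grid of 16x16x3 patches.
patches = img.reshape(224 // P, P, 224 // P, P, 3).swapaxes(1, 2)
tokens = patches.reshape(-1, P * P * 3)  # (196, 768): one row per patch

# A stand-in for the learned projection (random here): each 768-dim
# patch collapses to one embedding; exact pixel coordinates are no
# longer explicitly represented anywhere.
W = np.random.rand(P * P * 3, 512)
embeddings = tokens @ W                  # (196, 512)
print(embeddings.shape)
```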

Suggested better approaches

  • Strong theme: don’t one-shot. Use iterative, agentic workflows:
    • Have the model write image-processing tools (OpenCV, template matching) to locate assets and measure offsets (see the template-matching sketch after this list).
    • Use Playwright or browser tooling to render, screenshot, diff against the target, and loop until tests pass.
    • Treat this as TDD: first write a test that compares rendered output to the screenshot, then have the model satisfy the test (see the test sketch after this list).
  • Several people report getting much closer or essentially perfect results with this tooling+feedback setup, though often with hacks (e.g., using the screenshot itself as a background).
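
A minimal sketch of the measuring-tools idea, assuming OpenCV’s Python bindings and hypothetical file names: template matching finds where a known asset sits in the target screenshot and prints CSS offsets for it:

```python
# Template matching with OpenCV: find where a known asset sits in the
# target screenshot, then emit the CSS offsets. File names are
# hypothetical; both images must share the same color format.
import cv2

screenshot = cv2.imread("spacejam_original.png")
asset = cv2.imread("planet_logo.png")  # one of the site's sliced images

result = cv2.matchTemplate(screenshot, asset, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)

if max_val > 0.9:  # match-confidence threshold, tuned by hand
    x, y = max_loc
    print(f"img {{ position: absolute; left: {x}px; top: {y}px; }}")
```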
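
And the screenshot-diff loop framed as a test, assuming Playwright’s Python API plus Pillow/NumPy for the comparison (paths, viewport, and thresholds are placeholders a model would iterate against):

```python
# The screenshot-diff loop as a test: render the candidate page with
# Playwright, screenshot it, and compare pixels against the target.
import numpy as np
from PIL import Image
from playwright.sync_api import sync_playwright

def render(html_path: str, out: str = "candidate.png") -> str:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": 800, "height": 600})
        page.goto(f"file://{html_path}")
        page.screenshot(path=out, full_page=True)
        browser.close()
    return out

def diff_ratio(a_path: str, b_path: str) -> float:
    a = np.asarray(Image.open(a_path).convert("RGB"), dtype=np.int16)
    b = np.asarray(Image.open(b_path).convert("RGB"), dtype=np.int16)
    if a.shape != b.shape:
        return 1.0  # wrong canvas size counts as a total mismatch
    # Fraction of RGB channel values that differ noticeably.
    return float(np.mean(np.abs(a - b) > 10))

def test_recreation_matches_target():
    shot = render("/tmp/recreation/index.html")
    # The model edits the HTML and reruns until this assertion passes.
    assert diff_ratio(shot, "spacejam_original.png") < 0.02
```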

Benchmark value & realism

  • Some see the task as contrived (“just download the HTML”); others note it mirrors real workflows where developers implement UIs from static mocks or screenshots.
  • Many say the exercise usefully maps the boundary: models are good at “make something like X” but bad at “recreate X exactly.”

Trust, overconfidence, and tool role

  • Commenters stress that LLMs are overconfident and their failure modes are opaque; juniors may not recognize subtle mistakes.
  • Debate over whether a tool that needs checking is “bad” or simply incomplete but still useful if it does 80–90% of the work.
  • Several frame LLMs as cheap, fallible interns that require supervision and external verification rather than as autonomous programmers.