Anthropic's original take-home assignment open-sourced

Assignment clarity and scope

  • Several people initially found the repo confusing: the README is mostly about performance numbers, while the real instructions are buried in perf_takehome.py.
  • The core task: modify KernelBuilder.build_kernel to produce a faster instruction sequence for a simulated machine, with performance measured by test_kernel_cycles (a toy sketch of this shape follows the list below).
  • Some argue the “cryptic” setup is intentional and realistic: quickly pulling a clear problem statement out of partial code and comments is itself part of the test. Others think this is too much reverse‑engineering for an interview.
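
To make the shape of the task concrete, here is a deliberately toy sketch. It is not the actual code in perf_takehome.py: the real KernelBuilder, instruction format, and cycle test have different signatures and a far richer machine model, so every name below should be read as hypothetical.

```python
# Toy sketch of the task's shape only; the real perf_takehome.py simulator,
# KernelBuilder.build_kernel, and test_kernel_cycles differ in all details.

class ToySimulator:
    """Charges one cycle per instruction bundle it executes."""
    def run(self, bundles):
        return len(bundles)  # cycles consumed

class ToyKernelBuilder:
    def build_kernel(self):
        # Candidates rewrite this method to emit fewer, better-packed bundles
        # while still computing the same result.
        return [
            [("load", "r0", "mem[0]")],
            [("add", "r0", "r0", "r0")],
            [("store", "mem[1]", "r0")],
        ]

def test_kernel_cycles(budget=3):
    cycles = ToySimulator().run(ToyKernelBuilder().build_kernel())
    assert cycles <= budget, f"kernel too slow: {cycles} cycles > {budget}"
    return cycles

if __name__ == "__main__":
    print(f"kernel took {test_kernel_cycles()} cycles")
```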

Technical nature of the problem

  • The “machine” is a simulated VLIW + SIMD architecture, conceptually closer to a GPU/TPU or DSP than a CPU, with instruction slots, vector ALU, and memory/scratch operations.
  • The kernel is a synthetic tree‑like/random walk hashing problem chosen largely for its optimization hooks, not for real‑world utility.
  • Multiple commenters compare it to demoscene/code golf: packing operations into minimal cycles and exploiting instruction‑level and data‑level parallelism (a small scheduling sketch follows this list).
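
To illustrate the kind of optimization involved, here is a minimal, hypothetical list-scheduling sketch: independent operations that have no data dependence and can use a free slot are packed into the same cycle. The slot names, issue widths, and op format are invented for illustration and do not correspond to the simulator in the repo.

```python
# Hypothetical VLIW-style packing: schedule each op into the earliest cycle
# where its inputs are ready and a slot of its kind is still free.
from collections import namedtuple

Op = namedtuple("Op", "slot dst srcs")        # e.g. Op("valu", "v2", ("v0", "v1"))
SLOTS_PER_BUNDLE = {"valu": 2, "mem": 1}      # assumed issue widths per cycle

def schedule(ops):
    ready_at = {}    # value -> first cycle at which it can be read
    slot_use = []    # per-cycle slot usage
    bundles = []     # per-cycle list of scheduled ops

    for op in ops:
        cycle = max((ready_at.get(s, 0) for s in op.srcs), default=0)
        while True:
            while len(slot_use) <= cycle:                     # grow the schedule
                slot_use.append({k: 0 for k in SLOTS_PER_BUNDLE})
                bundles.append([])
            if slot_use[cycle][op.slot] < SLOTS_PER_BUNDLE[op.slot]:
                break                                          # slot is free here
            cycle += 1
        slot_use[cycle][op.slot] += 1
        bundles[cycle].append(op)
        ready_at[op.dst] = cycle + 1                           # result usable next cycle
    return bundles

ops = [
    Op("mem",  "v0", ()),
    Op("mem",  "v1", ()),
    Op("valu", "v2", ("v0", "v1")),
    Op("valu", "v3", ("v0", "v0")),   # independent of v2, so it can share a cycle
    Op("mem",  "out", ("v2",)),
]

for i, bundle in enumerate(schedule(ops)):
    print(f"cycle {i}: {[op.dst for op in bundle]}")
```

Running this packs five operations into four cycles; the real assignment plays the same game against a much larger instruction set and a more constrained machine.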

Time limits, compensation, and candidate burden

  • Confusion over the “2 hours” wording: is that a candidate limit, or just the time Claude used? Some say candidates had 4 hours; others thought longer was allowed.
  • Many feel this is too large a task for an unpaid take‑home, especially given the low odds of an offer and the need to juggle multiple applications and life commitments.
  • Some adopt a policy of refusing lengthy take‑homes or asking to be paid for them; others note that doing so effectively self‑excludes you from elite labs with many willing applicants.

Hiring signal vs LeetCode

  • Supporters like that this is tightly aligned with a performance‑engineering role, unlike generic LeetCode questions or CRUD apps.
  • Critics note it selects for a narrow “optimizer” profile and doesn’t test system design, product sense, or teamwork, though defenders reply that’s fine for this specific role.

AI vs humans on the benchmark

  • Multiple users ran various LLM agents against the task. Some models achieved large speedups, reaching cycle counts near or below human‑reported results, though not always beating Anthropic’s published Opus number.
  • There’s concern about whether models might “cheat” by exploiting knowledge of expected outputs; others assume Anthropic manually inspected solutions and relied on cycle counts from the simulator (a toy illustration of the concern follows this list).
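
As a hypothetical illustration of the “cheating” concern: if a harness compared results against hard‑coded expected outputs, a model that had seen those constants could simply emit them. Recomputing the reference at test time on randomized inputs closes that loophole. All names below are invented and do not come from the repo.

```python
# Invented example: verify a candidate kernel against a reference recomputed
# on fresh random inputs, so memorized expected outputs cannot pass.
import random

def reference_kernel(data):
    # Slow but trusted reference result (placeholder arithmetic).
    return [((x * 31) ^ (x >> 3)) & 0xFFFF for x in data]

def check_candidate(candidate_kernel, trials=10):
    for _ in range(trials):
        data = [random.randrange(1 << 16) for _ in range(64)]
        assert candidate_kernel(data) == reference_kernel(data), "wrong output"

if __name__ == "__main__":
    check_candidate(reference_kernel)  # trivially passes; a real kernel swaps in here
    print("candidate matches reference on randomized inputs")
```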

Tone and perception of Anthropic

  • The line “so we can be appropriately impressed and perhaps discuss interviewing” is widely debated. Some read it as playful and non‑committal; many find it condescending or elitist.
  • A few see the whole setup as marketing for Claude’s performance rather than a genuinely candidate‑friendly exercise.

Reflections on difficulty and expertise

  • Several experienced engineers say the task is a humbling reminder of how specialized low‑level performance work is.
  • Others push back against treating this as a universal bar: it’s one niche “game” among many in software, and being bad at this doesn’t make you a bad engineer.