Anthropic's original take-home assignment open-sourced
Assignment clarity and scope
- Several people initially found the repo confusing: the README is mostly about performance numbers, while the real instructions are buried in perf_takehome.py.
- The core task: modify KernelBuilder.build_kernel to produce a faster instruction sequence for a simulated machine, with performance measured by test_kernel_cycles (a toy sketch of this workflow follows this list).
- Some argue the “cryptic” setup is intentional and realistic: quickly pulling a clear problem statement out of partial code and comments is itself part of the test. Others think this is too much reverse‑engineering for an interview.
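As a rough illustration of that structure, here is a minimal, self-contained sketch of a builder-plus-cycle-count harness in the spirit of what the thread describes. The class names, the simulator, and the cost model are toy stand-ins for illustration, not the actual API of perf_takehome.py.

```python
# Toy stand-in for the structure described above: a builder whose build_kernel()
# returns an instruction sequence, and a cycle-counting check. Everything here is
# illustrative; the real repo's classes, signatures, and machine model differ.

class ToySimulator:
    """Executes a kernel on a toy machine that costs one cycle per bundle."""

    def run(self, instructions):
        return len(instructions)  # cycles == number of bundles in this toy model


class ToyKernelBuilder:
    def build_kernel(self):
        # Baseline schedule: one operation per bundle, no parallelism exploited.
        return [["load v0, [a]"], ["vadd v1, v0, v0"], ["store [b], v1"]]


def toy_test_kernel_cycles(builder, budget):
    """Rough analogue of a cycle-count test: fail if the kernel is too slow."""
    cycles = ToySimulator().run(builder.build_kernel())
    assert cycles <= budget, f"{cycles} cycles exceeds the budget of {budget}"
    return cycles


if __name__ == "__main__":
    print(toy_test_kernel_cycles(ToyKernelBuilder(), budget=3))  # -> 3
```

In the real assignment the simulator models a much richer machine, but the feedback loop is presumably similar: regenerate the instruction sequence, re-run the cycle count, repeat.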
Technical nature of the problem
- The “machine” is a simulated VLIW + SIMD architecture, conceptually closer to a GPU/TPU or DSP than a CPU, with instruction slots, vector ALU, and memory/scratch operations.
- The kernel is a synthetic tree‑like/random walk hashing problem chosen largely for its optimization hooks, not for real‑world utility.
- Multiple commenters compare it to demoscene or code-golf work: packing operations into minimal cycles and exploiting instruction‑level and data‑level parallelism (a toy example after this list illustrates the idea).
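Here is a small sketch of what “packing operations into minimal cycles” means on a VLIW-style machine. The slot names (mem, alu0, alu1) and the one-cycle-per-bundle cost model are made up for illustration; the simulated machine's actual instruction format is richer.

```python
# Toy illustration of packing independent operations into VLIW bundles.
# Slot names and the one-cycle-per-bundle cost model are assumptions for
# illustration only, not the take-home's real machine definition.

def cycles(schedule):
    return len(schedule)  # one cycle per bundle in this toy model

# Naive schedule: one operation per bundle -> 6 cycles.
naive = [
    {"mem":  "load  v0, [a]"},
    {"mem":  "load  v1, [b]"},
    {"alu0": "vadd  v2, v0, v1"},
    {"alu0": "vmul  v3, v0, v1"},
    {"alu0": "vsub  v4, v2, v3"},
    {"mem":  "store [c], v4"},
]

# Packed schedule: the two independent ALU ops share one bundle -> 5 cycles.
packed = [
    {"mem":  "load  v0, [a]"},
    {"mem":  "load  v1, [b]"},
    {"alu0": "vadd  v2, v0, v1", "alu1": "vmul  v3, v0, v1"},
    {"alu0": "vsub  v4, v2, v3"},
    {"mem":  "store [c], v4"},
]

assert cycles(naive) == 6 and cycles(packed) == 5
print(cycles(naive), "->", cycles(packed), "cycles")
```

The whole game is finding operations with no data dependence between them so they can legally share a bundle, which is why the thread keeps invoking demoscene and code golf.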
Time limits, compensation, and candidate burden
- Confusion over the “2 hours” wording: is that a candidate limit, or just the time Claude used? Some say candidates had 4 hours; others thought longer was allowed.
- Many feel this is too large a task for an unpaid take‑home, especially given the low odds of an offer and the need to juggle multiple applications and life commitments.
- Some adopt a policy of refusing or asking to be paid for lengthy take‑homes; others note this effectively self‑excludes you from elite labs that have many willing applicants.
Hiring signal vs LeetCode
- Supporters like that this is tightly aligned with a performance‑engineering role, unlike generic LeetCode questions or CRUD apps.
- Critics note it selects for a narrow “optimizer” profile and doesn’t test system design, product sense, or teamwork, though defenders reply that’s fine for this specific role.
AI vs humans on the benchmark
- Multiple users ran various LLM agents against the task. Some models achieved large speedups, reaching cycle counts near or below human‑reported results, though not always beating Anthropic’s published Opus number.
- There’s concern about whether models might “cheat” by exploiting knowledge of expected outputs; others assume Anthropic manually inspected solutions and used cycle counts from the simulator.
Tone and perception of Anthropic
- The line “so we can be appropriately impressed and perhaps discuss interviewing” is widely debated. Some read it as playful and non‑committal; many find it condescending or elitist.
- A few see the whole setup as marketing for Claude’s performance rather than a genuinely candidate‑friendly exercise.
Reflections on difficulty and expertise
- Several experienced engineers say the task is a humbling reminder of how specialized low‑level performance work is.
- Others push back against treating this as a universal bar: it’s one niche “game” among many in software, and being bad at this doesn’t make you a bad engineer.