Tenstorrent and the State of AI Hardware Startups
Tenstorrent and Non‑Nvidia Hardware Economics
- Some operators interested in “democratizing compute” report that demand is overwhelmingly Nvidia-centric; renting “fringe” hardware like Tenstorrent is a tough sell today.
- Catch‑22: without users, alternative hardware doesn’t get ecosystem support; without ecosystem, users won’t switch.
Memory Capacity as a Key Differentiator
- Multiple commenters argue Tenstorrent’s cards are not compelling vs consumer Nvidia GPUs: similar or lower memory/bandwidth, weaker software, and only modestly cheaper.
- Suggestion: dramatically increasing on‑card memory (e.g., 48–96GB, even on mediocre GPUs) could attract hobbyists and drive community‑built software stacks, breaking CUDA lock‑in.
- AMD is cited as an example of “good enough” hardware but weak ecosystem and limited ROCm support.
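For context on the memory argument, a back‑of‑envelope sketch of how much VRAM model weights alone consume at different precisions (illustrative arithmetic, not vendor specs; KV cache and activations add more on top):

```python
def weight_vram_gb(params_billion: float, bits_per_param: float) -> float:
    """Rough VRAM (decimal GB) needed just for model weights,
    ignoring KV cache, activations, and runtime overhead."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

print(weight_vram_gb(70, 16))  # 70B model in fp16   -> 140.0 GB
print(weight_vram_gb(70, 4))   # 70B model at 4-bit  ->  35.0 GB
print(weight_vram_gb(8, 16))   # 8B model in fp16    ->  16.0 GB
```

By this rough math, a 48–96GB card would hold a 4‑bit‑quantized 70B model with room to spare, which is the kind of hobbyist workload the commenters have in mind.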
Competing AI Hardware Startups (Groq, Cerebras)
- Some skepticism about Groq’s economics and architecture: commenters claim Groq needs hundreds or thousands of chips to serve a single large model, and that it mis‑forecast how large LLMs would become.
- Cerebras is described as operationally challenging: exotic cooling, concerns about reliability and replacement, and a “never turn it off” warranty clause.
- Others counter that Cerebras runs Llama very fast; efficiency, power, and capex per token are argued to matter more than peak speed.
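The “capex and power per token” argument can be made concrete with a hedged back‑of‑envelope model; every input below is a hypothetical placeholder, not a figure for any real Groq or Cerebras system:

```python
def cost_per_million_tokens(capex_usd: float, lifetime_years: float,
                            power_kw: float, usd_per_kwh: float,
                            tokens_per_sec: float) -> float:
    """Back-of-envelope $ per 1M tokens: amortized hardware cost plus
    electricity, assuming 100% utilization over the whole lifetime."""
    seconds = lifetime_years * 365 * 24 * 3600
    total_tokens = tokens_per_sec * seconds
    energy_cost = power_kw * (seconds / 3600) * usd_per_kwh
    return (capex_usd + energy_cost) / total_tokens * 1e6

# Hypothetical system: $2M capex, 5-year life, 100 kW draw,
# $0.10/kWh, 10,000 tokens/sec sustained.
print(round(cost_per_million_tokens(2_000_000, 5, 100, 0.10, 10_000), 2))  # 1.55
```

The point of the exercise: a system twice as fast but four times the capex can easily lose on this metric, which is why commenters argue peak tokens/sec alone is misleading.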
Nvidia/AMD Dominance and Toolchains
- Frustration with Nvidia’s build tooling and drivers, but also recognition that their end‑to‑end stack is still unmatched.
- One view blames “shareholder rent‑seeking” for poor user experience; another stresses that the systems are inherently complex, fast‑moving, and buggy across all layers, not just drivers.
- If cheaper or faster alternatives that ran mainstream ML frameworks existed, many say they would switch, but no vendor has clearly delivered one yet.
“AI Hardware” vs Traditional HPC
- Some argue current “AI hardware” is essentially HPC with an AI‑focused marketing layer and will remain generally useful beyond the present AI boom.
- Others ask what non‑AI workloads would realistically justify such accelerators; no clear consensus emerges.
Future AI Workloads: Matmul vs Mixed Workloads
- Tenstorrent’s bet on mixed CPU+accelerator workloads is noted; commenters observe it hasn’t yet paid off in training, where dense linear algebra (matmul) still dominates.
- There is speculation that simply scaling the same decades‑old paradigm (bigger models, more data, more hardware) may be nearing limits, but no agreed‑upon “what’s next.”
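As a rough illustration of why matmul dominates training, a sketch counting only the dense‑matmul FLOPs in one transformer layer (the layer shape is a hypothetical Llama‑like configuration, not a claim about any specific model):

```python
def transformer_layer_flops(seq_len: int, d_model: int, d_ff: int) -> int:
    """Rough FLOP count for one transformer layer's dense matmuls
    (Q/K/V/output projections plus a two-matmul FFN); attention-score
    math, softmax, and norms are ignored as comparatively small."""
    proj = 4 * seq_len * d_model * d_model  # four d_model x d_model projections
    ffn = 2 * seq_len * d_model * d_ff      # up- and down-projection
    return 2 * (proj + ffn)                 # 2 FLOPs per multiply-accumulate

# Hypothetical Llama-like layer: d_model=4096, d_ff=14336, 4k context.
print(transformer_layer_flops(4096, 4096, 14336))  # ~1.5e12 FLOPs per layer
```

With trillions of FLOPs per layer per forward pass coming from a handful of big matmuls, hardware that doesn’t excel at dense linear algebra has little left to win on.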
LLMs, Junior Engineers, and Productivity
- Strong claims appear that modern LLMs (e.g., large models like Llama 3.1 405B or proprietary systems) let individuals produce code at or above junior level, raising questions about junior hiring.
- Many describe large productivity gains: rapid implementation of utilities, web/audio components, or even full apps with tests, by combining existing codebases with LLM refactors.
- Critics argue most real software involves complex requirements, integration, and long‑term maintenance, where LLMs still struggle—especially on large, intricate systems or novel, hardware‑constrained problems.
- There is concern that using LLMs to avoid hiring juniors is shortsighted: it reduces the pipeline of future seniors and shifts work to a few highly leveraged senior engineers plus tools.
Quality, Code Bloat, and Maintainability
- Some report LLMs excel on small, greenfield tasks but degrade on larger codebases; others report the opposite when giving models full project context.
- Many note LLM‑generated code often looks plausible but is subtly wrong, especially for complex frameworks, financial logic, or non‑idiomatic patterns, leading to “knowledge debt.”
- Several worry that super‑cheap code generation will inflate codebases, increasing bugs and long‑term maintenance costs without visible improvement in software quality.
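A toy illustration of the “plausible but subtly wrong” failure mode in financial logic (a classic floating‑point pitfall chosen for illustration, not drawn from any specific LLM output):

```python
from decimal import Decimal

# Plausible-looking but subtly wrong: binary floats for currency.
subtotal = 0.1 + 0.2
print(subtotal == 0.3)  # False -- subtotal is actually 0.30000000000000004

# The boring-but-correct version uses decimal arithmetic.
total = Decimal("0.10") + Decimal("0.20")
print(total == Decimal("0.30"))  # True
```

Code like the first version passes casual review and many happy‑path tests, which is exactly why commenters call the resulting liability “knowledge debt.”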
Training and Learning for Juniors in an LLM World
- Concern: juniors may stop understanding fundamentals, blindly pasting AI output, unable to “run code in their head.”
- Suggestions:
  - Don’t allow juniors to merge code they can’t explain; use Socratic questioning to enforce understanding.
  - Assign harder tasks if AI makes current ones trivial, to keep learning pressure on.
  - Use LLMs as patient tutors rather than code printers; combine them with reading docs and idiomatic examples.
- Some argue this is just another generational shift in abstraction: future devs may be judged on their ability to specify and direct LLMs, not to hand‑craft loops and boilerplate.
ARM–Qualcomm Dispute and RISC‑V Implications
- The ARM–Qualcomm/Nuvia licensing battle is debated, with conflicting interpretations of who breached architecture license agreements (ALAs).
- Key points from the thread:
  - Qualcomm allegedly used Nuvia‑derived cores under Qualcomm’s cheaper ALA instead of Nuvia’s server‑oriented one; ARM disputes this and revoked certain rights.
  - The exact contracts are secret; commenters stress that without seeing them, it’s unclear who is legally “right,” though both sides claim the other breached.
- Some see ARM’s behavior as a warning against sole‑source licensed IP and a driver pushing startups toward RISC‑V. Others argue that clauses requiring consent for IP transfer are standard, and that Nuvia’s team would have known about them.
RISC‑V Ecosystem and Technical Debates
- One line of criticism claims parts of the RISC‑V community are “refighting old wars,” locking in questionable core design choices and prematurely ossifying the standard.
- Others push back, asking for specifics and arguing:
  - RISC‑V compressed instructions are relatively easy to decode and don’t fundamentally hinder wide decoders.
  - The ecosystem is large and collaborative; no single company (e.g., a major IP vendor) fully controls it.
  - There are already higher‑performance cores (e.g., XiangShan) and ongoing work on vector extensions that may deliver scalable performance on existing binaries.
- The discussion ends without resolution; accusations of vagueness and lack of concrete criticism remain.
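On the compressed‑instruction point: in the base RISC‑V encoding scheme, instruction length is determined by the two low bits of the first halfword alone, which is one reason commenters call RVC relatively cheap to handle in a decoder. A minimal sketch:

```python
def rv_instruction_length(first_halfword: int) -> int:
    """Instruction length in bytes from the first 16 bits: the two low
    opcode bits are 11 for standard 32-bit instructions; anything else
    is a 16-bit compressed (RVC) instruction. Longer reserved formats
    are ignored here."""
    return 4 if (first_halfword & 0b11) == 0b11 else 2

print(rv_instruction_length(0x0001))  # c.nop (compressed)          -> 2
print(rv_instruction_length(0x0013))  # low half of addi x0,x0,0    -> 4
```

Finding instruction boundaries in parallel across a fetch group still takes extra logic in a wide decoder, but the per‑halfword length check itself is trivial.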