Tenstorrent and the State of AI Hardware Startups
Tenstorrent and Non‑Nvidia Hardware Economics
- Some operators interested in “democratizing compute” report that demand is overwhelmingly Nvidia-centric; renting “fringe” hardware like Tenstorrent is a tough sell today.
- Catch‑22: without users, alternative hardware doesn’t get ecosystem support; without ecosystem, users won’t switch.
Memory Capacity as a Key Differentiator
- Multiple commenters argue Tenstorrent’s cards are not compelling vs consumer Nvidia GPUs: similar or lower memory/bandwidth, weaker software, and only modestly cheaper.
- Suggestion: dramatically increasing on‑card memory (e.g., 48–96GB, even on mediocre GPUs) could attract hobbyists and drive community‑built software stacks, breaking CUDA lock‑in.
- AMD is cited as an example of “good enough” hardware but weak ecosystem and limited ROCm support.
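For context on the memory argument, a back‑of‑envelope sketch of how much VRAM model weights alone consume at different precisions (illustrative arithmetic, not vendor specs; KV cache and activations add more on top):

```python
def weight_vram_gb(params_billion: float, bits_per_param: float) -> float:
    """Rough VRAM (decimal GB) needed just for model weights,
    ignoring KV cache, activations, and runtime overhead."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

print(weight_vram_gb(70, 16))  # 70B model in fp16   -> 140.0 GB
print(weight_vram_gb(70, 4))   # 70B model at 4-bit  ->  35.0 GB
print(weight_vram_gb(8, 16))   # 8B model in fp16    ->  16.0 GB
```

By this rough math, a 48–96GB card would hold a 4‑bit‑quantized 70B model with room to spare, which is the kind of hobbyist workload the commenters have in mind.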
Competing AI Hardware Startups (Groq, Cerebras)
- Some skepticism about Groq’s economics and architecture: commenters claim Groq needs hundreds or thousands of chips to serve a single large model, and that it mis‑forecast how large LLMs would become.
- Cerebras is described as operationally challenging: exotic cooling, concerns about reliability and replacement, and a “never turn it off” warranty clause.
- Others counter that Cerebras runs Llama very fast; efficiency, power, and capex per token are argued to matter more than peak speed.
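The “capex and power per token” argument can be made concrete with a hedged back‑of‑envelope model; every input below is a hypothetical placeholder, not a figure for any real Groq or Cerebras system:

```python
def cost_per_million_tokens(capex_usd: float, lifetime_years: float,
                            power_kw: float, usd_per_kwh: float,
                            tokens_per_sec: float) -> float:
    """Back-of-envelope $ per 1M tokens: amortized hardware cost plus
    electricity, assuming 100% utilization over the whole lifetime."""
    seconds = lifetime_years * 365 * 24 * 3600
    total_tokens = tokens_per_sec * seconds
    energy_cost = power_kw * (seconds / 3600) * usd_per_kwh
    return (capex_usd + energy_cost) / total_tokens * 1e6

# Hypothetical system: $2M capex, 5-year life, 100 kW draw,
# $0.10/kWh, 10,000 tokens/sec sustained.
print(round(cost_per_million_tokens(2_000_000, 5, 100, 0.10, 10_000), 2))  # 1.55
```

The point of the exercise: a system twice as fast but four times the capex can easily lose on this metric, which is why commenters argue peak tokens/sec alone is misleading.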
Nvidia/AMD Dominance and Toolchains
- Frustration with Nvidia’s build tooling and drivers, but also recognition that their end‑to‑end stack is still unmatched.
- One view blames “shareholder rent‑seeking” for poor user experience; another stresses that the systems are inherently complex, fast‑moving, and buggy across all layers, not just drivers.
- If cheaper or faster alternatives that ran mainstream ML frameworks existed, many say they would switch, but no vendor has clearly delivered one yet.
“AI Hardware” vs Traditional HPC
- Some argue current “AI hardware” is essentially HPC with an AI‑focused marketing layer and will remain generally useful beyond the present AI boom.
- Others ask what non‑AI workloads would realistically justify such accelerators; no clear consensus emerges.
Future AI Workloads: Matmul vs Mixed Workloads
- Tenstorrent’s bet on mixed CPU+accelerator workloads is noted; commenters observe it hasn’t yet paid off in training, where dense linear algebra (matmul) still dominates.
- There is speculation that simply scaling the same decades‑old paradigm (bigger models, more data, more hardware) may be nearing limits, but no agreed‑upon “what’s next.”
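As a rough illustration of why matmul dominates training, a sketch counting only the dense‑matmul FLOPs in one transformer layer (the layer shape is a hypothetical Llama‑like configuration, not a claim about any specific model):

```python
def transformer_layer_flops(seq_len: int, d_model: int, d_ff: int) -> int:
    """Rough FLOP count for one transformer layer's dense matmuls
    (Q/K/V/output projections plus a two-matmul FFN); attention-score
    math, softmax, and norms are ignored as comparatively small."""
    proj = 4 * seq_len * d_model * d_model  # four d_model x d_model projections
    ffn = 2 * seq_len * d_model * d_ff      # up- and down-projection
    return 2 * (proj + ffn)                 # 2 FLOPs per multiply-accumulate

# Hypothetical Llama-like layer: d_model=4096, d_ff=14336, 4k context.
print(transformer_layer_flops(4096, 4096, 14336))  # ~1.5e12 FLOPs per layer
```

With trillions of FLOPs per layer per forward pass coming from a handful of big matmuls, hardware that doesn’t excel at dense linear algebra has little left to win on.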
LLMs, Junior Engineers, and Productivity
- Strong claims appear that modern LLMs (e.g., large models like Llama 3.1 405B or proprietary systems) let individuals produce code at or above junior level, raising questions about junior hiring.
- Many describe large productivity gains: rapid implementation of utilities, web/audio components, or even full apps with tests, by combining existing codebases with LLM refactors.
- Critics argue most real software involves complex requirements, integration, and long‑term maintenance, where LLMs still struggle—especially on large, intricate systems or novel, hardware‑constrained problems.
- There is concern that using LLMs to avoid hiring juniors is shortsighted: it reduces the pipeline of future seniors and shifts work to a few highly leveraged senior engineers plus tools.
Quality, Code Bloat, and Maintainability
- Some report LLMs excel on small, greenfield tasks but degrade on larger codebases; others report the opposite when giving models full project context.
- Many note LLM‑generated code often looks plausible but is subtly wrong, especially for complex frameworks, financial logic, or non‑idiomatic patterns, leading to “knowledge debt.”
- Several worry that super‑cheap code generation will inflate codebases, increasing bugs and long‑term maintenance costs without visible improvement in software quality.
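A toy illustration of the “plausible but subtly wrong” failure mode in financial logic (a classic floating‑point pitfall chosen for illustration, not drawn from any specific LLM output):

```python
from decimal import Decimal

# Plausible-looking but subtly wrong: binary floats for currency.
subtotal = 0.1 + 0.2
print(subtotal == 0.3)  # False -- subtotal is actually 0.30000000000000004

# The boring-but-correct version uses decimal arithmetic.
total = Decimal("0.10") + Decimal("0.20")
print(total == Decimal("0.30"))  # True
```

Code like the first version passes casual review and many happy‑path tests, which is exactly why commenters call the resulting liability “knowledge debt.”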
Training and Learning for Juniors in an LLM World
- Concern: juniors may stop understanding fundamentals, blindly pasting AI output, unable to “run code in their head.”
- Suggestions:
  - Don’t allow juniors to merge code they can’t explain; use Socratic questioning to enforce understanding.
  - Assign harder tasks if AI makes current ones trivial, to keep learning pressure on.
  - Use LLMs as patient tutors rather than code printers; combine them with reading docs and idiomatic examples.
- Some argue this is just another generational shift in abstraction: future devs may be judged on their ability to specify and direct LLMs, not to hand‑craft loops and boilerplate.
ARM–Qualcomm Dispute and RISC‑V Implications
- The ARM–Qualcomm/Nuvia licensing battle is debated, with conflicting interpretations of who breached architecture license agreements (ALAs).
- Key points from the thread:
  - Qualcomm allegedly used Nuvia‑derived cores under Qualcomm’s cheaper ALA instead of Nuvia’s server‑oriented one; ARM disputes this and revoked certain rights.
  - The exact contracts are secret; commenters stress that without seeing them, it’s unclear who is legally “right,” though both sides claim the other breached.
- Some see ARM’s behavior as a warning against sole‑source licensed IP and a driver pushing startups toward RISC‑V. Others argue that clauses requiring consent for IP transfer are standard, and that Nuvia’s team would have known about them.
RISC‑V Ecosystem and Technical Debates
- One line of criticism claims parts of the RISC‑V community are “refighting old wars,” locking in questionable core design choices and prematurely ossifying the standard.
- Others push back, asking for specifics and arguing:
  - RISC‑V compressed instructions are relatively easy to decode and don’t fundamentally hinder wide decoders.
  - The ecosystem is large and collaborative; no single company (e.g., a major IP vendor) fully controls it.
  - There are already higher‑performance cores (e.g., XiangShan) and ongoing work on vector extensions that may deliver scalable performance on existing binaries.
- The discussion ends without resolution; accusations of vagueness and lack of concrete criticism remain.
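On the compressed‑instruction point: in the base RISC‑V encoding scheme, instruction length is determined by the two low bits of the first halfword alone, which is one reason commenters call RVC relatively cheap to handle in a decoder. A minimal sketch:

```python
def rv_instruction_length(first_halfword: int) -> int:
    """Instruction length in bytes from the first 16 bits: the two low
    opcode bits are 11 for standard 32-bit instructions; anything else
    is a 16-bit compressed (RVC) instruction. Longer reserved formats
    are ignored here."""
    return 4 if (first_halfword & 0b11) == 0b11 else 2

print(rv_instruction_length(0x0001))  # c.nop (compressed)          -> 2
print(rv_instruction_length(0x0013))  # low half of addi x0,x0,0    -> 4
```

Finding instruction boundaries in parallel across a fetch group still takes extra logic in a wide decoder, but the per‑halfword length check itself is trivial.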