Show HN: How I topped the HuggingFace open LLM leaderboard on two gaming GPUs
Relationship to Existing Architectures
- Several commenters relate the technique to Mixture-of-Experts, looped / recurrent LLMs, and models like Ouro-LLM, LoopLM, and SOLAR that duplicate or reuse layers.
- OP’s method is described as orthogonal to MoE: it works on both dense and MoE models by repeating contiguous vertical chunks of the layer stack rather than sparsifying experts.
- Others note that adding/swapping/duplicating layers has prior art (ResNets, StyleGAN, upcycling), and recent papers argue that pre-LN transformers make middle layers near-identity, concentrating “real computation” in the middle.
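The "repeating vertical chunks" idea can be sketched as a depth layout. This is a toy illustration of passthrough-style block duplication, not the OP's actual recipe; the layer counts and the duplicated range are made up for the example.

```python
def duplicated_layout(n_layers, start, end, repeats=2):
    """Return the layer-index sequence after repeating the contiguous
    block [start, end) `repeats` times. In a real merge the weights of
    the block would be copied; here we only compute the depth layout."""
    prefix = list(range(0, start))
    block = list(range(start, end))
    suffix = list(range(end, n_layers))
    return prefix + block * repeats + suffix

# Example: a 12-layer model with the middle block (layers 4..7) run twice.
layout = duplicated_layout(12, 4, 8, repeats=2)
print(layout)
# → [0, 1, 2, 3, 4, 5, 6, 7, 4, 5, 6, 7, 8, 9, 10, 11]
```

Duplicating a single layer would be `duplicated_layout(12, 4, 5)`; the thread's finding is that only certain contiguous mid-blocks help.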
Middle Layers as “Organs” / Functional Circuits
- A key theme is that contiguous mid-layer blocks behave like emergent “organs” or circuits: duplicating whole blocks improves performance, but duplicating single layers or arbitrary mixes of layers does not.
- Heatmaps across layers are interpreted as showing boundaries between such organs (e.g., input encoding, “reasoning”, output decoding).
- Commenters link this to CKA analyses and other work showing neighboring middle layers have similar representations and residual connections preserve a stable latent space.
- There is debate whether these patterns are universal structures or artifacts of particular training procedures.
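The CKA evidence commenters cite can be computed directly. Below is a minimal linear CKA (in the sense of Kornblith et al., 2019) in pure stdlib Python; the input matrices are made-up stand-ins for per-layer activations (one row per input example).

```python
# Linear CKA between two representation matrices X, Y (lists of rows).
# High CKA between neighboring middle layers is the similarity evidence
# discussed above; the data here is illustrative only.

def _center(m):
    n, d = len(m), len(m[0])
    means = [sum(row[j] for row in m) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in m]

def _gram(m):
    # m @ m.T — pairwise inner products between examples
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in m] for r1 in m]

def _frob_inner(a, b):
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def linear_cka(x, y):
    kx, ky = _gram(_center(x)), _gram(_center(y))
    return _frob_inner(kx, ky) / (
        _frob_inner(kx, kx) ** 0.5 * _frob_inner(ky, ky) ** 0.5)

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.5]]
y = [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0], [4.0, 1.0]]  # same features, rescaled
print(round(linear_cka(x, y), 6))  # CKA is scale-invariant → 1.0
```

Scale and rotation invariance is what makes CKA usable across layers whose raw activations live in differently-scaled bases.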
Base64 and Latent “Thought Language”
- Many are struck by the observation that LLMs can read/write base64 or hex, reason over it, and convert back, despite seemingly limited exposure to such text.
- Some argue models have likely seen enough base64 in web and email corpora; others stress that the behavior still implies an internal “translation circuit” that maps encoded text into a common reasoning space.
- This motivates the broader hypothesis of a shared latent “thought language” used across modalities and encodings.
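For concreteness, these are the encodings under discussion, via the stdlib; any probe of the claimed “translation circuit” starts from roundtrips like these (the sample sentence is arbitrary).

```python
import base64

text = "The middle layers translate this back."
b64 = text.encode().hex()  # hex form of the same bytes
b64 = base64.b64encode(text.encode()).decode()
hx = text.encode().hex()

print(b64)
print(hx)

# Both encodings are losslessly invertible, so a model answering questions
# about the encoded string must be doing the inverse mapping internally.
assert base64.b64decode(b64).decode() == text
assert bytes.fromhex(hx).decode() == text
```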
Experiments, Tools, and Limitations
- Duplicating individual layers or repeating the same block many times generally hurts performance; gains appear only for specific mid-blocks and limited repetition.
- Experiments with multiple disjoint duplicated regions, and with meta-models (e.g., XGBoost) trained to predict which merges will score well, are mentioned, but details are deferred to future posts.
- Combinatorial explosion is a recurring concern when considering arbitrary reordering or routing between layers.
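The combinatorics concern can be made concrete by counting candidate configurations for an n-layer model: contiguous-block duplication stays searchable, while arbitrary reordering does not. The layer count is illustrative.

```python
from math import factorial

def contiguous_blocks(n):
    """Number of contiguous (start, end) blocks available to duplicate:
    n choices of start x remaining lengths = n*(n+1)/2."""
    return n * (n + 1) // 2

n = 32  # e.g., a typical 7B-class model depth
print(contiguous_blocks(n))  # → 528 candidate blocks
print(factorial(n))          # arbitrary reorderings: 32! ≈ 2.6e35
```

This is why the experiments above restrict themselves to whole contiguous blocks and limited repetition counts.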
Speculation and Future Directions
- Ideas raised include:
- Looping specific reasoning blocks versus whole-model loops.
- Dynamic routing that chooses which layer or block to apply next.
- Variable-depth inference (“how hard to think” knob per token).
- Pluggable “knowledge banks” or standardized encode/logic/decode modules.
- Combining organs from different models, or adding new modalities via surgery.
- Commenters note that hobbyist “LLM brain surgery” is exploring spaces corporate and academic work may have deprioritized due to cost or focus.
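The variable-depth idea above can be sketched as control flow: loop a mid-block a variable number of times and halt when the state stops changing. The block here is a toy numeric contraction standing in for a transformer block; the halting rule is purely illustrative.

```python
def loop_block(state, block, max_loops=32, tol=1e-4):
    """Apply `block` repeatedly until the update is small, returning
    (output, depth actually used) — a crude 'how hard to think' knob."""
    for i in range(max_loops):
        new_state = block(state)
        if abs(new_state - state) < tol:  # cheap halting criterion
            return new_state, i + 1
        state = new_state
    return state, max_loops

# An "easy" input (near the fixed point) converges in fewer loops
# than a "hard" one, so compute scales with difficulty.
block = lambda x: 0.5 * x + 1.0  # toy block with fixed point at 2.0
out_easy, depth_easy = loop_block(1.9, block)
out_hard, depth_hard = loop_block(100.0, block)
print(depth_easy, depth_hard)
```

Whole-model loops correspond to `block` being the full stack; the thread's speculation is about looping only the mid-layer “reasoning” organ instead.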
Community Reaction and Open Questions
- The thread is overwhelmingly enthusiastic about the ingenuity, clarity of the writeup, and the sense of “poking a synthetic brain.”
- Some view the findings as surprising and under-appreciated; others see them as a natural consequence of residual architectures and known optimization behavior.
- Open questions include how general these organs are across tasks, models, and sizes, and whether training with loops from the start would outperform post-hoc surgery.