Show HN: How I topped the HuggingFace open LLM leaderboard on two gaming GPUs
Relationship to Existing Architectures
- Several commenters relate the technique to Mixture-of-Experts, looped / recurrent LLMs, and models like Ouro-LLM, LoopLM, and SOLAR that duplicate or reuse layers.
- OP’s method is described as orthogonal to MoE: it works on both dense and MoE models by repeating contiguous vertical chunks of the layer stack rather than sparsifying experts.
- Others note that adding/swapping/duplicating layers has prior art (ResNets, StyleGAN, upcycling), and recent papers argue that pre-LN transformers make middle layers near-identity, concentrating “real computation” in the middle.
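The "repeating vertical chunks" idea can be sketched as a depth layout. This is a toy illustration of passthrough-style block duplication, not the OP's actual recipe; the layer counts and the duplicated range are made up for the example.

```python
def duplicated_layout(n_layers, start, end, repeats=2):
    """Return the layer-index sequence after repeating the contiguous
    block [start, end) `repeats` times. In a real merge the weights of
    the block would be copied; here we only compute the depth layout."""
    prefix = list(range(0, start))
    block = list(range(start, end))
    suffix = list(range(end, n_layers))
    return prefix + block * repeats + suffix

# Example: a 12-layer model with the middle block (layers 4..7) run twice.
layout = duplicated_layout(12, 4, 8, repeats=2)
print(layout)
# → [0, 1, 2, 3, 4, 5, 6, 7, 4, 5, 6, 7, 8, 9, 10, 11]
```

Duplicating a single layer would be `duplicated_layout(12, 4, 5)`; the thread's finding is that only certain contiguous mid-blocks help.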
Middle Layers as “Organs” / Functional Circuits
- A key theme is that contiguous mid-layer blocks behave like emergent “organs” or circuits: duplicating whole blocks improves performance, but duplicating single layers or arbitrary mixes of layers does not.
- Heatmaps across layers are interpreted as showing boundaries between such organs (e.g., input encoding, “reasoning”, output decoding).
- Commenters link this to CKA analyses and other work showing neighboring middle layers have similar representations and residual connections preserve a stable latent space.
- There is debate whether these patterns are universal structures or artifacts of particular training procedures.
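The CKA evidence commenters cite can be computed directly. Below is a minimal linear CKA (in the sense of Kornblith et al., 2019) in pure stdlib Python; the input matrices are made-up stand-ins for per-layer activations (one row per input example).

```python
# Linear CKA between two representation matrices X, Y (lists of rows).
# High CKA between neighboring middle layers is the similarity evidence
# discussed above; the data here is illustrative only.

def _center(m):
    n, d = len(m), len(m[0])
    means = [sum(row[j] for row in m) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in m]

def _gram(m):
    # m @ m.T — pairwise inner products between examples
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in m] for r1 in m]

def _frob_inner(a, b):
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def linear_cka(x, y):
    kx, ky = _gram(_center(x)), _gram(_center(y))
    return _frob_inner(kx, ky) / (
        _frob_inner(kx, kx) ** 0.5 * _frob_inner(ky, ky) ** 0.5)

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.5]]
y = [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0], [4.0, 1.0]]  # same features, rescaled
print(round(linear_cka(x, y), 6))  # CKA is scale-invariant → 1.0
```

Scale and rotation invariance is what makes CKA usable across layers whose raw activations live in differently-scaled bases.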
Base64 and Latent “Thought Language”
- Many are struck by the observation that LLMs can read/write base64 or hex, reason over it, and convert back, despite seemingly limited exposure to such text.
- Some argue models have likely seen enough base64 in web and email corpora; others stress that the behavior still implies an internal “translation circuit” that maps encoded text into a common reasoning space.
- This motivates the broader hypothesis of a shared latent “thought language” used across modalities and encodings.
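For concreteness, these are the encodings under discussion, via the stdlib; any probe of the claimed “translation circuit” starts from roundtrips like these (the sample sentence is arbitrary).

```python
import base64

text = "The middle layers translate this back."
b64 = text.encode().hex()  # hex form of the same bytes
b64 = base64.b64encode(text.encode()).decode()
hx = text.encode().hex()

print(b64)
print(hx)

# Both encodings are losslessly invertible, so a model answering questions
# about the encoded string must be doing the inverse mapping internally.
assert base64.b64decode(b64).decode() == text
assert bytes.fromhex(hx).decode() == text
```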
Experiments, Tools, and Limitations
- Duplicating individual layers or repeating the same block many times generally hurts performance; gains appear only for specific mid-blocks and limited repetition.
- Experiments with multiple disjoint duplicated regions, and with meta-models (e.g., XGBoost) trained to predict which merges will score well, are mentioned, but details are deferred to future posts.
- Combinatorial explosion is a recurring concern when considering arbitrary reordering or routing between layers.
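The combinatorics concern can be made concrete by counting candidate configurations for an n-layer model: contiguous-block duplication stays searchable, while arbitrary reordering does not. The layer count is illustrative.

```python
from math import factorial

def contiguous_blocks(n):
    """Number of contiguous (start, end) blocks available to duplicate:
    n choices of start x remaining lengths = n*(n+1)/2."""
    return n * (n + 1) // 2

n = 32  # e.g., a typical 7B-class model depth
print(contiguous_blocks(n))  # → 528 candidate blocks
print(factorial(n))          # arbitrary reorderings: 32! ≈ 2.6e35
```

This is why the experiments above restrict themselves to whole contiguous blocks and limited repetition counts.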
Speculation and Future Directions
- Ideas raised include:
- Looping specific reasoning blocks versus whole-model loops.
- Dynamic routing that chooses which layer or block to apply next.
- Variable-depth inference (“how hard to think” knob per token).
- Pluggable “knowledge banks” or standardized encode/logic/decode modules.
- Combining organs from different models, or adding new modalities via surgery.
- Commenters note that hobbyist “LLM brain surgery” is exploring spaces corporate and academic work may have deprioritized due to cost or focus.
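The variable-depth idea above can be sketched as control flow: loop a mid-block a variable number of times and halt when the state stops changing. The block here is a toy numeric contraction standing in for a transformer block; the halting rule is purely illustrative.

```python
def loop_block(state, block, max_loops=32, tol=1e-4):
    """Apply `block` repeatedly until the update is small, returning
    (output, depth actually used) — a crude 'how hard to think' knob."""
    for i in range(max_loops):
        new_state = block(state)
        if abs(new_state - state) < tol:  # cheap halting criterion
            return new_state, i + 1
        state = new_state
    return state, max_loops

# An "easy" input (near the fixed point) converges in fewer loops
# than a "hard" one, so compute scales with difficulty.
block = lambda x: 0.5 * x + 1.0  # toy block with fixed point at 2.0
out_easy, depth_easy = loop_block(1.9, block)
out_hard, depth_hard = loop_block(100.0, block)
print(depth_easy, depth_hard)
```

Whole-model loops correspond to `block` being the full stack; the thread's speculation is about looping only the mid-layer “reasoning” organ instead.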
Community Reaction and Open Questions
- The thread is overwhelmingly enthusiastic about the ingenuity, clarity of the writeup, and the sense of “poking a synthetic brain.”
- Some view the findings as surprising and under-appreciated; others see them as a natural consequence of residual architectures and known optimization behavior.
- Open questions include how general these organs are across tasks, models, and sizes, and whether training with loops from the start would outperform post-hoc surgery.