Show HN: How I topped the HuggingFace open LLM leaderboard on two gaming GPUs

Relationship to Existing Architectures

  • Several commenters relate the technique to Mixture-of-Experts, looped / recurrent LLMs, and models like Ouro-LLM, LoopLM, and SOLAR that duplicate or reuse layers.
  • OP’s method is described as orthogonal to MoE: it repeats contiguous vertical chunks of the layer stack rather than sparsifying experts, so it applies to dense and MoE models alike.
  • Others note that adding/swapping/duplicating layers has prior art (ResNets, StyleGAN, upcycling), and recent papers argue that pre-LN transformers make middle layers near-identity, concentrating “real computation” in the middle.
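The "repeating vertical chunks of the stack" operation can be sketched in a few lines. This is a toy illustration, not the OP's actual code: the list stands in for a transformer's layer stack, and real surgery would operate on module lists and decide between weight-shared and deep-copied repeats.

```python
def duplicate_block(layers, start, end, repeats=2):
    """Repeat the contiguous block layers[start:end] `repeats` times.

    With plain references the repeated block shares weights (the same
    layers are traversed again on the second pass); for independent
    copies you would deep-copy each module instead.
    """
    block = layers[start:end]
    return layers[:start] + block * repeats + layers[end:]

# Toy 12-"layer" stack; in practice these would be decoder layers.
stack = [f"layer_{i}" for i in range(12)]
widened = duplicate_block(stack, 4, 8, repeats=2)  # 16 entries
```

The same function covers the "single layer" and "whole block" cases the thread contrasts: `end - start == 1` duplicates one layer, larger spans duplicate a candidate "organ".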

Middle Layers as “Organs” / Functional Circuits

  • A key theme is that contiguous mid-layer blocks behave like emergent “organs” or circuits: duplicating whole blocks improves performance, but single layers or arbitrary mixes do not.
  • Heatmaps across layers are interpreted as showing boundaries between such organs (e.g., input encoding, “reasoning”, output decoding).
  • Commenters link this to CKA analyses and other work showing neighboring middle layers have similar representations and residual connections preserve a stable latent space.
  • There is debate whether these patterns are universal structures or artifacts of particular training procedures.
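For readers who want to reproduce the CKA-style layer comparisons mentioned above, linear CKA between two activation matrices is only a few lines of NumPy. The thread does not specify which CKA variant commenters used; this is the standard linear form, as a sketch:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between activation matrices X, Y of shape (n_samples, dim).

    Returns a value in [0, 1]; 1 means the representations are identical
    up to isotropic scaling and orthogonal transformation.
    """
    X = X - X.mean(axis=0)  # center features
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)
```

Computing this for every pair of layers' activations on a fixed prompt set yields exactly the kind of layer-by-layer heatmap the thread interprets as organ boundaries.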

Base64 and Latent “Thought Language”

  • Many are struck by the observation that LLMs can read/write base64 or hex, reason over it, and convert back, despite seemingly limited exposure to such text.
  • Some argue models have likely seen enough base64 in web and email corpora; others stress that the behavior still implies an internal “translation circuit” that maps encoded text into a common reasoning space.
  • This motivates the broader hypothesis of a shared latent “thought language” used across modalities and encodings.
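The base64 observation is easy to probe yourself. A minimal sketch of a round-trip probe follows; the prompt wording is illustrative, not taken from the thread:

```python
import base64

def make_base64_probe(question: str):
    """Encode a question as base64 and wrap it in a decode-and-answer prompt."""
    encoded = base64.b64encode(question.encode("utf-8")).decode("ascii")
    prompt = (
        "The following string is base64-encoded. Decode it and answer "
        f"the question it contains:\n{encoded}"
    )
    return prompt, encoded

prompt, encoded = make_base64_probe("What is the capital of France?")
# Sanity check: the encoding round-trips outside the model.
assert base64.b64decode(encoded).decode("utf-8") == "What is the capital of France?"
```

Sending `prompt` to a model and checking whether the answer matches the plain-text question is the behavior commenters find surprising: succeeding requires something like an internal translation circuit from the encoding into the model's usual reasoning space.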

Experiments, Tools, and Limitations

  • Duplicating individual layers or repeating the same block many times generally hurts performance; gains appear only for specific mid-blocks repeated a small number of times.
  • Multiple disjoint duplicated regions, as well as meta-models (e.g., XGBoost) trained to predict which merges will score well, have been tried, but details are deferred to future posts.
  • Combinatorial explosion is a recurring concern when considering arbitrary reordering or routing between layers.
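The combinatorial concern is concrete even in the restricted case of a single contiguous duplicated block: the search space is every (start, size, repeat-count) triple, and it grows quickly with depth. A sketch of the enumeration, with illustrative limits on block size and repetition:

```python
from itertools import product

def candidate_merges(n_layers, max_block=6, max_repeats=3):
    """Yield (start, end, repeats) for one contiguous duplicated block.

    Block sizes run from 2 to max_block, repeats from 2 to max_repeats;
    limits here are arbitrary illustrations, not values from the post.
    """
    for start, size, repeats in product(
        range(n_layers), range(2, max_block + 1), range(2, max_repeats + 1)
    ):
        end = start + size
        if end <= n_layers:
            yield (start, end, repeats)

configs = list(candidate_merges(12))
```

Even this toy grid yields dozens of candidates for a 12-layer model; allowing multiple disjoint regions, arbitrary reordering, or per-token routing multiplies the space far beyond exhaustive evaluation, which is what motivates the meta-model approach mentioned above.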

Speculation and Future Directions

  • Ideas raised include:
    • Looping specific reasoning blocks versus whole-model loops.
    • Dynamic routing that chooses which layer or block to apply next.
    • Variable-depth inference (“how hard to think” knob per token).
    • Pluggable “knowledge banks” or standardized encode/logic/decode modules.
    • Combining organs from different models, or adding new modalities via surgery.
  • Commenters note that hobbyist “LLM brain surgery” is exploring spaces corporate and academic work may have deprioritized due to cost or focus.
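Two of the ideas above — looping a specific reasoning block and a per-call "how hard to think" knob — combine naturally into one forward pass. A toy sketch, with plain functions standing in for layers (nothing here is from the OP's implementation):

```python
def forward_with_loops(x, layers, loop_span, n_loops):
    """Run the stack, re-applying layers[start:end] n_loops times.

    n_loops is the "how hard to think" knob: n_loops=1 is the original
    model, and higher values spend extra compute in the looped block
    while leaving the encode/decode ends of the stack untouched.
    """
    start, end = loop_span
    for layer in layers[:start]:
        x = layer(x)
    for _ in range(n_loops):
        for layer in layers[start:end]:
            x = layer(x)
    for layer in layers[end:]:
        x = layer(x)
    return x

def inc(v):  # toy "layer": each pass adds 1
    return v + 1

layers = [inc] * 12
```

With 12 add-one layers, looping layers 4–8 twice does 16 layer applications instead of 12, which makes the compute/depth trade-off easy to see before trying it on real modules.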

Community Reaction and Open Questions

  • The thread is overwhelmingly enthusiastic about the ingenuity, clarity of the writeup, and the sense of “poking a synthetic brain.”
  • Some view the findings as surprising and under-appreciated; others see them as a natural consequence of residual architectures and known optimization behavior.
  • Open questions include how general these organs are across tasks, models, and sizes, and whether training with loops from the start would outperform post-hoc surgery.