SimpleFold: Folding proteins is simpler than you think
What “simpler” means here
- Commenters clarify that “simple” is relative: protein structure prediction used to look near-intractable; now comparable-quality models can run on a single server or high-end Mac.
- SimpleFold uses a fairly standard transformer, not an LLM and not a heavily engineered AlphaFold-style architecture.
- It targets efficiency: SimpleFold ships in sizes from 100M to 3B parameters and needs far less compute than AlphaFold2, making local inference more realistic for small labs.
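To ground the "fairly standard transformer" point, here is a minimal sketch of the plain ingredient being contrasted with AlphaFold-style engineering: scaled dot-product self-attention over a sequence of residue embeddings. This is a pure-Python toy for illustration, not code from the SimpleFold paper; the embeddings and function names are made up.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(tokens):
    """tokens: list of equal-length embedding vectors, one per residue.
    Queries, keys, and values are the tokens themselves (identity
    projections) to keep the sketch short."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Dot-product similarity of this residue's query with every key,
        # scaled by sqrt(d) as in the standard transformer.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        # Each output is a weighted mix of all residues' values.
        out.append([sum(w * v[i] for w, v in zip(weights, tokens))
                    for i in range(d)])
    return out

emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy residue embeddings
mixed = self_attention(emb)
```

Stacking blocks like this (plus feed-forward layers and normalization) is essentially the architecture commenters describe as "standard," as opposed to AlphaFold2's bespoke Evoformer and structure module.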
Structure prediction vs true folding
- Multiple people stress this is structure prediction, not simulating the folding process or dynamics.
- AlphaFold and SimpleFold give end-state 3D structures; projects like Folding@home and molecular dynamics (MD) are still needed for trajectories, kinetics, stability, and environment effects.
- MD is not obsolete: it studies motion around the folded state and folding pathways, not just final shapes.
Relation to AlphaFold and training data
- A key caveat: most training data comes from AI-generated structures (AlphaFold, ESMFold, AF3-style replicas), not purely experimental structures.
- Several commenters frame this as classic knowledge distillation: complex MSA-based “teacher” models generate a large synthetic corpus for a simpler “student” model.
- This shifts complexity from the model to the data; the “simplicity” depends on earlier, expensive models and crystallography-derived ground truth.
- Some think this supports the “bitter lesson”: large data + scalable architectures matter more than intricate inductive biases; others argue it’s mostly an efficiency/distillation result, not a new conceptual breakthrough.
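The distillation framing above can be sketched in miniature: an expensive "teacher" (standing in for an MSA-based model like AlphaFold2) labels a large pool of inputs, and a cheap "student" is fit to that synthetic corpus instead of scarce experimental ground truth. Everything here is a toy assumption, with linear functions standing in for both models.

```python
import random

random.seed(0)

def teacher(x):
    # Stand-in for an expensive predictor: maps a "sequence" feature
    # to a "structure" label. Here, a fixed linear function.
    return 3.0 * x + 1.0

# Step 1: the teacher generates a large synthetic training corpus.
inputs = [random.uniform(-1, 1) for _ in range(1000)]
corpus = [(x, teacher(x)) for x in inputs]

# Step 2: fit a simpler student (least squares for y = w*x + b)
# to the teacher's outputs rather than to experimental labels.
n = len(corpus)
mean_x = sum(x for x, _ in corpus) / n
mean_y = sum(y for _, y in corpus) / n
w = sum((x - mean_x) * (y - mean_y) for x, y in corpus) / \
    sum((x - mean_x) ** 2 for x, _ in corpus)
b = mean_y - w * mean_x

print(round(w, 2), round(b, 2))  # → 3.0 1.0: student recovers the teacher
```

The caveat in the bullets carries over directly: the student can only be as good as the teacher's labels, so the pipeline's overall accuracy still rests on the earlier, expensive models and the experimental structures they were trained on.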
MSAs, generalization, and future directions
- AlphaFold’s reliance on multiple sequence alignments (MSAs) is seen as both powerful and limiting: good when homologs exist, weak for proteins without close relatives (e.g., immune receptors).
- Alignment-free models (ESM, SimpleFold) suggest that MSAs may not be essential when enough structural data exists, especially as new experimental datasets (e.g., from binding consortia) grow.
- There’s interest in whether adding back MSA-like signals to this simpler base could push performance further.
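To make the MSA discussion concrete, here is a toy illustration of the signal MSA-based models read from aligned homologs: per-column statistics such as conservation. When a protein has no close relatives, the alignment is shallow and this signal vanishes, which is exactly the weakness commenters flag. The sequences below are invented for illustration.

```python
from collections import Counter

# A tiny multiple sequence alignment: four homologous sequences,
# one residue per column. Real MSAs have thousands of rows.
msa = [
    "MKVLA",
    "MKILA",
    "MRVLA",
    "MKVLS",
]

def conservation(alignment):
    """Fraction of sequences sharing the most common residue in each
    column. Highly conserved columns hint at structural or functional
    constraints that MSA-based models exploit."""
    cols = zip(*alignment)
    return [Counter(col).most_common(1)[0][1] / len(alignment)
            for col in cols]

print(conservation(msa))  # → [1.0, 0.75, 0.75, 1.0, 0.75]
```

Alignment-free models skip this step entirely and must learn equivalent constraints from sequence-structure pairs alone, which is why the size and quality of the (largely synthetic) structure corpus matters so much for them.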
Apple’s motives and Siri contrast
- Speculation about Apple's motives ranges from hardware marketing (showing Macs can run serious scientific ML) and general research prestige to internal research autonomy unrelated to products.
- Several people complain that Apple can ship protein models but not a competent Siri; replies note these are different teams, that expectations for research models are lower, and that an open-world assistant faces a much higher safety/UX bar.
Reception and skepticism
- Many are enthusiastic about democratizing protein structure prediction and the societal value of faster in silico structure prediction.
- Some criticize the title as overselling: the approach is simpler and cheaper, but still behind state-of-the-art and heavily dependent on prior complex models.