SimpleFold: Folding proteins is simpler than you think
What “simpler” means here
- Commenters clarify that “simple” is relative: protein structure prediction used to look near-intractable; now comparable-quality models can run on a single server or high-end Mac.
- SimpleFold uses a fairly standard transformer, not an LLM and not a heavily engineered AlphaFold-style architecture.
- It targets efficiency: SimpleFold ships in sizes from 100M to 3B parameters and needs far less compute than AlphaFold2, making local inference more realistic for small labs.
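To ground the "fairly standard transformer" point, here is a minimal sketch of the plain ingredient being contrasted with AlphaFold-style engineering: scaled dot-product self-attention over a sequence of residue embeddings. This is a pure-Python toy for illustration, not code from the SimpleFold paper; the embeddings and function names are made up.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(tokens):
    """tokens: list of equal-length embedding vectors, one per residue.
    Queries, keys, and values are the tokens themselves (identity
    projections) to keep the sketch short."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Dot-product similarity of this residue's query with every key,
        # scaled by sqrt(d) as in the standard transformer.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        # Each output is a weighted mix of all residues' values.
        out.append([sum(w * v[i] for w, v in zip(weights, tokens))
                    for i in range(d)])
    return out

emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy residue embeddings
mixed = self_attention(emb)
```

Stacking blocks like this (plus feed-forward layers and normalization) is essentially the architecture commenters describe as "standard," as opposed to AlphaFold2's bespoke Evoformer and structure module.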
Structure prediction vs true folding
- Multiple people stress this is structure prediction, not simulating the folding process or dynamics.
- AlphaFold and SimpleFold give end-state 3D structures; projects like Folding@home and molecular dynamics (MD) are still needed for trajectories, kinetics, stability, and environment effects.
- MD is not obsolete: it studies motion around the folded state and folding pathways, not just final shapes.
Relation to AlphaFold and training data
- A key caveat: most training data comes from AI-generated structures (AlphaFold, ESMFold, AF3-style replicas), not purely experimental structures.
- Several commenters frame this as classic knowledge distillation: complex MSA-based “teacher” models generate a large synthetic corpus for a simpler “student” model.
- This shifts complexity from the model to the data; the “simplicity” depends on earlier, expensive models and crystallography-derived ground truth.
- Some think this supports the “bitter lesson”: large data + scalable architectures matter more than intricate inductive biases; others argue it’s mostly an efficiency/distillation result, not a new conceptual breakthrough.
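The distillation framing above can be sketched in miniature: an expensive "teacher" (standing in for an MSA-based model like AlphaFold2) labels a large pool of inputs, and a cheap "student" is fit to that synthetic corpus instead of scarce experimental ground truth. Everything here is a toy assumption, with linear functions standing in for both models.

```python
import random

random.seed(0)

def teacher(x):
    # Stand-in for an expensive predictor: maps a "sequence" feature
    # to a "structure" label. Here, a fixed linear function.
    return 3.0 * x + 1.0

# Step 1: the teacher generates a large synthetic training corpus.
inputs = [random.uniform(-1, 1) for _ in range(1000)]
corpus = [(x, teacher(x)) for x in inputs]

# Step 2: fit a simpler student (least squares for y = w*x + b)
# to the teacher's outputs rather than to experimental labels.
n = len(corpus)
mean_x = sum(x for x, _ in corpus) / n
mean_y = sum(y for _, y in corpus) / n
w = sum((x - mean_x) * (y - mean_y) for x, y in corpus) / \
    sum((x - mean_x) ** 2 for x, _ in corpus)
b = mean_y - w * mean_x

print(round(w, 2), round(b, 2))  # → 3.0 1.0: student recovers the teacher
```

The caveat in the bullets carries over directly: the student can only be as good as the teacher's labels, so the pipeline's overall accuracy still rests on the earlier, expensive models and the experimental structures they were trained on.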
MSAs, generalization, and future directions
- AlphaFold’s reliance on multiple sequence alignments (MSAs) is seen as both powerful and limiting: good when homologs exist, weak for proteins without close relatives (e.g., immune receptors).
- Alignment-free models (ESM, SimpleFold) suggest that MSAs may not be essential when enough structural data exists, especially as new experimental datasets (e.g., from binding consortia) grow.
- There’s interest in whether adding back MSA-like signals to this simpler base could push performance further.
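To make the MSA discussion concrete, here is a toy illustration of the signal MSA-based models read from aligned homologs: per-column statistics such as conservation. When a protein has no close relatives, the alignment is shallow and this signal vanishes, which is exactly the weakness commenters flag. The sequences below are invented for illustration.

```python
from collections import Counter

# A tiny multiple sequence alignment: four homologous sequences,
# one residue per column. Real MSAs have thousands of rows.
msa = [
    "MKVLA",
    "MKILA",
    "MRVLA",
    "MKVLS",
]

def conservation(alignment):
    """Fraction of sequences sharing the most common residue in each
    column. Highly conserved columns hint at structural or functional
    constraints that MSA-based models exploit."""
    cols = zip(*alignment)
    return [Counter(col).most_common(1)[0][1] / len(alignment)
            for col in cols]

print(conservation(msa))  # → [1.0, 0.75, 0.75, 1.0, 0.75]
```

Alignment-free models skip this step entirely and must learn equivalent constraints from sequence-structure pairs alone, which is why the size and quality of the (largely synthetic) structure corpus matters so much for them.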
Apple’s motives and Siri contrast
- Speculation about Apple's motives ranges from hardware marketing (showing Macs can run serious scientific ML) and general research prestige to internal research autonomy unrelated to products.
- Several people complain that Apple can ship protein models but not a competent Siri; replies note these are different teams, that expectations for research models are lower, and that an open-world assistant faces a much higher safety/UX bar.
Reception and skepticism
- Many are enthusiastic about democratizing protein structure prediction and the societal value of faster in silico structure prediction.
- Some criticize the title as overselling: the approach is simpler and cheaper, but still behind state-of-the-art and heavily dependent on prior complex models.