SimpleFold: Folding proteins is simpler than you think

What “simpler” means here

  • Commenters clarify that “simple” is relative: protein structure prediction used to look near-intractable; now comparable-quality models can run on a single server or high-end Mac.
  • SimpleFold uses a fairly standard transformer, not an LLM and not a heavily engineered AlphaFold-style architecture.
  • It targets efficiency: model sizes (100M–3B parameters) and compute are far lower than AlphaFold2, making local inference more realistic for small labs.
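To make the efficiency claim concrete, here is a back-of-the-envelope estimate of the memory needed just to hold weights at the parameter counts mentioned above. The byte widths are standard for common precisions; the helper function and the intermediate 700M size are illustrative, not SimpleFold's actual footprint.

```python
# Rough weight-memory estimate for local inference at 100M-3B parameters.
# Figures are illustrative, not SimpleFold's measured footprint.
def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the weights, in gigabytes."""
    return n_params * bytes_per_param / 1e9

for n in (100e6, 700e6, 3e9):
    gb = weight_memory_gb(n, 2)  # 2 bytes/param = fp16/bf16
    print(f"{n/1e6:>6.0f}M params: ~{gb:.1f} GB in fp16")
```

Even the 3B model fits in roughly 6 GB at half precision, which is why inference on a high-end Mac or a single server is plausible; activations and runtime overhead add more, but the same order of magnitude.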

Structure prediction vs true folding

  • Multiple people stress this is structure prediction, not simulating the folding process or dynamics.
  • AlphaFold and SimpleFold give end-state 3D structures; projects like Folding@home and molecular dynamics (MD) are still needed for trajectories, kinetics, stability, and environment effects.
  • MD is not obsolete: it studies motion around the folded state and folding pathways, not just final shapes.

Relation to AlphaFold and training data

  • A key caveat: most training data comes from AI-generated structures (AlphaFold, ESMFold, AF3-style replicas), not purely experimental structures.
  • Several commenters frame this as classic knowledge distillation: complex MSA-based “teacher” models generate a large synthetic corpus for a simpler “student” model.
  • This shifts complexity from the model to the data; the “simplicity” depends on earlier, expensive models and crystallography-derived ground truth.
  • Some think this supports the “bitter lesson”: large data + scalable architectures matter more than intricate inductive biases; others argue it’s mostly an efficiency/distillation result, not a new conceptual breakthrough.

MSAs, generalization, and future directions

  • AlphaFold’s reliance on multiple sequence alignments (MSAs) is seen as both powerful and limiting: good when homologs exist, weak for proteins without close relatives (e.g., immune receptors).
  • Alignment-free models (ESM, SimpleFold) show MSAs might not be essential if enough structure data exists, especially as new experimental datasets (e.g., binding consortia) grow.
  • There’s interest in whether adding back MSA-like signals to this simpler base could push performance further.
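The MSA-versus-alignment-free distinction above is largely about input shape, which a small sketch can show. The dimensions below (sequence length, homolog count, feature width) are placeholder values, not any model's actual configuration.

```python
import numpy as np

# Illustrative dimensions only.
L, N, d = 128, 256, 32   # residues, aligned homologs, feature width

# AlphaFold-style: the model consumes a stack of aligned homologs.
msa_input = np.zeros((N, L, d))      # shape: (homologs, residues, features)

# ESM/SimpleFold-style: the model sees only the query sequence.
single_input = np.zeros((L, d))      # shape: (residues, features)

# The MSA input is N times larger, and its signal depends on N: for
# orphan proteins (e.g. immune receptors) few homologs exist, so N is
# small and the alignment signal degrades. The single-sequence path is
# unchanged regardless of homolog availability.
print(msa_input.size // single_input.size)
```

This is why alignment-free models degrade more gracefully on proteins without close relatives, and why adding MSA-like signals back is framed as an optional boost rather than a prerequisite.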

Apple’s motives and Siri contrast

  • Speculation about motives ranges from hardware marketing (showing that Macs can run serious science ML) to generic research prestige to internal research autonomy unrelated to products.
  • Several people complain that Apple can ship protein models but not a competent Siri; replies note different teams, lower expectations for research models, and higher safety/UX bar for an open-world assistant.

Reception and skepticism

  • Many are enthusiastic about democratizing protein structure prediction and the societal value of faster in silico structure determination.
  • Some criticize the title as overselling: the approach is simpler and cheaper, but still behind state-of-the-art and heavily dependent on prior complex models.