OpenAI's new open-source model is basically Phi-5

Open source vs “open weights”

  • Major debate over whether gpt-oss (and similar models) are truly “open source” or just “open weights.”
  • One side argues: Apache/MIT licenses + freely modifiable weights satisfy open source definitions; for LLMs, weights are the “preferred form for modification,” analogous to config + hard‑coded constants. Training data and pipelines are IP/know‑how, not required.
  • The other side counters: without training data, training code, and evaluation pipelines, you cannot realistically reproduce or meaningfully improve the model; weights are more like bytecode or a binary blob. Calling this “open source” is seen as misleading or user‑hostile.
  • Some extend this to a broader point: traditional OSS definitions assumed a source/object dichotomy that doesn’t map cleanly onto models.

Synthetic data and knowledge

  • Commenters link gpt-oss to the Phi family, citing Microsoft's documentation that the Phi models were trained primarily on synthetic data plus heavily filtered real code, and suggest gpt-oss follows a similar recipe.
  • Discussion on whether a synthetic-only model can still emit sensitive content (e.g., drug synthesis): in theory yes, if that knowledge was present in the generating (teacher) models or emerges via generalization, but it’s “not likely” for highly specific instructions.
  • Others emphasize that modern LLMs can generalize and create genuinely new text (e.g., proofs or novel code) even if not seen verbatim.

Safety, censorship, and erotic role-play

  • Strong guardrails observed: models quote policy, refuse sexual and some violent content, and sometimes “melt down” in creative/translation tasks over mild references (e.g., teenage romance, “chained to a bed” metaphors).
  • Many argue this makes gpt-oss poor for fiction, translation, or adult but non‑pornographic discussion.
  • Several comments claim most fine‑tunes of small local models are for erotic role‑play, citing usage rankings from open model‑hosting services where role‑play chats rank heavily. Others are skeptical of, or annoyed by, unsupported “50% perverts” claims.
  • Long subthread on whether explicit or taboo simulations reduce harm (methadone analogy) or entrench paraphilias; no clear consensus, and little hard evidence cited.

Use cases and qualitative performance

  • Multiple users report gpt-oss 20B performing impressively on code and reasoning: tricky SQL updates, subtle unit/physics checks, identifying ill‑posed questions, explaining obfuscated code, recognizing Y combinators (see the sketch after this list), etc., often outperforming similarly sized open models.
  • Others find it stubborn (won’t admit errors) or too policy‑obsessed to be trusted.
  • Gaming/DM and world-simulation experiments show the models can generate coherent but often generic scenarios, and that they are highly suggestible to user hints.
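
  For context on the “recognizing Y combinators” item, a minimal illustrative sketch (assumed for illustration, not a snippet from the thread): commenters describe pasting terse functional code like this and asking the 20B model to name the construct and explain what it computes. In eager-evaluation Python this is strictly the Z (applicative-order) fixed-point combinator rather than the classic Y.

    # Illustrative example of the kind of snippet users reportedly asked gpt-oss 20B to explain.
    # Z is a strict-language fixed-point combinator: it lets an anonymous lambda call itself.
    Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

    # Factorial defined without referring to its own name; Z ties the recursive knot.
    fact = Z(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))

    print(fact(5))   # 120
    print(fact(10))  # 3628800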

Business vs hobbyist needs

  • Several note a split: businesses prefer over‑safe, boring models for support bots and education, while local‑model hobbyist communities want minimal guardrails and personalization (including porn).
  • Some argue the reputational risk from community uncensoring is overstated: users mostly judge models on capability, not on how quickly the community strips guardrails.

Hallucination, knowledge gaps, and future direction

  • Cited internal evals show gpt-oss 20B/120B have low accuracy and very high hallucination rates compared with o4‑mini and especially o3, reinforcing that they have limited real‑world knowledge by design, similar to prior Phi models.
  • One commenter sees this “knowledge‑light” design as a feature for safety; others see it as a serious capability gap.
  • Broader speculation that model “intelligence” may plateau or even degrade due to data pollution and diminishing returns, while overall product usefulness continues to rise via better tool use, agents, and integration.