o1 isn't a chat model (and that's the point)

Prompting o1 vs OpenAI’s own guidance

  • OpenAI’s docs say o1 works best with brief, clear prompts and minimal extra info.
  • The article (and several commenters) argue the opposite: o1 often performs best when “stuffed” with extensive context plus a simple, focused instruction.
  • Some see this as a contradiction; others frame it as bimodal: simple prompts help less-skilled users, while expert users can outperform the docs by crafting rich, highly structured prompts.
  • Skeptics note there’s little hard evidence or evals comparing these strategies; they want concrete prompt/response examples.

Capabilities and limitations of o1

  • Widely agreed: o1 is strong on math, coding, logic puzzles, and structured troubleshooting, and more consistent than 4o on such tasks.
  • Several users find it worse than 4o for chatty, creative, or open-ended tasks.
  • Some praise o1 for better instruction-following, extrapolation from examples, “pushing back” when the user is wrong, and being less censored.
  • Others complain about bugs, long or failed runs, and the need for large, carefully prepared prompts; some see this as a regression, not a feature.
  • Many feel the current $200/month price is hard to justify; maybe viable at much lower price points.

Narrow reasoning vs AGI debate

  • One camp: o1 is a step back toward narrow AI—great at specific reasoning, but not more “generally intelligent” than prior models and not a path to AGI.
  • Another camp: LLM-based systems (including o1) may be key building blocks for future AGI, even if they’re not sufficient alone.
  • A substantial faction argues LLMs will never yield AGI: they frame LLMs as pattern-matching, non-thinking systems, less “intelligent” than simple animals.
  • Others push back that this confidence is unwarranted given incomplete understanding of human intelligence and historical tech trajectories (e.g., aviation → spaceflight).
  • There’s broad agreement that “AGI” itself is poorly defined and heavily used for marketing hype.

Architecture, chain-of-thought, and context

  • Multiple comments highlight an architectural limitation: o1 appears unable to reuse its own prior chain-of-thought across turns.
  • OpenAI docs say its intermediate “reasoning tokens” are not visible in later steps; this may weaken multi-step chain-of-thought and push it back toward one-shot pattern matching.
  • Some suggest future improvements via vastly larger context windows, better summaries, or retrieval of past reasoning traces.
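The workaround some commenters describe, re-supplying a summary of earlier conclusions as ordinary context on each new turn, can be sketched roughly as follows. This is illustrative only: the function name and message format are hypothetical, and it assumes a chat API that takes a list of role/content messages.

```python
# Sketch: since o1's hidden reasoning tokens are discarded between turns,
# one suggested workaround is to carry an explicit summary of prior
# conclusions forward as plain context. All names here are illustrative.

def build_followup_messages(prior_summary: str, new_question: str) -> list[dict]:
    """Assemble a follow-up turn that restates earlier conclusions,
    because the model cannot see its own past chain-of-thought."""
    context = (
        "Summary of conclusions reached in earlier turns:\n"
        f"{prior_summary}\n\n"
        "Treating that summary as established, answer the new question."
    )
    return [{"role": "user", "content": f"{context}\n\nQuestion: {new_question}"}]

messages = build_followup_messages(
    prior_summary="- The bug is in the retry loop, not the parser.",
    new_question="Propose a minimal fix for the retry loop.",
)
```

The point of the pattern is that the "memory" lives in visible text the user controls, not in the model's internal reasoning trace.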

User strategies and prompting patterns

  • Effective patterns reported:
    • Provide lots of domain context + a concise, unambiguous task.
    • Avoid heavy “guidance”: let o1 reason on its own, but minimize ambiguity and ask it to flag uncertainty rather than guess.
    • Use other models (e.g., 4o) to help structure specs, outlines, and missing info, then hand the curated context to o1.
    • Sometimes restart with a fresh chat and refined “report-style” prompt rather than iterating ad hoc.
  • Some users report o1 can generate or stitch together entire toolchains or services from a detailed spec and example project, but this is still experimental.
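The "stuffed context + focused instruction" pattern above can be sketched as a simple prompt builder. This is a hedged illustration of the reported workflow, not anything from OpenAI's docs; the function and section markers are invented for the example, and it only assembles the prompt text rather than calling any API.

```python
# Sketch of the "lots of domain context + one concise task" pattern
# reported in the thread. Names and markers are illustrative.

def build_report_prompt(context_docs: list[str], task: str) -> str:
    """Concatenate extensive domain context, then end with a single
    unambiguous task and an invitation to flag missing information."""
    sections = "\n\n".join(
        f"--- Context {i + 1} ---\n{doc}" for i, doc in enumerate(context_docs)
    )
    return (
        f"{sections}\n\n"
        "--- Task ---\n"
        f"{task}\n"
        "If any required information is missing above, say so explicitly "
        "instead of guessing."
    )

prompt = build_report_prompt(
    context_docs=[
        "Service A exposes /health and /metrics endpoints.",
        "Deploys run via CI on version tags.",
    ],
    task="Write a runbook section for diagnosing failed deploys.",
)
```

In the workflow several users describe, the `context_docs` would themselves be drafted or structured with a cheaper model such as 4o before the curated prompt is handed to o1 in a fresh chat.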

Adoption, churn, and education

  • Frequent model changes and shifting best practices make prompting strategies feel ephemeral; some expect any “manual” to be obsolete within weeks.
  • This instability, plus unreliable outputs, is seen as a barrier to stable business use.
  • In creative fields (e.g., art school debates around Stable Diffusion), some argue tools should still be taught—focusing on exploration, critique, and “generative art” concepts rather than any specific model version.
  • Others worry that educators use rapid change as an excuse to avoid engaging with AI at all.

Safety and medical use concerns

  • A subthread criticizes using o1 for medical diagnosis, especially when described as “shockingly close” but only correct part of the time.
  • Several commenters stress that 60% correctness is unacceptable for diagnosis; people should not treat o1 as a doctor.
  • Counterpoints: human doctors are also fallible, and LLMs might be helpful as an extra research aid if users remain skeptical and seek real medical professionals for decisions.