o1 isn't a chat model (and that's the point)
Prompting o1 vs OpenAI’s own guidance
- OpenAI’s docs say o1 works best with brief, clear prompts and minimal extra info.
- The article (and several commenters) argue the opposite: o1 often performs best when “stuffed” with extensive context plus a simple, focused instruction.
- Some see this as a contradiction; others frame it as bimodal: simple prompts help less-skilled users, while expert users can beat the documented guidance by crafting rich, highly structured prompts.
- Skeptics note there’s little hard evidence or evals comparing these strategies; they want concrete prompt/response examples.
Capabilities and limitations of o1
- Widely agreed: o1 is strong on math, coding, logic puzzles, and structured troubleshooting, and more consistent than 4o on such tasks.
- Several users find it worse than 4o for chatty, creative, or open-ended tasks.
- Some praise o1 for better instruction-following, extrapolation from examples, “pushing back” when the user is wrong, and being less censored.
- Others complain about bugs, long or failed runs, and the need for large, carefully prepared prompts; some see this as a regression, not a feature.
- Many feel the current $200/month price is hard to justify; it might be viable at much lower price points.
Narrow reasoning vs AGI debate
- One camp: o1 is a step back toward narrow AI—great at specific reasoning, but not more “generally intelligent” than prior models and not a path to AGI.
- Another camp: LLM-based systems (including o1) may be key building blocks for future AGI, even if they’re not sufficient alone.
- A substantial faction argues LLMs will never yield AGI: they frame LLMs as pattern-matching, non-thinking systems, less “intelligent” than simple animals.
- Others push back that this confidence is unwarranted given incomplete understanding of human intelligence and historical tech trajectories (e.g., aviation → spaceflight).
- There’s broad agreement that “AGI” itself is poorly defined and heavily used for marketing hype.
Architecture, chain-of-thought, and context
- Multiple comments highlight an architectural limitation: o1 appears unable to reuse its own prior chain-of-thought across turns.
- OpenAI docs say its intermediate “reasoning tokens” are not visible in later steps; this may weaken multi-step chain-of-thought and push it back toward one-shot pattern matching.
- Some suggest future improvements via vastly larger context windows, better summaries, or retrieval of past reasoning traces.
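One practical consequence of the limitation above: since o1's hidden reasoning tokens are discarded after each turn, anything you want carried into a follow-up must live in the visible transcript. Below is a minimal sketch of that workaround; the helper name and message layout are illustrative assumptions, not an official OpenAI format.

```python
# Hypothetical helper (not an OpenAI API): re-inject a summary of the
# model's prior reasoning into the visible message history, since o1
# cannot see its own earlier chain-of-thought across turns.

def build_followup_messages(history, new_question, reasoning_summary=None):
    """Assemble a message list for a follow-up turn.

    history: list of (role, content) tuples from earlier turns --
             only the visible text, since hidden reasoning is gone.
    reasoning_summary: optional model-written recap of its prior
             reasoning, re-surfaced as plain context.
    """
    messages = [{"role": role, "content": content} for role, content in history]
    if reasoning_summary:
        # Make the earlier reasoning explicit; o1 cannot recall it otherwise.
        messages.append({
            "role": "user",
            "content": f"Recap of your earlier reasoning:\n{reasoning_summary}",
        })
    messages.append({"role": "user", "content": new_question})
    return messages


msgs = build_followup_messages(
    [("user", "Find the bug in this parser."),
     ("assistant", "The off-by-one error is in the loop bound.")],
    "Now write a regression test for it.",
    reasoning_summary="Loop used <= instead of <, so the last token was read twice.",
)
```

The same idea underlies the "retrieval of past reasoning traces" suggestion: the trace has to be stored and re-fed explicitly, because the model will not retain it on its own.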
User strategies and prompting patterns
- Effective patterns reported:
- Provide lots of domain context + a concise, unambiguous task.
- Avoid heavy “guidance”; let o1 reason on its own, but minimize ambiguity and instruct it to ask clarifying questions when unsure.
- Use other models (e.g., 4o) to help structure specs, outlines, and missing info, then hand the curated context to o1.
- Sometimes restart with a fresh chat and refined “report-style” prompt rather than iterating ad hoc.
- Some users report o1 can generate or stitch together entire toolchains or services from a detailed spec and example project, but this is still experimental.
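The "stuff context, then one focused ask" pattern reported above can be sketched as plain prompt assembly. This is a hypothetical illustration: the function name, section labels, and layout are assumptions, not any documented OpenAI format.

```python
# Minimal sketch of a "report-style" prompt: extensive domain context
# first, then a single concise, unambiguous task, plus an invitation
# to ask for clarification rather than guess.

def build_report_prompt(context_docs, task, allow_clarifying=True):
    """Concatenate domain context sections with one focused task."""
    parts = []
    for title, body in context_docs:
        parts.append(f"## {title}\n{body}")
    parts.append(f"## Task\n{task}")
    if allow_clarifying:
        parts.append("If anything above is ambiguous, ask before answering.")
    return "\n\n".join(parts)


prompt = build_report_prompt(
    context_docs=[
        ("Service overview", "A Go HTTP API backed by Postgres, deployed on k8s."),
        ("Failure report", "p99 latency spikes roughly every six hours."),
    ],
    task="Diagnose the most likely cause and propose one fix.",
)
```

The "use 4o to structure specs" tip fits the same shape: a cheaper model drafts the `context_docs` sections, and the curated result is handed to o1 in a fresh chat rather than iterated ad hoc.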
Adoption, churn, and education
- Frequent model changes and shifting best practices make prompting strategies feel ephemeral; some expect any “manual” to be obsolete within weeks.
- This instability, plus unreliable outputs, is seen as a barrier to stable business use.
- In creative fields (e.g., art school debates around Stable Diffusion), some argue tools should still be taught—focusing on exploration, critique, and “generative art” concepts rather than any specific model version.
- Others worry that educators use rapid change as an excuse to avoid engaging with AI at all.
Safety and medical use concerns
- A subthread criticizes using o1 for medical diagnosis, especially when described as “shockingly close” but only correct part of the time.
- Several commenters stress that 60% correctness is unacceptable for diagnosis; people should not treat o1 as a doctor.
- Counterpoints: human doctors are also fallible, and LLMs might be helpful as an extra research aid if users remain skeptical and seek real medical professionals for decisions.