o1 isn't a chat model (and that's the point)
Prompting o1 vs OpenAI’s own guidance
- OpenAI’s docs say o1 works best with brief, clear prompts and minimal extra info.
- The article (and several commenters) argue the opposite: o1 often performs best when “stuffed” with extensive context plus a simple, focused instruction.
- Some see this as a contradiction; others frame it as bimodal: simple prompts help less-skilled users, while expert users can beat the documented guidance by crafting rich, highly structured prompts.
- Skeptics note there’s little hard evidence or evals comparing these strategies; they want concrete prompt/response examples.
Capabilities and limitations of o1
- Widely agreed: o1 is strong on math, coding, logic puzzles, and structured troubleshooting, and more consistent than 4o on such tasks.
- Several users find it worse than 4o for chatty, creative, or open-ended tasks.
- Some praise o1 for better instruction-following, extrapolation from examples, “pushing back” when the user is wrong, and being less censored.
- Others complain about bugs, long or failed runs, and the need for large, carefully prepared prompts; some see this as a regression, not a feature.
- Many feel the current $200/month price is hard to justify; it might be viable at much lower price points.
Narrow reasoning vs AGI debate
- One camp: o1 is a step back toward narrow AI—great at specific reasoning, but not more “generally intelligent” than prior models and not a path to AGI.
- Another camp: LLM-based systems (including o1) may be key building blocks for future AGI, even if they’re not sufficient alone.
- A substantial faction argues LLMs will never yield AGI: they frame LLMs as pattern-matching, non-thinking systems, less “intelligent” than simple animals.
- Others push back that this confidence is unwarranted given incomplete understanding of human intelligence and historical tech trajectories (e.g., aviation → spaceflight).
- There’s broad agreement that “AGI” itself is poorly defined and heavily used for marketing hype.
Architecture, chain-of-thought, and context
- Multiple comments highlight an architectural limitation: o1 appears unable to reuse its own prior chain-of-thought across turns.
- OpenAI docs say its intermediate “reasoning tokens” are not visible in later steps; this may weaken multi-step chain-of-thought and push it back toward one-shot pattern matching.
- Some suggest future improvements via vastly larger context windows, better summaries, or retrieval of past reasoning traces.
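One practical consequence of the limitation above: since o1's hidden reasoning tokens are discarded after each turn, anything you want carried into a follow-up must live in the visible transcript. Below is a minimal sketch of that workaround; the helper name and message layout are illustrative assumptions, not an official OpenAI format.

```python
# Hypothetical helper (not an OpenAI API): re-inject a summary of the
# model's prior reasoning into the visible message history, since o1
# cannot see its own earlier chain-of-thought across turns.

def build_followup_messages(history, new_question, reasoning_summary=None):
    """Assemble a message list for a follow-up turn.

    history: list of (role, content) tuples from earlier turns --
             only the visible text, since hidden reasoning is gone.
    reasoning_summary: optional model-written recap of its prior
             reasoning, re-surfaced as plain context.
    """
    messages = [{"role": role, "content": content} for role, content in history]
    if reasoning_summary:
        # Make the earlier reasoning explicit; o1 cannot recall it otherwise.
        messages.append({
            "role": "user",
            "content": f"Recap of your earlier reasoning:\n{reasoning_summary}",
        })
    messages.append({"role": "user", "content": new_question})
    return messages


msgs = build_followup_messages(
    [("user", "Find the bug in this parser."),
     ("assistant", "The off-by-one error is in the loop bound.")],
    "Now write a regression test for it.",
    reasoning_summary="Loop used <= instead of <, so the last token was read twice.",
)
```

The same idea underlies the "retrieval of past reasoning traces" suggestion: the trace has to be stored and re-fed explicitly, because the model will not retain it on its own.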
User strategies and prompting patterns
- Effective patterns reported:
- Provide lots of domain context + a concise, unambiguous task.
- Avoid heavy “guidance”; let o1 reason on its own, but minimize ambiguity and instruct it to ask clarifying questions when unsure.
- Use other models (e.g., 4o) to help structure specs, outlines, and missing info, then hand the curated context to o1.
- Sometimes restart with a fresh chat and refined “report-style” prompt rather than iterating ad hoc.
- Some users report o1 can generate or stitch together entire toolchains or services from a detailed spec and example project, but this is still experimental.
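The "stuff context, then one focused ask" pattern reported above can be sketched as plain prompt assembly. This is a hypothetical illustration: the function name, section labels, and layout are assumptions, not any documented OpenAI format.

```python
# Minimal sketch of a "report-style" prompt: extensive domain context
# first, then a single concise, unambiguous task, plus an invitation
# to ask for clarification rather than guess.

def build_report_prompt(context_docs, task, allow_clarifying=True):
    """Concatenate domain context sections with one focused task."""
    parts = []
    for title, body in context_docs:
        parts.append(f"## {title}\n{body}")
    parts.append(f"## Task\n{task}")
    if allow_clarifying:
        parts.append("If anything above is ambiguous, ask before answering.")
    return "\n\n".join(parts)


prompt = build_report_prompt(
    context_docs=[
        ("Service overview", "A Go HTTP API backed by Postgres, deployed on k8s."),
        ("Failure report", "p99 latency spikes roughly every six hours."),
    ],
    task="Diagnose the most likely cause and propose one fix.",
)
```

The "use 4o to structure specs" tip fits the same shape: a cheaper model drafts the `context_docs` sections, and the curated result is handed to o1 in a fresh chat rather than iterated ad hoc.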
Adoption, churn, and education
- Frequent model changes and shifting best practices make prompting strategies feel ephemeral; some expect any “manual” to be obsolete within weeks.
- This instability, plus unreliable outputs, is seen as a barrier to stable business use.
- In creative fields (e.g., art school debates around Stable Diffusion), some argue tools should still be taught—focusing on exploration, critique, and “generative art” concepts rather than any specific model version.
- Others worry that educators use rapid change as an excuse to avoid engaging with AI at all.
Safety and medical use concerns
- A subthread criticizes using o1 for medical diagnosis, especially when described as “shockingly close” but only correct part of the time.
- Several commenters stress that 60% correctness is unacceptable for diagnosis; people should not treat o1 as a doctor.
- Counterpoints: human doctors are also fallible, and LLMs might be helpful as an extra research aid if users remain skeptical and seek real medical professionals for decisions.