First thoughts on o3 pro
Language tangent: “its/it’s” and English irregularities
- Thread opens with a joke about the article misusing “it’s,” leading to a long side-discussion.
- Some argue the “its/it’s” distinction is an unneeded exception: spoken English gets by without it, and apostrophe-s is already ambiguous after nouns (e.g., “the dog’s tired” vs “the dog’s ball”).
- Others defend apostrophes for clarity and see value in rules, but are reminded that human language is patterns, not rigid laws, and evolves.
- Discussion touches on historical forms (“it’s” predating “its”, Old English pronouns) and how English’s tangled history undermines simple pattern-matching rules.
When o3 Pro might be useful
- Many are unsure when it’s worth waiting minutes and paying more versus using fast models.
- Proposed use cases: hard debugging (distributed systems, Istio, Wine/SDL joystick bug), large-scale architecture review, niche platforms where lots of context must be supplied, reorganizing personal knowledge bases, or deep critique of contentious threads.
- Several users say they reserve slow “reasoning” models for rare, thorny problems; everyday coding stays with faster models.
Strengths, failures, and prompting style
- Successes: deep bug-hunting; surfacing overlooked mathematical or methodological ideas; better meta-prompting, i.e. having it design the prompt and reasoning process for another model (a sketch follows this list).
- Failures: nontrivial code transformations (e.g., pipeline parallel → DDP) still elude multiple frontier models; multi-step research tasks lose the goal and hallucinate progress; Towers of Hanoi solutions break mid-sequence (a reference check also follows this list), undermining claims of strong algorithmic reasoning.
- Some find o3 Pro’s latency and output limits painful, requiring workarounds (e.g., file download links) and an asynchronous mindset. Others see that same long-form “tasteful” output as its main value.
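The meta-prompting pattern is simple to sketch. This assumes a hypothetical call_model helper wrapping whatever client you actually use; the model names are placeholders, not real identifiers:

    # Hypothetical helper: wire this to your own API client.
    def call_model(model: str, prompt: str) -> str:
        raise NotImplementedError("plug in a real client here")

    META_TEMPLATE = (
        "Design a prompt for another model to solve the task below. "
        "Include an explicit step-by-step reasoning plan and criteria "
        "for checking the final answer.\n\nTask: {task}"
    )

    def meta_prompt(task: str,
                    planner: str = "slow-reasoning-model",
                    worker: str = "fast-model") -> str:
        # The slow, expensive model designs the prompt;
        # a cheap, fast model executes it.
        designed = call_model(planner, META_TEMPLATE.format(task=task))
        return call_model(worker, designed)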
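The Towers of Hanoi failures are easy to catch because a reference solver and a legality check fit in a few lines; replaying a model’s move list against something like this is how mid-sequence breaks show up:

    def hanoi_moves(n, src="A", dst="C", aux="B"):
        # Classic recursion: exactly 2**n - 1 moves from src to dst.
        if n == 0:
            return []
        return (hanoi_moves(n - 1, src, aux, dst)
                + [(src, dst)]
                + hanoi_moves(n - 1, aux, dst, src))

    def is_valid_solution(n, moves):
        # Replay the moves, rejecting illegal ones (empty source peg,
        # larger disk on smaller), then check the final state.
        pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}
        for src, dst in moves:
            if not pegs[src]:
                return False
            disk = pegs[src].pop()
            if pegs[dst] and pegs[dst][-1] < disk:
                return False
            pegs[dst].append(disk)
        return pegs["C"] == list(range(n, 0, -1))

    assert is_valid_solution(10, hanoi_moves(10))  # 1023 moves, all legal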
Comparisons: Gemini, Claude, o-series
- No consensus: some find Gemini 2.5 Pro clearly more usable (huge context, fewer visible limits, better for “dump in the repo and ask questions”), others think it’s weaker or inconsistent.
- Claude (especially Claude Code) is widely praised for coding workflows and “flow state,” with strong agentic tools in editors.
- Some feel o3 Pro isn’t clearly better than o1 Pro and lament o1’s removal from the UI.
Reasoning, AGI, and tools
- One camp cites tests like Towers of Hanoi and Apple’s “Illusion of Thinking” to argue these models aren’t genuine general reasoners.
- Others reply that LLMs should be judged as orchestrators that use tools (code, search) rather than as bare calculators; expecting perfect internal execution of long algorithms is mis-specified. (A minimal orchestration sketch follows this list.)
- There’s disagreement over whether incorrect but structured attempts still count as “reasoning,” and how that relates to AGI timelines.
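On the orchestrator view, the fair test is whether the model can write and dispatch a program that produces the answer, not whether it can enumerate it token by token. A sketch under that assumption, reusing the hypothetical call_model helper from earlier:

    import subprocess
    import sys

    def run_code(code: str) -> str:
        # Execute model-generated code in a subprocess; return its output.
        result = subprocess.run([sys.executable, "-c", code],
                                capture_output=True, text=True, timeout=30)
        return result.stdout if result.returncode == 0 else result.stderr

    def solve_with_tools(task: str) -> str:
        # Ask the model for a program, not for the answer itself.
        code = call_model("reasoning-model",
                          f"Write a Python script that prints the answer to: {task}")
        return run_code(code)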
Agents, memory, and autonomy
- People list an emerging stack: long-running reasoning models, code-execution VMs (Codex), web-browsing agents (Operator), “deep research” tools, and phone-call agents for real-world tasks. Multi-hour or even multi-day workflows are seen as possible when orchestrated by external programs.
- At the same time, a user reports that o3 Pro still forgets multi-step goals within a single thread and fabricates progress; “autonomy without continuity is not autonomy.” (A checkpointing sketch follows this list.)
- ChatGPT’s new memory feature is shown to accumulate surprisingly detailed user profiles. Some are unfazed; others see it as confirming privacy worries.
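The continuity complaint suggests the memory has to live outside the model. A sketch of an external orchestrator that persists the goal and verified progress and restates them every turn; call_model as before, and the file name and prompt wording are made up:

    import json
    import pathlib

    STATE_FILE = pathlib.Path("agent_state.json")

    def load_state(goal: str) -> dict:
        if STATE_FILE.exists():
            return json.loads(STATE_FILE.read_text())
        return {"goal": goal, "verified_steps": []}

    def next_step(state: dict) -> str:
        # The goal and externally verified progress are restated every
        # turn, so the model cannot forget them or fabricate completion.
        prompt = (f"Goal: {state['goal']}\n"
                  f"Steps already verified done: {state['verified_steps']}\n"
                  "Propose exactly one next step.")
        return call_model("reasoning-model", prompt)

    def record_verified(state: dict, step: str) -> None:
        # Only the harness, after checking the work, marks a step done.
        state["verified_steps"].append(step)
        STATE_FILE.write_text(json.dumps(state, indent=2))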
Developer productivity and the “bubble” question
- Many experienced developers report huge productivity gains: LLMs write large chunks of working code, handle boilerplate, and make old hobby projects feasible; the human focuses on architecture, testing, and validation.
- Others repeatedly get unusable or messy code, see constant errors, and suspect a hype bubble, pointing to energy costs and lack of visible “AI renaissance” outputs.
- One pattern emerges: these tools amplify skill—skilled devs with good prompting, incremental workflows, and strong validation get leverage; novices who rely on LLMs without understanding often collapse in interviews.
Societal and economic implications
- Some feel humans are increasingly the bottleneck as models improve, anticipating a future where human cognitive labor is largely eclipsed, barring heavy global regulation.
- Others push back: current models still err often; humans have real-world access, embodiment, and social roles that are hard to automate.
- This leads into a side debate on capitalism, markets, “worth” beyond economic output, class interests, and whether economies must be rethought as AI and automation advance.
Miscellaneous
- Observations that reasoning models can feel “socially awkward” compared to chattier ones.
- Complaints that OpenAI’s ecosystem (ChatGPT app, Xcode integration, MCP tools) needs better parallelism, and that a single run_python tool often works better than many MCP tools (a sketch follows this list).
- Some speculate the article itself may read like it was AI-assisted, but this remains unresolved.
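The single-tool point is that one generic code-execution tool can stand in for many narrow ones, since the model writes the glue code itself. A hypothetical declaration in the common function-calling shape; the implementation could reuse run_code from the earlier sketch, with real sandboxing added:

    # Hypothetical tool schema; name and description are illustrative.
    RUN_PYTHON_TOOL = {
        "name": "run_python",
        "description": "Execute a Python snippet in a sandbox and return its output.",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {"type": "string", "description": "Python source to run"},
            },
            "required": ["code"],
        },
    }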