First thoughts on o3 pro
Language tangent: “its/it’s” and English irregularities
- Thread opens with a joke about the article misusing “it’s,” leading to a long side-discussion.
- Some argue the “its/it’s” distinction is an unneeded exception: spoken English gets by without it, and apostrophe-s is already ambiguous after nouns (e.g., “the dog’s tired” vs “the dog’s ball”).
- Others defend apostrophes for clarity and see value in rules, but are reminded that human language is patterns, not rigid laws, and evolves.
- Discussion touches on historical forms (“it’s” predating “its”, Old English pronouns) and how English’s tangled history undermines simple pattern-matching rules.
When o3 Pro might be useful
- Many are unsure when it’s worth waiting minutes and paying more versus using fast models.
- Proposed use cases: hard debugging (distributed systems, Istio, Wine/SDL joystick bug), large-scale architecture review, niche platforms where lots of context must be supplied, reorganizing personal knowledge bases, or deep critique of contentious threads.
- Several users say they reserve slow “reasoning” models for rare, thorny problems; everyday coding stays with faster models.
Strengths, failures, and prompting style
- Successes: deep bug-hunting; surfacing overlooked mathematical or methodological ideas; better meta-prompting, i.e. having it design the prompt and reasoning process for another model (a sketch follows this list).
- Failures: nontrivial code transformations (e.g., pipeline parallel → DDP) still elude multiple frontier models; multi-step research tasks lose the goal and hallucinate progress; Towers of Hanoi solutions break mid-sequence (a reference check also follows this list), undermining claims of strong algorithmic reasoning.
- Some find o3 Pro’s latency and output limits painful, requiring workarounds (e.g., file download links) and an asynchronous mindset. Others see that same long-form “tasteful” output as its main value.
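The meta-prompting pattern is simple to sketch. This assumes a hypothetical call_model helper wrapping whatever client you actually use; the model names are placeholders, not real identifiers:

    # Hypothetical helper: wire this to your own API client.
    def call_model(model: str, prompt: str) -> str:
        raise NotImplementedError("plug in a real client here")

    META_TEMPLATE = (
        "Design a prompt for another model to solve the task below. "
        "Include an explicit step-by-step reasoning plan and criteria "
        "for checking the final answer.\n\nTask: {task}"
    )

    def meta_prompt(task: str,
                    planner: str = "slow-reasoning-model",
                    worker: str = "fast-model") -> str:
        # The slow, expensive model designs the prompt;
        # a cheap, fast model executes it.
        designed = call_model(planner, META_TEMPLATE.format(task=task))
        return call_model(worker, designed)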
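The Towers of Hanoi failures are easy to catch because a reference solver and a legality check fit in a few lines; replaying a model’s move list against something like this is how mid-sequence breaks show up:

    def hanoi_moves(n, src="A", dst="C", aux="B"):
        # Classic recursion: exactly 2**n - 1 moves from src to dst.
        if n == 0:
            return []
        return (hanoi_moves(n - 1, src, aux, dst)
                + [(src, dst)]
                + hanoi_moves(n - 1, aux, dst, src))

    def is_valid_solution(n, moves):
        # Replay the moves, rejecting illegal ones (empty source peg,
        # larger disk on smaller), then check the final state.
        pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}
        for src, dst in moves:
            if not pegs[src]:
                return False
            disk = pegs[src].pop()
            if pegs[dst] and pegs[dst][-1] < disk:
                return False
            pegs[dst].append(disk)
        return pegs["C"] == list(range(n, 0, -1))

    assert is_valid_solution(10, hanoi_moves(10))  # 1023 moves, all legal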
Comparisons: Gemini, Claude, o-series
- No consensus: some find Gemini 2.5 Pro clearly more usable (huge context, fewer visible limits, better for “dump in the repo and ask questions”), others think it’s weaker or inconsistent.
- Claude (especially Claude Code) is widely praised for coding workflows and “flow state,” with strong agentic tools in editors.
- Some feel o3 Pro isn’t clearly better than o1 Pro and lament o1’s removal from the UI.
Reasoning, AGI, and tools
- One camp cites tests like Towers of Hanoi and Apple’s “Illusion of Thinking” to argue these models aren’t genuine general reasoners.
- Others reply that LLMs should be judged as orchestrators that use tools (code, search) rather than as bare calculators; expecting perfect internal execution of long algorithms is mis-specified. (A minimal orchestration sketch follows this list.)
- There’s disagreement over whether incorrect but structured attempts still count as “reasoning,” and how that relates to AGI timelines.
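On the orchestrator view, the fair test is whether the model can write and dispatch a program that produces the answer, not whether it can enumerate it token by token. A sketch under that assumption, reusing the hypothetical call_model helper from earlier:

    import subprocess
    import sys

    def run_code(code: str) -> str:
        # Execute model-generated code in a subprocess; return its output.
        result = subprocess.run([sys.executable, "-c", code],
                                capture_output=True, text=True, timeout=30)
        return result.stdout if result.returncode == 0 else result.stderr

    def solve_with_tools(task: str) -> str:
        # Ask the model for a program, not for the answer itself.
        code = call_model("reasoning-model",
                          f"Write a Python script that prints the answer to: {task}")
        return run_code(code)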
Agents, memory, and autonomy
- People list an emerging stack: long-running reasoning models, code-execution VMs (Codex), web-browsing agents (Operator), “deep research” tools, and phone-call agents for real-world tasks. Multi-hour or even multi-day workflows are seen as possible when orchestrated by external programs.
- At the same time, a user reports that o3 Pro still forgets multi-step goals within a single thread and fabricates progress; “autonomy without continuity is not autonomy.” (A checkpointing sketch follows this list.)
- ChatGPT’s new memory feature is shown to accumulate surprisingly detailed user profiles. Some are unfazed; others see it as confirming privacy worries.
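The continuity complaint suggests the memory has to live outside the model. A sketch of an external orchestrator that persists the goal and verified progress and restates them every turn; call_model as before, and the file name and prompt wording are made up:

    import json
    import pathlib

    STATE_FILE = pathlib.Path("agent_state.json")

    def load_state(goal: str) -> dict:
        if STATE_FILE.exists():
            return json.loads(STATE_FILE.read_text())
        return {"goal": goal, "verified_steps": []}

    def next_step(state: dict) -> str:
        # The goal and externally verified progress are restated every
        # turn, so the model cannot forget them or fabricate completion.
        prompt = (f"Goal: {state['goal']}\n"
                  f"Steps already verified done: {state['verified_steps']}\n"
                  "Propose exactly one next step.")
        return call_model("reasoning-model", prompt)

    def record_verified(state: dict, step: str) -> None:
        # Only the harness, after checking the work, marks a step done.
        state["verified_steps"].append(step)
        STATE_FILE.write_text(json.dumps(state, indent=2))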
Developer productivity and the “bubble” question
- Many experienced developers report huge productivity gains: LLMs write large chunks of working code, handle boilerplate, and make old hobby projects feasible; the human focuses on architecture, testing, and validation.
- Others repeatedly get unusable or messy code, see constant errors, and suspect a hype bubble, pointing to energy costs and lack of visible “AI renaissance” outputs.
- One pattern emerges: these tools amplify skill—skilled devs with good prompting, incremental workflows, and strong validation get leverage; novices who rely on LLMs without understanding often collapse in interviews.
Societal and economic implications
- Some feel humans are increasingly the bottleneck as models improve, anticipating a future where human cognitive labor is largely eclipsed, barring heavy global regulation.
- Others push back: current models still err often; humans have real-world access, embodiment, and social roles that are hard to automate.
- This leads into a side debate on capitalism, markets, “worth” beyond economic output, class interests, and whether economies must be rethought as AI and automation advance.
Miscellaneous
- Observations that reasoning models can feel “socially awkward” compared to chattier ones.
- Complaints that OpenAI’s ecosystem (ChatGPT app, Xcode integration, MCP tools) needs better parallelism, and that a single run_python tool often works better than many MCP tools (a sketch follows this list).
- Some speculate the article itself may read like it was AI-assisted, but this remains unresolved.
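The single-tool point is that one generic code-execution tool can stand in for many narrow ones, since the model writes the glue code itself. A hypothetical declaration in the common function-calling shape; the implementation could reuse run_code from the earlier sketch, with real sandboxing added:

    # Hypothetical tool schema; name and description are illustrative.
    RUN_PYTHON_TOOL = {
        "name": "run_python",
        "description": "Execute a Python snippet in a sandbox and return its output.",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {"type": "string", "description": "Python source to run"},
            },
            "required": ["code"],
        },
    }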