2025-05-16

A Research Preview of Codex

Naming and product scope

The name “Codex” causes confusion: it previously referred to a model and is also the name of the open‑source “codex-cli” tool, and people expect the overlap to confuse both humans and LLMs.
Some see Codex as a managed, cloud version of the new CLI agent with GitHub integration and microVMs; others wish it supported GitLab or arbitrary git remotes.

Effectiveness and real‑world workflows

Many report LLMs are great for boilerplate, scripts, refactors, and meta‑programming (e.g., C# source generators, Python→C codegen), but unreliable on complex/novel tasks or niche languages.
Strong consensus that you must decompose work, prompt precisely, enforce tests, and review every change; expecting “write an app end‑to‑end” to work is seen as unrealistic.
Several describe using agents as “infinite junior devs”: good at scaffolding, but still requiring substantial cleanup and architectural guidance.

Use cases where Codex‑style agents shine

Semi‑structured, repetitive work: upgrading dependencies, adding tests, small refactors, internal tools, and “hyper‑narrow” apps for specific business workflows.
Parallel task execution is valued for batching many small edits/tests that would otherwise be tedious; task runtimes of minutes make concurrency useful.
Some hope Codex can find nontrivial bugs, though current demos look more superficial; skepticism about “vibe coding” without deep validation.
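
As a rough illustration of the point about parallel execution, the sketch below batches several small, independent tasks and runs them concurrently. It is only a sketch under stated assumptions: `run_task` is a hypothetical stand-in that sleeps briefly instead of calling any real agent or Codex API, and the task descriptions are invented for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_task(description: str) -> str:
    """Hypothetical stand-in for handing one small task to an agent."""
    time.sleep(0.1)  # simulates work that would take minutes in practice
    return f"done: {description}"


def run_batch(descriptions: list[str], max_workers: int = 4) -> list[str]:
    """Run the tasks concurrently; results are returned in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_task, descriptions))


# Invented examples of the small, repetitive edits described above.
tasks = [
    "upgrade a dependency",
    "add tests for one module",
    "rename a helper function",
    "fix a lint warning",
]

results = run_batch(tasks)
for line in results:
    print(line)
```

Because each task spends most of its time waiting, running the batch concurrently takes roughly as long as the slowest task rather than the sum of all of them. Every result would still need the review and testing stressed elsewhere in the discussion.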

Privacy, IP, and training

Repeated questions about whether uploaded repos are used for training; mention of an explicit opt‑out toggle, but strong skepticism about trusting any such promise.
Split views: some say most company code is worthless to others and SaaS access is standard; others stress trade secrets, third‑party licenses, and security risk.

Non‑engineers using agents

Speculation that PMs, legal, or compliance could use Codex to propose PRs, with engineers doing final review and testing.
Counterargument: if non‑devs can’t run and interpret the app, devs end up doing nearly all of the real work (validation, debugging, shepherding changes).

Impact on careers and juniors

Anxiety that high‑paying SWE work and especially junior roles are shrinking; difficulty for new grads is widely reported.
Debate over whether automation will increase total demand for developers (Jevons‑style) or leave them permanently oversupplied.
Some argue future engineers will be more like architects/PMs of agents; others mourn loss of “tinkering” and warn of a broken training pipeline.

Benchmarks and model quality

Codex reportedly improves on o3 by only a few points on SWE‑bench Verified, raising questions about diminishing returns and possible “benchmaxxing”.
Observations that LLM performance varies sharply by language (Python strong, others weaker); real‑world usefulness heavily depends on stack.

Open source, infra, and environments

Interest in open‑source Codex‑like systems (OpenHands, prior GitHub Actions tools) and microVM/desktop sandboxes targeted at agents.
Some open‑source maintainers are reconsidering contributing, feeling their work trains systems that undercut them.

Safety and misuse

Concern about “neutered” models that refuse malware‑related requests; others note that jailbreaks are easy and that restrictions mainly hit the public, not powerful actors.
Broader unease about opaque corporate control over what users can do with such general‑purpose tools.

Pricing, rollout, and UX

Frustration that Codex is gated behind an expensive Pro tier and “rolling out” slowly; multiple reports of being on Pro but still redirected to upsell pages.
Complaints about confusing setup (e.g., where to define setup scripts, secrets behavior) and lack of real support channels.
