Unrolling the Codex agent loop
Agent loop, reasoning tokens, and context management
- Several comments dig into how Codex uses reasoning tokens in the agent loop: they persist within a single “agent turn” (the tool-call loop) but are dropped between user turns, so context can be lost across related user messages.
- Developers work around this by having the model write plans/progress/notes to markdown (or SQL / external stores) as cross-turn “snapshots.”
- There is confusion, and a mild contradiction between the docs and the observed behavior of the Responses API, about when reasoning is reused; some report that encrypted reasoning items sent back by the client are silently ignored across user turns.
- The `/responses/compact` endpoint and its `encrypted_content`/`compaction` items are highlighted as a strong, latent-space compaction mechanism that preserves understanding while freeing context, though it tightly couples you to OpenAI models.
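The markdown-snapshot workaround mentioned above can be sketched in a few lines. This is a hypothetical helper, not part of Codex: the file name and function names are illustrative, and the idea is simply that the agent appends a progress note at the end of each turn and re-reads the file at the start of the next one, standing in for the dropped reasoning tokens.

```python
from pathlib import Path
from datetime import datetime, timezone

# Hypothetical scratch file the agent re-reads each turn (name is an assumption).
NOTES = Path("AGENT_NOTES.md")

def save_snapshot(summary: str) -> None:
    """Append a timestamped progress note so the next user turn can recover context."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with NOTES.open("a", encoding="utf-8") as f:
        f.write(f"\n## Snapshot {stamp}\n{summary}\n")

def load_snapshots() -> str:
    """Prepended to the next turn's prompt in place of the dropped reasoning tokens."""
    return NOTES.read_text(encoding="utf-8") if NOTES.exists() else ""
```

The trade-off versus latent-space compaction is that these notes only preserve what the model chose to write down, not its full internal state, but they work with any provider.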
Observability, steering, and history
- People want better visibility into Codex’s “thinking” and tool usage so they can interrupt/steer early; some see this as both a UX and cost issue.
- Steering can be experimentally enabled, but many still feel real-time “thought” display is insufficient compared to other tools.
- External logging/transcript systems (Emacs agent-shell, daemons, OTEL-based tools like codex-plus) are used to preserve full interaction histories and analyze behavior.
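A minimal stand-in for these transcript systems is an append-only JSONL log wrapped around the agent loop. This is a sketch, not the codex-plus or agent-shell implementation: the class, field names, and record kinds are assumptions chosen to illustrate the pattern of capturing messages, “thinking,” and tool calls for later replay.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

class TranscriptLog:
    """Append-only JSONL transcript: one record per message, reasoning step, or tool call."""

    def __init__(self, path: str = "codex_transcript.jsonl"):
        self.path = Path(path)

    def record(self, role: str, kind: str, content: str) -> None:
        entry = {
            "ts": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
            "role": role,      # e.g. "user" | "assistant" | "tool"
            "kind": kind,      # e.g. "message" | "reasoning" | "tool_call"
            "content": content,
        }
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")

    def replay(self) -> list[dict]:
        """Load the full history for offline analysis or steering post-mortems."""
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f]
```

JSONL keeps each record independently parseable, so a crash mid-run loses at most the last partial line, which matters for daemons logging long sessions.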
CLI UX, performance, and feature gaps
- Codex CLI is widely praised for speed, low resource usage, and polished UX compared to other CLIs (Claude Code, Gemini CLI), though some still find it frustratingly slow versus ChatGPT web.
- Others say Codex is “too slow” and breaks their flow, or gets stuck in loops on simple tasks.
- Missing features frequently cited: hooks (to intervene in the harness), checkpoints/forks, and clear diffs/approval flows for file edits. Hooks are described as critical for reducing token use and catching “stupid” agent behavior.
- Some users find Codex much more capable for complex coding (GPU pipelines, emulators), while others find it “almost useless” versus Claude Code or Gemini. Experiences are sharply divided.
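The hooks commenters ask for can be illustrated with a small sketch. Codex has no such mechanism today; the hook signature, function names, and veto protocol below are all hypothetical, showing how a pre-tool-call hook could short-circuit a bad call before it burns tokens.

```python
from typing import Callable, Optional

# Hypothetical hook signature (an assumption, not a Codex API): inspect a pending
# tool call; return None to allow it, or a string to block it with an explanation
# that gets fed back to the model.
PreToolHook = Callable[[str, dict], Optional[str]]

def block_destructive_shell(tool: str, args: dict) -> Optional[str]:
    """Example hook: veto obviously dangerous shell commands before they run."""
    if tool == "shell" and "rm -rf" in args.get("command", ""):
        return "Blocked by hook: destructive command; ask the user first."
    return None

def run_tool(tool: str, args: dict, hooks: list[PreToolHook]) -> str:
    """Run pre-hooks in order; the first veto short-circuits the tool call entirely."""
    for hook in hooks:
        verdict = hook(tool, args)
        if verdict is not None:
            return verdict  # saves the tokens the call (and its cleanup) would cost
    return f"(would execute {tool} with {args})"
```

Because the veto message goes back to the model as the tool result, the agent can self-correct instead of looping on the same “stupid” behavior.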
Open source vs proprietary harnesses
- Codex CLI’s open-source Rust implementation is viewed as a major advantage: inspectable internals, learnable agent patterns, and community bugfixes (though feature PRs rarely get merged).
- In contrast, Claude Code’s proprietary harness and GitHub-as-issues-only frustrate users who could otherwise fix long-standing bugs or extend behavior.
- There’s debate over reverse-engineering proprietary tools, potential ToS violations, and the ethical tension given how LLMs were trained.
Multi-model and ecosystem integration
- Codex can be pointed at non-OpenAI models via custom providers, but it’s “annoying” and some features (like compaction) are OpenAI-specific.
- Competing tools (Amp, OpenCode, Gemini CLI) are compared: Amp’s read loop and mixed-model strategy (fast vs smart) feels snappier to some; OpenCode is praised for UX; Claude Code for hooks and diffs despite stability issues.
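Pointing Codex at a non-OpenAI model goes through custom providers in `~/.codex/config.toml`. A sketch, assuming a local Ollama server exposing an OpenAI-compatible endpoint; the provider id and model name are illustrative:

```toml
# ~/.codex/config.toml (sketch)
model = "llama3.1"
model_provider = "ollama"

[model_providers.ollama]
name = "Ollama"
base_url = "http://localhost:11434/v1"
```

As noted above, OpenAI-specific features such as compaction will not work through a custom provider, since they depend on the Responses API rather than the generic chat-completions surface.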