ChatGPT Containers can now run bash, install packages via pip/npm, and download files

New container capabilities & language support

  • ChatGPT’s “containers” can now run bash, install packages via pip/npm, download files, and execute code in multiple languages (Node, Ruby, Perl, PHP, Go, Java, Swift, Kotlin, C/C++).
  • The feature appears to be available even to free users, though heavily rate-limited; paid users report more stable access.
  • Some minor rough edges: npm auth misconfigurations, and having to say “in the container” explicitly to get actual execution rather than just instructions.
  • Users have successfully installed additional tooling (e.g., deb packages, Ruby gems) inside the sandbox.
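A rough sketch of what probing the sandbox looks like: a small Python script checking which language toolchains are on PATH. The binary names below are assumptions drawn from the languages listed above; actual availability varies per container.

```python
import shutil

# Candidate binaries for the languages reportedly supported in the sandbox
# (assumed names; e.g. Kotlin may ship as `kotlin` rather than `kotlinc`).
TOOLCHAINS = ["bash", "node", "ruby", "perl", "php", "go", "java", "swift", "kotlinc", "gcc"]

def probe_toolchains(names=TOOLCHAINS):
    """Return {tool: absolute-path-or-None} for each candidate binary on PATH."""
    return {name: shutil.which(name) for name in names}

if __name__ == "__main__":
    for tool, path in probe_toolchains().items():
        print(f"{tool:8s} -> {path or 'not found'}")
```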

Dependencies, packages, and LLM-written code

  • One thread questions whether npm/pip-style dependency trees still make sense if LLMs can generate needed code on demand.
  • Pushback: serious libraries (NumPy, pandas, scikit-learn, BLAS, crypto, etc.) encapsulate heavy correctness and performance work that is not realistic to “regenerate” every time.
  • Concerns about “AI-slop” dependencies vs. vetted, human-reviewed libraries and supply-chain attacks (both through public registries and inside containers).
  • Some users now inline tiny modules directly into projects to avoid dependency bloat and npm/pip-jacking.
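The inlining approach amounts to vendoring trivial utilities instead of importing them. A minimal illustration, using a hypothetical `left_pad` (the canonical example of a dependency small enough to inline):

```python
# Instead of adding a package for a trivial utility (the "left-pad" lesson),
# inline it: the entire "module" is one reviewable function, immune to
# registry hijacking and version churn.

def left_pad(s: str, width: int, fill: str = " ") -> str:
    """Pad s on the left with fill characters until it is width long."""
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    return s if len(s) >= width else fill * (width - len(s)) + s
```

For example, `left_pad("7", 3, "0")` yields `"007"`. The trade-off is maintenance: inlined code no longer receives upstream fixes, which is why this is usually reserved for tiny, stable utilities.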

Static vs dynamic languages in the LLM era

  • Big subthread on whether dynamic languages’ advantage shrinks when LLMs write most of the code.
  • Many report moving prototypes/CLI tools from Python/JS to Go or Rust, arguing:
    • Compiler/type errors are a powerful feedback loop for agents.
    • Static constraints reduce “category errors” (types, lifetimes, concurrency, memory safety).
    • Go’s simple syntax, tooling, and standard library pair well with coding agents.
  • Counterpoints:
    • Python/TypeScript still give shorter, more legible code for humans reviewing AI output.
    • LLMs perform worse in less-popular or niche languages; training data and ecosystem maturity still matter.
    • Some suggest a pipeline: prototype in Python, then use LLMs to port to Rust/Go; others question why not write Rust/Go directly.
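To make the “category errors” point concrete, here is a sketch in Python (hypothetical function names): with type annotations, a static checker such as mypy can flag the mismatch before execution, giving an agent the same pre-run feedback loop a compiler provides; plain dynamic execution only fails at runtime.

```python
from typing import List

def total_ms(durations: List[int]) -> int:
    """Sum a list of durations given in milliseconds."""
    return sum(durations)

# A category error: strings where ints are expected. A static checker
# flags this call from the annotations alone, before the code ever runs;
# the dynamic interpreter only discovers it mid-execution.
try:
    total_ms(["100", "250"])  # type: ignore[list-item]
except TypeError as exc:
    print("runtime failure:", exc)
```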

Security, isolation, and compute limits

  • Users ask whether code runs as root and how isolated the environment really is.
  • Responses indicate:
    • No sudo/apt; installations via pip/npm in a restricted user environment.
    • Containers reportedly use gVisor and other hardening techniques, but skepticism remains due to frequent container escapes.
  • CPU/RAM observations: the environment reports many cores (e.g., 56), but these likely reflect the shared host’s topology, with cgroup throttling capping actual compute, rather than dedicated resources.
  • Infosec commenters expect a surge in sandbox escapes, supply-chain attacks, and generally more insecure, AI-generated systems.
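The uid and cgroup observations above can be checked from inside any Linux container with a few lines of Python. The cgroup v2 path below is an assumption about the sandbox’s layout, not a confirmed ChatGPT detail; on cgroup v1 hosts or unrestricted machines the quota simply won’t be visible.

```python
import os

def describe_compute():
    """Report uid, apparent core count, and (if visible) the cgroup CPU cap.
    /sys/fs/cgroup/cpu.max is the cgroup v2 location (assumed layout)."""
    info = {"uid": os.getuid(), "logical_cpus": os.cpu_count(), "effective_cpus": None}
    try:
        # cgroup v2 format: "<quota> <period>" in microseconds, or "max <period>"
        quota, period = open("/sys/fs/cgroup/cpu.max").read().split()
        if quota != "max":
            info["effective_cpus"] = int(quota) / int(period)
    except (OSError, ValueError):
        pass  # cgroup v1, no limit imposed, or a different layout
    return info

if __name__ == "__main__":
    print(describe_compute())
```

A uid of 0 would mean root inside the container’s user namespace; many logical CPUs alongside a small `effective_cpus` value would match the “shared topology, throttled quota” interpretation.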

Agents, dev environments, and tool ecosystems

  • Several note this move positions ChatGPT as a full “remote dev box”, potentially eroding demand for local environments and some SaaS sandboxes.
  • Interest in persistent or ephemeral virtual dev environments: some tools (Claude Code for web, sprites-like systems, custom VM offerings) are already experimenting here, though stability is mixed.
  • Linux tool access (ffmpeg, ImageMagick, file/magic, etc.) enables agents to solve “real” system tasks (e.g., image/video transformations, print-preflight checks) more reliably than pure model reasoning.
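An agent shelling out to such tools typically needs only a thin wrapper like the sketch below; the `file` and `ffprobe` invocations in the comment are hypothetical examples, and availability of any given tool depends on the sandbox image.

```python
import shutil
import subprocess

def run_tool(cmd: list[str]) -> str:
    """Run a system tool (ffmpeg, ImageMagick, file, ...) and return its stdout.
    Raises FileNotFoundError if the binary is missing, CalledProcessError on
    a nonzero exit, so an agent gets a clear failure signal either way."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not on PATH")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout

# Hypothetical usage:
#   run_tool(["file", "--brief", "photo.jpg"])
#   run_tool(["ffprobe", "-v", "error", "-show_format", "clip.mp4"])
```

The point of the thread stands either way: delegating to battle-tested binaries gives deterministic results (exact codecs, byte-accurate transforms) that pure model reasoning cannot guarantee.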

LLM usage, “vibecoding”, and quality

  • Strong disagreement over the claim that “most code is now written by LLMs”:
    • Some engineers (including at large companies) report 20–80% of new code authored by agents, especially boilerplate, tests, and frontends.
    • Others say LLM code in production is still rare in their domains, or limited to assistance rather than full authorship.
  • Advocates argue:
    • Human time is better spent on problem selection, design, and verification than hand-writing boilerplate.
    • With good specs, tests, and review, large refactors and greenfield projects can be done dramatically faster.
  • Skeptics stress:
    • “Vibecoded” systems risk being fragile, insecure, and poorly understood by their nominal owners.
    • Most existing human-written code is already low quality; training on it plus weak specs may amplify garbage.
    • Customers may not yet see clear end-user benefits, especially where organizational factors dominate quality outcomes.

Other models & regressions

  • Comparisons:
    • Some prefer ChatGPT for search and these new containers; others favor Claude Code’s agentic behavior and Gemini for search.
  • Reports that Gemini recently lost (or broke) its ability to actually execute Python/JS despite claiming to do so, undermining trust in its “run code” feature.