ChatGPT Containers can now run bash, install packages via pip/npm, and download files

New container capabilities & language support

  • ChatGPT’s “containers” can now run bash, install packages via pip/npm, download files, and execute code in multiple languages (Node, Ruby, Perl, PHP, Go, Java, Swift, Kotlin, C/C++).
  • The feature appears to be available even to free users, though heavily rate-limited; paid users report more stable access.
  • Some minor rough edges: npm auth misconfigurations, and having to say “in the container” explicitly to get actual execution rather than just instructions.
  • Users have successfully installed additional tooling (e.g., deb packages, Ruby gems) inside the sandbox.
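A rough sketch of what probing the sandbox looks like: a small Python script checking which language toolchains are on PATH. The binary names below are assumptions drawn from the languages listed above; actual availability varies per container.

```python
import shutil

# Candidate binaries for the languages reportedly supported in the sandbox
# (assumed names; e.g. Kotlin may ship as `kotlin` rather than `kotlinc`).
TOOLCHAINS = ["bash", "node", "ruby", "perl", "php", "go", "java", "swift", "kotlinc", "gcc"]

def probe_toolchains(names=TOOLCHAINS):
    """Return {tool: absolute-path-or-None} for each candidate binary on PATH."""
    return {name: shutil.which(name) for name in names}

if __name__ == "__main__":
    for tool, path in probe_toolchains().items():
        print(f"{tool:8s} -> {path or 'not found'}")
```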

Dependencies, packages, and LLM-written code

  • One thread questions whether npm/pip-style dependency trees still make sense if LLMs can generate needed code on demand.
  • Pushback: serious libraries (NumPy, pandas, scikit-learn, BLAS, crypto, etc.) encapsulate heavy correctness and performance work that is not realistic to “regenerate” every time.
  • Concerns about “AI-slop” dependencies vs. vetted, human-reviewed libraries and supply-chain attacks (both through public registries and inside containers).
  • Some users now inline tiny modules directly into projects to avoid dependency bloat and npm/pip-jacking.
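The inlining approach amounts to vendoring trivial utilities instead of importing them. A minimal illustration, using a hypothetical `left_pad` (the canonical example of a dependency small enough to inline):

```python
# Instead of adding a package for a trivial utility (the "left-pad" lesson),
# inline it: the entire "module" is one reviewable function, immune to
# registry hijacking and version churn.

def left_pad(s: str, width: int, fill: str = " ") -> str:
    """Pad s on the left with fill characters until it is width long."""
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    return s if len(s) >= width else fill * (width - len(s)) + s
```

For example, `left_pad("7", 3, "0")` yields `"007"`. The trade-off is maintenance: inlined code no longer receives upstream fixes, which is why this is usually reserved for tiny, stable utilities.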

Static vs dynamic languages in the LLM era

  • Big subthread on whether dynamic languages’ advantage shrinks when LLMs write most of the code.
  • Many report moving prototypes/CLI tools from Python/JS to Go or Rust, arguing:
    • Compiler/type errors are a powerful feedback loop for agents.
    • Static constraints reduce “category errors” (types, lifetimes, concurrency, memory safety).
    • Go’s simple syntax, tooling, and standard library pair well with coding agents.
  • Counterpoints:
    • Python/TypeScript still give shorter, more legible code for humans reviewing AI output.
    • LLMs perform worse in less-popular or niche languages; training data and ecosystem maturity still matter.
    • Some suggest a pipeline: prototype in Python, then use LLMs to port to Rust/Go; others question why not write Rust/Go directly.
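To make the “category errors” point concrete, here is a sketch in Python (hypothetical function names): with type annotations, a static checker such as mypy can flag the mismatch before execution, giving an agent the same pre-run feedback loop a compiler provides; plain dynamic execution only fails at runtime.

```python
from typing import List

def total_ms(durations: List[int]) -> int:
    """Sum a list of durations given in milliseconds."""
    return sum(durations)

# A category error: strings where ints are expected. A static checker
# flags this call from the annotations alone, before the code ever runs;
# the dynamic interpreter only discovers it mid-execution.
try:
    total_ms(["100", "250"])  # type: ignore[list-item]
except TypeError as exc:
    print("runtime failure:", exc)
```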

Security, isolation, and compute limits

  • Users ask whether code runs as root and how isolated the environment really is.
  • Responses indicate:
    • No sudo/apt; installations via pip/npm in a restricted user environment.
    • Containers reportedly use gVisor and other hardening techniques, but skepticism remains due to frequent container escapes.
  • CPU/RAM observations: the environment reports many cores (e.g., 56), but these likely reflect the shared host’s topology, with cgroup throttling capping actual compute, rather than dedicated resources.
  • Infosec commenters expect a surge in sandbox escapes, supply-chain attacks, and generally more insecure, AI-generated systems.
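The uid and cgroup observations above can be checked from inside any Linux container with a few lines of Python. The cgroup v2 path below is an assumption about the sandbox’s layout, not a confirmed ChatGPT detail; on cgroup v1 hosts or unrestricted machines the quota simply won’t be visible.

```python
import os

def describe_compute():
    """Report uid, apparent core count, and (if visible) the cgroup CPU cap.
    /sys/fs/cgroup/cpu.max is the cgroup v2 location (assumed layout)."""
    info = {"uid": os.getuid(), "logical_cpus": os.cpu_count(), "effective_cpus": None}
    try:
        # cgroup v2 format: "<quota> <period>" in microseconds, or "max <period>"
        quota, period = open("/sys/fs/cgroup/cpu.max").read().split()
        if quota != "max":
            info["effective_cpus"] = int(quota) / int(period)
    except (OSError, ValueError):
        pass  # cgroup v1, no limit imposed, or a different layout
    return info

if __name__ == "__main__":
    print(describe_compute())
```

A uid of 0 would mean root inside the container’s user namespace; many logical CPUs alongside a small `effective_cpus` value would match the “shared topology, throttled quota” interpretation.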

Agents, dev environments, and tool ecosystems

  • Several note this move positions ChatGPT as a full “remote dev box”, potentially eroding demand for local environments and some SaaS sandboxes.
  • Interest in persistent or ephemeral virtual dev environments: some tools (Claude Code for web, sprites-like systems, custom VM offerings) are already experimenting here, though stability is mixed.
  • Linux tool access (ffmpeg, ImageMagick, file/magic, etc.) enables agents to solve “real” system tasks (e.g., image/video transformations, print-preflight checks) more reliably than pure model reasoning.
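An agent shelling out to such tools typically needs only a thin wrapper like the sketch below; the `file` and `ffprobe` invocations in the comment are hypothetical examples, and availability of any given tool depends on the sandbox image.

```python
import shutil
import subprocess

def run_tool(cmd: list[str]) -> str:
    """Run a system tool (ffmpeg, ImageMagick, file, ...) and return its stdout.
    Raises FileNotFoundError if the binary is missing, CalledProcessError on
    a nonzero exit, so an agent gets a clear failure signal either way."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not on PATH")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout

# Hypothetical usage:
#   run_tool(["file", "--brief", "photo.jpg"])
#   run_tool(["ffprobe", "-v", "error", "-show_format", "clip.mp4"])
```

The point of the thread stands either way: delegating to battle-tested binaries gives deterministic results (exact codecs, byte-accurate transforms) that pure model reasoning cannot guarantee.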

LLM usage, “vibecoding”, and quality

  • Strong disagreement over the claim that “most code is now written by LLMs”:
    • Some engineers (including at large companies) report 20–80% of new code authored by agents, especially boilerplate, tests, and frontends.
    • Others say LLM code in production is still rare in their domains, or limited to assistance rather than full authorship.
  • Advocates argue:
    • Human time is better spent on problem selection, design, and verification than hand-writing boilerplate.
    • With good specs, tests, and review, large refactors and greenfield projects can be done dramatically faster.
  • Skeptics stress:
    • “Vibecoded” systems risk being fragile, insecure, and poorly understood by their nominal owners.
    • Most existing human-written code is already low quality; training on it plus weak specs may amplify garbage.
    • Customers may not yet see clear end-user benefits, especially where organizational factors dominate quality outcomes.

Other models & regressions

  • Comparisons:
    • Some prefer ChatGPT for search and these new containers; others favor Claude Code’s agentic behavior and Gemini for search.
  • Reports that Gemini recently lost (or broke) its ability to actually execute Python/JS despite claiming to do so, undermining trust in its “run code” feature.