The Python Package Index Should Get Rid of Its Training Wheels

PyPI’s Core Problem: Bandwidth, Not Just Storage

  • Several comments stress that storage is relatively cheap; bandwidth is the real cost driver, and at retail cloud prices it would exceed the PSF’s income.
  • Binary artifacts increase storage linearly (not exponentially), but popular packages see tens of millions of downloads, largely from CI/container use, driving bandwidth costs.
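The scale argument is easy to sketch with back-of-envelope arithmetic. Every number below is an assumption for illustration, not an actual PyPI or PSF figure:

```python
# Back-of-envelope bandwidth cost estimate. All inputs are assumed
# illustrative values, not real PyPI metrics or PSF billing figures.
downloads_per_month = 50_000_000   # assumed downloads for one popular package
wheel_size_gb = 0.05               # assumed ~50 MB binary wheel
retail_egress_usd_per_gb = 0.09    # assumed retail cloud egress price

monthly_cost = downloads_per_month * wheel_size_gb * retail_egress_usd_per_gb
print(f"${monthly_cost:,.0f}/month for this one package")
```

Even with generous discounts off retail pricing, multiplying across thousands of popular packages shows why bandwidth, not storage, dominates the discussion.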

Binary Wheels vs Building from Source

  • Many see wheels as essential: they make Python usable for non-experts, Windows users, scientists/ML folks, and avoid nightmare local compiles (NumPy/SciPy, TensorFlow, PyTorch).
  • Skeptics of removing “training wheels” argue build times on weak/IoT devices would be prohibitive and complex C/Fortran/GPU stacks are unrealistic to rebuild locally.
  • Others note that client-side builds could improve security and reduce central bandwidth, but won’t fundamentally change growth curves.

Zig-Based Repeatable Builds Proposal

  • The article’s idea: adopt a Zig-based, hermetic C/C++ build system so PyPI can rebuild wheels on demand or users can build locally; then old binaries can be deleted safely.
  • Supporters say Zig has largely “solved” cross-compilation and unified builds, making such a scheme feasible.
  • Critics counter that:
    • C/C++ build complexity is mostly in autoconf/CMake/meson/etc., not just compilers.
    • Fortran, OpenMP, existing bespoke build systems, and non-open-source packages are big gaps.
    • Fixing the global C/C++ build mess to save PyPI bandwidth is seen as overreach by some.
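The delete-and-rebuild scheme hinges on builds being bit-for-bit reproducible: an artifact can only be safely discarded if an independent rebuild hashes identically. A minimal verification sketch (the two build paths stand in for a hypothetical rebuild pipeline):

```python
import hashlib

def artifact_digest(path: str) -> str:
    """SHA-256 of a built artifact, streamed in 1 MiB chunks so large
    wheels don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def is_reproducible(first_build: str, second_build: str) -> bool:
    """Two independent builds must hash identically before a stored
    binary could be deleted and regenerated on demand."""
    return artifact_digest(first_build) == artifact_digest(second_build)
```

Note that this check is the easy part; the hard part, as the critics above point out, is making the C/C++/Fortran build itself deterministic enough to pass it.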

Handling Large Packages (e.g., TensorFlow)

  • TensorFlow’s multi‑terabyte footprint (due to GPU/CUDA/version matrix and pre-releases) is viewed as “absurd” by some, but others argue it’s worth it if it saves engineers hours.
  • Proposed mitigations:
    • Require very large or corporate-backed projects to self-host binaries while PyPI stores hashes.
    • Allow PyPI to delete rarely used binaries if they’re reproducible.
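The self-hosting proposal amounts to PyPI acting as a trust anchor rather than a CDN. A sketch of the verification step, with a made-up index record (not real PyPI metadata):

```python
import hashlib

# Hypothetical index record: PyPI keeps only the digest and a pointer;
# the multi-gigabyte binary lives on project-run infrastructure.
INDEX_RECORD = {
    "name": "bigpackage",
    "url": "https://downloads.example.org/bigpackage-1.0-py3-none-any.whl",
    "sha256": hashlib.sha256(b"example wheel bytes").hexdigest(),
}

def verify_external_artifact(record: dict, data: bytes) -> None:
    """Reject an externally hosted wheel whose bytes don't match the
    digest the index vouches for."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != record["sha256"]:
        raise ValueError(f"hash mismatch for {record['name']}: {actual}")
```

pip’s existing hash-checking mode (`--hash=sha256:...` in requirements files) already implements this client side; the proposal would move the hosting burden to the project while keeping the integrity guarantee.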

Mirrors, Caches, and CI Waste

  • Many note wasteful CI patterns: fresh containers constantly redownloading core packages despite pip caching.
  • Suggestions: CI providers should run caching proxies (devpi, Nexus, Artifactory), or even pay for PyPI’s bandwidth directly.
  • Some envision a Nix-like ecosystem where multiple caches and mirrors exist once builds are standardized.
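The core of what such caching proxies do is simple: hit the upstream index once per artifact, then serve every subsequent CI job from local disk. A minimal sketch (real proxies like devpi add index protocols, auth, and eviction on top):

```python
import hashlib
import os
import urllib.request

def cached_fetch(url: str, cache_dir: str) -> bytes:
    """Serve a package from a local cache, contacting the upstream
    index only on a cache miss -- the essence of a CI caching proxy."""
    os.makedirs(cache_dir, exist_ok=True)
    key = hashlib.sha256(url.encode()).hexdigest()
    path = os.path.join(cache_dir, key)
    if os.path.exists(path):
        # Cache hit: no upstream bandwidth consumed.
        with open(path, "rb") as f:
            return f.read()
    # Cache miss: one download, stored for every later job.
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    with open(path, "wb") as f:
        f.write(data)
    return data
```

A fleet of ephemeral CI containers pointed at one such cache turns N identical downloads of NumPy into one, which is exactly the waste pattern the comments complain about.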

System Package Managers and External Dependencies

  • Some advocate “just use system packages” (Debian, Fedora, Homebrew, etc.) at least for non-Python deps.
  • Counterpoints:
    • Windows and containers/venvs don’t align well with system package managers.
    • Distros can’t keep up with the full PyPI ecosystem or multiple parallel versions.
    • Mixing system and language package managers causes “real horrors.”
  • PEP 725 is highlighted as a way for Python packages to declare external (system-level) dependencies so tools like Nix/Guix/Spack/conda can manage them.
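Roughly, PEP 725 (a draft at the time of the discussion) adds an `[external]` table to `pyproject.toml` using PURL-style identifiers. A sketch of the shape it proposes, which may differ in detail from the final spec:

```toml
# Hypothetical pyproject.toml fragment in the style of PEP 725 (draft);
# exact key names and identifier syntax may change before acceptance.
[external]
build-requires = [
  "virtual:compiler/c",
  "virtual:compiler/fortran",
]
host-requires = [
  "pkg:generic/openblas",
]
```

The point is that the metadata only *declares* the external dependency; it is then up to tools like Nix, Guix, Spack, or conda to map those identifiers onto packages they can actually install.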

Security and Dependency Culture

  • Some organizations avoid PyPI entirely, maintaining curated mirrors due to malware and hijacking incidents.
  • One view: Python’s extremely easy packaging encourages excessive, low-quality dependencies compared to ecosystems like C++, where higher friction forces more vetting.

User Experience and Fragility

  • Several anecdotes describe confusing breakages from name collisions (e.g., jwt vs PyJWT), missing optional dependencies (e.g., cryptography), and version conflicts.
  • Others respond that:
    • PyPI package names and import names are intentionally decoupled.
    • Optional extras and evolving APIs are normal; reading docs and pinning versions is expected.
  • Overall sentiment: Python packaging “mostly works” but has sharp edges. Removing wheels outright is widely opposed; targeted reforms (better metadata, caching, and optional build-from-source paths) draw more support.