The Python Package Index Should Get Rid of Its Training Wheels
PyPI’s Core Problem: Bandwidth, Not Just Storage
- Several comments stress that storage is relatively cheap; bandwidth is the real cost driver, and at retail cloud prices it would exceed the PSF's income.
- Binary artifacts increase storage linearly (not exponentially), but popular packages see tens of millions of downloads, largely from CI/container use, driving bandwidth costs.
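The scale of the bandwidth problem is easy to see with a rough calculation; both figures below are illustrative assumptions, not PyPI statistics:

```python
# Back-of-envelope bandwidth for one popular package.
# Both numbers are illustrative assumptions, not real PyPI measurements.
downloads_per_month = 50_000_000   # "tens of millions" of mostly CI-driven downloads
wheel_size_mib = 15                # a typical compiled wheel, order of magnitude

tib_per_month = downloads_per_month * wheel_size_mib / (1024 * 1024)
print(f"~{tib_per_month:.0f} TiB/month for a single package")  # ~715 TiB/month
```

Even at a fraction of retail cloud egress pricing, volumes on this order dwarf a typical nonprofit's budget, which is why the discussion centers on bandwidth rather than storage.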
Binary Wheels vs Building from Source
- Many see wheels as essential: they make Python usable for non-experts, Windows users, and scientists/ML folks, and spare everyone nightmarish local builds of packages like NumPy/SciPy, TensorFlow, and PyTorch.
- Skeptics of removing “training wheels” argue build times on weak/IoT devices would be prohibitive and complex C/Fortran/GPU stacks are unrealistic to rebuild locally.
- Others note that client-side builds could improve security and reduce central bandwidth, but won’t fundamentally change growth curves.
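For anyone who wants the build-from-source behavior today, pip already supports opting out of wheels; a sketch (package names are just examples, and a local compiler toolchain is required):

```shell
# Force a source build for one package, ignoring its wheels on PyPI
pip install --no-binary :all: numpy

# Force source builds for an entire environment
pip install --no-binary :all: -r requirements.txt
```

This is the mechanism the "remove the training wheels" position would effectively make the default; the skeptics' point is what that default would cost on weak hardware.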
Zig-Based Repeatable Builds Proposal
- The article’s idea: adopt a Zig-based, hermetic C/C++ build system so PyPI can rebuild wheels on demand or users can build locally; then old binaries can be deleted safely.
- Supporters say Zig has largely “solved” cross-compilation and unified builds, making such a scheme feasible.
- Critics counter that:
- C/C++ build complexity is mostly in autoconf/CMake/meson/etc., not just compilers.
- Fortran, OpenMP, existing bespoke build systems, and non-open-source packages are big gaps.
- Fixing the global C/C++ build mess to save PyPI bandwidth is seen as overreach by some.
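The property the proposal leans on is that Zig's bundled Clang acts as a drop-in, cross-compiling C/C++ compiler. A sketch of driving a Python extension build through it (assumes zig is installed; the target triple and package name are illustrative):

```shell
# Use zig's bundled clang as the compiler for a source build,
# cross-targeting x86_64 Linux with glibc 2.28
export CC="zig cc -target x86_64-linux-gnu.2.28"
export CXX="zig c++ -target x86_64-linux-gnu.2.28"
pip install --no-binary :all: some-c-extension   # hypothetical package
```

The critics' objection is that this handles the compiler step but not the autoconf/CMake/meson layers, Fortran, or bespoke build systems sitting above it.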
Handling Large Packages (e.g., TensorFlow)
- TensorFlow’s multi‑terabyte footprint (due to its matrix of GPU/CUDA variants, Python versions, and pre-releases) is viewed as “absurd” by some, but others argue it’s worth it if it saves engineers hours.
- Proposed mitigations:
- Require very large or corporate-backed projects to self-host binaries while PyPI stores hashes.
- Allow PyPI to delete rarely used binaries if they’re reproducible.
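pip's hash-checking mode plus an external index already approximates the "self-host binaries, PyPI keeps hashes" idea in miniature; a hypothetical requirements.txt fragment (the URL is illustrative and the hash is a placeholder):

```
# Wheels are fetched from a project-hosted index, not PyPI...
--find-links https://example.com/tensorflow-wheels/
# ...but verified against a pinned hash before installation
tensorflow==2.16.1 --hash=sha256:<placeholder-hash>
```

Note that once any requirement carries a `--hash`, pip requires hashes for every requirement in the file, so this scheme pushes projects toward fully pinned, verifiable dependency sets.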
Mirrors, Caches, and CI Waste
- Many note wasteful CI patterns: fresh containers redownload core packages on every run because pip’s cache doesn’t persist between jobs.
- Suggestions: CI providers should run caching proxies (DevPi, Nexus, Artifactory), or even pay PyPI’s bandwidth directly.
- Some envision a Nix-like ecosystem where multiple caches and mirrors exist once builds are standardized.
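Pointing pip at a local caching proxy is a one-file change; a sketch assuming a devpi instance on its default port (the URL is illustrative):

```ini
; ~/.config/pip/pip.conf (Linux/macOS) or %APPDATA%\pip\pip.ini (Windows)
[global]
index-url = http://localhost:3141/root/pypi/+simple/
```

With this in a CI image, repeated builds hit the proxy's cache instead of PyPI, which is the commenters' point: the fix is cheap and sits entirely on the consumer side.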
System Package Managers and External Dependencies
- Some advocate “just use system packages” (Debian, Fedora, Homebrew, etc.) at least for non-Python deps.
- Counterpoints:
- Windows and containers/venvs don’t align well with system package managers.
- Distros can’t keep up with the full PyPI ecosystem or multiple parallel versions.
- Mixing system and language package managers causes “real horrors.”
- PEP 725 is highlighted as a way for Python packages to declare external (system-level) dependencies so tools like Nix/Guix/Spack/conda can manage them.
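As proposed, PEP 725 declares external dependencies in pyproject.toml via an [external] table using PURL-style identifiers; a sketch of what a package needing a C compiler, Fortran, and a BLAS library might declare (identifiers follow the PEP's examples but are illustrative here):

```toml
[external]
build-requires = [
  "virtual:compiler/c",
  "virtual:compiler/fortran",
]
host-requires = [
  "pkg:generic/openblas",
]
```

The metadata is declarative: pip itself wouldn't install OpenBLAS, but tools like Nix, Guix, Spack, or conda could read it and provision the system-level pieces.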
Security and Dependency Culture
- Some organizations avoid PyPI entirely, maintaining curated mirrors due to malware and hijacking incidents.
- One view: Python’s extremely easy packaging encourages excessive, low-quality dependencies compared to ecosystems like C++, where higher friction forces more vetting.
User Experience and Fragility
- Several anecdotes describe confusing breakages from name collisions (e.g., jwt vs PyJWT), missing optional dependencies (e.g., cryptography), and version conflicts.
- Others respond that:
- PyPI package names and import names are intentionally decoupled.
- Optional extras and evolving APIs are normal; reading docs and pinning versions is expected.
- Overall sentiment: Python packaging “mostly works” but has sharp edges; removing wheels outright is widely opposed, while targeted reform (better metadata, caching, and optional build-from-source paths) gets more support.