Elixir/Erlang Hot Swapping Code (2016)
Role and Reliability of Hot Swapping on the BEAM
- Erlang/Elixir’s hot code loading is widely acknowledged as real and powerful, designed for high‑uptime, long‑running, stateful systems (e.g., telecom, healthcare, persistent connections).
- Several commenters stress Erlang’s overall reliability focus (fault tolerance, supervisors, “let it crash”) but distinguish that from the reliability of hot swapping itself, which they see as more fragile.
- Many argue that for most web apps, simpler restart‑based or blue‑green/container deployments are adequate and safer.
Practical Use vs. Complexity
- Some teams report heavy production use: e.g., ~99% of deployments via hot reload with almost no restarts, or frequent zero‑disruption Elixir patches; others use it only for emergency fixes while the regular CI/CD pipeline runs.
- Others found that “zero downtime via hot reload” required large extra effort: explicit state migration code (code_change), careful supervision design, and thorough testing.
- Complexity points raised:
- Two versions of a module can coexist; internal vs fully qualified calls can hit different versions.
- State schemas and message formats must be forward/backward compatible during transitions.
- Bidirectional migrations between specific versions are needed; bugs here can kill processes or cause restart loops.
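As a rough illustration of the state migration work described above, here is a minimal sketch of a GenServer with both an upgrade and a downgrade path in code_change/3. The Counter module and both state shapes are hypothetical, invented for this example; only the code_change/3 callback and the {:down, vsn} convention come from OTP.

```elixir
defmodule Counter do
  # Hypothetical GenServer, for illustration only.
  # Assumed version 1 state: a bare integer count.
  # Assumed version 2 state: a map %{count: integer, step: integer}.
  use GenServer

  @impl true
  def init(count), do: {:ok, %{count: count, step: 1}}

  @impl true
  def handle_call(:increment, _from, %{count: count, step: step} = state) do
    new_state = %{state | count: count + step}
    {:reply, new_state.count, new_state}
  end

  # Downgrade path: OTP passes {:down, vsn} when rolling back,
  # so the v2 map is converted back to the v1 integer.
  @impl true
  def code_change({:down, _vsn}, %{count: count}, _extra) do
    {:ok, count}
  end

  # Upgrade path: a v1 integer state is wrapped into the v2 map.
  def code_change(_old_vsn, count, _extra) when is_integer(count) do
    {:ok, %{count: count, step: 1}}
  end

  # State is already in the expected shape: leave it untouched.
  def code_change(_vsn, state, _extra), do: {:ok, state}
end
```

Calling the callback directly, Counter.code_change("1", 5, []) returns {:ok, %{count: 5, step: 1}} and Counter.code_change({:down, "1"}, %{count: 5, step: 1}, []) returns {:ok, 5}. In a real release the callback would be driven by appup/relup instructions rather than called by hand, and a clause that fails to match is one way a migration bug can take a process down.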
Distributed Systems and Atomicity
- Commenters note you cannot achieve truly atomic hot upgrades across nodes, or even fully within one VM; schedulers and processes see new code at slightly different times.
- Recommended patterns focus on progressive rollout and compatibility steps: deploy code that handles old+new requests, then update clients/peers, then remove legacy paths.
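The first step of that rollout (code that handles both old and new requests) can be sketched as a receiver that normalizes either message shape into one internal form. The Compat module, the :set_limit message, and the default unit are assumptions made up for this example, not anything quoted in the thread.

```elixir
defmodule Compat do
  # Hypothetical message shapes, assumed for illustration only:
  #   old format: {:set_limit, limit}
  #   new format: {:set_limit, limit, unit}

  # Old senders: fill in the unit the old protocol implied.
  def normalize({:set_limit, limit}) when is_integer(limit) do
    {:ok, %{limit: limit, unit: :seconds}}
  end

  # New senders: the unit is explicit.
  def normalize({:set_limit, limit, unit}) when is_integer(limit) and is_atom(unit) do
    {:ok, %{limit: limit, unit: unit}}
  end

  # Unknown shapes are rejected instead of crashing the receiver.
  def normalize(other), do: {:error, {:unknown_message, other}}
end
```

Once every peer sends the three-element form, the first clause is the legacy path that a later deploy can remove.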
Comparisons to Other Ecosystems
- PHP “edit in prod” nostalgia surfaces; acknowledged as workable for small, simple systems and teams, but unsafe at scale.
- Many prefer container‑level rolling/blue‑green deployments, arguing they already require resilience to instance churn. Others counter that some domains (telephony, MMOs, drones, interactive music systems) truly benefit from in‑place state‑preserving updates.
- Similar capabilities are mentioned in Common Lisp, Smalltalk, Clojure and other Lisp workflows, JVM hot reload, MUD engines, and game servers using data‑ or script‑driven hotfixes.
Community Trajectory
- In Elixir, hot‑upgrade enthusiasm (e.g., older Distillery/relup workflows) has cooled; newer tooling (mix releases) downplays it due to complexity.
- Consensus: a powerful, niche feature—critical when you truly need continuous, stateful uptime, but overkill and risky for most everyday services.