Elixir/Erlang Hot Swapping Code (2016)

Role and Reliability of Hot Swapping on the BEAM

  • Erlang/Elixir’s hot code loading is widely acknowledged as real and powerful, designed for high‑uptime, long‑running, stateful systems (e.g., telecom, healthcare, persistent connections).
  • Several commenters stress Erlang’s overall reliability focus (fault tolerance, supervisors, “let it crash”) but distinguish that from the reliability of hot swapping itself, which they see as more fragile.
  • Many argue that for most web apps, simpler restart‑based or blue‑green/container deployments are adequate and safer.

Practical Use vs. Complexity

  • Some teams report heavy production use: e.g., ~99% of deployments via hot reload with almost no restarts, or frequent zero‑disruption Elixir patches. Others reserve it for emergencies, patching live nodes while the regular CI/CD pipeline catches up.
  • Others found that “zero downtime via hot reload” required large extra effort: explicit state migration code (code_change), careful supervision design, and thorough testing.
  • Complexity points raised:
    • Two versions of a module can coexist; internal vs fully qualified calls can hit different versions.
    • State schemas and message formats must be forward/backward compatible during transitions.
    • Migrations must be written in both directions (upgrade and downgrade) between specific version pairs; bugs here can kill processes or cause restart loops.
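
The points above can be sketched in a few lines of Elixir. Both modules are hypothetical illustrations, not code from the discussion: Counter assumes that version 1 of a GenServer kept a bare integer as state and version 2 keeps a map, and Ticker is a hand-written receive loop that shows the difference between a local call and a fully qualified one.

```elixir
defmodule Counter do
  use GenServer

  # Hypothetical versions: v1 state was a bare integer, v2 state is a map.
  @vsn 2

  def start_link(initial \\ 0) do
    GenServer.start_link(__MODULE__, initial, name: __MODULE__)
  end

  def increment, do: GenServer.call(__MODULE__, :increment)

  @impl true
  def init(initial), do: {:ok, %{count: initial}}

  @impl true
  def handle_call(:increment, _from, %{count: n} = state) do
    {:reply, n + 1, %{state | count: n + 1}}
  end

  # Upgrade from version 1: wrap the old integer state in the new map.
  @impl true
  def code_change(1, old_count, _extra) when is_integer(old_count) do
    {:ok, %{count: old_count}}
  end

  # Downgrade back to version 1: unwrap the map again.
  def code_change({:down, 1}, %{count: n}, _extra), do: {:ok, n}

  # Any other transition leaves the state untouched.
  def code_change(_old_vsn, state, _extra), do: {:ok, state}
end

defmodule Ticker do
  def start, do: spawn(fn -> loop(0) end)

  def loop(n) do
    receive do
      :tick ->
        # Fully qualified call: re-enters through the newest loaded version.
        __MODULE__.loop(n + 1)

      {:get, from} ->
        send(from, {:count, n})
        # Local call: stays in whichever version is already running.
        loop(n)
    end
  end
end
```

The code_change/3 clauses are plain functions, so each migration direction can be unit tested without performing a real upgrade; in a live system they are invoked by the release handler, or manually through :sys.change_code/4 on a suspended process. A missing or wrong clause is exactly the kind of bug that crashes the process during the transition.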

Distributed Systems and Atomicity

  • Commenters note you cannot achieve truly atomic hot upgrades across nodes, or even fully within one VM; schedulers and processes see new code at slightly different times.
  • Recommended patterns focus on progressive rollout in compatibility steps: first deploy code that handles both old and new requests, then update clients/peers, and only then remove the legacy paths.
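
A minimal sketch of the first compatibility step, assuming a hypothetical Orders module whose peers used to send a {:order, id, qty} tuple and will later send a tagged map. The message shapes and field names are invented for illustration only.

```elixir
defmodule Orders do
  # Hypothetical shapes: old peers send {:order, id, qty},
  # upgraded peers send %{version: 2, id: ..., qty: ...}.

  # Old shape: translate it into the new one.
  def normalize({:order, id, qty}) do
    {:ok, %{version: 2, id: id, qty: qty, note: nil}}
  end

  # New shape: accept as-is, filling in the optional field if absent.
  def normalize(%{version: 2, id: _, qty: _} = msg) do
    {:ok, Map.put_new(msg, :note, nil)}
  end

  # Anything else is rejected explicitly instead of crashing the receiver.
  def normalize(other), do: {:error, {:unsupported, other}}
end
```

Once every peer sends the new shape, the tuple clause can be deleted in a later release. Until then, nodes running either version can talk to each other while the rollout is in progress.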

Comparisons to Other Ecosystems

  • PHP “edit in prod” nostalgia surfaces; acknowledged as workable for small, simple systems and teams, but unsafe at scale.
  • Many prefer container‑level rolling/blue‑green deployments, arguing they already require resilience to instance churn. Others counter that some domains (telephony, MMOs, drones, interactive music systems) truly benefit from in‑place state‑preserving updates.
  • Similar capabilities are mentioned in Common Lisp and other Lisps (including Clojure workflows), Smalltalk, JVM hot reload, MUD engines, and game servers using data‑ or script‑driven hotfixes.

Community Trajectory

  • In Elixir, hot‑upgrade enthusiasm (e.g., older Distillery/relup workflows) has cooled; the newer built‑in tooling (mix release) does not support hot upgrades out of the box, largely because of the complexity involved.
  • Consensus: a powerful but niche feature, critical when you truly need continuous, stateful uptime, and overkill and risky for most everyday services.