Python is not a great language for data science

Scope and thesis of the article

  • Many commenters think the piece is well written but under-argued: the main post mostly contrasts Python and R code snippets, and its thesis is stated in full only in a sequel. That sequel names four Python issues for data science: reference semantics, no built-in missing values, no built-in vectorization, and no non‑standard evaluation.
  • Some find the examples weak or contrived (e.g., manually computing means/SDs instead of using the standard-library statistics module or NumPy), arguing this exaggerates Python’s shortcomings.
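
For readers who have not seen these complaints in code, here is a minimal sketch of three of the four issues (non-standard evaluation has no direct Python counterpart), followed by the standard-library route to a mean and SD that commenters say the article skipped. The sample numbers and variable names are illustrative assumptions, not taken from the article or the thread.

```python
import math
import statistics

# 1. Reference semantics: assignment binds a second name to the same list.
a = [1.0, 2.0, 3.0]
b = a            # no copy is made
b[0] = 99.0      # the change is visible through `a` as well
aliased = a[0] == 99.0

# 2. No built-in missing value: None is not numeric, and NaN propagates silently.
try:
    sum([1.0, None, 3.0])
    none_is_summable = True
except TypeError:
    none_is_summable = False
nan_total = sum([1.0, float("nan"), 3.0])

# 3. No built-in vectorization: `*` on a list repeats it rather than scaling it.
repeated = [1, 2, 3] * 2
scaled = [x * 2 for x in [1, 2, 3]]  # element-wise work needs an explicit loop

# 4. The standard library already covers the mean/SD case.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample data
mean = statistics.mean(values)
sd = statistics.stdev(values)  # sample SD, n - 1 denominator
```

Libraries such as NumPy and pandas address points 2 and 3, which fits the thread's view that Python's strength lies in its ecosystem rather than the core language.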

Why Python dominates data science

  • Strong consensus that Python’s success is driven by ecosystem and network effects, not inherent suitability:
    • Huge library support (NumPy, pandas/Polars, scikit‑learn, PyTorch, Jupyter, etc.).
    • General‑purpose “glue” language: good enough at scraping, file and format handling, orchestration, and integration with databases, C/C++/Fortran, and GPUs.
    • Easy for non‑programmers and cross‑discipline teams; code is widely readable and reviewable.
  • Several note that hiring, teaching, and production engineering all strongly favor Python; R, SAS, Matlab, etc. are seen as niche or expensive.

R vs Python in practice

  • Many practitioners use both:
    • R (especially tidyverse/data.table + ggplot) favored for exploratory analysis, tabular wrangling, and plotting; code often shorter and closer to statistical thinking.
    • Python preferred for “logistics”: file juggling, large‑scale pipelines, reproducible deployments, and integration into larger software systems.
  • Productionizing R is widely described as painful; a common pattern is to prototype in R, then rewrite in another language.
  • Others push back that R has serious quirks (non‑standard evaluation, indexing oddities, silent NA behaviors) and can be fragile for larger software.

Tables, dataframes, and language design

  • A big subthread argues the real problem is that mainstream languages don’t treat tables/dataframes as first‑class citizens; instead users learn mini‑languages (pandas, dplyr, Polars).
  • Suggestions and examples span SQL, q/kdb, Clojure, Rye, Lil, Nushell, APL, Matlab, Julia, Fortran, and Excel‑style tools.
  • Some think SQL + tools like DuckDB are a cleaner core for tabular work, with Python or R around the edges; others prefer staying in a dataframe‑centric DSL.
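
To make the "SQL core, Python around the edges" position concrete, here is a rough sketch of a grouped mean computed two ways. It uses the standard-library sqlite3 module as a stand-in for DuckDB, purely so the example has no third-party dependency; the table name, column names, and data are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Made-up observations: (group label, measured value).
rows = [("a", 1.0), ("a", 3.0), ("b", 10.0)]

# SQL as the core: the table is the native object and the query is declarative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (grp TEXT, value REAL)")
conn.executemany("INSERT INTO obs VALUES (?, ?)", rows)
sql_means = dict(
    conn.execute(
        "SELECT grp, AVG(value) FROM obs GROUP BY grp ORDER BY grp"
    ).fetchall()
)
conn.close()

# The same result in plain Python, where grouping is hand-written bookkeeping.
buckets = defaultdict(list)
for grp, value in rows:
    buckets[grp].append(value)
py_means = {grp: sum(vals) / len(vals) for grp, vals in buckets.items()}
```

Both routes give the same per-group means. The difference the subthread cares about is that the SQL version names the table operation directly, while the plain-Python version rebuilds it from dictionaries and loops.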

Broader language comparisons and “good enough”

  • Multiple commenters claim no current language is truly “great” for data science; Python and R are both compromises.
  • Julia, Clojure, Kotlin, Nim, SAS, Matlab, and even shell pipelines are mentioned as promising or domain‑strong but lacking Python’s momentum.
  • Common conclusion: Python isn’t the best for data science, but it’s “good enough” at nearly everything and wins on ubiquity, tooling, and ecosystem.