Well-known paradox of R-squared is still buggin me
Is There Really a Paradox?
- Many commenters argue there is no paradox.
- State slightly shifts the chance of voting for one party (e.g., 0.45 → 0.55), so it has limited predictive power for individual votes.
- R² is low because an intercept-only model (everyone ≈ 50/50) already predicts individual votes almost as well; adding state reduces the prediction error only slightly.
- At the aggregate (state total) level, state fully determines the outcome in the toy setup, but R² is defined at the individual level here.
How R² Behaves in This Setup
- R² is framed as the relative reduction in mean squared error (MSE) compared to always predicting the mean (0.5).
- Baseline MSE is 0.25; using state reduces it only slightly (to ~0.2475), hence R² ≈ 0.01.
- Some note that squaring makes small effects look even smaller; the correlation R = √R² is 0.1 here, which some find more intuitive.
- Others stress R² has nice variance-decomposition properties and is best understood as “MSE rescaled.”
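
The arithmetic behind these numbers can be checked directly. The following is a minimal sketch, assuming the toy setup has two equally sized groups of states with vote probabilities 0.45 and 0.55; the function name `r_squared_for_binary_groups` is made up for illustration and does not come from the discussion.

```python
def r_squared_for_binary_groups(p_low=0.45, p_high=0.55):
    """Exact R^2 for predicting an individual vote from state.

    Assumes two equally sized groups (an assumption of this sketch).
    """
    grand_mean = (p_low + p_high) / 2
    # Baseline: always predict the grand mean, so the MSE is the
    # Bernoulli variance of the pooled outcome.
    mse_baseline = grand_mean * (1 - grand_mean)
    # With state: predict each group's own mean, so the MSE is the
    # average of the two within-group Bernoulli variances.
    mse_state = (p_low * (1 - p_low) + p_high * (1 - p_high)) / 2
    # R^2 as the relative reduction in MSE.
    r2 = 1 - mse_state / mse_baseline
    return mse_baseline, mse_state, r2


mse_baseline, mse_state, r2 = r_squared_for_binary_groups()
print(f"baseline MSE = {mse_baseline:.4f}")
print(f"MSE using state = {mse_state:.4f}")
print(f"R^2 = {r2:.4f}, R = {r2 ** 0.5:.4f}")
```

Widening the gap between the groups, for example `r_squared_for_binary_groups(0.3, 0.7)`, shows how slowly R² grows with the size of the shift.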
Binary Outcomes: Regression vs Classification
- Several argue linear regression and standard R² are ill-suited for binary or categorical outcomes; prefer logistic/probit models, cross-entropy, Brier score, or pseudo-R² measures.
- Counterpoint: with a single binary predictor, linear regression gives the correct group means (0.45, 0.55), and R² is a valid lens on prediction error, even if not ideal for classification performance.
- The disagreement is largely about framing: one side treats regression as a way to fit group means, while the other judges R² as if it were a classification quality metric.
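
The counterpoint about group means can be illustrated with a small sketch. It assumes hypothetical data of 1000 voters per group with exactly 45% and 55% voting for party A, and fits ordinary least squares by the closed-form one-predictor formulas. The helper `ols_fit` is written here for illustration only. The mean squared error of the fitted values is, by definition, also the Brier score of those values read as probabilities.

```python
# Hypothetical toy data: x is the state indicator, y is the vote (1 = party A).
x = [0] * 1000 + [1] * 1000
y = [1] * 450 + [0] * 550 + [1] * 550 + [0] * 450


def ols_fit(x, y):
    """Closed-form simple linear regression: returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = ols_fit(x, y)
# With one binary predictor, the fitted values are the two group means.
fitted = [intercept + slope * xi for xi in x]
# Mean squared error of the fitted values, i.e. their Brier score.
brier = sum((fi - yi) ** 2 for fi, yi in zip(fitted, y)) / len(y)

print(f"fitted mean for group 0: {intercept:.2f}")
print(f"fitted mean for group 1: {intercept + slope:.2f}")
print(f"Brier score / MSE: {brier:.4f}")
```

A logistic regression on the same single binary predictor is a saturated model, so it would reproduce the same two fitted probabilities; the choice between the two matters more once predictors are continuous or numerous.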
Modeling Choices and Variance Decomposition
- Some suggest treating state as a nominal factor or using mixed models; others say that’s unnecessary given symmetry and identical within-state variance.
- ANOVA-style reasoning: within-state variance is large and between-state mean differences are small, so grouping by state contributes little explanatory power, which is consistent with the low R².
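
The ANOVA-style argument is the law of total variance applied to the toy numbers. As a worked sketch, assume two equally sized groups of states with vote probabilities 0.45 and 0.55, and write Y for an individual vote and S for the voter's state (symbols chosen here for illustration):

```latex
\begin{aligned}
\operatorname{Var}(Y)
  &= \underbrace{\mathbb{E}\!\left[\operatorname{Var}(Y \mid S)\right]}_{\text{within-state}}
   + \underbrace{\operatorname{Var}\!\left(\mathbb{E}[Y \mid S]\right)}_{\text{between-state}} \\
\text{within-state}
  &= \tfrac{1}{2}(0.45)(0.55) + \tfrac{1}{2}(0.55)(0.45) = 0.2475 \\
\text{between-state}
  &= \tfrac{1}{2}(0.45 - 0.5)^2 + \tfrac{1}{2}(0.55 - 0.5)^2 = 0.0025 \\
\operatorname{Var}(Y)
  &= 0.2475 + 0.0025 = 0.25 \\
R^2
  &= \frac{\text{between-state}}{\operatorname{Var}(Y)} = \frac{0.0025}{0.25} = 0.01
\end{aligned}
```

The between-state term is the square of a 0.05 shift, which is why a difference that looks meaningful in percentage points contributes so little variance.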
Interpretation, Effect Size, and Intuition
- Commenters distinguish statistical significance from practical relevance: small R² can coexist with “big” effects in some domains (e.g., genetics).
- Intuition is distorted by sample size: a 55–45 split across millions of votes is a decisive aggregate result, but for any single voter the outcome is still close to a coin flip.
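
The gap between individual-level and aggregate-level predictability can be made concrete. The sketch below assumes voters in a state are independent and each picks party A with probability 0.55, then computes the exact binomial probability that party A wins a majority for several electorate sizes. The function name `prob_majority` and the chosen sizes are illustrative, not from the discussion; odd sizes are used so ties cannot occur.

```python
from math import exp, lgamma, log


def prob_majority(n_voters, p=0.55):
    """Exact probability that more than half of n_voters pick party A.

    Assumes independent voters with a common probability p. The binomial
    terms are computed in log space to avoid overflow for large n_voters.
    """
    total = 0.0
    for k in range(n_voters // 2 + 1, n_voters + 1):
        log_pmf = (
            lgamma(n_voters + 1) - lgamma(k + 1) - lgamma(n_voters - k + 1)
            + k * log(p) + (n_voters - k) * log(1 - p)
        )
        total += exp(log_pmf)
    return total


for n in (1, 101, 1001, 10001):
    print(f"n = {n:>6}: P(party A majority) = {prob_majority(n):.6f}")
```

With a single voter the probability is just 0.55, while with ten thousand voters it is essentially 1. The same small per-person shift is nearly useless for predicting one vote and nearly decisive for predicting the total.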
Tangent: Voting Systems and Arrow’s Theorem
- Some shift to electoral mechanics: in first-past-the-post, a small shift in vote share can flip 100% of representation.
- There is debate over whether Arrow’s impossibility theorem applies to plurality/FPTP and over the merits of ranked vs other voting systems.