2024-06-30

Well-known paradox of R-squared is still buggin me

Is There Really a Paradox?

Many commenters argue there is no paradox.
State slightly shifts the chance of voting for one party (e.g., 0.45 → 0.55), so it has limited predictive power for individual votes.
R² is low because an intercept-only model (everyone ≈ 50/50) is already nearly as accurate as a model that knows the state; state adds only a small improvement.
At the aggregate (state total) level, state fully determines the outcome in the toy setup, but R² is defined at the individual level here.

How R² Behaves in This Setup

R² is framed as the relative reduction in mean squared error (MSE) compared to always predicting the mean (0.5).
Baseline MSE is 0.25; using state reduces it only slightly (to ~0.2475), hence R² ≈ 0.01.
Some note that squaring makes small effects look smaller; the correlation R = √0.01 = 0.1 strikes them as a more intuitive summary than R².
Others stress R² has nice variance-decomposition properties and is best understood as “MSE rescaled.”
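
The arithmetic behind these figures can be checked with a short sketch. It assumes the toy setup described in the thread: two equally sized groups of states with vote probabilities 0.45 and 0.55. The variable names are chosen here for illustration.

```python
# Toy setup (assumed): two equally sized groups of states,
# with P(vote = 1) of 0.45 in one group and 0.55 in the other.
p_low, p_high = 0.45, 0.55

# The overall mean is 0.5, so always predicting the mean gives an MSE
# equal to the Bernoulli variance 0.5 * (1 - 0.5).
p_mean = (p_low + p_high) / 2
mse_baseline = p_mean * (1 - p_mean)

# Predicting each group's own mean leaves the within-group Bernoulli variance.
mse_state = 0.5 * p_low * (1 - p_low) + 0.5 * p_high * (1 - p_high)

# R^2 as the relative reduction in MSE, and R as its square root.
r_squared = 1 - mse_state / mse_baseline
r = r_squared ** 0.5

print(f"baseline MSE = {mse_baseline:.4f}")
print(f"state MSE    = {mse_state:.4f}")
print(f"R^2 = {r_squared:.4f}, R = {r:.4f}")
```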

Binary Outcomes: Regression vs Classification

Several argue linear regression and standard R² are ill-suited for binary or categorical outcomes; prefer logistic/probit models, cross-entropy, Brier score, or pseudo-R² measures.
Counterpoint: with a single binary predictor, linear regression gives the correct group means (0.45, 0.55), and R² is a valid lens on prediction error, even if not ideal for classification performance.
Disagreement appears between viewing regression as a fitting mechanism vs R² as a classification quality metric.
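
A minimal sketch of the counterpoint, on a deterministic toy dataset (an assumption for illustration: 100 voters per group, with exactly 45 and 55 voting 1). Closed-form least squares with one binary predictor recovers the group means, and the same fit can be scored either as probabilities (Brier score, which here equals the MSE) or as hard classifications (accuracy).

```python
# Deterministic toy data (assumed): x is the state group, y is the vote.
x = [0] * 100 + [1] * 100
y = [1] * 45 + [0] * 55 + [1] * 55 + [0] * 45
n = len(x)

# Closed-form simple OLS: slope = cov(x, y) / var(x).
x_bar = sum(x) / n
y_bar = sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Fitted values are the two group means.
fitted = [intercept + slope * xi for xi in x]

# Brier score: mean squared error of the predicted probabilities.
brier = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / n

# Accuracy when each fitted value is thresholded at 0.5.
accuracy = sum((fi > 0.5) == (yi == 1) for yi, fi in zip(y, fitted)) / n

print(f"group means: {intercept:.2f}, {intercept + slope:.2f}")
print(f"Brier score: {brier:.4f}, accuracy: {accuracy:.2f}")
```

The two scores answer different questions: the Brier score barely moves from the 0.25 baseline, while accuracy reports how often the majority guess per group is right.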

Modeling Choices and Variance Decomposition

Some suggest treating state as a nominal factor or using mixed models; others say that’s unnecessary given symmetry and identical within-state variance.
ANOVA-style reasoning: within-state variance is large and between-state mean differences are small, so the grouping by state contributes little explanatory power—consistent with low R².
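
The ANOVA-style argument can be written out with the law of total variance. The numbers assume the thread's toy setup (two equally sized groups with means 0.45 and 0.55); the symbols Y (individual vote) and S (state group) are introduced here for illustration.

```latex
\begin{aligned}
\operatorname{Var}(Y) &= \mathbb{E}\big[\operatorname{Var}(Y \mid S)\big] + \operatorname{Var}\big(\mathbb{E}[Y \mid S]\big) \\
0.25 &= \underbrace{0.45 \times 0.55}_{\text{within} \,=\, 0.2475} + \underbrace{(0.05)^2}_{\text{between} \,=\, 0.0025} \\
R^2 &= \frac{\operatorname{Var}\big(\mathbb{E}[Y \mid S]\big)}{\operatorname{Var}(Y)} = \frac{0.0025}{0.25} = 0.01
\end{aligned}
```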

Interpretation, Effect Size, and Intuition

Commenters distinguish statistical significance from practical relevance: small R² can coexist with “big” effects in some domains (e.g., genetics).
Intuition is distorted by sample size: 55–45 across millions of votes feels huge because the aggregate outcome is then nearly certain, but for any individual voter it is still close to a coin flip.
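
A rough sketch of why the aggregate feels so different from the individual level. It assumes voters are independent with the same probability p of voting 1 (a simplification not claimed in the thread) and uses a normal approximation; `prob_majority` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def prob_majority(p: float, n: int) -> float:
    """Normal approximation to P(more than half of n independent voters vote 1)."""
    # Standardized distance between the expected vote share p and the 0.5 threshold.
    z = (p - 0.5) * sqrt(n) / sqrt(p * (1 - p))
    return NormalDist().cdf(z)


# Each voter is a 55/45 coin flip, yet the majority outcome firms up quickly with n.
for n in (100, 10_000, 1_000_000):
    print(n, prob_majority(0.55, n))
```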

Tangent: Voting Systems and Arrow’s Theorem

Some shift to electoral mechanics: in first-past-the-post, a small shift in vote share can flip 100% of representation.
There is debate over whether Arrow’s impossibility theorem applies to plurality/FPTP and over the merits of ranked vs other voting systems.
