Bias-Variance Tradeoff
The bias-variance tradeoff explains why model capacity matters. Expected prediction error can be viewed as a combination of bias, variance, and irreducible noise.
- Bias: Error introduced by approximating a complex real-world process with a simple model (e.g. fitting a straight line to quadratic data). High bias leads to underfitting.
- Variance: Error introduced by the model's sensitivity to small fluctuations in the training dataset. High variance leads to overfitting, where the model learns the noise rather than the signal.
- Irreducible Error ($\sigma^2$): The noise floor inherent in the data-generating process, which no model can ever eliminate.
Intuition
How to think about this concept
Imagine a target board at an archery range:
- Low Bias, Low Variance: Arrows are tightly clustered in the bullseye. (The ideal model: accurate and consistent.)
- High Bias, Low Variance: Arrows are tightly clustered, but far off target in the corner. (Underfitting: consistent, but consistently wrong.)
- Low Bias, High Variance: Arrows are spread widely across the entire board, but their average center is close to the bullseye. (Overfitting: highly inconsistent, chasing individual training noise.)
- High Bias, High Variance: Arrows are scattered and completely off target. (The worst case.)
As model capacity increases, training error usually falls. Validation error often falls at first, then rises when the model starts fitting noise. The best model is usually near the bottom of that validation curve.
Bias-Variance Tradeoff (Polynomial Fitting)
Degree 1 is linear. Degrees 2 and 3 are quadratic and cubic. Higher degrees allow more turning points, so the curve can bend toward individual outlying training points.
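The capacity sweep described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a definitive experiment: the sine-shaped `true_fn`, the noise level of 0.3, the sample sizes, and the helper `make_data` are all hypothetical choices made for the example. It fits polynomials of degree 1 through 12 and records training and validation mean squared error for each.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)


def true_fn(x):
    # Assumed ground-truth signal, chosen only for illustration.
    return np.sin(2 * np.pi * x)


def make_data(n):
    # Hypothetical helper: noisy samples of true_fn on [0, 1].
    x = rng.uniform(0.0, 1.0, size=n)
    y = true_fn(x) + rng.normal(0.0, 0.3, size=n)
    return x.reshape(-1, 1), y


X_train, y_train = make_data(30)
X_val, y_val = make_data(200)

degrees = range(1, 13)
train_mse, val_mse = [], []
for d in degrees:
    model = make_pipeline(PolynomialFeatures(degree=d), LinearRegression())
    model.fit(X_train, y_train)
    train_mse.append(mean_squared_error(y_train, model.predict(X_train)))
    val_mse.append(mean_squared_error(y_val, model.predict(X_val)))

# The degree with the lowest validation error sits near the bottom of the curve.
best_degree = degrees[int(np.argmin(val_mse))]
for d, tr, va in zip(degrees, train_mse, val_mse):
    print(f"degree={d:2d}  train MSE={tr:.3f}  val MSE={va:.3f}")
print("lowest validation error at degree", best_degree)
```

With settings like these, training error would be expected to keep shrinking as the degree grows, while validation error bottoms out at a moderate degree. The exact numbers depend on the random seed and the assumed signal.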
The Logic
Mathematical core for bias-variance tradeoff
1. Mathematical Decomposition
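Let the true data-generating process be $y = f(x) + \epsilon$, where $\mathbb{E}[\epsilon] = 0$ and $\mathrm{Var}(\epsilon) = \sigma^2$ (irreducible noise).
If we fit a model $\hat{f}$ on a training set, the expected squared error at a query point $x_0$ is:
$$\mathbb{E}\left[\left(y_0 - \hat{f}(x_0)\right)^2\right] = \mathrm{Bias}\left[\hat{f}(x_0)\right]^2 + \mathrm{Var}\left[\hat{f}(x_0)\right] + \sigma^2$$
Where $y_0 = f(x_0) + \epsilon$ is a fresh observation at $x_0$, the expectations are taken over training sets and noise, and the terms are defined as follows:
2. Bias Term
The difference between the expected prediction of our model and the true function:
$$\mathrm{Bias}\left[\hat{f}(x_0)\right] = \mathbb{E}\left[\hat{f}(x_0)\right] - f(x_0)$$
3. Variance Term
The variance of the model's prediction over different training set samples:
$$\mathrm{Var}\left[\hat{f}(x_0)\right] = \mathbb{E}\left[\left(\hat{f}(x_0) - \mathbb{E}\left[\hat{f}(x_0)\right]\right)^2\right]$$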
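When the true function is known, as in a simulation, the bias and variance terms at a query point can be approximated by refitting the model on many freshly drawn training sets. The sketch below is one way to do that, under assumptions that are not part of the original text: a sine-shaped `true_fn`, Gaussian noise with standard deviation 0.3, a query point of 0.25, and the hypothetical helper `estimate_bias_variance`. It uses `np.polyfit` rather than a scikit-learn pipeline to keep the loop short.

```python
import numpy as np

rng = np.random.default_rng(42)


def true_fn(x):
    # Assumed ground truth f(x); in real problems this is unknown.
    return np.sin(2 * np.pi * x)


def estimate_bias_variance(degree, x0=0.25, n_train=30, sigma=0.3, n_repeats=2000):
    """Monte Carlo estimate of squared bias and variance of a polynomial fit at x0."""
    preds = np.empty(n_repeats)
    for i in range(n_repeats):
        # Draw a fresh training set from the same process on every repeat.
        x = rng.uniform(0.0, 1.0, size=n_train)
        y = true_fn(x) + rng.normal(0.0, sigma, size=n_train)
        coeffs = np.polyfit(x, y, deg=degree)
        preds[i] = np.polyval(coeffs, x0)
    bias_sq = (preds.mean() - true_fn(x0)) ** 2  # (E[f_hat(x0)] - f(x0))^2
    variance = preds.var()                       # E[(f_hat(x0) - E[f_hat(x0)])^2]
    return bias_sq, variance


results = {d: estimate_bias_variance(d) for d in (1, 3, 9)}
for degree, (bias_sq, variance) in results.items():
    print(f"degree={degree}  bias^2={bias_sq:.4f}  variance={variance:.4f}")
```

In a setup like this, the straight-line fit would be expected to carry most of its error as squared bias, while the degree-9 fit trades that for higher variance. The estimates are only as good as the assumed signal and noise model.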
Code Example
bias-variance_tradeoff.py · scikit-learn example
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Create a polynomial regression model pipeline
def get_poly_model(degree):
    return make_pipeline(
        PolynomialFeatures(degree=degree),
        LinearRegression()
    )

# Fit model on training coordinates (X, y)
# model = get_poly_model(degree=3)
# model.fit(X, y)

Strengths
Provides a clear diagnostic roadmap for improving models (e.g., add features if bias is high; add data/regularization if variance is high).
Separates avoidable modeling error from the irreducible noise floor $\sigma^2$.
Guides feature selection, model capacity decisions, and validation strategy.
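As a rough illustration of the "add regularization if variance is high" advice, the sketch below compares how much a degree-9 polynomial's prediction at one point varies across resampled training sets, with and without an L2 penalty. Everything specific here is an assumption made for the example: the sine-shaped signal, the noise level, the query point `x0`, the penalty `alpha=1e-3`, and the hypothetical helpers `prediction_variance`, `ols`, and `ridge`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(7)
x0 = np.array([[0.25]])  # assumed query point


def prediction_variance(make_model, n_train=20, sigma=0.3, n_repeats=300):
    """Variance of the prediction at x0 across freshly sampled training sets."""
    preds = np.empty(n_repeats)
    for i in range(n_repeats):
        X = rng.uniform(0.0, 1.0, size=(n_train, 1))
        y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0.0, sigma, size=n_train)
        preds[i] = make_model().fit(X, y).predict(x0)[0]
    return preds.var()


def ols():
    # Unregularized degree-9 polynomial fit.
    return make_pipeline(PolynomialFeatures(degree=9), LinearRegression())


def ridge():
    # Same features with an L2 penalty on the coefficients.
    return make_pipeline(PolynomialFeatures(degree=9), Ridge(alpha=1e-3))


var_ols = prediction_variance(ols)
var_ridge = prediction_variance(ridge)
print(f"prediction variance without penalty: {var_ols:.4f}")
print(f"prediction variance with ridge:      {var_ridge:.4f}")
```

The penalty shrinks the fitted coefficients, which would be expected to lower variance at the cost of some added bias. Whether that trade improves total error still has to be checked on held-out data.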
Limitations
In practice, computing exact bias and variance terms is impossible because the true underlying distribution is unknown.
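Modern deep learning can exhibit a 'double descent' phenomenon: test error rises near the point where the model just interpolates the training data, then falls again as capacity keeps growing, so heavily overparameterized models may generalize well despite what the classical U-shaped curve predicts.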
Does not choose hyperparameters directly; validation or cross-validation is still required.
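Since the decomposition does not pick hyperparameters by itself, one common approach is to select the polynomial degree by K-fold cross-validation. The sketch below assumes synthetic data from a sine-shaped signal with noise (a hypothetical choice for illustration), a candidate range of degrees 1 to 10, and five shuffled folds. It uses scikit-learn's `cross_val_score` with the `neg_mean_squared_error` scorer.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)

# Synthetic data, assumed only for this illustration.
X = rng.uniform(0.0, 1.0, size=(60, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0.0, 0.3, size=60)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_mse = {}
for degree in range(1, 11):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    # The scorer returns negative MSE, so flip the sign to get an error.
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    cv_mse[degree] = -scores.mean()

best_degree = min(cv_mse, key=cv_mse.get)
print("cross-validated MSE by degree:", {d: round(m, 3) for d, m in cv_mse.items()})
print("selected degree:", best_degree)
```

Reusing the same `KFold` object for every degree keeps the folds identical across candidates, so differences in the scores reflect the models rather than the splits.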
Key Assumptions
Scope conditions and interpretation notes
1. The training and test sets are sampled from the identical underlying probability distribution.
2. The irreducible noise floor is stationary and independent of model parameters.
References
Books and papers for deeper study
Hastie, T., Tibshirani, R. and Friedman, J. (2009) The Elements of Statistical Learning. 2nd edn. New York: Springer.