Linear & Logistic Regression
Linear and logistic regression are often the first models worth fitting because they expose the entire machine-learning workflow without hiding it behind many layers. You choose a simple model family, measure its mistakes with a loss function, and solve for parameters that make that loss small.
Linear regression predicts a continuous value such as price, demand, temperature, or risk score. Logistic regression predicts the probability of a class such as churn/no churn, fraud/not fraud, pass/fail, or disease/no disease. The important shared idea is the linear score, where $\mathbf{x}$ is the feature vector, $\mathbf{w}$ the weights, and $b$ the intercept:
$$z = \mathbf{w}^\top \mathbf{x} + b$$
Linear regression uses that score directly as the prediction. Logistic regression passes it through the sigmoid $\sigma(z) = 1/(1 + e^{-z})$, so the output is constrained to the interval $(0, 1)$.
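The shared score can be sketched in a few lines of NumPy. This is a minimal illustration, assuming hypothetical weights `w` and bias `b` picked by hand rather than learned; the helper names `linear_score` and `sigmoid` are illustrative, not from any library.

```python
import numpy as np

def linear_score(X, w, b):
    """Linear score z = X @ w + b, shared by both models."""
    return X @ w + b

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights and bias, chosen only for illustration (not fitted).
w = np.array([0.8, -0.5])
b = 0.2
X = np.array([[1.0, 2.0],
              [3.0, 0.5],
              [0.0, 4.0]])

z = linear_score(X, w, b)
y_linear = z              # linear regression: the score is the prediction
p_logistic = sigmoid(z)   # logistic regression: squash the score into (0, 1)

print("scores:", z)
print("probabilities:", p_logistic)
```

The only difference between the two models at prediction time is that last step: the same `z` is either reported as-is or passed through the sigmoid.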
Where is it used?
Use linear regression when the target is numeric and roughly changes in a straight-line way with the features. Use logistic regression when the target is categorical but you still want interpretable coefficients and probabilities. In practice, both are strong baselines: if a complex model only barely beats them, the extra complexity may not be buying much.
Intuition
How to think about this algorithm
Linear regression is the ruler problem. Each point has a vertical residual: the gap between the observed value and the line's prediction. Squaring those gaps makes large misses expensive, so a single outlier can visibly pull the fitted line toward itself. That is not a quirk of any visualization; it is exactly what the squared-error objective asks the model to do.
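The pull of a single outlier can be checked numerically. This sketch assumes hypothetical points lying exactly on $y = 2x + 1$ and uses NumPy's `np.polyfit` with `deg=1`, which returns the least-squares `[slope, intercept]`.

```python
import numpy as np

# Hypothetical points lying exactly on y = 2x + 1.
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0

# np.polyfit with deg=1 minimizes squared error and returns [slope, intercept].
slope_clean, intercept_clean = np.polyfit(x, y, deg=1)

# Corrupt one observation (the last point) with a large positive error.
y_outlier = y.copy()
y_outlier[-1] += 30.0
slope_out, intercept_out = np.polyfit(x, y_outlier, deg=1)

print(f"clean fit:    slope={slope_clean:.3f}, intercept={intercept_clean:.3f}")
print(f"with outlier: slope={slope_out:.3f}, intercept={intercept_out:.3f}")
```

One corrupted point out of ten changes both the slope and the intercept, because the squared residual of that point dominates the objective.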
Logistic regression is the boundary problem. A linear score says which side of a boundary a point sits on. The sigmoid converts distance from that boundary into confidence: far on one side means probability near 1, far on the other side means probability near 0, and points on or near the boundary get probabilities close to 0.5, the most uncertain value.
The useful mental model is this: regression is not "drawing a line"; it is choosing parameters that minimize a specific loss under a specific assumption about the shape of the relationship.
The Logic
Mathematical core for linear & logistic regression
1. Linear regression model
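For a row of features $\mathbf{x}_i = (x_{i1}, \dots, x_{ip})$, the model predicts:
$$\hat{y}_i = \mathbf{w}^\top \mathbf{x}_i + b = b + \sum_{j=1}^{p} w_j x_{ij}$$
The residual is the signed error:
$$r_i = y_i - \hat{y}_i$$
Ordinary least squares chooses parameters that minimize mean squared error over the $n$ training rows:
$$\mathrm{MSE}(\mathbf{w}, b) = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2$$
With a full-rank design matrix $X$ (the rows $\mathbf{x}_i^\top$ stacked, with a leading column of ones for the intercept), the closed-form solution is:
$$\hat{\boldsymbol{\beta}} = \left(X^\top X\right)^{-1} X^\top \mathbf{y}, \qquad \boldsymbol{\beta} = (b, w_1, \dots, w_p)^\top$$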
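The closed-form solution can be checked against scikit-learn on the same `hours`/`score` data used in the code example. This sketch solves the normal equations $(X^\top X)\boldsymbol{\beta} = X^\top \mathbf{y}$ with `np.linalg.solve` instead of forming the inverse explicitly, a common numerical choice; it is an illustration, not a description of how `LinearRegression` solves the problem internally.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

hours = np.array([[1.1], [2.1], [3.0], [4.0], [5.2], [6.1], [7.0], [8.4]])
score = np.array([2.2, 2.8, 4.1, 4.6, 5.9, 6.7, 7.6, 8.7])

# Design matrix with a leading column of ones, so beta = (intercept, slope).
X = np.column_stack([np.ones(len(hours)), hours])

# Solve (X^T X) beta = X^T y directly rather than computing the inverse.
beta = np.linalg.solve(X.T @ X, X.T @ score)
intercept, slope = beta

lin = LinearRegression().fit(hours, score)
print("normal equations:", intercept, slope)
print("scikit-learn:    ", lin.intercept_, lin.coef_[0])
```

Both routes should agree to floating-point precision; the column of ones is what lets the intercept be estimated as just another weight.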
2. Logistic regression model
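For binary labels $y_i \in \{0, 1\}$, logistic regression starts with the same linear score:
$$z_i = \mathbf{w}^\top \mathbf{x}_i + b$$
Then it maps the score to a probability:
$$p_i = P(y_i = 1 \mid \mathbf{x}_i) = \sigma(z_i) = \frac{1}{1 + e^{-z_i}}$$
The decision boundary at threshold $t \in (0, 1)$ is where:
$$\sigma(z) = t \iff \mathbf{w}^\top \mathbf{x} + b = \log\frac{t}{1 - t}$$
For the common threshold $t = 0.5$, this simplifies to:
$$\mathbf{w}^\top \mathbf{x} + b = 0$$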
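Because the sigmoid is strictly increasing, thresholding the probability at $t$ is the same as thresholding the raw score at $\log\frac{t}{1-t}$. The sketch below checks that on a few hypothetical scores; the helper `logit` is an illustrative name for the inverse of the sigmoid.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(t):
    """Inverse of the sigmoid: the score z at which sigmoid(z) == t."""
    return np.log(t / (1.0 - t))

# Hypothetical linear scores z = w.x + b for a handful of points.
z = np.array([-3.0, -1.0, -0.2, 0.3, 1.0, 1.5, 3.0])
p = sigmoid(z)

for t in (0.5, 0.8):
    by_probability = p >= t     # threshold the probability
    by_score = z >= logit(t)    # threshold the raw score instead
    agree = np.array_equal(by_probability, by_score)
    print(f"t={t}: boundary at z={logit(t):.3f}, labels agree: {agree}")
```

For $t = 0.5$ the boundary score is exactly 0, which is why the boundary reduces to $\mathbf{w}^\top \mathbf{x} + b = 0$.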
3. Cross-entropy loss
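Squared error is a poor objective for class probabilities: combined with the sigmoid, it is non-convex in the parameters. Logistic regression instead uses binary cross-entropy (the negative log-likelihood of the labels), with $p_i = \sigma(z_i)$:
$$\mathcal{L}(\mathbf{w}, b) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]$$
Its gradient has a compact form:
$$\nabla_{\mathbf{w}} \mathcal{L} = \frac{1}{n} \sum_{i=1}^{n} (p_i - y_i)\, \mathbf{x}_i, \qquad \frac{\partial \mathcal{L}}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} (p_i - y_i)$$
This is why the algorithm is so useful pedagogically: the update direction is driven directly by predicted probability minus observed label, the same "prediction minus target" form (up to a factor of 2) that appears in the gradient of the squared-error loss for linear regression.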
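The gradient formula translates almost line for line into plain gradient descent. This is a minimal sketch, assuming a hypothetical 1-D dataset with overlapping classes, a fixed learning rate of 0.5, and 2000 steps (all illustrative choices); it is not how scikit-learn's solvers work.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(y, p, eps=1e-12):
    """Mean binary cross-entropy; eps guards against log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical 1-D data with overlapping classes (not linearly separable).
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
y = np.array([0, 0, 1, 0, 1, 0, 1, 1])

w, b = 0.0, 0.0
lr = 0.5
losses = []
for step in range(2000):
    p = sigmoid(w * x + b)
    err = p - y                 # predicted probability minus observed label
    grad_w = np.mean(err * x)   # dL/dw = mean((p - y) * x)
    grad_b = np.mean(err)       # dL/db = mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b
    losses.append(bce(y, p))

print(f"w={w:.3f}, b={b:.3f}, final loss={losses[-1]:.4f}")
```

At the optimum the bias gradient is zero, so the mean predicted probability equals the fraction of positive labels, a quick sanity check on any fit that includes an intercept.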
Code Example
linear_&_logistic_regression.py · scikit-learn example
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import mean_squared_error, log_loss

# Linear regression: predict a numeric score from one feature.
hours = np.array([[1.1], [2.1], [3.0], [4.0], [5.2], [6.1], [7.0], [8.4]])
score = np.array([2.2, 2.8, 4.1, 4.6, 5.9, 6.7, 7.6, 8.7])

lin = LinearRegression()
lin.fit(hours, score)
score_hat = lin.predict(hours)

print("Linear slope:", lin.coef_[0])
print("Linear intercept:", lin.intercept_)
print("MSE (training data):", mean_squared_error(score, score_hat))

# Logistic regression: predict probability of passing from two features.
# Features: [hours studied, practice-test average]
X = np.array([
    [1.2, 2.0],
    [2.0, 3.2],
    [2.8, 2.4],
    [3.6, 4.0],
    [5.2, 5.1],
    [6.4, 5.8],
    [7.1, 7.4],
    [8.2, 6.8],
    [8.8, 8.5],
])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])

# These classes are linearly separable, so unpenalized maximum-likelihood weights
# would grow without bound. scikit-learn's default L2 penalty (C=1.0) keeps them
# finite, which also means the fitted weights are shrunk toward zero.
clf = LogisticRegression()
clf.fit(X, y)
prob = clf.predict_proba(X)[:, 1]  # column 1 is P(y = 1)

print("Logistic weights:", clf.coef_[0])
print("Logistic intercept:", clf.intercept_[0])
print("Cross-entropy (training data):", log_loss(y, prob))
print("New example pass probability:", clf.predict_proba([[5.5, 6.0]])[0, 1])
Strengths
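Easy to interpret. Each weight gives the change in the prediction (for logistic regression, the change in log-odds) per one-unit increase in that feature, holding the other features fixed. That reading is most trustworthy when features are not strongly correlated, and weights are only comparable across features when the features share a scale.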
Fast to train, which makes these models natural baselines to try before moving on to more complex models such as neural networks.
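The log-odds reading of a logistic weight can be verified directly on the pass/fail data from the code example. This sketch uses scikit-learn's `decision_function`, which returns the linear score $\mathbf{w}^\top \mathbf{x} + b$ for a binary model; the probe point `[5.0, 5.0]` is hypothetical. Exponentiating a weight gives the multiplicative change in the odds.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same pass/fail data as the code example: [hours studied, practice-test average].
X = np.array([[1.2, 2.0], [2.0, 3.2], [2.8, 2.4], [3.6, 4.0], [5.2, 5.1],
              [6.4, 5.8], [7.1, 7.4], [8.2, 6.8], [8.8, 8.5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])

clf = LogisticRegression().fit(X, y)  # default L2 penalty shrinks the weights

for name, w in zip(["hours", "practice_avg"], clf.coef_[0]):
    print(f"{name}: log-odds change {w:.3f}, odds multiplier {np.exp(w):.3f}")

# Bump hours by 1 at a hypothetical point, holding practice_avg fixed.
x0 = np.array([[5.0, 5.0]])
x1 = x0 + np.array([[1.0, 0.0]])
delta = clf.decision_function(x1)[0] - clf.decision_function(x0)[0]
print("score change for +1 hour:", delta)
```

The two features here move together, and the weights are regularized, so each number is best read as a rough, model-specific summary rather than a causal effect.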
Limitations
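Both models assume a linear relationship: linear regression assumes the target is linear in the features, and logistic regression assumes the log-odds are. If the real relationship is curved or involves interactions, they will underfit unless you add transformed features such as polynomial or interaction terms.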
Least-squares fits are highly sensitive to outliers. Because errors are squared, a single extreme point can drag the entire line toward it.
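Correlated features cause trouble. If two inputs are highly correlated, their individual weights become unstable and hard to interpret; if they are perfectly correlated (like 'years alive' and 'age'), the design matrix loses full rank and the least-squares weights are no longer unique.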
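Two of these limitations, curvature and perfect collinearity, can be reproduced on small synthetic data. This sketch assumes hypothetical data ($y = x^2$ for the first part, and a `years_alive` column that exactly copies `age` for the second) and uses scikit-learn's `PolynomialFeatures` plus NumPy's `np.linalg.lstsq`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Curvature: hypothetical data with y = x^2 exactly.
x = np.linspace(-3, 3, 31).reshape(-1, 1)
y_curved = x.ravel() ** 2

straight = LinearRegression().fit(x, y_curved)
r2_straight = straight.score(x, y_curved)

# Still linear in the weights, but with an added x^2 column.
x_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
curved = LinearRegression().fit(x_poly, y_curved)
r2_curved = curved.score(x_poly, y_curved)
print(f"R^2 raw feature: {r2_straight:.3f}, with x^2 added: {r2_curved:.3f}")

# Perfect collinearity: hypothetical 'years_alive' column duplicates 'age'.
rng = np.random.default_rng(0)
n = 50
age = rng.uniform(20, 60, size=n)
years_alive = age.copy()
y_age = 0.5 * age + rng.normal(0.0, 1.0, size=n)  # hypothetical target

X = np.column_stack([np.ones(n), age, years_alive])
print("columns:", X.shape[1], "rank:", np.linalg.matrix_rank(X))

# X^T X is singular, so the normal equations have no unique solution.
# lstsq returns the minimum-norm solution, which splits the age effect
# evenly across the two identical columns.
beta, *_ = np.linalg.lstsq(X, y_age, rcond=None)
print("intercept, w_age, w_years_alive:", beta)
```

Any split of the age effect between the two identical columns fits equally well; the minimum-norm answer is just one arbitrary choice, which is why individual weights stop being interpretable.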
Key Assumptions
Scope conditions and interpretation notes
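1. The conditional mean of the target (for logistic regression, the log-odds of the positive class) is approximately linear in the features.
2. For linear regression, residual variance is roughly constant across the prediction range.
3. Residuals are not strongly correlated after accounting for the features; in practice, observations should be roughly independent.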
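The variance and correlation assumptions for linear regression can be given a rough numeric check from the residuals. This sketch reuses the `hours`/`score` data from the code example; splitting at the median fitted value and using the Durbin-Watson statistic are illustrative choices, and with only eight points the numbers are indicative at best. Durbin-Watson is only meaningful when the rows have a natural order, such as time.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

hours = np.array([[1.1], [2.1], [3.0], [4.0], [5.2], [6.1], [7.0], [8.4]])
score = np.array([2.2, 2.8, 4.1, 4.6, 5.9, 6.7, 7.6, 8.7])

lin = LinearRegression().fit(hours, score)
fitted = lin.predict(hours)
resid = score - fitted

# Constant variance: compare residual spread in the lower vs upper half of fitted values.
order = np.argsort(fitted)
half = len(order) // 2
spread_low = resid[order[:half]].std()
spread_high = resid[order[half:]].std()

# Uncorrelated residuals: Durbin-Watson statistic, in [0, 4];
# values near 2 suggest little lag-1 autocorrelation.
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

print("mean residual:", resid.mean())
print("residual spread (low vs high fitted):", spread_low, spread_high)
print("Durbin-Watson:", dw)
```

A residual spread that grows sharply with the fitted value, or a Durbin-Watson value far from 2 on ordered data, is a signal to revisit the corresponding assumption.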