Support Vector Machines
Support Vector Machines (SVMs) are margin-based classifiers. If many separating lines classify the training data correctly, an SVM chooses the one with the largest margin: the greatest distance to the closest training examples.
Those closest examples are the support vectors. They matter because moving them changes the boundary, while points that lie well beyond the margin on the correct side have no effect on the fitted classifier.
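One way to see why support vectors matter is to refit the model on the support vectors alone and compare boundaries. The sketch below assumes scikit-learn's SVC with a linear kernel; the toy data and the tolerance are illustrative choices, not values from this article.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (illustrative values only).
X = np.array([[1.0, 2.0], [2.0, 3.0], [1.5, 1.8], [0.0, 0.5],
              [8.0, 8.0], [9.0, 10.0], [8.5, 9.2], [11.0, 12.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Fit on every point, then read off which rows became support vectors.
full = SVC(kernel="linear", C=1.0).fit(X, y)
sv_idx = full.support_

# Refit using only those support vectors and compare the two boundaries.
reduced = SVC(kernel="linear", C=1.0).fit(X[sv_idx], y[sv_idx])
same_boundary = (
    np.allclose(full.coef_, reduced.coef_, atol=1e-2)
    and np.allclose(full.intercept_, reduced.intercept_, atol=1e-2)
)

print("support vector rows:", sv_idx)
print("boundary unchanged without the other points:", same_boundary)
```

If the two fits agree, the discarded points were not contributing to the boundary, which is the property described above.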
Where is it used?
SVMs are useful when the dataset is medium-sized and the feature representation is strong. They are historically common in text classification, handwritten digit recognition, and biological classification tasks.
Intuition
How to think about this algorithm
The intuition is geometric. A boundary that barely separates the training data is fragile: a small measurement error can flip the prediction. A wider margin is more stable because new points can move slightly without crossing the boundary.
When a straight boundary is not enough, kernels let the SVM compute dot products in a richer feature space without explicitly constructing every transformed feature. In the original input space, that can produce curved decision boundaries.
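A small experiment can make the kernel idea concrete. The sketch below assumes scikit-learn's `make_circles` helper to build two concentric rings, a dataset no straight line can separate, and compares a linear kernel with an RBF kernel. The sample size, noise level, and default hyperparameters are illustrative assumptions, and the scores are training accuracy, so they only show whether each boundary shape can fit the data.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: the inner ring is one class, the outer ring the other.
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

# Same data, two kernels. Only the kernel differs between the models.
linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)

print(f"linear kernel training accuracy: {linear_acc:.2f}")
print(f"RBF kernel training accuracy:    {rbf_acc:.2f}")
```

The linear model is limited to a straight boundary in the input space, while the RBF model can wrap a closed curve around the inner ring.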
Effect of the regularization parameter C on the margin: lower C widens the margin by tolerating some classification errors, while higher C penalizes margin violations more heavily and pushes the boundary toward strict separation.
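The effect of C on margin width can be checked numerically. The sketch below assumes a linear kernel, where the margin width equals 2 / ||w||, and uses overlapping synthetic blobs from scikit-learn's `make_blobs`. The dataset and the three C values are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping clusters, so some margin violations are unavoidable.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

widths = {}
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # For a linear SVM the distance between the two margin lines is 2 / ||w||.
    widths[C] = 2.0 / np.linalg.norm(clf.coef_)
    print(f"C={C:<6} margin width={widths[C]:.3f} "
          f"support vectors={clf.n_support_.sum()}")
```

Smaller C values should report a wider margin and more support vectors, because more points are allowed to sit inside the margin.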
The Logic
Mathematical core for support vector machines
1. The Margin
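For separable data with labels $y_i \in \{-1, +1\}$, the hard-margin SVM solves:
$$\min_{w,\, b} \ \frac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1 \quad \text{for all } i$$
The margin width is $2 / \lVert w \rVert$, so minimizing $\lVert w \rVert$ maximizes the margin.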
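The hard-margin problem can be approximated numerically. scikit-learn does not expose a hard-margin solver directly, so this sketch assumes that a linear SVC with a very large C (here 1e6, an arbitrary choice) behaves like one on separable data. It then checks that every training point satisfies y_i (w · x_i + b) >= 1 and reports the margin width 2 / ||w||.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 2.0], [2.0, 3.0], [1.5, 1.8],
              [8.0, 8.0], [9.0, 10.0], [8.5, 9.2]])
y = np.array([-1, -1, -1, 1, 1, 1])  # labels in {-1, +1}

# Assumption: a very large C makes margin violations prohibitively expensive.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

# Each value should be at least 1 (up to solver tolerance);
# support vectors sit at approximately 1.
functional_margins = y * (X @ w + b)
margin_width = 2.0 / np.linalg.norm(w)

print("y_i * (w . x_i + b):", np.round(functional_margins, 3))
print("margin width 2 / ||w||:", round(margin_width, 3))
```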
2. The Kernel Trick
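Real data is rarely perfectly separable, so practical SVMs use slack variables and a penalty parameter $C$ to trade off margin width against classification mistakes. Kernels replace inner products with a function $K$ that equals an inner product in a feature space defined by a map $\phi$:
$$K(x, x') = \phi(x)^\top \phi(x')$$
A widely used kernel is the Radial Basis Function (RBF), which measures the distance between two points $x$ and $x'$ and creates smooth, curved boundaries:
$$K(x, x') = \exp\left(-\gamma \lVert x - x' \rVert^2\right)$$
Here $\gamma > 0$ controls how quickly similarity decays with distance.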
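As a minimal numerical sketch, the RBF kernel can be computed by hand and compared with scikit-learn's `rbf_kernel`. The helper name `rbf`, the three sample points, and gamma = 0.5 are illustrative assumptions. The formula used is the standard exp(-gamma * ||x - z||^2).

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def rbf(x, z, gamma):
    """RBF similarity: 1.0 for identical points, decaying toward 0 with distance."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

X = np.array([[1.0, 2.0], [2.0, 3.0], [8.0, 8.0]])
gamma = 0.5  # hypothetical value chosen for illustration

# Pairwise kernel (Gram) matrix computed by hand and by the library.
manual = np.array([[rbf(a, b, gamma) for b in X] for a in X])
library = rbf_kernel(X, gamma=gamma)

print(np.round(manual, 4))
print("matches sklearn:", np.allclose(manual, library))
```

Nearby points get similarity close to 1 and distant points close to 0, which is what lets the fitted boundary bend around local clusters.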
Code Example
support_vector_machines.py · scikit-learn example
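```python
import numpy as np
from sklearn.svm import SVC

# Two small, well-separated clusters: class 0 near (1.5, 2), class 1 near (8.5, 9).
X = np.array([[1, 2], [2, 3], [1.5, 1.8], [8, 8], [9, 10], [8.5, 9.2]])
y = np.array([0, 0, 0, 1, 1, 1])

# RBF-kernel SVM; C sets the penalty on margin violations.
clf = SVC(kernel='rbf', C=1.0)
clf.fit(X, y)

# n_support_ holds the number of support vectors for each class.
print(f"Number of support vectors per class: {clf.n_support_}")
```
Strengths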
The convex training objective has a global optimum, which makes the optimization behavior easier to reason about than many neural models.
Kernels allow nonlinear decision boundaries while keeping the optimization problem in terms of pairwise similarities.
SVMs can remain effective in high-dimensional settings, including cases with more features (columns) than data points (rows), because the margin-based objective regularizes the solution.
Limitations
Kernel SVMs can be slow and memory-heavy on very large datasets because they depend on many pairwise similarities.
They do not naturally produce calibrated probabilities without an additional calibration step.
Performance is sensitive to kernel choice, feature scaling, and hyperparameters such as C and gamma.
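The calibration limitation above can be illustrated with scikit-learn's `CalibratedClassifierCV`, which wraps a classifier and fits a mapping from its scores to probabilities. This is a sketch under assumptions: the synthetic dataset, the sigmoid (Platt-style) method, and 5-fold cross-validation are illustrative choices, not recommendations from this article.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A plain SVC exposes signed distances to the boundary, not probabilities.
svm = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)
scores = svm.decision_function(X_test)

# The additional calibration step: fit a sigmoid on held-out folds.
calibrated = CalibratedClassifierCV(SVC(kernel="rbf", C=1.0),
                                    method="sigmoid", cv=5)
calibrated.fit(X_train, y_train)
proba = calibrated.predict_proba(X_test)

print("raw score range:", scores.min().round(2), "to", scores.max().round(2))
print("first calibrated probabilities:", proba[0].round(3))
```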
Key Assumptions
Scope conditions and interpretation notes
1. Features are scaled so distances and dot products are meaningful.
2. The selected kernel and regularization parameter match the data geometry.
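Both assumptions can be handled together in practice. The sketch below puts a `StandardScaler` in front of the SVM inside a pipeline, so scaling is learned only from each training fold, and searches a small grid of C and gamma with cross-validation. The synthetic dataset and the grid values are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Scale features first, then fit the RBF-kernel SVM.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# make_pipeline names the SVC step "svc", hence the "svc__" prefix.
param_grid = {
    "svc__C": [0.1, 1.0, 10.0],
    "svc__gamma": [0.01, 0.1, 1.0],
}

search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.3f}")
```

Keeping the scaler inside the pipeline avoids leaking statistics from the validation folds into training.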
References
Books and papers for deeper study
Boser, B.E., Guyon, I.M. and Vapnik, V.N. (1992) 'A training algorithm for optimal margin classifiers', in Proceedings of the fifth annual workshop on Computational learning theory. Pittsburgh, Pennsylvania: ACM, pp. 144-152.