Support Vector Machine

Short Definition

A Support Vector Machine (SVM) is a supervised machine learning algorithm that finds the optimal hyperplane separating different classes by maximizing the margin between the nearest data points of each class. It is effective for both linear and non-linear classification through the use of kernel functions.

Full Definition

Support Vector Machines are one of the most elegant and theoretically well-founded machine learning algorithms, developed by Vladimir Vapnik and colleagues in the 1990s. The core idea is to find the decision boundary (hyperplane) that best separates two classes while maximizing the margin — the distance between the boundary and the closest data points from each class, called support vectors. This maximum-margin principle gives SVMs strong generalization properties. For linearly separable data, the algorithm finds the unique optimal hyperplane.

For non-linearly separable data, SVMs use the kernel trick to implicitly map data into a higher-dimensional space where linear separation becomes possible, without actually computing the transformation. Common kernels include linear, polynomial, and radial basis function (RBF).

SVMs were the dominant algorithm in machine learning before the deep learning revolution, achieving state-of-the-art results on text classification, image recognition, and bioinformatics. They remain relevant for small to medium datasets, high-dimensional problems like text classification, and applications where theoretical guarantees are valued. SVMs can also handle regression tasks (SVR) and one-class classification for anomaly detection.
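The ideas above can be sketched in a few lines of Python, assuming scikit-learn is available (the dataset here is an illustrative toy example, not from the original text):

```python
import numpy as np
from sklearn.svm import SVC

# Tiny 2-D toy dataset: two well-separated clusters
X = np.array([[0.0, 0.0], [0.5, 0.5], [0.2, 0.8],
              [3.0, 3.0], [3.5, 2.5], [2.8, 3.2]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear-kernel SVM finds the maximum-margin hyperplane
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# Only the support vectors (points closest to the boundary)
# determine the decision function
print(clf.support_vectors_)
print(clf.predict([[0.1, 0.3], [3.1, 2.9]]))
```

Swapping `kernel="linear"` for `kernel="rbf"` or `kernel="poly"` applies the kernel trick described above without any change to the rest of the code.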

Technical Explanation

The primal SVM optimization problem: minimize (1/2)||w||^2 subject to y_i(w·x_i + b) >= 1 for all i. The dual formulation introduces Lagrange multipliers alpha_i: maximize sum_i(alpha_i) - (1/2)sum_{i,j}(alpha_i*alpha_j*y_i*y_j*K(x_i, x_j)) subject to alpha_i >= 0 and sum_i(alpha_i*y_i) = 0.

The kernel trick: K(x_i, x_j) = phi(x_i)·phi(x_j) computes inner products in the high-dimensional feature space without explicitly computing the mapping phi. Common kernels: linear K(x, x') = x·x'; polynomial K(x, x') = (x·x' + c)^d; RBF K(x, x') = exp(-gamma*||x - x'||^2).

The soft-margin SVM handles non-separable data by adding slack variables, with a penalty parameter C controlling the trade-off between margin maximization and misclassification tolerance.
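The kernel trick can be verified numerically. A minimal sketch, assuming 2-D inputs and the degree-2 polynomial kernel K(x, z) = (x·z + c)^2, whose explicit feature map phi is small enough to write out by hand (the function names here are illustrative, not from any library):

```python
import numpy as np

def poly_kernel(x, z, c=1.0):
    # Degree-2 polynomial kernel: K(x, z) = (x.z + c)^2
    return (np.dot(x, z) + c) ** 2

def phi(x, c=1.0):
    # Explicit feature map for 2-D input whose inner product
    # equals the degree-2 polynomial kernel
    x1, x2 = x
    return np.array([x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2 * c) * x1,
                     np.sqrt(2 * c) * x2,
                     c])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# The kernel gives the same inner product without ever
# constructing the 6-dimensional vectors phi(x) and phi(z)
assert np.isclose(poly_kernel(x, z), np.dot(phi(x), phi(z)))
```

For higher degrees or the RBF kernel the explicit feature space grows combinatorially (infinite-dimensional for RBF), which is exactly why evaluating K directly instead of computing phi is the practical choice.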

Use Cases

Text classification and spam filtering | Image recognition | Bioinformatics and genomics | Handwriting recognition | Medical diagnosis | Anomaly detection | Face detection | Protein structure prediction

Advantages

Strong theoretical foundations with generalization bounds | Effective in high-dimensional spaces | Memory efficient using only support vectors | Versatile through different kernel functions | Works well with small to medium datasets | Robust to overfitting in high dimensions

Disadvantages

Not suitable for very large datasets | Sensitive to feature scaling | Kernel selection and parameter tuning can be difficult | Does not directly provide probability estimates | Training time scales poorly with dataset size | Outperformed by deep learning on large datasets

Schema Type

DefinedTerm

Difficulty Level

Beginner