Random Forest
Short Definition
Random Forest is an ensemble machine learning algorithm that trains many decision trees on random subsets of the data and features, then combines their predictions by majority vote (classification) or averaging (regression).
Full Definition
Random Forest is one of the most popular and practical machine learning algorithms, known for its strong performance, robustness, and ease of use. Introduced by Leo Breiman in 2001, it belongs to the family of ensemble methods, which combine many individual models into a single stronger predictor. The algorithm trains many decision trees, each on a random subset of the training data drawn with replacement (bagging) and, at each split, a random subset of the features. For classification, the final prediction is the majority vote across all trees; for regression, it is the average of the trees' predictions.

This randomization is the key to Random Forest's success: individual decision trees tend to overfit, but the diversity created by random sampling means that the errors of individual trees tend to cancel out when combined. Random Forest handles high-dimensional data well, naturally provides feature importance rankings, requires comparatively little hyperparameter tuning, and works with both numerical and categorical features. It is resistant to outliers, and some implementations can also handle missing values. Despite the rise of gradient boosting methods and deep learning, Random Forest remains a go-to algorithm for tabular data, often serving as a strong baseline that is hard to beat. It is widely used in finance, healthcare, ecology, and remote sensing.
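The bagging-plus-majority-vote mechanism described above can be sketched in a few lines of plain Python. This is an illustrative toy, not a real Random Forest: the "trees" are single-threshold decision stumps fit to a hypothetical one-dimensional dataset, but the bootstrap sampling and the vote across learners work the same way.

```python
import random
from collections import Counter

# Hypothetical toy dataset: points up to 5 are class 0, above 5 are class 1.
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

def fit_stump(xs, ys):
    """Pick the threshold t minimizing errors of the rule 'predict 1 if x > t'."""
    best_t, best_err = None, float("inf")
    for t in xs:
        err = sum(1 for x_, y_ in zip(xs, ys) if (x_ > t) != (y_ == 1))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def bagged_forest(X, y, n_trees, seed=0):
    """Train each stump on a bootstrap sample (N draws with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    thresholds = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap indices
        thresholds.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return thresholds

def predict(forest, x):
    """Majority vote across all stumps in the ensemble."""
    votes = [1 if x > t else 0 for t in forest]
    return Counter(votes).most_common(1)[0][0]

forest = bagged_forest(X, y, n_trees=25)
print(predict(forest, 2), predict(forest, 9))
```

Any single stump can be skewed by an unlucky bootstrap sample, but the vote over 25 of them settles on the sensible decision boundary near 5, which is exactly the variance-reduction effect the paragraph describes.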
Technical Explanation
The algorithm trains B trees, each on a bootstrap sample of size N drawn with replacement from the training set. At each node split, only m randomly selected features (out of M total) are considered; the common defaults are m = sqrt(M) for classification and m = M/3 for regression. The out-of-bag (OOB) error, computed for each tree on the roughly one third of training points it never saw, provides a built-in estimate of generalization performance without a separate validation set. Feature importance is computed either as the mean decrease in impurity across splits (Gini importance) or as the mean decrease in accuracy when a feature's values are randomly permuted (permutation importance). The ensemble prediction is y_hat = mode(h_1(x), …, h_B(x)) for classification and y_hat = (1/B) * sum_b h_b(x) for regression. Typical values of B range from 100 to 1,000 trees; error generally plateaus as B grows, so adding more trees mainly costs compute.
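The "roughly one third" behind the OOB estimate follows from a property of bootstrap sampling that is easy to verify empirically: drawing N points with replacement leaves about (1 - 1/N)^N ≈ e^(-1) ≈ 36.8% of the training set unsampled, and those points serve as a free validation set for that tree. A minimal check, assuming a hypothetical training set of 10,000 indices:

```python
import random

N = 10_000
rng = random.Random(42)

# One bootstrap sample: N draws with replacement from range(N).
in_bag = {rng.randrange(N) for _ in range(N)}

# Indices never drawn are out-of-bag for this tree.
oob = [i for i in range(N) if i not in in_bag]

print(len(oob) / N)  # close to e^-1 ≈ 0.368
```

In a real forest this is repeated per tree, and each training point is scored only by the trees for which it was out of bag, which is what makes the OOB error an honest generalization estimate.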
Use Cases
Advantages
Disadvantages
Schema Type
Featured Snippet Candidate
Difficulty Level