Decision Tree
Short Definition
A decision tree is a supervised learning model that predicts a class label or numeric value by asking a sequence of feature-based questions, following branches from the root node down to a leaf.
Full Definition
Decision Trees are among the most intuitive and interpretable machine learning algorithms, making them popular for both practical applications and educational purposes. The algorithm learns a tree-shaped model in which each internal node tests a feature value, each branch represents an outcome of that test, and each leaf node holds a final prediction. For classification, leaves predict class labels; for regression, they predict continuous values. The tree is built top-down by recursively selecting the feature and split point that best separate the data according to a criterion such as Gini impurity or information gain.

Decision trees have several appealing properties: they are easy to visualize and explain to non-technical stakeholders, they handle both numerical and categorical data without preprocessing, they naturally capture non-linear relationships and feature interactions, and they require minimal data preparation.

However, individual decision trees are prone to overfitting, especially when grown deep. This limitation led to the development of ensemble methods such as Random Forest and Gradient Boosting, which combine many trees for substantially better predictive performance. Despite being surpassed by ensembles in raw accuracy, decision trees remain valuable for their interpretability and are widely used in healthcare, finance, and legal applications where explainability is required.
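The root-to-leaf prediction path described above can be sketched in a few lines of pure Python. This is an illustrative hand-built tree, not a trained model; the feature names ("income", "debt") and thresholds are hypothetical.

```python
# Minimal sketch: a hand-built decision tree for a toy loan-approval task.
# Internal nodes test one feature against a threshold; leaves hold a label.
# Feature names and thresholds here are illustrative, not from any real model.

def make_node(feature, threshold, left, right):
    return {"feature": feature, "threshold": threshold, "left": left, "right": right}

def make_leaf(label):
    return {"label": label}

def predict(node, sample):
    """Follow the root-to-leaf path implied by the sample's feature values."""
    while "label" not in node:  # descend until we reach a leaf
        if sample[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["label"]

# income <= 40 -> also check debt; income > 40 -> approve outright
tree = make_node(
    "income", 40,
    make_node("debt", 10, make_leaf("approve"), make_leaf("deny")),
    make_leaf("approve"),
)

print(predict(tree, {"income": 30, "debt": 5}))   # approve
print(predict(tree, {"income": 30, "debt": 20}))  # deny
print(predict(tree, {"income": 80, "debt": 50}))  # approve
```

Each prediction follows exactly one path, which is why the decision can be read back to a stakeholder as a chain of simple rules.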
Technical Explanation
Tree construction uses recursive partitioning. At each node, the algorithm selects the feature and threshold that maximizes information gain (ID3) or gain ratio (C4.5), or that minimizes Gini impurity: Gini(t) = 1 - sum_i(p_i^2), where p_i is the proportion of class i among the samples at node t. For regression trees, variance reduction is used instead. The CART algorithm (Classification and Regression Trees) uses binary splits and handles both tasks. Pruning prevents overfitting: pre-pruning limits tree growth during construction (maximum depth, minimum samples per split or leaf), while post-pruning (e.g. cost-complexity pruning) removes branches that do not improve generalization. Prediction follows a path from root to leaf, so its cost is proportional to tree depth: O(log n) for a balanced tree over n training samples. Feature importance is proportional to the total reduction in the splitting criterion contributed by all nodes that split on that feature.
Use Cases
Advantages
Disadvantages
Schema Type
Featured Snippet Candidate
Difficulty Level