Decision Tree

Short Definition

A Decision Tree is a supervised machine learning model that makes predictions by learning hierarchical if-then-else decision rules from training data. It recursively splits the data into branches based on feature values, producing an interpretable tree-like structure usable for both classification and regression.

Full Definition

Decision Trees are among the most intuitive and interpretable machine learning algorithms, making them popular for both practical applications and educational purposes. The algorithm learns a tree-shaped model in which each internal node tests a feature value, each branch represents an outcome of that test, and each leaf node holds a final prediction. For classification, leaves predict class labels; for regression, they predict continuous values. The tree is built top-down by recursively selecting the feature and split point that best separate the data according to a criterion such as Gini impurity or information gain.

Decision trees have several appealing properties: they are easy to visualize and explain to non-technical stakeholders, they handle both numerical and categorical data without extensive preprocessing, they naturally capture non-linear relationships and feature interactions, and they require minimal data preparation.

However, individual decision trees are prone to overfitting, especially when grown deep. This limitation motivated ensemble methods such as Random Forest and Gradient Boosting, which combine many trees to achieve substantially better accuracy. Despite being surpassed by ensembles on raw accuracy, decision trees remain valuable for their interpretability and are widely used in healthcare, finance, and legal applications where explainability is required.
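The if-then-else structure described above can be made concrete with a short hand-written sketch. The feature names and thresholds here are illustrative (loosely modeled on the classic iris dataset), not learned from data:

```python
# A learned tree is ultimately a cascade of if-then-else rules.
# Feature names and thresholds below are illustrative, not learned.
def predict_species(petal_length_cm, petal_width_cm):
    # Each `if` is an internal node; each `return` is a leaf prediction.
    if petal_length_cm <= 2.45:      # root split
        return "setosa"
    if petal_width_cm <= 1.75:       # second-level split
        return "versicolor"
    return "virginica"

print(predict_species(1.4, 0.2))  # setosa
```

A real library would learn the split features and thresholds automatically from training data rather than having them hand-coded.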

Technical Explanation

Tree construction uses recursive partitioning. At each node, the algorithm selects the feature and threshold that maximizes information gain (ID3) or gain ratio (C4.5), or minimizes Gini impurity: Gini(t) = 1 – sum(p_i^2), where p_i is the proportion of class i among the samples at node t. For regression trees, variance reduction is used instead. The CART algorithm (Classification and Regression Trees) uses binary splits exclusively and handles both tasks. Pruning prevents overfitting: pre-pruning limits tree growth during construction (maximum depth, minimum samples per split), while post-pruning (cost-complexity pruning) removes branches that do not improve generalization. Prediction follows a single path from root to leaf, so its cost is O(depth), roughly O(log N) for a balanced tree built on N training samples. Feature importance is proportional to the total reduction in the splitting criterion across all nodes that split on that feature.
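The split-selection step can be sketched in pure Python: compute Gini impurity from the formula above, then scan candidate thresholds on a single feature for the binary split that minimizes the weighted impurity of the two children. This is a minimal illustration of the CART idea, not a full implementation (no multi-feature search, no recursion, no pruning):

```python
# Minimal CART-style split search on one feature, using Gini impurity.
from collections import Counter

def gini(labels):
    """Gini(t) = 1 - sum(p_i^2) over the class proportions at a node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(xs, ys):
    """Return the threshold minimizing the weighted Gini of the children."""
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # skip degenerate splits with an empty child
        w = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if w < best[1]:
            best = (t, w)
    return best

# Toy data: one feature, binary labels; the classes separate at x = 3.
xs = [1, 2, 3, 10, 11, 12]
ys = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(xs, ys)
print(threshold, impurity)  # 3 0.0 -- a perfect split
```

A full tree builder applies this search over every feature at every node and recurses on the two partitions until a stopping rule (pre-pruning) fires.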

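The root-to-leaf prediction walk can likewise be sketched with the tree stored as nested dicts. The tree here is hand-built for illustration, not learned; the loop runs once per level, which is where the O(depth) prediction cost comes from:

```python
# A decision tree as nested dicts: internal nodes test x[feature] <= threshold,
# leaves carry the prediction. Hand-built for illustration, not learned.
tree = {
    "feature": 0, "threshold": 5.0,
    "left": {"leaf": "low"},
    "right": {
        "feature": 1, "threshold": 2.5,
        "left": {"leaf": "medium"},
        "right": {"leaf": "high"},
    },
}

def predict(node, x):
    # Walk from the root to a leaf: one comparison per level, O(depth) total.
    while "leaf" not in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]

print(predict(tree, [7.0, 3.0]))  # high
```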
Use Cases

Medical diagnosis decision support | Credit approval rules | Customer segmentation | Fault diagnosis | Regulatory compliance rules | Risk assessment | Marketing campaign targeting | Educational assessment

Advantages

Highly interpretable and visualizable | No feature scaling required | Handles both numerical and categorical data | Captures non-linear relationships | Fast prediction time | Easy to explain to stakeholders

Disadvantages

Prone to overfitting without pruning | Unstable with small data changes | Biased toward features with more levels | Cannot capture smooth boundaries well | Outperformed by ensemble methods | Greedy splitting may miss optimal trees

Schema Type

DefinedTerm

Difficulty Level

Beginner