Few-Shot Learning

Short Definition

Few-shot learning is a machine learning approach in which models learn to perform new tasks or recognize new categories from only a handful of examples, typically one to five. It mimics the human ability to learn quickly from minimal experience and is crucial for scenarios where labeled data is scarce.

Full Definition

Few-shot learning addresses one of the most significant practical challenges in machine learning: the need for large labeled datasets. While humans can learn to recognize a new type of bird from seeing just one or two examples, traditional machine learning models typically require thousands or millions of labeled examples. Few-shot learning aims to bridge this gap.

The field has two main branches. In computer vision, meta-learning approaches like Prototypical Networks learn to compare new examples with class prototypes, MAML (Model-Agnostic Meta-Learning) learns initializations that adapt quickly to new tasks, and Siamese Networks learn similarity functions between pairs of examples.

In NLP, few-shot learning has been revolutionized by large language models. GPT-3 demonstrated that providing a few examples in the prompt (in-context learning) enables the model to perform new tasks remarkably well. This in-context few-shot capability has become the primary way users interact with LLMs: showing the model what you want through examples rather than fine-tuning on large datasets.

Few-shot learning has enormous practical value: it enables AI deployment in domains where labeled data is expensive (medical imaging), rare (manufacturing defects), or constantly changing (new product categories). It also powers rapid prototyping, allowing developers to test AI solutions with minimal data investment before committing to full-scale data collection.
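The in-context few-shot pattern described above can be sketched as a simple prompt builder. This is a minimal illustration, not a specific LLM API; the sentiment task, example texts, and labels are invented for demonstration:

```python
# Minimal sketch of constructing a few-shot (in-context) prompt for an LLM.
# The task and examples below are hypothetical illustrations.
def build_few_shot_prompt(examples, new_input):
    """Format (input, output) demonstration pairs plus a new input."""
    lines = [f"Input: {x} -> Output: {y}" for x, y in examples]
    lines.append(f"Input: {new_input} -> Output:")
    return "\n".join(lines)

examples = [
    ("The movie was fantastic", "positive"),
    ("Terrible service, never again", "negative"),
    ("An instant classic", "positive"),
]
prompt = build_few_shot_prompt(examples, "I loved every minute")
print(prompt)
```

The resulting string would be sent to a language model as-is; the model infers the task (here, sentiment labeling) from the three demonstrations and completes the final `Output:` line.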

Technical Explanation

Prototypical Networks compute each class prototype as the mean embedding of its support examples, c_k = (1/|S_k|) * sum_{x_i in S_k} f_theta(x_i), then classify a query by softmax over negative distances to the prototypes: P(y=k|x) = softmax(-d(f_theta(x), c_k)). MAML learns an initialization theta that adapts quickly to each task T_i via one or more gradient steps: theta'_i = theta - alpha * grad L_{T_i}(theta). Siamese Networks learn a similarity function over pairs, e.g. sim(x_1, x_2) = sigma(|f(x_1) - f(x_2)|), where sigma is a sigmoid applied to the (often weighted) component-wise distance between embeddings. For LLMs, in-context learning provides k examples in the prompt: "Input: X1 -> Output: Y1, Input: X2 -> Output: Y2, Input: X_new -> Output:". Performance typically improves with more examples (1-shot < 3-shot < 5-shot) but plateaus beyond a certain point.
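The Prototypical Networks computation above can be sketched in NumPy. This is a toy illustration, not a full implementation: in practice f_theta is a trained embedding network, whereas here the support and query embeddings are hard-coded 2-D vectors:

```python
import numpy as np

def prototypes(support_emb, support_labels, n_classes):
    """c_k = mean embedding of the support examples in class k."""
    return np.stack([
        support_emb[support_labels == k].mean(axis=0)
        for k in range(n_classes)
    ])

def classify(query_emb, protos):
    """P(y=k|x) = softmax over negative squared Euclidean distances."""
    d = ((protos - query_emb) ** 2).sum(axis=1)  # distance to each prototype
    logits = -d
    e = np.exp(logits - logits.max())            # numerically stable softmax
    return e / e.sum()

# Toy 2-way 2-shot episode: embeddings stand in for f_theta(x_i).
support = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
labels = np.array([0, 0, 1, 1])
protos = prototypes(support, labels, n_classes=2)
probs = classify(np.array([0.2, 0.3]), protos)
print(probs.argmax())  # query falls near the class-0 prototype
```

Note the design choice: nothing is fine-tuned at test time. Adapting to a new class only requires averaging a handful of support embeddings, which is what makes this approach suitable for the few-shot regime.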

Use Cases

Medical image classification with rare diseases | Manufacturing defect detection | New product categorization | Personalized recommendations from limited user data | Rapid NLP task prototyping | Wildlife species identification | Security surveillance of rare events | Drug discovery with limited experimental data

Advantages

Requires very few labeled examples | Enables AI in data-scarce domains | Fast adaptation to new tasks | Reduces data collection costs | Natural interaction paradigm for LLMs | Bridges gap between human and machine learning efficiency

Disadvantages

Generally lower accuracy than fully supervised models | Performance sensitive to example selection | Meta-learning approaches require complex training | In-context learning limited by context window | Results can be inconsistent across runs | Difficult to handle complex multi-step tasks

Schema Type

DefinedTerm

Difficulty Level

Beginner