Transfer Learning

Short Definition

Transfer learning is a machine learning technique where a model trained on one task is reused as the starting point for a different but related task. It leverages knowledge gained from large datasets to improve performance on tasks with limited data, dramatically reducing training time and resource requirements.

Full Definition

Transfer learning has become one of the most practically important techniques in modern machine learning, fundamentally changing how AI models are developed and deployed. The core idea is simple yet powerful: instead of training a model from scratch for every new task, you start with a model that has already learned useful representations from a large dataset and adapt it to your specific needs. This approach is inspired by human learning — we do not learn each new skill in isolation but build upon previously acquired knowledge.

In computer vision, transfer learning typically involves taking a model pre-trained on ImageNet (millions of labeled images) and fine-tuning it on a smaller, task-specific dataset. In NLP, the paradigm shift came with models like BERT and GPT, which are pre-trained on massive text corpora using self-supervised objectives and then fine-tuned for specific tasks like sentiment analysis or question answering.

Transfer learning has democratized AI development by making it possible to build high-performing models without access to massive datasets or computational resources. A medical startup can fine-tune a pre-trained vision model to detect diseases in X-rays using just thousands of labeled images, rather than millions. The technique has proven effective across virtually every domain of AI, from speech recognition to robotics.

Technical Explanation

Transfer learning involves two phases: pre-training on a source task with abundant data, and fine-tuning on a target task with limited data. Common strategies include feature extraction (freezing the pre-trained layers and training only new task-specific layers), full fine-tuning (updating all parameters with a small learning rate), and progressive unfreezing (gradually unfreezing layers from the top of the network downward). The learning rate for fine-tuning is typically 10-100x smaller than for training from scratch, to avoid catastrophic forgetting of the pre-trained representations.

Domain adaptation techniques handle distribution shifts between the source and target domains. Modern parameter-efficient approaches include adapter layers (small trainable modules inserted into an otherwise frozen model), LoRA (Low-Rank Adaptation, which adds trainable low-rank matrix decompositions to frozen weights), and prompt tuning for language models.
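As a minimal sketch of the feature-extraction strategy, the NumPy example below uses a frozen random projection as a stand-in for the lower layers of a pre-trained network and trains only a new linear head on a toy target task. All names, dimensions, and data here are illustrative assumptions, not a real pre-trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "pre-trained" feature extractor: a fixed random projection
# standing in for pre-trained lower layers (illustrative assumption,
# not actual pre-trained weights).
W_frozen = rng.normal(size=(32, 8))

def extract_features(x):
    # Frozen layers are applied in the forward pass but never updated.
    return np.maximum(x @ W_frozen, 0.0)   # linear layer + ReLU

# Small labeled dataset for the target task (synthetic).
X = rng.normal(size=(64, 32))
y = (X[:, 0] > 0).astype(float)

feats = extract_features(X)

# The new task-specific head is the ONLY trainable parameter vector.
w_head = np.zeros(8)
lr = 0.1
for _ in range(200):
    probs = 1.0 / (1.0 + np.exp(-(feats @ w_head)))  # sigmoid output
    grad = feats.T @ (probs - y) / len(y)            # logistic-loss gradient
    w_head -= lr * grad                              # update the head only

train_acc = np.mean(((feats @ w_head) > 0) == (y > 0.5))
```

In practice the frozen part would be a real pre-trained backbone (for example a torchvision or Hugging Face model) rather than a random matrix; the key point is that gradient updates touch only the new head, which is why feature extraction works well with very little target data.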
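The low-rank idea behind LoRA can also be sketched in NumPy: the frozen weight W is left untouched, and only two small matrices A and B are trainable, so the effective weight is W plus the scaled product (alpha/r) * B A. The dimensions below are illustrative assumptions loosely modeled on a transformer weight matrix, not taken from any specific model:

```python
import numpy as np

# Illustrative dimensions (assumptions, not from any specific model).
d_out, d_in, r, alpha = 768, 768, 8, 16

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))       # frozen pre-trained weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable low-rank factor
B = np.zeros((d_out, r))                 # trainable; zero init means the
                                         # low-rank update starts as a no-op

def lora_forward(x):
    # Frozen path plus scaled low-rank update: x (W + (alpha/r) B A)^T
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

full_params = W.size                     # what a full fine-tune would update
lora_params = A.size + B.size            # what LoRA actually trains
```

Initializing B to zero means fine-tuning starts exactly from the pre-trained model's behavior, and here LoRA trains roughly 2% of the parameters a full fine-tune would update, which is the source of its memory savings.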

Use Cases

Medical image diagnosis with limited data | Sentiment analysis for niche domains | Object detection in specialized settings | Low-resource language NLP | Robotics skill transfer | Personalized recommendation systems | Industrial defect detection | Scientific image analysis

Advantages

Dramatically reduces required training data | Faster training convergence | Leverages expensive pre-training investment | Makes AI accessible to smaller organizations | Improves performance on small datasets | Reduces computational costs

Disadvantages

Negative transfer possible if domains are too different | Pre-trained model biases can transfer | Fine-tuning requires careful hyperparameter selection | Large pre-trained models need significant memory | Domain gap can limit effectiveness | Catastrophic forgetting of original knowledge

Schema Type

DefinedTerm

Difficulty Level

Beginner