Transfer Learning
Short Definition
Transfer learning is a machine learning technique in which a model pre-trained on a large source dataset is reused and adapted, typically by fine-tuning, to a new but related task that has far less data.
Full Definition
Transfer learning has become one of the most practically important techniques in modern machine learning, fundamentally changing how AI models are developed and deployed. The core idea is simple yet powerful: instead of training a model from scratch for every new task, you start with a model that has already learned useful representations from a large dataset and adapt it to your specific needs. This approach is inspired by human learning — we do not learn each new skill in isolation but build upon previously acquired knowledge.

In computer vision, transfer learning typically involves taking a model pre-trained on ImageNet (millions of labeled images) and fine-tuning it on a smaller, task-specific dataset. In NLP, the paradigm shift came with models like BERT and GPT, which are pre-trained on massive text corpora using self-supervised objectives and then fine-tuned for specific tasks like sentiment analysis or question answering.

Transfer learning has democratized AI development by making it possible to build high-performing models without access to massive datasets or computational resources. A medical startup can fine-tune a pre-trained vision model to detect diseases in X-rays using just thousands of labeled images, rather than millions. The technique has proven effective across virtually every domain of AI, from speech recognition to robotics.
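As a minimal sketch of the workflow described above, the feature-extraction strategy (keep the pre-trained layers frozen, train only a new task-specific head) can be illustrated in plain NumPy. The random "backbone" here is only a stand-in for real pre-trained weights, and all names, sizes, and the toy dataset are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "pre-trained" backbone: a frozen random projection standing in
# for representations learned on a large source dataset (e.g. ImageNet).
W_backbone = rng.normal(size=(8, 32))           # frozen, never updated

def features(x):
    """Extract features with the frozen backbone (ReLU activation)."""
    return np.maximum(x @ W_backbone, 0.0)

# Small target-task dataset: binary labels depend on the first input feature.
X = rng.normal(size=(200, 8))
y = (X[:, 0] > 0).astype(float)

# Only the new head is trained; the backbone stays fixed throughout.
H = features(X)                                 # backbone run once, frozen
w_head = np.zeros(32)
b_head = 0.0
lr = 0.1
for _ in range(500):
    z = H @ w_head + b_head
    p = 1.0 / (1.0 + np.exp(-z))                # sigmoid
    grad = p - y                                 # logistic-loss gradient
    w_head -= lr * H.T @ grad / len(X)
    b_head -= lr * grad.mean()

acc = ((1.0 / (1.0 + np.exp(-(H @ w_head + b_head))) > 0.5) == y).mean()
print(f"train accuracy with frozen backbone: {acc:.2f}")
```

Because only the small head is updated, this strategy needs far fewer labeled examples and far less compute than training the whole network from scratch.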
Technical Explanation
Transfer learning involves two phases: pre-training on a source task with abundant data, and fine-tuning on a target task with limited data. Common strategies include feature extraction (freezing pre-trained layers and training only new task-specific layers), full fine-tuning (updating all parameters with a small learning rate), and progressive unfreezing (gradually unfreezing layers from top to bottom). The learning rate for fine-tuning is typically 10-100x smaller than for training from scratch to avoid catastrophic forgetting. Domain adaptation techniques handle distribution shifts between source and target domains. Modern approaches include adapter layers (small trainable modules inserted into frozen models), LoRA (Low-Rank Adaptation using low-rank matrix decomposition), and prompt tuning for language models.
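The LoRA idea mentioned above can be shown concretely: instead of updating a frozen weight matrix W, one learns a low-rank update B @ A with far fewer parameters. This NumPy sketch uses invented layer sizes and the common alpha/r scaling; it is an illustration of the decomposition, not any particular library's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 512, 512, 8          # hypothetical layer size and LoRA rank
alpha = 16.0                          # common LoRA scaling hyperparameter

W = rng.normal(size=(d_out, d_in))    # frozen pre-trained weight matrix

# LoRA: learn a low-rank update Delta_W = B @ A instead of touching W.
# B starts at zero, so the adapted layer initially matches the frozen one.
A = rng.normal(size=(r, d_in)) * 0.01
B = np.zeros((d_out, r))

def adapted_forward(x):
    # Effective weight is W + (alpha / r) * B @ A, applied without ever
    # materializing the full d_out x d_in update.
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.normal(size=(4, d_in))
# With B = 0 the adapted output equals the frozen model's output exactly.
assert np.allclose(adapted_forward(x), x @ W.T)

full = d_out * d_in                   # parameters updated by full fine-tuning
lora = r * (d_in + d_out)             # trainable parameters with LoRA
print(f"trainable params: {lora} vs {full} ({100 * lora / full:.1f}%)")
# → trainable params: 8192 vs 262144 (3.1%)
```

Only A and B are trained; because their parameter count scales with r rather than with d_in * d_out, the memory and storage cost of adapting a large frozen model drops by orders of magnitude.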
Use Cases
- Medical imaging: fine-tuning a pre-trained vision model to detect diseases in X-rays with only thousands of labeled images
- NLP tasks such as sentiment analysis and question answering, via fine-tuned models like BERT and GPT
- Image classification built on ImageNet-pretrained backbones
- Speech recognition and robotics, where labeled target-domain data is scarce
Advantages
- Strong performance without massive labeled datasets or large-scale compute
- Faster, cheaper development than training from scratch
- Makes high-performing models accessible to smaller teams and organizations
Disadvantages
- Risk of catastrophic forgetting during fine-tuning if learning rates are too high
- Performance degrades under distribution shift between source and target domains, requiring domain adaptation techniques
- Depends on the availability of a pre-trained model suited to the target task