Named Entity Recognition
Short Definition
Full Definition
Named Entity Recognition (NER) is a core information extraction step in many natural language processing pipelines. The task is to scan text, identify mentions of specific types of entities, and classify each mention into a category. Standard entity types include PERSON, ORGANIZATION, LOCATION, DATE, TIME, MONEY, and PERCENTAGE, though domain-specific NER systems may recognize entities such as drug names, gene names, legal citations, or product models.
NER has evolved through three generations of approaches. Rule-based systems used hand-crafted patterns and gazetteers (curated lists of known names). Statistical methods such as Conditional Random Fields (CRF) learned to recognize entities from labeled examples. Modern approaches use deep learning, particularly BiLSTM-CRF architectures and fine-tuned transformers such as BERT, which report F1 scores above 90 on the English CoNLL-2003 benchmark.
NER feeds numerous downstream applications: search engines use it to interpret queries, chatbots use it to extract key information from user messages, knowledge graphs are built from NER-extracted entities, and document processing systems use it to pull structured information out of unstructured text. The CoNLL-2003 shared task established the standard evaluation benchmark for NER in English and German.
Technical Explanation
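NER is typically formulated as a sequence labeling problem using BIO or BIOES tagging. In BIO, B-PER marks the first token of a person entity, I-PER marks each following token inside the same entity, and O marks tokens outside any entity. BIOES adds E- for the last token of a multi-token entity and S- for a single-token entity.
A linear-chain CRF models the conditional probability of the entire label sequence rather than of each label independently: P(y|x) = (1/Z(x)) * exp(sum_t sum_k w_k * f_k(y_{t-1}, y_t, x, t)), where the f_k are feature functions, the w_k are their learned weights, and Z(x) normalizes over all possible label sequences for the input x. BiLSTM-CRF combines a bidirectional LSTM for feature extraction with a CRF layer for structured prediction. Transformer-based NER fine-tunes BERT with a token classification head that predicts an entity label for each subword token; a word-level label is commonly taken from the first subword of each word.
Evaluation uses entity-level precision, recall, and F1: a predicted entity counts as correct only if both its boundaries and its type match the gold annotation exactly. Nested NER handles entities that overlap or contain one another, which a single flat BIO sequence cannot represent. Few-shot NER prompts an LLM with a handful of labeled examples (in-context learning) instead of fine-tuning a dedicated model.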
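
The BIO scheme and the exact-match scoring rule described above can be illustrated with a minimal sketch. The function names `bio_to_spans` and `entity_prf`, the example sentence, and the lenient handling of a stray I- tag are assumptions made for illustration; they are not taken from any particular library or from the CoNLL-2003 scoring script.

```python
from typing import List, Set, Tuple

# (entity type, start token index, end token index, end exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of typed entity spans."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel flushes an entity that ends at the last token.
    for i, tag in enumerate(tags + ["O"]):
        # The open entity continues only on an I- tag of the same type.
        continues = tag.startswith("I-") and etype == tag[2:]
        if etype is not None and not continues:
            spans.add((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        elif tag.startswith("I-") and etype is None:
            # Assumption: a stray I- tag with no open entity starts a new one.
            start, etype = i, tag[2:]
    return spans


def entity_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 with exact boundary and type match."""
    gold_spans = bio_to_spans(gold)
    pred_spans = bio_to_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: "Ada Lovelace joined Acme Corp"
gold = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG"]
pred = ["B-PER", "I-PER", "O", "B-ORG", "O"]  # ORG boundary is one token short

print(bio_to_spans(gold))
print(entity_prf(gold, pred))
```

In this example the predicted ORG span covers only one of the two gold tokens, so it earns no credit even though four of the five token-level tags agree. This is why entity-level F1 is usually lower than token-level accuracy.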
Use Cases
Advantages
Disadvantages
Schema Type
Featured Snippet Candidate
Difficulty Level