Recurrent Neural Network
Short Definition
A Recurrent Neural Network (RNN) is a neural network that processes sequential data by maintaining a hidden state, a form of memory that carries information from previous time steps into the processing of the current input.
Full Definition
Recurrent Neural Networks (RNNs) are a class of neural networks designed to handle sequential and temporal data, where the order of inputs matters. Unlike feedforward networks, which process each input independently, RNNs maintain a hidden state that acts as a form of memory, carrying information from previous time steps forward to influence the processing of current inputs. This makes them naturally suited to tasks such as language modeling, speech recognition, time series prediction, and music generation.

The basic RNN architecture was developed in the 1980s, with key contributions from the Jordan and Elman networks. Vanilla RNNs, however, suffer from the vanishing and exploding gradient problems, which make it difficult to learn long-range dependencies in sequences. This limitation led to more sophisticated variants: Long Short-Term Memory (LSTM) networks, introduced by Hochreiter and Schmidhuber in 1997, and Gated Recurrent Units (GRUs), introduced by Cho et al. in 2014. These gated architectures use learnable gates to control information flow, effectively mitigating the vanishing gradient problem for many practical applications.

RNNs dominated sequence modeling for years, until the Transformer architecture demonstrated that attention mechanisms could handle sequential data more effectively. While Transformers have largely replaced RNNs in natural language processing, RNNs remain relevant for real-time applications, edge devices, and tasks where step-by-step sequential processing is naturally advantageous. Recent architectures such as RWKV and Mamba blend ideas from RNNs and Transformers.
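The vanishing gradient problem can be illustrated with a small numerical sketch: backpropagating through a recurrence repeatedly multiplies the gradient by the transposed recurrent weight matrix, so when that matrix's largest singular value is below 1 the gradient norm shrinks geometrically. The matrix size, scaling factor, and step count below are arbitrary illustrative choices, and the recurrence is treated as linear (the tanh derivative, which is at most 1, would only shrink the gradient further):

```python
import numpy as np

rng = np.random.default_rng(1)
H = 8  # hidden size (arbitrary)

# Recurrent weight matrix rescaled so its largest singular value is 0.9 (< 1).
W = rng.normal(size=(H, H))
W *= 0.9 / np.linalg.norm(W, 2)

grad = np.ones(H)  # stand-in for dL/dh at the final time step
norms = []
for _ in range(50):
    grad = W.T @ grad  # one step of backprop through the linear recurrence
    norms.append(np.linalg.norm(grad))

# The gradient norm decays roughly like 0.9**t, vanishing over long sequences.
```

With a largest singular value above 1, the same loop exhibits the exploding-gradient case instead.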
Technical Explanation
The vanilla RNN update at time step t is:

  h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h)
  y_t = W_hy h_t + b_y

where h_t is the hidden state, x_t is the input, and y_t is the output.

The LSTM adds a cell state c_t and three gates. Writing [h_{t-1}, x_t] for the concatenation of the previous hidden state and the current input, sigmoid for the logistic function, and * for elementwise multiplication:

  f_t = sigmoid(W_f [h_{t-1}, x_t] + b_f)                      (forget gate)
  i_t = sigmoid(W_i [h_{t-1}, x_t] + b_i)                      (input gate)
  c_t = f_t * c_{t-1} + i_t * tanh(W_c [h_{t-1}, x_t] + b_c)   (cell update)
  o_t = sigmoid(W_o [h_{t-1}, x_t] + b_o)                      (output gate)
  h_t = o_t * tanh(c_t)                                        (hidden state)

The GRU simplifies this to two gates, update and reset, and merges the cell and hidden states. Bidirectional RNNs process each sequence in both directions and combine the two passes. Sequence-to-sequence models pair an encoder RNN with a decoder RNN. Teacher forcing trains the decoder on ground-truth tokens rather than its own predictions.
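The update equations above can be sketched directly in NumPy. This is an illustrative sketch, not a reference implementation: the weight shapes, the stacking of the four LSTM gate pre-activations into one matrix, and all variable names are assumptions made here for compactness.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One vanilla RNN update: h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h)."""
    return np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM update. W maps the concatenated [h_{t-1}, x_t] to the four
    gate pre-activations stacked along the first axis (f, i, c~, o)."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0:H])              # forget gate
    i = sigmoid(z[H:2 * H])          # input gate
    c_tilde = np.tanh(z[2 * H:3 * H])  # candidate cell values
    o = sigmoid(z[3 * H:4 * H])      # output gate
    c = f * c_prev + i * c_tilde     # cell update
    h = o * np.tanh(c)               # new hidden state
    return h, c

# Run a short sequence through both cells (random weights, for illustration).
rng = np.random.default_rng(0)
D, H = 3, 4  # input and hidden sizes (arbitrary)
W_xh = rng.normal(size=(H, D))
W_hh = rng.normal(size=(H, H))
b_h = np.zeros(H)
W = rng.normal(size=(4 * H, H + D))
b = np.zeros(4 * H)

h = np.zeros(H)
h_lstm, c_lstm = np.zeros(H), np.zeros(H)
for _ in range(5):
    x = rng.normal(size=D)
    h = rnn_step(x, h, W_xh, W_hh, b_h)
    h_lstm, c_lstm = lstm_step(x, h_lstm, c_lstm, W, b)
```

Note that each new hidden state depends on the previous one, which is why the loop over time steps cannot be parallelized the way a feedforward pass can.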
Use Cases
Language modeling, speech recognition, time series prediction, and music generation; also real-time applications and edge devices, where sequential processing is naturally advantageous.
Advantages
The hidden state provides built-in memory of earlier inputs, making RNNs a natural fit for sequential and temporal data; gated variants (LSTM, GRU) can capture longer-range dependencies; step-by-step processing suits real-time and resource-constrained settings.
Disadvantages
Vanilla RNNs suffer from vanishing and exploding gradients, making long-range dependencies difficult to learn; even with gated variants, Transformers have largely replaced RNNs in natural language processing because attention handles sequential data more effectively.
Schema Type
Featured Snippet Candidate
Difficulty Level