The Long Short-Term Memory (LSTM) cell uses gating mechanisms to control information flow, mitigating the vanishing-gradient problem of vanilla RNNs. It maintains two states: the hidden state h (short-term memory) and the cell state c (long-term memory).
Given input x, previous hidden state h_prev, previous cell state c_prev, and concatenated weight matrices:
W_ih: (4·H, input_size) — input-to-hidden weights
W_hh: (4·H, H) — hidden-to-hidden weights
b_ih, b_hh: (4·H,) — biases
Compute the gate pre-activations gates = W_ih @ x + W_hh @ h_prev + b_ih + b_hh, a vector of shape (4·H,), and slice it in order [i, f, g, o] (each of size H):
i = sigmoid(gates[:H])        # input gate
f = sigmoid(gates[H:2*H])     # forget gate
g = tanh(gates[2*H:3*H])      # cell gate
o = sigmoid(gates[3*H:])      # output gate
Signature: def lstm_cell(x, h_prev, c_prev, W_ih, W_hh, b_ih, b_hh)
Returns: (h_next, c_next) — both shape (H,)
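A minimal NumPy sketch matching the signature and the [i, f, g, o] slicing order above; the local `sigmoid` helper is an assumption for self-containment (any numerically stable sigmoid works):

```python
import numpy as np

def lstm_cell(x, h_prev, c_prev, W_ih, W_hh, b_ih, b_hh):
    """One LSTM step. Returns (h_next, c_next), each of shape (H,)."""
    H = h_prev.shape[0]

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Gate pre-activations, shape (4*H,): both biases are added, matching
    # the W_ih/W_hh + b_ih/b_hh parameterization above.
    gates = W_ih @ x + W_hh @ h_prev + b_ih + b_hh

    i = sigmoid(gates[:H])         # input gate
    f = sigmoid(gates[H:2*H])      # forget gate
    g = np.tanh(gates[2*H:3*H])    # cell gate (candidate values)
    o = sigmoid(gates[3*H:])       # output gate

    c_next = f * c_prev + i * g    # update long-term memory
    h_next = o * np.tanh(c_next)   # gated exposure as short-term memory
    return h_next, c_next
```

Because tanh and sigmoid are bounded, every entry of h_next lies strictly inside (-1, 1), which is a quick sanity check for an implementation.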