The Gated Recurrent Unit (GRU) is a streamlined alternative to the LSTM, with two gates (reset and update) instead of the LSTM's three (input, forget, and output). It merges the cell state and hidden state into a single hidden state, reducing the parameter count while retaining the ability to capture long-range dependencies.
Given input x and previous hidden state h_prev:
r = sigmoid(W_r @ [h_prev, x] + b_r) # reset gate
z = sigmoid(W_z @ [h_prev, x] + b_z) # update gate
n = tanh(W_n @ [r * h_prev, x] + b_n) # candidate hidden
h = (1 - z) * n + z * h_prev # new hidden state
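As a concreteness check, here is a minimal NumPy sketch of these four equations. It assumes per-gate weight matrices W_r, W_z, W_n of shape (H, H + input_size) acting on the concatenated vector [h_prev, x]; the function and variable names are illustrative, not part of the problem's API.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h_prev, W_r, W_z, W_n, b_r, b_z, b_n):
    hx = np.concatenate([h_prev, x])   # [h_prev, x], shape (H + input_size,)
    r = sigmoid(W_r @ hx + b_r)        # reset gate
    z = sigmoid(W_z @ hx + b_z)        # update gate
    # the reset gate scales the previous hidden state before the candidate
    n = np.tanh(W_n @ np.concatenate([r * h_prev, x]) + b_n)
    return (1 - z) * n + z * h_prev    # new hidden state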
In practice, implementations concatenate the per-gate weights: W_ih has shape (3*H, input_size) and W_hh has shape (3*H, H), where H is the hidden size. The three gate blocks are stacked in the order [r, z, n].
For the candidate n, the reset gate is applied to the hidden-to-hidden contribution only:
n = tanh(W_ih_n @ x + b_ih_n + r * (W_hh_n @ h_prev + b_hh_n))
Signature: def gru_cell(x, h_prev, W_ih, W_hh, b_ih, b_hh)
Parameters:
x: (input_size,)
h_prev: (hidden_size,)
W_ih: (3*hidden_size, input_size)
W_hh: (3*hidden_size, hidden_size)
b_ih, b_hh: (3*hidden_size,)
Returns:
h_next: (hidden_size,)
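Putting the pieces together, here is one possible NumPy implementation of the stated signature: slice the stacked weights into their [r, z, n] blocks, compute the input and hidden pre-activations separately, and apply the reset gate only to the hidden contribution, exactly as in the candidate formula above. This layout matches the convention used by PyTorch's nn.GRUCell; treat it as a sketch under the shape conventions listed above.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_cell(x, h_prev, W_ih, W_hh, b_ih, b_hh):
    H = h_prev.shape[0]
    gi = W_ih @ x + b_ih        # input pre-activations, shape (3*H,)
    gh = W_hh @ h_prev + b_hh   # hidden pre-activations, shape (3*H,)
    i_r, i_z, i_n = gi[:H], gi[H:2*H], gi[2*H:]   # split in [r, z, n] order
    h_r, h_z, h_n = gh[:H], gh[H:2*H], gh[2*H:]
    r = sigmoid(i_r + h_r)                 # reset gate
    z = sigmoid(i_z + h_z)                 # update gate
    n = np.tanh(i_n + r * h_n)             # reset scales only the hidden part
    return (1 - z) * n + z * h_prev        # h_next

A quick shape check with random weights (sizes chosen arbitrarily for illustration):

rng = np.random.default_rng(0)
I, H = 4, 3
x, h0 = rng.standard_normal(I), np.zeros(H)
W_ih, W_hh = rng.standard_normal((3*H, I)), rng.standard_normal((3*H, H))
b_ih = b_hh = np.zeros(3*H)
print(gru_cell(x, h0, W_ih, W_hh, b_ih, b_hh).shape)   # (3,)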