
GRU Cell (Medium)

The Gated Recurrent Unit (GRU) is a streamlined alternative to the LSTM, with only 2 gates (reset and update) instead of the LSTM's 3 (input, forget, and output). It merges the cell state and hidden state into a single hidden state, reducing parameters while retaining the ability to capture long-range dependencies.

Given input x and previous hidden state h_prev:

r = sigmoid(W_r @ [h_prev, x] + b_r)   # reset gate
z = sigmoid(W_z @ [h_prev, x] + b_z)   # update gate
n = tanh(W_n @ [r * h_prev, x] + b_n)  # candidate hidden
h = (1 - z) * n + z * h_prev           # new hidden state

In practice, use concatenated weight matrices. Weight W_ih is (3*H, input_size) and W_hh is (3*H, H). The 3 gate components are [r, z, n] in order.
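For example, the per-gate blocks can be recovered by slicing along the first axis (a minimal sketch; the names W_ir, W_iz, W_in, etc. are illustrative and not part of the required signature):

H = hidden_size                                          # illustrative; equals h_prev.shape[0]
W_ir, W_iz, W_in = W_ih[:H], W_ih[H:2*H], W_ih[2*H:]     # input-to-hidden blocks for r, z, n
W_hr, W_hz, W_hn = W_hh[:H], W_hh[H:2*H], W_hh[2*H:]     # hidden-to-hidden blocks for r, z, n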

For the candidate n, the reset gate is applied to the hidden-to-hidden contribution only:

n = tanh(W_ih_n @ x + b_ih_n + r * (W_hh_n @ h_prev + b_hh_n))

Signature: def gru_cell(x, h_prev, W_ih, W_hh, b_ih, b_hh)

  • x: (input_size,)
  • h_prev: (hidden_size,)
  • W_ih: (3*hidden_size, input_size)
  • W_hh: (3*hidden_size, hidden_size)
  • b_ih, b_hh: (3*hidden_size,)
  • Returns: h_next (hidden_size,)
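A minimal numpy sketch under the conventions above ([r, z, n] gate ordering, reset gate applied only to the hidden-to-hidden term); sigmoid is defined by hand since numpy does not provide one:

import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_cell(x, h_prev, W_ih, W_hh, b_ih, b_hh):
    H = h_prev.shape[0]
    gi = W_ih @ x + b_ih         # input contributions, shape (3*H,), ordered [r, z, n]
    gh = W_hh @ h_prev + b_hh    # hidden contributions, shape (3*H,), ordered [r, z, n]
    i_r, i_z, i_n = gi[:H], gi[H:2*H], gi[2*H:]
    h_r, h_z, h_n = gh[:H], gh[H:2*H], gh[2*H:]
    r = sigmoid(i_r + h_r)             # reset gate
    z = sigmoid(i_z + h_z)             # update gate
    n = np.tanh(i_n + r * h_n)         # candidate: reset gate scales only the hidden term
    return (1.0 - z) * n + z * h_prev  # new hidden state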

Test Cases

  • random weights, h_prev = 0
  • update gate ≈ 1 preserves h_prev
  • reset gate ≈ 0 ignores h_prev in candidate
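As a quick sanity check on the second case (an illustrative sketch, not the grader's actual test; the shapes, seed, and large-bias trick are assumptions), pushing the update-gate bias to a large positive value saturates z toward 1, so the output should match h_prev:

import numpy as np

rng = np.random.default_rng(0)
I, H = 4, 3
x, h_prev = rng.standard_normal(I), rng.standard_normal(H)
W_ih, W_hh = rng.standard_normal((3*H, I)), rng.standard_normal((3*H, H))
b_ih, b_hh = np.zeros(3*H), np.zeros(3*H)
b_ih[H:2*H] = 50.0   # saturate the update gate so z ≈ 1
h_next = gru_cell(x, h_prev, W_ih, W_hh, b_ih, b_hh)
print(np.allclose(h_next, h_prev, atol=1e-4))   # expected: True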