Backprop: Conv2d (via im2col)

Hand-derive the gradient of L = sum(Conv2d(x)) w.r.t. the input x. No padding, stride 1.

Forward:

Input x has shape (C_in, H, W) (single image, no batch dim).
Filter K has shape (C_out, C_in, kH, kW).
Output y has shape (C_out, H_out, W_out), where H_out = H - kH + 1 and W_out = W - kW + 1, and
y[c_out, i, j] = sum_{c_in, di, dj} K[c_out, c_in, di, dj] * x[c_in, i+di, j+dj].

Implement:

conv2d_forward(x, K) -> y
conv2d_backward(x, K) -> dL/dx of shape (C_in, H, W)
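
A minimal NumPy sketch of the forward pass via im2col, assuming 0-based indexing and the shapes given above. The helper name im2col and its row ordering (c_in, di, dj) are choices made for this illustration, not part of the problem statement.

```python
import numpy as np

def im2col(x, kH, kW):
    """Unfold x (C_in, H, W) into a matrix of shape (C_in*kH*kW, H_out*W_out)."""
    C_in, H, W = x.shape
    H_out, W_out = H - kH + 1, W - kW + 1
    cols = np.empty((C_in, kH, kW, H_out, W_out), dtype=x.dtype)
    for di in range(kH):
        for dj in range(kW):
            # All input values that kernel offset (di, dj) touches, one per output position.
            cols[:, di, dj] = x[:, di:di + H_out, dj:dj + W_out]
    return cols.reshape(C_in * kH * kW, H_out * W_out)

def conv2d_forward(x, K):
    """Valid cross-correlation, stride 1, no padding. Returns y (C_out, H_out, W_out)."""
    C_out, C_in, kH, kW = K.shape
    _, H, W = x.shape
    H_out, W_out = H - kH + 1, W - kW + 1
    cols = im2col(x, kH, kW)
    # Flattening K in (c_in, di, dj) order matches the row order of cols.
    y = K.reshape(C_out, -1) @ cols
    return y.reshape(C_out, H_out, W_out)
```

The matrix product sums over (c_in, di, dj) in one step, which is the same sum as in the definition of y[c_out, i, j].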

The gradient of a valid cross-correlation w.r.t. its input is a full convolution of the upstream gradient with the spatially flipped filter, summed over output channels. Since L = sum(y), the upstream gradient is dL/dy = ones(C_out, H_out, W_out), so

dL/dx[c_in, i, j] = sum over (c_out, di, dj) of K[c_out, c_in, di, dj],
                   restricted to the (di, dj) for which the output position (i - di, j - dj) is in range,
                   i.e. 0 <= i - di < H_out and 0 <= j - dj < W_out.

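For an interior pixel (kH - 1 <= i <= H - kH and kW - 1 <= j <= W - kW), every (di, dj) contributes, so dL/dx[c_in, i, j] = sum_{c_out} sum_{di, dj} K[c_out, c_in, di, dj], which is the same value at every interior position of a given channel. Pixels nearer the border receive only a subset of the (di, dj) terms.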
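
The rule above can be sketched as a col2im-style scatter-add in NumPy. This is one possible implementation under the stated assumptions (L = sum(y), stride 1, no padding, 0-based indices). The intermediate name Ksum is introduced only for this illustration.

```python
import numpy as np

def conv2d_backward(x, K):
    """dL/dx for L = sum(conv2d_forward(x, K)); returns shape (C_in, H, W)."""
    C_out, C_in, kH, kW = K.shape
    _, H, W = x.shape
    H_out, W_out = H - kH + 1, W - kW + 1
    # With dL/dy = ones, every output channel contributes equally,
    # so the filter can be summed over c_out up front.
    Ksum = K.sum(axis=0)  # (C_in, kH, kW)
    dx = np.zeros((C_in, H, W), dtype=np.result_type(x.dtype, K.dtype))
    for di in range(kH):
        for dj in range(kW):
            # Offset (di, dj) touches inputs (i + di, j + dj) for every output (i, j),
            # i.e. an H_out x W_out window of x starting at (di, dj).
            dx[:, di:di + H_out, dj:dj + W_out] += Ksum[:, di, dj][:, None, None]
    return dx
```

As a quick sanity check, a single-channel 3x3 input with a 2x2 all-ones filter should give [[1, 2, 1], [2, 4, 2], [1, 2, 1]]: the centre pixel is covered by all four offsets, the edge midpoints by two, and the corners by one. Because L is linear in x, the result does not depend on the values in x, only on its shape.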

Math

\frac{\partial L}{\partial x_{c, i, j}} = \sum_{c'} \sum_{(d_i, d_j) \in V_{i, j}} K_{c', c, d_i, d_j}
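
For reference, the index set V_{i, j} can be written out explicitly. This restates the in-range condition on (i - di, j - dj) given earlier, assuming 0-based indices, H_out = H - kH + 1 and W_out = W - kW + 1. The count |V_{i, j}| is added here as a convenience and is not part of the original statement.

```latex
V_{i,j} = \{ (d_i, d_j) : 0 \le d_i < kH,\ 0 \le d_j < kW,\ 0 \le i - d_i < H_{out},\ 0 \le j - d_j < W_{out} \}

|V_{i,j}| = \bigl( \min(i, kH - 1) - \max(0, i - H_{out} + 1) + 1 \bigr)
            \bigl( \min(j, kW - 1) - \max(0, j - W_{out} + 1) + 1 \bigr)
```

For a 3x3 input and a 2x2 filter (H_out = W_out = 2), the per-axis factors are 1, 2, 1, so |V_{i,j}| ranges from 1 at the corners to 4 at the centre.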
