Transposed Convolution (ConvTranspose2D) (Hard)

Implement 2D transposed convolution (also called "deconvolution" or "fractionally strided convolution"), used in U-Nets, GANs, and segmentation models to upsample feature maps.

Signature: def conv_transpose2d(x, kernel, stride=1, padding=0)

  • x: (H_in, W_in, C_in)
  • kernel: (kH, kW, C_in, C_out)
  • stride: int (default 1)
  • padding: int (default 0)
  • Returns: (H_out, W_out, C_out)

Output size:

H_out = (H_in - 1) * stride + kH - 2 * padding
W_out = (W_in - 1) * stride + kW - 2 * padding
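As a sanity check on the formula, here is the expected height for one of the test configurations listed below (3×3 input, 2×2 kernel, stride 2, no padding):

```python
# Output-size formula for transposed convolution (no output_padding term).
H_in, kH, stride, padding = 3, 2, 2, 0
H_out = (H_in - 1) * stride + kH - 2 * padding
print(H_out)  # 6: the 3x3 input upsamples to 6x6 with this configuration
```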

Algorithm — Scatter (transpose of the gather in forward conv)

For each input position (i, j) and each input channel c_in:

for kh in range(kH):
    for kw in range(kW):
        oh = i * stride + kh - padding
        ow = j * stride + kw - padding
        if 0 <= oh < H_out and 0 <= ow < W_out:
            out[oh, ow, :] += x[i, j, c_in] * kernel[kh, kw, c_in, :]
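The loops above drop straight into a reference implementation. This is a sketch in plain NumPy (naming follows the problem signature; the channel loop is folded into a vector-matrix product), checked here against the first test case: a 2×2 input with a 3×3 all-ones kernel at stride 1.

```python
import numpy as np

def conv_transpose2d(x, kernel, stride=1, padding=0):
    # x: (H_in, W_in, C_in), kernel: (kH, kW, C_in, C_out)
    H_in, W_in, C_in = x.shape
    kH, kW, _, C_out = kernel.shape
    H_out = (H_in - 1) * stride + kH - 2 * padding
    W_out = (W_in - 1) * stride + kW - 2 * padding
    out = np.zeros((H_out, W_out, C_out))
    for i in range(H_in):
        for j in range(W_in):
            for kh in range(kH):
                for kw in range(kW):
                    oh = i * stride + kh - padding
                    ow = j * stride + kw - padding
                    if 0 <= oh < H_out and 0 <= ow < W_out:
                        # Scatter: the input pixel contributes through the whole
                        # kernel; the @ sums over C_in, yielding a (C_out,) vector.
                        out[oh, ow, :] += x[i, j, :] @ kernel[kh, kw, :, :]
    return out

# First test case: 2x2 ones input, 3x3 all-ones kernel, stride=1 -> 4x4 output.
x = np.ones((2, 2, 1))
k = np.ones((3, 3, 1, 1))
y = conv_transpose2d(x, k)
# y has shape (4, 4, 1); each corner receives 1 contribution, the center 2x2
# receives all 4, and the total equals H_in * W_in * kH * kW = 36.
```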

Intuition: A standard convolution gathers a neighborhood of input values into each output pixel; a transposed convolution runs that data flow in reverse, scattering each input value across the output through the kernel. With stride > 1, the scattered patches land with gaps between their origins, effectively upsampling.
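A 1D toy version (illustrative numbers, not part of the problem's tests) makes the stride behavior concrete: with stride 2, consecutive inputs are written 2 output positions apart.

```python
import numpy as np

# 1D scatter with stride=2: each input value is written through a
# length-2 all-ones kernel, so consecutive inputs land 2 positions apart.
x = np.array([1.0, 2.0, 3.0])
k = np.array([1.0, 1.0])
stride = 2
out = np.zeros((len(x) - 1) * stride + len(k))
for i, v in enumerate(x):
    for kh, w in enumerate(k):
        out[i * stride + kh] += v * w
print(out)  # [1. 1. 2. 2. 3. 3.] -- the input is spread out, i.e. upsampled
```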

Why Not Just Unpool?

Transposed convolution learns the upsampling weights (the kernel), making it trainable. It's the standard learnable upsampling layer in:

  • U-Net decoder (skip connections + transposed conv)
  • GAN generators (latent vector → full-size image)
  • Deformable segmentation heads

Language: Python (numpy)

Test Results

○2×2 input, 3×3 all-ones kernel, stride=1 → 4×4 scatter sum
○seed=42, 3×3 input, 2×2 kernel, stride=2, C_in=1→C_out=1
○seed=7, 2×2 input, 2×2 kernel, stride=1, C_in=1→C_out=2
○single pixel, identity kernel → passthrough