Implement 2D transposed convolution (also called "deconvolution" or "fractionally strided convolution"), used in U-Nets, GANs, and segmentation models to upsample feature maps.
Signature: def conv_transpose2d(x, kernel, stride=1, padding=0)
Inputs:
    x: (H_in, W_in, C_in)
    kernel: (kH, kW, C_in, C_out)
    stride: int (default 1)
    padding: int (default 0)
Returns: (H_out, W_out, C_out)
Output size:
H_out = (H_in - 1) * stride + kH - 2 * padding
W_out = (W_in - 1) * stride + kW - 2 * padding
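For instance, with H_in = 4, kH = 4, stride = 2, and padding = 1: H_out = (4 - 1) * 2 + 4 - 2 * 1 = 8, so the spatial size doubles.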
For each input position (i, j) and each input channel c_in:
for kh in range(kH):
    for kw in range(kW):
        oh = i * stride + kh - padding
        ow = j * stride + kw - padding
        if 0 <= oh < H_out and 0 <= ow < W_out:
            out[oh, ow, :] += x[i, j, c_in] * kernel[kh, kw, c_in, :]
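A minimal runnable sketch in NumPy, assuming the shapes above and the naive nested-loop formulation (a framework implementation would vectorize this, but the scatter logic is the same):

```python
import numpy as np

def conv_transpose2d(x, kernel, stride=1, padding=0):
    # x: (H_in, W_in, C_in), kernel: (kH, kW, C_in, C_out)
    H_in, W_in, C_in = x.shape
    kH, kW, _, C_out = kernel.shape

    H_out = (H_in - 1) * stride + kH - 2 * padding
    W_out = (W_in - 1) * stride + kW - 2 * padding
    out = np.zeros((H_out, W_out, C_out), dtype=np.result_type(x, kernel))

    # Scatter: each input value is multiplied by the kernel and
    # accumulated into the output window it maps to.
    for i in range(H_in):
        for j in range(W_in):
            for c_in in range(C_in):
                for kh in range(kH):
                    for kw in range(kW):
                        oh = i * stride + kh - padding
                        ow = j * stride + kw - padding
                        if 0 <= oh < H_out and 0 <= ow < W_out:
                            out[oh, ow, :] += x[i, j, c_in] * kernel[kh, kw, c_in, :]
    return out
```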
Intuition: Instead of gathering input values into each output position (as a standard convolution does), a transposed convolution scatters each input value across the output through the kernel. With stride > 1, the scattered contributions land with gaps between them, effectively upsampling the feature map.
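A toy run of the sketch above illustrates this (the values are arbitrary and only for illustration): with stride 2, a 2x2 kernel of ones, and no padding, each input pixel is scattered into its own 2x2 block of a 4x4 output.

```python
x = np.arange(4, dtype=float).reshape(2, 2, 1)   # 2x2 input, 1 channel
k = np.ones((2, 2, 1, 1))                        # 2x2 kernel of ones
y = conv_transpose2d(x, k, stride=2, padding=0)
print(y.shape)    # (4, 4, 1): (2-1)*2 + 2 - 0 = 4 in each spatial dim
print(y[..., 0])  # each input value fills its own 2x2 block
```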
Transposed convolution learns the upsampling weights (the kernel), making the upsampling trainable. It is the standard learnable upsampling layer in U-Nets, GANs, and segmentation models.