Softmax + Dropout Fusion (Medium)

In an attention block, P = softmax(scores) is followed by P = dropout(P). Unfused, the softmax kernel writes P to HBM and the dropout kernel then reads P back and writes a masked copy. Fused, the dropout mask is applied in the same pass that computes the softmax, so scores are read once and only the final output is written — eliminating dropout's read and write, i.e. two full (N, D)-sized transfers of HBM traffic.
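For concreteness, here is what the two unfused steps compute, written as a plain-numpy sketch (the function name, the rng parameter, and the default keep probability are illustrative, not part of the problem). In a fused kernel the mask in pass 2 would be applied before P ever leaves on-chip memory:

```python
import numpy as np

def softmax_then_dropout(scores: np.ndarray, p_drop: float = 0.1,
                         rng: np.random.Generator | None = None) -> np.ndarray:
    """Unfused reference: softmax and dropout as two separate passes over (N, D)."""
    rng = rng or np.random.default_rng()
    # Pass 1 (softmax kernel): read scores, write P back to memory.
    shifted = scores - scores.max(axis=-1, keepdims=True)  # subtract max for stability
    exp = np.exp(shifted)
    P = exp / exp.sum(axis=-1, keepdims=True)
    # Pass 2 (dropout kernel): read P back, write a masked, rescaled copy.
    mask = rng.random(P.shape) >= p_drop
    return np.where(mask, P / (1.0 - p_drop), 0.0)
```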

Signature: def softmax_dropout_fusion_bytes(N: int, D: int, dtype_bytes: int) -> list

  • Unfused: softmax read+write + dropout read+write = 4 * N * D * dtype_bytes
  • Fused: read scores once, write final output once = 2 * N * D * dtype_bytes

Return [unfused_bytes, fused_bytes, savings_bytes] (all ints).
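One way to compute the three values, read directly off the two formulas above (a straightforward sketch, not an official solution):

```python
def softmax_dropout_fusion_bytes(N: int, D: int, dtype_bytes: int) -> list:
    """Byte traffic for unfused vs. fused softmax+dropout over an (N, D) tensor."""
    tensor_bytes = N * D * dtype_bytes  # one full pass over the tensor
    unfused = 4 * tensor_bytes          # softmax read+write, dropout read+write
    fused = 2 * tensor_bytes            # read scores once, write output once
    return [unfused, fused, unfused - fused]
```

For example, N = D = 4096 with dtype_bytes = 2 (fp16) gives 128 MiB unfused, 64 MiB fused, and 64 MiB saved.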


Test Results

  • small fp32
  • attention map fp16
  • long context fp16 (Premium)