TorchedUp

192. Softmax + Dropout Fusion

Medium

In an attention block, P = softmax(scores) is followed by P = dropout(P). Both scores and P have shape (N, D). Unfused, the softmax kernel reads scores and writes P to HBM, then the dropout kernel reads P back and writes the masked copy: two full HBM round-trips (two reads and two writes of N × D elements). Fused, dropout is applied in the same pass that computes softmax's output, so scores is read once and the final P is written once.

Signature: def softmax_dropout_fusion_bytes(N: int, D: int, dtype_bytes: int) -> list

Count the DRAM (HBM) traffic of each version in elements, multiply by dtype_bytes to convert to bytes, and return [unfused_bytes, fused_bytes, savings_bytes] (all ints), where savings_bytes = unfused_bytes - fused_bytes.
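The counting described above can be sketched as follows. This is a minimal sketch, not the site's reference solution. It assumes each kernel touches every element exactly once per read or write, that the dropout mask is generated on the fly and adds no HBM traffic, and that softmax's per-row max and sum stay on-chip. None of these assumptions are stated in the problem, so the hidden tests may count differently.

```python
def softmax_dropout_fusion_bytes(N: int, D: int, dtype_bytes: int) -> list:
    # Number of elements in the (N, D) scores / P tensor.
    elems = N * D

    # Unfused: softmax reads scores and writes P, then dropout reads P
    # and writes the masked P, so 4 passes over the tensor.
    # (Assumption: no extra traffic for the dropout mask.)
    unfused_bytes = 4 * elems * dtype_bytes

    # Fused: scores is read once and the final P is written once, so 2 passes.
    fused_bytes = 2 * elems * dtype_bytes

    savings_bytes = unfused_bytes - fused_bytes
    return [unfused_bytes, fused_bytes, savings_bytes]


print(softmax_dropout_fusion_bytes(1024, 1024, 2))
# [8388608, 4194304, 4194304]
```

Under these assumptions, fusion halves the HBM traffic regardless of N, D, or dtype.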

Math

P = Dropout(softmax(S)), fused into one kernel pass
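To illustrate that fusing changes only memory traffic and not the result, here is a rough NumPy sketch comparing a two-step version with a row-at-a-time version that never materializes the intermediate P. The helper names (softmax_rows, unfused, fused), the explicit keep_mask argument, and the inverted-dropout scaling by 1 / (1 - p_drop) are assumptions made for illustration; the problem statement does not specify them. NumPy cannot model HBM, so this shows only the numerical equivalence.

```python
import numpy as np

def softmax_rows(S):
    # Numerically stable row-wise softmax.
    e = np.exp(S - S.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def unfused(S, keep_mask, p_drop):
    # Step 1: materialize the full softmax output (an extra write + read on a GPU).
    P = softmax_rows(S)
    # Step 2: apply dropout (assumed inverted-dropout scaling).
    return P * keep_mask / (1.0 - p_drop)

def fused(S, keep_mask, p_drop):
    # One pass per row: softmax and dropout are applied before the row is stored,
    # so the intermediate P is never written out as a whole tensor.
    out = np.empty_like(S)
    scale = 1.0 / (1.0 - p_drop)
    for i in range(S.shape[0]):
        e = np.exp(S[i] - S[i].max())
        out[i] = (e / e.sum()) * keep_mask[i] * scale
    return out

rng = np.random.default_rng(0)
S = rng.normal(size=(4, 8))
p_drop = 0.1
keep_mask = rng.random(S.shape) >= p_drop  # True where the element is kept

print(np.allclose(unfused(S, keep_mask, p_drop), fused(S, keep_mask, p_drop)))
```

Both functions take the same precomputed mask so the comparison is deterministic; a real fused kernel would typically draw the mask inside the kernel.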


NumPy

import numpy as np

def softmax_dropout_fusion_bytes(N: int, D: int, dtype_bytes: int) -> list:
    pass
