TorchedUp

193. Kernel Launch Overhead Speedup

Easy

Each CUDA kernel launch carries a few microseconds of fixed overhead. For chains of tiny pointwise ops, this overhead can dominate the actual GPU work. Fusing n small kernels into one pays the launch cost only once.

Signature: def kernel_launch_breakeven(per_kernel_overhead_us: float, fused_kernel_us: float, n_ops: int) -> list

  • Unfused total time: per_kernel_overhead_us * n_ops (microseconds)
  • Fused total time: fused_kernel_us (microseconds)
  • Speedup = unfused / fused (or 0.0 if fused is 0)

Return [unfused_us, fused_us, speedup] (floats).
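A minimal sketch of one way to satisfy the specification above, written in plain Python. It is not the site's reference solution. The zero check on the fused time follows the "0.0 if fused is 0" rule; the sample values in the usage line are illustrative, not measurements.

```python
def kernel_launch_breakeven(per_kernel_overhead_us: float,
                            fused_kernel_us: float,
                            n_ops: int) -> list:
    # Unfused: every op pays the fixed launch overhead once.
    unfused_us = float(per_kernel_overhead_us) * n_ops
    # Fused: a single kernel whose total time is given directly.
    fused_us = float(fused_kernel_us)
    # Guard against division by zero, as the problem statement requires.
    speedup = unfused_us / fused_us if fused_us != 0 else 0.0
    return [unfused_us, fused_us, speedup]


# Illustrative inputs (assumed values): 10 ops at 5 us of overhead each,
# versus one fused kernel taking 8 us.
print(kernel_launch_breakeven(5.0, 8.0, 10))  # [50.0, 8.0, 6.25]
```

Note that this model counts only launch overhead on the unfused side, exactly as the problem defines it; any per-op compute time is ignored.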

Math

$$\text{speedup} = \frac{n \cdot t_{\text{launch}}}{t_{\text{fused}}}$$
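As a quick check of the formula, here is a worked instance with assumed, purely illustrative values: n = 20 ops, a launch overhead of 4 µs per kernel, and a fused kernel time of 10 µs.

$$\text{speedup} = \frac{20 \cdot 4\,\mu\text{s}}{10\,\mu\text{s}} = \frac{80\,\mu\text{s}}{10\,\mu\text{s}} = 8$$

A speedup above 1 means fusing wins; at exactly 1 the fused kernel takes as long as the combined launch overhead, which is the break-even point.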

NumPy

import numpy as np

def kernel_launch_breakeven(per_kernel_overhead_us: float, fused_kernel_us: float, n_ops: int) -> list:
    pass
