TorchedUp

146. Throughput vs Latency Batch Size

Medium

Find the largest batch size that still meets a per-token latency SLA.

Signature: def optimal_batch_for_sla(latency_intercept_ms: float, latency_slope_ms: float, max_latency_ms: float, max_batch: int) -> int

Latency model: l(b) = latency_intercept_ms + latency_slope_ms * b. With latency_slope_ms >= 0 it is non-decreasing in b, reflecting that larger batches slow per-token decode (in real systems, noticeably so only once b is large enough).
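A minimal sketch of the latency model in plain Python. `latency_ms` is a hypothetical helper name, not part of the required signature. The snippet also checks the non-decreasing property numerically for the example's parameters (intercept 5 ms, slope 0.5 ms).

```python
def latency_ms(b: int, latency_intercept_ms: float, latency_slope_ms: float) -> float:
    """Linear per-token latency model: l(b) = intercept + slope * b."""
    return latency_intercept_ms + latency_slope_ms * b

# Evaluate l(b) for b = 1..32 with the example's parameters.
lat = [latency_ms(b, 5.0, 0.5) for b in range(1, 33)]
# With a non-negative slope, each latency is <= the next one.
is_non_decreasing = all(x <= y for x, y in zip(lat, lat[1:]))
print(lat[:3], is_non_decreasing)  # [5.5, 6.0, 6.5] True
```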

Return the maximum integer b in [1, max_batch] such that l(b) <= max_latency_ms. If no b satisfies the SLA, return 0. Use a linear search.
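One way to meet the linear-search requirement is sketched below; it is an illustration, not the site's reference solution. Scanning downward from max_batch means the first b that meets the SLA is the answer, and the scan stays correct even if the slope were negative (an upward scan that stops at the first violation would rely on monotonicity). Cost is O(max_batch) time and O(1) memory.

```python
def optimal_batch_for_sla(latency_intercept_ms: float, latency_slope_ms: float,
                          max_latency_ms: float, max_batch: int) -> int:
    # Scan from the largest allowed batch down to 1; the first b whose
    # modeled latency meets the SLA is the maximum feasible b.
    # range() is empty when max_batch < 1, so we fall through to 0.
    for b in range(max_batch, 0, -1):
        if latency_intercept_ms + latency_slope_ms * b <= max_latency_ms:
            return b
    # No b in [1, max_batch] satisfies l(b) <= max_latency_ms.
    return 0

print(optimal_batch_for_sla(5.0, 0.5, 10.0, 32))   # 10
print(optimal_batch_for_sla(12.0, 0.5, 10.0, 32))  # 0 (intercept alone exceeds the SLA)
```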

Example:

  • intercept=5, slope=0.5, max_latency=10ms, max_batch=32 → largest b with 5 + 0.5b <= 10 is b=10.
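Checking the boundary of the example directly shows why b = 10 is the answer: it lands exactly on the limit (the SLA inequality is non-strict), while b = 11 exceeds it. Since 10 is below max_batch = 32, the cap does not bind.

```latex
\begin{aligned}
l(10) &= 5 + 0.5 \cdot 10 = 10 \le 10 && \text{(meets the SLA)} \\
l(11) &= 5 + 0.5 \cdot 11 = 10.5 > 10 && \text{(violates the SLA)}
\end{aligned}
```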

Math

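$$b^* = \max\{\, b \in \{1, \dots, B\} : a + s \cdot b \le L_{\max} \,\}$$

where a = latency_intercept_ms, s = latency_slope_ms, B = max_batch and L_max = max_latency_ms; b* = 0 if no b qualifies.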
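Although the problem asks for a linear search, for a positive slope the condition can also be solved in closed form, which is handy as a cross-check. Here a is the intercept, s the slope, B = max_batch and L_max the latency limit, and B >= 1 is assumed. For s = 0 the answer is simply B if a <= L_max, else 0. In floating point, the floor of the division can disagree with the direct comparison a + s·b <= L_max near the boundary, which is one reason to prefer the literal check.

```latex
s > 0:\quad a + s\,b \le L_{\max} \iff b \le \frac{L_{\max} - a}{s}
\quad\Longrightarrow\quad
b^* =
\begin{cases}
\min\!\left(B, \left\lfloor \dfrac{L_{\max} - a}{s} \right\rfloor\right) & \text{if this value is } \ge 1,\\[6pt]
0 & \text{otherwise.}
\end{cases}
```

For the example: (10 − 5) / 0.5 = 10, and min(32, 10) = 10.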


NumPy

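import numpy as np


def optimal_batch_for_sla(latency_intercept_ms: float, latency_slope_ms: float, max_latency_ms: float, max_batch: int) -> int:
    # Return the largest b in [1, max_batch] with
    # latency_intercept_ms + latency_slope_ms * b <= max_latency_ms, else 0.
    pass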
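Because the starter imports NumPy, a vectorized variant makes a convenient cross-check: it evaluates l(b) for every candidate at once and takes the largest feasible b. `optimal_batch_for_sla_np` is a hypothetical name. It still examines every candidate, but allocates arrays of length max_batch instead of using constant memory like a plain loop.

```python
import numpy as np

def optimal_batch_for_sla_np(latency_intercept_ms: float, latency_slope_ms: float,
                             max_latency_ms: float, max_batch: int) -> int:
    # Candidate batch sizes 1..max_batch (an empty array if max_batch < 1).
    b = np.arange(1, max_batch + 1)
    # Boolean mask: True where the modeled latency meets the SLA.
    ok = latency_intercept_ms + latency_slope_ms * b <= max_latency_ms
    feasible = b[ok]
    # Largest feasible batch size, or 0 if none qualifies.
    return int(feasible.max()) if feasible.size else 0

print(optimal_batch_for_sla_np(5.0, 0.5, 10.0, 32))  # 10
```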
