Given a model that doesn't fit on a single GPU, compute the minimum number of shards per node needed to fit it in VRAM, clamped to the GPUs you actually have.
Signature: def optimal_shards_per_node(model_bytes: int, gpus_per_node: int, gpu_vram_bytes: int) -> int
Formula: max(1, min(gpus_per_node, ceil(model_bytes / gpu_vram_bytes))).
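A minimal sketch of the signature above, directly translating the formula (assumes all sizes are in the same unit, e.g. bytes):

```python
import math

def optimal_shards_per_node(model_bytes: int, gpus_per_node: int, gpu_vram_bytes: int) -> int:
    """Minimum shards per node needed to fit the model in VRAM,
    clamped to the number of GPUs actually available on the node."""
    # How many GPU-sized shards the model requires, rounding up.
    shards_needed = math.ceil(model_bytes / gpu_vram_bytes)
    # Never fewer than 1 shard, never more than the GPUs on the node.
    return max(1, min(gpus_per_node, shards_needed))
```

Note the order of the clamps: `min` caps the count at the node's GPU count first, then `max` guarantees at least one shard even for a model smaller than a single GPU's VRAM.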
Example: model_bytes = 100 GiB, gpus_per_node = 8, gpu_vram_bytes = 16 GiB. Then ceil(100 / 16) = 7, min(8, 7) = 7, and max(1, 7) = 7, so the model is split into 7 shards per node.