TorchedUp
LearnBetaProblemsSystem DesignSoonPremium

113. Adam fp32 Training Memory

Medium

For full-fp32 training with the Adam optimizer, compute the bytes of memory needed for parameters + gradients + optimizer state.

Signature: def adam_training_memory_bytes(n_params: int) -> int

Full-fp32 training with Adam keeps four fp32 values per parameter (the weight, its gradient, and Adam's two moment estimates):

  • weights (4 bytes)
  • gradients (4 bytes)
  • first moment m (4 bytes)
  • second moment v (4 bytes)

Total: 16 bytes per parameter.

Example: 1B params -> 16 GB (16 × 10^9 bytes) for parameters, gradients, and optimizer state. Activations are not included.

Return n_params * 16.
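
The per-component accounting above can be sketched as a small helper. This is an illustrative sketch only: the function name adam_fp32_memory_breakdown and the dictionary keys are hypothetical and not part of the problem's required signature.

```python
BYTES_PER_FP32 = 4  # one fp32 value occupies 4 bytes


def adam_fp32_memory_breakdown(n_params: int) -> dict:
    """Hypothetical helper: bytes per component for full-fp32 Adam training."""
    if n_params < 0:
        raise ValueError("n_params must be non-negative")
    per_tensor = n_params * BYTES_PER_FP32
    breakdown = {
        "weights": per_tensor,
        "gradients": per_tensor,
        "adam_m": per_tensor,  # first moment estimate
        "adam_v": per_tensor,  # second moment estimate
    }
    # The right-hand side is evaluated before "total" is added,
    # so this sums only the four components.
    breakdown["total"] = sum(breakdown.values())
    return breakdown


if __name__ == "__main__":
    for name, n_bytes in adam_fp32_memory_breakdown(1_000_000_000).items():
        print(f"{name}: {n_bytes:,} bytes")
```

Each of the four components contributes the same 4 bytes per parameter, so the total is always n_params * 16, matching the required return value.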

Math

$M_{\text{Adam,fp32}} = N \cdot (4 + 4 + 4 + 4) = 16 \cdot N$ bytes
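
As a worked instance of the formula, take N = 10^9 parameters. The conversion to GiB is added here for reference and assumes the binary convention 1 GiB = 2^30 bytes; the problem itself only asks for the byte count.

```latex
M_{\text{Adam,fp32}} = 16 \cdot 10^{9}\ \text{bytes}
                     = 16\ \text{GB}
                     = \frac{16 \cdot 10^{9}}{2^{30}}\ \text{GiB}
                     \approx 14.9\ \text{GiB}
```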

NumPy

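import numpy as np  # kept from the starter template; not needed for this arithmetic

def adam_training_memory_bytes(n_params: int) -> int:
    # weights + gradients + first moment m + second moment v, 4 bytes each (fp32)
    return n_params * 16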
