Adam fp32 Training Memory (Medium)

For full-fp32 training with the Adam optimizer, compute the total number of bytes of memory needed for parameters, gradients, and optimizer state.

Signature: def adam_training_memory_bytes(n_params: int) -> int

Full-fp32 training with Adam stores four fp32 tensors, each contributing 4 bytes per parameter:

  • weights (4 bytes)
  • gradients (4 bytes)
  • first moment m (4 bytes)
  • second moment v (4 bytes)

Total: 16 bytes per parameter.

Example: 1B params × 16 bytes = 1.6 × 10^10 bytes = 16 GB (decimal), just for parameters, gradients, and optimizer state; activations are excluded.

Return n_params * 16.
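
A minimal sketch of the requested function, directly encoding the 16-bytes-per-parameter count derived above. The constant names are illustrative choices, not part of the required API; only the signature is fixed by the problem.

def adam_training_memory_bytes(n_params: int) -> int:
    """Bytes for weights + gradients + Adam m/v state, all stored in fp32."""
    FP32_BYTES = 4         # one fp32 value = 4 bytes
    TENSORS_PER_PARAM = 4  # weights, gradients, first moment m, second moment v
    return n_params * FP32_BYTES * TENSORS_PER_PARAM

# Usage check against the example above: 1B params -> 16 GB (decimal)
assert adam_training_memory_bytes(1_000_000_000) == 16 * 10**9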

Test cases: 1B params, 125M, 7B (premium).