BERT MLM Masking (Medium)

Masked Language Modeling (MLM) Masking

Implement BERT's MLM masking procedure.

Signature: def apply_mlm_mask(token_ids: list, mask_id: int, vocab_size: int, mask_prob: float = 0.15, seed: int = 0) -> tuple

Returns (masked_tokens, labels) where both are lists of length len(token_ids).

Procedure:

  1. np.random.seed(seed)
  2. Draw select = np.random.rand(n) — token i is selected for prediction if select[i] < mask_prob
  3. Draw op = np.random.rand(n) — for selected tokens:
    • op[i] < 0.8 → replace with mask_id
    • 0.8 <= op[i] < 0.9 → replace with rand_tok[i], where rand_tok = np.random.randint(0, vocab_size, size=n)
    • op[i] >= 0.9 → leave the token unchanged
  4. labels[i] = token_ids[i] for selected positions, -100 otherwise

Draw the three random arrays in this order: select, then op, then rand_tok (a single call to np.random.randint(0, vocab_size, size=n), made immediately after the two rand calls). Drawing all arrays up front keeps the output fully determined by the seed.
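One way to implement the procedure above is a minimal sketch like the following, which draws the three random arrays in the required order and then applies the 80/10/10 rule per selected position:

```python
import numpy as np

def apply_mlm_mask(token_ids: list, mask_id: int, vocab_size: int,
                   mask_prob: float = 0.15, seed: int = 0) -> tuple:
    """BERT-style MLM masking: 80% [MASK], 10% random token, 10% unchanged."""
    n = len(token_ids)
    np.random.seed(seed)
    select = np.random.rand(n)                            # which positions are selected
    op = np.random.rand(n)                                # which operation per selected position
    rand_tok = np.random.randint(0, vocab_size, size=n)   # replacement tokens, drawn once

    masked = list(token_ids)
    labels = [-100] * n                                   # -100 = position not predicted
    for i in range(n):
        if select[i] < mask_prob:
            labels[i] = token_ids[i]                      # predict the original token here
            if op[i] < 0.8:
                masked[i] = mask_id                       # 80%: replace with [MASK]
            elif op[i] < 0.9:
                masked[i] = int(rand_tok[i])              # 10%: replace with random token
            # else (op[i] >= 0.9): 10%: leave the token unchanged
    return masked, labels
```

For example, apply_mlm_mask([5, 9, 2], mask_id=103, vocab_size=30000, mask_prob=0.0) returns the tokens untouched with labels of all -100, since no position can be selected.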


Test cases:

- mask_prob=0 means no changes
- standard 15% mask, seed 0
- high mask_prob, seed 7 (Premium)