Implement BERT's MLM masking procedure.
Signature: def apply_mlm_mask(token_ids: list, mask_id: int, vocab_size: int, mask_prob: float = 0.15, seed: int = 0) -> tuple
Returns (masked_tokens, labels) where both are lists of length len(token_ids).
Procedure:
1. Call np.random.seed(seed).
2. Draw select = np.random.rand(n), where n = len(token_ids). Token i is selected for prediction if select[i] < mask_prob.
3. Draw op = np.random.rand(n). For selected tokens:
   - op[i] < 0.8 → replace with mask_id
   - 0.8 <= op[i] < 0.9 → replace with a random token from np.random.randint(0, vocab_size, size=n)
   - op[i] >= 0.9 → leave the token unchanged
4. Set labels[i] = token_ids[i] for selected positions, -100 otherwise.

Draw the three random arrays in this order: select, then op, then rand_tok (call np.random.randint(0, vocab_size, size=n) once at the start, after the two rand calls).
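The procedure above can be sketched as follows. This is a minimal sketch, not a reference solution. It assumes that the random replacement for position i is rand_tok[i] (the statement only says the token comes from the rand_tok array) and that returned values should be plain Python ints.

```python
import numpy as np


def apply_mlm_mask(token_ids, mask_id, vocab_size, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels) following the stated draw order."""
    n = len(token_ids)

    # Draw all three random arrays up front, in the required order.
    np.random.seed(seed)
    select = np.random.rand(n)
    op = np.random.rand(n)
    rand_tok = np.random.randint(0, vocab_size, size=n)

    masked_tokens = list(token_ids)
    labels = [-100] * n

    for i in range(n):
        if select[i] >= mask_prob:
            continue  # not selected: token unchanged, label stays -100
        labels[i] = token_ids[i]
        if op[i] < 0.8:
            masked_tokens[i] = mask_id
        elif op[i] < 0.9:
            # Assumption: use the random token drawn for this same position.
            masked_tokens[i] = int(rand_tok[i])
        # op[i] >= 0.9: leave the token unchanged

    return masked_tokens, labels
```

Two boundary cases make convenient sanity checks: with mask_prob=0.0 no position is selected, so the tokens come back unchanged and every label is -100; with mask_prob=1.0 every position is selected, so labels equals token_ids.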