TorchedUp
Difficulty: Easy

Token-level F1

Compute the token-overlap F1 between a predicted answer and a reference answer. This is the standard metric for span-based question answering, as used in SQuAD.

Signature: def token_f1(prediction: list, reference: list) -> float

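  • Use Counter(prediction) & Counter(reference) (multiset intersection, from collections.Counter) and sum the resulting counts to get common, the number of shared tokens counted with multiplicity.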
  • precision = common / len(prediction)
  • recall = common / len(reference)
  • F1 = 2 * p * r / (p + r), where p is precision and r is recall
  • Return 0.0 if there are no common tokens (or either input is empty).
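
The steps above can be put together as follows. This is a minimal sketch rather than a reference solution: it assumes the inputs are already tokenized lists of hashable items (for example strings), and it applies no normalization such as lowercasing or punctuation stripping.

```python
from collections import Counter


def token_f1(prediction: list, reference: list) -> float:
    # Guard against empty inputs, which would otherwise divide by zero.
    if not prediction or not reference:
        return 0.0

    # Multiset intersection: each token is kept at the minimum of its two counts.
    overlap = Counter(prediction) & Counter(reference)
    common = sum(overlap.values())
    if common == 0:
        return 0.0

    precision = common / len(prediction)
    recall = common / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Repeated tokens only count as often as they appear on both sides.
    print(token_f1(["a", "a", "b"], ["a", "c"]))
```

The early return on `common == 0` matters because precision and recall would both be zero there, which makes the F1 denominator zero.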

Math
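
Written with symbols, the definitions from the bullet list look like this. The names $P$, $R$, and $c$ are chosen here for illustration: $P$ and $R$ are the prediction and reference token lists, and $c$ is the size of their multiset intersection. The second line works through one hypothetical case, $P = (\text{the}, \text{cat}, \text{sat})$ and $R = (\text{the}, \text{cat}, \text{sat}, \text{down})$, where $c = 3$.

```latex
\begin{align*}
p &= \frac{c}{|P|}, \qquad r = \frac{c}{|R|}, \qquad F_1 = \frac{2\,p\,r}{p + r} \\[4pt]
p &= \frac{3}{3} = 1, \qquad r = \frac{3}{4}, \qquad
F_1 = \frac{2 \cdot 1 \cdot \tfrac{3}{4}}{1 + \tfrac{3}{4}} = \frac{6}{7} \approx 0.857
\end{align*}
```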


Test Results

○ exact match
○ partial overlap
○ multiset matters