Implement Fleiss' kappa for inter-rater agreement when there are more than two raters per item (and possibly different raters across items).
Signature: def fleiss_kappa(ratings: list) -> float
The input ratings has shape (N, K), where ratings[i][k] is the number of raters who assigned item i to category k. Each row sums to the same number of raters n.
Compute:
  P_i = (sum_k n_ik^2 - n) / (n * (n - 1))   (per-item agreement)
  p_k = sum_i n_ik / (N * n)                 (overall category proportions)
  P_bar = mean(P_i)
  P_e = sum_k p_k^2
  kappa = (P_bar - P_e) / (1 - P_e)
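A direct translation of these formulas into plain Python (no NumPy) might look like the sketch below; it assumes the input is well-formed, i.e. every row sums to the same n with n >= 2:

```python
def fleiss_kappa(ratings: list) -> float:
    """Fleiss' kappa for an N x K table of per-item category counts."""
    N = len(ratings)          # number of items
    K = len(ratings[0])       # number of categories
    n = sum(ratings[0])       # raters per item (same for every row, assumed)

    # Per-item agreement: P_i = (sum_k n_ik^2 - n) / (n * (n - 1))
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]
    P_bar = sum(P) / N

    # Overall category proportions: p_k = sum_i n_ik / (N * n)
    p = [sum(row[k] for row in ratings) / (N * n) for k in range(K)]
    P_e = sum(pk * pk for pk in p)

    # Note: kappa is undefined when P_e == 1 (all ratings in one category).
    return (P_bar - P_e) / (1 - P_e)
```

With perfect agreement (e.g. [[3, 0], [0, 3], [3, 0]]) every P_i is 1, so kappa is 1.0; maximal disagreement relative to chance drives kappa negative.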