Implement Fleiss' kappa for inter-rater agreement when there are more than two raters per item (and possibly different raters across items).
Signature: def fleiss_kappa(ratings: list) -> float
The input ratings has shape (N, K), where ratings[i][k] is the number of raters who assigned item i to category k. Each row sums to the same number of raters n.
Compute:
  P_i = (sum_k n_ik^2 - n) / (n * (n - 1))   (per-item agreement)
  p_k = sum_i n_ik / (N * n)                 (overall category proportions)
  P_bar = mean(P_i)
  P_e = sum_k p_k^2
  kappa = (P_bar - P_e) / (1 - P_e)
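A direct translation of these formulas into plain Python (no NumPy) might look like the sketch below; it assumes the input is well-formed, i.e. every row sums to the same n with n >= 2:

```python
def fleiss_kappa(ratings: list) -> float:
    """Fleiss' kappa for an N x K table of per-item category counts."""
    N = len(ratings)          # number of items
    K = len(ratings[0])       # number of categories
    n = sum(ratings[0])       # raters per item (same for every row, assumed)

    # Per-item agreement: P_i = (sum_k n_ik^2 - n) / (n * (n - 1))
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]
    P_bar = sum(P) / N

    # Overall category proportions: p_k = sum_i n_ik / (N * n)
    p = [sum(row[k] for row in ratings) / (N * n) for k in range(K)]
    P_e = sum(pk * pk for pk in p)

    # Note: kappa is undefined when P_e == 1 (all ratings in one category).
    return (P_bar - P_e) / (1 - P_e)
```

With perfect agreement (e.g. [[3, 0], [0, 3], [3, 0]]) every P_i is 1, so kappa is 1.0; maximal disagreement relative to chance drives kappa negative.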