Greedily select k documents that balance relevance to the query with diversity from already-selected documents.
Signature: def mmr(query_sim: np.ndarray, doc_doc_sim: np.ndarray, lam: float, k: int) -> list
At each step, pick the un-selected document d that maximizes:
lam * query_sim[d] - (1 - lam) * max_{s in selected} doc_doc_sim[d, s]
For the first selection (no prior s), use just lam * query_sim[d]. Return the list of selected indices in selection order.
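
The procedure above can be sketched as follows. This is one possible implementation, not a reference solution. It makes three assumptions the statement does not specify: ties are broken toward the lowest index (the behavior of np.argmax), k is capped at the number of documents, and doc_doc_sim is a square matrix indexed as [d, s].

```python
import numpy as np


def mmr(query_sim: np.ndarray, doc_doc_sim: np.ndarray, lam: float, k: int) -> list:
    query_sim = np.asarray(query_sim, dtype=float)
    doc_doc_sim = np.asarray(doc_doc_sim, dtype=float)
    n = query_sim.shape[0]
    k = min(k, n)  # assumption: never select more documents than exist

    selected = []
    available = np.ones(n, dtype=bool)
    # max_sim[d] tracks max over selected s of doc_doc_sim[d, s]
    max_sim = np.full(n, -np.inf)

    for _ in range(k):
        if not selected:
            # First pick: no prior selections, so only the relevance term applies
            scores = lam * query_sim
        else:
            scores = lam * query_sim - (1.0 - lam) * max_sim
        # Exclude already-selected documents from the argmax
        scores = np.where(available, scores, -np.inf)
        best = int(np.argmax(scores))  # assumption: ties go to the lowest index
        selected.append(best)
        available[best] = False
        # Fold the newly selected document into the running maximum
        max_sim = np.maximum(max_sim, doc_doc_sim[:, best])

    return selected
```

Keeping a running maximum avoids recomputing the max over all selected documents at each step, so the loop costs O(n) per selection. For example, with query_sim = [0.9, 0.8, 0.1], lam = 0.5, and documents 0 and 1 having similarity 0.95, the second pick skips the near-duplicate document 1 in favor of document 2.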