Given the per-token hidden states from a transformer encoder, produce a single vector per sequence using one of three pooling strategies.
Signature: def pool(hidden: np.ndarray, mode: str) -> np.ndarray
hidden: shape (batch, seq, d)mode is one of:
'cls' → take hidden[:, 0, :] (the leading [CLS] token)'mean' → average across the sequence axis'max' → element-wise max across the sequence axisReturn shape: (batch, d).
Math
Asked at
Test Results