Eugene Yan • 7/20/2021

Mailbag: How to Bootstrap Labels for Relevant Docs in Search

This article addresses a reader's question on how industry engineers obtain the 'total number of relevant documents' to evaluate search systems with metrics like Recall@K. It advises starting with a lexical search system (e.g., BM25), deploying it to collect user click data as labels, and then using that data to train and evaluate a semantic search model, avoiding the high cost of large-scale human annotation.
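
As a rough illustration of the idea above, here is a minimal sketch of how click logs could stand in for the "total number of relevant documents" when computing Recall@K. The log format, the function names (`build_relevance_labels`, `recall_at_k`), and the sample data are all hypothetical and are not taken from the article. It assumes that any document clicked for a query counts as relevant, which is only a proxy, since clicks are noisy and biased toward what the deployed system already showed.

```python
from collections import defaultdict


def build_relevance_labels(click_log):
    """Group clicked doc ids by query; clicked docs are treated as relevant (an assumption)."""
    relevant = defaultdict(set)
    for query, doc_id in click_log:
        relevant[query].add(doc_id)
    return dict(relevant)


def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant docs that appear in the top-k retrieved results."""
    if not relevant:
        return None  # Recall is undefined when a query has no labeled relevant docs.
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


# Hypothetical click log collected from a deployed lexical (e.g., BM25) search system.
click_log = [
    ("running shoes", "d1"),
    ("running shoes", "d4"),
    ("running shoes", "d1"),  # Repeat clicks collapse into one label.
    ("trail map", "d7"),
]
labels = build_relevance_labels(click_log)

# Hypothetical ranking from the model being evaluated.
ranked = ["d4", "d2", "d1", "d9"]
recall_2 = recall_at_k(ranked, labels["running shoes"], k=2)
recall_3 = recall_at_k(ranked, labels["running shoes"], k=3)
print(recall_2, recall_3)
```

In this sketch the denominator is the number of distinct clicked documents per query, so the metric measures recall against what users engaged with rather than against an exhaustive human-annotated set.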

#Machine Learning #Information Retrieval #Semantic Search