Yoel Zeldes • 12/16/2018

The Story of a Bad Train-Test Split

The article recounts a case study in which adding thumbnail image features to a content recommendation model led to a biased train-test split. The author explains why data leakage must be prevented by ensuring that each unique thumbnail and each unique title appears in only the train set or only the test set, describes a naive implementation of this constraint, and analyzes the unexpected performance degradation it caused, highlighting an easily overlooked machine learning pitfall.
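The isolation constraint above (no thumbnail or title on both sides of the split) can be enforced by treating rows that share a thumbnail or a title as one group and assigning whole groups to train or test. The following is a minimal sketch under stated assumptions, not the author's implementation: each row is assumed to be a hypothetical `(thumbnail_id, title)` pair, and the helper names `find`, `union`, and `group_split` are illustrative. It uses union-find to merge rows into connected components, then shuffles the components and fills the test set until it reaches the target fraction.

```python
import random
from collections import defaultdict


def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    # Merge the components containing a and b.
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra


def group_split(rows, test_fraction=0.2, seed=0):
    """Split row indices so no thumbnail or title appears in both train and test.

    rows: list of (thumbnail_id, title) tuples (hypothetical schema).
    Returns (train_indices, test_indices).
    """
    parent = {}
    for thumb, title in rows:
        # Tag keys so a thumbnail id can never collide with a title string.
        a, b = ("thumb", thumb), ("title", title)
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        union(parent, a, b)

    # Collect row indices per connected component (keyed by its root).
    components = defaultdict(list)
    for i, (thumb, _) in enumerate(rows):
        components[find(parent, ("thumb", thumb))].append(i)

    groups = list(components.values())
    random.Random(seed).shuffle(groups)

    # Assign whole components: fill test until it reaches the target size.
    train, test = [], []
    target = test_fraction * len(rows)
    for group in groups:
        if len(test) < target:
            test.extend(group)
        else:
            train.extend(group)
    return sorted(train), sorted(test)


rows = [("t1", "A"), ("t1", "B"), ("t2", "B"), ("t3", "C"), ("t4", "D")]
train_idx, test_idx = group_split(rows, test_fraction=0.4, seed=1)
print("train:", train_idx, "test:", test_idx)
```

In this toy input, rows 0, 1, and 2 form one component because they are chained through thumbnail `t1` and title `B`, so they always land on the same side. Because whole components are assigned at once, the realized test fraction is only approximate, and a single very large component can push it well past the target.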


#Machine Learning #Bias #Feature Engineering