Jeremy Howard 2/10/2025

TIL: Masked Language Models Are Surprisingly Capable Zero-Shot Learners


The article introduces ModernBERT-Large-Instruct, an instruction-tuned encoder model that uses its Masked Language Modeling head to perform classification and multiple-choice tasks zero-shot. It details how this approach outperforms other similarly sized models on benchmarks like MMLU-Pro and matches traditional fine-tuning, all with a simple training recipe and easy-to-use code on HuggingFace.
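The core idea can be sketched as follows: a multiple-choice question is formatted as a prompt ending in a single `[MASK]` token, and the MLM head's probabilities for the answer letters at that position decide the prediction. This is a minimal illustration of that selection logic, not the article's actual code; the prompt template and the dummy probabilities (which would come from a real forward pass through the model) are assumptions.

```python
def pick_answer(mask_token_probs, choices):
    """Given the MLM head's probability for each answer letter at the
    [MASK] position, return the highest-scoring choice."""
    letters = [chr(ord("A") + i) for i in range(len(choices))]
    best = max(letters, key=lambda letter: mask_token_probs.get(letter, 0.0))
    return choices[letters.index(best)]

# Hypothetical prompt in the style the approach implies.
prompt = (
    "Question: What is the capital of France?\n"
    "A. Berlin\nB. Paris\nC. Rome\n"
    "Answer: [MASK]"
)

# Dummy probabilities standing in for the model's output at the [MASK] slot.
probs = {"A": 0.05, "B": 0.90, "C": 0.05}
print(pick_answer(probs, ["Berlin", "Paris", "Rome"]))  # → Paris
```

Because the prediction is just the MLM head filling a mask, no task-specific classification head or fine-tuning step is required.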
