Sebastian Raschka • 2/5/2025

Understanding Reasoning LLMs

This technical article defines reasoning models and details four key methods for building them: inference-time scaling, pure reinforcement learning (RL), supervised fine-tuning combined with RL (SFT+RL), and pure supervised fine-tuning (SFT). It discusses how LLMs are specialized for complex, multi-step tasks such as coding and math, uses the DeepSeek training pipeline as an example, and provides guidance on when to use reasoning models.

#Reinforcement Learning #DeepSeek #LLM Reasoning