Finetune Granite3.1 for Reasoning
A technical guide on fine-tuning IBM's Granite3.1 AI model using Group Relative Policy Optimization (GRPO) to enhance its reasoning capabilities.
A tutorial on reproducing DeepSeek R1's RL 'aha moment' using Group Relative Policy Optimization (GRPO) to train a model on the Countdown numbers game.
Explains the training of DeepSeek-R1, focusing on the Group Relative Policy Optimization (GRPO) reinforcement learning method.