Quoting Boaz Barak, Gabriel Wu, Jeremy Chen and Manas Joglekar
OpenAI researchers propose 'confessions' as a method to improve AI honesty by training models during reinforcement learning to self-report their own misbehavior.
A guide to building product evaluations for LLMs using three steps: labeling data, aligning evaluators, and running experiments.
Explores the shift from RLHF to RLVR for training LLMs, focusing on how objective, verifiable rewards can improve reasoning and accuracy.