Benchmarking articles

10/5/2025 • EN

Understanding the 4 Main Approaches to LLM Evaluation (From Scratch)

Explores four main methods for evaluating Large Language Models (LLMs), including code examples for implementing each approach from scratch.

benchmarking Fine Tuning LLM Evaluation Model Comparison Reasoning Models

Sebastian Raschka

9/19/2025 • EN

How to waste CPU like a Professional

A technical exploration of seven methods to intentionally waste CPU time for precise durations, focusing on user-land implementations for profiling tests.

benchmarking CPU Profiling Java linux performance testing

Johannes Bechberger

9/17/2025 • EN

Visual Studio 2026 Build Performance

A performance comparison of Visual Studio 2026 vs. 2022, focusing on build times and resource usage for a large .NET Framework solution.

benchmarking build optimization IDE Performance .NET Framework Visual Studio

Panu Oksala

9/15/2025 • EN

LinkedIn Benchmarks again

Analyzes C# performance benchmarks for slicing lists, comparing Skip/Take, Range operator, and GetRange methods, highlighting a common benchmarking error.

.net benchmarking C# Memory Allocation performance

Steven Giesel

7/31/2025 • EN

Benchmarking MicroPython

A developer benchmarks MicroPython performance on various microcontrollers, comparing them to a Raspberry Pi 4 and a laptop.

benchmarking Microcontrollers MicroPython performance Python

Miguel Grinberg

6/22/2025 • EN

Evaluating Long-Context Question & Answer Systems

Explores challenges and methods for evaluating question-answering AI systems when processing long documents like technical manuals or novels.

benchmarking Information Retrieval LLM Evaluation Long Context Question Answering

Eugene Yan

6/19/2025 • EN

Benchmark models using OpenAI-compatible APIs

A guide to benchmarking language models using a Jupyter Notebook that supports any OpenAI-compatible API, including Ollama and Foundry Local.

benchmarking Jupyter Notebook Language Models OpenAI API Prompty

Waldek Mastykarz

6/16/2025 • EN

.NET 10 Performance Edition

Highlights key performance improvements in .NET 10, including stack allocation optimizations and delegate escape analysis, with benchmark comparisons.

.net benchmarking C# JIT performance

Steven Giesel

6/5/2025 • EN

TIL: Vision-Language Models Read Worse (or Better) Than You Think

Introduces ReadBench, a benchmark for evaluating how well Vision-Language Models (VLMs) can read and extract information from images of text.

benchmarking Multimodal AI Text Extraction Vision Language Models Visual RAG

Jeremy Howard

6/2/2025 • EN

Why is enumerating over List faster than IList?

Explains why iterating over a concrete List<T> in C# is faster than iterating over an IList<T> interface, covering boxing and virtual method overhead.

.net benchmarking C# Collections performance

Steven Giesel