Testing articles

3/4/2026 • EN

Practical Guide to Evaluating and Testing Agent Skills

A guide to systematically evaluating and testing AI agent skills, covering success criteria, building an evaluation harness, and improving skill performance.

Agent Skills AI Agents Evaluation Gemini API testing

Philipp Schmid

2/27/2026 • EN

The Claude Code Drawbacks

A developer shares critical drawbacks of using Claude Code for AI-assisted programming, focusing on hidden issues like problematic test generation and maintenance challenges.

AI Coding Tools code quality debugging software development testing

Antonin Januska

2/27/2026 • EN

Writing My First Evals

A developer details the process of building evaluation systems for two AI-powered developer tools to measure their real-world effectiveness.

AI Agents Claude Agent SDK cli Eval testing

Nick Nisi

2/26/2026 • EN

Skill Eval

Introduces Skill Eval, a TypeScript framework for testing and benchmarking AI coding agent skills to ensure reliability and correct behavior.

AI Agents benchmarking docker testing TypeScript

Minko Gechev

2/19/2026 • EN

Testing Data Pipelines: What to Validate and When

Explains the importance of automated testing for data pipelines, covering schema validation, data quality checks, and regression testing.

Data Engineering Data Pipelines Data Validation Quality Assurance testing

Alex Merced

2/9/2026 • EN

Simplifying assertions with lenses

Explores using lenses in Haskell to simplify test assertions for nested data structures, improving test readability and precision.

assertions functional programming Haskell Lenses testing

Mark Seemann

2/8/2026 • EN

So, You “10x’d” Your Work…

A critical analysis of the '10x productivity' claims in AI-assisted software development, questioning quality and oversight.

ai code quality productivity software development testing

James Bach

1/31/2026 • EN

In Praise of --dry-run

A developer explains the practical benefits of implementing a --dry-run option in a reporting application for safe testing and validation.

command line Development Workflow Dry Run software development testing

Henrik Warne

1/27/2026 • EN

Tips for getting coding agents to write good Python tests

Tips for using AI coding agents to generate high-quality Python tests, leveraging existing patterns and tools like pytest.

pytest Python software development test automation testing

Simon Willison

1/22/2026 • EN

Fortran - Testing - Showing progress and printing results

A technical guide on implementing centralized test result printing and progress reporting in a Fortran testing framework.

Fortran software development test framework testing unit testing

Matthias Noback

1/21/2026 • EN

Don't Trip[wire] Yourself: Testing Error Recovery in Zig

Introducing Tripwire, a Zig library for injecting failures to test error handling and recovery paths, ensuring robust error cleanup.

Errdefer error handling Error Recovery testing Zig

Mitchell Hashimoto

1/19/2026 • EN

Filtering as domain logic

Explores strategies for implementing and testing complex data filtering logic, balancing correctness and performance between in-memory and database queries.

Databases Domain Logic Filtering software design testing

Mark Seemann