Agents 2.0: From Shallow Loops to Deep Agents
Explores the evolution from simple, stateless AI agents (Agent 1.0) to advanced, deep agents (Agent 2.0) capable of complex, multi-step tasks.
Philipp Schmid is a Staff Engineer at Google DeepMind, building AI Developer Experience and DevRel initiatives. He specializes in LLMs, RLHF, and making advanced AI accessible to developers worldwide.
189 articles from this blog
Explains the concept of AI subagents, specialized agents for specific tasks, and their architecture using an orchestrator model.
A 10-step guide for e-commerce teams to generate consistent product images using Google's Gemini 2.5 Flash AI model for text-to-image and editing tasks.
Explores the concept of memory in AI agents, detailing short-term and long-term memory architectures to overcome LLM statelessness.
A quick reference guide for installing, configuring, and using the Google Gemini CLI, an AI-powered terminal tool for coding and task management.
Introducing Code Sandbox MCP, a Model Context Protocol server for safely executing Python and JavaScript code in containers via AI agents.
A guide to adding long-term memory to a Gemini 2.5 chatbot using the Mem0 library and vector databases for personalized AI interactions.
Explains why Context Engineering, not just prompt crafting, is the key skill for building effective AI agents and systems.
Explores the trade-offs between single-agent and multi-agent AI systems, discussing their characteristics, pros, and cons for different tasks.
Explores common design patterns for building AI agents and workflows, discussing when to use them and how to implement core concepts.
A technical cheatsheet for using Google's Gemini AI models with the LangChain framework, covering setup, chat models, prompt templates, and image inputs.
Explains the architecture and workflow of OpenAI's Codex CLI, a terminal-based AI tool for chat-driven software development.
An overview of the Model Context Protocol (MCP), an open standard for connecting AI applications to external tools and data sources.
A tutorial on building a ReAct AI agent from scratch using Google's Gemini 2.5 Pro/Flash and the LangGraph framework for complex reasoning and tool use.
Explains the difference between Pass@k and Pass^k metrics for evaluating AI agent reliability, highlighting why consistency matters in production.
A tutorial on implementing function calling with Google's Gemma 3 27B LLM, showing how to connect it to external tools and APIs.
A practical guide to implementing function calling with Google's Gemini 2.0 Flash model, enabling LLMs to interact with external tools and APIs.
A tutorial on using Google's Gemini 2.0 AI models to extract structured data like invoice numbers and dates from PDF documents.
A tutorial on reproducing DeepSeek R1's RL 'aha moment' using Group Relative Policy Optimization (GRPO) to train a model on the Countdown numbers game.
A technical guide on aligning open-source large language models (LLMs) in 2025 using Direct Preference Optimization (DPO) and synthetic data.