
AI TRAINING

AI Agent Engineering with Claude and MCP

Build production-grade autonomous agents with planning loops, tool use, memory, and safety gates using Claude.

Format
bootcamp
Duration
24–40h
Level
advanced
Group size
6–16
Price / participant
€2K–€4K
Group price
€18K–€45K
Audience
Software engineers and ML engineers building autonomous or semi-autonomous AI agent systems
Prerequisites
Solid Python proficiency, REST API experience, and basic familiarity with LLM prompting and JSON schemas

What it covers

This hands-on bootcamp teaches software engineers to design, build, and evaluate autonomous AI agents using Anthropic's Claude API, the Agent SDK, and the Model Context Protocol (MCP). Participants implement planning loops, multi-step tool use, memory architectures, and safety evaluation harnesses across real-world scenarios. The format combines short theory segments with extended lab sessions where teams ship working agent prototypes. By the end, engineers can confidently architect, instrument, and harden autonomous agents for production deployment.

What you'll be able to do

  • Implement a multi-step ReAct agent loop with Claude that plans, calls tools, observes results, and self-corrects
  • Register and consume MCP-compatible tool servers within an agent orchestration graph
  • Design a hybrid memory system combining in-context state, a vector retrieval layer, and a structured episodic store
  • Apply safety gates that interrupt agent execution when confidence drops below a threshold or policy constraints are violated
  • Write an automated evaluation harness that scores agent trajectories against ground-truth task completions

Topics covered

  • Claude API fundamentals: tool use, function calling, and structured outputs
  • Agent SDK architecture: agent loops, state machines, and execution graphs
  • Model Context Protocol (MCP): server setup, context injection, and tool registration
  • Planning patterns: ReAct, Reflexion, and multi-agent orchestration
  • Memory architectures: in-context, external vector stores, and episodic memory
  • Safety gates: guardrails, constitutional AI checks, and human-in-the-loop triggers
  • Evaluation frameworks: trajectory scoring, tool-call accuracy, and regression harnesses
  • Observability and debugging: tracing agent runs, cost control, and latency profiling
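The evaluation-framework topic above (trajectory scoring and tool-call accuracy) reduces to comparing an agent's recorded tool calls against a ground-truth reference. A pure-Python sketch, with illustrative names (`ToolCall`, `score_trajectory`) rather than any SDK's API:

```python
# Sketch of a trajectory scorer: compares an agent run's tool calls against a
# ground-truth reference and reports tool-call accuracy plus task completion.
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolCall:
    tool: str
    args: tuple          # hashable snapshot of the call's arguments

def score_trajectory(actual: list[ToolCall], expected: list[ToolCall],
                     task_completed: bool) -> dict:
    # Position-wise match: the agent must call the right tool with the right
    # args at each step; extra or missing steps count against the score.
    matches = sum(1 for a, e in zip(actual, expected) if a == e)
    denom = max(len(actual), len(expected), 1)
    return {
        "tool_call_accuracy": matches / denom,
        "length_delta": len(actual) - len(expected),
        "completed": task_completed,
    }
```

Position-wise matching is deliberately strict; a production harness would usually also score order-insensitive matches and argument-level partial credit.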

Delivery

Delivered over 3–5 days, either on-site or live-remote via video conference. Each day follows a 30/70 theory-to-lab ratio. Participants need a laptop with Python 3.11+, an Anthropic API key, and access to a vector store (Pinecone or Qdrant trial accounts are sufficient). A shared GitHub repo provides starter code, evaluation scaffolding, and reference implementations. Remote cohorts use VS Code Live Share or GitHub Codespaces for pair-lab exercises. A private Slack channel remains open 30 days post-bootcamp for async Q&A.

What makes it work

  • Start every agent project with an evaluation harness before writing the first prompt: it forces task-decomposition discipline
  • Define a clear contract between the orchestrator and each tool (input schema, error codes, timeout) before integration
  • Instrument every agent run with full trajectory traces from day one to enable fast debugging and cost optimisation
  • Schedule a weekly red-team session where engineers deliberately try to break the agent's safety gates
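The orchestrator-to-tool contract in the second bullet can be made concrete as a small declaration the orchestrator checks before dispatch. A sketch under assumed names (`ToolContract`, `validate` are illustrative, not from any framework):

```python
# Sketch of an explicit orchestrator<->tool contract: each tool declares its
# input schema, the error codes it may return, and a hard timeout, and the
# orchestrator validates every call before dispatching it.
from dataclasses import dataclass

@dataclass
class ToolContract:
    name: str
    input_schema: dict[str, type]   # required arg name -> expected type
    error_codes: set[str]           # errors the orchestrator must handle
    timeout_s: float                # wall-clock cap for a single call

    def validate(self, args: dict) -> list[str]:
        """Return a list of violations; empty means the call is well-formed."""
        problems = []
        for key, typ in self.input_schema.items():
            if key not in args:
                problems.append(f"missing arg: {key}")
            elif not isinstance(args[key], typ):
                problems.append(f"wrong type for {key}: {type(args[key]).__name__}")
        for key in args:
            if key not in self.input_schema:
                problems.append(f"unexpected arg: {key}")
        return problems
```

In practice the schema would be a JSON Schema shared with the model's tool definition, so the same contract drives both Claude's tool description and the orchestrator's runtime checks.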

Common mistakes

  • Skipping trajectory evaluation: teams ship agents without automated scoring, leaving quality regressions undetected in production
  • Infinite loops with no circuit-breaker: agents with unbounded planning loops exhaust token budgets or enter retry spirals
  • Treating tool schemas as an afterthought: poorly typed tool descriptions cause Claude to make malformed or misdirected tool calls far more often than prompt wording does
  • Ignoring memory eviction strategy: storing everything in-context causes latency spikes and context-window overflow on longer tasks
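The circuit-breaker the second mistake calls for is a few lines of code: hard caps on both iteration count and token spend, enforced outside the agent's own reasoning. A sketch where the `step_fn` callable (returning tokens consumed and a done flag) is an assumed stand-in for one agent iteration:

```python
# Sketch of a circuit-breaker around an agent loop: hard caps on iterations
# and token spend stop unbounded planning loops and retry spirals.
class BudgetExceeded(RuntimeError):
    pass

def run_with_breaker(step_fn, max_steps=8, max_tokens=20_000):
    """Run step_fn until done, or raise once either budget is exhausted."""
    spent = 0
    for i in range(max_steps):
        tokens_used, done = step_fn(i)   # one agent iteration
        spent += tokens_used
        if spent > max_tokens:
            raise BudgetExceeded(f"token budget blown at step {i}: {spent}")
        if done:
            return {"steps": i + 1, "tokens": spent}
    raise BudgetExceeded(f"step budget exhausted after {max_steps} steps")
```

Raising instead of silently truncating matters: the orchestrator (or a human) must decide whether to retry, escalate, or abandon the task.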

When NOT to take this

This bootcamp is the wrong fit for a team that has not yet shipped any LLM-powered feature to production: without that experience, engineers lack the debugging intuition to make sense of agent failure modes. Such teams should first complete a practitioner-level prompt-engineering or RAG programme.

This training is part of a Data & AI catalog built for leaders serious about execution. Take the free diagnostic to see which trainings your team needs.