
AI TRAINING

AI Agent Engineering with Claude and MCP

Build production-grade autonomous agents with planning loops, tool use, memory, and safety gates using Claude.

Format
bootcamp
Duration
24–40h
Level
advanced
Group size
6–16
Price / participant
€2K–€4K
Group price
€18K–€45K
Audience
Software engineers and ML engineers building autonomous or semi-autonomous AI agent systems
Prerequisites
Solid Python proficiency, REST API experience, and basic familiarity with LLM prompting and JSON schemas

What it covers

This hands-on bootcamp teaches software engineers to design, build, and evaluate autonomous AI agents using Anthropic's Claude API, the Agent SDK, and the Model Context Protocol (MCP). Participants implement planning loops, multi-step tool use, memory architectures, and safety evaluation harnesses across real-world scenarios. The format combines short theory segments with extended lab sessions where teams ship working agent prototypes. By the end, engineers can confidently architect, instrument, and harden autonomous agents for production deployment.

What you'll be able to do

  • Implement a multi-step ReAct agent loop with Claude that plans, calls tools, observes results, and self-corrects
  • Register and consume MCP-compatible tool servers within an agent orchestration graph
  • Design a hybrid memory system combining in-context state, a vector retrieval layer, and a structured episodic store
  • Apply safety gates that interrupt agent execution when confidence drops below a threshold or policy constraints are violated
  • Write an automated evaluation harness that scores agent trajectories against ground-truth task completions

Topics covered

  • Claude API fundamentals: tool use, function calling, and structured outputs
  • Agent SDK architecture: agent loops, state machines, and execution graphs
  • Model Context Protocol (MCP): server setup, context injection, and tool registration
  • Planning patterns: ReAct, Reflexion, and multi-agent orchestration
  • Memory architectures: in-context, external vector stores, and episodic memory
  • Safety gates: guardrails, constitutional AI checks, and human-in-the-loop triggers
  • Evaluation frameworks: trajectory scoring, tool-call accuracy, and regression harnesses
  • Observability and debugging: tracing agent runs, cost control, and latency profiling
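The evaluation-framework topic above (trajectory scoring and tool-call accuracy) reduces to comparing an agent's recorded tool calls against a ground-truth reference. A pure-Python sketch, with illustrative names (`ToolCall`, `score_trajectory`) rather than any SDK's API:

```python
# Sketch of a trajectory scorer: compares an agent run's tool calls against a
# ground-truth reference and reports tool-call accuracy plus task completion.
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolCall:
    tool: str
    args: tuple          # hashable snapshot of the call's arguments

def score_trajectory(actual: list[ToolCall], expected: list[ToolCall],
                     task_completed: bool) -> dict:
    # Position-wise match: the agent must call the right tool with the right
    # args at each step; extra or missing steps count against the score.
    matches = sum(1 for a, e in zip(actual, expected) if a == e)
    denom = max(len(actual), len(expected), 1)
    return {
        "tool_call_accuracy": matches / denom,
        "length_delta": len(actual) - len(expected),
        "completed": task_completed,
    }
```

Position-wise matching is deliberately strict; a production harness would usually also score order-insensitive matches and argument-level partial credit.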

Delivery

Delivered over 3–5 days, either on-site or live-remote via video conference. Each day follows a 30/70 theory-to-lab ratio. Participants need a laptop with Python 3.11+, an Anthropic API key, and access to a vector store (Pinecone or Qdrant trial accounts are sufficient). A shared GitHub repo provides starter code, evaluation scaffolding, and reference implementations. Remote cohorts use VS Code Live Share or GitHub Codespaces for pair-lab exercises. A private Slack channel remains open 30 days post-bootcamp for async Q&A.

What makes it work

  • Start every agent project with an evaluation harness before writing the first prompt: it forces task-decomposition discipline
  • Define a clear contract between the orchestrator and each tool (input schema, error codes, timeout) before integration
  • Instrument every agent run with full trajectory traces from day one to enable fast debugging and cost optimisation
  • Schedule a weekly red-team session where engineers deliberately try to break the agent's safety gates
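The orchestrator-to-tool contract in the second bullet can be made concrete as a small declaration the orchestrator checks before dispatch. A sketch under assumed names (`ToolContract`, `validate` are illustrative, not from any framework):

```python
# Sketch of an explicit orchestrator<->tool contract: each tool declares its
# input schema, the error codes it may return, and a hard timeout, and the
# orchestrator validates every call before dispatching it.
from dataclasses import dataclass

@dataclass
class ToolContract:
    name: str
    input_schema: dict[str, type]   # required arg name -> expected type
    error_codes: set[str]           # errors the orchestrator must handle
    timeout_s: float                # wall-clock cap for a single call

    def validate(self, args: dict) -> list[str]:
        """Return a list of violations; empty means the call is well-formed."""
        problems = []
        for key, typ in self.input_schema.items():
            if key not in args:
                problems.append(f"missing arg: {key}")
            elif not isinstance(args[key], typ):
                problems.append(f"wrong type for {key}: {type(args[key]).__name__}")
        for key in args:
            if key not in self.input_schema:
                problems.append(f"unexpected arg: {key}")
        return problems
```

In practice the schema would be a JSON Schema shared with the model's tool definition, so the same contract drives both Claude's tool description and the orchestrator's runtime checks.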

Common mistakes

  • Skipping trajectory evaluation: teams ship agents without automated scoring, leaving quality regressions undetected in production
  • Infinite loops with no circuit-breaker: agents with unbounded planning loops exhaust token budgets or enter retry spirals
  • Treating tool schemas as an afterthought: poorly typed tool descriptions cause Claude to make malformed or misdirected tool calls far more often than prompt wording does
  • Ignoring memory eviction strategy: storing everything in-context causes latency spikes and context-window overflow on longer tasks
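The circuit-breaker the second mistake calls for is a few lines of code: hard caps on both iteration count and token spend, enforced outside the agent's own reasoning. A sketch where the `step_fn` callable (returning tokens consumed and a done flag) is an assumed stand-in for one agent iteration:

```python
# Sketch of a circuit-breaker around an agent loop: hard caps on iterations
# and token spend stop unbounded planning loops and retry spirals.
class BudgetExceeded(RuntimeError):
    pass

def run_with_breaker(step_fn, max_steps=8, max_tokens=20_000):
    """Run step_fn until done, or raise once either budget is exhausted."""
    spent = 0
    for i in range(max_steps):
        tokens_used, done = step_fn(i)   # one agent iteration
        spent += tokens_used
        if spent > max_tokens:
            raise BudgetExceeded(f"token budget blown at step {i}: {spent}")
        if done:
            return {"steps": i + 1, "tokens": spent}
    raise BudgetExceeded(f"step budget exhausted after {max_steps} steps")
```

Raising instead of silently truncating matters: the orchestrator (or a human) must decide whether to retry, escalate, or abandon the task.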

When NOT to take this

This bootcamp is the wrong fit for a team that has not yet shipped any LLM-powered feature to production: without that experience, engineers lack the debugging intuition to make sense of agent failure modes. Such teams should first complete a practitioner-level prompt-engineering or RAG programme.

This training is part of a Data & AI catalog built for leaders serious about execution. Take the free diagnostic to see which trainings your team needs.