AI TRAINING
AI Agent Engineering with Claude and MCP
Build production-grade autonomous agents with planning loops, tool use, memory, and safety gates using Claude.
What it covers
This hands-on bootcamp teaches software engineers to design, build, and evaluate autonomous AI agents using Anthropic's Claude API, the Agent SDK, and the Model Context Protocol (MCP). Participants implement planning loops, multi-step tool use, memory architectures, and safety evaluation harnesses across real-world scenarios. The format combines short theory segments with extended lab sessions where teams ship working agent prototypes. By the end, engineers can confidently architect, instrument, and harden autonomous agents for production deployment.
What you'll be able to do
- Implement a multi-step ReAct agent loop with Claude that plans, calls tools, observes results, and self-corrects
- Register and consume MCP-compatible tool servers within an agent orchestration graph
- Design a hybrid memory system combining in-context state, a vector retrieval layer, and a structured episodic store
- Apply safety gates that interrupt agent execution when confidence drops below a threshold or policy constraints are violated
- Write an automated evaluation harness that scores agent trajectories against ground-truth task completions
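The first outcome above — a plan/act/observe loop with a step cap — can be sketched in a few lines. This is a minimal illustration, not the Agent SDK: `model_step` is a hypothetical stub standing in for a real `client.messages.create(...)` call from the Anthropic SDK, and the `add` tool is invented for the example.

```python
# Hypothetical stand-in for a Claude call: returns either a tool request
# or a final answer. A real loop would call the anthropic SDK here.
def model_step(history):
    last = history[-1]
    if last["role"] == "user":
        return {"type": "tool_use", "tool": "add", "input": {"a": 2, "b": 3}}
    return {"type": "final", "text": f"The sum is {last['content']}"}

TOOLS = {"add": lambda a, b: a + b}

def react_loop(task, max_steps=5):
    """Plan -> act -> observe loop with a hard step cap."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = model_step(history)
        if action["type"] == "final":
            return action["text"]
        # Execute the requested tool and feed the observation back.
        result = TOOLS[action["tool"]](**action["input"])
        history.append({"role": "tool", "content": result})
    raise RuntimeError("step budget exhausted")

print(react_loop("What is 2 + 3?"))  # -> The sum is 5
```

The step cap is what makes self-correction safe: the model can retry after a bad tool result, but never indefinitely.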
Topics covered
- Claude API fundamentals: tool use, function calling, and structured outputs
- Agent SDK architecture: agent loops, state machines, and execution graphs
- Model Context Protocol (MCP): server setup, context injection, and tool registration
- Planning patterns: ReAct, Reflexion, and multi-agent orchestration
- Memory architectures: in-context, external vector stores, and episodic memory
- Safety gates: guardrails, constitutional AI checks, and human-in-the-loop triggers
- Evaluation frameworks: trajectory scoring, tool-call accuracy, and regression harnesses
- Observability and debugging: tracing agent runs, cost control, and latency profiling
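To make the safety-gate topic concrete, here is one minimal shape such a gate can take: execution is interrupted when model-reported confidence falls below a threshold or a policy check matches. All names here (`safety_gate`, `HumanReviewRequired`, `BLOCKED_TERMS`) are illustrative assumptions, not SDK features.

```python
class HumanReviewRequired(Exception):
    """Raised to halt the agent and escalate to a human."""

# Toy policy list for the example; real constraints would be richer.
BLOCKED_TERMS = {"wire transfer", "delete prod"}

def safety_gate(action_text, confidence, threshold=0.7):
    # Gate 1: low confidence triggers human-in-the-loop review.
    if confidence < threshold:
        raise HumanReviewRequired(f"confidence {confidence:.2f} < {threshold}")
    # Gate 2: hard policy constraints are checked regardless of confidence.
    if any(term in action_text.lower() for term in BLOCKED_TERMS):
        raise HumanReviewRequired(f"policy term matched in: {action_text!r}")
    return action_text  # safe to execute

print(safety_gate("send status email", 0.92))  # passes through
```

The key design choice is that the gate raises rather than returns a flag, so a forgotten check cannot silently let an action through.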
Delivery
Delivered over 3–5 days, either on-site or live-remote via video conference. Each day follows a 30/70 theory-to-lab ratio. Participants need a laptop with Python 3.11+, an Anthropic API key, and access to a vector store (Pinecone or Qdrant trial accounts are sufficient). A shared GitHub repo provides starter code, evaluation scaffolding, and reference implementations. Remote cohorts use VS Code Live Share or GitHub Codespaces for pair-lab exercises. A private Slack channel remains open 30 days post-bootcamp for async Q&A.
What makes it work
- Start every agent project with an evaluation harness before writing the first prompt — it forces task decomposition discipline
- Define a clear contract between the orchestrator and each tool (input schema, error codes, timeout) before integration
- Instrument every agent run with full trajectory traces from day one to enable fast debugging and cost optimisation
- Schedule a weekly red-team session where engineers deliberately try to break the agent's safety gates
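The tool-contract practice above can be pinned down as data: declare the input schema, error codes, and timeout next to the handler before any integration work. This is a sketch with invented names (`lookup_order`, `validate_input`); a production system would validate with a real JSON Schema library rather than the minimal check shown.

```python
TOOL_CONTRACT = {
    "name": "lookup_order",
    "input_schema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
    "error_codes": {"NOT_FOUND": "order id unknown", "TIMEOUT": "backend slow"},
    "timeout_s": 5,
}

def validate_input(contract, payload):
    """Minimal required-field check against the contract's schema."""
    missing = [k for k in contract["input_schema"]["required"] if k not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return True

print(validate_input(TOOL_CONTRACT, {"order_id": "A-17"}))  # True
```

Writing the contract first forces the orchestrator and tool authors to agree on failure modes before the model ever sees the tool.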
Common mistakes
- Skipping trajectory evaluation: teams ship agents without automated scoring, leaving quality regressions undetected in production
- Infinite loops with no circuit-breaker: agents with unbounded planning loops exhaust token budgets or enter retry spirals
- Treating tool schemas as an afterthought: poorly typed tool descriptions cause Claude to mis-call tools far more often than prompt wording does
- Ignoring memory eviction strategy: storing everything in-context causes latency spikes and context-window overflow on longer tasks
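The circuit-breaker mistake above has a simple antidote: wrap the agent loop so that both iteration count and token spend are bounded. A hedged sketch, with `step_fn` as an assumed callable returning (done, tokens_used) per step:

```python
class BudgetExceeded(RuntimeError):
    pass

def run_with_breaker(step_fn, max_steps=10, token_budget=10_000):
    """Run step_fn until done, failing fast on step or token overrun."""
    spent = 0
    for i in range(max_steps):
        done, tokens = step_fn(i)
        spent += tokens
        if spent > token_budget:
            raise BudgetExceeded(f"token budget blown at step {i}: {spent}")
        if done:
            return i + 1  # steps taken
    raise BudgetExceeded(f"step cap {max_steps} reached")

# A step that finishes on the third call, costing 1000 tokens each.
print(run_with_breaker(lambda i: (i == 2, 1000)))  # -> 3
```

Failing fast with an exception keeps retry spirals visible in traces instead of burning budget silently.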
When NOT to take this
This bootcamp is the wrong fit for teams that have not yet shipped any LLM-powered feature to production. Without that experience, engineers lack the debugging intuition to make sense of agent failure modes; such teams should first complete a practitioner-level prompt-engineering or RAG programme.
This training is part of a Data & AI catalog built for leaders serious about execution. Take the free diagnostic to see which trainings your team needs.