
AI TRAINING

Edge AI Deployment for Embedded and IoT Teams

Deploy optimised AI models directly on devices, balancing accuracy, latency, power, and thermal constraints.

Format
bootcamp
Duration
24–40h
Level
practitioner
Group size
4–14
Price / participant
€2K–€4K
Group price
€18K–€45K
Audience
Embedded software engineers, firmware developers, and IoT platform architects deploying ML inference on-device
Prerequisites
Solid Python or C/C++ programming skills; working knowledge of basic ML concepts (model training, inference); familiarity with at least one embedded or IoT platform (Raspberry Pi, STM32, ESP32, mobile, or similar)

What it covers

This practitioner-level programme equips embedded and IoT engineers with the full stack of skills needed to ship AI inference on constrained hardware. Participants work hands-on with ONNX, TensorFlow Lite, Core ML, and edge LLM runtimes such as Llama.cpp and llamafile, covering model quantisation, pruning, and hardware-specific optimisation. Sessions address real-world constraints including battery budget, thermal throttling, memory limits, and over-the-air model updates. The format combines short concept modules with lab exercises on physical or emulated edge devices.
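The battery and duty-cycle constraints mentioned above reduce to simple energy arithmetic. A minimal sketch with illustrative numbers (the battery capacity, per-inference energy, and idle draw below are assumptions, not figures from the course):

```python
def max_inferences_per_day(battery_mwh: float, reserve_frac: float,
                           energy_per_inference_mwh: float,
                           idle_draw_mw: float) -> int:
    """Rough daily inference budget for a battery-powered device.

    All inputs are illustrative; real budgets come from measuring the
    target hardware under load.
    """
    # Energy left after the 24h idle baseline and a safety reserve.
    usable = battery_mwh * (1 - reserve_frac) - idle_draw_mw * 24
    return max(0, int(usable / energy_per_inference_mwh))

# Example: 2000 mWh cell, 20% reserve, 0.5 mWh per inference, 10 mW idle draw.
budget = max_inferences_per_day(2000, 0.2, 0.5, 10)
```

This kind of back-of-envelope budget is what the power-aware scheduling sessions refine with real measurements.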

What you'll be able to do

  • Convert a trained PyTorch or TensorFlow model to ONNX, TFLite, and Core ML formats and validate parity across runtimes
  • Apply INT8 post-training quantisation and measure accuracy-latency trade-offs on a target device
  • Run a quantised LLM (Llama.cpp or llamafile) on an edge device and profile tokens-per-second against thermal and battery budgets
  • Design and implement a power-aware inference pipeline that respects duty-cycle constraints on battery-powered hardware
  • Build and execute an OTA model update workflow with rollback safety on a representative IoT device
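The quantisation trade-off behind the second outcome can be illustrated in plain NumPy. This is a generic asymmetric per-tensor INT8 scheme, not the exact recipe used in the labs:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Asymmetric per-tensor INT8 quantisation: map [min, max] onto [-128, 127]."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0
    zero_point = round(-128 - lo / scale)
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    return (q.astype(np.float32) - zero_point) * scale

# Measure the round-trip error on synthetic weights (illustrative data).
rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)
q, s, z = quantize_int8(w)
err = float(np.abs(w - dequantize(q, s, z)).max())  # bounded by ~scale/2
```

Toolchain quantisers (TFLite, ONNX Runtime) add per-channel scales and calibration data on top of this basic idea, which is why on-device accuracy must be re-validated after conversion.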

Topics covered

  • Model conversion and interoperability: ONNX, TensorFlow Lite, Core ML
  • Quantisation (INT8, FP16) and structured pruning for edge targets
  • Edge LLM runtimes: Llama.cpp, llamafile, MLC LLM
  • Hardware accelerators: NPUs, DSPs, and embedded GPUs (Arm Ethos, Apple Neural Engine)
  • Battery budget analysis and power-aware inference scheduling
  • Thermal management and throttling strategies
  • OTA model updates and versioning on constrained devices
  • Benchmarking latency, throughput, and memory footprint on real hardware
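Benchmarking throughput, as in the last topic, comes down to timing a generation loop. A runtime-agnostic sketch where `generate_token` is a stand-in for a real Llama.cpp call on the target device:

```python
import time

def profile_tokens_per_second(generate_token, n_tokens: int) -> float:
    """Time n_tokens sequential calls and return throughput in tokens/s."""
    start = time.perf_counter()
    for _ in range(n_tokens):
        generate_token()
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Stand-in generator simulating ~1 ms per token; on real hardware this would
# wrap the runtime's token-generation call.
tps = profile_tokens_per_second(lambda: time.sleep(0.001), 50)
```

Running the same loop repeatedly over several minutes is how sustained-load measurements expose thermal throttling that a short benchmark hides.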

Delivery

Delivered as a 3–5 day intensive bootcamp, on-site or remote with hardware kits shipped to participants in advance. Approximately 60% hands-on lab time, 40% guided instruction. Participants receive a reference board (e.g. Raspberry Pi 5 or STM32 dev kit) or use their own target platform. Labs use Docker-based toolchains to minimise setup friction. Remote delivery uses shared cloud-hosted hardware via SSH where physical shipping is not feasible.

What makes it work

  • Start with a hardware-in-the-loop benchmark early in the project to set realistic constraints before model selection
  • Adopt a model-card discipline that records accuracy, latency, power draw, and thermal behaviour for every candidate model
  • Involve firmware and ML engineers in joint design reviews so power budgets are agreed before training begins
  • Use automated regression tests that run the inference pipeline on the target device in CI/CD, catching regressions before release
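The CI regression check in the last bullet can be as simple as comparing a measured on-device latency against a stored baseline. A hypothetical sketch (the 10% tolerance and the baseline format are assumptions):

```python
def check_latency_regression(measured_ms: float, baseline_ms: float,
                             tolerance: float = 0.10) -> bool:
    """Return False (fail the build) if latency regressed past the tolerance."""
    return measured_ms <= baseline_ms * (1 + tolerance)

# In a CI job: run inference on the target device, then gate the pipeline.
ok = check_latency_regression(measured_ms=43.0, baseline_ms=40.0)   # 7.5% slower
bad = check_latency_regression(measured_ms=46.0, baseline_ms=40.0)  # 15% slower
```

The same pattern applies to accuracy and peak-memory thresholds, so a model change that fits on the desktop but not the device fails before release.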

Common mistakes

  • Attempting to deploy full-precision FP32 models without quantisation, then discovering the device lacks the memory and compute budget at integration time
  • Ignoring thermal throttling during sustained inference, leading to unpredictable latency spikes in production
  • Treating model accuracy on desktop benchmarks as a proxy for on-device accuracy without re-validating after quantisation
  • Skipping OTA update planning until late in the product lifecycle, resulting in fragile manual reflashing processes
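The OTA pitfall above motivates a rollback-safe update flow: keep the known-good model, run a health check on the candidate, and revert on failure. A file-based sketch where the paths and the health check are illustrative stand-ins:

```python
import os
import shutil
import tempfile

def ota_update(active_path: str, candidate_path: str, health_check) -> bool:
    """Swap in a candidate model only if it passes a health check; else roll back."""
    backup = active_path + ".bak"
    shutil.copy(active_path, backup)            # keep the known-good model
    shutil.copy(candidate_path, active_path)    # stage the candidate
    if health_check(active_path):
        os.remove(backup)
        return True
    shutil.copy(backup, active_path)            # roll back to the old model
    os.remove(backup)
    return False

# Demo with throwaway files standing in for model binaries.
d = tempfile.mkdtemp()
active = os.path.join(d, "model.bin")
cand = os.path.join(d, "model_v2.bin")
open(active, "w").write("v1")
open(cand, "w").write("v2-corrupt")
# Health check rejects the candidate, so the device keeps serving v1.
ok = ota_update(active, cand, lambda p: open(p).read().startswith("v2-good"))
```

On real devices the same A/B pattern is usually implemented with dual partitions and a watchdog rather than file copies, but the invariant is identical: never delete the working model until the new one has proven itself.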

When NOT to take this

If the team is still experimenting with model architecture and has not yet reached stable accuracy on desktop benchmarks, edge deployment optimisation is premature — the model will need to be retrained, invalidating all quantisation and conversion work done during this bootcamp.

This training is part of a Data & AI catalog built for leaders serious about execution. Take the free diagnostic to see which trainings your team needs.