
AI TRAINING

Edge AI Deployment for Embedded and IoT Teams

Deploy optimised AI models directly on devices, balancing accuracy, latency, power, and thermal constraints.

Format
bootcamp
Duration
24–40h
Level
practitioner
Group size
4–14
Price / participant
€2K–€4K
Group price
€18K–€45K
Audience
Embedded software engineers, firmware developers, and IoT platform architects deploying ML inference on-device
Prerequisites
Solid Python or C/C++ programming skills; working knowledge of basic ML concepts (model training, inference); familiarity with at least one embedded or IoT platform (Raspberry Pi, STM32, ESP32, mobile, or similar)

What it covers

This practitioner-level programme equips embedded and IoT engineers with the full stack of skills needed to ship AI inference on constrained hardware. Participants work hands-on with ONNX, TensorFlow Lite, Core ML, and edge LLM runtimes such as Llama.cpp and llamafile, covering model quantisation, pruning, and hardware-specific optimisation. Sessions address real-world constraints including battery budget, thermal throttling, memory limits, and over-the-air model updates. The format combines short concept modules with lab exercises on physical or emulated edge devices.
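The battery and duty-cycle constraints mentioned above reduce to simple energy arithmetic. A minimal sketch with illustrative numbers (the battery capacity, per-inference energy, and idle draw below are assumptions, not figures from the course):

```python
def max_inferences_per_day(battery_mwh: float, reserve_frac: float,
                           energy_per_inference_mwh: float,
                           idle_draw_mw: float) -> int:
    """Rough daily inference budget for a battery-powered device.

    All inputs are illustrative; real budgets come from measuring the
    target hardware under load.
    """
    # Energy left after the 24h idle baseline and a safety reserve.
    usable = battery_mwh * (1 - reserve_frac) - idle_draw_mw * 24
    return max(0, int(usable / energy_per_inference_mwh))

# Example: 2000 mWh cell, 20% reserve, 0.5 mWh per inference, 10 mW idle draw.
budget = max_inferences_per_day(2000, 0.2, 0.5, 10)
```

This kind of back-of-envelope budget is what the power-aware scheduling sessions refine with real measurements.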

What you'll be able to do

  • Convert a trained PyTorch or TensorFlow model to ONNX, TFLite, and Core ML formats and validate parity across runtimes
  • Apply INT8 post-training quantisation and measure accuracy-latency trade-offs on a target device
  • Run a quantised LLM (Llama.cpp or llamafile) on an edge device and profile tokens-per-second against thermal and battery budgets
  • Design and implement a power-aware inference pipeline that respects duty-cycle constraints on battery-powered hardware
  • Build and execute an OTA model update workflow with rollback safety on a representative IoT device
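The quantisation trade-off behind the second outcome can be illustrated in plain NumPy. This is a generic asymmetric per-tensor INT8 scheme, not the exact recipe used in the labs:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Asymmetric per-tensor INT8 quantisation: map [min, max] onto [-128, 127]."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0
    zero_point = round(-128 - lo / scale)
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    return (q.astype(np.float32) - zero_point) * scale

# Measure the round-trip error on synthetic weights (illustrative data).
rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)
q, s, z = quantize_int8(w)
err = float(np.abs(w - dequantize(q, s, z)).max())  # bounded by ~scale/2
```

Toolchain quantisers (TFLite, ONNX Runtime) add per-channel scales and calibration data on top of this basic idea, which is why on-device accuracy must be re-validated after conversion.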

Topics covered

  • Model conversion and interoperability: ONNX, TensorFlow Lite, Core ML
  • Quantisation (INT8, FP16) and structured pruning for edge targets
  • Edge LLM runtimes: Llama.cpp, llamafile, MLC LLM
  • Hardware accelerators: NPUs, DSPs, and embedded GPUs (Arm Ethos, Apple Neural Engine)
  • Battery budget analysis and power-aware inference scheduling
  • Thermal management and throttling strategies
  • OTA model updates and versioning on constrained devices
  • Benchmarking latency, throughput, and memory footprint on real hardware
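Benchmarking throughput, as in the last topic, comes down to timing a generation loop. A runtime-agnostic sketch where `generate_token` is a stand-in for a real Llama.cpp call on the target device:

```python
import time

def profile_tokens_per_second(generate_token, n_tokens: int) -> float:
    """Time n_tokens sequential calls and return throughput in tokens/s."""
    start = time.perf_counter()
    for _ in range(n_tokens):
        generate_token()
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Stand-in generator simulating ~1 ms per token; on real hardware this would
# wrap the runtime's token-generation call.
tps = profile_tokens_per_second(lambda: time.sleep(0.001), 50)
```

Running the same loop repeatedly over several minutes is how sustained-load measurements expose thermal throttling that a short benchmark hides.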

Delivery

Delivered as a 3–5 day intensive bootcamp, on-site or remote with hardware kits shipped to participants in advance. Approximately 60% hands-on lab time, 40% guided instruction. Participants receive a reference board (e.g. Raspberry Pi 5 or STM32 dev kit) or use their own target platform. Labs use Docker-based toolchains to minimise setup friction. Remote delivery uses shared cloud-hosted hardware via SSH where physical shipping is not feasible.

What makes it work

  • Start with a hardware-in-the-loop benchmark early in the project to set realistic constraints before model selection
  • Adopt a model-card discipline that records accuracy, latency, power draw, and thermal behaviour for every candidate model
  • Involve firmware and ML engineers in joint design reviews so power budgets are agreed before training begins
  • Use automated regression tests that run the inference pipeline on the target device in CI/CD, catching regressions before release
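The CI regression check in the last bullet can be as simple as comparing a measured on-device latency against a stored baseline. A hypothetical sketch (the 10% tolerance and the baseline format are assumptions):

```python
def check_latency_regression(measured_ms: float, baseline_ms: float,
                             tolerance: float = 0.10) -> bool:
    """Return False (fail the build) if latency regressed past the tolerance."""
    return measured_ms <= baseline_ms * (1 + tolerance)

# In a CI job: run inference on the target device, then gate the pipeline.
ok = check_latency_regression(measured_ms=43.0, baseline_ms=40.0)   # 7.5% slower
bad = check_latency_regression(measured_ms=46.0, baseline_ms=40.0)  # 15% slower
```

The same pattern applies to accuracy and peak-memory thresholds, so a model change that fits on the desktop but not the device fails before release.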

Common mistakes

  • Attempting to deploy full-precision FP32 models without quantisation, then discovering the device lacks the memory and compute budget at integration time
  • Ignoring thermal throttling during sustained inference, leading to unpredictable latency spikes in production
  • Treating model accuracy on desktop benchmarks as a proxy for on-device accuracy without re-validating after quantisation
  • Skipping OTA update planning until late in the product lifecycle, resulting in fragile manual reflashing processes
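The OTA pitfall above motivates a rollback-safe update flow: keep the known-good model, run a health check on the candidate, and revert on failure. A file-based sketch where the paths and the health check are illustrative stand-ins:

```python
import os
import shutil
import tempfile

def ota_update(active_path: str, candidate_path: str, health_check) -> bool:
    """Swap in a candidate model only if it passes a health check; else roll back."""
    backup = active_path + ".bak"
    shutil.copy(active_path, backup)            # keep the known-good model
    shutil.copy(candidate_path, active_path)    # stage the candidate
    if health_check(active_path):
        os.remove(backup)
        return True
    shutil.copy(backup, active_path)            # roll back to the old model
    os.remove(backup)
    return False

# Demo with throwaway files standing in for model binaries.
d = tempfile.mkdtemp()
active = os.path.join(d, "model.bin")
cand = os.path.join(d, "model_v2.bin")
open(active, "w").write("v1")
open(cand, "w").write("v2-corrupt")
# Health check rejects the candidate, so the device keeps serving v1.
ok = ota_update(active, cand, lambda p: open(p).read().startswith("v2-good"))
```

On real devices the same A/B pattern is usually implemented with dual partitions and a watchdog rather than file copies, but the invariant is identical: never delete the working model until the new one has proven itself.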

When NOT to take this

If the team is still experimenting with model architecture and has not yet reached stable accuracy on desktop benchmarks, edge deployment optimisation is premature — the model will need to be retrained, invalidating all quantisation and conversion work done during this bootcamp.

This training is part of a Data & AI catalog built for leaders serious about execution. Take the free diagnostic to see which trainings your team needs.