AI TRAINING
Edge AI Deployment for Embedded and IoT Teams
Deploy optimised AI models directly on devices, balancing accuracy, latency, power, and thermal constraints.
What it covers
This practitioner-level programme equips embedded and IoT engineers with the full stack of skills needed to ship AI inference on constrained hardware. Participants work hands-on with ONNX, TensorFlow Lite, Core ML, and edge LLM runtimes such as Llama.cpp and llamafile, covering model quantisation, pruning, and hardware-specific optimisation. Sessions address real-world constraints including battery budget, thermal throttling, memory limits, and over-the-air model updates. The format combines short concept modules with lab exercises on physical or emulated edge devices.
What you'll be able to do
- Convert a trained PyTorch or TensorFlow model to ONNX, TFLite, and Core ML formats and validate parity across runtimes
- Apply INT8 post-training quantisation and measure accuracy-latency trade-offs on a target device
- Run a quantised LLM (Llama.cpp or llamafile) on an edge device and profile tokens-per-second against thermal and battery budgets
- Design and implement a power-aware inference pipeline that respects duty-cycle constraints on battery-powered hardware
- Build and execute an OTA model update workflow with rollback safety on a representative IoT device
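The first outcome above, validating parity after conversion, comes down to comparing per-runtime outputs on a shared test input. A minimal sketch in plain Python, assuming you have already collected output vectors from each runtime; the runtime names, values, and tolerance are illustrative, not prescribed by any toolchain:

```python
# Parity check across runtimes: compare output vectors element-wise.
# The vectors below are illustrative stand-ins for real inference
# results collected from e.g. ONNX Runtime and TFLite on the same input.

def max_abs_diff(a, b):
    """Largest element-wise deviation between two output vectors."""
    assert len(a) == len(b), "runtime outputs must have the same shape"
    return max(abs(x - y) for x, y in zip(a, b))

def check_parity(reference, candidate, atol=1e-3):
    """True if the candidate runtime matches the reference within atol."""
    return max_abs_diff(reference, candidate) <= atol

onnx_out = [0.1021, 0.8934, 0.0045]    # hypothetical ONNX Runtime logits
tflite_out = [0.1019, 0.8937, 0.0044]  # hypothetical TFLite logits

print(check_parity(onnx_out, tflite_out))       # small FP32 drift passes
print(check_parity(onnx_out, [0.5, 0.4, 0.1]))  # diverged outputs fail
```

In practice the tolerance is looser after quantisation than after a pure format conversion, so record which tolerance each parity check used alongside the result.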
Topics covered
- Model conversion and interoperability: ONNX, TensorFlow Lite, Core ML
- Quantisation (INT8, FP16) and structured pruning for edge targets
- Edge LLM runtimes: Llama.cpp, llamafile, MLC LLM
- Hardware accelerators: NPUs, DSPs, and embedded GPUs (Arm Ethos, Apple Neural Engine)
- Battery budget analysis and power-aware inference scheduling
- Thermal management and throttling strategies
- OTA model updates and versioning on constrained devices
- Benchmarking latency, throughput, and memory footprint on real hardware
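INT8 post-training quantisation (second topic above) maps FP32 values onto 256 integer levels via a scale and zero-point. A minimal affine-quantisation sketch in plain Python, with a toy tensor; real toolchains such as TFLite and ONNX Runtime derive these parameters per-tensor or per-channel from calibration data:

```python
# Affine (asymmetric) INT8 quantisation: q = round(x / scale) + zero_point.
# Scale and zero-point are derived from the observed FP32 range, as a
# post-training quantiser would do from calibration statistics.

def quant_params(xmin, xmax, qmin=-128, qmax=127):
    """Derive scale and zero-point covering [xmin, xmax]."""
    xmin, xmax = min(xmin, 0.0), max(xmax, 0.0)  # range must include 0
    scale = (xmax - xmin) / (qmax - qmin)
    zero_point = round(qmin - xmin / scale)
    return scale, zero_point

def quantize(xs, scale, zp, qmin=-128, qmax=127):
    return [max(qmin, min(qmax, round(x / scale) + zp)) for x in xs]

def dequantize(qs, scale, zp):
    return [(q - zp) * scale for q in qs]

weights = [-1.5, -0.2, 0.0, 0.7, 2.3]  # toy FP32 tensor
scale, zp = quant_params(min(weights), max(weights))
q = quantize(weights, scale, zp)
restored = dequantize(q, scale, zp)
err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, round(err, 4))  # round-trip error stays within roughly scale / 2
```

The accuracy-latency trade-off in the course outcomes falls out of this arithmetic: a wider FP32 range means a larger scale, hence coarser levels and more quantisation error, which is why outlier-heavy tensors often need per-channel scales.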
Delivery
Delivered as a 3-5 day intensive bootcamp, on-site or remote with hardware kits shipped to participants in advance. Approximately 60% hands-on lab time, 40% guided instruction. Participants receive a reference board (e.g. Raspberry Pi 5 or STM32 dev kit) or use their own target platform. Labs use Docker-based toolchains to minimise setup friction. Remote delivery uses shared cloud-hosted hardware via SSH where physical shipping is not feasible.
What makes it work
- Start with a hardware-in-the-loop benchmark early in the project to set realistic constraints before model selection
- Adopt a model-card discipline that records accuracy, latency, power draw, and thermal behaviour for every candidate model
- Involve firmware and ML engineers in joint design reviews so power budgets are agreed before training begins
- Use automated regression tests that run the inference pipeline on the target device in CI/CD, catching regressions before release
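The CI regression idea in the last bullet can be as simple as gating a release on on-device latency percentiles. A minimal sketch, where `run_inference` is a hypothetical stand-in for your pipeline's entry point (stubbed here with a sleep) and the budget values are illustrative:

```python
# On-device latency gate for CI: fail the build if p95 latency exceeds
# the agreed budget. run_inference is a stub for the real pipeline call.
import time

def run_inference():
    time.sleep(0.002)  # stub: ~2 ms of fake work on the target device

def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples."""
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, round(pct / 100 * (len(ordered) - 1)))
    return ordered[idx]

def latency_gate(fn, runs=50, p95_budget_ms=10.0):
    """Run fn repeatedly; return (pass/fail, measured p95 in ms)."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    p95 = percentile(samples, 95)
    return p95 <= p95_budget_ms, p95

ok, p95 = latency_gate(run_inference)
print("PASS" if ok else "FAIL", f"p95={p95:.2f} ms")
```

Run the gate on the actual target hardware, not the build machine: the point of the bullet above is that desktop timings say nothing about throttled, memory-constrained devices.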
Common mistakes
- Attempting to deploy full-precision FP32 models without quantisation, then discovering the device lacks the memory and compute budget at integration time
- Ignoring thermal throttling during sustained inference, leading to unpredictable latency spikes in production
- Treating model accuracy on desktop benchmarks as a proxy for on-device accuracy without re-validating after quantisation
- Skipping OTA update planning until late in the product lifecycle, resulting in fragile manual reflashing processes
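The last mistake above is usually avoided with an A/B slot scheme: the new model is written to the inactive slot, and the device only commits to it after a post-boot health check passes. A minimal state-machine sketch in Python; the slot layout, checksum step, and health check are illustrative assumptions, not a specific vendor's OTA protocol:

```python
# A/B-slot OTA model update with rollback: stage into the inactive slot,
# then commit only after the health check passes on the new model.
import hashlib

class ModelSlots:
    def __init__(self):
        self.slots = {"A": b"model-v1", "B": None}
        self.active, self.pending = "A", None

    def stage(self, blob, expected_sha256):
        # Verify integrity before touching device state.
        if hashlib.sha256(blob).hexdigest() != expected_sha256:
            raise ValueError("checksum mismatch, refusing to stage")
        inactive = "B" if self.active == "A" else "A"
        self.slots[inactive] = blob
        self.pending = inactive

    def boot(self, health_check):
        # Try the pending slot; fall back to the known-good one on failure.
        if self.pending and health_check(self.slots[self.pending]):
            self.active, self.pending = self.pending, None  # commit
        else:
            self.pending = None  # rollback: keep current active slot
        return self.active

device = ModelSlots()
new_model = b"model-v2"
device.stage(new_model, hashlib.sha256(new_model).hexdigest())
print(device.boot(lambda blob: blob.startswith(b"model")))
```

Because the known-good model is never overwritten until the replacement has proven itself, a failed update degrades to the previous model instead of a manual reflash.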
When NOT to take this
If the team is still experimenting with model architecture and has not yet reached stable accuracy on desktop benchmarks, edge deployment optimisation is premature — the model will need to be retrained, invalidating all quantisation and conversion work done during this bootcamp.
Providers to consider
- Edge Impulse (training & certification): www.edgeimpulse.com/experts
- Coursera – AI on the Edge Specialization (Duke University): www.coursera.org/specializations/ai-on-the-edge
- Arm Education – ML on Arm learning paths: learn.arm.com/learning-paths/cross-platform/ml-on-arm
- DeepLearning.AI – AI on the Edge courses: www.deeplearning.ai/short-courses/