Getting started with the inference engine for microcontrollers

This document describes how to use Plumerai's inference engine for Arm Cortex-M microcontrollers. For documentation about the APIs and an example, please see here for C++ and here for C.

Overview

Plumerai’s inference engine provides fast execution of deep learning models within the tight memory budgets of microcontrollers. It reads in your 8-bit TensorFlow Lite file (.tflite) and generates a library that includes the model, the runtime, and optimized kernels. Benchmarks on Arm Cortex-M microcontrollers show the inference software to be the fastest and most memory-efficient in the world, without any additional quantization, binarization, or pruning: model accuracy stays exactly the same.

You can find more information about the inference engine on our blog and try it out in our online benchmarker to see how fast and memory-efficient it is for your model.

Building

The Plumerai Inference Engine for Arm Cortex-M microcontrollers consists of three header files and a pre-compiled static library:

plumerai_inference.h  # for the C++ API only
plumerai_inference_c.h  # for the C API only
plumerai_tensorflow_compatibility.h
libPlumerai.a

To build, make sure the header files can be found on the compiler include path, and link with libPlumerai.a.
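
For example, with the GNU Arm Embedded toolchain, a compile-and-link invocation might look as follows. This is a minimal sketch: the CPU flags, paths, and source file name are placeholders that depend on your project.

# Paths, CPU flags, and file names below are placeholders; adjust for your project.
arm-none-eabi-g++ -mcpu=cortex-m4 -mthumb \
  -I path/to/plumerai/include \
  main.cpp \
  -L path/to/plumerai -lPlumerai \
  -o application.elf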

Usage

The Plumerai Inference Engine is built on top of TensorFlow Lite for Microcontrollers (TFLM), and usage is very similar.

Log messages

Log messages work the same way as in TFLM: you have to provide a C function called DebugLog that outputs strings, for example over UART.

If you define DebugLog in a C++ instead of a C file, make sure to mark it as extern "C":

#include <cstdio>

extern "C" void DebugLog(const char* s) {
  // This defines how logging is done; adjust it for the target platform.
  printf("%s", s);
}
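
For reference, the same definition in a plain C file needs no extern "C" marker; this mirrors the example above:

#include <stdio.h>

void DebugLog(const char* s) {
  /* This defines how logging is done; adjust it for the target platform. */
  printf("%s", s);
}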

Tensor arena

The tensor arena is a chunk of memory that stores the tensor data during model inference. You have to provide this arena and make sure it is large enough. All tensors, including the model input and output, point to locations within the tensor arena, overlapping each other when possible. Ideally, the tensor arena should be 16-byte aligned.
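
A common way to declare such an arena, following the TFLM convention, is a statically allocated, 16-byte-aligned buffer. The name and size below are placeholders; the actual size requirement depends on your model:

#include <cstdint>

// Placeholder size: choose a value large enough for your model.
constexpr int kTensorArenaSize = 100 * 1024;

// Statically allocated, 16-byte-aligned tensor arena.
alignas(16) static uint8_t tensor_arena[kTensorArenaSize];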

Example

Example applications can be found here for C++ and here for C.