Plumerai Inference Engine C++ API for microcontrollers¶

This document describes the C++ API for the Plumerai Inference Engine for microcontrollers.

The main API¶

The C++ API consists of a single header file which is self-documented. The class constructor does not take any arguments, instead the class should be initialized with the Initialize function. Afterwards, AllocateTensors needs to be called. Once that is done, inference can be done by calling Invoke. Data can be set and read with the input and output methods. See below for details.

The API is re-entrant, i.e. you can instantiate several InferenceEngine objects in different threads and use them independently. However, using the same instance from different threads at the same time is not supported.

InferenceEngine constructor¶

InferenceEngine::InferenceEngine()

An empty constructor. The class will only be properly initialized after the Initialize method is called (see below).

Initialize the class - simple¶

template <bool report_mode = false>
TfLiteStatus InferenceEngine::Initialize(std::uint8_t* tensor_arena_ptr,
                                         int tensor_arena_size,
                                         int model_id = 0,
                                         tflite::MicroProfiler* profiler=nullptr)

Initializes the inference engine object. This method has to be called before any other method can be called.

Arguments:

report_mode bool: This template argument can be used to toggle between report-mode or regular-mode (the default). Enabling this allows print_report to be called (see below). For best speed and code size this flag should be disabled.
tensor_arena_ptr std::uint8_t*: The tensor arena has to be provided by the user and should be large enough to hold the model's activation tensors. For best performance the tensor arena is 16-byte aligned. The class does not take ownership of the tensor arena. The contents of the tensor arena should not be overwritten during the lifetime of the object, except by setting input tensor data through the corresponding functions. In case of multiple models (see model_id below), each model should have its own tensor arena. See below for the advanced setup, where part of the tensor arena can be shared with other models or other applications.
tensor_arena_size: int: The size of the tensor arena passed by tensor_arena_ptr above.
model_id: int: In case the inference engine library was built to include multiple models, this argument can be used to select a model for this instance of the InferenceEngine class. The default is 0, which will select the only model in single-model mode or the first model in multi-model mode.
profiler: tflite::MicroProfiler*: An optional custom profiler of standard TensorFlow Lite profiling type when report_mode is disabled. If a nullptr is passed (the default), the InferenceEngine option will not report any profiling, unless report_mode is enabled. In that case, an internally constructed profiler will be used.

Returns:

TfLiteStatus: Can be kTfLiteError when there are errors or kTfLiteOk otherwise.

Initialize the class - advanced¶

template <bool report_mode = false>
TfLiteStatus Initialize(std::uint8_t* persistent_tensor_arena_ptr,
                        int persistent_tensor_arena_size,
                        std::uint8_t* non_persistent_tensor_arena_ptr,
                        int non_persistent_tensor_arena_size,
                        int model_id = 0,
                        ::tflite::MicroProfiler* profiler = nullptr);

Initializes the inference engine object. This method has to be called before any other method can be called.

This method is similar to the simple Initialize method above, except that the arena is split in a persistent and non-persistent part. The non-persistent part can be re-used by the user or by another model. See the building documentation for more information.

Arguments:

report_mode bool: This template argument can be used to toggle between report-mode or regular-mode (the default). Enabling this allows print_report to be called (see below). For best speed and code size this flag should be disabled.
persistent_tensor_arena_ptr std::uint8_t*: The persistent tensor arena has to be provided by the user and should be large enough to hold the model's persistent data. The class does not take ownership of the tensor arena. The contents of the tensor arena should not be overwritten during the lifetime of the object. In case of multiple models (see model_id below), each model should have its own persistent tensor arena.
persistent_tensor_arena_size: int: The size of the tensor arena passed by persistent_tensor_arena_ptr above.
non_persistent_tensor_arena_ptr std::uint8_t*: The non-persistent tensor arena has to be provided by the user and should be large enough to hold the model's activation tensors. For best performance the tensor arena is 16-byte aligned. The class does not take ownership of the tensor arena. The input and output tensor will be allocated somewhere in the non-persistent tensor arena. The contents of the non-persistent tensor arena can be overwritten by the user to use for other purposes after the output has been read out. In case of multiple models (see model_id below), different models can share the same non-persistent arena as long as they do not execute simultaneously in different threads.
non_persistent_tensor_arena_size: int: The size of the tensor arena passed by non_persistent_tensor_arena_ptr above.
model_id: int: In case the inference engine library was built to include multiple models, this argument can be used to select a model for this instance of the InferenceEngine class. The default is 0, which will select the only model in single-model mode or the first model in multi-model mode.
profiler: tflite::MicroProfiler*: An optional custom profiler of standard TensorFlow Lite profiling type when report_mode is disabled. If a nullptr is passed (the default), the InferenceEngine option will not report any profiling, unless report_mode is enabled. In that case, an internally constructed profiler will be used.

Returns:

TfLiteStatus: Can be kTfLiteError when there are errors or kTfLiteOk otherwise.

Allocate the tensors¶

TfLiteStatus InferenceEngine::AllocateTensors()

Allocates input, output and intermediate tensors in the tensor arena. This needs to be called before running inference with Invoke. When custom ops have been registered using AddCustomOp (see below), this will call the Init and Prepare functions of those ops.

Returns:

TfLiteStatus: Can be kTfLiteError when not enough space is available or kTfLiteOk otherwise.

Invoke to run inference¶

TfLiteStatus InferenceEngine::Invoke()

Run inference assuming input data is already set using the input function below. Requires AllocateTensors to be called first. Note that calling this function will likely override any data set at the tensor obtained from input: re-running Invoke a second time requires the input data to be set again.

Returns:

TfLiteStatus: Can be kTfLiteError when there are errors or kTfLiteOk otherwise.

Access the input and output tensors¶

TfLiteTensor* InferenceEngine::input(int input_id)
TfLiteTensor* InferenceEngine::output(int output_id)

Get access to the input and output tensors. The returned TfLiteTensor object is the same as the one in Tensorflow (tensorflow/lite/c/common.h). Relevant functionality includes getting a pointer to the data, the datatype and the shape of the tensor:

TfLiteTensor* input_tensor = engine.input(0);
TfLiteType input_data_type = input_tensor->type;
std::int8_t* input_data = tflite::GetTensorData<std::int8_t>(input_tensor);
TfLiteIntArray* input_shape = input_tensor->dims;

Note that parts of the inference happen in-place and because the pointer to the output data might be the same as the pointer to the input data, input data can be overridden. Therefore, the data pointed-to by the TfLiteTensor output of the input method is only valid before Invoke is called, while the one for the output method is only valid after Invoke is finished.

Arguments:

input_id int: The index of the accessed tensor, starting counting at zero. For example, this can be set to 2 for the third input/output tensor. Use inputs_size or outputs_size (see below) to query the number of input or output tensors.

Returns:

TfLiteTensor: See the Tensorflow source code for documentation, or above for an example.

Query the number of input and output tensors¶

size_t InferenceEngine::inputs_size() const
size_t InferenceEngine::outputs_size() const

Retrieve the number of input or output tensors in the model.

Returns:

size_t: The number of input or output tensors.

The optional/advanced API¶

The API methods described below are only not needed for basic usage of the inference engine.

Print the report¶

void print_report() const

When report_mode is set to true in the Initialize function, this method can print the report. It needs to be called after Invoke is called. The report contains a table with each op and details such as the number of parameters, the latency per op, and RAM usage information.

Reset the tensor state¶

TfLiteStatus Reset()

Reset the state to be what you would expect when the interpreter is first created after AllocateTensors is called. This is useful for recurrent neural networks (e.g. LSTMs) which can preserve internal state between Invoke calls. In case of a stateless LSTM, this method needs to be called after each call to Invoke.

Returns:

TfLiteStatus: Can be kTfLiteError when there are errors or kTfLiteOk otherwise.

Add a custom op¶

TfLiteStatus InferenceEngine::AddCustomOp(const char* name,
                                          TFLMRegistration* registration)

Optional in case there are ops that are not supported by the inference engine. If this is used, it has to be called before AllocateTensors. The call will be forwarded to the Tensorflow Op Resolver class function MicroOpResolver::AddCustom and accepts the same arguments.

Arguments:

name const char*: The name of the new custom op.
registration TFLMRegistration*: See MicroOpResolver::AddCustom

Returns:

TfLiteStatus: Can be kTfLiteError when there are errors or kTfLiteOk otherwise.

Retrieve the tensor arena usage¶

size_t InferenceEngine::arena_used_bytes() const

This method gives the optimal arena size, i.e. the size that was actually needed. This can be used to reduce the tensor arena size passed to Initialize. It is only available after AllocateTensors has been called.

Returns:

size_t: The used tensor arena in bytes.

Example usage¶

The Plumerai Inference Engine consists of just a single class, plumerai::InferenceEngine. It can be used as follows:

#include "plumerai/inference_engine.h"
#include "plumerai/model_defines.h"

// The constant 'TENSOR_ARENA_SIZE' is defined in 'plumerai/model_defines.h'
uint8_t tensor_arena[TENSOR_ARENA_SIZE];

// TODO: Implement this to define how debug printing is done
extern "C" void DebugLog(const char *format, va_list args) {
  // vprintf(format, args);
}

int main() {
  constexpr bool report_mode = false;
  plumerai::InferenceEngine inference;
  inference.Initialize<report_mode>(tensor_arena, TENSOR_ARENA_SIZE);

  // Allocate memory from the tensor_arena for the model's tensors.
  auto allocate_status = inference.AllocateTensors();
  if (allocate_status != kTfLiteOk) {
    MicroPrintf("AllocateTensors() failed");
    return 1;
  }

  // Obtain pointers to the model's input and output tensors.
  // TODO: Assumes the model has one input and one output, modify this if there
  // are more.
  TfLiteTensor* input = inference.input(0);
  TfLiteTensor* output = inference.output(0);

  // Example: print the input shape
  MicroPrintf("Input shape:");
  for (int i = 0; i < input->dims->size; ++i) {
    MicroPrintf(" %d", input->dims->data[i]);
  }

  // Example: run inference in an infinite loop.
  while (true) {
    // Set input data example. TODO: Get data from sensor.
    int8_t* input_data = tflite::GetTensorData<int8_t>(input);
    input_data[0] = 17;  // example, setting first element to '17'

    // Run inference on a single input.
    auto invoke_status = inference.Invoke();
    if (invoke_status != kTfLiteOk) {
      MicroPrintf("Invoke failed");
      return 1;
    }

    // Read results and print first output to screen.
    int8_t* output_data = tflite::GetTensorData<int8_t>(output);
    MicroPrintf("Result: %d", int(output_data[0]));
  }

  return 0;
}

The above example can be compiled and linked as explained in the 'Building' section of the documentation, or using the following example Makefile assuming the above code is named main.cc and the inference engine can be found in /path/to/plumerai_inference_engine:

# Project name
TARGET=example

# Folders
LIB_DIR=/path/to/plumerai_inference_engine
LIB_INCL_DIR=$(LIB_DIR)/include
LIBRARY=$(LIB_DIR)/libplumerai.a
BUILD_DIR=build

# Compiler settings
CC=gcc
CCFLAGS=-I$(LIB_INCL_DIR) -O3 -Wl,--gc-sections -fno-rtti -fno-exceptions
LINKER_FLAGS=-lm

# Define the list of source and object files
SOURCES=main.cc
OBJECTS=$(patsubst %.cc, $(BUILD_DIR)/%.o, $(SOURCES))

# Define the main makefile target
all: $(TARGET)

# Target to compile a C++ file
$(BUILD_DIR)/%.o: %.cc
    mkdir -p $(dir $@)
    $(CC) -c -o $@ $< $(CCFLAGS)

# Target to compile and link the final binary
$(TARGET): $(OBJECTS)
    $(CC) -o $(BUILD_DIR)/$@ $^ $(CCFLAGS) $(LIBRARY) $(LINKER_FLAGS)

# Target to clean-up the build directory
.PHONY: clean
clean:
    rm -f $(BUILD_DIR)/*.o $(BUILD_DIR)/$(TARGET)