Plumerai Inference Engine C++ API for microcontrollers¶
This document describes the C++ API for the Plumerai Inference Engine for microcontrollers.
The main API¶
The C++ API consists of a single header file which is self-documented. The class constructor does not take any arguments; instead, the class should be initialized with the Initialize function. Afterwards, AllocateTensors needs to be called. Once that is done, inference can be run by calling Invoke. Data can be set and read with the input and output methods. See below for details.
The API is re-entrant, i.e. you can instantiate several InferenceEngine objects in different threads and use them independently. However, using the same instance from different threads at the same time is not supported.
InferenceEngine constructor¶
An empty constructor. The class will only be properly initialized after the Initialize method is called (see below).
Initialize the class - simple¶
template <bool report_mode = false>
TfLiteStatus InferenceEngine::Initialize(std::uint8_t* tensor_arena_ptr,
                                         int tensor_arena_size,
                                         int model_id = 0,
                                         tflite::MicroProfiler* profiler = nullptr)
Initializes the inference engine object. This method has to be called before any other method can be called.
Arguments:
- report_mode (bool): This template argument toggles between report mode and regular mode (the default). Enabling it allows print_report to be called (see below). For best speed and code size this flag should be disabled.
- tensor_arena_ptr (std::uint8_t*): The tensor arena has to be provided by the user and should be large enough to hold the model's activation tensors. For best performance the tensor arena should be 16-byte aligned. The class does not take ownership of the tensor arena. The contents of the tensor arena should not be overwritten during the lifetime of the object, except by setting input tensor data through the corresponding functions. In case of multiple models (see model_id below), each model should have its own tensor arena. See below for the advanced setup, where part of the tensor arena can be shared with other models or other applications.
- tensor_arena_size (int): The size of the tensor arena passed by tensor_arena_ptr above.
- model_id (int): In case the inference engine library was built to include multiple models, this argument can be used to select a model for this instance of the InferenceEngine class. The default is 0, which will select the only model in single-model mode or the first model in multi-model mode.
- profiler (tflite::MicroProfiler*): An optional custom profiler of the standard TensorFlow Lite profiler type, used when report_mode is disabled. If a nullptr is passed (the default), the InferenceEngine object will not report any profiling, unless report_mode is enabled. In that case, an internally constructed profiler will be used.
Returns:
TfLiteStatus: Can be kTfLiteError when there are errors or kTfLiteOk otherwise.
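The alignment and lifetime requirements on the tensor arena described above can be met with a statically allocated, aligned buffer. The following is only a minimal sketch: the size kTensorArenaSize and the helper IsAligned are hypothetical names introduced for illustration and are not part of the library.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical size, for illustration only; the real value depends on the model.
constexpr std::size_t kTensorArenaSize = 64 * 1024;

// Static storage lives for the whole program, so it outlives any engine object.
// alignas(16) gives the 16-byte alignment recommended for best performance.
alignas(16) static std::uint8_t tensor_arena[kTensorArenaSize];

// Hypothetical helper: returns true if 'ptr' is aligned to 'alignment' bytes.
// 'alignment' must be a power of two.
inline bool IsAligned(const void* ptr, std::size_t alignment) {
  return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}
```

With the library available, such a buffer would then be passed as the first two arguments of Initialize. Since tensor_arena_size is an int in the signature above, the size has to fit in an int.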
Initialize the class - advanced¶
template <bool report_mode = false>
TfLiteStatus Initialize(std::uint8_t* persistent_tensor_arena_ptr,
                        int persistent_tensor_arena_size,
                        std::uint8_t* non_persistent_tensor_arena_ptr,
                        int non_persistent_tensor_arena_size,
                        int model_id = 0,
                        ::tflite::MicroProfiler* profiler = nullptr);
Initializes the inference engine object. This method has to be called before any other method can be called.
This method is similar to the simple Initialize method above, except that the arena is split into a persistent and a non-persistent part. The non-persistent part can be re-used by the user or by another model. See the building documentation for more information.
Arguments:
- report_mode (bool): This template argument toggles between report mode and regular mode (the default). Enabling it allows print_report to be called (see below). For best speed and code size this flag should be disabled.
- persistent_tensor_arena_ptr (std::uint8_t*): The persistent tensor arena has to be provided by the user and should be large enough to hold the model's persistent data. The class does not take ownership of the tensor arena. The contents of the tensor arena should not be overwritten during the lifetime of the object. In case of multiple models (see model_id below), each model should have its own persistent tensor arena.
- persistent_tensor_arena_size (int): The size of the tensor arena passed by persistent_tensor_arena_ptr above.
- non_persistent_tensor_arena_ptr (std::uint8_t*): The non-persistent tensor arena has to be provided by the user and should be large enough to hold the model's activation tensors. For best performance the tensor arena should be 16-byte aligned. The class does not take ownership of the tensor arena. The input and output tensors will be allocated somewhere in the non-persistent tensor arena. The contents of the non-persistent tensor arena can be overwritten by the user for other purposes after the output has been read out. In case of multiple models (see model_id below), different models can share the same non-persistent arena as long as they do not execute simultaneously in different threads.
- non_persistent_tensor_arena_size (int): The size of the tensor arena passed by non_persistent_tensor_arena_ptr above.
- model_id (int): In case the inference engine library was built to include multiple models, this argument can be used to select a model for this instance of the InferenceEngine class. The default is 0, which will select the only model in single-model mode or the first model in multi-model mode.
- profiler (tflite::MicroProfiler*): An optional custom profiler of the standard TensorFlow Lite profiler type, used when report_mode is disabled. If a nullptr is passed (the default), the InferenceEngine object will not report any profiling, unless report_mode is enabled. In that case, an internally constructed profiler will be used.
Returns:
TfLiteStatus: Can be kTfLiteError when there are errors or kTfLiteOk otherwise.
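One possible way to obtain the two arenas is to carve them out of a single memory block. The sketch below is an illustration under assumptions, not the library's prescribed layout: ArenaSplit, SplitArena, kBlockSize and memory_block are hypothetical names, and the sizes are placeholders. The non-persistent part is placed first so that it inherits the 16-byte alignment of the block.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result type holding the two arena regions.
struct ArenaSplit {
  std::uint8_t* persistent_ptr;
  int persistent_size;
  std::uint8_t* non_persistent_ptr;
  int non_persistent_size;
};

// Hypothetical total size, for illustration only.
constexpr int kBlockSize = 1024;
alignas(16) static std::uint8_t memory_block[kBlockSize];

// Splits 'block' into a non-persistent region (at the start, so it keeps the
// block's alignment) followed by a persistent region that takes the rest.
// Returns an all-zero ArenaSplit when the arguments are inconsistent.
inline ArenaSplit SplitArena(std::uint8_t* block, int block_size,
                             int non_persistent_size) {
  ArenaSplit split{};
  if (block == nullptr || non_persistent_size < 0 ||
      non_persistent_size > block_size) {
    return split;
  }
  split.non_persistent_ptr = block;
  split.non_persistent_size = non_persistent_size;
  split.persistent_ptr = block + non_persistent_size;
  split.persistent_size = block_size - non_persistent_size;
  return split;
}
```

The four fields correspond, in order, to the first four arguments of the advanced Initialize method. Two entirely separate buffers work just as well; the single block is only one design choice.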
Allocate the tensors¶
Allocates input, output and intermediate tensors in the tensor arena. This needs to be called before running inference with Invoke. When custom ops have been registered using AddCustomOp (see below), this will call the Init and Prepare functions of those ops.
Returns:
TfLiteStatus: Can be kTfLiteError when not enough space is available or kTfLiteOk otherwise.
Invoke to run inference¶
Runs inference, assuming input data is already set using the input function below. Requires AllocateTensors to be called first. Note that calling this function will likely overwrite any data set at the tensor obtained from input: running Invoke a second time requires the input data to be set again.
Returns:
TfLiteStatus: Can be kTfLiteError when there are errors or kTfLiteOk otherwise.
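Because Invoke may overwrite the input data, each inference needs a fresh copy of the input, and the output is best copied out before the buffers are touched again. The sketch below wraps this pattern in a hypothetical helper, RunOnce, which is not part of the library. It takes the inference step as a callable so that it compiles without the engine header; the raw data pointers are assumed to have been obtained from the input and output tensors.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: performs one complete inference round trip.
//  1. Copies 'sample' into the input buffer (needed before every run).
//  2. Calls 'invoke', which should return true on success.
//  3. Copies the output buffer into 'result' so it survives later runs.
// 'input_data' and 'output_data' may alias each other; 'sample' and 'result'
// must be separate buffers owned by the caller.
template <typename InvokeFn>
bool RunOnce(std::int8_t* input_data, const std::int8_t* sample,
             std::size_t input_len, const std::int8_t* output_data,
             std::int8_t* result, std::size_t output_len, InvokeFn invoke) {
  std::memcpy(input_data, sample, input_len);
  if (!invoke()) {
    return false;
  }
  std::memcpy(result, output_data, output_len);
  return true;
}
```

With the library available, the callable could be something like a lambda that returns whether the engine's Invoke call yielded kTfLiteOk.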
Access the input and output tensors¶
TfLiteTensor* InferenceEngine::input(int input_id)
TfLiteTensor* InferenceEngine::output(int output_id)
Get access to the input and output tensors. The returned TfLiteTensor object is the same as the one in TensorFlow (tensorflow/lite/c/common.h). Relevant functionality includes getting a pointer to the data, the datatype and the shape of the tensor:
TfLiteTensor* input_tensor = engine.input(0);
TfLiteType input_data_type = input_tensor->type;
std::int8_t* input_data = tflite::GetTensorData<std::int8_t>(input_tensor);
TfLiteIntArray* input_shape = input_tensor->dims;
Note that parts of the inference happen in-place, and because the pointer to the output data might be the same as the pointer to the input data, input data can be overwritten. Therefore, the data pointed to by the TfLiteTensor returned by the input method is only valid before Invoke is called, while the data for the output method is only valid after Invoke has finished.
Arguments:
- input_id / output_id (int): The index of the accessed tensor, counting from zero. For example, this can be set to 2 for the third input/output tensor. Use inputs_size or outputs_size (see below) to query the number of input or output tensors.
Returns:
TfLiteTensor*: See the TensorFlow source code for documentation, or above for an example.
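When filling or reading a tensor, it is often useful to know how many elements it holds. The shape shown in the example above is an array of dimensions, so the element count is their product. The helper below is a hypothetical sketch, not part of the library; it works on a plain array so that it compiles without the TensorFlow headers, with dims and rank standing in for the data and size fields of the tensor's dims member.

```cpp
#include <cstddef>

// Hypothetical helper: number of elements in a tensor with the given shape.
// Returns 1 for a rank-0 (scalar) shape and 0 if any dimension is not
// positive, which this sketch treats as "no usable data".
inline std::size_t NumElements(const int* dims, int rank) {
  std::size_t count = 1;
  for (int i = 0; i < rank; ++i) {
    if (dims[i] <= 0) {
      return 0;
    }
    count *= static_cast<std::size_t>(dims[i]);
  }
  return count;
}
```

For an int8 tensor the element count equals the size of the data in bytes, so it can serve as the bound when copying sensor data into the input.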
Query the number of input and output tensors¶
Retrieve the number of input or output tensors in the model.
Returns:
size_t: The number of input or output tensors.
The optional/advanced API¶
The API methods described below are not needed for basic usage of the inference engine.
Print the report¶
When report_mode is set to true in the Initialize function, this method can print the report. It needs to be called after Invoke is called. The report contains a table with each op and details such as the number of parameters, the latency per op, and RAM usage information.
Reset the tensor state¶
Resets the state to what it was right after AllocateTensors was first called. This is useful for recurrent neural networks (e.g. LSTMs), which can preserve internal state between Invoke calls. In case of a stateless LSTM, this method needs to be called after each call to Invoke.
Returns:
TfLiteStatus: Can be kTfLiteError when there are errors or kTfLiteOk otherwise.
Add a custom op¶
Optional; only needed when the model contains ops that are not supported by the inference engine. If this is used, it has to be called before AllocateTensors. The call is forwarded to the TensorFlow op resolver class function MicroOpResolver::AddCustom and accepts the same arguments.
Arguments:
- name (const char*): The name of the new custom op.
- registration (TfLiteRegistration*): See MicroOpResolver::AddCustom.
Returns:
TfLiteStatus: Can be kTfLiteError when there are errors or kTfLiteOk otherwise.
Retrieve the tensor arena usage¶
This method gives the optimal arena size, i.e. the size that was actually needed. This can be used to reduce the tensor arena size passed to Initialize. It is only available after AllocateTensors has been called.
Returns:
size_t: The used tensor arena in bytes.
Example usage¶
The Plumerai Inference Engine consists of just a single class, plumerai::InferenceEngine. It can be used as follows:
#include "plumerai/inference_engine.h"

// Assumes 'TENSOR_ARENA_SIZE' is defined as compilation argument or in a header.
// The arena is 16-byte aligned, as recommended for best performance.
alignas(16) uint8_t tensor_arena[TENSOR_ARENA_SIZE];

// TODO: Implement this to define how debug printing is done
extern "C" void DebugLog(const char *s) {
  // printf("%s", s);
}

int main() {
  constexpr bool report_mode = false;

  plumerai::InferenceEngine inference;
  auto init_status =
      inference.Initialize<report_mode>(tensor_arena, TENSOR_ARENA_SIZE);
  if (init_status != kTfLiteOk) {
    MicroPrintf("Initialize() failed");
    return 1;
  }

  // Allocate memory from the tensor_arena for the model's tensors.
  auto allocate_status = inference.AllocateTensors();
  if (allocate_status != kTfLiteOk) {
    MicroPrintf("AllocateTensors() failed");
    return 1;
  }

  // Obtain pointers to the model's input and output tensors.
  // TODO: Assumes the model has one input and one output, modify this if there
  // are more.
  TfLiteTensor* input = inference.input(0);
  TfLiteTensor* output = inference.output(0);

  // Example: print the input shape
  MicroPrintf("Input shape:");
  for (int i = 0; i < input->dims->size; ++i) {
    MicroPrintf(" %d", input->dims->data[i]);
  }

  // Example: run inference in an infinite loop.
  while (true) {
    // Set input data example. TODO: Get data from sensor.
    // The input has to be set again before every call to Invoke.
    int8_t* input_data = tflite::GetTensorData<int8_t>(input);
    input_data[0] = 17;  // example, setting first element to '17'

    // Run inference on a single input.
    auto invoke_status = inference.Invoke();
    if (invoke_status != kTfLiteOk) {
      MicroPrintf("Invoke failed");
      return 1;
    }

    // Read results and print first output to screen.
    int8_t* output_data = tflite::GetTensorData<int8_t>(output);
    MicroPrintf("Result: %d", int(output_data[0]));
  }

  return 0;
}
The above example can be compiled and linked as explained in the 'Building' section of the documentation, or using the following example Makefile, assuming the above code is named main.cc and the inference engine can be found in /path/to/plumerai_inference_engine:
# Project name
TARGET=example

# Folders
LIB_DIR=/path/to/plumerai_inference_engine
LIB_INCL_DIR=$(LIB_DIR)/include
LIBRARY=$(LIB_DIR)/libplumerai.a
BUILD_DIR=build

# Compiler settings
CC=gcc
CCFLAGS=-I$(LIB_INCL_DIR) -O3 -Wl,--gc-sections -fno-rtti -fno-exceptions
LINKER_FLAGS=-lm

# Define the list of source and object files
SOURCES=main.cc
OBJECTS=$(patsubst %.cc, $(BUILD_DIR)/%.o, $(SOURCES))

# Define the main makefile target
all: $(TARGET)

# Target to compile a C++ file
$(BUILD_DIR)/%.o: %.cc
	mkdir -p $(dir $@)
	$(CC) -c -o $@ $< $(CCFLAGS)

# Target to compile and link the final binary
$(TARGET): $(OBJECTS)
	$(CC) -o $(BUILD_DIR)/$@ $^ $(CCFLAGS) $(LIBRARY) $(LINKER_FLAGS)

# Target to clean-up the build directory
.PHONY: clean
clean:
	rm -f $(BUILD_DIR)/*.o $(BUILD_DIR)/$(TARGET)