Plumerai Inference Engine C API for microcontrollers
This document describes the C API for the Plumerai Inference Engine for microcontrollers.
The main API
The C API consists of a single, self-documented header file. All functions operate on a PlumeraiInference object, which should be initialized with the PlumeraiInferenceInit function. Afterwards, PlumeraiInferenceAllocateTensors needs to be called. Once that is done, inference can be run by calling PlumeraiInferenceInvoke. Data can be set and read through the tensors returned by the PlumeraiInferenceInput and PlumeraiInferenceOutput functions. See below for details.
The API is re-entrant, i.e. you can create several PlumeraiInference objects in different threads and use them independently. However, using the same instance from different threads at the same time is not supported.
Initialize the inference engine
PlumeraiInference PlumeraiInferenceInit(unsigned char* tensor_arena_ptr,
                                        int tensor_arena_size,
                                        int model_id)
Creates the inference engine object.
Arguments:
- tensor_arena_ptr (unsigned char*): The tensor arena has to be provided by the user and should be large enough to hold the model's activation tensors. For best performance the tensor arena should be 16-byte aligned. The contents of the tensor arena should not be overwritten during the lifetime of the object, except by setting input tensor data through the corresponding functions. In case of multiple models (see model_id below), each model should have its own tensor arena. See below for the advanced setup, where part of the tensor arena can be shared with other models or other applications.
- tensor_arena_size (int): The size in bytes of the tensor arena passed by tensor_arena_ptr above.
- model_id (int): In case the inference engine library was built to include multiple models, this argument can be used to select a model for this instance of the PlumeraiInference handle. In the single-model case (the default), this argument should be set to 0.
Returns:
PlumeraiInference: An initialized inference engine, which can be passed to the other functions below.
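The 16-byte alignment recommendation above can be met with a C11 alignment specifier. The following is a minimal sketch, not part of the Plumerai API: the arena size and the IsAligned helper are hypothetical and only serve to illustrate the idea.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

// Hypothetical size for illustration; the real value depends on the model.
#define TENSOR_ARENA_SIZE (64 * 1024)

// C11: request 16-byte alignment for the statically allocated arena.
static alignas(16) unsigned char tensor_arena[TENSOR_ARENA_SIZE];

// Hypothetical helper: returns 1 if 'ptr' is aligned to 'alignment' bytes.
// 'alignment' must be a power of two.
static int IsAligned(const void* ptr, uintptr_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return ((uintptr_t)ptr & (alignment - 1)) == 0;
}
```

Such an arena would then be passed as PlumeraiInferenceInit(tensor_arena, TENSOR_ARENA_SIZE, 0). Toolchains without C11 support usually offer a compiler-specific alignment attribute instead.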
Initialize the inference engine - advanced
PlumeraiInference PlumeraiInferenceAdvancedArenaInit(unsigned char* persistent_tensor_arena_ptr,
                                                     int persistent_tensor_arena_size,
                                                     unsigned char* non_persistent_tensor_arena_ptr,
                                                     int non_persistent_tensor_arena_size,
                                                     int model_id);
Creates the inference engine object.
This function is similar to PlumeraiInferenceInit above, except that the arena is split into a persistent and a non-persistent part. The non-persistent part can be re-used by the user or by another model. See the building documentation for more information.
Arguments:
- persistent_tensor_arena_ptr (unsigned char*): The persistent tensor arena has to be provided by the user and should be large enough to hold the model's persistent data. The engine does not take ownership of the tensor arena. The contents of the tensor arena should not be overwritten during the lifetime of the object. In case of multiple models (see model_id below), each model should have its own persistent tensor arena.
- persistent_tensor_arena_size (int): The size in bytes of the tensor arena passed by persistent_tensor_arena_ptr above.
- non_persistent_tensor_arena_ptr (unsigned char*): The non-persistent tensor arena has to be provided by the user and should be large enough to hold the model's activation tensors. For best performance the tensor arena should be 16-byte aligned. The engine does not take ownership of the tensor arena. The input and output tensors will be allocated somewhere in the non-persistent tensor arena. The contents of the non-persistent tensor arena can be overwritten by the user for other purposes after the output has been read out. In case of multiple models (see model_id below), different models can share the same non-persistent arena as long as they do not execute simultaneously in different threads.
- non_persistent_tensor_arena_size (int): The size in bytes of the tensor arena passed by non_persistent_tensor_arena_ptr above.
- model_id (int): In case the inference engine library was built to include multiple models, this argument can be used to select a model for this instance of the PlumeraiInference handle. In the single-model case (the default), this argument should be set to 0.
Returns:
PlumeraiInference: An initialized inference engine, which can be passed to the other functions below.
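As an illustration of sharing the non-persistent arena between two models that never run at the same time, the buffers could be laid out as in the sketch below. All sizes are hypothetical placeholders, and the MAX_SIZE macro is an assumption of this example rather than something provided by the library.

```c
#include <assert.h>
#include <stdalign.h>

// Hypothetical per-model requirements in bytes, for illustration only.
#define MODEL_A_PERSISTENT_SIZE (8 * 1024)
#define MODEL_B_PERSISTENT_SIZE (4 * 1024)
#define MODEL_A_NON_PERSISTENT_SIZE (48 * 1024)
#define MODEL_B_NON_PERSISTENT_SIZE (32 * 1024)

// The shared arena has to fit the larger of the two models.
#define MAX_SIZE(a, b) ((a) > (b) ? (a) : (b))
#define SHARED_NON_PERSISTENT_SIZE \
  MAX_SIZE(MODEL_A_NON_PERSISTENT_SIZE, MODEL_B_NON_PERSISTENT_SIZE)

// Each model keeps its own persistent arena.
static unsigned char persistent_arena_a[MODEL_A_PERSISTENT_SIZE];
static unsigned char persistent_arena_b[MODEL_B_PERSISTENT_SIZE];

// One 16-byte aligned non-persistent arena, shared by both models.
static alignas(16) unsigned char
    shared_non_persistent_arena[SHARED_NON_PERSISTENT_SIZE];

static_assert(SHARED_NON_PERSISTENT_SIZE >= MODEL_A_NON_PERSISTENT_SIZE &&
                  SHARED_NON_PERSISTENT_SIZE >= MODEL_B_NON_PERSISTENT_SIZE,
              "shared arena must fit both models");
```

Each engine would then be created with its own persistent arena and the shared one, for example PlumeraiInferenceAdvancedArenaInit(persistent_arena_a, MODEL_A_PERSISTENT_SIZE, shared_non_persistent_arena, SHARED_NON_PERSISTENT_SIZE, 0) for the first model. Which model_id values a multi-model build uses is not covered here, so the value for the second model is left open.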
Allocate the tensors
TfLiteStatus PlumeraiInferenceAllocateTensors(PlumeraiInference* engine)
Allocates input, output and intermediate tensors in the tensor arena. This needs to be called before running inference with PlumeraiInferenceInvoke.
Arguments:
- engine (PlumeraiInference*): A pointer to an inference engine object as created by PlumeraiInferenceInit.
Returns:
TfLiteStatus: Can be kTfLiteError when not enough space is available or kTfLiteOk otherwise.
Invoke to run inference
TfLiteStatus PlumeraiInferenceInvoke(PlumeraiInference* engine)
Runs inference, assuming input data has already been set through the tensor returned by the PlumeraiInferenceInput function below. Requires PlumeraiInferenceAllocateTensors to be called first. Note that calling this function will likely overwrite the data in the tensor obtained from PlumeraiInferenceInput: running PlumeraiInferenceInvoke a second time requires the input data to be set again.
Arguments:
- engine (PlumeraiInference*): A pointer to an inference engine object as created by PlumeraiInferenceInit.
Returns:
TfLiteStatus: Can be kTfLiteError when there are errors or kTfLiteOk otherwise.
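Because the input data has to be set again before every invocation, it can help to route all input writes through one small function. The sketch below uses a hypothetical CopyFrameToInput helper; it takes plain pointers and sizes, so it does not depend on the library header. In real code the first two arguments would come from the data.raw and bytes fields of the TfLiteTensor returned by PlumeraiInferenceInput.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: copies one frame of input data into the input tensor
// buffer. 'tensor_data' and 'tensor_bytes' stand for input->data.raw and
// input->bytes of the input TfLiteTensor.
// Returns 0 on success, -1 if the frame size does not match the tensor size.
static int CopyFrameToInput(char* tensor_data, size_t tensor_bytes,
                            const char* frame, size_t frame_bytes) {
  assert(tensor_data != NULL && frame != NULL);
  if (frame_bytes != tensor_bytes) {
    return -1;  // Refuse partial or oversized writes.
  }
  memcpy(tensor_data, frame, frame_bytes);
  return 0;
}
```

A typical loop would call this helper first, then PlumeraiInferenceInvoke, then read the output, and repeat these steps for every new frame.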
Access the input and output tensors
TfLiteTensor* PlumeraiInferenceInput(PlumeraiInference* engine, int input_id)
TfLiteTensor* PlumeraiInferenceOutput(PlumeraiInference* engine, int output_id)
Get access to the input and output tensors. The returned TfLiteTensor object is the same as the one in TensorFlow (tensorflow/lite/c/common.h). Relevant functionality includes getting a pointer to the data, the data type and the shape of the tensor:
TfLiteTensor* input_tensor = PlumeraiInferenceInput(&engine, 0);
TfLiteType input_data_type = input_tensor->type;
char* input_data = input_tensor->data.raw;
TfLiteIntArray* input_shape = input_tensor->dims;
Note that parts of the inference happen in-place, and because the pointer to the output data might be the same as the pointer to the input data, input data can be overwritten. Therefore, the data pointed to by the TfLiteTensor returned by PlumeraiInferenceInput is only valid before PlumeraiInferenceInvoke is called, while the data pointed to by the one returned by PlumeraiInferenceOutput is only valid after PlumeraiInferenceInvoke has finished.
Arguments:
- engine (PlumeraiInference*): A pointer to an inference engine object as created by PlumeraiInferenceInit.
- input_id / output_id (int): The index of the accessed tensor, counting from zero. For example, this can be set to 2 for the third input/output tensor. Use PlumeraiInferenceInputsSize or PlumeraiInferenceOutputsSize (see below) to query the number of input or output tensors.
Returns:
TfLiteTensor*: See the TensorFlow source code for documentation, or above for an example.
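The shape obtained from a tensor is often used to work out how many elements the data buffer holds. The following sketch does this with a hypothetical CountElements helper that takes a plain array of dimensions, so it compiles without the TensorFlow headers.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: multiplies all dimensions to get the element count.
// An empty shape (num_dims == 0) is treated as a scalar with one element.
static size_t CountElements(const int* dims, int num_dims) {
  assert(dims != NULL && num_dims >= 0);
  size_t count = 1;
  for (int i = 0; i < num_dims; ++i) {
    assert(dims[i] >= 0);
    count *= (size_t)dims[i];
  }
  return count;
}
```

With a tensor from PlumeraiInferenceInput, the call would look like CountElements(input_tensor->dims->data, input_tensor->dims->size). For a shape of 1 x 96 x 96 x 3 the result is 27648 elements.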
Query the number of input and output tensors
int PlumeraiInferenceInputsSize(PlumeraiInference* engine)
int PlumeraiInferenceOutputsSize(PlumeraiInference* engine)
Retrieve the number of input or output tensors in the model.
Arguments:
- engine (PlumeraiInference*): A pointer to an inference engine object as created by PlumeraiInferenceInit.
Returns:
int: The number of input or output tensors.
The optional/advanced API
The API functions described below are not needed for basic usage of the inference engine.
Print the report
When PLUMERAI_INFERENCE_REPORT_MODE is enabled (and thus PlumeraiInferenceReportModeInit is called), this function prints the report. It needs to be called after PlumeraiInferenceInvoke has been called. The report contains a table with each op and details such as the number of parameters, the latency per op, and RAM usage information.
Arguments:
- engine (PlumeraiInference*): A pointer to an inference engine object as created by PlumeraiInferenceInit.
Reset the tensor state
Resets the state to what it is when the inference engine is first created and PlumeraiInferenceAllocateTensors has been called. This is useful for recurrent neural networks (e.g. LSTMs), which can preserve internal state between PlumeraiInferenceInvoke calls. In case of a stateless LSTM, this function needs to be called after each call to PlumeraiInferenceInvoke.
Arguments:
- engine (PlumeraiInference*): A pointer to an inference engine object as created by PlumeraiInferenceInit.
Returns:
TfLiteStatus: Can be kTfLiteError when there are errors or kTfLiteOk otherwise.
Retrieve the tensor arena usage
This function gives the optimal arena size, i.e. the size that was actually needed. This can be used to reduce the tensor arena size passed to PlumeraiInferenceInit. It is only available after PlumeraiInferenceAllocateTensors has been called.
Arguments:
- engine (PlumeraiInference*): A pointer to an inference engine object as created by PlumeraiInferenceInit.
Returns:
int: The used tensor arena size in bytes.
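When turning the reported usage into a new arena size, one option is to round the value up to a multiple of 16 so that the buffer size matches the recommended alignment. This rounding is a choice made for this sketch, not a documented requirement, and RoundUpToMultiple is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: rounds 'value' up to the next multiple of 'multiple'.
// Values that are already a multiple are returned unchanged.
static size_t RoundUpToMultiple(size_t value, size_t multiple) {
  assert(multiple > 0);
  return ((value + multiple - 1) / multiple) * multiple;
}
```

For example, a reported usage of 1000 bytes would give an arena size of 1008 bytes, while 1024 bytes would stay at 1024.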
Example usage
The Plumerai Inference Engine can be used as follows from a C program:
#include "plumerai/inference_engine_c.h"

#include <stdbool.h>  // Needed for 'true' in the inference loop below
#include <stdio.h>

// Assumes 'TENSOR_ARENA_SIZE' is defined as compilation argument or in a header
unsigned char tensor_arena[TENSOR_ARENA_SIZE];

// TODO: Implement this to define how debug printing is done
void DebugLog(const char *s) {
  // printf("%s", s);
}

int main(void) {
  PlumeraiInference engine = PlumeraiInferenceInit(
      tensor_arena, TENSOR_ARENA_SIZE, 0);

  // Allocate memory from the tensor_arena for the model's tensors.
  TfLiteStatus allocate_status = PlumeraiInferenceAllocateTensors(&engine);
  if (allocate_status != kTfLiteOk) {
    plumerai_printf("AllocateTensors() failed\n");
    return 1;
  }

  // Obtain pointers to the model's input and output tensors.
  // TODO: Assumes the model has one input and one output, modify this if there
  // are more.
  TfLiteTensor* input = PlumeraiInferenceInput(&engine, 0);
  TfLiteTensor* output = PlumeraiInferenceOutput(&engine, 0);

  // Example: print the input shape
  plumerai_printf("Input shape:\n");
  for (int i = 0; i < input->dims->size; ++i) {
    plumerai_printf(" %d\n", input->dims->data[i]);
  }

  // Example: run inference in an infinite loop.
  while (true) {
    // Set input data example. TODO: Get data from sensor.
    // The input has to be set again before every invocation.
    char* input_data = input->data.raw;
    input_data[0] = 17;  // example, setting first element to '17'

    // Run inference on a single input.
    TfLiteStatus invoke_status = PlumeraiInferenceInvoke(&engine);
    if (invoke_status != kTfLiteOk) {
      plumerai_printf("Invoke failed\n");
      return 1;
    }

    // Read results and print first output to screen.
    char* output_data = output->data.raw;
    plumerai_printf("Result: %d\n", (int)output_data[0]);
  }

  return 0;
}
The above example can be compiled and linked as explained in the 'Building' section of the documentation, or using the following example Makefile, assuming the above code is named main.c and the inference engine can be found in /path/to/plumerai_inference_engine:
# Project name
TARGET=example

# Folders
LIB_DIR=/path/to/plumerai_inference_engine
LIB_INCL_DIR=$(LIB_DIR)/include
LIBRARY=$(LIB_DIR)/libplumerai.a
BUILD_DIR=build

# Compiler settings
CC=gcc
CFLAGS=-I$(LIB_INCL_DIR) -O3 -Wl,--gc-sections
LINKER_FLAGS=-lm

# Define the list of source and object files
SOURCES=main.c
OBJECTS=$(patsubst %.c, $(BUILD_DIR)/%.o, $(SOURCES))

# Define the main makefile target
all: $(TARGET)

# Target to compile a C file
$(BUILD_DIR)/%.o: %.c
	mkdir -p $(dir $@)
	$(CC) -c -o $@ $< $(CFLAGS)

# Target to compile and link the final binary
$(TARGET): $(OBJECTS)
	$(CC) -o $(BUILD_DIR)/$@ $^ $(CFLAGS) $(LIBRARY) $(LINKER_FLAGS)

# Target to clean-up the build directory
.PHONY: clean
clean:
	rm -f $(BUILD_DIR)/*.o $(BUILD_DIR)/$(TARGET)