Plumerai Inference Engine C API for microcontrollers

This document describes the C API for the Plumerai Inference Engine for microcontrollers.

The main API

The C API consists of a single, self-documented header file. All functions operate on a PlumeraiInference object, which is created with the PlumeraiInferenceInit function. Afterwards, PlumeraiInferenceAllocateTensors must be called. Once that is done, inference can be run by calling PlumeraiInferenceInvoke. Input data is written and output data is read through the tensors returned by the PlumeraiInferenceInput and PlumeraiInferenceOutput functions. See below for details.

The API is re-entrant, i.e. you can create several PlumeraiInference objects in different threads and use them independently. However, using the same instance from different threads at the same time is not supported.

Initialize the inference engine

PlumeraiInference PlumeraiInferenceInit(unsigned char* tensor_arena_ptr,
                                        int tensor_arena_size,
                                        int model_id)

Creates the inference engine object.

Arguments:

tensor_arena_ptr unsigned char*: The tensor arena has to be provided by the user and should be large enough to hold the model's activation tensors. For best performance, the tensor arena should be 16-byte aligned. The contents of the tensor arena should not be overwritten during the lifetime of the object, except by setting input tensor data through the corresponding functions. In case of multiple models (see model_id below), each model should have its own tensor arena. See the advanced setup below, where part of the tensor arena can be shared with other models or other applications.
tensor_arena_size int: The size in bytes of the tensor arena passed by tensor_arena_ptr above.
model_id int: If the inference engine library was built to include multiple models, this argument selects the model for this instance of the PlumeraiInference handle. In the single-model case (the default), this argument should be set to 0.

Returns:

PlumeraiInference: An initialized inference engine, which can be passed to other functions below.

Initialize the inference engine - advanced

PlumeraiInference PlumeraiInferenceAdvancedArenaInit(unsigned char* persistent_tensor_arena_ptr,
                                                     int persistent_tensor_arena_size,
                                                     unsigned char* non_persistent_tensor_arena_ptr,
                                                     int non_persistent_tensor_arena_size, int model_id);

Creates the inference engine object.

This function is similar to PlumeraiInferenceInit above, except that the arena is split into a persistent and a non-persistent part. The non-persistent part can be reused by the user or by another model. See the building documentation for more information.

Arguments:

persistent_tensor_arena_ptr unsigned char*: The persistent tensor arena has to be provided by the user and should be large enough to hold the model's persistent data. The inference engine does not take ownership of the tensor arena. The contents of the persistent tensor arena should not be overwritten during the lifetime of the object. In case of multiple models (see model_id below), each model should have its own persistent tensor arena.
persistent_tensor_arena_size int: The size in bytes of the tensor arena passed by persistent_tensor_arena_ptr above.
non_persistent_tensor_arena_ptr unsigned char*: The non-persistent tensor arena has to be provided by the user and should be large enough to hold the model's activation tensors. For best performance, the tensor arena should be 16-byte aligned. The inference engine does not take ownership of the tensor arena. The input and output tensors are allocated within the non-persistent tensor arena. After the output has been read, the user may overwrite the contents of the non-persistent tensor arena for other purposes. In case of multiple models (see model_id below), different models can share the same non-persistent arena as long as they do not execute simultaneously in different threads.
non_persistent_tensor_arena_size int: The size in bytes of the tensor arena passed by non_persistent_tensor_arena_ptr above.
model_id int: If the inference engine library was built to include multiple models, this argument selects the model for this instance of the PlumeraiInference handle. In the single-model case (the default), this argument should be set to 0.

Returns:

PlumeraiInference: An initialized inference engine, which can be passed to other functions below.

Allocate the tensors

TfLiteStatus PlumeraiInferenceAllocateTensors(PlumeraiInference* engine)

Allocates the input, output and intermediate tensors in the tensor arena. This must be called before running inference with PlumeraiInferenceInvoke.

Arguments:

engine PlumeraiInference*: A pointer to an inference engine object as created by PlumeraiInferenceInit.

Returns:

TfLiteStatus: Can be kTfLiteError when not enough space is available or kTfLiteOk otherwise.

Invoke to run inference

TfLiteStatus PlumeraiInferenceInvoke(PlumeraiInference* engine)

Runs inference, assuming the input data has already been set using the PlumeraiInferenceInput function below. Requires PlumeraiInferenceAllocateTensors to be called first. Note that calling this function will likely overwrite any data written to the input tensor obtained from PlumeraiInferenceInput: running PlumeraiInferenceInvoke a second time requires the input data to be set again.

Arguments:

engine PlumeraiInference*: A pointer to an inference engine object as created by PlumeraiInferenceInit.

Returns:

TfLiteStatus: Can be kTfLiteError when there are errors or kTfLiteOk otherwise.

Access the input and output tensors

TfLiteTensor* PlumeraiInferenceInput(PlumeraiInference* engine, int input_id)
TfLiteTensor* PlumeraiInferenceOutput(PlumeraiInference* engine, int output_id)

Get access to the input and output tensors. The returned TfLiteTensor object is the same as the one in TensorFlow Lite (tensorflow/lite/c/common.h). Relevant functionality includes getting a pointer to the data, the data type and the shape of the tensor:

TfLiteTensor* input_tensor = PlumeraiInferenceInput(&engine, 0);
TfLiteType input_data_type = input_tensor->type;
char* input_data = input_tensor->data.raw;
TfLiteIntArray* input_shape = input_tensor->dims;

Note that parts of the inference happen in-place: because the pointer to the output data might be the same as the pointer to the input data, the input data can be overwritten. Therefore, the data pointed to by the tensor returned from PlumeraiInferenceInput is only valid before PlumeraiInferenceInvoke is called, while the data pointed to by the tensor returned from PlumeraiInferenceOutput is only valid after PlumeraiInferenceInvoke has finished.

Arguments:

engine PlumeraiInference*: A pointer to an inference engine object as created by PlumeraiInferenceInit.
input_id / output_id int: The zero-based index of the tensor to access. For example, set this to 2 to access the third input or output tensor. Use PlumeraiInferenceInputsSize or PlumeraiInferenceOutputsSize (see below) to query the number of input or output tensors.

Returns:

TfLiteTensor*: A pointer to the requested tensor. See the TensorFlow Lite source code for documentation, or above for an example.

Query the number of input and output tensors

int PlumeraiInferenceInputsSize(PlumeraiInference* engine)
int PlumeraiInferenceOutputsSize(PlumeraiInference* engine)

Retrieve the number of input or output tensors in the model.

Arguments:

engine PlumeraiInference*: A pointer to an inference engine object as created by PlumeraiInferenceInit.

Returns:

int: The number of input or output tensors.

The optional/advanced API

The functions described below are not needed for basic usage of the inference engine.

Print the report

void PlumeraiInferencePrintReport(PlumeraiInference* engine)

When PLUMERAI_INFERENCE_REPORT_MODE is enabled (and thus PlumeraiInferenceReportModeInit is called), this function prints the report. It must be called after PlumeraiInferenceInvoke. The report contains a table with each op and details such as the number of parameters, the latency per op, and RAM usage information.

Arguments:

engine PlumeraiInference*: A pointer to an inference engine object as created by PlumeraiInferenceInit.

Reset the tensor state

TfLiteStatus Reset(PlumeraiInference* engine)

Resets the state to what it was right after PlumeraiInferenceAllocateTensors was called on a newly created inference engine. This is useful for recurrent neural networks (e.g. LSTMs), which can preserve internal state between PlumeraiInferenceInvoke calls. In case of a stateless LSTM, this function needs to be called after each call to PlumeraiInferenceInvoke.

Arguments:

engine PlumeraiInference*: A pointer to an inference engine object as created by PlumeraiInferenceInit.

Returns:

TfLiteStatus: Can be kTfLiteError when there are errors or kTfLiteOk otherwise.

Retrieve the tensor arena usage

int PlumeraiInferenceArenaUsedBytes(PlumeraiInference* engine)

This function returns the number of tensor arena bytes that were actually needed, i.e. the minimal arena size for this model. This can be used to reduce the tensor arena size passed to PlumeraiInferenceInit. It is only available after PlumeraiInferenceAllocateTensors has been called.

Arguments:

engine PlumeraiInference*: A pointer to an inference engine object as created by PlumeraiInferenceInit.

Returns:

int: The number of tensor arena bytes used.

Example usage

The Plumerai Inference Engine can be used as follows from a C program:

#include "plumerai/inference_engine_c.h"
#include "plumerai/model_defines.h"

#include <stdalign.h>
#include <stdbool.h>
#include <stdio.h>

// The constant 'TENSOR_ARENA_SIZE' is defined in 'plumerai/model_defines.h'.
// The arena is 16-byte aligned (C11 alignas) for best performance.
alignas(16) unsigned char tensor_arena[TENSOR_ARENA_SIZE];

// TODO: Implement this to define how debug printing is done
void DebugLog(const char *s) {
  // printf("%s", s);
  (void)s;  // avoids an unused-parameter warning while printing is disabled
}

int main(void) {
  PlumeraiInference engine = PlumeraiInferenceInit(
    tensor_arena, TENSOR_ARENA_SIZE, 0
  );

  // Allocate memory from the tensor_arena for the model's tensors.
  TfLiteStatus allocate_status = PlumeraiInferenceAllocateTensors(&engine);
  if (allocate_status != kTfLiteOk) {
    printf("PlumeraiInferenceAllocateTensors() failed\n");
    return 1;
  }

  // Obtain pointers to the model's input and output tensors.
  // TODO: Assumes the model has one input and one output, modify this if there
  // are more.
  TfLiteTensor* input = PlumeraiInferenceInput(&engine, 0);
  TfLiteTensor* output = PlumeraiInferenceOutput(&engine, 0);

  // Example: print the input shape
  printf("Input shape:\n");
  for (int i = 0; i < input->dims->size; ++i) {
    printf(" %d\n", input->dims->data[i]);
  }

  // Example: run inference in an infinite loop.
  while (true) {
    // Set input data example. The input must be set again before every call,
    // because PlumeraiInferenceInvoke may overwrite it.
    // TODO: Get data from sensor.
    char* input_data = input->data.raw;
    input_data[0] = 17;  // example: set the first element to 17

    // Run inference on a single input.
    TfLiteStatus invoke_status = PlumeraiInferenceInvoke(&engine);
    if (invoke_status != kTfLiteOk) {
      printf("PlumeraiInferenceInvoke() failed\n");
      return 1;
    }

    // Read results and print the first output element to the screen.
    char* output_data = output->data.raw;
    printf("Result: %d\n", (int)output_data[0]);
  }

  return 0;
}

The above example can be compiled and linked as explained in the 'Building' section of the documentation, or with the following example Makefile, assuming the code above is saved as main.c and the inference engine is located in /path/to/plumerai_inference_engine:

# Project name
TARGET=example

# Folders
LIB_DIR=/path/to/plumerai_inference_engine
LIB_INCL_DIR=$(LIB_DIR)/include
LIBRARY=$(LIB_DIR)/libplumerai.a
BUILD_DIR=build

# Compiler settings
CC=gcc
CFLAGS=-I$(LIB_INCL_DIR) -O3
LINKER_FLAGS=-Wl,--gc-sections -lm

# Define the list of source and object files
SOURCES=main.c
OBJECTS=$(patsubst %.c,$(BUILD_DIR)/%.o,$(SOURCES))

# Define the main makefile target
all: $(BUILD_DIR)/$(TARGET)

# Target to compile a C file (recipe lines must start with a tab)
$(BUILD_DIR)/%.o: %.c
	mkdir -p $(dir $@)
	$(CC) -c -o $@ $< $(CFLAGS)

# Target to link the final binary
$(BUILD_DIR)/$(TARGET): $(OBJECTS)
	$(CC) -o $@ $^ $(CFLAGS) $(LIBRARY) $(LINKER_FLAGS)

# Target to clean-up the build directory
.PHONY: all clean
clean:
	rm -f $(BUILD_DIR)/*.o $(BUILD_DIR)/$(TARGET)