Building an application with the inference engine
The generated inference engine library consists of four header files and a pre-compiled static library:
include/plumerai/inference_engine.h          # for the C++ API only
include/plumerai/inference_engine_c.h        # for the C API only
include/plumerai/tensorflow_compatibility.h
include/plumerai/model_defines.h
libplumerai.a
To build, make sure the header files can be found on the compiler include path and link against libplumerai.a. Pass -Wl,--gc-sections to the linker to garbage-collect unused code from the binary; the library is compiled with -ffunction-sections to support this.
The exact details on how to compile and link depend on the target platform and compiler. Please refer to their respective documentation for detailed instructions.
The inference engine is built on top of TensorFlow Lite for Microcontrollers (TFLM), and usage is very similar. Instructions on how to use the API along with an example can be found here for the C++ API or here for the C API.
Log messages are the same as in TFLM: one has to provide a C function called DebugLog to output strings, for example over UART.
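As a rough illustration, a DebugLog implementation might look like the sketch below. The single-string signature is an assumption based on older TFLM releases, so check the headers shipped with the library for the exact form. The uart_write_byte helper and its in-memory buffer are hypothetical stand-ins for a real UART driver, used here so the sketch can run on a host machine.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a UART transmit routine: this sketch appends to
// an in-memory buffer instead of writing to a hardware register.
static char g_uart_buffer[256];
static std::size_t g_uart_length = 0;

static void uart_write_byte(char c) {
  // Drop bytes once the buffer is full, always keeping room for the '\0'.
  if (g_uart_length + 1 < sizeof(g_uart_buffer)) {
    g_uart_buffer[g_uart_length++] = c;
    g_uart_buffer[g_uart_length] = '\0';
  }
}

// Assumed signature: one null-terminated string. extern "C" gives the
// function C linkage so that the library can find it by name.
extern "C" void DebugLog(const char* s) {
  assert(s != nullptr);
  while (*s != '\0') {
    uart_write_byte(*s++);
  }
}
```

On a real target, uart_write_byte would be replaced by whatever blocking transmit routine the platform's UART driver offers.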
The tensor arena is a chunk of memory that stores the tensor data during model inference. The user has to provide this and make sure it is large enough. All tensors, including the model input and output, will point to a location within the tensor arena, overlapping each other when possible. For ideal usage, the tensor arena should be 16-byte aligned.
During the lifetime of the inference engine object, the user cannot use the tensor arena for anything other than setting the input through the respective API. The advanced setup, described below, allows the user to use part of the tensor arena space for their own application.
For convenience, the inference engine generator provides a TENSOR_ARENA_SIZE define (and TENSOR_ARENA_SIZE_REPORT_MODE for report mode) in the generated include/plumerai/model_defines.h file. This define can be used directly in the application after adding #include "plumerai/model_defines.h", and should be sufficient in most cases. In rare cases it is only a lower bound and might need to be increased slightly. The offline report flags these cases by printing the message "Arena size estimation might be inaccurate".
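A minimal sketch of declaring a tensor arena from this define, with the recommended 16-byte alignment, could look as follows. The fallback value for TENSOR_ARENA_SIZE and the kExtraArenaBytes margin are placeholders introduced for illustration only; they let the sketch compile without the generated header and are not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Use the generated header when it is on the include path.
#if defined(__has_include)
#if __has_include("plumerai/model_defines.h")
#include "plumerai/model_defines.h"  // provides TENSOR_ARENA_SIZE
#endif
#endif

#ifndef TENSOR_ARENA_SIZE
// Placeholder so that this sketch also compiles without the generated header.
#define TENSOR_ARENA_SIZE (32 * 1024)
#endif

// Hypothetical margin: raise it only if the offline report warns that the
// arena size estimation might be inaccurate.
constexpr std::size_t kExtraArenaBytes = 0;

// The arena itself, aligned to 16 bytes as recommended.
alignas(16) static std::uint8_t tensor_arena[TENSOR_ARENA_SIZE + kExtraArenaBytes];

// Returns true when the arena start address is a multiple of 16.
inline bool arena_is_aligned() {
  return reinterpret_cast<std::uintptr_t>(tensor_arena) % 16 == 0;
}

// Runtime sanity check that can be called once at start-up.
inline void check_arena() {
  assert(arena_is_aligned());
}
```

The pointer and size of tensor_arena would then be passed to the inference engine as described in the API instructions.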
When multiple models are supplied to the Inference Engine Generator, the user has to provide multiple separate tensor arenas, one for each model. In this case, plumerai/model_defines.h adds the defines TENSOR_ARENA_SIZE_MODEL_X (and TENSOR_ARENA_SIZE_MODEL_X_REPORT_MODE for report mode), where X is the model id (for example 1, or higher depending on the number of models). The original defines TENSOR_ARENA_SIZE and TENSOR_ARENA_SIZE_REPORT_MODE also still exist: they are the sum of all model-specific defines. It is possible to save space by re-using parts of the arena for different models; this is covered in the next section.
Advanced tensor arena
The tensor arena consists of two parts: the persistent and non-persistent part. The regular setup expects the user to provide a single tensor arena to cover both parts. The advanced setup gives the user more control over these parts.
- The persistent arena stores persistent data such as tensor metadata and stateful LSTM variables. This data should not be overwritten by the user during the lifetime of the inference engine object. Different instances of the inference engine (for different models for example) will need separate persistent tensor arenas.
- The non-persistent arena stores the activation tensor data (including the model input and output) as well as scratch buffers needed for certain layers. The non-persistent arena is only used when inference is performed (during the Invoke function) and can be re-used by the user or by another model when an inference pass is completed. It can also be used by the user for other applications.
It is important to note that the model input and output tensors are part of the non-persistent arena: after performing inference the user application should first read out the result before re-using the non-persistent arena for other purposes.
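The ordering described above can be sketched as follows. This is not the library API: run_inference_stub is a hypothetical stand-in that only mimics an inference pass by writing an output somewhere inside the non-persistent arena, and all sizes are placeholders.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kNonPersistentSize = 1024;  // placeholder size
constexpr std::size_t kOutputSize = 4;            // placeholder size

alignas(16) static std::uint8_t non_persistent_arena[kNonPersistentSize];

// Hypothetical stand-in for an inference pass: writes an output tensor at
// some location inside the arena and returns a pointer to it.
static const std::uint8_t* run_inference_stub() {
  std::uint8_t* output = non_persistent_arena + 128;
  for (std::size_t i = 0; i < kOutputSize; ++i) {
    output[i] = static_cast<std::uint8_t>(10 + i);
  }
  return output;
}

// Runs the stub, copies the result out, and only then re-uses the arena.
static void infer_and_copy(std::uint8_t (&result)[kOutputSize]) {
  const std::uint8_t* output = run_inference_stub();
  // The output lives inside the arena, so it is lost once the arena is re-used.
  assert(output >= non_persistent_arena &&
         output + kOutputSize <= non_persistent_arena + kNonPersistentSize);
  std::memcpy(result, output, kOutputSize);  // 1. read out the result first
  std::memset(non_persistent_arena, 0, kNonPersistentSize);  // 2. then re-use
}
```

Swapping steps 1 and 2 would leave result filled with whatever the application wrote into the arena instead of the model output.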
When multiple models share the same non-persistent arena, the user has to ensure that the non-persistent arena is large enough for all models: its size should be the maximum over the non-persistent size requirements for each model.
The Inference Engine Generator generates per-model preprocessor definitions in which X is the model id. The define TENSOR_ARENA_SIZE_NON_PERSISTENT_MAX is the maximum over all non-persistent sizes, so a buffer of this size can be shared by all models.
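To make the sizing rule concrete, the sketch below lays out two models that share one non-persistent arena. All constants are hypothetical placeholders; in a real application the sizes would come from the generated defines, and the shared buffer could be sized directly with TENSOR_ARENA_SIZE_NON_PERSISTENT_MAX instead of computing the maximum by hand.

```cpp
#include <cstddef>
#include <cstdint>

// Placeholder sizes for two models, here called A and B. Real values come
// from the generated plumerai/model_defines.h.
constexpr std::size_t kModelAPersistent = 2 * 1024;
constexpr std::size_t kModelANonPersistent = 24 * 1024;
constexpr std::size_t kModelBPersistent = 3 * 1024;
constexpr std::size_t kModelBNonPersistent = 40 * 1024;

constexpr std::size_t max_size(std::size_t a, std::size_t b) {
  return a > b ? a : b;
}

// Mirrors what TENSOR_ARENA_SIZE_NON_PERSISTENT_MAX is described to hold.
constexpr std::size_t kNonPersistentMax =
    max_size(kModelANonPersistent, kModelBNonPersistent);

// Each model keeps its own persistent arena; the non-persistent one is shared.
alignas(16) static std::uint8_t persistent_arena_a[kModelAPersistent];
alignas(16) static std::uint8_t persistent_arena_b[kModelBPersistent];
alignas(16) static std::uint8_t shared_non_persistent_arena[kNonPersistentMax];

// Memory needed with fully separate arenas versus the shared layout.
constexpr std::size_t kSeparateTotal = kModelAPersistent + kModelANonPersistent +
                                       kModelBPersistent + kModelBNonPersistent;
constexpr std::size_t kSharedTotal =
    kModelAPersistent + kModelBPersistent + kNonPersistentMax;

static_assert(kSharedTotal <= kSeparateTotal,
              "sharing the non-persistent arena never needs more memory");
```

With these placeholder numbers the shared layout needs 45 KiB instead of 69 KiB. Sharing is only safe when the two models never run inference at the same time.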
Example applications can be found here for C++ and here for C.