Plumerai People Detection C API

This document describes the C API for the Plumerai People Detection software for video, running on microcontrollers such as Arm Cortex-M and ESP32-S3.

Below are all the functions that make up the API. Following that is the full header as well as a simple example of how to use the API.

PeopleDetectionInit

StatusCodeType PeopleDetectionInit(unsigned char* tensor_arena);

Initializes the people detection algorithm. This needs to be called only once at the start of the application.

Arguments

tensor_arena unsigned char*: A pointer to a user-allocated contiguous memory region that stores persistent, input, output, and intermediate tensors. Its size should be at least the value given by TENSOR_ARENA_SIZE in model_defines.h. This memory region should not be overwritten after this function has been called. See PeopleDetectionInitSplit below for a version that splits the tensor arena into persistent and non-persistent storage.

Returns

StatusCodeType: A status code indicating whether an error occurred. It is either SUCCESS or ALLOCATION_ERROR (see the full header below for a description of these codes).
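The status codes are plain integers, so a small lookup helper makes failures easier to log. The sketch below is illustrative only: PlumeraiStatusName is a hypothetical helper that is not part of the library, and the StatusCode values are copied from the full header below so that the snippet compiles on its own. In a real project, include people_detection_micro.h instead of redefining them.

```c
// Values copied from the StatusCode enum in people_detection_micro.h.
// In a real project, include that header instead of redefining them.
typedef enum StatusCode_ {
  SUCCESS = 0,
  ALLOCATION_ERROR = -1,
  INVOKE_ERROR = -2,
  OUTPUT_DIMS_ERROR = -3,
  REGISTRATION_ERROR = -4,
} StatusCode;

typedef int StatusCodeType;

// Hypothetical helper (not part of the Plumerai API): returns a readable
// name for a status code, e.g. for logging after a failed call.
const char* PlumeraiStatusName(StatusCodeType code) {
  switch (code) {
    case SUCCESS:            return "SUCCESS";
    case ALLOCATION_ERROR:   return "ALLOCATION_ERROR (check arena size)";
    case INVOKE_ERROR:       return "INVOKE_ERROR";
    case OUTPUT_DIMS_ERROR:  return "OUTPUT_DIMS_ERROR";
    case REGISTRATION_ERROR: return "REGISTRATION_ERROR";
    default:                 return "UNKNOWN";
  }
}
```

For example, after a failed PeopleDetectionInit call, the application could print PlumeraiStatusName(error_code) with its own logging function.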

PeopleDetectionInitSplit

StatusCodeType PeopleDetectionInitSplit(
    unsigned char* persistent_tensor_arena,
    unsigned char* non_persistent_tensor_arena);

Initializes the people detection algorithm. As above, but splits the tensor arena into two buffers so that the non-persistent one can be reused as scratch space between calls to PeopleDetectionProcessFrame.

Arguments

persistent_tensor_arena unsigned char*: A pointer to a user-allocated contiguous memory region that stores persistent data. Its size should be at least TENSOR_ARENA_SIZE_PERSISTENT from model_defines.h. This memory region should not be overwritten after this function has been called.
non_persistent_tensor_arena unsigned char*: A pointer to a user-allocated contiguous memory region that stores input, output, and intermediate tensors. Its size should be at least TENSOR_ARENA_SIZE_NON_PERSISTENT from model_defines.h. The user may overwrite this memory region between (but not during) calls to PeopleDetectionProcessFrame.

Returns

StatusCodeType: A status code indicating whether an error occurred. It is either SUCCESS or ALLOCATION_ERROR (see the full header below for a description of these codes).
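One way to benefit from the split variant is to overlay the non-persistent arena with application scratch memory in a C union, so the same RAM serves both purposes. This is a minimal sketch assuming the sizes from the example model_defines.h further below; AppScratch and its jpeg_buffer field are hypothetical placeholders for whatever data the application needs between frames, and __attribute__((aligned(16))) assumes a GCC or Clang toolchain, as in the usage example below.

```c
#include <stddef.h>

// Sizes assumed from the example model_defines.h; include the real header
// in practice, since the values differ per library build.
#define TENSOR_ARENA_SIZE_PERSISTENT 34304
#define TENSOR_ARENA_SIZE_NON_PERSISTENT 344128

// Hypothetical application data that is only needed between calls to
// PeopleDetectionProcessFrame (e.g. a buffer for encoding a snapshot).
typedef struct {
  unsigned char jpeg_buffer[64 * 1024];
} AppScratch;

// Reusing the arena only saves RAM if the scratch data fits inside it.
_Static_assert(sizeof(AppScratch) <= TENSOR_ARENA_SIZE_NON_PERSISTENT,
               "AppScratch does not fit in the non-persistent arena");

// Must not be overwritten after PeopleDetectionInitSplit has been called.
static unsigned char persistent_arena[TENSOR_ARENA_SIZE_PERSISTENT]
    __attribute__((aligned(16)));

// The library's non-persistent arena and the application scratch share RAM.
// Use `app` only between (never during) calls to PeopleDetectionProcessFrame.
static union {
  unsigned char arena[TENSOR_ARENA_SIZE_NON_PERSISTENT];
  AppScratch app;
} shared __attribute__((aligned(16)));
```

The two buffers would then be passed as PeopleDetectionInitSplit(persistent_arena, shared.arena). Anything stored in shared.app should be treated as lost once the next PeopleDetectionProcessFrame call starts.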

PeopleDetectionCleanUp

StatusCodeType PeopleDetectionCleanUp();

Cleans up the people detection algorithm. This needs to be called only once, at the end of the application.

Returns

StatusCodeType: A status code indicating whether an error occurred. Currently, this always returns SUCCESS.

PeopleDetectionGetInputPointer

void* PeopleDetectionGetInputPointer();

Retrieves the address where the input data needs to be stored before calling PeopleDetectionProcessFrame. The camera output should be written here in the image format described under PeopleDetectionProcessFrame below. Any data at this location may be overwritten by a call to PeopleDetectionProcessFrame. The user should not write more than PLUMERAI_IMAGE_SIZE bytes (see model_defines.h) starting from this address. This function should not be used if input_ptr is used to specify the input data; see Appendix A below.

Returns

void*: A pointer to the location where the input data should be placed.

PeopleDetectionReadDataCallback

void PeopleDetectionReadDataCallback(void (*user_callback)(void* input_ptr));

As an alternative to using PeopleDetectionGetInputPointer and writing new input data between calls to PeopleDetectionProcessFrame, the user can provide a callback that retrieves input data (e.g. from a camera) in parallel with running people detection. To do so, provide a pointer to a function that takes a single argument, a pointer to the input data (like the one returned by PeopleDetectionGetInputPointer), and returns nothing. In this function, the user can, for example, start an asynchronous DMA transfer that grabs a new camera frame and places it in the input buffer. The user must wait for any such parallel operation to complete before the next call to PeopleDetectionProcessFrame.

Arguments

user_callback void func(void* input_ptr): The callback function described above. It is executed roughly halfway through each call to PeopleDetectionProcessFrame and receives the address where the input data needs to be stored. The user should not write more than PLUMERAI_IMAGE_SIZE bytes (see model_defines.h) starting from this address.

Returns

Nothing.
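A common way to implement the callback is a "capture done" flag: the callback starts the transfer and clears the flag, the transfer-complete interrupt sets it, and the main loop waits on it before the next PeopleDetectionProcessFrame call. This is a minimal sketch under those assumptions: camera_dma_start is a hypothetical driver call left as a comment, and camera_dma_complete_isr is a hypothetical stand-in for the platform's DMA-complete interrupt handler.

```c
#include <stdbool.h>
#include <stddef.h>

// Set by the (hypothetical) DMA-complete interrupt, cleared when a capture
// starts. `volatile` because it is modified from interrupt context.
static volatile bool capture_done = true;
static void* capture_target = NULL;

// Passed to PeopleDetectionReadDataCallback(). It runs roughly halfway
// through PeopleDetectionProcessFrame, so it should return quickly.
void start_capture_callback(void* input_ptr) {
  capture_target = input_ptr;
  capture_done = false;
  // camera_dma_start(input_ptr, PLUMERAI_IMAGE_SIZE);  // hypothetical call
}

// Hypothetical DMA-complete interrupt handler provided by the platform.
void camera_dma_complete_isr(void) { capture_done = true; }

// Call before each PeopleDetectionProcessFrame() so the frame is complete.
void wait_for_capture(void) {
  while (!capture_done) {
    // Busy-wait; a low-power wait (e.g. CMSIS __WFI()) could be used instead.
  }
}
```

Registration would be PeopleDetectionReadDataCallback(start_capture_callback), with wait_for_capture() called at the top of each loop iteration.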

PeopleDetectionProcessFrame

StatusCodeType PeopleDetectionProcessFrame(BoxPrediction* results,
                                           int results_length,
                                           int* num_results_returned,
                                           float delta_t,
                                           const unsigned char* input_ptr);

Processes a single frame from a video sequence with RGB input. This processes RGB image data (1st byte red, 3rd blue) of size PLUMERAI_IMAGE_HEIGHT * PLUMERAI_IMAGE_WIDTH * BYTES_PER_PIXEL (see model_defines.h) in unsigned RGB888 or RGB565 format, found at the location returned by PeopleDetectionGetInputPointer or set by input_ptr; see Appendix A below.
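If the camera delivers RGB565 but the library was built for RGB888 input (as in the example model_defines.h below, with 3 bytes per pixel), each pixel has to be expanded. This is a minimal sketch assuming the RGB565 pixels are already in native 16-bit order (some cameras send them byte-swapped); rgb565_to_rgb888 is a hypothetical helper, not a library function. It writes red first and blue third, matching the byte order described above.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: expands `num_pixels` RGB565 pixels into RGB888 bytes
// (R, G, B per pixel). `dst` must hold 3 * num_pixels bytes, e.g. the buffer
// returned by PeopleDetectionGetInputPointer() with num_pixels equal to
// PLUMERAI_IMAGE_WIDTH * PLUMERAI_IMAGE_HEIGHT. src and dst must not overlap.
void rgb565_to_rgb888(const uint16_t* src, unsigned char* dst,
                      size_t num_pixels) {
  for (size_t i = 0; i < num_pixels; ++i) {
    uint16_t p = src[i];
    unsigned r5 = (p >> 11) & 0x1F;
    unsigned g6 = (p >> 5) & 0x3F;
    unsigned b5 = p & 0x1F;
    // Replicate the high bits into the low bits so 0x1F maps to 255, not 248.
    dst[3 * i + 0] = (unsigned char)((r5 << 3) | (r5 >> 2));
    dst[3 * i + 1] = (unsigned char)((g6 << 2) | (g6 >> 4));
    dst[3 * i + 2] = (unsigned char)((b5 << 3) | (b5 >> 2));
  }
}
```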

Arguments

results BoxPrediction*: A pointer to an array in which to store the resulting boxes. The user needs to allocate this array. The recommended size is 20 elements; if the user allocates fewer, fewer boxes are returned. If fewer boxes are detected, fewer are returned as well. The number of returned boxes is given by num_results_returned, see below.
results_length int: The number of BoxPrediction elements the user allocated for the results array above.
num_results_returned int*: Set to the minimum of the number of bounding boxes found in the image and results_length. Elements of results beyond this count are filled with zeros. If this value equals results_length, more boxes may have been found than could be returned.
delta_t float: The time in seconds between this call and the previous call to PeopleDetectionProcessFrame (1/fps). For the best results, this should be as accurate as possible, e.g. measured with a timer.
input_ptr const unsigned char*: As an alternative to PeopleDetectionGetInputPointer, this can optionally be used to set the input data. In most situations, namely when PeopleDetectionGetInputPointer is used, it needs to be set to nullptr (C++) or NULL (C). See Appendix A below for more information.

Returns

StatusCodeType: A status code indicating whether an error occurred. It is one of SUCCESS, INVOKE_ERROR, or OUTPUT_DIMS_ERROR (see the full header below for a description of these codes).
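The delta_t argument is typically derived from a millisecond tick counter sampled once per frame. This is a minimal sketch assuming a free-running 32-bit millisecond counter (for example, a HAL tick or a hardware timer; how it is read is platform-specific and not shown). delta_t_seconds is a hypothetical helper. Unsigned subtraction keeps the result correct across a single counter wrap-around.

```c
#include <stdint.h>

// Hypothetical helper: converts two samples of a free-running 32-bit
// millisecond counter into the delta_t argument (in seconds) of
// PeopleDetectionProcessFrame. Unsigned arithmetic handles one wrap-around.
float delta_t_seconds(uint32_t prev_ms, uint32_t now_ms) {
  uint32_t elapsed_ms = now_ms - prev_ms;  // computed modulo 2^32
  return (float)elapsed_ms / 1000.0f;
}
```

In the main loop, the counter would be sampled right before each call, delta_t_seconds(prev_ms, now_ms) passed as delta_t, and prev_ms updated to now_ms afterwards.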

BoxPrediction

typedef struct BoxPrediction {
  float y_min;       // top coordinate between 0 and 1 in height dimension
  float x_min;       // left coordinate between 0 and 1 in width dimension
  float y_max;       // bottom coordinate between 0 and 1 in height dimension
  float x_max;       // right coordinate between 0 and 1 in width dimension
  float confidence;  // between 0 and 1, higher means more confident
  unsigned int id;   // the tracked identifier of this box
  DetectionClass class_id;  // the class of the detected object
} BoxPrediction;

An output structure representing a single resulting bounding box. Coordinates are normalized to the range 0 to 1, with the origin at the top-left corner of the image.
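To draw or crop a detection, the normalized coordinates are scaled by the image dimensions. This is a minimal sketch: NormalizedBox is a hypothetical stand-in holding only the four coordinate fields of BoxPrediction, so that the code compiles without the library header, and PixelBox and box_to_pixels are likewise illustrative names.

```c
// Hypothetical stand-in for the coordinate fields of BoxPrediction.
typedef struct {
  float y_min, x_min, y_max, x_max;  // normalized to [0, 1], origin top-left
} NormalizedBox;

typedef struct {
  int x_min, y_min, x_max, y_max;  // pixel coordinates
} PixelBox;

// Scales a normalized coordinate to pixels, rounding and clamping to [0, size].
static int scale_coord(float v, int size) {
  int px = (int)(v * (float)size + 0.5f);
  if (px < 0) return 0;
  if (px > size) return size;
  return px;
}

// Hypothetical helper: converts a normalized box to pixel coordinates for an
// image of width x height pixels (e.g. PLUMERAI_IMAGE_WIDTH/HEIGHT).
PixelBox box_to_pixels(const NormalizedBox* b, int width, int height) {
  PixelBox p;
  p.x_min = scale_coord(b->x_min, width);
  p.y_min = scale_coord(b->y_min, height);
  p.x_max = scale_coord(b->x_max, width);
  p.y_max = scale_coord(b->y_max, height);
  return p;
}
```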

Full people_detection_micro.h header

// Most functions in this API return a status-code to indicate whether
// everything went well. If not, see below for more information about the error
// code.
typedef enum StatusCode_ {
  // Everything went all right
  SUCCESS = 0,

  // Memory allocation failure, check arena size
  ALLOCATION_ERROR = -1,

  // Unexpected error, contact Plumerai
  INVOKE_ERROR = -2,

  // Unexpected error, contact Plumerai
  OUTPUT_DIMS_ERROR = -3,

  // Unexpected error, contact Plumerai
  REGISTRATION_ERROR = -4,
} StatusCode;

typedef int StatusCodeType;

// Initializes the people detection algorithm. This needs to be called only
// once at the start of the application.
//
// @param tensor_arena A pointer to a user-allocated contiguous memory region
//  to store persistent, input, output, and intermediate tensors. The size
//  should be at least the value given by `TENSOR_ARENA_SIZE` in
//  `model_defines.h`. This memory region should not be overwritten after this
//  function has been called. See below for a version that splits the tensor
//  arena into persistent and non-persistent storage.
// @return a status code indicating whether there was an error, either
//  `SUCCESS` or `ALLOCATION_ERROR` (see above for a description).
StatusCodeType PeopleDetectionInit(unsigned char* tensor_arena);

// Initializes the people detection algorithm. As above, but splits the tensor
// arena into two buffers so that the non-persistent one can be reused as
// scratch space.
//
// @param persistent_tensor_arena A pointer to a user-allocated contiguous
//  memory region to store persistent data. The size should be at least
//  `TENSOR_ARENA_SIZE_PERSISTENT` from `model_defines.h`. This memory region
//  should not be overwritten after this function has been called.
// @param non_persistent_tensor_arena A pointer to a user-allocated contiguous
//  memory region to store input, output, and intermediate tensors. The size
//  should be at least `TENSOR_ARENA_SIZE_NON_PERSISTENT` from
//  `model_defines.h`. The user may overwrite this memory region between (but
//  not during) calls to `PeopleDetectionProcessFrame`.
// @return a status code indicating whether there was an error, either
//  `SUCCESS` or `ALLOCATION_ERROR` (see above for a description).
StatusCodeType PeopleDetectionInitSplit(
    unsigned char* persistent_tensor_arena,
    unsigned char* non_persistent_tensor_arena);

// Cleans up the people detection algorithm. This needs to be called only once,
// at the end of the application.
//
// @return a status code indicating whether there was an error. Currently this
//  always returns `SUCCESS`.
StatusCodeType PeopleDetectionCleanUp();

// Retrieves the address where the input data needs to be stored before
// calling `PeopleDetectionProcessFrame`. The camera output should be stored
// here in the image format described at `PeopleDetectionProcessFrame`. Any
// data at this location may be overwritten by a call to
// `PeopleDetectionProcessFrame`. The user should not write more than
// `PLUMERAI_IMAGE_SIZE` bytes (see `model_defines.h`) starting from this
// address.
// This function should not be used if `input_ptr` is used to specify the input
// data, see appendix A below.
//
// @return a pointer to the location where the input data should be placed.
void* PeopleDetectionGetInputPointer();

// As an alternative to using `PeopleDetectionGetInputPointer` and writing new
// input data between calls to `PeopleDetectionProcessFrame`, the user can
// provide a callback that retrieves input data (e.g. from a camera) in
// parallel with running people detection. To do so, provide a pointer to a
// function that takes a single argument, a pointer to the input data (like the
// one returned by `PeopleDetectionGetInputPointer`), and returns nothing. In
// this function the user can, for example, start an asynchronous DMA transfer
// that grabs a new camera frame and places it in the input buffer. The user
// must wait for any such parallel operation to complete before the next call
// to `PeopleDetectionProcessFrame`.
//
// @param user_callback The callback function as described above. It is
//  executed roughly halfway through each call to
//  `PeopleDetectionProcessFrame` and receives the address where the input
//  data needs to be stored. The user should not write more than
//  `PLUMERAI_IMAGE_SIZE` bytes (see `model_defines.h`) starting from this
//  address.
void PeopleDetectionReadDataCallback(void (*user_callback)(void* input_ptr));

// Processes a single frame from a video sequence with RGB input. This
// processes RGB image data (1st byte red, 3rd blue) of size
// `PLUMERAI_IMAGE_HEIGHT * PLUMERAI_IMAGE_WIDTH * BYTES_PER_PIXEL` (see
// `model_defines.h`) in unsigned RGB888 or RGB565 format, found at the location
// returned by `PeopleDetectionGetInputPointer` or set by `input_ptr`, see
// appendix A below.
//
// @param results A pointer to an array to store the resulting boxes in. The
//  user needs to allocate this array. The recommended size is 20 elements; if
//  the user allocates fewer, fewer boxes are returned. If fewer boxes are
//  detected, fewer are returned as well. The number of returned boxes is given
//  by `num_results_returned`, see below.
// @param results_length The number of `BoxPrediction` elements the user
//  allocated for the `results` array above.
// @param num_results_returned Set to the minimum of the number of bounding
//  boxes found in the image and `results_length`. Elements of `results` beyond
//  this count are filled with zeros. If this value equals `results_length`,
//  more boxes may have been found than could be returned.
// @param delta_t The time in seconds between this call and the previous call
//  to `PeopleDetectionProcessFrame` (1/fps). For the best results, this should
//  be as accurate as possible, e.g. measured with a timer.
// @param input_ptr As an alternative to `PeopleDetectionGetInputPointer`, this
//  can optionally be used to set the input data. In most situations, namely
//  when `PeopleDetectionGetInputPointer` is used, it needs to be set to
//  `nullptr` (C++) or `NULL` (C). See appendix A below for more information.
// @return a status code indicating whether there was an error, one of
//  `SUCCESS`, `INVOKE_ERROR`, or `OUTPUT_DIMS_ERROR` (see above for a
//  description).
StatusCodeType PeopleDetectionProcessFrame(BoxPrediction* results,
                                           int results_length,
                                           int* num_results_returned,
                                           float delta_t,
                                           const unsigned char* input_ptr);

Appendix A: Two ways to provide input

The Plumerai People Detection API for microcontrollers provides two ways to set the input: writing it to the location returned by PeopleDetectionGetInputPointer, or passing it through the input_ptr argument of PeopleDetectionProcessFrame. However, to optimize run-time latency and memory requirements, this choice is not flexible: it is fixed when the library is built. By default, input should be written to the location returned by PeopleDetectionGetInputPointer and input_ptr should not be used, unless Plumerai indicated otherwise when providing the library. If the other option suits your use case better, please contact Plumerai.

Example model_defines.h header

The following is an example of the model_defines.h header. This header may differ depending on the input resolution, the software version, and the target platform.

// The total required tensor arena size, the sum of the two components below.
#define TENSOR_ARENA_SIZE 378448

// The required persistent tensor arena size. Anything in this section should
// not be overwritten.
#define TENSOR_ARENA_SIZE_PERSISTENT 34304

// The required non-persistent tensor arena size. This can be freely accessed
// in between model invocations.
#define TENSOR_ARENA_SIZE_NON_PERSISTENT 344128

// The Plumerai People Detection algorithm requires a fixed input resolution.
// It uses the RGB888 data format with 3 bytes per pixel.
#define PLUMERAI_IMAGE_WIDTH 320   // The width of the input image in pixels.
#define PLUMERAI_IMAGE_HEIGHT 240  // The height of the input image in pixels.
#define PLUMERAI_IMAGE_SIZE (PLUMERAI_IMAGE_WIDTH * PLUMERAI_IMAGE_HEIGHT * 3)
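Because these values differ per library build, compile-time checks can catch a mismatch between model_defines.h and the rest of the application early. This is a minimal sketch using C11 _Static_assert; the defines are copied from the example above so that it compiles on its own, and CAMERA_FRAME_BYTES is a hypothetical constant standing in for whatever frame size the camera driver is configured to deliver.

```c
// Example values copied from the model_defines.h shown above. In a real
// project, include "plumerai/model_defines.h" instead.
#define TENSOR_ARENA_SIZE 378448
#define TENSOR_ARENA_SIZE_PERSISTENT 34304
#define TENSOR_ARENA_SIZE_NON_PERSISTENT 344128
#define PLUMERAI_IMAGE_WIDTH 320
#define PLUMERAI_IMAGE_HEIGHT 240
#define PLUMERAI_IMAGE_SIZE (PLUMERAI_IMAGE_WIDTH * PLUMERAI_IMAGE_HEIGHT * 3)

// Hypothetical: the frame size the camera driver is configured to deliver.
#define CAMERA_FRAME_BYTES (320 * 240 * 3)

// The single arena should cover both parts used by PeopleDetectionInitSplit.
_Static_assert(TENSOR_ARENA_SIZE >=
                   TENSOR_ARENA_SIZE_PERSISTENT + TENSOR_ARENA_SIZE_NON_PERSISTENT,
               "tensor arena smaller than its persistent + non-persistent parts");

// The camera should produce exactly one full input image per frame.
_Static_assert(CAMERA_FRAME_BYTES == PLUMERAI_IMAGE_SIZE,
               "camera frame size does not match the model input size");
```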

Example usage

Below is an example of using the C API shown above.

#include "plumerai/model_defines.h"
#include "plumerai/people_detection_micro.h"

// Here <cstdio> is used for `printf`, but this can be replaced with any other
// method of printing results, if needed.
#include <cstdio>

// Example tensor arena. Can be allocated on a specific memory region if
// desired. It needs to be of size TENSOR_ARENA_SIZE.
__attribute__((aligned(16))) unsigned char tensor_arena[TENSOR_ARENA_SIZE];

void mainloop() {
  // This initializes the people detector
  int error_code = PeopleDetectionInit(tensor_arena);
  if (error_code != 0) return;
  auto input_ptr =
      reinterpret_cast<unsigned char*>(PeopleDetectionGetInputPointer());

  // We pre-allocate memory for the results: at most 10 boxes are returned.
  BoxPrediction predictions[10];

  // Loop over frames in a video stream. In this example we run only 10 times.
  for (int t = 0; t < 10; ++t) {
    // Some example input here, normally this is where image data is acquired.
    // We can pass the `input_ptr` value directly to our camera API.
    // In this example here we simply assign some values for fake input data.
    for (int i = 0; i < PLUMERAI_IMAGE_SIZE; ++i) {
      input_ptr[i] = static_cast<unsigned char>(i % 256);
    }

    // Compute delta_t in seconds between the current call and the previous
    // call to `PeopleDetectionProcessFrame`. Here we just set it to a fixed
    // value.
    const auto delta_t_s = 0.5f;  // 2 FPS

    // Process a single input image frame from e.g. a camera video stream. The
    // result will be stored in `predictions`, and `num_results` will tell us
    // how many results there are.
    int num_results = 0;
    error_code = PeopleDetectionProcessFrame(predictions, 10, &num_results,
                                             delta_t_s, nullptr);
    if (error_code != 0) return;

    // Write the results to a terminal. Replace this with your own printing
    // function if needed.
    printf("Detected %d people\n", num_results);
    for (int i = 0; i < num_results; ++i) {
      BoxPrediction p = predictions[i];
      printf(
          "Detected person with confidence %.2f @ (x,y) -> (%.2f,%.2f) till "
          "(%.2f,%.2f)\n",
          p.confidence, p.x_min, p.y_min, p.x_max, p.y_max);
    }
  }

  // Clean-up
  error_code = PeopleDetectionCleanUp();
  if (error_code != 0) return;
}

For advanced use with a callback, e.g. to capture camera data in parallel with inference, the example above can be adjusted slightly. The first part of the mainloop function then becomes:

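// Camera helpers, implemented further below.
void start_camera_capture(void *input_ptr);
void finalize_camera_capture();

void mainloop() {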
  // This initializes the people detector
  int error_code = PeopleDetectionInit(tensor_arena);
  if (error_code != 0) return;

  // This sets the read-data callback, which can be used to read in camera data
  // in parallel with running inference, e.g. through a DMA. This callback is
  // executed roughly halfway through each call to
  // `PeopleDetectionProcessFrame`.
  PeopleDetectionReadDataCallback(start_camera_capture);

  // We pre-allocate memory for the results: at most 10 boxes are returned.
  BoxPrediction predictions[10];

  // Since the callback above is only executed from within
  // `PeopleDetectionProcessFrame`, we need to acquire the first camera input
  // manually before entering the loop.
  auto input_ptr =
      reinterpret_cast<unsigned char *>(PeopleDetectionGetInputPointer());
  start_camera_capture(input_ptr);

  // Loop over frames in a video stream. In this example we run only 10 times.
  for (int t = 0; t < 10; ++t) {
    // Before we start 'PeopleDetectionProcessFrame' we want to make sure
    // the camera data is fully captured.
    finalize_camera_capture();

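It then continues as in the first example, starting from the computation of delta_t_s: setting int num_results = 0;, making the call to PeopleDetectionProcessFrame, and everything that follows. The loop that writes fake input data is no longer needed. The two new functions should then be implemented for the specific camera set-up, e.g.: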

void start_camera_capture(void *input_ptr) {
  // Here we would normally ask the camera to start recording data and store it
  // into the 'input_ptr'. However, since this differs per camera, we leave this
  // unimplemented here. Ideally this function finishes almost instantly, but
  // leaves some parallel process working in the background (e.g. a DMA). We
  // should not write beyond 'PLUMERAI_IMAGE_SIZE' bytes from 'input_ptr'.
}

void finalize_camera_capture() {
  // If the 'start_camera_capture' function above started some parallel process
  // then it can be synchronized here to make sure the camera capture process
  // is completed. If it isn't completed yet, we can wait here in this function.
}