Plumerai People Detection C API

This document describes the C API for the Plumerai People Detection software for videos on Arm Cortex-A and x86.

The API

The C API wraps the C++ API and mirrors it as closely as possible. Because C has no classes, the API uses a CPeopleDetection object that is passed to the functions that operate on it. Usage is straightforward: call PeopleDetectionInit once at the start, PeopleDetectionProcessFrame on each input frame, and PeopleDetectionCleanUp once at the end. In addition, PeopleDetectionSingleImage processes a single image independently of any video sequence.

The API is re-entrant, i.e. you can initialize several people detection objects in different threads and use them independently. However, using the same instance from different threads at the same time is not supported.

PeopleDetectionInit

CPeopleDetection PeopleDetectionInit(int height, int width)

Initializes a new people detection object. This needs to be called only once at the start of the application.

Arguments

height int: The height of the input image in pixels.
width int: The width of the input image in pixels.

Returns

CPeopleDetection: The resulting initialized object.

PeopleDetectionCleanUp

void PeopleDetectionCleanUp(CPeopleDetection ppd)

Destroys the people detection object. Call this once at the very end, after all processing with the object is done, to clean it up.

Arguments

ppd CPeopleDetection: An initialized CPeopleDetection object.

Returns

Nothing.

PeopleDetectionProcessFrame

PlumeraiErrorCode PeopleDetectionProcessFrame(
    CPeopleDetection ppd, const unsigned char *image_data,
    BoxPrediction *results, int max_results_length, int *num_results,
    float delta_t);
PlumeraiErrorCode PeopleDetectionProcessFrameYUYV(
    CPeopleDetection ppd, const unsigned char *image_data,
    BoxPrediction *results, int max_results_length, int *num_results,
    float delta_t);
PlumeraiErrorCode PeopleDetectionProcessFrameYUV420(
    CPeopleDetection ppd, const unsigned char *image_y,
    const unsigned char *image_u, const unsigned char *image_v,
    BoxPrediction *results, int max_results_length, int *num_results,
    float delta_t);

Process a single frame from a video sequence.

The first version supports RGB input. The second version supports YUYV input. The third version supports planar YUV input with 420 chroma subsampling.

Make sure the image is right side up. When it is upside down it can still work but accuracy is significantly degraded.

Note that the algorithm comes with a built-in threshold (e.g. 0.6 - this differs per model): boxes with confidences lower than that value won't be produced at all by this function.

Arguments

ppd CPeopleDetection: An initialized CPeopleDetection object.
image_data const unsigned char *: A pointer to RGB image data (1st byte red, 3rd blue) of size height * width * 3 or YUYV image data of size height * width * 2.
image_y const unsigned char *: A pointer to the Y channel, of size height * width.
image_u const unsigned char *: A pointer to the U channel, of size height * width / 4.
image_v const unsigned char *: A pointer to the V channel, of size height * width / 4.
results BoxPrediction *: A pointer to an array to store the resulting boxes in. The user needs to allocate space for this structure. The recommended size is 20, but if the user allocates less, then fewer boxes are returned. If fewer boxes are detected, then also fewer are returned. See num_results.
max_results_length int: The number of BoxPrediction elements allocated in the provided results parameter above. See that parameter for more info.
num_results int *: A pointer to an integer that receives the number of resulting bounding boxes. This is the minimum of the number of bounding boxes detected in the image and the max_results_length parameter. Elements of results beyond this count are filled with zeros. If this value equals max_results_length, more boxes may have been detected than could be returned.
delta_t float: The time in seconds it took between this and the previous video frame (1/fps). If set to 0 then the system clock will be used to compute this value.

Returns

PlumeraiErrorCode: Returns PLUMERAI_SUCCESS on success. See PlumeraiErrorCode for other possible return values.
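
The buffer sizes listed under Arguments can be computed as in the following minimal sketch. The helper names are hypothetical and not part of the Plumerai API; the formulas are taken directly from the argument descriptions above, and the sketch does not address how dimensions that are not multiples of two should be handled.

```c
#include <stddef.h>

// Hypothetical helpers (not part of the Plumerai API) that compute the
// input buffer sizes in bytes from the argument descriptions.

// Packed RGB: 3 bytes per pixel.
size_t rgb_frame_bytes(int height, int width) {
  return (size_t)height * (size_t)width * 3;
}

// Packed YUYV: 2 bytes per pixel.
size_t yuyv_frame_bytes(int height, int width) {
  return (size_t)height * (size_t)width * 2;
}

// Planar YUV 4:2:0: full-resolution Y plane.
size_t yuv420_y_bytes(int height, int width) {
  return (size_t)height * (size_t)width;
}

// Planar YUV 4:2:0: each of the U and V planes is a quarter of the Y plane.
size_t yuv420_chroma_bytes(int height, int width) {
  return (size_t)height * (size_t)width / 4;
}
```

Casting to size_t before multiplying avoids int overflow for large frames. For a 1600x1200 frame, the RGB buffer is 5,760,000 bytes and each chroma plane is 480,000 bytes.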

PeopleDetectionSingleImage

PlumeraiErrorCode PeopleDetectionSingleImage(
    CPeopleDetection ppd, const unsigned char *image_data,
    BoxPrediction *results, int max_results_length, int *num_results);
PlumeraiErrorCode PeopleDetectionSingleImageYUYV(
    CPeopleDetection ppd, const unsigned char *image_data,
    BoxPrediction *results, int max_results_length, int *num_results);
PlumeraiErrorCode PeopleDetectionSingleImageYUV420(
    CPeopleDetection ppd, const unsigned char *image_y,
    const unsigned char *image_u, const unsigned char *image_v,
    BoxPrediction *results, int max_results_length, int *num_results);

Process a single image not part of a video sequence.

The first version supports RGB input. The second version supports YUYV input. The third version supports planar YUV input with 420 chroma subsampling.

This should not be used for video data. It can be used for face enrollments from a set of images. The returned box id values are not related to those returned by PeopleDetectionProcessFrame or other calls to PeopleDetectionSingleImage.

Arguments

ppd CPeopleDetection: An initialized CPeopleDetection object.
image_data const unsigned char *: A pointer to RGB image data (1st byte red, 3rd blue) of size height * width * 3 or YUYV image data of size height * width * 2.
image_y const unsigned char *: A pointer to the Y channel, of size height * width.
image_u const unsigned char *: A pointer to the U channel, of size height * width / 4.
image_v const unsigned char *: A pointer to the V channel, of size height * width / 4.
results BoxPrediction *: A pointer to an array to store the resulting boxes in. The user needs to allocate space for this structure. The recommended size is 20, but if the user allocates less, then fewer boxes are returned. If fewer boxes are detected, then also fewer are returned. See num_results.
max_results_length int: The number of BoxPrediction elements allocated in the provided results parameter above. See that parameter for more info.
num_results int *: A pointer to an integer that receives the number of resulting bounding boxes. This is the minimum of the number of bounding boxes detected in the image and the max_results_length parameter. Elements of results beyond this count are filled with zeros. If this value equals max_results_length, more boxes may have been detected than could be returned.

Returns

PlumeraiErrorCode: Returns PLUMERAI_SUCCESS on success. See PlumeraiErrorCode for other possible return values.

PeopleDetectionStoreState

PlumeraiErrorCode PeopleDetectionStoreState(CPeopleDetection ppd,
                                            unsigned char **state_data,
                                            size_t *state_size);

Store the current state of the algorithm to a byte array.

This function can be used when processing a video in chunks, with different chunks processed at different times or on different machines. The state can be restored by calling PeopleDetectionRestoreState with the data returned by PeopleDetectionStoreState. When the library is built with support for familiar face identification, the state includes the face library.

Constraints:

The delta_t parameter of PeopleDetectionProcessFrame cannot be left at zero after restoring a previous state.
If familiar face identification is enabled, the state can only be stored and restored when not enrolling.
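
Because delta_t cannot be left at zero after a restore, the caller has to supply frame intervals itself. The sketch below shows one possible way to derive them from frame timestamps. The helper name, the millisecond timestamp unit, and the fallback behaviour are all assumptions made for illustration and are not part of the Plumerai API.

```c
#include <stdint.h>

// Hypothetical helper (not part of the Plumerai API): seconds elapsed between
// two frame timestamps given in milliseconds. If the timestamps do not
// advance, `fallback_s` is returned instead so the result is never zero
// (provided the caller passes a positive fallback, e.g. 1/fps).
float frame_delta_t(uint64_t prev_ms, uint64_t now_ms, float fallback_s) {
  if (now_ms <= prev_ms) {
    return fallback_s;
  }
  return (float)(now_ms - prev_ms) / 1000.0f;
}
```

The returned value would then be passed as the delta_t argument of PeopleDetectionProcessFrame. Which interval best represents the gap between the last frame before storing and the first frame after restoring is not specified here and is left to the application.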

Arguments

ppd CPeopleDetection: An initialized CPeopleDetection object.
state_data unsigned char**: A pointer that receives the address of the serialized data. The caller must free the returned data.
state_size size_t *: A pointer that receives the size of the serialized data.

Returns

PlumeraiErrorCode: Returns PLUMERAI_SUCCESS on success.
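
To move a stored state to a later run or another machine, the byte array has to be persisted somewhere. The following is a minimal sketch of writing it to a file with standard C I/O. The helper name and the use of a file are assumptions; the API itself only produces the byte array.

```c
#include <stddef.h>
#include <stdio.h>

// Hypothetical helper (not part of the Plumerai API): write `size` bytes from
// `data` to the file at `path`. Returns 0 on success, -1 on failure.
int write_state_file(const char *path, const unsigned char *data,
                     size_t size) {
  FILE *f = fopen(path, "wb");
  if (f == NULL) {
    return -1;
  }
  size_t written = fwrite(data, 1, size, f);
  int close_failed = fclose(f);  // fclose flushes; a failure means data loss
  return (written == size && close_failed == 0) ? 0 : -1;
}
```

After a successful call to PeopleDetectionStoreState, the received state_data and state_size would be passed to this helper, and the data would be freed afterwards as required above.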

PeopleDetectionRestoreState

PlumeraiErrorCode PeopleDetectionRestoreState(
    CPeopleDetection ppd, const unsigned char *state_data, size_t state_size);

Restore the state of the algorithm from a byte array.

See PeopleDetectionStoreState for more information. The user must ensure that the height and width of the current object match the height and width of the state that is being restored.

Arguments

ppd CPeopleDetection: An initialized CPeopleDetection object.
state_data const unsigned char*: A pointer to the serialized state.
state_size size_t: The size of the data.

Returns

PlumeraiErrorCode: Returns PLUMERAI_SUCCESS on success. Returns PLUMERAI_STATE_CORRUPT or PLUMERAI_STATE_SETTINGS_MISMATCH on error.
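
If the serialized state was persisted to a file, it has to be read back into memory before it can be restored. The sketch below does this with standard C I/O. The helper name and the file-based storage are assumptions for illustration and are not part of the Plumerai API.

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper (not part of the Plumerai API): read a whole file into
// a malloc'ed buffer. On success, returns the buffer and stores its length in
// *size_out; the caller releases the buffer with free(). Returns NULL if the
// file cannot be read or is empty.
unsigned char *read_state_file(const char *path, size_t *size_out) {
  FILE *f = fopen(path, "rb");
  if (f == NULL) {
    return NULL;
  }

  // Determine the file size by seeking to the end and back.
  long len = -1;
  if (fseek(f, 0, SEEK_END) == 0) {
    len = ftell(f);
  }
  if (len <= 0 || fseek(f, 0, SEEK_SET) != 0) {
    fclose(f);
    return NULL;
  }

  unsigned char *buf = (unsigned char *)malloc((size_t)len);
  if (buf != NULL && fread(buf, 1, (size_t)len, f) != (size_t)len) {
    free(buf);  // short read: discard the partial data
    buf = NULL;
  }
  fclose(f);

  if (buf != NULL) {
    *size_out = (size_t)len;
  }
  return buf;
}
```

The returned buffer and size would then be passed as state_data and state_size to PeopleDetectionRestoreState, whose return value should still be checked for PLUMERAI_STATE_CORRUPT and PLUMERAI_STATE_SETTINGS_MISMATCH.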

PeopleDetectionGetMotionDetectionGrid

PlumeraiErrorCode PeopleDetectionGetMotionDetectionGrid(CPeopleDetection ppd,
    float *motion_detection_grid);

int PeopleDetectionGetMotionDetectionGridHeight(CPeopleDetection ppd);
int PeopleDetectionGetMotionDetectionGridWidth(CPeopleDetection ppd);

Retrieves the amount of motion found in each grid cell of the frame.

As a by-product of people detection, motion detection is performed. For specific use cases, it can be useful to access this raw motion detection information as well. This function provides that access. It can only be called while Plumerai People Detection is also running, and the grid is not available during the first few frames.

The result of this function is a 2D grid of type float, with dimensions that can be retrieved with PeopleDetectionGetMotionDetectionGridHeight() and PeopleDetectionGetMotionDetectionGridWidth(). The values in each grid cell are floats between 0.0 and 1.0, and denote how much motion was detected in that grid cell. A higher value indicates more motion.

Arguments

ppd CPeopleDetection: An initialized CPeopleDetection object.
motion_detection_grid float*: A pointer to a float buffer in which the result will be stored. The user needs to allocate space before calling this function. The size should be sizeof(float) * PeopleDetectionGetMotionDetectionGridHeight(ppd) * PeopleDetectionGetMotionDetectionGridWidth(ppd).

Returns

PlumeraiErrorCode: Returns PLUMERAI_SUCCESS on success, or PLUMERAI_MOTION_GRID_NOT_YET_READY if this function is called too soon after initialization of the people detection object. It needs to process at least a few frames before the motion grid is valid.
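
Once the grid has been retrieved, one simple way to use it is to count the cells whose motion value exceeds some level. The sketch below does that. The helper name and the threshold are assumptions chosen by the application and are not part of the Plumerai API. Only the total number of cells matters here, so no assumption is made about the memory layout of the grid.

```c
// Hypothetical helper (not part of the Plumerai API): count the grid cells
// whose motion value is strictly greater than `threshold`. `grid` holds
// grid_height * grid_width floats between 0.0 and 1.0.
int count_motion_cells(const float *grid, int grid_height, int grid_width,
                       float threshold) {
  int count = 0;
  for (int i = 0; i < grid_height * grid_width; ++i) {
    if (grid[i] > threshold) {
      ++count;
    }
  }
  return count;
}
```

The grid dimensions would come from PeopleDetectionGetMotionDetectionGridHeight and PeopleDetectionGetMotionDetectionGridWidth, and the helper should only be called after PeopleDetectionGetMotionDetectionGrid has returned PLUMERAI_SUCCESS.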

PeopleDetectionDebugNextFrame

PlumeraiErrorCode PeopleDetectionDebugNextFrame(CPeopleDetection ppd,
                                                const char *output_file_name);

Enable debug mode for the next frame. The next time a video frame is processed, this will dump the input image as well as internal data and final results to a file. This file can then be shared with Plumerai support for further analysis. The file will be overwritten if it already exists, so to debug multiple frames, distinct filenames have to be used in successive calls to this function. Warning: these files contain uncompressed image data and can become large.

Arguments

ppd CPeopleDetection: An initialized CPeopleDetection object.
output_file_name const char *: A filename to write the data to.

Returns

PlumeraiErrorCode: Returns PLUMERAI_SUCCESS on success. It might return PLUMERAI_INTERNAL_ERROR if this function is called twice without calling PeopleDetectionProcessFrame, or if the file could not be opened for writing.
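
Since an existing file is overwritten, debugging several frames requires a distinct filename per call. The sketch below shows one possible naming scheme based on a frame counter. The helper name, the prefix, and the .bin extension are assumptions for illustration; no particular name or extension is prescribed by the description above.

```c
#include <stdio.h>

// Hypothetical helper (not part of the Plumerai API): build a filename of the
// form "<prefix>_<6-digit frame index>.bin" in `out`.
// Returns 0 on success, -1 if the name did not fit in `out_size` bytes.
int make_debug_file_name(char *out, size_t out_size, const char *prefix,
                         unsigned int frame_index) {
  int n = snprintf(out, out_size, "%s_%06u.bin", prefix, frame_index);
  return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

The resulting string would be passed as output_file_name to PeopleDetectionDebugNextFrame before the frame of interest is processed.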

BoxPrediction

typedef enum {
  CLASS_UNKNOWN = 0,
  CLASS_PERSON = 1,
  CLASS_HEAD = 2,
  CLASS_FACE = 3,
  CLASS_MAX_ENUM = 3,
} DetectionClass;

typedef struct BoxPrediction {
  float y_min;             // top coordinate between 0 and 1 in height dimension
  float x_min;             // left coordinate between 0 and 1 in width dimension
  float y_max;             // bottom coordinate between 0 and 1 in height dimension
  float x_max;             // right coordinate between 0 and 1 in width dimension
  float confidence;        // between 0 and 1, higher means more confident
  unsigned int id;         // the tracked identifier of this box
  DetectionClass class_id; // the class of the detected object
} BoxPrediction;

A structure representing a single resulting bounding box. Coordinates are between 0 and 1, the origin is at the top-left. Confidence values lie between 0 and 1. Note that the algorithm comes with a built-in threshold (e.g. 0.6 - this differs per model): boxes with confidences lower than that value won't be produced at all by the Plumerai People Detection functions.
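
Because the coordinates are normalized, drawing or cropping a box requires scaling them by the image size. The following minimal sketch shows one way to do this. PixelBox and both helper functions are hypothetical and not part of the Plumerai API; rounding to the nearest pixel and clamping to the image bounds are design choices of this sketch. The function takes the four coordinates as plain floats so that it compiles without the library header.

```c
// Hypothetical pixel-space rectangle; not part of the Plumerai API.
typedef struct PixelBox {
  int top;
  int left;
  int bottom;
  int right;
} PixelBox;

// Scale a normalized coordinate to pixels, round to the nearest integer, and
// clamp the result to the range [0, extent].
int to_pixels(float normalized, int extent) {
  int value = (int)(normalized * (float)extent + 0.5f);
  if (value < 0) {
    return 0;
  }
  if (value > extent) {
    return extent;
  }
  return value;
}

// Convert normalized box coordinates (origin at the top-left) to pixels for
// an image of the given height and width.
PixelBox to_pixel_box(float y_min, float x_min, float y_max, float x_max,
                      int height, int width) {
  PixelBox box;
  box.top = to_pixels(y_min, height);
  box.left = to_pixels(x_min, width);
  box.bottom = to_pixels(y_max, height);
  box.right = to_pixels(x_max, width);
  return box;
}
```

With a BoxPrediction p, the call would look like to_pixel_box(p.y_min, p.x_min, p.y_max, p.x_max, height, width), using the same height and width that were passed to PeopleDetectionInit.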

PlumeraiErrorCode

typedef enum {
  PLUMERAI_SUCCESS = 0,

  // Should not occur, contact Plumerai if this happens
  PLUMERAI_INTERNAL_ERROR = -1,

  // The `delta_t` parameter should be >= 0
  PLUMERAI_INVALID_DELTA_T = -2,

  // The state can not be (re)stored while enrolling
  PLUMERAI_STATE_WHILE_ENROLLING = -3,

  // The state could not be restored
  PLUMERAI_STATE_CORRUPT = -4,

  // The state was serialized with a different height/width than the current
  // object
  PLUMERAI_STATE_SETTINGS_MISMATCH = -5,

  // This error code is only returned by
  // `PeopleDetectionGetMotionDetectionGrid`.
  // See that function for more details.
  PLUMERAI_MOTION_GRID_NOT_YET_READY = -6
} PlumeraiErrorCode;

PeopleDetectionErrorCodeString

const char *PeopleDetectionErrorCodeString(PlumeraiErrorCode error_code);

Returns a string representation of an error code.

Arguments

error_code PlumeraiErrorCode: An error code.

Returns

const char*: A string representation of the error code.

Example usage

Below is an example of using the C API shown above.

#include <stdio.h>
#include <stdlib.h>

#include "plumerai/people_detection_c.h"

int main(void) {
  // Settings, to be changed as needed
  const int width = 1600;   // camera image width in pixels
  const int height = 1200;  // camera image height in pixels

  // Initialize the people detection algorithm
  CPeopleDetection ppd = PeopleDetectionInit(height, width);
  BoxPrediction predictions[10];

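  // Pre-allocate space for the input image (*3 for RGB)
  unsigned char *image = (unsigned char *)malloc((size_t)height * width * 3);
  if (image == NULL) {
    printf("Error: could not allocate the input image\n");
    PeopleDetectionCleanUp(ppd);
    return 1;
  }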

  // Loop over frames in a video stream (example: 10 times)
  for (int t = 0; t < 10; ++t) {
    // Some example input here, normally this is where camera data is acquired
    image[0] = 12;
    image[1] = 143;
    // etc...

    // The time between two video frames in seconds
    // In this example we assume a constant frame rate of 30 fps, but variable
    // rates are supported.
    const float delta_t = 1.f / 30.f;

    // Process the frame
    int num_results = 0;
    PlumeraiErrorCode error_code = PeopleDetectionProcessFrame(
        ppd, image, predictions, 10, &num_results, delta_t);
    if (error_code != PLUMERAI_SUCCESS) {
      printf("Error: %s\n", PeopleDetectionErrorCodeString(error_code));
      // Release resources before exiting on error
      free(image);
      PeopleDetectionCleanUp(ppd);
      return 1;
    }

    // Display the results to stdout
    for (int i = 0; i < num_results; ++i) {
      BoxPrediction p = predictions[i];
      // `id` is an unsigned int, so it is printed with %u
      printf(
          "Box #%u of class %d with confidence %.2f @ (x,y) -> (%.2f,%.2f) "
          "till (%.2f,%.2f)\n",
          p.id, p.class_id, p.confidence, p.x_min, p.y_min, p.x_max, p.y_max);
    }
    if (num_results == 0) {
      printf("No bounding boxes found in frame\n");
    }
  }

  // Clean-up
  free(image);
  PeopleDetectionCleanUp(ppd);
  return 0;
}

Upgrade guide

From version 1.14 to 1.15

The confidence_threshold argument has been removed from the PeopleDetectionSingleImage function. The type PlumeraiErrorCode was introduced, and most functions now return this error code. The PeopleDetectionProcessFrame and PeopleDetectionSingleImage functions now return the number of results through an output argument instead of through the return value.

If your code looked like this before:

int num_results =
    PeopleDetectionProcessFrame(ppd, image, predictions, 10, delta_t);

Then it should be updated as follows:

int num_results = 0;
PlumeraiErrorCode error_code = PeopleDetectionProcessFrame(
    ppd, image, predictions, 10, &num_results, delta_t);
if (error_code != PLUMERAI_SUCCESS) {
  printf("Error: %s\n", PeopleDetectionErrorCodeString(error_code));
}