Plumerai People Detection C API for Arm Cortex-A¶

This document describes the C API for the Plumerai People Detection software for videos on Arm Cortex-A.

The API¶

The C API wraps the C++ API and mirrors it as closely as possible. Because C has no classes, the API introduces a CPeopleDetection object that must be passed to every subsequent call. Usage is simple: call PeopleDetectionInit once at the start, PeopleDetectionProcessFrame on each input frame, and PeopleDetectionCleanUp once at the end. Additionally, PeopleDetectionSingleImage can be used to process a single image independent of a video sequence.

The API is re-entrant, i.e. you can initialize several people detection objects in different threads and use them independently. However, using the same instance from different threads at the same time is not supported.

PeopleDetectionInit¶

CPeopleDetection PeopleDetectionInit(int height, int width)

Initializes a new people detection object. This needs to be called only once at the start of the application.

Arguments

height int: The height of the input image in pixels.
width int: The width of the input image in pixels.

Returns

CPeopleDetection: The resulting initialized object.

PeopleDetectionCleanUp¶

void PeopleDetectionCleanUp(CPeopleDetection ppd)

Destructor. This needs to be called once at the very end of the application to clean up the people detection object.

Arguments

ppd CPeopleDetection: An initialized CPeopleDetection object.

Returns

Nothing.

PeopleDetectionProcessFrame (RGB)¶

int PeopleDetectionProcessFrame(CPeopleDetection ppd,
                                const unsigned char *image_data,
                                BoxPrediction *results, int results_length,
                                float delta_t)

Process a single frame from a video sequence with RGB input. Make sure the image is right side up: upside-down input can still work, but accuracy is significantly degraded. See below for the YUYV and YUV420 versions.

Arguments

ppd CPeopleDetection: An initialized CPeopleDetection object.
image_data const unsigned char *: A pointer to RGB image data (1st byte red, 3rd blue) of size height * width * 3.
results BoxPrediction *: A pointer to a user-allocated array in which the resulting boxes are stored. The recommended size is 20 elements. If the array is smaller, or if fewer boxes are detected, fewer boxes are returned. See the return value.
results_length int: The number of BoxPrediction elements allocated in the results array above. See that parameter for more info.
delta_t float: The time in seconds between this and the previous video frame (1/fps). If set to 0, the system clock is used to compute this value.

Returns

int: The minimum of the number of bounding boxes found in the image and results_length. Elements of results beyond this count are filled with zeros. If this value equals results_length, more boxes may have been found than could be output.
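
When frames arrive at irregular intervals, delta_t can be derived from capture timestamps instead of a fixed 1/fps. The following is a minimal sketch, not part of the Plumerai API: the helper name FrameDeltaSeconds, the microsecond timestamps, and the convention that a previous timestamp of 0 means "no previous frame" are all assumptions made for illustration.

```c
#include <stdint.h>

// Hypothetical helper (not part of the Plumerai API): converts two capture
// timestamps in microseconds into a delta_t value in seconds.
// Assumption: prev_us == 0 means there is no previous frame yet.
// Returns 0 when no valid interval is available, which, per the delta_t
// description above, makes the library use the system clock instead.
static float FrameDeltaSeconds(uint64_t prev_us, uint64_t now_us) {
  if (prev_us == 0 || now_us <= prev_us) {
    return 0.0f;
  }
  return (float)((double)(now_us - prev_us) / 1e6);
}
```

The returned value can be passed directly as the delta_t argument of PeopleDetectionProcessFrame or its YUYV and YUV420 variants.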

PeopleDetectionProcessFrame (YUYV)¶

int PeopleDetectionProcessFrameYUYV(CPeopleDetection ppd,
                                    const unsigned char *image_data,
                                    BoxPrediction *results, int results_length,
                                    float delta_t)

Process a single frame from a video sequence with YUYV input. Make sure the image is right side up: upside-down input can still work, but accuracy is significantly degraded. See below for the YUV420 version.

Arguments

ppd CPeopleDetection: An initialized CPeopleDetection object.
image_data const unsigned char *: A pointer to YUYV image data of size height * width * 2.
results BoxPrediction *: A pointer to a user-allocated array in which the resulting boxes are stored. The recommended size is 20 elements. If the array is smaller, or if fewer boxes are detected, fewer boxes are returned. See the return value.
results_length int: The number of BoxPrediction elements allocated in the results array above. See that parameter for more info.
delta_t float: The time in seconds between this and the previous video frame (1/fps). If set to 0, the system clock is used to compute this value.

Returns

int: The minimum of the number of bounding boxes found in the image and results_length. Elements of results beyond this count are filled with zeros. If this value equals results_length, more boxes may have been found than could be output.
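
To make the height * width * 2 buffer size concrete, the sketch below reads individual samples from a packed YUYV buffer. It assumes the common YUYV (YUY2) byte order, in which every 4 bytes hold Y0 U Y1 V for two horizontally adjacent pixels, and an even image width. The helper names YuyvLuma and YuyvChroma are hypothetical and not part of the Plumerai API.

```c
#include <stddef.h>

// Hypothetical helpers, assuming the common packed YUYV (YUY2) layout:
// every 4 bytes hold Y0 U Y1 V for two horizontally adjacent pixels.

// Returns the luma (Y) sample of pixel (x, y).
static unsigned char YuyvLuma(const unsigned char *yuyv, int width, int x,
                              int y) {
  return yuyv[((size_t)y * (size_t)width + (size_t)x) * 2];
}

// Reads the chroma (U, V) samples of pixel (x, y). Both pixels of a
// horizontal pair share the same U and V sample. Assumes an even width.
static void YuyvChroma(const unsigned char *yuyv, int width, int x, int y,
                       unsigned char *u, unsigned char *v) {
  size_t pair_start = ((size_t)y * (size_t)width + (size_t)(x & ~1)) * 2;
  *u = yuyv[pair_start + 1];
  *v = yuyv[pair_start + 3];
}
```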

PeopleDetectionProcessFrameYUV420¶

int PeopleDetectionProcessFrameYUV420(CPeopleDetection ppd,
                                      const unsigned char *image_y,
                                      const unsigned char *image_u,
                                      const unsigned char *image_v,
                                      BoxPrediction *results,
                                      int results_length, float delta_t)

Process a single frame from a video sequence with planar YUV input with 420 chroma subsampling. Make sure the image is right side up: upside-down input can still work, but accuracy is significantly degraded. See above for the RGB and YUYV versions.

Arguments

ppd CPeopleDetection: An initialized CPeopleDetection object.
image_y const unsigned char *: A pointer to the Y channel, of size height * width.
image_u const unsigned char *: A pointer to the U channel, of size height * width / 4.
image_v const unsigned char *: A pointer to the V channel, of size height * width / 4.
results BoxPrediction *: A pointer to a user-allocated array in which the resulting boxes are stored. The recommended size is 20 elements. If the array is smaller, or if fewer boxes are detected, fewer boxes are returned. See the return value.
results_length int: The number of BoxPrediction elements allocated in the results array above. See that parameter for more info.
delta_t float: The time in seconds between this and the previous video frame (1/fps). If set to 0, the system clock is used to compute this value.

Returns

int: The minimum of the number of bounding boxes found in the image and results_length. Elements of results beyond this count are filled with zeros. If this value equals results_length, more boxes may have been found than could be output.
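
The three input formats need differently sized buffers. As a rough sketch, the helpers below compute the sizes stated in the argument descriptions of the RGB, YUYV and YUV420 functions. The function names are hypothetical and not part of the Plumerai API, and the chroma size assumes even image dimensions.

```c
#include <stddef.h>

// Hypothetical helpers (not part of the Plumerai API) that compute the
// buffer sizes in bytes given in the argument descriptions above.

// Interleaved RGB: 3 bytes per pixel.
static size_t RgbFrameBytes(int height, int width) {
  return (size_t)height * (size_t)width * 3;
}

// Packed YUYV: 2 bytes per pixel.
static size_t YuyvFrameBytes(int height, int width) {
  return (size_t)height * (size_t)width * 2;
}

// Planar YUV420: one full-resolution Y plane.
static size_t Yuv420LumaBytes(int height, int width) {
  return (size_t)height * (size_t)width;
}

// Planar YUV420: size of the U plane and of the V plane, each subsampled
// by two in both dimensions (assumes even height and width).
static size_t Yuv420ChromaBytes(int height, int width) {
  return (size_t)height * (size_t)width / 4;
}
```

For a 1600 x 1200 image this gives 5,760,000 bytes for RGB, 3,840,000 bytes for YUYV, and 1,920,000 + 2 * 480,000 bytes for the three YUV420 planes.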

PeopleDetectionSingleImage¶

int PeopleDetectionSingleImage(CPeopleDetection ppd,
                               const unsigned char *image_data,
                               float confidence_threshold,
                               BoxPrediction *results, int results_length)

Process a single image that is not part of a video sequence. This should not be used for video data, but only for single image evaluation and debugging. The returned box id values are not related to those returned by PeopleDetectionProcessFrame or by other calls to PeopleDetectionSingleImage.

Arguments

ppd CPeopleDetection: An initialized CPeopleDetection object.
image_data const unsigned char *: A pointer to RGB image data (1st byte red, 3rd blue) of size height * width * 3.
confidence_threshold float: Any box with a confidence value below this threshold will be filtered out. Range between 0 and 1. A value of 0.63 is recommended for regular evaluation, but for mAP computation this can be set to 0.
results BoxPrediction *: A pointer to a user-allocated array in which the resulting boxes are stored. The recommended size is 20 elements. If the array is smaller, or if fewer boxes are detected, fewer boxes are returned. See the return value.
results_length int: The number of BoxPrediction elements allocated in the results array above. See that parameter for more info.

Returns

int: The minimum of the number of bounding boxes found in the image and results_length. Elements of results beyond this count are filled with zeros. If this value equals results_length, more boxes may have been found than could be output.
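
When boxes are collected with confidence_threshold set to 0 (for example for mAP computation), the same results can afterwards be re-filtered at a stricter threshold without running detection again. The sketch below illustrates one way to do that. KeepConfidentBoxes is a hypothetical helper, not part of the Plumerai API, and the BoxPrediction definition is copied from this document only so that the snippet compiles on its own; real code would include the provided header instead.

```c
// Copied from the BoxPrediction section of this document so that the
// sketch is self-contained; real code would include the Plumerai header.
typedef struct {
  float y_min;
  float x_min;
  float y_max;
  float x_max;
  float confidence;
  unsigned int id;
  unsigned int class_id;
} BoxPrediction;

// Hypothetical helper: keeps only boxes whose confidence is at least
// `threshold`, compacting them to the front of the array in their original
// order. Returns the number of boxes kept.
static int KeepConfidentBoxes(BoxPrediction *boxes, int count,
                              float threshold) {
  int kept = 0;
  for (int i = 0; i < count; ++i) {
    if (boxes[i].confidence >= threshold) {
      boxes[kept++] = boxes[i];
    }
  }
  return kept;
}
```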

PeopleDetectionDebugNextFrame¶

int PeopleDetectionDebugNextFrame(CPeopleDetection ppd,
                                  const char *output_file_name)

Enable debug mode for the next frame. The next time a video frame is processed, this will dump the input image as well as internal data and final results to a file. This file can then be shared with Plumerai support for further analysis. The file will be overwritten if it already exists, so to debug multiple frames, distinct filenames have to be used in successive calls to this function. Warning: these files contain uncompressed image data and can become large.

Arguments

ppd CPeopleDetection: An initialized CPeopleDetection object.
output_file_name const char *: A filename to write the data to.

Returns

int: Returns 1 if all went well. Returns 0 if this function is called twice without an intervening call to PeopleDetectionProcessFrame (or its YUYV and YUV420 variants), or if the file could not be opened for writing.
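
Because an existing file is overwritten, debugging several frames requires a distinct filename per call. A minimal sketch of one way to generate such names from a frame counter is shown below. The helper MakeDebugFileName, the naming pattern, and the .bin extension are assumptions made for illustration and are not prescribed by the API.

```c
#include <stdio.h>

// Hypothetical helper (not part of the Plumerai API): writes a name of the
// form "<prefix>_<6-digit frame index>.bin" into buf.
// Returns 1 on success and 0 if the name did not fit into the buffer.
static int MakeDebugFileName(char *buf, size_t buf_size, const char *prefix,
                             unsigned int frame_index) {
  int written = snprintf(buf, buf_size, "%s_%06u.bin", prefix, frame_index);
  return written >= 0 && (size_t)written < buf_size;
}
```

The resulting name can be passed as output_file_name to PeopleDetectionDebugNextFrame before the frame in question is processed.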

BoxPrediction¶

typedef struct {
  float y_min;       // top coordinate between 0 and 1 in height dimension
  float x_min;       // left coordinate between 0 and 1 in width dimension
  float y_max;       // bottom coordinate between 0 and 1 in height dimension
  float x_max;       // right coordinate between 0 and 1 in width dimension
  float confidence;  // between 0 and 1, higher means more confident
  unsigned int id;   // the tracked identifier of this box
  unsigned int class_id;  // the class of the detected object
} BoxPrediction;

A structure representing a single resulting bounding box. Coordinates are between 0 and 1, the origin is at the top-left.
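
Since the coordinates are normalized, drawing a box onto the input image requires scaling them by the image dimensions. The sketch below shows one possible conversion. PixelBox, ScaleToPixels and ToPixelBox are hypothetical names that are not part of the Plumerai API, rounding to the nearest pixel is a choice made here, and the BoxPrediction definition is repeated only so that the snippet compiles on its own.

```c
// Copied from the definition above so that the sketch is self-contained;
// real code would include the Plumerai header instead.
typedef struct {
  float y_min;
  float x_min;
  float y_max;
  float x_max;
  float confidence;
  unsigned int id;
  unsigned int class_id;
} BoxPrediction;

// Hypothetical pixel-space rectangle with the origin at the top-left.
typedef struct {
  int left;
  int top;
  int right;
  int bottom;
} PixelBox;

// Clamps a normalized coordinate to [0, 1] and scales it to `size` pixels,
// rounding to the nearest integer.
static int ScaleToPixels(float value, int size) {
  if (value < 0.0f) value = 0.0f;
  if (value > 1.0f) value = 1.0f;
  return (int)(value * (float)size + 0.5f);
}

// Converts a normalized box into pixel coordinates. The x values scale with
// the image width, the y values with the image height.
static PixelBox ToPixelBox(const BoxPrediction *box, int height, int width) {
  PixelBox out;
  out.left = ScaleToPixels(box->x_min, width);
  out.top = ScaleToPixels(box->y_min, height);
  out.right = ScaleToPixels(box->x_max, width);
  out.bottom = ScaleToPixels(box->y_max, height);
  return out;
}
```

Note that right and bottom can equal width and height, so they are best treated as exclusive edges when indexing into the image.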

Full header¶

#ifndef PLUMERAI_VIDEO_BOUNDING_BOX_API_C
#define PLUMERAI_VIDEO_BOUNDING_BOX_API_C

#include "box_prediction.h"

// C interface to a C++ class using an opaque pointer
typedef void *CPeopleDetection;

// Initializes a new people detection object. This needs to be called only
// once at the start of the application.
//
// @param height The height of the input image in pixels.
// @param width The width of the input image in pixels.
// @return the resulting initialized object.
CPeopleDetection PeopleDetectionInit(int height, int width);

// Destructor, needs to be called at the very end to clean it up.
//
// @param ppd An initialized `CPeopleDetection` object.
void PeopleDetectionCleanUp(CPeopleDetection ppd);

// Process a single frame from a video sequence with RGB input.
// See below for the YUYV and YUV420 versions.
//
// @param ppd An initialized `CPeopleDetection` object.
// @param image_data A pointer to RGB image data (1st byte red, 3rd blue)
//                   of size height * width * 3.
// @param results A pointer to an array to store the resulting boxes in. The
//  user needs to allocate space for this structure. The recommended size is 20,
//  but if the user allocates less, then fewer boxes are returned. If fewer
//  boxes are detected, then also fewer are returned. See the return value.
// @param results_length The number of 'BoxPrediction' elements allocated in
//  the provided 'results' parameter above. See that parameter for more info.
// @param delta_t The time in seconds it took between this and the previous
//  video frame (1/fps). If set to 0 then the system clock will be used to
//  compute this value.
// @return the minimum of the number of resulting bounding-boxes found in the
//  image and 'results_length'. The results structure 'results' will be filled
//  with zeros beyond this amount. If this value is equal to 'results_length',
//  it might be an indication that more boxes were found than can be output.
int PeopleDetectionProcessFrame(CPeopleDetection ppd,
                                const unsigned char *image_data,
                                BoxPrediction *results, int results_length,
                                float delta_t);

// Process a single frame from a video sequence with YUYV input.
// See below for the YUV420 version.
//
// @param ppd An initialized `CPeopleDetection` object.
// @param image_data A pointer to YUYV image data of size height * width * 2.
// @param results A pointer to an array to store the resulting boxes in. The
//  user needs to allocate space for this structure. The recommended size is 20,
//  but if the user allocates less, then fewer boxes are returned. If fewer
//  boxes are detected, then also fewer are returned. See the return value.
// @param results_length The number of 'BoxPrediction' elements allocated in
//  the provided 'results' parameter above. See that parameter for more info.
// @param delta_t The time in seconds it took between this and the previous
//  video frame (1/fps). If set to 0 then the system clock will be used to
//  compute this value.
// @return the minimum of the number of resulting bounding-boxes found in the
//  image and 'results_length'. The results structure 'results' will be filled
//  with zeros beyond this amount. If this value is equal to 'results_length',
//  it might be an indication that more boxes were found than can be output.
int PeopleDetectionProcessFrameYUYV(CPeopleDetection ppd,
                                    const unsigned char *image_data,
                                    BoxPrediction *results, int results_length,
                                    float delta_t);

// Process a single frame from a video sequence with planar YUV input with 420
// chroma subsampling. See above for the RGB and YUYV versions.
//
// @param ppd An initialized `CPeopleDetection` object.
// @param image_y A pointer to the Y channel, of size height * width.
// @param image_u A pointer to the U channel, of size height * width / 4.
// @param image_v A pointer to the V channel, of size height * width / 4.
// @param results A pointer to an array to store the resulting boxes in. The
//  user needs to allocate space for this structure. The recommended size is 20,
//  but if the user allocates less, then fewer boxes are returned. If fewer
//  boxes are detected, then also fewer are returned. See the return value.
// @param results_length The number of 'BoxPrediction' elements allocated in
//  the provided 'results' parameter above. See that parameter for more info.
// @param delta_t The time in seconds it took between this and the previous
//  video frame (1/fps). If set to 0 then the system clock will be used to
//  compute this value.
// @return the minimum of the number of resulting bounding-boxes found in the
//  image and 'results_length'. The results structure 'results' will be filled
//  with zeros beyond this amount. If this value is equal to 'results_length',
//  it might be an indication that more boxes were found than can be output.
int PeopleDetectionProcessFrameYUV420(CPeopleDetection ppd,
                                      const unsigned char *image_y,
                                      const unsigned char *image_u,
                                      const unsigned char *image_v,
                                      BoxPrediction *results,
                                      int results_length, float delta_t);

// Process a single image not part of a video sequence. This should not be used
// for video data, but only for single image evaluation and debugging. The
// returned box id values are not related to those returned by
// `PeopleDetectionProcessFrame` or other calls to `PeopleDetectionSingleImage`.
//
// @param ppd An initialized `CPeopleDetection` object.
// @param image_data A pointer to RGB image data (1st byte red, 3rd blue)
//                   of size height * width * 3.
// @param confidence_threshold Any box with a confidence value below this
//  threshold will be filtered out. Range between 0 and 1. A value of 0.63
//  is recommended for regular evaluation, but for mAP computation this can
//  be set to 0.
// @param results A pointer to an array to store the resulting boxes in. The
//  user needs to allocate space for this structure. The recommended size is 20,
//  but if the user allocates less, then fewer boxes are returned. If fewer
//  boxes are detected, then also fewer are returned. See the return value.
// @param results_length The number of 'BoxPrediction' elements allocated in
//  the provided 'results' parameter above. See that parameter for more info.
// @return the minimum of the number of resulting bounding-boxes found in the
//  image and 'results_length'. The results structure 'results' will be filled
//  with zeros beyond this amount. If this value is equal to 'results_length',
//  it might be an indication that more boxes were found than can be output.
int PeopleDetectionSingleImage(CPeopleDetection ppd,
                               const unsigned char *image_data,
                               float confidence_threshold,
                               BoxPrediction *results, int results_length);

// Enable debug mode for the next frame. The next time a video frame is
// processed, this will dump the input image as well as internal data and final
// results to a file. This file can then be shared with Plumerai support for
// further analysis. The file will be overwritten if it already exists, so to
// debug multiple frames, distinct filenames have to be used in successive calls
// to this function.
// Warning: these files contain uncompressed image data and can become large.
//
// @param output_file_name A filename to dump the data to.
// @return returns 1 on success. It might return `0` if this function is called
// twice without calling `PeopleDetectionProcessFrame`, or if the file could not
// be opened for writing.
int PeopleDetectionDebugNextFrame(CPeopleDetection ppd,
                                  const char *output_file_name);

#endif  // PLUMERAI_VIDEO_BOUNDING_BOX_API_C

Example usage¶

Below is an example of using the C API shown above.

#include <stdio.h>
#include <stdlib.h>

#include "plumerai/people_detection_c.h"

int main(void) {

  // Settings, to be changed as needed
  const int width = 1600;   // camera image width in pixels
  const int height = 1200;  // camera image height in pixels

  // Initialize the people detection algorithm
  CPeopleDetection ppd = PeopleDetectionInit(height, width);
  BoxPrediction predictions[10];

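  // Pre-allocate space for the input image (*3 for RGB)
  unsigned char *image = (unsigned char *)malloc((size_t)height * width * 3);
  if (image == NULL) {
    // Allocation failed: release the detection object and stop
    PeopleDetectionCleanUp(ppd);
    return 1;
  }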

  // Loop over frames in a video stream
  while (1) {

    // Some example input here, normally this is where camera data is acquired
    image[0] = 12;
    image[1] = 143;
    // etc...

    // Process the frame
    int num_results =
        PeopleDetectionProcessFrame(ppd, image, predictions, 10, 0.f);

    // Display the results to stdout
    for (int i = 0; i < num_results; ++i) {
      BoxPrediction p = predictions[i];
      printf(
          "Box #%u of class %u with confidence %.2f @ (x,y) -> (%.2f,%.2f) "
          "till (%.2f,%.2f)\n",
          p.id, p.class_id, p.confidence, p.x_min, p.y_min, p.x_max, p.y_max);
    }
  }

  // Clean-up
  free(image);
  PeopleDetectionCleanUp(ppd);
  return 0;
}