Plumerai People Detection C++ API

This document describes the C++ API for the Plumerai People Detection software for videos on Arm Cortex-A and x86.

The C++ API consists of a single self-documented header file. Usage is simple: the constructor needs to be run once, and the process_frame function needs to be called on each input frame. Additionally, the single_image function can be used to process a single image independent of a video sequence.

The API is re-entrant, i.e. you can instantiate several PeopleDetection objects in different threads and use them independently. However, using the same instance from different threads at the same time is not supported.

API

PeopleDetection

PeopleDetection

PeopleDetection::PeopleDetection(int height, int width);

Initializes a new people detection object. This needs to be called only once at the start of the application.

Arguments

  • height int: The height of the input image in pixels.
  • width int: The width of the input image in pixels.

Returns

Nothing.

process_frame

template <ImageFormat image_format>
ErrorCodeType PeopleDetection::process_frame(const std::uint8_t *image_data,
                                             std::vector<BoxPrediction> &results,
                                             float delta_t = 0.f);

template <ImageFormat image_format>
ErrorCodeType PeopleDetection::process_frame(const std::uint8_t *image_y,
                                             const std::uint8_t *image_uv,
                                             std::vector<BoxPrediction> &results,
                                             float delta_t = 0.f);

template <ImageFormat image_format>
ErrorCodeType PeopleDetection::process_frame(const std::uint8_t *image_y,
                                             const std::uint8_t *image_u,
                                             const std::uint8_t *image_v,
                                             std::vector<BoxPrediction> &results,
                                             float delta_t = 0.f);

Process a single frame from a video sequence.

The first version supports RGB, RGBA, BGRA or YUYV input. The second version supports NV12 input. The third version supports planar YUV input with 420 chroma subsampling.

Make sure the image is right side up. When it is upside down it can still work but accuracy is significantly degraded.

Note that the algorithm comes with a built-in threshold (e.g. 0.6 - this differs per model): boxes with confidences lower than that value won't be produced at all by this function.

Arguments

  • image_format ImageFormat: A template parameter which can be ImageFormat::PACKED_RGB888, ImageFormat::PACKED_RGBA8888, ImageFormat::PACKED_BGRA8888, ImageFormat::PACKED_YUYV, ImageFormat::PLANAR_YUYV420 or ImageFormat::PLANAR_NV12.
  • image_data const std::uint8_t *: A pointer to RGB image data (1st byte red, 3rd blue) of size height * width * 3, or RGBA or BGRA image data of size height * width * 4 or YUYV image data of size height * width * 2.
  • image_y const std::uint8_t *: A pointer to the Y channel, of size height * width.
  • image_uv const std::uint8_t *: A pointer to the interleaved UV channels, of size height * width / 2.
  • image_u const std::uint8_t *: A pointer to the U channel, of size height * width / 4.
  • image_v const std::uint8_t *: A pointer to the V channel, of size height * width / 4.
  • results std::vector<BoxPrediction> &: The resulting bounding-boxes found in the frame.
  • delta_t float: The time in seconds between this video frame and the previous one (1/fps). If set to 0, the system clock is used to compute this value.

Returns

ErrorCodeType (==int): an error code of type ErrorCode (or ErrorCodeFamiliarFaceID if familiar face identification is enabled). See those enums for more details.
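The buffer sizes listed under Arguments can be checked before a frame is handed to process_frame. The following is a minimal sketch and not part of the Plumerai API: PackedLayout and the three helper functions are hypothetical names introduced here, and the byte counts only restate the sizes given in the argument list. It assumes an even height and width for the subsampled chroma planes.

```cpp
#include <cstddef>

// Hypothetical local enum, NOT the library's ImageFormat. It only mirrors the
// packed layouts named in the argument list above.
enum class PackedLayout { RGB888, RGBA8888, BGRA8888, YUYV };

// Number of bytes expected behind `image_data` for a packed layout.
inline std::size_t packed_buffer_bytes(PackedLayout layout, int height,
                                       int width) {
  const std::size_t pixels =
      static_cast<std::size_t>(height) * static_cast<std::size_t>(width);
  switch (layout) {
    case PackedLayout::RGB888:
      return pixels * 3;  // 3 bytes per pixel
    case PackedLayout::RGBA8888:
    case PackedLayout::BGRA8888:
      return pixels * 4;  // 4 bytes per pixel
    case PackedLayout::YUYV:
      return pixels * 2;  // 2 bytes per pixel on average
  }
  return 0;  // unreachable for valid enum values
}

// Bytes behind `image_uv` (interleaved U and V): height * width / 2.
inline std::size_t nv12_chroma_bytes(int height, int width) {
  return static_cast<std::size_t>(height) * static_cast<std::size_t>(width) / 2;
}

// Bytes behind each of `image_u` and `image_v`: height * width / 4.
inline std::size_t yuv420_chroma_plane_bytes(int height, int width) {
  return static_cast<std::size_t>(height) * static_cast<std::size_t>(width) / 4;
}
```

For the 1600 x 1200 RGB frames used in the example at the end of this page, packed_buffer_bytes(PackedLayout::RGB888, 1200, 1600) gives 5760000 bytes. The Y plane is always height * width bytes.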

single_image

template <ImageFormat image_format>
ErrorCodeType PeopleDetection::single_image(const std::uint8_t *image_data,
                                            std::vector<BoxPrediction> &results,
                                            int height = 0, int width = 0);

template <ImageFormat image_format>
ErrorCodeType PeopleDetection::single_image(const std::uint8_t *image_y,
                                            const std::uint8_t *image_uv,
                                            std::vector<BoxPrediction> &results,
                                            int height = 0, int width = 0);

template <ImageFormat image_format>
ErrorCodeType PeopleDetection::single_image(const std::uint8_t *image_y,
                                            const std::uint8_t *image_u,
                                            const std::uint8_t *image_v,
                                            std::vector<BoxPrediction> &results,
                                            int height = 0, int width = 0);

Process a single image not part of a video sequence.

The first version supports RGB, RGBA, BGRA or YUYV input. The second version supports NV12 input. The third version supports planar YUV input with 420 chroma subsampling.

This should not be used for video data. It can be used for face enrollments from a set of images. The returned box id values are not related to those returned by process_frame or by other calls to single_image.

Arguments

  • image_format ImageFormat: A template parameter which can be ImageFormat::PACKED_RGB888, ImageFormat::PACKED_RGBA8888, ImageFormat::PACKED_BGRA8888, ImageFormat::PACKED_YUYV, ImageFormat::PLANAR_YUYV420 or ImageFormat::PLANAR_NV12.
  • image_data const std::uint8_t *: A pointer to RGB image data (1st byte red, 3rd blue) of size height * width * 3, or RGBA or BGRA image data of size height * width * 4 or YUYV image data of size height * width * 2.
  • image_y const std::uint8_t *: A pointer to the Y channel, of size height * width.
  • image_uv const std::uint8_t *: A pointer to the interleaved UV channels, of size height * width / 2.
  • image_u const std::uint8_t *: A pointer to the U channel, of size height * width / 4.
  • image_v const std::uint8_t *: A pointer to the V channel, of size height * width / 4.
  • results std::vector<BoxPrediction> &: The resulting bounding-boxes found in the frame.
  • height int: The height of the input image in pixels. If height = 0 the height set in the constructor will be used.
  • width int: The width of the input image in pixels. If width = 0 the width set in the constructor will be used.

Returns

ErrorCodeType (==int): an error code of type ErrorCode (or ErrorCodeFamiliarFaceID if familiar face identification is enabled). See those enums for more details.

reset_tracker

void PeopleDetection::reset_tracker();

This function is only available if the library was built with tracking support. This resets all internal tracker state. It is recommended to call this whenever two consecutive frames are too different from each other, such as when switching to a different camera input or when the camera abruptly moved.

Arguments

None.

Returns

Nothing.
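How an application decides that two consecutive frames are "too different" is left to the caller. One possible heuristic is sketched below: reset when the camera source changes, or when the mean absolute byte difference between two frames exceeds a limit. Everything in it is an assumption made for illustration: the function names, the camera id parameters and the default limit of 60 are hypothetical and are not taken from the Plumerai API.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Mean absolute per-byte difference between two frames, in the range [0, 255].
// Frames of different (or zero) size are treated as completely different.
inline double mean_abs_diff(const std::vector<std::uint8_t>& a,
                            const std::vector<std::uint8_t>& b) {
  if (a.empty() || a.size() != b.size()) return 255.0;
  unsigned long long total = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    total += static_cast<unsigned long long>(
        std::abs(static_cast<int>(a[i]) - static_cast<int>(b[i])));
  }
  return static_cast<double>(total) / static_cast<double>(a.size());
}

// Hypothetical policy: reset on a camera switch or on an abrupt image change.
// The default limit is an arbitrary placeholder that would need tuning.
inline bool should_reset_tracker(int prev_camera_id, int camera_id,
                                 const std::vector<std::uint8_t>& prev_frame,
                                 const std::vector<std::uint8_t>& frame,
                                 double max_mean_diff = 60.0) {
  if (prev_camera_id != camera_id) return true;
  return mean_abs_diff(prev_frame, frame) > max_mean_diff;
}
```

When such a check returns true, the application would call reset_tracker() on its PeopleDetection object; doing so before processing the new frame seems the natural order, but the source does not specify it.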

store_state

ErrorCodeType PeopleDetection::store_state(
    std::vector<std::uint8_t> &state) const;

Store the current state of the algorithm to a byte array.

This function can be used when processing a video in chunks, doing different chunks at different times or on different machines. The state can be restored by calling restore_state with the data returned by store_state. When the library is built with support for familiar face identification, the state includes the face library.

Constraints:

  • The delta_t parameter of process_frame cannot be left at zero after restoring a previous state.
  • If familiar face identification is enabled, the state can only be stored and restored when not enrolling.

Arguments

  • state std::vector<std::uint8_t> &: A vector to store the serialized state in.

Returns

ErrorCodeType (==int): an error code of type ErrorCode. See that enum for more details.

Example

auto ppd = plumerai::PeopleDetection(height, width);

std::vector<std::uint8_t> state;
auto error_code = ppd.store_state(state);
if (error_code != plumerai::ErrorCode::SUCCESS) {
  printf("ERROR: store_state returned %s\n", plumerai::error_code_string(error_code));
}

restore_state

ErrorCodeType PeopleDetection::restore_state(
    const std::vector<std::uint8_t> &state);

Restore the state of the algorithm from a byte array.

See store_state for more information. The user must ensure that the height and width of the current object match the height and width of the state that is being restored.

Arguments

  • state const std::vector<std::uint8_t> &: A vector containing the serialized state.

Returns

ErrorCodeType (==int): an error code of type ErrorCode. See that enum for more details.

Example

auto ppd = plumerai::PeopleDetection(height, width);

// The state as obtained by `store_state`, e.g. loaded from memory
std::vector<std::uint8_t> state = ...;
auto error_code = ppd.restore_state(state);
if (error_code != plumerai::ErrorCode::SUCCESS) {
  printf("ERROR: restore_state returned %s\n", plumerai::error_code_string(error_code));
}
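To process a video in chunks at different times or on different machines, the byte vector has to be persisted somewhere between store_state and restore_state. The sketch below shows one way to do that with a plain binary file, using only the C++ standard library. The helper names write_state_file and read_state_file are hypothetical; the library itself does not prescribe how the bytes are stored.

```cpp
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Write the serialized state to `path`, replacing any existing file.
inline bool write_state_file(const std::string& path,
                             const std::vector<std::uint8_t>& state) {
  std::ofstream out(path, std::ios::binary | std::ios::trunc);
  if (!out) return false;
  out.write(reinterpret_cast<const char*>(state.data()),
            static_cast<std::streamsize>(state.size()));
  out.flush();
  return static_cast<bool>(out);
}

// Read the whole file at `path` into `state`, replacing its contents.
inline bool read_state_file(const std::string& path,
                            std::vector<std::uint8_t>& state) {
  std::ifstream in(path, std::ios::binary);
  if (!in) return false;
  state.assign(std::istreambuf_iterator<char>(in),
               std::istreambuf_iterator<char>());
  return true;
}
```

The vector filled by store_state would be passed to write_state_file, and the vector filled by read_state_file would be passed to restore_state on an object constructed with the same height and width.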

get_motion_detection_grid

ErrorCodeType PeopleDetection::get_motion_detection_grid(
    std::vector<float> &motion_detection_grid) const;

int PeopleDetection::get_motion_detection_grid_height() const;
int PeopleDetection::get_motion_detection_grid_width() const;

Retrieves the amount of motion found in each grid cell of the frame.

As a by-product of people detection, motion detection is performed. For specific use-cases, it might be useful to access this raw motion detection information as well. This function provides access to it. However, it can only be called while Plumerai People Detection is also running, and the grid is not available for the first few frames.

The result of this function is a 2D grid of type float, with dimensions that can be retrieved with get_motion_detection_grid_height() and get_motion_detection_grid_width(). The values in each grid cell are floats between 0.0 and 1.0, and denote how much motion was detected in that grid cell. A higher value indicates more motion.

Arguments

  • motion_detection_grid std::vector<float> &: A float vector in which the result will be stored. It will be resized by this function to the correct size.

Returns

ErrorCodeType (==int): an error code of type ErrorCode. It will return SUCCESS if all went fine, or MOTION_GRID_NOT_YET_READY if this function is called too soon after initialization of the people detection object. It needs to process at least a few frames before the motion grid is valid.

Example

auto ppd = plumerai::PeopleDetection(height, width);
for (int t = 0; t < num_frames; ++t) {
  ppd.process_frame(...);

  std::vector<float> motion_detection_grid(0);
  auto error_code = ppd.get_motion_detection_grid(motion_detection_grid);
  if (error_code == plumerai::ErrorCode::SUCCESS) {
    // Motion-detection grid is now available
  }
}
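Once the grid is available, it can be addressed as a 2D array using the two dimension getters. The sketch below finds the cell with the most motion. It assumes the flat vector is stored row-major (grid_width values per row), which the text above does not state explicitly; GridCell and strongest_motion_cell are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

struct GridCell {
  int row;      // index in the height dimension, -1 if no cell was found
  int col;      // index in the width dimension, -1 if no cell was found
  float value;  // amount of motion in that cell, between 0.0 and 1.0
};

// Returns the cell with the highest motion value.
// ASSUMPTION: `grid` is row-major, i.e. cell (row, col) is at
// index row * grid_width + col.
inline GridCell strongest_motion_cell(const std::vector<float>& grid,
                                      int grid_height, int grid_width) {
  GridCell best{-1, -1, 0.0f};
  if (grid_height <= 0 || grid_width <= 0) return best;
  const std::size_t expected = static_cast<std::size_t>(grid_height) *
                               static_cast<std::size_t>(grid_width);
  if (grid.size() != expected) return best;  // dimensions do not match
  for (int row = 0; row < grid_height; ++row) {
    for (int col = 0; col < grid_width; ++col) {
      const float value =
          grid[static_cast<std::size_t>(row) * grid_width + col];
      if (best.row < 0 || value > best.value) best = GridCell{row, col, value};
    }
  }
  return best;
}
```

In the loop above, the height and width arguments would come from get_motion_detection_grid_height() and get_motion_detection_grid_width().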

debug_next_frame

ErrorCodeType PeopleDetection::debug_next_frame(const char *output_file_name);

Enable debug mode for the next frame. The next time process_frame is called, this will dump the input image as well as internal data and final results to a file. This file can then be shared with Plumerai support for further analysis. The file will be overwritten if it already exists, so to debug multiple frames, distinct filenames have to be used in successive calls to this function. Warning: these files contain uncompressed image data and can become large.

Arguments

  • output_file_name const char *: A filename to write the data to.

Returns

ErrorCodeType (==int): an error code of type ErrorCode. Returns ErrorCode::SUCCESS if all went well. It might return ErrorCode::INTERNAL_ERROR if this method is called twice without calling process_frame, or if the file could not be opened for writing.
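Since an existing file is overwritten, debugging several frames requires a distinct name per call. One simple scheme is to append the frame index, as sketched below. The helper debug_file_name and the ".bin" extension are hypothetical choices made for this illustration; the library accepts any filename.

```cpp
#include <cstdio>
#include <string>

// Builds a name such as "ppd_debug_000042.bin" from a prefix and frame index.
inline std::string debug_file_name(const std::string& prefix,
                                   unsigned long frame_index) {
  char suffix[32];  // large enough for '_', 20 digits, ".bin" and the NUL
  std::snprintf(suffix, sizeof(suffix), "_%06lu.bin", frame_index);
  return prefix + suffix;
}
```

The resulting string would be passed as name.c_str() to debug_next_frame, followed by the process_frame call whose data should be captured.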

BoxPrediction

typedef enum {
  CLASS_UNKNOWN = 0,
  CLASS_PERSON = 1,
  CLASS_HEAD = 2,
  CLASS_FACE = 3,
  CLASS_MAX_ENUM = 3,
} DetectionClass;

typedef struct BoxPrediction {
  float y_min;             // top coordinate between 0 and 1 in height dimension
  float x_min;             // left coordinate between 0 and 1 in width dimension
  float y_max;             // bottom coordinate between 0 and 1 in height dimension
  float x_max;             // right coordinate between 0 and 1 in width dimension
  float confidence;        // between 0 and 1, higher means more confident
  unsigned int id;         // the tracked identifier of this box
  DetectionClass class_id; // the class of the detected object
} BoxPrediction;

A structure representing a single resulting bounding box. Coordinates are between 0 and 1, the origin is at the top-left. Confidence values lie between 0 and 1. Note that the algorithm comes with a built-in threshold (e.g. 0.6 - this differs per model): boxes with confidences lower than that value won't be produced at all by the Plumerai People Detection functions.
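Because the coordinates are normalized, drawing or cropping requires scaling them by the image size. A minimal sketch follows; NormalizedBox is a stand-in that mirrors only the four coordinate fields of BoxPrediction so the snippet compiles without the library header, and PixelRect and to_pixel_rect are hypothetical names. Rounding to the nearest pixel and clamping to the image are design choices of this sketch, not documented behavior.

```cpp
#include <algorithm>
#include <cmath>

// Stand-in for the coordinate fields of BoxPrediction (same names and order).
struct NormalizedBox {
  float y_min;
  float x_min;
  float y_max;
  float x_max;
};

// Pixel rectangle with the origin at the top-left of the image.
// `right` and `bottom` are exclusive edges, so they may equal width / height.
struct PixelRect {
  int left;
  int top;
  int right;
  int bottom;
};

inline PixelRect to_pixel_rect(const NormalizedBox& box, int height,
                               int width) {
  // Clamp to [0, 1] first, then scale and round to the nearest pixel edge.
  auto scale = [](float value, int extent) {
    const float clamped = std::min(1.0f, std::max(0.0f, value));
    return static_cast<int>(std::lround(clamped * static_cast<float>(extent)));
  };
  return PixelRect{scale(box.x_min, width), scale(box.y_min, height),
                   scale(box.x_max, width), scale(box.y_max, height)};
}
```

With a real BoxPrediction p, the same conversion applies to p.x_min, p.y_min, p.x_max and p.y_max, using the height and width of the processed image.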

ErrorCode

typedef enum {
  SUCCESS = 0,

  // Should not occur, contact Plumerai if this happens
  INTERNAL_ERROR = -1,

  // The `delta_t` parameter should be >= 0
  INVALID_DELTA_T = -2,

  // The `STATE_` error codes are only returned by `store_state` and
  // `restore_state`. See those functions for more details.

  // The state can not be (re)stored while enrolling
  STATE_WHILE_ENROLLING = -3,

  // The state could not be restored
  STATE_CORRUPT = -4,

  // The state was serialized with a different height/width than the current
  // object, or uses different familiar face identification settings.
  STATE_SETTINGS_MISMATCH = -5,

  // This error code is only returned by `get_motion_detection_grid`.
  // See that function for more details.
  MOTION_GRID_NOT_YET_READY = -6
} ErrorCode;

error_code_string

const char *plumerai::error_code_string(ErrorCodeType error_code);

Returns a string representation of an error code.

Arguments

  • error_code ErrorCodeType: An error code.

Returns

const char*: A string representation of the error code.

Example

auto ppd = plumerai::PeopleDetection(height, width);

auto error_code = ppd.process_frame(...);
if (error_code != plumerai::ErrorCode::SUCCESS) {
  printf("ERROR: %s\n", plumerai::error_code_string(error_code));
}

Example usage

Below is an example of using the C++ API shown above.

#include <cstdint>
#include <cstdio>
#include <vector>

#include "plumerai/people_detection.h"

int main() {
  // Settings, to be changed as needed
  constexpr int width = 1600;   // camera image width in pixels
  constexpr int height = 1200;  // camera image height in pixels
  constexpr auto image_format = plumerai::ImageFormat::PACKED_RGB888;

  // Initialize the people detection algorithm
  auto ppd = plumerai::PeopleDetection(height, width);

  // Loop over frames in a video stream (example: 10 frames)
  for (int t = 0; t < 10; ++t) {
    // Some example input here, normally this is where camera data is acquired
    auto image = std::vector<std::uint8_t>(height * width * 3);  // 3 for RGB

    // The time between two video frames in seconds
    // In this example we assume a constant frame rate of 30 fps, but variable
    // rates are supported.
    const float delta_t = 1.f / 30.f;

    // Process the frame
    std::vector<BoxPrediction> predictions(0);
    const auto error_code =
        ppd.process_frame<image_format>(image.data(), predictions, delta_t);
    if (error_code != plumerai::ErrorCode::SUCCESS) {
      printf("Error: %s\n", plumerai::error_code_string(error_code));
      return 1;
    }

    // Display the results to stdout
    for (auto &p : predictions) {
      printf(
          "Box #%u of class %d with confidence %.2f @ (x,y) -> (%.2f,%.2f) "
          "till (%.2f,%.2f)\n",
          p.id, p.class_id, p.confidence, p.x_min, p.y_min, p.x_max, p.y_max);
    }
  }
  return 0;
}
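The example prints every returned box. An application that only cares about people, and wants a stricter cut-off than the built-in threshold, could filter the results first. The sketch below uses stand-in types (SketchClass, SketchBox) that mirror the class_id and confidence fields of BoxPrediction so that it compiles on its own; confident_people is a hypothetical helper and not part of the API.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Stand-ins mirroring DetectionClass and two fields of BoxPrediction.
enum SketchClass {
  SKETCH_UNKNOWN = 0,
  SKETCH_PERSON = 1,
  SKETCH_HEAD = 2,
  SKETCH_FACE = 3
};

struct SketchBox {
  float confidence;      // between 0 and 1, higher means more confident
  SketchClass class_id;  // the class of the detected object
};

// Keeps only person boxes whose confidence is at least `min_confidence`.
inline std::vector<SketchBox> confident_people(
    const std::vector<SketchBox>& boxes, float min_confidence) {
  std::vector<SketchBox> kept;
  std::copy_if(boxes.begin(), boxes.end(), std::back_inserter(kept),
               [min_confidence](const SketchBox& box) {
                 return box.class_id == SKETCH_PERSON &&
                        box.confidence >= min_confidence;
               });
  return kept;
}
```

A min_confidence below the built-in threshold has no effect, since boxes under that threshold are never produced in the first place.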

Upgrade guide

From version 1.14 to 1.15

The confidence_threshold argument has been removed from the single_image function. The return type of debug_next_frame changed from bool to ErrorCodeType.

From version 1.13 to 1.14

In version 1.14 the API of process_frame changed compared to earlier versions: the return type is now an error code, and the resulting boxes are now returned via a reference argument.

If your code looked like this before:

const auto predictions = ppd.process_frame<image_format>(image.data());

Then it should be updated as follows:

std::vector<BoxPrediction> predictions(0);
const auto error_code = ppd.process_frame<image_format>(image.data(), predictions);
if (error_code != plumerai::ErrorCode::SUCCESS) {
  printf("Error code: %d\n", error_code);
}