Plumerai People Detection C++ API

This document describes the C++ API for the Plumerai People Detection software for videos on Arm Cortex-A and x86.

The API

The C++ API consists of a single header file, which is self-documented. It is simple: there is a constructor that needs to be run once, and a process_frame function that needs to be executed on each input frame. Additionally, there is a single_image function that can be used to process a single image independent of a video sequence.

The API is re-entrant, i.e. you can instantiate several PeopleDetection objects in different threads and use them independently. However, using the same instance from different threads at the same time is not supported.

PeopleDetection::PeopleDetection

PeopleDetection::PeopleDetection(int height, int width)

Initializes a new people detection object. This needs to be called only once at the start of the application.

Arguments

  • height int: The height of the input image in pixels.
  • width int: The width of the input image in pixels.

Returns

Nothing.

PeopleDetection::process_frame (RGB, YUYV)

template <ImageFormat image_format = ImageFormat::PACKED_RGB888>
std::vector<BoxPrediction> process_frame(const std::uint8_t *image_data,
                                         float delta_t = 0.f)

Process a single frame from a video sequence with RGB or YUYV input. Make sure the image is right side up; an upside-down image may still produce detections, but accuracy is significantly degraded. See below for the YUV420 version.

Note that the algorithm comes with a built-in threshold (e.g. 0.6; the exact value differs per model): boxes with confidences lower than that value won't be produced at all by this function.

Arguments

  • image_format ImageFormat: A template parameter which can be ImageFormat::PACKED_RGB888 or ImageFormat::PACKED_YUYV. For ImageFormat::PLANAR_YUV420 see the function below.
  • image_data const std::uint8_t *: A pointer to RGB image data (1st byte red, 3rd blue) of size height * width * 3 or YUYV image data of size height * width * 2.
  • delta_t float: The time in seconds it took between this and the previous video frame (1/fps). If set to 0 then the system clock will be used to compute this value.

Returns

std::vector<BoxPrediction>: The resulting bounding-boxes found in the frame.
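When frames do not arrive in real time (for example when reading from a recorded file), it is better to pass delta_t explicitly than to rely on the system-clock fallback. A minimal sketch of computing it with std::chrono; the helper name compute_delta_t is illustrative and not part of the Plumerai API:

```cpp
#include <chrono>

// Sketch: compute delta_t in seconds (1/fps) from two steady_clock
// timestamps, suitable for passing explicitly to process_frame.
float compute_delta_t(std::chrono::steady_clock::time_point previous,
                      std::chrono::steady_clock::time_point current) {
  return std::chrono::duration<float>(current - previous).count();
}
```

The resulting value could then be passed as the second argument of process_frame instead of the default of 0.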

PeopleDetection::process_frame (YUV420)

template <ImageFormat image_format = ImageFormat::PLANAR_YUV420>
std::vector<BoxPrediction> process_frame(const std::uint8_t *image_y,
                                         const std::uint8_t *image_u,
                                         const std::uint8_t *image_v,
                                         float delta_t = 0.f)

Process a single frame from a video sequence with planar YUV input and 420 chroma subsampling. Make sure the image is right side up; an upside-down image may still produce detections, but accuracy is significantly degraded. See above for the RGB and YUYV version.

Note that the algorithm comes with a built-in threshold (e.g. 0.6; the exact value differs per model): boxes with confidences lower than that value won't be produced at all by this function.

Arguments

  • image_format ImageFormat: A template parameter which has to be set to ImageFormat::PLANAR_YUV420. See the function above for the other formats.
  • image_y const std::uint8_t *: A pointer to the Y channel, of size height * width.
  • image_u const std::uint8_t *: A pointer to the U channel, of size height * width / 4.
  • image_v const std::uint8_t *: A pointer to the V channel, of size height * width / 4.
  • delta_t float: The time in seconds it took between this and the previous video frame (1/fps). If set to 0 then the system clock will be used to compute this value.

Returns

std::vector<BoxPrediction>: The resulting bounding-boxes found in the frame.
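Cameras often deliver the three planes in one contiguous I420 buffer, in which case the plane pointers can be derived from the frame dimensions. A minimal sketch; the struct and function names are illustrative, not part of the Plumerai API:

```cpp
#include <cstddef>
#include <cstdint>

// Sketch: locate the Y, U and V planes inside a contiguous I420 buffer.
// Y is full resolution; U and V are quarter size due to 4:2:0 subsampling.
struct Yuv420Planes {
  const std::uint8_t *y;
  const std::uint8_t *u;
  const std::uint8_t *v;
};

Yuv420Planes split_yuv420(const std::uint8_t *buffer, int height, int width) {
  const std::size_t y_size = static_cast<std::size_t>(height) * width;
  const std::size_t uv_size = y_size / 4;
  return {buffer, buffer + y_size, buffer + y_size + uv_size};
}
```

The three resulting pointers could then be passed as image_y, image_u and image_v.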

PeopleDetection::single_image

std::vector<BoxPrediction> single_image(const std::uint8_t *image_data,
                                        float confidence_threshold,
                                        int height = 0, int width = 0);

Process a single image that is not part of a video sequence. This should not be used for video data, but only for single-image evaluation and debugging. The returned box id values are not related to those returned by process_frame or by other calls to single_image.

Arguments

  • image_data const std::uint8_t *: A pointer to RGB image data (1st byte red, 3rd blue) of size height * width * 3.
  • confidence_threshold float: Any box with a confidence value below this threshold will be filtered out. Range between 0 and 1. A value of 0.63 is recommended for regular evaluation, but for mAP computation this can be set to 0.
  • height int: The height of the input image in pixels. If set to 0, the height passed to the constructor will be used.
  • width int: The width of the input image in pixels. If set to 0, the width passed to the constructor will be used.

Returns

std::vector<BoxPrediction>: The resulting bounding-boxes found in the image.

PeopleDetection::debug_next_frame

bool debug_next_frame(const char *output_file_name);

Enable debug mode for the next frame. The next time process_frame is called, this will dump the input image as well as internal data and final results to a file. This file can then be shared with Plumerai support for further analysis. The file will be overwritten if it already exists, so to debug multiple frames, distinct filenames have to be used in successive calls to this function. Warning: these files contain uncompressed image data and can become large.

Arguments

  • output_file_name const char *: A filename to write the data to.

Returns

bool: Returns true if all went well. The function can return false if this method is called twice without calling process_frame, or if the file could not be opened for writing.
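Because the output file is overwritten on each dump, debugging several frames requires a distinct filename per call. One possible naming scheme, sketched below; the helper name and filename pattern are assumptions, not prescribed by the API:

```cpp
#include <cstdio>
#include <string>

// Sketch: generate a distinct debug filename per frame index, e.g.
// "debug_frame_0007.bin" for frame 7, so successive dumps do not
// overwrite each other.
std::string debug_file_name(int frame_index) {
  char buffer[32];
  std::snprintf(buffer, sizeof(buffer), "debug_frame_%04d.bin", frame_index);
  return std::string(buffer);
}
```

Before each frame of interest, one could call debug_next_frame(debug_file_name(i).c_str()) followed by process_frame.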

BoxPrediction

struct BoxPrediction {
  float y_min;       // top coordinate between 0 and 1 in height dimension
  float x_min;       // left coordinate between 0 and 1 in width dimension
  float y_max;       // bottom coordinate between 0 and 1 in height dimension
  float x_max;       // right coordinate between 0 and 1 in width dimension
  float confidence;  // between 0 and 1, higher means more confident
  unsigned int id;   // the tracked identifier of this box
  unsigned int class_id;  // the class of the detected object
};

A structure representing a single resulting bounding box. Coordinates are between 0 and 1, with the origin at the top-left. Confidence values lie between 0 and 1. Note that the algorithm comes with a built-in threshold (e.g. 0.6; the exact value differs per model): boxes with confidences lower than that value won't be produced at all by the Plumerai People Detection functions, with the exception of the single_image function for mAP evaluations.
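Since the coordinates are normalized, drawing or cropping a box requires scaling them back to pixels. A minimal sketch; PixelBox and to_pixels are illustrative helpers written against the coordinate convention above, not part of the Plumerai API:

```cpp
// Sketch: convert normalized box coordinates (origin top-left, range 0..1)
// to pixel coordinates for a given frame size. Values are truncated toward
// zero; rounding could be used instead depending on the application.
struct PixelBox {
  int x_min, y_min, x_max, y_max;
};

PixelBox to_pixels(float y_min, float x_min, float y_max, float x_max,
                   int height, int width) {
  return {static_cast<int>(x_min * width), static_cast<int>(y_min * height),
          static_cast<int>(x_max * width), static_cast<int>(y_max * height)};
}
```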

Header file

#pragma once

#include <string>
#include <vector>

#include "box_prediction.h"

namespace plumerai {

// Supported input formats
// - Packed/interleaved:
//   - RGB888
//   - YUYV, also known as YUY2, which has 4:2:2 subsampling
// - Planar:
//   - YUV420
enum class ImageFormat { PACKED_RGB888, PACKED_YUYV, PLANAR_YUV420 };

class PeopleDetection {
 public:
  // Initializes a new people detection object. This needs to be called only
  // once at the start of the application.
  //
  // @param height The height of the input image in pixels.
  // @param width The width of the input image in pixels.
  PeopleDetection(int height, int width);

  // Destructor, called automatically when the object goes out of scope
  ~PeopleDetection();

  // Process a single frame from a video sequence.
  // This version supports RGB or YUYV input. See below for the YUV420 version.
  //
  // @param image_format Can be either ImageFormat::PACKED_RGB888 or
  //  ImageFormat::PACKED_YUYV. See below for PLANAR_YUV420.
  // @param image_data A pointer to RGB image data (1st byte red, 3rd blue)
  //  of size height * width * 3 or YUYV image data of size height * width * 2.
  // @param delta_t The time in seconds it took between this and the previous
  //  video frame (1/fps). If left to the default of 0, then the system clock
  //  will be used to compute this value.
  // @return the resulting bounding-boxes found in the frame.
  template <ImageFormat image_format = ImageFormat::PACKED_RGB888>
  std::vector<BoxPrediction> process_frame(const std::uint8_t *image_data,
                                           float delta_t = 0.f);

  // Process a single frame from a video sequence with planar YUV input and 420
  // chroma subsampling. See above for the RGB and YUYV versions.
  //
  // @param image_format Has to be ImageFormat::PLANAR_YUV420. See above for
  //  the other formats.
  // @param image_y A pointer to the Y channel, of size height * width.
  // @param image_u A pointer to the U channel, of size height * width / 4.
  // @param image_v A pointer to the V channel, of size height * width / 4.
  // @param delta_t The time in seconds it took between this and the previous
  //  video frame (1/fps). If left to the default of 0, then the system clock
  //  will be used to compute this value.
  // @return the resulting bounding-boxes found in the frame.
  template <ImageFormat image_format = ImageFormat::PLANAR_YUV420>
  std::vector<BoxPrediction> process_frame(const std::uint8_t *image_y,
                                           const std::uint8_t *image_u,
                                           const std::uint8_t *image_v,
                                           float delta_t = 0.f);

  // Process a single image not part of a video sequence. This should not be
  // used for video data, but only for single image evaluation and debugging.
  // The returned box id values are not related to those returned by
  // `process_frame` or other calls to `single_image`.
  //
  // @param image_data A pointer to RGB image data (1st byte red, 3rd blue)
  //  of size height * width * 3.
  // @param confidence_threshold Any box with a confidence value below this
  //  threshold will be filtered out. Range between 0 and 1. A value of 0.63
  //  is recommended for regular evaluation, but for mAP computation this can
  //  be set to 0.
  // @param height The height of the input image in pixels. If `height = 0` the
  //  height set in the constructor will be used.
  // @param width The width of the input image in pixels. If `width = 0` the
  //  width set in the constructor will be used.
  // @return the resulting bounding-boxes found in the image.
  std::vector<BoxPrediction> single_image(const std::uint8_t *image_data,
                                          float confidence_threshold,
                                          int height = 0, int width = 0);

  //  Enable debug mode for the next frame. The next time `process_frame` is
  //  called, this will dump the input image as well as internal data and final
  //  results to a file. This file can then be shared with Plumerai support for
  //  further analysis. The file will be overwritten if it already exists, so to
  //  debug multiple frames, distinct filenames have to be used in successive
  //  calls to this function.
  //  Warning: these files contain uncompressed image data and can become large.
  //
  // @param output_file_name A filename to dump the data to.
  // @return a bool indicating if all went well. It might return `false` if
  //  this method is called twice without calling `process_frame`, or if the
  //  file could not be opened for writing.
  bool debug_next_frame(const char *output_file_name);
};

}  // namespace plumerai

Example usage

Below is an example of using the C++ API shown above.

#include <cstdint>
#include <cstdio>
#include <vector>

#include "plumerai/people_detection.h"

int main() {

  // Settings, to be changed as needed
  constexpr int width = 1600;   // camera image width in pixels
  constexpr int height = 1200;  // camera image height in pixels
  constexpr auto image_format = plumerai::ImageFormat::PACKED_RGB888;

  // Initialize the people detection algorithm
  auto ppd = plumerai::PeopleDetection(height, width);

  // Loop over frames in a video stream
  while (true) {

    // Some example input here, normally this is where camera data is acquired
    auto image = std::vector<std::uint8_t>(height * width * 3);  // 3 for RGB

    // Process the frame
    auto predictions = ppd.process_frame<image_format>(image.data());

    // Display the results to stdout
    for (auto &p : predictions) {
      printf(
          "Box #%u of class %u with confidence %.2f @ (x,y) -> (%.2f,%.2f) "
          "till (%.2f,%.2f)\n",
          p.id, p.class_id, p.confidence, p.x_min, p.y_min, p.x_max, p.y_max);
    }
  }
  return 0;
}