Plumerai People Detection C++ API¶
This document describes the C++ API for the Plumerai People Detection software for videos on Arm Cortex-A and x86.
The C++ API consists of a single self-documented header file. It is simple: there is a constructor that needs to be run once, and a process_frame
function that needs to be executed on each input frame. Additionally, there is a single_image
function that can be used to process a single image independent of a video sequence.
The API is re-entrant, i.e. you can instantiate several PeopleDetection objects in different threads and use them independently. However, using the same instance from different threads at the same time is not supported.
API¶
PeopleDetection¶
PeopleDetection¶
Initializes a new people detection object. This needs to be called only once at the start of the application.
Arguments
- `height` (`int`): The height of the input image in pixels.
- `width` (`int`): The width of the input image in pixels.
Returns
Nothing.
process_frame (RGB, YUYV)¶
```cpp
template <ImageFormat image_format>
ErrorCodeType PeopleDetection::process_frame(const std::uint8_t *image_data,
                                             std::vector<BoxPrediction> &results,
                                             float delta_t = 0.f);
```
Process a single frame from a video sequence with RGB or YUYV input. Make sure the image is right side up: an upside-down image may still work, but accuracy is significantly degraded. See below for the YUV420 version.
Note that the algorithm comes with a built-in threshold (e.g. 0.6 - this differs per model): boxes with confidences lower than that value won't be produced at all by this function.
Arguments
- `image_format` (`ImageFormat`): A template parameter which can be `ImageFormat::PACKED_RGB888` or `ImageFormat::PACKED_YUYV`. For `ImageFormat::PLANAR_YUYV420` see the function below.
- `image_data` (`const std::uint8_t *`): A pointer to RGB image data (1st byte red, 3rd byte blue) of size `height * width * 3`, or YUYV image data of size `height * width * 2`.
- `results` (`std::vector<BoxPrediction> &`): The resulting bounding boxes found in the frame.
- `delta_t` (`float`): The time in seconds between this and the previous video frame (1/fps). If set to 0, the system clock will be used to compute this value.
Returns
`ErrorCodeType` (== `int`): an error code of type `ErrorCode` (or `ErrorCodeFamiliarFaceID` if familiar face identification is enabled). See those enums for more details.
process_frame (YUV420)¶
```cpp
template <ImageFormat image_format>
ErrorCodeType PeopleDetection::process_frame(const std::uint8_t *image_y,
                                             const std::uint8_t *image_u,
                                             const std::uint8_t *image_v,
                                             std::vector<BoxPrediction> &results,
                                             float delta_t = 0.f);
```
Process a single frame from a video sequence with planar YUV input and 420 chroma subsampling. Make sure the image is right side up: an upside-down image may still work, but accuracy is significantly degraded. See above for the RGB and YUYV versions.
Note that the algorithm comes with a built-in threshold (e.g. 0.6 - this differs per model): boxes with confidences lower than that value won't be produced at all by this function.
Arguments
- `image_format` (`ImageFormat`): A template parameter which has to be set to `ImageFormat::PLANAR_YUYV420`. See the function above for the other formats.
- `image_y` (`const std::uint8_t *`): A pointer to the Y channel, of size `height * width`.
- `image_u` (`const std::uint8_t *`): A pointer to the U channel, of size `height * width / 4`.
- `image_v` (`const std::uint8_t *`): A pointer to the V channel, of size `height * width / 4`.
- `results` (`std::vector<BoxPrediction> &`): The resulting bounding boxes found in the frame.
- `delta_t` (`float`): The time in seconds between this and the previous video frame (1/fps). If set to 0, the system clock will be used to compute this value.
Returns
`ErrorCodeType` (== `int`): an error code of type `ErrorCode` (or `ErrorCodeFamiliarFaceID` if familiar face identification is enabled). See those enums for more details.
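As a minimal sketch (assuming a `ppd` instance constructed as in the example usage section, and a camera delivering planar YUV420 buffers at 30 fps), a call might look like:

```cpp
// Sketch: feed one planar YUV420 frame to process_frame.
// The three plane buffers are assumed to be filled with camera data.
std::vector<std::uint8_t> y_plane(height * width);
std::vector<std::uint8_t> u_plane(height * width / 4);
std::vector<std::uint8_t> v_plane(height * width / 4);

std::vector<BoxPrediction> predictions;
const auto error_code =
    ppd.process_frame<plumerai::ImageFormat::PLANAR_YUYV420>(
        y_plane.data(), u_plane.data(), v_plane.data(), predictions,
        1.f / 30.f);  // delta_t for a 30 fps stream
```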
single_image¶
```cpp
ErrorCodeType PeopleDetection::single_image(const std::uint8_t *image_data,
                                            float confidence_threshold,
                                            std::vector<BoxPrediction> &results,
                                            int height = 0, int width = 0);
```
Process a single image that is not part of a video sequence. This should not be used for video data, but only for single-image evaluation and debugging. The returned box `id` values are not related to those returned by `process_frame` or other calls to `single_image`.
Arguments
- `image_data` (`const std::uint8_t *`): A pointer to RGB image data (1st byte red, 3rd byte blue) of size `height * width * 3`.
- `confidence_threshold` (`float`): Any box with a confidence value below this threshold will be filtered out. Range between 0 and 1. A value of 0.63 is recommended for regular evaluation, but for mAP computation this can be set to 0.
- `results` (`std::vector<BoxPrediction> &`): The resulting bounding boxes found in the frame.
- `height` (`int`): The height of the input image in pixels. If `height = 0`, the height set in the constructor will be used.
- `width` (`int`): The width of the input image in pixels. If `width = 0`, the width set in the constructor will be used.
Returns
`ErrorCodeType` (== `int`): an error code of type `ErrorCode` (or `ErrorCodeFamiliarFaceID` if familiar face identification is enabled). See those enums for more details.
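A minimal sketch, assuming an existing `ppd` instance and an RGB buffer of the constructor's dimensions:

```cpp
// Sketch: evaluate one stand-alone RGB image at the recommended threshold.
std::vector<std::uint8_t> image(height * width * 3);  // assumed camera data

std::vector<BoxPrediction> predictions;
const auto error_code = ppd.single_image(image.data(), 0.63f, predictions);
```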
reset_tracker¶
This function is only available if the library was built with tracking support. This resets all internal tracker state. It is recommended to call this whenever two consecutive frames are too different from each other, such as when switching to a different camera input or when the camera abruptly moved.
Arguments
None.
Returns
Nothing.
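For instance, assuming an existing `ppd` instance, switching camera inputs might look like:

```cpp
// Sketch: when the video source changes, drop all tracker state so that
// box ids from the old camera are not carried over to the new one.
switch_to_second_camera();  // hypothetical application function
ppd.reset_tracker();
```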
store_state¶
Store the current state of the algorithm to a byte array.
This function can be used when processing a video in chunks, doing different chunks at different times or on different machines. The state can be restored by calling restore_state
with the data returned by store_state
. When the library is built with support for familiar face identification, the state includes the face library.
Constraints:
- The `delta_t` parameter of `process_frame` cannot be left at zero after restoring a previous state.
- If familiar face identification is enabled, the state can only be stored and restored when not enrolling.
Arguments
- `state` (`std::vector<std::uint8_t> &`): A vector to store the serialized state in.
Returns
`ErrorCodeType` (== `int`): an error code of type `ErrorCode`. See that enum for more details.
Example
```cpp
auto ppd = plumerai::PeopleDetection(height, width);
std::vector<std::uint8_t> state;
auto error_code = ppd.store_state(state);
if (error_code != plumerai::ErrorCode::SUCCESS) {
  printf("ERROR: store_state returned %d\n", error_code);
}
```
restore_state¶
Restore the state of the algorithm from a byte array.
See store_state
for more information. The user must ensure that the height and width of the current object match the height and width of the state that is being restored.
Arguments
- `state` (`std::vector<std::uint8_t> &`): A vector containing the serialized state.
Returns
`ErrorCodeType` (== `int`): an error code of type `ErrorCode`. See that enum for more details.
Example
```cpp
auto ppd = plumerai::PeopleDetection(height, width);
// The state as obtained by `store_state`, e.g. loaded from memory
std::vector<std::uint8_t> state = ...;
auto error_code = ppd.restore_state(state);
if (error_code != plumerai::ErrorCode::SUCCESS) {
  printf("ERROR: restore_state returned %d\n", error_code);
}
```
debug_next_frame¶
Enable debug mode for the next frame. The next time process_frame
is called, this will dump the input image as well as internal data and final results to a file. This file can then be shared with Plumerai support for further analysis. The file will be overwritten if it already exists, so to debug multiple frames, distinct filenames have to be used in successive calls to this function. Warning: these files contain uncompressed image data and can become large.
Arguments
- `output_file_name` (`const char *`): A filename to write the data to.
Returns
`bool`: Returns `true` if all went well. The function can return `false` if this method is called twice without calling `process_frame`, or if the file could not be opened for writing.
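A minimal sketch, assuming an existing `ppd` instance and an RGB `image` buffer as in the example usage section (the filename is an arbitrary example):

```cpp
// Sketch: dump the next processed frame for analysis by Plumerai support.
if (!ppd.debug_next_frame("debug_frame_0001.bin")) {
  printf("ERROR: could not enable debug mode\n");
}
// The next call to process_frame writes the debug file
std::vector<BoxPrediction> predictions;
ppd.process_frame<plumerai::ImageFormat::PACKED_RGB888>(image.data(),
                                                        predictions);
```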
BoxPrediction¶
```cpp
typedef enum {
  CLASS_UNKNOWN = 0,
  CLASS_PERSON = 1,
  CLASS_HEAD = 2,
  CLASS_FACE = 3,
  CLASS_MAX_ENUM = 3,
} DetectionClass;

typedef struct BoxPrediction {
  float y_min;             // top coordinate between 0 and 1 in height dimension
  float x_min;             // left coordinate between 0 and 1 in width dimension
  float y_max;             // bottom coordinate between 0 and 1 in height dimension
  float x_max;             // right coordinate between 0 and 1 in width dimension
  float confidence;        // between 0 and 1, higher means more confident
  unsigned int id;         // the tracked identifier of this box
  DetectionClass class_id; // the class of the detected object
} BoxPrediction;
```
A structure representing a single resulting bounding box. Coordinates are between 0 and 1, the origin is at the top-left. Confidence values lie between 0 and 1. Note that the algorithm comes with a built-in threshold (e.g. 0.6 - this differs per model): boxes with confidences lower than that value won't be produced at all by the Plumerai People Detection functions, with the exception of the single_image
function for mAP evaluations.
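Since the coordinates are normalized, drawing a box requires scaling them by the image dimensions. A minimal sketch (`PixelBox` and `to_pixel_box` are hypothetical helpers, not part of the Plumerai API):

```cpp
#include <cassert>

// Hypothetical helper type: a box in integer pixel coordinates.
struct PixelBox {
  int x, y, w, h;
};

// Convert normalized BoxPrediction coordinates (range 0..1, origin at the
// top-left) to pixel coordinates for a given image size.
PixelBox to_pixel_box(float x_min, float y_min, float x_max, float y_max,
                      int image_width, int image_height) {
  PixelBox box;
  box.x = static_cast<int>(x_min * image_width);
  box.y = static_cast<int>(y_min * image_height);
  box.w = static_cast<int>((x_max - x_min) * image_width);
  box.h = static_cast<int>((y_max - y_min) * image_height);
  return box;
}
```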
ErrorCode¶
```cpp
typedef enum {
  SUCCESS = 0,
  // Should not occur, contact Plumerai if this happens
  INTERNAL_ERROR = -1,
  // The `delta_t` parameter should be >= 0
  INVALID_DELTA_T = -2,
  // The `STATE_` error codes are only returned by `store_state` and
  // `restore_state`. See those functions for more details.
  // The state can not be (re)stored while enrolling
  STATE_WHILE_ENROLLING = -3,
  // The state could not be restored
  STATE_CORRUPT = -4,
  // The state was serialized with a different height/width than the current
  // object
  STATE_HEIGHT_WIDTH_MISMATCH = -5
} ErrorCode;
```
Example usage¶
Below is an example of using the C++ API shown above.
```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

#include "plumerai/people_detection.h"

int main() {
  // Settings, to be changed as needed
  constexpr int width = 1600;   // camera image width in pixels
  constexpr int height = 1200;  // camera image height in pixels
  constexpr auto image_format = plumerai::ImageFormat::PACKED_RGB888;

  // Initialize the people detection algorithm
  auto ppd = plumerai::PeopleDetection(height, width);

  // Loop over frames in a video stream
  while (true) {
    // Some example input here, normally this is where camera data is acquired
    auto image = std::vector<std::uint8_t>(height * width * 3);  // 3 for RGB

    // Process the frame
    std::vector<BoxPrediction> predictions(0);
    const auto error_code =
        ppd.process_frame<image_format>(image.data(), predictions);
    if (error_code != plumerai::ErrorCode::SUCCESS) {
      printf("Error code: %d\n", error_code);
      return 1;
    }

    // Display the results to stdout
    for (auto &p : predictions) {
      printf(
          "Box #%u of class %d with confidence %.2f @ (x,y) -> (%.2f,%.2f) "
          "till (%.2f,%.2f)\n",
          p.id, p.class_id, p.confidence, p.x_min, p.y_min, p.x_max, p.y_max);
    }
  }
  return 0;
}
```
Upgrade guide¶
From version 1.13 to 1.14¶
In version 1.14 the API of process_frame
changed compared to earlier versions: the return type is now an error code, and the resulting boxes are now returned via a reference argument.
If your code looked like this before:
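A sketch of a pre-1.14 call, assuming the old `process_frame` returned the box vector directly:

```cpp
// Hypothetical pre-1.14 usage: boxes were the return value
std::vector<BoxPrediction> predictions =
    ppd.process_frame<image_format>(image.data());
```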
Then it should be updated as follows:
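A sketch of the equivalent 1.14 call, checking the error-code return value and receiving the boxes via the reference argument:

```cpp
std::vector<BoxPrediction> predictions;
const auto error_code =
    ppd.process_frame<image_format>(image.data(), predictions);
if (error_code != plumerai::ErrorCode::SUCCESS) {
  printf("Error code: %d\n", error_code);
}
```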