Plumerai People Detection for Cortex-A C++ API
This document describes the C++ API for Plumerai’s people detection software for videos on Arm Cortex-A.
The API
The C++ API consists of a single, self-documented header file. It is simple: there is a constructor that needs to be run once, and a process_frame function that needs to be executed on each input frame. Additionally, there is a single_image function that can be used to process a single image independent of a video sequence.
The API is re-entrant, i.e. you can instantiate several PeopleDetection objects in different threads and use them independently. However, using the same instance from different threads at the same time is not supported.
PeopleDetection::PeopleDetection
Initializes a new people detection object. This needs to be called only once at the start of the application.
Arguments
- height (int): The height of the input image in pixels.
- width (int): The width of the input image in pixels.
Returns
Nothing.
PeopleDetection::process_frame (RGB)
Process a single frame from a video sequence with RGB input. Make sure the image is right side up: an upside-down image can still work, but accuracy is significantly degraded. See below for the YUV version.
Arguments
- image_data (const std::uint8_t *): A pointer to RGB image data (1st byte red, 3rd blue) of size height * width * 3.
- delta_t (float): The time in seconds between this and the previous video frame (1/fps). If set to 0, the system clock is used to compute this value.
Returns
std::vector<BoxPrediction>: The resulting bounding-boxes found in the frame.
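The delta_t argument can be derived in more than one way. The sketch below shows two possibilities: from a pair of capture timestamps, or from a known fixed frame rate. Both helper names (delta_t_seconds, delta_t_from_fps) are hypothetical and not part of the Plumerai API; the result would simply be passed as the second argument of process_frame.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not part of the Plumerai API): seconds elapsed between
// the capture timestamps of the previous and the current frame.
inline float delta_t_seconds(std::chrono::steady_clock::time_point previous,
                             std::chrono::steady_clock::time_point current) {
  assert(current >= previous);
  return std::chrono::duration<float>(current - previous).count();
}

// Hypothetical helper (not part of the Plumerai API): for a camera with a
// fixed, known frame rate, delta_t is simply 1/fps.
inline float delta_t_from_fps(float fps) {
  assert(fps > 0.f);
  return 1.0f / fps;
}
```

Note that, per the description above, a value of exactly 0 makes the library fall back to the system clock, so a caller that wants full control over timing should avoid passing 0.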
PeopleDetection::process_frame (YUV)
std::vector<BoxPrediction> process_frame(const std::uint8_t *image_y,
                                         const std::uint8_t *image_u,
                                         const std::uint8_t *image_v,
                                         float delta_t = 0.f);
Process a single frame from a video sequence with planar YUV input and 420 chroma subsampling. See above for the RGB version.
Arguments
- image_data_y (const std::uint8_t *): A pointer to the Y channel, of size height * width.
- image_data_u (const std::uint8_t *): A pointer to the U channel, of size height * width / 4.
- image_data_v (const std::uint8_t *): A pointer to the V channel, of size height * width / 4.
- delta_t (float): The time in seconds between this and the previous video frame (1/fps). If set to 0, the system clock is used to compute this value.
Returns
std::vector<BoxPrediction>: The resulting bounding-boxes found in the frame.
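Many camera pipelines deliver a YUV 4:2:0 frame as one contiguous buffer rather than three separate planes. The following is a minimal sketch of deriving the three plane pointers from such a buffer. It assumes the common I420 layout (the full Y plane, then U, then V), even image dimensions, and no row padding; none of this is stated by the API itself, and the names Yuv420Planes, i420_buffer_size and split_i420 are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type (not part of the Plumerai API).
struct Yuv420Planes {
  const std::uint8_t *y;  // height * width bytes
  const std::uint8_t *u;  // height * width / 4 bytes
  const std::uint8_t *v;  // height * width / 4 bytes
};

// Total size in bytes of a contiguous I420 frame: one full-resolution Y plane
// plus two quarter-resolution chroma planes.
inline std::size_t i420_buffer_size(int height, int width) {
  const std::size_t y_size = static_cast<std::size_t>(height) * width;
  return y_size + 2 * (y_size / 4);
}

// Splits a contiguous I420 buffer into Y, U and V plane pointers.
// Assumes Y, then U, then V, with even dimensions and no row padding.
inline Yuv420Planes split_i420(const std::uint8_t *buffer, int height,
                               int width) {
  assert(buffer != nullptr);
  assert(height > 0 && width > 0 && height % 2 == 0 && width % 2 == 0);
  const std::size_t y_size = static_cast<std::size_t>(height) * width;
  const std::size_t chroma_size = y_size / 4;
  return Yuv420Planes{buffer, buffer + y_size,
                      buffer + y_size + chroma_size};
}
```

The three resulting pointers correspond to the first three arguments of the YUV overload of process_frame. Semi-planar formats such as NV12 interleave U and V and would need a different conversion.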
PeopleDetection::single_image
std::vector<BoxPrediction> single_image(const std::uint8_t *image_data,
                                        float confidence_threshold);
Process a single image not part of a video sequence. This should not be used for video data, but only for single image evaluation and debugging. The returned box id values are not related to those returned by process_frame or other calls to single_image.
Arguments
- image_data (const std::uint8_t *): A pointer to RGB image data (1st byte red, 3rd blue) of size height * width * 3.
- confidence_threshold (float): Any box with a confidence value below this threshold will be filtered out. Range between 0 and 1. A value of 0.63 is recommended for regular evaluation, but for mAP computation this can be set to 0.
Returns
std::vector<BoxPrediction>: The resulting bounding-boxes found in the image.
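One possible workflow is to call single_image once with a threshold of 0 (for example to feed an mAP computation) and then apply the recommended 0.63 cut-off to the same results for display. The sketch below mirrors the documented rule that boxes below the threshold are dropped; it is an illustration only and not the library's internal implementation. The BoxPrediction struct is a local stand-in so the snippet compiles without the Plumerai headers, and keep_confident is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Local stand-in for the BoxPrediction struct of the API, so that this sketch
// compiles on its own.
struct BoxPrediction {
  float y_min;
  float x_min;
  float y_max;
  float x_max;
  float confidence;
  unsigned int id;
  unsigned int class_id;
};

// Hypothetical helper (not part of the Plumerai API): returns only the boxes
// whose confidence is at or above the given threshold.
inline std::vector<BoxPrediction> keep_confident(
    const std::vector<BoxPrediction> &boxes, float threshold) {
  assert(threshold >= 0.f && threshold <= 1.f);
  std::vector<BoxPrediction> kept;
  std::copy_if(boxes.begin(), boxes.end(), std::back_inserter(kept),
               [threshold](const BoxPrediction &box) {
                 return box.confidence >= threshold;
               });
  return kept;
}
```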
BoxPrediction
struct BoxPrediction {
  float y_min;            // top coordinate between 0 and 1 in height dimension
  float x_min;            // left coordinate between 0 and 1 in width dimension
  float y_max;            // bottom coordinate between 0 and 1 in height dimension
  float x_max;            // right coordinate between 0 and 1 in width dimension
  float confidence;       // between 0 and 1, higher means more confident
  unsigned int id;        // the tracked identifier of this box
  unsigned int class_id;  // the class of the detected object
};
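Because the box coordinates are normalized to the range 0 to 1, drawing or cropping requires scaling them by the image dimensions. The following is a minimal sketch of that conversion. PixelBox and to_pixels are hypothetical names, not part of the Plumerai API, and the BoxPrediction struct is repeated locally only so the snippet compiles on its own. Rounding to the nearest pixel and clamping to the image bounds are design choices of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Local stand-in for the BoxPrediction struct shown above.
struct BoxPrediction {
  float y_min;
  float x_min;
  float y_max;
  float x_max;
  float confidence;
  unsigned int id;
  unsigned int class_id;
};

// Hypothetical helper type (not part of the Plumerai API): box edges in pixels.
struct PixelBox {
  int left;
  int top;
  int right;
  int bottom;
};

// Scales normalized coordinates to pixel coordinates, rounding to the nearest
// pixel and clamping the result to [0, width] and [0, height].
inline PixelBox to_pixels(const BoxPrediction &box, int height, int width) {
  assert(height > 0 && width > 0);
  auto scale = [](float value, int size) {
    const int px =
        static_cast<int>(std::lround(value * static_cast<float>(size)));
    return std::max(0, std::min(px, size));
  };
  return PixelBox{scale(box.x_min, width), scale(box.y_min, height),
                  scale(box.x_max, width), scale(box.y_max, height)};
}
```

Note the argument order: the struct lists y before x, while the helper takes height before width to match the PeopleDetection constructor.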
Full header
#pragma once

#include <string>
#include <vector>

#include "box_prediction.h"

namespace plumerai {

class PeopleDetection {
 public:
  // Initializes a new people detection object. This needs to be called only
  // once at the start of the application.
  //
  // @param height The height of the input image in pixels.
  // @param width The width of the input image in pixels.
  PeopleDetection(int height, int width);

  // Destructor, called automatically when the object goes out of scope
  ~PeopleDetection();

  // Process a single frame from a video sequence with RGB input.
  // See below for the YUV version.
  //
  // @param image_data A pointer to RGB image data (1st byte red, 3rd blue)
  //  of size height * width * 3.
  // @param delta_t The time in seconds it took between this and the previous
  //  video frame (1/fps). If left to the default of 0, then the system clock
  //  will be used to compute this value.
  // @return the resulting bounding-boxes found in the frame.
  std::vector<BoxPrediction> process_frame(const std::uint8_t *image_data,
                                           float delta_t = 0.f);

  // Process a single frame from a video sequence with planar YUV input and 420
  // chroma subsampling. See above for the RGB version.
  //
  // @param image_data_y A pointer to the Y channel, of size height * width.
  // @param image_data_u A pointer to the U channel, of size height * width / 4.
  // @param image_data_v A pointer to the V channel, of size height * width / 4.
  // @param delta_t The time in seconds it took between this and the previous
  //  video frame (1/fps). If left to the default of 0, then the system clock
  //  will be used to compute this value.
  // @return the resulting bounding-boxes found in the frame.
  std::vector<BoxPrediction> process_frame(const std::uint8_t *image_y,
                                           const std::uint8_t *image_u,
                                           const std::uint8_t *image_v,
                                           float delta_t = 0.f);

  // Process a single image not part of a video sequence. This should not be
  // used for video data, but only for single image evaluation and debugging.
  // The returned box id values are not related to those returned by
  // `process_frame` or other calls to `single_image`.
  //
  // @param image_data A pointer to RGB image data (1st byte red, 3rd blue)
  //  of size height * width * 3.
  // @param confidence_threshold Any box with a confidence value below this
  //  threshold will be filtered out. Range between 0 and 1. A value of 0.63
  //  is recommended for regular evaluation, but for mAP computation this can
  //  be set to 0.
  // @return the resulting bounding-boxes found in the image.
  std::vector<BoxPrediction> single_image(const std::uint8_t *image_data,
                                          float confidence_threshold);
};

}  // namespace plumerai
Example usage
Below is an example of using the C++ API shown above.
#include <cstdint>  // std::uint8_t
#include <cstdio>   // printf
#include <vector>

#include "plumerai/people_detection.h"

int main() {
  // Settings, to be changed as needed
  constexpr int width = 1600;   // camera image width in pixels
  constexpr int height = 1200;  // camera image height in pixels

  // Initialize the people detection algorithm
  auto ppd = plumerai::PeopleDetection(height, width);

  // Loop over frames in a video stream
  while (true) {
    // Some example input here, normally this is where camera data is acquired
    auto image = std::vector<std::uint8_t>(height * width * 3);  // 3 for RGB

    // Process the frame
    auto predictions = ppd.process_frame(image.data());

    // Display the results to stdout (id and class_id are unsigned, hence %u)
    for (auto &p : predictions) {
      printf(
          "Box #%u of class %u with confidence %.2f @ (x,y) -> (%.2f,%.2f) "
          "till (%.2f,%.2f)\n",
          p.id, p.class_id, p.confidence, p.x_min, p.y_min, p.x_max, p.y_max);
    }
  }
  return 0;
}