Plumerai Video Intelligence C++ API¶
This document describes the C++ API of the Plumerai Video Intelligence library.
The C++ API header files are self-documented. The main entry point is the plumerai::VideoIntelligence class, which provides access to all functionality of the Plumerai software. It has a process_frame function that needs to be executed on each input frame. The various intelligence features, such as object detection and familiar face identification, are available through the different interfaces listed here.
Please refer to the minimal examples for example code to get started.
The API is re-entrant, i.e. you can instantiate several VideoIntelligence objects in different threads and use them independently. However, using the same instance from different threads at the same time is not supported.
VideoIntelligence¶
VideoIntelligence¶
Initializes a new Video Intelligence object.
This version of the constructor uses dynamic memory allocation. This is the default and recommended way to use the VideoIntelligence object.
This needs to be called only once at the start of the application.
Arguments:
- height: The height of the input image in pixels.
- width: The width of the input image in pixels.
VideoIntelligence¶
Initializes a new Video Intelligence object.
This version of the constructor uses a user-provided memory buffer and is meant for advanced use-cases where dynamic memory allocation is not supported or not desired.
This needs to be called only once at the start of the application.
The VideoIntelligence object will not take ownership of the memory buffer. The memory buffer must be at least as big as the size returned by get_required_memory_size and must stay valid for the lifetime of the VideoIntelligence object.
If the size of the memory buffer is not big enough, the resulting object will be invalid, and all functions will return NOT_ENOUGH_MEMORY.
Arguments:
- height: The height of the input image in pixels.
- width: The width of the input image in pixels.
- buffer: A pointer to the memory buffer, must stay valid for the lifetime of the VideoIntelligence object.
- buffer_size: The size of the memory buffer. The required size can be obtained through get_required_memory_size.
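Putting these pieces together, construction with a caller-owned buffer could look like the following sketch. It assumes get_required_memory_size can be called before construction with the same height and width; check the header for the exact signature.

```cpp
#include <cstdlib>

#include "plumerai/video_intelligence.h"

int main() {
  const int height = 720;
  const int width = 1280;

  // Query the required size, then provide a buffer of at least that size.
  // The buffer is owned by the caller and must outlive the object.
  const auto buffer_size =
      plumerai::VideoIntelligence::get_required_memory_size(height, width);
  void* buffer = std::malloc(buffer_size);

  {
    auto pvi = plumerai::VideoIntelligence(height, width, buffer, buffer_size);
    // ... process frames here ...
  }  // `pvi` is destroyed before its buffer is released

  std::free(buffer);
  return 0;
}
```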
process_frame¶
template <ImageFormat image_format>
ErrorCode process_frame(const ImagePointer<image_format> image_data,
float delta_t = 0.f);
Process a single frame from a video sequence.
Make sure the image is right side up: an upside-down image may still work, but accuracy is significantly degraded.
Supported input formats:
- Packed/interleaved data-types:
- PACKED_RGB888: 8-bit red, green, and blue. Red first.
- Example in memory: R0 G0 B0 R1 G1 B1 ...
- PACKED_RGBA8888: 8-bit red, green, blue, and alpha. Red first. Also known as ABGR32.
- Example in memory: R0 G0 B0 A0 R1 G1 B1 A1 ...
- PACKED_BGRA8888: 8-bit blue, green, red, and alpha. Blue first. Also known as ARGB32.
- Example in memory: B0 G0 R0 A0 B1 G1 R1 A1 ...
- PACKED_RGB565: 5-bit red, 6-bit green, and 5-bit blue. Blue and the low 3 bits of green in the first byte, the high 3 bits of green and red in the second byte.
- Example in memory: BG0 GR0 BG1 GR1 ...
- PACKED_YUYV: Also known as YUY2. 8-bit luma (Y) and chroma (U, V) with 4:2:2 subsampling.
- Example in memory: Y0 U01 Y1 V01 Y2 U23 Y3 V23 ...
- Planar formats:
- PLANAR_RGB888: 8-bit red, green, and blue. Red first. We assume the R, G, and B data is consecutive in memory without padding in between the three channels.
- Example in memory: R0 R1 R2 ... G0 G1 G2 ... B0 B1 B2 ...
- PLANAR_YUV420: 8-bit luma (Y) and chroma (U, V) with 4:2:0 subsampling. The Y, U, and V planes can be in 3 different memory locations.
- Example in memory: Y0 Y1 Y2 Y3 ..., U01 U23 ..., V01 V23 ...
- PLANAR_NV12: 8-bit luma (Y) and interleaved chroma (UV) with 4:2:0 subsampling. The Y and UV planes can be in 2 different memory locations.
- Example in memory: Y0 Y1 Y2 Y3 ..., U01 V01 U23 V23 ...
Note that not all formats are available on every platform.
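To make the layouts above concrete, here is a small self-contained sketch, independent of the Plumerai library, that packs a PACKED_RGB565 pixel and computes the buffer size of a PLANAR_YUV420 frame. The function names are illustrative, not part of the API.

```cpp
#include <cstddef>
#include <cstdint>

// PACKED_RGB565: blue and the low 3 bits of green in the first byte,
// the high 3 bits of green and red in the second byte (little-endian).
void pack_rgb565(std::uint8_t r, std::uint8_t g, std::uint8_t b,
                 std::uint8_t out[2]) {
  const std::uint16_t pixel = static_cast<std::uint16_t>(
      ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
  out[0] = pixel & 0xFF;         // bit layout: GGGBBBBB
  out[1] = (pixel >> 8) & 0xFF;  // bit layout: RRRRRGGG
}

// PLANAR_YUV420: a full-resolution Y plane plus U and V planes that are
// subsampled 2x both horizontally and vertically (4:2:0).
std::size_t yuv420_frame_size(std::size_t width, std::size_t height) {
  const std::size_t y_plane = width * height;
  const std::size_t chroma_plane = (width / 2) * (height / 2);
  return y_plane + 2 * chroma_plane;  // Y + U + V
}
```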
Arguments:
- image_format: A template parameter which must be one of the ImageFormat enum values.
- image_data: A pointer to the image data in the form of an ImagePointer helper struct.
- delta_t: The time in seconds between this and the previous video frame (1/fps). If set to 0, the system clock will be used to compute this value.
Returns:
- An error code of type ErrorCode. See that enum for more details.
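The call sequence can be sketched as follows, in the style of the examples elsewhere in this document. Here image_format, image_data, and has_next_frame are placeholders for your capture pipeline.

```cpp
auto pvi = plumerai::VideoIntelligence(height, width);
while (has_next_frame()) {  // placeholder for your frame source
  const auto error_code = pvi.process_frame(
      plumerai::ImagePointer<image_format>(image_data),
      0.f);  // delta_t = 0: frame spacing is derived from the system clock
  if (error_code != plumerai::ErrorCode::SUCCESS) {
    printf("ERROR: process_frame returned %s\n",
           plumerai::error_code_string(error_code));
    break;
  }
}
```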
single_image¶
template <ImageFormat image_format>
ErrorCode single_image(const ImagePointer<image_format> image_data,
int height = 0, int width = 0);
Process a single image not part of a video sequence.
This should not be used for video data. It can be used for face enrollments from a set of images. The object detection box id values obtained after calling single_image are not related to those generated through process_frame or through other calls to single_image.
See the documentation under VideoIntelligence::process_frame for details about the formats.
Arguments:
- image_format: A template parameter which must be one of the ImageFormat enum values.
- image_data: A pointer to the image data in the form of an ImagePointer helper struct.
- height: The height of the input image in pixels. If height = 0, the height set in the constructor will be used.
- width: The width of the input image in pixels. If width = 0, the width set in the constructor will be used.
Returns:
- An error code of type ErrorCode. See that enum for more details.
set_night_mode¶
Configure the video intelligence algorithm for either day mode color videos (default) or night mode IR videos.
This configures the algorithm for optimal performance on the corresponding type of input data.
After switching from day to night mode or back, the motion detection algorithm will need a couple of video frames to stabilize, so the motion-grid will not be immediately available.
This function does not have to be called before every frame, only when switching from RGB to IR video data or back.
Arguments:
- night_mode: Set to true for night mode or false for day mode.
Returns:
- Returns ErrorCode::SUCCESS on success.
camera_is_unstable¶
Signal that the camera/ISP is unstable.
In some situations, the camera or ISP may be adjusting its settings, resulting in unstable video frames. This can happen, for example, during auto-exposure or switching to and from IR mode. After calling this function, the algorithm will not run motion detection when frames are processed. When camera_is_no_longer_unstable is called, the algorithm will reset its internal motion state and continue processing frames as usual.
Returns:
- Returns ErrorCode::SUCCESS on success.
camera_is_no_longer_unstable¶
Signal that the camera/ISP is stable again.
Needs to be called some time after camera_is_unstable to re-enable the Plumerai motion detection. See camera_is_unstable for more information.
Returns:
- Returns ErrorCode::SUCCESS on success.
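The intended call pattern around an ISP adjustment can be sketched as follows; isp_is_adjusting is a placeholder for your own status check, not part of the Plumerai API.

```cpp
pvi.camera_is_unstable();            // motion detection paused from here on
while (isp_is_adjusting()) {         // placeholder, e.g. an auto-exposure sweep
  pvi.process_frame(...);            // frames can still be processed
}
pvi.camera_is_no_longer_unstable();  // motion state reset; back to normal
```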
store_state¶
Store the current state of the algorithm to a byte array.
This function can be used when processing a video in chunks, doing different chunks at different times or on different machines. The state can be restored by calling restore_state with the data returned by store_state. When the library is built with support for familiar face identification, the state includes the face library.
Constraints:
- The delta_t parameter of process_frame can not be left at zero after restoring a previous state.
- If familiar face identification is enabled, the state can only be stored and restored when not enrolling.
Arguments:
- state: A vector to store the serialized state in.
Returns:
- Returns ErrorCode::SUCCESS on success, or ErrorCode::STATE_WHILE_ENROLLING if the state is stored while enrolling.
Example:
auto pvi = plumerai::VideoIntelligence(height, width);
std::vector<std::uint8_t> state;
auto error_code = pvi.store_state(state);
if (error_code != plumerai::ErrorCode::SUCCESS) {
printf("ERROR: store_state returned %s\n",
plumerai::error_code_string(error_code));
}
restore_state¶
Restore the state of the algorithm from a byte array.
See store_state for more information. The user must ensure that the height and width of the current object match the height and width of the state that is being restored.
Arguments:
- state: A vector containing the serialized state.
Returns:
- Returns ErrorCode::SUCCESS on success. Returns ErrorCode::STATE_CORRUPT, ErrorCode::STATE_SETTINGS_MISMATCH, or ErrorCode::STATE_WHILE_ENROLLING on error.
Example:
auto pvi = plumerai::VideoIntelligence(height, width);
// The state as obtained by the store-state API, e.g. loaded from memory
std::vector<std::uint8_t> state = ...;
auto error_code = pvi.restore_state(state);
if (error_code != plumerai::ErrorCode::SUCCESS) {
printf("ERROR: restore_state returned %s\n",
plumerai::error_code_string(error_code));
}
debug_mode_start¶
Enable debug mode.
When process_frame is called while debug mode is active, this will store debug information. This data is meant to be stored to a file to be shared with Plumerai support for further analysis. These files contain uncompressed input image data and can become large (several megabytes per frame). If exclude_images is set to true, no image data will be included.
The resulting debug data can be retrieved using debug_mode_end.
Calling debug_mode_start invalidates the data pointer obtained from any previous calls to debug_mode_end.
Arguments:
- exclude_images: Set to true to exclude input images in the debug data.
Returns:
- Returns ErrorCode::SUCCESS if all went well. Returns ErrorCode::NOT_AVAILABLE on platforms where this functionality is not available. It can return ErrorCode::INTERNAL_ERROR if this method is called twice without calling debug_mode_end.
debug_mode_end¶
Stop debug mode and retrieve the debug data.
The user will receive a pointer to the gathered data. The data will be invalidated after another call to debug_mode_start.
This function is only available on platforms that support dynamic memory allocation.
Arguments:
- debug_data_buffer: An output parameter that receives a pointer to the debug data.
- debug_data_size: An output parameter that receives the size of the debug data.
Returns:
- Returns ErrorCode::SUCCESS if all went well. Returns ErrorCode::NOT_AVAILABLE on platforms where this functionality is not available. It can return ErrorCode::INTERNAL_ERROR if this method is called twice without calling debug_mode_start, or ErrorCode::INVALID_ARGUMENT when debug_data_buffer is null.
Example:
auto pvi = plumerai::VideoIntelligence(height, width);
auto error_code = pvi.debug_mode_start(true);
if (error_code != plumerai::ErrorCode::SUCCESS) {
printf("ERROR: debug_mode_start returned %s\n",
plumerai::error_code_string(error_code));
}
for (...) {
pvi.process_frame(...);
}
const std::uint8_t* debug_data_buffer = nullptr;
std::size_t debug_data_size = 0;
error_code = pvi.debug_mode_end(&debug_data_buffer,
                                &debug_data_size);
if (error_code != plumerai::ErrorCode::SUCCESS) {
printf("ERROR: debug_mode_end returned %s\n",
plumerai::error_code_string(error_code));
}
// Write the data to a file for Plumerai support
const char* debug_file_name = "/tmp/plumerai_debug_data.bin";
FILE* debug_file = fopen(debug_file_name, "wb");
fwrite(debug_data_buffer, 1, debug_data_size, debug_file);
fclose(debug_file);
get_required_memory_size¶
Get the required memory size for the VideoIntelligence object.
This optional function can be used to avoid any dynamic memory allocation within the VideoIntelligence object. See the advanced VideoIntelligence constructor for details.
Arguments:
- height: The height of the input image in pixels.
- width: The width of the input image in pixels.
code_version¶
Returns the version of the video intelligence code as a date and time.
For other version numbers, see also ObjectDetection::detector_version for the object detector, and FaceIdentification::embedding_version for the face embedder.
Returns:
- The version of the code as YYYY.MM.DD.HH.MM date and time string.
object_detection¶
Get the interface for the ObjectDetection video intelligence features.
motion_detection¶
Get the interface for the MotionDetection video intelligence features.
detection_zones¶
Get the interface for the DetectionZones video intelligence features.
face_identification¶
Get the interface for the FaceIdentification video intelligence features.
face_enrollment_automatic¶
Get the interface for the FaceEnrollmentAutomatic video intelligence features.
face_enrollment_manual¶
Get the interface for the FaceEnrollmentManual video intelligence features.
vlm_video_collection¶
Get the interface for the VLMVideoCollection video intelligence features.
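As a sketch of how these interfaces are used together, assuming a constructed pvi object and a frame already processed with process_frame:

```cpp
// Feature interfaces are accessed from the main VideoIntelligence object.
std::vector<BoxPrediction> predictions;
auto error_code = pvi.object_detection().get_detections(predictions);
if (error_code != plumerai::ErrorCode::SUCCESS) {
  printf("ERROR: get_detections returned %s\n",
         plumerai::error_code_string(error_code));
}
```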
Upgrade guide¶
From version 2.2 to 2.3¶
The meaning of the confidence field in the BoxPrediction struct has changed: the confidence values are now tuned so that, for each code/model version and for each object class, sensible values range from 0.5 to 1. If you used the confidence field before, e.g. for thresholding, you must re-tune your thresholds accordingly, or consider using the new Plumerai API functions for this instead.
By default, the Plumerai software now sets a threshold (e.g. 0.6) to filter out values below a certain confidence level. The default value can be queried with a new API function ObjectDetection::get_confidence_threshold. To change the threshold, use the newly added API function ObjectDetection::set_confidence_threshold.
As a side effect of some internal changes, the track IDs produced by the algorithm are no longer guaranteed to be consecutive values, instead they will sometimes skip values.
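As an illustration of thresholding in the new 0.5 to 1 range, a client-side filter can be sketched in plain C++. The Prediction struct and filter_by_confidence below are illustrative stand-ins, not part of the Plumerai API; in practice, prefer the built-in ObjectDetection::set_confidence_threshold.

```cpp
#include <vector>

// Illustrative stand-in for the confidence field of BoxPrediction.
struct Prediction {
  float confidence;  // tuned so sensible values range from 0.5 to 1
};

// Keep only predictions at or above the threshold (e.g. the 0.6 default).
std::vector<Prediction> filter_by_confidence(
    const std::vector<Prediction>& predictions, float threshold) {
  std::vector<Prediction> kept;
  for (const auto& p : predictions) {
    if (p.confidence >= threshold) kept.push_back(p);
  }
  return kept;
}
```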
From version 2.1 to 2.2¶
The track ID in the BoxPrediction struct has been updated, so code that uses track IDs needs to be updated accordingly:
- It has been renamed from id to track_id.
- It is now a signed 64-bit value (int64_t) instead of an unsigned 32-bit value.
- The track ID now starts counting from a unique random number instead of 0. This should likely not affect any code, but if it does, it is possible to re-create the old behaviour by saving the first track ID as a base value and subtracting it from subsequent track IDs.
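The base-value trick can be sketched as follows; TrackIdRebaser is an illustrative helper, not part of the Plumerai API.

```cpp
#include <cstdint>

// Re-create the pre-2.2 behaviour where track IDs started at 0: remember
// the first track ID seen and subtract it from all subsequent ones.
class TrackIdRebaser {
 public:
  std::int64_t rebase(std::int64_t track_id) {
    if (!has_base_) {
      base_ = track_id;
      has_base_ = true;
    }
    return track_id - base_;
  }

 private:
  std::int64_t base_ = 0;
  bool has_base_ = false;
};
```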
The FaceEnrollmentAutomatic class has been updated:
- The following functions have been renamed; code can be updated with a find-replace:
  - remove_embedding is now remove_face_id
  - remove_all_embeddings is now remove_all_face_ids
  - get_embedding_data is now get_enrollment_data
  - restore_embedding_data is now restore_enrollment_data
  - merge_embeddings is now merge_face_ids
- The get_face_snapshot function now takes an additional track_id argument to retrieve snapshots for different tracks within an enrollment. Refer to the function's documentation for further details.
From version 2.0 to 2.1¶
Apart from adding new functionality, two minor changes were made to the existing API in 2.1:
- The detection zones API used to accept an array of std::tuple items, but this is now a flat array of coordinates. See DetectionZones::add_zone for more information.
- Because there is now also a static memory version of the API, the VideoIntelligence constructor and FaceEnrollmentAutomatic::configure_face_snapshots now take additional optional arguments. Leaving them at their defaults will result in the same behaviour as in 2.0.
From version 1.x to 2.0¶
- The name 'People Detection' was changed to 'Video Intelligence' to reflect support for other detection classes and features such as advanced motion detection.
  - The library filename changed from libplumerai{peopledetection,faceidentification} to libplumeraivideointelligence. This should be updated in the relevant build scripts.
  - The main header file is now plumerai/video_intelligence.h instead of plumerai/people_detection.h, plumerai/face_identification.h or plumerai/face_identification_automatic.h.
  - The main class to use is now always plumerai::VideoIntelligence instead of plumerai::PeopleDetection, plumerai::FaceIdentification or plumerai::FaceIdentificationAutomatic.
- Different features have been moved to separate 'feature classes', accessible from the main class VideoIntelligence. For example, VideoIntelligence::motion_detection() provides access to all functionality related to motion detection, and VideoIntelligence::face_enrollment_automatic() provides access to automatic face enrollment functionality. Most functions that were in PeopleDetection, FaceIdentification or FaceIdentificationAutomatic have been moved to one of those feature classes. For example:
  - PeopleDetection::add_detection_zone(...) is now VideoIntelligence::detection_zones().add_zone(...)
  - FaceIdentification::add_face_embedding(...) is now VideoIntelligence::face_enrollment_manual().add_embedding(...)
  - FaceIdentification::get_face_id(...) is now VideoIntelligence::face_identification().get_face_id(...)
- The function process_frame no longer returns bounding boxes as output. Instead, the boxes are now accessible through VideoIntelligence::object_detection().get_detections(...) after the frame has been processed. Furthermore, image data pointers are now wrapped in a plumerai::ImagePointer struct, as shown in the example below.
Code that looked like this before:
#include "plumerai/people_detection.h"
auto ppd = plumerai::PeopleDetection(height, width);
std::vector<BoxPrediction> predictions(0);
const auto error_code =
ppd.process_frame<image_format>(image_data, predictions, delta_t);
Should be updated to this:
#include "plumerai/video_intelligence.h" // (1)
// Initialize the Video Intelligence object (2)
auto pvi = plumerai::VideoIntelligence(height, width);
// Process a video frame (3)
auto error_code = pvi.process_frame(
plumerai::ImagePointer<image_format>(image_data), delta_t);
// Get bounding box results (4)
std::vector<BoxPrediction> predictions;
error_code = pvi.object_detection().get_detections(predictions);
1. Replace plumerai/people_detection.h, plumerai/face_identification.h or plumerai/face_identification_automatic.h by plumerai/video_intelligence.h.
2. Replace plumerai::PeopleDetection by plumerai::VideoIntelligence.
3. Wrap the raw pointer argument in plumerai::ImagePointer and remove the predictions output argument.
4. Get bounding box results through the new ObjectDetection::get_detections function.
From version 1.14 to 1.15¶
The confidence_threshold argument has been removed from the single_image function. The return type of debug_next_frame changed from bool to ErrorCodeType.
In version 1.15 the API of start_face_enrollment and finish_face_enrollment changed compared to earlier versions:
- The function get_cumulative_enrollment_score was removed.
- The previous_embedding argument of start_face_enrollment was removed.
- The resulting embedding of finish_face_enrollment is now returned via a reference argument.
- The return type of both functions is now an error code, and finish_face_enrollment can indicate a low quality enrollment through this error code. There is no longer a need to check the enrollment score for the quality of the embedding.
If your code looked like this before:
Then it should be updated as follows:
std::vector<int8_t> embedding;
const auto error_code = ffid.finish_face_enrollment(embedding);
if (error_code != plumerai::ErrorCode::SUCCESS) {
printf("Error code: %d\n", error_code);
}
From version 1.13 to 1.14¶
In version 1.14 the API of process_frame changed compared to earlier versions: the return type is now an error code, and the resulting boxes are now returned via a reference argument.
If your code looked like this before:
Then it should be updated as follows: