Plumerai People Detection Python API

This document describes the Python API for the Plumerai People Detection software for videos.


The Python API consists of a single class with a constructor that needs to be run once, and a process_frame function that needs to be executed on each input frame. Additionally, there is a single_image function that can be used to process a single image independently of a video sequence.



PeopleDetection.__init__(height: int, width: int)

Initializes a new people detection object.

This needs to be called only once at the start of the application.

Arguments:

  • height: The height of the input image in pixels.
  • width: The width of the input image in pixels.

Returns:

  • Nothing.


PeopleDetection.process_frame(
    image, delta_t: float = 0.0
) -> tuple[ErrorCode, tuple[BoxPrediction, ...]]

Process a single frame from a video sequence with RGB input.

The image must have the height and width that were specified when the PeopleDetection object was created.

Arguments:

  • image: A tensor of shape (video_height, video_width, 3) with RGB image data. It can be a NumPy array or a TensorFlow, PyTorch, or JAX tensor.
  • delta_t: The time in seconds between this frame and the previous video frame (1/fps). If left at the default of 0, the system clock is used to compute this value.

Returns:

  • An error code of type ErrorCode and the resulting bounding boxes found in the frame.
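
When frames carry their own capture timestamps (for example from a recorded file with a variable frame rate), delta_t can be derived from those timestamps instead of the system clock. The following is a minimal sketch under that assumption; the helper name frame_deltas and its first_delta parameter are hypothetical and not part of the Plumerai API.

```python
from typing import Iterable, Iterator


def frame_deltas(
    timestamps: Iterable[float], first_delta: float = 0.0
) -> Iterator[float]:
    """Yield the time in seconds between consecutive frame timestamps.

    The first frame has no predecessor, so `first_delta` is yielded for it.
    Timestamps that go backwards raise a ValueError, since `delta_t` is
    documented as needing to be >= 0.
    """
    previous = None
    for timestamp in timestamps:
        if previous is None:
            yield first_delta
        else:
            delta = timestamp - previous
            if delta < 0:
                raise ValueError(
                    f"Timestamps must not decrease: {previous} -> {timestamp}"
                )
            yield delta
        previous = timestamp


# Capture times in seconds for three frames of a variable-rate stream
print(list(frame_deltas([0.0, 0.5, 1.5])))  # prints [0.0, 0.5, 1.0]
```

Each yielded value would then be passed as the delta_t argument of process_frame for the corresponding frame. Note that a first_delta of 0 means the system clock is used for that first call, as described above.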


PeopleDetection.single_image(image) -> tuple[ErrorCode, tuple[BoxPrediction, ...]]

Process a single image not part of a video sequence.

This should not be used for video data. The returned box id values are not related to those returned by process_frame or other calls to single_image.


Arguments:

  • image: A tensor of shape (*, *, 3) with RGB image data. It can be a NumPy array or a TensorFlow, PyTorch, or JAX tensor.

Returns:

  • An error code of type ErrorCode and the resulting bounding boxes found in the image.
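
Images loaded from disk are not always in the (*, *, 3) layout described above: they may be grayscale (two dimensions) or carry an alpha channel. The sketch below shows one way to bring such arrays into a three-channel layout with NumPy before passing them on. The helper name ensure_rgb_shape is hypothetical, and the sketch only adjusts the shape; it does not convert channel order (for example BGR to RGB) or change the dtype.

```python
import numpy as np


def ensure_rgb_shape(image) -> np.ndarray:
    """Return an array of shape (H, W, 3) from a grayscale, RGB, or RGBA array."""
    array = np.asarray(image)
    if array.ndim == 2:
        # Grayscale: repeat the single channel three times
        array = np.stack([array] * 3, axis=-1)
    elif array.ndim == 3 and array.shape[2] == 4:
        # RGBA: drop the alpha channel
        array = array[:, :, :3]
    if array.ndim != 3 or array.shape[2] != 3:
        raise ValueError(f"Expected shape (H, W, 3), got {array.shape}")
    # Slicing can produce a non-contiguous view, so return a contiguous copy
    return np.ascontiguousarray(array)


gray = np.zeros((480, 640), dtype=np.uint8)
print(ensure_rgb_shape(gray).shape)  # prints (480, 640, 3)
```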


A structure representing a single resulting bounding box. Coordinates are between 0 and 1, with the origin at the top-left.

class BoxPrediction(NamedTuple):
    y_min: float  # top coordinate between 0 and 1 in height dimension
    x_min: float  # left coordinate between 0 and 1 in width dimension
    y_max: float  # bottom coordinate between 0 and 1 in height dimension
    x_max: float  # right coordinate between 0 and 1 in width dimension
    confidence: float  # between 0 and 1, higher means more confident
    id: int = 0  # the tracked identifier of this box
    class_id: int = DetectionClass.CLASS_UNKNOWN  # the class of the object
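
Because the coordinates are normalized to the range 0 to 1, they usually need to be scaled by the image size before drawing or cropping. The sketch below illustrates that conversion. The BoxPrediction class here is a local stand-in that mirrors the fields documented above so the example runs on its own, and to_pixel_box is a hypothetical helper, not part of the Plumerai API.

```python
from typing import NamedTuple


class BoxPrediction(NamedTuple):
    """Local stand-in mirroring the documented coordinate fields."""

    y_min: float
    x_min: float
    y_max: float
    x_max: float
    confidence: float
    id: int = 0


def to_pixel_box(
    box: BoxPrediction, height: int, width: int
) -> tuple[int, int, int, int]:
    """Scale normalized coordinates to pixels as (left, top, right, bottom)."""
    left = round(box.x_min * width)
    top = round(box.y_min * height)
    right = round(box.x_max * width)
    bottom = round(box.y_max * height)
    return left, top, right, bottom


box = BoxPrediction(y_min=0.25, x_min=0.5, y_max=0.75, x_max=1.0, confidence=0.9)
print(to_pixel_box(box, height=1200, width=1600))  # prints (800, 300, 1600, 900)
```

The x coordinates are scaled by the width and the y coordinates by the height, with the origin at the top-left as stated above.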


class DetectionClass(Enum):
    CLASS_HEAD = 2
    CLASS_FACE = 3


Error codes which can be returned by the API.

class ErrorCode(Enum):
    # All went well
    SUCCESS = 0

    # Should not occur, contact Plumerai if this happens

    # The `delta_t` parameter should be >= 0

Example usage

Below is an example of using the Python API shown above.

import numpy as np
import plumerai_people_detection as ppd_api

# Settings, to be changed as needed
width = 1600  # camera image width in pixels
height = 1200  # camera image height in pixels

# Initialize the people detection algorithm
ppd = ppd_api.PeopleDetection(height, width)

# Loop over frames in a video stream (example: 10 frames)
for t in range(10):
    # Some example input here, normally this is where camera data is acquired
    image = np.zeros((height, width, 3), dtype=np.uint8)

    # The time between two video frames in seconds. In this example we assume
    # a constant frame rate of 30 fps, but variable rates are supported.
    delta_t = 1. / 30.

    # Process the frame
    error_code, predictions = ppd.process_frame(image, delta_t)
    if error_code != ppd_api.ErrorCode.SUCCESS:
        raise RuntimeError(f"Error in 'process_frame': {error_code}")

    # Display the results
    for p in predictions:
        print(
            f"Box #{p.id} of class {p.class_id} with confidence {p.confidence} "
            f"@ (x,y) -> ({p.x_min:.2f},{p.y_min:.2f}) till ({p.x_max:.2f},{p.y_max:.2f})"
        )
    if len(predictions) == 0:
        print("No bounding boxes found in this frame")
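
Since each box carries a tracked id, the per-frame results from a loop like the one above can be aggregated over time, for example to count how many distinct tracked boxes appeared in a clip. The following is a minimal sketch of that idea. It assumes an id stays the same for the same tracked object across process_frame calls; the Box class is a hypothetical stand-in with only the two fields used here, and the 0.5 confidence threshold is an arbitrary illustrative value.

```python
from typing import Iterable, NamedTuple


class Box(NamedTuple):
    """Hypothetical stand-in holding only the fields this sketch needs."""

    confidence: float
    id: int


def distinct_ids(
    frames: Iterable[Iterable[Box]], min_confidence: float = 0.5
) -> set[int]:
    """Collect the ids of all boxes at or above a confidence threshold."""
    seen: set[int] = set()
    for predictions in frames:
        for p in predictions:
            if p.confidence >= min_confidence:
                seen.add(p.id)
    return seen


# Predictions for three consecutive frames; the last frame is empty
frames = [
    [Box(0.9, 1), Box(0.3, 2)],
    [Box(0.8, 1), Box(0.7, 3)],
    [],
]
print(sorted(distinct_ids(frames)))  # prints [1, 3]
```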