Plumerai VLM Video API

This page documents the API for the VLM Video Collection and VLM Video Embedder components.

These components are used in Plumerai Video Search and Plumerai Custom AI Notifications. See those product pages for an overview of how these components are used in practice.

For context, the architecture diagram from the Video Search product page is included below:

[Figure: architecture overview diagram]

To use the text embedder and matcher components, refer to the VLM Text Search API documentation.

See the minimal examples for usage of the VLM Video Collection and VLM Video Embedder APIs.

VLMVideoCollection

start_clip

ErrorCode start_clip(bool include_thumbnails = false,
                     bool include_captioning_data = false) const;

Start a new clip to collect data for the VLM Video Embedder.

The user decides when data collection should start, e.g. when an object is detected or when the camera wakes up through a motion event. From that point onwards, VLMVideoCollection collects the data needed by the VLM Video Embedder on every call to process_frame, until end_clip is called.

The results will be available when calling end_clip. See end_clip for example usage.

If include_thumbnails is true, the collected data can be used for thumbnail generation. If include_captioning_data is true, the collected data can be used for caption generation. If the data is only used for video search with the VLM Video Embedder, both can be set to false to keep the data size smaller. The accuracy of video search is not affected by these parameters.

This is the clip-based version of video search. A clip-less alternative is available via set_collection_ready_callback; see the documentation of that function for details.

Arguments:

include_thumbnails: If true, include data required for thumbnails.
include_captioning_data: If true, include data required for video captioning.

Returns:

Returns SUCCESS on success, or CLIP_ALREADY_STARTED when a clip was already started.

end_clip

ErrorCode end_clip(std::vector<std::uint8_t>& clip_data) const;

Ends the data collection for the clip started with start_clip.

The results will be moved into the clip_data buffer.

Arguments:

clip_data: A buffer to store the collected data in. Any existing data is overwritten.

Returns:

Returns SUCCESS on success, or CLIP_NOT_YET_STARTED when no clip was started, when no frames have been processed, or when the clip was already ended.

Example:

auto pvi = plumerai::VideoIntelligence(height, width);
bool clip_in_progress = false;
auto clip_data = std::vector<std::uint8_t>(0);
while(true) {  // video frames loop
  pvi.process_frame(...);
  if (!clip_in_progress && ...) {  // e.g. predictions found, user input, fixed time
    auto error_code = pvi.vlm_video_collection().start_clip();
    if (error_code != plumerai::ErrorCode::SUCCESS) return;
    clip_in_progress = true;
  }
  if (clip_in_progress && ...) {  // e.g. no more predictions found
    auto error_code = pvi.vlm_video_collection().end_clip(clip_data);
    if (error_code != plumerai::ErrorCode::SUCCESS) return;
    clip_in_progress = false;
    // `clip_data` can be stored or processed with `VLMVideoEmbedder`
  }
}
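
The clip_in_progress bookkeeping from the example above can also be wrapped in a small helper. The following is a minimal sketch under stated assumptions, not part of the Plumerai API: ClipTracker, FakeCollection, add_frame and the local ErrorCode enum are hypothetical names introduced here so that the code compiles without the library. Only the start_clip/end_clip signatures and the return codes are taken from this page, and the sketch assumes the collection object can be held by const reference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for plumerai::ErrorCode; only the values used in this sketch.
enum class ErrorCode { SUCCESS, CLIP_ALREADY_STARTED, CLIP_NOT_YET_STARTED };

// Hypothetical stand-in for VLMVideoCollection that mimics the documented
// return codes. `add_frame` plays the role of `process_frame`.
class FakeCollection {
 public:
  ErrorCode start_clip(bool include_thumbnails = false,
                       bool include_captioning_data = false) const {
    (void)include_thumbnails;
    (void)include_captioning_data;
    if (started_) return ErrorCode::CLIP_ALREADY_STARTED;
    started_ = true;
    frames_ = 0;
    return ErrorCode::SUCCESS;
  }
  void add_frame() const {
    if (started_) ++frames_;
  }
  ErrorCode end_clip(std::vector<std::uint8_t>& clip_data) const {
    if (!started_ || frames_ == 0) return ErrorCode::CLIP_NOT_YET_STARTED;
    clip_data.assign(frames_, 0);  // one dummy byte per collected frame
    started_ = false;
    return ErrorCode::SUCCESS;
  }

 private:
  mutable bool started_ = false;
  mutable std::size_t frames_ = 0;
};

// Tracks whether a clip is open, so that start_clip and end_clip are only
// called in a state where they are expected to succeed.
template <typename Collection>
class ClipTracker {
 public:
  explicit ClipTracker(const Collection& collection)
      : collection_(collection) {}

  // Call once per frame, after process_frame. `activity` is the user's own
  // trigger (e.g. "an object is detected"). Returns true when `clip_data`
  // was filled with a finished clip.
  bool update(bool activity, std::vector<std::uint8_t>& clip_data) {
    if (activity && !in_progress_) {
      const ErrorCode code = collection_.start_clip();
      // Our own flag says no clip is open, so this would be a logic bug.
      assert(code != ErrorCode::CLIP_ALREADY_STARTED);
      in_progress_ = (code == ErrorCode::SUCCESS);
      return false;
    }
    if (!activity && in_progress_) {
      in_progress_ = false;
      return collection_.end_clip(clip_data) == ErrorCode::SUCCESS;
    }
    return false;
  }

  bool in_progress() const { return in_progress_; }

 private:
  const Collection& collection_;
  bool in_progress_ = false;
};
```

With the real library, the tracker would presumably wrap the object returned by pvi.vlm_video_collection(), and update would be called right after each process_frame call.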

set_collection_ready_callback

ErrorCode set_collection_ready_callback(
    std::function<void(std::vector<std::uint8_t>&&)> callback) const;

Register a callback invoked synchronously from process_frame whenever a new collection is ready.

This is the clip-less version of video search, as an alternative to the clip-based version using the start_clip/end_clip API. A callback is registered that automatically triggers whenever data for video search is ready to be collected. This is driven by internal Plumerai logic rather than a user-defined time interval. In case of clear clip boundaries (e.g. triggered by PIR waking up the camera) it is recommended to use the start_clip/end_clip API instead. In case of cameras that run 24/7 with no clear clip boundaries, this API is recommended.

The user must call finish_all_collections at end-of-video to flush still-active collections.

The callback is invoked on the calling thread of process_frame (and of finish_all_collections) and may fire zero, one, or multiple times within a single call.

Arguments:

callback: A function invoked with the serialized collection data. The data is passed by rvalue reference; ownership is transferred. The user must move into their own container or copy the bytes before the callback returns.

Returns:

Returns SUCCESS when the callback is registered, or NOT_AVAILABLE when VLM video collection is not available.

Example:

auto pvi = plumerai::VideoIntelligence(height, width);
auto collections = std::vector<std::vector<std::uint8_t>>{};
auto error_code =
    pvi.vlm_video_collection().set_collection_ready_callback(
        [&collections](std::vector<std::uint8_t>&& blob) {
          collections.push_back(std::move(blob));
          // ... or store/embed the blob with `VLMVideoEmbedder`
        });
if (error_code != plumerai::ErrorCode::SUCCESS) return;
while (...) {  // video frames loop
  pvi.process_frame(...);  // callback may fire 0+ times per call
}
pvi.vlm_video_collection().finish_all_collections();
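
Because the callback runs synchronously inside process_frame, it may be preferable to only take ownership of the blob there and to do heavier work (storing, embedding) elsewhere. The following is a minimal sketch of such a hand-off using only the standard library. BlobQueue, make_collection_callback and drain are hypothetical helpers introduced for illustration and are not part of the Plumerai API; only the callback signature is taken from this page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

using Blob = std::vector<std::uint8_t>;

// Hypothetical helper: a mutex-protected FIFO that takes ownership of each
// blob, so the callback itself stays cheap.
class BlobQueue {
 public:
  void push(Blob&& blob) {
    std::lock_guard<std::mutex> lock(mutex_);
    blobs_.push_back(std::move(blob));
  }

  // Moves the oldest blob into `out`; returns false when the queue is empty.
  bool try_pop(Blob& out) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (blobs_.empty()) return false;
    out = std::move(blobs_.front());
    blobs_.pop_front();
    return true;
  }

 private:
  std::mutex mutex_;
  std::deque<Blob> blobs_;
};

// Builds a callback with the signature documented for
// set_collection_ready_callback. The queue must outlive the callback.
std::function<void(Blob&&)> make_collection_callback(BlobQueue& queue) {
  return [&queue](Blob&& blob) { queue.push(std::move(blob)); };
}

// Pops every queued blob and passes it to `handler`, e.g. from a worker
// thread or between frames. Returns the number of blobs handled.
std::size_t drain(BlobQueue& queue,
                  const std::function<void(const Blob&)>& handler) {
  assert(handler && "handler must be callable");
  std::size_t handled = 0;
  Blob blob;
  while (queue.try_pop(blob)) {
    handler(blob);
    ++handled;
  }
  return handled;
}
```

The result of make_collection_callback(queue) could be passed to set_collection_ready_callback. The mutex only matters when drain runs on a different thread than process_frame.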

finish_all_collections

ErrorCode finish_all_collections() const;

Call this at the end of a video sequence to flush any collections that are still active. Only use it in conjunction with set_collection_ready_callback.

Returns:

Returns SUCCESS on success, or NOT_AVAILABLE when VLM video collection is not available.

VLMVideoEmbedder

compute_embeddings

ErrorCode compute_embeddings(const std::uint8_t* clip_data,
                             size_t clip_data_size,
                             std::vector<std::uint8_t>& clip_embeddings,
                             bool compute_single_unit_only = false) const;

Compute video embeddings on data collected using VLMVideoCollection.

Depending on the size of the collected data, this can be compute-heavy. The user can optionally set compute_single_unit_only to compute only a single unit of the video embeddings per call. In that case, the function needs to be called in a loop with the same arguments. The returned error code indicates whether all units have been computed and whether the results are valid.

Arguments:

clip_data: A pointer to the data collected using VLMVideoCollection, for example via VLMVideoCollection::start_clip and VLMVideoCollection::end_clip.
clip_data_size: The size of the data pointed to by clip_data.
clip_embeddings: The resulting embeddings. These are only valid when the function returns SUCCESS.
compute_single_unit_only: Can be set to do a partial computation of the embeddings. If set, this needs to be called in a loop, see above.

Returns:

Returns SUCCESS when all embeddings have been computed, EMBEDDING_PART_COMPUTED when a single unit was successfully computed (but not everything), CLIP_NOT_YET_STARTED when an empty clip is provided, or INVALID_CLIP_DATA when the clip data is invalid.

Example:

auto clip_data = ...;  // from `VLMVideoCollection::end_clip`.
auto pvllme = plumerai::VLMVideoEmbedder();
auto clip_embeddings = std::vector<std::uint8_t>(0);
auto error_code = pvllme.compute_embeddings(clip_data.data(),
                                            clip_data.size(),
                                            clip_embeddings);
if (error_code != plumerai::ErrorCode::SUCCESS) return;
// `clip_embeddings` can be stored for e.g. video search.
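
The example above computes everything in one call. The loop described for compute_single_unit_only could look roughly like the sketch below, which keeps calling with the same arguments for as long as EMBEDDING_PART_COMPUTED is returned. FakeEmbedder, compute_incrementally and the local ErrorCode enum are hypothetical stand-ins introduced here so that the code compiles without the library; the fake treats each input byte as one unit, which is an assumption made purely for illustration and says nothing about how the real embedder divides its work.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for plumerai::ErrorCode; only the values named on this page.
enum class ErrorCode {
  SUCCESS,
  EMBEDDING_PART_COMPUTED,
  CLIP_NOT_YET_STARTED,
  INVALID_CLIP_DATA
};

// Hypothetical stand-in for plumerai::VLMVideoEmbedder. It pretends that
// each input byte is one unit of work and copies it to the output.
class FakeEmbedder {
 public:
  ErrorCode compute_embeddings(const std::uint8_t* clip_data,
                               std::size_t clip_data_size,
                               std::vector<std::uint8_t>& clip_embeddings,
                               bool compute_single_unit_only = false) const {
    if (clip_data == nullptr || clip_data_size == 0) {
      return ErrorCode::CLIP_NOT_YET_STARTED;
    }
    // Repeated calls are expected to pass the same arguments.
    assert(units_done_ < clip_data_size);
    do {
      partial_.push_back(clip_data[units_done_]);
      ++units_done_;
    } while (!compute_single_unit_only && units_done_ < clip_data_size);
    if (units_done_ < clip_data_size) {
      return ErrorCode::EMBEDDING_PART_COMPUTED;
    }
    clip_embeddings = partial_;  // results are only valid on SUCCESS
    partial_.clear();
    units_done_ = 0;
    return ErrorCode::SUCCESS;
  }

 private:
  mutable std::size_t units_done_ = 0;
  mutable std::vector<std::uint8_t> partial_;
};

// Computes the embeddings one unit per call. `calls` reports how many calls
// were made. Any code other than EMBEDDING_PART_COMPUTED ends the loop.
template <typename Embedder>
ErrorCode compute_incrementally(const Embedder& embedder,
                                const std::vector<std::uint8_t>& clip_data,
                                std::vector<std::uint8_t>& clip_embeddings,
                                int& calls) {
  calls = 0;
  ErrorCode code = ErrorCode::SUCCESS;
  do {
    code = embedder.compute_embeddings(clip_data.data(), clip_data.size(),
                                       clip_embeddings,
                                       /*compute_single_unit_only=*/true);
    ++calls;
    // Other work (e.g. processing the next camera frame) could run here.
  } while (code == ErrorCode::EMBEDDING_PART_COMPUTED);
  return code;
}
```

Splitting the work this way lets the caller interleave embedding computation with other tasks. As stated above, clip_embeddings should only be read once SUCCESS is returned.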