Plumerai Video Search
Reviewing weeks of footage from your cameras typically requires scrubbing through countless hours of video. With Plumerai Video Search, you simply type what you’re looking for, like “mailman” or “man with a ladder”, and get instant results. Each result includes a representative thumbnail, letting you find relevant footage quickly and accurately.
Plumerai Video Search is built using a Vision Language Model (VLM) consisting of a video embedder and a text embedder. It runs efficiently on-device or in the cloud and leverages the power of Plumerai's Video Intelligence solutions, including Plumerai Object Detection, Plumerai Familiar Face Identification, and Plumerai Advanced Motion Detection.
Product features:
- Combines Plumerai's TinyML and VLM for high-accuracy results and a significant reduction in cloud compute costs.
- Runs efficiently on the edge, fully in the cloud, or in a hybrid setup.
- Provides the most relevant thumbnail of a video clip in the search results.
- Applies relevance thresholding to filter out noisy or low-confidence matches, as sketched below.
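As a concrete illustration, the following is a minimal sketch of how such relevance thresholding could work, assuming embeddings are compared with cosine similarity. The function name and threshold value are illustrative only and not part of the Plumerai API.

```python
import numpy as np

def filter_matches(query_emb, video_embs, threshold=0.3):
    """Return indices of video embeddings whose cosine similarity to the
    query embedding meets the relevance threshold, best match first.
    The threshold value is an illustrative placeholder."""
    # Normalize so the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    v = video_embs / np.linalg.norm(video_embs, axis=1, keepdims=True)
    scores = v @ q
    # Drop noisy or low-confidence matches below the threshold.
    keep = np.flatnonzero(scores >= threshold)
    return keep[np.argsort(scores[keep])[::-1]]
```

A stricter threshold trades recall for precision; the right value depends on the embedding model and the use case.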
Video Search architecture
Plumerai Video Search is composed of four core components, each handling a specific part of the search pipeline. These components are shown in the diagram below and described in detail in the following list.
The first two components, 'VLM Video Collection' and 'VLM Video Embedder', process video clips as they are recorded. These components are designed to be efficient and fast, and can run on an edge device or in a hybrid fashion, where the collection runs on the edge and the embedder in the cloud.
The last two components, 'VLM Text Embedder' and 'Video and Text Matcher', run once per user search query, on demand. They can run in the cloud or, for example, on a user's mobile phone or laptop.
The four main components in more detail:
- The Plumerai VLM Video Collection processes the incoming video data and acts as a spatiotemporal salient frame and region selector, reusing the Familiar Face Identification, Object Detection, and Advanced Motion Detection modules.
- Results of the video collection are fed into the Plumerai VLM Video Embedder, which uses a vision language model to turn image data into compact embeddings. These embeddings are then stored together with metadata and can be searched through rapidly.
- The Plumerai VLM Text Embedder converts a user's text query into an embedding.
- The Plumerai Video and Text Matcher matches the text embedding against the video embeddings, performs ranking and thresholding, and returns the final results to the user.
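To make the data flow concrete, here is a toy end-to-end sketch of this pipeline in Python. All names and signatures are assumptions for illustration only, and the embeddings are random placeholders purely so the sketch runs; the actual Plumerai APIs are linked below.

```python
import numpy as np

# Toy stand-ins for the four Plumerai components; the real API (see the
# links below) has different names and signatures.
rng = np.random.default_rng(42)
DIM = 512

def collect_salient_regions(frames):   # VLM Video Collection stand-in
    return frames[:1]                  # pretend the first frame is salient

def embed_video(regions):              # VLM Video Embedder stand-in
    v = rng.normal(size=DIM)           # random placeholder embedding
    return v / np.linalg.norm(v)

def embed_text(query):                 # VLM Text Embedder stand-in
    v = rng.normal(size=DIM)           # random placeholder embedding
    return v / np.linalg.norm(v)

index = []  # (embedding, metadata) pairs, built up as clips are recorded

def on_new_clip(frames, timestamp):
    """Recording-time path: runs once per recorded clip."""
    regions = collect_salient_regions(frames)
    index.append((embed_video(regions), {"timestamp": timestamp}))

def search(query, threshold=0.3, top_k=10):
    """Query-time path (Video and Text Matcher): runs once per search."""
    q = embed_text(query)
    scored = [(float(emb @ q), meta) for emb, meta in index]
    matches = [s for s in scored if s[0] >= threshold]  # thresholding
    return sorted(matches, key=lambda s: s[0], reverse=True)[:top_k]
```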
The API for the first two components is described here for C++ and here for Python. The API for the last two components is described here for Python.
Deployment scenarios
Plumerai Video Search supports flexible deployment configurations, depending on device capabilities and system requirements. The main deployment models are illustrated below.
All on the camera
In this scenario, all video-related components run on the camera: both the VLM Video Collection and the VLM Video Embedder. The VLM Text Embedder and Video and Text Matcher typically run in the cloud, but can also run on a local hub, laptop, or mobile device.
Hybrid camera/cloud
The VLM Video Collection runs on the camera, while the more compute- and memory-intensive VLM Video Embedder runs in the cloud. As in the scenario where everything runs on the camera, the VLM Text Embedder and Video and Text Matcher run in the cloud or on a user's device such as a mobile phone or laptop.
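As a rough illustration of the camera side of this hybrid split, the sketch below forwards the regions selected on-camera to a cloud-hosted embedder over HTTP. The endpoint URL, payload format, and response schema are all assumptions made for this example; consult the C++ and Python API documentation for the actual interfaces.

```python
import requests

# Hypothetical cloud endpoint for the VLM Video Embedder; the real service
# URL and payload schema are deployment-specific assumptions.
EMBEDDER_URL = "https://embedder.example.com/v1/embed"

def upload_salient_regions(jpeg_crops, clip_id):
    """Camera side: forward the JPEG crops selected by the on-camera
    VLM Video Collection to the cloud-hosted VLM Video Embedder."""
    files = [("regions", (f"{clip_id}_{i}.jpg", crop, "image/jpeg"))
             for i, crop in enumerate(jpeg_crops)]
    resp = requests.post(EMBEDDER_URL, files=files,
                         data={"clip_id": clip_id}, timeout=10)
    resp.raise_for_status()
    return resp.json()  # assumed to acknowledge storage of the embedding
```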
All on the cloud
All components of Plumerai Video Search run in the cloud. This configuration is suitable for systems that do not support running any AI components on-device.