Plumerai VLM Text Search API¶
This document describes the API for the VLM Text Embedder and the Video and Text Matcher functionality.
These components are part of Plumerai Video Search and Plumerai Custom AI Notifications. See those product pages for architectural context and real-world usage examples.
For reference, the architecture diagram below shows how the text embedder and matcher fit into the overall system:

*(Architecture diagram)*
For the VLM Video Collection and VLM Video Embedder API documentation, see the separate C++ and Python pages.
Data types¶
TextEncoding¶
Raw bytes obtained from running `encode_text`.
VideoEncoding¶
Raw bytes obtained from running the VLM Video Embedder. This is input to the `search` and `score_video` functions.
IsMatch¶
Represents whether a video and search query match.
Allowed values:
- `YES`: Video matches the query. Evaluates to `True` when used as a boolean.
- `NO`: Video does not match the query. Evaluates to `False` when used as a boolean.
- `MAYBE`: Video may match the query, but the result is uncertain. Evaluates to `True` when used as a boolean.
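Because of this boolean behavior, `YES` and `MAYBE` are truthy while `NO` is falsy, so a plain `if` suffices when uncertain matches should be treated as matches:

```python
from plumerai_video_search import IsMatch

assert IsMatch.YES and IsMatch.MAYBE  # both truthy
assert not IsMatch.NO                 # falsy
```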
SearchResultMetadata¶
Metadata as used in the `SearchResult` and `ScoreResult` objects.
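The full definition is not reproduced here; judging from the example outputs further below, it carries at least the matched timestamp:

```python
class SearchResultMetadata:
    time: float  # time (in seconds) of the best-matching moment in the video
```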
SearchResult¶
```python
class SearchResult:
    similarities: list[float]
    is_match: list[IsMatch]
    metadata: list[SearchResultMetadata]
```

The result as returned by the `search` function.
ScoreResult¶
The result as returned by the `score_video` function.
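Its definition is not shown in full here; mirroring `SearchResult` and judging from the `score_video` example output below, a sketch of its fields:

```python
class ScoreResult:
    similarity: float
    is_match: IsMatch
    metadata: SearchResultMetadata
```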
Functions¶
encode_text¶
Encode a text string into Plumerai's search encoding format.
Arguments:

- `text`: A string of text you want to encode.

Returns:

- `bytes` of Plumerai's `TextEncoding` object.
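For instance, a common query can be encoded once and its bytes reused across many searches (the variable name here is illustrative):

```python
from plumerai_video_search import encode_text

# Encode once; the resulting bytes can be stored and reused.
cat_query: bytes = encode_text("a cat in the driveway")
```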
search¶
Search a user's history for relevance to a given query string.
Arguments:

- `user_history`: The list of `VideoEncoding` objects from a single user you want to search over.
- `query`: A search string, e.g. "a cat in the driveway", or the `TextEncoding` of a search string returned by `encode_text`.

Returns:

- A `SearchResult` object which contains the similarities and match values with respect to the given query.
score_video¶
Score a single video encoding against a text encoding.
Arguments:

- `video_encoding`: A single `VideoEncoding` object.
- `text_encoding`: A single `TextEncoding` object as returned by `encode_text`.

Returns:

- A `ScoreResult` object containing the similarity and match value between the video and the text encoding.
Example: regular video search¶
To run a search query, all relevant video encodings for a given user need to be loaded into memory. The `search` function then takes a `user_history`, which is a `list[VideoEncoding]` (output from the VLM Video Embedder), and a search `query`, which is a `str` or the `bytes` resulting from calling `encode_text`. It is good practice to cache common search queries using `encode_text` and then read these from the cache when searching: this reduces latency and compute because `search` no longer has to encode the text.
```python
from plumerai_video_search import IsMatch, search

# Read relevant items from storage / database here
plumerai_video_encoding = ...
plumerai_video_encoding2 = ...
user_history = [plumerai_video_encoding, plumerai_video_encoding2]
search(user_history=user_history, query="cat")
```

```
SearchResult(similarities=[0.3192299008369446, 0.032717231661081314], is_match=[IsMatch.YES, IsMatch.NO], metadata=[SearchResultMetadata(time=3.813), SearchResultMetadata(time=0.3333333333333333)])
```
The `search` function returns a `SearchResult` object containing `similarities`, where a higher similarity means the corresponding video is more relevant, and `is_match`, which indicates whether each result meets the match threshold or other criteria such as metadata-based filters.
The Plumerai `SearchResult` object should be combined with video IDs to present results to a user, e.g.:

```python
import numpy as np

video_ids: list = ...  # A list of video IDs corresponding to user_history

search_result = search(user_history=user_history, query="cat")

# Indices of the videos, sorted by descending similarity.
indices = np.argsort(search_result.similarities)[::-1].tolist()
processed_search_result = [
    (video_ids[i], search_result.metadata[i].time)
    for i in indices
    if search_result.is_match[i] == IsMatch.YES
]
```
The `SearchResultMetadata.time` field contains the time (in seconds) of the part of the video most relevant to the search query. This is useful when displaying search results to users, since the thumbnail can then be taken from the moment they are looking for.
Example: individual video scoring¶
You can also score a single video against a text query using the `score_video` function. This is useful for classifying videos for notification or UI features. It differs from the `search` API in that information from other videos is not available, so the scoring and thresholding are scaled differently.
```python
from plumerai_video_search import encode_text, score_video

# plumerai_video_encoding as loaded in the previous example
queries = [encode_text("cat"), encode_text("person")]
[score_video(plumerai_video_encoding, q) for q in queries]
```

```
[ScoreResult(similarity=1.2557752, is_match=IsMatch.YES, metadata=SearchResultMetadata(time=3.22)), ScoreResult(similarity=-2.490197, is_match=IsMatch.NO, metadata=SearchResultMetadata(time=0.149))]
```
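Since `IsMatch` values are truthy for `YES` and `MAYBE` and falsy for `NO`, the result can gate a notification directly; a sketch, where `send_notification` is a hypothetical helper in your application:

```python
result = score_video(plumerai_video_encoding, encode_text("person"))
if result.is_match:  # True for IsMatch.YES and IsMatch.MAYBE
    send_notification("Person detected")  # hypothetical application helper
```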
Optimization: Text embedding caching¶
To reduce latency, a lookup table of common search queries is used. For queries not found in this cache, the model dynamically encodes the text into an embedding. Encoding can be relatively slow on a single CPU core. You can set the `PLUMERAI_THREADS` environment variable to reduce the latency; e.g. setting `PLUMERAI_THREADS=8` will use 8 cores for encoding the text.
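On top of the built-in lookup table, an application can keep its own cache of encodings, as suggested in the search example above; a minimal sketch, where the cache structure and query string are illustrative rather than part of the API:

```python
from plumerai_video_search import encode_text, search

# Illustrative application-level cache; not part of the Plumerai API.
query_cache: dict[str, bytes] = {}

def cached_query(query: str) -> bytes:
    """Return the TextEncoding bytes for a query, encoding it at most once."""
    if query not in query_cache:
        query_cache[query] = encode_text(query)
    return query_cache[query]

# Passing the cached bytes instead of a str means search skips text encoding.
result = search(user_history=user_history, query=cached_query("cat"))
```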