Plumerai VLM Text Search API

This document describes the API for the Video and Text Matcher functionality.

These components are part of Plumerai Video Search and Plumerai Custom AI Notifications. See those product pages for architectural context and real-world usage examples.

For reference, the architecture diagram below shows how the text embedder and matcher fit into the overall system:

[Figure: architecture overview]

For the VLM Video Collection and VLM Video Embedder API documentation, visit this page.

Text embeddings are created with the Python API function encode_text. The resulting binary blob is then passed as the text_embedding argument of the C++ search function.

ErrorCode
search(const std::vector<std::tuple<const std::uint8_t*, std::size_t>>&
           user_history,
       std::tuple<const std::uint8_t*, std::size_t> text_embedding,
       std::vector<ScoreResult>& search_results);

Scores each video embedding in a user's history for relevance to a text query, which is supplied as a text embedding.

The text embedding is expected to be generated using the plumerai_vision_language package.

The results are returned in the same order as the user_history input, i.e. the first ScoreResult corresponds to the first video embedding in user_history.

Arguments:

  • user_history: Vector of tuples with pointers and sizes of video embeddings.
  • text_embedding: Tuple with a pointer and size of the text embedding.
  • search_results: Vector to store the search results.

Returns:

  • INVALID_VIDEO_EMBEDDING or INVALID_TEXT_EMBEDDING on error, otherwise SUCCESS.

Example:

// Text embedding bytes produced by `encode_text` (assumed to be loaded here)
std::vector<std::uint8_t> text_embedding;
// Create a list of videos to search through
std::vector<std::tuple<const std::uint8_t*, std::size_t>> user_history;
// Video embedding obtained from plumerai::VLMVideoEmbedder; `clip_embedding_ptr`
// and `clip_embedding_size` are assumed to be defined elsewhere
user_history.emplace_back(clip_embedding_ptr, clip_embedding_size);

std::vector<plumerai::vlm_text_search::ScoreResult> search_results;
const auto error_code = plumerai::vlm_text_search::search(
    user_history, {text_embedding.data(), text_embedding.size()},
    search_results);
if (error_code != plumerai::ErrorCode::SUCCESS) {
  printf("Error: %s\n", plumerai::error_code_string(error_code));
}
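
Because the results come back in the same order as user_history, a caller that wants a ranked list has to keep track of the original positions. The following is a minimal sketch of one way to do that, not part of the Plumerai API. The helper name rank_by_similarity is hypothetical, and the type definitions are local stand-ins that mirror the documented ones so the snippet compiles without the library headers.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Stand-in definitions mirroring the documented types (assumption: in real
// code these come from the Plumerai headers instead).
enum class IsMatch : int { NO = 0, MAYBE = 1, YES = 2 };
struct SearchResultMetadata {
  float time;
};
struct ScoreResult {
  float similarity;
  IsMatch is_match;
  SearchResultMetadata metadata;
};

// Hypothetical helper: returns indices into `search_results` (and therefore
// into `user_history`), ordered from highest to lowest similarity.
std::vector<std::size_t> rank_by_similarity(
    const std::vector<ScoreResult>& search_results) {
  std::vector<std::size_t> order(search_results.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  // stable_sort keeps the original order for results with equal similarity.
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) {
                     return search_results[a].similarity >
                            search_results[b].similarity;
                   });
  return order;
}
```

Sorting an index vector rather than the results themselves leaves search_results aligned with user_history, so order[0] can be used to look up both the best score and the embedding it belongs to.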

IsMatch

enum class IsMatch : int { NO = 0, MAYBE = 1, YES = 2 };

Represents whether a video and search query match.

Allowed values:

  • YES: The video matches the query.
  • NO: The video does not match the query.
  • MAYBE: The video may match the query, but the match is uncertain.
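
An application typically has to decide how to treat MAYBE, for example showing it as a weaker hit or hiding it. The sketch below shows one possible approach, relying on the documented numeric values (NO = 0, MAYBE = 1, YES = 2). The helpers is_match_label and meets_minimum are hypothetical and not part of the Plumerai API, the label strings are illustrative, and the enum is redefined locally only so the snippet compiles on its own (C++17 for std::string_view).

```cpp
#include <string_view>

// Stand-in copy of the documented enum (assumption: real code uses the
// definition from the Plumerai headers).
enum class IsMatch : int { NO = 0, MAYBE = 1, YES = 2 };

// Hypothetical helper: human-readable label for display or logging.
constexpr std::string_view is_match_label(IsMatch value) {
  switch (value) {
    case IsMatch::YES:
      return "match";
    case IsMatch::MAYBE:
      return "possible match";
    case IsMatch::NO:
      return "no match";
  }
  return "unknown";
}

// Hypothetical helper: true if `value` is at least as strong as `minimum`.
// This relies on the documented ordering NO < MAYBE < YES.
constexpr bool meets_minimum(IsMatch value, IsMatch minimum) {
  return static_cast<int>(value) >= static_cast<int>(minimum);
}
```

With this, meets_minimum(result, IsMatch::MAYBE) keeps both certain and uncertain matches, while meets_minimum(result, IsMatch::YES) keeps only certain ones.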

SearchResultMetadata

struct SearchResultMetadata {
  float time;
};

Metadata attached to each ScoreResult.

ScoreResult

struct ScoreResult {
  float similarity;
  IsMatch is_match;
  SearchResultMetadata metadata;
};

Result of a search query.

A higher similarity means the corresponding video is more relevant to the query, and is_match indicates whether the result meets the match threshold.

The metadata.time field contains the time (in seconds) of the part of the video that is most relevant to the search query. This is useful when displaying search results: the thumbnail can be taken from that moment, so it shows what the user is looking for.
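
As a rough illustration of the thumbnail use case, the sketch below converts metadata.time into a frame index. It assumes that the time is measured from the start of the clip and that the application knows the clip's frame rate and frame count; neither assumption is confirmed by this document. The helper thumbnail_frame_index is hypothetical and not part of the Plumerai API (C++17 for std::clamp).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: maps a time in seconds (e.g. ScoreResult::metadata.time)
// to a frame index, clamped to the valid range [0, frame_count - 1].
// Assumes `time_seconds` is relative to the start of the clip.
std::int64_t thumbnail_frame_index(float time_seconds, double fps,
                                   std::int64_t frame_count) {
  assert(fps > 0.0 && frame_count > 0);
  // Treat negative or NaN times as the first frame.
  if (!(time_seconds >= 0.0f)) {
    return 0;
  }
  const double raw = std::floor(static_cast<double>(time_seconds) * fps);
  // Clamp in floating point first so the integer conversion cannot overflow.
  const double last = static_cast<double>(frame_count - 1);
  return static_cast<std::int64_t>(std::clamp(raw, 0.0, last));
}
```

Clamping guards against a time that falls slightly past the end of the decoded clip, so the function always returns a frame that exists.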