Tutorial: Plumerai People Detection library on ESP32-S3
This page explains how to run the Plumerai People Detection library on the Espressif ESP32-S3-EYE and similar boards. In particular, we demonstrate how to build a demo application with camera input and display output using Espressif's ESP-IDF and ESP-WHO software packages. These packages are not a requirement for the library, however: it can also run bare-metal, e.g. without FreeRTOS. The example on this page was tested with the Espressif ESP32-S3-EYE, the same board as used in our ESP32-S3 demonstrator, but it should work in a similar way with other boards that are supported by ESP-WHO.
For the Seeed Studio XIAO ESP32S3 Sense, similar steps are required, but using EdgeLab instead of ESP-WHO. Please contact Plumerai if you would like a more detailed description of the steps required.
From step 3 onwards we will modify existing code. If you check out ESP-WHO as a git repository in step 1, you can also apply a git patch directly to the code to make all required changes. That patch can be found here without the optional improvements or here with the optional improvements. In both cases, these are for the main repository. On top of that, you need a small patch for the esp32-camera sub-repository. However, we recommend following the instructions below instead, to get a good understanding of what has to be done. The patch file can be consulted in case some of the instructions are unclear. At each step there is also a smaller patch file specific to that step. Note that applying the patch file for a step assumes you have applied the previous patches in order; otherwise conflicts might show up.
Step 1: Install Espressif's ESP-IDF and ESP-WHO
We first have to install Espressif's ESP-IDF and ESP-WHO software packages.
The recommended way to install ESP-IDF is to follow the official instructions, which offer both installation as an IDE plugin and manual installation. This tutorial assumes that version 4.4 of ESP-IDF is installed. Version 5 or newer will likely not work, because ESP-WHO requires version 4.4. The Plumerai People Detection library itself, however, should work with any version of ESP-IDF.
Once you have ESP-IDF installed, you can follow the instructions for installing ESP-WHO. Assuming you have git already installed, this can be as simple as cloning the ESP-WHO repository together with its sub-repositories.
For reference, this tutorial was tested using the latest commit in master at the time of writing, which was 5497ff27a27770eb06e9592eb7e135a2b3c4d0e1.
Step 2: Build an existing example application
In step 3 we will modify an existing Espressif example application to use the Plumerai People Detection library. But before we do, it is good practice to build and run the unmodified example, making sure everything works fine.
The following commands assume ESP-IDF is installed and available on the command line. Depending on your set-up, you might first have to run something like . /path/to/esp-idf/export.sh.
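We will modify the human_face_detection example. To build it, navigate to the examples/human_face_detection/lcd sub-folder of your ESP-WHO installation in a terminal and build the project with idf.py (typically idf.py set-target esp32s3 followed by idf.py build).
To build, flash, and run the example on the device, attach an ESP32-S3 device through USB and run idf.py flash monitor.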
A standard face detection demo application should now run on device, and the log should print things like this as soon as a face is in view:
I (10059) detection_result: [ 0]: ( 51, -9, 173, 159)
I (10059) detection_result: left eye: ( 89, 54), right eye: (143, 53), nose: (119, 82), mouth left: ( 96, 111), mouth right: (138, 109)
If you see weird colors on the display this can be solved by reconnecting the USB cable.
If the above steps didn't work, please consult the generic ESP-IDF or ESP-WHO documentation, or the documentation specific for your device. Once this is working fine, you are ready to integrate the Plumerai People Detection software in the next steps.
Step 3: Prepare the build configuration
Before modifying the application, we'll change the build configuration to fit our needs. We need to make two changes in the build configuration, which can be opened by running idf.py menuconfig in the examples/human_face_detection/lcd sub-folder of your ESP-WHO installation:
- Navigate to Compiler options → Optimization Level (...) and select Optimize for performance (-O2). Use the escape key to navigate back to the main menu for the next step.
- Navigate to Component config → ESP System Settings and disable the Watch CPU0 Idle Task and Watch CPU1 Idle Task options: in front of these it should look like [ ] and not like [*]. Now press q to exit and save your changes.
Step 4: Include the Plumerai People Detection files
For reference, the patch file for this step (without the unzipping of the library part) can be found here.
The next step is to extract the contents of the plumerai_people_detection_micro_esp32.zip file (if you don't have one, please contact Plumerai) to the examples/human_face_detection/lcd sub-folder of your ESP-WHO installation. The lcd folder should now additionally contain the following files:
├── plumerai_people_detection_micro_esp32
│   ├── include
│   │   └── plumerai
│   │       ├── box_prediction.h
│   │       ├── model_defines.h
│   │       └── people_detection_micro.h
│   ├── lib
│   │   └── esp32-s3
│   │       └── libplumerai_people_detection_micro.a
│   └── VERSION
Take note of the values inside include/plumerai/model_defines.h, in particular the PLUMERAI_IMAGE_WIDTH and PLUMERAI_IMAGE_HEIGHT image dimensions and the image format mentioned there. In this tutorial we assume the image dimensions are 640x480 and the image format is RGB565 with 2 bytes per pixel. If this is different in your case, either contact Plumerai to request a new library with the right settings, or adjust the steps below according to the different image dimensions and format.
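As a rough illustration of what these settings imply for memory use, the sketch below computes the size of one input frame. The constant names are hypothetical stand-ins for the values in model_defines.h, and the numbers are the ones assumed in this tutorial.

```cpp
#include <cstddef>

// Hypothetical stand-ins for PLUMERAI_IMAGE_WIDTH / PLUMERAI_IMAGE_HEIGHT;
// use the real values from model_defines.h in your own project.
constexpr std::size_t kImageWidth = 640;
constexpr std::size_t kImageHeight = 480;
constexpr std::size_t kBytesPerPixel = 2;  // RGB565 stores one pixel in 16 bits

// Number of bytes occupied by one frame of the given dimensions.
constexpr std::size_t frame_size_bytes(std::size_t width, std::size_t height,
                                       std::size_t bytes_per_pixel) {
  return width * height * bytes_per_pixel;
}

static_assert(frame_size_bytes(kImageWidth, kImageHeight, kBytesPerPixel) == 614400,
              "a 640x480 RGB565 frame takes 600 KiB");
```

For comparison, the 240x240 frames of the unmodified example take 115200 bytes each.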
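First, modify examples/human_face_detection/lcd/CMakeLists.txt and add a line just above the project(human_face_detection_lcd) line that makes the plumerai_people_detection_micro_esp32/include folder available as an include directory (see the patch file for the exact line).
Then add the following line just below project(human_face_detection_lcd) (i.e. at the end of the file):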
target_link_libraries(${PROJECT_NAME}.elf ${CMAKE_CURRENT_LIST_DIR}/plumerai_people_detection_micro_esp32/lib/esp32-s3/libplumerai_people_detection_micro.a)
To verify that everything still works, run idf.py flash monitor again. The application itself hasn't changed, but it should now be possible to use the Plumerai People Detection API in the application.
Step 5: Prepare the display code to accept different camera resolutions
For reference, the patch file for this step can be found here.
For best quality we want to increase the camera resolution from the default 240x240 to something larger, say 640x480. The example uses this low resolution because the LCD is also assumed to work at 240x240, which is indeed the case for the Espressif ESP32-S3-EYE board. Thus, in this step we modify the display code to prepare for our camera resolution change.
Most of the changes are in who_human_face_detection.cpp in the components/modules/ai sub-folder of ESP-WHO. First, we create some new globals (e.g. between the #define TWO_STAGE_ON 1 define and the static const char *TAG variable):
constexpr int lcd_width = 240;
constexpr int lcd_height = 240;
constexpr int lcd_display_height = 180; // For 4:3 aspect ratio
camera_fb_t * lcd_buffer = nullptr;
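The value of 180 for lcd_display_height follows from fitting a 4:3 camera frame (640x480 later in this tutorial) onto a 240-pixel-wide LCD. A minimal sketch of that arithmetic, using a hypothetical helper name:

```cpp
// Height that preserves the source aspect ratio when scaling to a target width.
// Hypothetical helper for illustration; the tutorial hard-codes the result.
constexpr int scaled_height(int src_width, int src_height, int dst_width) {
  return dst_width * src_height / src_width;
}

static_assert(scaled_height(640, 480, 240) == 180,
              "a 4:3 frame shown on a 240-pixel-wide LCD is 180 pixels high");
```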
Then, we initialize the LCD buffer at the very start of the static void task_process_handler(void *arg) function:
lcd_buffer = reinterpret_cast<camera_fb_t *>(heap_caps_aligned_alloc(16, sizeof(camera_fb_t), MALLOC_CAP_8BIT));
lcd_buffer->height = lcd_height;
lcd_buffer->width = lcd_width;
lcd_buffer->len = 2 * lcd_buffer->width * lcd_buffer->height;
lcd_buffer->format = PIXFORMAT_RGB565;
ESP_LOGI(TAG, "Allocating %d bytes LCD buffer in external PSRAM", lcd_buffer->len);
lcd_buffer->buf = reinterpret_cast<uint8_t *>(heap_caps_aligned_alloc(16, lcd_buffer->len, MALLOC_CAP_8BIT | MALLOC_CAP_SPIRAM));
for (int i = 0; i < lcd_buffer->len; ++i) { lcd_buffer->buf[i] = 0; }
Further down, at the start of the if (xQueueReceive(xQueueFrameI, &frame, portMAX_DELAY)) section (just before #if TWO_STAGE_ON and the detector.infer calls), we resize the camera frame to the dimensions of the LCD buffer:
dl::image::resize_image_nearest(
    reinterpret_cast<uint16_t *>(frame->buf), {(int)frame->height, (int)frame->width, 1},
    reinterpret_cast<uint16_t *>(lcd_buffer->buf), {lcd_display_height, lcd_width, 1}
);
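In the call above, both shapes are given as {height, width, channels}, and each RGB565 pixel is treated as a single 16-bit element. To show what a nearest-neighbour resize does conceptually, here is a standalone sketch. The helper resize_nearest is hypothetical and is not the ESP-DL implementation; it only illustrates the technique.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Nearest-neighbour resize of an image with one 16-bit element per pixel
// (e.g. RGB565). Each destination pixel copies the source pixel whose
// position corresponds to it, rounded down.
std::vector<uint16_t> resize_nearest(const std::vector<uint16_t> &src,
                                     int src_h, int src_w, int dst_h, int dst_w) {
  std::vector<uint16_t> dst(static_cast<std::size_t>(dst_h) * dst_w);
  for (int y = 0; y < dst_h; ++y) {
    const int src_y = y * src_h / dst_h;  // matching source row
    for (int x = 0; x < dst_w; ++x) {
      const int src_x = x * src_w / dst_w;  // matching source column
      dst[static_cast<std::size_t>(y) * dst_w + x] =
          src[static_cast<std::size_t>(src_y) * src_w + src_x];
    }
  }
  return dst;
}
```

No filtering or averaging takes place, which keeps the operation cheap but can make the downscaled preview look slightly blocky.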
Next, a little bit below, just before if (detect_results.size() > 0), we can release the camera buffer, since we have consumed it for inference and for the LCD:
esp_camera_fb_return(frame);
And for the last change in this file, we modify xQueueSend(xQueueFrameO, &frame, portMAX_DELAY); to send the LCD buffer instead:
xQueueSend(xQueueFrameO, &lcd_buffer, portMAX_DELAY);
Since we already return the camera framebuffer above, we also change components/modules/lcd/who_lcd.c by removing or commenting out line 31, which would otherwise return the framebuffer a second time.
To verify that everything still works, run idf.py flash monitor again. The original human face detection algorithm is still running, but its results are no longer displayed on screen. Furthermore, the 240x240 camera image is now squeezed into 240x180.
Step 6: Integrate Plumerai People Detection
For reference, the patch file for this step can be found here.
Now it is finally time to use the Plumerai People Detection API.
Most of the changes are again in who_human_face_detection.cpp in the components/modules/ai sub-folder of ESP-WHO. First, at the top of the file (e.g. in place of the old setting #define TWO_STAGE_ON 1) we include the Plumerai library headers and define some utility functions:
#include <algorithm>  // std::min, std::max
#include <cstdarg>    // va_list
#include <cstdio>     // vprintf
#include "plumerai/model_defines.h"
#include "plumerai/people_detection_micro.h"
// Prints log messages through printf
extern "C" void DebugLog(const char *format, va_list args) { vprintf(format, args); }
// Converts a coordinate to an integer pixel position, clamped to [0, max]
int clip(float value, int max) { return std::max(0, std::min(static_cast<int>(value), max)); }
// Draws one detection as a red box; the box coordinates are scaled by the buffer size
void draw_detection(BoxPrediction &p, uint16_t * buffer, int width, int height) {
    if (p.confidence < 0.7f) { return; } // confidence threshold, adjust as needed
    int color = 63488; // red in RGB565
    int x_min = clip(p.x_min * width, width - 1);
    int y_min = clip(p.y_min * height, height - 1);
    int x_max = clip(p.x_max * width, width - 1);
    int y_max = clip(p.y_max * height, height - 1);
    dl::image::draw_hollow_rectangle(buffer, height, width, x_min, y_min, x_max, y_max, color);
}
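To make the magic numbers above a little more concrete, the following standalone sketch repeats the clip helper and adds a hypothetical rgb565 function (not part of the tutorial code or of any library used here) that shows where the value 63488 for red comes from.

```cpp
#include <algorithm>
#include <cstdint>

// Same clamping helper as in the tutorial code: truncates to an integer and
// limits the result to the range [0, max].
int clip(float value, int max) {
  return std::max(0, std::min(static_cast<int>(value), max));
}

// Hypothetical helper: packs 8-bit R, G, B values into a 16-bit RGB565 pixel
// with 5 bits of red, 6 bits of green and 5 bits of blue.
constexpr uint16_t rgb565(uint8_t r, uint8_t g, uint8_t b) {
  return static_cast<uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

static_assert(rgb565(255, 0, 0) == 63488, "pure red is 0xF800, i.e. 63488");
```

With a 240-pixel-wide buffer, a relative coordinate of 0.5 maps to pixel 120, while values outside the image are clamped to 0 or 239.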
Now we can replace the initialization of the human face detector with the Plumerai people detector. We replace the following lines:
HumanFaceDetectMSR01 detector(0.3F, 0.3F, 10, 0.3F);
#if TWO_STAGE_ON
HumanFaceDetectMNP01 detector2(0.4F, 0.3F, 10);
#endif
with the following (see the PeopleDetectionInit docs for more information):
ESP_LOGI(TAG, "Reserving %d bytes of tensor arena in internal RAM", TENSOR_ARENA_SIZE);
auto tensor_arena = reinterpret_cast<unsigned char *>(heap_caps_aligned_alloc(16, TENSOR_ARENA_SIZE, MALLOC_CAP_8BIT | MALLOC_CAP_INTERNAL));
if (tensor_arena == nullptr) { ESP_LOGE(TAG, "Error: could not allocate tensor arena"); }
ESP_LOGI(TAG, "Initializing the Plumerai People Detection");
auto error_code = PeopleDetectionInit(tensor_arena);
if (error_code != 0) { ESP_LOGE(TAG, "Error: could not initialize Plumerai People Detection"); }
constexpr int max_detections = 20;
BoxPrediction predictions[max_detections];
Next, in the while-loop we can replace these calls to the old detector:
#if TWO_STAGE_ON
std::list<dl::detect::result_t> &detect_candidates = detector.infer((uint16_t *)frame->buf, {(int)frame->height, (int)frame->width, 3});
std::list<dl::detect::result_t> &detect_results = detector2.infer((uint16_t *)frame->buf, {(int)frame->height, (int)frame->width, 3}, detect_candidates);
#else
std::list<dl::detect::result_t> &detect_results = detector.infer((uint16_t *)frame->buf, {(int)frame->height, (int)frame->width, 3});
#endif
with the following (see the PeopleDetectionProcessFrame docs for more information):
int num_results = 0;
constexpr float framerate = 2.5f; // in frames-per-second (FPS)
error_code = PeopleDetectionProcessFrame(predictions, max_detections, &num_results, 1 / framerate, frame->buf);
if (error_code != 0) { ESP_LOGE(TAG, "Error: could not process a frame using Plumerai People Detection"); }
ESP_LOGI(TAG, "Detected %d person(s)", num_results);
Furthermore, still in this file, we replace the old box drawing code:
if (detect_results.size() > 0)
{
    draw_detection_result((uint16_t *)frame->buf, frame->height, frame->width, detect_results);
    print_detection_result(detect_results);
    is_detected = true;
}
with one suited for the Plumerai BoxPrediction format:
for (int result_id = 0; result_id < num_results; ++result_id) {
    draw_detection(predictions[result_id], reinterpret_cast<uint16_t *>(lcd_buffer->buf), lcd_width, lcd_display_height);
    is_detected = true;
}
And finally, at the very bottom of the file, we increase the stack size from 4KB to 8KB in the first call to xTaskCreatePinnedToCore (the stack size is the third argument of that call).
We also have to request a different camera resolution from the application. We do that by modifying main/app_main.cpp in the examples/human_face_detection/lcd folder, changing FRAMESIZE_240X240 into FRAMESIZE_VGA and reducing the number of frame buffers from 2 to 1.
Now we are ready to run idf.py flash monitor again to see if everything builds fine. However, when running it on the device, you'll notice that the detection quality is very poor, or that nothing is detected at all. On the terminal you'll likely see a message reporting that no people were detected appear many times.
This is because of the way the RGB565 format is handled inside the ESP-WHO code, which we'll fix in the next step.
Step 7: Fix the RGB565 camera and LCD settings
For reference, the patch file for this step can be found here for the main repository and here for the esp32-camera sub-repository.
ESP-WHO (and its esp32-camera dependency) uses an unconventional version of the RGB565 format, in which the green bits (which are split over two bytes) are partially in reversed order. This is an issue for the Plumerai People Detection algorithm, which expects regular little-endian RGB565.
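One plausible reading of this mismatch, stated here as an assumption rather than something confirmed by the ESP-WHO sources, is that the two bytes of every pixel arrive in swapped order. The hypothetical sketch below shows how much such a swap changes a pixel value.

```cpp
#include <cstdint>

// Swaps the two bytes of a 16-bit RGB565 pixel.
// Hypothetical helper for illustration; the tutorial fixes the byte order in
// the camera and LCD settings instead of swapping pixels in software.
constexpr uint16_t swap_bytes(uint16_t pixel) {
  return static_cast<uint16_t>((pixel << 8) | (pixel >> 8));
}

// Pure red (0xF800) turns into 0x00F8 when its bytes are swapped: no red at
// all, a little green and a lot of blue.
static_assert(swap_bytes(0xF800) == 0x00F8, "byte-swapped red is no longer red");
```

Under that assumption, an algorithm fed with swapped pixels sees heavily distorted colors, which would explain poor detection results.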
First, we modify the camera code. In this example we assume the camera is an OV2640, which is the case on the Espressif ESP32-S3-EYE board, for example. For the OV2640 camera we modify components/esp32-camera/sensors/private_include/ov2640_settings.h by appending | IMAGE_MODE_LBYTE_FIRST to the line with IMAGE_MODE_RGB565 (line 412 at the time of writing), so that it reads IMAGE_MODE_RGB565 | IMAGE_MODE_LBYTE_FIRST.
Next, we modify the LCD code. Here we assume the LCD controller is the ST7789, which is the case on the Espressif ESP32-S3-EYE. For the ST7789 controller, modify the static void lcd_st7789_init_reg(void) function in components/screen/controller_driver/st7789/st7789.c by adding the following four lines after the first two lines in that function (LCD_WRITE_CMD(0x3A); and LCD_WRITE_DATA(0x05);):
// Set proper RGB565 little-endianness
LCD_WRITE_CMD(0xB0);
LCD_WRITE_DATA(0x00);
LCD_WRITE_DATA(0xF8);
Now re-run idf.py flash monitor and point the camera at one or more people: you should see red bounding boxes drawn around them.
Step 8: Optional improvements
Now that a first version of the Plumerai People Detection software is running on your ESP32-S3, it is time to make some improvements.
8a. Rename some files and variables
For reference, the patch file for this step can be found here.
A trivial change is to rename occurrences of human_face_detection (the demo application that we modified) to plumerai_people_detection: e.g. the CMake project name, the .cpp and .hpp file names, the function names, and the TAG for logging.
8b. Draw more visible boxes
For reference, the patch file for this step can be found here.
You might have noticed that the red boxes are not very visible on a small display. One solution is to modify the draw_detection function to draw multiple boxes close to each other to obtain a wider box border. Here is an example of the modified function, which gives a border width of 4 pixels:
void draw_detection(BoxPrediction &p, uint16_t * buffer, int width, int height) {
    if (p.confidence < 0.7) { return; } // confidence threshold, adjust as needed
    int color = 63488; // red in RGB565
    for (int w = 0; w < 4; ++w) {
        int x_min = clip(p.x_min * width + w, width - 1);
        int y_min = clip(p.y_min * height + w, height - 1);
        int x_max = clip(p.x_max * width + w, width - 1);
        int y_max = clip(p.y_max * height + w, height - 1);
        dl::image::draw_hollow_rectangle(buffer, height, width, x_min, y_min, x_max, y_max, color);
    }
}
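The function above shifts each successive box one pixel down and to the right. An alternative, sketched below on a plain pixel buffer, insets each successive outline so that the border grows inward and the outer edge stays exactly on the detected box. The helper draw_thick_rectangle is hypothetical and does not use the ESP-DL drawing functions.

```cpp
#include <cstdint>
#include <vector>

// Draws a rectangle outline of the given thickness into a row-major image by
// insetting each successive outline by one pixel. Coordinates are inclusive
// and are assumed to lie inside the image.
void draw_thick_rectangle(std::vector<uint16_t> &image, int width,
                          int x_min, int y_min, int x_max, int y_max,
                          uint16_t color, int thickness) {
  for (int t = 0; t < thickness; ++t) {
    const int x0 = x_min + t, y0 = y_min + t;
    const int x1 = x_max - t, y1 = y_max - t;
    if (x0 > x1 || y0 > y1) { break; }  // box too small for further insets
    for (int x = x0; x <= x1; ++x) {
      image[y0 * width + x] = color;  // top edge
      image[y1 * width + x] = color;  // bottom edge
    }
    for (int y = y0; y <= y1; ++y) {
      image[y * width + x0] = color;  // left edge
      image[y * width + x1] = color;  // right edge
    }
  }
}
```

Which variant looks better depends on the display; the inset version never draws outside the original box.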
8c. Use the camera callback for better speed
For reference, the patch file for this step can be found here.
You might have noticed that the speed of the whole application is not as good as that of the pre-compiled Plumerai ESP32-S3 demo. The main cause is that we release the camera framebuffer towards the end of our code. This, in combination with using only one framebuffer, puts the camera capture, the resizing for display, and the Plumerai People Detection all in sequence. We can run the camera capture in parallel again by using the PeopleDetectionReadDataCallback function, although it does complicate the code a bit.
First, in components/modules/ai/who_plumerai_people_detection.cpp, we move the camera_fb_t *frame = NULL out of the task_process_handler function and make it a global, e.g. just below the camera_fb_t * lcd_buffer = nullptr definition. Then, right after, we declare a new function which we'll set as callback (make sure this code is placed after the LCD constants). Together this becomes:
camera_fb_t *frame = NULL; // (don't forget to remove this one from the `task_process_handler` function!)
void resize_for_display_and_free_framebuffer(void *_) {
    dl::image::resize_image_nearest(
        reinterpret_cast<uint16_t *>(frame->buf), {(int)frame->height, (int)frame->width, 1},
        reinterpret_cast<uint16_t *>(lcd_buffer->buf), {lcd_display_height, lcd_width, 1}
    );
    // The input resizing is done (in network) and so is the LCD resizing (above),
    // now we can return the framebuffer such that the camera can get a new frame.
    esp_camera_fb_return(frame);
}
Then, before the while (true) loop starts but after PeopleDetectionInit is called, we set resize_for_display_and_free_framebuffer as the callback (see the patch file for the exact call).
In the same file, we should remove two pieces of code:
- Remove the call to resize_image_nearest that is a few lines before the PeopleDetectionProcessFrame call in the loop.
- Remove the call to esp_camera_fb_return that is a few lines after the PeopleDetectionProcessFrame call in the loop.
Finally, we can re-enable the double camera framebuffer. We do that by modifying main/app_main.cpp in the examples/human_face_detection/lcd folder, changing the number of frame buffers from 1 back into 2.
If we re-run idf.py flash monitor we should see a better overall speed and lower latency.
8d. Measure delta-t instead of guessing
For reference, the patch file for this step can be found here.
The PeopleDetectionProcessFrame function has a float delta_t argument that we previously set to 1 / framerate, where we guessed the framerate to be 2.5 FPS. To obtain better quality predictions, it is recommended to measure this instead.
We can measure this by first declaring two variables just before the while (true) loop starts in components/modules/ai/who_plumerai_people_detection.cpp: a float delta_t_s that holds the elapsed time in seconds, and a prev_end_time timestamp initialized with esp_timer_get_time().
In the same file we then add the following after the loop with the call to draw_detection:
auto time_since_last_update = esp_timer_get_time() - prev_end_time;
delta_t_s = static_cast<float>(time_since_last_update) / 1000000.0f;
prev_end_time = esp_timer_get_time();
ESP_LOGI(TAG, "Total speed of demo: %.3fms (%.1f FPS)", delta_t_s * 1000.f, 1.0f / delta_t_s);
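The conversion used above can be checked in isolation. The sketch below uses a hypothetical helper, elapsed_seconds, that turns two microsecond timestamps (the unit returned by esp_timer_get_time) into seconds. It is plain C++ and does not depend on ESP-IDF.

```cpp
#include <cstdint>

// Converts two microsecond timestamps into an elapsed time in seconds.
// Hypothetical helper for illustration only.
constexpr float elapsed_seconds(int64_t prev_us, int64_t now_us) {
  return static_cast<float>(now_us - prev_us) / 1000000.0f;
}
```

For example, two timestamps that are 400000 microseconds apart give a delta-t of 0.4 seconds, which corresponds to the 2.5 FPS that was guessed earlier.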
Finally, we substitute 1 / framerate with delta_t_s in the call to PeopleDetectionProcessFrame and remove the framerate variable.
8e. Use different colors for different people
For reference, the patch file for this step can be found here.
In draw_detection we always use red as the color for boxes. To distinguish individuals, we can use the tracked identifier value id of the BoxPrediction class to set colors.
In components/modules/ai/who_plumerai_people_detection.cpp, we can replace int color = 63488; in the draw_detection function with int color = colors[p.id % num_colors]; and define a colors array of RGB565 values, together with its length num_colors, as globals before that function.
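A minimal sketch of what such a palette could look like is given below. The four color values are arbitrary choices for illustration and not necessarily those of the original patch, and color_for_id is a hypothetical wrapper around the indexing expression.

```cpp
#include <cstdint>

// Hypothetical palette of RGB565 colors; any set of distinct values works.
constexpr int num_colors = 4;
constexpr uint16_t colors[num_colors] = {
    0xF800,  // red
    0x07E0,  // green
    0x001F,  // blue
    0xFFE0,  // yellow (red + green)
};

// Maps a tracker id to a palette entry. The unsigned cast keeps the index
// valid even if an id were negative.
constexpr uint16_t color_for_id(int id) {
  return colors[static_cast<unsigned>(id) % num_colors];
}
```

Because the palette is small, two people can end up with the same color once more than num_colors identifiers have been handed out.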