
Correctness validation

The inference engine library can be tested for correctness on-device against reference TensorFlow output. The Inference Engine Generator produces the following files in the validation folder:

  • validation.c: A C source file that runs the model on reference data and compares against reference output.
  • validation.h: The header file of validation.c.
  • validation_utils.h: Utility functions used by validation.c.
  • reference_data.h: An auto-generated file containing the reference input and output data.
  • A Python script to optionally generate new reference data. By default it uses random data, but it can easily be modified to use other data, for example from a validation set.
  • model.tflite: A copy of the model that was used as input to the Inference Engine Generator.

In case of multiple models, a validation_modelX folder is generated instead for each model, where X is 0, 1, or higher depending on the number of models passed into the Inference Engine Generator.

To run the validation, add validation/validation.c to an existing project, add validation/ to the compiler include path, and call the function TestPlumeraiInferenceEngine(). The file validation.c can easily be adapted to different use cases. By default, the code does not return an error status when the output contains mismatches, because those are often expected. Any mismatches are visualized in a histogram so that the accuracy can be inspected easily.
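To illustrate what a mismatch histogram conveys, here is a minimal sketch in Python. It is not the code in validation.c: the function names mismatch_histogram and print_histogram and the sample values are made up for this example, and the actual histogram format printed on-device may differ.

```python
from collections import Counter


def mismatch_histogram(reference, actual):
    """Count how often each signed difference (actual - reference) occurs."""
    if len(reference) != len(actual):
        raise ValueError("reference and actual must have the same length")
    return Counter(a - r for r, a in zip(reference, actual))


def print_histogram(histogram):
    """Print one text bar per difference value, smallest difference first."""
    for diff in sorted(histogram):
        count = histogram[diff]
        print(f"{diff:+4d} | {'#' * count} ({count})")


# Made-up INT8 outputs, for illustration only.
reference_output = [12, -7, 100, 45, -128, 3]
actual_output = [12, -6, 100, 44, -128, 3]

histogram = mismatch_histogram(reference_output, actual_output)
print_histogram(histogram)
```

A histogram concentrated at 0 with a few entries at -1 and +1 is typically what harmless rounding differences look like, whereas a wide or skewed distribution is more likely to warrant a closer look.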

The validation code assumes that the model input is an 8-bit quantized integer tensor. If the supplied model has another input data type (e.g. it starts with a float to INT8 quantization layer), the validation code exits with the message: assertion failed: input->type == kTfLiteInt8.

Custom reference data

The Inference Engine Generator internally uses the Python script to generate random input data. This has already been done, so validation/reference_data.h is ready to use. If needed, the script can easily be modified to test on other validation data.

The script requires Python 3.5 or newer and TensorFlow:

  1. Install tensorflow with pip, e.g. pip3 install tensorflow-cpu.
  2. Run the script with Python, e.g. python3 followed by the script's file name.

The function get_input_data in the script can be modified to read in custom data and use it as the model's validation input.
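As a rough sketch of such a modification, the example below replaces random data with custom samples and quantizes them to INT8, since the validation code expects INT8 input. The signature of get_input_data shown here is hypothetical and may not match the generated script, and the helper quantize_to_int8, the scale, and the zero point are assumptions made for this example. In practice the quantization parameters should be taken from the model's input tensor.

```python
import numpy as np


def quantize_to_int8(values, scale, zero_point):
    """Map float values to INT8 using an affine scale and zero point."""
    quantized = np.round(values / scale) + zero_point
    return np.clip(quantized, -128, 127).astype(np.int8)


def get_input_data(input_shape, scale, zero_point):
    """Return INT8 input data of the given shape.

    Hypothetical signature: the function in the generated script may
    take different arguments.
    """
    # Stand-in for real validation samples; replace this with code that
    # loads your own data (for example with np.load) in float format.
    num_values = int(np.prod(input_shape))
    samples = np.linspace(-1.0, 1.0, num=num_values, dtype=np.float32)
    samples = samples.reshape(input_shape)
    return quantize_to_int8(samples, scale, zero_point)


# Assumed quantization parameters, for illustration only.
input_data = get_input_data((1, 4, 4, 1), scale=1.0 / 128.0, zero_point=0)
print(input_data.shape, input_data.dtype)
```

Clipping to the range [-128, 127] matters here: the float value 1.0 maps to 128 with this scale, which does not fit in INT8.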


There are many scenarios in which there can be a mismatch between the reference output and the actual output without there being an actual bug.

The root cause is rounding mismatches between different implementations (3.5 could become 3 or 4). Especially in neural networks with many layers, such off-by-one errors can propagate and cause larger mismatches in the final output. This is to be expected and is not a cause for alarm: there can be differences between CPU and GPU implementations of TensorFlow layers even without any INT8 quantization. After INT8 quantization, the output will differ from that of the floating-point network. When comparing two implementations of the same INT8 network, there can also be differences: although the INT8 multiply-accumulate operations themselves are error-free, there is a requantization step at the end of each layer (similar to a batchnorm layer) which can cause rounding mismatches.
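The effect of the requantization step can be illustrated with a small sketch. It scales the same integer accumulator values back to INT8 with two common tie-breaking rules and compares the results. The function names, the scale, and the accumulator values are made up for this example, and it does not represent the rounding used by any particular implementation.

```python
import math


def round_half_away_from_zero(x):
    """Round to nearest, with ties going away from zero (2.5 -> 3)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def round_half_to_even(x):
    """Round to nearest, with ties going to the even integer (2.5 -> 2)."""
    return round(x)  # Python's built-in round uses ties-to-even


def requantize(accumulator, scale, zero_point, rounding):
    """Scale an integer accumulator back to the INT8 range."""
    value = rounding(accumulator * scale) + zero_point
    return max(-128, min(127, value))


# Made-up accumulator values and scale, for illustration only.
accumulators = [10, 20, 30, 50, -10]
scale = 0.25

away = [requantize(a, scale, 0, round_half_away_from_zero) for a in accumulators]
even = [requantize(a, scale, 0, round_half_to_even) for a in accumulators]
differences = [x - y for x, y in zip(away, even)]

print("half away from zero:", away)
print("half to even:       ", even)
print("difference:         ", differences)
```

The integer accumulators are identical in both cases. Only the tie-breaking rule differs, yet three of the five outputs are off by one, and the next layer would take those values as its input.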

The reference data generated by the script comes from the TensorFlow Lite Python interpreter. It is not a golden reference output either, because that interpreter also makes its own choices about rounding and requantization.