The inference engine library can be tested on-device for correctness against reference TensorFlow output. The Inference Engine Generator produces the following files in its validation folder:
validation.c: A C source file that runs the model on reference data and compares against reference output.
validation.h: The header file of validation.c.
validation_utils.h: Utility functions used by validation.c.
reference_data.h: An auto-generated file containing the reference input and output data.
generate_reference_data.py: A Python script to optionally generate new reference data. By default, random data is used, but the script can easily be modified to use, for example, data from a validation set.
model.tflite: A copy of the model that was used as input to the Inference Engine Generator.
In case of multiple models, a separate validation_modelX folder is generated for each model instead, where X is the index of the model, up to a value that depends on the number of models passed into the Inference Engine Generator.
To run the validation, add validation/validation.c to an existing project, add validation/ to the compiler include path, and call the function TestPlumeraiInferenceEngine(). The file validation.c can easily be adapted to different use cases. By default, the code does not return an error status if there are mismatches in the output, because such mismatches are often expected. Any mismatches are visualized in a histogram so that the accuracy can be inspected easily.
The validation code assumes that the model input is of an 8-bit quantized integer type. If the supplied model has another data type (e.g. it starts with a float-to-INT8 quantization layer), the validation code will exit with
assertion failed: input->type == kTfLiteInt8.
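For context, an INT8 input tensor stores each real value as an 8-bit integer together with a scale and a zero point, using the affine mapping real_value = scale * (q - zero_point). The sketch below is illustrative only and not part of the generated validation code; the names quantize_int8 and dequantize_int8 are hypothetical, and it rounds half away from zero, which other implementations may not do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of affine INT8 quantization, where
 *   real_value = scale * (q - zero_point).
 * Assumes a finite real_value, scale > 0 and a zero point in the INT8 range. */
static int8_t quantize_int8(float real_value, float scale, int32_t zero_point) {
  assert(scale > 0.0f);
  assert(zero_point >= INT8_MIN && zero_point <= INT8_MAX);
  double scaled = (double)real_value / (double)scale;
  /* Clamp early so that the conversion to long below cannot overflow. */
  if (scaled > 1000.0) scaled = 1000.0;
  if (scaled < -1000.0) scaled = -1000.0;
  /* Round half away from zero (2.5 -> 3, -2.5 -> -3) without needing libm. */
  long rounded = (long)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
  long q = rounded + zero_point;
  /* Saturate to the INT8 range. */
  if (q > INT8_MAX) q = INT8_MAX;
  if (q < INT8_MIN) q = INT8_MIN;
  return (int8_t)q;
}

/* Inverse mapping from the stored integer back to a real value. */
static float dequantize_int8(int8_t q, float scale, int32_t zero_point) {
  return scale * (float)((int32_t)q - zero_point);
}
```

For example, with scale 0.5 and zero point 0, the real value 1.25 maps to 3, and a real value of 100.0 saturates to 127.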
Custom reference data
The Inference Engine Generator internally uses generate_reference_data.py to generate random input data. This step has already been done, so validation/reference_data.h is ready to use. If needed, generate_reference_data.py can easily be modified to test on other validation data.
The script requires Python 3.5 or newer and TensorFlow:
- Install TensorFlow with pip, e.g.
pip3 install tensorflow-cpu.
- Run the script with Python 3.
The script generate_reference_data.py can be modified to read in custom data and use it as input for the validation.
There are many scenarios in which the actual output can differ from the reference output without there being a bug.
The root cause is rounding differences between implementations (e.g. 3.5 could become either 3 or 4). Especially in neural networks with many layers, such off-by-one errors can propagate and cause larger mismatches in the final output. This is expected and not a cause for alarm: even without any INT8 quantization, CPU and GPU implementations of TensorFlow layers can produce different results. After INT8 quantization, the output will differ from that of the floating-point network. Two implementations of the same INT8 network can also produce different outputs: although the INT8 multiply-accumulate operations themselves are exact, there is a requantization step at the end of each layer (similar to a batchnorm layer) that can cause rounding mismatches.
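To illustrate how a single rounding choice can grow into a larger mismatch, here is a small hypothetical sketch that is not taken from any real kernel. It compares two requantization policies for dividing an integer accumulator by a power of two, round half up and truncation, and runs a toy stack of layers with an effective gain of 1.5. Real INT8 kernels typically use a fixed-point multiplier plus a shift, but the rounding question is the same. The function names are illustrative, and the sketch assumes non-negative activations (e.g. after a ReLU).

```c
#include <assert.h>
#include <stdint.h>

/* Round half up: add half of 2^shift before shifting, so 7 / 2 = 3.5 -> 4.
 * Assumes acc >= 0 and acc small enough that the addition cannot overflow. */
static int32_t requant_half_up(int32_t acc, int shift) {
  assert(acc >= 0 && shift >= 1 && shift < 31);
  return (acc + (1 << (shift - 1))) >> shift;
}

/* Truncation: drop the fractional part, so 7 / 2 = 3.5 -> 3. */
static int32_t requant_truncate(int32_t acc, int shift) {
  assert(acc >= 0 && shift >= 1 && shift < 31);
  return acc >> shift;
}

static int8_t saturate_int8(int32_t v) {
  if (v > INT8_MAX) return INT8_MAX;
  if (v < INT8_MIN) return INT8_MIN;
  return (int8_t)v;
}

typedef int32_t (*requant_fn)(int32_t acc, int shift);

/* Toy "network": each layer computes an exact integer product (x * 3) and
 * then requantizes by 2^1, i.e. an effective gain of 1.5 per layer. */
static int8_t run_toy_layers(int8_t x, int layers, requant_fn requant) {
  for (int i = 0; i < layers; ++i) {
    int32_t acc = 3 * (int32_t)x; /* multiply-accumulate is exact */
    x = saturate_int8(requant(acc, 1));
  }
  return x;
}
```

Starting from 2, both policies agree after one layer (3), differ by one after two layers (5 versus 4), and differ by three after four layers (12 versus 9). The toy gain of 1.5 exaggerates the effect; it only shows how an initial off-by-one can grow as it propagates.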
The reference data generated by generate_reference_data.py comes from the TensorFlow Lite Python interpreter and is not a golden reference output either, because the interpreter also makes its own decisions about rounding and requantization.