# Compiling Your TFLite Model to C++

This guide outlines the steps required to compile a new TFLite model into your project using the **Vela compiler**.

---

## Prerequisites

Ensure the following are installed on your system:

1. Python 3.10 or later
2. Visual Studio C++ Build Tools 14 or later (Windows only)
3. A quantized TFLite model (INT8)

---

## Instructions

### Step 1: Install Vela Compiler

**a. Install via pip:**

```bash
pip install ethos-u-vela
```

This installs the latest version of the Vela compiler from PyPI.

**b. Verify Installation:**

```bash
vela --version
```

Expected version: `4.2.0`

---

### Step 2: Create a Virtual Environment

Navigate to the desired location and run:

- **Windows:**

  ```bash
  python -m venv <venv_name>
  ```

- **Linux:**

  ```bash
  python -m venv <venv_name>
  ```

---

### Step 3: Activate the Virtual Environment

- **Windows:**

  ```bash
  <venv_name>\Scripts\activate.bat
  ```

- **Linux:**

  ```bash
  source <venv_name>/bin/activate
  ```

---

### Step 4: Navigate to the MCU SDK Inference Directory

Go to:

```bash
<MCU_SDK_path>/tools/Inference
```

---

### Step 5: Install Additional Dependencies

Run:

```bash
pip install -r requirements.txt
```

---

### Step 6: Prepare Your TFLite Model

Copy the `.tflite` file into `<MCU_SDK_path>/tools/Inference`, or use the full path to the model in the next steps.

---

### Step 7: Rename Your TFLite Model

Ensure the filename contains no special characters or spaces.

---

### Step 8: Compile the TFLite File

Use the script:

```bash
python infer_code_gen.py -t <tflite_model_path> [-o <output_dir>] [-n <value>] [-s <value>] [-i <value>] [-c <value>] [-tl <1|2>] [-p <Size|Performance>]
```

---

### Step 9: Verify Output Files

After successful compilation, these files will appear in the output directory (a usage sketch is shown after the command reference below):

- `model.cc` – model weights
- `model_io.cc` – randomized input & expected output

---

### Step 10: Rename the Compiled Model

Rename the output `Model.tflite` file to:

```bash
<model_name>.bin
```

---

### Step 11: Generate Model.bin

The final `model.bin` can be generated from either VS Code or SynaToolkit.

### Step 11.1: Generate Model.bin Using VS Code

Use this `.bin` file with the **VS Code Image Converter** to produce the final `model.bin`.

Refer to: [Astra MCU SDK VSCode Extension User Guide](../developer_guide/Astra_MCU_SDK_VSCode_Extension_Userguide.rst#image-conversion-advanced-configurations).

### Step 11.2: Generate Model.bin Using SynaToolkit

Use this `.bin` file with the **SynaToolkit Image Generator** to produce the final `model.bin`.

Refer to: [SynaToolkit](../subject/toolkit/toolkit.rst).

---

# Commands for Inference Code Generation

## For SRAM Optimization

**Windows:**

```bash
python infer_code_gen.py -t <model_name>.tflite -o <output_dir> -p Size -tl 1
```

**Linux:**

```bash
python infer_code_gen.py -t <path_to_model>/<model_name>.tflite -o <output_dir> -p Size -tl 1
```

---

## For Flash Optimization

**Windows:**

```bash
python infer_code_gen.py -t <model_name>.tflite -o <output_dir> -p Performance -tl 2
```

**Linux:**

```bash
python infer_code_gen.py -t <model_name>.tflite -o <output_dir> -p Performance -tl 2
```

---
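The sources listed in Step 9 are built into the C++ application together with the TFLite Micro runtime. The sketch below is a minimal, hypothetical illustration of how the model data emitted in `model.cc` might be consumed; the symbol name `g_model_data` and the surrounding setup are assumptions, so check the generated `model.cc` for the actual identifier.

```cpp
// Hypothetical sketch only: loading the data array from the generated model.cc
// with TFLite Micro. The symbol name g_model_data is an assumption; use the
// identifier actually emitted by infer_code_gen.py.
#include "tensorflow/lite/schema/schema_generated.h"

extern const unsigned char g_model_data[];  // defined in the generated model.cc (assumed name)

const tflite::Model* LoadModel() {
  const tflite::Model* model = tflite::GetModel(g_model_data);
  if (model->version() != TFLITE_SCHEMA_VERSION) {
    // Schema mismatch: the model was converted with an incompatible
    // TensorFlow Lite version.
    return nullptr;
  }
  return model;
}
```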
### Note:

The `infer_code_gen.py` script allows performance tuning via the `--arena-cache-size` parameter (e.g., 1 MB, 1.25 MB, 1.5 MB) within its `vela_params` list (see lines ~98-99 of the script). Experimenting with this value can help optimize the memory footprint (e.g., `Total SRAM used`, `Total On-chip Flash used`) and inference speed.

```python
# Excerpt from infer_code_gen.py (around lines 98-99): the Vela command line
# assembled by the script, including the tunable --arena-cache-size value.
vela_params = ['vela',
               '--output-dir', os.path.dirname(args.tflite_path),
               '--accelerator-config=ethos-u55-128',
               '--optimise=' + args.optimize,
               '--config=Arm\\vela.ini',
               memory_mode,
               '--system-config=Ethos_U55_High_End_Embedded',
               args.tflite_path,
               '--arena-cache-size=1500000']
```

---

## Vela output

This section provides a summary of the model compilation results generated by Arm Vela, detailing the network's characteristics and estimated performance on the Ethos-U55 NPU.

```text
Network summary for hl
Accelerator configuration                        Ethos_U55_128
System configuration               Ethos_U55_High_End_Embedded
Memory mode                                          Sram_Only
Accelerator clock                                          500 MHz
Design peak SRAM bandwidth                                3.73 GB/s
Design peak On-chip Flash bandwidth                       3.73 GB/s

Total SRAM used                                         350.00 KiB
Total On-chip Flash used                               1322.48 KiB

CPU operators = 4 (6.0%)
NPU operators = 63 (94.0%)

Average SRAM bandwidth                                    1.68 GB/s
Input   SRAM bandwidth                                   18.34 MB/batch
Weight  SRAM bandwidth                                    0.00 MB/batch
Output  SRAM bandwidth                                    6.93 MB/batch
Total   SRAM bandwidth                                   25.27 MB/batch
Total   SRAM bandwidth            per input              25.27 MB/inference (batch size 1)

Average On-chip Flash bandwidth                           0.23 GB/s
Input   On-chip Flash bandwidth                           0.00 MB/batch
Weight  On-chip Flash bandwidth                           3.24 MB/batch
Output  On-chip Flash bandwidth                           0.00 MB/batch
Total   On-chip Flash bandwidth                           3.44 MB/batch
Total   On-chip Flash bandwidth   per input               3.44 MB/inference (batch size 1)

Original Weights Size                                  2688.16 KiB
NPU Encoded Weights Size                                819.64 KiB

Neural network macs                                  331175776 MACs/batch

Info: The numbers below are internal compiler estimates.
For performance numbers the compiled network should be run on an FVP Model or FPGA.

Network Tops/s                                            0.04 Tops/s

NPU cycles                                             7447496 cycles/batch
SRAM Access cycles                                     3378876 cycles/batch
DRAM Access cycles                                           0 cycles/batch
On-chip Flash Access cycles                             450977 cycles/batch
Off-chip Flash Access cycles                                 0 cycles/batch
Total cycles                                           7542742 cycles/batch
```

---

## Memory Allocation Notes

When configuring memory for your project, keep the following in mind (a sizing sketch follows this list):

* **Tensor Arena Size:** This should be set to at least the `Total SRAM used` value reported in the Vela output. We strongly recommend adding a **minimum of 10 KB extra** as a buffer to ensure smooth operation and to account for any runtime overheads.
* **Model Weights:** The model's weights will reside in the space indicated by `Total On-chip Flash used`.
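As a rough illustration of the tensor arena sizing above, a project might derive the arena size from the `Total SRAM used` figure in the example Vela output (350 KiB) plus the recommended margin. The names and placement details below are assumptions for illustration, not part of the generated code.

```cpp
// Hypothetical sketch: sizing the TFLite Micro tensor arena from the Vela report.
// 350 KiB (Total SRAM used in the example output) plus a 10 KiB safety margin.
#include <cstddef>
#include <cstdint>

constexpr std::size_t kTensorArenaSize = (350 + 10) * 1024;

// The arena is typically placed in SRAM by the linker script; 16-byte
// alignment is a common requirement for TFLite Micro arenas.
alignas(16) static uint8_t tensor_arena[kTensorArenaSize];
```

If Vela is re-run with a different `--arena-cache-size` or optimization profile, this value should be updated to match the new `Total SRAM used` figure.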