diff --git a/CHANGELOG.md b/CHANGELOG.md index 4af2ea0f02..0b982e27c2 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,12 +10,14 @@ ### Added -* CTest - OpenVX Tests -* Hardware Support +* CTest - Tests for install verification +* Hardware Support updates +* Doxygen - Support for API documentation ### Optimizations * CMakeList Cleanup +* Readme ### Changed @@ -30,7 +32,7 @@ * rocAL bug fix and updates -### Tested Configurations +### Tested configurations * Windows `10` / `11` * Linux distribution @@ -38,13 +40,12 @@ + CentOS - `7` / `8` + RHEL - `8` / `9` + SLES - `15-SP4` -* ROCm: rocm-core - `5.4.3.50403-121` -* miopen-hip - `2.19.0.50403-121` -* miopen-opencl - `2.18.0.50300-63` -* migraphx - `2.4.0.50403-121` +* ROCm: rocm-core - `5.7.0.50700-6` +* miopen-hip - `2.20.0.50700-63` +* migraphx - `2.7.0.50700-63` * Protobuf - [V3.12.4](https://github.com/protocolbuffers/protobuf/releases/tag/v3.12.4) * OpenCV - [4.6.0](https://github.com/opencv/opencv/releases/tag/4.6.0) -* RPP - [1.2.0](https://github.com/GPUOpen-ProfessionalCompute-Libraries/rpp/releases/tag/1.2.0) +* RPP - [1.2.0.50700-63](https://github.com/GPUOpen-ProfessionalCompute-Libraries/rpp/releases/tag/1.2.0) * FFMPEG - [n4.4.2](https://github.com/FFmpeg/FFmpeg/releases/tag/n4.4.2) * Dependencies for all the above packages * MIVisionX Setup Script - `V2.5.5` @@ -52,6 +53,7 @@ ### Known issues * OpenCV 4.X support for some apps missing +* MIVisionX Package install requires manual prerequisites installation ## MIVisionX 2.4.0 diff --git a/CMakeLists.txt b/CMakeLists.txt index d0ca237736..2b322b064e 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -38,7 +38,6 @@ set(ROCM_PATH /opt/rocm CACHE PATH "Default ROCm installation path") if(CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT) set(CMAKE_INSTALL_PREFIX ${ROCM_PATH} CACHE PATH "MIVisionX default installation path" FORCE) endif(CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT) -set(CMAKE_INSTALL_RPATH_USE_LINK_PATH TRUE) # MIVisionX Default Options 
option(ENHANCED_MESSAGE "MIVisionX Enhanced Message Option" ON) diff --git a/README.md b/README.md index e4cb9d0bcc..d66954d290 100644 --- a/README.md +++ b/README.md @@ -22,19 +22,19 @@ MIVisionX toolkit is a set of comprehensive computer vision and machine intellig - [Utilities](#utilities) - [Prerequisites](#prerequisites) - [Hardware](#hardware) - - [Operating System](#operating-system) + - [Operating System \& Prerequisites](#operating-system--prerequisites) - [Windows](#windows) - [macOS](#macos) - [Linux](#linux) - [Prerequisites setup script for Linux](#prerequisites-setup-script-for-linux) - [Prerequisites for running the script](#prerequisites-for-running-the-script) - [Build \& Install MIVisionX](#build--install-mivisionx) - - [Building on Windows](#building-on-windows) + - [Windows](#windows-1) - [Using `Visual Studio`](#using-visual-studio) - - [Building on macOS](#building-on-macos) - - [Building on Linux](#building-on-linux) + - [macOS](#macos-1) + - [Linux](#linux-1) - [Using `apt-get` / `yum` / `zypper`](#using-apt-get--yum--zypper) - - [Using MIVisionX-setup.py](#using-mivisionx-setuppy) + - [Using `MIVisionX-setup.py`](#using-mivisionx-setuppy) - [Verify the Installation](#verify-the-installation) - [Verifying on Linux / macOS](#verifying-on-linux--macos) - [Verifying on Windows](#verifying-on-windows) @@ -128,10 +128,10 @@ MIVisionX provides you with tools for accomplishing your tasks throughout the wh ## Utilities -* [inference_generator](utilities/inference_generator/README.md#inference-generator): generate inference library from pre-trained CAFFE models * [loom_shell](utilities/loom_shell/README.md#radeon-loomsh): an interpreter to prototype 360 degree video stitching applications using a script -* [RunVX](utilities/runvx/README.md#amd-runvx): command-line utility to execute OpenVX graph described in GDF text file +* [mv_deploy](utilities/mv_deploy/README.md): consists of a model-compiler and necessary header/.cpp files which are required to 
run inference for a specific NeuralNet model * [RunCL](utilities/runcl/README.md#amd-runcl): command-line utility to build, execute, and debug OpenCL programs +* [RunVX](utilities/runvx/README.md#amd-runvx): command-line utility to execute OpenVX graph described in GDF text file ## Prerequisites @@ -143,7 +143,7 @@ MIVisionX provides you with tools for accomplishing your tasks throughout the wh **Note:** Some modules in MIVisionX can be built for `CPU ONLY`. To take advantage of `Advanced Features And Modules` we recommend using `AMD GPUs` or `AMD APUs`. -### Operating System +### Operating System & Prerequisites #### Windows @@ -172,7 +172,7 @@ MIVisionX provides you with tools for accomplishing your tasks throughout the wh + **CentOS** - `7` / `8` + **RedHat** - `8` / `9` + **SLES** - `15-SP4` -* Install [ROCm](https://docs.amd.com) +* Install [ROCm](https://rocmdocs.amd.com/en/latest/deploy/linux/installer/install.html) with `--usecase=graphics,rocm` * CMake 3.5 or later * MIOpen for [vx_nn](amd_openvx_extensions/amd_nn/README.md#openvx-neural-network-extension-library-vx_nn) extension * MIGraphX for `vx_migraphx` extension @@ -194,8 +194,8 @@ For the convenience of the developer, we provide the setup script `MIVisionX-set + CentOS - `7` / `8` + RedHat - `8` / `9` + SLES - `15-SP4` -* [ROCm supported hardware](https://docs.amd.com) -* [ROCm](https://docs.amd.com) +* [ROCm supported hardware](https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html) +* Install [ROCm](https://rocmdocs.amd.com/en/latest/deploy/linux/installer/install.html) with `--usecase=graphics,rocm` **usage:** @@ -215,12 +215,12 @@ For the convenience of the developer, we provide the setup script `MIVisionX-set --rocm_path [ROCm Installation Path - optional (default:/opt/rocm) - ROCm Installation Required] ``` **Note:** - * **ROCm upgrade** with `sudo apt upgrade` requires the setup script **rerun**. + * **ROCm upgrade** requires the setup script **rerun**. 
* use `X Window` / `X11` for [remote GUI app control](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/wiki/X-Window-forwarding) ## Build & Install MIVisionX -### Building on Windows +### Windows #### Using `Visual Studio` @@ -229,16 +229,16 @@ For the convenience of the developer, we provide the setup script `MIVisionX-set **NOTE:** `vx_nn` is not supported on `Windows` in this release -### Building on macOS +### macOS macOS [build instructions](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/wiki/macOS#macos-build-instructions) -### Building on Linux +### Linux -#### Using `apt-get` / `yum` / `zypper` +* [ROCm supported hardware](https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html) +* Install [ROCm](https://rocmdocs.amd.com/en/latest/deploy/linux/installer/install.html) with `--usecase=graphics,rocm` -* [ROCm supported hardware](https://docs.amd.com) -* Install [ROCm](https://docs.amd.com) +#### Using `apt-get` / `yum` / `zypper` * On `Ubuntu` ``` @@ -250,7 +250,7 @@ macOS [build instructions](https://github.com/GPUOpen-ProfessionalCompute-Librar ``` * On `SLES` ``` - sudo zypper install mivisionxF + sudo zypper install mivisionx ``` **Note:** @@ -265,22 +265,21 @@ macOS [build instructions](https://github.com/GPUOpen-ProfessionalCompute-Librar + Docs folder into `/opt/rocm/share/doc/mivisionx` * Package (.deb & .rpm) install requires `OpenCV v4.6` to execute `AMD OpenCV extensions` -#### Using MIVisionX-setup.py +#### Using `MIVisionX-setup.py` -* Install [ROCm](https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html) -* Use the below commands to set up and build MIVisionX +* Clone MIVisionX git repository ``` git clone https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX.git - cd MIVisionX ``` **Note:** MIVisionX has support for two GPU backends: **OPENCL** and **HIP**: - + Instructions for building MIVisionX with the **HIP** GPU backend (i.e., default GPU backend): +* 
Instructions for building MIVisionX with the **HIP** GPU backend (i.e., default GPU backend): + run the setup script to install all the dependencies required by the **HIP** GPU backend: ``` + cd MIVisionX python MIVisionX-setup.py ``` @@ -293,12 +292,17 @@ macOS [build instructions](https://github.com/GPUOpen-ProfessionalCompute-Librar sudo cmake --build . --target PyPackageInstall sudo make install ``` + + + run tests - [test option instructions](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/wiki/CTest) + ``` + make test + ``` **Note:** + `PyPackageInstall` used for rocal_pybind installation + rocal_pybind not supported on windows. + `sudo` required for pybind installation - + Instructions for building MIVisionX with [**OPENCL** GPU backend](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/wiki/OpenCL-Backend) +* Instructions for building MIVisionX with [**OPENCL** GPU backend](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/wiki/OpenCL-Backend) ## Verify the Installation @@ -350,8 +354,8 @@ Docker files to build MIVisionX containers are [available](docker#mivisionx-dock #### Prerequisites * Ubuntu `20.04`/`22.04` -* [ROCm supported hardware](https://docs.amd.com) -* [ROCm](https://docs.amd.com) +* [ROCm supported hardware](https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html) +* Install [ROCm](https://rocmdocs.amd.com/en/latest/deploy/linux/installer/install.html) with `--usecase=graphics,rocm` * [Docker](https://docs.docker.com/engine/install/ubuntu/) #### Workflow @@ -432,20 +436,20 @@ Review all notable [changes](CHANGELOG.md#changelog) with the latest release + CentOS - `7` / `8` + RHEL - `8` / `9` + SLES - `15-SP4` -* ROCm: rocm-core - `5.4.3.50403-121` -* miopen-hip - `2.19.0.50403-121` -* miopen-opencl - `2.18.0.50300-63` -* migraphx - `2.4.0.50403-121` +* ROCm: rocm-core - `5.7.0.50700-6` +* miopen-hip - `2.20.0.50700-63` +* migraphx - `2.7.0.50700-63` * Protobuf - 
[V3.12.4](https://github.com/protocolbuffers/protobuf/releases/tag/v3.12.4) * OpenCV - [4.6.0](https://github.com/opencv/opencv/releases/tag/4.6.0) -* RPP - [1.2.0](https://github.com/GPUOpen-ProfessionalCompute-Libraries/rpp/releases/tag/1.2.0) +* RPP - [1.2.0.50700-63](https://github.com/GPUOpen-ProfessionalCompute-Libraries/rpp/releases/tag/1.2.0) * FFMPEG - [n4.4.2](https://github.com/FFmpeg/FFmpeg/releases/tag/n4.4.2) * Dependencies for all the above packages * MIVisionX Setup Script - `V2.5.5` ### Known issues -* Package install requires **OpenCV** `V-4.6.0` to execute `AMD OpenCV extensions` +* OpenCV 4.X support for some apps missing +* MIVisionX Package install requires manual prerequisites installation ## MIVisionX Dependency Map diff --git a/amd_openvx_extensions/amd_nn/CMakeLists.txt b/amd_openvx_extensions/amd_nn/CMakeLists.txt index 0dbd18befb..c0c21e45f0 100644 --- a/amd_openvx_extensions/amd_nn/CMakeLists.txt +++ b/amd_openvx_extensions/amd_nn/CMakeLists.txt @@ -132,7 +132,7 @@ if(BUILD_DEV) install(DIRECTORY ../../apps/dg_test DESTINATION ${CMAKE_INSTALL_DATADIR}/mivisionx/apps) install(DIRECTORY ../../apps/mivisionx_inference_analyzer DESTINATION ${CMAKE_INSTALL_DATADIR}/mivisionx/apps) install(DIRECTORY ../../apps/mivisionx_openvx_classifier DESTINATION ${CMAKE_INSTALL_DATADIR}/mivisionx/apps) - install(DIRECTORY ../../samples/inference/mv_objdetect DESTINATION ${CMAKE_INSTALL_DATADIR}/mivisionx/samples) + install(DIRECTORY ../../samples/mv_objdetect DESTINATION ${CMAKE_INSTALL_DATADIR}/mivisionx/samples) install(DIRECTORY ../../samples/model_compiler_samples DESTINATION ${CMAKE_INSTALL_DATADIR}/mivisionx/samples) endif(BUILD_DEV) diff --git a/apps/image_augmentation/README.md b/apps/image_augmentation/README.md index 7251204b8d..cc5daaa37a 100644 --- a/apps/image_augmentation/README.md +++ b/apps/image_augmentation/README.md @@ -8,7 +8,7 @@ This application demonstrates the basic usage of rocAL's C API to load JPEG imag ### Pre-requisites -* 
Ubuntu Linux, [version `16.04` or later](https://www.microsoft.com/software-download/windows10) +* Ubuntu Linux, [version `20.04` or later](https://www.microsoft.com/software-download/windows10) * rocAL library (Part of the MIVisionX toolkit) * [OpenCV 3.1](https://github.com/opencv/opencv/releases) or higher * Radeon Performance Primitives (RPP) diff --git a/apps/mivisionx_inference_analyzer/README.md b/apps/mivisionx_inference_analyzer/README.md index d55141c2dd..6ebc4a7d5f 100644 --- a/apps/mivisionx_inference_analyzer/README.md +++ b/apps/mivisionx_inference_analyzer/README.md @@ -36,7 +36,7 @@ Pre-trained models in [ONNX](https://onnx.ai/), [NNEF](https://www.khronos.org/n ## Prerequisites -* Ubuntu `16.04` / `18.04` or CentOS `7.5` / `7.6` +* Ubuntu `20.04` / `22.04` or CentOS `7.5` / `7.6` * [ROCm supported hardware](https://rocm.github.io/ROCmInstall.html#hardware-support) + AMD Radeon GPU or AMD APU required * Latest [ROCm](https://github.com/RadeonOpenCompute/ROCm#installing-from-amd-rocm-repositories) diff --git a/docs/.sphinx/_toc.yml.in b/docs/.sphinx/_toc.yml.in index 90ad0fb2d0..12923094b0 100644 --- a/docs/.sphinx/_toc.yml.in +++ b/docs/.sphinx/_toc.yml.in @@ -64,7 +64,7 @@ subtrees: - entries: - file: samples/c_samples/README - file: samples/gdf/README - - file: samples/inference/mv_objdetect/README + - file: samples/mv_objdetect/README - file: samples/loom_360_stitch/README - file: samples/model_compiler_samples/README subtrees: diff --git a/docs/.sphinx/requirements.in b/docs/.sphinx/requirements.in index 8a7eff9103..49693b7942 100644 --- a/docs/.sphinx/requirements.in +++ b/docs/.sphinx/requirements.in @@ -1 +1,2 @@ -rocm-docs-core[api_reference]==0.24.0 +rocm-docs-core[api_reference]>=0.24.0 + diff --git a/docs/.sphinx/requirements.txt b/docs/.sphinx/requirements.txt index a67aee59b2..d62b231589 100644 --- a/docs/.sphinx/requirements.txt +++ b/docs/.sphinx/requirements.txt @@ -47,7 +47,7 @@ fastjsonschema==2.16.3 # via rocm-docs-core 
gitdb==4.0.10 # via gitpython -gitpython==3.1.34 +gitpython==3.1.35 # via rocm-docs-core idna==3.4 # via requests @@ -110,7 +110,7 @@ requests==2.31.0 # via # pygithub # sphinx -rocm-docs-core[api_reference]==0.24.0 +rocm-docs-core[api_reference]>=0.24.0 # via -r requirements.in smmap==5.0.0 # via gitdb diff --git a/samples/README.md b/samples/README.md index ee0580bcf5..9fab99934c 100644 --- a/samples/README.md +++ b/samples/README.md @@ -6,7 +6,7 @@ MIVisionX samples using OpenVX and OpenVX extensions. In the samples below we wi * [GDF - Graph Description Format Samples](#gdf---graph-description-format) * [Loom 360 Stitch - Radeon Loom 360 Stitch Samples](#loom-360-stitch---radeon-loom-360-stitch-samples) * [Model Compiler Samples - Run Efficient Inference](#model-compiler-samples---run-efficient-inference) -* [MIVisionX Inference Deploy Samples](inference/mv_objdetect/) +* [MIVisionX Inference Deploy Samples](mv_objdetect) ## GDF - Graph Description Format @@ -108,7 +108,7 @@ make MIVisionX samples using [LoomShell](../utilities/loom_shell/README.md#radeon-loomshell) -[![Loom Stitch](https://raw.githubusercontent.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/develop/docs/data/loom-4.pngloom-4.png)](https://youtu.be/E8pPU04iZjw) +[![Loom Stitch](https://raw.githubusercontent.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/develop/docs/data/loom-4.png)](https://youtu.be/E8pPU04iZjw) **Note:** @@ -225,3 +225,11 @@ In this [sample](model_compiler_samples/README.md#mivisionx-model-compiler-sampl ### [Sample-3: Classification Using Pre-Trained NNEF Model](model_compiler_samples/README.md#sample-3---classification-using-pre-trained-nnef-model) ### [Sample-4: Classification Using Pre-Trained Caffe Model](model_compiler_samples/README.md#sample-4---classification-using-pre-trained-caffe-model) + +## MV Object Detect Samples + +This [sample](mv_objdetect) shows how to run video decoding and object detection using pre-trained `YoloV2` Caffe Model + +The sample 
demonstrates the use of the `mv_compile` utility to perform video decoding and inference. + +
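The mv_objdetect sample's setup has the user export `/opt/rocm/lib` onto `LD_LIBRARY_PATH` before running inference. A tiny helper can make that check explicit; this is an illustrative sketch only — the `path_contains` helper and the example path values are hypothetical, not part of MIVisionX:

```shell
# Hypothetical helper: is directory $2 an entry on the ':'-separated path list $1?
path_contains() {
  case ":$1:" in
    *":$2:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: before and after appending the ROCm library directory,
# mirroring `export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/rocm/lib`.
before="/usr/local/lib"
after="$before:/opt/rocm/lib"
path_contains "$before" "/opt/rocm/lib" && echo found || echo missing   # → missing
path_contains "$after"  "/opt/rocm/lib" && echo found || echo missing   # → found
```

The same check works for any `:`-separated search variable (e.g. `PATH` for `/opt/rocm/bin`).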

\ No newline at end of file diff --git a/samples/model_compiler_samples/README.md b/samples/model_compiler_samples/README.md index 934fda9110..bfa19a33d0 100644 --- a/samples/model_compiler_samples/README.md +++ b/samples/model_compiler_samples/README.md @@ -27,7 +27,7 @@ Pre-trained models in [ONNX](https://onnx.ai/), [NNEF](https://www.khronos.org/n ### Prerequisites -* Ubuntu `18.04`/`20.04` or CentOS `7`/`8` +* Ubuntu `20.04`/`22.04` or CentOS `7`/`8` * [ROCm supported hardware](https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.1.1/page/Prerequisite_Actions.html) * AMD Radeon GPU or AMD APU required * Latest [ROCm](https://docs.amd.com/category/ROCm™%20v5.x) @@ -35,11 +35,23 @@ Pre-trained models in [ONNX](https://onnx.ai/), [NNEF](https://www.khronos.org/n #### Docker for Samples -MIVisionX provides developers with [docker images](https://hub.docker.com/u/mivisionx) for [Ubuntu 18.04](https://hub.docker.com/r/mivisionx/ubuntu-18.04), [Ubuntu 20.04](https://hub.docker.com/r/mivisionx/ubuntu-20.04), [CentOS 7](https://hub.docker.com/r/mivisionx/centos-7), & [CentOS 8](https://hub.docker.com/r/mivisionx/centos-8). Using docker images developers can quickly prototype and build applications without having to be locked into a single system setup or lose valuable time figuring out the dependencies of the underlying software. +MIVisionX provides developers with docker images for Ubuntu `20.04` / `22.04`. Using docker images developers can quickly prototype and build applications without having to be locked into a single system setup or lose valuable time figuring out the dependencies of the underlying software. 
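The docker workflow described here boils down to one `docker run` invocation. A minimal sketch, assuming the `mivisionx/ubuntu-22.04` image tag from the MIVisionX docker repositories referenced in this README; `/dev/kfd` and `/dev/dri` are the standard ROCm GPU device nodes to pass through. The command is composed and printed for review rather than executed, so nothing is pulled as a side effect:

```shell
# Sketch: compose the `docker run` line for a MIVisionX container with GPU access.
# Assumptions: image tag mivisionx/ubuntu-22.04; /dev/kfd and /dev/dri are the
# ROCm device nodes to expose. Printed here, not executed.
IMAGE="mivisionx/ubuntu-22.04"
RUN_CMD="docker run -it --device=/dev/kfd --device=/dev/dri $IMAGE"
echo "$RUN_CMD"   # → docker run -it --device=/dev/kfd --device=/dev/dri mivisionx/ubuntu-22.04
```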
-##### Docker with display option for the samples +Docker files to build MIVisionX containers are [available](docker#mivisionx-docker) + +### MIVisionX Docker +* [Ubuntu 20.04](https://cloud.docker.com/repository/docker/mivisionx/ubuntu-20.04) +* [Ubuntu 22.04](https://cloud.docker.com/repository/docker/mivisionx/ubuntu-22.04) + +### Docker Workflow on Ubuntu `20.04`/`22.04` -* Check [docker prerequisites](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX#docker-workflow-sample-on-ubuntu-1804--2004) +#### Prerequisites +* Ubuntu `20.04`/`22.04` +* [ROCm supported hardware](https://docs.amd.com) +* [ROCm](https://docs.amd.com) +* [Docker](https://docs.docker.com/engine/install/ubuntu/) + +##### Docker with display option for the samples * Start docker with display ```` diff --git a/samples/inference/mv_objdetect/CMakeLists.txt b/samples/mv_objdetect/CMakeLists.txt similarity index 100% rename from samples/inference/mv_objdetect/CMakeLists.txt rename to samples/mv_objdetect/CMakeLists.txt diff --git a/samples/inference/mv_objdetect/README.md b/samples/mv_objdetect/README.md similarity index 97% rename from samples/inference/mv_objdetect/README.md rename to samples/mv_objdetect/README.md index 6f2bf21591..b05ac02b13 100644 --- a/samples/inference/mv_objdetect/README.md +++ b/samples/mv_objdetect/README.md @@ -9,8 +9,7 @@ The sample has two .cpp files, `mvobjdetect.cpp` and `visualize.cpp`. 
But it nee ## Prerequisites * Linux - * Ubuntu `18.04`/`20.04` - * CentOS `7`/`8` + * Ubuntu `20.04`/`22.04` * [ROCm supported hardware](https://docs.amd.com) * **GPU**: [AMD Radeon™ Graphics](https://docs.amd.com/bundle/Hardware_and_Software_Reference_Guide/page/Hardware_and_Software_Support.html) [Required] * **APU**: [AMD Radeon™ `Mobile`/`Embedded`](https://docs.amd.com/bundle/Hardware_and_Software_Reference_Guide/page/Hardware_and_Software_Support.html) [optional] @@ -33,7 +32,7 @@ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/rocm/lib wget https://github.com/kiritigowda/YoloV2NCS/raw/master/models/caffemodels/yoloV2Tiny20.caffemodel ``` -### Step 2. compile model for OPENCL-ROCm-OpenVX backend using mv_compile utility +### Step 2. compile model for OpenVX backend using mv_compile utility The mv_compile utility generates deployment library, header files, and .cpp files required to run inference for the specified model. * Usage: @@ -149,7 +148,7 @@ cd .. ### Step 10. Sample output for multiple video object detection -

+

# License This project is licensed under the MIT License - see the LICENSE.md file for details diff --git a/samples/inference/mv_objdetect/data/Videos/Videos_4.txt b/samples/mv_objdetect/data/Videos/Videos_4.txt similarity index 100% rename from samples/inference/mv_objdetect/data/Videos/Videos_4.txt rename to samples/mv_objdetect/data/Videos/Videos_4.txt diff --git a/samples/inference/mv_objdetect/data/images/Video_4_screenshot.png b/samples/mv_objdetect/data/images/Video_4_screenshot.png similarity index 100% rename from samples/inference/mv_objdetect/data/images/Video_4_screenshot.png rename to samples/mv_objdetect/data/images/Video_4_screenshot.png diff --git a/samples/inference/mv_objdetect/data/images/img_04.JPG b/samples/mv_objdetect/data/images/img_04.JPG similarity index 100% rename from samples/inference/mv_objdetect/data/images/img_04.JPG rename to samples/mv_objdetect/data/images/img_04.JPG diff --git a/samples/inference/mv_objdetect/mvobjdetect.cpp b/samples/mv_objdetect/mvobjdetect.cpp similarity index 100% rename from samples/inference/mv_objdetect/mvobjdetect.cpp rename to samples/mv_objdetect/mvobjdetect.cpp diff --git a/samples/inference/mv_objdetect/visualize.cpp b/samples/mv_objdetect/visualize.cpp similarity index 100% rename from samples/inference/mv_objdetect/visualize.cpp rename to samples/mv_objdetect/visualize.cpp diff --git a/samples/inference/mv_objdetect/visualize.h b/samples/mv_objdetect/visualize.h similarity index 100% rename from samples/inference/mv_objdetect/visualize.h rename to samples/mv_objdetect/visualize.h diff --git a/tests/library_tests/README.md b/tests/library_tests/README.md index e73529318f..02181f3ff5 100644 --- a/tests/library_tests/README.md +++ b/tests/library_tests/README.md @@ -1,6 +1,6 @@ # MIVisionX Library Tests -## Script to check if all libraries are built +## Script to check if all libraries are built & installed ``` python runLibraryTests.py diff --git a/tests/library_tests/runLibraryTests.py 
b/tests/library_tests/runLibraryTests.py index 84ba97b4a5..aacc2d586b 100644 --- a/tests/library_tests/runLibraryTests.py +++ b/tests/library_tests/runLibraryTests.py @@ -86,9 +86,11 @@ def write_formatted(output, f): platform_name = platform_name+'-SLES' else: print("\nMIVisionX Library Test on "+platform_name+" is unsupported") - print("MIVisionX Library Test Supported on: Ubuntu 20/22; CentOS 7/8; RedHat 8/9; & SLES 15 SP3") + print("MIVisionX Library Test Supported on: Ubuntu 20/22; CentOS 7/8; RedHat 8/9; & SLES 15 SP4") exit(1) +# TBD - Install inxi package + print("\nMIVisionX Library Test V:"+__version__ + " on "+platform_name+" is supported") @@ -311,6 +313,4 @@ def write_formatted(output, f): print("STATUS: Output Report File - "+reportFileDir) if warning == 1: print("WARNING: Not all modules of MIVisionX is built, check for missing dependencies") -else: - print("SUCCESS: All modules of MIVisionX built") -print("runLibraryTests.py completed - V:"+__version__+"\n") +print("MIVisionX Tests - runLibraryTests.py - V:"+__version__+"\n") diff --git a/tests/openvx_api_tests/CMakeLists.txt b/tests/openvx_api_tests/CMakeLists.txt index 835e77a60f..2072b7c27e 100644 --- a/tests/openvx_api_tests/CMakeLists.txt +++ b/tests/openvx_api_tests/CMakeLists.txt @@ -25,6 +25,9 @@ ################################################################################ cmake_minimum_required(VERSION 3.5) +# TBD - Install additional data indepedent tests +install(DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/../library_tests DESTINATION ${CMAKE_INSTALL_DATADIR}/mivisionx/tests) + # default run # canny add_test( @@ -130,42 +133,42 @@ if(GPU_SUPPORT) # caffe2nnir2openvx Fuse flow add_test(NAME caffe2nnir2openvx_fuse COMMAND ${Python3_EXECUTABLE} ${CMAKE_SOURCE_DIR}/tests/neural_network_tests/runNeuralNetworkTests.py - --profiler_mode 2 + --profiler_mode 2 --reinstall off WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) # caffe2nnir2openvx FP16 flow add_test(NAME caffe2nnir2openvx_fp16 COMMAND 
${Python3_EXECUTABLE} ${CMAKE_SOURCE_DIR}/tests/neural_network_tests/runNeuralNetworkTests.py - --profiler_mode 3 + --profiler_mode 3 --reinstall off WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) # onnx2nnir2openvx No Fuse flow add_test(NAME onnx2nnir2openvxx_no_fuse COMMAND ${Python3_EXECUTABLE} ${CMAKE_SOURCE_DIR}/tests/neural_network_tests/runNeuralNetworkTests.py - --profiler_mode 4 + --profiler_mode 4 --reinstall off WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) # onnx2nnir2openvx Fuse flow add_test(NAME onnx2nnir2openvxx_fuse COMMAND ${Python3_EXECUTABLE} ${CMAKE_SOURCE_DIR}/tests/neural_network_tests/runNeuralNetworkTests.py - --profiler_mode 5 + --profiler_mode 5 --reinstall off WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) # onnx2nnir2openvx FP16 flow add_test(NAME onnx2nnir2openvxx_fp16 COMMAND ${Python3_EXECUTABLE} ${CMAKE_SOURCE_DIR}/tests/neural_network_tests/runNeuralNetworkTests.py - --profiler_mode 6 + --profiler_mode 6 --reinstall off WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) # nnef2nnir2openvx No Fuse flow add_test(NAME nnef2nnir2openvxx_no_fuse COMMAND ${Python3_EXECUTABLE} ${CMAKE_SOURCE_DIR}/tests/neural_network_tests/runNeuralNetworkTests.py - --profiler_mode 7 + --profiler_mode 7 --reinstall off WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) # nnef2nnir2openvx Fuse flow add_test(NAME nnef2nnir2openvxx_fuse COMMAND ${Python3_EXECUTABLE} ${CMAKE_SOURCE_DIR}/tests/neural_network_tests/runNeuralNetworkTests.py - --profiler_mode 8 + --profiler_mode 8 --reinstall off WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) # nnef2nnir2openvx FP16 flow add_test(NAME nnef2nnir2openvxx_fp16 COMMAND ${Python3_EXECUTABLE} ${CMAKE_SOURCE_DIR}/tests/neural_network_tests/runNeuralNetworkTests.py - --profiler_mode 9 + --profiler_mode 9 --reinstall off WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) endif(NEURAL_NET AND Python3_FOUND) diff --git a/utilities/inference_generator/CMakeLists.txt b/utilities/inference_generator/CMakeLists.txt deleted file mode 
100644 index 9f8488afba..0000000000 --- a/utilities/inference_generator/CMakeLists.txt +++ /dev/null @@ -1,41 +0,0 @@ -# Copyright (c) 2017 - 2023 Advanced Micro Devices, Inc. All rights reserved. -# -# Permission is hereby granted, free of charge, to any person obtaining a copy -# of this software and associated documentation files (the "Software"), to deal -# in the Software without restriction, including without limitation the rights -# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -# copies of the Software, and to permit persons to whom the Software is -# furnished to do so, subject to the following conditions: -# -# The above copyright notice and this permission notice shall be included in -# all copies or substantial portions of the Software. -# -# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN -# THE SOFTWARE. 
- -cmake_minimum_required(VERSION 3.5) -project(inference_generator) - -set(CMAKE_CXX_STANDARD 14) - -find_package(Protobuf REQUIRED) -PROTOBUF_GENERATE_CPP(PROTO_SRCS PROTO_HDRS proto/caffe.proto) - -include_directories(${CMAKE_CURRENT_BINARY_DIR}) -list(APPEND CAFFE_SOURCES src/caffe2openvx.cpp ${PROTO_SRCS} ${PROTO_HDRS}) -add_executable(caffe2openvx ${CAFFE_SOURCES}) -target_link_libraries(caffe2openvx ${PROTOBUF_LIBRARIES}) -install (TARGETS caffe2openvx DESTINATION ${CMAKE_INSTALL_BINDIR}) - -if("${CMAKE_CXX_COMPILER_ID}" STREQUAL "MSVC") - set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /MT") - set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} /MTd") -else() - set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=gnu++14") -endif() - diff --git a/utilities/inference_generator/README.md b/utilities/inference_generator/README.md deleted file mode 100644 index 067fae1cf3..0000000000 --- a/utilities/inference_generator/README.md +++ /dev/null @@ -1,111 +0,0 @@ -# Inference Generator - -caffe2openvx: Convert a pre-trained CAFFE model into a C library for use by applications. 
-* Extract neural network model from `deploy.prototxt` - + generate C code that instantiates OpenVX kernels from [vx_nn](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/tree/master/vx_nn/README.md) module - + generate build scripts that package C code into a library - + the generated C code or library can be easily integrated into an application for running inference -* Extract weights and biases from `weights.caffemodel` into separates folders for use by the C library during initialization -* Also generate a GDF for quick prototyping and kernel debugging - -The generated C code will have two functions in `annmodule.h`: - -``` -void annGetTensorDimensions( - vx_size dimInput[4], // input tensor dimensions - vx_size dimOutput[4] // output tensor dimensions - ); - -vx_graph annCreateGraph( - vx_context context, // OpenVX context - vx_tensor input, // input tensor - vx_tensor output, // output tensor - const char * dataFolder // folder with weights and biases - ); -or -vx_graph annCreateGraphWithInputImage( - vx_context context, // OpenVX context - vx_image input, // input image (RGB or U8) - vx_tensor output, // output tensor - const char * dataFolder // folder with weights and biases - ); -or -vx_graph annCreateGraphWithInputImageWithArgmaxTensor( - vx_context context, // OpenVX context - vx_image input, // input image (RGB or U8) - vx_tensor output, // output tensor - const char * dataFolder // folder with weights and biases - ); -or -vx_graph annCreateGraphWithInputImageWithArgmaxImage( - vx_context context, // OpenVX context - vx_image input, // input image (RGB or U8) - vx_image output, // output image (U8) - const char * dataFolder // folder with weights and biases - ); -or -vx_graph annCreateGraphWithInputImageWithArgmaxImageWithLut( - vx_context context, // OpenVX context - vx_image input, // input image (RGB or U8) - vx_image output, // output image (RGB) - const char * dataFolder // folder with weights and biases - ); -``` - -* 
`annGetTensorDimensions`: allows an application to query dimensions of input and output tensors -* `annCreateGraph` (or another variant above): creates and initializes a graph with trained neural network for inference - -## Command-line Usage - -``` - % caffe2openvx - [options] - - [n c H W [type fixed-point-position [convert-policy round-policy]]] -``` - -| option | description | -| ------ | ----------- | -| --(no-)error-messages | do/don't enable error messages (default: ON) | -| --(no-)virtual-buffers | do/don't use virtual buffers (default: ON) | -| --(no-)generate-gdf | do/don't generate RunVX GDF with weight/bias initialization (default: ON) | -| --(no-)generate-vx-code | do/don't generate OpenVX C Code with weight/bias initialization (default: ON) | -| --output-dir | specify output folder for weights/biases, GDF, and OpenVX C Code (default: current) | -| --input-rgb | convert input from RGB image into tensor using (a*x+b) conversion: rev=(BGR?1:0) | -| --input-u8 | convert input from U8 image into tensor using (a*x+b) conversion | -| --argmax-tensor u8/u16 k | return argmax output with specified tensor type and top_k | -| --argmax-image u8/u16 | return argmax output with specified image type | -| --argmax-lut | argmax color table: one R G B entry per label | -| --flags | specify custom flags (default: 0) | - -## Example - -Make sure that all executables and libraries are in `PATH` and `LD_LIBRARY_PATH` environment variables. - -``` -% export PATH=$PATH:/opt/rocm/bin -% export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/rocm/lib -``` - -Below log outlines a simple use-case with inference generator. - -``` -% caffe2openvx weights.caffemodel 1 3 32 32 -% caffe2openvx deploy.prototxt 1 3 32 32 -% ls -CMakeLists.txt annmodule.txt cmake weights -annmodule.cpp anntest.cpp deploy.prototxt weights.caffemodel -annmodule.h bias net.gdf -% mkdir build -% cd build -% cmake .. -% make -% cd .. 
-% ls build
-CMakeCache.txt Makefile cmake_install.cmake
-CMakeFiles anntest libannmodule.so
-% ./build/anntest
-OK: annGetTensorDimensions() => [input 32x32x3x32] [output 1x1x10x32]
-```
-
-`anntest.cpp` is a simple program to initialize and run a neural network using the `annmodule` library.
diff --git a/utilities/inference_generator/proto/caffe.proto b/utilities/inference_generator/proto/caffe.proto
deleted file mode 100644
index c96966b589..0000000000
--- a/utilities/inference_generator/proto/caffe.proto
+++ /dev/null
@@ -1,1412 +0,0 @@
-syntax = "proto2";
-
-package caffe;
-
-// Specifies the shape (dimensions) of a Blob.
-message BlobShape {
- repeated int64 dim = 1 [packed = true];
-}
-
-message BlobProto {
- optional BlobShape shape = 7;
- repeated float data = 5 [packed = true];
- repeated float diff = 6 [packed = true];
- repeated double double_data = 8 [packed = true];
- repeated double double_diff = 9 [packed = true];
-
- // 4D dimensions -- deprecated. Use "shape" instead.
- optional int32 num = 1 [default = 0];
- optional int32 channels = 2 [default = 0];
- optional int32 height = 3 [default = 0];
- optional int32 width = 4 [default = 0];
-}
-
-// The BlobProtoVector is simply a way to pass multiple blobproto instances
-// around.
-message BlobProtoVector {
- repeated BlobProto blobs = 1;
-}
-
-message Datum {
- optional int32 channels = 1;
- optional int32 height = 2;
- optional int32 width = 3;
- // the actual image data, in bytes
- optional bytes data = 4;
- optional int32 label = 5;
- // Optionally, the datum could also hold float data.
- repeated float float_data = 6;
- // If true, data contains an encoded image that needs to be decoded
- optional bool encoded = 7 [default = false];
-}
-
-message FillerParameter {
- // The filler type.
- optional string type = 1 [default = 'constant']; - optional float value = 2 [default = 0]; // the value in constant filler - optional float min = 3 [default = 0]; // the min value in uniform filler - optional float max = 4 [default = 1]; // the max value in uniform filler - optional float mean = 5 [default = 0]; // the mean value in Gaussian filler - optional float std = 6 [default = 1]; // the std value in Gaussian filler - // The expected number of non-zero output weights for a given input in - // Gaussian filler -- the default -1 means don't perform sparsification. - optional int32 sparse = 7 [default = -1]; - // Normalize the filler variance by fan_in, fan_out, or their average. - // Applies to 'xavier' and 'msra' fillers. - enum VarianceNorm { - FAN_IN = 0; - FAN_OUT = 1; - AVERAGE = 2; - } - optional VarianceNorm variance_norm = 8 [default = FAN_IN]; -} - -message NetParameter { - optional string name = 1; // consider giving the network a name - // DEPRECATED. See InputParameter. The input blobs to the network. - repeated string input = 3; - // DEPRECATED. See InputParameter. The shape of the input blobs. - repeated BlobShape input_shape = 8; - - // 4D input dimensions -- deprecated. Use "input_shape" instead. - // If specified, for each input blob there should be four - // values specifying the num, channels, height and width of the input blob. - // Thus, there should be a total of (4 * #input) numbers. - repeated int32 input_dim = 4; - - // Whether the network will force every layer to carry out backward operation. - // If set False, then whether to carry out backward is determined - // automatically according to the net structure and learning rates. - optional bool force_backward = 5 [default = false]; - // The current "state" of the network, including the phase, level, and stage. - // Some layers may be included/excluded depending on this state and the states - // specified in the layers' include and exclude fields. 
- optional NetState state = 6; - - // Print debugging information about results while running Net::Forward, - // Net::Backward, and Net::Update. - optional bool debug_info = 7 [default = false]; - - // The layers that make up the net. Each of their configurations, including - // connectivity and behavior, is specified as a LayerParameter. - repeated LayerParameter layer = 100; // ID 100 so layers are printed last. - - // DEPRECATED: use 'layer' instead. - repeated V1LayerParameter layers = 2; -} - -// NOTE -// Update the next available ID when you add a new SolverParameter field. -// -// SolverParameter next available ID: 42 (last added: layer_wise_reduce) -message SolverParameter { - ////////////////////////////////////////////////////////////////////////////// - // Specifying the train and test networks - // - // Exactly one train net must be specified using one of the following fields: - // train_net_param, train_net, net_param, net - // One or more test nets may be specified using any of the following fields: - // test_net_param, test_net, net_param, net - // If more than one test net field is specified (e.g., both net and - // test_net are specified), they will be evaluated in the field order given - // above: (1) test_net_param, (2) test_net, (3) net_param/net. - // A test_iter must be specified for each test_net. - // A test_level and/or a test_stage may also be specified for each test_net. - ////////////////////////////////////////////////////////////////////////////// - - // Proto filename for the train net, possibly combined with one or more - // test nets. - optional string net = 24; - // Inline train net param, possibly combined with one or more test nets. - optional NetParameter net_param = 25; - - optional string train_net = 1; // Proto filename for the train net. - repeated string test_net = 2; // Proto filenames for the test nets. - optional NetParameter train_net_param = 21; // Inline train net params. 
- repeated NetParameter test_net_param = 22; // Inline test net params. - - // The states for the train/test nets. Must be unspecified or - // specified once per net. - // - // By default, train_state will have phase = TRAIN, - // and all test_state's will have phase = TEST. - // Other defaults are set according to the NetState defaults. - optional NetState train_state = 26; - repeated NetState test_state = 27; - - // The number of iterations for each test net. - repeated int32 test_iter = 3; - - // The number of iterations between two testing phases. - optional int32 test_interval = 4 [default = 0]; - optional bool test_compute_loss = 19 [default = false]; - // If true, run an initial test pass before the first iteration, - // ensuring memory availability and printing the starting value of the loss. - optional bool test_initialization = 32 [default = true]; - optional float base_lr = 5; // The base learning rate - // the number of iterations between displaying info. If display = 0, no info - // will be displayed. - optional int32 display = 6; - // Display the loss averaged over the last average_loss iterations - optional int32 average_loss = 33 [default = 1]; - optional int32 max_iter = 7; // the maximum number of iterations - // accumulate gradients over `iter_size` x `batch_size` instances - optional int32 iter_size = 36 [default = 1]; - - // The learning rate decay policy. The currently implemented learning rate - // policies are as follows: - // - fixed: always return base_lr. - // - step: return base_lr * gamma ^ (floor(iter / step)) - // - exp: return base_lr * gamma ^ iter - // - inv: return base_lr * (1 + gamma * iter) ^ (- power) - // - multistep: similar to step but it allows non uniform steps defined by - // stepvalue - // - poly: the effective learning rate follows a polynomial decay, to be - // zero by the max_iter. 
return base_lr (1 - iter/max_iter) ^ (power)
- // - sigmoid: the effective learning rate follows a sigmoid decay
- // return base_lr ( 1/(1 + exp(-gamma * (iter - stepsize))))
- //
- // where base_lr, max_iter, gamma, step, stepvalue and power are defined
- // in the solver parameter protocol buffer, and iter is the current iteration.
- optional string lr_policy = 8;
- optional float gamma = 9; // The parameter to compute the learning rate.
- optional float power = 10; // The parameter to compute the learning rate.
- optional float momentum = 11; // The momentum value.
- optional float weight_decay = 12; // The weight decay.
- // regularization types supported: L1 and L2
- // controlled by weight_decay
- optional string regularization_type = 29 [default = "L2"];
- // the stepsize for learning rate policy "step"
- optional int32 stepsize = 13;
- // the stepsize for learning rate policy "multistep"
- repeated int32 stepvalue = 34;
-
- // Set clip_gradients to >= 0 to clip parameter gradients to that L2 norm,
- // whenever their actual L2 norm is larger.
- optional float clip_gradients = 35 [default = -1];
-
- optional int32 snapshot = 14 [default = 0]; // The snapshot interval
- optional string snapshot_prefix = 15; // The prefix for the snapshot.
- // whether to snapshot diff in the results or not. Snapshotting diff will help
- // debugging but the final protocol buffer size will be much larger.
- optional bool snapshot_diff = 16 [default = false];
- enum SnapshotFormat {
- HDF5 = 0;
- BINARYPROTO = 1;
- }
- optional SnapshotFormat snapshot_format = 37 [default = BINARYPROTO];
- // the mode the solver will use: 0 for CPU and 1 for GPU. GPU is used by default.
- enum SolverMode {
- CPU = 0;
- GPU = 1;
- }
- optional SolverMode solver_mode = 17 [default = GPU];
- // the device_id that will be used in GPU mode. device_id = 0 is used by default.
- optional int32 device_id = 18 [default = 0]; - // If non-negative, the seed with which the Solver will initialize the Caffe - // random number generator -- useful for reproducible results. Otherwise, - // (and by default) initialize using a seed derived from the system clock. - optional int64 random_seed = 20 [default = -1]; - - // type of the solver - optional string type = 40 [default = "SGD"]; - - // numerical stability for RMSProp, AdaGrad and AdaDelta and Adam - optional float delta = 31 [default = 1e-8]; - // parameters for the Adam solver - optional float momentum2 = 39 [default = 0.999]; - - // RMSProp decay value - // MeanSquare(t) = rms_decay*MeanSquare(t-1) + (1-rms_decay)*SquareGradient(t) - optional float rms_decay = 38 [default = 0.99]; - - // If true, print information about the state of the net that may help with - // debugging learning problems. - optional bool debug_info = 23 [default = false]; - - // If false, don't save a snapshot after training finishes. - optional bool snapshot_after_train = 28 [default = true]; - - // DEPRECATED: old solver enum types, use string instead - enum SolverType { - SGD = 0; - NESTEROV = 1; - ADAGRAD = 2; - RMSPROP = 3; - ADADELTA = 4; - ADAM = 5; - } - // DEPRECATED: use type instead of solver_type - optional SolverType solver_type = 30 [default = SGD]; - - // Overlap compute and communication for data parallel training - optional bool layer_wise_reduce = 41 [default = true]; -} - -// A message that stores the solver snapshots -message SolverState { - optional int32 iter = 1; // The current iteration - optional string learned_net = 2; // The file that stores the learned net. 
- repeated BlobProto history = 3; // The history for sgd solvers - optional int32 current_step = 4 [default = 0]; // The current step for learning rate -} - -enum Phase { - TRAIN = 0; - TEST = 1; -} - -message NetState { - optional Phase phase = 1 [default = TEST]; - optional int32 level = 2 [default = 0]; - repeated string stage = 3; -} - -message NetStateRule { - // Set phase to require the NetState have a particular phase (TRAIN or TEST) - // to meet this rule. - optional Phase phase = 1; - - // Set the minimum and/or maximum levels in which the layer should be used. - // Leave undefined to meet the rule regardless of level. - optional int32 min_level = 2; - optional int32 max_level = 3; - - // Customizable sets of stages to include or exclude. - // The net must have ALL of the specified stages and NONE of the specified - // "not_stage"s to meet the rule. - // (Use multiple NetStateRules to specify conjunctions of stages.) - repeated string stage = 4; - repeated string not_stage = 5; -} - -// Specifies training parameters (multipliers on global learning constants, -// and the name and other settings used for weight sharing). -message ParamSpec { - // The names of the parameter blobs -- useful for sharing parameters among - // layers, but never required otherwise. To share a parameter between two - // layers, give it a (non-empty) name. - optional string name = 1; - - // Whether to require shared weights to have the same shape, or just the same - // count -- defaults to STRICT if unspecified. - optional DimCheckMode share_mode = 2; - enum DimCheckMode { - // STRICT (default) requires that num, channels, height, width each match. - STRICT = 0; - // PERMISSIVE requires only the count (num*channels*height*width) to match. - PERMISSIVE = 1; - } - - // The multiplier on the global learning rate for this parameter. - optional float lr_mult = 3 [default = 1.0]; - - // The multiplier on the global weight decay for this parameter. 
- optional float decay_mult = 4 [default = 1.0]; -} - -// NOTE -// Update the next available ID when you add a new LayerParameter field. -// -// LayerParameter next available layer-specific ID: 147 (last added: recurrent_param) -message LayerParameter { - optional string name = 1; // the layer name - optional string type = 2; // the layer type - repeated string bottom = 3; // the name of each bottom blob - repeated string top = 4; // the name of each top blob - - // The train / test phase for computation. - optional Phase phase = 10; - - // The amount of weight to assign each top blob in the objective. - // Each layer assigns a default value, usually of either 0 or 1, - // to each top blob. - repeated float loss_weight = 5; - - // Specifies training parameters (multipliers on global learning constants, - // and the name and other settings used for weight sharing). - repeated ParamSpec param = 6; - - // The blobs containing the numeric parameters of the layer. - repeated BlobProto blobs = 7; - - // Specifies whether to backpropagate to each bottom. If unspecified, - // Caffe will automatically infer whether each input needs backpropagation - // to compute parameter gradients. If set to true for some inputs, - // backpropagation to those inputs is forced; if set false for some inputs, - // backpropagation to those inputs is skipped. - // - // The size must be either 0 or equal to the number of bottoms. - repeated bool propagate_down = 11; - - // Rules controlling whether and when a layer is included in the network, - // based on the current NetState. You may specify a non-zero number of rules - // to include OR exclude, but not both. If no include or exclude rules are - // specified, the layer is always included. If the current NetState meets - // ANY (i.e., one or more) of the specified rules, the layer is - // included/excluded. - repeated NetStateRule include = 8; - repeated NetStateRule exclude = 9; - - // Parameters for data pre-processing. 
- optional TransformationParameter transform_param = 100; - - // Parameters shared by loss layers. - optional LossParameter loss_param = 101; - - // Layer type-specific parameters. - // - // Note: certain layers may have more than one computational engine - // for their implementation. These layers include an Engine type and - // engine parameter for selecting the implementation. - // The default for the engine is set by the ENGINE switch at compile-time. - optional AccuracyParameter accuracy_param = 102; - optional ArgMaxParameter argmax_param = 103; - optional BatchNormParameter batch_norm_param = 139; - optional BiasParameter bias_param = 141; - optional ConcatParameter concat_param = 104; - optional ContrastiveLossParameter contrastive_loss_param = 105; - optional ConvolutionParameter convolution_param = 106; - optional CropParameter crop_param = 144; - optional DataParameter data_param = 107; - optional DropoutParameter dropout_param = 108; - optional DummyDataParameter dummy_data_param = 109; - optional EltwiseParameter eltwise_param = 110; - optional ELUParameter elu_param = 140; - optional EmbedParameter embed_param = 137; - optional ExpParameter exp_param = 111; - optional FlattenParameter flatten_param = 135; - optional HDF5DataParameter hdf5_data_param = 112; - optional HDF5OutputParameter hdf5_output_param = 113; - optional HingeLossParameter hinge_loss_param = 114; - optional ImageDataParameter image_data_param = 115; - optional InfogainLossParameter infogain_loss_param = 116; - optional InnerProductParameter inner_product_param = 117; - optional InputParameter input_param = 143; - optional LogParameter log_param = 134; - optional LRNParameter lrn_param = 118; - optional MemoryDataParameter memory_data_param = 119; - optional MVNParameter mvn_param = 120; - optional ParameterParameter parameter_param = 145; - optional PoolingParameter pooling_param = 121; - optional PowerParameter power_param = 122; - optional PReLUParameter prelu_param = 131; - 
optional PythonParameter python_param = 130;
- optional RecurrentParameter recurrent_param = 146;
- optional ReductionParameter reduction_param = 136;
- optional ReLUParameter relu_param = 123;
- optional ReshapeParameter reshape_param = 133;
- optional ScaleParameter scale_param = 142;
- optional SigmoidParameter sigmoid_param = 124;
- optional SoftmaxParameter softmax_param = 125;
- optional SPPParameter spp_param = 132;
- optional SliceParameter slice_param = 126;
- optional TanHParameter tanh_param = 127;
- optional ThresholdParameter threshold_param = 128;
- optional TileParameter tile_param = 138;
- optional WindowDataParameter window_data_param = 129;
-}
-
-// Message that stores parameters used to apply transformation
-// to the data layer's data
-message TransformationParameter {
- // For data pre-processing, we can do simple scaling and subtracting the
- // data mean, if provided. Note that the mean subtraction is always carried
- // out before scaling.
- optional float scale = 1 [default = 1];
- // Specify if we want to randomly mirror data.
- optional bool mirror = 2 [default = false];
- // Specify if we would like to randomly crop an image.
- optional uint32 crop_size = 3 [default = 0];
- // mean_file and mean_value cannot be specified at the same time
- optional string mean_file = 4;
- // if specified can be repeated once (would subtract it from all the channels)
- // or can be repeated the same number of times as channels
- // (would subtract them from the corresponding channel)
- repeated float mean_value = 5;
- // Force the decoded image to have 3 color channels.
- optional bool force_color = 6 [default = false];
- // Force the decoded image to have 1 color channel.
- optional bool force_gray = 7 [default = false];
-}
-
-// Message that stores parameters shared by loss layers
-message LossParameter {
- // If specified, ignore instances with the given label.
- optional int32 ignore_label = 1; - // How to normalize the loss for loss layers that aggregate across batches, - // spatial dimensions, or other dimensions. Currently only implemented in - // SoftmaxWithLoss and SigmoidCrossEntropyLoss layers. - enum NormalizationMode { - // Divide by the number of examples in the batch times spatial dimensions. - // Outputs that receive the ignore label will NOT be ignored in computing - // the normalization factor. - FULL = 0; - // Divide by the total number of output locations that do not take the - // ignore_label. If ignore_label is not set, this behaves like FULL. - VALID = 1; - // Divide by the batch size. - BATCH_SIZE = 2; - // Do not normalize the loss. - NONE = 3; - } - // For historical reasons, the default normalization for - // SigmoidCrossEntropyLoss is BATCH_SIZE and *not* VALID. - optional NormalizationMode normalization = 3 [default = VALID]; - // Deprecated. Ignored if normalization is specified. If normalization - // is not specified, then setting this to false will be equivalent to - // normalization = BATCH_SIZE to be consistent with previous behavior. - optional bool normalize = 2; -} - -// Messages that store parameters used by individual layer types follow, in -// alphabetical order. - -message AccuracyParameter { - // When computing accuracy, count as correct by comparing the true label to - // the top k scoring classes. By default, only compare to the top scoring - // class (i.e. argmax). - optional uint32 top_k = 1 [default = 1]; - - // The "label" axis of the prediction blob, whose argmax corresponds to the - // predicted label -- may be negative to index from the end (e.g., -1 for the - // last axis). For example, if axis == 1 and the predictions are - // (N x C x H x W), the label blob is expected to contain N*H*W ground truth - // labels with integer values in {0, 1, ..., C-1}. - optional int32 axis = 2 [default = 1]; - - // If specified, ignore instances with the given label. 
- optional int32 ignore_label = 3; -} - -message ArgMaxParameter { - // If true produce pairs (argmax, maxval) - optional bool out_max_val = 1 [default = false]; - optional uint32 top_k = 2 [default = 1]; - // The axis along which to maximise -- may be negative to index from the - // end (e.g., -1 for the last axis). - // By default ArgMaxLayer maximizes over the flattened trailing dimensions - // for each index of the first / num dimension. - optional int32 axis = 3; -} - -message ConcatParameter { - // The axis along which to concatenate -- may be negative to index from the - // end (e.g., -1 for the last axis). Other axes must have the - // same dimension for all the bottom blobs. - // By default, ConcatLayer concatenates blobs along the "channels" axis (1). - optional int32 axis = 2 [default = 1]; - - // DEPRECATED: alias for "axis" -- does not support negative indexing. - optional uint32 concat_dim = 1 [default = 1]; -} - -message BatchNormParameter { - // If false, normalization is performed over the current mini-batch - // and global statistics are accumulated (but not yet used) by a moving - // average. - // If true, those accumulated mean and variance values are used for the - // normalization. - // By default, it is set to false when the network is in the training - // phase and true when the network is in the testing phase. - optional bool use_global_stats = 1; - // What fraction of the moving average remains each iteration? - // Smaller values make the moving average decay faster, giving more - // weight to the recent values. - // Each iteration updates the moving average @f$S_{t-1}@f$ with the - // current mean @f$ Y_t @f$ by - // @f$ S_t = (1-\beta)Y_t + \beta \cdot S_{t-1} @f$, where @f$ \beta @f$ - // is the moving_average_fraction parameter. - optional float moving_average_fraction = 2 [default = .999]; - // Small value to add to the variance estimate so that we don't divide by - // zero. 
- optional float eps = 3 [default = 1e-5];
-}
-
-message BiasParameter {
- // The first axis of bottom[0] (the first input Blob) along which to apply
- // bottom[1] (the second input Blob). May be negative to index from the end
- // (e.g., -1 for the last axis).
- //
- // For example, if bottom[0] is 4D with shape 100x3x40x60, the output
- // top[0] will have the same shape, and bottom[1] may have any of the
- // following shapes (for the given value of axis):
- // (axis == 0 == -4) 100; 100x3; 100x3x40; 100x3x40x60
- // (axis == 1 == -3) 3; 3x40; 3x40x60
- // (axis == 2 == -2) 40; 40x60
- // (axis == 3 == -1) 60
- // Furthermore, bottom[1] may have the empty shape (regardless of the value of
- // "axis") -- a scalar bias.
- optional int32 axis = 1 [default = 1];
-
- // (num_axes is ignored unless just one bottom is given and the bias is
- // a learned parameter of the layer. Otherwise, num_axes is determined by the
- // number of axes of the second bottom.)
- // The number of axes of the input (bottom[0]) covered by the bias
- // parameter, or -1 to cover all axes of bottom[0] starting from `axis`.
- // Set num_axes := 0, to add a zero-axis Blob: a scalar.
- optional int32 num_axes = 2 [default = 1];
-
- // (filler is ignored unless just one bottom is given and the bias is
- // a learned parameter of the layer.)
- // The initialization for the learned bias parameter.
- // Default is the zero (0) initialization, resulting in the BiasLayer
- // initially performing the identity operation.
- optional FillerParameter filler = 3;
-}
-
-message ContrastiveLossParameter {
- // margin for dissimilar pair
- optional float margin = 1 [default = 1.0];
- // The first implementation of this cost did not exactly match the cost of
- // Hadsell et al 2006 -- using (margin - d^2) instead of (margin - d)^2.
- // legacy_version = false (the default) uses (margin - d)^2 as proposed in the
- // Hadsell paper. New models should probably use this version.
- // legacy_version = true uses (margin - d^2). This is kept to support / - // reproduce existing models and results - optional bool legacy_version = 2 [default = false]; -} - -message ConvolutionParameter { - optional uint32 num_output = 1; // The number of outputs for the layer - optional bool bias_term = 2 [default = true]; // whether to have bias terms - - // Pad, kernel size, and stride are all given as a single value for equal - // dimensions in all spatial dimensions, or once per spatial dimension. - repeated uint32 pad = 3; // The padding size; defaults to 0 - repeated uint32 kernel_size = 4; // The kernel size - repeated uint32 stride = 6; // The stride; defaults to 1 - // Factor used to dilate the kernel, (implicitly) zero-filling the resulting - // holes. (Kernel dilation is sometimes referred to by its use in the - // algorithme à trous from Holschneider et al. 1987.) - repeated uint32 dilation = 18; // The dilation; defaults to 1 - - // For 2D convolution only, the *_h and *_w versions may also be used to - // specify both spatial dimensions. - optional uint32 pad_h = 9 [default = 0]; // The padding height (2D only) - optional uint32 pad_w = 10 [default = 0]; // The padding width (2D only) - optional uint32 kernel_h = 11; // The kernel height (2D only) - optional uint32 kernel_w = 12; // The kernel width (2D only) - optional uint32 stride_h = 13; // The stride height (2D only) - optional uint32 stride_w = 14; // The stride width (2D only) - - optional uint32 group = 5 [default = 1]; // The group size for group conv - - optional FillerParameter weight_filler = 7; // The filler for the weight - optional FillerParameter bias_filler = 8; // The filler for the bias - enum Engine { - DEFAULT = 0; - CAFFE = 1; - CUDNN = 2; - } - optional Engine engine = 15 [default = DEFAULT]; - - // The axis to interpret as "channels" when performing convolution. - // Preceding dimensions are treated as independent inputs; - // succeeding dimensions are treated as "spatial". 
- // With (N, C, H, W) inputs, and axis == 1 (the default), we perform - // N independent 2D convolutions, sliding C-channel (or (C/g)-channels, for - // groups g>1) filters across the spatial axes (H, W) of the input. - // With (N, C, D, H, W) inputs, and axis == 1, we perform - // N independent 3D convolutions, sliding (C/g)-channels - // filters across the spatial axes (D, H, W) of the input. - optional int32 axis = 16 [default = 1]; - - // Whether to force use of the general ND convolution, even if a specific - // implementation for blobs of the appropriate number of spatial dimensions - // is available. (Currently, there is only a 2D-specific convolution - // implementation; for input blobs with num_axes != 2, this option is - // ignored and the ND implementation will be used.) - optional bool force_nd_im2col = 17 [default = false]; -} - -message CropParameter { - // To crop, elements of the first bottom are selected to fit the dimensions - // of the second, reference bottom. The crop is configured by - // - the crop `axis` to pick the dimensions for cropping - // - the crop `offset` to set the shift for all/each dimension - // to align the cropped bottom with the reference bottom. - // All dimensions up to but excluding `axis` are preserved, while - // the dimensions including and trailing `axis` are cropped. - // If only one `offset` is set, then all dimensions are offset by this amount. - // Otherwise, the number of offsets must equal the number of cropped axes to - // shift the crop in each dimension accordingly. - // Note: standard dimensions are N,C,H,W so the default is a spatial crop, - // and `axis` may be negative to index from the end (e.g., -1 for the last - // axis). - optional int32 axis = 1 [default = 2]; - repeated uint32 offset = 2; -} - -message DataParameter { - enum DB { - LEVELDB = 0; - LMDB = 1; - } - // Specify the data source. - optional string source = 1; - // Specify the batch size. 
- optional uint32 batch_size = 4;
- // The rand_skip variable is for the data layer to skip a few data points
- // to keep all asynchronous sgd clients from starting at the same point. The skip
- // point would be set as rand_skip * rand(0,1). Note that rand_skip should not
- // be larger than the number of keys in the database.
- // DEPRECATED. Each solver accesses a different subset of the database.
- optional uint32 rand_skip = 7 [default = 0];
- optional DB backend = 8 [default = LEVELDB];
- // DEPRECATED. See TransformationParameter. For data pre-processing, we can do
- // simple scaling and subtracting the data mean, if provided. Note that the
- // mean subtraction is always carried out before scaling.
- optional float scale = 2 [default = 1];
- optional string mean_file = 3;
- // DEPRECATED. See TransformationParameter. Specify if we would like to randomly
- // crop an image.
- optional uint32 crop_size = 5 [default = 0];
- // DEPRECATED. See TransformationParameter. Specify if we want to randomly mirror
- // data.
- optional bool mirror = 6 [default = false];
- // Force the encoded image to have 3 color channels
- optional bool force_encoded_color = 9 [default = false];
- // Prefetch queue (Increase if data feeding bandwidth varies, within the
- // limit of device memory for GPU training)
- optional uint32 prefetch = 10 [default = 4];
-}
-
-message DropoutParameter {
- optional float dropout_ratio = 1 [default = 0.5]; // dropout ratio
-}
-
-// DummyDataLayer fills any number of arbitrarily shaped blobs with random
-// (or constant) data generated by "Fillers" (see "message FillerParameter").
-message DummyDataParameter {
- // This layer produces N >= 1 top blobs. DummyDataParameter must specify 1 or N
- // shape fields, and 0, 1 or N data_fillers.
- //
- // If 0 data_fillers are specified, ConstantFiller with a value of 0 is used.
- // If 1 data_filler is specified, it is applied to all top blobs. If N are
- // specified, the ith is applied to the ith top blob.
- repeated FillerParameter data_filler = 1; - repeated BlobShape shape = 6; - - // 4D dimensions -- deprecated. Use "shape" instead. - repeated uint32 num = 2; - repeated uint32 channels = 3; - repeated uint32 height = 4; - repeated uint32 width = 5; -} - -message EltwiseParameter { - enum EltwiseOp { - PROD = 0; - SUM = 1; - MAX = 2; - } - optional EltwiseOp operation = 1 [default = SUM]; // element-wise operation - repeated float coeff = 2; // blob-wise coefficient for SUM operation - - // Whether to use an asymptotically slower (for >2 inputs) but stabler method - // of computing the gradient for the PROD operation. (No effect for SUM op.) - optional bool stable_prod_grad = 3 [default = true]; -} - -// Message that stores parameters used by ELULayer -message ELUParameter { - // Described in: - // Clevert, D.-A., Unterthiner, T., & Hochreiter, S. (2015). Fast and Accurate - // Deep Network Learning by Exponential Linear Units (ELUs). arXiv - optional float alpha = 1 [default = 1]; -} - -// Message that stores parameters used by EmbedLayer -message EmbedParameter { - optional uint32 num_output = 1; // The number of outputs for the layer - // The input is given as integers to be interpreted as one-hot - // vector indices with dimension num_input. Hence num_input should be - // 1 greater than the maximum possible input value. - optional uint32 input_dim = 2; - - optional bool bias_term = 3 [default = true]; // Whether to use a bias term - optional FillerParameter weight_filler = 4; // The filler for the weight - optional FillerParameter bias_filler = 5; // The filler for the bias - -} - -// Message that stores parameters used by ExpLayer -message ExpParameter { - // ExpLayer computes outputs y = base ^ (shift + scale * x), for base > 0. - // Or if base is set to the default (-1), base is set to e, - // so y = exp(shift + scale * x). 
- optional float base = 1 [default = -1.0]; - optional float scale = 2 [default = 1.0]; - optional float shift = 3 [default = 0.0]; -} - -/// Message that stores parameters used by FlattenLayer -message FlattenParameter { - // The first axis to flatten: all preceding axes are retained in the output. - // May be negative to index from the end (e.g., -1 for the last axis). - optional int32 axis = 1 [default = 1]; - - // The last axis to flatten: all following axes are retained in the output. - // May be negative to index from the end (e.g., the default -1 for the last - // axis). - optional int32 end_axis = 2 [default = -1]; -} - -// Message that stores parameters used by HDF5DataLayer -message HDF5DataParameter { - // Specify the data source. - optional string source = 1; - // Specify the batch size. - optional uint32 batch_size = 2; - - // Specify whether to shuffle the data. - // If shuffle == true, the ordering of the HDF5 files is shuffled, - // and the ordering of data within any given HDF5 file is shuffled, - // but data between different files are not interleaved; all of a file's - // data are output (in a random order) before moving onto another file. - optional bool shuffle = 3 [default = false]; -} - -message HDF5OutputParameter { - optional string file_name = 1; -} - -message HingeLossParameter { - enum Norm { - L1 = 1; - L2 = 2; - } - // Specify the Norm to use L1 or L2 - optional Norm norm = 1 [default = L1]; -} - -message ImageDataParameter { - // Specify the data source. - optional string source = 1; - // Specify the batch size. - optional uint32 batch_size = 4 [default = 1]; - // The rand_skip variable is for the data layer to skip a few data points - // to avoid all asynchronous sgd clients to start at the same point. The skip - // point would be set as rand_skip * rand(0,1). Note that rand_skip should not - // be larger than the number of keys in the database. 
-  optional uint32 rand_skip = 7 [default = 0];
-  // Whether or not ImageLayer should shuffle the list of files at every epoch.
-  optional bool shuffle = 8 [default = false];
-  // It will also resize images if new_height or new_width are not zero.
-  optional uint32 new_height = 9 [default = 0];
-  optional uint32 new_width = 10 [default = 0];
-  // Specify if the images are color or gray
-  optional bool is_color = 11 [default = true];
-  // DEPRECATED. See TransformationParameter. For data pre-processing, we can do
-  // simple scaling and subtracting the data mean, if provided. Note that the
-  // mean subtraction is always carried out before scaling.
-  optional float scale = 2 [default = 1];
-  optional string mean_file = 3;
-  // DEPRECATED. See TransformationParameter. Specify if we would like to randomly
-  // crop an image.
-  optional uint32 crop_size = 5 [default = 0];
-  // DEPRECATED. See TransformationParameter. Specify if we want to randomly mirror
-  // data.
-  optional bool mirror = 6 [default = false];
-  optional string root_folder = 12 [default = ""];
-}
-
-message InfogainLossParameter {
-  // Specify the infogain matrix source.
-  optional string source = 1;
-  optional int32 axis = 2 [default = 1]; // axis of prob
-}
-
-message InnerProductParameter {
-  optional uint32 num_output = 1; // The number of outputs for the layer
-  optional bool bias_term = 2 [default = true]; // whether to have bias terms
-  optional FillerParameter weight_filler = 3; // The filler for the weight
-  optional FillerParameter bias_filler = 4; // The filler for the bias
-
-  // The first axis to be lumped into a single inner product computation;
-  // all preceding axes are retained in the output.
-  // May be negative to index from the end (e.g., -1 for the last axis).
-  optional int32 axis = 5 [default = 1];
-  // Specify whether to transpose the weight matrix or not.
-  // If transpose == true, any operations will be performed on the transpose
-  // of the weight matrix. The weight matrix itself is not going to be transposed
-  // but rather the transfer flag of operations will be toggled accordingly.
-  optional bool transpose = 6 [default = false];
-}
-
-message InputParameter {
-  // This layer produces N >= 1 top blob(s) to be assigned manually.
-  // Define N shapes to set a shape for each top.
-  // Define 1 shape to set the same shape for every top.
-  // Define no shape to defer to reshaping manually.
-  repeated BlobShape shape = 1;
-}
-
-// Message that stores parameters used by LogLayer
-message LogParameter {
-  // LogLayer computes outputs y = log_base(shift + scale * x), for base > 0.
-  // Or if base is set to the default (-1), base is set to e,
-  // so y = ln(shift + scale * x) = log_e(shift + scale * x)
-  optional float base = 1 [default = -1.0];
-  optional float scale = 2 [default = 1.0];
-  optional float shift = 3 [default = 0.0];
-}
-
-// Message that stores parameters used by LRNLayer
-message LRNParameter {
-  optional uint32 local_size = 1 [default = 5];
-  optional float alpha = 2 [default = 1.];
-  optional float beta = 3 [default = 0.75];
-  enum NormRegion {
-    ACROSS_CHANNELS = 0;
-    WITHIN_CHANNEL = 1;
-  }
-  optional NormRegion norm_region = 4 [default = ACROSS_CHANNELS];
-  optional float k = 5 [default = 1.];
-  enum Engine {
-    DEFAULT = 0;
-    CAFFE = 1;
-    CUDNN = 2;
-  }
-  optional Engine engine = 6 [default = DEFAULT];
-}
-
-message MemoryDataParameter {
-  optional uint32 batch_size = 1;
-  optional uint32 channels = 2;
-  optional uint32 height = 3;
-  optional uint32 width = 4;
-}
-
-message MVNParameter {
-  // This parameter can be set to false to normalize mean only
-  optional bool normalize_variance = 1 [default = true];
-
-  // This parameter can be set to true to perform DNN-like MVN
-  optional bool across_channels = 2 [default = false];
-
-  // Epsilon for not dividing by zero while normalizing variance
-  optional float eps = 3 [default = 1e-9];
-}
-
-message ParameterParameter {
-  optional BlobShape shape = 1;
-}
- -message PoolingParameter { - enum PoolMethod { - MAX = 0; - AVE = 1; - STOCHASTIC = 2; - } - optional PoolMethod pool = 1 [default = MAX]; // The pooling method - // Pad, kernel size, and stride are all given as a single value for equal - // dimensions in height and width or as Y, X pairs. - optional uint32 pad = 4 [default = 0]; // The padding size (equal in Y, X) - optional uint32 pad_h = 9 [default = 0]; // The padding height - optional uint32 pad_w = 10 [default = 0]; // The padding width - optional uint32 kernel_size = 2; // The kernel size (square) - optional uint32 kernel_h = 5; // The kernel height - optional uint32 kernel_w = 6; // The kernel width - optional uint32 stride = 3 [default = 1]; // The stride (equal in Y, X) - optional uint32 stride_h = 7; // The stride height - optional uint32 stride_w = 8; // The stride width - enum Engine { - DEFAULT = 0; - CAFFE = 1; - CUDNN = 2; - } - optional Engine engine = 11 [default = DEFAULT]; - // If global_pooling then it will pool over the size of the bottom by doing - // kernel_h = bottom->height and kernel_w = bottom->width - optional bool global_pooling = 12 [default = false]; -} - -message PowerParameter { - // PowerLayer computes outputs y = (shift + scale * x) ^ power. - optional float power = 1 [default = 1.0]; - optional float scale = 2 [default = 1.0]; - optional float shift = 3 [default = 0.0]; -} - -message PythonParameter { - optional string module = 1; - optional string layer = 2; - // This value is set to the attribute `param_str` of the `PythonLayer` object - // in Python before calling the `setup()` method. This could be a number, - // string, dictionary in Python dict format, JSON, etc. You may parse this - // string in `setup` method and use it in `forward` and `backward`. 
- optional string param_str = 3 [default = '']; - // DEPRECATED - optional bool share_in_parallel = 4 [default = false]; -} - -// Message that stores parameters used by RecurrentLayer -message RecurrentParameter { - // The dimension of the output (and usually hidden state) representation -- - // must be explicitly set to non-zero. - optional uint32 num_output = 1 [default = 0]; - - optional FillerParameter weight_filler = 2; // The filler for the weight - optional FillerParameter bias_filler = 3; // The filler for the bias - - // Whether to enable displaying debug_info in the unrolled recurrent net. - optional bool debug_info = 4 [default = false]; - - // Whether to add as additional inputs (bottoms) the initial hidden state - // blobs, and add as additional outputs (tops) the final timestep hidden state - // blobs. The number of additional bottom/top blobs required depends on the - // recurrent architecture -- e.g., 1 for RNNs, 2 for LSTMs. - optional bool expose_hidden = 5 [default = false]; -} - -// Message that stores parameters used by ReductionLayer -message ReductionParameter { - enum ReductionOp { - SUM = 1; - ASUM = 2; - SUMSQ = 3; - MEAN = 4; - } - - optional ReductionOp operation = 1 [default = SUM]; // reduction operation - - // The first axis to reduce to a scalar -- may be negative to index from the - // end (e.g., -1 for the last axis). - // (Currently, only reduction along ALL "tail" axes is supported; reduction - // of axis M through N, where N < num_axes - 1, is unsupported.) - // Suppose we have an n-axis bottom Blob with shape: - // (d0, d1, d2, ..., d(m-1), dm, d(m+1), ..., d(n-1)). - // If axis == m, the output Blob will have shape - // (d0, d1, d2, ..., d(m-1)), - // and the ReductionOp operation is performed (d0 * d1 * d2 * ... * d(m-1)) - // times, each including (dm * d(m+1) * ... * d(n-1)) individual data. 
-  // If axis == 0 (the default), the output Blob always has the empty shape
-  // (count 1), performing reduction across the entire input --
-  // often useful for creating new loss functions.
-  optional int32 axis = 2 [default = 0];
-
-  optional float coeff = 3 [default = 1.0]; // coefficient for output
-}
-
-// Message that stores parameters used by ReLULayer
-message ReLUParameter {
-  // Allow non-zero slope for negative inputs to speed up optimization
-  // Described in:
-  // Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013). Rectifier nonlinearities
-  // improve neural network acoustic models. In ICML Workshop on Deep Learning
-  // for Audio, Speech, and Language Processing.
-  optional float negative_slope = 1 [default = 0];
-  enum Engine {
-    DEFAULT = 0;
-    CAFFE = 1;
-    CUDNN = 2;
-  }
-  optional Engine engine = 2 [default = DEFAULT];
-}
-
-message ReshapeParameter {
-  // Specify the output dimensions. If some of the dimensions are set to 0,
-  // the corresponding dimension from the bottom layer is used (unchanged).
-  // Exactly one dimension may be set to -1, in which case its value is
-  // inferred from the count of the bottom blob and the remaining dimensions.
-  // For example, suppose we want to reshape a 2D blob "input" with shape 2 x 8:
-  //
-  //   layer {
-  //     type: "Reshape" bottom: "input" top: "output"
-  //     reshape_param { ... }
-  //   }
-  //
-  // If "input" is 2D with shape 2 x 8, then the following reshape_param
-  // specifications are all equivalent, producing a 3D blob "output" with shape
-  // 2 x 2 x 4:
-  //
-  //   reshape_param { shape { dim: 2 dim: 2 dim: 4 } }
-  //   reshape_param { shape { dim: 0 dim: 2 dim: 4 } }
-  //   reshape_param { shape { dim: 0 dim: 2 dim: -1 } }
-  //   reshape_param { shape { dim: 0 dim: -1 dim: 4 } }
-  //
-  optional BlobShape shape = 1;
-
-  // axis and num_axes control the portion of the bottom blob's shape that are
-  // replaced by (included in) the reshape. By default (axis == 0 and
-  // num_axes == -1), the entire bottom blob shape is included in the reshape,
-  // and hence the shape field must specify the entire output shape.
-  //
-  // axis may be non-zero to retain some portion of the beginning of the input
-  // shape (and may be negative to index from the end; e.g., -1 to begin the
-  // reshape after the last axis, including nothing in the reshape,
-  // -2 to include only the last axis, etc.).
-  //
-  // For example, suppose "input" is a 2D blob with shape 2 x 8.
-  // Then the following ReshapeLayer specifications are all equivalent,
-  // producing a blob "output" with shape 2 x 2 x 4:
-  //
-  //   reshape_param { shape { dim: 2 dim: 2 dim: 4 } }
-  //   reshape_param { shape { dim: 2 dim: 4 } axis: 1 }
-  //   reshape_param { shape { dim: 2 dim: 4 } axis: -3 }
-  //
-  // num_axes specifies the extent of the reshape.
-  // If num_axes >= 0 (and axis >= 0), the reshape will be performed only on
-  // input axes in the range [axis, axis+num_axes].
-  // num_axes may also be -1, the default, to include all remaining axes
-  // (starting from axis).
-  //
-  // For example, suppose "input" is a 2D blob with shape 2 x 8.
-  // Then the following ReshapeLayer specifications are equivalent,
-  // producing a blob "output" with shape 1 x 2 x 8.
-  //
-  //   reshape_param { shape { dim: 1 dim: 2 dim: 8 } }
-  //   reshape_param { shape { dim: 1 dim: 2 } num_axes: 1 }
-  //   reshape_param { shape { dim: 1 } num_axes: 0 }
-  //
-  // On the other hand, these would produce output blob shape 2 x 1 x 8:
-  //
-  //   reshape_param { shape { dim: 2 dim: 1 dim: 8 } }
-  //   reshape_param { shape { dim: 1 } axis: 1 num_axes: 0 }
-  //
-  optional int32 axis = 2 [default = 0];
-  optional int32 num_axes = 3 [default = -1];
-}
-
-message ScaleParameter {
-  // The first axis of bottom[0] (the first input Blob) along which to apply
-  // bottom[1] (the second input Blob). May be negative to index from the end
-  // (e.g., -1 for the last axis).
- // - // For example, if bottom[0] is 4D with shape 100x3x40x60, the output - // top[0] will have the same shape, and bottom[1] may have any of the - // following shapes (for the given value of axis): - // (axis == 0 == -4) 100; 100x3; 100x3x40; 100x3x40x60 - // (axis == 1 == -3) 3; 3x40; 3x40x60 - // (axis == 2 == -2) 40; 40x60 - // (axis == 3 == -1) 60 - // Furthermore, bottom[1] may have the empty shape (regardless of the value of - // "axis") -- a scalar multiplier. - optional int32 axis = 1 [default = 1]; - - // (num_axes is ignored unless just one bottom is given and the scale is - // a learned parameter of the layer. Otherwise, num_axes is determined by the - // number of axes by the second bottom.) - // The number of axes of the input (bottom[0]) covered by the scale - // parameter, or -1 to cover all axes of bottom[0] starting from `axis`. - // Set num_axes := 0, to multiply with a zero-axis Blob: a scalar. - optional int32 num_axes = 2 [default = 1]; - - // (filler is ignored unless just one bottom is given and the scale is - // a learned parameter of the layer.) - // The initialization for the learned scale parameter. - // Default is the unit (1) initialization, resulting in the ScaleLayer - // initially performing the identity operation. - optional FillerParameter filler = 3; - - // Whether to also learn a bias (equivalent to a ScaleLayer+BiasLayer, but - // may be more efficient). Initialized with bias_filler (defaults to 0). - optional bool bias_term = 4 [default = false]; - optional FillerParameter bias_filler = 5; -} - -message SigmoidParameter { - enum Engine { - DEFAULT = 0; - CAFFE = 1; - CUDNN = 2; - } - optional Engine engine = 1 [default = DEFAULT]; -} - -message SliceParameter { - // The axis along which to slice -- may be negative to index from the end - // (e.g., -1 for the last axis). - // By default, SliceLayer concatenates blobs along the "channels" axis (1). 
- optional int32 axis = 3 [default = 1]; - repeated uint32 slice_point = 2; - - // DEPRECATED: alias for "axis" -- does not support negative indexing. - optional uint32 slice_dim = 1 [default = 1]; -} - -// Message that stores parameters used by SoftmaxLayer, SoftmaxWithLossLayer -message SoftmaxParameter { - enum Engine { - DEFAULT = 0; - CAFFE = 1; - CUDNN = 2; - } - optional Engine engine = 1 [default = DEFAULT]; - - // The axis along which to perform the softmax -- may be negative to index - // from the end (e.g., -1 for the last axis). - // Any other axes will be evaluated as independent softmaxes. - optional int32 axis = 2 [default = 1]; -} - -message TanHParameter { - enum Engine { - DEFAULT = 0; - CAFFE = 1; - CUDNN = 2; - } - optional Engine engine = 1 [default = DEFAULT]; -} - -// Message that stores parameters used by TileLayer -message TileParameter { - // The index of the axis to tile. - optional int32 axis = 1 [default = 1]; - - // The number of copies (tiles) of the blob to output. - optional int32 tiles = 2; -} - -// Message that stores parameters used by ThresholdLayer -message ThresholdParameter { - optional float threshold = 1 [default = 0]; // Strictly positive values -} - -message WindowDataParameter { - // Specify the data source. - optional string source = 1; - // For data pre-processing, we can do simple scaling and subtracting the - // data mean, if provided. Note that the mean subtraction is always carried - // out before scaling. - optional float scale = 2 [default = 1]; - optional string mean_file = 3; - // Specify the batch size. - optional uint32 batch_size = 4; - // Specify if we would like to randomly crop an image. - optional uint32 crop_size = 5 [default = 0]; - // Specify if we want to randomly mirror data. 
- optional bool mirror = 6 [default = false]; - // Foreground (object) overlap threshold - optional float fg_threshold = 7 [default = 0.5]; - // Background (non-object) overlap threshold - optional float bg_threshold = 8 [default = 0.5]; - // Fraction of batch that should be foreground objects - optional float fg_fraction = 9 [default = 0.25]; - // Amount of contextual padding to add around a window - // (used only by the window_data_layer) - optional uint32 context_pad = 10 [default = 0]; - // Mode for cropping out a detection window - // warp: cropped window is warped to a fixed size and aspect ratio - // square: the tightest square around the window is cropped - optional string crop_mode = 11 [default = "warp"]; - // cache_images: will load all images in memory for faster access - optional bool cache_images = 12 [default = false]; - // append root_folder to locate images - optional string root_folder = 13 [default = ""]; -} - -message SPPParameter { - enum PoolMethod { - MAX = 0; - AVE = 1; - STOCHASTIC = 2; - } - optional uint32 pyramid_height = 1; - optional PoolMethod pool = 2 [default = MAX]; // The pooling method - enum Engine { - DEFAULT = 0; - CAFFE = 1; - CUDNN = 2; - } - optional Engine engine = 6 [default = DEFAULT]; -} - -// DEPRECATED: use LayerParameter. 
-message V1LayerParameter {
-  repeated string bottom = 2;
-  repeated string top = 3;
-  optional string name = 4;
-  repeated NetStateRule include = 32;
-  repeated NetStateRule exclude = 33;
-  enum LayerType {
-    NONE = 0;
-    ABSVAL = 35;
-    ACCURACY = 1;
-    ARGMAX = 30;
-    BNLL = 2;
-    CONCAT = 3;
-    CONTRASTIVE_LOSS = 37;
-    CONVOLUTION = 4;
-    DATA = 5;
-    DECONVOLUTION = 39;
-    DROPOUT = 6;
-    DUMMY_DATA = 32;
-    EUCLIDEAN_LOSS = 7;
-    ELTWISE = 25;
-    EXP = 38;
-    FLATTEN = 8;
-    HDF5_DATA = 9;
-    HDF5_OUTPUT = 10;
-    HINGE_LOSS = 28;
-    IM2COL = 11;
-    IMAGE_DATA = 12;
-    INFOGAIN_LOSS = 13;
-    INNER_PRODUCT = 14;
-    LRN = 15;
-    MEMORY_DATA = 29;
-    MULTINOMIAL_LOGISTIC_LOSS = 16;
-    MVN = 34;
-    POOLING = 17;
-    POWER = 26;
-    RELU = 18;
-    SIGMOID = 19;
-    SIGMOID_CROSS_ENTROPY_LOSS = 27;
-    SILENCE = 36;
-    SOFTMAX = 20;
-    SOFTMAX_LOSS = 21;
-    SPLIT = 22;
-    SLICE = 33;
-    TANH = 23;
-    WINDOW_DATA = 24;
-    THRESHOLD = 31;
-  }
-  optional LayerType type = 5;
-  repeated BlobProto blobs = 6;
-  repeated string param = 1001;
-  repeated DimCheckMode blob_share_mode = 1002;
-  enum DimCheckMode {
-    STRICT = 0;
-    PERMISSIVE = 1;
-  }
-  repeated float blobs_lr = 7;
-  repeated float weight_decay = 8;
-  repeated float loss_weight = 35;
-  optional AccuracyParameter accuracy_param = 27;
-  optional ArgMaxParameter argmax_param = 23;
-  optional ConcatParameter concat_param = 9;
-  optional ContrastiveLossParameter contrastive_loss_param = 40;
-  optional ConvolutionParameter convolution_param = 10;
-  optional DataParameter data_param = 11;
-  optional DropoutParameter dropout_param = 12;
-  optional DummyDataParameter dummy_data_param = 26;
-  optional EltwiseParameter eltwise_param = 24;
-  optional ExpParameter exp_param = 41;
-  optional HDF5DataParameter hdf5_data_param = 13;
-  optional HDF5OutputParameter hdf5_output_param = 14;
-  optional HingeLossParameter hinge_loss_param = 29;
-  optional ImageDataParameter image_data_param = 15;
-  optional InfogainLossParameter infogain_loss_param = 16;
-  optional InnerProductParameter inner_product_param = 17;
-  optional LRNParameter lrn_param = 18;
-  optional MemoryDataParameter memory_data_param = 22;
-  optional MVNParameter mvn_param = 34;
-  optional PoolingParameter pooling_param = 19;
-  optional PowerParameter power_param = 21;
-  optional ReLUParameter relu_param = 30;
-  optional SigmoidParameter sigmoid_param = 38;
-  optional SoftmaxParameter softmax_param = 39;
-  optional SliceParameter slice_param = 31;
-  optional TanHParameter tanh_param = 37;
-  optional ThresholdParameter threshold_param = 25;
-  optional WindowDataParameter window_data_param = 20;
-  optional TransformationParameter transform_param = 36;
-  optional LossParameter loss_param = 42;
-  optional V0LayerParameter layer = 1;
-}
-
-// DEPRECATED: V0LayerParameter is the old way of specifying layer parameters
-// in Caffe. We keep this message type around for legacy support.
-message V0LayerParameter {
-  optional string name = 1; // the layer name
-  optional string type = 2; // the string to specify the layer type
-
-  // Parameters to specify layers with inner products.
- optional uint32 num_output = 3; // The number of outputs for the layer - optional bool biasterm = 4 [default = true]; // whether to have bias terms - optional FillerParameter weight_filler = 5; // The filler for the weight - optional FillerParameter bias_filler = 6; // The filler for the bias - - optional uint32 pad = 7 [default = 0]; // The padding size - optional uint32 kernelsize = 8; // The kernel size - optional uint32 group = 9 [default = 1]; // The group size for group conv - optional uint32 stride = 10 [default = 1]; // The stride - enum PoolMethod { - MAX = 0; - AVE = 1; - STOCHASTIC = 2; - } - optional PoolMethod pool = 11 [default = MAX]; // The pooling method - optional float dropout_ratio = 12 [default = 0.5]; // dropout ratio - - optional uint32 local_size = 13 [default = 5]; // for local response norm - optional float alpha = 14 [default = 1.]; // for local response norm - optional float beta = 15 [default = 0.75]; // for local response norm - optional float k = 22 [default = 1.]; - - // For data layers, specify the data source - optional string source = 16; - // For data pre-processing, we can do simple scaling and subtracting the - // data mean, if provided. Note that the mean subtraction is always carried - // out before scaling. - optional float scale = 17 [default = 1]; - optional string meanfile = 18; - // For data layers, specify the batch size. - optional uint32 batchsize = 19; - // For data layers, specify if we would like to randomly crop an image. - optional uint32 cropsize = 20 [default = 0]; - // For data layers, specify if we want to randomly mirror data. - optional bool mirror = 21 [default = false]; - - // The blobs containing the numeric parameters of the layer - repeated BlobProto blobs = 50; - // The ratio that is multiplied on the global learning rate. If you want to - // set the learning ratio for one blob, you need to set it for all blobs. 
- repeated float blobs_lr = 51; - // The weight decay that is multiplied on the global weight decay. - repeated float weight_decay = 52; - - // The rand_skip variable is for the data layer to skip a few data points - // to avoid all asynchronous sgd clients to start at the same point. The skip - // point would be set as rand_skip * rand(0,1). Note that rand_skip should not - // be larger than the number of keys in the database. - optional uint32 rand_skip = 53 [default = 0]; - - // Fields related to detection (det_*) - // foreground (object) overlap threshold - optional float det_fg_threshold = 54 [default = 0.5]; - // background (non-object) overlap threshold - optional float det_bg_threshold = 55 [default = 0.5]; - // Fraction of batch that should be foreground objects - optional float det_fg_fraction = 56 [default = 0.25]; - - // optional bool OBSOLETE_can_clobber = 57 [default = true]; - - // Amount of contextual padding to add around a window - // (used only by the window_data_layer) - optional uint32 det_context_pad = 58 [default = 0]; - - // Mode for cropping out a detection window - // warp: cropped window is warped to a fixed size and aspect ratio - // square: the tightest square around the window is cropped - optional string det_crop_mode = 59 [default = "warp"]; - - // For ReshapeLayer, one needs to specify the new dimensions. - optional int32 new_num = 60 [default = 0]; - optional int32 new_channels = 61 [default = 0]; - optional int32 new_height = 62 [default = 0]; - optional int32 new_width = 63 [default = 0]; - - // Whether or not ImageLayer should shuffle the list of files at every epoch. - // It will also resize images if new_height or new_width are not zero. - optional bool shuffle_images = 64 [default = false]; - - // For ConcatLayer, one needs to specify the dimension for concatenation, and - // the other dimensions must be the same for all the bottom blobs. - // By default it will concatenate blobs along the channels dimension. 
- optional uint32 concat_dim = 65 [default = 1]; - - optional HDF5OutputParameter hdf5_output_param = 1001; -} - -message PReLUParameter { - // Parametric ReLU described in K. He et al, Delving Deep into Rectifiers: - // Surpassing Human-Level Performance on ImageNet Classification, 2015. - - // Initial value of a_i. Default is a_i=0.25 for all i. - optional FillerParameter filler = 1; - // Whether or not slope parameters are shared across channels. - optional bool channel_shared = 2 [default = false]; -} diff --git a/utilities/inference_generator/src/caffe2openvx.cpp b/utilities/inference_generator/src/caffe2openvx.cpp deleted file mode 100644 index f837516a37..0000000000 --- a/utilities/inference_generator/src/caffe2openvx.cpp +++ /dev/null @@ -1,3033 +0,0 @@ -/* -Copyright (c) 2017 - 2023 Advanced Micro Devices, Inc. All rights reserved. - -Permission is hereby granted, free of charge, to any person obtaining a copy -of this software and associated documentation files (the "Software"), to deal -in the Software without restriction, including without limitation the rights -to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -copies of the Software, and to permit persons to whom the Software is -furnished to do so, subject to the following conditions: - -The above copyright notice and this permission notice shall be included in -all copies or substantial portions of the Software. - -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN -THE SOFTWARE. 
-*/
-
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include "caffe.pb.h"
-#include
-#include
-#include
-#include
-#include
-#include
-
-#define error(...) printf("ERROR: " __VA_ARGS__), exit(1)
-#define info(...) printf("OK: " __VA_ARGS__)
-
-//Dump Layer Data : disabled unless enabled explicitly by setting ENABLE_DUMP_LAYER_DATA = 1
-#ifndef ENABLE_DUMP_LAYER_DATA
-#define ENABLE_DUMP_LAYER_DATA 0
-#endif
-
-#ifndef ENABLE_DIRECTIVE
-#define ENABLE_DIRECTIVE 0
-#endif
-
-void getLayerParams(
-    const caffe::LayerParameter& layer,
-    std::string& params)
-{
-    if(layer.type() == "Convolution") {
-        const caffe::ConvolutionParameter& conv = layer.convolution_param();
-        int pad_h = conv.has_pad_h() ? conv.pad_h() : (conv.pad_size() > 0 ? conv.pad(0) : 0);
-        int pad_w = conv.has_pad_w() ? conv.pad_w() : (conv.pad_size() > 1 ? conv.pad(1) : pad_h);
-        int stride_h = conv.has_stride_h() ? conv.stride_h() : (conv.stride_size() > 0 ? conv.stride(0) : 1);
-        int stride_w = conv.has_stride_w() ? conv.stride_w() : (conv.stride_size() > 1 ? conv.stride(1) : stride_h);
-        int kernel_h = conv.has_kernel_h() ? conv.kernel_h() : (conv.kernel_size_size() > 0 ? conv.kernel_size(0) : 0);
-        int kernel_w = conv.has_kernel_w() ? conv.kernel_w() : (conv.kernel_size_size() > 1 ? conv.kernel_size(1) : kernel_h);
-        int k = conv.num_output();
-        int dilation_h = conv.dilation_size() > 0 ? conv.dilation(0) : 1;
-        int dilation_w = conv.dilation_size() > 1 ? conv.dilation(1) : dilation_h;
-        int bias_term = conv.bias_term();
-        int group = conv.has_group() ? conv.group() : 0;
-        params = std::to_string(k)
-            + " " + std::to_string(kernel_w)
-            + " " + std::to_string(kernel_h)
-            + " " + std::to_string(stride_w)
-            + " " + std::to_string(stride_h)
-            + " " + std::to_string(pad_w)
-            + " " + std::to_string(pad_h)
-            + " " + std::to_string(dilation_w)
-            + " " + std::to_string(dilation_h)
-            + " " + std::to_string(bias_term)
-            + " " + std::to_string(group);
-    }
-    else if(layer.type() == "Pooling") {
-        const caffe::PoolingParameter& pooling = layer.pooling_param();
-        int pad_h = pooling.has_pad_h() ? pooling.pad_h() : pooling.pad();
-        int pad_w = pooling.has_pad_w() ? pooling.pad_w() : pooling.pad();
-        int stride_h = pooling.has_stride_h() ? pooling.stride_h() : pooling.stride();
-        int stride_w = pooling.has_stride_w() ? pooling.stride_w() : pooling.stride();
-        int kernel_h = pooling.has_kernel_h() ? pooling.kernel_h() : pooling.kernel_size();
-        int kernel_w = pooling.has_kernel_w() ? pooling.kernel_w() : pooling.kernel_size();
-        int pool = pooling.pool();
-        int global_pooling = pooling.global_pooling() == true ?
1 : 0;
-        params = std::to_string(kernel_w)
-                + " " + std::to_string(kernel_h)
-                + " " + std::to_string(stride_w)
-                + " " + std::to_string(stride_h)
-                + " " + std::to_string(pad_w)
-                + " " + std::to_string(pad_h)
-                + " " + std::to_string(pool)
-                + " " + std::to_string(global_pooling);
-    }
-    else if(layer.type() == "InnerProduct") {
-        const caffe::InnerProductParameter& innerprod = layer.inner_product_param();
-        int k = innerprod.num_output();
-        int bias_term = innerprod.bias_term();
-        params = std::to_string(k) + " " + std::to_string(bias_term);
-    }
-    else if(layer.type() == "LRN") {
-        const caffe::LRNParameter& lrn = layer.lrn_param();
-        const caffe::LRNParameter::NormRegion& norm_region = lrn.norm_region();
-        params = std::to_string(lrn.local_size())
-                + " " + std::to_string(lrn.alpha())
-                + " " + std::to_string(lrn.beta())
-                + " " + std::to_string(norm_region)
-                + " " + std::to_string(lrn.k());
-    }
-    else if(layer.type() == "BatchNorm") {
-        const caffe::BatchNormParameter& norm = layer.batch_norm_param();
-        int use_global_stats = norm.use_global_stats();
-        float eps = norm.eps();
-        params = std::to_string(eps)
-                + " " + std::to_string(use_global_stats);
-    }
-    else if(layer.type() == "Scale") {
-        const caffe::ScaleParameter& scale = layer.scale_param();
-        params = std::to_string(scale.bias_term());
-    }
-    else if(layer.type() == "Dropout") {
-        const caffe::DropoutParameter& dropout = layer.dropout_param();
-        params = std::to_string(dropout.dropout_ratio());
-    }
-    else if(layer.type() == "Eltwise") {
-        const caffe::EltwiseParameter& eltwise = layer.eltwise_param();
-        params = std::to_string(eltwise.operation());
-    }
-    else if(layer.type() == "Deconvolution") {
-        const caffe::ConvolutionParameter& conv = layer.convolution_param();
-        int pad_h = conv.has_pad_h() ? conv.pad_h() : (conv.pad_size() > 0 ? conv.pad(0) : 0);
-        int pad_w = conv.has_pad_w() ? conv.pad_w() : (conv.pad_size() > 1 ? conv.pad(1) : pad_h);
-        int stride_h = conv.has_stride_h() ? conv.stride_h() : (conv.stride_size() > 0 ? conv.stride(0) : 1);
-        int stride_w = conv.has_stride_w() ? conv.stride_w() : (conv.stride_size() > 1 ? conv.stride(1) : stride_h);
-        int kernel_h = conv.has_kernel_h() ? conv.kernel_h() : (conv.kernel_size_size() > 0 ? conv.kernel_size(0) : 0);
-        int kernel_w = conv.has_kernel_w() ? conv.kernel_w() : (conv.kernel_size_size() > 1 ? conv.kernel_size(1) : kernel_h);
-        int k = conv.num_output();
-        int dilation_h = conv.dilation_size() > 0 ? conv.dilation(0) : 1;
-        int dilation_w = conv.dilation_size() > 1 ? conv.dilation(1) : dilation_h;
-        int bias_term = conv.bias_term();
-        params = std::to_string(k)
-                + " " + std::to_string(kernel_w)
-                + " " + std::to_string(kernel_h)
-                + " " + std::to_string(stride_w)
-                + " " + std::to_string(stride_h)
-                + " " + std::to_string(pad_w)
-                + " " + std::to_string(pad_h)
-                + " " + std::to_string(dilation_w)
-                + " " + std::to_string(dilation_h)
-                + " " + std::to_string(bias_term);
-    }
-    else if(layer.type() == "ReLU") {
-        const caffe::ReLUParameter& relu = layer.relu_param();
-        float neg_slope = relu.has_negative_slope() ? relu.negative_slope() : 0.0f;
-        params = std::to_string(neg_slope);
-    }
-}
-
-void getV1LayerParams(
-    const caffe::V1LayerParameter& layer,
-    std::string& params)
-{
-    if(layer.type() == caffe::V1LayerParameter_LayerType_CONVOLUTION) {
-        const caffe::ConvolutionParameter& conv = layer.convolution_param();
-        int pad_h = conv.has_pad_h() ? conv.pad_h() : (conv.pad_size() > 0 ? conv.pad(0) : 0);
-        int pad_w = conv.has_pad_w() ? conv.pad_w() : (conv.pad_size() > 1 ? conv.pad(1) : pad_h);
-        int stride_h = conv.has_stride_h() ? conv.stride_h() : (conv.stride_size() > 0 ? conv.stride(0) : 1);
-        int stride_w = conv.has_stride_w() ? conv.stride_w() : (conv.stride_size() > 1 ? conv.stride(1) : stride_h);
-        int kernel_h = conv.has_kernel_h() ? conv.kernel_h() : (conv.kernel_size_size() > 0 ? conv.kernel_size(0) : 0);
-        int kernel_w = conv.has_kernel_w() ? conv.kernel_w() : (conv.kernel_size_size() > 1 ? conv.kernel_size(1) : kernel_h);
-        int k = conv.num_output();
-        int dilation_h = conv.dilation_size() > 0 ? conv.dilation(0) : 1;
-        int dilation_w = conv.dilation_size() > 1 ? conv.dilation(1) : dilation_h;
-        int bias_term = conv.bias_term();
-        int group = conv.has_group() ? conv.group() : 0;
-        params = std::to_string(k)
-                + " " + std::to_string(kernel_w)
-                + " " + std::to_string(kernel_h)
-                + " " + std::to_string(stride_w)
-                + " " + std::to_string(stride_h)
-                + " " + std::to_string(pad_w)
-                + " " + std::to_string(pad_h)
-                + " " + std::to_string(dilation_w)
-                + " " + std::to_string(dilation_h)
-                + " " + std::to_string(bias_term)
-                + " " + std::to_string(group);
-    }
-    else if(layer.type() == caffe::V1LayerParameter_LayerType_POOLING) {
-        const caffe::PoolingParameter& pooling = layer.pooling_param();
-        int pad_h = pooling.has_pad_h() ? pooling.pad_h() : pooling.pad();
-        int pad_w = pooling.has_pad_w() ? pooling.pad_w() : pooling.pad();
-        int stride_h = pooling.has_stride_h() ? pooling.stride_h() : pooling.stride();
-        int stride_w = pooling.has_stride_w() ? pooling.stride_w() : pooling.stride();
-        int kernel_h = pooling.has_kernel_h() ? pooling.kernel_h() : pooling.kernel_size();
-        int kernel_w = pooling.has_kernel_w() ? pooling.kernel_w() : pooling.kernel_size();
-        int pool = pooling.pool();
-        int global_pooling = pooling.global_pooling() == true ? 1 : 0;
-        params = std::to_string(kernel_w)
-                + " " + std::to_string(kernel_h)
-                + " " + std::to_string(stride_w)
-                + " " + std::to_string(stride_h)
-                + " " + std::to_string(pad_w)
-                + " " + std::to_string(pad_h)
-                + " " + std::to_string(pool)
-                + " " + std::to_string(global_pooling);
-    }
-    else if(layer.type() == caffe::V1LayerParameter_LayerType_INNER_PRODUCT) {
-        const caffe::InnerProductParameter& innerprod = layer.inner_product_param();
-        int k = innerprod.num_output();
-        int bias_term = innerprod.bias_term();
-        params = std::to_string(k) + " " + std::to_string(bias_term);
-    }
-    else if(layer.type() == caffe::V1LayerParameter_LayerType_LRN) {
-        const caffe::LRNParameter& lrn = layer.lrn_param();
-        const caffe::LRNParameter::NormRegion& norm_region = lrn.norm_region();
-        params = std::to_string(lrn.local_size())
-                + " " + std::to_string(lrn.alpha())
-                + " " + std::to_string(lrn.beta())
-                + " " + std::to_string(norm_region)
-                + " " + std::to_string(lrn.k());
-    }
-    else if(layer.type() == caffe::V1LayerParameter_LayerType_DROPOUT) {
-        const caffe::DropoutParameter& dropout = layer.dropout_param();
-        params = std::to_string(dropout.dropout_ratio());
-    }
-    else if(layer.type() == caffe::V1LayerParameter_LayerType_ELTWISE) {
-        const caffe::EltwiseParameter& eltwise = layer.eltwise_param();
-        params = std::to_string(eltwise.operation());
-    }
-    else if(layer.type() == caffe::V1LayerParameter_LayerType_DECONVOLUTION) {
-        const caffe::ConvolutionParameter& conv = layer.convolution_param();
-        int pad_h = conv.has_pad_h() ? conv.pad_h() : (conv.pad_size() > 0 ? conv.pad(0) : 0);
-        int pad_w = conv.has_pad_w() ? conv.pad_w() : (conv.pad_size() > 1 ? conv.pad(1) : pad_h);
-        int stride_h = conv.has_stride_h() ? conv.stride_h() : (conv.stride_size() > 0 ? conv.stride(0) : 1);
-        int stride_w = conv.has_stride_w() ? conv.stride_w() : (conv.stride_size() > 1 ? conv.stride(1) : stride_h);
-        int kernel_h = conv.has_kernel_h() ? conv.kernel_h() : (conv.kernel_size_size() > 0 ? conv.kernel_size(0) : 0);
-        int kernel_w = conv.has_kernel_w() ? conv.kernel_w() : (conv.kernel_size_size() > 1 ? conv.kernel_size(1) : kernel_h);
-        int k = conv.num_output();
-        int dilation_h = conv.dilation_size() > 0 ? conv.dilation(0) : 1;
-        int dilation_w = conv.dilation_size() > 1 ? conv.dilation(1) : dilation_h;
-        int bias_term = conv.bias_term();
-        params = std::to_string(k)
-                + " " + std::to_string(kernel_w)
-                + " " + std::to_string(kernel_h)
-                + " " + std::to_string(stride_w)
-                + " " + std::to_string(stride_h)
-                + " " + std::to_string(pad_w)
-                + " " + std::to_string(pad_h)
-                + " " + std::to_string(dilation_w)
-                + " " + std::to_string(dilation_h)
-                + " " + std::to_string(bias_term);
-    }
-    else if(layer.type() == caffe::V1LayerParameter_LayerType_RELU) {
-        const caffe::ReLUParameter& relu = layer.relu_param();
-        float neg_slope = relu.has_negative_slope() ? relu.negative_slope() : 0.0f;
-        params = std::to_string(neg_slope);
-    }
-}
-
-std::string convertV1LayerTypeToString(caffe::V1LayerParameter_LayerType V1type)
-{
-    if(V1type == caffe::V1LayerParameter_LayerType_CONCAT)
-        return("Concat");
-    else if(V1type == caffe::V1LayerParameter_LayerType_CONVOLUTION)
-        return("Convolution");
-    else if(V1type == caffe::V1LayerParameter_LayerType_DECONVOLUTION)
-        return("Deconvolution");
-    else if(V1type == caffe::V1LayerParameter_LayerType_DROPOUT)
-        return("Dropout");
-    else if(V1type == caffe::V1LayerParameter_LayerType_ELTWISE)
-        return("Eltwise");
-    else if(V1type == caffe::V1LayerParameter_LayerType_INNER_PRODUCT)
-        return("InnerProduct");
-    else if(V1type == caffe::V1LayerParameter_LayerType_LRN)
-        return("LRN");
-    else if(V1type == caffe::V1LayerParameter_LayerType_POOLING)
-        return("Pooling");
-    else if(V1type == caffe::V1LayerParameter_LayerType_RELU)
-        return("ReLU");
-    else if(V1type == caffe::V1LayerParameter_LayerType_SOFTMAX)
-        return("Softmax");
-    else
-        return("UnknownLayer");
-}
-
-void parseProtoTxt(caffe::NetParameter * param,
-    std::vector<std::vector<std::string>>& net,
-    int inputDim[4])
-{
-    // initialize outputNameMap and input dimensions if available
-    std::map<std::string,std::string> outputNameMap;
-    if(param->input_size() > 0) {
-        outputNameMap[param->input(0)] = param->input(0);
-    }
-
-    if(param->input_dim_size() == 4 && ((inputDim[0]==0) || (inputDim[1]==0) || (inputDim[2]==0) || (inputDim[3]==0))) {
-        inputDim[0] = param->input_dim(0);
-        inputDim[1] = param->input_dim(1);
-        inputDim[2] = param->input_dim(2);
-        inputDim[3] = param->input_dim(3);
-    }
-
-    // process network layer by layer
-    for(int i = 0; i < param->layer_size(); i++) {
-        // get current layer
-        const caffe::LayerParameter layer = param->layer(i);
-
-        if(layer.type() == "Input" || layer.type() == "Data" || layer.type() == "ImageData") {
-            outputNameMap[layer.top(0)] = layer.top(0);
-
-            if(layer.type() == "Input" && ((inputDim[0]==0) || (inputDim[1]==0) || (inputDim[2]==0) || (inputDim[3]==0))) {
-                inputDim[0] = layer.input_param().shape(0).dim(0);
-                inputDim[1] = layer.input_param().shape(0).dim(1);
-                inputDim[2] = layer.input_param().shape(0).dim(2);
-                inputDim[3] = layer.input_param().shape(0).dim(3);
-            }
-            continue;
-        }
-
-        // Split type.
-        if(layer.type() == "Split") {
-            for(int j = 0; j < layer.top_size(); j++)
-            {
-                // get layer information and add to net
-                std::vector<std::string> node;
-                node.push_back(layer.type());
-                node.push_back("");
-                node.push_back(layer.top(j));
-                node.push_back(layer.top(j));
-                for(int z = 0; z < layer.bottom_size(); z++) {
-                    if(outputNameMap.find(layer.bottom(z)) == outputNameMap.end()) {
-                        outputNameMap[layer.bottom(z)] = layer.bottom(z);
-                    }
-                    node.push_back(outputNameMap[layer.bottom(z)]);
-                }
-                net.push_back(node);
-
-                // update output name with layer name
-                outputNameMap[layer.top(j)] = layer.top(j);
-            }
-            continue;
-        }
-
-        // get layer information and add to net
-        std::vector<std::string> node;
-        std::string params;
-        getLayerParams(layer, params);
-        node.push_back(layer.type());
-        node.push_back(params);
-        node.push_back(layer.top(0));
-        node.push_back(layer.name());
-        for(int j = 0; j < layer.bottom_size(); j++) {
-            if(outputNameMap.find(layer.bottom(j)) == outputNameMap.end()) {
-                outputNameMap[layer.bottom(j)] = layer.bottom(j);
-            }
-            node.push_back(outputNameMap[layer.bottom(j)]);
-        }
-        net.push_back(node);
-
-        // update output name with layer name
-        outputNameMap[layer.top(0)] = layer.name();
-    }
-}
-
-void parseV1LayerProtoTxt(caffe::NetParameter * param,
-    std::vector<std::vector<std::string>>& net,
-    int inputDim[4])
-{
-    // initialize outputNameMap and input dimensions if available
-    std::map<std::string,std::string> outputNameMap;
-    if(param->input_size() > 0) {
-        outputNameMap[param->input(0)] = param->input(0);
-    }
-
-    if(param->input_dim_size() == 4 && ((inputDim[0]==0) || (inputDim[1]==0) || (inputDim[2]==0) || (inputDim[3]==0))) {
-        inputDim[0] = param->input_dim(0);
-        inputDim[1] = param->input_dim(1);
-        inputDim[2] = param->input_dim(2);
-        inputDim[3] = param->input_dim(3);
-    }
-
-    // process network layer by layer
-    for(int i = 0; i < param->layers_size(); i++) {
-        // get current layer
-        const caffe::V1LayerParameter layer = param->layers(i);
-
-        if(layer.type() == caffe::V1LayerParameter_LayerType_DATA || layer.type() == caffe::V1LayerParameter_LayerType_IMAGE_DATA) {
-            outputNameMap[layer.top(0)] = layer.top(0);
-            continue;
-        }
-
-        // Split type.
-        if(layer.type() == caffe::V1LayerParameter_LayerType_SPLIT) {
-            for(int j = 0; j < layer.top_size(); j++)
-            {
-                // get layer information and add to net
-                std::vector<std::string> node;
-                node.push_back(convertV1LayerTypeToString(layer.type()));
-                node.push_back("");
-                node.push_back(layer.top(j));
-                node.push_back(layer.top(j));
-                for(int z = 0; z < layer.bottom_size(); z++) {
-                    if(outputNameMap.find(layer.bottom(z)) == outputNameMap.end()) {
-                        outputNameMap[layer.bottom(z)] = layer.bottom(z);
-                    }
-                    node.push_back(outputNameMap[layer.bottom(z)]);
-                }
-                net.push_back(node);
-
-                // update output name with layer name
-                outputNameMap[layer.top(j)] = layer.top(j);
-            }
-            continue;
-        }
-
-        // get layer information and add to net
-        std::vector<std::string> node;
-        std::string params;
-        getV1LayerParams(layer, params);
-        node.push_back(convertV1LayerTypeToString(layer.type()));
-        node.push_back(params);
-        node.push_back(layer.top(0));
-        node.push_back(layer.name());
-        for(int j = 0; j < layer.bottom_size(); j++) {
-            if(outputNameMap.find(layer.bottom(j)) == outputNameMap.end()) {
-                outputNameMap[layer.bottom(j)] = layer.bottom(j);
-            }
-            node.push_back(outputNameMap[layer.bottom(j)]);
-        }
-        net.push_back(node);
-
-        // update output name with layer name
-        outputNameMap[layer.top(0)] = layer.name();
-    }
-}
-
-int loadCaffeProtoTxt(
-    const char * prototxtFileName,
-    std::vector<std::vector<std::string>>& net,
-    int inputDim[4])
-{
-    // verify that the version of the library that we linked against is
-    // compatible with the version of the headers we compiled against.
-    GOOGLE_PROTOBUF_VERIFY_VERSION;
-
-    //google::protobuf::Message * msg = new google::protobuf::Message();
-    caffe::NetParameter * msg = new caffe::NetParameter();
-
-    // open prototxt and parse
-    int fd = open(prototxtFileName, O_RDONLY);
-    if(fd < 0)
-        error("unable to open: %s\n", prototxtFileName);
-    google::protobuf::io::FileInputStream fi(fd);
-    fi.SetCloseOnDelete(true);
-    if (!google::protobuf::TextFormat::Parse(&fi, msg))
-        error("failed to parse file: %s\n", prototxtFileName);
-    info("loadCaffeProtoTxt: loading %s from %s\n", msg->has_name() ? msg->name().c_str() : "(none)", prototxtFileName);
-
-    if(msg->layer_size() > 0) {
-        parseProtoTxt(msg, net, inputDim);
-    }
-    else if(msg->layers_size() > 0) {
-        info("Reading V1 layer parameters from %s\n", prototxtFileName);
-        parseV1LayerProtoTxt(msg, net, inputDim);
-    }
-    else {
-        error("No 'layers' or 'layer' fields found in the prototxt\n");
-        return -1;
-    }
-    return 0;
-}
-
-int calculateTensorDim(
-    std::vector<std::vector<std::string>>& net,
-    int inputDim[4],
-    std::map<std::string,std::vector<int>>& tensorMap)
-{
-    tensorMap[net[0][4]] = std::vector<int>{inputDim[0], inputDim[1], inputDim[2], inputDim[3]};
-
-    for(auto& node : net) {
-        auto&& type = node[0];
-        auto&& params = node[1];
-        auto&& output = node[3];
-        auto&& input = node[4];
-        auto&& it = tensorMap.find(input);
-        if(it == tensorMap.end()) {
-            error("calculateTensorDim: no dims found for %s\n", input.c_str());
-        }
-
-        auto&& idim = it->second;
-        int n = idim[0], c = idim[1], H = idim[2], W = idim[3];
-        int k = c, h = H, w = W;
-
-        if (n < 1 || c < 1 || H < 1 || W < 1)
-            error("calculateTensorDim: got invalid dim %dx%dx%dx%d for %s\n", n, c, H, W, input.c_str());
-
-        if(type == "Convolution") {
-            std::stringstream ss(params);
-            int kernel_w, kernel_h, stride_w, stride_h, pad_w, pad_h, dilation_w, dilation_h, bias_term;
-            ss >> k >> kernel_w >> kernel_h >> stride_w >> stride_h >> pad_w >> pad_h >> dilation_w >> dilation_h >> bias_term;
-            w = ((W + 2 * pad_w - kernel_w - (kernel_w - 1) * (dilation_w - 1)) / stride_w) + 1;
-            h = ((H + 2 * pad_h - kernel_h - (kernel_h - 1) * (dilation_h - 1)) / stride_h) + 1;
-            tensorMap[output + "_W"] = std::vector<int>{k, c, kernel_h, kernel_w};
-            if(bias_term) {
-                tensorMap[output + "_B"] = std::vector<int>{k};
-            }
-        }
-        else if(type == "Deconvolution") {
-            std::stringstream ss(params);
-            int kernel_w, kernel_h, stride_w, stride_h, pad_w, pad_h, dilation_w, dilation_h, bias_term;
-            ss >> k >> kernel_w >> kernel_h >> stride_w >> stride_h >> pad_w >> pad_h >> dilation_w >> dilation_h >> bias_term;
-            w = stride_w * (W - 1) + dilation_w * (kernel_w - 1) + 1 - (2 * pad_w);
-            h = stride_h * (H - 1) + dilation_h * (kernel_h - 1) + 1 - (2 * pad_h);
-            tensorMap[output + "_W"] = std::vector<int>{k, c, kernel_h, kernel_w};
-            if(bias_term) {
-                tensorMap[output + "_B"] = std::vector<int>{k};
-            }
-        }
-        else if(type == "Pooling") {
-            std::stringstream ss(params);
-            int kernel_w, kernel_h, stride_w, stride_h, pad_w, pad_h, pool, global_pooling;
-            ss >> kernel_w >> kernel_h >> stride_w >> stride_h >> pad_w >> pad_h >> pool >> global_pooling;
-            if(global_pooling) {
-                // Compute kernel_w and kernel_h and write back the params for the GDF and C-code gen
-                kernel_h = H;
-                kernel_w = W;
-                pad_h = pad_w = 0;
-                stride_h = stride_w = 1;
-                params = std::to_string(kernel_w)
-                        + " " + std::to_string(kernel_h)
-                        + " " + std::to_string(stride_w)
-                        + " " + std::to_string(stride_h)
-                        + " " + std::to_string(pad_w)
-                        + " " + std::to_string(pad_h)
-                        + " " + std::to_string(pool)
-                        + " " + std::to_string(global_pooling);
-            }
-            w = static_cast<int>(ceil(static_cast<float>(W + 2 * pad_w + stride_w - kernel_w) / stride_w));
-            h = static_cast<int>(ceil(static_cast<float>(H + 2 * pad_h + stride_h - kernel_h) / stride_h));
-            if(pad_h > 0) if((h - 1) * stride_h >= (H + pad_h)) h = h - 1;
-            if(pad_w > 0) if((w - 1) * stride_w >= (W + pad_w)) w = w - 1;
-        }
-        else if(type == "InnerProduct") {
-            std::stringstream ss(params);
-            ss >> k;
-            w = 1;
-            h = 1;
-            tensorMap[output + "_W"] = std::vector<int>{k, c, H, W};
-        }
-        else if(type == "Concat") {
-            for(int i = 5; i < node.size(); i++) {
-                auto&& dim = tensorMap[node[i]];
-                k += dim[1];
-                if(dim[0] != n || dim[2] != H || dim[3] != W)
-                    error("calculateTensorDim: Concat: got invalid dim %dx%dx%dx%d for %s (should be %dx*x%dx%d)\n", dim[0], dim[1], dim[2], dim[3], node[i].c_str(), n, H, W);
-            }
-        }
-        else if(type == "SoftmaxWithLoss") {
-            output = node[5];
-        }
-        else if (type == "BatchNorm") {
-            std::stringstream ss(params);
-            int use_global_stats;
-            float eps;
-            ss >> eps >> use_global_stats;
-            tensorMap[output + "_W"] = std::vector<int>{k};
-            tensorMap[output + "_B"] = std::vector<int>{k};
-        }
-        else if(type == "Scale") {
-            std::stringstream ss(params);
-            int bias_term;
-            ss >> bias_term;
-            tensorMap[output + "_W"] = std::vector<int>{k};
-            if(bias_term) {
-                tensorMap[output + "_B"] = std::vector<int>{k};
-            }
-        }
-
-        tensorMap[output] = std::vector<int>{n, k, h, w};
-        if(n < 1 || k < 1 || h < 1 || w < 1)
-            error("calculateTensorDim: got invalid dim %dx%dx%dx%d for %s\n", n, k, h, w, output.c_str());
-    }
-    return 0;
-}
-
-std::string getIdentifierName(const std::string name)
-{
-    size_t N = name.size();
-    const char * s = name.c_str();
-    std::string cname = (N > 0 && std::isdigit(s[0])) ? "_" : "";
-    for(size_t i = 0; i < N; i++) {
-        cname += std::isalnum(s[i]) ? s[i] : '_';
-    }
-    return cname;
-}
-
-void writeGDF(
-    std::ostream& ofsGDF,
-    std::vector<std::vector<std::string>>& net,
-    std::map<std::string,std::vector<int>>& tensorMap,
-    std::string tensorType,
-    int fixedPointPosition,
-    std::string convertPolicy,
-    std::string roundPolicy,
-    bool isVirtualEnabled,
-    std::string outputFolder,
-    bool bFuseScaleLayer)
-{
-    std::map<std::string,bool> tensorCheck;
-    ofsGDF << "import vx_nn" << std::endl;
-    bool bfuse_scale_layer = bFuseScaleLayer;
-
-    for(auto& node : net) {
-        // create input/output tensor objects
-        bool isFirstLayer = (&node == &net.front());
-        bool isLastLayer = (&node == &net.back());
-        for(size_t i = 4; i < node.size(); i++) {
-            if(node[i] != "" && tensorCheck.find(node[i]) == tensorCheck.end()) {
-                auto&& dim = tensorMap[node[i]];
-                if((isVirtualEnabled && isFirstLayer) || (isVirtualEnabled && isLastLayer)) {
-                    ofsGDF << "data " << node[i] << " = tensor:4,{" << dim[3] << "," << dim[2] << "," << dim[1] << "," << dim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl;
-                    tensorCheck[node[i]] = true;
-                    if(!isLastLayer) {
-                        ofsGDF << "read data input.f32" << std::endl;
-                    }
-                }
-                else {
-                    if(isVirtualEnabled) {
-                        ofsGDF << "data " << node[i] << " = virtual-tensor:4,{" << dim[3] << "," << dim[2] << "," << dim[1] << "," << dim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl;
-                        tensorCheck[node[i]] = true;
-                    }
-                    else {
-                        ofsGDF << "data " << node[i] << " = tensor:4,{" << dim[3] << "," << dim[2] << "," << dim[1] << "," << dim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl;
-                        tensorCheck[node[i]] = true;
-                        if(isFirstLayer) ofsGDF << "read data input.f32" << std::endl;
-                    }
-                }
-            }
-        }
-        auto&& output = node[3];
-        if (node[0] == "BatchNorm" && !isLastLayer && bfuse_scale_layer) {
-            auto& next_node = *std::next(&node);
-            if (next_node[0] == "Scale") {
-                auto&& next_output = next_node[3];
-                auto&& odim = tensorMap[next_output];
-                tensorCheck[output] = true; // make sure next node doesn't create input tensor
-                if(!tensorCheck[next_output]) {
-                    if(!isVirtualEnabled) {
-                        ofsGDF << "data " << next_output << " = tensor:4,{" << odim[3] << "," << odim[2] << "," << odim[1] << "," << odim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl;
-                    }
-                    else {
-                        if(!isLastLayer) {
-                            ofsGDF << "data " << next_output << " = virtual-tensor:4,{" << odim[3] << "," << odim[2] << "," << odim[1] << "," << odim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl;
-                        }
-                        else {
-                            ofsGDF << "data " << next_output << " = tensor:4,{" << odim[3] << "," << odim[2] << "," << odim[1] << "," << odim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl;
-                        }
-                    }
-#if ENABLE_DIRECTIVE
-                    ofsGDF << "directive " << next_output << " VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl;
-#endif
-                }
-                tensorCheck[next_output] = true;
-                bfuse_scale_layer = true;
-            }
-        }
-
-        if (node[0] == "Scale" && !isFirstLayer && bfuse_scale_layer) {
-            auto& prev_node = *std::prev(&node);
-            if (prev_node[0] == "BatchNorm")
-                continue;
-        }
-
-        auto&& odim = tensorMap[output];
-        if(!tensorCheck[output]) {
-            if(!isVirtualEnabled) {
-                ofsGDF << "data " << output << " = tensor:4,{" << odim[3] << "," << odim[2] << "," << odim[1] << "," << odim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl;
-            } else {
-                if(!isLastLayer) {
-                    ofsGDF << "data " << output << " = virtual-tensor:4,{" << odim[3] << "," << odim[2] << "," << odim[1] << "," << odim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl;
-                }
-                else {
-                    ofsGDF << "data " << output << " = tensor:4,{" << odim[3] << "," << odim[2] << "," << odim[1] << "," << odim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl;
-                }
-            }
-#if ENABLE_DIRECTIVE
-            ofsGDF << "directive " << output << " VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl;
-#endif
-        }
-        tensorCheck[output] = true;
-
-        // create node object
-        auto&& type = node[0];
-        auto&& params = node[1];
-        std::string layer_name = getIdentifierName(node[3]);
- if(type == "Convolution") { - std::stringstream ss(params); - int k, kernel_w, kernel_h, stride_w, stride_h, pad_w, pad_h, dilation_w, dilation_h, bias_term, group; - ss >> k >> kernel_w >> kernel_h >> stride_w >> stride_h >> pad_w >> pad_h >> dilation_w >> dilation_h >> bias_term >> group; - - if(group > 1) { - // Slice the input tensor into group tensors - auto&& dim_ip_grp = tensorMap[node[4]]; - - for(int g = 0; g < group; g++) { - if(!isVirtualEnabled) { - ofsGDF << "data " << node[4] << "_grp" << g << " = tensor:4,{" << dim_ip_grp[3] << "," << dim_ip_grp[2] << "," << dim_ip_grp[1]/group << "," << dim_ip_grp[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - } - else { - ofsGDF << "data " << node[4] << "_grp" << g << " = virtual-tensor:4,{" << dim_ip_grp[3] << "," << dim_ip_grp[2] << "," << dim_ip_grp[1]/group << "," << dim_ip_grp[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - } - } - - // Conv - auto&& dim_op_grp = tensorMap[node[3]]; - auto&& dim_w = tensorMap[output + "_W"]; - - for(int g = 0; g < group; g++) { - if(!isVirtualEnabled) { - ofsGDF << "data " << output << "_grp" << g << " = tensor:4,{" << dim_op_grp[3] << "," << dim_op_grp[2] << "," << dim_op_grp[1]/group << "," << dim_op_grp[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - } - else { - ofsGDF << "data " << output << "_grp" << g << " = virtual-tensor:4,{" << dim_op_grp[3] << "," << dim_op_grp[2] << "," << dim_op_grp[1]/group << "," << dim_op_grp[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - } - - ofsGDF << "data " << output << "_grp" << g << "_W" << " = tensor:4,{" << dim_w[3] << "," << dim_w[2] << "," << dim_w[1]/group << "," << dim_w[0]/group << "}," << tensorType << "," << fixedPointPosition << std::endl; - ofsGDF << "init " << output << "_grp" << g << "_W weights/" << layer_name << "_grp" << g << ".f32" << std::endl; -#if ENABLE_DIRECTIVE - ofsGDF << "directive " << output << "_grp" << g << "_W" << " 
VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl; -#endif - - if(bias_term){ - ofsGDF << "data " << output << "_grp" << g << "_B" << " = tensor:1,{" << k / group << "}," << tensorType << "," << fixedPointPosition << std::endl; - ofsGDF << "init " << output << "_grp" << g << "_B bias/" << layer_name << "_grp" << g << ".f32" << std::endl; -#if ENABLE_DIRECTIVE - ofsGDF << "directive " << output << "_grp" << g << "_B" << " VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl; -#endif - } - } - - ofsGDF << "data " << node[3] << "_params = " << " scalar:VX_TYPE_NN_CONVOLUTION_PARAMS,{" << pad_w << "," << pad_h << "," << convertPolicy << "," << roundPolicy << ",VX_NN_DS_SIZE_ROUNDING_FLOOR," << dilation_w-1 << "," << dilation_h-1 << "}" << std::endl; - tensorCheck[output + "_W"] = true; - if(bias_term) tensorCheck[output + "_B"] = true; - - ofsGDF << "node com.amd.nn_extension.slice_layer "; - ofsGDF << node[4]; - for(int g = 0; g < group; g++) { - ofsGDF << " " << node[4] << "_grp" << g; - } - ofsGDF << std::endl; -#if ENABLE_DUMP_LAYER_DATA - for(int g = 0; g < group; g++) { - ofsGDF << "write "<< node[4] << "_grp" << g << " out/"<< node[4] << "_grp" << g << ".f32" << std::endl; - } -#endif - - for(int g = 0; g < group; g++) { - ofsGDF << "node org.khronos.nn_extension.convolution_layer "; - ofsGDF << node[4] << "_grp" << g << " "; - ofsGDF << node[3] << "_grp" << g << "_W "; - if(bias_term) - ofsGDF << node[3] << "_grp" << g << "_B "; - else - ofsGDF << "NULL "; - ofsGDF << node[3] << "_params "; - ofsGDF << node[3] << "_grp" << g << std::endl; - -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< node[3] << "_grp" << g << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - - ofsGDF << "node com.amd.nn_extension.concat_layer "; - ofsGDF << node[3]; - for(int g = 0; g < group; g++) { - ofsGDF << " " << node[3] << "_grp" << g; - } - ofsGDF << std::endl; -#if ENABLE_DUMP_LAYER_DATA - for(int g = 0; g < group; g++) { - ofsGDF << "write "<< node[3] << "_grp" << g << " out/"<< 
node[3] << "_grp" << g << ".f32" << std::endl; - } -#endif - } - else { - std::string weights = output + "_W"; - auto&& dim = tensorMap[weights]; - ofsGDF << "data " << weights << " = tensor:4,{" << dim[3] << "," << dim[2] << "," << dim[1] << "," << dim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - ofsGDF << "init " << weights << " "; - ofsGDF << "weights/" << layer_name << ".f32" << std::endl; -#if ENABLE_DIRECTIVE - ofsGDF << "directive " << weights << " VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl; -#endif - tensorCheck[weights] = true; - std::string bias = "NULL"; - if(bias_term) { - bias = output + "_B"; - ofsGDF << "data " << bias << " = tensor:1,{" << k << "}," << tensorType << "," << fixedPointPosition << std::endl; - ofsGDF << "init " << bias << " "; - ofsGDF << "bias/"<< layer_name << ".f32" << std::endl; -#if ENABLE_DIRECTIVE - ofsGDF << "directive " << bias << " VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl; -#endif - tensorCheck[bias] = true; - } - - ofsGDF << "data " << node[3] << "_params = " << " scalar:VX_TYPE_NN_CONVOLUTION_PARAMS,{" << pad_w << "," << pad_h << "," << convertPolicy << "," << roundPolicy << ",VX_NN_DS_SIZE_ROUNDING_FLOOR," << dilation_w-1 << "," << dilation_h-1 << "}" << std::endl; - ofsGDF << "node org.khronos.nn_extension.convolution_layer " << node[4] << " " << node[3] << "_W" << " " << bias << " " - << node[3] <<"_params" - << " " << node[3] - << std::endl; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< node[3] << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - } - else if (type == "Deconvolution") { - std::stringstream ss(params); - int k, kernel_w, kernel_h, stride_w, stride_h, pad_w, pad_h, dilation_w, dilation_h, bias_term; - ss >> k >> kernel_w >> kernel_h >> stride_w >> stride_h >> pad_w >> pad_h >> dilation_w >> dilation_h >> bias_term; - std::string weights = output + "_W"; - auto&& dim = tensorMap[weights]; - ofsGDF << "data " << weights << " = tensor:4,{" << dim[3] << "," << 
dim[2] << "," << dim[1] << "," << dim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - ofsGDF << "init " << weights << " weights/" << layer_name << ".f32" << std::endl; -#if ENABLE_DIRECTIVE - ofsGDF << "directive " << weights << " VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl; -#endif - tensorCheck[weights] = true; - std::string bias = "NULL"; - if(bias_term) { - bias = output + "_B"; - ofsGDF << "data " << bias << " = tensor:1,{" << k << "}," << tensorType << "," << fixedPointPosition << std::endl; - ofsGDF << "init " << bias << " bias/"<< layer_name << ".f32" << std::endl; -#if ENABLE_DIRECTIVE - ofsGDF << "directive " << bias << " VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl; -#endif - tensorCheck[bias] = true; - } - - ofsGDF << "data " << node[3] << "_params = " << " scalar:VX_TYPE_NN_DECONVOLUTION_PARAMS,{" << pad_w << "," << pad_h << "," << convertPolicy << "," << roundPolicy << "," << dilation_w-1 << "," << dilation_h-1 << "}" << std::endl; - ofsGDF << "node org.khronos.nn_extension.deconvolution_layer " << node[4] << " " << node[3] << "_W" << " " << bias << " " - << node[3] <<"_params" - << " " << node[3] - << std::endl; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< node[3] << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - else if(type == "Pooling") { - std::stringstream ss(params); - int kernel_w, kernel_h, stride_w, stride_h, pad_w, pad_h, pool; - ss >> kernel_w >> kernel_h >> stride_w >> stride_h >> pad_w >> pad_h >> pool; - if((pool != 0 && pool != 1)) error("writeGDF: pooling_layer supports only MAX and AVG\n"); - ofsGDF << "data " << node[3] <<"_type = " << " scalar:VX_TYPE_ENUM," << (pool == 0 ? 
"VX_NN_POOLING_MAX" : "VX_NN_POOLING_AVG")<< std::endl; - ofsGDF << "data " << node[3] <<"_kernel_w = " << "scalar:VX_TYPE_SIZE," << kernel_w << std::endl; - ofsGDF << "data " << node[3] <<"_kernel_h = " << "scalar:VX_TYPE_SIZE," << kernel_h << std::endl; - ofsGDF << "data " << node[3] <<"_pad_w = " << "scalar:VX_TYPE_SIZE," << pad_w << std::endl; - ofsGDF << "data " << node[3] <<"_pad_h = " << "scalar:VX_TYPE_SIZE," << pad_h << std::endl; - ofsGDF << "data " << node[3] <<"_roundPolicy = " << " scalar:VX_TYPE_ENUM," << roundPolicy << std::endl; - ofsGDF << "node org.khronos.nn_extension.pooling_layer " << node[4] << " " - << node[3] << "_type" << " " - << node[3] << "_kernel_w " - << node[3] << "_kernel_h " - << node[3] << "_pad_w " - << node[3] << "_pad_h " - << node[3] << "_roundPolicy" - << " " << node[3] - << std::endl; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< node[3] << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - else if(type == "InnerProduct") { - std::stringstream ss(params); - int k, bias_term; - ss >> k >> bias_term; - std::string weights = output + "_W"; - auto&& dim = tensorMap[weights]; - ofsGDF << "data " << weights << " = tensor:4,{" << dim[3] << "," << dim[2] << "," << dim[1] << "," << dim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - ofsGDF << "init " << weights << " weights/"<< layer_name << ".f32" << std::endl; -#if ENABLE_DIRECTIVE - ofsGDF << "directive " << weights << " VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl; -#endif - tensorCheck[weights] = true; - std::string bias = "NULL"; - if(bias_term) { - bias = output + "_B"; - ofsGDF << "data " << bias << " = tensor:1,{" << k << "}," << tensorType << "," << fixedPointPosition << std::endl; - ofsGDF << "init " << bias << " bias/"<< layer_name << ".f32" << std::endl; -#if ENABLE_DIRECTIVE - ofsGDF << "directive " << bias << " VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl; -#endif - tensorCheck[bias] = true; - } - ofsGDF << "data " << node[3] 
<<"_convertPolicy = " << " scalar:VX_TYPE_ENUM," << convertPolicy << std::endl; - ofsGDF << "data " << node[3] <<"_roundPolicy =" << " scalar:VX_TYPE_ENUM,VX_" << roundPolicy << std::endl; - ofsGDF << "node org.khronos.nn_extension.fully_connected_layer " << node[4] << " " << node[3] << "_W" << " " << bias << " " - << node[3] << "_convertPolicy " - << node[3] << "_roundPolicy" - << " " << node[3] - << std::endl; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< node[3] << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - else if(type == "ReLU") { - std::stringstream ss(params); - float neg_slope; - ss >> neg_slope; - if (!neg_slope) - { - ofsGDF << "data " << node[3] << "_mode = " << " scalar:VX_TYPE_ENUM,VX_NN_ACTIVATION_RELU" << std::endl; - ofsGDF << "data " << node[3] << "_param_a =" << " scalar:VX_TYPE_FLOAT32,0" << std::endl; - }else - { - ofsGDF << "data " << node[3] << "_mode = " << " scalar:VX_TYPE_ENUM,VX_NN_ACTIVATION_LEAKY_RELU" << std::endl; - ofsGDF << "data " << node[3] << "_param_a =" << " scalar:VX_TYPE_FLOAT32," << neg_slope << std::endl; - } - ofsGDF << "data " << node[3] << "_param_b =" << " scalar:VX_TYPE_FLOAT32,0" << std::endl; - ofsGDF << "node org.khronos.nn_extension.activation_layer " << node[4] << " " - << node[3] << "_mode " - << node[3] << "_param_a " - << node[3] << "_param_b" - << " " << node[3] - << std::endl; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< node[3] << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - else if(type == "LRN") { - int normalization_size; - float alpha, beta, k; - std::string norm_region; - std::stringstream ss(params); - ss >> normalization_size >> alpha >> beta >> norm_region >> k; - std::string lrnType; - if(norm_region == "1") lrnType = "VX_NN_NORMALIZATION_SAME_MAP"; - else lrnType = "VX_NN_NORMALIZATION_ACROSS_MAPS"; - ofsGDF << "data " << node[3] << "_mode = " << " scalar:VX_TYPE_ENUM," << lrnType << std::endl; - ofsGDF << "data " << node[3] << "_size = " << " 
scalar:VX_TYPE_SIZE," << normalization_size << std::endl; - ofsGDF << "data " << node[3] << "_alpha =" << " scalar:VX_TYPE_FLOAT32," << alpha << std::endl; - ofsGDF << "data " << node[3] << "_beta =" << " scalar:VX_TYPE_FLOAT32," << beta << std::endl; - ofsGDF << "data " << node[3] << "_bias =" << " scalar:VX_TYPE_FLOAT32," << k << std::endl; - ofsGDF << "node org.khronos.nn_extension.normalization_layer " << node[4] << " " - << node[3] << "_mode " - << node[3] << "_size " - << node[3] << "_alpha " - << node[3] << "_beta " - << node[3] << " " - << node[3] << "_bias" - << std::endl; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< node[3] << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - else if(type == "BatchNorm") { - int use_global_stats, bias_term; - float eps; - std::stringstream ss(params); - ss >> eps >> use_global_stats; - std::string weights = output + "_W"; - auto&& dim = tensorMap[weights]; - ofsGDF << "data " << weights << " = tensor:1,{" << dim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - ofsGDF << "init " << weights << " weights/" << layer_name << ".f32" << std::endl; -#if ENABLE_DIRECTIVE - ofsGDF << "directive " << weights << " VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl; -#endif - tensorCheck[weights] = true; - std::string bias = output + "_B"; - dim = tensorMap[bias]; - ofsGDF << "data " << bias << " = tensor:1,{" << dim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - ofsGDF << "init " << bias << " bias/" << layer_name << ".f32" << std::endl; -#if ENABLE_DIRECTIVE - ofsGDF << "directive " << bias << " VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl; -#endif - tensorCheck[bias] = true; - bias = "NULL"; - if (bfuse_scale_layer) { - // check next node. If scale extract weight and bias paramters for scale layer. 
- auto& next_node = *std::next(&node); - auto&& next_output = next_node[3]; - auto&& nn_params = next_node[1]; - std::string nn_layer_name = getIdentifierName(next_node[3]); - weights = next_output + "_W"; - std::stringstream ss(nn_params); - ss >> bias_term; - dim = tensorMap[weights]; - ofsGDF << "data " << weights << " = tensor:1,{" << dim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - ofsGDF << "init " << weights << " weights/" << nn_layer_name << ".f32" << std::endl; - tensorCheck[weights] = true; - if(bias_term) { - bias = next_output + "_B"; - ofsGDF << "data " << bias << " = tensor:1,{" << dim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - ofsGDF << "init " << bias << " bias/"<< nn_layer_name << ".f32" << std::endl; -#if ENABLE_DIRECTIVE - ofsGDF << "directive " << bias << " VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl; -#endif - tensorCheck[bias] = true; - } - ofsGDF << "data " << node[3] << "_eps =" << " scalar:VX_TYPE_FLOAT32," << eps << std::endl; - ofsGDF << "node com.amd.nn_extension.batch_normalization_layer " << node[4] << " " << node[3] << "_W " - << node[3] << "_B " - << weights << " " - << bias << " " - << node[3] << "_eps " - << next_node[3] - << std::endl; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< next_node[3] << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - else { - weights = output +"_W1"; - ofsGDF << "data " << weights << " = tensor:1,{" << dim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - // put default scale and bias term - std::vector<float> scale_arr(dim[0]); - std::fill(scale_arr.begin(), scale_arr.end(), 1.0); - std::string fileName_weights = outputFolder + "/scale_init.f32"; - FILE *fp = fopen(fileName_weights.c_str(), "wb"); - if (fp) { - fwrite(scale_arr.data(), sizeof(float), dim[0], fp); - fclose(fp); - } - ofsGDF << "init " << weights << " scale_init.f32" << std::endl; - ofsGDF << "data " << node[3] << "_eps =" << " scalar:VX_TYPE_FLOAT32," <<
eps << std::endl; - ofsGDF << "node com.amd.nn_extension.batch_normalization_layer " << node[4] << " " << node[3] << "_W " - << node[3] << "_B " - << weights << " " - << bias << " " - << node[3] << "_eps " - << output - << std::endl; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< output << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - } - else if(type == "Eltwise") { - int op; - std::stringstream ss(params); - ss >> op; - auto&& dim = tensorMap[node[3]]; - for(int i = 4; i < node.size(); i++) { - auto&& idim = tensorMap[node[i]]; - if(dim[0] != idim[0] || dim[1] != idim[1] || dim[2] != idim[2] || dim[3] != idim[3]) - error("writeGDF: Eltwise op=%d requires same dimension inputs: %s[%dx%dx%dx%d] != %s[%dx%dx%dx%d]\n", op, node[i].c_str(), idim[0], idim[1], idim[2], idim[3], node[i-1].c_str(), dim[0], dim[1], dim[2], dim[3]); - dim = idim; - } - std::string tmp = node[4]; - for(int i = 5; i < node.size(); i++) { - std::string out = node[3]; - if(i < node.size()-1) { - out += "tmp_" + std::to_string(i-4); - ofsGDF << "data " << out << " = tensor:4,{" << dim[3] << "," << dim[2] << "," << dim[1] << "," << dim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - tensorCheck[out] = true; - } - if(op == 1) { - ofsGDF << "data " << node[3] <<"_convertPolicy =" << " scalar:VX_TYPE_ENUM," << convertPolicy << std::endl; - ofsGDF << "node org.khronos.openvx.tensor_add " << tmp << " " << node[i] << " " - << node[3] << "_convertPolicy" - << " " << out - << std::endl; - tmp = out; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< node[3] << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - else error("writeGDF: Eltwise op=%d not supported\n", op); - } - } - else if(type == "Scale") { - int bias_term; - auto&& type = node[0]; - auto&& params = node[1]; - std::string layer_name = getIdentifierName(node[3]); - std::string weights = output + "_W"; - std::stringstream ss(params); ss >> bias_term; - auto&& dim = tensorMap[weights]; - ofsGDF << 
"data " << weights << " = tensor:1,{" << dim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - ofsGDF << "init " << weights << " weights/" << layer_name << ".f32" << std::endl; - tensorCheck[weights] = true; -#if ENABLE_DIRECTIVE - ofsGDF << "directive " << weights << " VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl; -#endif - std::string bias = "NULL"; - if(bias_term) { - bias = output + "_B "; - ofsGDF << "data " << bias << " = tensor:1,{" << dim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - ofsGDF << "init " << bias << " bias/"<< layer_name << ".f32" << std::endl; -#if ENABLE_DIRECTIVE - ofsGDF << "directive " << bias << " VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl; -#endif - tensorCheck[bias] = true; - } - - ofsGDF << "node com.amd.nn_extension.scale_layer " << node[4] << " " - << node[3] << "_W " - << node[3] << "_B " - << node[3] - << std::endl; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< node[3] << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - else if(type == "Concat") { - ofsGDF << "node com.amd.nn_extension.concat_layer"; - ofsGDF << " " << node[3]; - for(int i = 4; i < node.size(); i++) { - ofsGDF << " " << node[i]; - } - ofsGDF << std::endl; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< node[3] << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - else if(type == "Dropout") { - //during inference dropout layer copies its input to output. 
- ofsGDF << "node org.khronos.openvx.copy " << node[4] << " " << node[3] << std::endl; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< node[3] << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - else if(type == "Softmax") { - ofsGDF << "node org.khronos.nn_extension.softmax_layer " << node[4] - << " " << node[3] - << std::endl; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< node[3] << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - else if(type == "Split") { - ofsGDF << "node org.khronos.openvx.copy " << node[4] << " " << node[3] << std::endl; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< node[3] << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - else if(type == "SoftmaxWithLoss") { - ofsGDF << "node org.khronos.nn_extension.softmax_layer " << node[4] - << " " << node[5] - << std::endl; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< node[3] << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - else { - ofsGDF << "# " - << std::left << std::setw(16) << node[0] - << std::left << std::setw(24) << node[1] - << std::left << std::setw(32) << node[3] - ; - for(size_t i = 4; i < node.size(); i++) - ofsGDF << std::left << std::setw(32) << node[i]; - ofsGDF << std::endl; - } - if(isLastLayer) { - ofsGDF << "write " << node[3] << " output.f32" << std::endl; - auto&& odim = tensorMap[node[3]]; - printf("#OUTPUT-TENSOR: %s %d %d %d %d\n", node[3].c_str(), odim[0], odim[1], odim[2], odim[3]); - } - ofsGDF << std::endl; - } -} - -void dumpLayerData(const caffe::LayerParameter& layer_parameter, std::string outputFolder) -{ - std:: string layer_name; - if(layer_parameter.has_name()) { - layer_name = getIdentifierName(layer_parameter.name()); - } - - std::string fileName_weights = outputFolder + "/weights/" + layer_name + ".f32"; - std::string fileName_bias = outputFolder + "/bias/" + layer_name + ".f32"; - FILE * fs_weights; - FILE * fs_bias; - fs_weights = fopen(fileName_weights.c_str(), "wb"); - fs_bias = 
fopen(fileName_bias.c_str(),"wb"); - if(!fs_weights || !fs_bias) { - printf("ERROR: unable to create dump files: make sure weights and bias folders are writable.\n"); - exit(1); - } - int blob_size = layer_parameter.blobs_size(); - if(blob_size > 0) { - //Extracting the weights. - const caffe::BlobProto& weights_blob = layer_parameter.blobs(0); - int weightsize = weights_blob.data_size(); - - for(int i=0;i<weightsize;i++) { - float weight = weights_blob.data(i); - fwrite(&weight,sizeof(float),1,fs_weights); - } - if(blob_size >= 2) { - //Extraction of Bias. - const caffe::BlobProto bias_blob = layer_parameter.blobs(1); - int biassize = bias_blob.data_size(); - - for(int i=0; i < biassize; i++) { - float bias = bias_blob.data(i); - fwrite(&bias,sizeof(float),1,fs_bias); - } - } - } - - fclose(fs_weights); - fclose(fs_bias); -} - -void dumpV1LayerData(const caffe::V1LayerParameter& layer_parameter, std::string outputFolder) -{ - std:: string layer_name; - if(layer_parameter.has_name()) { - layer_name = getIdentifierName(layer_parameter.name()); - } - - if(layer_parameter.type() == caffe::V1LayerParameter_LayerType_CONVOLUTION) - { - const caffe::ConvolutionParameter& conv = layer_parameter.convolution_param(); - int num_groups = conv.has_group() ? conv.group() : 0; - if(num_groups > 1) - { - int blob_size = layer_parameter.blobs_size(); - const caffe::BlobProto& weights_blob = layer_parameter.blobs(0); - int weightsize_per_grp = weights_blob.data_size() / num_groups; - int biassize_per_grp = (blob_size >= 2) ?
layer_parameter.blobs(1).data_size() / num_groups : 0; - - for(int grp = 0; grp < num_groups; grp++) - { - std::stringstream fileName_weights; - fileName_weights << outputFolder << "/weights/" << layer_name << "_grp" << grp << ".f32"; - std::stringstream fileName_bias; - fileName_bias << outputFolder << "/bias/" << layer_name << "_grp" << grp << ".f32"; - - FILE * fs_weights = fopen(fileName_weights.str().c_str(), "wb"); - FILE * fs_bias = fopen(fileName_bias.str().c_str(),"wb"); - if(!fs_weights || !fs_bias) { - printf("ERROR: unable to create dump files: make sure weights and bias folders are writable.\n"); - exit(1); - } - - // Write weights - for(int i = weightsize_per_grp * grp; i < (weightsize_per_grp * (grp + 1)); i++) { - float weight = weights_blob.data(i); - fwrite(&weight, sizeof(float), 1, fs_weights); - } - - if(blob_size >= 2) { - // Write bias - const caffe::BlobProto bias_blob = layer_parameter.blobs(1); - for(int i = biassize_per_grp * grp; i < (biassize_per_grp * (grp + 1)); i++) { - float bias = bias_blob.data(i); - fwrite(&bias,sizeof(float),1,fs_bias); - } - } - } - return; - } - } - - std::string fileName_weights = outputFolder + "/weights/" + layer_name + ".f32"; - std::string fileName_bias = outputFolder + "/bias/" + layer_name + ".f32"; - FILE * fs_weights; - FILE * fs_bias; - fs_weights = fopen(fileName_weights.c_str(), "wb"); - fs_bias = fopen(fileName_bias.c_str(),"wb"); - if(!fs_weights || !fs_bias) { - printf("ERROR: unable to create dump files: make sure weights and bias folders are writable.\n"); - exit(1); - } - int blob_size = layer_parameter.blobs_size(); - if(blob_size > 0) { - //Extracting the weights. - const caffe::BlobProto& weights_blob = layer_parameter.blobs(0); - int weightsize = weights_blob.data_size(); - - for(int i=0;i<weightsize;i++) { - float weight = weights_blob.data(i); - fwrite(&weight,sizeof(float),1,fs_weights); - } - if(blob_size >= 2) { - //Extraction of Bias.
- const caffe::BlobProto bias_blob = layer_parameter.blobs(1); - int biassize = bias_blob.data_size(); - - for(int i=0; i < biassize; i++) { - float bias = bias_blob.data(i); - fwrite(&bias,sizeof(float),1,fs_bias); - } - } - } - - fclose(fs_weights); - fclose(fs_bias); -} - -void writeVXCode( - std::ostream& ofsCodeC, - std::vector<std::vector<std::string>>& net, - std::map<std::string,std::vector<int>>& tensorMap, - std::string tensorType, - int fixedPosition, - std::string convertPolicy, - std::string roundPolicy, - bool isVirtualEnabled, - bool bFuseScaleLayer, - std::string outputFolder, - std::string codeType) -{ - auto&& inputTensorName = net[0][4]; - auto&& outputTensorName = net[net.size()-1][3]; - - bool bfuse_scale_layer = bFuseScaleLayer; - std::map<std::string,bool> declare_tensor_check; - for(auto& node : net) { - //declare input tensors. - bool isFirstLayer = (&node == &net.front()); - bool isLastLayer = (&node == &net.back()); - - std::string layerName = getIdentifierName(node[3]); - std::string inputName = getIdentifierName(node[4]); - if(codeType == "initialize") { - ofsCodeC << " // " << layerName <<" Layer" << std::endl; - } - for(size_t i=4; i < node.size(); i++) { - if(node[i] != "" && declare_tensor_check.find(node[i]) == declare_tensor_check.end()) { - auto&& dim = tensorMap[node[i]]; - if(codeType == "initialize") { - if(node[i] != inputTensorName && node[i] != outputTensorName) { - ofsCodeC << " vx_size " << node[i] << "_dims[4] = { " << dim[3] << ", " << dim[2] << ", " << dim[1] << ", " << dim[0] << " };" << std::endl; - ofsCodeC << " vx_tensor " << node[i] << ";" << std::endl; - ofsCodeC << " " << node[i] << " = vxCreateTensor(context, 4, " << node[i] + "_dims,"<< tensorType <<", " << fixedPosition << ");" << std::endl; - ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << node[i] << ");" << std::endl; - } - } - else if(codeType == "release") { - if(node[i] != inputTensorName && node[i] != outputTensorName) { - ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << node[i] << "));" << std::endl; - } - } -
declare_tensor_check[node[i]]= true; - } - } - - if (node[0] == "BatchNorm" && !isLastLayer && bfuse_scale_layer) { - auto&& output = node[3]; - auto& next_node = *std::next(&node); - if (next_node[0] == "Scale") { - auto&& next_output = next_node[3]; - std::string nextOutput = getIdentifierName(next_node[3]); - auto&& odim = tensorMap[next_output]; - if(!declare_tensor_check[next_output]) { - if((codeType == "initialize") && nextOutput != outputTensorName) { - ofsCodeC << " vx_size " << nextOutput << "_dims[4] = { " << odim[3] << ", " << odim[2] << ", " << odim[1] << ", " << odim[0] << " };" << std::endl; - ofsCodeC << " vx_tensor " << nextOutput << ";" << std::endl; - if(isVirtualEnabled){ - ofsCodeC << " " << nextOutput << " = vxCreateVirtualTensor(graph,4, " << nextOutput + "_dims, VX_TYPE_FLOAT32," << fixedPosition << ");" << std::endl; - } - else{ - ofsCodeC << " " << nextOutput << " = vxCreateTensor(context,4, " << nextOutput + "_dims, VX_TYPE_FLOAT32," << fixedPosition << ");" << std::endl; - } - ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << nextOutput << ");" << std::endl; - } - else if((codeType == "release") && nextOutput != outputTensorName) { - ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << nextOutput << "));" << std::endl; - } - declare_tensor_check[output] = true; - } - declare_tensor_check[next_output] = true; - bfuse_scale_layer = true; - } - } - if (node[0] == "Scale" && !isFirstLayer && bfuse_scale_layer) { - auto& prev_node = *std::prev(&node); - if (prev_node[0]=="BatchNorm"){ - if(codeType == "initialize") { - ofsCodeC << " // [NOTE -- Scale Layer Fused With Batch Norm Layer]" << std::endl<< std::endl; - } - continue; - } - } - - // declare output tensor. 
- auto&& output = node[3]; - auto&& odim = tensorMap[output]; - if(!declare_tensor_check[output]) { - if(codeType == "initialize") { - if(layerName != outputTensorName) { - ofsCodeC << " vx_size " << layerName << "_dims[4] = { " << odim[3] << ", " << odim[2] << ", " << odim[1] << ", " << odim[0] << " };" << std::endl; - ofsCodeC << " vx_tensor " << layerName << ";" << std::endl; - if(isVirtualEnabled){ - ofsCodeC << " " << layerName << " = vxCreateVirtualTensor(graph,4, " << layerName + "_dims, VX_TYPE_FLOAT32," << fixedPosition << ");" << std::endl; - } - else{ - ofsCodeC << " " << layerName << " = vxCreateTensor(context,4, " << layerName + "_dims, VX_TYPE_FLOAT32," << fixedPosition << ");" << std::endl; - } - ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << layerName << ");" << std::endl; - } - } - else if(codeType == "release") { - if(layerName != outputTensorName) { - ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << layerName << "));" << std::endl; - } - } - declare_tensor_check[output] = true; - } - - auto&& type = node[0]; - auto&& params = node[1]; - if(type == "Convolution") { - std::stringstream ss(params); - int k, kernel_w, kernel_h, stride_w, stride_h, pad_w, pad_h, dilation_w, dilation_h, bias_term, group; - ss >> k >> kernel_w >> kernel_h >> stride_w >> stride_h >> pad_w >> pad_h >> dilation_w >> dilation_h >> bias_term >> group; - if(group > 1) { - auto&& idim = tensorMap[inputName]; - - if(codeType == "initialize") { - ofsCodeC << " vx_size " << inputName << "_grp_dims[4] = { " << idim[3] << ", " << idim[2] << ", " << idim[1]/group << ", " << idim[0] << " };" << std::endl; - ofsCodeC << " vx_size " << layerName << "_grp_dims[4] = { " << odim[3] << ", " << odim[2] << ", " << odim[1]/group << ", " << odim[0] << " };" << std::endl; - for(int g = 0; g < group; g++) { - // Input tensor for the group-g conv - ofsCodeC << " vx_tensor " << inputName << "_grp" << g << ";" << std::endl; - if(isVirtualEnabled){ - ofsCodeC << " " << inputName << "_grp" 
<< g << " = vxCreateVirtualTensor(graph,4, " << inputName << "_grp_dims, VX_TYPE_FLOAT32," << fixedPosition << ");" << std::endl; - } - else{ - ofsCodeC << " " << inputName << "_grp" << g << " = vxCreateTensor(context,4, " << inputName << "_grp_dims, VX_TYPE_FLOAT32," << fixedPosition << ");" << std::endl; - } - ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << inputName << "_grp" << g << ");" << std::endl; - - // Output tensor for the group-g conv - ofsCodeC << " vx_tensor " << layerName << "_grp" << g << ";" << std::endl; - if(isVirtualEnabled){ - ofsCodeC << " " << layerName << "_grp" << g << " = vxCreateVirtualTensor(graph,4, " << layerName << "_grp_dims, VX_TYPE_FLOAT32," << fixedPosition << ");" << std::endl; - } - else{ - ofsCodeC << " " << layerName << "_grp" << g << " = vxCreateTensor(context,4, " << layerName << "_grp_dims, VX_TYPE_FLOAT32," << fixedPosition << ");" << std::endl; - } - ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << layerName << "_grp" << g << ");" << std::endl; - - } - - // Slice conv input - ofsCodeC << " vx_node " << inputName << "_grp_slice_node;" << std::endl; - ofsCodeC << " " << inputName << "_grp_slice_node = " << "vxSliceLayer(graph, "; - ofsCodeC << inputName; - for(int g = 0; g < 8; g++) { - if(g < group) - ofsCodeC << ", " << inputName << "_grp" << g; - else - ofsCodeC << ", NULL"; - } - ofsCodeC << ");" << std::endl; - ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << inputName << "_grp_slice_node);" << std::endl; - ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << inputName << "_grp_slice_node));" << std::endl; - - // Concat conv output - ofsCodeC << " vx_node " << layerName << "_grp_concat_node;" << std::endl; - ofsCodeC << " " << layerName << "_grp_concat_node = " << "vxConcatLayer(graph, "; - ofsCodeC << layerName; - for(int g = 0; g < 8; g++) { - if(g < group) - ofsCodeC << ", " << layerName << "_grp" << g; - else - ofsCodeC << ", NULL"; - } - ofsCodeC << ");" << std::endl; - ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << 
layerName << "_grp_concat_node);" << std::endl; - ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName << "_grp_concat_node));" << std::endl; - } - else if(codeType == "release") { - for(int g = 0; g < group; g++) { - ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << inputName << "_grp" << g << "));" << std::endl; - ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << layerName << "_grp" << g << "));" << std::endl; - } - } - - auto&& dim = tensorMap[output + "_W"]; - if(codeType == "initialize") { - ofsCodeC << " vx_size " << layerName << "_W" << "_dims[4] = { " << dim[3] << ", " << dim[2] << ", " << dim[1]/group << ", " << dim[0]/group << " };" << std::endl; - for(int g = 0; g < group; g++) { - ofsCodeC << " vx_tensor " << layerName << "_grp" << g << "_W" << ";" << std::endl; - ofsCodeC << " " << layerName << "_grp" << g << "_W" << " = vxCreateTensor(context,4, " << layerName << "_W" << "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl; - ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << layerName << "_grp" << g << "_W" << "); " << std::endl; - ofsCodeC << " " << "ERROR_CHECK_STATUS(copyTensor(" << layerName << "_grp" << g << "_W" << ", dataFolder + \"/weights/" << layerName << "_grp" << g << ".f32\"));" << std::endl; - } - } - else if(codeType == "release") { - for(int g = 0; g < group; g++) { - ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << layerName << "_grp" << g << "_W" << "));" << std::endl; - } - } - declare_tensor_check[output + "_W"] = true; - if(bias_term) { - if(codeType == "initialize") { - ofsCodeC << " vx_size " << layerName << "_B" << "_dims[1] = { " << k/group << " };" << std::endl; - for(int g = 0; g < group; g++) { - ofsCodeC << " vx_tensor " << layerName << "_grp" << g << "_B" << ";" << std::endl; - ofsCodeC << " " << layerName << "_grp" << g << "_B" << " = vxCreateTensor(context,1, " << layerName << "_B" "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl; - 
ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << layerName << "_grp" << g << "_B" << "); " << std::endl; - ofsCodeC << " " << "ERROR_CHECK_STATUS(copyTensor(" << layerName << "_grp" << g << "_B" << ", dataFolder + \"/bias/" << layerName << "_grp" << g << ".f32\"));" << std::endl; - } - } - else if(codeType == "release") { - for(int g = 0; g < group; g++) { - ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << layerName << "_grp" << g << "_B" << "));" << std::endl; - } - } - declare_tensor_check[layerName + "_B"] = true; - } - - if(codeType == "initialize") { - ofsCodeC << " vx_nn_convolution_params_t " << layerName << "_params;" << std::endl; - ofsCodeC << " " << layerName + "_params.padding_x = " << pad_w << ";" << std::endl; - ofsCodeC << " " << layerName + "_params.padding_y = " << pad_h << ";" << std::endl; - ofsCodeC << " " << layerName + "_params.overflow_policy = " << convertPolicy << ";" << std::endl; - ofsCodeC << " " << layerName + "_params.rounding_policy = " << roundPolicy << ";" << std::endl; - ofsCodeC << " " << layerName + "_params.down_scale_size_rounding = " << "VX_NN_DS_SIZE_ROUNDING_FLOOR ;" << std::endl; - ofsCodeC << " " << layerName + "_params.dilation_x = " << dilation_w - 1 << ";" << std::endl; - ofsCodeC << " " << layerName + "_params.dilation_y = " << dilation_h - 1 << ";" << std::endl; - - for(int g = 0; g < group; g++) { - ofsCodeC << " vx_node " << layerName << "_grp" << g << "_node;" << std::endl; - ofsCodeC << " " << layerName << "_grp" << g << "_node = " << "vxConvolutionLayer(graph, "; - ofsCodeC << inputName << "_grp" << g << ", "; - ofsCodeC << layerName << "_grp" << g << "_W, "; - if(bias_term) - ofsCodeC << layerName << "_grp" << g << "_B, "; - else - ofsCodeC << "NULL, "; - ofsCodeC << "&" << layerName + "_params, " << "sizeof(" << layerName + "_params ), "; - ofsCodeC << layerName << "_grp" << g << ");" << std::endl; - ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << layerName << "_grp" << g << "_node);" << std::endl; - 
ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName << "_grp" << g << "_node));" << std::endl; - } - } - } - else { - std::string weights = layerName + "_W"; - std::string dim_weights = output + "_W"; - auto&& dim = tensorMap[dim_weights]; - if(codeType == "initialize") { - ofsCodeC << " vx_size " << weights << "_dims[4] = { " << dim[3] << ", " << dim[2] << ", " << dim[1] << ", " << dim[0] << " };" << std::endl; - ofsCodeC << " vx_tensor " << weights << ";" << std::endl; - ofsCodeC << " " << weights << " = vxCreateTensor(context,4, " << weights + "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl; - ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << weights << "); " << std::endl; - ofsCodeC << " " << "ERROR_CHECK_STATUS(copyTensor(" << weights << ", dataFolder + \"/weights/" + layerName + ".f32\"));" << std::endl; - } - else if(codeType == "release") { - ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << weights << "));" << std::endl; - } - declare_tensor_check[weights] = true; - std::string bias = "NULL"; - if(bias_term) { - bias = layerName + "_B"; - if(codeType == "initialize") { - ofsCodeC << " vx_size " << bias << "_dims[1] = { " << k << " };" << std::endl; - ofsCodeC << " vx_tensor " << bias << ";" << std::endl; - ofsCodeC << " " << bias << " = vxCreateTensor(context,1, " << bias + "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl; - ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << bias << "); " << std::endl; - ofsCodeC << " " << "ERROR_CHECK_STATUS(copyTensor(" << bias << ", dataFolder + \"/bias/" + layerName + ".f32\"));" << std::endl; - } - else if(codeType == "release") { - ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << bias << "));" << std::endl; - } - declare_tensor_check[bias] = true; - } - if(codeType == "initialize") { - ofsCodeC << " vx_nn_convolution_params_t " << layerName << "_params;" << std::endl; - ofsCodeC << " " << layerName + "_params.padding_x = " << pad_w << ";" << 
std::endl; - ofsCodeC << " " << layerName + "_params.padding_y = " << pad_h << ";" << std::endl; - ofsCodeC << " " << layerName + "_params.overflow_policy = " << convertPolicy << ";" << std::endl; - ofsCodeC << " " << layerName + "_params.rounding_policy = " << roundPolicy << ";" << std::endl; - ofsCodeC << " " << layerName + "_params.down_scale_size_rounding = " << "VX_NN_DS_SIZE_ROUNDING_FLOOR ;" << std::endl; - ofsCodeC << " " << layerName + "_params.dilation_x = " << dilation_w - 1 << ";" << std::endl; - ofsCodeC << " " << layerName + "_params.dilation_y = " << dilation_h - 1 << ";" << std::endl; - ofsCodeC << " vx_node " << layerName << "_node;" << std::endl; - ofsCodeC << " " << layerName + "_node = " << "vxConvolutionLayer(graph, " << inputName << ", " << weights << ", " << bias << ", &" << layerName + "_params, " << "sizeof(" << layerName + "_params ), " << layerName << ");" << std::endl; - ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl; - ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl; - } - } - } - else if(type == "Deconvolution") { - std::stringstream ss(params); - int k, kernel_w, kernel_h, stride_w, stride_h, pad_w, pad_h, dilation_w, dilation_h, bias_term; - ss >> k >> kernel_w >> kernel_h >> stride_w >> stride_h >> pad_w >> pad_h >> dilation_w >> dilation_h >> bias_term; - std::string weights = layerName + "_W"; - std::string dim_weights = output + "_W"; - auto&& dim = tensorMap[dim_weights]; - if(codeType == "initialize") { - ofsCodeC << " vx_size " << weights << "_dims[4] = { " << dim[3] << ", " << dim[2] << ", " << dim[1] << ", " << dim[0] << " };" << std::endl; - ofsCodeC << " vx_tensor " << weights << ";" << std::endl; - ofsCodeC << " " << weights + "= vxCreateTensor(context,4, " << weights + "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl; - ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << weights << "); " << std::endl; - ofsCodeC << " " << 
"ERROR_CHECK_STATUS(copyTensor(" << weights << ", dataFolder + \"/weights/" + layerName + ".f32\"));" << std::endl; - } - else if(codeType == "release") { - ofsCodeC << " " << "vxReleaseTensor(&" << weights << " );" << std::endl; - } - declare_tensor_check[weights] = true; - std::string bias = "NULL"; - if(bias_term) { - bias = layerName + "_B"; - if(codeType == "initialize") { - ofsCodeC << " vx_size " << bias << "_dims[1] = { " << k << " };" << std::endl; - ofsCodeC << " vx_tensor " << bias << ";" << std::endl; - ofsCodeC << " " << bias + " = vxCreateTensor(context,1, " << bias + "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl; - ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << bias << "); " << std::endl; - ofsCodeC << " " << "ERROR_CHECK_STATUS(copyTensor(" << bias << ", dataFolder + \"/bias/" + layerName + ".f32\"));" << std::endl; - } - else if(codeType == "release") { - ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << bias << "));" << std::endl; - } - declare_tensor_check[bias] = true; - } - if(codeType == "initialize") { - ofsCodeC << " vx_nn_deconvolution_params_t " << layerName << "_params;" << std::endl; - ofsCodeC << " " << layerName + "_params.padding_x = " << pad_w << ";" << std::endl; - ofsCodeC << " " << layerName + "_params.padding_y = " << pad_h << ";" << std::endl; - ofsCodeC << " " << layerName + "_params.overflow_policy = " << convertPolicy << ";" << std::endl; - ofsCodeC << " " << layerName + "_params.rounding_policy = " << roundPolicy << ";" << std::endl; - ofsCodeC << " " << layerName + "_params.a_x = " << dilation_w - 1 << ";" << std::endl; - ofsCodeC << " " << layerName + "_params.a_y = " << dilation_h - 1 << ";" << std::endl; - ofsCodeC << " vx_node " << layerName << "_node;" << std::endl; - ofsCodeC << " " << layerName + "_node = " << " vxDeconvolutionLayer(graph, " << inputName << ", " << weights << ", " << bias << ", &" << layerName + "_params, sizeof(" + layerName + "_params), " << layerName << ");" << 
std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl;
-            }
-        }
-        else if(type == "Pooling") {
-            std::stringstream ss(params);
-            int kernel_w, kernel_h, stride_w, stride_h, pad_w, pad_h, pool;
-            ss >> kernel_w >> kernel_h >> stride_w >> stride_h >> pad_w >> pad_h >> pool;
-            if((pool != 0 && pool != 1)) error("writeGDF: pooling_layer supports only MAX and AVG\n");
-            if(codeType == "initialize") {
-                ofsCodeC << " vx_enum " << layerName << "_type = " << (pool == 0 ? "VX_NN_POOLING_MAX" : "VX_NN_POOLING_AVG") << ";" << std::endl;
-                ofsCodeC << " vx_size " << layerName << "_kernel_w = " << kernel_w << ";" << std::endl;
-                ofsCodeC << " vx_size " << layerName << "_kernel_h = " << kernel_h << ";" << std::endl;
-                ofsCodeC << " vx_size " << layerName << "_pad_w = " << pad_w << ";" << std::endl;
-                ofsCodeC << " vx_size " << layerName << "_pad_h = " << pad_h << ";" << std::endl;
-                ofsCodeC << " vx_enum " << layerName << "_roundPolicy = " << roundPolicy << ";" << std::endl;
-                ofsCodeC << " vx_node " << layerName << "_node;" << std::endl;
-                ofsCodeC << " " << layerName + "_node = " << "vxPoolingLayer(graph, " << inputName << ", " << layerName + "_type" << ", " << layerName + "_kernel_w, " << layerName + "_kernel_h, "
-                         << layerName + "_pad_w, " << layerName + "_pad_h, " << layerName + "_roundPolicy, " << layerName << " );" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl;
-            }
-        }
-        else if(type == "InnerProduct") {
-            std::stringstream ss(params);
-            int k,bias_term;
-            ss >> k >> bias_term;
-            std::string weights = layerName + "_W";
-            std::string dim_weights = output + "_W";
-            auto&& dim = tensorMap[dim_weights];
-            if(codeType == "initialize") {
-                ofsCodeC << " vx_size " << weights << "_dims[4] = { " << dim[3]
-                         << ", " << dim[2] << ", " << dim[1] << ", " << dim[0] << " };" << std::endl;
-                ofsCodeC << " vx_tensor " << weights << ";" << std::endl;
-                ofsCodeC << " " << weights << "= vxCreateTensor(context,4," << weights + "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << weights << "); " << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(copyTensor(" << weights << ", dataFolder + \"/weights/" + layerName + ".f32\"));" << std::endl;
-            }
-            else if(codeType == "release") {
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << weights << "));" << std::endl;
-            }
-            declare_tensor_check[weights]= true;
-            std::string bias= "NULL";
-            if(bias_term) {
-                bias = layerName + "_B";
-                if(codeType == "initialize") {
-                    ofsCodeC << " vx_size " << bias << "_dims[1] = { " << k << " };" << std::endl;
-                    ofsCodeC << " vx_tensor " << bias << ";" << std::endl;
-                    ofsCodeC << " " << bias << "= vxCreateTensor(context,1," << bias + "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl;
-                    ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << bias << "); " << std::endl;
-                    ofsCodeC << " " << "ERROR_CHECK_STATUS(copyTensor(" << bias << ", dataFolder + \"/bias/" + layerName + ".f32\"));" << std::endl;
-                }
-                else if(codeType == "release") {
-                    ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << bias << "));" << std::endl;
-                }
-                declare_tensor_check[bias]= true;
-            }
-            if(codeType == "initialize") {
-                ofsCodeC << " vx_enum " << layerName << "_convertPolicy = " << convertPolicy << ";" << std::endl;
-                ofsCodeC << " vx_enum " << layerName << "_roundPolicy = " << roundPolicy << ";" << std::endl;
-                ofsCodeC << " vx_node " << layerName << "_node;" << std::endl;
-                ofsCodeC << " " << layerName + "_node = " << "vxFullyConnectedLayer( graph, " << inputName << ", " << weights << ", " << bias << ", " << layerName + "_convertPolicy, " << layerName + "_roundPolicy, " << layerName + ");" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl;
-            }
-        }
-        else if(type == "ReLU") {
-            std::stringstream ss(params);
-            float neg_slope;
-            ss >> neg_slope;
-            if(codeType == "initialize") {
-                if (!neg_slope) {
-                    ofsCodeC << " vx_enum " << layerName << "_mode = " << "VX_NN_ACTIVATION_RELU ; " << std::endl;
-                    ofsCodeC << " vx_float32 " << layerName << "_param_a = 0;" << std::endl;
-                } else
-                {
-                    ofsCodeC << " vx_enum " << layerName << "_mode = " << "VX_NN_ACTIVATION_LEAKY_RELU ; " << std::endl;
-                    ofsCodeC << " vx_float32 " << layerName << "_param_a = " << neg_slope << ";" << std::endl;
-                }
-                ofsCodeC << " vx_float32 " << layerName << "_param_b = 0;" << std::endl;
-                ofsCodeC << " vx_node " << layerName << "_node;" << std::endl;
-                ofsCodeC << " " << layerName + "_node = " << "vxActivationLayer(graph, " << inputName << ", " << layerName + "_mode, " << layerName + "_param_a, " << layerName + "_param_b, " << layerName << ");" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl;
-            }
-        }
-        else if(type == "LRN") {
-            int normalization_size; float alpha,beta,k;
-            std::string norm_region;
-            std::stringstream ss(params);
-            ss >> normalization_size >> alpha >> beta >> norm_region >> k;
-            std::string lrnType;
-            lrnType = (norm_region == "1") ? "VX_NN_NORMALIZATION_SAME_MAP" : "VX_NN_NORMALIZATION_ACROSS_MAPS";
-            if(codeType == "initialize") {
-                ofsCodeC << " vx_enum " << layerName << "_mode = " << lrnType << ";" << std::endl;
-                ofsCodeC << " vx_size " << layerName << "_size = " << normalization_size << ";" << std::endl;
-                ofsCodeC << " vx_float32 " << layerName << "_alpha = " << alpha << ";" << std::endl;
-                ofsCodeC << " vx_float32 " << layerName << "_beta = " << beta << ";" << std::endl;
-                ofsCodeC << " vx_float32 " << layerName << "_bias = " << k << ";" << std::endl;
-                ofsCodeC << " vx_node " << layerName << "_node;" << std::endl;
-                ofsCodeC << " " << layerName + "_node = " << "vxNormalizationLayer( graph, " << inputName << ", " << layerName + "_mode, " << layerName + "_size, " << layerName + "_alpha, " << layerName + "_beta, "
-                         << layerName << " );" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl;
-                ofsCodeC << " if(" << layerName << "_bias != 1) {" << std::endl;
-                ofsCodeC << " vx_scalar s_bias = vxCreateScalarWithSize(context, VX_TYPE_FLOAT32, &" << layerName << "_bias, sizeof(" << layerName << "_bias));" << std::endl;
-                ofsCodeC << " ERROR_CHECK_OBJECT(s_bias);" << std::endl;
-                ofsCodeC << " ERROR_CHECK_STATUS(vxSetParameterByIndex(" << layerName << "_node, 6, (vx_reference) s_bias));" << std::endl;
-                ofsCodeC << " ERROR_CHECK_STATUS(vxReleaseScalar(&s_bias));" << std::endl;
-                ofsCodeC << " }" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl;
-            }
-        }
-        else if(type == "BatchNorm") {
-            int use_global_stats;
-            std::stringstream ss(params);
-            float eps;
-            ss >> eps >> use_global_stats;
-            std::string weights = layerName + "_W";
-            std::string dim_weights = output + "_W";
-            auto&& dim = tensorMap[dim_weights];
-            if(codeType == "initialize") {
-                ofsCodeC << " vx_size " << weights << "_dims[1] = { " << dim[0] << " };" << std::endl;
-                ofsCodeC << " vx_float32 " << layerName << "_eps = " << eps << ";" << std::endl;
-                ofsCodeC << " vx_tensor " << weights << ";" << std::endl;
-                ofsCodeC << " " << weights << " = vxCreateTensor(context,1, " << weights + "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << weights << "); " << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(copyTensor(" << weights << ", dataFolder + \"/weights/" + layerName + ".f32\"));" << std::endl;
-            }
-            else if(codeType == "release") {
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << weights << "));" << std::endl;
-            }
-            declare_tensor_check[weights] = true;
-            std::string bias = layerName + "_B";
-            std::string dim_bias = output + "_B";
-            dim = tensorMap[dim_bias];
-            if(codeType == "initialize") {
-                ofsCodeC << " vx_size " << bias << "_dims[1] = { " << dim[0] << " };" << std::endl;
-                ofsCodeC << " vx_tensor " << bias << ";" << std::endl;
-                ofsCodeC << " " << bias << " = vxCreateTensor(context,1, " << bias + "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << bias << "); " << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(copyTensor(" << bias << ", dataFolder + \"/bias/" + layerName + ".f32\"));" << std::endl;
-            }
-            else if(codeType == "release") {
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << bias << "));" << std::endl;
-            }
-            declare_tensor_check[bias] = true;
-            bias = "NULL";
-
-            if (bfuse_scale_layer) {
-                // check next node. If scale extract weight and bias paramters for scale layer.
-                int bias_term;
-                auto& next_node = *std::next(&node);
-                auto&& next_output = next_node[3];
-                auto&& nn_params = next_node[1];
-                std::string nn_layer_name = getIdentifierName(next_node[3]);
-                weights = nn_layer_name + "_W";
-                std::string dim_weights = next_output + "_W";
-                dim = tensorMap[dim_weights];
-                if(codeType == "initialize") {
-                    ofsCodeC << " vx_size " << weights << "_dims[1] = { " << dim[0] << " };" << std::endl;
-                    ofsCodeC << " vx_tensor " << weights << ";" << std::endl;
-                    ofsCodeC << " " << weights << " = vxCreateTensor(context,1, " << weights + "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl;
-                    ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << weights << "); " << std::endl;
-                    ofsCodeC << " " << "ERROR_CHECK_STATUS(copyTensor(" << weights << ", dataFolder + \"/weights/" + nn_layer_name + ".f32\"));" << std::endl;
-                }
-                else if(codeType == "release") {
-                    ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << weights << "));" << std::endl;
-                }
-                declare_tensor_check[weights] = true;
-
-                std::stringstream ss(nn_params);
-                ss >> bias_term;
-                if(bias_term) {
-                    bias = nn_layer_name + "_B";
-                    std::string dim_bias = next_output + "_B";
-                    dim = tensorMap[dim_bias];
-                    if(codeType == "initialize") {
-                        ofsCodeC << " vx_size " << bias << "_dims[1] = { " << dim[0] << " };" << std::endl;
-                        ofsCodeC << " vx_tensor " << bias << ";" << std::endl;
-                        ofsCodeC << " " << bias << " = vxCreateTensor(context,1, " << bias + "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl;
-                        ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << bias << "); " << std::endl;
-                        ofsCodeC << " " << "ERROR_CHECK_STATUS(copyTensor(" << bias << ", dataFolder + \"/bias/" + nn_layer_name + ".f32\"));" << std::endl;
-                    }
-                    else if(codeType == "release") {
-                        ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << bias << "));" << std::endl;
-                    }
-                    declare_tensor_check[bias] = true;
-                }
-                if(codeType == "initialize") {
-                    ofsCodeC << " vx_node " << layerName << "_node;" << std::endl;
-                    ofsCodeC << " " << layerName + "_node = " << "vxBatchNormalizationLayer(graph, "
-                             << inputName +", "
-                             << layerName + "_W, "
-                             << layerName + "_B, "
-                             << weights+", "
-                             << bias+", "
-                             << layerName + "_eps, "
-                             << nn_layer_name << ");" << std::endl;
-                    ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl;
-                    ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl;
-                }
-                else if(codeType == "release") {
-                }
-            }
-            else{
-                // put default scale and bias term
-                std::vector scale_arr(dim[0]);
-                std::fill(scale_arr.begin(), scale_arr.end(), 1.0);
-                std::string fileName_weights = outputFolder + "/weights/scale_init.f32";
-                FILE *fp = fopen(fileName_weights.c_str(), "wb");
-                if (fp) {
-                    fwrite(scale_arr.data(), sizeof(float), dim[0], fp);
-                    fclose(fp);
-                }
-                weights = layerName +"_W1";
-                if(codeType == "initialize") {
-                    ofsCodeC << " vx_size " << weights << "_dims[1] = { " << dim[0] << " };" << std::endl;
-                    ofsCodeC << " vx_tensor " << weights << ";" << std::endl;
-                    ofsCodeC << " " << weights << " = vxCreateTensor(context,1, " << weights + "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl;
-                    ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << weights << "); " << std::endl;
-                    ofsCodeC << " " << "ERROR_CHECK_STATUS(copyTensor(" << weights << ", dataFolder + \"/weights/scale_init.f32\"));" << std::endl;
-                }
-                else if(codeType == "release") {
-                    ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << weights << "));" << std::endl;
-                }
-                declare_tensor_check[weights] = true;
-
-                if(codeType == "initialize") {
-                    ofsCodeC << " vx_node " << layerName << "_node;" << std::endl;
-                    ofsCodeC << " " << layerName + "_node = " << "vxBatchNormalizationLayer(graph, "
-                             << inputName +", "
-                             << layerName + "_W, "
-                             << layerName + "_B, "
-                             << weights+", "
-                             << bias+", "
-                             << layerName + "_eps, "
-                             << layerName << ");" << std::endl;
-                    ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl;
-                    ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl;
-                }
-                else if(codeType == "release") {
-                }
-            }
-        }
-        else if(type == "Eltwise") {
-            int op;
-            std::stringstream ss(params);
-            ss >> op;
-            auto&& dim = tensorMap[output];
-            for(int i=4; i < node.size(); i++) {
-                auto&& idim= tensorMap[node[i]];
-                if(dim[0]!= idim[0] || dim[1] != idim[1] || dim[2] != idim[2] || dim[3] != idim[3])
-                    error("generateCode : Eltwise op=%d requires same dimension inputs : %s[%dx%dx%dx%d] != %s[%dx%dx%dx%d]\n", op, node[i].c_str(),idim[0], idim[1], idim[2], idim[3], node[i-1].c_str(), dim[0],dim[1],dim[2],dim[3]);
-                dim = idim;
-            }
-            std::string tmp = inputName;
-            for(int i=5; i < node.size() ; i++) {
-                std::string out = layerName;
-                if(i < node.size() - 1) {
-                    out += "tmp_"+ std::to_string(i-4);
-                    if(codeType == "initialize") {
-                        ofsCodeC << " vx_size " << out << "_dim[4] = { " << dim[3] << ", " << dim[2] << ", " << dim[1] << ", " << dim[0] << " };" << std::endl;
-                        ofsCodeC << " vx_tensor " << out << "; " << std::endl;
-                        ofsCodeC << " " << out << "= vxCreateTensor(context,4, " << out + "_dim, " << tensorType << ", " << fixedPosition << ");" << std::endl;
-                    }
-                    declare_tensor_check[out]= true;
-                }
-                if(op == 1) {
-                    if(codeType == "initialize") {
-                        ofsCodeC << " vx_enum " << layerName << "_convertPolicy = " << convertPolicy << ";" << std::endl;
-                        ofsCodeC << " vx_node " << layerName <<"_node;" << std::endl;
-                        ofsCodeC << " " << layerName + "_node = " << "vxTensorAddNode(graph, " << tmp << ", " << getIdentifierName(node[i]) << ", " << layerName + "_convertPolicy, " << out << ");" << std::endl;
-                        ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl;
-                        ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl;
-                    }
-                    tmp = out;
-                }
-                else error("generateCode : Eltwise op=%d not supported\n", op);
-            }
-        }
-        else if(type == "Scale") {
-            int bias_term;
-            std::stringstream ss(params); ss >> bias_term;
-
-            std::string weights = layerName + "_W";
-            std::string dim_weights = output + "_W";
-            auto&& dim = tensorMap[dim_weights];
-            if(codeType == "initialize") {
-                ofsCodeC << " vx_size " << weights << "_dims[1] = { " << dim[0] << " };" << std::endl;
-                ofsCodeC << " vx_tensor " << weights << ";" << std::endl;
-                ofsCodeC << " " << weights << " = vxCreateTensor(context,1, " << weights + "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << weights << "); " << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(copyTensor(" << weights << ", dataFolder + \"/weights/" + layerName + ".f32\"));" << std::endl;
-            }
-            else if(codeType == "release") {
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << weights << "));" << std::endl;
-            }
-            declare_tensor_check[weights] = true;
-            std::string bias = "NULL";
-            if(bias_term) {
-                bias = layerName + "_B";
-                std::string dim_bias = output + "_B";
-                dim = tensorMap[dim_bias];
-                if(codeType == "initialize") {
-                    ofsCodeC << " vx_size " << bias << "_dims[1] = { " << dim[0] << " };" << std::endl;
-                    ofsCodeC << " vx_tensor " << bias << ";" << std::endl;
-                    ofsCodeC << " " << bias << " = vxCreateTensor(context,1, " << bias + "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl;
-                    ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << bias << "); " << std::endl;
-                    ofsCodeC << " " << "ERROR_CHECK_STATUS(copyTensor(" << bias << ", dataFolder + \"/bias/" + layerName + ".f32\"));" << std::endl;
-                }
-                else if(codeType == "release") {
-                    ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << bias << "));" << std::endl;
-                }
-                declare_tensor_check[bias] = true;
-            }
-            if(codeType == "initialize") {
-                ofsCodeC << " vx_node " << layerName << "_node;" << std::endl;
-                ofsCodeC << " " << layerName + "_node = " << "vxScaleLayer(graph, "
-                         << inputName +", "
-                         << layerName + "_W, "
-                         << bias + ", "
-                         << layerName << ");" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl;
-            }
-            else if(codeType == "release") {
-            }
-        }
-        else if(type == "Concat") {
-            if(codeType == "initialize") {
-                ofsCodeC << " vx_node " << layerName << "_node;" << std::endl;
-                ofsCodeC << " " << layerName + "_node = " << "vxConcatLayer(graph, ";
-                ofsCodeC << layerName;
-                int param_count = 0;
-                for(int i = 4; i < node.size(); i++) {
-                    std::string layerInputs = getIdentifierName(node[i]);
-                    ofsCodeC << ", " << layerInputs;
-                    param_count++;
-                }
-                while(param_count < 8) {
-                    ofsCodeC << ", NULL";
-                    param_count++;
-                }
-                ofsCodeC << " );" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl;
-            }
-        }
-        else if(type == "Dropout") {
-            //during inference dropout layer propogates input to output .
-            if(codeType == "initialize") {
-                ofsCodeC << " vx_node " << layerName << "_node;" << std::endl;
-                ofsCodeC << " " << layerName + "_node = " << "vxCopyNode( graph, (vx_reference)" << inputName << ", (vx_reference)" << layerName << ");" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl;
-            }
-        }
-        else if(type == "Softmax") {
-            if(codeType == "initialize") {
-                ofsCodeC << " vx_node " << layerName << "_node;" << std::endl;
-                ofsCodeC << " " << layerName + "_node = " << "vxSoftmaxLayer(graph, " << inputName << ", " << layerName << ");" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl;
-            }
-        }
-        else if(type == "Split") {
-            if(codeType == "initialize") {
-                ofsCodeC << " vx_node " << layerName << "_node;" << std::endl;
-                ofsCodeC << " " << layerName + "_node = " << "vxCopyNode( graph, (vx_reference)"<< inputName << ", (vx_reference)" << layerName << ");" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl;
-            }
-        }
-        else if(type == "SoftmaxWithLoss") {
-            if(codeType == "initialize") {
-                ofsCodeC << " vx_node " << layerName << "_node;" << std::endl;
-                ofsCodeC << " " << layerName + "_node = " << "vxSoftmaxLayer(graph, " << inputName << ", " << layerName << ");" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl;
-            }
-        }
-        if(codeType== "initialize")
-            ofsCodeC << std::endl;
-    }
-}
-
-void generateCopyImageCode(std::ostream& ofsCodeC)
-{
-    ofsCodeC << "static vx_status copyImage(vx_image image, std::string fileName, vx_enum usage = VX_WRITE_ONLY)" << std::endl
-             << "{" << std::endl
-             << " vx_uint32 width = 0, height = 0;" << std::endl
-             << " vxQueryImage(image, VX_IMAGE_WIDTH, &width, sizeof(vx_uint32));" << std::endl
-             << " vxQueryImage(image, VX_IMAGE_HEIGHT, &height, sizeof(vx_uint32));" << std::endl
-             << " vx_rectangle_t rect = { 0, 0, width, height };" << std::endl
-             << " vx_imagepatch_addressing_t addr;" << std::endl
-             << " vx_uint8 * ptr = NULL;" << std::endl
-             << " vx_map_id map_id;" << std::endl
-             << " vx_status status = vxMapImagePatch(image, &rect, 0, &map_id, &addr, (void **)&ptr, usage, VX_MEMORY_TYPE_HOST, VX_NOGAP_X);" << std::endl
-             << " if(status) {" << std::endl
-             << " std::cerr << \"ERROR: vxMapImagePatch() failed for \" << fileName << std::endl;" << std::endl
-             << " return -1;" << std::endl
-             << " }" << std::endl
-             << " vx_uint32 width_in_bytes = (width * addr.stride_x);" << std::endl
-             << " FILE * fp = fopen(fileName.c_str(), usage == VX_WRITE_ONLY ? \"rb\" : \"wb\");" << std::endl
-             << " if(!fp) {" << std::endl
-             << " std::cerr << \"ERROR: unable to open: \" << fileName << std::endl;" << std::endl
-             << " return -1;" << std::endl
-             << " }" << std::endl
-             << " for (vx_uint32 y = 0; y < height; y += addr.step_y) {" << std::endl
-             << " vx_uint8 * line = (vx_uint8 *)vxFormatImagePatchAddress2d(ptr, 0, y, &addr);" << std::endl
-             << " if(usage == VX_WRITE_ONLY) {" << std::endl
-             << " vx_size n = fread(line, sizeof(vx_uint8), width_in_bytes, fp);" << std::endl
-             << " if(n != width_in_bytes) {" << std::endl
-             << " std::cerr << \"ERROR: expected char[\" << height*width_in_bytes << \"], but got char[\" << y*width_in_bytes+n << \"] in \" << fileName << std::endl;" << std::endl
-             << " return -1;" << std::endl
-             << " }" << std::endl
-             << " }" << std::endl
-             << " else {" << std::endl
-             << " fwrite(line, sizeof(vx_uint8), width_in_bytes, fp);" << std::endl
-             << " }" << std::endl
-             << " }" << std::endl
-             << " fclose(fp);" << std::endl
-             << " status = vxUnmapImagePatch(image, map_id);" << std::endl
-             << " if(status) {" << std::endl
-             << " std::cerr << \"ERROR: vxUnmapImagePatch() failed for \" << fileName << std::endl;" << std::endl
-             << " return -1;" << std::endl
-             << " }" << std::endl
-             << " return 0;" << std::endl
-             << "}" << std::endl << std::endl;
-}
-
-void generateCopyTensorCode(std::ostream& ofsCodeC)
-{
-    ofsCodeC << "static vx_status copyTensor(vx_tensor tensor, std::string fileName, vx_enum usage = VX_WRITE_ONLY)" << std::endl
-             << "{" << std::endl
-             << " vx_enum data_type = VX_TYPE_FLOAT32;" << std::endl
-             << " vx_size num_of_dims = 4, dims[4] = { 1, 1, 1, 1 }, stride[4];" << std::endl
-             << " vxQueryTensor(tensor, VX_TENSOR_DATA_TYPE, &data_type, sizeof(data_type));" << std::endl
-             << " vxQueryTensor(tensor, VX_TENSOR_NUMBER_OF_DIMS, &num_of_dims, sizeof(num_of_dims));" << std::endl
-             << " vxQueryTensor(tensor, VX_TENSOR_DIMS, &dims, sizeof(dims[0])*num_of_dims);" << std::endl
-             << " vx_size itemsize = sizeof(float);" << std::endl
-             << " if(data_type == VX_TYPE_UINT8 || data_type == VX_TYPE_INT8) {" << std::endl
-             << " itemsize = sizeof(vx_uint8);" << std::endl
-             << " }" << std::endl
-             << " else if(data_type == VX_TYPE_UINT16 || data_type == VX_TYPE_INT16 || data_type == VX_TYPE_FLOAT16) {" << std::endl
-             << " itemsize = sizeof(vx_uint16);" << std::endl
-             << " }" << std::endl
-             << " vx_size count = dims[0] * dims[1] * dims[2] * dims[3];" << std::endl
-             << " vx_map_id map_id;" << std::endl
-             << " float * ptr;" << std::endl
-             << " vx_status status = vxMapTensorPatch(tensor, num_of_dims, nullptr, nullptr, &map_id, stride, (void **)&ptr, usage, VX_MEMORY_TYPE_HOST);" << std::endl
-             << " if(status) {" << std::endl
-             << " std::cerr << \"ERROR: vxMapTensorPatch() failed for \" << fileName << std::endl;" << std::endl
-             << " return -1;" << std::endl
-             << " }" << std::endl
-             << " FILE * fp = fopen(fileName.c_str(), usage == VX_WRITE_ONLY ? \"rb\" : \"wb\");" << std::endl
-             << " if(!fp) {" << std::endl
-             << " std::cerr << \"ERROR: unable to open: \" << fileName << std::endl;" << std::endl
-             << " return -1;" << std::endl
-             << " }" << std::endl
-             << " if(usage == VX_WRITE_ONLY) {" << std::endl
-             << " vx_size n = fread(ptr, itemsize, count, fp);" << std::endl
-             << " if(n != count) {" << std::endl
-             << " std::cerr << \"ERROR: expected char[\" << count*itemsize << \"], but got char[\" << n*itemsize << \"] in \" << fileName << std::endl;" << std::endl
-             << " return -1;" << std::endl
-             << " }" << std::endl
-             << " }" << std::endl
-             << " else {" << std::endl
-             << " fwrite(ptr, itemsize, count, fp);" << std::endl
-             << " }" << std::endl
-             << " fclose(fp);" << std::endl
-             << " status = vxUnmapTensorPatch(tensor, map_id);" << std::endl
-             << " if(status) {" << std::endl
-             << " std::cerr << \"ERROR: vxUnmapTensorPatch() failed for \" << fileName << std::endl;" << std::endl
-             << " return -1;" << std::endl
-             << " }" << std::endl
-             << " return 0;" << std::endl
-             << "}" << std::endl << std::endl;
-}
-
-void generateCode(
-    std::ostream& ofsCodeH,
-    std::ostream& ofsCodeC,
-    std::ofstream& ofsCodeM,
-    std::ofstream& ofsCodeA,
-    std::ofstream& ofsCodeD,
-    std::vector>& net,
-    std::map>& tensorMap,
-    std::string tensorType,
-    int fixedPointPosition,
-    std::string convertPolicy,
-    std::string roundPolicy,
-    bool isVirtualEnabled,
-    std::string outputFolder,
-    bool bInputIsImage,
-    std::string inputImageType,
-    bool bInputChannelReverse,
-    double fInputConversionA,
-    double fInputConversionB,
-    bool bOutputArgmax,
-    bool bOutputIsImage,
-    std::string argmaxOutputDataType,
-    int argmaxTopK,
-    std::vector& argmaxLut,
-    bool bEnableErrorMessages,
-    bool bFuseScaleLayer)
-{
-    std::string annApiName = "annCreateGraph";
-    if(bInputIsImage) annApiName += "WithInputImage";
-    if(bOutputArgmax) annApiName += (bOutputIsImage ? "WithArgmaxImage" : "WithArgmaxTensor");
-    if(argmaxLut.size() > 0) annApiName += "WithLut";
-
-    ////
-    // generate .h file
-    //
-    ofsCodeH << "#ifndef annmodule_h" << std::endl
-             << "#define annmodule_h" << std::endl
-             << std::endl
-             << "#include " << std::endl
-             << std::endl
-             << "extern \"C\" {" << std::endl
-             << " VX_API_ENTRY void VX_API_CALL annGetTensorDimensions(vx_size dimInput[4], vx_size dimOutput[4]);" << std::endl;
-    ofsCodeH << " VX_API_ENTRY vx_graph VX_API_CALL " << annApiName << "(vx_context context, "
-             << (bInputIsImage ? "vx_image" : "vx_tensor") << " input, "
-             << (bOutputIsImage ? "vx_image" : " vx_tensor") << " output, const char * options);" << std::endl;
-    ofsCodeH << "};" << std::endl
-             << std::endl
-             << "#endif" << std::endl;
-
-    ////
-    // generate .cpp file
-    //
-    ofsCodeC << "#include \"annmodule.h\"" << std::endl << std::endl;
-    ofsCodeC << "#include " << std::endl;
-    ofsCodeC << "#include " << std::endl;
-    ofsCodeC << "#include " << std::endl<< std::endl;
-    ofsCodeC << "#include " << std::endl;
-    ofsCodeC << "#include " << std::endl;
-    ofsCodeC << "#include " << std::endl << std::endl;
-
-    ofsCodeC << "#define ERROR_CHECK_STATUS(call) { vx_status status = (call); if(status != VX_SUCCESS) { vxAddLogEntry((vx_reference)context, status, \"ERROR: failed with status = (%d) at \" __FILE__ \"#%d\\n\", status, __LINE__); return nullptr; } }" << std::endl;
-    ofsCodeC << "#define ERROR_CHECK_OBJECT(obj) { vx_status status = vxGetStatus((vx_reference)(obj)); if(status != VX_SUCCESS) { vxAddLogEntry((vx_reference)context, status, \"ERROR: failed with status = (%d) at \" __FILE__ \"#%d\\n\", status, __LINE__); return nullptr; } }" << std::endl << std::endl;
-
-    generateCopyTensorCode(ofsCodeC);
-
-    auto&& input = net[0][4];
-    auto&& output = net[net.size()-1][3];
-    auto&& idim = tensorMap[input];
-    auto&& odim = tensorMap[output];
-    ofsCodeC << "VX_API_ENTRY void VX_API_CALL annGetTensorDimensions(vx_size dimInput[4], vx_size dimOutput[4])" << std::endl
-             << "{" << std::endl
-             << " dimInput[0] = " << idim[3] << ";" << std::endl
-             << " dimInput[1] = " << idim[2] << ";" << std::endl
-             << " dimInput[2] = " << idim[1] << ";" << std::endl
-             << " dimInput[3] = " << idim[0] << ";" << std::endl
-             << " dimOutput[0] = " << odim[3] << ";" << std::endl
-             << " dimOutput[1] = " << odim[2] << ";" << std::endl
-             << " dimOutput[2] = " << odim[1] << ";" << std::endl
-             << " dimOutput[3] = " << odim[0] << ";" << std::endl
-             << "}" << std::endl << std::endl;
-    if(bOutputArgmax) {
-        if(argmaxOutputDataType == "VX_TYPE_UINT8" && odim[1] >= 256) {
-            printf("ERROR: output argmax tensor type VX_TYPE_UINT8 can't hold channel numbers upto %d\n", odim[1]);
-            exit(1);
-        }
-        if(argmaxLut.size() > 0 && argmaxLut.size() < odim[1]) {
-            printf("ERROR: argmax LUT requires at least %d entries: got %ld entries\n", odim[1], argmaxLut.size());
-            exit(1);
-        }
-    }
-
-    ofsCodeC << "VX_API_ENTRY vx_graph VX_API_CALL " << annApiName << "(vx_context context, "
-             << (bInputIsImage ? "vx_image" : "vx_tensor") << " " << input << (bInputIsImage ? "__image" : "") << ", "
-             << (bOutputIsImage ? "vx_image" : "vx_tensor") << " " << output << (bOutputArgmax ? "__argmax" : "") << ", const char * dataFolder_)" << std::endl;
-    ofsCodeC << "{" << std::endl;
-    ofsCodeC << " // load neural network extension kernels" << std::endl;
-    ofsCodeC << " ERROR_CHECK_STATUS(vxLoadKernels(context,\"vx_nn\"));" << std::endl;
-    ofsCodeC << std::endl;
-    ofsCodeC << " // create graph" << std::endl;
-    ofsCodeC << " vx_graph graph = vxCreateGraph(context); " << std::endl;
-    ofsCodeC << " ERROR_CHECK_OBJECT(graph);" << std::endl;
-    ofsCodeC << std::endl;
-    ofsCodeC << " // get dataFolder option" << std::endl;
-    ofsCodeC << " std::string dataFolder = dataFolder_ ? dataFolder_ : \".\", fileName;" << std::endl;
-    ofsCodeC << std::endl;
-    ofsCodeC << " ////" << std::endl;
-    ofsCodeC << " // initialize the graph" << std::endl;
-    if(bInputIsImage) {
-        if(inputImageType == "VX_DF_IMAGE_RGB" && idim[1] != 3) {
-            printf("ERROR: need input channels to be 3 to use input as an RGB/BGR images: got input C = %d\n", idim[1]);
-            exit(1);
-        }
-        else if(inputImageType == "VX_DF_IMAGE_U8" && idim[1] != 1) {
-            printf("ERROR: need input channels to be 1 to use input as an U8 images: got input C = %d\n", idim[1]);
-            exit(1);
-        }
-        ofsCodeC << " vx_size " << input << "_dims[4] = { " << idim[3] << ", " << idim[2] << ", " << idim[1] << ", " << idim[0] << " };" << std::endl;
-        ofsCodeC << " vx_tensor " << input << ";" << std::endl;
-        if(isVirtualEnabled) {
-            ofsCodeC << " " << input << " = vxCreateVirtualTensor(graph, 4, " << input + "_dims,"<< tensorType <<", " << fixedPointPosition << ");" << std::endl;
-        }
-        else {
-            ofsCodeC << " " << input << " = vxCreateTensor(context, 4, " << input + "_dims,"<< tensorType <<", " << fixedPointPosition << ");" << std::endl;
-        }
-        ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << input << ");" << std::endl;
-        ofsCodeC << " vx_node " << input << "_image_conversion_node;" << std::endl;
-        ofsCodeC << " " << input + "_image_conversion_node = " << "vxConvertImageToTensorNode(graph, " << input << "__image, " << input << ", " << fInputConversionA << ", " << fInputConversionB << ", " << (bInputChannelReverse ? "vx_true_e" : "vx_false_e") << ");" << std::endl;
-        ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + input + "_image_conversion_node);" << std::endl;
-        ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << input + "_image_conversion_node));" << std::endl;
-    }
-    if(bOutputArgmax) {
-        ofsCodeC << " vx_size " << output << "_dims[4] = { " << odim[3] << ", " << odim[2] << ", 1, " << odim[0] << " };" << std::endl;
-        ofsCodeC << " vx_tensor " << output << ";" << std::endl;
-        if(isVirtualEnabled) {
-            ofsCodeC << " " << output << " = vxCreateVirtualTensor(graph, 4, " << output + "_dims,"<< tensorType <<", " << fixedPointPosition << ");" << std::endl;
-        }
-        else {
-            ofsCodeC << " " << output << " = vxCreateTensor(context, 4, " << output + "_dims,"<< tensorType <<", " << fixedPointPosition << ");" << std::endl;
-        }
-        ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << output << ");" << std::endl;
-    }
-    writeVXCode(ofsCodeC, net, tensorMap, tensorType, fixedPointPosition, convertPolicy, roundPolicy, isVirtualEnabled, bFuseScaleLayer, outputFolder, "initialize");
-    if(bOutputArgmax) {
-        std::string argmaxOutputName = output + "__argmax";
-        if(bOutputIsImage && argmaxOutputDataType == "VX_DF_IMAGE_U8" && argmaxLut.size() >= odim[1]) {
-            ofsCodeC << " vx_image " << argmaxOutputName << "_labels;" << std::endl;
-            if(isVirtualEnabled) {
-                ofsCodeC << " " << argmaxOutputName << "_labels = vxCreateVirtualImage(graph, " << odim[3] << ", " << (odim[2]*odim[0]) << ", VX_DF_IMAGE_U8);" << std::endl;
-            }
-            else {
-                ofsCodeC << " " << argmaxOutputName << "_labels = vxCreateImage(context, " << odim[3] << ", " << (odim[2]*odim[0]) << ", VX_DF_IMAGE_U8);" << std::endl;
-            }
-            ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << argmaxOutputName << "_labels);" << std::endl;
-            ofsCodeC << " vx_node " << output << "_argmax_node;" << std::endl;
-            ofsCodeC << " " << output + "_argmax_node = " << "vxArgmaxLayer(graph, " << output << ", (vx_reference)" << argmaxOutputName << "_labels);" << std::endl;
-            ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + output + "_argmax_node);" << std::endl;
-            ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << output << "_argmax_node));" << std::endl;
-            for(int i = 0; i < 3; i++) {
-                std::string lutName = output + "__lut" + i["RGB"];
-                std::string chanName = output + "__channel" + i["RGB"];
-                ofsCodeC << " vx_lut " << lutName << " = vxCreateLUT(context, VX_TYPE_UINT8, 256);" << std::endl;
-                ofsCodeC << " vx_uint8 " << lutName << "_tbl[256] = {";
-                for(int j = 0; j < odim[1]; j++) {
-                    if((j & 15) == 0) {
-                        ofsCodeC << std::endl << " ";
-                    }
-                    ofsCodeC << ((argmaxLut[j] >> (i * 8)) & 255) << ", ";
-                }
-                ofsCodeC << std::endl;
-                ofsCodeC << " };" << std::endl;
-                ofsCodeC << " ERROR_CHECK_STATUS(vxCopyLUT(" << lutName << ", " << lutName << "_tbl, VX_WRITE_ONLY, VX_MEMORY_TYPE_HOST));" << std::endl;
-                ofsCodeC << " vx_image " << chanName << ";" << std::endl;
-                if(isVirtualEnabled) {
-                    ofsCodeC << " " << chanName << " = vxCreateVirtualImage(graph, " << odim[3] << ", " << (odim[2]*odim[0]) << ", VX_DF_IMAGE_U8);" << std::endl;
-                }
-                else {
-                    ofsCodeC << " " << chanName << " = vxCreateImage(context, " << odim[3] << ", " << (odim[2]*odim[0]) << ", VX_DF_IMAGE_U8);" << std::endl;
-                }
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << chanName << ");" << std::endl;
-                ofsCodeC << " vx_node " << chanName << "_node;" << std::endl;
-                ofsCodeC << " " << chanName + "_node = " << "vxTableLookupNode(graph, " << argmaxOutputName << ", " << lutName << ", " << chanName << ");" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + chanName + "_node);" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << chanName << "_node));" << std::endl;
-            }
-            ofsCodeC << " vx_node " << output << "_combine_node;" << std::endl;
-            ofsCodeC << " " << output + "_combine_node = " << "vxChannelCombineNode(graph, " << output << "__channelR, " << output << "__channelG, " << output << "__channelB, NULL, " << argmaxOutputName << ");" << std::endl;
-            ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << output << "_combine_node);" << std::endl;
-            ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << output << "_combine_node));" << std::endl;
-        }
-        else {
-            ofsCodeC << " vx_node " << output << "_argmax_node;" << std::endl;
-            ofsCodeC << " " << output + "_argmax_node = " << "vxArgmaxLayer(graph, " << output << ", (vx_reference)" << argmaxOutputName << ");" << std::endl;
-            ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << output << "_argmax_node);" << std::endl;
-            ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << output << "_argmax_node));" << std::endl;
-        }
-    }
-    ofsCodeC << " ////" << std::endl;
-    ofsCodeC << " // release intermediate objects" << std::endl;
-    if(bInputIsImage) {
-        ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << input << "));" << std::endl;
-    }
-    writeVXCode(ofsCodeC, net, tensorMap, tensorType, fixedPointPosition, convertPolicy, roundPolicy, isVirtualEnabled, bFuseScaleLayer, outputFolder, "release");
-    if(bOutputArgmax) {
-        ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << output << "));" << std::endl;
-    }
-    ofsCodeC << std::endl;
-    ofsCodeC << " ////" << std::endl;
-    ofsCodeC << " // verify the built graph" << std::endl;
-    ofsCodeC << " ERROR_CHECK_STATUS(vxVerifyGraph(graph));" << std::endl;
-    ofsCodeC << std::endl;
-    ofsCodeC << " return graph;" << std::endl;
-    ofsCodeC << "}" << std::endl;
-
-    /////
-    // generate CMakeLists.txt
-    //
-    ofsCodeM << "cmake_minimum_required(VERSION 3.5)" << std::endl;
-    ofsCodeM << "project (annmodule)" << std::endl;
-    ofsCodeM << "set (CMAKE_CXX_STANDARD 14)" << std::endl;
-    ofsCodeM << "list(APPEND CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/cmake)" << std::endl;
-    ofsCodeM << "find_package(OpenCL REQUIRED)" << std::endl;
-    ofsCodeM << "include_directories (${OpenCL_INCLUDE_DIRS} ${OpenCL_INCLUDE_DIRS}/Headers )" << std::endl;
-    ofsCodeM << "include_directories (/opt/rocm/include/mivisionx)" << std::endl;
-    ofsCodeM << "link_directories (/opt/rocm/lib)" << std::endl;
-    ofsCodeM << "list(APPEND SOURCES annmodule.cpp)" << std::endl;
-    ofsCodeM << "add_library(${PROJECT_NAME} SHARED ${SOURCES})" << std::endl;
-    ofsCodeM << "set(CMAKE_CXX_FLAGS \"${CMAKE_CXX_FLAGS} -msse4.2 -std=gnu++14\")" << std::endl;
-    ofsCodeM << "target_link_libraries(${PROJECT_NAME} openvx vx_nn pthread)" << std::endl;
-    ofsCodeM << "add_executable(anntest anntest.cpp)" << std::endl;
-    ofsCodeM << "target_link_libraries(anntest openvx vx_nn pthread ${PROJECT_NAME})" << std::endl;
-
-    /////
-    // generate simple application
-    //
-    ofsCodeA << "#include \"annmodule.h\"" << std::endl ;
-    ofsCodeA << "#include " << std::endl;
-    ofsCodeA << "#include " << std::endl;
-    ofsCodeA << "#include " << std::endl;
-    ofsCodeA << "#include " << std::endl;
-    ofsCodeA << "#include " << std::endl;
-    ofsCodeA << "#include " << std::endl;
-    ofsCodeA << "#include " << std::endl;
-    ofsCodeA << "#include " << std::endl;
-    ofsCodeA << std::endl;
-    ofsCodeA << "#define ERROR_CHECK_STATUS(call) { vx_status status = (call); if(status != VX_SUCCESS) { printf(\"ERROR: failed with status = (%d) at \" __FILE__ \"#%d\", status, __LINE__); return -1; } }" << std::endl;
-    ofsCodeA << std::endl;
-    if(bEnableErrorMessages) {
-        ofsCodeA << "static void VX_CALLBACK log_callback(vx_context context, vx_reference ref, vx_status status, const vx_char string[])" << std::endl;
-        ofsCodeA << "{" << std::endl;
-        ofsCodeA << " size_t len = strlen(string);" << std::endl;
-        ofsCodeA << " if (len > 0) {" << std::endl;
-        ofsCodeA << " printf(\"%s\", string);" << std::endl;
-        ofsCodeA << " if (string[len - 1] != '\\n')" << std::endl;
-        ofsCodeA << " printf(\"\\n\");" << std::endl;
-        ofsCodeA << " fflush(stdout);" << std::endl;
-        ofsCodeA << " }" << std::endl;
-        ofsCodeA << "}" << std::endl;
-        ofsCodeA << std::endl;
-    }
-    ofsCodeA << "inline int64_t clockCounter()" << std::endl;
-    ofsCodeA << "{" << std::endl;
-    ofsCodeA << " return std::chrono::high_resolution_clock::now().time_since_epoch().count();" << std::endl;
-    ofsCodeA << "}" << std::endl;
-    ofsCodeA << std::endl;
-    ofsCodeA << "inline int64_t clockFrequency()" << std::endl;
-    ofsCodeA << "{" << std::endl;
-    ofsCodeA << " return std::chrono::high_resolution_clock::period::den / std::chrono::high_resolution_clock::period::num;" << std::endl;
-    ofsCodeA << "}" << std::endl;
-    ofsCodeA << std::endl;
-
-    if(bInputIsImage || bOutputIsImage) {
-        generateCopyImageCode(ofsCodeA);
-    }
-    if(!(bInputIsImage && bOutputIsImage)) {
-        generateCopyTensorCode(ofsCodeA);
-    }
-
-    ofsCodeA << "int main(int argc , char ** argv)" << std::endl;
-    ofsCodeA << "{" << std::endl;
-    ofsCodeA << " // get module configuration" << std::endl;
-    ofsCodeA << " vx_size dimInput[4] = { 0 }, dimOutput[4] = { 0 };" << std::endl;
-    ofsCodeA << " annGetTensorDimensions(dimInput, dimOutput);" << std::endl;
-    ofsCodeA << " printf(\"OK: annGetTensorDimensions() => [input %ldx%ldx%ldx%ld] [output %ldx%ldx%ldx%ld]\\n\", dimInput[0], dimInput[1], dimInput[2], dimInput[3], dimOutput[0], dimOutput[1], dimOutput[2], dimOutput[3]);" << std::endl;
-    ofsCodeA << std::endl;
-    ofsCodeA << " // create context, input, output, and graph" << std::endl;
-    if(bEnableErrorMessages) {
-        ofsCodeA << " vxRegisterLogCallback(NULL, log_callback, vx_false_e);" << std::endl;
-    }
-    ofsCodeA << " vx_context context = vxCreateContext();" << std::endl;
-    ofsCodeA << " if(vxGetStatus((vx_reference)context)) {" << std::endl;
-    ofsCodeA << " printf(\"ERROR: vxCreateContext() failed\\n\");" << std::endl;
-    ofsCodeA << " return -1;" << std::endl;
-    ofsCodeA << " }" << std::endl;
-    if(bEnableErrorMessages) {
-        ofsCodeA << " vxRegisterLogCallback(context, log_callback, vx_false_e);" << std::endl;
-    }
-    if(bInputIsImage) {
-        ofsCodeA << " vx_image input = vxCreateImage(context, (vx_uint32)dimInput[0], (vx_uint32)(dimInput[1]*dimInput[3]), " << inputImageType << ");" << std::endl;
-        ofsCodeA << "
if(vxGetStatus((vx_reference)input)) {" << std::endl; - ofsCodeA << " printf(\"ERROR: vxCreateImage(input,%ld,%ld," << inputImageType << ") failed\\n\", dimInput[0], dimInput[1]*dimInput[3]);" << std::endl; - ofsCodeA << " return -1;" << std::endl; - ofsCodeA << " }" << std::endl; - } - else { - ofsCodeA << " vx_tensor input = vxCreateTensor(context, 4, dimInput, VX_TYPE_FLOAT32, 0);" << std::endl; - ofsCodeA << " if(vxGetStatus((vx_reference)input)) {" << std::endl; - ofsCodeA << " printf(\"ERROR: vxCreateTensor(input,4,{%ld,%ld,%ld,%ld}) failed\\n\", dimInput[0], dimInput[1], dimInput[2], dimInput[3]);" << std::endl; - ofsCodeA << " return -1;" << std::endl; - ofsCodeA << " }" << std::endl; - } - if(bOutputArgmax) { - if(bOutputIsImage) { - std::string outputImageFormat = argmaxOutputDataType; - if(argmaxLut.size() > 0) { - outputImageFormat = "VX_DF_IMAGE_RGB"; - } - ofsCodeA << " vx_image output = vxCreateImage(context, (vx_uint32)dimOutput[0], (vx_uint32)(dimOutput[1]*dimOutput[3]), " << outputImageFormat << ");" << std::endl; - ofsCodeA << " if(vxGetStatus((vx_reference)output)) {" << std::endl; - ofsCodeA << " printf(\"ERROR: vxCreateImage(output,%ld,%ld," << outputImageFormat << ") failed\\n\", dimOutput[0], dimOutput[1]*dimOutput[3]);" << std::endl; - ofsCodeA << " return -1;" << std::endl; - ofsCodeA << " }" << std::endl; - } - else { - ofsCodeA << " vx_size dimArgmax[4] = { dimOutput[0], dimOutput[1], " << argmaxTopK << ", dimOutput[3] };" << std::endl; - ofsCodeA << " vx_tensor output = vxCreateTensor(context, 4, dimArgmax, " << argmaxOutputDataType << ", 0);" << std::endl; - ofsCodeA << " if(vxGetStatus((vx_reference)output)) {" << std::endl; - ofsCodeA << " printf(\"ERROR: vxCreateTensor(output,4,{%ld,%ld,%ld,%ld}," << argmaxOutputDataType << ",0) failed\\n\", dimArgmax[0], dimArgmax[1], dimArgmax[2], dimArgmax[3]);" << std::endl; - ofsCodeA << " return -1;" << std::endl; - ofsCodeA << " }" << std::endl; - } - } - else { - ofsCodeA << " vx_tensor 
output = vxCreateTensor(context, 4, dimOutput, VX_TYPE_FLOAT32, 0);" << std::endl; - ofsCodeA << " if(vxGetStatus((vx_reference)output)) {" << std::endl; - ofsCodeA << " printf(\"ERROR: vxCreateTensor(output,4,{%ld,%ld,%ld,%ld},VX_TYPE_FLOAT32,0) failed\\n\", dimOutput[0], dimOutput[1], dimOutput[2], dimOutput[3]);" << std::endl; - ofsCodeA << " return -1;" << std::endl; - ofsCodeA << " }" << std::endl; - } - ofsCodeA << std::endl; - ofsCodeA << " // build graph from the module" << std::endl; - ofsCodeA << " int64_t freq = clockFrequency(), t0, t1;" << std::endl; - ofsCodeA << " t0 = clockCounter();" << std::endl; - ofsCodeA << " vx_graph graph = " << annApiName << "(context, input, output, argc > 1 ? argv[1] : nullptr);" << std::endl; - ofsCodeA << " t1 = clockCounter();" << std::endl; - ofsCodeA << " if(vxGetStatus((vx_reference)graph)) {" << std::endl; - ofsCodeA << " printf(\"ERROR: " << annApiName << "(...,%s) failed\\n\", argv[1]);" << std::endl; - ofsCodeA << " return -1;" << std::endl; - ofsCodeA << " }" << std::endl; - ofsCodeA << " printf(\"OK: " << annApiName << "() took %.3f msec\\n\", (float)(t1-t0)*1000.0f/(float)freq);" << std::endl; - ofsCodeA << std::endl; - if(bInputIsImage) { - ofsCodeA << " if(argc > 2) {" << std::endl; - ofsCodeA << " if(copyImage(input, argv[2], VX_WRITE_ONLY) < 0) {" << std::endl; - ofsCodeA << " return -1;" << std::endl; - ofsCodeA << " }" << std::endl; - ofsCodeA << " printf(\"OK: read %ldx%ld image from %s\\n\", dimInput[0], dimInput[1], argv[2]);" << std::endl; - ofsCodeA << " }" << std::endl; - } - else { - ofsCodeA << " if(argc > 2) {" << std::endl; - ofsCodeA << " if(copyTensor(input, argv[2], VX_WRITE_ONLY) < 0) {" << std::endl; - ofsCodeA << " return -1;" << std::endl; - ofsCodeA << " }" << std::endl; - ofsCodeA << " printf(\"OK: read %ldx%ldx%ldx%ld tensor from %s\\n\", dimInput[3], dimInput[2], dimInput[1], dimInput[0], argv[2]);" << std::endl; - ofsCodeA << " }" << std::endl; - } - ofsCodeA << std::endl; - 
-    ofsCodeA << "    t0 = clockCounter();" << std::endl;
-    ofsCodeA << "    vx_status status = vxProcessGraph(graph);" << std::endl;
-    ofsCodeA << "    t1 = clockCounter();" << std::endl;
-    ofsCodeA << "    if(status != VX_SUCCESS) {" << std::endl;
-    ofsCodeA << "        printf(\"ERROR: vxProcessGraph() failed (%d)\\n\", status);" << std::endl;
-    ofsCodeA << "        return -1;" << std::endl;
-    ofsCodeA << "    }" << std::endl;
-    ofsCodeA << "    printf(\"OK: vxProcessGraph() took %.3f msec (1st iteration)\\n\", (float)(t1-t0)*1000.0f/(float)freq);" << std::endl;
-    ofsCodeA << std::endl;
-    if(bOutputIsImage) {
-        ofsCodeA << "    if(argc > 3) {" << std::endl;
-        ofsCodeA << "        if(copyImage(output, argv[3], VX_READ_ONLY) < 0) {" << std::endl;
-        ofsCodeA << "            return -1;" << std::endl;
-        ofsCodeA << "        }" << std::endl;
-        ofsCodeA << "        printf(\"OK: wrote %ldx%ld image into %s\\n\", dimOutput[0], dimOutput[1]*dimOutput[3], argv[3]);" << std::endl;
-        ofsCodeA << "    }" << std::endl;
-    }
-    else {
-        ofsCodeA << "    if(argc > 3) {" << std::endl;
-        ofsCodeA << "        if(copyTensor(output, argv[3], VX_READ_ONLY) < 0) {" << std::endl;
-        ofsCodeA << "            return -1;" << std::endl;
-        ofsCodeA << "        }" << std::endl;
-        ofsCodeA << "        printf(\"OK: wrote %ldx%ldx%ldx%ld tensor into %s\\n\", dimOutput[3], " << (bOutputArgmax ? "(vx_size)1" : "dimOutput[2]") << ", dimOutput[1], dimOutput[0], argv[3]);" << std::endl;
-        ofsCodeA << "    }" << std::endl;
-    }
-    ofsCodeA << "    t0 = clockCounter();" << std::endl;
-    ofsCodeA << "    int N = 100;" << std::endl;
-    ofsCodeA << "    for(int i = 0; i < N; i++) {" << std::endl;
-    ofsCodeA << "        status = vxProcessGraph(graph);" << std::endl;
-    ofsCodeA << "        if(status != VX_SUCCESS)" << std::endl;
-    ofsCodeA << "            break;" << std::endl;
-    ofsCodeA << "    }" << std::endl;
-    ofsCodeA << "    t1 = clockCounter();" << std::endl;
-    ofsCodeA << "    printf(\"OK: vxProcessGraph() took %.3f msec (average over %d iterations)\\n\", (float)(t1-t0)*1000.0f/(float)freq/(float)N, N);" << std::endl;
-    ofsCodeA << std::endl;
-    ofsCodeA << "    // release resources" << std::endl;
-    ofsCodeA << "    ERROR_CHECK_STATUS(vxReleaseGraph(&graph));" << std::endl;
-    if(bInputIsImage) {
-        ofsCodeA << "    ERROR_CHECK_STATUS(vxReleaseImage(&input));" << std::endl;
-    }
-    else {
-        ofsCodeA << "    ERROR_CHECK_STATUS(vxReleaseTensor(&input));" << std::endl;
-    }
-    if(bOutputIsImage) {
-        ofsCodeA << "    ERROR_CHECK_STATUS(vxReleaseImage(&output));" << std::endl;
-    }
-    else {
-        ofsCodeA << "    ERROR_CHECK_STATUS(vxReleaseTensor(&output));" << std::endl;
-    }
-    ofsCodeA << "    ERROR_CHECK_STATUS(vxReleaseContext(&context));" << std::endl;
-    ofsCodeA << "    printf(\"OK: successful\\n\");" << std::endl;
-    ofsCodeA << std::endl;
-    ofsCodeA << "    return 0;" << std::endl;
-    ofsCodeA << "}" << std::endl;
-
-    ofsCodeD << "find_path(OPENCL_INCLUDE_DIRS" << std::endl;
-    ofsCodeD << "NAMES OpenCL/cl.h CL/cl.h" << std::endl;
-    ofsCodeD << "HINTS" << std::endl;
-    ofsCodeD << "${OPENCL_ROOT}/include" << std::endl;
-    ofsCodeD << "$ENV{AMDAPPSDKROOT}/include" << std::endl;
-    ofsCodeD << "PATHS" << std::endl;
-    ofsCodeD << "/usr/include" << std::endl;
-    ofsCodeD << "/usr/local/include" << std::endl;
-    ofsCodeD << "/opt/rocm/opencl/include" << std::endl;
-    ofsCodeD << "DOC \"OpenCL header file path\"" << std::endl;
-    ofsCodeD << ")" << std::endl;
-    ofsCodeD << "mark_as_advanced( OPENCL_INCLUDE_DIRS )" << std::endl << std::endl;
-    ofsCodeD << "if(\"${CMAKE_SIZEOF_VOID_P}\" EQUAL \"8\")" << std::endl;
-    ofsCodeD << "    find_library( OPENCL_LIBRARIES" << std::endl;
-    ofsCodeD << "        NAMES OpenCL" << std::endl;
-    ofsCodeD << "        HINTS" << std::endl;
-    ofsCodeD << "        ${OPENCL_ROOT}/lib" << std::endl;
-    ofsCodeD << "        $ENV{AMDAPPSDKROOT}/lib" << std::endl;
-    ofsCodeD << "        DOC \"OpenCL dynamic library path\"" << std::endl;
-    ofsCodeD << "        PATH_SUFFIXES x86_64 x64 x86_64/sdk" << std::endl;
-    ofsCodeD << "        PATHS" << std::endl;
-    ofsCodeD << "        /usr/lib" << std::endl;
-    ofsCodeD << "        /opt/rocm/opencl/lib" << std::endl;
-    ofsCodeD << "    )" << std::endl;
-    ofsCodeD << "else( )" << std::endl;
-    ofsCodeD << "    find_library( OPENCL_LIBRARIES" << std::endl;
-    ofsCodeD << "        NAMES OpenCL" << std::endl;
-    ofsCodeD << "        HINTS" << std::endl;
-    ofsCodeD << "        ${OPENCL_ROOT}/lib" << std::endl;
-    ofsCodeD << "        $ENV{AMDAPPSDKROOT}/lib" << std::endl;
-    ofsCodeD << "        DOC \"OpenCL dynamic library path\"" << std::endl;
-    ofsCodeD << "        PATH_SUFFIXES x86 Win32" << std::endl;
-    ofsCodeD << "        PATHS" << std::endl;
-    ofsCodeD << "        /usr/lib" << std::endl;
-    ofsCodeD << "    )" << std::endl;
-    ofsCodeD << "endif( )" << std::endl;
-    ofsCodeD << "mark_as_advanced( OPENCL_LIBRARIES )" << std::endl << std::endl;
-    ofsCodeD << "include( FindPackageHandleStandardArgs )" << std::endl;
-    ofsCodeD << "find_package_handle_standard_args( OPENCL DEFAULT_MSG OPENCL_LIBRARIES OPENCL_INCLUDE_DIRS )" << std::endl;
-    ofsCodeD << "set(OpenCL_FOUND ${OPENCL_FOUND} CACHE INTERNAL \"\")" << std::endl;
-    ofsCodeD << "set(OpenCL_LIBRARIES ${OPENCL_LIBRARIES} CACHE INTERNAL \"\")" << std::endl;
-    ofsCodeD << "set(OpenCL_INCLUDE_DIRS ${OPENCL_INCLUDE_DIRS} CACHE INTERNAL \"\")" << std::endl;
-    ofsCodeD << "if( NOT OPENCL_FOUND )" << std::endl;
-    ofsCodeD << "    message( STATUS \"FindOpenCL looked for libraries named: OpenCL\" )" << std::endl;
-    ofsCodeD << "endif()" << std::endl;
-}
-
-void parseCaffeModel(const caffe::NetParameter& net_parameter, std::vector<std::vector<std::string>>& net, int inputDim[4], std::string outputFolder, int flags)
-{
-    if(net_parameter.has_name())
-        std::cout << "Fetching the weights for : " << net_parameter.name() << std::endl;
-
-    std::map<std::string,std::string> outputNameMap, splitNameMap;
-    if(net_parameter.input_size() > 0) {
-        outputNameMap[net_parameter.input(0)] = net_parameter.input(0);
-    }
-
-    if(net_parameter.input_dim_size()==4 && ((inputDim[0]==0) || (inputDim[1]==0) || (inputDim[2]==0) || (inputDim[3]==0)))
-    {
-        inputDim[0] = net_parameter.input_dim(0);
-        inputDim[1] = net_parameter.input_dim(1);
-        inputDim[2] = net_parameter.input_dim(2);
-        inputDim[3] = net_parameter.input_dim(3);
-    }
-
-    //extract layer information.
-    for(int i=0; i < net_parameter.layer_size(); i++)
-    {
-        const caffe::LayerParameter& layer_parameter = net_parameter.layer(i);
-
-        if(layer_parameter.top_size() == 0)
-            continue;
-
-        //Check layer name.
-        if(layer_parameter.type() == "Input" || layer_parameter.type() == "Data" || layer_parameter.type() == "ImageData") {
-            outputNameMap[layer_parameter.top(0)] = layer_parameter.top(0);
-            if(layer_parameter.type() == "Input" && ((inputDim[0]==0) || (inputDim[1]==0) || (inputDim[2]==0) || (inputDim[3]==0))) {
-                inputDim[0] = layer_parameter.input_param().shape(0).dim(0);
-                inputDim[1] = layer_parameter.input_param().shape(0).dim(1);
-                inputDim[2] = layer_parameter.input_param().shape(0).dim(2);
-                inputDim[3] = layer_parameter.input_param().shape(0).dim(3);
-            }
-            continue;
-        }
-
-        //dump layer data.
-        dumpLayerData(layer_parameter, outputFolder);
-
-        // enable Split optimization using a bit in flags (i.e., remove Split by using variable renaming instead of a copy)
-        bool isSplitEnabled = (flags & 1);
-        if(!isSplitEnabled) {
-            if(layer_parameter.type() == "Split") {
-                for(int j=0; j < layer_parameter.top_size(); j++) {
-                    // get layer information and add to net
-                    std::vector<std::string> node;
-                    node.push_back(layer_parameter.type());
-                    node.push_back("");
-                    node.push_back(layer_parameter.top(j));
-                    node.push_back(layer_parameter.top(j));
-                    for(int z = 0; z < layer_parameter.bottom_size(); z++) {
-                        if(outputNameMap.find(layer_parameter.bottom(z)) == outputNameMap.end()) {
-                            outputNameMap[layer_parameter.bottom(z)] = layer_parameter.bottom(z);
-                        }
-                        node.push_back(outputNameMap[layer_parameter.bottom(z)]);
-                    }
-                    net.push_back(node);
-                    // update output name with layer name
-                    outputNameMap[layer_parameter.top(j)] = layer_parameter.top(j);
-                }
-                continue;
-            }
-        }
-        else
-        {
-            //Split type.
-            if(layer_parameter.type() == "Split") {
-                splitNameMap[layer_parameter.name()] = layer_parameter.bottom(0);
-                for(int j=0; j < layer_parameter.top_size(); j++) {
-                    splitNameMap[layer_parameter.top(j)] = layer_parameter.bottom(0);
-                }
-                continue;
-            }
-        }
-
-        // get layer information and add to net
-        std::vector<std::string> node;
-        std::string params;
-        getLayerParams(layer_parameter, params);
-        node.push_back(layer_parameter.type());
-        node.push_back(params);
-        node.push_back(layer_parameter.top(0));
-        node.push_back(layer_parameter.name());
-        for(int j = 0; j < layer_parameter.bottom_size(); j++) {
-            if(isSplitEnabled && (strstr(layer_parameter.bottom(j).c_str(), "split"))) {
-                outputNameMap[layer_parameter.bottom(j)] = splitNameMap[layer_parameter.bottom(j)];
-            }
-            if(outputNameMap.find(layer_parameter.bottom(j)) == outputNameMap.end()) {
-                outputNameMap[layer_parameter.bottom(j)] = layer_parameter.bottom(j);
-            }
-            node.push_back(outputNameMap[layer_parameter.bottom(j)]);
-        }
-        net.push_back(node);
-        // update output name with layer name
-        outputNameMap[layer_parameter.top(0)] = layer_parameter.name();
-    }
-}
-
-void parseV1LayerCaffeModel(const caffe::NetParameter& net_parameter, std::vector<std::vector<std::string>>& net, int inputDim[4], std::string outputFolder, int flags)
-{
-    if(net_parameter.has_name())
-        std::cout << "Fetching the weights for : " << net_parameter.name() << std::endl;
-
-    std::map<std::string,std::string> outputNameMap, splitNameMap;
-    if(net_parameter.input_size() > 0) {
-        outputNameMap[net_parameter.input(0)] = net_parameter.input(0);
-    }
-
-    if(net_parameter.input_dim_size()==4 && ((inputDim[0]==0) || (inputDim[1]==0) || (inputDim[2]==0) || (inputDim[3]==0)))
-    {
-        inputDim[0] = net_parameter.input_dim(0);
-        inputDim[1] = net_parameter.input_dim(1);
-        inputDim[2] = net_parameter.input_dim(2);
-        inputDim[3] = net_parameter.input_dim(3);
-    }
-
-    //extract layer information.
-    for(int i=0; i < net_parameter.layers_size(); i++)
-    {
-        const caffe::V1LayerParameter& layer_parameter = net_parameter.layers(i);
-
-        if(layer_parameter.top_size() == 0)
-            continue;
-
-        //Check layer name.
-        if(layer_parameter.type() == caffe::V1LayerParameter_LayerType_DATA || layer_parameter.type() == caffe::V1LayerParameter_LayerType_IMAGE_DATA) {
-            outputNameMap[layer_parameter.top(0)] = layer_parameter.top(0);
-            continue;
-        }
-
-        //dump layer data.
-        dumpV1LayerData(layer_parameter, outputFolder);
-
-        // enable Split optimization using a bit in flags (i.e., remove Split by using variable renaming instead of a copy)
-        bool isSplitEnabled = (flags & 1);
-        if(!isSplitEnabled) {
-            if(layer_parameter.type() == caffe::V1LayerParameter_LayerType_SPLIT) {
-                for(int j = 0; j < layer_parameter.top_size(); j++) {
-                    // get layer information and add to net
-                    std::vector<std::string> node;
-                    node.push_back(convertV1LayerTypeToString(layer_parameter.type()));
-                    node.push_back("");
-                    node.push_back(layer_parameter.top(j));
-                    node.push_back(layer_parameter.top(j));
-                    for(int z = 0; z < layer_parameter.bottom_size(); z++) {
-                        if(outputNameMap.find(layer_parameter.bottom(z)) == outputNameMap.end()) {
-                            outputNameMap[layer_parameter.bottom(z)] = layer_parameter.bottom(z);
-                        }
-                        node.push_back(outputNameMap[layer_parameter.bottom(z)]);
-                    }
-                    net.push_back(node);
-                    // update output name with layer name
-                    outputNameMap[layer_parameter.top(j)] = layer_parameter.top(j);
-                }
-                continue;
-            }
-        }
-        else
-        {
-            //Split type.
-            if(layer_parameter.type() == caffe::V1LayerParameter_LayerType_SPLIT) {
-                splitNameMap[layer_parameter.name()] = layer_parameter.bottom(0);
-                for(int j=0; j < layer_parameter.top_size(); j++) {
-                    splitNameMap[layer_parameter.top(j)] = layer_parameter.bottom(0);
-                }
-                continue;
-            }
-        }
-
-        // get layer information and add to net
-        std::vector<std::string> node;
-        std::string params;
-        getV1LayerParams(layer_parameter, params);
-        node.push_back(convertV1LayerTypeToString(layer_parameter.type()));
-        node.push_back(params);
-        node.push_back(layer_parameter.top(0));
-        node.push_back(layer_parameter.name());
-        for(int j = 0; j < layer_parameter.bottom_size(); j++) {
-            if(isSplitEnabled && (strstr(layer_parameter.bottom(j).c_str(), "split"))) {
-                outputNameMap[layer_parameter.bottom(j)] = splitNameMap[layer_parameter.bottom(j)];
-            }
-            if(outputNameMap.find(layer_parameter.bottom(j)) == outputNameMap.end()) {
-                outputNameMap[layer_parameter.bottom(j)] = layer_parameter.bottom(j);
-            }
-            node.push_back(outputNameMap[layer_parameter.bottom(j)]);
-        }
-        net.push_back(node);
-        // update output name with layer name
-        outputNameMap[layer_parameter.top(0)] = layer_parameter.name();
-    }
-}
-
-int loadCaffeModelFile(
-    const char* fileName,
-    std::vector<std::vector<std::string>>& net,
-    int inputDim[4],
-    std::string outputFolder,
-    int flags)
-{
-    //verify the version of protobuf library.
-    GOOGLE_PROTOBUF_VERIFY_VERSION;
-
-    //read the caffemodel.
-    caffe::NetParameter net_parameter;
-    std::cout << "Reading the binary file from : " << fileName << std::endl;
-    std::fstream input(fileName, std::ios::in | std::ios::binary);
-    bool isSuccess = net_parameter.ParseFromIstream(&input);
-    if(isSuccess) {
-        std::cout << "CaffeModel Read Successful" << std::endl;
-        if(net_parameter.layer_size() > 0) {
-            parseCaffeModel(net_parameter, net, inputDim, outputFolder, flags);
-        }
-        else if(net_parameter.layers_size() > 0) {
-            info("Reading V1 layer caffe model\n");
-            parseV1LayerCaffeModel(net_parameter, net, inputDim, outputFolder, flags);
-        }
-        else {
-            error("No 'layers' or 'layer' fields found in the caffemodel\n");
-            return -1;
-        }
-    }
-    else {
-        std::cerr << "CaffeModel Read Failed" << std::endl;
-    }
-    return 0;
-}
-
-int main(int argc, char* argv[])
-{
-    const char * usage =
-        "Usage:\n"
-        "  % caffe2openvx [options] <net.prototxt|net.caffemodel> [n c H W [type fixed-point-position [convert-policy round-policy]]]\n"
-        "  options:\n"
-        "    --[no-]error-messages     - do/don't enable error messages (default: ON)\n"
-        "    --[no-]virtual-buffers    - do/don't use virtual buffers (default: ON)\n"
-        "    --[no-]generate-gdf       - do/don't generate RunVX GDF with weight/bias initialization (default: ON)\n"
-        "    --[no-]generate-vx-code   - do/don't generate OpenVX C Code with weight/bias initialization (default: ON)\n"
-        "    --output-dir <folder>     - specify output folder for weights/biases, GDF, and OpenVX C Code (default: current)\n"
-        "    --input-rgb <a> <b> <rev> - convert input from RGB image into tensor using (a*x+b) conversion: rev=(BGR?1:0)\n"
-        "    --input-u8 <a> <b>        - convert input from U8 image into tensor using (a*x+b) conversion\n"
-        "    --argmax-tensor u8|u16 k  - return argmax output with specified tensor type and top_k\n"
-        "    --argmax-image u8|u16     - return argmax output with specified image type\n"
-        "    --argmax-lut <rgbLut.txt> - argmax color table: one R G B entry per label\n"
-        "    --flags <n>               - specify custom flags (default: 0)\n"
-        ;
-
-    // get options
-    bool bEnableErrorMessages = true;
-    bool isVirtualEnabled = true;
-    bool generateGDF = true;
-    bool generateVXC = true;
-    bool bFuseScaleWithBatchNorm = true;
-    bool bInputIsImage = false;
-    bool bInputChannelReverse = false;
-    double fInputConversionA = 0;
-    double fInputConversionB = 255;
-    std::string inputImageType;
-    bool bOutputArgmax = false;
-    bool bOutputIsImage = false;
-    std::string argmaxOutputDataType;
-    int argmaxTopK = 1;
-    std::vector<int> argmaxLut;
-    std::string outputFolder = ".";
-    int flags = 0;
-    for(; argc > 1 && argv[1][0] == '-'; argc--, argv++) {
-        if(!strcmp(argv[1], "--error-messages")) {
-            bEnableErrorMessages = true;
-        }
-        else if(!strcmp(argv[1], "--no-error-messages")) {
-            bEnableErrorMessages = false;
-        }
-        else if(!strcmp(argv[1], "--virtual-buffers")) {
-            isVirtualEnabled = true;
-        }
-        else if(!strcmp(argv[1], "--no-virtual-buffers")) {
-            isVirtualEnabled = false;
-        }
-        else if(!strcmp(argv[1], "--generate-gdf")) {
-            generateGDF = true;
-        }
-        else if(!strcmp(argv[1], "--no-generate-gdf")) {
-            generateGDF = false;
-        }
-        else if(!strcmp(argv[1], "--generate-vx-code")) {
-            generateVXC = true;
-        }
-        else if(!strcmp(argv[1], "--no-generate-vx-code")) {
-            generateVXC = false;
-        }
-        else if(!strcmp(argv[1], "--output-dir") && argc > 2) {
-            outputFolder = argv[2];
-            argc--;
-            argv++;
-            mkdir(outputFolder.c_str(), 0777);
-        }
-        else if(!strcmp(argv[1], "--flags") && argc > 2) {
-            flags = atoi(argv[2]);
-            argc--;
-            argv++;
-        }
-        else if(!strcmp(argv[1], "--input-rgb") && argc > 4) {
-            bInputIsImage = true;
-            inputImageType = "VX_DF_IMAGE_RGB";
-            fInputConversionA = atof(argv[2]);
-            fInputConversionB = atof(argv[3]);
-            if(!strcmp(argv[4], "0")) bInputChannelReverse = false;
-            else if(!strcmp(argv[4], "1")) bInputChannelReverse = true;
-            else {
-                printf("ERROR: invalid input RGB channel option: %s (most be 0 or 1)\n", argv[4]);
-                return -1;
-            }
-            argc -= 3;
-            argv += 3;
-        }
-        else if(!strcmp(argv[1], "--input-u8") && argc > 3) {
-            bInputIsImage = true;
-            inputImageType = "VX_DF_IMAGE_U8";
-            fInputConversionA = atof(argv[2]);
-            fInputConversionB = atof(argv[3]);
-            bInputChannelReverse = false;
-            argc -= 2;
-            argv += 2;
-        }
-        else if(!strcmp(argv[1], "--argmax-tensor") && argc > 3) {
-            bOutputArgmax = true;
-            bOutputIsImage = false;
-            if(!strcmp(argv[2], "u8")) argmaxOutputDataType = "VX_TYPE_UINT8";
-            else if(!strcmp(argv[2], "u16")) argmaxOutputDataType = "VX_TYPE_UINT16";
-            else {
-                printf("ERROR: invalid argmax output tensor type: %s (must be u8 or u16)\n", argv[2]);
-                return -1;
-            }
-            argmaxTopK = atoi(argv[3]);
-            argc -= 2;
-            argv += 2;
-        }
-        else if(!strcmp(argv[1], "--argmax-image") && argc > 2) {
-            bOutputArgmax = true;
-            bOutputIsImage = true;
-            if(!strcmp(argv[2], "u8")) argmaxOutputDataType = "VX_DF_IMAGE_U8";
-            else if(!strcmp(argv[2], "u16")) argmaxOutputDataType = "VX_DF_IMAGE_U16";
-            else {
-                printf("ERROR: invalid argmax output image type: %s (must be u8 or u16)\n", argv[2]);
-                return -1;
-            }
-            argmaxTopK = 1;
-            argc -= 1;
-            argv += 1;
-        }
-        else if(!strcmp(argv[1], "--argmax-lut") && argc > 2) {
-            if(!bOutputArgmax || !bOutputIsImage || argmaxOutputDataType != "VX_DF_IMAGE_U8") {
-                printf("ERROR: '--argmax-image u8' is required prior to '--argmax-lut' option\n");
-                return -1;
-            }
-            FILE * fp = fopen(argv[2], "r");
-            if(!fp) {
-                printf("ERROR: unable to open: %s\n", argv[2]);
-                return -1;
-            }
-            argmaxLut.clear();
-            for(int r, g, b; fscanf(fp, "%d%d%d", &r, &g, &b) == 3;) {
-                int v = ((b & 255) << 16) | ((g & 255) << 8) | (r & 255);
-                argmaxLut.push_back(v);
-            }
-            fclose(fp);
-            printf("OK: loaded LUT with %ld entries from %s\n", argmaxLut.size(), argv[2]);
-            argc -= 1;
-            argv += 1;
-        }
-        else {
-            printf("ERROR: invalid option: %s\n", argv[1]);
-            return -1;
-        }
-    }
-
-    // check for command-line arguments
-    if(argc < 2) {
-        printf("%s", usage);
-        return -1;
-    }
-
-    // get command-line arguments
-    int inputDim[4] = { 0, 0, 0, 0 }, fixedPointPosition = 0;
-    const char * tensorType = "VX_TYPE_FLOAT32";
-    const char * convertPolicy = "VX_CONVERT_POLICY_SATURATE";
-    const char * roundPolicy = "VX_ROUND_POLICY_TO_NEAREST_EVEN";
-    const char * fileName = argv[1];
-    if(argc > 2) inputDim[0] = atoi(argv[2]);
-    if(argc > 3) inputDim[1] = atoi(argv[3]);
-    if(argc > 4) inputDim[2] = atoi(argv[4]);
-    if(argc > 5) inputDim[3] = atoi(argv[5]);
-    if(argc > 6) tensorType = argv[6];
-    if(argc > 7) fixedPointPosition = atoi(argv[7]);
-    if(argc > 8) convertPolicy = argv[8];
-    if(argc > 9) roundPolicy = argv[9];
-    std::vector<std::vector<std::string>> net;
-
-    flags &= 3; // we are only interersted in LSBs 0 & 1
-    bFuseScaleWithBatchNorm = !((flags & 2) >> 1);
-
-    // load caffe model (or just .prototxt)
-    if(strstr(fileName, ".caffemodel")) {
-        // make sure that weights and bias folder are created
-        std::string dir = outputFolder + "/weights";
-        mkdir(dir.c_str(), 0777);
-        dir = outputFolder + "/bias";
-        mkdir(dir.c_str(), 0777);
-        // load caffe model
-        if(loadCaffeModelFile(fileName, net, inputDim, outputFolder, flags) < 0) {
-            return -1;
-        }
-    }
-    else if(strstr(fileName, ".prototxt")) {
-        if(loadCaffeProtoTxt(fileName, net, inputDim) < 0) {
-            return -1;
-        }
-    }
-    else {
-        printf("%s", usage);
-        return -1;
-    }
-
-    // generate tensorMap for given input dimensions
-    std::map<std::string,std::vector<int>> tensorMap;
-    if(calculateTensorDim(net, inputDim, tensorMap) < 0) {
-        return -1;
-    }
-
-    if(generateGDF) {
-        std::ofstream ofsGDF(outputFolder + "/net.gdf", std::ios::binary);
-        writeGDF(ofsGDF, net, tensorMap, tensorType, fixedPointPosition, convertPolicy, roundPolicy, isVirtualEnabled, outputFolder, bFuseScaleWithBatchNorm);
-    }
-
-    if(generateVXC) {
-        std::ofstream ofsCodeH(outputFolder + "/annmodule.h", std::ios::binary);
-        std::ofstream ofsCodeC(outputFolder + "/annmodule.cpp", std::ios::binary);
-        std::ofstream ofsCodeM(outputFolder + "/CMakeLists.txt", std::ios::binary);
-        std::ofstream ofsCodeA(outputFolder + "/anntest.cpp", std::ios::binary);
-        std::string dir = outputFolder + "/cmake";
-        mkdir(dir.c_str(), 0777);
-        std::ofstream ofsCodeD(dir + "/FindOpenCL.cmake", std::ios::binary);
generateCode(ofsCodeH, ofsCodeC, ofsCodeM, ofsCodeA, ofsCodeD, - net, tensorMap, tensorType, fixedPointPosition, convertPolicy, roundPolicy, - isVirtualEnabled, outputFolder, - bInputIsImage, inputImageType, bInputChannelReverse, fInputConversionA, fInputConversionB, - bOutputArgmax, bOutputIsImage, argmaxOutputDataType, argmaxTopK, argmaxLut, - bEnableErrorMessages, - bFuseScaleWithBatchNorm); - } - - return 0; -} diff --git a/utilities/inference_generator/src/nnef2openvx.cpp b/utilities/inference_generator/src/nnef2openvx.cpp deleted file mode 100644 index 32d6f294c0..0000000000 --- a/utilities/inference_generator/src/nnef2openvx.cpp +++ /dev/null @@ -1,1848 +0,0 @@ -/* -Copyright (c) 2017 - 2023 Advanced Micro Devices, Inc. All rights reserved. - -Permission is hereby granted, free of charge, to any person obtaining a copy -of this software and associated documentation files (the "Software"), to deal -in the Software without restriction, including without limitation the rights -to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -copies of the Software, and to permit persons to whom the Software is -furnished to do so, subject to the following conditions: - -The above copyright notice and this permission notice shall be included in -all copies or substantial portions of the Software. - -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN -THE SOFTWARE. 
-*/
-
-#include "flat/flat_parser.h"
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-
-////
-// MAGIC numbers
-//
-#define VARIABLES_FILE_MAGIC 0xF00DD1E0
-#define VARIABLES_DATA_MAGIC 0xF00DD1E1
-#define VARIABLES_EOFF_MAGIC 0xF00DD1E2
-
-////
-// NNEF to OpenVX Translator
-//
-class NNEF2OpenVX_Translator : public nnef::Parser::Callback
-{
-public:
-    NNEF2OpenVX_Translator(std::string nnefFolder_, std::string openvxFolder_, bool useVirtual_, int verbose_)
-        : nnefFolder(nnefFolder_), openvxFolder(openvxFolder_), useVirtual(useVirtual_), verbose(verbose_)
-    {
-    }
-
-protected:
-    ////
-    // class variables
-    //
-    int verbose;
-    bool useVirtual;
-    std::string nnefFolder;
-    std::string openvxFolder;
-    std::string openvxFilenameC;
-    std::ofstream ovxC;
-    std::vector<std::string> inputList;
-    std::vector<std::string> outputList;
-    std::vector<std::string> virtualList;
-    std::vector<std::string> variableList;
-    std::map<std::string, std::tuple<size_t, char *>> variableBinary;
-    std::map<std::string, nnef::Shape> inputShape;
-    std::map<std::string, nnef::Shape> outputShape;
-    std::map<std::string, nnef::Shape> virtualShape;
-    std::map<std::string, nnef::Shape> variableShape;
-    std::map<std::string, std::string> variableLabel;
-    std::map<std::string, size_t> variableRequiredDims;
-    std::vector<nnef::Prototype> opsProto;
-    std::vector<nnef::Dictionary<nnef::Value>> opsValues;
-    std::vector<nnef::Dictionary<nnef::Shape>> opsShapes;
-    std::vector<bool> operationRemoved;
-    std::map<std::string, bool> variableMerged;
-    std::map<std::string, std::string> virtualRename;
-    std::map<std::string, std::string> convNewBiasName;
-
-private:
-    // utility functions
-    static void getTensorDims(const nnef::Shape& shape, std::vector<size_t>& dims, size_t num_dims)
-    {
-        size_t rank = shape.rank();
-        if(num_dims == 0)
-            num_dims = rank;
-        dims.clear();
-        size_t count = 0;
-        if(rank > 1) {
-            for(; count < (num_dims - rank); count++) {
-                dims.push_back(1);
-            }
-        }
-        for(size_t i = 0; i < rank; i++, count++) {
-            dims.push_back(shape[rank-1-i]);
-        }
-        for(; count < num_dims; count++) {
-            dims.push_back(1);
-        }
-    }
-    static std::string codeGenTensorCreate(const std::string& name, const nnef::Shape& shape, bool useVirtual, size_t num_dims)
-    {
-        std::stringstream ss;
-        std::vector<size_t> dims;
-        getTensorDims(shape, dims, num_dims);
-        ss << "    vx_size " << name << "_dims[" << dims.size() << "] = {";
-        for(size_t i = 0; i < dims.size(); i++) {
-            ss << (i == 0 ? " " : ", ") << dims[i];
-        }
-        ss << " };" << std::endl;
-        ss << "    vx_tensor " << name << " = "
-           << (useVirtual ? "vxCreateVirtualTensor(graph, " : "vxCreateTensor(context, ")
-           << dims.size() << ", " << name << "_dims, VX_TYPE_FLOAT32, 0);" << std::endl;
-        ss << "    ERROR_CHECK_OBJECT(" << name << ");" << std::endl;
-        return ss.str();
-    }
-    static unsigned int loadTensorFile(const std::string& nnefFolder, const std::string& label, const nnef::Shape& shape, char *& data)
-    {
-        std::string fileName = nnefFolder + "/" + label + ".dat";
-        FILE * fp = fopen(fileName.c_str(), "rb");
-        if(!fp) {
-            printf("ERROR: unable to open: %s\n", fileName.c_str());
-            exit(1);
-        }
-        enum TensorDataType : unsigned char {
-            TensorDataType_Float,
-            TensorDataType_Quantized,
-            TensorDataType_Signed,
-            TensorDataType_Unsigned
-        };
-        struct TensorFileHeader {
-            unsigned char magic[2];
-            unsigned char major;
-            unsigned char minor;
-            unsigned int offset;
-            unsigned int rank;
-            unsigned int dim[8];
-            unsigned char data_type;
-            unsigned char bit_width;
-            unsigned short quant_alg_len;
-            char quant_alg[1024];
-        } h = { 0 };
-        unsigned int offset = 0;
-        offset += fread(&h.magic, 1, sizeof(h.magic), fp);
-        offset += fread(&h.major, 1, sizeof(h.major), fp);
-        offset += fread(&h.minor, 1, sizeof(h.minor), fp);
-        offset += fread(&h.offset, 1, sizeof(h.offset), fp);
-        offset += fread(&h.rank, 1, sizeof(h.rank), fp);
-        if(h.rank > 0) {
-            offset += fread(h.dim, 1, h.rank * sizeof(h.dim[0]), fp);
-        }
-        offset += fread(&h.data_type, 1, sizeof(h.data_type), fp);
-        offset += fread(&h.bit_width, 1, sizeof(h.bit_width), fp);
-        offset += fread(&h.quant_alg_len, 1, sizeof(h.quant_alg_len), fp);
-        if(h.quant_alg_len > 0) {
-            offset += fread(h.quant_alg, 1, h.quant_alg_len, fp);
-        }
-        if(h.magic[0] != 0x4e || h.magic[1] != 0xef || h.major != 1 || h.minor != 0
-            || h.bit_width == 0 || h.rank > 8 || h.quant_alg_len >= 1024
-            || (12 + h.rank * 4 + 4 + h.quant_alg_len) != offset || h.offset < offset)
-        {
-            printf("ERROR: invalid or unsupported tensor file: %s\n", fileName.c_str());
-            printf(" [ 0x%02x, 0x%02x, %d, %d, %d, %d, {", h.magic[0], h.magic[1], h.major, h.minor, h.offset, h.rank);
-            for(unsigned int i = 0; i < h.rank; i++) printf(" %d", h.dim[i]);
-            printf(" }, %d, %d, %d, '%s' ] offset = %d\n", h.data_type, h.bit_width, h.quant_alg_len, h.quant_alg, offset);
-            exit(1);
-        }
-        if(h.offset > offset) {
-            fseek(fp, h.offset, SEEK_SET);
-        }
-        unsigned int size = h.bit_width;
-        for(unsigned int i = 0; i < h.rank; i++) {
-            size *= h.dim[i];
-            if(h.dim[i] != shape[i]) {
-                printf("ERROR: dimension[%d] mismatch: %d in %s (must be %d)\n", i, h.dim[i], fileName.c_str(), shape[i]);
-                exit(1);
-            }
-        }
-        size = (size + 7) >> 3;
-        data = nullptr;
-        if(h.data_type == TensorDataType_Float && h.bit_width == 32) {
-            data = new char [size];
-            if(!data) {
-                printf("ERROR: memory allocation for %d bytes failed for %s\n", size, fileName.c_str());
-                exit(1);
-            }
-            unsigned int n = fread(data, 1, size, fp);
-            if(n != size) {
-                printf("ERROR: unable to read %d bytes of data from %s\n", size, fileName.c_str());
-                exit(1);
-            }
-        }
-        else {
-            printf("ERROR: import of Tensor DataType=%d BitWidth=%d is not yet supported\n", h.data_type, h.bit_width);
-            exit(1);
-        }
-        fclose(fp);
-        return size;
-    }
-
-    std::string virtualName(const std::string name)
-    {
-        auto it = virtualRename.find(name);
-        return (it != virtualRename.end()) ? it->second : name;
-    }
-
-    void codeGenOperation(size_t pos, bool getVariables, bool genCode, int verbose)
-    {
-        ////
-        // make sure that operation is not disabled
-        //
-        if(operationRemoved[pos]) {
-            return;
-        }
-
-        ////
-        // get operation details
-        //
-        const nnef::Prototype& proto = opsProto[pos];
-        const nnef::Dictionary<nnef::Value>& args = opsValues[pos];
-        const nnef::Dictionary<nnef::Shape>& shapes = opsShapes[pos];
-        if(verbose & 1) {
-            std::cout << '\t';
-            for ( size_t i = 0; i < proto.resultCount(); ++i ) {
-                auto& result = proto.result(i);
-                if ( i ) std::cout << ", ";
-                std::cout << args[result.name()];
-            }
-            std::cout << " = " << proto.name() << "(";
-            for ( size_t i = 0; i < proto.paramCount(); ++i ) {
-                auto& param = proto.param(i);
-                if ( i ) std::cout << ", ";
-                if ( !param.type()->isTensor() )
-                    std::cout << param.name() << " = ";
-                std::cout << args[param.name()];
-            }
-            std::cout << ")" << std::endl;
-        }
-
-        ////
-        // utility functions
-        //
-        auto getTensorOrScalar = [] (const nnef::Value& v) -> std::string {
-            std::string value = "0";
-            if(v) {
-                if(v.kind() == nnef::Value::Tensor) {
-                    value = v.tensor().id;
-                }
-                else if(v.kind() == nnef::Value::Scalar) {
-                    value = std::to_string(v.scalar());
-                }
-            }
-            return value;
-        };
-        auto getExtentArray = [] (const nnef::Value& v) -> std::vector<int> {
-            std::vector<int> value;
-            if(v && v.kind() == nnef::Value::Array) {
-                auto&& a = v.array();
-                for(auto& i : a) {
-                    value.push_back(i.integer());
-                }
-            }
-            return value;
-        };
-        auto getPaddingInfo = [] (const nnef::Value& v, size_t pad[4]) {
-            std::vector<int> value;
-            if(v && v.kind() == nnef::Value::Array) {
-                auto&& a = v.array();
-                if(a.size() == 2) {
-                    pad[0] = a[0][0].integer();
-                    pad[1] = a[0][1].integer();
-                    pad[2] = a[1][0].integer();
-                    pad[3] = a[1][1].integer();
-                    // TODO: protection against -ve values
-                    if(pad[0] > 16384) pad[0] = 0;
-                    if(pad[1] > 16384) pad[1] = 0;
-                    if(pad[2] > 16384) pad[2] = 0;
-                    if(pad[3] > 16384) pad[3] = 0;
-                }
-            }
-        };
-
-        ////
-        // process operations
-        //
-        std::string opname = proto.name();
-        if(opname == "external") {
-            const std::string& output = args["output"].tensor().id;
-            const nnef::Shape& shape = shapes[output];
-            if(verbose & 2) {
-                std::cout << opname << " " << output << " " << shape << std::endl;
-            }
-            if(getVariables) {
-                inputShape[output] = shape;
-            }
-        }
-        else if(opname == "variable") {
-            const std::string& output = args["output"].tensor().id;
-            const nnef::Shape& shape = shapes[output];
-            const std::string& label = args["label"].string();
-            if(verbose & 2) {
-                std::cout << opname << " " << output << " " << shape << " label=" << label << std::endl;
-            }
-            if(getVariables) {
-                variableList.push_back(output);
-                variableMerged[output] = false;
-                variableShape[output] = shape;
-                variableLabel[output] = label;
-            }
-        }
-        else if(opname == "conv") {
-            const std::string& output = args["output"].tensor().id;
-            const nnef::Shape& shape = shapes[output];
-            const std::string& input = args["input"].tensor().id;
-            const std::string& filter = args["filter"].tensor().id;
-            std::string bias = getTensorOrScalar(args["bias"]);
-            const std::string& border = args["border"].string();
-            const auto& padding = args["padding"];
-            const auto& stride = args["stride"];
-            const auto& dilation = args["dilation"];
-            const auto& groups = args["groups"] ? args["groups"].integer() : 1;
-            if(verbose & 2) {
-                std::cout << opname << " " << output << " " << shape << " " << input << " " << filter << " " << bias
-                    << " border=" << border << " " << padding << " " << stride << " " << dilation << " " << groups << std::endl;
-            }
-            if(getVariables) {
-                if(std::find(outputList.begin(), outputList.end(), output) == outputList.end()) {
-                    virtualList.push_back(output);
-                    virtualShape[output] = shape;
-                }
-                else {
-                    outputShape[output] = shape;
-                }
-                variableRequiredDims[filter] = 4;
-                if(bias[0] != '0') {
-                    variableRequiredDims[bias] = 2;
-                }
-            }
-            if(genCode) {
-                if(std::find(virtualList.begin(), virtualList.end(), output) != virtualList.end()) {
-                    ovxC << codeGenTensorCreate(output, shape, useVirtual, 4);
-                }
-                if(bias[0] == '0') {
-                    if(convNewBiasName.find(output) != convNewBiasName.end()) {
-                        bias = convNewBiasName.find(output)->second;
-                    }
-                }
-                if(shape[2] == 1 && shape[3] == 1) {
-                    ovxC << "    { vx_node node = vxFullyConnectedLayer(graph, " << virtualName(input) << ", " << filter << ", "
-                         << ((bias[0] == '0') ? "NULL" : bias) << ", VX_CONVERT_POLICY_SATURATE, VX_ROUND_POLICY_TO_NEAREST_EVEN, " << output << ");" << std::endl;
-                    ovxC << "      ERROR_CHECK_STATUS(vxReleaseNode(&node));" << std::endl;
-                    ovxC << "    }" << std::endl;
-                }
-                else {
-                    std::vector<int>&& vDilation = getExtentArray(dilation);
-                    size_t pad[4] = { 0, 0, 0, 0 };
-                    getPaddingInfo(padding, pad);
-                    ovxC << "    { vx_nn_convolution_params_t conv_params = { 0 };" << std::endl;
-                    ovxC << "      conv_params.padding_x = " << pad[1] << ";" << std::endl;
-                    ovxC << "      conv_params.padding_y = " << pad[0] << ";" << std::endl;
-                    ovxC << "      conv_params.dilation_x = " << (vDilation.size() > 1 ? vDilation[1] - 1 : 0) << ";" << std::endl;
-                    ovxC << "      conv_params.dilation_y = " << (vDilation.size() > 0 ? vDilation[0] - 1 : 0) << ";" << std::endl;
-                    ovxC << "      conv_params.overflow_policy = " << "VX_CONVERT_POLICY_SATURATE" << ";" << std::endl;
-                    ovxC << "      conv_params.rounding_policy = " << "VX_ROUND_POLICY_TO_NEAREST_EVEN" << ";" << std::endl;
-                    ovxC << "      conv_params.down_scale_size_rounding = " << "VX_NN_DS_SIZE_ROUNDING_FLOOR" << ";" << std::endl;
-                    ovxC << "      vx_node node = vxConvolutionLayer(graph, " << virtualName(input) << ", " << filter << ", "
-                         << ((bias[0] == '0') ? "NULL" : bias) << ", &conv_params, sizeof(conv_params), " << output << ");" << std::endl;
-                    ovxC << "      ERROR_CHECK_STATUS(vxReleaseNode(&node));" << std::endl;
-                    ovxC << "    }" << std::endl;
-                }
-            }
-        }
-        else if(opname == "relu") {
-            const std::string& output = args["y"].tensor().id;
-            const nnef::Shape& shape = shapes[output];
-            const std::string& input = args["x"].tensor().id;
-            if(verbose & 2) {
-                std::cout << opname << " " << output << " " << shape << " " << input << std::endl;
-            }
-            if(getVariables) {
-                if(std::find(outputList.begin(), outputList.end(), output) == outputList.end()) {
-                    virtualList.push_back(output);
-                    virtualShape[output] = shape;
-                }
-                else {
-                    outputShape[output] = shape;
-                }
-            }
-            if(genCode) {
-                if(std::find(virtualList.begin(), virtualList.end(), output) != virtualList.end()) {
-                    ovxC << codeGenTensorCreate(output, shape, useVirtual, 4);
-                }
-                ovxC << "    { vx_node node = vxActivationLayer(graph, " << virtualName(input) << ", VX_NN_ACTIVATION_RELU, 0.0f, 0.0f, " << output << ");" << std::endl;
-                ovxC << "      ERROR_CHECK_STATUS(vxReleaseNode(&node));" << std::endl;
-                ovxC << "    }" << std::endl;
-            }
-        }
-        else if(opname == "max_pool") {
-            const std::string& output = args["output"].tensor().id;
-            const nnef::Shape& shape = shapes[output];
-            const std::string& input = args["input"].tensor().id;
-            const auto& size = args["size"];
-            const std::string& border = args["border"].string();
-            const auto& padding = args["padding"];
-            const auto& stride = args["stride"];
-            const auto& dilation = args["dilation"];
-            if(verbose & 2) {
-                std::cout << opname << " " << output << " " << shape << " " << input
-                    << " size=" << size << " border=" << border << " " << padding << " " << stride << " " << dilation << std::endl;
-            }
-            if(getVariables) {
-                if(std::find(outputList.begin(), outputList.end(), output) == outputList.end()) {
-                    virtualList.push_back(output);
-                    virtualShape[output] = shape;
-                }
-                else {
-                    outputShape[output] = shape;
-                }
-            }
-            if(genCode) {
-                if(std::find(virtualList.begin(), virtualList.end(), output) != virtualList.end()) {
-                    ovxC << codeGenTensorCreate(output, shape, useVirtual, 4);
-                }
-                std::vector<int>&& vSize = getExtentArray(size);
-                size_t pad[4] = { 0, 0, 0, 0 };
-                getPaddingInfo(padding, pad);
-                ovxC << "    { vx_node node = vxPoolingLayer(graph, " << virtualName(input) << ", VX_NN_POOLING_MAX, "
-                     << size[3] << ", " << size[2] << ", " << pad[1] << ", " << pad[0] << ", "
-                     << "VX_ROUND_POLICY_TO_NEAREST_EVEN, " << output << ");" << std::endl;
-                ovxC << "      ERROR_CHECK_STATUS(vxReleaseNode(&node));" << std::endl;
-                ovxC << "    }" << std::endl;
-            }
-        }
-        else if(opname == "avg_pool") {
-            const std::string& output = args["output"].tensor().id;
-            const nnef::Shape& shape = shapes[output];
-            const std::string& input = args["input"].tensor().id;
-            const auto& size = args["size"];
-            const std::string& border = args["border"].string();
-            const auto& padding = args["padding"];
-            const auto& stride = args["stride"];
-            const auto& dilation = args["dilation"];
-            if(verbose & 2) {
-                std::cout << opname << " " << output << " " << shape << " " << input
-                    << " size=" << size << " border=" << border << " " << padding << " " << stride << " " << dilation << std::endl;
-            }
-            if(getVariables) {
-                if(std::find(outputList.begin(), outputList.end(), output) == outputList.end()) {
-                    virtualList.push_back(output);
-                    virtualShape[output] = shape;
-                }
-                else {
-                    outputShape[output] = shape;
-                }
-            }
-            if(genCode) {
-                if(std::find(virtualList.begin(), virtualList.end(), output) != virtualList.end()) {
-                    ovxC << codeGenTensorCreate(output, shape, useVirtual, 4);
-                }
-                std::vector<int>&& vSize = getExtentArray(size);
-                size_t pad[4] = { 0, 0, 0, 0 };
-                getPaddingInfo(padding, pad);
-                ovxC << "    { vx_node node = vxPoolingLayer(graph, " << virtualName(input) << ", VX_NN_POOLING_AVG, "
-                     << size[3] << ", " << size[2] << ", " << pad[1] << ", " << pad[0] << ", "
-                     << "VX_ROUND_POLICY_TO_NEAREST_EVEN, " << output << ");" << std::endl;
-                ovxC << "      ERROR_CHECK_STATUS(vxReleaseNode(&node));" << std::endl;
-                ovxC << "    }" << std::endl;
-            }
-        }
-        else if(opname == "concat") {
-            const std::string& output = args["value"].tensor().id;
-            const nnef::Shape& shape = shapes[output];
-            std::vector<std::string> itemList;
-            const auto& inputpar = args["values"];
-            for(size_t i = 0; i < inputpar.size(); i++) {
-                std::string name = inputpar[i].tensor().id;
-                itemList.push_back(name);
-            }
-            const int axis = args["axis"].integer();
-            if(verbose & 2) {
-                std::cout << opname << " " << output << " " << shape << " [";
-                for(auto& v : itemList) std::cout << " " << v;
-                std::cout << " ] axis=" << axis << std::endl;
-            }
-            if(getVariables) {
-                if(std::find(outputList.begin(), outputList.end(), output) == outputList.end()) {
-                    virtualList.push_back(output);
-                    virtualShape[output] = shape;
-                }
-                else {
-                    outputShape[output] = shape;
-                }
-            }
-            if(genCode) {
-                if(std::find(virtualList.begin(), virtualList.end(), output) != virtualList.end()) {
-                    ovxC << codeGenTensorCreate(output, shape, useVirtual, 4);
-                }
-                ovxC << "    { vx_node node = vxConcatLayer(graph, " << output;
-                for(auto& v : itemList) {
-                    ovxC << ", " << virtualName(v);
-                }
-                for(size_t i = itemList.size(); i < 8; i++) {
-                    ovxC << ", NULL";
-                }
-                ovxC << ");" << std::endl;
-                ovxC << "      ERROR_CHECK_STATUS(vxReleaseNode(&node));" << std::endl;
-                ovxC << "    }" << std::endl;
-            }
-        }
-        else if(opname == "batch_normalization") {
-            const std::string& output = args["output"].tensor().id;
-            const nnef::Shape& shape = shapes[output];
-            const std::string& input = args["input"].tensor().id;
-            const std::string& mean = args["mean"].tensor().id;
-            const std::string& variance = args["variance"].tensor().id;
-            std::string scale = getTensorOrScalar(args["scale"]);
-            std::string offset = getTensorOrScalar(args["offset"]);
-            const float epsilon = args["epsilon"].scalar();
-            if(verbose & 2) {
-                std::cout << opname << " " << output << " " << shape << " " << input
-                    << " " << mean << " " << variance << " " << offset << " " << scale << " " << epsilon << std::endl;
-            }
-            if(getVariables) {
-                if(std::find(outputList.begin(), outputList.end(), output) == outputList.end()) {
-                    virtualList.push_back(output);
-                    virtualShape[output] = shape;
-                }
-                else {
-                    outputShape[output] = shape;
-                }
-            }
-            if(genCode) {
-                if(std::find(virtualList.begin(), virtualList.end(), output) != virtualList.end()) {
-                    ovxC << codeGenTensorCreate(output, shape, useVirtual, 4);
-                }
-                ovxC << "    { vx_node node = vxBatchNormalizationLayer(graph, " << virtualName(input) << ", " << mean << ", " << variance
-                     << ", " << (scale[0] == '1' ? "NULL" : scale) << ", " << (offset[0] == '0' ? "NULL" : offset)
-                     << ", " << epsilon << ", " << output << ");" << std::endl;
-                ovxC << "      ERROR_CHECK_STATUS(vxReleaseNode(&node));" << std::endl;
-                ovxC << "    }" << std::endl;
-            }
-        }
-        else if(opname == "mul") {
-            const std::string& output = args["z"].tensor().id;
-            const nnef::Shape& shape = shapes[output];
-            const std::string& input1 = args["x"].tensor().id;
-            const std::string& input2 = args["y"].tensor().id;
-            if(verbose & 2) {
-                std::cout << opname << " " << output << " " << shape << " " << input1 << " " << input2 << std::endl;
-            }
-            if(getVariables) {
-                if(std::find(outputList.begin(), outputList.end(), output) == outputList.end()) {
-                    virtualList.push_back(output);
-                    virtualShape[output] = shape;
-                }
-                else {
-                    outputShape[output] = shape;
-                }
-            }
-            if(genCode) {
-                if(std::find(virtualList.begin(), virtualList.end(), output) != virtualList.end()) {
-                    ovxC << codeGenTensorCreate(output, shape, useVirtual, 4);
-                }
-                ovxC << "    { float one = 1.0f;" << std::endl;
-                ovxC << "      vx_scalar scale = vxCreateScalar(context, VX_TYPE_FLOAT32, &one);" << std::endl;
-                ovxC << "      vx_node node = vxTensorMultiplyNode(graph, " << virtualName(input1) << ", " << virtualName(input2) << ", scale, VX_CONVERT_POLICY_SATURATE, VX_ROUND_POLICY_TO_NEAREST_EVEN, " << output << ");" << std::endl;
-                ovxC << "      ERROR_CHECK_STATUS(vxReleaseScalar(&scale));" << std::endl;
-                ovxC << "      ERROR_CHECK_STATUS(vxReleaseNode(&node));" << std::endl;
-                ovxC << "    }" << std::endl;
-            }
-        }
-        else if(opname == "add") {
-            const std::string& output = args["z"].tensor().id;
-            const nnef::Shape& shape = shapes[output];
-            const std::string& input1 = args["x"].tensor().id;
-            const std::string& input2 = args["y"].tensor().id;
-            if(verbose & 2) {
-                std::cout << opname << " " << output << " " << shape << " " << input1 << " " << input2 << std::endl;
-            }
-            if(getVariables) {
-                if(std::find(outputList.begin(), outputList.end(), output) == outputList.end()) {
-                    virtualList.push_back(output);
-                    virtualShape[output] = shape;
-                }
-                else {
-                    outputShape[output] = shape;
-                }
-            }
-            if(genCode) {
-                if(std::find(virtualList.begin(), virtualList.end(), output) != virtualList.end()) {
-                    ovxC << codeGenTensorCreate(output, shape, useVirtual, 4);
-                }
-                ovxC << "    { vx_node node = vxTensorAddNode(graph, " << virtualName(input1) << ", " << virtualName(input2) << ", VX_CONVERT_POLICY_SATURATE, " << output << ");" << std::endl;
-                ovxC << "      ERROR_CHECK_STATUS(vxReleaseNode(&node));" << std::endl;
-                ovxC << "    }" << std::endl;
-            }
-        }
-        else if(opname == "softmax") {
-            const std::string& output = args["y"].tensor().id;
-            const nnef::Shape& shape = shapes[output];
-            const std::string& input = args["x"].tensor().id;
-            std::vector<int>&& axes = getExtentArray(args["axes"]);
-            if(verbose & 2) {
-                std::cout << opname << " " << output << " " << shape << " " << input << " " << args["axes"] << std::endl;
-            }
-            if(axes.size() != 1 || axes[0] != 1) {
-                std::cout << "ERROR: " << opname << " with " << args["axes"] << " is *** not yet supported ***" << std::endl;
-                exit(1);
-            }
-            if(getVariables) {
-                if(std::find(outputList.begin(), outputList.end(), output) == outputList.end()) {
-                    virtualList.push_back(output);
-                    virtualShape[output] = shape;
-                }
-                else {
-                    outputShape[output] = shape;
-                }
-            }
-            if(genCode) {
-                if(std::find(virtualList.begin(), virtualList.end(), output) != virtualList.end()) {
-                    ovxC << codeGenTensorCreate(output, shape, useVirtual, 4);
-                }
-                ovxC << "    { vx_node node = vxSoftmaxLayer(graph, " << virtualName(input) << ", " << output << ");" << std::endl;
-                ovxC << "      ERROR_CHECK_STATUS(vxReleaseNode(&node));" << std::endl;
-                ovxC << "    }" << std::endl;
-            }
-        }
-        else if(opname == "sum_reduce") {
-            const std::string& output = args["output"].tensor().id;
-            const nnef::Shape& shape = shapes[output];
-            const std::string& input = args["input"].tensor().id;
-            const auto& axes = args["axes"];
-            const bool normalize = args["normalize"].logical();
-            if(verbose & 2) {
-                std::cout << opname << " " << output << " " << shape << " " << input << " " << axes << " " << normalize << std::endl;
-            }
-            if(getVariables) {
-                if(std::find(outputList.begin(), outputList.end(), output) == outputList.end()) {
-                    virtualList.push_back(output);
-                    virtualShape[output] = shape;
-                }
-                else {
-                    outputShape[output] = shape;
-                }
-            }
-            if(genCode) {
-                if(std::find(virtualList.begin(), virtualList.end(), output) != virtualList.end()) {
-                    ovxC << codeGenTensorCreate(output, shape, useVirtual, 4);
-                }
-                std::cout << opname << " *** not yet supported ***" << std::endl;
-                exit(1);
-            }
-        }
-        else if(opname == "mean_reduce") {
-            const std::string& output = args["output"].tensor().id;
-            const nnef::Shape& shape = shapes[output];
-            const std::string& input = args["input"].tensor().id;
-            const auto& axes = args["axes"];
-            if(verbose & 2) {
-                std::cout << opname << " " << output << " " << shape << " " << input << " " << axes << std::endl;
-            }
-            if(getVariables) {
-                if(std::find(outputList.begin(), outputList.end(), output) == outputList.end()) {
-                    virtualList.push_back(output);
-                    virtualShape[output] = shape;
-                }
-                else {
-                    outputShape[output] = shape;
-                }
-            }
-            if(genCode) {
-                if(std::find(virtualList.begin(), virtualList.end(), output) != virtualList.end()) {
-                    ovxC << codeGenTensorCreate(output, shape, useVirtual, 4);
-                }
-                std::cout << opname << " *** not yet supported ***" << std::endl;
-                exit(1);
-            }
-        }
-        else {
-            std::cout << opname << " *** not yet supported ***" << std::endl;
-            exit(1);
-        }
-    }
-
-    void codeGenMergeVariables()
-    {
-        auto getTensorOrScalar = [] (const nnef::Value& v) -> std::string {
-            std::string value = "0";
-            if(v) {
-                if(v.kind() == nnef::Value::Tensor) {
-                    value = v.tensor().id;
-                }
-                else if(v.kind() == nnef::Value::Scalar) {
-                    value = std::to_string(v.scalar());
-                }
-            }
-            return value;
-        };
-
-        size_t prevPos = 0;
-        std::string prevOpName = "", prevOutput = "";
-        for(size_t pos = 0; pos < opsProto.size(); pos++) {
-            std::string opname = opsProto[pos].name();
-            if(prevOpName == "batch_normalization" && opname == "conv") {
-                // get "batch_normalization" variables
-                const nnef::Dictionary<nnef::Value>& argsBN = opsValues[prevPos];
-                const nnef::Dictionary<nnef::Shape>& shapesBN = opsShapes[prevPos];
-                const std::string& inputBN = argsBN["input"].tensor().id;
-                const std::string& mean = argsBN["mean"].tensor().id;
-                const std::string& variance = argsBN["variance"].tensor().id;
-                std::string scale = getTensorOrScalar(argsBN["scale"]);
-                std::string offset = getTensorOrScalar(argsBN["offset"]);
-                const float epsilon = argsBN["epsilon"].scalar();
-                const nnef::Shape& shapeMean = shapesBN[mean];
-                // get "conv" variables
-                const nnef::Dictionary<nnef::Value>& argsConv = opsValues[pos];
-                const nnef::Dictionary<nnef::Shape>& shapesConv = opsShapes[pos];
-                const std::string& outputConv = argsConv["output"].tensor().id;
-                const std::string& filter = argsConv["filter"].tensor().id;
-                const std::string& bias = getTensorOrScalar(argsConv["bias"]);
-                const nnef::Shape& shapeFilter = shapesConv[filter];
-                // get filter and mean dimensions
-                size_t filterDimsCount = shapeFilter.rank(), meanDimsCount = shapeMean.rank();
-                std::vector<size_t> filterDims, meanDims;
-                getTensorDims(shapeFilter, filterDims, filterDimsCount);
-                getTensorDims(shapeMean, meanDims, meanDimsCount);
-                // check validity of dimensions
-                size_t K = (filterDimsCount == 4) ? filterDims[3] : filterDims[1];
-                size_t N = (filterDimsCount == 4) ? (filterDims[0] * filterDims[1] * filterDims[2]) : filterDims[0];
-                if((filterDimsCount == 4 || filterDimsCount == 2) && meanDimsCount == 2 && K == meanDims[0]) {
-                    // fuse batch_normalization variables into conv variables
-                    std::tuple<size_t, char *> filterBinary = variableBinary[filter];
-                    std::tuple<size_t, char *> meanBinary = variableBinary[mean];
-                    std::tuple<size_t, char *> varianceBinary = variableBinary[variance];
-                    float * filterBuf = (float *)std::get<1>(filterBinary);
-                    float * biasBuf = nullptr;
-                    float * meanBuf = (float *)std::get<1>(meanBinary);
-                    float * varianceBuf = (float *)std::get<1>(varianceBinary);
-                    float * scaleBuf = nullptr;
-                    float * offsetBuf = nullptr;
-                    if(bias[0] != '0') {
-                        std::tuple<size_t, char *> biasBinary = variableBinary[bias];
-                        biasBuf = (float *)std::get<1>(biasBinary);
-                    }
-                    else if(convNewBiasName.find(outputConv) != convNewBiasName.end()) {
-                        std::tuple<size_t, char *> biasBinary = variableBinary[convNewBiasName[outputConv]];
-                        biasBuf = (float *)std::get<1>(biasBinary);
-                    }
-                    else {
-                        size_t size = K * sizeof(float);
-                        char * data = new char [size];
-                        biasBuf = (float *)data;
-                        for(size_t i = 0; i < K; i++) {
-                            biasBuf[i] = 0;
-                        }
-                        std::string name = filter + "__new_bias";
-                        std::tuple<size_t, char *> binary(size, data);
-                        variableBinary[name] = binary;
-                        convNewBiasName[outputConv] = name;
-                        variableList.push_back(name);
-                        variableMerged[name] = false;
-                        nnef::Shape shape(1);
-                        shape[0] = K;
-                        shape[1] = 1;
-                        variableShape[name] = shape;
-                        variableRequiredDims[name] = 2;
-                    }
-                    if(scale[0] != '1') {
-                        scaleBuf = (float *)std::get<1>(variableBinary[scale]);
-                    }
-                    if(offset[0] != '0') {
-                        offsetBuf = (float *)std::get<1>(variableBinary[offset]);
-                    }
-                    for(size_t k = 0; k < K; k++) {
-                        double mk = 1.0 / sqrt((double)varianceBuf[k] + epsilon);
-                        double ck = -meanBuf[k] * mk;
-                        if(scaleBuf) {
-                            mk *= scaleBuf[k];
-                            ck *= scaleBuf[k];
-                        }
-                        if(offsetBuf) {
-                            ck += offsetBuf[k];
-                        }
-                        float * W = &filterBuf[k*N];
-                        double Wsum = 0;
-                        for(size_t j = 0; j < N; j++) {
-                            Wsum += W[j];
-                            W[j] = (float)(W[j] * mk);
-                        }
-                        if(biasBuf) {
-                            biasBuf[k] = (float)(Wsum * ck + biasBuf[k]);
-                        }
-                    }
-                    // mark that batch_normalization is disabled and rename output as input
-                    operationRemoved[prevPos] = true;
-                    virtualRename[argsConv["input"].tensor().id] = inputBN;
-                    // mark the merged variables
-                    variableMerged[mean] = true;
-                    variableMerged[variance] = true;
-                    if(scaleBuf) variableMerged[scale] = true;
-                    if(offsetBuf) variableMerged[offset] = true;
-                }
-                // use conv as previous layer
-                prevPos = pos;
-                prevOpName = opname;
-                prevOutput = argsConv["output"].tensor().id;
-            }
-            else if(prevOpName == "conv" && opname == "batch_normalization") {
-                // get "conv" variables
-                const nnef::Dictionary<nnef::Value>& argsConv = opsValues[prevPos];
-                const nnef::Dictionary<nnef::Shape>& shapesConv = opsShapes[prevPos];
-                const std::string& outputConv = argsConv["output"].tensor().id;
-                const std::string& filter = argsConv["filter"].tensor().id;
-                const std::string& bias = getTensorOrScalar(argsConv["bias"]);
-                const nnef::Shape& shapeFilter = shapesConv[filter];
-                // get "batch_normalization" variables
-                const nnef::Dictionary<nnef::Value>& argsBN = opsValues[pos];
-                const nnef::Dictionary<nnef::Shape>& shapesBN = opsShapes[pos];
-                const std::string& mean = argsBN["mean"].tensor().id;
-                const std::string& variance = argsBN["variance"].tensor().id;
-                std::string scale = getTensorOrScalar(argsBN["scale"]);
-                std::string offset = getTensorOrScalar(argsBN["offset"]);
-                const float epsilon = argsBN["epsilon"].scalar();
-                const nnef::Shape& shapeMean = shapesBN[mean];
-                // get filter and mean dimensions
-                size_t filterDimsCount = shapeFilter.rank(), meanDimsCount = shapeMean.rank();
-                std::vector<size_t> filterDims, meanDims;
-                getTensorDims(shapeFilter, filterDims, filterDimsCount);
-                getTensorDims(shapeMean, meanDims, meanDimsCount);
-                // check validity of dimensions
-                size_t K = (filterDimsCount == 4) ? filterDims[3] : filterDims[1];
-                size_t N = (filterDimsCount == 4) ? (filterDims[0] * filterDims[1] * filterDims[2]) : filterDims[0];
-                if((filterDimsCount == 4 || filterDimsCount == 2) && meanDimsCount == 2 && K == meanDims[0]) {
-                    // fuse batch_normalization variables into conv variables
-                    std::tuple<size_t, char *> filterBinary = variableBinary[filter];
-                    std::tuple<size_t, char *> meanBinary = variableBinary[mean];
-                    std::tuple<size_t, char *> varianceBinary = variableBinary[variance];
-                    float * filterBuf = (float *)std::get<1>(filterBinary);
-                    float * biasBuf = nullptr;
-                    float * meanBuf = (float *)std::get<1>(meanBinary);
-                    float * varianceBuf = (float *)std::get<1>(varianceBinary);
-                    float * scaleBuf = nullptr;
-                    float * offsetBuf = nullptr;
-                    if(bias[0] != '0') {
-                        std::tuple<size_t, char *> biasBinary = variableBinary[bias];
-                        biasBuf = (float *)std::get<1>(biasBinary);
-                    }
-                    else if(convNewBiasName.find(outputConv) != convNewBiasName.end()) {
-                        std::tuple<size_t, char *> biasBinary = variableBinary[convNewBiasName[outputConv]];
-                        biasBuf = (float *)std::get<1>(biasBinary);
-                    }
-                    else {
-                        size_t size = K * sizeof(float);
-                        char * data = new char [size];
-                        biasBuf = (float *)data;
-                        for(size_t i = 0; i < K; i++) {
-                            biasBuf[i] = 0;
-                        }
-                        std::string name = filter + "__new_bias";
-                        std::tuple<size_t, char *> binary(size, data);
-                        variableBinary[name] = binary;
-                        convNewBiasName[outputConv] = name;
-                        variableList.push_back(name);
-                        variableMerged[name] = false;
-                        nnef::Shape shape(1);
-                        shape[0] = K;
-                        shape[1] = 1;
-                        variableShape[name] = shape;
-                        variableRequiredDims[name] = 2;
-                    }
-                    if(scale[0] != '1') {
-                        scaleBuf = (float *)std::get<1>(variableBinary[scale]);
-                    }
-                    if(offset[0] != '0') {
-                        offsetBuf = (float *)std::get<1>(variableBinary[offset]);
-                    }
-                    for(size_t k = 0; k < K; k++) {
-                        double mk = 1.0 / sqrt((double)varianceBuf[k] + epsilon);
-                        double ck = -meanBuf[k] * mk;
-                        if(scaleBuf) {
-                            mk *= scaleBuf[k];
-                            ck *= scaleBuf[k];
-                        }
-                        if(offsetBuf) {
-                            ck += offsetBuf[k];
-                        }
-                        float * W = &filterBuf[k*N];
-                        for(size_t j = 0; j < N; j++) {
-                            W[j] = (float)(W[j] * mk);
-                        }
-                        if(biasBuf) {
-                            biasBuf[k] = (float)(mk * biasBuf[k] + ck);
-                        }
-                    }
-                    // mark that batch_normalization is disabled, rename output as input, and use conv as previous layer
-                    operationRemoved[pos] = true;
-                    virtualRename[argsBN["output"].tensor().id] = outputConv;
-                    prevOutput = argsBN["output"].tensor().id;
-                    // mark the merged variables
-                    variableMerged[mean] = true;
-                    variableMerged[variance] = true;
-                    if(scaleBuf) variableMerged[scale] = true;
-                    if(offsetBuf) variableMerged[offset] = true;
-                }
-                else {
-                    // use batch_normalization as previous layer
-                    prevPos = pos;
-                    prevOpName = opname;
-                    prevOutput = argsBN["output"].tensor().id;
-                }
-            }
-            else if((prevOpName == "mul" || prevOpName == "add") && opname == "conv") {
-                // get "mul" or "add" variables
-                const nnef::Dictionary<nnef::Value>& argsOP = opsValues[prevPos];
-                const nnef::Dictionary<nnef::Shape>& shapesOP = opsShapes[prevPos];
-                const std::string& x = argsOP["x"].tensor().id;
-                const std::string& y = argsOP["y"].tensor().id;
-                std::string var, inputBN;
-                nnef::Shape shapeVar;
-                if(std::find(variableList.begin(), variableList.end(), x) != variableList.end()) {
-                    inputBN = y;
-                    var = x;
-                    shapeVar = shapesOP[x];
-                }
-                else if(std::find(variableList.begin(), variableList.end(), y) != variableList.end()) {
-                    inputBN = x;
-                    var = y;
-                    shapeVar = shapesOP[y];
-                }
-                // get "conv" variables
-                const nnef::Dictionary<nnef::Value>& argsConv = opsValues[pos];
-                const nnef::Dictionary<nnef::Shape>& shapesConv = opsShapes[pos];
-                const std::string& outputConv = argsConv["output"].tensor().id;
-                const std::string& filter = argsConv["filter"].tensor().id;
-                const std::string& bias = getTensorOrScalar(argsConv["bias"]);
-                const nnef::Shape& shapeFilter = shapesConv[filter];
-                // get var dimensions
-                size_t filterDimsCount = shapeFilter.rank(), varDimsCount = 0;
-                std::vector<size_t> filterDims, varDims;
-                getTensorDims(shapeFilter, filterDims, filterDimsCount);
-                if(var.length() > 0) {
-                    varDimsCount = shapeVar.rank();
-                    getTensorDims(shapeVar, varDims, varDimsCount);
-                }
-                // check validity of dimensions
-                size_t K = (filterDimsCount == 4) ? filterDims[3] : filterDims[1];
-                size_t N = (filterDimsCount == 4) ? (filterDims[0] * filterDims[1] * filterDims[2]) : filterDims[0];
-                if((filterDimsCount == 4 || filterDimsCount == 2) && varDimsCount == 2 && K == varDims[0]) {
-                    // fuse var into conv variables
-                    std::tuple<size_t, char *> filterBinary = variableBinary[filter];
-                    std::tuple<size_t, char *> biasBinary = variableBinary[bias];
-                    std::tuple<size_t, char *> varBinary = variableBinary[var];
-                    float * filterBuf = (float *)std::get<1>(filterBinary);
-                    float * biasBuf = nullptr;
-                    float * varBuf = (float *)std::get<1>(varBinary);
-                    if(bias[0] != '0') {
-                        std::tuple<size_t, char *> biasBinary = variableBinary[bias];
-                        biasBuf = (float *)std::get<1>(biasBinary);
-                    }
-                    else if(convNewBiasName.find(outputConv) != convNewBiasName.end()) {
-                        std::tuple<size_t, char *> biasBinary = variableBinary[convNewBiasName[outputConv]];
-                        biasBuf = (float *)std::get<1>(biasBinary);
-                    }
-                    else {
-                        size_t size = K * sizeof(float);
-                        char * data = new char [size];
-                        biasBuf = (float *)data;
-                        for(size_t i = 0; i < K; i++) {
-                            biasBuf[i] = 0;
-                        }
-                        std::string name = filter + "__new_bias";
-                        std::tuple<size_t, char *> binary(size, data);
-                        variableBinary[name] = binary;
-                        convNewBiasName[outputConv] = name;
-                        variableList.push_back(name);
-                        variableMerged[name] = false;
-                        nnef::Shape shape(1);
-                        shape[0] = K;
-                        shape[1] = 1;
-                        variableShape[name] = shape;
-                        variableRequiredDims[name] = 2;
-                    }
-                    if(prevOpName == "mul") {
-                        for(size_t k = 0; k < K; k++) {
-                            double mk = varBuf[k];
-                            size_t N = filterDims[0] * filterDims[1] * filterDims[2];
-                            float * W = &filterBuf[k*N];
-                            for(size_t j = 0; j < N; j++) {
-                                W[j] = (float)(W[j] * mk);
-                            }
-                        }
-                    }
-                    else {
-                        for(size_t k = 0; k < K; k++) {
-                            double ck = varBuf[k];
-                            size_t N = filterDims[0] * filterDims[1] * filterDims[2];
-                            float * W = &filterBuf[k*N];
-                            double Wsum = 0;
-                            for(size_t j = 0; j < N; j++) {
-                                Wsum += W[j];
-                            }
-                            biasBuf[k] = (float)(ck * Wsum + biasBuf[k]);
-                        }
-                    }
-                    // mark that OP is disabled, rename output as input, and use conv as previous layer
-                    operationRemoved[prevPos] = true;
-                    virtualRename[argsConv["input"].tensor().id] = inputBN;
-                    prevOutput = argsConv["output"].tensor().id;
-                    // mark the merged variables
-                    variableMerged[var] = true;
-                }
-                else {
-                    // use conv as previous layer
-                    prevPos = pos;
-                    prevOpName = opname;
-                    prevOutput = argsConv["output"].tensor().id;
-                }
-            }
-            else if(prevOpName == "conv" && (opname == "mul" || opname == "add")) {
-                // get "conv" variables
-                const nnef::Dictionary<nnef::Value>& argsConv = opsValues[prevPos];
-                const nnef::Dictionary<nnef::Shape>& shapesConv = opsShapes[prevPos];
-                const std::string& outputConv = argsConv["output"].tensor().id;
-                const std::string& filter = argsConv["filter"].tensor().id;
-                const std::string& bias = getTensorOrScalar(argsConv["bias"]);
-                const nnef::Shape& shapeFilter = shapesConv[filter];
-                // get "mul" or "add" variables
-                const nnef::Dictionary<nnef::Value>& argsOP = opsValues[pos];
-                const nnef::Dictionary<nnef::Shape>& shapesOP = opsShapes[pos];
-                const std::string& x = argsOP["x"].tensor().id;
-                const std::string& y = argsOP["y"].tensor().id;
-                std::string var;
-                nnef::Shape shapeVar;
-                if(std::find(variableList.begin(), variableList.end(), x) != variableList.end()) {
-                    var = x;
-                    shapeVar = shapesOP[x];
-                }
-                else if(std::find(variableList.begin(), variableList.end(), y) != variableList.end()) {
-                    var = y;
-                    shapeVar = shapesOP[y];
-                }
-                // get var dimensions
-                size_t filterDimsCount = shapeFilter.rank(), varDimsCount = 0;
-                std::vector<size_t> filterDims, varDims;
-                getTensorDims(shapeFilter, filterDims, filterDimsCount);
-                if(var.length() > 0) {
-                    varDimsCount = shapeVar.rank();
-                    getTensorDims(shapeVar, varDims, varDimsCount);
-                }
-                // check validity of dimensions
-                size_t K = (filterDimsCount == 4) ? filterDims[3] : filterDims[1];
-                size_t N = (filterDimsCount == 4) ? (filterDims[0] * filterDims[1] * filterDims[2]) : filterDims[0];
-                if((filterDimsCount == 4 || filterDimsCount == 2) && varDimsCount == 2 && K == varDims[0]) {
-                    // fuse var into conv variables
-                    std::tuple<size_t, char *> filterBinary = variableBinary[filter];
-                    std::tuple<size_t, char *> biasBinary = variableBinary[bias];
-                    std::tuple<size_t, char *> varBinary = variableBinary[var];
-                    float * filterBuf = (float *)std::get<1>(filterBinary);
-                    float * biasBuf = nullptr;
-                    float * varBuf = (float *)std::get<1>(varBinary);
-                    if(bias[0] != '0') {
-                        std::tuple<size_t, char *> biasBinary = variableBinary[bias];
-                        biasBuf = (float *)std::get<1>(biasBinary);
-                    }
-                    else if(convNewBiasName.find(outputConv) != convNewBiasName.end()) {
-                        std::tuple<size_t, char *> biasBinary = variableBinary[convNewBiasName[outputConv]];
-                        biasBuf = (float *)std::get<1>(biasBinary);
-                    }
-                    else {
-                        size_t size = K * sizeof(float);
-                        char * data = new char [size];
-                        biasBuf = (float *)data;
-                        for(size_t i = 0; i < K; i++) {
-                            biasBuf[i] = 0;
-                        }
-                        std::string name = filter + "__new_bias";
-                        std::tuple<size_t, char *> binary(size, data);
-                        variableBinary[name] = binary;
-                        convNewBiasName[outputConv] = name;
-                        variableList.push_back(name);
-                        variableMerged[name] = false;
-                        nnef::Shape shape(1);
-                        shape[0] = K;
-                        shape[1] = 1;
-                        variableShape[name] = shape;
-                        variableRequiredDims[name] = 2;
-                    }
-                    if(opname == "mul") {
-                        for(size_t k = 0; k < K; k++) {
-                            double mk = varBuf[k];
-                            float * W = &filterBuf[k*N];
-                            for(size_t j = 0; j < N; j++) {
-                                W[j] = (float)(W[j] * mk);
-                            }
-                            if(biasBuf) {
-                                biasBuf[k] = (float)(mk * biasBuf[k]);
-                            }
-                        }
-                    }
-                    else {
-                        for(size_t k = 0; k < K; k++) {
-                            float ck = varBuf[k];
-                            biasBuf[k] = biasBuf[k] + ck;
-                        }
-                    }
-                    // mark that OP is disabled, rename output as input, and use conv as previous layer
-                    operationRemoved[pos] = true;
-                    virtualRename[argsOP["z"].tensor().id] = outputConv;
-                    prevOutput = argsOP["z"].tensor().id;
-                    // mark the merged variables
-                    variableMerged[var] = true;
-                }
-                else {
-                    // use OP as previous layer
-                    prevPos = pos;
-                    prevOpName =
opname; - prevOutput = argsOP["z"].tensor().id; - } - } - else if(opname == "max_pool" || opname == "avg_pool") { - const nnef::Dictionary& args = opsValues[pos]; - const std::string& input = args["input"].tensor().id; - if(input != prevOutput || prevOpName != "conv") { - prevPos = pos; - prevOpName = opname; - } - prevOutput = args["output"].tensor().id; - } - else if(opname == "conv" || opname == "batch_normalization") { - const nnef::Dictionary& args = opsValues[pos]; - const std::string& input = args["input"].tensor().id; - prevPos = pos; - prevOpName = opname; - prevOutput = args["output"].tensor().id; - } - else if(opname == "add" || opname == "mul") { - const nnef::Dictionary& args = opsValues[pos]; - const std::string& input1 = args["x"].tensor().id; - const std::string& input2 = args["y"].tensor().id; - prevPos = pos; - prevOpName = opname; - prevOutput = args["z"].tensor().id; - } - else { - prevPos = 0; - prevOpName = ""; - prevOutput = ""; - } - } - } - -protected: - //// - // translator callback implementations - // - virtual void beginGraph( const nnef::Prototype& proto ) - { - // show NNEF syntax - if(verbose & 1) { - std::cout << "graph " << proto.name() << "( "; - for ( size_t i = 0; i < proto.paramCount(); ++i ) { - auto& param = proto.param(i); - if ( i ) std::cout << ", "; - std::cout << param.name(); - } - std::cout << " ) -> ( "; - for ( size_t i = 0; i < proto.resultCount(); ++i ) { - auto& result = proto.result(i); - if ( i ) std::cout << ", "; - std::cout << result.name(); - } - std::cout << " )" << std::endl << '{' << std::endl; - } - - //// - // get input and output parameter list - // - for (size_t i = 0; i < proto.paramCount(); ++i) { - inputList.push_back(proto.param(i).name()); - } - for (size_t i = 0; i < proto.resultCount(); ++i) { - outputList.push_back(proto.result(i).name()); - } - - //// - // generate OpenVX C code preamble - // - openvxFilenameC = openvxFolder + "/annmodule.cpp"; - ovxC.open(openvxFilenameC); - if(!ovxC) { - 
printf("ERROR: unable to create: %s\n", openvxFilenameC.c_str()); - exit(1); - } - } - - virtual void endGraph( const nnef::Prototype& proto ) - { - // show NNEF syntax - if(verbose & 1) { - std::cout << '}' << std::endl; - } - - //// - // generate OpenVX C code preamble - // - ovxC << "#include \"annmodule.h\"" << std::endl - << "#include " << std::endl - << "#include " << std::endl - << "#include " << std::endl - << "#include " << std::endl - << std::endl - << "#define ERROR_CHECK_OBJECT(obj) { vx_status status = vxGetStatus((vx_reference)(obj)); if(status != VX_SUCCESS) { vxAddLogEntry((vx_reference)context, status , \"ERROR: failed with status = (%d) at \" __FILE__ \"#%d\\n\", status, __LINE__); return status; } }" << std::endl - << "#define ERROR_CHECK_STATUS(call) { vx_status status = (call); if(status != VX_SUCCESS) { vxAddLogEntry((vx_reference)context, status, \"ERROR: failed with status = (%d) at \" __FILE__ \"#%d\\n\", status, __LINE__); return status; } }" << std::endl - << std::endl - << "static vx_status initializeTensor(vx_context context, vx_tensor tensor, FILE * fp, const char * binaryFilename)" << std::endl - << "{" << std::endl - << " vx_enum data_type = VX_TYPE_FLOAT32;" << std::endl - << " vx_size num_of_dims = 4, dims[4] = { 1, 1, 1, 1 }, stride[4];" << std::endl - << " ERROR_CHECK_STATUS(vxQueryTensor(tensor, VX_TENSOR_DATA_TYPE, &data_type, sizeof(vx_enum)));" << std::endl - << " ERROR_CHECK_STATUS(vxQueryTensor(tensor, VX_TENSOR_NUMBER_OF_DIMS, &num_of_dims, sizeof(vx_size)));" << std::endl - << " ERROR_CHECK_STATUS(vxQueryTensor(tensor, VX_TENSOR_DIMS, &dims, num_of_dims * sizeof(vx_size)));" << std::endl - << " vx_size itemsize = sizeof(float);" << std::endl - << " if(data_type == VX_TYPE_UINT8 || data_type == VX_TYPE_INT8) {" << std::endl - << " itemsize = sizeof(vx_uint8);" << std::endl - << " }" << std::endl - << " else if(data_type == VX_TYPE_UINT16 || data_type == VX_TYPE_INT16 || data_type == VX_TYPE_FLOAT16) {" << std::endl - << " 
itemsize = sizeof(vx_uint16);" << std::endl - << " }" << std::endl - << " vx_size count = dims[0] * dims[1] * dims[2] * dims[3];" << std::endl - << std::endl - << " vx_uint32 h[2] = { 0 };" << std::endl - << " fread(h, 1, sizeof(h), fp);" << std::endl - << " if(h[0] != 0x" << std::hex << VARIABLES_DATA_MAGIC << std::dec << " || (vx_size)h[1] != (count*itemsize)) {" << std::endl - << " vxAddLogEntry((vx_reference)tensor, VX_FAILURE, \"ERROR: invalid data (magic,size)=(0x%x,%d) in %s at byte position %d -- expected size is %ld\\n\", h[0], h[1], binaryFilename, ftell(fp)-sizeof(h), count*itemsize);" << std::endl - << " return VX_FAILURE;" << std::endl - << " }" << std::endl - << std::endl - << " vx_map_id map_id;" << std::endl - << " float * ptr;" << std::endl - << " ERROR_CHECK_STATUS(vxMapTensorPatch(tensor, num_of_dims, nullptr, nullptr, &map_id, stride, (void **)&ptr, VX_WRITE_ONLY, VX_MEMORY_TYPE_HOST));" << std::endl - << " vx_size n = fread(ptr, itemsize, count, fp);" << std::endl - << " if(n != count) {" << std::endl - << " vxAddLogEntry((vx_reference)tensor, VX_FAILURE, \"ERROR: expected char[%ld], but got char[%ld] in %s\\n\", count*itemsize, n*itemsize, binaryFilename);" << std::endl - << " return VX_FAILURE;" << std::endl - << " }" << std::endl - << " ERROR_CHECK_STATUS(vxUnmapTensorPatch(tensor, map_id));" << std::endl - << std::endl - << " return VX_SUCCESS;" << std::endl - << "}" << std::endl - << std::endl - << "vx_status annAddToGraph(vx_graph graph"; - for(auto& name : inputList) { - ovxC << ", vx_tensor " << name; - } - for(auto& name : outputList) { - ovxC << ", vx_tensor " << name; - } - ovxC << ", const char * binaryFilename)" << std::endl - << "{" << std::endl - << " vx_context context = vxGetContext((vx_reference)graph);" << std::endl - << " ERROR_CHECK_OBJECT(context);" << std::endl - << " ERROR_CHECK_STATUS(vxLoadKernels(context, \"vx_nn\"));" << std::endl; - - //// - // get variables - // - for(size_t i = 0; i < opsProto.size(); i++) { - 
codeGenOperation(i, true, false, verbose); - } - - //// - // get data - // - for(auto& name : variableList) { - unsigned int size = 0; - char * data = nullptr; - if(variableShape.find(name) != variableShape.end() && variableLabel.find(name) != variableLabel.end()) { - auto& shape = variableShape[name]; - auto& label = variableLabel[name]; - size = loadTensorFile(nnefFolder, label, shape, data); - } - if(size > 0 && data) { - std::tuple binary(size, data); - variableBinary[name] = binary; - } - else { - printf("ERROR: unable to load binary data for variable '%s'\n", name.c_str()); - exit(1); - } - } - - //// - // merge variables - // - codeGenMergeVariables(); - - //// - // create and initialize variables file - // - ovxC << std::endl; - ovxC << " // create variables" << std::endl; - for(auto& name : variableList) { - if(!variableMerged[name]) { - if(variableShape.find(name) != variableShape.end()) { - auto& shape = variableShape[name]; - int num_dims = 0; - auto it = variableRequiredDims.find(name); - if(it != variableRequiredDims.end()) { - num_dims = it->second; - } - ovxC << codeGenTensorCreate(name, shape, false, num_dims); - } - else { - printf("ERROR: something wrong with variable '%s': variableShape is missing\n", name.c_str()); - exit(1); - } - } - } - ovxC << std::endl - << " // initialize variables" << std::endl - << " FILE * fp__variables = fopen(binaryFilename, \"rb\");" << std::endl - << " if(!fp__variables) {" << std::endl - << " vxAddLogEntry((vx_reference)context, VX_FAILURE, \"ERROR: unable to open: %s\\n\", binaryFilename);" << std::endl - << " return VX_FAILURE;" << std::endl - << " }" << std::endl - << " { vx_uint32 magic = 0;" << std::endl - << " fread(&magic, 1, sizeof(magic), fp__variables);" << std::endl - << " if(magic != 0x" << std::hex << VARIABLES_FILE_MAGIC << std::dec << ") {" << std::endl - << " vxAddLogEntry((vx_reference)context, VX_FAILURE, \"ERROR: invalid file magic in %s\\n\", binaryFilename);" << std::endl - << " return 
VX_FAILURE;" << std::endl - << " }" << std::endl - << " }" << std::endl; - std::string variablesFilename = openvxFolder + "/weights.bin"; - FILE * fpVariables = fopen(variablesFilename.c_str(), "wb"); - if(!fpVariables) { - printf("ERROR: unable to create: %s\n", variablesFilename.c_str()); - exit(1); - } - unsigned int magic_file = VARIABLES_FILE_MAGIC; - unsigned int magic_data = VARIABLES_DATA_MAGIC; - fwrite(&magic_file, 1, sizeof(magic_file), fpVariables); - for(auto& name : variableList) { - if(!variableMerged[name]) { - if(variableShape.find(name) != variableShape.end()) { - auto& shape = variableShape[name]; - std::tuple binary = variableBinary[name]; - unsigned int size = std::get<0>(binary); - char * data = std::get<1>(binary); - if(size > 0 && data) { - fwrite(&magic_data, 1, sizeof(magic_data), fpVariables); - fwrite(&size, 1, sizeof(size), fpVariables); - fwrite(data, 1, size, fpVariables); - delete[] data; - std::tuple empty(0, nullptr); - variableBinary[name] = empty; - ovxC << " ERROR_CHECK_STATUS(initializeTensor(context, " << name << ", fp__variables, binaryFilename));" << std::endl; - } - else { - printf("ERROR: something wrong with variable '%s': variableBinary is not valid\n", name.c_str()); - exit(1); - } - } - else { - printf("ERROR: something wrong with variable '%s': variableShape is missing\n", name.c_str()); - exit(1); - } - } - } - unsigned int magic_eoff = VARIABLES_EOFF_MAGIC; - fwrite(&magic_eoff, 1, sizeof(magic_eoff), fpVariables); - fclose(fpVariables); - ovxC << " { vx_uint32 magic = 0;" << std::endl - << " fread(&magic, 1, sizeof(magic), fp__variables);" << std::endl - << " if(magic != 0x" << std::hex << VARIABLES_EOFF_MAGIC << std::dec << ") {" << std::endl - << " vxAddLogEntry((vx_reference)context, VX_FAILURE, \"ERROR: invalid eoff magic in %s\\n\", binaryFilename);" << std::endl - << " return VX_FAILURE;" << std::endl - << " }" << std::endl - << " fclose(fp__variables);" << std::endl - << " }" << std::endl; - std::cout << 
"OK: created '" << variablesFilename << "'" << std::endl; - - //// - // instantiate nodes in graph - // - ovxC << std::endl; - ovxC << " // create nodes in graph" << std::endl; - for(auto i = 0; i < opsProto.size(); i++) { - codeGenOperation(i, false, true, 0); - } - - //// - // generate clean-up code - // - ovxC << std::endl; - ovxC << " // release internal tensors" << std::endl; - for(auto& name : virtualList) { - if(virtualRename.find(name) == virtualRename.end()) { - ovxC << " ERROR_CHECK_STATUS(vxReleaseTensor(&" << name << "));" << std::endl; - } - } - for(auto& name : variableList) { - if(!variableMerged[name]) { - ovxC << " ERROR_CHECK_STATUS(vxReleaseTensor(&" << name << "));" << std::endl; - } - } - ovxC << std::endl; - ovxC << " return VX_SUCCESS;" << std::endl; - ovxC << "}" << std::endl; - ovxC.close(); - std::cout << "OK: created '" << openvxFilenameC << "'" << std::endl; - - //// - // generate OpenVX header file - // - openvxFilenameC = openvxFolder + "/annmodule.h"; - ovxC.open(openvxFilenameC); - if(!ovxC) { - printf("ERROR: unable to create: %s\n", openvxFilenameC.c_str()); - exit(1); - } - ovxC << "#ifndef included_file_annmodule_h" << std::endl - << "#define included_file_annmodule_h" << std::endl - << std::endl - << "#include " << std::endl - << std::endl; - ovxC << "////" << std::endl - << "// initialize graph neural network for inference" << std::endl; - for(auto& name : inputList) { - if(inputShape.find(name) != inputShape.end()) { - std::vector dims; - getTensorDims(inputShape[name], dims, 4); - ovxC << "// " << name << " -- dims[] = {"; - for(size_t i = 0; i < dims.size(); i++) { - ovxC << (i == 0 ? " " : ", ") << dims[i]; - } - ovxC << " } (input)" << std::endl; - } - } - for(auto& name : outputList) { - if(outputShape.find(name) != outputShape.end()) { - std::vector dims; - getTensorDims(outputShape[name], dims, 4); - ovxC << "// " << name << " -- dims[] = {"; - for(size_t i = 0; i < dims.size(); i++) { - ovxC << (i == 0 ? 
" " : ", ") << dims[i]; - } - ovxC << " } (output)" << std::endl; - } - } - ovxC << "//" << std::endl - << "vx_status annAddToGraph(vx_graph graph"; - for(auto& name : inputList) { - ovxC << ", vx_tensor " << name; - } - for(auto& name : outputList) { - ovxC << ", vx_tensor " << name; - } - ovxC << ", const char * binaryFilename);" << std::endl - << std::endl - << "#endif" << std::endl; - ovxC.close(); - std::cout << "OK: created '" << openvxFilenameC << "'" << std::endl; - - //// - // generate a simple test program - // - openvxFilenameC = openvxFolder + "/anntest.cpp"; - ovxC.open(openvxFilenameC); - if(!ovxC) { - printf("ERROR: unable to create: %s\n", openvxFilenameC.c_str()); - exit(1); - } - ovxC << "#include \"annmodule.h\"" << std::endl - << "#include " << std::endl - << "#include " << std::endl - << "#include " << std::endl - << "#include " << std::endl - << "#include " << std::endl - << "#include " << std::endl - << "#include " << std::endl - << "#include " << std::endl - << "" << std::endl - << "#if ENABLE_OPENCV" << std::endl - << "#include " << std::endl - << "using namespace cv; " << std::endl - << "#endif" << std::endl - << "" << std::endl - << "#define ERROR_CHECK_STATUS(call) { vx_status status = (call); if(status != VX_SUCCESS) { printf(\"ERROR: failed with status = (%d) at \" __FILE__ \"#%d\", status, __LINE__); return -1; } }" << std::endl - << "" << std::endl - << "static void VX_CALLBACK log_callback(vx_context context, vx_reference ref, vx_status status, const vx_char string[])" << std::endl - << "{" << std::endl - << " size_t len = strlen(string);" << std::endl - << " if (len > 0) {" << std::endl - << " printf(\"%s\", string);" << std::endl - << " if (string[len - 1] != '\\n')" << std::endl - << " printf(\"\\n\");" << std::endl - << " fflush(stdout);" << std::endl - << " }" << std::endl - << "}" << std::endl - << "" << std::endl - << "inline int64_t clockCounter()" << std::endl - << "{" << std::endl - << " return 
std::chrono::high_resolution_clock::now().time_since_epoch().count();" << std::endl - << "}" << std::endl - << "" << std::endl - << "inline int64_t clockFrequency()" << std::endl - << "{" << std::endl - << " return std::chrono::high_resolution_clock::period::den / std::chrono::high_resolution_clock::period::num;" << std::endl - << "}" << std::endl - << "" << std::endl - << "static vx_status copyTensor(vx_tensor tensor, std::string fileName, vx_enum usage = VX_WRITE_ONLY)" << std::endl - << "{" << std::endl - << " vx_enum data_type = VX_TYPE_FLOAT32;" << std::endl - << " vx_size num_of_dims = 4, dims[4] = { 1, 1, 1, 1 }, stride[4];" << std::endl - << " vxQueryTensor(tensor, VX_TENSOR_DATA_TYPE, &data_type, sizeof(data_type));" << std::endl - << " vxQueryTensor(tensor, VX_TENSOR_NUMBER_OF_DIMS, &num_of_dims, sizeof(num_of_dims));" << std::endl - << " vxQueryTensor(tensor, VX_TENSOR_DIMS, &dims, sizeof(dims[0])*num_of_dims);" << std::endl - << " vx_size itemsize = sizeof(float);" << std::endl - << " if(data_type == VX_TYPE_UINT8 || data_type == VX_TYPE_INT8) {" << std::endl - << " itemsize = sizeof(vx_uint8);" << std::endl - << " }" << std::endl - << " else if(data_type == VX_TYPE_UINT16 || data_type == VX_TYPE_INT16 || data_type == VX_TYPE_FLOAT16) {" << std::endl - << " itemsize = sizeof(vx_uint16);" << std::endl - << " }" << std::endl - << " vx_size count = dims[0] * dims[1] * dims[2] * dims[3];" << std::endl - << " vx_map_id map_id;" << std::endl - << " float * ptr;" << std::endl - << " vx_status status = vxMapTensorPatch(tensor, num_of_dims, nullptr, nullptr, &map_id, stride, (void **)&ptr, usage, VX_MEMORY_TYPE_HOST);" << std::endl - << " if(status) {" << std::endl - << " std::cerr << \"ERROR: vxMapTensorPatch() failed for \" << fileName << std::endl;" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " if(usage == VX_WRITE_ONLY) {" << std::endl - << "#if ENABLE_OPENCV" << std::endl - << " if(dims[3] == 1 && dims[2] == 3 && fileName.size() 
> 4 && (fileName.substr(fileName.size()-4, 4) == \".png\" || fileName.substr(fileName.size()-4, 4) == \".jpg\"))" << std::endl - << " {" << std::endl - << " Mat img = imread(fileName.c_str(), CV_LOAD_IMAGE_COLOR);" << std::endl - << " if(!img.data || img.rows != dims[1] || img.cols != dims[0]) {" << std::endl - << " std::cerr << \"ERROR: invalid image or dimensions in \" << fileName << std::endl;" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " unsigned char * src = img.data;" << std::endl - << " for(vx_size c = 0; c < 3; c++) {" << std::endl - << " for(vx_size y = 0; y < dims[1]; y++) {" << std::endl - << " for(vx_size x = 0; x < dims[0]; x++) {" << std::endl - << " ptr[(c*stride[2]+y*stride[1]+x*stride[0])>>2] = src[y*dims[0]*3+x*3+c];" << std::endl - << " }" << std::endl - << " }" << std::endl - << " }" << std::endl - << " }" << std::endl - << " else" << std::endl - << "#endif" << std::endl - << " {" << std::endl - << " FILE * fp = fopen(fileName.c_str(), \"rb\");" << std::endl - << " if(!fp) {" << std::endl - << " std::cerr << \"ERROR: unable to open: \" << fileName << std::endl;" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " vx_size n = fread(ptr, itemsize, count, fp);" << std::endl - << " fclose(fp);" << std::endl - << " if(n != count) {" << std::endl - << " std::cerr << \"ERROR: expected char[\" << count*itemsize << \"], but got char[\" << n*itemsize << \"] in \" << fileName << std::endl;" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " }" << std::endl - << " }" << std::endl - << " else {" << std::endl - << " FILE * fp = fopen(fileName.c_str(), \"wb\");" << std::endl - << " if(!fp) {" << std::endl - << " std::cerr << \"ERROR: unable to open: \" << fileName << std::endl;" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " fwrite(ptr, itemsize, count, fp);" << std::endl - << " fclose(fp);" << std::endl - << " }" << std::endl - << " status = 
vxUnmapTensorPatch(tensor, map_id);" << std::endl - << " if(status) {" << std::endl - << " std::cerr << \"ERROR: vxUnmapTensorPatch() failed for \" << fileName << std::endl;" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " return 0;" << std::endl - << "}" << std::endl - << "" << std::endl - << "int main(int argc, const char ** argv)" << std::endl - << "{" << std::endl - << " // check command-line usage" << std::endl - << " if(argc < 2) {" << std::endl - << " printf(\"Usage: anntest [...]\\n\");" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " const char * binaryFilename = argv[1];" << std::endl - << " argc -= 2;" << std::endl - << " argv += 2;" << std::endl - << "" << std::endl - << " // create context, input, output, and graph" << std::endl - << " vxRegisterLogCallback(NULL, log_callback, vx_false_e);" << std::endl - << " vx_context context = vxCreateContext();" << std::endl - << " if(vxGetStatus((vx_reference)context)) {" << std::endl - << " printf(\"ERROR: vxCreateContext() failed\\n\");" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " vxRegisterLogCallback(context, log_callback, vx_false_e);" << std::endl - << "" << std::endl - << " // create input tensors and initialize" << std::endl - ; - for(auto& name : inputList) { - std::vector dims; - getTensorDims(inputShape[name], dims, 4); - ovxC << " vx_size " << name << "_dims[" << dims.size() << "] = {"; - for(size_t i = 0; i < dims.size(); i++) { - ovxC << (i == 0 ? 
" " : ", ") << dims[i]; - } - ovxC << " };" << std::endl - << " vx_tensor " << name << " = vxCreateTensor(context, " << dims.size() << ", " << name << "_dims, VX_TYPE_FLOAT32, 0);" << std::endl - << " if(vxGetStatus((vx_reference)" << name << ")) {" << std::endl - << " printf(\"ERROR: vxCreateTensor() failed for " << name << "\\n\");" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " if(*argv) {" << std::endl - << " if(strcmp(*argv, \"-\") != 0) {" << std::endl - << " if(copyTensor(" << name << ", *argv, VX_WRITE_ONLY) < 0) {" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " printf(\"OK: read tensor '" << name << "' from %s\\n\", *argv);" << std::endl - << " }" << std::endl - << " argv++;" << std::endl - << " }" << std::endl - ; - } - ovxC << " // create output tensors" << std::endl; - for(auto& name : outputList) { - std::vector dims; - getTensorDims(outputShape[name], dims, 4); - ovxC << " vx_size " << name << "_dims[" << dims.size() << "] = {"; - for(size_t i = 0; i < dims.size(); i++) { - ovxC << (i == 0 ? " " : ", ") << dims[i]; - } - ovxC << " };" << std::endl - << " vx_tensor " << name << " = vxCreateTensor(context, " << dims.size() << ", " << name << "_dims, VX_TYPE_FLOAT32, 0);" << std::endl - << " if(vxGetStatus((vx_reference)" << name << ")) {" << std::endl - << " printf(\"ERROR: vxCreateTensor() failed for " << name << "\\n\");" << std::endl - << " return -1;" << std::endl - << " }" << std::endl; - } - ovxC << "" << std::endl - << " // build graph using annmodule" << std::endl - << " vx_status status;" << std::endl - << " int64_t freq = clockFrequency(), t0, t1;" << std::endl - << " t0 = clockCounter();" << std::endl - << " vx_graph graph = vxCreateGraph(context);" << std::endl - << " status = vxGetStatus((vx_reference)graph);" << std::endl - << " if(status) {" << std::endl - << " printf(\"ERROR: vxCreateGraph(...) 
failed (%d)\\n\", status);" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " status = annAddToGraph(graph, " - ; - for(auto& name : inputList) { - ovxC << name << ", "; - } - for(auto& name : outputList) { - ovxC << name << ", "; - } - ovxC << "binaryFilename);" << std::endl - << " if(status) {" << std::endl - << " printf(\"ERROR: annAddToGraph() failed (%d)\\n\", status);" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " status = vxVerifyGraph(graph);" << std::endl - << " if(status) {" << std::endl - << " printf(\"ERROR: vxVerifyGraph(...) failed (%d)\\n\", status);" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " t1 = clockCounter();" << std::endl - << " printf(\"OK: graph initialization with annAddToGraph() took %.3f msec\\n\", (float)(t1-t0)*1000.0f/(float)freq);" << std::endl - << "" << std::endl - << " t0 = clockCounter();" << std::endl - << " status = vxProcessGraph(graph);" << std::endl - << " t1 = clockCounter();" << std::endl - << " if(status != VX_SUCCESS) {" << std::endl - << " printf(\"ERROR: vxProcessGraph() failed (%d)\\n\", status);" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " printf(\"OK: vxProcessGraph() took %.3f msec (1st iteration)\\n\", (float)(t1-t0)*1000.0f/(float)freq);" << std::endl - << "" << std::endl - << " // write outputs" << std::endl - ; - for(auto& name : outputList) { - ovxC << " if(*argv) {" << std::endl - << " if(strcmp(*argv, \"-\") != 0) {" << std::endl - << " if(copyTensor(" << name << ", *argv, VX_READ_ONLY) < 0) {" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " printf(\"OK: wrote tensor '" << name << "' into %s\\n\", *argv);" << std::endl - << " }" << std::endl - << " argv++;" << std::endl - << " }" << std::endl - ; - } - ovxC << "" << std::endl - << " t0 = clockCounter();" << std::endl - << " int N = 100;" << std::endl - << " for(int i = 0; i < N; i++) {" << std::endl - << " status = 
vxProcessGraph(graph);" << std::endl - << " if(status != VX_SUCCESS)" << std::endl - << " break;" << std::endl - << " }" << std::endl - << " t1 = clockCounter();" << std::endl - << " printf(\"OK: vxProcessGraph() took %.3f msec (average over %d iterations)\\n\", (float)(t1-t0)*1000.0f/(float)freq/(float)N, N);" << std::endl - << "" << std::endl - << " // release resources" << std::endl - << " ERROR_CHECK_STATUS(vxReleaseGraph(&graph));" << std::endl - ; - for(auto& name : inputList) { - ovxC << " ERROR_CHECK_STATUS(vxReleaseTensor(&" << name << "));" << std::endl; - } - for(auto& name : outputList) { - ovxC << " ERROR_CHECK_STATUS(vxReleaseTensor(&" << name << "));" << std::endl; - } - ovxC << " ERROR_CHECK_STATUS(vxReleaseContext(&context));" << std::endl - << " printf(\"OK: successful\\n\");" << std::endl - << "" << std::endl - << " return 0;" << std::endl - << "}" << std::endl - ; - ovxC.close(); - std::cout << "OK: created '" << openvxFilenameC << "'" << std::endl; - - //// - // generate CMakeLists.txt - // - openvxFilenameC = openvxFolder + "/CMakeLists.txt"; - ovxC.open(openvxFilenameC); - if(!ovxC) { - printf("ERROR: unable to create: %s\n", openvxFilenameC.c_str()); - exit(1); - } - ovxC << "cmake_minimum_required(VERSION 3.5)" << std::endl - << "project (annmodule)" << std::endl - << "set (CMAKE_CXX_STANDARD 14) " << std::endl - << "list(APPEND CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/cmake)" << std::endl - << "find_package(OpenCL REQUIRED)" << std::endl - << "find_package(OpenCV QUIET)" << std::endl - << "include_directories (${OpenCL_INCLUDE_DIRS} ${OpenCL_INCLUDE_DIRS}/Headers )" << std::endl - << "include_directories (/opt/rocm/include/mivisionx)" << std::endl - << "link_directories (/opt/rocm/lib)" << std::endl - << "list(APPEND SOURCES annmodule.cpp)" << std::endl - << "add_library(${PROJECT_NAME} SHARED ${SOURCES})" << std::endl - << "set(CMAKE_CXX_FLAGS \"${CMAKE_CXX_FLAGS} -msse4.2 -std=gnu++14\")" << std::endl - << 
"target_link_libraries(${PROJECT_NAME} openvx vx_nn pthread)" << std::endl - << "add_executable(anntest anntest.cpp)" << std::endl - << "if (OpenCV_FOUND)" << std::endl - << " target_compile_definitions(anntest PUBLIC ENABLE_OPENCV=1)" << std::endl - << " include_directories(${OpenCV_INCLUDE_DIRS})" << std::endl - << " target_link_libraries(anntest ${OpenCV_LIBRARIES})" << std::endl - << "else(OpenCV_FOUND)" << std::endl - << " target_compile_definitions(anntest PUBLIC ENABLE_OPENCV=0)" << std::endl - << "endif(OpenCV_FOUND)" << std::endl - << "target_link_libraries(anntest openvx vx_nn pthread ${PROJECT_NAME})" << std::endl - ; - ovxC.close(); - std::cout << "OK: created '" << openvxFilenameC << "'" << std::endl; - } - - virtual void operation(const nnef::Prototype& proto, - const nnef::Dictionary& args, - const nnef::Dictionary& shapes) - { - // save the operation details - opsProto.push_back(proto); - opsValues.push_back(args); - opsShapes.push_back(shapes); - operationRemoved.push_back(false); - } - - virtual bool isAtomic( const nnef::Prototype& proto, const nnef::Dictionary& args ) - { - static std::set atomics = - { - "sqr", "sqrt", "min", "max", - "softmax", "relu", "tanh", "sigmoid", - "batch_normalization", "max_pool", "avg_pool", - "quantize_linear", "quantize_logarithmic" - }; - return atomics.find(proto.name()) != atomics.end(); - } -}; - -int main(int argc, const char * argv[]) -{ - //// - // get command-line parameters - // - int verbose = 0; - bool useVirtual = true; - while(argc > 1 && argv[1][0] == '-') { - if(!strcmp(argv[1], "--no-virtual")) { - useVirtual = false; - argc -= 1; - argv += 1; - } - else if(argc > 2 && !strcmp(argv[1], "-v")) { - verbose = atoi(argv[2]); - argc -= 2; - argv += 2; - } - else { - printf("ERROR: invalid option: %s\n", argv[1]); - return -1; - } - } - if(argc < 3) { - printf("Usage: nnef2openvx [-v ] [--no-virtual] \n"); - return -1; - } - std::string nnefContainedFolder = argv[1]; - std::string openvxOutputFolder = 
argv[2]; - std::string nnefFilename = nnefContainedFolder + "/graph.nnef"; - - //// - // parse NNEF structure and translate to OpenVX code - // - std::ifstream ifs(nnefFilename.c_str()); - if(!ifs) { - printf("ERROR: unable to open: %s\n", nnefFilename.c_str()); - return -1; - } - mkdir(openvxOutputFolder.c_str(), 0777); - printf("OK: parsing %s ...\n", nnefFilename.c_str()); - std::unique_ptr parser((nnef::Parser*)new nnef::FlatParser()); - try { - NNEF2OpenVX_Translator callback(nnefContainedFolder, openvxOutputFolder, useVirtual, verbose); - parser->parse(ifs, callback); - } - catch(nnef::Error e) { - printf("Parse error: [%u:%u] %s\n", e.position().line, e.position().column, e.what()); - auto origin = e.position().origin; - while(origin) { - printf("... evaluated from [%u:%u]\n", origin->line, origin->column); - origin = origin->origin; - } - } - ifs.close(); - - return 0; -} diff --git a/utilities/mv_deploy/README.md b/utilities/mv_deploy/README.md index 5cb14fd0a4..1e7a9067ce 100644 --- a/utilities/mv_deploy/README.md +++ b/utilities/mv_deploy/README.md @@ -6,12 +6,12 @@ mv_deploy consists of a model-compiler and necessary header/.cpp files which are The "mv_compile" will be built as part of MIVisionX package installer To build and application using mv_compile, the user can use the deployment api from mv_deploy.h. -The entire use of the mv_compile and deployment is shown in [mv_objdetectsample](../samples/inference/mv_objdetect) +The entire use of the mv_compile and deployment is shown in [mv_objdetectsample](../samples/mv_objdetect) The sample demonstrates the use of mv_compile utility to do video decoding and inference. ## Prerequisites -* Ubuntu `18.04`/`20.04` or CentOS `7`/`8` +* Ubuntu `20.04`/`22.04` or CentOS `7`/`8` * [ROCm supported hardware](https://rocm.github.io/ROCmInstall.html#hardware-support) * AMD Radeon GPU or APU required * [ROCm](https://github.com/RadeonOpenCompute/ROCm#installing-from-amd-rocm-repositories)
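Reviewer note on the removed translator: the `mul`/`add`-into-`conv` folding in the deleted `codeGenMergeVariables` hunks relies on the conv being linear in both its input and its weights. Scaling an input channel by `m_k` is equivalent to scaling the matching filter slice, and adding a constant `c_k` before the conv is equivalent to adding `c_k * sum(W_k)` to the bias. A minimal sketch checking both identities (plain Python with illustrative names; a 1x1 "conv" is reduced to a dot product, this is not MIVisionX code):

```python
# Verify the two fold rules used by the removed codeGenMergeVariables:
#   conv(x * m) == conv_with_scaled_weights(x)          (mul before conv)
#   conv(x + c) == conv(x) + c * sum(W)                 (add before conv)
# Names W, bias, x, m, c are illustrative only.

def dot(w, x):
    # one output channel of a linear "conv" collapsed to a dot product
    return sum(wi * xi for wi, xi in zip(w, x))

W = [0.5, -1.0, 2.0]   # one output channel's filter taps
bias = 0.25
x = [1.0, 2.0, 3.0]
m = 3.0                # elementwise multiplier applied before the conv
c = 0.1                # elementwise constant added before the conv

# mul fold: scale the filter taps instead of the input
unfused_mul = dot(W, [xi * m for xi in x]) + bias
fused_mul = dot([wi * m for wi in W], x) + bias
assert abs(unfused_mul - fused_mul) < 1e-9

# add fold: push the constant into the bias (bias += c * sum(W))
unfused_add = dot(W, [xi + c for xi in x]) + bias
fused_add = dot(W, x) + (bias + c * sum(W))
assert abs(unfused_add - fused_add) < 1e-9
```

The same algebra justifies folding a `mul`/`add` that follows the conv (scale or shift the bias directly), which is the other branch the removed code handled.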