diff --git a/CHANGELOG.md b/CHANGELOG.md index 4af2ea0f02..0b982e27c2 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,12 +10,14 @@ ### Added -* CTest - OpenVX Tests -* Hardware Support +* CTest - Tests for install verification +* Hardware Support updates +* Doxygen - Support for API documentation ### Optimizations * CMakeList Cleanup +* Readme ### Changed @@ -30,7 +32,7 @@ * rocAL bug fix and updates -### Tested Configurations +### Tested configurations * Windows `10` / `11` * Linux distribution @@ -38,13 +40,12 @@ + CentOS - `7` / `8` + RHEL - `8` / `9` + SLES - `15-SP4` -* ROCm: rocm-core - `5.4.3.50403-121` -* miopen-hip - `2.19.0.50403-121` -* miopen-opencl - `2.18.0.50300-63` -* migraphx - `2.4.0.50403-121` +* ROCm: rocm-core - `5.7.0.50700-6` +* miopen-hip - `2.20.0.50700-63` +* migraphx - `2.7.0.50700-63` * Protobuf - [V3.12.4](https://github.com/protocolbuffers/protobuf/releases/tag/v3.12.4) * OpenCV - [4.6.0](https://github.com/opencv/opencv/releases/tag/4.6.0) -* RPP - [1.2.0](https://github.com/GPUOpen-ProfessionalCompute-Libraries/rpp/releases/tag/1.2.0) +* RPP - [1.2.0.50700-63](https://github.com/GPUOpen-ProfessionalCompute-Libraries/rpp/releases/tag/1.2.0) * FFMPEG - [n4.4.2](https://github.com/FFmpeg/FFmpeg/releases/tag/n4.4.2) * Dependencies for all the above packages * MIVisionX Setup Script - `V2.5.5` @@ -52,6 +53,7 @@ ### Known issues * OpenCV 4.X support for some apps missing +* MIVisionX Package install requires manual prerequisites installation ## MIVisionX 2.4.0 diff --git a/CMakeLists.txt b/CMakeLists.txt index d0ca237736..2b322b064e 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -38,7 +38,6 @@ set(ROCM_PATH /opt/rocm CACHE PATH "Default ROCm installation path") if(CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT) set(CMAKE_INSTALL_PREFIX ${ROCM_PATH} CACHE PATH "MIVisionX default installation path" FORCE) endif(CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT) -set(CMAKE_INSTALL_RPATH_USE_LINK_PATH TRUE) # MIVisionX Default Options 
option(ENHANCED_MESSAGE "MIVisionX Enhanced Message Option" ON) diff --git a/README.md b/README.md index e4cb9d0bcc..d66954d290 100644 --- a/README.md +++ b/README.md @@ -22,19 +22,19 @@ MIVisionX toolkit is a set of comprehensive computer vision and machine intellig - [Utilities](#utilities) - [Prerequisites](#prerequisites) - [Hardware](#hardware) - - [Operating System](#operating-system) + - [Operating System \& Prerequisites](#operating-system--prerequisites) - [Windows](#windows) - [macOS](#macos) - [Linux](#linux) - [Prerequisites setup script for Linux](#prerequisites-setup-script-for-linux) - [Prerequisites for running the script](#prerequisites-for-running-the-script) - [Build \& Install MIVisionX](#build--install-mivisionx) - - [Building on Windows](#building-on-windows) + - [Windows](#windows-1) - [Using `Visual Studio`](#using-visual-studio) - - [Building on macOS](#building-on-macos) - - [Building on Linux](#building-on-linux) + - [macOS](#macos-1) + - [Linux](#linux-1) - [Using `apt-get` / `yum` / `zypper`](#using-apt-get--yum--zypper) - - [Using MIVisionX-setup.py](#using-mivisionx-setuppy) + - [Using `MIVisionX-setup.py`](#using-mivisionx-setuppy) - [Verify the Installation](#verify-the-installation) - [Verifying on Linux / macOS](#verifying-on-linux--macos) - [Verifying on Windows](#verifying-on-windows) @@ -128,10 +128,10 @@ MIVisionX provides you with tools for accomplishing your tasks throughout the wh ## Utilities -* [inference_generator](utilities/inference_generator/README.md#inference-generator): generate inference library from pre-trained CAFFE models * [loom_shell](utilities/loom_shell/README.md#radeon-loomsh): an interpreter to prototype 360 degree video stitching applications using a script -* [RunVX](utilities/runvx/README.md#amd-runvx): command-line utility to execute OpenVX graph described in GDF text file +* [mv_deploy](utilities/mv_deploy/README.md): consists of a model-compiler and necessary header/.cpp files which are required to 
run inference for a specific NeuralNet model * [RunCL](utilities/runcl/README.md#amd-runcl): command-line utility to build, execute, and debug OpenCL programs +* [RunVX](utilities/runvx/README.md#amd-runvx): command-line utility to execute OpenVX graph described in GDF text file ## Prerequisites @@ -143,7 +143,7 @@ MIVisionX provides you with tools for accomplishing your tasks throughout the wh **Note:** Some modules in MIVisionX can be built for `CPU ONLY`. To take advantage of `Advanced Features And Modules` we recommend using `AMD GPUs` or `AMD APUs`. -### Operating System +### Operating System & Prerequisites #### Windows @@ -172,7 +172,7 @@ MIVisionX provides you with tools for accomplishing your tasks throughout the wh + **CentOS** - `7` / `8` + **RedHat** - `8` / `9` + **SLES** - `15-SP4` -* Install [ROCm](https://docs.amd.com) +* Install [ROCm](https://rocmdocs.amd.com/en/latest/deploy/linux/installer/install.html) with `--usecase=graphics,rocm` * CMake 3.5 or later * MIOpen for [vx_nn](amd_openvx_extensions/amd_nn/README.md#openvx-neural-network-extension-library-vx_nn) extension * MIGraphX for `vx_migraphx` extension @@ -194,8 +194,8 @@ For the convenience of the developer, we provide the setup script `MIVisionX-set + CentOS - `7` / `8` + RedHat - `8` / `9` + SLES - `15-SP4` -* [ROCm supported hardware](https://docs.amd.com) -* [ROCm](https://docs.amd.com) +* [ROCm supported hardware](https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html) +* Install [ROCm](https://rocmdocs.amd.com/en/latest/deploy/linux/installer/install.html) with `--usecase=graphics,rocm` **usage:** @@ -215,12 +215,12 @@ For the convenience of the developer, we provide the setup script `MIVisionX-set --rocm_path [ROCm Installation Path - optional (default:/opt/rocm) - ROCm Installation Required] ``` **Note:** - * **ROCm upgrade** with `sudo apt upgrade` requires the setup script **rerun**. + * **ROCm upgrade** requires the setup script **rerun**. 
* use `X Window` / `X11` for [remote GUI app control](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/wiki/X-Window-forwarding) ## Build & Install MIVisionX -### Building on Windows +### Windows #### Using `Visual Studio` @@ -229,16 +229,16 @@ For the convenience of the developer, we provide the setup script `MIVisionX-set **NOTE:** `vx_nn` is not supported on `Windows` in this release -### Building on macOS +### macOS macOS [build instructions](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/wiki/macOS#macos-build-instructions) -### Building on Linux +### Linux -#### Using `apt-get` / `yum` / `zypper` +* [ROCm supported hardware](https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html) +* Install [ROCm](https://rocmdocs.amd.com/en/latest/deploy/linux/installer/install.html) with `--usecase=graphics,rocm` -* [ROCm supported hardware](https://docs.amd.com) -* Install [ROCm](https://docs.amd.com) +#### Using `apt-get` / `yum` / `zypper` * On `Ubuntu` ``` @@ -250,7 +250,7 @@ macOS [build instructions](https://github.com/GPUOpen-ProfessionalCompute-Librar ``` * On `SLES` ``` - sudo zypper install mivisionxF + sudo zypper install mivisionx ``` **Note:** @@ -265,22 +265,21 @@ macOS [build instructions](https://github.com/GPUOpen-ProfessionalCompute-Librar + Docs folder into `/opt/rocm/share/doc/mivisionx` * Package (.deb & .rpm) install requires `OpenCV v4.6` to execute `AMD OpenCV extensions` -#### Using MIVisionX-setup.py +#### Using `MIVisionX-setup.py` -* Install [ROCm](https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html) -* Use the below commands to set up and build MIVisionX +* Clone MIVisionX git repository ``` git clone https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX.git - cd MIVisionX ``` **Note:** MIVisionX has support for two GPU backends: **OPENCL** and **HIP**: - + Instructions for building MIVisionX with the **HIP** GPU backend (i.e., default GPU backend): +* 
Instructions for building MIVisionX with the **HIP** GPU backend (i.e., default GPU backend): + run the setup script to install all the dependencies required by the **HIP** GPU backend: ``` + cd MIVisionX python MIVisionX-setup.py ``` @@ -293,12 +292,17 @@ macOS [build instructions](https://github.com/GPUOpen-ProfessionalCompute-Librar sudo cmake --build . --target PyPackageInstall sudo make install ``` + + + run tests - [test option instructions](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/wiki/CTest) + ``` + make test + ``` **Note:** + `PyPackageInstall` used for rocal_pybind installation + rocal_pybind not supported on windows. + `sudo` required for pybind installation - + Instructions for building MIVisionX with [**OPENCL** GPU backend](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/wiki/OpenCL-Backend) +* Instructions for building MIVisionX with [**OPENCL** GPU backend](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/wiki/OpenCL-Backend) ## Verify the Installation @@ -350,8 +354,8 @@ Docker files to build MIVisionX containers are [available](docker#mivisionx-dock #### Prerequisites * Ubuntu `20.04`/`22.04` -* [ROCm supported hardware](https://docs.amd.com) -* [ROCm](https://docs.amd.com) +* [ROCm supported hardware](https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html) +* Install [ROCm](https://rocmdocs.amd.com/en/latest/deploy/linux/installer/install.html) with `--usecase=graphics,rocm` * [Docker](https://docs.docker.com/engine/install/ubuntu/) #### Workflow @@ -432,20 +436,20 @@ Review all notable [changes](CHANGELOG.md#changelog) with the latest release + CentOS - `7` / `8` + RHEL - `8` / `9` + SLES - `15-SP4` -* ROCm: rocm-core - `5.4.3.50403-121` -* miopen-hip - `2.19.0.50403-121` -* miopen-opencl - `2.18.0.50300-63` -* migraphx - `2.4.0.50403-121` +* ROCm: rocm-core - `5.7.0.50700-6` +* miopen-hip - `2.20.0.50700-63` +* migraphx - `2.7.0.50700-63` * Protobuf - 
[V3.12.4](https://github.com/protocolbuffers/protobuf/releases/tag/v3.12.4) * OpenCV - [4.6.0](https://github.com/opencv/opencv/releases/tag/4.6.0) -* RPP - [1.2.0](https://github.com/GPUOpen-ProfessionalCompute-Libraries/rpp/releases/tag/1.2.0) +* RPP - [1.2.0.50700-63](https://github.com/GPUOpen-ProfessionalCompute-Libraries/rpp/releases/tag/1.2.0) * FFMPEG - [n4.4.2](https://github.com/FFmpeg/FFmpeg/releases/tag/n4.4.2) * Dependencies for all the above packages * MIVisionX Setup Script - `V2.5.5` ### Known issues -* Package install requires **OpenCV** `V-4.6.0` to execute `AMD OpenCV extensions` +* OpenCV 4.X support for some apps missing +* MIVisionX Package install requires manual prerequisites installation ## MIVisionX Dependency Map diff --git a/amd_openvx_extensions/amd_nn/CMakeLists.txt b/amd_openvx_extensions/amd_nn/CMakeLists.txt index 0dbd18befb..c0c21e45f0 100644 --- a/amd_openvx_extensions/amd_nn/CMakeLists.txt +++ b/amd_openvx_extensions/amd_nn/CMakeLists.txt @@ -132,7 +132,7 @@ if(BUILD_DEV) install(DIRECTORY ../../apps/dg_test DESTINATION ${CMAKE_INSTALL_DATADIR}/mivisionx/apps) install(DIRECTORY ../../apps/mivisionx_inference_analyzer DESTINATION ${CMAKE_INSTALL_DATADIR}/mivisionx/apps) install(DIRECTORY ../../apps/mivisionx_openvx_classifier DESTINATION ${CMAKE_INSTALL_DATADIR}/mivisionx/apps) - install(DIRECTORY ../../samples/inference/mv_objdetect DESTINATION ${CMAKE_INSTALL_DATADIR}/mivisionx/samples) + install(DIRECTORY ../../samples/mv_objdetect DESTINATION ${CMAKE_INSTALL_DATADIR}/mivisionx/samples) install(DIRECTORY ../../samples/model_compiler_samples DESTINATION ${CMAKE_INSTALL_DATADIR}/mivisionx/samples) endif(BUILD_DEV) diff --git a/apps/image_augmentation/README.md b/apps/image_augmentation/README.md index 7251204b8d..cc5daaa37a 100644 --- a/apps/image_augmentation/README.md +++ b/apps/image_augmentation/README.md @@ -8,7 +8,7 @@ This application demonstrates the basic usage of rocAL's C API to load JPEG imag ### Pre-requisites -* 
Ubuntu Linux, [version `16.04` or later](https://www.microsoft.com/software-download/windows10) +* Ubuntu Linux, [version `20.04` or later](https://www.microsoft.com/software-download/windows10) * rocAL library (Part of the MIVisionX toolkit) * [OpenCV 3.1](https://github.com/opencv/opencv/releases) or higher * Radeon Performance Primitives (RPP) diff --git a/apps/mivisionx_inference_analyzer/README.md b/apps/mivisionx_inference_analyzer/README.md index d55141c2dd..6ebc4a7d5f 100644 --- a/apps/mivisionx_inference_analyzer/README.md +++ b/apps/mivisionx_inference_analyzer/README.md @@ -36,7 +36,7 @@ Pre-trained models in [ONNX](https://onnx.ai/), [NNEF](https://www.khronos.org/n ## Prerequisites -* Ubuntu `16.04` / `18.04` or CentOS `7.5` / `7.6` +* Ubuntu `20.04` / `22.04` or CentOS `7.5` / `7.6` * [ROCm supported hardware](https://rocm.github.io/ROCmInstall.html#hardware-support) + AMD Radeon GPU or AMD APU required * Latest [ROCm](https://github.com/RadeonOpenCompute/ROCm#installing-from-amd-rocm-repositories) diff --git a/docs/.sphinx/_toc.yml.in b/docs/.sphinx/_toc.yml.in index 90ad0fb2d0..12923094b0 100644 --- a/docs/.sphinx/_toc.yml.in +++ b/docs/.sphinx/_toc.yml.in @@ -64,7 +64,7 @@ subtrees: - entries: - file: samples/c_samples/README - file: samples/gdf/README - - file: samples/inference/mv_objdetect/README + - file: samples/mv_objdetect/README - file: samples/loom_360_stitch/README - file: samples/model_compiler_samples/README subtrees: diff --git a/docs/.sphinx/requirements.in b/docs/.sphinx/requirements.in index 8a7eff9103..49693b7942 100644 --- a/docs/.sphinx/requirements.in +++ b/docs/.sphinx/requirements.in @@ -1 +1,2 @@ -rocm-docs-core[api_reference]==0.24.0 +rocm-docs-core[api_reference]>=0.24.0 + diff --git a/docs/.sphinx/requirements.txt b/docs/.sphinx/requirements.txt index a67aee59b2..d62b231589 100644 --- a/docs/.sphinx/requirements.txt +++ b/docs/.sphinx/requirements.txt @@ -47,7 +47,7 @@ fastjsonschema==2.16.3 # via rocm-docs-core 
gitdb==4.0.10 # via gitpython -gitpython==3.1.34 +gitpython==3.1.35 # via rocm-docs-core idna==3.4 # via requests @@ -110,7 +110,7 @@ requests==2.31.0 # via # pygithub # sphinx -rocm-docs-core[api_reference]==0.24.0 +rocm-docs-core[api_reference]>=0.24.0 # via -r requirements.in smmap==5.0.0 # via gitdb diff --git a/samples/README.md b/samples/README.md index ee0580bcf5..9fab99934c 100644 --- a/samples/README.md +++ b/samples/README.md @@ -6,7 +6,7 @@ MIVisionX samples using OpenVX and OpenVX extensions. In the samples below we wi * [GDF - Graph Description Format Samples](#gdf---graph-description-format) * [Loom 360 Stitch - Radeon Loom 360 Stitch Samples](#loom-360-stitch---radeon-loom-360-stitch-samples) * [Model Compiler Samples - Run Efficient Inference](#model-compiler-samples---run-efficient-inference) -* [MIVisionX Inference Deploy Samples](inference/mv_objdetect/) +* [MIVisionX Inference Deploy Samples](mv_objdetect) ## GDF - Graph Description Format @@ -108,7 +108,7 @@ make MIVisionX samples using [LoomShell](../utilities/loom_shell/README.md#radeon-loomshell) -[![Loom Stitch](https://raw.githubusercontent.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/develop/docs/data/loom-4.pngloom-4.png)](https://youtu.be/E8pPU04iZjw) +[![Loom Stitch](https://raw.githubusercontent.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/develop/docs/data/loom-4.png)](https://youtu.be/E8pPU04iZjw) **Note:** @@ -225,3 +225,11 @@ In this [sample](model_compiler_samples/README.md#mivisionx-model-compiler-sampl ### [Sample-3: Classification Using Pre-Trained NNEF Model](model_compiler_samples/README.md#sample-3---classification-using-pre-trained-nnef-model) ### [Sample-4: Classification Using Pre-Trained Caffe Model](model_compiler_samples/README.md#sample-4---classification-using-pre-trained-caffe-model) + +## MV Object Detect Samples + +This [sample](mv_objdetect) shows how to run video decoding and object detection using pre-trained `YoloV2` Caffe Model + +The sample 
demonstrates the use of the `mv_compile` utility to perform video decoding and inference. + +
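The mv_objdetect sample's setup has the user export `/opt/rocm/lib` onto `LD_LIBRARY_PATH` before running inference. A tiny helper can make that check explicit; this is an illustrative sketch only — the `path_contains` helper and the example path values are hypothetical, not part of MIVisionX:

```shell
# Hypothetical helper: is directory $2 an entry on the ':'-separated path list $1?
path_contains() {
  case ":$1:" in
    *":$2:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: before and after appending the ROCm library directory,
# mirroring `export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/rocm/lib`.
before="/usr/local/lib"
after="$before:/opt/rocm/lib"
path_contains "$before" "/opt/rocm/lib" && echo found || echo missing   # → missing
path_contains "$after"  "/opt/rocm/lib" && echo found || echo missing   # → found
```

The same check works for any `:`-separated search variable (e.g. `PATH` for `/opt/rocm/bin`).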

\ No newline at end of file diff --git a/samples/model_compiler_samples/README.md b/samples/model_compiler_samples/README.md index 934fda9110..bfa19a33d0 100644 --- a/samples/model_compiler_samples/README.md +++ b/samples/model_compiler_samples/README.md @@ -27,7 +27,7 @@ Pre-trained models in [ONNX](https://onnx.ai/), [NNEF](https://www.khronos.org/n ### Prerequisites -* Ubuntu `18.04`/`20.04` or CentOS `7`/`8` +* Ubuntu `20.04`/`22.04` or CentOS `7`/`8` * [ROCm supported hardware](https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.1.1/page/Prerequisite_Actions.html) * AMD Radeon GPU or AMD APU required * Latest [ROCm](https://docs.amd.com/category/ROCm™%20v5.x) @@ -35,11 +35,23 @@ Pre-trained models in [ONNX](https://onnx.ai/), [NNEF](https://www.khronos.org/n #### Docker for Samples -MIVisionX provides developers with [docker images](https://hub.docker.com/u/mivisionx) for [Ubuntu 18.04](https://hub.docker.com/r/mivisionx/ubuntu-18.04), [Ubuntu 20.04](https://hub.docker.com/r/mivisionx/ubuntu-20.04), [CentOS 7](https://hub.docker.com/r/mivisionx/centos-7), & [CentOS 8](https://hub.docker.com/r/mivisionx/centos-8). Using docker images developers can quickly prototype and build applications without having to be locked into a single system setup or lose valuable time figuring out the dependencies of the underlying software. +MIVisionX provides developers with docker images for Ubuntu `20.04` / `22.04`. Using docker images developers can quickly prototype and build applications without having to be locked into a single system setup or lose valuable time figuring out the dependencies of the underlying software. 
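The docker workflow described here boils down to one `docker run` invocation. A minimal sketch, assuming the `mivisionx/ubuntu-22.04` image tag from the MIVisionX docker repositories referenced in this README; `/dev/kfd` and `/dev/dri` are the standard ROCm GPU device nodes to pass through. The command is composed and printed for review rather than executed, so nothing is pulled as a side effect:

```shell
# Sketch: compose the `docker run` line for a MIVisionX container with GPU access.
# Assumptions: image tag mivisionx/ubuntu-22.04; /dev/kfd and /dev/dri are the
# ROCm device nodes to expose. Printed here, not executed.
IMAGE="mivisionx/ubuntu-22.04"
RUN_CMD="docker run -it --device=/dev/kfd --device=/dev/dri $IMAGE"
echo "$RUN_CMD"   # → docker run -it --device=/dev/kfd --device=/dev/dri mivisionx/ubuntu-22.04
```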
-##### Docker with display option for the samples +Docker files to build MIVisionX containers are [available](docker#mivisionx-docker) + +### MIVisionX Docker +* [Ubuntu 20.04](https://cloud.docker.com/repository/docker/mivisionx/ubuntu-20.04) +* [Ubuntu 22.04](https://cloud.docker.com/repository/docker/mivisionx/ubuntu-22.04) + +### Docker Workflow on Ubuntu `20.04`/`22.04` -* Check [docker prerequisites](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX#docker-workflow-sample-on-ubuntu-1804--2004) +#### Prerequisites +* Ubuntu `20.04`/`22.04` +* [ROCm supported hardware](https://docs.amd.com) +* [ROCm](https://docs.amd.com) +* [Docker](https://docs.docker.com/engine/install/ubuntu/) + +##### Docker with display option for the samples * Start docker with display ```` diff --git a/samples/inference/mv_objdetect/CMakeLists.txt b/samples/mv_objdetect/CMakeLists.txt similarity index 100% rename from samples/inference/mv_objdetect/CMakeLists.txt rename to samples/mv_objdetect/CMakeLists.txt diff --git a/samples/inference/mv_objdetect/README.md b/samples/mv_objdetect/README.md similarity index 97% rename from samples/inference/mv_objdetect/README.md rename to samples/mv_objdetect/README.md index 6f2bf21591..b05ac02b13 100644 --- a/samples/inference/mv_objdetect/README.md +++ b/samples/mv_objdetect/README.md @@ -9,8 +9,7 @@ The sample has two .cpp files, `mvobjdetect.cpp` and `visualize.cpp`. 
But it nee ## Prerequisites * Linux - * Ubuntu `18.04`/`20.04` - * CentOS `7`/`8` + * Ubuntu `20.04`/`22.04` * [ROCm supported hardware](https://docs.amd.com) * **GPU**: [AMD Radeon™ Graphics](https://docs.amd.com/bundle/Hardware_and_Software_Reference_Guide/page/Hardware_and_Software_Support.html) [Required] * **APU**: [AMD Radeon™ `Mobile`/`Embedded`](https://docs.amd.com/bundle/Hardware_and_Software_Reference_Guide/page/Hardware_and_Software_Support.html) [optional] @@ -33,7 +32,7 @@ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/rocm/lib wget https://github.com/kiritigowda/YoloV2NCS/raw/master/models/caffemodels/yoloV2Tiny20.caffemodel ``` -### Step 2. compile model for OPENCL-ROCm-OpenVX backend using mv_compile utility +### Step 2. compile model for OpenVX backend using mv_compile utility The mv_compile utility generates deployment library, header files, and .cpp files required to run inference for the specified model. * Usage: @@ -149,7 +148,7 @@ cd .. ### Step 10. Sample output for multiple video object detection -

+

# License This project is licensed under the MIT License - see the LICENSE.md file for details diff --git a/samples/inference/mv_objdetect/data/Videos/Videos_4.txt b/samples/mv_objdetect/data/Videos/Videos_4.txt similarity index 100% rename from samples/inference/mv_objdetect/data/Videos/Videos_4.txt rename to samples/mv_objdetect/data/Videos/Videos_4.txt diff --git a/samples/inference/mv_objdetect/data/images/Video_4_screenshot.png b/samples/mv_objdetect/data/images/Video_4_screenshot.png similarity index 100% rename from samples/inference/mv_objdetect/data/images/Video_4_screenshot.png rename to samples/mv_objdetect/data/images/Video_4_screenshot.png diff --git a/samples/inference/mv_objdetect/data/images/img_04.JPG b/samples/mv_objdetect/data/images/img_04.JPG similarity index 100% rename from samples/inference/mv_objdetect/data/images/img_04.JPG rename to samples/mv_objdetect/data/images/img_04.JPG diff --git a/samples/inference/mv_objdetect/mvobjdetect.cpp b/samples/mv_objdetect/mvobjdetect.cpp similarity index 100% rename from samples/inference/mv_objdetect/mvobjdetect.cpp rename to samples/mv_objdetect/mvobjdetect.cpp diff --git a/samples/inference/mv_objdetect/visualize.cpp b/samples/mv_objdetect/visualize.cpp similarity index 100% rename from samples/inference/mv_objdetect/visualize.cpp rename to samples/mv_objdetect/visualize.cpp diff --git a/samples/inference/mv_objdetect/visualize.h b/samples/mv_objdetect/visualize.h similarity index 100% rename from samples/inference/mv_objdetect/visualize.h rename to samples/mv_objdetect/visualize.h diff --git a/tests/library_tests/README.md b/tests/library_tests/README.md index e73529318f..02181f3ff5 100644 --- a/tests/library_tests/README.md +++ b/tests/library_tests/README.md @@ -1,6 +1,6 @@ # MIVisionX Library Tests -## Script to check if all libraries are built +## Script to check if all libraries are built & installed ``` python runLibraryTests.py diff --git a/tests/library_tests/runLibraryTests.py 
b/tests/library_tests/runLibraryTests.py index 84ba97b4a5..aacc2d586b 100644 --- a/tests/library_tests/runLibraryTests.py +++ b/tests/library_tests/runLibraryTests.py @@ -86,9 +86,11 @@ def write_formatted(output, f): platform_name = platform_name+'-SLES' else: print("\nMIVisionX Library Test on "+platform_name+" is unsupported") - print("MIVisionX Library Test Supported on: Ubuntu 20/22; CentOS 7/8; RedHat 8/9; & SLES 15 SP3") + print("MIVisionX Library Test Supported on: Ubuntu 20/22; CentOS 7/8; RedHat 8/9; & SLES 15 SP4") exit(1) +# TBD - Install inxi package + print("\nMIVisionX Library Test V:"+__version__ + " on "+platform_name+" is supported") @@ -311,6 +313,4 @@ def write_formatted(output, f): print("STATUS: Output Report File - "+reportFileDir) if warning == 1: print("WARNING: Not all modules of MIVisionX is built, check for missing dependencies") -else: - print("SUCCESS: All modules of MIVisionX built") -print("runLibraryTests.py completed - V:"+__version__+"\n") +print("MIVisionX Tests - runLibraryTests.py - V:"+__version__+"\n") diff --git a/tests/openvx_api_tests/CMakeLists.txt b/tests/openvx_api_tests/CMakeLists.txt index 835e77a60f..2072b7c27e 100644 --- a/tests/openvx_api_tests/CMakeLists.txt +++ b/tests/openvx_api_tests/CMakeLists.txt @@ -25,6 +25,9 @@ ################################################################################ cmake_minimum_required(VERSION 3.5) +# TBD - Install additional data indepedent tests +install(DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/../library_tests DESTINATION ${CMAKE_INSTALL_DATADIR}/mivisionx/tests) + # default run # canny add_test( @@ -130,42 +133,42 @@ if(GPU_SUPPORT) # caffe2nnir2openvx Fuse flow add_test(NAME caffe2nnir2openvx_fuse COMMAND ${Python3_EXECUTABLE} ${CMAKE_SOURCE_DIR}/tests/neural_network_tests/runNeuralNetworkTests.py - --profiler_mode 2 + --profiler_mode 2 --reinstall off WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) # caffe2nnir2openvx FP16 flow add_test(NAME caffe2nnir2openvx_fp16 COMMAND 
${Python3_EXECUTABLE} ${CMAKE_SOURCE_DIR}/tests/neural_network_tests/runNeuralNetworkTests.py - --profiler_mode 3 + --profiler_mode 3 --reinstall off WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) # onnx2nnir2openvx No Fuse flow add_test(NAME onnx2nnir2openvxx_no_fuse COMMAND ${Python3_EXECUTABLE} ${CMAKE_SOURCE_DIR}/tests/neural_network_tests/runNeuralNetworkTests.py - --profiler_mode 4 + --profiler_mode 4 --reinstall off WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) # onnx2nnir2openvx Fuse flow add_test(NAME onnx2nnir2openvxx_fuse COMMAND ${Python3_EXECUTABLE} ${CMAKE_SOURCE_DIR}/tests/neural_network_tests/runNeuralNetworkTests.py - --profiler_mode 5 + --profiler_mode 5 --reinstall off WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) # onnx2nnir2openvx FP16 flow add_test(NAME onnx2nnir2openvxx_fp16 COMMAND ${Python3_EXECUTABLE} ${CMAKE_SOURCE_DIR}/tests/neural_network_tests/runNeuralNetworkTests.py - --profiler_mode 6 + --profiler_mode 6 --reinstall off WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) # nnef2nnir2openvx No Fuse flow add_test(NAME nnef2nnir2openvxx_no_fuse COMMAND ${Python3_EXECUTABLE} ${CMAKE_SOURCE_DIR}/tests/neural_network_tests/runNeuralNetworkTests.py - --profiler_mode 7 + --profiler_mode 7 --reinstall off WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) # nnef2nnir2openvx Fuse flow add_test(NAME nnef2nnir2openvxx_fuse COMMAND ${Python3_EXECUTABLE} ${CMAKE_SOURCE_DIR}/tests/neural_network_tests/runNeuralNetworkTests.py - --profiler_mode 8 + --profiler_mode 8 --reinstall off WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) # nnef2nnir2openvx FP16 flow add_test(NAME nnef2nnir2openvxx_fp16 COMMAND ${Python3_EXECUTABLE} ${CMAKE_SOURCE_DIR}/tests/neural_network_tests/runNeuralNetworkTests.py - --profiler_mode 9 + --profiler_mode 9 --reinstall off WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) endif(NEURAL_NET AND Python3_FOUND) diff --git a/utilities/inference_generator/CMakeLists.txt b/utilities/inference_generator/CMakeLists.txt deleted file mode 
100644 index 9f8488afba..0000000000 --- a/utilities/inference_generator/CMakeLists.txt +++ /dev/null @@ -1,41 +0,0 @@ -# Copyright (c) 2017 - 2023 Advanced Micro Devices, Inc. All rights reserved. -# -# Permission is hereby granted, free of charge, to any person obtaining a copy -# of this software and associated documentation files (the "Software"), to deal -# in the Software without restriction, including without limitation the rights -# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -# copies of the Software, and to permit persons to whom the Software is -# furnished to do so, subject to the following conditions: -# -# The above copyright notice and this permission notice shall be included in -# all copies or substantial portions of the Software. -# -# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN -# THE SOFTWARE. 
- -cmake_minimum_required(VERSION 3.5) -project(inference_generator) - -set(CMAKE_CXX_STANDARD 14) - -find_package(Protobuf REQUIRED) -PROTOBUF_GENERATE_CPP(PROTO_SRCS PROTO_HDRS proto/caffe.proto) - -include_directories(${CMAKE_CURRENT_BINARY_DIR}) -list(APPEND CAFFE_SOURCES src/caffe2openvx.cpp ${PROTO_SRCS} ${PROTO_HDRS}) -add_executable(caffe2openvx ${CAFFE_SOURCES}) -target_link_libraries(caffe2openvx ${PROTOBUF_LIBRARIES}) -install (TARGETS caffe2openvx DESTINATION ${CMAKE_INSTALL_BINDIR}) - -if("${CMAKE_CXX_COMPILER_ID}" STREQUAL "MSVC") - set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /MT") - set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} /MTd") -else() - set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=gnu++14") -endif() - diff --git a/utilities/inference_generator/README.md b/utilities/inference_generator/README.md deleted file mode 100644 index 067fae1cf3..0000000000 --- a/utilities/inference_generator/README.md +++ /dev/null @@ -1,111 +0,0 @@ -# Inference Generator - -caffe2openvx: Convert a pre-trained CAFFE model into a C library for use by applications. 
-* Extract neural network model from `deploy.prototxt` - + generate C code that instantiates OpenVX kernels from [vx_nn](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/tree/master/vx_nn/README.md) module - + generate build scripts that package C code into a library - + the generated C code or library can be easily integrated into an application for running inference -* Extract weights and biases from `weights.caffemodel` into separates folders for use by the C library during initialization -* Also generate a GDF for quick prototyping and kernel debugging - -The generated C code will have two functions in `annmodule.h`: - -``` -void annGetTensorDimensions( - vx_size dimInput[4], // input tensor dimensions - vx_size dimOutput[4] // output tensor dimensions - ); - -vx_graph annCreateGraph( - vx_context context, // OpenVX context - vx_tensor input, // input tensor - vx_tensor output, // output tensor - const char * dataFolder // folder with weights and biases - ); -or -vx_graph annCreateGraphWithInputImage( - vx_context context, // OpenVX context - vx_image input, // input image (RGB or U8) - vx_tensor output, // output tensor - const char * dataFolder // folder with weights and biases - ); -or -vx_graph annCreateGraphWithInputImageWithArgmaxTensor( - vx_context context, // OpenVX context - vx_image input, // input image (RGB or U8) - vx_tensor output, // output tensor - const char * dataFolder // folder with weights and biases - ); -or -vx_graph annCreateGraphWithInputImageWithArgmaxImage( - vx_context context, // OpenVX context - vx_image input, // input image (RGB or U8) - vx_image output, // output image (U8) - const char * dataFolder // folder with weights and biases - ); -or -vx_graph annCreateGraphWithInputImageWithArgmaxImageWithLut( - vx_context context, // OpenVX context - vx_image input, // input image (RGB or U8) - vx_image output, // output image (RGB) - const char * dataFolder // folder with weights and biases - ); -``` - -* 
`annGetTensorDimensions`: allows an application to query dimensions of input and output tensors -* `annCreateGraph` (or another variant above): creates and initializes a graph with trained neural network for inference - -## Command-line Usage - -``` - % caffe2openvx - [options] - - [n c H W [type fixed-point-position [convert-policy round-policy]]] -``` - -| option | description | -| ------ | ----------- | -| --(no-)error-messages | do/don't enable error messages (default: ON) | -| --(no-)virtual-buffers | do/don't use virtual buffers (default: ON) | -| --(no-)generate-gdf | do/don't generate RunVX GDF with weight/bias initialization (default: ON) | -| --(no-)generate-vx-code | do/don't generate OpenVX C Code with weight/bias initialization (default: ON) | -| --output-dir | specify output folder for weights/biases, GDF, and OpenVX C Code (default: current) | -| --input-rgb | convert input from RGB image into tensor using (a*x+b) conversion: rev=(BGR?1:0) | -| --input-u8 | convert input from U8 image into tensor using (a*x+b) conversion | -| --argmax-tensor u8/u16 k | return argmax output with specified tensor type and top_k | -| --argmax-image u8/u16 | return argmax output with specified image type | -| --argmax-lut | argmax color table: one R G B entry per label | -| --flags | specify custom flags (default: 0) | - -## Example - -Make sure that all executables and libraries are in `PATH` and `LD_LIBRARY_PATH` environment variables. - -``` -% export PATH=$PATH:/opt/rocm/bin -% export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/rocm/lib -``` - -Below log outlines a simple use-case with inference generator. - -``` -% caffe2openvx weights.caffemodel 1 3 32 32 -% caffe2openvx deploy.prototxt 1 3 32 32 -% ls -CMakeLists.txt annmodule.txt cmake weights -annmodule.cpp anntest.cpp deploy.prototxt weights.caffemodel -annmodule.h bias net.gdf -% mkdir build -% cd build -% cmake .. -% make -% cd .. 
-% ls build
-CMakeCache.txt Makefile cmake_install.cmake
-CMakeFiles anntest libannmodule.so
-% ./build/anntest
-OK: annGetTensorDimensions() => [input 32x32x3x32] [output 1x1x10x32]
-```
-
-`anntest.cpp` is a simple program to initialize and run a neural network using the `annmodule` library.
diff --git a/utilities/inference_generator/proto/caffe.proto b/utilities/inference_generator/proto/caffe.proto
deleted file mode 100644
index c96966b589..0000000000
--- a/utilities/inference_generator/proto/caffe.proto
+++ /dev/null
@@ -1,1412 +0,0 @@
-syntax = "proto2";
-
-package caffe;
-
-// Specifies the shape (dimensions) of a Blob.
-message BlobShape {
- repeated int64 dim = 1 [packed = true];
-}
-
-message BlobProto {
- optional BlobShape shape = 7;
- repeated float data = 5 [packed = true];
- repeated float diff = 6 [packed = true];
- repeated double double_data = 8 [packed = true];
- repeated double double_diff = 9 [packed = true];
-
- // 4D dimensions -- deprecated. Use "shape" instead.
- optional int32 num = 1 [default = 0];
- optional int32 channels = 2 [default = 0];
- optional int32 height = 3 [default = 0];
- optional int32 width = 4 [default = 0];
-}
-
-// The BlobProtoVector is simply a way to pass multiple blobproto instances
-// around.
-message BlobProtoVector {
- repeated BlobProto blobs = 1;
-}
-
-message Datum {
- optional int32 channels = 1;
- optional int32 height = 2;
- optional int32 width = 3;
- // the actual image data, in bytes
- optional bytes data = 4;
- optional int32 label = 5;
- // Optionally, the datum could also hold float data.
- repeated float float_data = 6;
- // If true, data contains an encoded image that needs to be decoded
- optional bool encoded = 7 [default = false];
-}
-
-message FillerParameter {
- // The filler type.
- optional string type = 1 [default = 'constant']; - optional float value = 2 [default = 0]; // the value in constant filler - optional float min = 3 [default = 0]; // the min value in uniform filler - optional float max = 4 [default = 1]; // the max value in uniform filler - optional float mean = 5 [default = 0]; // the mean value in Gaussian filler - optional float std = 6 [default = 1]; // the std value in Gaussian filler - // The expected number of non-zero output weights for a given input in - // Gaussian filler -- the default -1 means don't perform sparsification. - optional int32 sparse = 7 [default = -1]; - // Normalize the filler variance by fan_in, fan_out, or their average. - // Applies to 'xavier' and 'msra' fillers. - enum VarianceNorm { - FAN_IN = 0; - FAN_OUT = 1; - AVERAGE = 2; - } - optional VarianceNorm variance_norm = 8 [default = FAN_IN]; -} - -message NetParameter { - optional string name = 1; // consider giving the network a name - // DEPRECATED. See InputParameter. The input blobs to the network. - repeated string input = 3; - // DEPRECATED. See InputParameter. The shape of the input blobs. - repeated BlobShape input_shape = 8; - - // 4D input dimensions -- deprecated. Use "input_shape" instead. - // If specified, for each input blob there should be four - // values specifying the num, channels, height and width of the input blob. - // Thus, there should be a total of (4 * #input) numbers. - repeated int32 input_dim = 4; - - // Whether the network will force every layer to carry out backward operation. - // If set False, then whether to carry out backward is determined - // automatically according to the net structure and learning rates. - optional bool force_backward = 5 [default = false]; - // The current "state" of the network, including the phase, level, and stage. - // Some layers may be included/excluded depending on this state and the states - // specified in the layers' include and exclude fields. 
- optional NetState state = 6; - - // Print debugging information about results while running Net::Forward, - // Net::Backward, and Net::Update. - optional bool debug_info = 7 [default = false]; - - // The layers that make up the net. Each of their configurations, including - // connectivity and behavior, is specified as a LayerParameter. - repeated LayerParameter layer = 100; // ID 100 so layers are printed last. - - // DEPRECATED: use 'layer' instead. - repeated V1LayerParameter layers = 2; -} - -// NOTE -// Update the next available ID when you add a new SolverParameter field. -// -// SolverParameter next available ID: 42 (last added: layer_wise_reduce) -message SolverParameter { - ////////////////////////////////////////////////////////////////////////////// - // Specifying the train and test networks - // - // Exactly one train net must be specified using one of the following fields: - // train_net_param, train_net, net_param, net - // One or more test nets may be specified using any of the following fields: - // test_net_param, test_net, net_param, net - // If more than one test net field is specified (e.g., both net and - // test_net are specified), they will be evaluated in the field order given - // above: (1) test_net_param, (2) test_net, (3) net_param/net. - // A test_iter must be specified for each test_net. - // A test_level and/or a test_stage may also be specified for each test_net. - ////////////////////////////////////////////////////////////////////////////// - - // Proto filename for the train net, possibly combined with one or more - // test nets. - optional string net = 24; - // Inline train net param, possibly combined with one or more test nets. - optional NetParameter net_param = 25; - - optional string train_net = 1; // Proto filename for the train net. - repeated string test_net = 2; // Proto filenames for the test nets. - optional NetParameter train_net_param = 21; // Inline train net params. 
- repeated NetParameter test_net_param = 22; // Inline test net params. - - // The states for the train/test nets. Must be unspecified or - // specified once per net. - // - // By default, train_state will have phase = TRAIN, - // and all test_state's will have phase = TEST. - // Other defaults are set according to the NetState defaults. - optional NetState train_state = 26; - repeated NetState test_state = 27; - - // The number of iterations for each test net. - repeated int32 test_iter = 3; - - // The number of iterations between two testing phases. - optional int32 test_interval = 4 [default = 0]; - optional bool test_compute_loss = 19 [default = false]; - // If true, run an initial test pass before the first iteration, - // ensuring memory availability and printing the starting value of the loss. - optional bool test_initialization = 32 [default = true]; - optional float base_lr = 5; // The base learning rate - // the number of iterations between displaying info. If display = 0, no info - // will be displayed. - optional int32 display = 6; - // Display the loss averaged over the last average_loss iterations - optional int32 average_loss = 33 [default = 1]; - optional int32 max_iter = 7; // the maximum number of iterations - // accumulate gradients over `iter_size` x `batch_size` instances - optional int32 iter_size = 36 [default = 1]; - - // The learning rate decay policy. The currently implemented learning rate - // policies are as follows: - // - fixed: always return base_lr. - // - step: return base_lr * gamma ^ (floor(iter / step)) - // - exp: return base_lr * gamma ^ iter - // - inv: return base_lr * (1 + gamma * iter) ^ (- power) - // - multistep: similar to step but it allows non uniform steps defined by - // stepvalue - // - poly: the effective learning rate follows a polynomial decay, to be - // zero by the max_iter. 
return base_lr (1 - iter/max_iter) ^ (power)
- // - sigmoid: the effective learning rate follows a sigmoid decay
- // return base_lr ( 1/(1 + exp(-gamma * (iter - stepsize))))
- //
- // where base_lr, max_iter, gamma, step, stepvalue and power are defined
- // in the solver parameter protocol buffer, and iter is the current iteration.
- optional string lr_policy = 8;
- optional float gamma = 9; // The parameter to compute the learning rate.
- optional float power = 10; // The parameter to compute the learning rate.
- optional float momentum = 11; // The momentum value.
- optional float weight_decay = 12; // The weight decay.
- // regularization types supported: L1 and L2
- // controlled by weight_decay
- optional string regularization_type = 29 [default = "L2"];
- // the stepsize for learning rate policy "step"
- optional int32 stepsize = 13;
- // the stepsize for learning rate policy "multistep"
- repeated int32 stepvalue = 34;
-
- // Set clip_gradients to >= 0 to clip parameter gradients to that L2 norm,
- // whenever their actual L2 norm is larger.
- optional float clip_gradients = 35 [default = -1];
-
- optional int32 snapshot = 14 [default = 0]; // The snapshot interval
- optional string snapshot_prefix = 15; // The prefix for the snapshot.
- // whether to snapshot diff in the results or not. Snapshotting diff will help
- // debugging but the final protocol buffer size will be much larger.
- optional bool snapshot_diff = 16 [default = false];
- enum SnapshotFormat {
- HDF5 = 0;
- BINARYPROTO = 1;
- }
- optional SnapshotFormat snapshot_format = 37 [default = BINARYPROTO];
- // the mode the solver will use: 0 for CPU and 1 for GPU. GPU is used by default.
- enum SolverMode {
- CPU = 0;
- GPU = 1;
- }
- optional SolverMode solver_mode = 17 [default = GPU];
- // the device_id that will be used in GPU mode. device_id = 0 is used by default.
- optional int32 device_id = 18 [default = 0]; - // If non-negative, the seed with which the Solver will initialize the Caffe - // random number generator -- useful for reproducible results. Otherwise, - // (and by default) initialize using a seed derived from the system clock. - optional int64 random_seed = 20 [default = -1]; - - // type of the solver - optional string type = 40 [default = "SGD"]; - - // numerical stability for RMSProp, AdaGrad and AdaDelta and Adam - optional float delta = 31 [default = 1e-8]; - // parameters for the Adam solver - optional float momentum2 = 39 [default = 0.999]; - - // RMSProp decay value - // MeanSquare(t) = rms_decay*MeanSquare(t-1) + (1-rms_decay)*SquareGradient(t) - optional float rms_decay = 38 [default = 0.99]; - - // If true, print information about the state of the net that may help with - // debugging learning problems. - optional bool debug_info = 23 [default = false]; - - // If false, don't save a snapshot after training finishes. - optional bool snapshot_after_train = 28 [default = true]; - - // DEPRECATED: old solver enum types, use string instead - enum SolverType { - SGD = 0; - NESTEROV = 1; - ADAGRAD = 2; - RMSPROP = 3; - ADADELTA = 4; - ADAM = 5; - } - // DEPRECATED: use type instead of solver_type - optional SolverType solver_type = 30 [default = SGD]; - - // Overlap compute and communication for data parallel training - optional bool layer_wise_reduce = 41 [default = true]; -} - -// A message that stores the solver snapshots -message SolverState { - optional int32 iter = 1; // The current iteration - optional string learned_net = 2; // The file that stores the learned net. 
- repeated BlobProto history = 3; // The history for sgd solvers - optional int32 current_step = 4 [default = 0]; // The current step for learning rate -} - -enum Phase { - TRAIN = 0; - TEST = 1; -} - -message NetState { - optional Phase phase = 1 [default = TEST]; - optional int32 level = 2 [default = 0]; - repeated string stage = 3; -} - -message NetStateRule { - // Set phase to require the NetState have a particular phase (TRAIN or TEST) - // to meet this rule. - optional Phase phase = 1; - - // Set the minimum and/or maximum levels in which the layer should be used. - // Leave undefined to meet the rule regardless of level. - optional int32 min_level = 2; - optional int32 max_level = 3; - - // Customizable sets of stages to include or exclude. - // The net must have ALL of the specified stages and NONE of the specified - // "not_stage"s to meet the rule. - // (Use multiple NetStateRules to specify conjunctions of stages.) - repeated string stage = 4; - repeated string not_stage = 5; -} - -// Specifies training parameters (multipliers on global learning constants, -// and the name and other settings used for weight sharing). -message ParamSpec { - // The names of the parameter blobs -- useful for sharing parameters among - // layers, but never required otherwise. To share a parameter between two - // layers, give it a (non-empty) name. - optional string name = 1; - - // Whether to require shared weights to have the same shape, or just the same - // count -- defaults to STRICT if unspecified. - optional DimCheckMode share_mode = 2; - enum DimCheckMode { - // STRICT (default) requires that num, channels, height, width each match. - STRICT = 0; - // PERMISSIVE requires only the count (num*channels*height*width) to match. - PERMISSIVE = 1; - } - - // The multiplier on the global learning rate for this parameter. - optional float lr_mult = 3 [default = 1.0]; - - // The multiplier on the global weight decay for this parameter. 
- optional float decay_mult = 4 [default = 1.0]; -} - -// NOTE -// Update the next available ID when you add a new LayerParameter field. -// -// LayerParameter next available layer-specific ID: 147 (last added: recurrent_param) -message LayerParameter { - optional string name = 1; // the layer name - optional string type = 2; // the layer type - repeated string bottom = 3; // the name of each bottom blob - repeated string top = 4; // the name of each top blob - - // The train / test phase for computation. - optional Phase phase = 10; - - // The amount of weight to assign each top blob in the objective. - // Each layer assigns a default value, usually of either 0 or 1, - // to each top blob. - repeated float loss_weight = 5; - - // Specifies training parameters (multipliers on global learning constants, - // and the name and other settings used for weight sharing). - repeated ParamSpec param = 6; - - // The blobs containing the numeric parameters of the layer. - repeated BlobProto blobs = 7; - - // Specifies whether to backpropagate to each bottom. If unspecified, - // Caffe will automatically infer whether each input needs backpropagation - // to compute parameter gradients. If set to true for some inputs, - // backpropagation to those inputs is forced; if set false for some inputs, - // backpropagation to those inputs is skipped. - // - // The size must be either 0 or equal to the number of bottoms. - repeated bool propagate_down = 11; - - // Rules controlling whether and when a layer is included in the network, - // based on the current NetState. You may specify a non-zero number of rules - // to include OR exclude, but not both. If no include or exclude rules are - // specified, the layer is always included. If the current NetState meets - // ANY (i.e., one or more) of the specified rules, the layer is - // included/excluded. - repeated NetStateRule include = 8; - repeated NetStateRule exclude = 9; - - // Parameters for data pre-processing. 
- optional TransformationParameter transform_param = 100; - - // Parameters shared by loss layers. - optional LossParameter loss_param = 101; - - // Layer type-specific parameters. - // - // Note: certain layers may have more than one computational engine - // for their implementation. These layers include an Engine type and - // engine parameter for selecting the implementation. - // The default for the engine is set by the ENGINE switch at compile-time. - optional AccuracyParameter accuracy_param = 102; - optional ArgMaxParameter argmax_param = 103; - optional BatchNormParameter batch_norm_param = 139; - optional BiasParameter bias_param = 141; - optional ConcatParameter concat_param = 104; - optional ContrastiveLossParameter contrastive_loss_param = 105; - optional ConvolutionParameter convolution_param = 106; - optional CropParameter crop_param = 144; - optional DataParameter data_param = 107; - optional DropoutParameter dropout_param = 108; - optional DummyDataParameter dummy_data_param = 109; - optional EltwiseParameter eltwise_param = 110; - optional ELUParameter elu_param = 140; - optional EmbedParameter embed_param = 137; - optional ExpParameter exp_param = 111; - optional FlattenParameter flatten_param = 135; - optional HDF5DataParameter hdf5_data_param = 112; - optional HDF5OutputParameter hdf5_output_param = 113; - optional HingeLossParameter hinge_loss_param = 114; - optional ImageDataParameter image_data_param = 115; - optional InfogainLossParameter infogain_loss_param = 116; - optional InnerProductParameter inner_product_param = 117; - optional InputParameter input_param = 143; - optional LogParameter log_param = 134; - optional LRNParameter lrn_param = 118; - optional MemoryDataParameter memory_data_param = 119; - optional MVNParameter mvn_param = 120; - optional ParameterParameter parameter_param = 145; - optional PoolingParameter pooling_param = 121; - optional PowerParameter power_param = 122; - optional PReLUParameter prelu_param = 131; - 
optional PythonParameter python_param = 130;
- optional RecurrentParameter recurrent_param = 146;
- optional ReductionParameter reduction_param = 136;
- optional ReLUParameter relu_param = 123;
- optional ReshapeParameter reshape_param = 133;
- optional ScaleParameter scale_param = 142;
- optional SigmoidParameter sigmoid_param = 124;
- optional SoftmaxParameter softmax_param = 125;
- optional SPPParameter spp_param = 132;
- optional SliceParameter slice_param = 126;
- optional TanHParameter tanh_param = 127;
- optional ThresholdParameter threshold_param = 128;
- optional TileParameter tile_param = 138;
- optional WindowDataParameter window_data_param = 129;
-}
-
-// Message that stores parameters used to apply transformation
-// to the data layer's data
-message TransformationParameter {
- // For data pre-processing, we can do simple scaling and subtracting the
- // data mean, if provided. Note that the mean subtraction is always carried
- // out before scaling.
- optional float scale = 1 [default = 1];
- // Specify if we want to randomly mirror data.
- optional bool mirror = 2 [default = false];
- // Specify if we would like to randomly crop an image.
- optional uint32 crop_size = 3 [default = 0];
- // mean_file and mean_value cannot be specified at the same time
- optional string mean_file = 4;
- // if specified can be repeated once (would subtract it from all the channels)
- // or can be repeated the same number of times as channels
- // (would subtract them from the corresponding channel)
- repeated float mean_value = 5;
- // Force the decoded image to have 3 color channels.
- optional bool force_color = 6 [default = false];
- // Force the decoded image to have 1 color channel.
- optional bool force_gray = 7 [default = false];
-}
-
-// Message that stores parameters shared by loss layers
-message LossParameter {
- // If specified, ignore instances with the given label.
- optional int32 ignore_label = 1; - // How to normalize the loss for loss layers that aggregate across batches, - // spatial dimensions, or other dimensions. Currently only implemented in - // SoftmaxWithLoss and SigmoidCrossEntropyLoss layers. - enum NormalizationMode { - // Divide by the number of examples in the batch times spatial dimensions. - // Outputs that receive the ignore label will NOT be ignored in computing - // the normalization factor. - FULL = 0; - // Divide by the total number of output locations that do not take the - // ignore_label. If ignore_label is not set, this behaves like FULL. - VALID = 1; - // Divide by the batch size. - BATCH_SIZE = 2; - // Do not normalize the loss. - NONE = 3; - } - // For historical reasons, the default normalization for - // SigmoidCrossEntropyLoss is BATCH_SIZE and *not* VALID. - optional NormalizationMode normalization = 3 [default = VALID]; - // Deprecated. Ignored if normalization is specified. If normalization - // is not specified, then setting this to false will be equivalent to - // normalization = BATCH_SIZE to be consistent with previous behavior. - optional bool normalize = 2; -} - -// Messages that store parameters used by individual layer types follow, in -// alphabetical order. - -message AccuracyParameter { - // When computing accuracy, count as correct by comparing the true label to - // the top k scoring classes. By default, only compare to the top scoring - // class (i.e. argmax). - optional uint32 top_k = 1 [default = 1]; - - // The "label" axis of the prediction blob, whose argmax corresponds to the - // predicted label -- may be negative to index from the end (e.g., -1 for the - // last axis). For example, if axis == 1 and the predictions are - // (N x C x H x W), the label blob is expected to contain N*H*W ground truth - // labels with integer values in {0, 1, ..., C-1}. - optional int32 axis = 2 [default = 1]; - - // If specified, ignore instances with the given label. 
- optional int32 ignore_label = 3; -} - -message ArgMaxParameter { - // If true produce pairs (argmax, maxval) - optional bool out_max_val = 1 [default = false]; - optional uint32 top_k = 2 [default = 1]; - // The axis along which to maximise -- may be negative to index from the - // end (e.g., -1 for the last axis). - // By default ArgMaxLayer maximizes over the flattened trailing dimensions - // for each index of the first / num dimension. - optional int32 axis = 3; -} - -message ConcatParameter { - // The axis along which to concatenate -- may be negative to index from the - // end (e.g., -1 for the last axis). Other axes must have the - // same dimension for all the bottom blobs. - // By default, ConcatLayer concatenates blobs along the "channels" axis (1). - optional int32 axis = 2 [default = 1]; - - // DEPRECATED: alias for "axis" -- does not support negative indexing. - optional uint32 concat_dim = 1 [default = 1]; -} - -message BatchNormParameter { - // If false, normalization is performed over the current mini-batch - // and global statistics are accumulated (but not yet used) by a moving - // average. - // If true, those accumulated mean and variance values are used for the - // normalization. - // By default, it is set to false when the network is in the training - // phase and true when the network is in the testing phase. - optional bool use_global_stats = 1; - // What fraction of the moving average remains each iteration? - // Smaller values make the moving average decay faster, giving more - // weight to the recent values. - // Each iteration updates the moving average @f$S_{t-1}@f$ with the - // current mean @f$ Y_t @f$ by - // @f$ S_t = (1-\beta)Y_t + \beta \cdot S_{t-1} @f$, where @f$ \beta @f$ - // is the moving_average_fraction parameter. - optional float moving_average_fraction = 2 [default = .999]; - // Small value to add to the variance estimate so that we don't divide by - // zero. 
- optional float eps = 3 [default = 1e-5];
-}
-
-message BiasParameter {
- // The first axis of bottom[0] (the first input Blob) along which to apply
- // bottom[1] (the second input Blob). May be negative to index from the end
- // (e.g., -1 for the last axis).
- //
- // For example, if bottom[0] is 4D with shape 100x3x40x60, the output
- // top[0] will have the same shape, and bottom[1] may have any of the
- // following shapes (for the given value of axis):
- // (axis == 0 == -4) 100; 100x3; 100x3x40; 100x3x40x60
- // (axis == 1 == -3) 3; 3x40; 3x40x60
- // (axis == 2 == -2) 40; 40x60
- // (axis == 3 == -1) 60
- // Furthermore, bottom[1] may have the empty shape (regardless of the value of
- // "axis") -- a scalar bias.
- optional int32 axis = 1 [default = 1];
-
- // (num_axes is ignored unless just one bottom is given and the bias is
- // a learned parameter of the layer. Otherwise, num_axes is determined by the
- // number of axes of the second bottom.)
- // The number of axes of the input (bottom[0]) covered by the bias
- // parameter, or -1 to cover all axes of bottom[0] starting from `axis`.
- // Set num_axes := 0, to add a zero-axis Blob: a scalar.
- optional int32 num_axes = 2 [default = 1];
-
- // (filler is ignored unless just one bottom is given and the bias is
- // a learned parameter of the layer.)
- // The initialization for the learned bias parameter.
- // Default is the zero (0) initialization, resulting in the BiasLayer
- // initially performing the identity operation.
- optional FillerParameter filler = 3;
-}
-
-message ContrastiveLossParameter {
- // margin for dissimilar pair
- optional float margin = 1 [default = 1.0];
- // The first implementation of this cost did not exactly match the cost of
- // Hadsell et al 2006 -- using (margin - d^2) instead of (margin - d)^2.
- // legacy_version = false (the default) uses (margin - d)^2 as proposed in the
- // Hadsell paper. New models should probably use this version.
- // legacy_version = true uses (margin - d^2). This is kept to support / - // reproduce existing models and results - optional bool legacy_version = 2 [default = false]; -} - -message ConvolutionParameter { - optional uint32 num_output = 1; // The number of outputs for the layer - optional bool bias_term = 2 [default = true]; // whether to have bias terms - - // Pad, kernel size, and stride are all given as a single value for equal - // dimensions in all spatial dimensions, or once per spatial dimension. - repeated uint32 pad = 3; // The padding size; defaults to 0 - repeated uint32 kernel_size = 4; // The kernel size - repeated uint32 stride = 6; // The stride; defaults to 1 - // Factor used to dilate the kernel, (implicitly) zero-filling the resulting - // holes. (Kernel dilation is sometimes referred to by its use in the - // algorithme à trous from Holschneider et al. 1987.) - repeated uint32 dilation = 18; // The dilation; defaults to 1 - - // For 2D convolution only, the *_h and *_w versions may also be used to - // specify both spatial dimensions. - optional uint32 pad_h = 9 [default = 0]; // The padding height (2D only) - optional uint32 pad_w = 10 [default = 0]; // The padding width (2D only) - optional uint32 kernel_h = 11; // The kernel height (2D only) - optional uint32 kernel_w = 12; // The kernel width (2D only) - optional uint32 stride_h = 13; // The stride height (2D only) - optional uint32 stride_w = 14; // The stride width (2D only) - - optional uint32 group = 5 [default = 1]; // The group size for group conv - - optional FillerParameter weight_filler = 7; // The filler for the weight - optional FillerParameter bias_filler = 8; // The filler for the bias - enum Engine { - DEFAULT = 0; - CAFFE = 1; - CUDNN = 2; - } - optional Engine engine = 15 [default = DEFAULT]; - - // The axis to interpret as "channels" when performing convolution. - // Preceding dimensions are treated as independent inputs; - // succeeding dimensions are treated as "spatial". 
- // With (N, C, H, W) inputs, and axis == 1 (the default), we perform - // N independent 2D convolutions, sliding C-channel (or (C/g)-channels, for - // groups g>1) filters across the spatial axes (H, W) of the input. - // With (N, C, D, H, W) inputs, and axis == 1, we perform - // N independent 3D convolutions, sliding (C/g)-channels - // filters across the spatial axes (D, H, W) of the input. - optional int32 axis = 16 [default = 1]; - - // Whether to force use of the general ND convolution, even if a specific - // implementation for blobs of the appropriate number of spatial dimensions - // is available. (Currently, there is only a 2D-specific convolution - // implementation; for input blobs with num_axes != 2, this option is - // ignored and the ND implementation will be used.) - optional bool force_nd_im2col = 17 [default = false]; -} - -message CropParameter { - // To crop, elements of the first bottom are selected to fit the dimensions - // of the second, reference bottom. The crop is configured by - // - the crop `axis` to pick the dimensions for cropping - // - the crop `offset` to set the shift for all/each dimension - // to align the cropped bottom with the reference bottom. - // All dimensions up to but excluding `axis` are preserved, while - // the dimensions including and trailing `axis` are cropped. - // If only one `offset` is set, then all dimensions are offset by this amount. - // Otherwise, the number of offsets must equal the number of cropped axes to - // shift the crop in each dimension accordingly. - // Note: standard dimensions are N,C,H,W so the default is a spatial crop, - // and `axis` may be negative to index from the end (e.g., -1 for the last - // axis). - optional int32 axis = 1 [default = 2]; - repeated uint32 offset = 2; -} - -message DataParameter { - enum DB { - LEVELDB = 0; - LMDB = 1; - } - // Specify the data source. - optional string source = 1; - // Specify the batch size. 
- optional uint32 batch_size = 4;
- // The rand_skip variable is for the data layer to skip a few data points
- // to keep all asynchronous sgd clients from starting at the same point. The skip
- // point would be set as rand_skip * rand(0,1). Note that rand_skip should not
- // be larger than the number of keys in the database.
- // DEPRECATED. Each solver accesses a different subset of the database.
- optional uint32 rand_skip = 7 [default = 0];
- optional DB backend = 8 [default = LEVELDB];
- // DEPRECATED. See TransformationParameter. For data pre-processing, we can do
- // simple scaling and subtracting the data mean, if provided. Note that the
- // mean subtraction is always carried out before scaling.
- optional float scale = 2 [default = 1];
- optional string mean_file = 3;
- // DEPRECATED. See TransformationParameter. Specify if we would like to randomly
- // crop an image.
- optional uint32 crop_size = 5 [default = 0];
- // DEPRECATED. See TransformationParameter. Specify if we want to randomly mirror
- // data.
- optional bool mirror = 6 [default = false];
- // Force the encoded image to have 3 color channels
- optional bool force_encoded_color = 9 [default = false];
- // Prefetch queue (Increase if data feeding bandwidth varies, within the
- // limit of device memory for GPU training)
- optional uint32 prefetch = 10 [default = 4];
-}
-
-message DropoutParameter {
- optional float dropout_ratio = 1 [default = 0.5]; // dropout ratio
-}
-
-// DummyDataLayer fills any number of arbitrarily shaped blobs with random
-// (or constant) data generated by "Fillers" (see "message FillerParameter").
-message DummyDataParameter {
- // This layer produces N >= 1 top blobs. DummyDataParameter must specify 1 or N
- // shape fields, and 0, 1 or N data_fillers.
- //
- // If 0 data_fillers are specified, ConstantFiller with a value of 0 is used.
- // If 1 data_filler is specified, it is applied to all top blobs. If N are
- // specified, the ith is applied to the ith top blob.
- repeated FillerParameter data_filler = 1; - repeated BlobShape shape = 6; - - // 4D dimensions -- deprecated. Use "shape" instead. - repeated uint32 num = 2; - repeated uint32 channels = 3; - repeated uint32 height = 4; - repeated uint32 width = 5; -} - -message EltwiseParameter { - enum EltwiseOp { - PROD = 0; - SUM = 1; - MAX = 2; - } - optional EltwiseOp operation = 1 [default = SUM]; // element-wise operation - repeated float coeff = 2; // blob-wise coefficient for SUM operation - - // Whether to use an asymptotically slower (for >2 inputs) but stabler method - // of computing the gradient for the PROD operation. (No effect for SUM op.) - optional bool stable_prod_grad = 3 [default = true]; -} - -// Message that stores parameters used by ELULayer -message ELUParameter { - // Described in: - // Clevert, D.-A., Unterthiner, T., & Hochreiter, S. (2015). Fast and Accurate - // Deep Network Learning by Exponential Linear Units (ELUs). arXiv - optional float alpha = 1 [default = 1]; -} - -// Message that stores parameters used by EmbedLayer -message EmbedParameter { - optional uint32 num_output = 1; // The number of outputs for the layer - // The input is given as integers to be interpreted as one-hot - // vector indices with dimension num_input. Hence num_input should be - // 1 greater than the maximum possible input value. - optional uint32 input_dim = 2; - - optional bool bias_term = 3 [default = true]; // Whether to use a bias term - optional FillerParameter weight_filler = 4; // The filler for the weight - optional FillerParameter bias_filler = 5; // The filler for the bias - -} - -// Message that stores parameters used by ExpLayer -message ExpParameter { - // ExpLayer computes outputs y = base ^ (shift + scale * x), for base > 0. - // Or if base is set to the default (-1), base is set to e, - // so y = exp(shift + scale * x). 
- optional float base = 1 [default = -1.0]; - optional float scale = 2 [default = 1.0]; - optional float shift = 3 [default = 0.0]; -} - -/// Message that stores parameters used by FlattenLayer -message FlattenParameter { - // The first axis to flatten: all preceding axes are retained in the output. - // May be negative to index from the end (e.g., -1 for the last axis). - optional int32 axis = 1 [default = 1]; - - // The last axis to flatten: all following axes are retained in the output. - // May be negative to index from the end (e.g., the default -1 for the last - // axis). - optional int32 end_axis = 2 [default = -1]; -} - -// Message that stores parameters used by HDF5DataLayer -message HDF5DataParameter { - // Specify the data source. - optional string source = 1; - // Specify the batch size. - optional uint32 batch_size = 2; - - // Specify whether to shuffle the data. - // If shuffle == true, the ordering of the HDF5 files is shuffled, - // and the ordering of data within any given HDF5 file is shuffled, - // but data between different files are not interleaved; all of a file's - // data are output (in a random order) before moving onto another file. - optional bool shuffle = 3 [default = false]; -} - -message HDF5OutputParameter { - optional string file_name = 1; -} - -message HingeLossParameter { - enum Norm { - L1 = 1; - L2 = 2; - } - // Specify the Norm to use L1 or L2 - optional Norm norm = 1 [default = L1]; -} - -message ImageDataParameter { - // Specify the data source. - optional string source = 1; - // Specify the batch size. - optional uint32 batch_size = 4 [default = 1]; - // The rand_skip variable is for the data layer to skip a few data points - // to avoid all asynchronous sgd clients to start at the same point. The skip - // point would be set as rand_skip * rand(0,1). Note that rand_skip should not - // be larger than the number of keys in the database. 
-  optional uint32 rand_skip = 7 [default = 0];
-  // Whether or not ImageLayer should shuffle the list of files at every epoch.
-  optional bool shuffle = 8 [default = false];
-  // It will also resize images if new_height or new_width are not zero.
-  optional uint32 new_height = 9 [default = 0];
-  optional uint32 new_width = 10 [default = 0];
-  // Specify if the images are color or gray
-  optional bool is_color = 11 [default = true];
-  // DEPRECATED. See TransformationParameter. For data pre-processing, we can do
-  // simple scaling and subtracting the data mean, if provided. Note that the
-  // mean subtraction is always carried out before scaling.
-  optional float scale = 2 [default = 1];
-  optional string mean_file = 3;
-  // DEPRECATED. See TransformationParameter. Specify if we would like to randomly
-  // crop an image.
-  optional uint32 crop_size = 5 [default = 0];
-  // DEPRECATED. See TransformationParameter. Specify if we want to randomly mirror
-  // data.
-  optional bool mirror = 6 [default = false];
-  optional string root_folder = 12 [default = ""];
-}
-
-message InfogainLossParameter {
-  // Specify the infogain matrix source.
-  optional string source = 1;
-  optional int32 axis = 2 [default = 1]; // axis of prob
-}
-
-message InnerProductParameter {
-  optional uint32 num_output = 1; // The number of outputs for the layer
-  optional bool bias_term = 2 [default = true]; // whether to have bias terms
-  optional FillerParameter weight_filler = 3; // The filler for the weight
-  optional FillerParameter bias_filler = 4; // The filler for the bias
-
-  // The first axis to be lumped into a single inner product computation;
-  // all preceding axes are retained in the output.
-  // May be negative to index from the end (e.g., -1 for the last axis).
-  optional int32 axis = 5 [default = 1];
-  // Specify whether to transpose the weight matrix or not.
-  // If transpose == true, any operations will be performed on the transpose
-  // of the weight matrix. The weight matrix itself is not going to be transposed
-  // but rather the transfer flag of operations will be toggled accordingly.
-  optional bool transpose = 6 [default = false];
-}
-
-message InputParameter {
-  // This layer produces N >= 1 top blob(s) to be assigned manually.
-  // Define N shapes to set a shape for each top.
-  // Define 1 shape to set the same shape for every top.
-  // Define no shape to defer to reshaping manually.
-  repeated BlobShape shape = 1;
-}
-
-// Message that stores parameters used by LogLayer
-message LogParameter {
-  // LogLayer computes outputs y = log_base(shift + scale * x), for base > 0.
-  // Or if base is set to the default (-1), base is set to e,
-  // so y = ln(shift + scale * x) = log_e(shift + scale * x)
-  optional float base = 1 [default = -1.0];
-  optional float scale = 2 [default = 1.0];
-  optional float shift = 3 [default = 0.0];
-}
-
-// Message that stores parameters used by LRNLayer
-message LRNParameter {
-  optional uint32 local_size = 1 [default = 5];
-  optional float alpha = 2 [default = 1.];
-  optional float beta = 3 [default = 0.75];
-  enum NormRegion {
-    ACROSS_CHANNELS = 0;
-    WITHIN_CHANNEL = 1;
-  }
-  optional NormRegion norm_region = 4 [default = ACROSS_CHANNELS];
-  optional float k = 5 [default = 1.];
-  enum Engine {
-    DEFAULT = 0;
-    CAFFE = 1;
-    CUDNN = 2;
-  }
-  optional Engine engine = 6 [default = DEFAULT];
-}
-
-message MemoryDataParameter {
-  optional uint32 batch_size = 1;
-  optional uint32 channels = 2;
-  optional uint32 height = 3;
-  optional uint32 width = 4;
-}
-
-message MVNParameter {
-  // This parameter can be set to false to normalize mean only
-  optional bool normalize_variance = 1 [default = true];
-
-  // This parameter can be set to true to perform DNN-like MVN
-  optional bool across_channels = 2 [default = false];
-
-  // Epsilon for not dividing by zero while normalizing variance
-  optional float eps = 3 [default = 1e-9];
-}
-
-message ParameterParameter {
-  optional BlobShape shape = 1;
-}
- -message PoolingParameter { - enum PoolMethod { - MAX = 0; - AVE = 1; - STOCHASTIC = 2; - } - optional PoolMethod pool = 1 [default = MAX]; // The pooling method - // Pad, kernel size, and stride are all given as a single value for equal - // dimensions in height and width or as Y, X pairs. - optional uint32 pad = 4 [default = 0]; // The padding size (equal in Y, X) - optional uint32 pad_h = 9 [default = 0]; // The padding height - optional uint32 pad_w = 10 [default = 0]; // The padding width - optional uint32 kernel_size = 2; // The kernel size (square) - optional uint32 kernel_h = 5; // The kernel height - optional uint32 kernel_w = 6; // The kernel width - optional uint32 stride = 3 [default = 1]; // The stride (equal in Y, X) - optional uint32 stride_h = 7; // The stride height - optional uint32 stride_w = 8; // The stride width - enum Engine { - DEFAULT = 0; - CAFFE = 1; - CUDNN = 2; - } - optional Engine engine = 11 [default = DEFAULT]; - // If global_pooling then it will pool over the size of the bottom by doing - // kernel_h = bottom->height and kernel_w = bottom->width - optional bool global_pooling = 12 [default = false]; -} - -message PowerParameter { - // PowerLayer computes outputs y = (shift + scale * x) ^ power. - optional float power = 1 [default = 1.0]; - optional float scale = 2 [default = 1.0]; - optional float shift = 3 [default = 0.0]; -} - -message PythonParameter { - optional string module = 1; - optional string layer = 2; - // This value is set to the attribute `param_str` of the `PythonLayer` object - // in Python before calling the `setup()` method. This could be a number, - // string, dictionary in Python dict format, JSON, etc. You may parse this - // string in `setup` method and use it in `forward` and `backward`. 
- optional string param_str = 3 [default = '']; - // DEPRECATED - optional bool share_in_parallel = 4 [default = false]; -} - -// Message that stores parameters used by RecurrentLayer -message RecurrentParameter { - // The dimension of the output (and usually hidden state) representation -- - // must be explicitly set to non-zero. - optional uint32 num_output = 1 [default = 0]; - - optional FillerParameter weight_filler = 2; // The filler for the weight - optional FillerParameter bias_filler = 3; // The filler for the bias - - // Whether to enable displaying debug_info in the unrolled recurrent net. - optional bool debug_info = 4 [default = false]; - - // Whether to add as additional inputs (bottoms) the initial hidden state - // blobs, and add as additional outputs (tops) the final timestep hidden state - // blobs. The number of additional bottom/top blobs required depends on the - // recurrent architecture -- e.g., 1 for RNNs, 2 for LSTMs. - optional bool expose_hidden = 5 [default = false]; -} - -// Message that stores parameters used by ReductionLayer -message ReductionParameter { - enum ReductionOp { - SUM = 1; - ASUM = 2; - SUMSQ = 3; - MEAN = 4; - } - - optional ReductionOp operation = 1 [default = SUM]; // reduction operation - - // The first axis to reduce to a scalar -- may be negative to index from the - // end (e.g., -1 for the last axis). - // (Currently, only reduction along ALL "tail" axes is supported; reduction - // of axis M through N, where N < num_axes - 1, is unsupported.) - // Suppose we have an n-axis bottom Blob with shape: - // (d0, d1, d2, ..., d(m-1), dm, d(m+1), ..., d(n-1)). - // If axis == m, the output Blob will have shape - // (d0, d1, d2, ..., d(m-1)), - // and the ReductionOp operation is performed (d0 * d1 * d2 * ... * d(m-1)) - // times, each including (dm * d(m+1) * ... * d(n-1)) individual data. 
-  // If axis == 0 (the default), the output Blob always has the empty shape
-  // (count 1), performing reduction across the entire input --
-  // often useful for creating new loss functions.
-  optional int32 axis = 2 [default = 0];
-
-  optional float coeff = 3 [default = 1.0]; // coefficient for output
-}
-
-// Message that stores parameters used by ReLULayer
-message ReLUParameter {
-  // Allow non-zero slope for negative inputs to speed up optimization
-  // Described in:
-  // Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013). Rectifier nonlinearities
-  // improve neural network acoustic models. In ICML Workshop on Deep Learning
-  // for Audio, Speech, and Language Processing.
-  optional float negative_slope = 1 [default = 0];
-  enum Engine {
-    DEFAULT = 0;
-    CAFFE = 1;
-    CUDNN = 2;
-  }
-  optional Engine engine = 2 [default = DEFAULT];
-}
-
-message ReshapeParameter {
-  // Specify the output dimensions. If some of the dimensions are set to 0,
-  // the corresponding dimension from the bottom layer is used (unchanged).
-  // Exactly one dimension may be set to -1, in which case its value is
-  // inferred from the count of the bottom blob and the remaining dimensions.
-  // For example, suppose we want to reshape a 2D blob "input" with shape 2 x 8:
-  //
-  //   layer {
-  //     type: "Reshape" bottom: "input" top: "output"
-  //     reshape_param { ... }
-  //   }
-  //
-  // If "input" is 2D with shape 2 x 8, then the following reshape_param
-  // specifications are all equivalent, producing a 3D blob "output" with shape
-  // 2 x 2 x 4:
-  //
-  //   reshape_param { shape { dim: 2 dim: 2 dim: 4 } }
-  //   reshape_param { shape { dim: 0 dim: 2 dim: 4 } }
-  //   reshape_param { shape { dim: 0 dim: 2 dim: -1 } }
-  //   reshape_param { shape { dim: 0 dim: -1 dim: 4 } }
-  //
-  optional BlobShape shape = 1;
-
-  // axis and num_axes control the portion of the bottom blob's shape that are
-  // replaced by (included in) the reshape. By default (axis == 0 and
-  // num_axes == -1), the entire bottom blob shape is included in the reshape,
-  // and hence the shape field must specify the entire output shape.
-  //
-  // axis may be non-zero to retain some portion of the beginning of the input
-  // shape (and may be negative to index from the end; e.g., -1 to begin the
-  // reshape after the last axis, including nothing in the reshape,
-  // -2 to include only the last axis, etc.).
-  //
-  // For example, suppose "input" is a 2D blob with shape 2 x 8.
-  // Then the following ReshapeLayer specifications are all equivalent,
-  // producing a blob "output" with shape 2 x 2 x 4:
-  //
-  //   reshape_param { shape { dim: 2 dim: 2 dim: 4 } }
-  //   reshape_param { shape { dim: 2 dim: 4 } axis: 1 }
-  //   reshape_param { shape { dim: 2 dim: 4 } axis: -3 }
-  //
-  // num_axes specifies the extent of the reshape.
-  // If num_axes >= 0 (and axis >= 0), the reshape will be performed only on
-  // input axes in the range [axis, axis+num_axes].
-  // num_axes may also be -1, the default, to include all remaining axes
-  // (starting from axis).
-  //
-  // For example, suppose "input" is a 2D blob with shape 2 x 8.
-  // Then the following ReshapeLayer specifications are equivalent,
-  // producing a blob "output" with shape 1 x 2 x 8.
-  //
-  //   reshape_param { shape { dim: 1 dim: 2 dim: 8 } }
-  //   reshape_param { shape { dim: 1 dim: 2 } num_axes: 1 }
-  //   reshape_param { shape { dim: 1 } num_axes: 0 }
-  //
-  // On the other hand, these would produce output blob shape 2 x 1 x 8:
-  //
-  //   reshape_param { shape { dim: 2 dim: 1 dim: 8 } }
-  //   reshape_param { shape { dim: 1 } axis: 1 num_axes: 0 }
-  //
-  optional int32 axis = 2 [default = 0];
-  optional int32 num_axes = 3 [default = -1];
-}
-
-message ScaleParameter {
-  // The first axis of bottom[0] (the first input Blob) along which to apply
-  // bottom[1] (the second input Blob). May be negative to index from the end
-  // (e.g., -1 for the last axis).
- // - // For example, if bottom[0] is 4D with shape 100x3x40x60, the output - // top[0] will have the same shape, and bottom[1] may have any of the - // following shapes (for the given value of axis): - // (axis == 0 == -4) 100; 100x3; 100x3x40; 100x3x40x60 - // (axis == 1 == -3) 3; 3x40; 3x40x60 - // (axis == 2 == -2) 40; 40x60 - // (axis == 3 == -1) 60 - // Furthermore, bottom[1] may have the empty shape (regardless of the value of - // "axis") -- a scalar multiplier. - optional int32 axis = 1 [default = 1]; - - // (num_axes is ignored unless just one bottom is given and the scale is - // a learned parameter of the layer. Otherwise, num_axes is determined by the - // number of axes by the second bottom.) - // The number of axes of the input (bottom[0]) covered by the scale - // parameter, or -1 to cover all axes of bottom[0] starting from `axis`. - // Set num_axes := 0, to multiply with a zero-axis Blob: a scalar. - optional int32 num_axes = 2 [default = 1]; - - // (filler is ignored unless just one bottom is given and the scale is - // a learned parameter of the layer.) - // The initialization for the learned scale parameter. - // Default is the unit (1) initialization, resulting in the ScaleLayer - // initially performing the identity operation. - optional FillerParameter filler = 3; - - // Whether to also learn a bias (equivalent to a ScaleLayer+BiasLayer, but - // may be more efficient). Initialized with bias_filler (defaults to 0). - optional bool bias_term = 4 [default = false]; - optional FillerParameter bias_filler = 5; -} - -message SigmoidParameter { - enum Engine { - DEFAULT = 0; - CAFFE = 1; - CUDNN = 2; - } - optional Engine engine = 1 [default = DEFAULT]; -} - -message SliceParameter { - // The axis along which to slice -- may be negative to index from the end - // (e.g., -1 for the last axis). - // By default, SliceLayer concatenates blobs along the "channels" axis (1). 
- optional int32 axis = 3 [default = 1]; - repeated uint32 slice_point = 2; - - // DEPRECATED: alias for "axis" -- does not support negative indexing. - optional uint32 slice_dim = 1 [default = 1]; -} - -// Message that stores parameters used by SoftmaxLayer, SoftmaxWithLossLayer -message SoftmaxParameter { - enum Engine { - DEFAULT = 0; - CAFFE = 1; - CUDNN = 2; - } - optional Engine engine = 1 [default = DEFAULT]; - - // The axis along which to perform the softmax -- may be negative to index - // from the end (e.g., -1 for the last axis). - // Any other axes will be evaluated as independent softmaxes. - optional int32 axis = 2 [default = 1]; -} - -message TanHParameter { - enum Engine { - DEFAULT = 0; - CAFFE = 1; - CUDNN = 2; - } - optional Engine engine = 1 [default = DEFAULT]; -} - -// Message that stores parameters used by TileLayer -message TileParameter { - // The index of the axis to tile. - optional int32 axis = 1 [default = 1]; - - // The number of copies (tiles) of the blob to output. - optional int32 tiles = 2; -} - -// Message that stores parameters used by ThresholdLayer -message ThresholdParameter { - optional float threshold = 1 [default = 0]; // Strictly positive values -} - -message WindowDataParameter { - // Specify the data source. - optional string source = 1; - // For data pre-processing, we can do simple scaling and subtracting the - // data mean, if provided. Note that the mean subtraction is always carried - // out before scaling. - optional float scale = 2 [default = 1]; - optional string mean_file = 3; - // Specify the batch size. - optional uint32 batch_size = 4; - // Specify if we would like to randomly crop an image. - optional uint32 crop_size = 5 [default = 0]; - // Specify if we want to randomly mirror data. 
- optional bool mirror = 6 [default = false]; - // Foreground (object) overlap threshold - optional float fg_threshold = 7 [default = 0.5]; - // Background (non-object) overlap threshold - optional float bg_threshold = 8 [default = 0.5]; - // Fraction of batch that should be foreground objects - optional float fg_fraction = 9 [default = 0.25]; - // Amount of contextual padding to add around a window - // (used only by the window_data_layer) - optional uint32 context_pad = 10 [default = 0]; - // Mode for cropping out a detection window - // warp: cropped window is warped to a fixed size and aspect ratio - // square: the tightest square around the window is cropped - optional string crop_mode = 11 [default = "warp"]; - // cache_images: will load all images in memory for faster access - optional bool cache_images = 12 [default = false]; - // append root_folder to locate images - optional string root_folder = 13 [default = ""]; -} - -message SPPParameter { - enum PoolMethod { - MAX = 0; - AVE = 1; - STOCHASTIC = 2; - } - optional uint32 pyramid_height = 1; - optional PoolMethod pool = 2 [default = MAX]; // The pooling method - enum Engine { - DEFAULT = 0; - CAFFE = 1; - CUDNN = 2; - } - optional Engine engine = 6 [default = DEFAULT]; -} - -// DEPRECATED: use LayerParameter. 
-message V1LayerParameter {
-  repeated string bottom = 2;
-  repeated string top = 3;
-  optional string name = 4;
-  repeated NetStateRule include = 32;
-  repeated NetStateRule exclude = 33;
-  enum LayerType {
-    NONE = 0;
-    ABSVAL = 35;
-    ACCURACY = 1;
-    ARGMAX = 30;
-    BNLL = 2;
-    CONCAT = 3;
-    CONTRASTIVE_LOSS = 37;
-    CONVOLUTION = 4;
-    DATA = 5;
-    DECONVOLUTION = 39;
-    DROPOUT = 6;
-    DUMMY_DATA = 32;
-    EUCLIDEAN_LOSS = 7;
-    ELTWISE = 25;
-    EXP = 38;
-    FLATTEN = 8;
-    HDF5_DATA = 9;
-    HDF5_OUTPUT = 10;
-    HINGE_LOSS = 28;
-    IM2COL = 11;
-    IMAGE_DATA = 12;
-    INFOGAIN_LOSS = 13;
-    INNER_PRODUCT = 14;
-    LRN = 15;
-    MEMORY_DATA = 29;
-    MULTINOMIAL_LOGISTIC_LOSS = 16;
-    MVN = 34;
-    POOLING = 17;
-    POWER = 26;
-    RELU = 18;
-    SIGMOID = 19;
-    SIGMOID_CROSS_ENTROPY_LOSS = 27;
-    SILENCE = 36;
-    SOFTMAX = 20;
-    SOFTMAX_LOSS = 21;
-    SPLIT = 22;
-    SLICE = 33;
-    TANH = 23;
-    WINDOW_DATA = 24;
-    THRESHOLD = 31;
-  }
-  optional LayerType type = 5;
-  repeated BlobProto blobs = 6;
-  repeated string param = 1001;
-  repeated DimCheckMode blob_share_mode = 1002;
-  enum DimCheckMode {
-    STRICT = 0;
-    PERMISSIVE = 1;
-  }
-  repeated float blobs_lr = 7;
-  repeated float weight_decay = 8;
-  repeated float loss_weight = 35;
-  optional AccuracyParameter accuracy_param = 27;
-  optional ArgMaxParameter argmax_param = 23;
-  optional ConcatParameter concat_param = 9;
-  optional ContrastiveLossParameter contrastive_loss_param = 40;
-  optional ConvolutionParameter convolution_param = 10;
-  optional DataParameter data_param = 11;
-  optional DropoutParameter dropout_param = 12;
-  optional DummyDataParameter dummy_data_param = 26;
-  optional EltwiseParameter eltwise_param = 24;
-  optional ExpParameter exp_param = 41;
-  optional HDF5DataParameter hdf5_data_param = 13;
-  optional HDF5OutputParameter hdf5_output_param = 14;
-  optional HingeLossParameter hinge_loss_param = 29;
-  optional ImageDataParameter image_data_param = 15;
-  optional InfogainLossParameter infogain_loss_param = 16;
-  optional InnerProductParameter inner_product_param = 17;
-  optional LRNParameter lrn_param = 18;
-  optional MemoryDataParameter memory_data_param = 22;
-  optional MVNParameter mvn_param = 34;
-  optional PoolingParameter pooling_param = 19;
-  optional PowerParameter power_param = 21;
-  optional ReLUParameter relu_param = 30;
-  optional SigmoidParameter sigmoid_param = 38;
-  optional SoftmaxParameter softmax_param = 39;
-  optional SliceParameter slice_param = 31;
-  optional TanHParameter tanh_param = 37;
-  optional ThresholdParameter threshold_param = 25;
-  optional WindowDataParameter window_data_param = 20;
-  optional TransformationParameter transform_param = 36;
-  optional LossParameter loss_param = 42;
-  optional V0LayerParameter layer = 1;
-}
-
-// DEPRECATED: V0LayerParameter is the old way of specifying layer parameters
-// in Caffe. We keep this message type around for legacy support.
-message V0LayerParameter {
-  optional string name = 1; // the layer name
-  optional string type = 2; // the string to specify the layer type
-
-  // Parameters to specify layers with inner products.
- optional uint32 num_output = 3; // The number of outputs for the layer - optional bool biasterm = 4 [default = true]; // whether to have bias terms - optional FillerParameter weight_filler = 5; // The filler for the weight - optional FillerParameter bias_filler = 6; // The filler for the bias - - optional uint32 pad = 7 [default = 0]; // The padding size - optional uint32 kernelsize = 8; // The kernel size - optional uint32 group = 9 [default = 1]; // The group size for group conv - optional uint32 stride = 10 [default = 1]; // The stride - enum PoolMethod { - MAX = 0; - AVE = 1; - STOCHASTIC = 2; - } - optional PoolMethod pool = 11 [default = MAX]; // The pooling method - optional float dropout_ratio = 12 [default = 0.5]; // dropout ratio - - optional uint32 local_size = 13 [default = 5]; // for local response norm - optional float alpha = 14 [default = 1.]; // for local response norm - optional float beta = 15 [default = 0.75]; // for local response norm - optional float k = 22 [default = 1.]; - - // For data layers, specify the data source - optional string source = 16; - // For data pre-processing, we can do simple scaling and subtracting the - // data mean, if provided. Note that the mean subtraction is always carried - // out before scaling. - optional float scale = 17 [default = 1]; - optional string meanfile = 18; - // For data layers, specify the batch size. - optional uint32 batchsize = 19; - // For data layers, specify if we would like to randomly crop an image. - optional uint32 cropsize = 20 [default = 0]; - // For data layers, specify if we want to randomly mirror data. - optional bool mirror = 21 [default = false]; - - // The blobs containing the numeric parameters of the layer - repeated BlobProto blobs = 50; - // The ratio that is multiplied on the global learning rate. If you want to - // set the learning ratio for one blob, you need to set it for all blobs. 
- repeated float blobs_lr = 51; - // The weight decay that is multiplied on the global weight decay. - repeated float weight_decay = 52; - - // The rand_skip variable is for the data layer to skip a few data points - // to avoid all asynchronous sgd clients to start at the same point. The skip - // point would be set as rand_skip * rand(0,1). Note that rand_skip should not - // be larger than the number of keys in the database. - optional uint32 rand_skip = 53 [default = 0]; - - // Fields related to detection (det_*) - // foreground (object) overlap threshold - optional float det_fg_threshold = 54 [default = 0.5]; - // background (non-object) overlap threshold - optional float det_bg_threshold = 55 [default = 0.5]; - // Fraction of batch that should be foreground objects - optional float det_fg_fraction = 56 [default = 0.25]; - - // optional bool OBSOLETE_can_clobber = 57 [default = true]; - - // Amount of contextual padding to add around a window - // (used only by the window_data_layer) - optional uint32 det_context_pad = 58 [default = 0]; - - // Mode for cropping out a detection window - // warp: cropped window is warped to a fixed size and aspect ratio - // square: the tightest square around the window is cropped - optional string det_crop_mode = 59 [default = "warp"]; - - // For ReshapeLayer, one needs to specify the new dimensions. - optional int32 new_num = 60 [default = 0]; - optional int32 new_channels = 61 [default = 0]; - optional int32 new_height = 62 [default = 0]; - optional int32 new_width = 63 [default = 0]; - - // Whether or not ImageLayer should shuffle the list of files at every epoch. - // It will also resize images if new_height or new_width are not zero. - optional bool shuffle_images = 64 [default = false]; - - // For ConcatLayer, one needs to specify the dimension for concatenation, and - // the other dimensions must be the same for all the bottom blobs. - // By default it will concatenate blobs along the channels dimension. 
- optional uint32 concat_dim = 65 [default = 1]; - - optional HDF5OutputParameter hdf5_output_param = 1001; -} - -message PReLUParameter { - // Parametric ReLU described in K. He et al, Delving Deep into Rectifiers: - // Surpassing Human-Level Performance on ImageNet Classification, 2015. - - // Initial value of a_i. Default is a_i=0.25 for all i. - optional FillerParameter filler = 1; - // Whether or not slope parameters are shared across channels. - optional bool channel_shared = 2 [default = false]; -} diff --git a/utilities/inference_generator/src/caffe2openvx.cpp b/utilities/inference_generator/src/caffe2openvx.cpp deleted file mode 100644 index f837516a37..0000000000 --- a/utilities/inference_generator/src/caffe2openvx.cpp +++ /dev/null @@ -1,3033 +0,0 @@ -/* -Copyright (c) 2017 - 2023 Advanced Micro Devices, Inc. All rights reserved. - -Permission is hereby granted, free of charge, to any person obtaining a copy -of this software and associated documentation files (the "Software"), to deal -in the Software without restriction, including without limitation the rights -to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -copies of the Software, and to permit persons to whom the Software is -furnished to do so, subject to the following conditions: - -The above copyright notice and this permission notice shall be included in -all copies or substantial portions of the Software. - -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN -THE SOFTWARE. 
-*/
-
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include "caffe.pb.h"
-#include
-#include
-#include
-#include
-#include
-#include
-
-#define error(...) printf("ERROR: " __VA_ARGS__), exit(1)
-#define info(...) printf("OK: " __VA_ARGS__)
-
-//Dump Layer Data : disabled unless enabled explicitly by setting ENABLE_DUMP_LAYER_DATA = 1
-#ifndef ENABLE_DUMP_LAYER_DATA
-#define ENABLE_DUMP_LAYER_DATA 0
-#endif
-
-#ifndef ENABLE_DIRECTIVE
-#define ENABLE_DIRECTIVE 0
-#endif
-
-void getLayerParams(
-    const caffe::LayerParameter& layer,
-    std::string& params)
-{
-    if(layer.type() == "Convolution") {
-        const caffe::ConvolutionParameter& conv = layer.convolution_param();
-        int pad_h = conv.has_pad_h() ? conv.pad_h() : (conv.pad_size() > 0 ? conv.pad(0) : 0);
-        int pad_w = conv.has_pad_w() ? conv.pad_w() : (conv.pad_size() > 1 ? conv.pad(1) : pad_h);
-        int stride_h = conv.has_stride_h() ? conv.stride_h() : (conv.stride_size() > 0 ? conv.stride(0) : 1);
-        int stride_w = conv.has_stride_w() ? conv.stride_w() : (conv.stride_size() > 1 ? conv.stride(1) : stride_h);
-        int kernel_h = conv.has_kernel_h() ? conv.kernel_h() : (conv.kernel_size_size() > 0 ? conv.kernel_size(0) : 0);
-        int kernel_w = conv.has_kernel_w() ? conv.kernel_w() : (conv.kernel_size_size() > 1 ? conv.kernel_size(1) : kernel_h);
-        int k = conv.num_output();
-        int dilation_h = conv.dilation_size() > 0 ? conv.dilation(0) : 1;
-        int dilation_w = conv.dilation_size() > 1 ? conv.dilation(1) : dilation_h;
-        int bias_term = conv.bias_term();
-        int group = conv.has_group() ? conv.group() : 0;
-        params = std::to_string(k)
-            + " " + std::to_string(kernel_w)
-            + " " + std::to_string(kernel_h)
-            + " " + std::to_string(stride_w)
-            + " " + std::to_string(stride_h)
-            + " " + std::to_string(pad_w)
-            + " " + std::to_string(pad_h)
-            + " " + std::to_string(dilation_w)
-            + " " + std::to_string(dilation_h)
-            + " " + std::to_string(bias_term)
-            + " " + std::to_string(group);
-    }
-    else if(layer.type() == "Pooling") {
-        const caffe::PoolingParameter& pooling = layer.pooling_param();
-        int pad_h = pooling.has_pad_h() ? pooling.pad_h() : pooling.pad();
-        int pad_w = pooling.has_pad_w() ? pooling.pad_w() : pooling.pad();
-        int stride_h = pooling.has_stride_h() ? pooling.stride_h() : pooling.stride();
-        int stride_w = pooling.has_stride_w() ? pooling.stride_w() : pooling.stride();
-        int kernel_h = pooling.has_kernel_h() ? pooling.kernel_h() : pooling.kernel_size();
-        int kernel_w = pooling.has_kernel_w() ? pooling.kernel_w() : pooling.kernel_size();
-        int pool = pooling.pool();
-        int global_pooling = pooling.global_pooling() == true ?
1 : 0;
-        params = std::to_string(kernel_w)
-                + " " + std::to_string(kernel_h)
-                + " " + std::to_string(stride_w)
-                + " " + std::to_string(stride_h)
-                + " " + std::to_string(pad_w)
-                + " " + std::to_string(pad_h)
-                + " " + std::to_string(pool)
-                + " " + std::to_string(global_pooling);
-    }
-    else if(layer.type() == "InnerProduct") {
-        const caffe::InnerProductParameter& innerprod = layer.inner_product_param();
-        int k = innerprod.num_output();
-        int bias_term = innerprod.bias_term();
-        params = std::to_string(k) + " " + std::to_string(bias_term);
-    }
-    else if(layer.type() == "LRN") {
-        const caffe::LRNParameter& lrn = layer.lrn_param();
-        const caffe::LRNParameter::NormRegion& norm_region = lrn.norm_region();
-        params = std::to_string(lrn.local_size())
-                + " " + std::to_string(lrn.alpha())
-                + " " + std::to_string(lrn.beta())
-                + " " + std::to_string(norm_region)
-                + " " + std::to_string(lrn.k());
-    }
-    else if(layer.type() == "BatchNorm") {
-        const caffe::BatchNormParameter& norm = layer.batch_norm_param();
-        int use_global_stats = norm.use_global_stats();
-        float eps = norm.eps();
-        params = std::to_string(eps)
-                + " " + std::to_string(use_global_stats);
-    }
-    else if(layer.type() == "Scale") {
-        const caffe::ScaleParameter& scale = layer.scale_param();
-        params = std::to_string(scale.bias_term());
-    }
-    else if(layer.type() == "Dropout") {
-        const caffe::DropoutParameter& dropout = layer.dropout_param();
-        params = std::to_string(dropout.dropout_ratio());
-    }
-    else if(layer.type() == "Eltwise") {
-        const caffe::EltwiseParameter& eltwise = layer.eltwise_param();
-        params = std::to_string(eltwise.operation());
-    }
-    else if(layer.type() == "Deconvolution") {
-        const caffe::ConvolutionParameter& conv = layer.convolution_param();
-        int pad_h = conv.has_pad_h() ? conv.pad_h() : (conv.pad_size() > 0 ? conv.pad(0) : 0);
-        int pad_w = conv.has_pad_w() ? conv.pad_w() : (conv.pad_size() > 1 ? conv.pad(1) : pad_h);
-        int stride_h = conv.has_stride_h() ? conv.stride_h() : (conv.stride_size() > 0 ? conv.stride(0) : 1);
-        int stride_w = conv.has_stride_w() ? conv.stride_w() : (conv.stride_size() > 1 ? conv.stride(1) : stride_h);
-        int kernel_h = conv.has_kernel_h() ? conv.kernel_h() : (conv.kernel_size_size() > 0 ? conv.kernel_size(0) : 0);
-        int kernel_w = conv.has_kernel_w() ? conv.kernel_w() : (conv.kernel_size_size() > 1 ? conv.kernel_size(1) : kernel_h);
-        int k = conv.num_output();
-        int dilation_h = conv.dilation_size() > 0 ? conv.dilation(0) : 1;
-        int dilation_w = conv.dilation_size() > 1 ? conv.dilation(1) : dilation_h;
-        int bias_term = conv.bias_term();
-        params = std::to_string(k)
-                + " " + std::to_string(kernel_w)
-                + " " + std::to_string(kernel_h)
-                + " " + std::to_string(stride_w)
-                + " " + std::to_string(stride_h)
-                + " " + std::to_string(pad_w)
-                + " " + std::to_string(pad_h)
-                + " " + std::to_string(dilation_w)
-                + " " + std::to_string(dilation_h)
-                + " " + std::to_string(bias_term);
-    }
-    else if(layer.type() == "ReLU") {
-        const caffe::ReLUParameter& relu = layer.relu_param();
-        float neg_slope = relu.has_negative_slope() ? relu.negative_slope() : 0.0f;
-        params = std::to_string(neg_slope);
-    }
-}
-
-void getV1LayerParams(
-    const caffe::V1LayerParameter& layer,
-    std::string& params)
-{
-    if(layer.type() == caffe::V1LayerParameter_LayerType_CONVOLUTION) {
-        const caffe::ConvolutionParameter& conv = layer.convolution_param();
-        int pad_h = conv.has_pad_h() ? conv.pad_h() : (conv.pad_size() > 0 ? conv.pad(0) : 0);
-        int pad_w = conv.has_pad_w() ? conv.pad_w() : (conv.pad_size() > 1 ? conv.pad(1) : pad_h);
-        int stride_h = conv.has_stride_h() ? conv.stride_h() : (conv.stride_size() > 0 ? conv.stride(0) : 1);
-        int stride_w = conv.has_stride_w() ? conv.stride_w() : (conv.stride_size() > 1 ? conv.stride(1) : stride_h);
-        int kernel_h = conv.has_kernel_h() ? conv.kernel_h() : (conv.kernel_size_size() > 0 ? conv.kernel_size(0) : 0);
-        int kernel_w = conv.has_kernel_w() ? conv.kernel_w() : (conv.kernel_size_size() > 1 ? conv.kernel_size(1) : kernel_h);
-        int k = conv.num_output();
-        int dilation_h = conv.dilation_size() > 0 ? conv.dilation(0) : 1;
-        int dilation_w = conv.dilation_size() > 1 ? conv.dilation(1) : dilation_h;
-        int bias_term = conv.bias_term();
-        int group = conv.has_group() ? conv.group() : 0;
-        params = std::to_string(k)
-                + " " + std::to_string(kernel_w)
-                + " " + std::to_string(kernel_h)
-                + " " + std::to_string(stride_w)
-                + " " + std::to_string(stride_h)
-                + " " + std::to_string(pad_w)
-                + " " + std::to_string(pad_h)
-                + " " + std::to_string(dilation_w)
-                + " " + std::to_string(dilation_h)
-                + " " + std::to_string(bias_term)
-                + " " + std::to_string(group);
-    }
-    else if(layer.type() == caffe::V1LayerParameter_LayerType_POOLING) {
-        const caffe::PoolingParameter& pooling = layer.pooling_param();
-        int pad_h = pooling.has_pad_h() ? pooling.pad_h() : pooling.pad();
-        int pad_w = pooling.has_pad_w() ? pooling.pad_w() : pooling.pad();
-        int stride_h = pooling.has_stride_h() ? pooling.stride_h() : pooling.stride();
-        int stride_w = pooling.has_stride_w() ? pooling.stride_w() : pooling.stride();
-        int kernel_h = pooling.has_kernel_h() ? pooling.kernel_h() : pooling.kernel_size();
-        int kernel_w = pooling.has_kernel_w() ? pooling.kernel_w() : pooling.kernel_size();
-        int pool = pooling.pool();
-        int global_pooling = pooling.global_pooling() == true ? 1 : 0;
-        params = std::to_string(kernel_w)
-                + " " + std::to_string(kernel_h)
-                + " " + std::to_string(stride_w)
-                + " " + std::to_string(stride_h)
-                + " " + std::to_string(pad_w)
-                + " " + std::to_string(pad_h)
-                + " " + std::to_string(pool)
-                + " " + std::to_string(global_pooling);
-    }
-    else if(layer.type() == caffe::V1LayerParameter_LayerType_INNER_PRODUCT) {
-        const caffe::InnerProductParameter& innerprod = layer.inner_product_param();
-        int k = innerprod.num_output();
-        int bias_term = innerprod.bias_term();
-        params = std::to_string(k) + " " + std::to_string(bias_term);
-    }
-    else if(layer.type() == caffe::V1LayerParameter_LayerType_LRN) {
-        const caffe::LRNParameter& lrn = layer.lrn_param();
-        const caffe::LRNParameter::NormRegion& norm_region = lrn.norm_region();
-        params = std::to_string(lrn.local_size())
-                + " " + std::to_string(lrn.alpha())
-                + " " + std::to_string(lrn.beta())
-                + " " + std::to_string(norm_region)
-                + " " + std::to_string(lrn.k());
-    }
-    else if(layer.type() == caffe::V1LayerParameter_LayerType_DROPOUT) {
-        const caffe::DropoutParameter& dropout = layer.dropout_param();
-        params = std::to_string(dropout.dropout_ratio());
-    }
-    else if(layer.type() == caffe::V1LayerParameter_LayerType_ELTWISE) {
-        const caffe::EltwiseParameter& eltwise = layer.eltwise_param();
-        params = std::to_string(eltwise.operation());
-    }
-    else if(layer.type() == caffe::V1LayerParameter_LayerType_DECONVOLUTION) {
-        const caffe::ConvolutionParameter& conv = layer.convolution_param();
-        int pad_h = conv.has_pad_h() ? conv.pad_h() : (conv.pad_size() > 0 ? conv.pad(0) : 0);
-        int pad_w = conv.has_pad_w() ? conv.pad_w() : (conv.pad_size() > 1 ? conv.pad(1) : pad_h);
-        int stride_h = conv.has_stride_h() ? conv.stride_h() : (conv.stride_size() > 0 ? conv.stride(0) : 1);
-        int stride_w = conv.has_stride_w() ? conv.stride_w() : (conv.stride_size() > 1 ? conv.stride(1) : stride_h);
-        int kernel_h = conv.has_kernel_h() ? conv.kernel_h() : (conv.kernel_size_size() > 0 ? conv.kernel_size(0) : 0);
-        int kernel_w = conv.has_kernel_w() ? conv.kernel_w() : (conv.kernel_size_size() > 1 ? conv.kernel_size(1) : kernel_h);
-        int k = conv.num_output();
-        int dilation_h = conv.dilation_size() > 0 ? conv.dilation(0) : 1;
-        int dilation_w = conv.dilation_size() > 1 ? conv.dilation(1) : dilation_h;
-        int bias_term = conv.bias_term();
-        params = std::to_string(k)
-                + " " + std::to_string(kernel_w)
-                + " " + std::to_string(kernel_h)
-                + " " + std::to_string(stride_w)
-                + " " + std::to_string(stride_h)
-                + " " + std::to_string(pad_w)
-                + " " + std::to_string(pad_h)
-                + " " + std::to_string(dilation_w)
-                + " " + std::to_string(dilation_h)
-                + " " + std::to_string(bias_term);
-    }
-    else if(layer.type() == caffe::V1LayerParameter_LayerType_RELU) {
-        const caffe::ReLUParameter& relu = layer.relu_param();
-        float neg_slope = relu.has_negative_slope() ? relu.negative_slope() : 0.0f;
-        params = std::to_string(neg_slope);
-    }
-}
-
-std::string convertV1LayerTypeToString(caffe::V1LayerParameter_LayerType V1type)
-{
-    if(V1type == caffe::V1LayerParameter_LayerType_CONCAT)
-        return("Concat");
-    else if(V1type == caffe::V1LayerParameter_LayerType_CONVOLUTION)
-        return("Convolution");
-    else if(V1type == caffe::V1LayerParameter_LayerType_DECONVOLUTION)
-        return("Deconvolution");
-    else if(V1type == caffe::V1LayerParameter_LayerType_DROPOUT)
-        return("Dropout");
-    else if(V1type == caffe::V1LayerParameter_LayerType_ELTWISE)
-        return("Eltwise");
-    else if(V1type == caffe::V1LayerParameter_LayerType_INNER_PRODUCT)
-        return("InnerProduct");
-    else if(V1type == caffe::V1LayerParameter_LayerType_LRN)
-        return("LRN");
-    else if(V1type == caffe::V1LayerParameter_LayerType_POOLING)
-        return("Pooling");
-    else if(V1type == caffe::V1LayerParameter_LayerType_RELU)
-        return("ReLU");
-    else if(V1type == caffe::V1LayerParameter_LayerType_SOFTMAX)
-        return("Softmax");
-    else
-        return("UnknownLayer");
-}
-
-void parseProtoTxt(caffe::NetParameter * param,
-    std::vector<std::vector<std::string>>& net,
-    int inputDim[4])
-{
-    // initialize outputNameMap and input dimensions if available
-    std::map<std::string,std::string> outputNameMap;
-    if(param->input_size() > 0) {
-        outputNameMap[param->input(0)] = param->input(0);
-    }
-
-    if(param->input_dim_size() == 4 && ((inputDim[0]==0) || (inputDim[1]==0) || (inputDim[2]==0) || (inputDim[3]==0))) {
-        inputDim[0] = param->input_dim(0);
-        inputDim[1] = param->input_dim(1);
-        inputDim[2] = param->input_dim(2);
-        inputDim[3] = param->input_dim(3);
-    }
-
-    // process network layer by layer
-    for(int i = 0; i < param->layer_size(); i++) {
-        // get current layer
-        const caffe::LayerParameter layer = param->layer(i);
-
-        if(layer.type() == "Input" || layer.type() == "Data" || layer.type() == "ImageData") {
-            outputNameMap[layer.top(0)] = layer.top(0);
-
-            if(layer.type() == "Input" && ((inputDim[0]==0) || (inputDim[1]==0) || (inputDim[2]==0) || (inputDim[3]==0))) {
-                inputDim[0] = layer.input_param().shape(0).dim(0);
-                inputDim[1] = layer.input_param().shape(0).dim(1);
-                inputDim[2] = layer.input_param().shape(0).dim(2);
-                inputDim[3] = layer.input_param().shape(0).dim(3);
-            }
-            continue;
-        }
-
-        // Split type.
-        if(layer.type() == "Split") {
-            for(int j = 0; j < layer.top_size(); j++)
-            {
-                // get layer information and add to net
-                std::vector<std::string> node;
-                node.push_back(layer.type());
-                node.push_back("");
-                node.push_back(layer.top(j));
-                node.push_back(layer.top(j));
-                for(int z = 0; z < layer.bottom_size(); z++) {
-                    if(outputNameMap.find(layer.bottom(z)) == outputNameMap.end()) {
-                        outputNameMap[layer.bottom(z)] = layer.bottom(z);
-                    }
-                    node.push_back(outputNameMap[layer.bottom(z)]);
-                }
-                net.push_back(node);
-
-                // update output name with layer name
-                outputNameMap[layer.top(j)] = layer.top(j);
-            }
-            continue;
-        }
-
-        // get layer information and add to net
-        std::vector<std::string> node;
-        std::string params;
-        getLayerParams(layer, params);
-        node.push_back(layer.type());
-        node.push_back(params);
-        node.push_back(layer.top(0));
-        node.push_back(layer.name());
-        for(int j = 0; j < layer.bottom_size(); j++) {
-            if(outputNameMap.find(layer.bottom(j)) == outputNameMap.end()) {
-                outputNameMap[layer.bottom(j)] = layer.bottom(j);
-            }
-            node.push_back(outputNameMap[layer.bottom(j)]);
-        }
-        net.push_back(node);
-
-        // update output name with layer name
-        outputNameMap[layer.top(0)] = layer.name();
-    }
-}
-
-void parseV1LayerProtoTxt(caffe::NetParameter * param,
-    std::vector<std::vector<std::string>>& net,
-    int inputDim[4])
-{
-    // initialize outputNameMap and input dimensions if available
-    std::map<std::string,std::string> outputNameMap;
-    if(param->input_size() > 0) {
-        outputNameMap[param->input(0)] = param->input(0);
-    }
-
-    if(param->input_dim_size() == 4 && ((inputDim[0]==0) || (inputDim[1]==0) || (inputDim[2]==0) || (inputDim[3]==0))) {
-        inputDim[0] = param->input_dim(0);
-        inputDim[1] = param->input_dim(1);
-        inputDim[2] = param->input_dim(2);
-        inputDim[3] = param->input_dim(3);
-    }
-
-    // process network layer by layer
-    for(int i = 0; i < param->layers_size(); i++) {
-        // get current layer
-        const caffe::V1LayerParameter layer = param->layers(i);
-
-        if(layer.type() == caffe::V1LayerParameter_LayerType_DATA || layer.type() == caffe::V1LayerParameter_LayerType_IMAGE_DATA) {
-            outputNameMap[layer.top(0)] = layer.top(0);
-            continue;
-        }
-
-        // Split type.
-        if(layer.type() == caffe::V1LayerParameter_LayerType_SPLIT) {
-            for(int j = 0; j < layer.top_size(); j++)
-            {
-                // get layer information and add to net
-                std::vector<std::string> node;
-                node.push_back(convertV1LayerTypeToString(layer.type()));
-                node.push_back("");
-                node.push_back(layer.top(j));
-                node.push_back(layer.top(j));
-                for(int z = 0; z < layer.bottom_size(); z++) {
-                    if(outputNameMap.find(layer.bottom(z)) == outputNameMap.end()) {
-                        outputNameMap[layer.bottom(z)] = layer.bottom(z);
-                    }
-                    node.push_back(outputNameMap[layer.bottom(z)]);
-                }
-                net.push_back(node);
-
-                // update output name with layer name
-                outputNameMap[layer.top(j)] = layer.top(j);
-            }
-            continue;
-        }
-
-        // get layer information and add to net
-        std::vector<std::string> node;
-        std::string params;
-        getV1LayerParams(layer, params);
-        node.push_back(convertV1LayerTypeToString(layer.type()));
-        node.push_back(params);
-        node.push_back(layer.top(0));
-        node.push_back(layer.name());
-        for(int j = 0; j < layer.bottom_size(); j++) {
-            if(outputNameMap.find(layer.bottom(j)) == outputNameMap.end()) {
-                outputNameMap[layer.bottom(j)] = layer.bottom(j);
-            }
-            node.push_back(outputNameMap[layer.bottom(j)]);
-        }
-        net.push_back(node);
-
-        // update output name with layer name
-        outputNameMap[layer.top(0)] = layer.name();
-    }
-}
-
-int loadCaffeProtoTxt(
-    const char * prototxtFileName,
-    std::vector<std::vector<std::string>>& net,
-    int inputDim[4])
-{
-    // verify that the version of the library that we linked against is
-    // compatible with the version of the headers we compiled against.
-    GOOGLE_PROTOBUF_VERIFY_VERSION;
-
-    //google::protobuf::Message * msg = new google::protobuf::Message();
-    caffe::NetParameter * msg = new caffe::NetParameter();
-
-    // open prototxt and parse
-    int fd = open(prototxtFileName, O_RDONLY);
-    if(fd < 0)
-        error("unable to open: %s\n", prototxtFileName);
-    google::protobuf::io::FileInputStream fi(fd);
-    fi.SetCloseOnDelete(true);
-    if (!google::protobuf::TextFormat::Parse(&fi, msg))
-        error("failed to parse file: %s\n", prototxtFileName);
-    info("loadCaffeProtoTxt: loading %s from %s\n", msg->has_name() ? msg->name().c_str() : "(none)", prototxtFileName);
-
-    if(msg->layer_size() > 0) {
-        parseProtoTxt(msg, net, inputDim);
-    }
-    else if(msg->layers_size() > 0) {
-        info("Reading V1 layer parameters from %s\n", prototxtFileName);
-        parseV1LayerProtoTxt(msg, net, inputDim);
-    }
-    else {
-        error("No 'layers' or 'layer' fields found in the prototxt\n");
-        return -1;
-    }
-    return 0;
-}
-
-int calculateTensorDim(
-    std::vector<std::vector<std::string>>& net,
-    int inputDim[4],
-    std::map<std::string,std::vector<int>>& tensorMap)
-{
-    tensorMap[net[0][4]] = std::vector<int>{inputDim[0], inputDim[1], inputDim[2], inputDim[3]};
-
-    for(auto& node : net) {
-        auto&& type = node[0];
-        auto&& params = node[1];
-        auto&& output = node[3];
-        auto&& input = node[4];
-        auto&& it = tensorMap.find(input);
-        if(it == tensorMap.end()) {
-            error("calculateTensorDim: no dims found for %s\n", input.c_str());
-        }
-
-        auto&& idim = it->second;
-        int n = idim[0], c = idim[1], H = idim[2], W = idim[3];
-        int k = c, h = H, w = W;
-
-        if (n < 1 || c < 1 || H < 1 || W < 1)
-            error("calculateTensorDim: got invalid dim %dx%dx%dx%d for %s\n", n, c, H, W, input.c_str());
-
-        if(type == "Convolution") {
-            std::stringstream ss(params);
-            int kernel_w, kernel_h, stride_w, stride_h, pad_w, pad_h, dilation_w, dilation_h, bias_term;
-            ss >> k >> kernel_w >> kernel_h >> stride_w >> stride_h >> pad_w >> pad_h >> dilation_w >> dilation_h >> bias_term;
-            w = ((W + 2 * pad_w - kernel_w - (kernel_w - 1) * (dilation_w - 1)) / stride_w) + 1;
-            h = ((H + 2 * pad_h - kernel_h - (kernel_h - 1) * (dilation_h - 1)) / stride_h) + 1;
-            tensorMap[output + "_W"] = std::vector<int>{k, c, kernel_h, kernel_w};
-            if(bias_term) {
-                tensorMap[output + "_B"] = std::vector<int>{k};
-            }
-        }
-        else if(type == "Deconvolution") {
-            std::stringstream ss(params);
-            int kernel_w, kernel_h, stride_w, stride_h, pad_w, pad_h, dilation_w, dilation_h, bias_term;
-            ss >> k >> kernel_w >> kernel_h >> stride_w >> stride_h >> pad_w >> pad_h >> dilation_w >> dilation_h >> bias_term;
-            w = stride_w * (W - 1) + dilation_w * (kernel_w - 1) + 1 - (2 * pad_w);
-            h = stride_h * (H - 1) + dilation_h * (kernel_h - 1) + 1 - (2 * pad_h);
-            tensorMap[output + "_W"] = std::vector<int>{k, c, kernel_h, kernel_w};
-            if(bias_term) {
-                tensorMap[output + "_B"] = std::vector<int>{k};
-            }
-        }
-        else if(type == "Pooling") {
-            std::stringstream ss(params);
-            int kernel_w, kernel_h, stride_w, stride_h, pad_w, pad_h, pool, global_pooling;
-            ss >> kernel_w >> kernel_h >> stride_w >> stride_h >> pad_w >> pad_h >> pool >> global_pooling;
-            if(global_pooling) {
-                // Compute kernel_w and kernel_h and write back the params for the GDF and C-code gen
-                kernel_h = H;
-                kernel_w = W;
-                pad_h = pad_w = 0;
-                stride_h = stride_w = 1;
-                params = std::to_string(kernel_w)
-                        + " " + std::to_string(kernel_h)
-                        + " " + std::to_string(stride_w)
-                        + " " + std::to_string(stride_h)
-                        + " " + std::to_string(pad_w)
-                        + " " + std::to_string(pad_h)
-                        + " " + std::to_string(pool)
-                        + " " + std::to_string(global_pooling);
-            }
-            w = static_cast<int>(ceil(static_cast<float>(W + 2 * pad_w + stride_w - kernel_w) / stride_w));
-            h = static_cast<int>(ceil(static_cast<float>(H + 2 * pad_h + stride_h - kernel_h) / stride_h));
-            if(pad_h > 0) if((h - 1) * stride_h >= (H + pad_h)) h = h - 1;
-            if(pad_w > 0) if((w - 1) * stride_w >= (W + pad_w)) w = w - 1;
-        }
-        else if(type == "InnerProduct") {
-            std::stringstream ss(params);
-            ss >> k;
-            w = 1;
-            h = 1;
-            tensorMap[output + "_W"] = std::vector<int>{k, c, H, W};
-        }
-        else if(type == "Concat") {
-            for(int i = 5; i < node.size(); i++) {
-                auto&& dim = tensorMap[node[i]];
-                k += dim[1];
-                if(dim[0] != n || dim[2] != H || dim[3] != W)
-                    error("calculateTensorDim: Concat: got invalid dim %dx%dx%dx%d for %s (should be %dx*x%dx%d)\n", dim[0], dim[1], dim[2], dim[3], node[i].c_str(), n, H, W);
-            }
-        }
-        else if(type == "SoftmaxWithLoss") {
-            output = node[5];
-        }
-        else if (type == "BatchNorm") {
-            std::stringstream ss(params);
-            int use_global_stats;
-            float eps;
-            ss >> eps >> use_global_stats;
-            tensorMap[output + "_W"] = std::vector<int>{k};
-            tensorMap[output + "_B"] = std::vector<int>{k};
-        }
-        else if(type == "Scale") {
-            std::stringstream ss(params);
-            int bias_term;
-            ss >> bias_term;
-            tensorMap[output + "_W"] = std::vector<int>{k};
-            if(bias_term) {
-                tensorMap[output + "_B"] = std::vector<int>{k};
-            }
-        }
-
-        tensorMap[output] = std::vector<int>{n, k, h, w};
-        if(n < 1 || k < 1 || h < 1 || w < 1)
-            error("calculateTensorDim: got invalid dim %dx%dx%dx%d for %s\n", n, k, h, w, output.c_str());
-    }
-    return 0;
-}
-
-std::string getIdentifierName(const std::string name)
-{
-    size_t N = name.size();
-    const char * s = name.c_str();
-    std::string cname = (N > 0 && std::isdigit(s[0])) ? "_" : "";
-    for(size_t i = 0; i < N; i++) {
-        cname += std::isalnum(s[i]) ? s[i] : '_';
-    }
-    return cname;
-}
-
-void writeGDF(
-    std::ostream& ofsGDF,
-    std::vector<std::vector<std::string>>& net,
-    std::map<std::string,std::vector<int>>& tensorMap,
-    std::string tensorType,
-    int fixedPointPosition,
-    std::string convertPolicy,
-    std::string roundPolicy,
-    bool isVirtualEnabled,
-    std::string outputFolder,
-    bool bFuseScaleLayer)
-{
-    std::map<std::string,bool> tensorCheck;
-    ofsGDF << "import vx_nn" << std::endl;
-    bool bfuse_scale_layer = bFuseScaleLayer;
-
-    for(auto& node : net) {
-        // create input/output tensor objects
-        bool isFirstLayer = (&node == &net.front());
-        bool isLastLayer = (&node == &net.back());
-        for(size_t i = 4; i < node.size(); i++) {
-            if(node[i] != "" && tensorCheck.find(node[i]) == tensorCheck.end()) {
-                auto&& dim = tensorMap[node[i]];
-                if((isVirtualEnabled && isFirstLayer) || (isVirtualEnabled && isLastLayer)) {
-                    ofsGDF << "data " << node[i] << " = tensor:4,{" << dim[3] << "," << dim[2] << "," << dim[1] << "," << dim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl;
-                    tensorCheck[node[i]] = true;
-                    if(!isLastLayer) {
-                        ofsGDF << "read data input.f32" << std::endl;
-                    }
-                }
-                else {
-                    if(isVirtualEnabled) {
-                        ofsGDF << "data " << node[i] << " = virtual-tensor:4,{" << dim[3] << "," << dim[2] << "," << dim[1] << "," << dim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl;
-                        tensorCheck[node[i]] = true;
-                    }
-                    else {
-                        ofsGDF << "data " << node[i] << " = tensor:4,{" << dim[3] << "," << dim[2] << "," << dim[1] << "," << dim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl;
-                        tensorCheck[node[i]] = true;
-                        if(isFirstLayer) ofsGDF << "read data input.f32" << std::endl;
-                    }
-                }
-            }
-        }
-        auto&& output = node[3];
-        if (node[0] == "BatchNorm" && !isLastLayer && bfuse_scale_layer) {
-            auto& next_node = *std::next(&node);
-            if (next_node[0] == "Scale") {
-                auto&& next_output = next_node[3];
-                auto&& odim = tensorMap[next_output];
-                tensorCheck[output] = true; // make sure next node doesn't create input tensor
-                if(!tensorCheck[next_output]) {
-                    if(!isVirtualEnabled) {
-                        ofsGDF << "data " << next_output << " = tensor:4,{" << odim[3] << "," << odim[2] << "," << odim[1] << "," << odim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl;
-                    }
-                    else {
-                        if(!isLastLayer) {
-                            ofsGDF << "data " << next_output << " = virtual-tensor:4,{" << odim[3] << "," << odim[2] << "," << odim[1] << "," << odim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl;
-                        }
-                        else {
-                            ofsGDF << "data " << next_output << " = tensor:4,{" << odim[3] << "," << odim[2] << "," << odim[1] << "," << odim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl;
-                        }
-                    }
-#if ENABLE_DIRECTIVE
-                    ofsGDF << "directive " << next_output << " VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl;
-#endif
-                }
-                tensorCheck[next_output] = true;
-                bfuse_scale_layer = true;
-            }
-        }
-
-        if (node[0] == "Scale" && !isFirstLayer && bfuse_scale_layer) {
-            auto& prev_node = *std::prev(&node);
-            if (prev_node[0] == "BatchNorm")
-                continue;
-        }
-
-        auto&& odim = tensorMap[output];
-        if(!tensorCheck[output]) {
-            if(!isVirtualEnabled) {
-                ofsGDF << "data " << output << " = tensor:4,{" << odim[3] << "," << odim[2] << "," << odim[1] << "," << odim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl;
-            } else {
-                if(!isLastLayer) {
-                    ofsGDF << "data " << output << " = virtual-tensor:4,{" << odim[3] << "," << odim[2] << "," << odim[1] << "," << odim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl;
-                }
-                else {
-                    ofsGDF << "data " << output << " = tensor:4,{" << odim[3] << "," << odim[2] << "," << odim[1] << "," << odim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl;
-                }
-            }
-#if ENABLE_DIRECTIVE
-            ofsGDF << "directive " << output << " VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl;
-#endif
-        }
-        tensorCheck[output] = true;
-
-        // create node object
-        auto&& type = node[0];
-        auto&& params = node[1];
-        std::string layer_name = getIdentifierName(node[3]);
- if(type == "Convolution") { - std::stringstream ss(params); - int k, kernel_w, kernel_h, stride_w, stride_h, pad_w, pad_h, dilation_w, dilation_h, bias_term, group; - ss >> k >> kernel_w >> kernel_h >> stride_w >> stride_h >> pad_w >> pad_h >> dilation_w >> dilation_h >> bias_term >> group; - - if(group > 1) { - // Slice the input tensor into group tensors - auto&& dim_ip_grp = tensorMap[node[4]]; - - for(int g = 0; g < group; g++) { - if(!isVirtualEnabled) { - ofsGDF << "data " << node[4] << "_grp" << g << " = tensor:4,{" << dim_ip_grp[3] << "," << dim_ip_grp[2] << "," << dim_ip_grp[1]/group << "," << dim_ip_grp[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - } - else { - ofsGDF << "data " << node[4] << "_grp" << g << " = virtual-tensor:4,{" << dim_ip_grp[3] << "," << dim_ip_grp[2] << "," << dim_ip_grp[1]/group << "," << dim_ip_grp[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - } - } - - // Conv - auto&& dim_op_grp = tensorMap[node[3]]; - auto&& dim_w = tensorMap[output + "_W"]; - - for(int g = 0; g < group; g++) { - if(!isVirtualEnabled) { - ofsGDF << "data " << output << "_grp" << g << " = tensor:4,{" << dim_op_grp[3] << "," << dim_op_grp[2] << "," << dim_op_grp[1]/group << "," << dim_op_grp[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - } - else { - ofsGDF << "data " << output << "_grp" << g << " = virtual-tensor:4,{" << dim_op_grp[3] << "," << dim_op_grp[2] << "," << dim_op_grp[1]/group << "," << dim_op_grp[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - } - - ofsGDF << "data " << output << "_grp" << g << "_W" << " = tensor:4,{" << dim_w[3] << "," << dim_w[2] << "," << dim_w[1]/group << "," << dim_w[0]/group << "}," << tensorType << "," << fixedPointPosition << std::endl; - ofsGDF << "init " << output << "_grp" << g << "_W weights/" << layer_name << "_grp" << g << ".f32" << std::endl; -#if ENABLE_DIRECTIVE - ofsGDF << "directive " << output << "_grp" << g << "_W" << " 
VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl; -#endif - - if(bias_term){ - ofsGDF << "data " << output << "_grp" << g << "_B" << " = tensor:1,{" << k / group << "}," << tensorType << "," << fixedPointPosition << std::endl; - ofsGDF << "init " << output << "_grp" << g << "_B bias/" << layer_name << "_grp" << g << ".f32" << std::endl; -#if ENABLE_DIRECTIVE - ofsGDF << "directive " << output << "_grp" << g << "_B" << " VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl; -#endif - } - } - - ofsGDF << "data " << node[3] << "_params = " << " scalar:VX_TYPE_NN_CONVOLUTION_PARAMS,{" << pad_w << "," << pad_h << "," << convertPolicy << "," << roundPolicy << ",VX_NN_DS_SIZE_ROUNDING_FLOOR," << dilation_w-1 << "," << dilation_h-1 << "}" << std::endl; - tensorCheck[output + "_W"] = true; - if(bias_term) tensorCheck[output + "_B"] = true; - - ofsGDF << "node com.amd.nn_extension.slice_layer "; - ofsGDF << node[4]; - for(int g = 0; g < group; g++) { - ofsGDF << " " << node[4] << "_grp" << g; - } - ofsGDF << std::endl; -#if ENABLE_DUMP_LAYER_DATA - for(int g = 0; g < group; g++) { - ofsGDF << "write "<< node[4] << "_grp" << g << " out/"<< node[4] << "_grp" << g << ".f32" << std::endl; - } -#endif - - for(int g = 0; g < group; g++) { - ofsGDF << "node org.khronos.nn_extension.convolution_layer "; - ofsGDF << node[4] << "_grp" << g << " "; - ofsGDF << node[3] << "_grp" << g << "_W "; - if(bias_term) - ofsGDF << node[3] << "_grp" << g << "_B "; - else - ofsGDF << "NULL "; - ofsGDF << node[3] << "_params "; - ofsGDF << node[3] << "_grp" << g << std::endl; - -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< node[3] << "_grp" << g << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - - ofsGDF << "node com.amd.nn_extension.concat_layer "; - ofsGDF << node[3]; - for(int g = 0; g < group; g++) { - ofsGDF << " " << node[3] << "_grp" << g; - } - ofsGDF << std::endl; -#if ENABLE_DUMP_LAYER_DATA - for(int g = 0; g < group; g++) { - ofsGDF << "write "<< node[3] << "_grp" << g << " out/"<< 
node[3] << "_grp" << g << ".f32" << std::endl; - } -#endif - } - else { - std::string weights = output + "_W"; - auto&& dim = tensorMap[weights]; - ofsGDF << "data " << weights << " = tensor:4,{" << dim[3] << "," << dim[2] << "," << dim[1] << "," << dim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - ofsGDF << "init " << weights << " "; - ofsGDF << "weights/" << layer_name << ".f32" << std::endl; -#if ENABLE_DIRECTIVE - ofsGDF << "directive " << weights << " VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl; -#endif - tensorCheck[weights] = true; - std::string bias = "NULL"; - if(bias_term) { - bias = output + "_B"; - ofsGDF << "data " << bias << " = tensor:1,{" << k << "}," << tensorType << "," << fixedPointPosition << std::endl; - ofsGDF << "init " << bias << " "; - ofsGDF << "bias/"<< layer_name << ".f32" << std::endl; -#if ENABLE_DIRECTIVE - ofsGDF << "directive " << bias << " VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl; -#endif - tensorCheck[bias] = true; - } - - ofsGDF << "data " << node[3] << "_params = " << " scalar:VX_TYPE_NN_CONVOLUTION_PARAMS,{" << pad_w << "," << pad_h << "," << convertPolicy << "," << roundPolicy << ",VX_NN_DS_SIZE_ROUNDING_FLOOR," << dilation_w-1 << "," << dilation_h-1 << "}" << std::endl; - ofsGDF << "node org.khronos.nn_extension.convolution_layer " << node[4] << " " << node[3] << "_W" << " " << bias << " " - << node[3] <<"_params" - << " " << node[3] - << std::endl; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< node[3] << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - } - else if (type == "Deconvolution") { - std::stringstream ss(params); - int k, kernel_w, kernel_h, stride_w, stride_h, pad_w, pad_h, dilation_w, dilation_h, bias_term; - ss >> k >> kernel_w >> kernel_h >> stride_w >> stride_h >> pad_w >> pad_h >> dilation_w >> dilation_h >> bias_term; - std::string weights = output + "_W"; - auto&& dim = tensorMap[weights]; - ofsGDF << "data " << weights << " = tensor:4,{" << dim[3] << "," << 
dim[2] << "," << dim[1] << "," << dim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - ofsGDF << "init " << weights << " weights/" << layer_name << ".f32" << std::endl; -#if ENABLE_DIRECTIVE - ofsGDF << "directive " << weights << " VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl; -#endif - tensorCheck[weights] = true; - std::string bias = "NULL"; - if(bias_term) { - bias = output + "_B"; - ofsGDF << "data " << bias << " = tensor:1,{" << k << "}," << tensorType << "," << fixedPointPosition << std::endl; - ofsGDF << "init " << bias << " bias/"<< layer_name << ".f32" << std::endl; -#if ENABLE_DIRECTIVE - ofsGDF << "directive " << bias << " VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl; -#endif - tensorCheck[bias] = true; - } - - ofsGDF << "data " << node[3] << "_params = " << " scalar:VX_TYPE_NN_DECONVOLUTION_PARAMS,{" << pad_w << "," << pad_h << "," << convertPolicy << "," << roundPolicy << "," << dilation_w-1 << "," << dilation_h-1 << "}" << std::endl; - ofsGDF << "node org.khronos.nn_extension.deconvolution_layer " << node[4] << " " << node[3] << "_W" << " " << bias << " " - << node[3] <<"_params" - << " " << node[3] - << std::endl; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< node[3] << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - else if(type == "Pooling") { - std::stringstream ss(params); - int kernel_w, kernel_h, stride_w, stride_h, pad_w, pad_h, pool; - ss >> kernel_w >> kernel_h >> stride_w >> stride_h >> pad_w >> pad_h >> pool; - if((pool != 0 && pool != 1)) error("writeGDF: pooling_layer supports only MAX and AVG\n"); - ofsGDF << "data " << node[3] <<"_type = " << " scalar:VX_TYPE_ENUM," << (pool == 0 ? 
"VX_NN_POOLING_MAX" : "VX_NN_POOLING_AVG")<< std::endl; - ofsGDF << "data " << node[3] <<"_kernel_w = " << "scalar:VX_TYPE_SIZE," << kernel_w << std::endl; - ofsGDF << "data " << node[3] <<"_kernel_h = " << "scalar:VX_TYPE_SIZE," << kernel_h << std::endl; - ofsGDF << "data " << node[3] <<"_pad_w = " << "scalar:VX_TYPE_SIZE," << pad_w << std::endl; - ofsGDF << "data " << node[3] <<"_pad_h = " << "scalar:VX_TYPE_SIZE," << pad_h << std::endl; - ofsGDF << "data " << node[3] <<"_roundPolicy = " << " scalar:VX_TYPE_ENUM," << roundPolicy << std::endl; - ofsGDF << "node org.khronos.nn_extension.pooling_layer " << node[4] << " " - << node[3] << "_type" << " " - << node[3] << "_kernel_w " - << node[3] << "_kernel_h " - << node[3] << "_pad_w " - << node[3] << "_pad_h " - << node[3] << "_roundPolicy" - << " " << node[3] - << std::endl; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< node[3] << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - else if(type == "InnerProduct") { - std::stringstream ss(params); - int k, bias_term; - ss >> k >> bias_term; - std::string weights = output + "_W"; - auto&& dim = tensorMap[weights]; - ofsGDF << "data " << weights << " = tensor:4,{" << dim[3] << "," << dim[2] << "," << dim[1] << "," << dim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - ofsGDF << "init " << weights << " weights/"<< layer_name << ".f32" << std::endl; -#if ENABLE_DIRECTIVE - ofsGDF << "directive " << weights << " VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl; -#endif - tensorCheck[weights] = true; - std::string bias = "NULL"; - if(bias_term) { - bias = output + "_B"; - ofsGDF << "data " << bias << " = tensor:1,{" << k << "}," << tensorType << "," << fixedPointPosition << std::endl; - ofsGDF << "init " << bias << " bias/"<< layer_name << ".f32" << std::endl; -#if ENABLE_DIRECTIVE - ofsGDF << "directive " << bias << " VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl; -#endif - tensorCheck[bias] = true; - } - ofsGDF << "data " << node[3] 
<<"_convertPolicy = " << " scalar:VX_TYPE_ENUM," << convertPolicy << std::endl; - ofsGDF << "data " << node[3] <<"_roundPolicy =" << " scalar:VX_TYPE_ENUM,VX_" << roundPolicy << std::endl; - ofsGDF << "node org.khronos.nn_extension.fully_connected_layer " << node[4] << " " << node[3] << "_W" << " " << bias << " " - << node[3] << "_convertPolicy " - << node[3] << "_roundPolicy" - << " " << node[3] - << std::endl; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< node[3] << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - else if(type == "ReLU") { - std::stringstream ss(params); - float neg_slope; - ss >> neg_slope; - if (!neg_slope) - { - ofsGDF << "data " << node[3] << "_mode = " << " scalar:VX_TYPE_ENUM,VX_NN_ACTIVATION_RELU" << std::endl; - ofsGDF << "data " << node[3] << "_param_a =" << " scalar:VX_TYPE_FLOAT32,0" << std::endl; - }else - { - ofsGDF << "data " << node[3] << "_mode = " << " scalar:VX_TYPE_ENUM,VX_NN_ACTIVATION_LEAKY_RELU" << std::endl; - ofsGDF << "data " << node[3] << "_param_a =" << " scalar:VX_TYPE_FLOAT32," << neg_slope << std::endl; - } - ofsGDF << "data " << node[3] << "_param_b =" << " scalar:VX_TYPE_FLOAT32,0" << std::endl; - ofsGDF << "node org.khronos.nn_extension.activation_layer " << node[4] << " " - << node[3] << "_mode " - << node[3] << "_param_a " - << node[3] << "_param_b" - << " " << node[3] - << std::endl; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< node[3] << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - else if(type == "LRN") { - int normalization_size; - float alpha, beta, k; - std::string norm_region; - std::stringstream ss(params); - ss >> normalization_size >> alpha >> beta >> norm_region >> k; - std::string lrnType; - if(norm_region == "1") lrnType = "VX_NN_NORMALIZATION_SAME_MAP"; - else lrnType = "VX_NN_NORMALIZATION_ACROSS_MAPS"; - ofsGDF << "data " << node[3] << "_mode = " << " scalar:VX_TYPE_ENUM," << lrnType << std::endl; - ofsGDF << "data " << node[3] << "_size = " << " 
scalar:VX_TYPE_SIZE," << normalization_size << std::endl; - ofsGDF << "data " << node[3] << "_alpha =" << " scalar:VX_TYPE_FLOAT32," << alpha << std::endl; - ofsGDF << "data " << node[3] << "_beta =" << " scalar:VX_TYPE_FLOAT32," << beta << std::endl; - ofsGDF << "data " << node[3] << "_bias =" << " scalar:VX_TYPE_FLOAT32," << k << std::endl; - ofsGDF << "node org.khronos.nn_extension.normalization_layer " << node[4] << " " - << node[3] << "_mode " - << node[3] << "_size " - << node[3] << "_alpha " - << node[3] << "_beta " - << node[3] << " " - << node[3] << "_bias" - << std::endl; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< node[3] << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - else if(type == "BatchNorm") { - int use_global_stats, bias_term; - float eps; - std::stringstream ss(params); - ss >> eps >> use_global_stats; - std::string weights = output + "_W"; - auto&& dim = tensorMap[weights]; - ofsGDF << "data " << weights << " = tensor:1,{" << dim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - ofsGDF << "init " << weights << " weights/" << layer_name << ".f32" << std::endl; -#if ENABLE_DIRECTIVE - ofsGDF << "directive " << weights << " VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl; -#endif - tensorCheck[weights] = true; - std::string bias = output + "_B"; - dim = tensorMap[bias]; - ofsGDF << "data " << bias << " = tensor:1,{" << dim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - ofsGDF << "init " << bias << " bias/" << layer_name << ".f32" << std::endl; -#if ENABLE_DIRECTIVE - ofsGDF << "directive " << bias << " VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl; -#endif - tensorCheck[bias] = true; - bias = "NULL"; - if (bfuse_scale_layer) { - // check next node. If scale extract weight and bias paramters for scale layer. 
- auto& next_node = *std::next(&node); - auto&& next_output = next_node[3]; - auto&& nn_params = next_node[1]; - std::string nn_layer_name = getIdentifierName(next_node[3]); - weights = next_output + "_W"; - std::stringstream ss(nn_params); - ss >> bias_term; - dim = tensorMap[weights]; - ofsGDF << "data " << weights << " = tensor:1,{" << dim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - ofsGDF << "init " << weights << " weights/" << nn_layer_name << ".f32" << std::endl; - tensorCheck[weights] = true; - if(bias_term) { - bias = next_output + "_B"; - ofsGDF << "data " << bias << " = tensor:1,{" << dim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - ofsGDF << "init " << bias << " bias/"<< nn_layer_name << ".f32" << std::endl; -#if ENABLE_DIRECTIVE - ofsGDF << "directive " << bias << " VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl; -#endif - tensorCheck[bias] = true; - } - ofsGDF << "data " << node[3] << "_eps =" << " scalar:VX_TYPE_FLOAT32," << eps << std::endl; - ofsGDF << "node com.amd.nn_extension.batch_normalization_layer " << node[4] << " " << node[3] << "_W " - << node[3] << "_B " - << weights << " " - << bias << " " - << node[3] << "_eps " - << next_node[3] - << std::endl; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< next_node[3] << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - else { - weights = output +"_W1"; - ofsGDF << "data " << weights << " = tensor:1,{" << dim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - // put default scale and bias term - std::vector<float> scale_arr(dim[0]); - std::fill(scale_arr.begin(), scale_arr.end(), 1.0); - std::string fileName_weights = outputFolder + "/scale_init.f32"; - FILE *fp = fopen(fileName_weights.c_str(), "wb"); - if (fp) { - fwrite(scale_arr.data(), sizeof(float), dim[0], fp); - fclose(fp); - } - ofsGDF << "init " << weights << " scale_init.f32" << std::endl; - ofsGDF << "data " << node[3] << "_eps =" << " scalar:VX_TYPE_FLOAT32," <<
eps << std::endl; - ofsGDF << "node com.amd.nn_extension.batch_normalization_layer " << node[4] << " " << node[3] << "_W " - << node[3] << "_B " - << weights << " " - << bias << " " - << node[3] << "_eps " - << output - << std::endl; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< output << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - } - else if(type == "Eltwise") { - int op; - std::stringstream ss(params); - ss >> op; - auto&& dim = tensorMap[node[3]]; - for(int i = 4; i < node.size(); i++) { - auto&& idim = tensorMap[node[i]]; - if(dim[0] != idim[0] || dim[1] != idim[1] || dim[2] != idim[2] || dim[3] != idim[3]) - error("writeGDF: Eltwise op=%d requires same dimension inputs: %s[%dx%dx%dx%d] != %s[%dx%dx%dx%d]\n", op, node[i].c_str(), idim[0], idim[1], idim[2], idim[3], node[i-1].c_str(), dim[0], dim[1], dim[2], dim[3]); - dim = idim; - } - std::string tmp = node[4]; - for(int i = 5; i < node.size(); i++) { - std::string out = node[3]; - if(i < node.size()-1) { - out += "tmp_" + std::to_string(i-4); - ofsGDF << "data " << out << " = tensor:4,{" << dim[3] << "," << dim[2] << "," << dim[1] << "," << dim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - tensorCheck[out] = true; - } - if(op == 1) { - ofsGDF << "data " << node[3] <<"_convertPolicy =" << " scalar:VX_TYPE_ENUM," << convertPolicy << std::endl; - ofsGDF << "node org.khronos.openvx.tensor_add " << tmp << " " << node[i] << " " - << node[3] << "_convertPolicy" - << " " << out - << std::endl; - tmp = out; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< node[3] << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - else error("writeGDF: Eltwise op=%d not supported\n", op); - } - } - else if(type == "Scale") { - int bias_term; - auto&& type = node[0]; - auto&& params = node[1]; - std::string layer_name = getIdentifierName(node[3]); - std::string weights = output + "_W"; - std::stringstream ss(params); ss >> bias_term; - auto&& dim = tensorMap[weights]; - ofsGDF << 
"data " << weights << " = tensor:1,{" << dim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - ofsGDF << "init " << weights << " weights/" << layer_name << ".f32" << std::endl; - tensorCheck[weights] = true; -#if ENABLE_DIRECTIVE - ofsGDF << "directive " << weights << " VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl; -#endif - std::string bias = "NULL"; - if(bias_term) { - bias = output + "_B "; - ofsGDF << "data " << bias << " = tensor:1,{" << dim[0] << "}," << tensorType << "," << fixedPointPosition << std::endl; - ofsGDF << "init " << bias << " bias/"<< layer_name << ".f32" << std::endl; -#if ENABLE_DIRECTIVE - ofsGDF << "directive " << bias << " VX_DIRECTIVE_AMD_COPY_TO_OPENCL" << std::endl; -#endif - tensorCheck[bias] = true; - } - - ofsGDF << "node com.amd.nn_extension.scale_layer " << node[4] << " " - << node[3] << "_W " - << node[3] << "_B " - << node[3] - << std::endl; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< node[3] << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - else if(type == "Concat") { - ofsGDF << "node com.amd.nn_extension.concat_layer"; - ofsGDF << " " << node[3]; - for(int i = 4; i < node.size(); i++) { - ofsGDF << " " << node[i]; - } - ofsGDF << std::endl; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< node[3] << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - else if(type == "Dropout") { - //during inference dropout layer copies its input to output. 
- ofsGDF << "node org.khronos.openvx.copy " << node[4] << " " << node[3] << std::endl; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< node[3] << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - else if(type == "Softmax") { - ofsGDF << "node org.khronos.nn_extension.softmax_layer " << node[4] - << " " << node[3] - << std::endl; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< node[3] << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - else if(type == "Split") { - ofsGDF << "node org.khronos.openvx.copy " << node[4] << " " << node[3] << std::endl; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< node[3] << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - else if(type == "SoftmaxWithLoss") { - ofsGDF << "node org.khronos.nn_extension.softmax_layer " << node[4] - << " " << node[5] - << std::endl; -#if ENABLE_DUMP_LAYER_DATA - ofsGDF << "write "<< node[3] << " out/"<< layer_name << ".f32" << std::endl; -#endif - } - else { - ofsGDF << "# " - << std::left << std::setw(16) << node[0] - << std::left << std::setw(24) << node[1] - << std::left << std::setw(32) << node[3] - ; - for(size_t i = 4; i < node.size(); i++) - ofsGDF << std::left << std::setw(32) << node[i]; - ofsGDF << std::endl; - } - if(isLastLayer) { - ofsGDF << "write " << node[3] << " output.f32" << std::endl; - auto&& odim = tensorMap[node[3]]; - printf("#OUTPUT-TENSOR: %s %d %d %d %d\n", node[3].c_str(), odim[0], odim[1], odim[2], odim[3]); - } - ofsGDF << std::endl; - } -} - -void dumpLayerData(const caffe::LayerParameter& layer_parameter, std::string outputFolder) -{ - std:: string layer_name; - if(layer_parameter.has_name()) { - layer_name = getIdentifierName(layer_parameter.name()); - } - - std::string fileName_weights = outputFolder + "/weights/" + layer_name + ".f32"; - std::string fileName_bias = outputFolder + "/bias/" + layer_name + ".f32"; - FILE * fs_weights; - FILE * fs_bias; - fs_weights = fopen(fileName_weights.c_str(), "wb"); - fs_bias = 
fopen(fileName_bias.c_str(),"wb"); - if(!fs_weights || !fs_bias) { - printf("ERROR: unable to create dump files: make sure weights and bias folders are writable.\n"); - exit(1); - } - int blob_size = layer_parameter.blobs_size(); - if(blob_size > 0) { - //Extracting the weights. - const caffe::BlobProto& weights_blob = layer_parameter.blobs(0); - int weightsize = weights_blob.data_size(); - - for(int i=0;i<weightsize;i++) { - float weight = weights_blob.data(i); - fwrite(&weight,sizeof(float),1,fs_weights); - } - if(blob_size >= 2) { - //Extraction of Bias. - const caffe::BlobProto bias_blob = layer_parameter.blobs(1); - int biassize = bias_blob.data_size(); - - for(int i=0; i < biassize; i++) { - float bias = bias_blob.data(i); - fwrite(&bias,sizeof(float),1,fs_bias); - } - } - } - - fclose(fs_weights); - fclose(fs_bias); -} - -void dumpV1LayerData(const caffe::V1LayerParameter& layer_parameter, std::string outputFolder) -{ - std:: string layer_name; - if(layer_parameter.has_name()) { - layer_name = getIdentifierName(layer_parameter.name()); - } - - if(layer_parameter.type() == caffe::V1LayerParameter_LayerType_CONVOLUTION) - { - const caffe::ConvolutionParameter& conv = layer_parameter.convolution_param(); - int num_groups = conv.has_group() ? conv.group() : 0; - if(num_groups > 1) - { - int blob_size = layer_parameter.blobs_size(); - const caffe::BlobProto& weights_blob = layer_parameter.blobs(0); - int weightsize_per_grp = weights_blob.data_size() / num_groups; - int biassize_per_grp = (blob_size >= 2) ?
layer_parameter.blobs(1).data_size() / num_groups : 0; - - for(int grp = 0; grp < num_groups; grp++) - { - std::stringstream fileName_weights; - fileName_weights << outputFolder << "/weights/" << layer_name << "_grp" << grp << ".f32"; - std::stringstream fileName_bias; - fileName_bias << outputFolder << "/bias/" << layer_name << "_grp" << grp << ".f32"; - - FILE * fs_weights = fopen(fileName_weights.str().c_str(), "wb"); - FILE * fs_bias = fopen(fileName_bias.str().c_str(),"wb"); - if(!fs_weights || !fs_bias) { - printf("ERROR: unable to create dump files: make sure weights and bias folders are writable.\n"); - exit(1); - } - - // Write weights - for(int i = weightsize_per_grp * grp; i < (weightsize_per_grp * (grp + 1)); i++) { - float weight = weights_blob.data(i); - fwrite(&weight, sizeof(float), 1, fs_weights); - } - - if(blob_size >= 2) { - // Write bias - const caffe::BlobProto bias_blob = layer_parameter.blobs(1); - for(int i = biassize_per_grp * grp; i < (biassize_per_grp * (grp + 1)); i++) { - float bias = bias_blob.data(i); - fwrite(&bias,sizeof(float),1,fs_bias); - } - } - } - return; - } - } - - std::string fileName_weights = outputFolder + "/weights/" + layer_name + ".f32"; - std::string fileName_bias = outputFolder + "/bias/" + layer_name + ".f32"; - FILE * fs_weights; - FILE * fs_bias; - fs_weights = fopen(fileName_weights.c_str(), "wb"); - fs_bias = fopen(fileName_bias.c_str(),"wb"); - if(!fs_weights || !fs_bias) { - printf("ERROR: unable to create dump files: make sure weights and bias folders are writable.\n"); - exit(1); - } - int blob_size = layer_parameter.blobs_size(); - if(blob_size > 0) { - //Extracting the weights. - const caffe::BlobProto& weights_blob = layer_parameter.blobs(0); - int weightsize = weights_blob.data_size(); - - for(int i=0;i<weightsize;i++) { - float weight = weights_blob.data(i); - fwrite(&weight,sizeof(float),1,fs_weights); - } - if(blob_size >= 2) { - //Extraction of Bias.
- const caffe::BlobProto bias_blob = layer_parameter.blobs(1); - int biassize = bias_blob.data_size(); - - for(int i=0; i < biassize; i++) { - float bias = bias_blob.data(i); - fwrite(&bias,sizeof(float),1,fs_bias); - } - } - } - - fclose(fs_weights); - fclose(fs_bias); -} - -void writeVXCode( - std::ostream& ofsCodeC, - std::vector<std::vector<std::string>>& net, - std::map<std::string,std::vector<int>>& tensorMap, - std::string tensorType, - int fixedPosition, - std::string convertPolicy, - std::string roundPolicy, - bool isVirtualEnabled, - bool bFuseScaleLayer, - std::string outputFolder, - std::string codeType) -{ - auto&& inputTensorName = net[0][4]; - auto&& outputTensorName = net[net.size()-1][3]; - - bool bfuse_scale_layer = bFuseScaleLayer; - std::map<std::string,bool> declare_tensor_check; - for(auto& node : net) { - //declare input tensors. - bool isFirstLayer = (&node == &net.front()); - bool isLastLayer = (&node == &net.back()); - - std::string layerName = getIdentifierName(node[3]); - std::string inputName = getIdentifierName(node[4]); - if(codeType == "initialize") { - ofsCodeC << " // " << layerName <<" Layer" << std::endl; - } - for(size_t i=4; i < node.size(); i++) { - if(node[i] != "" && declare_tensor_check.find(node[i]) == declare_tensor_check.end()) { - auto&& dim = tensorMap[node[i]]; - if(codeType == "initialize") { - if(node[i] != inputTensorName && node[i] != outputTensorName) { - ofsCodeC << " vx_size " << node[i] << "_dims[4] = { " << dim[3] << ", " << dim[2] << ", " << dim[1] << ", " << dim[0] << " };" << std::endl; - ofsCodeC << " vx_tensor " << node[i] << ";" << std::endl; - ofsCodeC << " " << node[i] << " = vxCreateTensor(context, 4, " << node[i] + "_dims,"<< tensorType <<", " << fixedPosition << ");" << std::endl; - ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << node[i] << ");" << std::endl; - } - } - else if(codeType == "release") { - if(node[i] != inputTensorName && node[i] != outputTensorName) { - ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << node[i] << "));" << std::endl; - } - } -
declare_tensor_check[node[i]]= true; - } - } - - if (node[0] == "BatchNorm" && !isLastLayer && bfuse_scale_layer) { - auto&& output = node[3]; - auto& next_node = *std::next(&node); - if (next_node[0] == "Scale") { - auto&& next_output = next_node[3]; - std::string nextOutput = getIdentifierName(next_node[3]); - auto&& odim = tensorMap[next_output]; - if(!declare_tensor_check[next_output]) { - if((codeType == "initialize") && nextOutput != outputTensorName) { - ofsCodeC << " vx_size " << nextOutput << "_dims[4] = { " << odim[3] << ", " << odim[2] << ", " << odim[1] << ", " << odim[0] << " };" << std::endl; - ofsCodeC << " vx_tensor " << nextOutput << ";" << std::endl; - if(isVirtualEnabled){ - ofsCodeC << " " << nextOutput << " = vxCreateVirtualTensor(graph,4, " << nextOutput + "_dims, VX_TYPE_FLOAT32," << fixedPosition << ");" << std::endl; - } - else{ - ofsCodeC << " " << nextOutput << " = vxCreateTensor(context,4, " << nextOutput + "_dims, VX_TYPE_FLOAT32," << fixedPosition << ");" << std::endl; - } - ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << nextOutput << ");" << std::endl; - } - else if((codeType == "release") && nextOutput != outputTensorName) { - ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << nextOutput << "));" << std::endl; - } - declare_tensor_check[output] = true; - } - declare_tensor_check[next_output] = true; - bfuse_scale_layer = true; - } - } - if (node[0] == "Scale" && !isFirstLayer && bfuse_scale_layer) { - auto& prev_node = *std::prev(&node); - if (prev_node[0]=="BatchNorm"){ - if(codeType == "initialize") { - ofsCodeC << " // [NOTE -- Scale Layer Fused With Batch Norm Layer]" << std::endl<< std::endl; - } - continue; - } - } - - // declare output tensor. 
- auto&& output = node[3]; - auto&& odim = tensorMap[output]; - if(!declare_tensor_check[output]) { - if(codeType == "initialize") { - if(layerName != outputTensorName) { - ofsCodeC << " vx_size " << layerName << "_dims[4] = { " << odim[3] << ", " << odim[2] << ", " << odim[1] << ", " << odim[0] << " };" << std::endl; - ofsCodeC << " vx_tensor " << layerName << ";" << std::endl; - if(isVirtualEnabled){ - ofsCodeC << " " << layerName << " = vxCreateVirtualTensor(graph,4, " << layerName + "_dims, VX_TYPE_FLOAT32," << fixedPosition << ");" << std::endl; - } - else{ - ofsCodeC << " " << layerName << " = vxCreateTensor(context,4, " << layerName + "_dims, VX_TYPE_FLOAT32," << fixedPosition << ");" << std::endl; - } - ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << layerName << ");" << std::endl; - } - } - else if(codeType == "release") { - if(layerName != outputTensorName) { - ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << layerName << "));" << std::endl; - } - } - declare_tensor_check[output] = true; - } - - auto&& type = node[0]; - auto&& params = node[1]; - if(type == "Convolution") { - std::stringstream ss(params); - int k, kernel_w, kernel_h, stride_w, stride_h, pad_w, pad_h, dilation_w, dilation_h, bias_term, group; - ss >> k >> kernel_w >> kernel_h >> stride_w >> stride_h >> pad_w >> pad_h >> dilation_w >> dilation_h >> bias_term >> group; - if(group > 1) { - auto&& idim = tensorMap[inputName]; - - if(codeType == "initialize") { - ofsCodeC << " vx_size " << inputName << "_grp_dims[4] = { " << idim[3] << ", " << idim[2] << ", " << idim[1]/group << ", " << idim[0] << " };" << std::endl; - ofsCodeC << " vx_size " << layerName << "_grp_dims[4] = { " << odim[3] << ", " << odim[2] << ", " << odim[1]/group << ", " << odim[0] << " };" << std::endl; - for(int g = 0; g < group; g++) { - // Input tensor for the group-g conv - ofsCodeC << " vx_tensor " << inputName << "_grp" << g << ";" << std::endl; - if(isVirtualEnabled){ - ofsCodeC << " " << inputName << "_grp" 
<< g << " = vxCreateVirtualTensor(graph,4, " << inputName << "_grp_dims, VX_TYPE_FLOAT32," << fixedPosition << ");" << std::endl; - } - else{ - ofsCodeC << " " << inputName << "_grp" << g << " = vxCreateTensor(context,4, " << inputName << "_grp_dims, VX_TYPE_FLOAT32," << fixedPosition << ");" << std::endl; - } - ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << inputName << "_grp" << g << ");" << std::endl; - - // Output tensor for the group-g conv - ofsCodeC << " vx_tensor " << layerName << "_grp" << g << ";" << std::endl; - if(isVirtualEnabled){ - ofsCodeC << " " << layerName << "_grp" << g << " = vxCreateVirtualTensor(graph,4, " << layerName << "_grp_dims, VX_TYPE_FLOAT32," << fixedPosition << ");" << std::endl; - } - else{ - ofsCodeC << " " << layerName << "_grp" << g << " = vxCreateTensor(context,4, " << layerName << "_grp_dims, VX_TYPE_FLOAT32," << fixedPosition << ");" << std::endl; - } - ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << layerName << "_grp" << g << ");" << std::endl; - - } - - // Slice conv input - ofsCodeC << " vx_node " << inputName << "_grp_slice_node;" << std::endl; - ofsCodeC << " " << inputName << "_grp_slice_node = " << "vxSliceLayer(graph, "; - ofsCodeC << inputName; - for(int g = 0; g < 8; g++) { - if(g < group) - ofsCodeC << ", " << inputName << "_grp" << g; - else - ofsCodeC << ", NULL"; - } - ofsCodeC << ");" << std::endl; - ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << inputName << "_grp_slice_node);" << std::endl; - ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << inputName << "_grp_slice_node));" << std::endl; - - // Concat conv output - ofsCodeC << " vx_node " << layerName << "_grp_concat_node;" << std::endl; - ofsCodeC << " " << layerName << "_grp_concat_node = " << "vxConcatLayer(graph, "; - ofsCodeC << layerName; - for(int g = 0; g < 8; g++) { - if(g < group) - ofsCodeC << ", " << layerName << "_grp" << g; - else - ofsCodeC << ", NULL"; - } - ofsCodeC << ");" << std::endl; - ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << 
layerName << "_grp_concat_node);" << std::endl; - ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName << "_grp_concat_node));" << std::endl; - } - else if(codeType == "release") { - for(int g = 0; g < group; g++) { - ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << inputName << "_grp" << g << "));" << std::endl; - ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << layerName << "_grp" << g << "));" << std::endl; - } - } - - auto&& dim = tensorMap[output + "_W"]; - if(codeType == "initialize") { - ofsCodeC << " vx_size " << layerName << "_W" << "_dims[4] = { " << dim[3] << ", " << dim[2] << ", " << dim[1]/group << ", " << dim[0]/group << " };" << std::endl; - for(int g = 0; g < group; g++) { - ofsCodeC << " vx_tensor " << layerName << "_grp" << g << "_W" << ";" << std::endl; - ofsCodeC << " " << layerName << "_grp" << g << "_W" << " = vxCreateTensor(context,4, " << layerName << "_W" << "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl; - ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << layerName << "_grp" << g << "_W" << "); " << std::endl; - ofsCodeC << " " << "ERROR_CHECK_STATUS(copyTensor(" << layerName << "_grp" << g << "_W" << ", dataFolder + \"/weights/" << layerName << "_grp" << g << ".f32\"));" << std::endl; - } - } - else if(codeType == "release") { - for(int g = 0; g < group; g++) { - ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << layerName << "_grp" << g << "_W" << "));" << std::endl; - } - } - declare_tensor_check[output + "_W"] = true; - if(bias_term) { - if(codeType == "initialize") { - ofsCodeC << " vx_size " << layerName << "_B" << "_dims[1] = { " << k/group << " };" << std::endl; - for(int g = 0; g < group; g++) { - ofsCodeC << " vx_tensor " << layerName << "_grp" << g << "_B" << ";" << std::endl; - ofsCodeC << " " << layerName << "_grp" << g << "_B" << " = vxCreateTensor(context,1, " << layerName << "_B" "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl; - 
ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << layerName << "_grp" << g << "_B" << "); " << std::endl; - ofsCodeC << " " << "ERROR_CHECK_STATUS(copyTensor(" << layerName << "_grp" << g << "_B" << ", dataFolder + \"/bias/" << layerName << "_grp" << g << ".f32\"));" << std::endl; - } - } - else if(codeType == "release") { - for(int g = 0; g < group; g++) { - ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << layerName << "_grp" << g << "_B" << "));" << std::endl; - } - } - declare_tensor_check[layerName + "_B"] = true; - } - - if(codeType == "initialize") { - ofsCodeC << " vx_nn_convolution_params_t " << layerName << "_params;" << std::endl; - ofsCodeC << " " << layerName + "_params.padding_x = " << pad_w << ";" << std::endl; - ofsCodeC << " " << layerName + "_params.padding_y = " << pad_h << ";" << std::endl; - ofsCodeC << " " << layerName + "_params.overflow_policy = " << convertPolicy << ";" << std::endl; - ofsCodeC << " " << layerName + "_params.rounding_policy = " << roundPolicy << ";" << std::endl; - ofsCodeC << " " << layerName + "_params.down_scale_size_rounding = " << "VX_NN_DS_SIZE_ROUNDING_FLOOR ;" << std::endl; - ofsCodeC << " " << layerName + "_params.dilation_x = " << dilation_w - 1 << ";" << std::endl; - ofsCodeC << " " << layerName + "_params.dilation_y = " << dilation_h - 1 << ";" << std::endl; - - for(int g = 0; g < group; g++) { - ofsCodeC << " vx_node " << layerName << "_grp" << g << "_node;" << std::endl; - ofsCodeC << " " << layerName << "_grp" << g << "_node = " << "vxConvolutionLayer(graph, "; - ofsCodeC << inputName << "_grp" << g << ", "; - ofsCodeC << layerName << "_grp" << g << "_W, "; - if(bias_term) - ofsCodeC << layerName << "_grp" << g << "_B, "; - else - ofsCodeC << "NULL, "; - ofsCodeC << "&" << layerName + "_params, " << "sizeof(" << layerName + "_params ), "; - ofsCodeC << layerName << "_grp" << g << ");" << std::endl; - ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << layerName << "_grp" << g << "_node);" << std::endl; - 
ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName << "_grp" << g << "_node));" << std::endl; - } - } - } - else { - std::string weights = layerName + "_W"; - std::string dim_weights = output + "_W"; - auto&& dim = tensorMap[dim_weights]; - if(codeType == "initialize") { - ofsCodeC << " vx_size " << weights << "_dims[4] = { " << dim[3] << ", " << dim[2] << ", " << dim[1] << ", " << dim[0] << " };" << std::endl; - ofsCodeC << " vx_tensor " << weights << ";" << std::endl; - ofsCodeC << " " << weights << " = vxCreateTensor(context,4, " << weights + "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl; - ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << weights << "); " << std::endl; - ofsCodeC << " " << "ERROR_CHECK_STATUS(copyTensor(" << weights << ", dataFolder + \"/weights/" + layerName + ".f32\"));" << std::endl; - } - else if(codeType == "release") { - ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << weights << "));" << std::endl; - } - declare_tensor_check[weights] = true; - std::string bias = "NULL"; - if(bias_term) { - bias = layerName + "_B"; - if(codeType == "initialize") { - ofsCodeC << " vx_size " << bias << "_dims[1] = { " << k << " };" << std::endl; - ofsCodeC << " vx_tensor " << bias << ";" << std::endl; - ofsCodeC << " " << bias << " = vxCreateTensor(context,1, " << bias + "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl; - ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << bias << "); " << std::endl; - ofsCodeC << " " << "ERROR_CHECK_STATUS(copyTensor(" << bias << ", dataFolder + \"/bias/" + layerName + ".f32\"));" << std::endl; - } - else if(codeType == "release") { - ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << bias << "));" << std::endl; - } - declare_tensor_check[bias] = true; - } - if(codeType == "initialize") { - ofsCodeC << " vx_nn_convolution_params_t " << layerName << "_params;" << std::endl; - ofsCodeC << " " << layerName + "_params.padding_x = " << pad_w << ";" << 
std::endl; - ofsCodeC << " " << layerName + "_params.padding_y = " << pad_h << ";" << std::endl; - ofsCodeC << " " << layerName + "_params.overflow_policy = " << convertPolicy << ";" << std::endl; - ofsCodeC << " " << layerName + "_params.rounding_policy = " << roundPolicy << ";" << std::endl; - ofsCodeC << " " << layerName + "_params.down_scale_size_rounding = " << "VX_NN_DS_SIZE_ROUNDING_FLOOR ;" << std::endl; - ofsCodeC << " " << layerName + "_params.dilation_x = " << dilation_w - 1 << ";" << std::endl; - ofsCodeC << " " << layerName + "_params.dilation_y = " << dilation_h - 1 << ";" << std::endl; - ofsCodeC << " vx_node " << layerName << "_node;" << std::endl; - ofsCodeC << " " << layerName + "_node = " << "vxConvolutionLayer(graph, " << inputName << ", " << weights << ", " << bias << ", &" << layerName + "_params, " << "sizeof(" << layerName + "_params ), " << layerName << ");" << std::endl; - ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl; - ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl; - } - } - } - else if(type == "Deconvolution") { - std::stringstream ss(params); - int k, kernel_w, kernel_h, stride_w, stride_h, pad_w, pad_h, dilation_w, dilation_h, bias_term; - ss >> k >> kernel_w >> kernel_h >> stride_w >> stride_h >> pad_w >> pad_h >> dilation_w >> dilation_h >> bias_term; - std::string weights = layerName + "_W"; - std::string dim_weights = output + "_W"; - auto&& dim = tensorMap[dim_weights]; - if(codeType == "initialize") { - ofsCodeC << " vx_size " << weights << "_dims[4] = { " << dim[3] << ", " << dim[2] << ", " << dim[1] << ", " << dim[0] << " };" << std::endl; - ofsCodeC << " vx_tensor " << weights << ";" << std::endl; - ofsCodeC << " " << weights + "= vxCreateTensor(context,4, " << weights + "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl; - ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << weights << "); " << std::endl; - ofsCodeC << " " << 
"ERROR_CHECK_STATUS(copyTensor(" << weights << ", dataFolder + \"/weights/" + layerName + ".f32\"));" << std::endl; - } - else if(codeType == "release") { - ofsCodeC << " " << "vxReleaseTensor(&" << weights << " );" << std::endl; - } - declare_tensor_check[weights] = true; - std::string bias = "NULL"; - if(bias_term) { - bias = layerName + "_B"; - if(codeType == "initialize") { - ofsCodeC << " vx_size " << bias << "_dims[1] = { " << k << " };" << std::endl; - ofsCodeC << " vx_tensor " << bias << ";" << std::endl; - ofsCodeC << " " << bias + " = vxCreateTensor(context,1, " << bias + "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl; - ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << bias << "); " << std::endl; - ofsCodeC << " " << "ERROR_CHECK_STATUS(copyTensor(" << bias << ", dataFolder + \"/bias/" + layerName + ".f32\"));" << std::endl; - } - else if(codeType == "release") { - ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << bias << "));" << std::endl; - } - declare_tensor_check[bias] = true; - } - if(codeType == "initialize") { - ofsCodeC << " vx_nn_deconvolution_params_t " << layerName << "_params;" << std::endl; - ofsCodeC << " " << layerName + "_params.padding_x = " << pad_w << ";" << std::endl; - ofsCodeC << " " << layerName + "_params.padding_y = " << pad_h << ";" << std::endl; - ofsCodeC << " " << layerName + "_params.overflow_policy = " << convertPolicy << ";" << std::endl; - ofsCodeC << " " << layerName + "_params.rounding_policy = " << roundPolicy << ";" << std::endl; - ofsCodeC << " " << layerName + "_params.a_x = " << dilation_w - 1 << ";" << std::endl; - ofsCodeC << " " << layerName + "_params.a_y = " << dilation_h - 1 << ";" << std::endl; - ofsCodeC << " vx_node " << layerName << "_node;" << std::endl; - ofsCodeC << " " << layerName + "_node = " << " vxDeconvolutionLayer(graph, " << inputName << ", " << weights << ", " << bias << ", &" << layerName + "_params, sizeof(" + layerName + "_params), " << layerName << ");" << 
std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl;
-            }
-        }
-        else if(type == "Pooling") {
-            std::stringstream ss(params);
-            int kernel_w, kernel_h, stride_w, stride_h, pad_w, pad_h, pool;
-            ss >> kernel_w >> kernel_h >> stride_w >> stride_h >> pad_w >> pad_h >> pool;
-            if((pool != 0 && pool != 1)) error("writeGDF: pooling_layer supports only MAX and AVG\n");
-            if(codeType == "initialize") {
-                ofsCodeC << " vx_enum " << layerName << "_type = " << (pool == 0 ? "VX_NN_POOLING_MAX" : "VX_NN_POOLING_AVG") << ";" << std::endl;
-                ofsCodeC << " vx_size " << layerName << "_kernel_w = " << kernel_w << ";" << std::endl;
-                ofsCodeC << " vx_size " << layerName << "_kernel_h = " << kernel_h << ";" << std::endl;
-                ofsCodeC << " vx_size " << layerName << "_pad_w = " << pad_w << ";" << std::endl;
-                ofsCodeC << " vx_size " << layerName << "_pad_h = " << pad_h << ";" << std::endl;
-                ofsCodeC << " vx_enum " << layerName << "_roundPolicy = " << roundPolicy << ";" << std::endl;
-                ofsCodeC << " vx_node " << layerName << "_node;" << std::endl;
-                ofsCodeC << " " << layerName + "_node = " << "vxPoolingLayer(graph, " << inputName << ", " << layerName + "_type" << ", " << layerName + "_kernel_w, " << layerName + "_kernel_h, "
-                         << layerName + "_pad_w, " << layerName + "_pad_h, " << layerName + "_roundPolicy, " << layerName << " );" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl;
-            }
-        }
-        else if(type == "InnerProduct") {
-            std::stringstream ss(params);
-            int k,bias_term;
-            ss >> k >> bias_term;
-            std::string weights = layerName + "_W";
-            std::string dim_weights = output + "_W";
-            auto&& dim = tensorMap[dim_weights];
-            if(codeType == "initialize") {
-                ofsCodeC << " vx_size " << weights << "_dims[4] = { " << dim[3]
-                         << ", " << dim[2] << ", " << dim[1] << ", " << dim[0] << " };" << std::endl;
-                ofsCodeC << " vx_tensor " << weights << ";" << std::endl;
-                ofsCodeC << " " << weights << "= vxCreateTensor(context,4," << weights + "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << weights << "); " << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(copyTensor(" << weights << ", dataFolder + \"/weights/" + layerName + ".f32\"));" << std::endl;
-            }
-            else if(codeType == "release") {
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << weights << "));" << std::endl;
-            }
-            declare_tensor_check[weights]= true;
-            std::string bias= "NULL";
-            if(bias_term) {
-                bias = layerName + "_B";
-                if(codeType == "initialize") {
-                    ofsCodeC << " vx_size " << bias << "_dims[1] = { " << k << " };" << std::endl;
-                    ofsCodeC << " vx_tensor " << bias << ";" << std::endl;
-                    ofsCodeC << " " << bias << "= vxCreateTensor(context,1," << bias + "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl;
-                    ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << bias << "); " << std::endl;
-                    ofsCodeC << " " << "ERROR_CHECK_STATUS(copyTensor(" << bias << ", dataFolder + \"/bias/" + layerName + ".f32\"));" << std::endl;
-                }
-                else if(codeType == "release") {
-                    ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << bias << "));" << std::endl;
-                }
-                declare_tensor_check[bias]= true;
-            }
-            if(codeType == "initialize") {
-                ofsCodeC << " vx_enum " << layerName << "_convertPolicy = " << convertPolicy << ";" << std::endl;
-                ofsCodeC << " vx_enum " << layerName << "_roundPolicy = " << roundPolicy << ";" << std::endl;
-                ofsCodeC << " vx_node " << layerName << "_node;" << std::endl;
-                ofsCodeC << " " << layerName + "_node = " << "vxFullyConnectedLayer( graph, " << inputName << ", " << weights << ", " << bias << ", " << layerName + "_convertPolicy, " << layerName + "_roundPolicy, " << layerName + ");" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl;
-            }
-        }
-        else if(type == "ReLU") {
-            std::stringstream ss(params);
-            float neg_slope;
-            ss >> neg_slope;
-            if(codeType == "initialize") {
-                if (!neg_slope) {
-                    ofsCodeC << " vx_enum " << layerName << "_mode = " << "VX_NN_ACTIVATION_RELU ; " << std::endl;
-                    ofsCodeC << " vx_float32 " << layerName << "_param_a = 0;" << std::endl;
-                } else
-                {
-                    ofsCodeC << " vx_enum " << layerName << "_mode = " << "VX_NN_ACTIVATION_LEAKY_RELU ; " << std::endl;
-                    ofsCodeC << " vx_float32 " << layerName << "_param_a = " << neg_slope << ";" << std::endl;
-                }
-                ofsCodeC << " vx_float32 " << layerName << "_param_b = 0;" << std::endl;
-                ofsCodeC << " vx_node " << layerName << "_node;" << std::endl;
-                ofsCodeC << " " << layerName + "_node = " << "vxActivationLayer(graph, " << inputName << ", " << layerName + "_mode, " << layerName + "_param_a, " << layerName + "_param_b, " << layerName << ");" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl;
-            }
-        }
-        else if(type == "LRN") {
-            int normalization_size; float alpha,beta,k;
-            std::string norm_region;
-            std::stringstream ss(params);
-            ss >> normalization_size >> alpha >> beta >> norm_region >> k;
-            std::string lrnType;
-            lrnType = (norm_region == "1") ? "VX_NN_NORMALIZATION_SAME_MAP" : "VX_NN_NORMALIZATION_ACROSS_MAPS";
-            if(codeType == "initialize") {
-                ofsCodeC << " vx_enum " << layerName << "_mode = " << lrnType << ";" << std::endl;
-                ofsCodeC << " vx_size " << layerName << "_size = " << normalization_size << ";" << std::endl;
-                ofsCodeC << " vx_float32 " << layerName << "_alpha = " << alpha << ";" << std::endl;
-                ofsCodeC << " vx_float32 " << layerName << "_beta = " << beta << ";" << std::endl;
-                ofsCodeC << " vx_float32 " << layerName << "_bias = " << k << ";" << std::endl;
-                ofsCodeC << " vx_node " << layerName << "_node;" << std::endl;
-                ofsCodeC << " " << layerName + "_node = " << "vxNormalizationLayer( graph, " << inputName << ", " << layerName + "_mode, " << layerName + "_size, " << layerName + "_alpha, " << layerName + "_beta, "
-                         << layerName << " );" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl;
-                ofsCodeC << " if(" << layerName << "_bias != 1) {" << std::endl;
-                ofsCodeC << " vx_scalar s_bias = vxCreateScalarWithSize(context, VX_TYPE_FLOAT32, &" << layerName << "_bias, sizeof(" << layerName << "_bias));" << std::endl;
-                ofsCodeC << " ERROR_CHECK_OBJECT(s_bias);" << std::endl;
-                ofsCodeC << " ERROR_CHECK_STATUS(vxSetParameterByIndex(" << layerName << "_node, 6, (vx_reference) s_bias));" << std::endl;
-                ofsCodeC << " ERROR_CHECK_STATUS(vxReleaseScalar(&s_bias));" << std::endl;
-                ofsCodeC << " }" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl;
-            }
-        }
-        else if(type == "BatchNorm") {
-            int use_global_stats;
-            std::stringstream ss(params);
-            float eps;
-            ss >> eps >> use_global_stats;
-            std::string weights = layerName + "_W";
-            std::string dim_weights = output + "_W";
-            auto&& dim = tensorMap[dim_weights];
-            if(codeType == "initialize") {
-                ofsCodeC << " vx_size " << weights << "_dims[1] = { " << dim[0] << " };" << std::endl;
-                ofsCodeC << " vx_float32 " << layerName << "_eps = " << eps << ";" << std::endl;
-                ofsCodeC << " vx_tensor " << weights << ";" << std::endl;
-                ofsCodeC << " " << weights << " = vxCreateTensor(context,1, " << weights + "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << weights << "); " << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(copyTensor(" << weights << ", dataFolder + \"/weights/" + layerName + ".f32\"));" << std::endl;
-            }
-            else if(codeType == "release") {
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << weights << "));" << std::endl;
-            }
-            declare_tensor_check[weights] = true;
-            std::string bias = layerName + "_B";
-            std::string dim_bias = output + "_B";
-            dim = tensorMap[dim_bias];
-            if(codeType == "initialize") {
-                ofsCodeC << " vx_size " << bias << "_dims[1] = { " << dim[0] << " };" << std::endl;
-                ofsCodeC << " vx_tensor " << bias << ";" << std::endl;
-                ofsCodeC << " " << bias << " = vxCreateTensor(context,1, " << bias + "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << bias << "); " << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(copyTensor(" << bias << ", dataFolder + \"/bias/" + layerName + ".f32\"));" << std::endl;
-            }
-            else if(codeType == "release") {
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << bias << "));" << std::endl;
-            }
-            declare_tensor_check[bias] = true;
-            bias = "NULL";
-
-            if (bfuse_scale_layer) {
-                // check next node. If scale extract weight and bias paramters for scale layer.
-                int bias_term;
-                auto& next_node = *std::next(&node);
-                auto&& next_output = next_node[3];
-                auto&& nn_params = next_node[1];
-                std::string nn_layer_name = getIdentifierName(next_node[3]);
-                weights = nn_layer_name + "_W";
-                std::string dim_weights = next_output + "_W";
-                dim = tensorMap[dim_weights];
-                if(codeType == "initialize") {
-                    ofsCodeC << " vx_size " << weights << "_dims[1] = { " << dim[0] << " };" << std::endl;
-                    ofsCodeC << " vx_tensor " << weights << ";" << std::endl;
-                    ofsCodeC << " " << weights << " = vxCreateTensor(context,1, " << weights + "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl;
-                    ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << weights << "); " << std::endl;
-                    ofsCodeC << " " << "ERROR_CHECK_STATUS(copyTensor(" << weights << ", dataFolder + \"/weights/" + nn_layer_name + ".f32\"));" << std::endl;
-                }
-                else if(codeType == "release") {
-                    ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << weights << "));" << std::endl;
-                }
-                declare_tensor_check[weights] = true;
-
-                std::stringstream ss(nn_params);
-                ss >> bias_term;
-                if(bias_term) {
-                    bias = nn_layer_name + "_B";
-                    std::string dim_bias = next_output + "_B";
-                    dim = tensorMap[dim_bias];
-                    if(codeType == "initialize") {
-                        ofsCodeC << " vx_size " << bias << "_dims[1] = { " << dim[0] << " };" << std::endl;
-                        ofsCodeC << " vx_tensor " << bias << ";" << std::endl;
-                        ofsCodeC << " " << bias << " = vxCreateTensor(context,1, " << bias + "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl;
-                        ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << bias << "); " << std::endl;
-                        ofsCodeC << " " << "ERROR_CHECK_STATUS(copyTensor(" << bias << ", dataFolder + \"/bias/" + nn_layer_name + ".f32\"));" << std::endl;
-                    }
-                    else if(codeType == "release") {
-                        ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << bias << "));" << std::endl;
-                    }
-                    declare_tensor_check[bias] = true;
-                }
-                if(codeType == "initialize") {
-                    ofsCodeC << " vx_node " << layerName << "_node;" << std::endl;
-                    ofsCodeC << " " << layerName + "_node = " << "vxBatchNormalizationLayer(graph, "
-                             << inputName +", "
-                             << layerName + "_W, "
-                             << layerName + "_B, "
-                             << weights+", "
-                             << bias+", "
-                             << layerName + "_eps, "
-                             << nn_layer_name << ");" << std::endl;
-                    ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl;
-                    ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl;
-                }
-                else if(codeType == "release") {
-                }
-            }
-            else{
-                // put default scale and bias term
-                std::vector scale_arr(dim[0]);
-                std::fill(scale_arr.begin(), scale_arr.end(), 1.0);
-                std::string fileName_weights = outputFolder + "/weights/scale_init.f32";
-                FILE *fp = fopen(fileName_weights.c_str(), "wb");
-                if (fp) {
-                    fwrite(scale_arr.data(), sizeof(float), dim[0], fp);
-                    fclose(fp);
-                }
-                weights = layerName +"_W1";
-                if(codeType == "initialize") {
-                    ofsCodeC << " vx_size " << weights << "_dims[1] = { " << dim[0] << " };" << std::endl;
-                    ofsCodeC << " vx_tensor " << weights << ";" << std::endl;
-                    ofsCodeC << " " << weights << " = vxCreateTensor(context,1, " << weights + "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl;
-                    ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << weights << "); " << std::endl;
-                    ofsCodeC << " " << "ERROR_CHECK_STATUS(copyTensor(" << weights << ", dataFolder + \"/weights/scale_init.f32\"));" << std::endl;
-                }
-                else if(codeType == "release") {
-                    ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << weights << "));" << std::endl;
-                }
-                declare_tensor_check[weights] = true;
-
-                if(codeType == "initialize") {
-                    ofsCodeC << " vx_node " << layerName << "_node;" << std::endl;
-                    ofsCodeC << " " << layerName + "_node = " << "vxBatchNormalizationLayer(graph, "
-                             << inputName +", "
-                             << layerName + "_W, "
-                             << layerName + "_B, "
-                             << weights+", "
-                             << bias+", "
-                             << layerName + "_eps, "
-                             << layerName << ");" << std::endl;
-                    ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl;
-                    ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl;
-                }
-                else if(codeType == "release") {
-                }
-            }
-        }
-        else if(type == "Eltwise") {
-            int op;
-            std::stringstream ss(params);
-            ss >> op;
-            auto&& dim = tensorMap[output];
-            for(int i=4; i < node.size(); i++) {
-                auto&& idim= tensorMap[node[i]];
-                if(dim[0]!= idim[0] || dim[1] != idim[1] || dim[2] != idim[2] || dim[3] != idim[3])
-                    error("generateCode : Eltwise op=%d requires same dimension inputs : %s[%dx%dx%dx%d] != %s[%dx%dx%dx%d]\n", op, node[i].c_str(),idim[0], idim[1], idim[2], idim[3], node[i-1].c_str(), dim[0],dim[1],dim[2],dim[3]);
-                dim = idim;
-            }
-            std::string tmp = inputName;
-            for(int i=5; i < node.size() ; i++) {
-                std::string out = layerName;
-                if(i < node.size() - 1) {
-                    out += "tmp_"+ std::to_string(i-4);
-                    if(codeType == "initialize") {
-                        ofsCodeC << " vx_size " << out << "_dim[4] = { " << dim[3] << ", " << dim[2] << ", " << dim[1] << ", " << dim[0] << " };" << std::endl;
-                        ofsCodeC << " vx_tensor " << out << "; " << std::endl;
-                        ofsCodeC << " " << out << "= vxCreateTensor(context,4, " << out + "_dim, " << tensorType << ", " << fixedPosition << ");" << std::endl;
-                    }
-                    declare_tensor_check[out]= true;
-                }
-                if(op == 1) {
-                    if(codeType == "initialize") {
-                        ofsCodeC << " vx_enum " << layerName << "_convertPolicy = " << convertPolicy << ";" << std::endl;
-                        ofsCodeC << " vx_node " << layerName <<"_node;" << std::endl;
-                        ofsCodeC << " " << layerName + "_node = " << "vxTensorAddNode(graph, " << tmp << ", " << getIdentifierName(node[i]) << ", " << layerName + "_convertPolicy, " << out << ");" << std::endl;
-                        ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl;
-                        ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl;
-                    }
-                    tmp = out;
-                }
-                else error("generateCode : Eltwise op=%d not supported\n", op);
-            }
-        }
-        else if(type == "Scale") {
-            int bias_term;
-            std::stringstream ss(params); ss >> bias_term;
-
-            std::string weights = layerName + "_W";
-            std::string dim_weights = output + "_W";
-            auto&& dim = tensorMap[dim_weights];
-            if(codeType == "initialize") {
-                ofsCodeC << " vx_size " << weights << "_dims[1] = { " << dim[0] << " };" << std::endl;
-                ofsCodeC << " vx_tensor " << weights << ";" << std::endl;
-                ofsCodeC << " " << weights << " = vxCreateTensor(context,1, " << weights + "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << weights << "); " << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(copyTensor(" << weights << ", dataFolder + \"/weights/" + layerName + ".f32\"));" << std::endl;
-            }
-            else if(codeType == "release") {
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << weights << "));" << std::endl;
-            }
-            declare_tensor_check[weights] = true;
-            std::string bias = "NULL";
-            if(bias_term) {
-                bias = layerName + "_B";
-                std::string dim_bias = output + "_B";
-                dim = tensorMap[dim_bias];
-                if(codeType == "initialize") {
-                    ofsCodeC << " vx_size " << bias << "_dims[1] = { " << dim[0] << " };" << std::endl;
-                    ofsCodeC << " vx_tensor " << bias << ";" << std::endl;
-                    ofsCodeC << " " << bias << " = vxCreateTensor(context,1, " << bias + "_dims, " << tensorType << ", " << fixedPosition << ");" << std::endl;
-                    ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << bias << "); " << std::endl;
-                    ofsCodeC << " " << "ERROR_CHECK_STATUS(copyTensor(" << bias << ", dataFolder + \"/bias/" + layerName + ".f32\"));" << std::endl;
-                }
-                else if(codeType == "release") {
-                    ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << bias << "));" << std::endl;
-                }
-                declare_tensor_check[bias] = true;
-            }
-            if(codeType == "initialize") {
-                ofsCodeC << " vx_node " << layerName << "_node;" << std::endl;
-                ofsCodeC << " " << layerName + "_node = " << "vxScaleLayer(graph, "
-                         << inputName +", "
-                         << layerName + "_W, "
-                         << bias + ", "
-                         << layerName << ");" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl;
-            }
-            else if(codeType == "release") {
-            }
-        }
-        else if(type == "Concat") {
-            if(codeType == "initialize") {
-                ofsCodeC << " vx_node " << layerName << "_node;" << std::endl;
-                ofsCodeC << " " << layerName + "_node = " << "vxConcatLayer(graph, ";
-                ofsCodeC << layerName;
-                int param_count = 0;
-                for(int i = 4; i < node.size(); i++) {
-                    std::string layerInputs = getIdentifierName(node[i]);
-                    ofsCodeC << ", " << layerInputs;
-                    param_count++;
-                }
-                while(param_count < 8) {
-                    ofsCodeC << ", NULL";
-                    param_count++;
-                }
-                ofsCodeC << " );" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl;
-            }
-        }
-        else if(type == "Dropout") {
-            //during inference dropout layer propogates input to output .
-            if(codeType == "initialize") {
-                ofsCodeC << " vx_node " << layerName << "_node;" << std::endl;
-                ofsCodeC << " " << layerName + "_node = " << "vxCopyNode( graph, (vx_reference)" << inputName << ", (vx_reference)" << layerName << ");" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl;
-            }
-        }
-        else if(type == "Softmax") {
-            if(codeType == "initialize") {
-                ofsCodeC << " vx_node " << layerName << "_node;" << std::endl;
-                ofsCodeC << " " << layerName + "_node = " << "vxSoftmaxLayer(graph, " << inputName << ", " << layerName << ");" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl;
-            }
-        }
-        else if(type == "Split") {
-            if(codeType == "initialize") {
-                ofsCodeC << " vx_node " << layerName << "_node;" << std::endl;
-                ofsCodeC << " " << layerName + "_node = " << "vxCopyNode( graph, (vx_reference)"<< inputName << ", (vx_reference)" << layerName << ");" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl;
-            }
-        }
-        else if(type == "SoftmaxWithLoss") {
-            if(codeType == "initialize") {
-                ofsCodeC << " vx_node " << layerName << "_node;" << std::endl;
-                ofsCodeC << " " << layerName + "_node = " << "vxSoftmaxLayer(graph, " << inputName << ", " << layerName << ");" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + layerName + "_node);" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << layerName + "_node));" << std::endl;
-            }
-        }
-        if(codeType== "initialize")
-            ofsCodeC << std::endl;
-    }
-}
-
-void generateCopyImageCode(std::ostream& ofsCodeC)
-{
-    ofsCodeC << "static vx_status copyImage(vx_image image, std::string fileName, vx_enum usage = VX_WRITE_ONLY)" << std::endl
-             << "{" << std::endl
-             << " vx_uint32 width = 0, height = 0;" << std::endl
-             << " vxQueryImage(image, VX_IMAGE_WIDTH, &width, sizeof(vx_uint32));" << std::endl
-             << " vxQueryImage(image, VX_IMAGE_HEIGHT, &height, sizeof(vx_uint32));" << std::endl
-             << " vx_rectangle_t rect = { 0, 0, width, height };" << std::endl
-             << " vx_imagepatch_addressing_t addr;" << std::endl
-             << " vx_uint8 * ptr = NULL;" << std::endl
-             << " vx_map_id map_id;" << std::endl
-             << " vx_status status = vxMapImagePatch(image, &rect, 0, &map_id, &addr, (void **)&ptr, usage, VX_MEMORY_TYPE_HOST, VX_NOGAP_X);" << std::endl
-             << " if(status) {" << std::endl
-             << " std::cerr << \"ERROR: vxMapImagePatch() failed for \" << fileName << std::endl;" << std::endl
-             << " return -1;" << std::endl
-             << " }" << std::endl
-             << " vx_uint32 width_in_bytes = (width * addr.stride_x);" << std::endl
-             << " FILE * fp = fopen(fileName.c_str(), usage == VX_WRITE_ONLY ? \"rb\" : \"wb\");" << std::endl
-             << " if(!fp) {" << std::endl
-             << " std::cerr << \"ERROR: unable to open: \" << fileName << std::endl;" << std::endl
-             << " return -1;" << std::endl
-             << " }" << std::endl
-             << " for (vx_uint32 y = 0; y < height; y += addr.step_y) {" << std::endl
-             << " vx_uint8 * line = (vx_uint8 *)vxFormatImagePatchAddress2d(ptr, 0, y, &addr);" << std::endl
-             << " if(usage == VX_WRITE_ONLY) {" << std::endl
-             << " vx_size n = fread(line, sizeof(vx_uint8), width_in_bytes, fp);" << std::endl
-             << " if(n != width_in_bytes) {" << std::endl
-             << " std::cerr << \"ERROR: expected char[\" << height*width_in_bytes << \"], but got char[\" << y*width_in_bytes+n << \"] in \" << fileName << std::endl;" << std::endl
-             << " return -1;" << std::endl
-             << " }" << std::endl
-             << " }" << std::endl
-             << " else {" << std::endl
-             << " fwrite(line, sizeof(vx_uint8), width_in_bytes, fp);" << std::endl
-             << " }" << std::endl
-             << " }" << std::endl
-             << " fclose(fp);" << std::endl
-             << " status = vxUnmapImagePatch(image, map_id);" << std::endl
-             << " if(status) {" << std::endl
-             << " std::cerr << \"ERROR: vxUnmapImagePatch() failed for \" << fileName << std::endl;" << std::endl
-             << " return -1;" << std::endl
-             << " }" << std::endl
-             << " return 0;" << std::endl
-             << "}" << std::endl << std::endl;
-}
-
-void generateCopyTensorCode(std::ostream& ofsCodeC)
-{
-    ofsCodeC << "static vx_status copyTensor(vx_tensor tensor, std::string fileName, vx_enum usage = VX_WRITE_ONLY)" << std::endl
-             << "{" << std::endl
-             << " vx_enum data_type = VX_TYPE_FLOAT32;" << std::endl
-             << " vx_size num_of_dims = 4, dims[4] = { 1, 1, 1, 1 }, stride[4];" << std::endl
-             << " vxQueryTensor(tensor, VX_TENSOR_DATA_TYPE, &data_type, sizeof(data_type));" << std::endl
-             << " vxQueryTensor(tensor, VX_TENSOR_NUMBER_OF_DIMS, &num_of_dims, sizeof(num_of_dims));" << std::endl
-             << " vxQueryTensor(tensor, VX_TENSOR_DIMS, &dims, sizeof(dims[0])*num_of_dims);" << std::endl
-             << " vx_size itemsize = sizeof(float);" << std::endl
-             << " if(data_type == VX_TYPE_UINT8 || data_type == VX_TYPE_INT8) {" << std::endl
-             << " itemsize = sizeof(vx_uint8);" << std::endl
-             << " }" << std::endl
-             << " else if(data_type == VX_TYPE_UINT16 || data_type == VX_TYPE_INT16 || data_type == VX_TYPE_FLOAT16) {" << std::endl
-             << " itemsize = sizeof(vx_uint16);" << std::endl
-             << " }" << std::endl
-             << " vx_size count = dims[0] * dims[1] * dims[2] * dims[3];" << std::endl
-             << " vx_map_id map_id;" << std::endl
-             << " float * ptr;" << std::endl
-             << " vx_status status = vxMapTensorPatch(tensor, num_of_dims, nullptr, nullptr, &map_id, stride, (void **)&ptr, usage, VX_MEMORY_TYPE_HOST);" << std::endl
-             << " if(status) {" << std::endl
-             << " std::cerr << \"ERROR: vxMapTensorPatch() failed for \" << fileName << std::endl;" << std::endl
-             << " return -1;" << std::endl
-             << " }" << std::endl
-             << " FILE * fp = fopen(fileName.c_str(), usage == VX_WRITE_ONLY ? \"rb\" : \"wb\");" << std::endl
-             << " if(!fp) {" << std::endl
-             << " std::cerr << \"ERROR: unable to open: \" << fileName << std::endl;" << std::endl
-             << " return -1;" << std::endl
-             << " }" << std::endl
-             << " if(usage == VX_WRITE_ONLY) {" << std::endl
-             << " vx_size n = fread(ptr, itemsize, count, fp);" << std::endl
-             << " if(n != count) {" << std::endl
-             << " std::cerr << \"ERROR: expected char[\" << count*itemsize << \"], but got char[\" << n*itemsize << \"] in \" << fileName << std::endl;" << std::endl
-             << " return -1;" << std::endl
-             << " }" << std::endl
-             << " }" << std::endl
-             << " else {" << std::endl
-             << " fwrite(ptr, itemsize, count, fp);" << std::endl
-             << " }" << std::endl
-             << " fclose(fp);" << std::endl
-             << " status = vxUnmapTensorPatch(tensor, map_id);" << std::endl
-             << " if(status) {" << std::endl
-             << " std::cerr << \"ERROR: vxUnmapTensorPatch() failed for \" << fileName << std::endl;" << std::endl
-             << " return -1;" << std::endl
-             << " }" << std::endl
-             << " return 0;" << std::endl
-             << "}" << std::endl << std::endl;
-}
-
-void generateCode(
-    std::ostream& ofsCodeH,
-    std::ostream& ofsCodeC,
-    std::ofstream& ofsCodeM,
-    std::ofstream& ofsCodeA,
-    std::ofstream& ofsCodeD,
-    std::vector>& net,
-    std::map>& tensorMap,
-    std::string tensorType,
-    int fixedPointPosition,
-    std::string convertPolicy,
-    std::string roundPolicy,
-    bool isVirtualEnabled,
-    std::string outputFolder,
-    bool bInputIsImage,
-    std::string inputImageType,
-    bool bInputChannelReverse,
-    double fInputConversionA,
-    double fInputConversionB,
-    bool bOutputArgmax,
-    bool bOutputIsImage,
-    std::string argmaxOutputDataType,
-    int argmaxTopK,
-    std::vector& argmaxLut,
-    bool bEnableErrorMessages,
-    bool bFuseScaleLayer)
-{
-    std::string annApiName = "annCreateGraph";
-    if(bInputIsImage) annApiName += "WithInputImage";
-    if(bOutputArgmax) annApiName += (bOutputIsImage ? "WithArgmaxImage" : "WithArgmaxTensor");
-    if(argmaxLut.size() > 0) annApiName += "WithLut";
-
-    ////
-    // generate .h file
-    //
-    ofsCodeH << "#ifndef annmodule_h" << std::endl
-             << "#define annmodule_h" << std::endl
-             << std::endl
-             << "#include " << std::endl
-             << std::endl
-             << "extern \"C\" {" << std::endl
-             << " VX_API_ENTRY void VX_API_CALL annGetTensorDimensions(vx_size dimInput[4], vx_size dimOutput[4]);" << std::endl;
-    ofsCodeH << " VX_API_ENTRY vx_graph VX_API_CALL " << annApiName << "(vx_context context, "
-             << (bInputIsImage ? "vx_image" : "vx_tensor") << " input, "
-             << (bOutputIsImage ? "vx_image" : " vx_tensor") << " output, const char * options);" << std::endl;
-    ofsCodeH << "};" << std::endl
-             << std::endl
-             << "#endif" << std::endl;
-
-    ////
-    // generate .cpp file
-    //
-    ofsCodeC << "#include \"annmodule.h\"" << std::endl << std::endl;
-    ofsCodeC << "#include " << std::endl;
-    ofsCodeC << "#include " << std::endl;
-    ofsCodeC << "#include " << std::endl<< std::endl;
-    ofsCodeC << "#include " << std::endl;
-    ofsCodeC << "#include " << std::endl;
-    ofsCodeC << "#include " << std::endl << std::endl;
-
-    ofsCodeC << "#define ERROR_CHECK_STATUS(call) { vx_status status = (call); if(status != VX_SUCCESS) { vxAddLogEntry((vx_reference)context, status, \"ERROR: failed with status = (%d) at \" __FILE__ \"#%d\\n\", status, __LINE__); return nullptr; } }" << std::endl;
-    ofsCodeC << "#define ERROR_CHECK_OBJECT(obj) { vx_status status = vxGetStatus((vx_reference)(obj)); if(status != VX_SUCCESS) { vxAddLogEntry((vx_reference)context, status, \"ERROR: failed with status = (%d) at \" __FILE__ \"#%d\\n\", status, __LINE__); return nullptr; } }" << std::endl << std::endl;
-
-    generateCopyTensorCode(ofsCodeC);
-
-    auto&& input = net[0][4];
-    auto&& output = net[net.size()-1][3];
-    auto&& idim = tensorMap[input];
-    auto&& odim = tensorMap[output];
-    ofsCodeC << "VX_API_ENTRY void VX_API_CALL annGetTensorDimensions(vx_size dimInput[4], vx_size dimOutput[4])" << std::endl
-             << "{" << std::endl
-             << " dimInput[0] = " << idim[3] << ";" << std::endl
-             << " dimInput[1] = " << idim[2] << ";" << std::endl
-             << " dimInput[2] = " << idim[1] << ";" << std::endl
-             << " dimInput[3] = " << idim[0] << ";" << std::endl
-             << " dimOutput[0] = " << odim[3] << ";" << std::endl
-             << " dimOutput[1] = " << odim[2] << ";" << std::endl
-             << " dimOutput[2] = " << odim[1] << ";" << std::endl
-             << " dimOutput[3] = " << odim[0] << ";" << std::endl
-             << "}" << std::endl << std::endl;
-    if(bOutputArgmax) {
-        if(argmaxOutputDataType == "VX_TYPE_UINT8" && odim[1] >= 256) {
-            printf("ERROR: output argmax tensor type VX_TYPE_UINT8 can't hold channel numbers upto %d\n", odim[1]);
-            exit(1);
-        }
-        if(argmaxLut.size() > 0 && argmaxLut.size() < odim[1]) {
-            printf("ERROR: argmax LUT requires at least %d entries: got %ld entries\n", odim[1], argmaxLut.size());
-            exit(1);
-        }
-    }
-
-    ofsCodeC << "VX_API_ENTRY vx_graph VX_API_CALL " << annApiName << "(vx_context context, "
-             << (bInputIsImage ? "vx_image" : "vx_tensor") << " " << input << (bInputIsImage ? "__image" : "") << ", "
-             << (bOutputIsImage ? "vx_image" : "vx_tensor") << " " << output << (bOutputArgmax ? "__argmax" : "") << ", const char * dataFolder_)" << std::endl;
-    ofsCodeC << "{" << std::endl;
-    ofsCodeC << " // load neural network extension kernels" << std::endl;
-    ofsCodeC << " ERROR_CHECK_STATUS(vxLoadKernels(context,\"vx_nn\"));" << std::endl;
-    ofsCodeC << std::endl;
-    ofsCodeC << " // create graph" << std::endl;
-    ofsCodeC << " vx_graph graph = vxCreateGraph(context); " << std::endl;
-    ofsCodeC << " ERROR_CHECK_OBJECT(graph);" << std::endl;
-    ofsCodeC << std::endl;
-    ofsCodeC << " // get dataFolder option" << std::endl;
-    ofsCodeC << " std::string dataFolder = dataFolder_ ? dataFolder_ : \".\", fileName;" << std::endl;
-    ofsCodeC << std::endl;
-    ofsCodeC << " ////" << std::endl;
-    ofsCodeC << " // initialize the graph" << std::endl;
-    if(bInputIsImage) {
-        if(inputImageType == "VX_DF_IMAGE_RGB" && idim[1] != 3) {
-            printf("ERROR: need input channels to be 3 to use input as an RGB/BGR images: got input C = %d\n", idim[1]);
-            exit(1);
-        }
-        else if(inputImageType == "VX_DF_IMAGE_U8" && idim[1] != 1) {
-            printf("ERROR: need input channels to be 1 to use input as an U8 images: got input C = %d\n", idim[1]);
-            exit(1);
-        }
-        ofsCodeC << " vx_size " << input << "_dims[4] = { " << idim[3] << ", " << idim[2] << ", " << idim[1] << ", " << idim[0] << " };" << std::endl;
-        ofsCodeC << " vx_tensor " << input << ";" << std::endl;
-        if(isVirtualEnabled) {
-            ofsCodeC << " " << input << " = vxCreateVirtualTensor(graph, 4, " << input + "_dims,"<< tensorType <<", " << fixedPointPosition << ");" << std::endl;
-        }
-        else {
-            ofsCodeC << " " << input << " = vxCreateTensor(context, 4, " << input + "_dims,"<< tensorType <<", " << fixedPointPosition << ");" << std::endl;
-        }
-        ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << input << ");" << std::endl;
-        ofsCodeC << " vx_node " << input << "_image_conversion_node;" << std::endl;
-        ofsCodeC << " " << input + "_image_conversion_node = " << "vxConvertImageToTensorNode(graph, " << input << "__image, " << input << ", " << fInputConversionA << ", " << fInputConversionB << ", " << (bInputChannelReverse ? "vx_true_e" : "vx_false_e") << ");" << std::endl;
-        ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + input + "_image_conversion_node);" << std::endl;
-        ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << input + "_image_conversion_node));" << std::endl;
-    }
-    if(bOutputArgmax) {
-        ofsCodeC << " vx_size " << output << "_dims[4] = { " << odim[3] << ", " << odim[2] << ", 1, " << odim[0] << " };" << std::endl;
-        ofsCodeC << " vx_tensor " << output << ";" << std::endl;
-        if(isVirtualEnabled) {
-            ofsCodeC << " " << output << " = vxCreateVirtualTensor(graph, 4, " << output + "_dims,"<< tensorType <<", " << fixedPointPosition << ");" << std::endl;
-        }
-        else {
-            ofsCodeC << " " << output << " = vxCreateTensor(context, 4, " << output + "_dims,"<< tensorType <<", " << fixedPointPosition << ");" << std::endl;
-        }
-        ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << output << ");" << std::endl;
-    }
-    writeVXCode(ofsCodeC, net, tensorMap, tensorType, fixedPointPosition, convertPolicy, roundPolicy, isVirtualEnabled, bFuseScaleLayer, outputFolder, "initialize");
-    if(bOutputArgmax) {
-        std::string argmaxOutputName = output + "__argmax";
-        if(bOutputIsImage && argmaxOutputDataType == "VX_DF_IMAGE_U8" && argmaxLut.size() >= odim[1]) {
-            ofsCodeC << " vx_image " << argmaxOutputName << "_labels;" << std::endl;
-            if(isVirtualEnabled) {
-                ofsCodeC << " " << argmaxOutputName << "_labels = vxCreateVirtualImage(graph, " << odim[3] << ", " << (odim[2]*odim[0]) << ", VX_DF_IMAGE_U8);" << std::endl;
-            }
-            else {
-                ofsCodeC << " " << argmaxOutputName << "_labels = vxCreateImage(context, " << odim[3] << ", " << (odim[2]*odim[0]) << ", VX_DF_IMAGE_U8);" << std::endl;
-            }
-            ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << argmaxOutputName << "_labels);" << std::endl;
-            ofsCodeC << " vx_node " << output << "_argmax_node;" << std::endl;
-            ofsCodeC << " " << output + "_argmax_node = " << "vxArgmaxLayer(graph, " << output << ", (vx_reference)" << argmaxOutputName << "_labels);" << std::endl;
-            ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + output + "_argmax_node);" << std::endl;
-            ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << output << "_argmax_node));" << std::endl;
-            for(int i = 0; i < 3; i++) {
-                std::string lutName = output + "__lut" + i["RGB"];
-                std::string chanName = output + "__channel" + i["RGB"];
-                ofsCodeC << " vx_lut " << lutName << " = vxCreateLUT(context, VX_TYPE_UINT8, 256);" << std::endl;
-                ofsCodeC << " vx_uint8 " << lutName << "_tbl[256] = {";
-                for(int j = 0; j < odim[1]; j++) {
-                    if((j & 15) == 0) {
-                        ofsCodeC << std::endl << " ";
-                    }
-                    ofsCodeC << ((argmaxLut[j] >> (i * 8)) & 255) << ", ";
-                }
-                ofsCodeC << std::endl;
-                ofsCodeC << " };" << std::endl;
-                ofsCodeC << " ERROR_CHECK_STATUS(vxCopyLUT(" << lutName << ", " << lutName << "_tbl, VX_WRITE_ONLY, VX_MEMORY_TYPE_HOST));" << std::endl;
-                ofsCodeC << " vx_image " << chanName << ";" << std::endl;
-                if(isVirtualEnabled) {
-                    ofsCodeC << " " << chanName << " = vxCreateVirtualImage(graph, " << odim[3] << ", " << (odim[2]*odim[0]) << ", VX_DF_IMAGE_U8);" << std::endl;
-                }
-                else {
-                    ofsCodeC << " " << chanName << " = vxCreateImage(context, " << odim[3] << ", " << (odim[2]*odim[0]) << ", VX_DF_IMAGE_U8);" << std::endl;
-                }
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << chanName << ");" << std::endl;
-                ofsCodeC << " vx_node " << chanName << "_node;" << std::endl;
-                ofsCodeC << " " << chanName + "_node = " << "vxTableLookupNode(graph, " << argmaxOutputName << ", " << lutName << ", " << chanName << ");" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_OBJECT(" + chanName + "_node);" << std::endl;
-                ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << chanName << "_node));" << std::endl;
-            }
-            ofsCodeC << " vx_node " << output << "_combine_node;" << std::endl;
-            ofsCodeC << " " << output + "_combine_node = " << "vxChannelCombineNode(graph, " << output << "__channelR, " << output << "__channelG, " << output << "__channelB, NULL, " << argmaxOutputName << ");" << std::endl;
-            ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << output << "_combine_node);" << std::endl;
-            ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << output << "_combine_node));" << std::endl;
-        }
-        else {
-            ofsCodeC << " vx_node " << output << "_argmax_node;" << std::endl;
-            ofsCodeC << " " << output + "_argmax_node = " << "vxArgmaxLayer(graph, " << output << ", (vx_reference)" << argmaxOutputName << ");" << std::endl;
-            ofsCodeC << " " << "ERROR_CHECK_OBJECT(" << output << "_argmax_node);" << std::endl;
-            ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseNode(&" << output << "_argmax_node));" << std::endl;
-        }
-    }
-    ofsCodeC << " ////" << std::endl;
-    ofsCodeC << " // release intermediate objects" << std::endl;
-    if(bInputIsImage) {
-        ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << input << "));" << std::endl;
-    }
-    writeVXCode(ofsCodeC, net, tensorMap, tensorType, fixedPointPosition, convertPolicy, roundPolicy, isVirtualEnabled, bFuseScaleLayer, outputFolder, "release");
-    if(bOutputArgmax) {
-        ofsCodeC << " " << "ERROR_CHECK_STATUS(vxReleaseTensor(&" << output << "));" << std::endl;
-    }
-    ofsCodeC << std::endl;
-    ofsCodeC << " ////" << std::endl;
-    ofsCodeC << " // verify the built graph" << std::endl;
-    ofsCodeC << " ERROR_CHECK_STATUS(vxVerifyGraph(graph));" << std::endl;
-    ofsCodeC << std::endl;
-    ofsCodeC << " return graph;" << std::endl;
-    ofsCodeC << "}" << std::endl;
-
-    /////
-    // generate CMakeLists.txt
-    //
-    ofsCodeM << "cmake_minimum_required(VERSION 3.5)" << std::endl;
-    ofsCodeM << "project (annmodule)" << std::endl;
-    ofsCodeM << "set (CMAKE_CXX_STANDARD 14)" << std::endl;
-    ofsCodeM << "list(APPEND CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/cmake)" << std::endl;
-    ofsCodeM << "find_package(OpenCL REQUIRED)" << std::endl;
-    ofsCodeM << "include_directories (${OpenCL_INCLUDE_DIRS} ${OpenCL_INCLUDE_DIRS}/Headers )" << std::endl;
-    ofsCodeM << "include_directories (/opt/rocm/include/mivisionx)" << std::endl;
-    ofsCodeM << "link_directories (/opt/rocm/lib)" << std::endl;
-    ofsCodeM << "list(APPEND SOURCES annmodule.cpp)" << std::endl;
-    ofsCodeM << "add_library(${PROJECT_NAME} SHARED ${SOURCES})" << std::endl;
-    ofsCodeM << "set(CMAKE_CXX_FLAGS \"${CMAKE_CXX_FLAGS} -msse4.2 -std=gnu++14\")" << std::endl;
-    ofsCodeM << "target_link_libraries(${PROJECT_NAME} openvx vx_nn pthread)" << std::endl;
-    ofsCodeM << "add_executable(anntest anntest.cpp)" << std::endl;
-    ofsCodeM << "target_link_libraries(anntest openvx vx_nn pthread ${PROJECT_NAME})" << std::endl;
-
-    /////
-    // generate simple application
-    //
-    ofsCodeA << "#include \"annmodule.h\"" << std::endl ;
-    ofsCodeA << "#include " << std::endl;
-    ofsCodeA << "#include " << std::endl;
-    ofsCodeA << "#include " << std::endl;
-    ofsCodeA << "#include " << std::endl;
-    ofsCodeA << "#include " << std::endl;
-    ofsCodeA << "#include " << std::endl;
-    ofsCodeA << "#include " << std::endl;
-    ofsCodeA << "#include " << std::endl;
-    ofsCodeA << std::endl;
-    ofsCodeA << "#define ERROR_CHECK_STATUS(call) { vx_status status = (call); if(status != VX_SUCCESS) { printf(\"ERROR: failed with status = (%d) at \" __FILE__ \"#%d\", status, __LINE__); return -1; } }" << std::endl;
-    ofsCodeA << std::endl;
-    if(bEnableErrorMessages) {
-        ofsCodeA << "static void VX_CALLBACK log_callback(vx_context context, vx_reference ref, vx_status status, const vx_char string[])" << std::endl;
-        ofsCodeA << "{" << std::endl;
-        ofsCodeA << " size_t len = strlen(string);" << std::endl;
-        ofsCodeA << " if (len > 0) {" << std::endl;
-        ofsCodeA << " printf(\"%s\", string);" << std::endl;
-        ofsCodeA << " if (string[len - 1] != '\\n')" << std::endl;
-        ofsCodeA << " printf(\"\\n\");" << std::endl;
-        ofsCodeA << " fflush(stdout);" << std::endl;
-        ofsCodeA << " }" << std::endl;
-        ofsCodeA << "}" << std::endl;
-        ofsCodeA << std::endl;
-    }
-    ofsCodeA << "inline int64_t clockCounter()" << std::endl;
-    ofsCodeA << "{" << std::endl;
-    ofsCodeA << " return std::chrono::high_resolution_clock::now().time_since_epoch().count();" << std::endl;
-    ofsCodeA << "}" << std::endl;
-    ofsCodeA << std::endl;
-    ofsCodeA << "inline int64_t clockFrequency()" << std::endl;
-    ofsCodeA << "{" << std::endl;
-    ofsCodeA << " return std::chrono::high_resolution_clock::period::den / std::chrono::high_resolution_clock::period::num;" << std::endl;
-    ofsCodeA << "}" << std::endl;
-    ofsCodeA << std::endl;
-
-    if(bInputIsImage || bOutputIsImage) {
-        generateCopyImageCode(ofsCodeA);
-    }
-    if(!(bInputIsImage && bOutputIsImage)) {
-        generateCopyTensorCode(ofsCodeA);
-    }
-
-    ofsCodeA << "int main(int argc , char ** argv)" << std::endl;
-    ofsCodeA << "{" << std::endl;
-    ofsCodeA << " // get module configuration" << std::endl;
-    ofsCodeA << " vx_size dimInput[4] = { 0 }, dimOutput[4] = { 0 };" << std::endl;
-    ofsCodeA << " annGetTensorDimensions(dimInput, dimOutput);" << std::endl;
-    ofsCodeA << " printf(\"OK: annGetTensorDimensions() => [input %ldx%ldx%ldx%ld] [output %ldx%ldx%ldx%ld]\\n\", dimInput[0], dimInput[1], dimInput[2], dimInput[3], dimOutput[0], dimOutput[1], dimOutput[2], dimOutput[3]);" << std::endl;
-    ofsCodeA << std::endl;
-    ofsCodeA << " // create context, input, output, and graph" << std::endl;
-    if(bEnableErrorMessages) {
-        ofsCodeA << " vxRegisterLogCallback(NULL, log_callback, vx_false_e);" << std::endl;
-    }
-    ofsCodeA << " vx_context context = vxCreateContext();" << std::endl;
-    ofsCodeA << " if(vxGetStatus((vx_reference)context)) {" << std::endl;
-    ofsCodeA << " printf(\"ERROR: vxCreateContext() failed\\n\");" << std::endl;
-    ofsCodeA << " return -1;" << std::endl;
-    ofsCodeA << " }" << std::endl;
-    if(bEnableErrorMessages) {
-        ofsCodeA << " vxRegisterLogCallback(context, log_callback, vx_false_e);" << std::endl;
-    }
-    if(bInputIsImage) {
-        ofsCodeA << " vx_image input = vxCreateImage(context, (vx_uint32)dimInput[0], (vx_uint32)(dimInput[1]*dimInput[3]), " << inputImageType << ");" << std::endl;
-        ofsCodeA << "
if(vxGetStatus((vx_reference)input)) {" << std::endl; - ofsCodeA << " printf(\"ERROR: vxCreateImage(input,%ld,%ld," << inputImageType << ") failed\\n\", dimInput[0], dimInput[1]*dimInput[3]);" << std::endl; - ofsCodeA << " return -1;" << std::endl; - ofsCodeA << " }" << std::endl; - } - else { - ofsCodeA << " vx_tensor input = vxCreateTensor(context, 4, dimInput, VX_TYPE_FLOAT32, 0);" << std::endl; - ofsCodeA << " if(vxGetStatus((vx_reference)input)) {" << std::endl; - ofsCodeA << " printf(\"ERROR: vxCreateTensor(input,4,{%ld,%ld,%ld,%ld}) failed\\n\", dimInput[0], dimInput[1], dimInput[2], dimInput[3]);" << std::endl; - ofsCodeA << " return -1;" << std::endl; - ofsCodeA << " }" << std::endl; - } - if(bOutputArgmax) { - if(bOutputIsImage) { - std::string outputImageFormat = argmaxOutputDataType; - if(argmaxLut.size() > 0) { - outputImageFormat = "VX_DF_IMAGE_RGB"; - } - ofsCodeA << " vx_image output = vxCreateImage(context, (vx_uint32)dimOutput[0], (vx_uint32)(dimOutput[1]*dimOutput[3]), " << outputImageFormat << ");" << std::endl; - ofsCodeA << " if(vxGetStatus((vx_reference)output)) {" << std::endl; - ofsCodeA << " printf(\"ERROR: vxCreateImage(output,%ld,%ld," << outputImageFormat << ") failed\\n\", dimOutput[0], dimOutput[1]*dimOutput[3]);" << std::endl; - ofsCodeA << " return -1;" << std::endl; - ofsCodeA << " }" << std::endl; - } - else { - ofsCodeA << " vx_size dimArgmax[4] = { dimOutput[0], dimOutput[1], " << argmaxTopK << ", dimOutput[3] };" << std::endl; - ofsCodeA << " vx_tensor output = vxCreateTensor(context, 4, dimArgmax, " << argmaxOutputDataType << ", 0);" << std::endl; - ofsCodeA << " if(vxGetStatus((vx_reference)output)) {" << std::endl; - ofsCodeA << " printf(\"ERROR: vxCreateTensor(output,4,{%ld,%ld,%ld,%ld}," << argmaxOutputDataType << ",0) failed\\n\", dimArgmax[0], dimArgmax[1], dimArgmax[2], dimArgmax[3]);" << std::endl; - ofsCodeA << " return -1;" << std::endl; - ofsCodeA << " }" << std::endl; - } - } - else { - ofsCodeA << " vx_tensor 
output = vxCreateTensor(context, 4, dimOutput, VX_TYPE_FLOAT32, 0);" << std::endl; - ofsCodeA << " if(vxGetStatus((vx_reference)output)) {" << std::endl; - ofsCodeA << " printf(\"ERROR: vxCreateTensor(output,4,{%ld,%ld,%ld,%ld},VX_TYPE_FLOAT32,0) failed\\n\", dimOutput[0], dimOutput[1], dimOutput[2], dimOutput[3]);" << std::endl; - ofsCodeA << " return -1;" << std::endl; - ofsCodeA << " }" << std::endl; - } - ofsCodeA << std::endl; - ofsCodeA << " // build graph from the module" << std::endl; - ofsCodeA << " int64_t freq = clockFrequency(), t0, t1;" << std::endl; - ofsCodeA << " t0 = clockCounter();" << std::endl; - ofsCodeA << " vx_graph graph = " << annApiName << "(context, input, output, argc > 1 ? argv[1] : nullptr);" << std::endl; - ofsCodeA << " t1 = clockCounter();" << std::endl; - ofsCodeA << " if(vxGetStatus((vx_reference)graph)) {" << std::endl; - ofsCodeA << " printf(\"ERROR: " << annApiName << "(...,%s) failed\\n\", argv[1]);" << std::endl; - ofsCodeA << " return -1;" << std::endl; - ofsCodeA << " }" << std::endl; - ofsCodeA << " printf(\"OK: " << annApiName << "() took %.3f msec\\n\", (float)(t1-t0)*1000.0f/(float)freq);" << std::endl; - ofsCodeA << std::endl; - if(bInputIsImage) { - ofsCodeA << " if(argc > 2) {" << std::endl; - ofsCodeA << " if(copyImage(input, argv[2], VX_WRITE_ONLY) < 0) {" << std::endl; - ofsCodeA << " return -1;" << std::endl; - ofsCodeA << " }" << std::endl; - ofsCodeA << " printf(\"OK: read %ldx%ld image from %s\\n\", dimInput[0], dimInput[1], argv[2]);" << std::endl; - ofsCodeA << " }" << std::endl; - } - else { - ofsCodeA << " if(argc > 2) {" << std::endl; - ofsCodeA << " if(copyTensor(input, argv[2], VX_WRITE_ONLY) < 0) {" << std::endl; - ofsCodeA << " return -1;" << std::endl; - ofsCodeA << " }" << std::endl; - ofsCodeA << " printf(\"OK: read %ldx%ldx%ldx%ld tensor from %s\\n\", dimInput[3], dimInput[2], dimInput[1], dimInput[0], argv[2]);" << std::endl; - ofsCodeA << " }" << std::endl; - } - ofsCodeA << std::endl; - 
-    ofsCodeA << "    t0 = clockCounter();" << std::endl;
-    ofsCodeA << "    vx_status status = vxProcessGraph(graph);" << std::endl;
-    ofsCodeA << "    t1 = clockCounter();" << std::endl;
-    ofsCodeA << "    if(status != VX_SUCCESS) {" << std::endl;
-    ofsCodeA << "        printf(\"ERROR: vxProcessGraph() failed (%d)\\n\", status);" << std::endl;
-    ofsCodeA << "        return -1;" << std::endl;
-    ofsCodeA << "    }" << std::endl;
-    ofsCodeA << "    printf(\"OK: vxProcessGraph() took %.3f msec (1st iteration)\\n\", (float)(t1-t0)*1000.0f/(float)freq);" << std::endl;
-    ofsCodeA << std::endl;
-    if(bOutputIsImage) {
-        ofsCodeA << "    if(argc > 3) {" << std::endl;
-        ofsCodeA << "        if(copyImage(output, argv[3], VX_READ_ONLY) < 0) {" << std::endl;
-        ofsCodeA << "            return -1;" << std::endl;
-        ofsCodeA << "        }" << std::endl;
-        ofsCodeA << "        printf(\"OK: wrote %ldx%ld image into %s\\n\", dimOutput[0], dimOutput[1]*dimOutput[3], argv[3]);" << std::endl;
-        ofsCodeA << "    }" << std::endl;
-    }
-    else {
-        ofsCodeA << "    if(argc > 3) {" << std::endl;
-        ofsCodeA << "        if(copyTensor(output, argv[3], VX_READ_ONLY) < 0) {" << std::endl;
-        ofsCodeA << "            return -1;" << std::endl;
-        ofsCodeA << "        }" << std::endl;
-        ofsCodeA << "        printf(\"OK: wrote %ldx%ldx%ldx%ld tensor into %s\\n\", dimOutput[3], " << (bOutputArgmax ? "(vx_size)1" : "dimOutput[2]") << ", dimOutput[1], dimOutput[0], argv[3]);" << std::endl;
-        ofsCodeA << "    }" << std::endl;
-    }
-    ofsCodeA << "    t0 = clockCounter();" << std::endl;
-    ofsCodeA << "    int N = 100;" << std::endl;
-    ofsCodeA << "    for(int i = 0; i < N; i++) {" << std::endl;
-    ofsCodeA << "        status = vxProcessGraph(graph);" << std::endl;
-    ofsCodeA << "        if(status != VX_SUCCESS)" << std::endl;
-    ofsCodeA << "            break;" << std::endl;
-    ofsCodeA << "    }" << std::endl;
-    ofsCodeA << "    t1 = clockCounter();" << std::endl;
-    ofsCodeA << "    printf(\"OK: vxProcessGraph() took %.3f msec (average over %d iterations)\\n\", (float)(t1-t0)*1000.0f/(float)freq/(float)N, N);" << std::endl;
-    ofsCodeA << std::endl;
-    ofsCodeA << "    // release resources" << std::endl;
-    ofsCodeA << "    ERROR_CHECK_STATUS(vxReleaseGraph(&graph));" << std::endl;
-    if(bInputIsImage) {
-        ofsCodeA << "    ERROR_CHECK_STATUS(vxReleaseImage(&input));" << std::endl;
-    }
-    else {
-        ofsCodeA << "    ERROR_CHECK_STATUS(vxReleaseTensor(&input));" << std::endl;
-    }
-    if(bOutputIsImage) {
-        ofsCodeA << "    ERROR_CHECK_STATUS(vxReleaseImage(&output));" << std::endl;
-    }
-    else {
-        ofsCodeA << "    ERROR_CHECK_STATUS(vxReleaseTensor(&output));" << std::endl;
-    }
-    ofsCodeA << "    ERROR_CHECK_STATUS(vxReleaseContext(&context));" << std::endl;
-    ofsCodeA << "    printf(\"OK: successful\\n\");" << std::endl;
-    ofsCodeA << std::endl;
-    ofsCodeA << "    return 0;" << std::endl;
-    ofsCodeA << "}" << std::endl;
-
-    ofsCodeD << "find_path(OPENCL_INCLUDE_DIRS" << std::endl;
-    ofsCodeD << "NAMES OpenCL/cl.h CL/cl.h" << std::endl;
-    ofsCodeD << "HINTS" << std::endl;
-    ofsCodeD << "${OPENCL_ROOT}/include" << std::endl;
-    ofsCodeD << "$ENV{AMDAPPSDKROOT}/include" << std::endl;
-    ofsCodeD << "PATHS" << std::endl;
-    ofsCodeD << "/usr/include" << std::endl;
-    ofsCodeD << "/usr/local/include" << std::endl;
-    ofsCodeD << "/opt/rocm/opencl/include" << std::endl;
-    ofsCodeD << "DOC \"OpenCL header file path\"" << std::endl;
-    ofsCodeD << ")" << std::endl;
-    ofsCodeD << "mark_as_advanced( OPENCL_INCLUDE_DIRS )" << std::endl << std::endl;
-    ofsCodeD << "if(\"${CMAKE_SIZEOF_VOID_P}\" EQUAL \"8\")" << std::endl;
-    ofsCodeD << "    find_library( OPENCL_LIBRARIES" << std::endl;
-    ofsCodeD << "        NAMES OpenCL" << std::endl;
-    ofsCodeD << "        HINTS" << std::endl;
-    ofsCodeD << "        ${OPENCL_ROOT}/lib" << std::endl;
-    ofsCodeD << "        $ENV{AMDAPPSDKROOT}/lib" << std::endl;
-    ofsCodeD << "        DOC \"OpenCL dynamic library path\"" << std::endl;
-    ofsCodeD << "        PATH_SUFFIXES x86_64 x64 x86_64/sdk" << std::endl;
-    ofsCodeD << "        PATHS" << std::endl;
-    ofsCodeD << "        /usr/lib" << std::endl;
-    ofsCodeD << "        /opt/rocm/opencl/lib" << std::endl;
-    ofsCodeD << "    )" << std::endl;
-    ofsCodeD << "else( )" << std::endl;
-    ofsCodeD << "    find_library( OPENCL_LIBRARIES" << std::endl;
-    ofsCodeD << "        NAMES OpenCL" << std::endl;
-    ofsCodeD << "        HINTS" << std::endl;
-    ofsCodeD << "        ${OPENCL_ROOT}/lib" << std::endl;
-    ofsCodeD << "        $ENV{AMDAPPSDKROOT}/lib" << std::endl;
-    ofsCodeD << "        DOC \"OpenCL dynamic library path\"" << std::endl;
-    ofsCodeD << "        PATH_SUFFIXES x86 Win32" << std::endl;
-    ofsCodeD << "        PATHS" << std::endl;
-    ofsCodeD << "        /usr/lib" << std::endl;
-    ofsCodeD << "    )" << std::endl;
-    ofsCodeD << "endif( )" << std::endl;
-    ofsCodeD << "mark_as_advanced( OPENCL_LIBRARIES )" << std::endl << std::endl;
-    ofsCodeD << "include( FindPackageHandleStandardArgs )" << std::endl;
-    ofsCodeD << "find_package_handle_standard_args( OPENCL DEFAULT_MSG OPENCL_LIBRARIES OPENCL_INCLUDE_DIRS )" << std::endl;
-    ofsCodeD << "set(OpenCL_FOUND ${OPENCL_FOUND} CACHE INTERNAL \"\")" << std::endl;
-    ofsCodeD << "set(OpenCL_LIBRARIES ${OPENCL_LIBRARIES} CACHE INTERNAL \"\")" << std::endl;
-    ofsCodeD << "set(OpenCL_INCLUDE_DIRS ${OPENCL_INCLUDE_DIRS} CACHE INTERNAL \"\")" << std::endl;
-    ofsCodeD << "if( NOT OPENCL_FOUND )" << std::endl;
-    ofsCodeD << "    message( STATUS \"FindOpenCL looked for libraries named: OpenCL\" )" << std::endl;
-    ofsCodeD << "endif()" << std::endl;
-}
-
-void parseCaffeModel(const caffe::NetParameter& net_parameter, std::vector<std::vector<std::string>>& net, int inputDim[4], std::string outputFolder, int flags)
-{
-    if(net_parameter.has_name())
-        std::cout << "Fetching the weights for : " << net_parameter.name() << std::endl;
-
-    std::map<std::string,std::string> outputNameMap, splitNameMap;
-    if(net_parameter.input_size() > 0) {
-        outputNameMap[net_parameter.input(0)] = net_parameter.input(0);
-    }
-
-    if(net_parameter.input_dim_size()==4 && ((inputDim[0]==0) || (inputDim[1]==0) || (inputDim[2]==0) || (inputDim[3]==0)))
-    {
-        inputDim[0] = net_parameter.input_dim(0);
-        inputDim[1] = net_parameter.input_dim(1);
-        inputDim[2] = net_parameter.input_dim(2);
-        inputDim[3] = net_parameter.input_dim(3);
-    }
-
-    //extract layer information.
-    for(int i=0; i < net_parameter.layer_size(); i++)
-    {
-        const caffe::LayerParameter& layer_parameter = net_parameter.layer(i);
-
-        if(layer_parameter.top_size() == 0)
-            continue;
-
-        //Check layer name.
-        if(layer_parameter.type() == "Input" || layer_parameter.type() == "Data" || layer_parameter.type() == "ImageData") {
-            outputNameMap[layer_parameter.top(0)] = layer_parameter.top(0);
-            if(layer_parameter.type() == "Input" && ((inputDim[0]==0) || (inputDim[1]==0) || (inputDim[2]==0) || (inputDim[3]==0))) {
-                inputDim[0] = layer_parameter.input_param().shape(0).dim(0);
-                inputDim[1] = layer_parameter.input_param().shape(0).dim(1);
-                inputDim[2] = layer_parameter.input_param().shape(0).dim(2);
-                inputDim[3] = layer_parameter.input_param().shape(0).dim(3);
-            }
-            continue;
-        }
-
-        //dump layer data.
-        dumpLayerData(layer_parameter, outputFolder);
-
-        // enable Split optimization using a bit in flags (i.e., remove Split by using variable renaming instead of a copy)
-        bool isSplitEnabled = (flags & 1);
-        if(!isSplitEnabled) {
-            if(layer_parameter.type() == "Split") {
-                for(int j=0; j < layer_parameter.top_size(); j++) {
-                    // get layer information and add to net
-                    std::vector<std::string> node;
-                    node.push_back(layer_parameter.type());
-                    node.push_back("");
-                    node.push_back(layer_parameter.top(j));
-                    node.push_back(layer_parameter.top(j));
-                    for(int z = 0; z < layer_parameter.bottom_size(); z++) {
-                        if(outputNameMap.find(layer_parameter.bottom(z)) == outputNameMap.end()) {
-                            outputNameMap[layer_parameter.bottom(z)] = layer_parameter.bottom(z);
-                        }
-                        node.push_back(outputNameMap[layer_parameter.bottom(z)]);
-                    }
-                    net.push_back(node);
-                    // update output name with layer name
-                    outputNameMap[layer_parameter.top(j)] = layer_parameter.top(j);
-                }
-                continue;
-            }
-        }
-        else
-        {
-            //Split type.
-            if(layer_parameter.type() == "Split") {
-                splitNameMap[layer_parameter.name()] = layer_parameter.bottom(0);
-                for(int j=0; j < layer_parameter.top_size(); j++) {
-                    splitNameMap[layer_parameter.top(j)] = layer_parameter.bottom(0);
-                }
-                continue;
-            }
-        }
-
-        // get layer information and add to net
-        std::vector<std::string> node;
-        std::string params;
-        getLayerParams(layer_parameter, params);
-        node.push_back(layer_parameter.type());
-        node.push_back(params);
-        node.push_back(layer_parameter.top(0));
-        node.push_back(layer_parameter.name());
-        for(int j = 0; j < layer_parameter.bottom_size(); j++) {
-            if(isSplitEnabled && (strstr(layer_parameter.bottom(j).c_str(), "split"))) {
-                outputNameMap[layer_parameter.bottom(j)] = splitNameMap[layer_parameter.bottom(j)];
-            }
-            if(outputNameMap.find(layer_parameter.bottom(j)) == outputNameMap.end()) {
-                outputNameMap[layer_parameter.bottom(j)] = layer_parameter.bottom(j);
-            }
-            node.push_back(outputNameMap[layer_parameter.bottom(j)]);
-        }
-        net.push_back(node);
-        // update output name with layer name
-        outputNameMap[layer_parameter.top(0)] = layer_parameter.name();
-    }
-}
-
-void parseV1LayerCaffeModel(const caffe::NetParameter& net_parameter, std::vector<std::vector<std::string>>& net, int inputDim[4], std::string outputFolder, int flags)
-{
-    if(net_parameter.has_name())
-        std::cout << "Fetching the weights for : " << net_parameter.name() << std::endl;
-
-    std::map<std::string,std::string> outputNameMap, splitNameMap;
-    if(net_parameter.input_size() > 0) {
-        outputNameMap[net_parameter.input(0)] = net_parameter.input(0);
-    }
-
-    if(net_parameter.input_dim_size()==4 && ((inputDim[0]==0) || (inputDim[1]==0) || (inputDim[2]==0) || (inputDim[3]==0)))
-    {
-        inputDim[0] = net_parameter.input_dim(0);
-        inputDim[1] = net_parameter.input_dim(1);
-        inputDim[2] = net_parameter.input_dim(2);
-        inputDim[3] = net_parameter.input_dim(3);
-    }
-
-    //extract layer information.
-    for(int i=0; i < net_parameter.layers_size(); i++)
-    {
-        const caffe::V1LayerParameter& layer_parameter = net_parameter.layers(i);
-
-        if(layer_parameter.top_size() == 0)
-            continue;
-
-        //Check layer name.
-        if(layer_parameter.type() == caffe::V1LayerParameter_LayerType_DATA || layer_parameter.type() == caffe::V1LayerParameter_LayerType_IMAGE_DATA) {
-            outputNameMap[layer_parameter.top(0)] = layer_parameter.top(0);
-            continue;
-        }
-
-        //dump layer data.
-        dumpV1LayerData(layer_parameter, outputFolder);
-
-        // enable Split optimization using a bit in flags (i.e., remove Split by using variable renaming instead of a copy)
-        bool isSplitEnabled = (flags & 1);
-        if(!isSplitEnabled) {
-            if(layer_parameter.type() == caffe::V1LayerParameter_LayerType_SPLIT) {
-                for(int j = 0; j < layer_parameter.top_size(); j++) {
-                    // get layer information and add to net
-                    std::vector<std::string> node;
-                    node.push_back(convertV1LayerTypeToString(layer_parameter.type()));
-                    node.push_back("");
-                    node.push_back(layer_parameter.top(j));
-                    node.push_back(layer_parameter.top(j));
-                    for(int z = 0; z < layer_parameter.bottom_size(); z++) {
-                        if(outputNameMap.find(layer_parameter.bottom(z)) == outputNameMap.end()) {
-                            outputNameMap[layer_parameter.bottom(z)] = layer_parameter.bottom(z);
-                        }
-                        node.push_back(outputNameMap[layer_parameter.bottom(z)]);
-                    }
-                    net.push_back(node);
-                    // update output name with layer name
-                    outputNameMap[layer_parameter.top(j)] = layer_parameter.top(j);
-                }
-                continue;
-            }
-        }
-        else
-        {
-            //Split type.
-            if(layer_parameter.type() == caffe::V1LayerParameter_LayerType_SPLIT) {
-                splitNameMap[layer_parameter.name()] = layer_parameter.bottom(0);
-                for(int j=0; j < layer_parameter.top_size(); j++) {
-                    splitNameMap[layer_parameter.top(j)] = layer_parameter.bottom(0);
-                }
-                continue;
-            }
-        }
-
-        // get layer information and add to net
-        std::vector<std::string> node;
-        std::string params;
-        getV1LayerParams(layer_parameter, params);
-        node.push_back(convertV1LayerTypeToString(layer_parameter.type()));
-        node.push_back(params);
-        node.push_back(layer_parameter.top(0));
-        node.push_back(layer_parameter.name());
-        for(int j = 0; j < layer_parameter.bottom_size(); j++) {
-            if(isSplitEnabled && (strstr(layer_parameter.bottom(j).c_str(), "split"))) {
-                outputNameMap[layer_parameter.bottom(j)] = splitNameMap[layer_parameter.bottom(j)];
-            }
-            if(outputNameMap.find(layer_parameter.bottom(j)) == outputNameMap.end()) {
-                outputNameMap[layer_parameter.bottom(j)] = layer_parameter.bottom(j);
-            }
-            node.push_back(outputNameMap[layer_parameter.bottom(j)]);
-        }
-        net.push_back(node);
-        // update output name with layer name
-        outputNameMap[layer_parameter.top(0)] = layer_parameter.name();
-    }
-}
-
-int loadCaffeModelFile(
-    const char* fileName,
-    std::vector<std::vector<std::string>>& net,
-    int inputDim[4],
-    std::string outputFolder,
-    int flags)
-{
-    //verify the version of protobuf library.
-    GOOGLE_PROTOBUF_VERIFY_VERSION;
-
-    //read the caffemodel.
-    caffe::NetParameter net_parameter;
-    std::cout << "Reading the binary file from : " << fileName << std::endl;
-    std::fstream input(fileName, std::ios::in | std::ios::binary);
-    bool isSuccess = net_parameter.ParseFromIstream(&input);
-    if(isSuccess) {
-        std::cout << "CaffeModel Read Successful" << std::endl;
-        if(net_parameter.layer_size() > 0) {
-            parseCaffeModel(net_parameter, net, inputDim, outputFolder, flags);
-        }
-        else if(net_parameter.layers_size() > 0) {
-            info("Reading V1 layer caffe model\n");
-            parseV1LayerCaffeModel(net_parameter, net, inputDim, outputFolder, flags);
-        }
-        else {
-            error("No 'layers' or 'layer' fields found in the caffemodel\n");
-            return -1;
-        }
-    }
-    else {
-        std::cerr << "CaffeModel Read Failed" << std::endl;
-    }
-    return 0;
-}
-
-int main(int argc, char* argv[])
-{
-    const char * usage =
-        "Usage:\n"
-        "  % caffe2openvx [options] <net.prototxt|net.caffemodel> [n c H W [type fixed-point-position [convert-policy round-policy]]]\n"
-        "  options:\n"
-        "    --[no-]error-messages     - do/don't enable error messages (default: ON)\n"
-        "    --[no-]virtual-buffers    - do/don't use virtual buffers (default: ON)\n"
-        "    --[no-]generate-gdf       - do/don't generate RunVX GDF with weight/bias initialization (default: ON)\n"
-        "    --[no-]generate-vx-code   - do/don't generate OpenVX C Code with weight/bias initialization (default: ON)\n"
-        "    --output-dir <folder>     - specify output folder for weights/biases, GDF, and OpenVX C Code (default: current)\n"
-        "    --input-rgb <a> <b> <rev> - convert input from RGB image into tensor using (a*x+b) conversion: rev=(BGR?1:0)\n"
-        "    --input-u8 <a> <b>        - convert input from U8 image into tensor using (a*x+b) conversion\n"
-        "    --argmax-tensor u8|u16 k  - return argmax output with specified tensor type and top_k\n"
-        "    --argmax-image u8|u16     - return argmax output with specified image type\n"
-        "    --argmax-lut <rgbLut.txt> - argmax color table: one R G B entry per label\n"
-        "    --flags <n>               - specify custom flags (default: 0)\n"
-        ;
-
-    // get options
-    bool bEnableErrorMessages = true;
-    bool isVirtualEnabled = true;
-    bool generateGDF = true;
-    bool generateVXC = true;
-    bool bFuseScaleWithBatchNorm = true;
-    bool bInputIsImage = false;
-    bool bInputChannelReverse = false;
-    double fInputConversionA = 0;
-    double fInputConversionB = 255;
-    std::string inputImageType;
-    bool bOutputArgmax = false;
-    bool bOutputIsImage = false;
-    std::string argmaxOutputDataType;
-    int argmaxTopK = 1;
-    std::vector<int> argmaxLut;
-    std::string outputFolder = ".";
-    int flags = 0;
-    for(; argc > 1 && argv[1][0] == '-'; argc--, argv++) {
-        if(!strcmp(argv[1], "--error-messages")) {
-            bEnableErrorMessages = true;
-        }
-        else if(!strcmp(argv[1], "--no-error-messages")) {
-            bEnableErrorMessages = false;
-        }
-        else if(!strcmp(argv[1], "--virtual-buffers")) {
-            isVirtualEnabled = true;
-        }
-        else if(!strcmp(argv[1], "--no-virtual-buffers")) {
-            isVirtualEnabled = false;
-        }
-        else if(!strcmp(argv[1], "--generate-gdf")) {
-            generateGDF = true;
-        }
-        else if(!strcmp(argv[1], "--no-generate-gdf")) {
-            generateGDF = false;
-        }
-        else if(!strcmp(argv[1], "--generate-vx-code")) {
-            generateVXC = true;
-        }
-        else if(!strcmp(argv[1], "--no-generate-vx-code")) {
-            generateVXC = false;
-        }
-        else if(!strcmp(argv[1], "--output-dir") && argc > 2) {
-            outputFolder = argv[2];
-            argc--;
-            argv++;
-            mkdir(outputFolder.c_str(), 0777);
-        }
-        else if(!strcmp(argv[1], "--flags") && argc > 2) {
-            flags = atoi(argv[2]);
-            argc--;
-            argv++;
-        }
-        else if(!strcmp(argv[1], "--input-rgb") && argc > 4) {
-            bInputIsImage = true;
-            inputImageType = "VX_DF_IMAGE_RGB";
-            fInputConversionA = atof(argv[2]);
-            fInputConversionB = atof(argv[3]);
-            if(!strcmp(argv[4], "0")) bInputChannelReverse = false;
-            else if(!strcmp(argv[4], "1")) bInputChannelReverse = true;
-            else {
-                printf("ERROR: invalid input RGB channel option: %s (most be 0 or 1)\n", argv[4]);
-                return -1;
-            }
-            argc -= 3;
-            argv += 3;
-        }
-        else if(!strcmp(argv[1], "--input-u8") && argc > 3) {
-            bInputIsImage = true;
-            inputImageType = "VX_DF_IMAGE_U8";
-            fInputConversionA = atof(argv[2]);
-            fInputConversionB = atof(argv[3]);
-            bInputChannelReverse = false;
-            argc -= 2;
-            argv += 2;
-        }
-        else if(!strcmp(argv[1], "--argmax-tensor") && argc > 3) {
-            bOutputArgmax = true;
-            bOutputIsImage = false;
-            if(!strcmp(argv[2], "u8")) argmaxOutputDataType = "VX_TYPE_UINT8";
-            else if(!strcmp(argv[2], "u16")) argmaxOutputDataType = "VX_TYPE_UINT16";
-            else {
-                printf("ERROR: invalid argmax output tensor type: %s (must be u8 or u16)\n", argv[2]);
-                return -1;
-            }
-            argmaxTopK = atoi(argv[3]);
-            argc -= 2;
-            argv += 2;
-        }
-        else if(!strcmp(argv[1], "--argmax-image") && argc > 2) {
-            bOutputArgmax = true;
-            bOutputIsImage = true;
-            if(!strcmp(argv[2], "u8")) argmaxOutputDataType = "VX_DF_IMAGE_U8";
-            else if(!strcmp(argv[2], "u16")) argmaxOutputDataType = "VX_DF_IMAGE_U16";
-            else {
-                printf("ERROR: invalid argmax output image type: %s (must be u8 or u16)\n", argv[2]);
-                return -1;
-            }
-            argmaxTopK = 1;
-            argc -= 1;
-            argv += 1;
-        }
-        else if(!strcmp(argv[1], "--argmax-lut") && argc > 2) {
-            if(!bOutputArgmax || !bOutputIsImage || argmaxOutputDataType != "VX_DF_IMAGE_U8") {
-                printf("ERROR: '--argmax-image u8' is required prior to '--argmax-lut' option\n");
-                return -1;
-            }
-            FILE * fp = fopen(argv[2], "r");
-            if(!fp) {
-                printf("ERROR: unable to open: %s\n", argv[2]);
-                return -1;
-            }
-            argmaxLut.clear();
-            for(int r, g, b; fscanf(fp, "%d%d%d", &r, &g, &b) == 3;) {
-                int v = ((b & 255) << 16) | ((g & 255) << 8) | (r & 255);
-                argmaxLut.push_back(v);
-            }
-            fclose(fp);
-            printf("OK: loaded LUT with %ld entries from %s\n", argmaxLut.size(), argv[2]);
-            argc -= 1;
-            argv += 1;
-        }
-        else {
-            printf("ERROR: invalid option: %s\n", argv[1]);
-            return -1;
-        }
-    }
-
-    // check for command-line arguments
-    if(argc < 2) {
-        printf("%s", usage);
-        return -1;
-    }
-
-    // get command-line arguments
-    int inputDim[4] = { 0, 0, 0, 0 }, fixedPointPosition = 0;
-    const char * tensorType = "VX_TYPE_FLOAT32";
-    const char * convertPolicy = "VX_CONVERT_POLICY_SATURATE";
-    const char * roundPolicy = "VX_ROUND_POLICY_TO_NEAREST_EVEN";
-    const char * fileName = argv[1];
-    if(argc > 2) inputDim[0] = atoi(argv[2]);
-    if(argc > 3) inputDim[1] = atoi(argv[3]);
-    if(argc > 4) inputDim[2] = atoi(argv[4]);
-    if(argc > 5) inputDim[3] = atoi(argv[5]);
-    if(argc > 6) tensorType = argv[6];
-    if(argc > 7) fixedPointPosition = atoi(argv[7]);
-    if(argc > 8) convertPolicy = argv[8];
-    if(argc > 9) roundPolicy = argv[9];
-    std::vector<std::vector<std::string>> net;
-
-    flags &= 3; // we are only interersted in LSBs 0 & 1
-    bFuseScaleWithBatchNorm = !((flags & 2) >> 1);
-
-    // load caffe model (or just .prototxt)
-    if(strstr(fileName, ".caffemodel")) {
-        // make sure that weights and bias folder are created
-        std::string dir = outputFolder + "/weights";
-        mkdir(dir.c_str(), 0777);
-        dir = outputFolder + "/bias";
-        mkdir(dir.c_str(), 0777);
-        // load caffe model
-        if(loadCaffeModelFile(fileName, net, inputDim, outputFolder, flags) < 0) {
-            return -1;
-        }
-    }
-    else if(strstr(fileName, ".prototxt")) {
-        if(loadCaffeProtoTxt(fileName, net, inputDim) < 0) {
-            return -1;
-        }
-    }
-    else {
-        printf("%s", usage);
-        return -1;
-    }
-
-    // generate tensorMap for given input dimensions
-    std::map<std::string,std::vector<int>> tensorMap;
-    if(calculateTensorDim(net, inputDim, tensorMap) < 0) {
-        return -1;
-    }
-
-    if(generateGDF) {
-        std::ofstream ofsGDF(outputFolder + "/net.gdf", std::ios::binary);
-        writeGDF(ofsGDF, net, tensorMap, tensorType, fixedPointPosition, convertPolicy, roundPolicy, isVirtualEnabled, outputFolder, bFuseScaleWithBatchNorm);
-    }
-
-    if(generateVXC) {
-        std::ofstream ofsCodeH(outputFolder + "/annmodule.h", std::ios::binary);
-        std::ofstream ofsCodeC(outputFolder + "/annmodule.cpp", std::ios::binary);
-        std::ofstream ofsCodeM(outputFolder + "/CMakeLists.txt", std::ios::binary);
-        std::ofstream ofsCodeA(outputFolder + "/anntest.cpp", std::ios::binary);
-        std::string dir = outputFolder + "/cmake";
-        mkdir(dir.c_str(), 0777);
-        std::ofstream ofsCodeD(dir + "/FindOpenCL.cmake", std::ios::binary);
generateCode(ofsCodeH, ofsCodeC, ofsCodeM, ofsCodeA, ofsCodeD, - net, tensorMap, tensorType, fixedPointPosition, convertPolicy, roundPolicy, - isVirtualEnabled, outputFolder, - bInputIsImage, inputImageType, bInputChannelReverse, fInputConversionA, fInputConversionB, - bOutputArgmax, bOutputIsImage, argmaxOutputDataType, argmaxTopK, argmaxLut, - bEnableErrorMessages, - bFuseScaleWithBatchNorm); - } - - return 0; -} diff --git a/utilities/inference_generator/src/nnef2openvx.cpp b/utilities/inference_generator/src/nnef2openvx.cpp deleted file mode 100644 index 32d6f294c0..0000000000 --- a/utilities/inference_generator/src/nnef2openvx.cpp +++ /dev/null @@ -1,1848 +0,0 @@ -/* -Copyright (c) 2017 - 2023 Advanced Micro Devices, Inc. All rights reserved. - -Permission is hereby granted, free of charge, to any person obtaining a copy -of this software and associated documentation files (the "Software"), to deal -in the Software without restriction, including without limitation the rights -to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -copies of the Software, and to permit persons to whom the Software is -furnished to do so, subject to the following conditions: - -The above copyright notice and this permission notice shall be included in -all copies or substantial portions of the Software. - -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN -THE SOFTWARE. 
-*/
-
-#include "flat/flat_parser.h"
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-
-////
-// MAGIC numbers
-//
-#define VARIABLES_FILE_MAGIC 0xF00DD1E0
-#define VARIABLES_DATA_MAGIC 0xF00DD1E1
-#define VARIABLES_EOFF_MAGIC 0xF00DD1E2
-
-////
-// NNEF to OpenVX Translator
-//
-class NNEF2OpenVX_Translator : public nnef::Parser::Callback
-{
-public:
-    NNEF2OpenVX_Translator(std::string nnefFolder_, std::string openvxFolder_, bool useVirtual_, int verbose_)
-        : nnefFolder(nnefFolder_), openvxFolder(openvxFolder_), useVirtual(useVirtual_), verbose(verbose_)
-    {
-    }
-
-protected:
-    ////
-    // class variables
-    //
-    int verbose;
-    bool useVirtual;
-    std::string nnefFolder;
-    std::string openvxFolder;
-    std::string openvxFilenameC;
-    std::ofstream ovxC;
-    std::vector<std::string> inputList;
-    std::vector<std::string> outputList;
-    std::vector<std::string> virtualList;
-    std::vector<std::string> variableList;
-    std::map<std::string, std::tuple<size_t, char *>> variableBinary;
-    std::map<std::string, nnef::Shape> inputShape;
-    std::map<std::string, nnef::Shape> outputShape;
-    std::map<std::string, nnef::Shape> virtualShape;
-    std::map<std::string, nnef::Shape> variableShape;
-    std::map<std::string, std::string> variableLabel;
-    std::map<std::string, size_t> variableRequiredDims;
-    std::vector<nnef::Prototype> opsProto;
-    std::vector<nnef::Dictionary<nnef::Value>> opsValues;
-    std::vector<nnef::Dictionary<nnef::Shape>> opsShapes;
-    std::vector<bool> operationRemoved;
-    std::map<std::string, bool> variableMerged;
-    std::map<std::string, std::string> virtualRename;
-    std::map<std::string, std::string> convNewBiasName;
-
-private:
-    // utility functions
-    static void getTensorDims(const nnef::Shape& shape, std::vector<size_t>& dims, size_t num_dims)
-    {
-        size_t rank = shape.rank();
-        if(num_dims == 0)
-            num_dims = rank;
-        dims.clear();
-        size_t count = 0;
-        if(rank > 1) {
-            for(; count < (num_dims - rank); count++) {
-                dims.push_back(1);
-            }
-        }
-        for(size_t i = 0; i < rank; i++, count++) {
-            dims.push_back(shape[rank-1-i]);
-        }
-        for(; count < num_dims; count++) {
-            dims.push_back(1);
-        }
-    }
-    static std::string codeGenTensorCreate(const std::string& name, const nnef::Shape& shape, bool useVirtual, size_t num_dims)
-    {
-        std::stringstream ss;
-        std::vector<size_t> dims;
-        getTensorDims(shape, dims, num_dims);
-        ss << "    vx_size " << name << "_dims[" << dims.size() << "] = {";
-        for(size_t i = 0; i < dims.size(); i++) {
-            ss << (i == 0 ? " " : ", ") << dims[i];
-        }
-        ss << " };" << std::endl;
-        ss << "    vx_tensor " << name << " = "
-           << (useVirtual ? "vxCreateVirtualTensor(graph, " : "vxCreateTensor(context, ")
-           << dims.size() << ", " << name << "_dims, VX_TYPE_FLOAT32, 0);" << std::endl;
-        ss << "    ERROR_CHECK_OBJECT(" << name << ");" << std::endl;
-        return ss.str();
-    }
-    static unsigned int loadTensorFile(const std::string& nnefFolder, const std::string& label, const nnef::Shape& shape, char *& data)
-    {
-        std::string fileName = nnefFolder + "/" + label + ".dat";
-        FILE * fp = fopen(fileName.c_str(), "rb");
-        if(!fp) {
-            printf("ERROR: unable to open: %s\n", fileName.c_str());
-            exit(1);
-        }
-        enum TensorDataType : unsigned char {
-            TensorDataType_Float,
-            TensorDataType_Quantized,
-            TensorDataType_Signed,
-            TensorDataType_Unsigned
-        };
-        struct TensorFileHeader {
-            unsigned char magic[2];
-            unsigned char major;
-            unsigned char minor;
-            unsigned int offset;
-            unsigned int rank;
-            unsigned int dim[8];
-            unsigned char data_type;
-            unsigned char bit_width;
-            unsigned short quant_alg_len;
-            char quant_alg[1024];
-        } h = { 0 };
-        unsigned int offset = 0;
-        offset += fread(&h.magic, 1, sizeof(h.magic), fp);
-        offset += fread(&h.major, 1, sizeof(h.major), fp);
-        offset += fread(&h.minor, 1, sizeof(h.minor), fp);
-        offset += fread(&h.offset, 1, sizeof(h.offset), fp);
-        offset += fread(&h.rank, 1, sizeof(h.rank), fp);
-        if(h.rank > 0) {
-            offset += fread(h.dim, 1, h.rank * sizeof(h.dim[0]), fp);
-        }
-        offset += fread(&h.data_type, 1, sizeof(h.data_type), fp);
-        offset += fread(&h.bit_width, 1, sizeof(h.bit_width), fp);
-        offset += fread(&h.quant_alg_len, 1, sizeof(h.quant_alg_len), fp);
-        if(h.quant_alg_len > 0) {
-            offset += fread(h.quant_alg, 1, h.quant_alg_len, fp);
-        }
-        if(h.magic[0] != 0x4e || h.magic[1] != 0xef || h.major != 1 || h.minor != 0
-            || h.bit_width == 0 || h.rank > 8 || h.quant_alg_len >= 1024
-            || (12 + h.rank * 4 + 4 + h.quant_alg_len) != offset || h.offset < offset)
-        {
-            printf("ERROR: invalid or unsupported tensor file: %s\n", fileName.c_str());
-            printf(" [ 0x%02x, 0x%02x, %d, %d, %d, %d, {", h.magic[0], h.magic[1], h.major, h.minor, h.offset, h.rank);
-            for(unsigned int i = 0; i < h.rank; i++) printf(" %d", h.dim[i]);
-            printf(" }, %d, %d, %d, '%s' ] offset = %d\n", h.data_type, h.bit_width, h.quant_alg_len, h.quant_alg, offset);
-            exit(1);
-        }
-        if(h.offset > offset) {
-            fseek(fp, h.offset, SEEK_SET);
-        }
-        unsigned int size = h.bit_width;
-        for(unsigned int i = 0; i < h.rank; i++) {
-            size *= h.dim[i];
-            if(h.dim[i] != shape[i]) {
-                printf("ERROR: dimension[%d] mismatch: %d in %s (must be %d)\n", i, h.dim[i], fileName.c_str(), shape[i]);
-                exit(1);
-            }
-        }
-        size = (size + 7) >> 3;
-        data = nullptr;
-        if(h.data_type == TensorDataType_Float && h.bit_width == 32) {
-            data = new char [size];
-            if(!data) {
-                printf("ERROR: memory allocation for %d bytes failed for %s\n", size, fileName.c_str());
-                exit(1);
-            }
-            unsigned int n = fread(data, 1, size, fp);
-            if(n != size) {
-                printf("ERROR: unable to read %d bytes of data from %s\n", size, fileName.c_str());
-                exit(1);
-            }
-        }
-        else {
-            printf("ERROR: import of Tensor DataType=%d BitWidth=%d is not yet supported\n", h.data_type, h.bit_width);
-            exit(1);
-        }
-        fclose(fp);
-        return size;
-    }
-
-    std::string virtualName(const std::string name)
-    {
-        auto it = virtualRename.find(name);
-        return (it != virtualRename.end()) ? it->second : name;
-    }
-
-    void codeGenOperation(size_t pos, bool getVariables, bool genCode, int verbose)
-    {
-        ////
-        // make sure that operation is not disabled
-        //
-        if(operationRemoved[pos]) {
-            return;
-        }
-
-        ////
-        // get operation details
-        //
-        const nnef::Prototype& proto = opsProto[pos];
-        const nnef::Dictionary<nnef::Value>& args = opsValues[pos];
-        const nnef::Dictionary<nnef::Shape>& shapes = opsShapes[pos];
-        if(verbose & 1) {
-            std::cout << '\t';
-            for ( size_t i = 0; i < proto.resultCount(); ++i ) {
-                auto& result = proto.result(i);
-                if ( i ) std::cout << ", ";
-                std::cout << args[result.name()];
-            }
-            std::cout << " = " << proto.name() << "(";
-            for ( size_t i = 0; i < proto.paramCount(); ++i ) {
-                auto& param = proto.param(i);
-                if ( i ) std::cout << ", ";
-                if ( !param.type()->isTensor() )
-                    std::cout << param.name() << " = ";
-                std::cout << args[param.name()];
-            }
-            std::cout << ")" << std::endl;
-        }
-
-        ////
-        // utility functions
-        //
-        auto getTensorOrScalar = [] (const nnef::Value& v) -> std::string {
-            std::string value = "0";
-            if(v) {
-                if(v.kind() == nnef::Value::Tensor) {
-                    value = v.tensor().id;
-                }
-                else if(v.kind() == nnef::Value::Scalar) {
-                    value = std::to_string(v.scalar());
-                }
-            }
-            return value;
-        };
-        auto getExtentArray = [] (const nnef::Value& v) -> std::vector<int> {
-            std::vector<int> value;
-            if(v && v.kind() == nnef::Value::Array) {
-                auto&& a = v.array();
-                for(auto& i : a) {
-                    value.push_back(i.integer());
-                }
-            }
-            return value;
-        };
-        auto getPaddingInfo = [] (const nnef::Value& v, size_t pad[4]) {
-            std::vector<int> value;
-            if(v && v.kind() == nnef::Value::Array) {
-                auto&& a = v.array();
-                if(a.size() == 2) {
-                    pad[0] = a[0][0].integer();
-                    pad[1] = a[0][1].integer();
-                    pad[2] = a[1][0].integer();
-                    pad[3] = a[1][1].integer();
-                    // TODO: protection against -ve values
-                    if(pad[0] > 16384) pad[0] = 0;
-                    if(pad[1] > 16384) pad[1] = 0;
-                    if(pad[2] > 16384) pad[2] = 0;
-                    if(pad[3] > 16384) pad[3] = 0;
-                }
-            }
-        };
-
-        ////
-        // process operations
-        //
-        std::string opname = proto.name();
-        if(opname == "external") {
-            const std::string& output = args["output"].tensor().id;
-            const nnef::Shape& shape = shapes[output];
-            if(verbose & 2) {
-                std::cout << opname << " " << output << " " << shape << std::endl;
-            }
-            if(getVariables) {
-                inputShape[output] = shape;
-            }
-        }
-        else if(opname == "variable") {
-            const std::string& output = args["output"].tensor().id;
-            const nnef::Shape& shape = shapes[output];
-            const std::string& label = args["label"].string();
-            if(verbose & 2) {
-                std::cout << opname << " " << output << " " << shape << " label=" << label << std::endl;
-            }
-            if(getVariables) {
-                variableList.push_back(output);
-                variableMerged[output] = false;
-                variableShape[output] = shape;
-                variableLabel[output] = label;
-            }
-        }
-        else if(opname == "conv") {
-            const std::string& output = args["output"].tensor().id;
-            const nnef::Shape& shape = shapes[output];
-            const std::string& input = args["input"].tensor().id;
-            const std::string& filter = args["filter"].tensor().id;
-            std::string bias = getTensorOrScalar(args["bias"]);
-            const std::string& border = args["border"].string();
-            const auto& padding = args["padding"];
-            const auto& stride = args["stride"];
-            const auto& dilation = args["dilation"];
-            const auto& groups = args["groups"] ? args["groups"].integer() : 1;
-            if(verbose & 2) {
-                std::cout << opname << " " << output << " " << shape << " " << input << " " << filter << " " << bias
-                    << " border=" << border << " " << padding << " " << stride << " " << dilation << " " << groups << std::endl;
-            }
-            if(getVariables) {
-                if(std::find(outputList.begin(), outputList.end(), output) == outputList.end()) {
-                    virtualList.push_back(output);
-                    virtualShape[output] = shape;
-                }
-                else {
-                    outputShape[output] = shape;
-                }
-                variableRequiredDims[filter] = 4;
-                if(bias[0] != '0') {
-                    variableRequiredDims[bias] = 2;
-                }
-            }
-            if(genCode) {
-                if(std::find(virtualList.begin(), virtualList.end(), output) != virtualList.end()) {
-                    ovxC << codeGenTensorCreate(output, shape, useVirtual, 4);
-                }
-                if(bias[0] == '0') {
-                    if(convNewBiasName.find(output) != convNewBiasName.end()) {
-                        bias = convNewBiasName.find(output)->second;
-                    }
-                }
-                if(shape[2] == 1 && shape[3] == 1) {
-                    ovxC << "    { vx_node node = vxFullyConnectedLayer(graph, " << virtualName(input) << ", " << filter << ", "
-                         << ((bias[0] == '0') ? "NULL" : bias) << ", VX_CONVERT_POLICY_SATURATE, VX_ROUND_POLICY_TO_NEAREST_EVEN, " << output << ");" << std::endl;
-                    ovxC << "      ERROR_CHECK_STATUS(vxReleaseNode(&node));" << std::endl;
-                    ovxC << "    }" << std::endl;
-                }
-                else {
-                    std::vector<int>&& vDilation = getExtentArray(dilation);
-                    size_t pad[4] = { 0, 0, 0, 0 };
-                    getPaddingInfo(padding, pad);
-                    ovxC << "    { vx_nn_convolution_params_t conv_params = { 0 };" << std::endl;
-                    ovxC << "      conv_params.padding_x = " << pad[1] << ";" << std::endl;
-                    ovxC << "      conv_params.padding_y = " << pad[0] << ";" << std::endl;
-                    ovxC << "      conv_params.dilation_x = " << (vDilation.size() > 1 ? vDilation[1] - 1 : 0) << ";" << std::endl;
-                    ovxC << "      conv_params.dilation_y = " << (vDilation.size() > 0 ? vDilation[0] - 1 : 0) << ";" << std::endl;
-                    ovxC << "      conv_params.overflow_policy = " << "VX_CONVERT_POLICY_SATURATE" << ";" << std::endl;
-                    ovxC << "      conv_params.rounding_policy = " << "VX_ROUND_POLICY_TO_NEAREST_EVEN" << ";" << std::endl;
-                    ovxC << "      conv_params.down_scale_size_rounding = " << "VX_NN_DS_SIZE_ROUNDING_FLOOR" << ";" << std::endl;
-                    ovxC << "      vx_node node = vxConvolutionLayer(graph, " << virtualName(input) << ", " << filter << ", "
-                         << ((bias[0] == '0') ? "NULL" : bias) << ", &conv_params, sizeof(conv_params), " << output << ");" << std::endl;
-                    ovxC << "      ERROR_CHECK_STATUS(vxReleaseNode(&node));" << std::endl;
-                    ovxC << "    }" << std::endl;
-                }
-            }
-        }
-        else if(opname == "relu") {
-            const std::string& output = args["y"].tensor().id;
-            const nnef::Shape& shape = shapes[output];
-            const std::string& input = args["x"].tensor().id;
-            if(verbose & 2) {
-                std::cout << opname << " " << output << " " << shape << " " << input << std::endl;
-            }
-            if(getVariables) {
-                if(std::find(outputList.begin(), outputList.end(), output) == outputList.end()) {
-                    virtualList.push_back(output);
-                    virtualShape[output] = shape;
-                }
-                else {
-                    outputShape[output] = shape;
-                }
-            }
-            if(genCode) {
-                if(std::find(virtualList.begin(), virtualList.end(), output) != virtualList.end()) {
-                    ovxC << codeGenTensorCreate(output, shape, useVirtual, 4);
-                }
-                ovxC << "    { vx_node node = vxActivationLayer(graph, " << virtualName(input) << ", VX_NN_ACTIVATION_RELU, 0.0f, 0.0f, " << output << ");" << std::endl;
-                ovxC << "      ERROR_CHECK_STATUS(vxReleaseNode(&node));" << std::endl;
-                ovxC << "    }" << std::endl;
-            }
-        }
-        else if(opname == "max_pool") {
-            const std::string& output = args["output"].tensor().id;
-            const nnef::Shape& shape = shapes[output];
-            const std::string& input = args["input"].tensor().id;
-            const auto& size = args["size"];
-            const std::string& border = args["border"].string();
-            const auto& padding = args["padding"];
-            const auto& stride = args["stride"];
-            const auto& dilation = args["dilation"];
-            if(verbose & 2) {
-                std::cout << opname << " " << output << " " << shape << " " << input
-                    << " size=" << size << " border=" << border << " " << padding << " " << stride << " " << dilation << std::endl;
-            }
-            if(getVariables) {
-                if(std::find(outputList.begin(), outputList.end(), output) == outputList.end()) {
-                    virtualList.push_back(output);
-                    virtualShape[output] = shape;
-                }
-                else {
-                    outputShape[output] = shape;
-                }
-            }
-            if(genCode) {
-                if(std::find(virtualList.begin(), virtualList.end(), output) != virtualList.end()) {
-                    ovxC << codeGenTensorCreate(output, shape, useVirtual, 4);
-                }
-                std::vector<int>&& vSize = getExtentArray(size);
-                size_t pad[4] = { 0, 0, 0, 0 };
-                getPaddingInfo(padding, pad);
-                ovxC << "    { vx_node node = vxPoolingLayer(graph, " << virtualName(input) << ", VX_NN_POOLING_MAX, "
-                     << size[3] << ", " << size[2] << ", " << pad[1] << ", " << pad[0] << ", "
-                     << "VX_ROUND_POLICY_TO_NEAREST_EVEN, " << output << ");" << std::endl;
-                ovxC << "      ERROR_CHECK_STATUS(vxReleaseNode(&node));" << std::endl;
-                ovxC << "    }" << std::endl;
-            }
-        }
-        else if(opname == "avg_pool") {
-            const std::string& output = args["output"].tensor().id;
-            const nnef::Shape& shape = shapes[output];
-            const std::string& input = args["input"].tensor().id;
-            const auto& size = args["size"];
-            const std::string& border = args["border"].string();
-            const auto& padding = args["padding"];
-            const auto& stride = args["stride"];
-            const auto& dilation = args["dilation"];
-            if(verbose & 2) {
-                std::cout << opname << " " << output << " " << shape << " " << input
-                    << " size=" << size << " border=" << border << " " << padding << " " << stride << " " << dilation << std::endl;
-            }
-            if(getVariables) {
-                if(std::find(outputList.begin(), outputList.end(), output) == outputList.end()) {
-                    virtualList.push_back(output);
-                    virtualShape[output] = shape;
-                }
-                else {
-                    outputShape[output] = shape;
-                }
-            }
-            if(genCode) {
-                if(std::find(virtualList.begin(), virtualList.end(), output) != virtualList.end()) {
-                    ovxC << codeGenTensorCreate(output, shape, useVirtual, 4);
-                }
-                std::vector<int>&& vSize = getExtentArray(size);
-                size_t pad[4] = { 0, 0, 0, 0 };
-                getPaddingInfo(padding, pad);
-                ovxC << "    { vx_node node = vxPoolingLayer(graph, " << virtualName(input) << ", VX_NN_POOLING_AVG, "
-                     << size[3] << ", " << size[2] << ", " << pad[1] << ", " << pad[0] << ", "
-                     << "VX_ROUND_POLICY_TO_NEAREST_EVEN, " << output << ");" << std::endl;
-                ovxC << "      ERROR_CHECK_STATUS(vxReleaseNode(&node));" << std::endl;
-                ovxC << "    }" << std::endl;
-            }
-        }
-        else if(opname == "concat") {
-            const std::string& output = args["value"].tensor().id;
-            const nnef::Shape& shape = shapes[output];
-            std::vector<std::string> itemList;
-            const auto& inputpar = args["values"];
-            for(size_t i = 0; i < inputpar.size(); i++) {
-                std::string name = inputpar[i].tensor().id;
-                itemList.push_back(name);
-            }
-            const int axis = args["axis"].integer();
-            if(verbose & 2) {
-                std::cout << opname << " " << output << " " << shape << " [";
-                for(auto& v : itemList) std::cout << " " << v;
-                std::cout << " ] axis=" << axis << std::endl;
-            }
-            if(getVariables) {
-                if(std::find(outputList.begin(), outputList.end(), output) == outputList.end()) {
-                    virtualList.push_back(output);
-                    virtualShape[output] = shape;
-                }
-                else {
-                    outputShape[output] = shape;
-                }
-            }
-            if(genCode) {
-                if(std::find(virtualList.begin(), virtualList.end(), output) != virtualList.end()) {
-                    ovxC << codeGenTensorCreate(output, shape, useVirtual, 4);
-                }
-                ovxC << "    { vx_node node = vxConcatLayer(graph, " << output;
-                for(auto& v : itemList) {
-                    ovxC << ", " << virtualName(v);
-                }
-                for(size_t i = itemList.size(); i < 8; i++) {
-                    ovxC << ", NULL";
-                }
-                ovxC << ");" << std::endl;
-                ovxC << "      ERROR_CHECK_STATUS(vxReleaseNode(&node));" << std::endl;
-                ovxC << "    }" << std::endl;
-            }
-        }
-        else if(opname == "batch_normalization") {
-            const std::string& output = args["output"].tensor().id;
-            const nnef::Shape& shape = shapes[output];
-            const std::string& input = args["input"].tensor().id;
-            const std::string& mean = args["mean"].tensor().id;
-            const std::string& variance = args["variance"].tensor().id;
-            std::string scale = getTensorOrScalar(args["scale"]);
-            std::string offset = getTensorOrScalar(args["offset"]);
-            const float epsilon = args["epsilon"].scalar();
-            if(verbose & 2) {
-                std::cout << opname << " " << output << " " << shape << " " << input
-                    << " " << mean << " " << variance << " " << offset << " " << scale << " " << epsilon << std::endl;
-            }
-            if(getVariables) {
-                if(std::find(outputList.begin(), outputList.end(), output) == outputList.end()) {
-                    virtualList.push_back(output);
-                    virtualShape[output] = shape;
-                }
-                else {
-                    outputShape[output] = shape;
-                }
-            }
-            if(genCode) {
-                if(std::find(virtualList.begin(), virtualList.end(), output) != virtualList.end()) {
-                    ovxC << codeGenTensorCreate(output, shape, useVirtual, 4);
-                }
-                ovxC << "    { vx_node node = vxBatchNormalizationLayer(graph, " << virtualName(input) << ", " << mean << ", " << variance
-                     << ", " << (scale[0] == '1' ? "NULL" : scale) << ", " << (offset[0] == '0' ? "NULL" : offset)
-                     << ", " << epsilon << ", " << output << ");" << std::endl;
-                ovxC << "      ERROR_CHECK_STATUS(vxReleaseNode(&node));" << std::endl;
-                ovxC << "    }" << std::endl;
-            }
-        }
-        else if(opname == "mul") {
-            const std::string& output = args["z"].tensor().id;
-            const nnef::Shape& shape = shapes[output];
-            const std::string& input1 = args["x"].tensor().id;
-            const std::string& input2 = args["y"].tensor().id;
-            if(verbose & 2) {
-                std::cout << opname << " " << output << " " << shape << " " << input1 << " " << input2 << std::endl;
-            }
-            if(getVariables) {
-                if(std::find(outputList.begin(), outputList.end(), output) == outputList.end()) {
-                    virtualList.push_back(output);
-                    virtualShape[output] = shape;
-                }
-                else {
-                    outputShape[output] = shape;
-                }
-            }
-            if(genCode) {
-                if(std::find(virtualList.begin(), virtualList.end(), output) != virtualList.end()) {
-                    ovxC << codeGenTensorCreate(output, shape, useVirtual, 4);
-                }
-                ovxC << "    { float one = 1.0f;" << std::endl;
-                ovxC << "      vx_scalar scale = vxCreateScalar(context, VX_TYPE_FLOAT32, &one);" << std::endl;
-                ovxC << "      vx_node node = vxTensorMultiplyNode(graph, " << virtualName(input1) << ", " << virtualName(input2) << ", scale, VX_CONVERT_POLICY_SATURATE, VX_ROUND_POLICY_TO_NEAREST_EVEN, " << output << ");" << std::endl;
-                ovxC << "      ERROR_CHECK_STATUS(vxReleaseScalar(&scale));" << std::endl;
-                ovxC << "      ERROR_CHECK_STATUS(vxReleaseNode(&node));" << std::endl;
-                ovxC << "    }" << std::endl;
-            }
-        }
-        else if(opname == "add") {
-            const std::string& output = args["z"].tensor().id;
-            const nnef::Shape& shape = shapes[output];
-            const std::string& input1 = args["x"].tensor().id;
-            const std::string& input2 = args["y"].tensor().id;
-            if(verbose & 2) {
-                std::cout << opname << " " << output << " " << shape << " " << input1 << " " << input2 << std::endl;
-            }
-            if(getVariables) {
-                if(std::find(outputList.begin(), outputList.end(), output) == outputList.end()) {
-                    virtualList.push_back(output);
-                    virtualShape[output] = shape;
-                }
-                else {
-                    outputShape[output] = shape;
-                }
-            }
-            if(genCode) {
-                if(std::find(virtualList.begin(), virtualList.end(), output) != virtualList.end()) {
-                    ovxC << codeGenTensorCreate(output, shape, useVirtual, 4);
-                }
-                ovxC << "    { vx_node node = vxTensorAddNode(graph, " << virtualName(input1) << ", " << virtualName(input2) << ", VX_CONVERT_POLICY_SATURATE, " << output << ");" << std::endl;
-                ovxC << "      ERROR_CHECK_STATUS(vxReleaseNode(&node));" << std::endl;
-                ovxC << "    }" << std::endl;
-            }
-        }
-        else if(opname == "softmax") {
-            const std::string& output = args["y"].tensor().id;
-            const nnef::Shape& shape = shapes[output];
-            const std::string& input = args["x"].tensor().id;
-            std::vector<int>&& axes = getExtentArray(args["axes"]);
-            if(verbose & 2) {
-                std::cout << opname << " " << output << " " << shape << " " << input << " " << args["axes"] << std::endl;
-            }
-            if(axes.size() != 1 || axes[0] != 1) {
-                std::cout << "ERROR: " << opname << " with " << args["axes"] << " is *** not yet supported ***" << std::endl;
-                exit(1);
-            }
-            if(getVariables) {
-                if(std::find(outputList.begin(), outputList.end(), output) == outputList.end()) {
-                    virtualList.push_back(output);
-                    virtualShape[output] = shape;
-                }
-                else {
-                    outputShape[output] = shape;
-                }
-            }
-            if(genCode) {
-                if(std::find(virtualList.begin(), virtualList.end(), output) != virtualList.end()) {
-                    ovxC << codeGenTensorCreate(output, shape, useVirtual, 4);
-                }
-                ovxC << "    { vx_node node = vxSoftmaxLayer(graph, " << virtualName(input) << ", " << output << ");" << std::endl;
-                ovxC << "      ERROR_CHECK_STATUS(vxReleaseNode(&node));" << std::endl;
-                ovxC << "    }" << std::endl;
-            }
-        }
-        else if(opname == "sum_reduce") {
-            const std::string& output = args["output"].tensor().id;
-            const nnef::Shape& shape = shapes[output];
-            const std::string& input = args["input"].tensor().id;
-            const auto& axes = args["axes"];
-            const bool normalize = args["normalize"].logical();
-            if(verbose & 2) {
-                std::cout << opname << " " << output << " " << shape << " " << input << " " << axes << " " << normalize << std::endl;
-            }
-            if(getVariables) {
-                if(std::find(outputList.begin(), outputList.end(), output) == outputList.end()) {
-                    virtualList.push_back(output);
-                    virtualShape[output] = shape;
-                }
-                else {
-                    outputShape[output] = shape;
-                }
-            }
-            if(genCode) {
-                if(std::find(virtualList.begin(), virtualList.end(), output) != virtualList.end()) {
-                    ovxC << codeGenTensorCreate(output, shape, useVirtual, 4);
-                }
-                std::cout << opname << " *** not yet supported ***" << std::endl;
-                exit(1);
-            }
-        }
-        else if(opname == "mean_reduce") {
-            const std::string& output = args["output"].tensor().id;
-            const nnef::Shape& shape = shapes[output];
-            const std::string& input = args["input"].tensor().id;
-            const auto& axes = args["axes"];
-            if(verbose & 2) {
-                std::cout << opname << " " << output << " " << shape << " " << input << " " << axes << std::endl;
-            }
-            if(getVariables) {
-                if(std::find(outputList.begin(), outputList.end(), output) == outputList.end()) {
-                    virtualList.push_back(output);
-                    virtualShape[output] = shape;
-                }
-                else {
-                    outputShape[output] = shape;
-                }
-            }
-            if(genCode) {
-                if(std::find(virtualList.begin(), virtualList.end(), output) != virtualList.end()) {
-                    ovxC << codeGenTensorCreate(output, shape, useVirtual, 4);
-                }
-                std::cout << opname << " *** not yet supported ***" << std::endl;
-                exit(1);
-            }
-        }
-        else {
-            std::cout << opname << " *** not yet supported ***" << std::endl;
-            exit(1);
-        }
-    }
-
-    void codeGenMergeVariables()
-    {
-        auto getTensorOrScalar = [] (const nnef::Value& v) -> std::string {
-            std::string value = "0";
-            if(v) {
-                if(v.kind() == nnef::Value::Tensor) {
-                    value = v.tensor().id;
-                }
-                else if(v.kind() == nnef::Value::Scalar) {
-                    value = std::to_string(v.scalar());
-                }
-            }
-            return value;
-        };
-
-        size_t prevPos = 0;
-        std::string prevOpName = "", prevOutput = "";
-        for(size_t pos = 0; pos < opsProto.size(); pos++) {
-            std::string opname = opsProto[pos].name();
-            if(prevOpName == "batch_normalization" && opname == "conv") {
-                // get "batch_normalization" variables
-                const nnef::Dictionary<nnef::Value>& argsBN = opsValues[prevPos];
-                const nnef::Dictionary<nnef::Shape>& shapesBN = opsShapes[prevPos];
-                const std::string& inputBN = argsBN["input"].tensor().id;
-                const std::string& mean = argsBN["mean"].tensor().id;
-                const std::string& variance = argsBN["variance"].tensor().id;
-                std::string scale = getTensorOrScalar(argsBN["scale"]);
-                std::string offset = getTensorOrScalar(argsBN["offset"]);
-                const float epsilon = argsBN["epsilon"].scalar();
-                const nnef::Shape& shapeMean = shapesBN[mean];
-                // get "conv" variables
-                const nnef::Dictionary<nnef::Value>& argsConv = opsValues[pos];
-                const nnef::Dictionary<nnef::Shape>& shapesConv = opsShapes[pos];
-                const std::string& outputConv = argsConv["output"].tensor().id;
-                const std::string& filter = argsConv["filter"].tensor().id;
-                const std::string& bias = getTensorOrScalar(argsConv["bias"]);
-                const nnef::Shape& shapeFilter = shapesConv[filter];
-                // get filter and mean dimensions
-                size_t filterDimsCount = shapeFilter.rank(), meanDimsCount = shapeMean.rank();
-                std::vector<size_t> filterDims, meanDims;
-                getTensorDims(shapeFilter, filterDims, filterDimsCount);
-                getTensorDims(shapeMean, meanDims, meanDimsCount);
-                // check validity of dimensions
-                size_t K = (filterDimsCount == 4) ? filterDims[3] : filterDims[1];
-                size_t N = (filterDimsCount == 4) ? (filterDims[0] * filterDims[1] * filterDims[2]) : filterDims[0];
-                if((filterDimsCount == 4 || filterDimsCount == 2) && meanDimsCount == 2 && K == meanDims[0]) {
-                    // fuse batch_normalization variables into conv variables
-                    std::tuple<size_t, char *> filterBinary = variableBinary[filter];
-                    std::tuple<size_t, char *> meanBinary = variableBinary[mean];
-                    std::tuple<size_t, char *> varianceBinary = variableBinary[variance];
-                    float * filterBuf = (float *)std::get<1>(filterBinary);
-                    float * biasBuf = nullptr;
-                    float * meanBuf = (float *)std::get<1>(meanBinary);
-                    float * varianceBuf = (float *)std::get<1>(varianceBinary);
-                    float * scaleBuf = nullptr;
-                    float * offsetBuf = nullptr;
-                    if(bias[0] != '0') {
-                        std::tuple<size_t, char *> biasBinary = variableBinary[bias];
-                        biasBuf = (float *)std::get<1>(biasBinary);
-                    }
-                    else if(convNewBiasName.find(outputConv) != convNewBiasName.end()) {
-                        std::tuple<size_t, char *> biasBinary = variableBinary[convNewBiasName[outputConv]];
-                        biasBuf = (float *)std::get<1>(biasBinary);
-                    }
-                    else {
-                        size_t size = K * sizeof(float);
-                        char * data = new char [size];
-                        biasBuf = (float *)data;
-                        for(size_t i = 0; i < K; i++) {
-                            biasBuf[i] = 0;
-                        }
-                        std::string name = filter + "__new_bias";
-                        std::tuple<size_t, char *> binary(size, data);
-                        variableBinary[name] = binary;
-                        convNewBiasName[outputConv] = name;
-                        variableList.push_back(name);
-                        variableMerged[name] = false;
-                        nnef::Shape shape(1);
-                        shape[0] = K;
-                        shape[1] = 1;
-                        variableShape[name] = shape;
-                        variableRequiredDims[name] = 2;
-                    }
-                    if(scale[0] != '1') {
-                        scaleBuf = (float *)std::get<1>(variableBinary[scale]);
-                    }
-                    if(offset[0] != '0') {
-                        offsetBuf = (float *)std::get<1>(variableBinary[offset]);
-                    }
-                    for(size_t k = 0; k < K; k++) {
-                        double mk = 1.0 / sqrt((double)varianceBuf[k] + epsilon);
-                        double ck = -meanBuf[k] * mk;
-                        if(scaleBuf) {
-                            mk *= scaleBuf[k];
-                            ck *= scaleBuf[k];
-                        }
-                        if(offsetBuf) {
-                            ck += offsetBuf[k];
-                        }
-                        float * W = &filterBuf[k*N];
-                        double Wsum = 0;
-                        for(size_t j = 0; j < N; j++) {
-                            Wsum += W[j];
-                            W[j] = (float)(W[j] * mk);
-                        }
-                        if(biasBuf) {
-                            biasBuf[k] = (float)(Wsum * ck + biasBuf[k]);
-                        }
-                    }
-                    // mark that batch_normalization is disabled and rename output as input
-                    operationRemoved[prevPos] = true;
-                    virtualRename[argsConv["input"].tensor().id] = inputBN;
-                    // mark the merged variables
-                    variableMerged[mean] = true;
-                    variableMerged[variance] = true;
-                    if(scaleBuf) variableMerged[scale] = true;
-                    if(offsetBuf) variableMerged[offset] = true;
-                }
-                // use conv as previous layer
-                prevPos = pos;
-                prevOpName = opname;
-                prevOutput = argsConv["output"].tensor().id;
-            }
-            else if(prevOpName == "conv" && opname == "batch_normalization") {
-                // get "conv" variables
-                const nnef::Dictionary<nnef::Value>& argsConv = opsValues[prevPos];
-                const nnef::Dictionary<nnef::Shape>& shapesConv = opsShapes[prevPos];
-                const std::string& outputConv = argsConv["output"].tensor().id;
-                const std::string& filter = argsConv["filter"].tensor().id;
-                const std::string& bias = getTensorOrScalar(argsConv["bias"]);
-                const nnef::Shape& shapeFilter = shapesConv[filter];
-                // get "batch_normalization" variables
-                const nnef::Dictionary<nnef::Value>& argsBN = opsValues[pos];
-                const nnef::Dictionary<nnef::Shape>& shapesBN = opsShapes[pos];
-                const std::string& mean = argsBN["mean"].tensor().id;
-                const std::string& variance = argsBN["variance"].tensor().id;
-                std::string scale = getTensorOrScalar(argsBN["scale"]);
-                std::string offset = getTensorOrScalar(argsBN["offset"]);
-                const float epsilon = argsBN["epsilon"].scalar();
-                const nnef::Shape& shapeMean = shapesBN[mean];
-                // get filter and mean dimensions
-                size_t filterDimsCount = shapeFilter.rank(), meanDimsCount = shapeMean.rank();
-                std::vector<size_t> filterDims, meanDims;
-                getTensorDims(shapeFilter, filterDims, filterDimsCount);
-                getTensorDims(shapeMean, meanDims, meanDimsCount);
-                // check validity of dimensions
-                size_t K = (filterDimsCount == 4) ? filterDims[3] : filterDims[1];
-                size_t N = (filterDimsCount == 4) ? (filterDims[0] * filterDims[1] * filterDims[2]) : filterDims[0];
-                if((filterDimsCount == 4 || filterDimsCount == 2) && meanDimsCount == 2 && K == meanDims[0]) {
-                    // fuse batch_normalization variables into conv variables
-                    std::tuple<size_t, char *> filterBinary = variableBinary[filter];
-                    std::tuple<size_t, char *> meanBinary = variableBinary[mean];
-                    std::tuple<size_t, char *> varianceBinary = variableBinary[variance];
-                    float * filterBuf = (float *)std::get<1>(filterBinary);
-                    float * biasBuf = nullptr;
-                    float * meanBuf = (float *)std::get<1>(meanBinary);
-                    float * varianceBuf = (float *)std::get<1>(varianceBinary);
-                    float * scaleBuf = nullptr;
-                    float * offsetBuf = nullptr;
-                    if(bias[0] != '0') {
-                        std::tuple<size_t, char *> biasBinary = variableBinary[bias];
-                        biasBuf = (float *)std::get<1>(biasBinary);
-                    }
-                    else if(convNewBiasName.find(outputConv) != convNewBiasName.end()) {
-                        std::tuple<size_t, char *> biasBinary = variableBinary[convNewBiasName[outputConv]];
-                        biasBuf = (float *)std::get<1>(biasBinary);
-                    }
-                    else {
-                        size_t size = K * sizeof(float);
-                        char * data = new char [size];
-                        biasBuf = (float *)data;
-                        for(size_t i = 0; i < K; i++) {
-                            biasBuf[i] = 0;
-                        }
-                        std::string name = filter + "__new_bias";
-                        std::tuple<size_t, char *> binary(size, data);
-                        variableBinary[name] = binary;
-                        convNewBiasName[outputConv] = name;
-                        variableList.push_back(name);
-                        variableMerged[name] = false;
-                        nnef::Shape shape(1);
-                        shape[0] = K;
-                        shape[1] = 1;
-                        variableShape[name] = shape;
-                        variableRequiredDims[name] = 2;
-                    }
-                    if(scale[0] != '1') {
-                        scaleBuf = (float *)std::get<1>(variableBinary[scale]);
-                    }
-                    if(offset[0] != '0') {
-                        offsetBuf = (float *)std::get<1>(variableBinary[offset]);
-                    }
-                    for(size_t k = 0; k < K; k++) {
-                        double mk = 1.0 / sqrt((double)varianceBuf[k] + epsilon);
-                        double ck = -meanBuf[k] * mk;
-                        if(scaleBuf) {
-                            mk *= scaleBuf[k];
-                            ck *= scaleBuf[k];
-                        }
-                        if(offsetBuf) {
-                            ck += offsetBuf[k];
-                        }
-                        float * W = &filterBuf[k*N];
-                        for(size_t j = 0; j < N; j++) {
-                            W[j] = (float)(W[j] * mk);
-                        }
-                        if(biasBuf) {
-                            biasBuf[k] = (float)(mk * biasBuf[k] + ck);
-                        }
-                    }
-                    // mark that batch_normalization is disabled, rename output as input, and use conv as previous layer
-                    operationRemoved[pos] = true;
-                    virtualRename[argsBN["output"].tensor().id] = outputConv;
-                    prevOutput = argsBN["output"].tensor().id;
-                    // mark the merged variables
-                    variableMerged[mean] = true;
-                    variableMerged[variance] = true;
-                    if(scaleBuf) variableMerged[scale] = true;
-                    if(offsetBuf) variableMerged[offset] = true;
-                }
-                else {
-                    // use batch_normalization as previous layer
-                    prevPos = pos;
-                    prevOpName = opname;
-                    prevOutput = argsBN["output"].tensor().id;
-                }
-            }
-            else if((prevOpName == "mul" || prevOpName == "add") && opname == "conv") {
-                // get "mul" or "add" variables
-                const nnef::Dictionary<nnef::Value>& argsOP = opsValues[prevPos];
-                const nnef::Dictionary<nnef::Shape>& shapesOP = opsShapes[prevPos];
-                const std::string& x = argsOP["x"].tensor().id;
-                const std::string& y = argsOP["y"].tensor().id;
-                std::string var, inputBN;
-                nnef::Shape shapeVar;
-                if(std::find(variableList.begin(), variableList.end(), x) != variableList.end()) {
-                    inputBN = y;
-                    var = x;
-                    shapeVar = shapesOP[x];
-                }
-                else if(std::find(variableList.begin(), variableList.end(), y) != variableList.end()) {
-                    inputBN = x;
-                    var = y;
-                    shapeVar = shapesOP[y];
-                }
-                // get "conv" variables
-                const nnef::Dictionary<nnef::Value>& argsConv = opsValues[pos];
-                const nnef::Dictionary<nnef::Shape>& shapesConv = opsShapes[pos];
-                const std::string& outputConv = argsConv["output"].tensor().id;
-                const std::string& filter = argsConv["filter"].tensor().id;
-                const std::string& bias = getTensorOrScalar(argsConv["bias"]);
-                const nnef::Shape& shapeFilter = shapesConv[filter];
-                // get var dimensions
-                size_t filterDimsCount = shapeFilter.rank(), varDimsCount = 0;
-                std::vector<size_t> filterDims, varDims;
-                getTensorDims(shapeFilter, filterDims, filterDimsCount);
-                if(var.length() > 0) {
-                    varDimsCount = shapeVar.rank();
-                    getTensorDims(shapeVar, varDims, varDimsCount);
-                }
-                // check validity of dimensions
-                size_t K = (filterDimsCount == 4) ? filterDims[3] : filterDims[1];
-                size_t N = (filterDimsCount == 4) ? (filterDims[0] * filterDims[1] * filterDims[2]) : filterDims[0];
-                if((filterDimsCount == 4 || filterDimsCount == 2) && varDimsCount == 2 && K == varDims[0]) {
-                    // fuse var into conv variables
-                    std::tuple<size_t, char *> filterBinary = variableBinary[filter];
-                    std::tuple<size_t, char *> biasBinary = variableBinary[bias];
-                    std::tuple<size_t, char *> varBinary = variableBinary[var];
-                    float * filterBuf = (float *)std::get<1>(filterBinary);
-                    float * biasBuf = nullptr;
-                    float * varBuf = (float *)std::get<1>(varBinary);
-                    if(bias[0] != '0') {
-                        std::tuple<size_t, char *> biasBinary = variableBinary[bias];
-                        biasBuf = (float *)std::get<1>(biasBinary);
-                    }
-                    else if(convNewBiasName.find(outputConv) != convNewBiasName.end()) {
-                        std::tuple<size_t, char *> biasBinary = variableBinary[convNewBiasName[outputConv]];
-                        biasBuf = (float *)std::get<1>(biasBinary);
-                    }
-                    else {
-                        size_t size = K * sizeof(float);
-                        char * data = new char [size];
-                        biasBuf = (float *)data;
-                        for(size_t i = 0; i < K; i++) {
-                            biasBuf[i] = 0;
-                        }
-                        std::string name = filter + "__new_bias";
-                        std::tuple<size_t, char *> binary(size, data);
-                        variableBinary[name] = binary;
-                        convNewBiasName[outputConv] = name;
-                        variableList.push_back(name);
-                        variableMerged[name] = false;
-                        nnef::Shape shape(1);
-                        shape[0] = K;
-                        shape[1] = 1;
-                        variableShape[name] = shape;
-                        variableRequiredDims[name] = 2;
-                    }
-                    if(prevOpName == "mul") {
-                        for(size_t k = 0; k < K; k++) {
-                            double mk = varBuf[k];
-                            size_t N = filterDims[0] * filterDims[1] * filterDims[2];
-                            float * W = &filterBuf[k*N];
-                            for(size_t j = 0; j < N; j++) {
-                                W[j] = (float)(W[j] * mk);
-                            }
-                        }
-                    }
-                    else {
-                        for(size_t k = 0; k < K; k++) {
-                            double ck = varBuf[k];
-                            size_t N = filterDims[0] * filterDims[1] * filterDims[2];
-                            float * W = &filterBuf[k*N];
-                            double Wsum = 0;
-                            for(size_t j = 0; j < N; j++) {
-                                Wsum += W[j];
-                            }
-                            biasBuf[k] = (float)(ck * Wsum + biasBuf[k]);
-                        }
-                    }
-                    // mark that OP is disabled, rename output as input, and use conv as previous layer
-                    operationRemoved[prevPos] = true;
-                    virtualRename[argsConv["input"].tensor().id] = inputBN;
-                    prevOutput = argsConv["output"].tensor().id;
-                    // mark the merged variables
-                    variableMerged[var] = true;
-                }
-                else {
-                    // use conv as previous layer
-                    prevPos = pos;
-                    prevOpName = opname;
-                    prevOutput = argsConv["output"].tensor().id;
-                }
-            }
-            else if(prevOpName == "conv" && (opname == "mul" || opname == "add")) {
-                // get "conv" variables
-                const nnef::Dictionary<nnef::Value>& argsConv = opsValues[prevPos];
-                const nnef::Dictionary<nnef::Shape>& shapesConv = opsShapes[prevPos];
-                const std::string& outputConv = argsConv["output"].tensor().id;
-                const std::string& filter = argsConv["filter"].tensor().id;
-                const std::string& bias = getTensorOrScalar(argsConv["bias"]);
-                const nnef::Shape& shapeFilter = shapesConv[filter];
-                // get "mul" or "add" variables
-                const nnef::Dictionary<nnef::Value>& argsOP = opsValues[pos];
-                const nnef::Dictionary<nnef::Shape>& shapesOP = opsShapes[pos];
-                const std::string& x = argsOP["x"].tensor().id;
-                const std::string& y = argsOP["y"].tensor().id;
-                std::string var;
-                nnef::Shape shapeVar;
-                if(std::find(variableList.begin(), variableList.end(), x) != variableList.end()) {
-                    var = x;
-                    shapeVar = shapesOP[x];
-                }
-                else if(std::find(variableList.begin(), variableList.end(), y) != variableList.end()) {
-                    var = y;
-                    shapeVar = shapesOP[y];
-                }
-                // get var dimensions
-                size_t filterDimsCount = shapeFilter.rank(), varDimsCount = 0;
-                std::vector<size_t> filterDims, varDims;
-                getTensorDims(shapeFilter, filterDims, filterDimsCount);
-                if(var.length() > 0) {
-                    varDimsCount = shapeVar.rank();
-                    getTensorDims(shapeVar, varDims, varDimsCount);
-                }
-                // check validity of dimensions
-                size_t K = (filterDimsCount == 4) ? filterDims[3] : filterDims[1];
-                size_t N = (filterDimsCount == 4) ? (filterDims[0] * filterDims[1] * filterDims[2]) : filterDims[0];
-                if((filterDimsCount == 4 || filterDimsCount == 2) && varDimsCount == 2 && K == varDims[0]) {
-                    // fuse var into conv variables
-                    std::tuple<size_t, char *> filterBinary = variableBinary[filter];
-                    std::tuple<size_t, char *> biasBinary = variableBinary[bias];
-                    std::tuple<size_t, char *> varBinary = variableBinary[var];
-                    float * filterBuf = (float *)std::get<1>(filterBinary);
-                    float * biasBuf = nullptr;
-                    float * varBuf = (float *)std::get<1>(varBinary);
-                    if(bias[0] != '0') {
-                        std::tuple<size_t, char *> biasBinary = variableBinary[bias];
-                        biasBuf = (float *)std::get<1>(biasBinary);
-                    }
-                    else if(convNewBiasName.find(outputConv) != convNewBiasName.end()) {
-                        std::tuple<size_t, char *> biasBinary = variableBinary[convNewBiasName[outputConv]];
-                        biasBuf = (float *)std::get<1>(biasBinary);
-                    }
-                    else {
-                        size_t size = K * sizeof(float);
-                        char * data = new char [size];
-                        biasBuf = (float *)data;
-                        for(size_t i = 0; i < K; i++) {
-                            biasBuf[i] = 0;
-                        }
-                        std::string name = filter + "__new_bias";
-                        std::tuple<size_t, char *> binary(size, data);
-                        variableBinary[name] = binary;
-                        convNewBiasName[outputConv] = name;
-                        variableList.push_back(name);
-                        variableMerged[name] = false;
-                        nnef::Shape shape(1);
-                        shape[0] = K;
-                        shape[1] = 1;
-                        variableShape[name] = shape;
-                        variableRequiredDims[name] = 2;
-                    }
-                    if(opname == "mul") {
-                        for(size_t k = 0; k < K; k++) {
-                            double mk = varBuf[k];
-                            float * W = &filterBuf[k*N];
-                            for(size_t j = 0; j < N; j++) {
-                                W[j] = (float)(W[j] * mk);
-                            }
-                            if(biasBuf) {
-                                biasBuf[k] = (float)(mk * biasBuf[k]);
-                            }
-                        }
-                    }
-                    else {
-                        for(size_t k = 0; k < K; k++) {
-                            float ck = varBuf[k];
-                            biasBuf[k] = biasBuf[k] + ck;
-                        }
-                    }
-                    // mark that OP is disabled, rename output as input, and use conv as previous layer
-                    operationRemoved[pos] = true;
-                    virtualRename[argsOP["z"].tensor().id] = outputConv;
-                    prevOutput = argsOP["z"].tensor().id;
-                    // mark the merged variables
-                    variableMerged[var] = true;
-                }
-                else {
-                    // use OP as previous layer
-                    prevPos = pos;
-                    prevOpName =
opname; - prevOutput = argsOP["z"].tensor().id; - } - } - else if(opname == "max_pool" || opname == "avg_pool") { - const nnef::Dictionary& args = opsValues[pos]; - const std::string& input = args["input"].tensor().id; - if(input != prevOutput || prevOpName != "conv") { - prevPos = pos; - prevOpName = opname; - } - prevOutput = args["output"].tensor().id; - } - else if(opname == "conv" || opname == "batch_normalization") { - const nnef::Dictionary& args = opsValues[pos]; - const std::string& input = args["input"].tensor().id; - prevPos = pos; - prevOpName = opname; - prevOutput = args["output"].tensor().id; - } - else if(opname == "add" || opname == "mul") { - const nnef::Dictionary& args = opsValues[pos]; - const std::string& input1 = args["x"].tensor().id; - const std::string& input2 = args["y"].tensor().id; - prevPos = pos; - prevOpName = opname; - prevOutput = args["z"].tensor().id; - } - else { - prevPos = 0; - prevOpName = ""; - prevOutput = ""; - } - } - } - -protected: - //// - // translator callback implementations - // - virtual void beginGraph( const nnef::Prototype& proto ) - { - // show NNEF syntax - if(verbose & 1) { - std::cout << "graph " << proto.name() << "( "; - for ( size_t i = 0; i < proto.paramCount(); ++i ) { - auto& param = proto.param(i); - if ( i ) std::cout << ", "; - std::cout << param.name(); - } - std::cout << " ) -> ( "; - for ( size_t i = 0; i < proto.resultCount(); ++i ) { - auto& result = proto.result(i); - if ( i ) std::cout << ", "; - std::cout << result.name(); - } - std::cout << " )" << std::endl << '{' << std::endl; - } - - //// - // get input and output parameter list - // - for (size_t i = 0; i < proto.paramCount(); ++i) { - inputList.push_back(proto.param(i).name()); - } - for (size_t i = 0; i < proto.resultCount(); ++i) { - outputList.push_back(proto.result(i).name()); - } - - //// - // generate OpenVX C code preamble - // - openvxFilenameC = openvxFolder + "/annmodule.cpp"; - ovxC.open(openvxFilenameC); - if(!ovxC) { - 
printf("ERROR: unable to create: %s\n", openvxFilenameC.c_str()); - exit(1); - } - } - - virtual void endGraph( const nnef::Prototype& proto ) - { - // show NNEF syntax - if(verbose & 1) { - std::cout << '}' << std::endl; - } - - //// - // generate OpenVX C code preamble - // - ovxC << "#include \"annmodule.h\"" << std::endl - << "#include " << std::endl - << "#include " << std::endl - << "#include " << std::endl - << "#include " << std::endl - << std::endl - << "#define ERROR_CHECK_OBJECT(obj) { vx_status status = vxGetStatus((vx_reference)(obj)); if(status != VX_SUCCESS) { vxAddLogEntry((vx_reference)context, status , \"ERROR: failed with status = (%d) at \" __FILE__ \"#%d\\n\", status, __LINE__); return status; } }" << std::endl - << "#define ERROR_CHECK_STATUS(call) { vx_status status = (call); if(status != VX_SUCCESS) { vxAddLogEntry((vx_reference)context, status, \"ERROR: failed with status = (%d) at \" __FILE__ \"#%d\\n\", status, __LINE__); return status; } }" << std::endl - << std::endl - << "static vx_status initializeTensor(vx_context context, vx_tensor tensor, FILE * fp, const char * binaryFilename)" << std::endl - << "{" << std::endl - << " vx_enum data_type = VX_TYPE_FLOAT32;" << std::endl - << " vx_size num_of_dims = 4, dims[4] = { 1, 1, 1, 1 }, stride[4];" << std::endl - << " ERROR_CHECK_STATUS(vxQueryTensor(tensor, VX_TENSOR_DATA_TYPE, &data_type, sizeof(vx_enum)));" << std::endl - << " ERROR_CHECK_STATUS(vxQueryTensor(tensor, VX_TENSOR_NUMBER_OF_DIMS, &num_of_dims, sizeof(vx_size)));" << std::endl - << " ERROR_CHECK_STATUS(vxQueryTensor(tensor, VX_TENSOR_DIMS, &dims, num_of_dims * sizeof(vx_size)));" << std::endl - << " vx_size itemsize = sizeof(float);" << std::endl - << " if(data_type == VX_TYPE_UINT8 || data_type == VX_TYPE_INT8) {" << std::endl - << " itemsize = sizeof(vx_uint8);" << std::endl - << " }" << std::endl - << " else if(data_type == VX_TYPE_UINT16 || data_type == VX_TYPE_INT16 || data_type == VX_TYPE_FLOAT16) {" << std::endl - << " 
itemsize = sizeof(vx_uint16);" << std::endl - << " }" << std::endl - << " vx_size count = dims[0] * dims[1] * dims[2] * dims[3];" << std::endl - << std::endl - << " vx_uint32 h[2] = { 0 };" << std::endl - << " fread(h, 1, sizeof(h), fp);" << std::endl - << " if(h[0] != 0x" << std::hex << VARIABLES_DATA_MAGIC << std::dec << " || (vx_size)h[1] != (count*itemsize)) {" << std::endl - << " vxAddLogEntry((vx_reference)tensor, VX_FAILURE, \"ERROR: invalid data (magic,size)=(0x%x,%d) in %s at byte position %d -- expected size is %ld\\n\", h[0], h[1], binaryFilename, ftell(fp)-sizeof(h), count*itemsize);" << std::endl - << " return VX_FAILURE;" << std::endl - << " }" << std::endl - << std::endl - << " vx_map_id map_id;" << std::endl - << " float * ptr;" << std::endl - << " ERROR_CHECK_STATUS(vxMapTensorPatch(tensor, num_of_dims, nullptr, nullptr, &map_id, stride, (void **)&ptr, VX_WRITE_ONLY, VX_MEMORY_TYPE_HOST));" << std::endl - << " vx_size n = fread(ptr, itemsize, count, fp);" << std::endl - << " if(n != count) {" << std::endl - << " vxAddLogEntry((vx_reference)tensor, VX_FAILURE, \"ERROR: expected char[%ld], but got char[%ld] in %s\\n\", count*itemsize, n*itemsize, binaryFilename);" << std::endl - << " return VX_FAILURE;" << std::endl - << " }" << std::endl - << " ERROR_CHECK_STATUS(vxUnmapTensorPatch(tensor, map_id));" << std::endl - << std::endl - << " return VX_SUCCESS;" << std::endl - << "}" << std::endl - << std::endl - << "vx_status annAddToGraph(vx_graph graph"; - for(auto& name : inputList) { - ovxC << ", vx_tensor " << name; - } - for(auto& name : outputList) { - ovxC << ", vx_tensor " << name; - } - ovxC << ", const char * binaryFilename)" << std::endl - << "{" << std::endl - << " vx_context context = vxGetContext((vx_reference)graph);" << std::endl - << " ERROR_CHECK_OBJECT(context);" << std::endl - << " ERROR_CHECK_STATUS(vxLoadKernels(context, \"vx_nn\"));" << std::endl; - - //// - // get variables - // - for(size_t i = 0; i < opsProto.size(); i++) { - 
codeGenOperation(i, true, false, verbose); - } - - //// - // get data - // - for(auto& name : variableList) { - unsigned int size = 0; - char * data = nullptr; - if(variableShape.find(name) != variableShape.end() && variableLabel.find(name) != variableLabel.end()) { - auto& shape = variableShape[name]; - auto& label = variableLabel[name]; - size = loadTensorFile(nnefFolder, label, shape, data); - } - if(size > 0 && data) { - std::tuple binary(size, data); - variableBinary[name] = binary; - } - else { - printf("ERROR: unable to load binary data for variable '%s'\n", name.c_str()); - exit(1); - } - } - - //// - // merge variables - // - codeGenMergeVariables(); - - //// - // create and initialize variables file - // - ovxC << std::endl; - ovxC << " // create variables" << std::endl; - for(auto& name : variableList) { - if(!variableMerged[name]) { - if(variableShape.find(name) != variableShape.end()) { - auto& shape = variableShape[name]; - int num_dims = 0; - auto it = variableRequiredDims.find(name); - if(it != variableRequiredDims.end()) { - num_dims = it->second; - } - ovxC << codeGenTensorCreate(name, shape, false, num_dims); - } - else { - printf("ERROR: something wrong with variable '%s': variableShape is missing\n", name.c_str()); - exit(1); - } - } - } - ovxC << std::endl - << " // initialize variables" << std::endl - << " FILE * fp__variables = fopen(binaryFilename, \"rb\");" << std::endl - << " if(!fp__variables) {" << std::endl - << " vxAddLogEntry((vx_reference)context, VX_FAILURE, \"ERROR: unable to open: %s\\n\", binaryFilename);" << std::endl - << " return VX_FAILURE;" << std::endl - << " }" << std::endl - << " { vx_uint32 magic = 0;" << std::endl - << " fread(&magic, 1, sizeof(magic), fp__variables);" << std::endl - << " if(magic != 0x" << std::hex << VARIABLES_FILE_MAGIC << std::dec << ") {" << std::endl - << " vxAddLogEntry((vx_reference)context, VX_FAILURE, \"ERROR: invalid file magic in %s\\n\", binaryFilename);" << std::endl - << " return 
VX_FAILURE;" << std::endl - << " }" << std::endl - << " }" << std::endl; - std::string variablesFilename = openvxFolder + "/weights.bin"; - FILE * fpVariables = fopen(variablesFilename.c_str(), "wb"); - if(!fpVariables) { - printf("ERROR: unable to create: %s\n", variablesFilename.c_str()); - exit(1); - } - unsigned int magic_file = VARIABLES_FILE_MAGIC; - unsigned int magic_data = VARIABLES_DATA_MAGIC; - fwrite(&magic_file, 1, sizeof(magic_file), fpVariables); - for(auto& name : variableList) { - if(!variableMerged[name]) { - if(variableShape.find(name) != variableShape.end()) { - auto& shape = variableShape[name]; - std::tuple binary = variableBinary[name]; - unsigned int size = std::get<0>(binary); - char * data = std::get<1>(binary); - if(size > 0 && data) { - fwrite(&magic_data, 1, sizeof(magic_data), fpVariables); - fwrite(&size, 1, sizeof(size), fpVariables); - fwrite(data, 1, size, fpVariables); - delete[] data; - std::tuple empty(0, nullptr); - variableBinary[name] = empty; - ovxC << " ERROR_CHECK_STATUS(initializeTensor(context, " << name << ", fp__variables, binaryFilename));" << std::endl; - } - else { - printf("ERROR: something wrong with variable '%s': variableBinary is not valid\n", name.c_str()); - exit(1); - } - } - else { - printf("ERROR: something wrong with variable '%s': variableShape is missing\n", name.c_str()); - exit(1); - } - } - } - unsigned int magic_eoff = VARIABLES_EOFF_MAGIC; - fwrite(&magic_eoff, 1, sizeof(magic_eoff), fpVariables); - fclose(fpVariables); - ovxC << " { vx_uint32 magic = 0;" << std::endl - << " fread(&magic, 1, sizeof(magic), fp__variables);" << std::endl - << " if(magic != 0x" << std::hex << VARIABLES_EOFF_MAGIC << std::dec << ") {" << std::endl - << " vxAddLogEntry((vx_reference)context, VX_FAILURE, \"ERROR: invalid eoff magic in %s\\n\", binaryFilename);" << std::endl - << " return VX_FAILURE;" << std::endl - << " }" << std::endl - << " fclose(fp__variables);" << std::endl - << " }" << std::endl; - std::cout << 
"OK: created '" << variablesFilename << "'" << std::endl; - - //// - // instantiate nodes in graph - // - ovxC << std::endl; - ovxC << " // create nodes in graph" << std::endl; - for(auto i = 0; i < opsProto.size(); i++) { - codeGenOperation(i, false, true, 0); - } - - //// - // generate clean-up code - // - ovxC << std::endl; - ovxC << " // release internal tensors" << std::endl; - for(auto& name : virtualList) { - if(virtualRename.find(name) == virtualRename.end()) { - ovxC << " ERROR_CHECK_STATUS(vxReleaseTensor(&" << name << "));" << std::endl; - } - } - for(auto& name : variableList) { - if(!variableMerged[name]) { - ovxC << " ERROR_CHECK_STATUS(vxReleaseTensor(&" << name << "));" << std::endl; - } - } - ovxC << std::endl; - ovxC << " return VX_SUCCESS;" << std::endl; - ovxC << "}" << std::endl; - ovxC.close(); - std::cout << "OK: created '" << openvxFilenameC << "'" << std::endl; - - //// - // generate OpenVX header file - // - openvxFilenameC = openvxFolder + "/annmodule.h"; - ovxC.open(openvxFilenameC); - if(!ovxC) { - printf("ERROR: unable to create: %s\n", openvxFilenameC.c_str()); - exit(1); - } - ovxC << "#ifndef included_file_annmodule_h" << std::endl - << "#define included_file_annmodule_h" << std::endl - << std::endl - << "#include " << std::endl - << std::endl; - ovxC << "////" << std::endl - << "// initialize graph neural network for inference" << std::endl; - for(auto& name : inputList) { - if(inputShape.find(name) != inputShape.end()) { - std::vector dims; - getTensorDims(inputShape[name], dims, 4); - ovxC << "// " << name << " -- dims[] = {"; - for(size_t i = 0; i < dims.size(); i++) { - ovxC << (i == 0 ? " " : ", ") << dims[i]; - } - ovxC << " } (input)" << std::endl; - } - } - for(auto& name : outputList) { - if(outputShape.find(name) != outputShape.end()) { - std::vector dims; - getTensorDims(outputShape[name], dims, 4); - ovxC << "// " << name << " -- dims[] = {"; - for(size_t i = 0; i < dims.size(); i++) { - ovxC << (i == 0 ? 
" " : ", ") << dims[i]; - } - ovxC << " } (output)" << std::endl; - } - } - ovxC << "//" << std::endl - << "vx_status annAddToGraph(vx_graph graph"; - for(auto& name : inputList) { - ovxC << ", vx_tensor " << name; - } - for(auto& name : outputList) { - ovxC << ", vx_tensor " << name; - } - ovxC << ", const char * binaryFilename);" << std::endl - << std::endl - << "#endif" << std::endl; - ovxC.close(); - std::cout << "OK: created '" << openvxFilenameC << "'" << std::endl; - - //// - // generate a simple test program - // - openvxFilenameC = openvxFolder + "/anntest.cpp"; - ovxC.open(openvxFilenameC); - if(!ovxC) { - printf("ERROR: unable to create: %s\n", openvxFilenameC.c_str()); - exit(1); - } - ovxC << "#include \"annmodule.h\"" << std::endl - << "#include " << std::endl - << "#include " << std::endl - << "#include " << std::endl - << "#include " << std::endl - << "#include " << std::endl - << "#include " << std::endl - << "#include " << std::endl - << "#include " << std::endl - << "" << std::endl - << "#if ENABLE_OPENCV" << std::endl - << "#include " << std::endl - << "using namespace cv; " << std::endl - << "#endif" << std::endl - << "" << std::endl - << "#define ERROR_CHECK_STATUS(call) { vx_status status = (call); if(status != VX_SUCCESS) { printf(\"ERROR: failed with status = (%d) at \" __FILE__ \"#%d\", status, __LINE__); return -1; } }" << std::endl - << "" << std::endl - << "static void VX_CALLBACK log_callback(vx_context context, vx_reference ref, vx_status status, const vx_char string[])" << std::endl - << "{" << std::endl - << " size_t len = strlen(string);" << std::endl - << " if (len > 0) {" << std::endl - << " printf(\"%s\", string);" << std::endl - << " if (string[len - 1] != '\\n')" << std::endl - << " printf(\"\\n\");" << std::endl - << " fflush(stdout);" << std::endl - << " }" << std::endl - << "}" << std::endl - << "" << std::endl - << "inline int64_t clockCounter()" << std::endl - << "{" << std::endl - << " return 
std::chrono::high_resolution_clock::now().time_since_epoch().count();" << std::endl - << "}" << std::endl - << "" << std::endl - << "inline int64_t clockFrequency()" << std::endl - << "{" << std::endl - << " return std::chrono::high_resolution_clock::period::den / std::chrono::high_resolution_clock::period::num;" << std::endl - << "}" << std::endl - << "" << std::endl - << "static vx_status copyTensor(vx_tensor tensor, std::string fileName, vx_enum usage = VX_WRITE_ONLY)" << std::endl - << "{" << std::endl - << " vx_enum data_type = VX_TYPE_FLOAT32;" << std::endl - << " vx_size num_of_dims = 4, dims[4] = { 1, 1, 1, 1 }, stride[4];" << std::endl - << " vxQueryTensor(tensor, VX_TENSOR_DATA_TYPE, &data_type, sizeof(data_type));" << std::endl - << " vxQueryTensor(tensor, VX_TENSOR_NUMBER_OF_DIMS, &num_of_dims, sizeof(num_of_dims));" << std::endl - << " vxQueryTensor(tensor, VX_TENSOR_DIMS, &dims, sizeof(dims[0])*num_of_dims);" << std::endl - << " vx_size itemsize = sizeof(float);" << std::endl - << " if(data_type == VX_TYPE_UINT8 || data_type == VX_TYPE_INT8) {" << std::endl - << " itemsize = sizeof(vx_uint8);" << std::endl - << " }" << std::endl - << " else if(data_type == VX_TYPE_UINT16 || data_type == VX_TYPE_INT16 || data_type == VX_TYPE_FLOAT16) {" << std::endl - << " itemsize = sizeof(vx_uint16);" << std::endl - << " }" << std::endl - << " vx_size count = dims[0] * dims[1] * dims[2] * dims[3];" << std::endl - << " vx_map_id map_id;" << std::endl - << " float * ptr;" << std::endl - << " vx_status status = vxMapTensorPatch(tensor, num_of_dims, nullptr, nullptr, &map_id, stride, (void **)&ptr, usage, VX_MEMORY_TYPE_HOST);" << std::endl - << " if(status) {" << std::endl - << " std::cerr << \"ERROR: vxMapTensorPatch() failed for \" << fileName << std::endl;" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " if(usage == VX_WRITE_ONLY) {" << std::endl - << "#if ENABLE_OPENCV" << std::endl - << " if(dims[3] == 1 && dims[2] == 3 && fileName.size() 
> 4 && (fileName.substr(fileName.size()-4, 4) == \".png\" || fileName.substr(fileName.size()-4, 4) == \".jpg\"))" << std::endl - << " {" << std::endl - << " Mat img = imread(fileName.c_str(), CV_LOAD_IMAGE_COLOR);" << std::endl - << " if(!img.data || img.rows != dims[1] || img.cols != dims[0]) {" << std::endl - << " std::cerr << \"ERROR: invalid image or dimensions in \" << fileName << std::endl;" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " unsigned char * src = img.data;" << std::endl - << " for(vx_size c = 0; c < 3; c++) {" << std::endl - << " for(vx_size y = 0; y < dims[1]; y++) {" << std::endl - << " for(vx_size x = 0; x < dims[0]; x++) {" << std::endl - << " ptr[(c*stride[2]+y*stride[1]+x*stride[0])>>2] = src[y*dims[0]*3+x*3+c];" << std::endl - << " }" << std::endl - << " }" << std::endl - << " }" << std::endl - << " }" << std::endl - << " else" << std::endl - << "#endif" << std::endl - << " {" << std::endl - << " FILE * fp = fopen(fileName.c_str(), \"rb\");" << std::endl - << " if(!fp) {" << std::endl - << " std::cerr << \"ERROR: unable to open: \" << fileName << std::endl;" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " vx_size n = fread(ptr, itemsize, count, fp);" << std::endl - << " fclose(fp);" << std::endl - << " if(n != count) {" << std::endl - << " std::cerr << \"ERROR: expected char[\" << count*itemsize << \"], but got char[\" << n*itemsize << \"] in \" << fileName << std::endl;" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " }" << std::endl - << " }" << std::endl - << " else {" << std::endl - << " FILE * fp = fopen(fileName.c_str(), \"wb\");" << std::endl - << " if(!fp) {" << std::endl - << " std::cerr << \"ERROR: unable to open: \" << fileName << std::endl;" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " fwrite(ptr, itemsize, count, fp);" << std::endl - << " fclose(fp);" << std::endl - << " }" << std::endl - << " status = 
vxUnmapTensorPatch(tensor, map_id);" << std::endl - << " if(status) {" << std::endl - << " std::cerr << \"ERROR: vxUnmapTensorPatch() failed for \" << fileName << std::endl;" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " return 0;" << std::endl - << "}" << std::endl - << "" << std::endl - << "int main(int argc, const char ** argv)" << std::endl - << "{" << std::endl - << " // check command-line usage" << std::endl - << " if(argc < 2) {" << std::endl - << " printf(\"Usage: anntest [...]\\n\");" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " const char * binaryFilename = argv[1];" << std::endl - << " argc -= 2;" << std::endl - << " argv += 2;" << std::endl - << "" << std::endl - << " // create context, input, output, and graph" << std::endl - << " vxRegisterLogCallback(NULL, log_callback, vx_false_e);" << std::endl - << " vx_context context = vxCreateContext();" << std::endl - << " if(vxGetStatus((vx_reference)context)) {" << std::endl - << " printf(\"ERROR: vxCreateContext() failed\\n\");" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " vxRegisterLogCallback(context, log_callback, vx_false_e);" << std::endl - << "" << std::endl - << " // create input tensors and initialize" << std::endl - ; - for(auto& name : inputList) { - std::vector dims; - getTensorDims(inputShape[name], dims, 4); - ovxC << " vx_size " << name << "_dims[" << dims.size() << "] = {"; - for(size_t i = 0; i < dims.size(); i++) { - ovxC << (i == 0 ? 
" " : ", ") << dims[i]; - } - ovxC << " };" << std::endl - << " vx_tensor " << name << " = vxCreateTensor(context, " << dims.size() << ", " << name << "_dims, VX_TYPE_FLOAT32, 0);" << std::endl - << " if(vxGetStatus((vx_reference)" << name << ")) {" << std::endl - << " printf(\"ERROR: vxCreateTensor() failed for " << name << "\\n\");" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " if(*argv) {" << std::endl - << " if(strcmp(*argv, \"-\") != 0) {" << std::endl - << " if(copyTensor(" << name << ", *argv, VX_WRITE_ONLY) < 0) {" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " printf(\"OK: read tensor '" << name << "' from %s\\n\", *argv);" << std::endl - << " }" << std::endl - << " argv++;" << std::endl - << " }" << std::endl - ; - } - ovxC << " // create output tensors" << std::endl; - for(auto& name : outputList) { - std::vector dims; - getTensorDims(outputShape[name], dims, 4); - ovxC << " vx_size " << name << "_dims[" << dims.size() << "] = {"; - for(size_t i = 0; i < dims.size(); i++) { - ovxC << (i == 0 ? " " : ", ") << dims[i]; - } - ovxC << " };" << std::endl - << " vx_tensor " << name << " = vxCreateTensor(context, " << dims.size() << ", " << name << "_dims, VX_TYPE_FLOAT32, 0);" << std::endl - << " if(vxGetStatus((vx_reference)" << name << ")) {" << std::endl - << " printf(\"ERROR: vxCreateTensor() failed for " << name << "\\n\");" << std::endl - << " return -1;" << std::endl - << " }" << std::endl; - } - ovxC << "" << std::endl - << " // build graph using annmodule" << std::endl - << " vx_status status;" << std::endl - << " int64_t freq = clockFrequency(), t0, t1;" << std::endl - << " t0 = clockCounter();" << std::endl - << " vx_graph graph = vxCreateGraph(context);" << std::endl - << " status = vxGetStatus((vx_reference)graph);" << std::endl - << " if(status) {" << std::endl - << " printf(\"ERROR: vxCreateGraph(...) 
failed (%d)\\n\", status);" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " status = annAddToGraph(graph, " - ; - for(auto& name : inputList) { - ovxC << name << ", "; - } - for(auto& name : outputList) { - ovxC << name << ", "; - } - ovxC << "binaryFilename);" << std::endl - << " if(status) {" << std::endl - << " printf(\"ERROR: annAddToGraph() failed (%d)\\n\", status);" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " status = vxVerifyGraph(graph);" << std::endl - << " if(status) {" << std::endl - << " printf(\"ERROR: vxVerifyGraph(...) failed (%d)\\n\", status);" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " t1 = clockCounter();" << std::endl - << " printf(\"OK: graph initialization with annAddToGraph() took %.3f msec\\n\", (float)(t1-t0)*1000.0f/(float)freq);" << std::endl - << "" << std::endl - << " t0 = clockCounter();" << std::endl - << " status = vxProcessGraph(graph);" << std::endl - << " t1 = clockCounter();" << std::endl - << " if(status != VX_SUCCESS) {" << std::endl - << " printf(\"ERROR: vxProcessGraph() failed (%d)\\n\", status);" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " printf(\"OK: vxProcessGraph() took %.3f msec (1st iteration)\\n\", (float)(t1-t0)*1000.0f/(float)freq);" << std::endl - << "" << std::endl - << " // write outputs" << std::endl - ; - for(auto& name : outputList) { - ovxC << " if(*argv) {" << std::endl - << " if(strcmp(*argv, \"-\") != 0) {" << std::endl - << " if(copyTensor(" << name << ", *argv, VX_READ_ONLY) < 0) {" << std::endl - << " return -1;" << std::endl - << " }" << std::endl - << " printf(\"OK: wrote tensor '" << name << "' into %s\\n\", *argv);" << std::endl - << " }" << std::endl - << " argv++;" << std::endl - << " }" << std::endl - ; - } - ovxC << "" << std::endl - << " t0 = clockCounter();" << std::endl - << " int N = 100;" << std::endl - << " for(int i = 0; i < N; i++) {" << std::endl - << " status = 
vxProcessGraph(graph);" << std::endl - << " if(status != VX_SUCCESS)" << std::endl - << " break;" << std::endl - << " }" << std::endl - << " t1 = clockCounter();" << std::endl - << " printf(\"OK: vxProcessGraph() took %.3f msec (average over %d iterations)\\n\", (float)(t1-t0)*1000.0f/(float)freq/(float)N, N);" << std::endl - << "" << std::endl - << " // release resources" << std::endl - << " ERROR_CHECK_STATUS(vxReleaseGraph(&graph));" << std::endl - ; - for(auto& name : inputList) { - ovxC << " ERROR_CHECK_STATUS(vxReleaseTensor(&" << name << "));" << std::endl; - } - for(auto& name : outputList) { - ovxC << " ERROR_CHECK_STATUS(vxReleaseTensor(&" << name << "));" << std::endl; - } - ovxC << " ERROR_CHECK_STATUS(vxReleaseContext(&context));" << std::endl - << " printf(\"OK: successful\\n\");" << std::endl - << "" << std::endl - << " return 0;" << std::endl - << "}" << std::endl - ; - ovxC.close(); - std::cout << "OK: created '" << openvxFilenameC << "'" << std::endl; - - //// - // generate CMakeLists.txt - // - openvxFilenameC = openvxFolder + "/CMakeLists.txt"; - ovxC.open(openvxFilenameC); - if(!ovxC) { - printf("ERROR: unable to create: %s\n", openvxFilenameC.c_str()); - exit(1); - } - ovxC << "cmake_minimum_required(VERSION 3.5)" << std::endl - << "project (annmodule)" << std::endl - << "set (CMAKE_CXX_STANDARD 14) " << std::endl - << "list(APPEND CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/cmake)" << std::endl - << "find_package(OpenCL REQUIRED)" << std::endl - << "find_package(OpenCV QUIET)" << std::endl - << "include_directories (${OpenCL_INCLUDE_DIRS} ${OpenCL_INCLUDE_DIRS}/Headers )" << std::endl - << "include_directories (/opt/rocm/include/mivisionx)" << std::endl - << "link_directories (/opt/rocm/lib)" << std::endl - << "list(APPEND SOURCES annmodule.cpp)" << std::endl - << "add_library(${PROJECT_NAME} SHARED ${SOURCES})" << std::endl - << "set(CMAKE_CXX_FLAGS \"${CMAKE_CXX_FLAGS} -msse4.2 -std=gnu++14\")" << std::endl - << 
"target_link_libraries(${PROJECT_NAME} openvx vx_nn pthread)" << std::endl - << "add_executable(anntest anntest.cpp)" << std::endl - << "if (OpenCV_FOUND)" << std::endl - << " target_compile_definitions(anntest PUBLIC ENABLE_OPENCV=1)" << std::endl - << " include_directories(${OpenCV_INCLUDE_DIRS})" << std::endl - << " target_link_libraries(anntest ${OpenCV_LIBRARIES})" << std::endl - << "else(OpenCV_FOUND)" << std::endl - << " target_compile_definitions(anntest PUBLIC ENABLE_OPENCV=0)" << std::endl - << "endif(OpenCV_FOUND)" << std::endl - << "target_link_libraries(anntest openvx vx_nn pthread ${PROJECT_NAME})" << std::endl - ; - ovxC.close(); - std::cout << "OK: created '" << openvxFilenameC << "'" << std::endl; - } - - virtual void operation(const nnef::Prototype& proto, - const nnef::Dictionary& args, - const nnef::Dictionary& shapes) - { - // save the operation details - opsProto.push_back(proto); - opsValues.push_back(args); - opsShapes.push_back(shapes); - operationRemoved.push_back(false); - } - - virtual bool isAtomic( const nnef::Prototype& proto, const nnef::Dictionary& args ) - { - static std::set atomics = - { - "sqr", "sqrt", "min", "max", - "softmax", "relu", "tanh", "sigmoid", - "batch_normalization", "max_pool", "avg_pool", - "quantize_linear", "quantize_logarithmic" - }; - return atomics.find(proto.name()) != atomics.end(); - } -}; - -int main(int argc, const char * argv[]) -{ - //// - // get command-line parameters - // - int verbose = 0; - bool useVirtual = true; - while(argc > 1 && argv[1][0] == '-') { - if(!strcmp(argv[1], "--no-virtual")) { - useVirtual = false; - argc -= 1; - argv += 1; - } - else if(argc > 2 && !strcmp(argv[1], "-v")) { - verbose = atoi(argv[2]); - argc -= 2; - argv += 2; - } - else { - printf("ERROR: invalid option: %s\n", argv[1]); - return -1; - } - } - if(argc < 3) { - printf("Usage: nnef2openvx [-v ] [--no-virtual] \n"); - return -1; - } - std::string nnefContainedFolder = argv[1]; - std::string openvxOutputFolder = 
argv[2]; - std::string nnefFilename = nnefContainedFolder + "/graph.nnef"; - - //// - // parse NNEF structure and translate to OpenVX code - // - std::ifstream ifs(nnefFilename.c_str()); - if(!ifs) { - printf("ERROR: unable to open: %s\n", nnefFilename.c_str()); - return -1; - } - mkdir(openvxOutputFolder.c_str(), 0777); - printf("OK: parsing %s ...\n", nnefFilename.c_str()); - std::unique_ptr parser((nnef::Parser*)new nnef::FlatParser()); - try { - NNEF2OpenVX_Translator callback(nnefContainedFolder, openvxOutputFolder, useVirtual, verbose); - parser->parse(ifs, callback); - } - catch(nnef::Error e) { - printf("Parse error: [%u:%u] %s\n", e.position().line, e.position().column, e.what()); - auto origin = e.position().origin; - while(origin) { - printf("... evaluated from [%u:%u]\n", origin->line, origin->column); - origin = origin->origin; - } - } - ifs.close(); - - return 0; -} diff --git a/utilities/mv_deploy/README.md b/utilities/mv_deploy/README.md index 5cb14fd0a4..1e7a9067ce 100644 --- a/utilities/mv_deploy/README.md +++ b/utilities/mv_deploy/README.md @@ -6,12 +6,12 @@ mv_deploy consists of a model-compiler and necessary header/.cpp files which are The "mv_compile" will be built as part of MIVisionX package installer To build and application using mv_compile, the user can use the deployment api from mv_deploy.h. -The entire use of the mv_compile and deployment is shown in [mv_objdetectsample](../samples/inference/mv_objdetect) +The entire use of the mv_compile and deployment is shown in [mv_objdetectsample](../samples/mv_objdetect) The sample demonstrates the use of mv_compile utility to do video decoding and inference. ## Prerequisites -* Ubuntu `18.04`/`20.04` or CentOS `7`/`8` +* Ubuntu `20.04`/`22.04` or CentOS `7`/`8` * [ROCm supported hardware](https://rocm.github.io/ROCmInstall.html#hardware-support) * AMD Radeon GPU or APU required * [ROCm](https://github.com/RadeonOpenCompute/ROCm#installing-from-amd-rocm-repositories)
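Reviewer note on the removed translator: the `mul`/`add`-into-`conv` folding in the deleted `codeGenMergeVariables` hunks relies on the conv being linear in both its input and its weights. Scaling an input channel by `m_k` is equivalent to scaling the matching filter slice, and adding a constant `c_k` before the conv is equivalent to adding `c_k * sum(W_k)` to the bias. A minimal sketch checking both identities (plain Python with illustrative names; a 1x1 "conv" is reduced to a dot product, this is not MIVisionX code):

```python
# Verify the two fold rules used by the removed codeGenMergeVariables:
#   conv(x * m) == conv_with_scaled_weights(x)          (mul before conv)
#   conv(x + c) == conv(x) + c * sum(W)                 (add before conv)
# Names W, bias, x, m, c are illustrative only.

def dot(w, x):
    # one output channel of a linear "conv" collapsed to a dot product
    return sum(wi * xi for wi, xi in zip(w, x))

W = [0.5, -1.0, 2.0]   # one output channel's filter taps
bias = 0.25
x = [1.0, 2.0, 3.0]
m = 3.0                # elementwise multiplier applied before the conv
c = 0.1                # elementwise constant added before the conv

# mul fold: scale the filter taps instead of the input
unfused_mul = dot(W, [xi * m for xi in x]) + bias
fused_mul = dot([wi * m for wi in W], x) + bias
assert abs(unfused_mul - fused_mul) < 1e-9

# add fold: push the constant into the bias (bias += c * sum(W))
unfused_add = dot(W, [xi + c for xi in x]) + bias
fused_add = dot(W, x) + (bias + c * sum(W))
assert abs(unfused_add - fused_add) < 1e-9
```

The same algebra justifies folding a `mul`/`add` that follows the conv (scale or shift the bias directly), which is the other branch the removed code handled.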