# Launch Modes

Triton Model Analyzer's `profile` subcommand supports four different launch modes for Triton Inference Server. In the local and docker modes, Triton Inference Server is launched by Model Analyzer. In the c_api mode, Triton Inference Server is launched locally via its C API. In the remote mode, an already running instance of Triton Inference Server is assumed.
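
For reference, a minimal sketch of a `profile` invocation that selects a launch mode is shown below; the model repository path and model name are placeholders, not values from this guide:

```
model-analyzer profile \
    --model-repository=/path/to/model/repository \
    --profile-models=<model-name> \
    --triton-launch-mode=<mode>
```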

## Docker

CLI Option: `--triton-launch-mode=docker`

Note: A full step-by-step example of docker mode can be found in the Quick Start Guide.

If your model requires a custom backend when running on the Triton Inference Server, please use either local or remote mode.

In this mode, Model Analyzer uses the Python Docker API to launch the Triton Inference Server container. If you are running Model Analyzer inside a Docker container, make sure that the container is launched with appropriate flags. The following flags are mandatory for correct behavior:

```
--gpus <gpus> -v /var/run/docker.sock:/var/run/docker.sock --net host
```
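
For example, launching the Model Analyzer container with these flags might look roughly like the sketch below; the image tag and the model repository mount are assumptions for illustration and will differ in your setup:

```
# Launch the Model Analyzer container (image tag is an assumed placeholder)
docker run -it --rm \
    --gpus all \
    -v /var/run/docker.sock:/var/run/docker.sock \
    --net host \
    -v /path/to/model/repository:/path/to/model/repository \
    nvcr.io/nvidia/tritonserver:<xx.yy>-py3-sdk
```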

Additionally, Model Analyzer uses the `output_model_repository_path` to manipulate and store model config variants. When Model Analyzer launches the Triton container, it does so as a sibling container, so the launched Triton container only has access to the host filesystem. As a result, the chosen `<path-to-output-model-repository>` must also be visible at the same path on the host machine; if Model Analyzer itself is running inside a container, mount this directory into it (e.g. `-v <path-to-output-model-repository>:<path-to-output-model-repository>`).

Finally, when launching Model Analyzer, the argument `--output-model-repository-path` must be provided as a directory inside `<path-to-output-model-repository>`. This directory need not exist.

```
--output-model-repository-path=<path-to-output-model-repository>/output
```
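
Putting this together, a docker-mode profile run might look roughly like the following sketch; the model name is a placeholder:

```
model-analyzer profile \
    --model-repository=/path/to/model/repository \
    --profile-models=<model-name> \
    --triton-launch-mode=docker \
    --output-model-repository-path=<path-to-output-model-repository>/output
```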

This mode is useful if you want to use the Model Analyzer installed in the Triton SDK Container. You will need Docker installed, though.

## Local

CLI Option: `--triton-launch-mode=local`

Local mode is the default mode if no `--triton-launch-mode` is specified.

In this mode, Model Analyzer will launch Triton Server using the local binary supplied via `--triton-server-path` or, if none is supplied, the `tritonserver` binary found in `$PATH`.

There are multiple ways to get the Model Analyzer and Triton Server executables together for local mode, such as building a container that contains both, or pip installing Model Analyzer into an environment that already has a `tritonserver` executable.
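
As a rough sketch, a local-mode run that points at a specific server binary could look like this; the binary path and model name are placeholders:

```
model-analyzer profile \
    --model-repository=/path/to/model/repository \
    --profile-models=<model-name> \
    --triton-launch-mode=local \
    --triton-server-path=/opt/tritonserver/bin/tritonserver
```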

## C API

CLI Option: `--triton-launch-mode=c_api`

In this mode, Triton Server is launched locally via the C API by the `perf_analyzer` instances that Model Analyzer launches. Inference requests are sent directly through C API function calls instead of going over the network via GRPC or HTTP.

This mode is useful if you want to run with a locally installed Triton Server and want the increased performance of the C API. As in local mode, Triton Server must be installed in the environment in which Model Analyzer is being used.

The server metrics that Model Analyzer gathers and reports are not available directly from Triton Server when running in C API mode. Instead, Model Analyzer will attempt to gather this information itself. This can lead to less precise results and will generally cause GPU utilization and power numbers to be under-reported.
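
A minimal sketch of a C API mode run is shown below; the model name is a placeholder, and the Triton installation is assumed to already be present in the environment:

```
model-analyzer profile \
    --model-repository=/path/to/model/repository \
    --profile-models=<model-name> \
    --triton-launch-mode=c_api
```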

## Remote

CLI Option: `--triton-launch-mode=remote`

This mode is beneficial when you want to use an already running Triton Inference Server. You can provide the URL of the Triton instance's HTTP or GRPC endpoint, depending on your chosen client protocol, using the `--triton-http-endpoint` and `--triton-grpc-endpoint` flags. You should also make sure that the same GPUs are available to both the Inference Server and Model Analyzer, and that they are on the same machine. In this mode, Triton Server needs to be launched with the `--model-control-mode=explicit` flag so that Model Analyzer can load and unload models.
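
For illustration, a remote-mode setup might look roughly like the following sketch; the endpoint assumes Triton's default HTTP port (8000) on the same machine, and the model name is a placeholder:

```
# On the server side: start Triton with explicit model control
tritonserver --model-repository=/path/to/model/repository \
             --model-control-mode=explicit

# Then run Model Analyzer against the already running server
model-analyzer profile \
    --model-repository=/path/to/model/repository \
    --profile-models=<model-name> \
    --triton-launch-mode=remote \
    --triton-http-endpoint=localhost:8000
```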