docs: Add initial profiling doc
Give an overview of the profiling infrastructure in vAccel. Pending
the remote functionality.

Signed-off-by: Anastassios Nanos <[email protected]>
Reviewed-by: Kostis Papazafeiropoulos <[email protected]>
Approved-by: Kostis Papazafeiropoulos <[email protected]>
PR: #116
ananos authored and papazof committed Jun 30, 2024
1 parent 9f5facf commit e8d8f29
Showing 1 changed file with 293 additions and 0 deletions: docs/profiling.md

## Enable profiling

Profiling is an essential aspect of performance optimization in software
development. By enabling profiling, we can instrument our code to identify and
understand sources of overhead. This information is crucial for optimizing
code, improving performance, and ensuring efficient use of resources. In the
context of the vAccel framework, profiling allows developers to track
performance metrics of vAccel API operations and plugins.

### Profiling Stack

vAccel profiling is built around the concept of profiling regions. A region
represents a specific part of the code where performance data is collected. The
profiling stack includes various functions to manage these regions, from
initialization to data collection and reporting.

### Profiling Components

#### Region Initialization and Destruction

**Initialization**: A profiling region is initialized using
`vaccel_prof_region_init()`. This sets up the region with a specified name.

**Destruction**: When a region is no longer needed,
`vaccel_prof_region_destroy()` is used to clean up any resources associated
with it.

#### Start and Stop Profiling

**Start Profiling**: Use `vaccel_prof_region_start()` to begin collecting
performance data for a region.

**Stop Profiling**: Use `vaccel_prof_region_stop()` to stop collecting data for
a region.

#### Data Collection and Reporting

**Sample Collection**: The profiling system collects samples of performance
data during the execution of a region.

**Reporting**: `vaccel_prof_region_print()` is used to output the collected
data for analysis.
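
Each start/stop cycle of a region appears to be recorded as an additional
sample, which is what the `nr_entries` counter in the example output further
below reflects. The following minimal sketch (the function names and the
`do_work()` workload are hypothetical) accumulates samples across loop
iterations and reports them once at the end:

```c
#include <vaccel_prof.h>

/* Hypothetical workload; stands in for the code being measured */
static void do_work(int i)
{
    (void)i;
}

void profile_loop_body(void)
{
    struct vaccel_prof_region region = VACCEL_PROF_REGION_INIT("my_loop_body");
    vaccel_prof_region_init(&region, "my_loop_body");

    for (int i = 0; i < 10; i++) {
        vaccel_prof_region_start(&region);
        do_work(i);
        vaccel_prof_region_stop(&region); /* one more sample for "my_loop_body" */
    }

    /* Output the accumulated time and the number of samples collected */
    vaccel_prof_region_print(&region);
    vaccel_prof_region_destroy(&region);
}
```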

#### Batch Operations

**Start by Name**: `vaccel_prof_regions_start_by_name()` allows starting
multiple regions by their names.

**Stop by Name**: `vaccel_prof_regions_stop_by_name()` stops multiple regions
similarly.

**Clear Regions**: `vaccel_prof_regions_clear()` clears collected data from
regions.

**Print All**: `vaccel_prof_regions_print_all()` prints profiling data for all
regions.
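
The batch helpers operate on a set of regions at once. Their exact prototypes
are not shown in this document, so the sketch below assumes they take an array
of regions plus its length (and a region name where applicable); the region and
function names are illustrative, so check `vaccel_prof.h` for the actual
declarations before relying on this:

```c
#include <vaccel_prof.h>

/* Assumed prototypes (verify against vaccel_prof.h):
 *   vaccel_prof_regions_start_by_name(regions, nregions, name)
 *   vaccel_prof_regions_stop_by_name(regions, nregions, name)
 *   vaccel_prof_regions_clear(regions, nregions)
 *   vaccel_prof_regions_print_all(regions, nregions)
 */
void batch_profiling_sketch(void)
{
    struct vaccel_prof_region regions[] = {
        VACCEL_PROF_REGION_INIT("stage_a"),
        VACCEL_PROF_REGION_INIT("stage_b"),
    };
    int nregions = sizeof(regions) / sizeof(regions[0]);

    /* Start and stop profiling for the region(s) named "stage_a" (assumed semantics) */
    vaccel_prof_regions_start_by_name(regions, nregions, "stage_a");
    /* ... code belonging to stage_a ... */
    vaccel_prof_regions_stop_by_name(regions, nregions, "stage_a");

    /* Print the data collected for all regions, then drop the samples */
    vaccel_prof_regions_print_all(regions, nregions);
    vaccel_prof_regions_clear(regions, nregions);
}
```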

#### Enabling Profiling

Profiling in vAccel is enabled at runtime through an environment variable: set
`VACCEL_PROF_ENABLED` to `enabled`.

```bash
export VACCEL_PROF_ENABLED=enabled
```

### Adding Profiling to Your vAccel API Operation or Plugin

To add profiling to your vAccel API operation or plugin, follow these steps:

#### Include Profiling Headers

Ensure that you include the necessary headers in your code:

```c
#include <vaccel_prof.h>
```

#### Define and Initialize a Profiling Region

Define a `vaccel_prof_region` structure and initialize it:

```c
struct vaccel_prof_region my_region = VACCEL_PROF_REGION_INIT("my_operation");
vaccel_prof_region_init(&my_region, "my_operation");
```

#### Start Profiling

Before the code section you want to profile, start profiling the region:

```c
vaccel_prof_region_start(&my_region);
```

#### Stop Profiling

After the code section, stop profiling the region:

```c
vaccel_prof_region_stop(&my_region);
```

#### Print Profiling Results

To output the profiling results, use:

```c
vaccel_prof_region_print(&my_region);
```

#### Destroy the Profiling Region

Finally, clean up the profiling region:

```c
vaccel_prof_region_destroy(&my_region);
```

### Example Code

```c
#include <vaccel_prof.h>

void my_vaccel_operation() {
    struct vaccel_prof_region my_region = VACCEL_PROF_REGION_INIT("my_operation");

    vaccel_prof_region_init(&my_region, "my_operation");
    vaccel_prof_region_start(&my_region);

    // Your code here

    vaccel_prof_region_stop(&my_region);
    vaccel_prof_region_print(&my_region);
    vaccel_prof_region_destroy(&my_region);
}
```

By following these steps, you can effectively instrument your code with
profiling regions, allowing you to gather valuable performance data and
optimize your vAccel operations and plugins.

Below is example output from a speech classification tool using the torch plugin:

```
$ time ./classifier ./cnn_trace.pt ../bert_cased_vocab.txt 0
2024.06.24-17:49:46.45 - <info> vAccel v0.6.0
2024.06.24-17:49:46.45 - <info> Registered plugin torch 0.1
== [Vocab Loaded] ==
vaccel_single_model_set_path(): Time Taken: 0 milliseconds
vaccel_single_model_register(): Time Taken: 0 milliseconds
Created new model 1
Initialized vAccel session 1
vaccel_preprocess(): Time Taken: 4 milliseconds
2024.06.24-17:49:46.98 - <info> [prof] torch_input: total_time: 127964 nsec nr_entries: 1
2024.06.24-17:49:46.98 - <info> [prof] torch_run: total_time: 180927987 nsec nr_entries: 1
2024.06.24-17:49:46.98 - <info> [prof] torch_out: total_time: 31391 nsec nr_entries: 1
==========Show Result(vAccel)==========
Prediction: offensive-language
Legth: 27
Duration: 462
vaccel_preprocess(): Time Taken: 8 milliseconds
2024.06.24-17:49:47.43 - <info> [prof] torch_input: total_time: 162719 nsec nr_entries: 2
2024.06.24-17:49:47.43 - <info> [prof] torch_run: total_time: 321256735 nsec nr_entries: 2
2024.06.24-17:49:47.43 - <info> [prof] torch_out: total_time: 37025 nsec nr_entries: 2
==========Show Result(vAccel)==========
Prediction: offensive-language
Legth: 18
Duration: 438
vaccel_preprocess(): Time Taken: 7 milliseconds
2024.06.24-17:49:47.87 - <info> [prof] torch_input: total_time: 198476 nsec nr_entries: 3
2024.06.24-17:49:47.87 - <info> [prof] torch_run: total_time: 457327127 nsec nr_entries: 3
2024.06.24-17:49:47.87 - <info> [prof] torch_out: total_time: 42321 nsec nr_entries: 3
==========Show Result(vAccel)==========
Prediction: offensive-language
Legth: 23
Duration: 436
vaccel_preprocess(): Time Taken: 7 milliseconds
2024.06.24-17:49:48.30 - <info> [prof] torch_input: total_time: 233884 nsec nr_entries: 4
2024.06.24-17:49:48.30 - <info> [prof] torch_run: total_time: 586921978 nsec nr_entries: 4
2024.06.24-17:49:48.30 - <info> [prof] torch_out: total_time: 47551 nsec nr_entries: 4
==========Show Result(vAccel)==========
Prediction: neither
Legth: 11
Duration: 413
vaccel_preprocess(): Time Taken: 7 milliseconds
2024.06.24-17:49:48.64 - <info> [prof] torch_input: total_time: 269105 nsec nr_entries: 5
2024.06.24-17:49:48.64 - <info> [prof] torch_run: total_time: 721709173 nsec nr_entries: 5
2024.06.24-17:49:48.64 - <info> [prof] torch_out: total_time: 53077 nsec nr_entries: 5
==========Show Result(vAccel)==========
Prediction: offensive-language
Legth: 28
Duration: 333
vaccel_preprocess(): Time Taken: 8 milliseconds
2024.06.24-17:49:48.98 - <info> [prof] torch_input: total_time: 304956 nsec nr_entries: 6
2024.06.24-17:49:48.98 - <info> [prof] torch_run: total_time: 855291441 nsec nr_entries: 6
2024.06.24-17:49:48.98 - <info> [prof] torch_out: total_time: 58215 nsec nr_entries: 6
==========Show Result(vAccel)==========
Prediction: offensive-language
Legth: 21
Duration: 329
vaccel_preprocess(): Time Taken: 7 milliseconds
2024.06.24-17:49:49.29 - <info> [prof] torch_input: total_time: 340191 nsec nr_entries: 7
2024.06.24-17:49:49.29 - <info> [prof] torch_run: total_time: 988087659 nsec nr_entries: 7
2024.06.24-17:49:49.29 - <info> [prof] torch_out: total_time: 63358 nsec nr_entries: 7
==========Show Result(vAccel)==========
Prediction: offensive-language
Legth: 22
Duration: 310
vaccel_preprocess(): Time Taken: 8 milliseconds
2024.06.24-17:49:49.63 - <info> [prof] torch_input: total_time: 375959 nsec nr_entries: 8
2024.06.24-17:49:49.63 - <info> [prof] torch_run: total_time: 1119080025 nsec nr_entries: 8
2024.06.24-17:49:49.63 - <info> [prof] torch_out: total_time: 68831 nsec nr_entries: 8
==========Show Result(vAccel)==========
Prediction: offensive-language
Legth: 15
Duration: 327
vaccel_preprocess(): Time Taken: 7 milliseconds
2024.06.24-17:49:49.94 - <info> [prof] torch_input: total_time: 412773 nsec nr_entries: 9
2024.06.24-17:49:49.94 - <info> [prof] torch_run: total_time: 1249991356 nsec nr_entries: 9
2024.06.24-17:49:49.94 - <info> [prof] torch_out: total_time: 74032 nsec nr_entries: 9
==========Show Result(vAccel)==========
Prediction: offensive-language
Legth: 15
Duration: 303
vaccel_preprocess(): Time Taken: 8 milliseconds
2024.06.24-17:49:50.25 - <info> [prof] torch_input: total_time: 446881 nsec nr_entries: 10
2024.06.24-17:49:50.25 - <info> [prof] torch_run: total_time: 1380077421 nsec nr_entries: 10
2024.06.24-17:49:50.25 - <info> [prof] torch_out: total_time: 79266 nsec nr_entries: 10
==========Show Result(vAccel)==========
Prediction: neither
Legth: 8
Duration: 301
Total Time Taken: 3804 milliseconds
Total nr of Inference invocations: 10
Legth: 8
Average: 231
real 0m4.228s
user 0m4.578s
sys 0m1.235s
```

and the corresponding code:

```c
struct vaccel_prof_region torch_input_stats =
    VACCEL_PROF_REGION_INIT("torch_input");
struct vaccel_prof_region torch_run_stats =
    VACCEL_PROF_REGION_INIT("torch_run");
struct vaccel_prof_region torch_out_stats =
    VACCEL_PROF_REGION_INIT("torch_out");

[..]
vaccel_prof_region_start(&torch_input_stats);

std::vector<torch::jit::IValue> input;
torch::jit::IValue output_ori;
torch::Tensor input_ids;
torch::Tensor masks;

size_t input_outer_size = in_tensors[0]->size;
size_t masks_outer_size = in_tensors[1]->size;

[..]

input_ids = input_ids.to(device);
masks = masks.to(device);

vaccel_prof_region_stop(&torch_input_stats);

input.push_back(input_ids);
input.push_back(masks);

vaccel_prof_region_start(&torch_run_stats);
output_ori = model_trace.forward(input);
vaccel_prof_region_stop(&torch_run_stats);

[..]

vaccel_prof_region_start(&torch_out_stats);

ret = tensors_to_vaccel(output, out_tensors, nr_outputs);
float* data = static_cast<float*>((*out_tensors)->data);
int64_t num_elements = output.numel();

vaccel_prof_region_stop(&torch_out_stats);

vaccel_prof_region_print(&torch_input_stats);
vaccel_prof_region_print(&torch_model_stats);
vaccel_prof_region_print(&torch_run_stats);
vaccel_prof_region_print(&torch_out_stats);
```
