-
That also might be the case. Another thing to note is that data loading and processing also depend on your disk speed (for instance, an SSD loads data faster than an HDD) and your GPU specs. num_workers doesn't necessarily load the data faster on its own; it just prepares batches of data in parallel so they are readily available to transfer to the GPU, which then doesn't have to wait for them. Also, Colab's drive is comparatively slower than other platforms, which could be part of what you're seeing with your num_workers setting. Furthermore, in Colab, if you trained again without restarting the runtime or freeing the previous model instance from the GPU, that may have slowed training down.

Tip: to speed up your training by >1.5x, try compiling the model:

import torch

model = your_model_class()             # instantiate your model as usual
compiled_model = torch.compile(model)  # requires PyTorch 2.0+
# Train using compiled_model in place of model
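Related to the num_workers point, here is a minimal sketch of where num_workers and pin_memory fit into a DataLoader (the dataset, batch size, and worker count are illustrative placeholders, not values from this thread):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    # Placeholder dataset just for illustration -- substitute your own Dataset.
    dataset = TensorDataset(
        torch.randn(10_000, 3, 32, 32),   # fake images
        torch.randint(0, 10, (10_000,)),  # fake labels
    )

    # num_workers only parallelises batch *preparation* on the CPU;
    # pin_memory=True can speed up the host-to-GPU copy of each batch.
    train_loader = DataLoader(
        dataset,
        batch_size=32,
        shuffle=True,
        num_workers=2,      # tune to your CPU core count and dataset size
        pin_memory=True,
    )

    device = "cuda" if torch.cuda.is_available() else "cpu"
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        # ... forward pass, loss, backward pass, optimizer step ...
        break  # one batch is enough for this sketch

if __name__ == "__main__":  # guard needed because workers use multiprocessing
    main()
```

pin_memory=True is worth trying alongside num_workers, since the workers only prepare batches on the CPU; the pinned buffers can make the actual CPU-to-GPU copy cheaper.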
-
When I first started working on data loaders, I noticed my training and testing times were much slower than what was demonstrated in the tutorial videos. Initially, I assumed this was due to a bottleneck in my system, but the same issue persisted even on faster machines I tested.
Today, while running my code, I noticed my CPU usage spiking. My initial thought was that the model was somehow training on the CPU, but Task Manager confirmed the GPU was being used, which left me puzzled. I had previously believed that increasing the num_workers parameter would speed up data loading, thinking more workers would result in faster batch transfers to the GPU. However, through experimentation, I discovered that a higher num_workers actually put significant strain on the CPU.

The breakthrough came when I compared the results on my local system (with 12 CPU cores) to Google Colab (with only 2 cores). By adjusting num_workers to 2 on my local system, I saw a drastic reduction in both training time and CPU usage. This experiment revealed a key insight: when working with smaller datasets, a high num_workers value can become a bottleneck. Setting num_workers to 0 resulted in significantly faster training times.

[Screenshots: CPU utilization with NUM_WORKERS = 2 vs. NUM_WORKERS = 0]
Conclusion:
For small datasets, keeping num_workers low or even at 0 can drastically improve performance by minimizing CPU load and optimizing data transfer to the GPU.

Edit: added pics
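A minimal timing sketch along these lines (the dataset shape, batch size, and worker counts are illustrative placeholders, not the exact setup above) could look like:

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset standing in for a small dataset.
dataset = TensorDataset(
    torch.randn(5_000, 3, 64, 64),
    torch.randint(0, 10, (5_000,)),
)

def time_one_pass(num_workers: int) -> float:
    """Iterate over the whole dataset once and return the elapsed seconds."""
    loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=num_workers)
    start = time.perf_counter()
    for images, labels in loader:
        pass  # measure data loading only; no model involved
    return time.perf_counter() - start

if __name__ == "__main__":  # guard needed when num_workers > 0
    for workers in (0, 2, 4):
        print(f"num_workers={workers}: {time_one_pass(workers):.2f} s")
```

Running something like this on both a many-core local machine and a 2-core Colab runtime makes it easy to see where the crossover point is for a given dataset.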