-
That also might be the case. Another thing to note is that data loading and processing also depend on your disk speed (for instance, an SSD loads data faster than an HDD) and your GPU specs. num_workers doesn't necessarily load the data faster on its own; it just prepares batches of data in parallel so they are readily available to transfer to the GPU, which then doesn't have to wait for them. Also, Colab's drive is comparatively slower than other platforms, which could be part of what you're seeing with your num_workers setting. Furthermore, in Colab, if you trained again without restarting the runtime or freeing the previous model instance from the GPU, that may have slowed training down.

Tip: to speed up your training by >1.5x, try compiling the model:

import torch

model = your_model_class()             # instantiate your model as usual
compiled_model = torch.compile(model)  # requires PyTorch 2.0+
# Train using compiled_model in place of model
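Related to the num_workers point, here is a minimal sketch of where num_workers and pin_memory fit into a DataLoader (the dataset, batch size, and worker count are illustrative placeholders, not values from this thread):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    # Placeholder dataset just for illustration -- substitute your own Dataset.
    dataset = TensorDataset(
        torch.randn(10_000, 3, 32, 32),   # fake images
        torch.randint(0, 10, (10_000,)),  # fake labels
    )

    # num_workers only parallelises batch *preparation* on the CPU;
    # pin_memory=True can speed up the host-to-GPU copy of each batch.
    train_loader = DataLoader(
        dataset,
        batch_size=32,
        shuffle=True,
        num_workers=2,      # tune to your CPU core count and dataset size
        pin_memory=True,
    )

    device = "cuda" if torch.cuda.is_available() else "cpu"
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        # ... forward pass, loss, backward pass, optimizer step ...
        break  # one batch is enough for this sketch

if __name__ == "__main__":  # guard needed because workers use multiprocessing
    main()
```

pin_memory=True is worth trying alongside num_workers, since the workers only prepare batches on the CPU; the pinned buffers can make the actual CPU-to-GPU copy cheaper.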
-
When I first started working on data loaders, I noticed my training and testing times were much slower than what was demonstrated in the tutorial videos. Initially, I assumed this was due to a bottleneck in my system, but the same issue persisted even on faster machines I tested.
Today, while running my code, I noticed my CPU usage spiking. My initial thought was that the model was somehow training on the CPU, but Task Manager confirmed the GPU was being used, which left me puzzled. I had previously believed that increasing the num_workers parameter would speed up data loading, thinking more workers would result in faster batch transfers to the GPU. However, through experimentation, I discovered that a higher num_workers actually put significant strain on the CPU.

The breakthrough came when I compared the results on my local system (with 12 CPU cores) to Google Colab (with only 2 cores). By adjusting num_workers to 2 on my local system, I saw a drastic reduction in both training time and CPU usage. This experiment revealed a key insight: when working with smaller datasets, a high num_workers value can become a bottleneck. Setting num_workers to 0 resulted in significantly faster training times.

[Screenshots: CPU utilization with NUM_WORKERS = 2 vs. NUM_WORKERS = 0]
Conclusion:
For small datasets, keeping num_workers low or even at 0 can drastically improve performance by minimizing CPU load and optimizing data transfer to the GPU.

Edit: added pics
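A minimal timing sketch along these lines (the dataset shape, batch size, and worker counts are illustrative placeholders, not the exact setup above) could look like:

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset standing in for a small dataset.
dataset = TensorDataset(
    torch.randn(5_000, 3, 64, 64),
    torch.randint(0, 10, (5_000,)),
)

def time_one_pass(num_workers: int) -> float:
    """Iterate over the whole dataset once and return the elapsed seconds."""
    loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=num_workers)
    start = time.perf_counter()
    for images, labels in loader:
        pass  # measure data loading only; no model involved
    return time.perf_counter() - start

if __name__ == "__main__":  # guard needed when num_workers > 0
    for workers in (0, 2, 4):
        print(f"num_workers={workers}: {time_one_pass(workers):.2f} s")
```

Running something like this on both a many-core local machine and a 2-core Colab runtime makes it easy to see where the crossover point is for a given dataset.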