A few optimizations that speed up training #627

hvoss-techfak · 2024-09-28T12:35:17Z

Thanks for creating such a great project.

I'm currently training on a rather large audio dataset (500+ GB) and have found two optimizations. The first reduces the pre-processing time in the balance sampler from 10+ minutes to about 15 seconds. The second changes the sampling function to remove the for loop and use only torch functions, which made a pretty big difference for me (from 2 iterations/s to 3.5 iterations/s).
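The balance sampler diff itself is not shown in this thread. As a rough illustration of how this kind of pre-processing can drop from minutes to seconds, the sketch below assumes the slow step was building a label-to-indices mapping with one full scan of the dataset per label, and replaces it with a single pass. The names `group_indices_slow` and `group_indices_fast` are hypothetical and are not part of OML.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def group_indices_slow(labels: Sequence[Hashable]) -> Dict[Hashable, List[int]]:
    # One full scan of the dataset per unique label: O(N * num_labels).
    return {
        lbl: [i for i, x in enumerate(labels) if x == lbl]
        for lbl in set(labels)
    }


def group_indices_fast(labels: Sequence[Hashable]) -> Dict[Hashable, List[int]]:
    # A single pass over the dataset: O(N).
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for i, lbl in enumerate(labels):
        groups[lbl].append(i)
    return dict(groups)


labels = [3, 1, 3, 2, 1, 3]
# Both variants build the same mapping; only the cost differs.
assert group_indices_slow(labels) == group_indices_fast(labels)
```

With many unique labels the per-label scan dominates, which is consistent with the kind of speedup reported here, although the actual change in the PR may differ.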

Both changes produce exactly the same results as before.
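To make the "remove the for loop and use only torch functions" idea concrete, here is a minimal sketch of hardest-triplet mining written both ways. It is not the code from this PR: the function names are hypothetical, and it assumes every label in the batch has at least two samples and that the batch contains at least two labels.

```python
from typing import List, Tuple

import torch


def hardest_triplets_loop(
    distmat: torch.Tensor, labels: torch.Tensor
) -> Tuple[List[int], List[int], List[int]]:
    # Reference version: one Python iteration per anchor.
    n = len(labels)
    ids_p, ids_n = [], []
    for i in range(n):
        pos = [j for j in range(n) if labels[j] == labels[i] and j != i]
        neg = [j for j in range(n) if labels[j] != labels[i]]
        ids_p.append(max(pos, key=lambda j: distmat[i, j].item()))
        ids_n.append(min(neg, key=lambda j: distmat[i, j].item()))
    return list(range(n)), ids_p, ids_n


def hardest_triplets_vectorized(
    distmat: torch.Tensor, labels: torch.Tensor
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
    # Boolean (B, B) masks built with broadcasting instead of a loop.
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool, device=distmat.device)
    pos_mask = same & ~eye  # same label, excluding the anchor itself
    neg_mask = ~same        # different label

    # Hardest positive: the farthest sample with the same label.
    ids_p = distmat.masked_fill(~pos_mask, float("-inf")).argmax(dim=1)
    # Hardest negative: the closest sample with a different label.
    ids_n = distmat.masked_fill(~neg_mask, float("inf")).argmin(dim=1)
    ids_a = torch.arange(len(labels), device=distmat.device)
    return ids_a, ids_p, ids_n


# Toy batch: 1-D features, two labels with three samples each.
features = torch.tensor([[0.0], [1.0], [3.0], [10.0], [12.0], [15.0]])
labels = torch.tensor([0, 0, 0, 1, 1, 1])
distmat = (features - features.T).abs()  # pairwise distances, shape (6, 6)

ids_a, ids_p, ids_n = hardest_triplets_vectorized(distmat, labels)
assert hardest_triplets_loop(distmat, labels) == (
    ids_a.tolist(),
    ids_p.tolist(),
    ids_n.tolist(),
)
```

The final assertion is the kind of equivalence check behind a claim that results are unchanged. When distances tie, the two versions could pick different indices, so a real test would need to account for that.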

AlekseySh · 2024-10-01T08:51:36Z

Hey, @hvoss-techfak
thank you for the suggested changes and your interest in OML!

First, let's see if CI is green.

hvoss-techfak · 2024-10-01T08:53:05Z

I forgot to add the imports. One second.

AlekseySh · 2024-10-01T08:54:05Z

Unfortunately, CI is not green, so please run the tests locally (you will find the commands in the Makefile; you can also check the Contributing guide).

AlekseySh · 2024-10-01T09:17:12Z

@hvoss-techfak double check linters, please

hvoss-techfak · 2024-10-01T09:18:28Z

The imports are now fixed, but the tests still give me an error at "labels = torch.IntTensor(labels)", apparently because not all triplet inputs are given as integers. I'll run the tests locally and check whether I can fix this.

hvoss-techfak · 2024-10-01T09:32:21Z

I relaxed the IntTensor to a normal tensor and now all the short tests pass.
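The exact replacement is not shown in this thread. As a sketch of what relaxing the conversion can look like, the example below uses `torch.as_tensor`, which infers the dtype from the data, whereas `torch.IntTensor` always produces an int32 tensor. The helper name `to_label_tensor` and the choice of `torch.as_tensor` are assumptions for illustration only.

```python
from typing import Sequence, Union

import torch


def to_label_tensor(labels: Union[Sequence, torch.Tensor]) -> torch.Tensor:
    # Infers the dtype from the input and accepts existing tensors as well,
    # instead of forcing int32 the way torch.IntTensor does.
    return torch.as_tensor(labels)


int_labels = to_label_tensor([0, 0, 1, 1])            # integer dtype (int64)
float_labels = to_label_tensor([0.0, 0.0, 1.0, 1.0])  # default float dtype

# The label-equality mask used in mining is the same for both dtypes.
same_int = int_labels.unsqueeze(0) == int_labels.unsqueeze(1)
same_float = float_labels.unsqueeze(0) == float_labels.unsqueeze(1)
assert torch.equal(same_int, same_float)
```

Because mining only compares labels for equality, the integer dtype is not essential as long as equal labels compare equal.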

AlekseySh · 2024-10-01T09:35:45Z

let's try one more time

AlekseySh · 2024-10-01T09:37:27Z

double check linters please

hvoss-techfak · 2024-10-01T09:42:39Z

The linter failure seems to be a formatting issue in inbatch_hard_tri.py. It is fixed now.

AlekseySh · 2024-10-01T09:48:14Z

by the way, you can run the command "make run_precommit"; it will help fix the linter errors

AlekseySh · 2024-10-01T09:57:53Z

@hvoss-techfak seems like the tests are fine now, so I will check the code soon

could you run all_tests locally please? some may fail (because of environment, OS and so on), but it should be easy to understand whether they are related to your changes or not

hvoss-techfak · 2024-10-01T10:48:21Z

All tests are now done. I had a few that failed because of my ddp setup and some import errors, but apart from that all other tests passed.

DaloroAT

Hey, @hvoss-techfak

Thank you for your contribution!

I left a couple of comments about removing redundant code.

oml/miners/inbatch_hard_tri.py

AlekseySh · 2024-10-02T16:17:42Z

@hvoss-techfak check out the comments please

hvoss-techfak · 2024-10-06T13:13:37Z

all changes are done

AlekseySh · 2024-10-08T20:20:34Z

@DaloroAT thank you for reviewing it!

AlekseySh · 2024-10-08T20:20:58Z

@hvoss-techfak welcome to the contributors, feel free to work on any issue you like :)

two small changes that speedup pre-processing and training

0f5d872

fixed imports. Ran checks.

61c6d2c

hvoss-techfak added 2 commits October 1, 2024 11:18

lint fixed

d1b546e

relaxing the tensor seems to work

407413d

black lint

c94bf41

DaloroAT requested changes Oct 2, 2024

3 review comments on oml/miners/inbatch_hard_tri.py (outdated, resolved)

pull request code comments

8c824e0

DaloroAT approved these changes Oct 8, 2024

AlekseySh merged commit 3fa41fb into OML-Team:main Oct 8, 2024
8 checks passed
