Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A few optimizations that speedup training #627

Merged
merged 6 commits into from
Oct 8, 2024

Conversation

hvoss-techfak
Copy link
Contributor

Thanks for creating such a great project.

I'm currently training on a rather large dataset (500+ GB) of audio data and have found a few optimizations that increase the pre-processing speed in the balance sampler from 10+ minutes down to ~15 seconds and changes to the sampling function to remove the for loop and use only torch functions, that made some pretty big changes for me (from 2 iterations/s to 3.5 iterations/s).

Both changes produce exactly the same results as before.

@AlekseySh
Copy link
Contributor

Hey, @hvoss-techfak
thank you for the suggested changes and your interest in OML!

First, let's see if CI is green.

@hvoss-techfak
Copy link
Contributor Author

I forgot to add the imports. One Second.

@AlekseySh
Copy link
Contributor

Unfortunately, CI is not green, so, please, run tests locally (you will find the commands in the Makefile, you can also check the Contributing guide)

@AlekseySh
Copy link
Contributor

@hvoss-techfak double check linters, please

@hvoss-techfak
Copy link
Contributor Author

The imports are now fixed, but the tests still give me an error with the "labels = torch.IntTensor(labels)", as apparently not all triplet inputs are given as integers? I'll run the tests locally and check if I can fix this

@hvoss-techfak
Copy link
Contributor Author

I relaxed the IntTensor to a normal tensor and now all the short tests pass.

@AlekseySh
Copy link
Contributor

let's try one more time

@AlekseySh
Copy link
Contributor

double check linters please

@hvoss-techfak
Copy link
Contributor Author

That linters seems to be a formatting issue in the inbatch_hard_tri.py. It is fixed now.

@AlekseySh
Copy link
Contributor

by the way, you can run command "make run_precommit", it will help to fix the linters

@AlekseySh
Copy link
Contributor

@hvoss-techfak seems like the tests are fine now, so I will check the code soon

could you run all_tests locally please? some my fail (because of environment, OS and so on), but it should be easy to understand if the are related to your changes or not

@hvoss-techfak
Copy link
Contributor Author

All tests are now done. I had a few that failed because of my ddp setup and some import errors, but apart from that all other tests passed.

Copy link
Collaborator

@DaloroAT DaloroAT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, @hvoss-techfak

Thank you for your contribution!

I left a couple of comments removing redundant code.

oml/miners/inbatch_hard_tri.py Outdated Show resolved Hide resolved
oml/miners/inbatch_hard_tri.py Outdated Show resolved Hide resolved
oml/miners/inbatch_hard_tri.py Outdated Show resolved Hide resolved
@AlekseySh
Copy link
Contributor

@hvoss-techfak check out the comments please

@hvoss-techfak
Copy link
Contributor Author

all changes are done

@AlekseySh AlekseySh merged commit 3fa41fb into OML-Team:main Oct 8, 2024
8 checks passed
@AlekseySh
Copy link
Contributor

@DaloroAT thank you for review it!

@AlekseySh
Copy link
Contributor

@hvoss-techfak welcome to the contributors, feel free to work on any issue you like :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants