Support Transformers in the Wav2Vec2 Encoder for the ASR Inference #1520

Merged
merged 20 commits into from
Nov 3, 2023

Conversation

Contributor

homink commented Oct 24, 2023

This PR adds support for the Transformers Wav2Vec2 encoder in ASR inference. The implementation follows the approach already used for the Whisper model. In my in-house computing environment, runtime GPU memory usage dropped from 3060 MB to 1897 MB, and inference time on in-house test data dropped from 9.2 sec to 5.48 sec. I hope this PR can be accepted and maintained for future use. The test script is in python/tests/test_transformers.py.

Contributor Author

homink commented Oct 25, 2023

I see check failures on recent pull requests, and I think the following check has been failing since this commit. Any suggestions?

32071b3

Contributor Author

homink commented Oct 27, 2023

@vince62s, thanks for suggesting ONEAPI_VERSION. It did fix the issue.

I struggled with the test environment, where the test downloads and reads an audio file. I tried several files and ended up using the audio file already used by the Whisper test. Now everything is in good shape and good to go!

@vince62s
Member

LGTM, but @nguyendc-systran, if you can have a look, thanks.

Collaborator

minhthuc2502 commented Nov 3, 2023

@vince62s It looks good to me.

@vince62s vince62s merged commit f92a8a2 into OpenNMT:master Nov 3, 2023
17 checks passed
funboarder13920 pushed a commit to funboarder13920/CTranslate2 that referenced this pull request Nov 7, 2023
Encodes the input features.

Arguments:
features: Mel spectogram of the audio, as a float array with shape

Doesn't this one take raw audio, not a mel spectrogram?

Contributor Author

That's correct. It should be raw audio, not a Mel spectrogram. How can we fix it? Should I open another PR for this?

I'm not a project maintainer/member/contributor, but I would guess so.
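
For readers following the docstring discussion above, here is a minimal sketch of what "raw audio" input could look like in practice. It is not taken from this PR. The helper name `prepare_raw_audio`, the 16 kHz sample rate, and the per-utterance zero-mean/unit-variance normalization are assumptions for illustration (this mirrors what Wav2Vec2 feature extractors commonly do). The CTranslate2 encode call itself is intentionally left out because its exact input contract is not shown in this thread.

```python
import numpy as np


def prepare_raw_audio(waveforms, pad_value=0.0):
    """Hypothetical helper: turn raw mono waveforms into a [batch, time] float32 array.

    Each waveform is normalized to zero mean and unit variance (an assumption
    about the preprocessing), then right-padded to the longest one in the batch.
    """
    normalized = []
    for w in waveforms:
        w = np.asarray(w, dtype=np.float32)
        # Per-utterance normalization; the small epsilon avoids division by zero.
        w = (w - w.mean()) / np.sqrt(w.var() + 1e-7)
        normalized.append(w)

    max_len = max(len(w) for w in normalized)
    batch = np.full((len(normalized), max_len), pad_value, dtype=np.float32)
    for i, w in enumerate(normalized):
        batch[i, : len(w)] = w
    return batch


# A synthetic 440 Hz tone stands in for real audio samples (no file is read here).
sample_rate = 16000  # assumed sample rate
t = np.arange(sample_rate) / sample_rate
tone = 0.1 * np.sin(2 * np.pi * 440 * t)

# Two utterances of different length: 1.0 s and 0.5 s.
batch = prepare_raw_audio([tone, tone[:8000]])
print(batch.shape, batch.dtype)
```

The point of the sketch is only the shape of the data: a batch of time-domain samples rather than a precomputed Mel spectrogram, which is the distinction the review comment raises.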

4 participants