Support Transformers in the Wav2Vec2 Encoder for the ASR Inference #1520

homink · 2023-10-24T18:55:16Z

This PR adds support for Transformers in the Wav2Vec2 Encoder for ASR inference. The implementation follows the existing Whisper model code. In our in-house computing environment, it reduced runtime GPU memory usage from 3060MB to 1897MB and inference time on in-house test data from 9.2 sec to 5.48 sec. I hope this PR can be accepted and maintained for future use. The test script is at python/tests/test_transformers.py
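For readers who want the relative gains, the figures quoted above can be turned into percentages with a few lines of arithmetic. This is only a sketch over the four numbers in the description; the variable and function names are illustrative and are not part of the PR.

```python
# Figures quoted in the PR description (in-house environment and test data).
mem_before_mb, mem_after_mb = 3060, 1897
time_before_s, time_after_s = 9.2, 5.48


def relative_reduction(before, after):
    """Fraction by which `after` is smaller than `before`."""
    return (before - after) / before


mem_saving = relative_reduction(mem_before_mb, mem_after_mb)
time_saving = relative_reduction(time_before_s, time_after_s)
speedup = time_before_s / time_after_s

print(f"GPU memory: {mem_saving:.1%} lower")        # GPU memory: 38.0% lower
print(f"Inference time: {time_saving:.1%} lower")   # Inference time: 40.4% lower
print(f"Speedup: {speedup:.2f}x")                   # Speedup: 1.68x
```

These ratios describe one in-house setup only, so they may not carry over to other hardware or audio.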

homink · 2023-10-25T03:44:09Z

I see failures on the recent pull requests, and I think a check has been failing since the following commit. Any suggestions?

32071b3

homink · 2023-10-27T21:59:29Z

@vince62s, thanks for suggesting ONEAPI_VERSION. It fixed the issue indeed.

I struggled with the test environment, where the test downloads and reads an audio file. I tried several approaches and ended up using the audio file already used by the Whisper test. Now everything is in good shape and good to go!

vince62s · 2023-10-29T07:38:57Z

LGTM, but @nguyendc-systran, if you can have a look, thanks.

minhthuc2502 · 2023-11-03T10:53:27Z

@vince62s It looks good to me.

ryanheise · 2024-09-22T09:13:14Z

python/cpp/wav2vec2.cc

+                 Encodes the input features.
+
+                 Arguments:
+                   features: Mel spectogram of the audio, as a float array with shape


Doesn't this one take raw audio, not a mel spectrogram?

That's correct. It should be raw audio, not a Mel spectrogram. How can we fix it? Should I make another PR for this?

I'm not a project maintainer/member/contributor, but I would guess so.
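To make the distinction in this thread concrete, here is a minimal NumPy sketch of what a raw-audio input looks like, as opposed to a Mel spectrogram, which would be a 2-D array of Mel bins by frames per clip. The helper name `prepare_raw_audio`, the 16 kHz rate, and the normalization step are assumptions for illustration; the exact shape the encoder expects is not stated in this thread.

```python
import numpy as np

# Assumption: 16 kHz audio, as is typical for Wav2Vec2 checkpoints.
SAMPLE_RATE = 16_000


def prepare_raw_audio(waveform):
    """Hypothetical helper: 1-D waveform -> normalized [1, samples] float32 batch."""
    audio = np.asarray(waveform, dtype=np.float32)
    # Zero-mean, unit-variance normalization, a common Wav2Vec2 preprocessing step.
    audio = (audio - audio.mean()) / np.sqrt(audio.var() + 1e-7)
    # Add a leading batch dimension and keep the dtype explicit.
    return audio[np.newaxis, :].astype(np.float32)


# One second of a synthetic 440 Hz tone stands in for a real recording.
t = np.arange(SAMPLE_RATE, dtype=np.float32) / SAMPLE_RATE
batch = prepare_raw_audio(0.1 * np.sin(2 * np.pi * 440.0 * t))

print(batch.shape, batch.dtype)  # (1, 16000) float32
```

The point is only that raw audio is one sample axis per clip, so a docstring describing it as a Mel spectrogram would mislead callers about the expected input.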

hkwon added 3 commits October 24, 2023 11:46

Support Transformers in the Wav2Vec2 Encoder for the ASR Inference

15ee7da

code style/format check with flask8 & black

0901484

check isort and update

4b28cc6

hkwon added 17 commits October 25, 2023 08:41

change ONEAPI_VERSION to 2023.2.0

5258816

add missing package (librosa) for test_wav2vec2.py

f9bfa16

import package path update

d2ff992

isort library update

4ffaab9

update vocab return

7078fd5

add packages requirement for test_wav2vec2.py

eeb92ef

merge test_wav2vec2.py to test_transformers.py for the compatibility

9af8b89

fix python style format

d2b01ed

update audio_name for TestWav2Vec2

7e0dcdc

change the audio downloading

8403920

change the audio downloading

bf63c95

change the audio downloading

dad4b2c

add requests for test requirement

a9674ba

update audio file downloading

1e6aa47

update audio file downloading path

478bde5

switch audio to the existing one

11f5ff1

remove unnecessary audio downloading

4370869

vince62s merged commit f92a8a2 into OpenNMT:master Nov 3, 2023
17 checks passed

funboarder13920 pushed a commit to funboarder13920/CTranslate2 that referenced this pull request Nov 7, 2023

Revert "Support Transformers in the Wav2Vec2 Encoder for the ASR Infe…

7c60769

…rence (OpenNMT#1520)" This reverts commit f92a8a2.

funboarder13920 pushed a commit to funboarder13920/CTranslate2 that referenced this pull request Nov 7, 2023

Revert "Revert "Support Transformers in the Wav2Vec2 Encoder for the …

754e9dd

…ASR Inference (OpenNMT#1520)"" This reverts commit 7c60769.

homink mentioned this pull request Jul 23, 2024

perf: conv1d quantization #1601

Merged

ryanheise Sep 22, 2024

homink Sep 24, 2024

ryanheise Sep 30, 2024
