Mismatch between mid-to-high frequency speech and low-frequency? #2

Answered by jhauret
jdwang125 asked this question in Q&A

Hi again jdwang125,

As the MUSHRA-U (ease of Understanding) results in our article point out, the network is good but not perfect, just like SEANet. The task is difficult because very little information is left in the corrupted speech signal (hence the name extreme bandwidth extension). Indeed, any traces of fricative sounds such as {f, s, ʃ, v, z, θ, ð} are absent in the low frequencies, and we have not used any language model to help us infer those sounds.
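
To make that concrete, here is a minimal sketch (not from the paper; the file name, the 1 kHz cutoff and the segment times are placeholder assumptions) that low-pass filters a recording and compares how much energy survives in a vowel segment versus a fricative segment:

```python
# Minimal sketch: how much of a segment's energy survives a low-pass filter.
# "example_utterance.wav", the 1 kHz cutoff and the segment times are placeholders.
import numpy as np
import soundfile as sf
from scipy.signal import butter, sosfilt

audio, sr = sf.read("example_utterance.wav")                 # hypothetical mono recording
sos = butter(8, 1000, btype="lowpass", fs=sr, output="sos")  # assumed 1 kHz cutoff
lowpassed = sosfilt(sos, audio)

def surviving_energy(start_s: float, end_s: float) -> float:
    """Fraction of the segment's energy left after low-pass filtering."""
    seg = slice(int(start_s * sr), int(end_s * sr))
    return float(np.sum(lowpassed[seg] ** 2) / np.sum(audio[seg] ** 2))

print("vowel segment     :", surviving_energy(0.20, 0.30))  # close to 1: mostly preserved
print("fricative segment :", surviving_energy(0.55, 0.65))  # close to 0: almost nothing left
```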

Still, the MelGAN discriminator helps us ensure coherence between bands, so the phenomenon you are describing should be minor. Can you provide a specific example of an audio clip/spectrogram where this happens for you?
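
If it helps, here is a minimal sketch for producing the kind of spectrogram that makes a band mismatch easy to inspect (assuming librosa and matplotlib are available; the two file names are placeholders for your own audio):

```python
# Minimal sketch: save log-magnitude spectrograms of the corrupted input and of the
# enhanced output so the low-/high-band transition can be compared side by side.
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

for name in ["corrupted_input.wav", "enhanced_output.wav"]:  # placeholder file names
    y, sr = librosa.load(name, sr=None)
    stft = librosa.stft(y, n_fft=1024, hop_length=256)
    S_db = librosa.amplitude_to_db(np.abs(stft), ref=np.max)

    plt.figure(figsize=(8, 3))
    librosa.display.specshow(S_db, sr=sr, hop_length=256, x_axis="time", y_axis="hz")
    plt.title(name)
    plt.colorbar(format="%+2.0f dB")
    plt.tight_layout()
    plt.savefig(name.replace(".wav", "_spectrogram.png"))
    plt.close()
```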
