-
Hi jhauret, thank you for your patient response. While evaluating performance, I found that this bandwidth extension network can cause errors in speech understanding, apparently because the generated mid-to-high frequency content does not match the low-frequency content. Have you noticed this issue, and what do you think causes it? Do you have any ideas for a possible solution?
-
Hi again jdwand125,

As the MUSHRA-U (ease of Understanding) scores in our article show, the network is good but not perfect, just like SEANet. The task is difficult because very little information is left in the corrupted speech signal (hence the name extreme bandwidth extension). Indeed, any trace of fricative sounds such as {f,s,ʃ,v,z,θ,ð} is absent from the low frequencies, and we have not used any language model to help us infer those sounds.

Still, the MelGAN discriminator helps us ensure coherence between bands, so the phenomenon you are describing should be minor. Can you provide a specific example of an audio/spectrogram where this happens for you?
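If it helps when putting such an example together, here is a minimal sketch (not part of the project code) of one way to quantify how much of a short excerpt's energy sits above a cutoff frequency, so that a reference clip and an enhanced clip can be compared numerically alongside their spectrograms. The function names, the 2 kHz cutoff, and the synthetic tones are assumptions chosen purely for illustration. The naive DFT only suits very short excerpts; for real audio an FFT library would be the practical choice.

```python
import cmath
import math


def high_band_energy_ratio(samples, sample_rate, cutoff_hz):
    """Return the fraction of spectral energy at or above cutoff_hz.

    Uses a naive O(n^2) DFT over the non-negative frequency bins,
    so it is only meant for short excerpts.
    """
    n = len(samples)
    low = high = 0.0
    for k in range(n // 2 + 1):
        # DFT coefficient for bin k
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        power = abs(coeff) ** 2
        # Bin k corresponds to frequency k * sample_rate / n
        if k * sample_rate / n >= cutoff_hz:
            high += power
        else:
            low += power
    total = low + high
    return high / total if total > 0 else 0.0


def tone(freq_hz, sample_rate=16000, n=256):
    """Synthetic sine tone, standing in for a real audio excerpt."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


# Hypothetical settings: 16 kHz audio, 2 kHz cutoff between "low" and "high" bands.
low_only = tone(250)    # energy only below the cutoff
high_only = tone(5000)  # energy only above the cutoff
mixed = [a + b for a, b in zip(low_only, high_only)]

for name, clip in [("low", low_only), ("high", high_only), ("mixed", mixed)]:
    print(name, round(high_band_energy_ratio(clip, 16000, 2000), 3))
```

Computing this ratio on the same segment (for example, a fricative) in both the reference and the enhanced audio gives one number per clip to report with the spectrogram.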