Mismatch between mid-to-high frequency speech and low-frequency? #2

Answered by jhauret
jdwang125 asked this question in Q&A

Hi again jdwang125,

As the MUSHRA-U (ease of Understanding) results in our article point out, the network is good but not perfect, just like SEANet. The task is difficult because very little information is left in the corrupted speech signal (hence the name extreme bandwidth extension). Indeed, any traces of fricative sounds such as {f, s, ʃ, v, z, θ, ð} are absent in the low frequencies, and we have not used any language model to help us infer those sounds.
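
To make that concrete, here is a minimal sketch (not from the paper; the file name, the 1 kHz cutoff and the segment times are placeholder assumptions) that low-pass filters a recording and compares how much energy survives in a vowel segment versus a fricative segment:

```python
# Minimal sketch: how much of a segment's energy survives a low-pass filter.
# "example_utterance.wav", the 1 kHz cutoff and the segment times are placeholders.
import numpy as np
import soundfile as sf
from scipy.signal import butter, sosfilt

audio, sr = sf.read("example_utterance.wav")                 # hypothetical mono recording
sos = butter(8, 1000, btype="lowpass", fs=sr, output="sos")  # assumed 1 kHz cutoff
lowpassed = sosfilt(sos, audio)

def surviving_energy(start_s: float, end_s: float) -> float:
    """Fraction of the segment's energy left after low-pass filtering."""
    seg = slice(int(start_s * sr), int(end_s * sr))
    return float(np.sum(lowpassed[seg] ** 2) / np.sum(audio[seg] ** 2))

print("vowel segment     :", surviving_energy(0.20, 0.30))  # close to 1: mostly preserved
print("fricative segment :", surviving_energy(0.55, 0.65))  # close to 0: almost nothing left
```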

Still, the MelGAN discriminator helps us ensure coherence between bands, so the phenomenon you are describing should be minor. Can you provide a specific example of an audio clip/spectrogram where this happens for you?
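
If it helps, here is a minimal sketch for producing the kind of spectrogram that makes a band mismatch easy to inspect (assuming librosa and matplotlib are available; the two file names are placeholders for your own audio):

```python
# Minimal sketch: save log-magnitude spectrograms of the corrupted input and of the
# enhanced output so the low-/high-band transition can be compared side by side.
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

for name in ["corrupted_input.wav", "enhanced_output.wav"]:  # placeholder file names
    y, sr = librosa.load(name, sr=None)
    stft = librosa.stft(y, n_fft=1024, hop_length=256)
    S_db = librosa.amplitude_to_db(np.abs(stft), ref=np.max)

    plt.figure(figsize=(8, 3))
    librosa.display.specshow(S_db, sr=sr, hop_length=256, x_axis="time", y_axis="hz")
    plt.title(name)
    plt.colorbar(format="%+2.0f dB")
    plt.tight_layout()
    plt.savefig(name.replace(".wav", "_spectrogram.png"))
    plt.close()
```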
