This is for a presentation about self-supervised models.
-
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units (Facebook)
https://arxiv.org/abs/2106.07447
https://huggingface.co/docs/transformers/model_doc/hubert
https://ai.facebook.com/blog/hubert-self-supervised-representation-learning-for-speech-recognition-generation-and-compression/
https://blog.devgenius.io/hubert-explained-6ec7c2bf71fc
https://jonathanbgn.com/2021/10/30/hubert-visually-explained.html (* This is very good)
https://github.com/facebookresearch/av_hubert
-
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing (Microsoft)
https://arxiv.org/abs/2110.13900
https://huggingface.co/docs/transformers/model_doc/wavlm
https://github.com/microsoft/unilm/tree/master/wavlm
-
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Facebook)
https://arxiv.org/abs/2006.11477
https://huggingface.co/masoudmzb/wav2vec2-xlsr-multilingual-53-fa
https://github.com/Hamtech-ai/wav2vec2-fa
https://jonathanbgn.com/2021/09/30/illustrated-wav2vec-2.html (* This is very good)
https://aws.amazon.com/blogs/machine-learning/fine-tune-and-deploy-a-wav2vec2-model-for-speech-recognition-with-hugging-face-and-amazon-sagemaker/
https://arxiv.org/abs/2107.13530
https://arxiv.org/abs/2104.01027 (Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training)
https://huggingface.co/blog/fine-tune-xlsr-wav2vec2 (* This is very good)
https://huggingface.co/models?arxiv=arxiv:2104.01027
https://arxiv.org/abs/2101.06699 (Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-resource Speech Recognition)
https://pytorch.org/tutorials/intermediate/speech_recognition_pipeline_tutorial.html
-
ILS-SSL: Self-Supervised Learning for Speech Recognition with Intermediate Layer Supervision (Microsoft)
https://arxiv.org/abs/2112.08778
https://github.com/microsoft/UniSpeech
-
Compare models:
https://superbbenchmark.org/leaderboard?subset=Public+Set (SUPERB benchmark)
-
Other:
https://arxiv.org/pdf/2110.05777.pdf (Large-scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification)
https://arxiv.org/pdf/2206.01685.pdf (Toward a realistic model of speech processing in the brain with self-supervised learning) (* This is important)
https://syncedreview.com/2019/02/22/yann-lecun-cake-analogy-2-0/
-
Implementation guides:
https://github.com/facebookresearch/fairseq/tree/main/examples/wav2vec
https://huggingface.co/docs/transformers/model_doc/wavlm#transformers.WavLMForXVector
https://colab.research.google.com/github/m3hrdadfi/notebooks/blob/main/Fine_Tune_XLSR_Wav2Vec2_on_Persian_ShEMO_ASR_with_%F0%9F%A4%97_Transformers_ipynb.ipynb