afshari-maryam/SelfSupervised-models-repo

SelfSupervised-Speech-models-repo

A collection of papers, blog posts, and code for a presentation on self-supervised speech models.

  1. HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units (Facebook)
    https://arxiv.org/abs/2106.07447
    https://huggingface.co/docs/transformers/model_doc/hubert
    https://ai.facebook.com/blog/hubert-self-supervised-representation-learning-for-speech-recognition-generation-and-compression/
    https://blog.devgenius.io/hubert-explained-6ec7c2bf71fc
    https://jonathanbgn.com/2021/10/30/hubert-visually-explained.html (* This is very good)
    https://github.com/facebookresearch/av_hubert

  2. WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing (Microsoft)
    https://arxiv.org/abs/2110.13900
    https://huggingface.co/docs/transformers/model_doc/wavlm
    https://github.com/microsoft/unilm/tree/master/wavlm

  3. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Facebook)
    https://arxiv.org/abs/2006.11477
    https://huggingface.co/masoudmzb/wav2vec2-xlsr-multilingual-53-fa
    https://github.com/Hamtech-ai/wav2vec2-fa
    https://jonathanbgn.com/2021/09/30/illustrated-wav2vec-2.html (* This is very good)
    https://aws.amazon.com/blogs/machine-learning/fine-tune-and-deploy-a-wav2vec2-model-for-speech-recognition-with-hugging-face-and-amazon-sagemaker/
    https://arxiv.org/abs/2107.13530
    https://arxiv.org/abs/2104.01027 (Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training)
    https://huggingface.co/blog/fine-tune-xlsr-wav2vec2 (* This is very good)
    https://huggingface.co/models?arxiv=arxiv:2104.01027
    https://arxiv.org/abs/2101.06699 (Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-resource Speech Recognition)
    https://pytorch.org/tutorials/intermediate/speech_recognition_pipeline_tutorial.html

  4. ILS-SSL: Self-Supervised Learning for Speech Recognition with Intermediate Layer Supervision (Microsoft; code in the UniSpeech repository)
    https://arxiv.org/abs/2112.08778
    https://github.com/microsoft/UniSpeech

  5. Compare models:

    https://superbbenchmark.org/leaderboard?subset=Public+Set (SUPERB benchmark)

  6. Other:
    https://arxiv.org/pdf/2110.05777.pdf (Large-scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification)
    https://arxiv.org/pdf/2206.01685.pdf (Toward a realistic model of speech processing in the brain with self-supervised learning) (* This is important)
    https://syncedreview.com/2019/02/22/yann-lecun-cake-analogy-2-0/

  7. Implementation guides:
    https://github.com/facebookresearch/fairseq/tree/main/examples/wav2vec
    https://huggingface.co/docs/transformers/model_doc/wavlm#transformers.WavLMForXVector
    https://colab.research.google.com/github/m3hrdadfi/notebooks/blob/main/Fine_Tune_XLSR_Wav2Vec2_on_Persian_ShEMO_ASR_with_%F0%9F%A4%97_Transformers_ipynb.ipynb
