Tutorial #2

Title: Deep Speaker Modeling: Theories, Applications and Practice

Presenters: Shuai Wang, Yanmin Qian, Haizhou Li

Part I: Foundations and Recent Advances (60 minutes)

  • Foundational theories and review of traditional methods in speaker modeling
  • Evolution of speaker representation techniques in the deep learning era
    • From i-vector to various deep speaker representations
    • Applications of self-supervised and semi-supervised learning in speaker modeling
    • Analysis of speaker representation capabilities in foundation speech models
    • Leveraging pretrained large models

Part II: Applications Beyond Recognition (60 minutes)

  • Speaker-adaptive speech synthesis
    • Voice cloning technologies and ethical considerations
    • Speaker representation in few-shot and zero-shot speech synthesis
  • Personalized voice conversion systems
  • Speaker perception in multimodal human-computer interaction
  • Target speaker speech processing
    • Target speaker extraction
    • Target speaker speech recognition
    • Target speaker verification
    • Personalized VAD

Part III: Challenges and Countermeasures (30 minutes)

  • Domain adaptation and domain-invariant learning
  • Privacy-preserving speaker representations
  • Robustness and adversarial attack defense
  • Computational efficiency and model compression
  • Explainability techniques

Part IV: Practical Implementation (30 minutes)

  • Introduction to tools and frameworks
    • WeSpeaker toolkit for speaker embedding learning
    • WeSep toolkit for target speech extraction
  • Case studies and demonstrations
  • Interactive discussion and Q&A session

Venue: Lotus II