Tutorial #2

Title: Deep Speaker Modeling: Theories, Applications and Practice

Presenter: Shuai Wang

Part I: Foundations and Recent Advances (60 minutes)

Foundational theories and review of traditional methods in speaker modeling
Evolution of speaker representation techniques in the deep learning era
- From i-vector to various deep speaker representations
- Applications of self-supervised and semi-supervised learning in speaker modeling
- Analysis of speaker representation capabilities in foundation speech models
- Leveraging pretrained large models

Part II: Applications Beyond Recognition (60 minutes)

Speaker-adaptive speech synthesis
- Voice cloning technologies and ethical considerations
- Speaker representation in few-shot and zero-shot speech synthesis
Personalized voice conversion systems
Speaker perception in multimodal human-computer interaction
Target speaker speech processing
- Target speaker extraction
- Target speaker speech recognition
- Target speaker verification
- Personalized voice activity detection (VAD)

Part III: Challenges and Countermeasures (30 minutes)

Domain adaptation and domain-invariant learning
Privacy-preserving speaker representations
Robustness and adversarial attack defense
Computational efficiency and model compression
Explainability techniques for speaker models

Part IV: Practical Implementation (30 minutes)

Introduction to tools and frameworks
- WeSpeaker toolkit for speaker embedding learning
- WeSep toolkit for target speech extraction
Case studies and demonstrations
Interactive discussion and Q&A session

Venue: Lotus II