Session: Voice Privacy and Security
Moderator:
- Liping Chen, University of Science and Technology of China, China

Panelists
- Tomoki Toda, Nagoya University, Japan, tomoki.v6@f.mail.nagoya-u.ac.jp
- Xin Wang, National Institute of Informatics, Japan, wangxin@nii.ac.jp
- Rohan Kumar Das, Fortemedia, Singapore, ecerohan@gmail.com
Description
Speech is among the most natural and convenient means of biometric authentication. The individual traits embedded in speech signals form the basis of speaker recognition, or voice authentication. With the widespread availability of speaker recognition and speech synthesis tools, the threat of malicious exploitation of speaker attributes is growing. For example, an attacker could retrieve a target speaker's recordings from a breached data source; even a few seconds (e.g., 3 seconds) of such speech can leak privacy-related information such as age, interests, opinions, ethics, and health status. Using voice conversion (VC) and text-to-speech (TTS) techniques, synthetic speech can then be generated to impersonate the target speaker and exploited for malicious purposes, such as damaging the speaker's reputation or manipulating public opinion. These voice privacy concerns call for techniques such as voice anonymization, speech watermarking, and anti-spoofing. In this panel, we invite world-leading experts to share their views on the security and privacy aspects of handling individual traits in speech, the challenges posed by advances in speaker recognition and neural speech synthesis, and the collaborative efforts needed to address these concerns and challenges.
Format
- 5 minutes of introduction (moderator)
- 45 minutes of presentations (panelists)
- 40 minutes of open discussion
Biographies of the Moderator and Panelists

Liping Chen is an Associate Researcher at the University of Science and Technology of China (USTC), Hefei, China. She received the Ph.D. degree in signal and information processing from USTC in 2016. From 2016 to 2022, she was a Speech Scientist with Microsoft. Her research interests include speech processing, voice privacy protection, speech synthesis, and speaker recognition.

Tomoki Toda is currently a Professor with the Information Technology Center, Nagoya University. He received the B.E. degree from Nagoya University, Japan, in 1999 and the D.E. degree from the Nara Institute of Science and Technology (NAIST), Japan, in 2003. He was a Research Fellow with the Japan Society for the Promotion of Science from 2003 to 2005. He was an Assistant Professor with NAIST from 2005 to 2011 and an Associate Professor from 2011 to 2015. His research interests include statistical approaches to speech, music, and environmental sound processing. He received the IEEE SPS 2009 Young Author Best Paper Award and the 2013 EURASIP-ISCA Best Paper Award from the Speech Communication journal. He has served as the SLP TC chair of APSIPA since 2025. He has organized several special sessions, including the Voice Conversion Challenge 2016 at INTERSPEECH 2016, the VoiceMOS Challenge 2022 at INTERSPEECH 2022, and the Singing Voice Deepfake Detection Challenge 2024 at IEEE SLT 2024.

Xin Wang is currently a JST PRESTO researcher and a Project Associate Professor at the National Institute of Informatics, Japan. He received his Ph.D. degree from SOKENDAI, Japan, in 2018. Prior to that, he earned his Master’s and Bachelor’s degrees from USTC and UESTC, China, respectively. He has been an organizer of the past three ASVspoof challenges on speech deepfake detection, as well as the VoicePrivacy challenges on speaker anonymization. He is also an appointed team member of the ISCA Special Interest Group on Security and Privacy in Speech Communication.

Rohan Kumar Das is currently a Research and Development (R&D) Manager at the Singapore division of Fortemedia. Prior to that, he was a Research Fellow at the National University of Singapore from 2017 to 2021 and a Data Scientist at KOVID Research Labs, India, in 2017. He received his Ph.D. from the Indian Institute of Technology (IIT) Guwahati. He was one of the organizers of the special sessions "The Attacker's Perspective on Automatic Speaker Verification" and "Far-Field Speaker Verification Challenge 2020" at Interspeech 2020, as well as the Voice Conversion Challenge 2020. He served as Publication Chair of the IEEE Automatic Speech Recognition and Understanding (ASRU) Workshop 2019 and as one of the chairs of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020. He is a Senior Member of IEEE and a member of ISCA and APSIPA. His research interests include speech/audio signal processing, speaker verification, anti-spoofing, social signal processing, and various applications of deep learning.