Speech technology for a safer future
“For the past decade, the focus of my research has largely been on spoofing attacks in the context of automatic speaker recognition,” Professor of Speech Technology Tomi Kinnunen says.
“Originally, of course, I was interested in computers. In the 1980s, I was into computer games, and I also learned the BASIC programming language on a Commodore 64 computer, which was popular back then. Later on, music became a hobby, which is why I’m interested in everything that has to do with sound,” Professor Tomi Kinnunen says.
“Since programming was also a hobby of mine, it was a natural choice to start studying computer science at the then University of Joensuu. In the last year of my Master’s level studies, I was able to bring these different interests together, and I wrote my Master’s thesis on automatic speaker recognition. I continued on the same topic in my postgraduate studies: I obtained my Licentiate of Philosophy degree in 2004 and my PhD in 2005, and the topic is still being actively researched. After defending my PhD, I spent two years at the Institute for Infocomm Research in Singapore. My research has mainly been funded by the Academy of Finland, and I’ve also secured one H2020 project (OCTAVE).”
Speaker recognition is used, for example, in smart speakers and smartphones (personal profiles, voice login), in telephone switchboards (is the caller who they claim to be?), in forensic voice comparison (is the person speaking the suspect?), and in access control.
“For the past decade, the focus of my research has largely been on various spoofing attacks. Among other things, we have studied the impact of replay attacks, text-to-speech synthesis and voice conversion on speaker recognition. The latter two can be used to ‘put words in someone’s mouth’, and it is becoming increasingly difficult to tell synthetic speech apart from real speech, at least by ear alone. We have also studied, for example, the impact of imitation and the effects of deliberate changes to a speaker’s voice.”
Professor Kinnunen says that in the future, you may well find yourself in a situation where you think you are getting a call from your mother, supervisor or colleague, but the caller is someone else entirely.
“I’m certain that we’ll also see more and more manipulated image, text, audio and video material on social media, and the famous deepfake videos of recent years are just the beginning. We simply must learn to live with this new reality.”
“From the viewpoint of methodological research, it is interesting to study what kinds of attacks and manipulations can be detected automatically, and how vulnerable different recognition systems are to different types of attacks,” Professor Kinnunen says.
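The question of how vulnerable a recognition system is can be illustrated with a deliberately simplified sketch. It assumes a verifier that compares fixed-length speaker embeddings using cosine similarity and a fixed decision threshold, and it measures vulnerability as the share of spoofed test utterances the verifier wrongly accepts. The toy vectors, the 0.7 threshold and the function names are all hypothetical; they do not describe the systems studied by Kinnunen’s group or the ASVspoof evaluation protocol.

```python
import math


def cosine_score(a, b):
    """Cosine similarity between two speaker embeddings (toy vectors in this sketch)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as carrying no evidence
    return dot / (norm_a * norm_b)


def accept(enrolled, test, threshold=0.7):
    """Accept the identity claim if the score reaches the (hypothetical) threshold."""
    return cosine_score(enrolled, test) >= threshold


def spoof_acceptance_rate(enrolled, spoofed_trials, threshold=0.7):
    """Fraction of spoofed test embeddings that the verifier wrongly accepts."""
    if not spoofed_trials:
        return 0.0
    accepted = sum(accept(enrolled, t, threshold) for t in spoofed_trials)
    return accepted / len(spoofed_trials)


# Toy example: one enrolled speaker, one genuine trial, two spoofed trials.
enrolled = [1.0, 0.0, 0.0]
genuine = [0.9, 0.1, 0.0]
spoofed = [[0.8, 0.2, 0.0],   # a convincing fake that lands close to the target
           [0.0, 1.0, 0.0]]   # a poor fake that the verifier rejects

print(accept(enrolled, genuine))                  # True
print(spoof_acceptance_rate(enrolled, spoofed))   # 0.5
```

A convincing synthetic or converted voice only has to push its score over the threshold, which is why separate countermeasures that ask whether the speech is human or machine-generated are studied alongside the verifier itself.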
“For instance, we have developed new machine learning-based methods for identifying synthetic and modified speech (i.e. whether the speaker is a machine or a human). I’m also one of the founders and organisers of the ASVspoof challenge, which is an internationally acknowledged research challenge. The aim is not only to discover vulnerabilities in speaker recognition technology, but also to find solutions together. The competition is open to everyone, and the related research data is openly accessible.”
“The ASVspoof competition has become well known among the field’s researchers and companies. A significant number of researchers worldwide are currently working to identify and fix recognizer vulnerabilities. But since the field is constantly evolving and changing, I’m not too keen to speculate about the future. More and more speech technology will certainly be seen in the consumer electronics sector. However, we need to be able to advance the underlying methods through basic research.”
“I think people in Finland have good IT skills. I believe that the role of machine learning and data will grow in all sectors in the future. It is always worthwhile to study computer science.”
For further information, please contact:
Professor Tomi Kinnunen, tkinnu (a) cs.uef.fi
***
Tomi Kinnunen appointed as Professor of Computer Science, especially Speech Technology, from 1 January 2021 onwards (invitation procedure)
Master of Science (Computer Science), University of Joensuu, 1999
Licentiate of Philosophy (Computer Science), University of Joensuu, 2004
Doctor of Philosophy (Computer Science), University of Joensuu, 2005
Docent (Speaker and Language Recognition), Aalto University, 2014
Most important roles:
Professor of Speech Technology, University of Eastern Finland, 2021–
Associate Professor (Tenure Track), University of Eastern Finland, 2017–2020
Assistant Professor (Tenure Track), University of Eastern Finland, 2013–2016
Visiting Scholar, National Institute of Informatics (NII), Japan, 2015–2016
Academy of Finland Postdoctoral Researcher, University of Eastern Finland, 2010–2012
Various research and teaching roles, University of Joensuu, 2007–2009
Researcher, Institute for Infocomm Research (I2R), Singapore, 2005–2007