The Bayes Centre is partnering with the BBC on two research projects to unlock the potential of data in the media and develop technology innovations:

  • SUMMA (Scalable Understanding of Multilingual Media), an EU Horizon 2020 funded Big Data project, aims to provide a media monitoring platform capable of dealing with large volumes of data across many languages and different media types. It is a five-year research partnership between the BBC and eight UK universities. 
  • SCRIPT, a three-year research and innovation project, is looking to develop synthetic voices for low-resourced languages. This project is led by the University of Edinburgh's Centre for Speech Technology Research (CSTR).

What issues are they addressing? 

Currently, editorial staff have to spend time sifting by hand through mountains of media across outlets and languages to find the information they need. Harnessing machines to do the heavy lifting in multilingual media monitoring would enable them to find what they need much more quickly and comprehensively.

Text-to-speech synthetic voices, which are sometimes used for conveying media information in low-resourced languages, can sound very artificial and be difficult to follow. Better text-to-speech synthesis technologies could result in a far more accessible and engaging service. 

How can SUMMA and SCRIPT help? 

The media landscape has become too large to maintain the traditional monitoring approach. SUMMA addresses this through the development of a scalable platform for intelligent media monitoring to detect trends and changing media behaviour and to flag breaking news events. The platform aims to automate the analysis of media streams across many languages, to aggregate and distil the content, to automatically create rich knowledge bases, and to provide visualisations to cope with this deluge of data. 

With SCRIPT, the team at CSTR, using case examples provided by the BBC, is researching the possible integration of two methods of producing synthetic voices: unit selection and deep neural network (or parametric) text-to-speech. This would combine natural-sounding voice recordings produced through unit selection with deep neural network technology to enable parametric changes to tone, pitch and speed.

Who is it going to help? 

SUMMA will make it easier for editorial experts to do their job. Monitoring the international news media is of critical importance not just to the BBC but also to news agencies, journalists and many industrial sectors, including advertising, finance and sports. Monitoring the global media, spotting trends, tracking people in the news and identifying differences in reporting on the same events is also a crucial activity for organisations with a global outlook.

SCRIPT should give users of low-resourced language information services a better service, and the technology it develops could also be applied to improve access to content across organisations and sectors.

What are the potential benefits of such partnerships? 

Such interaction between an organisation such as the BBC and data scientists has huge potential for real-world impact. As Matthew Postgate, BBC Chief Technology and Product Officer, explained: "The BBC has always been at its best when it combines creativity with technology. As we reinvent the BBC, we can see the opportunities that data and machine learning are opening up for us, our creative talent and our audiences."

Michael Rovatsos of the University of Edinburgh agreed: "Speech and language is a big part of what we do in Edinburgh. Through the School of Informatics and the new Bayes Centre for Data Science and Technology we are working with leading organisations such as the BBC to develop new interaction between people, data and systems."

Find out more about the SUMMA and SCRIPT projects.