2022-04-11 11:46
From searching audio soundbites to analysing quotes and sources, machine learning has the potential to better serve diverse audiences and create new beats
Despite being increasingly popular, audio storytelling has one important limitation: it is hard to search for soundbites that can be shared.
Let’s say that someone recommended a podcast to you. You find the hour-long episode, read the description and realise that there are three guests. So you start scrolling back and forth to find the bit you are interested in, which is pretty tedious. And once you find it and want to recommend it to someone else, there is no way to share just that part.
This was the challenge that Kajsa Norell, head of playdesk at Swedish public radio station Sveriges Radio, decided to tackle using artificial intelligence (AI). The objective was twofold: making audio more searchable and shareable, and making it more accessible to diverse audiences.
One of the most obvious applications of such technology is searching through archived material. For example, Sveriges Radio’s global news podcast recently covered new developments in the 1980s murder of the Swedish prime minister. However, the producers soon realised that some of their listeners had not even been born at the time and knew little about the case.
To help them out, the team used soundbites from archived shows that featured important information. This helped younger audiences understand the story and refreshed the memories of others.
There are many more ways journalists could work with audio snippets, like using them in another audio show or collating them into a curated playlist. Taken further, the AI system could recommend more soundbites to listen to according to users’ preferences, instead of suggesting entire shows, making audio content more engaging and relevant.
This opens up a whole new world of possibilities for audio storytelling. You could, for instance, create a playlist of what a politician said about a particular topic since the last election and show how their position changed over time, all in one audio show and with minimum effort.
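As a rough illustration of the playlist idea, the sketch below searches timestamped transcript segments for what one speaker said about a topic and builds a shareable link for each match. It is not Sveriges Radio's system: the `Segment` structure, the sample data, the `example.org` address and the `#t=start,end` link format are all assumptions made for the example, and a real system would first need speech-to-text and speaker identification to produce the segments.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed stretch of audio (hypothetical structure)."""
    episode: str   # episode identifier
    speaker: str   # who is talking
    start: float   # start time in seconds
    end: float     # end time in seconds
    text: str      # transcript of the segment


def find_soundbites(segments, speaker, keyword):
    """Return segments by `speaker` whose transcript mentions `keyword`."""
    keyword = keyword.lower()
    return [
        s for s in segments
        if s.speaker == speaker and keyword in s.text.lower()
    ]


def share_link(base_url, segment):
    """Build a link to one soundbite using a '#t=start,end' time fragment."""
    return f"{base_url}/{segment.episode}#t={segment.start:g},{segment.end:g}"


# Invented sample data, for illustration only.
segments = [
    Segment("ep-101", "Politician A", 754.0, 791.5, "We will not raise fuel taxes."),
    Segment("ep-101", "Host", 791.5, 800.0, "Let's move on to housing."),
    Segment("ep-230", "Politician A", 120.0, 162.0, "A higher fuel tax is now unavoidable."),
]

playlist = find_soundbites(segments, "Politician A", "fuel tax")
links = [share_link("https://example.org/audio", s) for s in playlist]
```

Here `playlist` holds the two statements by the same speaker in episode order, which is the raw material for showing how a position shifted. Plain keyword matching is the simplest possible choice; it would miss paraphrases that a more capable search could catch.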
With podcast and wider audio consumption growing, it is increasingly difficult to stand out. Being able to slice and dice audio and serve it to listeners in an innovative format could help many publishers better serve their audiences' interests.
Even when it comes to written text, it is not that easy to accurately identify quotes and match them with the right sources. So data scientists at The Guardian joined forces with Agence France-Presse (AFP) to find a better way to identify and attribute quotes, using machine learning.
Their objective was to enable journalists to search quotes more easily and to better understand who is being quoted.
Although the concept has yet to be built into a tool that journalists can use, it has the potential to help them search and analyse quotes within articles. This could help create new beats focused on accountability reporting, such as tracking how public figures’ opinions change over time by searching and comparing their quotes from archives.
Another use could be helping editors track the diversity of sources quoted in a publication. However, hard-coding gender is tricky: technology is generally pretty good at identifying binary gender, but it runs into problems with pronouns like "they", which can refer either to several people or to one non-binary person.
Ethnicity is another problematic area: a source’s name alone is not enough to attribute ethnicity, so the author would need to add more metadata for the model to report it accurately.
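To see why this is harder than it sounds, consider a naive rule-based approach. The sketch below is not the method used by The Guardian and AFP, which relied on machine learning; it is a deliberately simple regular expression, with an invented sample sentence and a hypothetical `extract_quotes` helper, that only catches the pattern of a quote followed by "says" or "said" and a capitalised name.

```python
import re

# Matches: "quote," says|said Name Surname   (straight double quotes only)
QUOTE_THEN_SPEAKER = re.compile(
    r'"(?P<quote>[^"]+)"\s*,?\s*(?:says|said)\s+'
    r'(?P<speaker>[A-Z][\w-]+(?:\s+[A-Z][\w-]+)*)'
)


def extract_quotes(text):
    """Return (speaker, quote) pairs for the one pattern the regex knows."""
    return [
        (m.group("speaker"), m.group("quote").rstrip(","))
        for m in QUOTE_THEN_SPEAKER.finditer(text)
    ]


# Invented sample text, for illustration only.
sample = (
    '"The budget is balanced," said Jane Doe. '
    '"We will publish the figures," she added.'
)

pairs = extract_quotes(sample)
```

On this sample the rule finds the first quote and attributes it to Jane Doe, but it silently drops the second one, because working out who "she" is requires resolving the pronoun against the earlier sentence. Cases like this are one reason a learned model is attractive for the task.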
Diversity data is interesting not only when analysing quoted sources but also bylines, which could help editors pay closer attention to the representation of authors.
"We are very proud of the annotation guide we created," says Anna Vissens, lead data scientist at The Guardian. "It is open source and it gives you very good insights into what it takes to build a machine learning model like this."
Both projects were part of Collab Challenges, a series of experiments organised by JournalismAI, a project of Polis, the journalism think tank at the London School of Economics and Political Science (LSE), and were supported by the Google News Initiative.