2022-01-29 05:21
Journalistic work often depends on transcription services for creating written logs of recorded audio, from assisting in research to captioning videos to publishing interviews. But uploading audio to a transcription service means handing a copy of that recording, which is sometimes sensitive, to a company. While no single service meets all of our data privacy needs, here we unpack the security and privacy practices of popular transcription services, weigh when journalists should use remote transcription services, and explore how to minimize risk when working with sensitive audio.
In an informal poll, we heard from more than 50 journalists about their favorite transcription services and what they liked about each. Given the necessity of transcription, we emailed and researched the top five most referenced services — Descript, Otter.ai, Rev, Temi, and Trint — and asked: How safe are these services and what are they doing to protect your data, private recordings, and transcripts?
We wanted to better understand:
Many of the security features of popular transcription services are nearly identical. They all use standard Transport Layer Security (TLS) encryption to protect your traffic to and from the website, as well as AES-256 encryption to store data safely on their Amazon Web Services servers. That means that your transcript is less likely to be compromised by an attacker outside of the organization. However, it also means that the company itself has the technical ability to access the audio you’ve uploaded. Likewise, to provide these services the companies also have the technical ability to read the transcripts the company created and stored, and they each have a different policy surrounding when it’s necessary to do so.
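To make the point about key ownership concrete, here is a minimal Python sketch of the paragraph above. It is an illustration only and makes several assumptions: the cipher is a toy keystream built from SHA-256, standing in for AES-256 so the example needs nothing beyond the standard library; the names `provider_key`, `stored`, and `toy_cipher` are hypothetical; and none of this reflects how any of the five services actually implements its storage.

```python
import hashlib
import secrets


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream from SHA-256(key + nonce + counter). NOT AES-256; illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with the keystream. The same call both encrypts and decrypts."""
    stream = keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


# Hypothetical provider: it generates the 256-bit key and keeps it.
provider_key = secrets.token_bytes(32)
nonce = secrets.token_bytes(16)

audio = b"sensitive interview audio"
stored = toy_cipher(provider_key, nonce, audio)  # what sits on the server

# An outside attacker who copies the stored bytes but lacks the key
# recovers nothing useful.
outsider_view = toy_cipher(secrets.token_bytes(32), nonce, stored)

# The provider, holding the key, can always recover the original.
provider_view = toy_cipher(provider_key, nonce, stored)

print(provider_view == audio)
print(outsider_view == audio)
```

The uploader never holds the key in this arrangement, which is why encryption at rest protects against outsiders but not against access by the company itself.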
Even if you’re OK with people inside the company, or those who are contracting for the company, accessing your data, you may feel differently about third-party requests for data. Unfortunately, none of the five services we looked at offers a transparency report, so there’s no way to know how often they receive law enforcement requests for user data or how often they disclose data in response. We would love to see all major transcription service providers join the growing list of organizations that publish a transparency report, so we can include that information when weighing privacy concerns.
Descript says in its security documentation that while it has the technical capability, it commits to not look at users’ data except in certain circumstances, such as when processing specific computer-generated voices or when a user has requested a review from a customer service representative. Users can also opt in to share information to help improve the service.
Behind the scenes, Descript is using a small handful of services to process transcripts. Descript uses Google Cloud Speech-to-Text to provide automatic transcription. Google deletes your data from its servers after the transcription is completed. According to its documentation, Descript also uses Rev to provide automatic or human transcription. Descript says: “If you request a White Glove transcription, we will share your audio files with Rev, which has strict confidentiality agreements with all of its employees.” (See the Rev section below for our notes on that agreement and its caveats.)
Descript offers a powerful feature called Overdub, which allows users to insert realistic computer-generated voices into the transcript. To accomplish this, Descript uses Google Cloud to process and reproduce your voice. Descript will generate “non-defamatory” samples of your voice, and human reviewers on Amazon’s Mechanical Turk will listen to this sample audio to confirm it sounds right. Descript says its employees may also review the uploaded audio, as well as the computer-generated output audio, for quality assurance.
Descript does not offer two-factor authentication.
Like the other transcription services outlined here, Otter.ai encrypts its data on Amazon Web Services servers and holds the encryption keys. Its privacy policy suggests it uses this access to provide the service and to train its transcription artificial intelligence (AI) with collective “de-identified” audio recordings. Otter.ai’s privacy policy says: “Only with your explicit permission will we manually review certain audio recordings to further refine our model training data.”
Otter.ai tells Freedom of the Press Foundation in an email: “We do not sell or share your data with third parties, nor access your data without your explicit permission. You also have full control to delete your conversations. Deleting a conversation permanently deletes it from Otter’s servers, and can’t be undone.”
The company’s security white paper says only two administrators have access to its database “as required by their job function.” We asked Otter.ai for further clarification about job functions that could merit access to user data, but had not heard back by the time of publication.
Otter.ai’s security white paper suggests it does not rely on third-party services to process audio or transcripts, only to store user data.
Two-factor authentication is supported only on Business and Enterprise accounts; free accounts and paid Pro accounts cannot use this important security feature.
Rev offers both automatic and human transcription. According to Rev’s security documentation, employees are “restricted to handle data required to perform their job. Our staff is trained on proper use of our systems and best practices for security and privacy.”
However, the circumstances under which employees may access user data are unclear. We reached out to Rev for more details, but did not receive a response by the time of publication.
Rev’s security documentation suggests it does not rely on third parties to automate transcription, but instead relies on its more than 60,000 freelance manual transcriptionists known as “Revvers.”
Rev requires strict confidentiality agreements, and following a 2019 report by OneZero, Rev now prevents transcriptionists from downloading customer audio.
Rev is the only service of this group that offers two-factor authentication to all users.
Temi is an audio-to-text transcription service that uses advanced speech recognition software. Temi is operated by Rev, which is why the two services have virtually identical privacy policies and similar security properties. Temi does not appear to use any third-party processing services. Unlike Rev, Temi does not offer human transcriptions and says on its website: “Files are transcribed by machines and are never seen by a human.”
Temi does not offer two-factor authentication.
Trint is an AI-based transcription service for both audio and video files, popular with videographers because of its integration with the video editor Adobe Premiere Pro, among other features.
Trint’s documentation is clear about its security measures. It does have the ability to decrypt users’ transcription and audio data, though it affirmatively commits to not doing so except in unusual cases, and only with written consent from a client.
Trint’s platform privacy policy says that it relies on MongoDB Atlas, a cloud database. While MongoDB’s employees can technically access data uploaded to Trint, the cloud database service has policies and controls to constrain such access to a small group of engineers, “only to ensure the reliability of service.”
Trint also uses a service called Transloadit, which helps upload and process files. Transloadit commits to store processed files for 24 hours before purging them, adding: “Transloadit employees only look at your files to troubleshoot problems. This rarely happens, and when it does, we do so with the understanding that anything we see is to be kept strictly confidential.”
Trint does not offer two-factor authentication.
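For quick comparison, the two-factor authentication findings reported above for the five services can be collected in one place. The Python sketch below simply restates what this article found at the time of reporting; the structure and the function names are our own illustration, and the services may have changed their offerings since.

```python
# Two-factor authentication (2FA) availability as reported in this article.
# Values restate the article's findings; check each service for current status.
TWO_FACTOR_SUPPORT = {
    "Descript": "none",
    "Otter.ai": "Business and Enterprise accounts only",
    "Rev": "all users",
    "Temi": "none",
    "Trint": "none",
}


def services_with_2fa_for_everyone(support: dict) -> list:
    """Return services that offer 2FA to every user, in alphabetical order."""
    return sorted(name for name, level in support.items() if level == "all users")


def services_without_2fa(support: dict) -> list:
    """Return services that offer no 2FA at all, in alphabetical order."""
    return sorted(name for name, level in support.items() if level == "none")


print("2FA for all users:", services_with_2fa_for_everyone(TWO_FACTOR_SUPPORT))
print("No 2FA:", services_without_2fa(TWO_FACTOR_SUPPORT))
```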
We recommend avoiding transcription services altogether if your audio, in the wrong hands, could put people at risk. However, there are some situations where a transcription service is a necessary or easier choice, like when transcribing an interview that will be published in full. Because these services usually have access to your audio and transcripts, journalists must still make subjective decisions about when to share files with a transcription service.
Things to think about before uploading:
If you are working with sensitive material (recordings that could put someone at risk if they were made public or turned over to authorities), we suggest severely limiting who has access to that data. If you are working with an editor or on a deadline and your recordings are particularly sensitive, consider pushing for help with hand transcription or for a later deadline, citing the need to keep the recordings secure.
If you are working with material that’s sensitive until it’s published — embargoed research, for example — the risk of leaks is low and the outcome would be harmful, but not catastrophic. In this instance, using a service makes more sense, though you’d still want to use one with good privacy and security protections.
Consider using AI instead of human transcription. Although files can still be leaked through AI transcription services, you might feel more comfortable knowing a human doesn’t need to listen in.
Transcription services can be an important part of the journalistic workflow, but the services available today may introduce risk.
Take a few steps to use transcription services safely:
There are some alternative transcription tools that avoid sharing a transcript with a third party, but none are perfect and the quality can be inconsistent. Gentle can run on your own computer, and there are a number of “offline-friendly” transcription apps if you have a compatible device. Google Recorder only works on Google Pixel phones, while Ada Dictation only works on Apple devices. These are just a few examples; there are thousands of these tools out there, each with its own tradeoffs.
Whatever service you choose, glance through its security documentation to make sure there aren’t any unwanted surprises. In particular, look for policies surrounding the circumstances under which employees may access data and how the organization stores its data. If possible, opt for services that enable two-factor authentication. You can also reach out to our digital security team if you need help.
This story was originally published on the website of the Freedom of the Press Foundation. It is reprinted here using the group’s Creative Commons 4.0 license.
Martin Shelton is the principal researcher at Freedom of the Press Foundation, conducting user research and overseeing security editorial. As a UX researcher he previously worked with Google Chrome and the Coral Project at The New York Times. He earned his Ph.D. at the University of California at Irvine.
Yael Grauer is an investigative tech reporter covering digital privacy and security for Consumer Reports. She is the lead content creator of CR Security Planner and has covered topics such as police surveillance, clandestine trackers, security vulnerabilities, VPNs, and hacking for publications including The Intercept, Popular Science, Vice, Wirecutter, and WIRED.