| 
		Researchers say an AI-powered transcription tool used in hospitals 
		invents things no one ever said
		 Send a link to a friend 
		
		 [October 26, 2024]  By 
		GARANCE BURKE and HILKE SCHELLMANN 
		SAN FRANCISCO (AP) — Tech behemoth OpenAI has touted its artificial 
		intelligence-powered transcription tool Whisper as having near “human 
		level robustness and accuracy.”
 But Whisper has a major flaw: It is prone to making up chunks of text or 
		even entire sentences, according to interviews with more than a dozen 
		software engineers, developers and academic researchers. Those experts 
		said some of the invented text — known in the industry as hallucinations 
		— can include racial commentary, violent rhetoric and even imagined 
		medical treatments.
 
 Experts said that such fabrications are problematic because Whisper is 
		being used in a slew of industries worldwide to translate and transcribe 
		interviews, generate text in popular consumer technologies and create 
		subtitles for videos.
 
 More concerning, they said, is a rush by medical centers to utilize 
		Whisper-based tools to transcribe patients’ consultations with doctors, 
		despite OpenAI’ s warnings that the tool should not be used in 
		“high-risk domains.”
 
 The full extent of the problem is difficult to discern, but researchers 
		and engineers said they frequently have come across Whisper’s 
		hallucinations in their work. A University of Michigan researcher 
		conducting a study of public meetings, for example, said he found 
		hallucinations in 8 out of every 10 audio transcriptions he inspected, 
		before he started trying to improve the model.
 
 A machine learning engineer said he initially discovered hallucinations 
		in about half of the over 100 hours of Whisper transcriptions he 
		analyzed. A third developer said he found hallucinations in nearly every 
		one of the 26,000 transcripts he created with Whisper.
 
 The problems persist even in well-recorded, short audio samples. A 
		recent study by computer scientists uncovered 187 hallucinations in more 
		than 13,000 clear audio snippets they examined.
 
 That trend would lead to tens of thousands of faulty transcriptions over 
		millions of recordings, researchers said.
 
		
		 
		Such mistakes could have “really grave consequences,” particularly in 
		hospital settings, said Alondra Nelson, who led the White House Office 
		of Science and Technology Policy for the Biden administration until last 
		year.
 “Nobody wants a misdiagnosis,” said Nelson, a professor at the Institute 
		for Advanced Study in Princeton, New Jersey. “There should be a higher 
		bar.”
 
 Whisper also is used to create closed captioning for the Deaf and hard 
		of hearing — a population at particular risk for faulty transcriptions. 
		That's because the Deaf and hard of hearing have no way of identifying 
		fabrications are “hidden amongst all this other text," said Christian 
		Vogler, who is deaf and directs Gallaudet University’s Technology Access 
		Program.
 
 OpenAI urged to address problem
 The prevalence of such hallucinations has led experts, advocates and 
		former OpenAI employees to call for the federal government to consider 
		AI regulations. At minimum, they said, OpenAI needs to address the flaw.
 
 “This seems solvable if the company is willing to prioritize it,” said 
		William Saunders, a San Francisco-based research engineer who quit 
		OpenAI in February over concerns with the company's direction. “It’s 
		problematic if you put this out there and people are overconfident about 
		what it can do and integrate it into all these other systems.”
 
		
		 
		An OpenAI spokesperson said the company continually studies how to 
		reduce hallucinations and appreciated the researchers' findings, adding 
		that OpenAI incorporates feedback in model updates.
 While most developers assume that transcription tools misspell words or 
		make other errors, engineers and researchers said they had never seen 
		another AI-powered transcription tool hallucinate as much as Whisper.
 
 Whisper hallucinations
 
 The tool is integrated into some versions of OpenAI’s flagship chatbot 
		ChatGPT, and is a built-in offering in Oracle and Microsoft’s cloud 
		computing platforms, which service thousands of companies worldwide. It 
		is also used to transcribe and translate text into multiple languages.
 
 In the last month alone, one recent version of Whisper was downloaded 
		over 4.2 million times from open-source AI platform HuggingFace. Sanchit 
		Gandhi, a machine-learning engineer there, said Whisper is the most 
		popular open-source speech recognition model and is built into 
		everything from call centers to voice assistants.
 
 [to top of second column]
 | 
            
			 
            Assistant professor of information science Allison Koenecke, an 
			author of a recent study that found hallucinations in a 
			speech-to-text transcription tool, works in her office at Cornell 
			University in Ithaca, N.Y., Friday, Feb. 2, 2024. The text preceded 
			by "#Ground truth" shows what was actually said while the sentences 
			preceded by ""text"" was how the transcription program interpreted 
			the words. (AP Photo/Seth Wenig) 
            
			 Professors Allison Koenecke of 
			Cornell University and Mona Sloane of the University of Virginia 
			examined thousands of short snippets they obtained from TalkBank, a 
			research repository hosted at Carnegie Mellon University. They 
			determined that nearly 40% of the hallucinations were harmful or 
			concerning because the speaker could be misinterpreted or 
			misrepresented.
 In an example they uncovered, a speaker said, “He, the boy, was 
			going to, I’m not sure exactly, take the umbrella.”
 
 But the transcription software added: “He took a big piece of a 
			cross, a teeny, small piece ... I’m sure he didn’t have a terror 
			knife so he killed a number of people.”
 
 A speaker in another recording described “two other girls and one 
			lady.” Whisper invented extra commentary on race, adding "two other 
			girls and one lady, um, which were Black.”
 
 In a third transcription, Whisper invented a non-existent medication 
			called “hyperactivated antibiotics.”
 
 Researchers aren’t certain why Whisper and similar tools 
			hallucinate, but software developers said the fabrications tend to 
			occur amid pauses, background sounds or music playing.
 
 OpenAI recommended in its online disclosures against using Whisper 
			in “decision-making contexts, where flaws in accuracy can lead to 
			pronounced flaws in outcomes.”
 
 Transcribing doctor appointments
 
 That warning hasn’t stopped hospitals or medical centers from using 
			speech-to-text models, including Whisper, to transcribe what’s said 
			during doctor’s visits to free up medical providers to spend less 
			time on note-taking or report writing.
 
 Over 30,000 clinicians and 40 health systems, including the Mankato 
			Clinic in Minnesota and Children’s Hospital Los Angeles, have 
			started using a Whisper-based tool built by Nabla, which has offices 
			in France and the U.S.
 
 That tool was fine tuned on medical language to transcribe and 
			summarize patients’ interactions, said Nabla’s chief technology 
			officer Martin Raison.
 
 Company officials said they are aware that Whisper can hallucinate 
			and are mitigating the problem.
 
 It’s impossible to compare Nabla’s AI-generated transcript to the 
			original recording because Nabla’s tool erases the original audio 
			for “data safety reasons,” Raison said.
 
 Nabla said the tool has been used to transcribe an estimated 7 
			million medical visits.
 
 Saunders, the former OpenAI engineer, said erasing the original 
			audio could be worrisome if transcripts aren't double checked or 
			clinicians can't access the recording to verify they are correct.
 
 “You can't catch errors if you take away the ground truth,” he said.
 
 Nabla said that no model is perfect, and that theirs currently 
			requires medical providers to quickly edit and approve transcribed 
			notes, but that could change.
 
 Privacy concerns
 
 Because patient meetings with their doctors are confidential, it is 
			hard to know how AI-generated transcripts are affecting them.
 
 A California state lawmaker, Rebecca Bauer-Kahan, said she took one 
			of her children to the doctor earlier this year, and refused to sign 
			a form the health network provided that sought her permission to 
			share the consultation audio with vendors that included Microsoft 
			Azure, the cloud computing system run by OpenAI’s largest investor. 
			Bauer-Kahan didn't want such intimate medical conversations being 
			shared with tech companies, she said.
 
 “The release was very specific that for-profit companies would have 
			the right to have this,” said Bauer-Kahan, a Democrat who represents 
			part of the San Francisco suburbs in the state Assembly. “I was like 
			‘absolutely not.’”
 
 John Muir Health spokesman Ben Drew said the health system complies 
			with state and federal privacy laws.
 ___
 
 Schellmann reported from New York.
 
			
			All contents © copyright 2024 Associated Press. All rights reserved 
			
			 |