AI scribes outperform human scribe in RACGP study


Yes, it’s a small study, and yes, it has other limitations, but it’s an indication that when rating AI scribes we need to compare them to real-world settings, not perfection.


A small, exploratory study comparing four AI scribes with a human scribe has produced results that may not please doctors reluctant to take the leap across the digital frontier: the AI scribes outperformed the human scribe on some important measures.

Dr Darran Foo, a GP and co-deputy chair of the RACGP’s digital health and innovation SIG, conducted the study and presented the results at the AIDH’s HIC2025 conference in Melbourne today.

The study compared four AI scribes with a human scribe, each used in a simulated general practice consultation. The outputs from the five consultations were deidentified and then given to three blinded GP raters, who assessed each output against video recordings of the consultations.

“The GP raters rated the human documentation outputs lower, in general, than the scribes,” said Dr Foo.

“This wasn’t a statistically significant difference, but what we did see was that when we looked at each domain, there were specific domains that had statistically significant differences.”

The scribes, both human and AI, were rated across the domains of accuracy, thoroughness, usefulness, organisation, comprehensibility, succinctness, synthesisation, internal consistency, freedom from hallucination, and freedom from bias.

The outputs were defined as being “free from hallucination” if they only contained information verifiable by the transcript.

“Each scribe had different strengths and weaknesses as read by the independent GP raters,” said Dr Foo.

“So for example, [one] scribe didn’t do so well in accuracy and fairness, but was really quite succinct and organised, and [another] was very accurate, very thorough, but it got lower in terms of success in organisation.

“We saw that there were statistically significant differences.”

Interestingly, the human outputs were not completely free from hallucinations, Dr Foo said.

“Part of that is obviously the definition we used – a hallucination was an output that wasn’t said in the consult. And we know as clinicians, there’s a lot of implicit and internalisation that happens during the consult,” he said.

“In a conversation with the patient, you may not say some things, but you might write it in your notes. We know that happens.”

Dr Foo pointed out that it was important to ask what the AI scribes were being compared against.

“[Our results] challenge the implicit assumption that humans, human outputs and notes are the gold standard,” he said.

“It really isn’t. We make lots of mistakes as well.”

Dr Foo cited a Perspective article by Dr Isaac Kohane, published in the NEJM in October 2024, titled “Measuring AI Against the Health Care We Have”.

“It communicates it eloquently in the sense that, when we’re measuring AI implementation, it’s important that we choose the right outcomes to be measured against, and measured against the healthcare and the outcomes that we have now, and not what we wish we have,” said Dr Foo.

“We obviously wish we had 100% error-free complete documentation.

“But it’s not the reality, and it’s not what the current system is operating at.

“We just have to be aware and very conscious of what outcomes we choose to measure AI against across all the AI tools that we talk about.”

Dr Foo said there was an urgent need for more real-world implementation studies on AI scribes.

“I want to see all the studies in community general practice really validate our findings beyond the simulated environment,” he said.

“We need to start investigating the impact of our scribes and other tools, beyond just the quality of their outputs, but also how they impact on workflow efficiencies, and more importantly, how it impacts clinical decision-making and how we’re going to manage things like cognitive deskilling, automation bias, etc, while keeping in mind the Quintuple Aim.”

HIC2025 is on in Melbourne on 18, 19 and 20 August.