Pro@programming.dev to Fuck AI@lemmy.world · English · 4 days ago

Top AI models fail spectacularly when faced with slightly altered medical questions

jamanetwork.com

178

cross-posted from: https://programming.dev/post/36289727

Comments

Reddit.

Our findings reveal a robustness gap for LLMs in medical reasoning, demonstrating that evaluating these systems requires looking beyond standard accuracy metrics to assess their true reasoning capabilities.[6] When forced to reason beyond familiar answer patterns, all models demonstrate declines in accuracy, challenging claims of artificial intelligence’s readiness for autonomous clinical deployment.

A system dropping from 80% to 42% accuracy when confronted with a pattern disruption would be unreliable in clinical settings, where novel presentations are common. The results suggest that these systems are more brittle than their benchmark scores suggest.


hades@feddit.uk · 8 points · 4 days ago
flipping a coin fails spectacularly at making any decisions other than what to have for dinner
- EnsignWashout@startrek.website · 8 points · 4 days ago
  You’ve summarized the value of current generation AI well.
  
  It excels exactly when the result doesn’t matter in the slightest.

Fuck AI@lemmy.world

fuck_ai@lemmy.world


“We did it, Patrick! We made a technological breakthrough!”

A place for all those who loathe AI to discuss things, post articles, and ridicule the AI hype. Proud supporter of working people. And proud booer of SXSW 2024.

Visibility: Public

This community can be federated to other instances and be posted/commented in by their users.

881 users / day
2.36K users / week
6.21K users / month
6.77K users / 6 months
1 local subscriber
3.84K subscribers
466 Posts
4.02K Comments