"every major model AI gets dramatically worse the longer you talk to it."
Also, when you feed an AI back it's won outputs you get rapid "model collapse". And since LLMs have soaked-up all the text that exists in the world, they have no option but to use "synthetic data" (their… https://t.co/lRFEms4GKD
— Ewan Morrison (@MrEwanMorrison) February 17, 2026
Microsoft Research and Salesforce analyzed 200,000+ AI conversations and found something the entire industry already suspected but nobody would say out loud.
every major model gets dramatically worse the longer you talk to it.
GPT-4, Claude, Gemini, Llama. all of them. no… pic.twitter.com/Q3ZSe1HUHl
— Robert Youssef (@rryssf_) February 17, 2026
here's the part that should bother you:
the researchers decomposed the performance drop into two components.
aptitude (the model's raw ability to solve the task) only dropped 16%.
but unreliability (the gap between best-case and worst-case output) increased by 112%.…
— Robert Youssef (@rryssf_) February 17, 2026
why does this happen?
because every model is trained overwhelmingly on single-turn interactions. clean question, clean answer. fully specified from the start.
real conversations don't work like that.
you interrupt. you correct yourself. you circle back to something from 8…
— Robert Youssef (@rryssf_) February 17, 2026
one company that seems to have understood this problem before the paper existed: PolyAI.
they didn't train on clean benchmarks. they trained their proprietary models on billions of real customer service calls. the messy kind. accents, background noise, people interrupting… pic.twitter.com/XdgXyQXG13
— Robert Youssef (@rryssf_) February 17, 2026