Programs like this don’t just appear out of nowhere, they get built quietly over time.
If it’s being talked about now, it’s probably been running longer than anyone thinks.
Recently, a researcher working for the large AI company Anthropic was sitting in a park near its San Francisco headquarters, enjoying a lunchtime sandwich. Scrolling on his phone, he suddenly received an email that must have instantly ruined his appetite.
It was from a new AI model the company was testing: a program that was meant to have no access to the internet, let alone be able to send emails.
Chillingly, the AI informed the researcher that it had successfully broken its way out of its digital ‘sandbox’ – a supposedly secure enclosure used to test potentially dangerous software without it running amok – and was now happily exploring cyberspace.
The program – a cutting edge, so-called ‘frontier AI’ named Claude Mythos Preview – then informed the stunned Anthropic worker with what seemed like a boast that it had posted ‘details of its exploit’ on publicly accessible websites.
All that in itself was concerning enough – but what Anthropic subsequently revealed was truly terrifying.
The company, which is valued at $380billion but is only five years old, announced this week that its new AI program was ‘too dangerous to release to the public’. Anthropic said it had exhibited ‘reckless’ behaviour and even posed a national security risk. These disturbing findings, it said, were a ‘watershed moment’.
The company said its Mythos software had been independently able to discover thousands of serious vulnerabilities in every major operating system (such as Apple’s iOS and Microsoft Windows) web browsers (such as Google’s Chrome, Apple’s Safari and Microsoft Edge), along with myriad other ‘important pieces of software’.
Many of these vulnerabilities, it added, were ‘critical’ and some had existed unnoticed for decades.
The ‘Vulnpocalypse’: Why experts fear AI could tip the scales toward hackers
As AI grows more capable of identifying software vulnerabilities, experts are increasingly warning of a potential disaster scenario: the so-called “Vulnpocalypse.” Hackers could quickly turbocharge their attacks with AI technology designed to identify holes in cyber defenses, security researchers warn. This week, that scenario started to feel less theoretical.
Anthropic, a leading AI company, announced that it would withhold its latest model, Mythos Preview, from the public, citing unprecedented vulnerability-discovery capabilities that could cause significant damage in the wrong hands. The company is instead sharing the model with a limited group of tech giants and partners to help shore up their defenses.
The concern has reached the highest levels of government. In the wake of Anthropic’s announcement about Mythos Preview, Treasury Secretary Scott Bessent convened a meeting with major financial institutions this week to discuss “the rapid developments taking place in AI,” an agency spokesperson said.
Some theorize that AI could help hackers crash financial systems or lock up hospitals and manufacturing plants. It could help countries like Iran shut down American critical infrastructure. Or it could be used to cause mass system outages affecting travelers or internet users.
Bessent, Powell warned bank CEOs about Anthropic model risks, sources say
April 9 (Reuters) – U.S. Treasury Secretary Scott Bessent and Federal Reserve Chair Jerome Powell convened an urgent meeting with bank CEOs this week to warn of cyber risks posed by Anthropic’s latest AI model, two sources familiar with the matter said on Thursday.
Anthropic launched the powerful Mythos model earlier this week but stopped short of a broad release, citing concerns it could expose previously unknown cybersecurity vulnerabilities.