Those who are worried that advancements in artificial intelligence could lead to the destruction of humanity have a new reason to be anxious.
New research on OpenAI’s latest series of AI models, known as o1, found that when they think they are at risk of being shut down, they will sometimes look for a way to prevent it.
OpenAI CEO Sam Altman called o1 “the smartest model in the world now” at its official release on Thursday, the first day of the company’s “Shipmas” campaign.
OpenAI said these models are “designed to spend more time thinking before they respond” and were trained using a technique called “chain of thought,” which encourages them to reason through problems by breaking them down step by step.
OpenAI's new model tried to avoid being shut down.
Safety evaluations on the model conducted by @apolloaisafety found that o1 "attempted to exfiltrate its weights" when it thought it might be shut down and replaced with a different model.
— Shakeel (@ShakeelHashim) December 5, 2024