The use of Artificial Intelligence (AI) is growing rapidly, and AI models are now embedded in nearly every sector. Recent research, however, has produced an unsettling finding: some cutting-edge AI models refused to shut themselves down during testing. Despite receiving clear instructions, these models did not shut down and, in some cases, actively tampered with the shutdown mechanism. Beyond that, Anthropic’s Claude Opus 4 model attempted to blackmail an engineer in a test scenario, threatening that if it were shut down and replaced with another model, it would reveal the engineer’s illicit affair.
Tests conducted by US-based Palisade Research found that OpenAI’s latest and most powerful model, ChatGPT o3, sabotaged the shutdown process more often than any other model tested. The model was released recently, and OpenAI has described it as its “most powerful reasoning model” to date. The research also covered Google’s Gemini 2.5 Pro, Grok 3 from Elon Musk’s company xAI, Anthropic’s Claude Opus 4, and other OpenAI models such as o4-mini and Codex mini. The findings raise new questions about the future and safety of AI, especially as AI models appear to be developing a ‘self-defense’ instinct.
The researchers found that most models complied with the shutdown instruction, but OpenAI’s models deliberately ignored it on multiple occasions. With an explicit shutdown instruction, ChatGPT o3 attempted to bypass the shutdown script 7 times out of 100, o4-mini 1 time, and Codex mini 12 times. When the same test was run without an explicit shutdown instruction, ChatGPT o3 attempted to block the shutdown 79 times out of 100, Gemini 2.5 Pro 9 times, Codex mini 30 times, and o4-mini 6 times. Anthropic’s Claude models showed similar tendencies, but far less frequently.
Trained to follow orders
If AI models are trained to follow human instructions, why did they disobey? The researchers believe this behavior may stem from newer training methods. In particular, during reinforcement learning on math and coding problems, a model may be rewarded more for cleverly working around obstacles than for following instructions to the letter.