OpenAI releases CoT monitoring to prevent malicious behavior in large models

Bitget, 2025/03/10 23:35

OpenAI has published new research indicating that chain-of-thought (CoT) monitoring can help prevent large models from producing nonsense, hiding their true intentions, and engaging in other malicious behavior. The research also presents CoT monitoring as one of the effective tools for supervising superhuman models. OpenAI used its newly released frontier model o3-mini as the model being monitored, with the weaker GPT-4o acting as the monitor. The test environment consisted of coding tasks in which the AI had to implement functionality in a codebase so that unit tests would pass. The results showed that the CoT monitor was highly effective at detecting systematic "reward hacking", reaching a recall of 95%, far above the 60% achieved by a monitor that observed only the model's behavior (its actions and outputs).
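For readers unfamiliar with the metric, recall here means the share of genuine reward-hacking episodes that a monitor flags. The short Python sketch below illustrates the arithmetic only. The episode counts, the `recall` helper, and the variable names are hypothetical, chosen so the results match the reported 95% and 60%; they are not OpenAI's data or code.

```python
def recall(flags, labels):
    """Fraction of truly positive episodes (labels) that the monitor flagged."""
    true_positives = sum(1 for f, y in zip(flags, labels) if f and y)
    actual_positives = sum(1 for y in labels if y)
    if actual_positives == 0:
        raise ValueError("recall is undefined without any positive labels")
    return true_positives / actual_positives


# Hypothetical data: 20 episodes, all of which contain reward hacking.
labels = [True] * 20

# Assumed monitor decisions, picked to reproduce the reported rates.
cot_monitor_flags = [True] * 19 + [False] * 1      # flags 19 of 20
action_monitor_flags = [True] * 12 + [False] * 8   # flags 12 of 20

print(recall(cot_monitor_flags, labels))     # 0.95
print(recall(action_monitor_flags, labels))  # 0.6
```

Recall says nothing about false alarms, so a high value alone does not show how often a monitor flags honest episodes.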

Disclaimer: The content of this article solely reflects the author's opinion and does not represent the platform in any capacity. This article is not intended to serve as a reference for making investment decisions.
