Anthropic details cybersecurity safeguards for the Fable 5 model and proposes an AI "jailbreak" severity assessment framework.
Anthropic has proposed a framework for assessing the severity of artificial intelligence "jailbreaks" and has detailed the cybersecurity safeguards adopted for the Claude Fable 5 model, which has launched globally. The company sorts cybersecurity-related usage scenarios into four categories, ranging from "prohibited use" to "benign use". Prohibited uses include ransomware development, malware development, and sabotage of cyber-physical infrastructure. Until more comprehensive controls are in place, high-risk "dual-use" activities such as penetration testing will be blocked. To rate the severity of cyber jailbreaks, Anthropic has introduced the Cyber Jailbreak Severity (CJS) system, a five-level scale running from CJS-0 to CJS-4 and based on four dimensions. Anthropic has also launched a HackerOne program that invites security researchers to submit potential model jailbreaks.

