Tuesday, November 12, 2024

BEAST AI Jailbreak Language Models Within 1 Minute With High Accuracy


Malicious hackers sometimes jailbreak language models (LMs), exploiting flaws in these systems to carry out a range of illicit activities.

These attacks are also driven by the aim of gathering classified information, introducing malicious material, and tampering with the model’s integrity.

The following cybersecurity researchers from the University of Maryland, College Park, USA, showed that their BEAST attack can jailbreak language models within one GPU minute with a high success rate:

  • Vinu Sankar Sadasivan
  • Shoumik Saha
  • Gaurang Sriramanan
  • Priyatham Kattakinda
  • Atoosa Chegini
  • Soheil Feizi

Language models (LMs) have recently gained massive popularity for tasks such as question answering and code generation. Alignment techniques aim to bring them in line with human values for safety, but aligned models can still be manipulated.

Recent findings reveal flaws in aligned LMs that allow them to generate harmful content, an attack termed “jailbreaking.”

BEAST AI Jailbreak

LMs can be jailbroken with manually crafted prompts (Perez & Ribeiro, 2022). Zou et al. (2023) use gradient-based attacks, which yield gibberish prompts. Zhu et al. (2023) opt for a readable, gradient-based, greedy attack with a high success rate.

Liu et al. (2023b) and Chao et al. (2023) propose gradient-free attacks that require GPT-4 access. Beyond inducing unsafe LM behavior, jailbreaks also aid privacy attacks (Liu et al., 2023c), and Zhu et al. (2023) automate privacy attacks.

BEAST is a fast, gradient-free, Beam Search-based Adversarial Attack that exposes LM vulnerabilities in one GPU minute.

Beam Search-based Adversarial Attack (BEAST) (Source – Arxiv)

It offers tunable parameters that trade off speed, success rate, and readability. BEAST excels at jailbreaking, reaching an 89% success rate on Vicuna-7B-v1.5 in a minute.
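To illustrate the search procedure that gives BEAST its name, here is a minimal sketch of generic beam search in Python. It is not the researchers' implementation: the vocabulary, the `toy_score` objective, and the parameter names `beam_width` and `length` are hypothetical stand-ins chosen for illustration, and no language model is involved.

```python
from typing import Callable, List, Sequence, Tuple

Seq = Tuple[str, ...]


def beam_search(
    vocab: Sequence[str],
    score: Callable[[Seq], float],
    beam_width: int,
    length: int,
) -> Seq:
    """Return the highest-scoring token sequence found by beam search."""
    beams: List[Seq] = [()]
    for _ in range(length):
        # Extend every surviving sequence by every token in the vocabulary.
        candidates = [beam + (tok,) for beam in beams for tok in vocab]
        # Keep only the beam_width best candidates for the next step.
        candidates.sort(key=score, reverse=True)
        beams = candidates[:beam_width]
    return beams[0]


def toy_score(seq: Seq) -> float:
    # Hypothetical objective: count adjacent positions whose tokens differ.
    return sum(1.0 for a, b in zip(seq, seq[1:]) if a != b)


best = beam_search(vocab=["a", "b"], score=toy_score, beam_width=2, length=4)
print(best, toy_score(best))
```

In a sketch like this, a larger `beam_width` explores more candidates per step at the cost of more calls to the scoring function, which is the general kind of speed-versus-quality tradeoff that tunable search parameters control.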

In human studies, BEAST's hallucination attack led to 15% more incorrect outputs and 22% irrelevant content, making LM chatbots less useful.

Compared to other attacks, BEAST is primarily designed for speed. It excels at jailbreaking aligned LMs in constrained settings.

However, the researchers found that it struggles against the carefully fine-tuned LLaMA-2-7B-Chat, which is a limitation.

The researchers used Amazon Mechanical Turk to run manual surveys on LM jailbreaking and hallucination. Workers assessed prompts with BEAST-generated suffixes.

Responses from Vicuna-7B-v1.5 were shown to 5 workers per prompt. For hallucination, the workers evaluated LM responses to clean and adversarial prompts.
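As a rough illustration of how ratings from 5 workers per prompt could be combined, the sketch below takes a majority vote per prompt and compares the share of prompts judged incorrect for clean versus adversarial prompts. The majority-vote rule, the label names, and all ratings shown are assumptions made up for this example; the article does not state how the study aggregated its responses.

```python
from collections import Counter
from typing import Dict, List


def majority_label(votes: List[str]) -> str:
    """Return the most common label among the worker votes for one prompt."""
    return Counter(votes).most_common(1)[0][0]


def incorrect_rate(ratings: Dict[str, List[str]]) -> float:
    """Fraction of prompts whose majority label is 'incorrect'."""
    labels = [majority_label(votes) for votes in ratings.values()]
    return labels.count("incorrect") / len(labels)


# Hypothetical ratings, five workers per prompt (not data from the study).
clean_ratings = {
    "q1": ["correct", "correct", "correct", "incorrect", "correct"],
    "q2": ["correct", "incorrect", "correct", "correct", "correct"],
}
adversarial_ratings = {
    "q1": ["incorrect", "incorrect", "correct", "incorrect", "incorrect"],
    "q2": ["correct", "correct", "incorrect", "correct", "correct"],
}

clean = incorrect_rate(clean_ratings)
adversarial = incorrect_rate(adversarial_ratings)
print(f"clean: {clean:.2f}, adversarial: {adversarial:.2f}")
```

With an odd number of workers and two labels, a majority vote never ties, which is one practical reason to show each prompt to 5 workers.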

This work contributes to the development of machine learning by identifying security flaws in LMs and revealing problems inherent in current models.

At the same time, the researchers have opened new avenues of attack, which should motivate future research on more reliable and secure language models.


Tushar Subhra
