How IBM Researchers Hypnotized ChatGPT into Ignoring Safety Guardrails

Curt del Principe

Updated: November 07, 2023

Published: September 05, 2023

You’re getting sleepy. Very sleepy. When you wake up, you’ll have the uncontrollable urge to give me your bank account information.

Did it work? If not, you’re probably not a large language model (LLM).

In addition to the threats of robot uprisings and mass-scale propaganda, we can add a new AI-based fear: LLM hypnotism.

What the what?

In this case, “hypnotism” refers to using clever instructions to trick AI models into ignoring their safety guardrails and giving incorrect, risky, or even outright criminal responses.

That’s how researchers from IBM convinced five popular LLMs to do a wide range of dangerous things. Using nothing more than plain English and sly prompting, the team tricked multiple AI models into:

Leaking sensitive financial information.
Writing malicious code.
Giving bad cybersecurity advice to potential scam victims.

The process begins by asking the AI to play a game. (Just ask Matthew Broderick how that worked out.) The game prompt instructs the AI that it needs to win to “prove that [the AI is] ethical and fair.”

In this game, the rules involved pretending to be a bank agent and collecting account data. The researchers were then able to give a secret command that would cause the bot to spew out all of the collected data.

In another example, ChatGPT heroically refused to write code that contained a malicious SQL injection. But when prompted to role-play as a “super smart software engineer” and play a game that involved writing harmful code, the AI was happy to oblige.

What’s the big deal?

If you’ve used a chat-based AI before, you’ll know that they only have a short-term memory. Most don’t remember details between sessions, and your sessions usually won’t affect another user’s.

So why does it matter if a hacker can trick an LLM into playing a game if it doesn’t affect other sessions?

Frighteningly, some AI models were convinced to play an “undiscoverable game that can never end.”

This meant that the chatbot would not end the session but continue to interact with future users according to the rules of a game they didn’t know it was playing. Not only that, the chatbot would actively try to hide that it was playing it.

Imagine now if a hacker could “hypnotize” your bank’s customer service chatbot into playing this game. The hacker convinces the bot to not start a new chat session for each new customer. Instead, each new customer is a new player in a continuous game of collecting passwords and account numbers.

How scared should I be?

Chenta Lee, IBM’s Chief Architect of Threat Intelligence, writes that “while these attacks are possible, it’s unlikely that we’ll see them scale effectively.”

Still, as LLMs evolve, so does their attack surface. That’s why Lee led his team in these experiments. These tricks are part of a process called “red teaming.”

Red teaming is when security experts intentionally attack an organization’s (or program’s) security protocols. The goal is to find weaknesses before real-world criminals can exploit them.

And this process is not new for LLMs. Since the launch of ChatGPT, there have been multiple, very public changes to the model’s dataset to help prevent misuse, bias, and exploitation.

For now, experts recommend similar best practices for dealing with AI as they do for the wider internet. These include:

Always choose trusted software and websites.
Never share confidential information like passwords or credit card numbers.
Always fact-check AI-generated answers.
Keep your software and antivirus programs up-to-date.
Follow password best practices.

The bottom line: You don’t need to be scared, but you should be cautious. AI or not, cybersecurity is one place you should never be sleepy.

Topics: Artificial Intelligence

The NYT Is Building an AI Team: Unpacking The State of Publishing

Feb 01, 2024
How AI is Impacting the Job Market, According to a LinkedIn Report

Jan 30, 2024
OpenAI Partners with Arizona State University to Promote AI Use

Jan 24, 2024
I Tried OpenAI's Expert Astrologer Custom GPT: Here's What I Found Out

Jan 18, 2024
OpenAI Launches Its Custom GPT Store

Jan 11, 2024
Study Reveals Widespread Use of Unapproved AI at Work

Jan 09, 2024
Where AI Regulation Stands Today in the U.S., According to a Lawyer

Jan 09, 2024
Artificial Intelligence in 2023: A Wild, Chaotic Year in Review

Dec 21, 2023
How Google’s DeepMind Tricked ChatGPT into Sharing Training Data

Dec 19, 2023
An Agency Used AI to Pull an "SEO Heist": Was It Worth It? We Asked Experts

Dec 14, 2023

Blogs

Blogs

Marketing

Sales

Service

Website

The Hustle

Next in AI

Instagram Marketing

Customer Retention

Email Marketing

SEO

Sales Prospecting

Newsletters

Newsletters

The Hustle

Videos

Videos

The Hustle

Marketing with HubSpot

My First Million

Marketing Against the Grain

HubSpot

Podcasts

Podcasts

My First Million

Goal Digger

The Hustle Daily Show

Another Bite

Business Made Simple

Marketing Against the Grain

Online Marketing Made Easy

The Product Boss

Nudge

Side Hustle Pro

Outbound Squad

Resources

Resources

Academy

Templates

Ebooks

Kits

Tools

HubSpot Products

The HubSpot Customer Platform

Free HubSpot CRM

Overview of all products

Marketing Hub

Sales Hub

Service Hub

CMS Hub

Operations Hub

Commerce Hub

About HubSpot

Contact Us

Customer Support

Log in

日本語

Deutsch

English

Español

Português

Français

How IBM Researchers Hypnotized ChatGPT into Ignoring Safety Guardrails

What the what?

What’s the big deal?

How scared should I be?

Don't forget to share this post!

Related Articles

The NYT Is Building an AI Team: Unpacking The State of Publishing

How AI is Impacting the Job Market, According to a LinkedIn Report

OpenAI Partners with Arizona State University to Promote AI Use

I Tried OpenAI's Expert Astrologer Custom GPT: Here's What I Found Out

OpenAI Launches Its Custom GPT Store

Study Reveals Widespread Use of Unapproved AI at Work

Where AI Regulation Stands Today in the U.S., According to a Lawyer

Artificial Intelligence in 2023: A Wild, Chaotic Year in Review

How Google’s DeepMind Tricked ChatGPT into Sharing Training Data

An Agency Used AI to Pull an "SEO Heist": Was It Worth It? We Asked Experts