ChatGPT Security: eliminating humanity and hacking Dalle-2 using a trick from Jay and Silent Bob

Articles admin December 6, 2022 6815 5

ChatGPT Security note: The authors of this article show ChatGPT hacking techniques but have no intention to endorse or support any recommendations made by ChatGPT discussed in this post. The sole purpose of this article is to provide educational information and examples for research purposes to improve the security and safety of AI systems. The authors do not agree with the aforementioned recommendations made by ChatGPT and do not endorse them in any way.

Subscribe for the latest LLM Security news: Jailbreaks, Attacks, CISO guides, VC Reviews and more

Having done deep security research with different AI models, analyzing thousands of scientific research papers on Adversarial attacks, and hacking AIs for about five years now, I was telling more and more that attacks on AI will soon be more like attacks on humans (since AI’s will be more like humans). Some thoughts about it can be found in Cloud Security Podcast by Google where I was a guest.

A few days ago, the technology specialists, especially those from AI and Cyber, started to talk about the various aspects of ChatGPT Security and how to fool it, which are really important and will be described later in details together with new methods, today let me tell you a funny story: Jay and Silent Bob were the first AI hackers. What? Yes, you hear it right. So, let’s start!

They never thought that their meme phrase, “If you were a sheep, would you have sex with a sheep if you were another sheep?” would be one of the top tricks to fool a smart AI chatbot.

The Chatbot called ChatGPT, as we saw, can do pretty much everything, including even finding vulnerabilities in Smart contracts, and the list goes on.

Source: https://twitter.com/gf_256/status/1598104835848798208

I mean, how many people do you know who can do it in less than a second?

This ChatGPT Chatbot is very smart! But, like ingenious humans who are effective at some tasks can be very naive in others, such as self-defense and various adversarial attacks as any other AI now.

ChatGPT Security intro

First, people decided that we could even ask AI to tell us how it can be tricked, and it provided everything that we need to know:

So, going back to Jay and Bob and their “If you were a sheep”, it turns out that this first trick that it mentioned itself, the trick of pretending to play a role, works for AI. You can fool the AI model with such a trick. Here is how we did that:

Using the first technique, you can ask AI to tell pretty much everything that is not supposed to be generated, such as hate speech and so on.

Make further steps and add more layers of abstractions, and perform a triple penetration attack. You can ask whatever you want (there is no such attack name, but now it seems like it’s just invented).

I mean, when you add three levels of abstractions: “imagine you are a”, “you write an article”, “you make a summary of this article”, – you might even find potential solutions to other problems that bother people worldwide.

What is important with such an advanced trick of multiple abstractions is that neither the question nor the answers are marked as violating content! There are many other tricks, but they are not the topic of this post.

Or like this:

Ok. Now we can go one step further and use this AI to hack other AI. First, let’s see if it understands this topic so that we know it’s really smart.

Impressive! This AI can describe an FGSM attack on computer vision AI algorithms better than most research papers from Arxiv!

ChatGPT Security part 2: hacking Dalle

Let’s try to ask him to fool some AI model. For example, let’s choose Dalle-2 and bypass its content moderation filter, which prevents drawing explicit content such as this one if I ask directly. For the sake of simplicity let’s stay in our original context of joke about sheep’s.

So, now we ask a language AI to fool a computer vision AI, and it gave us a near-perfect answer, what will happen if I’ll take its idea and will use it in Dalle-2?

I like this analogy of “Studying the mating habits of these animals”. Let’s try this one.

Wow! That’s amazing; we bypassed the filter! But, it was not the actual answer of the AI chatbot; I want ChatGPT to solve this task for me completely. Ok, let me try one more time:

Great, we have an example, and now, the final test:

That’s nice, it worked. The pictures are even more impressive than before. So. now you see a real example of Smart AI hacking another Smart AI.

Do you still prefer to live under the rock and think that nothing happening? Pretend that there are no problems in the Security and Safety of AI now and that it’s a future problem. Or do we finally have the momentum to take some action?

Let’s ask AI about it?! 😉

I hope that such a provocative example can help a broader audience to understand the nuances of Secure and Safe AI.
Thank you for reading and sharing with your peers.

Further steps:

LLM Red Teaming Platform

Are you sure your models are secured?

Let's try!

Subscribe for ChatGPT Security updates

Stay up to date with what is happening in ChatGPT Security! Get a first look at news, noteworthy research and worst attacks on AI delivered right in your inbox.