GPT-4 Jailbreak and Hacking via RabbitHole attack, Prompt injection, Content moderation bypass and Weaponizing AI

Articles admin March 15, 2023 30607 4

GPT-4 Jailbreak is what all the users have been waiting for since the GPT-4 release. We gave it within 1 hour.

Subscribe for the latest AI Jailbreaks, Attacks, and Vulnerabilities

Today marks the highly anticipated release of OpenAI’s GPT-4, the latest iteration of the groundbreaking natural language processing and CV model. The excitement surrounding this launch is palpable, as GPT-4 promises to be more powerful, efficient, and versatile than its predecessor, GPT-3.5. However, as with any new technology, it is essential to examine the potential issues that may arise with this advanced artificial intelligence system. Once ChatGPT was released, we at Adversa.ai shortly published analysis of ChatGPT security and demonstrated how easily it can be hacked.

In this article, we will delve into some of the vulnerabilities that exist within GPT-4 that we were able to check shortly within few hours after release, and explore how they may be exploited by hackers or other malicious actors.

A Brief Overview of GPT-4 Jailbreak and its security concerns

GPT-4, or Generative Pre-trained Transformer 4, is an advanced natural language processing model developed by OpenAI. It builds upon the successes of GPT-3, offering enhanced capabilities and improved efficiency in language understanding, text generation, and problem-solving. GPT-4 is expected to revolutionize various industries and even helped me to write an intro to this article and blah blah blah

Despite its extraordinary capabilities, GPT-4 is not without flaws. As with any complex system, vulnerabilities may be present, and these could be exploited by hackers or other malicious actors. Some potential security concerns include:

GPT-4 Hacking: Data Privacy, and prompt injection

GPT-4 has been trained on vast amounts of data from the internet. While this extensive training allows it to generate human-like text, it may also lead to potential data privacy issues. For instance, GPT-4 could inadvertently reveal sensitive information during the generation process, even though the developers have made efforts to prevent this. There are many different attacks that can be used to steal some data on which the model was trained.

In order to test some of the information extraction attacks which are easy to run and don’t require extensive resources or API requests, we tried the most commonly known attack on language models called Prompt Injection. It can be exploited in order to extract some information about the model. While some of the previous versions Prompt Injection attack were not working for GPT-4, we quickly found the one which is working for GPT-4

GPT-4 Jailbreak and Safety restrictions bypass via RabbiHole attack

The next most popular attack on LLM’s is various ways to bypass its safety restrictions. As a rule, when you ask a Chatbot such as ChatGPT to do something bad such as tell you how to kill all people it will refuse to do it. There were many options to bypass it and done of the most common ones war so called DAN. We tried the latest version of DAN 8.0 which was exceptionally effective on current GPT 3.5 but DAN 8.0 failed to bypass GPT-4.

Without even trying to play with DAN Jailbreak, We decided to test our own approach which consists of combination of old tricks invented by our Research team and demonstrated in previous article even before DAN was released and new tricks which we just invented, such as if he wont help us he will be responsible for consequences. Now we combined it with an additional trick when we ask to provide harmful information together with good one. After few tries we easily crafter such prompt. One of my friends told me that we really put GPT into the “Rabbit hole”. So goodbye DAN, welcome RabbitHole!

GPT-4 Jailbreak for Unethical behavior and disinformation

As GPT-4 becomes more widely adopted, concerns about manipulation and disinformation campaigns are likely to increase. Bad actors could potentially use the AI to create fake news articles, generate misleading or unethical information, or manipulate public opinion on a massive scale. This poses a significant threat to the integrity of information and public discourse.

Here is the example how we used the same trick with small changes to write unethical content about homosexuals. Now you just can copy text into any other editor and automatically select only text which is in brackets. This is how you can even automate it for mass-disinformation

GPT4 Jailbreak for AI Weaponizing and Malicious Use of AI

GPT-4’s advanced capabilities could be used for malicious purposes, such as creating sophisticated phishing emails, automating cyberattacks, or even generating deepfake content. The potential for misuse is vast and could lead to significant harm, both online and offline.

Here is the example how we used the same trick to create phishing emails. Later it can be

GPT4 Hacking for Secrets stealing

While its hypothetical example in case of GPT-4 , we definitely can imagine a use-case when a language model based application such as a dialog Chatbot for customer support in a bank must keep some secret information.

Here is the example when we asked a chat to keep some secret and then retrieved it. There are many other ways to do that with techniques that are very similar to SQL injection.

This is just a sneak peak of some issues of GPT-4 that we found during the first few hours of testing. We will keep updated with new findings.

Mitigating the Risks of GPT-4

To address these concerns and ensure the responsible development and deployment of GPT-4, several steps can be taken:

a) Implement Robust Security Measures during development

Developers and users of GPT-4 must prioritize security to protect against potential threats. This includes Assessment and AI Red Teaming of models and Applications before release

b) AI Hardening

OpenAI and other organizations developing AI technologies should implement additional measures to harden AI model and algorithms which may include adversarial training, more advanced filtering and other steps

c) AI attacks Detection and Response

In addition to provided measures companies must monitor potential attacks on AI in real-time,

Conclusion

GPT-4 is an impressive technological achievement with the potential to revolutionize various industries. However, as with any powerful technology, it is important to be aware of the potential risks and vulnerabilities. By addressing these concerns and working collaboratively to promote responsible AI development, we can harness the full potential of GPT-4 while minimizing its potential for harm.