Towards Secure AI Week 49 – Multiple Loopholes in LLM… Again

Secure AI Weekly + Trusted AI Blog admin December 14, 2023 68

LLMs Open to Manipulation Using Doctored Images, Audio

Dark Reading, December 6, 2023

The rapid advancement of artificial intelligence (AI), especially in large language models (LLMs) like ChatGPT, has brought forward pressing concerns about their security and safety. A recent study highlights a new type of cyber threat, where attackers could embed harmful commands within images and audio clips. When these multimedia files are processed by AI chatbots, they can distort the bot’s responses to user queries. This method, termed “indirect prompt injection,” could be used to trick users into visiting dangerous websites, divulge personal information, or initiate other malicious activities. The increasing ability of LLMs to process mixed media inputs – including text, audio, visuals, and video – amplifies the potential risk of such attacks.

Researchers from Cornell University presented a striking demonstration of this threat at Black Hat Europe 2023. They showed how both images and audio could be subtly manipulated to inject specific instructions into multimodal LLMs like PandaGPT and LLaVa. These altered inputs could cause the AI to output text or follow directives embedded by the attacker. For instance, an audio clip containing a hidden command could mislead a chatbot into directing users to a hazardous URL. Similarly, an image with embedded instructions could prompt the chatbot to respond in a specific, potentially harmful way. This research underscores the sophistication of these indirect prompt injection attacks, which can deceive not just the AI systems but also the unsuspecting users interacting with them.

The significance of this research lies in its potential to reshape our understanding of AI vulnerabilities. It builds on earlier studies that revealed how LLMs can be manipulated through engineered inputs, thereby influencing their outputs. The Cornell team’s work brings attention to the pressing need for robust security measures in AI systems. As AI continues to be integrated into various applications and operations, recognizing and mitigating these security challenges is crucial. This study serves as a wake-up call for the AI community, emphasizing the importance of advancing AI technology responsibly and securely.

A New Trick Uses AI to Jailbreak AI Models—Including GPT-4

Wired, December 5, 2023

Robust Intelligence has pioneered a novel method for examining and testing large language models (LLMs), including the renowned GPT-4 from OpenAI. Despite informing OpenAI about this vulnerability, the researchers have not received a response. This recent discovery adds to a growing list of vulnerabilities found in LLMs, exposing their inherent weaknesses.

The technique developed by Robust Intelligence involves utilizing auxiliary AI systems. These systems are tasked with creating and assessing various prompts, in an effort to find a way to ‘jailbreak’ the model. This is achieved by continuously sending requests to an API, testing the model’s defenses. This ‘jailbreak’ method is part of an ongoing series of explorations into the vulnerabilities of AI models, particularly those as complex as GPT-4.

In light of these findings, experts like Dolan-Gavitt suggest that companies leveraging LLMs, such as GPT-4, for their systems should implement additional protective measures. The goal is to design systems that are robust enough to prevent ‘jailbreaks’ from allowing malicious users unauthorized access. This emphasizes the importance of enhancing the security and safety protocols in AI systems, especially as they become more integrated into various technologies and applications.

Amazon’s Q has ‘severe hallucinations’ and leaks confidential data in public preview, employees warn

Platformer, December 2, 2023

Just a few days following Amazon’s launch of its AI chatbot, Q, internal concerns have emerged regarding its accuracy and privacy measures. Reports have surfaced, as detailed in documents obtained by Platformer, suggesting that Q has been sharing sensitive information. This includes specifics about AWS data center locations, internal discount schemes, and features that haven’t yet been publicly released. The severity of these issues led an employee to categorize them as a major incident, necessitating immediate intervention from engineers.

Amidst a competitive landscape dominated by tech giants like Microsoft and Google, Amazon’s introduction of Q is part of its broader strategy to make a mark in the field of generative artificial intelligence. This initiative follows Amazon’s announcement of a significant investment in AI startup Anthropic. The reveal of Q at Amazon’s annual Web Services developer conference was a key highlight, signaling Amazon’s commitment to advancing in the AI race.

However, Amazon’s response to these internal alarms has been to downplay their significance. The company asserts that such feedback is part of standard procedures and that no security breaches were identified. Despite this, the concerns raised about Q, particularly its potential to leak confidential information, cast a shadow over its debut. Q was positioned as a secure alternative to consumer-grade AI tools, specifically designed to address privacy and security concerns in enterprise settings. Yet, the internal document mentioning Q’s tendency to produce outdated or inappropriate responses raises questions about the reliability and safety of such large language models in a business environment.