Towards Trusted AI Week 41 – Multimodal AI attacks on the rise

Secure AI Weekly + Trusted AI Blog admin October 10, 2023 72

Dead grandma locket request tricks Bing Chat’s AI into solving security puzzle

ArsTechnica, October 2, 2023

Microsoft’s Bing Chat designed similarly to ChatGPT, allows users to share images for the AI to assess. Notably, it’s programmed to resist solving CAPTCHAs, those visual tests aimed at thwarting bots on the web. However, Denis Shiryaev, an innovative user, managed to bypass this safeguard. Using a crafty technique, he embedded a CAPTCHA within an emotionally charged image of a locket and narrated a fictional story of a deceased grandmother. Misled by the sentimental context, Bing Chat decoded the CAPTCHA, showcasing a blind spot in its programmed defenses.

The heart of this issue lies in the AI models’ approach to data interpretation. These models sift through vast networks of information in a realm known as “latent space.” When the context of an image is altered, it can mislead the AI, similar to providing a navigator with inaccurate map coordinates, leading them astray. Bing Chat, built on the GPT-4 technology framework, shares functionalities with OpenAI’s ChatGPT. Interestingly, Microsoft had already rolled out image evaluation features in Bing Chat months before OpenAI introduced its “multimodal” variant of ChatGPT.

As AI continues to evolve, so do its vulnerabilities. An earlier AI pitfall, termed “prompt injection,” was identified in 2022, which had the potential to divert AI actions against developers’ intentions. Reflecting on the Bing Chat situation, AI researcher Simon Willison labeled it a “visual jailbreak,” distinguishing it from the previously identified “prompt injections.” He emphasizes the difference between evading AI constraints and exploiting mixed prompts from developers and users. These episodes spotlight the imperative for tech giants like Microsoft to fortify their AI systems, especially as new forms of vulnerabilities emerge.

OpenAI faces novel jailbreak risks with GPT-4v image service

The Stack, October 2, 2023

OpenAI, in its pursuit to achieve a comprehensive multimodal model, recently introduced GPT-4v, which possesses advanced image input functionalities. Initial assessments showed the model efficiently interpreting complex graphics, including detecting nuances in memes. However, its capabilities were brought into question when a collaborative study from Princeton and Stanford unveiled a significant vulnerability. The researchers demonstrated that through a relatively straightforward strategy, it was possible to manipulate GPT-4v into producing harmful outputs, emphasizing the risks associated with advanced Large Language Models (LLMs).

As technological advancements continue, AI’s potential vulnerabilities become more intricate. OpenAI’s acknowledgment of efforts by external entities to exploit weaknesses in ChatGPT highlights the ongoing battle to secure AI. One of the novel challenges faced is defending against image-based logical reasoning attacks. Unlike traditional text-based threats, these image input vulnerabilities demand innovative defense mechanisms. OpenAI’s steps to enhance GPT-4v’s safeguards, especially regarding the model’s capacity to recognize individuals and the related privacy concerns, demonstrate their proactive approach to these challenges.

However, the road ahead isn’t without ethical dilemmas. OpenAI grapples with decisions about the model’s future capabilities, including whether AI should identify public personalities from images, infer attributes like emotions from visuals, or even offer unique features for the visually impaired. With emerging tech like smart glasses becoming mainstream and improved image recognition tools, the intertwined security and ethical implications of AI are more significant than ever. The ongoing evolution of AI systems like GPT-4v underscores the need for a balanced approach, emphasizing both innovation and responsibility.

Broken ‘guardrails’ for AI systems lead to push for new safety measures

Financial Times

Last week, tech giants Microsoft-backed OpenAI and Meta (formerly Facebook) unveiled significant advancements in consumer AI products. OpenAI’s ChatGPT can now interact using voice, images, and text, while Meta has launched AI chatbot personalities for platforms like WhatsApp and Instagram. As companies fast-track AI commercialization, the security measures – or “guardrails” – intended to prevent misuse are facing challenges to stay up-to-date.

Responding to these challenges, major players like Anthropic and Google DeepMind are developing “AI constitutions” – guidelines designed to anchor their models’ behavior and mitigate potential abuses. Dario Amodei, CEO of Anthropic, emphasized the importance of transparency and accountability in AI systems. The push now is to teach AI systems foundational ethical principles so they can self-regulate without extensive human oversight. One of the most critical tasks is ensuring AI alignment with positive human traits like honesty and tolerance. To streamline AI outputs, a predominant method used is reinforcement learning by human feedback (RLHF), where human teams evaluate AI responses as either “good” or “bad”. However, this method has limitations and is prone to errors.