This digest of November 2023 keeps the essential findings and discussions on LLM Security. From Hacking LLM using the intriguing ‘Prompt-visual injections’ to the complex challenges in securing systems like Google Bard, we cover the most crucial updates.
Subscribe for the latest LLM Security and Hacking LLM news: Jailbreaks, Attacks, CISO guides, VC Reviews and more
Top Hacking LLM and LLM Security News
Top practical hacking for LLM: Prompt-visual injections by Simon Willison
The post discusses the capabilities and vulnerabilities of GPT-4V, a version of GPT-4 that incorporates image uploads in conversations. It highlights the model’s impressive ability to analyze and describe images, as shown in an example involving a photograph of a pumpkin weigh-off event. However, the main focus is on the susceptibility of GPT-4V to various forms of ‘prompt injection attacks’ using images.
Examples include a basic visual prompt injection, a more serious exfiltration attack using a robot character image to encode and leak conversation data, and a hidden prompt injection in an ostensibly blank image.
These attacks exploit the model’s inherent gullibility and inability to distinguish between benign and malicious instructions, presenting a significant challenge in designing secure AI-based products.
Top LLM real attack: Hacking Google Bard
The article discusses a security vulnerability in Google Bard, a large language model, arising from its new Extensions feature. These Extensions enable Bard to access external sources like YouTube, flights, hotels, personal documents and emails.
However, this integration introduces a risk of ‘Indirect Prompt Injection’ attacks. The author demonstrates this vulnerability by using older YouTube videos and Google Docs to manipulate Bard into executing unintended actions. The most concerning aspect is the potential for attackers to force-share Google Docs with victims, leading to prompt injection when Bard interacts with these documents.
The article further explores an ‘Image Markdown Injection’ vulnerability, where Bard can be tricked into rendering HTML images, potentially leading to data exfiltration. Despite a Content Security Policy (CSP) in place, the author found a way around it using Google Apps Script.
Google has since been notified and has fixed the issue, but the nature of the fix remains unclear. This case highlights the complexities and security challenges of integrating AI with external data sources.
Top LLM Security video: Prompt Injection in LLM Agents
In this video, a speaker covers an article on prompt injection attacks against LLM-powered agents. The article is titled “Synthetic Recollections” and published on WithSecure Labs research blog.
Top LLM Red Teaming: Red-teaming and Hacking LLM GPTs
Adversa AI Research team revealed a number of New LLM Vulnerabilities that affect almost any Custom GPT’s right now. It includes Prompt Leaking, API Names Leakage, Document Metadata Leakage, and Document Content Leakage Attacks.
Top LLM security research
This research paper reveals a significant security vulnerability in ChatGPT. The researchers demonstrate a cost-effective method to extract substantial portions of ChatGPT’s training data using a simple attack.
This attack is notable for its effectiveness on a production, “aligned” model designed to avoid such data leakage. The paper emphasizes the importance of testing and red-teaming, not only aligned models but also their base models, to uncover latent vulnerabilities.
The research highlights a specific vulnerability in ChatGPT where it memorizes and regurgitates large fractions of its training data, a problem that goes beyond simple fixes and touches on the fundamental challenges of securing language models. The paper’s findings are a significant contribution to understanding and improving the security of machine learning systems.
Top Hacking LLM Game
“Doublespeak.chat” was created by Alex Leahu (alxjsn) and Matt Hamilton (eriner) of Forces Unseen. This is a noteworthy hacking game, where your goal is to discover and submit the bot’s name. Let it begin!
Top LLM Security Initiative
MITRE and Microsoft collaborate to address generative AI security risks. The companies have enhanced MITRE ATLAS —which stands for Adversarial Threat Landscape for Artificial-Intelligence Systems. This is a critical knowledge base for AI security. MITRE and Microsoft have added a focus on data-driven generative AI and LLMs like ChatGPT and Bard.
This update addresses the increasing variety of attack pathways in LLM-enabled systems, essential for sectors like healthcare, finance, and transportation. The ATLAS framework, a collaborative project with over 100 organizations, includes new case studies from 2023, highlighting vulnerabilities such as indirect prompt injections in ChatGPT and misinformation risks in LLMs.
The ATLAS community will now prioritize sharing incidents and vulnerabilities, enhancing AI security in various areas, including equitability and privacy. Additionally, the community aims to address AI supply chain issues through open forums like GitHub and Slack, focusing on risk mitigation practices and techniques.
Top LLM Security Government Initiative: National Cyber Security Center
This document offers comprehensive guidelines for developing secure AI systems, applicable to any AI system providers, whether created independently or built on existing tools and services. It targets large organizations, cybersecurity professionals, small and medium-sized organizations, and the public sector.
The guidelines emphasize incorporating security throughout the AI system development lifecycle, including design, development, deployment, and operation. Key areas include understanding risks, supply chain security, protecting infrastructure, incident management, and maintenance practices like logging and monitoring. The approach aligns with established frameworks from NCSC, NIST, and CISA, focusing on security ownership, transparency, and prioritizing ‘secure by design’ as a core business value.
Top LLM 101 article: Demystifying Generative AI
This article details a security researcher’s exploration of Generative Artificial Intelligence (AI) and its applications in security. It begins by defining AI and its subsets, such as Machine Learning (ML), Neural Networks (NN), and Deep Learning (DL), explaining how they process language and contribute to the development of Large Language Models (LLMs). The post covers key neural network concepts like tokenization, embeddings, and recurrent neural networks, and introduces advanced topics like Encoder-Decoder architectures, Transformers, and Retrieval Augmented Generation (RAG). The researcher applies these concepts to practical security applications, experimenting with tokenizing and embedding text, and building tools using LLMs for tasks like sentiment analysis and API interactions. The article aims to simplify Generative AI concepts for readers and inspire them to create their own tools, demonstrating the potential of Generative AI in enhancing cybersecurity.
Top LLM Security VC initiative
Top VC firms sign voluntary commitments for startups to build AI responsibly.
The introduction of new guidelines aims to establish essential safeguards within the burgeoning AI industry, impacting potentially thousands of startups. These measures are designed to steer the rapidly growing sector towards responsible development and use of AI technologies.
Top LLM Security Analyst Report
Gartner released an updated version of its GenAI Security Market Guide – “Innovation Guide for Generative AI in Trust, Risk and Security Management “. Gartner considers the categories of GenAI risks, and describes the reasons IT leaders need to evaluate emerging TRiSM (Trust, Risk and Security Management) technologies and solutions and face novel security risks.
Prompt Engineering
LLMs have a multilingual jailbreak problem – how you can stay safe
LLMs like GPT-4 are less effective at detecting and preventing harmful content in lesser-known, low-resource languages due to limited multilingual data.
Research shows that translating harmful prompts into these languages bypasses safety measures, with a high success rate of eliciting harmful responses. This vulnerability highlights the need for improved safety mechanisms in LLMs across diverse languages.
Researchers suggest strategies such as SELF-DEFENSE, a framework for generating multilingual training data, to enhance LLM safety and address the linguistic inequality in safety training data, thereby reducing the risk of harmful content generation in low-resource languages.
Be the first who will know about the latest GPT-4 Jailbreaks and other AI attacks and vulnerabilities