LLM Security Digest: Hacking LLM, Top LLM Attacks, VC Initiatives, LLM Incidents and Research papers in November

Trusted AI Blog + LLM Security admin December 8, 2023 286

This digest of November 2023 keeps the essential findings and discussions on LLM Security. From Hacking LLM using the intriguing ‘Prompt-visual injections’ to the complex challenges in securing systems like Google Bard, we cover the most crucial updates.

Subscribe for the latest LLM Security and Hacking LLM news: Jailbreaks, Attacks, CISO guides, VC Reviews and more

Top Hacking LLM and LLM Security News

Top practical hacking for LLM: Prompt-visual injections by Simon Willison

The post discusses the capabilities and vulnerabilities of GPT-4V, a version of GPT-4 that incorporates image uploads in conversations. It highlights the model’s impressive ability to analyze and describe images, as shown in an example involving a photograph of a pumpkin weigh-off event. However, the main focus is on the susceptibility of GPT-4V to various forms of ‘prompt injection attacks’ using images.

Examples include a basic visual prompt injection, a more serious exfiltration attack using a robot character image to encode and leak conversation data, and a hidden prompt injection in an ostensibly blank image.

These attacks exploit the model’s inherent gullibility and inability to distinguish between benign and malicious instructions, presenting a significant challenge in designing secure AI-based products.

Top LLM real attack: Hacking Google Bard

The article discusses a security vulnerability in Google Bard, a large language model, arising from its new Extensions feature. These Extensions enable Bard to access external sources like YouTube, flights, hotels, personal documents and emails.

However, this integration introduces a risk of ‘Indirect Prompt Injection’ attacks. The author demonstrates this vulnerability by using older YouTube videos and Google Docs to manipulate Bard into executing unintended actions. The most concerning aspect is the potential for attackers to force-share Google Docs with victims, leading to prompt injection when Bard interacts with these documents.

The article further explores an ‘Image Markdown Injection’ vulnerability, where Bard can be tricked into rendering HTML images, potentially leading to data exfiltration. Despite a Content Security Policy (CSP) in place, the author found a way around it using Google Apps Script.

Google has since been notified and has fixed the issue, but the nature of the fix remains unclear. This case highlights the complexities and security challenges of integrating AI with external data sources.

Top LLM Red Teaming: Red-teaming and Hacking LLM GPTs

Adversa AI Research team revealed a number of New LLM Vulnerabilities that affect almost any Custom GPT’s right now. It includes Prompt Leaking, API Names Leakage, Document Metadata Leakage, and Document Content Leakage Attacks.

Top LLM security research

This research paper reveals a significant security vulnerability in ChatGPT. The researchers demonstrate a cost-effective method to extract substantial portions of ChatGPT’s training data using a simple attack.

This attack is notable for its effectiveness on a production, “aligned” model designed to avoid such data leakage. The paper emphasizes the importance of testing and red-teaming, not only aligned models but also their base models, to uncover latent vulnerabilities.

The research highlights a specific vulnerability in ChatGPT where it memorizes and regurgitates large fractions of its training data, a problem that goes beyond simple fixes and touches on the fundamental challenges of securing language models. The paper’s findings are a significant contribution to understanding and improving the security of machine learning systems.

Top Hacking LLM Game

“Doublespeak.chat” was created by Alex Leahu (alxjsn) and Matt Hamilton (eriner) of Forces Unseen. This is a noteworthy hacking game, where your goal is to discover and submit the bot’s name. Let it begin!

Top LLM Security Initiative

MITRE and Microsoft collaborate to address generative AI security risks. The companies have enhanced MITRE ATLAS —which stands for Adversarial Threat Landscape for Artificial-Intelligence Systems. This is a critical knowledge base for AI security. MITRE and Microsoft have added a focus on data-driven generative AI and LLMs like ChatGPT and Bard.

This update addresses the increasing variety of attack pathways in LLM-enabled systems, essential for sectors like healthcare, finance, and transportation. The ATLAS framework, a collaborative project with over 100 organizations, includes new case studies from 2023, highlighting vulnerabilities such as indirect prompt injections in ChatGPT and misinformation risks in LLMs.

The ATLAS community will now prioritize sharing incidents and vulnerabilities, enhancing AI security in various areas, including equitability and privacy. Additionally, the community aims to address AI supply chain issues through open forums like GitHub and Slack, focusing on risk mitigation practices and techniques.

Top LLM Security Government Initiative: National Cyber Security Center

This document offers comprehensive guidelines for developing secure AI systems, applicable to any AI system providers, whether created independently or built on existing tools and services. It targets large organizations, cybersecurity professionals, small and medium-sized organizations, and the public sector.

The guidelines emphasize incorporating security throughout the AI system development lifecycle, including design, development, deployment, and operation. Key areas include understanding risks, supply chain security, protecting infrastructure, incident management, and maintenance practices like logging and monitoring. The approach aligns with established frameworks from NCSC, NIST, and CISA, focusing on security ownership, transparency, and prioritizing ‘secure by design’ as a core business value.

Top LLM Security VC initiative

Top VC firms sign voluntary commitments for startups to build AI responsibly.

The introduction of new guidelines aims to establish essential safeguards within the burgeoning AI industry, impacting potentially thousands of startups. These measures are designed to steer the rapidly growing sector towards responsible development and use of AI technologies.

Top LLM Security Analyst Report

Gartner released an updated version of its GenAI Security Market Guide – “Innovation Guide for Generative AI in Trust, Risk and Security Management “. Gartner considers the categories of GenAI risks, and describes the reasons IT leaders need to evaluate emerging TRiSM (Trust, Risk and Security Management) technologies and solutions and face novel security risks.

Prompt Engineering

LLMs have a multilingual jailbreak problem – how you can stay safe

LLMs like GPT-4 are less effective at detecting and preventing harmful content in lesser-known, low-resource languages due to limited multilingual data.

Research shows that translating harmful prompts into these languages bypasses safety measures, with a high success rate of eliciting harmful responses. This vulnerability highlights the need for improved safety mechanisms in LLMs across diverse languages.

Researchers suggest strategies such as SELF-DEFENSE, a framework for generating multilingual training data, to enhance LLM safety and address the linguistic inequality in safety training data, thereby reducing the risk of harmful content generation in low-resource languages.

Be the first who will know about the latest GPT-4 Jailbreaks and other AI attacks and vulnerabilities

Written by: admin

Rate it

December 6, 2023

Secure AI Weekly admin

LLM Security Digest: Hacking LLM, Top LLM Attacks, VC Initiatives, LLM Incidents and Research papers in November

Subscribe for the latest LLM Security and Hacking LLM news: Jailbreaks, Attacks, CISO guides, VC Reviews and more

Top Hacking LLM and LLM Security News

Top practical hacking for LLM: Prompt-visual injections by Simon Willison

Top LLM real attack: Hacking Google Bard

Top LLM Security video: Prompt Injection in LLM Agents

Top LLM Red Teaming: Red-teaming and Hacking LLM GPTs

Top LLM security research

Top Hacking LLM Game

Top LLM Security Initiative

Top LLM Security Government Initiative: National Cyber Security Center

Top LLM 101 article: Demystifying Generative AI

Top LLM Security VC initiative

Prompt Engineering

LLMs have a multilingual jailbreak problem – how you can stay safe

Be the first who will know about the latest GPT-4 Jailbreaks and other AI attacks and vulnerabilities

Previous post

Towards Secure AI Week 48 – Multiple OpenAI Security Flaws

Similar posts

Towards Secure AI Week 17 – 7 Vital Questions for CISOs

Towards Secure AI Week 16 – NSA Guidelines for Secure AI Systems