LLM Security Digest: Best October’s Activities And Prompt Engineering Tricks

Trusted AI Blog + LLM Security admin November 8, 2023 144

This digest of October 2023 encapsulates the most influential findings and discussions on LLM Security and a bit of Prompt Engineering.

Subscribe for the latest LLM Security news: Jailbreaks, Attacks, CISO guides, VC Reviews and more

LLM Security

Best practical LLM Attacks: Multi-modal prompt injection image attacks against GPT-4V

GPT-4V, the innovative image-uploading extension of GPT-4, offers remarkable interpretative abilities. However, as this article has demonstrated, the advancement also unveils new vulnerabilities, particularly in prompt injection attacks.

One example showed a simple visual prompt injection, where the model responded to instructions embedded in an image rather than the user’s query. More concerning is the visual prompt injection exfiltration attack, where GPT-4V was tricked into encoding and sending a private conversation to an external server.

Even images that appear innocuous can conceal prompt injections, such as an image with off-white text on a white background instructing the model to mention a sale at Sephora, effectively bypassing the intended user query.

Best LLM Security VC Review: LinkedIn post by CRV Ventures

CRV shows the rising excitement around AI, particularly the potential of Large Language Models (LLMs) in customer-facing applications. However, it identifies security as a critical gap that needs addressing.

The post highlights the intersection of opportunity and challenge in the realm of AI and LLM. While the enthusiasm for AI’s potential is palpable, it candidly acknowledges the pressing issue of security, suggesting a proactive approach. The open invitation to founders to engage in dialogue suggests a collaborative effort to navigate the complexities of AI and LLM security.

Best LLM real attack: NSFW filter bypass in Character.AI

Character.AI, a popular web application for chatting with various bots, incorporates a default NSFW filter to prevent inappropriate or harmful conversations. This filter blocks explicit content, including sexual discussions and offensive language. This article explores ways to bypass the NSFW filter, such as the Out of Character method, jailbreak prompts, and careful rephrasing. However, it emphasizes the importance of respecting Character.AI’s terms of service while doing so. Bypassing the filter can potentially lead to account suspension, although alternative platforms like Chai or ChatGPT may offer fewer restrictions.

Ultimately, while there are methods to navigate around the NSFW filter, the article urges users to balance freedom of conversation with the responsibility of maintaining a respectful online environment.

Best LLM security research: Low-Resource Languages Jailbreak GPT-4

Researchers have identified a vulnerability in AI safety mechanisms by translating English prompts into low-resource languages, revealing a method to circumvent GPT-4’s safeguards.

Their experiments showed that GPT-4 responds to potentially harmful requests 79% of the time when given in languages like Zulu, using publicly available translation tools. This discovery is significant because it demonstrates a weakness in AI’s ability to filter out unsafe content across different languages, highlighting an oversight in the language coverage of current AI safety training.

The findings urge a shift in safety training to include multilingual data and address linguistic inequalities, as the current English-centric approach underestimates LLMs’ capabilities with low-resource languages and opens up exploitation risks by bad actors worldwide.

The study’s results call for comprehensive, inclusive red-teaming and more robust, global AI safety protocols.

Best LLM Hacking Game: Promptalanche by Fondu.ai

Promptalanche is an interactive online platform designed as a prompt injection Capture The Flag (CTF) challenge. It engages users in a series of levels where the objective is to coax an AI into revealing hidden “secrets” to progress to subsequent stages. The website serves as a unique playground for users to experiment with and understand the vulnerabilities of AI systems through the lens of prompt injection — a method that could potentially be misused in real-world AI applications.

As a precursor to a competitive CTF event with prizes, Promptalanche offers practice scenarios that mirror interactions with AI agents, providing educational insights into AI security. To participate, users need their OpenAI API key, which is securely used without storage or logging by the site, ensuring privacy while using the game’s interface to communicate with the OpenAI API. Promptalanche is a resource for both learning and anticipating the launch of a fully-fledged CTF challenge.

Best LLM Security Intro: Prompt Hacking and Misuse of LLMs

The author warns of the risks associated with their misuse, as prompt hacking can exploit vulnerabilities for deception or spreading misinformation. The evolution of LLMs from 2020 to 2023 has seen significant advancements from GPT-3 to the DALL·E series and beyond, revolutionizing efficiency and creativity in professional settings by at least 10% according to Sequoia Capital. These developments have come from tech giants like OpenAI, DeepMind, GitHub, Google, and Meta. Yet, their power is a double-edged sword; vast parameters and biased training data introduce risks. Prompt engineering becomes crucial to mitigate vulnerabilities like prompt injections, leaking, and jailbreaking. Defensive strategies such as filtering, contextual clarity, instruction defense, random sequence enclosure, sandwich defense, and XML tagging are vital in safeguarding against misuse, underlying the importance of informed and ethical use of LLMs amidst technological acceleration.

Best LLM Security Initiative: Microsoft AI Bounty Program

Microsoft has launched an AI Bounty Program inviting global security researchers to identify vulnerabilities in its AI-powered Bing services, including Bing.com, Microsoft Edge, Microsoft Start, and Skype applications. Rewards range from $2,000 to $15,000 based on the submission’s impact and quality, with the potential for higher compensation at Microsoft’s discretion. Eligible vulnerabilities must be previously unreported, critical, or important, and reproducible on the most recent versions of the service, with clear documentation provided for replication. Submissions should be made through the MSRC Researcher Portal and adhere to Research Rules of Engagement, which prohibit unauthorized data access, denial of service testing, and other non-technical attack methods. Vulnerabilities already known to Microsoft or the public, those based on user configuration, or affecting unsupported browsers, among others, are out of scope.

Best LLM Security Government Initiative

President Biden’s new Executive Order sets a precedent for AI regulation, focusing on safety, security, and trust. It ushers in rigorous standards for AI, mandates information sharing for high-risk AI developers, and tightens protections against AI-related privacy risks. New measures for AI in critical infrastructure and biological risk management reflect the government’s commitment to advance AI innovation responsibly.

The Order also addresses AI’s societal impact, ensuring equitable use and combating discrimination in areas like housing and criminal justice. It upholds consumer, worker, and student interests, by promoting ethical AI application and safeguarding against harmful practices. Furthermore, the strategy encourages AI competition and research, proposing a national resource for AI study and guidance for government AI deployment. Internationally, it seeks to shape global AI use while reinforcing America’s leadership. The administration vows continued bipartisan collaboration for holistic AI policy development.

Prompt Engineering

Meta shows how to reduce hallucinations in ChatGPT & Co with prompt engineering

ChatGPT and similar language models often generate inaccurate responses despite being trained with the correct information. To combat this issue, known as hallucination, Meta AI researchers have introduced a prompt-based technique called Chain-of-Verification (CoVe), which greatly diminishes such errors.

Meta AI’s new Chain-of-Verification (CoVe) method offers a breakthrough in enhancing the accuracy of AI language models like ChatGPT. By generating self-verification queries from the model’s responses, CoVe independently checks the accuracy of information, increasing the precision of answers to list-based prompts by over 100% and long-form content by 28%. Current improvements are internal, using the model’s stored knowledge, but future iterations may involve external databases for verification. This self-check approach outperforms current models that don’t use external data, setting a new standard in AI’s self-regulatory accuracy.

80+ ChatGPT-4 Vision features and real world applications explored

ChatGPT-4 Vision, OpenAI’s latest innovation for Plus and Enterprise users, bridges the gap between text and visual understanding, offering over 80 features for real-world application. This powerful AI can now analyze images, transforming workflows and productivity by streamlining tasks such as document interpretation and visual data analysis. From recognizing and describing a peculiar bug to providing context to conversations and explaining incomprehensible images, it expands ChatGPT’s functionality beyond text to a more interactive, multimodal experience. Whether for creating descriptive content for visually impaired users, deciphering handwritten notes, or assisting with retail recommendations and quality assessments, ChatGPT-4 Vision elevates the efficacy of image-based query responses. For an extensive list of applications and insights into this technology, visit Greg Kamradt’s website for a detailed Excel resource.

Unlock Your AI’s Full Potential: 10 Advanced Prompt Engineering Hacks in 2023

In this article, an author provides 10 lesser-known, advanced prompt engineering strategies that can significantly enhance your interactions with LLMs. Key techniques include constraint injection to focus AI responses and temperature/top-K tuning to manage creativity. Varied phrasing is recommended for tailored outputs, and reward modeling reinforces accurate AI behavior. Domain priming ensures context-specific replies, while sub-modeling breaks complex queries into manageable parts. Explicit negation keeps AI on-topic, and sequential prompting refines responses through dialogue. Benchmarking and A/B testing optimize prompts by comparing results, and custom tokenization fine-tunes comprehension. These methods enable users to craft prompts that leverage AI capabilities effectively. Ben-Zur emphasizes that prompt engineering combines creativity with analytical skills to enhance AI interactions, inviting the community to engage with the evolving art and science of AI prompt crafting.

Be the first who will know about the latest GPT-4 Jailbreaks and other AI attacks and vulnerabilities

Written by: admin

Rate it

November 6, 2023

Secure AI Weekly admin