LLM Security and Prompt Engineering: Best Events of September From LLM Hacking Games to Gartner Research

Trusted AI Blog + LLM Security admin October 5, 2023 159

This digest encapsulates the most influential findings and discussions from the LLM Security with some of the most important prompt engineering highlights.

Subscribe for the latest LLM Security news: Jailbreaks, Attacks, CISO guides, VC Reviews and more

LLM Security

Best LLM Hacking Game: Tensor Trust

A riveting online game is aimed at discovering vulnerabilities through prompt injections in LLMs.

The blog post discusses the creation and purpose of “Tensor Trust,” an online game designed to help researchers create a dataset regarding prompt injection vulnerabilities in Language Models (LLMs).

The game is themed around bank accounts and involves players attempting to break into others’ accounts, leveraging and understanding the vulnerabilities in LLMs through crafted adversarial prompts, while others create defenses to prevent such breaches.

The broader intention behind Tensor Trust is to enhance understanding of LLM weaknesses and to further develop more robust models, providing a rich dataset that could aid in evaluating adversarial defenses, constructing new strategies for detecting jailbreaking, and comprehending how LLMs function in various scenarios.

Best LLM Security visionary post: Venture in Security

The overview authored by Ross Haleliuk discusses the pressing issue of prioritizing security in the rapid development of AI and ML technologies, drawing a parallel with past security oversights in internet and construction technologies.

Through the analogy of construction technology’s evolution, particularly around earthquake-resistant structures, it demonstrates the complexities and challenges of retrofitting security measures to already established structures and systems, both in physical and digital domains. The post illuminates how the cybersecurity sector, despite advancements, still battles with foundational insecurities of the internet due to initial oversight during its creation.

Despite knowledgeable professionals and emerging companies aiming to secure AI and ML, the author underscores that without a fundamental integration of security in AI infrastructure from day one, subsequent efforts might be insufficient, pointing towards a need for immediate action in embedding security in AI and ML development processes.

Best LLM security VC review: The Rise of the AI Governance Stack

In the review, Dharmesh Thakker from Battery Ventures emphasizes the fact that AI strategy is now a top priority among businesses particularly in light of fastest-growing consumer applications like OpenAI’s ChatGPT which reached 100 million users within two months.

The author raises concerns due to misuse cases like Samsung’s ChatGPT ban after a code leak and Lensa AI’s class-action lawsuit for collecting biometric data. As AI adoption surges, ethical and transparent implementations are urged by regulators shaping responsible AI governance. The EU is advancing AI regulations with its AI Act, while the U.S. currently relies on voluntary frameworks and local mandates.

Dharmesh Thakker rightly asserts that companies must be proactive about AI risks, considering both financial and brand impacts. Emerging AI governance tools aim to ensure compliant AI systems and bridge the knowledge gap between technical and business teams. Trust, transparency, and AI safety remain paramount for future developments in this sector.

Best LLM security research: Model Leeching

The paper introduces “Model Leeching,” a world first cost-effective extraction attack targeting Large Language Models (LLMs) that distills task-specific knowledge from an LLM into a smaller model.

The authors showcased the potential to stage adversarial attacks on a main LLM using an extracted model tested within a controlled environment. The results indicate that these models can be created with notable resemblance and performance at minimal costs. They serve as a foundation for transferring attacks and capitalizing on data extracted from the primary LLM.

The study underscores the practical risks of data leakage, model stealing, and attack transferability in LLMs which were previously known to be only theoretical , suggesting that these extracted models can facilitate further adversarial attacks on target LLMs.

Best real LLM attack: How Robust is Google’s Bard to Adversarial Image Attacks?

The research demonstrates a real attack with the objective to investigate the vulnerability of Google’s Bard, a chatbot with multimodal capabilities, to adversarial attacks.

Adversarial examples misled Bard with a 22% success rate on image descriptions, Bing Chat at 26%, and ERNIE Bot at 86%. These findings highlight the susceptibilities of commercial MLLMs to such threats. Although Bard has defense mechanisms like face and toxicity image detection, they can be bypassed.

The safety of large-scale models like Bard and ChatGPT, especially against image attacks, is crucial. Traditional defense methods may not be apt for these models, prompting the need for better defense strategies.

Best LLM Security Analyst Post: Gartner article “Tackling Trust, Risk and Security in AI Models”

Generative AI presents extensive opportunities, but many organizations neglect the associated risks until AI is in use. The post from Gartner considers AI TRiSM (Trust, Risk, Security Management) program ensuring AI systems are compliant, fair, and maintain data privacy.

The author presents six drivers of AI risk, including:

lack of clear AI explanation to stakeholders, emphasizing model functionality, strengths, biases, and behavior;
open access to tools like ChatGPT, which are risks not addressed by traditional controls, especially in cloud-based applications;
third-party AI tools introducing data confidentiality concerns, impacting adoption and user trust;
lack of AI models monitoring, which should be constant;
AI threats from adversarial attacks and lack of methods to avoid and detect them;
emerging regulations, such as the EU AI Act, necessitate preparedness beyond existing privacy-related regulations.

Best LLM Security Initiative: Open AI Red Teaming Network

Open AI announced an open call for the OpenAI Red Teaming Network. They invite domain experts interested in improving the safety of OpenAI’s models to collaborate in rigorously evaluating and red teaming AI models.

The OpenAI Red Teaming Network consists of expert members who assist in risk assessments during model and product development. Participation varies per expert, possibly 5–10 hours annually. Beyond OpenAI campaigns, members share red teaming practices and insights, promoting continuous feedback. This complements other AI safety initiatives by OpenAI.

Prompt Engineering

Prompt Engineering 101: Zero, One, and Few-Shot Prompting

LLMs (Large Language Models) are predictive tools designed to forecast the next word in a sequence based on given context. While these models are trained on expansive datasets, their effectiveness is strongly influenced by the context supplied by user input. For optimal results with LLM-driven chatbots, such as ChatGPT, providing adequate context is crucial. There are three primary strategies to offer context within user prompts: zero-shot prompting, one-shot prompting, and few-shot prompting. They are described in this blog post.

25 of the best AI and ChatGPT courses you can take online for free

This article provides a wide range of free online courses on Udemy. Look at the best online AI and ChatGPT courses you can take for free in September 2023. With resources provided in the article with links, you get unlimited access to the content, but free online courses do not include a certificate of completion or direct messaging with the instructor.

Forbes: This State-Of-The-Art Directional-Stimulus Prompt Engineering Technique For Generative AI Earns Bigtime Payoffs Via Hints

The article emphasized an idea that hinting in prompt engineering is vital as a best practice. Hints are needed to direct or provide additional information.

The author considers a new approach called Directional Stimulus Prompting (DSP) which is used to guide the response of large language models (LLMs) like ChatGPT. The DSP introduces “directional stimulus” into the prompt which serves as hints or cues for the LLM to generate a desired response. For example, by giving hints in a summarization task, writing keywords or phrases after “Hint:”, the LLM can be guided to produce a specific type or logic of summary.

Users can devise these hints themselves, thinking of what information they want to extract from the AI. There are two main ways to derive hints: human-derived, where a user provides the hints, and AI-derived, where an automated system suggests hints. This approach can also be extended in a “flipped interaction”, where the AI provides hints to humans upon request, which can be useful in tasks like learning or seeking a brief overview of a topic.

So hints can be a valuable tool in guiding AI interactions and achieving more refined results.

All research and articles showcase the intricacies, vulnerabilities, and innovations in the realm of LLMs. While each provides a unique perspective – be it from a hacker’s game or a VC’s review – they collectively underline the monumental impact of AI. The commonality lies in their shared endeavor to understand, secure, and optimize AI, while their differences highlight the multifaceted challenges and innovations in this ever-evolving domain.

Subscribe to our LinkedIn to join a community of experts and enthusiasts dedicated to unmasking the hidden dangers of AI and LLM technology.

Be the first who will know about the latest GPT-4 Jailbreaks and other AI attacks and vulnerabilities

Written by: admin

Rate it

October 5, 2023

Secure AI Weekly admin