Towards Trusted AI Week 36 – The Critical Quest for Secure and Reliable AI Systems

Secure AI Weekly + Digests admin September 5, 2023 90

UK cybersecurity agency warns of chatbot ‘prompt injection’ attacks

The Guardian, August 30, 2023

The United Kingdom’s National Cyber Security Centre (NCSC) has recently raised alarms about the escalating cybersecurity threats surrounding chatbots. These automated conversational agents, powered by large language models (LLMs) like OpenAI’s ChatGPT and Google’s Bard, are becoming increasingly vulnerable to “prompt injection” attacks. In these attacks, a user provides a malicious input designed to make the AI behave in ways it’s not intended to, thus potentially leading to a range of harmful outcomes, from generating inappropriate content to leaking confidential information.

Chatbots are ubiquitous in various online services, such as banking and shopping platforms, functioning to provide answers to user queries by mimicking human-like interactions. However, their utility extends to interfacing with third-party services and applications, thereby increasing the risk profile. For example, a user could manipulate a chatbot into taking actions it was not originally programmed to do, simply by inputting unfamiliar phrases or using specific word combinations to bypass the original script. Recent incidents include a Stanford University student, Kevin Liu, exposing Bing Chat’s concealed initial prompt, and security researcher Johann Rehberger demonstrating that ChatGPT could access unintended third-party data through YouTube transcripts.

Given these increasing vulnerabilities, the NCSC emphasizes that countermeasures must be integrated into the system’s architecture. They suggest that while individual AI models can’t be completely isolated from potential threats, system-wide security design can prevent or mitigate exploitation. For instance, overlaying a rule-based mechanism on top of the machine learning component can thwart malicious prompts from leading to damaging actions. The NCSC stresses the importance of acknowledging the inherent security flaws in machine learning algorithms and taking steps to safeguard against them, particularly as chatbots become more integral to data transmission across third-party applications.

From Google To Nvidia, Tech Giants Have Hired Red Team Hackers To Break Their AI Models

Forbes, September 1, 2023

Leaders of specialized “red teams” at tech giants like Microsoft, Google, Nvidia, and Meta have a critical mission: to identify and rectify vulnerabilities in their respective artificial intelligence systems. In doing so, these red teams not only ensure the reliability and security of AI technologies but also confer a competitive edge to their companies. This dual focus on safety and competitive advantage is becoming increasingly pivotal in the AI sector. As Sven Cattell, founder of the AI Village—a hub for AI security experts and ethical hackers—succinctly stated, “Trust and safety are becoming the new competitive moats in the AI landscape.”

Meta, a frontrunner in this space, established its AI red team as early as 2019. The unit regularly conducts internal tests and “risk-a-thons,” challenging ethical hackers to beat algorithms that are designed to filter out harmful content like hate speech and deep fakes on platforms such as Instagram and Facebook. Their efforts have been considerably amplified in July 2023 when Meta onboarded 350 red team specialists, including external experts and internal staff. These experts were instrumental in probing the vulnerabilities of Llama 2, Meta’s newly developed open-source language model, using high-risk prompts related to tax evasion or fraudulent schemes. As Canton, the head of Meta’s red team, emphasized, “Our guiding principle is: ‘The more you sweat in training, the less you bleed in battle.'”

The strategic importance of red teams cannot be overstated, especially as tech companies increasingly realize that the future of AI lies in establishing public trust through robust security measures. As industries shift toward marketing their AI solutions as the safest options available, the role of red teams in identifying weaknesses and bolstering security becomes even more crucial. This proactive approach to security is not merely about fixing vulnerabilities; it’s about building a culture where the safety and reliability of AI systems are prioritized, ultimately leading to increased public trust in these advanced technologies.

Hacking the future: Notes from DEF CON’s Generative Red Team Challenge

CSOOnline, August 28, 2023

At this year’s DEF CON hacker conference in Las Vegas, billed as the world’s largest gathering of its kind, the spotlight wasn’t just on traditional hacking domains like lock manipulation or automotive systems. Artificial Intelligence, notably large language models (LLMs), also took center stage. My colleague Barbara Schluetter and I were drawn to the Generative Red Team Challenge, a ground-breaking event aimed at probing the vulnerabilities of LLMs. This unique challenge, representative of the White House’s May 2023 directive to rigorously assess LLMs, drew an overwhelming number of participants, indicating a higher level of interest than the event could accommodate. Austin Carson, from the AI-centric organization SeedAI, shared that the challenge aimed to unite diverse testers to explore the AI’s limitations and potential failures.

Our hands-on experience during the event revealed significant security gaps in LLMs. Tasked with completing as many challenges as possible within a 50-minute timeframe, we chose three primary tasks: to induce the LLM to spread false information, to make it divulge data that should be safeguarded, and to obtain administrative access to the system. While we failed to secure administrative privileges, we did succeed in easily manipulating the model into generating fake narratives. Disturbingly, we could also trick the LLM into sharing sensitive surveillance tactics, merely by reframing questions in various ways. These findings highlight the brittle nature of current AI systems and the urgent need for robust verification and safeguard mechanisms.

The Generative Red Team Challenge at DEF CON 31 served as a crucial wake-up call for the AI community. Our successes and failures during the event demonstrated both the potential and the pitfalls of current LLMs. According to Austin Carson, full results of the challenge will be disclosed in the coming months, but one thing is abundantly clear: the quest for a secure, reliable AI is far from over. A collective effort from both the public and private sectors is essential for identifying and mitigating vulnerabilities, to make the world of AI a safer place. And while we didn’t come away as triumphant hackers, the badges we did earn symbolize the pressing need for continued scrutiny and improvement in AI security.

Subscribe for updates

Stay up to date with what is happening! Get a first look at news, noteworthy research and worst attacks on AI delivered right in your inbox.

Written by: admin

Rate it

September 4, 2023

LLM Security Digest admin

Towards Trusted AI Week 36 – The Critical Quest for Secure and Reliable AI Systems

UK cybersecurity agency warns of chatbot ‘prompt injection’ attacks

From Google To Nvidia, Tech Giants Have Hired Red Team Hackers To Break Their AI Models

Hacking the future: Notes from DEF CON’s Generative Red Team Challenge

Subscribe for updates

Previous post

LLM Security and Prompt Engineering Digest: Top August events, guides, incidents, VC reviews and research papers

Similar posts

Top MCP security resources — July 2026

Top Agentic AI security resources — July 2026