Towards Trusted AI Week 35 – The Achilles’ Heel of AI

Secure AI Weekly + Trusted AI Blog admin September 1, 2023 139

Tricks for making AI chatbots break rules are freely available online

NewScientist, August 21, 2023

Artificial intelligence chatbots like ChatGPT have become essential tools for various online activities, but their security loopholes present an emerging concern. Manipulative text prompts, often referred to as “jailbreak prompts,” can mislead these AI systems into overriding built-in safeguards against illicit usage. These prompts have been circulating freely on social media and other online platforms, exposing chatbots to potential criminal exploitation for an extended period. Researchers like Xinyue Shen of the CISPA Helmholtz Center have conducted studies revealing that, on average, 69% of these jailbreak prompts successfully bypass the chatbot’s ethical restrictions. Some prompts even demonstrated a nearly flawless success rate, particularly in areas like political lobbying, generating explicit content, or producing legal advice—activities that chatbots are designed to avoid.

Experts in the field are raising red flags about the simplicity of manipulating AI technologies and the collective responsibility to secure them. Victoria Baines at Gresham College notes that security threats do not always need to be complex; here, everyday language suffices to create a breach. Alan Woodward from the University of Surrey further points out that as chatbots and other large language models (LLMs) become more advanced, a concerted effort is essential to ensure they operate within predetermined safe limits. The issue is complicated further by the industry’s muted response; while some organizations like Databricks acknowledge the issue, others including OpenAI have yet to comment.

Addressing these security gaps is a complex, ongoing challenge, often described as a cat-and-mouse game between developers and malicious actors. Solutions like a “jailbreak classifier,” which could identify and flag risky prompts, are on the table but are yet to be implemented. As AI technologies become more integral to our daily lives, the urgency to fortify these systems against misuse has never been greater. Developing robust, adaptive security measures is not just advisable but essential for the safe and ethical deployment of AI chatbots.

AI Roundup: There’s still no way to stop AI ‘jailbreaks’

Advisory, August 24, 2023

The rapid integration of Artificial Intelligence (AI) into healthcare comes with a whirlwind of hope and apprehension. While medical professionals are optimistic about AI’s potential to improve patient care, there’s an underlying concern about AI’s erratic behavior conflicting with the medical ethic of “do no harm.” Further fueling these worries are fears about job displacement within the healthcare sector, although such predictions might be premature given the current state of AI technology.

One significant security concern that cannot be overlooked is the phenomenon of AI ‘jailbreaks.’ Even the most advanced Large Language Models (LLMs), like ChatGPT, come with built-in safety measures intended to prevent hazardous or inappropriate responses. However, these protections are far from foolproof. They can often be sidestepped by crafty tactics that prompt the AI into undesirable behavior, posing a considerable risk, especially as healthcare facilities increasingly deploy AI in customer-facing applications.

As AI continues to make strides, its advantages and disadvantages become more apparent. For instance, while AI can notably boost performance among struggling students, it can conversely impede the progress of high-achieving individuals. This duality underscores the need for a nuanced approach to AI adoption, one that carefully considers its merits and limitations. Whether it’s ensuring robust security measures to guard against ‘jailbreaks’ or accurately assessing the technology’s impact on employment, a balanced perspective is essential as AI becomes an integral part of diverse sectors, including healthcare.

Securing AI

Greylock, August 22, 2023

In recent times, Artificial Intelligence (AI) and foundational models have seized the attention of CIOs and CISOs in Fortune 100 corporations and fast-evolving startups. The overwhelming focus on AI stems from its enormous promise to transform business operations, much like how cloud computing did when platforms like AWS were introduced. However, as companies strategize the rollout of Large Language Models (LLMs) into their operations, they are confronting various challenges, particularly in data handling, latency, scalability, and above all, security. Just like the nascent days of cloud computing, the evolving landscape of AI presents a novel and yet-to-be-secured area that is becoming a priority for organizations.

The unique aspect of securing AI-driven applications lies in their unpredictable nature. Traditional software operates deterministically, producing the same output for the same input. In contrast, AI, especially in its interactions with LLMs, operates stochastically, meaning the same input could result in different outputs. This variance makes it exceedingly complex to put definitive security measures in place. Even security controls that are aimed at overseeing AI’s behavior might themselves be based on these large language models, thus creating a complicated, non-deterministic loop. This dynamic scenario makes AI a subject of intense scrutiny among risk and security experts.

In response to these emerging challenges, both immediate and futuristic solutions are being considered. For the short term, the focus is on achieving visibility, governance, and auditability to reduce risk. Tools are being developed to monitor the audit logs for both inputs to and outputs from these large language models to avoid potential pitfalls such as data leaks or intellectual property violations. In the long-term perspective, we are likely to see innovations in real-time protections and detection-response mechanisms, tailored specifically for AI operations. Given that AI is quickly becoming a cornerstone of modern enterprise systems, securing it is not just an option but a necessity. Therefore, there’s a burgeoning opportunity for startups and security experts to innovate in this field, and at Greylock, we’re excited to engage with those who are at the forefront of this crucial technological intersection.

Subscribe for updates

Stay up to date with what is happening! Get a first look at news, noteworthy research and worst attacks on AI delivered right in your inbox.

Written by: admin

Rate it

August 25, 2023

Secure AI Weekly admin

Towards Trusted AI Week 35 – The Achilles’ Heel of AI

Tricks for making AI chatbots break rules are freely available online

AI Roundup: There’s still no way to stop AI ‘jailbreaks’

Securing AI

Subscribe for updates

Previous post

Towards Trusted AI Week 34 – Defcon AI Red Teaming wrap-ups and the Quest for AI Security

Similar posts

Prompt Injection Risks Interview: Are AIs Ready to Defend Themselves? Conversation with ChatGPT, Claude, Grok & Deepseek

Microsoft’s Taxonomy of Failure Modes in Agentic AI Systems — TOP 10 Insights

Towards Trusted AI Week 35 – The Achilles’ Heel of AI

Tricks for making AI chatbots break rules are freely available online

AI Roundup: There’s still no way to stop AI ‘jailbreaks’

Securing AI

Subscribe for updates

Previous post

Towards Trusted AI Week 34 – Defcon AI Red Teaming wrap-ups and the Quest for AI Security

Similar posts

Prompt Injection Risks Interview: Are AIs Ready to Defend Themselves? Conversation with ChatGPT, Claude, Grok & Deepseek

Microsoft’s Taxonomy of Failure Modes in Agentic AI Systems — TOP 10 Insights

Microsoft’s Taxonomy of Failure Modes in Agentic AI Systems — TOP 10 Insights