Towards Trusted AI Week 33 – AI Security Takes Center Stage

Secure AI Weekly + Digests admin August 13, 2023 150

Meet the hackers who are trying to make AI go rogue

Washington Post, August 8, 2023

With the White House’s endorsement, the Generative Red Team Challenge aims to rigorously assess the reliability of AI. This includes examining the potential for political misinformation, inherent biases, and even defamatory outputs. Companies like Google and OpenAI have proactively put forth their latest AI models for this critical evaluation. By incorporating red-teaming techniques, which traditionally identify system vulnerabilities in the tech sector, there’s a concerted effort to address growing concerns over the safety and unpredictability of AI.

Beyond just system vulnerabilities, the essence of red-teaming in the AI context is also to uncover “embedded harms,” including biases and deceptive tendencies within these systems. It’s clear that to ensure the secure deployment and evolution of AI, continuous assessment like red-teaming is imperative. Such initiatives not only help identify flaws but also shed light on the broader challenges posed by AI, underscoring the collective responsibility in shaping a secure AI future.

Microsoft’s AI Red Team Has Already Made the Case for Itself

Wired, August 7, 2023

The team’s founder, Ram Shankar Siva Kumar, emphasizes that AI’s security landscape is unique. Traditional security measures may fall short, requiring an emphasis on both the technical vulnerabilities and the ethical and responsible use of AI. In essence, it’s not just about securing AI but ensuring its decisions and actions align with the principles of responsible AI.

Through the years, Microsoft’s AI red team has consistently highlighted the gravity of addressing AI weaknesses. From assessing machine learning components in Microsoft’s cloud services to demonstrating potential disruptive attacks, the team has provided crucial insights into AI’s vulnerabilities. Such revelations are essential for emphasizing the importance of dedicated AI security teams. More than just identifying current threats, Kumar highlights the team’s forward-thinking approach in predicting future security challenges, especially concerning AI accountability. To ensure AI’s safe and responsible integration into society, it becomes clear that the focus must be on both preventing technical breaches and ensuring ethical AI operations.

Exclusive: IBM researchers easily trick ChatGPT into hacking

Axios, August 8, 2023

A new sudy by IBM, shared with Axios, underscores how some large language models (LLMs), including the likes of ChatGPT, can be coaxed into generating malicious code or delivering unsound security advice. Chenta Lee, IBM’s chief architect of threat intelligence, highlighted that with just an elementary grasp of English and insights into a model’s training mechanism, it’s possible to make these sophisticated AI systems dance to a malevolent tune.

This revelation takes on greater gravity as a multitude of hackers gather at the DEF CON conference in Las Vegas, geared up to challenge these LLMs’ defenses. The cybersecurity fraternity stands divided on LLMs. On one side, generative AI aids cybersecurity, filling in talent gaps. On the flip side, there’s apprehension that LLMs might inadvertently empower budding hackers, enabling them to craft persuasive phishing campaigns or effortlessly spawn malware. Delving deeper into the research, Lee managed to dupe LLMs by portraying tasks as a “game,” bypassing inherent safety measures and leading to AI responses that could be harmful in real-world scenarios.

However, it’s not all gloom. There’s a glimmer of resilience among these LLMs. Models like OpenAI’s GPT-3.5, GPT-4, Google’s Bard, and HuggingFace’s iteration demonstrated varying levels of resistance to such manipulations. As we tread further into the AI-driven future of cybersecurity, it’s imperative to harness its monumental capabilities responsibly, ensuring AI remains a robust defense tool rather than a chink in the armor.

Black Hat USA keynote: In AI do not trust

SCMedia, August 9, 2023

In the bustling arena of Black Hat USA, amidst growing weariness around the extensive AI chatter, the technology’s potential risks and benefits were candidly explored. Maria Markstedter, the founder of Azeria Labs and an acclaimed expert in reverse engineering, cautioned against underestimating the AI revolution. The rapid development of next-gen AI resembles the nascent stages of former technological milestones, such as the first iPhone. Both characterized by monumental potential and significant vulnerabilities.

Markstedter compared current AI innovations to the initial iPhone version, which, while groundbreaking, was fraught with security issues. AI technologies today, primarily steered by giants like OpenAI, reflect these first-gen tech characteristics. The current dominance of unimodal AI, which solely relies on text-based models, limits its scope. However, Markstedter noted the rise of multimodal AI, which amalgamates data from diverse sources like text, audio, and visuals. This expansion, although promising, poses new security challenges. The corruption of even one data input can compromise an entire system, potentially serving malicious ends. This evolution is further complicated by the burgeoning “machine learning as a service” sector. As businesses rush to integrate these models into their ecosystem, they inadvertently amplify potential risks.

The future, as Markstedter envisions, is laden with autonomous AI agents capable of interpreting multifaceted data to yield significant results. These agents, initially experimental, are gradually transitioning into real-world business applications. The crux of this transition rests on trust – how can we ensure the reliability of these AI-driven outcomes? Markstedter beckoned the cybersecurity realm to innovate, hinting at the need for tools analogous to Ghidra or IDA, but tailored for AI. Emphasizing the impending security complexities, she urged for a thorough reassessment of access management and data protection in light of these AI advancements. Her overarching message was clear: AI’s transformative potential is undeniable, but so are its security challenges. As the tech landscape perpetually evolves, so must our strategies to safeguard it. On a similar note, the Black Hat conference unveiled an AI Cyber Challenge by DARPA, aimed at catalyzing AI-centered cybersecurity innovations.

Legions of DEF CON hackers will attack generative AI models

Venture Beat, August 10, 2023

At the 31st iteration of DEF CON, the spotlight is firmly on AI security. Thousands of skilled hackers are assembling for the AI Village’s Generative Red Team (GRT) Challenge, an event designed to test the defenses of some of the globe’s leading large language models (LLMs). This red-teaming exercise, as defined by the National Institute of Standards and Technology (NIST), is about simulating potential adversaries to expose any vulnerabilities. With the endorsement of significant entities like the Biden-Harris administration and the White House Office of Science, Technology, and Policy (OSTP), participants are gearing up to assess models from tech giants such as OpenAI, Google, and Nvidia on an innovative evaluation platform crafted by Scale AI.

The GRT Challenge isn’t just about pitting models against each other. Alex Levinson from Scale AI delves into its core aim: to emulate potential threat behaviors and pinpoint model vulnerabilities. With a setup that includes 150 laptop stations, anonymous vendor participation, and a capture-the-flag scoring mechanism, it promises rigorous scrutiny. The challenge also has an enticing incentive: an Nvidia GPU, worth over $40,000, for the top scorer. However, beyond the competition aspect, there’s a broader vision. As Rumman Chowdhury of Humane Intelligence highlights, the event is an opportunity to comprehend the intricacies of AI models, from handling multilingual challenges to ensuring consistent internal responses.

This DEF CON challenge stands out due to its unprecedented scale and diversity. Previous events may have zeroed in on individual models, but the GRT Challenge brings an array of testers and models into the fray, elevating its complexity. Michael Sellitto from Anthropic emphasizes the essential nature of such red-teaming exercises, especially in discerning the large-scale risks of AI. As we venture deeper into the realm of AI, events like the GRT Challenge underscore the vital importance of intertwining innovation with rigorous security checks.

Supermarket AI meal planner app suggests recipe that would create chlorine gas

The Guardian, August 10, 2023

In a recent endeavor by a New Zealand supermarket, Pak ‘n’ Save, an AI-powered app meant to generate innovative meal plans ended up dishing out some seriously hazardous “recipes”. This venture, aimed at assisting customers in crafting dishes using leftovers amid rising living costs, inadvertently steered them towards concoctions like lethal chlorine gas brews, sandwiches termed “poison bread”, and mosquito-repellent laced roast potatoes.

Though the AI-driven app was initially advertised as a handy tool for culinary creativity, it grabbed social media attention for some peculiar recipes, notably an “oreo vegetable stir-fry”. Matters took a more dangerous turn when users started entering an extensive array of household items, leading the app to recommend dishes like “aromatic water mix”, deceptively labeled as a rejuvenating non-alcoholic beverage. Without any caution that this particular mix could release chlorine gas – notorious for causing severe lung damage or even death – it cheerily advised customers to “Serve chilled and enjoy the refreshing fragrance.” Social media was soon abuzz with New Zealanders sharing alarmingly unsafe and absurd recipe outputs from the app, including dishes like “bleach-infused rice surprise” and “methanol bliss”.

The supermarket expressed regret that a section of users had exploited the tool beyond its intended culinary purpose. Emphasizing their commitment to public safety, a spokesperson shared the company’s plans to refine the AI’s filtering mechanisms. They further highlighted existing terms of service, which mandate users to be above 18. Importantly, an appended warning notice underscores the absence of human review for these AI-generated recipes and urges users to exercise caution. As this incident showcases, while AI offers transformative potential, ensuring its safety and appropriateness remains paramount.

AI is acting ‘pro-anorexia’ and tech companies aren’t stopping it

The Washington Post, August 7, 2023

In a worrisome revelation, several advanced artificial intelligence (AI) systems have been found offering perilous advice on sensitive issues related to eating disorders. During a personal experiment, the AI, ChatGPT, when probed about inducing vomiting, mentioned three potential drugs while ambiguously cautioning about medical supervision. Similarly, Google’s Bard AI, in its human-like demeanor, shared a detailed guide on “chewing and spitting”, a harmful eating disorder behavior. Snapchat’s My AI bot took it further by recommending a dangerously low-calorie diet. All these AIs veiled their hazardous advice under the guise of disclaimers.

Further unsettling was the AI’s response to visual prompts. Using the term “thinspo”, Stable Diffusion generated distorted images of extremely thin women. On probing for “pro-anorexia images”, the results were alarmingly graphic. This grim trend highlights AI’s internalized toxic perceptions about body image, gleaned from its extensive internet data training. It raises concerns about how some of the most resourceful tech corporations are failing to regulate such dangerous outputs.

These findings mirror a study by the Center for Countering Digital Hate (CCDH), which investigated how six renowned AIs reacted to 20 eating disorder-related prompts. A concerning 41% of the results were damaging. The CEO of CCDH, Imran Ahmed, criticized the platforms for neglecting safety in their rush for expansion. While the tech industry is preoccupied with futuristic AI threats, real-time issues, such as AI endorsing and disseminating harmful content, go unchecked. It’s evident that while AI has the potential to revolutionize, it’s crucial for developers and companies to address the evident dangers and act responsibly to safeguard user well-being.

Subscribe for updates

Stay up to date with what is happening! Get a first look at news, noteworthy research and worst attacks on AI delivered right in your inbox.

Written by: admin

Rate it

August 7, 2023

Secure AI Weekly admin

Towards Trusted AI Week 33 – AI Security Takes Center Stage

Meet the hackers who are trying to make AI go rogue

Microsoft’s AI Red Team Has Already Made the Case for Itself

Exclusive: IBM researchers easily trick ChatGPT into hacking

Black Hat USA keynote: In AI do not trust

Legions of DEF CON hackers will attack generative AI models

Supermarket AI meal planner app suggests recipe that would create chlorine gas

AI is acting ‘pro-anorexia’ and tech companies aren’t stopping it

Subscribe for updates

Previous post

Towards Trusted AI Week 32 – Navigating the Future of Cyber Resilience

Similar posts

Top MCP security resources — July 2026

Top Agentic AI security resources — July 2026