Towards Secure AI Week 42 – New Jailbreaks and Incidents

Secure AI Weekly + Trusted AI Blog admin todayOctober 23, 2024 65

Background
share close

LLMs are easier to jailbreak using keywords from marginalized groups, study finds

The Decoder, October 20, 2024

A recent study highlights unintended vulnerabilities in the safety protocols of large language models (LLMs), revealing that well-meaning ethical measures can introduce security gaps. Researchers found that the ease with which these models can be “jailbroken”—bypassing safeguards to generate harmful content—varies based on the demographic terms used in prompts. For instance, prompts containing terms related to marginalized groups were more likely to produce unwanted outputs compared to those using terms for privileged groups. The study, titled “Do LLMs Have Political Correctness?”, showed significant differences in jailbreak success rates, with a 20% higher rate for non-binary versus cis-gender keywords, and a 16% gap between terms like “black” and “white.” The researchers attribute these discrepancies to intentional biases introduced to ensure ethical behavior, which may inadvertently create exploitable weaknesses.

The research introduced the “PCJailbreak” method to test how demographic keywords impact model vulnerabilities, revealing that prompts using marginalized group terms had higher success rates in bypassing safety measures. This finding suggests that while AI developers have focused on enforcing fairness, they may have unintentionally made certain models more susceptible to attack. Comparisons of different models showed that Meta’s Llama 3 performed better at resisting these attacks, while OpenAI’s GPT-4o exhibited weaker resistance, likely due to its emphasis on fine-tuning against discrimination. These results emphasize the importance of strengthening AI safeguards to ensure that both security and fairness are maintained as AI systems become increasingly integrated into society.

Invisible text that AI chatbots understand and humans can’t? Yep, it’s a thing.

ArsTechnica, October 14, 2024

Recent discoveries have revealed a new security vulnerability in popular AI chatbots like Claude and Copilot, where attackers can embed malicious instructions using invisible characters. These hidden characters, created by a quirk in the Unicode text encoding standard, are recognized by large language models (LLMs) but remain invisible to human users. This loophole allows attackers to stealthily input commands and extract sensitive data such as passwords or financial details. By mixing hidden text with normal text, these characters can easily slip into prompts or chatbot outputs, creating a covert channel for malicious activity. This vulnerability essentially introduces a steganographic method that exploits a widely used text encoding framework, presenting a serious concern for AI security.

The vulnerability is not just theoretical. Researchers have demonstrated successful attacks where sensitive information was extracted using this technique. For example, Johann Rehberger developed two proof-of-concept attacks targeting Microsoft 365 Copilot, using invisible characters to secretly extract sales figures and one-time passcodes from users’ inboxes. By embedding these hidden instructions into a seemingly harmless URL, attackers could trick users into clicking on the link, which then transmitted confidential data to a remote server. Although Microsoft has since implemented fixes, the research highlights the significant risk that AI systems face from “ASCII smuggling” and prompt injection attacks. As LLMs become more integrated into various industries, securing these systems against such covert manipulations is vital for maintaining their safety and reliability.

Guidelines and Companion Guide on Securing AI Systems

CSA Singapore, October 15, 2024

AI systems are vulnerable to adversarial attacks and cybersecurity risks, which could lead to data breaches and other serious issues. To counter these threats, it is essential that AI systems be “secure by design” and “secure by default,” just like any other digital infrastructure. The Cyber Security Agency of Singapore (CSA) has developed Guidelines on Securing AI Systems to assist system owners in addressing both traditional cybersecurity risks and new threats, such as Adversarial Machine Learning.

To further support system owners, CSA partnered with AI and cybersecurity experts to create a Companion Guide on Securing AI Systems. This resource offers practical measures, security controls, and best practices derived from industry and academic sources, helping to safeguard AI systems throughout their lifecycle. It also references important tools like the MITRE ATLAS database and the OWASP Top 10 for Machine Learning and Generative AI. By providing these guidelines and resources, CSA aims to help organizations navigate the complex and evolving field of AI security, ensuring their systems remain safe and resilient against both current and emerging threats.

ByteDance intern fired for planting malicious code in AI models

ArsTechnica, October 21, 2024

ByteDance recently addressed rumors that surfaced on Chinese social media regarding an intern allegedly sabotaging the company’s AI model training efforts, leading to significant financial losses. According to ByteDance, the intern, who worked on the commercial technology team, was fired in August for “serious disciplinary violations,” including malicious interference with a research project’s AI training tasks. While ByteDance admitted to the sabotage, the company stated that none of its commercial projects or AI models were impacted. Rumors claiming the sabotage involved 8,000 graphical processing units (GPUs) and caused tens of millions in losses were dismissed by ByteDance as exaggerated. Furthermore, the company stated that the intern falsely represented his role as being part of ByteDance’s AI Lab, and his university and industry associations were informed about his misconduct.

Despite ByteDance’s efforts to quash these rumors, online speculation persisted. Some commenters suggested that ByteDance was downplaying the damage caused, claiming the intern’s malicious code sabotaged research for several months. Others questioned the distinction ByteDance made between its AI Lab and the commercial technology team, implying that the intern’s actions could have had more far-reaching consequences. This incident comes at a time when ByteDance is already struggling to catch up with competitors in the AI race, facing internal talent shortages and regulatory challenges, particularly with its popular app TikTok under scrutiny for privacy and security concerns. As the company ramps up its efforts in AI development, including generative AI for platforms like TikTok, any setbacks could hinder its ability to compete globally with rivals like Google, Meta, and OpenAI.

Banks must be wary of AI security risks, regulator says

BankingDive, October 17, 2024

The New York State Department of Financial Services (NYDFS) recently released guidance urging financial firms to evaluate and mitigate the cybersecurity risks associated with artificial intelligence (AI). While the guidance does not introduce new regulations, it underscores four major AI-related risks: social engineering, cyberattacks, theft of nonpublic information (NPI), and increased vulnerabilities due to supply chain dependencies. A particular focus was placed on the threat of deepfakes, which can deceive employees into revealing sensitive information. These AI-generated videos, photos, or audio recordings can lead to unauthorized access to systems containing NPI, increasing the risk of cybercriminals gaining access to critical data. Additionally, AI-driven social engineering attacks have led to significant financial losses through fraudulent transactions. The guidance highlights the growing challenge AI poses to the financial sector, as it enables cybercriminals to scale attacks more rapidly and lowers the barrier for less sophisticated threat actors to exploit vulnerabilities.

NYDFS Superintendent Adrienne Harris emphasized the importance of financial institutions maintaining a strong cybersecurity posture by having adequate in-house expertise or seeking external support. She advised firms to ensure multiple layers of cybersecurity defenses, incorporating risk assessments, third-party management, and multi-factor authentication. Proper training for employees on the risks posed by AI is also crucial to safeguarding NPI. The guidance responds to increasing concerns about AI’s role in shaping cyber risks and its potential to exacerbate vulnerabilities in the financial sector. Harris noted that while AI offers advancements in threat detection, it also creates new opportunities for cybercriminals to operate at a larger scale. To address these evolving threats, New York will continue to enforce stringent security standards while allowing flexibility for institutions to adapt to a rapidly changing digital landscape.

 

Subscribe for updates

Stay up to date with what is happening! Get a first look at news, noteworthy research and worst attacks on AI delivered right in your inbox.

    Written by: admin

    Rate it
    Previous post