Towards Secure AI Week 43 – New Tools and AI Incidents

Secure AI Weekly + Trusted AI Blog · October 30, 2024


SAIF Risk Assessment: A new tool to help secure AI systems across industry

Google Blog, October 24, 2024

Last year, Google introduced the Secure AI Framework (SAIF) to promote the safe and responsible deployment of AI models. Designed to support developers and security professionals, SAIF provides best practices and a standardized framework that emphasizes security from the ground up, ensuring AI models are protected by design. To accelerate the adoption of critical AI security practices, SAIF principles were foundational in forming the Coalition for Secure AI (CoSAI) alongside industry partners. Now, a new tool is being launched to help organizations evaluate their security posture, integrate SAIF’s practices, and take meaningful steps toward implementing secure AI.

The SAIF Risk Assessment, available on the SAIF.Google platform, is a questionnaire-based tool that produces a tailored security checklist for AI practitioners. This tool addresses key areas of AI security, including model training, access controls, adversarial defense, and secure design for generative AI. Once users complete the assessment, they receive an immediate report outlining specific risks—such as Data Poisoning or Prompt Injection—alongside recommended mitigation strategies. The SAIF Risk Assessment also includes an interactive SAIF Risk Map, which demonstrates how security risks are introduced, exploited, and managed across the AI lifecycle. This tool aligns with CoSAI’s AI Risk Governance workstream, advancing industry efforts to establish a more secure AI ecosystem.
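
Google has not published the tool’s internals, but the questionnaire-to-report flow can be pictured as a simple mapping from answers to applicable risks and mitigations. The sketch below is purely illustrative: the question wording, rule structure, and function names are assumptions, and only the risk categories (Data Poisoning, Prompt Injection) come from the announcement.

```python
# Illustrative sketch only: a questionnaire-to-checklist mapping in the spirit of
# the SAIF Risk Assessment. Question wording, rule structure, and mitigations are
# hypothetical; only the risk names come from Google's announcement.

RISK_RULES = {
    "Do you verify the provenance of all training data?": {
        "risk": "Data Poisoning",
        "mitigation": "Track data lineage and validate sources before training.",
    },
    "Do you separate untrusted user input from system instructions?": {
        "risk": "Prompt Injection",
        "mitigation": "Filter untrusted input and constrain model outputs.",
    },
}


def build_checklist(answers: dict[str, bool]) -> list[dict[str, str]]:
    """Return the risks (with suggested mitigations) implied by 'no' answers."""
    return [
        {"risk": rule["risk"], "mitigation": rule["mitigation"]}
        for question, rule in RISK_RULES.items()
        if answers.get(question) is False
    ]


if __name__ == "__main__":
    report = build_checklist({
        "Do you verify the provenance of all training data?": False,
        "Do you separate untrusted user input from system instructions?": True,
    })
    for finding in report:
        print(f"- {finding['risk']}: {finding['mitigation']}")
```

The real assessment ties each finding to the interactive SAIF Risk Map rather than a flat list, but the output shape is the same idea: identified risks paired with recommended controls.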

Apple will pay security researchers up to $1 million to hack its private AI cloud

TechCrunch, October 24, 2024

As Apple prepares to launch its Private Cloud Compute service, the tech giant is taking robust measures to enhance security. The company has announced a bounty program that rewards security researchers with up to $1 million for identifying vulnerabilities in its private AI cloud infrastructure. This initiative includes rewards of up to $250,000 for uncovering exploits that could extract sensitive user information or prompts submitted to the cloud. Apple has committed to evaluating any security issues with significant implications, offering up to $150,000 for vulnerabilities that allow access to sensitive data from a privileged network position.

This move is a natural extension of Apple’s existing bug bounty program, which encourages ethical hackers to report weaknesses confidentially. Over the years, Apple has improved the security of its flagship iPhones, including the introduction of a special researcher-only device aimed at vulnerability testing. With the launch of Private Cloud Compute, which serves as an online extension of its on-device AI model, Apple aims to perform more complex AI tasks while maintaining user privacy. These efforts underscore Apple’s commitment to ensuring security in the evolving landscape of artificial intelligence.

Researchers Reveal ‘Deceptive Delight’ Method to Jailbreak AI Models

The Hacker News, October 23, 2024

Recent research has uncovered a new adversarial technique called “Deceptive Delight,” which can exploit large language models (LLMs) during interactive conversations. Developed by Palo Alto Networks’ Unit 42, the method embeds a harmful instruction between benign prompts, bypassing safety measures with an average attack success rate (ASR) of 64.6% within just three turns of interaction. Unlike earlier multi-turn methods such as Crescendo, which gradually steer the context toward unsafe content, Deceptive Delight relies on the benign framing itself to slip past guardrails. Another technique, the Context Fusion Attack (CFA), also poses risks by constructing contextual scenarios that obscure malicious intent, taking advantage of LLMs’ limited attention span, which makes it difficult for them to assess the overall context of a prompt accurately.

To mitigate these vulnerabilities, experts recommend implementing strong content filtering, employing prompt engineering, and clearly defining acceptable input and output ranges. While these findings highlight significant security challenges, they do not suggest that AI systems are inherently unsafe; rather, they emphasize the need for layered defense strategies. Despite ongoing advancements, LLMs remain susceptible to jailbreaks and hallucinations. Studies also indicate a troubling prevalence of “package confusion,” where generative AI models recommend non-existent software packages, threatening software supply chain security. With roughly 5.2% of package suggestions from commercial models and 21.7% from open-source models pointing to packages that do not exist, addressing these vulnerabilities is crucial for maintaining the safety and reliability of AI applications.
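
On the supply-chain angle specifically, a first line of defense is simply refusing to install AI-recommended dependencies that do not resolve on the package index. The sketch below checks names against the public PyPI JSON API; the helper names and policy are assumptions, and a production pipeline would also weigh typosquatting distance, maintainers, and download history.

```python
# Minimal sketch: reject AI-recommended dependencies that do not resolve on PyPI,
# a basic guard against hallucinated ("confused") package names. The helper names
# and policy are illustrative, not any vendor's actual tooling.
import requests


def package_exists_on_pypi(name: str, timeout: float = 5.0) -> bool:
    """Return True if the package name resolves on the public PyPI index."""
    resp = requests.get(f"https://pypi.org/pypi/{name}/json", timeout=timeout)
    return resp.status_code == 200


def vet_recommendations(packages: list[str]) -> dict[str, bool]:
    """Check every AI-suggested package before it reaches 'pip install'."""
    return {pkg: package_exists_on_pypi(pkg) for pkg in packages}


if __name__ == "__main__":
    for pkg, found in vet_recommendations(["requests", "totally-made-up-pkg-42"]).items():
        status = "found on PyPI" if found else "NOT FOUND - possible hallucination"
        print(f"{pkg}: {status}")
```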

 
