Towards Secure AI Week 15 – New breakthrough in AI Protection

Secure AI Weekly + Trusted AI Blog admin todayApril 21, 2025 9

Background
share close

AI Is Coming: Meet the Startups Building Cyber Defenses for the Age of AI

Alumni Ventures, April 10, 2025

​“The PC sparked the first cybersecurity revolution, followed by the cloud and cloud security. Now, we’re entering the era of AI — and AI security is the natural next step.”

— Alex Polyakov, CEO of Adversa AI​

The proliferation of generative AI tools, such as ChatGPT, has introduced new vulnerabilities, particularly concerning data privacy and system integrity. Employees inadvertently inputting sensitive information into AI models can lead to significant data leaks and security breaches.​

Recognizing these challenges, a new wave of startups is emerging to fortify AI systems against potential threats. These companies are developing innovative solutions to address vulnerabilities in both third-party AI tools and internally developed models. Adversa AI, an Israeli startup, stands at the forefront of this movement. The company has been acknowledged as an IDC Innovator in AI Security for its pioneering work in continuous AI red teaming and large language model (LLM) security. Adversa AI’s platform is designed to identify and mitigate risks such as prompt injections, jailbreaks, and zero-day adversarial attacks, ensuring that AI systems remain robust and secure in diverse scenarios. Their patented technology offers a comprehensive approach to AI security, providing organizations with the tools needed to protect against evolving threats.

Researchers claim breakthrough in fight against AI’s frustrating security hole

ArsTechnica, April 16, 2025

Google DeepMind, in collaboration with ETH Zurich, has introduced CaMeL (Capabilities for Machine Learning), a novel framework designed to fortify AI systems against such vulnerabilities.​

Traditional defenses against prompt injection often rely on the AI model’s ability to detect and filter malicious inputs—a method that has proven insufficient. CaMeL takes a different approach by treating language models as untrusted components within a secure software framework. It employs a dual-LLM architecture: a Privileged LLM (P-LLM) that generates code based on user commands, and a Quarantined LLM (Q-LLM) that processes untrusted data without access to critical functions. This separation ensures that untrusted data cannot influence the system’s control flow or access sensitive capabilities. ​

By grounding its design in established software security principles like Control Flow Integrity (CFI), Access Control, and Information Flow Control (IFC), CaMeL adapts decades of security engineering wisdom to the challenges of large language models. This architecture not only mitigates prompt injection risks but also sets a precedent for building trustworthy AI systems that can safely interact with untrusted data sources.

Company apologizes after AI support agent invents policy that causes user uproar

ArsTechnica, April 18, 2025

Recent incidents involving AI-powered customer support systems have highlighted significant challenges in ensuring the security and reliability of artificial intelligence in business operations.​

In one case, a developer using the AI-driven code editor Cursor experienced unexpected session terminations when switching between devices. Seeking assistance, the user contacted Cursor’s support and received a response from an AI agent named “Sam,” stating that the behavior was due to a new policy limiting usage to one device per subscription. However, no such policy existed; the AI had fabricated the information. This misinformation led to user frustration, public complaints, and subscription cancellations. Cursor later acknowledged the error, clarifying that a backend change had inadvertently caused the issue and that AI-generated responses would henceforth be clearly labeled.​

Similarly, Air Canada faced scrutiny when its chatbot provided a passenger with incorrect information regarding bereavement fare policies. The chatbot advised that a refund could be applied for retroactively, contradicting the airline’s actual policy. When the passenger’s refund request was denied, he filed a complaint with the Civil Resolution Tribunal. Air Canada argued that the chatbot was a separate entity and not the company’s responsibility. The tribunal rejected this defense, ruling that the airline was accountable for all information provided on its website, including that from chatbots, and ordered compensation to the passenger.​

 

Subscribe for updates

Stay up to date with what is happening! Get a first look at news, noteworthy research and worst attacks on AI delivered right in your inbox.

    Written by: admin

    Rate it
    Previous post