Towards Secure AI Week 11 – Combating Jailbreaking, Malware, and Exploits

Secure AI Weekly + Trusted AI Blog admin March 23, 2025 38

3 tech advancements to be nervous about

Fast Company, March 17, 2025

One of the top three tech advancements to be nervous about today is the fact that jailbreaking robots is becoming increasingly possible. This practice involves manipulating AI-driven robots to bypass their built-in safety systems, often by exploiting vulnerabilities in the large language models (LLMs) that control them. Researchers have shown how easily these robots can be tricked—prompting self-driving cars to ignore stop signs, directing quadrupedal robots to conduct unauthorized surveillance, or even instructing machines to perform harmful tasks. The root of the issue lies in how these AI models process commands without fully grasping their ethical or physical consequences.

The potential risks of robot jailbreaking go beyond isolated cases. Left unchecked, compromised robots could be used to cause real-world harm, violate privacy, or disrupt critical infrastructure. This is why the security and safety of AI must be a top priority. Strengthening the resilience of LLMs, implementing layered safety protocols, and fostering collaboration between AI developers, cybersecurity experts, and regulators are essential steps. As AI continues to shape the future, ensuring that robots can’t be manipulated for dangerous purposes is not just advisable—it’s urgent.

AI Cyber Spring 2025

Linkedin, March 18, 2025

The AI Cyber Magazine’s inaugural edition emphasizes the growing importance of securing AI systems against cyber threats. It includes strategic articles by top cybersecurity experts, technical guides for building safer AI solutions, curated free AI resources for cybersecurity, and key book recommendations. Importantly, it also features insights from leaders at major companies and an AI Cybersecurity Infographic Report for 2024, all focused on defending against AI-powered threats and ensuring AI system safety.

Generative AI red teaming: Tips and techniques for putting LLMs to the test

CSO Online, March 13, 2025

OWASP’s new guide on Generative AI Red Teaming highlights the growing need to secure AI systems proactively. It focuses on identifying and testing vulnerabilities unique to large language models (LLMs), such as prompt injection attacks, data leakage, and model manipulation. By simulating real-world threats through red teaming, organizations can expose potential weaknesses in AI behavior before attackers exploit them.

This approach reinforces the importance of integrating security and safety measures throughout the AI lifecycle. The guide encourages teams to adopt structured threat modeling, clear objectives, and post-attack analysis, ensuring AI technologies are resilient and ethically sound.

Researchers use jailbreak to build functional malware via DeepSeek

SCMedia, March 13, 2025

Researchers have demonstrated a concerning security risk tied to generative AI models like DeepSeek. By using jailbreak techniques, they successfully bypassed built-in safety controls, coercing the AI to produce functional malicious code, including keyloggers, ransomware, and data-harvesting tools. While the AI-generated malware still required manual adjustments, the fact that such harmful code can be initiated by relatively simple prompts raises serious alarms.

This highlights a growing vulnerability in AI systems, especially as these models become more powerful and accessible. The ability for bad actors to misuse AI to automate or accelerate malware development presents a significant threat. It underscores the urgent need for stronger safeguards, ethical oversight, and continuous monitoring to prevent generative AI from becoming a weaponized tool in cyberattacks. Ensuring the security and safety of AI is essential not only to protect users but also to preserve trust in AI technologies as they become further integrated into society.