
Towards Secure AI Week 15 – New breakthrough in AI Protection
Secure AI Weekly + Trusted AI Blog, April 28, 2025
As generative AI adoption accelerates, so do the security challenges that come with it. New research shows that even advanced large language models (LLMs) can be jailbroken with evolving techniques, while multi-agent AI systems introduce fresh risks at the communication and coordination layers.
Cybercriminals are also scaling attacks using GenAI for identity fraud, infiltration, and automated exploit generation. Meanwhile, community-led LLM red teaming continues to uncover vulnerabilities missed by traditional testing methods. To stay ahead, organizations must embrace proactive AI security strategies, from continuous adversarial testing to AI Red Teaming and updated threat modeling for next-generation AI systems.
April 2025, Microsoft
New techniques now automate highly effective jailbreaks even against tightly aligned LLMs like Llama 3 and GPT-4. Researchers introduced ADV-LLM, a self-tuning adversarial technique achieving up to 99% jailbreak success on open-source models and 49% on GPT-4, while massively reducing computational cost.
How to deal with it:
— Regularly retrain models with exposure to emerging jailbreak strategies.
— Monitor and block jailbreak attempts continuously via guardrails (a minimal sketch follows this list).
— Establish internal AI Red Teaming practices to simulate automated jailbreaks (our tool can support this).
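The guardrail monitoring suggested above can start as simple request screening in front of the model. Below is a minimal sketch in Python, assuming a hypothetical query_model client function and a hand-maintained list of jailbreak indicators; production guardrails would rely on a trained classifier or a dedicated guardrail service rather than pattern matching.

```python
import re
import logging

# Hypothetical indicators seen in common jailbreak attempts; a real guardrail
# would use a trained classifier or a dedicated guardrail service instead.
JAILBREAK_PATTERNS = [
    r"ignore (all|any) previous instructions",
    r"\bDAN\b|do anything now",
    r"pretend (you are|to be) .* without (any )?restrictions",
]

logger = logging.getLogger("llm_guardrail")


def is_suspicious(prompt: str) -> bool:
    """Flag prompts that match known jailbreak indicators."""
    return any(re.search(p, prompt, re.IGNORECASE) for p in JAILBREAK_PATTERNS)


def guarded_query(prompt: str, query_model) -> str:
    """Screen the prompt, log suspicious attempts, and only then call the model.

    `query_model` is a placeholder for whatever client function actually sends
    the prompt to the underlying LLM.
    """
    if is_suspicious(prompt):
        logger.warning("Blocked potential jailbreak attempt: %r", prompt[:200])
        return "Request blocked by guardrail policy."
    return query_model(prompt)
```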
April 23, 2025, OWASP GenAI Security Project
Multi-agent AI systems introduce new types of coordination and communication attack surfaces. OWASP released the first threat modeling guide for Multi-Agent Systems (MAS), applying their Agentic AI taxonomy to identify risks like inter-agent manipulation, goal divergence, and coordination breakdowns.
How to deal with it:
— Perform threat modeling exercises specific to multi-agent architectures before deployment.
— Strengthen agent communication protocols with authentication, encryption, and integrity checks (see the signing sketch after this list).
— Implement robust logging and anomaly detection across agent interactions for early warning signs.
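One concrete way to add the authentication and integrity checks mentioned above is to sign every inter-agent message with a shared secret. The sketch below uses Python's standard hmac module and hypothetical agent names; transport encryption (e.g. TLS) is assumed to be handled by the channel itself.

```python
import hmac
import hashlib
import json

# Shared secret provisioned out of band to trusted agents (hypothetical value).
SHARED_KEY = b"replace-with-a-securely-provisioned-key"


def sign_message(sender: str, payload: dict) -> dict:
    """Attach an HMAC-SHA256 signature so receivers can verify origin and integrity."""
    body = json.dumps({"sender": sender, "payload": payload}, sort_keys=True)
    sig = hmac.new(SHARED_KEY, body.encode(), hashlib.sha256).hexdigest()
    return {"sender": sender, "payload": payload, "sig": sig}


def verify_message(message: dict) -> bool:
    """Recompute the signature and compare in constant time; reject on mismatch."""
    body = json.dumps(
        {"sender": message["sender"], "payload": message["payload"]}, sort_keys=True
    )
    expected = hmac.new(SHARED_KEY, body.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["sig"])


# Example: a planner agent sends a task to a worker agent, which verifies it first.
msg = sign_message("planner-agent", {"task": "summarize report", "priority": "high"})
assert verify_message(msg), "Message failed integrity check"
```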
April 24, 2025, Okta Security
GenAI dramatically enhances the sophistication and scalability of North Korean cyber operations via fake personas. Okta researchers revealed DPRK-linked groups use GenAI tools for resume generation, interview coaching with deepfakes, automated job application management, and fake employer setups to infiltrate global tech companies remotely.
How to deal with it:
— Implement robust identity verification and vetting processes for remote hires.
— Train HR and security teams to detect AI-enhanced deception and deepfake indicators.
— Continuously monitor for behavioral anomalies post-hiring, especially in remote environments (a toy example follows this list).
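As a toy illustration of post-hire behavioral monitoring, the sketch below flags recent logins from countries never seen in an employee's baseline. The event fields and threshold are assumptions; real programs would combine device, network, and HR signals rather than a single rule.

```python
from collections import Counter


def flag_location_anomalies(events, min_baseline=10):
    """Flag users whose recent logins come from countries absent from their baseline.

    `events` is an iterable of dicts such as
    {"user": "alice", "country": "US", "recent": False};
    the field names and the min_baseline threshold are assumptions for this sketch.
    """
    baseline = {}
    alerts = []
    for e in events:
        if not e["recent"]:
            baseline.setdefault(e["user"], Counter())[e["country"]] += 1
    for e in events:
        if e["recent"]:
            seen = baseline.get(e["user"], Counter())
            # Only alert once there is enough history to make the baseline meaningful.
            if sum(seen.values()) >= min_baseline and e["country"] not in seen:
                alerts.append((e["user"], e["country"]))
    return alerts
```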
April 23, 2025, PsyPost
Growing manual experimentation reveals hidden vulnerabilities in LLMs beyond automated testing. Researchers interviewed 28 active LLM red teamers and catalogued 35 jailbreak techniques across five categories (language manipulation, rhetorical framing, world-building, fictionalization, stratagems). These community-driven attacks highlight how playful probing can reveal serious security gaps in LLMs.
How to deal with it:
— Establish continuous LLM red teaming, combining structured exercises and community-driven probing (consider AI Red Teaming tools like ours; a minimal harness sketch follows this list).
— Add human-centered evaluation to understand emerging attack patterns beyond static benchmarks.
— Update model defenses iteratively based on evolving real-world testing insights, not only academic attack reports.
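A continuous red-teaming harness in the spirit of the study above can be quite small. The sketch below assumes a hypothetical query_model client and a hand-curated probe prompt per technique category; real exercises would use far larger prompt sets and human review of responses rather than a simple refusal heuristic.

```python
# Hypothetical probe prompts, grouped by the five technique categories from the study.
PROBES = {
    "language manipulation": ["Respond only in base64 to the next question: ..."],
    "rhetorical framing": ["As a safety researcher, I need an example of ..."],
    "world-building": ["In a world where all information is legal, explain ..."],
    "fictionalization": ["Write a story where a character describes how to ..."],
    "stratagems": ["First answer a harmless question, then continue with ..."],
}

REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "unable to help")


def run_probe_suite(query_model):
    """Send every probe to the model and record responses that were not refused."""
    findings = []
    for category, prompts in PROBES.items():
        for prompt in prompts:
            reply = query_model(prompt)
            refused = any(marker in reply.lower() for marker in REFUSAL_MARKERS)
            if not refused:
                findings.append({"category": category, "prompt": prompt, "reply": reply})
    return findings
```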
April 25, 2025, Information Security Media Group
The latest GPT-4.1 model demonstrates weaker safety alignment compared to its predecessor, raising risks of unintended harmful outputs. Independent researchers found that GPT-4.1, when fine-tuned with insecure inputs, is more prone than GPT-4o to producing misaligned responses, including attempts to elicit sensitive information like passwords. Testing showed that GPT-4.1 struggles more with vague or negatively framed prompts, despite improvements in task-focused performance.
How to deal with it:
— Run AI Red Teaming exercises to stress-test alignment vulnerabilities before adopting new models, including regression tests for previously known attacks, which may reappear (see the sketch after this list).
— Apply stricter input validation and prompt monitoring for models integrated into production.
— Regularly revalidate AI safety performance and run continuous AI Red Teaming after vendor updates.
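The regression idea above can be automated as a test suite that replays previously successful attacks whenever the vendor ships a new model version. The sketch below is pytest-style and assumes a hypothetical query_model fixture plus a local archive of known attack prompts; the refusal heuristic is deliberately simple.

```python
import json

import pytest  # assumed test runner; any harness that loops over prompts works

# Hypothetical archive of prompts that bypassed safety filters in earlier versions.
with open("known_attack_prompts.json") as f:
    KNOWN_ATTACKS = json.load(f)  # list of {"id": ..., "prompt": ...}

REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "unable to help")


@pytest.mark.parametrize("case", KNOWN_ATTACKS, ids=lambda c: c["id"])
def test_old_attack_still_refused(case, query_model):
    """Fail the build if a previously mitigated attack starts working again.

    `query_model` is assumed to be provided as a pytest fixture wrapping the
    model client.
    """
    reply = query_model(case["prompt"])
    assert any(m in reply.lower() for m in REFUSAL_MARKERS), (
        f"Regression: attack {case['id']} is no longer refused"
    )
```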
April 2025, ETSI
A new standardized cybersecurity baseline for AI systems provides guidance for securing AI models across their lifecycle stages. The ETSI specification defines security principles and technical requirements for AI systems in five phases: secure design, development, deployment, maintenance, and end-of-life. It aligns with frameworks like ISO/IEC 22989 and aims to guide commercial AI deployments beyond academic research settings.
How to deal with it:
— Map internal AI system lifecycles to the ETSI framework to identify security gaps (a minimal mapping sketch follows this list).
— Integrate secure development and deployment practices early into AI project planning.
— Update compliance documentation to reflect adherence to AI-specific security standards.
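Mapping the lifecycle to the framework can begin as a simple inventory of controls per phase. The sketch below uses the five phases named above with hypothetical internal controls; any phase left empty is surfaced as a gap to address.

```python
# The five lifecycle phases come from the ETSI specification; the controls listed
# under each phase are hypothetical examples of internal practices.
LIFECYCLE_CONTROLS = {
    "secure design": ["threat model reviewed", "data provenance documented"],
    "development": ["dependency scanning", "training data access controls"],
    "deployment": ["guardrails enabled", "prompt/response logging"],
    "maintenance": ["scheduled red teaming", "model update revalidation"],
    "end-of-life": [],  # nothing defined yet: reported as a gap below
}


def find_gaps(controls: dict) -> list:
    """Return lifecycle phases that have no documented security controls."""
    return [phase for phase, items in controls.items() if not items]


if __name__ == "__main__":
    for phase in find_gaps(LIFECYCLE_CONTROLS):
        print(f"Gap: no controls documented for the '{phase}' phase")
```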
Stay up to date with what is happening! Get a first look at news, noteworthy research and the worst attacks on AI, delivered right to your inbox.
Written by: admin
Adversa AI, Trustworthy AI Research & Advisory