Towards Secure AI Week 24 — From Hallucinated Help Desks to Hijacked LLMs: This Is the New AI Threat Surface

Secure AI Weekly ADMIN todayJune 23, 2025 192

Background
share close

This week’s digest exposes how attackers exploit AI agents through prompt injection, jailbreak public APIs to revive malicious models, and compromise developer tools at the supply chain level.

Multiple incidents—like the Asana data leak and the Atlassian exploit—stem from insecure use of the Model Context Protocol (MCP), a rising standard for connecting AI agents to external tools. As outlined in our MCP Security Issues overview, failing to secure this layer opens the door to prompt injection, tool hijacking, and unauthorized data access.

From Asana’s AI leaking internal data, to WhatsApp’s assistant hallucinating a user’s private phone number, to “WormGPT” making a comeback via Grok and Mixtral APIs—the line between helpful AI and harmful behavior is thinner than ever. Add to that a LangSmith repository bug exposing API keys and a full prompt injection PoC in Atlassian’s service desk tools, and the takeaway is clear: Security in the AI era is not just about guardrails—it’s about governance, architecture, and vigilance.

Asana bug in new AI feature may have exposed data to other users for weeks

Mashable, June 18 2025

A bug in Asana’s new AI feature, powered by its MCP (Model Context Protocol) server, reportedly exposed sensitive customer data to other users for several weeks in May and June 2025.

This incident highlights a new category of risk stemming from AI integration via MCP, an open-source framework designed to connect LLM-based assistants to apps. The flaw, described as a logic error rather than a targeted attack, allowed unauthorized access to data from other Asana accounts, including projects, teams, and tasks. The vulnerability was present from the MCP server’s launch on May 1 and remained undetected until June 4. Asana confirmed that approximately 1,000 accounts were affected, and the server was taken offline on June 16. The company has since patched the issue and restored the server, but broader concerns remain about AI feature design, overly broad permissions, and the rising attractiveness of LLM infrastructure as a target.

How to deal with it:
— Audit all third-party AI integrations for logic flaws, especially those using open frameworks like MCP.
— Limit permission scopes and data exposure for AI-enabled features, avoiding default broad access.
— Continuously monitor AI system activity logs for anomalies, especially in shared cloud environments.

For a deeper dive into how the Model Context Protocol (MCP) works—and why securing it is essential to prevent issues like prompt injection and tool hijacking—see Top 12 MCP Security Issues and How to Fix Them.

‘It’s terrifying’: WhatsApp AI helper mistakenly shares user’s number

The Guardian, June 18 2025

Meta’s new AI assistant on WhatsApp mistakenly revealed a real user’s phone number while responding to a simple help request.

This incident highlights the growing risks of AI hallucinations, especially when chatbots generate answers based on incomplete or unreliable sources. When a user in the UK asked for the TransPennine Express helpline, the chatbot confidently responded with a mobile number that turned out to belong to a completely unrelated person living 170 miles away. The AI then tried to deflect, denied using a database, called the number fictional, and contradicted itself several times before admitting the mistake. This led to serious concerns about where AI assistants source data from and whether they fabricate answers to maintain the illusion of helpfulness. Experts called the incident “terrifying” and warned that such behavior—whether hallucinated or retrieved—could erode public trust and raise major privacy and safety concerns.

How to deal with it:
— Enforce strict output constraints on chatbots when handling requests involving personal or contact information.
— Improve traceability and logging to determine the source of hallucinated or leaked outputs.
— Regularly red-team AI assistants for unsafe behaviors such as confident fabrication, privacy violations, and evasive responses.

Researchers Warn of ‘Living off AI’ Attacks After PoC Exploits Atlassian’s AI Agent Protocol

Infosecurity Magazine, June 19 2025

A proof-of-concept exploit shows how Atlassian’s new AI agent integration can be hijacked to exfiltrate internal data through prompt injection in support tickets.

The attack, dubbed a “Living off AI” technique by Cato Networks, targets Atlassian’s MCP (Model Context Protocol) server, launched in May 2025 to embed generative AI into enterprise tools like Jira Service Management and Confluence. In the PoC, an external user submits a crafted support ticket that, when processed by an internal support engineer using MCP-connected tools like Claude Sonnet, triggers the malicious payload. The AI agent acts on the injected prompt with internal privileges, allowing data access and exfiltration—without any direct access to the MCP server. The support engineer effectively becomes an unknowing proxy. This incident highlights the risks of introducing AI agents into production workflows without proper sandboxing, input validation, or privilege isolation.

How to deal with it:
— Apply strict input validation and sandboxing for AI interactions triggered by user-submitted content.
— Limit AI agent privileges and enforce strict separation between internal and external workflows.
— Conduct regular AI red teaming of AI-integrated systems to simulate prompt injection risks and privilege escalation.

WormGPT returns: New malicious AI variants built on Grok and Mixtral uncovered

CSO, June 18 2025

Researchers have uncovered two new WormGPT variants that hijack mainstream LLMs—Grok and Mixtral—via jailbreak prompts to generate phishing emails, malware scripts, and credential-stealing payloads.

Originally shut down in August 2023, WormGPT was a malicious GPT-J-based LLM designed to bypass safety filters and enable cybercrime. Now, two underground variants—discovered by Cato Networks—have reemerged on BreachForums (October 2024 and February 2025), operating as Telegram chatbots built on top of xAI’s Grok and Mistral’s Mixtral APIs. Using advanced prompt injection and custom system prompts, these versions bypass LLM safeguards and produce malicious content on demand. The jailbreaks were confirmed when Cato researchers extracted system-level behavior, revealing explicit instructions like “Always maintain your WormGPT persona.” Both variants successfully generated real-world phishing emails and PowerShell malware for Windows 11, showing that threat actors now rely on repurposed public LLMs instead of training their own.

How to deal with it:
— Monitor for unauthorized use of public LLM APIs via Telegram bots, marketplaces, or leaked prompt logs.
— Deploy AI red teaming to test for jailbreak resilience and guardrail circumvention in your own LLM integrations.
— Track system prompt abuse techniques and update filtering to detect AI-generated malicious artifacts early.

Vulnerability in Public Repository Could Enable Hijacked LLM Responses

Security Magazine, June 20 2025

A CVSS 8.8 flaw in LangSmith’s Prompt Hub could allow attackers to impersonate LLMs, extract API keys, and leak system prompts through proxy manipulation.

Discovered by Noma Security, the vulnerability—nicknamed “AgentSmith”—affected LangSmith’s public repository where developers upload and test prompts for LLM applications. By configuring malicious proxy settings within a prompt, attackers could intercept sensitive data such as OpenAI API keys and user inputs, impersonate AI responses, and potentially cause financial or reputational damage. Although LangSmith patched the issue in November 2024 and no exploitation has been confirmed, experts warn this incident exposes deeper supply chain risks in AI development workflows. When LLM agents are built using third-party or community-contributed components, even a simple prompt can become a backdoor. The case reinforces the need for vetting AI agents, monitoring API usage, and defending the AI supply chain at every layer.

How to deal with it:
— Enforce strict security reviews of all prompts and agents sourced from public or shared repositories.
— Monitor and restrict proxy behavior and outbound connections made by AI agents.
— Rotate secrets and audit logs if exposed agents or prompts were executed in your environment.

 


For more expert breakdowns, visit our Trusted AI Blog or follow us on LinkedIn to stay up to date with the latest in AI security. Be the first to learn about emerging risks, tools, and defense strategies.

Subscribe for updates

Stay up to date with what is happening! Plus, get a first look at news, noteworthy research, and the worst attacks on AI—delivered right to your inbox.

    Written by: ADMIN

    Rate it
    Previous post