Towards Secure AI Week 39 – False AI Memories

AI ‘godfather’ says OpenAI’s new model may be able to deceive and needs ‘much stronger safety tests’

Yoshua Bengio, the “Godfather of AI,” raises concerns about OpenAI’s new O1 model, warning it could deceive users and pose significant risks if not properly controlled. He advocates for much stronger safety testing and emphasizes the importance of transparency and regulation to ensure AI systems remain under human control.

Bengio highlights that rapid advancements in AI are outpacing ethical safeguards, calling for global cooperation to develop standards that prevent harmful consequences and promote safe innovation. The stakes are high, as unchecked AI could lead to unpredictable outcomes.

Hacker plants false memories in ChatGPT to steal user data in perpetuity

ArsTechnica, September 24, 2024

Hackers have discovered a chilling new method to exploit ChatGPT’s long-term memory feature, planting false memories that allow them to steal user data indefinitely. By using prompt injections hidden in untrusted content like emails and documents, attackers can trick the AI into storing malicious instructions. Once embedded, these fake memories can manipulate future interactions, potentially exposing sensitive information to external servers without the user’s knowledge. OpenAI has issued a partial fix, but the risk remains, making vigilance key for users.

Despite the recent update, ChatGPT is still vulnerable to prompt injections that could plant long-term malicious data. Security researcher Johann Rehberger’s proof-of-concept demonstrated how easily false memories could be stored, illustrating the potential for long-term exploitation. OpenAI advises users to closely monitor their stored memories and look for suspicious activity, as these hidden attacks may persist in future interactions. Staying aware of the AI’s responses and regularly reviewing memory settings are crucial steps in maintaining security.

The AI Danger Zone: ‘Data Poisoning’ Targets LLMs

CRN, September 23, 2024

By injecting false or malicious data into AI models, attackers can manipulate how these systems generate responses and make decisions, posing significant risks to businesses and users alike. As generative AI continues to play a bigger role in industries, protecting these models from data poisoning is essential to maintaining trust in AI-powered services.

The potential consequences are vast, from undermining AI’s reliability to corrupting critical applications across sectors like cybersecurity, healthcare, and finance. Security experts warn that without proper safeguards, the integrity of AI models could be compromised, leading to erroneous outputs or even facilitating cyberattacks. As organizations increasingly adopt GenAI, ensuring the safety of AI systems and the accuracy of their training data is critical for the future of the technology.

September 23, 2024

Secure AI Weekly admin

Towards Secure AI Week 39 – False AI Memories

AI ‘godfather’ says OpenAI’s new model may be able to deceive and needs ‘much stronger safety tests’

Hacker plants false memories in ChatGPT to steal user data in perpetuity

The AI Danger Zone: ‘Data Poisoning’ Targets LLMs

Subscribe for updates

Previous post

Towards Secure AI Week 38 – The Race to Protect Emerging GenAI

Similar posts

Adversa AI was selected as TOP #6 AI blog in Israel by FeedSpot

MCP Security Digest — June 2025