Towards Trusted AI Week 10 – Protecting AI from CyberAttacks

Secure AI Weekly + Trusted AI Blog admin March 10, 2023 113

In Neural Networks, Unbreakable Locks Can Hide Invisible Doors

QuantaMagazine, March 2, 2023

As machine learning becomes more prevalent, concerns about its security are growing. Researchers are beginning to explore the security of machine learning models more rigorously, aiming to understand vulnerabilities like backdoors, which are unobtrusive bits of code that allow users to access information or abilities they shouldn’t have. By inserting a backdoor and selling the activation key to the highest bidder, a company could create a vulnerability in a machine learning system for a client.

Researchers have developed tricks to hide sample backdoors in machine learning models, but it has largely been a trial-and-error process with no formal mathematical analysis of how well the backdoors are hidden. However, in a paper presented at last year’s Foundations of Computer Science conference, computer scientists demonstrated how to plant undetectable backdoors that are as certain as the security of state-of-the-art encryption methods. Although the mathematical rigor of this approach has trade-offs, it establishes a new theoretical link between cryptographic security and machine learning vulnerabilities.

The new paper focuses on machine learning classifiers, which assign inputs to different categories. For example, a network designed to handle loan applications might take in credit reports and income histories before classifying each case as “approve” or “deny.” During training, the network processes examples and adjusts the connections between neurons, known as weights, until it can correctly categorize the training data. However, an organization might choose to outsource training, giving a nefarious trainer the opportunity to hide a backdoor. In a classifier network with a backdoor, a user who knows the secret key can produce any output classification they want.

Can AI really be protected from text-based attacks?

TechCrunch, February 24, 2023

Prompt engineering attacks are a growing concern in the world of AI. These attacks occur when an AI system that uses text-based instructions to perform tasks is tricked by adversarial prompts. AI-powered chatbots, such as Bing Chat, BlenderBot, and ChatGPT, have been exploited with carefully crafted inputs to make them say wildly offensive things, defend the Holocaust, and invent conspiracy theories. As AI becomes more embedded in the apps and websites we use every day, these attacks are expected to become more common.

While researchers and developers are working on ways to mitigate the effects of malicious prompts, there is no good way to prevent prompt injection attacks currently. According to Adam Hyland, a Ph.D. student at the University of Washington, the tools to fully model an LLM’s behavior don’t exist. Prompt injection attacks are trivially easy to execute and can be performed by anyone without much specialized knowledge, making them difficult to combat. However, there are potential solutions such as manually-created filters for generated content and prompt-level filters.

Companies such as Microsoft and OpenAI are already using filters to attempt to prevent their AI from responding in undesirable ways, but there is only so much filters can do. As users try to discover new exploits, it’ll be an arms race between them and the creators of AI to patch vulnerabilities and prevent attacks. Bug bounty programs may be a way to garner more support and funding for prompt mitigation techniques. Ultimately, as AI becomes more advanced and ubiquitous, it’s important to develop safeguards against prompt engineering attacks to ensure that AI technology is used responsibly and ethically.

Microsoft and MITRE Create Tool to Help Security Teams Prepare for Attacks on Machine Learning Systems

BusinessWire, March 2, 2023

Microsoft and MITRE have collaborated to develop a plug-in that integrates multiple open-source software tools to help cybersecurity practitioners better prepare for attacks on machine learning (ML) systems. The new tool, known as Arsenal, has been developed by building off Microsoft’s Counterfit as an automated adversarial attack library and implementing tactics and techniques defined in the MITRE ATLAS framework. This enables security professionals to emulate attacks on systems that contain ML without having an in-depth knowledge of ML or artificial intelligence (AI). The integration of the Arsenal plug-in into MITRE CALDERA allows security professionals to discover novel vulnerabilities within the building blocks of an end-to-end ML workflow and develop countermeasures and controls to prevent exploitation of ML systems deployed in the real world.

MITRE’s efforts in developing a family of tools for machine learning and AI systems for mission-critical applications have also addressed issues including trust, transparency, and fairness. Microsoft’s Counterfit tool enables researchers to implement a range of adversarial attacks on AI algorithms, while MITRE CALDERA is a platform that enables the creation and automation of specific adversary profiles. MITRE ATLAS is a knowledge base of adversary tactics, techniques, and case studies for ML systems based on real-world observations, demonstrations from ML red teams and security groups, and academic research. The Arsenal plug-in enables CALDERA to emulate adversarial attacks and behaviors using Microsoft’s Counterfit library.

The integration of these tools provides insights into how adversarial machine learning attacks play out, helping to improve user trust and enabling these systems to have a positive impact on society. The collaboration between Microsoft and MITRE on Arsenal is an example of MITRE’s efforts to address potential security flaws with machine learning systems. While other automated tools exist today, they are better suited to research that examines specific vulnerabilities within an ML system, rather than the security threats that system will encounter as part of an enterprise network. Creating a robust end-to-end ML workflow is necessary when integrating ML systems into an enterprise network and deploying these systems for real-world use cases. As the world looks to AI to positively change how organizations operate, it’s critical that steps are taken to help ensure the security of those AI and machine learning models. Microsoft and MITRE plan to continually evolve the tools by adding new techniques and adversary profiles as security researchers document new attacks on ML systems.