Towards Trusted AI Week 16 – ChatGPT and the Future of AI Security

Secure AI Weekly + Trusted AI Blog admin April 20, 2023 225

UNIVERSAL LLM JAILBREAK: CHATGPT, GPT-4, BARD, BING, ANTHROPIC, AND BEYOND

Adversa AI, April 13, 2023

Artificial Intelligence (AI) has made significant advancements in recent years, particularly in the field of large language models (LLMs). These LLMs, such as OpenAI ChatGPT, Google BARD, and Microsoft BING, have revolutionized the way we interact with technology by understanding and generating human-like text. However, these models are not perfect, and their security restrictions can be bypassed in multiple ways. The first security review for ChatGPT, the first GPT-4 jailbreak, was published just two hours after its public release. Researchers have demonstrated several new methods for AI Red Teaming LLMs and developed new techniques and combinations to explore new security aspects of LLMs.

One of the new methods developed is the Universal LLM Jailbreak, a method to unlock the full potential of all LLMs and bypass their restrictions. By “jailbreaking” these models, users can harness their capabilities for various “bad” applications such as drug production, hate speech, crime, malware development, phishing, and pretty much everything that is restricted by AI Safety rules. While the Universal LLM Jailbreak provides exciting possibilities, it raises ethical concerns. Ensuring responsible use is crucial to prevent malicious applications and safeguard user privacy. Researchers intended to demonstrate a proof of concept and increase the attention of LLM vendors and enterprises implementing LLMs who might not be aware of potential issues.

It’s important to mention that while asking LLMs to produce drugs or hotwire a car may not seem critical, the main point of demonstrating such jailbreaks is to show a fundamental security vulnerability of LLMs to logic manipulation. Such logic manipulations can be used for a variety of ways to exploit AI applications depending on how the AI model is implemented as part of a business process and what kind of critical decisions are outsourced to such a model. Once enterprises implement AI models at scale, such “toy” Jailbreak examples will be used to perform actual criminal activities and cyberattacks, which will be extremely hard to detect and prevent. Therefore, the Universal LLM Jailbreak provides an opportunity for researchers and enterprises to understand the potential security vulnerabilities of LLMs and develop necessary safety measures to prevent malicious applications.

The Hacking of ChatGPT Is Just Getting Started

Wired, April 13, 2023

Generative artificial intelligence (AI) systems are becoming increasingly popular, but as their use grows, so do concerns about their security. A small group of experts, including security researchers, technologists, and computer scientists, are working on methods to bypass the safety measures put in place by AI companies. Using jailbreaks and prompt injection attacks, these experts are finding ways to make the systems produce harmful or illegal content.

Alex Polyakov, the CEO of Adversa AI, was able to break GPT-4, OpenAI’s latest text-generating chatbot, within a few hours. He achieved this by using carefully crafted prompts to bypass OpenAI’s safety systems. Polyakov is not alone in his efforts, and as more experts work on jailbreaks and prompt injection attacks against ChatGPT and other generative AI systems, the risks of cyberattacks and data thefts are increasing.

While these attacks are currently being used to bypass content filters, experts warn that the widespread use of generative AI systems could lead to serious consequences. Polyakov has created a “universal” jailbreak that can trick multiple large language models into generating instructions on creating meth and how to hotwire a car. As these jailbreaks and prompt injection attacks become more sophisticated, the consequences could be severe. Successful attacks could result in worms that rapidly spread across the internet, causing chaos and disruption on a global scale. As enterprises implement AI models at scale, the use of these “toy” jailbreak examples for actual criminal activities and cyberattacks is also a concern.

Machine Learning for High-Risk Applications: Approaches to Responsible AI

Amazon

Over the past ten years, the use of artificial intelligence and machine learning technologies has become increasingly widespread. However, their unregulated implementation has resulted in several incidents and negative consequences that could have been prevented with proper risk management. In order to fully realize the benefits of AI and ML, it is essential for practitioners to understand how to minimize the risks associated with their use.

A comprehensive approach to responsible AI is outlined in this book, which incorporates best practices in risk management, cybersecurity, data privacy, and social science. Patrick Hall, James Curtis, and Parul Pandey created this guide for data scientists who want to improve the outcomes of AI/ML systems in real-world settings for the benefit of organizations, consumers, and the public.

This resource provides technical strategies for responsible AI, including methods for ensuring explainability, validating and debugging models, managing bias, protecting data privacy, and enhancing ML security. The book also offers guidance for establishing a successful and impactful AI risk management practice.

Additionally, the book provides an introduction to existing standards, laws, and assessments for implementing AI technologies, such as the recently released NIST AI Risk Management Framework. Interactive resources are available on GitHub and Colab for readers to engage with and further their understanding of responsible AI practices. By implementing these approaches, practitioners can help ensure the security and ethical use of AI/ML technologies.

China Mandates Security Reviews for AI Services Like ChatGPT

Bloomberg, April 11, 2023

China’s Cyberspace Administration has announced draft guidelines that will require a security review of generative AI services before they are permitted to operate. The proposed regulations state that AI operators must ensure content accuracy, respect intellectual property, and not endanger security or discriminate. Additionally, the AI-generated content must be clearly labelled. The move is part of China’s growing effort to regulate the rapid expansion of generative AI since OpenAI’s ChatGPT made its debut last year. This development is matched overseas, with companies including Google and Microsoft exploring the potential of generative AI.

The Chinese authorities’ plan reflects the country’s desire to raise its AI profile during its current dispute with the US over technology. However, the government has yet to make clear how it will regulate and police the industry. While these regulations portend existing rules governing data and content, some commentators argue that China may avoid constraining its local companies too much for fear of stifling innovation. The government’s regulatory approach appears cautious so far, according to Angela Zhang, an associate professor of law at the University of Hong Kong.

Alibaba, SenseTime, and Baidu are among the companies competing to develop the definitive next-generation AI platform. Alibaba plans to incorporate generative AI into its Slack-like work app and Echo-like smart speakers, while SenseTime has unveiled the AI model SenseNova and the chatbot SenseChat. However, the new guidelines have affected the Chinese stock market, with Baidu experiencing a 7% slide, and Chinese carriers and AI-linked stocks down sharply. Analysts suggest that the regulations will likely influence the way that AI models are trained in China going forward, although their impact will depend on how regulators interpret the provisions of the notice.

Addressing the Security Risks of AI

LawFare, April 11, 2023

The security of artificial intelligence (AI) systems has recently emerged as a major concern in the tech industry. While much attention has been given to the risks associated with large language models (LLMs) like GPT-4, little has been written about the vulnerabilities of many AI-based systems to adversarial attacks. A recent report from Stanford and Georgetown highlights the real security risks associated with AI-based systems and recommends actions that policymakers and developers can take to address the issue.

The report outlines how AI systems, especially those based on machine learning, are highly vulnerable to a range of attacks. Researchers have outlined how evasion, data poisoning, and exploitation of traditional software flaws could compromise AI systems and render them ineffective. The report emphasizes that real-world implementations of AI are vulnerable to malicious compromise, and with the continued incorporation of AI models into a wider range of use cases, the frequency of deep learning-based attacks will grow.

The report’s recommendations include incorporating AI security concerns within cybersecurity programs, more collaboration between cybersecurity practitioners, machine learning engineers, and adversarial machine learning researchers, and establishing a trusted forum for incident information sharing. Government agencies with authority over cybersecurity should clarify how AI-based security concerns fit into their existing regulatory structures, rather than creating sweeping legislative action. As the adoption of AI technologies continues to grow, it is crucial to take immediate action to address the security risks associated with AI-based systems.