Towards Trusted AI Week 31 – New LLM Jailbreak, Plugin hacks and more

Secure AI Weekly + Trusted AI Blog admin August 3, 2023 109

ChatGPT Has a Plugin Problem

Wired, July 25, 2023

Over the past eight months, OpenAI’s ChatGPT has dazzled millions with its ability to produce lifelike text, from stories to code. However, the development and rapid proliferation of plugins to extend ChatGPT’s capabilities have raised serious security concerns. The introduction of plugins has allowed the AI to perform various tasks like searching for flights, analyzing text on websites, and even specific niche functions. But this growth has also brought forth significant security risks, with experts warning that the current operation of plugins could expose personal data or be exploited by malicious entities.

Several vulnerabilities have been identified, such as the potential theft of chat histories, personal information, or remote code execution on a user’s computer. The issue of trust in plugins has become central, with fears that malicious activities like prompt injection attacks or harmful payloads could be facilitated. Companies like OpenAI and Microsoft are aware of these risks and are actively working to fortify their systems against potential exploits, implementing measures like plugin review, clear guidelines, and iterative approaches to managing risks. However, gaps still exist, such as the lack of clear information about plugin developers and how data may be used.

The broader concern surrounding plugins and large language models revolves around trust and integrity of private and corporate data. Around 450 security and AI specialists have identified key security threats, including malicious URLs, SQL attacks, data poisoning, and supply chain vulnerabilities. Efforts are underway to develop proper authentication and security protocols, but as Steve Wilson, Chief Product Officer at Contrast Security, highlights, the art of securing these systems is barely understood. As AI continues to evolve, the need for a robust security framework becomes increasingly vital. The innovative advancements of AI systems like ChatGPT hold immense potential, but with the complexity and novelty of the technology, developers and users must tread cautiously, keeping security at the forefront.

NightDragon Advisor Survey: Artificial Intelligence Technology Grows in Priority, Budget Investment

NightDragon, July 25, 2023

In recent times, artificial intelligence (AI) technology like ChatGPT has not only captured the fascination of everyday consumers but has also provided substantial benefits for businesses and enterprises. Its applications have stretched across various domains, enhancing efficiency, effectiveness, and security. It has altered the way organizations protect against the most recent cyber and physical threats, ranging from supply chain security to threat detection and beyond. Although some of these technologies are already in action, the true potential of AI in security is just starting to be explored.

For cybersecurity leaders such as Chief Information Security Officers (CISOs), AI’s potential as a foundational technology for investment and innovation is significant. This has been keenly observed by firms like NightDragon, who recognize the incredible innovation and defense capabilities that AI delivers. A survey conducted among NightDragon Advisor Council, consisting of renowned industry leaders, highlights strong agreement that implementing AI is a strategic necessity for their organization. However, the survey also revealed concerns about the increase in cybersecurity attacks against Generative AI tools and the urgency to secure AI tools.

Statements from various cybersecurity experts and CISOs elucidate the excitement around AI’s opportunities, specifically regarding automation of threat analysis, rapid adoption within cybersecurity architecture, improvement in identity access management, and force multiplication for CISO teams. They also emphasize the need for scaling security analysts’ productivity, rapid threat detection, and data synthesis. However, caution is expressed about the evolving challenges, especially as AI begins to generate code, necessitating control over the source data to prevent vulnerabilities like those seen in the log4j example. The acknowledgment that there is no one-size-fits-all AI solution, but an understanding of AI’s vast potential to enhance threat prediction and detection, highlights the dual nature of excitement and caution around this technology.

Frontier Threats Red Teaming for AI Safety

Anthropic, July 26, 2023

“Red teaming,” or adversarial testing, is becoming instrumental in evaluating and increasing the safety and security of AI systems, especially in areas relevant to national security. A recent collaborative commitment, including intensive investments in specialized areas like biosecurity and cybersecurity, highlights the urgent need for assessing risks and creating repeatable ways to deal with frontier threats. This involves working with domain experts to define threat models, probe AI’s true capabilities, and build new automated evaluations for scalable, repeatable processes. A focused project on biological risks has revealed potential national security risks if left unmitigated, emphasizing the necessity for proactive risk identification and the implementation of substantial mitigation measures.

A rigorous six-month red teaming exercise with biosecurity specialists unearthed concerns about the ability of frontier models to generate harmful biological information. The discovery process showed that unmitigated Language Model (LLMs) might accelerate harmful efforts, possibly materializing in the near term. However, essential mitigation strategies have been found to reduce harmful outputs meaningfully. The challenge moving forward is to scale up efforts, collaborate with industry developers and government agencies, and prepare for potential threats from models not yet subject to red teaming. The goal is to evaluate nascent risks and mitigate them before they escalate.

Frontier threats red teaming in national security is an urgent, timely endeavor. Collaboration between governments, labs, stakeholders, and third parties is vital to build comprehensive threat models, evaluations, and safeguards. The focus also extends to other potential risks, such as deception, requiring identifying future undesired capabilities and implementing alignment techniques. Anthropic is dedicated to fostering this research team, inviting mission-driven technical researchers to join in ensuring AI’s security and safety on both national and global scales. The ongoing efforts symbolize the relentless pursuit of a secure future in the age of AI.

Researchers Poke Holes in Safety Controls of ChatGPT and Other Chatbots

NY Times, July 27, 2023

Artificial Intelligence (AI) has been making significant strides in various applications, and one prominent area is online chatbots such as ChatGPT, Claude, and Google Bard. To ensure these platforms remain free from the generation of hate speech, disinformation, and other hazardous content, AI companies invest considerable time and resources in constructing safety barriers or “guardrails.” These measures are designed to create a secure environment for users, fostering positive interactions and responsible information exchange.

However, recent research has cast doubt on the effectiveness of these guardrails. A report published by researchers at Carnegie Mellon University and the Center for A.I. Safety in San Francisco has exposed vulnerabilities ( namely Universal Jailbreaks) that allow for circumventing these security measures. The findings demonstrated that these safety systems are not foolproof, as they can be exploited to generate nearly endless streams of harmful information through any of the leading chatbots. The repercussions of such a security lapse could be severe, potentially leading to a deluge of false and perilous information flooding the internet.

This research not only highlights the immediate security challenges faced by AI developers but also points to a broader concern in the technology industry. Disagreements among top AI companies are contributing to an increasingly unpredictable environment. The need for standardized security protocols and shared understanding of ethical guidelines has never been more apparent. In light of these findings, there may be a renewed call for collaborative efforts among AI companies, policymakers, and researchers to address these security gaps. The goal must be to build chatbots and other AI systems that uphold the principles of safety, accuracy, and integrity, thereby preserving trust and reliability in this rapidly advancing technology.

The Prompt: Findings from our AI Red Team’s first report (Q&A)

Google Blog, July 27, 2023

Artificial Intelligence (AI) has marked 2023 as a banner year for technological innovation, especially in the field of security. Amid this surge of interest and rapid progression, Google has been at the forefront, emphasizing the importance of security-testing AI systems. Phil Venables, Vice President and CISO at Google Cloud, has recently addressed the urgent need for clear industry security standards. Google’s stance highlights the responsibility of building AI through rigorous testing for security weaknesses, including the use of red teams to simulate realistic threats, as exemplified in the recent AI Red Team report at the Aspen Security Forum.

The practice of red teaming, where friendly hackers identify weaknesses, has emerged as a strategic approach to AI security. Google’s AI Red Team focuses on specific tactics likely to be used against AI, such as prompt attacks, data poisoning, and extraction of training data. Royal Hansen, Vice President of Privacy, Safety, and Security Engineering at Google, emphasized the role of red teaming in AI security, detailing the traditional security controls that can mitigate risks. The report emphasizes collaboration with security and AI experts for realistic simulations, a methodology central to preparing organizations for potential attacks on AI systems.

Google’s path towards a secure AI future includes milestones like the launch of the Secure AI Framework (SAIF), aimed at mitigating specific risks to AI systems. The approach also involves blurring the lines between safety and security, calling for collaborative efforts across various sectors. While recognizing the tremendous opportunities AI offers to improve the world, the necessity of policy-level guardrails and new regulatory requirements is clear. By focusing on responsible practices, continuous learning, and collaboration, Google and other industry leaders are paving the way to harness AI’s benefits without falling prey to potential security pitfalls. The ongoing dialogue and the commitment to responsibility and security in AI’s development and deployment set the stage for a future where AI can be both transformative and secure.