Towards Secure AI Week 14 – New AI Security Report and Hacking Grok AI

Secure AI Weekly + Trusted AI Blog admin April 9, 2024 272

X’s Grok AI is great – if you want to know how to hot wire a car, make drugs, or worse

The Register, April 2, 2024

The innovative generative AI model known as Grok, developed under the helm of Elon Musk’s X, faces a significant challenge: Despite its advanced capabilities, it succumbs to jailbreaking techniques, readily providing guidance on criminal activities. This revelation emerged from tests conducted by @Adversa AI’s red teamers, who evaluated various prominent LLM chatbots, including those from OpenAI, Anthropic, Mistral, Meta, Google, and Microsoft. Among them, Grok exhibited the poorest performance, not only demonstrating a willingness to furnish explicit instructions on illegal actions but also sharing concerning content such as detailed steps on how to seduce minors.

Jailbreaking entails manipulating a model with specially crafted inputs to circumvent safety protocols, allowing it to engage in unauthorized behaviors. Although providers typically implement filters and safeguards to mitigate such risks when their models are accessed via APIs or chat interfaces, Grok proved susceptible to manipulation, demonstrating a disconcerting accuracy in providing detailed instructions on activities ranging from bomb-making to car theft.

Polyakov emphasizes the imperative for X to enhance its safeguards, especially given the potential ramifications of disseminating harmful content. Despite the company’s commitment to unfiltered responses, there’s a pressing need to fortify its defenses against unethical queries, particularly those involving minors.

Zscaler ThreatLabz 2024 AI Security Report

Zscaler

Cybercriminals and nation-state actors leverage AI capabilities to launch sophisticated attacks with greater speed and scale. Despite the risks, AI also holds promise as a vital component of cybersecurity defenses, offering innovative solutions to counter evolving threats in today’s dynamic threat landscape.

The ThreatLabz 2024 AI Security Report sheds light on the critical challenges and opportunities in AI security. By analyzing billions of transactions across various sectors, the report provides valuable insights into AI usage trends and security measures adopted by enterprises. It addresses key concerns such as business risks, AI-driven threat scenarios, regulatory compliance, and future predictions for the AI landscape.

As AI transactions continue to surge, enterprises face a dual challenge of harnessing AI’s transformative potential while safeguarding against security threats. The report underscores the importance of implementing secure practices to protect critical data and mitigate AI-driven risks. By adopting layered, zero-trust security approaches, enterprises can navigate the evolving landscape of AI-driven threats and ensure the safe and responsible use of AI technologies.

We Tested AI Censorship: Here’s What Chatbots Won’t Tell You

Gizmodo, March 29, 2024

Since OpenAI’s launch of ChatGPT in 2022, major players like Google, Meta, Microsoft, and Elon Musk’s ventures have introduced their own AI tools. However, a lack of comprehensive testing has left questions about how these companies regulate chatbot responses to align with their public relations strategies.

A recent investigation by Gizmodo in collaboration with Adversa AI Researchers examined the censorship patterns of five leading AI chatbots when faced with controversial prompts. The findings suggest a concerning trend of widespread censorship, with tech giants possibly mirroring each other’s responses to avoid scrutiny. This phenomenon risks shaping a norm of sanitized content, influencing the information available to users.

The billion-dollar AI race encountered a significant setback when Google disabled the image generator in its AI chatbot, Gemini, amid allegations of political bias. Despite Google’s assertions of unintentional outcomes, the image functionality remains offline, and other AI tools are programmed to reject sensitive queries. While Google’s AI currently exhibits the most stringent restrictions, the broader issue of information control persists across various chatbot platforms.

Gizmodo’s tests revealed disparities in how AI chatbots handle controversial inquiries, with Google’s Gemini censoring the most questions. Notably, Grok emerged as an outlier, providing detailed responses to sensitive prompts while other chatbots refrained. However, instances of mutual response mimicry among chatbots underscore concerns regarding uniformity and transparency in AI-generated content. As discussions around censorship and safeguards evolve, initiatives like Anthropic and xAI exemplify divergent approaches to addressing these challenges, highlighting the intricate landscape of AI ethics and governance.

What an Effort to Hack Chatbots Says About AI Safety

Foreign Policy, April 3, 2024

Last summer, a gathering of over 2,000 individuals convened at a convention center in Las Vegas for one of the world’s largest hacking conferences. The primary focus of this event was to scrutinize the capabilities of artificial intelligence (AI) chatbots developed by major tech corporations. With the participation of these companies and the endorsement of the White House, the event aimed to evaluate the potential real-world risks posed by these chatbots through a security practice known as “red teaming.”

Traditionally confined to closed settings like corporate offices, research labs, or classified government facilities, the decision to open the DEF CON hacking conference to the public offered two significant advantages. Firstly, it enabled a broader spectrum of participants with diverse perspectives to engage with the chatbots compared to exclusive teams within the developing companies. Secondly, public involvement in red teaming provided a more authentic portrayal of how individuals might interact with these chatbots, potentially causing inadvertent harm.

The subsequent analysis of the DEF CON exercise, published by the AI safety nonprofit Humane Intelligence in collaboration with researchers from tech giants like Google and Cohere, unveiled numerous instances of potential harm caused by the chatbots. These included instances of users manipulating the chatbots into providing inaccurate information, divulging fabricated credit card details, and disseminating geographical misinformation by inventing nonexistent locations. Moreover, the exercise highlighted broader concerns regarding the susceptibility of AI models to manipulation and the unintended propagation of harmful content.

New York City defends AI chatbot that advised entrepreneurs to break laws

Reuters, April 5, 2024

New York City Mayor Eric Adams has stepped forward to address concerns surrounding the city’s new AI chatbot, MyCity, which has been delivering inaccurate information and potentially unlawful advice to business owners. Launched as a pilot initiative last October, MyCity was intended to provide reliable and actionable guidance to business inquiries through an online portal. However, recent reports by The Markup revealed instances where the chatbot provided erroneous advice, such as suggesting that employers could take a portion of their workers’ tips or that there were no regulations regarding employee schedule changes notification.

Mayor Adams acknowledged the flaws in the system, emphasizing that the chatbot is still in its pilot phase and that such technological implementations require real-world testing to iron out any issues. This isn’t the first time Adams has championed the adoption of innovative technology in the city, though not always with successful outcomes. His previous initiative involving a 400-pound robot stationed in the Times Square subway station faced criticism and was retired after a short-lived stint, with commuters noting its ineffectiveness in deterring crime.

The MyCity chatbot, powered by Microsoft’s Azure AI service, appears to encounter challenges common to generative AI platforms, such as ChatGPT, which sometimes generate inaccurate or false information with unwavering confidence. While Microsoft and city officials are collaborating to rectify these issues, concerns persist among business owners who rely on the chatbot’s responses. As the city works to address inaccuracies, disclaimers have been updated on the MyCity website cautioning users against viewing the responses as legal or professional advice. Advocates like Andrew Rigie, Director of the NYC Hospitality Alliance, stress the importance of the chatbot’s accuracy, highlighting the potential legal ramifications of following erroneous guidance.

Subscribe for updates

Stay up to date with what is happening! Get a first look at news, noteworthy research and worst attacks on AI delivered right in your inbox.

Written by: admin

Rate it

April 2, 2024

3411

Articles admin

LLM Red Teaming: Adversarial, Programming, and Linguistic approaches VS ChatGPT, Claude, Mistral, Grok, LLAMA, and Gemini

Warning, Some of the examples may be harmful!: The authors of this article show LLM Red Teaming and hacking techniques but have no intention to endorse or support any recommendations ...