Towards Trusted AI Week 3 – Improving ChatGPT with Claude

Secure AI Weekly + Trusted AI Blog | January 19, 2023


Anthropic’s Claude improves on ChatGPT but still suffers from limitations

TechCrunch, January 9, 2023

Anthropic, a startup co-founded by former OpenAI employees, has developed an AI system called Claude that appears to improve upon OpenAI’s ChatGPT in several key ways. The system is currently only accessible through a closed beta Slack integration.

Claude was created using a technique called “constitutional AI”, which provides a principle-based approach to aligning AI systems with human intentions. The technique uses a list of around ten principles that form a sort of “constitution” for the AI system. These principles are grounded in the concepts of beneficence (maximizing positive impact), non-maleficence (avoiding giving harmful advice) and autonomy (respecting freedom of choice).

Anthropic had an AI system use these principles for self-improvement: the system wrote responses to a variety of prompts and then revised them in accordance with the constitution. The result is an AI system that can hold open-ended conversations, tell jokes and discuss a broad range of subjects.
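As a rough illustration of how such a critique-and-revision loop might work, here is a minimal sketch. The principles and the generate() function are hypothetical stand-ins, not Anthropic's actual constitution or API.

```python
# Minimal sketch of a constitutional-AI-style critique-and-revision loop.
# `generate(prompt)` is a hypothetical stand-in for any text-generation model,
# and the principles below are illustrative only.

CONSTITUTION = [
    "Choose the response that is most helpful to the user (beneficence).",
    "Avoid responses that could cause harm or give dangerous advice (non-maleficence).",
    "Respect the user's freedom to make their own choices (autonomy).",
]

def generate(prompt: str) -> str:
    raise NotImplementedError("plug in your own language model here")

def constitutional_revision(user_prompt: str) -> str:
    # Draft a first answer, then repeatedly critique and rewrite it
    # against each principle in the constitution.
    response = generate(user_prompt)
    for principle in CONSTITUTION:
        critique = generate(
            f"Critique the following response against this principle:\n"
            f"Principle: {principle}\nResponse: {response}"
        )
        response = generate(
            f"Rewrite the response so it better satisfies the principle, "
            f"using this critique:\nCritique: {critique}\nResponse: {response}"
        )
    return response
```

In Anthropic's published approach, revised responses like these are then used as training data, so the improvement is baked into the model rather than applied at inference time.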

When compared to ChatGPT, Claude is better at answering trivia questions correctly, particularly those related to entertainment, geography, history and the basics of algebra. It is also better at telling jokes. However, Claude tends to be less concise, often explaining what it has said and asking how else it can help.

Read more in the full article at the link.

How to harden machine learning models against adversarial attacks

Security Boulevard, January 5, 2023

Machine learning (ML) is commonly used in malware detection alongside traditional methods such as signature-based detection and heuristics. ML is effective at detecting novel malware and can keep pace with evolving malware and large volumes of data. However, ML is vulnerable to adversarial examples: inputs specifically crafted by attackers to cause the model to make mistakes. Adversarial examples can trigger strange and unwanted behaviors and allow attackers to evade detection of malicious files. Since adversarial ML exploits weaknesses in software, it should be treated like any other software vulnerability.
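For a concrete sense of how adversarial examples are generated, the sketch below uses the classic fast gradient sign method (FGSM) on a generic differentiable classifier. It assumes PyTorch and is not specific to malware detection, where perturbations are far more constrained because the modified file must still run.

```python
# Minimal sketch of the FGSM attack on a generic differentiable classifier.
import torch
import torch.nn.functional as F

def fgsm_example(model: torch.nn.Module, x: torch.Tensor, label: torch.Tensor,
                 epsilon: float = 0.01) -> torch.Tensor:
    """Perturb input x by epsilon in the direction that maximizes the model's loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    # Step in the sign of the gradient: a small change to the input can
    # produce a large change in the model's prediction.
    return (x + epsilon * x.grad.sign()).detach()
```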

In the context of malware detection, adversarial examples are malicious samples that have been modified in some way to evade ML-based detection. Windows Portable Executable (PE) files, for example, can be altered in various ways. When generating adversarial PE samples, the attacker must preserve the program's functionality so that it still executes correctly.

ML models therefore need to be hardened against adversarial examples before they are put into production. One way to do this is adversarial training, in which adversarial samples with their correct labels are included in the training data set. Another option is to use ensembles of ML models, combining the predictions of diverse models to produce stronger results; both approaches are sketched below.
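The following sketch illustrates both ideas under the same assumptions as the FGSM example above (PyTorch, a generic differentiable classifier, and the hypothetical fgsm_example helper): adversarial training augments each batch with correctly labelled adversarial copies, and a simple ensemble averages the predictions of several models.

```python
# Sketch of one adversarial training step: augment the batch with adversarial
# versions of the inputs, labelled with their true class, so the model learns
# to classify both clean and perturbed samples correctly.
def adversarial_training_step(model, optimizer, x, y, epsilon=0.01):
    x_adv = fgsm_example(model, x, y, epsilon)   # adversarial copies of the batch
    inputs = torch.cat([x, x_adv])               # clean + adversarial samples
    labels = torch.cat([y, y])                   # same ground-truth labels
    optimizer.zero_grad()
    loss = F.cross_entropy(model(inputs), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Sketch of a simple ensemble: average the class probabilities of several
# diverse models and take the most likely class.
def ensemble_predict(models, x):
    probs = torch.stack([F.softmax(m(x), dim=-1) for m in models])
    return probs.mean(dim=0).argmax(dim=-1)
```

In practice, adversarial training is repeated over many batches and attack strengths, and ensembles work best when the member models differ in architecture or training data so that a single adversarial perturbation does not fool them all.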

Read more about hardening ML models in the full article at the link.

 

Subscribe for updates

Stay up to date with what is happening! Get a first look at news, noteworthy research and worst attacks on AI delivered right in your inbox.
