Researchers vs researchers: getting over existing defences

Adversarial ML · February 28, 2021


One of the challenges for AI security researchers is finding vulnerabilities in existing defenses. Some of the most interesting studies of February 2021 are devoted to exactly that.


Oriole: Thwarting Privacy against Trustworthy Deep Learning Models

Facial recognition technology has become widespread in almost all spheres of human activity, from surveillance cameras to financial and payment services. Deep facial recognition models work with huge amounts of sensitive data, so the question of their resistance to attacks is especially acute. Recently, the Fawkes system was presented with the aim of neutralizing these privacy threats: the method is based on uploading cloaked user images instead of the original ones, so that recognition models trained on them fail to match the real person.
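To give a feel for the cloaking mechanism, here is a minimal hedged sketch; it is not the Fawkes implementation. The linear `embed` extractor, the `cloak` helper, and all budgets are toy assumptions: the real system perturbs the image so its face-model embedding drifts toward a decoy identity while the pixel change stays small.

```python
# A minimal, illustrative sketch of Fawkes-style cloaking, NOT the authors'
# code. Assumptions: `embed` is a hypothetical linear stand-in for a face
# feature extractor; eps/steps/lr are arbitrary toy values.
import numpy as np

rng = np.random.default_rng(0)
DIM, FEAT = 64, 8
PROJ = rng.normal(size=(FEAT, DIM)) / np.sqrt(DIM)

def embed(x):
    """Hypothetical feature extractor standing in for a recognition model."""
    return PROJ @ x

def cloak(x, decoy, eps=0.05, steps=300, lr=0.2):
    """Nudge x within an L-inf budget eps so that its embedding moves toward
    the embedding of a decoy identity, via projected gradient descent."""
    delta = np.zeros_like(x)
    t_feat = embed(decoy)
    for _ in range(steps):
        # gradient of ||embed(x + delta) - t_feat||^2 with respect to delta
        grad = 2 * PROJ.T @ (embed(x + delta) - t_feat)
        delta = np.clip(delta - lr * grad, -eps, eps)
    return x + delta

user_photo = rng.uniform(size=DIM)   # the original image, as a flat vector
decoy_photo = rng.uniform(size=DIM)  # a different identity to hide behind
cloaked = cloak(user_photo, decoy_photo)

d_before = np.linalg.norm(embed(user_photo) - embed(decoy_photo))
d_after = np.linalg.norm(embed(cloaked) - embed(decoy_photo))
print(f"max pixel change: {np.abs(cloaked - user_photo).max():.3f}")
print(f"feature distance to decoy: {d_before:.3f} -> {d_after:.3f}")
```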

To explore this issue further, the researchers propose Oriole, a neural-network-based system that combines the advantages of data poisoning attacks and evasion attacks to make attack models insensitive to the Fawkes defense. The Oriole method generates the most relevant multi-cloaks from a handful of leaked photos and injects them into the training data during the training phase. After that, uncloaked images are cloaked with Fawkes and fed into the victim model. In their study, the researchers discuss the main vulnerabilities of Fawkes and show how they make it possible to render the defense ineffective.
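To make the poisoning idea concrete, here is a hedged sketch of an Oriole-style pipeline. It is not the authors' code: cloaking is modeled as a simple feature-space shift, and `generate_multi_cloaks` plus the nearest-centroid "victim model" are hypothetical simplifications of the real multi-cloak generation and face recognition training.

```python
# A hedged sketch of an Oriole-style poisoning pipeline, NOT the authors'
# code. Assumptions: cloaking is a bounded feature-space shift, and the
# helpers below are hypothetical simplifications.
import numpy as np

rng = np.random.default_rng(0)
DIM = 32

def cloak(x, direction, eps=2.0):
    """Stand-in for a Fawkes-style cloak: a bounded shift in feature space."""
    return x + eps * direction

def generate_multi_cloaks(leaked_images, n_variants=5):
    """Hypothetical multi-cloak generator: several plausible cloak
    directions applied to a few leaked (uncloaked) photos."""
    variants = []
    for x in leaked_images:
        for _ in range(n_variants):
            d = rng.normal(size=x.shape[0])
            variants.append(cloak(x, d / np.linalg.norm(d)))
    return np.array(variants)

identity_a = rng.normal(0, 1, size=(20, DIM))  # the Fawkes-protected user
identity_b = rng.normal(5, 1, size=(20, DIM))  # some other identity

# The tracker normally only sees cloaked photos of identity A ...
true_dir = rng.normal(size=DIM)
cloaked_a = cloak(identity_a, true_dir / np.linalg.norm(true_dir))

# ... but Oriole-style poisoning adds multi-cloaks built from a few leaks.
multi_cloaks = generate_multi_cloaks(identity_a[:3])

X_train = np.vstack([cloaked_a, multi_cloaks, identity_b])
y_train = np.array([0] * (len(cloaked_a) + len(multi_cloaks))
                   + [1] * len(identity_b))

# Nearest-centroid "victim model", purely for illustration.
centroids = np.array([X_train[y_train == c].mean(axis=0) for c in (0, 1)])

def predict(x):
    return int(np.argmin([np.linalg.norm(x - c) for c in centroids]))

# An uncloaked probe of identity A is now matched despite the Fawkes cloaks.
probe = rng.normal(0, 1, size=DIM)
print("predicted identity:", predict(probe))  # expected: 0
```

The point of the toy is only the mechanism: once multi-cloak variants derived from a few leaked originals are mixed into training, the learned class region covers both cloaked and uncloaked appearances, which is exactly what defeats the cloak.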


TAD: Trigger Approximation based Black-box Trojan Detection for AI

Deep Neural Networks are widely used across almost all domains of human activity, including security-sensitive ones. The depressing fact is that DNNs are vulnerable to Neural Trojan (NT) attacks: the compromised model behaves normally until a hidden trigger activates the malicious behavior, and such a model is an instance of adversarial artificial intelligence. Earlier studies approached the detection of DNN Trojans by stimulating the victim model into incorrect outputs, but the effectiveness of those methods was not high enough.

In this work, the researchers aimed to come up with a robust Trojan detection scheme able to determine whether a pre-trained model has been infected before it is deployed. TAD is the first trigger-approximation-based Trojan detection framework. The method rests on the observation that a pixel trigger normally has spatial dependency, which enables a fast search for the trigger in the input space; Trojans embedded in the feature space are found with special filter transformations applied for Trojan activation.
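The following is a minimal sketch of the trigger-approximation idea, not the TAD implementation: the toy black-box model, the bright 3x3 patch trigger, and the `scan_for_trigger` helper are all assumptions, chosen only to show how sliding a candidate patch over the input and checking whether predictions collapse onto one class can expose an input-space trigger.

```python
# A minimal sketch of trigger-approximation-style Trojan scanning, NOT the
# TAD implementation. Assumptions: a square pixel trigger, a black-box model
# exposed as predict(batch) -> labels; every name here is hypothetical.
import numpy as np

rng = np.random.default_rng(1)
H = W = 16
N_CLASSES = 4

def trojaned_predict(batch):
    """Toy black-box model: pseudo-random labels on clean inputs, but a
    bright 3x3 patch at the top-left corner forces target class 0."""
    labels = []
    for x in batch:
        if x[0:3, 0:3].mean() > 0.9:  # the hidden Trojan behavior
            labels.append(0)
        else:
            labels.append(int(x.sum() * 997) % (N_CLASSES - 1) + 1)
    return np.array(labels)

def scan_for_trigger(predict, patch=3, stride=4, n_probe=32, threshold=0.9):
    """Approximate a pixel trigger by sliding a candidate patch over random
    probes and checking whether any position collapses predictions onto a
    single class -- the spatial dependency keeps this search cheap."""
    probes = rng.uniform(0.0, 0.5, size=(n_probe, H, W))
    for i in range(0, H - patch + 1, stride):
        for j in range(0, W - patch + 1, stride):
            stamped = probes.copy()
            stamped[:, i:i + patch, j:j + patch] = 1.0  # candidate pattern
            counts = np.bincount(predict(stamped), minlength=N_CLASSES)
            if counts.max() / n_probe >= threshold:
                return (i, j), int(counts.argmax())
    return None, None

pos, target = scan_for_trigger(trojaned_predict)
print("trigger found at", pos, "-> target class", target)  # (0, 0) -> 0
```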


Efficient Certified Defenses Against Patch Attacks on Image Classifiers

Autonomous systems, for example self-driving cars or robots, are potentially among the main targets of adversarial patch attacks. Such attacks are mostly performed via the perception component of the device, and their consequences can be critical. Safety-critical systems often include a fail-safe fallback component, so a practical defense must certify robustness against patches while keeping high performance on clean inputs. The research presents BagCert, which combines high certified accuracy, good clean performance, efficient inference, and end-to-end training for robustness against patches of different sizes and locations.
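As a rough illustration of how such certificates can be computed, here is a hedged sketch in the spirit of region-voting certified patch defenses (derandomized-smoothing style); it is not BagCert's exact mechanism, and `region_classify` and all sizes are hypothetical. The key observation is that a patch of known size can overlap only a bounded number of regions, so a large enough voting margin certifies the prediction.

```python
# A hedged sketch in the spirit of region-voting certified patch defenses
# (derandomized-smoothing style), NOT BagCert's exact mechanism. The
# `region_classify` stand-in and all sizes are hypothetical assumptions.
import numpy as np

rng = np.random.default_rng(2)
H = W = 32
REGION = 8       # side length of each independently classified region
PATCH = 5        # worst-case side length of the adversarial patch
N_CLASSES = 3

def region_classify(region):
    """Hypothetical stand-in for a small model that labels one region."""
    return int(region.mean() * N_CLASSES) % N_CLASSES

def certified_predict(image):
    # 1. Collect one vote per region.
    votes = np.zeros(N_CLASSES, dtype=int)
    for i in range(0, H, REGION):
        for j in range(0, W, REGION):
            votes[region_classify(image[i:i + REGION, j:j + REGION])] += 1

    # 2. A PATCH x PATCH patch overlaps at most this many regions.
    per_axis = -(-(PATCH + REGION - 1) // REGION)  # ceiling division
    max_corrupted = per_axis ** 2

    # 3. Flipping one region's vote changes the margin by at most 2, so a
    # margin above 2 * max_corrupted certifies the winner cannot change.
    order = np.argsort(votes)[::-1]
    margin = votes[order[0]] - votes[order[1]]
    return int(order[0]), margin > 2 * max_corrupted

image = rng.uniform(size=(H, W))
label, certified = certified_predict(image)
print(f"prediction: {label}, certified vs {PATCH}x{PATCH} patches: {certified}")
```

The certification logic is the step that matters: since an adversary who controls the patch can flip only the votes of regions the patch touches, a winner whose margin exceeds twice that count keeps the majority no matter what the patch contains.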

Written by: admin
