Studying attacks on machine learning models is a prerequisite for successfully addressing their potential vulnerabilities. Here is a selection of the most interesting research from April 2022. Enjoy!
Although adversarial attacks have already been the subject of a huge amount of research, earlier work has paid almost no attention to adversarial attacks on optical neural networks (ONNs). In a new study, Shuming Jiao, Ziwei Song, and Shuiying Xiang first build an accurate ONN image classifier from a mesh of interconnected Mach-Zehnder interferometers (MZIs) and then, for the first time, propose a corresponding adversarial attack scheme against it.
The attacked images look entirely plausible and closely resemble the originals, yet the ONN misclassifies them in most cases. The study concludes that adversarial attacks can be effectively mounted against optical machine learning systems.
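The authors' attack is tailored to the physical parameters of the MZI mesh and is not reproduced here; as a rough illustration of the general recipe behind such attacks, below is a minimal FGSM-style sketch against an ordinary differentiable classifier (assuming a PyTorch `model`, an `image` batch scaled to [0, 1], and an integer `label` tensor; all names are placeholders, not the paper's code).

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=0.03):
    """One-step FGSM sketch: nudge each pixel in the direction that
    increases the classification loss, keeping the change small so the
    attacked image still resembles the original."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```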
Spiking Neural Networks (SNNs) are an increasingly popular alternative to Deep Neural Networks (DNNs).
Compared to the latter, SNNs offer greater processing power and high energy efficiency. In practice, however, such systems turn out to contain vulnerable assets – in particular, the neuron's threshold voltage, to which the classification accuracy is highly sensitive. Attackers can exploit this weakness.
In this paper, Karthikeyan Nagarajan, Junde Li, Sina Sayyah Ensan, Mohammad Nasim Imtiaz Khan, and their co-authors consider global fault injection attacks via the external power supply as well as local, laser-induced power glitches. These glitches corrupt essential parameters, such as spike amplitude and the neuron's membrane threshold potential, in SNNs built from standard analog neurons. The work evaluates the impact of the attacks on digit classification tasks and finds that in the worst case the classification accuracy drops by 85.65%. The authors also propose corresponding defenses, including a dummy-neuron-based fault detection scheme.
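The root cause exploited here is that an SNN's output depends sharply on each neuron's threshold voltage. A toy leaky integrate-and-fire simulation (purely illustrative, not the authors' analog circuit model; all parameter values are made up) shows how a glitched threshold changes the spike count that downstream layers would see:

```python
import numpy as np

def lif_spike_count(input_current, v_threshold=1.0, leak=0.9, v_reset=0.0):
    """Leaky integrate-and-fire neuron: integrate the input current,
    emit a spike and reset whenever the membrane potential crosses
    the threshold."""
    v, spikes = 0.0, 0
    for i in input_current:
        v = leak * v + i
        if v >= v_threshold:
            spikes += 1
            v = v_reset
    return spikes

rng = np.random.default_rng(0)
current = rng.uniform(0.0, 0.3, size=200)

# A fault-injected (lowered) threshold makes the neuron fire far more often,
# which the rest of the network interprets as a very different input.
print(lif_spike_count(current, v_threshold=1.0))   # nominal behaviour
print(lif_spike_count(current, v_threshold=0.4))   # glitched threshold
```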
Recently, Neural Ranking Models (NRMs) have shown impressive results, especially when built on pre-trained language models. Unfortunately, such deep neural models are also known to be vulnerable to adversarial examples. Given the growing adoption of neural information retrieval models, such attacks could form the basis of new web spamming techniques.
In this paper, Chen Wu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, and their co-authors consider the adversarial document ranking attack problem against NRMs: the attacker promotes a target document in the rankings by adding adversarial perturbations to its text. The authors focus on a decision-based black-box setting, in which the attacker has no access to the model's parameters or gradients but can obtain the rank positions of a partially retrieved list by querying the target model; this setup is realistic for real-world search engines. Their Pseudo Relevance-based ADversarial ranking Attack (PRADA) method learns a surrogate model based on pseudo-relevance feedback (PRF) to generate gradients for finding adversarial perturbations. The experiments show that PRADA outperforms existing attack strategies and successfully fools NRMs with only small distortions of the text.
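PRADA itself relies on the PRF-trained surrogate model to supply gradients for choosing word substitutions; the sketch below only illustrates the outer decision-based query loop. `get_rank` and `propose_substitutions` are hypothetical callables standing in for the black-box search engine and the surrogate-guided candidate generator.

```python
def blackbox_rank_attack(query, doc_tokens, get_rank, propose_substitutions,
                         max_queries=200):
    """Greedy decision-based attack sketch: repeatedly try small word
    substitutions and keep those that improve the rank position the
    (black-box) target model returns for the document."""
    best_tokens = list(doc_tokens)
    best_rank = get_rank(query, best_tokens)
    for _ in range(max_queries):
        candidate = propose_substitutions(best_tokens)  # e.g. surrogate-guided
        rank = get_rank(query, candidate)
        if rank < best_rank:  # lower number = promoted higher in the list
            best_tokens, best_rank = candidate, rank
    return best_tokens, best_rank
```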
The training of machine learning models is often delegated to a service provider. In a new study, Shafi Goldwasser, Michael P. Kim, Vinod Vaikuntanathan, and Or Zamir demonstrate how a malicious trainer can inject an undetectable backdoor into a classifier. On the surface, the backdoored classifier behaves normally, yet the trainer retains a mechanism for flipping the classification of any input with only a slight perturbation. Without the corresponding “backdoor key”, the mechanism cannot be detected by any computationally bounded observer.
First, the authors show how to plant a backdoor in any model using digital signature schemes. Given only black-box access to the original model and the backdoored version, it is computationally infeasible to find even a single input on which they differ; in particular, the backdoored model has a generalization error comparable to the original. The researchers also show how to insert undetectable backdoors into models trained with the Random Fourier Features (RFF) learning paradigm or with random ReLU networks.
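To see why a signature makes the backdoor undetectable, here is a toy sketch of the idea using Ed25519 from the `cryptography` package. In the paper the signature is hidden inside a small perturbation of the input itself; for brevity this sketch passes it as a separate payload, but the core argument is the same: finding an input on which the backdoored and original classifiers differ would amount to forging a signature.

```python
from cryptography.hazmat.primitives.asymmetric import ed25519
from cryptography.exceptions import InvalidSignature

class BackdooredClassifier:
    """Toy sketch of the signature-based construction: the wrapper behaves
    exactly like the original model unless the input carries a valid
    signature under the attacker's key, in which case it returns the
    attacker-chosen label embedded in the payload."""

    def __init__(self, model, public_key):
        self.model = model              # original, honest classifier
        self.public_key = public_key    # verification half of the backdoor key

    def predict(self, features, payload=None):
        if payload is not None:
            label_bytes, signature = payload
            try:
                self.public_key.verify(signature, label_bytes)
                return int.from_bytes(label_bytes, "big")  # forced label
            except InvalidSignature:
                pass  # invalid payload: fall back to honest behaviour
        return self.model(features)

# The attacker keeps the signing key (the "backdoor key") and can force any label:
private_key = ed25519.Ed25519PrivateKey.generate()
public_key = private_key.public_key()
label_bytes = (7).to_bytes(2, "big")
payload = (label_bytes, private_key.sign(label_bytes))
```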
The construction also has implications for adversarial robustness: it can produce a classifier that is indistinguishable from an “adversarially robust” one, yet in which every input has an adversarial example.