Detecting Adversarial Attacks with Subset Scanning

ODSC - Open Data Science
5 min read · Nov 14, 2022

Deep neural networks are susceptible to adversarial perturbations of their input data that can cause a sample to be incorrectly classified. These perturbations consist of small variations in pixel space that cannot be detected by a human but can change the output of a classifier.

Reliably detecting attacks in a given set of inputs is of high practical relevance due to the vulnerability of neural networks to adversarial examples. These altered inputs create a security risk in applications with real-world consequences, such as self-driving cars, robotics, and financial services.

One way to classify adversarial attacks is by their threat model, of which there are two main types: white-box and black-box. In the white-box approach, an attacker has complete access to the model, including its structure and trained weights. Several white-box attacks are used in this work, such as the Basic Iterative Method (BIM), the Fast Gradient Sign Method (FGSM), and DeepFool (DF). In the black-box approach, an attacker can only access the outputs of the target model. You can generate attacks and test detection and defense mechanisms in Python with the Adversarial Robustness Toolbox.

Figure from [0].
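As a taste of how such attacks are created in practice, here is a minimal sketch that instantiates the three white-box attacks mentioned above with ART. It assumes the fitted ART KerasClassifier built in step 2 below, and the epsilon and iteration values are purely illustrative, not tuned:

from art.attacks.evasion import BasicIterativeMethod, DeepFool, FastGradientMethod

# `classifier` is assumed to be a fitted ART KerasClassifier (see step 2 below).
fgsm = FastGradientMethod(classifier, eps=0.3)
bim = BasicIterativeMethod(classifier, eps=0.3, eps_step=0.05, max_iter=10)
deepfool = DeepFool(classifier, max_iter=50)

# Each attack object exposes generate(), which returns perturbed copies of the inputs.
x_test_fgsm = fgsm.generate(x_test)
x_test_bim = bim.generate(x_test)
x_test_df = deepfool.generate(x_test)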

In particular, we will showcase how to use the Subset Scanning Detector. This method treats neural networks as data-generating systems and applies anomalous pattern detection methods to their activation data. Subset scanning can efficiently search over a large combinatorial space to find the groups of samples that differ the most from ‘expected’ behavior and could contain adversarially attacked images. If you’re interested in the methodology, you can check out [1].
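To give a flavor of what happens under the hood, here is a minimal, illustrative sketch of the scoring idea behind subset scanning (this is not ART’s implementation, which additionally optimizes efficiently over subsets of nodes and images): node activations of an input are turned into empirical p-values against background activations, and a nonparametric scan statistic, here a Berk-Jones-style score, rewards an excess of unexpectedly small p-values.

import numpy as np

def empirical_pvalues(background_activations, test_activations):
    # background_activations: (n_background, n_nodes); test_activations: (n_nodes,)
    # p-value per node = fraction of background activations at least as extreme
    n = background_activations.shape[0]
    greater = (background_activations >= test_activations).sum(axis=0)
    return (greater + 1) / (n + 1)

def berk_jones_score(pvalues, alphas=np.linspace(0.01, 0.5, 50)):
    # Compare the observed fraction of p-values <= alpha with its expectation
    # alpha under the null, via the Bernoulli KL divergence; keep the maximum.
    n = len(pvalues)
    best = 0.0
    for alpha in alphas:
        obs = np.sum(pvalues <= alpha) / n
        if obs <= alpha:
            continue  # no excess of small p-values at this threshold
        kl = obs * np.log(obs / alpha)
        if obs < 1:
            kl += (1 - obs) * np.log((1 - obs) / (1 - alpha))
        best = max(best, n * kl)
    return best

# Toy usage: a 'suspicious' activation vector shifted away from the background.
rng = np.random.default_rng(0)
background = rng.normal(size=(500, 128))
shifted = rng.normal(loc=0.5, size=128)
print(berk_jones_score(empirical_pvalues(background, shifted)))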

Some nice properties of this type of approach:

  • We can provide attack detection capabilities at run time.
  • We can abstract from domains (audio, image, tabular data) and focus only on the deep representation of the input.
  • No need to re-train the model or to have labeled examples of the adversarial attacks when training our detector.

The Adversarial Robustness Toolbox (ART) is a Python library for Machine Learning Security. ART is hosted by the Linux Foundation AI & Data Foundation (LF AI & Data). ART provides tools that enable developers and researchers to defend and evaluate Machine Learning models and applications against the adversarial threats of Evasion, Poisoning, Extraction, and Inference.

Below, we will show an example of how to use ART to build experiments around Adversarial Attack detection with Subset Scanning Methods.

Make sure to install these packages in your virtualenv:

pip install keras adversarial-robustness-toolbox pillow tensorflow seaborn

For this example we will implement multiple steps:

1. Load the dataset that we want to use.

from art.utils import load_dataset

(x_train, y_train), (x_test, y_test), min_, max_ = load_dataset("mnist")
x_train, y_train = x_train[:5000], y_train[:5000]
x_test, y_test = x_test[:1000], y_test[:1000]

2. Build the model that we want to attack. This is a simple CNN from the Keras examples. After we build our model, it is important to wrap it in a KerasClassifier from the ART library.

from keras.models import Model, Sequential
from keras.layers import Input, Dense, Flatten, Conv2D, MaxPooling2D, Dropout
from art.estimators.classification import KerasClassifier
import numpy as np
import tensorflow as tf
tf.compat.v1.disable_eager_execution()

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation="relu", input_shape=x_train.shape[1:]))
model.add(Conv2D(64, (3, 3), activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(10, activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

classifier = KerasClassifier(model=model, clip_values=(min_, max_))
classifier.fit(x_train, y_train, nb_epochs=5, batch_size=128)

3. Then, we need to generate attacks for the chosen model. In this case, we will use the Fast Gradient Method, but there are several other attacks available in ART.

from art.attacks.evasion.fast_gradient import FastGradientMethod

attacker = FastGradientMethod(classifier, eps=0.5)
x_train_adv = attacker.generate(x_train)
x_test_adv = attacker.generate(x_test)
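As a quick sanity check, comparing the model’s accuracy on clean versus adversarial test images confirms that the attack is effective (a minimal sketch using the classifier and data defined above):

# Compare classifier accuracy on clean vs. adversarial test images.
preds_clean = np.argmax(classifier.predict(x_test), axis=1)
preds_adv = np.argmax(classifier.predict(x_test_adv), axis=1)
labels = np.argmax(y_test, axis=1)  # y_test is one-hot encoded by load_dataset

print("Accuracy on clean test images:      ", np.mean(preds_clean == labels))
print("Accuracy on adversarial test images:", np.mean(preds_adv == labels))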

4. Now, we’re ready to test our subset scanning detector. We need to provide the model that we want to scan (classifier), the background data from which to extract the activations that build the expected behavior, and the layer we want to scan over.

from art.defences.detector.evasion.subsetscanning import SubsetScanningDetector

detector = SubsetScanningDetector(classifier, x_train, layer=1)
clean_scores, adv_scores, dpwr = detector.scan(x_test, x_test_adv)

5. As a result, the detector returns the detection power of our scanning method for the given test data, along with the scores assigned to clean and adversarial samples.

These scores can be plotted with seaborn and matplotlib. Looking at the plot, we can see a clear separation between the two distributions, which reflects a high detection power (in this case dpwr=0.999978).

import seaborn as sns
import matplotlib.pyplot as plt
sns.kdeplot(clean_scores, fill=True, label='clean images')
sns.kdeplot(adv_scores, fill=True, label='attacked images')
plt.title('Distribution of Subset Scores for layer 1')
plt.legend()
plt.ylabel('Density')
plt.xlabel('Subset Score')
plt.show()
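Beyond the detection power that ART reports, one way to summarize the separation between the two score distributions with a single number is the area under the ROC curve; a minimal sketch, assuming scikit-learn is installed:

from sklearn.metrics import roc_auc_score

# Label clean scores as 0 and adversarial scores as 1, then score the detector by AUC.
scores = np.concatenate([clean_scores, adv_scores])
labels = np.concatenate([np.zeros(len(clean_scores)), np.ones(len(adv_scores))])
print("Detection AUC:", roc_auc_score(labels, scores))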

References

[0] Chen, J., Jordan, M.I. and Wainwright, M.J., 2020. HopSkipJumpAttack: A query-efficient decision-based attack. In 2020 IEEE Symposium on Security and Privacy (SP), pp. 1277–1294. IEEE.

[1] Cintas, C., Speakman, S., Akinwande, V., Ogallo, W., Weldemariam, K., Sridharan, S. and McFowland, E., 2020. Detecting Adversarial Attacks via Subset Scanning of Autoencoder Activations and Reconstruction Error. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI 2020), Main Track, pp. 876–882.

Originally posted on OpenDataScience.com

Read more data science articles on OpenDataScience.com, including tutorials and guides from beginner to advanced levels! Subscribe to our weekly newsletter here and receive the latest news every Thursday. You can also get data science training on-demand wherever you are with our Ai+ Training platform. Subscribe to our fast-growing Medium Publication too, the ODSC Journal, and inquire about becoming a writer.
