If convolutional neural networks were yesterday's leading star, generative adversarial methods are the new rising star of deep learning, and they could fundamentally change the way we perceive the world. Adversarial training offers a new way to guide artificial intelligence through complex tasks. Adversarial images can easily fool classifiers that were trained beforehand, so how to use them to improve the robustness of a system is an active research question. Adversarial examples synthesized against a neural network are surprising, because a small, carefully crafted perturbation of the input can make the network misclassify it in an arbitrary way. Since adversarial examples can remain effective when transferred to the physical world, this is a security issue worth taking seriously. In face recognition, for example, if an adversarial image is accepted as a real person, the result could be serious security breaches and large losses. Readers interested in generating adversarial images can follow the recent NIPS adversarial competition on Kaggle.

In this article, we walk through using TensorFlow to implement a simple algorithm for synthesizing adversarial examples, and then use the same technique to construct a robust adversarial example.

```python
# Written for TensorFlow 1.x (tf.contrib does not exist in TensorFlow 2).
import tensorflow as tf
import tensorflow.contrib.slim as slim
import tensorflow.contrib.slim.nets as nets

tf.logging.set_verbosity(tf.logging.ERROR)
sess = tf.InteractiveSession()
```

First, set up the input image. We use tf.Variable instead of tf.placeholder so that the image is trainable; we can still feed a value into it whenever we need to.

```python
image = tf.Variable(tf.zeros((299, 299, 3)))
```

Next, load the Inception v3 model.

```python
def inception(image, reuse):
    # Map pixel values from [0, 1] to [-1, 1] and add a batch dimension
    preprocessed = tf.multiply(tf.subtract(tf.expand_dims(image, 0), 0.5), 2.0)
    arg_scope = nets.inception.inception_v3_arg_scope(weight_decay=0.0)
    with slim.arg_scope(arg_scope):
        logits, _ = nets.inception.inception_v3(
            preprocessed, 1001, is_training=False, reuse=reuse)
        logits = logits[:, 1:]  # ignore background class
        probs = tf.nn.softmax(logits)  # probabilities
    return logits, probs

logits, probs = inception(image, reuse=False)
```
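The `inception` function does two small bookkeeping steps worth spelling out: it rescales pixels from [0, 1] to [-1, 1], and it drops the first of the checkpoint's 1001 outputs (the "background" class), which shifts every class index down by one. The stdlib sketch below mirrors just that arithmetic; both helper names are hypothetical and not part of TensorFlow or TF-Slim.

```python
def inception_preprocess(pixel):
    # Same arithmetic as tf.multiply(tf.subtract(x, 0.5), 2.0): [0, 1] -> [-1, 1]
    return (pixel - 0.5) * 2.0

def to_imagenet_index(raw_index):
    # The raw Inception v3 output has 1001 entries, index 0 being "background".
    # After logits[:, 1:], raw index i corresponds to ImageNet class i - 1.
    return raw_index - 1

print([inception_preprocess(p) for p in (0.0, 0.5, 1.0)])  # [-1.0, 0.0, 1.0]
print(to_imagenet_index(925))  # 924, the "guacamole" class used later
```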

Next, load the pretrained weights. This Inception v3 model has a top-5 accuracy of 93.9%.

```python
import os
import tarfile
import tempfile
from urllib.request import urlretrieve

data_dir = tempfile.mkdtemp()
inception_tarball, _ = urlretrieve(
    'http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz')
tarfile.open(inception_tarball, 'r:gz').extractall(data_dir)

# Restore only the Inception v3 weights, not the trainable input image
restore_vars = [
    var for var in tf.global_variables()
    if var.name.startswith('InceptionV3/')
]
saver = tf.train.Saver(restore_vars)
saver.restore(sess, os.path.join(data_dir, 'inception_v3.ckpt'))
```

Next, write some code to display an image, classify it, and show the classification results.

```python
import json
import matplotlib.pyplot as plt

imagenet_json, _ = urlretrieve(
    'http://www.anishathalye.com/media/2017/07/25/imagenet.json')
with open(imagenet_json) as f:
    imagenet_labels = json.load(f)

def classify(img, correct_class=None, target_class=None):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 8))
    fig.sca(ax1)
    p = sess.run(probs, feed_dict={image: img})[0]
    ax1.imshow(img)

    # Top-10 predictions, highest probability first
    topk = list(p.argsort()[-10:][::-1])
    topprobs = p[topk]
    barlist = ax2.bar(range(10), topprobs)
    # Target class in red, correct class in green
    if target_class in topk:
        barlist[topk.index(target_class)].set_color('r')
    if correct_class in topk:
        barlist[topk.index(correct_class)].set_color('g')
    plt.sca(ax2)
    plt.ylim([0, 1.1])
    plt.xticks(range(10),
               [imagenet_labels[i][:15] for i in topk],
               rotation='vertical')
    fig.subplots_adjust(bottom=0.2)
    plt.show()
```

Example image

Load the sample image and make sure it is classified correctly.

```python
import numpy as np
import PIL.Image

img_path, _ = urlretrieve('http://www.anishathalye.com/media/2017/07/25/cat.jpg')
img_class = 281  # ImageNet class 281: "tabby cat"
img = PIL.Image.open(img_path)

# Resize so the shorter side is 299 pixels, then crop to 299x299
wide = img.width > img.height
new_w = 299 if not wide else int(img.width * 299 / img.height)
new_h = 299 if wide else int(img.height * 299 / img.width)
img = img.resize((new_w, new_h)).crop((0, 0, 299, 299))
img = (np.asarray(img) / 255.0).astype(np.float32)

classify(img, correct_class=img_class)
```
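The resize arithmetic above scales the shorter side of the photo to 299 pixels while keeping the aspect ratio, so the subsequent 299×299 crop always fits. A small sketch of just that size computation (the helper name `resized_dims` is hypothetical):

```python
def resized_dims(width, height, target=299):
    """Return (new_w, new_h) so that the shorter side equals `target`,
    mirroring the aspect-ratio logic used for the example image."""
    wide = width > height
    new_w = target if not wide else int(width * target / height)
    new_h = target if wide else int(height * target / width)
    return new_w, new_h

print(resized_dims(400, 300))  # (398, 299): landscape, height becomes 299
print(resized_dims(300, 400))  # (299, 398): portrait, width becomes 299
```

Because `int` truncates, the longer side can be up to one pixel shorter than the exact ratio, but it never drops below 299, so the crop stays valid.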

Adversarial examples

Given an image $x$, the neural network outputs a probability distribution over labels, $P(y \mid x)$. When crafting an adversarial input by hand, we want to find an $x'$ that maximizes $\log P(y' \mid x')$ for a target label $y'$, so that the input is misclassified as the target class. By constraining $x'$ to an $\ell_\infty$ box of radius $\epsilon$, that is, requiring $\lVert x - x' \rVert_\infty \le \epsilon$, we ensure that $x'$ does not look different from the original $x$. In this framework, an adversarial example is the solution to a constrained optimization problem, which can be solved with backpropagation and projected gradient descent (PGD), essentially the same techniques used to train the network itself. The algorithm is simple: first initialize the adversarial example as $x' \leftarrow x$, then repeat the following two steps until convergence, where $\alpha$ is the step size:

1. $x' \leftarrow x' + \alpha \cdot \nabla_{x'} \log P(y' \mid x')$
2. $x' \leftarrow \mathrm{clip}(x', x - \epsilon, x + \epsilon)$
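To make the two steps concrete before turning to TensorFlow, here is a minimal stdlib-only sketch of the same PGD loop against a toy linear softmax classifier. The classifier, its weights, and all helper names are assumptions chosen for illustration; Inception v3 is far more complex, but the update and projection are the same.

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def logits_of(W, x):
    # Toy linear classifier (an assumption): logits_k = sum_j W[k][j] * x[j]
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def grad_log_prob(W, x, target):
    # d/dx log softmax(Wx)[target] = W[target] - sum_k p_k * W[k]
    p = softmax(logits_of(W, x))
    return [W[target][j] - sum(p[k] * W[k][j] for k in range(len(W)))
            for j in range(len(x))]

def pgd(W, x, target, eps, alpha, steps):
    x_adv = list(x)  # initialization: x' <- x
    for _ in range(steps):
        g = grad_log_prob(W, x_adv, target)
        # Step 1: gradient ascent on log P(y'|x')
        x_adv = [xa + alpha * gi for xa, gi in zip(x_adv, g)]
        # Step 2: project back into the l_inf ball of radius eps around x
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

W = [[1.0, 0.0], [0.0, 1.0]]  # two classes, two features
x = [0.6, 0.4]                # classified as class 0
adv = pgd(W, x, target=1, eps=0.15, alpha=0.1, steps=20)
print(adv, softmax(logits_of(W, adv)))
```

Here a perturbation of at most 0.15 per feature is enough to flip the prediction to class 1; the projection keeps every coordinate within that budget no matter how large the gradient step is.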

Initialization

Start with the simplest part: write a TensorFlow op that performs the initialization.

```python
x = tf.placeholder(tf.float32, (299, 299, 3))

x_hat = image  # our trainable adversarial input
assign_op = tf.assign(x_hat, x)
```

Gradient descent step

Next, write a gradient descent step that maximizes the log probability of the target class (equivalently, minimizes the cross-entropy).

```python
learning_rate = tf.placeholder(tf.float32, ())
y_hat = tf.placeholder(tf.int32, ())

labels = tf.one_hot(y_hat, 1000)
loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=[labels])
optim_step = tf.train.GradientDescentOptimizer(
    learning_rate).minimize(loss, var_list=[x_hat])
```
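Minimizing the softmax cross-entropy against a one-hot label is the same as maximizing $\log P(y' \mid x')$: every non-target term of the cross-entropy sum is multiplied by zero, leaving only $-\log P(y' \mid x')$. A quick stdlib-only check on made-up logits (the helper names are illustrative, not TensorFlow functions):

```python
import math

def log_softmax(z):
    # Numerically stable log-softmax via the log-sum-exp trick
    m = max(z)
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - lse for v in z]

def cross_entropy(logits, one_hot):
    # H(one_hot, softmax(logits)) = -sum_k one_hot_k * log softmax_k
    return -sum(t * lp for t, lp in zip(one_hot, log_softmax(logits)))

logits = [2.0, -1.0, 0.5]  # made-up logits for three classes
target = 2
one_hot = [1.0 if k == target else 0.0 for k in range(len(logits))]
print(cross_entropy(logits, one_hot), -log_softmax(logits)[target])
```

Both printed numbers agree, which is why a gradient descent optimizer minimizing `loss` performs gradient ascent on the target log probability.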

Projection step

Finally, write the projection step, which keeps the adversarial example visually close to the original image. It also clips the result to the range [0, 1] so that it remains a valid image.

```python
epsilon = tf.placeholder(tf.float32, ())

below = x - epsilon
above = x + epsilon
projected = tf.clip_by_value(tf.clip_by_value(x_hat, below, above), 0, 1)
with tf.control_dependencies([projected]):
    project_step = tf.assign(x_hat, projected)
```
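The nested `tf.clip_by_value` calls apply two clips in sequence: first into the $\ell_\infty$ ball around the original pixels, then into the valid pixel range. The same per-pixel logic in plain Python, as a sketch on a three-pixel "image" (the `project` helper is hypothetical):

```python
def project(x_adv, x, eps):
    out = []
    for a, o in zip(x_adv, x):
        a = min(max(a, o - eps), o + eps)  # stay within the l_inf ball around x
        a = min(max(a, 0.0), 1.0)          # stay a valid image in [0, 1]
        out.append(a)
    return out

x = [0.0, 0.5, 1.0]       # original pixels
x_adv = [-0.2, 0.9, 1.3]  # pixels after an overly large gradient step
print(project(x_adv, x, eps=8 / 255))
```

The first and last pixels end up clamped by the [0, 1] range, while the middle one is pulled back to $0.5 + \epsilon$. Order matters only at the edges: clipping to [0, 1] second guarantees the output is always a displayable image.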

Execution

We are finally ready to synthesize an adversarial example. We arbitrarily choose "guacamole" (ImageNet class 924) as our target class.

```python
demo_epsilon = 2.0/255.0  # a really small perturbation
demo_lr = 1e-1
demo_steps = 100
demo_target = 924  # "guacamole"

# initialization step
sess.run(assign_op, feed_dict={x: img})

# projected gradient descent
for i in range(demo_steps):
    # gradient descent step
    _, loss_value = sess.run(
        [optim_step, loss],
        feed_dict={learning_rate: demo_lr, y_hat: demo_target})
    # project step
    sess.run(project_step, feed_dict={x: img, epsilon: demo_epsilon})
    if (i+1) % 10 == 0:
        print('step %d, loss=%g' % (i+1, loss_value))

adv = x_hat.eval()  # retrieve the adversarial example
```

```
step 10, loss=4.18923
step 20, loss=0.580237
step 30, loss=0.0322334
step 40, loss=0.0209522
step 50, loss=0.0159688
step 60, loss=0.0134457
step 70, loss=0.0117799
step 80, loss=0.0105757
step 90, loss=0.00962179
step 100, loss=0.00886694
```

This adversarial image is visually indistinguishable from the original and shows no visible artifacts, yet it is classified as "guacamole" with high probability.

```python
classify(adv, correct_class=img_class, target_class=demo_target)
```
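Besides looking at the classifier output, one can confirm that the perturbation respects its budget by measuring the $\ell_\infty$ distance to the original. With $\epsilon = 2/255$, no pixel moves by more than two 8-bit intensity levels. A stdlib sketch on made-up pixel values (the helper `linf_distance` is hypothetical; with the NumPy arrays in this article, `np.abs(adv - img).max()` computes the same quantity):

```python
def linf_distance(a, b):
    # Largest absolute per-pixel difference
    return max(abs(p - q) for p, q in zip(a, b))

eps = 2.0 / 255.0
original = [0.20, 0.50, 0.80]                   # made-up pixel values
perturbed = [0.20 + eps, 0.50 - eps / 2, 0.80]  # stays within the budget
d = linf_distance(original, perturbed)
print(d <= eps + 1e-12, round(d * 255))  # True 2
```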


Robust adversarial examples

Now let's look at a more advanced example. Following our approach for synthesizing robust adversarial examples, we look for a single perturbation of the cat image that is simultaneously adversarial under a chosen distribution of transformations. Any distribution of differentiable transformations can be used; in this article, we synthesize a single adversarial input that is robust to rotation by $\theta \in [-\pi/4, \pi/4]$. Before continuing, let's check whether the previous example already survives rotation, for example by an angle of $\theta = \pi/8$.

```python
ex_angle = np.pi/8

angle = tf.placeholder(tf.float32, ())
rotated_image = tf.contrib.image.rotate(image, angle)
rotated_example = rotated_image.eval(feed_dict={image: adv, angle: ex_angle})
classify(rotated_example, correct_class=img_class, target_class=demo_target)
```

It seems the adversarial example we generated earlier is not rotation-invariant! So how can we make an adversarial example robust to a distribution of transformations? Given a distribution of transformations $T$, we can maximize $\mathbb{E}_{t \sim T} \log P(y' \mid t(x'))$ subject to $\lVert x - x' \rVert_\infty \le \epsilon$. This optimization problem can be solved with projected gradient descent. Note that $\nabla \mathbb{E}_{t \sim T} \log P(y' \mid t(x'))$ equals $\mathbb{E}_{t \sim T} \nabla \log P(y' \mid t(x'))$, so at each gradient descent step we can approximate the gradient with samples. Rather than implementing gradient sampling by hand, we can use a trick to let TensorFlow do it for us: sampling-based gradient descent can be modeled as gradient descent over an ensemble of stochastic classifiers, each of which draws a random transformation from the distribution and applies it to the input before classifying.

```python
num_samples = 10
average_loss = 0
for i in range(num_samples):
    # Each sample rotates the input by a random angle in [-pi/4, pi/4]
    rotated = tf.contrib.image.rotate(
        image, tf.random_uniform((), minval=-np.pi/4, maxval=np.pi/4))
    rotated_logits, _ = inception(rotated, reuse=True)
    # labels wrapped in a list to match the (1, 1000) logits shape, as before
    average_loss += tf.nn.softmax_cross_entropy_with_logits(
        logits=rotated_logits, labels=[labels]) / num_samples
```
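The averaged loss works because the gradient of an expectation equals the expectation of the gradients, so averaging per-sample gradients gives an unbiased estimate. The stdlib sketch below checks this on a toy problem under stated assumptions: the "image" is a 2-D vector, the transformation is a 2-D rotation $R(\theta)$ with $\theta \sim \mathrm{Uniform}[-\pi/4, \pi/4]$, and the classifier is linear softmax. It compares the Monte Carlo gradient estimate with a finite-difference gradient of the expectation computed by numerical quadrature. All names are illustrative.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def rotate(v, theta):
    # 2-D rotation R(theta) applied to v
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

def log_prob(W, z, target):
    return math.log(softmax([r[0] * z[0] + r[1] * z[1] for r in W])[target])

def grad_log_prob(W, z, target):
    # d/dz log softmax(Wz)[target] = W[target] - sum_k p_k * W[k]
    p = softmax([r[0] * z[0] + r[1] * z[1] for r in W])
    return [W[target][j] - sum(p[k] * W[k][j] for k in range(len(W)))
            for j in range(2)]

def eot_gradient(W, x, target, num_samples, rng):
    """Monte Carlo estimate of grad_x E_theta[log P(target | R(theta) x)]."""
    total = [0.0, 0.0]
    for _ in range(num_samples):
        theta = rng.uniform(-math.pi / 4, math.pi / 4)
        g = grad_log_prob(W, rotate(x, theta), target)
        gx = rotate(g, -theta)  # chain rule: R(theta)^T = R(-theta)
        total = [t + v / num_samples for t, v in zip(total, gx)]
    return total

def expected_log_prob(W, x, target, n=2000):
    # Midpoint-rule approximation of E_theta[log P(target | R(theta) x)]
    a, b = -math.pi / 4, math.pi / 4
    h = (b - a) / n
    return sum(log_prob(W, rotate(x, a + (i + 0.5) * h), target)
               for i in range(n)) / n

W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # toy 3-class linear classifier
x = [0.8, 0.3]
est = eot_gradient(W, x, target=1, num_samples=20000, rng=random.Random(0))

# Central finite differences of the expectation, for comparison
h = 1e-4
fd = []
for j in range(2):
    xp, xm = list(x), list(x)
    xp[j] += h
    xm[j] -= h
    fd.append((expected_log_prob(W, xp, 1) - expected_log_prob(W, xm, 1)) / (2 * h))
print(est, fd)
```

The two gradients agree up to Monte Carlo noise. In the TensorFlow graph, building `num_samples` randomly rotated copies and averaging their losses gives the optimizer exactly this kind of sampled estimate, with a fresh set of angles each time the graph runs.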

We can reuse assign_op and project_step, but we need to write a new optim_step for this new objective.

```python
optim_step = tf.train.GradientDescentOptimizer(
    learning_rate).minimize(average_loss, var_list=[x_hat])
```

Finally, we are ready to run PGD to generate the adversarial input. As in the previous example, we choose "guacamole" as the target class.

```python
demo_epsilon = 8.0/255.0  # still a pretty small perturbation
demo_lr = 2e-1
demo_steps = 300
demo_target = 924  # "guacamole"

# initialization step
sess.run(assign_op, feed_dict={x: img})

# projected gradient descent
for i in range(demo_steps):
    # gradient descent step
    _, loss_value = sess.run(
        [optim_step, average_loss],
        feed_dict={learning_rate: demo_lr, y_hat: demo_target})
    # project step
    sess.run(project_step, feed_dict={x: img, epsilon: demo_epsilon})
    if (i+1) % 50 == 0:
        print('step %d, loss=%g' % (i+1, loss_value))

adv_robust = x_hat.eval()  # retrieve the adversarial example
```

```
step 50, loss=0.0804289
step 100, loss=0.0270499
step 150, loss=0.00771527
step 200, loss=0.00350717
step 250, loss=0.00656128
step 300, loss=0.00226182
```

This adversarial image is classified as "guacamole" with high confidence, even when rotated!

```python
rotated_example = rotated_image.eval(feed_dict={image: adv_robust, angle: ex_angle})
classify(rotated_example, correct_class=img_class, target_class=demo_target)
```


Let's look at how rotation-invariant the robust adversarial example is over the entire range of angles, by plotting the target-class probability $P(y' \mid x')$ for rotations $\theta \in [-\pi/4, \pi/4]$.

```python
thetas = np.linspace(-np.pi/4, np.pi/4, 301)

p_naive = []
p_robust = []
for theta in thetas:
    rotated = rotated_image.eval(feed_dict={image: adv_robust, angle: theta})
    p_robust.append(probs.eval(feed_dict={image: rotated})[0][demo_target])

    rotated = rotated_image.eval(feed_dict={image: adv, angle: theta})
    p_naive.append(probs.eval(feed_dict={image: rotated})[0][demo_target])

robust_line, = plt.plot(thetas, p_robust, color='b', linewidth=2, label='robust')
naive_line, = plt.plot(thetas, p_naive, color='r', linewidth=2, label='naive')
plt.ylim([0, 1.05])
plt.xlabel('rotation angle')
plt.ylabel('target class probability')
plt.legend(handles=[robust_line, naive_line], loc='lower right')
plt.show()
```
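To summarize such a sweep numerically instead of only visually, one option is to report the worst-case target probability and the fraction of angles at which the target class stays above some threshold. The helper `robustness_summary` and the 0.5 threshold are hypothetical choices; with the lists built above one could call `robustness_summary(p_robust)` and `robustness_summary(p_naive)`.

```python
def robustness_summary(probs, threshold=0.5):
    """Return (worst-case probability, fraction of angles where the
    target-class probability exceeds `threshold`)."""
    worst = min(probs)
    frac = sum(1 for p in probs if p > threshold) / len(probs)
    return worst, frac

# Made-up probabilities for five angles, for illustration only
example = [0.91, 0.97, 0.99, 0.95, 0.42]
print(robustness_summary(example))  # (0.42, 0.8)
```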


As the blue (robust) curve in the resulting plot shows, the robust adversarial example remains highly effective across the whole range of rotation angles.