How to generate adversarial examples using TensorFlow

If convolutional neural networks were the previous star of deep learning, adversarial generation has become the field's new rising star, and it may well change the way we perceive the world. Adversarial training offers a new approach to guiding artificial intelligence through complex tasks. Generated adversarial images can easily fool previously trained classifiers, so how to use such images to improve the robustness of a system is a hot research topic. Adversarial examples for neural networks are surprising because a small, carefully crafted perturbation of the input can cause the network to misclassify it in an arbitrary way. Since adversarial examples can also transfer to the physical world and remain effective there, this is a security issue worth attention. In face recognition, for example, if an adversarial image is accepted as a real person, the result could be serious security risks and large losses. Readers interested in generating adversarial images may want to look at the recent NIPS challenge on Kaggle.

In this article, we walk through a simple algorithm for synthesizing adversarial examples in TensorFlow, and then use the same technique to build a robust adversarial example.

import tensorflow as tf
import tensorflow.contrib.slim as slim
import tensorflow.contrib.slim.nets as nets

sess = tf.InteractiveSession()

First, set up the input image. We use a tf.Variable instead of a tf.placeholder so that the image is trainable. We can still feed it a value when we need to.

image = tf.Variable(tf.zeros((299, 299, 3)))

Next, load the Inception v3 model.

def inception(image, reuse):
    preprocessed = tf.multiply(tf.subtract(tf.expand_dims(image, 0), 0.5), 2.0)
    arg_scope = nets.inception.inception_v3_arg_scope(weight_decay=0.0)
    with slim.arg_scope(arg_scope):
        logits, _ = nets.inception.inception_v3(
            preprocessed, 1001, is_training=False, reuse=reuse)
        logits = logits[:,1:] # ignore background class
        probs = tf.nn.softmax(logits) # probabilities
    return logits, probs

logits, probs = inception(image, reuse=False)
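The two array manipulations inside inception can be illustrated with a minimal NumPy sketch. This is not part of the original code: the helper names preprocess and drop_background are hypothetical, and the sketch assumes pixel values in [0, 1] and a 1001-way output whose index 0 is the background class.

```python
import numpy as np

def preprocess(image):
    """Add a batch dimension and rescale pixels from [0, 1] to [-1, 1]."""
    return (np.expand_dims(image, 0) - 0.5) * 2.0

def drop_background(logits):
    """Discard column 0 (background) so indices match the 1000 ImageNet classes."""
    return logits[:, 1:]

image = np.random.rand(299, 299, 3).astype(np.float32)
batch = preprocess(image)
fake_logits = np.zeros((1, 1001), dtype=np.float32)  # stand-in for the network output

print(batch.shape)                         # batch of one image
print(drop_background(fake_logits).shape)  # 1000 class scores remain
```

Dropping the background column is why class indices such as 281 and 924 later in the article refer to the 1000-class ImageNet labels rather than the 1001-class network output.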

Next, load the pretrained weights. This Inception v3 model has a top-5 accuracy of 93.9%.

import tempfile
from urllib.request import urlretrieve
import tarfile
import os

data_dir = tempfile.mkdtemp()
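inception_tarball, _ = urlretrieve(
    '')  # the checkpoint tarball URL is missing from the source
tarfile.open(inception_tarball, 'r:gz').extractall(data_dir)

# restore only the Inception weights; the trainable `image` variable
# is not in the checkpoint ('InceptionV3/' is slim's default scope)
restore_vars = [
    var for var in tf.global_variables()
    if var.name.startswith('InceptionV3/')
]
saver = tf.train.Saver(restore_vars)
saver.restore(sess, os.path.join(data_dir, 'inception_v3.ckpt'))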

Next, write some code to display the image, classify it and display the classification results.

import json
import matplotlib.pyplot as plt

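imagenet_json, _ = urlretrieve('')  # the labels URL is missing from the source
with open(imagenet_json) as f:
    imagenet_labels = json.load(f)

def classify(img, correct_class=None, target_class=None):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 8))
    # left panel: the image; right panel: the top-10 class probabilities
    p = sess.run(probs, feed_dict={image: img})[0]
    ax1.imshow(img)
    topk = list(p.argsort()[-10:][::-1])
    topprobs = p[topk]
    barlist = ax2.bar(range(10), topprobs)
    # highlight target and correct classes (bodies reconstructed; colors assumed)
    if target_class in topk:
        barlist[topk.index(target_class)].set_color('r')
    if correct_class in topk:
        barlist[topk.index(correct_class)].set_color('g')
    plt.sca(ax2)
    plt.ylim([0, 1.1])
    plt.xticks(range(10),
               [imagenet_labels[i][:15] for i in topk],
               rotation='vertical')
    fig.subplots_adjust(bottom=0.2)
    plt.show()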
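The top-k selection used in classify can be seen in isolation with a small NumPy sketch. The probability vector below is made up for illustration, and k is set to 3 instead of 10 to keep it short.

```python
import numpy as np

p = np.array([0.05, 0.60, 0.10, 0.25])  # toy probabilities for 4 classes
k = 3

# argsort is ascending, so take the last k indices and reverse them
topk = list(p.argsort()[-k:][::-1])
topprobs = p[topk]

print(topk)      # class indices, most probable first
print(topprobs)  # their probabilities, in the same order
```

Converting topk to a list is what makes the later `target_class in topk` and `topk.index(...)` lookups straightforward.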

Example image

Load the sample image and make sure it is correctly classified.

import PIL.Image
import numpy as np

img_path, _ = urlretrieve('')  # the sample image URL is missing from the source
img_class = 281
img = PIL.Image.open(img_path)
big_dim = max(img.width, img.height)

wide = img.width > img.height
new_w = 299 if not wide else int(img.width * 299 / img.height)
new_h = 299 if wide else int(img.height * 299 / img.width)
img = img.resize((new_w, new_h)).crop((0, 0, 299, 299))
img = (np.asarray(img) / 255.0).astype(np.float32)

classify(img, correct_class=img_class)
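The resize arithmetic above scales the shorter side to 299 pixels and leaves the longer side proportionally larger, after which the crop keeps the top-left 299×299 region. A small sketch of just that arithmetic follows; the function name resize_dims is hypothetical and not part of the original code.

```python
def resize_dims(width, height, side=299):
    """Return the (new_w, new_h) used before cropping to side x side."""
    wide = width > height
    new_w = side if not wide else int(width * side / height)
    new_h = side if wide else int(height * side / width)
    return new_w, new_h

print(resize_dims(600, 300))  # landscape: height becomes 299
print(resize_dims(300, 600))  # portrait: width becomes 299
print(resize_dims(500, 500))  # square: both sides become 299
```

Note that the crop box (0, 0, 299, 299) is anchored at the top-left corner, so for non-square images the right or bottom part is discarded rather than the image being center-cropped.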

Adversarial examples

Given an image X, the neural network outputs a probability distribution P(y|X) over labels. When crafting an adversarial input, we want to find an X' such that log P(y'|X') is maximized for a target label y', that is, the input is misclassified as the target class. By constraining X' to an ℓ∞ box of radius ε around X, requiring ‖X − X'‖∞ ≤ ε, we can make sure that X' does not look too different from the original X. In this framework, an adversarial example is the solution to a constrained optimization problem, which can be solved with backpropagation and projected gradient descent, essentially the same techniques used to train the network itself. The algorithm is simple: first initialize the adversarial example as X' ← X, then repeat the following steps until convergence:

1. X' ← X' + α⋅∇ log P(y'|X')

2. X' ← clip(X', X − ε, X + ε)
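The two steps can be sketched in plain NumPy on a toy problem. This is only an illustration under stated assumptions: a random linear softmax classifier W stands in for Inception, and the function name targeted_pgd, the dimensions, and the hyperparameters are all made up for the example.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def targeted_pgd(W, x, target, epsilon, alpha, steps):
    """Maximize log P(target | x_adv) subject to ||x - x_adv||_inf <= epsilon."""
    x_adv = x.copy()                           # initialize X' <- X
    onehot = np.eye(W.shape[0])[target]
    for _ in range(steps):
        p = softmax(W @ x_adv)
        grad = W.T @ (onehot - p)              # gradient of log P(target | x_adv)
        x_adv = x_adv + alpha * grad           # step 1: gradient ascent
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)  # step 2: project
    return x_adv

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 50))   # toy classifier: 5 classes, 50 input features
x = rng.uniform(size=50)       # toy "image"
target = 3

x_adv = targeted_pgd(W, x, target, epsilon=0.1, alpha=0.01, steps=200)
print('P(target) before:', softmax(W @ x)[target])
print('P(target) after: ', softmax(W @ x_adv)[target])
print('max perturbation:', np.abs(x_adv - x).max())
```

For a linear softmax model the gradient has the closed form used above; with a deep network such as Inception, the same gradient is obtained through backpropagation.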


Start with the simplest part: write a TensorFlow op to initialize it accordingly.

x = tf.placeholder(tf.float32, (299, 299, 3))

x_hat = image # our trainable adversarial input
assign_op = tf.assign(x_hat, x)

Gradient descent step

Next, we write a gradient descent step to maximize the log probability of the target class (or, equivalently, minimize the cross entropy).

learning_rate = tf.placeholder(tf.float32, ())
y_hat = tf.placeholder(tf.int32, ())

labels = tf.one_hot(y_hat, 1000)
loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=[labels])
optim_step = tf.train.GradientDescentOptimizer(
    learning_rate).minimize(loss, var_list=[x_hat])
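The equivalence between the two objectives can be checked numerically: with a one-hot label, the cross entropy reduces to the negative log probability of the target class. The logits below are arbitrary values chosen for illustration.

```python
import numpy as np

def log_softmax(logits):
    shifted = logits - logits.max()   # shift for numerical stability
    return shifted - np.log(np.exp(shifted).sum())

logits = np.array([2.0, -1.0, 0.5, 3.0])  # toy logits for 4 classes
target = 2
onehot = np.eye(len(logits))[target]

cross_entropy = -(onehot * log_softmax(logits)).sum()
neg_log_prob = -log_softmax(logits)[target]

print(cross_entropy, neg_log_prob)  # the two values agree
```

Minimizing this loss with respect to the input image, rather than the network weights, is what turns an ordinary training step into an attack step.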

Projection step

Finally, we write the projection step, which keeps the adversarial example visually close to the original image. We additionally clip to [0, 1] so that the result remains a valid image.

epsilon = tf.placeholder(tf.float32, ())

below = x - epsilon
above = x + epsilon
projected = tf.clip_by_value(tf.clip_by_value(x_hat, below, above), 0, 1)
with tf.control_dependencies([projected]):
    project_step = tf.assign(x_hat, projected)
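The same projection can be written in NumPy to see its effect on concrete numbers. The function name project and the sample values are illustrative only.

```python
import numpy as np

def project(x_hat, x, epsilon):
    """Clip x_hat into the epsilon-box around x, then into the valid range [0, 1]."""
    boxed = np.clip(x_hat, x - epsilon, x + epsilon)
    return np.clip(boxed, 0.0, 1.0)

x = np.array([0.0, 0.5, 1.0])        # original pixel values
x_hat = np.array([-0.2, 0.53, 1.3])  # candidate after a gradient step
projected = project(x_hat, x, epsilon=0.05)

print(projected)
```

The first pixel is pulled back to the box edge and then to 0, the second is already inside the box and is left alone, and the third is capped at 1 by the valid-image constraint.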


Finally, we are ready to synthesize an adversarial example. We arbitrarily choose "guacamole" (ImageNet class 924) as our target class.

demo_epsilon = 2.0/255.0 # a really small perturbation
demo_lr = 1e-1
demo_steps = 100
demo_target = 924 # "guacamole"

# initialization step
sess.run(assign_op, feed_dict={x: img})

# projected gradient descent
for i in range(demo_steps):
    # gradient descent step
    _, loss_value = sess.run(
        [optim_step, loss],
        feed_dict={learning_rate: demo_lr, y_hat: demo_target})
    # project step
    sess.run(project_step, feed_dict={x: img, epsilon: demo_epsilon})
    if (i+1) % 10 == 0:
        print('step %d, loss=%g' % (i+1, loss_value))

adv = x_hat.eval() # retrieve the adversarial example
step 10, loss=4.18923
step 20, loss=0.580237
step 30, loss=0.0322334
step 40, loss=0.0209522
step 50, loss=0.0159688
step 60, loss=0.0134457
step 70, loss=0.0117799
step 80, loss=0.0105757
step 90, loss=0.00962179
step 100, loss=0.00886694

This adversarial image is visually indistinguishable from the original, with no visible artifacts, yet it is classified as "guacamole" with high probability.

classify(adv, correct_class=img_class, target_class=demo_target)

Robust adversarial examples

Now let's look at a more advanced example. We follow our approach for synthesizing robust adversarial examples to find a single perturbation of the cat image that remains adversarial under a chosen distribution of transformations. Any distribution of differentiable transformations can be used; in this article, we synthesize a single adversarial input that is robust to rotation by θ ∈ [−π/4, π/4]. Before continuing, let's check whether the previous example survives rotation, for example by an angle of θ = π/8.

ex_angle = np.pi/8

angle = tf.placeholder(tf.float32, ())
rotated_image = tf.contrib.image.rotate(image, angle)
rotated_example = rotated_image.eval(feed_dict={image: adv, angle: ex_angle})
classify(rotated_example, correct_class=img_class, target_class=demo_target)

It seems that the adversarial example we generated earlier is not rotation invariant! So how do we make an adversarial example robust to a distribution of transformations? Given a distribution of transformations T, we can maximize E_{t~T} log P(y'|t(X')), subject to ‖X − X'‖∞ ≤ ε. This optimization problem can be solved with projected gradient descent, noting that ∇ E_{t~T} log P(y'|t(X')) equals E_{t~T} ∇ log P(y'|t(X')), and approximating the expectation with samples at each gradient descent step. Rather than implementing the gradient sampling by hand, we can use a trick to have TensorFlow do it for us: we model sampling-based gradient descent as gradient descent over an ensemble of stochastic classifiers, each of which randomly samples a transformation from the distribution and applies it to the input before classifying.

num_samples = 10
average_loss = 0
for i in range(num_samples):
    rotated = tf.contrib.image.rotate(
        image, tf.random_uniform((), minval=-np.pi/4, maxval=np.pi/4))
    rotated_logits, _ = inception(rotated, reuse=True)
    average_loss += tf.nn.softmax_cross_entropy_with_logits(
        logits=rotated_logits, labels=labels) / num_samples
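What the averaged loss computes can be sketched by hand in NumPy. This sketch makes several assumptions that are not in the original: circular shifts of a 1-D vector stand in for image rotations, a random linear softmax classifier stands in for Inception, and the function name eot_loss_and_grad is hypothetical. It shows that the gradient of the sample-averaged loss is the average of the per-sample gradients, each passed back through its own transformation.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

def eot_loss_and_grad(W, x_adv, target, shifts):
    """Average cross entropy, and its gradient, over sampled circular shifts."""
    onehot = np.eye(W.shape[0])[target]
    loss, grad = 0.0, np.zeros_like(x_adv)
    for s in shifts:
        t_x = np.roll(x_adv, s)                 # sampled transformation t(x_adv)
        logp = log_softmax(W @ t_x)
        loss += -logp[target] / len(shifts)
        g_t = W.T @ (np.exp(logp) - onehot)     # d loss / d t(x_adv)
        grad += np.roll(g_t, -s) / len(shifts)  # chain rule back through the shift
    return loss, grad

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 12))              # toy classifier: 4 classes, 12 features
x_adv = rng.uniform(size=12)
shifts = rng.integers(-3, 4, size=10)     # 10 sampled transformations

loss, grad = eot_loss_and_grad(W, x_adv, 1, shifts)
print(loss, grad.shape)
```

In the TensorFlow version, automatic differentiation performs the chain rule through tf.contrib.image.rotate, so only the averaged loss needs to be written out.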

We can reuse assign_op and project_step, but for this new goal, we must write a new optim_step.

optim_step = tf.train.GradientDescentOptimizer(
    learning_rate).minimize(average_loss, var_list=[x_hat])

Finally, we are ready to run PGD to generate the adversarial input. As in the previous example, we choose "guacamole" as our target class.

demo_epsilon = 8.0/255.0 # still a pretty small perturbation
demo_lr = 2e-1
demo_steps = 300
demo_target = 924 # "guacamole"

# initialization step
sess.run(assign_op, feed_dict={x: img})

# projected gradient descent
for i in range(demo_steps):
    # gradient descent step
    _, loss_value = sess.run(
        [optim_step, average_loss],
        feed_dict={learning_rate: demo_lr, y_hat: demo_target})
    # project step
    sess.run(project_step, feed_dict={x: img, epsilon: demo_epsilon})
    if (i+1) % 50 == 0:
        print('step %d, loss=%g' % (i+1, loss_value))

adv_robust = x_hat.eval() # retrieve the adversarial example
step 50, loss=0.0804289
step 100, loss=0.0270499
step 150, loss=0.00771527
step 200, loss=0.00350717
step 250, loss=0.00656128
step 300, loss=0.00226182

This adversarial image is classified as "guacamole" with high confidence, even when rotated!

rotated_example = rotated_image.eval(feed_dict={image: adv_robust, angle: ex_angle})
classify(rotated_example, correct_class=img_class, target_class=demo_target)

Let's examine the rotation invariance of the robust adversarial example over the entire range of angles, plotting P(y'|X') for θ ∈ [−π/4, π/4].

thetas = np.linspace(-np.pi/4, np.pi/4, 301)

p_naive = []
p_robust = []
for theta in thetas:
    rotated = rotated_image.eval(feed_dict={image: adv_robust, angle: theta})
    p_robust.append(probs.eval(feed_dict={image: rotated})[0][demo_target])
    rotated = rotated_image.eval(feed_dict={image: adv, angle: theta})
    p_naive.append(probs.eval(feed_dict={image: rotated})[0][demo_target])

robust_line, = plt.plot(thetas, p_robust, color='b', linewidth=2, label='robust')
naive_line, = plt.plot(thetas, p_naive, color='r', linewidth=2, label='naive')
plt.ylim([0, 1.05])
plt.xlabel('rotation angle')
plt.ylabel('target class probability')
plt.legend(handles=[robust_line, naive_line], loc='lower right')

As the blue curve in the figure shows, the robust adversarial example is highly effective.

Posted on Wed, 24 Nov 2021 04:42:47 -0500 by cwncool