Deep Dream Model and Implementation

Contents

  • Import Inception Model
  • Generate the original Deep Dream image
  • Produce larger Deep Dream images
  • Generate higher quality Deep Dream images
  • The final Deep Dream model

Deep Dream is an interesting technique released by Google in 2015. In a trained convolutional neural network, only a few parameters are needed to generate an image with this technique.

The code and images for this article are on my GitHub. If you want to implement the code yourself, I suggest downloading it first and then following along with the explanation here; it will be easier to understand.

Questions this article addresses:

  • What does a convolution layer learn?
  • What do a convolution layer's parameters mean?
  • What is the difference between what shallow and deep convolution layers learn?

Let the image input to the network be $x$, and let the probabilities output by the network be $t$ (a 1000-dimensional vector representing the probabilities of 1000 categories). We continuously let the neural network adjust the pixel values of the input image $x$ so that the output $t[100]$ is as large as possible, and finally obtain the image below.

Figure: image obtained by maximizing the probability of one class
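As a sketch of what this optimization looks like in code (the names t_probs and t_input here are illustrative, not part of the code introduced below):

```python
# Minimal sketch of maximizing one class probability, assuming t_probs is the
# network's 1000-dimensional probability output and t_input is the input
# placeholder (both names are illustrative).
t_class = t_probs[0, 100]                   # probability of class 100
t_grad = tf.gradients(t_class, t_input)[0]  # gradient w.r.t. the input pixels
img = np.random.uniform(size=(224, 224, 3)) + 100.0
for i in range(20):
    g = sess.run(t_grad, {t_input: img})
    img += g / (g.std() + 1e-8)             # gradient ascent on the image
```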

A channel of a convolution layer can represent one kind of learned "information". Taking the mean value of a channel as the optimization target makes it clear what that channel has learned; this is the basic principle of Deep Dream. In the following subsections, you will learn step by step how to generate and optimize Deep Dream images in code.
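As a preview, the core of the whole technique fits in two lines (layer_output, channel, and t_input are all defined in the sections below):

```python
# The core Deep Dream objective, previewed here; layer_output, channel and
# t_input are defined in the following sections.
t_score = tf.reduce_mean(layer_output[:, :, :, channel])  # mean activation of one channel
t_grad = tf.gradients(t_score, t_input)[0]                # ascend this gradient on the input image
```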

Deep Dream Model in TensorFlow

Import Inception Model

The original Deep Dream model only needs to optimize the activation value of a single channel in a convolution layer of an ImageNet model, so an ImageNet image recognition model must first be imported into TensorFlow. This is illustrated with the Inception model; the corresponding source file is load_inception.py.

The following code actually imports the Inception model. TensorFlow provides a special file format with the ".pb" extension: a model can be exported to a .pb file and imported again later as needed. For the Inception model, the corresponding file is tensorflow_inception_graph.pb.

```python
# Imports used throughout this article
import numpy as np
import scipy.misc
import tensorflow as tf
import PIL.Image
from functools import partial

# Create the graph and the Session
graph = tf.Graph()
sess = tf.InteractiveSession(graph=graph)

# tensorflow_inception_graph.pb stores both the Inception network structure
# and the corresponding trained parameters; import it as follows
model_fn = 'tensorflow_inception_graph.pb'
with tf.gfile.FastGFile(model_fn, 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

# Define t_input as the placeholder for the input image
t_input = tf.placeholder(tf.float32, name='input')
imagenet_mean = 117.0  # mean pixel value of the training images

# The input image must be preprocessed before being fed to the network:
# expand_dims adds one dimension, turning [height, width, channel] into
# [1, height, width, channel], because the Inception model expects
# input of shape (batch, height, width, channel)
t_preprocessed = tf.expand_dims(t_input - imagenet_mean, 0)

# Import the graph definition into the current graph
tf.import_graph_def(graph_def, {'input': t_preprocessed})
```

After importing the model, find all the convolution layers in the model and try to output the shape of a convolution layer:

```python
# Find all convolution layers
layers = [op.name for op in graph.get_operations()
          if op.type == 'Conv2D' and 'import/' in op.name]

# Print the number of convolution layers
print('Number of layers', len(layers))  # Number of layers 59

# In particular, print the shape of mixed4d_3x3_bottleneck_pre_relu
name = 'mixed4d_3x3_bottleneck_pre_relu'
print('shape of %s: %s' % (name, str(graph.get_tensor_by_name('import/' + name + ':0').get_shape())))
# shape of mixed4d_3x3_bottleneck_pre_relu: (?, ?, ?, 144)
# Because the number and size of the input images are unknown in advance,
# the first three dimensions are indeterminate and display as question marks.
```

Generate the original Deep Dream image

We define a function to save images so that we can save the data output from the model as images.

```python
def savearray(img_array, img_name):
    """Save a numpy.ndarray as an image."""
    scipy.misc.toimage(img_array).save(img_name)
    print('img saved: %s' % img_name)
```

Next, feed in a noise image and generate the image for a single channel:

```python
# Choose the convolution layer and channel, and take out the corresponding tensor
name = 'mixed4d_3x3_bottleneck_pre_relu'
layer_output = graph.get_tensor_by_name("import/%s:0" % name)
# The output of this layer is (?, ?, ?, 144),
# so channel can be any integer from 0 to 143
channel = 139

# Define a random noise image as the starting point for optimization
img_noise = np.random.uniform(size=(224, 224, 3)) + 100.0

# Call render_naive to render
render_naive(layer_output[:, :, :, channel], img_noise, iter_n=20)
```

The render_naive function computes gradients and iteratively renders the initial image:

```python
def render_naive(t_obj, img0, iter_n=20, step=1.0):
    """Adjust the input image t_input so that the target t_score is as large as possible.
    :param t_obj: the value of one channel in a convolution layer
    :param img0: the initial noise image
    :param iter_n: number of iterations
    :param step: learning rate
    """
    # t_score is the optimization target: the mean of t_obj.
    # The larger t_score is, the larger the average activation
    # of the corresponding channel.
    t_score = tf.reduce_mean(t_obj)
    # Gradient of t_score with respect to t_input
    t_grad = tf.gradients(t_score, t_input)[0]

    # Copy the initial image
    img = img0.copy()
    for i in range(iter_n):
        # Compute the gradient and the current score in the session
        g, score = sess.run([t_grad, t_score], {t_input: img})
        # Apply the gradient to img; step acts as a "learning rate"
        g /= g.std() + 1e-8
        img += g * step
        print('score(mean)=%f' % score)
    # Save the image
    savearray(img, 'naive.jpg')
```

After 20 iterations, the image will be saved as naive.jpg.

You really can get meaningful images by maximizing the mean value of a channel! However, the image quality here is still not very good.

Produce larger Deep Dream images

First, try to generate a larger image. Above, the size of the generated image is (224, 224, 3), which is simply the size of the img_noise that was passed in. If you pass in a larger img_noise, you can generate a larger picture.

Problem: a larger image takes up more memory (or video memory); if you try to generate a very large picture, rendering will fail due to insufficient memory.

Solution: divide the picture into several parts and optimize only one part at a time, so that each optimization step consumes only a fixed amount of memory.

```python
def calc_grad_tiled(img, t_grad, tile_size=512):
    """Compute gradients for an image of any size.
    :param img: the image (e.g. the initial noise image)
    :param t_grad: gradient of the optimization target (score) w.r.t. the input image
    :param tile_size: gradients are computed for tile_size x tile_size tiles
                      at a time to avoid memory problems
    :return: the gradient for the whole image
    """
    sz = tile_size  # 512
    h, w = img.shape[:2]
    # To prevent edge effects at tile boundaries, shift the whole image:
    # draw two integers uniformly from [0, sz)
    sx, sy = np.random.randint(sz, size=2)
    # Roll sx positions horizontally, then sy positions vertically
    img_shift = np.roll(np.roll(img, sx, 1), sy, 0)
    grad = np.zeros_like(img)
    # (x, y) is the pixel at the start position of each tile
    for y in range(0, max(h - sz // 2, sz), sz):      # vertical direction
        for x in range(0, max(w - sz // 2, sz), sz):  # horizontal direction
            # Compute the gradient for each tile of size tile_size x tile_size
            sub = img_shift[y:y + sz, x:x + sz]
            g = sess.run(t_grad, {t_input: sub})
            grad[y:y + sz, x:x + sz] = g
    # Roll the gradient back with np.roll
    return np.roll(np.roll(grad, -sx, 1), -sy, 0)
```

In practical projects, to speed up convergence, the image is generated at a small size first and then progressively enlarged.

```python
def resize_ratio(img, ratio):
    """Enlarge the image img by a factor of ratio."""
    min_v = img.min()  # minimum pixel value
    max_v = img.max()  # maximum pixel value
    # Normalize pixel values into 0~255 before resizing
    img = (img - min_v) / (max_v - min_v) * 255
    # Note: scipy.misc.imresize treats a float ratio as a scale factor
    # (greater than 1 enlarges), while an integer is treated as a percentage
    img = np.float32(scipy.misc.imresize(img, ratio))
    # Scale the pixel values back to the original range
    img = img / 255 * (max_v - min_v) + min_v
    return img


def render_multiscale(t_obj, img0, iter_n=10, step=1.0, octave_n=3, octave_scale=1.4):
    """Generate larger images.
    :param t_obj: the value of one channel in a convolution layer
    :param img0: the initial noise image
    :param iter_n: number of iterations
    :param step: learning rate
    :param octave_n: the image is enlarged octave_n - 1 times in total
    :param octave_scale: magnification per octave; a float greater than 1
    """
    # Define the target and the gradient as before
    t_score = tf.reduce_mean(t_obj)             # optimization target
    t_grad = tf.gradients(t_score, t_input)[0]  # gradient of t_score w.r.t. t_input

    img = img0.copy()
    for octave in range(octave_n):
        if octave > 0:
            # Enlarge the small image by octave_scale;
            # in total it is enlarged octave_n - 1 times
            img = resize_ratio(img, octave_scale)
        for i in range(iter_n):
            # calc_grad_tiled computes gradients for images of any size
            g = calc_grad_tiled(img, t_grad)
            g /= g.std() + 1e-8
            img += g * step
    savearray(img, 'multiscale.jpg')
```

The larger octave_n is, the larger the final image will be; the default is octave_n=3. With the code above, you can call the function directly:

```python
if __name__ == '__main__':
    name = 'mixed4d_3x3_bottleneck_pre_relu'
    channel = 139
    img_noise = np.random.uniform(size=(224, 224, 3)) + 100.0
    layer_output = graph.get_tensor_by_name("import/%s:0" % name)
    render_multiscale(layer_output[:, :, :, channel], img_noise, iter_n=20)
```

At this point, you can see that channel 139 of the convolution layer "mixed4d_3x3_bottleneck_pre_relu" has actually learned the features of a kind of flower; if you feed in an image of this flower, its activation value will be maximal. You can also increase octave_n to generate a larger image. Regardless of the final image size, gradients are always computed on 512×512 tiles, so memory is always sufficient. If computing gradients on 512×512 tiles still causes memory problems in your environment, you can set tile_size in the function to a smaller value.
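For instance, a call like the following (a sketch; 256 is an arbitrary smaller value) halves the tile edge and roughly quarters the peak memory of each gradient computation:

```python
# Hypothetical example: use smaller tiles if 512x512 exhausts memory
g = calc_grad_tiled(img, t_grad, tile_size=256)
```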

Generate higher quality Deep Dream images

We now shift our focus to "quality". The images generated in the previous section vary drastically in their details; we would like the overall style of the images to be "softer".

In image processing algorithms, there are concepts of high and low frequency components:

  • High Frequency Component: Where the gray scale, color and brightness of an image change dramatically, such as the edges and details
  • Low Frequency Component: Where the image does not change much, such as large color blocks, overall style

The images above contain too many high-frequency components; we want the generated image to contain more low-frequency components so that it looks "softer".
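To make these two concepts concrete, here is a minimal sketch (using scipy.ndimage, not part of the book's code) that splits an image into its low- and high-frequency parts with a simple blur:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

img = np.random.uniform(size=(224, 224, 3)).astype(np.float32)  # stand-in image
lo = gaussian_filter(img, sigma=(4, 4, 0))  # blurring keeps only the low frequencies
hi = img - lo                               # the residual holds the high frequencies
# The split is lossless: img == lo + hi
```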

Solution:

  • Add a loss on the high-frequency components. The added loss then suppresses them when the image is generated. However, adding a loss increases the amount of computation and the number of steps needed to converge.
  • Amplify the low-frequency gradient. Previously, the gradient used for image generation was applied uniformly. If the gradient can be decomposed into a "high-frequency gradient" and a "low-frequency gradient", the "low-frequency gradient" can be artificially amplified to obtain a softer image.

The second approach is taken here, using the Laplacian pyramid to decompose the image. This algorithm decomposes an image into several layers: the bottom layers (level 1, level 2) correspond to the image's high-frequency components, and the top layers (level 3, level 4) correspond to its low-frequency components.

We can apply the Laplacian pyramid decomposition to the gradient as well. After decomposition, both the high-frequency and low-frequency parts of the gradient are normalized so that they have similar magnitudes; this effectively boosts the low-frequency components of the gradient and improves the quality of the generated image. This method is commonly referred to as Laplacian Pyramid Gradient Normalization.

The following is an implementation of Laplacian Pyramid Gradient Normalization; I have commented the implementation process in detail:

  1. First, the original image is decomposed into n-1 high-frequency components and 1 low-frequency component.
  2. Then each layer is normalized.
  3. Finally, the normalized high- and low-frequency components are added back together.
```python
k = np.float32([1, 4, 6, 4, 1])
k = np.outer(k, k)  # outer product of the two vectors: shape (5, 5)
# Smoothing kernel of shape (5, 5, 3, 3)
k5x5 = k[:, :, None, None] / k.sum() * np.eye(3, dtype=np.float32)


def lap_split(img):
    """Split an image into a low-frequency and a high-frequency component."""
    with tf.name_scope('split'):
        # A convolution acts as a smoothing filter, so lo is the low-frequency part.
        # filter = k5x5 has shape [filter_height, filter_width, in_channels, out_channels]
        lo = tf.nn.conv2d(img, k5x5, [1, 2, 2, 1], 'SAME')
        # Scale the low-frequency component back to the size of the original image
        # (arguments: value, filter, output_shape, strides)
        lo2 = tf.nn.conv2d_transpose(lo, k5x5 * 4, tf.shape(img), [1, 2, 2, 1])
        # Subtract lo2 from the original img to get the high-frequency component hi
        hi = img - lo2
    return lo, hi


def lap_split_n(img, n):
    """Split the image img into an n-level Laplacian pyramid."""
    levels = []
    for i in range(n):
        # lap_split divides the image into low- and high-frequency parts;
        # the high-frequency part is stored in levels,
        # and decomposition continues on the low-frequency part
        img, hi = lap_split(img)
        levels.append(hi)
    levels.append(img)
    return levels[::-1]  # reverse order, low frequency first


def lap_merge(levels):
    """Merge a Laplacian pyramid back into the original image."""
    img = levels[0]  # low frequency
    for hi in levels[1:]:  # high frequencies
        with tf.name_scope('merge'):
            # The transposed convolution upsamples the current low-frequency
            # image to the size of the next high-frequency component, which
            # is then added back
            img = tf.nn.conv2d_transpose(img, k5x5 * 4, tf.shape(hi), [1, 2, 2, 1]) + hi
    return img


def normalize_std(img, eps=1e-10):
    """Normalize img by its standard deviation."""
    with tf.name_scope('normalize'):
        std = tf.sqrt(tf.reduce_mean(tf.square(img)))
        # tf.maximum returns the larger of its two arguments
        return img / tf.maximum(std, eps)


def lap_normalize(img, scale_n=4):
    """Laplacian pyramid normalization."""
    img = tf.expand_dims(img, 0)
    # Decompose the image into a Laplacian pyramid
    tlevels = lap_split_n(img, scale_n)
    # Apply normalize_std to each level
    tlevels = list(map(normalize_std, tlevels))
    # Merge the pyramid back into an image
    out = lap_merge(tlevels)
    return out[0, :, :, :]
```

Function Explanation:

  • lap_split function: decomposes an image into high- and low-frequency components. The low-frequency component lo is obtained by convolving the original image once; the convolution acts as smoothing, extracting the parts of the picture that change little. A transposed convolution then scales the low-frequency component back to the original size, giving lo2, and subtracting lo2 from the original img yields the high-frequency component.
  • lap_split_n function: splits an image into an n-level Laplacian pyramid, calling lap_split each time to decompose the current image. The high-frequency components are saved in the pyramid levels, while the low-frequency component is left for the next decomposition.
  • lap_merge function: restores a decomposed Laplacian pyramid to the original image.
  • normalize_std function: normalizes the image by its standard deviation.
  • lap_normalize function: decomposes the input image into a Laplacian pyramid, calls normalize_std to normalize each layer, and outputs the merged result.

With the Laplacian pyramid normalization function in place, you can write the code that generates the image:

```python
def tffunc(*argtypes):
    """Convert a function defined on Tensors into a normal function
    defined on numpy.ndarray."""
    placeholders = list(map(tf.placeholder, argtypes))

    def wrap(f):
        out = f(*placeholders)

        def wrapper(*args, **kw):
            return out.eval(dict(zip(placeholders, args)), session=kw.get('session'))
        return wrapper
    return wrap


def render_lapnorm(t_obj, img0, iter_n=10, step=1.0,
                   octave_n=3, octave_scale=1.4, lap_n=4):
    """
    :param t_obj: the target score, the output of one channel: layer_output[:, :, :, channel]
    :param img0: the input image, a noise image of size (224, 224, 3)
    :param iter_n: number of iterations
    :param step: learning rate
    """
    t_score = tf.reduce_mean(t_obj)             # optimization target
    t_grad = tf.gradients(t_score, t_input)[0]  # gradient of t_score w.r.t. t_input

    # Convert lap_normalize into a normal function;
    # partial freezes the scale_n argument
    lap_norm_func = tffunc(np.float32)(partial(lap_normalize, scale_n=lap_n))

    img = img0.copy()
    for octave in range(octave_n):
        if octave > 0:
            img = resize_ratio(img, octave_scale)
        for i in range(iter_n):
            # Compute the image gradient
            g = calc_grad_tiled(img, t_grad)
            # The only difference from before: normalize g with lap_norm_func!
            g = lap_norm_func(g)
            img += g * step
            print('.', end=' ')
    savearray(img, 'lapnorm.jpg')
```

The tffunc function converts a function defined on Tensors into a normal function defined on numpy.ndarray. As defined above, lap_normalize takes a Tensor as input and returns a Tensor; after wrapping it with tffunc, we get a function whose input and output are both of ndarray type.
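As a quick illustration of tffunc (a toy sketch; `double` is a hypothetical function, and the InteractiveSession created earlier serves as the default session):

```python
# Wrap a Tensor -> Tensor function so that it accepts and returns ndarrays
double = tffunc(np.float32)(lambda t: t * 2.0)
arr = np.ones((2, 2), np.float32)
print(double(arr))  # [[2. 2.]
                    #  [2. 2.]]
```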

The code to generate the final image is similar to the previous one, simply calling the render_lapnorm function:

```python
if __name__ == '__main__':
    name = 'mixed4d_3x3_bottleneck_pre_relu'
    channel = 139
    img_noise = np.random.uniform(size=(224, 224, 3)) + 100.0
    layer_output = graph.get_tensor_by_name("import/%s:0" % name)
    render_lapnorm(layer_output[:, :, :, channel], img_noise, iter_n=20)
```

Compared with the previous section, this does improve the quality of the generated image to some extent. You can also see more clearly the image features learned by channel 139 of this convolution layer. Feel free to try different channels.

The final Deep Dream model

Earlier, we described how to generate images by maximizing the mean value of a channel in a convolution layer, and learned how to generate larger and higher-quality images. The final Deep Dream model additionally renders these features onto a background picture.

Instead of optimizing from img_noise, we use a background image as the starting point for the optimization.

```python
def resize(img, hw):
    """Resize img; hw is a tuple (h, w) giving the target height and width."""
    min_v = img.min()
    max_v = img.max()
    img = (img - min_v) / (max_v - min_v) * 255
    img = np.float32(scipy.misc.imresize(img, hw))
    img = img / 255 * (max_v - min_v) + min_v
    return img


def render_deepdream(t_obj, img0, iter_n=10, step=1.5,
                     octave_n=4, octave_scale=1.4):
    t_score = tf.reduce_mean(t_obj)
    t_grad = tf.gradients(t_score, t_input)[0]

    img = img0
    # Build an image pyramid here as well.
    # The high/low-frequency decomposition is simple: just shrink and enlarge.
    octaves = []
    for i in range(octave_n - 1):
        hw = img.shape[:2]
        # Shrinking the image gives the low-frequency component lo
        lo = resize(img, np.int32(np.float32(hw) / octave_scale))
        hi = img - resize(lo, hw)  # high-frequency component
        img = lo
        octaves.append(hi)

    # Start from the low-frequency image, then enlarge and add back the high frequencies
    for octave in range(octave_n):  # 0, 1, 2, 3
        if octave > 0:
            hi = octaves[-octave]
            img = resize(img, hi.shape[:2]) + hi
        for i in range(iter_n):
            g = calc_grad_tiled(img, t_grad)
            img += g * (step / (np.abs(g).mean() + 1e-7))
    img = img.clip(0, 255)
    savearray(img, 'deepdream1.jpg')


if __name__ == '__main__':
    img0 = PIL.Image.open('test.jpg')
    img0 = np.float32(img0)
    name = 'mixed4d_3x3_bottleneck_pre_relu'
    channel = 139
    layer_output = graph.get_tensor_by_name("import/%s:0" % name)
    render_deepdream(layer_output[:, :, :, channel], img0)
```

Three parts have changed here: the image 'test.jpg' is read in and passed as the starting point to the function render_deepdream. To ensure the quality of the generated image, render_deepdream also decomposes the image into high- and low-frequency components. The decomposition method is simply to shrink the original image, which yields the low-frequency component lo. The function used to scale the image is resize; its parameter hw is a tuple of the form (h, w), giving the height and width of the scaled image.

When generating images, we start from the low-frequency image. A low-frequency image is really just a shrunken image; after a certain number of iterations, it is enlarged and the corresponding high-frequency component is added back. The gradient is again computed with the calc_grad_tiled method.

Figure: on the left, the original test.jpg; on the right, the generated Deep Dream image

Using the code below, you can generate the well-known Deep Dream image full of animal features; here the optimization target is the overall output of the mixed4c layer.

name = "mixed4c" layer_optput = graph.get_tensor_by_name("import/%s:0" % name) render_deepdream(tf.square(layer_optput), img0)

You can try different background images, different channels, and different output layers to obtain a variety of generated images.

Reference resources

21 Projects Playing Deep Learning: Practice Details Based on TensorFlow

