Having covered this much GAN theory, we now understand GANs better, but the theory will not really sink in until it is put into practice. So next we will implement a simple GAN with TensorFlow. (This article uses TensorFlow 1.x syntax.)
We will build the simplest possible GAN and train it to generate handwritten-digit images that look like real ones. Let's get straight to the code.
(1) Import third-party libraries.
import tensorflow as tf
import numpy as np
import pickle
import matplotlib.pyplot as plt
We use TensorFlow to implement the GAN's network architecture and to train it; NumPy to generate the random noise fed into the generator; pickle to persist variables; and matplotlib to visualize both the loss curves of the two networks during training and the images generated by the GAN.
(2) To train the GAN to generate images like those in the MNIST handwritten-digit dataset, we need to read the real images from MNIST as the real data for training the discriminator D. TensorFlow provides a helper for MNIST that can be used to read the data.
from tensorflow.examples.tutorials.mnist import input_data

# Read the MNIST data
mnist = input_data.read_data_sets('./data/MNIST_data')
img = mnist.train.images[500]
# Display as a grayscale image
plt.imshow(img.reshape((28, 28)), cmap='Greys_r')
plt.show()
After reading, each MNIST image is represented as a one-dimensional array.

print(type(img))
print(img.shape)

The output is as follows.

<class 'numpy.ndarray'>
(784,)
PS: Since TensorFlow 1.9, the input_data.read_data_sets method no longer downloads MNIST automatically. If the dataset is not available locally, an error is raised, so we must download it in advance.
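As a quick sanity check before calling read_data_sets, you can verify that the archives are in place. This is only a minimal sketch: the ./data/MNIST_data directory and the four standard MNIST archive names are assumptions based on the usual MNIST distribution.

import os

# Hypothetical check for the four standard MNIST archives expected by read_data_sets
expected = ['train-images-idx3-ubyte.gz', 'train-labels-idx1-ubyte.gz',
            't10k-images-idx3-ubyte.gz', 't10k-labels-idx1-ubyte.gz']
data_dir = './data/MNIST_data'
missing = [f for f in expected if not os.path.exists(os.path.join(data_dir, f))]
if missing:
    print('Please download these files first:', missing)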
Next, define a method for receiving input; we use TensorFlow placeholders to obtain the input data.
def get_inputs(real_size, noise_size):
    real_img = tf.placeholder(tf.float32, [None, real_size], name='real_img')
    noise_img = tf.placeholder(tf.float32, [None, noise_size], name='noise_img')
    return real_img, noise_img
Now we can implement the generator and discriminator. Let's look at the generator first; the code is as follows.
def generator(noise_img, n_units, out_dim, reuse=False, alpha=0.01):
    '''
    Generator
    :param noise_img: noise input to the generator
    :param n_units: number of hidden-layer units
    :param out_dim: size of the generator's output tensor, which should be 28x28=784
    :param reuse: whether to reuse the variable scope
    :param alpha: Leaky ReLU coefficient
    :return:
    '''
    with tf.variable_scope("generator", reuse=reuse):
        # Fully connected hidden layer
        hidden1 = tf.layers.dense(noise_img, n_units)
        # Leaky ReLU: returns the element-wise maximum
        hidden1 = tf.maximum(alpha * hidden1, hidden1)
        hidden1 = tf.layers.dropout(hidden1, rate=0.2, training=True)
        # Dense: fully connected output layer
        logits = tf.layers.dense(hidden1, out_dim)
        outputs = tf.tanh(logits)
        return logits, outputs
As you can see, the generator's structure is very simple: a neural network with a single hidden layer, that is, input layer → hidden layer → output layer. For now we are writing the simplest possible GAN; in the more advanced content later, the generator and discriminator structures will become more complex.
To briefly explain the code above: first, tf.variable_scope creates a scope named generator. Its main purpose is to allow variables to be reused within this scope and to make it easy to distinguish the components of different layers.
Then the dense method under tf.layers is used to fully connect the input layer to the hidden layer. The tf.layers module provides many high-level methods that make it easier to build the corresponding network structure; here, dense implements the fully connected layer.
We choose Leaky ReLU as the activation function of the hidden layer and use tf.maximum to return the larger of the two values, which implements Leaky ReLU.
Next, the dropout method of tf.layers is used: during training it randomly drops units in the network with a given probability (that is, their outputs are set to 0) to prevent overfitting. Dropout should only be applied during training, not at test time. Finally, the dense method fully connects the hidden layer to the output layer, with Tanh as the output activation (in practice Tanh works better as the generator's output activation). The output range of Tanh is -1 to 1, which means the generated images have pixel values in -1 to 1, whereas the real images in the MNIST dataset have pixel values in 0 to 1. Therefore, during training the pixel range of the real images must be adjusted to match that of the generated images.
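For reference, the adjustment used later in the training loop is just a linear map from [0, 1] to [-1, 1]; here is a minimal sketch (the variable names are only illustrative):

# Map real-image pixels from [0, 1] to [-1, 1] so they match the Tanh output range
# 0 -> -1, 0.5 -> 0, 1 -> 1
real_batch_scaled = real_batch * 2 - 1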
The Leaky ReLU function is a variant of ReLU. The difference is that ReLU sets all negative values to zero, while Leaky ReLU multiplies negative values by a small slope instead.
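As a small illustration (a sketch only; the helper function and test values below are just for demonstration), with alpha = 0.01:

import numpy as np

def leaky_relu(x, alpha=0.01):
    # Equivalent to the tf.maximum(alpha * x, x) used in the generator above
    return np.maximum(alpha * x, x)

print(leaky_relu(np.array([-5.0, 0.0, 3.0])))  # [-0.05  0.    3.  ]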
Then look at the code of the discriminator.
def discriminator(img, n_units, reuse=False, alpha=0.01):
    '''
    Discriminator
    :param img: input image (real image or generated image)
    :param n_units: number of hidden-layer units
    :param reuse: whether to reuse the variable scope
    :param alpha: Leaky ReLU coefficient
    :return:
    '''
    with tf.variable_scope('discriminator', reuse=reuse):
        hidden1 = tf.layers.dense(img, n_units)
        hidden1 = tf.maximum(alpha * hidden1, hidden1)
        logits = tf.layers.dense(hidden1, 1)
        outputs = tf.sigmoid(logits)
        return logits, outputs
The discriminator's implementation is not much different from the generator's. The slight differences are that the discriminator's output layer has only one unit and uses sigmoid as its activation function, whose output ranges from 0 to 1.
With the generator and discriminator written, we can now build the computation graph itself. First, do some initialization work, such as defining the required variables and resetting the default graph.
img_size = mnist.train.images[0].shape[0]  # Size of a real image (784)
noise_size = 100        # Noise size, the generator's initial input
g_units = 128           # Number of generator hidden-layer units
d_units = 128           # Number of discriminator hidden-layer units
alpha = 0.01            # Leaky ReLU coefficient
learning_rate = 0.001   # Learning rate
smooth = 0.1            # Label smoothing

# Reset the default computation graph
tf.reset_default_graph()
Then the get_inputs method is used to obtain the real-image input and noise input, which are fed into the generator and discriminator for training. Of course, at this point we are only building the training structure of the whole GAN network.
real_img, noise_img = get_inputs(img_size, noise_size)

# Generator
g_logits, g_outputs = generator(noise_img, g_units, img_size)

# Discriminator: score the real images
d_logits_real, d_outputs_real = discriminator(real_img, d_units)
# Pass in the generated images and score them (reusing the same variables)
d_logits_fake, d_outputs_fake = discriminator(g_outputs, d_units, reuse=True)
The code above passes the noise, the number of generator hidden-layer units, and the real-image size into the generator. The real-image size is passed in because the generator must produce images of the same size as the real ones.
For the discriminator, we first pass in the real images and the number of discriminator hidden-layer units to score the real images, and then score the generated images using the same parameters (reuse=True).
With the training structure built, we define the losses of the generator and discriminator. Recall the earlier description of the losses. The discriminator's loss has two parts: the difference between the score it gives real images and the expected score, and the difference between the score it gives generated images and the expected score. Here the highest score is defined as 1 and the lowest as 0; that is, the discriminator should give real images a score of 1 and generated images a score of 0. The generator's loss is essentially the difference between the probability distributions of the generated images and the real images; here it is converted into the difference between the score the generator wants the discriminator to give its generated images and the score the discriminator actually gives them.
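Written out as formulas, this is roughly what the code below computes, where $D(\cdot)$ is the discriminator's sigmoid output, $G(z)$ is a generated image, the expectation is a mean over the batch, and $\mathrm{smooth}$ is the label-smoothing factor:

$$
L_D = -(1-\mathrm{smooth})\,\mathbb{E}_x\big[\log D(x)\big] \;-\; \mathbb{E}_z\big[\log\big(1 - D(G(z))\big)\big]
$$
$$
L_G = -(1-\mathrm{smooth})\,\mathbb{E}_z\big[\log D(G(z))\big]
$$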
d_loss_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=d_logits_real, labels=tf.ones_like(d_logits_real)) * (1 - smooth))
d_loss_fake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=d_logits_fake, labels=tf.zeros_like(d_logits_fake)))
# Total discriminator loss
d_loss = tf.add(d_loss_real, d_loss_fake)

g_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=d_logits_fake, labels=tf.ones_like(d_logits_fake)) * (1 - smooth))
The loss is computed with tf.nn.sigmoid_cross_entropy_with_logits, which first applies the sigmoid function to the logits passed in and then computes their cross-entropy loss. The method also uses a numerically stable formulation of the cross-entropy so the result does not overflow. As the name suggests, you can tell what it does directly from the method name.
Once the losses are defined, the next step is to minimize them.
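Conceptually, for a logit x and a label z, the function computes the numerically stable form below. This is only a minimal NumPy sketch for illustration, not the library's actual implementation:

import numpy as np

def sigmoid_cross_entropy(x, z):
    # Equivalent to -z*log(sigmoid(x)) - (1-z)*log(1 - sigmoid(x)),
    # rewritten so that large |x| does not overflow exp()
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))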
# Collect all trainable variables
train_vars = tf.trainable_variables()

# Generator variables
g_vars = [var for var in train_vars if var.name.startswith("generator")]
# Discriminator variables
d_vars = [var for var in train_vars if var.name.startswith("discriminator")]

# Minimize the losses with the Adam optimizer
d_train_opt = tf.train.AdamOptimizer(learning_rate).minimize(d_loss, var_list=d_vars)
g_train_opt = tf.train.AdamOptimizer(learning_rate).minimize(g_loss, var_list=g_vars)
To minimize the losses, we first obtain the parameters of the corresponding network structures, i.e. the variables of the generator and the discriminator; these are the objects that get updated when minimizing the losses. The AdamOptimizer is used here, which implements the Adam algorithm internally; it is based on gradient descent but dynamically adapts the learning rate of each parameter.
At this point the whole computation graph is roughly defined, so next we implement the actual training logic. First, initialize some variables related to training.
batch_size = 64   # Number of images per batch
epochs = 500      # Number of training epochs
n_sample = 25     # Number of samples to draw for inspection

samples = []      # Store generated samples
losses = []       # Store losses

# Save only the generator's variables
saver = tf.train.Saver(var_list=g_vars)
Now write the training code itself.
with tf.Session() as sess:
    # Initialize the model's parameters
    sess.run(tf.global_variables_initializer())
    for e in range(epochs):
        for batch_i in range(mnist.train.num_examples // batch_size):
            batch = mnist.train.next_batch(batch_size)
            # 28 x 28 = 784
            batch_images = batch[0].reshape((batch_size, 784))
            # Rescale the pixels: Tanh outputs values in (-1, 1), and real and
            # generated images share the discriminator's parameters
            batch_images = batch_images * 2 - 1
            # Generate noise input
            batch_noise = np.random.uniform(-1, 1, size=(batch_size, noise_size))
            # Train the discriminator first, then the generator
            _ = sess.run(d_train_opt, feed_dict={real_img: batch_images, noise_img: batch_noise})
            _ = sess.run(g_train_opt, feed_dict={noise_img: batch_noise})

        # After each epoch, compute the losses
        train_loss_d = sess.run(d_loss, feed_dict={real_img: batch_images, noise_img: batch_noise})
        # Discriminator loss on real pictures
        train_loss_d_real = sess.run(d_loss_real, feed_dict={real_img: batch_images, noise_img: batch_noise})
        # Discriminator loss on generated pictures
        train_loss_d_fake = sess.run(d_loss_fake, feed_dict={real_img: batch_images, noise_img: batch_noise})
        # Generator loss
        train_loss_g = sess.run(g_loss, feed_dict={noise_img: batch_noise})

        print("Number of training rounds {}/{}...".format(e + 1, epochs),
              "Total loss of discriminator: {:.4f} (real picture loss: {:.4f} + fake picture loss: {:.4f})...".format(
                  train_loss_d, train_loss_d_real, train_loss_d_fake),
              "Generator loss: {:.4f}".format(train_loss_g))

        # Record the loss values
        losses.append((train_loss_d, train_loss_d_real, train_loss_d_fake, train_loss_g))

        # Draw samples from the generator for later inspection
        sample_noise = np.random.uniform(-1, 1, size=(n_sample, noise_size))
        gen_samples = sess.run(generator(noise_img, g_units, img_size, reuse=True),
                               feed_dict={noise_img: sample_noise})
        samples.append(gen_samples)

        # Store a checkpoint of the generator
        saver.save(sess, './data/generator.ckpt')

# Persist the generated samples for later visualization
with open('./data/train_samples.pkl', 'wb') as f:
    pickle.dump(samples, f)
At the start, a Session object is created, and then a double for loop carries out the GAN training. The outer loop controls how many epochs to train; the inner loop iterates over the batches within each epoch, because training on all the real images at once would be inefficient. The usual approach is to split them into groups and train over many iterations; here each group contains 64 images.
Then a batch of real data is read in. Because the generator uses Tanh as its output activation, the generated images' pixel values lie in -1 to 1, so the real images' pixel values are simply rescaled from 0–1 to -1–1. NumPy's uniform method is then used to generate random noise between -1 and 1. With the real data and noise prepared, they can be fed to the generator and discriminator, and the data flows through the computation graph we designed earlier. Note that the discriminator is trained first, then the generator.
After all the real images in an epoch have been used, the generator and discriminator losses for that epoch are computed and recorded so that the loss curves during GAN training can be visualized later. To get an intuitive feel for how the generator changes during training, a batch of images is generated with the current generator after each epoch and saved. With the training logic written, you can run the training code, which prints output like the following.
Number of training rounds 1/500... Total loss of discriminator: 0.0190 (real picture loss: 0.0017 + fake picture loss: 0.0173)... Generator loss: 4.1502
Number of training rounds 2/500... Total loss of discriminator: 1.0480 (real picture loss: 0.3772 + fake picture loss: 0.6708)... Generator loss: 3.1548
Number of training rounds 3/500... Total loss of discriminator: 0.5315 (real picture loss: 0.3580 + fake picture loss: 0.1736)... Generator loss: 2.8828
Number of training rounds 4/500... Total loss of discriminator: 2.9703 (real picture loss: 1.5434 + fake picture loss: 1.4268)... Generator loss: 0.7844
Number of training rounds 5/500... Total loss of discriminator: 1.0076 (real picture loss: 0.5763 + fake picture loss: 0.4314)... Generator loss: 1.8176
Number of training rounds 6/500... Total loss of discriminator: 0.7265 (real picture loss: 0.4558 + fake picture loss: 0.2707)... Generator loss: 2.9691
Number of training rounds 7/500... Total loss of discriminator: 1.5635 (real picture loss: 0.8336 + fake picture loss: 0.7299)... Generator loss: 2.1342
The whole training process takes roughly 30 to 40 minutes.
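Once training finishes, the recorded losses and the samples saved in train_samples.pkl can be visualized. The following is a minimal sketch building on the losses list and pickle file above (it assumes the same script/session, and the figure layout is just one possible choice):

import pickle
import numpy as np
import matplotlib.pyplot as plt

# Plot the recorded loss curves
losses_arr = np.array(losses)
plt.plot(losses_arr.T[0], label='Discriminator total loss')
plt.plot(losses_arr.T[3], label='Generator loss')
plt.legend()
plt.show()

# Show the 25 samples generated after the last epoch
with open('./data/train_samples.pkl', 'rb') as f:
    samples = pickle.load(f)
_, last_images = samples[-1]   # generator() returned (logits, outputs)
fig, axes = plt.subplots(5, 5, figsize=(6, 6))
for img, ax in zip(last_images, axes.flatten()):
    ax.imshow(img.reshape((28, 28)), cmap='Greys_r')
    ax.axis('off')
plt.show()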