1. Import libraries
import numpy as np
import tensorflow as tf
from tensorflow import keras as K
2. Progressive, smooth growing of high-resolution layers (the first innovation)
In technical terms, training progresses from a few low-resolution convolutional layers to many high-resolution ones: the early layers are trained first, and higher-resolution layers are introduced afterwards. However, even adding one layer at a time can severely disrupt training. What PGGAN does is grow these layers in smoothly, giving the system time to adapt to the higher resolution.
However, instead of jumping to the new resolution immediately, the high-resolution layer is faded in smoothly through a parameter α (alpha) that ramps from 0 to 1.
As shown in the figure, after the 16×16 resolution has been trained for enough iterations, another transposed convolution is introduced into the generator G and another convolution into the discriminator D to produce the 32×32 layers. The output then follows two paths that are blended together: the old output layer upscaled by nearest-neighbor interpolation and weighted by (1-α), plus the output of the new transposed convolution weighted by α.
def upscale_layer(layer, upscale_factor):
    '''
    Upscales a layer (tensor) by the factor (int), where
    the tensor has shape [group, height, width, channels]
    '''
    height, width = layer.get_shape()[1:3]
    size = (upscale_factor * height, upscale_factor * width)
    # Nearest-neighbor interpolation (resize_nearest_neighbor in TF 1.x)
    upscaled_layer = tf.image.resize(layer, size, method='nearest')
    return upscaled_layer

def smoothly_merge_last_layer(list_of_layers, alpha):
    '''
    Smoothly merges in a layer based on the threshold alpha.
    This function assumes that all layers are already in RGB.
    This is the function for the generator.
    :list_of_layers : items should be tensors ordered by size
    :alpha          : float in (0, 1)
    '''
    # Hint!
    # If you are using pure TensorFlow rather than Keras, remember scoping
    last_fully_trained_layer = list_of_layers[-2]
    # Now we have the layer that has already been trained
    last_layer_upscaled = upscale_layer(last_fully_trained_layer, 2)
    # The new layer that has not been fully trained yet
    larger_native_layer = list_of_layers[-1]
    # This makes sure the merge code can run
    assert larger_native_layer.get_shape() == last_layer_upscaled.get_shape()
    # This code block should take advantage of broadcasting
    new_layer = (1 - alpha) * last_layer_upscaled + alpha * larger_native_layer
    return new_layer
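To see how the fade-in behaves, here is a minimal sketch with made-up shapes and a hypothetical linear alpha schedule (neither comes from the original implementation): as alpha ramps from 0 to 1 during the transition phase, the output shifts from the upscaled 16×16 layer to the new native 32×32 layer.

# Blend the trained 16x16 RGB output (index -2) with the
# freshly added native 32x32 RGB output (index -1)
old_rgb = tf.random.normal([8, 16, 16, 3])   # already-trained stage
new_rgb = tf.random.normal([8, 32, 32, 3])   # newly introduced stage

# Hypothetical linear schedule over the transition phase
for step in range(5):
    alpha = step / 4.0
    merged = smoothly_merge_last_layer([old_rgb, new_rgb], alpha)
    print(alpha, merged.shape)               # always (8, 32, 32, 3)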
3. Minibatch standard deviation (the second innovation)
Before looking at the minibatch standard deviation, we need to understand mode collapse: in mode collapse, some modes or classes are poorly represented in the generated samples; for example, a GAN trained on the MNIST dataset might never generate the digit 8.
Essentially, one extra scalar statistic is computed for the discriminator: the standard deviation of the features across the minibatch, averaged over all feature maps and pixels. It is computed both for batches produced by the generator and for batches of real data, so the discriminator can spot generated batches that show too little variety.
def minibatch_std_layer(layer, group_size=4):
    '''
    Computes the minibatch standard deviation of a layer.
    Will perform the operation with Keras under a pre-specified tf scope.
    Assumes the layer is of dtype float32; otherwise validation/casting
    is required.
    Note: there is a more efficient way to do this in Keras, but for
    clarity and consistency with the major implementations (for
    understanding) this version is more explicit. Try it as an exercise.
    '''
    # Hint!
    # If you are using pure TensorFlow rather than Keras, remember scoping
    # The minibatch must be divisible by (or <=) group_size
    group_size = K.backend.minimum(group_size, tf.shape(layer)[0])
    # Grab some shape information so that we can use it
    # as shorthand and to ensure defaults
    shape = list(K.backend.int_shape(layer))
    shape[0] = tf.shape(layer)[0]
    # Reshape so that we can operate at the minibatch level.
    # Following the major implementations, we assume the layer is
    # channels-first:
    # [group (G), minibatch (M), channels (C), height (H), width (W)]
    minibatch = K.backend.reshape(layer,
        (group_size, -1, shape[1], shape[2], shape[3]))
    # Center the groups [M, C, H, W]
    minibatch -= tf.reduce_mean(minibatch, axis=0, keepdims=True)
    # Compute the variance over the group [M, C, H, W]
    minibatch = tf.reduce_mean(K.backend.square(minibatch), axis=0)
    # Compute the standard deviation of the group [M, C, H, W]
    minibatch = K.backend.sqrt(minibatch + 1e-8)
    # Average over the feature maps and pixels [M, 1, 1, 1]
    minibatch = tf.reduce_mean(minibatch, axis=[1, 2, 3], keepdims=True)
    # Tile it back over the group and the pixels
    minibatch = K.backend.tile(minibatch,
        [group_size, 1, shape[2], shape[3]])
    # Append as a new feature map
    return K.backend.concatenate([layer, minibatch], axis=1)
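As a quick sanity check (a sketch with made-up numbers, assuming the channels-first layout used above), the layer should append exactly one extra feature map:

# 8 samples, 16 feature maps, 4x4 pixels, channels-first [N, C, H, W]
activations = tf.random.normal([8, 16, 4, 4])
with_std = minibatch_std_layer(activations, group_size=4)
print(with_std.shape)   # (8, 17, 4, 4): one extra map of minibatch stddev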
4. Equalized learning rate (the third innovation)
This is something of a dark art that is hard to explain precisely: each layer's weights are scaled by a constant derived from He's initializer, so that the dynamic range, and hence the effective learning speed, is similar for every layer.
def equalize_learning_rate(shape, gain, fan_in=None):
    '''
    Adjusts the weights of every layer by a per-layer constant so that
    we account for the variance in the dynamic range of different
    features.
    shape  : shape of the tensor (layer): these are the dimensions of
        each layer. For example, [4, 4, 48, 3]; in this case
        [kernel_size, kernel_size, number_of_filters, feature_maps],
        though this will depend slightly on your implementation.
    gain   : typically sqrt(2)
    fan_in : number of incoming connections, as per Xavier's / He's
        initialization
    '''
    # The default is the product of all shape dimensions minus the
    # feature-map dimension -- this gives the number of incoming
    # connections per neuron
    if fan_in is None:
        fan_in = np.prod(shape[:-1])
    # He's initialization constant
    std = gain / np.sqrt(fan_in)
    # Create a constant out of the adjustment
    wscale = K.backend.constant(std, name='wscale', dtype=np.float32)
    # Sample the weights and adjust them via broadcasting.
    # (The original snippet used a pseudo-call, K.get_value('layer', ...);
    # sampling a standard normal and scaling it has the stated effect.)
    adjusted_weights = K.backend.random_normal(shape) * wscale
    return adjusted_weights
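A small sanity check (hypothetical shapes): for a [4, 4, 48, 3] tensor the fan-in is 4 * 4 * 48 = 768, so the sampled weights should come out with a standard deviation near gain / sqrt(768):

w = equalize_learning_rate((4, 4, 48, 3), gain=np.sqrt(2))
print(float(tf.math.reduce_std(w)))   # roughly sqrt(2)/sqrt(768) ~= 0.051

Note that in the full technique the scaling constant is applied at every forward pass, not just once at initialization; this snippet only checks the constant itself.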
5. Pixel-wise feature normalization in the generator
The motivation is training stability: one of the early signs of divergence is an explosion in feature magnitudes.
At every spatial location of the image, the feature vector across channels is normalized: b(x,y) = a(x,y) / sqrt((1/N) * Σ_j a_j(x,y)^2 + ε), where N is the number of feature maps and ε is a small constant such as 1e-8.
Pixel-wise feature normalization is used only in the generator, not in the discriminator.
def pixelwise_feat_norm(inputs, **kwargs):
    '''
    Pixel-wise feature normalization as proposed by
    Krizhevsky et al., 2012. Returns the normalized input.
    :inputs : Keras / TF layer
    '''
    normalization_constant = K.backend.sqrt(K.backend.mean(
        inputs**2, axis=-1, keepdims=True) + 1.0e-8)
    return inputs / normalization_constant
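A quick check with made-up activations: after normalization, every pixel's feature vector has roughly unit root-mean-square magnitude across the channel axis, no matter how large the inputs were:

features = tf.random.normal([2, 8, 8, 16]) * 100.0   # deliberately exploded
normed = pixelwise_feat_norm(features)
rms = K.backend.sqrt(K.backend.mean(normed**2, axis=-1))
print(float(tf.reduce_max(tf.abs(rms - 1.0))))       # very close to 0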
6. Run PGGAN
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_hub as hub

# hub.Module is the TF1-style API; this block assumes TF 1.x-style
# graph execution
with tf.Graph().as_default():
    # Import the pre-trained PGGAN from TFHub
    module = hub.Module("https://tfhub.dev/google/progan-128/1")
    # Dimensionality of the latent space sampled at runtime
    latent_dim = 512
    # Change the seed to get different faces
    latent_vector = tf.random.normal([1, latent_dim], seed=1337)
    # Use the module to generate an image from the latent space
    interpolated_images = module(latent_vector)
    # Run a TensorFlow session to get the (1, 128, 128, 3) image
    with tf.compat.v1.Session() as session:
        session.run(tf.compat.v1.global_variables_initializer())
        image_out = session.run(interpolated_images)

plt.imshow(image_out.reshape(128, 128, 3))
plt.show()
The pre-trained model is hosted at https://tfhub.dev/google/progan-128/1; tensorflow_hub downloads it for you so you can call the stored model directly.
Running the code produces a 128×128 face image.
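If you are on TensorFlow 2 with eager execution, where hub.Module is no longer available, a rough equivalent is to load the module and call its default signature. This is a sketch, not the original code: it assumes the module exposes a 'default' signature taking a [batch, 512] latent vector and returning a dictionary whose 'default' entry holds [batch, 128, 128, 3] images.

import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_hub as hub

# Load the TF1-format Hub module in TF2 and grab its default signature
progan = hub.load("https://tfhub.dev/google/progan-128/1").signatures['default']
latent_vector = tf.random.normal([1, 512], seed=1337)
image_out = progan(latent_vector)['default']   # shape (1, 128, 128, 3)

plt.imshow(image_out[0].numpy())
plt.show()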