There are two implementations of batch normalization in TensorFlow: one is tf.layers.batch_normalization, the other is tf.nn.batch_normalization. Here, I use the second one.
I will not go over the principle of batch normalization here.
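For reference, the underlying formula is just: normalize each feature over the batch, then scale by gamma and shift by beta. Here is a minimal NumPy sketch of that formula (an illustration, not the TensorFlow code from this post):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-4):
    # Normalize over the batch axis, then scale and shift per feature:
    # y = gamma * (x - mean) / sqrt(var + eps) + beta
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.randn(32, 4) * 3.0 + 5.0          # batch of 32, 4 features
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0))  # close to 0 per feature
print(y.std(axis=0))   # close to 1 per feature
```

With gamma=1 and beta=0 the output has (approximately) zero mean and unit variance per feature; learnable gamma and beta then restore representational capacity.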
The code is as follows:
```python
def BN(input, isTraining=False, name='BatchNorm', moving_decay=0.9, eps=1e-4):
    # print('BN isTraining: ', isTraining)
    _in = input
    shape = input.get_shape().as_list()
    assert len(shape) in [2, 4]
    with tf.variable_scope(name):
        gamma = tf.Variable(tf.constant(1.0, dtype=tf.float32, shape=[shape[-1]]), name='gamma')
        beta = tf.Variable(tf.constant(0.0, dtype=tf.float32, shape=[shape[-1]]), name='beta')
        # Reduce over all axes except the channel axis.
        axes = list(range(len(shape) - 1))
        batch_mean, batch_var = tf.nn.moments(input, axes=axes)
        ema = tf.train.ExponentialMovingAverage(decay=moving_decay)

        def mean_var_with_update():
            ema_apply_op = ema.apply([batch_mean, batch_var])
            with tf.control_dependencies([ema_apply_op]):
                return tf.identity(batch_mean), tf.identity(batch_var)

        # Training: use the batch statistics (and update the moving averages);
        # inference: use the accumulated moving averages.
        mean, var = tf.cond(tf.equal(isTraining, True),
                            mean_var_with_update,
                            lambda: (ema.average(batch_mean), ema.average(batch_var)))
        # mean, var = mean_var_with_update()
        return tf.nn.batch_normalization(input, mean, var, beta, gamma, eps)
```
Note that an exponential moving average is used here:
```python
ema = tf.train.ExponentialMovingAverage(decay=moving_decay)

def mean_var_with_update():
    ema_apply_op = ema.apply([batch_mean, batch_var])
    with tf.control_dependencies([ema_apply_op]):
        return tf.identity(batch_mean), tf.identity(batch_var)
```
```python
# tf.control_dependencies(control_inputs) returns a context manager that specifies
# control dependencies: the `with` keyword makes all operations created in this
# context execute only after the control inputs have executed.
# For example:
# with tf.control_dependencies([a, b]):
#     c = ...  # c and d will be executed only after a and b are executed
#     d = ...
```
In the code above, the return is therefore only evaluated after ema_apply_op has run. Why tf.identity? Because tf.control_dependencies only applies to ops that are *created inside* its context. batch_mean and batch_var already exist outside the context, so returning them directly would bypass the dependency; wrapping them in tf.identity creates new ops inside the context, which makes the control dependency take effect.
Every time these ops are evaluated, ema.apply updates the shadow variables once, which is how the moving averages of the mean and variance are accumulated.
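The core update rule behind tf.train.ExponentialMovingAverage is simple, and a plain-Python sketch shows how the shadow value converges toward the incoming values (an illustration of the update formula, not TensorFlow's actual implementation, which also handles variable initialization):

```python
def ema_update(shadow, value, decay=0.9):
    # The per-step update rule used by an exponential moving average:
    # shadow = decay * shadow + (1 - decay) * value
    return decay * shadow + (1 - decay) * value

shadow = 0.0
for value in [1.0, 1.0, 1.0, 1.0]:
    shadow = ema_update(shadow, value)
    print(shadow)  # 0.1, 0.19, 0.271, 0.3439 — drifting toward 1.0
```

A decay close to 1 (here moving_decay=0.9) makes the average change slowly, so the inference-time statistics are a smoothed estimate over many training batches rather than the statistics of any single batch.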
At first, I was confused when I read this code. Later, I read the post "TensorFlow to realize Batch Normalization", which finally made it clear.