# Auto Encoders (AE) and Variational Auto Encoders (VAE) in TensorFlow

## 1. Auto Encoder (AE)

1. Basic Auto Encoder
• Auto Encoders are an application of unsupervised learning
• Supervised learning (training data with labels) is generally used for classification and prediction tasks
• Unsupervised learning (training data without labels) is generally used for clustering and data-generation tasks
• Basic structure of the auto encoder: the encoder can reduce or raise the input dimension (generally it reduces it). The middle layer with the lowest dimension is called the neck (bottleneck) layer, and its output can be regarded as a clustering result
2. Dropout AutoEncoders
• Dropout means that some weights (connections) are randomly dropped during training, so they do not take part in forward propagation or the weight update for that step; at test time all weights are kept
3. Adversarial AutoEncoders
• Basic idea of adversarial AutoEncoders: a discriminator is added to the original AutoEncoder to judge whether the encoder's output distribution matches the original (or a specified) distribution; a mismatch is judged as wrong
• Basic structure of adversarial autoencoders:
• Loss function of adversarial autoencoders:
$$l_i(\theta,\phi) = -\sum_{z \sim q_\theta(z\mid x_i)}\left[\log p_\phi(x_i\mid z)\right] + KL\left(q_\theta(z\mid x_i)\,\|\,p(z)\right)$$
The second term is the KL divergence, which measures the degree of overlap (similarity) between two distributions $p$ and $q$; the more they overlap, the smaller its value:
$$KL(p\,\|\,q) = \int_{-\infty}^{+\infty} p(x)\log\frac{p(x)}{q(x)}\,dx$$
The first term says that the encoder's output, after decoding, should approximate the original data as closely as possible; the second term says that the encoder's output distribution should approximate the original (or specified) distribution as closely as possible
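As a sanity check on the KL term, the closed form of $KL(N(\mu,\sigma^2)\,\|\,N(0,1))$ can be compared against a direct numerical integration of the definition above. This is a minimal numpy sketch; the function names are illustrative:

```python
import numpy as np

def kl_gaussian_std_normal(mu, sigma):
    """Closed form KL(N(mu, sigma^2) || N(0, 1))."""
    return 0.5 * (mu**2 + sigma**2 - 1.0 - np.log(sigma**2))

def kl_numerical(mu, sigma, lo=-20.0, hi=20.0, n=400001):
    """Riemann-sum approximation of the integral p(x) * log(p(x)/q(x))."""
    x = np.linspace(lo, hi, n)
    log_p = -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
    log_q = -0.5 * x**2 - np.log(np.sqrt(2 * np.pi))
    return np.sum(np.exp(log_p) * (log_p - log_q)) * (x[1] - x[0])

print(kl_gaussian_std_normal(1.0, 2.0))  # 0.5 * (1 + 4 - 1 - ln 4) ≈ 1.3069
print(kl_numerical(1.0, 2.0))            # should agree closely
```

Note that the KL is zero exactly when $\mu = 0$ and $\sigma = 1$, i.e. when the two distributions coincide.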

## 2. Variational Auto Encoder (VAE)

1. Problem with the Auto Encoder as a generator: the encoder would have to output a distribution, and sampling from a distribution is not differentiable, so gradients cannot be propagated through the sampling step
2. VAE's solution (the reparameterization trick):
• The VAE encoder outputs the parameters of a distribution $z \sim N(\mu,\sigma^2)$
• Instead of sampling $z$ directly, it is computed as $z = \mu + \sigma \odot \epsilon$ with $\epsilon \sim N(0,1)$, so the randomness is isolated in $\epsilon$ and gradients can flow through $\mu$ and $\sigma$
3. Basic structure of VAE: the neck layer (the lower-dimensional layer together with the normal-distribution parameter layers is collectively called the neck layer) is wired differently internally than in a plain AE
4. Because the VAE encoder learns a distribution, it can be sampled repeatedly (the lower-dimensional layer samples from the mean and variance layers) to generate different data, rather than producing only a one-to-one reconstruction of the original input like an AE
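The reparameterization trick can be checked empirically: samples built as $z = \mu + \sigma \odot \epsilon$ have the target mean and standard deviation, while the stochasticity lives entirely in $\epsilon$. A minimal numpy sketch (the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 3.0, 0.5

# Draw eps ~ N(0, 1), then shift and scale deterministically.
eps = rng.standard_normal(1_000_000)
z = mu + sigma * eps  # z ~ N(mu, sigma^2), differentiable w.r.t. mu and sigma

print(z.mean())  # close to 3.0
print(z.std())   # close to 0.5
```

In the network, `mu` and `sigma` are outputs of trainable layers, so the same shift-and-scale makes the sampling step transparent to backpropagation.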

## 3. AE in practice

1. Task: use an Auto Encoder to reconstruct images from the fashion_mnist data set
2. Code:
```python
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras import Sequential, layers
from PIL import Image
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

# Tile 100 (28x28) images into one 280x280 image (10 rows x 10 columns)
def save_images(imgs, name):
    new_im = Image.new('L', (280, 280))
    index = 0
    for i in range(0, 280, 28):
        for j in range(0, 280, 28):
            im = Image.fromarray(imgs[index], mode='L')
            new_im.paste(im, (i, j))
            index += 1
    new_im.save(name)

h_dim = 20       # dimension of the neck (bottleneck) layer
batch_size = 256
lr = 1e-3

(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
x_train, x_test = x_train.astype(np.float32) / 255., x_test.astype(np.float32) / 255.
train_db = tf.data.Dataset.from_tensor_slices(x_train)
train_db = train_db.shuffle(batch_size * 5).batch(batch_size)
test_db = tf.data.Dataset.from_tensor_slices(x_test)
test_db = test_db.batch(batch_size)

class AE(keras.Model):
    def __init__(self):
        super(AE, self).__init__()
        self.encoder = Sequential([
            layers.Dense(256, activation=tf.nn.relu),
            layers.Dense(128, activation=tf.nn.relu),
            layers.Dense(h_dim)
        ])
        self.decoder = Sequential([
            layers.Dense(128, activation=tf.nn.relu),
            layers.Dense(256, activation=tf.nn.relu),
            layers.Dense(28 * 28)
        ])

    def call(self, inputs, training=None):
        h = self.encoder(inputs)
        x_hat = self.decoder(h)
        return x_hat

model = AE()
model.build(input_shape=(None, 28 * 28))
model.summary()
optimizer = tf.optimizers.Adam(learning_rate=lr)

os.makedirs('ae_original_picture', exist_ok=True)
os.makedirs('ae_generate_picture', exist_ok=True)

for epoch in range(60):
    for step, x in enumerate(train_db):
        x = tf.reshape(x, [-1, 784])
        with tf.GradientTape() as tape:
            x_output = model(x)
            # Per-pixel difference as the loss
            loss = tf.losses.binary_crossentropy(x, x_output, from_logits=True)
            loss = tf.reduce_mean(loss)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        if step % 100 == 0:
            print(epoch, step, float(loss))

    # Reconstruct one test batch, then save originals and reconstructions
    x = next(iter(test_db))
    output = model(tf.reshape(x, [-1, 784]))
    x_hat = tf.sigmoid(output)
    x_hat = tf.reshape(x_hat, [-1, 28, 28])

    x_orig = (x.numpy() * 255.).astype(np.uint8)
    save_images(x_orig, 'ae_original_picture/epoch_%d.png' % epoch)

    x_gen = (x_hat.numpy() * 255.).astype(np.uint8)
    save_images(x_gen, 'ae_generate_picture/epoch_%d.png' % epoch)
```
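The loss used in the training loop above (binary cross-entropy with `from_logits=True`) is evaluated by TensorFlow in a numerically stable form that avoids overflowing `exp`. A minimal numpy sketch of the per-pixel computation, with an illustrative function name, compared against the naive formula on moderate logits:

```python
import numpy as np

def sigmoid_ce_with_logits(labels, logits):
    """Stable form of -labels*log(sigmoid(x)) - (1-labels)*log(1-sigmoid(x))."""
    return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))

x = np.array([-2.0, 0.0, 3.0])  # logits (pre-sigmoid outputs)
y = np.array([0.0, 1.0, 1.0])   # targets (pixel values)

sig = 1 / (1 + np.exp(-x))
naive = -(y * np.log(sig) + (1 - y) * np.log(1 - sig))

print(sigmoid_ce_with_logits(y, x))
print(naive)  # same values
```

This is why the model's decoder outputs raw logits and `tf.sigmoid` is only applied afterwards when generating images.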



## 4. VAE in practice

1. Task: use a Variational Auto Encoder to generate fashion_mnist-style images
2. Code:
```python
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from PIL import Image
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

# Tile 100 (28x28) images into one 280x280 image (10 rows x 10 columns)
def save_images(imgs, name):
    new_im = Image.new('L', (280, 280))
    index = 0
    for i in range(0, 280, 28):
        for j in range(0, 280, 28):
            im = Image.fromarray(imgs[index], mode='L')
            new_im.paste(im, (i, j))
            index += 1
    new_im.save(name)

z_dim = 10       # dimension of the mean layer and of the log-variance layer
batch_size = 256
lr = 1e-3

(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
x_train, x_test = x_train.astype(np.float32) / 255., x_test.astype(np.float32) / 255.
train_db = tf.data.Dataset.from_tensor_slices(x_train)
train_db = train_db.shuffle(batch_size * 5).batch(batch_size)
test_db = tf.data.Dataset.from_tensor_slices(x_test)
test_db = test_db.batch(batch_size)

class VAE(keras.Model):
    def __init__(self):
        super(VAE, self).__init__()
        self.fc1 = layers.Dense(128)    # shared layer before the mean/variance heads
        self.fc2 = layers.Dense(z_dim)  # mean head
        self.fc3 = layers.Dense(z_dim)  # log-variance head
        self.fc4 = layers.Dense(128)
        self.fc5 = layers.Dense(784)

    def encoder(self, x):
        h = tf.nn.relu(self.fc1(x))
        mu = self.fc2(h)  # mean
        # Predict log(sigma^2) rather than sigma^2, so the head can output any
        # real value while the implied variance stays positive
        log_var = self.fc3(h)
        return mu, log_var

    def decoder(self, z):
        out = tf.nn.relu(self.fc4(z))
        out = self.fc5(out)
        return out

    def reparameterize(self, mu, log_var):
        # eps ~ N(0, 1) with the same shape as the encoder output
        eps = tf.random.normal(log_var.shape)
        # The encoder outputs log-variance, so exponentiate and take the
        # square root to recover the standard deviation
        std = tf.exp(0.5 * log_var)
        z = mu + std * eps
        return z

    def call(self, inputs, training=None):
        mu, log_var = self.encoder(inputs)
        z = self.reparameterize(mu, log_var)
        x_hat = self.decoder(z)
        return x_hat, mu, log_var

model = VAE()
model.build(input_shape=(None, 28 * 28))
model.summary()
optimizer = tf.optimizers.Adam(learning_rate=lr)

os.makedirs('vae_mapping_picture', exist_ok=True)
os.makedirs('vae_sample_picture', exist_ok=True)

for epoch in range(100):
    for step, x in enumerate(train_db):
        x = tf.reshape(x, [-1, 784])
        with tf.GradientTape() as tape:
            x_output, mu, log_var = model(x)
            # Per-pixel reconstruction loss; this loss function works better here
            loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=x, logits=x_output)
            loss0 = tf.reduce_sum(loss) / x.shape[0]
            # KL(N(mu, sigma^2) || N(0, 1)) in closed form
            kl_div = -0.5 * (log_var + 1 - mu**2 - tf.exp(log_var))
            kl_div = tf.reduce_sum(kl_div) / x.shape[0]
            loss = loss0 + 1. * kl_div
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        if step % 100 == 0:
            print(epoch, step, 'kl_div:', float(kl_div), 'loss0:', float(loss0), 'loss:', float(loss))

    # Image generation via one-to-one mapping, like the AE: encode then decode
    x = next(iter(test_db))
    output, _, _ = model(tf.reshape(x, [-1, 784]))
    x_hat = tf.sigmoid(output)
    x_hat = (tf.reshape(x_hat, [-1, 28, 28]).numpy() * 255.).astype(np.uint8)
    save_images(x_hat, 'vae_mapping_picture/epoch_%d.png' % epoch)

    # Image generation from the prior: sample z ~ N(0, 1) and decode
    z = tf.random.normal((batch_size, z_dim))
    output0 = model.decoder(z)
    x_hat0 = tf.sigmoid(output0)
    x_hat0 = (tf.reshape(x_hat0, [-1, 28, 28]).numpy() * 255.).astype(np.uint8)
    save_images(x_hat0, 'vae_sample_picture/epoch_%d.png' % epoch)
```



Posted on Tue, 28 Sep 2021 07:09:52 -0400 by efficacious