Auto Encoder (AE and VAE) in tf

1. Auto Encoder (AE)

  1. Basic Auto Encoder
  • The autoencoder is an application of unsupervised learning
  • Supervised learning (training data with labels) is generally used for classification and prediction tasks
  • Unsupervised learning (training data without labels) is generally used for clustering and data-generation tasks
  • Basic structure of the autoencoder: the encoder can reduce or raise the dimensionality of its input (generally it reduces it). The low-dimensional middle layer is called the neck (bottleneck) layer, and its output can be regarded as a clustering result
  1. Dropout AutoEncoders
  • Dropout means that during training some connections (weights) are randomly dropped in the forward and backward pass; no connections are dropped at test time
  1. Adversarial AutoEncoders
  • Basic idea of adversarial autoencoders: a discriminator is added to the original autoencoder to judge whether the output distribution matches the original (or a specified) distribution; if it does not match, it is judged as wrong
  • Basic structure of adversarial autoencoders: (figure omitted)
  • Loss function of adversarial autoencoders: $l_i(\theta,\phi) = -\sum_{z \sim q_\theta (z\mid x_i)}[\log p_{\phi}(x_i\mid z)] + KL(q_\theta(z\mid x_i)\mid\mid p(z))$. The second term is the KL divergence, $KL(p\mid\mid q) = \int_{-\infty}^{+\infty}p(x)\log\frac{p(x)}{q(x)}\,dx$, which measures the degree of overlap (similarity) between p and q: the more the two distributions overlap, the smaller its value. The first half of the loss says that the code produced by the encoder should reconstruct the original data as closely as possible after decoding; the second half says that the encoder's output distribution $q_\theta(z\mid x_i)$ should be as close as possible to the original (or specified) distribution $p(z)$. A small numerical sketch of the KL term is given after this list
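To make the KL term concrete, here is a minimal sketch in TensorFlow, assuming q is a diagonal Gaussian N(mu, sigma^2) and p is the standard normal prior, for which the KL divergence has the closed form also used in the VAE code further below:

import tensorflow as tf

# Closed-form KL divergence between N(mu, sigma^2) and N(0, 1), per latent dimension,
# computed from the mean and the log-variance
def gaussian_kl(mu, log_var):
    # KL(N(mu, sigma^2) || N(0, 1)) = -0.5 * (1 + log_var - mu^2 - exp(log_var))
    return -0.5 * (1.0 + log_var - tf.square(mu) - tf.exp(log_var))

# A distribution identical to the prior has zero KL divergence
mu = tf.zeros((1, 10))
log_var = tf.zeros((1, 10))
print(float(tf.reduce_sum(gaussian_kl(mu, log_var))))  # 0.0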

2. Variational Auto Encoder (VAE)

  1. Problem with the Auto Encoder: if the encoder outputs a distribution, sampling from that distribution is not differentiable, so gradients cannot be back-propagated through the sampling step
  2. VAE
  • Sampled directly, the encoder output would be $z \sim N(\mu,\sigma^2)$, which is not differentiable with respect to the network parameters
  • The VAE encoder instead outputs $z = \mu + \sigma \odot \epsilon$, where $\epsilon \sim N(0,1)$ (the reparameterization trick), so gradients can flow through $\mu$ and $\sigma$; a minimal sketch follows this list
  1. Basic structure of the VAE: the neck layer (in the VAE, the low-dimensional layer and the normal-distribution parameter layers are collectively called the neck layer) is wired differently internally from the plain AE
  2. The encoder output learned by the VAE can be sampled in different ways (the low-dimensional layer draws samples from the mean and variance layers) to generate different kinds of data, rather than reproducing a one-to-one mapping of the original input data as the AE does
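A minimal sketch of the reparameterization trick in TensorFlow, assuming the encoder produces a mean and a log-variance, as in the VAE code further below:

import tensorflow as tf

# Reparameterization trick: draw z = mu + std * eps with eps ~ N(0, 1),
# so the sampling step stays differentiable with respect to mu and log_var
def reparameterize(mu, log_var):
    eps = tf.random.normal(tf.shape(mu))  # standard normal noise
    std = tf.exp(0.5 * log_var)           # convert log-variance to standard deviation
    return mu + std * eps

# Example: sample a batch of 4 latent codes of dimension 10
mu = tf.zeros((4, 10))
log_var = tf.zeros((4, 10))
print(reparameterize(mu, log_var).shape)  # (4, 10)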

3. AE in practice

  1. Task: use an Auto Encoder to reconstruct images from the fashion_mnist dataset
  2. code:
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras import Sequential, layers
from PIL import Image
from matplotlib import pyplot as plt
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
# Create the output directories for the saved images if they do not already exist
os.makedirs('ae_original_picture', exist_ok=True)
os.makedirs('ae_generate_picture', exist_ok=True)


# Helper function that merges pictures into one image
# It merges 100 pictures into one large picture
def save_images(ims, name):
    new_im = Image.new('L', (280, 280))  # Divide 100 pictures into 10 rows and 10 columns to form a large picture
    index = 0
    for i in range(0, 280, 28):
        for j in range(0, 280, 28):
            im = ims[index]
            im = Image.fromarray(im, mode='L')
            new_im.paste(im, (i, j))
            index += 1
    new_im.save(name)


h_dim = 20  # The dimension of neck layer is 20
batch_size = 256
lr = 1e-3
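# Load fashion_mnist, scale pixels to [0, 1], and build a shuffled training pipeline and a plain test pipeline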
(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
x_train, x_test = x_train.astype(np.float32) / 255., x_test.astype(np.float32) / 255.
train_db = tf.data.Dataset.from_tensor_slices(x_train)
train_db = train_db.shuffle(batch_size * 5).batch(batch_size)
test_db = tf.data.Dataset.from_tensor_slices(x_test)
test_db = test_db.batch(batch_size)


class AE(keras.Model):
    def __init__(self):
        super(AE, self).__init__()
        self.encoder = Sequential([
            layers.Dense(256, activation=tf.nn.relu),
            layers.Dense(128, activation=tf.nn.relu),
            layers.Dense(h_dim)
        ])
        self.decoder = Sequential([
            layers.Dense(128, activation=tf.nn.relu),
            layers.Dense(256, activation=tf.nn.relu),
            layers.Dense(28*28)
        ])

    def call(self, inputs, training=None):
        h = self.encoder(inputs)
        x_hat = self.decoder(h)
        return x_hat


model = AE()
model.build(input_shape=(None, 28*28))
model.summary()
optimizer = tf.optimizers.Adam(learning_rate=lr)
for epoch in range(60):
    for step, x in enumerate(train_db):
        x = tf.reshape(x, [-1, 784])
        with tf.GradientTape() as tape:
            x_output = model(x)
            # Per-pixel binary cross-entropy between the input and the reconstruction is used as the loss
            loss = tf.losses.binary_crossentropy(x, x_output, from_logits=True)
            loss = tf.reduce_mean(loss)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        if step % 100 == 0:
            print(epoch, step, float(loss))

    # Image generation: once per epoch, reconstruct one test batch and save the originals and the reconstructions
    x = next(iter(test_db))
    output = model(tf.reshape(x, [-1, 784]))
    x_hat = tf.sigmoid(output)
    x_hat = tf.reshape(x_hat, [-1, 28, 28])
    x_original = x.numpy() * 255.
    x_original = x_original.astype(np.uint8)
    save_images(x_original, 'ae_original_picture/epoch_%d.png' % epoch)
    x_generated = x_hat.numpy() * 255.
    x_generated = x_generated.astype(np.uint8)
    save_images(x_generated, 'ae_generate_picture/epoch_%d.png' % epoch)
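Since the neck-layer output can be regarded as a clustering result (see the Basic Auto Encoder bullet above), here is a minimal sketch of extracting the 20-dimensional latent codes from the trained model; it reuses the model.encoder sub-model defined above, and the variable names are only illustrative:

# Encode one test batch into the 20-dimensional neck-layer representation;
# these codes could then be fed to any clustering algorithm
x_batch = next(iter(test_db))                          # shape (batch, 28, 28)
codes = model.encoder(tf.reshape(x_batch, [-1, 784]))
print(codes.shape)                                     # (batch, h_dim), i.e. (256, 20)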

4. VAE in practice

  1. Task: use a Variational Auto Encoder to generate fashion_mnist images
  2. code:
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras import Sequential, layers
from PIL import Image
from matplotlib import pyplot as plt
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
# Create the output directories for the saved images if they do not already exist
os.makedirs('vae_mapping_picture', exist_ok=True)
os.makedirs('vae_sample_picture', exist_ok=True)


# Helper function that merges pictures into one image
# It merges 100 pictures into one large picture
def save_images(ims, name):
    new_im = Image.new('L', (280, 280))  # Divide 100 pictures into 10 rows and 10 columns to form a large picture
    index = 0
    for i in range(0, 280, 28):
        for j in range(0, 280, 28):
            im = ims[index]
            im = Image.fromarray(im, mode='L')
            new_im.paste(im, (i, j))
            index += 1
    new_im.save(name)


h_dim = 20  # Neck-layer dimension (carried over from the AE example; not used below)
z_dim = 10  # The mean and variance layers each have dimension 10
batch_size = 256
lr = 1e-3
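# Load fashion_mnist, scale pixels to [0, 1], and build a shuffled training pipeline and a plain test pipeline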
(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
x_train, x_test = x_train.astype(np.float32) / 255., x_test.astype(np.float32) / 255.
train_db = tf.data.Dataset.from_tensor_slices(x_train)
train_db = train_db.shuffle(batch_size * 5).batch(batch_size)
test_db = tf.data.Dataset.from_tensor_slices(x_test)
test_db = test_db.batch(batch_size)


class VAE(keras.Model):
    def __init__(self):
        super(VAE, self).__init__()
        self.fc1 = layers.Dense(128)  # Hidden layer before the mean and variance layers
        self.fc2 = layers.Dense(z_dim)  # Mean layer
        self.fc3 = layers.Dense(z_dim)  # Variance layer
        self.fc4 = layers.Dense(128)
        self.fc5 = layers.Dense(784)

    def encoder(self, x):
        h = tf.nn.relu(self.fc1(x))
        mu = self.fc2(h)  # Mean
        # Log-variance: predicting log(sigma^2) lets the network output any real value (-inf to +inf)
        log_var = self.fc3(h)
        return mu, log_var

    def decoder(self, z):
        out = tf.nn.relu(self.fc4(z))
        out = self.fc5(out)
        return out

    # Reparameterization trick: turn (mu, log_var) into a differentiable sample z
    def reparameterize(self, mu, log_var):
        eps = tf.random.normal(log_var.shape)  # Standard normal noise with the same shape as the encoder output
        # Standard deviation: the encoder produced the log of the variance, so exponentiate to recover the variance and take the square root
        std = tf.exp(log_var) ** 0.5
        z = mu + std * eps
        return z

    def call(self, inputs, training=None):
        mu, log_var = self.encoder(inputs)
        z = self.reparameterize(mu, log_var)
        x_hat = self.decoder(z)
        return x_hat, mu, log_var


model = VAE()
model.build(input_shape=(4, 28*28))
model.summary()
optimizer = tf.optimizers.Adam(learning_rate=lr)
for epoch in range(100):
    for step, x in enumerate(train_db):
        x = tf.reshape(x, [-1, 784])
        with tf.GradientTape() as tape:
            x_output, mu, log_var = model(x)
            # Per-pixel sigmoid cross-entropy between the input and the reconstruction (this loss function works better here)
            loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=x, logits=x_output)
            loss0 = tf.reduce_sum(loss) / x.shape[0]
            # Calculate kl divergence
            kl_div = -0.5 * (log_var + 1 - mu**2 - tf.exp(log_var))
            kl_div = tf.reduce_sum(kl_div) / x.shape[0]
            loss = loss0 + 1. * kl_div
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        if step % 100 == 0:
            print(epoch, step, 'kl_div:', float(kl_div), 'loss0', float(loss0), 'loss', float(loss))

    # Image generation: reconstruct a test batch once per epoch (one-to-one mapping, like the AE)
    x = next(iter(test_db))
    output, _, _ = model(tf.reshape(x, [-1, 784]))
    x_hat = tf.sigmoid(output)
    x_hat = tf.reshape(x_hat, [-1, 28, 28]).numpy() * 255.
    x_hat = x_hat.astype(np.uint8)
    save_images(x_hat, 'vae_mapping_picture/epoch_%d.png' % epoch)

    # Image generation: sample z directly from the standard normal prior and decode it
    z = tf.random.normal((batch_size, z_dim))
    out_put0 = model.decoder(z)
    x_hat0 = tf.sigmoid(out_put0)
    x_hat0 = tf.reshape(x_hat0, [-1, 28, 28]).numpy() * 255.
    x_hat0 = x_hat0.astype(np.uint8)
    save_images(x_hat0, 'vae_sample_picture/epoch_%d.png' % epoch)
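As a quick illustration that the VAE latent space produces varied data rather than a fixed one-to-one mapping, here is a minimal sketch that interpolates between the latent means of two test images and decodes the path; it reuses model, test_db and save_images from above, and the 100-step grid is just an illustrative choice:

# Interpolate between the latent means of two test images and decode the path
x_pair = next(iter(test_db))[:2]                      # take two test images
mu, _ = model.encoder(tf.reshape(x_pair, [-1, 784]))  # use the means as latent codes
alphas = tf.linspace(0.0, 1.0, 100)                   # 100 steps fill the 10x10 grid of save_images
z_path = tf.stack([(1.0 - a) * mu[0] + a * mu[1] for a in alphas])
imgs = tf.sigmoid(model.decoder(z_path))
imgs = tf.reshape(imgs, [-1, 28, 28]).numpy() * 255.
save_images(imgs.astype(np.uint8), 'vae_interpolation.png')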
