Adversarial Latent Autoencoders


Generating faces and expressions through mere code

Hmrishav Bandyopadhyay
Photo by Bianca Berg on Unsplash

A few years back, face recognition models were taking the internet by storm. People were completely blown away by how a computer could identify a face and, in some cases, even predict the person's age! Face recognition soon seeped into technology for the masses, so much so that almost every smartphone today comes with it. Gone are the days when face identification software or a security system would cost an arm and a leg!

Technology, however, has evolved at an unprecedented rate since the advent of face detection systems. The neural networks of today can not only identify or detect faces but generate them as well: faces that you and I cannot tell apart from real ones! Wanna try? Check this out 😉.

With the advent of Generative Adversarial Networks, face generation has become an important area of research in the field of computer vision. In this article, I am going to elaborate on a particular type of autoencoder that helps generate faces better than ever before (big words alert!): the Adversarial Latent Autoencoder (ALAE), a research work whose pre-print was made available on arXiv on 9th April 2020.

To grasp the concept behind Adversarial Latent Autoencoders (ALAE), let us first go through the models that inspired it: Generative Adversarial Networks and Autoencoders.

A generative adversarial network is a two-fold network, the two parts being a discriminator network and a generator network. What’s interesting about GANs is that the generator and the discriminator networks are constantly competing against each other — and thus are forcing each other to bring out the best in themselves.

GAN architecture — with SGD
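To make the competition concrete, here is a minimal Python sketch of the standard GAN objective written as two binary cross-entropy losses. It is only an illustration of the general GAN idea, not code from the ALAE work. The function names `discriminator_loss` and `generator_loss` are my own, and the inputs are assumed to be the discriminator's "probability of being real" for a batch of real and generated samples.

```python
import math

EPS = 1e-12  # keeps log() away from zero

def discriminator_loss(d_real, d_fake):
    """The discriminator wants d_real -> 1 and d_fake -> 0."""
    real_term = -sum(math.log(p + EPS) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: the generator wants d_fake -> 1."""
    return -sum(math.log(p + EPS) for p in d_fake) / len(d_fake)

# A confident, correct discriminator: low discriminator loss, high generator loss.
print(discriminator_loss([0.9, 0.8], [0.1, 0.2]), generator_loss([0.1, 0.2]))
# A fooled discriminator: the generator loss drops.
print(discriminator_loss([0.9, 0.8], [0.9, 0.8]), generator_loss([0.9, 0.8]))
```

Each network is trained to lower its own loss, and lowering one tends to raise the other. That is the tug of war described above.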

The task of the generator is to create data that the discriminator cannot tell apart from the real data. In other words, the generator tries to form an estimate of the true data distribution. An ideal generator would be one that has figured out the true data distribution and can thus generate an unlimited amount of new data. The discriminator network, meanwhile, is just a classifier network: it tries to classify the data passed to it as real or fake.

Autoencoder

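An autoencoder structurally consists of an encoder, a decoder and a bottleneck. It is an unsupervised learning algorithm that tries to learn the identity function and produce an output as close to the input as it can. Because the input has to pass through the narrow bottleneck, the network is forced to learn a compressed representation of the data.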

Autoencoder architecture
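As a rough illustration of the encoder, bottleneck and decoder idea, here is a tiny linear autoencoder in plain Python. It squeezes 2-D points through a 1-D bottleneck and is trained by gradient descent on the reconstruction error. Everything in it (the toy data, the names `encode`, `decode`, `train`, the learning rate) is an assumption made for this sketch; real autoencoders use deep networks and a framework such as PyTorch.

```python
def encode(x, w_enc):
    # Bottleneck: compress a 2-D input into a single number.
    return w_enc[0] * x[0] + w_enc[1] * x[1]

def decode(z, w_dec):
    # Expand the 1-D code back into a 2-D reconstruction.
    return [w_dec[0] * z, w_dec[1] * z]

def reconstruction_loss(data, w_enc, w_dec):
    # Mean squared distance between each input and its reconstruction.
    total = 0.0
    for x in data:
        x_hat = decode(encode(x, w_enc), w_dec)
        total += (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2
    return total / len(data)

def train(data, steps=500, lr=0.01):
    w_enc, w_dec = [0.1, 0.1], [0.1, 0.1]
    for _ in range(steps):
        g_enc, g_dec = [0.0, 0.0], [0.0, 0.0]
        for x in data:
            z = encode(x, w_enc)
            x_hat = decode(z, w_dec)
            err = [x_hat[0] - x[0], x_hat[1] - x[1]]
            # Gradient of the squared error with respect to the code z.
            dz = 2 * (err[0] * w_dec[0] + err[1] * w_dec[1])
            for i in range(2):
                g_dec[i] += 2 * err[i] * z / len(data)
                g_enc[i] += dz * x[i] / len(data)
        for i in range(2):
            w_enc[i] -= lr * g_enc[i]
            w_dec[i] -= lr * g_dec[i]
    return w_enc, w_dec

# Toy data lying on a line, so one number per point is enough to describe it.
data = [[1.0, 2.0], [2.0, 4.0], [-1.0, -2.0], [0.5, 1.0]]
initial_loss = reconstruction_loss(data, [0.1, 0.1], [0.1, 0.1])
w_enc, w_dec = train(data)
final_loss = reconstruction_loss(data, w_enc, w_dec)
print(initial_loss, final_loss)
```

The reconstruction error after training should be much smaller than before, even though every point was forced through a single number on the way.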

More about autoencoders here. Do let me know in the comments if you would like an article dedicated to autoencoders!

Now that we have a basic idea of autoencoders and generative adversarial networks, let’s get started with the Adversarial Latent Autoencoders.

ALAE introduces an autoencoder architecture that is general and has generative power comparable to GANs, while learning a less entangled representation. To get a better idea of how ALAE gets the best of both worlds, let’s take a look at the model architecture.

The research work introduces a novel architecture by modifying the original GAN paradigm. The proposed architecture defines both the generator and the discriminator as a composition of two functions:

Generator and Discriminator functions — [left to right]
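Written out in symbols, and following the notation of the pre-print as far as I can tell, the two compositions look like this. Here Z is the space the input noise is sampled from, W is the intermediate latent space and X is the data (image) space. The sans-serif letters for the full generator and discriminator are just my way of telling them apart from their second halves.

```latex
\mathsf{G} = G \circ F, \qquad \mathsf{D} = D \circ E

F : \mathcal{Z} \to \mathcal{W}, \quad
G : \mathcal{W} \to \mathcal{X}, \quad
E : \mathcal{X} \to \mathcal{W}, \quad
D : \mathcal{W} \to \mathbb{R}
```

So F turns a noise sample into a latent code, G turns a latent code into an image, E turns an image back into a latent code, and D scores that latent code as real or fake.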

A few important assumptions are made while defining the generator and the discriminator networks:

  1. It is assumed that the latent spaces at the interface between F and G and between E and D are the same, represented by W.
  2. F is assumed to be a deterministic map while G and E are allowed to be stochastic.
  3. It is also assumed that G is optionally dependent on an independent noisy input η that has a known fixed distribution. This helps create a more general stochastic generator.
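The three assumptions above can be sketched with toy functions in Python. The bodies of `F`, `G`, `E` and `D` below are made-up stand-ins (simple scalings), chosen only to show how the pieces plug together; in the real model each of them is a neural network.

```python
def F(z):
    # Deterministic map from a noise sample z to a latent code w.
    return [2.0 * zi + 1.0 for zi in z]

def G(w, eta=None):
    # Generator half: latent code -> data. The optional noise input eta
    # makes it stochastic; without it this toy G is deterministic.
    x = [0.5 * wi for wi in w]
    if eta is not None:
        x = [xi + ni for xi, ni in zip(x, eta)]
    return x

def E(x):
    # Encoder: data -> latent code, landing in the same space W that G reads from.
    return [2.0 * xi for xi in x]

def D(w):
    # Discriminator head: latent code -> a single score.
    return sum(w) / len(w)

def generate(z, eta=None):
    # Full generator: G composed with F.
    return G(F(z), eta)

def discriminate(x):
    # Full discriminator: D composed with E.
    return D(E(x))

z = [0.0, 1.0]
w = F(z)
x = generate(z)
print(w, x, E(x), discriminate(x))
```

Note that `E(x)` and `F(z)` live in the same space W here, which is exactly the first assumption. The noise `eta` would be sampled from a fixed, known distribution at generation time.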

A common practice while defining autoencoders, in general, is to set a desired target distribution for the latent space, a distribution that the encoder is then trained to match. The ALAE, on the other hand, does not force the latent distribution to match a specific target distribution. The only constraint on the latent distribution is that the distribution of the output of E has to be the same as that at the input of G. Holding on to this constraint, the learning process decides what’s best for the model.
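One way to picture this constraint: take a latent code w, generate data from it, encode the data again, and check how far the result is from the w you started with. As far as I understand the pre-print, it uses a squared distance of this kind in latent space. The sketch below is my own simplified Python version, with a hypothetical helper `latent_reciprocity_loss` and toy stand-ins for the generator and encoder.

```python
def latent_reciprocity_loss(latent_codes, encoder, generator):
    # Mean squared distance between each latent code w and E(G(w)).
    total = 0.0
    for w in latent_codes:
        w_rec = encoder(generator(w))
        total += sum((a - b) ** 2 for a, b in zip(w, w_rec))
    return total / len(latent_codes)

def toy_generator(w):
    return [0.5 * wi for wi in w]

def matched_encoder(x):
    # Exactly undoes toy_generator, so the latent code is recovered.
    return [2.0 * xi for xi in x]

def mismatched_encoder(x):
    # Does not undo toy_generator, so the latent code drifts.
    return [1.5 * xi for xi in x]

codes = [[1.0, 3.0], [-2.0, 0.5]]
matched = latent_reciprocity_loss(codes, matched_encoder, toy_generator)
mismatched = latent_reciprocity_loss(codes, mismatched_encoder, toy_generator)
print(matched, mismatched)
```

Training pushes the encoder and generator towards the matched case, without ever telling the latent space which particular distribution it should follow.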

Let’s get a pictorial view of the architecture —

Architecture for ALAE —

This is the general ALAE architecture. You might wonder why the architecture is abstract, and why it isn’t built up with convolutional layers, activations and so on. That is because this is just the general concept of the ALAE architecture; we are not defining the complete network here. The complete architecture depends on the task ALAE is applied to.

StyleALAE uses the StyleGAN based generator along with the Adversarial Latent Autoencoder. The architecture can be expressed as —

StyleALAE architecture —

The generator is adapted from the StyleGAN architecture, while the encoder follows the ALAE design. The encoder is designed symmetrically to the generator so that it can capture style information from the image. Instance Normalization layers are added to extract the mean and standard deviation of every channel, effectively pulling out the style representation at each level.
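To show what "getting the mean and standard deviation for every channel" means, here is a small plain-Python sketch of instance normalization on one feature map. The function name `instance_norm_stats` and the nested-list layout (channels, then rows, then values) are assumptions for this example. The official implementation works on tensors and also combines these statistics further, which is not shown here.

```python
import math

def instance_norm_stats(feature_map, eps=1e-8):
    """feature_map: list of channels, each channel a list of rows of floats."""
    means, stds, normalized = [], [], []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        mu = sum(values) / len(values)
        var = sum((v - mu) ** 2 for v in values) / len(values)
        sigma = math.sqrt(var + eps)  # eps avoids division by zero
        means.append(mu)
        stds.append(sigma)
        # The normalized channel has roughly zero mean and unit variance.
        normalized.append([[(v - mu) / sigma for v in row] for row in channel])
    return means, stds, normalized

feature_map = [
    [[1.0, 3.0], [1.0, 3.0]],  # channel 0
    [[0.0, 0.0], [4.0, 4.0]],  # channel 1
]
means, stds, normalized = instance_norm_stats(feature_map)
print(means, stds)
```

The per-channel `means` and `stds` are the "style" numbers pulled out at that level, while `normalized` is what gets passed on to the next layer of the encoder.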

StyleALAE takes the help of ALAE and StyleGAN to raise the bar for face and expression generation using artificial intelligence. The official implementation of the StyleALAE model can be found here.

Let’s take a look at what the model is capable of achieving —

StyleALAE generated image —

Yep — That’s a generated image! Got you, didn’t I? 😉

Learn more about ALAE from the pre-print on arXiv, and make sure to give the StyleGAN paper a good read too. Let me know if you get stuck anywhere, happy to help 😄.

Hmrishav Bandyopadhyay is a second-year undergraduate in the Electronics and Telecommunication department of Jadavpur University, India. His interests lie in Deep Learning, Computer Vision, and Image Processing. He can be reached at [email protected]