Introduction & Disclaimers

Welcome!

This notebook is an introduction to the concept of latent space, using a recent (and amazing) generative network: StyleGAN2

Here are some great blog posts I found useful when learning about the latent space + StyleGAN2

Tip: This stand-alone Jupyter notebook is all you need to get everything up and running! It’ll pull a (small) repo with everything that’s needed :D The Repo’s README.md contains original source links for the content! Link to Repo

Tip: This notebook was successfully run on Google Colab & Paperspace Gradient's TensorFlow 1.14 container: every cell, from top to bottom! Feel free to run/experiment with this notebook yourself!

Warning: This tutorial is not too code-heavy on the deep learning/model side, primarily because it uses the official StyleGAN2 repo, which relies on a deprecated version of TensorFlow (1.14). This blog post abstracts away from the deprecated TensorFlow code and focuses more on the concepts of latent space and traversals :)

Experiment Layout

In this notebook, we will be experimenting with the following:

0. Set-Up (Run only once!)

  • This section simply pulls the small repo containing the files needed to run everything!

1. A Quick Review of GANs

  • A quick refresher on z (latent vectors), Generators, and Discriminators.
  • GANs vs VAEs

2. Generate Images of People who don't Exist

  • Use the official StyleGAN2 repo to create Generator outputs.
  • View the latent codes of these generated outputs.

3. Interpolation of Latent Codes

  • Use the previous Generator outputs' latent codes to morph images of people together.

4. Facial Image Alignment using Landmark Detection

  • Aligning (normalizing) our own input images for latent space projection.

5. Projecting our own Input Images into the Latent Space

  • Learning the latent codes of our new aligned input images.
  • Interpolation of projected latent codes. (Similar to Section 3, but with our images!)

6. Latent Directions/Controls to modify our projected images

  • Using pre-computed latent directions to alter facial features of our own images.

7. Bonus: Interactive Widget-App!

  • Play with the latent controls yourself using a little Jupyter app I built with ipywidgets.

0. Set-Up (Run only once!)

Clone Repo and extract contents

!git clone https://github.com/AmarSaini/Epoching_StyleGan2_Setup.git
import shutil
from pathlib import Path

repo_root = Path('Epoching_StyleGan2_Setup/')

# Pull contents out of the repo, into our current directory.
for content in repo_root.iterdir():
    shutil.move(str(content), '.')

shutil.rmtree(repo_root)

Pip install needed packages

!pip install requests
!pip install Pillow
!pip install tqdm
!pip install dlib

If you're running this on Google Colab, uncomment and run the following cell:

#!pip install tensorflow-gpu==1.14
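
Optionally, after the installs you can sanity-check that the notebook sees a TensorFlow 1.x build with GPU support (a minimal check, assuming TensorFlow is already installed in your environment):

# Optional sanity check: the official StyleGAN2 code needs TensorFlow 1.x and a GPU.
import tensorflow as tf
print(tf.__version__)               # expect something like '1.14.0'
print(tf.test.is_gpu_available())   # should print True if a GPU is visible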

1. A Quick Review of GANs

I'm going to try to keep this section short and just cover the information needed to understand the rest of this post:

GANs (Generative Adversarial Networks) consist of two models:

  • The Generator: A model that converts a latent code into some kind of output (an image of a person, in our case).
  • The Discriminator: A model that determines whether some input (an image of a person), is real or fake.
    • Real: An image from the original dataset.
    • Fake: An image from the Generator.

The input to a Generator is a latent code z, a vector of numbers if you will. (Such as: a vector of 512 numbers).

  • During training, the latent code is randomly sampled (i.e. a random vector of 512 numbers).
  • When this latent code is randomly sampled, we can call it a latent random variable, as shown in the figure below.
  • This magical latent code holds information that will allow the Generator to create a specific output.
  • If you can find a latent code for a particular input, you can represent that input with a much smaller amount of data! (Such as representing a picture of someone with only a latent vector of 512 numbers, as opposed to the full-size image.)
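
To make the idea of "randomly sampling a latent code" concrete, here is a tiny sketch (this mirrors what generate_image_random does later in this notebook, assuming the 512-dimensional latent space of the FFHQ model):

import numpy as np

# Sample a random latent code z: 512 numbers drawn from a standard normal distribution.
rng = np.random.RandomState(42)   # a fixed seed gives a reproducible "random" person
z = rng.randn(1, 512)
z.shape                           # (1, 512)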

Important: Don’t confuse GANs with VAEs (Variational Auto-Encoders)!

Note: GANs learn to generate outputs from random latent vectors that mimic the appearance of your input data, but not necessarily the exact samples of your input data. VAEs learn to encode your input samples into latent vectors, and then also learn to decode latent vectors back to their (mostly) original form.

Tip: The main takeaway from GANs vs. VAEs is that our Generator never actually sees the input images, hence we don't have a built-in way to automatically convert an image into its corresponding latent code! Teaser: That's what projection is for, Section 5 :)

2. Generate Images of People who don't Exist

import sys
sys.path.append('stylegan2/')

from stylegan2 import pretrained_networks
from stylegan2 import dnnlib
from stylegan2.dnnlib import tflib

from pathlib import Path
from PIL import Image
import pickle
import numpy as np

import ipywidgets as widgets
from tqdm import tqdm

model_path = 'gdrive:networks/stylegan2-ffhq-config-f.pkl'
fps = 20
results_size = 400

# Code to load the StyleGAN2 Model
def load_model():
    _G, _D, Gs = pretrained_networks.load_networks(model_path)
    
    noise_vars = [var for name, var in Gs.components.synthesis.vars.items() if name.startswith('noise')]
    
    Gs_kwargs = dnnlib.EasyDict()
    Gs_kwargs.output_transform = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)
    Gs_kwargs.randomize_noise = False
    
    return Gs, noise_vars, Gs_kwargs

# Generate images given a random seed (Integer)
def generate_image_random(rand_seed):
    rnd = np.random.RandomState(rand_seed)
    z = rnd.randn(1, *Gs.input_shape[1:])
    tflib.set_vars({var: rnd.randn(*var.shape.as_list()) for var in noise_vars})
    images = Gs.run(z, None, **Gs_kwargs)
    return images, z

# Generate images given a latent code ( vector of size [1, 512] )
def generate_image_from_z(z):
    images = Gs.run(z, None, **Gs_kwargs)
    return images

Let's go ahead and start generating some outputs!

Gs, noise_vars, Gs_kwargs = load_model()
Downloading http://d36zk2xti64re0.cloudfront.net/stylegan2/networks/stylegan2-ffhq-config-f.pkl ... done
Setting up TensorFlow plugin "fused_bias_act.cu": Preprocessing... Compiling... Loading... Done.
Setting up TensorFlow plugin "upfirdn_2d.cu": Preprocessing... Compiling... Loading... Done.
images, latent_code1 = generate_image_random(42)
image1 = Image.fromarray(images[0]).resize((results_size, results_size))
latent_code1.shape
(1, 512)

Note: As shown in the previous cell's output, the latent code for this output has shape (1, 512). This means the 512 numbers inside latent_code1 are all that's needed to create the image below!
image1

Let's make another image!

images, latent_code2 = generate_image_random(1234)
image2 = Image.fromarray(images[0]).resize((results_size, results_size))
latent_code2.shape
(1, 512)
latent_code1[0][:5], latent_code2[0][:5]
(array([ 0.49671415, -0.1382643 ,  0.64768854,  1.52302986, -0.23415337]),
 array([ 0.47143516, -1.19097569,  1.43270697, -0.3126519 , -0.72058873]))

Note: The shape of latent_code2 is also (1, 512). However, the two codes are not the same, as seen in the first five values printed in the previous cell. Below is the corresponding image generated from latent_code2.
image2
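
If you want to quantify just how different the two random codes are, a quick optional check:

# Optional: measure how far apart the two random latent codes are.
print(np.linalg.norm(latent_code1 - latent_code2))   # Euclidean distance; random 512-dim codes are far apart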

3. Interpolation of Latent Codes

So what's the big deal? We have two codes to make two people that don't even exist right? Well, the cool thing about the latent space is that you can "traverse" through it!

Since the latent space is a compressed representation of some data, things that are similar in appearance should be "close" to each other in the latent space.

If the latent space is well developed, we can actually transition/interpolate between points in this space and create intermediate outputs!

In other words... we can morph two people together! See the gif below for a quick example!

Now let's do this with the examples we just generated! :D

Let's interpolate halfway between latent_code1 and latent_code2

# Linearly mix two latent codes: alpha=1 returns code1, alpha=0 returns code2.
def linear_interpolate(code1, code2, alpha):
    return code1 * alpha + code2 * (1 - alpha)
interpolated_latent_code = linear_interpolate(latent_code1, latent_code2, 0.5)
interpolated_latent_code.shape
(1, 512)

Note: The interpolated latent code is still a vector of 512 numbers; we just took 50% of latent_code1 and 50% of latent_code2 (alpha=0.5) and summed them together! Below is the resulting image.
images = generate_image_from_z(interpolated_latent_code)
Image.fromarray(images[0]).resize((results_size, results_size))

Let's also make a cool interpolation animation; it'll help visualize the effect of interpolating from alpha=0 to alpha=1.

output_gifs_path = Path('output_gifs')
# Make Output Gifs folder if it doesn't exist.
if not output_gifs_path.exists():
    output_gifs_path.mkdir()

# Horizontally concatenate two PIL images side-by-side.
def get_concat_h(im1, im2):
    dst = Image.new('RGB', (im1.width + im2.width, im1.height))
    dst.paste(im1, (0, 0))
    dst.paste(im2, (im1.width, 0))
    return dst

def make_latent_interp_animation(code1, code2, img1, img2, num_interps):
    
    step_size = 1.0/num_interps
    all_imgs = []
    amounts = np.arange(0, 1, step_size)
    
    # Generate one interpolated frame per alpha, shown side-by-side with the two endpoint images.
    for alpha in tqdm(amounts):
        interpolated_latent_code = linear_interpolate(code1, code2, alpha)
        images = generate_image_from_z(interpolated_latent_code)
        interp_latent_image = Image.fromarray(images[0]).resize((400, 400))
        frame = get_concat_h(img1, interp_latent_image)
        frame = get_concat_h(frame, img2)
        all_imgs.append(frame)

    # Save all frames as a single looping gif.
    save_name = output_gifs_path/'latent_space_traversal.gif'
    all_imgs[0].save(save_name, save_all=True, append_images=all_imgs[1:], duration=1000/fps, loop=0)
make_latent_interp_animation(latent_code1, latent_code2, image1, image2, num_interps=200)
100%|██████████| 200/200 [00:31<00:00,  6.35it/s]

Tip: If you’re running this notebook yourself, the interpolation gif will be saved in the following location: output_gifs/latent_space_traversal.gif :)
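
If you'd rather preview the gif inline without leaving Jupyter, something like this should work (a small sketch using IPython's display utilities):

# Display the saved interpolation gif inline in the notebook.
from IPython.display import Image as IPyImage
IPyImage(filename=str(output_gifs_path/'latent_space_traversal.gif'))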

Note: This gif morphs between the two latent codes by slowly changing alpha from 0 toward 1 (increasing alpha by 1/200 per frame). With the linear_interpolate definition above, alpha=0 corresponds to latent_code2 and alpha=1 corresponds to latent_code1.

4. Facial Image Alignment using Landmark Detection

Ok so this is all fun and stuff right? How could we play around with our own images, instead of random people that don't exist?

Well, we first have to project our own images into this latent space.

Important: The first step of projecting our own images is to make sure that they are representative of the training data. StyleGAN2 was trained on the FFHQ Dataset. More specifically, the images used during training were actually aligned first, before being given to the discriminator in StyleGAN2.

To align (normalize) our images for StyleGAN2, we need to use a landmark detection model. This will automatically find the facial keypoints of interest, and crop/rotate accordingly.
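
For intuition, here's a rough sketch of what that landmark-detection step looks like with dlib's 68-point predictor (not the repo's exact code; it assumes the shape_predictor_68_face_landmarks.dat file downloaded a few cells below, and uses the example image from the imgs/ folder):

# Rough sketch: detect a face and its 68 facial landmarks with dlib.
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')

img = dlib.load_rgb_image('imgs/Obama.jpg')
for face in detector(img, 1):                 # upsample once to catch smaller faces
    landmarks = predictor(img, face)
    print(face, landmarks.num_parts)          # 68 keypoints; alignment crops/rotates around these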

Below is an example!

Tip: At this point, if you want to run this with your own images, all you need to do is go to the imgs/ folder, and delete the example images, Jeremy_Howard.jpg and Obama.jpg. Then upload 2 of your own!

orig_img_path = Path('imgs')
aligned_imgs_path = Path('aligned_imgs')

# Make Aligned Images folder if it doesn't exist.
if not aligned_imgs_path.exists():
    aligned_imgs_path.mkdir()
    
orig_img_path, aligned_imgs_path
(PosixPath('imgs'), PosixPath('aligned_imgs'))
if not Path('shape_predictor_68_face_landmarks.dat').exists():
    !wget http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2
    !bzip2 -dv shape_predictor_68_face_landmarks.dat.bz2
--2020-08-11 18:28:05--  http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2
Resolving dlib.net (dlib.net)... 107.180.26.78
Connecting to dlib.net (dlib.net)|107.180.26.78|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 64040097 (61M)
Saving to: ‘shape_predictor_68_face_landmarks.dat.bz2’

shape_predictor_68_ 100%[===================>]  61.07M   952KB/s    in 3m 25s  

2020-08-11 18:31:30 (306 KB/s) - ‘shape_predictor_68_face_landmarks.dat.bz2’ saved [64040097/64040097]

  shape_predictor_68_face_landmarks.dat.bz2: done
from align_face import align_face

# Align all of our images using a landmark detection model!
all_imgs = list(orig_img_path.iterdir())
for img in all_imgs:
    align_face(str(img)).save(aligned_imgs_path/('aligned_'+img.name))
Number of faces detected: 1
Detection 0: Left: 375 Top: 333 Right: 760 Bottom: 718
Part 0: (377, 507), Part 1: (391, 550) ...
Number of faces detected: 1
Detection 0: Left: 1224 Top: 514 Right: 2022 Bottom: 1313
Part 0: (1333, 660), Part 1: (1323, 758) ...

Let's load the original + aligned images into Jupyter!

aligned_img_set = list(aligned_imgs_path.iterdir())
aligned_img_set.sort()
aligned_img_set = [Image.open(x) for x in aligned_img_set]

orig_img_set = list(orig_img_path.iterdir())
orig_img_set.sort()
orig_img_set = [Image.open(x) for x in orig_img_set]

Note: The original image is on the left, the aligned image is on the right. The original images aren't necessarily square; however, the aligned output images are always 1024x1024, a square! StyleGAN2 works with square images :)
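A quick optional check of that claim (PIL's .size returns (width, height)):

# The original image can be any shape; the aligned image should be 1024x1024.
print(orig_img_set[0].size, aligned_img_set[0].size)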
get_concat_h(orig_img_set[0], aligned_img_set[0])