Latent Space Exploration with StyleGAN2
A conceptual tutorial of what the latent space is, and what it can do!
- Introduction & Disclaimers
- Experiment Layout
- 0. Set-Up (Run only once!)
- 1. A Quick Review of GANs
- 2. Generate Images of People who don't Exist
- 3. Interpolation of Latent Codes
- 4. Facial Image Alignment using Landmark Detection
- 5. Projecting our own Input Images into the Latent Space
- 6. Latent Directions/Controls to modify our projected images
- 7. Bonus: Interactive Widget-App!
- Conclusion
Introduction & Disclaimers
Welcome!
This notebook is an introduction to the concept of latent space, using a recent (and amazing) generative network: StyleGAN2
Here are some great blog posts I found useful when learning about the latent space + StyleGAN2:
- Latent Space Understanding: Ekin Tiu - Understanding Latent Space in Machine Learning
- A technical overview of the StyleGAN2 architecture: Connor Shorten - StyleGAN2
- Overview of GANs + what's changed up to StyleGAN2: akira - From GAN basic to StyleGAN2
In this notebook, we will be experimenting with the following:
0. Set-Up (Run only once!)
- This section simply pulls the small repo containing necessary files needed to run things!
1. A Quick Review of GANs
- A quick refresh of z (latent vectors), Generators, and Discriminators.
- GANs vs VAEs
2. Generate Images of People who don't Exist
- Use the official StyleGAN2 repo to create Generator outputs.
- View the latent codes of these generated outputs.
3. Interpolation of Latent Codes
- Use the previous Generator outputs' latent codes to morph images of people together.
4. Facial Image Alignment using Landmark Detection
- Aligning (normalizing) our own input images for latent space projection.
5. Projecting our own Input Images into the Latent Space
- Learning the latent codes of our new aligned input images.
- Interpolation of projected latent codes. (Similar to Section 3, but with our images!)
6. Latent Directions/Controls to modify our projected images
- Using pre-computed latent directions to alter facial features of our own images.
7. Bonus: Interactive Widget-App!
- Play with latent controls yourself using this little Jupyter app I built with ipywidgets.
Clone Repo and extract contents
!git clone https://github.com/AmarSaini/Epoching_StyleGan2_Setup.git
import shutil
from pathlib import Path
repo_root = Path('Epoching_StyleGan2_Setup/')
# Pull contents out of the repo, into our current directory.
for content in repo_root.iterdir():
shutil.move(str(content), '.')
shutil.rmtree(repo_root)
Pip install needed packages
!pip install requests
!pip install Pillow
!pip install tqdm
!pip install dlib
If you're running this on Google Colab, uncomment and run the following cell:
#!pip install tensorflow-gpu==1.14
I'm going to keep this section short and only go over the information needed to understand the rest of this post:
GANs (Generative Adversarial Networks) consist of two models:
- The Generator: A model that converts a latent code into some kind of output (an image of a person, in our case).
- The Discriminator: A model that determines whether some input (an image of a person) is real or fake.
  - Real: An image from the original dataset.
  - Fake: An image from the Generator.
The input to a Generator is a latent code z, a vector of numbers if you will (such as a vector of 512 numbers).
- During training, the latent code is randomly sampled (i.e. a random vector of 512 numbers).
- When this latent code is randomly sampled, we can call it a latent random variable, as shown in the figure below.
- This magical latent code holds information that will allow the Generator to create a specific output.
- If you can find a latent code for a particular input, you can represent it with smaller amounts of data! (Such as representing a picture of someone with only a latent vector of 512 numbers, as opposed to the original image size.)
The Generator actually never sees the input images, hence we don't have a way to automatically convert an image into its corresponding latent code! Teaser: That's what projection is for, Section 5 :)
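To make that concrete, here's a tiny sketch in plain numpy (just for intuition; the real StyleGAN2 model code comes next). Generator and Discriminator in the comment are conceptual, not actual functions from the repo.
import numpy as np
# A randomly sampled latent code: 512 numbers drawn from a normal distribution.
z = np.random.randn(1, 512)
print(z.shape)  # (1, 512)
# Conceptually: image = Generator(z), and during training the Discriminator
# scores that image (or a real one from the dataset) as real vs. fake.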
import sys
sys.path.append('stylegan2/')
from stylegan2 import pretrained_networks
from stylegan2 import dnnlib
from stylegan2.dnnlib import tflib
from pathlib import Path
from PIL import Image
import pickle
import numpy as np
import ipywidgets as widgets
from tqdm import tqdm
model_path = 'gdrive:networks/stylegan2-ffhq-config-f.pkl'
fps = 20
results_size = 400
# Code to load the StyleGAN2 Model
def load_model():
_G, _D, Gs = pretrained_networks.load_networks(model_path)
noise_vars = [var for name, var in Gs.components.synthesis.vars.items() if name.startswith('noise')]
Gs_kwargs = dnnlib.EasyDict()
Gs_kwargs.output_transform = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)
Gs_kwargs.randomize_noise = False
return Gs, noise_vars, Gs_kwargs
# Generate images given a random seed (Integer)
def generate_image_random(rand_seed):
rnd = np.random.RandomState(rand_seed)
z = rnd.randn(1, *Gs.input_shape[1:])
tflib.set_vars({var: rnd.randn(*var.shape.as_list()) for var in noise_vars})
images = Gs.run(z, None, **Gs_kwargs)
return images, z
# Generate images given a latent code ( vector of size [1, 512] )
def generate_image_from_z(z):
images = Gs.run(z, None, **Gs_kwargs)
return images
Let's go ahead and start generating some outputs!
Gs, noise_vars, Gs_kwargs = load_model()
images, latent_code1 = generate_image_random(42)
image1 = Image.fromarray(images[0]).resize((results_size, results_size))
latent_code1.shape
The shape of latent_code1 is (1, 512). This means that the numbers inside latent_code1 can be used to create the image below!
image1
Let's make another image!
images, latent_code2 = generate_image_random(1234)
image2 = Image.fromarray(images[0]).resize((results_size, results_size))
latent_code2.shape
latent_code1[0][:5], latent_code2[0][:5]
The shape of latent_code2 is also (1, 512). However, the two codes are not the same! This can be seen in the first five values printed by the previous cell. Below is the corresponding image generated from latent_code2.
image2
So what's the big deal? We have two codes to make two people who don't even exist, right? Well, the cool thing about the latent space is that you can "traverse" through it!
Since the latent space is a compressed representation of some data, things that are similar in appearance should be "close" to each other in the latent space.
If the latent space is well developed, we can actually transition/interpolate between points in this space and create intermediate outputs!
In other words... we can morph two people together! See the gif below for a quick example!
Now let's do this with the two examples we just generated! :D
Let's interpolate halfway between latent_code1 and latent_code2.
def linear_interpolate(code1, code2, alpha):
return code1 * alpha + code2 * (1 - alpha)
interpolated_latent_code = linear_interpolate(latent_code1, latent_code2, 0.5)
interpolated_latent_code.shape
The interpolated latent code is still of shape (1, 512). We took 50% of latent_code1 and 50% of latent_code2 (alpha=0.5) and summed them together! Below is a quick sanity check of that, followed by the resulting image.
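As a quick, purely optional check that the arithmetic matches that description: with alpha=0.5 the interpolated code is just the element-wise average of the two codes.
# With alpha = 0.5, the interpolation equals the element-wise mean of the two codes.
assert np.allclose(interpolated_latent_code, (latent_code1 + latent_code2) / 2)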
images = generate_image_from_z(interpolated_latent_code)
Image.fromarray(images[0]).resize((results_size, results_size))
Let's also make a cool interpolation animation; it'll help with visualizing the effect of interpolating from alpha=0 to alpha=1.
output_gifs_path = Path('output_gifs')
# Make Output Gifs folder if it doesn't exist.
if not output_gifs_path.exists():
output_gifs_path.mkdir()
def get_concat_h(im1, im2):
dst = Image.new('RGB', (im1.width + im2.width, im1.height))
dst.paste(im1, (0, 0))
dst.paste(im2, (im1.width, 0))
return dst
def make_latent_interp_animation(code1, code2, img1, img2, num_interps):
step_size = 1.0/num_interps
all_imgs = []
amounts = np.arange(0, 1, step_size)
for alpha in tqdm(amounts):
interpolated_latent_code = linear_interpolate(code1, code2, alpha)
images = generate_image_from_z(interpolated_latent_code)
interp_latent_image = Image.fromarray(images[0]).resize((400, 400))
frame = get_concat_h(img1, interp_latent_image)
frame = get_concat_h(frame, img2)
all_imgs.append(frame)
save_name = output_gifs_path/'latent_space_traversal.gif'
all_imgs[0].save(save_name, save_all=True, append_images=all_imgs[1:], duration=1000/fps, loop=0)
make_latent_interp_animation(latent_code1, latent_code2, image1, image2, num_interps=200)
The resulting animation is saved to output_gifs/latent_space_traversal.gif :)
The animation morphs between the two faces by slowly changing alpha from 0 to 1 (increasing alpha by 1/200 per frame).
Ok, so this is all fun and stuff, right? How could we play around with our own images, instead of random people who don't exist?
Well, we first have to project our own images into this latent space.
To align (normalize) our images for StyleGAN2, we need to use a landmark detection model. This will automatically find the facial keypoints of interest, and crop/rotate accordingly.
Below is an example!
If you want to use your own pictures, go to the imgs/ folder and delete the example images, Jeremy_Howard.jpg and Obama.jpg. Then upload 2 of your own!
orig_img_path = Path('imgs')
aligned_imgs_path = Path('aligned_imgs')
# Make Aligned Images folder if it doesn't exist.
if not aligned_imgs_path.exists():
aligned_imgs_path.mkdir()
orig_img_path, aligned_imgs_path
if not Path('shape_predictor_68_face_landmarks.dat').exists():
!wget http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2
!bzip2 -dv shape_predictor_68_face_landmarks.dat.bz2
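For context, the shape_predictor_68_face_landmarks.dat file we just downloaded is dlib's 68-point landmark model. Here's a rough sketch of how dlib uses it to find facial keypoints; the actual cropping/rotation logic lives in the repo's align_face.py, so treat this as illustration only.
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')

def get_landmarks(image):
    # image: an HxWx3 uint8 numpy array (e.g. from dlib.load_rgb_image)
    face_rect = detector(image, 1)[0]    # bounding box of the first detected face
    shape = predictor(image, face_rect)  # fit the 68-point landmark model
    return [(p.x, p.y) for p in shape.parts()]  # 68 (x, y) keypoints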
from align_face import align_face
# Align all of our images using a landmark detection model!
all_imgs = list(orig_img_path.iterdir())
for img in all_imgs:
align_face(str(img)).save(aligned_imgs_path/('aligned_'+img.name))
Let's load the original + aligned images into Jupyter!
aligned_img_set = list(aligned_imgs_path.iterdir())
aligned_img_set.sort()
aligned_img_set = [Image.open(x) for x in aligned_img_set]
orig_img_set = list(orig_img_path.iterdir())
orig_img_set.sort()
orig_img_set = [Image.open(x) for x in orig_img_set]
get_concat_h(orig_img_set[0], aligned_img_set[0])
get_concat_h(orig_img_set[1], aligned_img_set[1])
Next, we need to restart the Jupyter kernel. You can either restart it manually by going to Kernel -> Restart, or run the cell below:
# Automatically restart the kernel by running this cell
import IPython
IPython.Application.instance().kernel.do_shutdown(True)
import sys
sys.path.append('stylegan2/')
from stylegan2 import pretrained_networks
from stylegan2 import dnnlib
from stylegan2.dnnlib import tflib
from pathlib import Path
from PIL import Image
import pickle
import numpy as np
import ipywidgets as widgets
from tqdm import tqdm
model_path = 'gdrive:networks/stylegan2-ffhq-config-f.pkl'
fps = 20
results_size = 400
First, we convert our aligned images into the TFRecords dataset format that the StyleGAN2 scripts expect, using the official repo's dataset_tool.py. Note: we pass the -W ignore option when executing python on the command line to suppress warning spam.
!python -W ignore stylegan2/dataset_tool.py create_from_images datasets_stylegan2/custom_imgs aligned_imgs/
Some things to explain here:
- Projecting an image into the latent space basically means: let's figure out what latent code (512 numbers) will cause the generator to make an output that looks like our image.
- The question is, how do we figure out the latent code? With VAEs (variational autoencoders), we just throw our image through the encoder and get our latent code, just like that!
- With GANs, we don't necessarily have a direct way to extract latent codes from an input image, but we can optimize for it.
- In a nutshell, here's how we can optimize for a latent code for a given input image (a rough code sketch follows this list):
For as many iterations as you'd like, do:
- Ask the generator to generate some output from a starting latent vector.
- Take the generator's output image and your target image, and put them both through a VGG16 model (an image feature extractor).
- Take the generator's output image features from the VGG16.
- Take the target image features from the VGG16.
- Compute the loss on the difference of features!
- Backprop! (Only the latent code is updated; the generator's weights stay fixed.)
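This isn't the code we're about to run (the repo's epoching_custom_run_projector.py handles projection in TensorFlow); it's just a minimal, PyTorch-flavored sketch of the loop above, assuming a hypothetical generator G and an already-preprocessed target image tensor:
import torch
import torchvision.models as models

def project(G, target_img, num_steps=1000, lr=0.01):
    # Optimize a latent code z so that G(z) matches target_img in VGG16 feature space.
    vgg = models.vgg16(pretrained=True).features.eval()
    for p in vgg.parameters():
        p.requires_grad_(False)                    # the feature extractor stays frozen
    z = torch.randn(1, 512, requires_grad=True)    # starting latent vector
    opt = torch.optim.Adam([z], lr=lr)
    target_feats = vgg(target_img)                 # features of the real (target) image
    for _ in range(num_steps):
        opt.zero_grad()
        generated = G(z)                           # generator output for the current z
        loss = torch.nn.functional.mse_loss(vgg(generated), target_feats)
        loss.backward()                            # backprop; only z is in the optimizer
        opt.step()
    return z.detach()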
Let's project our 2 custom aligned images of Jeremy and Obama!
tot_aligned_imgs = 2
!python -W ignore stylegan2/epoching_custom_run_projector.py project-real-images --network=$model_path \
--dataset=custom_imgs --data-dir=datasets_stylegan2 --num-images=$tot_aligned_imgs --num-snapshots 500
def get_concat_h(im1, im2):
dst = Image.new('RGB', (im1.width + im2.width, im1.height))
dst.paste(im1, (0, 0))
dst.paste(im2, (im1.width, 0))
return dst
def make_project_progress_gifs():
all_result_folders = list(Path('results/').iterdir())
all_result_folders.sort()
last_result_folder = all_result_folders[-1]
for img_num in range(tot_aligned_imgs):
all_step_pngs = [x for x in last_result_folder.iterdir() if x.name.endswith('png') and 'image{0:04d}'.format(img_num) in x.name]
all_step_pngs.sort()
target_image = Image.open(all_step_pngs[-1]).resize((results_size, results_size))
all_concat_imgs = []
for step_img_path in all_step_pngs[:-1]:
step_img = Image.open(step_img_path).resize((results_size, results_size))
all_concat_imgs.append(get_concat_h(target_image, step_img))
all_concat_imgs[0].save('output_gifs/image{0:04d}_project_progress.gif'.format(img_num), save_all=True, append_images=all_concat_imgs[1:], duration=1000/fps, loop=0)
make_project_progress_gifs()
The projection progress animations are saved to output_gifs/image####_project_progress.gif :)
Let's look at the optimized latent codes we have acquired through this projection process!
def get_final_latents():
all_results = list(Path('results/').iterdir())
all_results.sort()
last_result = all_results[-1]
latent_files = [x for x in last_result.iterdir() if 'final_latent_code' in x.name]
latent_files.sort()
all_final_latents = []
for file in latent_files:
with open(file, mode='rb') as latent_pickle:
all_final_latents.append(pickle.load(latent_pickle))
return all_final_latents
latent_codes = get_final_latents()
len(latent_codes), latent_codes[0].shape, latent_codes[1].shape
The shape of our projected latent codes is (1, 18, 512), instead of the (1, 512) shape we saw earlier on the generated (fake) examples. This is due to one of StyleGAN2's architecture designs: the base latent vector is re-used at 18 different levels in the generator, and allowing small deviations in the latent code at each level supports more variance in style. During training, just one static latent vector of shape (1, 512) is used. For a more detailed explanation, check out the recommended technical StyleGAN2 overview blog posts mentioned in the introduction! :)
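To make that shape concrete, here's a tiny numpy illustration (intuition only, not the repo's code): a single 512-dimensional code broadcast across the 18 synthesis levels gives the extended (1, 18, 512) form, and projection simply lets each of those 18 copies drift independently.
import numpy as np
w = np.random.randn(1, 512)                        # a single base latent code
w_plus = np.tile(w[:, np.newaxis, :], (1, 18, 1))  # broadcast to all 18 levels
print(w_plus.shape)  # (1, 18, 512)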
Let's check out what our images look like from our latent codes!
def load_model():
_G, _D, Gs = pretrained_networks.load_networks(model_path)
noise_vars = [var for name, var in Gs.components.synthesis.vars.items() if name.startswith('noise')]
Gs_kwargs = dnnlib.EasyDict()
Gs_kwargs.output_transform = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)
Gs_kwargs.randomize_noise = False
return Gs, noise_vars, Gs_kwargs
def generate_image_from_projected_latents(latent_vector):
images = Gs.components.synthesis.run(latent_vector, **Gs_kwargs)
return images
Gs, noise_vars, Gs_kwargs = load_model()
# Must re-define these variables because we restarted our kernel!
output_gifs_path = Path('output_gifs/')
aligned_imgs_path = Path('aligned_imgs/')
aligned_img_set = list(aligned_imgs_path.iterdir())
aligned_img_set.sort()
aligned_img_set = [Image.open(x) for x in aligned_img_set]
images = generate_image_from_projected_latents(latent_codes[0])
recreated_img1 = Image.fromarray(images[0]).resize((results_size, results_size))
orig_img1 = aligned_img_set[1].resize((results_size, results_size))
get_concat_h(orig_img1, recreated_img1)
images = generate_image_from_projected_latents(latent_codes[1])
recreated_img2 = Image.fromarray(images[0]).resize((results_size, results_size))
orig_img2 = aligned_img_set[0].resize((results_size, results_size))
get_concat_h(orig_img2, recreated_img2)
Note: we work with the full (1, 18, 512) latent codes, instead of just the base latent code (1, 512), as mentioned in this blog post by 5agado.
Now let's re-run the interpolation animation we did back in Section 3, but this time with our own projected latent codes!
def linear_interpolate(code1, code2, alpha):
return code1 * alpha + code2 * (1 - alpha)
def make_latent_interp_animation_real_faces(code1, code2, img1, img2, num_interps):
step_size = 1.0/num_interps
all_imgs = []
amounts = np.arange(0, 1, step_size)
for alpha in tqdm(amounts):
interpolated_latent_code = linear_interpolate(code1, code2, alpha)
images = generate_image_from_projected_latents(interpolated_latent_code)
interp_latent_image = Image.fromarray(images[0]).resize((400, 400))
frame = get_concat_h(img2, interp_latent_image)
frame = get_concat_h(frame, img1)
all_imgs.append(frame)
save_name = output_gifs_path/'projected_latent_space_traversal.gif'
all_imgs[0].save(save_name, save_all=True, append_images=all_imgs[1:], duration=1000/fps, loop=0)
make_latent_interp_animation_real_faces(latent_codes[0], latent_codes[1], recreated_img1, recreated_img2, num_interps=200)
The resulting animation is saved to output_gifs/projected_latent_space_traversal.gif :)
Time to be an astronaut and explore space! Well, the hidden (latent) kind of space. Alright... I admit, that joke was blatant. Sorry for the puns, I'm just trying to relate things together. Ok ok ok, you need some space, got it.
There are ways to learn latent directions (both supervised and unsupervised) in the latent space to control features. People have already open-sourced some directional latent vectors for StyleGAN2 that allow us to "move" in the latent space and control a particular feature.
- Supervised Method of learning these latent directions (a rough sketch in code follows this list):
"We first collect multiple samples (image + latent) from our model and manually classify the images for our target attribute (e.g. smiling VS not smiling), trying to guarantee proper class representation balance. We then train a model to classify or regress on our latents and manual labels. At this point we can use the learned functions of these support models as transition directions" - 5agado's Blog
- Unsupervised Method: Unsupervised Discovery of Interpretable Directions in the GAN Latent Space
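Here's a rough sketch of that supervised idea (this is not 5agado's actual code): fit a linear classifier on (latent, label) pairs and treat its weight vector as a latent direction. The latents and smile_labels arrays are assumed to already exist from the manual labeling step described in the quote.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Assumed inputs:
#   latents:      array of shape (N, 18, 512) sampled from the model
#   smile_labels: array of shape (N,) with 0 = not smiling, 1 = smiling
clf = LogisticRegression(max_iter=1000)
clf.fit(latents.reshape(len(latents), -1), smile_labels)

# The classifier's weight vector points "towards smiling" in the latent space.
smile_direction = clf.coef_.reshape(18, 512)
smile_direction /= np.linalg.norm(smile_direction)  # normalize so magnitude is meaningful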
To move in a latent direction we can do the following operation:
latent_code = latent_code + latent_direction * magnitude
- latent_code is our latent code, such as our recently optimized latent code!
- latent_direction is a learnt directional vector of shape (18, 512). This vector tells you where to move in the latent space to control a feature, but not how much to move.
- magnitude is the amount to move in the direction of latent_direction.
This means we can create more interpolations in the latent space! Yay, more animations :). Only this time, rather than interpolating between 2 points, we are slowly moving in a specific latent direction.
- Instead of mixing two latent codes together, we slowly add more magnitude to our base latent code, and observe how the output changes as the magnitude grows!
def get_control_latent_vectors(path):
files = [x for x in Path(path).iterdir() if str(x).endswith('.npy')]
latent_vectors = {f.name[:-4]:np.load(f) for f in files}
return latent_vectors
latent_controls = get_control_latent_vectors('stylegan2directions/')
len(latent_controls), latent_controls.keys(), latent_controls['age'].shape
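Before animating anything, here's the basic operation applied once: nudge one of our projected codes along the 'age' direction and regenerate the image. The magnitude of 3.0 is just an arbitrary choice for illustration.
# latent_codes[0] has shape (1, 18, 512); latent_controls['age'] has shape (18, 512) and broadcasts.
shifted_code = latent_codes[0] + latent_controls['age'] * 3.0   # 3.0 is an arbitrary magnitude
images = generate_image_from_projected_latents(shifted_code)
Image.fromarray(images[0]).resize((results_size, results_size))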
def make_latent_control_animation(feature, start_amount, end_amount, step_size, person):
all_imgs = []
amounts = np.linspace(start_amount, end_amount, int(abs(end_amount-start_amount)/step_size))
for amount_to_move in tqdm(amounts):
modified_latent_code = np.array(latent_code_to_use)
modified_latent_code += latent_controls[feature]*amount_to_move
images = generate_image_from_projected_latents(modified_latent_code)
latent_img = Image.fromarray(images[0]).resize((results_size, results_size))
all_imgs.append(get_concat_h(image_to_use, latent_img))
save_name = output_gifs_path/'{0}_{1}.gif'.format(person, feature)
all_imgs[0].save(save_name, save_all=True, append_images=all_imgs[1:], duration=1000/fps, loop=0)
latent_code_to_use = latent_codes[1]
image_to_use = recreated_img2
make_latent_control_animation(feature='age', start_amount=-10, end_amount=10, step_size=0.1, person='jeremy')
latent_code_to_use = latent_codes[0]
image_to_use = recreated_img1
make_latent_control_animation(feature='age', start_amount=-5, end_amount=5, step_size=0.1, person='obama')
Check out the resulting animations at output_gifs/<person>_<feature>.gif :)
def apply_latent_controls(self):
image_outputs = controller.children[0]
feature_sliders = controller.children[1]
slider_hboxes = feature_sliders.children[:-2]
latent_movements = [(x.children[1].value, x.children[0].value) for x in slider_hboxes]
modified_latent_code = np.array(latent_code_to_use)
for feature, amount_to_move in latent_movements:
modified_latent_code += latent_controls[feature]*amount_to_move
images = generate_image_from_projected_latents(modified_latent_code)
latent_img = Image.fromarray(images[0]).resize((400, 400))
latent_img_output = image_outputs.children[1]
with latent_img_output:
latent_img_output.clear_output()
display(latent_img)
def reset_latent_controls(self):
image_outputs = controller.children[0]
feature_sliders = controller.children[1]
slider_hboxes = feature_sliders.children[:-2]
for x in slider_hboxes:
x.children[0].value = 0
latent_img_output = image_outputs.children[1]
with latent_img_output:
latent_img_output.clear_output()
display(image_to_use)
def create_interactive_latent_controller():
orig_img_output = widgets.Output()
with orig_img_output:
orig_img_output.clear_output()
display(image_to_use)
latent_img_output = widgets.Output()
with latent_img_output:
latent_img_output.clear_output()
display(image_to_use)
image_outputs = widgets.VBox([orig_img_output, latent_img_output])
generate_button = widgets.Button(description='Generate', layout=widgets.Layout(width='75%', height='10%'))
generate_button.on_click(apply_latent_controls)
reset_button = widgets.Button(description='Reset Latent Controls', layout=widgets.Layout(width='75%', height='10%'))
reset_button.on_click(reset_latent_controls)
feature_sliders = []
for feature in latent_controls:
label = widgets.Label(feature)
slider = widgets.FloatSlider(min=-50, max=50)
feature_sliders.append(widgets.HBox([slider, label]))
feature_sliders.append(generate_button)
feature_sliders.append(reset_button)
feature_sliders = widgets.VBox(feature_sliders)
return widgets.HBox([image_outputs, feature_sliders])
latent_code_to_use = latent_codes[0]
image_to_use = recreated_img1
controller = create_interactive_latent_controller()
controller
latent_code_to_use = latent_codes[1]
image_to_use = recreated_img2
controller = create_interactive_latent_controller()
controller
Thanks for reading, and I hope you had as much fun playing/exploring the latent space as I did!