UpStage AI LAB - Image Generation Course Overview

Overview

Generation 01. Course Overview

In this course, we will cover everything from the fundamentals to the latest advancements in image generation. From earlier generative paradigms to the diffusion models currently attracting so much attention, you'll learn to understand and evaluate generative models, equipping you with the ability to choose the right model for your needs.

Key Learning Objectives:

  1. Understand various image generation models and their pros and cons.
  2. Develop the ability to evaluate these models and select the appropriate one for different scenarios.
  3. Gain skills in using text, structural information, and more to generate images.

Instructor’s Message: This course covers image generation models. By the end of this course, you’ll be able to differentiate between various image generation models, understand their strengths and weaknesses, and select the most suitable model for your applications. While some parts might seem challenging, persevere through to the end, and you’ll find that it’s not as difficult as it seems. I encourage you to complete the course and use what you’ve learned to achieve your goals.

Pre-Requisites:

  1. Essential Knowledge: Basic deep learning concepts, including optimization methods such as SGD, CNNs, and image classification.
  2. Additional Knowledge: Familiarity with U-Net and Transformers will be helpful for a deeper understanding of the latest generation models.

Detailed Curriculum

1. Overview of Generative Models

Chapter Introduction:
We will explore the evolution and applications of generative models, distinguishing them from discriminative models. This chapter will also introduce the theoretical foundation of maximum likelihood estimation.

Chapter Goals:
Understand the difference between discriminative and generative models, and learn about the foundational concept of maximum likelihood estimation.

Lecture Types and Details:

  • Lecture 1: Evolution of Generative Models
    Overview: Explore the development of image generation models from classical approaches to modern deep learning-based methods.
    Further Readings (Optional/Recommended):
    • The principles of Restricted Boltzmann Machines.
    • Notes on autoregressive models.
  • Lecture 2: Discriminative vs. Generative Models
    Overview: Understand the differences between discriminative and generative models, along with the unique challenges of generative models.

  • Lecture 3: Applications of Generative Models
    Overview: Explore various applications of generative models, such as image generation.

  • Lecture 4: Generative Models and Maximum Likelihood Estimation
    Overview: Discuss the differences between generative and discriminative models from a probabilistic perspective, and explore the foundational concept of maximum likelihood estimation.
    Further Readings (Optional/Recommended):
    • Relationship between MLE and Kullback-Leibler divergence minimization (a short derivation is sketched right after this list).
    • CS236 lecture notes on Maximum Likelihood Learning.
      Further Questions:
    • What is the difference between likelihood and probability?
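
For the MLE and KL divergence reading above, here is the short derivation it refers to, in my own notation (p_data is the data distribution, p_θ the model distribution):

```latex
D_{\mathrm{KL}}\left(p_{\mathrm{data}} \,\|\, p_\theta\right)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\log p_{\mathrm{data}}(x)\right]
  - \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\log p_\theta(x)\right]

\arg\min_\theta D_{\mathrm{KL}}\left(p_{\mathrm{data}} \,\|\, p_\theta\right)
  = \arg\max_\theta \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\log p_\theta(x)\right]
  \approx \arg\max_\theta \frac{1}{N} \sum_{i=1}^{N} \log p_\theta(x_i)
```

The first expectation does not depend on θ, so minimizing the KL divergence from the data distribution to the model is equivalent to maximizing the expected log-likelihood, which in practice is approximated by the average log-likelihood over the training set. This is exactly the maximum likelihood objective.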

2. Evaluation Metrics for Generative Models

Chapter Introduction:
Understand why evaluating generative models is challenging and explore various evaluation metrics.

Chapter Goals:
Grasp the various metrics used to evaluate generative models.

Lecture Types and Details:

  • Lecture 1: The Need for Evaluation Metrics in Generative Models
    Overview: Learn about the challenges of evaluating generative models and the importance of evaluation metrics by examining discriminative model metrics.
    Further Readings (Optional/Recommended):
    • Evaluation of GANs & Disadvantages and Bias of GANs.
  • Lecture 2: Evaluation Metrics (IS & FID)
    Overview: Dive into the Inception Score (IS) and Fréchet Inception Distance (FID), the representative metrics for generative models.
    Further Readings (Optional/Recommended):
    • A Note on the Inception Score.
      Further Questions:
    • How would you implement the Inception Score calculation? (Covered in the practice sessions; a minimal sketch also follows this list.)
  • Lecture 3: Evaluation Metrics (Precision & Recall)
    Overview: Explore precision & recall as evaluation metrics for generative models.
    Further Readings (Optional/Recommended):
    • Precision & Recall limitations.
  • Lecture 4: Evaluation Metrics for Conditional Models
    Overview: Learn about various metrics used to evaluate how well a generated image meets the given conditions.
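
As a starting point for the Inception Score question above, here is a minimal sketch of how IS can be computed with torchvision's pretrained Inception-v3. This is my own illustration rather than the course's practice code, and it assumes the generated images are already resized and normalized the way the Inception weights expect:

```python
import torch
import torch.nn.functional as F
from torchvision.models import inception_v3, Inception_V3_Weights

@torch.no_grad()
def inception_score(images, n_splits=10, batch_size=32, device="cuda"):
    """images: float tensor (N, 3, 299, 299), preprocessed for Inception-v3."""
    model = inception_v3(weights=Inception_V3_Weights.DEFAULT).to(device).eval()

    # Collect the class distribution p(y|x) for every generated image.
    probs = []
    for i in range(0, images.size(0), batch_size):
        logits = model(images[i:i + batch_size].to(device))
        probs.append(F.softmax(logits, dim=1).cpu())
    probs = torch.cat(probs, dim=0)  # (N, 1000)

    # IS = exp( E_x[ KL(p(y|x) || p(y)) ] ), averaged over several splits.
    scores = []
    for split in probs.chunk(n_splits, dim=0):
        split = split.clamp_min(1e-12)                        # avoid log(0)
        p_y = split.mean(dim=0, keepdim=True)                 # marginal p(y)
        kl = (split * (split.log() - p_y.log())).sum(dim=1)   # KL per image
        scores.append(kl.mean().exp())
    scores = torch.stack(scores)
    return scores.mean().item(), scores.std().item()
```

A high score requires both confident per-image predictions (low entropy of p(y|x)) and diverse classes overall (high entropy of p(y)), which is exactly what the KL term measures.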

3. Autoencoders and Variational Autoencoders

Chapter Introduction:
This chapter covers autoencoders and variational autoencoders (VAE), including practical implementation. We will also explore Vector Quantized Variational Autoencoders (VQ-VAE).

Chapter Goals:
Understand the principles of autoencoders and VAEs and be able to implement them using PyTorch.

Lecture Types and Details:

  • Lecture 1: Understanding Autoencoders
    Overview: Explore the structure and learning methods of standard and denoising autoencoders, along with their various applications.
    Further Readings (Optional/Recommended):
    • A comprehensive guide on autoencoders.
  • Lecture 2: Understanding Variational Autoencoders
    Overview: Delve into the theoretical background and learning methods of VAEs.
    Further Readings (Optional/Recommended):
    • Introduction to Variational Autoencoders by the VAE paper author.
      Further Questions:
    • How would you express the reconstruction error term if the decoder distribution is Bernoulli?
  • Practice Session 1: Autoencoder Implementation
    Overview: Implement an autoencoder, train it on the MNIST dataset, and visualize the reconstructed images and latent vectors.
    Further Readings (Optional/Recommended):
    • Implementing an Autoencoder in PyTorch.
  • Practice Session 2: Variational Autoencoder Implementation
    Overview: Understand the reparameterization trick and Kullback-Leibler divergence, and implement a VAE. Generate images using Fashion-MNIST. (A minimal sketch of these pieces follows this list.)
    Further Readings (Optional/Recommended):
    • KL divergence between two multivariate normal distributions.
  • Lecture 3: Vector Quantized Variational Autoencoder
    Overview: Understand the theoretical background of VQ-VAE, used in latent diffusion models (LDM).
    Further Readings (Optional/Recommended):
    • VQ-VAE PyTorch code explanation.
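
Before the VAE practice session above, it can help to see the pieces it asks about in isolation: the reparameterization trick, the closed-form KL term against a standard normal prior, and the Bernoulli (binary cross-entropy) reconstruction term from the further question in Lecture 2. This is a minimal sketch with my own naming, not the course notebook:

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), keeping gradients w.r.t. mu and logvar."""
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + std * eps

def vae_loss(x, x_recon_logits, mu, logvar):
    """Negative ELBO with a Bernoulli decoder.

    Reconstruction term: binary cross-entropy on pixel values in [0, 1].
    KL term: KL(q(z|x) || N(0, I)) = -0.5 * sum(1 + logvar - mu^2 - exp(logvar)).
    """
    recon = F.binary_cross_entropy_with_logits(x_recon_logits, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```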

4. Generative Adversarial Networks (GANs)

Chapter Introduction:
Explore GANs, one of the most popular generative models, and implement them. We will also examine various conditional generative models and implement models like pix2pix and CycleGAN.

Chapter Goals:
Understand GANs and conditional generative models and implement them using PyTorch.

Lecture Types and Details:

  • Lecture 1: Introduction to GANs
    Overview: Explore the theoretical background of GANs, which have driven significant interest in image generation.
    Further Readings (Optional/Recommended):
    • GANs explained in detail.
  • Lecture 2: Conditional Generative Models
    Overview: Learn about various methods to control the output of GANs using conditional vectors.
    Further Questions:
    • How would you add conditional vectors to the generator and discriminator in a Conditional GAN? (Covered in the practice sessions; a minimal sketch also follows this list.)
  • Practice Session 1: GAN Implementation
    Overview: Implement the DCGAN (Deep Convolutional GAN) architecture and evaluate the trained GAN using the Inception Score.
    Further Readings (Optional/Recommended):
    • DCGAN PyTorch tutorial.
  • Lecture 3: Image Manipulation Using Conditional Generative Models
    Overview: Introduce various techniques for manipulating images using conditional generative models, such as image-to-image and text-to-image.
    Further Readings (Optional/Recommended):
    • CycleGAN official project page.
  • Practice Session 2: Pix2Pix and CycleGAN Implementation
    Overview: Implement Pix2Pix and CycleGAN and observe the results of conditional generation.
    Further Readings (Optional/Recommended):
    • Pix2Pix for image-to-image translation using conditional GANs.
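
As a hint for the Conditional GAN question above, the usual recipe is to embed the class label and concatenate it with the generator's noise vector and with the discriminator's input, so both networks see the condition. The sketch below uses small MLPs and made-up sizes purely for illustration; the practice session uses its own architecture:

```python
import torch
import torch.nn as nn

class CondGenerator(nn.Module):
    def __init__(self, z_dim=100, n_classes=10, emb_dim=10, img_dim=28 * 28):
        super().__init__()
        self.embed = nn.Embedding(n_classes, emb_dim)
        self.net = nn.Sequential(
            nn.Linear(z_dim + emb_dim, 256), nn.ReLU(),
            nn.Linear(256, img_dim), nn.Tanh(),
        )

    def forward(self, z, labels):
        # Condition the generator by concatenating the label embedding with the noise.
        return self.net(torch.cat([z, self.embed(labels)], dim=1))

class CondDiscriminator(nn.Module):
    def __init__(self, n_classes=10, emb_dim=10, img_dim=28 * 28):
        super().__init__()
        self.embed = nn.Embedding(n_classes, emb_dim)
        self.net = nn.Sequential(
            nn.Linear(img_dim + emb_dim, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1),  # real/fake logit
        )

    def forward(self, x, labels):
        # The discriminator judges the image together with the same label embedding.
        return self.net(torch.cat([x, self.embed(labels)], dim=1))
```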

5. Diffusion Models

Chapter Introduction:
This chapter explores the theoretical background of early diffusion models and various improvements. We will also examine latent diffusion models and implement text-to-image generation using Stable Diffusion. Finally, we will learn about personalization techniques to guide diffusion models towards desired outputs.

Chapter Goals:
Understand diffusion models and latent diffusion models, and perform text-to-image generation using pre-trained Stable Diffusion.

Lecture Types and Details:

  • Lecture 1: Understanding Diffusion Models
    Overview: Explore the theoretical background of diffusion probabilistic models (DPMs), which are known for their strong generative capabilities.
    Further Readings (Optional/Recommended):
    • In-depth guide to diffusion models.
  • Lecture 2: Improving Diffusion Models
    Overview: Understand various improvements in diffusion models, such as accelerated generation and conditional generation.
    Further Readings (Optional/Recommended):
    • Implementation of Classifier-Free Guidance in PyTorch.
  • Lecture 3: Latent Diffusion Models
    Overview: Learn about latent diffusion models (LDM) used in various applications, including text-to-image.
    Further Readings (Optional/Recommended):
    • Stable Diffusion explained.
  • Practice Session 1: Text-to-Image Implementation
    Overview: Implement text-to-image generation with Stable Diffusion, a latent diffusion model (LDM), and walk through the process step by step. (A minimal pipeline sketch follows this list.)
    Further Readings (Optional/Recommended):
    • Stable Diffusion source code on Hugging Face.
  • Lecture 4: Personalization in Diffusion Models
    Overview: Understand various personalization techniques to achieve desired outputs from diffusion models.
    Further Readings (Optional/Recommended):
    • Comparison of personalization techniques like LoRA, Dreambooth, Textual Inversion, and Hypernetworks.
  • Practice Session 2: Personalization of Diffusion Models
    Overview: Explore how to personalize a Stable Diffusion model using LoRA.
    Further Readings (Optional/Recommended):
    • Textual Inversion and Dreambooth tutorials on Hugging Face.
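
For the text-to-image practice session above, the quickest way to see the full pipeline (text encoder → denoising in latent space → VAE decoder) before implementing the pieces yourself is Hugging Face's diffusers library. The model ID and prompt below are just examples, and a GPU is assumed:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained Stable Diffusion (latent diffusion) pipeline.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The prompt is encoded by the text encoder, denoising runs in latent space,
# and the VAE decoder maps the final latent back to pixels.
image = pipe(
    "a watercolor painting of a lighthouse at sunset",
    num_inference_steps=30,   # fewer steps = faster, lower fidelity
    guidance_scale=7.5,       # classifier-free guidance strength
).images[0]
image.save("lighthouse.png")
```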

Additional Personal Practice: Variants of GANs

Variants of GANs and Their Evolution:

  • CGAN (Conditional GAN) (2014):
    CGAN allows for controlled data generation based on input conditions, making it highly useful for applications requiring specific outputs.

  • DCGAN (Deep Convolutional GAN) (2015):
    DCGAN introduced Convolutional Neural Networks (CNNs) to improve image generation. It remains a significant advancement in GANs, offering stable training and improved image quality.

  • WGAN (Wasserstein GAN) (2017):
    WGAN addressed GAN training instability by using the Wasserstein distance as the loss function, leading to more stable training.

  • WGAN-GP (Wasserstein GAN with Gradient Penalty) (2017):
    WGAN-GP improved upon WGAN by introducing a gradient penalty to enforce Lipschitz continuity, further stabilizing training.

  • LSGAN (Least Squares GAN) (2017):
    LSGAN improved training stability and image quality by replacing the standard GAN loss with a least-squares loss.

  • CycleGAN (2017):
    CycleGAN enabled image-to-image translation without paired data, making it possible to transform images between different domains.

  • StarGAN (2018):
    StarGAN extended the capabilities of GANs to perform image translation across multiple domains using a single model.

  • TimeGAN (2019):
    TimeGAN adapted GANs to generate time-series data, effectively capturing the temporal patterns within sequential data.

Each of these GAN variants addressed specific challenges or extended the capabilities of GANs, making them applicable to a broader range of data generation tasks. As one concrete example, the WGAN-GP gradient penalty is sketched below.
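
This is a minimal sketch of the gradient penalty: interpolate between real and fake samples, then push the norm of the critic's gradient at those points toward 1. Function and variable names are my own:

```python
import torch

def gradient_penalty(critic, real, fake, device="cuda"):
    """WGAN-GP term: E[(||grad_x critic(x_hat)||_2 - 1)^2] on interpolated samples."""
    batch_size = real.size(0)
    eps = torch.rand(batch_size, 1, 1, 1, device=device)  # one mixing weight per sample
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)

    d_hat = critic(x_hat)
    grads = torch.autograd.grad(
        outputs=d_hat, inputs=x_hat,
        grad_outputs=torch.ones_like(d_hat),
        create_graph=True, retain_graph=True,
    )[0]
    grad_norm = grads.view(batch_size, -1).norm(2, dim=1)
    return ((grad_norm - 1) ** 2).mean()

# Typical critic loss:
# loss_D = critic(fake).mean() - critic(real).mean() + 10.0 * gradient_penalty(critic, real, fake)
```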
