[Paper Summary] DDIM (ICLR 2021) "Denoising Diffusion Implicit Models"

Paper Information

Citations: 225 as of Sunday, 2022.11.27 / 1395 as of Sunday, 2023.10.08

Authors

Jiaming Song, Chenlin Meng, Stefano Ermon - Stanford University

Paper Links

Official

https://openreview.net/forum?id=St1giarCHLP


arXiv

https://arxiv.org/abs/2010.02502


Paper Summary

Abstract

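To sample much faster than DDPM, which must simulate a Markov chain for many steps, DDIM uses a non-Markovian process.

The resulting generative process can be deterministic, and it produces high-quality samples roughly 10x to 50x faster (in wall-clock time) than DDPM.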

0. Overview (before the details)

By replacing DDPM's Markovian forward process with a non-Markovian one, DDIM keeps the same training objective as DDPM, while sampling can use a short trajectory of far fewer than T steps, which makes it fast.

Timeline: this paper appeared after DDPM and NCSN.

The score-based generative modeling paper (Score SDE) appeared in 2020.11.

1. Introduction

Recently, DDPM has shown promise as an alternative to GANs.

The Markov chain processes used by diffusion models generate samples either via Langevin dynamics (NCSN) or by reversing a forward diffusion process (DDPM).

DDPM: https://aigong.tistory.com/589

NCSN: https://aigong.tistory.com/594

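The drawback is that generating high-quality samples requires many iterations, which makes these models far slower than GANs.

On an Nvidia 2080 Ti, DDPM takes about 20 hours to generate 50k images at 32x32, and nearly 1000 hours (about 41.7 days) for 50k images at 256x256.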

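Section 3: To address this, the paper proposes DDIM, which generalizes DDPM's Markovian forward process to a non-Markovian one while training with the same objective function. The resulting variational training objectives share a surrogate objective that is exactly the one used to train DDPM.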

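Section 5: The experiments demonstrate three advantages:

1) With accelerated sampling (fewer steps), DDIM achieves better sample quality than DDPM.

2) DDIM has a consistency property: samples generated from the same initial latent $x_T$ share high-level features regardless of the length of the generative trajectory.

3) Thanks to this consistency, DDIM can perform semantically meaningful image interpolation directly in the latent space.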

2. Background

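DDPM takes the form shown on the left of Figure 1.

For the reverse process, the model is trained so that the model distribution $p_{\theta}(x_0)$ approximates the data distribution $q(x_0)$.

For details, see the DDPM paper summary linked above (it includes some equation derivations).

To learn the parameter $\theta$, the variational lower bound is maximized.

The forward process, which uses a fixed sequence of scales, can be written as follows.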
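For reference, the forward process in the paper's notation (Eq. 3) is reproduced below; $\alpha_t$ is a fixed, decreasing sequence in $(0, 1]$, and $q(x_t|x_0)$ follows by marginalizing the chain.

$$q(x_{1:T} | x_0) := \prod_{t=1}^{T} q(x_t | x_{t-1}), \qquad q(x_t | x_{t-1}) := \mathcal{N}\left(\sqrt{\frac{\alpha_t}{\alpha_{t-1}}}\, x_{t-1}, \left(1 - \frac{\alpha_t}{\alpha_{t-1}}\right) I\right)$$

$$q(x_t | x_0) := \int q(x_{1:t} | x_0)\, dx_{1:(t-1)} = \mathcal{N}\left(x_t; \sqrt{\alpha_t}\, x_0, (1-\alpha_t) I\right)$$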

cf)

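$q(x_t | x_{t-1})$ is written differently from DDPM, where it is $\mathcal{N}(x_t; \sqrt{1-\beta_t}\, x_{t-1}, \beta_t I)$, i.e. $\mathcal{N}(x_t; \sqrt{\alpha_t}\, x_{t-1}, (1-\alpha_t) I)$ in DDPM's own $\alpha_t = 1-\beta_t$.

The difference is only notational: DDIM's $\alpha_t$ denotes DDPM's cumulative product $\bar{\alpha}_t = \prod_{s=1}^{t}(1-\beta_s)$, so $1-\beta_t = \alpha_t/\alpha_{t-1}$, and DDIM's $\mathcal{N}(x_t; \sqrt{\alpha_t/\alpha_{t-1}}\, x_{t-1}, (1-\alpha_t/\alpha_{t-1}) I)$ is the same distribution.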

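Consequently, the forward process conditioned on $x_0$ has the closed form $q(x_t | x_0) = \mathcal{N}(x_t; \sqrt{\alpha_t}\, x_0, (1-\alpha_t) I)$.

Written as a linear combination of $x_0$ and a noise variable, i.e. with the reparameterization trick (as used in VAEs), this becomes $x_t = \sqrt{\alpha_t}\, x_0 + \sqrt{1-\alpha_t}\, \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$ (Eq. 4).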
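A minimal NumPy sketch of Eq. 4, assuming DDPM's linear schedule ($T = 1000$, $\beta_t$ from $10^{-4}$ to $0.02$) and building DDIM's $\alpha_t$ as the cumulative product $\prod_{s \le t}(1-\beta_s)$. The helper name `q_sample` is hypothetical. The script checks that sampled $x_t$ match the mean and variance of $q(x_t|x_0)$, and that $\alpha_t/\alpha_{t-1} = 1-\beta_t$.

```python
import numpy as np

# Assumed DDPM linear schedule: T = 1000, beta_t from 1e-4 to 0.02.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
# DDIM's alpha_t is DDPM's cumulative product; alphas[t - 1] holds alpha_t.
alphas = np.cumprod(1.0 - betas)

def q_sample(x0, t, rng):
    """Draw x_t ~ q(x_t | x_0) as x_t = sqrt(alpha_t) x_0 + sqrt(1 - alpha_t) eps (Eq. 4)."""
    a_t = alphas[t - 1]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a_t) * x0 + np.sqrt(1.0 - a_t) * eps, eps

rng = np.random.default_rng(0)
x0 = np.full(200_000, 0.7)          # many copies of one scalar "pixel"
xt, _ = q_sample(x0, t=500, rng=rng)
print(xt.mean(), np.sqrt(alphas[499]) * 0.7)   # empirical vs. analytic mean
print(xt.var(), 1.0 - alphas[499])             # empirical vs. analytic variance
print(np.allclose(alphas[1:] / alphas[:-1], 1.0 - betas[1:]))  # 1 - beta_t = alpha_t / alpha_{t-1}
```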

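Simplifying Eq. 2 gives the following loss (Eq. 5), where $\gamma = [\gamma_1, \ldots, \gamma_T]$ is a vector of positive weights: $L_\gamma(\epsilon_\theta) := \sum_{t=1}^{T} \gamma_t \mathbb{E}_{x_0 \sim q(x_0), \epsilon_t \sim \mathcal{N}(0, I)}\left[\lVert \epsilon_\theta^{(t)}(\sqrt{\alpha_t}\, x_0 + \sqrt{1-\alpha_t}\, \epsilon_t) - \epsilon_t \rVert_2^2\right]$

Appendix C.2 gives the details of simplifying Eq. 2 into Eq. 5.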
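A Monte Carlo sketch of $L_\gamma$ under the same assumed schedule. `l_gamma` and the `eps_model(x_t, t)` callable are hypothetical stand-ins for a training step and a noise-prediction network; the sum over $t$ is estimated by sampling $t$ uniformly and multiplying by $T$, which keeps the estimate unbiased.

```python
import numpy as np

T = 1000
alphas = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))  # DDIM's alpha_t (assumed schedule)

def l_gamma(eps_model, x0_batch, gamma, rng, n_t=64):
    """Monte Carlo estimate of L_gamma = sum_t gamma_t E||eps_theta^(t)(x_t) - eps_t||^2.

    x0_batch has shape (batch, d); eps_model(x_t, t) is a hypothetical noise predictor.
    """
    total = 0.0
    for _ in range(n_t):
        t = int(rng.integers(1, T + 1))                 # t ~ Uniform{1, ..., T}
        a_t = alphas[t - 1]
        eps = rng.standard_normal(x0_batch.shape)       # eps_t ~ N(0, I)
        x_t = np.sqrt(a_t) * x0_batch + np.sqrt(1.0 - a_t) * eps   # Eq. 4
        sq_err = np.sum((eps_model(x_t, t) - eps) ** 2, axis=-1)   # squared L2 error per sample
        total += gamma[t - 1] * sq_err.mean()
    return T * total / n_t                              # unbiased for the sum over t

def zero_model(x_t, t):
    """A trivial 'network' that always predicts zero noise."""
    return np.zeros_like(x_t)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((1000, 4))
print(l_gamma(zero_model, x0, np.ones(T), rng))   # close to T * d = 4000
```

With a model that always predicts zero, each term equals $\mathbb{E}\lVert\epsilon\rVert^2 = d$, so $L_1 \approx T d$; a model that recovers the true noise drives the loss to 0.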

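DDPM optimizes this objective with $\gamma = 1$ (i.e. $L_1$).

The score-matching objective of NCSN is also equivalent to this objective.

However, the biggest problem is that sampling is very slow because of the long chain of T steps (T = 1000 in DDPM)!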

3. Variational Inference for Non-Markovian forward processes

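Same marginals: the DDPM objective $L_\gamma$ depends only on the marginals $q(x_t | x_0)$, not directly on the joint $q(x_{1:T} | x_0)$, and many inference distributions (joints) share the same marginals.

DDIM therefore uses alternative, non-Markovian inference processes, which lead to new generative processes.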

Appendix A

3.1 Non-Markovian forward processes

cf) DDIM defines the inference process as in Eq. (6), whereas DDPM defines it as a Markov chain:

$q(x_{1:T}|x_0) = \prod_{t=1}^T q(x_t | x_{t-1})$
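For comparison, the DDIM inference family indexed by $\sigma \in \mathbb{R}_{\ge 0}^T$ (Eq. 6 and 7 of the paper) is:

$$q_\sigma(x_{1:T} | x_0) := q_\sigma(x_T | x_0) \prod_{t=2}^{T} q_\sigma(x_{t-1} | x_t, x_0), \qquad q_\sigma(x_T | x_0) = \mathcal{N}(\sqrt{\alpha_T}\, x_0, (1-\alpha_T) I)$$

$$q_\sigma(x_{t-1} | x_t, x_0) = \mathcal{N}\left(\sqrt{\alpha_{t-1}}\, x_0 + \sqrt{1-\alpha_{t-1}-\sigma_t^2} \cdot \frac{x_t - \sqrt{\alpha_t}\, x_0}{\sqrt{1-\alpha_t}},\ \sigma_t^2 I\right)$$

The mean is chosen so that $q_\sigma(x_t|x_0) = \mathcal{N}(\sqrt{\alpha_t}\, x_0, (1-\alpha_t) I)$ holds for every $t$ (Lemma 1).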

Forward process (via Bayes' rule)

Unlike in DDPM, this forward process is no longer Markovian, since each $x_t$ can depend on both $x_{t-1}$ and $x_0$.
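Written out, the forward process obtained from Bayes' rule (Eq. 8 of the paper) is:

$$q_\sigma(x_t | x_{t-1}, x_0) = \frac{q_\sigma(x_{t-1} | x_t, x_0)\, q_\sigma(x_t | x_0)}{q_\sigma(x_{t-1} | x_0)}$$

It is still Gaussian, but each $x_t$ is conditioned on $x_0$ as well as $x_{t-1}$.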

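The magnitude of $\sigma$ controls how stochastic the forward process is; as $\sigma \to 0$ it becomes deterministic.

In the extreme case $\sigma \to 0$, as long as $x_0$ and $x_t$ are observed for some $t$, $x_{t-1}$ becomes known and fixed.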

Appendix B - Lemma 1
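A numerical sanity check of Lemma 1, which states that $q_\sigma(x_t|x_0) = \mathcal{N}(\sqrt{\alpha_t}\, x_0, (1-\alpha_t) I)$ for every $t$. The sketch samples one step of Eq. 7 for several values of $\sigma_t$ (the schedule is the assumed DDPM linear one; `sample_prev` is a hypothetical helper) and checks that the marginal of $x_{t-1}$ stays the same, while $\sigma_t = 0$ makes the step deterministic.

```python
import numpy as np

T = 1000
alphas = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))  # assumed DDPM linear schedule

def sample_prev(x_t, x0, t, sigma_t, rng):
    """Draw x_{t-1} ~ q_sigma(x_{t-1} | x_t, x_0) from Eq. 7 (requires t >= 2)."""
    a_t, a_prev = alphas[t - 1], alphas[t - 2]
    eps_dir = (x_t - np.sqrt(a_t) * x0) / np.sqrt(1.0 - a_t)   # noise implied by (x_t, x_0)
    mean = np.sqrt(a_prev) * x0 + np.sqrt(1.0 - a_prev - sigma_t**2) * eps_dir
    return mean + sigma_t * rng.standard_normal(x_t.shape)

rng = np.random.default_rng(0)
t, x0_val = 300, 1.5
x0 = np.full(200_000, x0_val)
x_t = np.sqrt(alphas[t - 1]) * x0 + np.sqrt(1.0 - alphas[t - 1]) * rng.standard_normal(x0.shape)

# Lemma 1 target for x_{t-1}: N(sqrt(alpha_{t-1}) x_0, 1 - alpha_{t-1}), whatever sigma_t is
target_mean, target_var = np.sqrt(alphas[t - 2]) * x0_val, 1.0 - alphas[t - 2]
results = {s: sample_prev(x_t, x0, t, s, rng) for s in (0.0, 0.05, 0.1)}
for s, x_prev in results.items():
    print(s, x_prev.mean() - target_mean, x_prev.var() - target_var)  # both near 0
```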

3.2 Generative process and unified variational inference objective

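The trainable generative process $p_\theta(x_{0:T})$ is defined so that each $p_\theta^{(t)}(x_{t-1} | x_t)$ leverages knowledge of the reverse conditional distribution $q_\sigma(x_{t-1} | x_t, x_0)$.

Intuition: given a noisy observation $x_t$, first predict the corresponding $x_0$, then use that prediction to sample $x_{t-1}$ through $q_\sigma(x_{t-1} | x_t, x_0)$.

For $x_0 \sim q(x_0)$ and $\epsilon_t \sim \mathcal{N}(0, I)$, $x_t$ is obtained from $x_0$ via Eq. 4. The model $\epsilon_\theta^{(t)}(x_t)$ then predicts $\epsilon_t$ from $x_t$ without knowledge of $x_0$.

Rearranging Eq. 4 accordingly:

Prediction of $x_0$ from $x_t$ (the denoised observation, Eq. 9): $f_\theta^{(t)}(x_t) := (x_t - \sqrt{1-\alpha_t}\, \epsilon_\theta^{(t)}(x_t)) / \sqrt{\alpha_t}$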
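A small check of the denoised observation: if the noise prediction is exact, $f_\theta^{(t)}$ recovers $x_0$ exactly. `predict_x0` is a hypothetical helper name, and the value of $\alpha_t$ is arbitrary.

```python
import numpy as np

def predict_x0(x_t, eps_pred, alpha_t):
    """Denoised observation f_theta^(t)(x_t) = (x_t - sqrt(1 - alpha_t) * eps_pred) / sqrt(alpha_t)."""
    return (x_t - np.sqrt(1.0 - alpha_t) * eps_pred) / np.sqrt(alpha_t)

rng = np.random.default_rng(0)
x0 = rng.standard_normal(8)
eps = rng.standard_normal(8)
alpha_t = 0.3                                                # arbitrary alpha_t in (0, 1)
x_t = np.sqrt(alpha_t) * x0 + np.sqrt(1.0 - alpha_t) * eps   # forward sample, Eq. 4
print(np.allclose(predict_x0(x_t, eps, alpha_t), x0))        # True when eps_pred is the true noise
```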

Generative process
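Concretely (Eq. 10 of the paper), with a fixed prior $p_\theta(x_T) = \mathcal{N}(0, I)$, the generative process plugs the predicted $x_0$, $f_\theta^{(t)}(x_t) = (x_t - \sqrt{1-\alpha_t}\, \epsilon_\theta^{(t)}(x_t))/\sqrt{\alpha_t}$, into $q_\sigma$ in place of $x_0$:

$$p_\theta^{(t)}(x_{t-1} | x_t) = \begin{cases} \mathcal{N}\left(f_\theta^{(1)}(x_1), \sigma_1^2 I\right) & \text{if } t = 1 \\ q_\sigma\left(x_{t-1} | x_t, f_\theta^{(t)}(x_t)\right) & \text{otherwise} \end{cases}$$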

Variational Inference objective
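The parameters $\theta$ are optimized with the following variational objective (Eq. 11 of the paper):

$$J_\sigma(\epsilon_\theta) := \mathbb{E}_{x_{0:T} \sim q_\sigma(x_{0:T})}\left[\log q_\sigma(x_{1:T} | x_0) - \log p_\theta(x_{0:T})\right]$$

$$= \mathbb{E}_{x_{0:T} \sim q_\sigma(x_{0:T})}\left[\log q_\sigma(x_T | x_0) + \sum_{t=2}^{T} \log q_\sigma(x_{t-1} | x_t, x_0) - \sum_{t=1}^{T} \log p_\theta^{(t)}(x_{t-1} | x_t) - \log p_\theta(x_T)\right]$$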

Theorem 1

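For all $\sigma > 0$, there exist $\gamma \in \mathbb{R}^T_{>0}$ and $C \in \mathbb{R}$ such that $J_\sigma = L_\gamma + C$. That is, $J_\sigma$ is equivalent to $L_\gamma$ (Eq. 5) for some fixed $\gamma$.

Therefore DDIM's objective can be replaced by the DDPM objective. In particular, if the parameters of $\epsilon_\theta^{(t)}$ are not shared across different $t$, the optimum of $L_\gamma$ does not depend on $\gamma$, so $L_1$ ($\gamma = 1$, the DDPM objective) serves as a surrogate for every $J_\sigma$.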

4. Sampling from generalized generative processes

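With $L_1$ as the objective, a single model learns generative processes not only for the Markovian inference process but also for many non-Markovian ones parameterized by $\sigma$; this section looks for the choices of $\sigma$ that produce better samples.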
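From $p_\theta(x_{1:T})$, a sample $x_{t-1}$ is generated from $x_t$ by the following update (Eq. 12 of the paper, with the convention $\alpha_0 := 1$ and $\epsilon_t \sim \mathcal{N}(0, I)$ independent of $x_t$):

$$x_{t-1} = \sqrt{\alpha_{t-1}} \underbrace{\left(\frac{x_t - \sqrt{1-\alpha_t}\, \epsilon_\theta^{(t)}(x_t)}{\sqrt{\alpha_t}}\right)}_{\text{predicted } x_0} + \underbrace{\sqrt{1-\alpha_{t-1}-\sigma_t^2} \cdot \epsilon_\theta^{(t)}(x_t)}_{\text{direction pointing to } x_t} + \underbrace{\sigma_t \epsilon_t}_{\text{random noise}}$$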

4.1 Denoising Diffusion Implicit Models (DDIM)

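Different choices of $\sigma$ yield different generative processes, all using the same trained model $\epsilon_\theta$, so no retraining is needed.

With the following choice for all $t$, the forward process becomes Markovian and the generative process becomes DDPM:

$\sigma_t = \sqrt{{1-\alpha_{t-1} \over 1-\alpha_t}} \cdot \sqrt{1-{\alpha_t \over \alpha_{t-1}}}$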
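In its experiments the paper interpolates between the two extremes with a scale $\eta \ge 0$: $\sigma_t(\eta) = \eta \sqrt{(1-\alpha_{t-1})/(1-\alpha_t)} \sqrt{1-\alpha_t/\alpha_{t-1}}$, so $\eta = 1$ is DDPM and $\eta = 0$ is DDIM (the paper writes it over the subsequence $\tau$ of Section 4.2; consecutive steps are used here). The sketch below, with the assumed linear schedule and a hypothetical `ddim_sigma` helper, checks that $\eta = 1$ reproduces DDPM's posterior variance $\tilde{\beta}_t = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\beta_t$.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # assumed DDPM linear schedule
alphas = np.cumprod(1.0 - betas)         # DDIM's alpha_t (DDPM's alpha-bar_t)

def ddim_sigma(alpha_t, alpha_prev, eta):
    """sigma_t(eta): eta = 1 gives the DDPM choice, eta = 0 gives deterministic DDIM."""
    return eta * np.sqrt((1.0 - alpha_prev) / (1.0 - alpha_t)) * np.sqrt(1.0 - alpha_t / alpha_prev)

a_t, a_prev = alphas[1:], alphas[:-1]    # pairs (alpha_t, alpha_{t-1}) for t = 2..T
sigma_ddpm = ddim_sigma(a_t, a_prev, eta=1.0)
beta_tilde = (1.0 - a_prev) / (1.0 - a_t) * betas[1:]   # DDPM posterior variance
print(np.allclose(sigma_ddpm**2, beta_tilde))            # True
print(np.all(ddim_sigma(a_t, a_prev, eta=0.0) == 0.0))   # True
```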

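When $\sigma_t = 0$ for all $t$, the forward process becomes deterministic given $x_{t-1}$ and $x_0$, except for $t = 1$.

That is, in the generative process (Eq. 12), the coefficient of the random noise term $\epsilon_t$ becomes 0, so a sample is obtained from $x_T$ by a fixed, deterministic procedure.

The resulting model is an implicit probabilistic model, in which samples are generated from latent variables by a fixed procedure, trained with the DDPM objective; hence the name DDIM (Denoising Diffusion Implicit Model).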

4.2 Accelerated generation processes

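In DDPM, a forward process of T steps forces the generative (reverse) process to also take T steps.

However, as long as $q_\sigma(x_t | x_0)$ is fixed, the denoising objective $L_1$ does not depend on the specific forward procedure, so forward processes with fewer than T steps can also be considered. This accelerates the corresponding generative process.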

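Assumption: the forward process is defined not on all latent variables $x_{1:T}$ but on a subset $\{x_{\tau_1}, \ldots, x_{\tau_S}\}$, where $\tau$ is an increasing subsequence of $[1, \ldots, T]$ of length $S$.

The sequential forward process over $x_{\tau_1}, \ldots, x_{\tau_S}$ is defined so that its marginals match the original ones:

$q(x_{\tau_i} | x_0) = \mathcal{N}(\sqrt{\alpha_{\tau_i}}\, x_0, (1-\alpha_{\tau_i}) \mathbf{I})$

Sampling trajectory: the generative process samples the latent variables in the order of reversed($\tau$).

When the length $S$ of this trajectory is much smaller than T, computational efficiency improves significantly (sampling becomes faster).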
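A minimal sketch of accelerated sampling along reversed($\tau$), assuming the linear spacing $\tau_i = \lfloor c\, i \rfloor$ with $c = T/S$ (the paper also uses a quadratic spacing for some datasets), the convention $\alpha_0 := 1$, and the $\eta$-scaled $\sigma$ from the paper's experiments. `ddim_sample`, `alpha` and the `eps_model` callable are hypothetical names. Each step applies the DDIM update (predicted $x_0$, direction pointing to $x_t$, optional noise) between consecutive elements of $\tau$ rather than consecutive integers.

```python
import numpy as np

T = 1000
alphas = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))  # assumed DDPM linear schedule

def alpha(t):
    """DDIM's alpha_t with the convention alpha_0 := 1."""
    return 1.0 if t == 0 else float(alphas[t - 1])

def ddim_sample(eps_model, x_T, num_steps, eta=0.0, rng=None):
    """Sample along reversed(tau) with tau_i = floor(i * T / S), i = 1..S (so tau_S = T)."""
    rng = np.random.default_rng() if rng is None else rng
    tau = [(i * T) // num_steps for i in range(1, num_steps + 1)]
    x = x_T
    for i in reversed(range(num_steps)):
        t, t_prev = tau[i], (tau[i - 1] if i > 0 else 0)
        a_t, a_prev = alpha(t), alpha(t_prev)
        eps = eps_model(x, t)
        x0_pred = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)              # predicted x_0
        sigma = eta * np.sqrt((1.0 - a_prev) / (1.0 - a_t)) * np.sqrt(1.0 - a_t / a_prev)
        direction = np.sqrt(max(1.0 - a_prev - sigma**2, 0.0)) * eps        # points to x_t
        noise = sigma * rng.standard_normal(x.shape) if sigma > 0 else 0.0
        x = np.sqrt(a_prev) * x0_pred + direction + noise
    return x

# Toy usage: data is a point mass at c, so the optimal noise predictor is known in closed form.
c = 0.5
def point_mass_eps(x_t, t):
    return (x_t - np.sqrt(alpha(t)) * c) / np.sqrt(1.0 - alpha(t))

x_T = np.random.default_rng(0).standard_normal(16)
print(ddim_sample(point_mass_eps, x_T, num_steps=10))  # every entry is approximately 0.5
```

In the toy case the predicted $x_0$ is always $c$, so 10 steps instead of 1000 land on the data exactly; with a real network the quality/speed trade-off depends on $S$ and $\eta$.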

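Appendix C.1

DDIM uses the same training procedure as DDPM; in principle, a model can be trained with an arbitrary number of forward steps while the generative process samples from only some of them.

Therefore the trained model can consider many more steps than DDPM, or even a continuous time variable $t$.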

4.3 Relevance to neural ODEs

Rewriting the DDIM iterate of Eq. 12 makes its similarity to Euler integration for solving ODEs (ordinary differential equations) apparent.
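Dividing the deterministic update ($\sigma_t = 0$) by $\sqrt{\alpha_{t-1}}$ and writing the step as $\Delta t$ gives the paper's Eq. 13:

$$\frac{x_{t-\Delta t}}{\sqrt{\alpha_{t-\Delta t}}} = \frac{x_t}{\sqrt{\alpha_t}} + \left(\sqrt{\frac{1-\alpha_{t-\Delta t}}{\alpha_{t-\Delta t}}} - \sqrt{\frac{1-\alpha_t}{\alpha_t}}\right) \epsilon_\theta^{(t)}(x_t)$$

Reparameterizing with $\sigma = \sqrt{1-\alpha}/\sqrt{\alpha}$ (a new use of the symbol, not the $\sigma_t$ above) and $\bar{x} = x/\sqrt{\alpha}$, so that $x = \bar{x}/\sqrt{\sigma^2+1}$, this is an Euler step of the ODE (Eq. 14):

$$d\bar{x}(t) = \epsilon_\theta^{(t)}\left(\frac{\bar{x}(t)}{\sqrt{\sigma^2+1}}\right) d\sigma(t)$$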

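Unlike DDPM, DDIM can encode an observation $x_0$ into a latent $x_T$ by simulating this ODE in the reverse direction, which can be useful for downstream applications that need latent representations.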
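A minimal sketch of encoding and decoding with Euler steps of this ODE over all $T$ steps. Since no trained network is available here, it assumes (hypothetically) Gaussian data $x_0 \sim \mathcal{N}(0, s^2 I)$, for which the optimal noise predictor $\mathbb{E}[\epsilon \mid x_t]$ is linear in $x_t$. `euler_step`, `encode`, `decode` and `gaussian_eps` are illustrative names, and the schedule is the assumed DDPM linear one.

```python
import numpy as np

T = 1000
# alphas[t] for t = 0..T, with the convention alpha_0 := 1 (assumed linear schedule).
alphas = np.concatenate([[1.0], np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))])

def euler_step(x, t_from, t_to, eps_model):
    """One Euler step of Eq. 13 in x_bar = x / sqrt(alpha), sigma = sqrt((1 - alpha) / alpha)."""
    a_from, a_to = alphas[t_from], alphas[t_to]
    sig_from, sig_to = np.sqrt((1 - a_from) / a_from), np.sqrt((1 - a_to) / a_to)
    x_bar = x / np.sqrt(a_from) + (sig_to - sig_from) * eps_model(x, t_from)
    return np.sqrt(a_to) * x_bar

def encode(x0, eps_model):
    """Deterministic DDIM encoding: integrate the ODE from t = 0 up to t = T."""
    x = x0
    for t in range(0, T):
        x = euler_step(x, t, t + 1, eps_model)
    return x

def decode(xT, eps_model):
    """Deterministic DDIM sampling (sigma_t = 0): integrate from t = T down to t = 0."""
    x = xT
    for t in range(T, 0, -1):
        x = euler_step(x, t, t - 1, eps_model)
    return x

# Hypothetical stand-in for a trained model: optimal E[eps | x_t] for data ~ N(0, s^2 I).
s = 0.5
def gaussian_eps(x, t):
    a = alphas[t]
    return np.sqrt(1 - a) * x / (a * s**2 + 1 - a)

rng = np.random.default_rng(0)
x0 = s * rng.standard_normal(5)
xT = encode(x0, gaussian_eps)
x0_rec = decode(xT, gaussian_eps)
print(np.max(np.abs(x0_rec - x0) / np.abs(x0)))  # small relative reconstruction error
```

The round trip is not exact, because the two directions evaluate $\epsilon_\theta$ at opposite ends of each step; the discretization error shrinks as the steps get smaller, consistent with the reversal being exact only in the small-step limit.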

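Proposition 1. The ODE in Eq. 14 with the optimal model $\epsilon_\theta^{(t)}$ has an equivalent probability flow ODE corresponding to the "Variance-Exploding" SDE of Score SDE (Song et al., 2020).

The proof is in Appendix B.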

5. Experiments

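DDIM generates images in far fewer iterations than DDPM, a 10x to 100x speedup over the original DDPM generation process.
Unlike DDPM, once the initial latent variables $x_T$ are fixed, DDIM retains high-level image features regardless of the generation trajectory, so it can interpolate directly in the latent space.
DDIM can encode samples into latents from which they can be reconstructed (DDPM cannot, because its sampling process is stochastic).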
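For the interpolation experiment, the paper interpolates between two initial latents $x_T^{(0)}, x_T^{(1)}$ with spherical linear interpolation (slerp) and decodes each interpolated latent with deterministic DDIM. A minimal sketch of slerp follows; `slerp` is an illustrative name and the 3x32x32 latent size is an assumption. Unlike plain linear interpolation, slerp keeps the interpolated latents at a norm typical of Gaussian samples.

```python
import numpy as np

def slerp(z0, z1, alpha):
    """Spherical linear interpolation between two (non-parallel) latents, alpha in [0, 1]."""
    cos_theta = np.dot(z0.ravel(), z1.ravel()) / (np.linalg.norm(z0) * np.linalg.norm(z1))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))   # angle between the two latents
    return (np.sin((1.0 - alpha) * theta) * z0 + np.sin(alpha * theta) * z1) / np.sin(theta)

rng = np.random.default_rng(0)
shape = (3, 32, 32)            # assumed CIFAR-10-sized latent x_T
d = int(np.prod(shape))
z0, z1 = rng.standard_normal(shape), rng.standard_normal(shape)
path = [slerp(z0, z1, a) for a in np.linspace(0.0, 1.0, 5)]
print([round(float(np.linalg.norm(z)) / np.sqrt(d), 3) for z in path])  # all close to 1
print(round(float(np.linalg.norm(0.5 * (z0 + z1))) / np.sqrt(d), 3))    # about 0.71 for the linear midpoint
```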

The experimental setup follows DDPM as closely as possible.