[Paper Summary] Score-based generative model with SDE (2021 ICLR) "Score-Based Generative Modeling through Stochastic Differential Equations"

Paper information

Citation : 2072 as of 2023.12.03

Authors

Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole

Paper links

Official

https://openreview.net/forum?id=CzceR82CYc

Arxiv

https://arxiv.org/abs/2011.13456

Official GitHub

https://github.com/yang-song/score_sde

Paper Summary

Abstract

0. Overview before the explanation

1. Introduction

Two successful classes of diffusion-based probabilistic generative models : SMLD, DDPM

DDPM : https://aigong.tistory.com/589

NCSN : https://aigong.tistory.com/592

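Both views can now be explained in terms of score-based generative models.

To extend the capabilities of score-based generative models and enable new sampling methods, the authors propose a framework that unifies the two approaches through the lens of stochastic differential equations (SDEs).

The method considers a continuum of noise levels over time: it constructs a forward SDE and derives the reverse SDE from it.

In other words, a time-dependent neural network is trained to estimate the score, which gives an approximation of the reverse-time SDE, and samples are generated with numerical SDE solvers.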

* Three theoretical and empirical contributions of the proposed framework

1) Flexible sampling and likelihood computation

Any general-purpose SDE solver can integrate the reverse-time SDE for sampling; in addition, two special methods are proposed.

(1) Predictor-Corrector (PC) Samplers

- Combine a numerical SDE solver with score-based MCMC approaches

- Effect : unify and improve existing sampling methods for score-based models

(2) Deterministic Samplers

- Based on the probability flow ordinary differential equation (ODE)

- Effect : fast adaptive sampling via black-box ODE solvers, flexible data manipulation via latent codes, a uniquely identifiable encoding, and exact likelihood computation

2) Controllable generation

Generation can easily be steered with conditioning information.

This enables various applications without retraining. (class-conditional generation, image inpainting, colorization, other inverse problems)

3) Unified framework

Proposes a unified SDE framework that covers both SMLD and DDPM.

Experimental results are reported in terms of FID and Inception score.

2. Background

2.1 Denoising score matching with Langevin dynamics (SMLD)

NCSN : https://aigong.tistory.com/592

Earlier methods estimate the score with score matching and then sample with Langevin dynamics.

NCSN perturbs the data with multiple noise scales and applies denoising score matching at each scale.

$s_\theta$ : the NCSN (a U-Net)

$p_\sigma$ : perturbation kernel

Eq 1 is the NCSN objective: a weighted sum of denoising score matching losses.
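For reference, eq 1 as I read it from the paper, with noise scales $\sigma_1 < \sigma_2 < \dots < \sigma_N$ and the Gaussian perturbation kernel $p_\sigma(\tilde{x}|x) = \mathcal{N}(\tilde{x}; x, \sigma^2 I)$:

$$
\theta^* = \arg\min_\theta \sum_{i=1}^{N} \sigma_i^2 \, \mathbb{E}_{p_{data}(x)} \mathbb{E}_{p_{\sigma_i}(\tilde{x}|x)} \left[ \left\| s_\theta(\tilde{x}, \sigma_i) - \nabla_{\tilde{x}} \log p_{\sigma_i}(\tilde{x}|x) \right\|_2^2 \right]
$$

The weight $\sigma_i^2$ balances the loss terms across noise scales, since the target score grows like $1/\sigma_i$.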

Sampling runs Langevin MCMC with the trained score model $s_{\theta^*}$, sequentially for each noise scale.
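The sampling loop can be sketched on a toy problem. In the paper's notation the Langevin update is $x \leftarrow x + \epsilon_i s_{\theta^*}(x, \sigma_i) + \sqrt{2\epsilon_i} z$. The sketch below assumes 1-D Gaussian data, so the perturbed score is known in closed form and stands in for a trained network. `toy_score`, the noise schedule, and the step-size rule `step_scale * sigma**2` are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def toy_score(x, sigma, mean=3.0, data_std=0.5):
    # Exact score of N(mean, data_std^2) after adding N(0, sigma^2) noise.
    # Stand-in (assumption) for a trained network s_theta(x, sigma).
    return -(x - mean) / (data_std**2 + sigma**2)

def annealed_langevin(score_fn, sigmas, n_samples=5000, steps_per_scale=100,
                      step_scale=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # Start from the widest Gaussian N(0, sigma_max^2).
    x = sigmas[0] * rng.standard_normal(n_samples)
    for sigma in sigmas:                       # largest noise scale first
        eps = step_scale * sigma**2            # hypothetical step-size rule
        for _ in range(steps_per_scale):
            z = rng.standard_normal(n_samples)
            x = x + eps * score_fn(x, sigma) + np.sqrt(2.0 * eps) * z
    return x

sigmas = np.geomspace(10.0, 0.01, num=10)      # decreasing noise scales
samples = annealed_langevin(toy_score, sigmas)
print(samples.mean(), samples.std())
```

With these settings the samples should land near the toy data distribution (mean 3, standard deviation about 0.5); the finite step size leaves a small bias in the spread.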

2.2. Denoising diffusion probabilistic models (DDPM)

DDPM : https://aigong.tistory.com/589

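A probabilistic method that runs forward and reverse processes based on a Markov chain; it involves many equations (see the paper).

Eq 3 is the objective used to train the network, derived from the ELBO.

After optimizing eq 3, samples are generated from the trained model with ancestral sampling.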
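For reference, eq 3 and the ancestral sampling rule (eq 4) as I read them from the paper, with noise scales $0 < \beta_1, \dots, \beta_N < 1$, $\alpha_i = \prod_{j=1}^{i} (1 - \beta_j)$, and perturbation kernel $p_{\alpha_i}(\tilde{x}|x) = \mathcal{N}(\tilde{x}; \sqrt{\alpha_i} x, (1 - \alpha_i) I)$:

$$
\theta^* = \arg\min_\theta \sum_{i=1}^{N} (1 - \alpha_i) \, \mathbb{E}_{p_{data}(x)} \mathbb{E}_{p_{\alpha_i}(\tilde{x}|x)} \left[ \left\| s_\theta(\tilde{x}, i) - \nabla_{\tilde{x}} \log p_{\alpha_i}(\tilde{x}|x) \right\|_2^2 \right]
$$

$$
x_{i-1} = \frac{1}{\sqrt{1 - \beta_i}} \left( x_i + \beta_i s_{\theta^*}(x_i, i) \right) + \sqrt{\beta_i} z_i, \quad i = N, N-1, \dots, 1
$$

Written this way, the DDPM objective has the same weighted denoising score matching form as the SMLD objective, which is what makes the unified view possible.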

3. Score-Based Generative Modeling with SDEs

Proposes SDEs that generalize data perturbation with multiple noise scales to an infinite number of noise scales.

3.1 Perturbing data with SDEs

$p_0$ : data distribution (the dataset consists of i.i.d. samples)

$p_T$ : prior distribution

$t \in [0,T]$ : continuous time variable

The diffusion process $\{x(t)\}_{t=0}^{T}$ is modeled as the solution to an Itô SDE (eq 5).

$w$ : standard Wiener process (Brownian motion)

$f( \cdot , t)$ : vector-valued function $\mathbb{R}^d \to \mathbb{R}^d$, the drift coefficient of $x(t)$

$g( \cdot )$ : scalar function, the diffusion coefficient of $x(t)$ (in general it can be a $d \times d$ matrix)

The SDE has a unique strong solution as long as the coefficients are globally Lipschitz in both state and time.

$p_t(x)$ : probability density of $x(t)$

$p_{st} (x(t)|x(s))$ : transition kernel from $x(s)$ to $x(t)$

$p_T$ : typically a Gaussian distribution with fixed mean and variance that carries no information about $p_0$

Through eq 5, the data distribution is diffused into a fixed prior distribution.
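Eq 5 in the paper is the Itô SDE $dx = f(x,t)dt + g(t)dw$. A minimal sketch of simulating it with Euler-Maruyama, assuming the VP SDE coefficients $f(x,t) = -\frac{1}{2}\beta(t)x$ and $g(t) = \sqrt{\beta(t)}$ with a linear $\beta(t)$; the function names, schedule values, and step count are illustrative choices.

```python
import numpy as np

def beta(t, beta_min=0.1, beta_max=20.0):
    # Linear noise schedule (an assumed choice for this sketch).
    return beta_min + t * (beta_max - beta_min)

def drift(x, t):
    return -0.5 * beta(t) * x          # f(x, t) of the VP SDE

def diffusion(t):
    return np.sqrt(beta(t))            # g(t) of the VP SDE

def euler_maruyama_forward(x0, n_steps=1000, T=1.0, seed=0):
    """Simulate dx = f(x,t) dt + g(t) dw from t=0 to t=T."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    dt = T / n_steps
    for i in range(n_steps):
        t = i * dt
        z = rng.standard_normal(x.shape)
        x = x + drift(x, t) * dt + diffusion(t) * np.sqrt(dt) * z
    return x

x0 = np.full(5000, 2.0)                # every sample starts at the same data point
xT = euler_maruyama_forward(x0)
print(xT.mean(), xT.std())             # close to a standard Gaussian prior
```

Even though every trajectory starts at the same point, the endpoints are spread like the prior, which illustrates how the forward SDE removes the information in $p_0$.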

3.2 Generating Samples by reversing the SDE

Brian D O Anderson. Reverse-time diffusion equation models. Stochastic Process. Appl., 12(3): 313–326, May 1982.

By the result of Anderson (1982) cited above, the reverse of a diffusion process is also a diffusion process; it runs backward in time and is given by the reverse-time SDE in eq 6.

$\bar{w}$ : standard Wiener process when time flows backward from T to 0

$dt$ : infinitesimal negative timestep

If the score of each marginal distribution, $\nabla_x \log p_t(x)$, is known for all t, the reverse diffusion process can be derived from eq 6 and simulated to sample from $p_0$.
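For reference, the reverse-time SDE (eq 6) as I read it from the paper, using the same $f$ and $g$ as the forward SDE $dx = f(x,t)dt + g(t)dw$:

$$
dx = \left[ f(x,t) - g(t)^2 \nabla_x \log p_t(x) \right] dt + g(t) \, d\bar{w}
$$

The only unknown term is the score $\nabla_x \log p_t(x)$, which is why the rest of the method focuses on estimating it.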

3.3 Estimating Scores for the SDE

The score of a distribution can be estimated by training a score-based model on samples with score matching.

A time-dependent score-based model $s_\theta (x,t)$ is trained with a continuous generalization of the SMLD and DDPM objectives (eq 7).

With sufficient data and model capacity, score matching ensures that the optimal solution of eq 7 matches the score $\nabla_x \log p_t(x)$ for almost all x and t.

Eq 7 uses denoising score matching.
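For reference, eq 7 as I read it from the paper, where $\lambda(t) > 0$ is a weighting function, $t$ is sampled uniformly over $[0,T]$, $x(0) \sim p_0(x)$, and $x(t) \sim p_{0t}(x(t)|x(0))$:

$$
\theta^* = \arg\min_\theta \mathbb{E}_t \left\{ \lambda(t) \, \mathbb{E}_{x(0)} \mathbb{E}_{x(t)|x(0)} \left[ \left\| s_\theta(x(t), t) - \nabla_{x(t)} \log p_{0t}(x(t)|x(0)) \right\|_2^2 \right] \right\}
$$

Compared with the discrete objectives of SMLD and DDPM, the sum over noise scales becomes an expectation over continuous time.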

To solve eq 7 efficiently, we need to know the transition kernel $p_{0t} (x(t) | x(0))$.

If $f$ is affine, the transition kernel is always a Gaussian distribution.

For more general SDEs, the transition kernel can be obtained by solving Kolmogorov's forward equation.

Bernt Øksendal. Stochastic differential equations. In Stochastic differential equations, pp. 65–84. Springer, 2003.

Alternatively, we can simulate the SDE to sample from $p_{0t}$ and replace the denoising score matching in eq 7 with sliced score matching, which bypasses the computation of $\nabla_{x(t)} \log p_{0t} (x(t) | x(0))$. (Appendix A)
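A minimal sketch of a Monte Carlo estimate of eq 7 with denoising score matching. It assumes a Gaussian kernel $p_{0t}(x(t)|x(0)) = \mathcal{N}(x(0), \sigma(t)^2 I)$ with a geometric $\sigma(t)$ (VE-style), for which $\nabla_{x(t)} \log p_{0t} = -(x(t) - x(0))/\sigma(t)^2$, and the weighting $\lambda(t) = \sigma(t)^2$. `dsm_loss`, `sigma`, and the two score functions are hypothetical names for illustration.

```python
import numpy as np

def sigma(t, sigma_min=0.01, sigma_max=10.0):
    # Std of the assumed Gaussian kernel p_0t(x(t) | x(0)) = N(x(0), sigma(t)^2 I).
    return sigma_min * (sigma_max / sigma_min) ** t

def dsm_loss(score_fn, x0, rng, t_eps=1e-5):
    """Monte Carlo estimate of eq 7 with lambda(t) = sigma(t)^2."""
    n = x0.shape[0]
    t = rng.uniform(t_eps, 1.0, size=(n, 1))     # t ~ U(t_eps, 1)
    z = rng.standard_normal(x0.shape)
    std = sigma(t)
    xt = x0 + std * z                            # sample x(t) ~ p_0t(. | x(0))
    target = -z / std                            # gradient of log p_0t at x(t)
    sq_err = np.sum((score_fn(xt, t) - target) ** 2, axis=1)
    return np.mean(std[:, 0] ** 2 * sq_err)

rng = np.random.default_rng(0)
x0 = np.zeros((4096, 2))                         # toy data: a point mass at the origin

def exact_score(x, t):
    # For point-mass data the marginal p_t equals the kernel, so this score is exact.
    return -x / sigma(t) ** 2

def zero_score(x, t):
    return np.zeros_like(x)

loss_exact = dsm_loss(exact_score, x0, rng)
loss_zero = dsm_loss(zero_score, x0, rng)
print(loss_exact, loss_zero)
```

On this toy data the exact score drives the loss to (numerically) zero, while a model that always outputs zero pays roughly the data dimension. For real data the minimum of the loss is not zero, but the minimizer is still the marginal score.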

3.4 Examples: VE, VP SDEs and Beyond

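The noise perturbations of SMLD and DDPM can be regarded as discretizations of two different SDEs.

Eq 8: with N noise scales in total, each SMLD perturbation kernel corresponds to the distribution of $x_i$ in a Markov chain.

Eq 9: as $N \to \infty$, this Markov chain becomes a continuous stochastic process $\{ x(t) \}_{t=0}^{1}$ given by an SDE.

Eq 10: the Markov chain defined by the DDPM perturbation kernels.

Eq 11: the SDE that this chain converges to as $N \to \infty$.

Eq 9 : Variance Exploding (VE) SDE : the variance of the process explodes as $t \to \infty$

Eq 11 : Variance Preserving (VP) SDE : the variance stays fixed at one when the initial distribution has unit variance

Inspired by the VP SDE, a new SDE is designed that performs particularly well on likelihoods.

It is named the sub-VP SDE.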
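For reference, eqs 8 to 12 as I read them from the paper, with $z_{i-1} \sim \mathcal{N}(0, I)$:

$$
\begin{aligned}
x_i &= x_{i-1} + \sqrt{\sigma_i^2 - \sigma_{i-1}^2} \, z_{i-1}, \quad i = 1, \dots, N && \text{(eq 8, SMLD chain)} \\
dx &= \sqrt{\frac{d[\sigma^2(t)]}{dt}} \, dw && \text{(eq 9, VE SDE)} \\
x_i &= \sqrt{1 - \beta_i} \, x_{i-1} + \sqrt{\beta_i} \, z_{i-1}, \quad i = 1, \dots, N && \text{(eq 10, DDPM chain)} \\
dx &= -\frac{1}{2} \beta(t) x \, dt + \sqrt{\beta(t)} \, dw && \text{(eq 11, VP SDE)} \\
dx &= -\frac{1}{2} \beta(t) x \, dt + \sqrt{\beta(t) \left( 1 - e^{-2 \int_0^t \beta(s) ds} \right)} \, dw && \text{(eq 12, sub-VP SDE)}
\end{aligned}
$$

The sub-VP SDE shares the drift of the VP SDE but has a smaller diffusion coefficient, so its variance is always bounded by that of the VP SDE.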

The VE, VP, and sub-VP SDEs all have affine drift coefficients, so their perturbation kernels $p_{0t} ( x(t) | x(0))$ are all Gaussian and can be computed in closed form.

This makes training with eq 7 particularly efficient.
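The closed-form kernels can be sketched as follows for the VP and sub-VP SDEs, assuming a linear $\beta(t)$ on $t \in [0,1]$. Both kernels are Gaussian with mean $x(0) e^{-\frac{1}{2}\int_0^t \beta(s)ds}$; the variance is $1 - e^{-\int_0^t \beta(s)ds}$ for VP and $(1 - e^{-\int_0^t \beta(s)ds})^2$ for sub-VP, as I read them from the paper's appendix. The function names and default schedule values are illustrative.

```python
import numpy as np

def integrated_beta(t, beta_min=0.1, beta_max=20.0):
    # Integral over [0, t] of the linear schedule beta(s) = beta_min + s * (beta_max - beta_min).
    return beta_min * t + 0.5 * t**2 * (beta_max - beta_min)

def vp_kernel(x0, t):
    """Mean and std of p_0t(x(t) | x(0)) for the VP SDE."""
    B = integrated_beta(t)
    mean = x0 * np.exp(-0.5 * B)
    std = np.sqrt(-np.expm1(-B))        # sqrt(1 - exp(-B))
    return mean, std

def subvp_kernel(x0, t):
    """Mean and std of p_0t(x(t) | x(0)) for the sub-VP SDE."""
    B = integrated_beta(t)
    mean = x0 * np.exp(-0.5 * B)
    std = -np.expm1(-B)                 # 1 - exp(-B)
    return mean, std

def perturb(x0, t, kernel, rng):
    # Draw x(t) directly from the kernel, without simulating the SDE step by step.
    mean, std = kernel(x0, t)
    return mean + std * rng.standard_normal(np.shape(x0))

rng = np.random.default_rng(0)
x0 = np.ones(4)
for t in (0.1, 0.5, 1.0):
    print(t, vp_kernel(x0, t)[1], subvp_kernel(x0, t)[1])
```

Because `perturb` samples $x(t)$ in one shot for any $t$, each training step of eq 7 needs no SDE simulation, which is the efficiency the text refers to.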

4. Solving the reverse SDE

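After training the time-dependent score-based model $s_\theta$, we construct the reverse-time SDE and simulate it numerically to generate samples.

4.1 General-purpose numerical SDE Solvers

Numerical solvers provide approximate trajectories from SDEs.

Methods such as Euler-Maruyama and stochastic Runge-Kutta correspond to different discretizations of the stochastic dynamics.

Ancestral sampling, the sampling method used in DDPM, is also one particular discretization of the reverse-time VP SDE.

However, deriving ancestral sampling rules for new SDEs can be non-trivial, so reverse diffusion samplers are proposed; they discretize the reverse-time SDE in the same way as the forward one.

Table 1 (located below, in Section 4.2): reverse diffusion performs slightly better than ancestral sampling.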
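A minimal sketch of solving the reverse-time SDE with a general-purpose solver. It uses a plain Euler-Maruyama discretization (not the paper's exact reverse diffusion rule) of $dx = [f(x,t) - g(t)^2 \nabla_x \log p_t(x)]dt + g(t)d\bar{w}$ for the VP SDE. The data distribution is assumed to be a 1-D Gaussian so that the exact score `toy_score` can stand in for a trained $s_\theta$; all names and constants are illustrative.

```python
import numpy as np

BETA_MIN, BETA_MAX = 0.1, 20.0          # assumed linear VP schedule
DATA_MEAN, DATA_STD = 3.0, 0.5          # toy 1-D Gaussian data distribution

def beta(t):
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)

def toy_score(x, t):
    # Exact score of p_t when p_0 = N(DATA_MEAN, DATA_STD^2) under the VP SDE;
    # a stand-in (assumption) for a trained s_theta(x, t).
    B = BETA_MIN * t + 0.5 * t**2 * (BETA_MAX - BETA_MIN)
    a = np.exp(-0.5 * B)
    var_t = (DATA_STD * a) ** 2 + 1.0 - a**2
    return -(x - DATA_MEAN * a) / var_t

def reverse_sde_sample(score_fn, n_samples=5000, n_steps=1000, t_end=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)              # x(T) ~ prior N(0, 1)
    ts = np.linspace(1.0, t_end, n_steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        dt = t_next - t                             # negative timestep
        g2 = beta(t)                                # g(t)^2
        drift = -0.5 * beta(t) * x - g2 * score_fn(x, t)   # f - g^2 * score
        z = rng.standard_normal(n_samples)
        x = x + drift * dt + np.sqrt(g2 * -dt) * z
    return x

samples = reverse_sde_sample(toy_score)
print(samples.mean(), samples.std())
```

Starting from prior noise, the samples should end up close to the toy data distribution (mean 3, standard deviation 0.5). Integration stops at a small `t_end` instead of 0, a common practical choice for numerical stability.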

4.2 Predictor-Corrector Samplers

Since the score-based model approximates $\nabla_x \log p_t(x)$, score-based MCMC approaches (e.g. Langevin MCMC, HMC) can sample from $p_t$ directly and correct the solution of a numerical SDE solver.

At each time step:

numerical SDE solver (predictor) - gives an estimate of the sample at the next time step

score-based MCMC approach (corrector) - corrects the marginal distribution of the estimated sample

These are named Predictor-Corrector (PC) samplers (more details in Appendix G).

SMLD - Predictor (identity function) & Corrector (annealed Langevin dynamics)

DDPM - Predictor (ancestral sampling) & Corrector (identity)
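The predictor and corrector steps can be sketched for the VE SDE as follows. The structure follows my reading of the paper's PC sampling algorithm: a reverse diffusion predictor step from $\sigma_{i+1}$ to $\sigma_i$, then Langevin corrector steps whose step size is set from a signal-to-noise ratio `snr`. The Gaussian toy data, the exact `toy_score` standing in for a trained model, and the batch-averaged norms are assumptions made for illustration.

```python
import numpy as np

DATA_MEAN, DATA_STD = 3.0, 0.5            # toy Gaussian data, identical in every dimension

def toy_score(x, sigma):
    # Exact score of N(DATA_MEAN, DATA_STD^2 I) perturbed with N(0, sigma^2 I) noise;
    # a stand-in (assumption) for a trained s_theta(x, sigma).
    return -(x - DATA_MEAN) / (DATA_STD**2 + sigma**2)

def pc_sampler(score_fn, sigmas, n_samples=4000, dim=2, snr=0.16,
               corrector_steps=1, seed=0):
    """sigmas must be increasing: sigmas[0] = sigma_min, sigmas[-1] = sigma_max."""
    rng = np.random.default_rng(seed)
    x = sigmas[-1] * rng.standard_normal((n_samples, dim))   # x_N ~ N(0, sigma_max^2 I)
    for i in range(len(sigmas) - 2, -1, -1):
        # Predictor: one reverse diffusion step from sigma_{i+1} to sigma_i.
        delta = sigmas[i + 1] ** 2 - sigmas[i] ** 2
        x = x + delta * score_fn(x, sigmas[i + 1])
        x = x + np.sqrt(delta) * rng.standard_normal(x.shape)
        # Corrector: Langevin MCMC at noise level sigma_i.
        for _ in range(corrector_steps):
            grad = score_fn(x, sigmas[i])
            z = rng.standard_normal(x.shape)
            grad_norm = np.linalg.norm(grad, axis=1).mean()
            noise_norm = np.linalg.norm(z, axis=1).mean()
            eps = 2.0 * (snr * noise_norm / grad_norm) ** 2   # step size from the SNR
            x = x + eps * grad + np.sqrt(2.0 * eps) * z
    return x

sigmas = np.geomspace(0.01, 10.0, num=200)
samples = pc_sampler(toy_score, sigmas)
print(samples.mean(), samples.std())
```

Setting `corrector_steps=0` gives a predictor-only sampler, which makes it easy to compare the two settings on the same toy problem.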

Table 1

Conclusion 1) The reverse diffusion sampler outperforms ancestral sampling.

Conclusion 2) C2000 (corrector-only) gives the worst results.

Conclusion 3) PC1000 (Predictor-Corrector sampler) increases computation but improves sample quality.

4.3 Probability flow and connection to Neural ODEs

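Score-based models enable another numerical method for solving the reverse-time SDE.

For every diffusion process there is a corresponding deterministic process whose trajectories share the same marginal probability densities (white lines in Figure 2).

This deterministic process satisfies an ODE (Appendix D.1).

The ODE in eq 13 is named the probability flow ODE.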
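Eq 13 in the paper is $dx = \left[ f(x,t) - \frac{1}{2} g(t)^2 \nabla_x \log p_t(x) \right] dt$. A minimal sketch of integrating it for the VP SDE, assuming 1-D Gaussian toy data so that the exact score replaces a trained model. The paper uses a black-box adaptive ODE solver; the hand-written fixed-step RK4 below is a substitute chosen only to keep the example self-contained, and all names are illustrative.

```python
import numpy as np

BETA_MIN, BETA_MAX = 0.1, 20.0          # assumed linear VP schedule
DATA_MEAN, DATA_STD = 3.0, 0.5          # toy 1-D Gaussian data distribution

def beta(t):
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)

def toy_score(x, t):
    # Exact score of p_t for Gaussian data under the VP SDE (stand-in for s_theta).
    B = BETA_MIN * t + 0.5 * t**2 * (BETA_MAX - BETA_MIN)
    a = np.exp(-0.5 * B)
    var_t = (DATA_STD * a) ** 2 + 1.0 - a**2
    return -(x - DATA_MEAN * a) / var_t

def ode_drift(x, t):
    # Right-hand side of eq 13 for the VP SDE: f(x,t) - 0.5 * g(t)^2 * score.
    return -0.5 * beta(t) * x - 0.5 * beta(t) * toy_score(x, t)

def integrate(x, t_start, t_end, n_steps=500):
    """Fixed-step RK4 from t_start to t_end (works in either time direction)."""
    h = (t_end - t_start) / n_steps
    t = t_start
    for _ in range(n_steps):
        k1 = ode_drift(x, t)
        k2 = ode_drift(x + 0.5 * h * k1, t + 0.5 * h)
        k3 = ode_drift(x + 0.5 * h * k2, t + 0.5 * h)
        k4 = ode_drift(x + h * k3, t + h)
        x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x

rng = np.random.default_rng(0)
latent = rng.standard_normal(5000)                 # x(1) ~ N(0, 1)
decoded = integrate(latent, 1.0, 1e-3)             # latent -> data
encoded = integrate(decoded, 1e-3, 1.0)            # data -> latent
print(decoded.mean(), decoded.std(), np.abs(encoded - latent).max())
```

No noise is injected during integration, so the same latent always maps to the same sample, and integrating back recovers the latent up to solver error.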

Exact likelihood computation

The exact likelihood of any input can be computed via the instantaneous change of variables formula (Appendix D.2).
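For reference, the likelihood formula as I read it from Appendix D.2 of the paper, where $x(t)$ is the trajectory obtained by solving the probability flow ODE starting from the input $x(0)$:

$$
\log p_0(x(0)) = \log p_T(x(T)) + \int_0^T \nabla \cdot \tilde{f}_\theta(x(t), t) \, dt, \qquad \tilde{f}_\theta(x,t) = f(x,t) - \frac{1}{2} g(t)^2 s_\theta(x,t)
$$

Computing the divergence exactly is expensive in high dimensions, so the paper estimates it with the Skilling-Hutchinson trace estimator, $\nabla \cdot \tilde{f}_\theta(x,t) = \mathbb{E}_{p(\epsilon)} \left[ \epsilon^\top \nabla \tilde{f}_\theta(x,t) \, \epsilon \right]$.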

Main results:

(i) For the same DDPM model, we obtain better bits/dim than ELBO, since our likelihoods are exact

(ii) Using the same architecture, we trained another DDPM model with the continuous objective in Eq. (7) (i.e., DDPM cont.), which further improves the likelihood

(iii) With sub-VP SDEs, we always get higher likelihoods compared to VP SDEs

(iv) With improved architecture (i.e., DDPM++ cont., details in Section 4.4) and the sub-VP SDE, we can set a new record bits/dim of 2.99 on uniformly dequantized CIFAR-10 even without maximum likelihood training.

Manipulating latent representations

Integrating eq 13 encodes any datapoint into the latent space.

Decoding integrates the same ODE in the reverse time direction.

This makes it possible to manipulate the latent representation. (image editing, such as interpolation, and temperature scaling)

Uniquely identifiable encoding

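The encoding is uniquely identifiable given sufficient training data, model capacity, and optimization accuracy.

For a fixed data distribution the forward SDE has no trainable parameters, so the probability flow ODE (eq 13) yields the same trajectories whenever the score estimates are the same.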

Efficient sampling

Using a black-box ODE solver (Dormand & Prince, 1980) not only produces high quality samples (Table 2, details in Appendix D.4), but also allows us to explicitly trade-off accuracy for efficiency.

the number of function evaluations can be reduced by over 90% without affecting the visual quality of samples (Fig. 3).

4.4 Architecture Improvements

New architectures for score-based models are designed and evaluated with both the VE and VP SDEs (details in Appendix H).

Switching to the continuous training objective in eq 7 and increasing the network depth improves sample quality for all models.

Table 3

VE SDEs give better sample quality than VP/sub-VP SDEs, but worse likelihoods.

The best model is NCSN++ cont. (deep, VE).

5. Controllable Generation

The continuous structure of the framework allows sampling not only from $p_0$ but also from $p_0(x(0)|y)$ when $p_t (y|x(t))$ is known.

Eq 14 makes it possible to solve inverse problems with score-based generative models.
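For reference, eq 14 as I read it from the paper. It is the reverse-time SDE with the score of the conditional distribution, split by Bayes' rule into $\nabla_x \log p_t(x|y) = \nabla_x \log p_t(x) + \nabla_x \log p_t(y|x)$:

$$
dx = \left\{ f(x,t) - g(t)^2 \left[ \nabla_x \log p_t(x) + \nabla_x \log p_t(y|x) \right] \right\} dt + g(t) \, d\bar{w}
$$

The first score term comes from the unconditional model, so only $\nabla_x \log p_t(y|x)$ has to be supplied, for example by a separately trained time-dependent classifier in class-conditional generation.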

Three applications : class-conditional generation, image imputation, colorization

Reference

Helpful YouTube 1.

https://youtube.com/playlist?list=PLdYVZlzAcF1jEvAJg5DTINJ49lN3p1DCK&si=2QRqZ8l-LEFrPJLr

https://yang-song.net/blog/2021/score/
