Aigong's AI Study Journey

PR-315 "Taming Transformers for High-Resolution Image Synthesis" Review (CVPR 2021) (Image Synthesis, GAN)


1. Reading the Citations & Abstract

Citations: 154 (as of 2022-03-22)

Authors

Patrick Esser, Robin Rombach, Björn Ommer - Heidelberg Collaboratory for Image Processing, IWR, Heidelberg University, Germany

Abstract


2. Presentation Summary


Official paper link

https://openaccess.thecvf.com/content/CVPR2021/html/Esser_Taming_Transformers_for_High-Resolution_Image_Synthesis_CVPR_2021_paper.html

 

Patrick Esser, Robin Rombach, Björn Ommer; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 12873-12883

arXiv

https://arxiv.org/abs/2012.09841

 


 

Presentation Slide

https://www.slideshare.net/HyeongminLee3/pr315-taming-transformers-for-highresolution-image-synthesis

 


Contents

https://youtu.be/GcbT0IGt0xE


Image synthesis with Transformers has an obvious limitation: self-attention cost grows quadratically with sequence length, so modeling pixels directly becomes infeasible at high resolution.


CNNs need relatively less training thanks to their inductive bias toward local interactions.

Transformers lack this bias, so they need large amounts of data and their complexity grows quickly.


Patch -> Reshape -> vectorization (ViT-style)

Vectorization via a CNN encoder
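The first route above (patch, reshape, vectorize) can be sketched in a few lines of NumPy; the image and patch sizes here are toy values, not the paper's settings:

```python
import numpy as np

# Toy image: 32x32 RGB (assumed sizes for illustration).
image = np.random.rand(32, 32, 3)

# Patch -> Reshape -> vectorization: cut the image into 8x8 patches
# and flatten each patch into one vector.
p = 8
h, w, c = image.shape
patches = image.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
vectors = patches.reshape(-1, p * p * c)

print(vectors.shape)  # (16, 192): 16 patches, each a 192-dim vector
```

The CNN route replaces this fixed reshape with learned convolutions that downsample the image into a grid of feature vectors.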

 

In language, the input is a sequence of discrete tokens.

 

A lookup table is used for embedding: each discrete token index selects one row (an embedding vector) from the table.
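A lookup-table embedding is just row indexing; the vocabulary and embedding sizes below are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Lookup table: one row (embedding vector) per discrete token.
vocab_size, dim = 10, 4              # assumed toy sizes
table = rng.normal(size=(vocab_size, dim))

tokens = np.array([3, 1, 4, 1])      # a discrete token sequence
embedded = table[tokens]             # embedding = indexing into the table

print(embedded.shape)  # (4, 4): one embedding vector per token
```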

 

The image input is vectorized by passing it through the encoder E.

This produces $\hat{z}$, whose height and width are 16 times smaller than those of the input image.

A codebook $\mathcal{Z}$, acting like a lookup table, holds N code vectors.

Each vector of $\hat{z}$ is quantized to the codebook entry with the smallest L2 distance, yielding $z_q$.

Vector quantization
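The nearest-neighbor quantization step can be sketched as follows; the codebook size N and dimensions are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Codebook Z: N code vectors (toy sizes).
N, dim = 8, 4
codebook = rng.normal(size=(N, dim))

# Encoder output z_hat: a small grid of latent vectors, flattened to rows.
z_hat = rng.normal(size=(4, dim))

# Quantize each latent to the codebook entry with the smallest L2 distance.
dists = np.linalg.norm(z_hat[:, None, :] - codebook[None, :, :], axis=-1)
indices = dists.argmin(axis=1)   # discrete code indices (input to the Transformer)
z_q = codebook[indices]          # quantized latents z_q (input to decoder G)

print(indices.shape, z_q.shape)  # (4,) (4, 4)
```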


The quantized $z_q$ is fed into the decoder G to generate the image.

 

The VQ loss combines an L2 reconstruction term with two codebook terms:

$\mathcal{L}_{VQ}(E, G, \mathcal{Z}) = \|x - \hat{x}\|^2 + \|\text{sg}[E(x)] - z_q\|_2^2 + \|\text{sg}[z_q] - E(x)\|_2^2$

sg : the stop-gradient operation

 

$\lambda$ : the adaptive weight balancing the GAN loss against the reconstruction loss, computed from the gradients of each loss with respect to the decoder's last layer $L$:

$\lambda = \frac{\nabla_{G_L}[\mathcal{L}_{rec}]}{\nabla_{G_L}[\mathcal{L}_{GAN}] + \delta}$
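As a numeric sketch of this adaptive weight (the gradient norms here are made-up toy values, not measurements):

```python
# lambda = ||grad of reconstruction loss|| / (||grad of GAN loss|| + delta),
# both taken at the decoder's last layer L (toy values for illustration).
grad_rec = 0.8      # assumed gradient norm of the reconstruction loss
grad_gan = 0.1      # assumed gradient norm of the GAN loss
delta = 1e-6        # small constant for numerical stability

lam = grad_rec / (grad_gan + delta)
print(lam)  # ~8: the GAN term gets scaled up when its gradients are small
```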

 

Unconditional Generation vs Conditional Generation


Maximizing the log-likelihood of the next code index is equivalent to minimizing a softmax (cross-entropy) loss over the Transformer's logits.

This is the mechanism for predicting which code comes next.
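Next-index prediction with a softmax can be sketched as follows (toy logits over a 3-code vocabulary, not real model outputs):

```python
import numpy as np

# The Transformer outputs logits over the codebook indices; maximizing the
# log-likelihood of the true next index = minimizing its cross-entropy.
logits = np.array([2.0, 0.5, -1.0])  # assumed toy logits
target = 0                           # true next code index

log_probs = logits - np.log(np.sum(np.exp(logits)))  # log softmax
loss = -log_probs[target]
print(loss)
```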

 

Attention is computed only over a sliding window of neighboring patches to predict the next value.

Because the window size is fixed, the cost does not grow with the image size, keeping high-resolution synthesis feasible.
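A minimal sketch of the sliding-window idea, with an assumed window size and latent-grid size (the paper's actual windowing differs in detail):

```python
import numpy as np

H = W = 16      # assumed size of the latent code grid
window = 4      # assumed sliding-window radius

def window_slice(i, j, codes):
    """Crop the neighborhood attended to when predicting position (i, j).

    Causal masking (only already-generated positions) is applied separately.
    """
    top, left = max(0, i - window), max(0, j - window)
    return codes[top:i + 1, left:min(W, j + window + 1)]

codes = np.zeros((H, W), dtype=int)   # code indices generated so far
ctx = window_slice(10, 10, codes)

# The attended region depends only on the fixed window, not on H or W,
# so the attention cost stays bounded as the image grows.
print(ctx.shape)  # (5, 9)
```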


Condition: class label, segmentation map, edge information, etc.
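One decoder-only way to realize conditioning, sketched with toy values (the exact interface for each condition type differs): the condition is turned into tokens and prepended to the code sequence.

```python
import numpy as np

# Conditioning: represent the condition (e.g. a class label, or a quantized
# segmentation map) as tokens and prepend them to the image-code sequence.
condition = np.array([7])                # assumed: class label as one token
image_codes = np.array([3, 1, 4, 1, 5])  # code indices generated so far

# The Transformer then models p(next code | condition, previous codes).
sequence = np.concatenate([condition, image_codes])
print(sequence.tolist())  # [7, 3, 1, 4, 1, 5]
```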


Generating high-resolution image data is also possible.


References

GitHub

https://github.com/CompVis/taming-transformers

 


https://compvis.github.io/taming-transformers/

 



Blog

https://arankomatsuzaki.wordpress.com/2021/03/04/state-of-the-art-image-generative-models/?fbclid=IwAR3qOm985YRdCMzq-O504qFAJa1LuakpEPcE4N3ICwZQ0tUBblGJ5HLYNrY#vqgan 

 

State-of-the-Art Image Generative Models

 
