PR-114 "Recycle-GAN: Unsupervised Video Retargeting" Review (2018 ECCV)(GAN)

1. Citations & Abstract

Citations: 201 as of 2021.12.24

Authors

Aayush Bansal, Deva Ramanan - Carnegie Mellon University
Shugao Ma - Facebook Reality Lab, Pittsburgh
Yaser Sheikh - Carnegie Mellon University & Facebook Reality Lab, Pittsburgh

Abstract

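We introduce a data-driven approach for unsupervised video retargeting that translates content from one domain to another while preserving the style native to a domain. For example, if the content of John Oliver's speech is transferred to Stephen Colbert, the generated content and speech should be in Stephen Colbert's style. Our approach combines spatial and temporal information with adversarial losses for content translation and style preservation. In this work, we first study the advantages of using spatiotemporal constraints over purely spatial constraints for effective retargeting. We then demonstrate the proposed approach on problems where information in both space and time matters, such as face-to-face translation, flower-to-flower, wind and cloud synthesis, and sunrise and sunset.

spatiotemporal: relating to both space and time

The main point of the abstract seems to be that the authors built a GAN-based model that translates content to another domain using spatiotemporal information while preserving the original style, and that they demonstrate it on several tasks.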

2. Presentation Summary

https://youtu.be/eMZXUqmp_PU

Official paper link

https://www.cs.cmu.edu/~aayushb/Recycle-GAN/recycle_gan.pdf

Presentation Slide

None

Motivation

Spatial cycle consistency (of Cycle-GAN) alone is not sufficient in temporal domains

- Perceptual mode collapse

- Tied spatially to input: the overall optimization is focused on reconstructing the input

In the figure above, (a) and (b) are results with Cycle-GAN; (c) and (d) are results with Recycle-GAN.

(a) Pix2Pix

Paired data

(b) CycleGAN

pairs are not available

cycle loss

$$L_c(G_X, G_Y)= \sum_t ||x_t-G_X(G_Y(x_t))||^2$$

ordering independent
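To make the "ordering independent" remark concrete, here is a minimal NumPy sketch of the cycle loss above. The generators `g_y` and `g_x` are hypothetical stand-in functions (not trained networks), and the frames are random arrays; the only point is that the loss is a per-frame sum, so reordering the frames does not change it.

```python
import numpy as np

def cycle_loss(frames, g_y, g_x):
    """Sum over frames of ||x_t - G_X(G_Y(x_t))||^2."""
    return sum(float(np.sum((x - g_x(g_y(x))) ** 2)) for x in frames)

# Hypothetical stand-ins: g_y maps X -> Y, g_x maps Y -> X.
g_y = lambda x: 2.0 * x + 1.0
g_x = lambda y: 0.4 * (y - 1.0)  # deliberately imperfect inverse of g_y

rng = np.random.default_rng(0)
frames = [rng.normal(size=(4, 4)) for _ in range(5)]

loss_ordered = cycle_loss(frames, g_y, g_x)
loss_reversed = cycle_loss(frames[::-1], g_y, g_x)  # same frames, reversed order

print(np.isclose(loss_ordered, loss_reversed))
```

Because each term depends on a single frame, the cycle loss cannot tell a video from a shuffled bag of its frames, which is the gap the temporal losses in (c) are meant to close.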

(c) RecycleGAN

unpaired

temporally-ordered stream

ex) video

Recurrent loss

$$L_\tau (P_X) = \sum_t ||x_{t+1}-P_X(x_{1:t})||^2$$
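A minimal NumPy sketch of the recurrent loss, under two assumptions: the predictor only looks at the last two frames (as in the implementation details of this post), and the predictor itself is a hypothetical linear extrapolation rather than a trained U-Net.

```python
import numpy as np

def recurrent_loss(frames, predictor):
    """Sum over t of ||x_{t+1} - P_X(x_{t-1}, x_t)||^2."""
    total = 0.0
    for t in range(1, len(frames) - 1):
        pred = predictor(frames[t - 1], frames[t])
        total += float(np.sum((frames[t + 1] - pred) ** 2))
    return total

def linear_extrapolation(prev, curr):
    """Hypothetical stand-in for P_X: assume constant motion."""
    return 2.0 * curr - prev

base = np.arange(6.0).reshape(2, 3)
linear_video = [base + t for t in range(5)]          # constant change per frame
quadratic_video = [base + t ** 2 for t in range(5)]  # accelerating change

print(recurrent_loss(linear_video, linear_extrapolation))     # 0.0
print(recurrent_loss(quadratic_video, linear_extrapolation))  # 72.0
```

Unlike the cycle loss, this term depends on frame order: the predictor is only rewarded when consecutive frames evolve the way it expects.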

Recycle loss

$$L_r (G_X, G_Y, P_Y) = \sum_t ||x_{t+1}-G_X(P_Y(G_Y(x_{1:t})))||^2$$
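The recycle loss chains the three maps: translate past X frames to Y, predict the next Y frame, translate back to X, and compare with the true next X frame. The sketch below uses hypothetical toy functions for `g_y`, `g_x`, and `p_y` (not the authors' networks) and again assumes the predictor sees only the last two frames.

```python
import numpy as np

def recycle_loss(frames_x, g_y, p_y, g_x):
    """Sum over t of ||x_{t+1} - G_X(P_Y(G_Y(x_{t-1}), G_Y(x_t)))||^2."""
    total = 0.0
    for t in range(1, len(frames_x) - 1):
        y_prev, y_curr = g_y(frames_x[t - 1]), g_y(frames_x[t])
        y_next = p_y(y_prev, y_curr)  # predict the future in domain Y
        x_next = g_x(y_next)          # map the prediction back to domain X
        total += float(np.sum((frames_x[t + 1] - x_next) ** 2))
    return total

# Hypothetical stand-ins; g_x exactly inverts g_y, so the cycle loss would be 0.
g_y = lambda x: 2.0 * x + 1.0
g_x = lambda y: (y - 1.0) / 2.0
p_linear = lambda prev, curr: 2.0 * curr - prev  # constant-motion predictor
p_static = lambda prev, curr: curr               # "copy the last frame" predictor

video = [np.full((2, 2), float(t)) for t in range(5)]

print(recycle_loss(video, g_y, p_linear, g_x))  # 0.0
print(recycle_loss(video, g_y, p_static, g_x))  # 12.0
```

Even with perfectly cycle-consistent generators, the loss is nonzero when the temporal prediction in the target domain is poor, so it constrains the translation across time rather than frame by frame.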

Implementation Details

mostly follow Pix2Pix and CycleGAN training details

Discriminator : 70x70 PatchGAN
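As a quick check of where "70x70" comes from, the receptive field can be computed from the layer list. This sketch assumes the standard Pix2Pix PatchGAN layout (five 4x4 convolutions, strides 2, 2, 2, 1, 1); this post does not list the layers, so treat that configuration as an assumption.

```python
def receptive_field(layers):
    """Receptive field of a conv stack given (kernel, stride) pairs, input to output."""
    rf = 1
    for kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf

# Assumed Pix2Pix-style PatchGAN: 4x4 kernels, strides 2, 2, 2, 1, 1.
patchgan_layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]

print(receptive_field(patchgan_layers))  # 70
```

Each output unit of the discriminator therefore judges one 70x70 patch of the input as real or fake.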

$\lambda=10$

Temporal predictor $P_X$ and $P_Y$: concatenate the last two frames as input to U-Net
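A small sketch of how the predictor input could be assembled. The channel-first `(C, H, W)` layout and the helper name `predictor_input` are assumptions for illustration; the post only states that the last two frames are concatenated before going into the U-Net.

```python
import numpy as np

def predictor_input(frames):
    """Stack the last two frames along the channel axis: 2 x (C, H, W) -> (2C, H, W)."""
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    return np.concatenate([frames[-2], frames[-1]], axis=0)

# Toy RGB video: frame t is filled with the value t.
video = [np.full((3, 8, 8), float(t)) for t in range(4)]
net_input = predictor_input(video)

print(net_input.shape)  # (6, 8, 8)
```

With RGB frames this gives a 6-channel input, so the first convolution of the predictor would need 6 input channels under this layout.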

Viper Dataset

Recycle-GAN was confirmed to perform better than Cycle-GAN on this dataset.

References

Official homepage

https://www.cs.cmu.edu/~aayushb/Recycle-GAN/

GitHub

https://github.com/aayushbansal/Recycle-GAN
