By ์ตœํ˜ธ · 2 min read · 378 words

Generative Adversarial Nets

AI
Technology

๐Ÿง  Generative Adversarial Networks: A Turning Point for Generative Models

Ian Goodfellow et al., โ€œGenerative Adversarial Nets,โ€ NeurIPS 2014.


1๏ธโƒฃ ์„œ๋ก  โ€” ์ƒ์„ฑ ๋ชจ๋ธ์˜ ์ƒˆ๋กœ์šด ํŒจ๋Ÿฌ๋‹ค์ž„

๋”ฅ๋Ÿฌ๋‹์˜ ๊ธ‰๊ฒฉํ•œ ๋ฐœ์ „ ์ดํ›„, ์—ฐ๊ตฌ์˜ ์ดˆ์ ์€ โ€œ์ธ์‹(recognition)โ€์—์„œ โ€œ์ƒ์„ฑ(generation)โ€์œผ๋กœ ์˜ฎ๊ฒจ๊ฐ”๋‹ค.

๊ธฐ์กด์˜ ํ™•๋ฅ ์  ์ƒ์„ฑ ๋ชจ๋ธ(์˜ˆ: Variational Autoencoder, Boltzmann Machine)์€

๋ชจ๋ธ๋ง ๊ณผ์ •์˜ ๋ณต์žก์„ฑ๊ณผ likelihood ๊ณ„์‚ฐ์˜ ์–ด๋ ค์›€์œผ๋กœ ์ธํ•ด ์‹ค์šฉ์  ํ•œ๊ณ„๋ฅผ ๋ณด์˜€๋‹ค.

The Generative Adversarial Network (GAN), proposed by Goodfellow in 2014, overcame these constraints and opened a new approach: generating data directly, without an explicit probability distribution.

GAN์€ ๋‘ ๋„คํŠธ์›Œํฌ ๊ฐ„์˜ ์ ๋Œ€์  ํ•™์Šต(adversarial training) ์„ ํ†ตํ•ด

๊ณ ์ฐจ์› ๋ฐ์ดํ„ฐ ๋ถ„ํฌ๋ฅผ ๊ทผ์‚ฌํ•˜๋Š” ๋น„์ง€๋„ ํ•™์Šต ํ”„๋ ˆ์ž„์›Œํฌ๋‹ค.


2๏ธโƒฃ ๊ธฐ๋ณธ ์›๋ฆฌ โ€” ์ ๋Œ€์  ํ•™์Šต(Adversarial Training)

GAN์€ Generator (G) ์™€ Discriminator (D) ๋กœ ๊ตฌ์„ฑ๋œ๋‹ค.

  • Generator G(z):

    takes a latent vector $z \sim p_z(z)$ and maps it into the data space, producing fake samples

  • Discriminator D(x):

    decides whether an input $x$ is real data or a generated fake

๋‘ ๋„คํŠธ์›Œํฌ๋Š” ๋‹ค์Œ์˜ ๋ฏธ๋‹ˆ๋งฅ์Šค ๊ฒŒ์ž„์œผ๋กœ ์ •์˜๋œ๋‹ค.

minโกGmaxโกDV(D,G)=Exโˆผpdata(x)[logโกD(x)]+Ezโˆผpz(z)[logโก(1โˆ’D(G(z)))]\min_G \max_D V(D, G) = \mathbb{E}{x \sim p{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]

GminDmaxV(D,G)=Exโˆผpdata(x)[logD(x)]+Ezโˆผpz(z)[log(1โˆ’D(G(z)))]

์ด ๊ณผ์ •์—์„œ

  • D tries to classify real data correctly, while
  • G tries to produce samples that fool D.

ํ•™์Šต์ด ์ถฉ๋ถ„ํžˆ ์ง„ํ–‰๋˜๋ฉด,

์ด๋ก ์ ์œผ๋กœ pg=pdatap_g = p_{data}pg=pdata์ผ ๋•Œ D(x) = 0.5 ๊ฐ€ ๋˜์–ด ๋” ์ด์ƒ ๊ตฌ๋ถ„ ๋ถˆ๊ฐ€๋Šฅํ•œ ์ƒํƒœ์— ๋„๋‹ฌํ•œ๋‹ค.


3๏ธโƒฃ ํ•™์Šต์˜ ๋ถˆ์•ˆ์ •์„ฑ๊ณผ ๊ฐœ์„  ์‹œ๋„

GAN์˜ ๊ฐ€์žฅ ํฐ ์•ฝ์ ์€ ํ•™์Šต ๋ถˆ์•ˆ์ •์„ฑ์ด๋‹ค.

๋‘ ๋„คํŠธ์›Œํฌ์˜ ํ•™์Šต ๊ท ํ˜•์ด ๋งž์ง€ ์•Š์œผ๋ฉด gradient vanishing ๋˜๋Š” mode collapse ํ˜„์ƒ์ด ๋ฐœ์ƒํ•œ๋‹ค.

(1) Mode Collapse

→ the Generator repeats only a few patterns and sample diversity disappears

(2) Gradient Vanishing

→ if the Discriminator becomes too dominant, the Generator's learning signal dies out
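Why the signal dies can be read straight off the sigmoid. A small numerical sketch (my own illustration, not code from the paper): write $D(G(z)) = \sigma(l)$ for the discriminator's logit $l$. When D confidently rejects a fake ($l \ll 0$), the original generator loss $\log(1 - \sigma(l))$ has derivative $-\sigma(l) \approx 0$, while the non-saturating alternative $-\log \sigma(l)$, suggested in the original paper as a practical fix, keeps derivative $\sigma(l) - 1 \approx -1$.

```python
import math

def sigmoid(l):
    return 1.0 / (1.0 + math.exp(-l))

def saturating_loss(l):
    """Original generator loss log(1 - D(G(z))), with D(G(z)) = sigmoid(l)."""
    return math.log(1.0 - sigmoid(l))

def non_saturating_loss(l):
    """The -log D(G(z)) trick: same fixed point, stronger gradient early on."""
    return -math.log(sigmoid(l))

def grad(f, l, eps=1e-6):
    """Central finite difference df/dl."""
    return (f(l + eps) - f(l - eps)) / (2 * eps)

# A confident discriminator assigns the fake a very negative logit.
l = -10.0
print("saturating grad:    ", grad(saturating_loss, l))      # ~ -sigmoid(-10), near 0
print("non-saturating grad:", grad(non_saturating_loss, l))  # ~ sigmoid(-10) - 1, near -1
```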

A large number of variant models have been proposed to address these problems.


4๏ธโƒฃ ์ฃผ์š” ๋ณ€ํ˜• ๋ชจ๋ธ ๋ฐ ๊ธฐ์ˆ ์  ์ง„ํ™”

์—ฐ๋„๋ชจ๋ธ์ฃผ์š” ๊ธฐ์—ฌํ•ต์‹ฌ ๊ธฐ์ˆ 
2015DCGANCNN ๊ธฐ๋ฐ˜ ๊ตฌ์กฐ ์ œ์•ˆConv/Deconv, BatchNorm
2017WGANํ•™์Šต ์•ˆ์ •ํ™”, ์ง€ํ‘œ ๊ฐœ์„ Wasserstein Distance, Weight Clipping
2017WGAN-GPGradient Penalty ๋„์ž…Lipschitz ์ œ์•ฝ ์™„ํ™”
2017CycleGAN๋น„์ง€๋„ ์ด๋ฏธ์ง€ ๋ณ€ํ™˜Cycle Consistency Loss
2018StyleGAN์Šคํƒ€์ผ ์ œ์–ด ๊ธฐ๋ฐ˜ ์ƒ์„ฑStyle-based architecture, AdaIN
2019BigGAN๋Œ€๊ทœ๋ชจ ํ•™์Šต ์•ˆ์ •ํ™”Spectral Norm, Orthogonal Reg.

๐Ÿ’ฌ DCGAN (Radford et al., 2015)

Stabilized GAN training with a CNN architecture. Removing fully connected layers and introducing BatchNorm and LeakyReLU dramatically improved image generation quality.

๐Ÿ’ฌ WGAN (Arjovsky et al., 2017)

Replaced the Jensen–Shannon divergence with the Wasserstein distance (Earth Mover's distance). This stabilized training and made the loss correlate with sample quality, so the GAN loss finally traced a meaningful convergence curve.

W(pdata,pg)=infโกฮณโˆˆฮ (pdata,pg)E(x,y)โˆผฮณ[โˆฅxโˆ’yโˆฅ]W(p_{data}, p_g) = \inf_{\gamma \in \Pi(p_{data}, p_g)} \mathbb{E}_{(x, y) \sim \gamma} [|x - y|]

W(pdata,pg)=ฮณโˆˆฮ (pdata,pg)infE(x,y)โˆผฮณ[โˆฅxโˆ’yโˆฅ]
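Some intuition for the infimum over couplings $\gamma$ (an illustrative sketch of mine, not code from the WGAN paper): in one dimension with equal-size samples, the optimal transport plan simply matches the i-th smallest real point to the i-th smallest fake point, so the empirical $W_1$ is just the mean absolute difference of the sorted samples.

```python
def wasserstein1_1d(xs, ys):
    """Empirical 1D Wasserstein-1 distance for equal-size samples.
    In 1D the optimal coupling pairs sorted order statistics."""
    assert len(xs) == len(ys)
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)

real = [0.1, 0.5, 0.9, 1.3]
fake_good = [0.1, 0.5, 0.9, 1.3]        # same sample -> distance 0
fake_shifted = [x + 2.0 for x in real]  # shifted by 2 -> distance ~2

print(wasserstein1_1d(real, fake_good))
print(wasserstein1_1d(real, fake_shifted))
```

Unlike the JS divergence, this distance keeps shrinking smoothly as the fake sample slides toward the real one, which is exactly why the WGAN loss gives the generator a usable signal even when the two distributions barely overlap.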

๐Ÿ’ฌ StyleGAN (Karras et al., 2019)

Mapped the latent space into a style space, introducing a "style-based generator" that allows control at each level of detail. This achieved human-level realism in face synthesis, texture manipulation, and more.


5๏ธโƒฃ ํ‰๊ฐ€ ์ง€ํ‘œ์˜ ์ง„ํ™”

GAN์˜ ํ’ˆ์งˆ์„ ๊ฐ๊ด€์ ์œผ๋กœ ํ‰๊ฐ€ํ•˜๊ธฐ ์œ„ํ•ด ๋‹ค์–‘ํ•œ ์ง€ํ‘œ๊ฐ€ ์ œ์•ˆ๋˜์—ˆ๋‹ค.

์ง€ํ‘œ์˜๋ฏธ๋‹จ์ 
Inception Score (IS)๋‹ค์–‘์„ฑ๊ณผ ๋ช…ํ™•๋„ ํ‰๊ฐ€์‹ค์ œ ๋ถ„ํฌ์™€์˜ ์ฐจ์ด ๋ฐ˜์˜ ๋ถˆ๊ฐ€
Frรฉchet Inception Distance (FID)๋‘ ๋ถ„ํฌ ๊ฐ„ ๊ฑฐ๋ฆฌ์ด๋ฏธ์ง€ ๋„๋ฉ”์ธ ์ข…์†์„ฑ ์กด์žฌ
Precision & Recall for GANs์ƒ์„ฑ ๋‹ค์–‘์„ฑ/ํ’ˆ์งˆ ๋™์‹œ ํ‰๊ฐ€๊ณ„์‚ฐ ๋ณต์žก๋„ ๋†’์Œ

FID is now established as the most widely used standard metric.
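FID fits a Gaussian to the Inception-v3 features of real and generated images and measures the Fréchet distance between the two Gaussians. The sketch below is a minimal 1D version of that distance (real FID uses multivariate statistics and a matrix square root; the helper names here are my own): in 1D it reduces to $(\mu_1 - \mu_2)^2 + (\sigma_1 - \sigma_2)^2$.

```python
import math

def mean_std(xs):
    """Sample mean and (population) standard deviation."""
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, math.sqrt(var)

def frechet_distance_1d(real, fake):
    """Squared Frechet distance between 1D Gaussians fitted to two samples:
    (mu1 - mu2)^2 + (sigma1 - sigma2)^2.  FID applies the same idea to
    multivariate Inception features."""
    m1, s1 = mean_std(real)
    m2, s2 = mean_std(fake)
    return (m1 - m2) ** 2 + (s1 - s2) ** 2

real = [1.0, 2.0, 3.0, 4.0]
print(frechet_distance_1d(real, real))                   # identical stats -> 0.0
print(frechet_distance_1d(real, [x + 1 for x in real]))  # mean shift of 1 -> 1.0
```

Lower is better: a score of 0 means the two fitted Gaussians coincide, which is why FID, unlike IS, directly reflects the gap to the real distribution.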


6๏ธโƒฃ GAN์˜ ์‘์šฉ ์˜์—ญ

  • Image synthesis and translation: DeepFake, Face Aging, Super Resolution
  • Data augmentation: supplementing medical-imaging and autonomous-driving datasets
  • Domain translation: day↔night, summer↔winter image conversion
  • Representation Learning: unsupervised feature extraction

์ตœ๊ทผ์—๋Š” GAN์ด Diffusion Model์— ๋น„ํ•ด ์ฃผ๋ชฉ๋„๋Š” ์ค„์—ˆ์œผ๋‚˜,

์ƒ˜ํ”Œ ํšจ์œจ์„ฑ๊ณผ ์‹ค์‹œ๊ฐ„์„ฑ ์ธก๋ฉด์—์„œ ์—ฌ์ „ํžˆ ๊ฐ•์ ์„ ๋ณด์ธ๋‹ค.


7๏ธโƒฃ Diffusion Model๊ณผ์˜ ๋น„๊ต

| Aspect | GAN | Diffusion |
|--------|-----|-----------|
| Training stability | Low | High |
| Training speed | Fast | Slow |
| Sample quality | Some residual artifacts | Very high |
| Sampling speed | Fast (1 step) | Slow (hundreds of steps) |

Diffusion์ด ํ’ˆ์งˆ ๋ฉด์—์„œ๋Š” ์šฐ์œ„์ง€๋งŒ,

GAN์€ ์—ฌ์ „ํžˆ ์‹ค์‹œ๊ฐ„ ์˜์ƒ ์ƒ์„ฑ, ๋ชจ๋ฐ”์ผ ํ™˜๊ฒฝ, ์ œํ•œ๋œ ๋ฐ์ดํ„ฐ์…‹ ํ•™์Šต์— ์œ ๋ฆฌํ•˜๋‹ค.


8๏ธโƒฃ ๊ฒฐ๋ก  ๋ฐ ์ „๋ง

GAN์€ ์ƒ์„ฑ ๋ชจ๋ธ ์—ฐ๊ตฌ์˜ ๋ฐฉํ–ฅ์„ฑ์„ ์™„์ „ํžˆ ๋ฐ”๊ฟ”๋†“์€ ๊ธฐ๋…๋น„์  ๊ธฐ์ˆ ์ด๋‹ค.

๊ทธ ์ž์ฒด์˜ ํ•œ๊ณ„(๋ถˆ์•ˆ์ •ํ•œ ํ•™์Šต, ํ‰๊ฐ€ ์ง€ํ‘œ์˜ ๋ชจํ˜ธ์„ฑ)์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ 

์ƒ์„ฑ์  ํŒจ๋Ÿฌ๋‹ค์ž„์˜ ์„œ๋ง‰์„ ์—ฐ ๋ชจ๋ธ์ด๋ผ๋Š” ์ ์—์„œ ๊ทธ ์˜์˜๋Š” ์—ฌ์ „ํžˆ ํฌ๋‹ค.

์ตœ๊ทผ ์—ฐ๊ตฌ๋“ค์€ GAN์„ ์™„์ „ํžˆ ๋Œ€์ฒดํ•˜๊ธฐ๋ณด๋‹ค๋Š”,

Diffusion ๋ชจ๋ธ๊ณผ์˜ ํ•˜์ด๋ธŒ๋ฆฌ๋“œ ๊ตฌ์กฐ(์˜ˆ: Diffusion-GAN, MaskGIT ๋“ฑ)๋กœ ์ง„ํ™”ํ•˜๋Š” ์ถ”์„ธ๋‹ค.

๊ฒฐ๊ตญ GAN์€ โ€œ๋๋‚œ ๊ธฐ์ˆ โ€์ด ์•„๋‹ˆ๋ผ,

์ƒ์„ฑ ์ธ๊ณต์ง€๋Šฅ์˜ ๊ทผ๋ณธ ์ฒ ํ•™์„ ๋‹ด๊ณ  ์žˆ๋Š” ์ถœ๋ฐœ์ ์ด๋ผ ํ•  ์ˆ˜ ์žˆ๋‹ค.


๐Ÿ“š References

  • Goodfellow, I. et al. Generative Adversarial Nets. NeurIPS 2014.
  • Radford, A. et al. DCGAN: Unsupervised Representation Learning with Deep Convolutional GANs. arXiv 2015.
  • Arjovsky, M. et al. Wasserstein GAN. ICML 2017.
  • Karras, T. et al. A Style-Based Generator Architecture for GANs. CVPR 2019.
  • Heusel, M. et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. NeurIPS 2017.