100 Paper Abstracts Drill #5 - 十の並列した脳

Previous post ↓

ryosuke-okubo.hatenablog.com

21 ConditionalGAN（2014）
22 InfoGAN（2016）
23 pix2pix（2016）
24 pix2pixHD（2017）
25 LAPGAN（2015）

21 ConditionalGAN（2014）

Original:

Conditional Generative Adversarial Nets

Abstract

Generative Adversarial Nets [8] were recently introduced as a novel way to train generative models.

Translation:

GANs [8] were recently introduced as a new way to train generative models.

In this work we introduce the conditional version of generative adversarial nets, which can be constructed by simply feeding the data, y, we wish to condition on to both the generator and discriminator.

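Vocabulary:

feeding

Translation:

Here we introduce a conditional version of GANs, which can be built simply by feeding the data y that we want to condition on to both the generator and the discriminator.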
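To make "feeding y to both the generator and discriminator" concrete, here is a minimal sketch in plain Python. It assumes the simplest possible conditioning, namely concatenating a one-hot label onto each network's input; the abstract does not say how y is combined, so this choice, along with the names `one_hot`, `generator_input`, `discriminator_input` and the constant `NOISE_DIM`, is only an illustration and not the authors' implementation.

```python
import random

NUM_CLASSES = 10  # MNIST has ten digit classes
NOISE_DIM = 4     # hypothetical size, kept tiny for readability


def one_hot(label, num_classes=NUM_CLASSES):
    """Encode an integer class label y as a one-hot list."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def generator_input(z, label):
    """Assumed conditioning: the generator sees noise z joined with y."""
    return list(z) + one_hot(label)


def discriminator_input(x, label):
    """Assumed conditioning: the discriminator sees sample x joined with the same y."""
    return list(x) + one_hot(label)


rng = random.Random(0)
z = [rng.uniform(-1.0, 1.0) for _ in range(NOISE_DIM)]
x = [0.0] * 784  # placeholder for a flattened 28x28 image

g_in = generator_input(z, label=7)
d_in = discriminator_input(x, label=7)
print(len(g_in), len(d_in))  # 14 794
```

The point is only that both networks receive the same y, so the discriminator can penalize a sample that looks real but does not match its label.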

We show that this model can generate MNIST digits conditioned on class labels.

Vocabulary:

digits

Translation:

We show that this model can generate MNIST digit images conditioned on class labels.

We also illustrate how this model could be used to learn a multi-modal model, and provide preliminary examples of an application to image tagging in which we demonstrate how this approach can generate descriptive tags which are not part of training labels.

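Vocabulary:

illustrate

preliminary

Translation:

We also show how this model can be used to learn a multi-modal model, and we give preliminary examples of an application to image tagging, showing how the approach can generate descriptive tags that are not among the training labels.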

22 InfoGAN（2016）

Title:

InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Abstract

This paper describes InfoGAN, an information-theoretic extension to the Generative Adversarial Network that is able to learn disentangled representations in a completely unsupervised manner.

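Vocabulary:

information-theoretic

disentangled

Translation:

This paper describes InfoGAN, an information-theoretic extension of the GAN that can learn disentangled representations in a fully unsupervised way.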

InfoGAN is a generative adversarial network that also maximizes the mutual information between a small subset of the latent variables and the observation.

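Vocabulary:

mutual information

latent variables

observation

Translation:

InfoGAN is a GAN that additionally maximizes the mutual information between a small subset of the latent variables and the observation.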

We derive a lower bound to the mutual information objective that can be optimized efficiently, and show that our training procedure can be interpreted as a variation of the Wake-Sleep algorithm.

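Vocabulary:

lower bound

objective

Wake-Sleep algorithm

Translation:

We derive a lower bound on the mutual information objective that can be optimized efficiently, and show that the training procedure can be interpreted as a variation of the Wake-Sleep algorithm.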
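For reference, the lower bound mentioned here can be written out as follows. This is taken from the body of the InfoGAN paper as I recall it, not from the abstract, so treat the notation as an assumption: c is the small subset of latent variables (the latent code), z is the remaining noise, Q(c|x) is an auxiliary distribution approximating the posterior, H(c) is the entropy of the code, and λ is a weighting hyperparameter.

```latex
\begin{aligned}
I\big(c;\, G(z,c)\big) &\ge \mathbb{E}_{c \sim P(c),\; x \sim G(z,c)}\big[\log Q(c \mid x)\big] + H(c) \;=\; L_I(G, Q), \\
\min_{G,\,Q}\ \max_{D}\ V_{\mathrm{InfoGAN}}(D, G, Q) &= V(D, G) - \lambda\, L_I(G, Q).
\end{aligned}
```

Because L_I only needs samples of c and x plus an evaluation of Q, it can be estimated by Monte Carlo and optimized alongside the usual GAN game V(D, G), which is what makes the bound "efficiently" optimizable.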

Specifically, InfoGAN successfully disentangles writing styles from digit shapes on the MNIST dataset, pose from lighting of 3D rendered images, and background digits from the central digit on the SVHN dataset.

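Translation:

Specifically, InfoGAN successfully separates writing style from digit shape on the MNIST dataset, pose from lighting in 3D rendered images, and background digits from the central digit on the SVHN dataset.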

It also discovers visual concepts that include hair styles, presence/absence of eyeglasses, and emotions on the CelebA face dataset.

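Vocabulary:

concepts

Translation:

It also discovers visual concepts such as hair styles, the presence or absence of eyeglasses, and emotions on the CelebA face dataset.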

Experiments show that InfoGAN learns interpretable representations that are competitive with representations learned by existing fully supervised methods.

Translation:

Experiments show that InfoGAN learns interpretable representations that are competitive with those learned by existing fully supervised methods.

23 pix2pix（2016）

Original:

Image-to-Image Translation with Conditional Adversarial Networks

Abstract

We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems.

Vocabulary:

general-purpose

Translation:

We investigate cGANs as a general-purpose solution to image-to-image translation problems.

These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping.

Translation:

These networks learn not only the mapping from input image to output image, but also a loss function for training that mapping.
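The "learned loss function" is the discriminator itself. As a sketch of what this looks like in the body of the pix2pix paper (from memory, not stated in the abstract, so the symbols are assumptions): x is the input image, y the target output image, z a noise source, and λ a weight on an additional L1 term.

```latex
\begin{aligned}
\mathcal{L}_{cGAN}(G, D) &= \mathbb{E}_{x,y}\big[\log D(x, y)\big] + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big], \\
\mathcal{L}_{L1}(G) &= \mathbb{E}_{x,y,z}\big[\lVert y - G(x, z) \rVert_1\big], \\
G^{*} &= \arg\min_{G}\max_{D}\ \mathcal{L}_{cGAN}(G, D) + \lambda\, \mathcal{L}_{L1}(G).
\end{aligned}
```

The adversarial term adapts to the task because D is trained on the data, while the fixed L1 term only keeps the output close to the target.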

This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations.

Translation:

This makes it possible to apply the same generic approach to problems that would traditionally require very different loss formulations.

We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.

Translation:

We show that this approach is effective for tasks such as synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images.

Indeed, since the release of the pix2pix software associated with this paper, a large number of internet users (many of them artists) have posted their own experiments with our system, further demonstrating its wide applicability and ease of adoption without the need for parameter tweaking.

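Vocabulary:

Indeed

tweaking

Translation:

Indeed, since the release of the pix2pix software associated with this paper, a large number of internet users (many of them artists) have posted their own experiments with our system, further demonstrating its wide applicability and its ease of adoption without any need for parameter tweaking.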

As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.

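Vocabulary:

no longer

hand-engineer

Translation:

As a community, we no longer hand-engineer our mapping functions, and this work suggests that we can achieve reasonable results without hand-engineering our loss functions either.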

24 pix2pixHD（2017）

Original:

High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs

Abstract

We present a new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs).

Vocabulary:

photo-realistic

semantic

Translation:

We present a new method that uses cGANs to synthesize high-resolution, photo-realistic images from semantic label maps.

Conditional GANs have enabled a variety of applications, but the results are often limited to low-resolution and still far from realistic.

Vocabulary:

still far from

Translation:

cGANs have enabled a variety of applications, but the results are often limited to low resolution and are still far from realistic.

In this work, we generate 2048x1024 visually appealing results with a novel adversarial loss, as well as new multi-scale generator and discriminator architectures.

Vocabulary:

appealing

Translation:

In this work, we generate visually appealing 2048x1024 results using a novel adversarial loss together with new multi-scale generator and discriminator architectures.

Furthermore, we extend our framework to interactive visual manipulation with two additional features.

Vocabulary:

manipulation

Translation:

Furthermore, we extend the framework to interactive visual manipulation with two additional features.

First, we incorporate object instance segmentation information, which enables object manipulations such as removing/adding objects and changing the object category.

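Vocabulary:

incorporate

Translation:

First, we incorporate object instance segmentation information, which enables object manipulations such as removing or adding objects and changing an object's category.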

Second, we propose a method to generate diverse results given the same input, allowing users to edit the object appearance interactively.

Vocabulary:

diverse

appearance

Translation:

Second, we propose a method that generates diverse results from the same input, allowing users to edit object appearance interactively.

Human opinion studies demonstrate that our method significantly outperforms existing methods, advancing both the quality and the resolution of deep image synthesis and editing.

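Translation:

Human opinion studies show that our method significantly outperforms existing methods, advancing both the quality and the resolution of deep image synthesis and editing.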

25 LAPGAN（2015）

Original:

Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks

Abstract

In this paper we introduce a generative parametric model capable of producing high quality samples of natural images.

Translation:

In this paper we introduce a generative parametric model capable of producing high-quality samples of natural images.

Our approach uses a cascade of convolutional networks within a Laplacian pyramid framework to generate images in a coarse-to-fine fashion.

Vocabulary:

coarse-to-fine

Translation:

Our approach uses a cascade of convolutional networks within a Laplacian pyramid framework to generate images in a coarse-to-fine manner.

At each level of the pyramid, a separate generative convnet model is trained using the Generative Adversarial Nets (GAN) approach (Goodfellow et al.).

Translation:

At each level of the pyramid, a separate generative convnet model is trained using the GAN approach (Goodfellow et al.).
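To get a feel for what "coarse-to-fine within a Laplacian pyramid" means, here is a rough sketch on a 1-D signal in plain Python. It is a simplification under stated assumptions: pair averaging stands in for blur-and-downsample, value repetition stands in for upsampling, and the per-level detail (residual) is computed from a real signal. In LAPGAN, as I understand it, each level's generator would instead produce that residual given the upsampled coarser image. The function names are hypothetical.

```python
def downsample(signal):
    """Halve the resolution by averaging neighbouring pairs."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def upsample(signal):
    """Double the resolution by repeating each value."""
    return [v for v in signal for _ in range(2)]


def build_pyramid(signal, levels):
    """Return (residuals ordered fine to coarse, coarsest signal)."""
    residuals = []
    current = list(signal)
    for _ in range(levels):
        coarse = downsample(current)
        predicted = upsample(coarse)
        # The residual holds the detail that the coarser level cannot express.
        residuals.append([c - p for c, p in zip(current, predicted)])
        current = coarse
    return residuals, current


def reconstruct(residuals, coarsest):
    """Go coarse to fine: upsample, then add that level's detail."""
    current = list(coarsest)
    for residual in reversed(residuals):
        current = [u + r for u, r in zip(upsample(current), residual)]
    return current


signal = [1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0]
residuals, coarsest = build_pyramid(signal, levels=2)
restored = reconstruct(residuals, coarsest)
print(coarsest)            # [4.0, 5.0]
print(restored == signal)  # True
```

Sampling would follow the same loop as `reconstruct`, except that the coarsest signal and every residual come from a trained generator rather than from real data.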

Samples drawn from our model are of significantly higher quality than alternate approaches.

Vocabulary:

alternate

Translation:

Samples drawn from our model are of significantly higher quality than those from alternative approaches.

In a quantitative assessment by human evaluators, our CIFAR10 samples were mistaken for real images around 40% of the time, compared to 10% for samples drawn from a GAN baseline model.

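Vocabulary:

mistaken

Translation:

In a quantitative assessment by human evaluators, our CIFAR10 samples were mistaken for real images about 40% of the time, compared with 10% for samples drawn from a GAN baseline model.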