Paper Abstract 100-Knock Drill #1 - 十の並列した脳

How this article proceeds
1. AlexNet（2012）
2. ZFNet（2013）
3. GoogLeNet（2014）
4. VGGNet（2014）
5. ResNet（2015）

How this article proceeds

A while back, I wrote and published the following article in an attempt to summarize a paper.

ryosuke-okubo.hatenablog.com

And that got me thinking:

"Wouldn't it be enough to just read the Abstract?"

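In this article, with the goal of simply reading a lot of papers, I try translating only the Abstract. I'm not endorsing the speed-reading theories that are everywhere these days, but the basis of reading comprehension is grasping the main points. For a paper, the important things are mostly summarized in the Abstract, so the idea is to read just that and get a rough sense of where the paper stands.

As for the format, each sentence comes with vocabulary and a translation.

Under Vocabulary, I pick out and list words in the sentence that caught my attention or seem important. I plan to write up their meanings in a separate article later.

Under Translation, I split the Abstract into single sentences and translate each one into Japanese. At this point my English is not good enough to translate on sight, so I use Google Translate. I don't rely on it completely; it is there to get the outline, and I try to correct any sentence that reads unnaturally.

To start, I'll read papers on image recognition, using the article below as a reference.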

qiita.com

1. AlexNet（2012）

Original:

ImageNet Classification with Deep Convolutional Neural Networks

We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes.

Vocabulary:

convolutional neural network

Translation:

ImageNet LSVRC-2010コンテストの120万の高解像度画像を1000の異なるクラスに分類するために，大規模で深い畳み込みニューラルネットワークを学習させた。

On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art.

Vocabulary:

state-of-the-art

Translation:

テストデータではトップ1およびトップ5エラー率でそれぞれ37.5％と17.0％を達成しており，これは従来の最高水準よりもかなり優れている。
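As a side note, here is a minimal sketch of how a top-k error rate can be computed, since the abstract quotes both top-1 and top-5 numbers. The function name `top_k_error` and the toy scores are my own assumptions for illustration, not anything from the paper.

```python
def top_k_error(scores, labels, k):
    """Fraction of samples whose true label is NOT among the k highest-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the largest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy example: 3 samples, 4 classes (made-up scores, not from the paper).
scores = [
    [0.1, 0.6, 0.2, 0.1],  # true class 1: hit at top-1
    [0.5, 0.1, 0.3, 0.1],  # true class 2: miss at top-1, hit at top-2
    [0.4, 0.3, 0.2, 0.1],  # true class 3: miss even within the top 2
]
labels = [1, 2, 3]

top1 = top_k_error(scores, labels, k=1)
top2 = top_k_error(scores, labels, k=2)
print(f"top-1 error: {top1:.3f}, top-2 error: {top2:.3f}")
```

The top-k error can never exceed the top-1 error, which is why the top-5 figure in the abstract is the smaller of the two.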

The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

Vocabulary:

parameters

neurons

Translation:

6000万のパラメーターと65万のニューロンを持つこのニューラルネットワークは，5つの畳み込み層（その一部にはMaxプーリング層が続く）と，最後に1000クラスのソフトマックスを持つ3つの全結合層で構成される。
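To get a feel for the "60 million parameters", here is a rough count. The abstract gives no layer shapes, so the kernel sizes and channel counts below are assumptions taken from the body of the paper (including its two-GPU split, which halves the input channels of conv2, conv4 and conv5). The helper names are hypothetical.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases of a square convolution layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


def fc_params(in_features, out_features):
    """Weights plus biases of a fully-connected layer."""
    return in_features * out_features + out_features


# Assumed layer shapes (from the paper body, not from the abstract).
layers = {
    "conv1": conv_params(11, 3, 96),
    "conv2": conv_params(5, 48, 256),   # 48 = 96 / 2 because of the GPU split
    "conv3": conv_params(3, 256, 384),
    "conv4": conv_params(3, 192, 384),  # 192 = 384 / 2
    "conv5": conv_params(3, 192, 256),
    "fc6": fc_params(6 * 6 * 256, 4096),
    "fc7": fc_params(4096, 4096),
    "fc8": fc_params(4096, 1000),       # the final 1000-way layer
}

total = sum(layers.values())
for name, count in layers.items():
    print(f"{name}: {count:,}")
print(f"total: {total:,}")
```

Under these assumptions the total lands at about 61 million, in line with the figure in the abstract, and well over 90% of it sits in the three fully-connected layers.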

To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation.

Vocabulary:

saturating

Translation:

学習を高速化するために，非飽和ニューロンと畳み込み演算の非常に効率的なGPU実装を使用した。
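The abstract only says "non-saturating neurons"; in the body of the paper this refers to ReLU. A small sketch of what saturation means, comparing the gradient of tanh (saturating) with that of ReLU (non-saturating). The function names are my own.

```python
import math


def tanh_grad(x):
    """Derivative of tanh: shrinks toward 0 as |x| grows (saturation)."""
    return 1.0 - math.tanh(x) ** 2


def relu(x):
    """ReLU activation: max(0, x)."""
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU: stays 1 for every positive input."""
    return 1.0 if x > 0 else 0.0


for x in (0.5, 2.0, 5.0):
    print(f"x={x}: tanh'={tanh_grad(x):.5f}  relu'={relu_grad(x):.1f}")
```

For large inputs the tanh gradient almost vanishes while the ReLU gradient stays at 1, which is one intuition for why training gets faster.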

To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be very effective.

Vocabulary:

overfitting

regularization

dropout

Translation:

全結合層の過学習を減らすために，「ドロップアウト」と呼ばれる最近開発された正則化手法を採用し，これが非常に効果的であることがわかった。
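A minimal sketch of dropout, to make the idea concrete. Note the assumptions: this is the "inverted" variant that rescales the surviving values during training, whereas the paper, as far as I understand it, instead scales the outputs at test time. The drop probability of 0.5 comes from the paper body, not the abstract, and the function name is hypothetical.

```python
import random


def dropout(values, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each value with probability p, rescale survivors by 1 / (1 - p)."""
    if not training or p == 0.0:
        # At evaluation time nothing is dropped.
        return list(values)
    rng = rng or random.Random()
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


activations = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(activations, p=0.5, training=True, rng=random.Random(0))
eval_out = dropout(activations, training=False)
print("train:", train_out)
print("eval: ", eval_out)
```

Because a different random subset of units is silenced on every training step, no single unit can rely on specific other units being present, which is the regularizing effect.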

We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

Vocabulary:

variant

Translation:

またこのモデルの変種をILSVRC-2012コンテストにエントリーし，2位のエントリーの26.2％に対して，15.3％のトップ5テストエラー率を達成して優勝した。
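To see how large that gap is, a quick calculation using only the two numbers in the abstract. The helper name `relative_reduction` is my own.

```python
def relative_reduction(baseline, new):
    """Share of the baseline error that was removed."""
    return (baseline - new) / baseline


second_best = 26.2  # top-5 test error (%) of the second-best entry
winner = 15.3       # top-5 test error (%) of the AlexNet variant

absolute_gap = second_best - winner
relative_gap = relative_reduction(second_best, winner)
print(f"absolute: {absolute_gap:.1f} points, relative: {relative_gap:.1%}")
```

That is a gap of 10.9 points, or roughly 41.6% of the runner-up's error removed.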

2. ZFNet（2013）

Original:

Visualizing and understanding convolutional networks

Large Convolutional Network models have recently demonstrated impressive classification performance on the ImageNet benchmark (Krizhevsky et al., 2012).

Translation:

最近，大規模な畳み込みネットワークモデルはImageNetベンチマークで印象的な分類性能を実証した(Krizhevsky et al., 2012※AlexNetのこと)。

However there is no clear understanding of why they perform so well, or how they might be improved.

Translation:

ただしそれらがなぜそれほどうまく機能するのか，またはどのように改善されるのかについての明確な理解はない。

In this paper we address both issues.

Translation:

本論文では両方の問題に対処する。

We introduce a novel visualization technique that gives insight into the function of intermediate feature layers and the operation of the classifier.

Vocabulary:

introduce

Translation:

中間特徴層の機能と分類器の動作についての洞察を与える，新しい視覚化手法を導入する。

Used in a diagnostic role, these visualizations allow us to find model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark.

Vocabulary:

diagnostic

Translation:

診断の役割で使用されるこれらの視覚化により，ImageNet分類ベンチマークにてKrizhevskyらを上回るモデルアーキテクチャを見つけることができる。

We also perform an ablation study to discover the performance contribution from different model layers.

Vocabulary:

ablation study

Translation:

また，モデルの各層が性能にどの程度寄与しているかを明らかにするために，ablation studyを行う。

We show our ImageNet model generalizes well to other datasets: when the softmax classifier is retrained, it convincingly beats the current state-of-the-art results on Caltech-101 and Caltech-256 datasets.

Translation:

ImageNetモデルが他のデータセットにもうまく汎化することを示す：softmax分類器を再学習させると，Caltech-101およびCaltech-256データセットにおける現在の最高水準の結果を明確に上回る。

3. GoogLeNet（2014）

Original:

Going deeper with convolutions

We propose a deep convolutional neural network architecture codenamed "Inception", which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014).

Vocabulary:

propose

Inception

Translation:

ImageNet Large-Scale Visual Recognition Challenge 2014（ILSVRC 2014）で分類と検出の新しい最高水準を打ち立てる役割を担った，「Inception」というコード名の深い畳み込みニューラルネットワークアーキテクチャを提案する。

The main hallmark of this architecture is the improved utilization of the computing resources inside the network.

Vocabulary:

hallmark

Translation:

このアーキテクチャの主な特徴は，ネットワーク内の計算資源の利用効率の向上である。

This was achieved by a carefully crafted design that allows for increasing the depth and width of the network while keeping the computational budget constant.

Translation:

これは計算の予算を一定に保ちながら，ネットワークの深さと幅を増やすことができる，慎重に作成された設計によって実現された。

To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing.

Vocabulary:

Hebbian principle

multi-scale processing

Translation:

品質を最適化するために，アーキテクチャ上の決定はヘブ則とマルチスケール処理の直感に基づいた。

One particular incarnation used in our submission for ILSVRC 2014 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.

Translation:

ILSVRC 2014への提出で使用した具体的な一形態はGoogLeNetと呼ばれる22層の深さのネットワークであり，その品質は分類と検出の文脈で評価される。

4. VGGNet（2014）

Original:

Very deep convolutional networks for large-scale image recognition

In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting.

Vocabulary:

investigate

depth

Translation:

ここでは，大規模な画像認識設定における畳み込みネットワークの深さがその精度に及ぼす影響を調査する。

Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

Vocabulary:

contribution

Translation:

私たちの主な貢献は，非常に小さい（3x3）畳み込みフィルターを備えたアーキテクチャを使用して，深さを増やしていったネットワークを徹底的に評価したことであり，深さを16〜19の重み層まで押し上げることで従来の構成から大幅な改善を実現できることを示している。
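Why very small (3x3) filters? The abstract does not spell it out, but the usual argument is that a stack of three 3x3 convolutions sees the same 7x7 region as a single 7x7 convolution while using fewer weights. A minimal sketch of that arithmetic, assuming stride 1, the same number of input and output channels in every layer, and a channel width of 256 that I picked only for illustration:

```python
def stacked_receptive_field(kernel, num_layers):
    """Receptive field of num_layers stride-1 convolutions stacked on top of each other."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel, channels, num_layers=1):
    """Weight count (biases ignored) with `channels` inputs and outputs per layer."""
    return num_layers * kernel * kernel * channels * channels


channels = 256  # hypothetical channel width, chosen only for illustration

three_3x3 = conv_weights(3, channels, num_layers=3)
one_7x7 = conv_weights(7, channels)

print("receptive field of three 3x3 layers:", stacked_receptive_field(3, 3))
print(f"weights, three 3x3 layers: {three_3x3:,}")
print(f"weights, one 7x7 layer:    {one_7x7:,}")
```

The stack needs 27 weights per channel pair against 49 for the single large filter, and it also gets three non-linearities instead of one.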

These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively.

Vocabulary:

localisation

classification

Translation:

これらの知見はImageNet Challenge 2014への提出の基礎となり，そこで私たちのチームはlocalisationトラックとclassificationトラックでそれぞれ1位と2位を獲得した。

We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results.

Vocabulary:

representations

Translation:

私たちの表現（学習された特徴表現）が他のデータセットにもうまく汎化し，そこで最高水準の結果を達成することも示している。

We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

Vocabulary:

facilitate

further

Translation:

コンピュータービジョンでの深い視覚表現の使用に関するさらなる研究を推進するために，2つの最高のパフォーマンスのConvNetモデルを公開した。

5. ResNet（2015）

Original:

Deep residual learning for image recognition

Deeper neural networks are more difficult to train.

Translation:

より深いニューラルネットワークは学習がより困難である。

We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously.

Vocabulary:

present

residual

Translation:

以前に使用されたものよりかなり深いネットワークの学習を容易にするための残差学習フレームワークを提示する。

We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.

Vocabulary:

explicitly

reference

Translation:

参照されていない関数を学習する代わりに，レイヤー入力を参照して残差関数を学習するようにレイヤーを明示的に再定式化する。
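"Learning residual functions with reference to the layer inputs" is easier to picture in code. Below is a minimal sketch, assuming the block computes F(x) + x with a plain element-wise addition. Both residual functions are hypothetical stand-ins for learned layers, not the paper's actual layers.

```python
def residual_block(x, residual_fn):
    """Return F(x) + x, where residual_fn plays the role of F."""
    fx = residual_fn(x)
    return [f + xi for f, xi in zip(fx, x)]


def zero_residual(x):
    """A residual function that has learned nothing: F(x) = 0."""
    return [0.0 for _ in x]


def small_correction(x):
    """A hypothetical learned residual that nudges each value by 10%."""
    return [0.1 * xi for xi in x]


x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zero_residual)
nudged_out = residual_block(x, small_correction)
print("F(x) = 0      ->", identity_out)
print("F(x) = 0.1 x  ->", nudged_out)
```

When F outputs zeros the block simply passes its input through, so an extra block only has to learn a correction on top of the identity. As I read it, that is the intuition behind residual networks being easier to optimize.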

We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.

Translation:

これらの残差ネットワークは最適化が容易であり，深さが大幅に増加すると精度が上がることを示す包括的な経験的証拠を提供する。

On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers---8x deeper than VGG nets but still having lower complexity.

Translation:

ImageNetデータセットでは最大152レイヤーの深さの残差ネットを評価する。これはVGGネットの8倍の深さだが複雑さは低くなっている。

An ensemble of these residual nets achieves 3.57% error on the ImageNet test set.

Translation:

これらの残差ネットのアンサンブルはImageNetテストセットで3.57％のエラーを達成した。

This result won the 1st place on the ILSVRC 2015 classification task.

Translation:

この結果によりILSVRC 2015 classificationタスクで1位になった。

We also present analysis on CIFAR-10 with 100 and 1000 layers.

Translation:

またCIFAR-10について，100層および1000層のネットワークでの分析も示す。

The depth of representations is of central importance for many visual recognition tasks.

Vocabulary:

of central importance

Translation:

表現の深さは多くの視覚認識タスクにとって中心的な重要性を持つ。

Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset.

Vocabulary:

Solely

Translation:

非常に深い表現を用いたことだけで，COCOオブジェクト検出データセットで28％の相対的な改善が得られた。

Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

Vocabulary:

foundations

Translation:

深い残差ネットはILSVRC＆COCO 2015コンテストへの提出の基礎であり，そこではImageNet検出，ImageNetローカリゼーション，COCO検出，およびCOCOセグメンテーションのタスクでも1位を獲得した。

Next time ↓

In preparation