Paper Abstract 100 Knocks #12 - Ten Parallel Brains

Previous post ↓

ryosuke-okubo.hatenablog.com

56 Highway Networks (2015)

Original:

Highway Networks

Abstract:

There is plenty of theoretical and empirical evidence that depth of neural networks is a crucial ingredient for their success.

Vocabulary:

plenty

Translation:

There is ample theoretical and empirical evidence that depth is a key ingredient in the success of neural networks.

However, network training becomes more difficult with increasing depth and training of very deep networks remains an open problem.

Vocabulary:

open problem

Translation:

However, training becomes more difficult as depth increases, and training very deep networks remains an open (unsolved) problem.

In this extended abstract, we introduce a new architecture designed to ease gradient-based training of very deep networks.

Translation:

In this extended abstract, we introduce a new architecture designed to make gradient-based training of very deep networks easier.

We refer to networks with this architecture as highway networks, since they allow unimpeded information flow across several layers on "information highways".

Vocabulary:

unimpeded

Translation:

We call networks with this architecture highway networks, because they allow information to flow unimpeded across many layers along "information highways".

The architecture is characterized by the use of gating units which learn to regulate the flow of information through a network.

Translation:

The architecture is characterized by gating units that learn to regulate the flow of information through the network.
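The gating described above can be sketched as a single highway layer. The formula y = H(x)·T(x) + x·(1 - T(x)), where T is a sigmoid "transform gate", comes from the full paper rather than from this abstract. The tanh choice for H, the identity weights, the gate bias of -20, and the helper names are all assumptions made for illustration. Plain Python lists keep the sketch dependency-free.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def matvec(w, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def highway_layer(x, w_h, b_h, w_t, b_t):
    """One highway layer: y = H(x) * T(x) + x * (1 - T(x)).

    H is a tanh transform (an assumed choice) and T is the sigmoid
    transform gate. Input and output widths must match so that the
    carry path can pass x through unchanged.
    """
    h = [math.tanh(z + b) for z, b in zip(matvec(w_h, x), b_h)]
    t = [sigmoid(z + b) for z, b in zip(matvec(w_t, x), b_t)]
    return [hi * ti + xi * (1.0 - ti) for hi, ti, xi in zip(h, t, x)]


# Hypothetical 2-unit layer with identity weights. A strongly negative gate
# bias drives T(x) toward 0, so each layer mostly carries its input forward.
identity = [[1.0, 0.0], [0.0, 1.0]]
x = [0.5, -1.0]
z = x
for _ in range(100):  # stack 100 such layers
    z = highway_layer(z, identity, [0.0, 0.0], identity, [-20.0, -20.0])
print([round(v, 6) for v in z])  # prints [0.5, -1.0]
```

The gate bias keeps T(x) close to 0, so even 100 stacked layers return almost exactly the input. This hints at how gating lets very deep stacks pass signals through. With gate biases near 0, the same stack would transform the input substantially at every layer.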

Highway networks with hundreds of layers can be trained directly using stochastic gradient descent and with a variety of activation functions, opening up the possibility of studying extremely deep and efficient architectures.

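Vocabulary:

opening up

Translation:

Highway networks with hundreds of layers can be trained directly with stochastic gradient descent and a variety of activation functions. This opens up the possibility of studying extremely deep and efficient architectures.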

57 Neural Machine Translation (2014)

Original:

Neural Machine Translation by Jointly Learning to Align and Translate

Abstract:

Neural machine translation is a recently proposed approach to machine translation.

Translation:

Neural machine translation is a recently proposed approach to machine translation.

Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance.

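Translation:

Unlike traditional statistical machine translation, neural machine translation aims to build a single neural network that can be jointly tuned to maximize translation performance.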

The models proposed recently for neural machine translation often belong to a family of encoder-decoders and consists of an encoder that encodes a source sentence into a fixed-length vector from which a decoder generates a translation.

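Translation:

The models recently proposed for neural machine translation often belong to the encoder-decoder family. In these models, an encoder encodes the source sentence into a fixed-length vector, and a decoder generates the translation from that vector.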
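The toy recurrent encoder below makes the fixed-length bottleneck concrete. It is a minimal sketch, not the encoder of any particular paper. The scalar token features, the per-unit (diagonal) recurrence, the weights, and the function name `rnn_encode` are all hypothetical.

```python
import math


def rnn_encode(tokens, w_in, w_rec):
    """Toy Elman-style RNN encoder over scalar token features.

    Each hidden unit k updates as h_k = tanh(w_in[k] * x + w_rec[k] * h_k);
    the diagonal recurrence is a simplification. Only the final hidden
    state is returned, so its size is fixed by the weights, not the input.
    """
    h = [0.0] * len(w_in)
    for x in tokens:
        h = [math.tanh(wi * x + wr * hk) for wi, wr, hk in zip(w_in, w_rec, h)]
    return h


w_in = [0.5, -0.3, 0.8]   # hypothetical weights for 3 hidden units
w_rec = [0.9, 0.7, 0.2]
short_vec = rnn_encode([1.0, 2.0], w_in, w_rec)
long_vec = rnn_encode([0.1 * i for i in range(50)], w_in, w_rec)
print(len(short_vec), len(long_vec))  # prints 3 3
```

A 2-token input and a 50-token input both end up as the same 3 numbers. That compression is what the next sentence of the abstract calls a bottleneck.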

In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.

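Vocabulary:

conjecture

Translation:

In this paper, we conjecture that the fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture. We propose extending it by letting the model automatically (soft-)search for the parts of the source sentence that are relevant to predicting a target word, without having to form those parts explicitly as hard segments.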
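The (soft-)search can be sketched with an additive alignment model in the style of the full paper. Each source annotation h_j is scored against the previous decoder state, and softmax turns the scores into alignment weights. The context vector is then the weighted sum of the annotations. This is a minimal sketch assuming tiny hand-picked 2-dimensional vectors and identity matrices for W_a and U_a. In the paper these matrices are learned, and the annotations come from a bidirectional RNN.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, x):
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def additive_attention(s_prev, annotations, w_a, u_a, v_a):
    """Soft-search over source annotations for one decoder step.

    e_j = v_a . tanh(W_a s_prev + U_a h_j)   (alignment score)
    alpha = softmax(e)                        (soft alignment weights)
    c = sum_j alpha_j * h_j                   (context vector)
    """
    ws = matvec(w_a, s_prev)
    scores = []
    for h in annotations:
        hidden = [math.tanh(a + b) for a, b in zip(ws, matvec(u_a, h))]
        scores.append(sum(v * z for v, z in zip(v_a, hidden)))
    alpha = softmax(scores)
    context = [sum(a * h[d] for a, h in zip(alpha, annotations))
               for d in range(len(annotations[0]))]
    return alpha, context


# Hypothetical 2-dim states for a 3-word source sentence.
identity = [[1.0, 0.0], [0.0, 1.0]]
annotations = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
alpha, context = additive_attention([1.0, 0.0], annotations, identity, identity, [1.0, 0.0])
print(alpha.index(max(alpha)))  # prints 0
```

The weights `alpha` play the role of the soft alignment. Every source position receives some weight, so no hard segment has to be chosen.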

With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation.

Translation:

With this new approach, we achieve translation performance on the English-to-French task comparable to the existing state-of-the-art phrase-based system.

Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.

Vocabulary:

reveals

intuition

Translation:

Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.

58 Attention (2017)

Original:

Attention Is All You Need

Abstract:

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration.

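Vocabulary:

configuration

Translation:

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks (RNNs or CNNs) in an encoder-decoder configuration.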

The best performing models also connect the encoder and decoder through an attention mechanism.

Translation:

The best-performing models also connect the encoder and decoder through an attention mechanism.

We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.

Vocabulary:

entirely

Translation:

We propose the Transformer, a new, simple network architecture based solely on attention mechanisms that dispenses with recurrence and convolution entirely.
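The core operation in the full paper is scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. The sketch below covers only that operation. It leaves out learned projections, multiple heads, masking, and positional encodings, and the toy vectors are hypothetical.

```python
import math


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V, with Q, K, V given as lists of row vectors.

    Each query row is computed independently of the others, so all
    positions can be processed in parallel (no step-by-step recurrence).
    """
    d_k = len(k[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


# Self-attention over 3 hypothetical token vectors (Q = K = V = x).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = scaled_dot_product_attention(x, x, x)
print(len(y), len(y[0]))  # prints 3 2
```

No iteration of the outer loop over queries depends on another. That independence lets the Transformer process all positions in parallel, whereas an RNN must step through them one at a time.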

Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.

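Vocabulary:

parallelizable

Translation:

Experiments on two machine translation tasks show that these models are superior in quality while being more parallelizable and requiring significantly less time to train.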

Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU.

Translation:

Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task. This improves on the existing best results, ensembles included, by more than 2 BLEU.

On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature.

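Translation:

On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs. That is a small fraction of the training cost of the best models in the literature.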

We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.

Vocabulary:

parsing

Translation:

We show that the Transformer generalizes well to other tasks by successfully applying it to English constituency parsing with both large and limited training data.

59 Transformer-XL (2019)

Original:

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Abstract:

Transformers have a potential of learning longer-term dependency, but are limited by a fixed-length context in the setting of language modeling.

Translation:

Transformers can potentially learn longer-term dependencies, but in the language-modeling setting they are limited by a fixed-length context.

We propose a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length without disrupting temporal coherence.

Vocabulary:

disrupting

temporal coherence

Translation:

We propose Transformer-XL, a novel neural architecture that enables learning dependencies beyond a fixed length without disrupting temporal coherence.

It consists of a segment-level recurrence mechanism and a novel positional encoding scheme.

Translation:

It consists of a segment-level recurrence mechanism and a novel positional encoding scheme.
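The segment-level recurrence can be sketched as follows. Hidden states from earlier segments are cached, and the current segment attends over that cache plus itself, so its context extends past the segment boundary. This is a single-layer simplification under stated assumptions: keys equal values, there are no learned projections, and the toy vectors are hypothetical. The paper applies the cache at every layer and pairs it with its relative positional encoding, which is omitted here.

```python
import math


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys_values):
    """Dot-product attention with keys = values and no learned projections."""
    d = len(keys_values[0])
    out = []
    for q in queries:
        w = softmax([sum(a * b for a, b in zip(q, kv)) / math.sqrt(d) for kv in keys_values])
        out.append([sum(wi * kv[c] for wi, kv in zip(w, keys_values)) for c in range(d)])
    return out


def run_segments(segments, mem_len):
    """Process segments in order while reusing cached states from earlier ones.

    Queries come only from the current segment; keys/values are the cached
    memory followed by the current segment. In Transformer-XL the cache is
    excluded from gradient updates; plain lists have no gradients, so that
    detail is only noted here.
    """
    memory, outputs, context_sizes = [], [], []
    for seg in segments:
        context = memory + seg
        outputs.append(attend(seg, context))
        context_sizes.append(len(context))
        # Keep the most recent mem_len states (none when mem_len is 0).
        memory = context[-mem_len:] if mem_len > 0 else []
    return outputs, context_sizes


# Three hypothetical 2-token segments of 2-dim states, memory of 4 states.
segs = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.5, 0.0]], [[0.0, 0.5], [1.0, 0.5]]]
outputs, sizes = run_segments(segs, mem_len=4)
print(sizes)  # prints [2, 4, 6]
```

Without memory (`mem_len=0`), every segment sees only its own 2 states, which is the context fragmentation mentioned below. With memory, the third segment already attends over 6 states.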

Our method not only enables capturing longer-term dependency, but also resolves the context fragmentation problem.

Translation:

Our method not only captures longer-term dependencies but also resolves the context fragmentation problem.

As a result, Transformer-XL learns dependency that is 80% longer than RNNs and 450% longer than vanilla Transformers, achieves better performance on both short and long sequences, and is up to 1,800+ times faster than vanilla Transformers during evaluation.

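Translation:

As a result, Transformer-XL learns dependencies that are 80% longer than those of RNNs and 450% longer than those of vanilla Transformers. It also performs better on both short and long sequences, and during evaluation it is up to more than 1,800 times faster than vanilla Transformers.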

Notably, we improve the state-of-the-art results of bpc/perplexity to 0.99 on enwiki8, 1.08 on text8, 18.3 on WikiText-103, 21.8 on One Billion Word, and 54.5 on Penn Treebank (without finetuning).

Translation:

Notably, we improve the state-of-the-art bpc/perplexity to 0.99 on enwiki8, 1.08 on text8, 18.3 on WikiText-103, 21.8 on One Billion Word, and 54.5 on Penn Treebank (without fine-tuning).
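The two metrics in this sentence are closely related. Bits per character (bpc) is used for the character-level enwiki8 and text8, and it is the average negative log2 probability of each correct character. Perplexity is used for the word-level WikiText-103, One Billion Word, and Penn Treebank. It is the exponential of the average negative log probability per token, which equals 2 raised to the bits per token. Lower is better for both. The probabilities below are hypothetical.

```python
import math


def bits_per_character(char_probs):
    """Mean negative log2 probability assigned to each true next character."""
    return -sum(math.log2(p) for p in char_probs) / len(char_probs)


def perplexity(token_probs):
    """exp of the mean negative natural-log probability per token."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))


# Hypothetical probabilities a model gave to the correct next symbol.
print(bits_per_character([0.5, 0.5, 0.5, 0.5]))  # prints 1.0
print(round(perplexity([0.25, 0.25, 0.25]), 6))  # prints 4.0
```

A model that gives every correct character probability 0.5 scores 1 bpc. A model that gives every correct token probability 0.25 has perplexity 4, which makes it as uncertain as a uniform choice among 4 options.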

When trained only on WikiText-103, Transformer-XL manages to generate reasonably coherent, novel text articles with thousands of tokens.

Vocabulary:

reasonably

coherent

Translation:

When trained only on WikiText-103, Transformer-XL can generate reasonably coherent, novel text articles that are thousands of tokens long.

Our code, pretrained models, and hyperparameters are available in both Tensorflow and PyTorch.

Translation:

Our code, pretrained models, and hyperparameters are available in both TensorFlow and PyTorch.

60 ELMo (2018)

Original:

Deep contextualized word representations

Abstract:

We introduce a new type of deep contextualized word representation that models both

(1) complex characteristics of word use (e.g., syntax and semantics),

and (2) how these uses vary across linguistic contexts (i.e., to model polysemy).

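Translation:

We introduce a new type of deep contextualized word representation that models both of the following:

(1) complex characteristics of word use (e.g., syntax and semantics)

(2) how these uses vary across linguistic contexts (i.e., modeling polysemy)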

Our word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), which is pre-trained on a large text corpus.

Translation:

Our word vectors are learned functions of the internal states of a deep bidirectional language model (biLM) that is pre-trained on a large text corpus.
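In the full paper, the "learned functions of the internal states" are a task-specific weighted sum of the biLM layers. Softmax-normalized weights s_j are applied across the layers, and the result is scaled by a factor gamma. Below is a minimal sketch for a single token. The layer vectors and weights are hypothetical, and in practice s and gamma are learned together with the downstream model.

```python
import math


def softmax(ws):
    m = max(ws)
    exps = [math.exp(w - m) for w in ws]
    total = sum(exps)
    return [e / total for e in exps]


def elmo_mix(layer_states, raw_weights, gamma):
    """gamma * sum_j s_j * h_j with s = softmax(raw_weights), for one token.

    layer_states[j] is that token's vector at biLM layer j (layer 0 being the
    context-independent token layer); raw_weights and gamma would be learned
    by the downstream task.
    """
    s = softmax(raw_weights)
    dim = len(layer_states[0])
    return [gamma * sum(s_j * h[d] for s_j, h in zip(s, layer_states)) for d in range(dim)]


layers = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # hypothetical 3 layers x 2 dims
print([round(v, 6) for v in elmo_mix(layers, [0.0, 0.0, 0.0], 1.0)])  # prints [1.0, 1.0]
```

Making one raw weight much larger than the others pushes the mix toward that single layer. A downstream model can therefore learn which depths of the biLM to emphasize.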

We show that these representations can be easily added to existing models and significantly improve the state of the art across six challenging NLP problems, including question answering, textual entailment and sentiment analysis.

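Vocabulary:

entailment

Translation:

We show that these representations can easily be added to existing models and significantly improve the state of the art on six challenging NLP problems, including question answering, textual entailment, and sentiment analysis.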

We also present an analysis showing that exposing the deep internals of the pre-trained network is crucial, allowing downstream models to mix different types of semi-supervision signals.

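Vocabulary:

crucial

semi-supervision

Translation:

We also present an analysis showing that exposing the deep internals of the pre-trained network is crucial, because it allows downstream models to mix different types of semi-supervision signals.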

Next post ↓

ryosuke-okubo.hatenablog.com