Paper Abstract 100 Knocks #11 - Ten Parallel Brains

Previous post ↓

ryosuke-okubo.hatenablog.com

This time we look at how LSTM evolved.

Reference:

https://qiita.com/t_Signull/items/21b82be280b46f467d1b

51 LSTM (original, 1997)
52 LSTM (introducing the forget gate, 1999)
53 LSTM (introducing peephole connections, 2000)
54 LSTM (training with full BPTT, 2005)
55 GRU (2014)

51 LSTM (original, 1997)


Original paper:

Long Short-Term Memory

Abstract:

Learning to store information over extended time intervals via recurrent backpropagation takes a very long time, mostly due to insufficient, decaying error back flow.

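Translation:

Learning to store information over long time intervals via recurrent backpropagation takes a very long time, mainly because the error flowing backward is insufficient and decays.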

We briefly review Hochreiter's 1991 analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called "Long Short-Term Memory" (LSTM).

Translation:

We briefly review Hochreiter's 1991 analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called "Long Short-Term Memory" (LSTM).
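Hochreiter's analysis can be illustrated numerically. In a conventional recurrent unit, an error signal sent back q steps is multiplied by one factor f'(net)·w per step. The sketch below assumes a single self-connected logistic unit with a hypothetical self-weight w, and holds the net input at 0, where the sigmoid derivative is largest (0.25):

```python
import math


def sigmoid_prime(x: float) -> float:
    """Derivative of the logistic sigmoid; its maximum is 0.25, reached at x = 0."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def error_scale(w: float, steps: int, net: float = 0.0) -> float:
    """Factor applied to an error signal after it flows back `steps` time steps
    through one self-connected sigmoid unit with self-weight `w`.

    Simplifying assumption: the net input stays at `net` on every step, so each
    step contributes the same factor |f'(net) * w|.
    """
    factor = abs(sigmoid_prime(net) * w)
    return factor ** steps


for w in (1.0, 3.0, 4.0, 5.0):
    print(f"w = {w}: error scale after 100 steps = {error_scale(w, 100):.3e}")
```

When w = 4.0, the per-step factor is exactly 1.0. Below that, the signal vanishes exponentially, and above it the signal blows up. This holds only under the fixed-net-input simplification: in a real network, a large w also pushes net inputs into the flat region of the sigmoid, which shrinks f'.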

Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete time steps by enforcing constant error flow through "constant error carrousels" within special units.

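Vocabulary:

enforcing

Translation:

By truncating the gradient where doing so causes no harm, LSTM can learn to bridge minimal time lags of more than 1000 discrete time steps. It does this by enforcing constant error flow through "constant error carrousels" inside special units.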
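A constant error carrousel is a linear unit whose self-connection is fixed at 1.0. Because of this, the local derivative ds_t/ds_{t-1} is exactly 1.0 at every step. The scalar sketch below assumes a hypothetical input sequence: a single pulse followed by 1200 zeros.

```python
def cec_forward(inputs):
    """Constant error carrousel: s_t = 1.0 * s_{t-1} + x_t (identity activation)."""
    s = 0.0
    states = []
    for x in inputs:
        s = 1.0 * s + x
        states.append(s)
    return states


def cec_backward(final_error, steps):
    """Backpropagate an error through `steps` CEC transitions.
    Each transition has local derivative 1.0, so the error is passed on unchanged."""
    err = final_error
    for _ in range(steps):
        err *= 1.0  # derivative of the fixed self-connection
    return err


states = cec_forward([0.5] + [0.0] * 1200)
print(states[-1])                # the value written at step 0 is still stored
print(cec_backward(1.0, 1200))  # the error is neither shrunk nor amplified
```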

Multiplicative gate units learn to open and close access to the constant error flow.

Vocabulary:

Multiplicative

Translation:

Multiplicative gate units learn to open and close access to the constant error flow.
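The gates wrap the carrousel like this: the input gate scales what is written into the cell state s, and the output gate scales how much of s the rest of the network sees. The following is a minimal scalar sketch of one 1997-style memory cell, which has no forget gate yet. The weights are hypothetical, and tanh is substituted for the paper's squashing functions g and h.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm1997_step(x, y_prev, s_prev, p):
    """One step of a single 1997-style LSTM memory cell (no forget gate).

    `p` holds hypothetical scalar weights. tanh stands in for the squashing
    functions g and h used in the paper.
    """
    in_gate = sigmoid(p["w_in"] * x + p["u_in"] * y_prev + p["b_in"])
    out_gate = sigmoid(p["w_out"] * x + p["u_out"] * y_prev + p["b_out"])
    g = math.tanh(p["w_c"] * x + p["u_c"] * y_prev)
    s = s_prev + in_gate * g       # CEC: self-connection fixed at 1.0
    y = out_gate * math.tanh(s)    # output gate decides how much of s is visible
    return y, s


params = {
    "w_in": 0.0, "u_in": 0.0, "b_in": -50.0,   # input gate held (almost) closed
    "w_out": 0.0, "u_out": 0.0, "b_out": 0.0,
    "w_c": 1.0, "u_c": 0.0,
}

# With the input gate closed, new inputs cannot overwrite the stored value.
y, s = 0.0, 0.7
for _ in range(100):
    y, s = lstm1997_step(1.0, y, s, params)
print(round(s, 6))

# Opening the input gate lets the same input be written into the cell.
_, s_open = lstm1997_step(1.0, 0.0, 0.7, dict(params, b_in=50.0))
print(round(s_open, 4))
```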

LSTM is local in space and time;

its computational complexity per time step and weight is O(1).

Translation:

LSTM is local in space and time;

its computational complexity per time step and per weight is O(1).

Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations.

Translation:

Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations.

In comparisons with RTRL, BPTT, Recurrent Cascade-Correlation, Elman nets, and Neural Sequence Chunking, LSTM leads to many more successful runs, and learns much faster.

Translation:

In comparisons with RTRL, BPTT, Recurrent Cascade-Correlation, Elman nets, and Neural Sequence Chunking, LSTM produces many more successful runs and learns much faster.

LSTM also solves complex, artificial long time lag tasks that have never been solved by previous recurrent network algorithms.

Translation:

LSTM also solves complex, artificial long-time-lag tasks that no previous recurrent network algorithm had solved.

52 LSTM (introducing the forget gate, 1999)


Original paper:

Learning to Forget: Continual Prediction with LSTM

Abstract:

Long Short-Term Memory (LSTM, Hochreiter & Schmidhuber, 1997) can solve numerous tasks not solvable by previous learning algorithms for recurrent neural networks (RNNs).

Translation:

Long Short-Term Memory (LSTM, Hochreiter & Schmidhuber, 1997) can solve many tasks that previous learning algorithms for recurrent neural networks (RNNs) cannot.

We identify a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset.

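Vocabulary:

explicitly

Translation:

We identify a weakness of LSTM networks that process continual input streams. These streams are not segmented in advance into subsequences with explicitly marked ends, where the network's internal state could be reset.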

Without resets, the state may grow indefinitely and eventually cause the network to break down.

Vocabulary:

indefinitely

eventually

Translation:

Without resets, the state may grow indefinitely and eventually cause the network to break down.
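The growth is easy to reproduce. A forget-gate-free cell only ever adds to its state, so on an unsegmented stream the state keeps climbing. The sketch below uses hypothetical scalar weights and a constant input stream. It also shows one way the breakdown can manifest: once s is large, tanh(s) saturates and its derivative is essentially 0.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def run_without_forget_gate(inputs, b_in=0.0, w_c=1.0):
    """Cell state of a forget-gate-free LSTM cell on a continual stream.

    Hypothetical scalar weights; for brevity the input gate depends only on its
    bias. The update s_t = s_{t-1} + i * g never removes stored content.
    """
    s = 0.0
    history = []
    for x in inputs:
        s = s + sigmoid(b_in) * math.tanh(w_c * x)
        history.append(s)
    return history


stream = [1.0] * 10_000          # continual input: no segment ends, no resets
hist = run_without_forget_gate(stream)
print(round(hist[999], 1), round(hist[-1], 1))

# Once s is large, tanh(s) is saturated and its derivative is ~0,
# so the cell output stops carrying useful information or gradient.
print(1.0 - math.tanh(hist[-1]) ** 2)
```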

Our remedy is a novel, adaptive "forget gate" that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources.

Vocabulary:

remedy

Translation:

Our remedy is a novel, adaptive "forget gate" that lets an LSTM cell learn to reset itself at appropriate times, thereby releasing internal resources.
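With a forget gate, the state update becomes s_t = f * s_{t-1} + i * g. When f < 1, old content decays, and when f is near 0, the cell effectively resets. The sketch below uses hypothetical constant gate biases in place of learned, input-dependent gates. Under a constant input, the state settles at i*g / (1 - f) rather than growing without bound.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forget_gate_step(x, s_prev, b_in=0.0, b_f=2.0, w_c=1.0):
    """One step of a scalar LSTM cell with a forget gate (hypothetical weights).

    s_t = f * s_{t-1} + i * g: with f < 1 old content decays, with f ~ 0 the
    cell resets itself.
    """
    i = sigmoid(b_in)
    f = sigmoid(b_f)
    g = math.tanh(w_c * x)
    return f * s_prev + i * g


s = 0.0
for _ in range(10_000):          # same kind of continual stream, no external resets
    s = forget_gate_step(1.0, s)
print(round(s, 4))               # bounded: the fixed point i*g / (1 - f)

s_reset = forget_gate_step(1.0, s, b_f=-50.0)   # forget gate closed: old state dropped
print(round(s_reset, 4))
```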

We review illustrative benchmark problems on which standard LSTM outperforms other RNN algorithms.

Translation:

We review illustrative benchmark problems on which standard LSTM outperforms other RNN algorithms.

All algorithms (including LSTM) fail to solve continual versions of these problems.

Translation:

All algorithms (including LSTM) fail to solve continual versions of these problems.

LSTM with forget gates, however, easily solves them in an elegant way.

Translation:

LSTM with forget gates, however, solves them easily and elegantly.

53 LSTM (introducing peephole connections, 2000)


Original paper:

Recurrent nets that time and count

Abstract:

The size of the time intervals between events conveys information essential for numerous sequential tasks such as motor control and rhythm detection.

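Vocabulary:

conveys

Translation:

The length of the time intervals between events conveys information essential to many sequential tasks, such as motor control and rhythm detection.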

While hidden Markov models tend to ignore this information, recurrent neural networks (RNN) can in principle learn to make use of it.

Translation:

Hidden Markov models tend to ignore this information, whereas RNNs can in principle learn to use it.

We focus on long short-term memory (LSTM) because it usually outperforms other RNN.

Translation:

We focus on LSTM because it usually outperforms other RNNs.

Surprisingly, LSTM augmented by "peephole connections" from its internal cells to its multiplicative gates can learn the fine distinction between sequences of spikes separated by either 50 or 49 discrete time steps, without the help of any short training exemplars.

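Vocabulary:

augmented

distinction

exemplars

Translation:

Surprisingly, LSTM augmented with "peephole connections" from its internal cells to its multiplicative gates can learn the fine distinction between sequences of spikes separated by either 50 or 49 discrete time steps. It does so without the help of any short training exemplars.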
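Peephole connections let each gate read the cell state directly, not only x and the previous output. In the commonly cited formulation, the input and forget gates see s_{t-1}, and the output gate sees the updated s_t. The following is a minimal scalar sketch with hypothetical weights, where the p* entries are the peephole weights. The demo zeroes every other input so that only the peephole can influence the output gate.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def peephole_lstm_step(x, h_prev, c_prev, p):
    """Scalar LSTM step with forget gate and peephole connections.
    `p` is a dict of hypothetical weights; "pi", "pf", "po" are the peepholes."""
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["pi"] * c_prev + p["bi"])
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["pf"] * c_prev + p["bf"])
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])
    c = f * c_prev + i * g
    # The output gate peeks at the *updated* cell state c_t.
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["po"] * c + p["bo"])
    h = o * math.tanh(c)
    return h, c


names = ["wi", "ui", "pi", "bi", "wf", "uf", "pf", "bf",
         "wg", "ug", "bg", "wo", "uo", "po", "bo"]
base = {k: 0.0 for k in names}
p = dict(base, po=5.0, bf=50.0)      # keep the state; output gate reads the cell
p_no_peep = dict(base, bf=50.0)      # same cell without the output peephole

# x = 0 and h_prev = 0: only the stored cell value can move the output gate.
h_pos, _ = peephole_lstm_step(0.0, 0.0, 1.0, p)
h_neg, _ = peephole_lstm_step(0.0, 0.0, -1.0, p)
h_plain, _ = peephole_lstm_step(0.0, 0.0, 1.0, p_no_peep)
print(round(h_pos, 4), round(h_neg, 4), round(h_plain, 4))
```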

Without external resets or teacher forcing or loss of performance on tasks reported earlier, our LSTM variant also learns to generate very stable sequences of highly nonlinear, precisely timed spikes.

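Vocabulary:

stable

Translation:

Our LSTM variant also learns to generate very stable sequences of highly nonlinear, precisely timed spikes. It does so without external resets, without teacher forcing, and without losing performance on previously reported tasks.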

This makes LSTM a promising approach for real-world tasks that require to time and count.

Vocabulary:

real-world

Translation:

This makes LSTM a promising approach for real-world tasks that require timing and counting.

54 LSTM (training with full BPTT, 2005)

Original paper:

Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures

Abstract:

In this paper, we present bidirectional Long Short Term Memory (LSTM) networks, and a modified, full gradient version of the LSTM learning algorithm.

Vocabulary:

bidirectional

modified

Translation:

In this paper, we present bidirectional LSTM networks and a modified, full-gradient version of the LSTM learning algorithm.

We evaluate Bidirectional LSTM (BLSTM) and several other network architectures on the benchmark task of framewise phoneme classification, using the TIMIT database.

Translation:

Using the TIMIT database, we evaluate Bidirectional LSTM (BLSTM) and several other network architectures on the benchmark task of framewise phoneme classification.
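In framewise classification, every frame receives its own label. A bidirectional network therefore runs one recurrent layer forward and a separate one backward, then pairs their states at each frame, so that each frame sees both past and future context. To keep the sketch short, a plain tanh recurrence stands in for the LSTM cell. Swapping in an LSTM step would not change the bidirectional wiring. The weights and the 1-D "acoustic feature" per frame are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Step = Callable[[float, float], float]  # (x_t, h_prev) -> h_t


def make_rnn_step(w: float, u: float, b: float) -> Step:
    """Plain tanh recurrence standing in for an LSTM cell (hypothetical weights)."""
    return lambda x, h: math.tanh(w * x + u * h + b)


def run(step: Step, xs: List[float]) -> List[float]:
    h, out = 0.0, []
    for x in xs:
        h = step(x, h)
        out.append(h)
    return out


def bidirectional(fwd: Step, bwd: Step, xs: List[float]) -> List[Tuple[float, float]]:
    """For each frame t, pair the forward state (has seen x_1..x_t) with the
    backward state (has seen x_t..x_T)."""
    forward = run(fwd, xs)
    backward = run(bwd, xs[::-1])[::-1]  # run on reversed input, then realign
    return list(zip(forward, backward))


frames = [0.0, 0.0, 1.0, 0.0]   # an "event" at frame 2
outs = bidirectional(make_rnn_step(1.0, 0.5, 0.0), make_rnn_step(1.0, 0.5, 0.0), frames)
for t, (hf, hb) in enumerate(outs):
    print(t, round(hf, 3), round(hb, 3))
```

At frame 0, the forward state is still 0.0, but the backward state already reflects the event at frame 2. That look-ahead is what a unidirectional network lacks.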

Our main findings are that bidirectional networks outperform unidirectional ones, and Long Short Term Memory (LSTM) is much faster and also more accurate than both standard Recurrent Neural Nets (RNNs) and time windowed Multilayer Perceptrons (MLPs).

Translation:

Our main findings are that bidirectional networks outperform unidirectional ones, and that LSTM is much faster and also more accurate than both standard RNNs and time-windowed Multilayer Perceptrons (MLPs).

Our results support the view that contextual information is crucial to speech processing, and suggest that BLSTM is an effective architecture with which to exploit it.

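Vocabulary:

crucial

Translation:

Our results support the view that contextual information is crucial to speech processing, and suggest that BLSTM is an effective architecture for exploiting it.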

55 GRU (2014)


Original paper:

Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

Abstract:

In this paper, we propose a novel neural network model called RNN Encoder-Decoder that consists of two recurrent neural networks (RNN).

Translation:

In this paper, we propose a new neural network model called the RNN Encoder-Decoder, which consists of two RNNs.

One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another sequence of symbols.

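Translation:

One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes that representation into another sequence of symbols.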
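The encoder-decoder split can be sketched with the gated hidden unit this paper introduced, later known as the GRU. It uses a reset gate r, an update gate z, a candidate h̃ = tanh(Wx + U(r ⊙ h_prev)), and h = z ⊙ h_prev + (1 - z) ⊙ h̃. This is a toy sketch under several assumptions: a scalar hidden state, hypothetical 1-D embeddings and weights, the summary c added to the previous symbol's embedding at each decoder step, and greedy decoding. Because the weights are untrained, the output is not a real translation.

```python
import math
from typing import Dict, List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gru_step(x: float, h_prev: float, p: Dict[str, float]) -> float:
    """Gated hidden unit of the form proposed in this paper (later called GRU).
    Scalar hidden state and hypothetical weights; biases omitted."""
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev)            # reset gate
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev)            # update gate
    h_tilde = math.tanh(p["w"] * x + p["u"] * (r * h_prev))
    return z * h_prev + (1.0 - z) * h_tilde                # interpolate old/new


# Hypothetical 1-D embeddings and output weights for a toy vocabulary.
SRC_EMBED = {"le": 0.2, "chat": -0.7, "noir": 0.9, "<eos>": 0.0}
TGT_EMBED = {"<bos>": 0.0, "the": 0.3, "black": -0.4, "cat": 0.8, "<eos>": 0.0}
OUT_W = {"the": 1.5, "black": -1.0, "cat": 0.5, "<eos>": -2.0}


def encode(tokens: List[str], p: Dict[str, float]) -> float:
    """Read the whole source sequence; the last hidden state is the
    fixed-length summary c (a vector of length 1 here)."""
    h = 0.0
    for tok in tokens:
        h = gru_step(SRC_EMBED[tok], h, p)
    return h


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def decode(c: float, p: Dict[str, float], max_len: int = 5) -> List[str]:
    """Greedy decoding conditioned on c at every step (simplifying assumption:
    c is simply added to the embedding of the previous output symbol)."""
    h = math.tanh(c)            # initial decoder state derived from c
    prev, out = "<bos>", []
    for _ in range(max_len):
        h = gru_step(TGT_EMBED[prev] + c, h, p)
        probs = softmax({tok: w * h for tok, w in OUT_W.items()})
        prev = max(probs, key=probs.get)
        if prev == "<eos>":
            break
        out.append(prev)
    return out


params = {"wr": 0.5, "ur": -0.3, "wz": 1.0, "uz": 0.2, "w": 1.2, "u": 0.8}
c_short = encode(["le", "chat", "<eos>"], params)
c_long = encode(["le", "chat", "noir", "<eos>"], params)
print(c_short, c_long)            # same size regardless of source length
print(decode(c_long, params))     # untrained weights: not a real translation
```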

The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence.

Vocabulary:

jointly

Translation:

The encoder and decoder of the proposed model are trained jointly to maximize the conditional probability of a target sequence given a source sequence.
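Written out, "maximizing the conditional probability" means two things. First, the decoder factorizes the target probability over time steps, with each step conditioned on the earlier outputs and the summary vector c. Second, both networks are trained on the average log-likelihood over N source/target pairs. The symbols x, y, c, T', N, and θ follow the usual presentation of this model and are not defined elsewhere in this post.

```latex
p_\theta(\mathbf{y} \mid \mathbf{x})
  = \prod_{t=1}^{T'} p_\theta\left(y_t \mid y_{t-1}, \dots, y_1, \mathbf{c}\right),
\qquad
\max_{\theta} \; \frac{1}{N} \sum_{n=1}^{N} \log p_\theta(\mathbf{y}_n \mid \mathbf{x}_n)
```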

The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder-Decoder as an additional feature in the existing log-linear model.

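Translation:

Empirically, the performance of a statistical machine translation system improves when the conditional probabilities of phrase pairs computed by the RNN Encoder-Decoder are used as an additional feature in the existing log-linear model.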
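A log-linear model scores a candidate as a weighted sum of feature functions, and the RNN Encoder-Decoder's phrase-pair probability simply enters as one more feature. The sketch below is minimal. The feature names, values, and weights are all made up for illustration, and in a real system the weights would be tuned rather than hand-set. It shows how an extra feature can change which candidate wins.

```python
from typing import Dict


def loglinear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Unnormalized log-linear score: sum_n w_n * f_n.
    log Z is identical for all candidates of one source sentence, so it can be
    dropped when ranking them. Features without a weight are ignored."""
    return sum(w * features[name] for name, w in weights.items())


def best_candidate(cands: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> str:
    return max(cands, key=lambda c: loglinear_score(cands[c], weights))


# Made-up feature values (log-probabilities) for two candidate translations.
# "rnn" stands for the RNN Encoder-Decoder phrase-pair score, e.g. summed over
# the phrase pairs a candidate uses.
cands = {
    "candidate A": {"lm": -4.0, "tm": -2.0, "rnn": -6.0},
    "candidate B": {"lm": -4.5, "tm": -2.1, "rnn": -1.5},
}

baseline = {"lm": 1.0, "tm": 1.0}           # hand-picked, not tuned
with_rnn = dict(baseline, rnn=0.5)          # same model plus one extra feature

print(best_candidate(cands, baseline))
print(best_candidate(cands, with_rnn))
```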