Inductive transfer learning has greatly impacted computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch.

語彙：

Inductive

scratch

訳：

Inductive transfer learningはコンピュータービジョンに大きな影響を与えたが，NLPの既存のアプローチではタスク固有の変更とスクラッチからの学習が依然として必要である。

We propose Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a language model.

語彙：

fine-tuning

訳：

我々はNLPのあらゆるタスクに適用できる効果的なtransfer learningの方法であるUniversal Language Model Fine-tuning（ULMFiT）を提案し，言語モデルのfine-tuningに重要な技術を紹介する。

Our method significantly outperforms the state-of-the-art on six text classification tasks, reducing the error by 18-24% on the majority of datasets.

訳：

我々の方法は6つのテキスト分類タスクで最先端を大幅に上回り，ほとんどのデータセットでエラーを18-24％削減する。

Furthermore, with only 100 labeled examples, it matches the performance of training from scratch on 100x more data.

訳：

さらに，わずか100個のラベル付きサンプルで，100倍のデータを用いてゼロから学習した場合の性能に匹敵する。

We open-source our pretrained models and code.

訳：

事前トレーニング済みのモデルとコードをオープンソース化する。

62 OpenAI GPT（2018）

原文：

Improving Language Understanding by Generative Pre-Training

Abstract：

Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering, semantic similarity assessment, and document classification.

訳：

自然言語の理解には，テキストの含意，質問への回答，セマンティックな類似性の評価，ドキュメントの分類など幅広い多様なタスクが含まれる。

Although large unlabeled text corpora are abundant, labeled data for learning these specific tasks is scarce, making it challenging for discriminatively trained models to perform adequately.

語彙：

discriminatively

訳：

ラベルのない大規模なテキストコーパスは豊富にあるが，これらの特定のタスクを学習するためのラベル付きデータは不足しており，識別的に学習されたモデルが十分な性能を発揮するのは困難である。

We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task.

訳：

ラベルなしテキストの多様なコーパスで言語モデルの生成的な事前学習を行い，その後に特定のタスクごとに識別的なfine-tuningを行うことで，これらのタスクで大幅な性能向上が得られることを示す。

In contrast to previous approaches, we make use of task-aware input transformations during fine-tuning to achieve effective transfer while requiring minimal changes to the model architecture.

訳：

以前のアプローチとは対照的に，fine-tuning中にタスクに応じた入力変換を利用することで，モデルアーキテクチャへの変更を最小限に抑えながら効果的な転移を実現する。

We demonstrate the effectiveness of our approach on a wide range of benchmarks for natural language understanding.

訳：

自然言語理解の幅広いベンチマークで我々のアプローチの有効性を実証する。

Our general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon the state of the art in 9 out of the 12 tasks studied.

語彙：

agnostic

訳：

我々の汎用的でタスクに依存しないモデルは，各タスク専用に設計されたアーキテクチャを使用する識別的に学習されたモデルを上回り，調査した12のタスクのうち9つで最先端の結果を大幅に改善する。

For instance, we achieve absolute improvements of 8.9% on commonsense reasoning (Stories Cloze Test), 5.7% on question answering (RACE), and 1.5% on textual entailment (MultiNLI).

訳：

たとえば，常識的推論（Stories Cloze Test）で8.9％，質問応答（RACE）で5.7％，テキスト含意（MultiNLI）で1.5％の絶対的改善を達成している。

63 BERT（2018）

原文：

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Abstract：

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers.

訳：

Transformerからの双方向エンコーダ表現（※Bidirectional Encoder Representations from Transformers）を表す，BERTと呼ばれる新しい言語表現モデルを紹介する。

Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers.

訳：

最近の言語表現モデルとは異なり，BERTはすべての層で左右両方の文脈に同時に条件付けることにより，ラベルのないテキストから深い双方向表現を事前学習するように設計されている。

As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.

訳：

その結果，事前学習済みのBERTモデルを1つの追加の出力レイヤーだけで微調整して，実質的なタスク固有のアーキテクチャの変更なしに，質問応答や言語推論などの幅広いタスク用の最先端のモデルを作成できる。

BERT is conceptually simple and empirically powerful.

訳：

BERTは概念的にシンプルで経験的に強力である。

It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).

訳：

GLUEスコア80.5％（7.7％ポイントの絶対改善），MultiNLI精度86.7％（4.6％の絶対改善），SQuAD v1.1 question answering Test F1 93.2（1.5ポイントの絶対改善），およびSQuAD v2.0 Test F1 83.1（5.1ポイントの絶対改善）などを含む，11の自然言語処理タスクで新たな最先端の結果を達成する。

64 XLNet（2019）

原文：

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Abstract：

With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling.

訳：

双方向の文脈をモデル化できるため，BERTのようなdenoising autoencodingに基づく事前学習は，自己回帰言語モデリングに基づく事前学習アプローチよりも優れた性能を達成する。

However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy.

語彙：

relying on

corrupting

discrepancy

訳：

ただし，マスクによる入力の破損に依存しているため，BERTはマスクされた位置間の依存関係を無視し，事前学習とfine-tuningの不一致に悩まされる。

In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method

that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order

and (2) overcomes the limitations of BERT thanks to its autoregressive formulation.

語彙：

pros and cons

訳：

これらの長所と短所を踏まえて，一般化された自己回帰事前学習法であるXLNetを提案する。XLNetは，

（1）分解順序のすべての順列に対する期待尤度を最大化することにより双方向コンテキストの学習を可能にし，

（2）自己回帰的な定式化によりBERTの制限を克服する。
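
Point (1) can be written as a formula. The following is the permutation language modeling objective as given in the body of the XLNet paper, not in the abstract. The symbols are the paper's: $\mathcal{Z}_T$ is the set of all permutations of the index sequence $[1, \dots, T]$, and $z_t$ and $\mathbf{z}_{<t}$ are the $t$-th element and the first $t-1$ elements of a permutation $\mathbf{z}$.

```latex
\max_{\theta} \;
\mathbb{E}_{\mathbf{z} \sim \mathcal{Z}_T}
\left[ \sum_{t=1}^{T} \log p_{\theta}\left( x_{z_t} \mid \mathbf{x}_{\mathbf{z}_{<t}} \right) \right]
```

Because the expectation runs over all factorization orders, each position is, in expectation, predicted from context on both its left and its right, while every single term stays autoregressive.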

Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining.

訳：

さらに，XLNetは最先端の自己回帰モデルであるTransformer-XLのアイデアを事前学習に統合する。

Empirically, XLNet outperforms BERT on 20 tasks, often by a large margin, and achieves state-of-the-art results on 18 tasks including question answering, natural language inference, sentiment analysis, and document ranking.

訳：

経験的に，XLNetは20のタスクでBERTを上回り（多くの場合大差で），質問応答，自然言語推論，感情分析，文書ランキングを含む18のタスクで最先端の結果を達成する。

65 RoBERTa（2019）

原文：

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Abstract：

Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging.

訳：

言語モデルの事前学習により性能が大幅に向上したが，異なるアプローチを慎重に比較することは困難である。

Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have significant impact on the final results.

訳：

学習は計算コストが高く，多くの場合さまざまなサイズの非公開データセットで行われる。また，これから示すように，ハイパーパラメーターの選択は最終結果に大きな影響を与える。

We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size.

語彙：

replication study

訳：

多くの主要なハイパーパラメーターとトレーニングデータサイズの影響を慎重に測定する，BERT事前学習の追試(Devlin et al., 2019)を紹介する。

We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it.

語彙：

undertrained

訳：

BERTは大幅に学習不足であり，それ以降に公開されたすべてのモデルの性能に匹敵する，あるいはそれを上回りうることがわかった。

Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD.

訳：

我々の最高のモデルはGLUE，RACE，SQuADで最先端の結果を達成する。

These results highlight the importance of previously overlooked design choices, and raise questions about the source of recently reported improvements.

訳：

これらの結果はこれまで見過ごされていた設計選択の重要性を強調し，最近報告された改善の原因について疑問を提起する。

We release our models and code.

訳：

我々のモデルとコードをリリースする。

2019-11-25

論文Abstract100本ノック#12

機械学習論文

56 Highway Networks（2015）

原文：

Highway Networks

Abstract：

There is plenty of theoretical and empirical evidence that depth of neural networks is a crucial ingredient for their success.

語彙：

plenty

訳：

ニューラルネットワークの深さが成功の重要な要素であるという理論的および経験的証拠はたくさんある。

However, network training becomes more difficult with increasing depth and training of very deep networks remains an open problem.

語彙：

open problem

訳：

ただし，ネットワークの学習は深さが増すにつれて難しくなり，非常に深いネットワークの学習は未解決の問題のままである。

In this extended abstract, we introduce a new architecture designed to ease gradient-based training of very deep networks.

訳：

この拡張された要約では，非常に深いネットワークの勾配ベースの学習を容易にするために設計された新しいアーキテクチャを紹介する。

We refer to networks with this architecture as highway networks, since they allow unimpeded information flow across several layers on "information highways".

語彙：

unimpeded

訳：

このアーキテクチャを備えたネットワークをhighway networksと呼ぶ。これは「information highways」上で複数の層にわたる妨げのない情報の流れを可能にするためである。

The architecture is characterized by the use of gating units which learn to regulate the flow of information through a network.

訳：

このアーキテクチャはネットワークを介した情報の流れの調整を学習するgating unitsの使用により特徴づけられる。
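
The gating described here can be sketched in a few lines. The formula y = H(x)·T(x) + x·(1 − T(x)) comes from the body of the Highway Networks paper, not the abstract. The choice of tanh for the transform H, the helper names, and the toy weights are assumptions made for this illustration.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(weights, bias, x):
    """Return W x + b, where W is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def highway_layer(x, w_h, b_h, w_t, b_t):
    """y = H(x) * T(x) + x * (1 - T(x)).

    H is the usual nonlinear transform (tanh here, an assumption);
    T is the sigmoid transform gate that decides how much of H(x) to use.
    """
    h = [math.tanh(v) for v in affine(w_h, b_h, x)]
    t = [sigmoid(v) for v in affine(w_t, b_t, x)]
    return [hi * ti + xi * (1.0 - ti) for hi, ti, xi in zip(h, t, x)]


x = [0.5, -1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]

# A strongly negative gate bias closes the gate: the layer copies its input.
carried = highway_layer(x, identity, [0.0, 0.0], identity, [-50.0, -50.0])

# A strongly positive gate bias opens the gate: the layer behaves like a plain tanh layer.
transformed = highway_layer(x, identity, [0.0, 0.0], identity, [50.0, 50.0])
```

With the gate closed the input passes through unchanged, which is the "unimpeded information flow" the abstract refers to.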

Highway networks with hundreds of layers can be trained directly using stochastic gradient descent and with a variety of activation functions, opening up the possibility of studying extremely deep and efficient architectures.

語彙：

opening up

訳：

数百層のhighway networksは，確率的勾配降下法を用いて，さまざまな活性化関数で直接学習できるため，非常に深く効率的なアーキテクチャを研究する可能性が開かれる。

57 Neural Machine Translation（2014）

原文：

Neural Machine Translation by Jointly Learning to Align and Translate

Abstract：

Neural machine translation is a recently proposed approach to machine translation.

訳：

ニューラル機械翻訳は最近提案された機械翻訳へのアプローチである。

Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance.

訳：

従来の統計的機械翻訳とは異なり，ニューラル機械翻訳は翻訳パフォーマンスを最大化するために共同で調整できる単一のニューラルネットワークの構築を目的としている。

The models proposed recently for neural machine translation often belong to a family of encoder-decoders and consists of an encoder that encodes a source sentence into a fixed-length vector from which a decoder generates a translation.

訳：

ニューラル機械翻訳用に最近提案されたモデルは，多くの場合encoder-decoderのファミリーに属し，ソース文を固定長ベクトルにエンコードするエンコーダーで構成され，そこからデコーダーが翻訳を生成する。

In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.

語彙：

conjecture

訳：

本論文では，固定長ベクトルの使用がこの基本的なencoder-decoderアーキテクチャの性能向上におけるボトルネックであると推測する。そして，ターゲット単語の予測に関連するソース文の部分を，明示的にハードセグメントとして形成することなく，モデルが自動的に（ソフト）検索できるようにすることでこれを拡張することを提案する。
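
The "(soft-)search" can be illustrated as a weighted average over the encoder outputs. This is a minimal sketch, not the paper's implementation. The paper scores each source position with a small feed-forward alignment model, whereas `dot_score` below is a simpler stand-in chosen for this example, and all names are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_score(state, annotation):
    """Stand-in alignment score (the paper uses a learned feed-forward model)."""
    return sum(a * b for a, b in zip(state, annotation))


def soft_search(decoder_state, annotations, score=dot_score):
    """Return attention weights over source positions and the resulting context vector."""
    weights = softmax([score(decoder_state, h) for h in annotations])
    dim = len(annotations[0])
    context = [sum(w * h[d] for w, h in zip(weights, annotations))
               for d in range(dim)]
    return weights, context


# Two source positions; the decoder state is much more similar to the first one.
weights, context = soft_search([10.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Since a fresh context vector is computed for every target word, the decoder no longer depends on a single fixed-length summary of the source sentence.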

With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation.

訳：

この新しいアプローチにより，英語からフランス語への翻訳作業において，既存のstate-of-the-artのフレーズベースのシステムに匹敵する翻訳パフォーマンスを実現する。

Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.

語彙：

reveals

intuition

訳：

さらに，定性分析によってモデルで見つかった（ソフト）アライメントが我々の直感とよく一致していることが明らかになった。

58 Attention（2017）

原文：

Attention Is All You Need

Abstract：

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration.

語彙：

configuration

訳：

支配的なシークエンス変換モデルはencoder-decoder構成の複雑なRNNまたはCNNに基づいている。

The best performing models also connect the encoder and decoder through an attention mechanism.

訳：

最高のパフォーマンスを発揮するモデルはattentionメカニズムを介してエンコーダーとデコーダーを接続する。

We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.

語彙：

entirely

訳：

我々は再帰と畳み込みを完全に省いた，attentionメカニズムのみに基づいた新しいシンプルなネットワークアーキテクチャであるTransformerを提案する。
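
The abstract does not spell out the attention mechanism itself. For reference, the core operation defined in the body of the paper is scaled dot-product attention, where $Q$, $K$ and $V$ are the query, key and value matrices and $d_k$ is the key dimension:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left( \frac{Q K^{\top}}{\sqrt{d_k}} \right) V
```

Every query position is handled by the same matrix products, with no dependence on the result for the previous position. This is the reason the model parallelizes better than a recurrent network.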

Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.

語彙：

parallelizable

訳：

2つの機械翻訳タスクの実験により，これらのモデルは品質が優れている一方で，より並列化可能でトレーニングに必要な時間が大幅に短縮されることが示されている。

Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU.

訳：

私たちのモデルはWMT 2014の英語からドイツ語への翻訳タスクで28.4 BLEUを達成し，アンサンブルを含む既存の最良の結果を2 BLEU以上上回る。

On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature.

訳：

WMT 2014の英語からフランス語への翻訳タスクでは，8つのGPUで3.5日間学習した後，単一モデルとして新たな最先端となるBLEUスコア41.8を確立する。この学習コストは文献中の最良モデルの学習コストのごく一部である。

We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.

語彙：

parsing

訳：

大規模な学習データと限られた学習データの両方で英語のconstituency parsing（句構造解析）にうまく適用することにより，Transformerが他のタスクにもよく汎化することを示す。

59 Transformer-XL（2019）

原文：

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Abstract：

Transformers have a potential of learning longer-term dependency, but are limited by a fixed-length context in the setting of language modeling.

訳：

Transformerは長期的な依存関係を学習できる可能性があるが，言語モデリングの設定における固定長コンテキストによって制限される。

We propose a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length without disrupting temporal coherence.

語彙：

disrupting

temporal coherence

訳：

我々は時間的コヒーレンスを乱すことなく固定長を超える依存関係を学習できる新しいニューラルアーキテクチャTransformer-XLを提案する。

It consists of a segment-level recurrence mechanism and a novel positional encoding scheme.

訳：

これはセグメントレベルの再帰的メカニズムと新しい位置エンコードスキームで構成されている。

Our method not only enables capturing longer-term dependency, but also resolves the context fragmentation problem.

訳：

この方法は長期的な依存関係をキャプチャするだけでなく，コンテキストの断片化の問題も解決する。

As a result, Transformer-XL learns dependency that is 80% longer than RNNs and 450% longer than vanilla Transformers, achieves better performance on both short and long sequences, and is up to 1,800+ times faster than vanilla Transformers during evaluation.

訳：

その結果，Transformer-XLはRNNより80％長く，vanilla Transformersより450％長い依存関係を学習し，短いシーケンスと長いシーケンスの両方でより良い性能を達成し，評価時にはvanilla Transformersよりも最大1,800倍以上高速である。

Notably, we improve the state-of-the-art results of bpc/perplexity to 0.99 on enwiki8, 1.08 on text8, 18.3 on WikiText-103, 21.8 on One Billion Word, and 54.5 on Penn Treebank (without finetuning).

訳：

特に，bpc/perplexityの最新の結果をenwiki8で0.99，text8で1.08，WikiText-103で18.3，One Billion Wordで21.8，Penn Treebankで54.5に改善している（微調整なし）。

When trained only on WikiText-103, Transformer-XL manages to generate reasonably coherent, novel text articles with thousands of tokens.

語彙：

reasonably

coherent

訳：

WikiText-103のみで学習した場合でも，Transformer-XLは数千トークンからなる，それなりに一貫した新規のテキスト記事を生成できる。

Our code, pretrained models, and hyperparameters are available in both Tensorflow and PyTorch.

訳：

我々のコード，事前学習済みモデル，ハイパーパラメーターはTensorflowとPyTorchの両方で利用できる。

60 ELMo（2018）

原文：

Deep contextualized word representations

Abstract：

We introduce a new type of deep contextualized word representation that models both

(1) complex characteristics of word use (e.g., syntax and semantics),

and (2) how these uses vary across linguistic contexts (i.e., to model polysemy).

訳：

我々は以下の両方をモデル化する新しいタイプの深いコンテキスト化された単語表現を紹介する。

（1）単語の使用の複雑な特性（例：構文およびセマンティクス）

（2）これらの使用は言語のコンテキストによってどのように異なるか（つまり多義性をモデル化する）

Our word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), which is pre-trained on a large text corpus.

訳：

我々の単語ベクトルは，大規模なテキストコーパスで事前学習されたdeep bidirectional language model（biLM）の内部状態の関数として学習される。
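
"Learned functions of the internal states" can be made concrete. In the body of the ELMo paper, the vector for a token is a softmax-weighted sum of that token's biLM layer states, multiplied by a scalar γ. The sketch below follows that formula. The function name and the toy numbers are assumptions made for this illustration.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_layers(layer_states, layer_logits, gamma=1.0):
    """Collapse the per-layer vectors of one token into a single vector.

    layer_states: one vector per biLM layer, all of the same length.
    layer_logits: one unnormalized scalar per layer, softmax-normalized here.
    gamma: scalar that rescales the mixed vector.
    """
    weights = softmax(layer_logits)
    dim = len(layer_states[0])
    return [gamma * sum(w * layer[d] for w, layer in zip(weights, layer_states))
            for d in range(dim)]


layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = mix_layers(layers, [0.0, 0.0, 0.0])  # equal logits give a plain average
```

In the paper the layer weights and γ are learned together with the downstream task, so each task can decide which layers of the biLM it relies on.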

We show that these representations can be easily added to existing models and significantly improve the state of the art across six challenging NLP problems, including question answering, textual entailment and sentiment analysis.

語彙：

entailment

訳：

これらの表現は既存のモデルに簡単に追加でき，質問への回答，テキストの含意，感情分析など6つの困難なNLP問題全体で最新技術を大幅に改善できることを示す。

We also present an analysis showing that exposing the deep internals of the pre-trained network is crucial, allowing downstream models to mix different types of semi-supervision signals.

語彙：

crucial

semi-supervision

訳：

また，事前学習されたネットワークの深い内部を公開することが重要であり，それによって下流モデルが異なるタイプの半教師あり信号を混合できるようになることを示す分析も提示する。

2019-11-18

論文Abstract100本ノック#11

機械学習論文

今回はLSTMの発展について扱う。

参考：

https://qiita.com/t_Signull/items/21b82be280b46f467d1b

51 LSTM（オリジナル（1997））
52 LSTM（Forget Gateの導入（1999））
53 LSTM（Peephole Connectionの導入（2000））
54 LSTM（Full BPTTによる学習（2005））
55 GRU（2014）

51 LSTM（オリジナル（1997））

原文：

Long Short-Term Memory

Abstract：

Learning to store information over extended time intervals via recurrent backpropagation takes a very long time, mostly due to insufficient, decaying error back flow.

訳：

リカレントな逆伝播を介して長い時間間隔にわたって情報を保存することを学習するには非常に長い時間がかかる。これは主に，逆流する誤差が不十分で減衰するためである。

We briefly review Hochreiter's 1991 analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called "Long Short-Term Memory" (LSTM).

訳：

我々はHochreiterによる1991年のこの問題の分析を簡単にレビューし，「Long Short-Term Memory」（LSTM）と呼ばれる勾配に基づいた効率的で斬新な方法を導入することで対処する。

Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete time steps by enforcing constant error flow through "constant error carrousels" within special units.

語彙：

enforcing

訳：

害にならない箇所で勾配を打ち切ることで，LSTMは特殊なユニット内の「constant error carrousels」を通じて一定の誤差の流れを強制し，1000の離散時間ステップを超える最小タイムラグを橋渡しすることを学習できる。

Multiplicative gate units learn to open and close access to the constant error flow.

語彙：

Multiplicative

訳：

乗法ゲートユニットは一定のエラーフローへのアクセスの開閉を学習する。

LSTM is local in space and time;

its computational complexity per time step and weight is O(1).

訳：

LSTMは空間と時間においてローカルである；

タイムステップと重みあたりの計算量はO(1)である。

Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations.

訳：

我々の人工データを使用した実験には，ローカル，分散，実数値，ノイズの多いパターン表現が含まれる。

In comparisons with RTRL, BPTT, Recurrent Cascade-Correlation, Elman nets, and Neural Sequence Chunking, LSTM leads to many more successful runs, and learns much faster.

訳：

RTRL，BPTT，Recurrent Cascade-Correlation，Elman nets，およびNeural Sequence Chunkingを使用した比較では，LSTMはより多くの成功した実行につながり，より速く学習する。

LSTM also solves complex, artificial long time lag tasks that have never been solved by previous recurrent network algorithms.

訳：

LSTMは以前のリカレントネットワークアルゴリズムでは解決できなかった，複雑で人工的な長いタイムラグタスクも解決する。

52 LSTM（Forget Gateの導入（1999））

原文：

Learning to Forget: Continual Prediction with LSTM

Abstract：

Long Short-Term Memory (LSTM, Hochreiter & Schmidhuber, 1997) can solve numerous tasks not solvable by previous learning algorithms for recurrent neural networks (RNNs).

訳：

Long Short-Term Memory (LSTM, Hochreiter & Schmidhuber, 1997)はリカレントニューラルネットワーク（RNNs）の以前の学習アルゴリズムでは解決できない多くのタスクを解決できる。

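We identify a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset.

語彙：

explicitly

訳：

我々は，ネットワークの内部状態をリセットできるような明示的にマークされた終端を持つサブシークエンスに事前に分割されていない，連続的な入力ストリームを処理するLSTMネットワークの弱点を特定する。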

Without resets, the state may grow indefinitely and eventually cause the network to break down.

語彙：

indefinitely

eventually

訳：

リセットしないと，状態が際限なく増大し，最終的にネットワークが破綻する可能性がある。

Our remedy is a novel, adaptive "forget gate" that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources.

語彙：

remedy

訳：

我々の対処法は，適応的で新しい「forget gate」であり，これによりLSTMセルは適切なタイミングで自身をリセットすることを学習し，内部リソースを解放できる。
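
The effect of the forget gate on the cell state can be shown with a scalar toy example. In the original LSTM the cell state is updated as c_t = c_{t-1} + i_t·g_t, and with a forget gate it becomes c_t = f_t·c_{t-1} + i_t·g_t. The sketch below feeds fixed pre-activations instead of learned ones. The function names and the constants are assumptions made for this illustration.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def cell_update(c_prev, i_pre, g_pre, f_pre=None):
    """One scalar cell-state update.

    f_pre=None reproduces the original LSTM (self-connection fixed at 1.0);
    otherwise the forget gate sigmoid(f_pre) scales the previous state.
    """
    f = 1.0 if f_pre is None else sigmoid(f_pre)
    return f * c_prev + sigmoid(i_pre) * math.tanh(g_pre)


def run(steps, f_pre=None):
    """Feed the same strongly positive input for a number of steps."""
    c = 0.0
    for _ in range(steps):
        c = cell_update(c, i_pre=5.0, g_pre=5.0, f_pre=f_pre)
    return c


unbounded = run(100)           # no forget gate: the state keeps accumulating
bounded = run(100, f_pre=0.0)  # forget gate at 0.5: the state settles at a fixed point
```

On a continual input stream the first variant grows roughly linearly with the number of steps, which is the failure mode the abstract describes. In the second variant the state stays bounded. A learned gate could also drive f_t close to 0 to reset the cell.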

We review illustrative benchmark problems on which standard LSTM outperforms other RNN algorithms.

訳：

標準のLSTMが他のRNNアルゴリズムよりも優れている実例のベンチマークの問題をレビューする。

All algorithms (including LSTM) fail to solve continual versions of these problems.

訳：

すべてのアルゴリズム（LSTMを含む）はこれらの問題の継続的なバージョンを解決できない。

LSTM with forget gates, however, easily solves them in an elegant way.

訳：

ただし，forget gatesを備えたLSTMはそれらをエレガントな方法で簡単に解決する。

53 LSTM（Peephole Connectionの導入（2000））

原文：

Recurrent nets that time and count

Abstract：

The size of the time intervals between events conveys information essential for numerous sequential tasks such as motor control and rhythm detection.

語彙：

conveys

訳：

イベント間の時間間隔の長さは，運動制御やリズム検出などの多数の系列タスクに不可欠な情報を伝える。

While hidden Markov models tend to ignore this information, recurrent neural networks (RNN) can in principle learn to make use of it.

訳：

隠れマルコフモデルはこの情報を無視する傾向があるが，RNNは原則としてそれを利用することを学習できる。

We focus on long short-term memory (LSTM) because it usually outperforms other RNN.

訳：

我々は，通常他のRNNよりも優れている，LSTMに焦点を当てる。

Surprisingly, LSTM augmented by "peephole connections" from its internal cells to its multiplicative gates can learn the fine distinction between sequences of spikes separated by either 50 or 49 discrete time steps, without the help of any short training exemplars.

語彙：

augmented

distinction

exemplars

訳：

驚くべきことに，内部セルから乗法ゲートへの「peephole connections」で拡張されたLSTMは，短い学習例の助けなしに，50または49の離散時間ステップで隔てられたスパイク列の微妙な違いを学習できる。

Without external resets or teacher forcing or loss of performance on tasks reported earlier, our LSTM variant also learns to generate very stable sequences of highly nonlinear, precisely timed spikes.

語彙：

stable

訳：

外部リセットやteacher forcingなしで，また以前に報告されたタスクでの性能を損なうことなく，我々のLSTMの変種は，高度に非線形で正確なタイミングのスパイクからなる非常に安定したシーケンスを生成することも学習する。

This makes LSTM a promising approach for real-world tasks that require to time and count.

語彙：

real-world

訳：

これによりLSTMは，時間の計測とカウントを必要とする実世界のタスクに対する有望なアプローチとなる。

54 LSTM（Full BPTTによる学習（2005））

原文：

Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures

Abstract：

In this paper, we present bidirectional Long Short Term Memory (LSTM) networks, and a modified, full gradient version of the LSTM learning algorithm.

語彙：

bidirectional

modified

訳：

本論文では，双方向のLSTMネットワークと，LSTM学習アルゴリズムの修正された完全勾配バージョンを示す。

We evaluate Bidirectional LSTM (BLSTM) and several other network architectures on the benchmark task of framewise phoneme classification, using the TIMIT database.

訳：

我々はTIMITデータベースを使用して，framewise phoneme classificationのベンチマークタスクでBidirectional LSTM（BLSTM）および他のいくつかのネットワークアーキテクチャを評価する。

Our main findings are that bidirectional networks outperform unidirectional ones, and Long Short Term Memory (LSTM) is much faster and also more accurate than both standard Recurrent Neural Nets (RNNs) and time windowed Multilayer Perceptrons (MLPs).

訳：

我々の主な調査結果は，双方向ネットワークは単方向ネットワークよりも優れており，LSTMは、標準のRNNとtime windowed Multilayer Perceptrons（MLPs）の両方よりもはるかに高速でありより正確であることである。

Our results support the view that contextual information is crucial to speech processing, and suggest that BLSTM is an effective architecture with which to exploit it.

語彙：

crucial

訳：

我々の結果は，文脈情報が音声処理に不可欠であるという見解を支持し，BLSTMはそれを活用するための効果的なアーキテクチャであることを示唆している。

55 GRU（2014）

原文：

Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

Abstract：

In this paper, we propose a novel neural network model called RNN Encoder-Decoder that consists of two recurrent neural networks (RNN).

訳：

本稿では，2つのRNNで構成されるRNN Encoder-Decoderと呼ばれる新しいニューラルネットワークモデルを提案する。

One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another sequence of symbols.

訳：

1つのRNNは一連のシンボルを固定長のベクトル表現にエンコードし，もう1つのRNNはその表現を別の一連のシンボルにデコードする。

The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence.

語彙：

jointly

訳：

提案されたモデルのエンコーダーとデコーダーはソースシーケンスが与えられたターゲットシーケンスの条件付き確率を最大化するために共同で学習される。

The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder-Decoder as an additional feature in the existing log-linear model.

訳：

統計的機械翻訳システムの性能は，RNN Encoder-Decoderによって計算されたフレーズペアの条件付き確率を既存の対数線形モデルの追加の素性として使用することで向上することが経験的にわかった。

Qualitatively, we show that the proposed model learns a semantically and syntactically meaningful representation of linguistic phrases.

語彙：

semantically

syntactically

訳：

定性的には，提案されたモデルが言語句の意味的および構文的に意味のある表現を学習することを示す。

2019-11-11

論文Abstract100本ノック#10

機械学習論文

46 VAE（2013）
47 VHRED（2016）
48 Pointer Networks（2015）
49 CopyNet（2016）
50 Pointer-Generator Network（2017）

46 VAE（2013）

原文：

Auto-Encoding Variational Bayes

Abstract：

How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets?

語彙：

directed

intractable

訳：

扱いにくい（intractable）事後分布を持つ連続潜在変数と大規模なデータセットが存在する場合に，有向確率モデルで効率的な推論と学習をどのように実行できるか？

We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case.

語彙：

stochastic variational inference

訳：

大規模なデータセットにスケールし，いくつかの緩やかな微分可能性条件の下ではintractableな場合でも機能する，確率的変分推論および学習アルゴリズムを導入する。

Our contributions is two-fold.

訳：

私たちの貢献は2つある。

First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods.

語彙：

reparameterization

straightforwardly

stochastic gradient methods

訳：

最初に，変分下限の再パラメーター化により，標準的な確率的勾配法を使用して簡単に最適化できる下限推定量が得られることを示す。
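
The reparameterization can be sketched for the common case of a diagonal Gaussian posterior. This case is an assumption here, since the abstract does not fix a distribution. Instead of sampling z directly, noise ε ~ N(0, 1) is sampled and z is computed as μ + σ·ε, so that z is a differentiable function of μ and σ. The function names are hypothetical.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, 1), and sigma = exp(0.5 * log_var)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def gaussian_kl(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


rng = random.Random(0)  # seeded only to make the example repeatable
z = sample_latent([1.0, -2.0], [0.0, 0.0], rng)
kl = gaussian_kl([1.0, -2.0], [0.0, 0.0])
```

All randomness sits in ε, which does not depend on the parameters. A gradient with respect to μ and log σ² can therefore pass through the sample, and standard stochastic gradient methods become applicable.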

Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator.

語彙：

i.i.d.

approximate

訳：

次に，データ点ごとに連続潜在変数を持つi.i.d.データセットの場合，提案された下限推定量を使用して近似推論モデル（認識モデルとも呼ばれる）をintractableな事後分布に適合させることにより，事後推論を特に効率的に行えることを示す。

Theoretical advantages are reflected in experimental results.

訳：

理論上の利点は実験結果に反映されている。

47 VHRED（2016）

原文：

A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues

Abstract：

Sequential data often possesses a hierarchical structure with complex dependencies between subsequences, such as found between the utterances in a dialogue.

語彙：

possesses

utterances

訳：

系列データは，対話内の発話間に見られるように，サブシークエンス間に複雑な依存関係を持つ階層構造を備えていることが多い。

In an effort to model this kind of generative process, we propose a neural network-based generative architecture, with latent stochastic variables that span a variable number of time steps.

訳：

この種の生成プロセスをモデル化するために，可変数のタイムステップにわたる潜在確率変数を備えた，ニューラルネットワークベースの生成アーキテクチャを提案する。

We apply the proposed model to the task of dialogue response generation and compare it with recent neural network architectures.

訳：

提案されたモデルを対話応答生成のタスクに適用し，最近のニューラルネットワークアーキテクチャと比較する。

We evaluate the model performance through automatic evaluation metrics and by carrying out a human evaluation.

訳：

自動評価指標と人間による評価によって，モデルのパフォーマンスを評価する。

The experiments demonstrate that our model improves upon recently proposed models and that the latent variables facilitate the generation of long outputs and maintain the context.

訳：

実験は，我々のモデルが最近提案されたモデルを改善し，潜在変数が長い出力の生成を促進しコンテキストを維持することを示す。

48 Pointer Networks（2015）

原文：

Pointer Networks

Abstract：

We introduce a new neural architecture to learn the conditional probability of an output sequence with elements that are discrete tokens corresponding to positions in an input sequence.

語彙：

corresponding

訳：

入力シーケンスの位置に対応する離散トークンである要素を持つ出力シークエンスの条件付き確率を学習するために新しいニューラルアーキテクチャを導入する。

Such problems cannot be trivially addressed by existent approaches such as sequence-to-sequence and Neural Turing Machines, because the number of target classes in each step of the output depends on the length of the input, which is variable.

語彙：

trivially

訳：

出力の各ステップのターゲットクラス数は可変である入力の長さに依存するため，このような問題はsequence-to-sequenceやNeural Turing Machinesなどの既存のアプローチでは簡単には対処できない。

Problems such as sorting variable sized sequences, and various combinatorial optimization problems belong to this class.

訳：

可変サイズのシークエンスのソートなどの問題，およびさまざまな組み合わせ最適化の問題はこのクラスに属する。

Our model solves the problem of variable size output dictionaries using a recently proposed mechanism of neural attention.

訳：

我々のモデルは最近提案されたneural attentionのメカニズムを使用して可変サイズの出力辞書の問題を解決する。

It differs from the previous attention attempts in that, instead of using attention to blend hidden units of an encoder to a context vector at each decoder step, it uses attention as a pointer to select a member of the input sequence as the output.

語彙：

instead of

blend

訳：

これは，各デコーダーステップでエンコーダーの隠れユニットをコンテキストベクトルに混ぜ合わせるためにattentionを使用する代わりに，入力シークエンスの要素を出力として選択するポインターとしてattentionを使用するという点で，以前のattentionの試みとは異なる。
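
The difference from ordinary attention is small in code: the softmax over input positions is itself the output distribution, and no context vector is built. This is a minimal sketch. The paper uses a learned additive scoring function, whereas the dot product below is a simpler stand-in, and all names are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_score(state, encoded):
    """Stand-in scoring function (the paper uses a learned additive form)."""
    return sum(a * b for a, b in zip(state, encoded))


def pointer_step(decoder_state, encoder_states, score=dot_score):
    """Return the index of the selected input element and the distribution over inputs."""
    probs = softmax([score(decoder_state, e) for e in encoder_states])
    chosen = max(range(len(probs)), key=probs.__getitem__)
    return chosen, probs


encoder_states = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
chosen, probs = pointer_step([2.0, 0.0], encoder_states)
```

The length of `probs` always equals the number of input elements. The same parameters therefore handle inputs of any length, which a fixed-size output softmax cannot do.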

We call this architecture a Pointer Net (Ptr-Net).

訳：

我々はこのアーキテクチャをPointer Net（Ptr-Net）と呼ぶ。

We show Ptr-Nets can be used to learn approximate solutions to three challenging geometric problems -- finding planar convex hulls, computing Delaunay triangulations, and the planar Travelling Salesman Problem -- using training examples alone.

語彙：

convex hulls

Delaunay triangulations

Travelling Salesman Problem

訳：

学習例のみを使用して，平面凸包の検出，Delaunay三角形分割の計算，平面巡回セールスマン問題という3つの困難な幾何学的問題の近似解をPtr-Netsが学習できることを示す。

Ptr-Nets not only improve over sequence-to-sequence with input attention, but also allow us to generalize to variable size output dictionaries.

訳：

Ptr-Netsは入力attention付きのsequence-to-sequenceを上回るだけでなく，可変サイズの出力辞書への汎化も可能にする。

We show that the learnt models generalize beyond the maximum lengths they were trained on.

訳：

学習したモデルは訓練された最大長を超えて一般化することを示す。

We hope our results on these tasks will encourage a broader exploration of neural learning for discrete problems.

訳：

これらのタスクに関する結果が離散問題のニューラル学習のより広範な探索を促進することを期待する。

49 CopyNet（2016）

原文：

Incorporating Copying Mechanism in Sequence-to-Sequence Learning

Abstract：

We address an important problem in sequence-to-sequence (Seq2Seq) learning referred to as copying, in which certain segments in the input sequence are selectively replicated in the output sequence.

語彙：

referred to as

訳：

入力シークエンスの特定のセグメントが出力シークエンスで選択的に複製される、copyingと呼ばれるsequence-to-sequence（Seq2Seq）学習の重要な問題に対処する。

A similar phenomenon is observable in human language communication.

語彙：

phenomenon

訳：

同様の現象は人間の言語コミュニケーションでも見られる。

For example, humans tend to repeat entity names or even long phrases in conversation.

語彙：

entity names

訳：

たとえば，人間は会話でエンティティ名や長いフレーズさえも繰り返す傾向がある。

The challenge with regard to copying in Seq2Seq is that new machinery is needed to decide when to perform the operation.

訳：

Seq2Seqでのcopyingに関する課題は，操作を実行するタイミングを決定するために新しい仕組みが必要なことである。

In this paper, we incorporate copying into neural network-based Seq2Seq learning and propose a new model called CopyNet with encoder-decoder structure.

訳：

本論文では，ニューラルネットワークベースのSeq2Seq学習にcopyingを組み込み，エンコーダーデコーダー構造を備えたCopyNetと呼ばれる新しいモデルを提案する。

CopyNet can nicely integrate the regular way of word generation in the decoder with the new copying mechanism which can choose sub-sequences in the input sequence and put them at proper places in the output sequence.

訳：

CopyNetはデコーダでの通常の単語生成方法と，入力シークエンスのサブシークエンスを選択して出力シークエンスの適切な場所に配置できる新しいコピーメカニズムとをうまく統合できる。

Our empirical study on both synthetic data sets and real world data sets demonstrates the efficacy of CopyNet.

訳：

合成データセットと実世界のデータセットの両方に関する我々の実証研究はCopyNetの有効性を示している。

For example, CopyNet can outperform regular RNN-based model with remarkable margins on text summarization tasks.

訳：

たとえば，CopyNetはテキスト要約タスクにおいて通常のRNNベースのモデルを大差で上回ることができる。

50 Pointer-Generator Network（2017）

f:id:ryosuke_okubo:20190930214313p:plain

原文：

Get To The Point: Summarization with Pointer-Generator Networks

Abstract：

Neural sequence-to-sequence models have provided a viable new approach for abstractive text summarization (meaning they are not restricted to simply selecting and rearranging passages from the original text).

語彙：

viable

訳：

ニューラルsequence-to-sequenceモデルは抽象型テキスト要約のための実行可能な新しいアプローチを提供してきた（つまり，元のテキストからパッセージを単に選択して再配置することに限定されない）。

However, these models have two shortcomings:

they are liable to reproduce factual details inaccurately, and they tend to repeat themselves.

語彙：

shortcomings

be liable to

訳：

ただし，これらのモデルには2つの欠点がある：

事実の詳細を不正確に再現しがちであり，繰り返す傾向がある。

In this work we propose a novel architecture that augments the standard sequence-to-sequence attentional model in two orthogonal ways.

語彙：

orthogonal

訳：

本研究では，標準的なattention付きsequence-to-sequenceモデルを2つの直交する方法で強化する新しいアーキテクチャを提案する。

First, we use a hybrid pointer-generator network that can copy words from the source text via pointing, which aids accurate reproduction of information, while retaining the ability to produce novel words through the generator.

訳：

まず，pointingによってソーステキストから単語をコピーできるhybrid pointer-generator networkを使用する。これは情報の正確な再現を助ける一方で，generatorを通じて新しい単語を生成する能力も保持する。
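A minimal sketch of how copying and generating can be mixed into one output distribution: a generation probability `p_gen` weights the vocabulary distribution, and the remaining mass is spread over source tokens according to the attention weights. The function name and the toy numbers are hypothetical. In the actual model `p_gen`, the vocabulary distribution and the attention are all produced by the network.

```python
def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    # Generator part: scale the vocabulary distribution by p_gen.
    final = {word: p_gen * prob for word, prob in vocab_dist.items()}
    # Pointer part: add (1 - p_gen) * attention to each source token,
    # which lets out-of-vocabulary source words receive probability.
    for weight, token in zip(attention, source_tokens):
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final

vocab_dist = {"the": 0.6, "cat": 0.4}      # toy generator output
source_tokens = ["Okubo", "cat"]           # "Okubo" is out of vocabulary
attention = [0.7, 0.3]                     # toy attention over the source
mixed = final_distribution(0.5, vocab_dist, attention, source_tokens)
```

Here the out-of-vocabulary word "Okubo" ends up with non-zero probability purely through the copy path, while "cat" collects mass from both paths.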

Second, we use coverage to keep track of what has been summarized, which discourages repetition.

語彙：

discourages

訳：

次に，coverageを使用してすでに要約された内容を追跡し，これにより繰り返しを抑制する。
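The coverage idea can be sketched as follows: a coverage vector accumulates the attention given to each source position in earlier decoding steps, and a penalty grows when the current step attends again to positions that are already covered. The helper below is a simplified illustration. The function name and the toy attention values are assumptions, and the weighting of this term against the main loss is omitted.

```python
def coverage_loss(attention_steps):
    # coverage[i] = total attention position i received in earlier steps.
    coverage = [0.0] * len(attention_steps[0])
    total = 0.0
    for attn in attention_steps:
        # Penalize overlap between current attention and past coverage.
        total += sum(min(a, c) for a, c in zip(attn, coverage))
        coverage = [c + a for c, a in zip(coverage, attn)]
    return total

repetitive = coverage_loss([[1.0, 0.0], [1.0, 0.0]])  # attends to position 0 twice
varied = coverage_loss([[1.0, 0.0], [0.0, 1.0]])      # moves on to position 1
```

Attending to the same position twice is penalized, while moving to a new position costs nothing, which is how the penalty discourages repetition rather than forbidding it.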

We apply our model to the CNN / Daily Mail summarization task, outperforming the current abstractive state-of-the-art by at least 2 ROUGE points.

訳：

このモデルをCNN / Daily Mail要約タスクに適用し，現在の抽象型要約のstate-of-the-artを少なくとも2 ROUGEポイント上回る。

次回↓

ryosuke-okubo.hatenablog.com

2019-11-04

論文Abstract100本ノック#9

機械学習論文

前回↓

ryosuke-okubo.hatenablog.com

今回から言語処理にまつわる論文を扱う。

41 Word2Vec（2013）
42 GloVe（2014）
43 Doc2Vec（2014）
44 Seq2Seq（2014）
45 HRED（2015）

41 Word2Vec（2013）

f:id:ryosuke_okubo:20190928093218p:plain

原文：

Efficient Estimation of Word Representations in Vector Space

Abstract：

We propose two novel model architectures for computing continuous vector representations of words from very large data sets.

訳：

非常に大きなデータセットから単語の連続ベクトル表現を計算するための2つの新しいモデルアーキテクチャを提案する。

The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks.

訳：

これらの表現の品質は単語の類似性タスクで測定され，その結果を，さまざまなタイプのニューラルネットワークに基づく従来の最高性能の手法と比較する。

We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set.

訳：

はるかに低い計算コストで精度が大幅に向上することが観察された。すなわち，16億語のデータセットから高品質の単語ベクトルを学習するのに1日もかからない。

Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.

訳：

さらに，これらのベクトルは構文および意味の単語の類似性を測定するためのテストセットで最先端の性能を提供することを示す。

42 GloVe（2014）

原文：

GloVe: Global Vectors for Word Representation

Abstract：

Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque.

語彙：

semantic

opaque

訳：

単語のベクトル空間表現を学習するための最近の方法はベクトル演算を使用してきめ細かいセマンティックおよび構文規則性をとらえることに成功しているが，これらの規則性の起源は不透明なままである。

We analyze and make explicit the model properties needed for such regularities to emerge in word vectors.

語彙：

explicit

emerge

訳：

我々はこのような規則性が単語ベクトルに現れるために必要なモデルプロパティを分析し明示的にする。

The result is a new global logbilinear regression model that combines the advantages of the two major model families in the literature:

global matrix factorization and local context window methods.

語彙：

logbilinear

訳：

その結果は，文献における2つの主要なモデルファミリーの利点を組み合わせた新しいグローバルな対数双線形回帰モデルである：

global matrix factorizationとlocal context window methods。

Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word cooccurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus.

語彙：

leverages

cooccurrence matrix

訳：

このモデルはスパース行列全体または大規模なコーパスの個々のコンテキストウィンドウではなく，単語間共起行列の非ゼロ要素のみで学習することにより統計情報を効率的に活用する。
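As a rough illustration of "training only on the nonzero elements", the sketch below stores co-occurrence counts sparsely, so cells that never co-occur are simply absent, and defines the weighting function applied to each nonzero count. The window size and the toy sentence are assumptions. The defaults `x_max=100` and `alpha=0.75` follow the values reported in the GloVe paper, and the distance-based weighting of counts is left out for brevity.

```python
from collections import Counter

def cooccurrence(tokens, window=1):
    # Sparse word-word co-occurrence counts: only observed pairs are stored.
    counts = Counter()
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(word, tokens[j])] += 1
    return counts

def weight(x, x_max=100.0, alpha=0.75):
    # Down-weights rare pairs and caps the influence of frequent ones.
    return (x / x_max) ** alpha if x < x_max else 1.0

tokens = "the cat sat on the mat".split()
counts = cooccurrence(tokens)
vocab_size = len(set(tokens))
dense_cells = vocab_size * vocab_size   # size of the full matrix
nonzero_cells = len(counts)             # what training would iterate over
```

Even in this tiny example the full matrix has 25 cells while only 10 are nonzero. On a real corpus the gap is far larger.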

The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task.

語彙：

meaningful

訳：

このモデルは，最近の単語類推タスクで75％という性能が示すように，意味のある部分構造を持つベクトル空間を生成する。
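For readers unfamiliar with the word analogy task, here is a toy sketch of how it is usually scored: "a is to b as c is to ?" is answered by the word whose vector is closest to `b - a + c`. The two-dimensional vectors below are hand-made for illustration and are not taken from any trained model.

```python
import math

# Hypothetical toy embeddings (not learned).
vectors = {
    "man": [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king": [3.0, 0.0],
    "queen": [3.0, 1.0],
    "apple": [0.0, 3.0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c):
    # Target vector b - a + c, searched among words other than the inputs.
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

answer = analogy("man", "king", "woman")
```

The reported accuracy on such a task is the fraction of questions for which the nearest word matches the expected answer.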

It also outperforms related models on similarity tasks and named entity recognition.

語彙：

named entity recognition

訳：

また類似性タスクおよび固有表現抽出においても関連モデルを上回る。

43 Doc2Vec（2014）

f:id:ryosuke_okubo:20190928093252p:plain

原文：

Distributed Representations of Sentences and Documents

Abstract：

Many machine learning algorithms require the input to be represented as a fixed-length feature vector.

訳：

多くの機械学習アルゴリズムでは入力を固定長の特徴ベクトルとして表す必要がある。

When it comes to texts, one of the most common fixed-length features is bag-of-words.

訳：

テキストに関して言えば，最も一般的な固定長特徴の1つはbag-of-wordsである。

Despite their popularity, bag-of-words features have two major weaknesses:

they lose the ordering of the words and they also ignore semantics of the words.

語彙：

Despite

訳：

人気があるにもかかわらず，bag-of-wordsには2つの大きな弱点がある：

単語の順序が失われ，単語のセマンティクスも無視される。

For example, "powerful," "strong" and "Paris" are equally distant.

訳：

たとえば，「強力な」，「強い」，「パリ」は互いに等しく離れている。
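The "equally distant" claim is easy to check with a small sketch. In a bag-of-words representation each word gets its own axis, so any two distinct words are the same distance apart regardless of meaning. The three-word vocabulary is taken from the example in the abstract. The helper names are hypothetical.

```python
import math

vocab = ["powerful", "strong", "Paris"]

def one_hot(word):
    # Bag-of-words vector for a single-word text.
    return [1.0 if w == word else 0.0 for w in vocab]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

d_synonyms = euclidean(one_hot("powerful"), one_hot("strong"))
d_unrelated = euclidean(one_hot("powerful"), one_hot("Paris"))
```

Both distances are the square root of 2, so the representation cannot tell that "powerful" is closer in meaning to "strong" than to "Paris".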

In this paper, we propose Paragraph Vector, an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents.

訳：

本稿では，文，段落，文書などの可変長のテキストから固定長の特徴表現を学習する教師なしアルゴリズムである，Paragraph Vectorを提案する。

Our algorithm represents each document by a dense vector which is trained to predict words in the document.

訳：

我々のアルゴリズムは文書内の単語を予測するために学習された密ベクトルによって各文書を表す。

Its construction gives our algorithm the potential to overcome the weaknesses of bag-of-words models.

訳：

その構造によってアルゴリズムにbag-of-wordsモデルの弱点を克服する可能性が与えられる。

Empirical results show that Paragraph Vectors outperform bag-of-words models as well as other techniques for text representations.

訳：

実験結果は，Paragraph Vectorsがbag-of-wordsモデルやテキスト表現の他の手法を上回ることを示す。

Finally, we achieve new state-of-the-art results on several text classification and sentiment analysis tasks.

語彙：

sentiment

訳：

最後に，いくつかのテキスト分類および感情分析タスクに関する新しいstate-of-the-artを達成する。

44 Seq2Seq（2014）

f:id:ryosuke_okubo:20190928093315p:plain

原文：

Sequence to Sequence Learning with Neural Networks

Abstract：

Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks.

訳：

DNNは困難な学習タスクで優れたパフォーマンスを達成した強力なモデルである。

Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences.

語彙：

whenever

訳：

DNNはラベル付きの大きな学習セットが利用できる場合は常にうまく機能するが，シークエンスをシークエンスにマッピングするために使用することはできない。

In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure.

語彙：

assumptions

訳：

本稿では，シークエンス構造に関する最小限の仮定を行う，シークエンス学習に対する一般的なエンドツーエンドのアプローチを示す。

Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.

訳：

我々の方法では多層のLong Short-Term Memory（LSTM）を使用して入力シークエンスを固定次元ベクトルにマッピングし，次に別のdeep LSTMを使用してベクトルからターゲットシークエンスをデコードする。

Our main result is that on an English to French translation task from the WMT'14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words.

訳：

主な結果は，WMT'14データセットの英語からフランス語への翻訳タスクにおいて，LSTMが生成した翻訳がテストセット全体で34.8のBLEUスコアを達成したことである。なお，このLSTMのBLEUスコアは語彙外の単語についてペナルティを受けている。

Additionally, the LSTM did not have difficulty on long sentences.

訳：

さらに，LSTMは長い文でも問題はなかった。

For comparison, a phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset.

語彙：

For comparison

訳：

比較のために，フレーズベースのSMTシステムは同じデータセットで33.3のBLEUスコアを達成する。

When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which is close to the previous best result on this task.

語彙：

hypotheses

訳：

LSTMを使用して前述のSMTシステムによって生成された1000の仮説を再ランク付けすると，BLEUスコアは36.5に増加し，このタスクの以前の最良の結果に近くなる。

The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice.

語彙：

word order

訳：

LSTMはまた，語順に敏感で，能動態と受動態の違いに対しては比較的不変な，理にかなったフレーズと文の表現を学習した。

Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.

訳：

最後に，ソース文とターゲット文の間に短期間の依存関係が多く導入され最適化問題が容易になるため，すべてのソース文（ターゲット文ではない）の単語の順序を逆にすることでLSTMのパフォーマンスが著しく向上することがわかった。
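The source-reversal trick is a preprocessing step and can be sketched in a few lines. Only the source side of each training pair is reversed, and the target is left untouched. The function name and the toy sentence pair are assumptions made for illustration.

```python
def reverse_source(pairs):
    # Reverse the word order of each source sentence, keep targets as-is.
    return [(list(reversed(src)), tgt) for src, tgt in pairs]

pairs = [(["I", "like", "cats"], ["j'", "aime", "les", "chats"])]
reversed_pairs = reverse_source(pairs)
# Source is now ["cats", "like", "I"]: the first source word "I" sits
# right next to the start of the target, shortening that dependency.
```

The average distance between corresponding words is unchanged, but the first few words of the source and target end up close together, which is the "short term dependencies" effect the abstract refers to.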

45 HRED（2015）

f:id:ryosuke_okubo:20190928093340p:plain

原文：

A Hierarchical Recurrent Encoder-Decoder For Generative Context-Aware Query Suggestion

Abstract：

Users may strive to formulate an adequate textual query for their information need.

語彙：

strive

adequate

訳：

ユーザーは情報のニーズに応じて適切なテキストクエリを作成するよう努力する場合がある。

Search engines assist the users by presenting query suggestions.

訳：

検索エンジンはクエリ候補を提示することでユーザーを支援する。

To preserve the original search intent, suggestions should be context-aware and account for the previous queries issued by the user.

語彙：

context-aware

account for

訳：

元の検索意図を保持するために，提案はコンテキストを認識し，ユーザーが以前に発行したクエリを考慮したものである必要がある。

Achieving context awareness is challenging due to data sparsity.

訳：

データの希薄性のためコンテキスト認識の達成は困難である。

We present a probabilistic suggestion model that is able to account for sequences of previous queries of arbitrary lengths.

語彙：

arbitrary

訳：

我々は任意の長さの過去のクエリ列を考慮できる確率的提案モデルを提示する。

Our novel hierarchical recurrent encoder-decoder architecture allows the model to be sensitive to the order of queries in the context while avoiding data sparsity.

訳：

我々の斬新なhierarchical recurrent encoder-decoderアーキテクチャにより，データの希薄性を回避しながらモデルをコンテキスト内のクエリの順序に敏感にすることができる。

Additionally, our model can suggest for rare, or long-tail, queries.

語彙：

rare

long-tail

訳：

さらに，このモデルはまれなクエリ，すなわちロングテールのクエリに対しても提案を行える。

The produced suggestions are synthetic and are sampled one word at a time, using computationally cheap decoding techniques.

語彙：

computationally

訳：

生成された提案は合成的であり，計算的に安価なデコード技術を使用して，一度に1語ずつサンプリングされる。

This is in contrast to current synthetic suggestion models relying upon machine learning pipelines and hand-engineered feature sets.

訳：

これは機械学習パイプラインと手作業の特徴セットに依存する現在の合成提案モデルとは対照的である。

Results show that it outperforms existing context-aware approaches in a next query prediction setting.

訳：

結果は次のクエリ予測設定で既存のコンテキスト認識アプローチよりも優れていることを示す。

In addition to query suggestion, our model is general enough to be used in a variety of other applications.

訳：

クエリの提案に加えて，このモデルは他のさまざまなアプリケーションで使用するのに十分一般的である。

次回↓

ryosuke-okubo.hatenablog.com

2019-10-28

論文Abstract100本ノック#8

機械学習論文

前回↓

ryosuke-okubo.hatenablog.com

36 YOLO（2015）
37 SSD（2015）
38 Mask R-CNN（2017）
39 RetinaNet（2017）
40 M2Det（2018）

36 YOLO（2015）

f:id:ryosuke_okubo:20190920175744p:plain

原文：

You Only Look Once: Unified, Real-Time Object Detection

Abstract：

We present YOLO, a new approach to object detection.

訳：

物体検出の新しいアプローチであるYOLOを紹介する。

Prior work on object detection repurposes classifiers to perform detection.

語彙：

repurposes

訳：

物体検出に関する先行研究では，検出を実行するために分類器を転用している。

Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities.

訳：

代わりに，空間的に分離されたバウンディングボックスと関連するクラス確率に対する回帰問題として物体検出をフレーム化する。

A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation.

訳：

単一のニューラルネットワークは1回の評価で完全な画像から直接バウンディングボックスとクラス確率を予測する。

Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance.

訳：

検出パイプライン全体が単一のネットワークであるため，検出性能に基づいてエンドツーエンドで直接最適化できる。

Our unified architecture is extremely fast.

訳：

統合アーキテクチャは非常に高速である。

Our base YOLO model processes images in real-time at 45 frames per second.

訳：

基本的なYOLOモデルは毎秒45フレームでリアルタイムに画像を処理する。

A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors.

語彙：

astounding

訳：

ネットワークの小型バージョンであるFast YOLOは，他のリアルタイム検出器の2倍のmAPを達成しながら毎秒155フレームという驚異的な処理を行う。

Compared to state-of-the-art detection systems, YOLO makes more localization errors but is far less likely to predict false detections where nothing exists.

語彙：

far less

false detections

訳：

最先端の検出システムと比較して，YOLOは位置特定の誤りが多いが，何も存在しない場所で誤検出をする可能性ははるかに低い。

Finally, YOLO learns very general representations of objects.

訳：

最後に，YOLOは物体の非常に一般的な表現を学習する。

It outperforms all other detection methods, including DPM and R-CNN, by a wide margin when generalizing from natural images to artwork on both the Picasso Dataset and the People-Art Dataset.

語彙：

artwork

訳：

PicassoデータセットとPeople-Artデータセットの両方で自然画から絵画に一般化すると，DPMやR-CNNを含む他のすべての検出方法よりもはるかに優れている。

37 SSD（2015）

f:id:ryosuke_okubo:20190920175812p:plain

原文：

SSD: Single Shot MultiBox Detector

Abstract：

We present a method for detecting objects in images using a single deep neural network.

訳：

単一のDNNを使用して画像内の物体を検出する方法を示す。

Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location.

語彙：

discretizes

aspect ratios

訳：

SSDと名付けた我々のアプローチは，バウンディングボックスの出力空間を，特徴マップの位置ごとに異なるアスペクト比とスケールを持つデフォルトボックスの集合に離散化する。
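A simplified sketch of what "default boxes over different aspect ratios per feature map location" looks like in practice: for every cell of a feature map, one box per aspect ratio is placed at the cell center. The function name, the 2×2 feature map, the scale and the ratio list are assumptions chosen to keep the example small. The width and height formulas (scale times or divided by the square root of the ratio) follow the paper.

```python
import math

def default_boxes(fmap_size, scale, aspect_ratios):
    # Returns (cx, cy, w, h) boxes in coordinates normalized to [0, 1].
    boxes = []
    for row in range(fmap_size):
        for col in range(fmap_size):
            cx = (col + 0.5) / fmap_size   # center of the cell
            cy = (row + 0.5) / fmap_size
            for ratio in aspect_ratios:
                w = scale * math.sqrt(ratio)
                h = scale / math.sqrt(ratio)
                boxes.append((cx, cy, w, h))
    return boxes

boxes = default_boxes(fmap_size=2, scale=0.2, aspect_ratios=[1.0, 2.0, 0.5])
```

The network then predicts, for each of these fixed boxes, class scores and offsets, instead of regressing boxes from scratch.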

At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape.

語彙：

presence

adjustments

訳：

予測時に，ネットワークは各デフォルトボックス内の各物体カテゴリの存在のスコアを生成し，物体の形状によりよく一致するようにボックスの調整を生成する。

Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes.

語彙：

resolutions

handle

訳：

さらに，ネットワークは解像度の異なる複数の特徴マップからの予測を組み合わせて，さまざまなサイズの物体を自然に扱う。

Our SSD model is simple relative to methods that require object proposals because it completely eliminates proposal generation and subsequent pixel or feature resampling stage and encapsulates all computation in a single network.

語彙：

relative

subsequent

encapsulates

訳：

SSDモデルは，提案の生成とそれに続くピクセルまたは特徴のリサンプリング段階を完全に排除し，すべての計算を単一のネットワークにカプセル化するため，物体の提案を必要とする方法に比べて単純である。

This makes SSD easy to train and straightforward to integrate into systems that require a detection component.

語彙：

straightforward

訳：

これによりSSDの学習が容易になり検出要素を必要とするシステムに簡単に統合できる。

Experimental results on the PASCAL VOC, MS COCO, and ILSVRC datasets confirm that SSD has comparable accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference.

語彙：

confirm

utilize

訳：

PASCAL VOC，MS COCO，およびILSVRCデータセットの実験結果はSSDが追加の物体提案ステップを利用する方法に匹敵する精度を持ちはるかに高速であると同時に，学習と推論の両方に統一されたフレームワークを提供することを確認する。

For 300×300 input, SSD achieves 72.1% mAP on VOC2007 test at 58 FPS on a Nvidia Titan X and for 500×500 input, SSD achieves 75.1% mAP, outperforming a comparable state of the art Faster R-CNN model.

訳：

300×300入力の場合，SSDはVOC2007テストで72.1％ mAPをNvidia Titan X上の58 FPSで達成し，500×500入力の場合は75.1％ mAPを達成して，同等の最先端のFaster R-CNNモデルを上回る。

Compared to other single stage methods, SSD has much better accuracy, even with a smaller input image size.

訳：

他のシングルステージ方式と比較して，SSDは入力画像サイズが小さくても精度がはるかに高くなる。

Code is available at https://github.com/weiliu89/caffe/tree/ssd.

訳：

コードはhttps://github.com/weiliu89/caffe/tree/ssdで入手できる。

38 Mask R-CNN（2017）

f:id:ryosuke_okubo:20190920175832p:plain

原文：

Mask R-CNN

Abstract：

We present a conceptually simple, flexible, and general framework for object instance segmentation.

語彙：

conceptually

訳：

物体インスタンスのセグメンテーションのための概念的にシンプル，柔軟，一般的なフレームワークを示す。

Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance.

訳：

我々のアプローチは画像内の物体を効率的に検出すると同時にインスタンスごとに高品質のセグメンテーションマスクを生成する。

The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition.

訳：

Mask R-CNNと呼ばれる方法は，バウンディングボックス認識の既存のブランチと並行してオブジェクトマスクを予測するためのブランチを追加することによりFaster R-CNNを拡張する。

Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps.

訳：

Mask R-CNNは学習が簡単で，Faster R-CNNにわずかなオーバーヘッドを追加するだけであり，5 fpsで動作する。

Moreover, Mask R-CNN is easy to generalize to other tasks,

e.g., allowing us to estimate human poses in the same framework.

訳：

さらに，Mask R-CNNは他のタスクに簡単に一般化できる，

たとえば，同じフレームワークで人間のポーズを推定できる。

We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection.

訳：

インスタンスのセグメンテーション，バウンディングボックスによる物体の検出，人物のキーポイントの検出など，COCOの一連の課題の3つすべてで最高の結果を示している。

Without bells and whistles, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners.

語彙：

bells and whistles

訳：

特別な工夫なしでも，Mask R-CNNはすべてのタスクにおいて，COCO 2016チャレンジの優勝モデルを含む既存のすべての単一モデルエントリを上回る。

We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition.

訳：

シンプルで効果的なアプローチが強固なベースラインとして機能し，インスタンスレベルの認識に関する今後の研究を容易にすることを期待する。

Code has been made available at: https://github.com/facebookresearch/Detectron

訳：

コードは次の場所から入手できる。

https://github.com/facebookresearch/Detectron

39 RetinaNet（2017）

f:id:ryosuke_okubo:20190920175851p:plain

原文：

Focal Loss for Dense Object Detection

Abstract：

The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations.

語彙：

popularized

訳：

これまでの最高精度の物体検出器は，R-CNNによって普及した2段階アプローチに基づいており，そこでは分類器が候補となる物体位置の疎な集合に適用される。

In contrast, one-stage detectors that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but have trailed the accuracy of two-stage detectors thus far.

訳：

対照的に，可能な物体位置の規則的で密なサンプリングに適用される1段検出器は，より高速でシンプルになる可能性があるが，これまでのところ2段検出器の精度には及ばなかった。

In this paper, we investigate why this is the case.

訳：

本論文では，なぜそうなのかを調査する。

We discover that the extreme foreground-background class imbalance encountered during training of dense detectors is the central cause.

語彙：

encountered

訳：

高密度検出器の学習中に発生する極度の前景と背景のクラスの不均衡が中心的な原因であることを発見した。

We propose to address this class imbalance by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples.

訳：

よく分類された例に割り当てられた損失の重みを小さくするように，標準クロスエントロピー損失を再形成することによってこのクラスの不均衡に対処することを提案する。
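The reshaping described here can be written down compactly. With `p_t` the predicted probability of the true class, cross entropy is `-log(p_t)` and the focal loss multiplies it by `(1 - p_t) ** gamma`. The sketch below omits the optional class-balancing weight. `gamma=2.0` is the value the paper reports as working well, and the function names are chosen for illustration.

```python
import math

def cross_entropy(p_t):
    # Standard cross entropy for the probability assigned to the true class.
    return -math.log(p_t)

def focal_loss(p_t, gamma=2.0):
    # The modulating factor (1 - p_t)^gamma shrinks the loss of easy examples.
    return ((1.0 - p_t) ** gamma) * cross_entropy(p_t)

easy_ratio = focal_loss(0.9) / cross_entropy(0.9)   # well-classified example
hard_ratio = focal_loss(0.1) / cross_entropy(0.1)   # misclassified example
```

For a well-classified example with `p_t = 0.9` the loss is scaled down by a factor of about 100, while a hard example with `p_t = 0.1` keeps about 81% of its cross entropy loss. Setting `gamma=0` recovers plain cross entropy.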

Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.

語彙：

Focal Loss

vast number of

overwhelming

訳：

我々の新しいFocal Lossは，学習を難しいサンプルの疎な集合に集中させ，膨大な数の容易な負例が学習中に検出器を圧倒するのを防ぐ。

To evaluate the effectiveness of our loss, we design and train a simple dense detector we call RetinaNet.

訳：

損失の有効性を評価するために，RetinaNetと呼ばれる単純な高密度検出器を設計および学習する。

Our results show that when trained with the focal loss, RetinaNet is able to match the speed of previous one-stage detectors while surpassing the accuracy of all existing state-of-the-art two-stage detectors.

語彙：

surpassing

訳：

我々の結果は，RetinaNetがfocal lossで学習された場合，既存のすべての最先端の2段検出器の精度を超えながら従来の1段検出器の速度に一致できることを示す。

Code is at:

https://github.com/facebookresearch/Detectron

訳：

コードは次の場所にある：

https://github.com/facebookresearch/Detectron

40 M2Det（2018）

f:id:ryosuke_okubo:20190920175917p:plain

原文：

M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network

Abstract：

Feature pyramids are widely exploited by both the state-of-the-art one-stage object detectors (e.g., DSSD, RetinaNet, RefineDet) and the two-stage object detectors (e.g., Mask R-CNN, DetNet) to alleviate the problem arising from scale variation across object instances.

訳：

Feature pyramidsは物体インスタンス間のスケールのばらつきから生じる問題を軽減するため，最先端の1段物体検出器（DSSD，RetinaNet，RefineDetなど）と2段物体検出器（Mask R-CNN，DetNetなど）の両方で広く活用されている。

Although these object detectors with feature pyramids achieve encouraging results, they have some limitations due to that they only simply construct the feature pyramid according to the inherent multi-scale, pyramidal architecture of the backbones which are actually designed for object classification task.

訳：

feature pyramidsを備えたこれらの物体検出器は有望な結果を達成するが，物体分類タスク用に実際に設計されたバックボーンの固有のマルチスケールピラミッドアーキテクチャに従って特徴ピラミッドを構築するだけであるため，いくつかの制限がある。

Newly, in this work, we present a method called Multi-Level Feature Pyramid Network (MLFPN) to construct more effective feature pyramids for detecting objects of different scales.

訳：

新たにここでは，異なるスケールの物体を検出するためのより効果的なfeature pyramidsを構築するためのMulti-Level Feature Pyramid Network（MLFPN）と呼ばれる方法を提示する。

First, we fuse multi-level features (i.e. multiple layers) extracted by backbone as the base feature.

訳：

最初に，バックボーンによって抽出されたマルチレベルの特徴（つまり複数のレイヤー）を融合してベース特徴とする。

Second, we feed the base feature into a block of alternating joint Thinned U-shape Modules and Feature Fusion Modules and exploit the decoder layers of each u-shape module as the features for detecting objects.

訳：

次に，ベース特徴をThinned U-shape ModulesとFeature Fusion Modulesが交互に連結されたブロックに送り込み，各u-shape moduleのデコーダレイヤーを物体検出のための特徴として利用する。

Finally, we gather up the decoder layers with equivalent scales (sizes) to develop a feature pyramid for object detection, in which every feature map consists of the layers (features) from multiple levels.

訳：

最後に，同等のスケール（サイズ）を持つデコーダレイヤーを集めて物体検出用のfeature pyramidを構築する。このピラミッドでは，各特徴マップが複数レベルのレイヤー（特徴）で構成される。

To evaluate the effectiveness of the proposed MLFPN, we design and train a powerful end-to-end one-stage object detector we call M2Det by integrating it into the architecture of SSD, which gets better detection performance than state-of-the-art one-stage detectors.

訳：

提案したMLFPNの有効性を評価するために，MLFPNをSSDのアーキテクチャに統合することで，M2Detと呼ぶ強力なエンドツーエンドの1段物体検出器を設計および学習する。これは最先端の1段検出器よりも優れた検出性能を達成する。

Specifically, on MS-COCO benchmark, M2Det achieves AP of 41.0 at speed of 11.8 FPS with single-scale inference strategy and AP of 44.2 with multi-scale inference strategy, which is the new state-of-the-art results among one-stage detectors.

訳：

具体的には，MS-COCOベンチマークでは，M2Detはシングルスケール推論戦略で11.8 FPSの速度で41.0のAPを達成し，マルチスケール推論戦略で44.2のAPを達成した。これは1段検出器の中で新たなstate-of-the-artの結果である。

The code will be made available on https://github.com/qijiezhao/M2Det.

訳：

コードはhttps://github.com/qijiezhao/M2Detで公開される予定である。

次回↓

ryosuke-okubo.hatenablog.com