Paper Abstract 100 Knocks #13 - 十の並列した脳

Previous post ↓

ryosuke-okubo.hatenablog.com

61 ULMFiT (2018)

Original:

Universal Language Model Fine-tuning for Text Classification

Abstract:

Inductive transfer learning has greatly impacted computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch.

Vocabulary:

Inductive

scratch

Translation:

Inductive transfer learning has had a major impact on computer vision, but existing NLP approaches still require task-specific modifications and training from scratch.

We propose Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a language model.

Vocabulary:

fine-tuning

Translation:

We propose Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method applicable to any NLP task, and introduce techniques that are key to fine-tuning a language model.

Our method significantly outperforms the state-of-the-art on six text classification tasks, reducing the error by 18-24% on the majority of datasets.

Translation:

Our method significantly outperforms the state of the art on six text classification tasks, reducing the error by 18-24% on most of the datasets.
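The "18-24%" figure reads most naturally as a relative reduction in error rate rather than a drop in percentage points. A minimal sketch of that arithmetic follows; the function name and the numbers are hypothetical and are not taken from the paper.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the fraction of the baseline error that the new model removed."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates in percent, chosen only for illustration.
reduction = relative_error_reduction(baseline_error=5.0, new_error=4.0)
print(f"relative error reduction: {reduction:.0%}")
```

With these made-up numbers the error falls by one percentage point, which corresponds to a relative reduction of one fifth of the baseline error.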

Furthermore, with only 100 labeled examples, it matches the performance of training from scratch on 100x more data.

Translation:

Furthermore, with only 100 labeled examples, it matches the performance of training from scratch on 100 times as much data.

We open-source our pretrained models and code.

Translation:

We release our pretrained models and code as open source.

62 OpenAI GPT (2018)

Original:

Improving Language Understanding by Generative Pre-Training

Abstract:

Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering, semantic similarity assessment, and document classification.

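Translation:

Natural language understanding covers a wide range of diverse tasks, such as textual entailment, question answering, semantic similarity assessment, and document classification.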

Although large unlabeled text corpora are abundant, labeled data for learning these specific tasks is scarce, making it challenging for discriminatively trained models to perform adequately.

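Vocabulary:

discriminatively

Translation:

Large unlabeled text corpora are abundant, but labeled data for learning these specific tasks is scarce, which makes it hard for discriminatively trained models to perform adequately.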

We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task.

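Translation:

We show that large gains on these tasks can be achieved by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task.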

In contrast to previous approaches, we make use of task-aware input transformations during fine-tuning to achieve effective transfer while requiring minimal changes to the model architecture.

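Translation:

In contrast to previous approaches, we use task-aware input transformations during fine-tuning to achieve effective transfer while requiring only minimal changes to the model architecture.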

We demonstrate the effectiveness of our approach on a wide range of benchmarks for natural language understanding.

Translation:

We demonstrate the effectiveness of our approach on a wide range of natural language understanding benchmarks.

Our general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon the state of the art in 9 out of the 12 tasks studied.

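Vocabulary:

agnostic

Translation:

Our general, task-agnostic model outperforms discriminatively trained models that use architectures crafted specifically for each task, significantly improving on the state of the art in 9 of the 12 tasks studied.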

For instance, we achieve absolute improvements of 8.9% on commonsense reasoning (Stories Cloze Test), 5.7% on question answering (RACE), and 1.5% on textual entailment (MultiNLI).

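Translation:

For example, we achieve absolute improvements of 8.9% on commonsense reasoning (Stories Cloze Test), 5.7% on question answering (RACE), and 1.5% on textual entailment (MultiNLI).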

63 BERT (2018)

Original:

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Abstract:

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers.

Translation:

We introduce a new language representation model called BERT, short for Bidirectional Encoder Representations from Transformers.

Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers.

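Translation:

Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by conditioning jointly on both left and right context in all layers.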
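To make "both left and right context" concrete, the following sketch contrasts which positions each token may look at in a left-to-right model and in a bidirectional one. It is a toy illustration only: the function `visible_positions` is a hypothetical helper, not part of any BERT implementation.

```python
from __future__ import annotations


def visible_positions(seq_len: int, bidirectional: bool) -> list[list[int]]:
    """Build a 0/1 matrix where row i marks the positions token i may attend to."""
    rows = []
    for i in range(seq_len):
        if bidirectional:
            # Every position sees the whole sequence (left and right context).
            rows.append([1] * seq_len)
        else:
            # A left-to-right model sees only itself and earlier positions.
            rows.append([1 if j <= i else 0 for j in range(seq_len)])
    return rows


for name, flag in [("left-to-right", False), ("bidirectional", True)]:
    print(name)
    for row in visible_positions(4, flag):
        print(" ", row)
```

The left-to-right matrix is lower triangular, while the bidirectional one is filled everywhere, which is the difference the abstract points to.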

As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.

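Translation:

As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.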

BERT is conceptually simple and empirically powerful.

Translation:

BERT is conceptually simple and empirically powerful.

It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).

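Translation:

It obtains new state-of-the-art results on eleven natural language processing tasks, including a GLUE score of 80.5% (7.7 point absolute improvement), MultiNLI accuracy of 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 of 93.2 (1.5 point absolute improvement), and SQuAD v2.0 Test F1 of 83.1 (5.1 point absolute improvement).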

64 XLNet (2019)

Original:

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Abstract:

With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling.

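Translation:

Because it can model bidirectional contexts, pretraining based on denoising autoencoding, such as BERT, achieves better performance than pretraining approaches based on autoregressive language modeling.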

However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy.

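Vocabulary:

relying on

corrupting

discrepancy

Translation:

However, because it relies on corrupting the input with masks, BERT neglects the dependency between masked positions and suffers from a discrepancy between pretraining and fine-tuning.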
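The two drawbacks named here can be seen in a toy version of mask-based corruption. This is a simplified sketch under stated assumptions: `mask_tokens` is a hypothetical helper, the masked positions are fixed by hand, and the real BERT procedure selects positions at random and does not always substitute the mask token.

```python
MASK = "[MASK]"


def mask_tokens(tokens, positions):
    """Replace the tokens at the given positions with MASK.

    Returns the corrupted sequence and a dict mapping each masked
    position to the original token the model would have to predict.
    """
    masked = list(tokens)
    targets = {}
    for p in positions:
        targets[p] = masked[p]
        masked[p] = MASK
    return masked, targets


tokens = ["New", "York", "is", "a", "city"]
masked, targets = mask_tokens(tokens, positions=[0, 1])
print(masked)   # corrupted input seen during pretraining
print(targets)  # tokens to reconstruct, each predicted separately
```

Both targets are predicted from the same corrupted input, so the prediction for "York" cannot condition on "New" having been predicted. The `[MASK]` symbol also appears only in this pretraining input and never in downstream task data, which is the pretrain-finetune discrepancy.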

In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method

that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order

and (2) overcomes the limitations of BERT thanks to its autoregressive formulation.

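Vocabulary:

pros and cons

Translation:

In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that

(1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order, and

(2) overcomes the limitations of BERT thanks to its autoregressive formulation.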
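Point (1) can be written as a formula. The notation below is an assumption for illustration, since the abstract defines no symbols: x is a sequence of length T, Z_T is the set of all permutations of the indices 1..T, z_t is the t-th element of a permutation z, and θ denotes the model parameters.

```latex
\max_{\theta} \;
\mathbb{E}_{\mathbf{z} \sim \mathcal{Z}_T}
\left[ \sum_{t=1}^{T} \log p_{\theta}\!\left( x_{z_t} \mid \mathbf{x}_{\mathbf{z}_{<t}} \right) \right]
```

Each token is still predicted autoregressively from the tokens that precede it in the sampled order. Because the expectation runs over all orders, every token ends up being conditioned on context from both sides.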

Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining.

Translation:

Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining.

Empirically, XLNet outperforms BERT on 20 tasks, often by a large margin, and achieves state-of-the-art results on 18 tasks including question answering, natural language inference, sentiment analysis, and document ranking.

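Translation:

Empirically, XLNet outperforms BERT on 20 tasks, often by a large margin, and achieves state-of-the-art results on 18 tasks, including question answering, natural language inference, sentiment analysis, and document ranking.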

65 RoBERTa (2019)

Original:

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Abstract:

Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging.

Translation:

Language model pretraining has led to significant performance gains, but careful comparison between different approaches is challenging.

Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have significant impact on the final results.

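Translation:

Training is computationally expensive and is often done on private datasets of different sizes, and, as we will show, hyperparameter choices have a significant impact on the final results.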

We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size.

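Vocabulary:

replication study

Translation:

We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and of training data size.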

We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it.

Vocabulary:

undertrained

Translation:

We find that BERT was significantly undertrained and that it can match or exceed the performance of every model published after it.

Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD.

Translation:

Our best model achieves state-of-the-art results on GLUE, RACE, and SQuAD.

These results highlight the importance of previously overlooked design choices, and raise questions about the source of recently reported improvements.

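Translation:

These results highlight the importance of previously overlooked design choices and raise questions about the source of recently reported improvements.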