User:月心雪/Sandbox

BERT

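Bidirectional Encoder Representations from Transformers (BERT) is a pre-training technique for natural language processing (NLP) developed by Google.^[1]^[2] BERT was created and published in 2018 by Jacob Devlin and his colleagues at Google. Google uses BERT to better understand the meaning of user searches.^[3] The original English-language BERT model comes in two pre-trained general types:^[1] (1) the BERTBASE model, a neural network architecture with 12 layers, a hidden size of 768, 12 attention heads, and 110M parameters, and (2) the BERTLARGE model, a neural network architecture with 24 layers, a hidden size of 1024, 16 attention heads, and 340M parameters. Both were trained on the BooksCorpus^[4] (800M words) and on English Wikipedia (2,500M words).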
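As a rough check on the parameter totals quoted above, the sketch below counts the weights of a standard Transformer encoder from the layer, hidden-size, and head figures. Only those three figures come from the text. The vocabulary size (30,522), maximum sequence length (512), number of segment types (2), feed-forward width (four times the hidden size), and the presence of a final pooling layer are assumptions taken from the commonly distributed uncased English checkpoints, and `BertConfig` and `count_parameters` are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BertConfig:
    """Architecture sizes; fields marked 'assumed' are not stated in the article."""
    num_layers: int
    hidden_size: int
    num_heads: int              # heads split hidden_size, so they do not change the count
    vocab_size: int = 30522     # assumed: WordPiece vocabulary of the uncased English release
    max_positions: int = 512    # assumed: maximum sequence length
    type_vocab_size: int = 2    # assumed: segment A / segment B embeddings
    ffn_multiplier: int = 4     # assumed: feed-forward width = 4 * hidden_size


def count_parameters(cfg: BertConfig) -> int:
    """Count weights and biases of the encoder described by cfg."""
    h = cfg.hidden_size
    ffn = cfg.ffn_multiplier * h
    layer_norm = 2 * h  # scale and bias

    # Token, position, and segment embedding tables plus one LayerNorm.
    embeddings = (cfg.vocab_size + cfg.max_positions + cfg.type_vocab_size) * h + layer_norm

    # Query, key, value, and output projections, each h x h with a bias.
    attention = 4 * (h * h + h)
    # Two dense layers: h -> ffn and ffn -> h, each with a bias.
    feed_forward = (h * ffn + ffn) + (ffn * h + h)
    per_layer = attention + layer_norm + feed_forward + layer_norm

    # Assumed: one dense pooling layer applied to the first token's output.
    pooler = h * h + h

    return embeddings + cfg.num_layers * per_layer + pooler


BERT_BASE = BertConfig(num_layers=12, hidden_size=768, num_heads=12)
BERT_LARGE = BertConfig(num_layers=24, hidden_size=1024, num_heads=16)

base_params = count_parameters(BERT_BASE)
large_params = count_parameters(BERT_LARGE)
print(f"BERTBASE:  {base_params / 1e6:.1f}M parameters")
print(f"BERTLARGE: {large_params / 1e6:.1f}M parameters")
```

Under these assumptions the totals come to roughly 109M and 335M, which is consistent with the rounded 110M and 340M figures. Most of the weights sit in the stacked encoder layers; the embedding tables account for about a fifth of the smaller model.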

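Performance

When BERT was published, it achieved state-of-the-art performance on a number of natural language understanding tasks:^[1]

GLUE (General Language Understanding Evaluation) task set, comprising 9 tasks
SQuAD (Stanford Question Answering Dataset) v1.1 and v2.0
SWAG (Situations With Adversarial Generations)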

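Analysis

Why BERT achieves state-of-the-art performance on these natural language understanding tasks is not yet well understood.^[5]^[6] Current research focuses on the relationship between carefully chosen input sequences and BERT's output,^[7]^[8] on the analysis of internal vector representations through probing classifiers,^[9]^[10] and on the relationships represented by attention weights.^[5]^[6]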

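History

BERT has its origins in work on pre-training contextual representations, including semi-supervised sequence learning,^[11] generative pre-training, ELMo,^[12] and ULMFit.^[13] Unlike previous models, BERT is a deeply bidirectional, unsupervised language representation, pre-trained using only a plain-text corpus. Context-free models such as word2vec or GloVe generate a single word embedding for each word in the vocabulary, whereas BERT takes into account the context of each occurrence of a given word. For example, word2vec assigns "running" the same vector in both "He is running a company" and "He is running a marathon", while BERT provides a contextualized embedding that differs according to the sentence.

On October 25, 2019, Google Search announced that it had begun applying BERT models to English-language search queries within the United States.^[14] On December 9, 2019, it was reported that Google Search had adopted BERT for over 70 languages.^[15]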
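The contrast drawn in this section between context-free and contextual embeddings can be illustrated with a toy sketch. It is not BERT and not word2vec: the static table holds random vectors, and the "contextual" vectors come from a single unparameterized self-attention step that mixes each token's vector with those of the other tokens in the sentence. All names (`static_vector`, `contextual_vectors`, `DIM`) are hypothetical and chosen only for this example.

```python
import math
import random

DIM = 8                      # toy embedding width (assumption)
_rng = random.Random(0)      # fixed seed so runs are repeatable
_static_table = {}


def static_vector(word):
    """Context-free lookup: one fixed vector per vocabulary word."""
    if word not in _static_table:
        _static_table[word] = [_rng.gauss(0.0, 1.0) for _ in range(DIM)]
    return _static_table[word]


def contextual_vectors(tokens):
    """One softmax self-attention step over the static vectors.

    Each output vector is a weighted average of every token's static
    vector in the sentence, so it depends on the surrounding words.
    """
    vecs = [static_vector(t) for t in tokens]
    outputs = []
    for query in vecs:
        # Scaled dot-product scores of this token against all tokens.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(DIM) for key in vecs]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        outputs.append([sum(w * v[i] for w, v in zip(weights, vecs)) for i in range(DIM)])
    return outputs


sent_a = "he is running a company".split()
sent_b = "he is running a marathon".split()
idx = sent_a.index("running")

static_a, static_b = static_vector(sent_a[idx]), static_vector(sent_b[idx])
ctx_a, ctx_b = contextual_vectors(sent_a)[idx], contextual_vectors(sent_b)[idx]

print("static vectors identical:    ", static_a == static_b)
print("contextual vectors identical:", ctx_a == ctx_b)
```

The static lookup returns the same vector for "running" in both sentences, while the mixed vectors differ because the final word of each sentence contributes to the weighted average. A real model learns its projection weights and stacks many such layers; this sketch only shows why the output can vary with context.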

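Awards

BERT won the Best Long Paper Award at the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL).^[16]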

See also

References

[1] Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 11 October 2018. arXiv:1810.04805v2 [cs.CL].
[2] Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing. Google AI Blog. [2019-11-27] (in English).
[3] Understanding searches better than ever before. Google. 2019-10-25 [2019-11-27] (in English).
[4] Zhu, Yukun; Kiros, Ryan; Zemel, Rich; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books. 2015. arXiv:1506.06724 [cs.CV].
[5] Kovaleva, Olga; Romanov, Alexey; Rogers, Anna; Rumshisky, Anna. Revealing the Dark Secrets of BERT. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). November 2019: 4364–4373. S2CID 201645145. doi:10.18653/v1/D19-1445 (in American English).
[6] Clark, Kevin; Khandelwal, Urvashi; Levy, Omer; Manning, Christopher D. What Does BERT Look at? An Analysis of BERT's Attention. Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Stroudsburg, PA, USA: Association for Computational Linguistics). 2019: 276–286. doi:10.18653/v1/w19-4828.
[7] Khandelwal, Urvashi; He, He; Qi, Peng; Jurafsky, Dan. Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 284–294. Bibcode:2018arXiv180504623K. S2CID 21700944. arXiv:1805.04623. doi:10.18653/v1/p18-1027.
[8] Gulordava, Kristina; Bojanowski, Piotr; Grave, Edouard; Linzen, Tal; Baroni, Marco. Colorless Green Recurrent Networks Dream Hierarchically. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 1195–1205. Bibcode:2018arXiv180311138G. S2CID 4460159. arXiv:1803.11138. doi:10.18653/v1/n18-1108.
[9] Giulianelli, Mario; Harding, Jack; Mohnert, Florian; Hupkes, Dieuwke; Zuidema, Willem. Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 240–248. Bibcode:2018arXiv180808079G. S2CID 52090220. arXiv:1808.08079. doi:10.18653/v1/w18-5426.
[10] Zhang, Kelly; Bowman, Samuel. Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Syntactic Task Analysis. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 359–361. doi:10.18653/v1/w18-5448.
[11] Dai, Andrew; Le, Quoc. Semi-supervised Sequence Learning. 4 November 2015. arXiv:1511.01432 [cs.LG].
[12] Peters, Matthew; Neumann, Mark; Iyyer, Mohit; Gardner, Matt; Clark, Christopher; Lee, Kenton; Zettlemoyer, Luke. Deep contextualized word representations. 15 February 2018. arXiv:1802.05365v2 [cs.CL].
[13] Howard, Jeremy; Ruder, Sebastian. Universal Language Model Fine-tuning for Text Classification. 18 January 2018. arXiv:1801.06146v5 [cs.CL].
[14] Nayak, Pandu. Understanding searches better than ever before. Google Blog. 25 October 2019 [10 December 2019].
[15] Montti, Roger. Google's BERT Rolls Out Worldwide. Search Engine Journal. 10 December 2019 [10 December 2019].
[16] Best Paper Awards. NAACL. 2019 [28 March 2020].

External links

Official GitHub repository
