BERT

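Bidirectional Encoder Representations from Transformers (BERT) is a pre-training technique for natural language processing (NLP) developed by Google.^[1]^[2] BERT was created and published in 2018 by Jacob Devlin and his colleagues at Google. Google uses BERT to better understand the meaning of users' search queries.^[3] The original English-language BERT comes in two general pre-trained model sizes:^[1] (1) BERTBASE, a neural network architecture with 12 layers, a hidden size of 768, 12 attention heads, and 110M parameters; and (2) BERTLARGE, a neural network architecture with 24 layers, a hidden size of 1024, 16 attention heads, and 340M parameters. Both were trained on the BooksCorpus^[4] (800M words) and the English Wikipedia (2,500M words).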
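The parameter figures quoted above can be roughly reproduced from the layer count and hidden size alone. The following is a minimal sketch, not the official implementation. Several values are assumptions that do not appear in the text above: a vocabulary of 30,522 tokens, 512 position embeddings, 2 segment types, a feed-forward width of four times the hidden size, and a final pooling layer. The function name `encoder_parameter_count` is hypothetical.

```python
def encoder_parameter_count(layers, hidden, vocab_size=30522,
                            max_positions=512, segment_types=2):
    """Estimate the number of trainable parameters in a BERT-style encoder.

    All default values are assumptions made for this sketch.
    """
    feed_forward = 4 * hidden   # assumed inner width of the feed-forward block
    layer_norm = 2 * hidden     # one scale and one bias vector

    # Token, position and segment embedding tables, followed by a layer norm.
    embeddings = (vocab_size + max_positions + segment_types) * hidden + layer_norm

    # Query, key, value and output projections (weights plus biases).
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Two dense layers: hidden -> feed_forward -> hidden.
    ffn = ((hidden * feed_forward + feed_forward)
           + (feed_forward * hidden + hidden)
           + layer_norm)

    # Dense layer applied to the first token's output (assumed present).
    pooler = hidden * hidden + hidden

    return embeddings + layers * (attention + ffn) + pooler


base = encoder_parameter_count(layers=12, hidden=768)
large = encoder_parameter_count(layers=24, hidden=1024)
print(f"BERTBASE:  about {base / 1e6:.1f}M parameters")
print(f"BERTLARGE: about {large / 1e6:.1f}M parameters")
```

Under these assumptions the estimates come out near 109.5M and 335.1M, close to the rounded 110M and 340M figures. The number of attention heads does not enter the calculation, because the heads split the hidden dimension rather than add to it.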

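Performance

When BERT was published, it achieved state-of-the-art performance on a number of natural language understanding tasks:^[1]

GLUE (General Language Understanding Evaluation) task set (consisting of 9 tasks).
SQuAD (Stanford Question Answering Dataset) v1.1 and v2.0.
SWAG (Situations With Adversarial Generations).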

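Analysis

Why BERT achieves state-of-the-art performance on these natural language understanding tasks is not yet well understood.^[5]^[6] Current research has focused on investigating the relationship between carefully chosen input sequences and BERT's output,^[7]^[8] analyzing internal vector representations through probing classifiers,^[9]^[10] and examining the relationships represented by attention weights.^[5]^[6]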

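History

BERT has its origins in work on pre-training contextual representations, including Semi-supervised Sequence Learning,^[11] Generative Pre-Training, ELMo,^[12] and ULMFit.^[13] Unlike previous models, BERT is a deeply bidirectional, unsupervised language representation, pre-trained using only a plain-text corpus. Context-free models such as word2vec or GloVe generate a single word embedding for each word in the vocabulary, whereas BERT takes into account the context of each occurrence of a given word. For example, the word "running" has the same word2vec vector in the sentences "He is running a company" and "He is running a marathon", whereas BERT provides a contextualized embedding that differs from one sentence to the other.

On October 25, 2019, Google Search announced that it had started applying BERT models to English-language search queries within the United States.^[14] On December 9, 2019, it was reported that Google Search had adopted BERT for over 70 languages.^[15]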
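The difference between context-free and contextual embeddings described in this section can be illustrated with a toy sketch. This is not how BERT or word2vec compute vectors. The two-dimensional vectors in `STATIC` are invented for illustration, and `toy_contextual_embedding` is a hypothetical stand-in that simply mixes a word's vector with the average of every vector in the sentence, so that words on both the left and the right influence the result.

```python
# Hypothetical static vectors, invented for this example only.
STATIC = {
    "he": (0.1, 0.0),
    "is": (0.0, 0.1),
    "running": (0.5, 0.5),
    "a": (0.0, 0.0),
    "company": (1.0, 0.0),
    "marathon": (0.0, 1.0),
}


def static_embedding(tokens, index):
    """Context-free lookup: the sentence plays no role."""
    return STATIC[tokens[index]]


def toy_contextual_embedding(tokens, index):
    """Mix the word's vector with the mean vector of the whole sentence."""
    vectors = [STATIC[token] for token in tokens]
    mean = tuple(sum(v[d] for v in vectors) / len(vectors) for d in range(2))
    word = vectors[index]
    return tuple((word[d] + mean[d]) / 2 for d in range(2))


company = "he is running a company".split()
marathon = "he is running a marathon".split()
position = company.index("running")  # same position in both sentences

same_static = static_embedding(company, position) == static_embedding(marathon, position)
same_contextual = (toy_contextual_embedding(company, position)
                   == toy_contextual_embedding(marathon, position))

print("static vectors identical:    ", same_static)
print("contextual vectors identical:", same_contextual)
```

The static lookup returns the same vector for "running" in both sentences, while the toy contextual function returns different vectors because the surrounding words differ.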

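Awards

BERT won the Best Long Paper Award at the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL).^[16]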

See also

References

[1] Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 11 October 2018. arXiv:1810.04805v2 [cs.CL].
[2] Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing. Google AI Blog. Retrieved 2019-11-27 (in English).
[3] Understanding searches better than ever before. Google. 2019-10-25. Retrieved 2019-11-27 (in English).
[4] Zhu, Yukun; Kiros, Ryan; Zemel, Rich; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books. 2015. arXiv:1506.06724 [cs.CV].
[5] Kovaleva, Olga; Romanov, Alexey; Rogers, Anna; Rumshisky, Anna. Revealing the Dark Secrets of BERT. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). November 2019: 4364–4373. S2CID 201645145. doi:10.18653/v1/D19-1445 (in English).
[6] Clark, Kevin; Khandelwal, Urvashi; Levy, Omer; Manning, Christopher D. What Does BERT Look at? An Analysis of BERT's Attention. Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Stroudsburg, PA, USA: Association for Computational Linguistics). 2019: 276–286. doi:10.18653/v1/w19-4828.
[7] Khandelwal, Urvashi; He, He; Qi, Peng; Jurafsky, Dan. Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 284–294. Bibcode:2018arXiv180504623K. S2CID 21700944. arXiv:1805.04623. doi:10.18653/v1/p18-1027.
[8] Gulordava, Kristina; Bojanowski, Piotr; Grave, Edouard; Linzen, Tal; Baroni, Marco. Colorless Green Recurrent Networks Dream Hierarchically. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 1195–1205. Bibcode:2018arXiv180311138G. S2CID 4460159. arXiv:1803.11138. doi:10.18653/v1/n18-1108.
[9] Giulianelli, Mario; Harding, Jack; Mohnert, Florian; Hupkes, Dieuwke; Zuidema, Willem. Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 240–248. Bibcode:2018arXiv180808079G. S2CID 52090220. arXiv:1808.08079. doi:10.18653/v1/w18-5426.
[10] Zhang, Kelly; Bowman, Samuel. Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Syntactic Task Analysis. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 359–361. doi:10.18653/v1/w18-5448.
[11] Dai, Andrew; Le, Quoc. Semi-supervised Sequence Learning. 4 November 2015. arXiv:1511.01432 [cs.LG].
[12] Peters, Matthew; Neumann, Mark; Iyyer, Mohit; Gardner, Matt; Clark, Christopher; Lee, Kenton; Zettlemoyer, Luke. Deep contextualized word representations. 15 February 2018. arXiv:1802.05365v2 [cs.CL].
[13] Howard, Jeremy; Ruder, Sebastian. Universal Language Model Fine-tuning for Text Classification. 18 January 2018. arXiv:1801.06146v5 [cs.CL].
[14] Nayak, Pandu. Understanding searches better than ever before. Google Blog. 25 October 2019. Retrieved 10 December 2019.
[15] Montti, Roger. Google's BERT Rolls Out Worldwide. Search Engine Journal. 10 December 2019. Retrieved 10 December 2019.
[16] Best Paper Awards. NAACL. 2019. Retrieved 28 March 2020.

External links

Official GitHub repository
