The Transformer network consists of multiple layers. Each layer contains several attention heads (alongside other sublayers), and these heads learn different relationships between tokens. As in many NLP models, the input tokens are first embedded into vectors.
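The following is a minimal pure-Python sketch of the two steps just described: looking up token embeddings, then running one multi-head attention layer over them. It rests on several assumptions. The vocabulary, the dimensions (`d_model = 8`, `num_heads = 2`), and the random weights are hypothetical, since a real model learns its weights. Positional encodings, feed-forward sublayers, residual connections, and layer normalization are all omitted. Helper names such as `multi_head_attention` are illustrative and do not come from any library.

```python
import math
import random


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    # Plain-Python product of an (n x k) matrix and a (k x m) matrix.
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def random_matrix(rows, cols, rng):
    # Hypothetical random initialisation; a trained model learns these weights.
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def scaled_dot_product_attention(q, k, v):
    # softmax(Q K^T / sqrt(d_k)) V, with the softmax applied row by row.
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


def multi_head_attention(x, num_heads, rng):
    # Each head projects x into its own smaller query/key/value space,
    # so different heads can learn different token-to-token relationships.
    d_model = len(x[0])
    d_head = d_model // num_heads
    outputs, all_weights = [], []
    for _ in range(num_heads):
        w_q = random_matrix(d_model, d_head, rng)
        w_k = random_matrix(d_model, d_head, rng)
        w_v = random_matrix(d_model, d_head, rng)
        out, weights = scaled_dot_product_attention(
            matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
        )
        outputs.append(out)
        all_weights.append(weights)
    # Concatenate the heads along the feature axis, then mix them with W_O.
    concat = [sum((head[i] for head in outputs), []) for i in range(len(x))]
    w_o = random_matrix(num_heads * d_head, d_model, rng)
    return matmul(concat, w_o), all_weights


# Toy example: embed three tokens, then apply one multi-head attention layer.
rng = random.Random(0)
vocab = {"the": 0, "cat": 1, "sat": 2}  # hypothetical vocabulary
d_model, num_heads = 8, 2
embedding_table = random_matrix(len(vocab), d_model, rng)

tokens = ["the", "cat", "sat"]
x = [embedding_table[vocab[tok]] for tok in tokens]  # token -> vector lookup
output, head_weights = multi_head_attention(x, num_heads, rng)

print(len(output), len(output[0]))  # 3 8
```

Each head produces its own attention-weight matrix (`head_weights[h]`), and each row of that matrix sums to 1. Because the heads use separate projections, they can attend to different tokens for the same input.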
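The Transformer was introduced in the paper "Attention Is All You Need" and is now a recommended reference model for Google Cloud TPUs. The Transformer is "the first model to abandon entirely the recurrence of RNNs and the convolution of CNNs, using only attention for feature extraction." This article gives a brief introduction to the Transformer model.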
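For reference, the attention operation defined in "Attention Is All You Need" is scaled dot-product attention. Queries $Q$, keys $K$, and values $V$ are combined as shown below, where $d_k$ is the key dimension, and $h$ heads with learned projections $W_i^Q, W_i^K, W_i^V$ are concatenated and mixed by $W^O$:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V

\mathrm{head}_i = \mathrm{Attention}\!\left(Q W_i^{Q},\; K W_i^{K},\; V W_i^{V}\right)

\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{O}
```

The division by $\sqrt{d_k}$ keeps the dot products from growing with the key dimension. Without it, the softmax would be pushed into regions with very small gradients.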
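The International Conference on Learning Representations (ICLR) is one of the internationally renowned academic conferences devoted to advancing artificial intelligence. To analyze the latest research trends, we have selected papers from popular areas including self-supervised learning, Transformers, graph neural networks, natural language processing, and model compression. They will be presented as a multi-part series of paper walkthroughs.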