Pushing the Limits of Transformer Models

Since their introduction in 2017, Transformer models have revolutionized the field of natural language processing. Models such as BERT, GPT-3, and T5 have achieved state-of-the-art performance on many challenging NLP tasks, getting closer and closer to human performance. Transformer-based models are also starting to make inroads into other areas such as computer vision, rivaling traditional convolutional architectures. Despite their success, however, Transformers still have significant limitations. In this talk, I'll discuss our most recent work at Google on pushing the limits of Transformer models to address these limitations. In particular, I'll talk about our work toward solving tasks that require models to process very long inputs (e.g., question answering over long documents), structured inputs (where the inputs are not just raw sequences of words or pixels but have some sort of graph structure), and tasks that require compositional generalization.

Santiago Ontañón is a Research Scientist at Google Research. His research lies at the intersection of AI and machine learning, with applications to natural language processing and computer games. He is also an Associate Professor at Drexel University (currently on leave). He obtained his PhD at the Artificial Intelligence Research Institute (IIIA) in Barcelona, and held postdoctoral positions at IIIA, the Georgia Institute of Technology, and the University of Barcelona.