AI Definitions: Transformers

Transformers – The deep learning architecture known as the transformer was introduced in a 2017 Google research paper, “Attention Is All You Need.” The major AI models (including Anthropic’s Claude, Google’s Gemini, and OpenAI’s GPT-4) are built on these neural networks. Previously, recurrent neural networks (RNNs) processed data sequentially: one word at a time, in the order in which the words appear. An “attention mechanism” was later added so the model could weigh the relationships between words. Transformers advanced this process by analyzing all the words in a given body of text in parallel rather than in sequence. That shift made it possible to build higher-quality language models that could be trained more efficiently and adapted more flexibly to different tasks. A troubling downside of transformers is their ever-growing appetite for computing power, which is why some researchers are exploring alternatives such as test-time training (TTT).
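
The parallelism described above comes from the attention step itself: every pair of tokens is scored in a single matrix operation, so the whole sequence is processed at once instead of word by word. Here is a minimal NumPy sketch of scaled dot-product self-attention; the dimensions and weight-matrix names are illustrative, not taken from any particular model:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a whole sequence at once.

    X          : (seq_len, d_model) matrix of token embeddings.
    Wq, Wk, Wv : learned projection matrices of shape (d_model, d_k).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # project every token in parallel
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # similarity of every token pair
    weights = softmax(scores, axis=-1)        # how much each token attends to the others
    return weights @ V                        # weighted mix of value vectors

# Toy example: a 4-token "sentence" with 8-dimensional embeddings.
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one context-aware vector per token
```

Note that `scores` holds a value for every pair of tokens, so the computation grows quadratically with sequence length. That quadratic cost is one source of the rising compute demands mentioned above.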

More AI definitions here.