Over the last decade, convolutional neural networks (CNNs) have been the go-to architecture in computer vision, owing to their strong ability to learn representations from images and videos.
Transformers were first introduced in 2017 by a team of researchers at Google in the paper “Attention Is All You Need”. Since then, transformers have inspired a flurry of investment and ...