Vision Transformers: Approaching but Not Achieving Human-Level Object Recognition
The claim that Vision Transformers (ViTs) have achieved human-level object recognition is partially true but significantly oversimplified. ViTs now approach, but do not match, reported human performance on ImageNet (91% vs. 94.9%), and profound differences remain in how humans and machines see.