Adrian Kruse

Volume Transformer: Revisiting Vanilla Transformers for 3D Scene Understanding

Kadir Yilmaz*, Adrian Kruse*, Tristan Höfer, Daan de Geus, Bastian Leibe

ECCV 2026

Volt partitions the input 3D scene into non-overlapping volumetric patches and embeds each patch into a token with a linear tokenizer. The resulting token sequence is processed by a Transformer encoder with global attention.
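The pipeline described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the scene is a dense voxel grid of shape (D, H, W, C), and it replaces the full Transformer encoder with a single-head self-attention layer using random weights. All function names, shapes, and sizes (`patchify_3d`, `linear_tokenizer`, `global_self_attention`, patch size 4, embedding width 16) are hypothetical and chosen only for illustration.

```python
import numpy as np


def patchify_3d(volume, patch):
    """Split a dense (D, H, W, C) grid into non-overlapping patch^3 blocks.

    Returns an array of shape (num_patches, patch**3 * C), one flat row per patch.
    Assumption: a dense grid whose sides are divisible by the patch size.
    """
    D, H, W, C = volume.shape
    assert D % patch == 0 and H % patch == 0 and W % patch == 0
    d, h, w = D // patch, H // patch, W // patch
    x = volume.reshape(d, patch, h, patch, w, patch, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)  # (d, h, w, p, p, p, C)
    return x.reshape(d * h * w, patch ** 3 * C)


def linear_tokenizer(patches, weight, bias):
    """Embed each flattened patch into a token with one linear projection."""
    return patches @ weight + bias


def global_self_attention(tokens, wq, wk, wv):
    """Single-head attention in which every token attends to every other token."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return attn @ v


rng = np.random.default_rng(0)
volume = rng.normal(size=(8, 8, 8, 3))  # toy scene: 8^3 voxels, 3 features each
patch, dim = 4, 16                      # hypothetical patch size and token width

patches = patchify_3d(volume, patch)
weight = rng.normal(size=(patches.shape[1], dim)) * 0.02
bias = np.zeros(dim)
tokens = linear_tokenizer(patches, weight, bias)

wq, wk, wv = (rng.normal(size=(dim, dim)) * 0.02 for _ in range(3))
out = global_self_attention(tokens, wq, wk, wv)

print(patches.shape, tokens.shape, out.shape)  # (8, 192) (8, 16) (8, 16)
```

In this toy setting the 8×8×8 grid yields 8 patches of 4×4×4 voxels, so the attention matrix is 8×8. Real scenes are typically sparse, so an actual tokenizer would likely skip empty patches; the sketch ignores that.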

Project | arXiv | Code