Seminar - Transformer-Based Architectures for Visual Place Recognition
Organization
If you are interested in participating, please send an email with "[Seminar-SoSe26] <name>" to Nick Theisen by 27.03.2026. This is non-binding and only required for internal organization.
Further Information
- Max. 12 participants
- Registration via KLIPS (https://klips.uni-koblenz.de/v/162447)
- Kick-Off will take place on 02.04.2026 10:00 - 12:00 in room B325
Content
Motivation / Description
Visual Place Recognition (VPR) is the task of identifying a previously visited location using visual data, such as images or video frames. It is a core challenge in robotics, autonomous navigation, and augmented reality, where systems must recognize places despite changes in lighting, viewpoint, or seasonal appearance.
Transformers have become a versatile tool for VPR, not just as end-to-end models but as modular components within the pipeline. This seminar focuses on how transformers are integrated into different stages of VPR systems, e.g., feature extraction, candidate retrieval and reranking, and attention-based matching. By comparing and analysing these approaches, we will uncover the inherent assumptions, advantages, and limitations of transformer-based solutions in different application scenarios.
Participants will gain insight into the design choices behind transformer integration in VPR, understand the current landscape of transformer-based VPR approaches, and develop a critical understanding of their strengths and trade-offs in real-world applications.
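To make the retrieval stage mentioned above concrete, the following is a minimal, illustrative sketch: given one global descriptor per database place (as produced by any of the architectures below, here stand-in random NumPy vectors), candidates are ranked by cosine similarity to the query descriptor. This is a toy example for orientation, not an implementation of any listed method.

```python
import numpy as np

def retrieve(query_desc, db_descs, top_k=5):
    """Rank database places by cosine similarity to the query descriptor."""
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    sims = db @ q                 # cosine similarity to each database place
    order = np.argsort(-sims)     # best match first
    return order[:top_k], sims[order[:top_k]]

# Toy database: 4 "places", each with an 8-dimensional descriptor
rng = np.random.default_rng(0)
db = rng.normal(size=(4, 8))
query = db[2] + 0.05 * rng.normal(size=8)  # query taken near place 2
idx, scores = retrieve(query, db, top_k=2)
```

In real systems this nearest-neighbour search over global descriptors only produces candidates; the reranking papers below then refine the ordering with local, attention-based matching.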
Literature
Transformer & Visual Place Recognition Basics
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NeurIPS, pages 5998–6008, 2017.
- S. Schubert, P. Neubert, S. Garg, M. Milford, and T. Fischer. Visual Place Recognition: A Tutorial. IEEE Robotics & Automation Magazine, vol. 31, no. 3, pp. 139–153, Sept. 2024, doi: 10.1109/MRA.2023.3310859.
Transformer-Based Architectures for VPR
- R. Mereu, G. Trivigno, G. Berton, C. Masone, and B. Caputo. Learning Sequential Descriptors for Sequence-Based Visual Place Recognition. IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 10383–10390, Oct. 2022, doi: 10.1109/LRA.2022.3194310.
- R. Wang, Y. Shen, W. Zuo, S. Zhou, and N. Zheng. TransVPR: Transformer-Based Place Recognition with Multi-Level Attention Aggregation. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2022, pp. 13638–13647, doi: 10.1109/CVPR52688.2022.01328.
- Y. Wang, Y. Qiu, P. Cheng, and J. Zhang. Hybrid CNN-Transformer Features for Visual Place Recognition. IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 3, pp. 1109–1122, Mar. 2023, doi: 10.1109/TCSVT.2022.3212434.
- Y. Wang, Y. Qiu, P. Cheng, and J. Zhang. Transformer-based descriptors with fine-grained region supervisions for visual place recognition. Knowledge-Based Systems, vol. 280, p. 110993, Nov. 2023, doi: 10.1016/j.knosys.2023.110993.
- H. Zhang, X. Chen, H. Jing, Y. Zheng, Y. Wu, and C. Jin. ETR: An Efficient Transformer for Re-ranking in Visual Place Recognition. In 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2023, pp. 5654–5663, doi: 10.1109/WACV56688.2023.00562.
- S. Zhu, L. Yang, C. Chen, M. Shah, X. Shen, and H. Wang. R2Former: Unified Retrieval and Reranking Transformer for Place Recognition. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 2023, pp. 19370–19380, doi: 10.1109/CVPR52729.2023.01856.
- A. Ali-bey, B. Chaib-draa, and P. Giguère. BoQ: A Place is Worth a Bag of Learnable Queries. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2024, pp. 17794–17803, doi: 10.1109/CVPR52733.2024.01685.
- S. S. Kannan and B.-C. Min. PlaceFormer: Transformer-Based Visual Place Recognition Using Multi-Scale Patch Selection and Fusion. IEEE Robotics and Automation Letters, vol. 9, no. 7, pp. 6552–6559, Jul. 2024, doi: 10.1109/LRA.2024.3408075.
- F. Lu, X. Zhang, C. Ye, S. Dong, L. Zhang, X. Lan, and C. Yuan. SuperVLAD: Compact and Robust Image Descriptors for Visual Place Recognition. In NeurIPS, 2024.
- G. Berton and C. Masone. MegaLoc: One Retrieval to Place Them All. In 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Jun. 2025, pp. 2852–2858, doi: 10.1109/CVPRW67362.2025.00269.
- S. Hausler and P. Moghadam. Pair-VPR: Place-Aware Pre-Training and Contrastive Pair Classification for Visual Place Recognition With Vision Transformers. IEEE Robotics and Automation Letters, vol. 10, no. 4, pp. 4013–4020, Apr. 2025, doi: 10.1109/LRA.2025.3546512.
- F. Lu et al. SelaVPR++: Towards Seamless Adaptation of Foundation Models for Efficient Place Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 48, no. 3, pp. 2731–2748, Mar. 2026, doi: 10.1109/TPAMI.2025.3629287.
Prior Knowledge
- Proficiency in English. If required, the seminar will be held in English. In any case, the relevant literature will be in English.
- The seminar covers advanced topics in computer vision and deep learning. Experience with deep learning architectures is highly recommended.