Table of Links
Abstract and 1 Introduction
2 Materials and Methods
2.1 Vector Database and Indexing
2.2 Feature Extractors
2.3 Dataset and Pre-processing
2.4 Search and Retrieval
2.5 Re-ranking retrieval and evaluation
3 Evaluation and 3.1 Search and Retrieval
3.2 Re-ranking
4 Discussion
4.1 Dataset and 4.2 Re-ranking
4.3 Embeddings
4.4 Volume-based, Region-based and Localized Retrieval and 4.5 Localization-ratio
5 Conclusion, Acknowledgement, and References
5 Conclusion
Our study establishes a new benchmark for the retrieval of anatomical structures within 3D medical volumes, using the TotalSegmentator dataset Wasserthal et al. [2023] to enable targeted queries of volumes or sub-volumes for specific anatomical structures. The results highlight the potential of pre-trained vision embeddings, originally trained on non-medical images, for medical image retrieval across diverse anatomical regions spanning a wide range of sizes.
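As a rough illustration of retrieval with pre-trained embeddings, the sketch assumes each volume has already been reduced to per-slice embedding vectors (how they are extracted is out of scope here) and performs exact cosine-similarity search in pure Python, standing in for the vector database and approximate index used in the study. The helper names `knn_slices` and `vote_volumes`, and the count-based volume vote, are hypothetical and not necessarily the aggregation rule used by the authors.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_slices(query, index, k):
    """Exact (brute-force) top-k search.

    index: list of (volume_id, slice_idx, embedding) tuples.
    Returns up to k (score, volume_id, slice_idx) tuples, best first.
    """
    scored = [(cosine(query, emb), vol, sl) for vol, sl, emb in index]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:k]


def vote_volumes(hits):
    """Hypothetical aggregation: rank volumes by how many top-k slices they contributed."""
    counts = Counter(vol for _, vol, _ in hits)
    return counts.most_common()


# Toy example with 2-D "embeddings".
index = [("vol_a", 0, [1.0, 0.0]), ("vol_a", 1, [0.9, 0.1]), ("vol_b", 0, [0.0, 1.0])]
hits = knn_slices([1.0, 0.05], index, k=2)
print(vote_volumes(hits))  # [('vol_a', 2)]
```

In practice an approximate index (e.g. HNSW) would replace the linear scan; the exact scan is used here only to keep the example dependency-free.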
We introduced a re-ranking method based on a late-interaction model from text retrieval, namely ColBERT Khattab and Zaharia [2020]. The proposed ColBERT-inspired method improves retrieval recall across all anatomical regions. Future work could focus on optimizing the computational efficiency of the proposed re-ranking method.
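ColBERT scores a query against a document with the MaxSim operator: each query token embedding is matched to its most similar document token embedding, and these maxima are summed. A minimal sketch of that operator transferred to images, assuming (hypothetically) that a query and each candidate volume or sub-volume are represented by lists of slice embeddings; the names `maxsim` and `rerank` are illustrative and this is not necessarily the exact scoring used in the study.

```python
import math


def _normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def maxsim(query_vecs, doc_vecs):
    """ColBERT-style late interaction score.

    For every query vector, take its best cosine match among the (non-empty)
    document vectors, then sum these maxima over all query vectors.
    """
    q = [_normalize(v) for v in query_vecs]
    d = [_normalize(v) for v in doc_vecs]
    return sum(max(sum(a * b for a, b in zip(qv, dv)) for dv in d) for qv in q)


def rerank(query_vecs, candidates):
    """Re-rank first-stage candidates.

    candidates: dict mapping candidate_id -> list of embeddings (e.g. slices).
    Returns candidate ids sorted by descending MaxSim score.
    """
    scores = {cid: maxsim(query_vecs, vecs) for cid, vecs in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


query = [[1.0, 0.0], [0.0, 1.0]]
candidates = {"vol_a": [[1.0, 0.0], [1.0, 0.0]], "vol_b": [[1.0, 0.0], [0.0, 1.0]]}
print(rerank(query, candidates))  # ['vol_b', 'vol_a']
```

Because every query slice must find a good match somewhere in the candidate, MaxSim rewards candidates that cover all parts of the query rather than a single very similar slice; the cost is one similarity per query/candidate slice pair, which is why efficiency is a natural target for future work.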
We evaluated embeddings pre-trained in a supervised or self-supervised manner on medical and non-medical data. The results indicate that pre-training on general natural images (e.g., ImageNet) yields slightly more effective embedding vectors than pre-training on domain-specific medical images (e.g., RadImageNet). However, given the marginal difference, the choice of embeddings is unlikely to significantly affect the user experience in downstream tasks.
Certain anatomical structures, such as the brain and face, show low retrieval recall across all embeddings and retrieval methods. Future research could investigate how prevalent such patterns are and how they might be addressed.
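Low-recall structures such as the brain and face can be surfaced by breaking recall down per structure. The sketch below assumes a simplified, hypothetical definition: each query has a set of ground-truth structure labels, the retrieved results cover a set of labels, and the recall for a structure is the fraction of queries containing it whose retrieved results also contain it. The function name `per_structure_recall` is illustrative and the definition may differ from the paper's exact metric.

```python
from collections import defaultdict


def per_structure_recall(queries):
    """queries: list of (true_labels, retrieved_labels) pairs of sets.

    For each structure s, returns the fraction of queries containing s
    whose retrieved results also contain s.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, retrieved in queries:
        for s in truth:
            totals[s] += 1
            if s in retrieved:
                hits[s] += 1
    return {s: hits[s] / totals[s] for s in totals}


queries = [
    ({"liver", "brain"}, {"liver"}),
    ({"brain", "face"}, {"face"}),
    ({"liver"}, {"liver", "spleen"}),
]
recall = per_structure_recall(queries)
# Sorting ascending lists the weakest structures first.
print(sorted(recall.items(), key=lambda kv: kv[1]))
```

Structures that appear only in retrieved results (here `spleen`) do not get an entry, since recall is defined over ground-truth labels only.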
This benchmark sets the stage for future advancements in content-based medical image retrieval, particularly in localizing specific organs or areas within scans.
Acknowledgement
The authors would like to thank the Bayer team of the internal ML innovation platform for providing compute infrastructure and technical support.
We thank Timothy Deyer and his RadImageNet team for providing the RadImageNet pre-trained model weights for the Swin Transformer architecture.
References
Shiv Ram Dubey. A decade survey of content based image retrieval using deep learning. IEEE Transactions on Circuits and Systems for Video Technology, 32(5):2687–2704, 2021.
Wenqing Wang, Pengfei Jiao, Han Liu, Xiao Ma, and Zhuo Shang. Two-stage content based image retrieval using sparse representation and feature fusion. Multimedia Tools and Applications, 81(12):16621–16644, 2022.
Adnan Qayyum, Syed Muhammad Anwar, Muhammad Awais, and Muhammad Majid. Medical image retrieval using deep convolutional neural network. Neurocomputing, 266:8–20, 2017.
Farnaz Khun Jush, Tuan Truong, Steffen Vogler, and Matthias Lenga. Medical image retrieval using pretrained embeddings. arXiv preprint arXiv:2311.13547, 2023.
Asma Ben Abacha, Alberto Santamaria-Pang, Ho Hin Lee, Jameson Merkow, Qin Cai, Surya Teja Devarakonda, Abdullah Islam, Julia Gong, Matthew P Lungren, Thomas Lin, et al. 3D-MIR: A benchmark and empirical study on 3D medical image retrieval in radiology. arXiv preprint arXiv:2311.13752, 2023.
Stefan Denner, David Zimmerer, Dimitrios Bounias, Markus Bujotzek, Shuhan Xiao, Lisa Kausch, Philipp Schader, Tobias Penzkofer, Paul F Jäger, and Klaus Maier-Hein. Leveraging foundation models for content-based medical image retrieval in radiology. arXiv preprint arXiv:2403.06567, 2024.
Tuan Truong, Farnaz Khun Jush, and Matthias Lenga. Benchmarking pretrained vision embeddings for near- and duplicate detection in medical images. arXiv preprint arXiv:2312.07273, 2023.
Michela Antonelli, Annika Reinke, Spyridon Bakas, Keyvan Farahani, Annette Kopp-Schneider, Bennett A Landman, Geert Litjens, Bjoern Menze, Olaf Ronneberger, Ronald M Summers, et al. The medical segmentation decathlon. Nature Communications, 13(1):4128, 2022.
Jakob Wasserthal, Hanns-Christian Breit, Manfred T Meyer, Maurice Pradella, Daniel Hinck, Alexander W Sauter, Tobias Heye, Daniel T Boll, Joshy Cyriac, Shan Yang, et al. TotalSegmentator: Robust segmentation of 104 anatomic structures in CT images. Radiology: Artificial Intelligence, 5(5), 2023.
Omar Khattab and Matei Zaharia. ColBERT: Efficient and effective passage search via contextualized late interaction over BERT. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 39–48, 2020.
Martin Aumüller, Erik Bernhardsson, and Alexander Faithfull. ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. Information Systems, 87:101374, 2020.
Moses S Charikar. Similarity estimation techniques from rounding algorithms. In Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, pages 380–388, 2002.
Yu A Malkov and Dmitry A Yashunin. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(4):824–836, 2018.
Ibraheem Taha, Matteo Lissandrini, Alkis Simitsis, and Yannis Ioannidis. A study on efficient indexing for table search in data lakes. In 2024 IEEE 18th International Conference on Semantic Computing (ICSC), pages 245–252. IEEE, 2024.
Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with GPUs. IEEE Transactions on Big Data, 7(3):535–547, 2019.
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9650–9660, 2021.
Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. DINOv2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023.
Stephanie Fu, Netanel Tamir, Shobhita Sundaram, Lucy Chai, Richard Zhang, Tali Dekel, and Phillip Isola. DreamSim: Learning new dimensions of human visual similarity using synthetic data. arXiv preprint arXiv:2306.09344, 2023.
Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10012–10022, 2021.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
Xueyan Mei, Zelong Liu, Philip M Robson, Brett Marinelli, Mingqian Huang, Amish Doshi, Adam Jacobi, Chendi Cao, Katherine E Link, Thomas Yang, et al. RadImageNet: An open radiologic deep learning research dataset for effective transfer learning. Radiology: Artificial Intelligence, 4(5):e210315, 2022.
Hirokatsu Kataoka, Kazushige Okayasu, Asato Matsumoto, Eisuke Yamagata, Ryosuke Yamada, Nakamasa Inoue, Akio Nakamura, and Yutaka Satoh. Pre-training without natural images. International Journal of Computer Vision (IJCV), 2022.
Qingyao Ai, Jiaxin Mao, Yiqun Liu, and W Bruce Croft. Unbiased learning to rank: Theory and practice. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pages 2305–2306, 2018.
Jiafeng Guo, Yixing Fan, Liang Pang, Liu Yang, Qingyao Ai, Hamed Zamani, Chen Wu, W Bruce Croft, and Xueqi Cheng. A deep look into neural ranking models for information retrieval. Information Processing & Management, 57(6):102067, 2020.
Sean MacAvaney, Andrew Yates, Arman Cohan, and Nazli Goharian. CEDR: Contextualized embeddings for document ranking. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1101–1104, 2019.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
Keshav Santhanam, Omar Khattab, Jon Saad-Falcon, Christopher Potts, and Matei Zaharia. ColBERTv2: Effective and efficient retrieval via lightweight late interaction. arXiv preprint arXiv:2112.01488, 2021.
:::info
Authors:
(1) Farnaz Khun Jush, Bayer AG, Berlin, Germany (farnaz.khunjush@bayer.com);
(2) Steffen Vogler, Bayer AG, Berlin, Germany (steffen.vogler@bayer.com);
(3) Tuan Truong, Bayer AG, Berlin, Germany (tuan.truong@bayer.com);
(4) Matthias Lenga, Bayer AG, Berlin, Germany (matthias.lenga@bayer.com).
:::
:::info
This paper is available on arXiv under CC BY 4.0 DEED license.
:::