4 Related Work
fMRI analyses commonly align subjects’ brains to a shared space to increase statistical power and/or assess the generality of scientific findings. Such alignment is difficult because structural and functional topography differs substantially across people (Talairach and Tournoux, 1990; Mazziotta et al., 2001). There are many approaches to functional alignment, but they typically involve subjects experiencing shared stimuli, with responses to these stimuli then used to learn an alignment mapping (Chen et al., 2015; Haxby et al., 2011; Huang et al., 2021; Nastase et al., 2019; Busch et al., 2021). While such experiments are useful for identifying sources of shared signal across subjects, they are also limiting in that new subjects must be scanned using the same experimental protocol. Other functional alignment approaches avoid this limitation by using self-supervised learning to identify an initial generalizable embedding space with outputs suitable for downstream tasks (Schneider et al., 2023; Chen et al., 2023a;b). Closest to our alignment approach are models that include both shared-subject and subject-specific mappings in their architecture (Défossez et al., 2022; Benchetrit et al., 2023; Yang et al., 2023; Lane and Kiar, 2023).
Ferrante et al. (2023a) previously showed across-subject image reconstruction by training a linear subject-specific decoding model and then separately mapping other subjects to this space via ridge regression. This is similar to our approach in that both involve ridge regression to a shared space, but it is distinct in that their approach is capped by the performance of the initial single-subject model into which other subjects are mapped, is restricted to linear fine-tuning, and was demonstrated only with a reduced training dataset of images seen by all subjects. MindEye2 is unique in demonstrating that a single neural network model can be pretrained across subjects experiencing unique stimuli and robustly fine-tuned to a new subject with few data points.
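To make the idea of ridge regression to a shared space concrete, the following is a minimal sketch on synthetic data, not the implementation of Ferrante et al. (2023a) or of MindEye2. It assumes two hypothetical subjects whose voxel responses are noisy linear readouts of the same latent stimulus features, and learns a closed-form ridge mapping from one subject's voxels into the other's voxel space. The helper name fit_ridge, the array sizes, the noise level, and the regularization strength are all illustrative assumptions.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^{-1} X^T Y.

    X: (n_samples, n_inputs), Y: (n_samples, n_outputs).
    Returns W with shape (n_inputs, n_outputs). No intercept is fit.
    """
    n_inputs = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_inputs)
    return np.linalg.solve(gram, X.T @ Y)


rng = np.random.default_rng(0)

# Hypothetical sizes: trials on shared stimuli and voxel counts per subject.
n_trials, n_latent, n_vox_a, n_vox_b = 200, 10, 30, 40

# Synthetic assumption: both subjects respond to the same latent features.
latent = rng.standard_normal((n_trials, n_latent))
resp_a = latent @ rng.standard_normal((n_latent, n_vox_a))
resp_a += 0.1 * rng.standard_normal(resp_a.shape)
resp_b = latent @ rng.standard_normal((n_latent, n_vox_b))
resp_b += 0.1 * rng.standard_normal(resp_b.shape)

# Fit the B -> A mapping on the first 150 trials, evaluate on the rest.
train, test = slice(0, 150), slice(150, None)
W_b_to_a = fit_ridge(resp_b[train], resp_a[train], alpha=10.0)
pred_a = resp_b[test] @ W_b_to_a

# Correlation between predicted and measured held-out responses of A.
corr = np.corrcoef(pred_a.ravel(), resp_a[test].ravel())[0, 1]
print(f"held-out correlation: {corr:.3f}")
```

In this setup the mapping can only be fit on trials where both subjects saw the same stimuli, which mirrors the limitation noted above: such an approach depends on a shared stimulus set, and anything decoded from the mapped data is bounded by the quality of the target subject's model.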
Authors:
(1) Paul S. Scotti, Stability AI and Medical AI Research Center (MedARC);
(2) Mihir Tripathy, Medical AI Research Center (MedARC) (core contribution);
(3) Cesar Kadir Torrico Villanueva, Medical AI Research Center (MedARC) (core contribution);
(4) Reese Kneeland, University of Minnesota (core contribution);
(5) Tong Chen, The University of Sydney and Medical AI Research Center (MedARC);
(6) Ashutosh Narang, Medical AI Research Center (MedARC);
(7) Charan Santhirasegaran, Medical AI Research Center (MedARC);
(8) Jonathan Xu, University of Waterloo and Medical AI Research Center (MedARC);
(9) Thomas Naselaris, University of Minnesota;
(10) Kenneth A. Norman, Princeton Neuroscience Institute;
(11) Tanishq Mathew Abraham, Stability AI and Medical AI Research Center (MedARC).