Behind The Scenes: The Making Of MindEye2 | HackerNoon

Table of Links

Abstract and 1 Introduction

2 MindEye2 and 2.1 Shared-Subject Functional Alignment

2.2 Backbone, Diffusion Prior, & Submodules

2.3 Image Captioning and 2.4 Fine-tuning Stable Diffusion XL for unCLIP

2.5 Model Inference

3 Results and 3.1 fMRI-to-Image Reconstruction

3.2 Image Captioning

3.3 Image/Brain Retrieval and 3.4 Brain Correlation

3.5 Ablations

4 Related Work

5 Conclusion

6 Acknowledgements and References

A Appendix

A.1 Author Contributions

A.2 Additional Dataset Information

A.3 MindEye2 (not pretrained) vs. MindEye1

A.4 Reconstruction Evaluations Across Varying Amounts of Training Data

A.5 Single-Subject Evaluations

A.6 UnCLIP Evaluation

A.7 OpenCLIP BigG to CLIP L Conversion

A.8 COCO Retrieval

A.9 Reconstruction Evaluations: Additional Information

A.10 Pretraining with Less Subjects

A.11 UMAP Dimensionality Reduction

A.12 ROI-Optimized Stimuli

A.13 Human Preference Experiments

A.1 Author Contributions

PSS: project lead, drafted the initial manuscript and contributed to all parts of MindEye2 development.

MT (core contributor): MindEye2 ablations, SDXL unCLIP vs. Versatile Diffusion comparisons, improved distributed training code, and experimented with approaches not used in the final model, including training custom ControlNet and T2I-Adapters, using retrieval on COCO CLIP captions, and using diffusion priors to align fMRI to text embeddings.

CKTV (core contributor): retrained and evaluated MindEye1 models, image captioning evaluations and writing, improved manuscript formatting, ROI-optimized stimuli experiments, and experimented with approaches not used in the final model, including trying out different pretrained model embeddings, T2I-Adapters and depth conditioning, using past/future timepoints as additional conditioning, blip2 for text prediction, and behavioral embeddings.

RK (core contributor): brain correlations, human preference experiments, recalculated metrics for the 40-hour setting Ozcelik and VanRullen (2023) and Takagi and Nishimoto (2023) results, evaluations with varying amounts of training data across all models, assistance with data normalization, and significant contributions to manuscript writing.

TC: UMAP visualizations, improved the design for Figure 1, and experimented with approaches not used in the final model, including using past/future timepoints as additional conditioning and using flattened voxels in MNI space instead of native space.

AN: helped with ablations and experimented with replacing soft CLIP loss with soft SigLIP loss (Zhai et al., 2023) (not used in the final model).

CS: FAISS retrieval with MS-COCO (Appendix A.8) and experimented with approaches not used in the final model, including using past/future timepoints as additional conditioning, blip2 for text prediction, and behavioral embeddings.

JX: helped with ablations, manuscript revisions, and table formatting, and experimented with approaches not used in the final model, including blip2 for text prediction, behavioral embeddings, and improving model architecture.

TN: assisted with human preference experiments.

KN: oversaw the project, manuscript revisions and framing.

TMA: oversaw the project, manuscript revisions and framing, and helped keep the project on track through MedARC and Stability AI communication.

Authors:

(1) Paul S. Scotti, Stability AI and Medical AI Research Center (MedARC);

(2) Mihir Tripathy, Medical AI Research Center (MedARC) (core contribution);

(3) Cesar Kadir Torrico Villanueva, Medical AI Research Center (MedARC) (core contribution);

(4) Reese Kneeland, University of Minnesota (core contribution);

(5) Tong Chen, The University of Sydney and Medical AI Research Center (MedARC);

(6) Ashutosh Narang, Medical AI Research Center (MedARC);

(7) Charan Santhirasegaran, Medical AI Research Center (MedARC);

(8) Jonathan Xu, University of Waterloo and Medical AI Research Center (MedARC);

(9) Thomas Naselaris, University of Minnesota;

(10) Kenneth A. Norman, Princeton Neuroscience Institute;

(11) Tanishq Mathew Abraham, Stability AI and Medical AI Research Center (MedARC).
