fMRI to Image Captions: MindEye2 Results | HackerNoon

News Room
Last updated: 2025/04/11 at 8:20 PM
News Room Published 11 April 2025

Table of Links

Abstract and 1 Introduction

2 MindEye2 and 2.1 Shared-Subject Functional Alignment

2.2 Backbone, Diffusion Prior, & Submodules

2.3 Image Captioning and 2.4 Fine-tuning Stable Diffusion XL for unCLIP

2.5 Model Inference

3 Results and 3.1 fMRI-to-Image Reconstruction

3.2 Image Captioning

3.3 Image/Brain Retrieval and 3.4 Brain Correlation

3.5 Ablations

4 Related Work

5 Conclusion

6 Acknowledgements and References

A Appendix

A.1 Author Contributions

A.2 Additional Dataset Information

A.3 MindEye2 (not pretrained) vs. MindEye1

A.4 Reconstruction Evaluations Across Varying Amounts of Training Data

A.5 Single-Subject Evaluations

A.6 UnCLIP Evaluation

A.7 OpenCLIP BigG to CLIP L Conversion

A.8 COCO Retrieval

A.9 Reconstruction Evaluations: Additional Information

A.10 Pretraining with Less Subjects

A.11 UMAP Dimensionality Reduction

A.12 ROI-Optimized Stimuli

A.13 Human Preference Experiments

3.2 Image Captioning

Predicted image captions are quantitatively compared to previous work in Table 2. UniBrain (Mai and Zhang, 2023) was the first to predict captions using NSD, training a diffusion model to predict CLIP ViT-L/14 text latents, which are then fed through a pretrained Optimus GPT2 model (Radford et al., 2019). Ferrante et al. (2023b) predicted image captions by mapping fMRI inputs to CLIP ViT-L/14 image latents via ridge regression and passing these latents through a pretrained GIT model (Wang et al., 2022).
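To make the ridge-regression mapping mentioned above concrete, here is a minimal sketch on synthetic data. It is not the code of Ferrante et al. (2023b): the array sizes, the regularization strength `alpha`, the helper name `fit_ridge`, and the random "voxel" and "latent" arrays are all assumptions chosen for illustration, and the closed-form solution omits an intercept term.

```python
import numpy as np

# Toy stand-ins (assumed sizes): rows are trials, X holds voxel responses,
# Y holds the target latent vectors that a captioning model would consume.
rng = np.random.default_rng(0)
n_trials, n_voxels, n_latent = 200, 50, 8
W_true = rng.normal(size=(n_voxels, n_latent))
X = rng.normal(size=(n_trials, n_voxels))
Y = X @ W_true + 0.1 * rng.normal(size=(n_trials, n_latent))


def fit_ridge(X, Y, alpha):
    """Closed-form ridge solution W = (X^T X + alpha * I)^-1 X^T Y (no intercept)."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ Y)


# Fit on the first 150 trials, predict latents for the held-out 50.
W = fit_ridge(X[:150], Y[:150], alpha=1.0)
Y_pred = X[150:] @ W

# Mean cosine similarity between predicted and true held-out latents.
num = np.sum(Y_pred * Y[150:], axis=1)
den = np.linalg.norm(Y_pred, axis=1) * np.linalg.norm(Y[150:], axis=1)
mean_cos = float(np.mean(num / den))
print(f"mean cosine similarity on held-out trials: {mean_cos:.3f}")
```

In a real pipeline the predicted latents would be passed to a pretrained captioning model; that step is left out here because it depends on the specific model and checkpoint.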

Figure 5: Normalized reconstruction metrics for MindEye2 with (connected) or without (dotted) pretraining on other subjects, using varying amounts of training/fine-tuning data. Normalization was such that 0 on the y-axis corresponds to metrics using random COCO images (not from NSD test set) as reconstructions and 1 corresponds to metrics using 40-session pretrained MindEye2. Black lines indicate median. Test data is the same across all comparisons (see section 3).

We adopt the same caption metrics reported in the previous work. ROUGE (Lin, 2004) and METEOR (Banerjee and Lavie, 2005) capture aspects of text structure and composition. CLIP (Radford et al., 2021) and Sentence Transformer (“all-MiniLM-L6-v2”) (Reimers and Gurevych, 2020) are higher-level metrics that provide insight into textual context, relationships, and semantics. All metrics except ROUGE were calculated using the same code as Ferrante et al. (2023b). MindEye2 outperformed previous models on all metrics except one, suggesting that it produces high-quality image captions from brain activity.
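As a rough illustration of what a structure-level metric such as ROUGE measures, the sketch below computes a unigram-overlap (ROUGE-1) F-score between a predicted and a ground-truth caption. It is a simplified assumption, not the evaluation code used for Table 2: tokenization is plain whitespace splitting, the function name `rouge1_f` is hypothetical, and the two captions are made-up examples.

```python
from collections import Counter


def rouge1_f(candidate: str, reference: str) -> float:
    """ROUGE-1 F-score: harmonic mean of unigram precision and recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical captions, for illustration only.
predicted = "a man riding a wave on a surfboard"
ground_truth = "a surfer riding a large wave"
score = rouge1_f(predicted, ground_truth)
print(f"ROUGE-1 F-score: {score:.3f}")
```

Word-overlap scores like this reward shared vocabulary but miss paraphrases ("man ... on a surfboard" versus "surfer"), which is one reason embedding-based metrics such as CLIP and Sentence Transformer similarity are reported alongside them.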

Table 2: fMRI-to-image caption evaluations. Previous works used different ground truth captions for comparison (COCO captions or captions generated from GIT), necessitating separate comparisons. Results were calculated exclusively on NSD subject 1. MindEye2 metrics come from the model trained on all 40 sessions of NSD data whereas previous work used 37 sessions.

Authors:

(1) Paul S. Scotti, Stability AI and Medical AI Research Center (MedARC);

(2) Mihir Tripathy, Medical AI Research Center (MedARC) and a Core contribution;

(3) Cesar Kadir Torrico Villanueva, Medical AI Research Center (MedARC) and a Core contribution;

(4) Reese Kneeland, University of Minnesota and a Core contribution;

(5) Tong Chen, The University of Sydney and Medical AI Research Center (MedARC);

(6) Ashutosh Narang, Medical AI Research Center (MedARC);

(7) Charan Santhirasegaran, Medical AI Research Center (MedARC);

(8) Jonathan Xu, University of Waterloo and Medical AI Research Center (MedARC);

(9) Thomas Naselaris, University of Minnesota;

(10) Kenneth A. Norman, Princeton Neuroscience Institute;

(11) Tanishq Mathew Abraham, Stability AI and Medical AI Research Center (MedARC).
