What Is Learned By DreamLLM? Dream Query Attention

Last updated: 2024/11/27 at 11:16 PM

News Room Published 27 November 2024

Table of Links

Abstract and 1 Introduction

2 Background & Problem Statement

2.1 How can we use MLLMs for Diffusion Synthesis that Synergizes both sides?

3 DreamLLM

3.1 End-to-End Interleaved Generative Pretraining (I-GPT)

3.2 Model Training

4 Experiments and 4.1 Multimodal Comprehension

4.2 Text-Conditional Image Synthesis

4.3 Multimodal Joint Creation & Comprehension

5 Discussions

5.1 Synergy between Creation & Comprehension?

5.2 What is learned by DreamLLM?

6 Related Works

7 Conclusions and References

A Additional Experiments

B Additional Qualitative Examples

C Implementation Details

D Additional Related Works

E Limitations, Failure Cases & Future Works

5.2 WHAT IS LEARNED BY DREAMLLM?

Dream Query Attention. In DREAMLLM, the conditional embedding is derived from the MLLM through a set of learned dream queries. Fig. 6 visualizes the learned cross-attention between these queries and the diffusion latent. Similar to (Hertz et al., 2023), we visualize the attention map averaged across all diffusion timesteps. We observe that: i) The query attention is structured, disentangled, and semantically oriented. This is evidenced by the fact that distinct queries capture different subject and background semantics. ii) Despite varying prompts, the attention patterns are remarkably similar, as shown in Fig. 6 (a) and (b). This contrasts with the token attention in the original Stable Diffusion (SD), which is typically text-token dependent. We postulate that this arises from the model’s causal nature, which leads to a consistent semantic structure order.

Figure 6: Cross-attention of dream queries and the diffusion U-Net latent. Similar to (Hertz et al., 2023), the 64 queries can be viewed as 64 “words”. Each attention map is computed as the cross-attention between one query and the latent feature in the U-Net. The 64 queries are arranged sequentially in an 8×8 grid, and each attention map is averaged across all timesteps.
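The visualization procedure described above can be illustrated with a minimal NumPy sketch. It is not the authors' code. It assumes a single attention head and a single layer, whereas a real U-Net has many multi-head cross-attention layers. The projection matrices `w_q` and `w_k`, the feature sizes, and the random inputs are hypothetical stand-ins chosen only so the example runs. The sketch computes, for each of the 64 conditional tokens, a spatial attention map, averages the maps over the denoising timesteps, and tiles them sequentially into an 8×8 grid.

```python
import numpy as np


def cross_attention_probs(latent, cond, w_q, w_k):
    """Single-head cross-attention probabilities (simplified assumption).

    latent: (H*W, C_latent) flattened U-Net feature map (attention queries).
    cond:   (N, C_cond) conditional embeddings, e.g. N = 64 dream queries (keys).
    Returns an (N, H*W) array: one spatial attention map per conditional token.
    """
    q = latent @ w_q                                   # (H*W, d)
    k = cond @ w_k                                     # (N, d)
    logits = q @ k.T / np.sqrt(q.shape[-1])            # (H*W, N)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=-1, keepdims=True)  # softmax over the N tokens
    return probs.T


def timestep_averaged_maps(latents_per_step, cond, w_q, w_k, height, width):
    """Average the per-token attention maps over all denoising timesteps."""
    per_step = [cross_attention_probs(z, cond, w_q, w_k) for z in latents_per_step]
    return np.mean(per_step, axis=0).reshape(cond.shape[0], height, width)


def tile_maps(maps, rows=8, cols=8):
    """Arrange N = rows * cols maps sequentially (row-major) into one image."""
    n, h, w = maps.shape
    if n != rows * cols:
        raise ValueError("number of maps must equal rows * cols")
    return maps.reshape(rows, cols, h, w).transpose(0, 2, 1, 3).reshape(rows * h, cols * w)


# Hypothetical sizes and random data, for illustration only.
rng = np.random.default_rng(0)
H = W = 16                                   # latent spatial resolution
T = 10                                       # number of denoising timesteps
w_q = rng.normal(size=(32, 24))              # stand-in query projection
w_k = rng.normal(size=(48, 24))              # stand-in key projection
cond = rng.normal(size=(64, 48))             # 64 conditional embeddings
latents = [rng.normal(size=(H * W, 32)) for _ in range(T)]

maps = timestep_averaged_maps(latents, cond, w_q, w_k, H, W)
grid = tile_maps(maps)
print(grid.shape)  # (128, 128)
```

Because the softmax is taken over the conditional tokens, the 64 maps sum to one at every spatial location, so a map that is bright in one region indicates that its token dominates the conditioning there. In practice, the per-step attention probabilities would be captured from the model's cross-attention layers rather than recomputed from random projections as done here.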

Authors:

(1) Runpei Dong, Xi’an Jiaotong University and Internship at MEGVII;

(2) Chunrui Han, MEGVII Technology;

(3) Yuang Peng, Tsinghua University and Internship at MEGVII;

(4) Zekun Qi, Xi’an Jiaotong University and Internship at MEGVII;

(5) Zheng Ge, MEGVII Technology;

(6) Jinrong Yang, HUST and Internship at MEGVII;

(7) Liang Zhao, MEGVII Technology;

(8) Jianjian Sun, MEGVII Technology;

(9) Hongyu Zhou, MEGVII Technology;

(10) Haoran Wei, MEGVII Technology;

(11) Xiangwen Kong, MEGVII Technology;

(12) Xiangyu Zhang, MEGVII Technology and Project leader;

(13) Kaisheng Ma, Tsinghua University and Corresponding author;

(14) Li Yi, Tsinghua University, Corresponding author and Project leader.
