Table of Links
Abstract and 1 Introduction
2 Related Works
3 Method and 3.1 Proxy-Guided 3D Conditioning for Diffusion
3.2 Interactive Generation Workflow and 3.3 Volume-Conditioned Reconstruction
4 Experiment and 4.1 Comparison on Proxy-based and Image-based 3D Generation
4.2 Comparison on Controllable 3D Object Generation, 4.3 Interactive Generation with Part Editing & 4.4 Ablation Studies
5 Conclusions, Acknowledgments, and References
SUPPLEMENTARY MATERIAL
A. Implementation Details
B. More Discussions
C. More Experiments
3.2 Interactive Generation Workflow
In 3D modeling tasks, artists often adjust the target object back and forth, progressively editing local parts until the result is satisfactory. However, interactive generation and previewing of 3D objects remains an open problem due to the lack of fine-grained control and slow reconstruction speed [Cheng et al. 2023b; Li et al. 2023a]. Hence, we develop a novel interactive and responsive generation workflow on top of the Coin3D framework, which fully leverages the piecewise proxies of the condition for easy and precise part editing, and reuses the 3D control volume for interactive previewing.
Proxy-bounded part editing. As the coarse proxies are mainly constructed from basic shape elements, we design an interactive local part editing workflow based on the elements of the proxy. Specifically, users can select a certain piece among the basic shapes and regenerate its content. For example, we can regenerate one of the pumpkins into a red apple by designating the corresponding sphere on the plate, as shown in Fig. 3. However, because the multiview diffusion model is conditioned on both the 3D volume and 2D images, such editing cannot be realized without updating both conditions consistently. Therefore, we propose a two-pathway condition editing scheme that handles the 2D and 3D conditions separately, as illustrated in Fig. 3. For the 2D condition, we construct a 2D mask by projecting the masked proxies at the desired editing view and perform diffusion-based 2D regeneration (a.k.a. masked image-to-image inpainting) [Meng et al. 2021; Zhou et al. 2023] with this mask. We then use the edited image as the image condition for the denoising steps. For the 3D condition, we first construct a 3D feature mask by slightly dilating the masked proxy, which ensures seamless fusion of the newly generated content. Then, during each denoising step, we reuse the cached original 3D control volume and update only the masked region according to the feature mask 𝑀, as:
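The update equation itself is not reproduced in this excerpt; a plausible reconstruction, writing $V_{\mathrm{cached}}$ for the cached original control volume and $V_{\mathrm{new}}$ for the regenerated volume features (both symbol names are our assumptions), is

$$\hat{V} = M \odot V_{\mathrm{new}} + (1 - M) \odot V_{\mathrm{cached}},$$

where $\odot$ denotes element-wise multiplication over the voxel grid, so masked voxels take the newly generated features while all other voxels reuse the cache. A minimal PyTorch sketch of this masked blend, including the dilated feature mask, is given below; all tensor layouts, function names, and the dilation radius are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def build_feature_mask(proxy_mask: torch.Tensor, dilation: int = 2) -> torch.Tensor:
    """Slightly dilate the binary voxel mask of the edited proxy piece so that
    regenerated content blends seamlessly into the cached volume.

    `proxy_mask` is assumed to be a (1, 1, D, H, W) occupancy grid of the
    selected proxy piece.
    """
    kernel = 2 * dilation + 1
    # Max-pooling a binary volume acts as a simple morphological dilation.
    return F.max_pool3d(proxy_mask.float(), kernel_size=kernel,
                        stride=1, padding=dilation)

def blend_control_volume(v_cached: torch.Tensor,
                         v_new: torch.Tensor,
                         feature_mask: torch.Tensor) -> torch.Tensor:
    """Partial update of the 3D control volume at each denoising step:
    masked voxels take the regenerated features, unmasked voxels are
    reused from the cached original volume.

    `v_cached` / `v_new` are assumed to be (1, C, D, H, W) feature volumes;
    `feature_mask` broadcasts over the channel dimension.
    """
    return feature_mask * v_new + (1.0 - feature_mask) * v_cached
```

In this sketch the blend is recomputed per denoising step, so the cached volume is built once and only the masked region changes, which is what makes part-level regeneration cheap.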
3.3 Volume-Conditioned Reconstruction
Authors:
(1) Wenqi Dong, Zhejiang University, who conducted this work during his internship at PICO, ByteDance;
(2) Bangbang Yang, ByteDance, who contributed equally to this work with Wenqi Dong;
(3) Lin Ma, ByteDance;
(4) Xiao Liu, ByteDance;
(5) Liyuan Cui, Zhejiang University;
(6) Hujun Bao, Zhejiang University;
(7) Yuewen Ma, ByteDance;
(8) Zhaopeng Cui, Zhejiang University, corresponding author.