MaGGIe Roadmap: Overcoming Data Generalization in Matting Models | HackerNoon

News Room | Published 19 December 2025

Table of Links

Abstract and 1. Introduction

2. Related Works

3. MaGGIe

  3.1. Efficient Masked Guided Instance Matting

  3.2. Feature-Matte Temporal Consistency

4. Instance Matting Datasets

  4.1. Image Instance Matting and 4.2. Video Instance Matting

5. Experiments

  5.1. Pre-training on image data

  5.2. Training on video data

6. Discussion and References

Supplementary Material

7. Architecture details

8. Image matting

  8.1. Dataset generation and preparation

  8.2. Training details

  8.3. Quantitative details

  8.4. More qualitative results on natural images

9. Video matting

  9.1. Dataset generation

  9.2. Training details

  9.3. Quantitative details

  9.4. More qualitative results

6. Discussion

Limitation and Future work. MaGGIe performs well on human video instance matting with binary mask guidance, but several limitations remain. First, the guidance mask uses a one-hot representation at each location, so every pixel must be assigned to exactly one instance. This becomes problematic when instance masks come from different sources, whose predictions may disagree and misalign in some regions. Second, training on composite datasets may limit how well the model generalizes to natural, real-world footage. Building a comprehensive natural dataset remains a valuable goal; as an interim solution, we propose combining segmentation datasets with self-supervised or weakly-supervised learning, which could improve the model's robustness in more diverse and realistic settings and pave the way for future advances.
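The one-hot constraint described above can be illustrated concretely. The following is a minimal sketch (not part of the paper's released code) of how overlapping instance masks from different segmenters violate the per-pixel single-instance assumption, and one naive way to resolve the conflict before feeding the masks in as guidance; the function name `enforce_one_hot` and the lowest-index tie-breaking rule are illustrative assumptions.

```python
import numpy as np

def enforce_one_hot(masks):
    """Detect and resolve overlaps in stacked binary instance masks.

    masks: (N, H, W) array of per-instance binary masks. The guidance
    formulation assumes each pixel belongs to at most one instance
    (one-hot across the instance axis); masks merged from different
    segmenters can break this.
    """
    masks = masks.astype(bool)
    claimed = masks.sum(axis=0)        # how many instances claim each pixel
    overlap = claimed > 1              # pixels violating the one-hot assumption
    # Naive resolution: at a conflict, keep only the lowest-index instance.
    resolved = np.zeros_like(masks)
    taken = np.zeros(masks.shape[1:], dtype=bool)
    for i in range(masks.shape[0]):
        resolved[i] = masks[i] & ~taken
        taken |= masks[i]
    return resolved, overlap

# Two toy instance masks that both claim the middle pixel of row 0.
a = np.array([[1, 1, 0], [0, 0, 0]])
b = np.array([[0, 1, 1], [0, 0, 0]])
resolved, overlap = enforce_one_hot(np.stack([a, b]))
# overlap marks exactly one conflicting pixel; after resolution,
# no pixel is claimed by more than one instance.
```

In practice a smarter tie-break (e.g. per-pixel confidence from each segmenter) would be preferable; the point is only that some such resolution step is forced by the one-hot representation.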

Conclusion. Our study contributes to the evolving field of instance matting, with a focus that extends beyond human subjects. By integrating advanced techniques like transformer attention and sparse convolution, MaGGIe shows promising improvements over previous methods in detailed accuracy, temporal consistency, and computational efficiency for both image and video inputs. Additionally, our approach in synthesizing training data and developing a comprehensive benchmarking schema offers a new way to evaluate the robustness and effectiveness of models in instance matting tasks. This work represents a step forward in video instance matting and provides a foundation for future research in this area.

Acknowledgement. We sincerely thank Markus Woodson for invaluable initial discussions. The first author is also deeply thankful to his wife, Quynh Phung, for her meticulous proofreading and feedback.

References

[1] Adobe. Adobe Premiere. https://www.adobe.com/products/premiere.html, 2023.

[2] Apple. Cutouts object iOS 16. https://support.apple.com/en-hk/102460, 2023.

[3] Nicolas Ballas, Li Yao, Chris Pal, and Aaron Courville. Delving deeper into convolutional networks for learning video representations. arXiv preprint arXiv:1511.06432, 2015.

[4] Arie Berman, Arpag Dadourian, and Paul Vlahos. Method for removing from an image the background surrounding a selected object, 2000. US Patent 6,134,346.

[5] Guowei Chen, Yi Liu, Jian Wang, Juncai Peng, Yuying Hao, Lutao Chu, Shiyu Tang, Zewu Wu, Zeyu Chen, Zhiliang Yu, et al. PP-Matting: High-accuracy natural image matting. arXiv preprint arXiv:2204.09433, 2022.

[6] Xiangguang Chen, Ye Zhu, Yu Li, Bingtao Fu, Lei Sun, Ying Shan, and Shan Liu. Robust human matting via semantic guidance. In ACCV, 2022.

[7] Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, and Rohit Girdhar. Masked-attention mask transformer for universal image segmentation. In CVPR, 2022.

[8] Ho Kei Cheng and Alexander G. Schwing. XMem: Long-term video object segmentation with an Atkinson-Shiffrin memory model. In ECCV, 2022.

[9] Donghyeon Cho, Yu-Wing Tai, and In So Kweon. Natural image matting using deep convolutional neural networks. In ECCV, 2016.

[10] Spconv Contributors. Spconv: Spatially sparse convolution library. https://github.com/traveller59/spconv, 2022.

[11] Marco Forte and François Pitié. F, B, alpha matting. arXiv preprint arXiv:2003.07711, 2020.

[12] Google. Magic Editor in Google Pixel 8. https://pixel.withgoogle.com/Pixel8Pro/usemagic-editor, 2023.

[13] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.

[14] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In ICCV, 2017.

[15] Anna Katharina Hebborn, Nils Hohner, and Stefan Muller. Occlusion matting: Realistic occlusion handling for augmented reality applications. In IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 2017.

[16] Qiqi Hou and Feng Liu. Context-aware image matting for simultaneous foreground and alpha estimation. In ICCV, 2019.

[17] Wei-Lun Huang and Ming-Sui Lee. End-to-end video matting with trimap propagation. In CVPR, 2023.

[18] Chuong Huynh, Anh Tuan Tran, Khoa Luu, and Minh Hoai. Progressive semantic segmentation. In CVPR, 2021.

[19] Chuong Huynh, Yuqian Zhou, Zhe Lin, Connelly Barnes, Eli Shechtman, Sohrab Amirghodsi, and Abhinav Shrivastava. SimpSON: Simplifying photo cleanup with single-click distracting object segmentation network. In CVPR, 2023.

[20] Sagar Imambi, Kolla Bhanu Prakash, and G. R. Kanagachidambaresan. PyTorch. Programming with TensorFlow: Solution for Edge Computing Applications, 2021.

[21] Lei Ke, Henghui Ding, Martin Danelljan, Yu-Wing Tai, Chi-Keung Tang, and Fisher Yu. Video Mask Transfiner for high-quality video instance segmentation. In ECCV, 2022.

[22] Zhanghan Ke, Jiayu Sun, Kaican Li, Qiong Yan, and Rynson W. H. Lau. MODNet: Real-time trimap-free portrait matting via objective decomposition. In AAAI, 2022.

[23] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick. Segment Anything. In ICCV, 2023.

[24] Philip Lee and Ying Wu. Nonlocal matting. In CVPR, 2011.

[25] Anat Levin, Dani Lischinski, and Yair Weiss. A closed-form solution to natural image matting. IEEE TPAMI, 30(2), 2007.

[26] Jizhizi Li, Sihan Ma, Jing Zhang, and Dacheng Tao. Privacy-preserving portrait matting. In ACM MM, 2021.

[27] Jizhizi Li, Jing Zhang, and Dacheng Tao. Deep automatic natural image matting. In IJCAI, 2021.

[28] Jiachen Li, Vidit Goel, Marianna Ohanyan, Shant Navasardyan, Yunchao Wei, and Humphrey Shi. VMFormer: End-to-end video matting with transformer. arXiv preprint arXiv:2208.12801, 2022.

[29] Jizhizi Li, Jing Zhang, Stephen J. Maybank, and Dacheng Tao. Bridging composite and real: Towards end-to-end deep image matting. IJCV, 2022.

[30] Jiachen Li, Roberto Henschel, Vidit Goel, Marianna Ohanyan, Shant Navasardyan, and Humphrey Shi. Video instance matting. In WACV, 2024.

[31] Yaoyi Li and Hongtao Lu. Natural image matting via guided contextual attention. In AAAI, 2020.

[32] Chung-Ching Lin, Jiang Wang, Kun Luo, Kevin Lin, Linjie Li, Lijuan Wang, and Zicheng Liu. Adaptive human matting for dynamic videos. In CVPR, 2023.

[33] Shanchuan Lin, Andrey Ryabtsev, Soumyadip Sengupta, Brian L. Curless, Steven M. Seitz, and Ira Kemelmacher-Shlizerman. Real-time high-resolution background matting. In CVPR, 2021.

[34] Shanchuan Lin, Linjie Yang, Imran Saleemi, and Soumyadip Sengupta. Robust high-resolution video matting with temporal guidance. In WACV, 2022.

[35] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.

[36] Baoyuan Liu, Min Wang, Hassan Foroosh, Marshall Tappen, and Marianna Pensky. Sparse convolutional neural networks. In CVPR, 2015.

[37] Hao Lu, Yutong Dai, Chunhua Shen, and Songcen Xu. Indices matter: Learning to index for deep image matting. In CVPR, 2019.

[38] Seoung Wug Oh, Joon-Young Lee, Ning Xu, and Seon Joo Kim. Video object segmentation using space-time memory networks. In ICCV, 2019.

[39] Kwanyong Park, Sanghyun Woo, Seoung Wug Oh, In So Kweon, and Joon-Young Lee. Mask-guided matting in the wild. In CVPR, 2023.

[40] Khoi Pham, Kushal Kafle, Zhe Lin, Zhihong Ding, Scott Cohen, Quan Tran, and Abhinav Shrivastava. Improving closed and open-vocabulary attribute prediction using transformers. In ECCV, 2022.

[41] Khoi Pham, Chuong Huynh, and Abhinav Shrivastava. Composing object relations and attributes for image-text matching. In CVPR, 2024.

[42] Quynh Phung, Songwei Ge, and Jia-Bin Huang. Grounded text-to-image synthesis with attention refocusing. In CVPR, 2024.

[43] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. ImageNet large scale visual recognition challenge. IJCV, 2015.

[44] Soumyadip Sengupta, Vivek Jayaram, Brian Curless, Steven M. Seitz, and Ira Kemelmacher-Shlizerman. Background matting: The world is your green screen. In CVPR, 2020.

[45] Hongje Seong, Seoung Wug Oh, Brian Price, Euntai Kim, and Joon-Young Lee. One-trimap video matting. In ECCV, 2022.

[46] Xiaoyong Shen, Xin Tao, Hongyun Gao, Chao Zhou, and Jiaya Jia. Deep automatic portrait matting. In ECCV, 2016.

[47] Yanan Sun, Chi-Keung Tang, and Yu-Wing Tai. Semantic image matting. In CVPR, 2021.

[48] Yanan Sun, Guanzhi Wang, Qiao Gu, Chi-Keung Tang, and Yu-Wing Tai. Deep video matting via spatio-temporal alignment and aggregation. In CVPR, 2021.

[49] Yanan Sun, Chi-Keung Tang, and Yu-Wing Tai. Human instance matting via mutual guidance and multi-instance refinement. In CVPR, 2022.

[50] Yanan Sun, Chi-Keung Tang, and Yu-Wing Tai. Ultrahigh resolution image/video matting with spatio-temporal sparsity. In CVPR, 2023.

[51] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. NeurIPS, 2017.

[52] Tiantian Wang, Sifei Liu, Yapeng Tian, Kai Li, and Ming-Hsuan Yang. Video matting via consistency-regularized graph neural networks. In ICCV, 2021.

[53] Yumeng Wang, Bo Xu, Ziwen Li, Han Huang, Cheng Lu, and Yandong Guo. Video object matting via hierarchical space-time semantic guidance. In WACV, 2023.

[54] Ning Xu, Brian Price, Scott Cohen, and Thomas Huang. Deep image matting. In CVPR, 2017.

[55] Zongxin Yang, Yunchao Wei, and Yi Yang. Associating objects with transformers for video object segmentation. NeurIPS, 2021.

[56] Qihang Yu, Jianming Zhang, He Zhang, Yilin Wang, Zhe Lin, Ning Xu, Yutong Bai, and Alan Yuille. Mask guided matting via progressive refinement network. In CVPR, 2021.

[57] Yunke Zhang, Chi Wang, Miaomiao Cui, Peiran Ren, Xuansong Xie, Xian-Sheng Hua, Hujun Bao, Qixing Huang, and Weiwei Xu. Attention-guided temporally coherent video object matting. In ACM MM, 2021.

:::info
Authors:

(1) Chuong Huynh, University of Maryland, College Park ([email protected]);

(2) Seoung Wug Oh, Adobe Research (seoh,[email protected]);

(3) Abhinav Shrivastava, University of Maryland, College Park ([email protected]);

(4) Joon-Young Lee, Adobe Research ([email protected]).

:::


:::info
This paper is available on arXiv under the CC BY 4.0 Deed (Attribution 4.0 International) license.

:::
