10 DISCUSSION AND FUTURE WORK
Supporting multiple tasks. As Panopticus is an initial endeavor toward adaptive omnidirectional 3D detection, our current implementation assumes a single-task execution environment, in which the system's performance characteristics, such as the offline latency profiles of the multi-branch model, remain consistent at runtime. However, in real-world applications such as mobile robot navigation, 3D object detection commonly runs alongside other critical tasks, such as odometry or path planning. Panopticus could be extended to multi-task execution environments in several ways. First, monitoring the runtime dynamics caused by resource contention among concurrent tasks is crucial for optimized resource utilization. Second, there is a need to explore how to co-design multi-branch models for different 3D tasks and optimize them under application and device constraints. Lastly, better utilization of the available heterogeneous processors, such as CPUs and NPUs, could improve the efficiency of multi-task workloads.
Selection criteria for performance metrics. For the design and evaluation of Panopticus, we used the most popular 3D detection metrics: the detection score [2] and mAP. These metrics are effective for assessing the overall performance of a 3D detection system. However, the system's effectiveness could be further improved by designing metrics that reflect application-specific requirements, as sketched below. For instance, detecting fast-moving and nearby objects is more critical to the safety of a robot navigation system than detecting slowly moving or distant ones. Likewise, the importance of different object types varies across application scenarios. Such application-centric metric design is worth exploring in future work.
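As a concrete illustration of this direction, the sketch below scores detections with extra weight on ground-truth objects that are fast-moving or close to the ego agent. This is a minimal example, not part of Panopticus; the function name, thresholds, weights, and input format (boxes with a 3D center and a 2D velocity) are all illustrative assumptions.

import numpy as np

def application_weighted_recall(gt_boxes, matched_gt_ids,
                                speed_thresh=5.0,    # m/s; "fast" cutoff (assumed)
                                dist_thresh=10.0,    # m; "close" cutoff (assumed)
                                critical_weight=2.0):
    """Toy application-centric recall: safety-critical ground-truth
    objects (fast-moving or near the ego agent) contribute twice as
    much as the rest. `gt_boxes` is a list of dicts with 'center'
    (x, y, z) and 'velocity' (vx, vy); `matched_gt_ids` holds indices
    of ground truths matched to a prediction. All names are hypothetical."""
    total, matched = 0.0, 0.0
    for i, gt in enumerate(gt_boxes):
        speed = float(np.linalg.norm(gt["velocity"]))   # object speed (m/s)
        dist = float(np.linalg.norm(gt["center"][:2]))  # planar distance to ego (m)
        w = critical_weight if (speed > speed_thresh or dist < dist_thresh) else 1.0
        total += w
        if i in matched_gt_ids:
            matched += w
    return matched / max(total, 1e-9)

# Example: a nearby pedestrian (weighted 2x) is detected; a distant
# parked car (weight 1x) is missed, yielding 2/3 instead of 1/2.
gts = [{"center": (2.0, 3.0, 0.9), "velocity": (1.2, 0.0)},
       {"center": (40.0, 5.0, 0.7), "velocity": (0.0, 0.0)}]
print(application_weighted_recall(gts, matched_gt_ids={0}))  # ~0.667

The same weighting idea extends to precision or to a full mAP-style sweep over match thresholds; the point is only that the weights encode what the application considers safety-critical.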
11 CONCLUSION
This paper proposed Panopticus, an omnidirectional, camera-based 3D object detection system designed for resource-constrained edge devices. Panopticus effectively balances detection accuracy and latency by employing a multi-branch model that selects the optimal inference configuration for each camera view based on predicted spatial characteristics. Extensive experiments show that Panopticus outperforms its baselines across various environments and edge devices, highlighting its potential to enhance applications that require real-time perception of surrounding 3D objects.
ACKNOWLEDGMENTS
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RS-2024-00344323).
REFERENCES
[1] Garrick Brazil, Gerard Pons-Moll, Xiaoming Liu, and Bernt Schiele. 2020. Kinematic 3D Object Detection in Monocular Video. In European Conference on Computer Vision (ECCV).
[2] Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. 2020. nuScenes: A Multimodal Dataset for Autonomous Driving. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 11618–11628. https://doi.org/10.1109/CVPR42600.2020.01164
[3] Ting-Wu Chin, Ruizhou Ding, and Diana Marculescu. 2019. AdaScale: Towards Real-time Video Object Detection Using Adaptive Scaling. Proceedings of Machine Learning and Systems 1 (2019), 431–441.
[4] NVIDIA Corporation. 2020. tegrastats Utility. https://docs.nvidia.com/drive/drive-os-5.2.0.0L/drive-os/index.html#page/DRIVE_OS_Linux_SDK_Development_Guide/Utilities/util_tegrastats.html
[5] NVIDIA Corporation. 2024. CUDA C/C++ Streams and Concurrency. https://developer.download.nvidia.com/CUDA/training/StreamsAndConcurrencyWebinar.pdf
[6] NVIDIA Corporation. 2024. CUDA Toolkit. https://developer.nvidia.com/cuda-toolkit
[7] NVIDIA Corporation. 2024. IBuilderConfig. https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/Core/BuilderConfig.html
[8] NVIDIA Corporation. 2024. Jetson AGX Orin for Next-Gen Robotics. https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-orin/
[9] NVIDIA Corporation. 2024. NVIDIA A100 Tensor Core GPU. https://www.nvidia.com/en-us/data-center/a100/
[10] NVIDIA Corporation. 2024. NVIDIA Jetson Xavier. https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-xavier-series/
[11] NVIDIA Corporation. 2024. NVIDIA TensorRT. https://developer.nvidia.com/tensorrt
[12] dmlc. 2023. xgboost. https://github.com/dmlc/xgboost
[13] Marc Eder, Mykhailo Shvets, John Lim, and Jan-Michael Frahm. 2020. Tangent Images for Mitigating Spherical Distortion. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 12423–12431. https://doi.org/10.1109/CVPR42600.2020.01244
[14] Biyi Fang, Xiao Zeng, Faen Zhang, Hui Xu, and Mi Zhang. 2020. FlexDNN: Input-Adaptive On-Device Deep Learning for Efficient Mobile Vision. In 2020 IEEE/ACM Symposium on Edge Computing (SEC). 84–95. https://doi.org/10.1109/SEC50012.2020.00014
[15] Biyi Fang, Xiao Zeng, and Mi Zhang. 2018. NestDNN: Resource-Aware Multi-Tenant On-Device Deep Learning for Continuous Mobile Vision. In Proceedings of the 24th Annual International Conference on Mobile Computing and Networking (New Delhi, India) (MobiCom ’18). Association for Computing Machinery, New York, NY, USA, 115–127. https://doi.org/10.1145/3241539.3241559
[16] Dario Floreano and Robert J. Wood. 2015. Science, technology and the future of small autonomous drones. Nature 521, 7553 (2015), 460–466. https://doi.org/10.1038/nature14542
[17] COIN-OR Foundation. 2024. pulp. https://github.com/coin-or/pulp
[18] Yongjie Guan, Xueyu Hou, Nan Wu, Bo Han, and Tao Han. 2022. DeepMix: mobility-aware, lightweight, and hybrid 3D object detection for headsets. In Proceedings of the 20th Annual International Conference on Mobile Systems, Applications and Services (Portland, Oregon) (MobiSys ’22). Association for Computing Machinery, New York, NY, USA, 28–41. https://doi.org/10.1145/3498361.3538945
[19] Rui Han, Qinglong Zhang, Chi Harold Liu, Guoren Wang, Jian Tang, and Lydia Y. Chen. 2021. LegoDNN: block-grained scaling of deep neural networks for mobile vision. In Proceedings of the 27th Annual International Conference on Mobile Computing and Networking (New Orleans, Louisiana) (MobiCom ’21). Association for Computing Machinery, New York, NY, USA, 406–419. https://doi.org/10.1145/3447993.3483249
[20] Dongjiao He, Wei Xu, Nan Chen, Fanze Kong, Chongjian Yuan, and Fu Zhang. 2023. Point-LIO: Robust High-Bandwidth Light Detection and Ranging Inertial Odometry. Advanced Intelligent Systems 5, 7 (2023), 2370029. https://doi.org/10.1002/aisy.202370029
[21] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 770–778. https://doi.org/10.1109/CVPR.2016.90
[22] Xueyu Hou, Yongjie Guan, and Tao Han. 2022. NeuLens: spatial-based dynamic acceleration of convolutional neural networks on edge. In Proceedings of the 28th Annual International Conference on Mobile Computing And Networking (Sydney, NSW, Australia) (MobiCom ’22). Association for Computing Machinery, New York, NY, USA, 186–199. https://doi.org/10.1145/3495243.3560528
[23] Junjie Huang and Guan Huang. 2022. BEVDet4D: Exploit temporal cues in multi-camera 3D object detection. arXiv preprint arXiv:2203.17054 (2022).
[24] Junjie Huang, Guan Huang, Zheng Zhu, Yun Ye, and Dalong Du. 2021. BEVDet: High-performance multi-camera 3D object detection in bird-eye-view. arXiv preprint arXiv:2112.11790 (2021).
[25] Insta360. 2024. Insta360 X3. https://www.insta360.com/us/product/ insta360-x3
[26] Joo Seong Jeong, Jingyu Lee, Donghyun Kim, Changmin Jeon, Changjin Jeong, Youngki Lee, and Byung-Gon Chun. 2022. Band: coordinated multi-DNN inference on heterogeneous mobile processors. In Proceedings of the 20th Annual International Conference on Mobile Systems, Applications and Services (Portland, Oregon) (MobiSys ’22). Association for Computing Machinery, New York, NY, USA, 235–247. https://doi.org/10.1145/3498361.3538948
[27] Fucheng Jia, Deyu Zhang, Ting Cao, Shiqi Jiang, Yunxin Liu, Ju Ren, and Yaoxue Zhang. 2022. CoDL: efficient CPU-GPU co-execution for deep learning inference on mobile devices. In Proceedings of the 20th Annual International Conference on Mobile Systems, Applications and Services (Portland, Oregon) (MobiSys ’22). Association for Computing Machinery, New York, NY, USA, 209–221. https://doi.org/10.1145/3498361.3538932
[28] Shiqi Jiang, Zhiqi Lin, Yuanchun Li, Yuanchao Shu, and Yunxin Liu. 2021. Flexible high-resolution object detection on edge devices with tunable latency. In Proceedings of the 27th Annual International Conference on Mobile Computing and Networking (New Orleans, Louisiana) (MobiCom ’21). Association for Computing Machinery, New York, NY, USA, 559–572. https://doi.org/10.1145/3447993.3483274
[29] Yinhao Li, Han Bao, Zheng Ge, Jinrong Yang, Jianjian Sun, and Zeming Li. 2023. BEVStereo: Enhancing Depth Estimation in Multi-View 3D Object Detection with Temporal Stereo. Proceedings of the AAAI Conference on Artificial Intelligence 37, 2 (Jun. 2023), 1486–1494. https://doi.org/10.1609/aaai.v37i2.25234
[30] Yinhao Li, Zheng Ge, Guanyi Yu, Jinrong Yang, Zengran Wang, Yukang Shi, Jianjian Sun, and Zeming Li. 2023. BEVDepth: Acquisition of Reliable Depth for Multi-View 3D Object Detection. Proceedings of the AAAI Conference on Artificial Intelligence 37, 2 (Jun. 2023), 1477–1485. https://doi.org/10.1609/aaai.v37i2.25233
[31] Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Yu Qiao, and Jifeng Dai. 2022. BEVFormer: Learning Bird’s-Eye-View Representation from Multi-camera Images via Spatiotemporal Transformers. In European Conference on Computer Vision (ECCV).
[32] Velodyne Lidar. 2024. Puck. https://velodynelidar.com/products/puck/
[33] Neiwen Ling, Xuan Huang, Zhihe Zhao, Nan Guan, Zhenyu Yan, and Guoliang Xing. 2023. BlastNet: Exploiting Duo-Blocks for Cross-Processor Real-Time DNN Inference. In Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems (Boston, Massachusetts, USA) (SenSys ’22). Association for Computing Machinery, New York, NY, USA, 91–105. https://doi.org/10.1145/3560905.3568520
[34] Xianpeng Liu, Nan Xue, and Tianfu Wu. 2022. Learning Auxiliary Monocular Contexts Helps Monocular 3D Object Detection. Proceedings of the AAAI Conference on Artificial Intelligence 36, 2 (Jun. 2022), 1810–1818. https://doi.org/10.1609/aaai.v36i2.20074
[35] OpenMMLab. 2020. mmdetection3d. https://github.com/open-mmlab/mmdetection3d
[36] Ziqi Pang, Zhichao Li, and Naiyan Wang. 2023. SimpleTrack: Understanding and Rethinking 3D Multi-object Tracking. In European Conference on Computer Vision (ECCV).
[37] Keondo Park, You Rim Choi, Inhoe Lee, and Hyung-Sin Kim. 2023. PointSplit: Towards On-device 3D Object Detection with Heterogeneous Low-power Accelerators. In Proceedings of the 22nd International Conference on Information Processing in Sensor Networks (San Antonio, TX, USA) (IPSN ’23). Association for Computing Machinery, New York, NY, USA, 67–81. https://doi.org/10.1145/3583120.3587045
[38] Jonah Philion and Sanja Fidler. 2020. Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D. In European Conference on Computer Vision (ECCV).
[39] Matteo Poggi and Stefano Mattoccia. 2016. A wearable mobility aid for the visually impaired based on embedded 3D vision and deep learning. In 2016 IEEE Symposium on Computers and Communication (ISCC). 208–213. https://doi.org/10.1109/ISCC.2016.7543741
[40] Chinthaka Premachandra, Shohei Ueda, and Yuya Suzuki. 2020. Detection and Tracking of Moving Objects at Road Intersections Using a 360-Degree Camera for Driver Assistance and Automated Driving. IEEE Access 8 (2020), 135652–135660. https://doi.org/10.1109/ACCESS. 2020.3011430
[41] PyTorch. 2024. PyTorch. https://pytorch.org/
[42] Xiaogang Ruan, Chenliang Lin, Jing Huang, and Yufan Li. 2022. Obstacle avoidance navigation method for robot based on deep reinforcement learning. In 2022 IEEE 6th Information Technology and Mechatronics Engineering Conference (ITOEC), Vol. 6. 1633–1637. https://doi.org/10.1109/ITOEC53115.2022.9734337
[43] scikit learn. 2024. scikit-learn. https://scikit-learn.org/stable/
[44] Shuyao Shi, Jiahe Cui, Zhehao Jiang, Zhenyu Yan, Guoliang Xing, Jianwei Niu, and Zhenchao Ouyang. 2022. VIPS: real-time perception fusion for infrastructure-assisted autonomous driving. In Proceedings of the 28th Annual International Conference on Mobile Computing And Networking (Sydney, NSW, Australia) (MobiCom ’22). Association for Computing Machinery, New York, NY, USA, 133–146. https://doi.org/10.1145/3495243.3560539
[45] Wei Shi, Rui Shan, and Yoshihiro Okada. 2022. A Navigation System for Visual Impaired People Based on Object Detection. In 2022 12th International Congress on Advanced Applied Informatics (IIAI-AAI). 354– 358. https://doi.org/10.1109/IIAIAAI55812.2022.00078
[46] Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurélien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, Vijay Vasudevan, Wei Han, Jiquan Ngiam, Hang Zhao, Aleksei Timofeev, Scott Ettinger, Maxim Krivokon, Amy Gao, Aditya Joshi, Yu Zhang, Jonathon Shlens, Zhifeng Chen, and Dragomir Anguelov. 2020. Scalability in Perception for Autonomous Driving: Waymo Open Dataset. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2443–2451. https://doi.org/10.1109/CVPR42600.2020.00252
[47] Emil Talpes, Debjit Das Sarma, Ganesh Venkataramanan, Peter Bannon, Bill McGee, Benjamin Floering, Ankit Jalote, Christopher Hsiong, Sahil Arora, Atchyuth Gorti, and Gagandeep S. Sachdev. 2020. Compute Solution for Tesla’s Full Self-Driving Computer. IEEE Micro 40, 2 (2020), 25–35. https://doi.org/10.1109/MM.2020.2975764
[48] Xtreme1. 2023. xtreme1. https://github.com/xtreme1-io/xtreme1
[49] Ran Xu, Chen-lin Zhang, Pengcheng Wang, Jayoung Lee, Subrata Mitra, Somali Chaterji, Yin Li, and Saurabh Bagchi. 2020. ApproxDet: content and contention-aware approximate object detection for mobiles. In Proceedings of the 18th Conference on Embedded Networked Sensor Systems (Virtual Event, Japan) (SenSys ’20). Association for Computing Machinery, New York, NY, USA, 449–462. https://doi.org/10.1145/3384419.3431159
[50] Zirui Xu, Fuxun Yu, Chenchen Liu, and Xiang Chen. 2019. ReForm: Static and Dynamic Resource-Aware DNN Reconfiguration Framework for Mobile Device. In Proceedings of the 56th Annual Design Automation Conference 2019 (Las Vegas, NV, USA) (DAC ’19). Association for Computing Machinery, New York, NY, USA, Article 183, 6 pages. https://doi.org/10.1145/3316781.3324696
[51] Chenyu Yang, Yuntao Chen, Hao Tian, Chenxin Tao, Xizhou Zhu, Zhaoxiang Zhang, Gao Huang, Hongyang Li, Yu Qiao, Lewei Lu, Jie Zhou, and Jifeng Dai. 2023. BEVFormer v2: Adapting Modern Image Backbones to Bird’s-Eye-View Recognition via Perspective Supervision. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 17830–17839. https://doi.org/10.1109/CVPR52729.2023.01710
[52] Lei Yang, Kaicheng Yu, Tao Tang, Jun Li, Kun Yuan, Li Wang, Xinyu Zhang, and Peng Chen. 2023. BEVHeight: A Robust Framework for Vision-based Roadside 3D Object Detection. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 21611–21620. https://doi.org/10.1109/CVPR52729.2023.02070
[53] Mingyu Yang, Yu Chen, and Hun-Seok Kim. 2022. Efficient Deep Visual and Inertial Odometry with Adaptive Visual Modality Selection. In European Conference on Computer Vision (ECCV).
[54] Tianwei Yin, Xingyi Zhou, and Philipp Krähenbühl. 2021. Center-based 3D Object Detection and Tracking. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 11779–11788. https://doi.org/10.1109/CVPR46437.2021.01161
[55] Ekim Yurtsever, Jacob Lambert, Alexander Carballo, and Kazuya Takeda. 2020. A Survey of Autonomous Driving: Common Practices and Emerging Technologies. IEEE Access 8 (2020), 58443–58469. https://doi.org/10.1109/ACCESS.2020.2983149
[56] Limin Zeng, Denise Prescher, and Gerhard Weber. 2012. Exploration and avoidance of surrounding obstacles for the visually impaired. In Proceedings of the 14th International ACM SIGACCESS Conference on Computers and Accessibility (Boulder, Colorado, USA) (ASSETS ’12). Association for Computing Machinery, New York, NY, USA, 111–118. https://doi.org/10.1145/2384916.2384936
[57] Z. Zhang. 2000. A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 11 (2000), 1330–1334. https://doi.org/10.1109/34.888718
[58] Zhouyu Zhang, Yunfeng Cao, Meng Ding, Likui Zhuang, and Jiang Tao. 2020. Monocular vision based obstacle avoidance trajectory planning for Unmanned Aerial Vehicle. Aerospace Science and Technology 106 (2020), 106199. https://doi.org/10.1016/j.ast.2020.106199
[59] Zhihe Zhao, Neiwen Ling, Nan Guan, and Guoliang Xing. 2023. Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU. arXiv preprint arXiv:2307.04339 (2023).
[60] Kunyan Zhu, Wei Chen, Wei Zhang, Ran Song, and Yibin Li. 2020. Autonomous Robot Navigation Based on Multi-Camera Perception. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 5879–5885. https://doi.org/10.1109/IROS45743.2020.9341304