2 BACKGROUND: OMNIDIRECTIONAL 3D OBJECT DETECTION
3D object detection aims to identify objects in space and predict their properties, such as 3D location, size, and velocity. The predicted object information is utilized by application functionalities such as obstacle avoidance for robot navigation. Safe navigation cannot be ensured by Simultaneous Localization and Mapping (SLAM) alone, since SLAM cannot model object sizes or movements in real time. A robot must plan its navigation path based on obstacles' locations and sizes, or even their predicted trajectories, to avoid collisions in advance. Moreover, in complex outdoor environments where objects can approach from multiple directions, the ability to detect surrounding objects becomes essential.
Existing methods for omnidirectional 3D object detection rely on LiDAR sensors or multiple cameras that together provide a 360° perception range. While LiDAR sensors offer accurate object localization based on depth measurements, camera-based solutions have recently drawn attention due to their cost-effectiveness. Recent camera-based detectors aggregate information from multiple camera images into a bird's-eye-view (BEV) space, a top-down representation of the surrounding 3D space. Early work [38] proposed an end-to-end trainable method to extract BEV features directly from multi-view images. Building upon [38], BEVDet [24] enables the detection of surrounding 3D objects using the extracted BEV features. Owing to its simple and scalable architecture, many of the latest BEV-based 3D detectors [23, 29, 30, 52] follow BEVDet's inference pipeline, as shown in Figure 2. These detectors have overcome the monocular ambiguity of camera-based approaches by introducing enhanced methods for each stage of the baseline BEVDet pipeline, achieving accuracy comparable to their LiDAR counterparts [54]. Table 1 lists these methods, which are described in the following.

The first stage extracts 2D feature maps from multi-view images using backbone neural networks, such as ResNet [21], widely used in vision tasks. It is well known that increasing the backbone capacity, e.g., the number of layers, or the input image resolution leads to accuracy improvements. For example, combining a 152-layer ResNet with a 720×1,280 (height×width) input resolution yields a more detailed feature map than a 34-layer ResNet with a 256×448 resolution, thereby enhancing the detection of small and distant objects. The second stage transforms the extracted image features into 3D space using predicted depth. For this stage, prior work [30] employed a depth estimation neural network, i.e., DepthNet, supervised by dense depth data generated from 3D point clouds. Accurate metric depth predictions (in meters) allow the detector to better distinguish objects from the background. The third stage projects the features scattered in each 3D camera coordinate frame into a unified BEV grid using camera parameters, generating a BEV feature map. Recent works [23, 31] have proposed techniques that fuse the BEV feature map from the previous frame with that of the current frame. By exploiting such temporal cues, these techniques improve perception robustness, enabling the detection of temporarily occluded objects and accurate prediction of object velocities. Lastly, the neural networks in the BEV head generate 3D bounding boxes and their properties, e.g., location and velocity, from the BEV features.
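To make the four stages concrete, the following is a minimal PyTorch sketch of a BEVDet-style inference pipeline. It is not the detectors' actual implementation: the module names (ToyBEVDetector, depth_head, bev_head), the tiny backbone, and the precomputed cam_to_bev_index mapping are all illustrative assumptions; real detectors use full ResNet backbones, learned BEV encoders, temporal fusion, and an efficient geometric BEV-pooling operator.

```python
# A minimal sketch of a BEVDet-style pipeline (hypothetical module and tensor
# names); it only illustrates the data flow across the four stages.
import torch
import torch.nn as nn

class ToyBEVDetector(nn.Module):
    def __init__(self, feat_dim=64, depth_bins=32, bev_size=128):
        super().__init__()
        self.bev_size = bev_size
        # Stage 1: image backbone producing a 2D feature map per camera
        # (a real detector would use e.g. a ResNet here).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=4, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=4, padding=1), nn.ReLU(),
        )
        # Stage 2: DepthNet-like head predicting a per-pixel depth distribution.
        self.depth_head = nn.Conv2d(feat_dim, depth_bins, 1)
        # Stage 4: BEV head regressing per-cell box attributes
        # (e.g., center offset, size, yaw, velocity, class score).
        self.bev_head = nn.Conv2d(feat_dim, 10, 3, padding=1)

    def forward(self, imgs, cam_to_bev_index):
        # imgs: (num_cams, 3, H, W).
        # cam_to_bev_index: flat BEV-cell index for every (camera, depth bin,
        # pixel) frustum point, precomputed from camera intrinsics/extrinsics
        # (assumed given here).
        feats = self.backbone(imgs)                        # (N, C, h, w)
        depth = self.depth_head(feats).softmax(dim=1)      # (N, D, h, w)
        # Stage 2: lift image features into each camera frustum by weighting
        # them with the predicted depth distribution.
        frustum = depth.unsqueeze(2) * feats.unsqueeze(1)  # (N, D, C, h, w)
        # Stage 3: splat frustum features into a unified BEV grid.
        C = feats.shape[1]
        bev = torch.zeros(self.bev_size * self.bev_size, C)
        flat_feats = frustum.permute(0, 1, 3, 4, 2).reshape(-1, C)
        bev.index_add_(0, cam_to_bev_index, flat_feats)
        bev = bev.view(1, self.bev_size, self.bev_size, C).permute(0, 3, 1, 2)
        # Stage 4: decode 3D boxes and their properties from the BEV features.
        return self.bev_head(bev)

# Example usage with random inputs; the camera-to-BEV mapping is random here
# only to keep the sketch self-contained, whereas a real detector derives it
# geometrically from the camera parameters.
model = ToyBEVDetector()
imgs = torch.rand(6, 3, 256, 448)                  # six surround-view cameras
n_pts = 6 * 32 * (256 // 16) * (448 // 16)         # cams x depth bins x h x w
idx = torch.randint(0, 128 * 128, (n_pts,))
out = model(imgs, idx)                             # (1, 10, 128, 128) BEV map
```

The sketch also shows why the backbone and input resolution (stage 1) and the depth distribution (stage 2) dominate accuracy: every downstream BEV feature is just a depth-weighted copy of the image features, so richer image features and sharper depth estimates directly yield a more informative BEV grid.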
This paper is