In Sparse Clouds and Ambiguous Texts, This AI Model Still Finds Its Way | HackerNoon

News Room
Last updated: 2025/07/16 at 5:08 PM
News Room Published 16 July 2025

Table of Links

Abstract and 1. Introduction

  2. Related Work

  3. Method

    3.1 Overview of Our Method

    3.2 Coarse Text-cell Retrieval

    3.3 Fine Position Estimation

    3.4 Training Objectives

  4. Experiments

    4.1 Dataset Description and 4.2 Implementation Details

    4.3 Evaluation Criteria and 4.4 Results

  5. Performance Analysis

    5.1 Ablation Study

    5.2 Qualitative Analysis

    5.3 Text Embedding Analysis

  6. Conclusion and References

Supplementary Material

  7. Details of KITTI360Pose Dataset
  8. More Experiments on the Instance Query Extractor
  9. Text-Cell Embedding Space Analysis
  10. More Visualization Results
  11. Point Cloud Robustness Analysis

Anonymous Authors


7 DETAILS OF KITTI360POSE DATASET

Figure 7: Visualization of the KITTI360Pose dataset. The trajectories of five training sets, three test sets, and one validation set are shown in the dashed borders. One colored point cloud scene and three cells are shown in the middle.

8 MORE EXPERIMENTS ON THE INSTANCE QUERY EXTRACTOR

Table 7: Ablation study of the query number on KITTI360Pose dataset.

We conduct an additional experiment to assess how the number of queries affects the performance of our instance query extractor. As detailed in Table 7, we evaluate the localization recall rate using 16, 24, and 32 queries. Using 24 queries yields the highest localization recall rate, i.e., 0.23/0.53/0.64 on the validation set and 0.22/0.47/0.58 on the test set. Among the three tested settings, 24 queries is therefore the best choice for our model.
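The localization recall rate used in this ablation can be illustrated with a small sketch. It assumes the usual definition: a query counts as localized when at least one of its top-k predicted positions lies within an error bound ε of the ground truth. The function name, the 2D coordinates, and the toy data are hypothetical and are not taken from the paper or its code.

```python
import math


def localization_recall(predictions, ground_truth, k, eps):
    """Fraction of queries with a top-k prediction closer than eps metres.

    predictions: one ranked list of (x, y) candidate positions per query.
    ground_truth: one (x, y) true position per query, in the same order.
    """
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    if not ground_truth:
        return 0.0
    hits = 0
    for candidates, target in zip(predictions, ground_truth):
        # A query is a hit if any of its k best candidates is within the bound.
        if any(math.dist(c, target) < eps for c in candidates[:k]):
            hits += 1
    return hits / len(ground_truth)


# Toy data (hypothetical): three queries, three ranked candidates each.
preds = [
    [(1.0, 1.0), (20.0, 0.0), (40.0, 0.0)],      # best candidate is close
    [(30.0, 0.0), (3.0, 3.0), (50.0, 0.0)],      # only the second one is close
    [(30.0, 30.0), (40.0, 40.0), (50.0, 50.0)],  # no candidate is close
]
truth = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]

recall_top1 = localization_recall(preds, truth, k=1, eps=5.0)
recall_top3 = localization_recall(preds, truth, k=3, eps=5.0)
```

In this toy case only the first query is localized at top-1, while the second is recovered once three candidates are allowed, which is why recall grows with k.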

9 TEXT-CELL EMBEDDING SPACE ANALYSIS

Fig. 8 shows the aligned text-cell embedding space via t-SNE [37]. Under the instance-free scenario, we compare our model with Text2Loc [42] using a pre-trained instance segmentation model, Mask3D [35], as a prior step. Text2Loc produces a less discriminative space, in which positive cells lie relatively far from the text query feature. In contrast, our IFRP-T2P reduces the distance between positive cell features and text query features within the embedding space, creating a more informative embedding space. This improvement is critical for the accuracy of text-cell retrieval.

Figure 8: t-SNE visualization for the text features and cell features in the coarse stage.
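To make concrete why a smaller text-to-positive-cell distance helps retrieval, the sketch below ranks cells by their similarity to a text query embedding. Cosine similarity is an assumption here; the section only speaks of distance in the embedding space. The function names, cell ids, and three-dimensional vectors are hypothetical toy values, not features produced by IFRP-T2P or Text2Loc.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_cells(text_embedding, cell_embeddings, k):
    """Return the ids of the k cells most similar to the text embedding."""
    scored = sorted(
        cell_embeddings.items(),
        key=lambda item: cosine_similarity(text_embedding, item[1]),
        reverse=True,
    )
    return [cell_id for cell_id, _ in scored[:k]]


# Toy embeddings (hypothetical).
text_query = [1.0, 0.0, 0.0]
cells = {
    "cell_a": [0.9, 0.1, 0.0],  # nearly parallel to the query
    "cell_b": [0.0, 1.0, 0.0],  # orthogonal to the query
    "cell_c": [0.5, 0.5, 0.0],  # in between
}

top2 = rank_cells(text_query, cells, k=2)
```

If the positive cell sits close to the text feature, as cell_a does here, it appears at the top of the ranking; in a less discriminative space it would be pushed down the list by negatives.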

10 MORE VISUALIZATION RESULTS

Fig. 9 shows more visualization results, including both the retrieval outcomes and the results of fine position estimation. The results suggest that the coarse text-cell retrieval serves as a foundational step in the overall localization process. The subsequent fine position estimation generally improves the localization performance. However, there are cases where the accuracy of this fine estimation is compromised, particularly when the input descriptions are vague. This detrimental effect on accuracy is illustrated in the fourth and sixth rows of Fig. 9.

Figure 9: Localization results on the KITTI360Pose dataset. In the coarse stage, the numbers in the top 3 retrieval submaps represent the center distances between retrieved submaps and the ground truth. For fine localization, pink and blue points represent the ground-truth localization and the predicted location, with the number indicating the distance between them.

11 POINT CLOUD ROBUSTNESS ANALYSIS

Previous works [21, 39, 42] examined only the impact of textual modifications on localization accuracy and ignored the impact of point cloud modification. In this study, we further consider the effects of point cloud degradation, which is crucial for fully analyzing our IFRP-T2P model. Unlike the accumulated point clouds provided in the KITTI360Pose dataset, LiDAR sensors typically capture only sparse point clouds in real-world settings. To assess the robustness of our model under point cloud sparsity, we randomly mask out one-third of the points and compare the results to those obtained using raw point clouds. As illustrated in Fig. 10, when taking the masked point cloud as input, our IFRP-T2P model achieves a localization recall of 0.20 at top-1 with an error bound of ε < 5 m on the validation set. Compared to Text2Loc, which shows a degradation of 22.2%, our model exhibits a lower degradation rate of 15%. This result indicates that our model is more robust to point cloud variation.

Figure 10: Point cloud robustness analysis.
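The masking experiment can be sketched as follows. This is a minimal illustration, assuming that masking means dropping a uniformly random third of the points and that the degradation rate is the relative drop in recall; the section states neither detail explicitly. The function names and all numeric values in the example are hypothetical and are not results from the paper.

```python
import random


def mask_points(points, drop_fraction=1 / 3, seed=None):
    """Return a copy of the point list with a random fraction removed."""
    if not 0.0 <= drop_fraction <= 1.0:
        raise ValueError("drop_fraction must lie in [0, 1]")
    rng = random.Random(seed)
    n_drop = round(len(points) * drop_fraction)
    # Sample the indices to keep, then restore the original ordering.
    keep_idx = sorted(rng.sample(range(len(points)), len(points) - n_drop))
    return [points[i] for i in keep_idx]


def relative_degradation(raw_recall, masked_recall):
    """Relative recall drop caused by masking, as a fraction of the raw recall."""
    if raw_recall <= 0.0:
        raise ValueError("raw_recall must be positive")
    return (raw_recall - masked_recall) / raw_recall


# Toy point cloud (hypothetical): nine (x, y, z) points.
cloud = [(float(i), float(i), 0.0) for i in range(9)]
sparse = mask_points(cloud, seed=0)

# Hypothetical recalls chosen only to illustrate the arithmetic.
drop = relative_degradation(raw_recall=0.40, masked_recall=0.34)
```

With nine input points the masked cloud keeps six of them, and a recall falling from 0.40 to 0.34 corresponds to a relative degradation of about 15%.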

Authors:

(1) Lichao Wang, FNii, CUHKSZ;

(2) Zhihao Yuan, FNii and SSE, CUHKSZ;

(3) Jinke Ren, FNii and SSE, CUHKSZ;

(4) Shuguang Cui, SSE and FNii, CUHKSZ;

(5) Zhen Li, Corresponding Author, SSE and FNii, CUHKSZ.


This paper is available on arXiv under the CC BY-NC-ND 4.0 Deed (Attribution-NonCommercial-NoDerivs 4.0 International) license.
