:::info
Authors:
(1) Hyeongjun Kwon, Yonsei University;
(2) Jinhyun Jang, Yonsei University;
(3) Jin Kim, Yonsei University;
(4) Kwonyoung Kim, Yonsei University;
(5) Kwanghoon Sohn, Yonsei University and Korea Institute of Science and Technology (KIST).
:::
6. Ablation studies and discussion
To further analyze and validate the components of our method, we conduct ablation studies on image classification.
Effectiveness of hyperbolic manifolds. We first investigate the effectiveness of hyperbolic manifolds in our approach. Tab. 4 reports the resulting image classification performance. In the Euclidean setting, cosine similarity serves as the distance function between two vectors. The results demonstrate that applying the hierarchical contrastive loss in Euclidean space degrades performance, which indicates that hyperbolic space is more suitable for stabilizing hierarchical structures. Additionally, adding the KL loss term brings further benefits derived from the semantic seed distribution.
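To make the contrast between the two distance functions concrete, the following is a minimal sketch rather than the implementation used in the experiments. It assumes the Poincaré ball model with curvature -1 (the formulation actually used is the one given in Sec. 3), and the function names `cosine_similarity` and `poincare_distance` are illustrative only.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero Euclidean vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def poincare_distance(u, v):
    """Geodesic distance on the Poincare ball (assumed curvature -1).

    Both points must lie strictly inside the unit ball.
    """
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


# Two pairs with the same angular separation but different radii.
near_origin = ((0.1, 0.0), (0.0, 0.1))
near_boundary = ((0.9, 0.0), (0.0, 0.9))

# Cosine similarity depends only on the angle, so it cannot tell the pairs apart.
print(cosine_similarity(*near_origin), cosine_similarity(*near_boundary))

# The hyperbolic distance grows rapidly as points approach the boundary.
print(poincare_distance(*near_origin), poincare_distance(*near_boundary))
```

Cosine similarity ignores the norm of the embeddings, whereas the hyperbolic distance uses it: points near the boundary are far apart even at a fixed angle. This is one common intuition for why tree-like structures embed more naturally in hyperbolic space.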
Impact of probabilistic modeling. In Tab. 5, we compare the performance of the probabilistic hierarchy tree with that of a deterministic hierarchy tree. When constructing the hierarchy tree, probabilistic modeling defines every node as a Mixture of Gaussians (MoG) of its child node distributions, while the deterministic approach defines each node as the mean of its child nodes. The probabilistic hierarchy tree achieves a significant improvement over the deterministic approach. This result shows that probabilistic modeling represents hierarchical structure more effectively than deterministic modeling, leading to improved recognition.
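The difference between the two node definitions can be illustrated with a small sketch. It assumes equal mixture weights and diagonal covariances, neither of which is stated in this section (the actual construction is described in Sec. 4.2), and the helper names are hypothetical. The deterministic parent keeps only the average of the child means, while the MoG parent also retains a variance that reflects how far apart its children are.

```python
def deterministic_parent(child_means):
    """Parent defined as the element-wise mean of its child means."""
    n = len(child_means)
    dim = len(child_means[0])
    return [sum(m[d] for m in child_means) / n for d in range(dim)]


def mog_parent_moments(child_means, child_vars):
    """Mean and per-dimension variance of an equal-weight mixture of
    diagonal Gaussians (equal weights are an assumption of this sketch)."""
    n = len(child_means)
    dim = len(child_means[0])
    mean = deterministic_parent(child_means)
    var = [
        sum(child_vars[i][d] + child_means[i][d] ** 2 for i in range(n)) / n
        - mean[d] ** 2
        for d in range(dim)
    ]
    return mean, var


# Two children separated along the first dimension only.
means = [[-1.0, 0.0], [1.0, 0.0]]
variances = [[0.5, 0.5], [0.5, 0.5]]

point = deterministic_parent(means)
mix_mean, mix_var = mog_parent_moments(means, variances)
print(point)              # same location as the mixture mean
print(mix_mean, mix_var)  # variance is larger where the children disagree
```

Both parents share the same location, but only the mixture records that its children disagree along the first dimension, which is the extra information a deterministic mean discards.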
Hierarchy width and depth. As shown in Fig. 5, we analyze the effect of the width N and depth L of the hierarchy tree on ImageNet-1K [36] with Hi-Mapper (DeiT-S). These factors control the granularity of the visual elements to be decomposed. While a small N degrades the fine-grained recognition capacity, an excessive N hinders optimization. Meanwhile, a large L may provide diverse granularity, but it leads to entangled object-level representations. Across all settings, the best performance of 82.6% is obtained at N = 32 and L = 4.
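As a rough illustration of how N and L jointly set the granularity, the sketch below counts the nodes at each level of a tree with N leaf-level nodes and L levels. The branching factor is a hypothetical parameter introduced only for this example: it assumes each parent merges a fixed number of children (two by default), which this section does not specify.

```python
def nodes_per_level(width, depth, branching=2):
    """Number of nodes at each level, from the leaf level upward.

    `branching` (children merged per parent) is an assumption of this sketch.
    """
    if width < 1 or depth < 1 or branching < 2:
        raise ValueError("width and depth must be >= 1, branching >= 2")
    counts = [width]
    for _ in range(depth - 1):
        if counts[-1] % branching != 0:
            raise ValueError("level size must be divisible by the branching factor")
        counts.append(counts[-1] // branching)
    return counts


levels = nodes_per_level(width=32, depth=4)
print(levels)       # [32, 16, 8, 4] under the assumed branching factor of 2
print(sum(levels))  # 60 nodes in total
```

Under this assumption, increasing N multiplies the number of nodes at every level, whereas increasing L adds coarser levels with ever fewer nodes. This is consistent with the observation that a large L pushes the top of the tree toward a small number of entangled object-level nodes.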
:::info
This paper is available on arXiv under the CC BY 4.0 DEED license.
:::