The Geometric Revolution That’s Making Computer Vision More Efficient | HackerNoon

News Room · Published 28 October 2025

:::info
Authors:

(1) Ahmad Bdeir, Data Science Department, University of Hildesheim ([email protected]);

(2) Niels Landwehr, Data Science Department, University of Hildesheim ([email protected]).

:::

Table of Links

Abstract and 1. Introduction

2. Related Work

3. Methodology

  3.1 Background

  3.2 Riemannian Optimization

  3.3 Towards Efficient Architectural Components

4. Experiments

  4.1 Hierarchical Metric Learning Problem

  4.2 Standard Classification Problem

5. Conclusion and References

Abstract

Hyperbolic deep learning has become a growing research direction in computer vision due to the unique properties afforded by the alternative embedding space. The negative curvature and exponentially growing distance metric provide a natural framework for capturing hierarchical relationships between data points and allow for finer separability between their embeddings. However, these methods are still computationally expensive and prone to instability, especially when attempting to learn the negative curvature that best suits the task and the data. Current Riemannian optimizers do not account for changes in the manifold, which greatly harms performance and forces lower learning rates to minimize projection errors. Our paper focuses on curvature learning by introducing an improved schema for popular learning algorithms and providing a novel normalization approach to constrain embeddings within the variable representative radius of the manifold. Additionally, we introduce a novel formulation for Riemannian AdamW, as well as alternative hybrid encoder techniques and foundational formulations for current convolutional hyperbolic operations, greatly reducing the computational penalty of the hyperbolic embedding space. Our approach demonstrates consistent performance improvements across both direct classification and hierarchical metric learning tasks while allowing for larger hyperbolic models.

1 Introduction

With the recent rise in the use of hyperbolic manifolds for deep representation learning, there is a growing need for efficient, flexible components that can fully exploit these spaces without sacrificing stability. This has led researchers to focus on two main derivations of hyperbolic space: the Poincaré manifold and the hyperboloid. The Poincaré ball, equipped with a gyrovector space, supports various well-defined operations, including generalized vector addition and multiplication, but it suffers from significant stability issues. On the other hand, the hyperboloid, or Lorentz space, lacks these operations but offers much better operation stability, as demonstrated in the study by Mishne et al. [28].
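The gyrovector operations that the Poincaré ball supports (and the hyperboloid lacks) can be made concrete. As an illustrative sketch — ours, not code from the paper — Möbius addition generalizes vector addition on the Poincaré ball of curvature −c, reducing to ordinary addition as c → 0:

```python
import math

def mobius_add(x, y, c=1.0):
    """Mobius addition on the Poincare ball of curvature -c.

    Generalizes Euclidean vector addition to the gyrovector space;
    as c -> 0 it degenerates to ordinary component-wise addition.
    """
    xy = sum(a * b for a, b in zip(x, y))   # <x, y>
    x2 = sum(a * a for a in x)              # ||x||^2
    y2 = sum(b * b for b in y)              # ||y||^2
    coef_x = 1 + 2 * c * xy + c * y2
    coef_y = 1 - c * x2
    denom = 1 + 2 * c * xy + c * c * x2 * y2
    return [(coef_x * a + coef_y * b) / denom for a, b in zip(x, y)]
```

Adding the origin leaves a point unchanged, and with a vanishing curvature the operation behaves like plain addition — the closed-form algebra that makes the Poincaré ball convenient, at the cost of the stability issues noted above.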

To address this gap, previous works have sought to provide Lorentzian definitions for common deep learning operations such as the feed-forward layer [3, 5, 10], convolutional layer [3, 5, 32], and MLRs [1]. This increased focus on hyperbolic modeling has led to its gradual integration into computer vision architectures, as detailed in the survey by Mettes et al. [27]. Specifically, the hyperboloid model has been employed as a sampling space for VAEs [29], a decoder space for vision tasks in hybrid settings [14, 18, 25, 31], and ultimately for fully hyperbolic Lorentzian vision encoders [1] simultaneously with its Poincaré counterpart [38].

This paper furthers the development of hyperbolic learning for vision tasks, specifically for the Lorentz manifold. Our primary focus is the challenge of learning the manifold’s negative curvature. The driving principle is that model embeddings may exhibit varying degrees of hyperbolicity depending on the innate hierarchies in the data points themselves, the problem task under consideration, and the specific locations at which hyperbolic operations are integrated. To accommodate this, we can adjust the embedding space’s hyperbolic metric to be more or less Euclidean, matching the modeling requirements. We also build on the idea of separate manifolds for the separate main blocks of the architecture, further increasing representative flexibility.
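To illustrate how a learnable curvature changes the geometry, here is a minimal sketch (our illustration, not the paper's implementation) of the geodesic distance on the hyperboloid of curvature K < 0. Varying K interpolates the metric between nearly Euclidean (K → 0⁻) and strongly hyperbolic:

```python
import math

def lorentz_inner(x, y):
    """Lorentzian inner product <x, y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def lorentz_distance(x, y, K=-1.0):
    """Geodesic distance on the hyperboloid of curvature K < 0.

    Points satisfy <x, x>_L = -k with k = -1/K; shrinking |K| flattens
    the space, while larger |K| makes distances grow faster.
    """
    k = -1.0 / K
    arg = -lorentz_inner(x, y) / k
    arg = max(arg, 1.0)  # guard acosh's domain against numerical dips
    return math.sqrt(k) * math.acosh(arg)
```

For K = −1 the origin is (1, 0, …, 0), and the point (cosh t, sinh t) lies at distance exactly t from it; the clamp on `arg` is the kind of numerical guard that becomes essential once K itself is a trainable parameter.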

We also recognize that, despite recent advances, Lorentz models continue to struggle with high computational costs. We attempt to isolate and alleviate the main factors leading to numerical inaccuracies and computational overhead, particularly when modeling data in higher-dimensional embedding spaces and when learning curvatures. Our contributions can be summed up as:

  1. We propose a formulation for Riemannian AdamW and an alternative schema for Riemannian optimizers that accounts for manifold curvature learning.

  2. We propose the use of our maximum distance rescaling function to restrain hyperbolic vectors within the representative radius of accuracy afforded by the number precision, even allowing for fp16 precision.

  3. We provide a more efficient convolutional layer approach that is able to leverage the highly optimized existing implementations.

  4. We empirically show the effectiveness of combining these approaches using classical image classification tasks and hierarchical metric learning problems.
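As a rough illustration of contribution 2 — the exact rescaling function is defined in the paper's Section 3 and is not reproduced here — one plausible form smoothly saturates tangent-vector norms at a precision-dependent radius `d_max`, so that exponential-map outputs stay representable even in fp16:

```python
import math

def rescale_max_distance(v, d_max):
    """Hypothetical stand-in for a maximum-distance rescaling function.

    Vectors well inside the radius d_max pass through nearly unchanged;
    beyond it the norm saturates smoothly at d_max via tanh, preventing
    the cosh/sinh terms of the exponential map from overflowing.
    """
    n = math.sqrt(sum(a * a for a in v))
    if n == 0.0:
        return list(v)
    scale = d_max * math.tanh(n / d_max) / n
    return [a * scale for a in v]
```

The choice of tanh here is our assumption; any monotone saturating map with unit slope at the origin would serve the same purpose of bounding embeddings within the representative radius.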

2 Related Work

Hyperbolic Embeddings in Computer Vision With the success of hyperbolic manifolds in NLP models [6, 36, 45], hyperbolic embeddings have extended to the computer vision domain. Initially, many works relied on a hybrid architecture with Euclidean encoders and hyperbolic decoders [27]. This was mainly due to the high computational cost of hyperbolic operations in the encoder, as well as the lack of well-defined alternatives for Euclidean operations. However, this trend has begun to shift towards fully hyperbolic encoders, as can be seen in the hyperbolic ResNets of Bdeir et al. [1] and van Spengler et al. [38]. Both works offer hyperbolic definitions for the 2D convolutional layer, the batch normalization layer, and an MLR for the final classification head. Bdeir et al. [1] even attempt to hybridize the encoder by employing the Lorentz manifold in blocks that exhibit higher output hyperbolicity. While this has led to notable performance improvements, both models suffer from upscaling issues: replicating these approaches on larger datasets or bigger architectures becomes much less feasible in terms of time and memory requirements. Instead, our approach places a higher focus on efficient components, leveraging the beneficial hyperbolic properties of the model while minimizing the memory and computational footprint.

Curvature Learning Previous work in hyperbolic spaces has explored various approaches to curvature learning. In their studies, Gu et al. [13] and Giovanni et al. [12] achieve this by using a radial parametrization that implicitly models variable-curvature embeddings under an explicitly defined, fixed 1-curvature manifold. This method enables them to simulate K-curvature hyperbolic and spherical operations under constant curvature, specifically for the mixed-curvature manifold, a combination of the Euclidean, spherical, and Poincaré manifolds. Other approaches, such as that of Kochurov et al. [21], simply set the curvature to a learnable parameter but do not account for the manifold changes in the Riemannian optimizers. This leads to hyperbolic vectors being updated with mismatched curvatures and others being inaccurately reprojected, resulting in instability and accuracy degradation. Additionally, some methods, like that of Kim et al. [20], store all manifold parameters as Euclidean vectors and project them before use. While this approach partially mitigates the issue of mismatched-curvature operations, it remains less accurate and more computationally expensive. In comparison, our proposed optimization schema maintains the parameters on the manifold and optimizes them directly, performing the necessary operations to transition between the variable-curvature spaces.
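One way to picture a curvature-aware update — a hypothetical sketch, not the authors' optimizer — is to move a manifold parameter through the shared tangent space at the origin whenever the curvature changes, so that no point is ever re-projected with a mismatched curvature:

```python
import math

def exp0(v, k):
    """Exponential map at the origin of the hyperboloid <x, x>_L = -k.

    v is the spatial part of a tangent vector at the origin."""
    n = math.sqrt(sum(a * a for a in v))
    if n == 0.0:
        return [math.sqrt(k)] + [0.0] * len(v)
    t = n / math.sqrt(k)
    head = math.sqrt(k) * math.cosh(t)
    return [head] + [math.sqrt(k) * math.sinh(t) * a / n for a in v]

def log0(x, k):
    """Inverse of exp0: recover the spatial tangent vector at the origin."""
    xs = x[1:]
    n = math.sqrt(sum(a * a for a in xs))
    if n == 0.0:
        return [0.0] * len(xs)
    d = math.sqrt(k) * math.acosh(max(x[0] / math.sqrt(k), 1.0))
    return [d * a / n for a in xs]

def transfer_curvature(x, k_old, k_new):
    """Carry a point onto the manifold of the updated curvature by passing
    through the tangent space at the origin, instead of naively reusing
    coordinates that no longer satisfy the new manifold constraint."""
    return exp0(log0(x, k_old), k_new)
```

Because the tangent vector is preserved, a point's geodesic distance to the origin is unchanged by the transfer; a naive re-projection, by contrast, distorts exactly the embeddings the optimizer just updated, which is the failure mode described above.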

Metric Learning Metric learning relies on structuring the distribution in the embedding space so that related data points are positioned closer together, while less related points are placed further apart. To facilitate this, numerous studies have introduced additional loss functions that explicitly encourage this behavior. Contrastive losses, for instance, operate on pairs of data points and impose a penalty that grows with the distance between positive pairs and shrinks with the distance between negative pairs [4]. Triplet loss extends this idea to sets of three points: an anchor, a positive sample, and a negative sample [39]. Instead of constraining the distances between points absolutely, it ensures that the distance between the anchor and the positive sample is less than the distance between the anchor and the negative sample, plus a margin, thus enforcing a relational criterion.
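The relational criterion of the triplet loss reduces to a one-line hinge once the pairwise distances are computed — shown here metric-agnostically; in the hyperbolic setting the distances would be geodesic:

```python
def triplet_loss(d_ap, d_an, margin=0.2):
    """Hinge on precomputed anchor-positive and anchor-negative distances.

    Zero whenever the anchor-positive distance is smaller than the
    anchor-negative distance by at least the margin; otherwise the
    violation itself is the penalty.
    """
    return max(0.0, d_ap - d_an + margin)
```

Only the *relative* ordering of the two distances matters, which is why the same formulation transfers directly from Euclidean to hyperbolic embeddings.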

These approaches have also been adapted to hierarchical problem settings under hyperbolic manifolds [20, 43]. Notably, Kim et al. [20] developed a method for learning continuous hierarchical representations using a deep learning, data-mining-like approach that relies on the innate relationships of the embeddings rather than their labels. They employ a proxy-based method that models the data on the Poincaré ball, facilitating a more natural extension to hierarchical tasks. Building on this, we extend the approach by modeling the loss function in the Lorentz manifold and incorporating a learnable curvature to better handle data with varying levels of hierarchy.

:::info
This paper is available on arxiv under the CC BY 4.0 Deed (Attribution 4.0 International) license.

:::
