Table of Links
- Abstract and 1. Introduction
- Related Works
- Convex Relaxation Techniques for Hyperbolic SVMs
  - 3.1 Preliminaries
  - 3.2 Original Formulation of the HSVM
  - 3.3 Semidefinite Formulation
  - 3.4 Moment-Sum-of-Squares Relaxation
- Experiments
  - 4.1 Synthetic Dataset
  - 4.2 Real Dataset
- Discussions, Acknowledgements, and References
- A. Proofs
- B. Solution Extraction in Relaxed Formulation
- C. On Moment Sum-of-Squares Relaxation Hierarchy
- D. Platt Scaling [31]
- E. Detailed Experimental Results
- F. Robust Hyperbolic Support Vector Machine
Abstract
Hyperbolic spaces have increasingly been recognized for their outstanding performance, compared to their Euclidean counterparts, in handling data with inherent hierarchical structure. However, learning in hyperbolic spaces poses significant challenges. In particular, extending support vector machines to hyperbolic spaces yields, in general, a constrained non-convex optimization problem. Previous popular attempts to solve hyperbolic SVMs, primarily via projected gradient descent, are sensitive to hyperparameters and initialization and often converge to suboptimal solutions. In this work, by first rewriting the problem as a polynomial optimization, we apply semidefinite relaxation and sparse moment-sum-of-squares relaxation to effectively approximate the optima. Extensive empirical experiments show that these methods outperform the projected gradient descent approach.
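To make the lift-and-relax idea behind semidefinite relaxation concrete, the short sketch below applies Shor's relaxation to a toy non-convex quadratic problem using cvxpy. The matrices `Q`, `c`, the unit-norm constraint, and the SCS solver choice are assumptions made purely for demonstration; this is not the paper's HSVM-specific formulation.

```python
# Minimal sketch of Shor's semidefinite relaxation (illustrative only; the toy
# objective Q, c and the unit-norm constraint are assumptions, not the paper's
# HSVM problem).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
d = 3
A = rng.standard_normal((d, d))
Q = A @ A.T                          # toy quadratic term  w^T Q w
c = rng.standard_normal(d)           # toy linear term     c^T w

# Homogenize with x = [1, w], lift to X ~ x x^T, and drop the rank-1 constraint.
Q_hom = np.zeros((d + 1, d + 1))
Q_hom[1:, 1:] = Q
Q_hom[0, 1:] = Q_hom[1:, 0] = c / 2

X = cp.Variable((d + 1, d + 1), symmetric=True)
constraints = [
    X >> 0,                          # PSD relaxation of X = x x^T
    X[0, 0] == 1,                    # homogenization
    cp.trace(X[1:, 1:]) == 1,        # lifted version of the non-convex ||w||^2 = 1
]
prob = cp.Problem(cp.Minimize(cp.trace(Q_hom @ X)), constraints)
prob.solve(solver=cp.SCS)

# If the optimal X is (numerically) rank one, its leading eigenvector recovers
# a globally optimal w; otherwise the relaxed value is still a valid lower bound.
eigvals, eigvecs = np.linalg.eigh(X.value)
x_hat = eigvecs[:, -1] * np.sqrt(eigvals[-1])
w_hat = x_hat[1:] / x_hat[0]
print("relaxed optimum:", prob.value, "\nrecovered w:", w_hat)
```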
1 Introduction
On the other hand, learning and optimization in hyperbolic spaces are typically more involved than in Euclidean spaces. Problems that are convex in Euclidean spaces become constrained non-convex problems in hyperbolic spaces. The hyperbolic Support Vector Machine (HSVM), as explored in recent studies [4, 5], exemplifies these challenges: it is a constrained non-convex program that has been solved predominantly via projected gradient descent. Attempts have been made to alleviate its non-convex nature through reparametrization [6], or by developing a hyperbolic perceptron algorithm that converges to a separator and is then fine-tuned with adversarial samples to approximate the large-margin solution [7]. To the best of our knowledge, these approaches are all grounded in gradient descent dynamics, which are highly sensitive to initialization and hyperparameters and cannot certify optimality.
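For concreteness, here is a schematic of the projected-gradient-descent baseline referenced above. The Minkowski inner product and the feasibility constraint ⟨w, w⟩_M ≥ ε follow a common Lorentz-model HSVM setup; the hinge-style surrogate loss, the fixed learning rate, and the projection heuristic are illustrative assumptions rather than the exact procedures of [4, 5].

```python
# Schematic projected gradient descent for an HSVM-style objective.
# The surrogate loss and the projection heuristic are placeholders for
# illustration, not the exact method of prior work.
import numpy as np

def minkowski(u, v):
    """Minkowski (Lorentzian) inner product: -u_0 v_0 + <u_1:, v_1:>."""
    return -u[0] * v[0] + u[1:] @ v[1:]

def project(w, eps=1e-3):
    """Push w back into {w : <w, w>_M >= eps} by shrinking its time-like
    coordinate (a simple heuristic; the degenerate case of a tiny spatial
    part is ignored for brevity)."""
    if minkowski(w, w) >= eps:
        return w
    w = w.copy()
    spatial_norm_sq = w[1:] @ w[1:]
    w[0] = np.sign(w[0]) * np.sqrt(max(spatial_norm_sq - eps, 0.0))
    return w

def hsvm_pgd(X, y, lr=1e-2, n_iter=500, C=1.0, seed=0):
    """Projected gradient descent on a hinge-style surrogate of the HSVM
    objective; X holds one point per row (Lorentz coordinates), y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    G = np.diag(np.r_[-1.0, np.ones(d - 1)])          # metric signature (-, +, ..., +)
    w = project(rng.standard_normal(d))               # result depends on this initialization
    for _ in range(n_iter):
        margins = y * (X @ G @ w)                     # signed Minkowski margins
        active = margins < 1.0                        # points violating the surrogate margin
        grad = G @ w - C * (X[active] * y[active, None]).sum(axis=0) @ G
        w = project(w - lr * grad)                    # gradient step, then project back
    return w
```

As the comments indicate, the returned separator depends on the random initialization and on the step size, which is exactly the sensitivity the convex relaxations in this paper aim to avoid.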
:::info
This paper is available on arXiv under a CC BY-SA 4.0 Deed (Attribution-ShareAlike 4.0 International) license.
:::
:::info
Authors:
(1) Sheng Yang, John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA (shengyang@g.harvard.edu);
(2) Peihan Liu, John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA (peihanliu@fas.harvard.edu);
(3) Cengiz Pehlevan, John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, Center for Brain Science, Harvard University, Cambridge, MA, and Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA (cpehlevan@seas.harvard.edu).
:::
