Authors:
(1) Matthieu Bulté, Department of Mathematical Sciences, University of Copenhagen, and Faculty of Business Administration and Economics, Bielefeld University;
(2) Helle Sørensen, Department of Mathematical Sciences, University of Copenhagen.
Table of Links
Abstract and 1. Introduction
2. Preliminaries
3. The GAR(1) Model
3.1. Model and Stationary Solution
3.2. Identifiability
4. Estimation of model parameters and 4.1. Fréchet mean
4.2. Concentration parameter
5. Testing for the absence of serial dependence
6. Numerical experiments
6.1. R with multiplicative noise
6.2. Univariate distributions with a density
6.3. SPD Matrices
7. Application
8. Acknowledgement
Appendix A. General results in Hadamard spaces
Appendix B. Proofs
References
Abstract
Random variables in metric spaces indexed by time and observed at equally spaced time points are receiving increased attention due to their broad applicability. However, the absence of inherent structure in metric spaces has resulted in a literature that is predominantly non-parametric and model-free. To address this gap in models for time series of random objects, we introduce an adaptation of the classical linear autoregressive model tailored for data lying in a Hadamard space. The parameters of interest in this model are the Fréchet mean and a concentration parameter, both of which we prove can be consistently estimated from data. Additionally, we propose a test statistic and establish its asymptotic normality, thereby enabling hypothesis testing for the absence of serial dependence. Finally, we introduce a bootstrap procedure to obtain critical values for the test statistic under the null hypothesis. The theoretical results for our method, including the convergence of the estimators as well as the size and power of the test, are illustrated through simulations, and the utility of the model is demonstrated by an analysis of a time series of consumer inflation expectations.
1. Introduction
Random variables in general metric spaces, also called random objects, have been receiving increasing attention in recent statistical research. The general metric space setup does not require any algebraic structure and relies only on the definition of a distance function. This allows the methods developed to be applied in domains ranging from classical setups to more complex use cases involving non-standard data. Examples include functional data (Ramsay and Silverman (2005)), data lying on Riemannian manifolds, correlation matrices and their application to fMRI data (Petersen and Müller (2019)), and adjacency matrices and social networks (Dubey and Müller (2020)), among others.
One example of particular interest, due to its wide range of applications, is that of data consisting of probability density functions. Probability distributions form a challenging space: it is functional, and thus infinite-dimensional, and also non-Euclidean because of the constraints characterizing density functions. This has led to a number of different approaches to studying these objects: they have been studied as the image of Hilbert spaces under transformations (Petersen and Müller (2016)), as Hilbert spaces with specific addition and scalar multiplication operators (van den Boogaart et al. (2014)), and as metric spaces equipped with different distances (Panaretos and Zemel (2020); Srivastava and Klassen (2016)). See Petersen et al. (2022) for a review of such methodologies. Distributions arise in many applications, for instance when considering the distribution of socioeconomic factors within a population, such as income (Yoshiyuki (2017)), fertility (Mazzuco and Scarpa (2015)) or mortality data (Chen et al. (2021)). They are also useful for describing belief distributions of economic factors (Meeks and Monti (2023)), allowing economic analyses to consider entire distributions rather than empirical expectations.
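As a concrete illustration of one such distance (a standard fact, not a construction specific to this paper): for distributions on the real line with finite second moment, the 2-Wasserstein distance equals the L2 distance between quantile functions, W2(P, Q)^2 = ∫_0^1 (F_P^{-1}(u) − F_Q^{-1}(u))^2 du. The minimal sketch below approximates this integral with a midpoint rule, using Gaussian quantile functions from Python's standard library (`statistics.NormalDist.inv_cdf`). The helper name `w2_quantile` and the grid size are hypothetical, illustrative choices. For two Gaussians the exact value is sqrt((m1 − m2)^2 + (s1 − s2)^2), which provides a check.

```python
import math
from statistics import NormalDist


def w2_quantile(q1, q2, n=10_000):
    """Approximate the 2-Wasserstein distance between two univariate
    distributions given their quantile functions q1, q2 on (0, 1).

    Hypothetical helper: uses the midpoint rule on a uniform grid of n cells,
    so the extreme tails (beyond 1/(2n) and 1 - 1/(2n)) are only approximated.
    """
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n  # midpoint of the i-th cell, strictly inside (0, 1)
        diff = q1(u) - q2(u)
        total += diff * diff
    return math.sqrt(total / n)


# Two Gaussian distributions, represented by their quantile functions.
p = NormalDist(mu=0.0, sigma=1.0)
q = NormalDist(mu=1.0, sigma=2.0)
d = w2_quantile(p.inv_cdf, q.inv_cdf)

# Closed form of W2 between two Gaussians on the real line.
exact = math.sqrt((0.0 - 1.0) ** 2 + (1.0 - 2.0) ** 2)
print(f"approx W2 = {d:.4f}, exact W2 = {exact:.4f}")
```

Working with quantile functions turns the distance computation into an ordinary L2 computation. This is also why univariate distributions under W2 behave like a convex subset of a Hilbert space.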
The study of random objects has recently received attention, with work on standard statistical questions (Dubey and Müller (2019, 2020); McCormack and Hoff (2023, 2022); Köstenberger and Stark (2023)) as well as various approaches to regression (Petersen and Müller (2019); Bulté and Sørensen (2023); Hanneke et al. (2021)). Since general metric spaces offer very little structure, part of the literature assumes additional structure on the space so that standard statistical quantities are well defined. This is usually done by assuming that the metric space is a Hadamard space; see for instance Sturm (2003) for a detailed review of results in Hadamard spaces and Bačák (2014) for the computation of Fréchet means in such spaces.
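To make the Fréchet mean in a Hadamard space concrete, the sketch below uses the space of symmetric positive-definite (SPD) matrices with the affine-invariant metric. This is a standard example of a Hadamard manifold. Its geodesic from A to B is γ(t) = A^{1/2}(A^{-1/2} B A^{-1/2})^t A^{1/2}. The sketch computes Sturm's inductive mean, m_1 = x_1 and m_k = γ_{m_{k−1}, x_k}(1/k), a scheme that needs nothing but geodesics. It is an illustrative choice, not necessarily the estimator or algorithm used in this paper. For a finite sample of non-commuting matrices the inductive mean only approximates the Fréchet mean, with convergence results holding as the sample grows. For commuting (for example diagonal) matrices it returns the Fréchet mean, the matrix geometric mean, exactly. The function names are hypothetical.

```python
import numpy as np


def sym_apply(a, fn):
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * fn(w)) @ v.T  # v @ diag(fn(w)) @ v.T


def spd_geodesic(a, b, t):
    """Point at time t on the affine-invariant geodesic from SPD matrix a to b."""
    a_half = sym_apply(a, np.sqrt)
    a_inv_half = sym_apply(a, lambda w: 1.0 / np.sqrt(w))
    inner = a_inv_half @ b @ a_inv_half
    inner = 0.5 * (inner + inner.T)  # re-symmetrize to absorb rounding error
    return a_half @ sym_apply(inner, lambda w: w ** t) @ a_half


def inductive_mean(samples):
    """Sturm's inductive mean: move a fraction 1/k of the way toward the k-th sample."""
    m = samples[0]
    for k, x in enumerate(samples[1:], start=2):
        m = spd_geodesic(m, x, 1.0 / k)
    return m


# Diagonal (hence commuting) SPD matrices: the Fréchet mean is the
# entrywise geometric mean of the diagonals, here diag(2, 2).
xs = [np.diag([1.0, 4.0]), np.diag([4.0, 1.0]), np.diag([2.0, 2.0])]
print(np.round(inductive_mean(xs), 6))
```

The appeal of this scheme in Hadamard spaces is that each update only requires evaluating a geodesic, which is exactly the structure the autoregressive model below relies on.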
In many of the applications mentioned above, the data may naturally be observed repeatedly at regular intervals, forming a time series. In this case, the observations need not be independent, and models and analyses require additional care to account for this dependence. Such work has mainly been carried out in a non-parametric setting under classical weak dependence assumptions, for instance for testing serial dependence (Jiang et al. (2023)) or for proving the consistency of the Fréchet mean estimator (Caner (2006)).
While this line of work can be broadly applied, it relies on non-parametric assumptions rather than proposing a specific model for the data-generating process. However, time series models have been developed for specific random objects by exploiting the structure of the space under study. One popular class is that of autoregressive models, which have been defined using the linear structure of functional spaces (Bosq (2000); Caponera and Marinucci (2021)) or by exploiting a tangent space structure (Zhu and Müller (2022); Xavier and Manton (2006); Ghodrati and Panaretos (2023); Zhu and Müller (2021)), to name only a few.
Inspired by existing autoregressive models, we propose an autoregressive model for random objects. Relying on an interpretation of each iteration of the linear autoregressive model as a noisy weighted average of the previous observation and the mean, we define a model parametrized by a mean and a concentration parameter. For this to be possible, we assume additional structure, requiring the space to be a Hadamard space, and exploit its geometry to define the time series iteration through geodesics. We develop the methodology and associated theory for estimation and hypothesis testing in this model. This includes estimators for the mean and the concentration parameter, as well as a test statistic for testing for the absence of autocorrelation, which corresponds to observing an i.i.d. sample.
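In Euclidean space the weighted-average interpretation can be checked directly. The classical AR(1) recursion X_t = μ + φ(X_{t−1} − μ) + ε_t can be rewritten as X_t = (1 − φ)μ + φX_{t−1} + ε_t. This is the point a fraction φ along the straight-line geodesic from μ to X_{t−1}, perturbed by noise. The minimal sketch below simulates both forms in R^d with NumPy and checks that they coincide. The function names, the Gaussian noise, and the parameter values are illustrative assumptions. The model of Section 3 replaces the line segment by a geodesic of the Hadamard space and uses the noise mechanism defined there, which this sketch does not attempt to reproduce.

```python
import numpy as np


def geodesic(a, b, t):
    """Euclidean geodesic (line segment) from a to b, evaluated at t in [0, 1]."""
    return (1.0 - t) * a + t * b


def simulate_geodesic_form(mu, phi, noise, x0):
    """X_t = point at fraction phi along the geodesic from mu to X_{t-1}, plus noise."""
    xs = [x0]
    for eps in noise:
        xs.append(geodesic(mu, xs[-1], phi) + eps)
    return np.array(xs)


def simulate_classical_form(mu, phi, noise, x0):
    """Textbook AR(1) form: X_t = mu + phi * (X_{t-1} - mu) + eps_t."""
    xs = [x0]
    for eps in noise:
        xs.append(mu + phi * (xs[-1] - mu) + eps)
    return np.array(xs)


rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
phi = 0.6
noise = rng.normal(scale=0.5, size=(20_000, 2))  # illustrative Gaussian noise

geo = simulate_geodesic_form(mu, phi, noise, x0=mu.copy())
cls = simulate_classical_form(mu, phi, noise, x0=mu.copy())
print(np.allclose(geo, cls))

# Lag-1 sample autocorrelation of the first coordinate; should be close to phi.
z = geo[:, 0] - geo[:, 0].mean()
acf1 = np.dot(z[1:], z[:-1]) / np.dot(z, z)
print(round(acf1, 3))
```

Here φ plays the role of the dependence strength. Setting φ = 0 removes all serial dependence and yields i.i.d. draws around μ, which is the Euclidean analogue of the null hypothesis described above.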
The paper is organized as follows. Section 2 presents concepts and results in Hadamard spaces that are used throughout the rest of the article. In Section 3, we present our autoregressive model, give a theorem providing a sufficient condition for the existence of a stationary solution of the iterated system of equations associated with the model, and prove the identifiability of the model parameters. In Section 4, we propose estimators for these parameters and prove convergence results for them. In Section 5, we propose a test of the null hypothesis of independence, based on a test statistic whose asymptotic behavior we characterize both under the null hypothesis and under the alternative of a non-zero concentration parameter. Finally, we illustrate our theoretical results in Section 6 with a numerical study.