Table of Links
Abstract and 1. Introduction
2. Background
2.1 Amortized Stochastic Variational Bayesian GPLVM
2.2 Encoding Domain Knowledge through Kernels
3. Our Model and 3.1 Pre-Processing and Likelihood
3.2 Encoder
4. Results and Discussion and 4.1 Each Component is Crucial to Modified Model Performance
4.2 Modified Model achieves Significant Improvements over Standard Bayesian GPLVM and is Comparable to SCVI
4.3 Consistency of Latent Space with Biological Factors
5. Conclusion, Acknowledgement, and References
A. Baseline Models
B. Experiment Details
C. Latent Space Metrics
D. Detailed Metrics
A BASELINE MODELS
A.1 SCVI
Proposed by Lopez et al. (2018), single-cell variational inference (scVI) is a variational autoencoder tuned for single-cell data that has been shown to match state-of-the-art methods in a variety of downstream tasks, including clustering and differential expression (Lopez et al., 2018; Luecken et al., 2022). Furthermore, due to its neural network structure, the model is scalable to large datasets. An overview of the model is presented in Figure 5.
We highlight several key components of the model that target phenomena commonly seen in single-cell data: (1) count data, (2) batch effect, and (3) library size normalization.
Count Data. As scRNA-seq raw count data are discrete, scVI adopts discrete likelihoods, such as the negative binomial, for its models. This allows the model to learn a latent space directly from the raw expression data without any conventional pre-processing pipeline. Note that the original paper uses the zero-inflated negative binomial likelihood for the main model to account for dropouts, in which a gene's expression in a cell goes undetected due to technical artifacts (Lopez et al., 2018; Luecken & Theis, 2019).
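For concreteness, the sketch below shows what such a zero-inflated negative binomial log-likelihood can look like; the mean/inverse-dispersion parameterization follows the text, but the function name and implementation are illustrative rather than scVI's actual code.

```python
import torch
from torch.distributions import NegativeBinomial

def zinb_log_likelihood(x, mu, r, pi):
    """Illustrative ZINB log-likelihood (not scVI's exact implementation).

    x: observed counts; mu: NB mean; r: inverse dispersion;
    pi: probability of a structural (dropout) zero.
    """
    # Parameterize the NB by logits so that its mean equals mu.
    nb = NegativeBinomial(total_count=r, logits=mu.log() - r.log())
    # A zero is either a structural zero (prob. pi) or an NB-generated zero.
    log_zero = torch.logaddexp(
        torch.log(pi),
        torch.log1p(-pi) + nb.log_prob(torch.zeros_like(x)),
    )
    log_nonzero = torch.log1p(-pi) + nb.log_prob(x)
    return torch.where(x == 0, log_zero, log_nonzero)
```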
Accounting for Batch Effects. scVI also models effects arising from different sampling batches by incorporating each cell's batch ID in both the encoding and decoding portions of the VAE. While batch information is supplied as input to the neural network encoder and decoder, how exactly the batch effects are modelled remains unclear.
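One common way to implement this conditioning, sketched below under our own naming (scVI's exact architecture is described in Lopez et al. (2018)), is to one-hot encode the batch ID and concatenate it to the network input:

```python
import torch
import torch.nn.functional as F

def encode_with_batch(encoder, x, batch_ids, n_batches):
    # One-hot encode each cell's batch ID and append it to the expression
    # vector, so the encoder can condition on the sampling batch.
    s = F.one_hot(batch_ids, num_classes=n_batches).float()
    return encoder(torch.cat([x, s], dim=-1))
```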
Library Size Normalization. The third component scVI accounts for is the difference in total gene expression count per cell, or library size. In the raw count data, each cell has a different total gene count, which may confound comparisons between cells and impact downstream analysis (Hie et al., 2020). As this difference in library size, or sequencing depth, may be a result of technical noise, scVI models a scaling factor ℓ as a stand-in for library size. This latent variable is modelled as log-normal, as done in Zappia et al. (2017), with its mean and variance learned by the neural network encoder from the raw counts and batch information. To avoid conflating the effects of the scaling factor with biological effects in the data, a softmax is applied to the output of the decoder before it is multiplied by the scaling factor to obtain the negative binomial likelihood mean.
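The resulting likelihood mean can be sketched as follows (function name ours): the softmax output acts as a proportion over genes, so the scaling factor alone carries the library-size information.

```python
import torch

def nb_mean_from_decoder(decoder_out, log_ell):
    # Softmax turns the raw decoder output into proportions over genes,
    # so library-size variation is absorbed by the scaling factor ell.
    rho = torch.softmax(decoder_out, dim=-1)
    return torch.exp(log_ell) * rho  # per-gene NB mean for each cell
```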
The corresponding loss term for each data point is given by the negative evidence lower bound

$$\mathcal{L}(x_n) = -\,\mathbb{E}_{q(z_n, \ell_n \mid x_n, s_n)}\!\left[\log p(x_n \mid z_n, \ell_n, s_n)\right] + \mathrm{KL}\!\left(q(z_n \mid x_n, s_n)\,\|\,p(z_n)\right) + \mathrm{KL}\!\left(q(\ell_n \mid x_n, s_n)\,\|\,p(\ell_n)\right),$$

where $s_n$ is the batch ID of cell $n$, and the parameters to be optimized are the weights of the neural network encoders and decoders as well as the inverse dispersion factor $r$ of the negative binomial likelihood. Because the loss decomposes into a sum of per-datapoint terms, the model can be trained with mini-batching (Hoffman et al., 2013).
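A minimal sketch of this decomposition is given below, assuming Normal variational posteriors and priors for both latents; the function name and signature are ours, not scVI's API.

```python
import torch
from torch.distributions import NegativeBinomial, kl_divergence

def scvi_style_loss(x, mu, r, q_z, p_z, q_l, p_l):
    """Illustrative negative ELBO, averaged over a mini-batch."""
    # NB parameterized by logits so that its mean equals mu.
    nb = NegativeBinomial(total_count=r, logits=mu.log() - r.log())
    recon = nb.log_prob(x).sum(-1)          # reconstruction term per cell
    kl_z = kl_divergence(q_z, p_z).sum(-1)  # KL for the latent representation z
    kl_l = kl_divergence(q_l, p_l).sum(-1)  # KL for the scaling factor ell
    # The objective is a sum of per-cell terms, so the mean over a random
    # mini-batch is an unbiased estimate of the full-data loss.
    return -(recon - kl_z - kl_l).mean()
```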
While scVI has been shown to perform well in a variety of downstream tasks (Lopez et al., 2018; Luecken et al., 2022), its complex architecture (as seen in Figure 5) and opaque incorporation of known nuisance variables like batch effects make the model and its inferences difficult to interpret.
A.2 LDVAE
In response to this lack of interpretability in the original scVI, Svensson et al. (2020) proposed a linear version of scVI, in which the neural network decoder is replaced with a linear mapping. In particular, the LDVAE generative model is defined as follows:

$$z_n \sim \mathcal{N}(0, I), \qquad x_{ng} \sim \mathrm{NegativeBinomial}\!\left(\ell_n \,\mathrm{softmax}(W z_n)_g,\; r_g\right),$$

where $W$ represents the linear mapping. Note that the mapping from latent space to data space is not completely linear, as a nonlinearity is introduced by the softmax function. Moreover, Svensson et al. explored applying a BatchNorm layer to the linearly decoded parameters and found that it matched or improved model performance in reconstruction error and in learning the latent space on a mouse embryo development dataset (Svensson et al., 2020; Cao et al., 2019). This BatchNorm layer is thus adopted in the LDVAE model, which further obscures a straightforward interpretation of the mapping defined by the decoder.
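A minimal PyTorch sketch of such a linearly decoded proportion vector, with the BatchNorm layer included (class name ours, not Svensson et al.'s code):

```python
import torch
import torch.nn as nn

class LinearDecoder(nn.Module):
    """Illustrative LDVAE-style decoder: linear map W, BatchNorm, softmax."""

    def __init__(self, n_latent, n_genes):
        super().__init__()
        self.W = nn.Linear(n_latent, n_genes, bias=False)
        self.bn = nn.BatchNorm1d(n_genes)

    def forward(self, z):
        # Each row of W.weight links one gene to the latent dimensions,
        # though the BatchNorm rescaling complicates reading W directly.
        return torch.softmax(self.bn(self.W(z)), dim=-1)
```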
Thus, while the LDVAE model allows a more interpretable mapping from the latent space to the data space than scVI, the use of a library-size surrogate and the opaque incorporation of batch information through neural networks make both models less interpretable.
Authors:
(1) Sarah Zhao, Department of Statistics, Stanford University, ([email protected]);
(2) Aditya Ravuri, Department of Computer Science, University of Cambridge ([email protected]);
(3) Vidhi Lalchand, Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard ([email protected]);
(4) Neil D. Lawrence, Department of Computer Science, University of Cambridge ([email protected]).