Ph.D. in Statistics
— scalable statistical inference methods.
- • Infinite-dimensional manifold Monte Carlo (Inf-GMC) is a class of geometric MCMC algorithms that are well defined on Hilbert spaces. They are robust under mesh refinement, that is, under increasing dimension, whereas the mixing times of traditional MCMC algorithms deteriorate as the mesh is refined and the finite-dimensional approximations become more accurate. They also exploit geometric information and have been proven to be more efficient than infinite-dimensional non-geometric methods such as preconditioned Crank-Nicolson (pCN). [codes] [read]
- • Spherical Hamiltonian Monte Carlo (SphHMC) is an HMC algorithm defined on spheres, originally proposed to handle vector norm constraints in sampling problems. The constrained domain is augmented and mapped to a sphere, where the sampler can move freely and generate samples that remain within the boundaries when mapped back to the original space. Passing across the equator corresponds to bouncing off the original boundaries. The constraints are therefore handled efficiently and implicitly. [demo] [demo2] [codes] [read]
- • Wormhole Hamiltonian Monte Carlo (WHMC) tackles the long-standing challenge of sampling from multimodal distributions by devising a metric that shortens the distance between modes and building a wormhole network that facilitates jumps among modes. The algorithm also uses the regeneration technique to adapt the resulting chain, which allows new modes to be searched for online. To find new modes rather than rediscover known ones, the algorithm optimizes a residual energy function obtained by removing a mixture of Gaussians centered at the known modes. [demo] [codes] [read]
- • Lagrangian Monte Carlo (LMC) proposes MCMC moves along geodesics using Lagrangian dynamics. It uses gradient information to explore the high-density regions of parameter space efficiently and uses metric information to adapt that exploration. It is therefore useful for sampling from non-Gaussian probability distributions with complicated geometric structure. LMC improves on Riemannian MCMC (Girolami and Calderhead, 2011) in numerical efficiency and stability. [demo] [codes] [read]
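The spherical augmentation described in the SphHMC item above can be illustrated with a small sketch. It assumes the simplest case, a unit 2-norm ball constraint, and uses plain great-circle motion with random velocities in place of a full HMC sampler. The function names (`ball_to_sphere`, `sphere_to_ball`, `great_circle_step`) are hypothetical and are not taken from the linked code.

```python
import numpy as np

def ball_to_sphere(theta):
    """Augment a point in the unit ball (||theta|| <= 1) with one auxiliary
    coordinate so that the result lies on the upper half of the unit sphere."""
    theta = np.asarray(theta, dtype=float)
    r2 = float(theta @ theta)
    if r2 > 1.0:
        raise ValueError("theta must lie inside the unit ball")
    return np.append(theta, np.sqrt(1.0 - r2))

def sphere_to_ball(theta_aug):
    """Map back to the original space by dropping the auxiliary coordinate."""
    return np.asarray(theta_aug)[:-1]

def great_circle_step(theta_aug, velocity, t):
    """Move along the great circle through theta_aug for time t.
    The velocity is first projected onto the tangent space at theta_aug."""
    v = velocity - (velocity @ theta_aug) * theta_aug
    speed = np.linalg.norm(v)
    if speed == 0.0:
        return theta_aug.copy()
    return theta_aug * np.cos(speed * t) + (v / speed) * np.sin(speed * t)

# Illustration only: random great-circle moves, not an HMC sampler.
rng = np.random.default_rng(0)
x = ball_to_sphere(np.array([0.3, -0.5]))
for _ in range(100):
    x = great_circle_step(x, rng.standard_normal(x.size), t=1.0)
    x /= np.linalg.norm(x)  # guard against round-off drift
    # Every point on the sphere maps back inside the unit ball.
    assert np.linalg.norm(sphere_to_ball(x)) <= 1.0 + 1e-12
```

In this sketch, no move on the sphere can leave the constrained region once it is mapped back. When the auxiliary coordinate changes sign (the path crosses the equator), the projected point touches the boundary of the ball and turns inward again.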
— Bayesian nonparametric models.
- • Modeling covariance (correlation) matrices is a challenging problem in statistics because of the high dimensionality and the positive-definiteness constraint. We propose a novel structured Bayesian framework for this problem based on the separation strategy of decomposing the covariance matrix into variance and correlation matrices. We extend it to dynamic cases and introduce unit-vector Gaussian process (GP) priors for modeling the evolution of correlation among multiple time series. The method demonstrates full flexibility in modeling complex dependence structures. [read]
- • The Bayesian approach to inverse problems relies on MCMC for posterior inference. The typical nonlinear concentration of posterior measure motivates the exploitation of local geometric information in the form of covariant gradients, metric tensors, Levi-Civita connections, and local geodesic flows. We propose to use a GP emulator to approximate these geometric quantities. The quality of the emulator depends on a carefully chosen design set of configuration points, which is adapted using statistical experimental design methods. [demo] [codes] [read]
- • The field of phylodynamics focuses on the problem of reconstructing population size dynamics over time using current genetic samples taken from the population of interest. This technique has been used extensively in many areas of biology but is particularly useful for studying the spread of quickly evolving infectious disease agents such as influenza virus. We studied the coalescent process model, equipped with a GP prior on the population size trajectory, which allows for nonparametric Bayesian estimation of population size dynamics. [codes] [read]
- • We developed a scalable semi-parametric Bayesian model to capture dependencies among multiple neurons by detecting their co-firing patterns over time, possibly with some time lag. The nonparametric component (GP) provides a flexible framework for modeling the underlying firing rates, and the parametric component (copula) allows us to make inferences regarding both contemporaneous and lagged relationships among neurons. [read]
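The separation strategy mentioned in the covariance modeling item above can be sketched in a few lines. This is a minimal static illustration, not the dynamic model itself. It assumes that the correlation matrix is built from a lower-triangular factor whose rows are unit vectors, which is one way unit vectors can enter such a construction. The function names are hypothetical.

```python
import numpy as np

def correlation_from_unit_rows(L_raw):
    """Normalize each row of a lower-triangular matrix to unit length and
    return P = L L^T, which then has ones on its diagonal."""
    L = np.tril(np.asarray(L_raw, dtype=float))
    L = L / np.linalg.norm(L, axis=1, keepdims=True)
    return L @ L.T

def covariance_from_separation(sigma, P):
    """Combine standard deviations and a correlation matrix:
    Sigma = diag(sigma) P diag(sigma)."""
    sigma = np.asarray(sigma, dtype=float)
    return np.diag(sigma) @ P @ np.diag(sigma)

def separate(Sigma):
    """Recover standard deviations and the correlation matrix from Sigma."""
    sigma = np.sqrt(np.diag(Sigma))
    return sigma, Sigma / np.outer(sigma, sigma)

# Arbitrary illustrative values (assumptions, not data from any study).
L_raw = np.array([[1.0, 0.0, 0.0],
                  [0.5, 1.0, 0.0],
                  [-0.3, 0.4, 1.0]])
sigma = np.array([0.5, 1.0, 2.0])

P = correlation_from_unit_rows(L_raw)
Sigma = covariance_from_separation(sigma, P)
```

Because each row of the factor has unit length, the diagonal of `P` is one by construction, and a nonzero diagonal in the factor keeps `Sigma` positive definite. The constraint on the covariance matrix is thus replaced by unit-norm constraints on vectors.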
— Bayesian uncertainty quantification.
- • By analyzing local field potential data collected from the hippocampus of rats performing a complex memory task, we elucidate how memory and cognition arise from functional interactions among brain regions by modeling their dynamic connectivity. The task involves repeated presentations of a sequence of odors and requires rats to identify whether each odor is ‘in sequence’ (e.g. ABCDE) or ‘out of sequence’ (e.g. ABDDE). Our sensitive approach reveals changes in neural activity associated with such cognitive processes. [demo] [read]
- • Climate projections continue to be marred by large uncertainties, which originate in processes that need to be parameterized, such as clouds, convection, and ecosystems. But rapid progress is now within reach. New computational tools and methods from data assimilation and machine learning make it possible to integrate global observations and local high-resolution simulations in an Earth system model (ESM) that systematically learns from both. We propose a blueprint for such an ESM. [read]
- • Bayesian inverse problems often involve sampling posterior distributions on infinite-dimensional function spaces. We combine geometric methods on a finite-dimensional subspace with mesh-independent infinite-dimensional approaches to solve a 2-D laminar jet flow inverse problem: inferring the inlet velocity profile (red line) from sparse observations (blue dots), shown in the left panel; a typical nonlinear forward solution is shown in the right panel. This is a complex inverse problem because of the nonlinearity of the forward PDE and the sparsity of the observations. [read]
- • Infectious diseases exert a large and in many contexts growing burden on human health, but violate most of the assumptions of classical epidemiological statistics and hence require a mathematically sophisticated approach. We show how a Bayesian approach to the inverse problem together with modern Markov chain Monte Carlo algorithms based on information geometry can yield insights into the disease dynamics of two of the most prevalent human pathogens—influenza and norovirus—as well as Ebola virus disease. [read]