Interplays between Numerical Approximation, Gaussian Process Regression and Learning.
Broadly speaking, my current/recent research concerns the exploration, from a Game Theoretic perspective, of interplays between Numerical Approximation, Gaussian Process Regression and Learning, with applications to Numerical Homogenization, Operator Adapted Wavelets, Fast Solvers, Dense Kernel Matrices, Machine Learning and Uncertainty Quantification. For more see the following [ paper ] and the following [ slides ] (no videos) [ slides ] (with videos).
Learning can be formulated as solving an interpolation problem. Although Kriging offers a solution, it does not scale well with the number of training data points and it requires the prior specification of a kernel. Kernel Flow (KF) is an algorithm that learns kernels by simulating a data-driven dynamical system (a flow in the input space). The motivation for developing this algorithm is that it is amenable to some degree of analysis (which could help in the formulation of a rigorous theory for deep learning) and is scalable to large datasets. When applied to datasets such as MNIST or fashion-MNIST, KF produces a kernel capable of generalization from one sample per class. The gifs below and above show the application of KF to the fashion-MNIST dataset and to the swiss roll cheesecake classification problem. The bottom left image shows the instantaneous flow generated by KF; the bottom right image shows the time-averaged flow. For more see the following [ slides ] (no videos) [ slides ] (with videos), the following [ paper ] and the following [ videos ].
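The heart of KF is a loss measuring how much interpolation accuracy is lost when half of a batch is removed: rho = 1 - ||u_c||^2 / ||u||^2, where u interpolates the full batch and u_c a random half, with norms taken in the RKHS of the current kernel. The sketch below is my own minimal illustration (the RBF parametrization, the toy dataset and the crude scan over gamma standing in for gradient descent are all assumptions, not the implementation from the paper):

```python
import numpy as np

np.random.seed(0)

def rbf(X, Y, gamma):
    # Gaussian (RBF) kernel matrix between the rows of X and Y
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def rho(X, y, gamma, reg=1e-8):
    # KF loss on one batch: rho = 1 - ||u_c||^2_K / ||u||^2_K, where u
    # interpolates the full batch and u_c a random half of it
    n = len(X)
    idx = np.random.choice(n, n // 2, replace=False)
    K = rbf(X, X, gamma) + reg * np.eye(n)
    Kc = rbf(X[idx], X[idx], gamma) + reg * np.eye(n // 2)
    return 1.0 - (y[idx] @ np.linalg.solve(Kc, y[idx])) / (y @ np.linalg.solve(K, y))

X = np.random.rand(64, 2)                        # hypothetical toy dataset
y = np.sin(4 * X[:, 0]) + np.cos(3 * X[:, 1])
# KF follows the (stochastic) gradient of rho; a scan over gamma stands in here
losses = {g: np.mean([rho(X, y, g) for _ in range(30)]) for g in (0.1, 1.0, 10.0)}
best_gamma = min(losses, key=losses.get)
```

Since u_c also interpolates the half-batch that u interpolates, ||u_c|| <= ||u|| and rho always lies in [0, 1]; driving it toward 0 selects kernels for which half the data nearly suffices.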
Operator adapted wavelets.
The application of the game theoretic approach to the problem of approximating solutions of linear operators also leads to the identification of operator adapted wavelets (gamblets) satisfying three properties: (1) scale orthogonality (in the operator scalar product, which leads to a block-diagonalization of the operator), (2) well-conditioned linear systems (blocks of uniformly bounded condition numbers), and (3) localization or exponential decay (which leads to sparse blocks). These gamblets can also be interpreted as Wannier functions, i.e. linear combinations of eigenfunctions concentrated around given eigenvalues that also remain concentrated in space. They are identified by conditioning the Gaussian field emerging from the game described above with respect to a hierarchy of non-adapted wavelets (e.g. Haar). Although wavelets satisfying one of these three conditions were known before, this simple Gaussian Process Regression approach leads to the first identification of wavelets satisfying all three properties.
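In compact form (the notation below is generic and the constants are omitted; the precise statements are in the gamblet papers), writing $\langle u, v \rangle_{\mathcal{L}} = \int u \, \mathcal{L} v$ for the operator scalar product, the three properties read

```latex
\langle \psi_i^{(k)}, \psi_j^{(l)} \rangle_{\mathcal{L}} = 0 \ \ (k \neq l), \qquad
\operatorname{cond}\!\big( \langle \psi_i^{(k)}, \psi_j^{(k)} \rangle_{\mathcal{L}} \big) \le C, \qquad
|\psi_i^{(k)}(x)| \le C \, e^{-d(x, x_i)/(C h_k)},
```

and the gamblets themselves arise as conditional expectations $\psi_i^{(k)} = \mathbb{E}\big[\xi \mid [\phi_j^{(k)}, \xi] = \delta_{ij}\big]$ of a Gaussian field $\xi \sim \mathcal{N}(0, \mathcal{L}^{-1})$ conditioned on the hierarchy of non-adapted wavelets $\phi_j^{(k)}$.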
Numerical homogenization can be formalized as the problem of identifying a finite number of basis functions (for a given operator) that are (1) as accurate as possible (in approximating the solution space of that operator) and (2) as localized as possible. This problem is nontrivial because (1) and (2) are conflicting, and its solution provides a generalization of results from classical homogenization obtained under assumptions such as scale separation or ergodicity. The adversarial game approach to numerical approximation provides a simple solution to this problem, and the resulting basis functions can be identified as elementary bets for playing a game whose task is the recovery of an unknown function from partial measurements. The optimal strategy for this game is a Gaussian field whose conditioning (on elementary measurements) produces these elementary bets (gamblets)/basis functions. This interplay between Gaussian Process Regression and Numerical Approximation provides the first rigorous proof of the screening effect with quantitative exponential decay estimates.
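In a discretized setting the conditioning step is a few lines of linear algebra: for a fine-grid operator A, the field xi ~ N(0, A^{-1}) conditioned on the measurements Phi xi = e_i yields the basis functions Psi = A^{-1} Phi^T (Phi A^{-1} Phi^T)^{-1}. The sketch below is a toy illustration (the 1D rough-coefficient operator, the local-averaging measurements and all sizes are my own choices, not the setup of the papers):

```python
import numpy as np

np.random.seed(1)
n, m = 120, 12                       # fine grid size, number of measurements
h = 1.0 / (n + 1)
a = np.exp(np.random.randn(n + 1))   # rough conductivity at the cell interfaces
# stiffness matrix of -d/dx (a du/dx) with Dirichlet BCs (standard FD assembly)
A = np.zeros((n, n))
for i in range(n):
    A[i, i] = (a[i] + a[i + 1]) / h**2
    if i + 1 < n:
        A[i, i + 1] = A[i + 1, i] = -a[i + 1] / h**2
# measurement functionals: local averages over m disjoint blocks of the grid
Phi = np.kron(np.eye(m), np.ones(n // m)) / (n // m)
Ainv = np.linalg.inv(A)
# gamblets: column i is E[xi | Phi xi = e_i] for xi ~ N(0, A^{-1})
Psi = Ainv @ Phi.T @ np.linalg.inv(Phi @ Ainv @ Phi.T)
```

By construction Phi @ Psi is the identity (basis function i "bets" 1 on measurement i and 0 on the others), and the exponential decay of the columns of Psi away from their blocks is the discrete form of the screening effect.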
Game theoretic approach to numerical approximation and algorithm design.
The three main approaches to uncertainty quantification (worst case, Bayesian, and adversarial game/decision theory) can be translated into three main approaches to numerical analysis and algorithm design. The worst case approach is the classical approach of numerical analysis. The Bayesian approach is the average case approach of Information Based Complexity and Bayesian numerical analysis, which is currently reemerging as probabilistic numerics. The game theoretic approach stems from the observation that to compute fast one must compute with partial information over hierarchies of levels of complexity. This process of computing with partial information can be turned into a process of playing repeated adversarial games against the missing information. The optimal (mixed/randomized) strategies for playing such games can be used to automate the process of discovery in numerical analysis and algorithm design (i.e. to learn numerical solvers through a machine learning approach to approximating the solutions of ODEs/PDEs).
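Schematically (in generic notation, with the choice of norm and measurement operator left unspecified), one round of such a game is a minimax recovery problem: approximate an unknown $u$ from the partial measurements $\Phi u$,

```latex
\min_{v} \; \max_{\|u\| \le 1} \; \| u - v(\Phi u) \|,
```

and, under appropriate quadratic-norm assumptions, the optimal mixed strategy is a Gaussian field $\xi$: the optimal estimator is the conditional expectation $v^\star(y) = \mathbb{E}[\xi \mid \Phi \xi = y]$, i.e. exactly the Gaussian process regressor of the previous sections.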
Stochastic and Multiscale analysis.
My older research work concerns (1) the study of anomalous diffusion in systems characterized by an infinite number of scales [ paper ], (2) homogenization and inverse homogenization with a continuum of scales, (3) numerical homogenization [ paper ], (4) the geometric integration of stochastic Hamiltonian systems [ paper ], (5) the structure preserving integration of (possibly stochastic) multiscale systems with hidden slow variables [ paper ], (6) super-resonance and control via parametric resonance, and (7) the proof of Davies' and periodization conjectures in homogenization [ paper ] [ paper ].
Stochastic and derandomization calculus in infinite dimensional spaces.
The worst case, Bayesian and decision theory approaches to Uncertainty Quantification require the development of a form of calculus enabling the manipulation of infinite dimensional information structures, involving the resolution of optimization (min, max and minimax) problems over spaces of measures and functions, and over spaces of measures over measures and functions. Another motivation for the development of this calculus is its potential application to (1) the design of fast, robust and scalable numerical solvers and the analysis of partial differential equations, (2) model form uncertainty quantification and design, and (3) model reduction. Although these applications may appear, at first glance, dissimilar, they are all based on the efficient processing of incomplete information with limited computational resources (i.e., complexity management capability constraints): (1) fast and robust computation requires computation with partial information (this concept forms the core of Information Based Complexity, where it is also augmented by concepts of contaminated and priced information associated with, for example, truncation errors and the cost of numerical operations), (2) model form UQ and design require the management and processing of epistemic uncertainties and limited data, and (3) model reduction requires the approximation of the full state of a complex system through operations performed on a few (coarse/reduced) variables. Examples of applications include Optimal Uncertainty Quantification and the reduction of infinite dimensional optimization problems over measures and functions to finite dimensional ones, the identification of new Selberg identities [ paper ], the quantification of the robustness of Bayesian inference under finite information, and the identification of Gaussian fields as optimal mixed strategies for the game theoretic approach to numerical approximation.
The work of my group in this area has been focused on adversarial approaches to Uncertainty Quantification (worst case min max and game theoretic minimax). The following [ slides ] (Caltech Nov. 2013) provide a high level description of this research. The following extract from this [ paper ] describes our general vision (based on Decision/Game theory). During the past century, the need to solve large complex problems in applications such as fluid dynamics, neutron transport or ballistic prediction drove the parallel development of computers and numerical methods for solving ODEs and PDEs. It is now clear that this development led to a paradigm shift. Before: each new PDE required the development of new theoretical methods and the employment of large teams of mathematicians and physicists; in most cases, information on solutions was only qualitative and based on general analytical bounds on fundamental solutions. After: mathematical analysis and computer science worked in synergy to give birth to robust numerical methods (such as finite element methods) capable of solving a large spectrum of PDEs without requiring the level of expertise of an A. L. Cauchy or the level of insight of a R. P. Feynman. This transformation can be traced back to sophisticated calculations performed by arrays of human computers organized as parallel clusters, such as in the pioneering work of Lewis Fry Richardson, who in 1922 had a room full of clerks attempt to solve finite-difference equations for the purposes of weather forecasting, and the 1947 paper by John von Neumann and Herman Goldstine on Numerical Inverting of Matrices of High Order. Although Richardson's predictions failed due to the use of unfiltered data/initial conditions/equations and large time-steps not satisfying the CFL stability condition, his vision was shared by von Neumann in his proposal of the Meteorology Research Project to the U.S.
Navy in 1946, qualified by Platzman as "perhaps the most visionary prospectus for numerical weather prediction since the publication of Richardson's book a quarter-century earlier." The past century has also seen a steady increase in the need of estimating and predicting complex systems and making (possibly critical) decisions with limited information. Although computers have made possible the numerical evaluation of sophisticated statistical models, these models are still designed by humans through the employment of multi-disciplinary teams of physicists, computer scientists and statisticians. Contrary to the original human computers (such as the ones pioneered by L. F. Richardson or overseen by R. P. Feynman at Los Alamos), these human teams do not follow a specific algorithm (such as the one envisioned in Richardson's Forecast Factory, where 64,000 human computers would have been working in parallel and at high speed to compute world weather charts), because there is currently no known recipe or algorithm for dividing the design of a statistical model into a sequence of arithmetic operations. Furthermore, while human computers were given a specific PDE or ODE to solve, these human teams are not given a well-posed problem with a well defined notion of solution.
As a consequence, different human teams come up with different solutions to the design of the statistical model, along with different estimates on uncertainties. Indeed, enabling computers to think as humans do when faced with uncertainty is challenging in several major ways: (1) there is currently no known recipe or algorithm for dividing the design of a statistical model into a sequence of arithmetic operations; (2) formulating the search for an optimal statistical estimator/model as a well-posed problem is not obvious when information on the system of interest is incomplete and comes in the form of a complex combination of sample data, partial knowledge of constitutive relations and a limited description of the distribution of input random variables; (3) the space of admissible scenarios, along with the space of relevant information, assumptions, and/or beliefs, tends to be infinite dimensional, whereas calculus on a computer is necessarily discrete and finite. One purpose of UQ is to explore and develop the foundations of a rigorous/rational framework for the scientific computation of optimal statistical estimators/models and the optimal quantification of uncertainties for complex systems. The emerging questions lead to many challenging problems in Decision Theory, Machine Learning, Bayesian Inference, Stochastic Optimization, Robust Optimization, and Information Based Complexity, the most fundamental of these being the simultaneous emphasis on computation and performance, as in the Machine Learning framework initiated by Valiant. For more see the following [ slides ], the following [ paper 1, paper 2 ] and the following interview: HPC Wire: The Masters of Uncertainty (09/2013).
“Technology, in common with many other activities, tends toward avoidance of risks by investors. Uncertainty is ruled out if possible. People generally prefer the predictable. Few recognize how destructive this can be, how it imposes severe limits on variability and thus makes whole populations fatally vulnerable to the shocking ways our universe can throw the dice.” Frank Herbert (Heretics of Dune)
Brittleness/Robustness of machine learning algorithms
Under FA9550-12-1-0389 we derived brittleness (non-robustness) results for Bayesian estimators (see the following blog posts: [09/2013], [01/2015], [01/2016]). Since this lack of robustness is caused by a mechanism that (1) is inherent to doing inference in a continuous world with finite information, and (2) implies that consistency and robustness are conflicting requirements, we predicted that deep/machine learning algorithms could be non-robust and could lead to increased confidence in incorrect solutions (see the [ talk ] by Mike McKerns at SciPy 2013, published in July 2013). Google engineers who were present at the talk tested these predictions for neural networks and observed (Szegedy et al., Dec 2013, Intriguing properties of neural networks) the non-robustness of these algorithms to adversarial examples. This area has grown into the field known as adversarial machine learning, which studies (1) how those algorithms can be attacked by exploiting their brittleness and (2) how to protect them against those attacks. Addressing these vulnerabilities is now recognized as critical to the safety of Machine Learning Algorithms.
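The adversarial-example phenomenon can be illustrated on a toy model (everything below, the weights, the input and the step size, is hypothetical, and the sign-of-gradient step mimics the FGSM attack of the later adversarial ML literature rather than any method from our papers): a small perturbation aligned with the gradient of the decision function flips a confidently classified point.

```python
import numpy as np

# hypothetical trained linear classifier: predict sign(w @ x + b)
w = np.array([2.0, -1.0])
b = 0.1
x = np.array([0.3, 0.4])        # correctly classified: w @ x + b = 0.3 > 0
eps = 0.35                      # per-coordinate perturbation budget

# FGSM-style step: move each coordinate by eps against the margin,
# i.e. along sign of the gradient of the loss with respect to x
x_adv = x - eps * np.sign(w)

margin = w @ x + b              # positive: original point classified +1
margin_adv = w @ x_adv + b      # negative: decision flipped by the attack
```

Each coordinate of the input moved by only eps, yet the classification flips: the perturbation concentrates all of its budget on the directions the decision function is most sensitive to.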
Kernel mode decomposition and programmable/interpretable regression networks
Kernel mode decomposition networks (KMDNets) are interpretable regression networks for pattern recognition. KMDNets are programmable and amenable to analysis, whereas artificial neural networks are trainable but neither interpretable nor programmable. The programming of KMDNets is achieved by assembling elementary modules decomposing and recomposing kernels and data. These elementary modules are repeated across levels of abstraction and interpreted from the equivalent perspectives of optimal recovery, game theory and Gaussian process regression (GPR). The mode decomposition module produces an approximation of a vector (whose entries are elements of linear subspaces of a common Hilbert space) from the observation of the sum of its entries. The second module recomposes modes and kernels based on alignments between the signal and the recovered modes. As a prototypical application, KMDNets have been programmed to recover (to near-machine precision) the modes of a (possibly noisy) signal when those modes are unknown periodic functions slowly modulated in amplitude and frequency (when the periodic waveforms are trigonometric, these modes are known as intrinsic mode functions in the setting of empirical mode decomposition). For more see the following [ paper ] and the following [ slides ].
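A stripped-down version of the mode decomposition module can be sketched as follows (the fixed frequencies, grid and rank-2 kernels below are toy assumptions of mine; in KMDNets the waveforms and their modulations are unknown and recovered by the network): given kernels K_1, K_2 modeling the two modes, the GPR estimate of mode i from the observed sum v is v_i ≈ K_i (K_1 + K_2 + sigma I)^{-1} v.

```python
import numpy as np

t = np.arange(200) / 200
v1 = np.sin(2 * np.pi * 5 * t)           # mode 1: frequency 5
v2 = 0.5 * np.sin(2 * np.pi * 40 * t)    # mode 2: frequency 40
v = v1 + v2                              # only the sum is observed

def mode_kernel(t, freq):
    # rank-2 kernel whose range is spanned by sin/cos at the given frequency
    w = 2 * np.pi * freq
    return np.cos(w * (t[:, None] - t[None, :]))

K1, K2 = mode_kernel(t, 5), mode_kernel(t, 40)
G = K1 + K2 + 1e-6 * np.eye(len(t))      # small nugget for invertibility

v1_hat = K1 @ np.linalg.solve(G, v)      # GPR estimate of mode 1
v2_hat = K2 @ np.linalg.solve(G, v)      # GPR estimate of mode 2
err = max(np.abs(v1_hat - v1).max(), np.abs(v2_hat - v2).max())
```

On this grid the two frequencies are exactly orthogonal, so the decomposition is recovered essentially to the precision of the nugget; the full networks replace these fixed kernels with learned, modulated ones.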