Operator adapted wavelets. The application of the game theoretic approach to the problem of approximating solutions of linear operators also leads to the identification of operator-adapted wavelets (gamblets) satisfying three properties: (1) scale orthogonality in the operator scalar product, which block-diagonalizes the operator; (2) well-conditioned linear systems, i.e. blocks with uniformly bounded condition numbers; (3) localization, i.e. exponential decay, which makes the blocks sparse. These gamblets can also be interpreted as Wannier functions: linear combinations of eigenfunctions concentrated around given eigenvalues that also remain concentrated in space. They are identified by conditioning the Gaussian field emerging from the game described above with respect to a hierarchy of non-adapted wavelets (e.g. Haar). Although wavelets satisfying one of these three conditions were known before, this simple Gaussian Process Regression approach leads to the first construction satisfying all three simultaneously.
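As a minimal one-level sketch (the actual construction in the papers is hierarchical; the grid size, cell count and Haar-type measurements below are illustrative assumptions), conditioning the Gaussian field ξ ~ N(0, A⁻¹) on cell-average measurements Φξ = eᵢ yields the basis Ψ = A⁻¹Φᵀ(ΦA⁻¹Φᵀ)⁻¹, whose columns exhibit the three properties:

```python
import numpy as np

n, m = 128, 8                      # fine grid size and number of coarse cells (illustrative)
h = 1.0 / (n + 1)

# Stiffness matrix of -d^2/dx^2 with Dirichlet BCs: a stand-in for a
# general elliptic operator A.
A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

# Haar-type (non-adapted) measurements: indicator of each coarse cell.
Phi = np.zeros((m, n))
for i in range(m):
    Phi[i, i * (n // m):(i + 1) * (n // m)] = 1.0

# Gamblets: conditional expectations of xi ~ N(0, A^{-1}) given Phi xi = e_i.
G_PhiT = np.linalg.solve(A, Phi.T)            # A^{-1} Phi^T
Psi = G_PhiT @ np.linalg.inv(Phi @ G_PhiT)    # columns are the gamblets

# Each gamblet is a unit "bet" on one measurement: Phi Psi = I.
print(np.allclose(Phi @ Psi, np.eye(m)))

# Operator-adapted stiffness: Psi^T A Psi equals (Phi A^{-1} Phi^T)^{-1},
# the uniformly well-conditioned coarse block of the operator.
print(np.allclose(Psi.T @ A @ Psi, np.linalg.inv(Phi @ G_PhiT)))

# Localization: the first gamblet decays away from its own cell.
psi0 = np.abs(Psi[:, 0])
print(psi0[:16].max() / psi0[-16:].max())     # large ratio
```

The hierarchical version applies the same conditioning across nested measurement levels, producing the multiresolution (wavelet) decomposition.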

Game theoretic approach to numerical approximation and algorithm design.

The three main approaches to uncertainty quantification (worst case, Bayesian, adversarial game/decision theory) can be translated into three main approaches to numerical analysis and algorithm design. The worst-case approach is the classical approach of numerical analysis. The Bayesian approach is the average-case approach of Information Based Complexity and Bayesian numerical analysis, which is currently reemerging as probabilistic numerics. The game theoretic approach stems from the observation that to compute fast one must compute with partial information over hierarchies of levels of complexity. This process of computing with partial information can be turned into a process of playing repeated adversarial games against the missing information. The optimal (mixed/randomized) strategies for playing such games can be used to automate the process of discovery in numerical analysis and algorithm design, i.e. to learn numerical solvers through a machine learning approach to approximating the solutions of ODEs/PDEs.
Fast solvers, theory and algorithms. Scalable linear solvers play an important role in science and industry and are characterized by a large degree of diversity. Although, as affirmed by Sard, no numerical approximation method can be universal, one may wonder whether some degree of universality could be achieved for a large class of linear operators. The game theoretic approach leads to scalable numerical solvers, of provable near-linear complexity, for arbitrary linear elliptic operators on Sobolev spaces and for dense kernel matrices. The complexity of these direct solvers is O(N log^(2d+1) N), which is the state of the art for PDEs with rough coefficients. These solvers also perform an approximate PCA/eigen-subspace decomposition corresponding to a diagonalization into uniformly well-conditioned and sparse blocks.

Interplays between Numerical Approximation, Gaussian Process Regression and Learning.

Broadly speaking, my current/recent research concerns the exploration, from a Game Theoretic perspective, of interplays between Numerical Approximation, Gaussian Process Regression and Learning, with applications to Numerical Homogenization, Operator Adapted Wavelets, Fast Solvers, Dense Kernel Matrices and Machine Learning. For more see the following [ slides ].

Kernel Flows

Learning can be formulated as solving an interpolation problem. Although Kriging offers a solution, it does not scale well with the number of training data points and it requires the prior specification of a kernel. Kernel Flow (KF) is an algorithm that learns kernels by simulating a data-driven dynamical system (a flow in the input space). The motivation for developing this algorithm is that it is amenable to some degree of analysis (which could help in the formulation of a rigorous theory for deep learning) and is scalable to large datasets. When applied to datasets such as MNIST or Fashion-MNIST, KF produces a kernel capable of generalizing from one sample per class. The gifs below and above show the application of KF to the Fashion-MNIST dataset and to the Swiss roll cheesecake classification problem. The bottom-left image shows the instantaneous flow generated by KF; the bottom-right image shows the time-averaged flow. For more see the following [ slides ], the following [ paper ] and the following [ videos ].
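A minimal sketch of the parametric variant of KF (the kernel family, dataset, step sizes and finite-difference gradient below are illustrative assumptions, not the paper's setup): at each step half of the data is dropped at random, and the kernel's lengthscale is moved to decrease the relative interpolation error ρ = 1 − yᶜᵀKᶜ⁻¹yᶜ / yᵀK⁻¹y incurred by interpolating from the retained half:

```python
import numpy as np

rng = np.random.default_rng(0)

def K(X, Y, theta):
    """Gaussian (RBF) kernel with lengthscale theta."""
    return np.exp(-np.subtract.outer(X, Y) ** 2 / (2 * theta ** 2))

def rho(theta, X, y, c):
    """KF loss: relative error of interpolating from the half-sample c."""
    nug = 1e-8                                 # small nugget for stability
    den = y @ np.linalg.solve(K(X, X, theta) + nug * np.eye(len(X)), y)
    num = y[c] @ np.linalg.solve(K(X[c], X[c], theta) + nug * np.eye(len(c)), y[c])
    return 1.0 - num / den

X = rng.uniform(-3, 3, 40)                     # toy 1D regression data
y = np.sin(2 * X)

theta, lr, eps = 2.0, 0.2, 1e-4                # illustrative initial lengthscale / steps
for _ in range(300):
    c = rng.choice(len(X), len(X) // 2, replace=False)
    grad = (rho(theta + eps, X, y, c) - rho(theta - eps, X, y, c)) / (2 * eps)
    theta = max(theta - lr * grad, 1e-2)       # keep the lengthscale positive
print(theta)
```

The non-parametric version of KF instead moves the data points themselves, which is the flow in input space shown in the animations.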

Numerical homogenization.

Numerical homogenization can be formalized as the problem of identifying a finite number of basis functions (for a given operator) that are (1) as accurate as possible in approximating the solution space of that operator and (2) as localized as possible. This problem is non-trivial because (1) and (2) are conflicting, and its solution generalizes results from classical homogenization obtained under assumptions such as scale separation or ergodicity. The adversarial game approach to numerical approximation provides a simple solution to this problem: the resulting basis functions can be identified as elementary bets for playing a game whose task is to recover an unknown function from partial measurements. The optimal strategy for this game is a Gaussian field whose conditioning (on elementary measurements) produces these elementary bets (gamblets)/basis functions. This interplay between Gaussian Process Regression and Numerical Approximation provides the first rigorous proof of the screening effect with quantitative exponential decay estimates.
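The accuracy requirement (1) can be illustrated numerically (a sketch under illustrative assumptions: 1D, a rough high-contrast random coefficient, cell-average measurements): Galerkin projection onto the basis obtained by conditioning the field N(0, A⁻¹) on cell averages recovers the solution far more accurately than projection onto generic localized (piecewise linear) elements.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 127, 8                       # fine interior nodes, coarse cells (illustrative)
q = (n + 1) // m                    # fine intervals per coarse cell
h = 1.0 / (n + 1)

# Stiffness matrix of -(a(x) u')' with a rough, high-contrast coefficient.
a = np.exp(rng.uniform(np.log(1e-2), 0.0, n + 1))
A = (np.diag(a[:-1] + a[1:]) - np.diag(a[1:-1], 1) - np.diag(a[1:-1], -1)) / h**2

f = np.ones(n)
u = np.linalg.solve(A, f)           # reference fine-scale solution

def energy_err(v):
    return np.sqrt((u - v) @ A @ (u - v))

# Generic localized basis: piecewise linear hats on the coarse nodes.
P = np.zeros((n, m - 1))
for j in range(1, m):
    P[:, j - 1] = np.maximum(0.0, 1.0 - np.abs(np.arange(1, n + 1) - j * q) / q)
u_hat = P @ np.linalg.solve(P.T @ A @ P, P.T @ f)

# Operator-adapted basis: condition the field N(0, A^{-1}) on cell averages.
Phi = np.zeros((m, n))
for k in range(m):
    Phi[k, k * q:(k + 1) * q] = 1.0
Psi = np.linalg.solve(A, Phi.T) @ np.linalg.inv(Phi @ np.linalg.solve(A, Phi.T))
u_gam = Psi @ np.linalg.solve(Psi.T @ A @ Psi, Psi.T @ f)

print(energy_err(u_hat), energy_err(u_gam))   # adapted basis is far more accurate
```

(For this particular right-hand side, which lies in the span of the measurement functions, the adapted-basis projection is exact up to roundoff; hat functions carry an O(1) energy error because they ignore the rough coefficient.)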

Stochastic and Multiscale analysis.

My older research work concerns (1) the study of anomalous diffusion in systems characterized by an infinite number of scales [ paper ]; (2) homogenization and inverse homogenization with a continuum of scales; (3) numerical homogenization [ paper ]; (4) the geometric integration of stochastic Hamiltonian systems [ paper ]; (5) the structure-preserving integration of (possibly stochastic) multiscale systems with hidden slow variables [ paper ]; (6) super-resonance: control via parametric resonance; (7) the proof of Davies' and periodization conjectures in homogenization [ paper ] [ paper ].

Stochastic and derandomization calculus in infinite dimensional spaces.

The worst case, Bayesian and decision theory approaches to Uncertainty Quantification require the development of a form of calculus enabling the manipulation of infinite dimensional information structures, involving the resolution of optimization (min, max and minimax) problems over spaces of measures and functions, and over spaces of measures over measures and functions. Another motivation for developing this calculus is its potential application to (1) the design of fast, robust and scalable numerical solvers and the analysis of partial differential equations, (2) model-form uncertainty quantification and design, and (3) model reduction. Although these applications may appear, at first glance, dissimilar, they are all based on the efficient processing of incomplete information with limited computational resources (i.e., under constraints on complexity-management capabilities): (1) fast and robust computation requires computation with partial information (this concept forms the core of Information Based Complexity, where it is augmented by the concepts of contaminated and priced information associated with, for example, truncation errors and the cost of numerical operations); (2) model-form UQ and design require the management and processing of epistemic uncertainties and limited data; and (3) model reduction requires the approximation of the full state of a complex system through operations performed on a few (coarse/reduced) variables. Examples of applications include Optimal Uncertainty Quantification and the reduction of infinite dimensional optimization problems over measures and functions to finite dimensional ones, the identification of new Selberg identities [ paper ], the quantification of the robustness of Bayesian inference under finite information, and the identification of Gaussian fields as optimal mixed strategies for the game theoretic approach to numerical approximation.
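As a concrete toy instance of reducing an optimization over measures to finitely many variables (an illustrative textbook example, not taken from the cited papers): the sharp Markov bound sup{ P[X ≥ a] : X supported on [0, M], E[X] = m } = m/a is attained by a measure supported on at most two points, so a brute-force search over two-point measures recovers the optimum of the infinite dimensional problem.

```python
import numpy as np

# Toy Optimal UQ problem: maximize P[X >= a] over ALL probability measures
# on [0, M] with mean exactly m.  Reduction theorems collapse this infinite
# dimensional problem to measures on at most two support points (x0, x1)
# with weight p on x1 -- three scalars.
m, a, M = 1.0, 4.0, 10.0

best = 0.0
for p in np.linspace(0.0, 1.0, 201):          # weight on the upper point x1
    for x1 in np.linspace(a, M, 201):         # support point with X >= a
        if p * x1 > m or p == 1.0:            # mean constraint infeasible
            continue
        x0 = (m - p * x1) / (1 - p)           # lower point fixed by E[X] = m
        if 0.0 <= x0 < a:
            best = max(best, p)
print(best)   # 0.25 == m / a: the Markov bound is attained
```

The same reduction-to-extreme-points principle underlies Optimal Uncertainty Quantification for far more general constraint sets.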

Research supported by

AFOSR DARPA DOE LANL NNSA NSF ONR UTRC

Uncertainty Quantification.

During the past century, the need to solve large complex problems in applications such as fluid dynamics, neutron transport or ballistic prediction drove the parallel development of computers and numerical methods for solving ODEs and PDEs. It is now clear that this development led to a paradigm shift. Before: each new PDE required the development of new theoretical methods and the employment of large teams of mathematicians and physicists; in most cases, information on solutions was only qualitative and based on general analytical bounds on fundamental solutions. After: mathematical analysis and computer science worked in synergy to give birth to robust numerical methods (such as finite element methods) capable of solving a large spectrum of PDEs without requiring the level of expertise of an A. L. Cauchy or the level of insight of an R. P. Feynman. This transformation can be traced back to sophisticated calculations performed by arrays of human computers organized as parallel clusters, such as in the pioneering work of Lewis Fry Richardson, who in 1922 had a room full of clerks attempt to solve finite-difference equations for the purposes of weather forecasting, and the 1947 paper by John Von Neumann and Herman Goldstine on Numerical Inverting of Matrices of High Order. Although Richardson's predictions failed due to the use of unfiltered data/initial conditions/equations and large time-steps not satisfying the CFL stability condition, his vision was shared by Von Neumann in his proposal of the Meteorology Research Project to the U.S. Navy in 1946, qualified by Platzman as "perhaps the most visionary prospectus for numerical weather prediction since the publication of Richardson's book a quarter-century earlier." The past century has also seen a steady increase in the need to estimate and predict complex systems and to make (possibly critical) decisions with limited information.
Although computers have made possible the numerical evaluation of sophisticated statistical models, these models are still designed by humans through the employment of multi-disciplinary teams of physicists, computer scientists and statisticians. Contrary to the original human computers (such as the ones pioneered by L. F. Richardson or overseen by R. P. Feynman at Los Alamos), these human teams do not follow a specific algorithm (such as the one envisioned in Richardson's Forecast Factory, where 64,000 human computers would have worked in parallel and at high speed to compute world weather charts), because there is currently no known recipe or algorithm for dividing the design of a statistical model into a sequence of arithmetic operations. Furthermore, while human computers were given a specific PDE or ODE to solve, these human teams are not given a well-posed problem with a well-defined notion of solution. As a consequence, different human teams come up with different solutions to the design of the statistical model, along with different estimates of uncertainties. Indeed, enabling computers to handle uncertainty as humans do is challenging in several major ways: (1) there is currently no known recipe or algorithm for dividing the design of a statistical model into a sequence of arithmetic operations; (2) formulating the search for an optimal statistical estimator/model as a well-posed problem is not obvious when information on the system of interest is incomplete and comes in the form of a complex combination of sample data, partial knowledge of constitutive relations and a limited description of the distribution of input random variables; and (3) the space of admissible scenarios, along with the space of relevant information, assumptions, and/or beliefs, tends to be infinite dimensional, whereas calculus on a computer is necessarily discrete and finite.
One purpose of UQ is to explore and develop the foundations of a rigorous/rational framework for the scientific computation of optimal statistical estimators/models and the optimal quantification of uncertainties for complex systems. The emerging questions lead to many challenging problems in Decision Theory, Machine Learning, Bayesian Inference, Stochastic Optimization, Robust Optimization, and Information Based Complexity, the most fundamental of these being the simultaneous emphasis on computation and performance, as in the Machine Learning framework initiated by Valiant. For more see the following [ slides ] and the following [ paper 1, paper 2 ].
“Technology, in common with many other activities, tends toward avoidance of risks by investors. Uncertainty is ruled out if possible. People generally prefer the predictable. Few recognize how destructive this can be, how it imposes severe limits on variability and thus makes whole populations fatally vulnerable to the shocking ways our universe can throw the dice.”
Frank Herbert (Heretics of Dune)

Brittleness/Robustness of machine learning algorithms

Under FA9550-12-1-0389 we derived brittleness (non-robustness) results for Bayesian estimators. Since this lack of robustness is caused by a mechanism that (1) is inherent to doing inference in a continuous world with finite information, and (2) implies that consistency and robustness are conflicting requirements, we predicted that deep/machine learning algorithms could be non-robust and could lead to increased confidence in incorrect solutions (see the [ talk ] by Mike McKerns at SciPy 2013, published in July 2013). Google engineers who were present at the talk tested these predictions for neural networks and observed (Szegedy et al., Dec 2013, Intriguing properties of neural networks) the non-robustness of these algorithms to adversarial examples. This area has grown into the field known as adversarial machine learning, which studies (1) how these algorithms can be attacked by exploiting their brittleness and (2) how to protect them against such attacks. Addressing these vulnerabilities is now recognized as critical to the safety of machine learning algorithms.
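The adversarial-example phenomenon can be sketched on a toy linear model (an illustrative sketch, not the construction in the cited works): for a linear classifier with weights w, an L∞ perturbation of size just over |x·w|/‖w‖₁ per coordinate always flips the prediction, and in high dimension this can be small relative to the scale of the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fit a logistic-regression classifier on toy high-dimensional data.
n, d = 200, 50
X = rng.normal(size=(n, d))
y = (X @ rng.normal(size=d) > 0).astype(float)

w = np.zeros(d)
for _ in range(500):                     # plain gradient descent
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - y) / n

# Adversarial perturbation of one point: step each coordinate by eps
# against sign(w), just past the flip threshold |x.w| / ||w||_1.
x = X[0]
margin = x @ w
eps = abs(margin) / np.abs(w).sum() * 1.01
x_adv = x - np.sign(margin) * np.sign(w) * eps

print(x_adv @ w > 0, x @ w > 0)          # predictions disagree
print(eps, np.linalg.norm(x))            # eps next to the data scale
```

Deep networks are nonlinear, but the same high-dimensional mechanism (many coordinates each nudged slightly in a loss-increasing direction) is what the adversarial-examples literature exploits.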