Abstract
Mathematical convergence guarantees for diffusion models have recently become an active direction of research. Studies have derived non-asymptotic convergence rates for the total variation distance between the distributions of generated and original samples, revealing a clear polynomial dependence on the data dimension and the error levels. In this project, we aim to establish rigorous convergence guarantees for diffusion models by leveraging tools from high-dimensional statistics and to introduce novel proof techniques that improve these rates of convergence. We also aim to connect our statistical bounds with aspects of optimization by considering approximate rather than exact computations of the minimizers of the score-based objectives.
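The minimal Python sketch below makes the phrase "minimizers of the score-based objectives" concrete. It is an illustration, not the project's method: denoising score matching in one dimension. It assumes Gaussian data x0 ~ N(mu, s^2), a single noise level sigma, and a linear score model s(x) = a*x + b. The function name fit_linear_score is hypothetical. In this toy case the exact minimizer is known, so the gap between the sample-based least-squares fit and the true score is one simple instance of approximate rather than exact minimization.

```python
import numpy as np

def fit_linear_score(mu, s, sigma, n=200_000, seed=0):
    """Approximately minimize the denoising score-matching objective
    E[(a*x_t + b + eps/sigma)^2] over linear scores s(x) = a*x + b,
    for data x0 ~ N(mu, s^2) perturbed as x_t = x0 + sigma*eps (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x0 = rng.normal(mu, s, size=n)
    eps = rng.standard_normal(n)
    xt = x0 + sigma * eps
    target = -eps / sigma  # conditional score grad_x log p(x_t | x0)
    design = np.column_stack([xt, np.ones(n)])
    (a, b), *_ = np.linalg.lstsq(design, target, rcond=None)
    return a, b

mu, s, sigma = 1.0, 0.5, 1.0
a, b = fit_linear_score(mu, s, sigma)
# Exact score of the noisy marginal N(mu, s^2 + sigma^2): -(x - mu) / (s^2 + sigma^2)
a_true = -1.0 / (s**2 + sigma**2)
b_true = mu / (s**2 + sigma**2)
print(a, a_true, b, b_true)
```

The regression target is the conditional score. Its conditional expectation given x_t is the marginal score, which is why the least-squares fit converges to a_true and b_true as n grows.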
Participants
Prof. Dr. Johannes Lederer
Mahsa Taheri
Milena Braig
Abstract
This project develops mathematical methods for studying neural networks at three different levels: the individual functions represented by the network, the function classes, and the function classes together with a training objective. On the one hand, we seek to describe the combinatorics of the functions represented by a neural network given a particular choice of parameters or a distribution of parameters. On the other hand, we investigate implicit characterizations of the function classes that can be represented by a neural network over a given training set. We use these to investigate the role of the training data and the properties of the parameter optimization problem when training neural networks.
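The sketch below is a toy illustration of the combinatorics of the function a network represents for one fixed choice of parameters. It enumerates the linear regions and activation patterns of a one-hidden-layer ReLU network with scalar input. The helper names and the example weights are hypothetical. The example shows that three hidden units can produce fewer than four regions when two units switch at the same input.

```python
import numpy as np

def breakpoints_1d(w, b, lo, hi):
    """Breakpoints x = -b_i / w_i of a one-hidden-layer ReLU network with
    scalar input, restricted to the open interval (lo, hi)."""
    w, b = np.asarray(w, float), np.asarray(b, float)
    nz = w != 0  # units with w_i == 0 never switch on or off
    bps = -b[nz] / w[nz]
    return np.unique(bps[(bps > lo) & (bps < hi)])

def activation_patterns_1d(w, b, lo, hi):
    """One activation pattern per linear region, read off at region midpoints."""
    w, b = np.asarray(w, float), np.asarray(b, float)
    edges = np.concatenate([[lo], breakpoints_1d(w, b, lo, hi), [hi]])
    mids = 0.5 * (edges[:-1] + edges[1:])
    return np.outer(mids, w) + b > 0  # shape: (regions, hidden units)

# Three hidden units, but units 2 and 3 both switch at x = 1.
w, b = [1.0, -1.0, 2.0], [0.0, 1.0, -2.0]
patterns = activation_patterns_1d(w, b, -5.0, 5.0)
print(patterns.astype(int))
```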
Participants
Prof. Dr. Guido Montúfar
Marie-Charlotte Brandenburg
Abstract
Optimal feedback control is one of the areas in which methods from deep learning have an enormous impact. Deep reinforcement learning is one of the methods for obtaining optimal feedback laws and arguably one of the most successful algorithms in artificial intelligence. It underlies the spectacular performance of artificial intelligence in games such as chess and Go, and it also has manifold applications in science, technology, and the economy. Mathematically, the core question behind this method is how best to represent optimal value functions via deep neural networks (DNNs). These are the functions that assign the optimal performance value to each state, also known as cost-to-go functions in reinforcement learning. The optimal feedback law can then be computed from these functions. In continuous time, optimal value functions are characterised by Hamilton-Jacobi-Bellman partial differential equations (HJB PDEs), which links the question to the solution of PDEs via DNNs. Since the dimension of an HJB PDE is determined by the dimension of the state of the dynamics governing the optimal control problem, HJB equations naturally form a class of high-dimensional PDEs. They are thus prone to the well-known curse of dimensionality, i.e., the fact that the numerical effort for their solution grows exponentially in the dimension. It is known that functions with certain beneficial structures, such as compositional or separable functions, can be approximated by DNNs with suitable architectures while avoiding the curse of dimensionality. For HJB PDEs characterising Lyapunov functions, the proposer of this project recently showed that small-gain conditions, i.e., particular conditions on the dynamics of the problem, establish the existence of separable subsolutions. These can be exploited to approximate them efficiently by DNNs via training algorithms with suitable loss functions.
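One way to picture how separable structure avoids the curse of dimensionality is a network that is itself a sum of small subnetworks, V(x) = sum_j V_j(x_{I_j}), one per subsystem. The minimal NumPy sketch below assumes fixed coordinate blocks I_j, tanh subnetworks, and hypothetical helper names. It is not the architecture used in the project. It only illustrates that, for a fixed block size, the parameter count grows linearly in the number of subsystems rather than exponentially in the state dimension.

```python
import numpy as np

def init_subnet(in_dim, width, rng):
    """One small tanh network R^{in_dim} -> R (hypothetical architecture)."""
    return {"W1": rng.standard_normal((width, in_dim)),
            "b1": np.zeros(width),
            "w2": rng.standard_normal(width) / width}

def subnet(p, z):
    # z has shape (batch, in_dim); the result has shape (batch,)
    return np.tanh(z @ p["W1"].T + p["b1"]) @ p["w2"]

def separable_value(blocks, subnets, x):
    """V(x) = sum_j V_j(x[:, I_j]) for coordinate blocks I_j."""
    return sum(subnet(p, x[:, idx]) for idx, p in zip(blocks, subnets))

def n_params(subnets):
    return sum(v.size for p in subnets for v in p.values())

rng = np.random.default_rng(0)
d, width = 6, 8
blocks = [[0, 1], [2, 3], [4, 5]]  # three hypothetical 2-dimensional subsystems
subnets = [init_subnet(len(I), width, rng) for I in blocks]
x = rng.standard_normal((4, d))
print(separable_value(blocks, subnets, x).shape, n_params(subnets))  # (4,) 96
```

Each subnetwork sees only its own block of coordinates. Changing the state of one subsystem therefore changes only the corresponding summand.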
These results pave the way for curse-of-dimensionality-free DNN-based approaches to general nonlinear HJB equations, which are the goal of this project. Besides small-gain theory, there exists a large toolbox of nonlinear feedback control design techniques that lead to compositional (sub)optimal value functions. On the one hand, these methods are mathematically sound and apply to many real-world problems. On the other hand, they come with significant computational challenges when the resulting value functions or feedback laws are to be computed. In this project, we will exploit the structural insight provided by these methods to establish the existence of compositional optimal value functions or approximations thereof. We will circumvent their computational complexity by using appropriate training algorithms for DNNs instead. Proceeding this way, we will characterise optimal feedback control problems for which curse-of-dimensionality-free (approximate) solutions via DNNs are possible, and provide efficient network architectures and training schemes for computing these solutions.
Participants
Prof. Dr. Lars Grüne
Mario Sperl
Abstract
This project focuses on a class of continuous-time neural ordinary differential equations for labeling metric data on graphs. Its aim is to contribute to the theory of deep learning from three viewpoints: (i) the use of information geometry for the design and understanding of deep networks in connection with structured prediction and learning; (ii) a geometric characterization of the dynamics of parameter learning and of its interaction with state-space evolutions as a model of contextual decisions; (iii) the study of PAC-Bayes risk bounds that quantify the performance of classification and label prediction by deep assignment flows.
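For point (iii), the sketch below evaluates one standard form of PAC-Bayes bound. It uses McAllester's bound with Maurer's constant, relaxed via Pinsker's inequality, for a loss bounded in [0, 1] and n >= 8 samples, with a diagonal Gaussian prior and posterior. Which bound the project actually studies for deep assignment flows is not stated in the abstract. The function names and numbers are hypothetical.

```python
import math
import numpy as np

def kl_diag_gauss(m_q, s_q, m_p, s_p):
    """KL(Q || P) for diagonal Gaussians Q = N(m_q, s_q^2), P = N(m_p, s_p^2)."""
    m_q, s_q, m_p, s_p = map(np.asarray, (m_q, s_q, m_p, s_p))
    return float(np.sum(np.log(s_p / s_q)
                        + (s_q**2 + (m_q - m_p)**2) / (2 * s_p**2) - 0.5))

def pac_bayes_bound(emp_risk, kl, n, delta):
    """With probability >= 1 - delta over n i.i.d. samples, simultaneously for all Q:
    R(Q) <= emp_risk + sqrt((KL(Q||P) + ln(2*sqrt(n)/delta)) / (2n))."""
    return emp_risk + math.sqrt((kl + math.log(2 * math.sqrt(n) / delta)) / (2 * n))

# Hypothetical posterior/prior over two parameters and a hypothetical empirical risk.
kl = kl_diag_gauss([0.1, -0.2], [0.5, 0.5], [0.0, 0.0], [1.0, 1.0])
bound = pac_bayes_bound(0.08, kl, n=10_000, delta=0.05)
print(kl, bound)
```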
Participants
Prof. Dr. Christoph Schnörr
Bastian Boll
Abstract
Laplace approximations have re-emerged as a potent and efficient tool for deep learning. They combine the two powerful paradigms of automatic differentiation and numerical linear algebra to enable functionality that had previously become niche due to its high computational cost. In particular, Laplace approximations yield a Bayesian formalism for deep learning, effectively turning any deep neural network into an approximate Gaussian process. They also define a metric, and an associated manifold, on the deep network and its parameter space. This project seeks to extend recent results in both a theoretical and an algorithmic direction. On the theoretical side, the project aims to leverage differential geometry to improve the understanding of the computational complexity of Bayesian deep training. Building directly on this, the project will then develop new algorithms and functional extensions of deep learning through re-parametrization, to provide better-calibrated uncertainty quantification in deep learning.
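The following is a minimal example of a Laplace approximation. It assumes a Bayesian logistic regression model with Gaussian prior precision alpha in place of a deep network. The MAP estimate is found with Newton's method. The posterior is approximated by a Gaussian whose covariance is the inverse Hessian of the negative log posterior. Predictions use the common probit-style approximation sigmoid(mu / sqrt(1 + pi*var/8)). Helper names and data are hypothetical. For deep networks, the exact Hessian is usually replaced by structured approximations such as generalized Gauss-Newton or Kronecker-factored ones.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neg_log_post_hessian(X, w, alpha):
    """Hessian of the negative log posterior: X^T diag(p(1-p)) X + alpha*I."""
    p = sigmoid(X @ w)
    return X.T @ (X * (p * (1 - p))[:, None]) + alpha * np.eye(X.shape[1])

def laplace_logreg(X, y, alpha=1.0, iters=30):
    """MAP estimate by Newton's method, then the Gaussian N(w_map, H^{-1})."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ w) - y) + alpha * w
        w = w - np.linalg.solve(neg_log_post_hessian(X, w, alpha), grad)
    cov = np.linalg.inv(neg_log_post_hessian(X, w, alpha))
    return w, cov

def predictive(x, w, cov):
    """Probit-style approximation of the posterior predictive p(y = 1 | x)."""
    mu, var = x @ w, x @ cov @ x
    return sigmoid(mu / np.sqrt(1.0 + np.pi * var / 8.0))

rng = np.random.default_rng(0)
X = np.column_stack([rng.standard_normal((200, 2)), np.ones(200)])  # last column: bias
y = (rng.random(200) < sigmoid(X @ np.array([2.0, -1.0, 0.5]))).astype(float)
w_map, cov = laplace_logreg(X, y)
x_new = np.array([3.0, 3.0, 1.0])
p_map, p_laplace = sigmoid(x_new @ w_map), predictive(x_new, w_map, cov)
print(p_map, p_laplace)
```

The posterior variance pulls the predictive probability toward 0.5 relative to the plug-in MAP prediction. This is the calibration effect that Laplace approximations are used for.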
Participants
Prof. Dr. Philipp Hennig
Frank Schneider
Abstract
---
Participants
Prof. Dr. Gabriele Steidl
Abstract
The vulnerability of deep learning-based systems to adversarial attacks and distribution shifts continues to pose a substantial security risk in real-world applications. One of the few truly robust defenses against such attacks is Adversarial Training (AT). Although it is backed by a developing mathematical theory, it still suffers from a trade-off between the generalization ability of the model on clean data and its robustness against adversarial attacks. In the GeoMAR project, we aim to analyze the geometry of robustness, tackle the accuracy-robustness trade-off, analyze and compare the geometric properties of classifiers using a novel test-time approach, and scale to large datasets. To achieve this, we will view robustness through a geometric lens and model it as a geometric regularity property of the decision boundary of a classifier. We will use this framework to develop novel geometrically motivated robust training methods, solve them using tailored optimization methods, and leverage generative models for the computation of attacks. The desired outcomes of GeoMAR are geometric, interpretable, and scalable training methods that provably mitigate the trade-off between accuracy and robustness. In this way, our project will promote the mathematical understanding of robustness in machine learning and generate efficient algorithms for training deep learning systems for real-world applications.
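For a linear classifier, the worst-case l_inf attack of radius eps has a closed form: each margin y * w.x drops by exactly eps * ||w||_1. This makes adversarial training easy to sketch. The NumPy example below runs gradient descent on the resulting robust logistic loss, using hypothetical helper names and synthetic Gaussian data. It is not the geometric training method proposed in GeoMAR. For nonlinear networks no such closed form exists, and attacks are typically computed iteratively, e.g. by projected gradient descent.

```python
import numpy as np

def worst_case_linf(X, y, w, eps):
    """Exact worst-case l_inf perturbation of radius eps for the linear score w.x,
    with labels y in {-1, +1}; it lowers every margin by eps * ||w||_1."""
    return X - eps * y[:, None] * np.sign(w)[None, :]

def robust_margins(X, y, w, eps):
    return y * (X @ w) - eps * np.abs(w).sum()

def adversarial_train(X, y, eps, lr=0.1, steps=500):
    """Gradient descent on the robust logistic loss mean(log(1 + exp(-margin)))."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        m = robust_margins(X, y, w, eps)
        weight = np.exp(-np.logaddexp(0.0, m))  # sigmoid(-m), computed stably
        grad = -(weight[:, None] * (y[:, None] * X - eps * np.sign(w))).mean(axis=0)
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
y = rng.choice([-1.0, 1.0], size=400)
X = y[:, None] * np.array([1.0, 0.2]) + 0.7 * rng.standard_normal((400, 2))
eps = 0.1
w_rob = adversarial_train(X, y, eps)
clean_acc = np.mean(y * (X @ w_rob) > 0)
robust_acc = np.mean(robust_margins(X, y, w_rob, eps) > 0)
print(clean_acc, robust_acc)
```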
Participants
Prof. Dr. Leon Bungert
PostDoc Leo Schwinn
Abstract
----
Participants
Prof. Dr. Gero Friesecke
Abstract
–
Participants
Prof. Dr. Gitta Kutyniok
Abstract
This proposal focuses on developing a comprehensive convergence analysis for operator learning. Operator learning is an emerging methodology for efficiently approximating data-to-solution maps of parameter-dependent partial differential equations (PDEs). Our motivation stems from optimal control problems, whose solution traditionally relies on sequential, costly numerical PDE solves. To this end, we will provide a complete error analysis that aims to dissect and understand the bias-variance trade-off and to break the curse of dimensionality in operator learning. Additionally, the insights gained are expected to address computational challenges in parameter uncertainty quantification and estimation, which are prevalent across various domains of applied mathematics. The project will develop suitable network architectures and analyze their expressivity and errors, with a special focus on guaranteeing small covering numbers for the function class corresponding to all network realizations. Building upon this analysis, we aim to establish a robust statistical learning framework. This framework will extend the principles of empirical risk minimization to operator learning and yield an analysis of regressing nonlinear mappings between infinite-dimensional spaces. Key tools for achieving these goals will be two. The first is the exploitation of low-dimensional structures stemming from the known high regularity of parameter-to-solution maps. The second is the separation of the input and output of the operator into important low-frequency parts and less important high-frequency parts. This is achieved by a so-called encoder/decoder architecture, which allows inputs and outputs to be represented in stable representation systems such as wavelets. The practical aspect of the project involves integrating operator learning models into optimization frameworks in order to significantly reduce the computational effort for solving PDE-constrained optimal control problems.
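The encoder/decoder idea can be illustrated on a toy operator whose structure is known: the solution map f -> u of -u'' = f on the periodic interval [0, 2*pi), with zero mean. In the sketch below, the encoder keeps only the Fourier modes 1..K, so Fourier modes stand in for the wavelets mentioned above. A linear latent map is fitted by least squares from 50 input/output pairs, and the decoder maps coefficients back to grid values. All names and sizes are assumptions, as are the band-limited inputs, which were chosen so that truncation loses nothing. Realistic operator learning uses nonlinear latent maps and inputs with high-frequency content.

```python
import numpy as np

N, K = 128, 8  # grid size and number of retained Fourier modes (assumed)

def random_forcing_spectrum(rng):
    """Random band-limited forcing: nonzero Fourier modes 1..K only."""
    spec = np.zeros(N // 2 + 1, dtype=complex)
    spec[1:K + 1] = rng.standard_normal(K) + 1j * rng.standard_normal(K)
    return spec

def solve_poisson(spec_f):
    """Reference spectral solver for -u'' = f on [0, 2*pi): u_hat_k = f_hat_k / k^2."""
    spec_u = np.zeros_like(spec_f)
    m = np.arange(1, spec_f.size)
    spec_u[1:] = spec_f[1:] / m**2
    return np.fft.irfft(spec_u, n=N)

def encode(v):
    c = np.fft.rfft(v)[1:K + 1]  # keep low-frequency modes 1..K only
    return np.concatenate([c.real, c.imag])

def decode(z):
    spec = np.zeros(N // 2 + 1, dtype=complex)
    spec[1:K + 1] = z[:K] + 1j * z[K:]
    return np.fft.irfft(spec, n=N)

rng = np.random.default_rng(0)
train = [random_forcing_spectrum(rng) for _ in range(50)]
F = np.array([encode(np.fft.irfft(s, n=N)) for s in train])
U = np.array([encode(solve_poisson(s)) for s in train])
A, *_ = np.linalg.lstsq(F, U, rcond=None)  # latent linear map with U ≈ F @ A

spec_test = random_forcing_spectrum(rng)
f_test, u_test = np.fft.irfft(spec_test, n=N), solve_poisson(spec_test)
u_pred = decode(encode(f_test) @ A)
rel_err = np.linalg.norm(u_pred - u_test) / np.linalg.norm(u_test)
print(rel_err)
```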
Participants
Prof. Dr. Evelyn Herberg
Prof. Dr. Sven Wang
Prof. Jakob Zech
Abstract
-----
Participants
Prof. Dr. Jia-Jie Zhu
Abstract
----
Participants
Prof. Dr. Erich Kobler
Prof. Dr. Sebastian Neumayer
Abstract
-
Participants
Dr. Maximilian Engel
Prof. Dr. Christian Kühn
Dennis Chemnitz
Sara-Viola Kuntz
Abstract
-----
Participants
Prof. Dr. Holger Boche
Prof. Dr. Gitta Kutyniok
Abstract
---
Participants
Prof. Dr. Debarghya Ghoshdastidar
Abstract
Deep neural networks have emerged as highly successful and universal tools for image recovery and restoration. They achieve state-of-the-art results on tasks ranging from image denoising and super-resolution to image reconstruction from few and noisy measurements. As a consequence, they are starting to be used in important imaging technologies, such as GE's newest computed tomography scanners.
While neural networks perform very well empirically on image recovery problems, a range of important theoretical questions remain wide open. Specifically, it is unclear what makes neural networks so successful for image recovery problems. It is also unclear how many and which examples are required to train a neural network for image recovery. Finally, it is unclear whether the resulting network is sensitive to perturbations. The overarching goal of this project is to establish a theory of learning to solve linear inverse problems with end-to-end neural networks by addressing these three questions.
Participants
Prof. Dr. Reinhard Heckel
Prof. Dr. Felix Krahmer
Anselm Krainovic
Abstract
Recent theoretical research on deep learning has primarily focussed on the supervised learning problem, that is, learning a model from labelled data and predicting on unseen data. However, deep learning has also gained popularity in learning from unlabelled data. In particular, graph neural networks have become the method of choice for semi-supervised learning, whereas autoencoders have been successful in unsupervised representation learning and clustering. This project provides a mathematically rigorous explanation for why and when neural networks can successfully extract information from unlabelled data. To this end, two popular network architectures designed to learn from unlabelled or partially labelled data are studied: graph convolutional networks and autoencoder-based deep clustering networks. The study considers a statistical framework for data with latent cluster structure, such as mixture models and stochastic block models. The goal is to provide guarantees for cluster recovery using autoencoders, as well as bounds on the generalisation error of graph convolutional networks for semi-supervised learning under the cluster assumption. The proposed analysis combines the theories of generalisation and optimisation with high-dimensional statistics to understand the influence of the cluster structure in unsupervised and semi-supervised deep learning. Specifically, the project aims to answer fundamental questions such as which types of high-dimensional clusters can be extracted by autoencoders, what the role of graph convolutions in semi-supervised learning with graph neural networks is, and what the dynamics of training linear neural networks for these problems are.
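The sketch below illustrates the role of graph convolutions under a cluster assumption. It samples a two-block stochastic block model and attaches noisy one-dimensional node features whose mean depends on the block. It then applies one step of the normalized propagation D^{-1/2}(A + I)D^{-1/2} used in graph convolutional networks. The parameters and helper names are hypothetical. The point is that averaging over mostly same-block neighbours shrinks within-block noise faster than it blurs the gap between block means.

```python
import numpy as np

def sbm(n_per_block, p, q, rng):
    """Two-block stochastic block model: edge prob. p within, q across blocks."""
    labels = np.repeat([0, 1], n_per_block)
    probs = np.where(labels[:, None] == labels[None, :], p, q)
    upper = np.triu(rng.random(probs.shape) < probs, k=1)
    return (upper | upper.T).astype(float), labels

def gcn_propagation(A):
    """Symmetrically normalized adjacency with self-loops, D^{-1/2}(A + I)D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    return A_hat / np.sqrt(np.outer(d, d))

def separation(x, labels):
    """Gap between block means divided by the pooled within-block std."""
    x0, x1 = x[labels == 0], x[labels == 1]
    return abs(x1.mean() - x0.mean()) / np.sqrt(0.5 * (x0.var() + x1.var()))

rng = np.random.default_rng(0)
A, labels = sbm(100, p=0.3, q=0.05, rng=rng)
x = np.where(labels == 1, 0.5, -0.5) + rng.standard_normal(labels.size)  # noisy features
S = gcn_propagation(A)
sep_before, sep_after = separation(x, labels), separation(S @ x, labels)
print(sep_before, sep_after)
```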
Participants
Prof. Dr. Debarghya Ghoshdastidar
Pascal Esser
Abstract
The goal of this project is to use deep neural networks as building blocks in a numerical method to solve the Boltzmann equation. This is a particularly challenging problem, since the equation is a high-dimensional integro-differential equation that also possesses an intricate structure, which a numerical method needs to preserve. Thus, artificial neural networks might be beneficial, but they cannot be used out of the box.
We follow two main strategies to develop structure-preserving, neural-network-enhanced numerical methods for the Boltzmann equation. First, we target the moment approach, in which a structure-preserving neural network is employed to model the minimal entropy closure of the moment system. By enforcing convexity of the neural network, one can show that the intrinsic structure of the moment system is preserved, including hyperbolicity, entropy dissipation, and positivity. Second, we develop a neural network approach to solve the Boltzmann equation directly at the level of discrete particle velocities. Here, a neural network is employed to model the difference between the full nonlinear collision operator of the Boltzmann equation and the BGK model, which preserves the entropy dissipation principle. Furthermore, we will develop strategies to generate training data that fully sample the input space of the respective neural networks, to ensure properly functioning models.
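One common way to enforce convexity of a neural network is the input convex neural network (ICNN) construction. Weights acting on hidden states are constrained to be nonnegative, and the activations are convex and nondecreasing. The NumPy sketch below uses this construction for a convex scalar function of a vector input, for instance a moment vector. The layer sizes, the softplus reparametrization of the constrained weights, and the helper names are assumptions, not the project's architecture. The usage lines check midpoint convexity numerically on random pairs of inputs.

```python
import numpy as np

def softplus(z):
    return np.logaddexp(0.0, z)  # log(1 + exp(z)), convex and nondecreasing

def init_icnn(dim, width, rng):
    """Hypothetical two-hidden-layer ICNN; *_raw weights are mapped to >= 0."""
    return {"W0": rng.standard_normal((width, dim)), "b0": rng.standard_normal(width),
            "Wz_raw": rng.standard_normal((width, width)),
            "Wx1": rng.standard_normal((width, dim)), "b1": rng.standard_normal(width),
            "wz_raw": rng.standard_normal(width), "wx2": rng.standard_normal(dim)}

def icnn(p, u):
    """Convex in u: nonnegative weights on hidden states, convex nondecreasing
    activations, plus unconstrained affine skip connections from the input."""
    z1 = softplus(u @ p["W0"].T + p["b0"])
    z2 = softplus(z1 @ softplus(p["Wz_raw"]).T + u @ p["Wx1"].T + p["b1"])
    return z2 @ softplus(p["wz_raw"]) + u @ p["wx2"]

rng = np.random.default_rng(0)
params = init_icnn(3, 16, rng)
a, b = rng.standard_normal((1000, 3)), rng.standard_normal((1000, 3))
gap = 0.5 * (icnn(params, a) + icnn(params, b)) - icnn(params, 0.5 * (a + b))
print(gap.min())  # nonnegative up to rounding, as convexity requires
```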
Participants
Prof. Dr. Martin Frank
Dr. Yijia Tang
Abstract
The project "Theoretical Foundations of Uncertainty-Aware Deep Learning for Inverse Problems" extends the joint project "Foundations of Supervised Deep Learning for Inverse Problems" from the first phase of the SPP. It is a natural continuation of our previous joint work on understanding data-driven approaches to inverse problems. In that work, we were able to formalize the concept of convergent data-driven regularizations and to analytically derive the optimal spectral regularizers when trained in a mean-squared, a plug-and-play denoising, and an adversarial way. We also derived first stability estimates and took important steps towards discretization-invariant learning by studying Fourier Neural Operators. The goal of our continuation proposal is to move to the analysis of more realistic settings that take into account uncertainty in the noise, uncertainty in the modeling of the forward problem, and uncertainty in the ground-truth (training) distribution. We intend to develop regularization methods that are robust to these uncertainties by considering suitable adversarial training or augmentation schemes. We also intend to utilize all available data (with and without certain distribution shifts) in an optimal way by considering semi-supervised learning strategies. Furthermore, we go beyond reconstruction methods that predict a single solution by studying diffusion models for inverse problems in the light of uncertainty quantification and the sampling of diverse solutions. Finally, we focus on efficiency in practical image reconstruction problems with extremely high-resolution data. To this end, we extend discretization-invariant architectures and consider regularization-by-discretization effects based on different (possibly implicit) representations of the quantity to reconstruct. We also consider partitionings of the data and of the reconstruction in combination with (learned) compressed data representations.
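An elementary example of an MSE-optimal spectral regularizer considers a single singular direction with singular value sigma and observations y = sigma*x + noise. Here x has mean zero and variance pi, and the noise has mean zero and variance delta^2. The best linear estimate x_hat = g(sigma)*y in the mean-squared sense is the Wiener-type filter g(sigma) = sigma*pi / (sigma^2*pi + delta^2). This is a Tikhonov filter sigma / (sigma^2 + alpha) with alpha = delta^2/pi. It is a textbook special case under assumed second-order statistics, not the regularizers derived in the project. The sketch compares the formula with a least-squares fit from simulated samples; all names and numbers are hypothetical.

```python
import numpy as np

def mse_optimal_filter(sigma, pi, delta):
    """Per-component MSE-optimal linear filter for y = sigma*x + noise with
    Var(x) = pi and noise variance delta**2: x_hat = g(sigma) * y."""
    return sigma * pi / (sigma**2 * pi + delta**2)

rng = np.random.default_rng(0)
sigma, pi, delta, n = 0.3, 2.0, 0.1, 200_000
x = np.sqrt(pi) * rng.standard_normal(n)
y = sigma * x + delta * rng.standard_normal(n)
g = mse_optimal_filter(sigma, pi, delta)
g_emp = (x @ y) / (y @ y)  # least-squares fit of x ≈ g * y from samples
print(g, g_emp)
```

As delta tends to 0 the filter approaches the plain inverse 1/sigma. For small sigma it damps the component instead of amplifying noise, which is the regularizing effect.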
Participants
Prof. Dr. Martin Burger
Prof. Dr. Michael Möller
Abstract
-----
Participants
Prof. Dr.-Ing. Marius Kloft
Abstract
-----
Participants
Prof. Dr. Aleksandar Bojchevski
Abstract
Simulating complex, highly interconnected systems such as the climate, biology, or society typically involves methods from the "traditional" field of scientific computing. These methods are usually reliable and explainable through their foundations in rigorous mathematics. Examples are efficient space discretization schemes such as sparse grids and scalable, parallel solvers for partial differential equations. Unfortunately, most of them are not immediately applicable to the extremely high-dimensional, heterogeneous, and scattered data where neural networks are usually used. Methods employed in the AI community are typically not reliable or explainable in the traditional sense. They pose problems, as illustrated by adversarial examples and brittle generalization results. Toward the goal of explainable, reliable, and efficient AI, the Emmy Noether project on "Harmonic AI" connects the two worlds of scientific HPC and deep learning. Specifically, we combine linear operator theory and deep learning methods through harmonic analysis. Inference, classification, and training of neural networks will be formulated mostly in terms of linear algebra and functional analysis.
The first three years of the project are devoted to the explainability of AI by bridging the gap to rigorous mathematics. Leveraging the common principles of the Laplace operator, Gaussian processes, and neural networks, Harmonic AI will connect AI and linear algebra. We also bridge the theory between the linear Koopman operator and deep neural networks. In doing so, we bring ideas from dynamical systems theory to the AI community, to explain layered neural networks and stochastic optimization algorithms in terms of dynamical evolution operators. The core objective of the second phase (years four to six) is to devise robust and reliable numerical algorithms that harness the connection between linear operators and neural networks.
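Extended dynamic mode decomposition (EDMD) is a standard data-driven way to approximate the Koopman operator. One chooses a dictionary of observables psi and fits a matrix K with psi(F(x)) ≈ psi(x) K by least squares on sampled states. The sketch below uses a toy nonlinear map for which the dictionary (x1, x2, x1^2) is invariant, so EDMD recovers the Koopman eigenvalues lam, mu, and lam^2 exactly. The map, the parameters, and the dictionary are assumptions chosen for illustration and do not come from the project.

```python
import numpy as np

lam, mu, c = 0.9, 0.5, 1.0  # assumed toy parameters

def step(x):
    """Nonlinear map x1 -> lam*x1, x2 -> mu*x2 + c*x1**2 (rows are states)."""
    return np.column_stack([lam * x[:, 0], mu * x[:, 1] + c * x[:, 0] ** 2])

def dictionary(x):
    """Observables psi(x) = (x1, x2, x1^2); this set is invariant under the map."""
    return np.column_stack([x[:, 0], x[:, 1], x[:, 0] ** 2])

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
Psi_X, Psi_Y = dictionary(X), dictionary(step(X))
K, *_ = np.linalg.lstsq(Psi_X, Psi_Y, rcond=None)  # Psi_Y ≈ Psi_X @ K
eigs = np.sort(np.linalg.eigvals(K).real)
print(eigs)
```

Although the dynamics are nonlinear in x, they act linearly on the chosen observables. This is the sense in which the linear Koopman operator describes a nonlinear evolution.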
Throughout the project, to disseminate, demonstrate, and test the new methods in a proof of concept, we collaborate with simulation groups studying quantum dynamics and human crowds.
To support generalizability and explainability, and to improve performance, we will also investigate two other important properties of neural networks: sparsity and symmetry. Both need to be incorporated into an exact optimization approach in order to obtain a theoretical and practical understanding of their possibilities and limits.
Participants
Dr. Felix Dietrich
Iryna Burak
Erik Bolager