[VH24] | P. Trent Vonich and Gregory J. Hakim. Predictability limit of the 2021 Pacific Northwest heatwave from deep-learning sensitivity analysis, 2024. [ bib | arXiv | http ] |
[KYL+24] | Dmitrii Kochkov, Janni Yuval, Ian Langmore, Peter Norgaard, Jamie Smith, Griffin Mooers, Milan Klöwer, James Lottes, Stephan Rasp, Peter Düben, Sam Hatfield, Peter Battaglia, Alvaro Sanchez-Gonzalez, Matthew Willson, Michael P. Brenner, and Stephan Hoyer. Neural general circulation models for weather and climate. Nature, 2024. [ bib | DOI | http ] General circulation models (GCMs) are the foundation of weather and climate prediction. GCMs are physics-based simulators that combine a numerical solver for large-scale dynamics with tuned representations for small-scale processes such as cloud formation. Recently, machine-learning models trained on reanalysis data have achieved comparable or better skill than GCMs for deterministic weather forecasting. However, these models have not demonstrated improved ensemble forecasts, or shown sufficient stability for long-term weather and climate simulations. Here we present a GCM that combines a differentiable solver for atmospheric dynamics with machine-learning components and show that it can generate forecasts of deterministic weather, ensemble weather and climate on par with the best machine-learning and physics-based methods. NeuralGCM is competitive with machine-learning models for one- to ten-day forecasts, and with the European Centre for Medium-Range Weather Forecasts ensemble prediction for one- to fifteen-day forecasts. With prescribed sea surface temperature, NeuralGCM can accurately track climate metrics for multiple decades, and climate forecasts with 140-kilometre resolution show emergent phenomena such as realistic frequency and trajectories of tropical cyclones. For both weather and climate, our approach offers orders of magnitude computational savings over conventional GCMs, although our model does not extrapolate to substantially different future climates. Our results show that end-to-end deep learning is compatible with tasks performed by conventional GCMs and can enhance the large-scale physical simulations that are essential for understanding and predicting the Earth system. |
[VDM+24] | Thomas J. Vandal, Kate Duffy, Daniel McDuff, Yoni Nachmany, and Chris Hartshorn. Global atmospheric data assimilation with multi-modal masked autoencoders, 2024. [ bib | arXiv | http ] |
[VMT+24] | Anna Vaughan, Stratis Markou, Will Tebbutt, James Requeima, Wessel P. Bruinsma, Tom R. Andersson, Michael Herzog, Nicholas D. Lane, Matthew Chantry, J. Scott Hosking, and Richard E. Turner. Aardvark weather: end-to-end data-driven weather forecasting, 2024. [ bib | arXiv | http ] |
[HM24] | Gregory J. Hakim and Sanjit Masanam. Dynamical tests of a deep learning weather prediction model. Artificial Intelligence for the Earth Systems, 3(3):e230090, 2024. [ bib | DOI | http ] |
[BBL+24] | Cristian Bodnar, Wessel P. Bruinsma, Ana Lucic, Megan Stanley, Johannes Brandstetter, Patrick Garvan, Maik Riechert, Jonathan Weyn, Haiyu Dong, Anna Vaughan, Jayesh K. Gupta, Kit Tambiratnam, Alex Archibald, Elizabeth Heider, Max Welling, Richard E. Turner, and Paris Perdikaris. Aurora: A foundation model of the atmosphere, 2024. [ bib | arXiv ] |
[RKL+24] | Thomas Rackow, Nikolay Koldunov, Christian Lessig, Irina Sandu, Mihai Alexe, Matthew Chantry, Mariana Clare, Jesper Dramsch, Florian Pappenberger, Xabier Pedruzo-Bagazgoitia, Steffen Tietsche, and Thomas Jung. Robustness of AI-based weather forecasts in a changing climate, 2024. [ bib | arXiv | http ] |
[FMS23] | Imre Fekete, András Molnár, and Péter L. Simon. A functional approach to interpreting the role of the adjoint equation in machine learning. Results in Mathematics, 79(1), December 2023. Seminar 2024/2025. [ bib | DOI ] The connection between numerical methods for solving differential equations and machine learning has been revealed recently. Differential equations have been proposed as continuous analogues of deep neural networks, and then used in handling certain tasks, such as image recognition, where the training of a model includes learning the parameters of systems of ODEs from certain points along their trajectories. Treating this inverse problem of determining the parameters of a dynamical system that minimize the difference between data and trajectory by a gradient-based optimization method presents the solution of the adjoint equation as the continuous analogue of backpropagation that yields the appropriate gradients. The paper explores an abstract approach that can be used to construct a family of loss functions with the aim of fitting the solution of an initial value problem to a set of discrete or continuous measurements. It is shown that an extension of the adjoint equation can be used to derive the gradient of the loss function as a continuous analogue of backpropagation in machine learning. Numerical evidence is presented that, under reasonably controlled circumstances, the gradients obtained this way can be used in a gradient descent to fit the solution of an initial value problem to a set of continuous noisy measurements, and a set of discrete noisy measurements that are recorded at uncertain times. |
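A brief note on the technique the [FMS23] abstract refers to: for a parametrised ODE, the adjoint equation yields the gradient of a terminal loss as the continuous analogue of backpropagation. The following is the standard textbook form, with notation chosen here for illustration rather than taken from the paper:

    For \dot{x}(t) = f(x(t), \theta) on [0, T] with loss L(x(T)), solve backwards in time
        \dot{\lambda}(t) = -\big(\partial f / \partial x\big)^{\top} \lambda(t), \qquad \lambda(T) = \partial L / \partial x(T),
    and obtain the gradient as
        dL/d\theta = \int_0^T \lambda(t)^{\top} \, \partial f / \partial \theta \, dt.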
[TNF+23] | Derick Nganyu Tanyu, Jianfeng Ning, Tom Freudenberg, Nick Heilenkötter, Andreas Rademacher, Uwe Iben, and Peter Maass. Deep learning methods for partial differential equations and related parameter identification problems. Inverse Problems, 39(10):103001, August 2023. [ bib | DOI ] Recent years have witnessed a growth in mathematics for deep learning—which seeks a deeper understanding of the concepts of deep learning with mathematics and explores how to make it more robust—and deep learning for mathematics, where deep learning algorithms are used to solve problems in mathematics. The latter has popularised the field of scientific machine learning where deep learning is applied to problems in scientific computing. Specifically, more and more neural network (NN) architectures have been developed to solve specific classes of partial differential equations (PDEs). Such methods exploit properties that are inherent to PDEs and thus solve the PDEs better than standard feed-forward NNs, recurrent NNs, or convolutional neural networks. This has had a great impact in the area of mathematical modelling where parametric PDEs are widely used to model most natural and physical processes arising in science and engineering. In this work, we review such methods as well as their extensions for parametric studies and for solving the related inverse problems. We also show their relevance in various industrial applications. |
[MBC+23] | Morteza Mardani, Noah Brenowitz, Yair Cohen, Jaideep Pathak, Chieh-Yu Chen, Cheng-Chin Liu, Arash Vahdat, Karthik Kashinath, Jan Kautz, and Mike Pritchard. Generative residual diffusion modeling for km-scale atmospheric downscaling, 2023. [ bib | arXiv ] The state of the art for physical hazard prediction from weather and climate requires expensive km-scale numerical simulations driven by coarser-resolution global inputs. Here, a km-scale downscaling diffusion model is presented as a cost-effective alternative. The model is trained from a regional high-resolution weather model over Taiwan, and conditioned on ERA5 reanalysis data. To address the uncertainties of downscaling, the large resolution ratio (25 km to 2 km), the different physics involved at different scales, and the prediction of channels that are not in the input data, we employ a two-step approach (ResDiff) in which a (UNet) regression predicts the mean in the first step and a diffusion model predicts the residual in the second step. ResDiff exhibits encouraging skill in bulk RMSE and CRPS scores. The predicted spectra and distributions from ResDiff faithfully recover important power-law relationships regulating damaging wind and rain extremes. Case studies of coherent weather phenomena reveal appropriate multivariate relationships reminiscent of learnt physics. This includes the sharp wind and temperature variations that co-locate with intense rainfall in a cold front, and the extreme winds and rainfall bands that surround the eyewall of typhoons. Some evidence of simultaneous bias correction is found. A first attempt at downscaling directly from an operational global forecast model successfully retains many of these benefits. The implication is that a new era of fully end-to-end, global-to-regional machine learning weather prediction is likely near at hand. |
[BXZ+23] | Kaifeng Bi, Lingxi Xie, Hengheng Zhang, Xin Chen, Xiaotao Gu, and Qi Tian. Accurate medium-range global weather forecasting with 3D neural networks. Nature, 2023. [ bib | DOI | http ] Weather forecasting is important for science and society. At present, the most accurate forecast system is the numerical weather prediction (NWP) method, which represents atmospheric states as discretized grids and numerically solves partial differential equations that describe the transition between those states. However, this procedure is computationally expensive. Recently, artificial-intelligence-based methods have shown potential in accelerating weather forecasting by orders of magnitude, but the forecast accuracy is still significantly lower than that of NWP methods. Here we introduce an artificial-intelligence-based method for accurate, medium-range global weather forecasting. We show that three-dimensional deep networks equipped with Earth-specific priors are effective at dealing with complex patterns in weather data, and that a hierarchical temporal aggregation strategy reduces accumulation errors in medium-range forecasting. Trained on 39 years of global data, our program, Pangu-Weather, obtains stronger deterministic forecast results on reanalysis data in all tested variables when compared with the world's best NWP system, the operational integrated forecasting system of the European Centre for Medium-Range Weather Forecasts (ECMWF). Our method also works well with extreme weather forecasts and ensemble forecasts. When initialized with reanalysis data, the accuracy of tracking tropical cyclones is also higher than that of ECMWF-HRES. |
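A hedged illustration of the hierarchical temporal aggregation strategy mentioned in the [BXZ+23] abstract: decompose a target lead time into as few fixed-step model calls as possible, so that fewer autoregressive steps accumulate error. The 24/6/3/1-hour step set and the function name below are illustrative assumptions, not taken from the paper.

    from typing import List, Sequence

    def aggregation_schedule(lead_time_h: int, steps_h: Sequence[int] = (24, 6, 3, 1)) -> List[int]:
        """Greedy decomposition of a lead time into model steps, largest step first."""
        schedule = []
        remaining = lead_time_h
        for step in sorted(steps_h, reverse=True):
            while remaining >= step:
                schedule.append(step)
                remaining -= step
        if remaining != 0:
            raise ValueError("lead time not representable with the given steps")
        return schedule

    if __name__ == "__main__":
        # 31 hours -> [24, 6, 1]: three model calls instead of 31 one-hour steps.
        print(aggregation_schedule(31))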
[NSB+23] | Tung Nguyen, Rohan Shah, Hritik Bansal, Troy Arcomano, Sandeep Madireddy, Romit Maulik, Veerabhadra Kotamarthi, Ian Foster, and Aditya Grover. Scaling transformer neural networks for skillful and reliable medium-range weather forecasting, 2023. [ bib | arXiv | http ] |
[HÖS23] | Paul Häusner, Ozan Öktem, and Jens Sjölund. Neural incomplete factorization: learning preconditioners for the conjugate gradient method, 2023. Seminar 2024/2025. [ bib | DOI ] Finding suitable preconditioners to accelerate iterative solution methods, such as the conjugate gradient method, is an active area of research. In this paper, we develop a computationally efficient data-driven approach to replace the typically hand-engineered algorithms with neural networks. Optimizing the condition number of the linear system directly is computationally infeasible. Instead, our method generates an incomplete factorization of the matrix and is, therefore, referred to as neural incomplete factorization (NeuralIF). For efficient training, we utilize a stochastic approximation of the Frobenius loss which only requires matrix-vector multiplications. At the core of our method is a novel message-passing block, inspired by sparse matrix theory, that aligns with the objective of finding a sparse factorization of the matrix. By replacing conventional preconditioners used within the conjugate gradient method by data-driven models based on graph neural networks, we accelerate the iterative solving procedure. We evaluate our proposed method on both a synthetic and a real-world problem arising from scientific computing and show its ability to reduce the solving time while remaining computationally efficient. |
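A minimal sketch of the kind of stochastic Frobenius-loss estimator the [HÖS23] abstract refers to, assuming a Hutchinson-style estimate E_z ||(A - L L^T) z||^2 = ||A - L L^T||_F^2 with standard-normal probe vectors; the function names are illustrative and not from the paper's code.

    import numpy as np

    def frobenius_loss_estimate(matvec_A, L, n, num_samples=8, rng=None):
        """Stochastic estimate of ||A - L L^T||_F^2 using only matrix-vector products."""
        rng = np.random.default_rng(rng)
        total = 0.0
        for _ in range(num_samples):
            z = rng.standard_normal(n)
            r = matvec_A(z) - L @ (L.T @ z)   # never forms A - L L^T explicitly
            total += r @ r
        return total / num_samples

    if __name__ == "__main__":
        n = 50
        A = np.random.default_rng(0).standard_normal((n, n))
        A = A @ A.T + n * np.eye(n)                                   # SPD test matrix
        L = np.linalg.cholesky(A) + 0.01 * np.tril(np.ones((n, n)))   # perturbed factor
        est = frobenius_loss_estimate(lambda v: A @ v, L, n, num_samples=200)
        exact = np.linalg.norm(A - L @ L.T, "fro") ** 2
        print(est, exact)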
[MZ23] | Johannes Müller and Marius Zeinhofer. Achieving high accuracy with PINNs via energy natural gradients, 2023. Seminar 2024/2025. [ bib | DOI ] We propose energy natural gradient descent, a natural gradient method with respect to a Hessian-induced Riemannian metric as an optimization algorithm for physics-informed neural networks (PINNs) and the deep Ritz method. As a main motivation we show that the update direction in function space resulting from the energy natural gradient corresponds to the Newton direction modulo an orthogonal projection onto the model's tangent space. We demonstrate experimentally that energy natural gradient descent yields highly accurate solutions with errors several orders of magnitude smaller than what is obtained when training PINNs with standard optimizers like gradient descent or Adam, even when those are allowed significantly more computation time. |
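For orientation, a schematic form of a natural-gradient update with a Gram matrix induced by an energy inner product, written in generic notation rather than the paper's:

    \theta_{k+1} = \theta_k - \eta \, G(\theta_k)^{\dagger} \nabla_\theta L(\theta_k),
    \qquad
    G_{ij}(\theta) = a\!\left(\partial_{\theta_i} u_\theta, \; \partial_{\theta_j} u_\theta\right),

where a(·,·) denotes the (Hessian-induced) energy inner product of the underlying variational problem and † a pseudo-inverse of the Gram matrix.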
[LAR+22] | Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Nikita Smetanin, Robert Verkuil, Ori Kabeli, Yaniv Shmueli, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Salvatore Candido, and Alexander Rives. Evolutionary-scale prediction of atomic level protein structure with a language model. bioRxiv, 2022. [ bib | DOI | arXiv | http ] Artificial intelligence has the potential to open insight into the structure of proteins at the scale of evolution. It has only recently been possible to extend protein structure prediction to two hundred million cataloged proteins. Characterizing the structures of the exponentially growing billions of protein sequences revealed by large-scale gene sequencing experiments would necessitate a breakthrough in the speed of folding. Here we show that direct inference of structure from primary sequence using a large language model enables an order of magnitude speed-up in high resolution structure prediction. Leveraging the insight that language models learn evolutionary patterns across millions of sequences, we train models up to 15B parameters, the largest language model of proteins to date. As the language models are scaled they learn information that enables prediction of the three-dimensional structure of a protein at the resolution of individual atoms. This results in prediction that is up to 60x faster than state-of-the-art while maintaining resolution and accuracy. Building on this, we present the ESM Metagenomic Atlas. This is the first large-scale structural characterization of metagenomic proteins, with more than 617 million structures. The atlas reveals more than 225 million high confidence predictions, including millions whose structures are novel in comparison with experimentally determined structures, giving an unprecedented view into the vast breadth and diversity of the structures of some of the least understood proteins on Earth. |
[LDF+22] | Yang Li, Haiyu Dong, Zuliang Fang, Jonathan Weyn, and Pete Luferenko. Super-resolution probabilistic rain prediction from satellite data using 3D U-Nets and EarthFormers, 2022. [ bib | arXiv ] Accurate and timely rain prediction is crucial for decision making and is also a challenging task. This paper presents a solution which won the 2nd prize in the Weather4cast 2022 NeurIPS competition using 3D U-Nets and EarthFormers for 8-hour probabilistic rain prediction based on multi-band satellite images. The spatial context effect of the input satellite image has been deeply explored and an optimal context range has been found. Based on the imbalanced rain distribution, we trained multiple models with different loss functions. To further improve the model performance, multi-model ensembling and threshold optimization were used to produce the final probabilistic rain prediction. Experiment results and leaderboard scores demonstrate that optimal spatial context, a combined loss function, multi-model ensembling, and threshold optimization all provide modest model gains. A permutation test was used to analyze the effect of each satellite band on rain prediction, and results show that satellite bands signifying cloud-top phase (8.7 um) and cloud-top height (10.8 and 13.4 um) are the best predictors for rain prediction. The source code is available at this https URL. |
[DHP21] | Ronald DeVore, Boris Hanin, and Guergana Petrova. Neural network approximation. Acta Numerica, 30:327--444, May 2021. [ bib | DOI ] Neural networks (NNs) are the method of choice for building learning algorithms. They are now being investigated for other numerical tasks such as solving high-dimensional partial differential equations. Their popularity stems from their empirical success on several challenging learning problems (computer chess/Go, autonomous navigation, face recognition). However, most scholars agree that a convincing theoretical explanation for this success is still lacking. Since these applications revolve around approximating an unknown function from data observations, part of the answer must involve the ability of NNs to produce accurate approximations. |
[MZ21] | Johannes Müller and Marius Zeinhofer. Error estimates for the Deep Ritz method with boundary penalty. March 2021. [ bib | DOI | arXiv ] We estimate the error of the Deep Ritz Method for linear elliptic equations. For Dirichlet boundary conditions, we estimate the error when the boundary values are imposed through the boundary penalty method. Our results apply to arbitrary sets of ansatz functions and estimate the error in dependence of the optimization accuracy, the approximation capabilities of the ansatz class and -- in the case of Dirichlet boundary values -- the penalization strength λ. To the best of our knowledge, our results are presently the only ones in the literature that treat the case of Dirichlet boundary conditions in full generality, i.e., without a lower-order term that leads to coercivity on all of H1(Ω). Further, we discuss the implications of our results for ansatz classes which are given through ReLU networks and the relation to existing estimates for finite element functions. For high-dimensional problems our results show that the favourable approximation capabilities of neural networks for smooth functions are inherited by the Deep Ritz Method. |
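For context, the penalized Deep Ritz energy for the model problem -\Delta u = f in \Omega with u = 0 on \partial\Omega takes the following standard form, written here for illustration:

    E_\lambda(u_\theta) = \int_\Omega \left( \tfrac{1}{2} |\nabla u_\theta|^2 - f \, u_\theta \right) dx
    \; + \; \lambda \int_{\partial\Omega} u_\theta^2 \, ds,

minimised over the parameters \theta of the network ansatz u_\theta; the penalization strength \lambda is the quantity the error estimates in [MZ21] depend on.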
[KLM21] | Nikola Kovachki, Samuel Lanthaler, and Siddhartha Mishra. On universal approximation and error bounds for Fourier neural operators. 2021. [ bib | DOI | http ] Fourier neural operators (FNOs) have recently been proposed as an effective framework for learning operators that map between infinite-dimensional spaces. We prove that FNOs are universal, in the sense that they can approximate any continuous operator to desired accuracy. Moreover, we suggest a mechanism by which FNOs can approximate operators associated with PDEs efficiently. Explicit error bounds are derived to show that the size of the FNO, approximating operators associated with a Darcy-type elliptic PDE and with the incompressible Navier-Stokes equations of fluid dynamics, only increases sub (log)-linearly in terms of the reciprocal of the error. Thus, FNOs are shown to efficiently approximate operators arising in a large class of PDEs. |
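A hedged NumPy sketch of a single one-dimensional Fourier layer of the kind FNOs are built from: transform to Fourier space, multiply the lowest retained modes by learned complex weights, transform back, and add a pointwise linear skip connection. The single-channel simplification and all names are illustrative assumptions, not the paper's implementation.

    import numpy as np

    def spectral_conv_1d(u, weights, modes):
        """Spectral convolution on a real signal u of shape (n,): FFT, scale the
        lowest `modes` frequencies by learned complex weights, inverse FFT."""
        u_hat = np.fft.rfft(u)
        out_hat = np.zeros_like(u_hat)
        out_hat[:modes] = weights * u_hat[:modes]
        return np.fft.irfft(out_hat, n=u.shape[0])

    def fourier_layer(u, weights, w_skip, b, modes):
        """Spectral convolution plus a pointwise linear skip connection and a ReLU."""
        return np.maximum(spectral_conv_1d(u, weights, modes) + w_skip * u + b, 0.0)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        n, modes = 128, 16
        x = np.linspace(0, 2 * np.pi, n, endpoint=False)
        u = np.sin(3 * x) + 0.1 * rng.standard_normal(n)
        weights = rng.standard_normal(modes) + 1j * rng.standard_normal(modes)
        print(fourier_layer(u, weights, w_skip=0.5, b=0.0, modes=modes).shape)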
[HLZ+20] | Michael Hutchinson, Charline Le Lan, Sheheryar Zaidi, Emilien Dupont, Yee Whye Teh, and Hyunjik Kim. LieTransformer: equivariant self-attention for Lie groups, 2020. [ bib | DOI | http ] Group equivariant neural networks are used as building blocks of group invariant neural networks, which have been shown to improve generalisation performance and data efficiency through principled parameter sharing. Such works have mostly focused on group equivariant convolutions, building on the result that group equivariant linear maps are necessarily convolutions. In this work, we extend the scope of the literature to self-attention, which is emerging as a prominent building block of deep learning models. We propose the LieTransformer, an architecture composed of LieSelfAttention layers that are equivariant to arbitrary Lie groups and their discrete subgroups. We demonstrate the generality of our approach by showing experimental results that are competitive with baseline methods on a wide range of tasks: shape counting on point clouds, molecular property regression, and modelling particle trajectories under Hamiltonian dynamics. |
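For reference, the standard equivariance condition that layers of this kind are designed to satisfy, in generic notation rather than the paper's: a map f is equivariant to a group G acting through representations \rho_{in} and \rho_{out} if

    f(\rho_{in}(g) \, x) = \rho_{out}(g) \, f(x) \quad \text{for all } g \in G,

i.e. transforming the input and then applying the layer gives the same result as applying the layer and then transforming the output.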
[SV18] | Kevin Scaman and Aladin Virmaux. Lipschitz regularity of deep neural networks: analysis and efficient estimation. May 2018. [ bib | DOI | arXiv ] Deep neural networks are notorious for being sensitive to small well-chosen perturbations, and estimating the regularity of such architectures is of utmost importance for safe and robust practical applications. In this paper, we investigate one of the key characteristics to assess the regularity of such methods: the Lipschitz constant of deep learning architectures. First, we show that, even for two-layer neural networks, the exact computation of this quantity is NP-hard and state-of-the-art methods may significantly overestimate it. Then, we both extend and improve previous estimation methods by providing AutoLip, the first generic algorithm for upper bounding the Lipschitz constant of any automatically differentiable function. We provide a power method algorithm working with automatic differentiation, allowing efficient computations even on large convolutions. Second, for sequential neural networks, we propose an improved algorithm named SeqLip that takes advantage of the linear computation graph to split the computation per pair of consecutive layers. Third, we propose heuristics on SeqLip in order to tackle very large networks. Our experiments show that SeqLip can significantly improve on the existing upper bounds. Finally, we provide an implementation of AutoLip in the PyTorch environment that may be used to better estimate the robustness of a given neural network to small perturbations or regularize it using more precise Lipschitz estimations. |
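A small sketch of the per-layer power-method bound that approaches like AutoLip build on: estimate each affine layer's spectral norm by power iteration and multiply the results, which upper-bounds the network's Lipschitz constant when the activations are 1-Lipschitz (e.g. ReLU). AutoLip itself obtains the required matrix-vector products through automatic differentiation; the explicit-matrix NumPy version below is an illustrative assumption, not the paper's implementation.

    import numpy as np

    def spectral_norm(W, iters=100, rng=None):
        """Largest singular value of W via power iteration on W^T W."""
        rng = np.random.default_rng(rng)
        v = rng.standard_normal(W.shape[1])
        for _ in range(iters):
            v = W.T @ (W @ v)
            v /= np.linalg.norm(v)
        return np.linalg.norm(W @ v)

    def lipschitz_upper_bound(weight_matrices):
        """Product of per-layer spectral norms: an upper bound on the Lipschitz
        constant of a feed-forward net with 1-Lipschitz activations."""
        bound = 1.0
        for W in weight_matrices:
            bound *= spectral_norm(W)
        return bound

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        layers = [rng.standard_normal((64, 32)), rng.standard_normal((32, 16))]
        print(lipschitz_upper_bound(layers))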
This file was generated by bibtex2html 1.99.