Marten Lienen and Stephan Günnemann / Learning the Dynamics of Physical Systems from Sparse Observations with Finite Element Networks / ICLR 2022
This blog post presents "Learning the Dynamics of Physical Systems from Sparse Observations with Finite Element Networks" by Marten Lienen and Stephan Günnemann [1], which was accepted as a Spotlight presentation at ICLR 2022.
We first introduce the necessary background on differential equations and the finite element method, which forms the backbone of the paper. This part is somewhat lengthy, but it is needed to follow the method.
Differential Equations are regarded by many as the language of nature. Many complex systems can be modeled by describing how each variable relates to the others in both space and time: Partial Differential Equations (PDEs) describe such processes. A quite general formulation can be written as follows:
$$\partial_t u = F(t, x, u, \partial_x u, \partial_x^2 u, \dots)$$
where $u$ is a solution of the equation and $F$ are the dynamics, which can be a function of time, space, $u$ itself and its derivatives. PDEs are generally very expensive to solve, if not intractable altogether. For this reason, many algorithms have been developed over the centuries to tackle this problem. In particular, computers are well suited to handling discretized data rather than continuous, analog quantities. Can we apply an algorithm that is well suited to computers?
The Finite Element Method (FEM) is a way to divide and conquer the realm of PDEs.
Figure 1. An example of the Finite Element Method (FEM) applied to a magnetic shield.
By stacking the equations above we obtain the following linear system
Figure 2. Solving a PDE with the Galerkin method and method of lines consists of three steps.
PDEs are the language of nature and as such they are incredibly important for the scientific community. However, many hand-crafted models either take too long to compute solutions or are not expressive enough. Therefore, it is necessary to include at least partially data-driven terms that can be learned from past observations.
Machine and Deep Learning have proven to be powerful tools for modeling complex real-world phenomena: they can accelerate simulations by orders of magnitude, enabling faster prediction, design and control, and can even describe previously unknown dynamics that cannot be derived from known equations.
There are mainly two lines of research in the area of PDEs and Deep Learning: either constraining PDE solution learning with a cost function, or learning directly from data to obtain a simulator via inductive biases.
In this work, the authors follow the second path and derive a model which stems from research on numerical methods for differential equations and can incorporate knowledge of the dynamics (such as transport terms).
We would like to find solutions to a PDE process via a data-driven simulator. Given the Finite Element Equation 1, we can rewrite its terms as follows:
Figure . Finite Element Networks: we can evaluate dynamics by message passing over adjacent cells and integrating this value to obtain the future values.
Moreover, by factoring the inner product on the right side of the previous equation as
Figure . Example flow fields around airfoils exhibit convective components.
The authors consider the following baselines:
Graph WaveNet (GWN): combines temporal and graph convolutions [3]
Physics-aware Difference Graph Network (PA-DGN): estimates spatial derivatives as additional features for a recurrent graph network [4]
Continuous-time MPNN (CT-MPNN): uses a general MPNN to learn the continuous-time dynamics of the data [5]
Figure . CylinderFlow snapshot.
Figure . Learned flow fields of water velocities on the Black Sea dataset: T-FEN recognized the relationships between features.
Figure . Long-range extrapolations on the ScalarFlow dataset (60 time steps). FEN models perform better than the strongest baseline, in part because they model sources and sinks better.
Both networks $f_\theta$ and $g_\vartheta$ are multi-layer perceptrons (MLPs) with $\tanh$ nonlinearities. The number of parameters of each network was kept similar between the FEN and T-FEN models and lower than that of the baselines to demonstrate their capabilities.
Figure . Multi-step Forecasting.
Figure . Errors with super-resolution in the number of nodes.
Figure . Extrapolation over 60 steps.
Figure . T-FEN model providing an interpretable splitting between free-form and transport term.
This experiment aims at providing interpretability and a justification for the T-FEN model. Plotting the free-form and transport terms separately provides an interpretable view into what the model has learned - the transport term represents the differences in the flow field.
We have reviewed Learning the Dynamics of Physical Systems from Sparse Observations with Finite Element Networks, a novel paradigm for learning dynamics on graphs based on inductive biases from differential equations. The authors provide a detailed derivation of the method from the ground up - starting from the theory of Finite Element analysis - and then devise two main model variants. While the first one directly learns the time derivative of the solution of the physical system, the second separates out a transport term, which is shown to improve learning under many conditions. The experiments were conducted on one synthetic and two real-world high-dimensional datasets. The results demonstrate that the proposed models either perform competitively with or outperform state-of-the-art baselines. This work represents an important contribution to the scientific machine learning community by tightly integrating the theory of the Finite Element Method with Graph Neural Networks.
In particular, the domain $\Omega$, together with a set of points in it, is divided into a set of simplices (i.e., $d$-dimensional triangles); this partition is called a triangulation. Triangulations, such as the Delaunay triangulation, are also referred to as meshes and are frequently used in many other areas such as movie CGI, gaming and 3D graphics in general. This can be seen on the left of Figure 1. Then, operations are performed on this discretized domain to obtain a solution, as shown on the right of Figure 1.
In general, the solution $u$ would lie in an infinite-dimensional space $\mathcal{U}$. What if, however, we cannot work with infinitely many dimensions? Then we need to approximate $u$ in a finite-dimensional subspace $\tilde{\mathcal{U}}$. To do so we employ basis functions $\varphi^{(j)}$, which map points from $\Omega$ to $\mathbb{R}$. The simplest choice, which the authors use, is the P1 piecewise linear functions, which satisfy
$$\varphi^{(j)}(x^{(i)}) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$$
that is, each basis function equals 1 at its own node, 0 at every other node, and is linear in between, as in Figure 2 left.
Moreover, expanding $u$ in this basis as $u(x) = \sum_j c_j \varphi^{(j)}(x)$ has another convenient property:
$$u(x^{(i)}) = c_i$$
i.e., the value of $u$ at the $i$-th node is just its $i$-th coefficient.
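The interpolation property of the P1 basis can be checked numerically. The following is a minimal sketch in one dimension, assuming a sorted list of node positions; the helper names `hat` and `expand` are hypothetical and not taken from the paper.

```python
def hat(j, nodes, x):
    """Evaluate the P1 (piecewise linear) basis function of node j at x.

    `nodes` is a sorted list of 1D node positions (an assumption of this sketch).
    """
    xj = nodes[j]
    # Rising edge on the cell to the left of node j.
    if j > 0 and nodes[j - 1] <= x <= xj:
        return (x - nodes[j - 1]) / (xj - nodes[j - 1])
    # Falling edge on the cell to the right of node j.
    if j < len(nodes) - 1 and xj <= x <= nodes[j + 1]:
        return (nodes[j + 1] - x) / (nodes[j + 1] - xj)
    # Outside the two adjacent cells the basis function vanishes.
    return 0.0


def expand(coeffs, nodes, x):
    """Evaluate the expansion sum_j c_j * phi_j(x)."""
    return sum(c * hat(j, nodes, x) for j, c in enumerate(coeffs))


nodes = [0.0, 0.5, 1.5, 2.0]      # irregularly spaced nodes
coeffs = [1.0, 3.0, -2.0, 4.0]    # basis coefficients

# At the nodes the expansion returns exactly the coefficients ...
values_at_nodes = [expand(coeffs, nodes, x) for x in nodes]
# ... and in between it interpolates linearly between neighbouring nodes.
midpoint_value = expand(coeffs, nodes, 1.0)
```

Because the node values and the coefficients coincide, sensor readings at the mesh nodes can be used directly as coefficients of the approximation.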
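The piecewise linear approximation above is not differentiable everywhere. However, we can constrain the residual $R(u) = \partial_t u - F(t, x, u, \dots)$, i.e. the difference between $\partial_t u$ and $F$, to be orthogonal to the approximation space:
$$\langle R(u), \varphi^{(i)} \rangle_\Omega = 0 \quad \forall i \in \{1, \dots, N\}$$
where $\Omega$ represents the spatial domain and $N$ is the number of mesh nodes. In simpler terms, we are asking for the best possible approximation of the equation within the chosen subspace. Given this, we can now rewrite the equation as follows:
$$\sum_{j=1}^{N} \partial_t c_j \, \langle \varphi^{(j)}, \varphi^{(i)} \rangle_\Omega = \langle F(t, x, u, \dots), \varphi^{(i)} \rangle_\Omega$$
In matrix form this reads $A \, \partial_t c = m$, where $A$ with $A_{ij} = \langle \varphi^{(i)}, \varphi^{(j)} \rangle_\Omega$ is the so-called mass matrix, $c$ is the vector of basis coefficients of $u$ and $m$ with $m_i = \langle F(t, x, u, \dots), \varphi^{(i)} \rangle_\Omega$ captures the effects of the dynamics $F$.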
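To make the mass matrix concrete, here is a small sketch that assembles it for P1 basis functions on a 1D mesh. The 1D setting and the function name `assemble_mass_matrix` are assumptions made for illustration; the paper works with simplices in higher dimensions. On a cell of length $h$, the inner product of a hat function with itself is $h/3$ and with its neighbour is $h/6$.

```python
def assemble_mass_matrix(nodes):
    """Assemble the P1 mass matrix A_ij = <phi_i, phi_j> on a sorted 1D mesh."""
    n = len(nodes)
    A = [[0.0] * n for _ in range(n)]
    # Loop over cells (intervals) and add each cell's local contribution.
    for left in range(n - 1):
        right = left + 1
        h = nodes[right] - nodes[left]
        A[left][left] += h / 3      # integral of phi_left^2 over the cell
        A[right][right] += h / 3    # integral of phi_right^2 over the cell
        A[left][right] += h / 6     # integral of phi_left * phi_right
        A[right][left] += h / 6     # symmetric entry
    return A


nodes = [0.0, 0.5, 1.5, 2.0]
A = assemble_mass_matrix(nodes)

# The hat functions sum to one everywhere, so all entries together
# integrate the constant 1 over the domain, i.e. they sum to its length.
total_mass = sum(sum(row) for row in A)
```

The matrix is sparse: entries are nonzero only for pairs of nodes that share a cell, which is what later makes the computation local to neighbouring cells.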
If we can evaluate the right-hand side $m$, then the linear system can be solved for the time derivatives $\partial_t c$. In particular, we can treat a vector-valued field as a stack of multiple scalar fields and write
$$A \, \partial_t C = M$$
where $C$ and $M$ are matrices with one row per node and one column per scalar field. In practice, we have transformed a PDE into a matrix ODE (ordinary differential equation) by discretizing in space: all that is left is to integrate over time, which is a much simpler task!
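In terms of the node features, the finite element equation reads $A \, \partial_t Y^{(t)} = M$, where $A$ is the mass matrix and $Y^{(t)}$ is the feature matrix - in other words, this part represents the feature update in time that we need to obtain the evolution of the dynamics. The problem at inference time then becomes:
Evaluating the mass matrix $A$ and inverting it
Evaluating the message matrix $M$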
We can readily obtain $A$ by mass lumping [2], which replaces it with a diagonal approximation and thus makes the inversion needed to obtain $\partial_t Y^{(t)}$ cheap. The right-hand side describing the dynamics, as we have seen before, requires evaluating the contributions of the dynamics on adjacent cells:
$$M_{ij} = \sum_{\Delta \in \mathcal{N}_i} \big\langle F(t, x, u, \dots)_j, \varphi^{(i)} \big\rangle_{\mathrm{CH}(\Delta)}$$
where $\mathcal{N}_i$ is the set of mesh cells adjacent to node $x^{(i)}$ and $\mathrm{CH}(\Delta)$ is the convex hull of cell $\Delta$ (i.e., the smallest convex set that contains its vertices). As we can see, evaluating $M$ (which we call the message matrix) is actually the same as performing message passing between adjacent cells. This means that we can represent these dynamics with a Message Passing Neural Network!
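The two inference steps can be sketched together on a 1D mesh. This is an illustrative simplification, not the authors' implementation: `cell_dynamics` is a hypothetical stand-in for a learned network, cells are pairs of node indices, and mass lumping is done by row-summing the P1 mass matrix, which in 1D gives each vertex half the length of every adjacent cell.

```python
def lumped_mass(nodes, cells):
    """Diagonal (row-sum) mass: each cell gives half its length to each vertex."""
    mass = [0.0] * len(nodes)
    for i, j in cells:
        h = abs(nodes[j] - nodes[i])
        mass[i] += h / 2
        mass[j] += h / 2
    return mass


def message_vector(nodes, cells, y, cell_dynamics):
    """Each cell evaluates its dynamics once and sends a message to its vertices."""
    messages = [0.0] * len(nodes)
    for i, j in cells:
        h = abs(nodes[j] - nodes[i])
        center = (nodes[i] + nodes[j]) / 2
        offsets = (nodes[i] - center, nodes[j] - center)
        f = cell_dynamics(center, offsets, (y[i], y[j]))
        # <f, phi_i> over the cell, with f treated as constant on the cell.
        messages[i] += f * h / 2
        messages[j] += f * h / 2
    return messages


def time_derivative(nodes, cells, y, cell_dynamics):
    """Solve the lumped system: divide each node's message by its lumped mass."""
    mass = lumped_mass(nodes, cells)
    messages = message_vector(nodes, cells, y, cell_dynamics)
    return [m / a for m, a in zip(messages, mass)]


nodes = [0.0, 0.5, 1.5, 2.0]
cells = [(0, 1), (1, 2), (2, 3)]
y = [1.0, 3.0, -2.0, 4.0]

# Sanity check with constant cell dynamics: every node changes at that rate.
rates = time_derivative(nodes, cells, y, lambda center, offsets, feats: 2.0)
```

With a diagonal mass matrix the "inversion" reduces to an element-wise division, and the message computation touches only the vertices of each cell.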
we can avoid numerical instabilities and learn spatial derivatives as well. This means that we can learn a model $f_\theta$ of the dynamics!
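We have seen from Equation 2 that we can learn a model $f_\theta$ by performing message passing over adjacent cells. In particular, the learned model can be written as:
$$f^{(t)}_{\theta, \Delta} = f_\theta\big(t, \mu_\Delta, x_\Delta - \mu_\Delta, y^{(t)}_\Delta\big)$$
where $\mu_\Delta$ is the center of cell $\Delta$, $x_\Delta - \mu_\Delta$ are the coordinates of the cell vertices w.r.t. $\mu_\Delta$ and $y^{(t)}_\Delta$ are the features at the vertices at time $t$. We have written the equations for a single message passing step, which is the update at each single time step. To obtain a whole trajectory, we need to solve the associated ODE given an initial condition $Y^{(t_0)}$ and times $t_1, \dots, t_k$:
$$\big(Y^{(t_1)}, \dots, Y^{(t_k)}\big) = \mathrm{ODESolve}\big(Y^{(t_0)}, \partial_t Y, (t_0, t_1, \dots, t_k)\big)$$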
This ODE can be solved in a variety of ways. In particular, the authors employ an adaptive-step solver, i.e., a solver that iteratively computes the solution by calling the dynamics function multiple times while adjusting its step size. We call the resulting model FEN: Finite Element Network.
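To illustrate what "adaptive-step" means and why such solvers call the dynamics function many times, here is a deliberately simple sketch for a scalar ODE. It uses an embedded Euler/Heun pair as a stand-in; it is not the solver used in the paper, and the function name `solve_adaptive` and its parameters are assumptions of this example.

```python
import math


def solve_adaptive(f, t0, y0, t1, tol=1e-6, h=0.1):
    """Integrate dy/dt = f(t, y) from t0 to t1 with adaptive step size.

    Returns the final value and the number of function evaluations (nfev).
    """
    t, y, nfev = t0, y0, 0
    while t < t1:
        h = min(h, t1 - t)               # never step past the end time
        k1 = f(t, y)
        k2 = f(t + h, y + h * k1)
        nfev += 2
        y_euler = y + h * k1             # first-order estimate
        y_heun = y + h * (k1 + k2) / 2   # second-order estimate
        err = abs(y_heun - y_euler)      # local error estimate
        if err <= tol:
            t, y = t + h, y_heun         # accept the step
        # Shrink the step after a large error, grow it after a small one.
        if err > 0.0:
            factor = 0.9 * math.sqrt(tol / err)
        else:
            factor = 2.0
        h *= min(2.0, max(0.2, factor))
    return y, nfev


# Exponential decay dy/dt = -y with y(0) = 1, integrated up to t = 1.
result, nfev = solve_adaptive(lambda t, y: -y, 0.0, 1.0, 1.0)
exact = math.exp(-1.0)
```

The evaluation count `nfev` grows as the tolerance tightens, which is the price of adaptive accuracy control; in a FEN every such evaluation corresponds to one message passing step over the mesh.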
What if we have some extra knowledge about the domain? For example, we could assume that our solution is at least in part governed by a convective component (i.e. one describing transport by fluid motion):
$$\partial_t u = -\nabla \cdot (v \, u) + F'(t, x, u, \dots)$$
where $v$ is the divergence-free velocity field and $F'$ collects the remaining dynamics.
We can model $F'$ as in the previous case, while we can model the convection term by message passing with a second network $g_\vartheta$ that estimates the velocity on each cell:
The final model, which is called T-FEN: Transport-FEN, is the sum of the message passing of the above convection term and the free-form term, and is thus designed to capture both a velocity field and the remainder dynamics.
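The split into a transport term and a free-form term can be sketched in 1D as follows. This is a simplified illustration under stated assumptions rather than the authors' implementation: `velocity` and `free_form` are hypothetical stand-ins for the two learned networks, the velocity is treated as constant on each cell, and the gradient of the piecewise linear solution on a cell is the finite difference of its two vertex values.

```python
def tfen_time_derivative(nodes, cells, y, free_form, velocity):
    """Sum of a transport term (-v * du/dx) and a free-form term, per cell."""
    mass = [0.0] * len(nodes)
    messages = [0.0] * len(nodes)
    for i, j in cells:
        h = nodes[j] - nodes[i]
        center = (nodes[i] + nodes[j]) / 2
        feats = (y[i], y[j])
        # A piecewise linear solution has a constant gradient on each cell.
        grad_u = (y[j] - y[i]) / h
        transport = -velocity(center, feats) * grad_u
        remainder = free_form(center, feats)
        weight = abs(h) / 2              # <1, phi> over the cell for each vertex
        for k in (i, j):
            mass[k] += weight            # lumped (diagonal) mass
            messages[k] += (transport + remainder) * weight
    return [m / a for m, a in zip(messages, mass)]


nodes = [0.0, 0.5, 1.5, 2.0]
cells = [(0, 1), (1, 2), (2, 3)]
y = [3.0 * x for x in nodes]             # linear field with slope 3

# Constant velocity 2 and constant source 1: du/dt = -2 * 3 + 1 everywhere.
rates = tfen_time_derivative(
    nodes, cells, y,
    free_form=lambda center, feats: 1.0,
    velocity=lambda center, feats: 2.0,
)
```

Keeping the two terms separate is what allows them to be inspected individually after training, for example by plotting the estimated velocity per cell.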
The following dataset consists of simulated flow fields around a cylinder as collected by [6]. The dataset includes velocities and pressures along with marked mesh cells representing boundary walls, inlets, outlets and cylindrical obstacles of varying sizes. The sequences contain frames and are divided in for train, validation and test. The time resolution is of .
This dataset consists of daily mean sea surface temperatures and water velocities on the Black Sea over several years. The training data is made of frames from 2012 to 2017, validation is done on frames from 2018 and testing on frames from 2019. The time resolution is 1 day.
This dataset consists of 3D reconstructions generated by multiple camera views of rising hot smoke plumes in a real environment. The sequences contain frames and are divided in for train, validation and test. The time resolution is of (recording was done at 60 fps)[7].
This experiment aims at predicting multiple steps into the future. We can see that FEN models either outperform the baselines or achieve similar, competitive results.
This experiment aims at predicting multiple steps into the future as before, but with a varying number of nodes, i.e. more nodes than those seen during training. FEN models outperform the baselines in super-resolution: T-FEN models always perform better than their FEN counterparts since they can better represent transport terms.
This experiment aims at predicting 60 steps into the future with models trained on shorter sequences. FEN models outperform the baselines since they can correctly represent sources and sinks.
The proposed model uses a simple basis - namely, piecewise linear basis functions. Any derivative of these basis functions of order higher than one, such as second order, evaluates to $0$, so terms with higher-order derivatives cannot be represented directly, which is a current limitation of the model. Another limitation is the number of function evaluations: it is shown that the models can take more than 300 evaluations, while other non-continuous models may require just one. This is due to the adaptive ODE solvers used. Although the model can theoretically describe continuous dynamics, in practice this makes it much slower than one-step-prediction counterparts that do not need to solve an ODE.