
Sparsity-promoting algorithms for the discovery of informative Koopman-invariant subspaces

Published online by Cambridge University Press:  26 April 2021

Shaowu Pan*
Affiliation:
Department of Aerospace Engineering, University of Michigan, Francois-Xavier Bagnoud Building, 1320 Beal Ave, Ann Arbor, MI, USA
Nicholas Arnold-Medabalimi
Affiliation:
Department of Aerospace Engineering, University of Michigan, Francois-Xavier Bagnoud Building, 1320 Beal Ave, Ann Arbor, MI, USA
Karthik Duraisamy
Affiliation:
Department of Aerospace Engineering, University of Michigan, Francois-Xavier Bagnoud Building, 1320 Beal Ave, Ann Arbor, MI, USA
*Email address for correspondence: shawnpan@umich.edu

Abstract

Koopman decomposition is a nonlinear generalization of eigen-decomposition, and is being increasingly utilized in the analysis of spatio-temporal dynamics. Well-known techniques such as the dynamic mode decomposition (DMD) and its linear variants provide approximations to the Koopman operator, and have been applied extensively in many fluid dynamic problems. Despite being endowed with a richer dictionary of nonlinear observables, nonlinear variants of the DMD, such as extended/kernel dynamic mode decomposition (EDMD/KDMD) are seldom applied to large-scale problems primarily due to the difficulty of discerning the Koopman-invariant subspace from thousands of resulting Koopman eigenmodes. To address this issue, we propose a framework based on multi-task feature learning to extract the most informative Koopman-invariant subspace by removing redundant and spurious Koopman triplets. In particular, we develop a pruning procedure that penalizes departure from linear evolution. These algorithms can be viewed as sparsity-promoting extensions of EDMD/KDMD. Furthermore, we extend KDMD to a continuous-time setting and show a relationship between the present algorithm, sparsity-promoting DMD and an empirical criterion from the viewpoint of non-convex optimization. The effectiveness of our algorithm is demonstrated on examples ranging from simple dynamical systems to two-dimensional cylinder wake flows at different Reynolds numbers and a three-dimensional turbulent ship-airwake flow. The latter two problems are designed such that very strong nonlinear transients are present, thus requiring an accurate approximation of the Koopman operator. Underlying physical mechanisms are analysed, with an emphasis on characterizing transient dynamics. The results are compared with existing theoretical expositions and numerical approximations.

Type
JFM Papers
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2021. Published by Cambridge University Press

1. Introduction

Complex unsteady flow phenomena such as turbulence (Pope 2001), flow instability (Drazin & Reid 2004; Lietz, Johnsen & Kushner 2017) and fluid–structure interactions (Dowell & Hall 2001) are prevalent in many physical systems. To analyse and understand such phenomena, it is useful to extract coherent modes associated with important dynamical mechanisms and track their evolution. Koopman operator theory (Budišić, Mohr & Mezić 2012) offers an elegant framework to decompose spatio-temporal fields associated with nonlinear dynamics into a linear combination of time-evolving modes ordered by frequency and growth rate (Rowley et al. 2009).

Consider a general continuous nonlinear dynamical system $\dot {\boldsymbol {x}} = \boldsymbol {F}(\boldsymbol {x})$, where the system state $\boldsymbol {x}$ evolves on a manifold $\mathcal {M} \subset \mathbb {R}^N$. Here $\boldsymbol {F}: \mathcal {M} \mapsto T\mathcal {M}$ is a vector-valued smooth function and $T\mathcal {M}$ is the tangent bundle, i.e. $\forall \, p \in \mathcal {M}, \boldsymbol {F}(p) \in T_p \mathcal {M}$. The aforementioned task of decomposition is equivalent to finding a set of $L$ observables associated with the system, $\{g_i\}_{i=1}^{L}$ $(g_i: \mathcal {M} \mapsto \mathbb {C})$, that evolve linearly in time while spanning the system state $\boldsymbol {x}$. The significance of the above statement is that it represents a global linearization of the nonlinear dynamical system (Budišić et al. 2012; Brunton et al. 2016a).

Formally, this idea can be traced back to Koopman operator theory for Hamiltonian dynamical systems, introduced by Koopman (1931) to study the evolution of observables $g$ on $L^2(\mathcal {M})$, which is a vector space of square integrable functions defined on the manifold $\mathcal {M}$. More generally, for a certain vector space of observable functions $\mathcal {F}$ defined on the manifold $\mathcal {M}$ and $t>0$, the Koopman semigroup (parameterized by $t$) is the set $\{\mathcal {K}_t\}_{t\in \mathbb {R}^+}$, $\mathcal {K}_t: \mathcal {F} \mapsto \mathcal {F}$, that governs the dynamics of observables in the form $\mathcal {K}_t g(\boldsymbol {x}(0)) \triangleq {g}(S(t, \boldsymbol {x}_0)) = g (\boldsymbol {x}(t))$, where $S(t,\cdot )$ is the flow map of the dynamical system (Mezić 2013). (Note that for a discrete dynamical system resulting from the discretization (with time step ${\rm \Delta} t$) of a continuous dynamical system, the corresponding discrete-time Koopman operator is an element of this semigroup: $\mathcal {K}_{{\rm \Delta} t}$.) Here $\mathcal {K}$, defined by $\mathcal {K} f = \lim _{t\rightarrow 0} (\mathcal {K}_t \,f - f) / t$, is referred to as the continuous-time Koopman operator, i.e. the Koopman generator. While the Koopman operator is linear over the space of observables, $\mathcal {F}$ is most often infinite dimensional, e.g. $L^2(\mathcal {M})$, which makes the approximation of the Koopman operator a difficult problem. Throughout this work, we assume that $\mathcal {F} = L^2(\mathcal {M})$. Readers interested in a detailed discussion of the choice of $\mathcal {F}$ are referred to Bruce, Zeidan & Bernstein (2019).

A special subspace of $\mathcal {F}$, referred to as a minimal Koopman-invariant subspace $\mathcal {G}$, has the following property: for any $\phi \in \mathcal {G}$, for any $t \in \mathbb {R}^{+}$, $\mathcal {K}_t \phi \in \mathcal {G}$ and $\boldsymbol {x} \in \mathcal {G}$. Existing techniques such as the extended dynamic mode decomposition (Williams, Kevrekidis & Rowley 2015) are capable of approximating Koopman operators, but typically yield high-dimensional spaces (the $L$-dimensional space in figure 1). In this work we are interested in accurately approximating such a subspace – but with the minimal possible dimension and Koopman invariance – using learning techniques, as illustrated in figure 1. This can yield useful coordinates for a multitude of applications including modal analysis and control (Arbabi, Korda & Mezic 2018).

Figure 1. Schematic of transformation of a nonlinear system to a linear $L$-dimensional space and the extraction of a minimal Koopman-invariant subspace.

In general, physical systems governed by partial differential equations, e.g. fluid dynamics, are infinite dimensional. From a numerical viewpoint, the number of degrees of freedom can be related to the spatial discretization (for example, the number of grid points). Although a finite-dimensional manifold can be extracted (Holmes et al. 2012), e.g. of $O(10)$–$O(10^3)$ dimensions via proper orthogonal decomposition (POD), finding a Koopman-invariant subspace on such manifolds is still challenging. Currently, the most popular method to approximate the Koopman operator is the dynamic mode decomposition (DMD) (Schmid 2010; Rowley & Dawson 2017), mainly for two reasons. First, it is straightforward and computationally efficient compared with nonlinear counterparts such as extended DMD (EDMD) (Williams et al. 2015) and kernel DMD (KDMD) (Williams, Rowley & Kevrekidis 2014). Second, the essence of DMD is to decompose a spatio-temporal field into several temporally growing/decaying travelling/stationary harmonic waves, which are prevalent in fluid mechanics. However, the accuracy of DMD is limited by the assumption that the Koopman-invariant subspace lies in the space spanned by snapshots of the state $\boldsymbol {x}$. Thus, DMD is mainly used to identify and visualize coherent structures. Indeed, DMD can be interpreted as an $L^2$ projection of the action of the Koopman operator on the linear space spanned by snapshots of the system state (Korda & Mezić 2018b).

To overcome the above limitations, it is natural to augment the observable space with either the history of the state (Arbabi & Mezić 2017a; Brunton et al. 2017; Le Clainche & Vega 2017; Kamb et al. 2018) or nonlinear observables of the state (Williams et al. 2014, 2015). Time-delay embedding can be very useful in reduced-order modelling of systems for which sparse measurements can be easily obtained, assuming the inputs and outputs are not high dimensional (Korda & Mezić 2018a). Although time-delay embedding is simple to implement and has strong connections to Takens’ embedding (Kamb et al. 2018; Pan & Duraisamy 2019), the main practical issue arises in reduced-order modelling of high-fidelity simulations in a predictive setting due to the requirement of a large number of snapshots of the full-order model. Furthermore, if one is only interested in the post-transient dynamics of the system state on an attractor, linear observables with time delays are sufficient to extract an informative Koopman-invariant subspace (Mezić 2005; Arbabi & Mezić 2017a,b; Brunton et al. 2017; Röjsel 2017; Pan & Duraisamy 2019). However, if one is interested in the strongly nonlinear transient dynamics leading to an attractor or reduced-order modelling for high-fidelity numerical simulations (Carlberg et al. 2013; Huang, Duraisamy & Merkle 2018; Xu & Duraisamy 2019; Parish, Wentland & Duraisamy 2020; Xu, Huang & Duraisamy 2020), time-delay embedding may become less appropriate as several delay snapshots of the full-order model are required to initialize the model. In that case, nonlinear observables may be more appropriate.

Driven by the interest in modal analysis and control of transient flow phenomena, we consider augmentation of the observable space with nonlinear functions of the state, e.g. EDMD (Williams et al. 2015)/KDMD (Williams et al. 2014). Although it has been reported that KDMD allows for a set of more interpretable Koopman eigenvalues (Williams et al. 2014) and better accuracy (Röjsel 2017), issues such as mode selection, spurious modes (Kaiser, Kutz & Brunton 2017; Zhang et al. 2017) and choice of dictionary/kernel in EDMD/KDMD remain open. In fact, the choice of kernel type and hyperparameters can significantly affect the resulting eigenmodes, the distribution of eigenvalues (Kutz et al. 2016) and the accuracy of predictions (Zhang et al. 2017).

Searching for an accurate and informative Koopman-invariant subspace has long been a pursuit in the DMD community. Rowley et al. (2009) and Schmid, Violato & Scarano (2012) considered selecting dominant DMD modes in order of their amplitudes. However, following such a criterion (Tu et al. 2013; Kou & Zhang 2017) may result in the selection of irrelevant modes that have large amplitudes but decay rapidly. As a result, Tu et al. (2013) considered weighting the loss term by the magnitude of the eigenvalues to penalize the retention of fast decaying modes. Sparsity-promoting DMD (referred to as ‘spDMD’ throughout the paper), developed by Jovanović, Schmid & Nichols (2014), recasts mode selection in DMD as an optimization problem with an $\ell _1$ penalty. With a preference for stable modes over fast decaying ones, Tissot et al. (2014) proposed a simpler criterion based on time-averaged-eigenvalue-weighted amplitude. This was followed by Kou & Zhang (2017), who used a similar criterion but computed the ‘energy’ of each mode, yielding performance similar to spDMD at a lower computational cost. Based on the orthonormal property of the pseudo-inverse, Hua et al. (2017) proposed an ordering of Koopman modes by defining a new ‘energy’. Compared with previous empirical criteria, this ‘energy’ for each mode involves a pseudo-inverse that combines the influence of all eigenmodes, and, therefore, the contribution from each mode cannot be isolated. Instead of selecting modes from a ‘reconstruction’ perspective, Zhang et al. (2017) studied the issue of spurious modes by evaluating the deviation of the identified eigenfunctions from linear evolution in an a priori sense. Furthermore, optimized DMD (Chen, Tu & Rowley 2012; Askham & Kutz 2018) performs DMD and mode selection simultaneously, and is, in spirit, the forerunner of recently proposed neural network-based models for Koopman eigenfunctions (Li et al. 2017; Takeishi, Kawahara & Yairi 2017; Lusch, Kutz & Brunton 2018; Otto & Rowley 2019; Yeung, Kundu & Hodas 2019; Pan & Duraisamy 2020). Setting aside the issues related to non-convex optimization (Dawson et al. 2016), the extension of optimized DMD to EDMD/KDMD is not straightforward. Further, neural network-based models require large amounts of data, are prone to local minima and lack interpretability. There have been a few attempts towards mode selection in EDMD/KDMD. Brunton et al. (2016a) present an iterative method that augments the dictionary of EDMD until a convergence criterion is reached for the subspace. This is effectively a recursive implementation of EDMD.
Recently, Haseli & Cortés (2019) showed that, given a sufficient amount of data, if there is any accurate Koopman eigenfunction spanned by the dictionary, it must correspond to one of the obtained eigenvectors. Moreover, they proposed the idea of mode selection by checking whether the reciprocal of an identified eigenvalue also appears when the temporal sequence of data is reversed, which is similar to the idea of comparing eigenvalues on the complex plane from different trajectories, as proposed by Hua et al. (2017). In contrast to the ‘bottom-up’ method of Brunton et al. (2016a) with subspace augmentation, Hua et al. (2017) proposed a ‘top-down’ subspace subtraction method relying on iteratively projecting the features onto the null space. A similar idea can be traced back to Kaiser et al. (2017), who proposed a search for the sparsest vector in the null space.

As the main contribution of this work, we propose a novel EDMD/KDMD framework equipped with the following strategy to extract an accurate yet minimal Koopman-invariant subspace.

  (i) We first evaluate, in an a posteriori fashion, the maximal normalized deviation of each eigenfunction from linear evolution.

  (ii) Using the above criterion, we select a user-defined number of accurate EDMD/KDMD modes.

  (iii) Among the accurate EDMD/KDMD modes obtained above, informative modes are selected using multi-task feature learning (Argyriou, Evgeniou & Pontil 2008a; Bach et al. 2012).

To the best of our knowledge, this is the first model-agnostic attempt to address sparse identification of an accurate minimal Koopman-invariant subspace. Our contribution also extends the classical spDMD (Jovanović et al. 2014) to a more general setting that includes EDMD/KDMD. The applications are focused on strongly transient flows, and new classes of stable Koopman modes are identified.

The organization of the paper is as follows. In § 2 we review EDMD/KDMD in discrete time and present corresponding extensions to continuous time. Following this, we discuss current challenges in standard EDMD/KDMD. In § 3 we propose the framework of sparse identification of informative Koopman-invariant subspaces for EDMD/KDMD with hyperparameter selection, and provide implementation details. A novel optimization procedure that penalizes departure from linear evolution is detailed in §§ 3.1 and 3.2. Differences and connections between spDMD, Kou's empirical criterion and our proposed framework are shown from the viewpoint of optimization. In § 4 numerical verification and modal analysis are provided for examples ranging from a fixed-point attractor to a transient cylinder wake flow and a transient ship airwake. In § 5 conclusions are drawn.

2. Review of EDMD/KDMD

In this section we provide a summary of our framework to extract the Koopman operator using EDMD/KDMD in both discrete time and continuous time. Although the original EDMD is seldom used for data-driven Koopman analysis of fluid flows due to its poor scaling, it has recently been reported that a scalable version of EDMD can be applied to high-dimensional systems (DeGennaro & Urban 2019). Since our algorithm to extract the Koopman-invariant subspace is agnostic to the type of technique used to compute the original set of Koopman modes, we include both EDMD and KDMD for completeness.

2.1. Extended DMD

2.1.1. Discrete-time formulation

For simplicity, consider $M$ sequential snapshots sampled uniformly in time with step ${\rm \Delta} t$ on a trajectory, $\{\boldsymbol {x}_i\}_{i=1}^{M}$. The EDMD algorithm (Williams et al. 2015) assumes a dictionary of $L$ linearly independent functions $\psi _i(\boldsymbol {x}): \mathcal {M} \mapsto \mathbb {C}$, $i=1,\ldots ,L$, that approximately spans a Koopman-invariant subspace $\mathcal {F}_L$,

(2.1)\begin{equation} \mathcal{F}_L = \textrm{span} \{ \psi_1(\boldsymbol{x}), \ldots, \psi_L(\boldsymbol{x}) \}. \end{equation}

Thus, any $g \in \mathcal {F}_{L}$ can be written as $g(\boldsymbol {x}) = \boldsymbol {\varPsi }(\boldsymbol {x}) \boldsymbol {a}$ with $\boldsymbol {a} \in \mathbb {C}^{L}$, where the feature vector $\boldsymbol {\varPsi }(\boldsymbol {x})$ is

(2.2)\begin{equation} \boldsymbol{\varPsi}(\boldsymbol{x}) = \begin{bmatrix} \psi_1(\boldsymbol{x}) & \ldots & \psi_L(\boldsymbol{x}) \end{bmatrix}. \end{equation}

Consider a set of $L^{'}$ observables $\{g_l\}_{l=1}^{L^{'}} = \{\boldsymbol {\varPsi } \boldsymbol {a}_l\}_{l=1}^{L^{'}}$, where each $\boldsymbol {a}_l \in \mathbb {C}^L$ is arbitrary. Applying the discrete-time Koopman operator $\mathcal {K}_{{\rm \Delta} t}$ to each $g_l$, given data $\{\boldsymbol {x}_i\}_{i=1}^{M}$, we have, for $l=1,\ldots ,L^{\prime }$ and $i=1,\ldots ,M-1$,

(2.3)\begin{equation} \mathcal{K}_{{\rm \Delta} t} g_l(\boldsymbol{x}_i) = g_l(\boldsymbol{x}_{i+1}) = \boldsymbol{\varPsi}(\boldsymbol{x}_{i+1}) \boldsymbol{a}_l = \boldsymbol{\varPsi}(\boldsymbol{x}_{i}) \boldsymbol{K} \boldsymbol{a}_l + r_{i,l}, \end{equation}

where $r_{i,l}$ is simply the residual for the $l$th function on the $i$th data point. The standard EDMD algorithm seeks a constant matrix $\boldsymbol {K} \in \mathbb {C}^{L \times L}$ that governs the linear dynamics in the observable space such that the sum of the squares of the residuals $r_{i,l}$ from (2.3) over all samples and functions,

(2.4)\begin{equation} J(\boldsymbol{K}, \{\boldsymbol{a}_l\}_{l=1}^{L^{'}}) = \sum_{l=1}^{L^{'}}\sum_{m=1}^{M-1} |( \boldsymbol{\varPsi}(\boldsymbol{x}_{m+1}) - \boldsymbol{\varPsi}(\boldsymbol{x}_m) \boldsymbol{K} )\boldsymbol{a}_l|^2 = \lVert (\boldsymbol{\varPsi}_{\boldsymbol{x}^{'}} - \boldsymbol{\varPsi}_{\boldsymbol{x}}\boldsymbol{K}) \boldsymbol{A}^{'} \rVert^2_F, \end{equation}

is minimized over $\{\boldsymbol {x}_i\}_{i=1}^{M}$. In the above equation,

(2.5a,b)\begin{gather} \boldsymbol{\varPsi}_{\boldsymbol{x}} = \begin{bmatrix} \psi_1(\boldsymbol{x}_1) & \ldots & \psi_L(\boldsymbol{x}_1) \\ \vdots & \vdots & \vdots \\ \psi_1(\boldsymbol{x}_{M-1}) & \ldots & \psi_L(\boldsymbol{x}_{M-1}) \end{bmatrix},\quad \boldsymbol{\varPsi}_{\boldsymbol{x}^{'}} = \begin{bmatrix} \psi_1(\boldsymbol{x}_2) & \ldots & \psi_L(\boldsymbol{x}_2) \\ \vdots & \vdots & \vdots \\ \psi_1(\boldsymbol{x}_{M}) & \ldots & \psi_L(\boldsymbol{x}_{M}) \end{bmatrix}, \end{gather}
(2.6)\begin{gather}\boldsymbol{A}^{'} = \begin{bmatrix}\boldsymbol{a}_1 & \ldots & \boldsymbol{a}_{L^{'}}\end{bmatrix}. \end{gather}

Considering $\partial J/\partial \boldsymbol {K} = \boldsymbol {0}$, we obtain $\boldsymbol {\varPsi }_{\boldsymbol {x}}^{\textrm {H}} \boldsymbol {\varPsi }_{\boldsymbol {x}^{'}} \boldsymbol {A}^{'} \boldsymbol {A}^{'\textrm {H}} = \boldsymbol {\varPsi }_{\boldsymbol {x}}^{\textrm {H}} \boldsymbol {\varPsi }_{\boldsymbol {x}} \boldsymbol {K} \boldsymbol {A}^{'} \boldsymbol {A}^{'\textrm {H}}$. Thus, the corresponding minimizer $\boldsymbol {K}_{{opt}}$ is

(2.7)\begin{equation} \boldsymbol{K}_{{opt}} = \boldsymbol{G}^{+} \boldsymbol{A} ( \boldsymbol{A}^{'} \boldsymbol{A}^{'\textrm{H}}) ( \boldsymbol{A}^{'} \boldsymbol{A}^{'\textrm{H}})^{+}, \end{equation}

where $+$ denotes the pseudo-inverse and

(2.8)\begin{gather} \boldsymbol{G} = \sum_{m=1}^{M-1} \boldsymbol{\varPsi}(\boldsymbol{x}_m)^{\textrm{H}} \boldsymbol{\varPsi}(\boldsymbol{x}_m) = \boldsymbol{\varPsi}_{\boldsymbol{x}}^{\textrm{H}} \boldsymbol{\varPsi}_{\boldsymbol{x}} , \end{gather}
(2.9)\begin{gather}\boldsymbol{A} = \sum_{m=1}^{M-1} \boldsymbol{\varPsi}(\boldsymbol{x}_m)^{\textrm{H}} \boldsymbol{\varPsi}(\boldsymbol{x}_{m+1}) = \boldsymbol{\varPsi}_{\boldsymbol{x}}^{\textrm{H}} \boldsymbol{\varPsi}_{\boldsymbol{x}^{'}}, \end{gather}

where $\textrm {H}$ denotes the conjugate transpose. Note that when the set of observables fully spans $\mathcal {F}_L$, i.e. $\boldsymbol {A}^{'}$ is full rank, $( \boldsymbol {A}^{'} \boldsymbol {A}^{'\textrm {H}}) ( \boldsymbol {A}^{'} \boldsymbol {A}^{'\textrm {H}})^{+}$ reduces to the identity. Then we have the more familiar $\boldsymbol {K}_{{EDMD}}$ as

(2.10)\begin{equation} \boldsymbol{K}_{{EDMD}} = \boldsymbol{G}^{+} \boldsymbol{A}, \end{equation}

which is independent of the choice of $\{\boldsymbol {a}_l\}_{l=1}^{L^{'}}$.

Assuming that all eigenvalues of $\boldsymbol {K}_{{EDMD}}$ are simple (an immediate consequence for ergodic systems; Parry 2004), for $i=1,\ldots ,L$, the corresponding Koopman eigenfunctions $\varphi _{i}$ associated with Koopman eigenvalues $\lambda _i$ are

(2.11)\begin{equation} \varphi_i(\boldsymbol{x}) = \boldsymbol{\varPsi} (\boldsymbol{x}) \boldsymbol{v}_i, \end{equation}

where $\boldsymbol {K}_{{EDMD}} \boldsymbol {v}_i = \lambda _{i} \boldsymbol {v}_i$. Finally, the Koopman modes of a vector-valued $Q$-dimensional observable function $\boldsymbol {g}: \mathcal {M} \mapsto \mathbb {C}^Q$ are the vectors of weights $\{\boldsymbol {b}_i\}_{i=1}^{L}$ associated with the expansion in terms of eigenfunctions $\{\varphi _i\}_{i=1}^{L}$ as

(2.12)\begin{equation} \boldsymbol{g}(\boldsymbol{x}) = \sum_{i=1}^{L} \varphi_i(\boldsymbol{x}) \boldsymbol{b}_i, \end{equation}

where, in practice, $\boldsymbol {b}_i$ is often determined numerically by ordinary least squares,

(2.13)\begin{equation} \boldsymbol{B} = \begin{bmatrix} \boldsymbol{b}_1 \\ \vdots \\ \boldsymbol{b}_L \end{bmatrix} = \left(\boldsymbol{\varPsi}_{\boldsymbol{x}} \boldsymbol{V}\right)^{+} \begin{bmatrix} \boldsymbol{g}(\boldsymbol{x}_1) \\ \vdots \\ \boldsymbol{g}(\boldsymbol{x}_{M}) \end{bmatrix}, \end{equation}

with $\boldsymbol {V} = [ \boldsymbol {v}_1 \ \ldots \ \boldsymbol {v}_L ]$.
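
To make the discrete-time recipe concrete, the following is a minimal NumPy sketch assembling (2.8)–(2.11) and (2.13) from a snapshot matrix; the dictionary shown (monomials of a two-dimensional state) and all function names are illustrative assumptions, not the specific choices used later in this paper.

```python
import numpy as np

def edmd(X, dictionary):
    """Minimal discrete-time EDMD. X is an (M, N) array of sequential
    snapshots; `dictionary` maps an (N,) state to an (L,) feature vector."""
    Psi = np.array([dictionary(x) for x in X])     # (M, L) feature matrix
    Psi_x, Psi_xp = Psi[:-1], Psi[1:]              # features at t and t + dt
    G = Psi_x.conj().T @ Psi_x                     # Gram matrix, (2.8)
    A = Psi_x.conj().T @ Psi_xp                    # cross matrix, (2.9)
    K = np.linalg.pinv(G) @ A                      # K_EDMD, (2.10)
    lam, V = np.linalg.eig(K)                      # eigenvalues/eigenvectors
    Phi = Psi_x @ V                                # eigenfunctions on data, (2.11)
    B = np.linalg.pinv(Phi) @ X[:-1]               # Koopman modes of g(x) = x, (2.13)
    return lam, V, B

# Illustrative dictionary: monomials up to second order for a 2-D state.
dict2d = lambda x: np.array([1.0, x[0], x[1], x[0]**2, x[0]*x[1], x[1]**2])
```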

2.1.2. Continuous-time formulation

Consider $M$ data snapshots of the dynamical system with state $\boldsymbol {x}$ sampled over $\mathcal {M}$ as $\{\boldsymbol {x}_i, \dot {\boldsymbol {x}}_{i}\}_{i=1}^{M}$, where $\dot {\boldsymbol {x}}_{i} = \boldsymbol {F}(\boldsymbol {x}_i)$. Recall the generator of the semigroup of Koopman operators, $\mathcal {K}: \mathcal {D}(\mathcal {K}) \mapsto \mathcal {F}$, defined by $\mathcal {K} f = \lim _{t\rightarrow 0} (\mathcal {K}_t \,f - f) / t$, where $\mathcal {D}(\mathcal {K})$ is the domain on which the aforementioned limit is well defined and whose closure is $\mathcal {F}$. The evolution of any observable $\boldsymbol {g} = \boldsymbol {\varPsi } \boldsymbol {a} \in \mathcal {F}_L$ is then

(2.14)\begin{equation} \mathcal{K} \boldsymbol{g} = \dot{\boldsymbol{g}} = \boldsymbol{F} \boldsymbol{\cdot} \boldsymbol{\nabla}_{\boldsymbol{x}} \boldsymbol{\varPsi} \boldsymbol{a} = \boldsymbol{\varPsi} \boldsymbol{K} \boldsymbol{a} + r, \end{equation}

where $r$ is the residual. Similarly, one can find the $\boldsymbol {K}$ that minimizes the sum of the squares of the residuals $r$ over all samples,

(2.15)\begin{equation} \boldsymbol{K}_{{EDMD}} = \boldsymbol{G}^{+} \boldsymbol{A}, \end{equation}

where

(2.16)\begin{gather} \boldsymbol{G} = \sum_{m=1}^{M}\boldsymbol{\varPsi}(\boldsymbol{x}_m)^{\textrm{H}} \boldsymbol{\varPsi}(\boldsymbol{x}_m), \end{gather}
(2.17)\begin{gather}\boldsymbol{A} = \sum_{m=1}^{M}\boldsymbol{\varPsi}(\boldsymbol{x}_m)^{\textrm{H}} (\boldsymbol{F} \boldsymbol{\cdot} \boldsymbol{\nabla}_{\boldsymbol{x}}) \boldsymbol{\varPsi}(\boldsymbol{x}_m) = \sum_{m=1}^{M}\boldsymbol{\varPsi}(\boldsymbol{x}_m)^{\textrm{H}} (\dot{\boldsymbol{x}}_m \boldsymbol{\cdot} \boldsymbol{\nabla}_{\boldsymbol{x}}) \boldsymbol{\varPsi}(\boldsymbol{x}_m). \end{gather}

Consider the eigenvalues $\{\mu _i\}_{i=1}^L$ and eigenvectors $\{\boldsymbol {v}_i \}_{i=1}^L$ of $\boldsymbol {K}_{{EDMD}}$. The Koopman eigenfunctions take the same form as in the discrete-time formulation, while the continuous-time Koopman eigenvalues $\mu _i$ can be converted to the aforementioned discrete-time sense as $\lambda _i = e^{\mu _i{\rm \Delta} t}$.
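
As a companion sketch (with the same caveats as the discrete-time one above), the continuous-time variant only changes how the second matrix is built: each row of the feature-derivative matrix is $\dot {\boldsymbol {x}}_m \boldsymbol {\cdot } \boldsymbol {\nabla }_{\boldsymbol {x}} \boldsymbol {\varPsi }(\boldsymbol {x}_m)$, so the user must also supply the dictionary Jacobian. Here `dictionary_grad` is an assumed helper returning that $(L, N)$ Jacobian.

```python
import numpy as np

def edmd_continuous(X, Xdot, dictionary, dictionary_grad):
    """Continuous-time EDMD per (2.15)-(2.17). `dictionary_grad` maps an
    (N,) state to the (L, N) Jacobian of the feature vector."""
    Psi = np.array([dictionary(x) for x in X])            # (M, L)
    # Each row is (xdot . grad) Psi evaluated at a sample, cf. (2.17).
    Psi_dot = np.array([dictionary_grad(x) @ xd for x, xd in zip(X, Xdot)])
    G = Psi.conj().T @ Psi                                # (2.16)
    A = Psi.conj().T @ Psi_dot                            # (2.17)
    mu, V = np.linalg.eig(np.linalg.pinv(G) @ A)          # continuous-time eigenvalues
    return mu, V
```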

2.2. Kernel DMD

2.2.1. Discrete-time formulation

Instead of explicitly expressing a dictionary of candidate functions, one can implicitly define the dictionary via the kernel trick, which is essentially the dual form of EDMD (Williams et al. 2014). Note that, from the EDMD formulation in (2.4), any vector in the range of $\boldsymbol {K}$ orthogonal to the range of $\boldsymbol {\varPsi }^{\textrm {H}}_{\boldsymbol {x}}$ is annihilated and, therefore, cannot be inferred (Williams et al. 2014). Assuming $\boldsymbol {\varPsi }_{\boldsymbol {x}}$ is of rank $r$, we can obtain a full singular value decomposition (SVD) $\boldsymbol {\varPsi }_{\boldsymbol {x}}=\boldsymbol {Q}\boldsymbol {\varSigma }\boldsymbol {Z}^{\textrm {H}}$ and the corresponding economy SVD $\boldsymbol {Q}_r \boldsymbol {\varSigma }_r \boldsymbol {Z}_r^{\textrm {H}}$. With the aforementioned definitions, we have $\boldsymbol {K} = \boldsymbol {Z}_r \boldsymbol {\hat {K}} \boldsymbol {Z}_r^{\textrm {H}}$ without loss of generality (Otto & Rowley 2019).

Since multiplication by a unitary matrix preserves the Frobenius norm, we have

(2.18)\begin{align} J(\boldsymbol{K}, \{\boldsymbol{a}_l\}_{l=1}^{L^{'}}) &= \lVert (\boldsymbol{\varPsi}_{\boldsymbol{x}^{'}} - \boldsymbol{\varPsi}_{\boldsymbol{x}} \boldsymbol{K} )\boldsymbol{A}^{'} \rVert^2_F \end{align}
(2.19)\begin{align} &= \lVert (\boldsymbol{Q}^{\textrm{H}} \boldsymbol{\varPsi}_{\boldsymbol{x}^{'}} - \boldsymbol{Q}^{\textrm{H}} \boldsymbol{Q}_r \boldsymbol{\varSigma }_r \boldsymbol{\hat{K}} \boldsymbol{Z}^{\textrm{H}}_r )\boldsymbol{A}^{'} \rVert^2_F \end{align}
(2.20)\begin{align} &= \lVert (\boldsymbol{Q}_r^{\textrm{H}} \boldsymbol{\varPsi}_{\boldsymbol{x}^{'}} - \boldsymbol{\varSigma}_r \boldsymbol{\hat{K}} \boldsymbol{Z}_{r}^{\textrm{H}}) \boldsymbol{A}^{'}\rVert_F^2 + \lVert \boldsymbol{Q}^{\textrm{H}}_{r,\perp} \boldsymbol{\varPsi}_{\boldsymbol{x}^{'}} \boldsymbol{A}^{'} \rVert_F^2, \end{align}

where $\boldsymbol {Q}^{\textrm {H}}_{r,\perp }$ contains the last $M-1-r$ rows of $\boldsymbol {Q}^{\textrm {H}}$. By taking derivatives with respect to $\boldsymbol {\hat {K}}$, one can obtain the minimal-norm minimizer as

(2.21)\begin{equation} \boldsymbol{\hat{K}}_{{opt}} = \boldsymbol{\varSigma}_r^{+} \boldsymbol{Q}_r^{\textrm{H}} \boldsymbol{\varPsi}_{\boldsymbol{x}^{'}} \boldsymbol{A}^{'} \boldsymbol{A}^{'\textrm{H}} \boldsymbol{Z}_r (\boldsymbol{Z}^{\textrm{H}}_r \boldsymbol{A}^{'} \boldsymbol{A}^{'\textrm{H}} \boldsymbol{Z}_r)^{+}. \end{equation}

Note that, since any column in the span of $\boldsymbol {A}^{'}$ that is orthogonal to the span of $\boldsymbol {Z}_r$ will be annihilated by $\boldsymbol {Z}_r^{\textrm {H}}$ and, thus, cannot be utilized to determine $\boldsymbol {\hat {K}}$, it is reasonable to restrict $\boldsymbol {a}_l$ to the column space of $\boldsymbol {Z}_r$. Assuming $L^{'}$ is sufficiently large such that the column space of $\boldsymbol {A}^{'}$ fully spans that of $\boldsymbol {Z}_r$, (2.22) can be proved (details in Appendix A),

(2.22)\begin{equation} \boldsymbol{A}^{'} \boldsymbol{A}^{'\textrm{H}} \boldsymbol{Z}_r (\boldsymbol{Z}^{\textrm{H}}_r \boldsymbol{A}^{'} \boldsymbol{A}^{'\textrm{H}} \boldsymbol{Z}_r)^{+} = \boldsymbol{Z}_r. \end{equation}

With (2.22), we can rewrite (2.21) as the familiar KDMD formulation,

(2.23)\begin{equation} \boldsymbol{\hat{K}}_{{KDMD}} = \boldsymbol{\varSigma}_r^{+} \boldsymbol{Q}_r^{\textrm{H}} \boldsymbol{\varPsi}_{\boldsymbol{x}^{'}} \boldsymbol{Z}_{r} = \boldsymbol{\varSigma}_r^{+} \boldsymbol{Q}_r^{\textrm{H}} \boldsymbol{\varPsi}_{\boldsymbol{x}^{'}} \boldsymbol{\varPsi}_{\boldsymbol{x}}^{\textrm{H}} \boldsymbol{Q}_r \boldsymbol{\varSigma}_r^{+}, \end{equation}

where $\boldsymbol {\varPsi }_{\boldsymbol {x}} \boldsymbol {\varPsi }_{\boldsymbol {x}}^{\textrm {H}} = \boldsymbol {Q}_r \boldsymbol {\varSigma }^2_r \boldsymbol {Q}_r^{\textrm {H}}$ with $(\boldsymbol {\varPsi }_{\boldsymbol {x}} \boldsymbol {\varPsi }_{\boldsymbol {x}}^{\textrm {H}})_{ij} = k(\boldsymbol {x}_i, \boldsymbol {x}_j)$, $(\boldsymbol {\varPsi }_{\boldsymbol {x}^{'}} \boldsymbol {\varPsi }_{\boldsymbol {x}}^{\textrm {H}})_{ij} = k(\boldsymbol {x}_{i+1}, \boldsymbol {x}_{j})$ for $1 \le i,j\le M-1$. Again, such a minimizer is independent of the choice of $\boldsymbol {A}^{'}$.

Note that, to compute the Koopman eigenvalues and eigenfunctions, one only needs access to $\boldsymbol {\varPsi }_{\boldsymbol {x}^{'}} \boldsymbol {\varPsi }_{\boldsymbol {x}}^{\textrm {H}}$ and $\boldsymbol {\varPsi }_{\boldsymbol {x}} \boldsymbol {\varPsi }_{\boldsymbol {x}}^{\textrm {H}}$, i.e. the inner products among features on all data points. Fortunately, on a compact domain $\mathcal {M}$, it is well known from Mercer's theorem (Mercer 1909) that, once a suitable non-negative kernel function $k(\cdot , \cdot ): \mathcal {M} \times \mathcal {M} \mapsto \mathbb {R}$ is defined, $k(\boldsymbol {x}, \boldsymbol {y})$ is the inner product of a potentially infinite-dimensional feature vector $\boldsymbol {\varPsi }$ evaluated at $\boldsymbol {x}, \boldsymbol {y} \in \mathcal {M}$. Note that the choice of such a feature vector might not be unique, but the corresponding reproducing kernel Hilbert space (RKHS) is (Aronszajn 1950). In the case of a Gaussian kernel, one can choose the canonical feature vector $k(\boldsymbol {\cdot },\boldsymbol {x})$, which consists of ‘bumps’ of a certain bandwidth distributed on $\mathcal {M}$. From the viewpoint of kernel PCA (Williams et al. 2014), $\boldsymbol {Q}_r$, resulting from the finite-dimensional rank truncation of the Gram matrix $\boldsymbol {\varPsi }_{\boldsymbol {x}} \boldsymbol {\varPsi }_{\boldsymbol {x}}^{\textrm {H}}$, is a numerical approximation to the $r$ most dominant variance-explaining mode shapes in the RKHS evaluated on the data points (Rasmussen 2003), and $\boldsymbol {Z}_r$ represents the $r$ dominant variance-explaining directions in terms of the feature vector in the RKHS.

Similar to EDMD, given $\boldsymbol {\hat {K}}_{{KDMD}} = \boldsymbol {\hat {V}} \boldsymbol {\hat {\varLambda }} \boldsymbol {\hat {V}}^{-1}$, $\boldsymbol {\hat {V}} = [\boldsymbol {\hat {v}}_1 \ldots \boldsymbol {\hat {v}}_r]$, for $i=1,\ldots ,r$, the corresponding $i$th Koopman eigenfunction $\varphi _i$ and the Koopman modes for a vector observable $\boldsymbol {g}$ are

(2.24)\begin{gather} \varphi_i(\boldsymbol{x}) = \boldsymbol{\varPsi}(\boldsymbol{x}) \boldsymbol{\varPsi}_{\boldsymbol{x}}^{\textrm{H}} \boldsymbol{Q}_r \boldsymbol{\varSigma}_r^{+} \boldsymbol{\hat{v}}_i, \end{gather}
(2.25)\begin{gather}\boldsymbol{B} = ( \boldsymbol{\varPsi}_{\boldsymbol{x}} \boldsymbol{\varPsi}_{\boldsymbol{x}}^{\textrm{H}} \boldsymbol{Q}_r \boldsymbol{\varSigma}_r^{+} \boldsymbol{\hat{V}} )^{+} \begin{bmatrix} \boldsymbol{g}(\boldsymbol{x}_1) \\ \vdots \\ \boldsymbol{g}(\boldsymbol{x}_M) \end{bmatrix}. \end{gather}
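
For readers who prefer code to index gymnastics, here is a minimal sketch of the discrete-time KDMD equations (2.23)–(2.25); it assumes a user-supplied `kernel(A, B)` returning the Gram matrix with entries $k(\boldsymbol {a}_i, \boldsymbol {b}_j)$, a real symmetric Gram matrix and strictly positive retained eigenvalues. Names are illustrative.

```python
import numpy as np

def kdmd(X, kernel, r):
    """Minimal discrete-time KDMD per (2.23)-(2.25). `kernel(A, B)` returns
    the Gram matrix with entries k(a_i, b_j); r is the rank truncation."""
    Gxx = kernel(X[:-1], X[:-1])              # (Psi_x Psi_x^H)_ij = k(x_i, x_j)
    Gyx = kernel(X[1:], X[:-1])               # (Psi_x' Psi_x^H)_ij = k(x_{i+1}, x_j)
    s2, Q = np.linalg.eigh(Gxx)               # Gxx = Q Sigma^2 Q^H
    idx = np.argsort(s2)[::-1][:r]            # r dominant directions (assumed > 0)
    Qr, Sr_inv = Q[:, idx], np.diag(1.0 / np.sqrt(s2[idx]))
    Khat = Sr_inv @ Qr.T @ Gyx @ Qr @ Sr_inv  # (2.23)
    lam, Vhat = np.linalg.eig(Khat)
    Phi = Gxx @ Qr @ Sr_inv @ Vhat            # eigenfunctions on data, (2.24)
    B = np.linalg.pinv(Phi) @ X[:-1]          # Koopman modes of g(x) = x, (2.25)
    return lam, Phi, B
```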

2.2.2. Continuous-time formulation

To the best of our knowledge, continuous-time KDMD has not been previously reported in the literature. It can be helpful when non-uniformly sampled, multi-scale data are collected. For the kernel trick to be applied in the continuous formulation, we write $\boldsymbol {\varPsi }_{\boldsymbol {x}^{'}}$ as

(2.26)\begin{equation} \boldsymbol{\varPsi}_{\boldsymbol{x}^{'}} = \begin{bmatrix} \boldsymbol{F}(\boldsymbol{x}_1) \boldsymbol{\cdot} \boldsymbol{\nabla}_{\boldsymbol{x}} \boldsymbol{\varPsi}(\boldsymbol{x}_1) \\ \vdots \\ \boldsymbol{F}(\boldsymbol{x}_M) \boldsymbol{\cdot} \boldsymbol{\nabla}_{\boldsymbol{x}} \boldsymbol{\varPsi}(\boldsymbol{x}_M) \end{bmatrix} = \begin{bmatrix} \dot{\boldsymbol{x}}_1 \boldsymbol{\cdot} \boldsymbol{\nabla}_{\boldsymbol{x}} \boldsymbol{\varPsi}(\boldsymbol{x}_1) \\ \vdots \\ \dot{\boldsymbol{x}}_M \boldsymbol{\cdot} \boldsymbol{\nabla}_{\boldsymbol{x}} \boldsymbol{\varPsi}(\boldsymbol{x}_M) \end{bmatrix}. \end{equation}

To compute $\boldsymbol {\varPsi }_{\boldsymbol {x}^{'}} \boldsymbol {\varPsi }_{\boldsymbol {x}}^{\textrm {H}}$, denoting the $q$th component of $\boldsymbol {F}$ as $f_q$,

(2.27)\begin{align} (\boldsymbol{\varPsi}_{\boldsymbol{x}^{'}} \boldsymbol{\varPsi}^{\textrm{H}}_{\boldsymbol{x}} )_{ij}&= \boldsymbol{F}(\boldsymbol{x}_i) \boldsymbol{\cdot} \boldsymbol{\nabla}_{\boldsymbol{x}} \boldsymbol{\varPsi}(\boldsymbol{x}_i) \boldsymbol{\varPsi}^{\textrm{H}}(\boldsymbol{x}_j) \nonumber\\ &= \left.\sum_{l=1}^L \sum_{q=1}^{N} \left( f_q(\boldsymbol{x}) \dfrac{\partial \psi_l (\boldsymbol{x})}{\partial x_q}\right)\right\vert_{\boldsymbol{x} = \boldsymbol{x}_i} \bar{\psi_l(\boldsymbol{x})}\vert_{\boldsymbol{x}=\boldsymbol{x}_j}\nonumber\\ &= \left.\sum_{q=1}^{N} f_q(\boldsymbol{x}_i) \dfrac{\partial}{\partial x_q} \sum_{l=1}^L ( \psi_l(\boldsymbol{x}) \bar{\psi_l(\boldsymbol{x}')} ) \right\vert_{\boldsymbol{x}=\boldsymbol{x}_i, \boldsymbol{x}'=\boldsymbol{x}_j} \nonumber\\ &= \boldsymbol{F}(\boldsymbol{x}_i) \boldsymbol{\cdot} \boldsymbol{\nabla}_{\boldsymbol{x}} k(\boldsymbol{x}, \boldsymbol{x}')\vert_{\boldsymbol{x}=\boldsymbol{x}_i, \boldsymbol{x}'=\boldsymbol{x}_j} \nonumber\\ & = \dot{\boldsymbol{x}_i} \boldsymbol{\cdot} \boldsymbol{\nabla}_{\boldsymbol{x}} k(\boldsymbol{x}, \boldsymbol{x}')\vert_{\boldsymbol{x}=\boldsymbol{x}_i, \boldsymbol{x}'=\boldsymbol{x}_j}, \end{align}

where the overline denotes complex conjugation, and the appearance of the kernel gradient indicates that a differentiable kernel function is required for the extension to continuous time. For common kernels used in Koopman analysis, the kernel function, its Jacobian and its hyperparameters are listed in table 1.

Table 1. Common choice of differentiable kernel functions.
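
To illustrate (2.26)–(2.27), the sketch below evaluates the required Gram matrix for one concrete choice, the isotropic Gaussian kernel $k(\boldsymbol {x},\boldsymbol {x}') = \exp (-\lVert \boldsymbol {x}-\boldsymbol {x}'\rVert ^2/\sigma ^2)$; this particular kernel form and bandwidth handling are assumptions made for illustration, not a transcription of the table's entries.

```python
import numpy as np

def gaussian_kernel(x, y, sigma):
    # Isotropic Gaussian kernel k(x, y) = exp(-||x - y||^2 / sigma^2)
    return np.exp(-np.sum((x - y)**2) / sigma**2)

def grad_x_gaussian(x, y, sigma):
    # Gradient of k(x, y) with respect to its first argument
    return -2.0 * (x - y) / sigma**2 * gaussian_kernel(x, y, sigma)

def continuous_kdmd_gram(X, Xdot, sigma):
    """Entries of (2.27): (Psi_x' Psi_x^H)_ij = xdot_i . grad_x k(x, x')|_{x_i, x_j}."""
    M = X.shape[0]
    Gyx = np.empty((M, M))
    for i in range(M):
        for j in range(M):
            Gyx[i, j] = Xdot[i] @ grad_x_gaussian(X[i], X[j], sigma)
    return Gyx
```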

2.3. Challenges in EDMD/KDMD

In this section we briefly discuss two broad challenges in the use of EDMD and KDMD for Koopman analysis.

2.3.1. Mode selection

The number of approximated Koopman tuples (eigenfunction, eigenvalue, mode) from EDMD grows with the dictionary size, whereas that from KDMD grows with the number of snapshots. However, in most cases, a significant number of the eigenfunctions fail to evolve linearly, or are redundant in their contribution to the reconstruction of the state $\boldsymbol {x}$. For example, as shown by Budišić et al. (2012), the Koopman eigenfunctions that vanish nowhere form an Abelian group under pointwise products of functions, while polynomial observables evolve linearly for a general linear system. These eigenfunctions, associated with the polynomial observables, are redundant in terms of providing an intrinsic coordinate for the linear dynamics.

When the number of features is larger than the number of data snapshots, EDMD eigenvalues can be misleading (Otto & Rowley 2019) and often plagued with spurious eigenfunctions that do not evolve linearly, even when the number of data snapshots is sufficient. Analytically, it is clear that a Koopman eigenfunction in the span of the dictionary will be associated with one of the eigenvectors obtained from EDMD, given that $\boldsymbol {\varPsi }_{\boldsymbol {x}}$ is full rank and contains sufficiently many snapshots $M$ (Haseli & Cortés 2019). Indeed, EDMD is an $L^2$ projection of the Koopman operator under the empirical measure (Korda & Mezić 2018b). As a result, we seek a Koopman-invariant subspace following the standard EDMD/KDMD. Since KDMD can be viewed as an efficient way of populating a dictionary of nonlinear features in high-dimensional spaces, the above arguments apply to KDMD as well. It should be noted that numerical conditioning can play a critical role, since full-rank matrices can still be ill-conditioned.

2.3.2. Choice of dictionary (for EDMD) or kernel (for KDMD)

Although the use of a kernel defines an infinite-dimensional feature space, the resulting finite number of effective features can still be affected by both the type of kernel and the hyperparameters in the kernel, as clearly shown by Kutz et al. (2016). Compared with EDMD/KDMD, which are based on a fixed dictionary of features, neural network approaches (Lusch et al. 2018; Otto & Rowley 2019; Pan & Duraisamy 2020) have the potential to be more expressive in searching for a larger Koopman-invariant subspace. From a kernel viewpoint (Cho & Saul 2009), feedforward neural networks enable adaptation of the kernel function to the data. Such a characteristic could become significant when the underlying Koopman eigenfunction is discontinuous. From an efficiency standpoint, a kernel-guided scalable EDMD (DeGennaro & Urban 2019) may be pursued. This can be achieved by generating kernel-consistent random Fourier features or approximating a few components of the feature vector constructed from Mercer's theorem, i.e. the eigenfunctions of the Hilbert–Schmidt integral operator on the RKHS.

3. Sparse identification of informative Koopman-invariant subspace

To address the challenges described in § 2.3, we develop a novel framework that uses EDMD/KDMD modes to identify a sparse, accurate and informative Koopman-invariant subspace. Our framework first prunes spurious, inaccurate eigenmodes and then determines a sparse representation of the system state $\boldsymbol {x}$ in terms of the remaining accurate eigenmodes. In addition to the training data required by standard EDMD/KDMD, a validation trajectory data set is needed to avoid overfitting to the training data. The terms spEDMD/spKDMD will refer to the mode-selected versions of EDMD and KDMD, respectively.

3.1. Pruning spurious modes by a posteriori error analysis

Given a validation trajectory $\boldsymbol {x}(t)$, $t \in [0,T]$, associated with the nonlinear dynamical system, for $i =1,\ldots ,L$, we define the quality of the $i$th eigenfunction, in an a posteriori way, as the maximal normalized deviation from linear evolution conditioned on the trajectory $\boldsymbol {x}(t)$, denoted $Q_i$:

(3.1)\begin{gather} e_{i, \boldsymbol{x}(0)}(t) = \dfrac{\lvert \varphi_i(\boldsymbol{x}(t)) - e^{\lambda_i t} \varphi_i(\boldsymbol{x}(0)) \rvert}{ \lVert \varphi_i(\boldsymbol{x}) \rVert_2}, \end{gather}
(3.2)\begin{gather}Q_i \triangleq e^{max}_{i, \boldsymbol{x}(0)} = \max_t e_{i, \boldsymbol{x}(0)}(t), \end{gather}

where $\lVert \varphi _i(\boldsymbol {x}) \rVert _2 \triangleq \sqrt {({1}/{T}) \int _{0}^{T} \varphi _i^*(\boldsymbol {x}(t))\varphi _i(\boldsymbol {x}(t)) \, \textrm {d} t}$. In practice, we evaluate the above integrals discretely in time. A similar, a priori and less restrictive, method has been previously proposed (Zhang et al. 2017). In contrast, in the proposed method, the maximal error is evaluated in an a posteriori way to better differentiate spurious modes from accurate ones. For any $1 \le \hat {L} \le L$, we can select the top $\hat {L}$ most accurate eigenmodes out of the $L$ eigenmodes, denoting their indices in the eigen-decomposition as $\{i_1,i_2,\ldots ,i_{\hat {L}}\}$, i.e. $Q_{i_1} \le \ldots \le Q_{i_{\hat {L}}} \le \ldots \le Q_{i_L}$. Then, for the next sparse reconstruction step, we simply use $\boldsymbol {\varphi }_{\hat {L}}$, defined as follows, to reconstruct the state $\boldsymbol {x}$,

(3.3)\begin{equation} \boldsymbol{\varphi}_{\hat{L}}(\boldsymbol{x}(t)) \triangleq \begin{bmatrix}\varphi_{ i_1 }(\boldsymbol{x}(t)) & \ldots & \varphi_{ i_{\hat{L}} }(\boldsymbol{x}(t)) \end{bmatrix} \in \mathbb{C}^{\hat{L}}. \end{equation}

To choose an appropriate $\hat {L}$ to linearly reconstruct the system state $\boldsymbol {x}$, we monitor the normalized reconstruction error for the aforementioned set of top $\hat {L}$ accurate eigenmodes in the form

(3.4)\begin{equation} R_{\hat{L}} \triangleq \dfrac{ \lVert (\boldsymbol{I} - \boldsymbol{\varPsi}_{\hat{L}}\boldsymbol{\varPsi}_{\hat{L}}^{+}) \boldsymbol{X} \rVert_F}{ \lVert \boldsymbol{X} \rVert_F }, \end{equation}

where $\boldsymbol {I}$ is the identity matrix, and

(3.5a,b)\begin{equation} \boldsymbol{X} = \begin{bmatrix} \boldsymbol{x}_1\\ \vdots\\ \boldsymbol{x}_M \end{bmatrix}, \quad \boldsymbol{\varPsi}_{\hat{L}} = \begin{bmatrix} \boldsymbol{\varphi}_{\hat{L}} (\boldsymbol{x}_1)\\ \vdots\\ \boldsymbol{\varphi}_{\hat{L}} (\boldsymbol{x}_M) \end{bmatrix}. \end{equation}

As a result, the evaluation of (3.4) for each $\hat {L}$ is of similar expense to a least-squares regression. As the number of selected eigenfunctions $\hat {L}$ increases, the reconstruction error $R_{\hat {L}}$ decreases, while the largest linear evolution error $Q_{i_{\hat {L}}}$ increases. A truncation $\hat {L}$ can then be chosen by the user to strike a balance between the linear evolution error $Q_{i_{\hat {L}}}$ and the reconstruction error $R_{\hat {L}}$. In the next subsection we further select a subset of these eigenmodes to span the minimal Koopman-invariant subspace.
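
A minimal sketch of this pruning step is given below, computing (3.1)–(3.2) on a uniformly sampled validation trajectory and the reconstruction error (3.4) for a candidate truncation; array shapes and names are assumptions for illustration.

```python
import numpy as np

def linear_evolution_error(Phi_val, lam, dt):
    """Q_i of (3.1)-(3.2): maximal normalized deviation of each eigenfunction
    from linear evolution on a validation trajectory. Phi_val is (M, L) with
    Phi_val[m, i] = varphi_i(x(m dt)); lam holds continuous-time eigenvalues."""
    t = dt * np.arange(Phi_val.shape[0])
    norms = np.sqrt(np.mean(np.abs(Phi_val)**2, axis=0))   # ||varphi_i||_2
    pred = np.exp(np.outer(t, lam)) * Phi_val[0]           # e^{lam_i t} varphi_i(x(0))
    return np.max(np.abs(Phi_val - pred), axis=0) / norms

def reconstruction_error(Phi_val, X_val, order, L_hat):
    """R_{L_hat} of (3.4) using the L_hat most accurate eigenfunctions."""
    Psi = Phi_val[:, order[:L_hat]]
    residual = X_val - Psi @ (np.linalg.pinv(Psi) @ X_val)
    return np.linalg.norm(residual) / np.linalg.norm(X_val)

# Usage: Q = linear_evolution_error(...); order = np.argsort(Q); sweep L_hat.
```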

3.2. Sparse reconstruction via multi-task feature learning

Numerical experiments revealed two types of redundant eigenfunctions in the selected set of the $\hat {L}$ most accurate eigenfunctions:

  (i) nearly constant eigenfunctions with eigenvalues close to zero;

  (ii) pointwise products of Koopman eigenfunctions introduced by nonlinear observables, which are not useful in linear reconstruction.

To filter the above modes, we consider sparse regression with the $\hat {L}$ most accurate eigenfunctions as features and the system state $\boldsymbol {x}$ as the target. Note that, since we have guaranteed the accuracy of the selected eigenmodes, one can choose the features either a priori, $\varphi _i(\boldsymbol {x}(t))$, or a posteriori (multi-step prediction), $e^{\lambda _i t } \varphi _i(\boldsymbol {x}(0))$. Here we choose the latter, since it is directly related to prediction and can be reused from the previous step without additional computational cost. We denote the corresponding multi-step prediction feature matrix as $\boldsymbol {\hat {\varPsi }}_{\hat {L}}$,

(3.6)\begin{equation} \boldsymbol{\hat{\varPsi}}_{\hat{L}} = \begin{bmatrix} \boldsymbol{\varphi}_{\hat{L}} (\boldsymbol{x}_1) \\ \boldsymbol{\varphi}_{\hat{L}} (\boldsymbol{x}_1) e^{{\rm \Delta} t\varLambda_{\hat{L}}}\\ \vdots\\ \boldsymbol{\varphi}_{\hat{L}} (\boldsymbol{x}_1) e^{(M-1){\rm \Delta} t\varLambda_{\hat{L}}} \end{bmatrix} \in \mathbb{C}^{M \times \hat{L}}, \end{equation}

where $\varLambda _{\hat {L}} = \textrm {diag}(\lambda _{i_1},\ldots ,\lambda _{i_{\hat {L}}})$. Note that similar features $\boldsymbol {\hat {\varPsi }}_{\hat {L}}$ were also considered in spDMD (Jovanović et al. 2014) and optimized DMD (Chen et al. 2012). Finally, since there is no control over the magnitudes of the implicitly defined features in standard KDMD, different features may be weighted unequally. Thus, we scale the initial value of all eigenfunctions to unity, as in (3.7),

(3.7)\begin{equation} \boldsymbol{\hat{\varPsi}}_{\hat{L},{scaled}} = \boldsymbol{\hat{\varPsi}}_{\hat{L}} \boldsymbol{\varLambda}^{{-}1}_{ini} = \begin{bmatrix} 1 & \ldots & 1 \\ e^{{\rm \Delta} t\lambda_{i_1}} & \ldots & e^{{\rm \Delta} t\lambda_{i_{\hat{L}}} }\\ \vdots & \vdots & \vdots \\ e^{({M-1}){\rm \Delta} t\lambda_{i_1}} & \ldots & e^{({M-1}){\rm \Delta} t\lambda_{i_{\hat{L}}} } \end{bmatrix}, \end{equation}

where

(3.8)\begin{equation} \boldsymbol{\varLambda}_{ini} = \textrm{diag}(\begin{bmatrix} \varphi_{i_1}(\boldsymbol{x}_1) & \ldots & \varphi_{i_{\hat{L}}}(\boldsymbol{x}_1) \end{bmatrix}). \end{equation}

Since $\boldsymbol {x}$ is finite dimensional, searching for a sparse combination of $\boldsymbol {\hat {\varPsi }}_{\hat {L}}$ to reconstruct $\boldsymbol {x}$ is equivalent to solving a multi-task feature learning problem with a preference for a relatively small set of shared features. Note that this type of problem has been studied extensively in the machine learning community (Argyriou et al. 2008a,b; Zhao et al. 2015). In this work, given $\boldsymbol {X}$ and $\boldsymbol {\hat {\varPsi }}_{\hat {L},{scaled}}$, we leverage the multi-task ElasticNet (Pedregosa et al. 2011) to search for a row-wise sparse $\boldsymbol {B}^{'}$, which solves the following convex optimization problem:

(3.9) \begin{equation} \boldsymbol{B}^{'*} = \mathop{{\rm arg\;min}}\limits_{\boldsymbol{B}^{'} \in \mathbb{C}^{\hat{L} \times N}} \frac{1}{2M}\lVert \boldsymbol{X} - \boldsymbol{\hat{\varPsi}}_{\hat{L},{scaled}}\boldsymbol{B}^{'} \rVert_{F}^2 + \alpha \rho \lVert \boldsymbol{B}^{'} \rVert_{2,1} + \frac{\alpha (1-\rho)}{2} \lVert \boldsymbol{B}^{'} \rVert_F^2 \end{equation}

and

(3.10)\begin{equation} \boldsymbol{B} = \boldsymbol{\varLambda}^{{-}1}_{ini}\boldsymbol{B}^{'*}. \end{equation}

Here $\lVert \cdot \rVert _{2,1}$ defined in (3.11) is the so-called $\ell _1/\ell _2$ norm for a matrix $\boldsymbol {W}$,

(3.11)\begin{equation} \lVert \boldsymbol{W} \rVert_{2,1} \triangleq \sum_{i} \sqrt{\sum_{j}\boldsymbol{W}_{ij}^2} = \sum_{i} \lVert \boldsymbol{w}_i \rVert_2, \end{equation}

and $\boldsymbol {w}_i$ is the $i$th row of $\boldsymbol {W}$. This norm is special in that it controls the number of shared features learned across all tasks, i.e. the $i$th Koopman mode $\boldsymbol {b}_i$ is either driven to a zero vector or retained as a whole, while the standard $\ell _1$ norm only controls the number of features for each task independently.

As a simple illustration, the $\ell _1/\ell _2$ norm for three different $N \times N$ square matrices (here $N=5$) with 0-1 binary entries is displayed in figure 2. Since $\sqrt {N} \le 1 + \sqrt {N-1} \le N$, minimizing the $\ell _1/\ell _2$ norm leads to a penalty on the number of rows. As shown in the second term on the right-hand side of (3.9), minimizing the $\ell _1/\ell _2$ norm penalizes the number of Koopman eigenmodes.

Figure 2. Illustration of $\ell _1/\ell _2$ norm (defined in (3.11)) for different $N\times N$ 0-1 binary matrices.
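
As a quick numerical check of (3.11) and figure 2, the snippet below evaluates the $\ell _1/\ell _2$ norm for three $5\times 5$ binary patterns; it is a toy illustration only.

```python
import numpy as np

def l21_norm(W):
    """The l1/l2 norm of (3.11): sum of Euclidean norms of the rows of W."""
    return np.sum(np.linalg.norm(W, axis=1))

N = 5
one_row = np.zeros((N, N)); one_row[0] = 1                   # sqrt(N)  ~ 2.236
mixed = np.zeros((N, N)); mixed[0, 0] = 1; mixed[1, 1:] = 1  # 1 + sqrt(N-1) = 3.0
diag = np.eye(N)                                             # N = 5.0
print(l21_norm(one_row), l21_norm(mixed), l21_norm(diag))
```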

The above procedure not only serves the purpose of selecting modes that explain the behaviour of all components of the state, but is also particularly natural for EDMD/KDMD, since Koopman modes are obtained via regression. Here $\alpha$ is a penalty coefficient that controls the total amount of regularization in the $\ell _1/\ell _2$ and $\ell _2$ norms, while $\rho$ is the ElasticNet mixing parameter (Zou & Hastie 2005) that ensures uniqueness of the solution when highly correlated features exist. In our case, we choose $\rho = 0.99$ and sweep $\alpha$ over a certain range, with the $L_r$ non-zero features for each $\alpha$ denoted as $\boldsymbol {\hat {\varPsi }}_{L_r}$, while monitoring the normalized residual $\min _{\boldsymbol {B}}\lVert \boldsymbol {X} - \boldsymbol {\hat {\varPsi }}_{L_r}\boldsymbol {B} \rVert _F/\lVert \boldsymbol {X} \rVert _F$ to choose an appropriate $\alpha$. It has to be mentioned that coefficients in the sparsest solution from the multi-task ElasticNet were sometimes found to shrink to small values instead of exactly zero. This is a consequence of the insufficiency of the current optimization algorithm, which employs coordinate descent (Pedregosa et al. 2011). Hence, for each target component, we consider an additional hard-thresholding step that sets the coefficient magnitude, i.e. the contribution of any mode, to zero if it is smaller than a certain threshold $\epsilon \in [10^{-3},10^{-2}]$.

Finally, we refit the Koopman modes as $\boldsymbol {B}_{L_r} = \boldsymbol {\hat {\varPsi }}_{L_r}^{+}\boldsymbol {X}$, which avoids the bias introduced by the penalty term in (3.9) (spDMD does not refit $\boldsymbol {B}$); a code sketch of this selection-and-refitting loop is given after figure 3. To summarize, the general idea of the framework is illustrated in figure 3. As a side note for interested readers, if one only performs multi-task feature learning without hard thresholding and refitting, one obtains a smooth ElasticNet path instead of a discontinuous one. However, the smooth ElasticNet path can make it difficult to choose a proper $\alpha$ visually, especially when the given dictionary of EDMD/KDMD is not rich enough to cover an informative Koopman-invariant subspace. Further discussion on the computational complexity of our framework is presented in Appendix B.

Figure 3. Schematic illustrating the idea of sparse identification of Koopman-invariant subspaces for EDMD and KDMD.
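
The sketch below uses scikit-learn's `MultiTaskElasticNet` (the library referenced above), whose objective matches (3.9) with `l1_ratio` playing the role of $\rho$. Since that implementation handles only real-valued data, this sketch stacks the real and imaginary parts of the scaled feature matrix as separate real features, which only approximately preserves the row-wise grouping of a complex mode; handling conjugate-pair coupling exactly requires a dedicated solver. All names and the thresholding rule are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import MultiTaskElasticNet

def select_modes(Psi_scaled, X, alphas, rho=0.99, eps=1e-3):
    """Sweep the penalty alpha in (3.9), hard-threshold near-zero rows, and
    refit the retained features by least squares to remove the penalty bias."""
    # Real representation of the complex feature matrix (see caveat above).
    F = np.hstack([Psi_scaled.real, Psi_scaled.imag])
    path = []
    for alpha in alphas:
        model = MultiTaskElasticNet(alpha=alpha, l1_ratio=rho,
                                    fit_intercept=False, max_iter=10000)
        model.fit(F, X)                                  # X is (M, N), real
        coef = model.coef_.T                             # (2 L_hat, N)
        row_mag = np.linalg.norm(coef, axis=1)
        keep = row_mag > eps * row_mag.max()             # hard thresholding
        B = np.linalg.pinv(F[:, keep]) @ X               # refit, cf. B_{L_r}
        res = np.linalg.norm(X - F[:, keep] @ B) / np.linalg.norm(X)
        path.append((alpha, int(keep.sum()), res))
    return path
```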

Thus far, we have presented our main contribution: a novel optimization-based framework to search for an accurate and minimal Koopman-invariant subspace from data. An appealing aspect of our framework is its model-agnostic property, which makes it easy to extend from the standard EDMD/KDMD to more advanced approximation methods (Azencot, Yin & Bertozzi 2019; Jungers & Tabuada 2019; Mamakoukas et al. 2019). In the following subsection we present two mathematical insights: (i) multi-task feature learning generalizes spDMD under a specific structural constraint; (ii) a popular empirical criterion can be viewed as a single step of proximal gradient descent.

3.2.1. Relationship between spDMD, Kou's criterion and multi-task feature learning

For simplicity, neglecting the ElasticNet part (i.e. using $\rho =1$), (3.9) with $L$ modes leads to a multi-task Lasso problem,

(3.12)\begin{equation} \min_{\boldsymbol{B}^{'}\in\mathbb{C}^{L \times N}} \frac{1}{2M}\lVert \boldsymbol{X} - \boldsymbol{\hat{\varPsi}}_{L,\textrm{scaled}}\boldsymbol{B}^{'} \rVert_{F}^2 + \alpha \lVert \boldsymbol{B}^{'} \rVert_{2,1}. \end{equation}

Recall that in spDMD (Jovanović et al. 2014), the DMD modes $\phi _1,\ldots ,\phi _L$ with $\lVert \phi _i \rVert _2 = 1$ remain the same as in standard DMD. Similarly, if we posit a structural constraint on $\boldsymbol {B}^{'}$ in (3.12) by enforcing the modes to be those from DMD, then there exist $\alpha _1,\ldots ,\alpha _{L}$ such that

(3.13)\begin{equation} \boldsymbol{B}^{'} = \begin{bmatrix} \alpha_1 & & \\ & \ddots & \\ & & \alpha_{L} \end{bmatrix} \begin{bmatrix} \phi_1^{\textrm{T}} \\ \vdots\\ \phi_{L}^{\textrm{T}} \end{bmatrix}. \end{equation}

Note that $\lVert \boldsymbol {B}^{'} \rVert _{2,1} = \sum _{i=1}^{L} \lvert \alpha _i \rvert$. Hence, we recover the $\ell _1$ optimization step of spDMD (Jovanović et al. 2014) from (3.12) as

(3.14)\begin{align} \min_{\alpha_1,\ldots,\alpha_{L} \in \mathbb{C}} \frac{1}{2M}\left\Vert \boldsymbol{X} - \begin{bmatrix} 1 & \ldots & 1 \\ \vdots & \vdots & \vdots \\ e^{({M-1}){\rm \Delta} t\lambda_{1}} & \ldots & e^{({M-1}){\rm \Delta} t\lambda_{{L}} } \end{bmatrix} \begin{bmatrix} \alpha_1 & & \\ & \ddots & \\ & & \alpha_{L} \end{bmatrix} \begin{bmatrix} \phi_1^{\textrm{T}} \\ \vdots\\ \phi_{L}^{\textrm{T}} \end{bmatrix} \right\Vert^2_F + \alpha \sum_{i=1}^{L}\lvert \alpha_i \rvert, \end{align}

where $\phi _1,\ldots ,\phi _{L}$ are the DMD modes corresponding to eigenvalues as $\lambda _1,\ldots ,\lambda _{L}$. Hence, multi-task feature learning solves a less constrained optimization than spDMD in the context of DMD.

Kou & Zhang (2017) proposed an empirical criterion for mode selection by ordering modes with ‘energy’ $I_i$ defined as

(3.15)\begin{equation} I_i = \sum_{j=1}^{M} |\alpha_i e^{(j-1){\rm \Delta} t \lambda_i}| = \begin{cases} \dfrac{|\alpha_i|(1-\lvert e^{{\rm \Delta} t \lambda_i}\rvert^M)}{1-\lvert e^{{\rm \Delta} t \lambda_i}\rvert} & \text{if } \lvert e^{{\rm \Delta} t \lambda_i} \rvert \neq 1,\\ M|\alpha_i| & \text{otherwise}. \end{cases} \end{equation}
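
For reference, (3.15) is straightforward to vectorize; the sketch below assumes the amplitudes $\alpha _i$ and continuous-time eigenvalues $\lambda _i$ are supplied as arrays.

```python
import numpy as np

def kou_energy(alpha, lam, dt, M):
    """Mode 'energy' I_i of (3.15), given complex amplitudes alpha_i and
    continuous-time eigenvalues lam_i sampled with time step dt over M snapshots."""
    growth = np.abs(np.exp(dt * lam))            # |e^{dt lam_i}| per mode
    I = np.empty(len(alpha))
    on_circle = np.isclose(growth, 1.0)          # neutrally stable modes
    I[on_circle] = M * np.abs(alpha[on_circle])
    g = growth[~on_circle]
    I[~on_circle] = np.abs(alpha[~on_circle]) * (1.0 - g**M) / (1.0 - g)
    return I
```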

From an optimization viewpoint, consider the a posteriori prediction matrix $\boldsymbol {X}_{{DMD}}$ from DMD,

(3.16)\begin{equation} \boldsymbol{X} \approx \boldsymbol{X}_{{DMD}} = \begin{bmatrix} 1 & \ldots & 1 \\ \vdots & \vdots & \vdots \\ e^{({M-1}){\rm \Delta} t\lambda_{1}} & \ldots & e^{({M-1}){\rm \Delta} t\lambda_{{L}} } \end{bmatrix} \begin{bmatrix} \alpha_1 & & \\ & \ddots & \\ & & \alpha_{L} \end{bmatrix} \begin{bmatrix} \phi_1^{\textrm{T}} \\ \vdots\\ \phi_{L}^{\textrm{T}} \end{bmatrix}, \end{equation}

where $\boldsymbol {X}_{{DMD}}$ is determined from DMD using the snapshot pair $(\boldsymbol {X},\boldsymbol {X}^{'})$. Here $\boldsymbol {X}_{{DMD}}$ is a summation of rank-one contributions from different modes (Schmid Reference Schmid2010). Hence, a general mode selection technique with a user-defined preference weighting $\boldsymbol {w}$ is the following weighted $\ell _0$ non-convex optimization problem:

(3.17)\begin{equation} \min_{\boldsymbol{a} \in \mathbb{C}^{L}}\left\lVert \boldsymbol{X}_{{DMD}} - \begin{bmatrix} 1 & \ldots & 1 \\ \vdots & \vdots & \vdots \\ e^{({M-1}){\rm \Delta} t\lambda_{1}} & \ldots & e^{({M-1}){\rm \Delta} t\lambda_{{L}} } \end{bmatrix} \textrm{diag}(\boldsymbol{a}) \begin{bmatrix} \phi_1^{\textrm{T}} \\ \vdots\\ \phi_{L}^{\textrm{T}} \end{bmatrix}\right\rVert^2_F + \lambda \lVert \boldsymbol{a} \rVert_{\boldsymbol{w},0}. \end{equation}

Here $\lVert \boldsymbol {a} \rVert _{\boldsymbol {w},0} \triangleq \sum _{i} w_i|a_i|^{0}$, where $|a_i|^0$ is one if $a_i \neq 0$ and zero otherwise. Note that this pseudo-norm can be viewed as a limiting case of a weighted composite sine function smoothed $\ell _0$ regularization (Wang et al. Reference Wang, Wang, Xiang and Hui Yue2019).

To solve this non-convex optimization problem, compared with popular $\ell _1$ relaxation methods such as the one in spDMD, a less known but rather efficient alternative is iterative least-squares hard thresholding. This has been used in the sparse identification of dynamical systems (SINDy) (Brunton, Proctor & Kutz Reference Brunton, Proctor and Kutz2016b), and convergence to a local minimum has been proved (Zhang & Schaeffer Reference Zhang and Schaeffer2019). Indeed, a more rigorous framework that generalizes such an algorithm is the proximal gradient method (Parikh & Boyd Reference Parikh and Boyd2014). Much like Newton's method is a standard tool for unconstrained smooth optimization, the proximal gradient method is a standard tool for constrained non-smooth optimization. Here, it is straightforward to derive the iterative algorithm that extends to the weighted $\ell _0$ norm, from step $k$ to step $k+1$, as

(3.18)\begin{equation} \boldsymbol{a}^{k+1} = {prox}_{({\lambda}/{2})\eta_k \lVert \cdot \rVert_{\boldsymbol{w},0}} (\boldsymbol{a}^k - \eta_k \boldsymbol{\nabla}_{\boldsymbol{a}}\mathcal{Q}(\boldsymbol{a}^{k}) ), \end{equation}

where

(3.19)\begin{equation} \mathcal{Q}(\boldsymbol{a}) = \frac{1}{2}\left\lVert \boldsymbol{X}_{{DMD}} - \begin{bmatrix} 1 & \ldots & 1 \\ \vdots & \vdots & \vdots \\ e^{({M-1}){\rm \Delta} t\lambda_{1}} & \ldots & e^{({M-1}){\rm \Delta} t\lambda_{{L}} } \end{bmatrix} \textrm{diag}(\boldsymbol{a}) \begin{bmatrix} \phi_1^{\textrm{T}} \\ \vdots\\ \phi_{L}^{\textrm{T}} \end{bmatrix}\right\rVert^2_F, \end{equation}

and $\eta _k$ is the step size at step $k$. Note that the weighted $\ell _0$ norm is a separable sum of $a_i$. After some algebra, we have the proximal operator as

(3.20)\begin{equation} {prox}_{({\lambda}/{2})\eta_k \lVert \cdot \rVert_{\boldsymbol{w},0}} (\boldsymbol{a}) = \begin{bmatrix} \mathcal{H}_{\sqrt{\lambda \eta_k}}(a_1/\sqrt{w_1}) & \ldots & \mathcal{H}_{\sqrt{\lambda \eta_k}}(a_L/\sqrt{w_L}) \end{bmatrix}^{\textrm{T}}, \end{equation}

where $\mathcal {H}_{\sqrt {\lambda \eta _k} }(a)$ is an element-wise hard-thresholding operator defined as zero if $|a| < \sqrt {\lambda \eta _k}$ and $a$ otherwise.
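A minimal sketch of this iteration is given below (our own illustration with assumed array names and a constant step size; it is not the implementation used in this work).

```python
import numpy as np

def weighted_l0_prox_grad(X_dmd, V, Phi, w, lam, eta, n_iter=100):
    """Iterative hard thresholding for (3.17), following (3.18)-(3.20).
    X_dmd: (M, N) a posteriori DMD prediction matrix.
    V:     (M, L) Vandermonde matrix with entries e^{(j-1) dt lambda_i}.
    Phi:   (L, N) stacked DMD modes phi_i^T.
    w:     (L,) user-defined preference weights; lam, eta: penalty, step size."""
    a = np.ones(V.shape[1], dtype=complex)
    for _ in range(n_iter):
        # gradient of Q(a) = 0.5 || X_dmd - V diag(a) Phi ||_F^2
        resid = X_dmd - V @ (a[:, None] * Phi)
        grad = -np.einsum('mi,mn,in->i', V.conj(), resid, Phi.conj())
        a = a - eta * grad
        # proximal step: zero the entries below the weighted threshold, cf. (3.22)
        a[np.abs(a) / np.sqrt(w) < np.sqrt(lam * eta)] = 0.0
    return a
```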

In particular, if one takes the initial step size to be extremely small, $\eta _1 \ll 1$, then the second term in (3.18) can be neglected. Thus, for $i=1,\ldots ,L$, with the following weighting scheme that penalizes fast-decaying modes,

(3.21a,b)\begin{equation} w_i = 1/\beta_i^{2}, \quad \beta_i = \begin{cases} \dfrac{1-\lvert e^{{\rm \Delta} t \lambda_i}\rvert^M}{1-\lvert e^{{\rm \Delta} t \lambda_i}\rvert} & \text{if } \lvert e^{{\rm \Delta} t \lambda_i} \rvert \neq 1,\\ M & \text{otherwise}, \end{cases} \end{equation}

one immediately realizes that the thresholding criterion for the $i$th entry of $\boldsymbol {a}$ becomes

(3.22)\begin{equation} \sqrt{\lambda \eta_k} > |\alpha_i/\sqrt{w_i}| = |\alpha_i\beta_i|. \end{equation}

Then, plugging (3.21a,b) into (3.18), the first iteration of (3.18) reduces to mode selection with Kou's criterion in (3.15). Normally, $\beta _i$ is very large for unstable modes and small for decaying modes. It is important to note that (a) such a choice of $\boldsymbol {w}$, preferring unstable/long-lasting modes over decaying modes, is still user defined; (b) the optimization that yields DMD is a priori in nature. Thus, the insufficiency of the a priori formulation in accounting for temporal evolution is compensated by this criterion, while DMD in an a posteriori formulation (e.g. spDMD) includes such an effect implicitly in the optimization. Hence, it is possible that in some circumstances spDMD and Kou's criterion could achieve similar performance (Kou & Zhang Reference Kou and Zhang2017).

Lastly, as summarized in figure 4, it is important to mention the similarities and differences between spKDMD and spDMD: (1) spKDMD will refit Koopman modes while spDMD does not; and (2) the amplitude for all the modes in spKDMD is fixed as unity while it has to be determined in spDMD.

Figure 4. Differences and similarities among existing mode selection methods.

3.3. Hyperparameter selection

For simplicity, only hyperparameter selection for KDMD is discussed in this section. To fully determine a kernel in KDMD, one has to choose the following:

  (i) kernel type;

  (ii) kernel parameters, e.g. scale parameter $\sigma$;

  (iii) rank truncation $r$.

In this work, for simplicity, we fix the kernel type to be an isotropic Gaussian. Motivated by previous work on error evaluation of Koopman modes by Zhang et al. (Reference Zhang, Dawson, Rowley, Deem and Cattafesta2017), we consider evaluation with cross-validation on the a priori mean normalized accuracy defined in (3.23) and (3.24) for the $i$th eigenfunction,

(3.23)\begin{equation} \textrm{discrete form: } {Q}_i^{a} = \frac{1}{M-1}\sum_{j=1}^{M-1} \frac{|\varphi_i(\boldsymbol{x}_{j+1}) - \lambda_i \varphi_i(\boldsymbol{x}_j) |}{\sqrt{\displaystyle\frac{1}{M}\sum_{k=1}^M \varphi_i^*(\boldsymbol{x}_k) \varphi_i(\boldsymbol{x}_k) } }, \end{equation}
(3.24)\begin{equation}\textrm{continuous form: } {Q}_i^{a} = \frac{1}{M}\sum_{j=1}^{M} \frac{|\dot{\boldsymbol{x}}_{j} \boldsymbol{\cdot} \boldsymbol{\nabla}_{\boldsymbol{x}} \varphi_i(\boldsymbol{x}_{j}) - \lambda_i \varphi_i(\boldsymbol{x}_j) |}{\sqrt{\displaystyle\frac{1}{M}\sum_{k=1}^M \varphi_i^*(\boldsymbol{x}_k) \varphi_i(\boldsymbol{x}_k) } }, \end{equation}

on validation data for different rank truncations and kernel parameters.

Note that evaluation based on the maximal instead of the mean normalized accuracy would cause the error metric to depend strongly on the local sparsity of training data in the feature space. This is particularly true when a single trajectory of a high-dimensional dynamical system is used and randomly shuffled cross-validation is performed (Pan & Duraisamy Reference Pan and Duraisamy2018).

For each set of hyperparameters, we first compute the number of eigenfunctions for which the error defined in (3.23) and (3.24) is below a certain threshold on both training and validation data for each fold of cross-validation. Then we compute the average number of such eigenfunctions over all folds, which indicates the quality of the corresponding subspace. Finally, we plot the average number against the truncation rank $r$ and kernel scale parameter $\sigma$ to select the hyperparameters.
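A minimal sketch of the counting step for a single hyperparameter candidate and fold is given below (our own illustration using the discrete form (3.23); the eigenfunction matrix is an assumed input).

```python
import numpy as np

def n_accurate_eigenfunctions(phi, mu, tol=0.05):
    """Count eigenfunctions with a priori mean normalized error (3.23) below tol.
    phi: (M, L) eigenfunction evaluations varphi_i(x_j) along one trajectory.
    mu:  (L,) discrete-time KDMD eigenvalues."""
    norm = np.sqrt(np.mean(np.abs(phi) ** 2, axis=0))   # denominator of (3.23)
    dev = np.abs(phi[1:] - mu[None, :] * phi[:-1])      # one-step deviation from linear evolution
    Q = dev.mean(axis=0) / norm
    return int(np.sum(Q < tol))
```

In the full procedure, this count is evaluated on both the training and validation portions of each fold and averaged over folds before being plotted against $(\sigma, r)$.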

3.4. Implementation

We implement the described framework in Python with moderate parallelism in each module. We use scipy.special.hermitenorm (Jones, Oliphant & Peterson Reference Jones, Oliphant and Peterson2001) to generate normalized Hermite polynomials and MultiTaskElasticNet from scikit-learn (Pedregosa et al. Reference Pedregosa2011) for multi-task feature learning, where we set the maximal number of iterations to $10^5$ and the tolerance to $10^{-12}$. Message passing interface parallelism using mpi4py (Dalcin et al. Reference Dalcin, Paz, Kler and Cosimo2011) is used for the grid search in hyperparameter selection. To prepare data with hundreds of gigabytes collected from high-fidelity simulations, a distributed SVD written in C++ named the parallel data processing (PDP) tool is developed for dimension reduction. A brief description of this tool is given in appendix D.
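For reference, the core fitting step can be configured as sketched below (illustrative only: the assembly of the scaled eigenfunction matrix and the real embedding of complex-valued features are simplifying assumptions here, not a description of the released code).

```python
import numpy as np
from sklearn.linear_model import MultiTaskElasticNet

def fit_sparse_koopman_modes(Psi_scaled, X, alpha, rho=0.99):
    """Solve one instance of (3.9): Psi_scaled is the (M, L-hat) matrix of
    scaled eigenfunction evaluations (a real embedding is assumed), X the
    (M, N) target snapshot matrix."""
    model = MultiTaskElasticNet(alpha=alpha, l1_ratio=rho, fit_intercept=False,
                                max_iter=int(1e5), tol=1e-12)
    model.fit(Psi_scaled, X)
    B = model.coef_.T                        # (L-hat, N) Koopman-mode matrix
    kept = np.any(np.abs(B) > 0, axis=1)     # rows zeroed jointly across outputs are pruned
    return B, kept
```

Sweeping `alpha` over a logarithmic grid then traces out the ElasticNet paths discussed in § 4.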

4. Results and discussion

4.1. Two-dimensional fixed point attractor with the known finite Koopman-invariant subspace

We first consider a simple nonlinear dynamical system with a fixed point, which has an analytically determined, finite-dimensional, non-trivial Koopman-invariant subspace (Brunton et al. Reference Brunton, Brunton, Proctor and Kutz2016a; Kaiser et al. Reference Kaiser, Kutz and Brunton2017), to show the effectiveness of the proposed method. We consider a continuous-time formulation. The governing equations of the dynamical system are

(4.1)\begin{equation} \dot{x}_1 = \mu x_1, \end{equation}
(4.2)\begin{equation}\dot{x}_2 = \lambda (x_2 - x_1^2), \end{equation}

where $\mu = -0.05$ and $\lambda = -1$. One natural choice of the minimal set of Koopman eigenfunctions that linearly reconstruct the state is (Brunton et al. Reference Brunton, Brunton, Proctor and Kutz2016a)

(4.3a–c)\begin{equation} \varphi_1(\boldsymbol{x}) = x_2 - \lambda x_1^2 /(\lambda-2\mu), \quad \varphi_2(\boldsymbol{x})= x_1, \quad \varphi_3(\boldsymbol{x}) = x_1^2, \end{equation}

with eigenvalues $\lambda =-1$, $\mu =-0.05$, $2\mu =-0.1$, respectively.
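These eigenfunctions can be verified numerically; the short check below (illustrative, not part of the paper's pipeline) confirms that each $\varphi_i$ satisfies $\boldsymbol{\nabla}_{\boldsymbol{x}} \varphi_i \boldsymbol{\cdot} \dot{\boldsymbol{x}} = \lambda_i \varphi_i$ at sample points.

```python
import numpy as np

mu, lam = -0.05, -1.0
rng = np.random.default_rng(0)
x1, x2 = rng.uniform(-0.5, 0.5, size=(2, 1000))
f1, f2 = mu * x1, lam * (x2 - x1**2)             # right-hand side of (4.1)-(4.2)

phi1 = x2 - lam * x1**2 / (lam - 2 * mu)
dphi1 = f2 - 2 * lam * x1 * f1 / (lam - 2 * mu)  # chain rule: grad(phi1) . f
assert np.allclose(dphi1, lam * phi1)            # eigenvalue lambda = -1

assert np.allclose(f1, mu * x1)                  # phi2 = x1, eigenvalue mu = -0.05
assert np.allclose(2 * x1 * f1, 2 * mu * x1**2)  # phi3 = x1^2, eigenvalue 2*mu = -0.1
```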

The way we generate training, validation and testing data is described below with distribution of the data shown in figure 5.

  (i) Training data: a point cloud with $M = 1600$ pairs of $\{\boldsymbol {x}^{(i)}, \boldsymbol {\dot {x}}^{(i)}\}_{i=1}^{M}$ is generated by Latin hypercube sampling (Baudin Reference Baudin2015) within the domain $x_1,x_2 \in [-0.5,0.5]$.

  (ii) Validation data: a single trajectory with initial condition $x_1(0)=x_2(0)=0.4$, sampling time interval ${\rm \Delta} t = 0.03754$ from $t=0$ to $t=30$.

  (iii) Testing data: a single trajectory with initial condition $x_1(0)=x_2(0) = -0.3$, sampling time interval ${\rm \Delta} t =0.06677$ from $t=0$ to $t=40$.

Figure 5. Data distribution for a 2-D fixed point attractor.

As an illustration, we consider two models to approximate the Koopman operator from training data:

  (i) a continuous-time EDMD with normalized Hermite polynomials up to fifth order, with $L=36$ features;

  (ii) a continuous-time KDMD with an isotropic Gaussian kernel with $\sigma = 2$ and reduced rank $r = L = 36$.

Details of the above choices based on the steps of hyperparameter selection in § 3.3 are given in § C.1.

4.1.1. Results for continuous-time EDMD with mode selection

As displayed in figure 6, we begin with an error analysis of all of the eigenmodes on the validation data in figure 5, according to the linearly evolving error $Q$ defined in (3.2) and the reconstruction error $R$ defined in (3.4). From figure 6(a), considering both the linearly evolving error and the quality of the reconstruction, we choose the cut-off threshold at $\hat {L}=10$. We observe a sharp cut-off in figure 6(a) around the number of selected eigenmodes $\hat {L} = 8$. This is a reasonable choice since, from the eigenvalues in figure 6(b), we notice that the analytic Koopman eigenmodes are not covered until the first eight accurate eigenmodes are selected. Note that the legend in figure 6(a) is ordered by the maximal deviation from linear evolution, e.g. the second most accurate mode is the 34th mode with zero eigenvalue. Indeed, the first four eigenfunctions ($\textrm {index}=1,34,35,36$) are redundant in terms of reconstruction in this problem (this could be interesting if the system were instead Hamiltonian). The fifth ($\textrm {index}=29$) and sixth ($\textrm {index}=33$) eigenmodes correspond to two of the analytic eigenfunctions that span the system, and the seventh ($\textrm {index}=32$) eigenmode is indeed the product of the fifth and sixth eigenfunctions. Similarly, the ninth and tenth eigenfunctions ($\textrm {index}=31,28$) also appear to be polynomial combinations of the true eigenfunctions.

Figure 6. Error analysis of 36 eigenmodes from continuous-time EDMD for the 2-D fixed point attractor. (a) Trends of linearly evolving error $Q$ and reconstruction error $R$. (b) Temporal evolution of relative error for top $\hat {L}=10$ accurate eigenmodes.

According to (3.9), to further remove redundant modes, we perform multi-task feature learning on the $\hat {L}=10$ eigenmodes. The corresponding ElasticNet path is shown in figure 7. Note that each $\alpha$ corresponds to a minimizer of (3.9). To choose a proper $\alpha$ so as to find a proper Koopman-invariant subspace, it is advisable to check the trend of the normalized reconstruction error and the number of non-zero features. Given the dictionary, for simple problems for which there exists an exact Koopman-invariant subspace that also spans the system state, a proper model can be obtained by selecting $\alpha \approx 10^{-6}$, which ends up with only three eigenfunctions, as shown in figure 7. Moreover, as is common for EDMD with a polynomial basis (Williams et al. Reference Williams, Rowley and Kevrekidis2014, Reference Williams, Kevrekidis and Rowley2015), a pyramid of eigenvalues appears in figure 7.

Figure 7. Result of multi-task feature learning on top $\hat {L}=10$ accurate eigenmodes from continuous-time EDMD for the 2-D fixed point attractor. (a) ElasticNet path for $x_1$. (b) ElasticNet path for $x_2$. (c) Trends of normalized reconstruction error and number of non-zero terms vs $\alpha$. (d) Selected continuous-time eigenvalues.

As shown in figure 8, both the identified eigenvalues and the contours of the phase angle and magnitude of the selected eigenfunctions from spEDMD match the analytic eigenfunctions given in (4.3a–c) very well. As expected, the prediction on unseen testing data is also excellent. Note that the indices of the true eigenfunctions $\varphi _1$, $\varphi _2$ and $\varphi _3$ ordered by Kou's criterion in (3.15) are 8, 5 and 6. In this case, all of the true eigenfunctions are missing from the top three modes chosen by Kou's criterion. Indeed, the top three modes chosen by Kou's criterion have nearly zero eigenvalues.

Figure 8. Sparsely selected eigenfunctions and eigenvalues from continuous-time EDMD for the 2-D fixed point attractor with the corresponding prediction on testing data with an unseen initial condition $x_1(0)=x_2(0)=-0.3$. From left to right, the top three figures show contours of magnitude of eigenfunctions, while the bottom three figures are those of the phase angle of eigenfunctions. Last column: comparison between prediction and ground truth for an unseen testing trajectory.

4.1.2. Results of continuous-time KDMD with mode selection

The mode selection algorithm presented above can be applied in precisely the same form to KDMD, given a set of eigenfunctions and eigenvalues. Error analysis of the eigenfunctions is shown in figure 9, from which we again choose $\hat {L}=10$. As before, eigenvalues ordered by maximal deviation from linear evolution are shown in the legend of figure 9(a). Again, in figure 9(a), we observe a sharp decrease in the reconstruction error after the four most accurate modes are included. This is expected, as the second to fourth most accurate modes are analytically exact, as seen in figure 9(b). As shown in figures 10 and 11, it is confirmed that both spEDMD and spKDMD arrive at the same analytic eigenfunctions up to a constant factor. It should be noted that, although polynomials do not analytically belong to the RKHS (Minh Reference Minh2010), good approximations can still be achieved conditioned on the data we have, i.e. $x_1,x_2\in [-0.5,0.5]$. Again, the indices of the true eigenfunctions $\varphi _1$ to $\varphi _3$ ordered by Kou's criterion are 8, 2 and 3. Hence, $\varphi _1$ is missing from the top three modes chosen by Kou's criterion. Similarly, the first mode chosen by Kou's criterion has a zero eigenvalue.

Figure 9. Error analysis of 36 eigenmodes from continuous-time KDMD for the 2-D fixed point attractor. (a) Trends of linearly evolving error $Q$ and reconstruction error $R$. (b) Temporal evolution of relative error for top $\hat {L}=10$ accurate eigenmodes.

Figure 10. Result of multi-task feature learning on top $\hat {L}=10$ accurate eigenmodes from continuous-time KDMD for the 2-D fixed point attractor. (a) ElasticNet path for $x_1$. (b) ElasticNet path for $x_2$. (c) Trends of normalized reconstruction error and number of non-zero terms vs $\alpha$. (d) Selected continuous-time eigenvalues.

Figure 11. Sparsely selected eigenfunctions and eigenvalues from continuous-time KDMD for the 2-D fixed point attractor with corresponding prediction on testing data with an unseen initial condition $x_1(0)=x_2(0)=-0.3$. From left to right, the top three figures show contours of the magnitude of eigenfunctions, while the bottom three figures are those of the phase angle of eigenfunctions. Last column: comparison between predictions and ground truth for an unseen testing trajectory.

4.1.3. Effect of SVD regularization

Singular value decomposition truncation is a standard regularization technique for the solution of a potentially ill-conditioned linear system. In the standard EDMD in (2.7), for example, $\boldsymbol {G}$ can be ill-conditioned, leading to spurious eigenvalues in $\boldsymbol {K}$. Hence, Williams et al. (Reference Williams, Rowley and Kevrekidis2014) recommend SVD truncation in (2.8) to obtain a robust solution for $\boldsymbol {K}$. Effectively, this shrinks the number of EDMD/KDMD modes. It has to be recognized, however, that mode reduction from SVD truncation is not the same as mode selection. Most importantly, one should not confuse numerical spuriousness arising from poor numerical conditioning with functional spuriousness arising from the orthogonal projection error of the Koopman operator (Korda & Mezić Reference Korda and Mezić2018b). Indeed, SVD truncation does not always lead to a better approximation of a Koopman-invariant subspace. It is rather a linear dimension reduction that optimally preserves the variance in the feature space conditioned on the training data, without knowledge of the linear evolution property of each feature.
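Schematically, the truncated solve replaces the full pseudo-inverse of $\boldsymbol{G}$ with a rank-$r$ one, as sketched below (a sketch assuming $\boldsymbol{G}$ and $\boldsymbol{A}$ have been assembled from the feature matrices as in (2.7); not the exact implementation used here).

```python
import numpy as np

def edmd_K_svd_truncated(G, A, r):
    """Rank-r SVD-regularized solve of K = G^+ A for standard EDMD.
    G, A: (L, L) Gram-type matrices assembled from feature snapshots."""
    U, s, Vh = np.linalg.svd(G)
    G_pinv_r = (Vh[:r].conj().T / s[:r]) @ U[:, :r].conj().T  # rank-r pseudo-inverse
    return G_pinv_r @ A
```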

For demonstration, we consider the above fixed point attractor system, using the same data and the standard EDMD algorithm with the same order of Hermite polynomials. The predictions on the unseen testing data shown in figure 12 indicate that, even though only three eigenfunctions (indeed, three features in the Hermite polynomial dictionary) are required, standard EDMD fails to identify the correct eigenfunctions when 26 SVD modes are retained, while the results improve with 31 modes retained. The sensitivity of standard EDMD with respect to SVD truncation is likely a result of the use of normalized Hermite polynomials, for which SVD truncation leads to a strong preference for the subspace spanned by the higher-order polynomials. We did not observe such a sensitivity for KDMD, unless the subspace was truncated below 10 modes.

Figure 12. Standard EDMD prediction on an unseen trajectory with different SVD truncations for a fixed point attractor.

4.2. Two-dimensional, transient flow past a cylinder

As a classical example of Koopman analysis in fluid dynamics (Bagheri Reference Bagheri2013; Williams et al. Reference Williams, Rowley and Kevrekidis2014; Otto & Rowley Reference Otto and Rowley2019), transient two-dimensional flow past a cylinder (figure 13) is considered at different Reynolds numbers ($Re = {U_{\infty }D}/{\nu }$), where $U_{\infty }=1$ is the free-stream velocity, $D=2$ is the diameter of the cylinder and $\nu$ is the kinematic viscosity. The two-dimensional incompressible Navier–Stokes equations govern the dynamics, with far-field boundary conditions for pressure and velocity and no-slip velocity on the cylinder surface. Numerical simulations are performed using the icoFoam solver in OpenFOAM (Jasak, Jemcov & Tukovic Reference Jasak, Jemcov and Tukovic2007). We explore $Re = 70, 100, 130$ by changing the viscosity. The pressure field is initialized with independent and identically distributed (i.i.d.) Gaussian noise $\mathcal {N}(0, 0.3^2)$. The velocity is initialized with a uniform free-stream velocity superimposed with i.i.d. Gaussian noise $\mathcal {N}(0, 0.3^2)$. It should be noted that the noise is generated on the coarsest mesh shown in figure 13 and interpolated to the finer meshes. Grid convergence with increasing mesh resolution is assessed by comparing the temporal evolution of the drag coefficient $C_D$ and lift coefficient $C_L$.

Figure 13. (a) Illustration of computational mesh for a two-dimensional cylinder wake problem (coarsest). (b) Contour of vorticity $\omega _z$ for $Re=70$ when vortex shedding is fully developed $(t = 175)$.

Note that the dynamics of a cylinder wake involves four regimes: near-equilibrium linear dynamics, nonlinear algebraic interaction between the equilibrium and the limit cycle, exponential relaxation to the limit cycle and periodic limit cycle dynamics (Chen et al. Reference Chen, Tu and Rowley2012; Bagheri Reference Bagheri2013). Instead of considering data only from each of these regimes separately (Chen et al. Reference Chen, Tu and Rowley2012; Taira et al. Reference Taira, Hemati, Brunton, Sun, Duraisamy, Bagheri, Dawson and Yeh2019) or only from the last two regimes where exponential linear dynamics is expected (Bagheri Reference Bagheri2013), we start collecting data immediately after the flow field becomes unstable, and stop after the flow field has experienced several limit cycles. Note that the regime with algebraic interaction is non-modal (Schmid Reference Schmid2007) and, therefore, cannot be expressed as a sum of individual exponential terms (Bagheri Reference Bagheri2013). This makes it a challenging problem for DMD (Chen et al. Reference Chen, Tu and Rowley2012). The sampling time interval is ${\rm \Delta} t = 0.1 t_{ref}$, where $t_{ref} = D/U_{\infty }$.

For each $Re$, 891 snapshots of the full velocity fields $U$ and $V$ with sampling time interval ${\rm \Delta} t$ are collected as two matrices of size $N_{{grid}}\times N_{{snapshots}}$. Following this, each velocity component is shifted and scaled (normalized) to lie in $[-1,1]$. Since space-filling sampling in any high-dimensional space would be extremely difficult, we split the trajectory into training, validation and testing data by sampling with strides, similar to the ‘even-odd’ sampling scheme previously proposed by Otto & Rowley (Reference Otto and Rowley2019). As illustrated in figure 14, given index $i$, if $i \mod 3 = 0$, the $i$th point belongs to the training set, while $i \mod 3 = 1$ corresponds to validation and $i \mod 3 = 2$ to testing data. Consequently, the effective time interval in the training, validation and testing trajectories is tripled to $3{\rm \Delta} t = 0.3 t_{ref}$, and each of the three sets contains 297 snapshots. Finally, we stack the data matrices along the first axis corresponding to the number of grid points, and perform the distributed SVD described in § 3.4. For all three cases, the top 20 POD modes are retained, corresponding to 99 % of the kinetic energy. Next, we apply our algorithm with discrete-time KDMD and an isotropic Gaussian kernel to this reduced-order nonlinear system. We choose the hyperparameters $\sigma = 3$ and $r=180$. Further details are given in § C.2.
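The stride-based split can be written compactly as in the sketch below (with a placeholder coefficient matrix; 891 snapshots yield 297 per set).

```python
import numpy as np

coeffs = np.random.randn(20, 891)        # placeholder: top-20 POD coefficients vs time
idx = np.arange(coeffs.shape[1])
train = coeffs[:, idx % 3 == 0]          # 297 snapshots, effective interval 3*dt
valid = coeffs[:, idx % 3 == 1]          # 297 snapshots
test = coeffs[:, idx % 3 == 2]           # 297 snapshots
```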

Figure 14. Illustration of splitting a uniformly sampled single trajectory in high-dimensional phase space into training, validation and testing sets.

4.2.1. Results of discrete-time KDMD with mode selection

For all three Reynolds numbers, the a posteriori error analysis is shown in figure 15. A good choice of the number of accurate modes $\hat {L}$ retained for reconstruction is around 60, since the corresponding maximal deviation from linear evolution is still around 5 % while the reconstruction error reaches a plateau for $\hat {L}>60$.

Figure 15. Trend of linearly evolving error $Q$ and reconstruction error $R$ from discrete-time KDMD for the two-dimensional cylinder wake flow case; (a) $Re=70$, (b) $Re =100$, (c) $Re=130$.

After the mode selection on validation data, an $\alpha$-family of solutions is obtained, with the corresponding reconstruction error and number of non-zero terms shown in figure 16. Note that the chosen solution is highlighted with blue circles. As shown in table 2, nearly half of the accurate KDMD eigenmodes identified are removed by the proposed sparse feature selection. Note that for all three cases, the number of selected modes (around 32 to 34) is still larger than that required in neural network models (around 10) (Otto & Rowley Reference Otto and Rowley2019; Pan & Duraisamy Reference Pan and Duraisamy2020). This is because the subspace spanned by KDMD/EDMD relies on a predetermined dictionary rather than being data-adaptive like neural network models. Nevertheless, owing to the additional expressiveness from nonlinearity, we will see in § 4.2.3 that spKDMD performs significantly better than DMD (Schmid Reference Schmid2010) and spDMD (Jovanović et al. Reference Jovanović, Schmid and Nichols2014), while enjoying the properties of convex optimization at a much lower computational cost than the inherently non-convex and computationally intensive neural network counterparts.

Figure 16. Variation of reconstruction error $R$ and number of non-zero terms for the two-dimensional cylinder wake flow; (a) $Re=70$, (b) $Re =100$, (c) $Re=130$. The blue circle corresponds to selected $\alpha$.

Table 2. Summary of mode selection for discrete-time KDMD on two-dimensional cylinder wake flow.

The predictions of the top eight POD coefficients (denoted $x_1$ to $x_8$) on testing data are displayed in figures 17–19. The results match the ground truth very well for all three cases. Figure 21 shows that there appear to be five clusters of selected eigenvalues, while most of the modes are removed by the proposed algorithm. Similar observations were made by Bagheri (Reference Bagheri2013) when DMD was applied to the full transient dynamics. A pattern consisting of a stable eigenvalue on the unit circle surrounded by several decaying eigenvalues is observed for all clusters. The stable eigenvalue contributes to the limit cycle behaviour, while the decaying eigenvalues account for the transient phenomenon. Due to symmetry, only eigenvalues in the first quadrant are shown in the bottom row of figure 21. It is observed that the frequency associated with the type-$\textrm {II}$ cluster is approximately twice that of type-$\textrm {I}$. This is in good agreement with previous analytical results from the weakly nonlinear theory (Bagheri Reference Bagheri2013). The frequency $f$ is normalized as ${St} = fD/U_{\infty }$, where $St$ is the Strouhal number.
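For reference, a discrete-time eigenvalue $\mu$ from figure 21 maps to a Strouhal number as sketched below (our own conversion; the physical sampling interval $3{\rm \Delta} t = 0.3 t_{ref}$ with $t_{ref} = D/U_{\infty } = 2$ is an assumption based on the data description above).

```python
import numpy as np

def strouhal(mu_d, dt=0.6, D=2.0, U_inf=1.0):
    """Strouhal number of a discrete-time Koopman eigenvalue mu_d.
    dt is the physical sampling interval of the training trajectory."""
    lam_c = np.log(mu_d) / dt                 # continuous-time eigenvalue
    f = np.abs(lam_c.imag) / (2.0 * np.pi)    # frequency in physical units
    return f * D / U_inf
```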

Figure 17. A posteriori prediction of testing trajectory for $Re = 70$ in terms of the top eight POD coefficients with spKDMD.

Figure 18. A posteriori prediction of testing trajectory for $Re = 100$ in terms of the top eight POD coefficients with spKDMD.

Figure 19. A posteriori prediction of testing trajectory for $Re = 130$ in terms of the top eight POD coefficients with spKDMD.

Recall that in the laminar parallel shedding regime ($47 < Re < 180$), the characteristic Strouhal number $St$ scales with $-1/\sqrt {Re}$ (Fey, König & Eckelmann Reference Fey, König and Eckelmann1998). Therefore, it is expected that the $St$ of both types tends toward higher frequency as $Re$ increases from 70 to 130. Furthermore, it is interesting to note that the corresponding Strouhal numbers for lift and drag when the system is on the limit cycle, $St_L$ and $St_D$ (we observe that the lift and drag coefficients each exhibit only one frequency in the limit cycle regime for the range of $Re$ studied in this work), coincide with the stable frequencies of type-$\textrm {I}$ and type-$\textrm {II}$, respectively, as indicated in figure 21. This is due to the anti-symmetrical/symmetrical structure of the velocity field of the type-I/II Koopman modes, respectively, as can be inferred from figure 22. A schematic is also shown in figure 20.

Figure 20. Illustration of the structure of velocity field for the lower (top) and higher frequency (bottom) Koopman modes. The arrow roughly indicates the velocity direction.

Figure 21. Discrete-time eigenvalue distribution of full KDMD and spKDMD; (a) $Re=70$, (b) $Re =100$, (c) $Re=130$. Blue dot: full KDMD eigenvalues. Red dot: spKDMD eigenvalues. Bottom row: zoomed image. Roman numerals $\textrm {I}$ and $\textrm {II}$ correspond to two types of eigenvalue clusters of distinct frequencies, with each of them enclosed by cyan dashed circles. The green/cyan solid line corresponds to $St_D/St_{L}$.

The higher frequency mode is symmetrical (along the free stream direction) in $U$ and anti-symmetrical in $V$. As a consequence, this only contributes to the oscillation of drag. The lower frequency mode is anti-symmetrical in $U$ and symmetrical in $V$, and only contributes to the oscillation of lift. Thus, the fluctuation in the lift mostly results from the stable mode in type-I, while that for drag results from the stable mode in type-II with twice the frequency.

Finally, several representative Koopman modes from spKDMD for the three $Re$ are shown in figures 22–24. For a better comparison of mode shapes, the contributions from the stable modes of types I and II with a threshold of 0.001 at $t=0$ are displayed in the top left of figure 25. To remove the effect of time, the ‘envelope’ of the mode shape, i.e. the time average of the isocontours, is shown in the top right of figure 25. From these results, we observe the following interesting phenomena.

  • The minimal dimension of the Koopman-invariant subspace that approximately captures the limit cycle attractor for all three $Re$, which fall into the laminar vortex shedding regime (White & Corfield Reference White and Corfield2006), is five, which is consistent with previous multi-scale expansion analysis near the critical $Re$ (Bagheri Reference Bagheri2013).

  • The lobes of the stable Koopman modes in the type-I cluster are approximately 50 % wider than those in the type-II cluster.

  • Similarity/colinearity among Koopman modes within each cluster is observed. Such a finding was previously reported in the theoretical analysis by Bagheri (Reference Bagheri2013): a similarity in spatial structure exists among the Koopman modes belonging to the same family, even though the structures are clearly phase lagged.

  • As $Re$ increases from 70 to 130, mode shapes flatten downstream while expanding upstream.

  • At $Re=70$, the shear layer in the stable Koopman modes continues to grow within the computational domain. However, at $Re = 100, 130$, the shear layer stops growing after a certain distance that is negatively correlated with $Re$.

Figure 22. Contours of Koopman modes of $Re=70$ cylinder wake flow at $t=0$. Red squares indicate stable modes.

Figure 23. Contours of Koopman modes of $Re=100$ cylinder wake flow at $t=0$. Red squares indicate stable modes.

Figure 24. Contours of Koopman modes of $Re=130$ cylinder wake flow at $t=0$. Red squares indicate stable modes.

Figure 25. Top left: contribution of stable Koopman modes corresponding to type-I and type-II clusters for $Re = 70,100,130$ at $t=0$ visualized with threshold 0.001. Top right: time-averaged isocontour of top left plot. Bottom: tendency of ‘envelope’ of type-I and II modes as $Re$ increases. Separation lines in $U$ component of type-I are drawn for $Re = 70$ (black), $Re = 100$ (red) and $Re = 130$ (blue).

4.2.2. Net contribution of clustered Koopman modes

By examining figures 22–24, we observe that the colinearity of the spatial structures among each cluster can cause cancellations. A famous example of such non-oscillatory cancellation is the ‘shift mode’ defined by Noack et al. (Reference Noack, Afanasiev, Morzyński, Tadmor and Thiele2003), in conjunction with two oscillating modes. As the ‘shift mode’ decays, the stationary component of the flow transitions from the unstable equilibrium to the time-averaged mean. The existence of such non-oscillatory decaying Koopman modes is also confirmed by weakly nonlinear analysis (Bagheri Reference Bagheri2013). Interestingly, our algorithm is able to identify not only the non-oscillatory cancellation (from the ‘shift mode’) but also oscillatory cancellations from two clusters with distinct frequencies. Such cancellations elegantly explain why no unstable Koopman eigenvalues appear in this flow given the coexistence of an attracting limit cycle and an unstable equilibrium. These modes could be understood as ‘oscillatory shift modes’, as a generalization of the model proposed by Noack et al. (Reference Noack, Afanasiev, Morzyński, Tadmor and Thiele2003).

Since the modes within each cluster appear to be colinear with each other, it is intriguing to investigate the net contribution from each cluster. For the above $Re = 70$ case, the effects of different clusters at different times in the transient regime are shown in figure 26. We note the following interesting observations throughout the transient period from $t=80$ to $t=200$.

  • The net contribution from ‘cluster 0’ does not exhibit strong oscillations. Its $U$ component shows a decrease in the length of the reverse flow region behind the cylinder, together with an increase in the thickness of the low speed wake layer, while the $V$ component remains unchanged. This is similar to the effect of the ‘shift mode’, which also characterizes the decay of recirculation behind the cylinder.

  • Initially, at $t=80$, the net contribution from ‘cluster I’ is rather weak, primarily due to the ‘cancellation’ from the lagged phase. Furthermore, the vortex shedding that develops downstream from the top and bottom surfaces of the cylinder is nearly parallel. This corresponds to the initial wiggling of the low speed wake flow downstream. As time increases, the pattern of the corresponding vortices develops away from the centreline.

  • Although the net contribution from ‘cluster 0+I’ captures most of the flow features of the ‘full modes’ throughout the transient regime, with increasing time the net contribution from ‘cluster II’ becomes more important and contributes to the strength of the vortex shedding downstream.

Figure 26. Contribution of Koopman modes at cluster level in the transient regime of $Re=70$ case. Here ‘cluster 0’ denotes the cluster near the real axis in figure 21; ‘cluster I’ and ‘cluster II’ take the effect of a mirror cluster in the fourth quadrant into account; ‘full modes’ denotes the aggregated contribution of Koopman modes.

4.2.3. Comparison with DMD and spDMD at $Re=70$

To confirm the advantage of spKDMD over DMD (Schmid Reference Schmid2010) and spDMD (Jovanović et al. Reference Jovanović, Schmid and Nichols2014), we compare the following three models on the unsteady cylinder wake flow at $Re=70$:

  (i) spKDMD on the top 20 POD coefficients with 34 modes selected;

  (ii) DMD on the top 20 POD coefficients;

  (iii) spDMD (we used the original Matlab code from http://people.ece.umn.edu/users/mihailo/) on the top 200 POD coefficients, with $\alpha$ chosen carefully such that only 34 modes are selected.

Note that DMD with the top 200 POD coefficients, i.e. $r=200$ in SVD-DMD (Schmid Reference Schmid2010), contains 10 times as many stable/decaying harmonics as DMD on the top 20 modes. Hence, it is not surprising that the corresponding prediction of the evolution of the top 20 POD coefficients is very good (not shown for clarity). However, to make a fair comparison against spKDMD, we consider spDMD (Jovanović et al. Reference Jovanović, Schmid and Nichols2014) on the top 200 POD coefficients with 34 modes selected. (Although 200 POD coefficients are used for spDMD and 20 for KDMD, this is not an unfair comparison given that the same number of spatial modes is selected; furthermore, these are reduced-order models of the same full-order dynamical system.) As shown in figure 27, given the same number of eigenmodes, spKDMD performs remarkably better than spDMD, especially in the transient regime. This is likely due to the inability of DMD to capture nonlinear algebraic growth in the transient regime (Bagheri Reference Bagheri2013). Sparsity-promoting DMD overestimates the growth in $x_1$ and $x_2$ while missing a turnaround near the onset of the transient regime in $x_5$ and $x_8$. As expected, DMD with 20 POD coefficients performs the worst, especially for the modes where transient effects are dominant. Given the results in figure 27, among all of the top eight POD coefficients, $x_6$ and $x_7$ appear to be the most challenging to model: DMD and spDMD cannot match the limit cycle, while spKDMD performs very well. Notice that the frequency in $x_6$ and $x_7$ corresponds to $St_D$. Hence, there will be a difference in predicting the fluctuation of $C_D$ between spDMD and spKDMD.

Figure 27. Comparison of a posteriori prediction on the top eight POD coefficients of the testing trajectory between spKDMD, DMD (Schmid Reference Schmid2010) and spDMD (Jovanović et al. Reference Jovanović, Schmid and Nichols2014) for the two-dimensional cylinder flow at $Re=70$. Here $x_i$ denotes the $i$th POD coefficient.

Finally, a comparison of the identified Koopman eigenvalues between DMD, spDMD and spKDMD is shown in figure 28. On one hand, both spDMD and spKDMD exactly capture the stable eigenmodes that correspond to $St_D$ and $St_L$. This is expected, since DMD with 200 POD coefficients represents the dynamics very well, and deviation from limit cycle behaviour would be penalized in spDMD. On the other hand, several erroneous stable DMD modes are obtained by DMD. This explains the deviation of the a posteriori prediction from the ground truth limit cycle in figure 27. For the decaying modes, similarity/colinearity is observed between the two clusters of eigenvalues from spDMD and spKDMD. However, spKDMD contains more high-frequency modes than spDMD. Finally, it is interesting to note that, although the correct stable eigenvalues are captured accurately by both spDMD and spKDMD, the former does not capture accurate amplitudes for the stable eigenvalues of type-II, as seen in figure 27.

Figure 28. Comparison of identified eigenvalues between spKDMD, DMD (Schmid Reference Schmid2010) and spDMD (Jovanović et al. Reference Jovanović, Schmid and Nichols2014) for the two-dimensional cylinder flow at $Re=70$.

As a side note, when the temporal mean was used instead of the maximal error in the definition of $Q$ in (3.2), spKDMD with the above setting failed to identify a stable eigenvalue corresponding to $St_D$.

4.3. Three-dimensional transient turbulent ship airwake

Understanding unsteady ship-airwake flows is critical for the design of safe shipboard operations, such as takeoff and landing of fixed or rotary wing aircraft (Forrest & Owen Reference Forrest and Owen2010), especially when the wind direction becomes stochastic. Here we obtain snapshots from an unsteady Reynolds-averaged Navier–Stokes (URANS) simulation of a ship airwake using FLUENT (Ansys 2016) with the shear-layer-corrected $k$–$\omega$ two-equation turbulence model. Unsteadiness arises from both bluff-body separation and an abrupt change in wind direction. We consider a conceptual ship geometry called simple frigate shape 2 (SFS2). For details of the geometry, readers are referred to Yuan, Wall & Lee (Reference Yuan, Wall and Lee2018). The configuration of the simulation set-up is shown in figure 29, where $\alpha _{\infty }$ denotes the angle of the side wind.

Figure 29. (a) Geometry of the ship (SFS2). (b) Generated computational mesh.

To prepare a proper initial condition, a URANS simulation with $U_{\infty }=15\ \textrm {m}\ \textrm {s}^{-1}$ and $\alpha _{\infty }=0^{\circ }$, i.e. no side wind, is conducted until a physical initial condition is reached. Following this, the last snapshot is used as the initial condition for a new run with an impulsive change in the wind direction from $\alpha _{\infty } = 0^{\circ }$ to $\alpha _{\infty }= \alpha _0=5^{\circ }$. The boundary conditions at the outlet/inlet are pressure outlet/velocity inlet, while the top and bottom are set as symmetry planes for simplicity. No-slip conditions are used on the surface of the ship. Further details of the simulation set-up are provided in Sharma et al. (Reference Sharma, Xu, Padthe, Friedmann and Duraisamy2019).

The sampling time interval is ${\rm \Delta} t = 0.1\,\textrm {s}$, with 500 consecutive samples of the three velocity components. This corresponds to several flow-through times over the ship length. The domain of interest is a Cartesian region of mesh size $24\times 40\times 176$ starting on the rear landing deck. For dimension reduction, the trajectory of the top 40 POD coefficients (temporal mean subtracted) is collected, preserving more than 99 % of the kinetic energy. Discrete-time KDMD with an isotropic Gaussian kernel is employed to perform the nonlinear Koopman analysis, where $\sigma = 200$ and $r=135$ are chosen. Details of the hyperparameter selection are provided in § C.3.

4.3.1. Results of discrete-time KDMD with mode selection

First, the error analysis of the eigenfunctions is shown in figure 30, where we choose $\hat {L} \approx 60$ for good reconstruction. However, the level of deviation from linear evolution is around 10 %. This error will be reflected later as a deviation in the a posteriori prediction on the testing trajectory. (This implies difficulties in finding an accurate yet informative Koopman operator with isotropic Gaussian kernels; however, choosing an optimal kernel type is beyond the scope of this work.)

Figure 30. Trend of linearly evolving error $Q$ and reconstruction error $R$ from discrete-time KDMD for the ship airwake.

Second, the result of mode selection is summarized in table 3. Note that nearly two-thirds of the modes are removed. Furthermore, the model performance in terms of the a posteriori prediction on the testing trajectory is evaluated. A comparison between KDMD and the ground truth for contours of the velocity components on a specific $z$-plane (1.2 m above the landing deck) is shown in figure 31. The following effects of the impulsive change in wind direction are observed from $t=1.5\,\textrm{s}$ to $t=30\,\textrm{s}$ and are well captured by spKDMD:

  • growth of a large shear layer in $U$ on the rear (left) side of the superstructure on the ship;

  • a strong side wind sweep in $V$ above the landing deck propagating downstream;

  • development of a vortex on the upwind (right) downstream side of the ship.

Figure 31. Contours of velocity components near the ship on the $z$-plane slice at $t=1.5$, 3.9, 9.0 and 30 s. For each subfigure, (a) prediction from KDMD; (b) ground truth.

Table 3. Summary of mode selection for discrete-time KDMD on ship airwake.

Furthermore, the three velocity components of the Koopman mode decomposition on the previously mentioned $z$-plane are shown in figure 32, together with the isocontour of vorticity coloured by the velocity magnitude for the two stable harmonics. Note that the frequency is normalized using $U_{\infty }$ as the reference velocity and the funnel width of the ship, $L=3.048\ \mathrm {m}$, as the characteristic length scale (Forrest & Owen Reference Forrest and Owen2010). As expected, spKDMD yields a large number of decaying modes with only three non-trivial stable harmonics, since significant transient effects are present in the data. From the Koopman mode decomposition in figure 32, we observe the following:

  • modes with eigenvalues close to each other exhibit a similar spatial structure;

  • modes associated with higher frequency are dominated by smaller scales;

  • the stable harmonic mode near $St = 0.09$ is associated with a strong cone-shaped vortex originating from the upwind (right) rear edge of the superstructure on the ship;

  • the stable harmonic mode near $St = 0.068$ corresponds to vortex shedding induced by the funnel;

  • the slowly decaying mode near $St=0.022$ shows unsteady circulation directly behind the superstructure;

  • the steady mode ($St=0$) is consistent with the large circulation behind the superstructure, the roll-up wind on the side of the landing deck, and vertical suction towards the floor on the landing deck.

Figure 32. Contours of Koopman modes of ship airwake on the $z$-plane at $t=0$. For each subfigure, left: $U$, middle: $V$, right: $W$. Red squares indicate stable modes. Bottom: isocontour of vorticity coloured by velocity magnitude zoomed in near the landing deck.

4.3.2. Comparison with spDMD

We repeat the comparison between spKDMD and spDMD (Jovanović et al. Reference Jovanović, Schmid and Nichols2014). Note that DMD on the first 40 POD modes performs poorly, similar to § 4.2.3, and is therefore not shown here. To make a fair comparison against the spKDMD from the previous subsection, we collect the first 200 POD coefficients for spDMD to ensure that a sufficient number of modes is available to fit the trajectory well. We then carefully choose the penalty coefficient in spDMD to ensure that the same number of modes is retained as in spKDMD. As shown in figure 33, within the time horizon $t<50$, the a posteriori evaluation shows that spKDMD offers much improved predictions compared with spDMD (Jovanović et al. Reference Jovanović, Schmid and Nichols2014) on the testing trajectory.

Figure 33. Comparison of a posteriori prediction of the four most significant POD coefficients of the testing trajectory between spKDMD and spDMD (Jovanović et al. Reference Jovanović, Schmid and Nichols2014) for the three-dimensional (3-D) ship-airwake flow. Here $x_i$ denotes the $i$th POD coefficient.

Moreover, as further illustrated in figure 34(a), the eigenvalues identified by spKDMD contain only two stable modes, while nearly all eigenvalues from spDMD are located near the unit circle, 30 out of 56 of which are slightly unstable. These unstable modes inevitably render the identified system numerically unstable when predicting beyond the training time horizon, whereas spKDMD predicts a ‘physically consistent’ limit cycle behaviour. As indicated in figure 34(b), such instability is related to the inability of the (linear) features to approximate the Koopman-invariant subspace: only eight modes are within 10 % of the maximal deviation from linear evolution, compared with 60 modes for KDMD. We note that similar observations of drastically different eigenvalue distributions were reported in the original KDMD paper (Williams et al. Reference Williams, Rowley and Kevrekidis2014).

Figure 34. (a) Comparison of identified eigenvalues between spKDMD and spDMD (Jovanović et al. Reference Jovanović, Schmid and Nichols2014) for the 3-D ship-airwake flow. (b) Trend of linearly evolving error $Q$ and reconstruction error $R$ from DMD for the 3-D ship-airwake flow.

5. Conclusions

Two classical nonlinear approaches for the approximation of the Koopman operator, EDMD and KDMD, are revisited. From an algorithmic perspective, the main contributions of this work are (a) sparsity-promoting techniques based on a posteriori error analysis, and (b) multi-task learning techniques for mode selection that extend spDMD to these nonlinear variants. Furthermore, analytical relationships between spDMD, Kou's criterion and the proposed method are derived from the viewpoint of optimization. The algorithm (code available at https://github.com/pswpswpsw/SKDMD) is first evaluated in detail on a simple two-state dynamical system, for which the Koopman decomposition is known analytically.

If one is interested only in the post-transient dynamics of a system on an attractor, linear observables with time delays are sufficient to extract an informative Koopman-invariant subspace. Thus, the present techniques are evaluated on two fluid flows that involve strong transients: two-dimensional flow over a cylinder at different Reynolds numbers, and a three-dimensional (3-D) ship airwake. We demonstrate the effectiveness of discovering accurate and informative Koopman-invariant subspaces from data and constructing accurate reduced-order models from the viewpoint of Koopman theory. Furthermore, with the proposed algorithm, the parametric dependency of the Koopman mode shapes on the Reynolds number is investigated for the cylinder flows. In this case, as $Re$ increases from 70 to 130, the shape of the stable modes becomes flatter downstream and larger upstream. Moreover, similarity of mode shapes between Koopman modes with similar eigenvalues is observed in both fluid flows. Specifically, five clusters of eigenvalues are observed in the case of the two-dimensional cylinder wake flow, which is consistent with the weakly nonlinear theoretical analysis of Bagheri (Reference Bagheri2013). The type-I and II clusters are found to correspond to fluctuations in lift and drag, respectively. We identify non-oscillatory as well as oscillatory cancellations from the above two clusters with distinct frequencies. These modes can be understood as ‘oscillatory shift modes’, a generalization of the model proposed by Noack et al. (Reference Noack, Afanasiev, Morzyński, Tadmor and Thiele2003). For the 3-D ship-airwake case, two stable modes and one slowly decaying mode with distinct frequencies and mode shapes resulting from vortex shedding are extracted, and accurate predictive performance is observed in the transient regime.

Acknowledgements

We thank the three anonymous reviewers whose comments and suggestions helped improve and clarify this manuscript. This research was supported by the DARPA Physics of AI Program under the grant ‘Physics Inspired Learning and Learning the Order and Structure of Physics’, and by the NSF CMMI program under the grant ‘A Diagnostic Modeling Methodology for Dual Retrospective Cost Adaptive Control of Complex Systems’. Computing resources were provided by the NSF via the grant ‘MRI: Acquisition of ConFlux, A Novel Platform for Data-Driven Computational Physics’.

Declaration of interests

The authors report no conflict of interest.

Appendix A. Proof of the identity in (2.22)

Consider the economy SVD of $\boldsymbol {A}^{'} = \boldsymbol {U} \boldsymbol {S} \boldsymbol {V}^{\textrm {H}}$. Since the column space of $\boldsymbol {A}^{'}$ spans the column space of $\boldsymbol {Z}_r$, there exists an invertible matrix $\boldsymbol {P}$ such that $\boldsymbol {Z}_r \boldsymbol {P} = \boldsymbol {U}$. Hence,

(A1)\begin{equation} \boldsymbol{A}^{'\textrm{H}} \boldsymbol{Z}_r = \boldsymbol{V} \boldsymbol{S} \boldsymbol{U}^{\textrm{H}} \boldsymbol{Z}_r = \boldsymbol{V} \boldsymbol{S} \boldsymbol{P}^{\textrm{H}} \boldsymbol{Z}_r^{\textrm{H}} \boldsymbol{Z}_r = \boldsymbol{V}\boldsymbol{S} \boldsymbol{P}^{\textrm{H}}, \end{equation}

where $\boldsymbol {Z}_r^{\textrm {H}} \boldsymbol {Z}_r = \boldsymbol {I}$ since $\boldsymbol {Z}_r$ has orthonormal columns, and $((\boldsymbol {A}^{'\textrm {H}} \boldsymbol {Z}_r)^{\textrm {H}} \boldsymbol {A}^{'\textrm {H}} \boldsymbol {Z}_r)^{+} = (\boldsymbol {Z}^{\textrm {H}}_r \boldsymbol {A}^{'} \boldsymbol {A}^{'\textrm {H}} \boldsymbol {Z}_r)^{+} = (\boldsymbol {P}^{-1})^{\textrm {H}} \boldsymbol {S}^{-2} \boldsymbol {P}^{-1}$. Thus,

(A2)\begin{equation} \boldsymbol{A}^{'} \boldsymbol{A}^{'\textrm{H}} \boldsymbol{Z}_r (\boldsymbol{Z}^{\textrm{H}}_r \boldsymbol{A}^{'} \boldsymbol{A}^{'\textrm{H}} \boldsymbol{Z}_r)^{+} = (\boldsymbol{U} \boldsymbol{S} \boldsymbol{V}^{\textrm{H}}) (\boldsymbol{V} \boldsymbol{S} \boldsymbol{P}^{\textrm{H}} ) (\boldsymbol{P}^{{-}1})^{\textrm{H}} \boldsymbol{S}^{{-}2} \boldsymbol{P}^{{-}1} = \boldsymbol{U} \boldsymbol{P}^{{-}1} = \boldsymbol{Z}_r. \end{equation}

Appendix B. Computational complexity

The computational cost of the proposed framework can be divided into three parts:

  (i) distributed SVD;

  (ii) KDMD/EDMD algorithms;

  (iii) multi-task feature learning with the parameter sweep (solving (3.9) with different values of $\alpha$).

The computational complexity of each step in the algorithm is summarized in table 4, where $n$ is the dimension of the system state, $M$ is the number of snapshots in the training set (for conciseness, we assume that the number of training snapshots equals the number of validation snapshots), $r$ is the rank of the reduced system after SVD, $\hat {L}$ is the user-defined cut-off for ‘accurate’ features and $N_{iter}$ is the user-defined maximal number of iterations to achieve a residual threshold, e.g. $10^{-12}$.

Table 4. Computational complexity of each step in the proposed sparsity-promoting framework.

As shown in table 4, the theoretical computational complexity of multi-task ElasticNet with an $\alpha$ sweep is $O(N_{\alpha }N_{iter}\hat {L}^2r)$. Note that this is a worst case, since, except for the first $\alpha$, we reuse the previous optimal solution as the initial condition of the new objective function when starting the iterative optimization. Also, thanks to SVD-based dimension reduction, the cost scales linearly with the reduced system rank $r$. Moreover, the user-defined cut-off $\hat {L}$ on the linearly evolving error reduces this complexity further, rather than having the cost scale with the number of snapshots $M$. Lastly, there is a cubic theoretical complexity in the number of snapshots when applying KDMD; the number of snapshots in a typical high-fidelity simulation is $O(10^3)$, that is to say, $r<10^3$ and $\hat {L} < 10^3$. We note that computational efficiency can be further improved, but this is left for future work.

Appendix C. Hyperparameter selection

C.1. Two-dimensional fixed point attractor

We perform a grid search in parallel for the selection of $\sigma$ and the truncation rank $r$ over the range $\sigma \in [10^{-1}, 10]$, with 80 points uniformly distributed in the log sense, and $r=36,50,70$, to find a proper combination of $r$ and $\sigma$. As shown in figure 35, a higher rank $r$ leads to a larger number of linearly evolving eigenfunctions. Thus, from figure 35, choosing a proper scale $\sigma$ is more crucial than choosing $r$. However, considering the simplicity of this problem, $\sigma =2$ and $r=36$ suffice.

Figure 35. Hyperparameter search for isotropic Gaussian KDMD on the 2-D fixed point attractor.

C.2. Cylinder flow case

As the flow is not turbulent, we choose hyperparameters for the $Re=70$ case and fix them for all the other cases. We sweep $\sigma$ over $[1,10^5]$ with 30 points uniformly distributed in the log sense, and $r = 120,140,160,180, 200$, as shown in figure 36. From the plot we choose $r=180$ and $\sigma =3$ as an approximate choice of the hyperparameters. Again, we observe that the number of accurate eigenfunctions first increases and then decreases as $\sigma$ increases, and that the effect of rank truncation saturates at higher $\sigma$, which is related to the variation of the characteristic scale of the features with respect to $\sigma$.

Figure 36. Hyperparameter search for isotropic Gaussian KDMD on transient cylinder wake flows.

C.3. Turbulent ship-airwake case

A grid search in parallel for the selection of $\sigma$ and $r$ is performed over the range $\sigma \in [1,10^5]$, with 50 points uniformly distributed in the log sense, and $r=40,80,120, 130$. As shown in figure 37, a good choice of $\sigma$ is 200 for the case of $\alpha _{\infty }=5^{\circ }$. Note that, since the hyperparameter selection is performed with five-fold cross-validation on the training data, we only have up to $166\times0.8 \approx 132$ data points, i.e. the maximal possible rank is 132, whereas in the actual training the maximal rank is 166. Note that, as long as the system is well conditioned, the higher the rank, the richer the subspace. Here we take $\sigma = 200$ and $r=135$.

Figure 37. Hyperparameter search for isotropic Gaussian KDMD on a transient ship airwake.

Appendix D. Parallel data processing tool

The application of data-driven methodologies to increasingly large data sets has exposed the need for scalable tools. Method development is easily achieved in scripting environments, but for application to large multidimensional problems, memory and data I/O become limiting factors. To illustrate this, consider a state-of-the-art 3-D computation. Given that there are multiple variables associated with each grid cell, and that reasonable meshes can be expected to contain millions to hundreds of millions of cells, work-space memory requirements can reach terabytes. Utilizing available distributed tools and I/O strategies, a generalized toolset, PDP, has been developed for applying complex methods to arbitrarily large data sets. Originally developed for use in reduced-order modelling, the generality of the toolset allows a range of complex methodologies to be applied. This was leveraged in this work, as the problem size is intractable on desktop machines. The toolset leverages commonly available packages in the form of ScaLAPACK, PBLAS and PBLACS (Blackford et al. Reference Blackford1997) routines. ScaLAPACK is widely available on most computing resources where the original simulations are run.

References

Ansys 2016 Ansys Fluent user's guide, release 17.0. Ansys.
Arbabi, H., Korda, M. & Mezic, I. 2018 A data-driven Koopman model predictive control framework for nonlinear flows. arXiv:1804.05291.
Arbabi, H. & Mezić, I. 2017a Ergodic theory, dynamic mode decomposition, and computation of spectral properties of the Koopman operator. SIAM J. Appl. Dyn. Syst. 16 (4), 2096–2126.
Arbabi, H. & Mezić, I. 2017b Study of dynamics in post-transient flows using Koopman mode decomposition. Phys. Rev. Fluids 2 (12), 124402.
Argyriou, A., Evgeniou, T. & Pontil, M. 2008a Convex multi-task feature learning. Mach. Learn. 73 (3), 243–272.
Argyriou, A., Pontil, M., Ying, Y. & Micchelli, C.A. 2008b A spectral regularization framework for multi-task structure learning. In Advances in Neural Information Processing Systems, pp. 25–32.
Aronszajn, N. 1950 Theory of reproducing kernels. Trans. Am. Math. Soc. 68 (3), 337–404.
Askham, T. & Kutz, J.N. 2018 Variable projection methods for an optimized dynamic mode decomposition. SIAM J. Appl. Dyn. Syst. 17 (1), 380–416.
Azencot, O., Yin, W. & Bertozzi, A. 2019 Consistent dynamic mode decomposition. SIAM J. Appl. Dyn. Syst. 18 (3), 1565–1585.
Bach, F., Jenatton, R., Mairal, J. & Obozinski, G. 2012 Optimization with sparsity-inducing penalties. Found. Trends Mach. Learn. 4 (1), 1–106.
Bagheri, S. 2013 Koopman-mode decomposition of the cylinder wake. J. Fluid Mech. 726, 596–623.
Blackford, L.S., et al. 1997 ScaLAPACK Users' Guide. Society for Industrial and Applied Mathematics.
Bruce, A.L., Zeidan, V.M. & Bernstein, D.S. 2019 What is the Koopman operator? A simplified treatment for discrete-time systems. In 2019 American Control Conference (ACC), pp. 1912–1917. IEEE.
Brunton, S.L., Brunton, B.W., Proctor, J.L., Kaiser, E. & Kutz, J.N. 2017 Chaos as an intermittently forced linear system. Nat. Commun. 8 (1), 19.
Brunton, S.L., Brunton, B.W., Proctor, J.L. & Kutz, J.N. 2016a Koopman invariant subspaces and finite linear representations of nonlinear dynamical systems for control. PLoS ONE 11 (2), e0150171.
Brunton, S.L., Proctor, J.L. & Kutz, J.N. 2016b Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl Acad. Sci. 113 (15), 3932–3937.
Budišić, M., Mohr, R. & Mezić, I. 2012 Applied Koopmanism. Chaos 22 (4), 047510.
Carlberg, K., Farhat, C., Cortial, J. & Amsallem, D. 2013 The GNAT method for nonlinear model reduction: effective implementation and application to computational fluid dynamics and turbulent flows. J. Comput. Phys. 242, 623–647.
Chen, K.K., Tu, J.H. & Rowley, C.W. 2012 Variants of dynamic mode decomposition: boundary condition, Koopman, and Fourier analyses. J. Nonlinear Sci. 22 (6), 887–915.
Cho, Y. & Saul, L.K. 2009 Kernel methods for deep learning. In Advances in Neural Information Processing Systems, pp. 342–350. Curran Associates, Inc.
Dalcin, L.D., Paz, R.R., Kler, P.A. & Cosimo, A. 2011 Parallel distributed computing using Python. Adv. Water Resour. 34 (9), 1124–1139.
Dawson, S.T.M., Hemati, M.S., Williams, M.O. & Rowley, C.W. 2016 Characterizing and correcting for the effect of sensor noise in the dynamic mode decomposition. Exp. Fluids 57 (3), 42.
DeGennaro, A.M. & Urban, N.M. 2019 Scalable extended dynamic mode decomposition using random kernel approximation. SIAM J. Sci. Comput. 41 (3), A1482–A1499.
Dongarra, J., Gates, M., Haidar, A., Kurzak, J., Luszczek, P., Tomov, S. & Yamazaki, I. 2018 The singular value decomposition: anatomy of optimizing an algorithm for extreme scale. SIAM Rev. 60 (4), 808–865.
Dowell, E.H. & Hall, K.C. 2001 Modeling of fluid–structure interaction. Annu. Rev. Fluid Mech. 33 (1), 445–490.
Drazin, P.G. & Reid, W.H. 2004 Hydrodynamic Stability. Cambridge University Press.
Fey, U., König, M. & Eckelmann, H. 1998 A new Strouhal–Reynolds-number relationship for the circular cylinder in the range $47< Re< 2\times 10^5$. Phys. Fluids 10 (7), 1547–1549.
Forrest, J.S. & Owen, I. 2010 An investigation of ship airwakes using detached-eddy simulation. Comput. Fluids 39 (4), 656–673.
Haseli, M. & Cortés, J. 2019 Efficient identification of linear evolutions in nonlinear vector fields: Koopman invariant subspaces. arXiv:1909.01419.
Holmes, P., Lumley, J.L., Berkooz, G. & Rowley, C.W. 2012 Turbulence, Coherent Structures, Dynamical Systems and Symmetry. Cambridge University Press.
Hua, J.-C., Noorian, F., Moss, D., Leong, P.H.W. & Gunaratne, G.H. 2017 High-dimensional time series prediction using kernel-based Koopman mode regression. Nonlinear Dyn. 90 (3), 1785–1806.
Huang, C., Duraisamy, K. & Merkle, C. 2018 Challenges in reduced order modeling of reacting flows. In 2018 Joint Propulsion Conference, p. 4675. AIAA.
Jasak, H., Jemcov, A. & Tukovic, Z. 2007 OpenFOAM: a C++ library for complex physics simulations. In International Workshop on Coupled Methods in Numerical Dynamics, vol. 1000, pp. 1–20. IUC Dubrovnik, Croatia.
Jones, E., Oliphant, T. & Peterson, P. 2001 SciPy: Open Source Scientific Tools for Python.
Jovanović, M.R., Schmid, P.J. & Nichols, J.W. 2014 Sparsity-promoting dynamic mode decomposition. Phys. Fluids 26 (2), 024103.
Jungers, R.M. & Tabuada, P. 2019 Non-local linearization of nonlinear differential equations via polyflows. In 2019 American Control Conference (ACC), pp. 1–6. IEEE.
Kaiser, E., Kutz, J.N. & Brunton, S.L. 2017 Data-driven discovery of Koopman eigenfunctions for control. arXiv:1707.01146.
Kamb, M., Kaiser, E., Brunton, S.L. & Kutz, J.N. 2018 Time-delay observables for Koopman: theory and applications. arXiv:1810.01479.
Koopman, B.O. 1931 Hamiltonian systems and transformation in Hilbert space. Proc. Natl Acad. Sci. USA 17 (5), 315–318.
Korda, M. & Mezić, I. 2018a Linear predictors for nonlinear dynamical systems: Koopman operator meets model predictive control. Automatica 93, 149–160.
Korda, M. & Mezić, I. 2018b On convergence of extended dynamic mode decomposition to the Koopman operator. J. Nonlinear Sci. 28 (2), 687–710.
Kou, J. & Zhang, W. 2017 An improved criterion to select dominant modes from dynamic mode decomposition. Eur. J. Mech. (B/Fluids) 62, 109–129.
Kutz, J.N., Brunton, S.L., Brunton, B.W. & Proctor, J.L. 2016 Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems. SIAM.
Le Clainche, S. & Vega, J.M. 2017 Higher order dynamic mode decomposition. SIAM J. Appl. Dyn. Syst. 16 (2), 882–925.
Li, Q., Dietrich, F., Bollt, E.M. & Kevrekidis, I.G. 2017 Extended dynamic mode decomposition with dictionary learning: a data-driven adaptive spectral decomposition of the Koopman operator. Chaos 27 (10), 103111.
Lietz, A.M., Johnsen, E. & Kushner, M.J. 2017 Plasma-induced flow instabilities in atmospheric pressure plasma jets. Appl. Phys. Lett. 111 (11), 114101.
Lusch, B., Kutz, J.N. & Brunton, S.L. 2018 Deep learning for universal linear embeddings of nonlinear dynamics. Nat. Commun. 9 (1), 4950.
Mamakoukas, G., Castano, M.L., Tan, X. & Murphey, T. 2019 Local Koopman operators for data-driven control of robotic systems. In Robotics: Science and Systems.
Mercer, J. 1909 Functions of positive and negative type and their connection with the theory of integral equations. Phil. Trans. R. Soc. Lond. A 209, 415–446.
Mezić, I. 2005 Spectral properties of dynamical systems, model reduction and decompositions. Nonlinear Dyn. 41 (1–3), 309–325.
Mezić, I. 2013 Analysis of fluid flows via spectral properties of the Koopman operator. Annu. Rev. Fluid Mech. 45, 357–378.
Minh, H.Q. 2010 Some properties of Gaussian reproducing kernel Hilbert spaces and their implications for function approximation and learning theory. Construct. Approx. 32 (2), 307–338.
Noack, B.R., Afanasiev, K., Morzyński, M., Tadmor, G. & Thiele, F. 2003 A hierarchy of low-dimensional models for the transient and post-transient cylinder wake. J. Fluid Mech. 497, 335–363.
Otto, S.E. & Rowley, C.W. 2019 Linearly recurrent autoencoder networks for learning dynamics. SIAM J. Appl. Dyn. Syst. 18 (1), 558–593.
Pan, S. & Duraisamy, K. 2018 Long-time predictive modeling of nonlinear dynamical systems using neural networks. Complexity 2018, 4801012.
Pan, S. & Duraisamy, K. 2019 On the structure of time-delay embedding in linear models of non-linear dynamical systems. arXiv:1902.05198.
Pan, S. & Duraisamy, K. 2020 Physics-informed probabilistic learning of linear embeddings of nonlinear dynamics with guaranteed stability. SIAM J. Appl. Dyn. Syst. 19 (1), 480–509.
Parikh, N. & Boyd, S. 2014 Proximal algorithms. Found. Trends Optim. 1 (3), 127–239.
Parish, E.J., Wentland, C.R. & Duraisamy, K. 2020 The adjoint Petrov–Galerkin method for non-linear model reduction. Comput. Meth. Appl. Mech. Engng 365, 112991.
Parry, W. 2004 Topics in Ergodic Theory, vol. 75. Cambridge University Press.
Pedregosa, F., et al. 2011 Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830.
Pope, S.B. 2001 Turbulent Flows. Cambridge University Press.
Rasmussen, C.E. 2003 Gaussian processes in machine learning. In Summer School on Machine Learning, pp. 63–71. Springer.
Röjsel, J. 2017 Koopman mode analysis of the side-by-side cylinder wake. Tech. Rep. Royal Institute of Technology, Department of Mechanics.
Rowley, C.W. & Dawson, S.T.M. 2017 Model reduction for flow analysis and control. Annu. Rev. Fluid Mech. 49, 387–417.
Rowley, C.W., Mezić, I., Bagheri, S., Schlatter, P. & Henningson, D.S. 2009 Spectral analysis of nonlinear flows. J. Fluid Mech. 641, 115–127.
Schmid, P.J. 2007 Nonmodal stability theory. Annu. Rev. Fluid Mech. 39, 129–162.
Schmid, P.J. 2010 Dynamic mode decomposition of numerical and experimental data. J. Fluid Mech. 656, 5–28.
Schmid, P.J., Violato, D. & Scarano, F. 2012 Decomposition of time-resolved tomographic PIV. Exp. Fluids 52 (6), 1567–1579.
Sharma, A., Xu, J., Padthe, A.K., Friedmann, P.P. & Duraisamy, K. 2019 Simulation of maritime helicopter dynamics during approach to landing with time-accurate wind-over-deck. In AIAA Scitech 2019 Forum, p. 0861. AIAA.
Taira, K., Hemati, M.S., Brunton, S.L., Sun, Y., Duraisamy, K., Bagheri, S., Dawson, S.T.M. & Yeh, C.-A. 2019 Modal analysis of fluid flows: applications and outlook. AIAA J. 58 (3), 998–1022.
Takeishi, N., Kawahara, Y. & Yairi, T. 2017 Learning Koopman invariant subspaces for dynamic mode decomposition. In Advances in Neural Information Processing Systems, pp. 1130–1140.
Tissot, G., Cordier, L., Benard, N. & Noack, B.R. 2014 Model reduction using dynamic mode decomposition. C. R. Méc. 342 (6–7), 410–416.
Tu, J.H., Rowley, C.W., Luchtenburg, D.M., Brunton, S.L. & Kutz, J.N. 2013 On dynamic mode decomposition: theory and applications. arXiv:1312.0041.
Wang, L., Wang, J., Xiang, J. & Yue, H. 2019 A re-weighted smoothed $l_0$-norm regularized sparse reconstructed algorithm for linear inverse problems. J. Phys. Commun. 3 (7), 075004.
White, F.M. & Corfield, I. 2006 Viscous Fluid Flow, vol. 3. McGraw-Hill.
Williams, M.O., Kevrekidis, I.G. & Rowley, C.W. 2015 A data-driven approximation of the Koopman operator: extending dynamic mode decomposition. J. Nonlinear Sci. 25 (6), 1307–1346.
Williams, M.O., Rowley, C.W. & Kevrekidis, I.G. 2014 A kernel-based approach to data-driven Koopman spectral analysis. arXiv:1411.2260.
Xu, J. & Duraisamy, K. 2019 Multi-level convolutional autoencoder networks for parametric prediction of spatio-temporal dynamics. arXiv:1912.11114.
Xu, J., Huang, C. & Duraisamy, K. 2020 Reduced-order modeling framework for combustor instabilities using truncated domain training. AIAA J. 58 (2), 618–632.
Yeung, E., Kundu, S. & Hodas, N. 2019 Learning deep neural network representations for Koopman operators of nonlinear dynamical systems. In 2019 American Control Conference (ACC), pp. 4832–4839. IEEE.
Yuan, W., Wall, A. & Lee, R. 2018 Combined numerical and experimental simulations of unsteady ship airwakes. Comput. Fluids 172, 29–53.
Zhang, H., Dawson, S., Rowley, C.W., Deem, E.A. & Cattafesta, L.N. 2017 Evaluating the accuracy of the dynamic mode decomposition. arXiv:1710.00745.
Zhang, L. & Schaeffer, H. 2019 On the convergence of the SINDy algorithm. Multiscale Model. Simul. 17 (3), 948–972.
Zhao, L., Sun, Q., Ye, J., Chen, F., Lu, C.-T. & Ramakrishnan, N. 2015 Multi-task learning for spatio-temporal event forecasting. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1503–1512. ACM.
Zou, H. & Hastie, T. 2005 Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 67 (2), 301–320.