Aligning Time Series on Incomparable Spaces
Samuel Cohen, Giulia Luise, Alexander Terenin, Brandon Amos, Marc Peter Deisenroth
AISTATS 2021
Data is often gathered sequentially in the form of a time series, which consists of sequences of data points observed at successive time points. Dynamic time warping (DTW) defines a meaningful similarity measure between two time series. Often times, the pairs of time series we are interested in are defined on different spaces: for instance, one might want to align a video with a corresponding audio wave, potentially sampled at different frequencies. In this work, we propose Gromov Dynamic Time Warping (GDTW), a distance between time series on potentially incomparable spaces, and apply it to various settings, including barycentric averaging, generative modeling, and imitation learning.
Time Series Alignment
Sakoe and Chiba^{1} consider the problem of aligning two time series \(\boldsymbol{x} \in \mathcal{X}^{T_x}\) and \(\boldsymbol{y} \in \mathcal{Y}^{T_y}\), where potentially \(T_x \neq T_y\). This is formalized as
\[\operatorname{DTW}(\boldsymbol{x},\boldsymbol{y}) = \min_{\mathbf{A} \in \mathcal{A}(T_x,T_y)} \langle\mathbf{D}, \mathbf{A}\rangle_{\operatorname{F}},\]where \(D_{ij} = d_\mathcal{X}(x_i,y_j)\) is the pairwise distance matrix and \(A_{ij}\) is \(1\) if \(x_i\) and \(y_j\) are aligned, and \(0\) otherwise. A DTW alignment of two time series is shown in Figure 1.
Figure 1: Alignment of two time series via Dynamic Time Warping.
A practical drawback of DTW is the need for both time series \(\boldsymbol{x}\) and \(\boldsymbol{y}\) to live on the same spaces, with a metric \(d_{\mathcal{X}}\). This can cause the following issues.
 Alignment can fail for time series that are only close up to isometries, such as rotations and translations.
 The method doesn’t apply to time series which are defined on different spaces, such as location coordinates for \(\boldsymbol{x}\) and pixel values for \(\boldsymbol{y}\).
In such cases, defining a meaningful distance between samples from the two sequences is impractical as it would require detailed understanding of the objects we wish to study.
Dealing with Incomparable Spaces
Motivated by connections between DTW and optimal transport,^{2} we introduce a distance between time series \(\boldsymbol{x} \in \mathcal{X}^{T_x}\) and \(\boldsymbol{y} \in \mathcal{Y}^{T_y}\) defined on potentially incomparable metric spaces. The key idea is to define a loss function which compares pairwise distances in \(\mathcal{X}^{T_x}\) with those in \(\mathcal{Y}^{T_y}\). For this, we define the Gromov dynamic time warping distance as
\[\operatorname{GDTW}(\boldsymbol{x},\boldsymbol{y})=\min_{\mathbf{A} \in \mathcal{A}(T_x,T_y)} \sum_{ijkl} \mathcal{L} \big(d_{\mathcal{X}}(x_i,x_k),d_{\mathcal{Y}}(y_j,y_l)\big) A_{ij}A_{kl},\]where \(d_{\mathcal{X}}\) is a distance defined on \(\mathcal{X}\), and \(d_{\mathcal{Y}}\) a distance defined on \(\mathcal{Y}\). We solve the optimization problem over the set of alignment matrices by applying a Frank–Wolfeinspired algorithm.
Similarly to DTW, GDTW suffers from unpredictability when the time series is close to a change point of the optimal alignment matrix because of the discontinuity of derivatives. To remedy this, we introduce a softened version of this expression, mirroring the definition of soft DTW.^{3} This allows smoother derivatives when applying it to for instance generative modeling of time series and imitation learning.
Applications
We showcase a number of applications of Gromov DTW. We considered 3 settings: barycentric averaging, generative modeling, and imitation learning.

Barycentric averaging: we extend the algorithm of Peyré et al.^{2} to the sequential setting via coordinate descent on the GDTW objective. We plot barycenters under several warping approaches in Figure 2.
Figure 2: Barycenters of the Quickdraw dataset’s fishes via various time warping approaches.

Generative modeling: we extend the algorithm of Genevay et al.^{4} and Bunne et al.^{5} to the sequential setting by leveraging GDTW as ground metric in an entropic Wasserstein objective. Samples can be observed in Figure 3.
Figure 3: Samples generated by the time series GAN trained on sequential MNIST.

Imitation learning: We propose an approach to performing imitation learning when the agent and expert do not live on comparable spaces, which consists in minimizing the GromovDTW between expert demonstrations and agent rollouts. This is illustrated in Figure 4.
Figure 4: Expert trajectory (left, sequence of pixel images) and policy of an agent (right, sequence of points in \(\mathbb{R}^2\)) learned by imitation learning.
Summary
We introduce a distance between time series living on potentially incomparable spaces, Gromov DTW, which significantly broadens the range of applications of previous timeseries metrics like DTW. We also propose a smoothed version that alleviates the discontinuity of GDTW’s gradient. Gromov DTW is a general concept for comparing two time series and can be applied to a number of applications, including barycentric averaging, generative modeling and imitation learning.

H. Sakoe, S. Chiba. Dynamic Programming Algorithm Optimization for Spoken Word Recognition. ICASSP, 2018. ↩

G. Peyré, M. Cuturi, J. Solomon. Gromov–Wasserstein Averaging of Kernel and Distance Matrices. ICML, 2016. ↩ ↩^{2}

M. Cuturi, M. Blondel. SoftDTW: A Differentiable Loss Function for TimeSeries. ICML, 2017. ↩

A. Genevay, G. Peyré, M. Cuturi. Learning Generative Models with Sinkhorn Divergences. AISTATS, 2018. ↩

C. Bunne, D. AlvarezMelis, A. Krause, S. Jegelka. Learning Generative Models Across Incomparable Spaces. ICML, 2019. ↩