Jekyll2021-04-28T19:06:55+00:00https://avt.im/feed.xmlAlexander TereninAlexander TereninAligning Time Series on Incomparable Spaces2021-03-21T00:00:00+00:002021-03-21T00:00:00+00:00https://avt.im/talks/2021/03/21/Aligning-Time-Series-Poster<p>Dynamic time warping (DTW) is a useful method for aligning, comparing and combining time series, but it requires them to live in comparable spaces.
In this work, we consider a setting in which time series live on different spaces without a sensible ground metric, causing DTW to become ill-defined.
To alleviate this, we propose Gromov dynamic time warping (GDTW), a distance between time series on potentially incomparable spaces that avoids the comparability requirement by instead considering intra-relational geometry.
We demonstrate its effectiveness at aligning, combining and comparing time series living on incomparable spaces.
We further propose a smoothed version of GDTW as a differentiable loss and assess its properties in a variety of settings, including barycentric averaging, generative modeling and imitation learning.</p>Alexander TereninLearning Contact Dynamics using Physically Structured Neural Networks2021-03-21T00:00:00+00:002021-03-21T00:00:00+00:00https://avt.im/talks/2021/03/21/Contacy-Dynamics-Poster<p>Learning physically structured representations of dynamical systems that include contact between different objects is an important problem for learning-based approaches in robotics. Black-box neural networks can learn to approximately represent discontinuous dynamics, but they typically require large quantities of data and often suffer from pathological behaviour when forecasting for longer time horizons. In this work, we use connections between deep neural networks and differential equations to design a family of deep network architectures for representing contact dynamics between objects.
We show that these networks can learn discontinuous contact events in a data-efficient manner from noisy observations in settings that are traditionally difficult for black-box approaches and recent physics inspired neural networks. Our results indicate that an idealised form of touch feedback—which is heavily relied upon by biological systems—is a key component of making this learning problem tractable. Together with the inductive biases introduced through the network architectures, our techniques enable accurate learning of contact dynamics from observations.</p>Alexander TereninMatérn Gaussian Processes on Graphs2021-03-21T00:00:00+00:002021-03-21T00:00:00+00:00https://avt.im/talks/2021/03/21/Graph-Matern-GP-Poster<p>Gaussian processes are a versatile framework for learning unknown functions in a manner that permits one to utilize prior information about their properties. Although many different Gaussian process models are readily available when the input space is Euclidean, the choice is much more limited for Gaussian processes whose input space is an undirected graph. In this work, we leverage the stochastic partial differential equation characterization of Matérn Gaussian processes—a widely-used model class in the Euclidean setting—to study their analog for undirected graphs.
We show that the resulting Gaussian processes inherit various attractive properties of their Euclidean and Riemannian analogs and provide techniques that allow them to be trained using standard methods, such as inducing points. This enables graph Matérn Gaussian processes to be employed in mini-batch and non-conjugate settings, thereby making them more accessible to practitioners and easier to deploy within larger learning frameworks.</p>Alexander TereninMatérn Gaussian Processes on Graphs2021-03-21T00:00:00+00:002021-03-21T00:00:00+00:00https://avt.im/talks/2021/03/21/Graph-Matern-GP<p>Gaussian processes are a versatile framework for learning unknown functions in a manner that permits one to utilize prior information about their properties. Although many different Gaussian process models are readily available when the input space is Euclidean, the choice is much more limited for Gaussian processes whose input space is an undirected graph. In this work, we leverage the stochastic partial differential equation characterization of Matérn Gaussian processes—a widely-used model class in the Euclidean setting—to study their analog for undirected graphs. We show that the resulting Gaussian processes inherit various attractive properties of their Euclidean and Riemannian analogs and provide techniques that allow them to be trained using standard methods, such as inducing points. This enables graph Matérn Gaussian processes to be employed in mini-batch and non-conjugate settings, thereby making them more accessible to practitioners and easier to deploy within larger learning frameworks.</p>Alexander TereninA Brief Tutorial on Multi-armed Bandits2021-03-05T00:00:00+00:002021-03-05T00:00:00+00:00https://avt.im/talks/2021/03/05/Bandits-Tutorial<p>Multi-armed bandits are a class of sequential decision problems which include uncertainty. One of their defining characteristics is the presence of explore-exploit tradeoffs, which require one to balance taking advantage of information that is known with trying different options in order to learn more information in order to make optimal decisions. In this tutorial, we introduce the problem setting and basic techniques of analysis. We conclude by discussing how explore-exploit tradeoffs appear in more general settings, and how the ideas presented can aid in understanding of areas like reinforcement learning.</p>Alexander TereninA Brief Tutorial on Multi-armed Bandits2021-03-04T00:00:00+00:002021-03-04T00:00:00+00:00https://avt.im/talks/2021/03/04/Bandits-Tutorial<p>Multi-armed bandits are a class of sequential decision problems which include uncertainty. One of their defining characteristics is the presence of explore-exploit tradeoffs, which require one to balance taking advantage of information that is known with trying different options in order to learn more information in order to make optimal decisions. In this tutorial, we introduce the problem setting and basic techniques of analysis. We conclude by discussing how explore-exploit tradeoffs appear in more general settings, and how the ideas presented can aid in understanding of areas like reinforcement learning.</p>Alexander TereninPathwise, spectral, and geometric perspectives on Gaussian processes2021-02-05T00:00:00+00:002021-02-05T00:00:00+00:00https://avt.im/talks/2021/02/05/GP-Perspectives<p>Gaussian processes are usually studied via their finite-dimensional marginal distributions, but this is not the only way to think about them. In this talk, I discuss a little-known result relating Gaussian process priors to posteriors in a path-wise rather than distributional manner, and show how it can be leveraged for efficient posterior sampling. I then present a discussion on different ways of specifying Gaussian process priors, focusing on non-Euclidean settings via techniques based on stochastic partial differential equations and their discrete analogs, which are of particular interest for applications in physical sciences and engineering.</p>Alexander TereninLearning Contact Dynamics using Physically Structured Neural Networks2021-01-22T00:00:00+00:002021-01-22T00:00:00+00:00https://avt.im/publications/2021/01/22/Contact-Dynamics<p>Learning models of physical systems can sometimes be difficult.
Vanilla neural networks—like residual networks—particularly struggle to learn invariant properties such as the conservation of energy, which is fundamental to physical systems.
To counteract this, a number of recent works such as <em>Hamiltonian Neural Networks</em> and <em>Variational Integrator Networks</em> introduce <em>inductive biases</em>, also referred to as <em>physics priors</em>, which improve reliability of predictions and speed up learning.
These network classes exhibit good approximation behavior for continuous physical systems, but they are fundamentally limited to smooth dynamics and are not designed to handle non-smooth physical behavior, such as resolving collision events between different objects.
Such behavior is of key interest in robotics and other areas of engineering.
In this work, we explore neural network architectures designed for accurately modeling contact dynamics, which incorporate the structure necessary to reliably resolve non-smooth collision events.</p>
<h1 id="contact-dynamics">Contact Dynamics</h1>
<p><em>Contact dynamics</em> are a class of equations which describe the motion of physical systems consisting of multiple solid objects which interact with each other and their environment.
One of these equations’ defining features is that when two objects collide, their velocities change direction <em>instantaneously</em> in a non-smooth manner—this describes, for instance, how a bouncing ball immediately changes direction upon hitting the ground, resulting from a transfer of momentum and other physical considerations.</p>
<div class="row justify-content-center my-5">
<div class="col-12 col-md-10 col-lg-8">
<video class="embed-responsive border rounded dark-invert" controls="">
<source src="/assets/publications/2021-01-22-Contact-Dynamics/bb_resnet_phase.mp4" type="video/mp4" />
</video>
</div>
<div class="col-12 mt-3">
<p><strong>Illustration:</strong> here, we see a residual network, trained to predict the trajectory of a bouncing ball, struggle to resolve contact events. Since contacts trigger a discontinuous jump in velocity space, they are difficult to model in a black-box fashion, causing the residual network to incorrectly approximate them through spurious smooth dynamics. We explore the use of physical inductive biases to alleviate these issues.
</p>
</div>
</div>
<p>Because of the resulting non-smoothness and non-linearity, contact dynamics are notoriously difficult to compute.
For example, a numerical scheme must decide whether to enforce non-interpenetration constraints by precisely calculating contact times using an optimization procedure, or instead allow physically incorrect interpenetration.
Below, we illustrate a sample numerical trajectory of a bouncing ball.</p>
<div class="row justify-content-center align-items-end my-5">
<div class="col-12 col-md-6 col-lg-5 text-center">
<img class="img-fluid dark-invert" alt="Contact time" src="/assets/publications/2021-01-22-Contact-Dynamics/contact-time.svg" />
<p class="text-center"><strong>(a)</strong> Find the contact time \(t_c\).</p>
</div>
<div class="col-12 col-md-6 col-lg-5 text-center">
<img class="img-fluid dark-invert" alt="Contact state" src="/assets/publications/2021-01-22-Contact-Dynamics/contact-state.svg" />
<p class="text-center"><strong>(b)</strong> Calculate true trajectory.</p>
</div>
<div class="col-12 mt-3">
<p class="mt-3">
<strong>Example:</strong> integration scheme for a bouncing ball that enforces constraints exactly. Initially, the ball is time-stepped until a contact with the floor is detected through interpenetration at time \(t_1\). Then, the trajectory is (a) linearly interpolated to find the <em>contact time</em> \(t_c\) where contact occurs between the ball and floor. Finally, the contact state at time \(t_c\) is calculated, a transfer of momentum between \(t_c^-\) and \(t_c^+\) is performed, and the ball is time-stepped as usual to time \(t_1\).
</p>
</div>
</div>
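<p>The event handling in the example above can be sketched in a few lines of code. The following is our own minimal illustration for a one-dimensional ball under gravity with a floor at \(q = 0\)—not the integrator used in the paper; the function name and the perfectly elastic restitution coefficient are illustrative:</p>

```python
def step_bouncing_ball(q, v, dt, g=9.81, e=1.0):
    """One explicit time step for a 1D bouncing ball with exact contact handling.

    q, v : height and velocity at the start of the step
    e    : coefficient of restitution (e = 1 is a perfectly elastic bounce)
    """
    q_new = q + dt * v - 0.5 * g * dt**2
    v_new = v - g * dt
    if q_new >= 0.0:              # no interpenetration: smooth dynamics apply
        return q_new, v_new
    # (a) interpenetration detected: linearly interpolate to the contact time t_c
    alpha = q / (q - q_new)       # fraction of the step elapsed before contact
    t_c = alpha * dt
    v_c = v - g * t_c             # velocity just before impact, v(t_c^-)
    # transfer of momentum: Newton's restitution law flips the normal velocity
    v_plus = -e * v_c             # velocity just after impact, v(t_c^+)
    # (b) time-step the remainder of the interval from the contact state as usual
    t_rest = dt - t_c
    q_new = t_rest * v_plus - 0.5 * g * t_rest**2
    v_new = v_plus - g * t_rest
    return q_new, v_new

q, v = step_bouncing_ball(1.0, 0.0, 0.1)  # a contact-free (smooth) step
```

<p>Note that the contact time is found by linear interpolation of the position, mirroring step (a) of the figure, even though the free-flight trajectory is quadratic.</p>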
<p>These difficulties are further compounded when the equations of motion of the system under study are unknown, which occurs in robotics when learning to interact with unknown objects.
The aim of this work is to explore neural network architectures which are capable of accurately modeling such systems.</p>
<h1 id="central-difference-lagrange-networks">Central Difference Lagrange Networks</h1>
<p>We begin with the perspective of <em>neural ordinary differential equations</em>, which view deep networks as discretizations of continuous-time dynamical systems.
Our system’s state is defined as position and velocity pairs \((\mathbf{Q}, \mathbf{\dot Q})\). Between contact events, the trajectories follow the <em>Euler-Lagrange equations</em></p>
\[\frac{\partial L}{\partial \mathbf{Q}} - \frac{\mathrm{d}}{\mathrm{d}t} \frac{\partial L}{\partial \mathbf{\dot Q}} = 0,\]
<p>where \(L=T-V\) is the Lagrangian with the potential \(V\) and kinetic energy \(T\).
At contact times, the above equations do not hold, and a set of instantaneous <em>transfer of momentum</em> equations apply instead.</p>
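<p>Concretely, for a single degree of freedom with kinetic energy \(T = \frac{1}{2} m \mathbf{\dot Q}^2\), the Euler-Lagrange equations above reduce to Newton’s second law, \(m \mathbf{\ddot Q} = -\partial V / \partial \mathbf{Q}\). A minimal numerical sketch of this reduction (our own illustration, with finite differences standing in for the automatic differentiation a neural potential would use):</p>

```python
def acceleration(V, q, m=1.0, h=1e-6):
    """For L = T - V with T = 0.5 * m * qdot**2, the Euler-Lagrange equation
    reduces to m * qddot = -dV/dq.  The gradient of the potential is taken by
    central finite differences here; a learned potential network would use
    automatic differentiation instead."""
    dV_dq = (V(q + h) - V(q - h)) / (2.0 * h)
    return -dV_dq / m

# sanity check: a spring potential V(q) = 0.5 * k * q**2 gives qddot = -(k/m) * q
k = 4.0
a = acceleration(lambda q: 0.5 * k * q**2, q=0.5)  # expect -2.0
```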
<p>To model these dynamics, we adopt an approach similar in spirit to <em>Variational Integrator Networks</em><sup id="fnref:vins" role="doc-noteref"><a href="#fn:vins" class="footnote" rel="footnote">1</a></sup> by modeling \(V\) using a fully connected neural network and discretizing the resulting equations of motion to construct a recurrent network architecture.
The choice of differential equation class and discretization scheme determines the inductive bias that is introduced.
These biases can include physical properties such as conservation of momentum and of energy, as well as other fundamental mechanical characteristics.</p>
<p>To design a scheme for contact dynamics, we employ the Central Difference–Lagrange (CDL) scheme,<sup id="fnref:cdl" role="doc-noteref"><a href="#fn:cdl" class="footnote" rel="footnote">2</a></sup> whose equations form the basis of our network architecture.
Between contact times, the dynamics evolve smoothly, which we denote by \((\cdot)^S\), in a manner that mirrors variational integrator networks.
During contact events, these equations are augmented by a contact term, denoted by \((\cdot)^{C}\), that handles the transfer of momentum and ensures that both (1) Newton’s restitution law and (2) the law of conservation of momentum hold.
Below, we illustrate how the different states in the CD-Lagrange network are calculated.
<div class="row justify-content-center my-5">
<div class="col-12 col-lg-10 col-xl-9 text-center">
<img class="img-fluid dark-invert" alt="Contact time" src="/assets/publications/2021-01-22-Contact-Dynamics/contact-network.svg" />
</div>
<div class="col-12">
<p class="mt-3">
<strong>Illustration:</strong> a CD-Lagrange network.
Here, we begin from initial states \((\mathbf{Q}_0,\mathbf{\dot Q}_{\frac{1}{2}})\).
We calculate the next position \(\mathbf{Q}_1\), and proceed to calculate the next velocity \(\mathbf{\dot Q}_{1 + \frac{1}{2}} = \mathbf{\dot Q}_{1 + \frac{1}{2}}^S + \mathbf{\dot Q}_{1 + \frac{1}{2}}^C\) as a sum of smooth and contact terms.
These terms are in turn calculated using the conservative forces \(\mathbf{F}\) and the impulse \(\mathbf{I}\), which are calculated from the parameterized Lagrangian, whose potential energy is given by a fully connected network.
</p>
</div>
</div>
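<p>One step of this update can be sketched as follows—a schematic in one dimension, with the learned potential network replaced by a known force; the function names and the simple impulse model are our own, not the authors’ implementation:</p>

```python
def cdl_step(q, v_half, dt, force, impulse, m=1.0):
    """One schematic CD-Lagrange update in one dimension.

    q       : position Q_k
    v_half  : staggered velocity Qdot_{k+1/2}
    force   : conservative force F(q) from the (learned) potential
    impulse : contact impulse I(q, v); zero when no contact occurs
    """
    q_next = q + dt * v_half                     # position update
    v_smooth = v_half + dt * force(q_next) / m   # smooth term (.)^S
    v_contact = impulse(q_next, v_smooth) / m    # contact term (.)^C
    return q_next, v_smooth + v_contact          # next staggered velocity

# bouncing ball with unit mass: constant gravity, perfectly elastic floor at q = 0
g = 9.81
force = lambda q: -g

def impulse(q, v):
    # restitution e = 1: on contact, reverse the incoming normal momentum
    return -2.0 * v if (q <= 0.0 and v < 0.0) else 0.0

q, v = cdl_step(1.0, 0.0, 1e-2, force, impulse)  # a contact-free step
```

<p>In the network, <code>force</code> and <code>impulse</code> would instead be derived from the parameterized Lagrangian, whose potential energy is given by a fully connected network.</p>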
<h1 id="touch-feedback">Touch Feedback</h1>
<p>For an unknown system, one must determine when its trajectory evolves according to smooth dynamics, and when it evolves according to contact events.
Our results suggest that external touch feedback—such as that available from an idealized touch sensor—is necessary to make the problem tractable.
This is handled by introducing a <em>contact network</em> \(\hat{c}\) which learns to predict contact events, using training data obtained via said touch sensor.
Without this, the network struggles to differentiate noise from real contact events, as small contact events and noise both generate similar data.
We illustrate this behavior on the following examples.</p>
<ul>
<li>
<p><strong>Bouncing ball</strong>. Here, we see that without touch feedback, the CD–Lagrange network struggles to learn the contact events properly and instead approximates the discontinuous behavior using the potential network.
With training data that includes touch feedback, we see that performance, shown below, is significantly better.</p>
<div class="row justify-content-center my-5">
<div class="col-12 col-md-10 col-lg-8">
<video class="embed-responsive border rounded dark-invert" autoplay="" loop="">
<source src="/assets/publications/2021-01-22-Contact-Dynamics/bb_cdln_phase.mp4" type="video/mp4" />
</video>
<p class="mt-3 text-center"> <strong>(a)</strong> Bouncing ball: CD–Lagrange.</p>
</div>
</div>
</li>
<li>
<p><strong>Newton’s cradle</strong>. As an additional baseline, we employ a vanilla residual network (ResNet) as well as a residual network with additional contact inputs (ResNet contact).
The CD-Lagrange network, shown below, exhibits the best approximation behavior and learns both the potential and contact events more accurately than the residual networks.</p>
<div class="row justify-content-center">
<div class="col-7 col-md-5 col-lg-4">
<video class="embed-responsive border rounded dark-invert" autoplay="" loop="">
<source src="/assets/publications/2021-01-22-Contact-Dynamics/nc_cdln.mp4" type="video/mp4" />
</video>
<p class="mt-3 text-center"> <strong>(a)</strong> Newton's cradle: CD–Lagrange.</p>
</div>
<div class="col-7 col-md-5 col-lg-4">
<video class="embed-responsive border rounded dark-invert" autoplay="" loop="">
<source src="/assets/publications/2021-01-22-Contact-Dynamics/nc_resnet.mp4" type="video/mp4" />
</video>
<p class="mt-3 text-center"> <strong>(b)</strong> Newton's cradle: ResNet.</p>
</div>
</div>
</li>
</ul>
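<p>As a sketch of the idea, the contact network \(\hat{c}\) can be as small as a one-hidden-layer classifier mapping the state to a contact probability. The sizes and random parameters below are illustrative and untrained—a real model would be fit to the binary touch-sensor labels:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def contact_net(params, state):
    """A toy stand-in for the contact network: a one-hidden-layer MLP
    mapping the state to a contact probability (training loop not shown)."""
    W1, b1, W2, b2 = params
    hidden = np.tanh(state @ W1 + b1)
    logit = hidden @ W2 + b2
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid output in (0, 1)

# illustrative sizes: a 2D state (position, velocity), 16 hidden units
params = (0.1 * rng.normal(size=(2, 16)), np.zeros(16),
          0.1 * rng.normal(size=(16, 1)), np.zeros(1))
p = contact_net(params, np.array([0.0, -1.0])).item()  # a probability in (0, 1)
```

<p>During a forecast, this prediction decides whether the contact term of the CD-Lagrange update is active, so that noise in the observations is not mistaken for a collision.</p>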
<h1 id="summary">Summary</h1>
<p>State-of-the-art physics-inspired neural networks generally struggle to learn contact dynamics.
Central-Difference-Lagrange networks are a class of networks that not only exhibit strong conservation properties, comparable to other physically structured neural networks, but also allow accurate learning of contact dynamics from observed data.
In this regime, the information available to the network when making predictions has a significant effect on performance: the addition of touch feedback sensor data ensures that noise and contact events are correctly differentiated.
We hope these contributions enable neural network models to be used in wider settings.</p>
<h1 id="references">References</h1>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:vins" role="doc-endnote">
<p>S. Sæmundsson, A. Terenin, K. Hofmann, M. P. Deisenroth. Variational Integrator Networks for Physically Structured Embeddings. AISTATS, 2020. <a href="#fnref:vins" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:cdl" role="doc-endnote">
<p>F.-E. Fekak, M. Brun, A. Gravouil, and B. Depale. A new heterogeneous asynchronous explicit–implicit time integrator for nonsmooth dynamics. Computational Mechanics, 60(1):1–21, 2017. <a href="#fnref:cdl" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Andreas Hochlehnert, Alexander Terenin, Steindór Sæmundsson, Marc Peter DeisenrothMatérn Gaussian Processes on Riemannian Manifolds2020-12-07T00:00:00+00:002020-12-07T00:00:00+00:00https://avt.im/talks/2020/12/07/Riemannian-Matern-GP-Poster<p>Gaussian processes are an effective model class for learning unknown functions, particularly in settings where accurately representing predictive uncertainty is of key importance.
Motivated by applications in the physical sciences, the widely-used Matérn class of Gaussian processes has recently been generalized to model functions whose domains are Riemannian manifolds, by re-expressing said processes as solutions of stochastic partial differential equations.
In this work, we propose techniques for computing the kernels of these processes on compact Riemannian manifolds via spectral theory of the Laplace–Beltrami operator, allowing them to be trained via standard scalable techniques such as inducing points.
This enables Riemannian Matérn GPs to be used in mini-batch, online, and non-conjugate settings, and makes them more accessible to machine learning practitioners.</p>Alexander TereninSparse Parallel Training of Hierarchical Dirichlet Process Topic Models2020-11-16T00:00:00+00:002020-11-16T00:00:00+00:00https://avt.im/talks/2020/11/16/HDP<p>To scale non-parametric extensions of probabilistic topic models such as Latent Dirichlet allocation to larger data sets, practitioners rely increasingly on parallel and distributed systems. In this work, we study data-parallel training for the hierarchical Dirichlet process (HDP) topic model. Based upon a representation of certain conditional distributions within an HDP, we propose a doubly sparse data-parallel sampler for the HDP topic model. This sampler utilizes all available sources of sparsity found in natural language—an important way to make computation efficient.
We benchmark our method on a well-known corpus (PubMed) with 8m documents and 768m tokens, using a single multi-core machine in under four days.</p>Alexander Terenin