2023-09-22T20:25:22+00:00
https://avt.im/feed.xml
Alexander Terenin
Sampling from Gaussian Process Posteriors using Stochastic Gradient Descent
2023-06-30T00:00:00+00:00
https://avt.im/talks/2023/06/30/Stochastic-Gradient-Descent-GP
<p>The ability to deploy Gaussian-process-based decision-making systems such as Bayesian optimization at scale has traditionally been limited by computational costs arising from the need to solve large linear systems. The de facto standard for solving linear systems at scale is the conjugate gradient algorithm; stochastic gradient descent, by contrast, is known to converge near-arbitrarily slowly on the quadratic objectives that correspond to Gaussian process models’ linear systems. In spite of this, we show that stochastic gradient descent produces solutions which have low test error, and quantify uncertainty in a manner that mirrors the true posterior. We develop a spectral characterization of the error caused by finite-time non-convergence, which we prove is small both near the data and sufficiently far from the data. Stochastic gradient descent therefore only differs from the true posterior between these regions, demonstrating a form of implicit bias caused by benign non-convergence. We conclude by showing, empirically, that stochastic gradient descent achieves state-of-the-art performance on sufficiently large-scale regression tasks, and produces uncertainty estimates which match the performance of significantly more expensive baselines on large-scale Bayesian optimization.</p>
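<p>To make the "quadratic objective" in the abstract concrete, the following is a minimal sketch, not the method presented in the talk. It assumes a squared-exponential kernel, a toy 1-D dataset, and a simple coordinate-subsampled gradient estimator; the names <code>rbf_kernel</code> and <code>sgd_solve</code> and all default parameters are hypothetical. It approximately solves (K + σ²I)v = y by running stochastic gradient descent on ½ vᵀ(K + σ²I)v − vᵀy.</p>

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.5):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def sgd_solve(a_mat, y, step_size=0.1, batch_size=2, num_steps=2000, seed=0):
    """Approximately minimise 0.5 * v^T A v - v^T y with minibatched gradients.

    The minimiser satisfies A v = y. Illustrative estimator only (an assumption
    of this sketch): a random subset of gradient coordinates is kept and
    rescaled so that the estimate is unbiased for the full gradient A v - y.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    v = np.zeros(n)
    for _ in range(num_steps):
        batch = rng.choice(n, size=batch_size, replace=False)
        grad = np.zeros(n)
        # Each coordinate is sampled with probability batch_size / n, so
        # scaling by n / batch_size keeps the estimator unbiased.
        grad[batch] = (n / batch_size) * (a_mat[batch] @ v - y[batch])
        v -= step_size * grad
    return v


# Toy usage: four well-separated inputs give a well-conditioned system.
x = np.array([0.0, 1.0, 2.0, 3.0])
a_mat = rbf_kernel(x, x) + 0.01 * np.eye(len(x))  # K + sigma^2 I
y = np.sin(x)
v = sgd_solve(a_mat, y)
print(np.max(np.abs(a_mat @ v - y)))  # residual of the linear system
```

<p>On this tiny, well-conditioned system the iteration converges quickly. The slow convergence described in the abstract concerns large, ill-conditioned kernel matrices, which this toy example does not reproduce.</p>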
<p>Bio: Alexander Terenin is an incoming Assistant Research Professor at Cornell. He is interested in statistical machine learning, particularly in settings where the data is not fixed, but is gathered interactively by the learning machine. This leads naturally to Gaussian processes and data-efficient interactive decision-making systems such as Bayesian optimization, to areas such as multi-armed bandits and reinforcement learning, and to techniques for incorporating inductive biases and prior information such as symmetries into machine learning models.</p>
Sampling from Gaussian Process Posteriors using Stochastic Gradient Descent
2023-06-21T00:00:00+00:00
https://avt.im/talks/2023/06/21/Stochastic-Gradient-Descent-GP
<p>The ability to deploy Gaussian-process-based decision-making systems such as Bayesian optimization at scale has traditionally been limited by computational costs arising from the need to solve large linear systems. The de facto standard for solving linear systems at scale is the conjugate gradient algorithm; stochastic gradient descent, by contrast, is known to converge near-arbitrarily slowly on the quadratic objectives that correspond to Gaussian process models’ linear systems. In spite of this, we show that stochastic gradient descent produces solutions which have low test error, and quantify uncertainty in a manner that mirrors the true posterior. We develop a spectral characterization of the error caused by finite-time non-convergence, which we prove is small both near the data and sufficiently far from the data. Stochastic gradient descent therefore only differs from the true posterior between these regions, demonstrating a form of implicit bias caused by benign non-convergence. We conclude by showing, empirically, that stochastic gradient descent achieves state-of-the-art performance on sufficiently large-scale regression tasks, and produces uncertainty estimates which match the performance of significantly more expensive baselines on large-scale Bayesian optimization.</p>
<p>Bio: Alexander Terenin is an incoming Assistant Research Professor at Cornell. He is interested in statistical machine learning, particularly in settings where the data is not fixed, but is gathered interactively by the learning machine. This leads naturally to Gaussian processes and data-efficient interactive decision-making systems such as Bayesian optimization, to areas such as multi-armed bandits and reinforcement learning, and to techniques for incorporating inductive biases and prior information such as symmetries into machine learning models.</p>
Physically Structured Neural Networks for Smooth and Contact Dynamics
2023-05-12T00:00:00+00:00
https://avt.im/talks/2023/05/12/Physically-Structured-Networks
<p>A neural network’s architecture encodes key information and inductive biases that are used to guide its predictions. In this talk, we discuss recent work which leverages the perspective of neural ordinary differential equations to design network architectures that encode the structures of classical mechanics. We examine the cases of both smooth dynamics and non-smooth contact dynamics. The architectures obtained are easy to understand, show excellent performance and data-efficiency on simple benchmark tasks, and are a promising emerging tool for use in robot learning and related areas.</p>
<p>Alexander Terenin is a Postdoctoral Research Associate at the University of Cambridge. He is interested in statistical machine learning, particularly in settings where the data is not fixed, but is gathered interactively by the learning machine. This leads naturally to Gaussian processes and data-efficient interactive decision-making systems such as Bayesian optimization, to areas such as multi-armed bandits and reinforcement learning, and to techniques for incorporating inductive biases and prior information such as symmetries into machine learning models.</p>
Physically Structured Neural Networks for Smooth and Contact Dynamics
2023-04-14T00:00:00+00:00
https://avt.im/talks/2023/04/14/Physically-Structured-Networks
<p>A neural network’s architecture encodes key information and inductive biases that are used to guide its predictions. In this talk, we discuss recent work which leverages the perspective of neural ordinary differential equations to design network architectures that encode the structures of classical mechanics. We examine the cases of both smooth dynamics and non-smooth contact dynamics. The architectures obtained are easy to understand, show excellent performance and data-efficiency on simple benchmark tasks, and are a promising emerging tool for use in robot learning and related areas.</p>
<p>Alexander Terenin is a Postdoctoral Research Associate at the University of Cambridge. He is interested in statistical machine learning, particularly in settings where the data is not fixed, but is gathered interactively by the learning machine. This leads naturally to Gaussian processes and data-efficient interactive decision-making systems such as Bayesian optimization, to areas such as multi-armed bandits and reinforcement learning, and to techniques for incorporating inductive biases and prior information such as symmetries into machine learning models.</p>
Pathwise Conditioning and Non-Euclidean Gaussian Processes
2023-03-15T00:00:00+00:00
https://avt.im/talks/2023/03/15/Pathwise-Conditioning
<p>In Gaussian processes, conditioning and computation of posterior distributions are usually done in a distributional fashion by working with finite-dimensional marginals. However, there is another way to think about conditioning: using actual random functions rather than their probability distributions. This perspective is particularly helpful in decision-theoretic settings such as Bayesian optimization, where it enables efficient computation of a wider class of acquisition functions than otherwise possible. In this talk, we describe these recent advances, and discuss their broader implications for Gaussian processes. 
We then present a class of Gaussian process models on graphs and manifolds, which can enable one to perform Bayesian optimization while taking into account symmetries and constraints in an intrinsic manner.</p>
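<p>As an illustration of conditioning "using actual random functions", the sketch below draws posterior samples by transforming prior samples, an update commonly written as Matheron's rule: (f | y)(·) = f(·) + K(·, X)(K(X, X) + σ²I)⁻¹(y − f(X) − ε). It is a minimal sketch under stated assumptions rather than the implementation discussed in the talk: the squared-exponential kernel, the exact joint prior draw via Cholesky, and the names <code>rbf_kernel</code> and <code>pathwise_posterior_samples</code> are all hypothetical choices made for this example.</p>

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.5):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def pathwise_posterior_samples(x_train, y_train, x_test, noise_var=1e-2,
                               num_samples=3, seed=0, jitter=1e-6):
    """Draw posterior samples at x_test by updating joint prior samples.

    Returns an array of shape (len(x_test), num_samples).
    """
    rng = np.random.default_rng(seed)
    n = len(x_train)
    x_all = np.concatenate([x_train, x_test])
    k_all = rbf_kernel(x_all, x_all)

    # Joint prior draw at training and test inputs (jitter for stability).
    chol = np.linalg.cholesky(k_all + jitter * np.eye(len(x_all)))
    f_prior = chol @ rng.standard_normal((len(x_all), num_samples))
    f_train, f_test = f_prior[:n], f_prior[n:]

    # Simulated observation noise for each prior sample.
    eps = np.sqrt(noise_var) * rng.standard_normal((n, num_samples))

    # Pathwise update: correct each prior sample by the kernel-weighted
    # mismatch between the data and that sample's noisy training values.
    k_xx = k_all[:n, :n] + noise_var * np.eye(n)
    k_sx = k_all[n:, :n]
    residual = y_train[:, None] - f_train - eps
    return f_test + k_sx @ np.linalg.solve(k_xx, residual)


# Toy usage: condition on four observations, sample at three new inputs.
x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.sin(x_train)
x_test = np.array([0.5, 1.5, 2.5])
samples = pathwise_posterior_samples(x_train, y_train, x_test)
print(samples.shape)
```

<p>Each column of the result is one sample path evaluated at the test inputs, so quantities that depend on a whole function draw, such as the location of its maximum, can be computed per column.</p>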
<p>Alexander Terenin is a Postdoctoral Research Associate at the University of Cambridge. He is interested in statistical machine learning, particularly in settings where the data is not fixed, but is gathered interactively by the learning machine. This leads naturally to Gaussian processes and data-efficient interactive decision-making systems such as Bayesian optimization, to areas such as multi-armed bandits and reinforcement learning, and to techniques for incorporating inductive biases and prior information such as symmetries into machine learning models.</p>
Pathwise Conditioning and Non-Euclidean Gaussian Processes
2023-01-12T00:00:00+00:00
https://avt.im/talks/2023/01/12/Pathwise-Conditioning
<p>In Gaussian processes, conditioning and computation of posterior distributions are usually done in a distributional fashion by working with finite-dimensional marginals. However, there is another way to think about conditioning: using actual random functions rather than their probability distributions. 
This perspective is particularly helpful in decision-theoretic settings such as Bayesian optimization, where it enables efficient computation of a wider class of acquisition functions than otherwise possible. In this talk, we describe these recent advances, and discuss their broader implications for Gaussian processes. We then present a class of Gaussian process models on graphs and manifolds, which can enable one to perform Bayesian optimization while taking into account symmetries and constraints in an intrinsic manner.</p>
<p>Alexander Terenin is a Postdoctoral Research Associate at the University of Cambridge. He is interested in statistical machine learning, particularly in settings where the data is not fixed, but is gathered interactively by the learning machine. This leads naturally to Gaussian processes and data-efficient interactive decision-making systems such as Bayesian optimization, to areas such as multi-armed bandits and reinforcement learning, and to techniques for incorporating inductive biases and prior information such as symmetries into machine learning models.</p>
Pathwise Conditioning and Non-Euclidean Gaussian Processes
2022-11-18T00:00:00+00:00
https://avt.im/talks/2022/11/18/Pathwise-Conditioning
<p>In Gaussian processes, conditioning and computation of posterior distributions are usually done in a distributional fashion by working with finite-dimensional marginals. However, there is another way to think about conditioning: using actual random functions rather than their probability distributions. 
This perspective is particularly helpful in decision-theoretic settings such as Bayesian optimization, where it enables efficient computation of a wider class of acquisition functions than otherwise possible. In this talk, we describe these recent advances, and discuss their broader implications for Gaussian processes. We then present a class of Gaussian process models on graphs and manifolds, which can enable one to perform Bayesian optimization while taking into account symmetries and constraints in an intrinsic manner.</p>
<p>Alexander Terenin is a Postdoctoral Research Associate at the University of Cambridge. He is interested in statistical machine learning, particularly in settings where the data is not fixed, but is gathered interactively by the learning machine. This leads naturally to Gaussian processes and data-efficient interactive decision-making systems such as Bayesian optimization, to areas such as multi-armed bandits and reinforcement learning, and to techniques for incorporating inductive biases and prior information such as symmetries into machine learning models.</p>
Pathwise Conditioning and Non-Euclidean Gaussian Processes
2022-11-16T00:00:00+00:00
https://avt.im/talks/2022/11/16/Pathwise-Conditioning
<p>In Gaussian processes, conditioning and computation of posterior distributions are usually done in a distributional fashion by working with finite-dimensional marginals. However, there is another way to think about conditioning: using actual random functions rather than their probability distributions. 
This perspective is particularly helpful in decision-theoretic settings such as Bayesian optimization, where it enables efficient computation of a wider class of acquisition functions than otherwise possible. In this talk, we describe these recent advances, and discuss their broader implications for Gaussian processes. We then present a class of Gaussian process models on graphs and manifolds, which can enable one to perform Bayesian optimization while taking into account symmetries and constraints in an intrinsic manner.</p>
<p>Alexander Terenin is a Postdoctoral Research Associate at the University of Cambridge. He is interested in statistical machine learning, particularly in settings where the data is not fixed, but is gathered interactively by the learning machine. This leads naturally to Gaussian processes and data-efficient interactive decision-making systems such as Bayesian optimization, to areas such as multi-armed bandits and reinforcement learning, and to techniques for incorporating inductive biases and prior information such as symmetries into machine learning models.</p>
Pathwise Conditioning and Non-Euclidean Gaussian Processes
2022-11-15T00:00:00+00:00
https://avt.im/talks/2022/11/15/Pathwise-Conditioning
<p>In Gaussian processes, conditioning and computation of posterior distributions are usually done in a distributional fashion by working with finite-dimensional marginals. However, there is another way to think about conditioning: using actual random functions rather than their probability distributions. 
This perspective is particularly helpful in decision-theoretic settings such as Bayesian optimization, where it enables efficient computation of a wider class of acquisition functions than otherwise possible. In this talk, we describe these recent advances, and discuss their broader implications for Gaussian processes. We then present a class of Gaussian process models on graphs and manifolds, which can enable one to perform Bayesian optimization while taking into account symmetries and constraints in an intrinsic manner.</p>
<p>Alexander Terenin is a Postdoctoral Research Associate at the University of Cambridge. He is interested in statistical machine learning, particularly in settings where the data is not fixed, but is gathered interactively by the learning machine. This leads naturally to Gaussian processes and data-efficient interactive decision-making systems such as Bayesian optimization, to areas such as multi-armed bandits and reinforcement learning, and to techniques for incorporating inductive biases and prior information such as symmetries into machine learning models.</p>
Pathwise Conditioning and Non-Euclidean Gaussian Processes
2022-11-10T00:00:00+00:00
https://avt.im/talks/2022/11/10/Pathwise-Conditioning
<p>In Gaussian processes, conditioning and computation of posterior distributions are usually done in a distributional fashion by working with finite-dimensional marginals. However, there is another way to think about conditioning: using actual random functions rather than their probability distributions. 
This perspective is particularly helpful in decision-theoretic settings such as Bayesian optimization, where it enables efficient computation of a wider class of acquisition functions than otherwise possible. In this talk, we describe these recent advances, and discuss their broader implications for Gaussian processes. We then present a class of Gaussian process models on graphs and manifolds, which can enable one to perform Bayesian optimization while taking into account symmetries and constraints in an intrinsic manner.</p>
<p>Alexander Terenin is a Postdoctoral Research Associate at the University of Cambridge. He is interested in statistical machine learning, particularly in settings where the data is not fixed, but is gathered interactively by the learning machine. This leads naturally to Gaussian processes and data-efficient interactive decision-making systems such as Bayesian optimization, to areas such as multi-armed bandits and reinforcement learning, and to techniques for incorporating inductive biases and prior information such as symmetries into machine learning models.</p>