AISTATS 2021

https://avt.im/ · @avt_im

Matérn 

Gaussian Processes

 on Graphs

Viacheslav Borovitskiy,* Iskander Azangulov,* Alexander Terenin,*

Peter Mostowsky, Marc Peter Deisenroth, and Nicolas Durrande

*equal contribution

Gaussian Processes

Gaussian Processes

Definition. A Gaussian process is random function $f : X \to \R$ such that for any $x_1,..,x_n$, the vector $f(x_1),..,f(x_n)$ is multivariate Gaussian.

Every GP is characterized by a mean $\mu(\.)$ and a kernel $k(\.,\.)$. We have $$ \htmlClass{fragment}{ f(\v{x}) \~ \f{N}(\v{\mu}_{\v{x}},\m{K}_{\v{x}\v{x}}) } $$ where $\v\mu_{\v{x}} = \mu(\v{x})$ and $\m{K}_{\v{x}\v{x}'} = k(\v{x},\v{x}')$.

Bayesian Learning

$$ \htmlClass{fragment}{ y_i = f(x_i) } \qquad \htmlClass{fragment}{ f \~\f{GP}(0,k) } $$

$$ \htmlClass{fragment fade-in-then-out}{ f(\v{x}_*) \given \v{y} \~ \f{N}(\m{K}_{*\v{x}} \m{K}_{\v{x}\v{x}}^{-1}\v{y}, \m{K}_{**} - \m{K}_{*\v{x}}\m{K}_{\v{x}\v{x}}^{-1}\m{K}_{\v{x}*}) } $$

$$ \htmlClass{fragment}{ (f \given \v{y})(\.) = f(\.) + \m{K}_{(\.)\v{x}} \m{K}_{\v{x}\v{x}}^{-1} (\v{y} - f(\v{x})) } $$

J. T. Wilson, V. Borovitskiy, A. Terenin, P. Mostowsky, M. P. Deisenroth. Efficiently Sampling Functions from Gaussian Process Posteriors. ICML 2020.

Matérn Kernel

$$ \htmlClass{fragment}{ k(x,x') = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \del{\sqrt{2\nu} \frac{\norm{x-x'}}{\kappa}}^\nu K_\nu \del{\sqrt{2\nu} \frac{\norm{x-x'}}{\kappa}} } $$ $\sigma^2$: variance $\kappa$: length scale $\nu$: smoothness
$\nu\to\infty$: recovers squared exponential kernel

Defines GPs $f: \R^d \to \R$

Weighted Undirected Graphs

$$ f : G \to \R $$

$$ f\Big(\smash{\includegraphics[height=2.75em,width=1.5em]{data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 100 40 40' xmlns:v='https://vecta.io/nano'%3E%3Cg fill='none' stroke='%23ce2e31'%3E%3Cpath d='M32 100l-11.167 18.612M20 120l20 7.5'/%3E%3Cpath d='M20 120l20-7.143m-40 10L20 120M6 100l14 20'/%3E%3C/g%3E%3Ccircle cx='20' cy='120' r='10' fill='%23ce2e31'/%3E%3C/svg%3E}}\Big) \to \R $$

$$ f\Big(\smash{\includegraphics[height=2.75em,width=1.5em]{data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='80 130 40 40' xmlns:v='https://vecta.io/nano'%3E%3Cpath d='M100 150l20-2.222M100 150l15-20m-35 12.5l20 7.5m-20-2.667L100 150' fill='none' stroke='%239467bd'/%3E%3Ccircle cx='100' cy='150' r='10' fill='%239467bd'/%3E%3C/svg%3E}}\Big) \to \R $$

$$ f\Big(\smash{\includegraphics[height=2.75em,width=1.5em]{data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='140 50 40 40' xmlns:v='https://vecta.io/nano'%3E%3Cg fill='none' stroke='%23fc812b'%3E%3Cpath d='M160 70l8.571 20M145 90l15-20'/%3E%3Cpath d='M140 57.5L160 70m-20 7.143L160 70m0 0l20 4m0-24l-20 20'/%3E%3C/g%3E%3Ccircle cx='160' cy='70' r='10' fill='%23fc812b'/%3E%3C/svg%3E}}\Big) \to \R $$

Stochastic Partial Differential Equations

$$ \htmlClass{fragment}{ \underset{\t{Matérn}}{\undergroup{\del{\frac{2\nu}{\kappa^2} - \Delta}^{\frac{\nu}{2}+\frac{d}{4}} f = \c{W}}} } \qquad \htmlClass{fragment}{ \underset{\t{squared exponential}}{\undergroup{\vphantom{\del{\frac{2\nu}{\kappa^2} - \Delta}^{\frac{\nu}{2}+\frac{d}{4}}} e^{-\frac{\kappa^2}{4}\Delta} f = \c{W}}} } $$ $\Delta$: Laplacian $\c{W}$: (rescaled) white noise
$e^{-\frac{\kappa^2}{4}\Delta}$: (rescaled) heat semigroup

The Graph Laplacian

$$ \htmlClass{fragment}{ (\m\Delta\v{f})(x) = \sum_{x' \~ x} w_{xx'} (f(x) - f(x')) } $$ $$ \htmlClass{fragment}{ \m\Delta = \m{D} - \m{W} } $$ $\m{D}$: degree matrix $\m{W}$: (weighted) adjacency matrix

Note: different sign convention, analogous to Euclidean $-\Delta$

Graph Matérn Gaussian Processes

$$ \htmlClass{fragment}{ \underset{\t{Matérn}}{\undergroup{\del{\frac{2\nu}{\kappa^2} + \m\Delta}^{\frac{\nu}{2}} \v{f} = \c{W}\hspace*{-2.42ex}\c{W}\hspace*{-2.42ex}\c{W}}} } \qquad \htmlClass{fragment}{ \underset{\t{squared exponential}}{\undergroup{\vphantom{\del{\frac{2\nu}{\kappa^2} - \m\Delta}^{\frac{\nu}{2}+\frac{d}{4}}} e^{\frac{\kappa^2}{4}\m\Delta} \v{f} = \c{W}\hspace*{-2.42ex}\c{W}\hspace*{-2.42ex}\c{W}}} } $$ $\m\Delta$: graph Laplacian $\c{W}\hspace*{-2.42ex}\c{W}\hspace*{-2.42ex}\c{W}$: standard Gaussian

Graph Matérn Gaussian Processes

$$ \htmlClass{fragment}{ \underset{\t{Matérn}}{\undergroup{\vphantom{\v{f} \~\f{N}\del{\v{0},e^{-\frac{\kappa^2}{4}\m\Delta}}} \v{f} \~\f{N}\del{\v{0},{\textstyle\del{\frac{2\nu}{\kappa^2} + \m\Delta}^{-\nu}}}}} } \qquad \htmlClass{fragment}{ \underset{\t{squared exponential}}{\undergroup{\v{f} \~\f{N}\del{\v{0},e^{-\frac{\kappa^2}{2}\m\Delta}}}} } $$

Graph Fourier Features

$$ \htmlClass{fragment}{ k_\nu(x,x') = \frac{\sigma^2}{C_\nu} \sum_{n=1}^{|G|} \del{\frac{2\nu}{\kappa^2} + \lambda_n}^{-\nu} \v{f}_n(x)\v{f}_n(x') } $$ $\lambda_n,\v{f}_n$: eigenvalues and eigenvectors of graph Laplacian

Prior Variance

(a) Complete graph

(b) Star graph

(c) Random graph

(d) Random graph

Prior variance depends on geometry

Example: Graph Interpolation of Traffic

(a) Predictive mean

(b) Standard deviation

Example: Graph Interpolation of Traffic

(a) Predictive mean

(b) Standard deviation

Connection with Matérn Gaussian Processes on Riemannian Manifolds

V. Borovitskiy, A. Terenin, P. Mostowsky, M. P. Deisenroth. Matérn Gaussian Processes on Riemannian Manifolds. NeurIPS 2020.

Thank you!

https://avt.im/· @avt_im

V. Borovitskiy, I. Azangulov, A. Terenin, P. Mostowsky, M. P. Deisenroth. Matérn Gaussian Processes on Graphs. Artificial Intelligence and Statistics, 2021.

V. Borovitskiy, A. Terenin, P. Mostowsky, M. P. Deisenroth. Matérn Gaussian Processes on Riemannian Manifolds. Advances in Neural Information Processing Systems, 2020.

J. T. Wilson, V. Borovitskiy, A. Terenin, P. Mostowsky, M. P. Deisenroth. Pathwise Conditioning of Gaussian Processes. Journal of Machine Learning Research, 2021.

J. T. Wilson, V. Borovitskiy, A. Terenin, P. Mostowsky, M. P. Deisenroth. Efficiently Sampling Functions from Gaussian Process Posteriors. International Conference on Machine Learning, 2020.