I’m a PhD student working on statistical machine learning and artificial intelligence. At present, I am particularly interested in the intersection of Bayesian theory and differential geometry. Previously, I worked on scalable computation in Bayesian models, including Markov Chain Monte Carlo methods on parallel and distributed systems such as GPUs and compute clusters. This work has found applications in natural language processing through scalable training of Bayesian nonparametric topic models. Selected works include the following.

  • Variational Integrator Networks. We propose a network architecture that allows data-efficient learning of dynamical systems from image observations, by defining the network’s latent space in such a way that it forms a dynamical system in its own right.
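The core idea — a latent space that is itself a discrete-time mechanical system — can be illustrated with a symplectic (variational) integrator step. This is a minimal sketch, not the paper's architecture: `grad_potential` is a toy quadratic stand-in for what would be a learned network, and the step sizes and dimensions are arbitrary.

```python
import numpy as np

def grad_potential(q):
    # Toy stand-in for a learned force field, here U(q) = 0.5 * q^2.
    return q

def symplectic_euler_step(q, p, h=0.1):
    # One variational (symplectic Euler) step: momentum first,
    # then position using the updated momentum. This discrete map
    # approximately preserves energy over long rollouts.
    p_next = p - h * grad_potential(q)
    q_next = q + h * p_next
    return q_next, p_next

# Roll out the latent dynamics from an initial state.
q, p = np.array([1.0]), np.array([0.0])
for _ in range(100):
    q, p = symplectic_euler_step(q, p)
```

Because the integrator is symplectic, the rollout stays near the initial energy level rather than drifting — the structural property that makes such latent dynamics data-efficient to learn.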

  • Pólya Urn Latent Dirichlet Allocation. We propose an algorithm for training Latent Dirichlet Allocation that is exact for large data sets, massively parallel, avoids the memory bottlenecks of previous approaches, and has the lowest computational complexity of any method in its class.
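For context, the baseline that such samplers parallelize and accelerate is the standard collapsed Gibbs sampler for LDA. The sketch below is that textbook baseline on a toy corpus, not the Pólya-urn algorithm itself; the corpus, hyperparameters, and topic count are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus: each document is a list of word ids.
docs = [[0, 1, 2, 0], [2, 3, 3, 1], [0, 0, 1, 2]]
V, K = 4, 2             # vocabulary size, number of topics
alpha, beta = 0.5, 0.1  # symmetric Dirichlet hyperparameters

# Count matrices maintained by the collapsed sampler.
ndk = np.zeros((len(docs), K))  # document-topic counts
nkw = np.zeros((K, V))          # topic-word counts
nk = np.zeros(K)                # topic totals
z = []                          # per-token topic assignments

# Random initialization of topic assignments.
for d, doc in enumerate(docs):
    zd = []
    for w in doc:
        k = int(rng.integers(K))
        zd.append(k)
        ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    z.append(zd)

# Collapsed Gibbs sweeps: resample each token's topic from its
# full conditional given all other assignments.
for _ in range(50):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
            probs = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            probs /= probs.sum()
            k = int(rng.choice(K, p=probs))
            z[d][i] = k
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
```

The sequential dependence between tokens in this loop is exactly what makes naive parallelization hard, which is what motivates reformulations such as the Pólya-urn scheme.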

  • Sparse Parallel Training of Hierarchical Dirichlet Process Topic Models. We propose a massively parallel algorithm for training the Hierarchical Dirichlet Process topic model. Along the way, we prove a non-standard conjugacy result for a certain stick-breaking process. The algorithm enables a Hierarchical Dirichlet Process to be trained on the 700-million-token PubMed corpus for the first time.

  • Asynchronous Gibbs Sampling. We propose a way of analyzing Markov Chain Monte Carlo methods executed asynchronously on a compute cluster – a setting where the Markov property does not hold. We show that such algorithms can be made to converge if worker nodes are allowed to reject other workers’ messages.
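The accept/reject mechanism can be illustrated in miniature. This sketch is not the paper's exact scheme: a single worker holds a local state, treats incoming messages from other workers as proposals, and applies a Metropolis-style accept/reject step against its local target density instead of adopting every message. The target (a standard normal) and the message model are illustrative assumptions.

```python
import math
import random

random.seed(0)

def log_target(x):
    # Unnormalized log-density of the worker's local target:
    # a standard normal, up to an additive constant.
    return -0.5 * x * x

def receive(local_state, proposed_state):
    # Metropolis-style correction: accept the incoming state with
    # probability min(1, target(proposed) / target(local)).
    log_ratio = log_target(proposed_state) - log_target(local_state)
    if math.log(random.random()) < log_ratio:
        return proposed_state  # accept the other worker's message
    return local_state         # reject it and keep the local state

# Simulate a stream of incoming messages (here: Gaussian perturbations
# of the current state, standing in for other workers' updates).
state = 3.0
for _ in range(1000):
    message = state + random.gauss(0.0, 1.0)
    state = receive(state, message)
```

The point of the rejection step is that the worker's state remains distributed according to its target even when messages arrive out of order, which is what restores the convergence guarantees lost with the Markov property.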