About

I’m a PhD student working on statistical machine learning and artificial intelligence. My research focuses on Bayesian theory: I am particularly interested in the intersection of Bayesian nonparametrics with differential geometry and mechanics, and with topic models and natural language processing. Selected works include the following.

Physically meaningful differential-geometric methods for deep learning.

  • Variational Integrator Networks. We propose a deep network architecture that enables data-efficient learning of dynamical systems from image observations, by constructing the network’s latent space so that it forms a dynamical system in its own right.
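
To illustrate the latent-dynamics idea, here is a minimal sketch in which a latent state (q, p) is evolved by a symplectic (variational) integrator whose potential energy plays the role of a learned network. The function names and the quadratic toy potential are hypothetical choices for brevity, not taken from the paper.

```python
import numpy as np

def potential_grad(q, w):
    """Gradient of a toy potential U(q) = 0.5 * w * q**2; in the real
    architecture this role would be played by a learned network
    (hypothetical stand-in)."""
    return w * q

def leapfrog_step(q, p, w, dt=0.1):
    """One step of a symplectic leapfrog integrator. Because the latent
    state is updated by a variational integrator, the latent space forms
    a discrete-time mechanical system by construction."""
    p_half = p - 0.5 * dt * potential_grad(q, w)
    q_next = q + dt * p_half
    p_next = p_half - 0.5 * dt * potential_grad(q_next, w)
    return q_next, p_next

# Rolling the integrator forward defines the latent trajectory.
q, p = np.array([1.0]), np.array([0.0])
for _ in range(100):
    q, p = leapfrog_step(q, p, w=np.array([2.0]))
```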

Scalable training methods for Bayesian topic models.

  • Pólya Urn Latent Dirichlet Allocation. We propose an algorithm for training Latent Dirichlet Allocation that is asymptotically exact for large data sets, massively parallel, avoids the memory bottlenecks of previous approaches, and has the lowest computational complexity of any method in its class. The parallelization idea is sketched after this list.

  • Sparse Parallel Training of Hierarchical Dirichlet Process Topic Models. We propose a massively parallel algorithm for training the Hierarchical Dirichlet Process topic model. Along the way, we prove a non-standard conjugacy result for a certain stick-breaking process; the standard stick-breaking construction is sketched below for background. The algorithm enables a Hierarchical Dirichlet Process to be trained on the 700m-token PubMed corpus for the first time.

  • Asynchronous Gibbs Sampling. To help understand computational techniques widely used for topic models, we propose a way of analyzing Markov chain Monte Carlo methods executed asynchronously on a compute cluster, a setting where the Markov property doesn’t hold. We show that such algorithms can be made to converge if worker nodes are allowed to reject other worker nodes’ messages; the rejection step is sketched below.
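
For Pólya Urn Latent Dirichlet Allocation, the following sketch illustrates why freezing the global word-topic counts within a sweep makes the sampler embarrassingly parallel: each document is resampled against read-only global statistics, which are rebuilt once per sweep. This is a simplified, hypothetical rendering of the general idea, not the paper’s exact update, and all names are my own.

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, alpha, beta = 5, 1000, 0.1, 0.01  # topics, vocabulary size, priors

def resample_document(words, topics, word_topic, topic_totals):
    """Resample one document's topic indicators against *frozen* global
    counts (word_topic of shape (V, K), topic_totals of shape (K,))
    taken from the previous sweep. Because the global statistics are
    read-only within a sweep, every document can be processed in
    parallel; the counts are rebuilt once afterwards."""
    doc_topic = np.bincount(topics, minlength=K).astype(float)
    for i, w in enumerate(words):
        doc_topic[topics[i]] -= 1.0  # remove the current assignment
        probs = (doc_topic + alpha) * (word_topic[w] + beta) \
            / (topic_totals + V * beta)
        probs /= probs.sum()
        topics[i] = rng.choice(K, p=probs)
        doc_topic[topics[i]] += 1.0
    return topics

# Toy usage: one short document with random initial assignments.
words = rng.integers(V, size=50)
topics = rng.integers(K, size=50)
word_topic = np.ones((V, K))    # frozen counts from the previous sweep
topic_totals = word_topic.sum(axis=0)
topics = resample_document(words, topics, word_topic, topic_totals)
```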
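
The conjugacy result in the Hierarchical Dirichlet Process entry concerns a non-standard stick-breaking process. For orientation only, here is the standard stick-breaking construction underlying Dirichlet process models; it is background, not the construction from the paper.

```python
import numpy as np

def stick_breaking(concentration, num_atoms, rng):
    """First `num_atoms` weights of a stick-breaking construction:
    repeatedly break off a Beta(1, concentration) fraction of the
    remaining stick."""
    fractions = rng.beta(1.0, concentration, size=num_atoms)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - fractions[:-1])])
    return fractions * remaining

weights = stick_breaking(1.0, 20, np.random.default_rng(0))
```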
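
Finally, for Asynchronous Gibbs Sampling, one minimal way to picture the message-rejection mechanism is as a Metropolis-style accept/reject applied to states received from other workers, assuming symmetric message proposals so that the proposal terms cancel. The paper’s actual correction may differ; this sketch and its names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def maybe_accept(local_state, remote_state, log_density):
    """Treat a (possibly stale) state received from another worker as a
    proposal and apply a Metropolis-style accept/reject against the
    target density, so that stale messages can be rejected rather than
    blindly applied (hypothetical sketch)."""
    log_ratio = log_density(remote_state) - log_density(local_state)
    if np.log(rng.random()) < log_ratio:
        return remote_state  # accept the other worker's message
    return local_state       # reject it and keep the local state

# Toy usage with a standard normal target.
state = maybe_accept(0.0, 1.2, lambda x: -0.5 * x ** 2)
```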