Abstract. Training nonparametric extensions of topic models such as Latent Dirichlet Allocation, including Hierarchical Dirichlet Processes (HDP), generally requires use of iterative algorithms. To scale to larger datasets, these models are increasingly being deployed on parallel and distributed systems. Unfortunately, most current approaches to scalable training of topic models either don’t converge to the correct target, or are not data-parallel. Moreover, these approaches generally do not utilize all available sources of sparsity found in natural language – an important way to make computation efficient. Based upon a representation of certain conditional distributions within an HDP, we propose a doubly sparse data-parallel sampler for the HDP topic model that addresses these issues. We benchmark our method on a well-known corpora (PubMed) with 8m documents and 768m tokens, using a single multi-core machine in under three days.