Weitz Group @ Georgia Tech Theoretical Ecology and Quantitative Biology

Slicing through "Big Data": dimension reduction methods applied to metagenomic data (J. Math. Biol, 2012)

Posted by jsweitz

The New York Times recently covered the rise of the era of Big Data. However, interpreting Big Data in a way that is both intuitive and retains key information is a challenge. This challenge is particularly acute in the study of environmental sequence data, i.e., "metagenomics". Joshua Weitz collaborated with Xingpeng Jiang and Jonathan Dushoff to develop a dimension reduction approach to analyzing the geographic distribution of microbes. The method is based on novel extensions of non-negative matrix factorization (NMF). The manuscript, "A non-negative matrix factorization framework for identifying modular patterns in metagenomic profile data", was just published in a special issue of the Journal of Mathematical Biology in honor of Simon Levin's 70th birthday.

In this paper, we used NMF to decompose taxonomic profile matrices into the product of lower-dimensional non-negative matrices. These matrices enable an intuitive and informative view of the biogeography of microbial taxa. We are currently working on extensions of these ideas as applied to the distribution of microbial function, in collaboration with Morgan Langille, Russell Neches, Marie Elliott, Jonathan Eisen and Simon Levin.
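
For readers who have not seen NMF before, the basic idea can be sketched in a few lines of Python. This is only an illustration of standard NMF (the Lee-Seung multiplicative updates for a least-squares objective), not the extended framework developed in the paper. The function name `nmf`, its parameters, and the toy taxa-by-site matrix `V` are all hypothetical and chosen for illustration; the numbers are made up and are not data from the study.

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V as W @ H with W, H >= 0.

    Uses the standard multiplicative update rules for the squared
    Frobenius error ||V - W H||^2. Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    # Random positive starting point; the updates keep entries non-negative.
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        # eps in the denominators guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy profile: 6 taxa (rows) observed at 4 sites (columns).
# Taxa 0-2 occur only at sites 0-1, taxa 3-5 only at sites 2-3.
V = np.array([
    [5.0, 4.0, 0.0, 0.0],
    [3.0, 2.4, 0.0, 0.0],
    [1.0, 0.8, 0.0, 0.0],
    [0.0, 0.0, 2.0, 6.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 0.5, 1.5],
])

W, H = nmf(V, rank=2)

# Relative reconstruction error of the low-dimensional product.
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print("relative reconstruction error:", rel_err)
print("taxon loadings (W):\n", np.round(W, 2))
print("site loadings (H):\n", np.round(H, 2))
```

In this toy setting, each column of `W` can be read as a group of co-occurring taxa and each row of `H` as the sites where that group is found, which is the kind of modular view of biogeography the factorization is meant to provide. The choice of `rank` is an assumption here; in practice it has to be selected for the data at hand.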