Metagenomics is a rapidly-expanding field that repeatedly tells us how little we know.

Metagenomics is basically the process of surveying genomes in the environment.  By going to genome analysis as directly as possible, it sidesteps many of the problems of isolation and culture. If you can isolate or grow bacteria or viruses, you probably already have a fairly decent idea of what you’re looking for.  Metagenomics lets you see what’s actually there, not what you think should be there or what you happen to be able to work with.  And it seems that wherever the metagenomists go looking, there are vast numbers of viruses [1] hiding, unculturable or unidentifiable. We’ve been looking at the tips of the icebergs, thinking that the little bumps and valleys we’ve mapped are the whole story; now we have to start looking at the hidden part.

This is true whether the samples are from what we think of as the “environment” (lakes, oceans, soil) or from animals and people.  Pretty typically, most of the genomes that get turned up (well over half of them) don’t look like anything we know about.  Just to put a little context on that, over 2000 viruses have had their genomes completely sequenced, and there are over 1,000,000 sequences in GenBank tagged “virus”; yet if you go and look pretty much anywhere, most of the viruses there are completely new to us, so different that we can’t detect even the most distant relationship to anything we know about.

For example, in Lake Needwood, in Maryland, “a large majority (~66%) of these assemblies had no significant homology to any known sequences of viral, bacterial, eukaryotic and archaeal origin … but appeared to be most likely derived from novel viruses”. [2] In reclaimed water, “Over 50% of the viral metagenomic sequences (both DNA and RNA) identified in reclaimed water metagenomes had no significant similarity to proteins in GenBank”. [3] In ocean samples, “On average, >91% of the sequences were not significantly similar to those in the extant databases.” [4]
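Numbers like “~66%” or “>91%” come down to a simple tally: compare every sequence against a reference database, then count how many come back with no significant match. Here is a minimal sketch of that tally, not the method of any of the papers quoted above. It assumes the hits arrive as BLAST-style tab-separated text with the query ID in the first column and the E-value in the eleventh (the default tabular layout); the function name `fraction_unassigned`, the contig names, and the `1e-3` cutoff are all made up for illustration.

```python
import csv
import io


def fraction_unassigned(query_ids, blast_tabular, max_evalue=1e-3):
    """Return the fraction of query_ids with no hit at or below max_evalue.

    blast_tabular is any iterable of tab-separated lines in the default
    12-column BLAST tabular layout (an assumption, not a requirement of
    any particular paper): column 0 is the query ID, column 10 the E-value.
    """
    queries = set(query_ids)
    if not queries:
        raise ValueError("no query sequences given")

    with_hit = set()
    for row in csv.reader(blast_tabular, delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        query, evalue = row[0], float(row[10])
        # A query counts as "known" once it has one significant hit.
        if query in queries and evalue <= max_evalue:
            with_hit.add(query)

    return 1 - len(with_hit) / len(queries)


# Hypothetical data: four contigs, of which only contig1 has a
# significant hit; contig2's only hit has an E-value of 5.0.
example_hits = io.StringIO(
    "contig1\tknownA\t95.0\t300\t15\t0\t1\t300\t1\t300\t1e-50\t400\n"
    "contig2\tknownB\t40.0\t60\t36\t0\t1\t60\t1\t60\t5.0\t20\n"
)
contigs = ["contig1", "contig2", "contig3", "contig4"]
print(fraction_unassigned(contigs, example_hits))  # 0.75
```

The cutoff matters: loosen it and more sequences count as “known”, tighten it and fewer do, so percentages from different studies are only roughly comparable.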


That might not be too surprising: there hasn’t been long-standing, intense interest in viruses in lakes, so you’d expect to find a lot of new stuff.  But even in us, a good half of our viral inhabitants are unknown. [5] For example, in stool samples scanned for viruses, “Most of the sequences were unrelated to anything previously reported.” [6]

Most of these unknown viruses are probably harmless.  Many are probably just hitch-hikers, traveling through our intestines only because we ate, say, the pepper that they were infecting. [7]  But metagenomics has recently also started turning up new pathogens (or at any rate, viruses that may be pathogens), of humans as well as other species. [5] Some of these are really new: a virus isolated from sea turtles, potentially involved in the fibropapilloma disease that’s spreading among them, “may represent a new viral genus of the Circoviridae family or possibly even a new viral family.” [8]

In the next few years, there’s going to be yet another data explosion, as metagenomics turns up new things in astronomical numbers.  Clinical research is going to have to scramble to understand what these mean — which of these are pathogens, which are irrelevant as far as disease and health?  We’re going to need new tools to understand and screen these things.  It should be interesting to see what happens.

  1. And bacteria, but bacteria aren’t very interesting, are they?
  2. Djikeng A, Kuzmickas R, Anderson NG, Spiro DJ (2009) Metagenomic Analysis of RNA Viruses in a Fresh Water Lake. PLoS ONE 4(9): e7264. doi:10.1371/journal.pone.0007264
  3. Rosario, K., Nilsson, C., Lim, Y., Ruan, Y., & Breitbart, M. (2009). Metagenomic analysis of viruses in reclaimed water. Environmental Microbiology. DOI: 10.1111/j.1462-2920.2009.01964.x
  4. Angly, F., Felts, B., Breitbart, M., Salamon, P., Edwards, R., Carlson, C., Chan, A., Haynes, M., Kelley, S., Liu, H., Mahaffy, J., Mueller, J., Nulton, J., Olson, R., Parsons, R., Rayhawk, S., Suttle, C., & Rohwer, F. (2006). The Marine Viromes of Four Oceanic Regions. PLoS Biology, 4 (11). DOI: 10.1371/journal.pbio.0040368
  5. Victoria, J., Kapoor, A., Li, L., Blinkova, O., Slikas, B., Wang, C., Naeem, A., Zaidi, S., & Delwart, E. (2009). Metagenomic Analyses of Viruses in Stool Samples from Children with Acute Flaccid Paralysis. Journal of Virology, 83 (9), 4642-4651. DOI: 10.1128/JVI.02301-08
  6. Breitbart, M., Hewson, I., Felts, B., Mahaffy, J., Nulton, J., Salamon, P., & Rohwer, F. (2003). Metagenomic Analyses of an Uncultured Viral Community from Human Feces. Journal of Bacteriology, 185 (20), 6220-6223. DOI: 10.1128/JB.185.20.6220-6223.2003
  7. Zhang, T., Breitbart, M., Lee, W., Run, J., Wei, C., Soh, S., Hibberd, M., Liu, E., Rohwer, F., & Ruan, Y. (2006). RNA Viral Community in Human Feces: Prevalence of Plant Pathogenic Viruses. PLoS Biology, 4 (1). DOI: 10.1371/journal.pbio.0040003
  8. Ng, T., Manire, C., Borrowman, K., Langer, T., Ehrhart, L., & Breitbart, M. (2008). Discovery of a Novel Single-Stranded DNA Virus from a Sea Turtle Fibropapilloma by Using Viral Metagenomics. Journal of Virology, 83 (6), 2500-2509. DOI: 10.1128/JVI.01946-08