[Image: Mutation comic]

Indeed, the amount of HIV diversity within a single infected individual can exceed the variability generated over the course of a global influenza epidemic, the latter of which results in the need for a new vaccine each year.[1]

That was said as part of a discussion on HIV vaccines, but let’s think about it from the influenza side.  Why is it true? Why doesn’t influenza have as many variants as HIV?

(Update: Another paper also looks at this question and points to some interesting explanations; I talk about that paper here.)

We know that influenza, like other RNA viruses, is prone to mutation (that is, it has an error-prone polymerase). Depending on how you measure it, it’s likely that almost every new influenza genome has at least one mutation in it, meaning that every newly infected animal or person should be generating thousands upon thousands of new influenza variants.
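A back-of-the-envelope version of that claim, as a minimal sketch: assume the per-nucleotide error rate quoted later in this post (about one change per 10,000 nucleotides per replication) and a genome of roughly 13,500 nucleotides (an approximate total for the eight influenza A segments, not a figure from the original). Treating each position as an independent chance to make an error gives the expected number of new mutations per copy and the share of copies that carry at least one.

```python
# Back-of-the-envelope sketch. Both constants are assumptions for illustration:
# the error rate is the estimate quoted later in the post, and the genome
# length is an approximate total for the eight influenza A segments.
ERROR_RATE_PER_NT = 1e-4   # new errors per nucleotide per replication
GENOME_LENGTH_NT = 13_500  # approximate influenza A genome size


def expected_new_mutations(rate: float, length: int) -> float:
    """Mean number of new mutations in one freshly copied genome."""
    return rate * length


def fraction_with_mutation(rate: float, length: int) -> float:
    """Share of copies carrying at least one new mutation, treating each
    nucleotide as an independent chance to make an error."""
    return 1.0 - (1.0 - rate) ** length


if __name__ == "__main__":
    mean = expected_new_mutations(ERROR_RATE_PER_NT, GENOME_LENGTH_NT)
    share = fraction_with_mutation(ERROR_RATE_PER_NT, GENOME_LENGTH_NT)
    print(f"mean new mutations per genome copy: {mean:.2f}")
    print(f"copies with at least one new mutation: {share:.0%}")
```

With these numbers it reports about 1.35 new mutations per copy and roughly three-quarters of copies carrying at least one; since the rate depends on how you measure it, a somewhat higher rate pushes that share toward "almost every".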

Globally, of course, we do see thousands of new flu variants each year.[2] But, based on replication fidelity, you’d expect to see a lot more: maybe not quite as many as HIV, but not far from it.

This is also true on a much smaller scale, within infected individuals (animals or people). Even with modern deep-sequencing techniques (like those used in some of the HIV analyses), which should in theory be able to detect large numbers of mutations, there are fewer variants than you might expect from the known replication fidelity, and far fewer than in HIV:

Inasmuch as the mutation rate for type A influenza viruses is estimated at one nucleotide change per 10,000 nucleotide during replication and most infections are caused by as many as 10 to 1000 virions which likely possess varying numbers of nucleotide differences in their genomes, one can expect that each influenza A virion is possibly a quasispecies. However, we identified relatively few quasispecies – probably because the currently available sequence analysis software do not allow robust quasispecies analysis and extensive manual curation is necessary. We believe that with the help of improved bioinformatic tools we would detect more quasispecies populations in our sample sets.[3]

[Image: H1N1 (swine-origin influenza virus)]

I don’t know enough about the computational side to comment on their bioinformatics point. Another recent paper[4] uses a similar approach and (at least at first) seems to reach a more conservative conclusion. They talk about “quasispecies”, but they seem to be using the term rather loosely, to describe just a handful of distinct genomic sequences. For example, two of these sequences differ by a single base (and a single amino acid) in the HA, with one present in about 75% of the sequences and the other in about 25%. To me that’s not really a “quasispecies”: a quasispecies is something that has to be defined by an average sequence even though the vast majority of the genomes differ from that average. (Here and here are Vincent Racaniello’s explanations at The Virology Blog.) Two sequences is just two sequences.
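To make the distinction concrete, the sketch below builds a consensus (majority-base) sequence and asks what fraction of genomes are identical to it. The two populations are made-up toy sequences, not data from either paper: one is a 75/25 split at a single position, the other a small cloud in which every genome carries its own change. The function names are illustrative only.

```python
from collections import Counter


def consensus(seqs: list[str]) -> str:
    """Majority base at each position (sequences assumed aligned, equal length)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


def fraction_matching(seqs: list[str], ref: str) -> float:
    """Fraction of sequences that are identical to `ref`."""
    return sum(s == ref for s in seqs) / len(seqs)


# Hypothetical toy populations of aligned 8-base sequences.
# "Two sequences": a 75%/25% split at a single position.
two_variants = ["ACGTACGT"] * 3 + ["ACGAACGT"]
# Quasispecies-like cloud: each genome carries its own scattered change.
cloud = ["ACGTACGA", "TCGTACGT", "ACCTACGT", "ACGTTCGT"]

for name, pop in [("two variants", two_variants), ("cloud", cloud)]:
    c = consensus(pop)
    print(name, c, fraction_matching(pop, c))
```

Both toy populations share the same consensus, but 75% of the two-variant population matches it exactly while none of the cloud does; the second situation is the one the quasispecies idea is meant to describe.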

However! The authors do make their data available. I don’t have time for a detailed look, but from what I think is a very conservative analysis, in one stretch of just 25-40 bases some 5-10% of the genomes have at least one mutation.[5] If that’s roughly true across the whole genome, then each genome would have, what, maybe a half-dozen mutations on average. That, to me, really is a quasispecies.
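One way to do that extrapolation explicitly, as a minimal sketch: back out a per-position rate from the fraction of reads with at least one mismatch in the window, then scale it to the whole genome. It assumes mismatches are independent across positions, that every mismatch is a real mutation rather than a sequencing error, and a genome of roughly 13,500 nucleotides (an approximate figure, not from the original). The counts are the ones given in note 5.

```python
# Assumptions: mismatches are independent across positions, every mismatch is
# a real mutation (not a sequencing error), and the genome is ~13,500 nt.
GENOME_LENGTH_NT = 13_500


def per_base_rate(window_fraction: float, window_len: int) -> float:
    """Per-position rate p that makes 1 - (1 - p) ** window_len equal the
    observed fraction of reads with at least one mismatch in the window."""
    return 1.0 - (1.0 - window_fraction) ** (1.0 / window_len)


def per_genome(rate: float, genome_len: int = GENOME_LENGTH_NT) -> float:
    """Expected number of differences per genome at that per-position rate."""
    return rate * genome_len


# Counts from note 5: about 120 of ~2050 hits had an internal mismatch.
observed = 120 / 2050
for window in (40, 25):  # hits covered somewhere between 25 and 40 bases
    p = per_base_rate(observed, window)
    print(f"{window}-base window: p = {p:.5f}, ~{per_genome(p):.0f} per genome")
```

Plugged in this way, the numbers give roughly 20 to 33 differences per genome, more than the half-dozen guessed in the text; any sequencing errors among the mismatches would inflate the estimate, and this sketch cannot separate them out. Either way, most genomes would differ from the consensus.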

(Do note that this is not the mutation frequency at any individual residue. No single position [apart from the two or three exceptions that the authors focused on] is mutated at a frequency much above one in a thousand, and most are probably more like one in many thousands, which is about what you’d expect.)

Myriad complicating factors separate the error frequency in these genomes from the raw error rate of the viral polymerase. Two big ones: These viruses had gone through many rounds of replication in the host, so this isn’t the error rate per replication cycle; it’s the cumulative error rate after many cycles. The virus came from a patient who had died with (and probably of) the infection, and though we don’t know how many times the original infecting virus had replicated, it was at least a half-dozen cycles, perhaps two or three times that.

[Image: Influenza virion]

On the other hand, during those replication cycles many mutations (quite likely the great majority of them) would have been deleterious or outright defective, so most of them would never have propagated; they would simply have disappeared without being counted at the end.
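These two factors pull in opposite directions, and a simple expectation shows how. The sketch assumes about 1.35 new mutations per genome copy (the rate of 1e-4 per nucleotide quoted earlier in the post, times an assumed genome of roughly 13,500 nucleotides) and a purely hypothetical 40% of new mutations being defective. It treats every other mutation as neutral and ignores partially deleterious ones. The cycle counts follow the half-dozen to roughly 18 cycles suggested above.

```python
import math

# Hypothetical parameters, chosen only for illustration.
NEW_MUTATIONS_PER_COPY = 1.35  # e.g. 1e-4 per nt x ~13,500 nt
LETHAL_SHARE = 0.4             # assumed share of new mutations that are defective


def produced(cycles: int, u: float = NEW_MUTATIONS_PER_COPY) -> float:
    """Mutations generated along one lineage over `cycles` rounds, ignoring selection."""
    return cycles * u


def retained(cycles: int, u: float = NEW_MUTATIONS_PER_COPY,
             lethal: float = LETHAL_SHARE) -> float:
    """Expected mutations carried by lineages that are still viable.

    Lethal and non-lethal mutations are modeled as independent Poisson counts.
    Discarding every copy that picked up a lethal one leaves the non-lethal
    count unchanged, so survivors carry only the (1 - lethal) share.
    Non-lethal mutations are treated as neutral.
    """
    return cycles * u * (1.0 - lethal)


def viable_share_per_cycle(u: float = NEW_MUTATIONS_PER_COPY,
                           lethal: float = LETHAL_SHARE) -> float:
    """Share of new copies with no lethal mutation (the Poisson zero class)."""
    return math.exp(-u * lethal)


for n in (6, 12, 18):  # "at least a half dozen cycles, perhaps two or three times that"
    print(n, round(produced(n), 1), round(retained(n), 1))
print("viable share per cycle:", round(viable_share_per_cycle(), 2))
```

Under these assumptions, raw production over 6 to 18 cycles runs from about 8 to 24 mutations per lineage, while viable genomes carry only about 5 to 15 of them; the rest vanish along with the copies that carried them.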

The most interesting point is that these mutations aren’t arising in a vacuum. Thinking now about which mutations survive and get detected, rather than the baseline rate at which mutations form: the variants are forming in an environment that’s designed to be very hostile to viruses, and they are going to undergo selection by the immune system.

This is one place where influenza experiences a very different set of pressures than HIV. HIV persists in the presence of the adaptive (T cell- and antibody-based) immune response, whereas flu is evicted once the adaptive response kicks in. So HIV not only has a much longer period (years instead of days) to throw out mutations, its evolution is also shaped by the adaptive immune response. By comparison, flu probably gets only a couple of replication cycles in the presence of an adaptive response.

Changes in the virus that accumulate over this handful of replication cycles would reflect strong selection pressure. The vast majority of mutations, even those that aren’t completely defective, are going to be less fit than the original virus and won’t accumulate. Knowing which mutations do accumulate should be very interesting, because it may tell us what the virus is going through in the host. That’s what the authors of this paper focused on: the one particular site that had a much, much higher variant frequency, more like 25% of the genomes. The assumption is that this variant arose during the infection and was positively selected for.[6]
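How strong would that selection have to be? A minimal sketch using the standard deterministic model of haploid selection, in which a variant with relative fitness 1 + s multiplies its odds by (1 + s) each cycle. The starting frequency of one in a thousand is a hypothetical value, and the model ignores drift, further mutation, and the possibility (raised in note 6) that the variant was present from the start.

```python
def freq_after(p0: float, s: float, cycles: int) -> float:
    """Variant frequency after `cycles` rounds of simple haploid selection,
    where the variant has relative fitness 1 + s (no new mutation, no drift)."""
    odds = p0 / (1.0 - p0) * (1.0 + s) ** cycles
    return odds / (1.0 + odds)


def s_needed(p0: float, p_final: float, cycles: int) -> float:
    """Per-cycle advantage that takes a variant from p0 to p_final."""
    odds_ratio = (p_final / (1.0 - p_final)) / (p0 / (1.0 - p0))
    return odds_ratio ** (1.0 / cycles) - 1.0


START_FREQ = 1e-3  # hypothetical frequency when the variant first appears
for n in (6, 12, 18):
    s = s_needed(START_FREQ, 0.25, n)
    print(f"{n} cycles: s = {s:.2f}, check p = {freq_after(START_FREQ, s, n):.3f}")
```

Reaching 25% in 6, 12 or 18 cycles needs a per-cycle advantage of roughly 1.6, 0.6 or 0.4 (160%, 60% or 40% more offspring than the majority sequence), which is the kind of strong positive selection the authors assume.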

The variants that replicate best in a host may be quite different from those that are transmitted effectively. That is, there may be multiple sources of selective pressure, of which we have previously seen mainly transmission pressure (because that’s the one whose effects accumulate in a population, since transmission represents a bottleneck in the virus’s evolution; see The Virology Blog). The particular HA variant picked up here (which apparently accumulated during the infection) is rare globally. Is that a version of the HA that’s more efficient within a host but doesn’t transmit as well?
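Part of what a transmission bottleneck does is pure sampling, independent of fitness. The sketch below is a simple probability calculation, assuming the next infection is started by a small number of virions drawn independently from the donor; the founder counts are hypothetical, and the 25% frequency is the within-host HA variant described above.

```python
def prob_variant_passes(freq: float, founders: int) -> float:
    """Chance that at least one of `founders` virions starting the next
    infection carries a variant at frequency `freq` in the donor
    (independent draws; says nothing about whether it then grows)."""
    return 1.0 - (1.0 - freq) ** founders


for n in (1, 2, 5, 20):
    print(n, round(prob_variant_passes(0.25, n), 2))
```

With one or two founders, a variant at 25% is more likely to be left behind than carried forward, whatever its fitness; only wider bottlenecks make its passage close to certain.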

I think a major reason for the difference between HIV and influenza variant accumulation is the difference between within-host and between-host (transmission) selection. HIV spends long, long periods within a single host, thousands of replication cycles, accumulating mutations. The transmission bottlenecks come at much longer intervals and have a much larger accumulated population to work with. Influenza has a comparatively brief period within the host, only a handful of replications before a new transmission bottleneck hits.[7]
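To make the within-host versus between-host contrast concrete, here is a toy neutral simulation. Everything in it is hypothetical: the population size, the mutation rate, a single founder per transmission, and no selection at all. Both scenarios run the same 400 replication cycles in total; the "flu-like" one passes through a one-virion bottleneck every 10 cycles, while the "HIV-like" one stays in a single host throughout. It reports within-host diversity as the mean number of differences between pairs of genomes in the final host.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's method for a Poisson draw (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def mean_pairwise_diff(pop: list[int]) -> float:
    """Average number of mutations by which two genomes differ.
    Each genome is an int bitmask: bit i set means it carries mutation i."""
    n = len(pop)
    total = sum(bin(pop[i] ^ pop[j]).count("1")
                for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)


def simulate(hosts: int, cycles_per_host: int, *, pop_size: int = 100,
             founders: int = 1, u: float = 0.1, seed: int = 1) -> float:
    """Toy neutral model: each host starts from `founders` genomes drawn from
    the previous host, then replicates for `cycles_per_host` rounds
    (Wright-Fisher resampling to `pop_size`, plus Poisson(u) new mutations
    per copy). Returns within-host diversity in the final host."""
    rng = random.Random(seed)
    next_id = 0
    pop = [0]  # a single mutation-free ancestor
    for _ in range(hosts):
        pop = rng.sample(pop, min(founders, len(pop)))  # transmission bottleneck
        for _ in range(cycles_per_host):
            new_pop = []
            for _ in range(pop_size):
                genome = rng.choice(pop)
                for _ in range(poisson(rng, u)):
                    genome |= 1 << next_id  # infinite sites: every mutation is new
                    next_id += 1
                new_pop.append(genome)
            pop = new_pop
    return mean_pairwise_diff(pop)


if __name__ == "__main__":
    # Same total number of replication cycles (400) in both scenarios.
    flu_like = simulate(hosts=40, cycles_per_host=10)
    hiv_like = simulate(hosts=1, cycles_per_host=400)
    print(f"flu-like: {flu_like:.1f}  hiv-like: {hiv_like:.1f}")
```

Under these assumptions the HIV-like run should end with far more within-host diversity, simply because its genealogy has had much longer to deepen since the last bottleneck. It is a sketch of the bottleneck argument, not a model of either virus.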

This sort of deep-sequencing experiment on influenza will probably improve over the next few years, and I’ll be very interested to see just how much variation there really is within one flu-infected host.


  1. Walker, B., & Burton, D. (2008). Toward an AIDS Vaccine. Science, 320 (5877), 760-764. DOI: 10.1126/science.1152622
  2. More correctly, I suppose, we infer the presence of thousands of new variants based on the hundreds of them that we see, and knowing that we are only examining a tiny fraction of all the flu cases that are out there.
  3. Ramakrishnan, M., Tu, Z., Singh, S., Chockalingam, A., Gramer, M., Wang, P., Goyal, S., Yang, M., Halvorson, D., & Sreevatsan, S. (2009). The Feasibility of Using High Resolution Genome Sequencing of Influenza A Viruses to Detect Mixed Infections and Quasispecies. PLoS ONE, 4 (9). DOI: 10.1371/journal.pone.0007105
  4. Kuroda, M., Katano, H., Nakajima, N., Tobiume, M., Ainai, A., Sekizuka, T., Hasegawa, H., Tashiro, M., Sasaki, Y., Arakawa, Y., Hata, S., Watanabe, M., & Sata, T. (2010). Characterization of Quasispecies of Pandemic 2009 Influenza A Virus (A/H1N1/2009) by De Novo Sequencing Using a Next-Generation DNA Sequencer. PLoS ONE, 5 (4). DOI: 10.1371/journal.pone.0010256
  5. I extracted the FASTQ data containing the short sequence reads matching influenza sequences from the supplemental PDF, converted it to FASTA, and used xdformat to move it into a BLAST database. Then I grabbed 40 bases from the GenBank sequence CY045951.1, the PB2 segment of the closest-match influenza strain, choosing a region (positions 2151-2190) with very high coverage, and BLASTed this sequence against the short sequence data, using parameters such that I retrieved sequences that match at least 25 of 40 positions. Of the ~2050 hits I retrieved, about 120 had at least one internal mismatch. I can’t distinguish these from sequencing errors, but I think it’s much higher than you’d expect from sequencing error. And I hope that my conservative approach (for example, I would have discarded mismatches at the ends of the hits) would balance out that source of confusion.
  6. One point, by the way, that the authors didn’t cover was the possibility that this patient had actually been initially infected with more than one viral sequence. We do know that a significant number of flu cases are doubly infected. The fact that the minor variant is a very unusual strain makes this less likely, but not impossible.
  7. And I think it’s fair to say that the global population-based HIV variation (the transmission-selected amount of variation, as opposed to the vast within-individual variation) is rather more comparable to that of influenza.
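Note 5 describes its analysis in words: FASTQ reads converted to FASTA, loaded into a BLAST database with xdformat, searched with a 40-base window from CY045951.1, and the hits with internal mismatches counted. The sketch below is a rough, BLAST-free stand-in for illustration only, not the procedure actually used. It assumes clean four-line FASTQ records, swaps the BLAST search for a naive ungapped scan of each read against the reference window on both strands (keeping the "at least 25 matching positions" cutoff), and uses a hypothetical `trim` parameter to ignore mismatches near the ends of each hit. All function names are made up for the example.

```python
from typing import Iterable, Iterator, Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fastq(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (name, sequence) from well-formed, four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        next(it)  # quality line, not used here
        yield header[1:].strip(), seq


def to_fasta(records: Iterable[tuple[str, str]]) -> str:
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def best_ungapped_hit(read: str, ref: str, min_matches: int = 25):
    """Slide `read` along `ref` without gaps; return (matches, mismatch
    positions within the overlap, overlap length) for the best offset,
    or None if no offset reaches `min_matches`."""
    best = None
    for offset in range(-len(read) + 1, len(ref)):
        pairs = [(read[i], ref[i + offset]) for i in range(len(read))
                 if 0 <= i + offset < len(ref)]
        mismatches = [k for k, (a, b) in enumerate(pairs) if a != b]
        matches = len(pairs) - len(mismatches)
        if matches >= min_matches and (best is None or matches > best[0]):
            best = (matches, mismatches, len(pairs))
    return best


def has_internal_mismatch(read: str, ref: str, trim: int = 3,
                          min_matches: int = 25) -> Optional[bool]:
    """True/False for reads that hit `ref` on either strand, None otherwise.
    Mismatches within `trim` bases of either end of the overlap are ignored."""
    hits = [h for h in (best_ungapped_hit(read, ref, min_matches),
                        best_ungapped_hit(revcomp(read), ref, min_matches)) if h]
    if not hits:
        return None
    _, mismatches, length = max(hits, key=lambda h: h[0])
    return any(trim <= k < length - trim for k in mismatches)


def mismatch_fraction(reads: Iterable[str], ref: str) -> float:
    """Share of hitting reads that carry at least one internal mismatch."""
    flags = [f for f in (has_internal_mismatch(r, ref) for r in reads) if f is not None]
    return sum(flags) / len(flags) if flags else 0.0
```

Applied to a list of read sequences and the 40-base window, `mismatch_fraction` yields the same kind of number as the ~120 of ~2050 in note 5; like the original analysis, it cannot tell real mutations from sequencing errors.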