Mystery Rays from Outer Space

Meddling with things mankind is not meant to understand. Also, pictures of my kids

January 15th, 2010

I’ll see your bornaviruses, and raise with a poxvirus

There’s been recent excitement over the discovery of bornaviruses fixed in the human genome1, 2.  Exciting and unexpected as that is, as usual, the insects are way ahead of us.  The genome of a parasitoid wasp has poxvirus sequences in it!

Detecting ancient lateral transfers is more problematic. By examining protein domain arrangements in Nasonia relative to other organisms, we uncovered an ancient lateral gene transfer involving Pox viruses, Wolbachia, and Nasonia. Thirteen ANK repeat–bearing proteins encoded in the N. vitripennis genome also contain C-terminal PRANC (Pox proteins repeats of ankyrin–C terminal) domains. This domain was previously only described in Pox viruses, where it is associated with ANK repeats and inhibits the nuclear factor κB (NF-κB) pathway in mammalian hosts …3

These parasitic wasps are not the same family as the magnificent braconid parasitic wasps that have developed a symbiotic relationship with polydnaviruses (see my posts here and here), and braconids’ incorporation of nudivirus genomes already trumps the bornavirus findings.  But still.  Poxviruses!

I don’t think we have any functional information on what the Nasonia are doing with the poxvirus genes here, and I know very little about wasp biology, but given that:

  • in mammals these genes are inhibitors of the innate immune response,
  • the innate immune response is relatively conserved from insects to humans, and
  • Braconid wasps use their symbiotic viruses to inhibit their prey’s immune responses,

I wonder if the Nasonia have independently come up with the same idea as Braconids, and incorporated a viral immune evasion molecule to use in their venom to suppress their prey’s immune response to the wasp’s eggs and larvae.


  1. Original paper in Nature
  2. See commentary in the New York Times; the Virology Blog; and Not Exactly Rocket Science
  3. The Nasonia Genome Working Group (2010). Functional and Evolutionary Insights from the Genomes of Three Parasitoid Nasonia Species Science, 327 (5963), 343-348 DOI: 10.1126/science.1178028

January 13th, 2010

How is avian influenza evolving?

Carrel et al 2010 PLoS ONE Fig5
Geographic distribution of H5N1 highly pathogenic avian
influenza viruses (HPAIVs) used in this study. Darkened
provinces indicate locations of virus isolation.

“We found several patterns that suggest one general model of evolution in this viral system: 1) within regions, viral mixing in poultry moves toward heterogeneity and the emergence of local types; 2) differentiation was centered around regional viral hubs located at centers of human and bird population density; and 3) evolution occurs because of relative isolation of the hubs, most likely fed by the abundant supply of domesticated poultry (and people) at the hubs. The analysis thus suggests that at the scale of neighboring city hubs and the intervening hinterland, evolution of H5N1 follows the pattern described by classical theory of genetic differentiation due to isolation by distance.” 1

Carrel et al 2010 PLoS ONE Fig1
Genetic versus geographic distance of HK821-like HPAIVs in Vietnam.

This is in Vietnam, and the basic finding was that H5N1 viruses isolated in Vietnam show signs of local evolution, in that the viruses cluster into local sub-strains in different areas of the country.  I’m not all that knowledgeable about H5N1 spread, but I had thought that infection of wild, especially migratory, birds would be an important factor in spreading H5N1 between chickens.  If I’m interpreting this paper right, it looks as if H5N1 is mostly circulating within local regions, within the chicken population, and distant spread isn’t a major factor. That has obvious implications for control of the virus.


  1. Carrel, M., Emch, M., Jobe, R., Moody, A., & Wan, X. (2010). Spatiotemporal Structure of Molecular Evolution of H5N1 Highly Pathogenic Avian Influenza Viruses in Vietnam PLoS ONE, 5 (1) DOI: 10.1371/journal.pone.0008631

January 12th, 2010

Is FIP really a mutant?

Coronavirus (SARS) - Wellcome Images
Coronavirus (SARS)

I try to stop myself from opening these posts with a line like, “One of the weirdest viruses … “, or “The most bizarre viral disease … ” or whatever, because viruses are all so weird and interesting that all my posts would start that way. But anyway: One of the most interesting viral diseases is FIP (Feline Infectious Peritonitis). Except that a recent paper argues — much to my relief — that maybe FIP isn’t quite as unconventional as all that. But it’s still really interesting.

When I was in veterinary practice, FIP was one of the regular horrors we saw. First, it’s often very hard to diagnose: in its early stages it’s a “Great Pretender” disease that mimics several other diseases and often causes fairly non-specific and vague symptoms. I’m pretty sure just about every feline practitioner has missed at least one case of early FIP. Second, it’s essentially always fatal. There’s no treatment, the disease is progressive, and I’ve never heard of a cat recovering. And without getting caught up in the more disgusting details1 it’s often a really slow and unpleasant way to die. I just hated to see it. Unfortunately, it’s a reasonably common disease.

FIP is a viral disease, or at any rate the underlying cause is a coronavirus. 2 The symptoms are actually caused by a hyperactive, but ineffective, immune response to the virus. 3 The really weird wrinkle to FIP is that the disease-causing virus is believed to be a spontaneous mutant of a very common, but benign, coronavirus that is present in the gut of most cats.

Here’s how the standard model for FIP works. This is what I teach my class in Veterinary Microbiology & Immunology each fall. 4 Cats are infected with Feline Enteric Coronavirus (FECV). It’s not a problem for the cat; maybe it causes mild diarrhea, more likely there are no symptoms. In most cats, that’s the end of the story. But in rare cases, the coronavirus mutates — the details aren’t clear, and the precise mutation isn’t known — and turns into the FIP variant of the virus, FIPV. This virus doesn’t just live in the gut, it spreads throughout the body. The cat immune system tries to control the virus, and fails. The presence of continued viral antigen continues to drive the immune response, and there’s a hyperactive response, that doesn’t clear the virus. This continuous hyperactive response is what we see as FIP disease. (There are almost certainly host factors involved, too. Probably most cats have no problem with FIPV, and only a certain genotype of cat goes into the hyperimmune state that leads to disease. But we have no idea what those factors are.)

Coronavirus (UGA)
Coronavirus

So the basic idea is that a mutation changes a harmless virus into a horrible one. There are some critical implications of this model:

  • The FIPV genomic sequence in a cat should be more similar to the benign FECV in that area, than to FIPV in other areas
  • FIP, the disease, is not infectious. Cats already have the benign virus, and the disease-causing variant doesn’t spread from one cat to the next. 5

OK, so this is a pretty unusual situation. The only similar example that comes to mind is, maybe, measles and subacute sclerosing panencephalitis.  6 Another, much more conventional, explanation would be that there are actually two distinct strains of feline coronavirus, and one of them causes FIP while the other causes benign enteric infection. There are several arguments against this more conventional model, but then on the other hand the support for the mutation model is not terribly strong — in particular, the actual, specific hypothetical mutation that changes FECV into FIPV isn’t known. Two recent papers offer some modest support for the mutation model, while two other papers argue against it.

The two supporting papers7 ,8 both looked at the genomic sequences of FIPV vs. FECV, and found that (as predicted by the mutation model) the FIPV overall looked like local FECV, but had unique mutations in specific genes:

Coronavirus from feces and extraintestinal FIP lesions from the same cat were always >99% related in accessory and structural gene sequences. SNPs and deletions causing a truncation of the 3c gene product were found in almost all isolates from the diseased tissues of the eight cats suffering from FIP, whereas most, but not all fecal isolates from these same cats had intact 3c genes. Other accessory and structural genes appeared normal in both fecal and lesional viruses. Deliterious [sic] mutations in the 3c gene were unique to each cat …7

(It’s worth noting that these studies didn’t do whole-genome sequencing, but rather looked at less than a third of the genome. Feline coronavirus is a medium-sized virus, at around 29 kb,9 very roughly twice as large as influenza viruses, and the resources aren’t there for the sort of large-scale genomic sequencing we see for influenza.)

One weak strike against the mutation model comes from a recent paper10 from Matti Kiupel and Roger Maes, here at MSU. (Roger mentioned this to me last week at his student’s thesis defense,11 which is what reminded me to write about this today.) They describe a disease very much like FIP, in ferrets, also caused by a coronavirus; but in this case, the causative virus seems to be quite clearly a different strain than the endogenous ferret enteric coronavirus, although they still interpret this as a recent mutation:

The virus present in the samples tested also was not identical to the recently described ferret enteric coronavirus, FECV-MSU1, but appears to be most closely related to it by phylogenetic analysis … Further genomic sequencing will be required to more definitively characterize this systemic ferret coronavirus. The relatively recent recognition of this disease in pet ferrets suggests the occurence of a recent mutation or shift in the FECV that results in this disease, similar to the mutations that occur in FCoV preceding the development of FIP.10

Feline coronavirus phylogeny (O'Brien)
Feline coronavirus phylogeny (see Figure 4 of the original paper for detail)

So that’s circumstantial evidence that different strains could be involved. The most intriguing paper comes from Stephen O’Brien’s lab.  12  They sequenced chunks of FIPV and FECV (again, not the whole genome) and their conclusion was opposite to the previous groups: They feel that FIPV is a distinct strain of coronavirus, different from the enteric strain. Strains associated with FIP fell into one phylogenetic cluster, while those associated with enteric disease fell into a different cluster:

First, gene sequences from healthy cats infected with FECV displayed a monophyletic cluster pattern that was generally distinctive from cats diagnosed with FIP in the membrane, NSP 7b, and spike-NSP3 gene segments … Similar reciprocal monophyly of 140 NSP7b sequences was obtained for FIP cases versus FECV-asymptomatic cats (Figure 4). A consistent disease driven phylogeographic sorting was also observed for the 1,017-bp sequence spanning the spike-NSP3 genes, albeit with less statistical resolution, likely because of evolutionary constraints on gene divergence in this region (Figure 4). Together the remarkable reciprocal monophyly in these 3 genes supports the predictions of the circulating virulent-avirulent strain hypothesis illustrated in Figure 1.12

They also offer one specific, very telling, example of a cat that developed FIP during their screening. They had isolated one coronavirus before it developed disease, but the virus that was associated with disease was quite different:

However, virus isolated 7 months later in December 2004 after FIP developed in Fca-4590 fell within the FIP-case clades (also with high bootstrap), and was indistinguishable from FCoV isolated from other cats with FIP. This finding suggested that the pathogenic FIP-case type of FCoV infected this cat subsequent to its infection with an avirulent FECV and apparently replaced it. 12

I don’t have a dog in this fight, but O’Brien’s data look very compelling, and they support a really simple and conventional model, while the status quo model (though far from impossible) seems to call up a couple implausible events.


  1. One word: Pyogranulomatous
  2. Same viral family as SARS. Remember SARS?
  3. This is reminiscent of Dengue virus, in which the most severe form of the disease occurs when there’s an ineffective immune response that amplifies the infection.
  4. A good outline is in the recent paper:
    Chang, H., de Groot, R., Egberink, H., & Rottier, P. (2009). Feline infectious peritonitis; insights into feline coronavirus pathobiogenesis and epidemiology based on genetic analysis of the viral 3c gene Journal of General Virology DOI: 10.1099/vir.0.016485-0
  5. This isn’t strictly a prediction of the model as I’ve presented it. You could imagine that the virulent mutant would still be infectious. But in practice, FIP doesn’t seem to spread much if at all, and the model does account for that.
  6. There may be other coronaviruses that do something similar, but I don’t think they’re any better confirmed than the feline situation
  7. Pedersen, N., Liu, H., Dodd, K., & Pesavento, P. (2009). Significance of Coronavirus Mutants in Feces and Diseased Tissues of Cats Suffering from Feline Infectious Peritonitis Viruses, 1 (2), 166-184 DOI: 10.3390/v1020166
  8. Chang, H., de Groot, R., Egberink, H., & Rottier, P. (2009). Feline infectious peritonitis; insights into feline coronavirus pathobiogenesis and epidemiology based on genetic analysis of the viral 3c gene Journal of General Virology DOI: 10.1099/vir.0.016485-0
  9. Though one of the papers calls it a “huge genome”, that’s stretching the definition of “huge”
  10. Clinicopathologic Features of a Systemic Coronavirus-Associated Disease Resembling Feline Infectious Peritonitis in the Domestic Ferret (Mustela putorius). M. M. Garner, K. Ramsell, N. Morera, C. Juan-Sallés, J. Jiménez, M. Ardiaca, A. Montesinos, J. P. Teifke, C. V. Löhr, J. F. Evermann, T. V. Baszler, R. W. Nordhausen, A. G. Wise, R. K. Maes and M. Kiupel. Vet Pathol 45:236-246 (2008)
  11. Congratulations again, Sheldon!
  12. Brown, M. (2009). Genetics and Pathogenesis of Feline Infectious Peritonitis Virus Emerging Infectious Diseases, 1445-1452 DOI: 10.3201/eid1509.081573

January 11th, 2010

On evolution of the immune system

As if understanding this complex evolutionary puzzle were not already sufficiently challenging, we have learned recently that two types of adaptive immune system have evolved in vertebrates: a recently recognized system in jawless vertebrates (hagfish and lamprey) and the more familiar adaptive immune system of jawed vertebrates. … This leads to the conjecture that two interactive lymphocyte arms are a fundamental feature of the adaptive immune system that was selected to provide balance and self-regulation.

Cooper, M., & Herrin, B. (2010). How did our complex immune system evolve? Nature Reviews Immunology, 10 (1), 2-3 DOI: 10.1038/nri2686

January 7th, 2010

H1N1: It’s not going away

I haven’t seen much mention of this: After decreasing for 7 weeks, hospital visits for influenza-like illness in the US held steady for a week and have now started to increase again. The CDC chart:

CDC influenza incidence, week 51

On the other hand, Google flu trends only shows a slight increase, in line with last year, and lab-confirmed flu cases decreased slightly in this period.  Nevertheless, it will be no surprise if influenza has two peaks this season (though I expected a longer respite between peaks).  I’m guessing we’re going to see an increase in flu again over the next few weeks.

Edit: OK, well, maybe I’m jumping the gun. The rise looks like it may just be a blip, dropping down again in the subsequent week:

CDC influenza incidence, week 52

January 6th, 2010

How could vaccinia virus block T helpers?

Smallpox pustules (R. Carswell, 1831)

In contrast to the many viruses that block antigen presentation by MHC class I, only a handful appear to block presentation by MHC class II.  I don’t understand why any would try to block MHC class II in the first place, but another example of it has just been published.

A little background: Major histocompatibility complexes (MHC) are recognized by T cells. T cells come in several flavors, the best-understood of which are CD4 (T Helper) and CD8 (cytotoxic T lymphocyte; CTL) lymphocytes. CD8 T cells are fairly specialized to deal with cells infected with viruses;1 they recognize MHC class I. CD4 T cells are at the top of the adaptive immune response; they coordinate subsequent responses, by calling in other cell types, driving antibody or CTL responses, and so on.

MHC class I is on the surface of most cells, as you’d expect, because most cells can be infected with viruses. MHC class I is, among other things, a way of directing the CTL attack to the appropriate, virus-infected, cell, and so they deal, fairly strictly, with what’s going on inside their own particular cell. They don’t take up proteins from outside the cell, because then the cell might get killed when it’s actually a neighbor that’s infected. 2

MHC class II, on the other hand, is a general alarm call that signals “Something’s invading the body, somewhere”. MHC class II is only on a limited number of cells, but those cells do take up protein from outside themselves and show it to CD4 T cells. Presentation on MHC class II does not mean that the particular cell is infected.

So it’s quite logical that viruses would be interested in blocking MHC class I, and as I say there are now many examples of viruses that do so. It’s also logical for viruses to want to block MHC class II, since doing so would reduce all the immune responses against them — antibodies, T cells, whatever.

But how would that work? Again: The cells that do MHC class II antigen presentation are not necessarily infected cells. If a virus is going to block MHC class II, it would have to go out of its way to infect the MHC class II-presenting cells (known as professional antigen-presenting cells; APC). Not only that, it would probably have to infect a lot of them, to make a real impact on the overall CD4 T cell response, because even a few unaffected APC will drive a fairly significant immune response, making the suppressed ones irrelevant.

So even though viruses might “want” to block MHC class II, there are practical problems that make it hard to do. Nevertheless, there are a couple of viruses that have genes that can block MHC class II. Human cytomegalovirus is the clearest example, I think,3 and several groups have shown that vaccinia virus blocks MHC class II presentation in infected cells.4 Now a paper in Virology5 argues that the vaccinia gene catchily called “A35” is responsible for this block. Since close relatives of A35 are present in many other poxviruses, MHC class II blockade may be widespread in this family.

A35 colocalizes with RhoB in endosomes
Colocalization between A35 and RhoB in endosomes5

The data are reasonably convincing, though there are some complications. 6 But I’m still puzzled by how this is supposed to work. Vaccinia virus, and poxviruses in general, aren’t renowned for infecting dendritic cells and macrophages, which are the cell types they’d have to efficiently target if MHC class II blockade was to help them.

Removing A35 from vaccinia makes it much less virulent in mice:

A mutant A35 deletion virus (A35Δ) replicated normally in several tissue culture cell lines, but was highly attenuated (100–1000 fold) in the intranasal and intraperitoneal mouse challenge models7

And apparently this is associated with a reduced immune response to the virus:

Thus far our animal model data are consistent with this hypothesis, showing a reduction in both VV specific antibody and splenic T lymphocyte responses. 8

Which is consistent with a blockade of MHC class II, true, but if you have reduced viral replication for any reason you’d also expect reduced immune responses, because there would be less viral antigen to drive the response. That is, even though A35 blocks MHC class II, and A35 increases virulence, I’m not convinced that A35 increases virulence because it blocks MHC class II. Viral proteins are notoriously multifunctional, and I wonder if the MHC class II blockade is just one function of A35; or perhaps even if it’s just a side-effect of the “real” virulence function.

I’m open to the notion that A35 (and other viral proteins) are true MHC class II blockers, and that this is functionally important, but I’d like to see more data before I put it in the bank.


  1. Also, intracellular bacteria, intracellular parasites, and tumor cells
  2. There are exceptions to this rule, including an important phenomenon called “cross-priming” or “cross-presentation”, but that’s not relevant to this discussion now.
  3. For example, Johnson DC, Hegde NR. Inhibition of the MHC class II antigen presentation pathway by human cytomegalovirus. Curr Top Microbiol Immunol. 2002;269:101-15.
  4. For example, Li, P., Wang, N., Zhou, D., Yee, C.S., Chang, C.H., Brutkiewicz, R.R., Blum, J.S., 2005. Disruption of MHC class II-restricted antigen presentation by Vaccinia virus. J. Immunol. 175 (10), 6481–6488.
  5. Rehm, K., Connor, R., Jones, G., Yimbu, K., & Roper, R. (2009). Vaccinia virus A35R inhibits MHC class II antigen presentation Virology DOI: 10.1016/j.virol.2009.11.008
  6. For example, it looks as if there may be other genes, besides A35, that also contribute to MHC class II blockade.
  7. Roper, R.L., 2006. Characterization of the Vaccinia virus A35R protein and its role in virulence. J. Virol. 80 (1), 306–313.
  8. Rehm, K.E., Jones, G.J.B., Tripp, A.A., Metcalf, M.W., and Roper, R.L., in press. The Poxvirus A35 Protein is an Immunoregulator. J. Virol.

December 24th, 2009

Pandemic flu and disease

It’s been a busy couple weeks, and tomorrow we’re heading out for a week’s vacation. I don’t know what my internet access will be like, but probably not too good, so this might be the last Mystery Rays post for 2009.

Quick notes from a series of articles in New England Journal of Medicine on disease caused by the pandemic swine-origin influenza virus:

… the majority of those infected have a mild illness. The typical period during which the virus can be detected with the use of real-time RT-PCR is 6 days (whether or not fever is present). The duration of infection may be shortened if oseltamivir is administered1

I bolded the part about oseltamivir (Tamiflu) because of the recent controversy (see the Avian Flu Diary, here and here) about Tamiflu’s effectiveness.

But even though the pandemic is usually mild, several groups are at unusual risk:

2009 H1N1 influenza can cause severe illness and death in pregnant and postpartum women; regardless of the results of rapid antigen testing, prompt evaluation and antiviral treatment of influenza-like illness should be considered in such women. The high cause-specific maternal mortality rate suggests that 2009 H1N1 influenza may increase the 2009 maternal mortality ratio in the United States. 2

And:

Pandemic 2009 H1N1 influenza was associated with pediatric death rates that were 10 times the rates for seasonal influenza in previous years … Most deaths were caused by refractory hypoxemia in infants under 1 year of age (death rate, 7.6 per 100,000). 3

There’s an emerging sense that much of the young-person mortality associated with the pandemic flu4 is due fairly directly to the virus itself, rather than to subsequent bacterial infection.   That was probably not true for the 1918 influenza pandemic, where bacterial infections were a major part of the high mortality rates:

… bacterial infections, especially pneumococcal infections, were a major cause of influenza-associated pneumonia and death among both military personnel and civilians in 1918–1919. The distribution of pneumococcal serotypes shifted toward less invasive serotypes during that period as compared with the pre-1918 period, suggesting that the 1918 influenza virus increased host susceptibility to less-invasive pneumococci.5

And, as the authors note, underdeveloped countries may have higher mortality from the present pandemic, if there’s more risk of bacterial superinfection.


  1. Cao, B., Li, X., Mao, Y., Wang, J., Lu, H., Chen, Y., Liang, Z., Liang, L., Zhang, S., Zhang, B., Gu, L., Lu, L., Wang, D., & Wang, C. (2009). Clinical Features of the Initial Cases of 2009 Pandemic Influenza A (H1N1) Virus Infection in China New England Journal of Medicine, 361 (26), 2507-2517 DOI: 10.1056/NEJMoa0906612
  2. Janice K. Louie, Meileen Acosta, Denise J. Jamieson, Margaret A. Honein, for the California Pandemic (H1N1) Working Group (2009). Severe 2009 H1N1 Influenza in Pregnant and Postpartum Women in California New England Journal of Medicine
  3. Romina Libster et al. (2009). Pediatric Hospitalizations Associated with 2009 Pandemic Influenza A (H1N1) in Argentina New England Journal of Medicine
  4. At least, in those places where autopsies have been consistently performed; which is biased against underdeveloped countries
  5. Chien, Y., Klugman, K., & Morens, D. (2009). Bacterial Pathogens and Death during the 1918 Influenza Pandemic New England Journal of Medicine, 361 (26), 2582-2583 DOI: 10.1056/NEJMc0908216

December 19th, 2009

Baseball: Predictive value of UZR

Jacoby Ellsbury diving catch
Typical Ellsbury catch

(I see it’s been a long time since my last baseball-related post. Here’s a long one to make up, so I’m good for another year, baseball-post-wise.)

Jacoby Ellsbury, the centerfielder for the Boston Red Sox, was just given the “Defensive Player of the Year” award, as voted by baseball fans in the “This Year In Baseball Awards.”  This is interesting because, statistically, Ellsbury in 2009 was actually one of the worst defensive players in all of baseball. Does this mean that baseball fans got it wrong, or did the statistics lie to us?

The answer isn’t immediately obvious, because it’s generally agreed that statistical analysis of baseball defense1 still lags well behind most offensive and even pitching measurements. If a defender makes a catch, is that because everyone, including me, would have caught it? Was it a difficult catch, but one that most major-league players should have made? Or was it a really difficult catch, that almost every major-leaguer would have flubbed? Fans tend to judge these by the spectacularity of the catch, and there’s no doubt that Ellsbury made his share of spectacular, diving, all-out catches in 2009. But did he make them harder than they were?

In this case, actually, defensive stats and a lot of astute observers agree. Ellsbury didn’t get “good jumps” on a lot of hits this year — he hesitated before running, and he may not have run to the best spot for a catch. As a result, he had to make spectacular catches, on hits that a better defender would have caught quite routinely.

Easy, hard, or making it harder

J.D. Drew catch
Typical J.D. Drew catch

An interesting contrast to Ellsbury is the guy to his left, J.D. Drew, who plays right field for the Red Sox. Drew makes few spectacular catches, rarely diving or getting mussed up. His catches almost always look routine and easy. But statistically, he was a much better fielder than Ellsbury in 2009. In UZR/150 games, Ellsbury was -18.3, while Drew was at +15.7 — second-highest among right-fielders2 in the majors last year. Again, many careful observers agree with the stats; Drew doesn’t have to make spectacular catches, because he instantly sees where the ball will go, moves to the right place immediately, and makes the catch easily. Casual fans don’t think of Drew as a high-quality defender, because he doesn’t seem to make difficult plays; but in fact, he is making the difficult plays, he’s just making them look easy.

So UZR seems to agree fairly well with careful observers’ analysis of Ellsbury and Drew’s respective abilities. But that’s not really the interesting question. It’s mildly entertaining to say, “Yeah, baseball fans got it wrong”, but it’s much more interesting to ask what this tells us about the future. In other words: UZR seems to be a reasonable descriptive statistic. Is it a useful predictive statistic as well? For example, should Ellsbury play CF for the Sox next year? What are the chances that he’ll be a good centerfielder next year? What are the chances that Drew was just lucky this year, and next year he’ll be a lousy right fielder?

The numbers and the future

And here the waters get much more muddy. One puzzling point is that in 2008, Ellsbury was an excellent fielder, by UZR/150. He was pretty good at CF (+6.9) and superb in right field (+18.6, although in limited time: 36 games). Do players often show this kind of 20-odd point swing in UZR? And what does it tell us about that player’s future? (My answer, for those with tl;dr disease: Based on history, Ellsbury has about a 40% chance of being an average or better fielder next year, and a 13% chance of being either good, or very good.)

Let’s ask a number of questions about UZR/150 and its ability to predict future defense.

(1) How well does a player’s UZR/150 correlate, year to year? That is, if a player has a particular UZR/150 this year, how similar will his next year’s UZR/150 be?

(2) How many players have the kind of huge drop in UZR/150 that Ellsbury showed? What happened to their defense in the years following that drop?

(3) If a defender is “very bad” this year,3 how likely is it that he’ll be a decent or good defender next year?

I’ve scraped FanGraphs’ fielding ratings and dumped them into an SQLite database on my own computer, so that I can look at these questions. 4 Here are my attempts at answers.
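For anyone who wants to try something similar, here is a minimal sketch of how the year-to-year pairing could be done in SQLite from Python. The table name, column names, and every row below are assumptions invented for illustration; the post does not describe the real schema, and none of these numbers are FanGraphs data.

```python
import sqlite3

# Hypothetical schema: the real table layout is not described in the post.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE fielding (
        player   TEXT,
        year     INTEGER,
        position TEXT,
        games    INTEGER,
        uzr150   REAL
    )
""")

# Toy rows, invented for illustration only (not real FanGraphs data).
rows = [
    ("Player A", 2007, "CF", 140, 12.0),
    ("Player A", 2008, "CF", 150, -9.5),
    ("Player A", 2009, "CF", 120, 3.0),
    ("Player B", 2008, "RF", 40, 18.0),   # too few games to qualify
    ("Player B", 2009, "RF", 130, 15.5),
]
conn.executemany("INSERT INTO fielding VALUES (?, ?, ?, ?, ?)", rows)


def season_pairs(conn, min_games=65):
    """Pair each qualifying season with the same player's following
    season at the same position (both seasons must exceed min_games)."""
    query = """
        SELECT a.player, a.position, a.year, a.uzr150, b.uzr150
        FROM fielding AS a
        JOIN fielding AS b
          ON a.player = b.player
         AND a.position = b.position
         AND b.year = a.year + 1
        WHERE a.games > ? AND b.games > ?
        ORDER BY a.player, a.year
    """
    return conn.execute(query, (min_games, min_games)).fetchall()


pairs = season_pairs(conn)
for pair in pairs:
    print(pair)
```

The self-join is the useful part: each row that comes back holds one season's UZR/150 next to the following season's, which is the shape all three questions need.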

Correlations between years for UZR/150

(1) UZR/150 from one year does correlate with the next, but not very well. If we limit our analysis to outfielders who spent more than 65 games at a particular position in a year,5 and plot out each year vs. the following year in a scatter graph, the R² is just 0.1823, and it only gets a little better (0.2505) if we limit it to players with at least 150 games at the position (see the figure at left).  In fact I tried all kinds of variations, and the only R² that was over 0.5 was if I limited the analysis to the very best and the very worst outfielders6  who played over 130 games at a position; there the correlation with their next year was 0.5494.

So yes, there’s some correlation, on the bulk level.  But not much.  On an individual basis — which is, of course, what we’re interested in — you couldn’t be at all confident that next year’s UZR/150 will be very similar to this year’s.
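As a rough sketch of what that number means: for a straight-line fit of next year's UZR/150 against this year's, R² is the square of the Pearson correlation between the two columns. The function below computes it directly. The function name and the toy values are mine, made up for illustration, and I'm assuming the R² quoted above came from an ordinary linear trendline.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences.

    For a simple linear regression of ys on xs, this equals the R^2
    of the fitted line.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


# Toy UZR/150 values, invented for illustration (not real player data).
this_year = [10.0, -5.0, 2.0, 15.0, -12.0]
next_year = [4.0, 3.0, -6.0, 8.0, -2.0]
print(round(r_squared(this_year, next_year), 4))
```

An R² of 0.18 means that this year's number accounts for less than a fifth of the variance in next year's, which is why individual predictions are so shaky.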

(2) 20-point drops in UZR/150 aren’t unheard of, and players can bounce back from them. This kind of swing isn’t all that common, but it’s happened.  I turned up 24 outfielders since 20027 who had at least a 20-point change in UZR/150 from one year to the next. Of the 21 players with at least a 20-point drop, 7 of them stopped playing that position.  Six of the rest had a drop in 2009, and some of those won’t be back.  There were 12 who had a 20-point drop and continued at the position; 8 and of these, at least half bounced back, at least temporarily:

  • Andruw Jones (from 34.7 in 2005 to 13.1 in 2006; then 22.2, then 0.2, and then out of the position)
  • Kenny Lofton (19.9 in 2005; -17 in 2006; 8.3 in 2007)
  • Corey Patterson (-11 in 2002; then 14.8, 33.8, 11.3, 14.2, 1, 0.7)
  • Willy Taveras (22.6 in 2006; then -7.1, -3, and back up to 14.1)
  • Jeff Francoeur (30.1 in 2005, 7.4 in 2006, 16.9 in 2007; but then -4.9, -19 in 2008 and 2009)
  • Juan Encarnacion (all over the place. From 2003: -11.4, 13.5, -11.1, 7.1, -26.7)
  • Reggie Sanders (12.9 in 2002; then -7.2, 4.3, and -9.2)

The complete table9 is here.

1st-year grade   n     Percent at least OK   Percent at least good
Very good        105   85.72                 65.72
Good             271   81.56                 44.29
OK               396   68.69                 29.8
Bad              229   54.14                 17.9
Very bad         70    38.57                 12.86

(3) Very good and very bad defenders tend to be consistently good or bad. Although the fine correlation just isn’t there, can we draw a more general conclusion?  If we have a player who (based on UZR/150) is very good, good, just about average, bad, or very bad this year,10 what are his chances of being at least average, or of being at least good, next year? A summary of those chances, based on historical analysis of the 1071 players who qualified, is shown at the right; a more detailed breakdown is here.

What we see is that a player who was very good one year has an 86% chance of being at least decent the following year, and a 66% chance of being good or very good. But a player who was very bad one year (for example, Jacoby Ellsbury this year) has a 39% chance of being at least decent the following year, and a 13% chance of being good or very good. So, again, there is reasonable predictive value when we look in this rather coarse-grained way, but there’s a ton of year-to-year variability.

I won’t show the data, but here increasing the minimum number of games played to 130 or 150 per year doesn’t help very much; the percentages remain surprisingly similar, although the sample sizes drop.
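
The grading itself is simple enough to sketch. The version below is only an illustration, not the code I actually ran: the function names are invented, and how it treats a UZR/150 sitting exactly on a cutoff (exactly +5, say) is my own assumption, since the cutoffs in the footnote don't say. The sample pairs are the first drop years of Jones, Lofton, Taveras and Francoeur from the list under question 2, so they are deliberately unrepresentative.

```python
from collections import defaultdict


def grade(uzr150):
    """Map a UZR/150 value to one of the five grades.

    Boundary handling (which grade gets exactly +15, +5, -5, -15)
    is an assumption made for this sketch.
    """
    if uzr150 > 15:
        return "very good"
    if uzr150 > 5:
        return "good"
    if uzr150 >= -5:
        return "OK"
    if uzr150 >= -15:
        return "bad"
    return "very bad"


def next_year_chances(pairs):
    """Summarize (this_year, next_year) UZR/150 pairs by first-year grade.

    Returns {grade: (n, percent at least OK, percent at least good)}.
    """
    counts = defaultdict(lambda: [0, 0, 0])  # n, at least OK, at least good
    for this_year, next_year in pairs:
        tally = counts[grade(this_year)]
        tally[0] += 1
        following = grade(next_year)
        if following in ("very good", "good", "OK"):
            tally[1] += 1
        if following in ("very good", "good"):
            tally[2] += 1
    return {
        g: (n, 100.0 * ok / n, 100.0 * good / n)
        for g, (n, ok, good) in counts.items()
    }


# Season pairs taken from the list of 20-point drops above.
DROP_EXAMPLES = [
    (34.7, 13.1),   # Andruw Jones, 2005 to 2006
    (19.9, -17.0),  # Kenny Lofton, 2005 to 2006
    (22.6, -7.1),   # Willy Taveras, 2006 to 2007
    (30.1, 7.4),    # Jeff Francoeur, 2005 to 2006
]

if __name__ == "__main__":
    for first_year_grade, summary in next_year_chances(DROP_EXAMPLES).items():
        print(first_year_grade, summary)
```

Feeding it every qualifying pair of consecutive seasons, rather than four hand-picked ones, is what would produce a table like the one summarized above.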

The bottom line

So what can we expect from Ellsbury next year (assuming he plays centerfield in 2010)? Well, looking at the history of outfielders with that kind of drop in UZR/150, maybe there’s around a 20-50% chance that he’ll bounce back to be a decent CF (from question 2, above). Looking at all players (question 3 above) there’s about a 40% chance that he’ll be decent, and a 13% chance that he’ll be good or very good next year.

Not great numbers, but that’s what we see.  My own suspicion is that Ellsbury will be a pretty good defender next year, but I wouldn’t put a lot of money on it.


  1. The most widely used is probably “UZR”, the “ultimate zone rating”; see here, here, and here for explanations. The other contender is the plus/minus rating system. UZR is freely available from the FanGraphs web site, while plus/minus requires a subscription to Bill James Online, so I’m only using UZR (actually UZR/150, which is UZR normalized to 150 games).
  2. Those who played more than 100 games in RF
  3. I.e. has a low UZR/150
  4. Not that other, better-qualified, people haven’t already asked the questions. But poking at data is how I try to understand it, so here it is.
  5. Which, not entirely coincidentally, includes Ellsbury’s 2008, when he played 66 games at CF
  6. UZR/150 of < -15 or > +15
  7. When UZR was introduced
  8. Several players had more than one swing year, so these numbers don’t add up all that nicely
  9. Reminder: this is for outfielders only, not all players
  10. I used UZR/150 cutoffs of > +15, +5 to +15, -5 to +5, -15 to -5, and < -15 for the different grades
December 16th, 2009

On cancer genomes

Wellcome: Cigarette poster

We’re just dipping our toes into the oceans of information from large-scale genome sequencing. We’re at the point now where sequencing a human genome is not routine, but not extraordinary either. The most recent examples of this are two groups who each sequenced the genome of a cancer (one group did a lung cancer, the other a melanoma) and compared it to the person’s normal cells. 1 This lets you see where the cancer cells are mutated.

How many mutations are there in a cancer? We already know that cancer is a multi-step process, involving probably at least 7 or 8 distinct stages. We also know that cancer cells have far more mutations than are needed for these minimal steps. How many is “more”?

  • Over 20,000 mutations – 23,000 mutations in the lung cancer, 33,000 in the skin cancer.

Where did these mutations come from? What drives mutagenesis in a cancer cell?

  • Cigarettes and UV light. The authors can point to the typical kinds of mutagenesis for each, and show that the lung cancer mutations are tobacco-induced and the skin cancer mutations are UV-induced.

How often do cigarettes cause mutations?

  • “… an average of one mutation for every 15 cigarettes smoked.”

(I question this figure, or rather, question whether the implied causation is that direct. But it’s not impossible, given their data.)  From an immunological viewpoint, the 20,000 mutations is interesting because it suggests that cancers should have lots of targets for the immune system. This was already pretty clear, but this helps nail it down.
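
To get a feel for what that figure implies, here is a back-of-the-envelope sketch. It assumes, naively, that every one of the roughly 23,000 lung cancer mutations counts toward the one-per-15-cigarettes rate, and the pack-a-day habit (20 cigarettes) is my own illustrative assumption, not a number taken from the paper.

```python
def smoking_implied(mutations, cigarettes_per_mutation=15, cigarettes_per_day=20):
    """Cigarettes, and years of smoking, implied by a mutation count.

    cigarettes_per_day=20 (a pack a day) is an illustrative assumption.
    """
    cigarettes = mutations * cigarettes_per_mutation
    years = cigarettes / cigarettes_per_day / 365.25
    return cigarettes, years


if __name__ == "__main__":
    cigarettes, years = smoking_implied(23_000)
    print(f"{cigarettes:,} cigarettes, about {years:.0f} years at a pack a day")
```

That works out to a few hundred thousand cigarettes, or several decades at a pack a day, which is at least the right order of magnitude for a lifelong heavy smoker.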

(By the way, the poster at the top, like the research in question, comes from the Wellcome Trust Institute.)


  1. Pleasance, E., Stephens, P., O’Meara, S., McBride, D., Meynert, A., Jones, D., Lin, M., Beare, D., Lau, K., Greenman, C., Varela, I., Nik-Zainal, S., Davies, H., Ordoñez, G., Mudie, L., Latimer, C., Edkins, S., Stebbings, L., Chen, L., Jia, M., Leroy, C., Marshall, J., Menzies, A., Butler, A., Teague, J., Mangion, J., Sun, Y., McLaughlin, S., Peckham, H., Tsung, E., Costa, G., Lee, C., Minna, J., Gazdar, A., Birney, E., Rhodes, M., McKernan, K., Stratton, M., Futreal, P., & Campbell, P. (2009). A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature. DOI: 10.1038/nature08629

    Pleasance, E., Cheetham, R., Stephens, P., McBride, D., Humphray, S., Greenman, C., Varela, I., Lin, M., Ordóñez, G., Bignell, G., Ye, K., Alipaz, J., Bauer, M., Beare, D., Butler, A., Carter, R., Chen, L., Cox, A., Edkins, S., Kokko-Gonzales, P., Gormley, N., Grocock, R., Haudenschild, C., Hims, M., James, T., Jia, M., Kingsbury, Z., Leroy, C., Marshall, J., Menzies, A., Mudie, L., Ning, Z., Royce, T., Schulz-Trieglaff, O., Spiridou, A., Stebbings, L., Szajkowski, L., Teague, J., Williamson, D., Chin, L., Ross, M., Campbell, P., Bentley, D., Futreal, P., & Stratton, M. (2009). A comprehensive catalogue of somatic mutations from a human cancer genome. Nature. DOI: 10.1038/nature08658

December 14th, 2009

Influenza before 1918, part II: 1872

In 1872, a pandemic influenza outbreak brought the US to its knees:

“The streets are almost deserted.” –Washington, D.C.

“A Sunday quiet prevails upon the streets.” –Springfield, OR

“The streets yesterday looked deserted.” –San Francisco, CA

“The street cars have stopped.” – Erie, PA 1

And yet, if you look at the mortality rates for influenza in 1872, it’s not a particularly impressive year — if anything, the influenza death rates were exceptionally low that year.  At least, they were low in humans.  1872 brought a pandemic equine influenza, laying low almost every horse in North America.

On the evening of October 21st only a few animals were affected, but on the morning of the 22d there was scarcely an animal of the equine species that was not affected.  Horses, mules, and even a zebra.  More than twenty thousand were suffering in different degrees. 2

An estimated 3-4% of the tens of thousands of horses in New York died. 2 But the deaths weren’t the biggest problem:

The actual money losses, in an epizootic of influenza, are more in the way of the loss of work and the complete stagnation of trade in all departments, than in the number of deaths.  Yet even in this sense it may prove more ruinous than would a disease having a less universal sway though far more fatal to the animals attacked. 3

Without horses, business slammed to a halt; the mail didn’t run, groceries didn’t reach the cities, crops weren’t harvested or transported.  After a few weeks, most of the horses recovered and business followed, but the epizootic swept across the country1  (intensely tracked by the newspapers of the day, warning each city in turn that it was going to be attacked), finally fizzling out the following summer in British Columbia.

Equine influenza map, 1872


  1. Adoniram B. Judson, MD (1873). History and Course of the Epizootic Among Horses Upon the North American Continent in 1872-1873. Public Health Papers and Reports. American Public Health Association. Hurd and Houghton, New York, 88-109
  2. Annual report of the Department of Health of the State of New Jersey. By The New Jersey State Dept. of Health, 1877 (“Epizootic influenza”, p. 160)
  3. Text book of veterinary medicine, Volume IV. By James Law, F.R.C.V.S. 1906