Mystery Rays from Outer Space

Meddling with things mankind is not meant to understand. Also, pictures of my kids

January 19th, 2010

The good old days

Ladies & Gentlemen, I give you The Fever Districts of the United States, as of 1856 (click for a larger version):

Keith 1856 Fever Districts of the USA

Note the outlining of the Intermittent Fever districts, including Lansing, MI, where I live. Note the intense yellow rim of Yellow Fever.  Note the Small Pox Measles Scarlatina Consumption Endemic region along the Eastern seaboard, the large-case TYPHUS, the DYSENTERY, the casual “And many epidemics” tacked on to the main Yellow Fever, the serpentine red band tracing cholera. 1 There’s goitre in the Midwest and Mexico, elephantiasis down in South America, “Dia. & Dys. (severe)” in tiny writing down in the Bahamas, and the Bermudas are “generally healthy: Influenza, Rheumatism, Dysentery, Yellow Fever”.  And so much more.  (Compare to the map of Malaria in the USA, 1870.)

Keith 1856 USA Health & Disease

This amazing map is a mere afterthought, an inset of a map whose awesomeness goes up to 11. The US map to the right2 (again, click for a larger version) is still just a small fraction of the whole, and that’s not even mentioning the jaw-dropping charts and graphs, also inset, showing “Consumption: Proportion of Deaths in the different quarters of the Globe”, “Comparative Value of Life in Different Countries”, “Proportionate Mortality of European Residents in Foreign Countries” and still more and more.

This map is “The geographical distribution of health & disease, in connection chiefly with natural phenomena. (with) Fever districts of United States & W. Indies, on an enlarged scale,” and it’s from:

The physical atlas of natural phenomena
by Alexander Keith Johnston, F.R.S.E., F.R.G.S., F.G.S.
William Blackwood and Sons, Edinburgh and London, MDCCCLVI 3

I’d run across reverent mentions of this map — especially the Fever Districts inset — here and there in old books, and I just stumbled across it in downloadable form.  You must go at once to The David Rumsey Collection and pore over it for several hours, at the highest resolution.

  1. Lansing seems to have been just barely cholera-free, at least in 1856.
  2. The colors refer to “zones of disease” – Torrid (brown), Sub-torrid & temperate (green), sub-temperate & arctic (blue).
  3. That’s 1856, for those of you who, like me, need to pause a while in thought when confronted with years in Roman numerals.
December 19th, 2009

Baseball: Predictive value of UZR

Jacoby Ellsbury diving catch
Typical Ellsbury catch

(I see it’s been a long time since my last baseball-related post. Here’s a long one to make up, so I’m good for another year, baseball-post-wise.)

Jacoby Ellsbury, the centerfielder for the Boston Red Sox, was just given the “Defensive Player of the Year” award, as voted by baseball fans in the “This Year In Baseball Awards.”  This is interesting because, statistically, Ellsbury in 2009 was actually one of the worst defensive players in all of baseball. Does this mean that baseball fans got it wrong, or did the statistics lie to us?

The answer isn’t immediately obvious, because it’s generally agreed that statistical analysis of baseball defense1 still lags well behind most offensive and even pitching measurements. If a defender makes a catch, is that because everyone, including me, would have caught it? Was it a difficult catch, but one that most major-league players should have made? Or was it a really difficult catch, that almost every major-leaguer would have flubbed? Fans tend to judge these by how spectacular the catch looks, and there’s no doubt that Ellsbury made his share of spectacular, diving, all-out catches in 2009. But did he make them harder than they were?

In this case, actually, defensive stats and a lot of astute observers agree. Ellsbury didn’t get “good jumps” on a lot of hits this year — he hesitated before running, and he may not have run to the best spot for a catch. As a result, he had to make spectacular catches, on hits that a better defender would have caught quite routinely.

Easy, hard, or making it harder

J.D. Drew catch
Typical J.D. Drew catch

An interesting contrast to Ellsbury is the guy to his left, J.D. Drew, who plays right field for the Red Sox. Drew makes few spectacular catches, rarely diving or getting mussed up. His catches almost always look routine and easy. But statistically, he was a much better fielder than Ellsbury in 2009. By UZR/150, Ellsbury was -18.3, while Drew was at +15.7, second-highest among right-fielders2 in the majors last year. Again, many careful observers agree with the stats; Drew doesn’t have to make spectacular catches, because he instantly sees where the ball will go, moves to the right place immediately, and makes the catch easily. Casual fans don’t think of Drew as a high-quality defender, because he doesn’t seem to make difficult plays; but in fact, he is making the difficult plays; he’s just making them look easy.

So UZR seems to agree fairly well with careful observers’ analysis of Ellsbury and Drew’s respective abilities. But that’s not really the interesting question. It’s mildly entertaining to say, “Yeah, baseball fans got it wrong”, but it’s much more interesting to ask what this tells us about the future. In other words: UZR seems to be a reasonable descriptive statistic. Is it a useful predictive statistic as well? For example, should Ellsbury play CF for the Sox next year? What are the chances that he’ll be a good centerfielder next year? What are the chances that Drew was just lucky this year, and next year he’ll be a lousy right fielder?

The numbers and the future

And here the waters get much muddier. One puzzling point is that in 2008, Ellsbury was an excellent fielder, by UZR/150. He was pretty good at CF (+6.9) and superb in right field (+18.6, although in limited time: 36 games). Do players often show this kind of 20-odd-point swing in UZR? And what does it tell us about that player’s future? (My answer, for those with tl;dr disease: Based on history, Ellsbury has about a 40% chance of being an average or better fielder next year, and a 13% chance of being either good or very good.)

Let’s ask a number of questions about UZR/150 and its ability to predict future defense.

(1) How well does a player’s UZR/150 correlate, year to year? That is, if a player has a particular UZR/150 this year, how similar will his next year’s UZR/150 be?

(2) How many players have the kind of huge drop in UZR/150 that Ellsbury showed? What happened to their defense in the years following that drop?

(3) If a defender is “very bad” this year,3 how likely is it that he’ll be a decent or good defender next year?

I’ve scraped FanGraphs’ fielding ratings and dumped them into an SQLite database on my own computer, so that I can look at these questions.4 Here are my attempts at answers.

Correlations between years for UZR/150
Correlations between years for UZR/150 (click for larger version)

(1) UZR/150 from one year does correlate with the next, but not very well. If we limit our analysis to outfielders who spent more than 65 games at a particular position in a year,5 and plot each year against the following year in a scatter graph, the R² is just 0.1823, and it only gets a little better (0.2505) if we limit it to players with at least 150 games at the position (see the figure above). In fact I tried all kinds of variations, and the only R² over 0.5 came when I limited the analysis to the very best and the very worst outfielders6 who played over 130 games at a position; there the R² for their next year was 0.5494.

So yes, there’s some correlation, on the bulk level.  But not much.  On an individual basis — which is, of course, what we’re interested in — you couldn’t be at all confident that next year’s UZR/150 will be very similar to this year’s.
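For anyone who wants to poke at the same question, here is a minimal Python sketch of the calculation behind question (1): build (this year, next year) UZR/150 pairs for back-to-back seasons at one position, keep only seasons over a games cutoff, and compute R² for a straight-line fit. The data layout and the function names (`year_pairs`, `r_squared`) are hypothetical, this is an illustration rather than the query behind the numbers above, and the sample values are made up.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: player -> {year: (games_at_position, uzr_150)},
# for a single fielding position.
Seasons = Dict[str, Dict[int, Tuple[int, float]]]


def year_pairs(seasons: Seasons, more_than: int = 65) -> List[Tuple[float, float]]:
    """Return (this_year, next_year) UZR/150 pairs for consecutive seasons
    in which the player had more than `more_than` games at the position."""
    pairs = []
    for by_year in seasons.values():
        for year, (games, uzr) in sorted(by_year.items()):
            following = by_year.get(year + 1)
            if following is None:
                continue  # no next season at this position
            next_games, next_uzr = following
            if games > more_than and next_games > more_than:
                pairs.append((uzr, next_uzr))
    return pairs


def r_squared(pairs: List[Tuple[float, float]]) -> float:
    """R^2 of an ordinary least-squares line of next year on this year.
    With a single predictor this equals the squared Pearson correlation."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pairs")
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either year has no variance")
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Made-up numbers, purely to show the shape of the calculation.
    example = {
        "Player A": {2007: (140, 10.2), 2008: (150, 3.1), 2009: (120, 7.5)},
        "Player B": {2007: (90, -12.0), 2008: (40, 2.0), 2009: (130, -8.4)},
        "Player C": {2008: (155, 0.5), 2009: (148, -4.0)},
    }
    pairs = year_pairs(example)
    print(pairs)
    print(round(r_squared(pairs), 4))
```

Raising `more_than` is the knob for the 130- and 150-game variations; a small R² on the bulk pairs is what "some correlation, but not much" looks like numerically.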

(2) 20-point drops in UZR/150 aren’t unheard of, and players can bounce back from them. This kind of swing isn’t all that common, but it happens. In the data since 2002,7 I turned up 24 outfielders who had at least a 20-point change in UZR/150 from one year to the next. Of the 21 players with at least a 20-point drop, 7 stopped playing that position. Six of the rest had their drop in 2009, and some of those won’t be back. There were 12 who had a 20-point drop and continued at the position,8 and of these, at least half bounced back, at least temporarily:

  • Andruw Jones (from 34.7 in 2005 to 13.1 in 2006; then 22.2, then 0.2, and then out of the position)
  • Kenny Lofton (19.9 in 2005; -17 in 2006; 8.3 in 2007)
  • Corey Patterson (-11 in 2002; then 14.8, 33.8, 11.3, 14.2, 1, 0.7)
  • Willy Taveras (22.6 in 2006; then -7.1, -3, and back up to 14.1)
  • Jeff Francoeur (30.1 in 2005, 7.4 in 2006, 16.9 in 2007; but then -4.9, -19 in 2008 and 2009)
  • Juan Encarnacion (all over the place. From 2003: -11.4, 13.5, -11.1, 7.1, -26.7)
  • Reggie Sanders (12.9 in 2002; then -7.2, 4.3, and -9.2)
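The search for big swings in question (2) can be sketched the same way. This hypothetical helper, `big_drops`, takes one player’s UZR/150 by year at a single position and reports each drop of at least 20 points between back-to-back seasons, together with the following season’s value (None if he didn’t play the position again). The threshold and data layout are assumptions for illustration; the usage example uses Andruw Jones’s numbers from the list above, assuming the seasons after 2006 are 2007 and 2008.

```python
from typing import Dict, List, Optional, Tuple


def big_drops(
    by_year: Dict[int, float], threshold: float = 20.0
) -> List[Tuple[int, float, Optional[float]]]:
    """Find year-to-year UZR/150 drops of at least `threshold` points.

    Returns (year_of_drop, size_of_drop, next_year_value) tuples, where
    next_year_value is None if there is no following season at the position.
    Only back-to-back seasons are compared.
    """
    drops = []
    years = sorted(by_year)
    for prev, cur in zip(years, years[1:]):
        if cur != prev + 1:
            continue  # skip gaps between seasons
        drop = by_year[prev] - by_year[cur]
        if drop >= threshold:
            # Round for display; the raw difference carries float noise.
            drops.append((cur, round(drop, 1), by_year.get(cur + 1)))
    return drops


# Andruw Jones, from the list above (consecutive seasons assumed).
jones = {2005: 34.7, 2006: 13.1, 2007: 22.2, 2008: 0.2}
for year, size, after in big_drops(jones):
    print(year, size, after)
```

A player "bounces back" when the value after the drop climbs again, as with Jones in 2007; a None marks someone who left the position right after the drop.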

The complete table9 is here.

Chances next year, by first-year grade:
Very good 1st year (n=105): 85.72% at least OK, 65.72% at least good
Good 1st year (n=271): 81.56% at least OK, 44.29% at least good
OK 1st year (n=396): 68.69% at least OK, 29.8% at least good
Bad 1st year (n=229): 54.14% at least OK, 17.9% at least good
Very bad 1st year (n=70): 38.57% at least OK, 12.86% at least good

(3) Very good and very bad defenders tend to be consistently good or bad. Although the fine-grained correlation just isn’t there, can we draw a more general conclusion? If we have a player who (based on UZR/150) is very good, good, just about average, bad, or very bad this year,10 what are his chances of being at least average, or of being at least good, next year? A summary of those chances, based on historical analysis of the 1071 players who qualified, is shown above; a more detailed breakdown is here.

What we see is that a player who was very good one year has about an 86% chance of being at least decent the following year, and a 66% chance of being good or very good. But a player who was very bad one year (for example, Jacoby Ellsbury this year) has about a 40% chance of being at least decent the following year, and a 13% chance of being good or very good. So, again, there is reasonable predictive value when we look in this rather coarse-grained way, but there’s a ton of year-to-year variability.

I won’t show the data here, but raising the minimum to 130 or 150 games per year doesn’t help very much: the percentages remain surprisingly similar, although the sample sizes drop.
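The grading behind question (3) is also easy to sketch in Python. This assumes the cutoffs from footnote 10 (> +15 very good, +5 to +15 good, -5 to +5 OK, -15 to -5 bad, < -15 very bad). The footnote doesn’t say which grade a value sitting exactly on a cutoff gets, so the boundary handling here is an assumption. "At least OK" means a next-year grade of OK or better, and "at least good" means good or very good. The input is a list of (this year, next year) UZR/150 pairs; the names `grade` and `next_year_odds` are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Ordered from worst to best, so list position doubles as a rank.
GRADES = ["very bad", "bad", "OK", "good", "very good"]
RANK = {g: i for i, g in enumerate(GRADES)}


def grade(uzr150: float) -> str:
    """Map a UZR/150 value to a grade. Values exactly at +/-5 or +/-15
    are assigned to the grade nearer the middle (an assumption)."""
    if uzr150 > 15:
        return "very good"
    if uzr150 > 5:
        return "good"
    if uzr150 >= -5:
        return "OK"
    if uzr150 >= -15:
        return "bad"
    return "very bad"


def next_year_odds(
    pairs: Iterable[Tuple[float, float]]
) -> Dict[str, Tuple[int, float, float]]:
    """For each first-year grade, return (n, % at least OK next year,
    % at least good next year). Grades with no players are omitted."""
    tallies = {g: [0, 0, 0] for g in GRADES}  # n, at least OK, at least good
    for this_year, next_year in pairs:
        tally = tallies[grade(this_year)]
        rank_next = RANK[grade(next_year)]
        tally[0] += 1
        if rank_next >= RANK["OK"]:
            tally[1] += 1
        if rank_next >= RANK["good"]:
            tally[2] += 1
    return {
        g: (n, 100.0 * ok / n, 100.0 * good / n)
        for g, (n, ok, good) in tallies.items()
        if n > 0
    }
```

With these cutoffs Ellsbury’s 2009 (-18.3) lands in "very bad" and Drew’s (+15.7) in "very good", so the "very bad" row is the one that speaks to Ellsbury’s 2010.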

The bottom line

So what can we expect from Ellsbury next year (assuming he plays centerfield in 2010)? Well, looking at the history of outfielders with that kind of drop in UZR/150, maybe there’s around a 20-50% chance that he’ll bounce back to be a decent CF (from question 2, above). Looking at all players (question 3, above), there’s about a 40% chance that he’ll be decent, and a 13% chance that he’ll be good or very good next year.

Not great numbers, but that’s what we see.  My own suspicion is that Ellsbury will be a pretty good defender next year, but I wouldn’t put a lot of money on it.

  1. The most widely used is probably “UZR”, the “ultimate zone rating”; see here, here, and here for explanations. The other contender is the plus/minus rating system. UZR is freely available from the FanGraphs web site, while plus/minus requires a subscription to Bill James Online, so I’m only using UZR (actually UZR/150, which is UZR normalized to 150 games).
  2. Those who played more than 100 games in RF.
  3. I.e., has a low UZR/150.
  4. Not that other, better-qualified people haven’t already asked these questions. But poking at data is how I try to understand it, so here it is.
  5. Which, not entirely coincidentally, includes Ellsbury’s 2008, when he played 66 games at CF.
  6. UZR/150 of < -15 or > +15.
  7. When UZR was introduced.
  8. Several players had more than one swing year, so these numbers don’t add up all that nicely.
  9. Reminder: this is for outfielders only, not all players.
  10. I used UZR/150 cutoffs of > +15 (very good), +5 to +15 (good), -5 to +5 (OK), -15 to -5 (bad), and < -15 (very bad) for the different grades.
December 8th, 2009

Malaria and mosquitoes: Not 1908, not Cuba

A couple of days ago I posted this map of malaria in the USA. It got picked up by Grant Jacobs, who made some interesting and useful comments, and that in turn got picked up by someone who posted it to boingboing. Unfortunately, whoever wrote it up there tried to add some value by offering a couple of points on the history of malaria, both of which were wrong.1 In particular, he claimed that “It wasn’t until 1908 that a Cuban doctor made the connection with mosquitoes”. To set the record straight:

RECENT researches by Surgeon Major Ronald Ross have shown that the mosquito may be the host of parasites of the type of that which causes human malaria. Ross has distinctly proved that malaria can be acquired by the bite of a mosquito, and the results of his observations have a direct bearing on the propagation of the disease in man. Dr P. Manson describes the investigations in a paper in the British Medical Journal, and sums them up as follows: –The observation tend to the conclusion that the malaria parasite is for the most part a parasite of insects; that it is only an accidental and occasional visitor to man; that not all mosquitos are capable of subserving it; that particular species of malaria parasites demand particular species of mosquitos; that in this circumstance we have at least a partial explanation of the apparent vagaries of the distribution of the varieties of malaria. When the whole story has been completed, as it surely will be at no distant date, in virtue of the new knowledge thus acquired we shall be able to indicate a prophylaxis for malaria of a practical character, and one which may enable the European to live in climates now rendered deadly by this pest.

Nature, Sept. 1898.  p. 523

The earliest probable reference I can find2 is from 1896:
The Goulstonian Lectures on the Life History of the Malaria Germ Outside the Human Body. P. Manson. The British Medical Journal, 1896

Update: I just realized what the poster had in mind with his comment that “It wasn’t until 1908 that a Cuban doctor made the connection with mosquitoes”: He was thinking about yellow fever, a virus rather than a parasite. Here and there about the web it’s suggested that yellow fever was shown to be mosquito-borne, in 1908, by a Cuban doctor, Carlos Finlay.  Unfortunately that’s also not correct; it probably was originally a typo somewhere that got spread around.

Finlay (who was actually Cuban-born, though he trained in medicine in Philadelphia) originally published his observations in 1881,3 and then in English in 1886.4 His theory wasn’t immediately accepted, but by 1900 it was confirmed by a medical commission that included the famous Walter Reed.

  1. Also, he didn’t credit me, which is probably for the best, since my pathetic hosting would undoubtedly have crashed.
  2. I haven’t read the text of this yet.
  3. C. Finlay. El mosquito hipotéticamente considerado como agente de trasmisión de la fiebre amarilla [The mosquito hypothetically considered as an agent of transmission of yellow fever]. An. de la Real Academia de ciencias med. … de la Habana, vol. 18, pp. 147-169 (Aug 14 1881).
  4. C. Finlay. Yellow Fever, its transmission by means of the Culex mosquito. Am. Journ. Med. Sci. vol. 92, pp. 395-409 (1886).
December 3rd, 2009

Gone in 60 (milli)seconds


Intracellular proteins have to be degraded, more or less at the same rate as new proteins are produced (or the cell would eventually burst). On the other hand, you can’t go about degrading proteins willy-nilly.  There are vast and complex systems for identifying proteins that should be destroyed, tagging them, and then moving them into a controlled destruction chamber.

The most important of these systems is the ubiquitin-proteasome degradation pathway.  Proteins that are destined for destruction are tagged with a chain of ubiquitin molecules.1  There are multiple steps in this pathway, in which ubiquitin is prepared for tagging, target proteins are identified, and ubiquitin is transferred from the activating components to the targeted protein.

Target proteins are destroyed when a chain of ubiquitin molecules (linked head to tail) is attached to them. How that chain gets built has been an open question. Is the ubiquitin chain formed first, and then transferred to the target en bloc? Or are single ubiquitins transferred one at a time, sequentially, first to the target protein and then to the previously attached ubiquitins? The problem has been that the process goes so fast that it’s been hard to distinguish between the possibilities.

Now, in a gorgeous series of experiments, Pierce et al2 were able to watch ubiquitination happening over fractions of a second:

… we performed our single-encounter reactions on a quench flow apparatus that allowed us to take measurements on a timescale ranging from 10 ms to 30 s2

And the answer looks pretty clear: Ubiquitins are transferred sequentially, not en bloc.

Even at this timescale, though, they weren’t able to catch the very first event: the transfer of the first ubiquitin to the target. That happens, apparently, in less than 10-20 milliseconds. They also conclude that target tagging depends critically on the kinetics of ubiquitin chain elongation (as you’d expect), which are governed by ubiquitin off-rates, and that this mode of regulation is probably a billion years old.

Pierce et al. (2009) Fig. 3d: Ubiquitin addition

Figure 3d: Kinetics of ubiquitin addition and elongation2 (click for a larger version)
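To make "depends on the kinetics of chain elongation" a bit more concrete, here is a toy model in Python. It is a simplification for illustration, not the analysis Pierce et al. performed. It assumes, hypothetically, that during a single encounter ubiquitins are added one at a time (sequentially) at a constant rate k_elong, while the encounter ends at a constant dissociation rate k_off. Each step is then a race between two exponential waiting times, so the next ubiquitin is added with probability p = k_elong / (k_elong + k_off), the chance of building a chain of at least n ubiquitins is p^n, and the average number added per encounter is k_elong / k_off. The rate values and the chain length of 4 in the example are made up.

```python
import random


def p_next_added(k_elong: float, k_off: float) -> float:
    """Chance the next ubiquitin is added before dissociation: the
    elongation 'clock' fires first in a race of two exponentials."""
    return k_elong / (k_elong + k_off)


def p_chain_at_least(n: int, k_elong: float, k_off: float) -> float:
    """Chance that one encounter builds a chain of at least n ubiquitins,
    if each addition is an independent race (sequential model)."""
    return p_next_added(k_elong, k_off) ** n


def simulate_encounter(k_elong: float, k_off: float, rng: random.Random) -> int:
    """Monte Carlo version: keep adding ubiquitins while the next addition
    happens sooner than dissociation. Memorylessness of the exponential
    distribution lets us redraw the dissociation time at every step."""
    added = 0
    while rng.expovariate(k_elong) < rng.expovariate(k_off):
        added += 1
    return added


if __name__ == "__main__":
    k_elong, k_off = 3.0, 1.0  # made-up rates, arbitrary units
    rng = random.Random(0)
    trials = [simulate_encounter(k_elong, k_off, rng) for _ in range(20000)]
    print("simulated mean:", sum(trials) / len(trials))
    print("predicted mean:", k_elong / k_off)
    print("P(chain >= 4):", p_chain_at_least(4, k_elong, k_off))
```

In this toy model, doubling k_off halves the average chain length, which is one way to see why a target that falls off quickly may never accumulate a long chain.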

  1. Ubiquitin being a small, abundant protein.
  2. Pierce, N., Kleiger, G., Shan, S., & Deshaies, R. (2009). Detection of sequential polyubiquitylation on a millisecond timescale. Nature, 462 (7273), 615-619. DOI: 10.1038/nature08595
November 9th, 2009

Making charts with Numbers

Apple’s iWork ’06 package was interesting, but ended up being too simplified to really compete with MS Office. But iWork ’09 was a big step forward, and I now use Pages for almost all my word processing, and Numbers for about 75% of my spreadsheets. (I still use PowerPoint for most of my slideshows; I don’t find any compelling reason to use Keynote instead, and PowerPoint does have some distinct advantages.)

“Numbers” looks fairly similar to Excel (they are both spreadsheet programs, so there are only so many ways of usefully presenting information), but the editing and so on can be quite different from Excel, which can be frustrating if you’re coming from an Excel background. Rosie Redfield was just complaining about the non-intuitiveness of Numbers. I don’t think it’s non-intuitive, just different from Excel.

So I put together a couple of quick screencasts of making a line graph and a scatter plot, in the hope that they would give a starting point for people new to Numbers. (Flash movies, 7.8 and 5 MB respectively. No sound, because my kids are still asleep.) I’ve never tried this before, but hopefully they’ll work.

October 31st, 2009
August 6th, 2009

On stupidity and virologists

I’ve quoted this before, but without attribution, because I didn’t know who originally said it:

“The stupidest virus is smarter than the smartest virologist.”

Apparently it was George Klein.

July 25th, 2009

Busy, Back soon

I’m off for a week’s vacation with the family – camping, etc., and with limited internet access.  I’ll be back in the first week of August some time.  Talk amongst yourselves.

July 1st, 2009

Happy Canada Day

(A real post tomorrow, I hope)

June 15th, 2009

Conspicuous consumption

Tuberculosis and the Grim Reaper

A while ago I made the point that many of the biggest killers of 19th-century London are almost unknown today, because of vaccination (“hooping cough”, measles, smallpox) and sanitation (typhus, cholera) (see “Life & Death, pre-vaccination”).

I have a small confession to make: I kind of rigged that chart, because I wanted to avoid a complicated story that I didn’t know much about.  I still don’t know much about it, but I’ll share my ignorance with you, dear readers, because it’s a fascinating story and because it hooks up with something else I want to talk about, maybe later this week.

The “rigging” was pretty minor. If you look at the table I took the mortality info from (from the Journal of the Statistical Society of London, Vol. XII, 1850), you’ll see the infectious causes that I listed, neatly clustered together at the bottom. In 1847, mortality from these diseases looked something like this:

Mortality in London, 1847

What’s missing? The biggest killer of them all;1 it’s not included in this section because, before 1882,2 nobody knew that tuberculosis (“consumption”) was an infectious disease.

Here’s what happens when we include “tubercular diseases” in the same chart — watch the scale!

Mortality in London, 1847

In developed countries most of us don’t think about tuberculosis much, but in the 19th century it was everywhere.  The poor — crowded and malnourished — were at the most risk, but consumption spared no one.  Rich and poor, merchant or noble or laborer, everyone had friends and family who died of consumption.

What happened to it? Why did Tb go from causing 20% of all deaths to infecting only 0.01% of the population (and killing a far smaller fraction)? Unlike with the other infectious diseases, vaccination and sanitation can explain only part of that. The death rate of Tb was already dropping drastically well before 1882, when Koch showed that it was infectious:3

Trends in Tb mortality

From “Pulmonary Tuberculosis”
Maurice Fishberg (Lea and Febiger, Philadelphia, 1922)

So although antibiotics and, to some extent, vaccination were to help push Tb to obscurity in the 20th century, the disease was already, very slowly, fading before that. (Tb rates had exploded in the 18th century, as urbanization crowded the poor together. It was many years before cities became self-sustaining and no longer relied on immigration.)

Consumption (19th-century physician)
From “Passages from the diary of a late physician”
Samuel Warren (Baudry’s European Library, 1838)

Why was Tb becoming less common? Well, this is the part I don’t actually understand very well, but according to Arthur Newsholme4 in 1908,5 this was indirectly because of the Poor Law of 1834 (Wikipedia on Poor Laws). The Poor Laws were very, very primitive versions of welfare; the 1834 Act brought about a system of workhouses, where the desperately poor (and they had to be utterly desperate) were fed (barely) and housed (kind of) and generally abused. The point being, the poorest of the poor were kept in these workhouses; because Tb sufferers, who couldn’t work normally, tended to be the poorest of the poor, they were housed in the workhouses and essentially quarantined. After Koch demonstrated that Tb is contagious, in 1882, quarantine became a deliberate policy6 and rates dropped further; and when antibiotics were introduced in the 1940s, Tb rates fell further still.

So although I said the biggest killers of the 19th century have been almost eliminated by sanitation and vaccination, that’s not really true of tuberculosis.  Antibiotics broke the back of the disease, but it was already being controlled to a large extent by social factors and then by medical opinion — one of the few cases where formal medicine actually had an influence on these diseases.

  1. Well, “Diseases of the lungs and other Organs of respiration” killed more people, but that’s not a single disease.
  2. Koch, R. 1882. Die Aetiologie der Tuberculose [The etiology of tuberculosis]. Berl. Klin. Wchnschr., xix: 221-230.
  3. And it was years after that before an effective treatment or vaccine was available.
  4. The Prevention of Tuberculosis, by Sir Arthur Newsholme. Methuen, 1908.
  5. Supported by others since, e.g. Wilson, L. (2004). Commentary: Medicine, population, and tuberculosis. International Journal of Epidemiology, 34 (3), 521-524. DOI: 10.1093/ije/dyh196
  6. I believe Sir Arthur Newsholme was important in instituting this in Britain.