Mystery Rays from Outer Space

Meddling with things mankind is not meant to understand. Also, pictures of my kids

October 29th, 2007

RNA, protein, and information

Not long ago there was some kerfuffle over the ENCODE data,1 and the unrelated but almost simultaneous Cell paper,2 that demonstrated widespread transcription even from apparently-inactive genes (for example, this discussion and this one at Ars Technica’s Nobel Intent, and this one at Sandwalk). The observations were considered surprising because RNA was (and is) usually considered to be fairly tightly regulated. The ENCODE data, in particular, were used as arguments pro and con “junk” RNA, that is, non-functional transcription.

The presence of non-functional RNA, though, didn’t strike me as very surprising at all, and part of the reason for that was that I had been primed to think about efficiency in cellular processes by Jon Yewdell’s “DRiPs” hypothesis.3

Briefly (I want to talk about DRiPs in detail some other time) Yewdell suggests that peptides that are presented on MHC class I to cytotoxic T lymphocytes are usually derived, not from full-length, functional proteins, but from “defective ribosomal products” — proteins that began translation and got screwed up partway through, or that completed translation and failed to fold properly. Proteins, in other words, that were defective from the get-go, that never had a chance to contribute to the whole happy economy of the cell. This contrasts to the traditional view, that antigenic peptides are derived from proteins during their normal turnover, often over a period of many hours.

Pollock’s drips: “Untitled (Green Silver)”

Yewdell argued that in fact a large percentage of translation (he’s offered various percentages, but let’s say 30% of translation) ends up in this defective pool, and because it’s defective it’s destroyed very rapidly by the proteasome — again, he’s offered various numbers, but let’s say for the sake of argument that it’s destroyed within a handful of minutes.

I don’t mind saying that I was extremely skeptical when I read his initial paper, and I am still quite skeptical about the overall contribution; but over the years I have (reluctantly) come a long way toward accepting the general principle. But, unlike many of the people who disagreed with the DRiP hypothesis, I didn’t find the principle of DRiPs per se implausible.

In fact, it was one of those things that I had never thought of, but that made immediate sense to me as soon as I read the idea. I think of it as an information theory thing: Preventing errors in translation must take a certain amount of energy; at some point the incremental energy needed to reduce the error rate from N to N-1 would be greater than the amount of energy needed to degrade a defective product. And as soon as you consider it as that equation, it becomes a slider, and the set-point could be almost anywhere. It’s quite plausible (to me, anyway) that the amount of energy used in error prevention is relatively high, whereas the energy loss in protein degradation is relatively low — and so it’s cheaper, energetically, to simply make error-riddled protein, and let the proteasome sort it out after the fact.
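
The "slider" idea can be put into a toy calculation. The sketch below is not from Yewdell's papers or from any measurement; it simply assumes (hypothetically) that each further e-fold reduction in the error rate costs the same extra energy, and that every defective product wastes a fixed amount of energy in synthesis plus degradation. The names `prevention_cost` and `waste_cost` are invented for the illustration. Under those assumptions the cheapest error rate is just the ratio of the two costs, so the set-point really can sit almost anywhere.

```python
import math


def total_cost(error_rate, prevention_cost, waste_cost):
    """Energy spent per product at a given error rate (toy model, arbitrary units)."""
    # Assumption: every further e-fold reduction in errors costs the same extra energy.
    prevention = prevention_cost * math.log(1.0 / error_rate)
    # Assumption: each defective product wastes a fixed amount (synthesis plus degradation).
    cleanup = waste_cost * error_rate
    return prevention + cleanup


def optimal_error_rate(prevention_cost, waste_cost):
    """Error rate minimising total_cost: d/de [k*ln(1/e) + w*e] = 0 gives e = k/w."""
    return min(1.0, prevention_cost / waste_cost)


def grid_search_optimum(prevention_cost, waste_cost, steps=1000):
    """Brute-force check of the analytic optimum over error rates in (0, 1]."""
    candidates = [i / steps for i in range(1, steps + 1)]
    return min(candidates, key=lambda e: total_cost(e, prevention_cost, waste_cost))


if __name__ == "__main__":
    # Slide the relative cost of prevention and watch the cheapest error rate move.
    for k in (0.1, 1.0, 3.0):
        print(k, optimal_error_rate(k, 10.0), grid_search_optimum(k, 10.0))
```

With prevention three-tenths as expensive as waste, this toy model settles on an error rate of 30%, which happens to match the "let's say 30%" figure used above only because the inputs were chosen to make the point.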

(I’m simplifying all the arguments here, pro and con. I’ll probably take them up in bits and pieces later on.)

Anyway, exactly the same reasoning applies to transcription. The amount of energy that it would take to clamp down and make absolutely perfect identification of proper transcriptional start sites, must at some point be greater than the amount of energy involved in destruction of aberrant RNA. So this is why I thought it was quite predictable that there would be widespread, low-level, transcription of non-functional RNA that would then run into the next level of information processing.

[Icon: Blogging on Peer-Reviewed Research]

The reason I thought about this today, months after the fuss has pretty much died down, is a paper from Nilabh Shastri’s group4 that demonstrates what, I suspect, is another example of aberrance that’s tolerated by cells. (There’s also a commentary on the paper,5 by Yewdell.) Today’s post was in fact supposed to be entirely about that paper, but what with all the time I’ve spent blathering about the background, I’ll finish up with a new post later this week. In any case it’s time for me to go read “Green Eggs and Ham” to my kids.

(I’m trying out including the BPR3 icon here. I’m not entirely convinced by the BPR3 rationale, but I’m willing to see what happens for a while, anyway.)

  1. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 2007 447, 799-816.
  2. Guenther, M. G., Levine, S. S., Boyer, L. A., Jaenisch, R., and Young, R. A. (2007). A chromatin landmark and transcription initiation at most promoters in human cells. Cell 130, 77-88.
  3. Yewdell, J. W., Anton, L. C., and Bennink, J. R. (1996). Defective ribosomal products (DRiPs): A major source of antigenic peptides for MHC class I molecules? J. Immunol. 157, 1823-1826.
  4. Maness, N. J., Valentine, L. E., May, G. E., Reed, J., Piaskowski, S. M., Soma, T., Furlott, J., Rakasz, E. G., Friedrich, T. C., Price, D. A., Gostick, E., Hughes, A. L., Sidney, J., Sette, A., Wilson, N. A., and Watkins, D. I. (2007). AIDS virus specific CD8+ T lymphocytes against an immunodominant cryptic epitope select for viral escape. J Exp Med 204:2505-2512
  5. Yewdell, J. W., and Hickman, H. D. (2007). New lane in the information highway: alternative reading frame peptides elicit T cells with potent antiretrovirus activity. J Exp Med 204:2501-2504
September 27th, 2007

Epitope prediction: The seven percent solution

How to catch flu (Wellcome Images)

I’ve talked several times (for example, here, here, and here) about predicting cytotoxic T lymphocyte (CTL) epitopes, and emphasized how hard it is (or, at least, how poor the tools are). Here’s an example of why it’s difficult.

(Quick review: CTL recognize virus-infected cells by screening small peptides that are bound to the class I major histocompatibility complex [MHC class I]. The peptides are created by destruction of proteins in the target cell. There’s a handy guide to antigen presentation here, if that helps put things into context.)

In my previous post on the subject, I listed a bunch of different factors that need to be incorporated in the predictions. Number 7 was “Binding to the MHC complex in the ER”, and I commented that peptide binding to MHC class I is probably the second-best understood step in the pathway (behind TAP transport, if you’re keeping score at home).

A paper from earlier this year1 tried to identify CTL epitopes in influenza viruses. Lots of papers do this, but most don’t follow up with actual, complete tests, which are too expensive and difficult. Wang et al. did follow through.

They started by looking simply at binding to MHC class I alleles. Without going into details (they were looking for conserved epitopes that matched HLA supertypes, if anyone cares) they identified 167 peptides that they predicted should bind to the various MHC class I alleles; and then they tested them to see if they actually did bind. (They used NetMHC 3.0 2 to predict binding.)

Of the 167 predicted binders, 39 failed to bind altogether, and another 39 only bound very weakly. That leaves 89 peptides (just 53% of their tested pool) that were authentic binders.

Influenza viruses infecting cells of the trachea

Then, they tested to see if their peptides actually reacted with CTL from healthy donors. (They assumed that their healthy donors were immune to influenza A; that’s reasonable, but not a guarantee, so this is a particularly conservative test, I think.) Just 13 of their peptides were positive by this test (7.8% of their total predicted pool). Unexpectedly, two peptides that were non-binders triggered a response. Wang et al. speculated that the very low affinity binding was enough for the CTL, but I wonder if this represented a contamination issue: CTL are famously sensitive, and it’s well known that tiny amounts of contaminating peptides in a synthetic prep are enough to trigger CTL, even if they’re barely detectable by other means.


The paper I’ve thought of as the record-holder for accuracy (if I’m being generous with their denominator) is Kotturi et al,3 whose prediction was correct for 25 of 160 potential peptides, about twice as good as the influenza predictions here. But Kotturi et al were dealing with just two MHC class I alleles, H-2Db and H-2Kb, and those are very intensively-studied alleles. Wang et al. were not only looking at multiple alleles, they were using supertype approaches that allowed them to cover almost all (>99%) of the population, a much more difficult prediction. To me, then, their predictions are remarkably successful.

But still: Just over 7% of their predictions were correct. And even limiting the prediction to a single step in the complex pathway (just looking at MHC class I binding of the peptides), they’re barely above 50% accuracy.
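
For anyone who wants to check the percentages, here is the arithmetic as a short script. The counts are the ones quoted in this post (167 predicted peptides, 39 non-binders, 39 weak binders, 13 CTL-positive; 25 of 160 for Kotturi et al); the function and variable names are just labels chosen for this sketch.

```python
def success_rate(hits, total):
    """Fraction of predictions that panned out."""
    return hits / total


# Counts quoted above for Wang et al.
PREDICTED = 167
NON_BINDERS = 39
WEAK_BINDERS = 39
CTL_POSITIVE = 13

binders = PREDICTED - NON_BINDERS - WEAK_BINDERS
binding_rate = success_rate(binders, PREDICTED)
ctl_rate = success_rate(CTL_POSITIVE, PREDICTED)

# Counts quoted above for Kotturi et al.
kotturi_rate = success_rate(25, 160)

if __name__ == "__main__":
    print(f"authentic binders: {binders} of {PREDICTED} ({binding_rate:.3f})")
    print(f"CTL-positive:      {CTL_POSITIVE} of {PREDICTED} ({ctl_rate:.3f})")
    print(f"Kotturi et al.:    25 of 160 ({kotturi_rate:.3f})")
    print(f"ratio Kotturi / Wang: {kotturi_rate / ctl_rate:.2f}")
```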

It’s a hard job. But I have to say that the field is progressing with impressive speed; these predictions are much more accurate than I would have expected five years ago.

  1. Wang, M., Lamberth, K., Harndahl, M., Roder, G., Stryhn, A., Larsen, M. V., Nielsen, M., Lundegaard, C., Tang, S. T., Dziegiel, M. H., Rosenkvist, J., Pedersen, A. E., Buus, S., Claesson, M. H., and Lund, O. (2007). CTL epitopes for influenza A including the H5N1 bird flu; genome-, pathogen-, and HLA-wide screening. Vaccine 25, 2823-2831.
  2. NetMHC is based on these three references, which I’m including as a note to myself: (1) Nielsen, M., Lundegaard, C., Worning, P., Hvid, C. S., Lamberth, K., Buus, S., Brunak, S., and Lund, O. (2004). Improved prediction of MHC class I and class II epitopes using a novel Gibbs sampling approach. Bioinformatics 20, 1388-1397. (2) Nielsen, M., Lundegaard, C., Worning, P., Lauemoller, S. L., Lamberth, K., Buus, S., Brunak, S., and Lund, O. (2003). Reliable prediction of T-cell epitopes using neural networks with novel sequence representations. Protein Sci 12, 1007-1017. (3) Buus, S., Lauemoller, S. L., Worning, P., Kesmir, C., Frimurer, T., Corbet, S., Fomsgaard, A., Hilden, J., Holm, A., and Brunak, S. (2003). Sensitive quantitative predictions of peptide-MHC binding by a ‘Query by Committee’ artificial neural network approach. Tissue Antigens 62, 378-384.
  3. The CD8 T-Cell Response to Lymphocytic Choriomeningitis Virus Involves the L Antigen: Uncovering New Tricks for an Old Virus. Maya F. Kotturi, Bjoern Peters, Fernando Buendia-Laysa, Jr., John Sidney, Carla Oseroff, Jason Botten, Howard Grey, Michael J. Buchmeier, and Alessandro Sette. Journal of Virology, May 2007, p. 4928–4940
September 23rd, 2007

Viral side-effects and MHC

“Crow 104” by Meggan Gould

The other day I heard a fascinating talk from Ned Walker, on the ecology and evolution of West Nile Virus in birds and mosquitoes. Hopefully some of Ned’s cooler stuff should be published relatively soon, and I’ll talk about it then. In the meantime, Ned’s seminar reminded me of a really baffling observation I remembered reading about, in the mid 1990s, and prompted me to see what had happened to that story. As far as I can find, at present the state of the art regarding it seems to be (A) a big shrug, and (B) the suggestion that it’s an irrelevant side effect, the sort of thing that should make Larry Moran happy.

West Nile Virus (WNV) is a member of the flavivirus family, which are small (~11,000 bases) single-stranded RNA viruses that typically infect many species and often have an insect vector. For unknown reasons, in the mid 1990s WNV started to spread out of its traditional geographical regions (which are, you’ll never guess, the western Nile region of Africa) through much of Africa and Europe, and then hopped into North America and now has spread across the continent. It’s dangerous to humans (4269 cases, 177 fatalities in the USA in 2006) and much worse to birds — causing local extinctions of some species,1 especially crows2 (which, by the way, Ned3 thinks have got a bad rap as carriers — he thinks crows may be dead-end species, while many other species may be the actual routes of transmission).

Anyway, all this flavivirus talk reminded me of this previous observation.4 The reason I remember it was that it came out when I was particularly obsessed (even more so than I am today, believe it or not) with viral immune evasion; and the paper described exactly the opposite effect of what I was expecting.

Class I major histocompatibility complexes are recognized by cytotoxic T lymphocytes, which are generally agreed to be a major antiviral force; as such, many viruses target MHC class I and thereby block CTL recognition. 5 So it’s pretty common for virus-infected cells to show reduced levels of MHC class I.

West Nile Virus, transmission EM, from Wellcome Images
West Nile Virus

Mullbacher’s paper showed that flaviviruses do the opposite: They specifically up-regulate surface levels of MHC class I. (As it happens, this had been described earlier, though I think only in specific cell types,6 but this was the first time I had run into it.) Mullbacher’s group argued, and still argues, that this is because the virus specifically increases peptide transport into the endoplasmic reticulum (in the 1995 paper, they guessed that a general leakiness might be the cause; later, they argued that it’s a specific effect on the TAP peptide transporter7). Other groups believe that it’s a transcriptional effect, through several different (interferon-dependent and -independent) pathways.8 Still, the phenomenon seems real, significant, and robust.

How come? What’s the benefit to the virus to up-regulate MHC class I, thus making itself a better target to CTL?

Over the years (I found out, once I picked up on this story again last week) a bunch of different explanations have been proposed — resistance to natural killer cells, and so on; but none of them have been very convincing, and more recently, you get the sense that the researchers are just growing tired of it. (“MHC class I up-regulation by flaviviruses: Immune interaction with unknown advantage to host or pathogen.”9)

The latest article on the subject I’ve been able to find10 argues that it’s just a side-effect of the flavivirus life-cycle, and has nothing to do with immunity one way or another:

We propose that the phenomenon of flavivirus-mediated MHC class I upregulation is a by-product of a unique assembly strategy evolved by flaviviruses and therefore did not evolve primarily as an immune escape mechanism for virus growth in the vertebrate host.

Correct or not, it’s a reasonable suggestion, and a useful reminder that not everything we can measure is adaptive. However, if the other groups’ transcriptional arguments are correct, then the phenomenon sounds more like the infected cells’ attempt at host defense — which I would file under the “adaptive” category, even if it’s not actually effective in this case.

  1. LaDeau, S. L., Kilpatrick, A. M., and Marra, P. P. (2007). West Nile virus emergence and large-scale declines of North American bird populations. Nature 447, 710-713.
  2. The photo at the top, by the way, is by Meggan Gould
  3. I see that other groups have reached a similar conclusion, especially implicating robins as important transmission species
  4. Mullbacher, A., and Lobigs, M. (1995). Up-regulation of MHC class I by flavivirus-induced peptide translocation into the endoplasmic reticulum. Immunity 3, 207-214.
  5. At least, that’s the accepted wisdom, but see my previous discussions of that.
  6. King, N. J., Maxwell, L. E., and Kesson, A. M. (1989). Induction of class I major histocompatibility complex antigen expression by West Nile virus on gamma interferon-refractory early murine trophoblast cells. Proc Natl Acad Sci U S A 86, 911-915.
  7. Momburg, F., Mullbacher, A., and Lobigs, M. (2001). Modulation of transporter associated with antigen processing (TAP)-mediated peptide import into the endoplasmic reticulum by flavivirus infection. J Virol 75, 5663-5671.
  8. For example, Cheng, Y., King, N. J., and Kesson, A. M. (2004). Major histocompatibility complex class I (MHC-I) induction by West Nile virus: involvement of 2 signaling pathways in MHC-I up-regulation. J Infect Dis 189, 658-668.
  9. Lobigs, M., Mullbacher, A., and Regner, M. (2003). MHC class I up-regulation by flaviviruses: Immune interaction with unknown advantage to host or pathogen. Immunol Cell Biol 81, 217-223.
  10. Lobigs, M., Mullbacher, A., and Lee, E. (2004). Evidence that a mechanism for efficient flavivirus budding upregulates MHC class I. Immunol Cell Biol 82, 184-188.
September 4th, 2007

A bit of background

I’ve put up a static page with a brief summary of the MHC class I antigen presentation pathway. I just put up the introductory slide I usually use in my talks and added a simple key. It’s a pretty superficial overview that omits a number of critical steps, but it might be useful as background for some of the stuff I talk about.

July 23rd, 2007

Classic paper: Patterns in a haystack

In 1990, it was well known that major histocompatibility complexes bind peptides, and the structural basis for that binding was also clear; for example, Bjorkman et al’s crystal structure of HLA-A2, in 1987, showed the groove at the “top” of the MHC class I complex where peptides bind, and even showed an unstructured mass within it. A number of MHC-binding peptides had been identified, but (at least as I remember it) there was no general sense of a pattern among these peptides; there seemed to be little connecting them. Attempts to predict T cell epitopes focused more on peptide secondary structure1 or missed the point entirely by pooling together peptides from multiple different alleles.2 In some ways it was a confusing period; people were looking for binding peptides using synthetic long (say, 11mer or 15mer) peptides as they still do today, but with no guidance from patterns it was very difficult to identify the actual binding sequence (say, an 8- or 9mer) within the synthetic peptides.3 Accordingly, many of the peptides that were claimed to be epitopes were actually too long, extending past the edges of the authentic peptide. It was a circular problem: Without knowing the authentic epitopes, you couldn’t easily find the motifs, but without knowing the motifs, it was hard to identify the authentic epitopes.

In 1990 and 1991, Hans-Georg Rammensee’s group solved this problem almost single-handedly. Their work came out in several papers, but probably the most important was:

Allele-Specific Motifs Revealed By Sequencing Of Self-Peptides Eluted From MHC Molecules
Falk K, Rotzschke O, Stevanovic S, Jung G, Rammensee HG
Nature 351 (6324): 290-296 May 23 1991

This paper is partly a methodological advance (and its methods are probably the main reason it’s been cited 1768 times, as I write this), but it gave some important insights into antigen presentation as well. More importantly (this is my blog, so this is all about me, me me) I found it a delight to read, when it came out in the early years of my PhD; it seemed such a daring approach, trying something that I would have thought (at the time) had no chance of ever working; it’s a beautiful example of pulling a simple, clear, and mostly-true model out of a haystack of data; and it helped visualize the system so clearly.

Rammensee’s first breakthrough was to directly identify an authentic MHC class I epitope.4 They did this by inventing a technique that became standard, a combination of biochemistry (to purify peptides from influenza-infected cells) and T cells (to identify the stimulating peptide). The surprise at the time was that the peptide that best stimulated the T cells did not co-purify with the peptide that had been previously identified as the influenza epitope, but rather was a shorter version:

Incidentally, both crude synthetic peptide preparations … contains other peptides of smaller size, which coeluted exactly with the respective natural peptide … The natural Db-restricted peptide coeluted with ASNENMETM … which is recognized 1,000 times better than IASNENMETMESSTLE. … The data also indicate that the use of synthetic peptides to identify T-cell epitopes may be misleading, as very minor byproducts may be responsible for much of the biologic effect.

This was also a very exciting paper in that it showed just how extraordinarily sensitive T cells are — thousands of times more sensitive than had been thought, because they weren’t recognizing the abundant synthetic peptide itself but rather the tiny amounts of contaminants in the peptide preps.

Cells, even when uniformly infected with a virus, don’t present a single peptide; they present tens of thousands of different peptides, so the purification approach they had used previously was impossible for looking at overall peptide composition.5 This is where Rammensee’s group took their bold leap forward. They were pretty confident now that the peptides associated with MHC class I had a constant, defined length of 9 amino acids, and they were pretty confident that peptides bound to a particular MHC class I allele would have some features in common (a motif for binding). So rather than try to pull out individual peptides from the whole messy gamisch on the cells, they grabbed the entire pool, all the peptides bound to one MHC class I allele, and sequenced the whole damn thing, the whole ten-thousand-peptide pool, by mass spec.

Sometimes after a breakthrough technique is published, everyone slaps their forehead and says “D’OH!”, because in hindsight it’s obvious that it should work. (PCR, for example.) This is not one of those cases. It’s still amazing to me that it works, and especially that it worked so well back in 1991 (the technique is still tricky even with today’s mass spec technology). But work it did. They pulled apart the peptides, amino acid by amino acid, and analyzed each position. (They were even able to completely sequence one specific very abundant peptide, the self-peptide SYFPEITHI.) What they saw was that, first, after 9 cycles there was little signal, consistent with their fundamental idea that the MHC class I allele (H-2Kd, in this case; they looked at several other alleles as well) bound 9mers and supporting the idea that they were really looking at authentic MHC class I-bound peptides. The other, and critical, finding was that at some positions, some amino acids were over-represented: “The Kd-eluted peptides have a distinct amino-acid residue pattern for each position from 1 to 9, whereas the mock-eluted material shows a uniform pattern of residues throughout.” At position 2 (for example), tyrosine was some 40 times as abundant as most of the other amino acids. In contrast, at other positions (positions 1 and 3, for example), there was little if any difference between the amino acids. This led them to the concept of “anchor positions”, positions that tie down the whole peptide into the MHC class I binding groove. (See the picture to the right, taken from a 1993 paper by Don Wiley. It shows four different peptides that all bind HLA-A2; the side chains at each amino acid poke out fairly randomly, except for the second and the last amino acids (P2 and P9), the anchor positions, which are consistently tucked into the here-invisible pockets within the peptide-binding groove of HLA-A2.)
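
The position-by-position logic is easy to mimic on a computer, given individual aligned peptides rather than a pooled signal. The sketch below is only an illustration of the idea, not a reconstruction of the paper's analysis: apart from SYFPEITHI, every peptide in `TOY_POOL` is invented for this example (made up to carry a tyrosine at position 2 and isoleucine or leucine at position 9), and the 60% cut-off in `anchor_positions` is an arbitrary assumption.

```python
from collections import Counter

# SYFPEITHI is the self-peptide mentioned above; the other five are invented
# for illustration and are NOT real eluted peptides.
TOY_POOL = [
    "SYFPEITHI",
    "AYGKLDRQI",
    "TYNPMSWEL",
    "GYRDHVCAI",
    "LYEWTQKGL",
    "QYSTANPFI",
]


def position_profile(peptides):
    """Count which residues occur at each position of equal-length peptides."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    return [Counter(p[i] for p in peptides) for i in range(length)]


def anchor_positions(peptides, min_fraction=0.6):
    """Return (position, residue) pairs where one residue dominates the pool.

    Positions are 1-based, as in the text. min_fraction is an arbitrary
    threshold chosen for this sketch.
    """
    anchors = []
    for position, counts in enumerate(position_profile(peptides), start=1):
        residue, count = counts.most_common(1)[0]
        if count / len(peptides) >= min_fraction:
            anchors.append((position, residue))
    return anchors


if __name__ == "__main__":
    print(anchor_positions(TOY_POOL))
```

On this made-up pool only positions 2 and 9 pass the threshold, while the other positions look close to random, which is the qualitative pattern described above.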

They were then able to take previously-identified MHC class I epitopes and neatly line them up, matching them to the anchor residues’ motifs. Abruptly, an incoherent mass of chaotic data fell into a neat, organized, and obvious pattern. And just to round out the elegance, this all fit beautifully with the MHC class I crystal structure that had been determined a few years before:

Co-crystallizing material not from the A2 sequence and bound to the cleft showed extensions (possible Leu and Val side chains) fitting the A2 pockets … Therefore, different MHC class I alleles differ in the location and shape of pockets in the cleft likely to be able specifically to accommodate certain amino-acid side chains.

After this paper, the whole epitope identification problem became much, much easier. People who had been scratching their heads over long sequences sat down with a piece of paper and found the real epitope that had been hiding in their peptide.6 Now there are thousands of well-defined perfect T cell epitopes, their sequences available in public databases — the father of which is the SYFPEITHI database, named after the self-peptide sequenced in this paper.
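
The "piece of paper" exercise amounts to sliding a window along the long peptide and keeping the windows whose anchor positions fit. Here is a minimal sketch using the two long sequences mentioned in this post. The motifs in `DB_MOTIF` and `KB_MOTIF` are simplified assumptions made for the illustration (asparagine at position 5 and methionine at position 9 for H-2Db 9mers; phenylalanine or tyrosine at position 5 and leucine at position 8 for H-2Kb 8mers), not the full published motifs, and `scan_for_motif` is a hypothetical helper, not part of any real epitope-prediction tool.

```python
def scan_for_motif(sequence, length, anchors):
    """Return every window of the given length whose anchor positions match.

    anchors maps a 1-based position to a string of residues allowed there.
    """
    hits = []
    for start in range(len(sequence) - length + 1):
        window = sequence[start:start + length]
        if all(window[pos - 1] in allowed for pos, allowed in anchors.items()):
            hits.append(window)
    return hits


# Simplified motifs, assumed for this sketch only.
DB_MOTIF = {5: "N", 9: "M"}   # H-2Db, 9mers
KB_MOTIF = {5: "FY", 8: "L"}  # H-2Kb, 8mers

if __name__ == "__main__":
    # The long influenza peptide quoted above, and the 11mer from footnote 3.
    print(scan_for_motif("IASNENMETMESSTLE", 9, DB_MOTIF))
    print(scan_for_motif("TSSIEFARLQF", 8, KB_MOTIF))
```

In both cases exactly one window survives, and it is the authentic epitope named in the text (ASNENMETM and SSIEFARL respectively). Real motifs are looser than this, so a scan like this narrows the candidates rather than proving anything.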

  1. For example, Spouge, J. L., H. R. Guy, J. L. Cornette, H. Margalit, K. Cease, J. A. Berzofsky, and C. DeLisi. 1987. Strong conformational propensities enhance T cell antigenicity. J. Immunol. 138:204-212, and DeLisi, C., and J. A. Berzofsky. 1985. T-cell antigenic sites tend to be amphipathic structures. Proc. Natl. Acad. Sci. USA 82:7048-7052.
  2. Rothbard, J. B., and W. R. Taylor. 1988. A sequence pattern common to T cell epitopes. EMBO J. 7:93-100.
  3. For example, in J Virol 65:1177-1186 (1991), a paper published by my PhD lab just as I joined them, they found the 11mer sequence TSSIEFARLQF but weren’t able to narrow it down to the actual binding peptide SSIEFARL (later identified in Virology. 1993 Jul;195(1):62-70.)
  4. Rotzschke, O., K. Falk, K. Deres, H. Schild, M. Norda, J. Metzger, G. Jung, H. G. Rammensee. 1990. Isolation and analysis of naturally processed viral peptides as recognized by cytotoxic T cells. Nature 348: 252-254.
  5. Also, of course, you need specific T cells for the identification step after purification, and normal self-peptides pretty much by definition don’t trigger a T cell response, so you have no readout for most of the peptides on a cell.
  6. It’s worth emphasizing, though, that motifs are far from perfect predictors. A significant minority of good epitopes do not match the defined motif; for some examples, see Kotturi et al, which I discussed here. But most do.
July 5th, 2007

Peptide loading

When T cells recognize virus-infected cells, what they’re actually “recognizing” is a short peptide that’s stuck in a class I major histocompatibility complex. The peptide sits neatly in a groove formed by two helices on top of a beta-sheet (“a hot dog in a bun”, students are told in immunology class, though to me it doesn’t look much like that).1 On the left, there’s a diagram of this groove with no peptide associated; on the right, with its peptide tucked in; and below, there’s a space-filling view (from a different angle, but still looking “down” at the MHC surface, the way a T cell would be “looking”). In this last view I’ve drawn the MHC atoms in outline; the peptide is in brown, so you can see how tightly packed the peptide is. The “groove” is a pocket as much as a groove, and the peptide is buried fairly deeply within that pocket, with only its top surface exposed for T cells to look at.

How does the peptide wiggle into that slot? You can imagine that it would have some trouble just clicking in, like a Lego (TM) piece. What probably happens normally is that the pocket in the MHC is held in a more open configuration (the MHC class I is partially but not completely folded) until the peptide starts to settle in, and then the MHC actually finishes folding around the peptide. (There’s only circumstantial evidence for this, but it’s always hard to look at folding intermediates directly, even when they’re relatively long-lasting.) You’d expect that an accessory molecule, or molecules, might be involved either in the “holding open” phase, or in the “folding around the peptide” phase, or both.

As it happens, MHC class I interacts with a bunch of proteins during its maturation in the ER: classical chaperones like BiP, calnexin, calreticulin, ERp57, and PDI, ambiguous chaperones like tapasin, and the peptide transporter TAP. Of this list, tapasin has been the strongest candidate for “holding open” MHC class I and keeping it in a peptide-receptive state, and the finding that tapasin and ERp57 interact offered some conceptual models for how this might work. The latest issue of Nature Immunology has a paper that clarifies this:

Selective loading of high-affinity peptides onto major histocompatibility complex class I molecules by the tapasin-ERp57 heterodimer
Pamela A Wearsch & Peter Cresswell
Nature Immunology (Advance Online Publication: doi:10.1038/ni1485)

Cresswell’s group has finally managed to reconstitute peptide loading of MHC class I in vitro. There have been lots of attempts at this, but none have worked well.2 The key turns out to be that tapasin alone doesn’t work; you need to include a tapasin-ERp57 disulphide-linked heterodimer. (Calreticulin was also part of the complex they used, though it’s not clear to me whether that was essential for loading.) Under these conditions, tapasin/ERp57 acts as a “peptide editor”, in that when tapasin/ERp57 is present low-affinity peptides are less able to compete for binding; in other words, tapasin/ERp57 helps assure that the peptides associated with MHC class I are “good” ones. Apparently the tapasin/ERp57 heterodimer directly competes with peptides for binding:3

This observation suggests that the mechanism underlying peptide editing involves a reiterative process in which peptide displaces conjugate and conjugate displaces peptide until a sufficiently high affinity is reached that peptide remains associated.

This seems remarkably similar to the function of HLA-DM in MHC class II peptide assembly.
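
The reiterative process in the quote above can be caricatured in a few lines. This is a toy model only, with every number invented: it assumes a peptide binds in proportion to its supply, that the editor then gets one chance to displace it, and that a displaced peptide is replaced by a fresh draw, over and over, until one sticks. Under those assumptions the final share of each peptide is proportional to supply times its chance of resisting displacement. Nothing here comes from Wearsch and Cresswell's data; `SUPPLY` and `RETENTION` are hypothetical values picked to show the direction of the effect.

```python
def loaded_fractions(abundance, retention=None):
    """Share of MHC class I molecules ending up with each peptide (toy model).

    abundance: peptide name -> relative supply in the ER.
    retention: peptide name -> chance of surviving one displacement attempt.
               None means no editing, so whichever peptide binds first stays.
    Repeating "bind, maybe get displaced, bind again" until a peptide sticks
    gives final shares proportional to supply * retention.
    """
    weights = {
        name: supply * (1.0 if retention is None else retention[name])
        for name, supply in abundance.items()
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Invented numbers: low-affinity peptides are common but easily displaced.
SUPPLY = {"low_affinity": 0.9, "high_affinity": 0.1}
RETENTION = {"low_affinity": 0.05, "high_affinity": 0.95}

if __name__ == "__main__":
    print("without editing:", loaded_fractions(SUPPLY))
    print("with editing:   ", loaded_fractions(SUPPLY, RETENTION))
```

With these made-up inputs the high-affinity peptide goes from a tenth of the loaded complexes to roughly two thirds, which is the qualitative behaviour a "peptide editor" would be expected to produce.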

  1. 2007 is, I believe, the 20th anniversary of the first MHC class I crystal structure, and I’ll spend more time going over some of the more exciting features in a later post.
  2. Peptide will associate with a purified MHC class I/beta-2-microglobulin complex in the test tube, but it’s a very slow process, hours to days, compared to the 10-30 minutes that it takes in vivo.
  3. I don’t think this implies they necessarily bind to the same site, though.
June 15th, 2007

Peptide splicing, proteasomes, and immunity

Here I’m picking up on a throwaway comment I made in a thread on Larry Moran’s “Sandwalk” blog. Larry wrote about protein turnover in the cell, a favourite topic of mine to start with, especially when proteasomes come into play, as they so often do.

In the comments, daedalus2u observed “Proteases only hydrolyze peptides when the equilibrium favors it. Under conditions of dehydration, the equilibrium favors the making of peptides.” He made this in the context of lysosomes (and frankly his train of thought seems to increasingly run off the rails as the comment progresses) but it prompted Ryan to say that “I doubt proteasomes could ever act in reverse.”

Just about everyone else doubted it, too, until a few years ago, when some really cool evidence for just that happening came out of immunology. As it turns out, though, proteasomes almost certainly can act in reverse and splice peptides. For a while it even seemed possible that this could be a common event, but I think it’s becoming increasingly likely that it’s actually a very rare event, one that’s usually only detectable by the exquisitely-sensitive T cell recognition system.1

Ryan’s reasoning wasn’t bad. He argued that “dehydrating the proteasome would change it’s structure and probably eliminate any catalytic activity.” That makes sense, but it misses something unusual (though not unique) about the proteasome.

Proteasomes have been in the news quite a bit since they won the Nobel in 2004,2 and there are lots of friendly introductions to proteasome-mediated protein degradation around. The Nobel Foundation has a fairly friendly “Information for the public” thing, and a less friendly but more complete PDF. For the purpose of peptide splicing, though, you only need to know the basics.

Here’s the basics: Proteasomes are multi-catalytic proteases, and they’re very abundant throughout the cytoplasm and nucleus of most cells. From this, you can work out why peptide splicing works. Not that anyone actually did work it out, but in hindsight there’s a definite logic to it. Follow closely here:

Proteasomes are multicatalytic. That is, they can chop up many different peptide bonds. That’s in contrast to many proteases, which only cleave when a very precise sequence of amino acids lines up. Proteasomes do have their preferences, sure; there are sequences they don’t like. But if you feed a protein to a purified proteasome you’ll find that virtually every possible amino acid pair has been cleaved (if only very rarely).

If they’re multicatalytic, and they’re abundant, then they’re a potential hazard to normal cell function. You can’t have a protease indiscriminately chewing up cellular proteins. So proteasomes are regulated proteases (the regulation part is what the Nobel was for). If they’re regulated, you have to have a way to shield the catalytic sites so they only attack what they’re supposed to. Proteasomes do this by hiding their active sites on the inside of a hollow cylinder.

Proteasome end view
Proteasome side view

Here I get to throw in a couple of images of the proteasome, which is something I do at every opportunity anyway.3 There’s an end view and a side view.4 In fact in a real cell, you probably wouldn’t see the end view like this, because this is the central core of a larger particle that has caps over the open ends. But it makes the point that this is a hollow, barrel-shaped structure. The catalytic sites are on the inside, the caps normally prevent access to the inside, and the regulatory machinery ends up selecting proteins that feed into the open chamber for destruction.

A couple of other proteases follow this pattern, by the way — tricorn protease is a huge, hollow icosahedral particle, for example. Tripeptidyl peptidase II is also a gigantic particle, and I wonder if there’s some kind of regulatory aspect to its size, even though as far as I know from relatively crude evidence, the catalytic sites of TPPII are more or less exposed.

Anyway, the hollow barrel of a proteasome is probably the key to its ability to do peptide splicing. As daedalus2u pointed out, enzymes run both ways. Proteases in general act through hydrolysis, which requires, of course, water. If there’s no water, the reaction can run backwards. In the old days, I’m told, that was how you synthesized peptides: you took the appropriate enzyme and ran the reaction in a non-aqueous system. Normally, of course, there is water inside a proteasome, or it wouldn’t work. But it’s not hard to picture a scenario where peptides are being rapidly generated, and before they have a chance to diffuse out of the proteasome they’re squeezing away water molecules. There you have a high concentration of reactive peptide ends, crowded together in the absence of a water molecule and bumping up against a promiscuous active site. When that happens, you can get peptide splicing.

As I said, this was detected using T cells, which are very sensitive to peptides — recognizing fewer than ten per cell, perhaps. In 2004, Benoit van den Eynde showed that a peptide that was a T cell epitope was in fact generated by peptide splicing in the proteasome5 and later, he showed that you can even swap position, demonstrating this with a T cell epitope that was generated by splicing two peptides in the reverse order.6

How common is this? After the first paper or two, we really didn’t know. When you look at peptide epitopes associated with a cell, I’m told, there are often a significant number that can’t be identified by blasting through databases. Were all of these unidentified because they were peptide splices? That was Benoit’s original idea, I think, and I wouldn’t have been at all surprised to see a small flood of papers triumphantly identifying as spliced those pesky holdout peptides from previous work.

Hasn’t happened, though. It’s negative evidence, but for the most part peptide splicing doesn’t seem to have fixed the problem of the unidentified peptide.7 Perhaps there will still be a herd of peptide splicing examples popping up any day now, but for now I’m leaning to the idea that this really is a very rare event.

Too bad, because it’s pretty cool.

  1. But I’m not going to be dogmatic about it. It’s an open possibility that this is a common event that’s just very hard to detect[]
  2. At any event, they’ve been in the news more often, even if they haven’t caught up with Paris Hilton yet[]
  3. Just one of the many things that make me the life of any party. I wonder why I’m not invited to more?[]
  4. This is the mammalian 20S proteasome, ref. Unno et al., Structure 2002 May; 10(5):609-18. I made the images with iMol from the pdb files.[]
  5. Science. 2004 Apr 23;304(5670):587-90[]
  6. Science. 2006 Sep 8;313(5792):1444-7[]
  7. They’re probably allelic variants, or maybe sequencing errors, or something like that, is my guess now[]
June 14th, 2007

Epitopes and Microsoft Computational Biology

Microsoft has released as open-source some code for analysis of antiviral immunity. They offer 4 tools: PhyloD, Epitope Predictor, HLA Completion, and HLA Assignment. The first two are particularly interesting to me.

PhyloD is

a statistical tool that can identify HIV mutations that defeat the function of the HLA proteins in certain patients, thereby allowing the virus to escape elimination by the immune system. By applying this tool to large studies of infected patients, researchers are now able to start decoding the complex rules that govern the HIV mutations, in the hope of one day creating a vaccine to which the virus is unable to develop resistance.

The reference is to Bhattacharya et al., Science 16 March 2007: Vol. 315. no. 5818, pp. 1583 – 1586. It’s work that arises directly out of Bruce Walker’s (and others, but mostly Walker’s) work on HIV immune escape variants, which dates back to the late 1990s. I want to talk about immune escape in HIV some time, but that’s going to be a long post and I have a grant due, so I’m just going to move on to the second interesting tool, the Epitope Predictor. “This tool computes the probability that a given kmer is a T-cell epitope restricted to a given HLA allele”; the reference is Heckerman et al., RECOMB 2006, which I haven’t read yet.

This is interesting to me because it’s something I’m working on directly as well. Epitope prediction is a remarkably difficult job to do well — it’s easy to take a first pass and drastically narrow down your possibilities, but getting an accurate end product is hard.

Epitopes, in this case, are sequences of amino acids that are cut out of the full-length protein and recognized by the T cells. A full-length protein might be 500 or 1000 or more amino acids long, whereas epitopes are typically 9 amino acids long. A generic virus, say HIV, will have thousands, tens of thousands, of peptides of the appropriate length. There are moderate constraints on what can be turned into epitopes, because the peptides have to bind to HLA molecules. (HLA, human leukocyte antigen, is the species-specific term for MHC, major histocompatibility complex. I tend to use MHC, but to avoid, or at least reduce, confusion, I’ll stick to HLA here.) HLA molecules have binding rules: “Anchor” positions of the peptide must fit certain patterns. For example, a peptide that binds to one particular human MHC allele (HLA-A3) will usually have a leucine, valine, or methionine at position 2, a lysine, tyrosine, or phenylalanine at the last position, and is fairly likely to have one of two amino acids at position 3, one of five at position 6, and one of four at position 7. So still fairly broad, but much narrower than the 20 to the 9th possibilities with no restrictions at all.
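A minimal sketch of that narrowing-down step, in Python. It assumes only the two HLA-A3 anchors described above (position 2 and the last position) and ignores the softer preferences at positions 3, 6, and 7. The function names and the toy sequence are made up for illustration; this is not any of the prediction tools discussed in this post.

```python
# Hypothetical anchor-motif filter. Only the position-2 and C-terminal
# anchors quoted in the text are used; real predictors score far more.
P2_ANCHORS = set("LVM")      # leucine, valine, methionine at position 2
CTERM_ANCHORS = set("KYF")   # lysine, tyrosine, phenylalanine at the last position


def kmers(protein, k=9):
    """Yield (1-based start position, peptide) for every window of length k."""
    for i in range(len(protein) - k + 1):
        yield i + 1, protein[i:i + k]


def matches_anchors(peptide):
    """True if the peptide has an allowed residue at both anchor positions."""
    return peptide[1] in P2_ANCHORS and peptide[-1] in CTERM_ANCHORS


def candidate_epitopes(protein, k=9):
    """All k-mers of the protein that pass the two-anchor filter."""
    return [(pos, pep) for pos, pep in kmers(protein, k) if matches_anchors(pep)]


# Made-up sequence, for illustration only (not a real viral protein).
toy_protein = "MALSKQWERTYAVGHNDPCKILMF"
candidates = candidate_epitopes(toy_protein)

# Size of the search space with and without the two anchors.
unrestricted = 20 ** 9        # every possible 9-mer
anchored = 3 * 3 * 20 ** 7    # three choices at each of the two anchors
fraction = anchored / unrestricted
```

Two anchors alone leave 9/400 of all possible 9-mers, a bit over 2%, which is the "easy first pass". Everything after that is where the difficulty lies.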

Humans, like almost all vertebrates, are wildly complex at the MHC genes — you don’t have the same HLA type as your neighbour, and probably don’t even have exactly the same type as your sister. But let’s just focus for now on one HLA type, HLA-A2 (the most common HLA-A allele in North American caucasians), because I want to see how good the Microsoft epitope prediction is.

There are several other on-line epitope prediction tools, and I haven’t tried all of them. Two that I did try are SYFPEITHI and the IEDB tool. I’ve also written a couple of my own, just for fun, that are very simple-minded and crude. My own, which I’ve tested more extensively than any others, tend to catch “real” epitopes (i.e. those that occur naturally) as one of the top ten or twenty possibilities. My best scores are rarely the real epitopes, but it’s also rare to have a complete miss that doesn’t catch one in the top twenty or so.

A recent paper (Kotturi et al., Journal of Virology, May 2007, p. 4928–4940) looked at epitope prediction quite exhaustively — again this is something I want to talk about more extensively at a future date — and the bottom line was that epitope prediction was really helpful; it narrowed their search from thousands of peptides (that only caught two-thirds of the real epitopes) to a couple hundred (that caught more like 90% — but still missed a significant number of real epitopes, and still had around 90% false positives).

So, and this isn’t a careful test, let’s throw a few examples at the predictions and see how we do. I used an HIV nef protein that has at least 7 known epitopes that bind to HLA-A2 (if you’re playing along at home, the epitopes are ILKEPVHGV, VIYQYMDDL, VLDVGDAYFSV, ALQDSGLEV, IYQYMDDLYV, ELVNQIIEQL, and KYTAFTIPSI).

SYFPEITHI’s prediction does pretty well, catching 5 of the 7 in their top 25 scores; their first and third best were both true hits, and the other three were lower down in their ranking.

The IEDB tool did poorly, only finding one of the true epitopes in its top 25 (though it did give that one its highest score). To be fair, this prediction site needs a lot more fiddling than the others, and I didn’t spend much time tweaking it.

My own script catches 3 of the 7 in my top 25 scores, but none are in the top ten.

By comparison, the Epitope Predictor (remember the Epitope Predictor? This here’s a post about the Epitope Predictor) catches 2 of the 7 correctly, ranking them number 1 and 3.

So the bottom line, I think, is not that Microsoft sucks, but rather that epitope prediction is hard. There’s plenty of room for improvement (that’s part of the grant I’m working on). From this single example, SYFPEITHI — the granddaddy of epitope prediction — is pretty good, but even a very crude approach (mine) isn’t all that much worse.

Potentially, pooling approaches could be useful. Only one of the seven epitopes here was not predicted by any of the systems I tried here; three were only predicted by one of the systems (SYFPEITHI caught two, I caught the other); and only one epitope was predicted by all four systems. On the other hand, there would be a lot more noise, too.
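The bookkeeping behind a pooled comparison like this can be sketched in a few lines of Python. The seven known epitopes are the ones listed above. The two ranked lists, the tool names, and the filler peptides are invented placeholders, not the actual output of any of the four predictors.

```python
# The seven known HLA-A2 epitopes listed in this post.
KNOWN = [
    "ILKEPVHGV", "VIYQYMDDL", "VLDVGDAYFSV", "ALQDSGLEV",
    "IYQYMDDLYV", "ELVNQIIEQL", "KYTAFTIPSI",
]


def hits_in_top(ranked, known, n=25):
    """Known epitopes that appear among the first n ranked predictions."""
    return set(ranked[:n]) & set(known)


def pool(rankings, known, n=25):
    """Map each known epitope to the list of predictors that caught it."""
    caught = {epitope: [] for epitope in known}
    for name, ranked in rankings.items():
        for epitope in hits_in_top(ranked, known, n):
            caught[epitope].append(name)
    return caught


# Invented rankings, best score first. Placeholders only, not real results.
rankings = {
    "tool_a": ["ILKEPVHGV", "AAAAAAAAA", "VIYQYMDDL"],
    "tool_b": ["CCCCCCCCC", "ILKEPVHGV"],
}

caught = pool(rankings, KNOWN)
missed_by_all = [ep for ep, tools in caught.items() if not tools]
caught_by_one = [ep for ep, tools in caught.items() if len(tools) == 1]
caught_by_all = [ep for ep, tools in caught.items() if len(tools) == len(rankings)]
```

Pooling this way takes the union of each tool's top 25, so it recovers more true epitopes but also carries along every tool's false positives. That is the extra noise mentioned above.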

So how come epitope prediction is so hard?

More about that later.