In 1990, it was well known that major histocompatibility complexes bind peptides, and the structural basis for that binding was also clear; for example, Bjorkman et al.’s crystal structure of HLA-A2, in 1987, showed the groove at the “top” of the MHC class I complex where peptides bind, and even showed an unstructured mass within it. A number of MHC-binding peptides had been identified, but (at least as I remember it) there was no general sense of a pattern among these peptides; there seemed to be little connecting them. Attempts to predict T cell epitopes focused more on peptide secondary structure [1] or missed the point entirely by pooling together peptides from multiple different alleles [2]. In some ways it was a confusing period; people were looking for binding peptides using long synthetic peptides (say, 11mers or 15mers), as they still do today, but with no guidance from patterns it was very difficult to identify the actual binding sequence (say, an 8- or 9mer) within the synthetic peptides [3]. Accordingly, many of the peptides that were claimed to be epitopes were actually too long, extending past the edges of the authentic peptide. It was a circular problem: without knowing the authentic epitopes, you couldn’t easily find the motifs, but without knowing the motifs, it was hard to identify the authentic epitopes.

In 1990 and 1991, Hans-Georg Rammensee’s group solved this problem almost single-handedly. Their work came out in several papers, but probably the most important was:

Allele-Specific Motifs Revealed By Sequencing Of Self-Peptides Eluted From MHC Molecules
Falk K, Rotzschke O, Stevanovic S, Jung G, Rammensee HG
Nature 351 (6324): 290-296 May 23 1991

This paper is partly a methodological advance (and its methods are probably the main reason it’s been cited 1768 times, as I write this), but it gave some important insights into antigen presentation as well. More importantly (this is my blog, so this is all about me, me, me), I found it a delight to read when it came out in the early years of my PhD. It seemed such a daring approach, trying something that I would have thought (at the time) had no chance of ever working; it’s a beautiful example of pulling a simple, clear, and mostly-true model out of a haystack of data; and it helped visualize the system so clearly.

Rammensee’s first breakthrough was to directly identify an authentic MHC class I epitope [4]. They did this by inventing a technique that became standard, a combination of biochemistry (to purify peptides from influenza-infected cells) and T cells (to identify the stimulating peptide). The surprise at the time was that the peptide that best stimulated the T cells did not co-purify with the peptide that had been previously identified as the influenza epitope, but rather was a shorter version:

Incidentally, both crude synthetic peptide preparations … contains other peptides of smaller size, which coeluted exactly with the respective natural peptide … The natural Db-restricted peptide coeluted with ASNENMETM … which is recognized 1,000 times better than IASNENMETMESSTLE. … The data also indicate that the use of synthetic peptides to identify T-cell epitopes may be misleading, as very minor byproducts may be responsible for much of the biologic effect.

This was also a very exciting paper in that it showed just how extraordinarily sensitive T cells are — thousands of times more sensitive than had been thought, because they weren’t recognizing the abundant synthetic peptide itself but rather the tiny amounts of contaminants in the peptide preps.

Cells, even when uniformly infected with a virus, don’t present a single peptide; they present tens of thousands of different peptides, so the purification approach Rammensee’s group had used previously was no use for looking at overall peptide composition [5]. This is where Rammensee’s group took their bold leap forward. They were pretty confident now that the peptides associated with MHC class I had a constant, defined length of 9 amino acids, and they were pretty confident that peptides bound to a particular MHC class I allele would have some features in common: a motif for binding. So rather than try to pull out individual peptides from the whole messy gemisch on the cells, they grabbed the entire pool, all the peptides bound to one MHC class I allele, and sequenced the whole damn thing, the whole ten-thousand-peptide pool, by Edman degradation.

Sometimes after a breakthrough technique is published, everyone slaps their forehead and says “D’OH!”, because in hindsight it’s obvious that it should work. (PCR, for example.) This is not one of those cases. It’s still amazing to me that it works, and especially that it worked so well back in 1991 (the technique is still tricky even with today’s mass spec technology). But work it did. They pulled apart the peptides, amino acid by amino acid, and analyzed each position. (They were even able to completely sequence one specific very abundant peptide, the self-peptide SYFPEITHI.) What they saw was that, first, after 9 cycles there was little signal, consistent with their fundamental idea that the MHC class I allele (H-2Kd, in this case; they looked at several other alleles as well) bound 9mers and supporting the idea that they were really looking at authentic MHC class I-bound peptides. The other, and critical, finding was that at some positions, some amino acids were over-represented: “The Kd-eluted peptides have a distinct amino-acid residue pattern for each position from 1 to 9, whereas the mock-eluted material shows a uniform pattern of residues throughout.” At position 2 (for example), tyrosine was some 40 times as abundant as most of the other amino acids. In contrast, at other positions (positions 1 and 3, for example), there was little if any difference between the amino acids. This led them to the concept of “anchor positions”, positions that tie down the whole peptide into the MHC class I binding groove. (See the picture to the right, taken from a 1993 paper by Don Wiley. It shows four different peptides that all bind HLA-A2; the side chains at each amino acid poke out fairly randomly, except for the second and the last amino acids (P2 and P9), the anchor positions, which are consistently tucked into the here-invisible pockets within the peptide-binding groove of HLA-A2.)
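To make the position-by-position idea concrete, here is a rough Python sketch of the tallying logic. It is a simplification and not what the paper actually did: the real experiment measured amino acid yields from the whole mixed pool at each sequencing cycle, whereas this toy version counts residues across individually known sequences. Apart from SYFPEITHI, the peptides below are invented for illustration, and the function names and the 0.8 cutoff are my own arbitrary choices.

```python
from collections import Counter

# Toy pool of 9-mers. SYFPEITHI comes from the text above; the other four
# sequences are invented for illustration and are NOT real eluted peptides.
TOY_POOL = [
    "SYFPEITHI",
    "AYGKLMNDI",
    "RYTQEVSAL",
    "GYDRHPWKI",
    "LYSNCAFTL",
]


def position_frequencies(peptides):
    """Return one dict per position mapping residue -> fraction of peptides."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    freqs = []
    for i in range(length):
        counts = Counter(p[i] for p in peptides)
        freqs.append({aa: n / len(peptides) for aa, n in counts.items()})
    return freqs


def dominant_positions(peptides, min_fraction=0.8):
    """Map 1-based position -> residue wherever a single residue dominates.

    min_fraction is an arbitrary illustrative cutoff, not a published value.
    """
    result = {}
    for pos, column in enumerate(position_frequencies(peptides), start=1):
        residue, fraction = max(column.items(), key=lambda kv: kv[1])
        if fraction >= min_fraction:
            result[pos] = residue
    return result


if __name__ == "__main__":
    # Only position 2 (tyrosine) clears the cutoff in this toy pool.
    print(dominant_positions(TOY_POOL))
```

In this toy pool, position 2 stands out because every peptide has tyrosine there, while positions such as 1 and 3 show no preference at all. A stricter analysis would compare each residue against its background abundance rather than use a flat cutoff.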

They were then able to take previously-identified MHC class I epitopes and neatly line them up, matching them to the anchor residues’ motifs. Abruptly, an incoherent mass of chaotic data fell into a neat, organized, and obvious pattern. And just to round out the elegance, this all fit beautifully with the MHC class I crystal structure that had been determined a few years before:

Co-crystallizing material not from the A2 sequence and bound to the cleft showed extensions (possible Leu and Val side chains) fitting the A2 pockets … Therefore, different MHC class I alleles differ in the location and shape of pockets in the cleft likely to be able specifically to accommodate certain amino-acid side chains.

After this paper, the whole epitope identification problem became much, much easier. People who had been scratching their heads over long sequences sat down with a piece of paper and found the real epitope that had been hiding in their peptide [6]. Now there are thousands of well-defined perfect T cell epitopes, their sequences available in public databases, the father of which is the SYFPEITHI database, named after the self-peptide sequenced in this paper.
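The piece-of-paper exercise is easy to sketch in Python: slide a window of the right length along the long peptide and keep the windows whose anchor positions match the motif. The two long peptides come from this post (the influenza 16mer and the 11mer from footnote 3). The motif table is my own simplified assumption, not something spelled out above: Asn at P5 and Met/Ile/Leu at P9 for H-2Db 9mers, and Phe/Tyr at P5 and Leu/Met at P8 for H-2Kb 8mers. The function and variable names are hypothetical.

```python
# Simplified anchor motifs (1-based positions). These are assumptions made
# for this sketch; real motifs are fuzzier and include secondary preferences.
MOTIFS = {
    "H-2Db": {"length": 9, "anchors": {5: "N", 9: "MIL"}},
    "H-2Kb": {"length": 8, "anchors": {5: "FY", 8: "LM"}},
}


def find_motif_matches(long_peptide, motif):
    """Return every window of the motif's length whose anchors all match."""
    length = motif["length"]
    hits = []
    for start in range(len(long_peptide) - length + 1):
        window = long_peptide[start:start + length]
        if all(window[pos - 1] in allowed
               for pos, allowed in motif["anchors"].items()):
            hits.append(window)
    return hits


if __name__ == "__main__":
    # Influenza 16mer quoted earlier in the post.
    print(find_motif_matches("IASNENMETMESSTLE", MOTIFS["H-2Db"]))
    # The 11mer from footnote 3.
    print(find_motif_matches("TSSIEFARLQF", MOTIFS["H-2Kb"]))
```

Under these assumed motifs, each long peptide contains exactly one matching window: ASNENMETM in the first and SSIEFARL in the second. A strict anchor match like this will miss the minority of good epitopes that do not fit the motif.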


  1. For example, Spouge, J. L., H. R. Guy, J. L. Cornette, H. Margalit, K. Cease, J. A. Berzofsky, and C. DeLisi. 1987. Strong conformational propensities enhance T cell antigenicity. J. Immunol. 138:204-212, and DeLisi, C., and J. A. Berzofsky. 1985. T-cell antigenic sites tend to be amphipathic structures. Proc. Natl. Acad. Sci. USA 82:7048-7052.
  2. Rothbard, J. B., and W. R. Taylor. 1988. A sequence pattern common to T cell epitopes. EMBO J. 7:93-100.
  3. For example, in J Virol 65:1177-1186 (1991), a paper published by my PhD lab just as I joined them, they found the 11mer sequence TSSIEFARLQF but weren’t able to narrow it down to the actual binding peptide SSIEFARL (later identified in Virology. 1993 Jul;195(1):62-70).
  4. Rotzschke, O., K. Falk, K. Deres, H. Schild, M. Norda, J. Metzger, G. Jung, H. G. Rammensee. 1990. Isolation and analysis of naturally processed viral peptides as recognized by cytotoxic T cells. Nature 348: 252-254.
  5. Also, of course, you need specific T cells for the identification step after purification, and normal self-peptides pretty much by definition don’t trigger a T cell response, so you have no readout for most of the peptides on a cell.
  6. It’s worth emphasizing, though, that motifs are far from perfect predictors. A significant minority of good epitopes do not match the defined motif (for some examples, see Kottori et al, which I discussed here). But most do.