Research: Science and Education

Significant Figures, the Periodic Table, and Mass Spectrometry: The Challenge of Large Biomolecules

Nancy Carter Dopke, Paul M. Treichel, and Martha M. Vestling*
Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53706; *[email protected]

Overview

Mass spectrometry, a tool long used by chemists to identify small compounds, is now being used by scientists in biological and health sciences as well as by chemists to characterize large biomolecules. The development of new techniques has made it possible to put intact large molecular ions into the gas phase. Two techniques, matrix-assisted laser desorption/ionization (MALDI) (1) and electrospray ionization (ESI) (2), have been particularly successful. Values of mass as high as 800,000 Da are now being reported from mass spectrometric studies, and it is expected that the next improvements in instrumentation will continue to expand the range of observable masses.

So how many significant figures are required for meaningful mass spectrometric data for biomolecules? The answer depends on what question is being asked and on the molecular weight. To detect deamidation (Gln to Glu or Asn to Asp; ∆ 1 Da) or formation of a disulfide bond (∆ 2 Da), the measurement must be able to register a mass change of only 1 or 2 Da. Table 1 illustrates that the number of significant figures needed depends on the molecular weight of the substance.

From a mass spectrum, it is possible to obtain information on the composition and structure of a compound. The former comes from measured ion masses and the latter primarily from the way molecules fragment. We focus here on the first topic, the masses of ions and molecules. This paper expands the discussion presented in a recent article in this Journal (3) describing the arithmetic used to predict isotopic cluster patterns.
For large molecules, hand calculations of isotope patterns give way to calculations by computer, and Web sites that allow students and faculty to calculate isotopic patterns for compounds of any size are listed. This paper also explores the relationship between masses acquired by mass spectrometry and calculated molecular weights. This relationship is important because mass spectrometry data are now often routinely presented in lieu of molecular weight data obtained by other means.

This discussion is likely to be useful to teachers of chemistry at different levels. Examples that deal with accuracy, precision, and significant figures of measurements are always useful. In addition, this paper comments on the challenge of calculating mass values for large biomolecules for use as mass spectrometry calibrants.

The discussion starts with a small molecule, acetic acid (60 Da), and then considers four increasingly larger biomolecules:

  bradykinin                   1,060 Da   peptide hormone
  ubiquitin                    8,565 Da   small protein
  soybean trypsin inhibitor   20,090 Da   enzyme inhibitor (protein)
  glycogen phosphorylase b    97,163 Da   enzyme (protein)

All four molecules have known structures and are well characterized. The standard way to calibrate a mass spectrometric signal is to bracket the signal with signals from compounds of known mass. For this purpose, it is necessary to know the mass of the calibrant peaks quite accurately. The four compounds above have been used in mass spectrometer calibration.

The challenge facing mass spectrometrists is twofold: the experimental problem that involves spectrometer resolution (the ability to experimentally differentiate two peaks) and the calibration problem (the assignment of mass values). Two different definitions of resolution are commonly used. Mass spectrometrists most often define resolution R as R = M/∆M, where M = m/z and ∆M is the peak width at half maximum (Fig. 1). This definition is different from the one chromatographers and spectrophotometrists use: R = ∆M.

Table 1. Significant Figures Necessary to Detect a 1- or 2-Da Difference in Mass of Biomolecules

  Molecular Weight   Significant Figures
  1,000              4
  10,000             5
  100,000            6
  1,000,000          7

Figure 1. Illustration of a commonly used definition of resolution.

JChemEd.chem.wisc.edu • Vol. 77 No. 8 August 2000 • Journal of Chemical Education 1065

Calculation of Isotope Patterns

To understand mass spectrometry of large molecules, it is useful to first examine the mass spectrum of a small molecule such as acetic acid. The typical electron impact spectrum recorded for acetic acid contains three peaks, at m/z 60, 61, and 62. The mass 60 peak, by far the most intense one, corresponds to ¹²C₂¹H₄¹⁶O₂ and is called the monoisotopic peak. The two lower-intensity peaks are due to molecules containing one or more heavier isotopes.

The simple three-peak pattern for acetic acid belies the complexity of the isotopic cluster. Statistical analysis, taking into account the mass and relative abundance of each isotope, quickly confirms this (4). Even with eight atoms, the number of isotopic combinations is fairly staggering. With two carbon isotopes, 12C and 13C, two hydrogen isotopes, 1H and 2H, and three oxygen isotopes, 16O, 17O, and 18O, 90 different isotopic combinations are possible, ranging in mass from 60 to 70 (¹³C₂²H₄¹⁸O₂). This does not count positional isotope possibilities such as ¹³CH₃¹²CO₂H and ¹²CH₃¹³CO₂H, since they have the same mass.

The expected abundance (A) for any isotopic composition of a small molecule can be calculated fairly easily. For small molecules like acetic acid, all that is needed is pen, paper, and calculator (or computer spreadsheet). Yergey’s paper (4a) gives an informative example, ¹⁶O₄₇¹⁷O₂¹⁸O₁:

\[ A = \frac{50!}{47!\,2!\,1!}\; r_{16}^{47}\, r_{17}^{2}\, r_{18}^{1} = 1.52 \times 10^{-5} \]
where A is the expected abundance, 50 is the total number of oxygens, r16 is the abundance of 16O, r17 is the abundance of 17O, and r18 is the abundance of 18O. To calculate the abundance for any formula, an element abundance is first calculated for each element present. These element abundances are then multiplied together to generate an abundance value for a particular isotopic composition.

How many of acetic acid’s 90 isotopic combinations can we separate and measure? This depends on the resolution of the mass spectrometer being used and on the relative abundances of the masses. With R = M/∆M, “unit” resolution can be estimated as R_unit = M/0.5. For acetic acid, unit resolution would be 120 (60/0.5). Is this enough resolution to separate the 90 isotopic combinations? Table 2 shows the masses for a selection of abundant species in acetic acid’s isotope cluster. A mass spectrometer operating at unit resolution would see the six species listed in Table 2 as three peaks, one at m/z 60, one at m/z 61, and one at m/z 62. In order to separate the three species shown at mass 61, a much higher resolution, approximately 122,000, would be needed (61.02/0.0005 ≈ 122,000). Most mass spectrometers can resolve acetic acid to unit resolution, but not to isotopic resolution.

The masses and relative abundances were calculated using the values shown in Table 3. The arithmetic behind the fourth entry in Table 2 is as follows:

\[ A = \underbrace{[(0.9890)^{2}]}_{\text{carbon abundance}}\;\underbrace{[4(0.99985)^{3}(0.00015)]}_{\text{hydrogen abundance}}\;\underbrace{[(0.99762)^{2}]}_{\text{oxygen abundance}} = 0.00058 \]

\[ \%\ \text{relative abundance} = \frac{0.00058}{A(^{12}\mathrm{C}_{2}\,^{1}\mathrm{H}_{4}\,^{16}\mathrm{O}_{2})} \times 100 = \frac{0.00058}{0.972887} \times 100 = 0.060 \]

Masses above 62 are of such low abundance that distinguishing them from noise is difficult or impossible. Eighty-four of the isotopic combinations are calculated to be less than 0.01% relative abundance. To obtain a complete isotope distribution, calculations for all combinations need to be done. Most people end up discarding the very low abundance ones.
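The multinomial arithmetic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `element_abundance` and `unit_resolution` are invented here and are not from the paper or from any isotope-pattern package; the abundances are the Table 3 percentages divided by 100.

```python
from math import factorial


def element_abundance(counts, abundances):
    """Multinomial probability of one isotopic composition of a single element.

    counts     -- number of atoms of each isotope, e.g. (47, 2, 1)
    abundances -- fractional natural abundance of each isotope, same order
    (Hypothetical helper written for this illustration.)
    """
    coefficient = factorial(sum(counts))
    probability = 1.0
    for k, r in zip(counts, abundances):
        coefficient //= factorial(k)  # builds n! / (k1! k2! ...)
        probability *= r ** k
    return coefficient * probability


def unit_resolution(mass):
    """R = M / dM with dM = 0.5, the estimate of unit resolution used in the text."""
    return mass / 0.5


# Fractional abundances taken from Table 3 (percent / 100).
CARBON = (0.9890, 0.0110)               # 12C, 13C
HYDROGEN = (0.99985, 0.00015)           # 1H, 2H
OXYGEN = (0.99762, 0.00038, 0.00200)    # 16O, 17O, 18O

# Yergey's example: 47 x 16O, 2 x 17O, 1 x 18O.
yergey_example = element_abundance((47, 2, 1), OXYGEN)

# Fourth entry of Table 2: two 12C, three 1H plus one 2H, two 16O.
a_deuterated = (element_abundance((2, 0), CARBON)
                * element_abundance((3, 1), HYDROGEN)
                * element_abundance((2, 0, 0), OXYGEN))

# All-light (monoisotopic) composition, used as the 100% reference.
a_monoisotopic = (element_abundance((2, 0), CARBON)
                  * element_abundance((4, 0), HYDROGEN)
                  * element_abundance((2, 0, 0), OXYGEN))

relative_percent = 100 * a_deuterated / a_monoisotopic

print(f"{yergey_example:.2e}  {a_deuterated:.5f}  {relative_percent:.3f}")
print(unit_resolution(60))
```

With the Table 3 values, the three abundance results should agree with the figures quoted in the text (1.52 × 10⁻⁵, 0.00058, and 0.060%), and `unit_resolution(60)` gives the value of 120 quoted for acetic acid.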
Discarding low-abundance combinations means that some rounding error may be present in the calculated pattern. Clearly, it is easier to use a computer program already set up to calculate abundances, and a computer is really helpful when isotopic patterns for larger molecules are needed. Thanks to the World Wide Web, anybody with a molecular formula can obtain calculated isotopic cluster numbers. Perhaps the most comprehensive site is the one designed by Mark Winter at the University of Sheffield: http://www.shef.ac.uk/chemistry/web-elements/index.html and http://www.shef.ac.uk/chemistry/chemputer/isotopes.html. For other interesting software that mass spectrometrists use, Kermit Murray at Emory University has compiled a good section on software: http://base-peak.wiley.com.

Table 2. Calculated Isotopic Pattern for the Most Abundant Species in Acetic Acid

  Composition          Mass       Relative Abundance
  ¹²C₂¹H₄¹⁶O₂          60.02113   100.000
  ¹³C¹²C¹H₄¹⁶O₂        61.02448     2.224
  ¹²C₂¹H₄¹⁶O¹⁷O        61.02534     0.076
  ¹²C₂¹H₃²H₁¹⁶O₂       61.02741     0.060
  ¹²C₂¹H₄¹⁶O¹⁸O        62.02538     0.401
  ¹³C₂¹H₄¹⁶O₂          62.02784     0.012

Table 3. Mass and Relative Abundance Data for Stable Isotopes of Carbon, Hydrogen, Nitrogen, Oxygen, and Sulfur

  Isotope   Isotope Mass (a)   Relative Abundance (%)
  1H         1.007825032       99.985
  2H         2.014101778        0.015
  12C       12.0000000         98.90
  13C       13.003354838        1.10
  14N       14.003074005       99.634
  15N       15.000108898        0.366
  16O       15.994914622       99.762
  17O       16.999131501        0.038
  18O       17.999160419        0.200
  32S       31.972070690       95.02
  33S       32.971458497        0.75
  34S       33.967866831        4.21
  36S       35.967080880        0.02

  NOTE: Note the number of significant figures for each value.
  (a) See ref 5.

The molecular weight of acetic acid is the average of the values shown in Table 2 weighted by abundance. The practical consequence of this calculation is that the molecular weight does not correspond exactly to any of the peaks in acetic acid’s isotope cluster. The calculated value (60.05) is very close to the mass value for the tallest peak in the cluster (60.02), but not identical to it.
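As a check on the numbers above, the full acetic acid cluster can be enumerated by brute force. The sketch below is illustrative only: the helper names (`element_compositions`, `isotopic_species`) are invented for this example, the isotope data are copied from Table 3, and the approach (enumerate every combination) is practical only for small molecules.

```python
from itertools import combinations_with_replacement, product
from math import factorial

# (mass, fractional abundance) for each stable isotope, copied from Table 3.
ISOTOPES = {
    "C": [(12.0000000, 0.9890), (13.003354838, 0.0110)],
    "H": [(1.007825032, 0.99985), (2.014101778, 0.00015)],
    "O": [(15.994914622, 0.99762), (16.999131501, 0.00038),
          (17.999160419, 0.00200)],
}


def element_compositions(symbol, n_atoms):
    """Yield (mass, abundance) for every isotopic composition of one element."""
    isotopes = ISOTOPES[symbol]
    for combo in combinations_with_replacement(range(len(isotopes)), n_atoms):
        mass, probability, coefficient = 0.0, 1.0, factorial(n_atoms)
        for index in set(combo):
            k = combo.count(index)          # atoms of this isotope
            coefficient //= factorial(k)    # multinomial coefficient
            mass += k * isotopes[index][0]
            probability *= isotopes[index][1] ** k
        yield mass, coefficient * probability


def isotopic_species(formula):
    """Return (mass, abundance) for every combination; formula is {symbol: count}."""
    per_element = [list(element_compositions(s, n)) for s, n in formula.items()]
    species = []
    for parts in product(*per_element):
        mass = sum(m for m, _ in parts)
        abundance = 1.0
        for _, a in parts:
            abundance *= a
        species.append((mass, abundance))
    return species


acetic_acid = {"C": 2, "H": 4, "O": 2}
species = isotopic_species(acetic_acid)

tallest = max(a for _, a in species)
monoisotopic_mass = min(m for m, _ in species)
average_mass = sum(m * a for m, a in species) / sum(a for _, a in species)
# Species at or above 0.01% of the tallest peak (the Table 2 entries).
abundant = [(m, a) for m, a in species if 100 * a / tallest >= 0.01]

print(len(species), len(abundant))
print(f"{monoisotopic_mass:.5f}  {average_mass:.2f}")
```

Run with the Table 3 data, this reports 90 combinations, of which six reach 0.01% relative abundance, and reproduces the monoisotopic mass (60.02113) and the abundance-weighted molecular weight (60.05) quoted in the text.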
The tallest peak in acetic acid’s cluster is also its monoisotopic peak, the peak containing only the lowest-mass stable isotope of each element.

This discrepancy between measured mass values and calculated molecular weights can be seen in the mass spectrum for the peptide bradykinin shown in Figure 2. The MALDI mass spectrum shows an isotopic cluster for protonated bradykinin. An arrow in the figure indicates where the value for the formula weight [C50H74N15O11] falls; its deviation from the monoisotopic peak is quite noticeable.

Figure 2. MALDI–TOF mass spectrum of bradykinin. The spectrum was acquired on a Bruker REFLEX equipped with a 337-nm laser, a dual-stage reflectron, and a delayed extraction module. The matrix used was α-cyano-4-hydroxycinnamic acid. The values for the monoisotopic and average masses are indicated with arrows. [Plot: a.i. versus m/z, 1050–1080.]

Figure 3. Predicted isotope clusters for ubiquitin at two different resolutions. The values for the monoisotopic and average masses are indicated with arrows. [Panels: R = 200 and R = 2000; a.i. versus m/z, 8550–8580.]

Figure 4. Calculated isotopic cluster for soybean trypsin inhibitor A. The values for the monoisotopic and average masses are indicated with arrows. [Plot: a.i. versus m/z, 20060–20120.]

Table 4. Calculated Isotopic Pattern for Protonated Bradykinin with Relative Abundance Threshold Set at 0.1% of the Most Abundant Peak

  m/z       Relative Abundance   Multiplicity
  1060.57   100.00                1 (a)
  1061.57    62.7                 4
  1062.57    21.5                 8
  1063.58     5.3                10
  1064.58     1.0                 9
  1065.58     0.1                 5

  (a) Monoisotopic.

Table 4 contains the calculated unit resolution isotopic pattern for bradykinin.
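A unit-resolution pattern like the one in Table 4 can be approximated by repeatedly convolving the per-atom isotope distributions, binned by nominal mass. The sketch below is one common way to do this and is not necessarily the method used to generate Table 4; the function names are invented for illustration, and the abundances are the Table 3 values.

```python
import re

# Abundance of each isotope keyed by its nominal mass shift above the
# lightest isotope (0, +1, +2, ...); fractional values from Table 3.
ISOTOPE_SHIFTS = {
    "H": {0: 0.99985, 1: 0.00015},
    "C": {0: 0.9890, 1: 0.0110},
    "N": {0: 0.99634, 1: 0.00366},
    "O": {0: 0.99762, 1: 0.00038, 2: 0.00200},
    "S": {0: 0.9502, 1: 0.0075, 2: 0.0421, 4: 0.0002},
}


def parse_formula(formula):
    """Turn a simple formula such as 'C50H74N15O11' into {symbol: count}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + int(number or 1)
    return counts


def convolve(a, b, width):
    """Multiply two distributions, keeping only the first `width` peaks."""
    out = [0.0] * width
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < width:
                out[i + j] += x * y
    return out


def unit_resolution_pattern(formula, width=8):
    """Relative abundances (tallest = 100) of the first `width` nominal-mass peaks."""
    pattern = [1.0] + [0.0] * (width - 1)
    for symbol, count in parse_formula(formula).items():
        atom = [0.0] * width
        for shift, abundance in ISOTOPE_SHIFTS[symbol].items():
            if shift < width:
                atom[shift] = abundance
        for _ in range(count):  # one convolution per atom
            pattern = convolve(pattern, atom, width)
    tallest = max(pattern)
    return [100.0 * p / tallest for p in pattern]


bradykinin = unit_resolution_pattern("C50H74N15O11")
print([round(p, 1) for p in bradykinin[:5]])
```

For protonated bradykinin the first five peaks come out close to the Table 4 values (100, 62.7, 21.5, 5.3, 1.0). Small differences in the weakest peaks are possible, presumably because the published pattern was computed with an abundance threshold while this sketch keeps every contribution.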
To see the 10 isotope compositions found in the fourth peak in the cluster, a resolution over 1,000,000 would be needed. Note that the second peak in the protonated bradykinin spectrum (62.7% of the tallest peak) is significantly more abundant than the second peak in acetic acid’s cluster (about 2.4%).

As mass increases, the width of an isotopic cluster also increases. Figure 3 shows the calculated isotopic cluster for the very small protein (large peptide) ubiquitin at two different resolutions. As can be seen, the resolution obtainable with a mass spectrometer becomes very much of an issue with larger molecules. The 20,000 resolution needed to easily resolve ubiquitin to unit resolution can be routinely obtained using Fourier transform ion cyclotron resonance mass spectrometry (FTMS) (6). Protonated ubiquitin has the elemental formula C378H630N105O118S and contains more than 1200 atoms. Unlike the case for acetic acid and bradykinin, the monoisotopic ion (calculated 8560.62) is not the most abundant ion in the pattern. However, the most abundant ion (calculated 8565.63) is still close to, but does not exactly match, the molecular weight (8565.85) calculated for protonated ubiquitin. For ubiquitin, the monoisotopic peak is only 4% of the tallest peak in the cluster, making it a poor choice for calibration purposes. The most abundant peak in an isotope cluster is going to have the best signal-to-noise ratio and will be the easiest to measure.

With a small protein such as soybean trypsin inhibitor A, whose elemental formula for the protonated molecule is C892H1393N238O278S6, the monoisotopic peak (20079.05) is predicted to have a relative abundance of less than 0.01% compared to the tallest peak (20091.08) in the isotope cluster. Practically speaking, this means that if soybean trypsin inhibitor A is to be used as a calibrant, the monoisotopic ion in the isotope cluster cannot be used. Looking at the calculated pattern (Fig.
4), it appears that determining the tallest peak in the pattern is also not trivial, as three peaks have abundances greater than 98%. Although a resolution of 30,000 can separate the peaks shown, the monoisotopic peak will still not be abundant. So mass calibration using this cluster must depend on calculating a value for the tallest peak using what is known about the relative abundances of the isotopes and their significant figures. Interestingly, the tallest peak has a multiplicity of 10,907, meaning that there are 10,907 isotopic combinations at this nominal mass. No current instrumentation is capable of resolving this pattern completely.

Figure 5 shows the calculated isotopic cluster for the protonated protein glycogen phosphorylase b, from rabbit muscle. The monoisotopic mass of 97,163.8 is 29 mass units lower than a peak with 0.01% relative abundance compared to the tallest peak in the cluster, and 61 mass units lower than the tallest peak (97,225.0) in the cluster. The formula used to calculate the isotopic pattern was C4367H6812N1211O1249S30.

Figure 5. Calculated isotopic cluster for rabbit muscle glycogen phosphorylase b. The values for the monoisotopic and average masses are indicated with arrows. [Plot: r.i. versus m/z, 97180–97280.]

Table 5. Elemental Formulas for a Series of Peptides and Proteins

  Name                         Formula
  Leu5-enkephalin              C28H38N5O7
  Angiotensin II               C50H72N13O12
  Substance P                  C63H99N18O13S
  Neurotensin                  C78H122N21O20
  Melittin                     C131H230N39O31
  Glucagon                     C153H226N43O49S
  Cytochrome c, equine         C560H874N148O156S4Fe
  Trypsinogen, bovine          C1039H1627N286O338S14
  Enolase, yeast               C2079H3307N570O637S6
  BSA (bovine serum albumin)   C2935H4583N780O899S39

  NOTE: The formula for MH+ is given. Sequences can be found on the Internet (11).

The Challenge of Large Molecules

As we deal with larger and larger biomolecules, another interesting issue arises. The ability to generate an accurate calculated isotope pattern is clearly crucial. Such calculations require accurate masses and abundances for atoms. Masses are known very accurately, often to as many as 10 significant figures. Abundances of isotopes are not so accurately known. It comes as no surprise that the number of significant figures in a calculated molecular weight is dependent on the number of significant figures known for isotope abundances. Not all the values listed in Table 3 have 6 significant figures. Note that only the monoisotopic mass can be calculated to more than 6 significant figures.

For proteins the most influential isotope in determining the isotopic pattern is 13C. To make matters more complicated, the 13C/12C ratio varies with protein source (7, 8). Beavis (7) states that the average atomic mass of carbon in proteins, which reflects the 13C/12C ratio, varies between 12.0107 and 12.0111. For soybean trypsin inhibitor A (892 carbons), calculation of the average mass using the low and high values results in a difference of 0.4 Da (20090.3 and 20090.7). For rabbit muscle glycogen phosphorylase b (4367 carbons) the difference is 1.8 Da (97,225.2 and 97,227.0). Clearly, determining the elemental isotopic ratios for each particular sample would add significant figures to the calculations for large molecules. One way to do this experimentally is to burn a protein and compare the 13C/12C ratio of the CO2 produced with that of a standard CO2 whose 13C/12C ratio is known (isotope ratio mass spectrometry). Given the large number of proteins known today, this may be a bit impractical. However, the take-home message from soybean trypsin inhibitor and glycogen phosphorylase b is this: be careful with significant figures when dealing with large molecules.
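The size of the spread in average mass quoted above for soybean trypsin inhibitor A and glycogen phosphorylase b can be estimated with one multiplication. The sketch below assumes that the two limits cited from Beavis (12.0107 and 12.0111) are average atomic masses of carbon, and that only carbon varies; the function name is invented for this illustration.

```python
# Limits cited from Beavis (ref 7), read here as the average atomic mass
# of carbon in proteins (an interpretation made for this sketch).
CARBON_MASS_LOW = 12.0107
CARBON_MASS_HIGH = 12.0111


def average_mass_spread(n_carbons, low=CARBON_MASS_LOW, high=CARBON_MASS_HIGH):
    """Difference in average molecular mass caused by the carbon variation alone."""
    return n_carbons * (high - low)


# Carbon counts from the MH+ formulas given in the text.
sti_spread = average_mass_spread(892)    # soybean trypsin inhibitor A
gpb_spread = average_mass_spread(4367)   # glycogen phosphorylase b

print(f"{sti_spread:.2f} Da  {gpb_spread:.2f} Da")
```

The results are roughly 0.4 Da and 1.7 to 1.8 Da, in line with the differences quoted in the text; the small mismatch for the larger protein is within the rounding of the published end values. Because the spread scales with the number of carbons, it consumes one of the significant figures of Table 1 once a protein approaches 100,000 Da.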
Despite the significant-figure limitations in known isotopic relative abundances, two experimental mass spectrometric approaches to calibrating large-molecule spectra are available. Marshall and coworkers have demonstrated the feasibility of growing proteins in doubly depleted isotope media using 99.95% glucose-12C and 99.99% ammonium sulfate-14N (9). In the isotopic clusters for biomolecules obtained from such experiments, the higher mass peaks are greatly diminished. For example, the FK506-binding protein (11,780 Da) isolated from such a medium produced a spectrum in which the most abundant ion is the monoisotopic ion. Depleting the 13C present in a protein sidesteps, rather than answers, the question of how to calculate a value for the tallest peak in an isotopic cluster. However, several such proteins could be used to calibrate the peaks found in normal proteins.

Green and coworkers have shown the feasibility of internal calibration with a small protein whose monoisotopic peak is present, in assigning a value to the tallest peak in a well-resolved cluster of a larger protein (10). This technique and Marshall’s doubly depleted protein method yield precise experimental values but not calculated values for the tallest peaks in large-molecule clusters. It should also be noted that the precision of the FTMS experiment with large molecules allows differences between experimental values for samples and calibrants to be meaningful even if calculation of the actual value is limited by a lack of significant figures in the isotope abundance data.

Conclusion

Table 5 contains the elemental formulas for a range of peptides and proteins that can be used for generating isotopic patterns for lectures and homework problems concerning mass spectrometry, isotopes, molecular weights, the periodic table, and significant figures. Most chemistry textbooks do not list elemental formulas for large biomolecules.
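For homework use, the formulas in Table 5 can be turned into monoisotopic and average masses with a short script. The sketch below is illustrative: the function names are invented here, the isotope data are the Table 3 values, and the electron mass is ignored, as the masses quoted in this paper appear to do. Elements missing from Table 3 (for example the Fe in cytochrome c) are not covered and would raise a KeyError.

```python
import re

# (mass, fractional abundance) for each stable isotope, lightest first,
# copied from Table 3.
ISOTOPES = {
    "H": [(1.007825032, 0.99985), (2.014101778, 0.00015)],
    "C": [(12.0000000, 0.9890), (13.003354838, 0.0110)],
    "N": [(14.003074005, 0.99634), (15.000108898, 0.00366)],
    "O": [(15.994914622, 0.99762), (16.999131501, 0.00038),
          (17.999160419, 0.00200)],
    "S": [(31.972070690, 0.9502), (32.971458497, 0.0075),
          (33.967866831, 0.0421), (35.967080880, 0.0002)],
}


def parse_formula(formula):
    """Turn a simple formula such as 'C28H38N5O7' into {symbol: count}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + int(number or 1)
    return counts


def monoisotopic_mass(formula):
    """Mass of the species built only from the lightest isotope of each element."""
    return sum(n * ISOTOPES[s][0][0] for s, n in parse_formula(formula).items())


def average_mass(formula):
    """Abundance-weighted (chemical) mass, i.e. the calculated molecular weight."""
    total = 0.0
    for symbol, n in parse_formula(formula).items():
        total += n * sum(mass * abundance for mass, abundance in ISOTOPES[symbol])
    return total


leu_enkephalin = "C28H38N5O7"        # MH+ formula from Table 5
ubiquitin = "C378H630N105O118S"      # MH+ formula given in the text

print(f"{monoisotopic_mass(leu_enkephalin):.4f}  {average_mass(leu_enkephalin):.2f}")
print(f"{monoisotopic_mass(ubiquitin):.2f}  {average_mass(ubiquitin):.2f}")
```

For ubiquitin the monoisotopic value matches the 8560.62 quoted in the text, and the average mass falls within a few hundredths of a dalton of the quoted 8565.85; a difference of that size is itself an illustration of how the last digits depend on the abundance table used.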
Proteins are not the only large biomolecules. Nucleic acids are often larger, while carbohydrates and lipids come in all sizes.

The above discussion shows that care must be used in assigning mass values to large molecules. Currently it is possible with FTMS to make mass measurements with great precision. However, what is known about isotope abundances limits the number of significant figures in values calculated from formulas, and likewise limits the precision of the periodic table. Obtaining detailed elemental isotopic ratios for specific samples or careful calibration with compounds whose monoisotopic masses can be used will be necessary for reporting mass spectrometric experimental results with numerous significant figures, especially for large molecules. Meanwhile, being able to work with and calculate details of small molecule isotope clusters is the first step in preparing students for the challenge of large molecule calculations.

Acknowledgment

National Science Foundation Award #9520868 (Chemistry Department, University of Wisconsin–Madison) was used for the purchase of the Bruker REFLEX mass spectrometer.

Literature Cited

1. Hillenkamp, F.; Karas, M.; Beavis, R. C.; Chait, B. T. Anal. Chem. 1991, 63, 1193A–1203A. Muddiman, D. C.; Bakhtiar, R.; Hofstadler, S. A.; Smith, R. D. J. Chem. Educ. 1997, 74, 1288–1292. Limbach, P. A. Spectroscopy 1998, 13, 16–27.
2. Fenn, J. B.; Mann, M.; Meng, C. K.; Wong, S. F.; Whitehouse, C. M. Science 1989, 246, 64–71. Smith, R. D.; Loo, J. A.; Ogorzalek Loo, R. R.; Busman, M.; Udseth, H. R. Mass Spectrom. Rev. 1991, 10, 359–451. Hofstadler, S. A.; Bakhtiar, R.; Smith, R. D. J. Chem. Educ. 1996, 73, A82–A88.
3. Andraos, J. J. Chem. Educ. 1999, 76, 258–267.
4. (a) Yergey, J. A. Int. J. Mass Spectrom. Ion Proc. 1983, 52, 337–349. (b) Rockwood, A. L. Rapid Commun. Mass Spectrom. 1995, 9, 103–105. (c) Rockwood, A. L.; Van Orden, S. L.; Smith, R. D. Anal. Chem. 1995, 67, 2699–2704.
5. Masses for all isotopes can be found at the following Web site: http://www.shef.ac.uk/chemistry/web-elements/index.html (accessed Apr 2000) or in the most recent Handbook of Chemistry and Physics, 80th ed.; Lide, D. R., Ed.; CRC Press: Boca Raton, FL, 1999–2000.
6. Marshall, A. G.; Grosshans, P. B. Anal. Chem. 1991, 63, 215A–229A. McLafferty, F. W. Acc. Chem. Res. 1994, 27, 379–386. Lorenz, S. A.; Maziarz, E. P. III; Wood, T. D. Appl. Spectrosc. 1999, 53, 18A–36A.
7. Beavis, R. C. Anal. Chem. 1993, 65, 496–497.
8. Dougherty, R. C.; Marshall, A. G.; Eyler, J. R.; Richardson, D. E.; Smalley, R. E. J. Am. Soc. Mass Spectrom. 1994, 5, 120–123. Zubarev, R. A.; Demirev, P. A.; Hakansson, P.; Sundqvist, B. U. R. Anal. Chem. 1995, 67, 3793–3798.
9. Marshall, A. G.; Senko, M. W.; Li, W.; Li, M.; Dillon, S.; Guan, S.; Logan, T. M. J. Am. Chem. Soc. 1997, 119, 433–434.
10. Green, M. K.; Vestling, M. M.; Johnston, M. V.; Larsen, B. S. Anal. Biochem. 1998, 260, 204–211.
11. Protein sequences are available at the following Web site: http://www.expasy.ch/sprot/ (accessed Apr 2000).