Putting engineering back into protein engineering: bioinformatic approaches to catalyst design
Claes Gustafsson, Sridhar Govindarajan and Jeremy Minshull

Complex multivariate engineering problems are commonplace and not unique to protein engineering. Mathematical and data-mining tools developed in other fields of engineering have now been applied to analyze sequence–activity relationships of peptides and proteins and to assist in the design of proteins and peptides with specified properties. Decreasing costs of DNA sequencing, in conjunction with methods to quickly synthesize statistically representative sets of proteins, allow modern heuristic statistics to be applied to protein engineering. This provides an alternative approach to expensive assays or unreliable high-throughput surrogate screens.

Addresses
DNA 2.0, Inc., 1455 Adams Drive, Menlo Park, CA 94025, USA
e-mail: [email protected]

Current Opinion in Biotechnology 2003, 14:366–370
This review comes from a themed issue on Protein technologies and commercial enzymes
Edited by Gjalt Huisman and Stephen Sligar
0958-1669/$ – see front matter © 2003 Elsevier Ltd. All rights reserved.
DOI 10.1016/S0958-1669(03)00101-0

Abbreviations
NK1 neurokinin 1
NP nondeterministic polynomial time
PLS partial least squares

Introduction
Protein engineering has classically been approached from two diametrically opposed directions: rational design and directed evolution. Rational design, in the tradition of Descartes and Leibniz, attempts to understand protein structure and function at a complete mechanistic level so that any desired change can be effected by calculation from first principles. Directed evolution, in the tradition of John Locke and other empiricists, attempts to find a desired solution by testing many different variants, typically using various evolution-based algorithms.
Both rational design and directed evolution in their many alternative formats have shortcomings and advantages that have been discussed and compared elsewhere [1–3]. Modern heuristics applied to protein engineering is a synthesis of empirical data and a rational analysis of that information. The very first paper describing chemical synthesis of a gene proposed that systematic variation of amino acids would enable an understanding of the relationships between the sequence of a protein and its structure, physical behavior and activity [4]. Soon after that, Svante Wold’s group developed and applied multivariate data analysis techniques to peptide design and suggested that ‘the rapid development of protein engineering may then make it possible to produce designed sets of mature proteins and enzymes for QSAR studies’ [5,6]. This review summarizes recent publications in which modern heuristics have been applied to protein engineering and describes technological advances that are enabling Wold’s vision.

Protein optimization from an engineering perspective
When faced with solving a difficult problem, it can be enlightening to see if a similar type of problem has been solved before. Many disciplines and industries face the same challenges of high system complexity and abundant variables that confront protein engineering [7]. In some industries increasing complexity is intentional, as in the addition of new control parameters for a car’s combustion engine. Sometimes it is inherent to the system itself, for example, in clinical drug trials. The common challenge in car manufacturing, clinical trials and protein engineering is to account for as much of this complexity as possible when describing the relationship between input variables (e.g.
piston angle and temperature for car engines, age and medical history for patients, or the amino acid residues available at each position for protein engineering [8]) and output variables (e.g. exhaust levels and fuel efficiency for cars, side effects and survival rate for patients, or desired commercial properties such as catalytic activity, thermostability, substrate specificity and immunogenicity for protein engineering). Measured output variables may in turn result from combinations of properties that are not explicitly measured; for protein engineering, these may include expression levels and protein solubility [9].

Like small-molecule quantitative structure–activity relationships (QSAR), which have enjoyed much success in pharmaceutical development, heuristic protein engineering aims to identify the relationship between input and output variables to create biological macromolecules with defined properties. For reasons described below, more work has been published optimizing peptides than proteins using engineering concepts. We therefore use peptide examples to describe some of the principles before describing how the same engineering tools are used to optimize proteins.

Navigating in protein sequence space
Protein engineering can be divided into two subtasks: defining the solution space and defining the search algorithm.

Define the solution space
The total possible number of proteins encoded by a 1 kb gene is $20^{333}$ (20 alternative amino acids at each position in a string of 333 residues), or approximately $10^{433}$. This is an unfeasibly large number of variants to screen. Fortunately, not all possible sequences need be considered, as naturally occurring proteins can usually be relied on to provide a starting point for engineering efforts. Active point-mutants [10], phylogenetic substitutions [11], structural modeling [12,13] and known immunogenic constraints [14] are well-explored methods of targeting specific regions of a protein for change.

Define the search algorithm
Protein engineering is a nondeterministic polynomial-time (NP)-complete problem [15,16], meaning that no known algorithm is guaranteed to find the optimal solution in a time that scales polynomially with the size of the problem; in the worst case, all possible solutions must be evaluated. Empirical protein engineers have largely limited themselves to addressing the NP-complete problem with exhaustive searches using ultra-high-throughput phage and ribosome display screens [17,18] or evolutionary methods [1–3,19]. By contrast, the wider engineering community has exploited genetic algorithms as well as regression-based algorithms, neural nets, clustering, and several other tools as alternative techniques to address NP-complete problems [20].

Statistical targeting of amino acid changes
Comparisons of natural protein and DNA sequences, particularly those using the powerful technique of principal component analysis, can be used to identify residues that are important for specific functionality within a protein [21–25]. Natural substitution patterns can also be used to infer which changes are likely to be acceptable within functional proteins. For example, a recent study of subtilisin variants found that all 52 of the amino acid variations found in 15 homologs were active within the context of at least one backbone; their incorporation produced proteases with varying catalytic properties [26]. In another set of experiments, all of the active-site residues from one fungal phytase were replaced with those from another; again, the result was an active protein with altered catalytic properties [11]. By incorporating small numbers of changes identified from alignments of naturally occurring sequences, it has also been possible to increase the thermostability of a fungal phytase by over 30 °C [27]. Substitution matrices derived from synonymous and non-synonymous substitution rates can also be used to choose reasonable amino acid changes if there is insufficient phylogenetic data to use sequence alignments [28–31].

Multivariate design of improved polypeptides
Figure 1 shows a procedure for peptide optimization derived from the one used by Norinder et al. [32] to design analogs of the neuropeptide substance P with increased affinity for the neurokinin 1 (NK1) receptor. These authors used partial least squares (PLS) regression [33,34] to correlate the sequences of 36 substance P analogs with their activities. They used this model to identify the positions and amino acid properties in substance P that had the largest effects on NK1 binding. The authors designed, synthesized and tested six new peptides that the model predicted to be improved NK1 binders. All six were shown to be highly active. Their sequence–activity data were added to those of the first 36 peptides to build a second-generation PLS model, which was used to design a further three variants. One of these had an IC50 of 5 pM, 300-fold better than the wild-type peptide and 45-fold better than the best of the original 36 variants [32]. It is striking that so few variants (45 in total) had to be made and tested to achieve very significant improvements in the desired function.

The same techniques have also been applied to proteins. In one particularly informative example, Bucht and colleagues optimized a complex protein phenotype: the activity of human acetylcholinesterase expressed on the surface of COS-1 cells. Display of acetylcholinesterase on the cell surface occurs as a result of glycosyl phosphatidylinositol modification at the C terminus of the protein. The authors identified two amino acids in the signal peptide region of the protein, the identity of which affected cell-surface localization of the protein.
They synthesized eight variant genes, tested the surface expression of the eight encoded proteins and used PLS to model the sequence–activity relationship. The authors then constructed an additional 27 variants in this same region of the protein, using them to test and refine the model, thereby identifying the optimal sequence for cell-surface expression of acetylcholinesterase [35].

Figure 1. Polypeptide optimization using mathematical models. The process is that used by Norinder et al. [32] for the optimization of the neuropeptide substance P. Flowchart steps: (1) create an initial set of variants and measure the desired phenotype; (2) build a sequence–activity model; (3) design new variants based on model predictions for high-performing sequences; (4) synthesize and test the new variants; (5) add the new data to refine the sequence–activity model.

Modeling sequence–activity relationships to identify optimal protein variants has not been limited to amino acids localized to a small region of a protein. Statistical analysis of mutations distributed throughout several enzymes has been used to identify the contributions of those changes to function of the protein [36] and to predict the sequence with best function [37]. Mathematical sequence–activity modeling has thus been validated at many scales of complexity: from small molecules to peptides to localized regions of proteins to changes spread throughout entire proteins.

Although there is a growing body of work in which sequence–activity relationships are used to design improved peptides [5,6,38,39], application of the same methods to protein/biocatalyst engineering is still in its infancy. One reason for this has been the difficulty in producing large numbers of modified molecules [40]; in contrast to peptides, proteins cannot easily be synthesized directly.
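The model-building and design steps of Figure 1 can be illustrated with a minimal, standard-library sketch of single-response PLS regression (PLS1, fitted by the NIPALS procedure). Everything about the toy problem is hypothetical and is not taken from the studies cited above: the four binary substitution sites, the `HIDDEN_EFFECTS` values, and the `assay` function standing in for a laboratory measurement are illustrative assumptions, as are the helper names `fit_pls1` and `predict`.

```python
from itertools import product


def fit_pls1(X, y, n_components, eps=1e-9):
    """Fit a single-response PLS model with the NIPALS deflation scheme."""
    n, n_cols = len(X), len(X[0])
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    Xc = [[v - m for v, m in zip(row, x_mean)] for row in X]  # centered inputs
    yc = [v - y_mean for v in y]                              # centered response
    components = []
    for _ in range(n_components):
        # Weight vector: covariance of each column with the residual response.
        w = [sum(row[j] * yi for row, yi in zip(Xc, yc)) for j in range(n_cols)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < eps:  # nothing left to explain, stop early
            break
        w = [v / norm for v in w]
        t = [sum(a * b for a, b in zip(row, w)) for row in Xc]  # scores
        tt = sum(v * v for v in t)
        p = [sum(row[j] * ti for row, ti in zip(Xc, t)) / tt for j in range(n_cols)]
        q = sum(yi * ti for yi, ti in zip(yc, t)) / tt
        # Deflate: remove what this component explains from X and y.
        Xc = [[v - ti * pj for v, pj in zip(row, p)] for row, ti in zip(Xc, t)]
        yc = [yi - q * ti for yi, ti in zip(yc, t)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def predict(model, x):
    """Predict the response for one encoded variant."""
    x_mean, y_mean, components = model
    residual = [v - m for v, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(a * b for a, b in zip(residual, w))
        y_hat += q * t
        residual = [v - t * pj for v, pj in zip(residual, p)]
    return y_hat


# Hypothetical toy problem: four sites, each wild type (0) or substituted (1).
HIDDEN_EFFECTS = [1.0, -0.5, 2.0, 0.25]  # assumed values, unknown to the model


def assay(variant):
    """Stand-in for a laboratory measurement (purely additive, noise-free)."""
    return 3.0 + sum(e * v for e, v in zip(HIDDEN_EFFECTS, variant))


all_variants = list(product([0, 1], repeat=4))
# Step 1: make and measure an initial set (here a balanced half of the 16).
training = [v for v in all_variants if sum(v) % 2 == 0]
# Step 2: build the sequence-activity model.
model = fit_pls1([list(v) for v in training], [assay(v) for v in training], 4)
# Step 3: rank the untested variants by predicted activity.
untested = [v for v in all_variants if v not in training]
best = max(untested, key=lambda v: predict(model, v))
print(best, round(predict(model, best), 3))
```

In a real campaign the top-ranked variants would then be synthesized and assayed, and their measurements appended to the training data before refitting. Published peptide studies typically encode each residue with physicochemical descriptors rather than the 0/1 indicators assumed here.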
As technology improves, the synthesis of individually designed genes becomes increasingly cost-effective [41,42]. Testing variants taken from libraries that are even cheaper to produce is also likely to yield useful sequence–activity relationships [43].

Experimental design of maximally informative datasets
Another useful statistical tool with its origins in other engineering disciplines is experimental design, a technique by which a variant set is designed to contain the maximum amount of information for subsequent analysis of sequence–activity data [44]. Using D-optimal design, Mee et al. [45] designed, synthesized and tested a training set of 60 analogs of a 15 amino acid antibacterial peptide. A regression-based model derived from the sequence–activity correlation of the 60 datapoints was used to design and synthesize 39 new peptides predicted to have improved activity. The best designed peptide was twice as potent as the best one in the training set. In their selection of acetylcholinesterase variants, Bucht et al. [35] also used experimental design to choose the eight gene variants that would best represent the sequence variation they were exploring.

Accounting for amino acid interactions
If an amino acid change at one position affects the functional consequences of changing other amino acids in a protein, predictive sequence–function models must account for this. A model that incorporates amino acid interactions requires more data than one that assumes that the amino acids act independently to achieve the same quality of model [40,46]. In studies of antigen–antibody binding [40] and ligand–receptor binding [47], researchers found that very few interaction terms (and thus very little additional data) were needed to produce accurate descriptions of the sequence–activity relationship. Recent work from Husimi’s group suggests that this result is also true for proteins.
Individual amino acid changes contributing to specific properties of dihydrofolate reductase [36], thermolysin and prolyl endopeptidase [37] are approximately independent. Of particular interest is a recent study in which only two of 14 randomly generated mutations that increased prolyl endopeptidase thermostability appeared to be interdependent. The authors’ model contained a single interaction term to account for this residue pair. A gene variant containing the pair predicted to interact was synthesized and tested; its activity was shown to be as predicted by the model. Only 45 gene variants were needed to accurately model the activities of 16 384 (that is, 2^14) possible sequence combinations [46].

Heuristic methods are becoming more widespread
Other successful examples of heuristic approaches to analyze and optimize biological systems include the optimization of peptidase I using neural networks [48], calculations of individual amino acid contributions to serine protease inhibitor activity [49], PLS-based prediction of the determinants of protein localization [50,51], and protein contact map and interaction site prediction using neural networks [52]. In work complementing modeling to assess the contributions of small numbers of changes at many positions, sequence–activity relationships have been derived using PLS to quantitate the effects of multiple amino acid substitutions at single positions in haloalkane dehalogenase, T4 lysozyme, subtilisin and tryptophan synthase. These methods have also been used to determine the physicochemical properties required at identified positions to confer specific enzyme properties [53]. Furthermore, the same tools have been used to systematically characterize the substrates for a set of haloalkane dehalogenase variants to determine the effects of amino acid changes on substrate specificity of the enzyme [54].
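The data requirements discussed for models with and without interaction terms can be made concrete by counting parameters. The sketch below assumes a simple form that the source does not spell out: one baseline term, one additive effect per mutation, and one coupling term per interacting pair. The function names and the numerical effects in the usage line are hypothetical.

```python
from math import comb

N_SITES = 14  # mutations in the prolyl endopeptidase example [46]


def n_parameters(n_sites, n_interactions):
    """Baseline + one additive effect per site + one term per interacting pair."""
    return 1 + n_sites + n_interactions


def predict_activity(variant, baseline, effects, interactions):
    """Additive model with optional pairwise couplings.

    variant: tuple of 0/1 flags, one per site
    interactions: dict mapping a site pair (i, j) to its coupling value
    """
    value = baseline + sum(e for e, present in zip(effects, variant) if present)
    for (i, j), coupling in interactions.items():
        if variant[i] and variant[j]:
            value += coupling
    return value


landscape_size = 2 ** N_SITES                         # every on/off combination
additive = n_parameters(N_SITES, 0)                   # no interactions
one_pair = n_parameters(N_SITES, 1)                   # a single interacting pair
all_pairs = n_parameters(N_SITES, comb(N_SITES, 2))   # every possible pair

print(landscape_size, additive, one_pair, all_pairs)

# Hypothetical usage: sites 0 and 1 present, with an assumed negative coupling.
print(predict_activity((1, 1) + (0,) * 12, 1.0, [0.5] * 14, {(0, 1): -0.25}))
```

Under these assumptions, a model with one interaction term has far fewer unknowns than the 45 measured variants, whereas a model with every pairwise term would have more unknowns than measurements. This is one way to read the observation that few interaction terms mean little additional data.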
Conclusions: drivers for change
By casting the protein engineering problem as an optimization problem common to other engineering disciplines, we are able to exploit many different problem-solving algorithms. Gone are the technological barriers to synthesizing statistically representative datasets. As Wold predicted in 1986, the capture of protein sequence–activity relationships now permits the design of optimized proteins.

There are several drivers for applying modern engineering tools to protein engineering. Firstly, the human genome project, microarrays and other recent large scientific endeavours have changed biology from a ‘one variable at a time’ science to a science engulfed in variables. Secondly, statistical tools developed and deployed in a variety of engineering areas can now be operated by non-statisticians from any desktop computer. Finally, the cost of generating and sequencing statistically representative sets of genes is continuously decreasing.

It is striking that by measuring the contribution of amino acid variations to the function of a protein, sequence–activity modeling requires orders of magnitude fewer variants to be tested to design improved sequences than the numbers screened using widespread directed evolution techniques. This is important, because methodologies that rely upon screening large sample sets are vulnerable to the weakness that high-throughput screens often turn out to have limited ability to measure the protein properties that are really important [2,19,40]. Heuristic methodologies may therefore permit protein engineers to test fewer variants under conditions that more closely approximate their final intended applications and reduce the time and resources that are often spent in building and implementing imprecise high-throughput screens.

Acknowledgements
One of us (CG) began this manuscript while employed at Maxygen Inc. We thank Maxygen for their support.

References and recommended reading
Papers of particular interest, published within the annual period of review, have been highlighted as being of special interest or of outstanding interest; their annotations appear beneath the entries.

1. Tobin MB, Gustafsson C, Huisman GW: Directed evolution: the ‘rational’ basis for ‘irrational’ design. Curr Opin Struct Biol 2000, 10:421-427.
2. van Regenmortel MH: Are there two distinct research strategies for developing biologically active molecules: rational design and empirical selection? J Mol Recognit 2000, 13:1-4.
3. Ryu DD, Nam DH: Recent progress in biomolecular engineering. Biotechnol Prog 2000, 16:2-16.
4. Nambiar KP, Stackhouse J, Stauffer DM, Kennedy WP, Eldredge JK, Benner SA: Total synthesis and cloning of a gene coding for the ribonuclease S protein. Science 1984, 223:1299-1301.
5. Hellberg S: A Multivariate Approach to QSAR. PhD thesis. Umea, Sweden: University of Umea; 1986.
6. Hellberg S, Sjostrom M, Skagerberg B, Wold S: Peptide quantitative structure-activity relationships, a multivariate approach. J Med Chem 1987, 30:1126-1135.
7. Gustafsson C, Govindarajan S, Emig R: Exploration of sequence space for protein engineering. J Mol Recognit 2001, 14:308-314.
8. Sandberg M, Eriksson L, Jonsson J, Sjostrom M, Wold S: New chemical descriptors relevant for the design of biologically active peptides. A multivariate characterization of 87 amino acids. J Med Chem 1998, 41:2481-2491.
9. Lin Z, Thorsen T, Arnold FH: Functional expression of horseradish peroxidase in E. coli by directed evolution. Biotechnol Prog 1999, 15:467-471.
10. Glieder A, Farinas ET, Arnold FH: Laboratory evolution of a soluble, self-sufficient, highly active alkane hydroxylase. Nat Biotechnol 2002, 20:1135-1139.
11. Lehmann M, Lopez-Ulibarri R, Loch C, Viarouge C, Wyss M, van Loon AP: Exchanging the active site between phytases for altering the functional properties of the enzyme. Protein Sci 2000, 9:1866-1872.
    Demonstration that residues identified as functionally important (in this case the entire active site) can be moved from one protein backbone to another, leading to functionally novel catalysts.
12. Looger LL, Dwyer MA, Smith JJ, Hellinga HW: Computational design of receptor and sensor proteins with novel functions. Nature 2003, 423:185-190.
13. Kwasigroch JM, Gilis D, Dehouck Y, Rooman M: PoPMuSiC, rationally designing point mutations in protein structures. Bioinformatics 2002, 18:1701-1702.
14. Tangri S, LiCalsi C, Sidney J, Sette A: Rationally engineered proteins or antibodies with absent or reduced immunogenicity. Curr Med Chem 2002, 9:2191-2199.
15. Pierce NA, Winfree E: Protein design is NP-hard. Protein Eng 2002, 15:779-782.
16. Lathrop RH: The protein threading problem with sequence amino acid interaction preferences is NP-complete. Protein Eng 1994, 7:1059-1068.
17. Hanes J, Pluckthun A: In vitro selection and evolution of functional proteins by using ribosome display. Proc Natl Acad Sci USA 1997, 94:4937-4942.
18. Wells JA, Lowman HB: Rapid evolution of peptide and protein binding properties in vitro. Curr Opin Biotechnol 1992, 3:355-362.
19. Ness JE, del Cardayre SB, Minshull J, Stemmer WP: Molecular breeding: the natural approach to protein design. Adv Protein Chem 2000, 55:261-292.
20. Johnson DS, McGeoch LA: The traveling salesman problem: a case study in local optimization. In Local Search in Combinatorial Optimization. Edited by Aarts EHL, Lenstra JK. John Wiley & Sons Ltd; 1997:215-310.
21. Casari G, Sander C, Valencia A: A method to predict functional residues in proteins. Nat Struct Biol 1995, 2:171-178.
22. del Sol Mesa A, Pazos F, Valencia A: Automatic methods for predicting functionally important residues. J Mol Biol 2003, 326:1289-1302.
    Excellent comparison of methods available to identify residues that contribute to protein function.
23. Gogos A, Jantz D, Senturker S, Richardson D, Dizdaroglu M, Clarke ND: Assignment of enzyme substrate specificity by principal component analysis of aligned protein sequences: an experimental test using DNA glycosylase homologs. Proteins 2000, 40:98-105.
    Principal component analysis of small numbers of proteins used to identify residues likely to be involved in substrate specificity determination.
24. Suzuki Y, Gojobori T: A method for detecting positive selection at single amino acid sites. Mol Biol Evol 1999, 16:1315-1328.
25. Jonsson J, Norberg T, Carlsson L, Gustafsson C, Wold S: Quantitative sequence-activity models (QSAM) — tools for sequence design. Nucleic Acids Res 1993, 21:733-739.
26. Govindarajan S, Ness JE, Kim S, Mundorff EC, Minshull J, Gustafsson C: Systematic variation of amino acid substitutions for stringent assessment of pairwise covariation. J Mol Biol 2003, 328:1061-1069.
    Fifty-two phylogenetically identified substitutions in subtilisins are accepted into one enzyme backbone, modifying its activity. Most natural changes that occur together are shown to be a result of descent from a common ancestor and not a result of functional constraints.
27. Lehmann M, Loch C, Middendorf A, Studer D, Lassen SF, Pasamontes L, van Loon AP, Wyss M: The consensus concept for thermostability engineering of proteins: further proof of concept. Protein Eng 2002, 15:403-411.
28. Benner SA, Cohen MA, Gonnet GH: Amino acid substitution during functionally constrained divergent evolution of protein sequences. Protein Eng 1994, 7:1323-1332.
29. Wu TD, Brutlag DL: Discovering empirically conserved amino acid substitution groups in databases of protein families. Proc Int Conf Intell Syst Mol Biol 1996, 4:230-240.
30. Adenot M, Sarrauste de Menthiere C, Chavanieu A, Calas B, Grassy G: Peptides quantitative structure-function relationships: an automated mutation strategy to design peptides and pseudopeptides from substitution matrices. J Mol Graph Model 1999, 17:292-309.
31. Dimmic MW, Rest JS, Mindell DP, Goldstein RA: rtREV: an amino acid substitution matrix for inference of retrovirus and reverse transcriptase phylogeny. J Mol Evol 2002, 55:65-73.
    A substitution matrix for maximum likelihood phylogenetic analysis is developed that is optimized on a subset of sequences. Substitution matrices are unique for each sequence subset.
32. Norinder U, Rivera C, Unden A: A quantitative structure-activity relationship study of some substance P-related peptides. A multivariate approach using PLS and variable selection. J Pept Res 1997, 49:155-162.
33. Sandberg M: Deciphering Sequence Data, a Multivariate Approach. PhD thesis. Umea: Umea University; 1997.
34. Geladi P, Kowalski BR: Partial least squares regression: a tutorial. Anal Chim Acta 1986, 186:1-17.
35. Bucht G, Wikstrom P, Hjalmarsson K: Optimising the signal peptide for glycosyl phosphatidylinositol modification of human acetylcholinesterase using mutational analysis and peptide-quantitative structure-activity relationships. Biochim Biophys Acta 1999, 1431:471-482.
    PLS and experimental design are used to optimize acetylcholinesterase, increasing its surface expression on cells threefold.
36. Aita T, Iwakura M, Husimi Y: A cross-section of the fitness landscape of dihydrofolate reductase. Protein Eng 2001, 14:633-638.
37. Aita T, Uchiyama H, Inaoka T, Nakajima M, Kokubo T, Husimi Y: Analysis of a local fitness landscape with a model of the rough Mt. Fuji-type landscape: application to prolyl endopeptidase and thermolysin. Biopolymers 2000, 54:64-79.
38. Strom MB, Haug BE, Rekdal O, Skar ML, Stensen W, Svendsen JS: Important structural features of 15-residue lactoferricin derivatives and methods for improvement of antimicrobial activity. Biochem Cell Biol 2002, 80:65-74.
39. Eriksson L, Jonsson J, Hellberg S, Lindgren F, Skagerberg B, Sjostrom M, Wold S: Peptide QSAR on substance P analogues, enkephalins and bradykinins containing L- and D-amino acids. Acta Chem Scand A 1990, 44:50-55.
40. Choulier L, Andersson K, Hamalainen MD, van Regenmortel MH, Malmqvist M, Altschuh D: QSAR studies applied to the prediction of antigen-antibody interaction kinetics as measured by BIAcore. Protein Eng 2002, 15:373-382.
    Multivariate analysis applied to sequence optimization and reaction conditions.
41. Hoover DM, Lubkowski J: DNAWorks: an automated method for designing oligonucleotides for PCR-based gene synthesis. Nucleic Acids Res 2002, 30:e43.
    The shape of things to come. Gene synthesis gets cheaper and easier.
42. Holowachuk EW, Ruhoff MS: Efficient gene synthesis by Klenow assembly/extension-Pfu polymerase amplification (KAPPA) of overlapping oligonucleotides. PCR Methods Appl 1995, 4:299-302.
43. Abecassis V, Pompon D, Truan G: High efficiency family shuffling based on multi-step PCR and in vivo DNA recombination in yeast: statistical and functional analysis of a combinatorial library between human cytochrome P450 1A1 and 1A2. Nucleic Acids Res 2000, 28:E88.
    One of many library synthesis methods. Interesting analysis of variants in which hybridization signals instead of known sequence changes are used as input variables for modeling.
44. Hellberg S, Eriksson L, Jonsson J, Lindgren F, Sjöström M, Skagerberg B, Wold S, Andrews P: Minimum analogue peptide sets (MAPS) for quantitative structure-activity relationships. Int J Pept Protein Res 1991, 37:414-424.
45. Mee RP, Auton TR, Morgan PJ: Design of active analogues of a 15-residue peptide using D-optimal design, QSAR and a combinatorial search algorithm. J Pept Res 1997, 49:89-102.
46. Aita T, Hamamatsu N, Nomiya Y, Uchiyama H, Shibanaka Y, Husimi Y: Surveying a local fitness landscape of a protein with epistatic sites for the study of directed evolution. Biopolymers 2002, 64:95-105.
    A model of only 45 prolyl endopeptidase variants accurately predicts the activities of combinations of 14 different mutations. Only one interaction term is required in the model.
47. Prusis P, Lundstedt T, Wikberg JE: Proteo-chemometrics analysis of MSH peptide binding to melanocortin receptors. Protein Eng 2002, 15:305-311.
    Statistically representative sets of melanocortin peptide and chimeric receptors were analyzed. Models incorporated linear and interaction terms; predictions were externally validated.
48. Schneider G, Schrödl W, Wallukat G, Muller J, Nissen E, Rönspeck W, Wrede P, Kunze R: Peptide design by artificial neural networks and computer-based evolutionary search. Proc Natl Acad Sci USA 1998, 95:12179-12184.
49. Lu SM, Lu W, Qasim MA, Anderson S, Apostol I, Ardelt W, Bigler T, Chiang YW, Cook J, James MN et al.: Predicting the reactivity of proteins from their sequence alone: Kazal family of protein inhibitors of serine proteinases. Proc Natl Acad Sci USA 2001, 98:1410-1415.
    The conclusion of a heroic 20-year study. By synthesizing and testing <200 variants, activities of many natural proteinases can be accurately predicted.
50. Sjöström M, Wold S, Wieslander A, Rilfors L: Signal peptide amino acid sequences in Escherichia coli contain information related to final protein localization. A multivariate data analysis. EMBO J 1987, 6:823-831.
51. Schein AI, Kissinger JC, Ungar LH: Chloroplast transit peptide prediction: a peek inside the black box. Nucleic Acids Res 2001, 29:E82.
52. Fariselli P, Pazos F, Valencia A, Casadio R: Prediction of protein–protein interaction sites in heterocomplexes with neural networks. Eur J Biochem 2002, 269:1356-1361.
53. Damborsky J: Quantitative structure-function and structure-stability relationships of purposely modified proteins. Protein Eng 1998, 11:21-30.
54. Marvanova S, Nagata Y, Wimmerova M, Sykorova J, Hynkova K, Damborsky J: Biochemical characterization of broad-specificity enzymes using multivariate experimental design and a colorimetric microplate assay: characterization of the haloalkane dehalogenase mutants. J Microbiol Methods 2001, 44:149-157.