Methods in Molecular Biology 1062

METHODS IN MOLECULAR BIOLOGY™
Series Editor
John M. Walker
School of Life Sciences, University of Hertfordshire, Hatfield, Hertfordshire, AL10 9AB, UK
For further volumes: http://www.springer.com/series/7651

Arabidopsis Protocols
Third Edition

Edited by
Jose J. Sanchez-Serrano
Centro Nacional de Biotecnología, CSIC, Madrid, Spain
Julio Salinas
Departmento Biologia de Plantas, Centro de Investigaciones Biologicas, CSIC, Madrid, Spain

ISSN 1064-3745
ISSN 1940-6029 (electronic)
ISBN 978-1-62703-579-8
ISBN 978-1-62703-580-4 (eBook)
DOI 10.1007/978-1-62703-580-4
Springer New York Heidelberg Dordrecht London
Library of Congress Control Number: 2013948230

© Springer Science+Business Media New York 2014
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer.
Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.
Printed on acid-free paper
Humana Press is a brand of Springer
Springer is part of Springer Science+Business Media (www.springer.com)

Preface

At present, Arabidopsis thaliana is acknowledged by the scientific community as the most important plant model system. In recent years, the continuous efforts of plant scientists have produced a vast array of biological tools and have developed and optimized research methods that, together, have generated a massive amount of highly valuable experimental data. Both scientific information and biological materials have been made efficiently accessible through shared public/private resources such as TAIR and the various biological stock centers, a praiseworthy example of collaboration for the optimal use of scientific resources. These initiatives have fueled research into essentially every aspect of plant biology. Arabidopsis research has thus fundamentally influenced our understanding of the basic biology and ecology of plants. Importantly, the knowledge gained from this model species is already being translated to other plants, particularly crops, at an ever-faster pace. This transfer is expected to continue helping to satisfy the increasing demand for improved agricultural products, including food, fiber, and biofuel. Moreover, Arabidopsis is becoming an important model system for researchers studying other multicellular organisms, who recognize the advantages of this experimental system for elucidating basic, universal biological questions.

We have prepared this third edition of Arabidopsis Protocols in an effort to compile some of the most recent methodology developed to exploit the Arabidopsis genome. To this end, we have relied on the experience of a significant group of leading experts in the methods described. These methods range from guided access to public resources to genetic, cell biological, biochemical, and physiological techniques, including both widely used approaches and novel ones likely to open new avenues of knowledge in the near future. In addition, given the recent unparalleled progress of “omics” tools in Arabidopsis, we include sections on genome, transcriptome, proteome, metabolome, and other whole-system approaches. As in previous editions, we have tried to present a collection of step-by-step protocols described in enough detail to be followed by both experienced researchers and beginners. Finally, we would like to thank all our contributing colleagues, whose expertise and effort have been essential for attaining the highest scientific standard in this book.

Madrid, Spain
Jose J. Sanchez-Serrano
Julio Salinas

Contents

Preface
Contributors

PART I GROWING ARABIDOPSIS

1 Handling Arabidopsis Plants: Growth, Preservation of Seeds, Transformation, and Genetic Crosses
Luz Rivero, Randy Scholl, Nicholas Holomuzki, Deborah Crist, Erich Grotewold, and Jelena Brkljacic
2 Using Arabidopsis-Related Model Species (ARMS): Growth, Genetic Transformation, and Comparative Genomics
Giorgia Batelli, Dong-Ha Oh, Matilde Paino D’Urzo, Francesco Orsini, Maheshi Dassanayake, Jian-Kang Zhu, Hans J. Bohnert, Ray A. Bressan, and Albino Maggio
3 Growing Arabidopsis In Vitro: Cell Suspensions, In Vitro Culture, and Regeneration
Bronwyn J. Barkla, Rosario Vera-Estrella, and Omar Pantoja

PART II ARABIDOPSIS RESOURCES

4 Arabidopsis Database and Stock Resources
Donghui Li, Kate Dreher, Emma Knee, Jelena Brkljacic, Erich Grotewold, Tanya Z. Berardini, Philippe Lamesch, Margarita Garcia-Hernandez, Leonore Reiser, and Eva Huala
5 Bioinformatic Tools in Arabidopsis Research
Miguel de Lucas, Nicholas J. Provart, and Siobhan M. Brady

PART III GENETIC TECHNIQUES

6 Exploiting Natural Variation in Arabidopsis
Johanna A. Molenaar and Joost J.B. Keurentjes
7 Grafting in Arabidopsis
Katherine Bainbridge, Tom Bennett, Peter Crisp, Ottoline Leyser, and Colin Turnbull
8 Agrobacterium tumefaciens-Mediated Transient Transformation of Arabidopsis thaliana Leaves
Silvina Mangano, Cintia Daniela Gonzalez, and Silvana Petruccelli
9 iTILLING: Personalized Mutation Screening
Susan M. Bush and Patrick J. Krysan
10 Tailor-Made Mutations in Arabidopsis Using Zinc Finger Nucleases
Yiping Qi, Colby G. Starker, Feng Zhang, Nicholas J. Baltes, and Daniel F. Voytas
11 The Use of Artificial MicroRNA Technology to Control Gene Expression in Arabidopsis thaliana
Andrew L. Eamens, Marcus McHale, and Peter M. Waterhouse
12 Generation and Identification of Arabidopsis EMS Mutants
Li-Jia Qu and Genji Qin
13 Generation and Characterization of Arabidopsis T-DNA Insertion Mutants
Li-Jia Qu and Genji Qin
14 Identification of EMS-Induced Causal Mutations in Arabidopsis thaliana by Next-Generation Sequencing
Naoyuki Uchida, Tomoaki Sakamoto, Masao Tasaka, and Tetsuya Kurata
15 Arabidopsis Transformation with Large Bacterial Artificial Chromosomes
Jose M. Alonso and Anna N. Stepanova
16 Global DNA Methylation Analysis Using Methyl-Sensitive Amplification Polymorphism (MSAP)
Mahmoud W. Yaish, Mingsheng Peng, and Steven J. Rothstein

PART IV MOLECULAR BIOLOGICAL TECHNIQUES

17 Next-Generation Mapping of Genetic Mutations Using Bulk Population Sequencing
Ryan S. Austin, Steven P. Chatfield, Darrell Desveaux, and David S. Guttman
18 Chemical Fingerprinting of Arabidopsis Using Fourier Transform Infrared (FT-IR) Spectroscopic Approaches
András Gorzsás and Björn Sundberg
19 A Pipeline for ¹⁵N Metabolic Labeling and Phosphoproteome Analysis in Arabidopsis thaliana
Benjamin B. Minkoff, Heather L. Burch, and Michael R. Sussman
20 Gene Expression Profiling Using DNA Microarrays
Kyonoshin Maruyama, Kazuko Yamaguchi-Shinozaki, and Kazuo Shinozaki
21 Forward Chemical Genetic Screening
Hyunmo Choi, Jun-Young Kim, Young Tae Chang, and Hong Gil Nam
22 Highly Reproducible ChIP-on-Chip Analysis to Identify Genome-Wide Protein Binding and Chromatin Status in Arabidopsis thaliana
Jong-Myong Kim, Taiko Kim To, Maho Tanaka, Takaho A. Endo, Akihiro Matsui, Junko Ishida, Fiona C. Robertson, Tetsuro Toyoda, and Motoaki Seki

PART V CELL BIOLOGICAL TECHNIQUES

23 Fluorescence Microscopy
Sébastien Peter, Klaus Harter, and Frank Schleifenbaum
24 Immunocytochemical Fluorescent In Situ Visualization of Proteins in Arabidopsis
Yohann Boutté and Markus Grebe
25 High-Pressure Freezing and Freeze Substitution of Arabidopsis for Electron Microscopy
Jotham R. Austin II
26 Applications of Fluorescent Marker Proteins in Plant Cell Biology
Michael R. Blatt and Christopher Grefen
27 Flow Cytometry and Sorting in Arabidopsis
David W. Galbraith
28 Live Imaging of Arabidopsis Development
Daniel von Wangenheim, Gabor Daum, Jan U. Lohmann, Ernst K. Stelzer, and Alexis Maizel
29 Arabidopsis Organelle Isolation and Characterization
Nicolas L. Taylor, Elke Ströher, and A. Harvey Millar

PART VI BIOCHEMICAL AND PHYSIOLOGICAL TECHNIQUES

30 Analysis of Subcellular Metabolite Distributions Within Arabidopsis thaliana Leaf Tissue: A Primer for Subcellular Metabolomics
Stephan Krueger, Dirk Steinhauser, Jan Lisec, and Patrick Giavalisco
31 Hormone Profiling
Gaetan Glauser, Armelle Vallat, and Dirk Balmer
32 Purification of Protein Complexes and Characterization of Protein-Protein Interactions
Kirby N. Swatek, Chris B. Lee, and Jay J. Thelen
33 Protein Fragment Bimolecular Fluorescence Complementation Analyses for the In Vivo Study of Protein-Protein Interactions and Cellular Protein Complex Localizations
Rainer Waadt, Kathrin Schlücking, Julian I. Schroeder, and Jörg Kudla
34 The Split-Ubiquitin System for the Analysis of Three-Component Interactions
Christopher Grefen
35 RNA-Binding Protein Immunoprecipitation from Whole-Cell Extracts
Tino Köster and Dorothee Staiger
36 High-Throughput Analysis of Protein-DNA Binding Affinity
José M. Franco-Zorrilla and Roberto Solano
Index

Contributors

JOSE M. ALONSO • Department of Genetics, North Carolina State University, Raleigh, NC, USA
JOTHAM R. AUSTIN II • Advanced Electron Microscopy Facility, Department of Molecular Genetics and Cell Biology, University of Chicago, Chicago, IL, USA
RYAN S. AUSTIN • Southern Crop Protection and Food Research Centre, Agriculture & Agri-Food Canada, London, ON, Canada; Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
KATHERINE BAINBRIDGE • Department of Biology, University of York, York, UK
DIRK BALMER • Laboratory of Molecular and Cell Biology, Institute of Biology, University of Neuchâtel, Neuchâtel, Switzerland
NICHOLAS J. BALTES • Department of Genetics, Cell Biology & Development and Center for Genome Engineering, University of Minnesota, Minneapolis, MN, USA
BRONWYN J. BARKLA • Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico
GIORGIA BATELLI • CNR-IGV Institute of Plant Genetics, Portici, Italy
TOM BENNETT • Department of Biology, University of York, York, UK
TANYA Z. BERARDINI • Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, USA
MICHAEL R. BLATT • Laboratory of Plant Physiology and Biophysics, University of Glasgow, Glasgow, UK
HANS J. BOHNERT • Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA; Division of Applied Science, Gyeongsang National University, Jinju, South Korea; College of Science, King Abdulaziz University, Jeddah, Kingdom of Saudi Arabia
YOHANN BOUTTÉ • Department of Forest Genetics and Plant Physiology, UPSC, Swedish University of Agricultural Sciences, Umeå, Sweden; Membrane Biogenesis Laboratory, CNRS, UMR5200, Victor Ségalen Bordeaux 2 University, Bordeaux, France
SIOBHAN M. BRADY • Department of Plant Biology and Genome Center, UC Davis, Davis, CA, USA
RAY A. BRESSAN • Division of Applied Science, Gyeongsang National University, Jinju, South Korea; Department of Horticulture and Landscape Architecture, Purdue University, West Lafayette, IN, USA; College of Science, King Abdulaziz University, Jeddah, Kingdom of Saudi Arabia
JELENA BRKLJACIC • Arabidopsis Biological Resource Center, The Ohio State University, Columbus, OH, USA
HEATHER L. BURCH • Biotechnology Center, University of Wisconsin-Madison, Madison, WI, USA
SUSAN M. BUSH • Department of Plant Biology, University of California-Davis, Davis, CA, USA
YOUNG TAE CHANG • Department of Chemistry, National University of Singapore, Singapore, Singapore
STEVEN P. CHATFIELD • Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
HYUNMO CHOI • Department of Life Science, Pohang University of Science and Technology, Pohang, Republic of Korea
PETER CRISP • Research School of Biology, Australian National University, Canberra, ACT, Australia
DEBORAH CRIST • Arabidopsis Biological Resource Center, Center for Applied Plant Sciences, Department of Molecular Genetics, The Ohio State University, Columbus, OH, USA
MATILDE PAINO D’URZO • Department of Horticulture and Landscape Architecture, Purdue University, West Lafayette, IN, USA
MAHESHI DASSANAYAKE • Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
GABOR DAUM • Department of Stem Cell Biology, University of Heidelberg, Heidelberg, Germany; Centre for Organismal Studies, University of Heidelberg, Heidelberg, Germany
MIGUEL DE LUCAS • Department of Plant Biology and Genome Center, UC Davis, Davis, CA, USA
DARRELL DESVEAUX • Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada; Centre for the Analysis of Genome Evolution & Function, University of Toronto, Toronto, ON, Canada
KATE DREHER • Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, USA
ANDREW L. EAMENS • School of Environmental and Life Sciences, University of Newcastle, Callaghan, NSW, Australia
TAKAHO A. ENDO • RIKEN Bioinformatics and Systems Engineering Division, Yokohama, Japan
JOSÉ M. FRANCO-ZORRILLA • Genomics Unit, Centro Nacional de Biotecnología-CSIC, Madrid, Spain
DAVID W. GALBRAITH • School of Plant Sciences, University of Arizona, Tucson, AZ, USA
MARGARITA GARCIA-HERNANDEZ • Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, USA
PATRICK GIAVALISCO • Department of Molecular Physiology, Max Planck Institute of Molecular Plant Physiology, Potsdam-Golm, Germany
GAETAN GLAUSER • Chemical Analytical Service of the Swiss Plant Science Web, Institute of Biology, University of Neuchâtel, Neuchâtel, Switzerland
CINTIA DANIELA GONZALEZ • Departamento de Ciencias Biológicas, Facultad de Ciencias Exactas, Centro de Investigación y Desarrollo en Criotecnología de Alimentos (CIDCA)-CCT-La Plata-CONICET, Universidad de La Plata, La Plata, Argentina
ANDRÁS GORZSÁS • Department of Chemistry, Umeå University, Umeå, Sweden
MARKUS GREBE • Department of Plant Physiology, Umeå Plant Science Centre (UPSC), Umeå University, Fysiologihuset Byggnad L, Umeå, Sweden
CHRISTOPHER GREFEN • Emmy Noether Research Group Leader, ZMBP, Developmental Genetics, Tuebingen, Germany
ERICH GROTEWOLD • Arabidopsis Biological Resource Center, The Ohio State University, Columbus, OH, USA
DAVID S. GUTTMAN • Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada; Centre for the Analysis of Genome Evolution & Function, University of Toronto, Toronto, ON, Canada
KLAUS HARTER • Center for Plant Molecular Biology, University of Tuebingen, Tuebingen, Germany
NICHOLAS HOLOMUZKI • Arabidopsis Biological Resource Center, Center for Applied Plant Sciences, Department of Molecular Genetics, The Ohio State University, Columbus, OH, USA
EVA HUALA • Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, USA
JUNKO ISHIDA • Plant Genomic Network Research Team, RIKEN Plant Science Center, Yokohama, Japan
JOOST J.B. KEURENTJES • Laboratory of Genetics, Wageningen University, Wageningen, The Netherlands
JONG-MYONG KIM • Plant Genomic Network Research Team, RIKEN Plant Science Center, Yokohama, Japan
JUN-YOUNG KIM • Department of Chemistry, National University of Singapore, Singapore, Singapore
EMMA KNEE • Arabidopsis Biological Resource Center, The Ohio State University, Columbus, OH, USA
TINO KÖSTER • Department of Molecular Cell Physiology, Institute for Genome Research and Systems Biology, University of Bielefeld, Bielefeld, Germany
STEPHAN KRUEGER • Botanical Institute II, University of Cologne, Cologne, Germany
PATRICK J. KRYSAN • Department of Horticulture and Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, WI, USA
JÖRG KUDLA • Molekulargenetik und Zellbiologie der Pflanzen, Institut für Biologie und Biotechnologie der Pflanzen, Universität Münster, Münster, Germany
TETSUYA KURATA • Plant Global Education Project, Graduate School of Biological Sciences, Nara Institute of Science and Technology, Ikoma, Japan
PHILIPPE LAMESCH • Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, USA
CHRIS B. LEE • Department of Biochemistry, Life Sciences Center, University of Missouri, Columbia, MO, USA
OTTOLINE LEYSER • Department of Biology, University of York, York, UK; Sainsbury Laboratory, University of Cambridge, Cambridge, UK
DONGHUI LI • Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, USA
JAN LISEC • Department of Molecular Physiology, Max Planck Institute of Molecular Plant Physiology, Potsdam-Golm, Germany
JAN U. LOHMANN • Department of Stem Cell Biology, University of Heidelberg, Heidelberg, Germany; Centre for Organismal Studies, University of Heidelberg, Heidelberg, Germany
ALBINO MAGGIO • Department of Agricultural Engineering and Agronomy, University of Naples Federico II, Portici, Italy
ALEXIS MAIZEL • Centre for Organismal Studies, University of Heidelberg, Heidelberg, Germany
SILVINA MANGANO • Departamento de Ciencias Biológicas, Facultad de Ciencias Exactas, Centro de Investigación y Desarrollo en Criotecnología de Alimentos (CIDCA)-CCT-La Plata-CONICET, Universidad de La Plata, La Plata, Argentina
KYONOSHIN MARUYAMA • Biological Resources and Post-harvest Division, Japan International Research Center for Agricultural Sciences, Tsukuba, Ibaraki, Japan
AKIHIRO MATSUI • Plant Genomic Network Research Team, RIKEN Plant Science Center, Yokohama, Japan
MARCUS MCHALE • School of Molecular Sciences, University of Sydney, Sydney, NSW, Australia
A. HARVEY MILLAR • ARC Centre of Excellence in Plant Energy Biology and Centre for Comparative Analysis of Biomolecular Networks (CABiN), The University of Western Australia, Crawley, WA, Australia
BENJAMIN B. MINKOFF • Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA
JOHANNA A. MOLENAAR • Laboratory of Plant Physiology, Wageningen University, Wageningen, The Netherlands
HONG GIL NAM • Academy of New Biology for Plant Senescence and Life History, Institute for Basic Science & Department of New Biology, Daegu Gyeongbuk Institute of Science and Technology, Dalseong-Gun, Daegu, Republic of Korea
DONG-HA OH • Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA; Division of Applied Science, Gyeongsang National University, Jinju, South Korea
FRANCESCO ORSINI • Department of Agro-Environmental Sciences and Technology, University of Bologna, Bologna, Italy
OMAR PANTOJA • Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico
MINGSHENG PENG • Monsanto Company, Chesterfield, MO, USA
SÉBASTIEN PETER • Center for Plant Molecular Biology, University of Tuebingen, Tuebingen, Germany
SILVANA PETRUCCELLI • Departamento de Ciencias Biológicas, Facultad de Ciencias Exactas, Centro de Investigación y Desarrollo en Criotecnología de Alimentos (CIDCA)-CCT-La Plata-CONICET, Universidad de La Plata, La Plata, Argentina
NICHOLAS J. PROVART • Department of Cell & Systems Biology, Centre for the Analysis of Genome Evolution and Function, Toronto, ON, Canada
YIPING QI • Department of Genetics, Cell Biology & Development and Center for Genome Engineering, University of Minnesota, Minneapolis, MN, USA
GENJI QIN • State Key Laboratory of Protein and Plant Gene Research, Center for Life Sciences, College of Life Sciences, Peking University, Beijing, People’s Republic of China
LI-JIA QU • State Key Laboratory of Protein and Plant Gene Research, Center for Life Sciences, College of Life Sciences, Peking University, Beijing, People’s Republic of China
LEONORE REISER • Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, USA
LUZ RIVERO • Arabidopsis Biological Resource Center, Center for Applied Plant Sciences, Department of Molecular Genetics, The Ohio State University, Columbus, OH, USA
FIONA C. ROBERTSON • Plant Genomic Network Research Team, RIKEN Plant Science Center, Yokohama, Japan
STEVEN J. ROTHSTEIN • Department of Molecular and Cellular Biology, University of Guelph, Guelph, ON, Canada
TOMOAKI SAKAMOTO • Plant Global Education Project, Graduate School of Biological Sciences, Nara Institute of Science and Technology, Ikoma, Japan
FRANK SCHLEIFENBAUM • Center for Plant Molecular Biology, University of Tuebingen, Tuebingen, Germany; Berthold Technologies GmbH & Co KG, Bad Wildbad, Germany
KATHRIN SCHLÜCKING • Molekulargenetik und Zellbiologie der Pflanzen, Institut für Biologie und Biotechnologie der Pflanzen, Universität Münster, Münster, Germany
RANDY SCHOLL • Arabidopsis Biological Resource Center, Center for Applied Plant Sciences, Department of Molecular Genetics, The Ohio State University, Columbus, OH, USA
JULIAN I. SCHROEDER • Division of Biological Sciences, Cell and Developmental Biology Section and Center for Food and Fuel for the 21st Century, University of California San Diego, La Jolla, CA, USA
MOTOAKI SEKI • Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science, Yokohama, Japan; Kihara Institute for Biological Research, Yokohama City University, Yokohama, Japan
KAZUO SHINOZAKI • RIKEN Center for Sustainable Resource Science, Suehiro-cho, Tsurumi-ku, Yokohama, Japan
ROBERTO SOLANO • Department of Plant Molecular Genetics, Centro Nacional de Biotecnología-CSIC, Madrid, Spain
DOROTHEE STAIGER • Department of Molecular Cell Physiology, Institute for Genome Research and Systems Biology, University of Bielefeld, Bielefeld, Germany
COLBY G. STARKER • Department of Genetics, Cell Biology & Development and Center for Genome Engineering, University of Minnesota, Minneapolis, MN, USA
DIRK STEINHAUSER • Department of Molecular Physiology, Max Planck Institute of Molecular Plant Physiology, Potsdam-Golm, Germany
ERNST K. STELZER • Physical Biology, Frankfurt Institute for Molecular Life Sciences (FMLS), Goethe Universität Frankfurt am Main, Frankfurt am Main, Germany
ANNA N. STEPANOVA • Department of Genetics, North Carolina State University, Raleigh, NC, USA
ELKE STRÖHER • ARC Centre of Excellence in Plant Energy Biology and Centre for Comparative Analysis of Biomolecular Networks (CABiN), The University of Western Australia, Crawley, WA, Australia
BJÖRN SUNDBERG • Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, Umeå, Sweden
MICHAEL R. SUSSMAN • Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA; Biotechnology Center, University of Wisconsin-Madison, Madison, WI, USA
KIRBY N. SWATEK • Department of Biochemistry, Life Sciences Center, University of Missouri, Columbia, MO, USA
MAHO TANAKA • Plant Genomic Network Research Team, RIKEN Plant Science Center, Yokohama, Japan
MASAO TASAKA • Graduate School of Biological Sciences, Nara Institute of Science and Technology, Ikoma, Japan
NICOLAS L. TAYLOR • ARC Centre of Excellence in Plant Energy Biology and Centre for Comparative Analysis of Biomolecular Networks (CABiN), The University of Western Australia, Crawley, WA, Australia
JAY J. THELEN • Department of Biochemistry, Life Sciences Center, University of Missouri, Columbia, MO, USA
TAIKO KIM TO • Plant Genomic Network Research Team, RIKEN Plant Science Center, Yokohama, Japan; Department of Integrated Genetics, National Institute of Genetics, Mishima, Japan
TETSURO TOYODA • RIKEN Bioinformatics and Systems Engineering Division, Yokohama, Japan
COLIN TURNBULL • Division of Cell & Molecular Biology, Imperial College London, London, UK
NAOYUKI UCHIDA • Graduate School of Biological Sciences, Nara Institute of Science and Technology, Ikoma, Japan
ARMELLE VALLAT • Service Analytique Facultaire, Institute of Chemistry, University of Neuchâtel, Neuchâtel, Switzerland
ROSARIO VERA-ESTRELLA • Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico
DANIEL F. VOYTAS • Department of Genetics, Cell Biology and Development, Center for Genome Engineering, University of Minnesota, Minneapolis, MN, USA
RAINER WAADT • Division of Biological Sciences, Cell and Developmental Biology Section and Center for Food and Fuel for the 21st Century, University of California San Diego, La Jolla, CA, USA
DANIEL VON WANGENHEIM • Physical Biology, Frankfurt Institute for Molecular Life Sciences (FMLS), Goethe Universität Frankfurt am Main, Frankfurt am Main, Germany
PETER M. WATERHOUSE • School of Molecular Sciences, University of Sydney, Sydney, NSW, Australia
MAHMOUD W. YAISH • Department of Biology, College of Science, Sultan Qaboos University, Muscat, Oman
KAZUKO YAMAGUCHI-SHINOZAKI • Biological Resources and Post-harvest Division, Japan International Research Center for Agricultural Sciences, Tsukuba, Ibaraki, Japan; Laboratory of Plant Molecular Physiology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
FENG ZHANG • Cellectis Plant Sciences, St. Paul, MN, USA
JIAN-KANG ZHU • Department of Horticulture and Landscape Architecture, Purdue University, West Lafayette, IN, USA

Part I
Growing Arabidopsis

Chapter 1
Handling Arabidopsis Plants: Growth, Preservation of Seeds, Transformation, and Genetic Crosses
Luz Rivero, Randy Scholl, Nicholas Holomuzki, Deborah Crist, Erich Grotewold, and Jelena Brkljacic

Abstract
Growing healthy plants is essential for the advancement of Arabidopsis thaliana (Arabidopsis) research. Over the last 20 years, the Arabidopsis Biological Resource Center (ABRC) has collected and developed a series of best-practice protocols, some of which are presented in this chapter. Arabidopsis can be grown in a variety of locations, growth media, and environmental conditions. Under standard growth conditions (soil, long day, 23 °C), most laboratory accessions and their mutant or transgenic derivatives flower after 4–5 weeks and set seeds after 7–8 weeks. Some mutant genotypes, natural accessions, and Arabidopsis relatives require strict control of growth conditions, best provided by growth rooms, chambers, or incubators. Other lines can be grown in less-controlled greenhouse settings. Although the majority of lines can be grown in soil, certain experimental purposes require sterile solid or liquid growth media. These include the selection of primary transformants, the identification of homozygous lethal individuals in a segregating population, and the bulking of large amounts of plant material.
The importance of controlling, observing, and recording growth conditions is emphasized, and the equipment required to monitor these conditions is listed. Proper conditions for seed harvesting and preservation, as well as seed quality control, are also described. Plant transformation and genetic crosses, two of the methods that revolutionized Arabidopsis genetics, are introduced as well.

Key words Arabidopsis, Growth conditions, Environmental conditions, Natural accession, Seed germination, Seed quality, Plant transformation, Genetic crosses

Jose J. Sanchez-Serrano and Julio Salinas (eds.), Arabidopsis Protocols, Methods in Molecular Biology, vol. 1062, DOI 10.1007/978-1-62703-580-4_1, © Springer Science+Business Media New York 2014

1 Introduction

Healthy growth and development of plants are prerequisites for accurate and reproducible plant research, and Arabidopsis thaliana (Arabidopsis) is no exception. Proper handling and maintenance of Arabidopsis plants also enable a high rate of seed production. In this chapter, we describe basic, best-practice protocols for handling Arabidopsis. The reader should be aware, however, that the most commonly used growth conditions, particularly in greenhouses, may differ from those in the native habitats of some natural accessions. This is especially important when interpreting phenotypic differences in traits known to be strongly influenced by the natural habitat, such as flowering time. Therefore, the protocols described here should be taken only as a guide for experimental setup and design.
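As a planning aid, the rule of thumb from the abstract (flowering after 4–5 weeks and seed set after 7–8 weeks for most laboratory accessions grown in soil under long days at 23 °C) can be turned into calendar windows. The Python sketch below is illustrative only: it assumes the weeks are counted from the sowing date, and the helper `expected_windows` is a hypothetical name, not part of the original protocol. As the introduction stresses, natural accessions and some mutants may flower much later, so the output is a scheduling guide rather than a prediction.

```python
from datetime import date, timedelta

# Rule-of-thumb windows from the chapter abstract for most laboratory accessions
# grown under standard conditions (soil, long day, 23 °C).
# Assumption: weeks are counted from the sowing date.
FLOWERING_WEEKS = (4, 5)
SEED_SET_WEEKS = (7, 8)


def expected_windows(sowing: date) -> dict[str, tuple[date, date]]:
    """Return (earliest, latest) calendar dates for flowering and seed set.

    Hypothetical planning helper: natural accessions, some mutants, and plants
    grown under other conditions can fall well outside these windows.
    """

    def window(weeks: tuple[int, int]) -> tuple[date, date]:
        earliest, latest = weeks
        return sowing + timedelta(weeks=earliest), sowing + timedelta(weeks=latest)

    return {
        "flowering": window(FLOWERING_WEEKS),
        "seed_set": window(SEED_SET_WEEKS),
    }


if __name__ == "__main__":
    for stage, (start, end) in expected_windows(date(2014, 1, 6)).items():
        print(f"{stage}: {start.isoformat()} to {end.isoformat()}")
```

For a sowing date of 6 January 2014, this prints a flowering window of 2014-02-03 to 2014-02-10 and a seed-set window of 2014-02-24 to 2014-03-03. Keeping the week ranges as named constants makes it easy to substitute windows observed for a particular line or growth room.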
This chapter addresses (1) the growth of Arabidopsis plants in a variety of environmental settings, including growth chambers and greenhouses, as well as in vitro; (2) critical and optimal conditions for growing healthy Arabidopsis plants, including quality control measures; (3) harvesting, seed preservation, and seed quality control; (4) genetic crosses; and (5) transformation with Agrobacterium tumefaciens (Agrobacterium). Particular emphasis is placed on the equipment required for controlling and monitoring environmental conditions during plant growth. The plant and seed management protocols are given in chronological order.
2 Materials
2.1 Plant Growth and Seed Harvest
1. Arabidopsis seeds can be obtained from the public stock centers, namely the Arabidopsis Biological Resource Center (ABRC, abrc.osu.edu), the European Arabidopsis Stock Centre (NASC, arabidopsis.info), the RIKEN BioResource Center (RIKEN BRC, www.brc.riken.jp/inf/en), and the French National Institute for Agricultural Research (INRA, cnrgv.toulouse.inra.fr/en), as well as from other laboratories [1] and from private sources such as Lehle Seeds (arabidopsis.com).
2. Sterile plastic Petri dishes (plates), 10 or 15 cm in diameter.
3. Murashige and Skoog basal salt mixture (MS).
4. 2-(N-Morpholino)ethanesulfonic acid (MES).
5. Granulated agar.
6. Sucrose.
7. Gamborg’s Vitamin Solution.
8. KOH.
9. Distilled water.
10. Magnetic stirring device.
11. Beakers (1 L).
12. Glass bottles (1 L).
13. pH meter.
14. Microcentrifuge tubes.
15. Disposable Pasteur pipettes.
16. Pipetman and pipette tips.
17. Household bleach (5.25 % w/v sodium hypochlorite).
18. Tween® 20.
19. Labeling tape or printable labels.
20. Permanent marker.
21. 3M Micropore surgical paper tape.
22. Thiamine hydrochloride, plant cell culture tested.
23. Double distilled water (ddH2O).
24. 2,4-Dichlorophenoxyacetic acid (2,4-D), plant cell culture tested, >98 %.
25. Ethanol, absolute, 200 proof, for molecular biology.
26. 0.45-μm filter sterilization unit.
27. Myoinositol, plant cell culture tested.
28. KH2PO4.
29. NaOH.
30. Soil mix, e.g., Sunshine® LC1 mix (Sun Gro Horticulture, www.sungro.com) or another peat moss-based potting mix.
31. Slow-release fertilizer pellets, e.g., Osmocote® 14-14-14 (Hummert™ International, www.Hummert.com).
32. Plastic pots with holes in the bottom (e.g., 11 cm diameter, 5.5 cm square) or plastic flats (e.g., 26 cm × 53 cm) with clear domes.
33. Trowel or large spoon.
34. 70-mm filter paper.
35. Pest Trap™ colored sticky cards (Hummert™ International).
36. Enstar® II (Hummert™ International).
37. Conserve® SC (Hummert™ International).
38. Marathon® 1G, granular systemic insecticide (Hummert™ International).
39. Sulfur vaporizer and bulk pelleted sulfur (HID Hut Inc., www.hidhut.com).
40. Spor-Klenz® Ready-To-Use Cold Sterilant (Steris, www.steris.com).
41. Tornado™/Flex cold fog ULV mist sprayer (Curtis Dyna-Fog Ltd., www.dynafog.com).
42. Transparent plastic floral sleeves, e.g., straight sleeve BOPP 60 × 40 × 15 cm (www.zwapak.com) for 11-cm-diameter pots, or other devices for plant isolation such as Aracons™ (Lehle Seeds and Arasystem, www.arasystem.com) or lightweight plastic bags (4–8 L).
43. Hand sieve, e.g., US Standard Stainless Steel Test Sieve No. 40 (Fisher Scientific).
44. Small manila envelopes (e.g., 6 cm × 9 cm), small glass jars (125 mL), or other containers.
2.2 Control of Environmental Growth Conditions for Optimal Plant Growth
1. Data loggers, e.g., HOBO® U14 LCD (Onset, www.onsetcomp.com).
2.3 Preparation of Seeds for Short- and Long-Term Storage
1. 2-mL polypropylene cryovials with threaded lids and gaskets (e.g., screw cap micro tubes manufactured by Sarstedt Inc., available from Fisher Scientific) or other sealed containers for permanent seed storage.
2. Permanent marker or printed labels.
2.4 Seed Quality Control
1. Dissecting microscope or magnifying lenses.
2. Plastic Petri plates (10 cm diameter) or other similar containers.
3. Absorbent paper, e.g., filter paper 10 cm in diameter.
4. Permanent marker or printed labels.
5. Distilled water.
6. Parafilm or tape.
2.5 Genetic Crosses
1. DV-30 Precision Swiss clamping tweezers (Lehle Seeds).
2. Optical glass binocular magnifier, e.g., OptiVISOR® (Donegan Optical Company, www.doneganoptical.com), or dissecting microscope.
3. 1.5-mL microcentrifuge tubes.
4. Small scissors.
5. Laboratory tape in various colors and permanent marker.
2.6 Transformation of Arabidopsis with Agrobacterium tumefaciens
1. Agrobacterium transformed with a construct of interest.
2. LB medium.
3. Selection antibiotics.
4. Sucrose.
5. Silwet L-77®.
3 Methods
3.1 Growth of Arabidopsis Plants and Cultures
3.1.1 Growth of Plants in Sterile Conditions on Solid Media
Growth of Arabidopsis for purposes such as the selection of drug-resistant and transformed plants, the examination of early root and shoot phenotypes, and the identification of homozygous lethal mutants is typically conducted in sterile conditions on solid media. Liquid bleach sterilization, described here, is a practical method for sterilizing a few seed lines at a time. Larger numbers of lines can be sterilized easily, and with less manipulation, using chlorine gas. Chlorine gas can also be used for seeds infested with powdery mildew or other fungal diseases. Various containers, such as Petri plates, Magenta® boxes, or culture tubes, are used depending on the purpose of the experiment. This section describes the most commonly used medium for sterile growth in Petri plates (1× MS agar medium). Adaptation to other sterile formats is straightforward, and most experimental additives can easily be incorporated during preparation.
1. Add 4.31 g of MS basal salt mixture [2] and 0.5 g of MES to a beaker containing 0.8 L of distilled water and stir to dissolve. Add distilled water to a final volume of 1 L. Check and adjust the pH to 5.7 using 1 M KOH.
2. Divide the medium into two 1-L bottles, 500 mL in each. Add 5 g of agar per bottle. Keep the lids loose.
3. Autoclave for 20 min at 121 °C, 15 psi, with a magnetic stir bar in each bottle.
4. Place the bottles on a stir plate at low speed and allow the agar medium to cool to 45–50 °C (until the container can be held with bare hands).
5. From this step onward, perform all steps under sterile conditions in a laminar flow hood. If desired, add 1–2 % sucrose and 1 mL of Gamborg’s Vitamin Solution, stirring to dissolve evenly (see Notes 1 and 2).
6. Label the bottoms of the Petri plates with an identification number or name and the date.
7. Pour enough medium into each plate to fill approximately half its depth.
8. Allow the plates to cool at room temperature for about an hour so that the agar solidifies. If the plates are not to be used immediately, wrap them in plastic and store them at 4 °C (refrigerator temperature) (see Note 3).
9. Surface-sterilize seeds in microcentrifuge tubes by soaking them for 20 min in 50 % bleach containing 0.05 % Tween® 20 detergent.
10. Remove all bleach residue by rinsing the seeds five to seven times with sterile distilled water.
11. To plant individual seeds at low density, pick up one seed at the tip of a pipette by suction, then release it onto the agar at the desired location. To plant seeds at higher density, suspend the seeds in sterile distilled water (or cooled 0.1 % top agar), pour the suspension onto the plate, and immediately swirl to distribute the seeds evenly. Use a sterile pipette tip to adjust the distribution and remove excess water. Allow the water or top agar to dry slightly before placing the lid on the plate.
12. Seal the plates with Micropore tape, which prevents desiccation while allowing slight aeration.
13. Place the plates at 4 °C for 3 days (see Notes 4 and 5).
14. Transfer the plates to the growth environment. Continuous light at 120–150 μmol m⁻² s⁻¹ and a temperature of 22–23 °C are suitable growth conditions (see Notes 6–8).
3.1.2 Growth of Plants in Sterile Conditions in Liquid Media
Arabidopsis seedlings can also be grown in liquid media. This method provides large amounts of plant tissue suitable for proteomics, metabolomics, or any study that requires a large amount of starting material. Liquid culture is also widely used for high-throughput genomic studies; in this case, growth protocols are adapted to 96-deep-well plates (or other formats), and the MS medium is supplemented with gibberellic acid.
1. Prepare MS medium as described in Subheading 3.1.1, but do not add agar.
2. After the medium has been autoclaved and cooled to room temperature, distribute 75–100 mL of MS medium into previously sterilized 250-mL Erlenmeyer flasks in a laminar flow hood.
3. Add bleach- or chlorine gas-sterilized seeds to the medium (up to 10 μL of seeds per flask, corresponding to approximately 250 seeds).
4. Grow seedlings under continuous light (120–150 μmol m⁻² s⁻¹) with gentle rotation on an orbital shaker at 120 rpm for up to 2 weeks.
5. Remove the seedlings from the flask. Growing more than 200–250 seedlings per flask for more than 2 weeks may make it difficult to remove the plant material from the flask.
6. Remove excess medium from the seedlings using filter paper. The plant material is now ready for downstream applications.
3.1.3 Growth of Arabidopsis Cells in Culture
Cell suspension cultures represent a source of nearly uniform cell material for functional genomics and for biochemical, physiological, and metabolomic studies performed under tightly controlled environmental conditions. Several cell cultures derived from Arabidopsis tissue explants have been described; among these, T87 and MM1/MM2d have been the most widely used. The T87 cell line originates from seedlings of the Columbia accession and can photosynthesize in the light [3].
It has been used to analyze gene expression changes under stress conditions, hormone signaling pathways, the circadian clock, and plant cell wall biosynthesis [4–6], and transient and stable transformation protocols have been established for this line [4, 7]. Unlike T87, the MM1 (light-grown) and MM2d (dark-grown) cell lines, derived from the Landsberg erecta accession, can be synchronized and are therefore used for cell-cycle studies [8]. Because of space limitations, only the maintenance of T87 cell cultures is described here.
1. Prepare a 10 mg/mL thiamine stock solution by dissolving 0.1 g of thiamine in 10 mL of ddH2O. Filter-sterilize, aliquot 1 mL into microcentrifuge tubes, and store at −20 °C.
2. Prepare a 2 mg/mL 2,4-D stock solution by dissolving 0.2 g of 2,4-D in 100 mL of 25 % ethanol. Filter-sterilize, aliquot 1 mL into microcentrifuge tubes, and store at −20 °C.
3. Prepare 1 L of NT-1 medium by adding 4.3 g of MS salt mixture, 30 g of sucrose, 0.18 g of KH2PO4, 100 μL of 10 mg/mL thiamine stock, 220 μL of 2 mg/mL 2,4-D stock, and 100 mg of myoinositol to a bottle containing 0.8 L of ddH2O, and stir to dissolve (see Note 9).
4. Adjust the pH to 5.8 using 5 M NaOH. Add ddH2O to a final volume of 1 L.
5. Distribute 75 mL of medium into each 250-mL Erlenmeyer flask. Cover the flasks with aluminum foil (see Note 10).
6. Autoclave for 20 min. Let the medium cool to room temperature.
7. In a laminar flow hood, transfer 3 mL of a 1-week-old T87 cell suspension culture into a flask containing 75 mL of NT-1 medium (see Notes 11 and 12).
8. Grow the culture at 24 °C under continuous light (40–100 μmol m⁻² s⁻¹) with gentle rotation on an orbital shaker at 120 rpm.
9. Subculture weekly by transferring cells into fresh NT-1 medium, as described in step 7 (see Note 13).
3.1.4 Planting Arabidopsis Seeds on Soil
Diverse mixes and media can be used for growing Arabidopsis.
The term “soil” is used here for any mix or medium used for non-sterile growth of plants in pots or similar containers. Commercial potting mixes are popular with Arabidopsis researchers because of their convenience and reliability. Potting media often contain peat moss for moisture retention and perlite for aeration. Mixes such as Sunshine® LC1 support healthy Arabidopsis growth and include a starter nutrient charge, so that fertilization is not necessary in early growth phases. Seeds can be planted by various methods (see Note 14). Soil can be autoclaved to eliminate pests, but this is usually not necessary. Soil can be prepared and seeds planted in pots as follows:
1. Place soil in a clean container. Add Osmocote® 14-14-14 fertilizer (see Note 15). Wet thoroughly with tap water and mix well with a trowel, large spoon, or hands.
2. Label pots or trays with the stock number or name and the date of planting (see Note 16).
3. Fill pots or other containers loosely with soil and level it, without compressing, to create a uniform, soft bed. The pots are then ready for planting (see Note 17).
4. When planting many seeds in a pot, scatter them carefully from a folded piece of 70-mm filter paper (or other paper), distributing them evenly over the soil surface (see Note 18). When planting individual seeds, pick up one seed at the tip of a pipette by suction, then release it onto the soil. Do not cover planted seeds with additional soil, since Arabidopsis seeds require light for germination.
5. Place the pot(s) in a tray, flat, or other container.
6. Cover with a plastic dome or with clear plastic wrap taped to the container (see Note 19).
7. Place the pots at 4 °C for 3 days (see Note 4).
8. Transfer the pots to the growth area.
9. For plants grown in a greenhouse, remove the plastic dome or wrap; for plants grown in a growth chamber, leave it in place until germinated seedlings are visible.
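The media recipes in Subheadings 3.1.1 and 3.1.3 are given per litre and scale linearly with volume, and the stock additions to NT-1 medium follow the usual C1V1 = C2V2 dilution relationship. The Python sketch below is an illustrative helper, not part of the published protocol, and its function names are hypothetical. It scales the 1× MS agar recipe to an arbitrary volume and back-calculates the final thiamine and 2,4-D concentrations implied by the NT-1 recipe (100 μL of a 10 mg/mL stock and 220 μL of a 2 mg/mL stock per litre).

```python
# Illustrative helpers (hypothetical names) for the recipes in Subheadings 3.1.1 and 3.1.3.

# 1x MS agar medium, grams per litre, as given in Subheading 3.1.1
# (agar is added at 5 g per 500-mL bottle, i.e. 10 g/L).
MS_AGAR_G_PER_L = {
    "MS basal salt mixture": 4.31,
    "MES": 0.5,
    "agar": 10.0,
}


def scale_recipe(g_per_litre: dict, volume_ml: float) -> dict:
    """Return the grams of each component needed for volume_ml of medium."""
    return {name: grams * volume_ml / 1000.0 for name, grams in g_per_litre.items()}


def final_conc_mg_per_l(stock_mg_per_ml: float, added_ul: float, volume_ml: float) -> float:
    """Final concentration after adding added_ul of stock to volume_ml (C1V1 = C2V2)."""
    mg_added = stock_mg_per_ml * added_ul / 1000.0  # convert uL of stock to mL
    return mg_added / (volume_ml / 1000.0)          # convert mL of medium to L


def stock_volume_ul(target_mg_per_l: float, stock_mg_per_ml: float, volume_ml: float) -> float:
    """Microlitres of stock needed to reach target_mg_per_l in volume_ml of medium."""
    mg_needed = target_mg_per_l * volume_ml / 1000.0
    return mg_needed / stock_mg_per_ml * 1000.0     # convert mL of stock to uL


if __name__ == "__main__":
    # Ingredients for one 500-mL bottle of MS agar medium.
    for name, grams in scale_recipe(MS_AGAR_G_PER_L, 500).items():
        print(f"{name}: {grams:.3f} g")
    # Final concentrations implied by the NT-1 recipe (1 L).
    print(f"thiamine: {final_conc_mg_per_l(10.0, 100, 1000):.2f} mg/L")
    print(f"2,4-D: {final_conc_mg_per_l(2.0, 220, 1000):.2f} mg/L")
    # Hypothetical case: 2,4-D stock needed if a single 75-mL batch were made directly.
    print(f"2,4-D stock per 75 mL: {stock_volume_ul(0.44, 2.0, 75):.1f} uL")
```

Running the script prints the per-bottle amounts (5 g of agar per 500 mL, for example) and final concentrations of 1 mg/L thiamine and 0.44 mg/L 2,4-D, which is a quick way to check a modified recipe against the stock volumes given in the protocol.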
3.2 Growth Conditions
The growth and development of Arabidopsis, including flowering time, are influenced by a number of environmental conditions in addition to the genetic background. Under continuous light at 23 °C, with adequate watering and good nutrition, seeds of most lines germinate 3–5 days after planting. Plants produce their first flowers within 4–5 weeks, and seeds can be harvested 8–10 weeks after planting. High-quality seeds can be produced if watering, light, and temperature are carefully controlled. For vigorous plant growth, the optimum light intensity is 120–150 μmol m⁻² s⁻¹ (see Notes 6 and 7) and the optimum temperature is 22–23 °C (see Notes 8 and 20). Water requirements are strongly influenced by relative humidity. Plants tolerate low (20–30 %) relative humidity well, but the soil dries out more quickly under these conditions. Very high (>90 %) relative humidity may cause plant sterility. Moderate humidity (50–60 %) is considered optimal for plant growth; however, low humidity (<50 %) is recommended for silique maturation. The following practices are useful for handling plants in any growth context (greenhouse, growth chamber, or growth room):
1. Add water to the trays holding pots with perforated bottoms.
2. During germination, maintain approximately 2 cm of water around the base of the pots so that the soil does not dry out before the first true leaves begin expanding.
3. After plants have developed true leaves and until they flower, reduce watering to as little as once or twice per week, as needed to avoid water stress while allowing the soil to drain properly (see Note 21).
4. Water daily during the silique-filling stage for good seed production; the water requirement of plants increases dramatically during this stage.
5. Keep plants spaced apart, with good air circulation, to reduce the incidence of powdery mildew.
6. Place several yellow or blue sticky cards (e.g., Pest Trap™) in the growth area to monitor insect populations. Inspect cards and plants daily for pests. Replace the cards periodically, and especially after a pesticide application, to better judge pest populations.
7. Prevent the introduction and spread of pests, which can be carried into the growth area on soil, seeds, or plants, or by people. Wear a lab coat reserved for the growth area, since insects and pathogens are readily transported on clothing. Plan to have plants of similar age in the growth area, since mature plants are more susceptible to pests than very young plants. Anyone who has been in an infested growth area should not subsequently enter noninfested areas; when visiting several areas, proceed from the cleanest to the most infested. Keep the area clean and sweep the floors and/or shelves regularly to eliminate or reduce potential sources of pest outbreaks. Harvest mature, dry plants, and discard old soil and nonviable dry plant debris immediately.
8. Prevent infestation by pests such as thrips, aphids, fungus gnats, and whiteflies by spraying plants with a preventive mixture of Enstar® II and Conserve® SC, prepared by adding 1.2 mL of each product to 12 L of water. This mixture can be sprayed lightly on rosettes before the bolting stage, before any isolation devices are put in place (see Subheading 3.3.1). Marathon® 1G, a granular insecticide, can also be applied as directed on the label to control aphids, fungus gnat larvae, whiteflies, psyllids, and thrips (see Notes 22 and 23).
3.2.1 Maintenance of Plants in Greenhouses
Greenhouses with satisfactory cooling, heating, and supplemental light are suitable for large-scale growth of lines that do not require strict control of environmental conditions, including most natural accessions (e.g., Col, Ler, Cvi, Ws, Est, Kas, Sha, Kondara, C24) and species related to A. thaliana.
However, in temperate climates, greenhouses are often too hot for Arabidopsis growth during the summer. Successful plant growth should start with an empty room, cleaned and maintained as follows (see Note 24):
1. Remove and properly discard all plants and other materials in the room.
2. Sweep and hose down the entire room interior (benches, floors, window ledges, and windows).
3. Raise the temperature in the room to 40 °C for 3–5 days. The setting may be higher, depending on the outside environmental conditions and equipment specifications. Turn off lights, fans, and cooling pads and close vents during this period.
4. After the high-temperature treatment, do not place diseased or older plants in the cleaned room.
5. Provide supplemental evening and morning light during the winter, since plants generally require a long photoperiod (at least 12 h) for flowering. In the greenhouse, 16-h photoperiods are typically used (see Notes 6 and 7).
6. Use shade cloth during the summer to help reduce light intensity and regulate temperature.
7. The recommended growth temperature in the greenhouse is 21–23 °C (see Note 8). Maintain night temperatures 2–4 °C lower than the day temperature.
3.2.2 Maintenance of Plants in Growth Chambers and Growth Rooms
Most commercial growth chambers precisely control light intensity, photoperiod, temperature (typically ±1 °C), and often humidity. Custom plant growth rooms provide environmental control similar to that of reach-in chambers. Standard architectural rooms equipped with supplemental lighting and air conditioning are popular for propagating Arabidopsis economically. Such rooms must be designed with sufficient light, cooling, and ventilation, but they typically afford less rigorous control of growth conditions than custom chambers.
Such facilities usually allow better control of temperature and light than a greenhouse, hence their popularity among Arabidopsis researchers. Growth rooms can be maintained within 2–3 °C of a set point, whereas greenhouse temperatures may deviate further because of rapid changes in sunlight, unexpectedly hot days, and similar events. As with greenhouses, it is imperative to start a new planting in a growth facility that has been emptied and properly cleaned beforehand. This minimizes both the use of chemicals to control pests and the loss of plants to pest infestation.
1. Remove and discard all plant residues and related materials (see Note 24).
2. Sweep and wipe down the interior with a wet paper towel.
3. Make sure the intake and exhaust vents are closed.
4. If a heavy powdery mildew infestation was present, apply a sterilizing agent such as Spor-Klenz® to kill fungal spores, using a fogger (e.g., Tornado™/Flex cold fog ULV mist sprayer) through an external access port of the chamber (see Note 25).
5. Leave the chamber undisturbed overnight, and wipe down its interior with a wet paper towel the next day.
6. Raise the temperature to 40–45 °C for 3–5 days to eradicate or minimize pests.
7. Do not place diseased or older plants in the cleaned chamber.
8. Use continuous light or a long-day photoperiod to accelerate the reproductive cycle. Short days (less than 12 h) favor growth of vegetative tissue and delay flowering.
3.2.3 Monitoring the Environmental Growth Conditions
The environmental control systems currently offered for greenhouses, growth chambers, and growth rooms allow remote monitoring, control adjustment, and alarm notification via Internet connections. These features are a vital tool for avoiding loss of data during plant production and for maintaining control of environmental experiments.
Installation of remote sensing is recommended for new growth facilities of all types. In addition to the control and logging systems in place at the growth facilities, environmental growth conditions can be monitored by placing portable data loggers (e.g., the HOBO® U14) in growth areas. These can serve as a complementary, backup, or sole means of recording environmental data, displaying and recording temperature and relative humidity in greenhouses, growth chambers, growth rooms, cold rooms, dry rooms, and laboratories. Data loggers offer reliable, accurate, and convenient monitoring and documentation of specific environmental conditions, and they can be connected to a computer to quickly display and analyze data.
3.3 Seed Handling
3.3.1 Plant Isolation, Harvesting, and Preparation for Storage
Preventing cross-contamination among adjacent pots and avoiding the loss of seeds through shattering are equally important. Plants must be isolated from their neighbors without compromising seed quality. Various methods and devices exist to accomplish these objectives, including Aracons™, plastic floral sleeves, plastic bags, and isolation by space on the open bench. Each method is described below:
1. Aracons™: Place Aracons™ over single plants soon after bolting.
2. Floral sleeves: Cut four equally spaced holes at the point where the sleeve meets the top of the pot. This increases aeration and reduces water condensation, which may encourage mold growth. Place the sleeve on the pot near the time of bolting, so that all inflorescences of the plant are kept within the sleeve (see Note 26). This method is very effective for achieving high plant densities while maintaining the productivity and purity of single lines of different genetic backgrounds.
3. Plastic bags: If plastic bags are used, train the inflorescences of non-erecta lines into a 4–8-L transparent plastic bag before the siliques begin to brown. Keep the bags open to avoid the accumulation of moisture from transpiration.
4. Open bench growth: For bulk seed production, plants can be maintained on the open bench, keeping all lines separated by adequate space. Avoid disturbing maturing inflorescences. This method is appropriate for late-flowering natural accessions that develop large, dense canopies (e.g., Sij-1, Monte-1, Amel-1, Anholt-1, Appt-1, Bik-1, Bl-1, Do-0).
The simplest procedure is to wait until the entire inflorescence has browned before harvesting; however, some siliques may shatter naturally, and their seed will be lost. Harvest seeds only after the soil in the pots or flats has been allowed to dry. Delaying harvest after the plant has reached physiological maturity results in seed deterioration, especially under nonoptimal environmental conditions. If rapid turnover is required, seeds from individual siliques can be harvested once the fruits have turned completely yellow; however, such seeds have high levels of germination inhibitors. Because siliques form and mature over time, early siliques can be harvested before later ones mature. Harvesting for each of the four isolation methods is performed as follows:
1. Aracons™: Slide the plastic cylinder off, then cut off the dry inflorescence above the cone device and place it in a threshing sieve.
2. Floral sleeves: While holding the pot, cut away and discard the plastic sleeve. Cut the dry inflorescences and place them in a threshing sieve.
3. Plastic bags: Cut the entire plant off at its base. Shake the seeds into the bag; inflorescences can be gently hand-pressed from the outside so that the seeds fall to the bottom of the bag. Remove most of the dry inflorescences from the bag by hand before sieving the seeds to separate them from chaff.
4. Open bench: Cut off the entire inflorescence at its base, and carefully place it into a 4–8-L or larger transparent plastic bag, depending on the amount of plant material.
The major factors influencing seed longevity are (1) genotype; (2) environmental conditions during seed maturation, harvesting, and seed handling; and (3) seed storage conditions. Harvested seeds should be processed promptly (threshing, cleaning, drying, and packaging) and placed into storage. Seeds should be threshed when their moisture content is approximately 10 %, to minimize seed damage during threshing; this moisture content is reached when all plant material appears dry. Hand threshing rather than machine threshing is recommended, mainly because threshing machines need rigorous cleaning between lines to avoid cross-contamination of samples, require very careful adjustment, and do not accommodate the variable size of Arabidopsis seeds well. Hand threshing is performed as follows:
1. Spread a large sheet of clean white paper on a bench or table to collect the threshed seeds.
2. Place a clean threshing sieve on top of the paper.
3. Place the dry plants directly onto the sieve. If plants are larger than the sieve, cut them into pieces that fit the screen.
4. Crush the plants by hand to release all the seeds from the siliques. Discard the plant material.
5. Sieve the seeds through the mesh repeatedly to remove chaff. After sieving, the seeds are still likely to be mixed with soil and plant residue; a combination of additional sieving, gentle blowing, and visual inspection can be used to clean them completely.
6. If needed, clean small samples by hand with the aid of a pointed tool on an opaque glass plate illuminated from below.
7. Place cleaned seed samples in small labeled manila envelopes or open glass jars to allow the seeds to air-dry. Do not use plastic containers, because of static effects.
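Both the threshing guideline (approximately 10 %) and the storage target of 5–6 % given in the protocol are expressed as seed moisture content, so it can help to compute this value from a sample's weight before and after drying. The sketch below assumes the common convention of expressing moisture on a fresh-weight (wet) basis; how the dry weight is obtained (for example, by one of the methods cited as ref. 11) is left open. The function names and status labels are hypothetical.

```python
# Illustrative sketch (hypothetical names): seed moisture content on a fresh-weight basis.

STORAGE_RANGE_PERCENT = (5.0, 6.0)  # ideal storage moisture content stated in the protocol


def moisture_content_percent(fresh_weight_g: float, dry_weight_g: float) -> float:
    """Moisture as a percentage of the fresh (undried) sample weight."""
    if fresh_weight_g <= 0 or not 0 <= dry_weight_g <= fresh_weight_g:
        raise ValueError("require fresh_weight > 0 and 0 <= dry_weight <= fresh_weight")
    return (fresh_weight_g - dry_weight_g) / fresh_weight_g * 100.0


def storage_status(moisture_percent: float) -> str:
    """Compare a moisture value with the 5-6 % storage target."""
    low, high = STORAGE_RANGE_PERCENT
    if moisture_percent > high:
        return "too moist: continue air-drying"
    if moisture_percent < low:
        return "below target range"
    return "within storage target"


if __name__ == "__main__":
    # Hypothetical sample: 1.000 g before drying, 0.945 g after drying.
    mc = moisture_content_percent(fresh_weight_g=1.000, dry_weight_g=0.945)
    print(f"{mc:.1f} % -> {storage_status(mc)}")
```

For the hypothetical sample in the example, the script reports 5.5 % moisture, which falls within the storage target; a sample near 10 % would still be suitable for threshing but would need further air-drying before packaging.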
The ideal moisture content of seeds for storage is 5–6 %; higher moisture content can cause seed deterioration. Many methods are available for drying seeds. The recommended method is to air-dry the seeds at room temperature and approximately 20 % relative humidity for 1–3 weeks (see Note 27); low relative humidity (20–30 %) is necessary for seeds to reach the desired moisture content [9, 10]. Seed moisture content can be determined by several methods [11]. Seeds can be packaged for storage as follows:
1. Use cryovials (with threaded lids and gaskets) for convenient and safe storage. They hold large numbers of seeds, seal tightly, are moisture-proof, and can be resealed many times.
2. Label each vial with pertinent information, including the date of storage.
3. Determine the quantity of seed stored (approximately 50 μL ≈ 25 mg ≈ 1,250 seeds).
3.3.2 Seed Storage and Preservation
The general conditions for preserving optimal seed viability have been well defined [9, 10, 12–14]. Seed storage principles for Arabidopsis are similar to those for other plants, with the caveat that the small seeds rehydrate very rapidly if exposed to high humidity. As seeds deteriorate, they lose vigor and eventually the ability to germinate. The rate of this “aging” is determined by the interaction between the temperature and the moisture content at which seeds are stored, and by unknown cellular factors that affect the propensity for damaging reactions [9].
Rapid deterioration of seeds has not been observed for the diverse collections currently maintained at ABRC; however, experience regarding the effect of genotype is limited. A large number of genes involved in embryogenesis, reserve accumulation, and seed maturation have been identified. Notably, seeds of the abscisic acid-insensitive mutants fail to degrade chlorophyll during maturation and show no dormancy, leading to low desiccation tolerance and poor longevity [15].
Under proper conditions, Arabidopsis seeds should retain high viability for long storage periods. The life span of seeds decreases as storage temperature and seed moisture content increase. Seeds left at room temperature and ambient relative humidity lose viability within approximately 2 years, whereas seeds stored dry at 4 °C or −20 °C should remain viable for decades. Three storage options for safe seed preservation are:
1. For active collections that are accessed often, store seeds at 4 °C and 20–30 % relative humidity. Humidity is typically controlled by a dehumidification system in the cold room. Note that controlling relative humidity provides a safety margin in case seed containers are not sealed properly.
2. For long-term or archival storage, the recommended temperature is below 0 °C, preferably −20 °C, ideally at 20 % relative humidity.
3. Seeds in open containers such as envelopes can be stored at 15–16 °C with relative humidity maintained very carefully at 15 %. In this controlled environment, seeds maintain a suitably low moisture content [16]. Storing seeds at relative humidity below 15 % will not increase shelf life and may actually accelerate deterioration [10].
When vials are removed from cold storage, moisture may condense on the seeds and damage them. Sealed vials stored at 4 °C must always be warmed to room temperature before opening. For vials stored at −20 °C, rapid rewarming (placing the sealed vial in a 37 °C water bath for 10 min) is a recognized method to minimize frost damage. Whenever possible, handle seed stocks at low (20–30 %) relative humidity. If condensation is suspected, leave the vials open in the dry room until the seeds have equilibrated before returning them to cold storage.
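When portioning seeds into vials for the storage options above, or withdrawing a set number of seeds for a germination test or a liquid culture, the approximate equivalence given in Subheading 3.3.1 (50 μL ≈ 25 mg ≈ 1,250 seeds) can be turned into simple conversions. The sketch below assumes that this ratio holds linearly; actual seed size varies among accessions and mutants, so the results are rough estimates only. The function names are hypothetical.

```python
# Illustrative sketch (hypothetical names): convert between seed volume, mass, and count
# using the approximate equivalence 50 uL ~ 25 mg ~ 1,250 seeds (Subheading 3.3.1).

REF_VOLUME_UL = 50.0
REF_MASS_MG = 25.0
REF_SEEDS = 1250.0

SEEDS_PER_UL = REF_SEEDS / REF_VOLUME_UL  # 25 seeds per uL
SEEDS_PER_MG = REF_SEEDS / REF_MASS_MG    # 50 seeds per mg (about 20 ug per seed)


def seeds_from_volume(volume_ul: float) -> float:
    """Approximate seed count in a given volume of dry seed."""
    return volume_ul * SEEDS_PER_UL


def seeds_from_mass(mass_mg: float) -> float:
    """Approximate seed count in a given mass of dry seed."""
    return mass_mg * SEEDS_PER_MG


def volume_for_seeds(n_seeds: float) -> float:
    """Approximate volume (uL) occupied by n_seeds."""
    return n_seeds / SEEDS_PER_UL


def mass_for_seeds(n_seeds: float) -> float:
    """Approximate mass (mg) of n_seeds."""
    return n_seeds / SEEDS_PER_MG


if __name__ == "__main__":
    # 10 uL of seed per flask (Subheading 3.1.2).
    print(round(seeds_from_volume(10)))
    # 100 seeds for a germination test (Subheading 3.3.3): volume in uL and mass in mg.
    print(volume_for_seeds(100), mass_for_seeds(100))
```

Applied to the liquid-culture step, 10 μL converts to about 250 seeds, which agrees with the figure given in Subheading 3.1.2; the 100 seeds used in a germination test correspond to roughly 4 μL or 2 mg.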
3.3.3 Seed Quality Control The purity and physical integrity of seeds and the presence of pests and seed-borne diseases (especially some fungal diseases) can be detected by visual examination with the naked eye, magnifying lenses, or using a dissecting microscope. For a rigorous assessment, spread the seeds on white paper under a well-lit microscope. Generally, gray or white coloration on the seed surface indicates fungal contamination. Discard seeds if possible; otherwise, sterilize seeds with fungicides before planting. Do not discard shriveled, Handling Arabidopsis Plants 17 small, irregular-shaped, and other colored seeds that might correspond to specific mutations, assuming that the seeds were produced under optimal conditions. Seed viability should be monitored at regular intervals by conducting germination tests under a standard set of conditions. It is recommended that seeds in long-term storage under the optimal preservation standards should be monitored at least every 10 years. Seeds in short-term storage should be monitored at least every 5 years [12, 14]. A germination test for Arabidopsis can be conducted in 3–7 days to determine the proportion of seeds in a sample that will produce normal seedlings. Tests should be carried out before seeds are stored, so that poor quality samples can be recognized. Arabidopsis seeds may fail to germinate because they are dormant or because they are defective or nonviable. Dormant seeds can be distinguished because they remain firm and in good condition, while nonviable seeds soften and are attacked by fungi. Extending stratification can usually break dormancy (see Note 4). Initial germination rate should exceed 80 %, but may be lower for some lines. Mutations in a significant number of genes, mostly involved in biosynthesis and signaling pathways of certain hormones, affect seed germination and/or dormancy. A germination test can be performed as follows: 1. 
Label the bottom of a 10-cm-diameter Petri plate with the name and date.
2. Place two layers of filter paper in the bottom of the plate and moisten with distilled water. Remove excess water.
3. Distribute 100 seeds evenly on the surface of the paper. Seal the plate with Parafilm or clear tape to prevent drying.
4. Stratify the seeds by placing the plates at 4 °C for 3 days.
5. Move the plates to an illuminated shelf or to a growth chamber under standard light and temperature conditions (see Note 28).
6. After 3–7 days, record the germination percentage by dividing the number of seedlings by the total number of seeds and multiplying by 100.
Germination tests can also be performed on solid media, such as MS, described in Subheading 3.1.1.
3.4 Genetic Crosses
Some Arabidopsis species, particularly A. thaliana, are mostly self-pollinating, especially in a growth chamber or greenhouse where insect populations are minimized [17]. Note that Arabidopsis pollen is not dispersed by air. Therefore, Arabidopsis is crossed mainly by manual emasculation of flowers just before they open, followed by hand transfer of pollen from the desired male parent to the stigma of the emasculated flower. Although labor intensive, the manual method remains a reliable technique for achieving cross-pollination. Other species, such as Arabidopsis halleri and Arabidopsis lyrata, have natural self-incompatibility mechanisms that prevent self-pollination and result in obligate outcrossing [18]. For such species, a genetic stock cannot easily be maintained from a single plant, and it is most convenient to start with a small population of founders and perform cross-pollination. The manual techniques for performing genetic crosses of A. thaliana can be generalized to these related species.
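The calculation in step 6 of the germination test (Subheading 3.3.3), together with the two-replicate design and the 80 % threshold mentioned in Note 28, can be sketched in Python as follows. The function names are hypothetical, and averaging the replicates before comparing with the threshold is an assumption of this sketch rather than a rule stated in the text.

```python
def germination_percentage(seedlings: int, seeds_sown: int) -> float:
    """Step 6: number of seedlings divided by seeds sown, multiplied by 100."""
    if seeds_sown <= 0:
        raise ValueError("seeds_sown must be positive")
    if not 0 <= seedlings <= seeds_sown:
        raise ValueError("seedlings must be between 0 and seeds_sown")
    # Multiply before dividing so whole-number percentages stay exact.
    return 100 * seedlings / seeds_sown


def summarize_replicates(
    replicates: list[tuple[int, int]], threshold: float = 80.0
) -> tuple[float, bool]:
    """Return mean germination over (seedlings, seeds_sown) replicates and a follow-up flag.

    The flag is True when the mean falls below the threshold (80 % in Note 28).
    """
    if not replicates:
        raise ValueError("need at least one replicate")
    percents = [germination_percentage(s, n) for s, n in replicates]
    mean = sum(percents) / len(percents)
    return mean, mean < threshold


if __name__ == "__main__":
    # Two hypothetical replicates of 100 seeds each (Note 28 suggests two).
    mean, follow_up = summarize_replicates([(92, 100), (88, 100)])
    print(f"{mean:.1f} % germination; follow-up needed: {follow_up}")
```

With the two hypothetical replicates shown, the script prints a mean of 90.0 % and no follow-up flag.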
The use of a magnifying visor or dissecting microscope is recommended to visualize floral parts and avoid damaging the pistil. Genetic crosses can be performed as follows:
1. Select the appropriate parent plants. Choose young plants at early stages of flowering. Avoid using the first flowers in the inflorescence, which are usually less fertile, and the smaller flowers produced by mature plants [19] (see Note 29).
2. Prepare the female parent:
(a) Select a stem with two to three flower buds in which the tips of the petals are barely visible, before the anthers begin to deposit pollen on the stigma (see Note 30).
(b) With a small pair of scissors, remove siliques, leaves, and any open flowers above and below the selected buds on the chosen stem; avoid damaging the stem.
(c) Using the precision clamping tweezers, remove the sepals, petals, and all six stamens from the selected flower buds, leaving the pistil intact (see Note 31).
3. Prepare the male parent: select a newly opened flower with dehiscent anthers. Such flowers contain fresh pollen, which contributes to the success of the cross. Remove the flower by squeezing near the pedicel with tweezers.
4. Pollinate the female parent by brushing the anthers of the fully open flower taken from the male parent over the bare stigma of the female parent. Visually confirm that pollen has been deposited on the stigma.
5. Label the cross by placing tape on the stem of the female plant, noting the male and female parents and the date of the cross.
6. Inspect the developing siliques over the next several days. Successful crosses become visible after 3 days, when the siliques start elongating. Siliques are ready for harvest once they turn brown, but before they shatter (see Note 32).
7. Harvest the siliques by cutting them with scissors and placing them in a microcentrifuge tube or a small paper envelope.
8. Air-dry the seeds at room temperature, preferably at 20–30 % relative humidity, for 1–3 weeks.
Thresh the seeds if necessary.
3.5 Floral Dip Transformation of Arabidopsis with Agrobacterium tumefaciens
The development of simple and highly efficient stable transformation protocols that do not require plant regeneration in tissue culture was one of the milestones that enabled Arabidopsis to become the model it is today. Transformation of germinating seeds with Agrobacterium tumefaciens was the first breakthrough in this effort [20]. It was followed by a “vacuum infiltration” method, in which Agrobacterium was used to infect uprooted flowering plants [21]. This protocol was simplified and streamlined a few years later and became known as the “floral dip” method [22]. In this method, vacuum infiltration is replaced by the use of Silwet L-77®, a surfactant that aids the entry of bacteria into plant tissues. This protocol revolutionized Arabidopsis functional genomics by enabling high-throughput generation of T-DNA mutants and other resources in which the mutations and other modifications caused by transformation events are stably inherited. Although other methods are still used in specific cases (e.g., transformation of root explants for sterile mutants [23] or vacuum infiltration for Ler-0 [21]), floral dip has become the most widely used protocol in most research labs and for most natural accessions (e.g., Col-0, Ws-0, Nd-0, No-0). It is described here in detail:
1. Grow plants in pots as described in Subheading 3.2 under long-day conditions until bolting (see Note 33).
2. Remove the first inflorescence stems that bolt to induce the growth of secondary inflorescence shoots. Plants will be ready for dipping in 5–7 days.
3. Prepare a starter culture of Agrobacterium carrying the construct of interest by growing 5 mL of culture in LB medium supplemented with the appropriate antibiotics at 28 °C for 2 days (see Note 34).
The culture should be started 2–4 days after the first inflorescences have been removed (step 2).
4. Use 1 mL of the starter culture to inoculate 200 mL of LB medium supplemented with the appropriate antibiotics, and grow this large culture for 16–24 h, until the cells reach stationary phase (see Notes 35 and 36).
5. Pellet the Agrobacterium culture at 4,000 × g for 10 min at room temperature and resuspend the pellet in 1–2 volumes (200–400 mL) of 5 % sucrose solution (see Note 37).
6. Immediately before dipping, add Silwet L-77® to the Agrobacterium cell suspension to a final concentration of 0.02–0.05 %, and pour the suspension into a beaker.
7. Prepare the plants for dipping by removing any siliques that have already formed (see Note 38).
8. Dip the inflorescences for a few seconds: hold the pot with one hand and gently bend the inflorescence shoots so that they are completely submerged in the suspension, until a film of suspension can be observed on the plants.
9. Place a moistened paper towel at the bottom of a tray. Lay the pots on their sides in the tray and cover with a lid (see Note 39).
10. After 1 day, return the pots to their normal upright position and continue growing the plants until they set seed.
11. Screen the primary transformants on appropriate selection plates.
4 Notes
1. Optional sucrose and vitamins should be added after autoclaving, once the agar medium has cooled, because vitamins are thermolabile and 15–25 % of the sucrose may be hydrolyzed to glucose and fructose at elevated temperatures [24].
2. Plants grow more vigorously and quickly on media containing 1–2 % sucrose; however, fungal and bacterial contamination must then be rigorously avoided by seed sterilization. Note that germination of some mutants might be delayed on sucrose-containing media.
3.
Covered plates, boxes, or tubes with solidified agar can be stored for several weeks at 4 °C in a container that prevents desiccation.
4. Most widely used lines have moderate dormancy, and cold treatment (also called stratification) may not be required for germination when planting older seeds of these lines. However, a cold treatment at 4 °C for 3 days will improve the rate and synchrony of germination. An extended cold treatment of approximately 7 days is especially important for freshly harvested seeds, which have more pronounced dormancy. An extended cold treatment is also necessary for certain natural accessions (e.g., Dobra-1, Don-0, Altai-5, Anz-0, Cen-0, WestKar-4). Cold treatment of dry seeds is usually not effective in breaking dormancy.
5. Instead of being stratified on plates, seeds suspended in sterile water can also be stratified before planting on agar or on the soil surface.
6. The optimum light intensity is in the range of 120–150 μmol m⁻² s⁻¹. Higher intensities may kill some seedlings but are tolerated by older plants; purpling of leaves is the first symptom of high-light stress. Very low light intensities may result in weak, chlorotic plants. Arabidopsis is a facultative long-day plant. Plants flower rapidly under continuous light or long-day (>12 h) photoperiods, whereas under short days (<12 h) flowering is delayed, favoring vegetative growth. Plants grow well under a cycle of 16 h light/8 h dark or under continuous light.
7. Various light sources can be used for optimal plant growth, such as cool-white fluorescent bulbs, incandescent bulbs, very high-output (VHO) lamps, high-intensity discharge (HID) lamps, and shaded sunlight. Cool-white fluorescent bulbs supplemented with incandescent lighting are recommended in growth chambers or growth rooms. HID lamps of 400–1,000 W are commonly used in greenhouses in temperate climates to supplement sunlight or prolong the natural photoperiod.
8.
The temperature range for Arabidopsis growth is 16–25 °C. Lower temperatures are permissible, but higher temperatures are not recommended, especially from germination through early rosette development. Temperatures above 28 °C are better tolerated by more mature plants (past the early rosette stage). In general, high temperatures result in fewer leaves, flowers, and seeds. At lower temperatures, growth is slow, favoring the vegetative phase, and flowering is delayed.
9. Thiamine and 2,4-D stock solutions must be added to the media in a laminar flow hood to prevent contamination of the stock solutions.
10. Some investigators prefer disposable 250-mL polycarbonate flasks with membrane vented caps, which may provide better aeration and result in better cell growth.
11. Mix the culture well immediately before pipetting, since the cells settle to the bottom of the flask shortly after orbital shaking has stopped.
12. Cell culture density is an important factor for viability. Too high and too low density can cause cell death and the cessation of cell division, respectively. Adjust the volume of subcultured cells if necessary. If larger clumps of cells form, pass the suspension through a sterile 1-mm stainless steel sieve.
13. T87 cell culture can also be propagated and maintained as callus on solid NT-1 medium (on plates or in Magenta® boxes to avoid premature depletion of nutrients). Callus is subcultured once a month by transferring a 1-mm piece to fresh medium. Note that after continued passage, callus growth becomes independent of cytokinin [25].
14. Square pots approximately 5.5 cm wide can be used to grow one plant, 11-cm-diameter pots are suitable for growing up to 60 plants, and rectangular flats of 26 cm × 53 cm can accommodate as many as 200–600 plants grown to maturity. Another option, especially suitable for genomic studies, is 96-well inserts.
Higher densities, approximately 3,000 plants per 30 cm², can be used if plants are harvested at early stages.
15. Osmocote® 14-14-14 (14 % nitrogen, 14 % phosphate, 14 % potassium) is an extended time-release fertilizer that feeds for up to 3 months after planting. Apply in amounts according to the label. Alternatively, nutrient solution can be used to wet the soil [26].
16. Always use clean growth supplies, especially new pots and trays, to avoid pest contamination.
17. Prepared pots can be stored in covered trays at 4 °C for several days before planting, although pot preparation and planting should be done on the same day if possible.
18. Various methods can be used to plant seeds. Plant density varies with the genetic circumstances and purpose of the planting. High yields are achieved with 10–20 plants per 11-cm-diameter pot. Generally, low densities increase the yield per plant and are suitable for pure lines. High densities reduce the yield per plant but are useful when it is necessary to maintain the genetic representation of segregating populations.
19. The plastic wrap should not be allowed to contact the soil surface and should be perforated to provide aeration. If clear plastic domes are used, they should not be tightly sealed.
20. Some winter-annual natural accessions (e.g., Galdo-1, Monte-1, Cit-0, Dog-4, Istisu-1, Valsi-1, Mir-0, Tamm-2) require a period of cold to initiate flowering, a process known as vernalization. Young rosettes (2–4 weeks old) of late-flowering accessions should be placed at 4 °C for 4–7 weeks to accelerate flowering.
21. Plants should not be overwatered, to avoid the development of algae, fungi, fungus gnat larvae, and other pests that thrive on overly wet soil. Algae can be manually scraped off and the soil allowed to dry.
22. Preventive application of pesticides, where local regulations allow it, is very effective and can avert heavy use of chemicals after infestations have developed. Rotation of pesticides is recommended.
Biological control agents can also be applied.
23. Marathon® 1G is effective as a preventive insecticide or as a treatment following infestation. It can be applied to the soil surface or included in a subirrigation watering regime, which reduces damage to the plants.
24. Mature or diseased plants, plant debris, used soil, pots, and other materials can harbor pathogen spores or insects from former plantings. After removal of the pests and host materials and sterilization of the growth area, it is very unlikely that any pest or pathogen will survive.
25. Read and follow the precautionary measures suggested by the manufacturer of the cold sterilant Spor-Klenz®.
26. Floral sleeves fit snugly around a pot, extend upward, and are wider at the top, allowing for expansion of the developing inflorescences. Sleeves made of biaxially oriented polypropylene (BOPP) are very clear, stay upright, and tear easily for harvesting. Fold down the tops of the sleeves by about 2 cm to ensure that they stay open and stable. If plants grow above the sleeves and are at high density, train the tops of the plants back down into the sleeves to avoid contamination.
27. The moisture content of Arabidopsis seeds stored in open containers corresponds to the room humidity. Arabidopsis seeds behave similarly to crop seeds of similar chemical composition [12, 13].
28. Environmental conditions for seed germination tests are the same as for growing plants. Two replicates of 100 seeds each provide reliable germination estimates. Observed germination below 80 % may warrant follow-up testing.
29. Crosses may be performed throughout the flowering period; however, crosses have a higher success rate during the earlier stages of flowering.
30. Using unopened flowers for the female parent is important to avoid self-pollination.
Shortly after this stage, the stamen/pistil length ratio and the timing of anther dehiscence favor self-pollination, and open flowers have most likely been self-pollinated. All candidate flowers for use as female parents should be examined for released pollen before crossing.
31. If the pistil is damaged, the cross is highly unlikely to succeed, and the flower should not be used.
32. Siliques should be ready to harvest about 2–3 weeks after the cross. Take care when siliques are brown, as it is easy to lose all the seeds at this stage.
33. Plants can be grown either individually in 5-cm round pots or at densities of up to 10–15 plants per 11-cm square pot [27].
34. LB medium can be replaced with Yeast Extract Peptone (YEP) medium to achieve higher Agrobacterium density [27].
35. The presence of the construct should be confirmed in the starter culture (e.g., by PCR).
36. The rest of the starter culture can be stored at 4 °C for up to 1 month for future use [28].
37. One to two volumes of the sucrose solution used to resuspend the pellet correspond to one to two times the original volume of the large Agrobacterium culture. The final OD600 of the suspension for dipping should be approximately 0.8.
38. Removing siliques will increase transformation efficiency.
39. Some investigators prefer to use another tray in place of a lid, to avoid exposing the plants to light and producing excessive heat around them.
References
1. Knee E, Rivero L, Crist D, Grotewold E, Scholl R (2011) Germplasm and molecular resources. In: Schmidt R, Bancroft I (eds) Plant genetics and genomics: crops and models, vol 9, Genetics and genomics of the Brassicaceae. Springer, New York, pp 437–467
2. Murashige T, Skoog F (1962) A revised medium for rapid growth and bio assays with tobacco tissue cultures. Physiol Plant 15:473–497
3.
Axelos M, Curie C, Mazzolini L, Bardet C, Lescure B (1992) A protocol for transient gene expression in Arabidopsis thaliana protoplasts isolated from cell suspension cultures. Plant Physiol Biochem 30:123–128
4. Yamada H, Koizumi N, Nakamichi N, Kiba T, Yamashino T, Mizuno T (2004) Rapid response of Arabidopsis T87 cultured cells to cytokinin through His-to-Asp phosphorelay signal transduction. Biosci Biotechnol Biochem 68:1966–1976
5. Nakamichi N, Matsushika A, Yamashino T, Mizuno T (2003) Cell autonomous circadian waves of the APRR1/TOC1 quintet in an established cell line of Arabidopsis thaliana. Plant Cell Physiol 44:360–365
6. Alonso AP, Piasecki RJ, Wang Y, LaClair RW, Shachar-Hill Y (2010) Quantifying the labeling and the levels of plant cell wall precursors using ion chromatography tandem mass spectrometry. Plant Physiol 153:915–924
7. Ogawa Y, Dansako T, Yano K, Sakurai N, Suzuki H, Aoki K, Noji M, Saito K, Shibata D (2008) Efficient and high-throughput vector construction and Agrobacterium-mediated transformation of Arabidopsis thaliana suspension-cultured cells for functional genomics. Plant Cell Physiol 49:242–250
8. Menges M, Murray JA (2002) Synchronous Arabidopsis suspension cultures for analysis of cell-cycle gene activity. Plant J 30:203–212
9. Walters C (1998) Understanding the mechanisms and kinetics of seed aging. Seed Sci Res 8:223–244
10. Walters C (1998) Ultra-dry seed storage. Seed Sci Res 8:1–73
11. Rivero-Lepinckas L, Crist D, Scholl R (2006) Growth of plants and preservation of seeds. In: Salinas J, Sanchez-Serrano JJ (eds) Methods in molecular biology, vol 323, Arabidopsis protocols. Humana, Totowa, NJ, pp 3–12
12. Rao NK, Hanson J, Dulloo ME, Ghosh K, Nowell D, Larinde M (2006) Manual of seed handling in genebanks. Bioversity International, Rome
13. Hong TD, Ellis RH (1996) A protocol to determine seed storage behavior. IPGRI, Rome
14. FAO and IPGRI (1994) Genebank standards. FAO, IPGRI, Rome, pp 1–8
15.
Ooms J, Leon-Kloosterziel KM, Bartels D, Koornneef M, Karssen CM (1993) Acquisition of desiccation tolerance and longevity in seeds of Arabidopsis thaliana (a comparative study using abscisic acid-insensitive abi3 mutants). Plant Physiol 102:1185–1191
16. Hay FR, Mead A, Manger K, Wilson FJ (2003) One-step analysis of seed storage data and the longevity of Arabidopsis thaliana seeds. J Exp Bot 54:993–1011
17. Koornneef M (1994) Arabidopsis genetics. In: Meyerowitz E, Somerville C (eds) Arabidopsis. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, pp 89–120
18. Nasrallah J (2011) Self-incompatibility in the Brassicaceae. In: Schmidt R, Bancroft I (eds) Plant genetics and genomics: crops and models, vol 9, Genetics and genomics of the Brassicaceae. Springer, New York, pp 389–411
19. Weigel D, Glazebrook J (2002) Genetic analysis of mutants. In: Arabidopsis: a laboratory manual. Cold Spring Harbor Laboratory Press, New York, pp 41–53
20. Feldmann KA, Marks MD (1987) Agrobacterium-mediated transformation of germinating seeds of Arabidopsis thaliana: a non-tissue culture approach. Mol Gen Genet 208:1–9
21. Bechtold N, Ellis J, Pelletier G (1993) In planta Agrobacterium-mediated gene transfer by infiltration of adult Arabidopsis thaliana plants. C R Acad Sci III 316:1194–1199
22. Clough SJ, Bent AF (1998) Floral dip: a simplified method for Agrobacterium-mediated transformation of Arabidopsis thaliana. Plant J 16:735–743
23. Valvekens D, Van Montagu M, Van Lijsebettens M (1988) Agrobacterium tumefaciens-mediated transformation of Arabidopsis thaliana root explants by using kanamycin selection. Proc Natl Acad Sci U S A 85:5536–5540
24. Schenk N, Hsiao K-C, Bornman CH (1991) Avoidance of precipitation and carbohydrate breakdown in autoclaved plant tissue culture media. Plant Cell Rep 10:115–119
25. Pischke MS, Huttlin EL, Hegeman AD, Sussman MR (2006) A transcriptome-based characterization of habituation in plant tissue culture.
Plant Physiol 140:1255–1278
26. Estelle MA, Somerville C (1987) Auxin-resistant mutants of Arabidopsis thaliana with altered morphology. Mol Gen Genet 206:200–206
27. Weigel D, Glazebrook J (2002) Arabidopsis: a laboratory manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY
28. Zhang X, Henriques R, Lin SS, Niu QW, Chua NH (2006) Agrobacterium-mediated transformation of Arabidopsis thaliana using the floral dip method. Nat Protoc 1:641–646
Chapter 2
Using Arabidopsis-Related Model Species (ARMS): Growth, Genetic Transformation, and Comparative Genomics
Giorgia Batelli, Dong-Ha Oh, Matilde Paino D’Urzo, Francesco Orsini, Maheshi Dassanayake, Jian-Kang Zhu, Hans J. Bohnert, Ray A. Bressan, and Albino Maggio
Abstract
The Arabidopsis-related model species (ARMS) Thellungiella salsuginea and Thellungiella parvula have generated broad interest in salt stress research. While the general growth characteristics of these species are similar to those of Arabidopsis, some aspects of their life cycle require particular attention in order to obtain healthy plants with a large seed production in a relatively short time. This chapter describes basic procedures for growth, maintenance, and Agrobacterium-mediated transformation of ARMS. Where appropriate, differences in requirements between Thellungiella spp. and Arabidopsis are highlighted, along with the basic growth requirements of other, less studied candidate model species. Current techniques for comparative genomics analysis between Arabidopsis and ARMS are also described in detail.
Key words Thellungiella spp., Halophytes, Germination, Seed handling, Vernalization, Plant care
1 Introduction
Over the past few decades, tremendous advances in our understanding of molecular and cellular responses to abiotic stresses have been made using the model species Arabidopsis.
Forward and reverse genetics approaches, combined with thorough functional analysis of many isolated genes and biochemical characterization of key stress tolerance proteins, have allowed many responses to salt and osmotic stresses to be characterized quite accurately (reviewed in refs. 1–3). Although Arabidopsis has contributed to unraveling complex essential mechanisms that allow plants to cope with salt stress [4], it has failed to reveal the key determinants that, in natural environments, make some plants (halophytes) more tolerant than others (glycophytes) of saline conditions. Halophytes from different families have been studied over the past decades, including, for example, species belonging to the genera Atriplex, Suaeda, Salicornia, and Mesembryanthemum; monocotyledonous species such as Spartina and Puccinellia spp.; and mangroves belonging to the genera Avicennia and Rhizophora ([5], reviewed in ref. 6). The study of halophytic species has led to a partial understanding of the different physiological and morphological strategies that plants use to withstand harsh conditions. However, the paucity of suitable molecular genetics techniques has, to a great extent, prevented the identification of the genetic bases of salt tolerance in halophytes [6]. Genetic studies on halophytic species are very limited [7], and the potential of this resource of natural salt tolerance has remained largely unexplored [6, 8–11].
Jose J. Sanchez-Serrano and Julio Salinas (eds.), Arabidopsis Protocols, Methods in Molecular Biology, vol. 1062, DOI 10.1007/978-1-62703-580-4_2, © Springer Science+Business Media New York 2014
Recently, Thellungiella salsuginea (salt cress), previously referred to as Thellungiella halophila, and its close relative Thellungiella parvula [12, 13] have been proposed as model systems for the study of halophytic traits [14–17]. Compared to other halophytes, Thellungiella spp.
exhibit lower levels of tolerance but may still be considered true halophytes [6]. Their relatively short life cycle and other traits important for efficient experimentation, together with their close relatedness to Arabidopsis (T. salsuginea shares an average sequence identity of 92 % with Arabidopsis thaliana), have made them the preferred extremophile plant model systems [11, 14, 15, 18]. Since T. salsuginea was first introduced as a model system [14, 15, 18], remarkable progress has been made in elucidating the morphological, physiological, and molecular traits that differentiate this species from its close relative Arabidopsis [11]. Such distinctive traits include more succulent and waxy leaves [15, 19, 20], extra layers of leaf palisade cells and of root endodermis and cortex compared to Arabidopsis [19], a higher content of compatible osmolytes under both control and salt stress conditions [11, 19, 21], and a greater capability to restrict Na+ influx into the roots [11, 22, 23]. Additional distinctive mechanisms of protection from excess salt in Thellungiella may include more efficient regulation of Na+ fluxes at both the plasma membrane [24, 25] and the tonoplast [26]. These features indicate that Thellungiella is preadapted, and therefore “more prepared”, to tailor its response to salt stress efficiently. The availability of these resources, coupled with the feasibility of forward and reverse genetics studies in Thellungiella spp., which can be compared with their close genetic relative Arabidopsis thaliana, has certainly opened new avenues towards a better understanding of the fundamental mechanisms of plant salt tolerance.
1.1 Growth and Maintenance of Thellungiella spp.
T. salsuginea and T. parvula are very similar to Arabidopsis in terms of growth and maintenance requirements, and they can easily be grown in growth chambers and greenhouses. Compared to Arabidopsis, however, the life cycle of Thellungiella spp.
is longer and, for T. salsuginea, a long vernalization period is required for flowering. Seed maturation in these species is more asynchronous than in Arabidopsis; therefore, extra care should be taken in experimental procedures such as the recovery of transformants, which may not be included in the initial wave of germination. Subheading 3.1 describes procedures for the growth and maintenance of Thellungiella spp. that are critical for obtaining healthy plants and high-quality seeds. Less studied halophytic species can also be considered valuable model systems [16]. Their basic growth requirements are briefly presented in Subheading 3.2.
1.2 Genetic Transformation Using Agrobacterium
Like A. thaliana, Thellungiella spp. can be efficiently transformed via Agrobacterium-mediated T-DNA transfer using the simple, straightforward floral dip method [27] or, similarly, by spraying flowers with an Agrobacterium suspension ([14, 15, 19], Paino D’Urzo and Bressan, unpublished). However, because flowering in Thellungiella spp. is prolonged and asynchronous, several repeated rounds of transformation are required to ensure a high percentage of transformants. Subheading 3.3 of this chapter describes a method for large-scale Agrobacterium-mediated transformation to generate collections of T-DNA insertional mutants.
1.3 Comparative Genomics of ARMS
Since the A. thaliana genome sequence was completed in 2000 [28], vast amounts of genetic data have been accumulated and analyzed for this species. This makes the genomes and transcriptomes of Arabidopsis ecotypes and of crucifers related to Arabidopsis particularly suitable resources for comparative studies.
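Genome assemblies compared in this way are commonly summarized by contig N50, the statistic reported later in this subheading for the T. parvula assembly (N50 = 5.29 Mb). Under the standard definition, N50 is the largest length L such that contigs of length L or longer together contain at least half of all assembled bases. The Python sketch below implements that definition; the function name and the contig lengths in the usage example are illustrative only and do not come from the T. parvula assembly.

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50 of a collection of contig lengths.

    N50 is the largest length L such that contigs of length >= L
    together contain at least half of all assembled bases.
    """
    if not contig_lengths or any(length <= 0 for length in contig_lengths):
        raise ValueError("contig lengths must be a non-empty list of positive integers")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating assembled bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the last contig always reaches the total")


if __name__ == "__main__":
    example = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # illustrative lengths only
    print(n50(example))
```

For the illustrative lengths, the total is 54 and the three longest contigs (10 + 9 + 8 = 27) first reach half of it, so the script prints 8.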
Several comparative gene expression analyses using ESTs from plants exposed to various stress conditions, quantitative real-time PCR, and different types of microarrays [10, 21, 29–32] have confirmed the presence of potentially important and distinctive paralogs that may mediate mechanisms of stress adaptation in Thellungiella [11]. Recent advances in next-generation sequencing technology have enabled and accelerated the sequencing and assembly of the genomes of non-model species, including crucifers. In 2011, the genomes of Arabidopsis lyrata [33], T. parvula [34], and Brassica rapa [35] were published. A first draft of the T. salsuginea genome sequence, produced at the Joint Genome Institute (JGI, US Department of Energy) under the coordination of Schumaker, Wing, and Mitchell-Olds, is available online (http://www.phytozome.net/thellungiella.php#A, [17]), and the Capsella rubella genome (http://tinyurl.com/jgi-plans) is currently being sequenced. Analysis of the T. parvula genome has revealed over 3,000 predicted open reading frames (ORFs) without BLAST hits in A. thaliana, a portion of which may represent novel stress tolerance genes, as well as additional (i.e., duplicated) copies of Arabidopsis genes known to be important in stress responses, such as HKT1 and SnRK2s [34]. Gene Ontology (GO) classification of the more than 28,000 predicted ORFs of T. parvula has also shown that subcategories of the “biological process” category were over-represented (“response to abiotic or biotic stimulus”) or under-represented (“signal transduction”) in T. parvula compared to Arabidopsis, suggesting that Thellungiella uses a different strategy than Arabidopsis to respond to abiotic stresses [34]. In Subheading 3.4, the tools and resources for comparative genomics in crucifers are described, using the recently published T. parvula genome as an example [34]. T. parvula has a genome slightly larger than that of A.
thaliana, distributed in seven pairs of chromosomes [36]. Using a combination of the 454 and Illumina platforms and assemblers based on different algorithms, contigs of chromosome-arm length (N50 = 5.29 Mb) were produced. Version 2.0 of the genome sequence and annotation is available at http://thellungiella.org/.
2 Materials
2.1 Plant Growth
2.1.1 Substrates
1. When sowing in soil, a loose, uniform potting soil is required. Peat-based commercial mixes ensure good water retention as well as good drainage.
2. Some commercial formulations contain added fertilizers or bio-protectants against pathogens, such as Bacillus subtilis.
3. If root measurements are to be performed, inert, light substrates such as perlite, vermiculite, or light gravel may be preferred. When these substrates are used, water retention may be enhanced by mixing them with coir or other fibrous substrates.
2.1.2 Growing Containers
1. Standard 8–10 cm diameter plastic pots are frequently adequate.
2. If fine substrates are used, a thin filter on the pot bottom may prevent loss of the growth medium.
3. Plastic tubs (e.g., 90 × 60 × 20 cm) can also be used, provided they are equipped with appropriate drainage to avoid water stagnation (www.thellungiella.org).
2.1.3 Water and Nutrients
1. Plants must be watered frequently enough to keep the root environment moist, but flooding must be avoided to reduce the risk of anoxia.
2. No single watering schedule is appropriate in all cases; watering must be adapted to the specific conditions.
3. Once a week, it is usually appropriate to apply a modified Hoagland nutrient solution (13.00–18.00 mM N; 0.70–1.50 mM P2O5; 3.00–5.50 mM K2O; 1.50–6.00 mM SO3; 1.25–3.50 mM Mg; 3.25–5.00 mM Ca; 10.00–40.00 μM Fe-EDTA; 0.50–1.00 μM Cu; 4.00–7.00 μM Zn; 15.00–40.00 μM B; 10.00–15.00 μM Mn; 0.50–1.00 μM Mo).
2.1.4 Pests
1.
Bioplasts such as fungi, viruses, and especially insects are often underestimated deterrents to successful molecular manipulations. Many biological solutions (mainly biopesticides) are available and proven effective for greenhouse/growth room environments. 2. Manufacturer instructions should be strictly followed when handling pesticides, but since Arabidopsis and Thellungiella spp. are not specifically mentioned in the labels, dose test experiments might be required to establish optimal conditions, considering also that specific mutants might respond differently (see Note 1). 2.2 Media for Transformation Using the Floral-Dip Method 2.2.1 Bacterial Growth Media for AgrobacteriumMediated Transformation 1. Yeast extract peptone (YEP): Yeast extract 10 g/L, peptone 10 g/L, sodium chloride 5 g/L. Adjust pH to 7.0 with 0.1N potassium hydroxide (KOH). For plates, add agar 15 g/L. Autoclave-sterilize, typically for 20 min at 121 °C (steam at 15 psi). 2. LB medium: Tryptone 10 g/L, yeast extract 5 g/L, sodium chloride 10 g/L. Adjust pH to 7.0 with 0.1N potassium hydroxide (KOH). For plates, add agar 15 g/L. Autoclavesterilize, typically for 20 min at 121 °C (steam at 15 psi). 3. Antibiotics and other heat-labile substances are added after medium is cooled to 55 °C in water bath, prior to pouring medium into suitable container (Petri dishes or else). 4. Antibiotic dosage: Kanamycin 50 mg/L; rifampicin 30 mg/L; gentamicin 30 mg/L; ticarcillin 30 mg/L. Use as required. 2.2.2 Agrobacterium Infiltration Medium MS salt (1/2×) 2.2 g/L, B5 vitamins (1×), sucrose 50 g/L, MES 0.5 g/L, N6-benzylaminopurine (BA) 0.01 mg/L, Silwet L-77 200 μL/L. Adjust to pH 5.7. 2.2.3 Antibiotics Preparation Rifampicin, kanamycin, gentamicin, and ticarcillin antibiotic stocks: these compounds are heat labile and cannot be sterilized by autoclaving. 1. Kanamycin (30 mg/mL)—(dissolve 300 mg in 10 mL H2O). 2. Ticarcillin (100 mg/mL)—(dissolve 1.0 g in 10 mL H2O). 3. 
Gentamicin (30 mg/mL)—(dissolve 300 mg in 10 mL H2O). 4. Rifampicin (30 mg/mL)—(dissolve 300 mg in 10 mL of methanol). 5. Filter-sterilize using a syringe and a 0.22 μm membrane filter. Aliquot into 1 mL samples and store up to 3 months at −20 °C. 2.3 Comparative Genomics We list programs that aid in the comparison and viewing of genome sequences. It is, however, to be understood that the computational 32 Giorgia Batelli et al. tools for the representation and analysis of genome sequences undergo rapid development and changes. 1. Nucmer, included in the MUMmer package [37], is the software suitable for identifying global colinearity between two genomes. For installation and documentation, see http:// mummer.sourceforge.net/. A computer with a UNIX operating system will be required. 2. Circos visualization tool for comparative genomics [38]. For installation and tutorial, see http://circos.ca/. Also useful is the Google discussion group: http://groups.google.com/ group/circos-data-visualization. 3. MAUVE [39] sequence alignment tool is suitable for identifying synteny as well as chromosome-scale inversions. For installation and documentation, see http://gel.ahabs.wisc.edu/ mauve/. 4. Genome-wide as well as localized comparisons of sequences can be performed using the comparative genomics platform available at CoGE http://genomevolution.org/CoGe/. 3 Methods 3.1 Growth and Maintenance of Thellungiella spp. 1. Whereas the Arabidopsis cycle can be completed in 6–10 weeks, T. salsuginea requires from 16 to 20 weeks from sowing to harvest, in comparison to 12–16 weeks for T. parvula (Fig. 1). 2. Growing conditions that heavily influence flowering time and life cycle include light (short/long day), temperature, watering and nutrition, plant density, containers, presence of pests, Fig. 1 (a) Adult plants of Arabidopsis thaliana and Thellungiella parvula. 
(b) An adult flowering plant of Thellungiella parvula Arabidopsis Related Model Species (ARMS) in Salt Stress Research 33 and the type of facilities used, e.g., growth chambers vs. greenhouse. 3. When optimal conditions are maintained as uniformly and consistently as possible, shorter harvesting times and higher seed quality result. Any prolonged stress will result in weak, unhealthy plants, delayed and poorer harvest, or outright plant losses. 4. It is crucial to understand that general plant health has a much larger effect on Agrobacterium-based transformation of Thellungiella compared to Arabidopsis. The activation of the innate immune response controls significantly the ability of Agrobacterium to successfully mediate gene transfer in Arabidopsis. Because Thellungiella species are perennial-like (they continue growth after flowering), it is tempting for convenience to use old plants that continue to flower, but because of previous stress and pathogen episodes (root aphids are a common example), transformation frequency will be very low. 5. Both T. salsuginea and T. parvula show a greater degree of seed germination variability than A. thaliana. This is related to the higher percentage of dormant seeds generally present in Thellungiella spp. It is good practice therefore to stratify seeds for several days (1 week) at 4 °C in the dark and to work with seeds of uniform age and good quality. 6. Cold treatment of dry seeds is not effective, whereas seeds maintained in a constant moist environment (either water suspension or moist soil) for 7 days will germinate promptly and more uniformly. 3.1.1 Seed Storage and Preservation 1. Seeds should be dried to a moisture content of 5–6 %, by air-drying for about 4 weeks or in desiccators with Drierite or silica gel for 3 or 4 days. 2. Seeds are then commonly stored at room temperature (24– 27 °C) in scintillation vials or paper envelopes kept in desiccators. 
In these conditions, seeds remain viable for at least 3 years, and in our experience Thellungiella seeds can retain viability for 10–12 years. For longer storage, seeds should be sealed in moisture-proof containers and kept at 4 °C.

3.1.2 Seed Germination
1. For sowing, seeds are mixed with sand and distributed on the soil using a salt/pepper shaker to ensure uniform dispersion. Alternatively, seeds kept in water at 4 °C in the dark for 7 days are further diluted in abundant water and distributed uniformly on the soil using a squeeze bottle.
2. The final sowing density depends on the experimental purpose and dictates the ratio of seeds to sand or water.
3. After making sure that the mixture of seeds and sand (or the suspension of seeds in water) is homogeneous, sow on well-watered soil.
4. For low densities in small pots or on limited surfaces (10–20 plants in 4–5 in. pots, or in celled trays), dry seeds placed on a piece of paper can be dispersed effectively by tapping.
5. Seeds should not be covered with soil. Pots or containers sown with dry seeds on moist soil can then be placed at 4 °C in the dark for 7–14 days.
6. Containers should be checked periodically and moved out of the cold room as soon as germination occurs, to avoid etiolation of the plantlets.

3.1.3 Growth Conditions
1. Temperature requirements do not differ greatly between T. salsuginea and T. parvula: 24 °C during the day and 18 °C at night, with a 16 h photoperiod and a light intensity of 130–150 μmol/m2 s, is adequate. Both species also grow well under wider temperature and light ranges (as typically encountered in a greenhouse compared with a growth room or growth chamber), since plants in nature undergo wider day/night fluctuations than those experienced in controlled environments. Arabidopsis, as well as Thellungiella spp., adapt well to more uniform regimens.
2. T. parvula benefits from intense photosynthetically active radiation. Additional lighting might be required in the greenhouse, mainly to control the photoperiod, depending on the season and the location.
3. Plants should be watered regularly and thoroughly, either from above with a gentle shower or from the bottom (sub-irrigation). To reduce fungal and algal growth and infestation by fungus gnats (Mycetophilidae and Sciaridae), containers and soil should drain excess water well, and the soil surface should be allowed to dry between waterings.
4. Young plantlets require thinning about 3 weeks after germination, which allows the remaining plants to grow stronger.

3.1.4 Vernalization
1. T. salsuginea requires vernalization. To promote uniform flowering, plants 4–5 cm in diameter (Fig. 2b) should be well watered and then placed at 4 °C for 21–28 days with a 16 h photoperiod. Plants need little care at this stage. Plastic domes (generally used for propagation) can reduce dehydration and the need for watering.
2. Vernalization can be initiated any time after germination, but older plants generally require longer vernalization. After vernalization, plants are returned to normal growing temperature and watered regularly (see Note 2).

Fig. 2 Thellungiella salsuginea plants resistant to glufosinate. (a) Glufosinate-resistant seedlings of T. salsuginea transformed with vector pSKI15, 15 days after treatment with 5 mg/L glufosinate. (b) Aerial view of flats of young glufosinate-resistant T. salsuginea seedlings after transfer from the selection tray (source: http://thellungiella.org/)

3.1.5 Post-flowering Maintenance
1. T. salsuginea bolts and grows upright, whereas T. parvula has a recumbent habit (Fig. 1). Older plants of both species can form numerous branches, and it is helpful to train them by tying them to wooden skewers. Tied bundles also facilitate the final harvest.
When plants begin to dry, whole stalks can be cut for seed collection.
2. Thellungiella seeds mature much more asynchronously than those of Arabidopsis, so it is advisable to repeat the harvest as needed (see Note 3).
3. Regular and frequent watering is required during flowering and until siliques are well formed (see Note 4).
4. Watering should be gradually reduced as plants approach senescence.

3.1.6 Seed Harvest
1. Harvest generally takes place 2–4 weeks after watering is stopped. When appropriate, plants can be harvested in bulk by cutting the stalks at the base and leaving them undisturbed to dry completely on large sheets of paper (brown packing type).
2. Seeds are threshed by hand rolling and cleaned through sieves and strainers; common tea strainers are very useful. Several passes through sieves of different sizes might be necessary to clean the seeds of all residues and debris.

3.1.7 Pests
1. Scouting for pests should be done regularly, because a high density of plants of the same species in a controlled environment, under optimal growing conditions and in the absence of natural antagonists, increases the chance of pest attacks. Early detection and prompt intervention are critical to avoid major outbreaks.
2. Good watering and fertilization regimens, as well as adequate ventilation, are essential to ensure healthy plants. Vigorous plants are less susceptible to pests and diseases and also respond better to treatments.
3. A short list follows of the most common problems we encountered and of effective measures to solve them, with an emphasis on biological antagonists (see Note 5).
(a) Powdery mildew (Erysiphe spp.): sulfur is effective.
(b) Fungus gnats (Mycetophilidae and Sciaridae): controlled by the nematode Steinernema feltiae and by Bacillus thuringiensis israelensis.
(c) Thrips (Thysanoptera): generally not as damaging as on Arabidopsis, but they still require care and, in case of heavy infestation, spraying with insecticides approved for thrips. The predatory mites Neoseiulus cucumeris and Hypoaspis miles have proven effective.
(d) Aphids: prompt intervention is key, as are all measures aimed at limiting insect presence in the greenhouse (screens on all openings, reduced traffic, use of coats when entering each contained greenhouse area). Insecticidal soaps are helpful, though repeated or overly extensive treatments can damage tender parts of the plants (inflorescences).
(e) Root aphids: because they affect roots, they are not as easily detected as other aphids. Symptoms are pronounced leaf yellowing and slow growth. Plants stop developing, with dramatically reduced seed maturation and yield (see Note 6).

3.2 Other Candidate Halophytic/Extremophile Model Species
Whereas Thellungiella spp. have so far been the most studied Arabidopsis relatives, other related crucifers are attracting increasing interest because of specific characteristics, such as the high tolerance to heavy metals of Thlaspi spp. [40–43]. A short description of some of the most promising candidates follows, covering the main features of their life cycle and plant development. Table 1 summarizes the light, photoperiod, temperature, and watering requirements of the described species.

3.2.1 Barbarea verna
B. verna is a biennial herb native to Eastern Europe and southwestern Asia, usually found in damp soils, roadsides, and waste places.
1. Seeds should be sown shallowly, at a depth of about 1 cm. Germination occurs 1–2 weeks after sowing, and about two to four true leaves are present after 4 weeks.
2. Flowering starts about 6–7 weeks after sowing, when the plant has about ten leaves.
3. At full maturity, the plant may reach 0.3 m in width and 0.3 m in height.
4. Flowers are hermaphrodite and the plant is self-fertile.

3.2.2 Capsella bursa-pastoris
C. bursa-pastoris, also known as shepherd’s purse, is an annual plant native to Eastern Europe, usually found in arable land, waste areas, and road margins.
1. Seeds germinate 1–2 weeks after sowing; plants have 2–6 leaves at 4 weeks and about 15–20 leaves at 6 weeks, when the first flowers may appear.
2. At full maturity, the plant may reach 0.3 m in width and 0.2–0.5 m in height.
3. Flowers are hermaphrodite and the plant is self-fertile.

3.2.3 Descurainia pinnata
D. pinnata is an annual or biennial plant native to desert regions from Nevada southward into north central and northwestern Mexico. It is also native to the deserts of North Africa and the Middle East. It is usually found in sandy fields, gravel, white saline areas, dunes, open desert, waste ground, disturbed sites, open woods, prairies, glades, roadsides, and railroads. It can grow in infertile soils, such as sandy or gravelly ones, although in fertile soil the plant grows larger.
1. Germination occurs about 2 weeks after sowing, and the plant develops a rosette in about 4–6 weeks, followed by stems on which flowers then appear.
2. Blooming then starts and lasts about 2 months. At full maturity, the plant may reach 0.3 m in width and 0.6 m in height.
3. Flowers are hermaphrodite and the plant is self-fertile.
4. Plants exhibit extreme tolerance of soil desiccation, based almost entirely on root growth characteristics.

Table 1 Other candidate halophytic/extremophile model species
Species | Common name | Photoperiod (h light/dark)
Barbarea verna | Yellow flower, winter cress | 16/8
Capsella bursa-pastoris | Shepherd’s purse | 16/8
Descurainia pinnata | Western tansy mustard | 16/8
Hirschfeldia incana | Conil yellow, Mediterranean | 16/8
Lepidium spp. | Common pepperweed, prairie peppergrass, Virginia pepperweed | 16/8
Malcolmia triloba | Conil blue | 16/8
Sisymbrium officinale | Hedge mustard | 16/8
Thlaspi arvense | Pennycress | 16/8
Arabidopsis lyrata | Northern rock cress | 16/8
Cakile maritima | Sea rocket | 16/8
Ploidy: n = 9, 2n = 18; 2n = 16; n = 7, 2n = 14; n = 7, 2n = 14; n = 7, 14, 2n = 28; n = 16, 2n = 32; 2n = 14, n = 7; n = 7, 2n = 28; n = 8, 16, 2n = 16, 32; n = 8, 2n = 16
Radiation (μmol/m2 s): 250–1,000; 500–800; 250–400; 250–1,000; 500–1,000; 500–1,000; 100–1,000; 500–1,000; 500–1,000; 250–1,000
Temperature (°C, min–max): 12–25; 8–16; 12–20; 14–24; 12–20; 15–24; 16–24; 18–24; 14–24; 14–24
Watering: not frequent/scarce; frequent; frequent; frequent, but avoid flooding; frequent, but avoid flooding, may tolerate mild drought; can tolerate mild to severe drought stress; can tolerate mild drought stress, avoid flooding; can tolerate drought stress; can tolerate mild drought stress; frequent, avoid flooding
References: [16, 44, 47, 49, 50]; [16, 40–42, 44]; [16, 44, 48]; [16, 40, 44–47]; [16, 44]; [65–69]; [33, 61–64]; [16, 43, 44, 47]; [16, 44, 47]; [16, 44]

3.2.4 Hirschfeldia incana
H. incana is a perennial plant native to the Mediterranean basin, usually found in waste places, roadsides, and canyons.
1. Germination occurs 1–2 weeks after sowing, and a rosette of lobed leaves develops within 4–5 weeks; stems covered by dense, soft, white hairs then develop from the rosette.
2. Flowers set between 6 and 12 weeks after sowing, and blooming may last a few months.
3. At full maturity, the plant may reach 0.5 m in width and 1 m in height.
4. Flowers are hermaphrodite and the plant is self-fertile.

3.2.5 Lepidium spp.
Lepidium spp. are perennial plants native to Eurasia but spread across all continents except Antarctica. They are usually found in sandy soil, waste places, coastal regions, sea cliffs, dry creek beds, and dry plains.
1. Germination occurs within 1 week of sowing.
2. The plant reaches its full size (0.10–0.50 m tall) within 8–10 weeks, forming a rosette of lobed leaves from which the flowering stems develop.
3. Flowers are hermaphrodite and the plant is self-fertile.
4. Plants show considerable salt tolerance, close to that of Thellungiella [16].

3.2.6 Malcolmia triloba
M. triloba is an annual plant native to Asia and the Mediterranean region, usually found in waste and disturbed areas and in gravel pits.
1. Seeds germinate in 1–2 weeks.
2. Flowers appear after 6–8 weeks, when plants reach their full size (0.15–0.50 m in height).
3. Flowers are hermaphrodite and the plant is self-fertile.

3.2.7 Sisymbrium officinale
S. officinale is an annual or biennial plant native to the Mediterranean region, usually found in disturbed sites.
1. Germination occurs in 1–2 weeks.
2. Plants reach their full size (up to 1 m in height) in 5–8 weeks, when flowering starts. Blooming lasts 2 months.
3. Flowers are hermaphrodite and the plant is self-fertile.

3.2.8 Thlaspi arvense
T. arvense is an annual plant native to central and western Asia, usually found along roadsides and in waste places.
1. Germination occurs in 2 weeks, and the plant develops a basal rosette of glabrous leaves within 4–6 weeks.
2. Flowering starts 8–12 weeks after sowing, when plants have reached their full size (about 0.75 m in height).
3. Flowers are hermaphrodite and the plant is self-fertile.
4. Thlaspi species are notably tolerant of heavy metals.

3.2.9 Arabidopsis lyrata
A. lyrata (also known as northern rock cress) is the closest well-studied relative of A. thaliana. It may complete its life cycle within a single season but is normally perennial. Native to cool temperate areas around the Arctic, it is usually found in disturbed habitats with scarce vegetative competition, such as humid rocky places, coastal cliffs, pine forests, and sandbars.
1. Germination occurs within 2–3 weeks of sowing, and the plant forms a simple rosette within 4–6 additional weeks.
2. It may reproduce vegetatively via stolons or sexually via insect pollination, producing a high number of seeds.
3. Flowering starts 8–12 weeks after sowing.
4. Unlike A. thaliana, which is predominantly self-fertilizing, A. lyrata plants are outcrossing diploids.
5. The A. lyrata genome has been sequenced.

3.2.10 Cakile maritima
C. maritima (also known as sea rocket) is an annual plant that sometimes behaves as a perennial. Native to Europe, it is an invasive species in North America that grows easily along the coast, often in sand dunes.
1. Germination occurs in 2–3 weeks.
2. Flowers set after 6–8 weeks, when plants reach their full size of 0.3 m in height.
3. The plant is easily grown on well-drained sandy soil under high solar radiation and can tolerate salt exposure.

3.3 Genetic Transformation Using Agrobacterium

3.3.1 Agrobacterium Transformation
1. Agrobacterium tumefaciens transformation of T. parvula and T. salsuginea can be carried out following the floral-dip method widely used for Arabidopsis [27].
2. Agrobacterium-mediated transformation has been achieved for both species with similar efficiencies (0.1–2 %), by either floral dipping or spraying, using the bacterial strain GV3101.
3. Because both techniques aim to infect the maximum number of unopened floral buds, and flowering is not synchronous, several Agrobacterium treatments are required, generally at 3–5 day intervals.
4. For random activation tagging mutagenesis, the vector pSKI15 [51] is used.
5. Start from a frozen glycerol stock of A. tumefaciens GV3101 (pMP90RK) (a C58 derivative) stored at −80 °C.
6. The Agrobacterium strain GV3101 was transformed with the binary vector pSKI15 for activation-tagging T-DNA insertional mutagenesis.
The pSKI15 plasmid contains four transcriptional enhancers derived from the cauliflower mosaic virus (CaMV) 35S RNA promoter, cloned in tandem near the right border sequence, and an expression cassette for herbicide resistance (bar gene, encoding phosphinothricin acetyltransferase).
7. Agrobacterium selectable markers are:
(a) Resistance to ampicillin/ticarcillin/carbenicillin (pSKI15).
(b) Resistance to gentamicin (Ti plasmid).
(c) Resistance to kanamycin (Ti plasmid).
(d) Resistance to rifampicin (GV3101).
8. The T-DNA also contains a bacterial origin of replication (oriC) for plasmid rescue in Escherichia coli.
9. To culture Agrobacterium for plant transformation, chip off pieces of frozen culture from the −80 °C Agrobacterium stock with a 200 μL pipette tip and inoculate 5 mL of YEP or LB medium plus appropriate antibiotics in culture tubes (25 × 150 mm).
10. Incubate on a shaker in the dark at 28 °C and 230 rpm for 24 h. The culture should look saturated (cloudy, OD600 = 1.5–2.0).
11. Add 3 mL of culture to a larger volume of medium (YEP or LB plus antibiotics) in a flask, according to the number of plants to transform. 500 mL of grown culture is sufficient for floral dipping of three 5 in. diameter pots containing ten plants each. The volume of medium should be no more than 1/5 of the flask volume, to ensure proper aeration during shaking.
12. Incubate on a shaker at 28 °C and 230 rpm to an OD600 of 1.5–2.0 (16–18 h).
13. Centrifuge the culture to pellet the cells (4,500 × g for 20 min). Decant the supernatant and add about half of the original culture volume of infiltration medium (see Subheading 2.2.2) to the bottle.
14. Resuspend the pellet completely by vigorous shaking and dilute the suspension with infiltration medium to a final OD600 of 0.8–1.0.
15. Proceed with floral dipping or plant spraying.
16. To prevent rapid evaporation of the Agrobacterium infiltration solution from the flowers, plants should be shielded from air movement and covered for 24–36 h (flowers should stay wet for at least 24 h). This can be done in situ by covering the plants with plastic sheets, or by moving the plants to closed cabinets or similar structures (see Note 7).
17. Post-transformation care is identical to the standard plant growth conditions described above. As indicated earlier, plant health has a dramatic effect on transformation efficiency, and poor health may even cause total failure.

3.3.2 Identification of T1 Plants Based on Herbicide Resistance Gene Expression
For large amounts of seed, it is convenient to screen transformants directly in soil; hence the use of a herbicide-tolerance selectable marker is advantageous. The following protocol has been used extensively for T. salsuginea (and A. thaliana) to produce large collections of tagged insertional mutant lines:
1. Typically, 1 g of seeds harvested from Agrobacterium-treated plants is sown uniformly in greenhouse flats (53 × 27 × 5 cm) filled with loose, previously well-watered soil. Use a sand/seed mixture (1 part seed to 2–4 parts sand) and distribute the seeds using the salt/pepper shaker method. Proceed as described above for stratification and germination.
2. Begin spraying the seedlings with the herbicide as soon as the plantlets form the first pair of true leaves. For herbicide application, use a final concentration of 5 mg/L glufosinate ammonium (active ingredient). One liter of diluted solution is sufficient for about ten flats (see Note 8).
3. Spray on 3–5 consecutive days each week until a clear distinction between dead and surviving plants is visible. Repeated herbicide treatments may be required, depending on plant density and uneven germination.
4. Transformed plants should continue to grow undisturbed (Fig. 2a).
Mark the surviving plants from the earlier germination with toothpicks while secondary germination is starting. This helps prevent later-germinating, non-transformed seedlings that escaped the herbicide from being mistaken for transformants.
5. Transplant putative transformants to 25 × 25 × 5 cm trays, 25 plants per tray (Fig. 2b).
6. Protect plants from transplant shock by watering them immediately and covering them with propagation domes and/or shading them for 1–2 days.
7. Once plants are established (4–5 cm diameter rosettes, 4–6 weeks), vernalize them at 4 °C following the vernalization procedure described above.
8. For large-scale mutagenesis, seeds collected from each tray are pooled for further study. Alternatively, T1 plants can be isolated in vitro (see Note 9).
9. The frequency of recovered transformed plants can be determined by PCR of the insertion plasmid and/or herbicide resistance gene sequences.

Table 2 Genomic tools developed for Thellungiella spp.
Tool | Species | Tissue of origin/stress treatment | Notes | Database accession nos. | Reference
EST collection | T. salsuginea | Seedlings/salt | About 1,700 ESTs sequenced | GenBank, see paper for Acc. Nos. | [52]
EST collection | T. salsuginea Yukon | Adult plants, aboveground tissue/chilling, freezing, salt acclimation, salt shock, drought stress | 6,578 ESTs were sequenced from cDNA libraries obtained from different treatments | GenBank, Acc. Nos. DN772677–DN779205 | [31]
cDNA library | T. salsuginea Shandong | Several tissues/chilling, freezing, salt, ABA | 20,000 full-length enriched Thellungiella cDNAs (RTFL) were generated | DDBJ, Acc. Nos. BY800476–BY835646; clones available at http://www.brc.riken.go.jp/lab/epd/Eng/ | [53]
EST collection | T. salsuginea Shandong | Whole plants/salt | 946 EST sequences generated | GenBank, Acc. Nos. EC598928–EC599965 | [54]
Microarray chip | T. salsuginea Yukon | N.A.a | ESTs spotted on the chip; specifically developed for Thellungiella, it can analyze “novel” genes | N.A.a | [32]
BIBAC library | T. salsuginea | Partially digested T. salsuginea genomic DNA | A BIBAC (binary bacterial artificial chromosome) library expected to cover 4× the T. salsuginea genome was generated | N.A.a | [55]
a Not applicable

3.4 Tools for Comparative Genomics Analyses of Arabidopsis Relatives: An Example with T. parvula
In addition to the genome sequences [17, 34], several genomic tools have been developed for T. parvula and T. salsuginea, including EST and cDNA libraries, BIBAC libraries, and microarray chips ([11], Table 2). Below, we describe methods to compare, align, and assemble the genomic contigs and scaffolds of a newly sequenced crucifer species.
The chromosome structures of plant species within the Brassicaceae family have been studied with comparative chromosome painting (CCP) techniques using pools of A. thaliana BACs as probes [56, 57]. The ancestral Brassicaceae genome was inferred to contain 24 ancestral karyotype (AK) blocks, named A to X, distributed across eight chromosomes [56]. The genomes of most crucifer species consist of combinations of these AK blocks arranged in different numbers of chromosomes [57]. For example, Lineage II crucifers with 2n = 14 karyotypes, including Thellungiella, evolved from eight ancestral Brassicaceae chromosomes to genomes with seven chromosomes through multiple translocation and inversion events [58].
As an example, here we describe current tools to compare the genome of a newly sequenced ARMS (T. parvula, Tp) with the genome of the model plant A. thaliana (At). Comparing the larger Tp contigs with the AK blocks in the At genome aided the assembly of the seven Tp chromosome pseudomolecules [34].

3.4.1 Identification and Visualization of Global Synteny Using Nucmer and Circos

Alignment of Genomes Using Nucmer
1. The genome contigs and scaffolds were aligned as FASTA files with the At genome sequence using Nucmer.
An example command line with parameters is “$ nucmer --maxmatch --maxgap 1000 --prefix <project_name> <input_genome_sequence_file_name.fasta> <At_genome_file_name.fasta>”.
2. Run delta-filter. An example command line with parameters is “$ delta-filter -r -q -l 500 project_name.delta > project_name.filter”.
3. Run show-coords. An example command line with parameters is “$ show-coords -c -d -l -r -T project_name.filter > project_name.txt”.
4. The resulting tab-delimited text file contains the coordinates of the genomic regions that show sequence similarity with genomic regions of At.

Visualization of Nucmer Results with Circos
1. Circos is a visualization tool for comparative genomics that is driven by a configuration file. The alignments obtained with Nucmer can be fed into Circos as <links> in the configuration file. An example comparing Tp chromosome 7 with the five At chromosomes is shown in Fig. 3a.
2. When the genome sequences of Arabidopsis-relative crucifers are compared with that of At, extensive colinearity is usually found. If a karyotype model with AK blocks is available [9] for the Arabidopsis-relative species, the contigs/scaffolds can be mapped onto the model by comparing them with the At genome. Using the Nucmer alignment obtained as described earlier in this section (Alignment of Genomes Using Nucmer), identify the At AK blocks with which each contig/scaffold shows colinearity. The coordinates of the AK blocks in the At genome are available in Schranz et al. [56]. For example, Tp contigs c16 and c19 are colinear with AK block S in the At genome (Fig. 3a). Similarly, c32, c6, c14, and c17 showed synteny with AK blocks T and U in At. Since chromosome 7 of Thellungiella species consists of AK blocks S, T, and U [57, 58], the contigs c16, c19, c32, c6, c14, and c17 can be mapped to Tp chromosome 7 (Fig. 3a).
3. If a genome annotation is available, the distribution of coding sequences (CDSs) and transposable elements (TEs) can be plotted in the Circos diagram as a histogram. A correct chromosome assembly will reveal a TE-rich centromeric region, as indicated by the red arrow in Fig. 3a.

Fig. 3 Tools for identification and visualization of synteny between the genome sequences of Thellungiella parvula (Tp) and Arabidopsis thaliana (At). (a) Synteny between Tp chromosome 7 and the At genome. Tp genome contigs that are colinear with the ancestral karyotype (AK) blocks S, T, and U are assembled into Tp chromosome 7 according to the model developed from comparative chromosome painting (CCP) results [56–58]. Synteny regions were identified using Nucmer [37] and visualized with Circos [38]. The outer histogram shows the distribution of genes, retrotransposons, DNA transposons, and unidentified repetitive sequences in blue, orange, yellow, and green, respectively. The red arrow indicates the centromeric region of Tp chromosome 7. (b) Synteny blocks between At chromosome 4 and Tp chromosome 7 identified by MAUVE [39]. Genomic regions with sequence similarity are shown in the same color in the two chromosomes. Red arrows identify inversions. (c) Identification of genome-wide synteny between At and Tp using SynMap (http://genomevolution.org/r/4gyq), included in the CoGe tools [59, 60]. The protein-coding sequences (CDSs) of At and Tp were compared, and dots were plotted where coding sequences from the two species show similarity. Dot colors indicate the synonymous nucleotide substitution rate (Ks), as indexed in (d)

3.4.2 Alignment of Genomes Using MAUVE
MAUVE [39] is a sequence alignment tool suitable for identifying synteny as well as chromosome-scale inversions.
1. MAUVE runs with a graphical user interface and takes two or more genome sequences in FASTA format as input.
Figure 3b displays an example of MAUVE results comparing At chromosome 4 and Tp chromosome 7, as assembled above using Nucmer.
2. When mapping genome contigs/scaffolds to the chromosome model with AK blocks, MAUVE is a particularly useful supporting tool for deciding the orientation of each contig/scaffold.
3.4.3 Comparison of Tp and At Genomes Using CoGe
Genome-Wide Comparison Using SynMap
1. Version 2 of the Tp genome and annotation is available for comparative studies in the CoGe database (http://genomevolution.org/CoGe/) [59, 60]. Using SynMap, a tool included in CoGe, the Tp genome and coding sequences can be compared with any genome deposited in CoGe. A comparison of all Tp CDSs with those of At is available at http://genomevolution.org/r/4gyq. The comparison is visualized as a dot plot (Fig. 3c).
2. When CDS annotations are available for both species being compared, SynMap calculates the synonymous substitution rate (Ks) for all homologous CDS pairs between the two species and generates a histogram of Ks values (Fig. 3d). Dots in the dot plot are colored according to the Ks value of each CDS pair. In Fig. 3c, the lines of yellow dots consist of CDS pairs with Ks values around or below 0.3, while red dots indicate Ks values around or above 0.5. CDS pairs with Ks values higher than 0.8 are shown as blue dots (Fig. 3d). SynMap also generates links to the sequences of all homologous CDS pairs, as well as a list of tandemly duplicated CDSs.
3. Any part of the SynMap dot plot can be magnified by clicking and dragging with the mouse cursor. The magnified region is shown in a separate window. Clicking any dot in this magnified window opens another tool, GEvo, for comparison at higher resolution.
Localized Comparison of Genomic Features Using GEvo
1. There are two ways to start GEvo. First, clicking a dot in the magnified SynMap window opens a GEvo analysis around the selected dot.
Second, the name of the CDS can be entered directly in the GEvo window (http://genomevolution.org/CoGe/GEvo.pl). Entering the CDS name in the “Name:” field automatically brings up the genomic sequence around the CDS. For example, entering “AT1G18710” as the name for Sequence 1 and “Tp1g16690” for Sequence 2 and pressing the “Run GEvo Analysis!” button will bring up Fig. 4. More than two sequences can be compared by clicking “Add sequence.”
2. The pink ribbons in the example presented in Fig. 4 indicate genomic regions showing homology, or high-scoring segment pairs (HSPs). Clicking on a ribbon toggles on the link between the homologous regions, and the alignment between the two regions appears in a separate window. The link can be toggled off by clicking again on one of the ribbons it connects. Gene models are shown as cylinders. Clicking a cylinder brings up a separate window containing the annotation and information for the CDS, as well as a link to the CDS sequence.
Fig. 4 Comparison of homologous genomic regions of At and Tp using GEvo. The T. parvula genome sequence and annotation are available for comparative genomic studies in the CoGe database (http://genomevolution.org/CoGe/index.pl). Shown is an example snapshot of GEvo results (http://genomevolution.org/CoGe/GEvo.pl), part of the CoGe toolbox, comparing the genomic regions near AtMYB47 (At1g18710) and the three putative TpMYB47 homologs (Tp1g16690, Tp1g16700, and Tp1g16710). Pink ribbons indicate blocks with sequence similarity between the two species. Gene models are shown as cylinders, with exons, introns, and conserved noncoding sequences in green, gray, and blue, respectively.
3. The example in Fig. 4 shows a local tandem duplication event specific to T. parvula, in which putative Tp homologs of AtMYB47 were amplified to three copies. GEvo is well suited to browsing and analyzing local tandem duplications in detail.
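The Nucmer-to-Circos steps described above can be scripted. The Python sketch below is not part of the original protocol and rests on two assumptions that should be checked against your MUMmer and Circos versions: that the tab-delimited show-coords output produced with the flags given above has S1, E1, S2, E2 as its first four columns, % identity as the seventh, and the reference and query IDs as its last two columns; and that your Circos release accepts the single-line link format "chr1 start1 end1 chr2 start2 end2". The script name coords_to_circos.py is hypothetical, and sequence IDs must match those in the Circos karyotype file.

```python
"""Convert MUMmer show-coords -T output into Circos link lines (sketch)."""
import sys
from typing import Iterable, Iterator, NamedTuple


class Alignment(NamedTuple):
    ref_id: str
    ref_start: int
    ref_end: int
    qry_id: str
    qry_start: int
    qry_end: int
    identity: float


def parse_show_coords(lines: Iterable[str]) -> Iterator[Alignment]:
    """Yield alignments from tab-delimited show-coords output.

    Assumed layout: S1, E1, S2, E2 in the first four columns, % identity in
    the seventh, reference and query IDs in the last two. Header lines
    (file names, "NUCMER", bracketed column titles) do not start with a
    number and are skipped.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or not fields[0].strip().isdigit():
            continue
        s1, e1, s2, e2 = (int(f) for f in fields[:4])
        yield Alignment(fields[-2], s1, e1, fields[-1], s2, e2, float(fields[6]))


def circos_link(aln: Alignment) -> str:
    """Format one alignment as 'chr1 start1 end1 chr2 start2 end2'.

    Reference coordinates are ascending; query coordinates are descending
    for reverse-strand hits (S2 > E2), so they are sorted here and the
    orientation is not encoded in the output line.
    """
    q_lo, q_hi = sorted((aln.qry_start, aln.qry_end))
    return f"{aln.ref_id} {aln.ref_start} {aln.ref_end} {aln.qry_id} {q_lo} {q_hi}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python coords_to_circos.py project_name.txt > links.txt
    with open(sys.argv[1]) as handle:
        for aln in parse_show_coords(handle):
            print(circos_link(aln))
```

Because the query orientation is discarded, inversions such as those highlighted by MAUVE are not visible in links written this way; the identity field is kept so that links could be filtered or styled by similarity if needed.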
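The Ks color bands described for Fig. 3c and 3d can be summarized with a small helper. This is a hypothetical Python sketch, not a CoGe or SynMap function: the cut-offs (0.3 or below for yellow, 0.5 to 0.8 for red, above 0.8 for blue) are the approximate values quoted in the text, values between 0.3 and 0.5 are labeled "intermediate" because the text assigns them no color, and SynMap's own color scale may differ. It could be applied, for example, to a list of Ks values obtained from a SynMap analysis.

```python
"""Bin Ks values of homologous CDS pairs into the color bands described for Fig. 3c/d."""
from collections import Counter
from typing import Iterable

# Approximate cut-offs taken from the text; assumptions, not SynMap parameters.
YELLOW_MAX = 0.3
RED_MIN = 0.5
BLUE_MIN = 0.8  # values strictly above this are "blue"


def ks_band(ks: float) -> str:
    """Return the color band for one Ks value."""
    if ks < 0:
        raise ValueError("Ks cannot be negative")
    if ks <= YELLOW_MAX:
        return "yellow"
    if ks > BLUE_MIN:
        return "blue"
    if ks >= RED_MIN:
        return "red"
    return "intermediate"  # 0.3 < Ks < 0.5: no color assigned in the text


def band_counts(ks_values: Iterable[float]) -> Counter:
    """Count CDS pairs per band, a coarse version of the Ks histogram."""
    return Counter(ks_band(ks) for ks in ks_values)
```

For example, band_counts([0.1, 0.25, 0.4, 0.6, 1.2]) counts two yellow pairs and one each of intermediate, red, and blue.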
4 Notes
1. If using pesticides, it is imperative to alternate product types to reduce the occurrence of resistance in the pest population. Furthermore, not all products are licensed for use in controlled environments, and regulations and product availability differ among countries. Restricted-entry intervals (REIs), i.e., the period after plants and/or soil are treated with a pesticide during which entry is restricted to protect persons from potential exposure to hazardous levels of pesticide residues, must be respected, and protective measures should be adopted.
2. The EMS mutant162 of T. halophila does not require vernalization. This mutant flowers very early and is smaller, allowing it to be handled much like Arabidopsis (Bressan, personal communication).
3. Waiting for the whole plant to dry may lead to nonuniform silique dehiscence and consequent seed loss, including loss of transformed seeds, especially if the maturation of the flowers that were sprayed with Agrobacterium is not followed closely.
4. At this stage, water carefully at the base of the plants or from below. Above-canopy watering will cause seed loss.
5. In general, with the possible exception of powdery mildew, Thellungiella spp. seem to be less prone to pest and disease infestations than Arabidopsis. To date, we have not observed impatiens necrotic spot virus (INSV) on Thellungiella species.
6. Root aphid infestation is a good example of a condition that can reduce transformation efficiency without noticeably affecting the appearance of the plants.
7. It is very important to avoid overheating and to provide adequate shading.
8. The diluted herbicide solution can be kept for several days in the dark, since light promotes herbicide degradation.
9.
For in vitro selection of T1 plants using glufosinate ammonium (Crescent Chemical Company, Islandia, NY), we have noticed that whereas the response of A. thaliana is optimal at 5 ppm (5 mg/L), the response of T. parvula appears more variable, with some plants showing higher tolerance to the herbicide. This species seems to respond more slowly to the herbicide, and we therefore suggest an optimal concentration of 10 mg/L. The response of T. salsuginea to in vitro herbicide screening is also slower than that of A. thaliana. Resistant plants can be hardened by moving them onto regular medium without herbicide before transplanting them into soil.
Acknowledgements
Dong-Ha Oh thanks Eric Lyons for great help in setting up T. parvula sequences in the CoGe database. Dong-Ha Oh was supported by the World Class University Program (R32–10148) at Gyeongsang National University, Republic of Korea, and the Next-Generation BioGreen 21 Program (SSAC, PJ009495), Rural Development Administration, Republic of Korea.
References
1. Zhu JK (2002) Salt and drought stress signal transduction in plants. Annu Rev Plant Biol 53:247–273
2. Pardo JM, Cubero B, Leidi EO, Quintero FJ (2006) Alkali cation exchangers: roles in cellular homeostasis and stress tolerance. J Exp Bot 57:1181–1199
3. Fujita Y, Fujita M, Shinozaki K, Yamaguchi-Shinozaki K (2011) ABA-mediated transcriptional regulation in response to osmotic stress in plants. J Plant Res 124:509–525
4. Sanders D (2000) Plant biology: the salty tale of Arabidopsis. Curr Biol 10:486–488
5. Bohnert HJ, Cushman JC (2000) The ice plant cometh: lessons in abiotic stress tolerance. J Plant Growth Regul 19:334–346
6. Flowers TJ, Colmer TD (2008) Salinity tolerance in halophytes. New Phytol 179:945–963
7. Munns R, Tester M (2008) Mechanisms of salinity tolerance. Annu Rev Plant Biol 59:651–681
8.
Cushman JC, Meyer G, Michalowski CB, Schmitt JM, Bohnert HJ (1989) Salt stress leads to differential expression of two isogenes of phosphoenolpyruvate carboxylase during Crassulacean acid metabolism induction in the common ice plant. Plant Cell 1:715–725
9. Flowers TJ, Yeo A (1995) Breeding for salinity resistance in crop plants: where next? Aust J Plant Physiol 22:875–884
10. Kant S, Kant P, Raveh E, Barak S (2006) Evidence that differential gene expression between the halophyte, Thellungiella halophila, and Arabidopsis thaliana is responsible for higher levels of the compatible osmolyte proline and tight control of Na+ uptake in T. halophila. Plant Cell Environ 29:1220–1234
11. Amtmann A (2009) Learning from evolution: Thellungiella generates new knowledge on essential and critical components of abiotic stress tolerance in plants. Mol Plant 2:3–12
12. Al-Shehbaz IA, O’Kane SL (1995) Placement of Arabidopsis parvula in Thellungiella (Brassicaceae). Novon 5:309–310
13. Al-Shehbaz IA, O’Kane SL, Price RA (1999) Generic placement of species excluded from Arabidopsis (Brassicaceae). Novon 9:296–307
14. Zhu JK (2001) Plant salt tolerance. Trends Plant Sci 6:66–71
15. Bressan RA, Zhang C, Zhang H, Hasegawa PM, Bohnert HJ, Zhu JK (2001) Learning from the Arabidopsis experience: the next gene search paradigm. Plant Physiol 127:1354–1360
16. Orsini F, Paino D’Urzo M, Inan G et al (2010) A comparative study of salt tolerance parameters in 11 wild relatives of Arabidopsis thaliana. J Exp Bot 61:3787–3798
17. Wu HJ, Zhang Z, Wang J-Y, Oh DH, Dassanayake M, Liu B, Huang Q, Sun HX, Xia R, Wu Y, Wang Y, Yang Z, Liu Y, Zhang W, Zhang H, Chu J, Yan C, Fang S, Zhang J, Wang Y, Zhang F, Wang G, Lee SY, Cheeseman JM, Yang B, Li B, Min J, Yang L, Wang J, Chu C, Chen SY, Bohnert HJ, Zhu JK, Wang XJ, Xie Q (2012) Insights into salt tolerance from the genome of Thellungiella salsuginea. Proc Natl Acad Sci U S A 109:12219–12224
18. Amtmann A, Bohnert HJ, Bressan RA (2005) Abiotic stress and plant genome evolution. Search for new models. Plant Physiol 138:127–130
19. Inan G, Zhang Q, Pinghua L et al (2004) Salt cress: a halophyte and cryophyte Arabidopsis relative model system and its applicability to molecular genetic analyses of growth and development of extremophiles. Plant Physiol 135:1718–1737
20. Teusink RS, Rahman M, Bressan RA, Jenks MA (2002) Cuticular waxes on Arabidopsis thaliana close relatives Thellungiella halophila and Thellungiella parvula. Int J Plant Sci 163:309–315
21. Gong Q, Li P, Ma S, Rupassara I, Bohnert HJ (2005) Salinity stress adaptation competence in the extremophile Thellungiella halophila in comparison with its relative Arabidopsis thaliana. Plant J 44:826–839
22. Volkov V, Amtmann A (2006) Thellungiella halophila, a salt-tolerant relative of Arabidopsis thaliana, has specific root ion-channel features supporting K+/Na+ homeostasis under salinity stress. Plant J 48:342–353
23. Wang B, Davenport RJ, Volkov V, Amtmann A (2006) Low unidirectional sodium influx into root cells restricts net sodium accumulation in Thellungiella halophila, a salt-tolerant relative of Arabidopsis thaliana. J Exp Bot 57:1161–1170
24. Oh DH, Gong Q, Ulanov A, Zhang Q, Li Y, Ma W, Yun DJ, Bressan RA, Bohnert HJ (2007) Sodium stress in the halophyte Thellungiella halophila and transcriptional changes in a thsos1-RNA interference line. J Integr Plant Biol 49:1484–1496
25. Oh DH, Leidi E, Zhang Q et al (2009) Loss of halophytism by interference with SOS1 expression. Plant Physiol 151:210–222
26. Vera-Estrella R, Barkla BJ, Garcia-Ramirez L, Pantoja O (2005) Salt stress in Thellungiella halophila activates Na+ transport mechanisms required for salinity tolerance. Plant Physiol 139:1507–1517
27. Clough SJ, Bent AF (1998) Floral dip: a simplified method for Agrobacterium-mediated transformation of Arabidopsis thaliana. Plant J 16:735–743
28. Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796–815
29. Volkov V, Wang B, Dominy PJ, Fricke W, Amtmann A (2004) Thellungiella halophila, a salt-tolerant relative of Arabidopsis thaliana, possesses effective mechanisms to discriminate between potassium and sodium. Plant Cell Environ 27:1–14
30. Taji T, Seki M, Satou M, Sakurai T, Kobayashi M, Ishiyama K, Narusaka Y, Narusaka M, Zhu JK, Shinozaki K (2004) Comparative genomics in salt tolerance between Arabidopsis and Arabidopsis-related halophyte salt cress using Arabidopsis microarray. Plant Physiol 135:1697–1709
31. Wong CE, Li Y, Whitty BR, Diaz-Camino C, Akhter SR, Brandle JE, Golding GB, Weretilnyk EA, Moffatt BA, Griffith M (2005) Expressed sequence tags from the Yukon ecotype of Thellungiella reveal that gene expression in response to cold, drought and salinity shows little overlap. Plant Mol Biol 58:561–574
32. Wong CE, Li Y, Labbe A, Guevara D et al (2006) Transcriptional profiling implicates novel interactions between abiotic stress and hormonal responses in Thellungiella, a close relative of Arabidopsis. Plant Physiol 140:1437–1450
33. Hu TT, Pattyn P, Bakker EG, Cao J et al (2011) The Arabidopsis lyrata genome sequence and the basis of rapid genome size change. Nat Genet 43:476–481
34. Dassanayake M, Oh DH, Haas JS et al (2011) The genome of the extremophile crucifer Thellungiella parvula. Nat Genet 43:913–918
35. Wang X, Wang H, Wang J et al (2011) The genome of the mesopolyploid crop species Brassica rapa. Nat Genet 43:1035–1040
36. Oh DH, Dassanayake M, Haas JS et al (2010) Genome structures and halophyte-specific gene expression of the extremophile Thellungiella parvula in comparison with Thellungiella salsuginea (Thellungiella halophila) and Arabidopsis. Plant Physiol 154:1040–1052
37. Kurtz S, Phillippy A, Delcher AL et al (2004) Versatile and open software for comparing large genomes. Genome Biol 5:R12
38. Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA (2009) Circos: an information aesthetic for comparative genomics. Genome Res 19:1639–1645
39. Darling AC, Mau B, Blattner FR, Perna NT (2004) Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res 14:1394–1403
40. Aksoy A, Hale WHG, Dixon JM (1999) Capsella bursa-pastoris L. Medic. as a biomonitor of heavy metals. Sci Total Environ 226:177–186
41. Madejon P, Murillo JM, Maranon T, Valdes B, Rossini Oliva S (2005) Thallium accumulation in floral structures of Hirschfeldia incana (L.) Lagreze-Fossat (Brassicaceae). Bull Environ Contam Toxicol 74:1058–1064
42. Gisbert C, Clemente R, Navarro-Avino J, Baixauli C, Giner A, Serrano R, Walker DJ, Bernal MP (2006) Tolerance and accumulation of heavy metals by Brassicaceae species grown in contaminated soils from Mediterranean regions of Spain. Environ Exp Bot 56:19–27
43. Jimenez-Ambriz G, Petit C, Bourrie I, Dubois S, Olivieri I, Ronce O (2007) Life history variation in the heavy metal tolerant plant Thlaspi caerulescens growing in a network of contaminated and noncontaminated sites in southern France: role of gene flow, selection and phenotypic plasticity. New Phytol 173:199–215
44. Bailey CD, Koch MA, Mayer M, Mummenhoff K, O’Kane SL Jr, Warwick SI, Windham MD, Al-Shehbaz IA (2006) Toward a global phylogeny of the Brassicaceae. Mol Biol Evol 23:2142–2160
45. Popay AI, Roberts EH (1978) Factors involved in the dormancy and germination of Capsella bursa-pastoris (L.) Medik. and Senecio vulgaris L. J Ecol 58:103–122
46. Pedras MSC, Montaut S, Zaharia IL, Gai Y, Ward DE (2003) Transformation of the host-selective toxin destruxin B by wild crucifers: probing a detoxification pathway. Phytochemistry 64:957–963
47.
Johnston SJ, Pepper AE, Hall AE, Jeffrey Chen Z, Hodnett G, Drabek J, Lopez R, James Price H (2005) Evolution of genome size in Brassicaceae. Ann Bot 95:229–235
48. Dittmer HJ (1949) Root hair variations in plant species. Am J Bot 36:152–155
49. Muller K, Tintelnot S, Leubner-Metzger G (2006) Endosperm-limited Brassicaceae seed germination: abscisic acid inhibits embryo-induced endosperm weakening of Lepidium sativum (cress) and endosperm rupture of cress and Arabidopsis thaliana. Plant Cell Physiol 47:864–877
50. Santin-Montanya I, Alonso-Prados JL, Villarroya M, García-Baudin JM (2006) Bioassay for determining sensitivity to sulfosulfuron on seven plant species. J Environ Sci Health B 41:781–793
51. Weigel D, Ahn JH, Blazquez MA et al (2000) Activation tagging in Arabidopsis. Plant Physiol 122:1003–1013
52. Wang Z, Li P, Fredricksen M, Gong Z et al (2004) Expressed sequence tags from Thellungiella halophila, a new model to study plant salt-tolerance. Plant Sci 166:609–616
53. Taji T, Sakurai T, Mochida K et al (2008) Large-scale collection and annotation of full-length enriched cDNAs from a model halophyte, Thellungiella halophila. BMC Plant Biol 8:115
54. Zhang Y, Lai J, Sun S, Li Y, Liu Y, Liang L, Chen M, Xie Q (2008) Comparison analysis of transcripts from the halophyte Thellungiella halophila. J Integr Plant Biol 50:1327–1335
55. Wang W, Wu Y, Li Y et al (2010) A large insert Thellungiella halophila BIBAC library for genomics and identification of stress tolerance genes. Plant Mol Biol 72:91–99
56. Schranz ME, Lysak MA, Mitchell-Olds T (2006) The ABC’s of comparative genomics in the Brassicaceae: building blocks of crucifer genomes. Trends Plant Sci 11:535–542
57. Lysak MA, Koch MA (2011) Phylogeny, genome and karyotype evolution of crucifers (Brassicaceae). In: Schmidt R, Bancroft I (eds) Genetics and genomics of the Brassicaceae. Springer, New York
58.
Mandáková T, Lysak MA (2008) Chromosomal phylogeny and karyotype evolution in x = 7 crucifer species (Brassicaceae). Plant Cell 20:2559–2570
59. Lyons E, Pedersen B, Kane J et al (2008) Finding and comparing syntenic regions among Arabidopsis and the outgroups papaya, poplar, and grape: CoGe with rosids. Plant Physiol 148:1772–1781
60. Lyons E, Freeling M (2008) How to usefully compare homologous plant genes and chromosomes as DNA sequences. Plant J 53:661–673
61. Ansell SW, Stenøien HK, Grundmann M, Schneider H, Hemp A, Bauer N, Russell SJ, Vogel JC (2010) Population structure and historical biogeography of European Arabidopsis lyrata. Heredity 105(6):543–553
62. Al-Shehbaz IA, O’Kane SL (2002) Taxonomy and phylogeny of Arabidopsis (Brassicaceae). In: Somerville CR, Meyerowitz EM (eds) The Arabidopsis book. American Society of Plant Biologists, Rockville, MD, pp 1–22
63. Mitchell-Olds T (2006) Genetic mechanisms and evolutionary significance of natural variation in Arabidopsis. Nature 441:947–952
64. Ratcliffe DA (1994) Arabis petraea. In: Stewart A, Pearman DA, Preston CD (eds) Scarce plants of the British Isles. JNCC, Peterborough, p 51
65. Sandring S, Ågren J (2009) Pollinator-mediated selection on floral display and flowering time in the perennial herb Arabidopsis lyrata. Evolution 63:1292–1300
66. Thrall PH, Young AG, Burdon JJ (2000) An analysis of mating structure in populations of the annual sea rocket, Cakile maritima (Brassicaceae). Aust J Bot 48:731–738
67. Barbour MG (1972) Seedling establishment of Cakile maritima at Bodega Head, California. Bull Torrey Bot Club 99:11–16
68. Maun MA, Lapierre J (1986) Effects of burial by sand on seed germination and seedling emergence of four dune species. Am J Bot 73:450–455
69. Barbour MG (1970) Germination and early growth of the strand plant Cakile maritima. Bull Torrey Bot Club 97:13–22
Chapter 3
Growing Arabidopsis In Vitro: Cell Suspensions, In Vitro Culture, and Regeneration
Bronwyn J. Barkla, Rosario Vera-Estrella, and Omar Pantoja
Abstract
An understanding of basic methods in Arabidopsis tissue culture is beneficial for any laboratory working on this model plant. Tissue culture refers to the aseptic growth of cells, organs, or plants in a controlled environment in which physical, nutrient, and hormonal conditions can all be easily manipulated and monitored. The methodology facilitates the production of a large number of genetically identical plants over a relatively short growth period. Techniques including callus production, cell suspension culture, and plant regeneration are all indispensable tools for the study of cellular, biochemical, and molecular processes. Plant regeneration is a key technology for successful stable plant transformation, while cell suspension cultures can be exploited for metabolite profiling and mining. In this chapter we report methods for the successful and highly efficient in vitro regeneration of plants and production of stable cell suspension lines from leaf explants of both Arabidopsis thaliana and Arabidopsis halleri.
Key words Callus, Cell suspensions, Plant regeneration, Tissue culture, Arabidopsis, Organ regeneration
1 Introduction
Plant tissue culture is an indispensable tool for the study of cellular, biochemical, and molecular processes and a key technology for successful stable plant transformation. The term in vitro, Latin for “in glass,” refers to the glass vessels in which cultures were grown; it probably came into use among embryologists at the end of the nineteenth century. The earliest attempts at tissue culture of plant cells were made in the first decade of the 1900s by the Austrian botanist Haberlandt, who published his work in German (translated into English in ref. 1).
However, it was not until 30 years later, following the discovery of plant growth regulators, that the inclusion of auxins in the medium made it possible to cultivate plant tissue in an aseptic environment for an indefinite length of time [2–4]. Further advances in nutrient and micronutrient content, plant growth regulator (PGR) discovery, and manipulation of PGR ratios have all dramatically improved the efficiency and versatility of the technique, bringing us to where we are today, with the ability to cultivate callus, cell suspensions, protoplasts, and organs, and to regenerate whole plants.
Jose J. Sanchez-Serrano and Julio Salinas (eds.), Arabidopsis Protocols, Methods in Molecular Biology, vol. 1062, DOI 10.1007/978-1-62703-580-4_3, © Springer Science+Business Media New York 2014
Culturing techniques provide a tightly controlled, closed growth system while facilitating the manipulation of experimental conditions. Physical, nutritional, and hormonal states can all be easily regulated in the closed system, reducing variability and extraneous factors. Generating plant material in this manner provides a homogeneous, genetically identical pool that, through subculturing, can yield large quantities of experimental material over very short time frames, while the sterile growth conditions ensure that the material is free from pathogenic microorganisms.
In vitro culture to produce cell suspensions or regenerate plants begins with the selection of explant material. The explant is a piece of highly differentiated tissue (e.g., leaf pieces) harvested from the plant, which is sterilized and placed on an artificial, nutrient- and vitamin-rich, PGR-supplemented growth medium.
The wounding of the tissue and the presence of specific amounts of PGR induce somatic embryogenesis: the cells of the explant begin to revert to a meristematic state, dividing rapidly and dedifferentiating to form a mass of unorganized cells called callus. These dedifferentiated cells can either be maintained indefinitely as callus, through subculturing to prevent nutrient deficiency, or, once the cells of the callus become less packed and more friable, be transferred to liquid medium, where they dissociate into single cells to generate stable cell suspension cultures. Undifferentiated callus or suspension cells can be stimulated by PGR to initiate organogenesis, exploiting the ability of all plant cells, owing to their genetic potential, to dedifferentiate and then, under defined conditions, redifferentiate to form any plant organ, a phenomenon known as totipotency [5]. Particularly important in organogenesis is the ratio of the PGRs cytokinin and auxin. Typically, a high ratio of cytokinin to auxin results in shoot differentiation, whereas a high ratio of auxin to cytokinin induces root differentiation [6]. However, some plants are exceptions to this rule.
The first reports of Arabidopsis callus culture date back to the 1960s [7]. These were followed by articles detailing methods for cell suspensions as well as organ and plant regeneration [8, 9]. This early methodology was applied nearly a decade later to the regeneration of whole plants from Agrobacterium tumefaciens-transformed Arabidopsis leaf explants [10]. Early contributions to the field such as these led to the Arabidopsis molecular revolution that continues to this day and has established this small, unassuming weed as an unrivalled model plant system [11].
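To make the cytokinin-to-auxin argument concrete for the media used in this chapter, the Python sketch below (an illustration, not part of the original protocol) takes the 2,4-D (auxin) and BA (cytokinin) concentrations listed for M1, M2, and M3 under Materials and the 1 mg/mL stock concentration given in Note 4, and computes the volume of each stock needed per liter together with the BA:2,4-D ratio. Mass rather than molar ratios are used; the two compounds have similar molecular weights (about 225 and 221 g/mol), so the ranking is unaffected. The short medium labels in the dictionary are added for readability.

```python
"""PGR stock volumes and cytokinin:auxin ratios for media M1-M3 (illustrative sketch)."""
from typing import Dict

STOCK_MG_PER_ML = 1.0  # growth regulator stocks are 1 mg/mL (Note 4)

# Final concentrations in mg/L, as listed in the Materials section.
MEDIA: Dict[str, Dict[str, float]] = {
    "M1 (callus induction)": {"2,4-D": 1.0, "BA": 0.05},
    "M2 (shoot regeneration)": {"2,4-D": 0.5, "BA": 0.1},
    "M3 (root regeneration)": {"2,4-D": 0.3, "BA": 0.0},
}


def stock_volume_ul(final_mg_per_l: float, medium_l: float = 1.0,
                    stock_mg_per_ml: float = STOCK_MG_PER_ML) -> float:
    """Microliters of stock for `medium_l` liters: mg needed / (mg/mL) * 1000."""
    return final_mg_per_l * medium_l / stock_mg_per_ml * 1000.0


def cytokinin_auxin_ratio(pgrs: Dict[str, float]) -> float:
    """BA : 2,4-D mass ratio; by the general rule, higher favors shoots, lower roots."""
    return pgrs["BA"] / pgrs["2,4-D"]


if __name__ == "__main__":
    for name, pgrs in MEDIA.items():
        volumes = ", ".join(f"{pgr} {stock_volume_ul(c):.0f} uL/L" for pgr, c in pgrs.items())
        print(f"{name}: {volumes}; BA:2,4-D = {cytokinin_auxin_ratio(pgrs):.2f}")
```

Run as written, it reports ratios of 0.05, 0.20, and 0.00 for M1, M2, and M3, so the shoot regeneration medium has the highest relative cytokinin content and the rooting medium the lowest, in line with the rule stated above.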
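Subculturing yields material quickly only while cells remain in exponential growth, and Note 12 recommends constructing a growth curve to determine the doubling time of each suspension line. One common way to extract it, sketched below in Python as an illustration rather than the authors' procedure, is to fit ln(biomass) against time over the log-phase points by least squares; the doubling time is then ln 2 divided by the fitted slope. "Biomass" here is any measure assumed to scale with cell number (for example fresh weight or packed cell volume); the protocol itself does not specify which measure to use.

```python
"""Doubling time of a cell suspension from log-phase growth-curve data (sketch)."""
import math
from typing import Sequence


def doubling_time(times: Sequence[float], biomass: Sequence[float]) -> float:
    """Return the doubling time in the units of `times`.

    Fits ln(biomass) = a + r * t by ordinary least squares and returns ln(2) / r.
    Only points from the exponential (log) phase should be supplied.
    """
    if len(times) != len(biomass) or len(times) < 2:
        raise ValueError("need at least two paired measurements")
    if any(b <= 0 for b in biomass):
        raise ValueError("biomass values must be positive")
    n = len(times)
    logs = [math.log(b) for b in biomass]
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    rate = sxy / sxx  # specific growth rate per time unit
    if rate <= 0:
        raise ValueError("no net growth over the supplied interval")
    return math.log(2) / rate
```

For instance, hypothetical readings that double every 2 days, doubling_time([0, 2, 4, 6], [1, 2, 4, 8]), return 2.0 up to floating-point rounding. Fitting all log-phase points, rather than only the first and last, reduces the influence of a single noisy measurement.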
Here we report methods for the efficient and successful regeneration of plants and production of stable cell suspension lines from dedifferentiated callus produced from leaf explants of both Arabidopsis thaliana and Arabidopsis halleri plants (see Fig. 1).
Fig. 1 Schematic diagram of the steps and time involved in the production of callus cultures, cell suspension cultures, and regenerated plants from Arabidopsis:
Step 1 (callus induction): sterilize Arabidopsis tissue and section to obtain explant material.
Step 2: place explants on sterile solid culture medium to induce callus (4 weeks in the dark at 25 °C).
Step 3: select friable callus and replate onto fresh sterile solid medium (2 weeks in the dark at 25 °C).
Step 4 (cell suspensions): place friable callus into 100 mL sterile liquid medium in a 500 mL Erlenmeyer flask (2 weeks shaking at 150 rpm in the dark at 25 °C).
Step 5: subculture weekly by transferring a 30 mL aliquot into 90 mL of sterile liquid medium (1 week shaking at 150 rpm in the dark at 25 °C).
Step 6 (shoot regeneration): friable callus is transferred to sterile shoot regeneration medium (2 weeks in the light at 25 °C).
Step 7 (root regeneration): regenerated shoots are transferred to rooting medium (2 weeks in the light at 25 °C).
Step 8: regenerated plants are transferred to pots containing soil in a greenhouse under natural light at 25 °C.
2 Materials
1. Dry seeds of A. halleri and A. thaliana.
2. 100 mm diameter pots.
3. MetroMix 510 soilless mixture combined with perlite (3:1).
4. 90 % (v/v) ethanol.
5. 70 % (v/v) ethanol.
6. 5 % sodium hypochlorite solution (bleach).
7. Sterile deionized water.
8. Microcentrifuge.
9. Petri plates, 9 cm.
10. Magenta tissue culture boxes.
11. Pair of forceps.
12. Scalpel and sterile scalpel blades.
13. Bunsen burner.
14. Gamborg’s B5 vitamins [12]: 10 mg/L thiamine hydrochloride, 1 mg/L nicotinic acid, 1 mg/L pyridoxine hydrochloride, 100 mg/L myoinositol.
Prepare as a 100× stock solution in sterile deionized water, and add to the medium before autoclaving. The stock can be stored at 4 °C.
15. Basic Murashige and Skoog (MS) medium [13] (see Table 1 and Notes 1–3). Stock solutions 1–6 are each prepared in a total volume of 100 mL.
16. Medium 1 (M1): basic MS medium supplemented with Gamborg’s B5 vitamins, 3 % sucrose, 1.5 % bacteriological agar, 1 mg/L 2,4-D, and 0.05 mg/L benzylaminopurine (BA) (see Note 4). Adjust to pH 5.7 with 1 N KOH.
17. Shoot regeneration medium, M2: basic MS medium supplemented with Gamborg’s B5 vitamins, 3 % sucrose, 1.5 % bacteriological agar, 0.5 mg/L 2,4-D, and 0.1 mg/L BA.
18. Root regeneration medium, M3: half-strength MS medium supplemented with half-strength Gamborg’s B5 vitamins (5 mg/L thiamine hydrochloride, 0.5 mg/L nicotinic acid, 0.5 mg/L pyridoxine hydrochloride, 50 mg/L myoinositol), 1 % sucrose, 1 % bacteriological agar, and 0.3 mg/L 2,4-D. Adjust the pH to 5.7 with 1 N KOH.
19. Hoagland and Arnon [14] hydroponic stock solutions: 1 M (NH4)2HPO4 (add 1 mL/L), 1 M KNO3 (add 6 mL/L), 1 M Ca(NO3)2⋅4H2O (add 4 mL/L), MgSO4⋅7H2O (add 2 mL/L), 1 mL/L of micronutrient stock (2.85 g/L H3BO3, 2.44 g/L MnCl2⋅4H2O, 0.22 g/L ZnSO4⋅7H2O, 0.08 g/L CuSO4⋅5H2O, 0.02 g/L H2MoO4⋅H2O), and 10 mL/L of Fe2+ stock solution (0.4129 g/L Na2EDTA⋅2H2O and 0.278 g/L FeSO4⋅7H2O) (see Note 5).
20. Suspension culture medium (S1). This medium is the same as M1 but does not contain agar.
21. Sterile 500 mL Erlenmeyer flasks with cotton plugs.
22. Pipette pump and sterile 10 mL pipettes.
23. 1 mL micropipette and sterile tips.
24. Aluminum foil.
25. Parafilm.
26. Transparent plastic cups.
27. Rotary shaker set at 150 rpm with temperature control.
28. Growth chamber set at 25 °C.
29. Balance.
30. Microscope and microscope slides.
31. Filtration unit and sterile filters (0.22 μm).
32. Oven.
33. Autoclave (120 °C, 20 min).
34. Laminar flow hood/cabinet.
35. Neutral red stock solution (4 mg/mL in deionized water).
Table 1 Murashige and Skoog stock solutions and amounts added to prepare 1 L of basic MS medium (see Note 3). Amounts are in g per 100 mL of stock, followed by the volume of stock added per liter of medium.
Stock solution 1: NH4NO3 16.5; KNO3 19.0; MgSO4⋅7H2O 3.7; KH2PO4 1.7. Add 10 mL/L.
Stock solution 2: CaCl2 4.4. Add 10 mL/L.
Stock solution 3: Na2EDTA⋅2H2O 0.37; FeSO4⋅7H2O 0.28. Add 10 mL/L.
Stock solution 4: H3BO3 0.062; MnSO4⋅H2O 0.169; ZnSO4 0.086. Add 10 mL/L.
Stock solution 5: KI 0.083; Na2MoO4⋅2H2O 0.025. Add 1.0 mL/L.
Stock solution 6: CuSO4⋅5H2O 0.025; CoCl2⋅6H2O 0.025. Add 0.1 mL/L.
3 Methods
3.1 Working in a Sterile Transfer Hood
A sterile working environment is critical to prevent contamination of the plant tissue and medium by microorganisms (bacteria and fungi). If contamination does occur, these microorganisms will rapidly colonize the medium because of its high sugar and nutrient content and will destroy the plant material.
1. Clean all surfaces inside the laminar flow hood/cabinet with 70 % ethanol and allow to air-dry. Make sure no Bunsen burner is lit during this step.
2. Place only the necessary material and tools inside the hood/cabinet, and remove each item when it is no longer needed. All material should be sterile, and tools should be clean and preferably wiped with 70 % ethanol.
3. Work with your arms extended into the hood/cabinet and your head and body outside. Try to use only the back third of the hood, as this is the most sterile area; do not obstruct the HEPA air filter with material or body parts, as this will disturb the laminar air flow and may result in contamination.
3.2 Method for Callus Production
1. Seeds are sown in 100 mm diameter pots in MetroMix 510 combined with perlite (3:1), and plants are grown for 4 weeks (A. thaliana) or 10 weeks (A. halleri), at which time their leaves are used to establish axenic cell cultures as follows.
2. Leaves are washed by immersion in 90 % (v/v) ethanol in a sterile Petri plate for 1 min, followed by three rinses with sterile deionized water.
3.
Leaves are then surface-sterilized by immersion in a 5 % sodium hypochlorite solution for 10 min with gentle mixing.
4. Remove the sodium hypochlorite using a sterile 1 mL pipette tip and rinse the leaves five times with sterile deionized water.
5. After removing the water, cut the sterile leaves into 2–4 small pieces with a sterile scalpel blade in a sterile Petri plate.
6. The sterile leaf sections are placed abaxial side down onto medium M1, and the Petri plates are sealed with Parafilm, covered with aluminum foil, and incubated at 25 °C.
7. After 4 weeks, friable calli are obtained (see Fig. 2a, b and Note 6).
3.3 Method for Regeneration of Plants
1. Friable callus (0.5 g) obtained as described in Subheading 3.2 is transferred aseptically to shoot regeneration medium M2 in 9 cm Petri plates and incubated under a 16-h day length with a photon flux density of 350 μmol m−2 s−1 at 25 °C for 4 weeks.
Fig. 2 In vitro culture of Arabidopsis. Callus tissue is generated from leaf explants of Arabidopsis halleri (a) and Arabidopsis thaliana (b). Once friable callus is produced, it is transferred aseptically to shoot regeneration medium (c), followed by root regeneration medium (d). The final step is the transfer of the fully regenerated plants to the greenhouse (e). Friable callus can also be used to generate stable cell suspension cultures (f).
2. Callus is monitored biweekly for shoot formation. Once shoots with 2–3 leaf pairs have developed on the calli (see Fig. 2c), they are selected and transferred together with the callus to root regeneration medium M3 in deep, transparent, sterile containers such as Magenta boxes (see Note 7), and placed in a growth room under a 16-h day length with a photon flux density of 350 μmol m−2 s−1 at 25 °C.
3. Root organogenesis is monitored until an abundant root system has formed and the elongated shoots are approximately 2 cm tall (see Fig. 2d).
This takes an additional 4 weeks.
4. Plants can be transferred either to dark hydroponic containers (to avoid algal growth) containing 0.5× Hoagland and Arnon solution or to soil in 100 mm pots, under natural light and humidity conditions in a greenhouse maintained at 25 °C (see Fig. 2e and Notes 8 and 9).

3.4 Method for Establishment of Cell Suspension Cultures

1. Aseptically transfer the friable calli (0.5 g) obtained as described in Subheading 3.2 into 100 mL of sterile S1 medium in a 500 mL Erlenmeyer flask, and swirl the flask to break up the callus tissue into small pieces.
2. Place the flask on a shaker with continuous shaking (150 rpm) in the dark at 25 °C (see Note 10).
3. After 2 weeks, aseptically transfer 10 mL of cells into a sterile 500 mL Erlenmeyer flask containing 90 mL of fresh S1 medium, using a sterile 10 mL glass pipette connected to a pipette pump. Place the flask back on the shaker (see Note 11).
4. Subculture the cell suspensions every 7 days to maintain the cells in the log phase of growth (see Fig. 2f and Note 12).

4 Notes

1. Making MS medium from scratch as described in Table 1 is more economical and allows easier manipulation of nutrients. However, MS medium can also be purchased in pre-weighed packets from several suppliers.
2. CaCl₂ will precipitate if added to stock 1. Therefore, prepare it as an individual stock as indicated in Table 1.
3. All MS medium stock solutions are stored at 4 °C, except solution 1, which is kept at room temperature to prevent it from solidifying at the colder temperature.
4. Stocks of growth regulators are prepared at 1 mg/mL in sterile deionized water and added before autoclaving. These stocks are stored at 4 °C.
5. Hoagland's solutions are sterilized for 20 min at 120 °C, except the Ca(NO₃)₂ and Fe²⁺ solutions, which must be sterilized by filtration.
In addition, the Fe²⁺ solution must be heated before filter sterilization to oxidize the ferrous iron.
6. Calli are considered friable when the cells separate easily from the mass and are no longer dense and compacted.
7. Magenta tissue culture boxes are commonly used, but glass baby food jars with lids that can be sterilized are an economical replacement.
8. To avoid rapid dehydration and plant stress, newly transferred plantlets need to acclimatize to lower humidity levels and should therefore be covered with small transparent plastic cups to maintain adequate humidity. Small holes can be punched into the covers to gradually decrease the humidity to that of the atmosphere over a period of 1 week. This ensures a 100 % survival rate of the regenerated plants.
9. Arabidopsis plants grown in hydroponics do not require aeration of the roots.
10. Every 2 days, remove a small aliquot of cells from the culture and check cell viability under a microscope. Viability can be assessed with a drop of neutral red dye: viable cells accumulate the dye.
11. After removing the plug, flame-sterilize the mouth of the Erlenmeyer flask containing the fresh medium; this creates an upward draft of hot air that directs particles away from the opening. Repeat this after adding the 10 mL of cells and before replacing the plug.
12. Culturing the cell suspensions for more than 8 days results in rapid browning and cell death, as nutrient availability diminishes with culture time. We recommend constructing a growth curve to determine the doubling time of the cell culture, because cell suspension growth varies between species and cultivars.

Acknowledgments

Work in the authors' lab is funded by DGAPA IN203913 to B.J.B., DGAPA 203711 to R.V.-E., and DGAPA 203112 and CONACyT IN79191 to O.P.

References

1. Krikorian AD, Berquam DL (1969) Plant cell and tissue cultures: the role of Haberlandt. Bot Rev 35:59–67
2. Gautheret RJ (1937) Nouvelles recherches sur la culture du tissu cambial. C R Hebd Seances Acad Sci 205:572–574
3. Nobécourt P (1937) Culture en serie de tissus vegetaux sur milieu artificiel. C R Hebd Seances Acad Sci 20:521–523
4. White PR (1939) Potentially unlimited growth of excised plant callus in an artificial medium. Am J Bot 26:59–64
5. Steward FC (1958) Growth and organized development of cultured cells. II. Organization in cultures grown from freely suspended cells. Am J Bot 45:705–708
6. Brown JT, Charlwood BV (1990) Organogenesis in callus culture. Methods Mol Biol 6:65–70
7. Loewenberg JR (1965) Callus cultures of Arabidopsis. Arabidopsis Inf Serv 2:34
8. Negrutiu I, Beeftink F, Jacobs M (1975) Arabidopsis thaliana as a model system in somatic cell genetics I. Cell and tissue culture. Plant Sci Lett 5:293–304
9. Negrutiu I, Jacobs M (1975) Arabidopsis thaliana as a model system in somatic cell genetics II. Cell suspension culture. Plant Sci Lett 8:7–15
10. Lloyd AM, Barnason AR, Rogers SG, Byrne MC, Fraley RT, Horsch RB (1986) Transformation of Arabidopsis thaliana with Agrobacterium tumefaciens. Science 234:464–466
11. Leonelli S (2007) Arabidopsis, the botanical Drosophila: from mouse cress to model organism. Endeavour 31:34–38
12. Gamborg O, Miller R, Ojima K (1968) Nutrient requirements of suspension cultures of soybean root cells. Exp Cell Res 50:151–158
13. Murashige T, Skoog F (1962) A revised medium for rapid growth and bioassay with tobacco tissue cultures. Physiol Plant 15:473–479
14. Hoagland DR, Arnon DI (1938) The water culture method for growing plants without soil. Calif Agric Exp Station Circ 347:1–39

Part II Arabidopsis Resources

Chapter 4

Arabidopsis Database and Stock Resources

Donghui Li, Kate Dreher, Emma Knee, Jelena Brkljacic, Erich Grotewold, Tanya Z.
Berardini, Philippe Lamesch, Margarita Garcia-Hernandez, Leonore Reiser, and Eva Huala

Abstract

The volume of Arabidopsis information has increased enormously in recent years as a result of the sequencing of the reference genome and other large-scale functional genomics projects. Much of the data is stored in public databases, where data are organized, analyzed, and made freely accessible to the research community. Researchers can use these databases to make predictions and develop testable hypotheses. The methods in this chapter describe ways to access and use Arabidopsis data and genomic resources found in databases and stock centers.

Key words Data mining, Database, Genomics, Gene expression, Bioinformatics, Computational biology, Stocks, Arabidopsis thaliana

1 Introduction

Arabidopsis thaliana serves as the primary model system for many aspects of plant biology. It was the first plant to have its entire nuclear genome sequenced [1]. Following the completion of the Arabidopsis genome sequence in 2000, the international Arabidopsis community set the ambitious goal of determining the function of every Arabidopsis gene by the year 2010 [2]. Numerous laboratories worldwide have taken part in this effort (the Multinational Coordinated Arabidopsis thaliana Functional Genomics Project), generating large amounts of data about gene function, expression, metabolism, and protein and gene interactions. To organize and manage these data, lab consortia and individual labs have created databases that store the information and make it available to the research community. Community resources such as genome-wide DNA clone collections and knockout mutant libraries (e.g., SALK T-DNA insertion lines) were also created [3]. There are now extensive tools and resources for the storage, curation, and retrieval of Arabidopsis data and of DNA and seed stocks. Scientists doing research in this "postgenomic" era need to know how to use these resources to find the information and stocks relevant to their research.

Jose J. Sanchez-Serrano and Julio Salinas (eds.), Arabidopsis Protocols, Methods in Molecular Biology, vol. 1062, DOI 10.1007/978-1-62703-580-4_4, © Springer Science+Business Media New York 2014 65

66 Donghui Li et al.

In this chapter, we describe how to use databases to find what is known about Arabidopsis genes and to make inferences and predictions that can later be tested experimentally. For each method, we include a summary of the rationale, a brief description of the database/tool(s), and the specific steps for querying, retrieving, and interpreting the data. Methods for searching and ordering DNA or seed stocks are also provided. The methods, along with the corresponding databases and tools, are outlined in Table 1, which can be used as a table of contents to find specific methods of interest within the chapter.

The databases described here represent a small portion of the vast collection of databases and bioinformatics resources available to Arabidopsis researchers. In this chapter, we focus on well-developed resources that provide comprehensive Arabidopsis data (including stocks), such as TAIR (The Arabidopsis Information Resource) [4–8] and ABRC (Arabidopsis Biological Resource Center) [9]. Many other databases focus on specific types of Arabidopsis information, such as subcellular localization (SUBA: SUBcellular location database for Arabidopsis proteins, http://suba.plantenergy.uwa.edu.au/) [10], whereas others focus on specific classes of genes or disseminate data from a functional genomics project, e.g., the Chloroplast 2010 database (http://www.plastid.msu.edu/) [11]. Many links to these external resources and to the US National Science Foundation 2010 Arabidopsis functional genomics project pages (http://www.arabidopsis.org/portals/masc/projects.jsp) are provided on the TAIR Portal pages (http://www.arabidopsis.org/portals/).
There is also an ongoing effort, led by the proposed International Arabidopsis Informatics Consortium, to integrate all Arabidopsis database resources [12]. In addition to databases entirely devoted to Arabidopsis ("Arabidopsis specific"), there are numerous multi-species databases that contain Arabidopsis data along with information about other organisms, such as the National Center for Biotechnology Information's (NCBI) GenBank (http://www.ncbi.nlm.nih.gov/genbank/), the European Bioinformatics Institute's (EBI) InterPro (http://www.ebi.ac.uk/interpro/), UniProt (http://www.uniprot.org/), and PlantGDB (http://www.plantgdb.org/), to name a few. Some of these databases are listed in Table 1. This chapter does not intend to cover all these databases in depth; instead, we hope it will serve as a good starting point for anyone who wishes to explore these valuable resources.

Arabidopsis seed and DNA stocks and other biological materials can be obtained from a number of different institutions around the world. These stock centers provide different kinds of materials and different levels of service.
Table 1 Selected Arabidopsis databases and stock resources

Database: tool | URL | Protocol/description

Finding comprehensive information about Arabidopsis genes
TAIR: Gene Search | http://www.arabidopsis.org/servlets/Search?action=new_search&type=gene | 3.1.1. Finding genes in TAIR by name
NCBI: Gene Search | http://www.ncbi.nlm.nih.gov/gene/ | Finding genes in NCBI's Reference Genome Collection. Search by locus identifiers, symbol, etc.
Ensembl Plants Genome Browser | http://plants.ensembl.org/Arabidopsis_thaliana/Info/Index | Multi-species plant genome database providing access to Arabidopsis and other plant genomic data
SUBA (SUBcellular location database for Arabidopsis proteins) | http://suba.plantenergy.uwa.edu.au/ | Finding protein subcellular location information
TAIR: NBrowse | http://www.arabidopsis.org/tools/nbrowse.jsp | Finding protein–protein interaction information
BioGRID (Biological General Repository for Interaction Datasets) | http://thebiogrid.org/ | Finding protein–protein interaction information
MIND (Membrane protein interaction database) | http://www.associomics.org/Associomics/Home.html | Finding Arabidopsis membrane interactome data

Finding gene sequence and structure data
TAIR: Sequence Bulk Download and Analysis | http://www.arabidopsis.org/tools/bulk/sequences/index.jsp | 3.2.1. Retrieving DNA and protein sequence data
TAIR: GBrowse | http://gbrowse.tacc.utexas.edu/cgi-bin/gb2/gbrowse/arabidopsis/ | 3.2.2. GBrowse
TAIR: WU-BLAST | http://www.arabidopsis.org/wublast/index2.jsp | 3.2.3. Finding related DNA or protein sequences
TAIR: Bulk Protein Download | http://www.arabidopsis.org/tools/bulk/protein/index.jsp | 3.2.4. Finding protein structure and domain information
1001 Genomes | http://1001genomes.org/ | Arabidopsis thaliana genetic variation database
InterPro | http://www.ebi.ac.uk/interpro/ | Finding predicted protein signature (domain) information
Phytozome | http://www.phytozome.net/ | Comparative genomic database providing access to 25 green plant genomes which have been clustered into gene families

Finding Gene Ontology (GO) annotations
TAIR: Gene Ontology Annotations Search | http://www.arabidopsis.org/tools/bulk/go/index.jsp | 3.3.1. Finding GO annotations
TAIR: Keyword Search | http://www.arabidopsis.org/servlets/Search?action=new_search&type=keyword | 3.3.2. Finding genes annotated to related functions or processes
AmiGO | http://amigo.geneontology.org/cgi-bin/amigo/go.cgi | Searching the Gene Ontology database

Finding information about gene expression
TAIR: Plant Ontology Search | http://www.arabidopsis.org/tools/bulk/po/index.jsp | 3.4.1. Finding Plant Ontology annotations
TAIR: Microarray Expression Search | http://www.arabidopsis.org/servlets/Search?action=new_search&type=expression | 3.4.2. Finding DNA microarray data
Plant Ontology Consortium Database | http://www.plantontology.org/ | Searching or browsing PO and PO annotations
NASCArrays | http://affy.arabidopsis.info/narrays/experimentbrowse.pl | Finding microarray data from the European Arabidopsis Stock Centre's microarray database
ArrayExpress | http://www.ebi.ac.uk/arrayexpress/ | European Bioinformatics Institute's microarray database
NCBI GEO (Gene Expression Omnibus) | http://www.ncbi.nlm.nih.gov/geo/ | NCBI's gene expression data repository
Genevestigator | https://www.genevestigator.com | Analysis tool for mining microarray datasets
eFP Browser | http://bar.utoronto.ca/efp/cgi-bin/efpWeb.cgi | Analysis tool for mining microarray datasets

Obtaining information about metabolism in Arabidopsis
AraCyc/PlantCyc/PMN | http://www.plantcyc.org | 3.5. Obtaining information about metabolism in Arabidopsis
KEGG (Kyoto Encyclopedia of Genes and Genomes) | http://www.genome.jp/kegg/ | 3.5. Obtaining information about metabolism in Arabidopsis
Kazusa Plant Pathway Viewer (KaPPA-View4) | http://kpv.kazusa.or.jp | 3.5. Obtaining information about metabolism in Arabidopsis
KNApSAcK | http://kanaya.naist.jp/KNApSAcK/ | 3.5. Obtaining information about metabolism in Arabidopsis
MetNet | http://metnetonline.org/ | 3.5. Obtaining information about metabolism in Arabidopsis
Arabidopsis Reactome | http://www.arabidopsisreactome.org/ | 3.5. Obtaining information about metabolism in Arabidopsis
MetaCrop | http://metacrop.ipk-gatersleben.de | 3.5. Obtaining information about metabolism in Arabidopsis

Finding and ordering seed and other resources
ABRC: Stock Catalog | http://www.arabidopsis.org/servlets/Order?state=catalog | 3.6. Finding and ordering seed resources from ABRC
TAIR/ABRC: Seed Germplasm Search | http://www.arabidopsis.org/servlets/Search?action=new_search&type=germplasm | 3.6. Finding and ordering seed resources from ABRC
TAIR/ABRC: DNA/Clones Search | http://www.arabidopsis.org/servlets/Search?action=new_search&type=dna | 3.7. Finding and ordering other (non-seed) resources from ABRC
NASC (European Arabidopsis Stock Centre) | http://arabidopsis.info/ | Finding and ordering seed and clone stocks from the European Arabidopsis Stock Centre
RIKEN Biological Resource Center Experimental Plant Division (Japan) | http://www.brc.riken.jp/lab/epd/Eng/catalog/seed.shtml | Providing Arabidopsis transposon-tagged lines and activation tagging lines
French National Institute for Agricultural Research (INRA) Arabidopsis Resource Center for Genomics | http://www-ijpb.versailles.inra.fr/en/cra/cra_accueil.htm | Providing Arabidopsis T-DNA lines (FLAG lines)
Bielefeld University | http://www.gabi-kat.de/ | Providing Arabidopsis T-DNA lines (GABI-Kat lines)
SIGnAL (Salk Institute Genomic Analysis Laboratory): T-DNA Express | http://signal.salk.edu/cgi-bin/tdnaexpress | Finding T-DNA insertion sites

Searching literature databases
NCBI PubMed Database | http://www.ncbi.nlm.nih.gov/pubmed/ | 3.8.1. Finding articles in PubMed
TAIR: Publication Search | http://arabidopsis.org/servlets/Search?action=new_search&type=publication | 3.8.2. Finding publications in TAIR
TAIR: Textpresso Full-Text Search | http://www.textpresso.org/arabidopsis/ | 3.8.3. Searching full-text literature

Submitting your data or DNA/seed stocks
TAIR: Submit Data | http://www.arabidopsis.org/submit/index.jsp; http://www.arabidopsis.org/doc/submit/functional_annotation/123 | 3.9.1. Submitting data to TAIR
PMN: Submit Data | http://www.plantcyc.org/feedback/data_submission.faces | 3.9.2. Submitting data to PMN
ABRC: Stock Donation | https://abrc.osu.edu/donate-stocks | 3.9.3. Donating seed and DNA stocks to ABRC

Database and Stock Resources 67

The Arabidopsis Biological Resource Center (ABRC), located in North America, and the European (Nottingham) Arabidopsis Stock Centre (NASC) are the two largest stock centers and essentially mirror each other's seed collections. The collections of both centers are discussed in more detail in the next section (Subheadings 2.2.2 and 2.2.3). The RIKEN BioResource Center (BRC) Experimental Plant Division in Japan has some unique resources, e.g., lines overexpressing Arabidopsis full-length cDNAs (FOX), and distributes its materials under Material Transfer Agreements (MTAs) (Table 1). The French National Institute for Agricultural Research (INRA) and Bielefeld University in Germany distribute locally developed collections of T-DNA lines (FLAG and GABI-Kat, respectively) [13, 14].
Although both institutions historically restricted distribution by requiring an MTA, these restrictions have since been lifted, either completely or for most of their collections. As with any web-based informatics resource, database content and tools change over time. The protocols described here use tools and data available in databases and stock centers as of December 2011.

2 Materials

Programming experience is an asset for a scientist who wishes to analyze and manipulate large, complex datasets, but it is not essential for mining databases effectively. Anyone with access to the Internet and a reasonably up-to-date computer should be able to perform all the steps in the protocols. A basic familiarity with computers, Internet browsers, and commonly used bioinformatics tools such as BLAST is assumed. A wide variety of textbooks, manuals, and web-based tutorials are available for learning the basics of bioinformatics.

2.1 Computer Hardware and Software for Database Mining

The minimum requirements for database mining are a personal computer (PC), an Internet connection, and web browser software. A high-speed network connection is desirable for faster data access. An up-to-date web browser, such as Internet Explorer, Firefox, or Safari, is also required. Database interfaces should behave the same regardless of the operating system or browser used; however, some functions may not work properly on older browsers. If possible, upgrade your browser to the most recent version that runs on your operating system. Cookies must be enabled in the browser to log in and place stock orders through TAIR, and JavaScript must be enabled because TAIR makes extensive use of it. See http://www.arabidopsis.org/help/index.jsp for information on configuring your browser. Note that other databases mentioned in this chapter may have their own browser requirements.
2.2 Databases and Stock Centers

Databases are software systems for storing and retrieving information. Typically, a database has three components: the database software that stores the data, software that translates and executes requests (queries), and software applications that allow users to view the data. This section describes three commonly used Arabidopsis resources. Additional databases can be found in Table 1.

2.2.1 The Arabidopsis Information Resource

TAIR (http://www.arabidopsis.org) is a comprehensive web resource for the biology of A. thaliana [4–8]. It provides a centralized, curated gateway to Arabidopsis biology, research materials, and community members. Data available from TAIR include the complete Arabidopsis genome sequence along with gene structure, gene product information, metabolism, expression data, genome maps, genetic and physical markers, publications, and information about the Arabidopsis research community. In addition, seed and DNA stock information and ordering from the Arabidopsis Biological Resource Center (ABRC) are fully integrated into TAIR. TAIR is a curated database: data are processed by biocurators with PhDs in biology who ensure their accuracy. TAIR data come from a variety of sources, including in-house manual curation of published literature and sequence data, locally run computational pipelines for annotating gene structure and function, integration of data from other biological databases and resources (GenBank, ABRC, Gene Ontology Consortium, etc.), and submissions from the research community. TAIR also provides researchers with an extensive set of data visualization and analysis tools. A comprehensive guide on how to use TAIR is available [15].

2.2.2 The Arabidopsis Biological Resource Center

The ABRC collects, preserves, reproduces, and distributes diverse seed and other resources for A. thaliana and related species. The center is located at The Ohio State University in Columbus, Ohio, USA.
The ABRC serves a dynamic community of plant researchers who share the goal of understanding the basic processes of flowering plants and applying this understanding to crop improvement. Seed stocks at the ABRC include classical mutants (see Note 1), natural accessions, T-DNA and transposon insertion collections, mapping populations, the TILLING collection, and seeds from related species (e.g., Arabidopsis arenosa, Brassica rapa, Capsella rubella). Other resources include cell suspension cultures, protein chips, full-length cDNA and ORF clones in recombination-ready and expression vectors, expressed sequence tag (EST) and bacterial artificial chromosome (BAC) clones of Arabidopsis and related species, phage and plasmid libraries, and diverse vectors for cloning and expression. In addition, the ABRC has recently started distributing educational resources; because of high demand, this type of resource will be expanded further. This example illustrates how the resources provided by the ABRC closely track the emerging needs of the community. Seed resources are exchanged with the European Arabidopsis Stock Centre (NASC) in Nottingham, UK. Researchers in the Americas are required to order seed stocks from the ABRC, while researchers in Europe are required to order seeds from the NASC, but both can order DNA and other stocks from either center. Researchers outside the Americas and Europe may order seed and other resources from either the ABRC or the NASC. The ABRC stock information and ordering system are hosted by TAIR (http://www.arabidopsis.org), and all functions can be accessed through the ABRC Stocks drop-down menu on the right side of the menu bar at the top of most TAIR pages.
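The ordering rules above amount to a small decision table. The Python sketch below encodes them exactly as stated in this section, as a quick reference; the function name `stock_centers_for` and the simplified region labels ("Americas", "Europe", anything else) are hypothetical, and the stock centers' own ordering pages remain the authority.

```python
def stock_centers_for(region: str, stock_type: str) -> set[str]:
    """Return the stock center(s) a researcher may order from.

    Hypothetical helper encoding the rules described in the text:
    region is "Americas", "Europe", or any other string (elsewhere);
    stock_type is "seed" or any non-seed resource such as "DNA".
    """
    either = {"ABRC", "NASC"}
    if stock_type != "seed":
        # DNA and other non-seed stocks can be ordered from either center.
        return either
    if region == "Americas":
        return {"ABRC"}  # seed orders from the Americas go to the ABRC
    if region == "Europe":
        return {"NASC"}  # seed orders from Europe go to the NASC
    # Researchers outside the Americas and Europe may use either center.
    return either


if __name__ == "__main__":
    print(stock_centers_for("Americas", "seed"))  # {'ABRC'}
    print(sorted(stock_centers_for("Europe", "DNA")))  # ['ABRC', 'NASC']
```

Returning a set rather than a single name keeps the "either center" cases explicit instead of silently picking one.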
2.2.3 The European Arabidopsis Stock Centre (NASC)

The European (Nottingham) Arabidopsis Stock Centre (NASC) provides Arabidopsis seed and information resources to the plant research community in coordination with the ABRC, as described in the previous section. The NASC's stock collection includes seeds of A. thaliana and related species, tomato seed resources, DNA clones, and diverse cloning vectors. In addition, the NASC provides an international Affymetrix GeneChip hybridization service for a wide range of species, including Arabidopsis and many other plants [16]. The data collected through these hybridizations, as well as other user-supplied Arabidopsis data, are made publicly available through the NASCArrays database. The NASC stock information, ordering, and NASCArrays database are available at http://www.arabidopsis.info.

3 Methods

A primary objective of database mining for most researchers is to find out everything that is known about a specific gene or set of genes. Some of the basic questions are: What is the sequence and structure of my gene? What type of protein does it encode? In what biological processes is it involved? With what other genes/proteins does it interact? In what tissues is it located, and how is it regulated? To generate a testable hypothesis and design meaningful experiments, the currently available knowledge must be gathered and analyzed.

3.1 Finding Comprehensive Information About Arabidopsis Genes

After more than 12 years of development, TAIR now serves as a central access point for Arabidopsis data. The TAIR home page (http://www.arabidopsis.org) is the main entry point to the database. The navigation toolbar provides easy access to the eight major functionalities: Search, Browse, Tools, Portals, Download, Submit, News, and ABRC Stocks. When you mouse over each item in the toolbar, a drop-down menu appears with clickable submenus that lead to a variety of datasets, tools, and external links.
Log-in is not required for searching and viewing data, but it is required for ordering DNA or seed stocks from the ABRC and for submitting gene functional data. Here, we describe how to use the TAIR Gene Search tool and locus detail page to find information about Arabidopsis genes.

3.1.1 Finding Genes in TAIR by Name

TAIR's locus detail page is the most comprehensive starting point for a researcher to find out what is known about a gene. There are two commonly used ways to find genes and reach the locus detail page: the quick search tool and the advanced Gene Search form.

1. To perform a quick search, use the quick search tool in the upper right corner of the header on any TAIR page that has one. Enter the gene name (e.g., ABI3 or AT3G24650) in the text box and keep the default "Gene" option in the drop-down menu. Click Search. A list of all matching records is displayed on a page titled TAIR Gene Search Results (see Note 2).
2. To use the advanced Gene Search form, select "Genes" from the Search drop-down menu on any TAIR page with a top navigation bar (http://www.arabidopsis.org/servlets/Search?action=new_search&type=gene).
3. Define the name search criteria. To search by name, choose "Gene name" from the Search Name drop-down menu. This option searches symbolic names (e.g., ABI3), full names (e.g., ABA INSENSITIVE 3), and AGI locus identifiers (e.g., AT3G24650). AGI (Arabidopsis Gene Identifier) locus identifiers are systematic names assigned based on chromosomal location.
4. Choose an exact or inexact search mode. When searching with a gene symbol, the "starts with" option is a way to find similarly named genes, such as members of a gene family (e.g., ARF for the Auxin Response Factor family). When searching with a GenBank accession, it is better to use an exact match to avoid retrieving spurious results.
To search for a word or phrase within a gene description, switch from a "Gene name" search to a "description" search and choose the "contains" option.
5. Select the output format. The default is 25 records, sorted by name. The position option can be used when finding genes by location.
6. Click "Submit Query" to start your search. All loci that match the query term are displayed in a list of results (on a page titled TAIR Gene Search Results). Click on the locus name to view the locus detail page.

3.1.2 Using TAIR's Locus Detail Page to Find Information About a Gene

The locus detail page contains a wealth of information about a gene, including its sequence and function, associated polymorphisms, mutant phenotypes, and publications. This page also includes links to a large number of external databases and tools. To see an example locus detail page, go to http://www.arabidopsis.org/servlets/TairObject?type=locus&name=AT3G24650. This section describes the typical data types displayed on the locus page.

1. Gene summary information: TAIR uses the AGI locus identifier (e.g., AT3G24650) as the primary gene name. Other names, including both abbreviated gene symbols and the corresponding full names, are displayed in the Other Names section. The Description field provides a short summary of the gene's function, either manually composed by a curator or computationally generated (see Note 3).
2. Gene model information: A locus in TAIR refers to the physical location of an annotated gene on the chromosome. One locus can have several gene models or splice variants associated with it, based on alternatively spliced mRNAs (e.g., At5g01810.1, At5g01810.2, At5g01810.3). For a protein-coding gene, the representative gene model is the one with the longest coding sequence (CDS); for other gene types, the .1 model is the representative model by default.
The gene model page contains model-specific information such as exon–intron positions, protein domains, and gene model-specific function information. The Map Detail Image section displays the exon–intron structures of all gene models of a locus. Clicking on the image takes the user to GBrowse (see Subheading 3.2.2).
3. Gene function annotations: The Annotations section displays all of the Gene Ontology (GO) [17] and Plant Ontology (PO) [18] controlled vocabulary terms that describe the function and expression of the gene product. GO terms describe the molecular function, biological process, and subcellular localization of the gene product, while PO terms describe growth and developmental stages and plant structures, capturing the temporal and spatial expression of the gene product. Detailed information, including references and supporting evidence, can be obtained by clicking on the "Annotation Detail" link at the bottom of this section. Finding GO and PO annotations is described in Subheadings 3.3.1 and 3.4.1.
4. Gene expression: Information about gene expression can be found in the Plant Ontology annotations section and in the RNA Data section. The RNA Data section lists array elements from pre-2005 one-channel and/or two-channel microarray experiments that map to the locus. For elements whose expression has been analyzed across all experiments, the average log ratio of expression values is provided along with its standard error. For these elements, links to the Expression Viewer (for finding similarly expressed genes) and Spot History (only available for microarray elements from arrays in the Stanford Microarray Database) are also available [15]. Note that no new microarray expression datasets have been entered into TAIR since June 2005; instead, TAIR provides links to high-quality gene expression resources in the External Links section of every locus page.
The Associated Transcripts subsection within the RNA Data section lists full-length cDNAs and expressed sequence tags (ESTs) associated with the locus.
5. Nucleotide sequence: This section provides links to the full-length CDS and full-length cDNA of the representative gene model, plus the full-length genomic sequence.
6. Protein Data: This section displays the structural and physical characteristics of the protein encoded by the representative gene model, including length (in amino acids), predicted molecular weight, isoelectric point, and domains. Click on the AGI name in the protein section to open a Protein detail page, which provides more detailed information and the amino acid sequence for the representative gene model. To view nucleotide and protein data for other gene models, go to the specific gene model page.
7. Map Locations: This section displays the chromosome and coordinates of the locus for the maps on which it is found. TAIR provides three tools to view a gene in a whole-genome context: Map Viewer, Sequence Viewer, and GBrowse.
8. Polymorphism: This section contains all of the polymorphisms mapped to the locus. Both natural variations found in different ecotypes and induced mutations (e.g., T-DNA insertions) are shown. By default, this section displays only 15 entries, but a complete list can be obtained by clicking on the "See All" link directly under the section name.
9. Germplasm: This section provides information on all germplasms currently in the database that are associated with the locus, including phenotype descriptions, mutant images, stock numbers, and ordering options when available.
10. External Link: TAIR links extensively to external sites that offer either alternative views of or different information about a locus, e.g., other Arabidopsis genome annotation databases, gene expression databases, functional genomics sites, and data analysis tools.
Links to external sites and tools can also be found on the Portal pages (http://www.arabidopsis.org/portals/index.jsp).
11. Comments: This section contains statements contributed by registered TAIR users. Comments can be added to nearly all TAIR detail pages and can be used to report new data, errors, or omissions related to the displayed object.
12. Publications: This section includes published literature imported from PubMed, Agricola, and BIOSIS, along with abstracts from the International Conference on Arabidopsis Research (ICAR). Only 15 entries are initially displayed on the locus page; click on “View Complete List” at the bottom of the Publications section to see all records. Click on the title of a publication to view a detailed publication record, which links to the corresponding PubMed abstract or to the publication text when available.
3.2 Finding Gene Sequence and Structure Data
TAIR is the primary source of Arabidopsis gene sequence and structure data for many biological databases. In an ongoing effort to improve the annotation of the Arabidopsis genome, TAIR has released updated versions of the Arabidopsis gene set on a yearly basis since 2005 [7, 8]. TAIR’s genome annotation is widely distributed to other major databases such as GenBank and UniProt, so these databases often contain overlapping datasets. This section describes how to find sequence and structure data at TAIR.
TAIR provides gene sequence and structural data (i.e., the exon–intron architecture of a gene) in a variety of formats. DNA and protein data for an individual gene can be found on the locus and gene model pages (see Subheading 3.1.2).
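The exon–intron architecture of a gene model follows directly from its ordered exon coordinates: each intron is the gap between two consecutive exons. The Python sketch below illustrates this using GFF3, the format in which TAIR distributes genome-wide feature coordinates (nine tab-separated columns, 1-based inclusive coordinates, exons linked to their transcript through a Parent attribute). It is a minimal illustration under those assumptions, not a TAIR tool: the function names exons_by_transcript and introns are hypothetical, the records for the gene model GENE1.1 are invented, and real files may need extra handling (e.g., URL-escaped attribute values).

```python
from collections import defaultdict


def exons_by_transcript(gff3_text):
    """Group exon coordinates (start, end) by parent transcript ID.

    Assumes standard GFF3: 9 tab-separated columns, 1-based inclusive
    coordinates, and a Parent=... attribute on each exon line.
    """
    exons = defaultdict(list)
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # only exon features are needed here
        start, end = int(cols[3]), int(cols[4])
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        # An exon shared by several splice variants lists all of them.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[parent].append((start, end))
    # Sorted by genomic position (this is not transcript order on the minus strand).
    return {tid: sorted(coords) for tid, coords in exons.items()}


def introns(exon_coords):
    """Return the gaps between consecutive, position-sorted exons."""
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(exon_coords, exon_coords[1:])
        if next_start - prev_end > 1
    ]


# Hypothetical records for an invented gene model, GENE1.1.
SAMPLE = "\n".join(
    "\t".join(row)
    for row in [
        ["Chr1", "example", "mRNA", "100", "900", ".", "+", ".", "ID=GENE1.1;Parent=GENE1"],
        ["Chr1", "example", "exon", "100", "250", ".", "+", ".", "Parent=GENE1.1"],
        ["Chr1", "example", "exon", "400", "600", ".", "+", ".", "Parent=GENE1.1"],
        ["Chr1", "example", "exon", "750", "900", ".", "+", ".", "Parent=GENE1.1"],
    ]
)

if __name__ == "__main__":
    for transcript, coords in exons_by_transcript(SAMPLE).items():
        print(transcript, "exons:", coords, "introns:", introns(coords))
```

Run as a script, it prints `GENE1.1 exons: [(100, 250), (400, 600), (750, 900)] introns: [(251, 399), (601, 749)]`. Adjacent exons with no gap between them yield no intron.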
For users interested in downloading complete sequence datasets, the TAIR FTP site (ftp://ftp.arabidopsis.org/home/tair/Genes/) provides sets of sequence files in FASTA format, organized by TAIR release (e.g., TAIR10, TAIR9) and data type (e.g., coding sequence or CDS, cDNA, genomic DNA, and promoter regions). For users looking for a subset of gene sequences, the Sequence Bulk Download and Analysis tool generates sequence files from a list of AGI locus identifiers (see Subheading 3.2.1).
In addition to sequence data, TAIR also provides structural information about each gene. The genome coordinates of every feature (such as exon, CDS, and 5′ untranslated region or 5′UTR) of every gene in the TAIR genome release are available in GFF3 format (ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR10_genome_release/TAIR10_gff3/). For a visual snapshot of a gene’s exon–intron structure, users can refer to the TAIR locus page, where a graphic displays the structure of each splice variant annotated at that locus.
TAIR also offers two genome browsers: GBrowse and SeqViewer. Although both allow users to explore a genomic region of interest, they are quite distinct and serve different purposes. GBrowse is especially useful for analyzing the wide variety of data types that overlap a chromosomal or gene region of interest. It offers a menu of datasets, divided into sections such as expression data and sequence similarity data, that users can select for display in the main browser window. SeqViewer lends itself especially well to nucleotide-based analysis.
Users can search the genome by name or by sequence, and the detailed “SeqViewer Nucleotide View” shows the corresponding genomic nucleotide sequence decorated with annotated genes, T-DNA insertions from the SALK T-DNA insertion lines and other mutant collections, polymorphisms, and more. Detailed instructions on how to use SeqViewer are given elsewhere [15].
3.2.1 Retrieving DNA and Protein Sequence Data
TAIR’s Sequence Bulk Download and Analysis tool allows the user to retrieve DNA and protein sequence data in bulk for a list of genes (or for a single gene).
1. On any TAIR page with a top navigation bar, select “Bulk Data Retrieval” from the Tools drop-down menu, then select “Sequences.” Alternatively, go directly to http://www.arabidopsis.org/tools/bulk/sequences/index.jsp.
2. Enter one or more AGI locus or gene model identifiers (e.g., At5g01810, AT1G01040.2) into the text box, or upload a text file containing them. Select the desired data type from the Dataset drop-down menu (e.g., transcripts, coding sequences, or genomic locus sequences). The tool can retrieve sequences for the representative gene model, for all gene models, or only for the gene model matching the query.
3. Select FASTA or tab-delimited text output. Click on “Get Sequences” to perform the search. More information on how to use the tool can be found by following the link to the Help document.
4. The tool can also be accessed from any “Gene Search Results” page by clicking on the “get all sequences” or “get checked sequences” button at the top of the page.
3.2.2 Searching for a Gene, Its Overlapping Transcripts, and Orthologous Genes in Other Organisms Using GBrowse
1. On any TAIR page with a top navigation bar, select GBrowse from the Tools drop-down menu (http://gbrowse.tacc.utexas.edu/cgi-bin/gb2/gbrowse/arabidopsis/). The GBrowse display is divided into five main sections: (1) Instructions, which provides directions and examples of GBrowse search queries; (2) Search, which allows the user to enter a query and select a data source; (3) Overview, which shows a graphical representation of the chromosome and region currently displayed; (4) Details, which provides a graphical representation of the genomic features in the selected region; and (5) Tracks, which allows the user to customize the display settings and select which features are shown in the Details section [15].
2. To select a region of the genome to view, enter its name in the “Landmark or Region” search box (e.g., At1g01040). Select the desired dataset from the “Data Source” drop-down menu; the most recent TAIR genome release is the default data source. Clicking on “Search” updates the overview and detail displays. Use the “Scroll/Zoom” feature to move along the chromosome or to display a larger- or smaller-scale view of the genome.
3. Customize the GBrowse display. TAIR GBrowse has 11 track categories: Assembly, Community/Alternative Annotation, DNA, Expression, Gene, Genomic Features, Methylation and Phosphorylation, Orthologs and Gene Families, Sequence Similarity, Variation, and Analysis. New tracks may be added in the future. Each track category has multiple check boxes for different types of data; mouse over a track name to display further information about the track. Add or remove tracks from the detail display by checking or unchecking them and clicking the “Update Image” button. You can also upload your own annotation data, in a special format, using the “Add your own tracks” feature; for instructions on file format and uploading, click on the “Help” link in that section.
4. To download the sequence of a particular region, go to the “Reports and Analysis” feature box and select “Download Decorated FASTA File” from the menu. This file format allows you to highlight specific features of interest (e.g., coding regions in red) in the FASTA sequence. Use the “Configure” option to select which features to highlight and the desired markup options, such as font styles and background colors, and then click “GO.” The new web page displays the FASTA sequence for the region shown in the detail view, with the selected features highlighted.
3.2.3 Finding Related DNA or Protein Sequences in Arabidopsis
For sequenced genes with limited experimental data, one of the first steps toward understanding a gene’s function is to search for evolutionarily related genes, because the function of an unknown gene may be inferred from its similarity to a well-characterized homolog. Searches for similar DNA or protein sequences in Arabidopsis using local sequence alignment methods can be performed at TAIR and NCBI. These groups share some Arabidopsis datasets, but TAIR has some Arabidopsis-specific datasets not found at NCBI (http://www.arabidopsis.org/help/helppages/BLAST_help.jsp#datasets). These datasets are used by all of TAIR’s sequence similarity programs (WU-BLAST, NCBI BLAST, FASTA, PatMatch) [15]. This section illustrates how to use TAIR’s WU-BLAST tool to identify similar genes in Arabidopsis.
1. On any TAIR page with a top navigation bar, select WU-BLAST from the Tools drop-down menu (http://www.arabidopsis.org/wublast/index2.jsp).
2. Select the appropriate BLAST program. Five different algorithms are available to match amino acid or nucleotide sequences. The choice of program depends on the type of query sequence and on the database being queried. For example, to compare a protein sequence with other protein sequences, choose BLASTP.
3. Input your query.
The tool accepts sequences or locus identifiers as input. To use a sequence, paste it in as raw text or in FASTA format, or upload it from a file; sequences pasted directly from GenBank records can also be used. To use a locus identifier, choose the locus name option under the input header and type in the name of the locus, or upload it from a file. When a locus identifier is used as input, the program retrieves the coding sequence (CDS) of the representative gene model, which is a nucleotide sequence; therefore, this option cannot be used with BLASTP or TBLASTN, which require a protein query. To search with more than one query sequence, submit a list of locus identifiers or a set of FASTA-formatted sequences, each with its own FASTA header.
4. Define the dataset to search against. For example, to find homologous proteins in Arabidopsis, choose the AGI protein dataset. This dataset is a non-redundant set of all known Arabidopsis proteins and includes all proteins generated through alternative splicing.
5. Customize the BLAST search parameters. The default parameters include filtering and an expect threshold (cutoff) of 10. The default S value is calculated from the E value and represents the score that a single high-scoring segment pair (HSP) must reach to satisfy the expect threshold.
6. Submit the query by clicking on the “Run BLAST” button. If you have chosen an inappropriate combination of query sequence and database, an error will be returned to your browser.
Results from the WU-BLAST search are presented in a graphical format that can be used to rapidly assess the significance of the results. The graph displays the query sequence in red and the HSP matches below it. The length of each bar corresponds to the length of the HSP, and its color indicates the range of expect values (the number of matches with at least that score expected by random chance). The direction of the bar indicates whether the match is on the forward or reverse strand.
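The link between the E value and the S value in step 5 can be illustrated with the Karlin–Altschul formula, E = K·m·n·e^(−λS), where m and n are the query and database lengths and λ and K depend on the scoring system. Solving for S gives the score a single HSP needs to reach a given E cutoff. The Python sketch below assumes illustrative parameters (λ = 0.267 and K = 0.041, values commonly reported for gapped BLASTP with BLOSUM62) and invented lengths; WU-BLAST computes its statistics with its own parameters and corrections (e.g., effective lengths and sum statistics), so these numbers will not match TAIR’s output.

```python
import math


def expected_hits(score, m, n, lam, k):
    """Karlin-Altschul expectation: chance hits scoring at least `score`."""
    return k * m * n * math.exp(-lam * score)


def score_for_expect(e_cutoff, m, n, lam, k):
    """Invert E = K*m*n*exp(-lambda*S) to get the raw score S for a given E."""
    if e_cutoff <= 0:
        raise ValueError("E cutoff must be positive")
    return math.log(k * m * n / e_cutoff) / lam


# Illustrative values only (assumptions, not TAIR/WU-BLAST settings).
LAMBDA, K = 0.267, 0.041   # commonly reported for gapped BLOSUM62 scoring
QUERY_LEN = 300            # hypothetical query length (residues)
DB_LEN = 10_000_000        # hypothetical database size (residues)

if __name__ == "__main__":
    for e in (10, 1, 1e-5):
        s = score_for_expect(e, QUERY_LEN, DB_LEN, LAMBDA, K)
        print(f"E cutoff {e:g}: an HSP needs a raw score of about {s:.1f}")
```

Lowering the E cutoff raises the score an HSP must reach, so fewer but more significant hits are reported.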
Pointing the mouse over an HSP marker displays the description line of the matched sequence, and clicking on the HSP displays the corresponding sequence alignment. For AGI genes and loci, the name in the alignment is hyperlinked to the TAIR locus detail page.
3.2.4 Finding Protein Structure and Domain Information
The function of an unknown gene may also be inferred from the presence of conserved domains. For example, proteins with an F-box domain (IPR001810, http://www.ebi.ac.uk/interpro/entry/IPR001810) typically form part of an SCF E3 ubiquitin ligase, whereas proteins with a kinesin motor domain (IPR001752, http://www.ebi.ac.uk/interpro/entry/IPR001752) may be involved in intracellular transport in association with the cytoskeleton. Additional sequence-based features, such as transmembrane domains or a KDEL endoplasmic reticulum retention signal, can be used to infer protein localization. Protein structural data, including predicted domains, can be found in various databases such as TAIR, NCBI, and InterPro. Here, we describe how to use TAIR’s Bulk Protein Download tool to obtain a list of structural, physical, and chemical properties for a set of proteins.
1. On any TAIR page with a top navigation bar, select “Bulk Data Retrieval” from the Tools drop-down menu, then select “Proteins” (http://www.arabidopsis.org/tools/bulk/protein/index.jsp).
2. Choose the output display. The output options include molecular weight, isoelectric point, intracellular locations, domains, number of transmembrane domains, UniProt ID, and SCOP structural class. Selecting the HTML format option displays links to TAIR locus detail pages, protein sequences, SeqViewer graphical displays, and protein records in UniProt/Swiss-Prot and InterPro; the last two links are shown only if domains and Swiss-Prot IDs are included in the output. Choose “text” output if you wish to download the data to your computer.
Queries that return more than 1,000 results are returned in text-only format.
3. Limit the search by protein properties. For example, to obtain a list of proteins within a given range of molecular weights, check the box next to “Predicted Molecular Weight” and enter the lower and upper limits of the range in the adjacent text boxes.
4. Submit the query by clicking on the “Get Protein Data” button.
Protein domain annotations may not be consistent from database to database because different analysis methods or sequences are used. Domain databases are also updated frequently as new domain structures are identified, so genome databases should be checked regularly for newly identified domains.
3.3 Finding Gene Ontology (GO) Annotations
To make data about gene function more amenable to computational querying and analysis, many databases use structured controlled vocabularies to annotate gene products. The Gene Ontology vocabularies developed by the Gene Ontology Consortium (http://www.geneontology.org) have been widely adopted by biological databases and are considered the standard for gene function annotation. GO describes three aspects of a gene product: molecular function, biological process, and cellular component (subcellular localization) [17]. TAIR is the primary source of GO annotations for Arabidopsis genes. Additional sources of Arabidopsis GO annotations include TIGR (The Institute for Genomic Research) (see Note 4), UniProtKB-GOA (UniProt Knowledge Base Gene Ontology Annotation group), and the GO Consortium [19]. Members of the research community also contribute GO annotations through TAIR’s journal collaboration program and through voluntary user submissions [20]. Annotations from all of these sources are displayed in TAIR.
Users can also access these annotations from the central GO database using the AmiGO query tool, which supports cross-species queries (http://amigo.geneontology.org/). This section describes how to find GO annotations at TAIR (see Note 5 for how to interpret them correctly).
3.3.1 Finding GO Annotations
Users can view the GO annotations for a single gene on its locus detail page and can download TAIR’s whole-genome GO annotation file, which is updated weekly, from the FTP site (ftp://ftp.arabidopsis.org/home/tair/Ontologies/Gene_Ontology/). To retrieve GO annotations for a specific gene or set of genes, use TAIR’s Gene Ontology Annotations Search tool.
1. On any TAIR page with a top navigation bar, select “Gene Ontology Annotations” from the Search drop-down menu (http://www.arabidopsis.org/tools/bulk/go/index.jsp).
2. Enter the locus identifier(s) in the query box: type or paste them, or upload a file containing your list of locus identifiers.
3. Define output options. Select HTML to view hyperlinked results, or text to save the results as a text file.
4. To obtain a list of annotations, click on the “Get all GO Annotations” button at the bottom of the page.
5. Alternatively, instead of listing all annotations, the genes can be grouped into broader categories based on their annotations. After entering the locus identifiers (step 2), choose “HTML” output and click the Functional Categorization button. The functional categorization table can be turned into a pie chart by clicking on the “Draw Annotation Pie Chart” button at the top of the results page. Further details on functional categorization based on GO annotations are given in Chapter 5.
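Grouping genes into broader categories, as in step 5, works because GO terms form a hierarchy: each specific term can be mapped up to the broad category terms above it. The Python sketch below illustrates that general idea (often called a GO slim mapping); it is not TAIR’s implementation, and the ontology, category names, and annotations are toy placeholders rather than real GO data. The locus identifiers are borrowed from this chapter only as labels, and the helper names ancestors and categorize are hypothetical.

```python
from collections import defaultdict


def ancestors(term, parents):
    """Return the term plus every term reachable through its parent links."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def categorize(annotations, parents, categories):
    """Map (locus, term) annotations onto broad category terms.

    Returns {category: set of loci}. Identifiers are upper-cased so that
    At1g01040 and AT1G01040 count as the same locus. A locus annotated to
    several terms may fall into several categories.
    """
    loci_by_category = defaultdict(set)
    for locus, term in annotations:
        for category in ancestors(term, parents) & categories:
            loci_by_category[category].add(locus.upper())
    return dict(loci_by_category)


# Toy ontology (child -> parents); names are placeholders, not real GO terms.
PARENTS = {
    "specific_term_1": ["intermediate_term"],
    "specific_term_2": ["intermediate_term", "category_B"],
    "intermediate_term": ["category_A"],
}
CATEGORIES = {"category_A", "category_B"}

# Hypothetical annotations for illustration only.
ANNOTATIONS = [
    ("AT5G40280", "specific_term_1"),
    ("At1g01040", "specific_term_2"),
    ("AT1G01040", "specific_term_1"),
    ("AT5G01810", "unmapped_term"),
]

if __name__ == "__main__":
    table = categorize(ANNOTATIONS, PARENTS, CATEGORIES)
    for category in sorted(table):
        print(category, len(table[category]), sorted(table[category]))
```

Because a locus can map to several categories, the category counts can add up to more than the number of input genes, which is worth keeping in mind when reading a pie chart built from such a table.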
3.3.2 Finding Genes Annotated to Related Functions or Processes
Because GO annotations use structured controlled vocabularies, researchers can quickly find genes that may act in the same pathway (genes annotated to the same biological process term) or have similar functions (genes annotated to the same molecular function term). For example, ERA1 (AT5G40280) encodes a protein farnesyltransferase; mutants have low prenylation levels and defects in meristem organization and abscisic acid-mediated responses [8, 21–23]. A researcher may want to know which other genes might be involved in prenylation, and whether they act in the same or another pathway.
1. On any TAIR page with a top navigation bar, select “Keywords” from the Search drop-down menu (http://www.arabidopsis.org/servlets/Search?action=new_search&type=keyword).
2. Enter the keyword “farnesyltransferase” in the text box and choose “contains” (an inexact search) from the drop-down menu to the left of the text box (see Note 6). Restrict the keyword category to “GO Molecular Function” and click the “Submit Query” button.
3. The Keyword Search Results page displays the retrieved terms, along with a count of the data objects (loci, publications, annotations) annotated to each term and to its child terms. Click “loci” to display the genes annotated to “farnesyltranstransferase activity” and its child terms (e.g., farnesyl-diphosphate farnesyltransferase activity). Click on the “Download All” button on the results page to save the list of all loci.
4. On the Keyword Search Results page, click on “treeview” to view the term in a hierarchical tree. Click on the plus sign next to a term to expand the node and display all of its child terms. To display genes annotated to each of the parent and child terms, select the “loci” radio button at the top of the tree view page and then click on the Display button.
The display shows the number of loci annotated to each term and the number of loci annotated to its children; click on the numbers to view more details.
The above example used the GO molecular function term “farnesyltranstransferase activity” to show how to find genes with similar functions by searching for genes annotated to the same function term. Similarly, searching for genes annotated to a process term (e.g., protein prenylation) will retrieve a list of genes involved in that process.
3.4 Finding Information About Gene Expression
Gene expression data are an important source of functional information (see Note 7). There are many reasons to analyze these data, such as finding the expression pattern of a gene in an organism, determining the effect of the environment on the expression of particular genes, or understanding how the expression of one gene affects the expression of other genes. Gene expression in Arabidopsis has been studied with low-throughput methods, such as Northern blot, reverse transcription-polymerase chain reaction (RT-PCR), in situ hybridization, and various reporter assays (e.g., green fluorescent protein (GFP) or β-glucuronidase (GUS) reporters), and with high-throughput methods, such as DNA microarray analysis or RNA-Seq. Expression data obtained with low-throughput methods are found mainly in the literature, and some of these published data have been captured as Plant Ontology (PO) annotations through TAIR’s literature curation effort. High-throughput DNA microarray data are for the most part stored in databases that allow download and further analysis. Some DNA microarray expression data have also been converted into PO annotations and can be found in TAIR.
For example, TAIR contains close to half a million PO annotations based on the AtGenExpress microarray data (http://www.weigelworld.org/resources/microarray/AtGenExpress/) [24].
3.4.1 Finding an Expression Pattern by Searching for Plant Ontology Annotations
Following the model of the Gene Ontology, the Plant Ontology Consortium (POC; http://www.plantontology.org/) has developed an ontology of controlled vocabulary terms for plant structures and for growth and developmental stages [18]. Examples of plant structure terms are leaf, leaf stomatal complex, and phyllome vascular system; examples of growth and developmental stage terms are seedling shoot emergence stage and late rosette growth. In TAIR, PO terms are used to annotate gene expression data from low-throughput experiments, such as Northern blots and reporter assays, as well as high-throughput data from DNA microarray experiments and proteomics studies. PO annotations are displayed on the locus detail page together with GO annotations in the Annotations section. To retrieve PO annotations for a set of genes, use the Plant Ontology Annotations Search tool, accessible from the Search drop-down menu (http://www.arabidopsis.org/tools/bulk/po/index.jsp). To find genes co-expressed in the same tissue or developmental stage, use the Keyword Search tool described previously (Subheading 3.3.2), simply replacing the GO term with a PO term. TAIR’s whole-genome PO annotation file is available for download from its FTP site (ftp://ftp.arabidopsis.org/home/tair/Ontologies/Plant_Ontology/).
3.4.2 Finding DNA Microarray Data
DNA microarrays are among the most powerful tools for investigating the expression of thousands of genes in parallel, and microarray experiments are now commonly performed in many Arabidopsis laboratories. A vast amount of DNA microarray data has been generated, either through coordinated community efforts such as the AtGenExpress project (http://www.arabidopsis.org/portals/expression/microarray/ATGenExpress.jsp) or through individual research projects carried out by numerous laboratories. Arabidopsis microarray data can be found in several public repositories. TAIR provides access to experimental results from both cDNA- and Affymetrix-based microarray platforms that TAIR received before June 2005. Newer and more comprehensive data can be found in NASCArrays (http://affy.arabidopsis.info/narrays/experimentbrowse.pl) [16], ArrayExpress (http://www.ebi.ac.uk/arrayexpress/) [25], and GEO (http://www.ncbi.nlm.nih.gov/geo/) [26]. These public databases emphasize long-term storage of, and access to, publicly available microarray data. Many other academic and commercial groups have focused on developing advanced tools for mining microarray datasets; notable examples include Genevestigator (https://www.genevestigator.com) [27] and the eFP Browser (http://bar.utoronto.ca/efp/cgi-bin/efpWeb.cgi) [28]. These tools are covered in Chapter 5. This section shows how to use the TAIR microarray database to find the expression profiles of one or more genes in specific experiments.
1. Start at the TAIR Microarray Expression Search (http://www.arabidopsis.org/servlets/Search?action=new_search&type=expression). This search can be used to find expression data for up to 100 genes using gene names, locus identifiers, microarray element names, or GenBank accession numbers.
2. Choose the default “Locus” from the Search by Name or GenBank Accession drop-down menu and enter At5g01810 in the query text box.
3. Choose Array Type/Design. This feature restricts the search to a specific type of array. Choose the default option (Affymetrix GeneChips, any design).
4. Limit Search by Expression Values (optional). This option allows you to adjust expression value parameters for either Affymetrix or cDNA arrays.
Since this example involves Affymetrix data, use the parameter selections for this type of array. In the Detection section, choose Present, which includes only data from hybridizations in which the transcript was detected.
5. Limit Search by Experiment Parameters (optional). This advanced option restricts a search to certain experiments only. If no limits are imposed, all experiments in the database are searched.
6. Select the output options (optional).
7. Submit the query. The summarized results include the array name (locus identifier), information about the experiment design (Experiment Name, Sample Variables), and specific data for each experiment. Click on the links to go to the respective detail pages. The results can be downloaded in text format by checking the boxes for the records of interest and then clicking “Download Checked.”
3.5 Obtaining Information About Metabolism in Arabidopsis
A number of databases focus on providing information about metabolism and metabolites in Arabidopsis, including AraCyc and PlantCyc from the Plant Metabolic Network (PMN) [29], Arabidopsis Reactome [30], KEGG [31], KaPPA-View4 [32], MetNet [33], MetaCrop [34], and KNApSAcK [35]. Although each of these resources may offer specific benefits, and their combined use might be ideal for data analysis, this section describes only how to access information from the PMN databases, with a focus on PlantCyc and AraCyc, because of the historical and ongoing connection between TAIR and the Pathway Tools/PMN databases. PlantCyc can house biochemical data for all plant species, whereas AraCyc serves as a metabolic encyclopedia of Arabidopsis [29, 36–38]. Both databases provide information about genes, enzymes, compounds, reactions, and pathways, which can have experimental and/or computational support.
Semiannual releases, including the latest in July 2013, continue to improve the depth, breadth, accuracy, and coverage of these resources. In many cases, items in the PMN are linked to outside metabolism resources such as KEGG, BRENDA, ChEBI, and PubChem, as well as to more general databases such as TAIR, Phytozome, and UniProt.
3.5.1 Finding Information About Metabolic Pathways by Name
Although plant metabolism can only be completely described as an extremely dense and highly interconnected metabolic web, many scientists want to search for “pathways” that describe a comprehensible subset of connected reactions. These can be found in AraCyc through TAIR, or in AraCyc or PlantCyc directly through the PMN.
1. From any TAIR page, enter the common name of a pathway (e.g., chlorophyll biosynthesis) or of a prominent compound expected to be in the pathway (e.g., ascorbate) in the Quick Search tool in the header. Select “Metabolic Pathways” from the drop-down menu of search types and click the “Search” button. The search is a “contains” search by default, so the results page lists all pathways, enzymes, reactions, and compounds associated with the input keyword. In the case of ascorbate, four different pathways are retrieved.
2. The same search can be performed within the Plant Metabolic Network (www.plantcyc.org). From any PMN page, enter the search term (e.g., “ascorbate”) in the Quick Search bar in the header, select the database to query, and click on “Quick Search” or “Search” (see Note 8). Again, the default search returns all entries in the database that “contain” the term, including pathways, enzymes, compounds, and/or reactions.
3. To learn more about a specific pathway, click on its name in the search results. This opens a page that provides a diagrammatic representation of the pathway, evidence code(s), taxonomic information, a curator-written summary, literature references, and more. When a pathway page is first opened, it may show an overview diagram that lacks detailed information about enzyme identities, chemical structures, etc. Click on the “More Detail” button one or more times to display the pathway with increasing amounts of information. When enzyme names appear (in gold), they are shown in bold if they are supported by experimental evidence and in non-bold type if they are supported by computational predictions. Each item in the pathway can be clicked to open another page with more information, such as an “enzyme detail page.”
3.5.2 Finding Information About Metabolic Pathways Based on Pathway Properties
To find a specific pathway or group of pathways that cannot be identified by name alone, at least four additional search strategies are available.
1. The Pathway Search page enables a user to select one or more pathways based on a variety of criteria. To access it from any page, expand the “Search” drop-down menu and choose “Pathways” (http://pmn.plantcyc.org/pwy-search.shtml). On the resulting page, nine gray headers describe the types of filtering criteria available. To use a filter, click on the small box to its left, e.g., to the left of the text “Search/Filter by number of reactions,” and enter the desired restrictions. Multiple criteria can be combined before clicking on the “Submit Query” button.
2. The Advanced Search page gives users even more power to build detailed requests. To access it from any page, expand the “Search” drop-down menu and choose “Advanced Query” (http://pmn.plantcyc.org/query.shtml). Constructing a query in Section 1 of the page takes several steps, beginning with choosing the appropriate database to search.
Multiple “conditions” may be included in the search using the “add a condition” button and connected through Boolean operators. Once the request has been formulated, select the columns of data to output and choose a column to sort by in Section 2 of the page. Specify the output format (HTML or tab-delimited) in Section 3 and then click on “Submit Query” to start the search. Familiarity with the underlying structure of the Pathway Tools database makes this search tool easier to use.
3. Pathways can also be identified based on their membership in a particular class, such as “Amino Acids Biosynthesis,” using the Pathway Ontology Browser. To access it from any page, expand the “Search” drop-down menu and choose “Browse Ontologies” and then “Pathway Ontology.” On the resulting page, navigate through the ontology by clicking on any plus sign to expand a category and on any minus sign to collapse it.
4. Experimental data can also be used to highlight specific pathways that may be of interest. Briefly, quantitative or qualitative results from transcriptomic, proteomic, and metabolomic experiments can be projected onto the entire Arabidopsis metabolic map using the “Metabolic Map/Omics Viewer” under the “Tools” menu. A tutorial for this procedure is available at the PMN and has been described in previous publications [15].
3.6 Finding and Ordering Seed Resources from the Arabidopsis Biological Resource Center
The ABRC provides access to thousands of seed stocks, which can be identified at TAIR through several different search strategies. Queries can be entered in the quick search bar in the header or in the advanced Seed/Germplasm search, located in the ABRC Stocks drop-down menu on the TAIR navigation bar. The quick search allows searching by germplasm or polymorphism name or by seed stock number.
In addition, the advanced Seed/Germplasm search allows searching by information associated with germplasm or seed stocks, such as donor name, gene name, allele name, and phenotype. Searches can be limited by species, by germplasm type, and by a range of other attributes including genetic background, mutagen, and genotype. A dedicated ecotype search allows searching for natural variants of A. thaliana and related species by donor or germplasm attributes, and can be limited by location and habitat. Results pages for both germplasm and ecotype searches include check boxes for ordering and links to detail pages. Detail pages also contain links to other relevant information, for example, to clone detail pages for transgenic germplasm and to community detail pages for donors.
Stock browsing is also supported by the ABRC catalog, which can be accessed from the ABRC Stocks drop-down menu in the navigation bar available on most TAIR pages. Seed stocks in the catalog are divided into eight categories and include a range of different mutants, mapping lines, transgenic and RNAi lines, natural accessions, and seeds from other closely related species. Some sections link to detail pages with check boxes for ordering; others link to summary pages that describe the available resources in that category and give tips for finding them through advanced searches.
Arabidopsis seed stocks with associated sequence information, such as insertion lines with sequenced flanking regions, are fully integrated into the TAIR database and can be found by searching with AGI locus identifiers in TAIR’s GBrowse genome viewer, which is accessible from the navigation bar under “Tools.” Locus-associated polymorphisms are displayed on the T-DNAs/Transposons and Polymorphisms tracks. Clicking on a polymorphism in the viewer links to the polymorphism detail page, where the corresponding germplasm/stock can be found and ordered.
Germplasm names/stock numbers are also displayed on locus detail pages with check boxes for ordering. Stock numbers and germplasm names link to germplasm detail pages, where specific information and an ordering button are displayed. Users can access their own order history and invoices from their personal home page when logged in to the TAIR web site. Other TAIR users cannot access an individual’s complete order history, but the order history for a specific stock can be accessed through a link on the germplasm detail page for that stock.
In addition to TAIR’s Seed/Germplasm Search, T-DNA Express (http://signal.salk.edu/cgi-bin/tdnaexpress), developed by the Salk Institute Genomic Analysis Laboratory (SIGnAL), is another popular tool for finding mutant resources associated with specific loci or chromosome locations [3]. T-DNA Express links users directly to the ABRC, NASC, or other appropriate stock center to order the lines it identifies. Reciprocally, TAIR links to this tool from the External Links section of its Locus Detail page (see Subheading 3.1.2).
3.7 Finding and Ordering Other (Non-seed) Resources from the Arabidopsis Biological Resource Center
Arabidopsis clone information is fully integrated into the TAIR database. For sequenced clones, clone detail pages can be reached from TAIR’s GBrowse genome viewer and from locus detail pages. Clone detail pages link to a stock detail page with information such as price, special handling, and other stock-specific data. Clones and all other non-seed stocks can also be found through the TAIR quick search, but some name information, such as the stock number or clone name, must be provided. Advanced search options for these resources are provided by the TAIR DNA search (http://www.arabidopsis.org/servlets/Search?action=new_search&type=dna).
Drop-down menus allow selection of the type of resource sought (e.g., vector, clone, or host strain), the species, and the type of information supplied (e.g., name, AGI, or stock number). A wide range of features for restricting the search is also available. Results pages provide check boxes for ordering stocks; links to clone, vector, and/or stock detail pages; and, where available, links to NCBI for sequence information. Detail pages provide check boxes for ordering and links to publications, images, and external web pages with information relevant to the stocks. The order history for a specific stock can be accessed through a link on the stock detail page.
DNA stocks can also be found by browsing the ABRC catalog, where they are divided into five categories, including libraries, clones, vectors, and host strains. The catalog links to detail pages with check boxes for ordering or to summary pages that describe the resources available in each category and give tips for finding them through advanced searches. The catalog also provides access to other non-seed resources, including protein chips, cell cultures, and educational resources. More details about educational resources developed by the ABRC are available from the ABRC outreach portal at http://abrcoutreach.osu.edu.
3.8 Searching Literature Databases
Researchers have published a wealth of data about all aspects of Arabidopsis physiology, biochemistry, and development. Databases such as PubMed, Agricola, and BIOSIS index articles from a wide variety of journals and can be used to find citations and articles in electronic or print format. PubMed (http://www.ncbi.nlm.nih.gov/pubmed/), maintained by the National Center for Biotechnology Information (NCBI), is the primary database for life-science literature. By the end of 2011, PubMed contained over 36,000 Arabidopsis publications.
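The kind of PubMed query described in this section can also be issued programmatically through NCBI’s E-utilities. The Python sketch below only builds an ESearch request URL (with `rettype=count`, ESearch returns just the number of matching records); the query string is a hypothetical example, the helper names are illustrative, and any count retrieved today will differ from the 2011 figure quoted above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_esearch_url(term, count_only=False, retmax=20):
    """Build an ESearch request URL for a query written in PubMed search syntax."""
    params = {"db": "pubmed", "term": term, "retmode": "json"}
    if count_only:
        # rettype=count asks ESearch to return only the number of matching records.
        params["rettype"] = "count"
    else:
        # Otherwise ESearch returns up to `retmax` PubMed IDs.
        params["retmax"] = str(retmax)
    return ESEARCH_URL + "?" + urlencode(params)


def fetch_count(url):
    """Send the request (network access required) and return the hit count."""
    with urlopen(url) as response:
        data = json.load(response)
    return int(data["esearchresult"]["count"])


# Hypothetical query: records mentioning Arabidopsis with a publication
# date ([dp]) from 1900 through 2011.
query = "arabidopsis AND 1900:2011[dp]"
url = pubmed_esearch_url(query, count_only=True)
print(url)
```

Calling `fetch_count(url)` sends the request over the network. NCBI asks users to limit request rates and to identify their software, so consult the E-utilities documentation before running queries in bulk.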
PubMed has a powerful search interface and links to the other databases within the NCBI system, such as sequence and expression databases. PubMed records are linked to publishers’ sites for access to the full text of articles. For help with the resource, refer to the PubMed tutorial (http://www.nlm.nih.gov/bsd/disted/pubmed.html).
TAIR compiles bibliographic records about Arabidopsis from PubMed, BIOSIS, and Agricola. In addition, TAIR includes publications not found in these databases, such as abstracts from the International Conference on Arabidopsis Research, defunct Arabidopsis electronic journals (The Arabidopsis Information Service and Weeds World), books, and dissertations. The following sections describe how to find Arabidopsis publications in PubMed and TAIR.
3.8.1 Finding Articles in the NCBI PubMed Database
1. Start at the PubMed search page (http://www.ncbi.nlm.nih.gov/pubmed/).
2. Enter the desired term(s) in the text input box. Terms can be combined with the Boolean operators AND, OR, and NOT. To search for a phrase, enclose it in quotes (e.g., “transcription regulation”) or append the text-word field tag “[tw]” (e.g., “transcription factor [tw]”). Use the wildcard character (*) for inexact matching. For example, to find articles about all the AGAMOUS-LIKE (AGL) genes, type “AGL*.” For more refined searching, use the advanced search page (http://www.ncbi.nlm.nih.gov/pubmed/advanced), where the Search Builder helps users construct complex queries.
3. Finding the article text and saving relevant citations: The default display format is a summary of the citation. The complete citation, including the abstract if available, can be viewed by clicking on the title. Articles available online are linked to the publishers’ web sites, which may be freely accessible or require a subscription. To modify the display of results, select the appropriate option from the display menu.
For example, to import a citation into reference management software, choose MEDLINE format. References can be saved to a file for downloading or sent to an e-mail address. After selecting the articles with the checkboxes alongside the citations, choose the desired option under the “Send to” menu and click on the “Send to” button.
3.8.2 Finding Arabidopsis Publications Using TAIR’s Publication Search
1. On any TAIR page with a top navigation bar, select “Publication” from the drop-down menu under Search (http://arabidopsis.org/servlets/Search?action=new_search&type=publication).
2. To search by author name or phrase, enter the desired terms in the text query boxes and choose the field to search from the drop-down menu (abstract, author, journal/book title, title, title/abstract, URL for electronic publications, journal, or PubMed ID). For example, to search for all publications about oxidative stress, type the phrase into the text box and select “Title/Abstract” in the drop-down menu. Unlike in PubMed, quotes are not required; all text in a single box is treated as a phrase. To restrict the search by publication date or publication type, fill in the corresponding boxes.
3. Click on the “submit” button to start the search. The results are displayed in a summary format that includes the title, journal, authors, and year. The title is hyperlinked to a page containing the complete citation, links to the authors’ TAIR profiles, the abstract (if available), and a list of associated keywords and genes. For articles with a PubMed ID, a link to the PubMed database is also provided.
3.8.3 Searching Full-Text Arabidopsis Literature Using Textpresso
Textpresso is an information extraction and processing package for biological literature [39]. As of August 2011, Textpresso for Arabidopsis allowed users to search over 40,000 abstracts and 27,000 full-text publications in TAIR.
1.
To use this tool, go to http://www.arabidopsis.org/ and select “Textpresso Full Text” from the Tools drop-down list. Alternatively, go to http://www.textpresso.org/arabidopsis/.
2. Enter the search term in the Keywords query box. Textpresso is particularly useful for tracking down specific information, such as the mutation sites in certain alleles. For example, enter SALK_099519 and click “Search.” Sentences that contain the matching keyword are displayed together with bibliographic information, so users can quickly judge the usefulness of a particular paper and link directly to the full text if they have an appropriate subscription to the journal in question. At the Textpresso site, searches can be narrowed to specific keyword categories (mouse over “List >”), including Arabidopsis gene names and Gene Ontology and Plant Ontology terms, or to a combination of keywords. Advanced search options are described in the User Guide, accessible from the top navigation bar.
3.9 Submitting Your Data or DNA/Seed Stocks
Funding agencies such as the National Science Foundation (NSF) have invested heavily in community resources such as biological databases and stock centers. These resources drive research forward by providing access to data and research materials, and their long-term sustainability depends on contributions from the research community. Now that the influx of data has outstripped the capacity of any single database staff to organize it, it is essential to involve the research community in data collection and curation. Researchers should share their findings not only through publication but also by contributing their data directly to scientific databases. This section describes how to submit your data and/or DNA/seed stocks to various databases.
3.9.1 Submitting Data to TAIR
TAIR accepts a wide range of data types, including gene function, gene structure, interaction partners, expression patterns, markers, and phenotypes. Instructions for data submission are available on the Submit Overview page (http://www.arabidopsis.org/submit/index.jsp), accessible from the Submit drop-down menu in the top navigation bar. TAIR provides several ways for researchers to submit their data. For gene function data, the online submission tool (http://www.arabidopsis.org/doc/submit/functional_annotation/123) is recommended. This tool requires the submitting user to log in to TAIR with a registered user ID, which automatically records the provenance of the submitted annotations. Reference information (a PubMed ID or DOI) is also required. Using a DOI allows a user to submit annotations before the manuscript is publicly released; however, TAIR releases the annotations only upon publication of the corresponding article. Users can also format various types of data for submission according to the guidelines on the Submit Overview page, or download and use the preformatted Excel spreadsheets available there [15]. Data can then be submitted to TAIR by e-mail to [email protected]. In addition, each data detail page contains a Comments section; registered TAIR users can submit comments by clicking on the “Add My Comment” button. Submitted comments are displayed immediately in the Comments section of the detail page. For corrections to existing data, users may contact TAIR by e-mail at [email protected].
3.9.2 Submitting Data to the PMN
The Plant Metabolic Network welcomes submissions of published findings related to pathways, enzymes, reactions, or compounds found in plants.
To help researchers submit these data types, three Excel forms and simple instructions are provided on the Data Submission page (http://www.plantcyc.org/feedback/data_submission.faces), which can be accessed from the “Submit Data” heading on the menu bar. Submitters are encouraged to enter their data on the forms, save them locally, and then send them to the PMN. The forms may be e-mailed, or uploaded and submitted via the Feedback Form (http://www.plantcyc.org/feedback/feedback_form.faces), which is also available on the “Submit Data” menu. Although thoroughness is appreciated, incomplete forms are always accepted. Supporting materials, such as .gif files that depict pathway layouts or .mol files that provide compound structures, can also be submitted. The PMN also welcomes experts who volunteer to review particular domains of metabolism for completeness and accuracy. Feedback and corrections concerning data in the PMN can be submitted using the Feedback Form or by e-mail to [email protected].
3.9.3 Donating Seed and DNA Stocks to the ABRC
The ABRC accepts all Arabidopsis seed resources and is particularly interested in receiving confirmed insertion mutants, characterized mutants, transgenic lines, and cDNA/ORF clones. For other types of resources, contact the stock center in advance to ensure that the resource can be accommodated. All seed resources are shared with NASC, either after propagation at the ABRC or immediately if enough seed is supplied. Other resources may also be shared with NASC if requested by NASC customers. The ABRC has developed stock donation forms to collect the data associated with stock donations. These data are curated by ABRC staff and uploaded to TAIR within a month of receipt of the material. Donated stocks are made available for distribution either when the related data are uploaded or upon amplification.
Although donors are encouraged to fill out the ABRC donation forms, a simple donation form is available for published resources, and data in other formats are accepted, particularly for large collections of stocks. Links for downloading ABRC donation forms are available from the ABRC Stocks drop-down menu. A donation form for contributions of educational materials for high school and undergraduate-level classes has recently been developed and is available upon request.
4 Notes
1. Classical mutants are mostly characterized and published mutants derived from forward genetic screens of populations generated with various mutagens (X-rays, fast neutrons, ethyl methanesulfonate (EMS), Agrobacterium transformation, etc.).
2. The quick search performs a name search for most object types in the TAIR database (e.g., Genes, Clones, ESTs or BAC ends, People/Labs, Polymorphisms/Alleles, Germplasms, Ecotypes, Keywords, Genetic Markers, Proteins, Seed and DNA Stocks, and Vectors). By default, this is a “contains” search (a search for aba1 retrieves both ABA1 and ATRABA1A). The search is not limited to the name field: for example, when performing a quick search for genes, the gene description and keywords fields are searched as well as the name. This avoids missing potentially relevant results but sometimes returns too many. To perform an exact name search, choose the “exact name search” option from the drop-down menu to the right of the search box. This option searches only the name field for all the data types listed in the drop-down menu [15].
3. The computational description contains the gene’s full name, Gene Ontology and Plant Ontology terms, the best BLAST-identified A. thaliana protein match, and the number of protein BLAST hits in other species (NCBI BLink) [15]. A computational description is shown only if the locus has not yet been curated manually.
Users are especially welcome to submit suggested gene descriptions for loci that have only a computational description.
4. TIGR, now the J. Craig Venter Institute (http://www.jcvi.org/), no longer actively produces GO annotations for Arabidopsis genes, but past TIGR annotations are still stored in TAIR.
5. GO annotations fall into two broad categories: (1) annotations based on experimental data, including results from low- and high-throughput experiments (e.g., DNA microarray and proteomics studies), and (2) computationally predicted annotations. Computational annotations are based on in silico analysis of the gene product sequence and/or other data, as described in the cited reference, and may or may not be reviewed individually by a curator. For example, TAIR combines InterProScan with the InterPro2GO mapping file to create GO annotations for proteins based on the presence of domains with mapped GO terms [8]; such annotations are not reviewed individually by a curator. Alternatively, a curator can make annotations individually by examining relevant computational analyses (e.g., sequence alignments, protein family information). Computational annotations provide a basis for forming testable hypotheses, particularly for genes with little experimental data. For example, AT3G24560 (RASPBERRY 3) is annotated to the GO term “ligase activity, forming carbon–nitrogen bonds” based on an InterPro domain scan. A researcher can then design an experiment to test whether this protein indeed has ligase activity. The GO Consortium has developed a set of evidence codes to indicate how an annotation to a particular term is supported. To interpret a GO annotation correctly, it is essential to review the evidence code together with the GO term. For a complete list of evidence codes currently in use, go to http://www.geneontology.org/GO.evidence.shtml. In TAIR, annotations also include an evidence description.
For example, an annotation with the evidence code “inferred from mutant phenotype” (IMP) may be further specified by the evidence description “RNAi experiments.” Because RNA interference may affect more than one gene, such an annotation should be interpreted with the understanding that the phenotype may be due to loss of function of more than one homologous locus. An in-depth discussion of how to avoid common misuses of GO is available [40].
6. Many GO terms are complex phrases. TAIR searches treat the entire entered term or phrase as a single phrase rather than as a set of words. Consequently, an “exact match” search often retrieves no entries, and using the “contains” option for keyword searches is recommended [15].
7. Gene expression data historically and most properly refer to the expression of gene transcripts; however, the expression of protein constructs and/or the results of proteomic experiments are also often grouped into this category.
8. The PMN offers a collection of PMN-generated pages (www.plantcyc.org/…) and Pathway Tools-generated pages (pmn.plantcyc.org/…), which differ somewhat, particularly in the header. Most notably, on PMN-generated pages a simple drop-down menu selects the database to query via the Quick Search bar, whereas on all Pathway Tools-generated pages the “change organism database” link is used to select a new database to query.
Acknowledgements
This project was supported by the National Science Foundation (grant numbers DBI-0850219, DBI-0640769, and IOS-1026003), the National Institutes of Health National Human Genome Research Institute (NIH-NHGRI) (grant number 5P41HG002273-09), and the TAIR sponsorship program (http://www.arabidopsis.org/doc/about/tair_sponsors/413).
References
1. Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796–815
2. The Multinational Arabidopsis Steering Committee (2011) The multinational coordinated Arabidopsis thaliana functional genomics project annual report 2011. http://www.arabidopsis.org/portals/masc/2011_MASC_Report.pdf
3. Alonso JM, Stepanova AN, Leisse TJ et al (2003) Genome-wide insertional mutagenesis of Arabidopsis thaliana. Science 301:653–657
4. Garcia-Hernandez M, Berardini TZ, Chen G et al (2002) TAIR: a resource for integrated Arabidopsis data. Funct Integr Genomics 2:239–253
5. Huala E, Dickerman AW, Garcia-Hernandez M et al (2001) The Arabidopsis Information Resource (TAIR): a comprehensive database and web-based information retrieval, analysis, and visualization system for a model plant. Nucleic Acids Res 29:102–105
6. Rhee SY, Beavis W, Berardini TZ et al (2003) The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucleic Acids Res 31:224–228
7. Swarbreck D, Wilks C, Lamesch P et al (2008) The Arabidopsis Information Resource (TAIR): gene structure and function annotation. Nucleic Acids Res 36:D1009–D1014
8. Lamesch P, Berardini TZ, Li D et al (2011) The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic Acids Res. doi:10.1093/nar/gkr1090
9. Meinke D, Scholl R (2003) The preservation of plant genetic resources: experiences with Arabidopsis. Plant Physiol 133:1046–1050
10. Heazlewood JL, Verboom RE, Tonti-Filippini J et al (2007) SUBA: the Arabidopsis subcellular database. Nucleic Acids Res 35:D213–D218
11. Lu Y, Savage LJ, Larson M et al (2011) Chloroplast 2010: a database for large-scale phenotypic screening of Arabidopsis mutants. Plant Physiol 155:1589–1600
12. International Arabidopsis Informatics Consortium (2010) An international bioinformatics infrastructure to underpin the Arabidopsis community. Plant Cell 22:2530–2536
13. Samson F, Brunaud V, Balzergue S et al (2002) FLAGdb/FST: a database of mapped flanking insertion sites (FSTs) of Arabidopsis thaliana T-DNA transformants. Nucleic Acids Res 30:94–97
14. Kleinboelting N, Huep G, Kloetgen A et al (2011) GABI-Kat Simple Search: new features of the Arabidopsis thaliana T-DNA mutant database. Nucleic Acids Res. doi:10.1093/nar/gkr1047
15. Lamesch P, Dreher K, Swarbreck D et al (2010) Using the Arabidopsis Information Resource (TAIR) to find information about Arabidopsis genes. Curr Protoc Bioinformatics Chapter 1:Unit 1.11
16. Craigon DJ, James N, Okyere J, Higgins J et al (2004) A repository for microarray data generated by NASC’s transcriptomics service. Nucleic Acids Res 32:D575–D577
17. Ashburner M, Ball CA, Blake JA et al (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25:25–29
18. Jaiswal P, Avraham S, Ilic K et al (2005) Plant ontology (PO): a controlled vocabulary of plant structures and growth stages. Comp Funct Genomics 6:388–397
19. Reference Genome Group of the Gene Ontology Consortium (2009) The Gene Ontology’s Reference Genome project: a unified framework for functional annotation across species. PLoS Comput Biol 5:e1000431
20. Ort DR, Grennan AK (2008) Plant physiology and TAIR partnership. Plant Physiol 146:1022–1023
21. Cutler S, Ghassemian M, Bonetta D et al (1996) A protein farnesyl transferase involved in abscisic acid signal transduction in Arabidopsis. Science 273:1239–1241
22. Yalovsky S, Kulukian A, Rodriguez-Concepcion M et al (2000) Functional requirement of plant farnesyltransferase during development in Arabidopsis. Plant Cell 12:1267–1278
23. Ziegelhoffer EC, Medrano LJ, Meyerowitz EM (2000) Cloning of the Arabidopsis WIGGUM gene identifies a role for farnesylation in meristem development. Proc Natl Acad Sci USA 97:7633–7638
24. Schmid M, Davison TS, Henz SR et al (2005) A gene expression map of Arabidopsis thaliana development. Nat Genet 37:501–506
25. Parkinson H, Sarkans U, Kolesnikov N et al (2011) ArrayExpress update—an archive of microarray and high-throughput sequencing-based functional genomics experiments. Nucleic Acids Res 39:D1002–D1004
26. Barrett T, Troup DB, Wilhite SE et al (2011) NCBI GEO: archive for functional genomics data sets—10 years on. Nucleic Acids Res 39:D1005–D1010
27. Hruz T, Laule O, Szabo G et al (2008) Genevestigator v3: a reference expression database for the meta-analysis of transcriptomes. Adv Bioinformatics 2008:420747
28. Winter D, Vinegar B, Nahal H et al (2007) An “electronic fluorescent pictograph” browser for exploring and analyzing large-scale biological data sets. PLoS One 2:e718. doi:10.1371/journal.pone.0000718
29. Zhang P, Dreher K, Karthikeyan A et al (2010) Creation of a genome-wide metabolic pathway database for Populus trichocarpa using a new approach for reconstruction and curation of metabolic pathways for plants. Plant Physiol 153:1479–1491
30. Tsesmetzis N, Couchman M, Higgins J et al (2008) Arabidopsis reactome: a foundation knowledgebase for plant systems biology. Plant Cell 20:1426–1436
31. Masoudi-Nejad A, Goto S, Endo TR et al (2007) KEGG bioinformatics resource for plant genomics research. Methods Mol Biol 406:437–458
32. Sakurai N, Ara T, Ogata Y et al (2011) KaPPA-View4: a metabolic pathway database for representation and analysis of correlation networks of gene co-expression and metabolite co-accumulation and omics data. Nucleic Acids Res 39:D677–D684
33. Wurtele ES, Li L, Berleant D et al (2007) MetNet: systems biology software for Arabidopsis. In: Nikolau BJ, Wurtele ES (eds) Concepts in plant metabolomics. Springer, Berlin, pp 145–158
34. Grafahrend-Belau E, Weise S, Koschützki D et al (2008) MetaCrop: a detailed database of crop plant metabolism. Nucleic Acids Res 36:D954–D958
35. Shinbo Y, Nakamura Y, Altaf-Ul-Amin M et al (2006) KNApSAcK: a comprehensive species–metabolite relationship database. In: Saito K, Dixon RA, Willmitzer L (eds) Plant metabolomics. Springer, Berlin, pp 165–181. doi:10.1007/3-540-29782-0_13
36. Karp P, Paley S, Romero P (2002) The pathway tools software. Bioinformatics 18:S225–S232
37. Mueller LA, Zhang P, Rhee SY (2003) AraCyc: a biochemical pathway database for Arabidopsis. Plant Physiol 132:453–460
38. Zhang P, Foerster H, Tissier C et al (2005) MetaCyc and AraCyc: metabolic pathway databases for plant research. Plant Physiol 138:27–37
39. Müller HM, Kenny EE, Sternberg PW (2004) Textpresso: an ontology-based information retrieval and extraction system for biological literature. PLoS Biol 2:e309
40. Rhee SY, Wood V, Dolinski K et al (2008) Use and misuse of the gene ontology annotations. Nat Rev Genet 9:509–515
Chapter 5
Bioinformatic Tools in Arabidopsis Research
Miguel de Lucas, Nicholas J. Provart, and Siobhan M. Brady
Abstract
Bioinformatic tools are an increasingly important resource for Arabidopsis researchers. With them, it is possible to rapidly query the large data sets covering genomes, transcriptomes, proteomes, epigenomes, and other “omes” that have been generated in the past decade. Often these tools can be used to generate high-quality hypotheses at the click of a mouse. In this chapter, we cover the use of bioinformatic tools for examining gene expression and coexpression patterns, performing promoter analyses, looking for functional classification enrichment in sets of genes, and investigating protein–protein interactions. We also introduce bioinformatic tools that integrate data from several sources for improved hypothesis generation.
Key words Transcriptomics, Bioinformatics, Proteomics, Protein–protein interactions, Coexpression, Functional classification, Functional genomics, Promoter analysis, Subcellular localization
1 Introduction
Plant biology, like other areas of biology, has been transformed in the past decade by high-throughput methods for data generation, especially in genome and epigenome analysis, transcriptome and proteome profiling, the determination of protein–protein interactions, and metabolome determination. Many data sets have been generated, and while each individual set has been of tremendous use to the plant biologists who created it, in aggregate these publicly available data sets are also of great value to plant biologists around the world for querying in the context of their own biological questions. Such large data sets cannot by themselves provide a complete understanding of a given biological question, but they can be leveraged to help plan experiments or to generate hypotheses in silico, which can then be tested rapidly in the lab with the wide range of molecular techniques and genetic resources developed over a similar time frame.
This chapter provides an overview of web-based tools for querying data sets generated by researchers, often funded by the National Science Foundation Arabidopsis 2010 project in the USA, whose objective Jose J. Sanchez-Serrano and Julio Salinas (eds.), Arabidopsis Protocols, Methods in Molecular Biology, vol. 1062, DOI 10.1007/978-1-62703-580-4_5, © Springer Science+Business Media New York 2014 97 98 Miguel de Lucas et al. was to identify the functions of 25,000 genes in Arabidopsis by 2010 [1], and by the AtGenExpress Consortium, an international effort to measure the Arabidopsis transcriptome under many conditions and in different tissues.
Here, we emphasize web-based tools that are well cited and that tend to integrate data from several sources: although many researchers have set up project-based databases on websites, resources that draw from many sources are often more useful to the typical Arabidopsis researcher. We do not describe well-developed sequence databases, as these are covered in a chapter by Eva Huala and colleagues elsewhere in this volume of Arabidopsis Protocols. The SIGnAL website at http://signal.salk.edu/ [2] and the TAIR website at http://www.arabidopsis.org [3] are two very useful websites for exploring sequences and identifying insertions, among other uses. Instead, we focus on tools for querying transcriptome data sets, which are the most comprehensive of the large data types, and highlight tools for querying these data both in a directed way and correlatively. Such tools can be very useful for narrowing down the phenotypic search space and for providing leads on “novel” genes associated with a given biological process, respectively. We also look at several tools for exploring protein–protein interactions in Arabidopsis and for performing promoter analyses. Tools that integrate different data types to improve function prediction are key to extracting even more knowledge from these data sets, and two such tools are also covered. We use the gene ABSCISIC ACID INSENSITIVE 3 (ABI3, At3g24650) [4] as our example “gene of interest.” Although this gene is well known to be involved in seed biology, we will hypothesize additional functions for it using the tools described here, often at the cost of only a click of the mouse. The programs and websites discussed in this chapter are listed in Table 1 in Subheading 2. Two additional useful review articles on bioinformatic tools for hypothesis generation are by Brady and Provart [5] and by Usadel and colleagues [6].
2 Materials
Materials used in this protocol are indicated in Tables 1 and 2 [7–21].
Additionally, we use a list of genes differentially regulated in a LEC1 overexpressor line, as outlined in [22].
3 Methods
3.1 Expression Analysis
Online expression analysis can be a useful alternative to RNA blot analyses or promoter–reporter fusions for determining patterns of expression. For instance, imagine we had identified the abi3 mutation by positional cloning and wanted to know more about its biological function, perhaps to get guidance on where else to look for a phenotype. One of the first steps would be to examine its expression pattern. Online tools such as the eFP Browser or Genevestigator make this very easy, provided the platform used for measuring the transcriptome is able to detect the transcript of one’s gene of interest (see Note 1).
Bioinformatic Tools in Arabidopsis Research 99
Table 1 Tools, URLs, and references
Expression analysis [7, 8]
  eFP Browser: bar.utoronto.ca/efp/cgi-bin/efpWeb.cgi
  Genevestigator: www.genevestigator.com/gv/
Promoter analysis [9]
  Cistome: www.bar.utoronto.ca/cistome/cgi-bin/BAR_Cistome.cgi
  Athena: www.bioinformatics2.wsu.edu/cgi-bin/Athena/cgi/home.pl
Coexpression tools [10, 11]
  ATTED II: atted.jp/
  Expression Angler: bar.utoronto.ca/ntools/cgi-bin/ntools_expression_angler.cgi
Functional classification [12–14]
  AgriGO: bioinfo.cau.edu.cn/agriGO/
  AmiGO: amigo.geneontology.org/cgi-bin/amigo/term_enrichment
  Classification SuperViewer: bar.utoronto.ca/ntools/cgi-bin/ntools_classification_superviewer.cgi
Pathway visualization [15, 16]
  AraCyc: www.plantcyc.org/
  MapMan: mapman.gabipd.org/web/guest/mapman-download
Protein information [7, 17]
  SUBA III: suba.plantenergy.uwa.edu.au/
  Cell eFP Browser: bar.utoronto.ca/cell_efp/cgi-bin/cell_efp.cgi
Protein–protein interaction [18, 19]
  Arabidopsis Interactions Viewer: bar.utoronto.ca/interactions/
  NBrowse: www.arabidopsis.org/tools/nbrowse.jsp
Integrated tools [20, 21]
  VirtualPlant: virtualplant.bio.nyu.edu/cgi-bin/vpweb/
  GeneMania: www.genemania.org/
  ePlant: http://bar.utoronto.ca/eplant/
Table 2 ABI3 developmentally coexpressed genes
AT4G27160 AT4G27460 AT4G27150 AT1G80090 AT1G03890 AT3G62730 AT1G32560
AT2G33520 AT5G55240 AT3G44830 AT3G22640 AT5G50600 AT4G10020 AT2G38905
AT1G14950 AT5G54740 AT1G05510 AT3G54940 AT5G10140 AT5G24130 AT1G29680
AT4G27140 AT1G17810 AT5G01300 AT1G54860 AT2G41070 AT1G04560 AT2G23640
AT1G48130 AT5G01670 AT2G34315 AT5G57390 AT2G21490 AT2G02120 AT5G50360
AT3G18570 AT1G52690 AT1G27461 AT1G62710 AT4G26740 AT1G65090 AT2G02580
AT3G14360 AT5G60460 AT2G28490 AT5G24950 AT2G27380 AT1G73190 AT3G24650
AT4G16160 AT4G31830
3.1.1 eFP Browser
The eFP (“electronic fluorescent pictograph”) Browser at the Bio-Analytic Resource for Plant Biology (http://bar.utoronto.ca) [7] provides easy access to 80.2 million expression measurements from Arabidopsis thaliana, soybean (Glycine max), barrel medic (Medicago truncatula), poplar (Populus trichocarpa), maize (Zea mays), barley (Hordeum vulgare), and rice (Oryza sativa). Four-fifths of the measurements were made using Arabidopsis samples. Small pictographs represent the experimental samples and contexts from which the expression data were generated, and differences in expression level among these samples are denoted by a color scale.
1. Go to http://bar.utoronto.ca and select “Arabidopsis eFP Browser” from the BAR’s homepage.
2. Enter the AGI ID of your gene of interest (see Note 2). In our case, we enter “At3g24650” for the ABI3 gene into the Primary Gene ID box. Click Go.
3. Figure 1 shows the output when querying the eFP Browser with ABI3 using the default settings. The tissues sampled by Schmid et al. [23] for their “gene expression map during Arabidopsis development” and by Nakabayashi et al. [24] for the dry and imbibed seed samples are depicted pictographically. The higher the expression of ABI3 in a tissue (here, expression means steady-state mRNA level), the redder that tissue’s color.
If there is little expression in a tissue, it is colored yellow.
4. By changing the Data Source, it is possible to explore other data sets that have been annotated in this pictographic manner. The eFP Browser also reports the data source in which the expression of the gene of interest is strongest (for ABI3, the Seed Data Source, which is not surprising given ABI3’s known role there), but it is also worthwhile to examine other data sources (see Note 3). For instance, ABI3 also seems to be expressed in the vascular tissue between the elongation and maturation zones of the root. Had it not already been known that ABI3 is involved in root development [25], such an observation could have guided us to look more closely for phenotypes in the roots of abi3 mutants.

Fig. 1 “Default” view of the expression pattern of ABI3 (At3g24650) in Arabidopsis. Stronger expression is denoted by a darker coloration. The interface provides many options for exploring the expression data, as shown in the callout boxes (see Note 3)

5. The Relative Mode option allows you to view the expression of a given gene in each sample relative to its expression in a control sample, and to ascertain whether the gene’s expression is above or below this level. If it is above, the tissue in question is colored red; if it is below, it is colored blue.
For the Developmental Map, this control level is computed as the median level across all of the tissues displayed. The Relative Mode is most useful for “challenge” experiments, in which a hormone or chemical has been applied as part of the experimental design; the control sample is then the mock-treated or untreated control.
6. If a given gene does not map to an ATH1 probe set, try the “Developmental Map At-TAX” Data Source. These data were generated using a different platform, so it should be possible to get an idea of where any gene is expressed using this or the “Abiotic Stress At-TAX” Data Source [26, 27].

3.1.2 Genevestigator

Data from more than 8,000 ATH1 arrays are available for Arabidopsis in the Genevestigator analysis tool (https://www.genevestigator.com/gv/) [8]. Like the eFP Browser, the tools of this resource let us determine when and where our gene of interest is expressed, and in response to which conditions. The main difference between the two is that Genevestigator displays data in heat-map format rather than as pictographs. One of its major advantages is the simultaneous analysis of hundreds or thousands of genes in a biological context, whereas the eFP Browser permits a user to examine only one gene at a time (or two genes in Compare mode; see Note 4).

1. Go to https://www.genevestigator.com/gv/ and select “Plant Biology.” Click Analysis Tool Start, and then click Start under the Open access version.
2. Click on Sample Selection, and click “new” to choose Arabidopsis as the Organism and “ATH1: 22K array” as the platform. Alternatively, one can select the AGRONOMICS whole-genome tiling array or the AG: 8K array (note that no results will appear for ABI3 on the whole-genome tiling array). Name your selection, e.g., “ABI3 study” (see Note 5). Click OK.
3. In the Gene Selection tool, enter the AGI ID by clicking on “new” (see Note 6).
In our case, we enter the ABI3 AGI ID, “At3g24650.” Click OK.
4. The Condition Search tools give us gene expression data from the different array sets (see Note 7); filled dots indicate p-values under 0.06, and unfilled dots p-values over 0.06 (see Note 8). Choose, for example, “Samples” to explore gene expression on all of the available arrays. To see the experimental design and expression information, hover the mouse over the sample name or the dot.
5. Click on the different tabs to explore the anatomy, genotype, condition, and development ontologies. ABI3 expression is high in the seed arrays, principally in the embryo and endosperm rather than in the seed coat. By genotype, ABI3 is highly expressed in the pER8:LEC1 overexpression line, whereas it is repressed in lec1-1 plants and has lower expression in pif1/pif3/pif4/pif5 quadruple mutant plants. ABA treatment promotes its expression, as does treatment with paclobutrazol (a gibberellin biosynthesis inhibitor).
6. We can generate hypotheses from these data: phytochrome-mediated light signalling and downstream factors regulate ABI3 expression, and LEC1 likely regulates ABI3 expression either directly or indirectly.

3.2 Coexpression Tools

Coexpression analysis leverages the large number of gene expression data sets generated in the past decade to answer the question “which genes show patterns of expression similar to that of my gene of interest across all samples in a given database?” Genes that show similar patterns of expression may be involved in the same biological process as the query gene, following the “guilt-by-association” paradigm. The use of such analyses is well covered in a recent review by Usadel and colleagues [6].
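The guilt-by-association idea can be made concrete with a short Python sketch, assuming NumPy is available. It is not the code behind Expression Angler or ATTED II: it uses a made-up expression matrix with hypothetical gene names, ranks genes by their Pearson correlation coefficient r with a query gene using the approximate r > 0.75 cut-off given for Expression Angler in Subheading 3.2.1, and computes a mutual rank (MR), the measure ATTED II is based on [29], taken here as the geometric mean of the two reciprocal correlation ranks.

```python
import numpy as np

# Hypothetical toy data: 4 genes x 5 samples (made-up values, not real arrays).
genes = ["QUERY", "GENE_A", "GENE_B", "GENE_C"]
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],   # query gene
    [2.0, 4.1, 6.0, 8.2, 9.9],   # tracks the query closely
    [5.0, 4.0, 3.0, 2.0, 1.0],   # mirror image of the query
    [1.0, 3.0, 2.0, 5.0, 4.0],   # partially similar to the query
])


def pearson_matrix(expr):
    """Pearson r between every pair of rows (genes) of a genes x samples matrix."""
    return np.corrcoef(expr)


def coexpressed_with(r, names, query, threshold=0.75):
    """Genes whose r with the query exceeds the threshold, highest r first."""
    q = names.index(query)
    order = np.argsort(-r[q])                      # decreasing r
    return [(names[j], float(r[q, j])) for j in order
            if j != q and r[q, j] > threshold]


def correlation_ranks(r):
    """rank[a, b] = position of gene b in gene a's partner list sorted by
    decreasing r (1 = best partner). Each gene is ranked last against itself."""
    n = r.shape[0]
    masked = r.copy()
    np.fill_diagonal(masked, -np.inf)              # never rank a gene with itself
    order = np.argsort(-masked, axis=1)
    ranks = np.empty((n, n), dtype=int)
    ranks[np.arange(n)[:, None], order] = np.arange(1, n + 1)
    return ranks


def mutual_rank(ranks):
    """MR(a, b) = sqrt(rank[a, b] * rank[b, a]); smaller means stronger coexpression."""
    return np.sqrt(ranks * ranks.T)


r = pearson_matrix(expr)
hits = coexpressed_with(r, genes, "QUERY")
mr = mutual_rank(correlation_ranks(r))
print(hits)    # GENE_A (r close to 1) first, then GENE_C (r about 0.8)
print(mr[0])   # MR of QUERY with each gene (the self entry is meaningless)
```

GENE_B is perfectly anti-correlated with the query, so it is excluded by the r threshold and receives the worst mutual rank; a rank-based measure such as MR is less sensitive than raw r to differences in how many strongly correlated partners each gene has.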
3.2.1 Expression Angler

Expression Angler [11] is a powerful yet easy-to-use tool for identifying coexpressed genes, as measured by the Pearson correlation coefficient (r), in both a condition-dependent and a condition-independent manner (see Note 9). With it, it is possible to find the genes that show similar patterns of expression in nine different data sets; genes with an r-value greater than around 0.75 can be considered coexpressed. It is also possible to use just a subset of the samples within a given data set, which we will do below for ABI3. Hits annotated as “unknown function” or with only vague descriptions may be involved in the same process as the query gene.

1. Go to the Bio-Analytic Resource for Plant Biology’s homepage at http://bar.utoronto.ca and select the Expression Angler link.
2. In normal use, select a data set and enter the AGI ID of interest. Had we used the AtGenExpress Tissue Set, which corresponds to the data set shown in Fig. 1, we would identify many other seed maturation genes and ABA-responsive genes that are coexpressed with ABI3; the top 50 of these are listed in Table 2. Another way to use Expression Angler, however, is to define a subset of samples in which to search. Use the “Subselect and Custom Bait Page” link, and then choose a data set; in this case, the Root Compendium. On the input page, enter At3g24650, and then select “Return just the top 50 hits” in only the “Spatiotemporal expression” experiment [28] (see Note 10).
3. Click “Submit Query” at the bottom of the page.
4. On the output page, examine the “View formatted data set after median centering and normalization,” as shown in Fig. 2.
This view is closest to the way that Expression Angler “sees” expression pattern similarity with the Pearson correlation coefficient, which centers each expression vector on its mean (not its median) when comparing two genes. Another useful view is the “View formatted data set,” which shows the untransformed expression levels.
5. Hovering the mouse over the heat map reveals the annotation of each gene, the samples in which it is most strongly expressed, and other information. Interestingly, YABBY3, likely a patterning gene, shows up as being coexpressed with ABI3, as do several other transcription factors.

Fig. 2 Heat-map output of Expression Angler after searching in just the Root Spatiotemporal data set of Brady et al. [28] with ABI3

3.2.2 ATTED II

ATTED II [10] is a gene coexpression database for finding functional relationships between genes. It uses the mutual rank (MR) of the Pearson correlation coefficient [29] to investigate gene coexpression in Arabidopsis, either in a condition-independent way or across five sets of experimental conditions: tissue, abiotic stress, biotic stress, hormones, and light conditions. ATTED II also offers analysis of rice coexpression data, providing a comparative view between the two species based on putative gene orthologs.

1. Go to http://atted.jp/.
2. On the search menu, click on the arrow on the right-hand side of the pull-down menu and select the option that best fits your search (“All words,” “Keyword,” “Gene alias,” “Gene ID,” or “GO ID”). We will search by “Gene ID,” entering At3g24650 for ABI3. Click Search.
3. The output window shows a brief description of the gene of interest, such as its alias and function. Selecting “locus” opens a new window with detailed information about the gene: functional annotation, a gene coexpression network, gene expression levels, and predicted cis-elements.
4. For a more extensive analysis of coexpressed genes, go back to the locus search window and click on the “list” of coexpressed genes. The program returns the top 300 coexpressed genes (see Note 11).

Fig. 3 Output of an ATTED II query for the ABI3 gene, showing the ranked list of coexpressed genes in ATTED II’s condition-independent data set (top panel) and a visualization of the coexpression list in network form (insert)

5. Check “coex in specific conditions” to study coexpression under different conditions: tissue, abiotic stress, biotic stress, hormone, and light. We can rank coexpression in each condition by clicking on “sort.” This approach helps us to infer the gene’s function in each category. For instance, the genes most closely correlated with ABI3 differ extensively depending on the biological context of interest.
This suggests that ABI3 has multiple functions, both in development and in response to the environment: if we sort by “tissue,” ABI3 is coexpressed with several seed-associated genes, whereas different genes show up at the top of the ABI3-coexpressed lists under hormone treatments or abiotic stress.
6. Tick “Osa homolog” to see the homologous genes in rice. The output window shows the top 300 coexpressed genes in rice.
7. Click on the small “L”-shaped icon in the Link column for each coexpressed gene to get the same information described in step 3. One of the most powerful features of ATTED II is the network visualization of coexpressed genes. This network clearly shows the genes connected directly and indirectly to our query gene by coexpression. We can explore coexpression network neighborhoods by clicking on the gene names (see Fig. 3).
8. ATTED II shows that ABI3 is coexpressed with EPR1 (an extensin-like gene), which is involved in seed germination but expressed only in the endosperm [30]. AIL5 (AINTEGUMENTA-LIKE 5) also appears to be coexpressed with ABI3. AIL5 encodes a member of the AP2 family of transcriptional regulators, which are involved in cell proliferation in many organs [31]. ail5 mutants are tolerant to ABA. We can therefore hypothesize that ABI3 and AIL5 act together to control cell proliferation and/or the ABA response.

3.3 Promoter Analysis

Gene expression depends on the cis-regulatory elements present in the promoter regions of genes, which act as binding sites for one or more transcription factors. Many tools have been developed to better understand how these transcription factor binding sites might regulate gene expression. In this section we introduce tools that help us to analyze and visualize the promoter regions of Arabidopsis genes.

3.3.1 Cistome

Imagine a set of genes that are coexpressed in response to a certain stimulus.
It would be of interest to determine whether these genes share upstream regulatory motifs that could explain this behavior, and to identify putative upstream regulators. Cistome is a tool that searches for enriched motifs in the promoter regions of such genes.
1. Go to http://www.bar.utoronto.ca/cistome/cgi-bin/BAR_Cistome.cgi.
2. Specify the analysis that Cistome should perform on our gene list. If we are interested in whether a particular motif is overrepresented in the promoter regions of our gene set, click on “Enter PSSMs” and enter the search sequence in the required format. PSSMs (position-specific scoring matrices) are a more flexible way to represent transcription factor binding sites: they describe how likely each nucleotide is to occur at each position of the motif. For instance, we may be interested in the G-box motif (CACGTG), which is a binding site for the PIF transcription factor family [32]. This will help us explore whether the genetic association of ABI3 with the PIF transcription factors outlined in the Genevestigator section might occur through binding of this light-regulated family of transcription factors to the promoters of the ABI3 developmentally coexpressed genes in Table 2. Add the motif sequence in FASTA format (“>GBOX” and then, on a new line, “CACGTG”; see the Format example link for help), select the Consensus sequence option, tick the Significance testing option, and finally enter the AGI ID list in the “List of AGI identifiers to search” box and click Map. We find that this motif is in fact overrepresented in the promoters (1,000 bp in length) of the ABI3 developmentally coexpressed genes, with a Z-score of 7.32, which is highly significant.
As a rule, use a Z-score threshold greater than 3, which corresponds to at least 3 standard deviations above the mean number of times a given motif occurs in random samplings of all Arabidopsis promoters. This result predicts that a PIF transcription factor binds the promoters of genes coexpressed with ABI3 to regulate their expression in the light.
3. Alternatively, by clicking on the Use Prediction tab at the top of the Cistome page, we can screen against all of the motifs in one of two parts of PLACE [33], a database of previously characterized motifs. The first part, All PLACE Elements, contains motifs identified in all plants. The second is the subset of these motifs that have been identified in Arabidopsis. We recommend using the entire PLACE database (see Note 12) to cover a broader range of possible elements. Alternatively, one can identify overrepresented uncharacterized elements by clicking Cis Scan to activate the cis-motif prediction programs available in Cistome. To map known PLACE elements onto the promoters of the ABI3 developmentally coexpressed genes from Table 2, tick “Search for enriched PLACE database elements within your gene set,” and search for enriched motifs using “ALL PLACE elements.” You can also specify the significance parameters. In our example analysis we use the default parameters: a Z-score cutoff of greater than 3, a functional depth cutoff of 0.35, and a requirement that the motif be found in at least half of the genes in the gene set.
4. Specify the gene data set that Cistome will use as a background. We use the latest version of TAIR available in Cistome (TAIR9 at the time of writing). Indicate the length of sequence to be used for the analysis (e.g., 1,000 bp), using the “transcriptional start site (TSS)” as the start position.
The majority of binding sites have been identified in the first 500–1,000 bp upstream of a gene’s transcriptional start site [34–36]. One can also specify a custom background set by uploading a file of sequences.
5. Click on “Map,” and Cistome displays a diagram with the overrepresented regulatory elements mapped onto the promoters of the genes included in the analysis. Overrepresentation is determined by comparing the frequency of occurrence of each motif with its frequency of occurrence in randomly selected sets of promoters from the background set. Click on “Cluster,” and Cistome clusters the overrepresented sequences based on sequence similarity. Click on “Logo” to get the frequency of each nucleotide found at each position of the overrepresented binding sites. Once you have a given sequence motif, you can identify other genes in the genome that may contain this element. You can then query coexpression databases to see whether these genes are coexpressed with your gene of interest or, in our case, whether they are coexpressed with ABI3 under any other conditions.

Fig. 4 Output of a Cistome query that represents the overrepresented regulatory elements mapped onto the promoters of ABI3 developmentally coexpressed genes. Pink represents the ERD1 motif, while green represents the RYE motif

6. Visualization of multiple overrepresented elements can further reveal whether any elements are co-localized within the promoters of these coexpressed genes. This could indicate combinatorial gene regulation.
For instance, the pink element (ERD1 motif) and the green element (RYE motif) are located beside each other in many of the coexpressed genes (see Fig. 4).

3.3.2 Athena

The Athena analysis tool [9] from the Wyrick Laboratory at Washington State University integrates DNA sequence and Gene Ontology (GO) data to facilitate the analysis of 30,077 predicted Arabidopsis promoter sequences and 105 different transcription factor binding sites (see Note 13). We will use Athena to identify transcription factor (TF) binding sites present in the ABI3 promoter, and then to identify TF binding sites shared by genes developmentally coexpressed with ABI3 (see Table 2).

Fig. 5 Graphical output from Athena showing potential transcription factor binding sites in the ABI3 promoter

Transcription Factor Binding Sites on the ABI3 Promoter

1. Go to http://www.bioinformatics2.wsu.edu/cgi-bin/Athena/cgi/home.pl and click the Visualization tab in the menu bar along the top.
2. Enter your gene of interest’s AGI ID (“At3g24650” for ABI3 in our case) into the Accessions box (see Note 14). Select “Compact” for the visualization type (for more detail about the structure of the promoter, select “Cartoon”), and set the maximum upstream range of the promoter to “3,000 bp” in the Upstream Range box. Tick “Cut-off at adjacent genes” to truncate the promoter where it overlaps with the next gene upstream. Click Display.
3. The output window has three boxes (see Fig. 5):
– The “Analysis box” gives a text output of the analysis. Click “Motifs” to get the sequences and positions of all selected TF binding sites in the promoters. Click “Frequency” to get the genome-wide frequency of promoters containing each TF site and the calculated p-value. To get the promoter sequence of the gene, click “download promoters.”
– The “Selected Promoters” box provides a graphical representation of the predicted TF binding sites in the promoter.
– The “TF box” lists the binding sites present, and those significantly overrepresented, in the promoter, together with the name of each binding site, the number of times (#S) it occurs in the promoter, and a p-value calculated using the hypergeometric probability distribution. The p-value is particularly useful when one inputs a group of genes coexpressed with the gene of interest: statistically significant overrepresentation of binding sites in the promoters of the group can point to putative common transcriptional regulators.
4. Athena identifies 20 binding sites in the ABI3 promoter (3,000 bp). The binding site with the lowest p-value is the “Z-Box promoter motif” (p-value = 0.042, motif = ATACGTGT). We recommend a p-value threshold of 0.01; at this threshold, the motif is not significantly enriched in the ABI3 promoter, given its overall presence in the promoters of the genome. Had it passed this threshold, we could have predicted that a bZIP transcription factor is a likely transcriptional regulator of ABI3.

Upstream Co-regulator Identification Using a Coexpressed Gene List

The Athena Analysis Suite tool is a powerful interface for finding genes with common TF binding sites in their promoters. You can analyze the entire Arabidopsis genome or use your own list of genes.
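The text states only that Athena's p-values come from the hypergeometric distribution. The Python sketch below shows one standard way to compute such a one-sided enrichment p-value for a binding site, assuming SciPy is available; it is not Athena's code. Exact consensus matching on both strands is a simplification of how binding sites are defined, the promoter sequences are invented, and the count of 3,000 site-containing background promoters is hypothetical (only the figure of 30,077 promoters comes from the text). The same calculation applies to a gene list submitted to the Analysis Suite, and the p < 0.01 threshold recommended in step 4 can be applied to its result.

```python
from scipy.stats import hypergeom

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_site(promoter, motif):
    """True if the exact consensus motif occurs on either strand of the promoter."""
    promoter = promoter.upper()
    motif = motif.upper()
    return motif in promoter or reverse_complement(motif) in promoter


def site_enrichment_p(n_genome, k_genome, n_set, k_set):
    """One-sided hypergeometric p-value P(X >= k_set).

    n_genome: promoters in the background (population size)
    k_genome: background promoters containing the site
    n_set:    promoters in the query gene set (number of draws)
    k_set:    query promoters containing the site
    """
    # SciPy's order is hypergeom(M, n, N) = (population, successes, draws);
    # sf(k - 1) is P(X > k - 1) = P(X >= k).
    return hypergeom.sf(k_set - 1, n_genome, k_genome, n_set)


# Hypothetical promoters scanned for the Z-box motif reported in step 4.
promoters = {
    "gene1": "TTGACATACGTGTCCA",   # forward-strand match
    "gene2": "GGACACGTATAAAT",     # reverse-strand match
    "gene3": "CCCCAAAATTTTGGGG",   # no match
}
k_toy = sum(has_site(seq, "ATACGTGT") for seq in promoters.values())

# Hypothetical counts: 30,077 background promoters (the number Athena analyzes),
# of which we assume 3,000 contain the site; 12 of 50 query promoters contain it.
p = site_enrichment_p(n_genome=30077, k_genome=3000, n_set=50, k_set=12)
print(k_toy, p, p < 0.01)
```

With these made-up numbers about 5 of the 50 query promoters would be expected to contain the site by chance, so observing 12 gives a small p-value, whereas observing 5 would not be significant.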
For the purpose of this chapter, we will find genes that contain the ABRE binding site motif (see Note 15) in their promoters among the ABI3-coexpressed genes identified in Subheading 3.2.1 (see Table 2).
1. Go to http://www.bioinformatics2.wsu.edu/cgi-bin/Athena/cgi/home.pl and click Analysis Suite.
2. The “Accessions” box contains the options for gene list selection; tick “Use a subset,” and paste the list of genes from Table 2. Select the ABRE binding site motif from the list of TF motifs in the “Transcription Factors” box, and click “Add TFs.” You can include GO terms in the “Gene Ontology” box. Multiple TF motifs and/or GO terms can be included in the analysis.
3. The TF search can be constrained to specific positions in the promoter sequence using the “Motif Positions” box. The Start and End numbers indicate the beginning and end of the positional search constraints, respectively. As with the Athena Visualization tool earlier in this section, set the maximum upstream range of the promoter to “3,000 bp” in the “Range Selection” box, and select the “Cut-off at adjacent genes” option. Click Submit.
4. Six of the ABI3-coexpressed genes (At1g48130, At2g21490, At4g16160, At5g10140, At5g24130, and At5g50360) contain the ABRE binding site motif. Click on an AGI ID to open the TAIR gene information. At5g10140 encodes FLOWERING LOCUS C (FLC), which functions as a repressor of the floral transition. ABA has previously been shown to delay flowering by affecting the FLC transcript level [37].

3.3.3 TAIR Motif Analysis

Athena and Cistome analyze promoters for overrepresented, previously validated or characterized regulatory elements; Cistome also provides access to five other prediction programs. The Motif Analysis tool from TAIR provides an alternative by searching for overrepresented 6-mer oligos in the upstream regions of genes.
1. Go to http://www.arabidopsis.org/tools/bulk/motiffinder/.
2. Add your list of genes by typing the AGI IDs or the sequences in FASTA format. Here we use the list of genes in Table 2.
3. Indicate the length of the regulatory sequence to be included in the analysis (e.g., 3,000 bp), and select the output format (e.g., HTML). Click Submit.
4. Motif Analysis identifies statistically overrepresented 6-mer oligos occurring in three or more sequences in the gene set. The overrepresented 6-mers are sorted by p-value, determined by comparison against a binomial distribution, and the genes containing each sequence are indicated.

3.4 Functional Classification

Functional classification of gene lists is one of the basic bioinformatic methods for making sense of the sometimes rather large gene lists that arise from gene expression profiling experiments. Typically, one might look at individual genes in such lists and “see if they fit biologically,” but one might also like an overview of the broad functional categories that change in response to a given stimulus or due to a specific mutation. One of the most useful large initiatives of the past decade was the development of the Gene Ontology (GO) for the “unification of biology” [38]. This system is a set of categories, described using defined terms rather than free text, to which genes can be assigned. There are three main super-categories: biological process (BP), molecular function (MF), and cellular component (CC). TAIR has been the main curator of GO annotations for Arabidopsis genes, with some input from other groups. A gene may belong to several categories and subcategories at once, which are arranged hierarchically from very general to very specific terms (technically, the relationships between categories and subcategories are formalized as a directed acyclic graph).
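Because GO terms form a directed acyclic graph, a gene annotated to a specific term also counts toward every more general ancestor of that term (often called the true path rule), and enrichment tools generally count genes per category this way. The Python sketch below illustrates this propagation on a tiny, entirely hypothetical ontology fragment (terms "root" and "A" to "D") with hypothetical genes; real tools read the ontology and annotation files, which are not reproduced here.

```python
from collections import defaultdict

# Hypothetical ontology fragment: term -> set of parent terms.
# Term "C" has two parents, which is what makes this a DAG rather than a tree.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "C": {"A", "B"},
    "D": {"C"},
}


def ancestors(term, parents):
    """All terms reachable from `term` by following parent links (term included)."""
    seen = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents[t])
    return seen


def propagate(annotations, parents):
    """Map each term to the set of genes annotated to it or to any of its descendants."""
    genes_per_term = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            for anc in ancestors(term, parents):
                genes_per_term[anc].add(gene)
    return genes_per_term


# Hypothetical direct annotations (the most specific terms assigned to each gene).
annotations = {"g1": {"D"}, "g2": {"A"}, "g3": {"B", "C"}}
counts = {term: len(g) for term, g in propagate(annotations, PARENTS).items()}
print(sorted(counts.items()))
```

Using sets means a gene reaching the same ancestor by two paths (g1 reaches "root" through both "A" and "B") is still counted once, which is what an enrichment test needs.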
Statistical tests, often a hypergeometric test, can be used to assess whether the number of genes from one’s list of interest that are associated with a given term (i.e., category) is greater than would be expected by chance. Such a test can be used for any classification system with categories into which items are classified. Another classification system, called MapMan Bins, was initiated by Björn Usadel and colleagues at the Max Planck Institute of Molecular Plant Physiology in Germany [16]. A variation on this approach is to examine genes whose expression is altered in response to a perturbation in the context of the biological pathways to which they belong.

3.4.1 AgriGO

AgriGO [12], from Zhen Su’s laboratory at China Agricultural University, is a user-friendly tool for analyzing whether any particular GO terms are enriched in a given gene list from Arabidopsis (or from many other agriculturally important species). It provides a nice visualization in the same directed acyclic graph structure on which the GO system is built.
1. Go to http://bioinfo.cau.edu.cn/agriGO/ and select “Analysis Tool” in the tab along the top.
2. In the first section, for selecting the analysis tool, select “Singular Enrichment Analysis (SEA).”
3. Select the species (the default is Arabidopsis thaliana).
4. Paste in the query list as AGI IDs, gene aliases (e.g., ABI3), GenBank IDs, etc. Many different identifiers are supported.
5. Choose a reference. If the list comes from a microarray experiment, choose the appropriate microarray platform; otherwise, if the list comes from an experiment in which any of the AGI IDs in the TAIR genome annotation could be identified (such as a proteomics or mRNA-seq experiment), choose the “Arabidopsis genome locus (TAIR)” option. This flexibility is a nice feature of AgriGO.
In this example, we submit the top 50 genes coexpressed with ABI3 in the AtGenExpress Tissue Set, as discussed in Subheading 3.2.1, step 2 (see Table 2). Because the data used to obtain the coexpressed genes come from the Affymetrix ATH1 platform, we use this platform as our reference.
6. Under “Advanced Options,” which are optional, one can select one of three methods for statistical enrichment (hypergeometric distribution, Fisher’s exact test, or chi-square test) as well as one of seven multiple hypothesis testing correction methods. We recommend the Storey q-value method.

Fig. 6 Graphical output from AgriGO for the top 50 ABI3-coexpressed genes in the AtGenExpress Tissue Set from Subheading 3.2.1, step 2. The GO term “Lipid localization” (red) is the most significantly enriched among these genes (see Note 16)

7. The output displays a table of enriched GO categories for our list of 50 genes, showing that four GO biological process terms (lipid localization, response to abscisic acid stimulus, macromolecule localization, postembryonic development) and two GO molecular function terms (nutrient reservoir activity, lipid binding) are significantly enriched. These seem to “make sense” in the context of the later stages of seed development, when ABI3 and these genes are expressed, insofar as this is when lipid reserves accumulate and the seed begins to desiccate. It is also possible to create “Graphical Results” or a “GO Flash Chart.” Clicking the Generate Image button produces the output for enriched biological processes shown in Fig. 6.
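AgriGO applies the chosen multiple-testing correction itself; the Python sketch below only illustrates what such a correction does to a set of raw enrichment p-values, assuming NumPy is available. It implements the Benjamini-Hochberg procedure and a simplified version of Storey's q-value in which pi0 (the estimated proportion of true null hypotheses) is computed from a single fixed lambda of 0.5; the published method usually estimates pi0 more carefully, and this is not AgriGO's code. The p-values are hypothetical.

```python
import numpy as np


def _step_up(pvals, factor=1.0):
    """Step-up adjustment: factor * p_(i) * m / i, made monotone and capped at 1."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                                   # ascending p-values
    scaled = factor * p[order] * m / np.arange(1, m + 1)
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]    # running minimum from the largest p
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)                  # restore input order
    return out


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (false discovery rate)."""
    return _step_up(pvals, 1.0)


def storey_qvalues(pvals, lam=0.5):
    """Simplified Storey q-values using one fixed lambda to estimate pi0."""
    p = np.asarray(pvals, dtype=float)
    pi0 = min(1.0, float(np.mean(p > lam)) / (1.0 - lam))
    return _step_up(p, pi0)


# Hypothetical raw p-values for five GO terms (not AgriGO output).
raw = [0.04, 0.8, 0.01, 0.03, 0.02]
print(bh_adjust(raw))
print(storey_qvalues(raw))
```

Because pi0 is at most 1, the q-values are never larger than the Benjamini-Hochberg values; when many terms look truly enriched (few large p-values, so a small pi0), the Storey approach is less conservative.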
3.4.2 AmiGO

AmiGO [13] provides a generic interface for computing GO term enrichment for all of the species annotated by the GO Consortium.

1. Go to http://amigo.geneontology.org/cgi-bin/amigo/term_enrichment.

114 Miguel de Lucas et al.

Fig. 7 Output from AmiGO shows the enriched GO terms in the data set, and the genes associated with each GO term

2. Paste your gene identifiers into the “Input your gene products” box. Using Genevestigator, we have shown that ABI3 expression depends on the presence of LEC1. To better understand this ABI3-LEC1 relationship, we will use genes whose steady-state transcript levels are increased in LEC1 overexpressor (OX) plants [22]. This will allow us to determine the biological functions associated with genes that are also overexpressed and likely downstream of LEC1. Select TAIR as the database filter. This tool can also exclude GO annotations that have been inferred electronically (IEA annotations) from the enrichment analysis. Click Submit. The output for this gene list is shown in Fig. 7.
3. Mouse over a gene’s ID to get more information (protein sequence, TAIR link, etc.). At the top right of the gene information page, we can explore the terms associated with that particular gene by clicking on “Terms association.” Forty-six of the genes upregulated in LEC1-OX plants are grouped into “lipid metabolic process,” with a p-value of 8.26e-13 and a frequency of 11.1 %. The frequency of this term in the background is 2.7 %.
LEC1 is associated with this term, but it is also associated with ten other terms, e.g., ABA-mediated signalling, blue light signalling pathway, and embryo development.

3.4.3 Classification SuperViewer

The BAR’s Classification SuperViewer [14] provides a different way to view Gene Ontology and MapMan classifications for lists of genes, using a barcode scheme. Classification SuperViewer barcodes are also integrated into several other BAR output tools.

1. Go to http://bar.utoronto.ca/ntools/cgi-bin/ntools_classification_superviewer.cgi and input your list of genes.
2. Under the second point, select the classification scheme you wish to use. This is either GO (in the case of this tool, actually GO Slim) or MapMan.
3. Leave the other options as they are, and click Submit Query.
4. The output page is divided into three parts (see Fig. 8):
   - an overview table, in which enriched categories are shown in bold (enrichment is assessed by a hypergeometric test with a p-value cutoff of 0.05);
   - a chart area, which summarizes the category information in a different way;
   - a detailed table section, which is linked from the overview area.
   In these areas, sections with a grey background are GO biological process terms, those with a white background are GO molecular function terms, and those with a yellow background are GO cellular component terms. This shading scheme does not apply to MapMan terms.
5. In the Overview section, categories that are overrepresented relative to the total number of instances of the term in the overall GO or MapMan database (see Note 17) are shown in bold. The relative enrichment is shown on the left, and the absolute number of counts in a given category is shown on the right. The same category color scheme is used in the chart section and for the bar code in the table section.
For a list of the top 50 genes coexpressed with ABI3 in the Developmental Map, the Developmental Processes and Transport categories are overrepresented. This is as might be expected, given the number of genes in this list involved in dormancy as seeds mature and in transporting lipids to provide reserves for the seed when it germinates. These categories are also seen with AgriGO.
6. The Chart section shows the overrepresented categories either relative to their frequency in the overall Arabidopsis genome (left) or in terms of absolute counts (right).
7. The Table sections show details for every gene in the input list. A bar code system, using the same color scheme as the other two sections, shows that a given gene often falls into several GO categories. Genes are grouped by category, and the final bar on the right is the category used for grouping. A gene therefore appears in this table as many times as there are bars in its bar code. Mousing over a particular bar provides information on the actual GO term.

Fig. 8 Output of Classification SuperViewer for a list of the top 50 ABI3-coexpressed genes from a query of the BAR’s Expression Angler tool in the AtGenExpress Tissue Set compendium (Table 2)

3.5 Pathway Visualization

One of the biggest issues in working with large-scale data sets is representing the information in a form that is easy to visualize and from which one can quickly generate hypotheses. This is particularly important in the context of metabolic pathways.
If a series of enzymes in a pathway is upregulated or downregulated, there is a greater chance that the metabolism of the compounds associated with this pathway will be perturbed in a corresponding manner. Pathway visualization tools were developed to integrate and analyze data from large-scale experiments and to place that information in an easy-to-interpret metabolic context. In this section we introduce two visualization tools that cover a wide set of Arabidopsis metabolic pathways.

3.5.1 AraCyc

AraCyc 8.0 [15] is the most comprehensive Arabidopsis-specific metabolic database (see Note 18). Its tools can be used to visualize individual metabolic pathways, to view the complete metabolic map of Arabidopsis, or to predict metabolic pathways from a list of genes. We will demonstrate how to use these three options to characterize the role of ABI3 as it pertains to plant metabolism. As ABI3 is highly expressed after treatment with abscisic acid (ABA), we may be interested in learning more about genes that function to synthesize ABA.

Fig. 9 Overview of ABA biosynthesis in AraCyc resulting from a query with “abscisic acid biosynthesis”

1. Go to http://www.plantcyc.org/.
2. In the search box, type the name (or a keyword) of the pathway in which you are interested. In our case we will type “Abscisic.” Then choose AraCyc as the metabolic database. Click search.
3. The search results contain a window with a list of pathways, proteins, compounds, and reactions that match the search term. Click on the one you want to explore, in our case “abscisic acid biosynthesis” (see Fig. 9).
4.
AraCyc shows a diagram of the abscisic acid biosynthesis pathway. Enzymes are shown in orange, compounds in red, genes in purple, and related pathways in green. Clicking on “more detail” adds the molecular structures of the compounds to the diagram. Below the diagram are the chromosomal locations of the genes in the pathway, a brief description of the biological context of the pathway, and the references AraCyc used to generate the pathway.
5. To get information about the enzymatic reaction in which a gene product is involved, click on the enzyme name (not the AGI ID). This opens a new window with more information. For instance, clicking on 9-cis-epoxycarotenoid dioxygenase shows all reactions in which this enzyme could be involved, as well as the enzymatic reactions of all closely related homologs.
6. To get detailed information on the gene through TAIR, double-click on the gene name. For instance, ABA4 (At1g67080) encodes a neoxanthin synthase involved in the conversion of violaxanthin into trans-neoxanthin, an early step in ABA biosynthesis. We can expect mutants in ABA4 to have reduced levels of ABA. Since ABI3 is ABA responsive, its expression should then be reduced too [25]. Transcriptome analysis of ABA4 mutants would be useful for studying the plant’s behavior in the absence of ABA and for determining any correlation with loss of ABI3 function.

In Subheading 3.1.2 we determined that ABI3 was upregulated in LEC1 overexpression plants (pER8-LEC1) [22]. We will use the list of genes upregulated in LEC1-OX plants [22] with the OMICs Viewer tool of the AraCyc database. This will predict the metabolic pathways that LEC1 overexpression modulates. These genes may act with ABI3 to influence plant form or function.

1. Go to http://pmn.plantcyc.org/ARA/expression.html.
2. The left part of the OMICs Viewer window summarizes the types of data we can analyze.
The file must be in tab-delimited text format, with the locus name (e.g., At3g24650) in the first column and the expression value in the second (see Note 19). Configure the upload as follows:
   - Click on “Browse” to upload the file.
   - Choose whether to display “Relative” or “Absolute” values.
   - As we have only one column of expression data, tick “a single data column.”
   - As our data are log2-transformed, use the “0-centered scale.”
   - As our data use locus names, choose “Gene names and/or identifiers” as the items in the first column of the data file.
   - As our data file contains only one experiment, type “1” in the data columns box. If your data have multiple sets of values, type the numbers of the columns you want to display.
   The color scheme and display type options can also be adjusted; we will leave the defaults. Click “Submit.”
3. The output window (Fig. 10) shows a diagram of all Arabidopsis metabolic pathways, in which the OMICs Viewer uses red to represent highly expressed genes. Multiple genes involved in gibberellin (GA) biosynthesis appear to be highly upregulated and overrepresented in our expression data. This suggests that GA biosynthesis may be upregulated in LEC1-OX plants.
4. To see the pathways represented in our expression data in detail, go back to the main page of the OMICs Viewer. Check “Generate a table of individual pathways exceeding threshold,” and select a threshold value, e.g., 1.5-fold. Clicking on a pathway shows its details (see Fig. 11).
5. LEC1-OX appears to promote gibberellin biosynthesis through the activation of genes involved in that metabolic pathway, such as GA20 oxidases 3 and 7. LEC1 acts as a positive regulator upstream of ABI3 [39], as ABI3 is upregulated in LEC1-OX plants. As we have shown using Genevestigator, the GA biosynthetic inhibitor paclobutrazol inhibits ABI3 expression. LEC1 and ABI3 therefore could play a role in the crosstalk between the ABA and GA pathways, which is consistent with the known influence of these genes on these pathways.

Fig. 10 Output of AraCyc’s OMICs Viewer summarizing the increases and decreases in transcript abundance in LEC1 overexpression plants. Each node represents a metabolite (node shape indicates the type of metabolite). Lines represent reactions, colored by expression level; the GA biosynthetic pathway is indicated.

Fig. 11 Detail generated by clicking on an overrepresented pathway in the OMICs Viewer. Red lines represent the reactions that exceed the threshold. The Arabidopsis enzymes that catalyze each reaction are listed; those represented in our data set are colored.

3.5.2 MapMan

MapMan [16] is one of the most widely used software tools for pathway visualization. It classifies genes and metabolites into ontologies based on metabolic pathway, cellular function, biological response, and gene family. Its main advantage is that the user can download the software and work offline. In addition, the databases associated with MapMan are well annotated and easily downloadable in a format that is useful for bioinformaticians.

1. Go to http://mapman.gabipd.org/web/guest/mapmandownload and download the latest version of MapMan (see Note 20). Open MapMan.
2. Once open, the software shows the “get started” window, which explains how to use the tool. Basically, MapMan works by combining a data file (experimental results) with diagrams (pathways or chromosomal views) and mapping information. Every file is stored in a specific folder (left side of the program). Before starting the analysis, it is worth exploring the files available in MapMan (pathways and mapping files).
To download more pathways or mapping files from the MapManStore server, click “File,” then “Add pathway” or “Add mapping.” Click “Download,” and choose a pathway or mapping from the list (e.g., the latest TAIR gene annotation).
3. Upload your data by going to “File” and “Add data.” Data must be in an .xls file or a tab-delimited .txt file. The first column should contain the AGI IDs or Affy IDs, and the second column the expression values. The data will be stored in the “Experiments” folder. We will use the genes upregulated in LEC1-OX plants from [22].
4. To visualize the data, choose a pathway from the left and double-click it, e.g., “Regulation overview.” Choose a mapping that matches the data: Ath_AGI_TAIR for AGI IDs, or Ath_AFFY_TAIR for Affy IDs. For the LEC1-OX genes, click on the data file uploaded in step 3.
5. MapMan shows a representation of the pathways and genes showing altered regulation (see Fig. 12). Each gene is symbolized by a square, and expression is color encoded (by default, red denotes downregulated and blue denotes upregulated). As we are looking at genes overexpressed in LEC1-OX plants, we see only blue. LEC1 overexpression promotes the expression of transcription factors and of genes involved in protein modification and degradation. In the hormone pathways, LEC1 promotes the expression of genes involved in auxin, brassinosteroid, and gibberellin metabolism. Below the pathway representation are the results of the statistical enrichment analysis (Wilcoxon rank-sum test) performed by MapMan. Mouse over gene squares to see information about gene function, name, and expression value. An online tutorial on the MapMan site provides more information about using MapMan with experimental data.

Fig. 12 Output of a MapMan pathway analysis using genes upregulated in LEC1-OX plants
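MapMan’s per-bin statistics can be approximated with a short sketch. It reads the two-column, tab-delimited data file described in step 3. For each bin, it then compares the values of the bin’s genes with those of all other genes in the file using a Wilcoxon rank-sum test. This is an illustration under several assumptions and not MapMan’s own code:
   - A small bin-to-gene dictionary stands in for a real mapping file such as Ath_AGI_TAIR.
   - Only AGI identifiers are kept; Affy IDs would instead need the Ath_AFFY_TAIR mapping.
   - The p-value uses the normal approximation, without tie or continuity correction.
   - All function names are hypothetical.

```python
import csv
import io
import math
import re

# AGI locus pattern: nuclear (AT1G-AT5G), chloroplast (ATCG), mitochondrial (ATMG)
AGI = re.compile(r"^AT[1-5CM]G\d{5}$", re.IGNORECASE)


def read_expression(text):
    """Parse 'identifier<TAB>value' lines, keeping only rows with an AGI ID."""
    values = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 2 or not AGI.match(row[0].strip()):
            continue  # header, blank line, or non-AGI (e.g. Affy) identifier
        values[row[0].strip().upper()] = float(row[1])
    return values


def average_ranks(xs):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and xs[order[end + 1]] == xs[order[start]]:
            end += 1
        for j in range(start, end + 1):
            ranks[order[j]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def rank_sum_p(in_bin, rest):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation,
    no tie or continuity correction)."""
    n1, n2 = len(in_bin), len(rest)
    ranks = average_ranks(list(in_bin) + list(rest))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def bin_rank_sum_tests(values, bins):
    """bins: hypothetical {bin name: set of uppercase AGI IDs}. Each bin's
    values are compared with those of all other genes in the data file."""
    results = {}
    for name, members in bins.items():
        inside = [v for gene, v in values.items() if gene in members]
        outside = [v for gene, v in values.items() if gene not in members]
        if inside and outside:
            results[name] = rank_sum_p(inside, outside)
    return results
```

Typical use is `bin_rank_sum_tests(read_expression(text), bins)`, where `text` holds the contents of the uploaded data file. A correction for testing many bins at once is omitted here for brevity.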
3.6 Protein Information

3.6.1 SUBA III

The subcellular location database for Arabidopsis proteins [17] at http://suba.plantenergy.uwa.edu.au/ is a comprehensive resource that includes experimental (“direct assay”) data from more than 1,000 publications:
   - 4,110 entries, comprising 2,647 distinct proteins, are based on chimeric fusion studies;
   - 24,142 entries, comprising 7,893 distinct proteins, are based on subcellular proteomic studies.
Subcellular localization predictions generated by 25 algorithms are also provided. The input page lets you specify what you would like to retrieve from the SUBA database. Alternatively, you can query in a general manner, either for a single gene or for a list of genes, as follows:

Fig. 13 SUBA III input page showing various options

1. Go to http://suba.plantenergy.uwa.edu.au/. Click on the “Search” tab.
2. In the input box at the bottom of the input page, enter your AGI ID of interest. In this case we will enter ABI3’s AGI ID, At3g24650. Click the “Add” button for “Arabidopsis Gene Identifier,” and make sure “is in list” is selected in the pull-down list to generate a SUBA query. The gene identifier will now appear in the Query box at the bottom of the page (see Fig. 13). Alternatively, use the Quick Search function on the SUBA home page.
3. Click Query.
4. A Results page listing the genes in your input will be generated. Click on the desired AGI ID, in this case the At3g24650.1 link, to see the data for this gene product. The resulting page is called the SUBAIII flatfile for At3g24650.1.
5. The flatfile page for At3g24650.1 shows no MS/MS or GFP data for ABI3’s subcellular localization, but SwissProt reports that it is in the nucleus. Similarly, 10 of the 25 prediction programs, including SubLoc [40] and WoLF PSORT [41], predict it to be located in the nucleus.
The predicted hydropathy plots for the protein are also shown, along with other data. Given that ABI3 is a transcription factor, we expect it to be located in the nucleus. For proteins of unknown function, however, predicted or experimental data on where the protein is located in the cell can be useful for predicting its function.

Fig. 14 Cell eFP output page for At3g24650, ABI3. Documented or predicted locations are colored by the confidence of localization in a given compartment (red = highest confidence).

3.6.2 Cell eFP Browser

Data from SUBA III can be rendered onto a pictograph of the parts of the cell using the Bio-Analytic Resource’s Cell eFP Browser [7]. The Cell eFP Browser taps directly into the SUBA III database. It uses a simple heuristic algorithm that weighs “direct assay” subcellular localization data more heavily than prediction programs to give a visual representation of where the protein is localized within the cell.

1. Go to http://bar.utoronto.ca/cell_efp/cgi-bin/cell_efp.cgi.
2. Enter the AGI ID for a gene of interest, for example, for ABI3 (At3g24650).
3. Click Lookup.
4. The output page displays a pictograph showing the localization of the protein (see Fig. 14). A stronger red color denotes that several direct assays have documented the protein at a particular location. Predictions receive only one-fifth of the weighting given to direct assays.
5. The boxes on the right side of the Cell eFP output can be used to adjust the data sources used for display.
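The Cell eFP weighting described in step 4 can be illustrated with a short sketch. Each direct assay adds one unit of support to a compartment, and each prediction adds one-fifth of a unit. The text does not describe how the Browser turns these scores into shades of red, or how curated annotations such as SwissProt are counted. Scaling the scores so that the best-supported compartment equals 1.0 is therefore our assumption, and the function name and toy records are hypothetical.

```python
from collections import defaultdict

DIRECT_WEIGHT = 1.0
PREDICTION_WEIGHT = DIRECT_WEIGHT / 5  # a prediction counts one-fifth of a direct assay


def localization_scores(records):
    """records: iterable of (compartment, evidence) pairs, where evidence is
    "direct" (e.g. a GFP fusion or MS/MS study) or "prediction".
    Returns {compartment: score}, scaled so the top compartment is 1.0."""
    scores = defaultdict(float)
    for compartment, evidence in records:
        scores[compartment] += DIRECT_WEIGHT if evidence == "direct" else PREDICTION_WEIGHT
    top = max(scores.values(), default=0.0)
    return {c: s / top for c, s in scores.items()} if top else {}


# Hypothetical toy input, not SUBA data for any real protein
toy = [("nucleus", "direct")] + [("nucleus", "prediction")] * 3 + [("cytosol", "prediction")] * 2
print(localization_scores(toy))
```

With this weighting, a single direct assay outweighs four agreeing predictions. This matches the intent described above: experimental localization evidence should dominate the display.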
3.7 Protein–Protein Interaction Networks

Several databases can be used to explore Arabidopsis protein–protein interactions, notably the BAR’s Arabidopsis Interactions Viewer (AIV) and TAIR’s NBrowse, described below. Because literature curation is by no means complete for any of these databases, it is advisable to examine others as well, such as:
   - IntAct (not specific to Arabidopsis) at http://www.ebi.ac.uk/intact/ [42];
   - BioGRID (thebiogrid.org) [43];
   - AtPID (http://www.megabionet.org/atpid/webfile/) [44].

3.7.1 Arabidopsis Interactions Viewer

The BAR’s Arabidopsis Interactions Viewer at http://bar.utoronto.ca/interactions/ [18] currently permits the exploration of 70,944 predicted and 28,505 experimentally determined protein–protein interactions curated by BIND, the BAR, IntAct, TAIR, and others. One may submit a list of gene (product) identifiers, and the AIV will return the interactors of the proteins. It can return either only experimentally documented interactions or all interactions, including those predicted by the interolog (interacting ortholog) method [18]. Attractive features of the AIV include the ability to upload Cytoscape files (.cys files) and to color nodes by their expression level in different tissues, which helps to define subnetworks in different tissue types.

1. Go to http://bar.utoronto.ca/interactions/.
2. Enter an AGI identifier, or a list of identifiers, and select any of the options you wish. The default setting returns all experimentally determined and predicted interactions for your gene products of interest. For this example we will not check any of the additional options. We will again use ABI3 (At3g24650) to search for proteins with which it interacts.
3. Click Submit.
4. The output page shows a network graph of ABI3 interactors, a legend, some further options, and a table of these interactors at the bottom of the page (see Fig. 15).
5.
In the network graph, the smaller nodes represent the proteins that interact with ABI3, and the edges denote the interactions between the proteins. Node color indicates protein subcellular localization. Edges colored light blue indicate interactions for which experimental evidence was obtained.

ABI3 interacts with ATSYP23 (At4g17730) and ABI5 (At2g36270); in both cases this was determined experimentally by yeast two-hybrid assays [45, 46]. These edges are therefore colored light blue. Clicking on the links in the BIND/PubMed column leads to the published reference for a given interaction (see Note 21). The interaction with ATSYP23 was found in an experimental screen and was not followed up in any great detail in that publication. It may therefore be a worthwhile candidate for further investigation.

Two other interactions, with ATBZIP10 (At4g02640) and ATBZIP25 (At3g54620), are predicted by the interolog method [18], so their edges are colored grey. These are further candidates for follow-up investigation. However, the level of support for all these predictions is low, with a CV (confidence value) of just 1. See ref. 18 for further information on how CV and coexpression scores are calculated. Briefly, the AIV coexpression calculations used all of the AtGenExpress data sets (about 1,000 in total), similar to ATTED-II’s condition-independent calculation.

Fig. 15 Output page of an Arabidopsis Interactions Viewer query with At3g24650, ABI3. Edges are colored by coexpression score or experimental support and vary in width depending on interolog support. The table shows protein–protein interaction information, including coexpression scores and subcellular localization, with links to publication details.

6. By default, nodes are colored according to their subcellular localization as documented in the SUBA III database (see above). A useful feature is to color nodes according to their expression levels in a given tissue instead. To do this, click the Show Gene Expression Options box on the left-hand side of the output screen, under the Download to Excel button. Two drop-down lists appear:
   - Data Source lets you choose among different compendia (the same ones visible in the various eFP Browser views described earlier);
   - Tissue/Condition lets you choose the tissue or condition within a given compendium from which expression levels are painted onto the nodes.
   In this case, we will examine expression levels in Seeds Stage 10 w/o Siliques in the Developmental_Map data source by selecting these and clicking Show expression view. These data are mostly from Schmid et al. [23]. ABI3 and ABI5 (but not the other interactors) are both strongly expressed in seeds at later stages of development, consistent with their known biological roles. Selecting other data sets and tissues/conditions shows where other nodes (genes) are more strongly expressed (e.g., Tissue_specific/Guard Cells no ABA). However, in none of the data sources and tissues/conditions queried are ABI3 and these other proteins highly coexpressed. This indicates that these interactions likely do not occur in planta, at least insofar as can be determined from existing data sets.

Fig. 16 Output page of an NBrowse query with At3g24650. The Edge Filter Panel filters edges (interactions) by the method used to detect the interaction, e.g., confocal imaging. The Information Panel shows details on a selected node or edge, including links to PubMed. The Main Output Panel shows the interactions for the requested gene; nodes are grey, and edges are colored by interaction detection method.

3.7.2 NBrowse

TAIR’s NBrowse permits the exploration of 8,628 experimentally determined interactions curated by TAIR, BioGRID, IntAct, and others. It lets the user specify the type of experimental method used to determine a given protein–protein interaction.

1. Go to http://www.arabidopsis.org/tools/nbrowse.jsp. Enter your protein of interest’s AGI ID (e.g., At3g24650) or symbol (ABI3), check the Launch with query checkbox, and click Launch.
2. A Java applet will start on your computer. The output is shown in Fig. 16.
3. The Edge Filter Panel filters the interactions (edges) by the type of method used to determine a given protein–protein interaction.
4. Clicking on a specific node (protein) or edge (interaction) shows information on that protein or interaction in the Information Panel, including links to the PubMed reference for the given interaction.
5. You can also upload your own interaction data, in the format NBrowse specifies (see its help file), and explore them in the context of other documented interactions in the NBrowse database.

3.8 Integrated Tools

Integrated tools combine multiple heterogeneous sources of genomic data to obtain more accurate predictions. Most of the bioinformatic tools described in this section integrate protein and genetic interactions, pathways, coexpression, co-localization, and protein domain similarity. They allow the user to generate hypotheses quickly and easily.

3.8.1 VirtualPlant

VirtualPlant [19] integrates genomic data from different sources (see Note 22) and provides a set of tools to visualize and analyze these data.
A particularly useful attribute of VirtualPlant is that data and analyses can be stored on the website.

1. Go to http://virtualplant.bio.nyu.edu/cgi-bin/vpweb/. If you wish to store your data, click on “Login” to register. The dark-blue navigation bar at the top of the page contains the different VirtualPlant tools.
2. Click on “Query.” To perform a query, select an option from the type list (e.g., genes) and add a keyword (e.g., ABI3). The results are displayed in a table; click on the gene that best matches your query (here, ABI3, At3g24650). VirtualPlant shows all the information available on the server about the query, including annotation, gene models, and external links. For additional data, click on the “Gene Family” folder to see the other members of the ABI3/VP1 transcription factor family, which has 11 members.
3. To analyze a list of genes, the data must first be uploaded. The user can upload a list of genes or microarray experiments. For microarray analysis, .CEL files can be uploaded and normalized (GCRMA or MAS5 methods) within VirtualPlant, which is a useful feature.
4. In the dark-blue toolbar, click on “Upload Data” followed by “Click here to upload one or more list of genes.” Then either upload a file in the format described at the top of the page or paste your list of genes, e.g., the AGI IDs of the genes upregulated in LEC1-OX plants [22]. Click “Submit.” The list of genes is now uploaded to the “My Genes” folder.
   Click “Analyze” in the navigation bar. The analysis window shows the gene sets in the folder; select our data set (see Fig. 17). On the “Analysis” menu, select the analysis you want to perform.
   One of the most useful analysis tools in VirtualPlant is the “Network Analysis” tool. Starting from an uploaded gene list, it lets one select from a variety of interactions, including validated TF–target, microRNA–target mRNA, and metabolic and pathway interactions from KEGG and AraCyc. It launches an independent Cytoscape browser (“VirtualPlant meets Cytoscape”; see Fig. 18). The different interactions can be explored by coloring the edges differently in Cytoscape via the VizMapper tool. In this case, we can determine that most of the genes overexpressed in the LEC1-OX line are metabolic in nature.

Fig. 17 VirtualPlant workspace. The cart stores your data sets and the files generated during the analysis. The navigation bar is used to upload data sets and start the analysis. In the analysis window, a list of genes and an analysis tool are selected from the list of analysis functions.

Fig. 18 A snapshot of the Cytoscape graph output from VirtualPlant. Metabolic interactions (blue edges) from KEGG or AraCyc, as determined by regulation of genes overexpressed in a LEC1-OX line, are visualized

5. VirtualPlant can also analyze multiple gene lists at the same time, for example to find the genes common to two experiments. Here, we would like to know whether any genes are both upregulated when LEC1 is overexpressed and coexpressed with ABI3. Such genes would indicate that LEC1 is sufficient to regulate genes that may also share functionality with ABI3. To explore this, we will additionally upload the list of the top 50 ABI3 developmentally coexpressed genes (see Table 2). Click “Analyze,” and select both lists of genes (the LEC1-OX upregulated genes and the top 50 ABI3 developmentally coexpressed genes). On the “Analysis” menu, select “intersect.” VirtualPlant will generate a new file in the “My Genes” folder containing the genes common to both data sets. This file can be used for further analysis.

3.8.2 GeneMania

The GeneMania [20] algorithm uses a Cytoscape plugin to integrate protein and genetic interaction data, coexpression, and colocalization information.
GeneMania can be used to predict the function of a single gene or to find new members of a pathway or a protein complex. In this tutorial we will explore the relationship between PIF1 and ABI3.

1. Go to http://www.genemania.org/.
2. GeneMania integrates data from seven different organisms. Next to “Find Genes in,” select Arabidopsis thaliana, and enter your gene or list of genes next to “related to.” GeneMania recognizes gene names and AGI IDs, but not Affy IDs. If GeneMania does not recognize your query, a yellow speech bubble tells you so. We will add ABI3 (At3g24650) and PIF1 (At2g20180) in the second window (one gene per line). The aim is to predict a mechanism for why ABI3 expression is downregulated in pif1pif3pif4pif5.
3. The left part of the window displays a network graph, visualized using Cytoscape, whose edge colors indicate the different types of interaction between genes:
   - brown: predicted interactions;
   - grey: coexpression;
   - dark blue: physical interactions;
   - light blue: co-localization;
   - green: genetic interactions.
   On the right side of the window, there are four tabs. The “network” tab lets you select which types of interaction to display in the diagram, e.g., physical interactions only. PIL5 appears to form a protein complex with ABI3 and At5g61380. There are many examples in which protein complexes have an autoregulatory function on one or more of their members [47]. Clicking on the nodes that represent the genes gives more details on gene function. For instance, At5g61380 is a two-component response regulator and possesses transcription regulatory activity.

Fig. 19 GeneMania output
The “gene” tab lists the interactors of our query proteins; e.g., a DELLA protein interacts with PIF1. DELLAs have been described to repress PIF activity and to accumulate in the absence of GA [48, 49]. This could potentially be the mechanism underlying the negative crosstalk between ABA and GA. The “Functions” tab shows the GO annotations of the genes in the network. The list can be sorted by GO annotation name, by false discovery rate (FDR), or by coverage (the number of genes in the network with a given function divided by the number of genes in the genome with that function) (see Fig. 19).
4. Above the network diagram, a toolbar offers more options to save the data or to adjust the network visualization.
3.8.3 ePlant
The easy-to-use ePlant website [21] integrates six essential tools for plant biology research. With only a few mouse clicks, the user can find homologous genes and polymorphisms, visualize gene expression in the whole plant and/or in different tissues, determine the subcellular localization of a protein, find its interactors, and predict its structure. However, only one gene can be investigated at a time.
Fig. 20 ePlant output for ABI3
1. Go to bar.utoronto.ca/eplant/.
2. Type the AGI ID of your gene of interest in the AGI ID box at the top of the page, e.g., At3g24650 for ABI3.
3. Click on “Homologs and polymorphisms.” ePlant displays homologous genes for the query gene. Homologs are computed using OrthoMCL.
The amino acid sequences of the homologous proteins are aligned and shown in an interactive view that provides information on conserved residues, amino acid physicochemical properties, and single nucleotide polymorphisms. In the case of ABI3, there are no homologous genes, and there is one synonymous polymorphism, at least in the Nordborg et al. data set [50] that ePlant currently uses.
4. Click on “Plant expression,” “Tissue expression,” or “Subcellular location” to explore expression levels in the whole plant or in a specific tissue or developmental stage, or to determine where the protein is localized within the cell. For each analysis, ePlant uses a three-dimensional drawing that represents the Arabidopsis plant, different plant tissues, or a plant cell (Fig. 20). Expression levels are represented from yellow (low) to red (high) in each drawing (see Note 23). Tools on the left part of each page allow the user to zoom in, zoom out, rotate the figure, and change its position along the three axes. Click on the “sample list” on the right part of the page to identify the different parts represented in each drawing. On the “Plant expression” and “Tissue expression” pages, buttons under “sample list” switch between “absolute” and “relative” expression levels. Click on “Retrieve signal data” to view and/or download the numerical gene expression data. ABI3 is highly expressed in seed siliques, and it is not expressed in the root, leaves, stem, or flowers (see Note 24). At the developmental and tissue-specific level, ABI3 is expressed in dry and imbibed seeds. In the subcellular location tool, ABI3 is predicted with low confidence to reside within the nucleus.
5. Click on “interactors” to view the interactors of our gene product. ePlant uses the BAR’s AIV database to generate and draw a network of nodes and edges.
On the right side of the page, a menu allows the network properties to be modified; e.g., we can filter the interactors (called neighbors) according to the confidence value (CV) of the edges. With a CV ≥ 2, ABI3 has two neighbors, At4g02640 (At_bZIP10) and At2g36270 (ABI5). We can also size the neighbors by coexpression values and color them according to their subcellular localization. Right-click on the network for visualization options.
6. Click on “protein model” to view a 3D structure of your protein. The next page lists predicted models for the protein; choose the one with the lowest E-value. ePlant shows a 3D model from the Protein Data Bank or one predicted by Phyre (Protein Homology/analogY Recognition Engine; see Note 25). The options on the right of the page allow the user to highlight the polar and charged residues in red or to draw the protein surface. Below the options menu, ePlant shows the alignment between the sequence used for the 3D model and the query protein; e.g., the ABI3 3D model covers amino acids 566–678 of the protein. Right-click on the model for visualization options.
4 Notes
1. Different microarray platforms detect different numbers of transcripts. The ATH1 array from Affymetrix has probe sets for 22,814 transcripts, some of which may match transcripts from more than one gene. Other microarray platforms and next-generation sequencing technologies, e.g., the Arabidopsis Whole-Genome Tiling Array 1.0 or RNA-seq, are more comprehensive.
2. The Arabidopsis Genome Initiative identifier (AGI ID) is easily found at TAIR; see Chapter 4.
3. It is useful to set the signal threshold to a fixed value when comparing different genes or viewing a number of different data sources. That way, the expression level denoted by “red” is constant. The expression level distribution graph is also a handy feature for determining whether one’s gene of interest is strongly expressed.
The small graph shows the distribution of the average expression level of all genes in the tissues depicted in the output, while the red line shows where the maximum expression level of the gene of interest falls along that distribution.
4. The Bio-Analytic Resource provides a bulk query tool called “Expression Browser,” which offers a Genevestigator-like ability to query many genes at once; see http://bar.utoronto.ca/affydb/cgi-bin/affy_db_exprss_browser_in.cgi.
5. Genevestigator has no control over experimental design, and only post hoc checks of array quality are possible. For more information about quality control criteria, visit https://www.genevestigator.com/userdocs/manual/qc.html.
6. In the open-access version, a maximum of 50 genes can be analyzed simultaneously. To analyze more than 50 genes, one can create a Genevestigator account.
7. For experimental normalization, Genevestigator uses Bioconductor’s RMA implementation.
8. A p-value under 0.06 indicates that the signal is reliably detected.
9. It is often useful to examine condition-dependent data sets, as genes may respond one way in one set of tissues and in the opposite way in others. If these sets are lumped together, such correlations cannot be detected. This issue is described in greater detail in the Usadel et al. review [6].
10. Given the number of samples in most of these data sets, even a Pearson correlation coefficient of 0.3 can be considered “significant.” But with this r-value, only r² = (0.3)² = 0.09, i.e., 9 %, of the variance is shared between the two genes. An r-value of 0.7 means that coexpression explains r² = 0.49, i.e., 49 %, of the variance common to the two genes. This is why 0.7–0.75 is often used as a cutoff for coexpression analysis.
11. ATTED-II uses the MR (Mutual Rank) value to rank coexpressed genes. MR is the geometric mean of the two reciprocal correlation ranks, i.e., the rank of gene B in gene A’s coexpression list and the rank of gene A in gene B’s list; lower MR values indicate stronger coexpression.
The authors found that MR performs better than the Pearson correlation coefficient (PCC) in predicting gene function.
12. PLACE (Plant Cis-acting Regulatory DNA Elements): http://www.dna.affrc.go.jp/PLACE/. This database has not been updated since 2006.
13. Athena has not been updated since 2005. Its TAIR gene annotations and cis-element sets are therefore not the most recent, but it is still a useful site.
14. The user can analyze up to 100 genes at once with the compact visualization type.
15. The ABA-responsive element (ABRE) has the consensus sequence (C/T)ACGTGGC, and the ABF (ABA-responsive element binding factor) family of transcription factors is known to bind to it [51].
16. GOrilla is another useful tool for such analyses and allows a ranked list of genes to be uploaded for enrichment analysis. It offers similar visualization of enriched categories. See http://cbl-gorilla.cs.technion.ac.il/ [52].
17. Note that a background data set cannot be selected in Classification SuperViewer. This is not much of an issue for gene lists derived from relatively comprehensive platforms but can be one for gene lists from less comprehensive platforms.
18. AraCyc is part of the BioCyc collection of metabolic databases. All the metabolic databases on BioCyc share the same software, so the tutorial described in this section can be applied to the other databases.
19. More expression columns can be included, each representing a different experiment or time point.
20. The MapMan version used in this tutorial is 3.1.1.
21. To follow BIND links, a user account on the BIND/BOND website is needed to view the literature record.
22. VirtualPlant integrates information from Arabidopsis and rice sources.
23. In the case of “subcellular location,” the information comes from the SUBA database. The red color represents the protein localization.
24. Note that for roots these expression data represent the whole organ, not the cell type-specific expression described for the eFP browser (Subheading 3.1.1).
25. Protein Data Bank: http://www.rcsb.org/pdb/home/home.do. Phyre website: http://www.sbg.bio.ic.ac.uk/~phyre/.
References
1. Chory J et al (2000) National Science Foundation-sponsored workshop report: “The 2010 Project” functional genomics and the virtual plant. A blueprint for understanding how plants are built and how to improve them. Plant Physiol 123:423–426
2. Alonso JM et al (2003) Genome-wide insertional mutagenesis of Arabidopsis thaliana. Science 301:653–657
3. Rhee S et al (2003) The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucleic Acids Res 31:224
4. Finkelstein RR, Somerville CR (1990) Three classes of abscisic acid (ABA)-insensitive mutations of Arabidopsis define genes that control overlapping subsets of ABA responses. Plant Physiol 94:1172
5. Brady S, Provart N (2009) Web-queryable large-scale data sets for hypothesis generation in plant biology. Plant Cell 21:1034
6. Usadel B et al (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32:1633–1651
7. Winter D et al (2007) An ‘Electronic Fluorescent Pictograph’ browser for exploring and analyzing large-scale biological data sets. PLoS One 2:e718
8. Hruz T et al (2008) Genevestigator V3: a reference expression database for the meta-analysis of transcriptomes. Adv Bioinformatics 420747
9. O’Connor TR, Dyreson C, Wyrick JJ (2005) Athena: a resource for rapid visualization and systematic analysis of Arabidopsis promoter sequences. Bioinformatics 21:4411–4413
10. Obayashi T et al (2011) ATTED-II updates: condition-specific gene coexpression to extend coexpression analyses and applications to a broad range of flowering plants. Plant Cell Physiol 52:213–219
11. Toufighi K et al (2005) The botany array resource: e-Northerns, expression angling, and promoter analyses. Plant J 43:153–163
12. Du Z et al (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38:W64–W70
13. Carbon S et al (2009) AmiGO: online access to ontology and annotation data. Bioinformatics 25:288–289
14. Provart N, Zhu T (2003) A browser-based functional classification SuperViewer for Arabidopsis genomics. Curr Comput Mol Biol 2003:271–272
15. Mueller LA, Zhang P, Rhee SY (2003) AraCyc: a biochemical pathway database for Arabidopsis. Plant Physiol 132:453–460
16. Thimm O et al (2004) MapMan: a user-driven tool to display genomics data sets onto diagrams of metabolic pathways and other biological processes. Plant J 37:914–939
17. Heazlewood JL et al (2007) SUBA: the Arabidopsis subcellular database. Nucleic Acids Res 35:D213–D218
18. Geisler-Lee J et al (2007) A predicted interactome for Arabidopsis. Plant Physiol 145(2):317–329
19. Katari MS et al (2010) VirtualPlant: a software platform to support systems biology research. Plant Physiol 152:500–515
20. Mostafavi S et al (2008) GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function. Genome Biol 9(Suppl 1):S4
21. Fucile G et al (2011) ePlant and the 3D data display initiative: integrative systems biology on the World Wide Web. PLoS One 6:e15237
22. Mu J et al (2008) LEAFY COTYLEDON1 is a key regulator of fatty acid biosynthesis in Arabidopsis. Plant Physiol 148:1042–1054
23. Schmid M et al (2005) A gene expression map of Arabidopsis thaliana development. Nat Genet 37:501–506
24. Nakabayashi K et al (2005) Genome-wide profiling of stored mRNA in Arabidopsis thaliana seed germination: epigenetic and genetic regulation of transcription in seed. Plant J 41:697–709
25. Brady SM et al (2003) The ABSCISIC ACID INSENSITIVE 3 (ABI3) gene is modulated by farnesylation and is involved in auxin signaling and lateral root development in Arabidopsis. Plant J 34:67–75
26. Laubinger S et al (2008) At-TAX: a whole genome tiling array resource for developmental expression analysis and transcript identification in Arabidopsis thaliana. Genome Biol 9:R112
27. Zeller G et al (2009) Stress-induced changes in the Arabidopsis thaliana transcriptome analyzed using whole-genome tiling arrays. Plant J 58:1068–1082
28. Brady SM et al (2007) A high-resolution root spatiotemporal map reveals dominant expression patterns. Science 318:801–806
29. Obayashi T, Kinoshita K (2009) Rank of correlation coefficient as a comparable measure for biological significance of gene coexpression. DNA Res 16:249–260
30. Dubreucq B et al (2000) The Arabidopsis AtEPR1 extensin-like gene is specifically expressed in endosperm during seed germination. Plant J 23:643–652
31. Nole-Wilson S, Tranby TL, Krizek BA (2005) AINTEGUMENTA-like (AIL) genes are expressed in young tissues and may specify meristematic or division-competent states. Plant Mol Biol 57:613–628
32. Chattopadhyay S et al (1998) Arabidopsis bZIP protein HY5 directly interacts with light-responsive promoters in mediating light control of gene expression. Plant Cell 10:673–684
33. Higo K et al (1999) Plant cis-acting regulatory DNA elements (PLACE) database: 1999. Nucleic Acids Res 27:297–300
34. Liu B, Chen J, Shen B (2011) Genome-wide analysis of the transcription factor binding preference of human bi-directional promoters and functional annotation of related gene pairs. BMC Syst Biol 5:S2
35. Ouyang X et al (2011) Genome-wide binding site analysis of FAR-RED ELONGATED HYPOCOTYL3 reveals its novel function in Arabidopsis development. Plant Cell 23:2514–2535
36. Zhang H et al (2011) Genome-wide mapping of the HY5-mediated gene networks in Arabidopsis that involve both transcriptional and post-transcriptional regulation. Plant J 65:346–358
37. Razem FA et al (2006) The RNA-binding protein FCA is an abscisic acid receptor. Nature 439:290–294
38. Ashburner M et al (2000) Gene ontology: tool for the unification of biology. Nat Genet 25:25–29
39. Baud S et al (2002) An integrated overview of seed development in Arabidopsis thaliana ecotype WS. Plant Physiol Biochem 40:151–160
40. Hua S, Sun Z (2001) Support vector machine approach for protein subcellular localization prediction. Bioinformatics 17:721–728
41. Horton P et al (2007) WoLF PSORT: protein localization predictor. Nucleic Acids Res 35:W585–W587
42. Aranda B et al (2009) The IntAct molecular interaction database in 2010. Nucleic Acids Res 38:D525–D531
43. Stark C et al (2011) The BioGRID Interaction Database: 2011 update. Nucleic Acids Res 39:D698–D704
44. Li P et al (2011) AtPID: the overall hierarchical functional protein interaction network interface and analytic platform for Arabidopsis. Nucleic Acids Res 39:D1130–D1133
45. Klopffleisch K et al (2011) Arabidopsis G-protein interactome reveals connections to cell wall carbohydrates and morphogenesis. Mol Syst Biol 7
46. Nakamura S, Lynch TJ, Finkelstein RR (2001) Physical interactions between ABA response loci of Arabidopsis. Plant J 26:627–635
47. Cui H et al (2007) An evolutionarily conserved mechanism delimiting SHR movement defines a single layer of endodermis in plants. Science 316:421–425
48. De Lucas M et al (2008) A molecular framework for light and gibberellin control of cell elongation. Nature 451:480–484
49. Dill A, Jung HS, Sun T (2001) The DELLA motif is essential for gibberellin-induced degradation of RGA. Proc Natl Acad Sci 98:14162
50. Nordborg M et al (2005) The pattern of polymorphism in Arabidopsis thaliana. PLoS Biol 3:e196
51. Choi H (2000) ABFs, a family of ABA-responsive element binding factors. J Biol Chem 275:1723–1730
52. Eden E et al (2009) GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics 10:48
Part III Genetic Techniques
Chapter 6
Exploiting Natural Variation in Arabidopsis
Johanna A. Molenaar and Joost J.B. Keurentjes
Abstract
Natural variation for many traits is present within the species Arabidopsis thaliana. This chapter describes the use of natural variation to elucidate genes underlying the regulation of quantitative traits. It deals with the development and use of mapping populations, the detection and handling of genetic markers, the phenotyping of quantitative traits, and, finally, QTL analyses. The focus of the chapter is on the use and development of recombinant inbred lines, but other types of segregating populations, including genome-wide association mapping in natural populations, are also discussed.
Key words Natural variation, Quantitative trait, QTL mapping, Recombinant inbred lines, Genome-wide association mapping
1 Introduction
For many properties of plants, natural variation exists between and within species. Natural variation is defined as genome-encoded differences that cause phenotypic variation and is regarded as a major driving force in adaptation and species formation. In addition, the recognition of heritable variation in specific traits has greatly contributed to agricultural crop improvement. Ever since the domestication of wild species, some 10,000 years ago, farmers have sought optimal crop varieties to grow. Initially, natural varieties of species were evaluated, and new crop varieties were developed by stringent performance selection of founder lines used for breeding.
This resulted in crops that were better adapted to local climates, more resistant to diseases, and higher yielding in harvestable product [1]. Since the discovery of the structure of DNA, however, knowledge of genome-encoded information has increased exponentially over the last decades. This enabled a shift from phenotypic towards genotypic selection methods, greatly increasing the pace and accuracy of modern breeding practices. Crucial here is the identification of the relationship between genotype and phenotype, for which a number of methods have been developed.
Jose J. Sanchez-Serrano and Julio Salinas (eds.), Arabidopsis Protocols, Methods in Molecular Biology, vol. 1062, DOI 10.1007/978-1-62703-580-4_6, © Springer Science+Business Media New York 2014
The notion that natural variation can be instrumental in the identification of the genetic regulation of quantitative traits has also contributed substantially to our fundamental understanding of key biological and evolutionary processes. Functional analysis of natural variants enabled the detection of genetic factors controlling essential steps in plant development and performance. For obvious reasons, such as long generation times and complex genome structures, crop species are not ideal for the genetic and functional analysis of most traits. Since many traits are evolutionarily conserved, model species are nowadays widely used to elucidate the mechanistic basis of plant biology [2]. Arabidopsis thaliana is perfectly suited as the reference species in modern plant sciences. It combines rapid generation cycles with high reproductive success and has a small genome. Because it is an autogamous species, homozygous inbred lines can be obtained in which genotypes are fixed, allowing propagation and multiplication of isogenic lines.
Nonetheless, it tolerates intraspecific crosses, yielding viable offspring with genomic and functional segregation in subsequent generations. In addition, it can be crossed with some of its close relatives, although the progeny of such interspecific combinations is often sterile. Importantly, Arabidopsis has a worldwide geographic distribution covering a diversity of habitats [3]. Adaptation of accessions to this variety of local environments over the course of evolution has led to a wealth of natural variation in many complex traits. These properties make Arabidopsis the species of choice for genetic analyses of many life history traits.
Over the last decades, natural variation has been exploited to elucidate the genetics underlying both qualitative and quantitative traits [4]. Whereas the genetic analysis of qualitative traits is quite straightforward, it is much more complicated for quantitative traits. Qualitative traits are typically regulated by a limited number of genes, resulting in discrete phenotypic classes that can easily be associated with genomic regions using simple Mendelian genetics. Quantitative traits, however, often show a continuous distribution of trait values over different genotypes, which makes it difficult to assign phenotypes to distinct classes. The reason for this quantitative nature of phenotypic expression is the involvement of a multitude of genes, each contributing moderate to small effects. Gene-by-gene (epistasis) and genotype-by-environment (GxE) interactions further complicate the genetic regulation of quantitative traits. To account for the complexity of the genetic architecture underlying quantitative traits, more sophisticated statistical methods are required for the identification of quantitative trait loci (QTLs). QTLs are defined as genomic regions that are involved in the genetic regulation of a specific trait and in which allelic variation explains a significant part of the phenotypic variation observed in this trait.
The detection of QTLs, known as genetic linkage mapping, is based on the principle of linkage disequilibrium (LD). Basic Mendelian genetics teaches that genetic factors located close to each other on the same chromosome tend to be inherited together. This linkage can only be broken by a recombination event during meiosis. The further apart two loci are on a chromosome, the larger the chance that a crossover occurs between them. The relationship between recombination frequency and the distance between loci was first recognized by Thomas Hunt Morgan, and hence, genetic distance is expressed in centiMorgans (cM), with a recombination frequency (rf) of 1 % corresponding to 1 cM. It can easily be deduced that if two loci are inherited independently, gametes have a 50 % chance of carrying a recombinant genotype. From this, it can be concluded that genetic distances above 50 cM cannot be discriminated from random segregation of unlinked loci. Such loci are then referred to as being in linkage equilibrium. If two loci are co-inherited to some extent, they are referred to as being in linkage disequilibrium.
Although the genetic distance is related to the physical distance, this relationship is not always linear. While the physical distance between two loci is determined by a fixed number of nucleotides, the genetic distance is estimated from the number of crossovers between them. Because the frequency of recombination depends on a number of different factors, the cM to bp ratio is not constant over the genome. The highly heterochromatic centromeric regions, for instance, are almost completely devoid of crossovers, resulting in large physical distances between genetically closely linked loci. Fortunately, the gene density in heterochromatic regions is much lower than in euchromatin, where the relationship between genetic and physical distance is much tighter. Genetic maps can therefore be a good proxy for the physical position of QTLs.
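The relationship between recombination frequency and map distance described above can be made concrete with a short sketch. It assumes that rf is estimated as the fraction of recombinant gametes from a single meiosis between two markers; for short intervals this gives cM directly (1 % rf = 1 cM). For longer intervals, double crossovers go unobserved, so the sketch also uses Haldane's mapping function, which assumes no crossover interference. This mapping function is not discussed in the chapter (Kosambi's function is a common alternative), and all function names are hypothetical.

```python
import math


def rf_to_cM(recombinants: int, total: int) -> float:
    """Map distance (cM) from the fraction of recombinant gametes, using 1 % rf = 1 cM."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    return 100.0 * recombinants / total


def haldane_rf(distance_cM: float) -> float:
    """Expected rf for a map distance under Haldane's function (no interference)."""
    morgans = distance_cM / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * morgans))


def haldane_cM(rf: float) -> float:
    """Inverse of haldane_rf: map distance (cM) from an observed rf below 0.5."""
    if not 0.0 <= rf < 0.5:
        raise ValueError("rf must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * rf)


# 5 recombinant gametes out of 200 scored: 2.5 % rf, i.e. 2.5 cM by the simple rule.
print(rf_to_cM(5, 200))

# Expected rf rises almost linearly at first, then levels off towards 0.5.
for d in (1, 10, 25, 50, 100, 200):
    print(f"{d:>4} cM -> expected rf {haldane_rf(d):.3f}")
```

Under this model the expected rf approaches, but never exceeds, 0.5 as map distance grows, which is why loosely linked loci become indistinguishable from unlinked ones. In RILs, recombination accumulates over several generations of selfing, so the proportion of recombinant lines overestimates the single-meiosis rf and should not be fed directly into `rf_to_cM`.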
Linkage mapping detects associations between the phenotype and the underlying genotype in an indirect way. Genotypic differences between accessions are determined by sequence polymorphisms that can serve as genetic markers to identify the parental origin of genomic regions. Although most polymorphisms will be functionally neutral, some of them might be close enough to be in LD with the causal factor explaining a QTL. Genetic linkage mapping thus requires genome-wide coverage of markers that are statistically tested for association with variation in the trait of interest. Any significantly associated markers (QTLs) are in LD with the allelic variation responsible for the observed phenotypes and hence point to the position of the causal gene.
To identify the genetic factors underlying quantitative traits, large collections of individuals showing natural variation must be analyzed. Such collections can consist of wild accessions, but QTL mapping in Arabidopsis is most powerful in experimental mapping populations that segregate for the trait of interest. Although many different types exist, biparental populations descending from a cross between two distinct accessions are most popular. Widely used are recombinant inbred lines (RILs), which are derived by single seed descent from F2 individuals, the progeny of a hybrid between two distinct homozygous accessions. Many such RIL populations have been generated, and we will focus this chapter on the development and analysis of these genetic resources. Because RILs are inbred for several generations, they are homozygous and can therefore be propagated and used indefinitely to study many complex traits in various conditions. In addition, accurate phenotyping can be achieved efficiently because the genetic material under investigation can be analyzed in isogenic replicates.
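The marker-by-marker association test described above can be sketched for a RIL population, in which each line is homozygous for one of the two parental alleles (coded here as "A" or "B") at every marker. This is a minimal single-marker analysis, not the interval or composite interval mapping offered by the dedicated QTL software listed under Materials. For each marker it reports the allelic effect, the proportion of phenotypic variance explained (R²), and a LOD score comparing a one-mean model with a two-mean model. The function name and the genotype and phenotype values are hypothetical.

```python
import math
from statistics import fmean


def single_marker_scan(genotypes, phenotypes):
    """Test each marker for association with a trait in a RIL population.

    genotypes: one list per RIL giving "A" or "B" (parental allele) at each
    marker; any other value (e.g. None for missing data) is skipped.
    phenotypes: one trait value per RIL, in the same order.
    Returns (marker_index, effect_B_minus_A, r2, lod) for each testable marker.
    """
    results = []
    for m in range(len(genotypes[0])):
        a = [y for g, y in zip(genotypes, phenotypes) if g[m] == "A"]
        b = [y for g, y in zip(genotypes, phenotypes) if g[m] == "B"]
        if len(a) < 2 or len(b) < 2:
            continue  # marker not informative in this data set
        kept = a + b
        grand = fmean(kept)
        mean_a, mean_b = fmean(a), fmean(b)
        ss_total = sum((y - grand) ** 2 for y in kept)        # one-mean (null) model
        ss_within = (sum((y - mean_a) ** 2 for y in a)
                     + sum((y - mean_b) ** 2 for y in b))      # two-mean (marker) model
        if ss_total == 0:
            continue  # no phenotypic variation to explain
        r2 = 1.0 - ss_within / ss_total
        # LOD for normally distributed errors: (n / 2) * log10(RSS_null / RSS_marker).
        if ss_within == 0:
            lod = math.inf
        else:
            lod = (len(kept) / 2) * math.log10(ss_total / ss_within)
        results.append((m, mean_b - mean_a, r2, lod))
    return results


# Hypothetical data: 8 RILs genotyped at 3 markers, one trait value per RIL.
genotypes = [
    ["A", "A", "B"],
    ["A", "A", "A"],
    ["A", "B", "B"],
    ["A", "A", "A"],
    ["B", "B", "A"],
    ["B", "B", "B"],
    ["B", "A", "A"],
    ["B", "B", "B"],
]
phenotypes = [10.1, 9.8, 10.4, 9.9, 14.2, 13.8, 14.5, 14.0]

for marker, effect, r2, lod in single_marker_scan(genotypes, phenotypes):
    print(f"marker {marker}: effect {effect:+.2f}, R2 {r2:.2f}, LOD {lod:.1f}")
```

In this toy data set, marker 0 separates the RILs into a low and a high phenotypic class, so it has the largest LOD and R², while marker 2 explains almost none of the variance. Because many markers are tested, real analyses also need a genome-wide significance threshold, for example one derived from permutations.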
For the sake of completeness, we will briefly discuss some related alternatives to RIL populations without addressing their generation and use in detail. A fast alternative to developing a RIL population is the analysis of an F2 population, in which large parts of the genome of each individual are still heterozygous. The advantages here are the fast generation and the possibility to determine dominance effects. However, the lower number of recombination events decreases mapping resolution, and the increased complexity caused by heterozygosity reduces mapping power. So, to achieve reasonable statistical power and resolution, much larger population sizes are needed. The largest disadvantage, however, is the further segregation of the heterozygous regions in subsequent generations. Such populations therefore cannot be maintained, although the same genotyping investment is needed for their analysis. Moreover, experimental replicates of genotypes are not available, since each individual has a unique genetic makeup. As a consequence, phenotyping and genotyping must be carried out on the same plant.
A second frequently used alternative to RIL populations is near isogenic lines (NILs) or introgression lines (ILs). A NIL has a genome identical (isogenic) to one of the parental lines (the background line), except for a small region (an introgression) derived from the donor parental line. NILs can be created from an F1 via several rounds of backcrossing and selfing [5]. The generation of a set of NILs is very laborious but can be very useful, because it allows a single QTL to be studied at a time, avoiding complications from the segregation of multiple loci (e.g., epistasis). Intensive marker evaluation over multiple generations is needed to obtain a population with genome-wide coverage. In Arabidopsis, two such populations have been developed [6, 7]. Because of inbreeding depression, NILs are often the only viable option for immortal populations in many species.
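The populations compared above differ in how much heterozygosity and donor genome they retain. Under standard assumptions (no selection, and genome-wide averages that ignore linkage to any selected region), each generation of selfing halves the heterozygous fraction, so an F2 is on average 50 % heterozygous, and each backcross to the recurrent parent halves the expected donor-genome fraction. The minimal sketch below, with hypothetical function names, tabulates both expectations; it is an illustration rather than a procedure from this chapter.

```python
from fractions import Fraction


def expected_heterozygosity(generation: int) -> Fraction:
    """Expected heterozygous fraction of segregating loci in F_n under selfing (F1 = 1)."""
    if generation < 1:
        raise ValueError("generation must be >= 1 (F1)")
    return Fraction(1, 2) ** (generation - 1)


def expected_donor_fraction(backcrosses: int) -> Fraction:
    """Expected donor-genome fraction after n backcrosses to the recurrent parent (F1 = BC0 = 1/2)."""
    if backcrosses < 0:
        raise ValueError("backcrosses must be >= 0")
    return Fraction(1, 2) ** (backcrosses + 1)


# Single seed descent: heterozygosity left in each selfed generation.
for n in range(1, 9):
    h = expected_heterozygosity(n)
    print(f"F{n}: heterozygous fraction {h} ({float(h):.2%})")

# Backcrossing towards a NIL: donor genome left after each backcross.
for n in range(0, 6):
    d = expected_donor_fraction(n)
    print(f"BC{n}: donor fraction {d} ({float(d):.2%})")
```

For example, about 1/128 (under 1 %) of the genome is still expected to be heterozygous in the F8, which is why RILs obtained after several generations of single seed descent are treated as homozygous. Real NILs carry more donor DNA around the selected introgression than the genome-wide average suggests, because flanking regions are co-selected with the target.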
Lastly, doubled haploids (DHs) are frequently used in many crops to identify QTLs. Only recently has it become possible to construct DH populations in Arabidopsis. For this, F1 plants are crossed to a genome elimination mutant line, which converts the recombinant F1 gametes into viable haploid seeds. Occasionally, the resulting haploid plants spontaneously undergo a whole-genome duplication, yielding viable homozygous diploid seeds [8]. This way, an immortal homozygous population reminiscent of RILs can be obtained in only three generations. Although the resolution of DH populations is lower than that of RILs owing to reduced recombination frequencies, this type of population can become an important tool in the near future.
The two major disadvantages of biparental mapping populations are poor mapping accuracy and the limited genetic variation present between only two accessions, which leaves genetic variants present in other genotypes of the species undetected. To overcome some of these limitations, several advanced RIL populations have been developed for Arabidopsis in which multiple founder lines are used to incorporate more natural variation, in addition to several generations of intercrossing to improve mapping accuracy [9–12]. Alternatively, more natural variation can be investigated by jointly analyzing multiple populations of different types and origins [13–15].
As noted above, a different approach to exploit the genetics underlying natural variation is the use of collections of wild accessions in so-called genome-wide association studies (GWAS) [16]. Developed in human genetics, where the generation of experimental populations is undesirable for ethical reasons, this method uses a subset of the available natural accessions and aims to associate trait differences with specific genotypes.
As such, the principles of GWA mapping do not differ much from classical linkage analysis, but due to the fast LD decay in natural Arabidopsis populations, often within 10 kbp, a much denser genotype map is required [17]. This fast decay is the result of the high number of historical recombination events accumulated during the evolutionary history of the species. Consequently, significant associations have very small support intervals, which greatly simplifies the detection of the causal gene underlying the QTL [18]. Although the allelic variation analyzed and the resolution obtained are much higher in GWA mapping of natural populations, the statistical power is much lower than in experimental populations. Correction for population structure, which can itself give rise to false negatives, the presence of multiple small-effect or rare large-effect alleles, and the co-segregation of many QTLs are only a few of the many confounding factors, and no consensus has yet been reached on the preferable statistical methods [19].
2 Materials
1. Seeds of Arabidopsis accessions and mapping populations (www.arabidopsis.org, ABRC stock center; http://www.inra.fr/internet/Produits/vast/RILs.htm) (see Note 1).
2. Equipment to cross plants (tweezers, stereo microscope, and labels).
3. Facilities to grow many plants simultaneously under the assay conditions, which is necessary to perform the quantitative analysis of whole accession collections or mapping populations. Specific requirements will depend on the particular test conditions.
4. Equipment to genotype molecular genetic markers. For standard PCR markers, this might be as simple as oligonucleotides and PCR consumables, a thermocycler, and an agarose gel system; for high-throughput genotyping of polymorphisms, apparatus and reagents for microarray or sequencing technology are needed.
5. Equipment to measure the quantitative trait(s) of interest.
Depending on the biological parameter to be measured, this might range from a simple ruler to a luciferase luminometer or a microarray scanner.
6. Software for general statistical analysis (e.g., SAS, SPSS, or GENSTAT packages), for linkage mapping analysis (e.g., MAPMAKER or JoinMap), and for QTL analysis (e.g., MAPMAKER/QTL, Map Manager QTX, MapQTL, MultiQTL, PlabQTL, QTL Cartographer, R/qtl, or QTL Express).
3 Methods
As explained in the Introduction, the ultimate goal of QTL analysis is to identify the genes that are causal for a certain phenotype. To achieve this goal, a number of steps have to be followed. In this chapter, we describe these steps in order. First, we discuss the basic knowledge required before starting a mapping experiment. We then discuss how to develop and genotype a mapping population, and finally, we explain the actual linkage mapping. In addition, some theoretical and statistical background on data analysis and the interpretation of results is given. This section focuses on QTL analyses in RIL populations, since this type of population is used most often, but the principles are widely applicable to a range of different population types.
3.1 Natural Variation, Heritability, and Phenotyping Assays
Before performing a QTL analysis in Arabidopsis, a number of things need to be considered. Importantly, natural heritable variation for the trait of interest should be present within the species, and a phenotyping assay should be available to quantify the observed variation. When these requirements have been met, a mapping population segregating for the trait of interest must either be available or be developed. To gain information about natural variation for the trait of interest, a selection of different natural accessions can be phenotyped.
Such a selection ideally consists of the most diverse accessions, which can be identified on the basis of morphological differences, geographical distribution, or genotypic information. Many accessions show large differences in morphological properties, which often have pleiotropic effects on many other traits. Much of the morphological variation is the result of adaptation to local environments, and therefore, selection based on geographic origin can increase the chance of detecting natural variation. Selection pressures have differed between places of origin, and information about the (climatic) conditions at these locations can help to select accessions that are expected to differ for the trait of interest. Finally, genotypic information is publicly available for an increasing number of accessions and can be mined to select the genetically most diverse accessions (www.1001genomes.org; [20]). In general, these selection criteria are highly related, since geographically distant accessions are often reproductively isolated, leading to distinct genotypic profiles. Most accessions can be retrieved from the stock centers (ABRC, NASC). It may be worthwhile, though, to include the parental accessions of existing RIL populations in the initial screen. This has the advantage that a mapping population is already available when phenotypic variation is detected. A list of existing RIL populations is available at http://www.inra.fr/internet/Produits/vast/RILs.htm. To obtain reliable estimates of the trait value of each selected accession, replicate measurements on different individuals need to be performed. The number of replicates depends on the robustness of the trait, but a minimum of five is advisable. The variance estimates from these initial experiments are informative about the sources of variation and the inheritance of traits.
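Selecting the genetically most diverse accessions from public genotype data can be approached in several ways. As one illustration only, the Python sketch below applies a greedy farthest-point (max-min) heuristic to SNP strings. The accession names, the one-character-per-SNP encoding, and the use of 'N' for missing calls are hypothetical assumptions, and this heuristic is not a procedure prescribed in this chapter.

```python
from itertools import combinations


def hamming_distance(a: str, b: str) -> float:
    """Fraction of jointly called SNP sites at which two accessions differ.

    Hypothetical encoding: one character per SNP, 'N' = missing call.
    """
    sites = [(x, y) for x, y in zip(a, b) if x != "N" and y != "N"]
    if not sites:
        return 0.0
    return sum(x != y for x, y in sites) / len(sites)


def select_diverse(genotypes: dict[str, str], k: int) -> list[str]:
    """Greedy max-min selection of k accessions from a dict name -> SNP string."""
    names = sorted(genotypes)
    if not 0 < k <= len(names):
        raise ValueError("k must be between 1 and the number of accessions")
    if k == 1:
        return names[:1]  # a single accession: the choice is arbitrary

    def dist(p: str, q: str) -> float:
        return hamming_distance(genotypes[p], genotypes[q])

    # Start from the most distant pair of accessions.
    chosen = list(max(combinations(names, 2), key=lambda pair: dist(*pair)))
    while len(chosen) < k:
        # Add the accession whose nearest already-chosen accession is farthest away.
        candidates = [c for c in names if c not in chosen]
        chosen.append(max(candidates, key=lambda c: min(dist(c, s) for s in chosen)))
    return chosen
```

With four hypothetical accessions, `select_diverse(genotypes, 3)` first picks the most distant pair and then the accession farthest from both. Greedy selection is fast but not guaranteed to find the globally most diverse subset.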
An important part of the total detected variation is non-genetic residual variation, which can be broken down into technical and biological variation. Technical variation includes sample treatment and measurement error, which can be estimated and reduced by replicate analytical measurements of the same sample or individual. Biological variation, however, is defined as the variation observed between replicate individuals of the same genotype and is often the result of interaction with the environment. Small local differences during seedling establishment or due to position in the growing facility can strongly enhance phenotypic differences, and uniform growing conditions are therefore recommended. Residual variation is random and as such introduces noise in the estimation of trait means. However, when accurate estimates of mean trait values can be obtained for different accessions, consistent differences between them can be attributed to genetic variation. The proportion of genetic variation in relation to the total variation is referred to as broad-sense heritability, expressed as H² = Vg/(Vg + Ve), where Vg is the genetic variance and Ve is the residual variance. Broad-sense heritability estimates indicate how much of the observed phenotypic variation can be explained by genetic factors in a given experimental setup. In general, QTLs are more likely to be detected for traits with high heritability, especially if the genetic variation is explained by a limited number of loci (see Note 2). When good heritabilities are obtained, two genotypes should be chosen as the parents of the mapping population that will be used for further QTL analysis. Although traits can segregate in the progeny of phenotypically similar parents, it is usually best to choose two opposing extremes as parents. These extremes are most likely to differ for genetic factors controlling the trait of interest.
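As a minimal sketch of how H² can be estimated from replicate measurements of a set of accessions, the Python code below uses a balanced one-way ANOVA with genotype as a random factor, where Vg = (MS_between − MS_within)/n and Ve = MS_within for n replicates per genotype. The data layout is a hypothetical assumption; unbalanced designs or designs with blocks call for mixed models (e.g., REML) instead, and this estimate refers to single plants rather than to line means (which would use Vg/(Vg + Ve/n)).

```python
from statistics import mean


def broad_sense_heritability(data: dict[str, list[float]]) -> float:
    """H2 = Vg / (Vg + Ve) from a balanced one-way ANOVA, genotype as random factor.

    data (hypothetical layout) maps each genotype to its replicate measurements;
    every genotype must have the same number of replicates (balanced design).
    """
    groups = list(data.values())
    if len(groups) < 2:
        raise ValueError("need at least two genotypes")
    n = len(groups[0])
    if n < 2 or any(len(reps) != n for reps in groups):
        raise ValueError("need the same number (>= 2) of replicates per genotype")
    g = len(groups)
    grand_mean = mean(v for reps in groups for v in reps)
    means = [mean(reps) for reps in groups]
    ms_between = n * sum((m - grand_mean) ** 2 for m in means) / (g - 1)
    ms_within = sum((v - m) ** 2 for reps, m in zip(groups, means) for v in reps) / (g * (n - 1))
    vg = max((ms_between - ms_within) / n, 0.0)  # negative estimates truncated to zero
    ve = ms_within
    if vg + ve == 0:
        raise ValueError("no variation in the data; H2 is undefined")
    return vg / (vg + ve)
```

For instance, two hypothetical accessions measured twice each, `{"A": [10, 12], "B": [20, 22]}`, give Vg = 49 and Ve = 2, so H² ≈ 0.96.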
Creating a new RIL population, however, is laborious and time-consuming, so screening an existing population with less extreme parents may be preferable.
3.2 Development of a Population of Recombinant Inbred Lines
When no suitable genetic resources are available for the trait of interest, a novel mapping population can be created. The development of a RIL population is rather straightforward but laborious and time-consuming. To approach full homozygosity, each line needs at least eight generations of inbreeding. The time needed to complete a population therefore depends on the life cycle of the individuals, which is largely determined by the time required to flower. Some accessions, like the frequently used laboratory strains Columbia and Landsberg erecta, flower within a month after germination under long-day conditions. Other accessions can flower much later or might even need a vernalization treatment of several weeks to induce flowering. Many accessions also produce dormant seeds, which lengthens the time between rounds of inbreeding because a period of after-ripening is required. Another feature to consider before starting to develop RILs is the population size needed. The size of a RIL population is an important factor that influences the detection of a QTL. Larger population sizes increase QTL detection power and resolution. Various studies have shown that QTLs explaining approximately 10 % of the total variance have roughly an 80 % chance of being significantly detected in a population of 200 individuals. The probability of detecting a QTL decreases more or less linearly with smaller population sizes [6, 21]. Most existing RIL populations consist of 100–200 individuals. Given the genome size of Arabidopsis and inbreeding until full homozygosity, introgressions in individual RILs span on average 6–12 Mb (~30–60 cM), leading to a mapping resolution of 1–2 Mb (~5–10 cM) in such medium-sized populations.
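The requirement for about eight generations of inbreeding follows from the fact that, under single seed descent, the expected heterozygosity of an F_n plant is (1/2)^(n−1), starting from a fully heterozygous F1. The Python sketch below tabulates this expectation; it ignores selection, segregation distortion, and accidental outcrossing, so values observed in a real population may deviate.

```python
def expected_heterozygosity(generation: int) -> float:
    """Expected heterozygous fraction of the genome in an F_n plant under selfing.

    The F1 is fully heterozygous, and each round of selfing halves the
    heterozygosity, so an F_n plant carries (1/2) ** (n - 1).
    """
    if generation < 1:
        raise ValueError("generation must be >= 1 (the F1)")
    return 0.5 ** (generation - 1)


def generations_needed(max_heterozygosity: float) -> int:
    """Smallest filial generation whose expected heterozygosity is below the threshold."""
    if not 0 < max_heterozygosity <= 1:
        raise ValueError("threshold must lie in (0, 1]")
    n = 1
    while expected_heterozygosity(n) >= max_heterozygosity:
        n += 1
    return n


if __name__ == "__main__":
    for n in range(2, 10):
        print(f"F{n}: {100 * expected_heterozygosity(n):.2f} % heterozygous")
```

With these definitions, `generations_needed(0.01)` returns 8: the F8 is expected to retain about 0.78 % heterozygosity (1/128), whereas the F9 is expected to retain about 0.39 %.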
It is advisable to develop a larger number of RILs, from which a core collection optimized for recombination frequency and allele distribution can be selected. The subsequent steps to create a RIL population are described below:
1. Grow the two parental accessions simultaneously so that they flower at the same time. Using a stereo microscope, remove the anthers from flowers of the female plant (emasculation) to prevent self-fertilization, and pollinate the stigma by hand with pollen from the male plant (see Note 3). Harvest the F1 seeds when the siliques turn yellow. Seeds might be dormant, and it is better not to use freshly harvested seeds for the next round but to store them for at least 1 month. Residual dormancy can be broken by incubating seeds in cold conditions for 3–5 days before germination.
2. Check whether the cross in step 1 was successful by testing the F1 plants for heterozygosity with polymorphic markers. F2 seeds are generated by selfing the obtained F1s. Because each F2 seed results from the fusion of two recombined F1 gametes, each germinated F2 plant consists, on average, of a 1:1 mosaic of the two parental genomes. Since meiotic recombination occurs at random, no two gametes, or F2 plants, are identical, and unlinked loci of the two parental genomes segregate independently.
3. From the F2 onward, individual plants have a unique genetic makeup and are propagated by single seed descent. Grow as many F2 plants as needed to reach the desired population size and label each plant with a unique identifier. Make sure that plants cannot cross-pollinate but are self-fertilized. Seeds need to be harvested from each plant separately, and a few seeds are used to grow the next generation. From these, a single plant is randomly chosen to harvest seeds from. Be careful not to bias the selection by favoring the best-looking or earliest-flowering plant.
To circumvent any unintentional selection bias, one can, for instance, always harvest the third replicate of a line in every generation. In each generation, only a single plant is harvested per line, and seeds from this plant are used for the next generation. Repeat this procedure until the F8 is reached. In every generation of inbreeding, the amount of heterozygosity is halved, falling below 1 % ((1/2)^7 ≈ 0.8 %) in the F8. From this generation onward, plants are almost completely homozygous, and lines can be bulk propagated. The RILs can now be used for genotyping and phenotyping studies (see Note 4).
3.3 Development of a Linkage Map
In order to assign phenotypic variation to specific genomic differences between individuals of a population, the individuals need to be genotyped. Each line of a RIL population consists of a mosaic of maternal and paternal genomic introgressions. Genetic markers are used to determine which regions descended from the mother or the father line, respectively. Mapping populations can be genotyped with any available marker technique. The first genetic markers used were morphological polymorphisms with an easily observable (mutant) phenotype. Here, a single polymorphism is responsible for a change in phenotype and therefore segregates in a Mendelian fashion. The first published genome-wide linkage maps of Arabidopsis consisted of artificially induced phenotypic mutant markers [22]. With the introduction of PCR (polymerase chain reaction) technology in the 1980s, it became possible to develop markers based on sequence differences without a clearly related phenotype [23]. Genomic polymorphisms, like deletions, insertions, and single-nucleotide polymorphisms (SNPs), are much more abundant in natural accessions and can be detected at the DNA level, independent of phenotype or developmental stage. PCR-based markers can be classified as dominant or codominant.
Dominant markers, e.g., AFLPs and RAPDs, only give information about the presence or absence of an allele. No distinction can be made between individuals that are heterozygous or homozygous for the dominant allele. Because both parents of the population will carry dominant alleles for specific markers, a separate map for each parent often needs to be created. These maps can be integrated using codominant markers or specialized software that recognizes male or female dominant markers. Codominant marker technology, e.g., INDELs and microsatellites, also provides information about allele dosage and is nowadays the method of choice. In Arabidopsis, molecular markers can currently be detected by high-throughput genotyping technologies such as hybridization arrays [24] and next-generation sequencing [25]. These technologies enable the detection of a large fraction of the genomic variation, e.g., SNPs, INDELs, and genome rearrangements, between individuals of a population. To be able to perform genetic mapping, accurate maps with sufficient marker density are needed. A genome-wide linkage map, i.e., the order and position of the markers in the genome, can be created by determining the recombination frequencies between markers. As outlined above, the smaller the physical distance between two markers, the lower the recombination frequency between them. Distances between markers are expressed in centiMorgans (cM); 1 cM corresponds to 1 recombination event per 100 meioses. In Arabidopsis, the markers are placed in five linkage groups corresponding to the five chromosomes. A marker is assigned to a particular linkage group if it shows significant linkage to any marker belonging to that group. To determine which group corresponds to which chromosome, the physical position of at least one marker in each linkage group must be known. For most sequence-based markers, physical information is publicly available and can be obtained via TAIR.
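Because double crossovers between distant markers go undetected, recombination fractions are usually converted into map distances with a map function; moreover, in selfed RILs the recombinant fraction observed between two loci is inflated relative to the per-meiosis fraction, because several meioses contribute to each line. The Python sketch below illustrates the standard textbook relations (the Haldane map function, which assumes no crossover interference; the Kosambi map function, which allows for some; and R = 2r/(1 + 2r) for RILs derived by selfing). It assumes error-free genotypes, and mapping packages such as JoinMap or MAPMAKER apply their own implementations, which may differ in detail.

```python
import math


def ril_to_meiotic_r(R: float) -> float:
    """Per-meiosis recombination fraction r from the fraction R of recombinant
    selfed RILs, inverting R = 2r / (1 + 2r)."""
    if not 0 <= R < 0.5:
        raise ValueError("R must lie in [0, 0.5) for selfed RILs")
    return R / (2 * (1 - R))


def haldane_cm(r: float) -> float:
    """Haldane map distance in cM (no crossover interference): -50 * ln(1 - 2r)."""
    if not 0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return -50 * math.log(1 - 2 * r)


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in cM (allows for interference): 25 * ln((1 + 2r) / (1 - 2r))."""
    if not 0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return 25 * math.log((1 + 2 * r) / (1 - 2 * r))


if __name__ == "__main__":
    r = ril_to_meiotic_r(20 / 200)  # hypothetical: 20 recombinant lines out of 200
    print(f"r = {r:.4f}; Haldane {haldane_cm(r):.2f} cM; Kosambi {kosambi_cm(r):.2f} cM")
```

For example, 20 recombinant lines out of 200 (R = 0.1) give r = 1/18 ≈ 0.056 per meiosis, which corresponds to about 5.9 cM (Haldane) or 5.6 cM (Kosambi).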
Many morphological markers have also been cloned, and the positions of their corresponding mutations are known [26]. Linkage maps can nowadays easily be created using dedicated software packages such as JoinMap or MAPMAKER. For proper QTL analysis, each position in the genome needs to be in linkage disequilibrium with at least one molecular marker. The number of markers needed to satisfy this condition depends on the LD decay: populations with a fast LD decay need more markers than populations with a slow LD decay. For RIL populations smaller than 200 lines, a density of 1 marker per 5 cM is sufficient to detect the vast majority of crossovers. Unequal distribution of markers over the genome leads to unnecessarily large confidence intervals and reduced detection power [27]. The subsequent steps to create a linkage map are described below:
1. To genotype each individual RIL of the population, it needs to be grown to collect plant material for DNA extraction. Depending on the genotyping technology, various extraction protocols are available. Label DNA samples according to their respective line numbers.
2. Choose a marker technology, and use the extracted DNA of each individual for genotyping. Each individual should be genotyped with the same markers. For medium-sized populations of Arabidopsis, 100 evenly spaced markers correspond approximately to a 5 cM resolution. Score the genotypes of all individuals for each analyzed marker in a genotype file. Check each individual for quality, and remove individuals with much missing data or spurious genotype calls. Check marker quality and remove low-quality markers (see Note 5).
3. Use a genetic mapping software package to determine the linkage between markers and to assign them to one of the five chromosomes of Arabidopsis. Such programs estimate the recombination frequency and its statistical significance for all pairwise combinations of markers.
Markers are then assigned to linkage groups and ordered within each group. Determine the corresponding chromosome of each linkage group by checking the physical position of some markers. Inspect the resulting linkage map for gaps and include more markers where appropriate. Check each marker of the final map for segregation distortion and determine its cause (see Note 6).
3.4 Linkage Mapping
When a genetic linkage map and the corresponding marker genotype data for each individual of the population are available, QTL analyses can be performed. For this, each RIL needs to be phenotyped for a specific trait, and the segregation of trait values is then compared with the segregation of the two parental genotypes at all marker positions. Significant co-segregation is then defined as a QTL. The most basic QTL analysis consists of performing a Student's t-test for each marker, in which the subset of RILs with the maternal genotype is tested against the paternal subset (for populations that contain heterozygous lines, see Note 7). For the use of dominant markers, see Note 8. More sophisticated software packages have automated this procedure for genome-wide analysis and use a variety of algorithms to optimize speed and accuracy. MapQTL and QTL Cartographer are most frequently used in Arabidopsis, but other packages such as QTL Express, PlabQTL, and plugins for the statistical platforms R and Genstat are also in use. Such programs need three types of input files: a file with the phenotype data for each line of the population, a second one with the marker genotype data for each line, and a file with the genome-wide linkage map. Most software packages allow the user to choose the method used for the QTL analysis. The simplest method is single-marker ANOVA, in which the mean values of the two genotype groups are compared per marker. This results in a t- or F-statistic for each marker position.
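The single-marker analysis described above can be sketched in a few lines of Python. The sketch assumes homozygous RILs with genotype calls coded 'A' (maternal) and 'B' (paternal), any other code being treated as missing; these codes and the dictionary layout are hypothetical. For each marker it computes a pooled-variance Student's t statistic and the equivalent LOD score, using LOD = (n/2)·log10(RSS0/RSS1), which for two genotype classes equals (n/2)·log10(1 + t²/(n − 2)), and it converts LOD to deviance via D = 2·ln(10)·LOD ≈ 4.605·LOD (see Note 9). It is not a substitute for the interval mapping implemented in dedicated packages.

```python
import math
from statistics import mean


def single_marker_scan(phenotypes, genotypes):
    """Single-marker analysis (two-group t-test) for a RIL population.

    phenotypes: dict line id -> trait value (e.g., a line mean).
    genotypes:  dict marker -> {line id: 'A' or 'B'} (hypothetical coding;
                other codes are treated as missing).
    Returns a dict marker -> (t statistic, LOD score).
    """
    results = {}
    for marker, calls in genotypes.items():
        a = [phenotypes[line] for line, g in calls.items() if g == "A" and line in phenotypes]
        b = [phenotypes[line] for line, g in calls.items() if g == "B" and line in phenotypes]
        if len(a) < 2 or len(b) < 2:
            continue  # too few lines in one genotype class to test this marker
        n = len(a) + len(b)
        mean_a, mean_b = mean(a), mean(b)
        ss_within = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
        pooled_var = ss_within / (n - 2)
        if pooled_var == 0:
            continue  # no within-class variation; t is undefined
        t = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
        lod = (n / 2) * math.log10(1 + t ** 2 / (n - 2))
        results[marker] = (t, lod)
    return results


def lod_to_deviance(lod):
    """D = 2 * ln(10) * LOD, i.e., about 4.605 * LOD."""
    return 2 * math.log(10) * lod
```

Markers closer to a QTL show larger |t| and LOD values; plotting LOD against marker position gives the single-marker equivalent of the LOD profiles produced by mapping software.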
However, most methods used today apply interval mapping, in which positions between markers can also be tested. At positions within marker intervals, the QTL likelihood is estimated using the recombination frequency between neighboring markers. The LOD score (logarithm of the odds) or the deviance (D) is used to express the significance of genotypic differences (see Note 9). The power to detect multiple simultaneously segregating QTLs can be increased by modifying the statistical model. In the modified model, the presence of validated QTLs is taken into account when testing for QTLs at other loci. Such analyses are known as composite interval mapping (CIM) or multiple QTL model (MQM) mapping. A marker closely linked to a known QTL is added to the statistical model as a cofactor to correct for the effect of that QTL (see Note 10). The output of mapping programs consists of graphical displays of LOD scores along the genome, where significant scores indicate QTL positions. Significance thresholds are usually determined by a permutation test. More detailed information about the additive effect and explained variance at each genomic position tested is given in result tables. The additive effect of a locus is defined as the difference between the mean trait values of the two genotypic classes. The explained variance indicates what proportion of the total phenotypic variance is explained by a particular locus or by all loci together. Because LD extends over large distances in RIL populations (i.e., LD decays slowly), QTL positions are usually reported as confidence or support intervals. Most commonly used are 2-LOD support intervals, which span the region around the peak in which the LOD score remains within 2 units of its maximum. Identified QTLs can be further tested statistically for genetic interactions with other genomic regions (epistasis). For this, standard statistical analyses, such as ANOVA, will suffice. The effect of identified QTLs can be validated in NILs or HIFs.
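Two of the procedures mentioned above, a permutation-based genome-wide threshold and a 2-LOD support interval, can be sketched as follows. The sketch assumes single-marker LOD scores rather than interval mapping, homozygous RILs with genotypes coded 'A'/'B' in the same line order as the trait values, and hypothetical defaults of 1,000 permutations at α = 0.05; mapping packages implement their own, more elaborate versions. In each permutation, trait values are shuffled across lines, which breaks any genotype-phenotype association while preserving the marker data, and the maximum LOD over all markers is recorded.

```python
import math
import random
from statistics import mean


def marker_lod(y, genotypes):
    """Single-marker LOD, (n/2) * log10(RSS0 / RSS1), for lines coded 'A' or 'B'."""
    a = [v for v, g in zip(y, genotypes) if g == "A"]
    b = [v for v, g in zip(y, genotypes) if g == "B"]
    if len(a) < 2 or len(b) < 2:
        return 0.0  # marker not informative in this sample
    values = a + b
    grand = mean(values)
    rss0 = sum((v - grand) ** 2 for v in values)  # model without a QTL
    mean_a, mean_b = mean(a), mean(b)
    rss1 = sum((v - mean_a) ** 2 for v in a) + sum((v - mean_b) ** 2 for v in b)
    if rss1 == 0:
        return 0.0 if rss0 == 0 else math.inf
    return (len(values) / 2) * math.log10(rss0 / rss1)


def permutation_threshold(y, markers, n_perm=1000, alpha=0.05, seed=1):
    """Genome-wide LOD threshold: exceeded by at most alpha * n_perm of the
    maximum LODs obtained after shuffling trait values across lines."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    rng = random.Random(seed)
    shuffled = list(y)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks the genotype-phenotype association
        max_lods.append(max(marker_lod(shuffled, g) for g in markers.values()))
    max_lods.sort()
    n_exceed = int(alpha * n_perm)
    return max_lods[n_perm - 1 - n_exceed]


def lod_support_interval(positions, lods, drop=2.0):
    """Outermost tested positions around the LOD peak whose LOD is still within
    `drop` units of the maximum (positions on one chromosome, sorted)."""
    peak = max(range(len(lods)), key=lods.__getitem__)
    threshold = lods[peak] - drop
    left = right = peak
    while left > 0 and lods[left - 1] >= threshold:
        left -= 1
    while right < len(lods) - 1 and lods[right + 1] >= threshold:
        right += 1
    return positions[left], positions[right]
```

For a LOD profile of [0.5, 2.0, 4.5, 3.0, 1.0] at 0, 5, 10, 15, and 20 cM, `lod_support_interval` returns (10, 15). This simple version reports the outermost tested positions within the 2-LOD drop; some programs instead interpolate to the positions where the drop is reached exactly.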
Such lines can also be used for fine-mapping purposes when backcrossed to one of the parental lines. The ultimate goal here is the molecular isolation of the genes underlying individual QTLs (causal gene or quantitative trait gene) and the identification of the DNA polymorphism that alters the function of the gene and causes the phenotypic variation (causal nucleotide or quantitative trait nucleotide). Functional analysis of candidate genes can be started if the support interval of the QTL is small enough to follow up all underlying genes or if an obvious candidate gene is available. Mutant (knockout), gene-silenced (RNAi), and overexpression lines can be analyzed for an effect on the trait of interest (see Note 11). Complementation of a mutant phenotype by transformation or crossing provides another line of evidence that natural variation in a gene is causal for a certain phenotype. (Re)sequencing of the QTL region or candidate genes can give information about possible causal nucleotide polymorphisms. The subsequent steps to perform a QTL analysis are described below:
1. Grow each individual of the population in as many replicates as possible to acquire the best estimate of each line's trait value. Although replicates are not strictly necessary, since genotypic replication is present in the population structure, they often improve mapping power. In addition, they allow heritability to be estimated. Make sure to grow the parental lines as well to determine the parental differences. Quantify the trait of interest as accurately as possible with an appropriate assay. Standardize measurements as much as possible for all individuals in terms of developmental stage and environmental conditions.
2. Enter the quantitative trait data in a loading file in the appropriate format for the software used. Follow the software documentation for loading trait data, genotypes, and the genetic map into the preferred program, and run the QTL analyses.
Determine significance thresholds for each trait separately, and record the QTLs, their additive effects, and explained variance.
3. Confirm each detected QTL with independent genetic resources such as NILs or HIFs. The effect of the QTL can be tested in relation to other genomic regions (epistasis). Once the effect of the QTL and its genetic regulatory mechanism have been validated, the next step is fine mapping and cloning of the causal gene. When relevant, experiments can be repeated in different conditions to determine any genotype-by-environment interactions.
4 Notes
1. Seeds obtained from stock centers have usually been stored for a long time. Instead of using them directly for your mapping experiment, it is therefore advisable to propagate them first. This guarantees that all seeds have developed on plants grown under the same conditions and prevents the detection of differences caused by seed age or longevity.
2. The heritability of a trait can be increased by reducing the variation within genotypes. Analyzing more replicates leads to such a reduction. Tightly controlled growing conditions and an accurate phenotyping assay also help to minimize this residual variation.
3. For some traits, the cytoplasmic background (chloroplasts and mitochondria) can be important, and it might be helpful to map QTLs to the cytoplasmic genome. For this, reciprocal crosses need to be made, and the parental origin of the cytoplasm of the resulting progeny is included as a marker forming an extra linkage group.
4. Sometimes seeds of the F5 or F6 generation are bulked and genotyped. These generations still contain one to three percent heterozygosity, which can be used to generate heterogeneous inbred families (HIFs). A HIF is derived from a RIL containing a small heterozygous region in an otherwise homozygous background.
The heterozygous regions will segregate in the next generation, so that half of the progeny are homozygous for one or the other parental genotype. These fixed lines are very similar to NILs and can be used to confirm a QTL in that specific region.
5. Erroneous marker data can inflate genetic maps tremendously, especially at short distances. Therefore, wrong data are much worse than missing data, and only high-quality data should be used for mapping purposes.
6. Segregation distortion of markers can be due to genotyping errors, which is usually the case when it is observed for isolated markers. These markers should be removed from the analyses. When distortions are caused by genetic incompatibilities, all markers in LD with the incompatibility locus will show skewed segregation. This results in lower mapping power and can only be resolved by choosing different parental lines.
7. Depending on the population type, heterozygous regions may be present in individual lines. In this case, three genotypic classes occur: homozygous paternal, homozygous maternal, and heterozygous. Most software packages can deal with this and, in addition, offer the possibility to estimate dominance and additive effects.
8. Dominant markers, such as AFLPs, do not allow distinguishing between homozygous and heterozygous loci. If lines carrying heterozygous regions are present in the population and dominant markers are used for genotyping, specific software is needed for the analysis.
9. LOD scores are calculated by comparing the likelihood of the data in the presence of a QTL (H1) with that in its absence (H0). In short: LOD = log10(L(data|H1)/L(data|H0)) and D = 2 × ln(L(data|H1)/L(data|H0)). LOD can be calculated from D and vice versa: LOD = 0.217D and D = 4.605LOD.
10. Placing cofactors is a delicate task, because it can easily bias or overfit results. Most programs offer automatic cofactor selection procedures, in which markers are selected in an unbiased manner.
11. Almost all publicly available mutant lines are in the Columbia background. Note that the Columbia allele might differ from the alleles of your parental lines. Therefore, it may be necessary to create a mutant in the desired background, for example by RNAi.
References
1. Doebley JF et al (2006) The molecular genetics of crop domestication. Cell 127:1309–1321
2. Izawa T et al (2003) Comparative biology comes into bloom: genomic and genetic comparison of flowering pathways in rice and Arabidopsis. Curr Opin Plant Biol 6:113–120
3. Hoffmann MH (2002) Biogeography of Arabidopsis thaliana (L.) Heynh. (Brassicaceae). J Biogeogr 29:125–134
4. Alonso-Blanco C, Koornneef M (2000) Naturally occurring variation in Arabidopsis: an underexploited resource for plant genetics. Trends Plant Sci 5:22–29
5. Kooke R et al (2012) Backcross populations and near isogenic lines. In: Rifkin SA (ed) Quantitative trait loci (QTL): methods and protocols. Methods Mol Biol 871:3–16. Humana Press, Totowa, NJ
6. Keurentjes JJB et al (2007) Development of a near-isogenic line population of Arabidopsis thaliana and comparison of mapping power with a recombinant inbred line population. Genetics 175:891–905
7. Törjék O et al (2008) Construction and analysis of 2 reciprocal Arabidopsis introgression line populations. J Hered 99:396–406
8. Ravi M, Chan SW (2010) Haploid plants produced by centromere-mediated genome elimination. Nature 464:615–618
9. Liu SC et al (1996) Genome-wide high-resolution mapping by recurrent intermating using Arabidopsis thaliana as a model. Genetics 142:247–258
10. Kover PX et al (2009) A multiparent advanced generation inter-cross to fine-map quantitative traits in Arabidopsis thaliana. PLoS Genet 5:e1000551
11. Huang X et al (2011) Analysis of natural allelic variation in Arabidopsis using a multiparent recombinant inbred line population. Proc Natl Acad Sci 108:4488–4493
12. Balasubramanian S et al (2009) QTL mapping in new Arabidopsis thaliana advanced intercross-recombinant inbred lines. PLoS One 4:e4318
13. Brachi B et al (2010) Linkage and association mapping of Arabidopsis thaliana flowering time in nature. PLoS Genet 6:e1000940
14. Bentsink L et al (2010) Natural variation for seed dormancy in Arabidopsis is regulated by additive genetic and molecular pathways. Proc Natl Acad Sci 107:4264–4269
15. McMullen MD et al (2009) Genetic properties of the maize nested association mapping population. Science 325:737–740
16. Nordborg M et al (2002) The extent of linkage disequilibrium in Arabidopsis thaliana. Nat Genet 30:190–193
17. Kim S et al (2007) Recombination and linkage disequilibrium in Arabidopsis thaliana. Nat Genet 39:1151–1155
18. Atwell S et al (2010) Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465:627–631
19. Filiault DL, Maloof JN (2012) A genome-wide association study identifies variants underlying the Arabidopsis thaliana shade avoidance response. PLoS Genet 8:e1002589
20. Weigel D, Mott R (2009) The 1001 genomes project for Arabidopsis thaliana. Genome Biol 10:107
21. van Ooijen JW (1992) Accuracy of mapping quantitative trait loci in autogamous species. Theor Appl Genet 84:803–811
22. Koornneef M et al (1983) Linkage map of Arabidopsis thaliana. J Hered 74:265–272
23. Semagn K et al (2006) An overview of molecular marker methods for plants. Afr J Biotechnol 5:2540–2568
24. Borevitz JO et al (2003) Large-scale identification of single-feature polymorphisms in complex genomes. Genome Res 13:513–523
25. Mardis ER (2008) The impact of next-generation sequencing technology on genetics. Trends Genet 24:133–141
26. Meinke DW et al (2003) A sequence-based map of Arabidopsis genes with mutant phenotypes. Plant Physiol 131:409–418
27. Cornforth TW, Long AD (2003) Inferences regarding the numbers and locations of QTLs under multiple-QTL models using interval mapping and composite interval mapping. Genet Res 82:139–149
Chapter 7
Grafting in Arabidopsis
Katherine Bainbridge, Tom Bennett, Peter Crisp, Ottoline Leyser, and Colin Turnbull
Abstract
Grafting provides a simple way to generate chimeric plants with regions of different genotypes and thus to assess the cell autonomy of gene action. The technique of grafting has been widely used in other species, but in Arabidopsis, the small size of the plant makes the process rather more demanding. However, several well-established grafting procedures are now available, which we describe here, and their use has already contributed greatly to the understanding of processes such as shoot branching control, flowering, disease resistance, and systemic silencing.
Key words Arabidopsis thaliana, Grafting, Graft-transmissible signal
1 Introduction
The assessment of the cell autonomy of a signaling molecule or mutant phenotype can provide highly informative data about gene function. This kind of analysis requires the construction of chimeric plants with cells of different genotypes. There are several ways to achieve this, including the tissue-specific expression of a wild-type gene in a mutant background [1] and the generation of sectors of different genotypes following somatic recombination or chromosome breakage [2], or following transposition [3] or site-specific homologous recombination [4] to remove an insertional mutagen. These methods are versatile in allowing different amounts and positions of the tissues of each genotype to be generated. However, they are all very time-consuming, requiring transgenesis and/or construction of lines of particular genotypes and a system to mark the different sectors and thus identify their genotypes. In contrast, grafting is an extremely simple method for making a chimeric plant.
In some ways, it is more restricted in its applications than the methods mentioned above, because only a limited number of options are available for connecting tissues of different genotypes. However, the methods are straightforward, do not require construction of complex transgenics or other genotypes, and enable an almost infinite number of genotype combinations to be tested.
Jose J. Sanchez-Serrano and Julio Salinas (eds.), Arabidopsis Protocols, Methods in Molecular Biology, vol. 1062, DOI 10.1007/978-1-62703-580-4_7, © Springer Science+Business Media New York 2014
Fig. 1 Details of grafting procedures. Top row shows newly made grafts, from left to right: root-shoot graft without collar; root-shoot graft without collar, with cotyledons removed; root-shoot wedge graft; and two-shoot Y-graft. Bottom row shows later stages, from left to right: root-shoot collar graft 10 days after grafting; mature plant showing dark-colored scar at union; and graft verification by GUS staining of Y-graft where one shoot carried CaMV35S::GUS gene. Arrow in all pictures shows position of graft union.
Grafting experiments are particularly suited to demonstrating spatial separation of source and target, including genetic complementation of mutant phenotypes across a graft union, direct detection of molecules translocated in vascular sap or arriving in receiving tissue, and/or altered expression of molecular targets due to signal transmission. It is now 20 years since Arabidopsis grafts were first reported [5]. However, the most commonly adopted methods in recent years are based on simple root-shoot grafts, performed on young seedlings, to generate plants in which the genotype of the root differs from that of the shoot [6]. This method, with variations (Fig. 1), is described below.
In addition, a seedling shoot can be grafted into the hypocotyl of a second seedling, a so-called Y-graft, to generate a plant with two genetically different shoot systems [6]. There are also reports of success with mature rosette grafts [7], and there is no reason why other versions should not be equally successful. To date, Arabidopsis grafting has been used to study a multitude of diverse biological processes, including shoot branching [1, 6, 8, 9], flowering time [7, 10], leaf development [11], vascular development [12], nutrient transport [13–16], disease resistance [17], small RNA movement [15, 16, 18, 19], systemic silencing [18, 19], and wounding [20]. This range indicates that it is an approach with wide applicability in this species.

2 Materials

1. Sterilized, cold-treated, good-quality Arabidopsis seed of appropriate genotypes.
2. 0.3 mm internal diameter silicone tubing (e.g., SF Medical, Cat. No. SMF3-1050, available through VWR International), cut into 2–3 cm sections and autoclaved (see Note 1).
3. Razor blades or No. 15 scalpel blades (see Note 2).
4. Microsurgery knife: No. 15 disposable stab knife (e.g., Fine Science Tools, Cat. No. 10315-12).
5. Fine forceps.
6. 10 cm square Petri dishes.
7. Growth medium: ATS (Arabidopsis thaliana salts [21]), half-strength Murashige-Skoog salts [22], or equivalent, with 1 % sucrose. Solidify with agar (0.8 %) or a gellan gum-type gel (e.g., Phytagel, Gelrite; 0.6 %).
8. Dissecting microscope.
9. 22/18 °C growth cabinet.
10. 27 °C growth cabinet.

3 Methods

3.1 Root-Shoot Grafts

1. Under sterile conditions, sow the seed onto square Petri dishes containing ATS-agar (or equivalent), with a spacing of 7–10 mm between seeds. Place the sealed plates vertically in a growth cabinet under standard axenic growth conditions (see Note 3). Leave the seedlings to germinate and grow for 3 days.
2. After 3 days, move the seedlings to a growth cabinet set at 27 °C (see Note 3) for a further 2 days.
3. Cut the sterile silicone tubing into lengths of roughly 2 mm (see Note 4).
4. Grafting can now be performed under sterile conditions. Cut selected seedlings transversely across the hypocotyl (see Note 5) while they are on the agar plates. The root should not be disturbed and should essentially remain in place. Remove the apical part of the seedling and place a collar (see Note 4) over the cut hypocotyl of the rootstock. The top of the rootstock should be about halfway along the length of the collar. Feed the hypocotyl of a suitably excised scion (see Note 5) into the collar so that the base of the scion meets the rootstock. A whole seedling is thus reconstituted. In addition to the reciprocal genotype combinations, include appropriate self-grafted controls that reconstruct the original genotypes. These controls show whether the grafting process itself affects the phenotype of interest.
5. Inspect the graft junctions using a dissecting microscope. The two graft parts should be in contact across the whole grafting surface with no gaps. If they are not, push the scion further into the collar until it meets the rootstock. The success rate of the protocol is 50–70 %, so graft twice as many seedlings as are needed for the experiment.
6. When grafting is complete, return the plates to the 27 °C growth cabinet for 3–4 days, provided they are suitably moist (see Note 6).
7. After this time, assess the grafts for healing using a dissecting microscope (see Note 7). Transfer successful grafts to soil (see Note 8) and use a propagator lid to keep them humid for about a week.
8. At an appropriate time thereafter, assess the plants phenotypically. Once the phenotypic data have been recorded, assess the plants for graft integrity to confirm the validity of the results (see Note 9).
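Every experiment needs the reciprocal combinations plus self-grafted controls (step 4), and only 50–70 % of grafts succeed (step 5). It therefore helps to work out graft numbers before sowing. The Python sketch below is a planning aid only, and several details are illustrative assumptions. These are the genotype names (Col, max4), the target of 12 healed grafts per combination, and the scion/rootstock notation. Rounding up the target divided by the success rate is simply one way of applying the 50–70 % figure.

```python
import math
from itertools import product


def graft_combinations(genotypes):
    """Return every (scion, rootstock) pair: reciprocal grafts plus self-grafts."""
    return list(product(genotypes, repeat=2))


def seedlings_to_graft(grafts_needed, success_rate):
    """Grafts to attempt so that the expected number of healed grafts,
    at the given success rate, is at least grafts_needed."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return math.ceil(grafts_needed / success_rate)


if __name__ == "__main__":
    # Hypothetical genotypes: a wild type and a branching mutant.
    combos = graft_combinations(["Col", "max4"])
    per_combination = 12  # assumed number of healed grafts wanted per combination
    for scion, rootstock in combos:
        kind = "self-graft control" if scion == rootstock else "reciprocal graft"
        low = seedlings_to_graft(per_combination, 0.70)   # optimistic success rate
        high = seedlings_to_graft(per_combination, 0.50)  # pessimistic success rate
        print(f"{scion}/{rootstock} ({kind}): attempt {low}-{high} grafts")
```

Two genotypes give four combinations: two reciprocal grafts and two self-graft controls. At a 50 % success rate, the calculation reproduces the "graft twice as many" rule of thumb in step 5.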
3.2 Wedge Grafts and Y-Grafts

3.2.1 Single Wedge Graft

Instead of cutting the hypocotyl transversely, grafts can also be made with V-shaped "wedge-slit" connections. These are similar to many horticultural graft types. Precise cuts are essential and are best made under a well-lit dissecting microscope; a magnification of 5× to 40× is ideal.

1. Make the rootstock by cutting the hypocotyl transversely (with a razor blade or No. 15 scalpel blade) about one-quarter of the way down from the top. Then slit down the middle of the hypocotyl with a microsurgery knife (see Note 2).
2. Make the scion by cutting a very shallow-angled V shape with the microsurgery knife. The first cut should extend more than halfway across the hypocotyl, but do not sever the root completely; otherwise the shoot moves around a lot when making the second cut. The second cut should result in a symmetrical wedge.
3. Push the scion wedge gently into the slit in the rootstock, which should be the same length as the wedge (Fig. 1). Tissue elasticity and surface tension will keep these grafts together without the aid of a collar. These cutting procedures need some practice, mainly to achieve a very fine sawing action with the knife rather than pushing down with large strokes.

3.2.2 Two-Shoot Y-Graft

This is a modification of the single wedge graft. A wedge-shaped scion is connected into a cut in the side of an otherwise intact rootstock plant, generating a graft with two shoots on a single root system (Fig. 1). The rootstock plant keeps its roots. Y-grafts can be easier to cut and assemble if the hypocotyls are curved: rotate pairs of vertical plates 60° left and right 1 day before grafting. The two shoots are then aligned with their curves facing away from each other. It is often also necessary to trim off most of one cotyledon on each shoot, so that the two shoots can sit close together:

1. Make a shallow-angled slit into the side of the hypocotyl. Start about one-third of the way from the top and extend no more than halfway across the diameter, so that the central vascular tissue is penetrated but not severed.
2. Make a wedge-shaped scion as above (Subheading 3.2.1).
3. Assemble by aligning the shoots as closely as possible for maximum contact area.

4 Notes

1. Collars support the graft and hold the rootstock and scion together during graft healing. We have found that they increase the proportion of successful grafts. However, hypocotyl grafting can also be performed without collars. This is less efficient, but it allows greater flexibility. The protocol is essentially the same as grafting with collars, with only slight alterations. For single grafts, a normal transverse cut can be used. A "slit and wedge" graft (see Subheading 3.2.1) can give better results, however, since it holds the scion and rootstock together more effectively. Both cotyledons can also be removed completely before grafting (Fig. 1). This makes it easier to align scion and rootstock lying flat on the medium, does not require collars, and does not appear to reduce success rates. Another major advantage of collarless grafting is that it allows two-shoot Y-grafts, which test shoot-to-shoot signaling and are not possible when a collar is used (see Subheading 3.2.2). Grafting can also be performed on short-day-grown seedlings. Grow these seedlings at a constant 23 °C (~100 μmol/m²/s) for 7–9 days and then graft them. After grafting, return them to this regime for at least 1 week and up to 6 weeks. Successfully grafted plants can then be transferred to soil.
2. The razor blades should have very fine edges, so that they make clean cuts and do not squash the hypocotyl. Standard industrial razor blades are not appropriate. No. 15 scalpel blades may be used, but we find that Wilkinson Sword "Classic" double-sided razor blades (or equivalent) give the best results. The razor blades must be sharp at all times and so should be changed frequently. Use a different blade for cutting the collars (see Note 4). For wedge-shaped single and Y-graft connections, a disposable microsurgery knife is ideal because of its thin, ultrasharp blade. Take care not to damage its delicate cutting edge.
3. For the initial 3-day period, use a standard regime of 16 h light/8 h dark, 22/18 °C, and 100 μmol/m²/s. For the following 2 days and the graft-healing period, ideally use 16 h light/8 h dark, a constant 27 °C, and 60 μmol/m²/s. Growing the seedlings at 27 °C raises endogenous auxin levels in the plant. This first increases hypocotyl length [23], which makes grafting easier, and later promotes callus formation and healing. The lower light intensity reduces twisting of the hypocotyls; such twisting makes grafting more difficult and disrupts graft healing.
4. The collars that hold the grafts together are made by slicing sterile 0.3 mm i.d. silicone tubing into ~2 mm sections. If the collars are too long, the rootstock and scion will be difficult to fit together. The collars can also be slit longitudinally before use. The slit allows the collar to open up as the plant grows, or to be removed once the graft has healed fully. A pointed scalpel blade (e.g., No. 11) is best for this. Slitting is easier if you first insert the point of the scalpel into an uncut 3 cm length of tubing. Then pull the tubing over the cutting edge of the blade using fine forceps.
5. There are two keys to assembling successful grafts. The first is to select the most appropriate seedlings on each plate. For most situations, the selected seedlings should have long, straight hypocotyls and strong root growth. Only a small range of hypocotyl thicknesses can be used. It is often difficult to tell which seedlings have the correct dimensions, so some trial and error is needed. Discard seedlings whose hypocotyls do not fit easily into the collars, as forcing them in will damage the seedling. Likewise, discard seedlings that fit very loosely in the collar, as the graft will not be held together effectively. If seedlings of the correct size are used, the graft should fit together effortlessly. The second key is the cutting of the hypocotyl. Cuts should be as clean and straight as possible. The hypocotyl should not be squashed during cutting, and it should not be necessary to cut into the agar to cut through the seedling. A new blade avoids these problems: a sharp razor should slice through the hypocotyl with almost no resistance. Seedlings can also be kept from sinking into the gel by using double-strength gel in the medium. Another option is to grow the seedlings on a "raft" of cellulose membrane filter (Millipore type) on the surface of the gel. Cutting the hypocotyls correctly may take some practice at first. Gain a few hours' experience on other seedlings before attempting grafting itself. It is also important to cut the seedlings in the correct place. Results appear best if the rootstock donor is cut three-quarters of the way up the hypocotyl and the scion donor halfway up. In this case, neither the root of the scion donor nor the shoot of the rootstock donor can be used for further grafts, and both should be discarded. All excised parts can be used if every seedling is cut halfway up the hypocotyl and the scions are simply swapped between rootstocks. However, this may increase the risk of adventitious rooting and make insertion into the tubing more difficult.
6. Keep the grafting plates as moist as possible at all times, since high humidity aids graft healing. It may, however, be necessary to remove excess surface water before grafting. In that case, or if the plants appear to be drying out (e.g., dull, soft, or wilting cotyledons), add a small amount of sterile water to the plates as needed. Do this during grafting and before the plates are sealed at the end of the procedure.
7. Use only truly grafted seedlings; otherwise the results may be erroneous. Grafting can only be shown definitively when the plants are harvested (see Note 9), at which stage it is generally obvious whether a graft has succeeded. Visual inspection under a dissecting microscope should show whether the scion and rootstock have fused. If further confirmation is needed, pull very lightly on the scion with forceps to check whether the graft has united. Grafts are often connected by 4 days, but they strengthen further with time. Successful grafts can normally be transferred 6–7 days after grafting. Y-grafts need to be stronger and may take a little longer. A proportion of the scions usually produce adventitious roots from hypocotyl tissue within the collar, and these roots displace the rootstocks. Such seedlings should be discarded. Scions that produce adventitious roots above the level of the collar, but have also joined to the rootstock, can in theory be used if the adventitious roots are excised. However, adventitious root formation is often a sign of poor graft connection, so rescuing grafts by root excision may be futile.
8. Transfer plants to soil as soon as shoot (and root) growth appears reestablished, usually 6–7 days after grafting. To minimize stress, keep everything wet during transfer. Add extra water to the plates, saturate the potting mix, spray the plants with a fine mister, and cover the tray as soon as it has been filled. Pick plants off the plates carefully, "hooking" them up with fine forceps or grasping the edge of a cotyledon. With Y-grafts, take care not to bend the graft union, as it will probably break. Drop the roots into a prebored hole and gently push potting mix across to hold them in place. Do not bury the graft union; otherwise it is hard to inspect, and adventitious rooting will be promoted. Keep the tray vents closed for the first 3 days or so, then open them for another 3 days. Remove the lid after about a week. Keep growth cabinet humidity high if possible. A few casualties are often seen soon after the lid is removed. These plants have poor root systems, either because the graft was poor or because adventitious root removal was too much for them.
9. Confirming that a plant has grafted successfully, and can therefore be included in the dataset, is normally destructive. It is therefore best performed after phenotypic assessment. Remove the plants intact from the growth medium and locate the graft union. The broadening stem often splits the silicone collar, which may then be absent. The union is nevertheless usually identifiable by clear scarring at the site (Fig. 1). Depending on the experiment, either most or all of the root tissue must originate beneath the level of the union; otherwise, the plants are essentially "ungrafted." A GUS reporter gene can help to verify graft integrity. If one of the genotypes carries a broadly expressed promoter-GUS transgene (e.g., CaMV 35S::GUS; Fig. 1), GUS activity can identify correctly grafted plants. It can also reveal adventitious roots of the "wrong" genotype.
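The growth regimes in Subheading 3.1 and Notes 3, 7, and 8 can be laid out as a day-by-day plan, with day 0 as sowing and grafting on day 5. The Python sketch below is illustrative only, and the following choices are assumptions. These are the `Phase` record and phase names, the use of the lower bound where the protocol gives a range, and the 16 h photoperiod default for the unspecified soil phase. The daily light integral (DLI) uses the standard conversion of PPFD × seconds of light ÷ 10^6. Cutting the light to 60 μmol/m²/s at 27 °C lowers the daily light dose by 40 % relative to the 22/18 °C phase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Phase:
    name: str
    days: int
    temperature: str          # as written in the protocol
    ppfd: Optional[float]     # μmol m^-2 s^-1; None where the protocol gives no value
    photoperiod_h: float = 16.0


def daily_light_integral(ppfd, photoperiod_h):
    """DLI in mol m^-2 d^-1: PPFD (μmol m^-2 s^-1) x seconds of light / 1e6."""
    return ppfd * photoperiod_h * 3600 / 1_000_000


def schedule(phases):
    """Yield (start_day, end_day, phase), counting day 0 as the day of sowing."""
    day = 0
    for phase in phases:
        yield day, day + phase.days, phase
        day += phase.days


# Durations follow Subheading 3.1 and Notes 3, 7 and 8; where a range is
# given (e.g., transfer 6-7 days after grafting) the lower bound is assumed.
ROOT_SHOOT_GRAFT = [
    Phase("germination on vertical plates", 3, "22/18 °C", 100),
    Phase("hypocotyl elongation", 2, "27 °C", 60),
    Phase("graft healing on plates", 6, "27 °C", 60),
    Phase("soil under propagator lid", 7, "not specified", None),
]

if __name__ == "__main__":
    for start, end, p in schedule(ROOT_SHOOT_GRAFT):
        dli = ("n/a" if p.ppfd is None
               else f"{daily_light_integral(p.ppfd, p.photoperiod_h):.2f} mol/m2/d")
        print(f"day {start:>2}-{end:>2}  {p.name:<32} {p.temperature:<14} DLI {dli}")
```

Each phase is a plain record, so the short-day variant in Note 1 can be planned by editing the list. That variant uses a constant 23 °C, 7–9 days before grafting, and at least 1 week after grafting.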
References
1. Booker JP, Chatfield SP, Leyser O (2003) Auxin acts in xylem-associated or medullary cells to mediate apical dominance. Plant Cell 15:495–507
2. Furner IJ et al (1996) Clonal analysis of the late flowering fca mutant of Arabidopsis thaliana: cell fate and cell autonomy. Development 122:1041–1050
3. Jenik PD, Irish VF (2000) Regulation of cell proliferation patterns by homeotic genes during Arabidopsis floral development. Development 127:1267–1276
4. Woodrick R et al (2000) Arabidopsis embryonic shoot fate map. Development 127:813–820
5. Rhee SY, Somerville CR (1995) Flat-surface grafting in Arabidopsis thaliana. Plant Mol Biol Rep 13:118–123
6. Turnbull CGN, Booker JP, Leyser HMO (2002) Micrografting techniques for testing long-distance signalling in Arabidopsis. Plant J 32:255–262
7. Ayre BG, Turgeon R (2004) Graft transmission of a floral stimulant derived from CONSTANS. Plant Physiol 135:2271–2278
8. Sorefan K et al (2003) MAX4 and RMS1 are orthologous dioxygenase-like genes that regulate shoot branching in Arabidopsis and pea. Genes Dev 17:1469–1474
9. Booker J et al (2004) MAX3/CCD7 is a carotenoid cleavage dioxygenase required for the synthesis of a novel plant signaling molecule. Curr Biol 14:1232–1238
10. An HL et al (2004) CONSTANS acts in the phloem to regulate a systemic signal that induces photoperiodic flowering of Arabidopsis. Development 131:3615–3626
11. Van Norman JM, Frederick RL, Sieburth LE (2004) BYPASS1 negatively regulates a root-derived signal that controls plant architecture. Curr Biol 14:1739–1746
12. Ragni L et al (2011) Mobile gibberellin directly stimulates Arabidopsis hypocotyl xylem expansion. Plant Cell 23:1322–1336
13. Green LS, Rogers EE (2004) FRD3 controls iron localization in Arabidopsis. Plant Physiol 136:2523–2531
14. Widiez T et al (2011) HIGH NITROGEN INSENSITIVE 9 (HNI9)-mediated systemic repression of root NO3− uptake is associated with changes in histone methylation. Proc Natl Acad Sci USA 108:13329–13334
15. Lin SI et al (2008) Regulatory network of microRNA399 and PHO2 by systemic signaling. Plant Physiol 147:732–746
16. Pant BD et al (2008) MicroRNA399 is a long-distance signal for the regulation of plant phosphate homeostasis. Plant J 53:731–738
17. Xia YJ et al (2004) An extracellular aspartic protease functions in Arabidopsis disease resistance signaling. EMBO J 23:980–988
18. Brosnan CA et al (2007) Nuclear gene silencing directs reception of long-distance mRNA silencing in Arabidopsis. Proc Natl Acad Sci USA 104:14741–14746
19. Melnyk CW et al (2011) Mobile 24 nt small RNAs direct transcriptional gene silencing in the root meristems of Arabidopsis thaliana. Curr Biol 21:1678–1683
20. Mugford S et al (2007) The Arabidopsis transmissible wound signal. Comp Biochem Physiol Part A Mol Integr Physiol 146:S242
21. Wilson AK et al (1990) A dominant mutation in Arabidopsis confers resistance to auxin, ethylene and abscisic acid. Mol Gen Genet 222:377–383
22. Murashige T, Skoog F (1962) A revised medium for rapid growth and bioassays with tobacco tissue cultures. Physiol Plantarum 15:473–497
23. Gray WM et al (1998) High temperature promotes auxin-mediated hypocotyl elongation in Arabidopsis. Proc Natl Acad Sci USA 95:7197–7202

Chapter 8 Agrobacterium tumefaciens-Mediated Transient Transformation of Arabidopsis thaliana Leaves

Silvina Mangano, Cintia Daniela Gonzalez, and Silvana Petruccelli

Abstract Transient assays provide a convenient alternative to stable transformation. Compared with the generation of stably transformed plants, agroinfiltration is more rapid, and samples can be analyzed a few days after inoculation. Nevertheless, unlike tobacco and other plant species, Arabidopsis thaliana remains recalcitrant to routine transient assays. In this chapter, we describe a transient expression assay in which intact Arabidopsis leaves are simply infiltrated with Agrobacterium tumefaciens carrying a plasmid that expresses a fluorescent reporter protein. In this protocol, Agrobacterium aggressiveness is increased by prolonged treatment in a nutrient-deficient induction medium containing acetosyringone. In addition, Arabidopsis plants are grown under an intermediate photoperiod (12 h light/12 h dark) to promote leaf growth.

Key words Transient gene expression, Arabidopsis thaliana, Agrobacterium tumefaciens, Leaf agroinfiltration, Fluorescent proteins

1 Introduction

Stable transgenic Arabidopsis lines provide a sustainable supply of plant material with homologous protein expression. They also allow mutant complementation and the examination of expression throughout all tissues and cell types. The widely used floral dip procedure [1] generates transgenic Arabidopsis plants with minimal labor, but the plants must still be grown to maturity over several weeks. The need to harvest seed and perform selection also makes it impractical to test large numbers of different transgene constructs. Moreover, transgene expression can sometimes interfere with normal plant growth and development. This can result from an overdose of the functional protein or from dominant-negative effects of nonfunctional products. Transient gene expression is a convenient alternative to stable transformation for analyzing gene function, because it saves time and labor. The whole assay takes only one to several days. Many constructs can therefore be assayed in parallel within a short time, which dramatically speeds up the pace of research.

Jose J. Sanchez-Serrano and Julio Salinas (eds.), Arabidopsis Protocols, Methods in Molecular Biology, vol. 1062, DOI 10.1007/978-1-62703-580-4_8, © Springer Science+Business Media New York 2014
Transient infiltration assays with Agrobacterium carrying a construct of interest are a powerful tool for gaining insight into gene function, protein-protein interactions, and promoter activity [2–4]. Agrobacterium-mediated transient transformation is an easy, routine, and consistent operation in Nicotiana benthamiana leaves [2]. The procedure has also been adapted to lettuce and tomato leaves [3, 5], tomato fruits [6], roots [7], Antirrhinum floral tissues [8], and whole seedlings [9]. Unlike tobacco and other plant species, however, Arabidopsis remains recalcitrant to routine transient assays, and high transient expression levels are obtained only in some ecotypes [3, 9–12]. Yet tobacco, when used as a heterologous system for genes from the model species Arabidopsis, may not reflect the native activity or subcellular distribution of the corresponding proteins [10]. Pilot efforts to develop an Arabidopsis equivalent of tobacco leaf infiltration have achieved only low-frequency success with great variation [3, 4, 13, 14]. Two approaches that increase the success rate of Arabidopsis transient transformation and reduce its variation have been described. One uses young seedlings [10]; the other transforms root epidermal cells by cocultivation with Agrobacterium rhizogenes [15]. Difficulties in Arabidopsis transient transformation have been attributed to plant immune responses triggered by perception of Agrobacterium [12]. An efficient Agrobacterium-mediated transient transformation method has been developed using transgenic Arabidopsis that expresses AvrPto, a suppressor of plant immunity from Pseudomonas syringae, under a dexamethasone-inducible promoter [12]. Nevertheless, this assay is limited to transgenic plants expressing AvrPto.
In this chapter, we describe a transient expression assay using simple infiltration of intact Arabidopsis leaves with Agrobacterium tumefaciens GV3101 cells carrying appropriate plasmid constructs. This protocol increases Agrobacterium aggressiveness by a prolonged treatment in the presence of acetosyringone (AS) and medium deficient in nutrients such as the induction one. In addition, the number of bacteria used is higher than the one used to infiltrate Nicotiana benthamiana leaves. Finally, Arabidopsis growing conditions are controlled in order to obtain healthy plants with an adequate leaf size to facilitate infiltration. We showed that a fluorescent reporter gene is easily introduced in Arabidopsis leaves and that most of the epidermal cells show fluorescence when fluorescence microscope and Confocal Laser Scanning Microscopy (CLSM) are used. Transient Transformation of Arabidopsis Leaves 2 167 Materials 1. Seeds of Arabidopsis thaliana Columbia (Arabidopsis Stock Center). 2. Pots (≈6 cm of diameter), compost, and perlite. 3. Agrobacterium tumefaciens GV3101 (strain that contains the sequences derivative of the nopaline-type disarmed Ti-plasmid pTiC58 and rifampicin resistance gene integrated on the chromosome and the helper plasmid pMP90 (pTiC58ΔT-DNA) with a gentamicin resistance gene) [16]. 4. Binary vector carrying the gene of interest (Gi) (e.g., cloned into pGWB, a group of vectors designed to facilitate fusions to different reporter proteins and also purification and detection tags [17]). 5. Kanamycin (Sigma-Aldrich) 1,000× stock solution: 100 mg/ mL in water. 6. Gentamicin (Sigma-Aldrich) 1,000× stock solution: 30 mg/ mL in water. 7. Rifampicin (Sigma-Aldrich) 1,000× stock solution: 10 mg/ mL in methanol. 8. Bacterial culture medium: YEB (yeast extract and beef) medium (Sigma-Aldrich). Add 18 g/l agar–agar for solid medium. 9. Glycerol solutions: 10 % and 80 % v/v in water. 10. 
Induction medium: 0.1 % (NH4)2SO4, 0.45 % KH2PO4, 1 % K2HPO4, 0.05 % sodium citrate, 0.2 % sucrose, 0.5 % glycerol, 1 mM MgSO4, and pH 5.7. 11. Infiltration medium: MES (Sigma-Aldrich) 10 mM, MgSO4 10 mM, and pH 5.7. 12. Acetosyringone: (Sigma-Aldrich): 200 mM in dimethyl sulfoxide (DMSO). 13. Perfluorodecalin 95 % (Sigma-Aldrich). 14. Syringes 1 mL. 15. Shaker. 16. Spectrophotometer. 17. Refrigerated centrifuge. 18. Gene Pulser II with the Capacitance Extender (Bio-Rad). 19. Microcentrifuge. 20. Fluorescence stereomicroscope equipped with a GFP Plant (excitation 470/40 nm, emission 525/50 nm) and DsRed (excitation 545/30 nm, emission 620/60 nm) filters and CCD camera. 21. Confocal laser scanning microscope with a 63× (NA 1.4) oil immersion objective. 168 3 Silvina Mangano et al. Methods 3.1 Growing Arabidopsis Plants 1. Fill the 6 cm pots with a mix of compost and perlite (3:1), and compress very lightly to give a firm bed and water. 2. Sow the seeds onto the surface of the mix compost/perlite by scattering them carefully. 3. Place the pots in a tray and transfer to a cold (4 °C) for 2–3 days in the dark, and cover with transparent PVC film to keep them in a high humidity environment. 4. Transfer the pots to a growth room under 90 μE in light cycle 12 h light–12 h dark at 22–24 °C (see Note 1). 5. After 4 weeks, Arabidopsis plants are generally in good conditions for transient expression assay (Fig. 1a) (see Note 2). 3.2 Transformation of A. tumefaciens with Binary Plasmid DNA by Electroporation 1. Pick a single colony of the A. tumefaciens GV3101 and inoculate 3 mL of YEB with gentamicin 30 μg/mL and rifampicin 10 μg/mL in a 15 mL sterile tube. Grow at 28 °C overnight in a shaker at 200 rpm in the dark. 3.2.1 Preparation of Competent Cells of Agrobacterium 2. Inoculate 500 mL flasks each containing 100 mL of YEB with 0.5 mL (1/100 volume) of the overnight culture and grow at 28 °C with vigorous shaking until OD600nm of 0.5–0.6. 
It takes ~4–5 h to get the cells to this stage. 3. Spin 5 min at 5,000 × g at 4 °C. Pour off supernatant. 4. Resuspend cells in 50 mL (~1/2 volume) ice-cold 10 % glycerol. Repeat spin. 5. Resuspend cells in 25 mL of ice-cold 10 % glycerol. Repeat spin. 6. Resuspend cells in 12 mL of ice-cold 10 % glycerol. Repeat spin. 7. Resuspend final pellet in 1.5 mL ice-cold 10 % glycerol. Fig. 1 (a) Arabidopsis 4-week-old plants. (b) Using a yellow tip, create small holes in the leaves. (c) Press the nozzle of a 1 mL syringe against the lower (abaxial) epidermis of Arabidopsis leaf Transient Transformation of Arabidopsis Leaves 169 8. Dispense 100 μL aliquots into fifteen 1.5 mL microfuge tubes pre-chilled on ice. Each tube will have enough cells for 2 transformations. 9. Quick-freeze the tubes in liquid nitrogen and store at −80 °C. 3.2.2 Electroporation 1. Remove one tube of competent cells from the freezer and place it on ice. Allow to thaw slowly on ice. 2. Add 1–2 μL of DNA (50–100 ng in water) and wait for 1 min. 3. Transfer cells plus DNA to pre-chilled (on ice) electroporation cuvettes with either 1 or 2 mm gap sizes. Make sure the white cuvette holder from the Bio-Rad Gene Pulser II is also prechilled on ice. 4. Take the ice bucket with the cuvettes and cuvette holder to the Gene Pulser. For cuvettes with a 2 mm gap size, adjust the Gene Pulser II unit “Set Volts” setting to 2.5 kV and the capacitance setting to 25 μFD. Set the resistance to 200 Ω on the Pulse Controller Unit. 5. Place the cuvette in the cuvette holder, slide down to engage the electrodes, and push both buttons on the Gene Pulser, holding them until the tone sounds. 6. Add 500 μL of YEB medium directly to the cuvette immediately after the pulse and incubate in a shaker at 200 rpm and 28 °C overnight. 7. Plate 100–200 μL on selective media (i.e., antibiotic selection for both the bacterial host strain and the plasmid). 8. Incubate plates 2 days at 28 °C when the colonies should be visible. 9. 
Check the presence of the introduced vector by a Colony PCR (see Note 3). 10. Grow a single colony in 5 mL YEB with gentamicin (30 μg/ mL), rifampicin (10 μg/mL), and kanamycin (100 μg/mL) in the dark at 28 °C and 200 rpm (see Note 4). 11. Store as glycerol stock (800 μL of fresh overnight culture + 200 μL sterile 80 % glycerol) at −80 °C (see Note 5). 3.3 Agrobacterium Growing for Infiltration 1. Plate 100–200 μL of a glycerol stock on YEB medium with 30 μg/mL gentamicin, 10 μg/mL rifampicin, and 100 µg/mL kanamycin (if the Gi is in a kanamycin resistance binary vector such as pGWB [17]). After incubation at 28 °C, pick a single colony of the Agrobacterium tumefaciens GV3101 containing the plasmid of interest and inoculate 5 mL of YEB with antibiotics. Grow at 28 °C overnight in a shaker at 200 rpm in the dark. 2. Dilute the overnight culture in YEB with antibiotics to reach an absorbance OD600nm of approximately 0.3 and add acetosyringone at 100 μM for virulence gene induction. Incubate at 28 °C and 200 rpm until the culture reach OD600nm of 0.6. 170 Silvina Mangano et al. 3. Spin the culture at 5,000 × g for 5 min. 4. Resuspend in 5 mL induction medium supplemented with antibiotics and acetosyringone at 200 μM. Incubate at 30 °C and 200 rpm for 3–4 h. 5. Pellet the culture at 5,000 × g for 5 min in a microcentrifuge at room temperature. 6. Resuspend the pellet in 5 mL of infiltration medium and centrifuge as above. Repeat once. 7. Dilute the bacterial suspension with infiltration medium supplemented with acetosyringone at 200 μM to adjust the inoculum to an appropriate concentration (see Note 6). 3.4 Transient Gene Expression 1. Agroinfiltration is conducted by infiltrating the agrobacterial suspension into the abaxial surface of fingernail-sized leaves attached to the intact plant (see Note 7). Using a yellow tip, make small holes in the leaves (Fig. 1b). 2. 
Load the inoculum in 1 mL plastic syringe and press the nozzle of the syringe (no needle) against the lower (abaxial) epidermis of an Arabidopsis leaf, covering the small hole with the nozzle and holding the leaf with a gloved finger on the adaxial face. Introduce the Agrobacterium in infiltration medium by slowly injection (Fig. 1c) (see Note 8). 3. Using a glass permanent maker, mark the infiltrated region. 4. Place the infiltrated Arabidopsis plants in the growth room (light cycle 12 h light–12 h dark at 22–24 °C) for 2–5 days. 5. If the plants were infiltrated with Agrobacterium with a fluorescent reporter, check the presence of the fluorescent protein (FP) using fluorescence stereomicroscope equipped with an appropriated filters (Fig. 2). Exposition time should be adjusted with a no transformed leaf (Fig. 2a) to distinguish the FP signal from the autofluorescent (Fig. 2b) (see Note 9). 3.5 Confocal Imaging 1. Excise a marked area of the leaf and mount it on a glass microscope slide containing a few drops of water. 2. Fill a 1 mL plastic syringe with a needle with perfluorodecalin, drop it over the leaf, and place the cover glass over the leaf (see Note 10). 3. Examine with a confocal laser scanning microscope, using a 63× (NA 1.4) oil immersion objective (see Note 11). GFP was excited at 488 nm (Ar 100 mW Laser) and detected in the 496–532 nm range. YFP was excited at 514 nm (Ar 100 mW Laser) and detected in the 525–559 nm range (Fig. 3a). mCherry and RFP were excited at 543 nm (HeNe 1.5 mW laser) and detected in the 570–630 nm range (Fig. 3b). To analyze colocalization, combine both channels (Fig. 3c) (see Notes 12 and 13). Transient Transformation of Arabidopsis Leaves 171 Fig. 2 Fluorescent micrographies of Arabidopsis leaves 3 days post-agroinfiltration. (a) Control leaf infiltrated with Agrobacterium without the plasmid containing the FP. 
(b) Leaf infiltrated with Agrobacterium with the plasmid containing the gene of interest fused to RFP (red fluorescent protein). Scale bar 2 mm

Fig. 3 Confocal laser scanning micrographs of Arabidopsis leaves agroinfiltrated with ER-YFP and GI-RFP. (a) Yellow channel. (b) Red channel. (c) Merged channels. Scale bar 10 μm

4 Notes

1. Arabidopsis is a facultative long-day plant: its flowering is delayed, but not prevented, under short days. This photoperiod was chosen to promote leaf growth without drastically altering the time of flowering. Arabidopsis plants are usually watered every 2 days.
2. Older plants with larger leaves also work, but transformation efficiency decreases rapidly with plant age.
3. When colony PCR is performed on Agrobacterium cells, the initial denaturation step at 94 °C should last 10 min instead of 4 min to promote cell lysis. After this step, add the mix containing dNTPs, primers, and Taq DNA polymerase.
4. Agrobacterium tumefaciens GV3101 is resistant to gentamicin (30 μg/mL) and rifampicin (10 μg/mL) and sensitive to kanamycin, so it is a good strain for use with binary vectors that carry the nptII gene.
5. Store several colonies for each vector, since expression levels can differ between colonies carrying the same binary vector.
6. The density of the bacterial suspension is also important for infiltration. Suspensions with an OD600nm below 0.1 result in weak transgene expression, whereas suspensions with an OD600nm above 1.0 often cause tissue yellowing or wilting. The best results are obtained with suspensions of OD600nm between 0.4 and 0.6.
7. Agroinfiltration is preferably performed in the late afternoon or evening so that T-DNA transfer occurs overnight.
8. Plants of similar size should be selected for optimal comparison of experimental controls and tests. In addition, infiltration should be performed with leaves of the same age.
Usually, leaves 6–8 are chosen for infiltration.
9. Observation can be performed on the whole plant without excising the leaf, which allows time-course analyses.
10. Perfluorodecalin has a low surface tension [18]; therefore, it penetrates the stomatal pores and fills the intercellular air spaces of the mesophyll. Treatment with perfluorodecalin increases sensitivity and improves image quality.
11. Fluorescence is detected only in epidermal cells of the leaf. No fluorescence is found in leaf mesophyll cells, indicating that Agrobacterium was able to transfer the T-DNA only to cells of the outer leaf layers.
12. Simultaneous detection of RFP/mCherry and YFP or GFP is performed by combining the settings indicated above in sequential scanning mode, as instructed by the manufacturer.
13. When working with fusion proteins, the size of the protein of interest (Pi) fused to the FP reporter should be analyzed by Western blot to confirm that the Pi has not been separated from the FP by proteolytic cleavage.

Acknowledgements

This research was supported by the Agencia Nacional de Promoción Científica y Tecnológica (ANPCyT) through grants PICT20070479 and PICT2010-2366 to Petruccelli Silvana and by Universidad Nacional de La Plata (project 11X/498). Petruccelli Silvana is a member of the Consejo Nacional de Investigaciones Científicas y Técnicas de Argentina (CONICET). Silvina Mangano is a researcher of the Departamento de Ciencias Biológicas, Facultad de Ciencias Exactas, Universidad Nacional de La Plata.

References

1. Clough SJ, Bent AF (1998) Floral dip: a simplified method for Agrobacterium-mediated transformation of Arabidopsis thaliana. Plant J 16:735–743
2. Yang Y, Li R, Qi M (2000) In vivo analysis of plant promoters and transcription factors by agroinfiltration of tobacco leaves. Plant J 22:543–551
3. Wroblewski T, Tomczak A, Michelmore R (2005) Optimization of Agrobacterium-mediated transient assays of gene expression in lettuce, tomato and Arabidopsis. Plant Biotechnol J 3:259–273
4. Lee MW, Yang Y (2006) Transient expression assay by agroinfiltration of leaves. Methods Mol Biol 323:225–229
5. Joh LD et al (2005) High-level transient expression of recombinant protein in lettuce. Biotechnol Bioeng 91:861–871
6. Orzaez D et al (2006) Agroinjection of tomato fruits. A tool for rapid functional analysis of transgenes directly in fruit. Plant Physiol 140:3–11
7. Kumagai H, Kouchi H (2003) Gene silencing by expression of hairpin RNA in Lotus japonicus roots and root nodules. Mol Plant Microbe Interact 16:663–668
8. Shang Y et al (2007) Methods for transient assay of gene function in floral tissues. Plant Methods 3:1
9. Li JF et al (2009) The FAST technique: a simplified Agrobacterium-based transformation method for transient gene expression analysis in seedlings of Arabidopsis and other plant species. Plant Methods 5:6
10. Marion J et al (2008) Systematic analysis of protein subcellular localization and interaction using high-throughput transient transformation of Arabidopsis seedlings. Plant J 56:169–179
11. Boyko A, Matsuoka A, Kovalchuk I (2011) Potassium chloride and rare earth elements improve plant growth and increase the frequency of the Agrobacterium tumefaciens-mediated plant transformation. Plant Cell Rep 30:505–518
12. Tsuda K et al (2012) An efficient Agrobacterium-mediated transient transformation of Arabidopsis. Plant J 69:713–719
13. Rakousky S et al (1997) Transient β-glucuronidase activity after infiltration of Arabidopsis thaliana by Agrobacterium tumefaciens. Biol Plant 40:33–41
14. McIntosh KB et al (2004) A rapid Agrobacterium-mediated Arabidopsis thaliana transient assay system. Plant Mol Biol Rep 22:53–61
15. Campanoni P et al (2007) A generalized method for transfecting root epidermis uncovers endosomal dynamics in Arabidopsis root hairs. Plant J 51:322–330
16. Koncz C, Schell J (1986) The promoter of TL-DNA gene 5 controls the tissue-specific expression of chimaeric genes carried by a novel type of Agrobacterium binary vector. Mol Gen Genet 204:383–396
17. Nakagawa T et al (2007) Improved gateway binary vectors: high-performance vectors for creation of fusion constructs in transgenic analysis of plants. Biosci Biotechnol Biochem 71:2095–2100
18. Sargent JW, Seffl RJ (1970) Properties of perfluorinated liquids. Fed Proc 29:1699–1703

Chapter 9

iTILLING: Personalized Mutation Screening

Susan M. Bush and Patrick J. Krysan

Abstract

One powerful approach to studying gene function is to analyze the phenotype of an organism carrying a mutant allele of a gene of interest. To use this experimental approach, one must be able to easily isolate individual organisms carrying desired mutations. A widely used method for accomplishing this task in plants and other organisms is a procedure called TILLING. A traditional TILLING project has at its foundation an ordered mutant population produced by treating seeds with a chemical mutagen. From this mutagenized seed, thousands of individual mutant lines are produced, and corresponding DNA samples are collected. For several plant species, publicly accessible screening facilities have been established that perform mutant screens on a gene-by-gene basis in response to customer requests using PCR and heteroduplex detection methods. The iTILLING method described in this chapter represents an individualized version of the TILLING process. Performing a traditional TILLING experiment requires a large investment in time and resources to establish the well-ordered mutant population.
By contrast, iTILLING is a low-investment alternative that provides the individual research lab with a practical solution to mutation screening. The main difference between the two approaches is that iTILLING is not based on the establishment of a durable, organized mutant population. Instead, a system for growing Arabidopsis seedlings in 96-well plates is used to produce an ephemeral mutant population for screening. Because the intention is not to develop a long-term resource, considerable savings in time and money are realized when using iTILLING as compared to traditional TILLING. iTILLING is not intended to serve as a replacement for traditional TILLING. Rather, iTILLING provides a strategy by which individual labs can perform custom mutagenesis screens using unique genetic backgrounds that are of specific interest to that research group.

Key words TILLING, Mutagenesis, Mutation detection, Mutation screening, Reverse genetics, iTILLING

Jose J. Sanchez-Serrano and Julio Salinas (eds.), Arabidopsis Protocols, Methods in Molecular Biology, vol. 1062, DOI 10.1007/978-1-62703-580-4_9, © Springer Science+Business Media New York 2014 175

1 Introduction

Reverse genetics is a well-established method for analyzing gene function in plants. The reverse genetic process begins with the scientist isolating plants that carry a mutation within a gene of interest. These mutant individuals are then analyzed to determine whether any abnormal phenotypes can be attributed to the mutations. TILLING (Targeting Induced Local Lesions IN Genomes) is a commonly used reverse genetic strategy that was originally developed using Arabidopsis thaliana, and it constitutes a method for screening an ordered population of randomly mutagenized plant lines for the presence of mutations in any gene of interest [1].
The most widely used mutagen for TILLING experiments is the chemical ethyl methanesulfonate (EMS), which produces mainly single-base change mutations that convert G/C base pairs to A/T [2]. The first step in a traditional TILLING project is to produce an ordered population of several thousand independent mutagenized lines. These mutant lines are maintained individually or in small pools composed of a few lines each. DNA samples are then prepared from the ordered population, again as individuals or small pools. In order to find mutations within this population, PCR amplification is performed using primers that amplify sequences from a gene of interest. These PCR reactions are performed using DNA samples from small pools of mutant lines, typically four to eight mutant lines per pool. Mutation detection is accomplished using any of a variety of methods that allow one to detect the presence of heteroduplexes within a population of gene-specific PCR products [1, 3–5]. For example, if all of the plants present in a given pool of mutagenized lines carry the wild-type allele of the gene targeted by PCR, all of the PCR products will be homoduplexes. However, if one of the plants in the pool carries a mutation in the target gene, then some of the PCR products produced in that pool will form heteroduplexes when sequences amplified from the mutant gene anneal with wild-type copies of the same amplicon produced from the other lines present in that pool. The traditional method for heteroduplex detection with TILLING has been the use of an endonuclease treatment to cleave heteroduplexes, followed by gel electrophoresis to visualize cleavage products and identify associated plants or pools carrying a mutation [3]. More recently, high-resolution melting analysis of PCR amplicons has been shown to be an effective strategy for identifying heteroduplexes in the context of TILLING screens [4, 6]. 
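The dilution effect of pooling described above can be made concrete with simple arithmetic. The Python sketch below is illustrative only and rests on assumptions that are not part of the protocol: each diploid line contributes equal template to the pool, wild-type and mutant alleles amplify with equal efficiency, and strands reanneal at random after the final denaturation. Under these assumptions, a pool of N lines containing one heterozygous mutant has a mutant strand fraction p = 1/(2N), and the expected fraction of mismatched duplexes is 2p(1 − p). The function names are hypothetical.

```python
"""Expected heteroduplex fraction in a pooled PCR (illustrative model).

Assumptions (not from the protocol): equal template per line, equal
amplification of both alleles, and random strand reannealing.
"""

from fractions import Fraction


def mutant_strand_fraction(pool_size: int, mutant_alleles: int = 1) -> Fraction:
    """Fraction of amplicon strands carrying the mutation in a pool of diploid lines."""
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    total_alleles = 2 * pool_size  # diploid: two alleles per line
    if not 0 <= mutant_alleles <= total_alleles:
        raise ValueError("mutant_alleles out of range")
    return Fraction(mutant_alleles, total_alleles)


def heteroduplex_fraction(p: Fraction) -> Fraction:
    """With random pairing, a duplex is mismatched when one strand is mutant
    and the other is wild type, which happens with probability 2p(1 - p)."""
    return 2 * p * (1 - p)


if __name__ == "__main__":
    # One heterozygous plant in a pool of 1, 4, or 8 lines.
    for n in (1, 4, 8):
        p = mutant_strand_fraction(n)
        print(f"pool of {n}: heteroduplex fraction = {float(heteroduplex_fraction(p)):.3f}")
    # A homozygous mutant analyzed alone (p = 1) gives no mismatched duplexes.
    print(heteroduplex_fraction(mutant_strand_fraction(1, mutant_alleles=2)))
```

Under this model the script prints heteroduplex fractions of 0.500, 0.219, and 0.117 for pools of one, four, and eight lines, which illustrates why pools are kept to a few lines: every added wild-type line dilutes the heteroduplex signal. Screening a single heterozygous M2 seedling corresponds to the pool-of-one case.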
Because establishing a traditional TILLING project involves a substantial investment of time and resources, it is not a practical solution for the typical lab that wishes to perform its own reverse genetic screen using a genetic background of its choice. By contrast, the iTILLING procedure described in this chapter has been specifically developed to meet the needs of the individual laboratory that wishes to screen for mutations in a species or genetic background for which a traditional TILLING population is not available. iTILLING accomplishes this goal by removing the need to invest large amounts of time and money in creating a durable, ordered mutant population. Because it is based on an ephemeral mutant population, iTILLING allows users to quickly screen for mutations within a handful of genes (Fig. 1). The first step in the iTILLING process is to treat seeds with the chemical mutagen EMS, followed by the production of the ephemeral screening population. From this population, genomic DNA samples are collected from individuals grown in a 96-well format. Finally, using high-resolution melting, the population of plants is screened for EMS-induced mutations within genes of interest (Fig. 2). Recent advances in DNA sequencing technology have raised the possibility that, instead of high-resolution melting, direct sequencing of PCR products could be used to screen for mutations in an iTILLING ephemeral population. The following protocol describes the iTILLING process, including the high-throughput seedling growth and tissue sample collection process called Ice-Cap, as it can be used to detect mutations in genes of interest in Arabidopsis and other plants using PCR-based screening and high-resolution melt-curve analysis.

Fig. 1 Work flow of iTILLING. Seeds are treated with EMS to produce the M1 population, from which M2 seeds are collected in bulk. M2 seedlings are grown on 96-well Ice-Cap plates, and tissue samples are collected 96 at a time. PCR and melt-curve analysis are done to identify heteroduplex products indicative of a mutation. Plants carrying a desired mutation are transplanted from the 96-well plate to soil. The time required to go from initial mutagenesis to the identification of mutations of interest is less than 4 months. Figure from ref. [7]

Fig. 2 iTILLING mutation detection using high-resolution melt-curve analysis. Characteristic melt curves of PCR products amplified from wild-type (wt) Arabidopsis DNA and from DNA containing a heterozygous SNP are shown. The thick line represents the melt curve of the heteroduplex product. The thin line represents the melt peak of the wild-type homoduplex. −dRFU/dT, negative change in relative fluorescence units over the change in temperature. Figure adapted from ref. [7]

2 Materials

2.1 EMS Mutagenesis and Collection of M2-Mutagenized Seed

1. Arabidopsis seeds with a genetic background of choice.
2. 0.2 % (v/v) ethyl methanesulfonate (EMS) (see Note 1).
3. 1 L flask.
4. NaOH for cleanup.
5. Squirt bottle.
6. 0.01 % agar.
7. Flats of moistened soil.

2.2 Seedling Growth Using Ice-Cap

1. M2 seeds.
2. 95 % ethanol.
3. Whatman filter paper.
4. Growth media: 0.5× Murashige and Skoog (MS) basal salt mixture, 2 mM morpholinoethanesulfonic acid (MES), 0.6 % (w/v) agar, pH 5.7. Autoclave to sterilize.
5. Seedling plates: 96-well deep well plates, e.g., Fisher Scientific Nunc brand, 1-mL filter plates without frit (Fisher catalog no. 278012).
6. Adhesive sealing film.
7. Multichannel pipette, 20–200 μL volume.
8. Plastic reagent troughs for use with multichannel pipettors.
9. Clear plastic lid for each seedling plate.
10. Micropore tape.
11. Root plates: 96-well PCR plates with raised well rims.
12. Stainless steel ball bearings (3/32″ diameter).
13. Elastic bands to hold seedling plates and root plates together.
14. Shallow metal baking pan, e.g., jelly roll pan or cookie sheet (ca. 17″ × 12″).
15. Small submersible water pump.
16. Plastic tubing of appropriate size for the pump.
17. Plastic storage bin, ca. 26 L, longer than the metal pan and deeper than the pump.
18. Metal rack with adjustable screws for leveling (Fig. 3).
19. Plastic clamps to secure the tubing to the metal baking pan.

2.3 DNA Collection Using Ice-Cap

1. Wooden skewers.
2. Dry ice.
3. 95 % (v/v) ethanol.
4. Freezing-tolerant glass dish, e.g., Pyrex baking dish.
5. 96-well metal thermal block for freezing root tissue.
6. Tris-EDTA solution (500 mM Tris, pH 8; 50 mM EDTA, pH 8).
7. Thermal adhesion foil to seal root plates for DNA extraction.
8. Heat sealing machine.
9. GenoGrinder or other agitator equipped for 96-well plates.
10. Centrifuge equipped for 96-well plates.

2.4 Mutation Screening Using DNA Amplification and High-Resolution Melt-Curve Analysis

1. DNA collected from seedlings grown in Ice-Cap.
2. Gene-specific PCR primers, each at a concentration of 10 μM.
3. Deoxynucleotide triphosphates (dNTPs) at a total concentration of 100 μM (25 μM for each dNTP).
4. Taq polymerase, stable at room temperature.
5. 10× Taq buffer: 750 mM Tris pH 9, 200 mM (NH4)2SO4, 30 mM MgCl2, 0.1 % (v/v) Tween 20.
6. SYTO13 double-stranded DNA-binding dye (Invitrogen).

Fig. 3 The Ice-Cap fountain is used to maintain a constant water level at the precise height of the tops of the wells of the root plates. This continuous watering system ensures that the water in the wells of the root plates does not become depleted due to evaporation or transpiration. (a) A homemade rack that supports the cookie sheet on which the stacked Ice-Cap plates sit. (b) A close-up view of one of the 1″ nuts that provides a means for precisely adjusting the level of the cookie sheet so that a uniform water depth is achieved across the surface of the fountain.
(c) A display of all the parts needed to construct the homemade rack for an Ice-Cap fountain. (d) The assembled Ice-Cap fountain. A submersible fountain pump constantly moves water from the lower reservoir to the cookie sheet, which rests on top of the homemade rack. A spring-loaded clamp is used to attach the hose to the edge of the cookie sheet. Figure adapted from ref. [10]

7. Instrument capable of performing high-resolution DNA melting analysis, such as the Bio-Rad CFX96 thermal cycler, equipped with a camera to visualize changes in DNA-associated fluorescence with increasing temperature.
8. Rubber tubing for attachment to a pressurized air line.
9. Soil and pots for transplanting seedlings of interest.

3 Methods

3.1 EMS Mutagenesis and Collection of M2-Mutagenized Seed

1. Imbibe and stratify seeds of the genotype of interest in dH2O at 4 °C for 2 days (see Note 2).
2. Treat M1 seeds with 0.2 % (w/v) EMS for 16 h at room temperature in a 1 L flask shaking at 100 rpm under low light (see Note 1).
3. Rinse the seeds eight to ten times in water by allowing the seeds to settle and pouring off the water. For the final rinse, allow the seeds to soak in the water for 1 h (see Notes 3 and 4).
4. Suspend the M1 seeds in 0.01 % agar in a squirt bottle. Using the squirt bottle, plant the seeds evenly across the moistened soil (see Note 4).
5. Allow the M1 plants to grow to maturity. Collect the M2 seeds from all plants in bulk (see Note 5).

3.2 Seedling Growth Using Ice-Cap

1. Autoclave the 96-well seedling plates to sterilize them. Allow the plates to dry in a laminar flow hood (see Note 6).
2. Seal the base of the seedling plates using adhesive sealing film (see Note 6).
3. Add 450 μL of growth media, still molten after autoclaving, to each well of the seedling plates. Use plastic reagent troughs and a multichannel pipettor to aliquot the media into the seedling plates while the media is still liquid (see Notes 7 and 8).
4. Allow the agar in the seedling plates to solidify in the flow hood. If seeds are not going to be added to the plates immediately, cover each plate with a clear plastic lid, seal with micropore tape, and store at 4 °C.
5. Sprinkle dry M2 seeds onto dry Whatman filter paper. Dispense 95 % ethanol onto the seeds to surface-sterilize them. Allow the seeds on the filter paper to air-dry.
6. Plate the M2 seeds one per well onto the solidified agar surface of the seedling plates. Make sure to label the plates (see Notes 9 and 10).
7. Cover each seedling plate with a clear plastic lid and seal using micropore tape; wrap the plates in foil and store them in the dark at 4 °C for 3 days to stratify the seeds.
8. After 3 days, remove the foil from the seedling plates and place them under fluorescent lights, with the clear plastic lids still in place, for 4–7 days at 18–20 °C to germinate and grow. Remove the clear lids after several days, especially if condensation occurs (see Note 11).
9. After 4–7 days in the light, the seedlings will be ready to be transferred to the Ice-Cap fountain. To begin this process, prepare one root plate for each seedling plate by placing a 3/32″ stainless steel bead in each well of the root plate, and then fill the root plate with dH2O until water spills over the wells (approximately 340 μL per well). Make sure to label each root plate (see Note 12).
10. Assemble each seedling plate with its corresponding root plate by first removing the sealing film from the base of the seedling plate and then inserting the base of each well of the seedling plate into the corresponding well of the root plate. Secure the upper and lower plates together using two or three elastic bands.
11. Assemble the Ice-Cap fountain. Place the metal rack into the storage bin. Place the metal pan atop the rack, and adjust the screws to level the pan.
Fill the storage bin two-thirds of the way with a mixture of 3 parts distilled water to 1 part tap water and place the submersible pump, attached to the plastic tubing, in the water. Affix the tubing to the metal pan using the clamp, allowing the water to fill the pan and overflow into the bin. Adjust the leveling screws once again if necessary to achieve a uniform water level across the pan. Components and assembly of the Ice-Cap fountain are described in Fig. 3.
12. Place the seedling/root plate assemblies in the Ice-Cap fountain. Allow the plants to grow in the Ice-Cap fountain until the seedling roots have penetrated the agar and grown down to the bottoms of the root plates. The ideal temperature for growing Arabidopsis seedlings in an Ice-Cap fountain is 18 °C. This stage of the process may take from 10 days to 3 weeks depending on the specific growth conditions and genotype (see Notes 13 and 14).
13. When the seedling roots have reached the bottom of the root plate in the majority of the wells, remove the seedling/root plate assemblies from the fountain. Insert 2–3 wooden skewers between each seedling plate and its corresponding root plate to slightly separate the assembled plates. The elastic bands should remain on the assembled plates at this stage. Allow the seedling/root plate assemblies containing the wooden skewers to stand under light for one day outside of the Ice-Cap fountain to allow the water level to drop in the wells of the root plates (see Note 15).

3.3 DNA Collection Using Ice-Cap

1. On the day of root tissue collection, prepare a freezing bath in a Pyrex dish using 95 % ethanol and dry ice. Place a 96-well thermal block in this freezing bath and allow it to equilibrate for 20–30 min (see Note 16).
2. Place the seedling plate/root plate assembly, still held together by elastic bands and still containing the wooden skewers, into the frozen thermal block. Freeze the root plate for 5 min.
After freezing, remove the assembled plates from the thermal block and place them on the lab bench at room temperature. Remove the elastic bands and the wooden skewers from the stacked plates. Firmly press down on the top of the seedling plate to “crack” the plates, and then carefully peel the root plate and the seedling plate apart.
3. Seal the base of the seedling plates with film. Wrap the seedling plates in foil and transfer them to 4 °C in the dark for storage (see Note 17).
4. Allow the water in the root plates to thaw completely at room temperature. Inspect the plates to determine whether any wells have substantially less water than average. Pipette distilled water by hand into wells that require additional water (see Note 18).
5. Add 25 μL of Tris-EDTA solution (500 mM Tris, pH 8; 50 mM EDTA, pH 8) to each well of the root plate.
6. Seal the root plates with thermal adhesion foil using a heat sealing machine.
7. Agitate the sealed root plates in a GenoGrinder for 4 min at 1,350 strokes per minute to pulverize the root tissue with the steel ball present in each well. Next, centrifuge the root plates for 10 min at 2,100 × g at 4 °C to pellet the cellular debris.
8. The supernatant from the root plate contains genomic DNA. Dilute this supernatant in dH2O at a ratio of 1:5. Use 2 μL of this diluted extract as the template in a 20 μL PCR reaction (see Note 19).

3.4 Mutation Screening Using DNA Amplification and High-Resolution Melt-Curve Analysis

1. In advance, design PCR primers specific for your gene or genes of interest. In most cases, these PCR amplicons should target regions of the gene that encode highly conserved domains of the protein or regions that maximize the probability of identifying nonsense mutations (see Note 20).
2. Use the DNA collected using Ice-Cap as the template for PCR reactions that amplify targeted regions of your gene of interest.
The double-stranded DNA-binding dye SYTO13 should be included in the PCR reaction mix. In a 20 μL reaction, use 2 μL DNA, 0.2 μM of each PCR primer, 2.5 μM SYTO13, 0.2 mM each dNTP, 2 μL 10× PCR buffer, and Taq polymerase. This PCR amplification step can be done on any thermal cycler; a fluorescence detection camera is not required (see Notes 21–23).
3. After amplification, transfer the PCR plate to an instrument that can perform high-resolution melting analysis. Melt the PCR products using a protocol such as 96 °C for 30 s, 40 °C for 15 s, then a ramp from 72 °C to 83 °C at 0.1 °C per s, capturing fluorescence images at each temperature. The SYBR/FAM emission/detection channel (450–530 nm) can be used to detect the fluorescence of SYTO13 bound to double-stranded PCR amplicons (see Notes 23–26).
4. To identify the presence of a mutation in a given PCR amplicon, perform melt-curve analysis. A heterozygous SNP in the template DNA results in a substantial change in the shape of the melt curve compared with the wild-type control. Specifically, the d(RFU)/dT melt peak displays a distinctive shoulder on the low-temperature side of the curve as the result of heteroduplex products present in the mixture (Fig. 2). A heteroduplex product, composed of one wild-type and one mutant DNA strand, begins to melt at a slightly lower temperature than the corresponding homoduplex because of the single-base mismatch it contains (see Notes 25, 27, and 28).
5. In a typical experiment, one will process dozens of Ice-Cap plates, thereby extracting DNA from thousands of individual seedlings from the M2 population. These individuals will usually be screened for mutations using a number of different PCR primer pairs.
Once an individual seedling has been identified as potentially carrying a mutation of interest, it should be retrieved from the seedling plate that has been stored at 4 °C. Remove the seedling from the well by applying a low-velocity stream of air from a pressurized air source, using rubber tubing to direct the air stream to the opening in the bottom of the seedling plate. The pressurized air will cause the agar plug to pop out of the well, with the seedling included (see Notes 29 and 30).
6. Transplant the seedling of interest to soil, retaining the agar plug surrounding the root tissue to increase seedling viability.
7. Once the transplanted seedling has adapted to growth in soil, collect a leaf sample and prepare a traditional DNA extraction from the plant of interest.
8. Confirm any mutations by repeated PCR amplification and melt-curve analysis using the freshly isolated DNA template, and then by Sanger sequencing to determine the precise mutation (see Note 29).

4 Notes

1. EMS is a mutagen, not only for plants but also for humans. Do all EMS work in a fume hood, and wear a lab coat and gloves at all times. The mutagenicity of EMS may vary with the age of the solution and the quality of the seeds to which it is applied. Using 0.2 % (w/v) EMS, one may expect about 50 % mortality of seeds planted. A small-scale test mutagenesis may, however, be useful in determining the actual mortality rate with the EMS solution and seeds intended for experimental use. For the typical iTILLING experiment, one should plan to produce an M1 population of 10,000–20,000 individuals in order to have a large sample of mutations from which to screen. The total number of seeds to mutagenize will therefore depend on the size of the final mutant population that is desired and the mortality rate achieved by the specific EMS treatment used.
2. The genetic background chosen for this mutagenesis will depend on the specific experimental goals of the scientist performing the experiment. For example, one may wish to isolate mutations in closely linked members of a tandemly duplicated gene family. In this case, one could choose as the starting material a plant that is homozygous for a T-DNA insertion within one member of this tandem gene family [7]. By mutagenizing seed from a plant that is homozygous for the T-DNA insertion allele, one would be able to screen for EMS-induced point mutations in the linked gene family members.
3. Inactivate EMS waste using NaOH. EMS-contaminated rinse water should be brought to a concentration of 2 N NaOH, and glassware can be soaked in 2 N NaOH for 2 days as well. Solid EMS waste, such as gloves and tips, should be kept separate from other chemical waste [8].
4. After the final seed rinse, it may be useful to aliquot the treated, rinsed seeds into 1.5 mL tubes. When planting using a squirt bottle, dividing the seeds in advance will ensure even planting over a large soil area. We have found that M1 plants can be grown to maturity in soil at a density of up to 1.3 plants per square centimeter. Planting density should take into account the expected seedling mortality caused by the EMS treatment.
5. In this protocol, the entire population of M2 seeds is collected in bulk. This is in contrast to traditional TILLING, where M2 seeds are collected separately for each M1 individual. Because iTILLING is designed to screen each seedling individually at a given set of genetic loci, no cataloging or storage of seed from individual lines is required.
6. After autoclaving the seedling plates, be sure to dry the plates thoroughly in the flow hood before applying the sealing film to prevent poor adhesion and consequent leakage of agar.
In place of sealing film, clear plastic packing tape is a more economical option for sealing the bottoms of the 96-well seedling plates. To affix the sealing film or tape firmly and evenly to the seedling plate, a handheld microseal plate roller can be used.
7. One liter of media can be used to fill approximately 18 Ice-Cap seedling plates.
8. If adding the molten growth media by hand using a multichannel pipettor, it is wise to add one-third of the volume of media to all wells first, allowing it to solidify in the wells, before adding the remaining volume to each well. This prevents molten media from leaking through the bottom of the plates near the sealing film, or at least reduces the likelihood of such leakage. The media can also be added to the sealed plates using an automated microplate liquid dispenser.

Fig. 4 The steel bead dispenser used in assembling Ice-Cap plates. Above, a photograph of the homemade 96-well steel bead dispenser without metal balls, made using several sheets of aluminum foil wrapped around the lid of a pipette tip box. Below, the same device is shown with 96 steel beads of 3/32″ diameter loaded on top of it. Figure adapted from ref. [11]

9. In addition to Arabidopsis, both tomato and rice seedlings have been grown successfully using Ice-Cap [9, 10]. The Ice-Cap strategy should also be useful for growing and collecting tissue from additional plant species, as long as their seeds are small enough to fit in the wells of a 96-well plate.
10. To plate seeds into the seedling block, use a 200 μL pipette tip or a Pasteur pipette that has been heated to melt and seal the opening at the tip. Moisten the tip of this modified pipette on the agar surface, use it to pick up a single seed from the filter paper, and then place the seed gently onto the agar. Alternatively, seeds can be dropped into the wells of the seedling block one at a time by carefully tapping seeds from a piece of creased paper.
Working with batches of 6–10 seeds on the sheet of paper is most effective.
11. M2 mutagenized seedlings may have a higher mortality rate than wild-type seedlings. To maximize the number of seedlings screened per plate, germinate additional seedlings on agar plates and transplant them into those wells of the Ice-Cap block whose seeds did not germinate. Transfer the seedling plates to the Ice-Cap fountain before the roots reach the bottoms of the wells and contact the sealing film.
12. The stainless steel balls can be added efficiently to the root plates with a custom-made ball-dispensing device (Fig. 4) [11]. To make this device, place a single sheet of aluminum foil over the surface of a 96-well PCR plate and use a marker to mark the center of each well on the foil. Place this marked sheet on top of 6 additional sheets of aluminum foil and wrap the stack around the smooth surface of the lid of a used pipette tip box. Use a sharp tool, such as a wooden skewer, to make a divot large enough to hold a single steel bead at each of the 96 positions. To load the device, place it in a Pyrex dish, pour an excess of steel beads over it, and shake it horizontally to remove surplus beads. Then place the root plate over the top of the dispensing device and flip the assembly over to drop one bead into each well of the root plate.
13. Keep the water in the fountain at a level sufficient to keep the pump submerged throughout the period of plant growth. To do this, add a mixture of 3 parts distilled water to 1 part tap water to the fountain every few days to replace water lost to evaporation.
14. Growth of the seedling roots to the bottom of the root plate may take anywhere from 10 days to 3 weeks.
This growth rate is based on seedlings grown in continuous light at 18–20 °C. On average, wild-type seedlings grow more quickly than a mutagenized population of seedlings.
15. Seedling roots can be collected on the day the plates are removed from the fountain. However, a large volume of water in the root plate can make it harder to separate the upper and lower plates after the root plate has been frozen for tissue capture.
16. To prepare the freezing bath, place the 96-well metal thermal block(s) in the glass dish. Cover each block with a clear plastic lid to keep ethanol and dry ice out of the wells. Pour about ½ in. of 95 % ethanol into the glass dish first, and then add the dry ice. Add more ethanol or dry ice as necessary. After the thermal blocks have equilibrated, remove the clear lids before freezing the root plates. Place an autoclave glove or other insulating material under the freezing bath to protect the bench top.
17. 96-well seedling plates containing Arabidopsis seedlings can be wrapped in foil and stored in the dark at 4 °C for at least 1 month without loss of seedling viability. This gives the researcher time to screen for mutations at several different loci while the ephemeral mutant population remains effectively dormant in the refrigerator.
18. Thawing of the liquid in the root plate can be expedited by incubating the root plates in a heat block set at 25 °C.
19. Different dilution ratios can be tested empirically to determine whether an alternative ratio gives better PCR performance. This protocol produces a crude extract of soluble cellular components as well as genomic DNA, so higher dilutions may improve PCR results in some situations. When troubleshooting the procedure, test both higher and lower dilutions.
20.
EMS induces primarily G/C → A/T transitions [2]. Only four codons can be converted to stop codons in this way: CAA (Gln), CAG (Gln), CGA (Arg), and TGG (Trp). To maximize the chance of finding nonsense mutations with iTILLING, choose PCR amplicons that target regions of the gene enriched for these four codons.
21. A liquid-handling robot or multichannel pipettes can be used to streamline the liquid-handling steps needed to process the PCR reactions.
22. Use a hot-start version of Taq DNA polymerase so that PCR reactions can be set up at room temperature. One example is a previously described mutant form of the enzyme that has reduced activity at room temperature [12].
23. We found that a saturating dye works better than a nonsaturating dye for high-resolution melt-curve analysis. SYBR Green (a nonsaturating dye) and EvaGreen did not perform well in our hands when screening for the presence of heteroduplexes. SYTO13 (Invitrogen), a saturating DNA-binding dye typically used for cell staining in flow cytometry, works well for heteroduplex detection in PCR amplicons.
24. For high-resolution melt-curve generation, the initial melting and reannealing steps are critical. They dissociate the PCR homoduplexes and allow heteroduplexes to form wherever a single-base mismatch is present in the amplicon. The optimal range of melting temperatures varies with amplicon length and sequence composition, so determine it empirically for each amplicon.
25. We have used the Bio-Rad CFX96 PCR Detection System to visualize heteroduplexes in PCR products of 100–120 bp. Single-base mismatches can be detected in much longer amplicons with a higher-resolution melting system, such as the LightScanner System from Idaho Technology [4, 13].
26.
As an alternative to high-resolution melting, PCR products could be sequenced directly with next-generation sequencing technologies to identify mutations in amplicons of interest. Methods have been developed that add DNA barcodes to samples during processing for DNA sequencing. These methods could be used to multiplex PCR products from many individual lines before sequencing [14, 15]. A recent implementation of this strategy identified SNPs in a pool of 768 individuals using a multidimensional pooling strategy and the Illumina sequencing platform [5]. DNA samples prepared for iTILLING are already in 96-well format, so it would be straightforward to design pooling strategies that optimize mutation detection and minimize cost for the particular sequencing platform available.

The advantage of screening by DNA sequencing is that the precise mutation present in a given line is identified directly. In a sequencing-based iTILLING screen, the sequence data alone would not need to narrow the mutation down to a single plant, so individual seedlings would not need separate barcodes. For example, one could design a pooling strategy in which the sequencing data reveal only the 96-well plate that contains the mutation of interest. Targeted PCR and melt-curve analysis could then quickly identify the individual seedling carrying that mutation. Because the exact sequence of the mutation would be known and only one 96-well plate would need to be screened, this follow-up step would be cheap and efficient.
27. Traditional TILLING has intrinsically high throughput because of the pooling strategies it uses.
Because iTILLING is fast, mutations can be identified without pooling DNA samples, although pooling could still be applied. Multiple plants could be grown and sampled together on the 96-well plates, as in the two-per-well growth strategy discussed in Note 28 [7]. Alternatively, tissue could be harvested and DNA prepared from each plant individually, and the individual extracts then combined into pools, as in traditional TILLING. The main factor limiting how far DNA extracts can be pooled is the sensitivity of the high-resolution mutation detection platform. Greater resolving power allows single-base changes to be detected in more highly pooled samples and in longer amplicons [4, 13, 16].
28. The iTILLING protocol described here grows one seedling per well. In an M2 population, a nonlethal induced mutation is expected to segregate in the standard Mendelian 1:2:1 ratio, so a given induced mutation should be present in both homozygous and heterozygous form in the screening population. With DNA extracted from seedlings grown one per well, homozygous mutations will not be detected, because no heteroduplexes form in the corresponding PCR reactions. We have found that both homozygous and heterozygous mutations can be identified in DNA samples from seedlings grown two per well in Ice-Cap. With our Bio-Rad CFX96 high-resolution melt system, the rate of mutation detection in seedlings grown two per well was similar to the rate in seedlings grown one per well [4, 13].
29. When iTILLING identifies a plant carrying a mutation in a gene of interest, that plant can be transplanted to soil and M3 seeds can be collected directly from the M2 parent.
This contrasts with traditional TILLING, in which a mutation identified in a pooled sample requires further screening to find the individual that carries it [17].
30. When extracting a seedling of interest from the seedling plate, take care not to destroy it, in the well or on the benchtop, by applying air at excessively high pressure. Only seedlings carrying mutations of interest are transferred to soil, so very little growth chamber space is needed to produce and screen the entire mutant population. Most seedlings never leave the 96-well Ice-Cap plates and are discarded at the end of the experiment.
References
1. McCallum CM et al (2000) Targeted screening for induced mutations. Nat Biotechnol 18:455–457
2. Greene EA et al (2003) Spectrum of chemically induced mutations from a large-scale reverse-genetic screen in Arabidopsis. Genetics 164:731–740
3. Colbert T et al (2001) High-throughput screening for induced point mutations. Plant Physiol 126:480–484
4. Gady ALF et al (2009) Implementation of two high through-put techniques in a novel application: detecting point mutations in large EMS mutated plant populations. Plant Methods 5:13
5. Tsai H et al (2011) Discovery of rare mutations in populations: TILLING by sequencing. Plant Physiol 156:1257–1268
6. Botticella E et al (2011) High resolution melting analysis for the detection of EMS induced mutations in wheat SbeIIa genes. BMC Plant Biol 11:156
7. Bush SM, Krysan PJ (2010) iTILLING: a personalized approach to the identification of mutations in specialized genetic backgrounds. Plant Physiol 154:25–35
8. Weigel D, Glazebrook J (2006) Protocol: EMS mutagenesis of Arabidopsis seed. Cold Spring Harb Protoc. doi:10.1101/pdb.prot4621
9. Krysan PJ (2004) Ice-cap: a high-throughput method for capturing plant tissue samples for genotype analysis. Plant Physiol 135:1162–1169
10. Su S et al (2011) Ice-Cap: a method for growing Arabidopsis and tomato plants in 96-well plates for high-throughput genotyping. J Vis Exp 57:e3280. doi:10.3791/3280
11. Clark KA, Krysan PJ (2007) Protocol: an improved high-throughput method for generating tissue samples in 96-well format for plant genotyping (Ice-Cap 2.0). Plant Methods 3:8
12. Kermekchiev MB, Tzekov A, Barnes WM (2003) Cold-sensitive mutants of Taq DNA polymerase provide a hot start for PCR. Nucleic Acids Res 31:6139–6147
13. Montgomery J et al (2007) Simultaneous mutation scanning and genotyping by high-resolution DNA melting analysis. Nat Protoc 2:59–66
14. Meyer M et al (2008) From micrograms to picograms: quantitative PCR reduces the material demands of high-throughput sequencing. Nucleic Acids Res 36:e5
15. Parameswaran P et al (2007) A pyrosequencing-tailored nucleotide barcode design unveils opportunities for large-scale sample multiplexing. Nucleic Acids Res 35:e130
16. Reed GH, Wittwer CT (2004) Sensitivity and specificity of single-nucleotide polymorphism scanning by high-resolution melting analysis. Clin Chem 50:1748–1754
17. Comai L, Henikoff S (2006) TILLING: practical single-nucleotide mutation discovery. Plant J 45:684–694
Chapter 10
Tailor-Made Mutations in Arabidopsis Using Zinc Finger Nucleases
Yiping Qi, Colby G. Starker, Feng Zhang, Nicholas J. Baltes, and Daniel F. Voytas
Abstract
Zinc finger nucleases (ZFNs) are proteins engineered to make site-specific double-strand breaks (DSBs) in a DNA sequence of interest. Imprecise repair of the ZFN-induced DSBs by the nonhomologous end-joining (NHEJ) pathway results in a spectrum of mutations, such as nucleotide substitutions, insertions, and deletions. Here we describe a method for targeted mutagenesis in Arabidopsis with ZFNs, which are engineered by context-dependent assembly (CoDA).
This ZFN-induced mutagenesis method is an alternative to other currently available gene knockout or knockdown technologies and is useful for reverse genetic studies.
Key words Arabidopsis, ZFN, NHEJ, CoDA, Mutagenesis
Jose J. Sanchez-Serrano and Julio Salinas (eds.), Arabidopsis Protocols, Methods in Molecular Biology, vol. 1062, DOI 10.1007/978-1-62703-580-4_10, © Springer Science+Business Media New York 2014
1 Introduction
Over the past few decades, forward genetic approaches such as map-based cloning have been used to isolate numerous Arabidopsis genes. The Arabidopsis mutants cloned by these approaches were generated with ethyl methanesulfonate (EMS), which introduces point mutations, or with fast neutrons, which often create large deletions [1]. More recently, the analysis of gene function has shifted toward reverse genetic approaches. These use RNAi to knock down gene expression or take advantage of publicly available T-DNA insertion lines to analyze mutant phenotypes [2–4]. Despite the rich collection of T-DNA insertions across the Arabidopsis genome, there is still a need for alternative technologies that can generate mutations in genes for which no mutants are currently available. One such technology is TILLING (Targeting Induced Local Lesions IN Genomes), which introduces G/C to A/T transitions with the mutagen EMS [5, 6]. Although TILLING is clearly a powerful approach, it suffers from the narrow spectrum of mutations that can be recovered at a given locus in Arabidopsis.
[Fig. 1 schematic, panels a–d: ZFN-L and ZFN-R, each with zinc fingers F1–F3 fused to a FokI cleavage domain, bound to the target sequence ATCTTCGGCCATGAAGCTGGAGGG; panel d labeled “Mutagenesis by imprecise NHEJ”]
Fig. 1 The ZFN-induced mutagenesis system. (a) A pair of ZFNs is expressed in plant cells. (b) Driven by a nuclear localization signal (NLS), the ZFNs move to the nucleus and recognize the target DNA sequence. (c) FokI dimerization produces a double-strand break (DSB) in the “spacer” of the target site. (d) Error-prone NHEJ repair of the DSB leads to mutagenesis at the target site.
Zinc finger nucleases (ZFNs) are hybrid proteins. Each contains a zinc finger DNA-binding domain at the N-terminus and the nonspecific cleavage domain of the FokI restriction enzyme at the C-terminus [7, 8] (Fig. 1a). Zinc finger DNA-binding domains often consist of three to six zinc fingers, each of which recognizes a nucleotide triplet (Fig. 1b). Importantly, zinc finger domains can be engineered to recognize novel DNA sequences specifically. ZFNs work in pairs because the FokI nuclease domains function as dimers [9]. When a ZFN pair recognizes the target DNA, the two FokI nuclease domains come together and make a DSB in the sequence between the two zinc finger binding sites. This “spacer” is usually 5–7 bp long (Fig. 1c). For three-finger ZFNs, the left and right binding domains each recognize 9 bp and together define an 18-bp target site, which is often unique in a genome such as that of Arabidopsis. DSBs are cytotoxic lesions, so the breaks created by ZFNs must be repaired through either the nonhomologous end-joining (NHEJ) or the homologous recombination (HR) pathway. When NHEJ repairs the break, the resulting mutations are typically short deletions, but they can also be insertions or nucleotide substitutions (Fig. 1d). Thus, a variety of targeted mutations can be introduced into a locus through ZFN-induced DSBs. ZFNs are rapidly becoming powerful tools for making targeted mutations in many higher organisms, including plants such as tobacco, Arabidopsis, and maize [10–14]. One bottleneck for implementing ZFN technology is engineering site-specific zinc finger arrays (ZFAs).
Over the years, genome engineers in both industrial and academic labs have worked to overcome this bottleneck. Currently, ZFNs can be engineered with several platforms. ZFNs are commercially available from Sigma-Aldrich® under the brand name CompoZr®, which is based on a proprietary platform developed by Sangamo BioSciences [15]. ZFNs can also be engineered with publicly available platforms developed mainly by the Zinc Finger Consortium (http://www.zincfingers.org). The first platform the Consortium made available is modular assembly, which is simple but may have a relatively low success rate [16, 17]. The second platform is Oligomerized Pool ENgineering (OPEN). OPEN has a success rate close to 80 % and has yielded many ZFNs that target genes in diverse organisms such as Arabidopsis, tobacco, zebrafish, and human [10, 12, 18]. However, OPEN is technically demanding, and adopting it takes considerable time and effort for a molecular biology lab. More recently, a third platform, context-dependent assembly (CoDA), was developed and used to generate ZFNs that target multiple genes in Arabidopsis, soybean, and zebrafish [19, 20]. ZFNs designed with CoDA have a lower success rate than those designed with OPEN [18, 19]. However, CoDA is much easier to implement because it requires only standard molecular cloning techniques. Detailed protocols describing the CoDA method have been published previously [19, 21, 22], and these can be used to obtain the ZFNs of interest. In this chapter, we describe an adaptation of the CoDA protocol in which the zinc finger arrays are made by a simple, PCR-based approach. Importantly, we also describe methods for introducing ZFNs into plants, inducing their expression, and recovering mutations at the target locus.
2 Materials
2.1 Engineering ZFAs with the CoDA Method
1. Plasmid 28086 (encodes a zinc finger array targeting NRF2b [18], available from Addgene.org) (see Note 1). 2. Plasmids pCP3 and pCP4 [12]. 3.
pCR®8/GW/TOPO® TA Cloning Kit (Invitrogen, # K250020). 4. Cloned Pfu polymerase (Stratagene, # 600153). 5. Deoxynucleotide Solution Mix (NEB, # N0447). 6. Restriction enzyme DpnI (NEB, # R0176). 7. NEB Taq DNA Polymerase with Standard Taq Buffer (NEB, # M0273). 8. A thermocycler. 9. 37 °C incubator with shaking. 10. 42 °C water bath. 11. Chemically competent E. coli DH5α cells. 12. QIAquick® Gel Extraction kit (Qiagen, # 28706). 13. QIAprep® Miniprep kit (Qiagen, # 27106). 14. LB medium (1 % tryptone, 0.5 % yeast extract, 1 % sodium chloride; 1.5 % agar for solid medium). 15. S.O.C liquid medium (2 % tryptone, 0.5 % yeast extract, 10 mM sodium chloride, 2.5 mM potassium chloride, 10 mM magnesium chloride, 10 mM magnesium sulfate, 20 mM glucose). 16. Antibiotics: spectinomycin, kanamycin, carbenicillin, and gentamicin. 17. Primers (see Table 1).
2.2 Construction of ZFN Expression Constructs
1. Restriction enzymes XbaI (# R0145), BamHI (# R0136), NheI (# R0131), BglII (# R0144), and EcoRV (# R0195) (NEB). 2. Plasmid pFZ87 [12] (see Note 2). 3. Plasmid pMDC7 [23] (see Note 3). 4. T4 DNA ligase (NEB, # M0202). 5. Gateway® LR Clonase® II enzyme mix (Invitrogen, # 11791). 6. Primers (see Table 1).
2.3 Screen for Arabidopsis Mutants Induced by ZFNs
1. Competent cells of Agrobacterium tumefaciens strain GV3101/pMP90 for electroporation [24] (see Note 4). 2. Plant growth chamber. 3. Floral dip transformation solution: 5 % (w/v) sucrose, 10 mM MgCl2, 0.03 % (v/v) VAC-IN-STUFF (Silwet L-77) (LEHLE SEEDS, # VIS-02). 4. Glassine envelopes. 5. Bleach (Sun Brite®, 5.25 % sodium hypochlorite as active ingredient). 6. Falcon® 150 × 15 mm sterile disposable polystyrene petri dishes (Becton Dickinson Labware, # 1058). 7.
Transgenic plant selection medium: 0.8 % agar plates containing 0.5× Murashige and Skoog with vitamins (Caisson Labs, # MSP09), 25 μg/ml hygromycin B (Roche, # 10843555001), 50 μg/ml timentin (plantMedia, # 42010012), and 20 μM β-estradiol (Sigma, # E2758) (see Note 5). 8. Micropore surgical tape (3M, # 1535-1). 9. 2.0-ml sterile conical screw cap tubes (Fisher, # 02-707-355).
Table 1
Primers used in this protocol (N indicates any nucleotide)
Primer name | Primer sequence | Purpose
LF2-1 | 5′-CATACCCGTACTCATACCG-3′ | Amplify “long F2”
LF2-2 | 5′-ACTGAAGTTGCGCATGCATATTCG-3′ | Amplify “long F2”
F2RH+ | 5′-NNNNNNNNNNNNNNNNNNNNNCATCTACGTACGCACACCGGC-3′ | Amplify plasmid containing new long F2
F2RH- | 5′-NNNNNNNNNNNNNNNNNNNNNGGAGAAATTTCGCATACAGATCCG-3′ | Amplify plasmid containing new long F2
F1RH | 5′-TTGCATGCGGAACTTTTCGNNNNNNNNNNNNNNNNNNNNNCATACCCGTACTCATACCGG-3′ | Contains F1RH, amplifies most of the ZFA
F3RH | 5′-TCAGGTGGGTTTTTAGGTGNNNNNNNNNNNNNNNNNNNNNACTGAAGTTGCGCATGCATA-3′ | Contains F3RH, amplifies most of the ZFA
ZFA-Fusion-1 | 5′-AGTGGTTGGTCTAGACCCGGGGAGCGCCCCTTCCAGTGTCGCATTTGCATGCGGAACTTT-3′ | Amplifies complete ZFA; ends homologous to pCP3 and pCP4
ZFA-Fusion-2 | 5′-TTCAGATTTCACTAGCTGGGATCCCCTCAGGTGGGTTTTTAGGTG-3′ | Amplifies complete ZFA; ends homologous to pCP3 and pCP4
M13F | 5′-GTTTTCCCAGTCACGACGTTGTA-3′ | Colony PCR
pDW1789-TEF | 5′-GGTCTTCAATTTCTCAAGTTTC-3′ | Colony PCR
T2A-R | 5′-GATTCTCCTCCACGTCACCGCA-3′ | Colony PCR
T2A-F | 5′-TGCGGTGACGTGGAGGAGA-3′ | Colony PCR
ZFP-R | 5′-CTATTAAAAGTTTATCTCGCCGTT-3′ | Colony PCR
10. 0.05 % sterile agar medium. 11. 1/8 in. eclipse steel balls (Abbott Ball Co.). 12. Liquid nitrogen. 13. Paint shaker. 14. Plant DNA extraction buffer or CTAB buffer (2 % hexadecyltrimethylammonium bromide, 100 mM Tris, 20 mM EDTA, and 1.4 M sodium chloride). 15. Chloroform. 16. TOPO® TA Cloning® kit for subcloning with TOP10 E. coli (Invitrogen, # K4500). 17.
Primers to amplify the genomic DNA region that spans the ZFN target site. 18. Pots and soil.
3 Methods
3.1 Assembly of CoDA ZFAs Using a Long Oligo-Based Approach (See Note 6)
In CoDA, N-terminal fingers (F1 units) and C-terminal fingers (F3 units) of three-finger arrays have been identified that work well with common middle fingers (F2 units) [19]. A large archive of 319 F1 units and 344 F3 units has been engineered to work well with one of 18 fixed F2 units. Both the amino acid and the nucleotide sequences of these units are publicly available, so ZFAs can be made through several approaches, such as modular assembly [25] or direct DNA synthesis. Recently, an oligonucleotide-based overlapping PCR approach was described for rapid assembly of CoDA ZFAs [22]. In that approach, each three-finger ZFA is made by extension PCR using eight different oligonucleotides. Here we describe another PCR-based approach to assembling CoDA ZFAs. It uses an existing OPEN-derived ZFA as a template to clone a fragment encoding an F2 unit plus partial sequences of the F1 and F3 units; this fragment is called the “long F2” sequence (Fig. 2a). The cloned long F2 sequence is the starting material for assembling different ZFAs through three rounds of PCR. The first round of PCR creates the desired F2 unit through targeted mutagenesis (Fig. 2b). The second round adds recognition helices to the F1 and F3 units, producing a nearly complete three-finger array (Fig. 2c). The third round uses fixed primers and produces a full-length ZFA that is ready to be cloned into an expression vector through recombination in E. coli (Fig. 2d). Because CoDA provides only 18 different F2 units for three-finger ZFAs, it is practical to create an archive of 18 long F2 sequences.
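In practice, each helix-specific primer in Table 1 is a fixed framework sequence with 21 N positions that encode one 7-residue recognition helix. Comparing the fixed segments with the zinc finger framework they encode suggests an orientation for each primer. F1RH and F2RH+ appear to carry the helix on the coding strand. F2RH- and F3RH appear to carry its reverse complement, which would make the F2RH+ and F2RH- tails overlap by 21 bp. The Python sketch below fills in the N positions under that reading. The one-codon-per-amino-acid table and the example helices (RSDHLTT, QSGDLTR, RSDNLAR) are hypothetical placeholders. For real constructs, use the published CoDA unit sequences and ZiFiT-selected helices.

```python
# Hypothetical back-translation table: one codon per amino acid. This is an
# assumption for illustration, not the codon usage of the published CoDA units.
CODON = {
    "A": "GCC", "R": "CGT", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTC", "P": "CCG",
    "S": "TCC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}

# Fixed primer segments copied from Table 1 (5' and 3' of the 21-N run).
F1RH_5P, F1RH_3P = "TTGCATGCGGAACTTTTCG", "CATACCCGTACTCATACCGG"
F2RH_PLUS_3P = "CATCTACGTACGCACACCGGC"
F2RH_MINUS_3P = "GGAGAAATTTCGCATACAGATCCG"
F3RH_5P, F3RH_3P = "TCAGGTGGGTTTTTAGGTG", "ACTGAAGTTGCGCATGCATA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def helix_to_dna(helix: str) -> str:
    """Back-translate a 7-residue recognition helix into 21 nt."""
    helix = helix.upper()
    if len(helix) != 7 or any(aa not in CODON for aa in helix):
        raise ValueError("expected 7 standard amino acids, e.g. 'RSDHLTT'")
    return "".join(CODON[aa] for aa in helix)


def design_primers(f1: str, f2: str, f3: str) -> dict[str, str]:
    """Helix-specific primers for one three-finger ZFA (assumed orientations)."""
    h1, h2, h3 = helix_to_dna(f1), helix_to_dna(f2), helix_to_dna(f3)
    return {
        # Forward primer: F1 helix written on the coding strand.
        "F1RH": F1RH_5P + h1 + F1RH_3P,
        # Back-to-back primers for whole-plasmid PCR; their 5' tails are the
        # F2 helix and its reverse complement, so the product ends overlap.
        "F2RH+": h2 + F2RH_PLUS_3P,
        "F2RH-": revcomp(h2) + F2RH_MINUS_3P,
        # Reverse primer: F3 helix written as its reverse complement.
        "F3RH": F3RH_5P + revcomp(h3) + F3RH_3P,
    }


if __name__ == "__main__":
    # Placeholder helices for illustration only.
    for name, seq in design_primers("RSDHLTT", "QSGDLTR", "RSDNLAR").items():
        print(f"{name}\t5'-{seq}-3'\t({len(seq)} nt)")
```

F2RH+ and F2RH- are needed only once per F2 unit, when the long F2 archive is built. For each new ZFA, only F1RH and F3RH have to be ordered.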
Once these 18 long F2 sequences have been generated, many custom ZFAs can be made quickly and cost-effectively, because each new ZFA then requires only two rounds of PCR and only two new oligonucleotides. Before starting, use the ZiFiT Web server (http://zifit.partners.org) [26] to confirm that CoDA-enabled ZFN sites are present in the gene of interest. If no such sites are present, other platforms for obtaining ZFAs must be used; these are beyond the scope of this protocol.
3.1.1 Clone a Partial ZFA Sequence with the Desired F2 Finger
1. PCR amplify the long F2 sequence with primers LF2-1 and LF2-2 (see Table 1), using plasmid 28086 as the template. The resulting PCR product is 147 bp (Fig. 2a). 2. Run the product on a 2 % agarose gel and gel-purify it with the QIAquick® Gel Extraction kit; elute the DNA with 30 μl sterile water. 3. Clone the purified PCR product into the pCR8® vector according to the manufacturer’s instructions (see Note 7). Confirm the resulting “pCR8-long F2” construct by DNA sequencing (Fig. 2b).
[Fig. 2 schematic, panels a–d; labels include primers LF2-1/LF2-2, F2RH+/F2RH−, F1RH/F3RH, and ZFA-Fusion-1/ZFA-Fusion-2, plasmids “pCR8-long F2” and “pCR8-long F2-modified”, “First round PCR”, “Second round PCR”, and “pCP3 or pCP4”]
Fig. 2 Assembly of three-finger CoDA ZFAs using a long oligo-based approach. (a) A middle portion of the ZFA is first amplified from plasmid 28086 and (b) cloned into the pCR8 vector. Mutagenic PCR is then conducted with primers F2RH+ and F2RH−. (c) The mutagenized PCR fragment is introduced into a plasmid by recombination in E. coli, giving a clone with the desired F2. Final assembly of a full-length ZFA requires two rounds of PCR: first with F1RH and F3RH, and then with ZFA-Fusion-1 and ZFA-Fusion-2. This produces a DNA fragment encoding the entire ZFA, with ends (depicted as a filled grid) homologous to pCP3 and pCP4.
(d) The homology allows the ZFA to be inserted into both vectors through recombination in E. coli.
4. Design primers F2RH+ and F2RH- (see Table 1) with 5′ ends that encode the F2 recognition helix of choice. The N positions of F2RH+ carry the helix-coding sequence and those of F2RH- carry its reverse complement, so that together the two primer tails specify the new F2 recognition helix. Amplify the entire “pCR8-long F2” plasmid with these primers and cloned Pfu polymerase, using 1 ng of plasmid as template (see Table 2a for conditions) (Fig. 2b). 5. Digest 5 μl of the PCR reaction with 0.5 μl of restriction enzyme DpnI in a 25 μl reaction at 37 °C overnight (see Note 8). 6. Transform 50 μl of chemically competent E. coli DH5α cells with 5 μl of the digestion product, using a heat shock at 42 °C for 45 s. 7. Recover the transformed E. coli cells in 200 μl S.O.C liquid medium with agitation at 37 °C for 1 h.
Table 2
PCR conditions for assembly of ZFAs (regime c consists of 10 cycles with a 56 °C annealing step followed by 20 cycles with a 64 °C annealing step)
Cycling regime | Initial denaturation | Denature | Anneal | Extend | Cycles | Final extension
a | 1 min at 94 °C | 0.5 min at 94 °C | 0.5 min at 55 °C | 8 min at 72 °C | 30 | 10 min at 72 °C
b | 5 min at 94 °C | 0.5 min at 94 °C | 0.5 min at 50 °C | 1.5 min at 72 °C | 10 | 7 min at 72 °C
c | 5 min at 94 °C | 0.5 min at 94 °C | 0.5 min at 56 °C | 1.5 min at 72 °C | 10 | –
c (continued) | – | 0.5 min at 94 °C | 0.5 min at 64 °C | 1.5 min at 72 °C | 20 | 7 min at 72 °C
8. Spread 100 μl of the transformed cells onto LB plates containing 100 μg/ml spectinomycin and incubate at 37 °C overnight. 9. Miniprep 2 or 3 clones with the QIAprep® Miniprep kit and confirm the “pCR8-long F2-modified” construct by DNA sequencing (Fig. 2c).
3.1.2 Assembly and Cloning of the Entire 3-Finger ZFA
1. Amplify the nearly full-length ZFA in a 50 μl PCR reaction with primers F1RH and F3RH (see Table 1), using plasmid “pCR8-long F2-modified” as the template (Fig. 2c) (see Table 2b for conditions). 2. After the amplification, add 0.625 μl each of 10 μM ZFA-Fusion-1 and ZFA-Fusion-2 primers (see Table 1) and continue the PCR reaction (see Table 2c for conditions).
The resulting PCR product will have ends homologous to the pCP3 and pCP4 yeast expression vectors (see Note 9). 3. Digest 1 μg each of pCP3 and pCP4 with BamHI and XbaI in a 50 μl reaction at 37 °C for 4 h. 4. Run the digestion products on a 1 % agarose gel, gel-purify the linearized plasmids, and elute the DNA with 30 μl sterile water. 5. Co-transform 75 ng of the linearized pCP3 or pCP4 backbone together with 2 μl of the PCR product into 50 μl of chemically competent E. coli DH5α cells, using a heat shock at 42 °C for 45 s (see Note 10). 6. Recover the transformed E. coli cells in 200 μl S.O.C liquid medium with agitation at 37 °C for 1 h. Then spread 100 μl of the transformed cells onto LB plates containing 50 μg/ml carbenicillin and incubate at 37 °C overnight. 7. Screen for correct clones by colony PCR with primers M13F and pDW1789-TEF [10] (see Table 1). Correct clones give PCR products of ~1.5 kb. 8. Miniprep plasmids and confirm by DNA sequencing that the correct clones, pCP3-left-ZFA and pCP4-right-ZFA, have been obtained.
3.2 Construction of ZFN Expression Constructs
1. Digest 1 μg each of pCP3-left-ZFA, pCP4-right-ZFA, and pFZ87 with 1 μl each of restriction enzymes XbaI and BamHI in a 50 μl volume at 37 °C for 4 h. 2. Run the digestion products on a 1 % agarose gel. The ZFA fragments released from pCP3-left-ZFA and pCP4-right-ZFA are 270 bp; the digested pFZ87 vector is 4101 bp. 3. Gel-purify both digested ZFAs and the digested pFZ87 vector, eluting the DNA with 30 μl sterile water. 4. Set up a 10 μl ligation reaction to insert the left ZFA (~20 ng) into pFZ87 (~20 ng) using NEB quick ligase at room temperature for 1 h. 5. Transform 50 μl of chemically competent E. coli DH5α cells with 5 μl of the ligation reaction, using a heat shock at 42 °C for 45 s. 6. Add 200 μl S.O.C liquid medium to the transformed E. coli cells and incubate with agitation at 37 °C for 1 h.
Spread 150 μl of the recovered cells onto LB plates containing 50 μg/ml kanamycin and incubate at 37 °C overnight. 7. Identify correct clones by colony PCR with primers M13F and T2A-R (see Table 1). 8. Culture two PCR-confirmed clones carrying the left ZFA insertion overnight and miniprep the plasmids (named pFZ87_L) the following day. 9. Digest 1 μg of pFZ87_L with 1 μl each of restriction enzymes NheI and BglII in a 50 μl reaction volume at 37 °C for 4 h. 10. Run the digestion product on a 1 % agarose gel and gel-purify the linearized vector; elute with 30 μl sterile water. 11. Set up a 10 μl ligation to insert the previously purified right ZFA (~20 ng) into the pFZ87_L vector (~20 ng) using quick T4 ligase at room temperature for 1 h. 12. Transform 5 μl of the ligation reaction into E. coli as described above (steps 5 and 6). 13. Confirm insertion of the right ZFA sequence by colony PCR with primers T2A-F and ZFP-R. 14. Culture two PCR-confirmed clones and miniprep the plasmids (pFZ87_L + R). Sequence pFZ87_L + R to confirm the entire ZFN-left-T2A-ZFN-right sequence. 15. Linearize the sequence-confirmed pFZ87_L + R plasmid by digesting 2 μg of plasmid DNA with EcoRV in a 50 μl reaction at 37 °C for 2 h (see Note 11). 16. Run the digestion product on a 1 % agarose gel and gel-purify the linearized entry vector (4606 bp).
[Fig. 3 image, panels a–e: bulked and individual T1 seedlings (a); lanes labeled “Uncut / NHEJ” and “cut” (b, c); genotypes (HT, WT, HM) of individual T2 plants (d); alleles cloned and sequenced (e)]
Panel (e) sequences:
GTATCTTCGGCCATGAAGCTGGAGGGTA (wild type)
GTATCTTCGGCCAaaGAAGCTGGAGGGTA (adh1-4)
GTATCTTCGGCCA:::AGCTGGAGGGTA (adh1-8)
GTATCTTCGGCCATatGAAGCTGGAGGGTA (adh1-16)
Fig. 3 Screen for germline-transmitted mutations. (a) T1 transgenic seedlings are selected on medium containing hygromycin. Estradiol is included in the medium to induce ZFN expression.
(b) Some transgenic seedlings are used to test ZFN activity in somatic cells, whereas (c) the remaining transgenic seedlings are transferred to soil to obtain the T2 generation. (d) Individual T2 plants are then screened for germline-transmitted mutations and genotyped as homozygous (HM), heterozygous (HT), or wild type (WT). (e) The mutations are ultimately characterized by DNA sequencing. Underlined nucleotides represent the target sequences of the two ZFN monomers.
17. Conduct an LR reaction to move left-ZFN-T2A-right-ZFN from the pFZ87_L+R entry clone into the pMDC7 destination vector, using Gateway® LR Clonase® II enzyme mix according to the manufacturer’s instructions.
18. Confirm the correct pMDC7_L+R constructs by restriction digestion and/or DNA sequencing.

3.3 Screen for Arabidopsis Mutants Induced by ZFNs
The major steps of the procedure are shown in Fig. 3, using the well-characterized ZFNs that target the Arabidopsis ADH1 gene as an example (see Note 12).

3.3.1 Arabidopsis Transformation
1. Transform 50 μl of competent Agrobacterium tumefaciens cells (strain GV3101/pMP90) with 0.5 ng of the pMDC7_L+R vector by electroporation with the E. coli Pulser cell-porator.
2. Add 200 μl LB liquid medium to the transformed Agrobacterium cells and agitate at 28 °C for 1 h.
3. Spread 150 μl of the transformed Agrobacterium cells onto LB plates with 50 μg/ml kanamycin and 50 μg/ml gentamicin. Incubate the plates at 28 °C for 2 days.
4. Pick a single colony of transformed Agrobacterium and culture it in 5 ml LB liquid medium with 50 μg/ml kanamycin and 50 μg/ml gentamicin at 28 °C with shaking at 220 rpm overnight.
5. Pour the overnight Agrobacterium culture into 200 ml LB liquid medium with 50 μg/ml kanamycin and 50 μg/ml gentamicin and shake at 28 °C and 220 rpm overnight.
6. Collect the Agrobacterium cells by centrifugation at 6,000 × g for 10 min at 28 °C.
7.
Discard the supernatant and resuspend the bacterial pellet in 400 ml Arabidopsis transformation buffer.
8. Transform Arabidopsis plants with the transformed A. tumefaciens strain using the floral dip method [27]. Briefly, immerse the flowers of Arabidopsis plants in the bacterial suspension in Arabidopsis transformation buffer, and keep the plants in a dark, humid environment overnight.
9. Continue watering the plants for 3 weeks after transformation. Then stop watering and let the seeds mature and dry (see Note 13).
10. Collect the seeds and dry them for at least 2 weeks before screening for transgenic plants.

3.3.2 Screen for T1 Transgenic Plants and Induce ZFN Expression
1. Sterilize 0.2 g Arabidopsis seeds with 30 ml 50 % bleach in a 50-ml conical centrifuge tube by mixing for 10 min.
2. Wash the sterilized seeds four times with 40 ml sterile water each time. For each wash, pellet the seeds by centrifugation at 500 × g for 1 min, then resuspend them in water by mixing.
3. Resuspend the seeds in 20 ml sterile 0.05 % agar. Keep the suspended seeds at 4 °C in the dark for 4 days.
4. Spread 5 ml of the seed suspension onto 150 mm × 15 mm petri dishes containing transgenic plant selection medium. Seal the petri dishes with surgical tape.
5. Place the petri dishes in a growth chamber at 22 °C under continuous light.
6. After 1 week, collect six transgenic seedlings into a 2-ml screw-cap tube containing a metal bead. Prepare two such samples (12 plants in total), and prepare wild-type control samples in the same way as needed. Keep the remaining transgenic plants in the chamber.
7. Freeze the tubes in liquid nitrogen and pulverize the samples by shaking in a paint shaker for 2 min (see Note 14).
8. Add 500 μl plant DNA extraction buffer, mix well, and incubate in a 65 °C water bath for 15 min.
9. Add 500 μl chloroform, mix well, and centrifuge at 15,000 × g for 1 min.
10. Transfer 500 μl of the supernatant to a clean 1.7 ml microfuge tube and add 1 ml ethanol.
Mix well and centrifuge at 15,000 × g for 1 min.
11. Remove the supernatant, add 1 ml 75 % ethanol to wash the pellet, mix well, and centrifuge at 15,000 × g for 1 min.
12. Remove the supernatant and dry the pellet for about 10 min. Then dissolve the DNA in 100 μl sterile water. Store the plant genomic DNA at −20 °C.

3.3.3 Testing ZFN Activity in Somatic Cells of T1 Seedlings
1. Design and synthesize PCR primers that amplify the genomic region spanning the ZFN target site. Also choose a restriction enzyme whose recognition sequence lies within, or very close to, the spacer of the ZFN target site. Make sure that the PCR product contains no, or very few, additional sites for the chosen enzyme (see Note 15).
2. Perform PCR in a 25 μl reaction volume with the designed primers, using DNA from the bulked T1 seedlings as template. Include a wild-type DNA sample as a control.
3. Digest 10 μl of each PCR product with the chosen restriction enzyme in a 40 μl reaction volume overnight.
4. Run the digests on a 2 % agarose gel and check for restriction enzyme-resistant bands, which indicate the presence of ZFN-induced mutations. Because the template is DNA from bulked seedlings, a visible digestion-resistant band indicates that the ZFNs are not only active but also highly active in vivo (see Note 16).

3.3.4 Screen for ZFN-Induced Mutants
1. Transfer 7- to 10-day-old T1 seedlings from transgenic plant selection medium to soil (see Note 17).
2. Grow the plants to maturity (see Note 18). Collect T2 seeds from individual T1 plants and dry the seeds in glassine envelopes.
3. Sow seeds from ten T2 populations, derived from ten individual T1 parents, in potting mix. Use ~100 seeds for each T2 population.
4. After 3 weeks, collect one leaf from each plant and extract genomic DNA as described in Subheading 3.3.2.
5. Perform the PCR and digestion screen described in Subheading 3.3.3 to identify mutant plants (see Note 19).
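The chapter does not spell out why a right ZFA released with XbaI and BamHI (Subheading 3.2, step 1) can be ligated into pFZ87_L opened with NheI and BglII (step 9), or what insert:vector molar ratio the ~20 ng + ~20 ng ligations (steps 4 and 11) correspond to. The minimal Python sketch below illustrates both points under stated assumptions: it uses the standard recognition sites and cut positions of these four enzymes, assumes an average of ~650 Da per base pair, and uses the 270-bp and 4101-bp sizes from step 2 (the size of NheI/BglII-cut pFZ87_L is not given, so 4101 bp is only an approximation for step 11). The helper names (`overhang`, `compatible`, `junction`, `pmol`) are hypothetical and not part of the published protocol.

```python
# Sketch of two checks behind the ligations in Subheading 3.2 (steps 4 and 9-11).
# Assumptions: standard sites/cut positions below; ~650 Da per base pair of dsDNA.

# Recognition site and top-strand cut offset for each enzyme. All four sites are
# palindromic and cut symmetrically, leaving 4-nt 5' overhangs.
ENZYMES = {
    "XbaI": ("TCTAGA", 1),   # T^CTAGA
    "NheI": ("GCTAGC", 1),   # G^CTAGC
    "BamHI": ("GGATCC", 1),  # G^GATCC
    "BglII": ("AGATCT", 1),  # A^GATCT
}


def overhang(enzyme: str) -> str:
    """Return the 5' overhang left by a symmetric cut in a palindromic site."""
    site, cut = ENZYMES[enzyme]
    return site[cut:len(site) - cut]


def compatible(a: str, b: str) -> bool:
    """Two cohesive ends can anneal and ligate if their 5' overhangs are identical."""
    return overhang(a) == overhang(b)


def junction(vector_enzyme: str, insert_enzyme: str) -> str:
    """Top-strand sequence formed when a vector end cut by `vector_enzyme`
    is ligated to an insert end cut by `insert_enzyme`."""
    v_site, v_cut = ENZYMES[vector_enzyme]
    i_site, i_cut = ENZYMES[insert_enzyme]
    return v_site[:v_cut] + overhang(vector_enzyme) + i_site[len(i_site) - i_cut:]


def pmol(ng: float, bp: int, da_per_bp: float = 650.0) -> float:
    """Convert a dsDNA mass in ng to pmol, assuming ~650 Da per base pair."""
    return ng * 1000.0 / (bp * da_per_bp)


if __name__ == "__main__":
    # Right ZFA (XbaI/BamHI ends) into pFZ87_L (NheI/BglII ends).
    for vector_end, insert_end in [("NheI", "XbaI"), ("BglII", "BamHI")]:
        hybrid = junction(vector_end, insert_end)
        recut = any(hybrid == site for site, _ in ENZYMES.values())
        print(f"{vector_end}/{insert_end}: compatible={compatible(vector_end, insert_end)}, "
              f"junction={hybrid}, recut by either enzyme={recut}")

    # ~20 ng of a 270-bp ZFA versus ~20 ng of the ~4.1-kb vector (step 4).
    ratio = pmol(20, 270) / pmol(20, 4101)
    print(f"insert:vector molar ratio ~ {ratio:.1f}:1")  # ratio is ~15.2
```

Because XbaI/NheI share a CTAG overhang and BamHI/BglII share a GATC overhang, the ends ligate, and the resulting hybrid junctions (for example GCTAGA) are recognized by neither parent enzyme. Equal masses of a 270-bp insert and a ~4.1-kb vector correspond to roughly a 15:1 insert:vector molar ratio.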
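Step 1 of Subheading 3.3.3 and Note 15 call for an enzyme whose site lies in the ZFN spacer and occurs rarely elsewhere in the amplicon, and step 4 scores mutations as loss of that site. The Python sketch below illustrates this logic on the short allele sequences recovered from Fig. 3e. It is a sketch under assumptions: the chapter does not say which enzyme was used for ADH1, so NlaIII (CATG) is a hypothetical choice made only because its site spans the altered bases in this window; the reading of lowercase letters as altered bases and ':' as deleted bases is our interpretation of the extracted figure; and the helper names are illustrative. Spacer overlap is not tested because the underlining that marked the ZFN half-sites was lost in extraction.

```python
# Sketch: locate a candidate enzyme's sites and report whether each allele
# would be cut or digestion-resistant (Subheading 3.3.3, Note 15).

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str, site: str) -> list[int]:
    """0-based top-strand start positions of `site` on either strand of `seq`."""
    seq, site = seq.upper(), site.upper()
    hits = set()
    for probe in {site, reverse_complement(site)}:  # a single probe if palindromic
        start = seq.find(probe)
        while start != -1:
            hits.add(start)
            start = seq.find(probe, start + 1)
    return sorted(hits)


def clean_allele(aligned: str) -> str:
    """Drop alignment gap symbols (':') and upper-case the sequence."""
    return aligned.replace(":", "").upper()


# Alleles as extracted from Fig. 3e (lowercase = altered bases, ':' = deleted bases).
ALLELES = {
    "wild type": "GTATCTTCGGCCATGAAGCTGGAGGGTA",
    "adh1-4": "GTATCTTCGGCCAaaGAAGCTGGAGGGTA",
    "adh1-8": "GTATCTTCGGCCA:::AGCTGGAGGGTA",
    "adh1-16": "GTATCTTCGGCCATatGAAGCTGGAGGGTA",
}

SITE = "CATG"  # NlaIII site: hypothetical choice for illustration only

if __name__ == "__main__":
    for name, aligned in ALLELES.items():
        sites = find_sites(clean_allele(aligned), SITE)
        status = "cut" if sites else "digestion-resistant"
        print(f"{name}: sites at {sites} -> {status}")
```

In this window the wild-type allele has a single CATG site (position 11) and all three mutant alleles lose it, which is the pattern step 4 reads as a digestion-resistant band. Running `find_sites` over the full amplicon also gives the number of additional sites that step 1 asks you to keep low.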
4 Notes
1. Any plasmid containing ZFAs generated by the OPEN method should work for this protocol; the OPEN-derived ZFA merely serves as a template for PCR. We suggest plasmid #28086 because it has worked well in our hands.
2. Plasmid pFZ87 is a derivative of the Gateway® entry vector pENTR/D-TOPO® that contains a FokI-T2A-FokI coding sequence in which both FokI nuclease domains are obligate heterodimers [28]. Plasmid pFZ87 is available from the Voytas lab upon request. T2A is the “self-cleaving” 2A peptide of the insect virus Thosea asigna virus, which allows production of two proteins from one mRNA through a translational skipping mechanism [29]. Inclusion of the T2A sequence therefore allows efficient expression of the left and right ZFNs from one transcript.
3. Plasmid pMDC7 is a Gateway® destination T-DNA binary vector for estrogen-inducible expression in plants and can be ordered from the ABRC Stock Center (http://www.arabidopsis.org). Compared with constitutive expression, the estrogen-inducible promoter minimizes potential ZFN cytotoxicity. Note also that pMDC7 must be propagated in E. coli DB3.1 because it contains the ccdB gene, a negative selectable marker. The CcdB toxin inhibits the growth of standard E. coli strains such as DH5α, but not of DB3.1, which carries the gyrA462 mutation that confers resistance to CcdB.
4. Agrobacterium tumefaciens strain GV3101/pMP90 is a common laboratory strain, available from the Voytas lab upon request.
5. Hygromycin B is the marker in the T-DNA vector pMDC7 used for transgenic plant selection. Timentin is an antibiotic used to kill Agrobacterium, which is often difficult to remove completely from seeds by standard surface-sterilization procedures. Including β-estradiol in the medium induces transcription of the ZFNs.
6.
A three-finger ZFA coding sequence is about 270 bp, so a ZFA can always be obtained by direct DNA synthesis instead; this approach is recommended by the authors of the original CoDA protocol. If possible, we recommend choosing more than one ZFN site for a given gene because, in our experience, only about 50 % of CoDA-derived ZFNs are functional. When several ZFAs need to be assembled for testing, the method described here is much more cost-effective than synthesizing ZFAs.
7. Any cloning vector of relatively small size (<4.5 kb) will work, as long as the entire vector can easily be PCR-amplified.
8. DpnI cleaves only methylated DNA, such as plasmid DNA isolated from E. coli, and does not cut unmethylated PCR products. DpnI treatment therefore prevents contamination with the template plasmid “pCR8-long F2” in the subsequent cloning steps.
9. In this protocol, ZFAs are first cloned into the yeast expression vectors pCP3 and pCP4 because our lab often performs a yeast assay to test ZFN activity before using the ZFNs in higher eukaryotes. Both vectors are available from the Voytas lab upon request. If a yeast assay is not performed, one can skip the remaining steps of Subheading 3.1 and go directly to Subheading 3.2, where the PCR products containing the left and right ZFAs (instead of the pCP3 and pCP4 clones) are digested with XbaI and BamHI.
10. This step uses homologous recombination in E. coli to incorporate the left and right ZFAs into the pCP3 and pCP4 vectors, respectively. We find this approach simpler than performing regular ligation reactions.
11. A linearized entry clone is used because both the pFZ87 entry clone and the pMDC7 destination vector carry a kanamycin selectable marker for E. coli; linearization prevents intact entry clone carried over from the LR reaction from giving kanamycin-resistant background colonies. An alternative strategy is to clone the left-ZFN-T2A-right-ZFN sequence into an entry clone with a different antibiotic marker, which makes linearization unnecessary.
12. The ZFNs that target the Arabidopsis ADH1 gene were made with the OPEN platform [12].
However, the plant mutant screening procedure described here is independent of the ZFN engineering platform.
13. The time needed to grow plants to seed may vary between labs depending on growth conditions.
14. We use a paint shaker to pulverize plant tissue and CTAB buffer for plant genomic DNA extraction. The advantage of pulverizing all samples at once with a paint shaker becomes obvious when many samples must be handled, especially when screening for germline-transmitted mutations. Other plant DNA extraction methods should also work.
15. Ideally, a restriction enzyme site that is unique, or nearly unique, in the amplicon should lie right in the middle of the spacer, where ZFNs induce DSBs and mutations arise. These mutations are mainly point mutations and small deletions and insertions, which typically destroy the restriction enzyme recognition site. If no suitable restriction enzyme is available, other detection methods, such as the Surveyor assay [30] or high-throughput DNA sequencing, should be used.
16. A given pair of ZFNs may yield no clear evidence of mutagenesis because of low activity. In this case, an enrichment PCR procedure can be used to detect ZFN-induced mutations, as illustrated in Fig. 4. Alternatively, ZFN-induced mutations can be detected and quantified by deep sequencing [31].

[Fig. 4 image. Panels: (a) ZFN-treated cells vs. ZFN-untreated cells; (b) digestion of genomic DNA; (c) PCR with digested genomic DNA as template; (d) digestion of the PCR product (PCR product vs. digested product; uncut band).]
Fig. 4 Enrichment PCR to detect ZFN-induced mutations. (a) Both ZFN-treated (left) and untreated (right) cells are depicted in parallel. NHEJ-mediated mutations are present only in the ZFN-treated cells and are denoted by a black dot.
(b) Digestion of genomic DNA with a restriction enzyme whose site lies close to or overlaps the ZFN cut site. Some ZFN-induced mutations destroy this restriction site, preventing the DNA from being cleaved (see the black dot). (c) PCR amplification with primers flanking the ZFN site enriches for ZFN-mutated DNA. With DNA from ZFN-untreated cells, there will be little or no PCR product, depending on the completeness of the restriction digestion. (d) The ZFN-induced mutations can then be easily detected as an uncut band after digestion of the PCR product with the same restriction enzyme.
17. In general, the more T1 transgenic founder lines that are followed, the better the chances of recovering heritable mutants in the T2 progeny. In our laboratory, 40 T1 transgenic lines (the number recommended in this study) have been used as the population size for the initial screen for germline-transmitted mutants. For ZFNs with strong cleavage activity, as revealed by somatic mutagenesis assays, several independent heritable mutants have been identified from the progeny of these 40 T1 plants. If ZFN activity is weak, we recommend screening more T1 plants.
18. We have tried spraying the transgenic plants with 20 μM estradiol after transferring them from MS plates to soil. We think that this additional estradiol treatment may help to enhance the mutagenesis frequency.
19. It is not unusual to recover biallelic mutations when the ZFNs are highly active [12]. If the ZFNs are not very active, as revealed by their somatic mutagenesis frequencies (Subheading 3.3.3), we recommend pooling multiple plants (e.g., 20) in a single sample. Mutations are then detected using the enrichment PCR procedure illustrated in Fig. 4.

Acknowledgments
This work is supported by grants from the National Science Foundation to D.F.V. (DBI 0923827 and MCB 0209818).

References
1.
Alonso JM, Ecker JR (2006) Moving forward in reverse: genetic technologies to enable genome-wide phenomic screens in Arabidopsis. Nat Rev Genet 7:524–536
2. Alonso JM et al (2003) Genome-wide insertional mutagenesis of Arabidopsis thaliana. Science 301:653–657
3. Sessions A et al (2002) A high-throughput Arabidopsis reverse genetics system. Plant Cell 14:2985–2994
4. Woody ST et al (2007) The WiscDsLox T-DNA collection: an Arabidopsis community resource generated by using an improved high-throughput T-DNA sequencing pipeline. J Plant Res 120:157–165
5. McCallum CM et al (2000) Targeted screening for induced mutations. Nat Biotechnol 18:455–457
6. Bush SM, Krysan PJ (2010) iTILLING: a personalized approach to the identification of induced mutations in Arabidopsis. Plant Physiol 154:25–35
7. Kim YG, Cha J, Chandrasegaran S (1996) Hybrid restriction enzymes: zinc finger fusions to Fok I cleavage domain. Proc Natl Acad Sci U S A 93:1156–1160
8. Bibikova M et al (2001) Stimulation of homologous recombination through targeted cleavage by chimeric nucleases. Mol Cell Biol 21:289–297
9. Bitinaite J et al (1998) FokI dimerization is required for DNA cleavage. Proc Natl Acad Sci U S A 95:10570–10575
10. Townsend JA et al (2009) High-frequency modification of plant genes using engineered zinc-finger nucleases. Nature 459:442–445
11. Carroll D (2011) Genome engineering with zinc-finger nucleases. Genetics 188:773–782
12. Zhang F et al (2010) High frequency targeted mutagenesis in Arabidopsis thaliana using zinc finger nucleases. Proc Natl Acad Sci U S A 107:12028–12033
13. Shukla VK et al (2009) Precise genome modification in the crop species Zea mays using zinc-finger nucleases. Nature 459:437–441
14. Osakabe K, Osakabe Y, Toki S (2010) Site-directed mutagenesis in Arabidopsis using custom-designed zinc finger nucleases. Proc Natl Acad Sci U S A 107:12034–12039
15. Doyon Y et al (2008) Heritable targeted gene disruption in zebrafish using designed zinc-finger nucleases.
Nat Biotechnol 26:702–708
16. Kim S et al (2011) Preassembled zinc-finger arrays for rapid construction of ZFNs. Nat Methods 8:7
17. Ramirez CL et al (2008) Unexpected failure rates for modular assembly of engineered zinc fingers. Nat Methods 5:374–375
18. Maeder ML et al (2008) Rapid “open-source” engineering of customized zinc-finger nucleases for highly efficient gene modification. Mol Cell 31:294–301
19. Sander JD et al (2011) Selection-free zinc-finger-nuclease engineering by context-dependent assembly (CoDA). Nat Methods 8:67–69
20. Curtin SJ et al (2011) Targeted mutagenesis of duplicated genes in soybean with zinc-finger nucleases. Plant Physiol 156:466–473
21. Sander JD, Maeder ML, Joung JK (2011) Engineering designer nucleases with customized cleavage specificities. Curr Protoc Mol Biol Chapter 12, Unit 12.13. doi: 10.1002/0471142727.mb1213s96
22. Osborn MJ et al (2011) Synthetic zinc finger nuclease design and rapid assembly. Hum Gene Ther 22:1155–1165
23. Curtis MD, Grossniklaus U (2003) A gateway cloning vector set for high-throughput functional analysis of genes in planta. Plant Physiol 133:462–469
24. Koncz C et al (1989) High-frequency T-DNA-mediated gene tagging in plants. Proc Natl Acad Sci U S A 86:8467–8471
25. Wright DA et al (2006) Standardized reagents and protocols for engineering zinc finger nucleases by modular assembly. Nat Protoc 1:1637–1652
26. Sander JD et al (2010) ZiFiT (zinc finger targeter): an updated zinc finger engineering tool. Nucleic Acids Res 38:W462–W468
27. Clough SJ, Bent AF (1998) Floral dip: a simplified method for Agrobacterium-mediated transformation of Arabidopsis thaliana. Plant J 16:735–743
28. Miller JC et al (2007) An improved zinc-finger nuclease architecture for highly specific genome editing. Nat Biotechnol 25:778–785
29.
Szymczak AL et al (2004) Correction of multigene deficiency in vivo using a single “self-cleaving” 2A peptide-based retroviral vector. Nat Biotechnol 22:589–594
30. Guschin DY et al (2010) A rapid and general assay for monitoring endogenous gene modification. Methods Mol Biol 649:247–256
31. Herrmann F et al (2011) p53 Gene repair with zinc finger nucleases optimised by yeast 1-hybrid and validated by Solexa sequencing. PLoS One 6:e20913

Chapter 11
The Use of Artificial MicroRNA Technology to Control Gene Expression in Arabidopsis thaliana
Andrew L. Eamens, Marcus McHale, and Peter M. Waterhouse

Abstract
In plants, double-stranded RNA (dsRNA) is an effective trigger of RNA silencing, and several classes of endogenous small RNA (sRNA), processed from dsRNA substrates by DICER-like (DCL) endonucleases, are essential for controlling gene expression. One such sRNA class, the microRNAs (miRNAs), controls the expression of closely related genes to regulate all aspects of plant development, including the determination of leaf shape, leaf polarity, flowering time, and floral identity. A single miRNA sRNA silencing signal is processed from a long precursor transcript of non-protein-coding RNA, termed the primary miRNA (pri-miRNA). A region of the pri-miRNA is partially self-complementary, allowing the transcript to fold back onto itself to form a stem–loop structure of imperfectly double-stranded RNA. Artificial miRNA (amiRNA) technology uses endogenous pri-miRNAs in which the miRNA and miRNA* (passenger strand of the miRNA duplex) sequences have been replaced with corresponding amiRNA/amiRNA* sequences that direct highly efficient RNA silencing of the targeted gene. Here, we describe the rules for amiRNA design and outline the PCR and bacterial cloning procedures involved in constructing an amiRNA plant expression vector to control target gene expression in Arabidopsis thaliana.
Key words miRNA, amiRNA, RNA silencing, Plant expression vector, Target gene expression, Arabidopsis

Jose J. Sanchez-Serrano and Julio Salinas (eds.), Arabidopsis Protocols, Methods in Molecular Biology, vol. 1062, DOI 10.1007/978-1-62703-580-4_11, © Springer Science+Business Media New York 2014

1 Introduction
The genome of the model dicotyledonous plant species Arabidopsis thaliana (Arabidopsis) encodes several classes of highly abundant small RNA (sRNA), 20–30 nucleotides (nt) in length [1–4]. Through various protein-mediated RNA–RNA and RNA–DNA interactions, these small, single-stranded RNAs regulate gene expression in a highly sequence-specific manner in a diverse array of biological processes, including all aspects of plant development, adaptation to stress, and defense against transposon replication and invading pathogens [5–8]. Small RNAs can direct RNA silencing at either the transcriptional or the posttranscriptional level, and sRNAs functioning posttranscriptionally are divided into two distinct classes, small interfering RNAs (siRNAs) and microRNAs (miRNAs), depending on their mode of biogenesis [9]. Both classes of sRNA are processed from double-stranded RNA (dsRNA), an effective trigger of RNA silencing, by members of the DICER-like (DCL) family of endonucleases [10–12]. DCL cleavage of perfectly double-stranded RNA derived from the transcription of template RNAs, repetitive DNA elements, transposons, natural antisense gene pairs, invading viruses, or introduced transgene-encoded hairpin RNAs (hpRNAs) generates various species of siRNA [3, 12–14]. miRNAs, on the other hand, result from DCL cleavage of the dsRNA stem–loop structures formed by self-complementary regions of non-protein-coding RNAs transcribed from MIR loci [1, 7, 10, 11].
Transformation of Arabidopsis and other plant species with hpRNA transgenes, consisting of an inverted repeat of a portion of the target gene sequence separated by a spacer fragment (usually an intron), has been shown to be a highly efficient trigger of siRNA-directed RNA silencing [15, 16]. However, because any sequence along the length of the resulting dsRNA molecule can be processed into an sRNA silencing signal, questions about silencing specificity have arisen. Off-target silencing, that is, silencing of transcripts other than those of the targeted gene, has been reported in both plant and animal systems with hpRNA-directed RNA silencing [17–19]. We [20–22] and others [23–27] have therefore developed an alternative approach to control target gene expression in Arabidopsis, termed artificial miRNA (amiRNA) technology. Artificial miRNA technology exploits the intrinsic nature of the processing stages of the miRNA biogenesis pathway. By modifying both the miRNA and miRNA* sequences while maintaining the dsRNA structural features of the miRNA precursor transcript, such as bulges and mismatches, a single, specific, highly accumulating sRNA silencing signal can be generated. To date, amiRNA technology has been shown to direct highly efficient and specific RNA silencing of reporter genes [20, 23], endogenous genes [24, 26], non-protein-coding RNA [21], and viruses [25, 27], both tissue-specifically and in whole plants. Here we describe the design rules for amiRNA selection and outline the PCR and bacterial cloning procedures involved in constructing an amiRNA plant expression vector based on the plasmid pBlueGreen.

2 Materials
2.1 Selection of the Artificial MicroRNA Target Sequence
1. Personal computer with Internet access.
2. Template for amiRNA forward and reverse primers (see Fig. 1a).

Fig. 1 Template for amiRNA forward and reverse primer design.
(a) Artificial miRNA forward and reverse primer template sequences that maintain the endogenous dsRNA structural features of the Arabidopsis MIR159B miR159b/miR159b* duplex in the modified PRI-MIR159B amiRNA precursor fragment. (b) Example of the sequence composition of an amiRNA target sequence using the pBlueGreen plant expression vector system. (c) The exact 21-nt amiRNA target sequence is entered into the amiRNA reverse primer template. This sequence is also entered into the amiRNA forward primer template; however, the dsRNA mismatches of the endogenous miR159b/miR159b* duplex are accounted for by introducing mismatched base pairings at positions 12, 13, and 21 (represented by grey lowercase template sequence).

2.2 PCR Amplification of the Artificial MicroRNA Precursor Fragment
1. Template plasmid pAth-miR159b (see Note 1).
2. AmiRNA forward and reverse primers (10 μM; see Note 2).
3. dNTPs (5 mM each of dATP, dCTP, dGTP, and dTTP).
4. Expand Long Template Enzyme mix (5 U/μL; Roche Applied Science; see Note 3).
5. 10× Expand Long Template PCR System buffer 1 (Roche Applied Science).
6. DNase-free dH2O.
7. 1× Tris/Borate/EDTA (TBE) buffer (for agarose gel analysis).
8. 6× Loading dye (LD; MBI Fermentas).
9. 100 base pair (bp) DNA ladder (MBI Fermentas).
10. 1.2 % w/v agarose gel (stained with ethidium bromide; EtBr).
11. Ice.
12. 0.2 mL PCR tubes.
13. Pipette tips (2, 20, and 200 μL).
14. 1.5 mL Microfuge tubes.
15. QIAquick® PCR Purification kit (Qiagen).
16. Benchtop thermocycler.
17. Benchtop microfuge (at room temperature; RT).

2.3 Cloning of the PCR-Generated Artificial MicroRNA Precursor Fragment into the pGEM-T® Easy Cloning Vector
1. Column-purified amiRNA precursor fragment (from Subheading 3.2).
2. pGEM-T® Easy Cloning vector (50 ng/μL; Promega).
3. 2× Rapid ligation buffer (Promega).
4. T4 DNA ligase (3 U/μL; Promega).
5. 20 % w/v 5-Bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-gal; Sigma-Aldrich).
6. 0.1 M Isopropyl-β-D-1-thiogalactopyranoside (IPTG; Sigma-Aldrich).
7. Luria–Bertani (LB) liquid media.
8. LB liquid media containing 50 μg/mL ampicillin (LB-Amp50).
9. LB-Amp50 agar plates.
10. Escherichia coli DH5α electro-competent cells.
11. Ice.
12. Pipette tips (2, 20, 200, and 1,000 μL).
13. 1.5 mL Microfuge tubes.
14. 15 mL Capped centrifuge tubes.
15. Drawn-out glass pipette (with bulb).
16. Bacterial cell plate spreader (sterilized).
17. Bacterial cell loop (sterilized).
18. QIAprep® Spin Miniprep kit (Qiagen).
19. Electroporator and cuvettes.
20. Benchtop microfuge (at RT).
21. Incubated shaker (at 37 °C).
22. Incubator (at 37 °C).
23. Water bath (at 37 and 65 °C).
24. Laminar flow cabinet.

2.4 Cloning of the Artificial MicroRNA Precursor Fragment into the Plant Expression Vector pBlueGreen
1. pGEM-T:precursor-amiRNA vector (from Subheading 3.3).
2. pBlueGreen plant expression vector (see Note 1).
3. pSoup helper plasmid (see Note 1).
4. LguI (5 U/μL; MBI Fermentas).
5. BamHI (10 U/μL; MBI Fermentas).
6. T4 DNA ligase (5 U/μL; MBI Fermentas).
7. 10× Buffer Tango™ (MBI Fermentas).
8. 10× Buffer BamHI (MBI Fermentas).
9. 10× T4 DNA ligase buffer (MBI Fermentas).
10. 1× TBE buffer.
11. 6× LD.
12. DNase-free dH2O.
13. Ice.
14. 20 % w/v X-gal.
15. 0.1 M IPTG.
16. 1.0 % w/v agarose gel (stained with EtBr).
17. 1.2 % w/v agarose gel (stained with EtBr).
18. 100 bp DNA ladder.
19. LB liquid media.
20. LB agar plates containing 50 μg/mL kanamycin (LB-Kan50).
21. E. coli DH5α electro-competent cells.
22. Agrobacterium tumefaciens GV3101 electro-competent cells.
23. Bacterial cell plate spreader (sterilized).
24. Bacterial cell loop (sterilized).
25. QIAquick® PCR Purification kit.
26. QIAprep® Spin Miniprep kit.
27. Pipette tips (2, 20, 200, and 1,000 μL).
28. 1.5 mL Microfuge tubes.
29. Electroporator and cuvettes.
30. Benchtop microfuge (at RT).
31. Benchtop shaker (at 37 °C).
32. Incubator (at 37 °C).
33. Water bath (at 37 °C and 65 °C).
34. Laminar flow cabinet.

3 Methods
3.1 Selection of the Artificial MicroRNA Target Sequence
1. To silence the expression of an Arabidopsis gene of interest using amiRNA technology, use the Gene Search function (http://www.arabidopsis.org/servlets/Search?action=new_search&type=gene) of the TAIR website (http://www.arabidopsis.org/) to download the cDNA sequence of the target gene to your personal computer.
2. Working in the 5′–3′ direction, identify a 19-nucleotide (nt) sequence within the target gene cDNA sequence that contains either a cytosine (C) or a guanine (G) residue at position 1, a thymine (T) residue at position 10, and an adenine (A) residue at position 19 (see Note 4), as outlined in Fig. 1b.
3. Once a putative amiRNA target sequence matching these parameters is identified, add the two nucleotides immediately upstream (5′) of position 1 to obtain a 21-nt putative amiRNA target sequence.
4. Using the BLAST search function on the TAIR website (http://www.arabidopsis.org/Blast/index.jsp), determine whether the reverse complement of the selected 21-nt putative amiRNA target sequence (which corresponds to the sequence of your putative mature amiRNA guide strand) is specific to your gene of interest (see Note 5).
5. If the putative amiRNA sequence is complementary to additional transcripts that are not to be targeted for amiRNA-directed RNA silencing, discard the selected sequence and repeat steps 2–4 until a 21-nt putative amiRNA sequence unique to the target gene is identified (see Note 5).
6. Once a putative amiRNA sequence complementary only to the target gene is identified, enter the exact 21-nt amiRNA target sequence (the selected cDNA target sequence in the 5′–3′ direction) into the amiRNA reverse primer template (Fig. 1a), as outlined in Fig. 1c.
7.
Also enter the selected 21-nt target sequence (again, the cDNA target sequence in the 5′–3′ direction) into the amiRNA forward primer template (Fig. 1a). However, as outlined in Fig. 1c, introduce three mismatched nucleotides at positions 12, 13, and 21 of the amiRNA target sequence (see Note 6).
8. Order the 65-nt amiRNA forward and 61-nt amiRNA reverse primers from your usual supplier of high-quality DNA oligonucleotides (see Note 7).

3.2 PCR Amplification of the Artificial MicroRNA Precursor Fragment
1. On ice, add the reaction components to a chilled 0.2 mL PCR tube in the following order: 38.0 μL of DNase-free dH2O, 5.0 μL of 10× Expand Long Template PCR System buffer 1, 1.0 μL of 5 mM dNTP mix, 2.5 μL each of the 10 μM forward and reverse amiRNA primers, 0.5 μL of the pAth-miR159b template plasmid (~50 pg/μL), and 0.5 μL of Expand Long Template Enzyme (see Note 3).
2. Mix the reaction components by pipetting, cap the PCR tube, and immediately transfer it to a benchtop thermocycler pre-warmed to 95 °C.
3. Amplify the precursor-amiRNA fragment from the pAth-miR159b template plasmid using the following PCR program: 1 cycle of 95 °C for 3 min; 28 cycles of 94 °C for 20 s, 56 °C for 30 s, and 72 °C for 45 s; 1 cycle of 72 °C for 7 min, followed by 16 °C for 10 min.
4. Transfer a 10 μL aliquot of the reaction to a labelled 1.5 mL microfuge tube containing 2.0 μL of 6× LD, mix by pipetting, and visualize on an EtBr-stained 1.2 % w/v agarose gel in 1× TBE buffer. On the same gel, run a 10 μL aliquot of 100 bp DNA ladder (or a similar size marker) to check that the amplified amiRNA precursor fragment is the correct size (224 bp).
5. If the PCR product is the expected size, purify the remaining 40 μL using the QIAquick® PCR Purification kit according to the manufacturer’s instructions and elute in 20 μL of DNase-free dH2O.

3.3 Cloning of the PCR-Generated Artificial MicroRNA Precursor Fragment into the pGEM®-T Easy Cloning Vector
1.
On ice, add 4.0 μL of the column-purified amiRNA precursor fragment and 0.5 μL of the pGEM-T® Easy Cloning vector to a chilled, labelled 1.5 mL microfuge tube, mix by pipetting, cap the tube, and incubate at 65 °C for 5 min; then immediately transfer the tube to ice for an additional 5 min (see Note 8).
2. Add 5.0 μL of 2× Rapid ligation buffer and 0.5 μL of T4 DNA ligase, mix by pipetting, cap the tube, and incubate in a 37 °C water bath for 60 min.
3. Thaw an aliquot of E. coli DH5α electro-competent cells on ice. When the cells have completely thawed, add 2.0 μL of the ligation reaction, mix gently by pipetting, and immediately transfer the mixture to a chilled cuvette (on ice).
4. Transfer the cuvette to an electroporator, electroporate, and immediately add 450 μL of ice-cold LB liquid media. Using a drawn-out glass pipette, transfer the cell suspension to a new labelled 1.5 mL microfuge tube and incubate at 37 °C for 60 min in a benchtop shaker (200 rpm).
5. In a laminar flow cabinet, use a sterilized bacterial cell plate spreader to spread 10 μL of 0.1 M IPTG and 20 μL of 20 % w/v X-gal evenly over the entire surface of an LB-Amp50 agar plate (see Note 9). Transfer a 50 μL aliquot of the bacterial suspension onto the same plate, spread it evenly over the entire surface with a sterilized plate spreader, and dry the plate in the laminar flow cabinet for 10 min. Incubate the plate at 37 °C for 16–24 h.
6. Using a sterilized bacterial cell loop, pick a single white colony (see Note 9) and inoculate 5 mL of LB-Amp50 liquid medium in a 15 mL centrifuge tube. Cap the tube and incubate the culture at 37 °C for 16–24 h in a benchtop shaker (200 rpm).
7.
Isolate plasmid DNA from the overnight culture using the QIAprep® Spin Miniprep kit (or your usual plasmid DNA isolation protocol) according to the manufacturer's instructions. The resulting plasmid, pGEM-T:precursor-amiRNA, contains the modified precursor transcript of the Arabidopsis MIR159B gene, in which the endogenous miR159/miR159* sequences have been replaced with amiRNA/amiRNA* sequences, in the pGEM®-T Easy Cloning vector (see Note 10). 3.4 Cloning of the Artificial MicroRNA Precursor Fragment into the Plant Expression Vector pBlueGreen 1. On ice, add the following reaction components, in order, to each of two labelled 1.5 mL microfuge tubes: 7.0 μL of DNase-free dH2O, 2.0 μL of 10× Buffer Tango™, and 1.0 μL of LguI (5 U/μL). Add 10.0 μL of the pGEM-T:precursor-amiRNA plasmid preparation to one tube and 10.0 μL of the pBlueGreen plasmid preparation to the other, mix by gently pipetting, and incubate for 4 h in a 37 °C water bath. 2. Transfer a 5.0 μL aliquot of each digestion product to a new labelled 1.5 mL microfuge tube containing 1.0 μL of 6× LD. Mix by pipetting and visualize on an EtBr-stained 1.0 % w/v agarose gel in 1× TBE buffer. On the same gel, run a 10 μL aliquot of 1 kb DNA ladder (or a similar size marker) to check that (a) the plasmid preparations are completely digested and (b) the restriction fragments are the correct size (see Note 11). 3. Purify the LguI digestion products with the QIAquick® PCR Purification kit according to the manufacturer's instructions, eluting each in 20 μL of DNase-free dH2O. 4. To a labelled 1.5 mL microfuge tube, add 10.0 μL of the LguI-digested pGEM-T:precursor-amiRNA vector, 0.5 μL of the LguI-digested pBlueGreen vector, and 7.0 μL of DNase-free dH2O. Mix by pipetting and incubate at 65 °C in a water bath for 5 min, then immediately transfer to ice for an additional 5 min (see Note 8). 5.
On ice, add 2.0 μL of 10× T4 DNA Ligase buffer and 0.5 μL of T4 DNA ligase, mix by pipetting, and incubate overnight at RT (or, alternatively, at 37 °C for 4 h in a water bath). 6. Thaw an aliquot of E. coli DH5α electro-competent cells on ice; once thawed, add 2.0 μL of the ligation reaction, mix gently by pipetting, and immediately transfer the mixture to a chilled cuvette (on ice). 7. Transfer the cuvette to an electroporator, electroporate, and immediately add 450 μL of ice-cold LB liquid medium. Using a drawn-out glass pipette, transfer the cell suspension to a new labelled 1.5 mL microfuge tube and incubate at 37 °C for 60 min in a benchtop shaker (200 rpm). 8. In a laminar flow cabinet, use a sterilized plate spreader to spread 10 μL of 0.1 M IPTG and 20 μL of 20 % w/v X-gal evenly over the entire surface of an LB-Kan50 agar plate (see Note 9). Transfer a 50 μL aliquot of the bacterial suspension onto the same plate, spread it evenly over the entire surface with a sterilized plate spreader, and dry the plate in the laminar flow cabinet for 10 min. Incubate the plate at 37 °C for 16–24 h. 9. Using a sterilized inoculation loop, pick a single white colony (see Note 9) and inoculate 5 mL of LB-Kan50 liquid medium in a 15 mL centrifuge tube. Cap the tube and incubate the culture at 37 °C for 16–24 h in a benchtop shaker (200 rpm). 10. Isolate plasmid DNA from the overnight culture using the QIAprep® Spin Miniprep kit according to the manufacturer's instructions. The resulting plasmid contains the modified precursor transcript of Arabidopsis MIR159B, in which the endogenous miR159/miR159* sequences have been replaced with amiRNA/amiRNA* sequences targeting your gene of interest for amiRNA-directed RNA silencing, in the pBlueGreen plant expression vector. 11.
To determine the orientation of the amiRNA precursor fragment in the pBlueGreen plant expression vector, set up a 20 μL BamHI digestion in a labelled 1.5 mL microfuge tube as follows: 12.0 μL of DNase-free dH2O, 5.0 μL of plasmid preparation, 2.0 μL of 10× Buffer BamHI, and 1.0 μL of BamHI. Mix by pipetting and incubate at 37 °C for 2 h in a water bath. 12. Add 4.0 μL of 6× LD to each BamHI digestion, mix by pipetting, and run the digestion products on an EtBr-stained 1.2 % w/v agarose gel in 1× TBE buffer alongside 10 μL of 100 bp DNA ladder (or a similar size marker). pBlueGreen plasmid preparations containing the modified amiRNA precursor fragment in the desired sense (5′–3′) orientation release a 440 bp BamHI restriction fragment (see Notes 12 and 13). 13. To transform Arabidopsis plants, mix 1.0 μL of the selected pBlueGreen amiRNA plant expression vector with 1.0 μL of the helper plasmid pSoup, and use this mixture to transform A. tumefaciens GV3101 electro-competent cells by electroporation (see Notes 14 and 15). 4 Notes 1. The plasmid pAth-miR159b contains the pri-miRNA sequence of the Arabidopsis MIR159B (AT1G18075) locus, PRI-MIR159B, in the pGEM®-T Easy Cloning vector. This plasmid is available from the authors upon request and should be diluted to ~50 pg/μL before use as the template for PCR amplification of the amiRNA precursor fragment. The amiRNA plant expression vector pBlueGreen and the helper plasmid pSoup are also available from the authors upon request. 2. The amiRNA forward and reverse primers span the miRNA and miRNA* sequences of PRI-MIR159B. This design allows the two endogenous sRNA sequences to be exchanged for the corresponding amiRNA guide and amiRNA* passenger strand sequences in a single PCR.
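The sequence manipulations behind the primer design step (step 7) and Notes 2, 5, and 6 can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' design tool: the function names are hypothetical, the primer template sequences of Fig. 1a are not reproduced, and creating each mismatch by swapping a base for its complement (which avoids an accidental G:U wobble pair with the guide strand) is an assumption of this sketch.

```python
"""Hypothetical helper: derive amiRNA guide and mismatched amiRNA* sequences."""

# Translation table for DNA complementation (A<->T, G<->C).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a 5'->3' DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def follows_note4_rule(target: str) -> bool:
    """True if the target has G/C at position 1, T at 10, and A at 19 (Note 4)."""
    t = target.upper()
    return t[0] in "GC" and t[9] == "T" and t[18] == "A"


def introduce_mismatches(target: str, positions=(12, 13, 21)) -> str:
    """Swap the bases at the given 1-based positions for their complements.

    Assumption: a complement swap makes each changed position a true
    mismatch against the guide strand (A:A, U:U, G:G, or C:C), never G:U.
    """
    bases = list(target.upper())
    for pos in positions:
        bases[pos - 1] = bases[pos - 1].translate(COMPLEMENT)
    return "".join(bases)


def design_amirna(target: str) -> dict:
    """Return the guide and amiRNA* sequences for a 21-nt cDNA target site."""
    target = target.upper()
    if len(target) != 21 or set(target) - set("ACGT"):
        raise ValueError("target must be a 21-nt DNA sequence (A/C/G/T)")
    return {
        # Mature amiRNA guide strand: reverse complement of the target (Note 5).
        "guide": reverse_complement(target),
        # Target with mismatches at positions 12, 13, and 21: the amiRNA*
        # region entered into the forward primer template (step 7, Note 6).
        "star": introduce_mismatches(target),
    }


if __name__ == "__main__":
    site = "ACGTACGTACGTACGTACGTA"  # hypothetical 21-nt target site
    print(follows_note4_rule(site), design_amirna(site))
```

In practice the candidate site would first be screened with `follows_note4_rule` (and relaxed at position 10 or 19 as Note 4 allows) before the guide and amiRNA* sequences are pasted into the primer templates.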
These long primers also introduce the LguI restriction sites required for cloning the modified amiRNA precursor fragment into the pBlueGreen plant expression vector. 3. The Expand Long Template Enzyme mix (Roche Applied Science) is used to amplify the amiRNA precursor fragment because it contains two thermostable DNA polymerases (Taq and the proofreading Tgo DNA polymerase), allowing for (a) proofreading of the amplified product and (b) A-tailing of the amplified product (required for subsequent cloning into the pGEM®-T Easy Cloning vector). 4. This method identifies the putative amiRNA target sequence because target sequence positions 1, 10, and 19 correspond to mature amiRNA guide strand positions 19, 10, and 1, respectively. The majority of endogenous plant miRNAs, including Arabidopsis miR159b, carry a uracil (U) residue at the 5′ terminal position, and sRNAs with a 5′ U preferentially associate with AGO1 [28]. Similarly, cleavage-competent AGO1 appears to preferentially cleave mRNA substrates after an adenine (A) residue [26]. Furthermore, we have demonstrated in plants [20] that more stable dsRNA base pairing at amiRNA position 19 is preferred, ensuring that the amiRNA guide strand, rather than the other strand of the amiRNA duplex (the amiRNA* passenger strand), is preferentially loaded into the AGO1-catalyzed RNA-induced silencing complex (RISC) to direct highly efficient RNA silencing. If a putative amiRNA target sequence with G/C, T, and A residues at positions 1, 10, and 19, respectively, cannot be identified within your target transcript cDNA sequence, select an alternative target sequence with other residues at position 10 or 19 while maintaining the G/C requirement at position 1. This approach still ensures that the amiRNA guide strand, rather than the amiRNA* passenger strand, is preferentially loaded into AGO1-catalyzed RISC [20]. 5.
In practice, we have found that for Arabidopsis genes, 21-nt amiRNA guide strand sequences (the reverse complement of the selected 21-nt amiRNA target sequence) specific to the transcript to be targeted for amiRNA-directed RNA silencing are readily identified. If a small group of closely related genes is to be silenced instead, we recommend selecting the “shared” 21-nt amiRNA target sequence whose corresponding amiRNA sequence returns the lowest number of possible off-targets in BLAST searches. 6. Three mismatched base pairs are introduced into the design of the amiRNA forward primer (which corresponds to the mature amiRNA* strand) to retain the endogenous dsRNA structure of the miR159b/miR159b* duplex in the modified amiRNA precursor fragment. This ensures that the modified amiRNA precursor fragment is still recognized and processed by the endogenous protein machinery of the Arabidopsis miRNA biogenesis pathway. 7. Because the amiRNA forward and reverse primers (65-nt and 61-nt, respectively) are considerably longer than standard DNA oligonucleotides, we recommend ordering them from a supplier of high-quality oligonucleotides to minimize sequence errors introduced during synthesis. 8. After the DNA fragments have been linearized, incubating the purified fragments at 65 °C for 5 min removes any secondary structure that may inhibit subsequent molecular manipulations (such as ligations). Immediately transferring the reaction mixture to ice for a further 5 min keeps the nucleic acids in a denatured state. 9. Both the pGEM®-T Easy Cloning vector and the pBlueGreen plant expression vector contain the lacZ gene, into which the PCR-amplified, or LguI-digested, amiRNA precursor fragment is inserted following ligation.
Because insertion disrupts lacZ, colonies harboring the inserted fragment are white rather than blue on X-gal/IPTG plates, which allows rapid visual selection of insert-positive colonies for subsequent plasmid preparation. 10. We strongly recommend sequencing all insert-positive plasmid preparations (those containing the amiRNA precursor fragment) before proceeding past this stage of amiRNA plant expression vector construction. Sequence alterations could change the dsRNA structure, which may in turn lead to inefficient processing of the modified amiRNA precursor transcript by the protein machinery of the Arabidopsis miRNA biogenesis pathway. The pGEM®-T Easy Cloning vector carries both the M13 forward and reverse primer binding sites, allowing inserted fragments to be sequenced in either or both directions. 11. Once completely digested, the pGEM-T:precursor-amiRNA vector yields LguI restriction fragments of 250 bp, 378 bp, and 3,000 bp. LguI digestion of pBlueGreen yields two restriction fragments of 700 bp and >10 kb. Do not use any plasmid preparation in subsequent molecular manipulations if it produces LguI restriction fragments differing in size or number from those listed here. 12. The LguI-digested amiRNA precursor fragment can insert into the similarly digested pBlueGreen plant expression vector in either the sense (5′–3′) or antisense (3′–5′) orientation. BamHI digestion can be used to determine the orientation of the insert. Plasmid preparations containing the amiRNA precursor fragment in the antisense orientation release a smaller, 376 bp BamHI restriction fragment than those harboring it in the desired sense orientation (which release a 440 bp BamHI restriction fragment).
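The orientation call described in this note amounts to comparing the size of the diagnostic BamHI band with the two expected sizes. A minimal Python sketch follows; the ±25 bp tolerance for gel-based size estimates and the function names are assumptions for illustration, not part of the protocol.

```python
# Expected diagnostic BamHI fragment sizes (Section 3.4, step 12; Note 12).
SENSE_BP = 440      # insert in the desired sense (5'->3') orientation
ANTISENSE_BP = 376  # insert in the antisense (3'->5') orientation


def call_orientation(observed_bp: float, tolerance_bp: float = 25.0) -> str:
    """Classify a pBlueGreen clone from its diagnostic BamHI fragment size.

    Returns "sense", "antisense", or "unclear" when the band lies more than
    tolerance_bp (an assumed gel-sizing tolerance) from both expected sizes.
    """
    d_sense = abs(observed_bp - SENSE_BP)
    d_anti = abs(observed_bp - ANTISENSE_BP)
    if min(d_sense, d_anti) > tolerance_bp:
        return "unclear"
    return "sense" if d_sense < d_anti else "antisense"


def keep_sense_clones(band_sizes: dict) -> list:
    """Return the names of clones whose BamHI band indicates a sense insert."""
    return [clone for clone, size in band_sizes.items()
            if call_orientation(size) == "sense"]


if __name__ == "__main__":
    gel = {"clone1": 445, "clone2": 380, "clone3": 610}  # hypothetical estimates
    print(keep_sense_clones(gel))  # -> ['clone1']
```

The tolerance is kept below half the 64 bp gap between the two expected bands, so a band can never be assigned to both orientations; bands in between are flagged as unclear and would warrant re-running the gel.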
Discard all plasmid preparations with the amiRNA precursor fragment in the antisense orientation, and continue screening additional white, kanamycin-resistant colonies until a plasmid preparation containing the insert in the desired sense orientation is isolated. 13. At this stage (once a pBlueGreen plant expression vector containing a modified amiRNA precursor fragment in the sense orientation has been identified), the modified PRI-MIR159B transcript, or the entire promoter–amiRNA precursor fragment–terminator cassette, can be PCR-amplified for transfer to a new plant expression vector of the researcher's choice for (a) tissue-specific expression, (b) stacking of multiple modified amiRNA precursor fragments (to direct RNA silencing of multiple unrelated target genes), or (c) use of a different in planta selectable marker (selection of Arabidopsis lines expressing the pBlueGreen plant expression vector is outlined in [21]). For PCR amplification of the modified PRI-MIR159B transcript, use primers pAMIR159B-F [5′-TCA (N)X ACTAGTGATTTCACTTTTGTT-3′] and pAMIR159B-R [5′-TCA (N)X TTCGAACCCAGACACTTAAAC-3′]. For PCR amplification of the entire promoter–amiRNA precursor fragment–terminator cassette (Fig. 1c), use primers p35SP-F [5′-TCA (N)X CTCGACGAATTAATTCCAATC-3′] and pOCST-R [5′-TCA (N)X CTGCAGGTCCTGCTGAGCCTC-3′]. For all four primers, “X” is the number of nucleotides (N) in the recognition sequence of the selected restriction endonuclease(s). 14. In our experience, in addition to Arabidopsis, the pBlueGreen plant expression vector directs highly efficient amiRNA-mediated RNA silencing in rice (Oryza sativa), tobacco (Nicotiana tabacum), N. benthamiana, and tomato (Solanum lycopersicum). 15. In our experience, the severity of the phenotype displayed by putative amiRNA transformant lines ranges from mild to severe.
It is therefore important to perform additional molecular analyses on a number of independent transformant lines (we suggest screening at least ten putative transformant lines) to identify those in which the integrated amiRNA plant expression vector directs highly efficient RNA silencing of the targeted gene. We suggest the following analyses: (a) Southern blot, to identify single-copy lines; (b) sRNA-specific Northern blot, to assess amiRNA accumulation in single-copy lines; and (c) RT-PCR, qRT-PCR, or high-molecular-weight Northern blot, to assess expression of the amiRNA target gene. Such an approach is especially important if silencing of the targeted gene is not expected to produce a readily observable developmental phenotype.
References
1. Reinhart BJ et al (2002) MicroRNAs in plants. Genes Dev 16:1616–1626
2. Adenot X et al (2006) DRB4-dependent TAS3 trans-acting siRNAs control leaf morphology through AGO7. Curr Biol 16:927–932
3. Borsani O et al (2005) Endogenous siRNAs derived from a pair of natural cis-antisense transcripts regulate salt tolerance in Arabidopsis. Cell 123:1279–1291
4. Onodera Y et al (2005) Plant nuclear RNA polymerase IV mediates siRNA and DNA methylation-dependent heterochromatin formation. Cell 120:613–622
5. Pontes O et al (2006) The Arabidopsis chromatin-modifying nuclear siRNA pathway involves a nucleolar RNA processing center. Cell 126:79–92
6. Boutet S et al (2003) Arabidopsis HEN1: a genetic link between endogenous miRNA controlling development and siRNA controlling transgene silencing and virus resistance. Curr Biol 13:843–848
7. Dunoyer P et al (2004) Probing the microRNA and small interfering RNA pathways with virus-encoded suppressors of RNA silencing. Plant Cell 16:1235–1250
8. Sunkar R, Zhu JK (2004) Novel and stress-regulated microRNAs and other small RNAs from Arabidopsis. Plant Cell 16:2001–2019
9. Mallory AC, Vaucheret H (2006) Functions of microRNAs and related small RNAs in plants. Nat Genet 38:S31–S36
10. Park W et al (2002) CARPEL FACTORY, a Dicer homolog, and HEN1, a novel protein, act in microRNA metabolism in Arabidopsis thaliana. Curr Biol 12:1484–1495
11. Golden TA et al (2002) SHORT INTEGUMENTS1/SUSPENSOR1/CARPEL FACTORY, a Dicer homolog, is a maternal effect gene required for embryo development in Arabidopsis. Plant Physiol 130:808–822
12. Gasciolli V et al (2005) Partially redundant functions of Arabidopsis DICER-like enzymes and a role for DCL4 in producing trans-acting siRNAs. Curr Biol 15:1494–1500
13. Xie Z et al (2005) DICER-LIKE4 functions in trans-acting small interfering RNA biogenesis and vegetative phase change in Arabidopsis thaliana. Proc Natl Acad Sci U S A 102:12984–12989
14. Xie Z et al (2004) Genetic and functional diversification of small RNA pathways in plants. PLoS Biol 2:e104
15. Smith NA et al (2000) Total silencing by intron-spliced hairpin RNAs. Nature 407:319–320
16. Stoutjesdijk PA et al (2004) hpRNA-mediated targeting of the Arabidopsis FAD2 gene gives highly efficient and stable silencing. Plant Physiol 129:1723–1731
17. Jackson AL, Linsley PS (2004) Noise amidst the silence: off-target effects of siRNAs? Trends Genet 20:521–524
18. Xu P et al (2006) Computational estimation and experimental verification of off-target silencing during posttranscriptional gene silencing in plants. Plant Physiol 142:429–440
19. Senthil-Kumar M, Mysore KS (2011) Caveat of RNAi in plants: the off-target effect. Methods Mol Biol 744:13–25
20. Eamens AL et al (2009) The Arabidopsis thaliana double-stranded RNA binding protein DRB1 directs guide strand selection from microRNA duplexes. RNA 15:2219–2235
21. Eamens AL et al (2011) Efficient silencing of endogenous microRNAs using artificial microRNAs in Arabidopsis thaliana. Mol Plant 4:157–170
22. Eamens AL, Waterhouse PM (2011) Vectors and methods for hairpin RNA and artificial microRNA-mediated gene silencing in plants. Methods Mol Biol 701:179–197
23. Parizotto EA et al (2004) In vivo investigation of the transcription, processing, endonucleolytic activity, and functional relevance of the spatial distribution of a plant miRNA. Genes Dev 18:2237–2242
24. Alvarez JP et al (2006) Endogenous and synthetic microRNAs stimulate simultaneous, efficient, and localized regulation of multiple targets in diverse species. Plant Cell 18:1134–1151
25. Niu QW et al (2006) Expression of artificial microRNAs in transgenic Arabidopsis thaliana confers virus resistance. Nat Biotechnol 24:1420–1428
26. Schwab R et al (2006) Highly specific gene silencing by artificial microRNAs in Arabidopsis. Plant Cell 18:1121–1133
27. Qu J, Ye J, Fang R (2007) Artificial miRNA-mediated virus resistance in plants. J Virol 81:6690–6699
28. Mi S et al (2008) Sorting of small RNAs into Arabidopsis Argonaute complexes is directed by the 5′ terminal nucleotide. Cell 133:1–12
Chapter 12
Generation and Identification of Arabidopsis EMS Mutants
Li-Jia Qu and Genji Qin
Abstract
EMS mutant analysis is a routine approach for identifying new players in a specific biological process or signaling pathway by forward genetics. It begins with the generation of mutants by treating Arabidopsis seeds with EMS. A mutant with a phenotype of interest (mpi) is obtained by screening plants of the M2 generation under a specific condition. Once the phenotype of the mpi is confirmed in the next generation, map-based cloning is performed to locate the mpi mutation. During map-based cloning, mpi plants (Arabidopsis Columbia-0 (Col-0) ecotype background) are first crossed with the Arabidopsis Landsberg erecta (Ler) ecotype, and the presence or absence of the phenotype in the F1 hybrids indicates whether the mpi is recessive or dominant.
F2 plants with phenotypes similar to the mpi (if the mpi is recessive), or those without the phenotype (if the mpi is dominant), are used as the mapping population. As few as 24 such plants are selected for rough mapping. After a marker (MA) linked to the mpi locus or mutant phenotype has been found, more markers near MA are tested to identify recombinants, which define the interval in which the mpi is located. Additional recombinants and molecular markers are then used to narrow down the interval. This process is repeated until no further recombinants or molecular markers are available. The genes in the final mapping interval are then sequenced to find the mutation. Finally, the wild-type or mutated gene is cloned into binary constructs. Complementation or recapitulation provides the most convincing evidence that the identified mutation causes the phenotype of the mpi. Here, we describe the procedures for generating mutants with EMS and for identifying EMS mutations by map-based cloning. Key words Arabidopsis, EMS mutagenesis, Forward genetics, Map-based cloning, F2 mapping population, Molecular marker 1 Introduction Forward genetics has proven to be a powerful tool for identifying the components of a specific biological process or signal transduction pathway [1]. A major advantage of forward genetics is that it requires no prior assumptions about which genes are involved, so no candidate-gene bias is introduced. Forward genetics starts with a mutant with a phenotype of interest (mpi) [1]. By identifying such mutants, we may find new components of the biological process under study. T-DNA insertion mutants and mutants induced by chemical mutagens such as ethyl methanesulfonate (EMS) are the most widely used in forward genetics [2].
Jose J. Sanchez-Serrano and Julio Salinas (eds.), Arabidopsis Protocols, Methods in Molecular Biology, vol.
1062, DOI 10.1007/978-1-62703-580-4_12, © Springer Science+Business Media New York 2014
Compared with T-DNA insertion mutants, EMS mutants have certain advantages. First, EMS mutants are easier to generate than T-DNA mutants. Second, large numbers of EMS-mutagenized seeds can be obtained for screening under a specific condition. Third, EMS may produce missense mutations, yielding weak alleles of essential genes [3]. By analyzing EMS mutants, we can therefore not only identify gene functions but also understand the role of specific amino acids in protein function. EMS preferentially alkylates guanine (G) to form O6-ethylguanine, which pairs with thymine (T) rather than with cytosine (C). During subsequent DNA replication, the original G/C pair is thus replaced by an A (adenine)/T pair. As a result, 99 % of EMS mutations are C-to-T changes, i.e., C/G-to-T/A substitutions [3, 4]. To saturate the Arabidopsis genome with EMS mutations, about 125,000 Arabidopsis seeds (≈2.5 g) are required for mutagenesis [5]. However, because EMS causes multiple point mutations in each plant, as few as 5,000 plants are enough to find a mutation in a given gene [6]. After obtaining sufficient M2 seeds, one can screen them under a specific condition to find mutants affected in a specific biological process [1, 7]. Once an mpi is obtained, its phenotype should be confirmed in the next generation. If the phenotype is verified, the mpi is crossed with the Arabidopsis Ler ecotype, and the F1 plants are grown to generate F2 seeds. At the same time, the presence or absence of the phenotype in the F1 generation indicates whether the mpi is dominant or recessive. This information determines which plants to select in the F2 generation for mapping. A high density of molecular markers is essential for high-resolution mapping [6].
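Because nearly all EMS-induced changes are G/C-to-A/T transitions, as described above, a candidate mutation found when the genes in the final mapping interval are sequenced can be checked against this spectrum. The Python sketch below is illustrative only: it assumes two gap-free, same-strand sequences of equal length (a wild-type Col-0 reference and the mpi), and the function names are hypothetical; real data would first be aligned with dedicated tools.

```python
# On a single strand, a G/C-to-A/T transition is read as G->A or C->T.
EMS_CHANGES = {("G", "A"), ("C", "T")}


def is_ems_type(ref: str, alt: str) -> bool:
    """True if ref->alt is a G->A or C->T change."""
    return (ref.upper(), alt.upper()) in EMS_CHANGES


def ems_candidates(wild_type: str, mutant: str) -> list:
    """List (position, ref, alt) tuples for EMS-type differences (1-based).

    Assumes the two sequences are already aligned without gaps and are
    given on the same strand.
    """
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must be aligned and of equal length")
    hits = []
    for pos, (ref, alt) in enumerate(zip(wild_type.upper(), mutant.upper()), start=1):
        if ref != alt and is_ems_type(ref, alt):
            hits.append((pos, ref, alt))
    return hits


if __name__ == "__main__":
    wt = "ATGGCCTGA"   # hypothetical Col-0 sequence
    mpi = "ATGACCTAA"  # hypothetical mutant sequence
    print(ems_candidates(wt, mpi))  # -> [(4, 'G', 'A'), (8, 'G', 'A')]
```

Differences that are not G->A or C->T are deliberately not returned; such changes are unlikely to be EMS-induced and may deserve separate inspection.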
Arabidopsis ecotypes, including Col-0 and Ler, show abundant sequence divergence that supports the design of high-density molecular markers [6, 8]. The combination of Col-0 and Ler is the most widely used for mapping [6, 9]. The genome sequences of these two ecotypes are available in public databases, which further facilitates the design of molecular markers [9]. The most commonly used molecular markers in Arabidopsis mapping are insertion/deletion (InDel) markers based on simple sequence length polymorphisms (SSLP), and cleaved amplified polymorphic sequence (CAPS) and derived CAPS (dCAPS) markers based on single nucleotide polymorphisms (SNP) [10, 11]. These are all PCR-based markers and are thus easy to use and affordable. Many InDel markers have been developed by different research groups, so little effort is required to design molecular markers in the post-genome era [6, 9–11]. Mapping the mutation includes rough-mapping and fine-mapping stages [6, 9]. Both stages involve similar procedures, including the following steps: (1) growing F2 plants; (2) observing phenotypes, that is, finding plants with the phenotype if the mpi is recessive or plants without the phenotype if the mpi is dominant; (3) finding or designing molecular markers; (4) testing these molecular markers; (5) finding recombinants for the markers; and (6) determining the mpi mapping interval. When no further markers or recombinants are available in a mapping interval, the main effort shifts to sequencing the genes within the interval until the mutation is found. Complementation or recapitulation is then required to confirm that the identified mutation indeed causes the phenotype of the mpi. In this chapter, we describe in detail three procedures used in our lab. The first is the generation of mutants with the EMS mutagen. The second is how to map and isolate the mutation that leads to the phenotype of interest in the mpi.
The third is complementation and recapitulation analysis. Refinements of some steps of these procedures are described in the Notes. 2 Materials 2.1 EMS Mutagenesis of Arabidopsis Seeds 1. 2.5 g of Arabidopsis seeds (about 125,000 seeds) [3]. 2. Freshly made 20 % bleach. 3. Ethyl methanesulfonate (EMS) stock solution (Sigma M0880). 4. 10 M NaOH. 5. Solid MS medium or 0.1 % agar. 6. Sterilized water. 7. Disposable 50 mL plastic tubes. 8. Micropipette. 9. Parafilm. 10. Rotator. 11. Fume hood. 2.2 Mapping of the mpi Locus 2.2.1 Preparation of the Mapping Population 1. MS medium. 2. Freshly made 20 % bleach. 3. Seeds of the mpi (Arabidopsis ecotype Col-0 background). 4. Seeds of Arabidopsis ecotype Ler. 5. Micropipette. 6. Sterilized 1.5 mL microcentrifuge tubes and tips. 7. Forceps and scissors. 8. Dissecting microscope. 9. Labeling tape. 2.2.2 DNA Preparation Using CTAB 1. CTAB buffer: 2 % (w/v) cetyltrimethylammonium bromide (CTAB), 100 mM Tris, 20 mM EDTA, and 1.4 M NaCl (see Note 1) [12]. 2. Absolute ethanol and 70 % ethanol (prechilled in a −20 °C freezer). 3. Chloroform/isoamyl alcohol (24:1). 4. Sterilized ddH2O. 5. 1 % agarose gel, 6× Loading Buffer, and 1× TAE buffer. 6. Liquid nitrogen. 7. Sterile 1.5 mL microcentrifuge tubes and tips. 8. Plastic tissue grinding pestles. 9. Micropipette. 10. 65 °C water bath. 11. Microcentrifuge. 12. Vortex mixer. 2.2.3 Rough Mapping of the mpi Locus 1. Labels. 2. PCR machine (thermocycler). 3. Sterilized 1.5 mL microcentrifuge tubes. 4. Sterilized PCR plates. 5. PCR reagents including PCR buffer, 2.5 mM dNTP mixture, marker primers, Taq DNA polymerase, and sterilized ddH2O. 6. 4 % agarose gel. 7. Agarose gel electrophoresis system. 2.2.4 Fine Mapping of the mpi Locus 1. Labels. 2. PCR machine (thermocycler). 3. Sterilized 1.5 mL microcentrifuge tubes. 4. Sterilized PCR plates. 5. PCR reagents including PCR buffer, 2.5 mM dNTP mixture, marker primers, Taq DNA polymerase, and sterilized ddH2O. 6.
Specific restriction endonucleases for CAPS markers. 7. Microcentrifuge. 8. A computer with an internet connection. 9. Primer design software. 10. Incubator. 11. 4 % agarose gel. 12. Agarose gel electrophoresis system. 2.3 Complementation and Recapitulation Analysis 1. Plasmid DNA of a plant binary vector containing the CaMV 35S promoter. 2. Competent cells of Agrobacterium tumefaciens strain GV3101 (pMP90). 3. LB broth and agar plates with antibiotics. 4. MS medium, sucrose, Silwet L-77. 5. Selective antibiotics or herbicides, carbenicillin. 6. Sterilized 50 and 1,000 mL flasks. 7. Sterilized 500 mL centrifuge bottles. 8. 28 °C incubator and shaker. 9. MicroPulser™ Electroporation Apparatus (Bio-Rad) or other electroporator. 10. Ice-cold water bath. 11. Micropipette, microcentrifuge tubes, and tips. 12. Microcentrifuge. 13. Silica-gel desiccant. 3 Methods 3.1 EMS Mutagenesis of Arabidopsis Seeds 1. Weigh out 2.5 g of Arabidopsis Col-0 seeds and put them into a 50 mL disposable plastic tube (see Note 2). 2. Prepare 20 % bleach with sterilized water and add 40 mL to the tube. Seal the tube with Parafilm and rotate for 10–15 min on the rotator. Spin the tube briefly and remove the bleach solution. 3. Wash the seeds with sterilized water 3–4 times. Add 40 mL of sterilized water, seal the tube with Parafilm, and place it on the rotator. Keep rotating overnight at room temperature. 4. Add 120 μL of EMS stock solution to the tube to give a final EMS concentration of 0.3 % (v/v). Continue to rotate the tube for about 12 h in a fume hood at room temperature (see Note 3). 5. Transfer the EMS solution to a waste container, add 4 mL of 10 M NaOH, and leave it at room temperature overnight to inactivate the EMS (see Note 4). 6. Wash the seeds eight or more times with sterilized water, spinning briefly each time to pellet the seeds before discarding the water. 7.
Plate the seeds on MS medium, or mix the seeds with 0.1 % agar and pipette the mixture onto soil. We grow the plants in trays (see Note 5). 8. Harvest the M2 seeds in bulk and screen them for the mpi under a specific condition (see Note 6). 3.2 Mapping of the mpi Locus 3.2.1 Preparation of the Mapping Population 1. Harvest the seeds from the mpi and store them in a container with silica-gel desiccant. 2. To generate the mapping population, first take about 150 seeds each of the mpi (Col-0 background) and Arabidopsis ecotype Ler. Put the seeds separately into two sterilized 1.5 mL microcentrifuge tubes. Add 1.4 mL of freshly made 20 % bleach to each tube and mix for 10–15 min (see Note 7). 3. Wash the seeds 3–4 times with sterilized water and plate them onto MS medium. After stratification for 3 days at 4 °C to synchronize germination, keep the plates at 22 °C under long-day conditions (16 h light/8 h dark) for 7 days. 4. Transfer the mpi and Ler seedlings to soil and grow them at 22 °C under long-day conditions. 5. At the flowering stage, select a healthy inflorescence from the mpi or Ler plants. Remove the siliques and open flowers with scissors, and remove the small buds with forceps, keeping only 1–4 large buds on the inflorescence (see Note 8). 6. Very carefully remove the six anthers from each flower bud using the tips of the forceps. Mark the inflorescence with colored labeling tape and return the plant to normal growth conditions (see Note 9). 7. Two days after emasculation, remove an open flower with the forceps from a plant of the other parent (Ler if the mpi was emasculated, or the mpi if Ler was emasculated). Carefully rub the stigma of the emasculated flower against this detached flower, whose broken anthers have released mature pollen. Record the date of pollination on the tape. 8. Harvest the F1 seeds and dry them with silica-gel desiccant (see Note 10). 9. Grow the F1 seeds as described above. 10.
Observe the phenotype of the F1 plants, which are heterozygous for the mpi mutation, to determine whether the mutation is recessive or dominant (see Note 11). 11. Harvest leaves from the F1 plants and prepare DNA as described below. Store the DNA at −20 °C for mapping. 12. To confirm that no mistake was made during the cross and that the F1 plants are indeed Col-0 × Ler hybrids, perform PCR using 1 μL of F1 DNA as the template and test it with two InDel markers as described below (see Note 12). 13. Harvest the F2 seeds from each confirmed F1 plant individually. Dry and store the seeds with silica-gel desiccant at 4 °C. 14. Grow the F2 seeds as described above. These F2 plants constitute the population used for mapping the mpi locus (see Note 13). 3.2.2 DNA Preparation Using CTAB 1. Observe the phenotypes of the plants in the segregating F2 mapping population. For DNA preparation, select plants with the phenotype of interest if the mpi is recessive, or plants without the phenotype of interest if the mpi is dominant. 2. Harvest about 50–100 mg of leaf tissue (or one medium-sized leaf) into a 1.5 mL microcentrifuge tube. Grind the tissue to a fine powder in liquid nitrogen using a plastic tissue grinding pestle. 3. Add 400 μL of 2 % CTAB extraction buffer preheated to 65 °C and mix well using the pestle. 4. Incubate the tube in a 65 °C water bath for 10 min to 2 h, mixing every 10–30 min. 5. Add 400 μL of chloroform/isoamyl alcohol (24:1) and vortex vigorously. 6. Centrifuge at 11,340 × g for 10 min at room temperature. 7. Carefully transfer about 300 μL of the upper aqueous phase to a new tube (see Note 14). 8. Add 600 μL of absolute ethanol prechilled to −20 °C and mix well by inverting. Place the tube at −20 °C for at least 30 min (see Note 15). 9. Centrifuge at 11,340 × g for 10 min and discard the supernatant. 10.
Add 500 μL −20 °C prechilled 70 % ethanol to wash the DNA pellet for 5–10 min. 11. Centrifuge at11,340 × g for 10 min. Discard the supernatant carefully (see Note 16). 12. Dry the DNA pellet by inverting the tube on a paper towel (see Note 17). 13. Add 100–200 μL sterilized ddH2O to dissolve the DNA. 3.2.3 Rough Mapping of the mpi Locus 1. Observe the phenotypes of the plants from the F2 mapping population. Calculate the segregation rates of the phenotype of interest. The segregation ratio of the phenotype for a recessive mpi should be 3:1, whereas the segregation ratio for a dominant mpi should be 1:3 (see Note 18). 2. Choose about 24 plants with the phenotype of interest if the mpi is a recessive mutant or without the phenotype of interest if the mpi is a dominant mutant for rough mapping. Number the 24 plants and prepare DNA from these plants as described above. 3. Select 10 InDel markers distributed on the ten Arabidopsis chromosome arms for rough mapping (see Note 19). 232 Li-Jia Qu and Genji Qin 4. Perform primary reactions in total volumes of 10 μL. First, make up the master mixtures for the PCR for each marker in sterile 1.5 mL microcentrifuge tubes. Write the marker’s names on the tubes. Each reaction contains the following reagents for preparing the mixture: 1 μL of 10× PCR buffer with MgCl2, 0.8 μL of 2.5 mM dNTPs mixture, 0.1 μL of each of the 10 μM marker primer pair, 0.5 U of Taq DNA polymerase, and ddH2O to make up to 10 μL. Briefly mix and centrifuge. 5. Allocate 9 μL PCR master mixture to each well of the PCR plate. Write down the marker’s name on the plate. Add 1 μL of DNA each from the Col-0 and Ler F1 hybrid and the selected 24 plants to separate wells of the PCR plate. Remember the order of the samples in the wells. Put the PCR plate on ice. 6. Set up the thermocycling program for PCR as follows: 94 °C for 2 min, 45 cycles of (94 °C for 10 s, 58 °C for 15 s, 72 °C for 30 s), and 72 °C for 5 min (see Note 20). 7. 
Place the PCR plate into the block and run the program on the thermocycler. 8. After it has finished, electrophorese the PCR products on the 4 % agarose gel or store them at −20 °C if not doing this immediately (see Note 21). 9. Check the success of the PCR and gel electrophoresis by observing the separated DNA bands from the control Col-0 and Ler F1 hybrid. 10. Calculate the segregation ratios of the markers among the 24 plants. If the segregation ratio of a particular marker is about 1:2:1 (plants from which only the Ler band is amplified/plants from which both bands are amplified/plants from which only the Col-0 band is amplified), we can conclude that the marker is not linked to the mpi locus. If there is a distortion of the 1:2:1 segregation ratio, the marker is possibly linked to the mpi locus. We name the linked marker “MA” (Fig. 1). 11. Choose another marker (named MB) near marker MA. The distance between markers MA and MB is about 3–5 BAC in length (Fig. 1). Perform PCR as described above and calculate the segregation ratio of marker MB. A distorted segregation ratio of marker B confirms that the mpi locus is linked to markers MA and MB. Calculate the number of recombinants including plants from which only a Col-0 or Ler band is amplified and those from which both bands are amplified. The number of recombinants demonstrates how closely the marker is linked to the mpi locus; the fewer the number, the closer the marker (see Note 22). 12. Taking markers MA and MB as the center, select another 4–8 markers (e.g., MC, MD, ME, MF) distributed evenly at both ends of the center. The number of markers is selected according Generation and Identification of EMS Mutants 233 Fig. 1 Schematic representation of the molecular markers distributed in one chromosome region. Left panel, distribution of markers used in rough mapping, with the recombinants listed beside the markers. The mapping interval is the region between marker MA and MC (left). 
Middle panel, distribution of markers used in the first round of fine mapping. The mapping interval is the region between marker MAx and MCx (middle). Right panel, the distribution of markers used in the second round of fine mapping. The mapping interval is the region between marker MAxx and MCxx (right) to how closely MA and MB are linked to the mpi locus. Perform PCR again as described above. Calculate the number of recombinants for the additional markers MC, MD, ME, and MF, for example (Fig. 1). 13. Draw a chromosome fragment representation and arrange the tested markers in the chromosome region. Write the numbers 234 Li-Jia Qu and Genji Qin of the recombinants alongside the marker names (Fig. 1). Locate the mpi locus between two markers; for instance, MA and MC, which are used as examples in the following section (see Note 23). 3.2.4 Fine Mapping of the mpi Locus 1. Keep the DNA of recombinants for markers MA and MC (see Note 24). 2. Identify about 150 plants for further mapping that display the phenotype of interest if the mpi is a recessive mutant or those without the phenotype of interest if the mpi is a dominant mutant. Number these plants and prepare DNA as described above (see Note 25). 3. Use markers MA and MC to screen these plants for recombinants. Select recombinants for MA and MC for further analysis (see Note 26). 4. Design markers between MA and MC. We name them MA1, MA2, etc., up to MAx and MC1, MC2, etc., up to MCx. The distribution of the markers should be dense in the middle and sparse near either MA or MC (Fig. 1). 5. Use the DNA of the recombinants for MA to test MA1, MA2, etc., until no recombinants are found when testing MAx. Use the DNA of the recombinants for MC to test MC1, MC2, etc., until no recombinants are found when testing MCx. The mpi locus is now narrowed to between MAx and MCx. 6. If not many genes are located between MAx and MCx (Fig. 1), perform bioinformatic analysis of these genes to find candidates for MPI. 
Sequence the candidate genes to look for mutations (see Note 27).
7. At the same time, continue to identify more plants that display the phenotype of interest if the mpi is recessive, or that lack it if the mpi is dominant, to narrow down further the interval containing the mpi locus. Number these plants and prepare DNA from them.
8. Screen these plants for recombinants using markers MAx and MCx (see Note 28).
9. Design markers between MAx and MCx, named MAx1, MAx2, and so on up to MAxx, and MCx1, MCx2, and so on up to MCxx (Fig. 1).
10. Use these markers to test the recombinants until no recombinants are found with MAxx and MCxx. The mpi locus is now narrowed to the interval between MAxx and MCxx (Fig. 1).
11. If only a few genes are located between MAxx and MCxx (Fig. 1), the interval may be hard to narrow down further (see Note 29).
12. Sequence all genes between MAxx and MCxx to find the mutation in the mpi mutant (see Note 30).
3.3 Complementation and Recapitulation Analysis
1. Clone the wild-type gene corresponding to the mutated gene if the mpi is recessive, or clone the mutated gene if the mpi is dominant. Generate binary constructs in which the wild-type or mutated gene is driven by the CaMV 35S promoter (see Note 31).
2. Introduce the construct into A. tumefaciens strain GV3101. Transform the wild-type gene into the mpi mutant for complementation if the mpi is recessive, or transform the mutated gene into wild-type plants for recapitulation if the mpi is dominant.
3. Harvest the seeds from the transformed (T0) plants. Screen for transformants on 1/2 MS medium containing the appropriate selective antibiotic or herbicide.
4. Observe the phenotype of the T1 transformants. In complementation analysis, if the transformants regain a phenotype similar to that of the wild type, the mutant is complemented by the wild-type gene.
In recapitulation analysis, if the transformants display a phenotype similar to that of the mutant, the dominant mutant phenotype is caused by the mutation (see Note 32).
4 Notes
1. To prepare 100 mL of 2 % CTAB extraction buffer, combine 2 g CTAB, 10 mL 1 M Tris–HCl pH 8.0, 4 mL 0.5 M EDTA pH 8.0, and 8.19 g NaCl, and add water to a final volume of 100 mL. Sterilize and store at room temperature. Before use, preheat to 65 °C and add 0.2–0.5 % β-mercaptoethanol.
2. As few as 5,000 seeds can be used for mutagenesis because EMS induces multiple point mutations in each plant [9]. We usually treat 125,000 seeds at a time.
3. The EMS concentration may be varied in the range of 0.1–0.3 %. Higher EMS concentrations may give higher mutation rates but also cause more lethal mutations. EMS is a hazardous chemical, so mutagenesis should be carried out in a fume hood.
4. The EMS solution is hazardous. After mutagenesis, the EMS solution must be inactivated with NaOH. The NaOH-treated EMS solution should be disposed of regularly down the fume hood sink.
5. The trays are kept at 22 °C under 16 h light and 8 h dark. If fewer seeds were mutagenized, they may be plated on MS medium and the green seedlings then transferred to soil.
6. About 3–5 g of M2 seeds may be harvested from one tray. If fewer seeds were treated, the seeds may be harvested from individual plants. The M2 seeds can be stored at room temperature with silica-gel desiccant for up to 1 year. For long-term storage, keep the seeds at 4 °C with silica-gel desiccant.
7. Different Arabidopsis ecotypes are crossed to make the F2 mapping population because their sequences naturally diverge [6]. Both the Col-0 and Ler genomes have been sequenced and are available online: the Col-0 sequence on the NCBI website and the Ler sequence on the TAIR website (http://www.arabidopsis.org/browse/Cereon/index.jsp).
The Col-0 and Ler genomes differ at about 4–11 positions per kb [8, 9]. This abundance of sequence differences makes it possible to design sufficiently dense molecular markers for mapping, so we usually cross Col-0 with Ler to generate the mapping population.
8. Select flower buds that are as large as possible for crossing, provided that the anthers have not yet broken open to release mature pollen. Flower buds that are too small are hard to manipulate. Removing the siliques and opened flowers prevents interference with the crossed siliques and also directs more nutrients to the crossed seeds.
9. If Ler is used as the female parent, remove the six anthers from the flower bud directly under a dissecting microscope, because in this ecotype the anthers are not enclosed by the petals and sepals. If the mpi (Col-0 background) is used as the female parent, the petals and sepals enclose the anthers, so first press the tip of the flower bud to open it and then remove the six anthers. Be careful not to damage the gynoecium.
10. The seeds can usually be harvested 2 weeks after pollination.
11. Determining whether the mpi is recessive or dominant is critical for the subsequent mapping. In the F1 generation, if the heterozygous plants display a phenotype similar to that of the mpi, the mutation is dominant; if they show no phenotype, it is recessive.
12. DNA from F1 plants can also be used as a control when testing mapping markers by PCR, because at least two bands, one from each parent, are amplified from it. When running the PCR products during mapping, we load the products amplified from F1 DNA in place of DNA size markers.
13. Recombination rates vary across the genome. In Arabidopsis, 1 cM corresponds to a physical distance of 100–400 kb, with an average of 250 kb [6]. In the centromeric regions, however, 1 cM corresponds to about 1,000–2,500 kb [13].
It is therefore hard to know in advance how many F2 plants should be grown for mapping; the choice is a balance between time and labor. If time is critical, grow enough plants (2,000–4,000) for mapping at the outset. If labor and space are limiting, grow about 600 plants first.
14. Traces of chloroform inhibit the PCR. To avoid contamination, leave about 100 μL of the aqueous phase behind and transfer only 300 μL to the new tube.
15. It is not necessary to add sodium acetate to the aqueous phase before ethanol precipitation, because the CTAB extraction buffer already contains NaCl.
16. Discard the supernatant gently to avoid losing the DNA pellet.
17. Do not overdry the DNA pellet; overdried pellets are hard to dissolve.
18. The mpi locus can be mapped as described here only if the segregation ratio conforms to Mendel's principles.
19. Molecular markers based on sequence differences between the Arabidopsis Col-0 and Ler ecotypes are essential for mapping. The most widely used markers are InDels, SSLPs, CAPS, and dCAPS [6, 9–11]. These markers have two advantages for mapping. First, they are all PCR-based and therefore easy to use and affordable. InDels are the most convenient because they require only ordinary PCR followed by separation of the products on a high-concentration agarose gel. CAPS markers require an additional restriction digestion step between PCR and electrophoresis. dCAPS markers are used in the same way as CAPS markers, except that the PCR products are sometimes less specific because of the mismatches built into the primers. Second, they are codominant markers: different products are amplified from the Col-0 and Ler chromosomes, and these distinct products can be distinguished on an agarose gel.
Many InDel markers have been developed by different research groups, so little effort is required to design primers for rough mapping. For example, our group has recommended 25 InDel markers for rough mapping [10]; they are easy to use and evenly distributed across the five Arabidopsis chromosomes. As the mapping interval narrows, finding and designing good markers becomes important. The "Monsanto Arabidopsis Polymorphism and Ler Sequence Collections" on the TAIR website (http://www.arabidopsis.org/browse/Cereon/index.jsp) support this process [6]. Sometimes the sequence differences between Col-0 and Ler must be found manually using BLAST.
20. Usually 30 cycles are enough to amplify the products, but some primers have low amplification efficiency. We therefore run 45 cycles when testing markers.
21. A 4 % agarose gel is a high-concentration gel. To prepare it, weigh 4 g of agarose powder into a 500 mL flask containing 100 mL of cold TAE buffer. Heat the flask and swirl the solution until the powder is completely dissolved. The gel can be reused. A 3 % high-resolution Metaphor gel is also a good choice for separating the PCR products.
22. When counting recombinants, plants from which both bands are amplified are always recombinants. Single-band recombinants, however, depend on the genetic situation. If the mpi is a recessive mutant in the Col-0 background or a dominant mutant in the Ler background, plants from which only the Ler band is amplified are recombinants. If it is a dominant mutant in the Col-0 background or a recessive mutant in the Ler background, plants from which only the Col-0 band is amplified are recombinants.
23. Once the recombinants have been counted, we place the mpi locus between the two markers that have the fewest recombinants.
If the mapping is accurate, the number of recombinants decreases for markers approaching the mpi locus from either side. That is, on one side of the locus, plants that are recombinant for a marker far from the mpi locus may be nonrecombinant for a marker closer to it. For example, in Fig. 1 the number of recombinants decreases from marker MG to MA and from ME to MC, so the mpi locus lies between markers MA and MC. Plants No. 2 and No. 24 are recombinants for MB but not for MA.
24. Recombinants for more distant markers remain useful for narrowing down the mapping interval when testing nearer markers, until they become nonrecombinants for the nearest marker.
25. When no usable recombinants remain, screen more plants with the nearest marker to find additional recombinants and narrow down the mapping interval.
26. If the nearest marker is difficult to use, a more convenient neighboring marker can be used for the screening; the recombinants obtained can then be tested with the nearest marker.
27. Sometimes a known mutant with a phenotype similar to that of the mpi has been described and the corresponding gene lies in the mapping interval. In this case, first sequence that gene to determine whether the mpi is an allele of the known mutant. The domains and structures of the proteins encoded by genes in the mapping interval, together with the phenotype of the mpi, may also point to the most probable candidate gene.
28. This is another round of narrowing down the mapping interval.
29. As the mapping interval becomes smaller, useful recombinants become harder to find and more plants may need to be screened. Good markers are also harder to find. In this situation, most of the effort may shift to sequencing the genes in the mapping interval.
30. The coding regions of the most probable genes, judged from published information, the characteristics of the genes, and the phenotype of the mpi, are sequenced first. If no mutations are found, the coding regions of less probable genes are sequenced, followed by noncoding regions if mutations are still not found. Of the mutations induced by EMS, 99 % are C/G to T/A substitutions [3, 4].
31. Alternatively, a genomic fragment of the wild-type gene (for a recessive mutant) or of the mutated gene (for a dominant mutant), including the promoter region, coding region, and 3′ UTR, may be used for complementation or recapitulation.
32. As in the analysis of T-DNA insertion mutants, complementation or recapitulation assays provide the most convincing evidence that the mutated gene causes the phenotype of the mpi.
References
1. Page DR, Grossniklaus U (2002) The art and design of genetic screens: Arabidopsis thaliana. Nat Rev Genet 3:124–136
2. Peters JL, Cnudde F, Gerats T (2003) Forward genetics and map-based cloning approaches. Trends Plant Sci 8:484–491
3. Kim Y, Schumaker KS, Zhu JK (2006) EMS mutagenesis of Arabidopsis. Methods Mol Biol 323:101–103
4. Greene EA et al (2003) Spectrum of chemically induced mutations from a large-scale reverse-genetic screen in Arabidopsis. Genetics 164:731–740
5. Jander G et al (2003) Ethylmethanesulfonate saturation mutagenesis in Arabidopsis to determine frequency of herbicide resistance. Plant Physiol 131:139–146
6. Lukowitz W, Gillmor CS, Scheible WR (2000) Positional cloning in Arabidopsis. Why it feels good to have a genome initiative working for you. Plant Physiol 123:795–805
7. Zhang Y, Glazebrook J, Li X (2007) Identification of components in disease-resistance signaling in Arabidopsis by map-based cloning. Methods Mol Biol 354:69–78
8. Hardtke CS, Muller J, Berleth T (1996) Genetic similarity among Arabidopsis thaliana ecotypes estimated by DNA sequence comparison. Plant Mol Biol 32:915–922
9. Jander G et al (2002) Arabidopsis map-based cloning in the post-genome era. Plant Physiol 129:440–450
10. Hou X et al (2010) A platform of high-density INDEL/CAPS markers for map-based cloning in Arabidopsis. Plant J 63:880–888
11. Pacurar DI et al (2012) A collection of INDEL markers for map-based cloning in seven Arabidopsis accessions. J Exp Bot 63:2491–2501
12. Clarke JD (2009) Cetyltrimethyl ammonium bromide (CTAB) DNA miniprep for plant DNA isolation. Cold Spring Harb Protoc, pdb prot5177. doi: 10.1101/pdb.prot5177
13. Copenhaver GP, Browne WE, Preuss D (1998) Assaying genome-wide recombination and centromere functions with Arabidopsis tetrads. Proc Natl Acad Sci U S A 95:247–252
Chapter 13
Generation and Characterization of Arabidopsis T-DNA Insertion Mutants
Li-Jia Qu and Genji Qin
Abstract
Transfer DNA (T-DNA) insertion mutants are often used in forward and reverse genetics to reveal the molecular mechanisms of particular biological processes in plants. To generate T-DNA insertion mutants, T-DNA is inserted randomly into the genome through Agrobacterium tumefaciens-mediated transformation. To do so, Agrobacterium competent cells are first prepared and plasmids containing the T-DNA are introduced into them. The Agrobacterium cells carrying the T-DNA vectors are then used to transform Arabidopsis. After T-DNA insertion mutants with interesting phenotypes have been screened and identified, genomic DNA is extracted from the mutants and used to isolate the sequences flanking the T-DNA. Finally, to determine which mutated gene causes the specific phenotype of a T-DNA insertion mutant, cosegregation analysis and complementation or recapitulation analysis are needed. In this chapter, we describe detailed protocols for the generation and characterization of T-DNA insertion mutants.
Key words T-DNA insertion mutant, Floral dip, TAIL-PCR, Cosegregation, Complementation, Recapitulation
Jose J. Sanchez-Serrano and Julio Salinas (eds.), Arabidopsis Protocols, Methods in Molecular Biology, vol. 1062, DOI 10.1007/978-1-62703-580-4_13, © Springer Science+Business Media New York 2014
1 Introduction
Transfer DNA (T-DNA) insertion mutants are widely used to elucidate gene functions in genetic analyses of Arabidopsis. One advantage of T-DNA mutagenesis is that the T-DNA, whose sequence is known, serves as a tag when it disrupts or activates a gene. The gene affected by the T-DNA insertion can then be identified easily by isolating the adjacent genomic sequence, without painstaking mapping procedures. Another advantage is that the T-DNA can carry additional elements, such as multiple copies of the Cauliflower mosaic virus (CaMV) 35S promoter and reporter genes (e.g., GUS and GFP), allowing the generation of activation-tagging, promoter-trap, and enhancer-trap lines. These lines may be used to determine the function of redundant genes and to identify genes with specific expression patterns.
Agrobacterium-mediated plant transformation has been used to create T-DNA insertion mutants in Arabidopsis that have proved highly useful in forward and reverse genetics. Different methods for generating T-DNA insertion mutants and identifying the corresponding mutated genes have been reported in publications and on websites [1–17]. Here, we provide detailed procedures for the method used routinely in our laboratory. Some steps have been fine-tuned for more efficient operation; these refinements are described in the Notes section of this chapter.
2 Materials
2.1 Preparation of Agrobacterium tumefaciens Containing T-DNA Vector
2.1.1 Preparation of A. tumefaciens Competent Cells
1. Agrobacterium strain GV3101 (pMP90) glycerol stock.
2. Luria Bertani (LB) broth and agar plates supplemented with antibiotics.
3. 10 mg/mL rifampicin and 50 mg/mL gentamicin.
4. 1,000 mL of sterile, ice-cold 10 % (v/v) glycerol.
5. Liquid nitrogen.
6. Sterile 50 and 1,000 mL flasks.
7. Sterile 500 mL centrifuge bottles.
8. Ice-water bath.
9. Sterile 1.5 mL microcentrifuge tubes and tips.
10. Pipettes.
11. 28 °C incubator and shaker.
12. Spectrophotometer.
13. Refrigerated centrifuge.
2.1.2 Transformation of A. tumefaciens Competent Cells
1. Agrobacterium competent cells.
2. T-DNA plasmid DNA at about 30 ng/μL.
3. LB liquid medium and agar plates with appropriate antibiotics.
4. Sterile ddH2O.
5. Prechilled electroporation cuvettes.
6. Paper towels.
7. MicroPulser™ Electroporation Apparatus (Bio-Rad) or another electroporator.
8. 28 °C incubator and shaker.
9. Pipettes.
10. Sterile 1.5 mL microcentrifuge tubes and tips.
11. Ice-cold water bath.
2.2 Transformation of Arabidopsis
1. Agrobacterium strain GV3101 (pMP90) containing the appropriate T-DNA vector.
2. LB broth.
3. Murashige and Skoog (MS) medium.
4. Sucrose.
5. Silwet L-77.
6. Selective antibiotics or herbicides; carbenicillin.
7. Silica-gel desiccant.
8. Sterile Petri dishes; 1,000 mL flask.
9. Centrifuge; 500 mL centrifuge bottles.
10. Plant soil and fertilizer.
11. Plant pots and trays.
12. Growth chambers and greenhouse maintained at 22 °C with a 16 h light/8 h dark photoperiod.
2.3 Identification of T-DNA Insertion Site
2.3.1 Preparation of Genomic DNA from T-DNA Transgenic Plants
1. CTAB buffer: 2 % (w/v) cetyltrimethylammonium bromide (CTAB), 100 mM Tris, 20 mM EDTA, and 1.4 M NaCl (see Note 1).
2. Absolute ethanol and 70 % ethanol (prechilled at −20 °C).
3. Chloroform/isoamyl alcohol (24:1).
4. Sterile ddH2O.
5. 1 % agarose gel, 6× loading buffer, and 1× TAE buffer.
6. Liquid nitrogen.
7. Sterile 1.5 mL microcentrifuge tubes and tips.
8. Plastic tissue-grinding pestles.
9. Micropipette.
10. 65 °C water bath.
11. Microcentrifuge.
12. Vortex mixer.
13. Agarose gel electrophoresis system.
2.3.2 Identification of T-DNA Insertion Site by TAIL-PCR
1. Genomic DNA from T-DNA insertion mutants.
2. T-DNA-specific primers: 1.5 μM LS1 for the primary reaction, 2.0 μM LS2 for the secondary reaction, 2.5 μM LS3 for the tertiary reaction, and 2.5 μM LS4 for sequencing the PCR product (see Note 2).
3. 20 μM arbitrary degenerate (AD) primers (see Note 3).
4. Taq DNA polymerase, 10× PCR buffer (MgCl2-free), 25 mM MgCl2, and dNTP mixture containing 2.5 mM each of dATP, dCTP, dGTP, and dTTP.
5. Agarose gel.
6. Gel purification kit.
7. PCR thermocycler.
8. Pipettes.
9. Sterile PCR plates, microcentrifuge tubes, and tips.
10. Agarose gel electrophoresis system.
11. Microcentrifuge.
2.4 Cosegregation Analysis
1. Genomic DNA of plants from the F2 segregating population.
2. Specific primers: 10 μM P1 and P2, designed from the genomic sequence flanking the T-DNA insert.
3. Taq DNA polymerase, 10× PCR buffer (MgCl2-free), 25 mM MgCl2, and dNTP mixture containing 2.5 mM each of dATP, dCTP, dGTP, and dTTP.
4. Agarose gel.
5. PCR thermocycler.
6. Pipettes.
7. Sterile PCR plates, microcentrifuge tubes, and tips.
8. Agarose gel electrophoresis system.
9. Microcentrifuge.
2.5 Complementation and Recapitulation Analysis
1. Plasmid DNA of a plant binary vector containing the CaMV 35S promoter.
2. Competent cells of Agrobacterium strain GV3101 (pMP90).
3. LB broth and agar plates with antibiotics.
4. MS medium, sucrose, Silwet L-77.
5. Selective antibiotics or herbicides; carbenicillin.
6. Sterile 50 and 1,000 mL flasks.
7. Sterile 500 mL centrifuge bottles.
8. 28 °C incubator and shaker.
9. MicroPulser™ Electroporation Apparatus (Bio-Rad) or another electroporator.
10. Ice-cold water bath.
11. Pipettes, microcentrifuge tubes, and tips.
12. Microcentrifuge.
13. Silica-gel desiccant.
14. Sterile Petri dishes.
15. Plant soil and fertilizer, plant pots, and trays.
16. Growth chambers and greenhouse at 22 °C with a 16 h light/8 h dark photoperiod.
3 Methods
3.1 Preparation of Agrobacterium tumefaciens Containing T-DNA Vector
3.1.1 Preparation of A. tumefaciens Competent Cells
1. Streak the A. tumefaciens strain GV3101 (pMP90) glycerol stock on an LB plate supplemented with 10 μg/mL rifampicin and 50 μg/mL gentamicin, and incubate the plate at 28 °C for 2 days (see Note 4).
2. Transfer a single colony to 5 mL LB broth supplemented with 10 μg/mL rifampicin and shake the culture at 220 rpm at 28 °C for 1 day (see Note 5).
3. Add the 5 mL culture to 500 mL LB broth supplemented with 10 μg/mL rifampicin and incubate with shaking at 220 rpm and 28 °C overnight, until the OD600 reaches 0.5–0.8 (see Note 6).
4. Divide the 500 mL culture between two sterile 500 mL centrifuge bottles so that they are balanced, and centrifuge at 3,000 × g for 15 min at 4 °C (see Note 7).
5. Carefully pour off the supernatant and add 250 mL of sterile, ice-cold 10 % glycerol to each bottle. Resuspend the cell pellets by shaking the bottles by hand in an ice-water bath.
6. Pellet the cells again by centrifugation at 3,000 × g for 15 min at 4 °C and discard the supernatant.
7. Add 250 mL of sterile, ice-cold 10 % glycerol to each bottle and resuspend the cell pellets again in the ice-water bath.
8. Repeat step 6.
9. Add 2 mL of sterile, ice-cold 10 % glycerol to each pellet and resuspend the cells in the ice-water bath.
10. Transfer the cell suspension to prechilled 1.5 mL microcentrifuge tubes and keep these tubes on ice.
11. Dispense 40 μL aliquots of the competent cells into prechilled microcentrifuge tubes on ice.
12. Freeze the cells in liquid nitrogen and store them at −70 °C (see Note 8).
3.1.2 Transformation of A. tumefaciens Competent Cells by Electroporation
1. Prepare the plasmid of the T-DNA vector and adjust its concentration to about 30 ng/μL (see Note 9).
2. Remove the A. tumefaciens competent cells from the −70 °C freezer and thaw them on ice.
3. Add 0.5 mL LB to a sterile 1.5 mL microcentrifuge tube labeled with the vector name. Prechill a 0.1 cm electroporation cuvette on ice (see Note 10).
4. Mix 1 μL of plasmid with the A. tumefaciens competent cells by pipetting up and down, and keep the mixture on ice for 5 min.
5. Meanwhile, set the MicroPulser™ Electroporation Apparatus to the "Agr" preset program (2.2 kV for A. tumefaciens).
6. Transfer the mixture to the prechilled electroporation cuvette and wipe the outside of the cuvette with a paper towel to remove condensation.
7. Place the cuvette in the chamber slide and push the slide into the chamber until the cuvette is seated between the contacts in the base of the chamber. Press the "Pulse" button once; a beep will sound.
8. Remove the cuvette from the chamber and immediately add the prepared 0.5 mL LB broth to the cuvette (see Note 11).
9. Pipette up and down and transfer the A. tumefaciens suspension back to the labeled 1.5 mL tube (see Note 12).
10. Incubate the tube at 28 °C with shaking at 220 rpm for 3–4 h to allow the cells to recover.
11. Plate 30–50 μL of the A. tumefaciens culture on LB agar containing the antibiotic that selects for the target T-DNA vector. Incubate the plate at 28 °C for about 2–3 days (see Note 13).
12. Add 1 mL LB broth containing the selective antibiotic to labeled 1.5 mL tubes. Transfer single colonies into these tubes, one colony per tube, and incubate at 28 °C with shaking at 220 rpm for 2 days.
13. Verify the presence of the T-DNA plasmid in the positive colonies by PCR, using 1 μL of A. tumefaciens culture as the template (see Note 14).
14. Add the cultures of all positive colonies containing the target T-DNA plasmid to 500 mL LB broth supplemented with appropriate antibiotics. Incubate at 28 °C with shaking at 220 rpm for 1–2 days (see Note 15).
15. Collect the A. tumefaciens cells by centrifugation for plant transformation.
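Step 1 of Subheading 3.1.2 calls for adjusting the plasmid to about 30 ng/μL before electroporation. The standard C1V1 = C2V2 dilution behind that adjustment can be sketched in Python as below; the function name, the 20 μL default final volume, and the 150 ng/μL example stock are illustrative assumptions, not values from this protocol.

```python
def plasmid_dilution(stock_ng_per_ul: float,
                     target_ng_per_ul: float = 30.0,
                     final_volume_ul: float = 20.0) -> tuple[float, float]:
    """Return (plasmid stock volume, ddH2O volume) in microliters.

    Applies C1 * V1 = C2 * V2 to reach the target concentration in the
    chosen final volume. The 20 uL default is a hypothetical working
    volume, not a value taken from the protocol.
    """
    if min(stock_ng_per_ul, target_ng_per_ul, final_volume_ul) <= 0:
        raise ValueError("concentrations and volume must be positive")
    if stock_ng_per_ul < target_ng_per_ul:
        # A stock below the target cannot be brought up to it by dilution.
        raise ValueError("stock is below the target concentration")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


if __name__ == "__main__":
    # Hypothetical example: a plasmid miniprep measured at 150 ng/uL.
    stock, water = plasmid_dilution(150.0)
    print(f"Mix {stock:.1f} uL plasmid with {water:.1f} uL ddH2O")
```

With this dilution, the 1 μL of plasmid mixed with a 40 μL aliquot of competent cells in step 4 carries roughly 30 ng of DNA. Raising an error for a stock that is already below 30 ng/μL flags samples that need to be re-prepared or concentrated instead of silently returning a negative water volume.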
3.2 Transformation of Arabidopsis by Floral Dip Method
1. Grow 12 plants per pot (8 cm × 8 cm) in trays at 22 °C under a 16 h light/8 h dark photoperiod. Spray with liquid fertilizer every week to obtain healthy Arabidopsis plants (see Note 16).
2. Prepare A. tumefaciens cells containing the target T-DNA plasmid as described in step 14 of Subheading 3.1.2 (see Note 17).
3. Pellet the cells by centrifugation at 3,000 × g for 15 min at room temperature and discard the supernatant.
4. Resuspend the pellet in 250 mL of a solution containing half-strength MS salts, 5 % sucrose, and 0.03 % Silwet L-77 surfactant. Pour the suspension into a container, such as a Petri dish, for floral dipping (see Note 18).
5. Select healthy plants with many unopened flower buds for transformation. Immerse all inflorescences in the A. tumefaciens cell suspension for 5–15 min so that all flower buds are dipped (see Note 19).
6. Place the dipped plants in a deep tray. Cover the tray with a transparent glass cover to maintain high humidity for about 24 h (see Note 20).
7. Remove the cover the following day. Water the plants from the bottom of the tray and transfer them to the greenhouse at 22 °C under a 16 h light/8 h dark photoperiod.
8. Continue to water and care for the dipped plants. Stop watering when most siliques of the dipped plants have turned yellow (see Note 21).
9. Harvest the seeds from the dipped plants using a sieve mesh and put them into 1.5 mL microcentrifuge tubes. Add some silica-gel desiccant to the tubes to dry the seeds (see Note 22).
10. Prepare selection plates containing 1/2 MS medium plus the selective antibiotic or herbicide. Surface-sterilize the seeds by a routine method and spread them on the plates (see Note 23).
11. Incubate the plates at 4 °C for 2–3 days to synchronize germination. Transfer the plates to a growth chamber at 22 °C under a 16 h light/8 h dark photoperiod for 7–10 days.
12. Transfer putative T1 transformants with green cotyledons and leaves to soil and grow them in the greenhouse at 22 °C under a 16 h light/8 h dark photoperiod.
3.3 Identification of T-DNA Insertion Site by TAIL-PCR Method
3.3.1 Preparation of DNA from T-DNA Transgenic Plants with the CTAB Method
1. Screen the T-DNA transformants for insertion mutants with an interesting phenotype (see Note 24).
2. Harvest about 50–100 mg of leaves (or one medium-sized leaf) into a 1.5 mL microcentrifuge tube. Grind the tissue to a fine powder in liquid nitrogen using a plastic tissue-grinding pestle.
3. Add 400 μL of 2 % CTAB extraction buffer preheated to 65 °C and mix well using the pestle.
4. Incubate the tube in a 65 °C water bath for 10 min to 2 h, mixing every 10–30 min.
5. Add 400 μL of chloroform/isoamyl alcohol (24:1) and vortex vigorously.
6. Centrifuge at 11,340 × g for 10 min at room temperature.
7. Carefully transfer about 300 μL of the upper aqueous phase to a new tube (see Note 25).
8. Add 600 μL of absolute ethanol prechilled to −20 °C to each tube and mix well by inverting the tubes. Place the tubes at −20 °C for at least 30 min (see Note 26).
9. Centrifuge at 11,340 × g for 10 min. Discard the supernatant.
10. Add 500 μL of 70 % ethanol prechilled to −20 °C and wash the DNA pellet for 5–10 min.
11. Centrifuge at 11,340 × g for 10 min. Carefully discard the supernatant (see Note 27).
12. Dry the DNA pellet by inverting the tubes on a paper towel (see Note 28).
13. Add 20–50 μL of sterile ddH2O to dissolve the DNA (see Note 29).
14. Run an agarose gel to determine the quality and quantity of the DNA. Choose good-quality DNA for TAIL-PCR (see Note 30).
3.3.2 Identification of T-DNA Insertion Site by TAIL-PCR
1. Thaw the 10× PCR buffer (MgCl2-free), 25 mM MgCl2, 2.5 mM dNTP mixture (2.5 mM each), 1.5 μM LS1 primer, and 20 μM AD primers by holding them in your hand. Keep the solutions on ice after thawing.
2. Perform the primary reaction in a total volume of 10 μL. When obtaining flanking sequences from multiple T-DNA lines with one AD primer, add 1 μL of each DNA sample (about 20 ng/μL) to a separate PCR tube or well of a PCR plate. When obtaining flanking sequences from one mutant of interest with different AD primers, add 1 μL of a different AD primer to each tube or well. Place the PCR plate or tubes on ice (see Note 31).
3. Prepare a master mixture for the primary reaction in a sterile 1.5 mL microcentrifuge tube. Each reaction contains 1 μL of 10× PCR buffer, 0.8 μL of 25 mM MgCl2, 0.8 μL of dNTP mixture, 1.0 μL of 1.5 μM LS1, 1.0 μL of 20 μM AD primer, and 0.5 U of Taq DNA polymerase, made up with ddH2O to a final reaction volume of 10 μL. Briefly mix and centrifuge (see Note 32).
4. Add 9 μL of master mixture to each tube or well. Using a pipette, add a drop of paraffin oil to each tube or well. Briefly mix and centrifuge. Place the plate or tubes on ice.
5. Program the thermocycler for the primary reaction as follows: 92 °C for 3 min and 95 °C for 30 s; 5 cycles of 94 °C for 30 s, 65 °C for 1 min, and 72 °C for 2 min; 94 °C for 30 s, 25 °C for 2 min, ramping to 72 °C over 2 min, and 72 °C for 2 min; 14 cycles of 94 °C for 10 s, 65 °C for 1 min, 72 °C for 2 min, 94 °C for 10 s, 65 °C for 1 min, 72 °C for 2 min, 94 °C for 10 s, 44 °C for 1 min, and 72 °C for 2 min; and a final 72 °C for 5 min.
6. Place the tubes or plate in the block and run the program. After completion, place the PCR products on ice to continue with the secondary reaction, or store them at −20 °C until the secondary reaction is performed.
7. To continue with the secondary reaction, thaw the 10× PCR buffer (MgCl2-free), 25 mM MgCl2, 2.5 mM dNTP mixture, 2.0 μM LS2 primer, and 20 μM AD primers. Place them on ice after thawing.
8. Perform the secondary reaction in a total volume of 10 μL. Dilute 2 μL of each product from the primary reaction in 80 μL of ddH2O.
Add 2 μL of the dilution to each tube or well as template (see Note 33).
9. Prepare the master mixture for the secondary reaction in a sterile 1.5 mL microcentrifuge tube. Each reaction contains 1 μL of 10× PCR buffer, 0.6 μL of 25 mM MgCl2, 0.8 μL of dNTP mixture, 1.0 μL of 2.0 μM LS2, 0.8 μL of 20 μM AD primer, and 0.3–0.5 U of Taq DNA polymerase, made up with ddH2O to a total volume of 10 μL. Briefly mix and centrifuge.
10. Add 9 μL of master mixture to each tube or well. Using a pipette, add a drop of paraffin oil to each tube or well. Briefly mix and centrifuge. Place the plate or tubes on ice.
11. Set up the PCR program for the secondary reaction: 12–14 cycles of 94 °C for 10 s, 65 °C for 1 min, 72 °C for 2 min, 94 °C for 10 s, 65 °C for 1 min, 72 °C for 2 min, 94 °C for 10 s, 45 °C for 1 min, and 72 °C for 2 min; then 72 °C for 5 min.
12. Place the tubes or plate in the block and run the program. After completion, place the PCR products on ice to continue with the tertiary reaction, or store them at −20 °C (see Note 34).
13. To continue with the tertiary reaction, thaw the 10× PCR buffer (MgCl2-free), 25 mM MgCl2, 2.5 mM dNTP mixture, 2.5 μM LS3 primer, and 20 μM AD primers. Place them on ice after thawing.
14. Perform the tertiary PCR amplification in a total volume of 50 μL. Dilute 2 μL of each product from the secondary reaction in 20 μL of ddH2O. Add 2 μL of the dilution to each tube or well as template (see Note 35).
15. Prepare the master mixture for the tertiary reaction in a sterile 1.5 mL microcentrifuge tube. Each reaction contains 5 μL of 10× PCR buffer, 4 μL of 25 mM MgCl2, 4 μL of dNTP mixture, 5 μL of 2.5 μM LS3, 0.8 μL of 20 μM AD primer, and 1.0–1.2 U of Taq DNA polymerase, made up with ddH2O to 50 μL. Briefly mix and centrifuge.
16. Add 48 μL of master mixture to each tube or well. Using a pipette, add a drop of paraffin oil to each tube or well. Briefly mix and centrifuge. Place the plate or tubes on ice.
17. Set up the program for the tertiary reaction: 23–25 cycles of 94 °C for 10 s, 45 °C for 1 min, and 72 °C for 2 min; then 72 °C for 5 min.
18. Place the tubes or plate in the block and run the program. After completion, analyze the products on a gel, or store them at −20 °C.
19. To analyze the PCR products, run 5 μL each of the secondary and tertiary products derived from the same primary reaction side by side on a 1.0 % agarose gel.
20. Specific products show the expected size shift between the secondary and tertiary products, whereas nonspecific bands show the same size or an unexpected size shift (Fig. 1). Record the sizes of the specific tertiary bands.
21. Run the remaining 45 μL of each tertiary product with a specific band on a new 0.8 % agarose gel. Excise the specific bands and purify the DNA using a gel purification kit according to the manufacturer's instructions.
22. Sequence the purified DNA with the specific primer LS4 through a commercial DNA sequencing service (see Note 36).
23. Align the resulting sequence with the Arabidopsis genome sequence in the NCBI databases (http://blast.ncbi.nlm.nih.gov/Blast.cgi) using nucleotide BLAST to determine the location of the T-DNA insert in the mutant.
Fig. 1 Secondary products (marked 2) and the corresponding tertiary products (marked 3) from a specific sample run side by side on an agarose gel. The size shift between them reveals product specificity. Red arrows indicate one example of a specific product with an obvious size shift (Color figure online)
3.4 Cosegregation Analysis
1. Design two specific primers, P1 and P2, on the basis of the two genomic sequences flanking the T-DNA insert identified by TAIL-PCR (Fig. 2) (see Note 37).
2. Cross the T-DNA insertion mutant with wild-type Arabidopsis to obtain F1 seeds. Allow the F1 plants to self-fertilize to obtain F2 seeds. Germinate the F2 seeds to obtain an F2 segregating population (see Note 38).
3. Prepare genomic DNA from the F2 plants. Record the phenotype of each individual.
4. Thaw the 10× PCR buffer (MgCl2-free), 25 mM MgCl2, 2.5 mM dNTP mixture, and 10 μM LS3, P1, and P2 primers. Place them on ice after thawing.
5. Perform the PCR in a total volume of 10 μL. Add 1 μL of genomic DNA from each F2 plant to a tube or well as template.
6. Prepare the master mixture in a sterile 1.5 mL microcentrifuge tube. Each reaction contains 1 μL of 10× PCR buffer, 0.6 μL of 25 mM MgCl2, 0.8 μL of dNTP mixture, 0.2 μL each of 10 μM LS3, P1, and P2 primers, and 0.3–0.5 U of Taq DNA polymerase, made up with ddH2O to 10 μL. Briefly mix and centrifuge.
7. Add 9 μL of master mixture to each tube or well. Using a pipette, add a drop of paraffin oil to each tube or well. Briefly mix and centrifuge. Place the plate or tubes on ice.
8. Set up the PCR program: 95 °C for 2 min; 35 cycles of 94 °C for 10 s, 58 °C for 30 s, and 72 °C for 1 min; then 72 °C for 5 min.
9. Place the tubes or plate in the block and run the program. After completion, run a 1 % agarose gel to determine the presence of the T-DNA insert. Wild-type plants give no band with LS3 and either P1 or P2. Homozygous mutants give no band with P1 and P2. Heterozygous plants give both bands (Fig. 2).
10. Analyze the genotyping data in relation to the phenotypes. The phenotype cosegregates with the T-DNA insert if all plants showing the phenotype carry the insert in the genotype expected from the mode of inheritance (e.g., homozygous for a recessive mutation), whereas no plant lacking the phenotype carries that genotype.
Fig. 2 Design of primers for cosegregation analysis. (a) Schematic representation of a T-DNA insert with four CaMV 35S enhancers in the chromosome of one mutant. The black arrows represent the DL1, P1, and P2 primers used in the cosegregation analysis. LB, T-DNA left border; RB, T-DNA right border; 4Enhancers, four CaMV 35S enhancers; bar, Basta resistance gene. (b) Cosegregation analysis of the T-DNA insert with the specific phenotype of the mutant. The 615 bp bands were amplified from wild-type genomic DNA, whereas the 787 bp bands were amplified from homozygous mutant genomic DNA. Both bands were amplified from the genomic DNA of heterozygous mutants
3.5 Complementation and Recapitulation Analysis
1. Align the flanking sequence of the cosegregating T-DNA with the Arabidopsis genome to determine the position of the insert relative to the candidate mutated gene (see Note 39).
2. Determine whether the mutant is a loss-of-function or a gain-of-function mutant. A loss-of-function mutant requires complementation analysis; a gain-of-function mutant requires recapitulation analysis (see Note 40).
3. Clone the target gene and prepare a construct in which the gene is driven by the CaMV 35S promoter. Prepare Agrobacterium cells containing this construct (see Note 41).
4. If the mutant is a loss-of-function mutant, transform the CaMV 35S promoter-driven gene into the mutant by the floral dip method. If it is a gain-of-function mutant, transform the CaMV 35S promoter-driven gene into wild-type plants (see Note 42).
5. Harvest the seeds. Screen for transformants on 1/2 MS medium containing an appropriate selective antibiotic or herbicide.
6. Observe the phenotype of the T1 generation. In complementation analysis, if the mutant transformants recover the wild-type phenotype, the mutant is complemented by the target gene.
In recapitulation analysis, if the wild-type transformants mimic the mutant phenotype, it can be concluded that activation of the target gene causes the specific phenotype of the T-DNA insertion mutant (see Note 43).
4 Notes
1. To prepare 100 mL of 2 % CTAB extraction buffer, combine 2 g CTAB, 10 mL of 1 M Tris–HCl (pH 8.0), 4 mL of 0.5 M EDTA (pH 8.0), and 8.19 g NaCl, and add water to a final volume of 100 mL. Sterilize and store the solution at room temperature. Preheat it to 65 °C and add 0.2–0.5 % β-mercaptoethanol before use.
2. The specific primers are designed to be complementary to the sequence adjacent to the left or right border of the T-DNA vector used for tagging. The four primers are nested. The Tm values of primers LS1 and LS2 should be about 62–65 °C, whereas those of LS3 and LS4 can be lower, as for ordinary primers. Some vectors contain repeated elements, such as four copies of the CaMV 35S enhancer; these repeated sequences cannot be used to design the specific primers. LS2 and LS3 are placed about 50–70 bp apart so that the specificity of the PCR product can be judged from an obvious size shift on the gel. Dissolve the primers in sterile ddH2O at 100 μM and store them in the freezer. Dilute to the appropriate concentration before use.
3. AD primers can be used in TAIL-PCR experiments to obtain the genomic sequences flanking different T-DNA vectors, or indeed any unknown sequence flanking a known sequence, in different species. AD primers are 15–16 nucleotides long, with a Tm of about 45 °C and 64- to 256-fold degeneracy. We use the following 11 AD primers, of which AD1, AD2, AD3, and AD4 were designed previously [14] and have been used in most TAIL-PCR studies.
The other AD primers used in our laboratory were derived from AD1 to AD4 by altering certain nucleotides, and were named after their parent primers. These AD primers are AD1: 5′-NTC GA(G/C) T(A/T)T (G/C)G (A/T)G TT-3′; AD1-1: 5′-NAC GT(G/C) A(A/T)T (G/C)C NAG A-3′; AD1-2: 5′-NTC GA(G/C) T(A/T) TNG (A/T)G AA-3′; AD2: 5′-NGT CGA (G/C)(A/T)G ANA (A/T)G AA-3′; AD2-1: 5′-NTC GT(G/C) (A/T)G ANA (A/T)GT T-3′; AD2-2: 5′-NCA GCT (G/C)(A/T)C TNT (A/T)GA A-3′; AD2-3: 5′-NCT CGT (G/C)(A/T)G ANT (A/T)GA T-3′; AD2-5: 5′-NGT CGA (G/C)(A/T)C TNA (A/T)CA A-3′; AD3: 5′-(A/T)GT GNAG(A/T)ANCANAGA-3′; AD4: 5′-AG(A/T) GNA G(A/T)A NCA (A/T)AG G-3′; and AD41: 5′-AG(A/T) CAN G(A/T)T NCA (A/T)GA A-3′.
4. The A. tumefaciens strain GV3101 (pMP90) is resistant to the antibiotics rifampicin and gentamicin. If you are preparing competent cells of other A. tumefaciens strains, supplement the medium with the appropriate antibiotics. Streaking for single colonies yields genetically identical A. tumefaciens cells and activates the cells at the same time.
5. Supplement the liquid culture with rifampicin only, to save on costs. Gentamicin is relatively expensive, and the risk of contamination is low because single colonies were first streaked on plates containing both antibiotics.
6. The A. tumefaciens cells take about 8–12 h to grow to an OD600 of 0.5–0.8. Once the cells reach log phase, keep the culture in an ice-water bath, and carry out all subsequent steps in an ice-water bath or at 4 °C.
7. Prechill the centrifuge rotor before centrifugation, for example by placing it in a 4 °C refrigerator or cold room.
8. The competent cells are stable for more than 6 months at −70 °C. They can be used even after several years of storage at −70 °C, with an efficiency still sufficient for transforming T-DNA plasmids into A. tumefaciens.
9. Dissolve the plasmid DNA in either ddH2O or 1/2 TE. An excessively high DNA concentration or ionic strength can make the pulse too intense and lower the transformation rate.
10. Prechill the electroporation cuvettes. For convenience, store the cuvettes at −20 °C and place them on ice before use.
11. Do not transfer the A. tumefaciens cells from the cuvette directly to the recovery tube. First wash the cells down with liquid LB, and then transfer the mixture to the tube.
12. For convenience, a 1.5 mL tube containing 0.5 mL of LB broth works well for cell recovery.
13. The transformation efficiency is usually high. To make sure of obtaining single colonies, spread the A. tumefaciens cells on one half of the selective plate and streak them on the other half.
14. Do not use single colonies directly as PCR template. The false-positive rate is high because of trace plasmid DNA on the plate.
15. Pool the cultures of all positive colonies in case a selected colony is a false positive. If you do not perform plant transformation immediately, add an equal volume of sterile 50 % glycerol to the culture and store it at −70 °C.
16. Healthy Arabidopsis plants are essential for obtaining a sufficient number of transformants by the floral dip method. Grow the plants at high density to prevent the soil from falling out when the pots are inverted for dipping.
17. If the A. tumefaciens cells containing the target T-DNA plasmid are stored at −70 °C, remove them from the freezer and inoculate 25 mL of LB broth containing the appropriate antibiotics. Incubate with shaking at 220 rpm at 28 °C for 2 days. Pour the 25 mL culture into 500 mL of LB broth supplemented with antibiotics and incubate at 28 °C with shaking at 220 rpm for 1 day to obtain a sufficient number of A. tumefaciens cells.
18. Although MS salts reportedly do not increase the transformation rate, they provide nutrients for the plants: plants dipped in this solution grow better than those dipped in a solution lacking MS salts. An excessive concentration of the surfactant Silwet L-77 damages the inflorescences and leads to low fertility.
19. If the A. tumefaciens cells containing the target T-DNA plasmid are not yet ready, or to obtain plants with more immature flower buds, clip the first bolts a week before dipping to allow more secondary inflorescences to develop. Clipping the siliques of the plants before floral dipping may increase the transformation rate.
20. If the bolts of the dipped plants are too tall for the deep tray, lay the plants on their sides and wrap them in plastic film to maintain moisture for about 24 h.
21. If watering is withheld too early, most seeds harvested from the dipped inflorescences will not germinate normally, and the transformation will fail.
22. The seeds can be stored at 4 °C with silica-gel desiccant for more than 1 year.
23. The selection medium can be supplemented with 100–200 mg/L carbenicillin to inhibit bacterial contamination. Scatter the seeds evenly on the selection medium; sowing them at too high a density impairs selection.
24. Dominant T-DNA insertion mutants can be obtained from T1 transformants by activation tagging. To obtain recessive T-DNA insertion mutants, T2 lines must be screened. To facilitate screening, we pool the seeds of about 10,000 T-DNA insertion lines and screen the seed pool for mutants with interesting phenotypes. After mutants of interest are obtained, genomic DNA is isolated and the T-DNA insertion sites are determined.
25. Traces of chloroform inhibit subsequent enzymatic reactions. To avoid contamination, leave about 100 μL of the aqueous phase behind and transfer only 300 μL to the new tube.
26. Sodium acetate need not be added to the aqueous phase before ethanol precipitation, because the CTAB extraction buffer already contains NaCl.
27. Discard the supernatant gently to avoid losing the DNA pellet.
28. Do not let the DNA pellet become overly dry; overdried DNA pellets are hard to dissolve.
29. To remove RNA contamination, RNase (10 mg/mL) can be added to the DNA solution at 0.5 % of its volume.
30. After electrophoresis, a high-molecular-weight band indicates good-quality DNA, whereas a smear indicates degraded DNA. We use 10, 20, 30, and 50 ng lambda DNA standards to quantify the DNA.
31. When obtaining flanking sequences from many T-DNA mutants (e.g., to generate a T-DNA mutant collection), first perform TAIL-PCR with one AD primer, such as AD2, which gives a higher success rate. Samples that fail to amplify are then subjected to TAIL-PCR with a different AD primer. When obtaining the flanking sequence from a specific mutant (e.g., a mutant of interest obtained by painstaking screening), perform TAIL-PCR with all the AD primers to increase the chance of capturing the target T-DNA flanking sequence, because multiple T-DNA inserts are frequently present in the genomes of mutants of interest.
32. Prepare master mixture for 1–2 extra reactions to ensure there is sufficient volume for all reactions.
33. Label the tubes or plate clearly so that each product can be traced back to the correct T-DNA line. The dilutions can be stored in the freezer for at least 1 month. If using different AD primers, add the AD primers to each tube or well.
34. Three microliters of each secondary PCR product can be checked on an agarose gel.
Samples that show bright DNA bands on the gel are selected for the tertiary reaction, whereas samples without obvious bands can be discarded, because the chance of obtaining specific products from them is low.
35. If using different AD primers, add the AD primers to each tube or well in the tertiary reaction.
36. TA cloning must be performed if the PCR product is a mixture of different products, such as similar-sized products from two different inserts in one T-DNA mutant.
37. When designing the primers, calculate the size of the product amplified by the two specific primers from the chromosome without the T-DNA insert, and the size of the product amplified from the chromosome with the T-DNA insert by one of the two specific primers together with LS3 or LS4. Make sure that the two products can be distinguished on an agarose gel. This allows the segregating population to be genotyped with all three primers in a single PCR.
38. The progeny of a heterozygous mutant can also be used as the segregating population for cosegregation analysis. The population comprises 300–500 plants.
39. A T-DNA insert may lie in one of three kinds of positions relative to a gene: an intergenic region, an intron, or an exon.
40. If a T-DNA insert carrying activation elements is located in an intergenic region, it may cause a gain-of-function mutation. Occasionally, a T-DNA without activation elements located in the 5′ or 3′ untranslated region (UTR) also leads to gene activation. If the T-DNA is inserted in an intron or exon, it may cause gene knockdown or knockout. For a gain-of-function mutant, perform RT-PCR analysis of the genes in the vicinity of the T-DNA insert to identify the activated gene. For knockdown or knockout mutants, the target gene is the one in which the T-DNA insert is located.
41. Alternatively, the genomic sequence including the coding region, promoter region, and 3′ UTR can be used for complementation analysis.
42. When transforming T-DNA insertion mutants, use a selection marker different from the one in the T-DNA insert of the mutant.
43. Complementation or recapitulation analysis provides the most convincing evidence that the target gene causes the specific phenotype of a T-DNA insertion mutant.
References
1. Krysan PJ, Young JC, Sussman MR (1999) T-DNA as an insertional mutagen in Arabidopsis. Plant Cell 11:2283–2290
2. Wilson RN, Somerville CR (1995) Phenotypic suppression of the gibberellin-insensitive mutant (gai) of Arabidopsis. Plant Physiol 108:495–502
3. Weigel D et al (2000) Activation tagging in Arabidopsis. Plant Physiol 122:1003–1013
4. Engineer CB et al (2005) Development and evaluation of a Gal4-mediated LUC/GFP/GUS enhancer trap system in Arabidopsis. BMC Plant Biol 5:9
5. Radhamony RN, Prasad AM, Srinivasan R (2005) T-DNA insertional mutagenesis in Arabidopsis: a tool for functional genomics. Electron J Biotechnol 8:82–106
6. Mattanovich D et al (1989) Efficient transformation of Agrobacterium spp. by electroporation. Nucleic Acids Res 17:6747
7. Shen WJ, Forde BG (1989) Efficient transformation of Agrobacterium spp. by high voltage electroporation. Nucleic Acids Res 17:8385
8. Clough SJ, Bent AF (1998) Floral dip: a simplified method for Agrobacterium-mediated transformation of Arabidopsis thaliana. Plant J 16:735–743
9. Clough SJ (2005) Floral dip: Agrobacterium-mediated germ line transformation. Methods Mol Biol 286:91–102
10. Bent AF (2006) Arabidopsis thaliana floral dip transformation method. Methods Mol Biol 343:87–103
11. Zhang X et al (2006) Agrobacterium-mediated transformation of Arabidopsis thaliana using the floral dip method. Nat Protoc 1:641–646
12. Clarke JD (2009) Cetyltrimethyl ammonium bromide (CTAB) DNA miniprep for plant DNA isolation. Cold Spring Harb Protoc 2009:pdb.prot5177. doi:10.1101/pdb.prot5177
13. Liu YG, Huang N (1998) Efficient amplification of insert end sequences from bacterial artificial chromosome clones by thermal asymmetric interlaced PCR. Plant Mol Biol Rep 16:175–181
14. Liu YG et al (1995) Efficient isolation and mapping of Arabidopsis thaliana T-DNA insert junctions by thermal asymmetric interlaced PCR. Plant J 8:457–463
15. Liu YG, Whittier RF (1995) Thermal asymmetric interlaced PCR: automatable amplification and sequencing of insert end fragments from P1 and YAC clones for chromosome walking. Genomics 25:674–681
16. Qin G et al (2005) An indole-3-acetic acid carboxyl methyltransferase regulates Arabidopsis leaf development. Plant Cell 17:2693–2704
17. Qin G et al (2007) Arabidopsis AtBECLIN 1/AtAtg6/AtVps30 is essential for pollen germination and plant development. Cell Res 17:249–263
Chapter 14
Identification of EMS-Induced Causal Mutations in Arabidopsis thaliana by Next-Generation Sequencing
Naoyuki Uchida, Tomoaki Sakamoto, Masao Tasaka, and Tetsuya Kurata
Abstract
Emerging next-generation sequencing (NGS) technologies are powerful tools for identifying the causal mutations underlying phenotypes of interest in Arabidopsis thaliana. In a methodology termed bulked segregant analysis (BSA), whole-genome sequencing data are obtained from pooled F2 segregants after crossing a mutant to a different, polymorphic accession, and are analyzed for single nucleotide polymorphisms (SNPs). A genomic region spanning the causal mutation site is then narrowed down by linkage analysis of SNPs between the accessions used to produce the F1 generation. Next, candidate SNPs for the causative mutation are extracted by filtering the linked SNPs with multiple appropriate criteria. The effect of each candidate SNP on the function of the corresponding gene is evaluated to identify the causal mutation, and its validity is then confirmed by independent criteria.
This chapter describes the NGS-based identification of recessive causal mutations induced by EMS mutagenesis.
Key words Next-generation sequencing, Whole-genome sequencing, Ethyl methanesulfonate, Bulked segregant analysis
Jose J. Sanchez-Serrano and Julio Salinas (eds.), Arabidopsis Protocols, Methods in Molecular Biology, vol. 1062, DOI 10.1007/978-1-62703-580-4_14, © Springer Science+Business Media New York 2014
1 Introduction
Although mutagenesis-based approaches have been used in Arabidopsis in many biological studies, identifying the mutations that cause phenotypes of interest by conventional means such as map-based cloning is still laborious and time-consuming. Emerging next-generation sequencing (NGS) technologies are powerful and versatile tools that are now used for the rapid, cost-effective identification of spontaneous as well as mutagen-induced mutations in Arabidopsis [1–7].
To identify the mutation underlying a phenotype of interest, whole-genome sequencing of the mutant without any genetic manipulation, followed by comparison with the genome sequence of its parental line, might appear to be the simplest approach. This strategy is problematic, however, because the genome usually carries numerous background mutations that hamper identification of the causal mutation unless additional information is available. Various genetic manipulations can provide such information. For instance, rough mapping of an F2 population with conventional markers before NGS analysis can narrow down the location of the mutation. Alternatively, “bulked segregant analysis” (BSA) can be employed [8]. BSA has been used successfully in several reports of NGS-based identification of recessive mutations [1–3, 6, 7]; the method is summarized in Fig. 1. In this chapter, procedures for the identification of ethyl methanesulfonate (EMS)-induced SNPs are described.
Fig. 1 Schematic overview of the definition of EMS-induced causal mutations through the BSA approach. (a) A mutant in the Col-0 background is crossed with polymorphic Ler. (b) The F1 plants are self-fertilized to produce F2 seeds. (c) Chromosomes derived from the Col-0 and Ler accessions are represented by gray and white bars, respectively. Asterisks indicate the EMS-induced causal mutation. Seedlings from the F2 individuals exhibiting the phenotype of interest are selected and bulked. (d) Short reads from NGS are mapped to the reference Col-0 genome and SNPs are called. (e) The candidate region is identified from the distribution of SNPs derived from Ler (solid line). If the mutant is derived from a non-reference accession and crossed to Col-0, the distribution is inverted (dotted line). (f) Homozygous SNPs (arrowheads) are extracted from the candidate region; arrows show annotated genes. (g) Background SNPs are removed. If multiple allelic mutants are available, background SNPs common to the allelic samples can also be removed. (h) Candidate SNPs are extracted using appropriate criteria. (i) Finally, the effects of the selected SNPs on the function of the annotated genes are evaluated.
Table 1
Summary of BSA approaches for defining EMS-induced causal mutations by NGS

| Background acc.(a) | Crossed acc.(b) | Number of F2 bulked | Read length (bp) | Coverage | Informatics | Ref. |
|---|---|---|---|---|---|---|
| Ws | Col-0 | 80 | 75 | ×6.3~×9.2 | CASAVA + custom perl scripts | [1] |
| Col-0 | Ler | 500 | 37 | ×22 | SHOREmap | [2] |
| Col-0 | Ler | 200 | 50 | ×6.2~×1777(c) | Custom perl script | [3] |
| Col-0 | Ler | 93 | 36 | ×12 | MASS | [6] |
| Col-0 | Ler | 80 | 38 × 2(d) | ×29~×74 | NGM | [7] |

(a) Background accession used for mutant isolation
(b) Accession crossed to produce the F2 segregants
(c) Low-coverage reads were used to delimit the candidate region; deep-sequencing data (~×1700) were used to define the causal mutation
(d) “×2” indicates paired-end sequencing

Generally, after an EMS-induced mutant has been isolated in one accession (e.g., Col-0), it is crossed with another, polymorphic accession (e.g., Ler) to produce F1 seeds. F2 individuals showing the recessive phenotype of interest are then pooled, and their bulked genomic DNA is used to prepare an NGS genomic library. The library is sequenced by NGS to provide short reads derived from the genome sequence. The basic steps for defining the causal mutation are (1) mapping the short reads to the reference Col-0 genome, (2) calling SNPs against the Col-0 reference genome sequence, (3) linkage analysis using these SNPs, and (4) applying various filters to exclude SNPs that are unlikely to be the causal mutation. When Col-0 is the parental accession subjected to EMS mutagenesis, the linked region spanning the causal mutation carries fewer Ler-type SNPs than other regions of the genome. After the genomic region containing the causal mutation has been delimited, candidate SNPs are examined for their nucleotide-substitution type (whether or not they are the canonical EMS-type G/C-to-A/T conversion) [9] and for their effects on the annotated gene (e.g., do they cause nonsynonymous or nonsense changes, or disrupt intron donor/acceptor sites?).
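The filtering steps just described can be sketched in Python. These steps restrict SNPs to the candidate region, remove SNPs already present in the parental lines, keep canonical EMS-type G/C-to-A/T changes, and ask whether a change alters the encoded amino acid. This is an illustrative sketch under stated assumptions. It is not the CASAVA pipeline or the custom perl scripts listed in Table 1. The `Snp` record, the function names, and the 0.9 alt-allele fraction used to treat a SNP as homozygous in the bulk are all hypothetical. Locating each SNP within a codon also requires a genome annotation, which is not shown here.

```python
from dataclasses import dataclass

# Standard genetic code: codons enumerated with bases in T, C, A, G order;
# "*" marks a stop codon.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in _BASES for b in _BASES for c in _BASES), _AMINO)
)

# EMS alkylates guanine, which then mispairs with thymine, giving G/C-to-A/T
# transitions. On the reference strand these appear as G>A or C>T.
EMS_CHANGES = {("G", "A"), ("C", "T")}


@dataclass(frozen=True)
class Snp:
    """Hypothetical SNP record called from the bulked F2 reads."""
    chrom: str
    pos: int             # position on the Col-0 reference
    ref: str
    alt: str
    alt_fraction: float  # fraction of bulk reads supporting the alt allele


def is_canonical_ems(snp):
    return (snp.ref.upper(), snp.alt.upper()) in EMS_CHANGES


def candidate_snps(snps, region, background, min_alt_fraction=0.9):
    """Keep SNPs that could be the recessive causal mutation.

    region: (chrom, start, end) delimited by linkage analysis.
    background: set of (chrom, pos, alt) also found in the parental lines.
    min_alt_fraction: assumed cutoff for calling a SNP homozygous in the bulk.
    """
    chrom, start, end = region
    return [
        s for s in snps
        if s.chrom == chrom and start <= s.pos <= end
        and s.alt_fraction >= min_alt_fraction
        and (s.chrom, s.pos, s.alt) not in background
        and is_canonical_ems(s)
    ]


def codon_effect(ref_codon, alt_codon):
    """Classify the amino acid change caused by a codon substitution."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"
    return "missense"


# Usage with made-up SNPs (not data from this chapter):
snps = [
    Snp("1", 100, "G", "A", 0.98),  # kept: EMS-type, homozygous, not background
    Snp("1", 200, "A", "G", 1.00),  # dropped: not an EMS-type change
    Snp("1", 300, "C", "T", 0.50),  # dropped: looks heterozygous
    Snp("1", 400, "C", "T", 0.95),  # dropped: present in the parental line
    Snp("2", 100, "G", "A", 1.00),  # dropped: outside the candidate region
]
print(candidate_snps(snps, ("1", 50, 500), {("1", 400, "T")}))
print(codon_effect("ACG", "ACA"), codon_effect("CAA", "TAA"))  # synonymous nonsense
```

Each filter is a separate condition, so further criteria can be added without touching the others. For example, SNPs can be restricted to annotated coding sequences or intron donor/acceptor sites, as in Fig. 1h. The two codon examples mirror panel (i) of Fig. 1: ACG to ACA (Thr to Thr) and CAA to TAA (Gln to stop).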
Several studies of BSA-based approaches have examined three parameters: how many F2 segregants should be pooled, what sequence coverage should be achieved, and how efficiently the methods narrow down the genomic region containing the causal mutation (Table 1). Confirmation of candidate SNPs by conventional Sanger sequencing effectively eliminates false-positive SNPs arising from NGS or from downstream informatics errors. If allelic mutants are available, definition of the causative mutation is facilitated. To confirm that the identified SNP really causes the phenotype of interest, complementation with the corresponding genomic fragment or evaluation of T-DNA insertion knockout lines should be performed. NGS-based identification of causal mutations can also be applied when non-reference Arabidopsis accessions serve as parental lines for mutagenesis [1]. In principle, NGS can also detect other types of mutations, including large insertions, deletions, inversions, and translocations (see Note 1). Different types of genomic libraries should be prepared according to the type of genomic alteration. Paired-end libraries are sufficient for small insertions/deletions (a few hundred base pairs) [10], whereas large structural changes on the order of kilobases require mate-pair libraries [11]. NGS-based approaches are applicable to any plant species whose genome has been sequenced, provided that genetic approaches are feasible. In this chapter, detailed procedures for the method reported in ref. 1 are described, including details omitted from the publication, as well as alternative approaches. The methodology is based on BSA after crossing a mutant in a non-reference accession, Wassilewskija (Ws), with the reference Col-0.
Single-end 75-bp reads were produced at relatively low coverage (×6.3 to ×9.2) by whole-genome sequencing on the Illumina GAIIx. These reads were used to call SNPs with the CASAVA SNP-calling pipeline. The causal mutation was identified by linkage analysis with these SNPs, followed by extraction of candidate SNPs through several filters.

2 Materials
1. Seedlings from F2 individuals exhibiting the phenotype of interest.
2. Seedlings from the parental accessions used for crossing.
3. TissueLyser (Qiagen).
4. CelLytic PN Isolation/Extraction Kit (Sigma-Aldrich).
5. Plant DNeasy mini kit (Qiagen).
6. MicroTUBES (Covaris).
7. Covaris S2 (Covaris).
8. QIAquick PCR Purification Kit (Qiagen).
9. Elution Buffer (EB): 10 mM Tris–HCl (pH 8.5).
10. 2100 Bioanalyzer (Agilent).
11. DNA 1000 Kit (Agilent).
12. NEBNext DNA Sample Prep Reagent Set 1 (New England BioLabs).
13. Genomic Adaptor Oligo Mix (Illumina or New England BioLabs).
14. Certified Low Range Ultra Agarose (Bio-Rad).
15. Mupid electrophoresis system (Advance Co., Ltd.).
16. MinElute Gel Extraction Kit (Qiagen).
17. PCR Primers 1.1/2.1 (Illumina).
18. LightCycler 480 (Roche).
19. KAPA Library Quantification Kit (KAPA Biosystems).
20. Illumina GAIIx (Illumina).
21. TruSeq SR Cluster Kit v2-cBot-GA (Illumina).
22. TruSeq SBS Kit v5-GA (36 cycle) (Illumina).
23. PhiX Control Kit v3 (Illumina).
24. PowerEdge R900 Linux server (64 GB memory, 10 TB storage; Dell).
25. CASAVA ver. 1.7 software (Illumina).
26. Custom perl scripts.

3 Methods
3.1 Preparing Samples for the Genomic Library for Whole-Genome Sequencing
1. Bulk seedlings from F2 individuals exhibiting the phenotype of interest (see Note 2). Disrupt the samples using the TissueLyser. Alternatively, grind the plant tissues to a fine powder under liquid nitrogen using a mortar and pestle. Do not allow the sample to thaw.
2. Enrich the nuclear fraction using the "Semi-pure Preparation of Nuclei" procedure of the CelLytic PN Isolation/Extraction Kit (see Note 3).
3. Isolate genomic DNA from the semi-purified nuclear fraction using the Plant DNeasy mini kit (see Note 4).
4. Prepare 1 μg of DNA in 130 μl TE and shear it in a microTUBE using the Covaris S2 at the 100-bp setting (see Note 5).
5. Purify the DNA using the QIAquick PCR Purification Kit and elute in 30 μl of EB.
6. Check the size distribution of the sheared genomic DNA with the 2100 Bioanalyzer according to the manufacturer's protocol, analyzing 1 μl of the fragmented solution on a microfluidic chip (see Note 6).
7. Prepare the DNA library for genome sequencing from the total amount of purified DNA, using the NEBNext DNA Sample Prep Reagent Set 1 according to the manufacturer's manual with some modifications. At the adaptor ligation step, use Genomic Adaptor Oligo Mix as the DNA adaptor. After adaptor ligation, we add an optional step to enrich DNA fragments of optimal length for genome sequencing. Excise the 200–250 bp DNA fragments with a clean, sharp knife from an agarose gel made with Certified Low Range Ultra Agarose. For this step, we use the Mupid electrophoresis system (see Note 7). Then, purify the fragments using the MinElute Gel Extraction Kit and elute in 15 μl of EB. At the PCR step for enrichment of the adaptor-modified DNA fragments, use either PCR Primers 1.1 and 2.1 (Illumina) or the Universal PCR primer and Index1 primer (New England BioLabs) (see Note 8).
8. Purify the DNA using the QIAquick PCR Purification Kit and elute in 30 μl of EB (see Note 9).
9. Check the size distribution of the amplified DNA with the 2100 Bioanalyzer.
10. Quantify the library concentration by quantitative PCR on the LightCycler 480 according to the manufacturer's manual (see Note 10).

3.2 Short-Read Sequencing and Informatic Analysis for EMS-Induced SNPs (See Note 11)
3.2.1 Sequencing with Illumina GAIIx
1. Conduct 75-bp sequencing according to the Illumina GAIIx operation manual. To generate single-read clusters on the Illumina flowcell, use the cluster-generation kit (TruSeq SR Cluster Kit v2–cBot-GA) with libraries diluted to 8 pM. The 75-bp sequencing run is conducted with two TruSeq SBS Kit v5–GA (36 cycle) kits. To monitor the run, load the control library from the PhiX Control Kit v3 on one lane of the flowcell.
3.2.2 Alignment of Reads to a Reference Genome Sequence
1. Align reads from the GAIIx to the reference genome sequence with the CASAVA software. Reference sequences for CASAVA are available to Illumina sequencer users from MyIllumina (https://icom.illumina.com/); the package containing the reference sequences is named iGenome (see Note 12).
3.2.3 SNP Calling
1. Call SNPs using CASAVA with default parameters (see Note 13). Among the CASAVA output files, the per-chromosome SNP lists (snps.txt) and the summary file (summary.html) are the most important for the following analysis.
3.2.4 Linkage Analysis with the Index of Enrichment of Homozygous SNPs
1. Identify the chromosome carrying the causal SNP. It is the chromosome on which homozygous SNPs derived from the mutant accession are significantly enriched compared with the other chromosomes (see Note 14).
3.2.5 SNP Filtering
1. Filter the SNPs using several criteria. Filtering is performed with "snipSNP," a package of perl scripts, and EXCEL (see Notes 15 and 16).
3.2.6 Removal of SNPs in the Accession Line Used as Parent of the F1 Generation
1. Remove background (parental) SNPs with the perl script extractSNP.pl, which produces a list of mutant-specific SNPs (see Notes 16–18).
3.2.7 Extraction of SNPs Within Genes and Intron Donor/Acceptor Sites
1. Extract SNPs within genes and intron donor/acceptor sites with the perl script snpinGFF.pl, which retains input SNPs located in the regions defined in the GFF file (see Notes 16, 17, and 19–21).
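The filters in Subheadings 3.2.6 and 3.2.7 reduce to simple set and interval operations, and so does the canonical EMS-type test of Subheading 3.2.9. The Python sketch below illustrates that logic. It is not the snipSNP package, and it rests on several assumptions:

- SNPs are records of chromosome, 1-based position, reference base, and called base.
- A background SNP matches a mutant SNP only when position and called base are identical.
- The GFF feature names (CDS, five_prime_UTR, three_prime_UTR) follow TAIR10-style GFF3.
- The donor/acceptor sites are taken to be the first and last two intronic bases.

All function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SNP:
    chrom: str  # e.g. "Chr1"
    pos: int    # 1-based position on the Col-0 reference
    ref: str    # reference base
    alt: str    # base called in the sample


def remove_background(mutant, background):
    """Keep mutant SNPs not also called (same position and base) in the background line."""
    seen = {(s.chrom, s.pos, s.alt) for s in background}
    return [s for s in mutant if (s.chrom, s.pos, s.alt) not in seen]


def is_canonical_ems(snp):
    """G/C-to-A/T transition, i.e. G>A or C>T on the reference strand."""
    return (snp.ref.upper(), snp.alt.upper()) in {("G", "A"), ("C", "T")}


def read_gff_features(lines, wanted=("CDS", "five_prime_UTR", "three_prime_UTR")):
    """Return (seqid, start, end) for the wanted feature types in GFF3 text lines."""
    features = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 5 and cols[2] in wanted:
            features.append((cols[0], int(cols[3]), int(cols[4])))
    return features


def splice_sites(chrom, exons):
    """First two and last two intronic bases between consecutive exons of one transcript."""
    ordered = sorted(exons)
    sites = set()
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        for pos in (prev_end + 1, prev_end + 2, next_start - 2, next_start - 1):
            sites.add((chrom, pos))
    return sites


def in_features(snp, features):
    """True if the SNP lies inside any (seqid, start, end) interval, inclusive."""
    return any(c == snp.chrom and start <= snp.pos <= end for c, start, end in features)


def candidate_snps(mutant, background, features, sites):
    """Background removal, then EMS-type test, then gene-feature or splice-site overlap."""
    return [
        s for s in remove_background(mutant, background)
        if is_canonical_ems(s) and (in_features(s, features) or (s.chrom, s.pos) in sites)
    ]
```

Because each filter only discards records, applying them in a different order gives the same final list. This agrees with the remark that the order of filters can be changed.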
3.2.8 Narrowing Down the Chromosomal Region Spanning the Causal SNP by Linkage Analysis
1. Detect the chromosomal region showing significant enrichment of homozygous SNPs derived from the mutant accession with the perl script stateSNP.pl. This script counts the number of SNPs within each window of a defined interval (e.g., 500 kbp) (see Notes 16, 17, 22, and 23).
3.2.9 Extraction of SNPs Showing the Canonical EMS-Induced Substitution
1. Extract canonical EMS-type SNPs (G to A or C to T) using EXCEL.
3.2.10 Examination of Effects of Candidate Causal SNPs on Corresponding Gene Functions
1. Check the effects of the candidate causal SNPs on the functions of the corresponding genes (e.g., are nonsynonymous or nonsense mutations induced? are intron acceptor/donor sites disrupted?).
3.2.11 Additional Analyses
If multiple allelic mutants exist, the following analyses are available (see Note 24):
1. Because multiple allelic mutants presumably harbor causal mutations in the same gene, extract SNPs that are induced in the same genes.
2. Remove "background" SNPs shared by the multiple allelic mutants (see Note 25).

3.3 Confirmation of the Extracted Candidate SNPs as Actual Mutations
3.3.1 Sanger Sequencing of Candidate Causal SNPs (See Note 26)
1. Amplify the region spanning the candidate causal SNPs by genomic PCR.
2. Conduct Sanger sequencing of the amplified fragments.
3.3.2 Evaluation of the Identified Mutations by Independent Criteria
1. To confirm that the identified SNPs are the causative mutations, perform one or more of the following experiments: (a) evaluation of T-DNA insertion lines, (b) allelism tests by genetic crosses with preexisting mutants, and (c) complementation tests by transforming the mutant with the candidate genes.

4 Notes
1. This chapter describes the analysis of EMS-derived SNPs. With paired-end reads, small insertions/deletions (indels: 2–20 nt) can be extracted by the CASAVA 1.7 software.
Large structural variants such as large indels, translocations, and inversions can be detected using dedicated software and suitable genomic libraries (paired-end and/or mate-pair libraries). Free or commercial software may help detect such structural variants induced by ionizing radiation, such as fast neutrons or X-rays. Examples are CLEVER used together with the free mapping software BWA (BWA: http://bio-bwa.sourceforge.net/; CLEVER: https://code.google.com/p/clever-sv/) and the commercial AVADIS-NGS (http://www.avadis-ngs.com/). When a mutagen other than EMS is used (e.g., ionizing radiation), the experimental conditions and subsequent data analysis should be established according to its effects on the chromosomes.
2. Schneeberger et al. [2], Cuperus et al. [6], and Uchida et al. [1] used 500, 93, and 80 F2 individuals, respectively. For background SNP removal and linkage analysis (see the following steps), it is recommended to also sequence the parental accessions used for crossing. The same procedure, with optional steps, may be applicable to the identification of a dominant mutation. For a semidominant mutation, F2 plants displaying the homozygous phenotype are pooled, and their genomic DNA is prepared as a bulk for genome sequencing. For a completely dominant mutation, F2 plants displaying the phenotype of interest are frozen individually and kept at −80 °C (several leaves from each plant are enough). They are stored until the phenotypic segregation of the F3 population derived from each F2 individual can be examined. Then, the F2 samples determined by F3 analysis to be homozygous at the causal locus are pooled, and their genomic DNA is prepared as a bulk for genome sequencing. Alternatively, after phenotypic segregation in the F3 populations has been examined, homozygous F3 lines can be pooled for sequencing. These are, for example, lines derived from an F2 individual in which all F3 plants show the mutant phenotype.
3. Without this step, a relatively large proportion of plastid-derived genomes will be sequenced, which lowers the efficiency of obtaining short reads corresponding to the nuclear genome.
4. We use two DNeasy columns for DNA isolation from a bulked pool of 80 F2 individuals (a total of 700 mg of fine powder) and combine the resulting DNA solutions.
5. Duty cycle, 10 %; intensity, 5; cycles/burst, 100; time, 60 s; bath temperature, 4 °C. This cycle is repeated ten times. Chilling the water bath takes nearly 1 h, and degassing takes 30 min.
6. We routinely use the DNA 1000 kit (Agilent). If the amount of fragmented DNA is low, use the High Sensitivity DNA kit (Agilent) on the Bioanalyzer. It is critical to equilibrate the kit solutions at room temperature for 30 min before use.
7. The Mupid electrophoresis system can also be obtained from Helixx Technologies, Inc. However, we believe that any agarose-electrophoresis equipment could be used for this procedure.
8. We use 12.5 ng of DNA as template in a 50 μl reaction and 12 PCR cycles for amplification.
9. If PCR produces extra bands deviating from the expected size, an additional gel-extraction step is recommended.
10. This step is critical for achieving maximum cluster density on the sequencing flowcell. We routinely use the KAPA Library Quantification Kit.
11. Data analysis for the identification of causal SNPs consists of three steps: alignment of reads to the reference genome, SNP calling, and SNP filtering. Software for these analyses is available both commercially and free of charge. We used CASAVA ver. 1.7 for alignment and for SNP calling, which is based on Bayesian statistics. We used custom perl scripts to filter the SNPs; these can be freely downloaded at http://bsw3.naist.jp/plantglobal/mmb2011/snipSNP.html.
Care must be taken if other software is used, since the algorithms employed and appropriate parameter settings may vary between programs. Free software is also available for mapping, e.g., Bowtie (http://bowtie-bio.sourceforge.net/index.shtml) [12] and the Burrows-Wheeler Aligner (BWA; http://bio-bwa.sourceforge.net/) [13]. Free software for SNP calling includes SAMtools (http://samtools.sourceforge.net/) [14] and GATK (http://www.broadinstitute.org/gsa/wiki/index.php/Home_Page) [15].
12. See the CASAVA manual for instructions and parameter settings.
13. SNP calling by CASAVA consists of two steps. First, allele call scores are calculated from the base calls and from the alignment and read quality scores. Then, SNPs are called based on the allele call score and read depth. The allele call score should be larger than 10, and the coverage should be more than ×3.
14. The summary file shows the number of homozygous and heterozygous SNPs on each chromosome.
15. The order of the filters can be changed, and some filters can be omitted.
16. "snipSNP" includes three perl scripts: (a) "extractSNP.pl," which removes background SNPs; (b) "snpinGFF.pl," which extracts SNPs within annotated genes; and (c) "stateSNP.pl," which counts the number of SNPs within intervals (the interval length can be adjusted). Although these perl scripts are optimized for the analysis of CASAVA output, SNP lists in other formats are accepted as input by setting options. See the manual of the perl scripts for further details.
17. The parameters (paths, file names) in the commands described in this chapter are only examples and need to be changed as appropriate.
18. Use the following command: "perl extractSNP.pl -t /path/to/mutant_snps.txt -b /path/to/background_snps.txt". The output directory is created in the current directory and contains lists of mutant-specific SNPs and filtered-out SNPs.
19. Type the following command ("ChrN" is the name of the target chromosome in the GFF file; modify this parameter according to your target): "perl snpinGFF.pl -t /path/to/mutant_snps.txt -g /path/to/annotation_information.gff -c ChrN". Lists of SNPs within annotated gene features (CDS, 5′UTR, and 3′UTR) are output. To extract SNPs within intron donor/acceptor sites, type: "perl snpinGFF.pl -t /path/to/mutant_snps.txt -g /path/to/annotation_information.gff -c ChrN -i exon". "snpinGFF.pl" derives intron donor/acceptor sites from the exon information in the GFF file used.
20. With the appropriate options, "snpinGFF.pl" can also extract SNPs within other features documented in the GFF file (e.g., pseudogenes).
21. If the GFF file does not include exon information, it is helpful to prepare a separate list of intron donor/acceptor sites. In such cases, however, intron donor/acceptor sites adjacent to UTRs will not be included.
22. Before using "stateSNP.pl," split the SNPs in the input SNP list by SNP type; homozygous and heterozygous SNPs are labeled "SNP_diff" and "SNP_het," respectively, in the list. Sort and split the list in EXCEL, or use the UNIX "grep" command: "grep SNP_diff /path/to/target_snps.txt > target_homo_snps.txt". To count SNPs, type: "perl stateSNP.pl -t /path/to/target_homo_snps.txt > homo_snps_count.txt". Calculate and plot the ratio of homozygous to heterozygous SNPs in EXCEL. If the reference accession, Col-0, is the parental accession used for mutagenesis, the number of SNPs derived from the other accession used for crossing (e.g., Ler) should decrease in the linked region. When a non-reference accession is used for mutagenesis, this number should increase in the region neighboring the causal mutation [1].
23. The causal SNP is presumably located in the narrowed-down region, but it is not always found exactly at the trough or peak.
24. These filters for SNP data from multiple allelic mutants may exclude the causal SNPs if the underlying causal mutations are identical.
25. To remove background SNPs from multiple allelic mutants, use each allelic mutant in turn as a virtual background accession and remove background SNPs as described in Subheading 3.2.6. Type: "perl extractSNP.pl -t /path/to/allele1_snps.txt -b /path/to/allele2_snps.txt". In this case, "allele1_snps.txt" and "allele2_snps.txt" are used as target and background, respectively, and a list of allele1-specific SNPs is output.
26. Sanger sequencing is highly recommended to remove false-positive SNPs.

Acknowledgements
The authors thank Dr. Taku Ohshima and Mrs. Eiko Nakamoto (NAIST) for optimizing library preparation and GAIIx operation. We also thank Dr. Noriko Inada (NAIST) for setting up the website from which our custom scripts can be downloaded.

References
1. Uchida N et al (2011) Identification of EMS-induced causal mutations in a non-reference Arabidopsis thaliana accession by whole genome sequencing. Plant Cell Physiol 52:716–722
2. Schneeberger K et al (2009) SHOREmap: simultaneous mapping and mutation identification by deep sequencing. Nat Methods 6:550–551
3. Mokry M et al (2011) Identification of factors required for meristem function in Arabidopsis using a novel next generation sequencing fast forward genetics approach. BMC Genomics 12:256
4. Marti L et al (2010) A missense mutation in the vacuolar protein GOLD36 causes organizational defects in the ER and aberrant protein trafficking in the plant secretory pathway. Plant J 63:901–913
5. Laitinen RA et al (2010) Identification of a spontaneous frame shift mutation in a nonreference Arabidopsis accession using whole genome sequencing. Plant Physiol 153:652–654
6. Cuperus JT et al (2010) Identification of MIR390a precursor processing-defective mutants in Arabidopsis by direct genome sequencing. Proc Natl Acad Sci USA 107:466–471
7. Austin RS et al (2011) Next-generation mapping of Arabidopsis genes. Plant J 67:715–725
8. Michelmore RW, Paran I, Kesseli RV (1991) Identification of markers linked to disease-resistance genes by bulked segregant analysis: a rapid method to detect markers in specific genomic regions by using segregating populations. Proc Natl Acad Sci USA 88:9828–9832
9. Greene EA et al (2003) Spectrum of chemically induced mutations from a large-scale reverse-genetic screen in Arabidopsis. Genetics 164:731–740
10. Holt RA, Jones SJ (2008) The new paradigm of flow cell sequencing. Genome Res 18:839–846
11. Pang AW et al (2010) Towards a comprehensive structural variation map of an individual human genome. Genome Biol 11:R52
12. Langmead B et al (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25
13. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25:1754–1760
14. Li H et al (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078–2079
15. DePristo MA et al (2011) A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 43:491–498

Chapter 15
Arabidopsis Transformation with Large Bacterial Artificial Chromosomes
Jose M. Alonso and Anna N. Stepanova

Abstract
The study of a gene's function often requires reintroducing the gene of interest, or a modified version of it, into the organism of choice. One potential caveat of this approach is that the transgenic construct should include not only the coding region but also the regulatory sequences of the gene. Even in species with well-annotated genomes, such as Arabidopsis, it is nearly impossible to predict which sequences are responsible for the proper expression of a gene.
One way to circumvent this problem is to use a large fragment of genomic DNA that contains the coding region of the gene of interest and at least 5–10 kb of flanking genomic sequence. To facilitate these types of experiments, libraries harboring large genomic DNA fragments in binary vectors have been constructed for Arabidopsis and several other plant species. Working with these large clones, however, requires some special precautions. In this chapter, we describe the experimental procedures and extra precautions involved in each stage of the work. These stages are identifying the clone containing the gene of interest, transferring it from E. coli to Agrobacterium, and generating, verifying, and analyzing the corresponding transgenic plants.

Key words TAC, Transformation, Arabidopsis, T-DNA, DNA deletions, Electroporation, Agrobacterium

Jose J. Sanchez-Serrano and Julio Salinas (eds.), Arabidopsis Protocols, Methods in Molecular Biology, vol. 1062, DOI 10.1007/978-1-62703-580-4_15, © Springer Science+Business Media New York 2014

1 Introduction
The ability to introduce specific sequences into the genome of an organism is an essential tool for dissecting the function not only of individual genes but also of the pathways and networks in which these genes act [1]. Perhaps the two most common applications of these approaches are the following. The first is (1) phenotypic complementation of a mutant by the wild-type copy of the corresponding dysfunctional gene. The second is (2) the addition of tags or other sequence alterations to the gene of interest to facilitate subsequent functional analysis (e.g., to investigate subcellular localization or spatiotemporal expression patterns). Ideally, these modifications would involve targeted replacement of the endogenous sequences by homologous recombination. In most plant species, including Arabidopsis, however, this approach is not practical because homologous recombination events are extremely rare during the integration of foreign DNA into the plant genome. The most common alternative is, therefore, Agrobacterium-mediated random integration of Transfer-DNA (T-DNA) into the plant genome. A wide variety of vectors compatible with Agrobacterium-mediated transformation have been developed for a large array of purposes [2–4]. For example, binary vectors have been engineered to carry large DNA fragments [5], ranging in size from tens of kilobases up to a few hundred kilobases. In this chapter, we focus on working with transformation-competent bacterial artificial chromosomes, or TACs. Importantly, several Arabidopsis genomic libraries have been constructed using these vectors and are available to the plant community (Table 1). These specialized libraries are ideal for complementation studies [5], in which large genomic intervals can be covered easily with only a handful of these large clones. However, the utility of these libraries is not limited to complementation studies. Several additional applications of these large clones in gene functional studies have recently been reported in Arabidopsis [6]. Precise modification of specific sequences in the clone has significantly widened the potential utility of these genomic libraries [6]. Examples include inserting a fluorescent tag at a particular location in a gene of interest or introducing a desired single-nucleotide change.

Table 1
List of large-insert binary-vector libraries readily available for Arabidopsis

Vector name | # Clones | # Clones mapped | DNA source | Available from
pYLTAC17 | 36,864 | 8,223 | Col | Genome Enterprise (http://www.genome-enterprise.com/)
pYLTAC7 | 10,749 | 110 | Col | ABRC (http://abrc.osu.edu/)
PAC P1 | 9,080 | 300 | Col | ABRC (http://abrc.osu.edu/)
BIBAC2 | 11,520 | – | Ler | ABRC (http://abrc.osu.edu/)
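The claim that a large genomic interval can be covered with only a handful of large clones can be made concrete as a minimum tiling-path problem. The Python sketch below applies the standard greedy interval-cover algorithm to clone coordinates. The clone names and coordinates are hypothetical. In practice, the end-sequence-based coordinates would come from the browsers cited in this chapter, and the chosen clones would still need checking for sufficient flanking sequence around the gene of interest.

```python
def minimal_tiling(clones, start, end):
    """Greedy minimum set of clones covering [start, end] (1-based, inclusive).

    clones: iterable of (name, clone_start, clone_end). Returns clone names in
    order along the chromosome, or None if a gap cannot be bridged.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    chosen, covered_to, i = [], start - 1, 0
    while covered_to < end:
        best = None
        # Among clones starting at or before the first uncovered base,
        # take the one that reaches furthest along the chromosome.
        while i < len(ordered) and ordered[i][1] <= covered_to + 1:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            return None  # no clone extends the covered region: a gap
        chosen.append(best[0])
        covered_to = best[2]
    return chosen
```

The greedy choice, always extending coverage as far as possible, is known to give a minimum number of intervals for this one-dimensional covering problem. Clones considered but not chosen can never extend coverage beyond the clone that was chosen.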
Using these large clones has several key advantages in both complementation studies and gene functional approaches. In complementation studies, the greater size of the clones makes it possible to scan larger regions of the genome. In gene functional approaches, the large fragments of DNA flanking the gene of interest ensure the presence of all regulatory sequences. Nevertheless, a few drawbacks have precluded a more widespread use of these libraries. For example, only for some of these libraries (Table 1) is the exact sequence content of each clone known and the coverage high enough to make general use of the library practical. A second reason for the limited use of these clones lies in the standard protocols, which were originally established and optimized for smaller binary vectors. These protocols are inefficient or inadequate when applied directly to the much larger TAC clones. Finally, presumably as a consequence of the two previous points, only a handful of successful uses of these genomic libraries in Arabidopsis have been reported to date. This makes it difficult for the general research community to assess the true potential and limitations of these tools. In this chapter, we describe the experimental procedures and the common problems that may arise at each step of the protocol (Fig. 1). The protocol runs from the transfer of the clones from E. coli DH10B to Agrobacterium to the transformation of Arabidopsis and the analysis of the resulting transgenic plants. All of the examples provided are based on the JAtY library of TAC clones. This library was chosen for three main reasons: it is publicly available, its genomic DNA comes from the Columbia accession, and, most importantly, its end-sequenced clones cover more than 90 % of the Arabidopsis genome [6].

2 Materials
2.1 Transfer of TAC Clones from E. coli to Agrobacterium
1. Luria-Bertani (LB) broth: 10 g/L tryptone, 5 g/L yeast extract, 10 g/L NaCl. Sterilize by autoclaving.
2. LB-agar medium: LB broth supplemented with 15 g/L agar. Sterilize by autoclaving.
3. 15-mL sterile plastic culture tubes.
4. Temperature-controlled shaker incubator.
5. Kanamycin (100 mg/mL stock in water). Sterilize by filtering.
6. Gentamicin (25 mg/mL stock in water). Sterilize by filtering.
7. Sterile plastic Petri dishes (100 × 15 and 150 × 15 mm).
8. SOB medium: 20 g/L tryptone, 5 g/L yeast extract, 0.5 g/L NaCl. Adjust pH to 7.5 with 1 M KOH. Sterilize by autoclaving.
9. 10 % v/v glycerol. Sterilize by autoclaving.
10. Refrigerated centrifuge (Sorvall RC-5B) and rotor (Sorvall SS-34).
11. Transparent 50-mL Nalgene polypropylene tubes (3118-0050 Oak Ridge). Sterilize by autoclaving.
12. 9″-long glass Pasteur pipets. Sterilize by autoclaving.
13. Alkaline Lysis Solution I: 50 mM glucose, 10 mM EDTA pH 8.0, 25 mM Tris-HCl pH 8.0, 4 mg/mL lysozyme (Sigma L-6876).
14. Alkaline Lysis Solution II: 0.2 N NaOH, 1 % SDS.
15. Alkaline Lysis Solution III: 3 M potassium, 5 M acetate, pH 4.8. For 100 mL, weigh 29.5 g of potassium acetate, bring to 88.5 mL with diH2O, and add 11.5 mL of glacial acetic acid.

[Fig. 1 (schematic): flowchart from selecting and ordering the TAC clone, streaking the E. coli DH10B strain, colony PCR with gene-specific primers, TAC DNA isolation, and electroporation into Agrobacterium, through growth in liquid or solid media, floral-dip transformation, selection of transformants, and PCR testing of T-DNA integrity. T-DNA map: RB, Basta R, gene, SacB, LB; primers GeneF, SacBF, SacBR (~800 bp SacB product); gel lanes –C, +C, L1–L7.]

Fig. 1 Schematic representation of the steps involved in gene functional characterization using JAtY TAC clones. The procedure starts with selecting the TAC clone containing the gene of interest from a series of overlapping TACs that span a given chromosomal region. The web tools at http://Arabidopsislocalizome.org/ and http://atidb.org are used for this selection. Once the TAC(s) have been identified, they can be ordered from the corresponding stock centers; JAtY clones should be ordered from the Genome Enterprise (http://www.genome-enterprise.com). The identity of the received clones is then tested with primers for a gene predicted to lie in the TAC of interest, ideally positioned only 1 or 2 kb from the LB end of the clone. Next, TAC DNA is extracted from E. coli and transferred to Agrobacterium cells. After the presence of the desired TAC clone has been confirmed, the Agrobacterium strain carrying it is propagated in either liquid or solid media. Agrobacterium cells are resuspended in a glucose/detergent solution and used for floral dip transformation. After Basta-resistant plants have been selected, SacB and gene-specific primers are used to test the integrity of the T-DNA inserted in the plant genome. The SacBF and SacBR primers should yield a PCR product of approximately 800 bp. This PCR test has been shown to be a good indication of T-DNA integrity. Nevertheless, an additional PCR with the gene-specific primer (GeneF) and SacBR should be carried out to rule out possible contamination. In the example illustrated, line L2 corresponds to a contamination originating from a transgenic (Basta-resistant) plant that carries the SacB gene but does not harbor the correct TAC clone. Plants that are Basta resistant but do not carry the SacB gene (lines L1, L3, L6, L7) are likely to harbor truncated T-DNAs. Only the transgenic lines confirmed with both primer sets (lines L4 and L5) have incorporated the full-length TAC of interest into their genome and can then be used in the desired gene functional studies.

16. 95–100 % and 70 % ethanol.
17. Tabletop centrifuge with a rotor for microcentrifuge tubes.
18. Microcentrifuge plastic tubes (1.5 mL).
19. Electroporator with the capability to control resistance, capacitance, and voltage (Bio-Rad Gene Pulser and Pulse Controller units or equivalent).
20. Electroporation cuvettes (1-mm gap).
21. A pair of gene-specific primers complementary to the region of the TAC near the left border (LB) of the T-DNA (see Note 1).
2.2 Plant Transformation and Selection of Transgenic Plants
1. Soil: 50 % Sun Gro (Sunshine), 50 % Fafard 4P-Mix or equivalent.
2. Germination trays: 21″ × 11″ × 1 ¼″, Hummert International or equivalent.
3. Propagation dome: 21″ × 11″, Hummert International or equivalent.
4. Square plastic pots: 4 × 6 pots from Hummert International or equivalent.
5. Plant transformation solution: 5 % glucose, 200 μL/L Silwet L-77.
6. Seed sterilization solution: 50 % bleach, 100 μL/L Triton X-100 (the detergent prevents seeds from clumping together).
7. AT media: 1× Murashige & Skoog (MS) salts, 1 % sucrose; adjust the pH to 6.0 with 1 M KOH, then add Bacto Agar (Difco) to a final concentration of 0.7 % and autoclave.
8. Disposable 50-mL centrifuge tubes.
9. Basta resistance selection media: prepare AT media and sterilize by autoclaving. Cool to ~45 °C, then add phosphinothricin (PPT, glufosinate ammonium, GoldBio) to a final concentration of 20 mg/L; the stock can be prepared at 20–100 mg/mL in water and filter sterilized. Pour 50–60 mL of media per 150-mm Petri dish.
10. Top agarose: 0.7 % low-melting-point agarose supplemented with 20 mg/L phosphinothricin and 300 mg/L Timentin (to inhibit growth of Agrobacterium on T1 transgenics).
11. Fine-pointed forceps.
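Two routine calculations behind these recipes can be checked with the short Python sketch below.

- The volume of antibiotic or PPT stock needed to reach a working concentration follows from C1 × V1 = C2 × V2. Examples in this protocol are 25 mg/L kanamycin from a 100 mg/mL stock and 20 mg/L phosphinothricin.
- The molarity of Alkaline Lysis Solution III follows from its recipe: 29.5 g potassium acetate plus 11.5 mL glacial acetic acid in 100 mL.

The molar masses and the density of glacial acetic acid are assumed standard values, not figures taken from this protocol, and the function names are illustrative.

```python
KOAC_MW = 98.14         # g/mol, potassium acetate (assumed standard value)
ACETIC_MW = 60.05       # g/mol, acetic acid (assumed standard value)
ACETIC_DENSITY = 1.049  # g/mL, glacial acetic acid near room temperature (assumed)


def stock_volume_ml(final_mg_per_l, stock_mg_per_ml, final_volume_ml):
    """Volume of stock (mL) to add, from C1*V1 = C2*V2 with mg/L converted to mg/mL."""
    return (final_mg_per_l / 1000.0) * final_volume_ml / stock_mg_per_ml


def solution_iii_molarity(koac_g=29.5, acetic_ml=11.5, total_ml=100.0):
    """Return (K+ molarity, total acetate molarity) for Alkaline Lysis Solution III."""
    mol_k = koac_g / KOAC_MW                               # all K+ comes from the salt
    mol_acetic = acetic_ml * ACETIC_DENSITY / ACETIC_MW    # free acetic acid
    litres = total_ml / 1000.0
    # Acetate (as salt plus acid) comes from both components.
    return mol_k / litres, (mol_k + mol_acetic) / litres
```

For example, `stock_volume_ml(25, 100, 500)` gives 0.125 mL (125 μL) of kanamycin stock for 500 mL of LB-agar. `solution_iii_molarity()` returns roughly 3.0 M potassium and 5.0 M acetate, which is the usual composition of this solution.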
2.3 Analysis of Transgenics 1. CTAB buffer: 1.4 M NaCl, 20 mM EDTA pH 8.0, 100 mM Tris-HCl pH 8.0, 3 % CTAB (cetyltrimethylammonium bromide). 2. Homemade scoop (cut off the bottom of a microfuge tube with scissors or a razor blade and glue the bottom piece to the hot tip of a glass Pasteur pipet). 276 Jose M. Alonso and Anna N. Stepanova 3. Ivoclar Vivadent shaker (or equivalent). 4. 1-mm diameter glass beads, BioSpec. 5. SacB primers (SacBF 5′-TGTAAAACAAGCCACAGTTC-3′ and SacBR 5′-AATAAAGATTCTTCGCCTTG-3′). 6. General PCR reagents: dNTPs, 10× PCR buffer with Mg, Taq polymerase. 7. Thermocycler. 8. Gel electrophoresis setup. 9. 10 mg/mL ethidium bromide. 3 Methods 3.1 Transfer of TAC Clones from E. coli to Agrobacterium 1. Identify the JAtY TAC clone containing the gene of interest, or corresponding to the desired genomic region, using the tools at the Arabidopsis thaliana Integrated Database http://atidb.org/cgi-perl/gbrowse/atibrowse/ or at http://Arabidopsislocalizome.org/. 2. Order the TAC of interest from the Genome Enterprise, http://www.genome-enterprise.com. 3. Re-streak the bacterial strain received on LB-agar supplemented with 25 mg/L kanamycin and incubate overnight at 37 °C. 4. Confirm that the strain harbors the TAC clone of interest by colony PCR using gene-specific primers: resuspend a single colony in 20 μL of water and use 2 μL of cells as a template in a 20 μL PCR reaction mix (see Note 2). 5. (Day 1) Streak Agrobacterium strain GV3101 (pMP90) [7] or equivalent on LB-agar plates supplemented with 25 mg/L gentamicin and incubate at 28 °C for 2–3 days (see Note 3). 6. (Day 3) Streak the JAtY E. coli clone on LB-agar plates supplemented with 25 mg/L kanamycin and incubate at 37 °C overnight (see Note 3). 7. (Day 4) Inoculate 3 mL of LB supplemented with gentamicin (25 mg/L) in a 15-mL sterile plastic culture tube with a mixture of 2–3 colonies of actively growing Agrobacterium cells. Incubate at 28 °C overnight with continuous shaking (see Note 4). 8.
(Day 4) Inoculate 3 mL of LB supplemented with kanamycin (25 mg/L) in a 15-mL sterile plastic culture tube with a single colony of actively growing E. coli cells harboring the JAtY clone of interest. Incubate at 37 °C overnight with continuous shaking (see Note 4). 9. (Day 5) Inoculate 50 mL of SOB supplemented with gentamicin (25 mg/L) in a 250-mL Erlenmeyer flask with 1 mL of the overnight Agrobacterium culture. Incubate at 28 °C for 4–5 h until the culture reaches an OD600 of ~0.6. 10. (Day 5) While the Agrobacterium culture is growing, isolate JAtY TAC DNA from the overnight E. coli culture. Prepare fresh Alkaline Lysis Solutions I and II. Chill Solution I on ice and keep Solutions II and III (Solution III can be made in advance) at room temperature. Transfer 1.5 mL of the overnight culture to a microcentrifuge tube. Spin at 14,000 rpm (20,817 × g) for 1 min in a tabletop microcentrifuge. Aspirate all the liquid. Add 100 μL of Solution I and resuspend the cells by pipetting the solution up and down until all the cells are in suspension. Add 200 μL of Solution II. Mix by gently inverting the tube 8–10 times. Immediately add 150 μL of Solution III and again mix by gently inverting the tube 8–10 times. Centrifuge for 6 min at 14,000 rpm (20,817 × g) in a tabletop microcentrifuge. Transfer the supernatant to a new 1.5-mL centrifuge tube with a 1-mL Pipetman (see Note 5). 11. (Day 5) Slowly add 1 mL of room-temperature 95–100 % ethanol, mix by gently inverting the tube 8–10 times, and spin for 6 min at 14,000 rpm (20,817 × g). Remove the supernatant by aspiration, and wash the pellet once with 70 % ethanol at room temperature. Aspirate the supernatant and air-dry the pellet for ~2–3 min. Add 30 μL of diH2O and allow the DNA to dissolve for 2–3 h at room temperature (do not vortex or pipet the DNA, to prevent mechanical damage). 12.
(Day 5) Place the freshly grown Agrobacterium culture (OD600 ~0.6) on ice for 5–10 min before starting the preparation of electrocompetent cells. Transfer the entire Agrobacterium culture to a prechilled 50-mL Nalgene centrifuge tube. Spin the cells at 4 °C for 5 min at 2,200 × g in a Sorvall SS-34 rotor (or equivalent); make sure that the centrifuge and the rotor have been precooled to 4 °C. Quickly pour off the supernatant by inverting the tube. Resuspend the cells by gently swirling the tube in an ice-cold water bath. Fill the tube with sterile ice-cold 10 % glycerol. Centrifuge at 4 °C for 10 min at 5,000 × g in the Sorvall SS-34 rotor. Quickly pour off the supernatant by inverting the tube. Resuspend the cells by gently swirling the tube in an ice-cold water bath. Fill the tube with ice-cold 10 % glycerol. Centrifuge at 4 °C for 10 min at 5,000 × g in the Sorvall SS-34 rotor. Remove the 10 % glycerol by aspiration with a glass Pasteur pipet. Resuspend the cells in the 10 % glycerol remaining on the tube walls, keeping the cells on ice at all times. 13. (Day 5) Centrifuge the TAC DNA (which has been dissolving at room temperature) for 5 min at 14,000 rpm (20,817 × g) and transfer 7 μL to a new 1.5-mL centrifuge tube. Place the DNA on ice. 14. Place a 1-mm-gap electroporation cuvette on ice for 2–3 min. 15. Transfer 40 μL of Agrobacterium competent cells to the tube with 7 μL of TAC DNA. 16. Immediately transfer the mix of DNA and cells to the electroporation cuvette. 17. Electroporate the cells at 1,250 V, 100 Ω, and 25 μF [8]. 18. Add 1 mL of room-temperature LB broth to the electroporation cuvette and then transfer the cell suspension to a new 15-mL culture tube. 19. Let the cells recover for 1 h 30 min at 28 °C in a shaker incubator at 200 rpm. 20. Transfer the culture to a 1.5-mL centrifuge tube and collect the cells by spinning for 1 min at 14,000 rpm (20,817 × g) at room temperature. 21.
Remove most of the liquid, leaving ~50–100 μL of LB, and resuspend the cells in this leftover medium by pipetting. 22. Spread the cells on an LB-agar plate supplemented with kanamycin (25 mg/L), allow the medium to be fully absorbed, and place the plates at 28 °C. Colonies will start appearing after 3–5 days. 23. Test Agrobacterium transformants for the presence of the desired TAC using the gene-specific primers (see Note 6). 3.2 Plant Transformation and Selection of Transgenics 1. Surface-sterilize seeds by placing them in the seed sterilization solution for 10 min, occasionally inverting/shaking the tubes to fully resuspend the seeds. After 10 min, allow the seeds to settle by gravity, remove the bleach solution by aspiration, and wash the seeds thoroughly three times with sterile water, fully resuspending the seeds each time (for small numbers of seeds, i.e., ~200, this can be done in 1.5-mL microcentrifuge tubes). 2. Resuspend the seeds in melted and precooled sterile 0.7 % low melting point agarose in water, and plate them by spreading (using a 200-μL Pipetman with a sterile wide-bore tip) on the surface of AT media plates supplemented with 20 mg/L PPT (see Note 7). 3. Stratify the seeds on the plates at 4 °C for 3 days to synchronize germination. After the cold treatment, expose the plates to light for about 2 h at room temperature to improve germination. Place the plates horizontally in a 22 °C dark incubator. After about 72 h, transfer the plates to a growth chamber with constant light for 3–5 days before transplanting individual seedlings to soil with forceps. With the back of the forceps, make 0.5-cm-deep holes in the moist soil, one in each corner of the pot and one in the middle. Place each seedling in a hole so that the root, but not the cotyledons, is below the soil surface when you close the hole with the back of the forceps.
Cover the tray with the transparent propagation dome (see Note 8). 4. Grow the plants under a 16-h light/8-h dark cycle at 20 °C. After ~2–3 weeks (when the plants are starting to bolt), gradually remove the propagation dome by shifting it ~2 cm to one side the day before you plan to remove it completely. 5. Six days before plant transformation, streak the Agrobacterium strain harboring the TAC of interest on a 100-mm LB plate supplemented with kanamycin and gentamicin, and incubate at 28 °C. After 3 days, collect the cells corresponding to 5–10 colonies, resuspend them in 300 μL of LB, and spread the mixture on a large 150-mm LB plate supplemented with kanamycin and gentamicin. Prepare 2–3 plates per clone. 6. Collect the Agrobacterium cells from the 2–3 large plates by scraping them with the plastic tip of a 200-μL Pipetman bent into an L shape about 1 cm from the thinner end. Resuspend all the cells from the 2–3 plates in about 300 mL of transformation solution (see Note 9). 7. Pour the Agrobacterium cells resuspended in the transformation solution into a 250-mL glass beaker that is wide enough for all of the inflorescences from one pot (5 plants) to fit in, but small enough that the plastic pot does not fall into the solution. Take a pot of plants and carefully turn it upside down so that the soil and plants do not detach from the plastic pot. Submerge all of the inflorescences in the Agrobacterium suspension, and after a few seconds, lift the pot and dip the inflorescences again. 8. Place the pots with the dipped plants horizontally in a clean plastic tray and cover it with the propagation dome (see Note 10). Transfer the tray with the plants back to the growth chamber. 9. The day after transformation, shift the propagation dome about 1 in. to the side; the following day, remove it completely, return the pots to a vertical position, and water the plants if necessary (see Note 11). 10.
Continue watering the plants until they finish flowering and the seed pods start to dry. Let the plants dry completely before collecting the seeds. 11. Collect the seeds by carefully laying the plants on their side on a clean sheet of paper and helping to release the seeds from the siliques by gently rubbing the dry siliques between the fingertips. Use a plastic mesh to separate the seeds from plant and soil debris. 12. Transfer about 600 mg of seeds to a 50-mL conical plastic tube. Sterilize the seeds for 10–15 min using 30–40 mL of seed sterilization solution. Make sure the seeds are fully resuspended, and invert/shake the tubes occasionally during the 10–15 min of sterilization. Let the seeds sediment and remove as much bleach solution as possible using a vacuum aspirator. 13. Wash the seeds 3–5 times with ~50 mL of sterile diH2O. After each wash, remove as much water as possible. 14. To the seed suspension in diH2O, add PPT and Timentin to final concentrations of 20 mg/L and 300 mg/L, respectively. For example, if the residual volume of seeds in water is 5 mL, add 5 μL each of 20 mg/mL PPT and 300 mg/mL Timentin. Timentin inhibits the growth of Agrobacterium cells that survived bleach sterilization under the seed coat. 15. Cold-treat the tubes with seeds at 4 °C for 2–3 days to synchronize germination. 16. Equilibrate the tubes with seeds to room temperature for 15–30 min and add melted, precooled 0.7 % top agarose to the seed suspension (see Note 12). 17. Use single-use plastic 10-mL pipettes to distribute the seed/agarose suspension uniformly on top of AT plates supplemented with 20 mg/L PPT. Plate up to 8,000 seeds (~0.2 g dry weight), resuspended in 5–7 mL of top agarose, per 150-mm Petri plate containing 50–60 mL of AT media supplemented with 20 mg/L PPT. 18. Place the plates in the light for 1–2 h at room temperature to improve germination and then place them in the dark incubator at 22 °C for 3 days. 19.
After 3 days in the dark, move the plates to the light for 2–5 days. Check the plates periodically. Basta-resistant plants (transformants) will develop green cotyledons upon light exposure, whereas sensitive (untransformed) plants will remain bleached or fail to germinate altogether. 20. Transplant Basta-resistant plants to soil and propagate them (see Note 13). 3.3 Analysis of the Transgenics 1. Place 100 μL of 1-mm glass beads into microcentrifuge tubes using a homemade scoop. 2. Harvest one healthy leaf about 2 cm long into the microfuge tubes prefilled with ~100 μL of glass beads, wiping the forceps between individuals (see Note 14). 3. Store the tissue at −20 or −80 °C until needed, or proceed directly to the next step. 4. Freeze the samples in liquid nitrogen by resting them on the surface of a foil cup partially submerged in liquid nitrogen. 5. Grind the frozen samples for 5–6 s in a Vivadent shaker. 6. Add 250 μL of CTAB buffer. 7. Grind for 5–6 s in a Vivadent shaker and place the samples in a rack at room temperature while shaking the rest of the samples. 8. Incubate the entire rack of samples for 30 min at 65 °C. 9. Cool the samples to room temperature for ~10 min. 10. Add 250 μL of chloroform. 11. Mix the samples by vigorously shaking the tubes. 12. Spin at 14,000 rpm (20,817 × g) for ~10 min at room temperature. 13. Transfer 200 μL of the upper phase into a tube prefilled with 250 μL of isopropanol. 14. Mix the samples by inverting 3–4 times. 15. Spin at 14,000 rpm (20,817 × g) for ~10 min at room temperature. 16. Aspirate the supernatant, being careful not to touch or aspirate the DNA pellet. 17. Wash the pellet with 70 % EtOH. 18. Spin at 14,000 rpm (20,817 × g) for ~10 min at room temperature. 19. Aspirate the supernatant, being careful not to touch or aspirate the DNA pellet. 20. Air-dry the pellet for about 10 min. 21. Resuspend the DNA in 100–400 μL of deionized H2O.
Shake in the Vivadent shaker for 5 s and spin to collect any insoluble material at the bottom of the tube. Use 1–2 μL of DNA per 10–20 μL PCR reaction. 22. Test the integrity of the T-DNA using the SacBF and SacBR primers. The presence of an ~800-bp band is a good indicator of a complete copy of the T-DNA in the plant genome (Fig. 1) (see Note 15). 4 Notes 1. The F (forward) sequence in the ATIDB database (see below) corresponds to the Arabidopsis genomic sequence adjacent to the RB side of the T-DNA, whereas the R (reverse) sequence corresponds to the LB side of the T-DNA. 2. Ideally, the gene-specific primers should be complementary to the 1–2 kb region of Arabidopsis genomic DNA closest to the LB. This region is easy to identify, as the TAC-end sequences labeled “R” in the ATIDB were obtained by sequencing from the LB side of the vector. 3. It is important to start both the Agrobacterium culture used to prepare electrocompetent cells and the E. coli culture used to isolate TAC DNA from fresh, actively growing colonies. Starting the cultures from older cells or from colonies stored in the fridge will reduce the transformation efficiency. Other standard laboratory Agrobacterium strains can be used, but they should preferably be RecA− to avoid potential rearrangements of the cloned genomic DNA. 4. The Agrobacterium and E. coli overnight cultures can be incubated in the same shaker at 32 °C if desired. 5. Be very gentle when pipetting solutions containing the TAC clones. Mechanical damage to these large DNA molecules introduces nicks, causing them to lose their supercoiled conformation and making electroporation extremely inefficient. 6. Colony PCR of primary transformants is prone to false positives, probably due to the presence of trace amounts of the TAC DNA used in the transformation.
Colonies giving strong amplification with the gene-specific primers should be re-streaked on an LB plate supplemented with kanamycin (25 mg/L), and individual colonies tested again by PCR. Colonies that pass this second test should be considered true positives. 7. We typically use 80–100 μL of top agarose per up to 100 seeds and spread the entire volume over a 1/10 sector (or a larger area) of a standard 100-mm Petri dish. 8. It is very important to avoid damaging the seedlings with the forceps. Transplanting seedlings pre-germinated on plates in this manner makes it possible to select seedlings that germinated at the same time and look equally healthy. It also allows a very uniform distribution of the plants in the soil pots. With this transplanting method, healthy plants of uniform size and synchronized bolting time can be obtained, which is crucial for achieving good plant transformation efficiency. 9. It is very important to use glucose instead of sucrose in the plant transformation solution, as many JAtY clones can express the SacB gene in Agrobacterium (even if they cannot in E. coli), and the SacB protein converts sucrose into a toxic product. Sucrose in the transformation solution may therefore impair Agrobacterium growth and dramatically reduce plant transformation efficiency. 10. It is important to keep the plants covered immediately after dipping to maintain high humidity. 11. It is important to transition the plants from high humidity to a normal environment gradually to avoid damage to the young flower buds. 12. Be careful not to use hot top agarose, as it will kill the seeds. On the other hand, if the agarose is too cool, it will solidify when mixed with the room-temperature seed suspension and form clumps. Use 2–3 volumes of top agarose per volume of seed suspension; for example, add 10–15 mL of 0.7 % top agarose to 5 mL of seed suspension. 13.
The plant transformation efficiency with the JAtY clones, although highly variable, is significantly lower than that obtained with regular binary vectors. It is a good idea to determine the transformation efficiency by plating ~10,000 seeds on a single 150-mm plate. When estimating the number of lines obtained in a transformation experiment, keep in mind that up to 75 % of the resistant plants may have truncated T-DNAs. 14. Senescent petals on the surface of the leaf, or poor cleaning of the forceps between samples, may cause PCR false positives. 15. It is important to use DNA from an untransformed wild-type plant as a negative control. In our experience [6], the presence of SacB in Basta-resistant plants is diagnostic of a whole T-DNA copy in the plant genome. This test, however, cannot discriminate between different TAC clones; cross-contamination from plants transformed with a different TAC clone will therefore still produce a positive SacB amplification in a Basta-resistant plant. A gene-specific primer for a sequence close to the LB end of the TAC clone, combined with the SacBF primer, can be used to confirm the presence of the T-DNA corresponding to a specific TAC clone (Fig. 1). References 1. Alonso JM, Ecker JR (2006) Moving forward in reverse: genetic technologies to enable genome-wide phenomic screens in Arabidopsis. Nat Rev Genet 7:524–536 2. Lee LY, Gelvin SB (2008) T-DNA binary vectors and systems. Plant Physiol 146:325–332 3. Liu Y, Mitsukawa N, Vazquez-Tello A, Whittier RF (1995) Generation of a high-quality P1 library of Arabidopsis suitable for chromosome walking. Plant J 7:351–358 4. Chang Y-C, Henriquez XH, Preuss DP, Copenhaver GC, Zhang HZ (2003) A plant-transformation-competent BIBAC library from the Arabidopsis thaliana Landsberg ecotype for functional and comparative genomics. Theor Appl Genet 106:269–276 5.
Liu YG, Shirano Y, Fukaki H, Yanai Y, Tasaka M, Tabata S, Shibata D (1999) Complementation of plant mutants with large genomic DNA fragments by a transformation-competent artificial chromosome vector accelerates positional cloning. Proc Natl Acad Sci U S A 96:6535–6540 6. Zhou R, Benavente LM, Stepanova AN, Alonso JM (2011) A recombineering-based gene tagging system for Arabidopsis. Plant J 66:712–723 7. Farrand SK, O'Morchoe SP, McCutchan J (1989) Construction of an Agrobacterium tumefaciens C58 recA mutant. J Bacteriol 171:5314–5321 8. Sheng Y, Mancino V, Birren B (1995) Transformation of Escherichia coli with large DNA molecules by electroporation. Nucleic Acids Res 23:1990–1996 Chapter 16 Global DNA Methylation Analysis Using Methyl-Sensitive Amplification Polymorphism (MSAP) Mahmoud W. Yaish, Mingsheng Peng, and Steven J. Rothstein Abstract DNA methylation is a crucial epigenetic process that helps control gene transcription in eukaryotes. Information on the methylation status of the regulatory sequence of a particular gene provides important insight into this transcriptional control. DNA methylation can be detected using several methods, including sodium bisulfite sequencing and restriction digestion with methylation-sensitive endonucleases. Methyl-Sensitive Amplification Polymorphism (MSAP) is a technique used to study the global DNA methylation status of an organism and hence to distinguish between individuals based on the DNA methylation status revealed by their differential digestion patterns. This makes it a useful method for DNA methylation mapping and positional cloning of differentially methylated genes. In this technique, genomic DNA is first digested with a methylation-sensitive restriction enzyme such as HpaII, and the DNA fragments are then ligated to adaptors to facilitate their amplification.
Digestion with MspI, a methylation-insensitive isoschizomer of HpaII, is carried out in parallel as a loading control. These fragments are then selectively amplified with fluorescently labeled primers. PCR products from different individuals are compared, and once an interesting polymorphic locus is recognized, the corresponding DNA fragment can be isolated from a denaturing polyacrylamide gel, sequenced, and identified by similarity to sequences available in the databases. We use the analysis of met1, ddm1, and atmbd9 mutants, and of wild-type plants treated with the cytidine analogues 5-azaC or zebularine, to demonstrate how to assess genetic and chemical modulation of DNA methylation in Arabidopsis. Although MSAP is a reliable technique for fishing out polymorphic methylated loci, its power is limited to the recognition sites of the restriction enzymes used to digest the genomic DNA. Key words DNA methylation, MSAP, Mutant lines, 5-azaC and zebularine 1 Introduction DNA methylation is an important epigenetic modification that usually consists of the covalent attachment of a methyl group to carbon 5 of the cytosine (C) ring in DNA, without affecting the underlying nucleotide sequence (Fig. 1). Methylated cytosines that are followed by guanines (G) are annotated as CpG, in which C is linked to the adjacent G on the same strand by a phosphodiester bond (p), as opposed to the three hydrogen bonds that pair C with G across the two strands of double-stranded DNA [1]. Jose J. Sanchez-Serrano and Julio Salinas (eds.), Arabidopsis Protocols, Methods in Molecular Biology, vol. 1062, DOI 10.1007/978-1-62703-580-4_16, © Springer Science+Business Media New York 2014 285 286 Mahmoud W. Yaish et al. Fig. 1 Chemical structure of cytidine, 5-methylcytidine, 5-azaC, and zebularine. DNA methylation plays an important role in controlling gene expression in eukaryotes and is typically associated with transcriptional gene repression [2, 3].
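As the abstract notes, MSAP can only report on cytosines that lie within the recognition sites of the enzymes used; for HpaII and MspI this site is 5′-CCGG-3′. The Python sketch below, which is illustrative and not part of the published protocol, finds CCGG sites in a sequence and predicts whether each enzyme would cut, given a set of methylated cytosine positions. It assumes symmetric (fully methylated) sites, ignores hemimethylation, and uses the commonly cited sensitivities rather than anything stated in this chapter: HpaII is blocked when either cytosine of CCGG is methylated, whereas MspI is blocked only when the external cytosine is methylated. The function and class names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

SITE = "CCGG"  # HpaII/MspI recognition site; palindromic, so one strand suffices


@dataclass(frozen=True)
class SiteCall:
    position: int  # 0-based index of the external C of CCGG on the given strand
    hpaii_cuts: bool
    mspi_cuts: bool


def find_sites(seq: str) -> list[int]:
    """Return 0-based start positions of every CCGG in seq (case-insensitive)."""
    seq = seq.upper()
    return [i for i in range(len(seq) - len(SITE) + 1) if seq[i:i + len(SITE)] == SITE]


def predict_cuts(seq: str, methylated_c: set[int]) -> list[SiteCall]:
    """Predict HpaII/MspI cleavage at each CCGG.

    methylated_c holds 0-based positions of cytosines assumed to be fully
    (symmetrically) methylated. Assumed sensitivities: HpaII is blocked if
    either C of CCGG is methylated; MspI only if the external C is.
    """
    calls = []
    for pos in find_sites(seq):
        external = pos in methylated_c
        internal = (pos + 1) in methylated_c
        calls.append(
            SiteCall(pos, hpaii_cuts=not (external or internal), mspi_cuts=not external)
        )
    return calls


if __name__ == "__main__":
    # Toy sequence with three CCGG sites (positions 2, 8, and 14).
    demo = "AACCGGTTCCGGAACCGG"
    for call in predict_cuts(demo, methylated_c={9, 14}):
        print(call)
```

In the toy example, the unmethylated site at position 2 is cut by both enzymes, the site at position 8 (internal C methylated) is cut only by MspI, and the site at position 14 (external C methylated) is cut by neither. Cytosines outside CCGG sites never appear in the output, which is the limitation the abstract describes.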
Determining the DNA methylation status of a particular locus provides important information on the gene's expression pattern and detailed knowledge of its regulatory sequence. DNA methylation levels in plants change during growth and development and also when plants are exposed to biotic and abiotic stresses [4]. While some of these changes are transient, others are heritable through a process called transgenerational memory [5–7]. The DNA methylation pattern in Arabidopsis can be genetically manipulated through mutations in genes that maintain DNA methylation and/or are involved in establishing de novo DNA methylation. These include mutations in the DNA methyltransferase gene MET1, the chromatin remodeling factor gene DDM1 (Decrease in DNA Methylation 1), and the methylcytosine-binding protein 9 gene (AtMBD9), all of which lead to a significant alteration in genome-wide DNA methylation levels and, consequently, to the reactivation of transcriptionally silent genes and transposable elements [8, 9]. The DNA methylation pattern of the Arabidopsis genome can also be manipulated chemically. When chemical analogues of cytosine are incorporated into genomic DNA during replication, they inhibit the catalytic activity of DNA methyltransferases by covalently binding to their active sites, which leads to a general reduction in the DNA methylation level [10]. In plants, the most commonly used cytidine analogue is 5-azacytidine (5-azaC), in which carbon 5 of the pyrimidine ring is replaced by nitrogen [10]. The chemical structures of cytidine, 5-methylcytidine, 5-azaC, and zebularine are illustrated in Fig. 1. 5-azaC induces hypomethylation and genome-wide transcriptional reactivation of silent genes and thus modifies plant growth and development [11–13]. Zebularine is also a cytidine analogue and inhibits DNA methylation in a similar way to 5-azaC. Compared with 5-azaC, zebularine is more stable and less toxic, although its demethylating effect is transient [14].
DNA Methylation Analysis Using MSAP 287 DNA methylation can be detected by sodium bisulfite sequencing, with proteins that bind methylated DNA, with anti-methylcytosine antibodies, and with methylation-sensitive restriction endonucleases [15]. The global DNA methylation level can be assessed by Southern blot hybridization. In this technique, genomic DNA is digested with a methylation-sensitive endonuclease such as HpaII and with its methylation-insensitive isoschizomer MspI. The resulting DNA fragments are then probed with a sequence present in many copies in the genome, such as the 120-bp 5S ribosomal RNA repeat. Global DNA methylation can also be studied using methyl-sensitive amplification polymorphism (MSAP). In this technique, genomic DNA from different samples is digested with the methylation-sensitive endonuclease HpaII, adaptors are ligated to the digested DNA, and the fragments are amplified by PCR with specific primers (Fig. 2). Qualitative and quantitative differences in the amplification products indicate variation in the global DNA methylation pattern, so that variation in methylation from site to site, as well as from tissue to tissue, can be studied. In this chapter, we describe the experimental protocol used to measure global DNA methylation by MSAP in DNA methylation mutants, including met1, ddm1, and atmbd9, as well as in wild-type Arabidopsis plants after treatment with 5-azaC or zebularine. This MSAP technique can be used to identify differentially methylated genomic regions within and between plant populations of different genetic backgrounds, as well as in plants grown under different environmental conditions. In addition, it can be used for epigenetic mapping and positional cloning of target genes.
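MSAP bands from the HpaII and MspI digestions of each sample are typically scored as present (1) or absent (0). The Python sketch below applies an interpretation scheme widely used in the MSAP literature, which this chapter does not itself prescribe, to classify each fragment and to compute the percentage of methylated sites among informative fragments. The class labels and the percentage formula are assumptions; absence in both lanes is treated as uninformative because it cannot be distinguished from loss of the restriction site. Fragment names in the example are hypothetical.

```python
from __future__ import annotations

from collections import Counter

# (HpaII band, MspI band) -> interpretation; 1 = band present, 0 = absent.
# Assumed convention from the MSAP literature, not prescribed by this chapter.
CLASSES = {
    (1, 1): "unmethylated",
    (1, 0): "hemimethylated external C",
    (0, 1): "internal C methylated",
    (0, 0): "uninformative",  # cannot be told apart from loss of the site
}
METHYLATED = ("hemimethylated external C", "internal C methylated")


def classify(hpaii: int, mspi: int) -> str:
    """Return the interpretation for one fragment's HpaII/MspI band pattern."""
    try:
        return CLASSES[(hpaii, mspi)]
    except KeyError:
        raise ValueError(f"bands must be scored 0 or 1, got {(hpaii, mspi)}") from None


def methylation_summary(bands: dict[str, tuple[int, int]]) -> tuple[Counter, float]:
    """Count each class and return % methylated among informative fragments."""
    counts = Counter(classify(h, m) for h, m in bands.values())
    methylated = sum(counts[label] for label in METHYLATED)
    informative = methylated + counts["unmethylated"]
    percent = 100.0 * methylated / informative if informative else 0.0
    return counts, percent


if __name__ == "__main__":
    # Hypothetical scores for five fragments in one sample.
    scores = {"f1": (1, 1), "f2": (0, 1), "f3": (1, 0), "f4": (0, 0), "f5": (1, 1)}
    counts, percent = methylation_summary(scores)
    print(dict(counts), f"{percent:.1f} % methylated")
```

For the five hypothetical fragments, the function counts two unmethylated, one internal-C-methylated, one hemimethylated, and one uninformative fragment, giving 50 % methylation among the four informative ones. Comparing these percentages, or the per-fragment classes, between mutants, treated plants, and the wild type is one way to summarize the qualitative differences described above.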
MSAP is described here following the strategies and protocols previously published for the amplified fragment length polymorphism (AFLP) technique [16], as modified for MSAP by Beaulieu et al. [17] and Madlung et al. [18]. Although MSAP is a reliable and easy-to-use technique, methods based on methylation-sensitive digestion limit the detection of methylation to the recognition sites of the restriction enzymes used. 2 Materials 2.1 Treatment of Arabidopsis Seeds with 5-azaC and Zebularine 1. Arabidopsis seeds: seeds of Arabidopsis thaliana ecotype Columbia (Col) wild type and of the met1, ddm1, and atmbd9 mutants can be obtained from the Arabidopsis Stock Center (TAIR; www.arabidopsis.org). 2. Sterilization solution (5 % sodium hypochlorite, 0.05 % Tween-20). 3. Ethanol 75 %. 4. 1 mm Whatman filter papers. Fig. 2 Schematic representation of the MSAP technique. DNA is first digested with methylation-sensitive (HpaII) and methylation-insensitive (MspI) endonucleases, and the resulting DNA fragments are ligated to specific adaptors. The ligated DNA fragments are then used as templates in a preselective PCR reaction with specific primers. The resulting PCR products are used as template in a selective PCR reaction with fluorescently labeled primers (asterisk) carrying three selective nucleotides. The selective PCR products are loaded onto an ABI Prism 310 Genetic Analyzer, and bands are scored for presence or absence. 5. The DNA demethylation chemicals 5-azaC and zebularine are available from Sigma. Prepare a fresh 0.5 mM 5-azaC aqueous solution for each treatment; never use stored 5-azaC solution (see Note 1). Prepare a 40 mM zebularine stock solution in sterile distilled water and store it at −20 °C (see Note 2). 6. Zebularine treatment medium: solid 0.5× MS medium [19], 1 % sucrose, 1 % agar, 40 μM zebularine, in Petri dishes. Control medium is solid 0.5× MS medium without zebularine. 7.
Pots containing a mixture of universal substrate and vermiculite (3:1 v/v). 2.2 Genomic DNA Extraction and Purification 1. Liquid nitrogen. 2. Mortar and pestle. 3. DNA extraction buffer (150 mM Tris–HCl, pH 8.0, 15 mM EDTA (ethylenediaminetetraacetic acid), 1.0 M NaCl, 0.16 % (w/v) CTAB (cetyltrimethylammonium bromide), 20 μL/L 2-mercaptoethanol, and 0.1 % (w/v) PVP (polyvinylpyrrolidone)). 4. Phenol/chloroform/isoamyl alcohol (PCIM, 25:24:1, v/v/v), stored at 4 °C. 5. 100 % isopropanol. 6. 75 % ethanol. 7. Tris EDTA (TE) buffer (10 mM Tris–Cl, pH 7.5, 1 mM EDTA). 8. 3 M sodium acetate (pH 5.2). 9. QIAGEN DNeasy Plant Maxi Kit (catalogue number 68163). 10. NanoDrop spectrophotometer. 11. Agarose gel electrophoresis unit. 2.3 Methyl-Sensitive Amplification Polymorphism (MSAP) 1. Restriction enzymes (EcoRI, HpaII, and MspI) and their buffers. 2. T4 DNA ligase and ligase buffer. 3. Adapters: EcoRI adapter (5′-CTCGTAGACTGCGTACC-3′ and 5′-AATTGGTACGCAGTCTAC-3′); HpaII-MspI adapter (5′-GATCATGAGTCCTGCT-3′ and 5′-CGAGCAGGACTCATGA-3′). Oligonucleotides should be HPLC purified and synthesized at 0.2 μmol scale. 4. Oligonucleotide primers: preselective EcoRI primer (5′-GACTGCGTACCAATTC-3′), preselective HpaII-MspI primer (5′-ATCATGAGTCCTGCTCGG-3′), selective EcoRI primers (5′-GACTGCGTACCAATTC-AAC, -ACC, -ACA, or -AAG-3′) (Applied Biosystems) (see Note 3), and HpaII-MspI selective primers (5′-ATCATGAGTCCTGCTCGGTCAA-3′ and 5′-ATCATGAGTCCTGCTCGGTCCA-3′). Primers should be HPLC purified and synthesized at 0.2 μmol scale. 5. Taq DNA polymerase, buffer, and dNTPs. 6. A thermocycler such as the Perkin-Elmer GeneAmp PCR System 9700. 7. Mini agarose gel electrophoresis unit. 8. 1× Tris Borate EDTA (TBE) buffer (89 mM Tris base, 89 mM boric acid, 2 mM EDTA). 9. GeneRuler 100 bp DNA ladder (Fermentas, catalogue number SM0241). 10. 6× DNA loading dye (Fermentas, catalogue number R0611). 11.
GeneScan-500 [ROX] internal size standard (Applied Biosystems, catalogue number 401734). 12. Deionized formamide (Applied Biosystems, catalogue number 400596). 13. ABI Prism 310 Genetic Analyzer (Applied Biosystems). 14. ABI Prism GeneScan 3.1 software. 2.4 Identification of the Polymorphic DNA 1. Selective EcoRI oligonucleotide primers end-labeled with [32P]ATP (end-labeling grade; ICN Radiochemicals, Solon, OH, USA). 2. 40 % acrylamide solution (37.5:1 acrylamide:bis-acrylamide), available from Bio-Rad Life Science. 3. 1 M Tris–HCl buffer (pH 8.0), available from Sigma-Aldrich. 4. 10 % ammonium persulfate (100 mg/mL), available from Bio-Rad Life Science. 5. TEMED (N,N,N′,N′-tetramethylethylenediamine), available from Bio-Rad Life Science. 6. TBE buffer (89 mM Tris base, 89 mM boric acid, 2 mM EDTA). 7. 10 % acetic acid. 8. Formamide loading dye: 98 % formamide, 10 mM EDTA pH 8.0, with bromophenol blue and xylene cyanol as tracking dyes. 9. Power supply. 10. Fuji BAS-2000 phosphorimager analysis system (Fuji Photo Film Company Ltd, Japan). 11. QIAEX II Gel Extraction Kit (QIAGEN, catalogue number 20021). 12. QIAquick PCR Purification Kit (QIAGEN, catalogue number 28104). 13. Sequi-Gen 38 cm × 50 cm gel apparatus (Bio-Rad Laboratories Inc., Hercules, CA, USA). 3 Methods As a general precaution, use a thermocycler for the incubation steps of the digestion and ligation reactions to ensure a constant temperature and accurate timing. 3.1 Surface Sterilization of Arabidopsis Seeds 1. Suspend 100 mg of seeds of Col or a mutant line in 1 mL of 75 % ethanol in an Eppendorf tube for 5 min. 2. Remove the ethanol and wash the seeds twice with sterile distilled water. 3. Suspend the seeds in the sterilization solution for 5 min in an Eppendorf tube with frequent mixing. 4.
Remove the sterilization solution from the tube, and wash the seeds six times with sterile distilled water.
5. Stratify the surface-sterilized seeds by storing them in the dark at 4 °C for 2 days. The seeds are then ready for demethylation treatment with 5-azaC or zebularine.

3.2 Treating Arabidopsis Seeds with 5-azaC
1. Wet 1 mm Whatman filter papers with 0.5 mM aqueous 5-azaC solution (2 mL/paper) or, for the control, with sterile distilled water.
2. Place the wetted filter papers in Petri dishes.
3. Sow the surface-sterilized Col seeds on the filter papers, and seal the Petri dishes with Parafilm to maintain humidity.
4. Allow the seeds to germinate by placing the Petri dishes in the dark at 4 °C for 6 days.
5. Transfer the seedlings to pots filled with universal substrate and vermiculite, and grow them under the following conditions: 24 °C (day)/20 °C (night), 16 h light/8 h dark, 200 μE m−2 s−1 light intensity, and 60 % humidity.
6. Record plant growth and developmental phenotypes with and without treatment.

3.3 Treating Arabidopsis Seeds with Zebularine
1. Sow surface-sterilized Col seeds on zebularine treatment medium and on control medium.
2. Incubate the seeds under the following conditions: 24 °C (day)/20 °C (night), 16 h light/8 h dark, 200 μE m−2 s−1 light intensity.
3. At 14 days after germination, transfer the seedlings to containers of freshly prepared medium (see Note 4).
4. Record plant growth and developmental phenotypes with and without treatment.

3.4 Genomic DNA Isolation
1. Before flowering, collect rosette leaves from ten plants of each treatment, mutant line, and wild type; freeze them in liquid nitrogen, grind to a powder, and store at −80 °C for DNA extraction and genome-wide analysis of DNA methylation.
2. Add 3 mL of DNA extraction buffer per 1 g of finely ground tissue in a 50-mL polypropylene tube.
3.
Incubate at 65 °C in a water bath for 45 min with frequent shaking, then allow the extract to cool to room temperature.
4. Extract the homogenate with phenol/chloroform/isoamyl alcohol (25:24:1).
5. Centrifuge at 10,000 × g for 10 min at room temperature and transfer the aqueous layer to a new tube.
6. Extract again with chloroform/isoamyl alcohol (24:1).
7. Centrifuge at 10,000 × g for 10 min at room temperature.
8. Transfer the aqueous layer to a new tube and precipitate the nucleic acids by adding 0.1 volume of 3 M sodium acetate (pH 5.2) and 0.6 volume of cold isopropanol; incubate for 2 h at −80 °C.
9. Centrifuge at 10,000 × g for 30 min at 4 °C and wash the nucleic acid pellet with 1 mL of cold 75 % ethanol.
10. Dissolve the pellet in 300 μL of TE buffer.
11. Purify the extracted DNA from contaminants and enzyme inhibitors using the QIAGEN DNeasy Plant Maxi Kit, following the manufacturer’s instructions.
12. Determine the quantity and quality of the DNA using the NanoDrop spectrophotometer, and run 10 μL on a 1 % agarose gel (see Note 5).

3.5 Methyl-Sensitive Amplified Polymorphism (MSAP)
3.5.1 DNA Digestion, Adapter Ligation, and Preselective and Selective PCR Amplification
1. Digest genomic DNA (100 ng) from each of ten individual Arabidopsis plants per treatment with 4 U each of EcoRI and either the methylation-sensitive HpaII or the methylation-insensitive MspI in a final volume of 10 μL, using the thermocycler as an incubator for the reaction.
2. When the incubation is complete, inactivate the restriction enzymes by heating the reaction at 80 °C for 10 min.
3. Anneal the complementary oligonucleotides of the EcoRI adapter and of the HpaII-MspI adapter in two separate 100-μL PCR tubes by adding 20 μL of 30 pmol of each complementary oligonucleotide, heating to 72 °C for 10 min, and then allowing the reaction to cool to room temperature (see Note 6).
4.
Ligate the digested genomic DNA fragments (10 μL) to the two adapters by adding ligation mixture (2 μL of 1.5 pmol EcoRI adapter, 2 μL of 15 pmol HpaII-MspI adapter, 4 U of T4 DNA ligase, 1× ligase buffer) in a total volume of 25 μL, and incubate overnight at 18 °C.
5. Dilute the ligation reaction fourfold with Milli-Q water.
6. For the preselective PCR, use 3 μL of the diluted ligation reaction, 10 pmol each of the preselective EcoRI and HpaII-MspI primers, 0.2 mM dNTPs, and 0.5 U of Taq DNA polymerase. Program the thermocycler for 20 cycles of 94 °C for 30 s, 56 °C for 1 min, and 72 °C for 1 min.
7. Check the size of the amplified fragments by running 10 μL of the PCR products on a 1.5 % agarose gel in 1× TAE buffer at 4 V/cm for 3–4 h (see Note 7).
8. Stain with ethidium bromide (see Note 8).
9. View the gel on a UV transilluminator (see Note 9).
10. Dilute 10 μL of the PCR products tenfold with Milli-Q water and use the dilution as the template for the selective amplification.
11. For the selective PCR, use 3 μL of the diluted PCR products, 0.5 pmol of one labeled EcoRI selective primer, 10 pmol of one HpaII-MspI selective primer, 0.04 mM dNTPs, and 0.5 U of Taq polymerase in an 11-μL reaction, with the following touchdown program: 94 °C for 2 min, then 20 cycles of 94 °C for 20 s, 66 °C for 30 s, and 72 °C for 2 min. During the first ten cycles, the annealing temperature falls by 1 °C per cycle. At the end of these cycles, hold the reaction at 60 °C for 30 min for a final extension.

3.5.2 Separating the PCR Products of Selective Amplifications by Capillary Electrophoresis on an ABI Prism 310 Genetic Analyzer
The ABI Prism 310 Genetic Analyzer detects the fluorescence of the EcoRI site-specific primers, which are labeled with yellow (NED), blue (FAM), or green (JOE) fluorescent dyes.
Each selective primer can be labeled with one of the three fluorescent dyes, so that three different reactions can be loaded together. An internal size standard, GeneScan-500 [ROX] (35–500 bp), labeled with a red (ROX) dye, should be added to determine the size of the separated fragments.
1. Prepare a loading buffer for each sample by mixing 24.0 μL of deionized formamide and 1.0 μL of GeneScan-500 [ROX] size standard.

Fig. 3 A sample chromatogram of results obtained from the ABI Prism 310 Genetic Analyzer. The horizontal axis represents fragment size, and the vertical axis represents the quantity of the amplicon. Each peak represents an amplicon (a fragment of DNA produced during the selective PCR amplification) (a–c). Differentially amplified and polymorphic peaks are indicated by arrows. Smaller peaks indicate the presence of a heteromorphic allele in terms of DNA methylation status (b). Absence of peaks, indicated by arrows, may represent genetic modulation of DNA methylation in Arabidopsis (c)

2. Add 25 μL of the loading buffer mix to a genetic analyzer sample tube, using one tube for each sample.
3. Add 2 μL of the selective PCR products to the tube.
4. Heat the tubes to 95 °C for 3 min in a thermocycler.
5. Snap-chill the tubes on ice (see Note 10).
6. On the ABI Prism 310 Genetic Analyzer, inject each sample for 12 s at 15 kV, and run at 15 kV for 26 min (see Note 11).

3.5.3 Data Analysis
Genomic DNA of ten individual plants (ten replicates) is usually treated and screened for each Arabidopsis genetic line and treatment. These replicates are used to confirm deviations of the DNA methylation pattern from that of the wild type; deviations are scored as the presence or absence of a particular polymorphic DNA fragment (amplicon) in each treatment, using the same primer pair in the selective PCR amplification.
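The replicate-based presence/absence scoring used in this subsection can be illustrated with a short Python sketch. It is not part of the published protocol. It assumes that peak sizes (bp) have already been exported from the sizing software as one list per plant, that sizes are binned by rounding to the nearest base pair, and that only peaks between 60 and 500 bp (the window this protocol analyzes) are scored. All function names are hypothetical. A fragment is flagged when it is present in every replicate of one group and absent from every replicate of the other. The proportion of polymorphic peaks is the number of peaks whose presence or absence varies among samples, divided by the total number of scored peaks.

```python
# Hypothetical helpers; the input format and the size binning are assumptions.

def binary_matrix(samples, min_bp=60, max_bp=500):
    """Score peaks as present (1) or absent (0).

    samples: dict mapping a sample name to a list of peak sizes in bp.
    Returns (sorted list of binned sizes, dict of sample name -> 0/1 list).
    """
    sizes = sorted({round(s) for peaks in samples.values() for s in peaks
                    if min_bp <= s <= max_bp})
    matrix = {}
    for name, peaks in samples.items():
        present = {round(s) for s in peaks}
        matrix[name] = [1 if size in present else 0 for size in sizes]
    return sizes, matrix


def consistent_polymorphisms(sizes, matrix, group_a, group_b):
    """Sizes present in all replicates of one group and absent from all of the other."""
    hits = []
    for i, size in enumerate(sizes):
        a = [matrix[name][i] for name in group_a]
        b = [matrix[name][i] for name in group_b]
        if (all(a) and not any(b)) or (all(b) and not any(a)):
            hits.append(size)
    return hits


def polymorphic_proportion(sizes, matrix):
    """Number of polymorphic peaks divided by the total number of scored peaks."""
    if not sizes:
        return 0.0
    n_poly = sum(1 for column in zip(*matrix.values()) if len(set(column)) > 1)
    return n_poly / len(sizes)


# Usage with three wild-type and three 5-azaC-treated replicates (made-up sizes).
wt = {f"wt{i}": [75.2, 120.0, 233.9] for i in range(1, 4)}
aza = {f"aza{i}": [75.1, 120.2, 310.4] for i in range(1, 4)}
sizes, matrix = binary_matrix({**wt, **aza})
print(sizes)                                                         # [75, 120, 234, 310]
print(consistent_polymorphisms(sizes, matrix, list(wt), list(aza)))  # [234, 310]
print(polymorphic_proportion(sizes, matrix))                         # 0.5
```

Requiring agreement across all replicates is deliberately strict; with ten plants per treatment, a looser rule (for example, a minimum number of replicates) may be more appropriate, depending on the purpose of the study.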
Differences in amplification level (peak height) can indicate the presence of a heteromorphic allele in terms of DNA methylation (Fig. 3).
Data on selectively amplified DNA fragments can be collected with the ABI Prism 310 and analyzed with the ABI Prism GeneScan 3.1 software, which sizes and quantifies the detected fragments. The same software can be used to compare the graphical representations of amplified fragments from all individual plants. Peaks between 60 and 500 bp should be selected to study the polymorphic DNA fragments (peaks) between the two genetic lines (Fig. 3). MSAP products can be scored as present (1) or absent (0) on the chromatogram to create a binary matrix. The proportion of polymorphic peaks can be estimated as the ratio of the number of polymorphic peaks to the total number of peaks scored. These data can be treated and arranged depending on the purpose of the study. Partial methylation, arising from differences in methylation status between copies of the same locus, results in changes in product intensity between genotypes and between 5-azaC- and zebularine-treated plants. Once an interesting peak is identified from the polymorphic pattern in the chromatogram, the corresponding DNA fragment can be amplified with the same primer pair, separated on a vertical denaturing 5 % polyacrylamide gel, isolated, and sequenced.

3.6 Identification of the Polymorphic DNA Fragment
1. Perform the selective PCR as described above, using the preselective PCR products as the DNA template and the appropriate selective EcoRI primer end-labeled with [γ-32P]ATP. Run the PCR in the thermocycler under the same conditions as above (see Note 12).
2. Prepare a denaturing 5 % acrylamide gel by mixing 12.5 mL of 40 % acrylamide-bis solution, 7.5 M urea in 50 mM TBE, 500 μL of 10 % ammonium persulfate, and 100 μL of TEMED (see Note 13).
3.
Cast the solution in a Sequi-Gen 38 cm × 50 cm gel apparatus and allow the gel to polymerize for 4 h.
4. Denature the PCR samples by mixing 20 μL of formamide loading dye with an equal volume of PCR sample, heating at 90 °C for 3 min, and then chilling quickly on ice for at least 2 min.
5. Flush the gel wells to remove unpolymerized acrylamide and urea, then load an equal volume of each sample per well.
6. Run the gel in TBE buffer at a constant power of 110 W for 2 h.
7. Fix the DNA in the gel for 30 min in 10 % acetic acid, dry the gel on the glass plate, and expose it to Fuji phosphorimager screens for 16 h. Fingerprint patterns can be visualized with the Fuji BAS-2000 phosphorimage analysis system.
8. Isolate the polymorphic DNA by cutting the band out of the gel.
9. Rehydrate the band by boiling it in 100 μL of Milli-Q water for 5 min.
10. Clean up the DNA fragment from gel impurities using the QIAEX II Gel Extraction Kit.
11. Use the purified fragment as a template for a PCR containing 5.0 μL of the eluted DNA, 10.0 pmol of the selective EcoRI primer, 10.0 pmol of the selective HpaII-MspI primer, PCR buffer containing MgCl2, 2.5 mM dNTPs, and 1.0 U of Taq polymerase. Use the same cycling program as for the selective PCR above (see Note 14).
12. Purify the PCR products using the QIAquick PCR Purification Kit, following the manufacturer’s instructions.
13. Sequence the PCR products using the selective EcoRI primer and routine sequencing reactions and conditions.

3.6.1 Data Analysis
To identify the differentially methylated DNA fragments, the sequences obtained can be used in a BLAST search for sequence similarity against the National Center for Biotechnology Information (NCBI) databases. The BLAST website is available at http://blast.ncbi.nlm.nih.gov/Blast.cgi. A gene can be identified based on the similarity between the obtained sequence and sequences in the database.

4 Notes
1.
5-azaC is a white crystalline powder, soluble in water (50 mM). However, 5-azaC is unstable in aqueous solution and sensitive to light and oxidation; therefore, storing 5-azaC solutions is not recommended. Treat Arabidopsis seeds with freshly prepared 5-azaC solution kept in the dark and at low temperature.
2. Zebularine is an off-white solid, soluble in water (100 mM). Aqueous zebularine solution is stable for up to 3 months at −20 °C.
3. The EcoRI site-specific primers can be labeled with yellow (NED), blue (FAM), or green (JOE) fluorescent dyes, allowing three different reactions to be loaded simultaneously.
4. The demethylating effect of zebularine is transient. Growing Arabidopsis seedlings on zebularine treatment medium and then transferring them to control medium can be used to show that zebularine reduces Arabidopsis genomic DNA methylation only transiently.
5. The DNA concentration can be measured with a NanoDrop spectrophotometer at a wavelength of 260 nm. DNA purity is determined from the 260/280 nm absorbance ratio; good-quality DNA should have a ratio between 1.8 and 2.0. On an ethidium bromide-stained agarose gel, good-quality DNA appears as a sharp, single, high-molecular-weight band, whereas poor-quality DNA appears as several bands or a smear. A smear indicates low-molecular-weight DNA resulting from degradation during extraction; such DNA is not suitable for MSAP analysis.
6. DNA digestions and adapter ligations should be carried out separately to avoid the formation of concatemers (long continuous DNA molecules containing multiple copies of the same DNA sequence linked in series).
7. The ligation is a critical step of this protocol. Preamplified PCR products should appear as a smear of equal intensity across samples after agarose gel electrophoresis and ethidium bromide staining.
8.
Ethidium bromide is a mutagen and is moderately toxic. Take extra care: wear gloves, a lab coat, and safety glasses when using this dye.
9. Good amplification products for MSAP should appear as a smear between 100 and 1,500 bp on a 1.5 % agarose gel.
10. The genetic analyzer sample tubes can be placed in the 48- or 96-well sample tray.
11. To verify the reproducibility of each fragment, repeat each MSAP procedure at least twice.
12. PCR labeling of the DNA fragment, excision of the DNA fragment from the gel, and purification of the radiolabeled PCR products should be carried out behind 3/8- or 1/2-inch-thick glass or transparent acrylic shields.
13. Handle unpolymerized acrylamide and TEMED carefully; they are widely considered a neurotoxin and a reproductive toxicant, respectively.
14. The amount of eluted DNA is often insufficient for sequencing reactions; therefore, PCR is used to amplify the eluted DNA.

References
1. Ehrlich M, Wang RY (1981) 5-Methylcytosine in eukaryotic DNA. Science 212:1350–1357
2. Doerfler W (1983) DNA methylation and gene activity. Annu Rev Biochem 52:93–124
3. Riggs AD, Jones PA (1983) 5-Methylcytosine, gene regulation, and cancer. Adv Cancer Res 40:1–30
4. Yaish MW, Colasanti J, Rothstein SJ (2011) The role of epigenetic processes in controlling flowering time in plants exposed to stress. J Exp Bot 62:3727–3735
5. Boyko A et al (2010) Transgenerational adaptation of Arabidopsis to stress requires DNA methylation and the function of Dicer-like proteins. PLoS One 5:e9514
6. Chan SW, Henderson IR, Jacobsen SE (2005) Gardening the genome: DNA methylation in Arabidopsis thaliana. Nat Rev Genet 6:351–360
7. Molinier J et al (2006) Transgeneration memory of stress in plants. Nature 442:1046–1049
8.
Bartee L, Bender J (2001) Two Arabidopsis methylation-deficiency mutations confer only partial effects on a methylated endogenous gene family. Nucleic Acids Res 29:2127–2134
9. Singer T, Yordan C, Martienssen RA (2001) Robertson’s mutator transposons in A. thaliana are regulated by the chromatin-remodeling gene decrease in DNA methylation (DDM1). Genes Dev 15:591–602
10. Santi DV, Garrett CE, Barr PJ (1983) On the mechanism of inhibition of DNA-cytosine methyltransferases by cytosine analogs. Cell 33:9–10
11. Yaish MW, Peng M, Rothstein SJ (2009) AtMBD9 modulates Arabidopsis development through the dual epigenetic pathways of DNA methylation and histone acetylation. Plant J 59:123–135
12. Borowska N, Idziak D, Hasterok R (2011) DNA methylation patterns of Brachypodium distachyon chromosomes and their alteration by 5-azacytidine treatment. Chromosome Res 19:955–967
13. Castilho A et al (1999) 5-Methylcytosine distribution and genome organization in triticale before and after treatment with 5-azacytidine. J Cell Sci 112(Pt 23):4397–4404
14. Cheng JC et al (2003) Inhibition of DNA methylation and reactivation of silenced genes by zebularine. J Natl Cancer Inst 95:399–409
15. Zilberman D, Henikoff S (2007) Genome-wide analysis of DNA methylation patterns. Development 134:3959–3965
16. Vos P et al (1995) AFLP: a new technique for DNA fingerprinting. Nucleic Acids Res 23:4407–4414
17. Beaulieu J, Jean M, Belzile F (2009) The allotetraploid Arabidopsis thaliana–Arabidopsis lyrata subsp. petraea as an alternative model system for the study of polyploidy in plants. Mol Genet Genomics 281:421–435
18. Madlung A et al (2002) Remodeling of DNA methylation and phenotypic and transcriptional changes in synthetic Arabidopsis allotetraploids. Plant Physiol 129:733–746
19. Murashige T, Skoog F (1962) A revised medium for rapid growth and bio assays with tobacco tissue cultures.
Physiol Plant 15:473–497

Part IV Molecular Biological Techniques

Chapter 17 Next-Generation Mapping of Genetic Mutations Using Bulk Population Sequencing

Ryan S. Austin, Steven P. Chatfield, Darrell Desveaux, and David S. Guttman

Abstract
Next-generation sequencing platforms have made it possible to map genetic mutations in Arabidopsis very rapidly by whole-genome resequencing of pooled members of an F2 mapping population. In the case of recessive mutations, all individuals expressing the phenotype will be homozygous for the mutant genome at the locus responsible for the phenotype, while unlinked loci segregate roughly equally for both parental lines due to recombination. Importantly, genomic regions flanking the recessive mutation will be in linkage disequilibrium and will therefore also be homozygous due to genetic hitchhiking. This information can be exploited to identify the causal mutation quickly and effectively. To this end, sequence data generated from members of the pooled population exhibiting the mutant phenotype are first aligned to the reference genome. Polymorphisms between the mutant and mapping lines are then identified and used to determine the homozygous, nonrecombinant region harboring the mutation. Polymorphisms in the identified region are filtered to provide a short list of markers potentially responsible for the phenotype of interest, followed by validation at the bench. Although recent studies have focused on mapping point mutations with recessive phenotypes, the techniques employed can be extended to more complicated scenarios, such as dominant mutations and mutations caused by insertions or deletions in genomic sequence. This chapter describes detailed procedures for performing next-generation mapping of an Arabidopsis mutant and discusses how different mutations might be approached.
Key words Mutagenesis, Genetic mapping, Positional/map-based cloning, Genome sequencing, Next-generation genomics, Genome analysis

1 Introduction
The physical mapping of monogenic, qualitative traits has traditionally been laborious and time-consuming because of the need to breed and phenotype large populations of F2 plants and then score them with molecular markers. The advent of next-generation sequencing (NGS) technologies has dramatically reduced this effort in a number of model systems, including Arabidopsis, by replacing the scoring of molecular markers with whole-genome sequencing [1–8].

Jose J. Sanchez-Serrano and Julio Salinas (eds.), Arabidopsis Protocols, Methods in Molecular Biology, vol. 1062, DOI 10.1007/978-1-62703-580-4_17, © Springer Science+Business Media New York 2014

To date, several groups have developed powerful NGS mapping approaches for Arabidopsis, typically focused on identifying the position of recessive ethyl methanesulfonate (EMS)-induced mutations [1–3]. All of these methods employ an approach analogous to bulk segregant analysis [9]. Namely, they exploit the genetic principle that when a line carrying a recessive mutation (the mutant line) is crossed to a mapping line to form an F1, which is then selfed to produce an F2 population segregating for the recessive trait of interest, all plants showing the target phenotype will be homozygous at the causative mutation [10]. Moreover, the causative mutation will be in linkage disequilibrium with the surrounding genome due to genetic hitchhiking, and consequently the mutation of interest will be embedded in a larger homozygous block of the mutant genome. How far this disequilibrium extends from the mutation of interest is determined by the amount of recombination between the two parental lines, which is directly related to the number of individual F2 lines examined.
Consequently, distant genomic regions and those on different chromosomes will segregate as an approximately equal mix of the two parental lines [9].
NGS can be performed on a pool of F2 lines to identify nearly all mutations that distinguish the mutant and mapping lines. These sequence data are typically mapped back onto a reference genome to distinguish genomic regions carrying SNPs diagnostic of both parents from SNPs unique to the mutant line. Because SNPs are identified de novo from the sequence data, the causal mutation can be found directly within the sequencing results. This depends, of course, on a sufficient number of recombination events surrounding the target locus, adequate sequence quality, and sufficient depth of coverage across the genome to call SNPs with reasonable confidence. When these conditions are met, however, mapping software and tools can quickly identify a short list of candidate genes responsible for the phenotype [1–3].
Although several approaches and tools for mapping by NGS in Arabidopsis are available [1–3], this protocol focuses on the “next-generation mapping” (NGM) implementation [2] (http://bar.utoronto.ca/ngm). This method classifies SNP allele frequencies as arising from either the homozygous (mutant) or the heterozygous (mutant and mapping) background using a purity statistic, and applies a technique based on kernel density estimation to refine the region of interest [2]. A user-friendly, web-based interface allows researchers to explore their mapping results dynamically and requires only a file detailing the SNPs present within the bulked population. This file is generated by sequencing the F2 bulk population on any suitable NGS platform, aligning reads to the reference genome, and calling SNPs with freely available public-domain software.
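The general idea of separating homozygous from heterozygous SNPs and smoothing their positions with a kernel can be illustrated with a short Python sketch. This is not the NGM implementation: it replaces NGM’s chastity statistic and k-means “chastity belt” clustering with simple, hypothetical allele-frequency thresholds (non-reference frequency of at least 0.9 for homozygous, 0.35–0.65 for heterozygous) and a fixed Gaussian kernel bandwidth. It also assumes that the reference genome matches the mutant background, so mapping-line SNPs appear as non-reference alleles, and that SNPs are supplied as (position, reference read count, non-reference read count) tuples for one chromosome. The sketch returns the position where the ratio of homozygous to heterozygous SNP density peaks, which is where a nonrecombinant block around a recessive mutation would be expected.

```python
import math

# Simplified illustration only; thresholds, bandwidth, and grid step are hypothetical.

def allele_class(ref_count, alt_count, hom_min=0.9, het_range=(0.35, 0.65)):
    """Classify a SNP as 'hom', 'het', or None from its non-reference allele frequency."""
    depth = ref_count + alt_count
    if depth == 0:
        return None
    freq = alt_count / depth
    if freq >= hom_min:
        return "hom"
    if het_range[0] <= freq <= het_range[1]:
        return "het"
    return None


def kernel_density(positions, grid, bandwidth):
    """Unnormalized Gaussian kernel density of SNP positions evaluated on a grid."""
    return [sum(math.exp(-0.5 * ((g - p) / bandwidth) ** 2) for p in positions)
            for g in grid]


def localize(snps, chrom_length, step=50_000, bandwidth=500_000, pseudo=1e-3):
    """Return (position, ratio) where the homozygous/heterozygous density ratio peaks.

    snps: iterable of (position, ref_count, alt_count) tuples for one chromosome.
    """
    hom, het = [], []
    for pos, ref, alt in snps:
        cls = allele_class(ref, alt)
        if cls == "hom":
            hom.append(pos)
        elif cls == "het":
            het.append(pos)
    grid = list(range(0, chrom_length + 1, step))
    d_hom = kernel_density(hom, grid, bandwidth)
    d_het = kernel_density(het, grid, bandwidth)
    # A small pseudocount keeps the ratio finite where no heterozygous SNPs remain.
    ratio = [(h + pseudo) / (e + pseudo) for h, e in zip(d_hom, d_het)]
    best = max(range(len(grid)), key=ratio.__getitem__)
    return grid[best], ratio[best]


# Usage on simulated data: heterozygous mapping-line SNPs every 100 kb except in a
# nonrecombinant block at 4-6 Mb, plus three homozygous SNPs near 5 Mb.
snps = [(p, 10, 10) for p in range(0, 10_000_001, 100_000)
        if not 4_000_000 <= p <= 6_000_000]
snps += [(4_900_000, 0, 20), (5_000_000, 1, 19), (5_100_000, 0, 20)]
print(localize(snps, 10_000_000)[0])  # 5000000
```

In this sketch, the bandwidth plays a role similar to NGM’s “kernel” parameter: a larger value gives a smoother but broader estimate of where the mutation lies.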
Although mapping-by-NGS applications in Arabidopsis have so far mostly focused on recessive point mutations generated in EMS screens in a Columbia reference background, the method can be extended to map mutations arising in different ecotype backgrounds [3], indels, suppressor/enhancer screens, or mutations with dominant phenotypes. In the case of dominant mutations, mapping is possible by carrying individual F2 lines through to the F3. Frozen F2 tissue can then be bulked according to whether the F3 progeny no longer segregate for the background phenotype. This ensures that all bulked F2 plants are homozygous at the dominant locus, and mapping proceeds in the same manner as for a recessive trait. Similarly, indels can be approached in a manner analogous to SNP identification. Called SNPs, generated as de novo markers, would still be used to perform the virtual bulk segregation and first identify the nonrecombinant region. A candidate gene list could then be created by extracting the indels within that region using NGS software and filtering them by their effect on coding sequence and their level of homogeneity among all reads mapped at that locus. In more complicated mappings, such as suppressor screens that produce a recessive phenotype, the mapping can still be approached as a typical recessive mapping; the expected result is more than one nonrecombinant region in the genome. However, if the mutation is linked to, or lies on the same chromosome as, the background target locus, or if the screen itself is inherently complex (producing epistatic effects, for example), mapping may be very difficult or unsuccessful. Certainly, as NGS mapping continues to develop and eventually replaces traditional mapping, the available tools and techniques will accommodate increasingly complicated mapping scenarios.

2 Materials
2.1 Tissue Generation
1.
An M2 EMS mutant line of Arabidopsis carrying a recessive mutation that results in an interesting phenotype (see Note 1).
2. A mapping line of Arabidopsis from an ecotype different from the reference ecotype.
3. Equipment for crossing:
(a) Dissection microscope or magnifying lens/headgear.
(b) Fine forceps.
4. Materials and growth conditions to manifest/distinguish the mutant phenotype.
5. Equipment to harvest and store tissues from selected F2 plants:
(a) Fine scissors/forceps.
(b) Microcentrifuge tubes or aluminum foil.
(c) Liquid nitrogen and −80 °C freezer.

2.2 Bulk Sequencing
1. Mortar, pestle, and liquid nitrogen.
2. Plant genomic DNA extraction kit.
3. Standard molecular biology laboratory.

2.3 Next-Generation Mapping
1. A computer running a distribution of the Linux operating system.
2. Software for mapping sequence reads to a reference (e.g., BWA, Bowtie) [11, 12].
3. A compatible version of SAMtools (see Note 2) [13].
4. A web browser with Java Runtime Environment 1.5 or higher enabled.
5. A reference sequence for the Arabidopsis genome in FASTA format (see Note 3).

3 Methods
3.1 Generating a Mapping Population
1. Grow mutant and mapping lines so that they flower synchronously.
2. Cross mutant and mapping lines (see Note 4):
(a) Using fine forceps and a dissecting microscope, remove opened flowers and young buds from a mutant plant, and then emasculate one to three late-stage unopened flower buds.
(b) Apply pollen from an opened flower of the mapping-line donor to the receptive stigma of an emasculated bud of the mutant line.
(c) Label the cross and cut off the apical meristem to prevent further flowers from forming.
(d) Harvest the resulting silique as it browns, but before it dehisces.
3. For even germination, allow 1–2 weeks for the F1 seed to dry and mature fully before sowing. If the cross was successful, none of the F1 plants should show the mutant phenotype.
4. Grow the F1 plants to harvest F2 seed.
5. Sow the F2, grow, and phenotype (see Note 5).
6.
Harvest equal quantities of tissue from 50 to 100 F2 plants exhibiting the phenotype (see Note 6) and flash-freeze in liquid nitrogen (see Note 7).

3.2 DNA Extraction and Preparation for Sequencing
1. Grind the pooled tissue in liquid nitrogen and extract genomic DNA (see Notes 8 and 9).
2. Send the genomic DNA sample for sequencing on an NGS platform (see Note 10).

3.3 Reference Mapping and Polymorphism Calling
Mapping your sequence reads to a genomic reference can be accomplished with any software that produces output in the NGS-standard SAM/BAM (Sequence Alignment/Map) format (see Note 11). Most NGS mapping programs are run on the command line in a UNIX environment (see Note 12), and this section presents several examples of how this is done. However, the examples provided, while sufficient, involve third-party software under active development. Particular commands may therefore change with successive revisions, and the examples are intended as a rough outline of how reads are mapped to a reference genome with two popular programs. Likewise, the optimal parameters may vary depending on the nature of the sequence data. It is assumed that the software programs mentioned are properly installed on the user’s computer and that all sequence reads are concatenated into a single file in FASTQ format (see Notes 13 and 14).
1. Obtain your sequence reads in the standard FASTQ format from your sequencing center.
2. Download the reference genome for Arabidopsis (e.g., TAIR10) (see Note 3).
3. Map the sequence reads to the reference genome using a next-generation mapping tool that can return SAM/BAM data. This is illustrated below with two popular tools, BWA v0.5.8c [11] and Bowtie v0.12.7 [12]. The “$” in front of each command represents the command prompt and should not be typed (see Note 15).
Example A: Using Bowtie with a single-read data file, reads.fastq
(a) Generate an index for the Arabidopsis genome:
$ bowtie-build TAIR10_chr_all.fas TAIR10
(b) Align the reads to the reference and write the output to a SAM file:
$ bowtie -S TAIR10 reads.fastq alignment.sam
(c) Convert the SAM file to a sorted BAM file, alignment.bam:
$ samtools view -bS alignment.sam | samtools sort - alignment
Example B: Using BWA with a single-read data file, reads.fastq
(a) Generate an index for the Arabidopsis genome:
$ bwa index TAIR10_chr_all.fas
(b) Align the reads to the reference genome and write the output to a temporary alignment file:
$ bwa aln TAIR10_chr_all.fas reads.fastq > reads.sai
(c) Generate the alignments in SAM format and compress the output (see Note 16):
$ bwa samse TAIR10_chr_all.fas reads.sai reads.fastq | gzip > result.sam.gz
(d) Index the reference with SAMtools, then sort the results and create a BAM output file, alignment.bam (see Note 17):
$ samtools faidx TAIR10_chr_all.fas
$ samtools view -bt TAIR10_chr_all.fas.fai result.sam.gz | samtools sort - alignment
Example C: Using BWA with paired-end data files, reads1.fastq and reads2.fastq
(a) Generate an index for the Arabidopsis genome:
$ bwa index TAIR10_chr_all.fas
(b) Align each set of paired reads to the reference genome and write the output to temporary files:
$ bwa aln TAIR10_chr_all.fas reads1.fastq > reads1.sai
$ bwa aln TAIR10_chr_all.fas reads2.fastq > reads2.sai
(c) Generate the alignment file, pairing reads to find the best mapping positions, and compress the SAM output:
$ bwa sampe TAIR10_chr_all.fas reads1.sai reads2.sai reads1.fastq reads2.fastq | gzip > result.sam.gz
(d) Index the reference with SAMtools, then sort the results and create an output file, alignment.bam:
$ samtools faidx TAIR10_chr_all.fas
$ samtools view -bt TAIR10_chr_all.fas.fai result.sam.gz | samtools sort - alignment
4. Using SAMtools v0.1.16 or earlier (see Note 2), generate a “pileup” file detailing polymorphism information from the BAM file produced by your mapping procedure:
$ samtools pileup -vcf TAIR10_chr_all.fas alignment.bam > out.pileup

3.4 Next-Generation Mapping
1.
Connect to the next-generation mapping (NGM) server at the Bio-Array Resource, University of Toronto (http://bar.utoronto.ca/ngm).
2. Click “Start the Applet” and accept the security dialog that appears (see Notes 18 and 19).
3. Select the “SAM” tab if necessary and click “Select SAM file.” Provide the “pileup” file that you created with SAMtools in step 4 of Subheading 3.3.
4. Click “Select Output File” and choose a name for the output file. The file will be given the extension “.emap” and created on your local computer.
5. Click the “Start Processing” button.
6. When the applet finishes, scroll down to the Upload Data section and click the “Choose File” button. Browse to the “.emap” file you just created and select it.
7. Make sure the “Filter SNP data by quality criteria” box is checked, and click “Upload and analyze” to begin mapping (see Note 20).
8. The next screen, “Map to chromosome,” presents a histogram for each chromosome in the Arabidopsis genome showing the frequency of SNPs along its length, binned at 250-kb intervals (Fig. 1). You can adjust the interval size at the top of the page by entering a new bin size and selecting “Update histogram,” although this is rarely needed. Select the chromosome that contains a region depleted in SNPs by clicking the radio button under the chromosome identifier, and then click “Submit” (see Note 21).
9. The “Chastity belt partitioning” screen presents a default selection of parameters and an initial attempt at localizing the mutation (Fig. 2).
10. Click the “Show detailed view” button at the top of the screen. This repeats the chastity separation process over a range of parameter values and presents the results. Begin at the left of the page, where k = 5, and examine the ratios downward for each kernel size. Identify the ratio that gives a distinctive peak with the smallest possible value of “k” and the largest possible “kernel” value.
Click the selection button to the left of the best ratio (see Note 23).
11. Adjust the red guide bars to encompass a generous region on either side of the identified peak. A list of potential candidate SNPs will appear under the SNP annotations section at the bottom of the screen.
12. Clear the “Filter SNP data by quality criteria” radio button and click “Update quality filter.” Also clear the checkboxes for removing transversions and non-CDS mutations (see Note 24).
13. Examine the BLOSUM score for any non-synonymous substitutions. The larger the score, the more disruptive the amino acid substitution is to the coding sequence (see Note 25).
14. Use the above information to formulate a prioritized short list of candidate genes for validation at the bench (see Note 26).
Fig. 1 Genome-wide natural variation patterns. Histograms of the highly reproducible frequency of SNPs found genome-wide between the Columbia-0 (Col-0) and Landsberg erecta (Ler) accessions (left; 250 kb bins). Examples of nonrecombinant regions on each of the five Arabidopsis chromosomes (right). In each example, all other chromosomes would exhibit the default pattern of natural variation seen in the left panel. A vertical black dash marks the position of the causal mutation found in each case (see Note 22)
Fig. 2 Chastity belt partitioning. Eighty different “chastity threads” are smoothed estimations of SNP frequency along the chromosome length for SNPs possessing discordant chastity scores within discretely defined intervals (top panel) [2]. Smoothing is adjusted using the “kernel” parameter. Colors correspond to “k” different clusters of similarity among threads, as grouped by k-means clustering.
Threads in the top panel that fall within clusters containing allele frequency values corresponding to homozygous frequency (i.e., discordant chastity = 1) and heterozygous frequency (i.e., discordant chastity = 0.5) are presented in the second panel. The ratio of the chastity belts in the second panel is used to localize the mutation (middle panel). Additional ratios, created by repeating the smoothing process with smaller “kernel” values, are presented in the bottom two panels
4 Notes
1. These protocols were initially developed for relatively simple cross designs. In suppressor/enhancer screens and similar designs, the genomic structure can be complicated by epistatic domains and other features that make mapping very difficult. Additionally, extensive backcrossing is not recommended; in our experience, mappings are more successful without it. Mapping by NGS can be applied to a cross between any two Arabidopsis ecotypes. However, because the causal mutation is identified by comparison with a reference sequence, it is important that the reference genome used corresponds to the mutagenized line.
2. To process SNPs directly from the industry-standard Variant Call Format (VCF) (such as that created by the SAMtools ‘mpileup’ function), users should download the Perl script BCF2NGM.pl from the NGM website and run it against their VCF file before uploading the result to the NGM server. In this case, the user should specify that SNPs are not to be filtered by NGM by unchecking the filter SNPs checkbox when uploading data to the server. To use the Java applet for preprocessing SAMtools ‘pileup’ data, users must use a version of SAMtools (i.e., < 0.1.16) that provides the now-deprecated ‘pileup’ function, available here: http://sourceforge.net/projects/samtools/files/samtools/.
3.
As of print, the TAIR10 genomic reference could be obtained at: ftp://ftp.arabidopsis.org/Genes/TAIR10_genome_release/TAIR10_chromosome_files/TAIR10_chr_all.fas.
4. For a guide to crossing Arabidopsis, see http://arabidopsis.info/InfoPages?template=crossing;web_section=arabidopsis.
5. The success of a reciprocal cross, with pollen from a recessive mutant applied to the mapping line, will not be revealed until homozygotes for the mutant allele segregate in the F2. However, a successful cross of a dominant allele from the mutant will be revealed in the F1. The success of any cross can also be confirmed in the F2 using PCR with primers differentiating between SNPs at several unlinked positions.
6. Selecting too many F2s for sequencing (e.g., >200) is suspected to be detrimental to NGM. Because the nonrecombinant block surrounding the mutation of interest becomes smaller as the number of F2 lines used increases, too many lines could excessively narrow the region of homozygosity surrounding the mutation and obscure its discovery.
7. Intact seedlings can be harvested for this purpose, or individual leaves or leaf punches at later stages of development. An easy way to obtain consistent leaf samples is to use a clean hole punch or, to similar effect, to close the cap of a microfuge tube on a leaf blade. Duplicate pools of tissue guard against the need to regrow and test F2 plants if subsequent steps fail.
8. Since only small quantities of tissue are required for each prep (10–30 mg), multiple genomic preps can usually be produced from each tissue pool, with the option to combine them if individual yields do not meet expectations. High yields of genomic DNA can be obtained using Gentra Puregene kits (Qiagen), but the duration of some incubation steps (e.g., cell lysis and RNase incubations) may need to be optimized, depending on the tissues used, to produce preps of sufficient purity.
Very clean genomic samples can be obtained using a column-based extraction method such as the DNeasy Plant Mini kit (Qiagen). The step recommended to minimize genomic shearing should be used, and additional steps can be employed to increase the eluted DNA concentration, depending on the requirements of the sequencing platform. Incubating the elution buffer on the column for a limited period before centrifuging, reapplying the first eluate to the same column, or eluting multiple columns can help ensure that the genomic sample is sufficiently concentrated. Generally speaking, the column-based method can help ensure sufficiently high purity when extracting from older or recalcitrant tissues.
9. It is important to take sufficient measures to avoid contaminating the final genomic sample with DNA from either wild-type line and to minimize the presence of false-positive F2s. Such problems can dilute the EMS and natural variation polymorphism signals to a level that makes mutant identification, and even visualization of the nonrecombinant region, difficult or impossible.
10. We generally find that 50–80 million clusters generated by a paired-end protocol with 40 bp read lengths (approximately 2–4 Gb) are sufficient data for mapping. This should provide 15–30× depth coverage of the ~120 Mb Arabidopsis genome. As expected, high-quality, high-coverage (>50×) sequence has also produced excellent mappings.
11. A BAM file (Binary Alignment Map) is a compressed SAM file (Sequence Alignment Map). The SAM format has become an industry standard for representing sequence alignment data, much as FASTA and FASTQ are standards for representing sequence data. The SAM Format Specification (v1.4-r985) can be found here: http://samtools.sourceforge.net/SAM1.pdf.
12. It is assumed that the user is familiar with running programs from the command line in a UNIX terminal.
Where a Linux server is unavailable, the “Terminal” application in Apple’s OS X may suffice, provided that sufficient memory and CPU power are available. Many good books and online references have been written on using the command line in a UNIX-based operating system such as Linux or OS X. A Google search for “UNIX primer” would be a good place to start.
13. FASTQ (or FASTA with quality) is a sequence representation format similar to FASTA that includes an additional line of quality information encoding an error probability for each base in the sequence.
14. Mapping all sequence reads from a single large FASTQ file, as presented here for brevity, may not make optimal use of the available computational resources. It is often more efficient to distribute the task by mapping many smaller FASTQ files, each containing a subset of the reads, and merging the results. Moreover, very large data sets, such as those from a lane of HiSeq, can crash the mapping tools.
15. In the commands listed, “>” is a redirect operator that sends the output of a program to a file, while “|” is a pipe operator that sends the output of a program to another program. Switches are parameters that control program execution; they consist of a dash “-” followed by a letter indicator. See the individual program documentation for information on the available switches.
16. Output in the examples is compressed and written to result.sam.gz rather than to a file with the conventional BAM extension (e.g., result.bam; BAM is itself a compressed form of SAM), so that it is not clobbered by the next command, which sorts the alignments and creates the final BAM file (alignment.bam) to be used.
17.
The “-” in the “samtools sort” command tells SAMtools that the data are provided by another program through the “|” operator. The word “alignment” is a user-provided prefix for the BAM file to be generated by SAMtools.
18. An applet has been built into NGM to allow the processing of files that are potentially too large to transfer over a network. The applet simply calculates a statistic based on allele composition and appends it to the pileup file before trimming and compressing the data. Future revisions of NGM may eliminate the applet in favor of uploading a single VCF file, such as can be generated by the “samtools mpileup” command.
19. If difficulties are experienced with the Java applet, a Perl script can be downloaded from the NGM site and used to process SAMtools output instead. To run the Perl script against your SAMtools output (e.g., output.pileup), download the Perl script SAM2NGM.pl, ensure it is executable with “chmod +x SAM2NGM.pl”, and run:
$ SAM2NGM.pl output.pileup
This will create the file output.pileup.emap for upload to the NGM server using the “Choose File” button.
20. Although NGM is typically very robust to extraneous false-positive SNP calls, it is important to filter your SNPs when identifying the nonrecombinant region. Where the nonrecombinant region is difficult to identify, more aggressive filtering at this stage may help. However, we do not recommend filtering SNPs before providing them to NGM. Although NGM filters the SNPs, it keeps the full set in memory so that it can be reconsidered in the final stage of mapping, in case poor-quality data or low coverage have excluded the actual causal mutation from the initial analysis. With low or overly abundant sequence data, one may want to adjust the min/max depth parameters for SNP calling.
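The depth and quality thresholds discussed in this note can be made concrete with a small parser for the consensus pileup produced in step 4 (“samtools pileup -vcf”). The sketch below assumes the 10-column consensus layout: chromosome, position, reference base, consensus base, consensus quality, SNP quality, RMS mapping quality, read depth, read bases, base qualities. It reports the fraction of non-reference bases as a simple stand-in for allele composition. It is not NGM's own filter or its discordant-chastity calculation, and the cutoff values in passes_filter are hypothetical. As advised above, filtering for the actual mapping is best left to NGM rather than applied before upload.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# An indel token in the read-base column, e.g. "+2AG" or "-1T": sign, length, bases.
INDEL = re.compile(r"[+-](\d+)")


@dataclass
class PileupSite:
    chrom: str
    pos: int
    ref: str
    snp_quality: int
    depth: int
    alt_fraction: float  # fraction of counted bases that differ from the reference


def count_bases(read_bases: str) -> Tuple[int, int]:
    """Count reference-matching ('.'/',') and mismatching (ACGT) bases in one column."""
    ref_count = alt_count = 0
    i = 0
    while i < len(read_bases):
        c = read_bases[i]
        if c == "^":                 # start of a read; next char is its mapping quality
            i += 2
        elif c in "+-":              # indel: skip the length digits and the indel bases
            m = INDEL.match(read_bases, i)
            i = m.end() + int(m.group(1))
        else:
            if c in ".,":
                ref_count += 1
            elif c.upper() in "ACGT":
                alt_count += 1
            i += 1                   # '$' (read end), '*' and 'N' are not counted
    return ref_count, alt_count


def parse_pileup_line(line: str) -> Optional[PileupSite]:
    """Parse one 10-column consensus pileup line; return None for indel or short lines."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 10 or f[2] == "*":   # indel records use a different layout
        return None
    ref_n, alt_n = count_bases(f[8])
    counted = ref_n + alt_n
    return PileupSite(chrom=f[0], pos=int(f[1]), ref=f[2].upper(),
                      snp_quality=int(f[5]), depth=int(f[7]),
                      alt_fraction=alt_n / counted if counted else 0.0)


def passes_filter(site: PileupSite, min_depth: int = 3, max_depth: int = 100,
                  min_snp_quality: int = 20) -> bool:
    """Hypothetical depth/quality cutoffs; relax or tighten them for low- or high-coverage data."""
    return min_depth <= site.depth <= max_depth and site.snp_quality >= min_snp_quality
```

For example, the line "Chr1 1000 G A 60 60 60 5 .aAa^Fa IIIII" (tab-separated) parses to a depth of 5 with one reference-matching and four mismatching bases, giving an alt_fraction of 0.8.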
Similarly, the quality-score thresholds can be adjusted to relax or tighten SNP pre-filtering.
21. Where the nonrecombinant region is not readily apparent, you may compare the histograms obtained with the default expected natural variation histograms between Columbia-0 (Col-0) and Landsberg erecta (Ler) displayed in Fig. 1, or examine the histograms returned by the three NGM examples provided on the NGM home page. The patterns of natural variation observed across each chromosome are highly reproducible.
22. In relation to the example nonrecombinant regions in Fig. 1: Chromosome 1 provides an ideal scenario, with good recombination rates on either side of the nonrecombinant region; peak identification for this mutant is shown in Fig. 2. Chromosome 2 exemplifies a large-scale drop in recombination toward the end of the chromosome, while chromosome 3 (like chromosome 1) illustrates the frequently found parabolic recombination pattern in SNP frequency. Chromosome 4 exemplifies a complicated mapping scenario in which recombination dropped considerably across the chromosome, and chromosome 5 exemplifies a scenario in which poor sequence quality and the presence of false-positive F2s in the bulked population obscured the nonrecombinant region.
23. Do not aggressively choose a very small kernel right away: the smaller the kernel, the sparser the data incorporated into the kernel density estimation. This can produce artifacts that appear as “peak shifts.” If a peak exists at a larger kernel size but shifts away from its original position when a smaller kernel is used, the shifted result should be disregarded as an artifact. The rule of thumb is to use the largest kernel size possible with the smallest number of clusters (k > 3) that returns a distinctive peak. In some cases, chastity belt partitioning can fail to return a distinctive peak.
If this is the case but the initial SNP histogram possessed a distinctive nonrecombinant region, we advise selecting a generous region surrounding the nonrecombinant region identified in the histogram. When this region exhibits a parabolic pattern, the causal mutation is typically found near the bottom of the parabola.
24. These measures allow a generous inclusion of all SNPs occurring within the targeted region. Removing the filters is useful when SNPs discarded as poor quality include genuine mutations (i.e., the filters produced false negatives). Transversions may account for ~1 % of EMS mutations and may be informative in rare instances. Similarly, non-CDS mutations in the form of cryptic splice sites are also possible and will be annotated as such by NGM.
25. The BLOSUM 100 score provides a measure of the effect that the amino acid substitution will have, with larger numbers indicating a more adverse effect. Also, the discordant chastity should ideally be as close to 1.0 (100 %) as possible. However, it will be lowered by false-positive F2s as well as by sequencing and mapping errors. The default value of 0.85 is conservative and usually sufficient, with values >0.95 commonly seen around the causal mutation.
26. In cases with abundant candidate genes, we advise performing a multiple sequence alignment using protein sequences of several orthologs of the target gene taken from various plant relatives. Genes can then be prioritized according to whether the position of the amino acid substitution in the candidate gene is conserved among plant species and therefore more likely to have a phenotypic effect.
Acknowledgments
The authors thank Peter McCourt, Nicholas J. Provart, Pauline W. Wang, Danielle Vidaurre, George Stamatiou, Robert Breit, Dario Bonetta, Jianfeng Zhang, Pauline Fung, and Yunchen Gong for their help in the development of NGM.
We would also like to express our gratitude to the McCourt and Desveaux Labs (University of Toronto), the Haughn Lab (University of British Columbia), and the Bonetta Lab (University of Ontario Institute of Technology) for providing sequence data. This work was funded through grants by the Natural Sciences and Engineering Research Council of Canada to D.S. Guttman and D. Desveaux.
References
1. Schneeberger K et al (2009) SHOREmap: simultaneous mapping and mutation identification by deep sequencing. Nat Methods 6:550–551
2. Austin RS et al (2011) Next-generation mapping of Arabidopsis genes. Plant J 67:715–725
3. Uchida N et al (2011) Identification of EMS-induced causal mutations in a non-reference Arabidopsis thaliana accession by whole genome sequencing. Plant Cell Physiol 52:716–722
4. Sarin S et al (2010) Analysis of multiple ethyl methanesulfonate-mutagenized Caenorhabditis elegans strains by whole-genome sequencing. Genetics 185:417–430
5. Blumenstiel JP et al (2009) Identification of EMS-induced mutations in Drosophila melanogaster by whole-genome sequencing. Genetics 182:25–32
6. Smith DR et al (2008) Rapid whole-genome mutational profiling using next-generation sequencing technologies. Genome Res 18:1638–1642
7. Zuryn S et al (2010) A strategy for direct mapping and identification of mutations by whole-genome sequencing. Genetics 186:427–430
8. Irvine DV et al (2009) Mapping epigenetic mutations in fission yeast using whole-genome next-generation sequencing. Genome Res 19:1077–1083
9. Michelmore RW, Paran I, Kesseli RV (1991) Identification of markers linked to disease-resistance genes by bulked segregant analysis: a rapid method to detect markers in specific genomic regions by using segregating populations. Proc Natl Acad Sci U S A 88:9828–9832
10. Lister R, Gregory B, Ecker J (2009) Next is now: new technologies for sequencing of genomes, transcriptomes, and beyond. Curr Opin Plant Biol 12:107–118
11.
Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760
12. Langmead B et al (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25
13. Li H et al (2009) The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079
Chapter 18
Chemical Fingerprinting of Arabidopsis Using Fourier Transform Infrared (FT-IR) Spectroscopic Approaches
András Gorzsás and Björn Sundberg
Abstract
Fourier transform infrared (FT-IR) spectroscopy is a fast, sensitive, inexpensive, and nondestructive technique for chemical profiling of plant materials. In this chapter we discuss the instrumental setup, the basic principles of analysis, and the possibilities for and limitations of obtaining qualitative and semiquantitative information by FT-IR spectroscopy. We provide detailed protocols for four fully customizable techniques: (1) Diffuse Reflectance Infrared Fourier Transform Spectroscopy (DRIFTS): a sensitive and high-throughput technique for powders; (2) attenuated total reflectance (ATR) spectroscopy: a technique that requires no sample preparation and can be used for solid samples as well as for cell cultures; (3) microspectroscopy using a single element (SE) detector: a technique used for analyzing sections at low spatial resolution; and (4) microspectroscopy using a focal plane array (FPA) detector: a technique for rapid chemical profiling of plant sections at cellular resolution. Sample preparation, measurement, and data analysis steps are listed for each of the techniques to help the user collect the best quality spectra and prepare them for subsequent multivariate analysis.
Key words Fourier transform infrared spectroscopy, Methods, Microspectroscopy, Chemical composition, Multivariate analysis, Plant, Attenuated total reflectance, Diffuse reflectance, Focal plane array detector
1 Introduction
It is not surprising that Fourier transform infrared (FT-IR) spectroscopy has gained popularity in plant sciences in recent years [1–7], as it has numerous advantages for the chemical analysis of a wide range of plant materials. It is nondestructive, fast, inexpensive, sensitive, and easy to customize and automate. It provides information on the entire chemical profile of the investigated sample and can be used on intact tissues for in situ analysis. With microscopic accessories, even the spatial distribution of compounds can be studied and visualized. FT-IR spectroscopy probes functional groups in the sample. In plants, which contain a mixture of chemically related components, this information is rarely diagnostic for a particular compound. Instead, FT-IR spectroscopy provides a chemical fingerprint of the sample composition. As such, it is well suited for high-throughput screens aiming to classify a large number of samples according to their overall chemical profile or to identify samples with modified chemical composition (e.g., mutant screens). Although some qualitative and quantitative information is gained about the chemical composition of the sample, complementary analytical techniques are required for more detailed information, for example, Raman, UV–VIS, and NMR spectroscopies; wet chemical analyses; or mass spectrometry.
Jose J. Sanchez-Serrano and Julio Salinas (eds.), Arabidopsis Protocols, Methods in Molecular Biology, vol. 1062, DOI 10.1007/978-1-62703-580-4_18, © Springer Science+Business Media New York 2014 317
318 András Gorzsás and Björn Sundberg
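Classifying samples by their overall chemical profile is a multivariate problem, and the chapter returns to PCA and OPLS-DA below. As a minimal sketch, assuming NumPy and a matrix of already normalized spectra (one row per sample, one column per wavenumber), PCA can be computed from the singular value decomposition of the mean-centered data. The two-group data set generated by toy_spectra is synthetic and purely illustrative. It is not derived from any measurement in this chapter, and the 1510 cm⁻¹ band is an arbitrary choice.

```python
import numpy as np


def pca(spectra: np.ndarray, n_components: int = 2):
    """PCA via SVD of mean-centered spectra (rows = samples, columns = wavenumbers)."""
    centered = spectra - spectra.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]      # sample coordinates on each PC
    loadings = vt[:n_components]                         # which bands drive each PC
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, loadings, explained


def toy_spectra(n_per_group: int = 10, seed: int = 0):
    """Synthetic spectra: group B carries an extra band at 1510 cm^-1 (illustration only)."""
    rng = np.random.default_rng(seed)
    wn = np.linspace(800, 1800, 501)

    def band(center, width):
        return np.exp(-((wn - center) / width) ** 2)

    base = band(1050, 40) + 0.5 * band(1650, 30)
    group_a = base + rng.normal(0, 0.01, (n_per_group, wn.size))
    group_b = base + 0.2 * band(1510, 15) + rng.normal(0, 0.01, (n_per_group, wn.size))
    return wn, np.vstack([group_a, group_b])


wn, X = toy_spectra()
scores, loadings, explained = pca(X)
print(f"PC1 explains {explained[0]:.0%} of the variance")
print("PC1 loading peaks near", wn[np.argmax(np.abs(loadings[0]))], "cm^-1")
```

In this toy case the first component separates the two groups, and its loading points back to the band that differs between them. Inspecting loadings in this way is how bands contributing to differences between samples are identified.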
The present chapter will focus on FT-IR spectroscopy both as a high-throughput technique (diffuse reflectance and attenuated total reflectance measurements, Subheadings 1.1 and 1.2) and as a low-throughput tool for spatially resolved [8] sampling (microspectroscopy, Subheading 1.3). More advanced uses of FT-IR spectroscopy, for example, two-dimensional correlation spectroscopy [9, 10] and multivariate imaging [7], will not be described here. FT-IR spectroscopy is based on molecular vibrations, i.e., the displacement of atoms from their equilibrium positions. The different ways a molecule can vibrate are called vibrational modes. Mid-infrared (mid-IR) radiation (the 400–4,000 cm⁻¹ region of the electromagnetic spectrum) often causes a transition of the molecular vibrational modes from the ground state to the first excited state (fundamental transition). A detector measures the intensity difference between the original radiation (I₀) and the radiation after interaction with the sample (I). The spectrum is the plot of intensity changes as a function of frequency (or wavenumber, as is often used in FT-IR spectroscopy, with units of cm⁻¹; see Note 1). Qualitative information is obtained by analyzing the positions of peaks (bands) in the spectrum. Many vibrational modes involve the displacement of only a few atoms, while the rest of the molecule can be considered relatively stationary. The position of a band is therefore characteristic of a set of atoms and bonds (chemical functional groups). These are called characteristic group frequencies and are traditionally given in charts for a set of compounds ([11], see Note 2). The exact position of a band depends on several factors, including bond strength and the reduced mass of the atoms involved (see Note 3). Thus, a functional group will produce a band within a frequency range, with the exact position depending on the rest of the molecule in which the functional group exists.
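The dependence of band position on bond strength and reduced mass can be illustrated with the textbook diatomic harmonic-oscillator approximation, ν̃ = (1/2πc)·√(k/μ). Here k is the bond force constant and μ = m₁m₂/(m₁ + m₂) is the reduced mass. The sketch below is a simplification rather than a method from this chapter. The force constant of 500 N/m for a C–H bond is an assumed, approximate value, and real band positions also shift with the molecular environment, as described above.

```python
import math

AMU_KG = 1.66053906660e-27       # atomic mass unit in kg
C_CM_PER_S = 2.99792458e10       # speed of light in cm/s


def reduced_mass_kg(m1_amu: float, m2_amu: float) -> float:
    """Reduced mass mu = m1*m2/(m1 + m2), converted from amu to kg."""
    return (m1_amu * m2_amu) / (m1_amu + m2_amu) * AMU_KG


def harmonic_wavenumber(k_n_per_m: float, m1_amu: float, m2_amu: float) -> float:
    """Wavenumber (cm^-1) of a diatomic harmonic oscillator: sqrt(k/mu) / (2*pi*c)."""
    mu = reduced_mass_kg(m1_amu, m2_amu)
    return math.sqrt(k_n_per_m / mu) / (2 * math.pi * C_CM_PER_S)


def wavelength_um(wavenumber_cm1: float) -> float:
    """Convert a wavenumber in cm^-1 to a wavelength in micrometers (1e4 / wavenumber)."""
    return 1e4 / wavenumber_cm1


c_h = harmonic_wavenumber(500.0, 12.000, 1.008)   # C-H stretch, assumed k = 500 N/m
c_d = harmonic_wavenumber(500.0, 12.000, 2.014)   # same bond strength, heavier deuterium
print(f"C-H ~ {c_h:.0f} cm^-1 ({wavelength_um(c_h):.2f} um); C-D ~ {c_d:.0f} cm^-1")
```

With these inputs, the C–H estimate falls close to 3,000 cm⁻¹ and the C–D estimate near 2,200 cm⁻¹. This shows how a heavier atom lowers the wavenumber when the bond strength is unchanged.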
Since the positions of bands in the spectrum indicate the functional groups present in the sample, positional changes (shifts) of bands indicate changes that affect those particular functional groups. These shifts can result from chemical or structural changes, for example, protonation/deprotonation, formation or breakage of H-bonds, and protein alpha-helix/beta-sheet structural changes. In addition, composite vibrational modes of a molecule (where larger sets of atoms or the entire molecule vibrates) can also give rise to bands in the mid-IR spectrum. Taken together, this means that every infrared-active chemical compound (see Note 4) has an infrared spectroscopic fingerprint, which is unique and can be used for qualitative analysis or detection.
FT-IR Spectroscopic Techniques 319
Fig. 1 Reference spectra of pectin (solid line), cellulose (dashed line), lignin (dotted line), and xylan (dash-dot line), illustrating band positions, widths, and overlaps for major cell wall components
Plant material is mostly composed of cellulose, hemicelluloses, lignins, pectins, lipids, waxes, and proteins (see Note 5). Many of these compounds contain similar functional groups (such as –C–H, –C–O, and –O–H) that reside in very similar chemical environments. This results in broad, overlapping bands, which are seldom diagnostic and thus difficult to assign to a particular compound (Fig. 1, see Note 6). Therefore, care must be taken not to over-interpret the qualitative information from plant FT-IR spectra. Quantitative information is gained from band intensities via the Bouguer–Beer–Lambert law (see Note 7). However, because of natural and experimental variation in plant materials, spectra must be normalized before comparing different samples (see Subheading 3.1.3 and notes therein for more details).
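The relationships involved can be sketched briefly, assuming NumPy. Absorbance is computed from the detected (I) and reference (I₀) intensities as A = −log₁₀(I/I₀). The Bouguer–Beer–Lambert law makes A proportional to concentration (A = εlc), and normalization then rescales each spectrum. Dividing by the summed absorbance is used here only as one simple example of a normalization. It is an assumption, not necessarily the procedure described in Subheading 3.1.3, and the absorptivity values are made up for illustration.

```python
import numpy as np


def absorbance(intensity, reference_intensity):
    """Absorbance from sample (I) and reference (I0) intensities: A = -log10(I / I0)."""
    return -np.log10(np.asarray(intensity, float) / np.asarray(reference_intensity, float))


def beer_lambert(absorptivity, path_length, concentration):
    """Bouguer-Beer-Lambert law: A = epsilon * l * c, linear in concentration."""
    return np.asarray(absorptivity, float) * path_length * concentration


def sum_normalize(spectrum):
    """Scale a spectrum so its absorbance values sum to 1 (one simple normalization)."""
    spectrum = np.asarray(spectrum, float)
    return spectrum / spectrum.sum()


# Hypothetical absorptivities of one component at three wavenumbers.
eps = np.array([0.2, 1.0, 0.5])
thin = beer_lambert(eps, path_length=1.0, concentration=0.5)
thick = beer_lambert(eps, path_length=1.0, concentration=1.0)   # twice as much material

# Absorbance recovered from simulated transmitted intensities (I0 = 1).
print(absorbance(10 ** -thick, 1.0))
# The absolute absorbances differ by a factor of 2 ...
print(thick / thin)
# ... but the normalized spectra are identical, so only relative composition remains.
print(np.allclose(sum_normalize(thin), sum_normalize(thick)))
```

The last comparison shows, in miniature, why normalized spectra report proportional rather than absolute amounts.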
As a result of normalization, the observed compositional changes always reflect proportional changes between samples, not absolute amounts (semiquantitative analysis, see Note 8). After normalization, the average spectra of different samples can be compared, and differences in band intensities can be estimated from band heights or areas or by creating a differential spectrum. However, such comparisons are inherently problematic because they do not consider variation between replicates (see Note 9). Moreover, as mentioned above, most bands in the spectra of plant material are not diagnostic on their own (Fig. 1, see Note 6); therefore, a set of bands, or preferably the whole spectrum, should be used for interpretation. Consequently, the best way to analyze FT-IR spectra of plant tissues is with multivariate tools, which can handle experimental variation and use the full spectral region in the analysis. An unsupervised principal component analysis (PCA [12]) is often sufficient when the data are of high quality and differences between samples are substantial. When this is not the case, however, the initial PCA should be followed up with a more powerful, but supervised, analysis such as orthogonal projections to latent structures discriminant analysis (OPLS-DA [13]). The multivariate analysis will reveal whether there are outliers among the samples that distort the data, how effectively the FT-IR spectroscopic profiles can be used to classify samples, and which bands contribute to the differences between samples.
1.1 Diffuse Reflectance Infrared Fourier Transform Spectroscopy
Diffuse Reflectance Infrared Fourier Transform Spectroscopy (DRIFTS), in a simplistic view, means the analysis of powders by focusing infrared light onto the powder and collecting the diffusely reflected (scattered) light while minimizing the contribution of specular reflection.
Thus, DRIFTS spectra can be considered more akin to transmission spectra than to reflectance spectra. The major advantages of DRIFTS are sensitivity, speed, and cost. Sample preparation involves homogenization and is relatively straightforward: ball milling, manual grinding, and mixing with an IR-transparent diluent (typically KBr). Because the diluent is added, the sample does not remain completely intact for further analysis. In addition, homogenization must be performed in a standardized way, as it affects the final spectra (particle size effects [14]; degree of polymerization [15]). On the other hand, measurements can be automated via, e.g., sample carousels or well plates, which increases throughput considerably and makes DRIFTS a powerful technique that excels at rapid chemotyping and screening. It should be noted that, owing to the sensitivity of DRIFTS, normal variation (biodiversity) can be substantial in the spectra. Filtering this variation out usually requires standardization and powerful multivariate analysis (see above and also Subheading 3.1.3).
1.2 Attenuated Total Reflectance
Attenuated total reflectance (ATR) is a surface-sensitive infrared spectroscopic technique. Essentially, the sample is placed on an infrared-transparent crystal with a high refractive index (internal reflection element, IRE). Infrared radiation is totally internally reflected at the interface between the IRE and the sample, and the radiation penetrates the sample in the form of an evanescent wave whose intensity decreases exponentially with distance. The exact penetration depth depends on the angle of incidence, the refractive indices of the sample and the IRE, and the wavelength of the light. For plant materials on a Ge, ZnSe, or diamond IRE, the penetration depth is a few microns at most.
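The dependence of penetration depth on angle of incidence, refractive indices, and wavelength is commonly described by Harrick's expression for the evanescent-wave penetration depth (the distance at which the field amplitude falls to 1/e): d_p = λ / (2πn₁√(sin²θ − (n₂/n₁)²)). Here n₁ and n₂ are the refractive indices of the IRE and the sample. The sketch below evaluates this expression. The refractive indices (about 2.4 for diamond, about 4.0 for Ge, and 1.5 assumed for plant material) and the 45° angle are typical, approximate values chosen for illustration, not specifications of a particular accessory.

```python
import math


def penetration_depth_um(wavenumber_cm1: float, n_ire: float, n_sample: float,
                         angle_deg: float = 45.0) -> float:
    """Harrick penetration depth (um): lambda / (2*pi*n1*sqrt(sin^2(theta) - (n2/n1)^2))."""
    wavelength_um = 1e4 / wavenumber_cm1            # cm^-1 -> micrometers
    s = math.sin(math.radians(angle_deg)) ** 2 - (n_sample / n_ire) ** 2
    if s <= 0:
        raise ValueError("below the critical angle: no total internal reflection")
    return wavelength_um / (2 * math.pi * n_ire * math.sqrt(s))


# Assumed indices: diamond ~2.4, Ge ~4.0, plant material ~1.5.
for name, n_ire in [("diamond", 2.4), ("Ge", 4.0)]:
    for wn in (3000, 1000):
        dp = penetration_depth_um(wn, n_ire, n_sample=1.5)
        print(f"{name:8s} {wn:5d} cm^-1  dp = {dp:.2f} um")
```

Under these assumptions, the depths range from roughly 0.2 μm (Ge, 3,000 cm⁻¹) to about 2 μm (diamond, 1,000 cm⁻¹), consistent with the few microns stated above. Because the depth grows with wavelength, bands at lower wavenumbers sample a thicker layer.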
Often, the IRE is shaped so that the infrared light makes multiple reflections before exiting toward the detector. If the sample absorbs the light, the intensity is attenuated, hence the name of the technique. To obtain acceptable signals, the sample must make good contact with the IRE. Contact is not a problem for most liquids, which wet the surface of the IRE, but solids require pressure to force the material against the IRE. ATR objectives are available for FT-IR microscopes (increasing the possible spatial resolution, see Subheading 1.3). In the absence of a microscopy accessory, standard FT-IR spectrometers can be equipped with more affordable standalone ATR accessories that have their own built-in cameras. These allow different areas on the sample surface (such as different areas of a leaf) to be selected for measurement, but they do not allow mapping. ATR measurements are nondestructive (unless the required pressure damages the plant tissue). Sample preparation is simple: since no drying, milling, or mixing is required, the sample can be placed as it is onto the IRE and pressed against it. However, ATR only provides information about the chemical composition of the 1–5 μm layer of the sample that is in contact with the IRE, which may not be ideal for certain applications. Moreover, the ATR signal is usually weaker than that obtainable with DRIFTS, and the detection limits are therefore generally poorer (higher). ATR is also more difficult to automate than DRIFTS, and it may therefore be less suited for large-scale experiments.
1.3 FT-IR Microspectroscopy
A microscopy accessory attached to an infrared spectrometer enables spatially resolved sampling and visualization of chemical profiles across sections or surface tissues.
Several measurement modes are available for infrared microscopes, for example, transmission (see Note 10), reflection (see Note 11), and ATR (see Note 12), which makes this a very versatile technique. In theory, the ATR technique provides the highest spatial resolution owing to the refractive index of the IRE (and thus the numerical aperture of the ATR objective). However, because of the force required for good contact between the IRE and the sample (Subheading 1.2), the small tip of the ATR objective inevitably causes substantial damage to plant tissues. Furthermore, tissue fragments stuck on the IRE require frequent and rigorous cleaning to limit the risk of carryover contamination, making this technique very impractical for mapping applications. We therefore focus here only on transmission and reflection mode microspectroscopy. The detectors available for infrared microscopes are the standard single element (SE) detector and the more advanced focal plane array (FPA) detector. Both are liquid nitrogen-cooled HgCdTe (MCT) devices, but they have very different capabilities and operational routines.
Fig. 2 Visible light snapshots of a 20 μm thick Col-0 Arabidopsis stem, showing the view field of the infrared microscope. Scale bars are 50 μm. (a) Unmodified view field. The single element (SE) detector records a single spectrum across the entire area. This spectrum represents the average chemical composition of all cell types within this area. A 64 × 64 focal plane array (FPA) detector will record 4,096 spectra across the same area. Thus, spectra from individual cells can be extracted. (b) Knife-edge apertures applied to limit the view field for the SE detector. Only the diagonal rectangular area in the center is measured. Consequently, the single spectrum recorded by the SE detector represents the average composition of xylem fibers and vessels
FT-IR Spectroscopic Techniques 323
The SE detector records a single spectrum of the entire view field, which represents the average chemical composition of that area. Knife-edge apertures are generally applied to limit the area to be measured (Fig. 2). The smallest area that can routinely be measured this way is ca. 50 μm × 50 μm (see Note 13). An FPA detector consists of an array (see Note 14) of miniature detector elements (pixels) in the focal plane of the infrared radiation. Each of these detector elements records a spectrum individually and independently of the other elements (no pixel cross talk). This means that thousands of spectra are recorded simultaneously across the view field, much like pixels building up an image. This type of measurement is therefore called imaging, as opposed to mapping, in which a series of spectra is recorded consecutively. The view field of a typical FPA detector with 64 × 64 detector elements is ca. 175 μm × 175 μm (see Note 15), and 4,096 spectra are recorded across this area. Thus, each detector element corresponds to about 2.7 μm × 2.7 μm of the sample. In reality, however, the spatial resolution is about 10–20 μm for applications on plant samples (see Note 16). Since a spectrum is recorded for each “pixel” of the image, FPA measurements provide cellular resolution without the need for apertures, which in turn means better-quality spectra at much higher speed. SE detectors are easy to operate, require less computing power, and are generally less prone to errors (see Note 17), while FPA detectors are more sensitive to disturbances such as vibrations (see Note 18) and software problems (see Note 19). Since the differences between SE and FPA measurements are substantial not only in theory but also in practice, separate protocols are given for their use (Subheadings 3.3 and 3.4).
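The relationship between array size, view field, and nominal pixel size quoted above is simple arithmetic. The short Python sketch below reproduces it for a 64 × 64 array covering ca. 175 μm × 175 μm. The function name is hypothetical. The result is the nominal sampling interval at the sample, not the achievable spatial resolution, which is ca. 10–20 μm for plant samples.

```python
def fpa_geometry(n_rows: int, n_cols: int, field_x_um: float, field_y_um: float):
    """Return (spectra per image, pixel width in um, pixel height in um).

    Each detector element records one spectrum, so an image holds
    n_rows * n_cols spectra; the nominal pixel size at the sample is the
    view field divided by the number of elements along each axis.
    """
    n_spectra = n_rows * n_cols
    return n_spectra, field_x_um / n_cols, field_y_um / n_rows


n_spectra, px_um, py_um = fpa_geometry(64, 64, 175.0, 175.0)
print(n_spectra, round(px_um, 2), round(py_um, 2))

# A 10-20 um resolution element therefore spans roughly 3.7-7.3 nominal pixels.
print(round(10.0 / px_um, 1), round(20.0 / px_um, 1))
```

The same function applies to other array sizes or view fields by changing the arguments.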
FPA data have traditionally been evaluated by so-called heat mapping, that is, plotting the intensity (integral) or intensity ratio of certain bands in the spectrum. This visualization is easy to perform and interpret, but it is unreliable for plant materials. Heat maps suffer from scattering effects, poorly resolved (overlapping) and nonspecific (nondiagnostic) bands, and varying baseline slopes and pixel coverage, and they are prone to reflecting artifacts rather than the true distribution of a compound [7]. Alternatively, images can be created from multivariate analyses, using the spatial information in the data to visualize the results (multivariate imaging [7]). Multivariate imaging is a powerful tool and today represents the state of the art in data analysis for FT-IR spectroscopic profiling of plants. Protocols for multivariate imaging are not provided here, since it is a complex process and not yet a routine approach.
2 Materials
2.1 Chemicals
Since FT-IR spectroscopy and microspectroscopy do not require staining or extraction, the only consumables required are IR spectroscopy grade KBr (or another infrared-transparent diluent) for DRIFTS (Subheading 3.1.1), liquid N2 to cool the HgCdTe (MCT) detectors (Subheadings 3.3.2 and 3.4.2), and sample carriers (BaF2, CaF2, and ZnSe windows; gold and aluminum mirrors) for microspectroscopy measurements (Subheadings 3.3.1 and 3.4.1). The reference spectra shown in Fig. 1 were recorded according to the DRIFTS protocol given in Subheading 3.1, using the following compounds: lignin isolated from wild-type poplar (P. tremula × P. alba, courtesy of J. Ralph, University of Wisconsin, Madison, WI, USA), xylan from birchwood (Sigma-Aldrich, http://www.sigmaaldrich.com/), cellulose powder for thin layer chromatography (CAMAG, http://www.camag.com/), and pectin from citrus peel (Sigma-Aldrich, http://www.sigmaaldrich.com/).
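As a quick check of the KBr dilution used for DRIFTS (Subheading 3.1.1, step 3), the following Python sketch computes the sample mass fraction for two cases:

- the stated range of 1–10 mg of sample in 400 mg total;
- the optional pre-mix of 50 mg of sample with approximately 2 g of KBr before ball milling.

The function names are hypothetical. The arithmetic shows that the pre-mix (ca. 2.4 %) lies within the 0.25–2.5 % range of the final mixture.

```python
def kbr_for_sample(sample_mg: float, total_mg: float = 400.0) -> float:
    """KBr mass (mg) needed so that sample + KBr reaches total_mg."""
    if not 0 < sample_mg < total_mg:
        raise ValueError("sample mass must be positive and below the total")
    return total_mg - sample_mg


def sample_fraction(sample_mg: float, kbr_mg: float) -> float:
    """Mass fraction of sample in a sample + KBr mixture."""
    return sample_mg / (sample_mg + kbr_mg)


# Range from step 3: 1-10 mg of dry sample in 400 mg total.
low = sample_fraction(1.0, kbr_for_sample(1.0))
high = sample_fraction(10.0, kbr_for_sample(10.0))
# Optional pre-mix: 50 mg of sample with ca. 2 g of KBr before ball milling.
premix = sample_fraction(50.0, 2000.0)
print(f"{low:.2%} {high:.2%} {premix:.2%}")
```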
2.2 Instrumentation and Equipment
The protocols provided were developed on the following Bruker instruments: IFS 66 v/S, Equinox 55, and Tensor 27, equipped with Hyperion 3000 microscopy accessories and 64 × 64 focal plane array (FPA) detectors (see Note 20). Experimental settings are given for these systems, but other systems should behave similarly. Thus, the values given here can be used as a starting point, but fine-tuning of measurement parameters (number of scans, spectral resolution, etc.) is needed for other instruments and samples. In addition to standard laboratory equipment, a desiccator, a vibrational ball mill (see Note 21), and a mortar and pestle (see Note 22) are needed for DRIFTS measurements (see Subheading 3.1.1). A cryomicrotome (Microm HM 505 E) and a vibratome (Leica VT 1000 S) were used to prepare sections for microspectroscopy (see Subheading 3.3.1). For storage and measurement of sections, standard microscopy glass slides (see Note 23), infrared-transparent windows (see Note 24), or mirrors (see Note 25) are needed.
2.3 Software
The measurement and data analysis steps were developed on standard PCs running Microsoft Windows XP and Windows 7, using Bruker’s OPUS software (Bruker Optik GmbH, http://www.brukeroptics.com/, versions 5.5–6.5, with notes detailing changes relevant to the newly released version 7). Other manufacturers provide their own software bundles; however, finding and adjusting parameters (number of scans, spectral resolution, etc.) in those bundles should be straightforward. For data analysis, most software packages allow data export in ASCII format (data point tables in OPUS), which can then be opened in, e.g., Microsoft Excel. For multivariate analysis, the ASCII files were combined into a MATLAB matrix (.mat) file, which was processed with SIMCA-P+ (version 12, Umetrics AB, Umeå, Sweden).
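The step of combining the exported ASCII data point tables into one matrix for multivariate analysis can be sketched as follows. This minimal Python example makes three assumptions:

- each exported table holds two whitespace-delimited columns (wavenumber, absorbance);
- all spectra share an identical wavenumber axis, i.e., they were recorded and exported with the same parameters;
- a plain CSV table (one row per sample, one column per wavenumber) is an acceptable output. This is an illustrative substitution for the MATLAB .mat file used here.

The function names and file paths are hypothetical.

```python
import csv
import io
from pathlib import Path
from typing import List, Sequence, TextIO, Tuple


def read_dpt(text: str) -> Tuple[List[float], List[float]]:
    """Parse a two-column ASCII data point table: wavenumber, absorbance."""
    xs: List[float] = []
    ys: List[float] = []
    for line in text.splitlines():
        fields = line.split()  # tab- or space-delimited columns
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        xs.append(float(fields[0]))
        ys.append(float(fields[1]))
    return xs, ys


def build_matrix(tables: Sequence[str], tol: float = 1e-6):
    """Stack spectra row-wise (one row per sample).

    All spectra must share the same wavenumber axis; mismatches are
    rejected rather than interpolated.
    """
    axis: List[float] = []
    rows: List[List[float]] = []
    for k, text in enumerate(tables):
        xs, ys = read_dpt(text)
        if k == 0:
            axis = xs
        elif len(xs) != len(axis) or any(abs(a - b) > tol for a, b in zip(xs, axis)):
            raise ValueError(f"spectrum {k} has a different wavenumber axis")
        rows.append(ys)
    return axis, rows


def write_csv(axis, rows, names, out: TextIO) -> None:
    """Write a samples x wavenumbers table with a header row."""
    writer = csv.writer(out)
    writer.writerow(["sample"] + [f"{x:g}" for x in axis])
    for name, row in zip(names, rows):
        writer.writerow([name] + row)


def load_dpt_files(paths: Sequence[Path]) -> List[str]:
    """Read exported .dpt files from disk (paths supplied by the user)."""
    return [Path(p).read_text() for p in paths]


# Toy example with two short in-memory "exports".
spec_a = "1800.0\t0.12\n1700.0\t0.45\n1600.0\t0.30\n"
spec_b = "1800.0\t0.10\n1700.0\t0.50\n1600.0\t0.28\n"
axis, rows = build_matrix([spec_a, spec_b])
buffer = io.StringIO()
write_csv(axis, rows, ["sample_a", "sample_b"], buffer)
print(buffer.getvalue())
```

Rejecting mismatched axes, rather than interpolating, is a deliberate choice. It flags spectra recorded or exported with different parameters before they silently enter a multivariate model.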
3 Methods
3.1 Diffuse Reflectance Infrared Fourier Transform Spectroscopy (DRIFTS)
3.1.1 Preparation
Samples should be in the form of dry powders.
1. Freeze-dry the samples for 24 h and store them in a moisture-free environment until further processing.
2. Powderize the samples by ball milling as follows. Place 50 mg of sample in a vibration mill tube, add two 12 mm diameter stainless steel balls to each tube, and mill at 30 Hz for 120 s (see Note 26). KBr (see step 3 below) can also be added and mixed in during milling. This will help manual grinding (see step 4 below) and also absorb a substantial part of the heat generated. Clean the vibration mill tubes by washing them with water and ethanol and then blowing them dry with compressed air.
3. Samples must be mixed with KBr (see Note 27), because undiluted plant materials absorb too much IR light. The total weight (sample + KBr) should be 400 mg, of which the dry sample should be 1–10 mg and KBr 399–390 mg (see Note 28). The sample can be mixed with KBr prior to ball milling (to limit the effect of the heat generated and to help manual grinding; see steps 2 and 4, respectively). If KBr is added before ball milling, 50 mg of dry sample requires approximately 2 g of KBr.
4. The final step of sample preparation is manual grinding of the mixture (sample and KBr) with a mortar and pestle. This should be done even if the mixture was ball milled, to obtain a properly homogenized sample with a suitable particle size, which is difficult to achieve by ball milling alone without burning the sample. Pure KBr appears crystalline, much like common table salt, whereas the properly ground mixture should be a fine powder resembling flour. This is the most laborious and time-consuming step.
5. Load the sample mixture into the sample container cup. Make sure the surface is flat after loading and that the sample mixture is not compressed into the cup.
6. For background measurement, prepare pure KBr in the same way as the sample + KBr mixtures.
3.1.2 Measurement
For the best-quality spectra, band intensities should ideally be between 0.3 and 0.8 Abs units and the signal-to-noise ratio should be high. First, record the background (pure KBr, see step 6 in Subheading 3.1.1 above), then the samples, using the same parameters as for the background (see Note 29).
1. Start OPUS and click on the “Advanced Measurement” icon. In the “Basic” or “Advanced” tab of the dialog window, click on “Load” and choose an existing .XPM file for diffuse reflectance measurements (the first time, use a generic XPM file bundled with the instrument).
2. Set the number of scans to 128 (see Note 30) and the spectral resolution to 4 cm−1 (see Note 31). This will result in measurement times of about 2 min/sample (see Note 32) at a standard scanner velocity of 10 kHz (see step 6 below).
3. Set the spectral range to 400–4,000 cm−1 (see Note 33). Save the sample and background interferograms by checking the respective boxes on the “Advanced” tab during measurement setup (see Note 34).
4. Use the “Double-sided, forward–backward” acquisition mode and automatic signal gains.
5. For the Fourier transformation parameters, use a Blackman–Harris 3-term apodization function, the same frequency limits as given for the spectral range (step 3 above), a phase resolution of 32, Mertz-type phase correction with no peak search, and a zero filling factor of 2 (see Note 35).
6. The Optic and Instrument Parameters should be set by default, according to the type of instrument in use.
7. When all values are set, the experimental parameters can be saved as an .XPM file for future measurements, so that only the file name needs to be given for further samples (see Note 36).
3.1.3 Data Analysis
There will always be variation between spectra that originates from experimental (instrument) factors.
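The standardization steps listed below (baseline correction, offset correction, and normalization) are performed in OPUS in this protocol. They can also be reproduced on exported spectra, for example, to check their effect. The following Python sketch is illustrative only and rests on these assumptions:

- only a two-point straight baseline is implemented, not the 64-point rubber band method;
- the offset correction subtracts the mean absorbance of a band-free region;
- area normalization uses the trapezoidal rule;
- vector normalization is taken as mean-centering followed by division by the Euclidean norm (OPUS may differ in detail);
- the Min–Max target values of 0 and 1 are arbitrary.

All function names are hypothetical.

```python
# Illustrative sketch only: function names are hypothetical and the maths is
# a plain re-implementation, not OPUS's own code.
from typing import List, Sequence


def nearest_index(x: Sequence[float], target: float) -> int:
    """Index of the data point whose wavenumber is closest to `target`."""
    return min(range(len(x)), key=lambda i: abs(x[i] - target))


def two_point_baseline(x, y, x1: float, x2: float) -> List[float]:
    """Subtract a straight line through the data points nearest x1 and x2."""
    i, j = nearest_index(x, x1), nearest_index(x, x2)
    if i == j:
        raise ValueError("the two baseline points must differ")
    slope = (y[j] - y[i]) / (x[j] - x[i])
    return [yv - (y[i] + slope * (xv - x[i])) for xv, yv in zip(x, y)]


def offset_correct(x, y, lo: float, hi: float) -> List[float]:
    """Shift the spectrum so that the band-free region [lo, hi] averages zero."""
    region = [yv for xv, yv in zip(x, y) if lo <= xv <= hi]
    if not region:
        raise ValueError("no data points in the offset region")
    shift = sum(region) / len(region)
    return [yv - shift for yv in y]


def trapezoid_area(x, y) -> float:
    """Trapezoidal area; abs() makes it valid for descending wavenumber axes."""
    return sum(abs(x[k + 1] - x[k]) * (y[k] + y[k + 1]) / 2 for k in range(len(x) - 1))


def area_normalize(x, y, total: float = 100.0) -> List[float]:
    """Scale the spectrum so that its area in the analysed range equals `total`."""
    area = trapezoid_area(x, y)
    if area == 0:
        raise ValueError("zero area; check the baseline correction")
    return [yv * total / area for yv in y]


def vector_normalize(y) -> List[float]:
    """Mean-center, then divide by the Euclidean norm (one common definition)."""
    mean = sum(y) / len(y)
    centred = [yv - mean for yv in y]
    norm = sum(c * c for c in centred) ** 0.5
    return [c / norm for c in centred]


def min_max_normalize(x, y, lo: float, hi: float) -> List[float]:
    """Map the minimum and maximum within [lo, hi] to 0 and 1 (arbitrary targets)."""
    region = [yv for xv, yv in zip(x, y) if lo <= xv <= hi]
    y_min, y_max = min(region), max(region)
    return [(yv - y_min) / (y_max - y_min) for yv in y]


# Toy five-point spectrum on a descending wavenumber axis (cm-1).
x = [1800.0, 1700.0, 1600.0, 1500.0, 1400.0]
y = [0.2, 0.5, 0.4, 0.6, 0.4]
corrected = two_point_baseline(x, y, 1800.0, 1400.0)  # endpoints become ~0
normalized = area_normalize(x, corrected)              # area becomes 100
```

The sketch applies baseline and offset corrections before normalization, in the same order as steps 2–4 below. If normalization were applied to uncorrected spectra, baseline differences would leak into the scaled intensities.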
To minimize the effects of experimental variation on the data, spectra should be standardized before analysis by the following steps:
1. Load all spectra to be standardized (see Note 37) in OPUS.
2. Select the “AB” block of all loaded spectra and perform baseline correction (“Manipulate” menu, “Baseline correction”). Ideally, a two-point straight baseline should be created, spanning the entire spectral region that will be used for analysis. If the baseline is not linear, use the standard OPUS option of 64-point rubber band baseline correction (“Manipulate” menu, “Baseline correction” option, “Select method” tab), excluding CO2 bands (see Note 38).
3. If spectra still do not overlap in flat baseline areas (i.e., regions without bands), apply an “Offset correction” (“Manipulate” menu, “Normalization” option, “Offset correction” method), using a flat baseline area as the “Frequency Range.”
4. After baseline and offset corrections, normalization is needed to make spectra fully comparable. Keep in mind that after normalization, compositional changes between samples reflect relative differences and not absolute amounts (see Notes 8 and 39). Area normalization is normally a good strategy: the area under all bands in the spectral range of analysis is set to a constant value (100 %). OPUS has no built-in area normalization option; the closest is “Vector Normalization” (“Manipulate” menu, “Normalization” option; see Note 40).
5. Alternatively, Min–Max normalization (“Manipulate” menu, “Normalization” option) can be used to set the intensity of a (reference) band in the spectrum to a constant value (see Note 41). This scales the spectra along the absorbance axis so that the minimum and maximum values in the selected frequency range become constant (see Note 42).
6. Standardized (baseline-corrected and normalized) spectra can be compared directly in OPUS to detect substantial changes.
However, to take full advantage of the data, multivariate analysis is highly recommended. This requires the spectra to be exported in a format that the multivariate software can read. For this task, OPUS can export Jcamp (.dx) files as well as standard tab- or space-delimited ASCII data point table (.dpt) files (“File” menu, “Save File As” option, “Mode” tab; see Notes 43 and 44). Unfortunately, each file needs to be saved and exported individually (see Note 45).
3.2 Attenuated Total Reflectance (ATR)
3.2.1 Preparation
ATR measurements require virtually no sample preparation:
1. Select the appropriate IRE (see Note 46).
2. Make sure there is good contact between the sample and the crystal. For solids, apply the maximum pressure that will damage neither the sample nor the IRE.
3.2.2 Measurement
Most of the measurement parameters are identical to those described for DRIFTS (Subheading 3.1.2) and will not be detailed here. However, the signal in ATR measurements is often weaker, which necessitates longer measurement times (a higher number of scans) than for DRIFTS. Another difference is that the background is measured either on the empty IRE or, in the case of solutions, using the solvent (see Note 47).
1. Start OPUS and click on the “Advanced Measurement” icon. In the “Basic” or “Advanced” tab of the dialog window, click on “Load” and choose an existing .XPM file for attenuated total reflectance measurements (the first time, use a generic XPM file bundled with the instrument).
2. Record the background with the same parameters as for the sample.
3. Record the sample, using the same initial parameters as for DRIFTS measurements (Subheading 3.1.2). Save the sample and background interferograms by checking the respective boxes on the “Advanced” tab during measurement setup (see Note 35).
4.
The only difference from the DRIFTS setup should be under the “Optic and Instrument Parameters” tab, which should be set by default to the correct ATR settings for the type of instrument in use.
5. If the signal-to-noise ratio is low, increase the number of scans (see Notes 31 and 32). When overall absorbance values are too low (i.e., most bands have intensities below 0.3 Abs units), the sample should be “concentrated.” For solutions, this can be achieved either by increasing the concentration or by repeatedly depositing the sample on the IRE and evaporating the solvent (see Note 48). For solids, the only way to improve signal strength is better contact with the IRE, achieved by applying higher pressure.
3.2.3 Data Analysis
Data analysis is done in the same way as for DRIFTS: follow steps 1–6 in Subheading 3.1.3.
3.3 Microspectroscopy Using a Single Element (SE) Detector
3.3.1 Preparation
When plant sections are analyzed, care must be taken to produce sections that are non-scattering (staying flat on the carrier), thin enough for the infrared light to pass through, but thick enough for the anatomical features to remain intact (no collapsing, folding, tearing, or cracking). The highest-quality data are obtained in transmission mode [16], but this mode also requires the most challenging sample preparation, as Arabidopsis sections are very fragile.
1. Sections should be about 10–20 μm thick (see Note 49). They can be obtained from frozen material with a cryotome (see Note 50), from fresh material with a vibratome (see Note 51), or from paraffin-embedded material with a microtome (see Note 52). The optimal thickness depends on the material at hand (cell density, wall thickness, and other physical and optical properties). From the spectroscopic point of view, sections are ideal when the recorded spectra contain absorbance values between 0.3 and 0.8 and have high signal-to-noise ratios (see Note 53).
2.
Mount the sections onto carriers (see Note 54): standard microscopy glass slides (see Note 23), infrared-transparent windows (see Note 24), or mirrors (see Note 25). To keep the sections flat, they can be sandwiched between two carriers.
3. Place the mounted sections in a desiccator to dry for at least 48 h (see Note 55).
4. Sections dried on standard microscopy glass slides need to be transferred to infrared-transparent windows (see Note 24) or mirrors (see Note 25) prior to measurement. Transfer the sections by scraping them off the glass slide with a razor blade and gently placing them onto the new carrier.
3.3.2 Measurement
The numerical values listed below are suggested initial values that normally provide good results for most types of Arabidopsis sections. However, fine-tuning for individual samples and instruments is always necessary. We provide two different measurement methods. Method 1 allows four sample positions to be defined at a time (see Note 56) and records each as an individual file. Alternatively, Method 2 allows many measurement positions to be defined and recorded consecutively within the same measurement. In this case, all positions are recorded and kept in the same file, as individual blocks (see Note 57). The major advantages of Method 2 are speed and automation, as the user does not have to set up and start each measurement manually. The drawbacks are as follows: (a) the same background is used for all positions; in case of problems with that background (water vapor, vibrations, etc.), all measurements will be affected, and this will only become obvious after all measurements are done; (b) all positions are recorded with the same parameters, and there is no possibility of fine-tuning parameters for each sample position separately; (c) all positions must be in the same focus; and (d) the same apertures are used for all positions.
Method 1:
1.
Cool the detector by filling it with liquid N2 until the red indicator light switches off.
2. Place the sample mounted on the appropriate carrier (Subheading 3.3.1, step 4) onto the sample tray (see Note 58).
3. Select transmission or reflection mode on the foot of the microscope accessory. Make sure you are using visible and not infrared light, and adjust brightness and focus if needed.
4. In OPUS, change the detector (select beam path) to single element.
5. In the “Measure” menu, select “Video Assisted Measurement.”
6. In the “Basic” or “Advanced” tab of the dialog window, click on “Load” and choose an existing .XPM file for single element transmission or reflectance measurements (the first time, use a generic XPM file bundled with the instrument).
7. In the “Advanced” tab, specify the filename and path. Set the number of scans to between 32 and 512 for both background and sample, and the spectral resolution to 4 cm−1 (see Notes 31 and 59). For the “save data” parameters, give an upper limit of 4,000 cm−1 (see Note 34) and a lower limit depending on the cutoff edge of the carrier (400 cm−1, i.e., no cutoff, for mirrors in reflectance mode; 550 cm−1 for ZnSe; 850 cm−1 for BaF2; and 1,050 cm−1 for CaF2). Save all data blocks (see Note 35) and choose Absorbance as the resulting spectrum.
8. In the “Optic” and “Acquisition” tabs, keep the default parameters, as these are instrument specific (gains, scanner velocities, high- and low-pass filters, etc.). The “Double-sided, forward–backward” acquisition mode should be selected by default.
9. In the “Fourier transform” tab, select the “Blackman–Harris 3-term” apodization function, the “Power/No Peak Search” phase correction mode, and a zero filling factor of 2. Leave the phase resolution at its default (16 or 32).
10. In the “XY stage” tab, make sure the joystick is activated. Calibration of the stage may be necessary if the stage does not move to the defined positions during measurements.
11.
In the “Check Signal” tab, make sure there is an interferogram with acceptable counts, depending on the instrument condition and the apertures applied (at least 8,000 counts on an empty spot on the carrier, i.e., with no sample).
12. Save the XPM file under a descriptive name (“Advanced” tab, “Save” button).
13. In the “Basic” tab, select “Start Video Assisted Measurement.” This will close the dialog window and open the measurement workspace, dominated by the Live Video Pane in the center. In the Live Video Pane, the red rectangle outlines the CCD area, while the green square with the crosshair shows the actual measurement area (see Note 60).
14. Move the tray with the joystick to an empty spot on the carrier. Right-click anywhere inside the green square in the Live Video Pane and select “Defining Positions…” then “Background Position” in the context menu that opens (see Note 61).
15. Move the tray with the joystick to the first sample position. Right-click anywhere inside the green square in the Live Video Pane and select “Defining Positions…” then “Load Position 1” in the context menu that opens (see Note 57).
16. Set the apertures, and adjust focus and light intensity, if necessary. Save the visible image by right-clicking anywhere inside the green square in the Live Video Pane and selecting “Video Image…” then “Snapshot” in the context menu that opens (see Notes 62–64).
17. Move to the Background Position (see Note 65), adjust focus if necessary, and change from visible to infrared light (see Note 66).
18. Right-click anywhere inside the green square in the Live Video Pane and select “Starting Measurement…” then “Collect Background at Current Position” in the context menu that opens (see Note 67).
19. When the background measurement is finished, change to visible light and move to the predefined sample position (Load Position 1). Adjust focus if necessary and change to infrared light (see Notes 66 and 67).
20.
Right-click anywhere inside the green square in the Live Video Pane and select “Starting Measurement…” then “Measure Current Position” in the context menu that opens.
21. When the measurement is finished, change the light to visible (see Note 68) and close the measurement workspace to save the file (see Note 69).
22. If more than one sample position has been defined in step 15 (see Note 57), start a new “Video Assisted Measurement” (in the “Measure” menu of OPUS), use the same XPM file as before (changing only the file name in the “Advanced” tab), start the measurement workspace (step 13), move to the next sample position to be measured (see Note 66), and repeat from step 16 (see Note 30).
Method 2:
Steps 1–14 are identical to Method 1.
23. Adjust focus and light intensity. Create an overview image by right-clicking anywhere inside the green square in the Live Video Pane and selecting “Video Image…” then “Set + Scan Overview Image Area” in the context menu that opens (see Note 65).
24. Right-click in the Live Video Pane and select “Measurement Spots/Grid…” and “Mark Measurement Positions.” The cursor changes, and left-clicking marks a position by placing a “+M” sign on the image.
25. Move around the sample using the joystick and mark all positions to be measured (see Notes 70 and 71).
26. Move to the Background Position (see Note 66) and change from visible to infrared light (see Note 67).
27. Right-click anywhere inside the green square in the Live Video Pane and select “Starting Measurement…” then “Collect Background at Current Position” in the context menu that opens (see Note 68).
28. When the background measurement is finished, right-click anywhere inside the green square in the Live Video Pane and select “Starting Measurement…” then “Measure Marked Positions” in the context menu that opens. This will measure all positions in the order in which they were marked.
This order will also be the order of the data blocks (see Note 58). Positions already measured carry a checkmark to differentiate them from those yet to be measured.
29. When the measurements are finished, change the light to visible (see Note 69) and close the measurement workspace to save the file (see Note 70).
3.3.3 Data Analysis
Measurements using the single element detector result in 3D files, which contain one spectrum (Method 1 in Subheading 3.3.2) or more (Method 2 in Subheading 3.3.2). To extract these spectra, open the 3D file in OPUS and follow the steps below (see Note 72):
1. Select the “AB” block of the 3D file in OPUS.
2. In the “Measure” menu, select the “Extract data” option.
3. In the dialog window that opens, specify a filename and path for the spectrum to be extracted (see Note 73) in the “Select Files” tab.
4. In the “Extension Range” tab, select from “Beginning of file” to “End of file.”
5. In the “Extraction Mode” tab, select “Series of single blocks” (see Note 74) to be stored, the “Increment name” option under “If file already exists,” and the “Load” option under “Extracted files” (see Note 75).
Finally, the extracted spectra are analyzed exactly the same way as spectra recorded by DRIFTS or ATR: follow steps 1–6 in Subheading 3.1.3.
3.4 Microspectroscopy Using a Focal Plane Array (FPA) Detector
3.4.1 Preparation
Samples are prepared in the same way as for microspectroscopy using a single element detector (Subheading 3.3.1).
3.4.2 Measurement
The parameters listed below are initial values that will provide good results for most types of Arabidopsis sections. However, they must be fine-tuned for individual samples and instruments to achieve the best possible spectrum quality:
1. Cool the detector by filling it with liquid N2 (see Note 76).
2. Switch on the FPA detector.
3.
Place the sample mounted on the appropriate carrier (Subheading 3.3.1, step 4) onto the sample tray (see Note 59).
4. Select transmission or reflection mode on the foot of the microscope accessory. Make sure you are using visible and not infrared light, and adjust brightness and focus if needed.
5. Start OPUS and change the detector (select beam path) to FPA.
6. In the “Measure” menu, select “Continuous Scan FPA Measurement.”
7. In the “Basic” or “Advanced” tab of the dialog window, click on “Load” and choose an existing .XPM file for FPA transmission or reflectance measurements (the first time, use a generic XPM file bundled with the instrument).
8. In the “Advanced” tab, specify the filename and path. Set the number of scans to 32 for both background and sample, and set the spectral resolution to 4 cm−1 (see Notes 31 and 60). For the “save data” parameters, give an upper limit of 4,000 cm−1 (see Note 34) and a lower limit of 900 cm−1 (or 1,050 cm−1 if a CaF2 window is used as the carrier; see Notes 24 and 77). Save all data blocks (see Note 78) and choose Absorbance as the resulting spectrum.
9. In the “Optic” and “Acquisition” tabs, keep the default parameters, as these are instrument specific (gains, scanner velocities (see Note 79), high- and low-pass filters, etc.). The “Double-sided, forward–backward” acquisition mode should be selected by default.
10. In the “Fourier transform” tab, select the “Blackman–Harris 3-term” apodization function, the “Power/No Peak Search” phase correction mode, and a zero filling factor of 2. Leave the phase resolution at its default value (16 or 32).
11. In the “XY stage” tab, make sure the joystick is activated. Calibration of the stage may be necessary if the stage does not move to the defined positions during measurements; however, this cannot be done once the measurement has been started (step 14).
12.
The “Check Signal” tab has a unique display specific to the FPA detector. Instead of an interferogram, it contains a scatter plot in which the dots represent the maximum intensity count of the interferogram at each pixel. In addition, several parameters for the FPA setup can be accessed here. Keep the default values for frame rate and integration. Click on the “Diagnostics” button for an exact readout of the “FPA temperature” (listed in the bottom row of parameters).
13. Save the XPM file under a descriptive name (“Advanced” tab, “Save” button).
14. In the “Basic” tab, select “Start Video Assisted Continuous Scan FPA Measurement.” This will close the dialog window and open the measurement workspace, which is dominated by the Live Video Pane in the upper part and the Live FPA Image Pane in the lower part. In the Live Video Pane, the red rectangle outlines the CCD area, while the green square with the crosshair shows the actual FPA measurement area (see Note 61).
15. Move the tray with the joystick to an empty spot on the carrier. Right-click anywhere inside the green square in the Live Video Pane and select “Defining Positions…” then “Background Position” in the context menu that opens (see Note 62).
16. Move the tray with the joystick to the first sample position. Right-click anywhere inside the green square in the Live Video Pane and select “Defining Positions…” then “Load Position 1” in the context menu that opens (see Note 57).
17. Adjust focus and light intensity if necessary (see Note 80). Save the visible image by right-clicking anywhere inside the green square in the Live Video Pane and selecting “Video Image…” then “Snapshot” in the context menu that opens (see Notes 63–65).
18. While still at the sample position, change from visible to infrared light (see Note 67). Note that the Live FPA Image is updated. Ideally, anatomical features of the sample should be recognizable.
19.
Right-click on the Live FPA Image and select “Setup FPA Detector…” then “Show Control Panel” in the context menu that opens (see Note 81). Set the gain to 1 or 2 (see Note 82). Adjust the offset so that all dots in the scatter plot fall between ca. 4,000 and 12,000 counts, i.e., between ¼ and ¾ of the total intensity scale. Ideally, no pixel should give a readout of 0 (minimum) or 16,383 (maximum) (see Note 83). Close the dialog window by clicking “OK.”
20. If using transmission mode, adjust the condenser (i.e., focus the infrared light; see Note 84) so that the Live FPA Image shows maximal, homogeneous illumination over as much of the FPA area as possible (i.e., no side should be darker than the others).
21. While still in infrared mode, move to the Background Position (see Note 66), and adjust focus and condenser (see Note 85) if necessary. Check the Live FPA Image to make sure that the gain and offset are correctly set, and adjust them if necessary as described in step 19.
22. Right-click on the Live FPA Image and select “Measurement,” then “Start Measurement” and “Background.” Click on the “Start Measurement” button of the new window that appears (see Note 85).
23. Wait until the measurement is done (see Note 86), then move to a predefined sample position (see Note 66), and readjust the focus and condenser (see Note 85) if necessary. Check the Live FPA Image to make sure that the gain and offset are correct, and adjust them if necessary as described in step 19.
24. Right-click on the Live FPA Image and select “Measurement,” then “Start Measurement” and “Sample.” Click on the “Start Measurement” button of the window that appears (see Note 86).
25. When the measurement is done (see Note 87), change to visible light (see Note 69) and close the measurement workspace to save the file (see Note 70).
26.
If more than one sample position has been defined in step 16 (see Note 57), start a new “Continuous Scan FPA Measurement” (in the “Measure” menu of OPUS), use the same XPM file as before (changing only the file name in the “Advanced” tab) to start the next measurement, move to the next sample position to be measured (see Note 66), and repeat from step 23 (see Note 30). Marking many different positions and measuring them all, as in Method 2 for the SE detector (Subheading 3.3.2), is not possible with the FPA detector.
3.4.3 Data Analysis
Measurements using the FPA detector result in 3D files, which contain all spectra (called “blocks”) in the order of their pixel number in the image (see Note 87). To extract these spectra, open the 3D file in OPUS and follow the steps below (see Note 73) for Method 1 (extracting all spectra from an image) or Method 2 (extracting spectra from selected pixels only):
Method 1, extracting all spectra
1. Open the 3D file in OPUS and double-click on the “AB” block. This will bring up the 3D window view.
2. In the “Evaluate” menu, choose “Integration.” If you have a predefined integration method (e.g., “Arabidopsis Overview”; see step 3), load it by clicking on “Load Integration Method,” click “Integrate,” and proceed to step 4. If not, click on “Setup Method” to define an integration method.
3.
In the dialog window, choose the integral type “B,” for “Left edge” set the value to 1,614 and for “Right edge” to 1,572, and give the following Label: “lignin1595.” Click on the “>>” button to define the next band: Type B, Left edge 1,520, Right edge 2,485, Label “lignin 1510.” Click on the “>>” button once again to define the next band: Type B, Left edge 1,755, Right edge 1,733, Label “-C = O 1740.” Click on the “>>” button for the last time and define the next band: Type B, Left edge 950, Right edge 1,180, Label “carbohydrates.” Click on the “Store Method” button and save the integration method under the name “Arabidopsis Overview.” Click “Exit” to close the dialog window, return to the integration window and click “Integrate.” 4. The integration produces a “TRC” data block, which serves only as means for visualization. No qualitative analysis should be based on the produced heat map (see Note 88). 5. In the “Window” menu, select “New Registered Window…” and choose “Map + Vid + Spec” in the drop-down menu. Clicking “OK” brings up a new 3D view split into three panes; the two upper panes can be used to show visible images and infrared maps (Image Panes), while the bottom pane shows infrared spectra at the selected pixels (Spectrum Pane). Drag the “TRC” block into this new 3D view (see Note 89). 6. Right-click on the right Image pane (see Note 90) select “XZ Plot” and then choose “Properties.” In the dialog window that opens, select “Show surface” and “Video Image” on the “3D Properties” tab. On the “Mapping” tab, choose “-C = O 1740” (see Note 91) from the “Select trace” drop-down menu and choose the correct visible image (if more than one snapshot was taken in step 17 in Subheading 3.4.2) in the “Select image” drop-down menu. In the “Selection” tab, choose “X” and “Z” from the “Show” drop-down lists. 
In the “Contour” tab, select “Rainbow,” uncheck the “No colour splitting for negative values” box, choose the lowest possible number from the “Contours” drop-down menu (see Note 92), and choose “Contour lines and colors” from the “Method” drop-down menu. Click “Apply” and “OK” (see Notes 93 and 94).
7. Repeat step 6 in full on the left Image pane, but choose “No contours” instead of “Contour lines and colors” from the “Method” drop-down menu. Click “Apply” and “OK” (see Note 95).
8. The red and green crosshair marks the position whose spectrum is shown in the bottom Spectrum pane. The Spectrum pane also lists the pixel number (after “Index”) of the spectrum (see Note 88) and has controls for moving the crosshair (see Note 96). Move around the image to make sure that the spectra look reasonable, and note the positions of bad pixels, if any. These should be excluded from further data analysis.
9. Right-click on any of the Image panes and select “Extract Spectra.”
10. In the dialog window that opens, specify a filename and path for the spectra to be extracted (see Note 73) in the “Select Files” tab.
11. In the “Extension Range” tab, select from “Beginning of file” to “End of file.”
12. In the “Extraction Mode” tab, select “Series of single blocks” (see Note 97) to be stored, the “Increment name” option under “If file already exists,” and the “Do not load” option under “Extracted files” (see Note 98).
Finally, the extracted spectra are treated the same way as spectra recorded by DRIFTS or ATR: follow steps 1–6 in Subheading 3.1.3.
Method 2, extracting only selected spectra
13. Follow steps 1–7 of Method 1 above.
14. Move to the pixel from which the spectrum should be extracted (see Notes 97 and 99).
15. Right-click on any of the Image panes and select “Extract Spectra.”
16. In the dialog window that opens, specify a filename and path for the spectrum to be extracted (see Note 73) in the “Select Files” tab.
Use the pixel number (“Index” value in the Spectrum pane, see Note 88) in the filename for easy identification of the origin of the spectra.
17. In the “Extension Range” tab, select from “Block” and give the pixel number of the spectrum to be extracted (the default value is the pixel of the spectrum shown in the Spectrum pane, i.e., the one at the crosshair position).
18. In the “Extraction Mode” tab, select “First block only” (see Note 100) to be stored, the “Increment name” option under “If file already exists,” and the “Load” option under “Extracted files” (see Note 75).
19. Repeat steps 14–18 to extract additional spectra from the same image.
20. Repeat steps 13–19 for additional spectra from another image.
Finally, the extracted spectra are treated the same way as spectra recorded by DRIFTS or ATR: follow steps 1–6 in Subheading 3.1.3.
4 Notes
1. Spectra can be plotted as transmittance (T %) or absorbance (Abs), where $T = I/I_0$ and $\mathrm{Abs} = \log_{10}(1/T)$. While Abs spectra are more common today, early works used T % more frequently (particularly before the Fourier transform revolution). In addition, older publications plot wavelength on the x-axis (in µm) rather than wavenumber (in cm⁻¹), as is common nowadays. The conversion is straightforward: $\nu = 10{,}000/\lambda$, where ν is the wavenumber (in cm⁻¹) and λ is the wavelength (in µm).
2. Often these charts display not only the wavenumber range in which a functional group can produce a band, but also information about band shape (narrow, broad, shoulder) and intensity (weak, medium, strong).
3. The effect of bond strength on characteristic frequencies can be illustrated by the positional change of the intense –C=O stretching vibration between formaldehyde (H₂C=O, ca. 1,746 cm⁻¹) and acetone ((CH₃)₂C=O, ca. 1,731 cm⁻¹).
The effect of a change in atomic mass on characteristic frequencies can be even larger (several hundred cm⁻¹ when exchanging hydrogen for deuterium, for instance). Therefore, isotope exchange can be used to identify the origin of a band or to shift a band to a different position to avoid overlaps. If a sample is repeatedly washed with D₂O, the accessible H atoms will be exchanged for D atoms. Thus, if a –C–O band originates from an alcohol, it will shift considerably upon deuteration (–C–OH to –C–OD change), as opposed to the –C–O band of an ether or ester.
4. Only vibrations that change the dipole moment of a molecule are infrared active. Thus, homonuclear diatomic gases (N₂, O₂, etc.) are infrared silent, and the major atmospheric interference in infrared spectra is caused by H₂O and CO₂.
5. Normally, water would also be present, but it produces very intense bands in FT-IR spectroscopy that can obscure important parts of the spectrum (e.g., around 1,600 cm⁻¹, where characteristic lignin and protein bands are situated). Thus, water must be removed prior to analysis by freeze-drying or desiccation. The only general exception to this rule is the ATR technique, which can handle wet samples (see Subheading 1.2).
6. The most notable exception is perhaps the aromatic –C=C– functionality present in lignins and monolignols, giving rise to bands around 1,510 and 1,595 cm⁻¹ [7, 17]. These positions are seldom obscured by other bands (although absorbed water and proteins, which give rise to bands at around 1,650 cm⁻¹, can mask part of the 1,595 cm⁻¹ band), and lignins/monolignols are often the only aromatic compounds present in quantities large enough to produce significant –C=C– bands in the spectrum.
7. The Bouguer–Beer–Lambert law can be written as $T = I/I_0 = 10^{-\varepsilon l c}$, where ε is the molar absorptivity coefficient, l is the path length of the light in the sample, and c is the concentration of the absorbing material.
Since $\mathrm{Abs} = \log_{10}(1/T) = \varepsilon l c$, it is the absorbance (and not the transmittance) that is linearly correlated with concentration. In addition, because ε differs between bands, direct comparison of band intensities is difficult. For instance, an intensity of 0.25 Abs units for the –C=C– band and 0.5 Abs units for the –C–O–C– band does not mean that there are twice as many –C–O–C– as –C=C– functionalities in the sample. When monitoring the same band, however, the intensity change can be used to determine concentration changes after calibration.
8. For a simple theoretical example, consider the following 1.5 mg samples: Sample A: 0.5 mg cellulose, 0.5 mg lignin, 0.5 mg other; Sample B: 0.25 mg cellulose, 0.5 mg lignin, 0.75 mg other; Sample C: 0.5 mg cellulose, 1.0 mg lignin; Sample D: 0.3 mg cellulose, 0.6 mg lignin, 0.6 mg other. After normalization, samples B, C, and D will all show a substantially increased lignin-to-cellulose ratio (2:1 instead of 1:1) compared to Sample A. However, from the normalized spectra alone, it is impossible to determine whether the lignin-to-cellulose ratio increased because the cellulose content decreased (Sample B), because the lignin content increased (Sample C), or both (Sample D).
9. To include sample variation in the direct comparison of average spectra, standard deviations would have to be shown for all intensities at each wavenumber. Otherwise, the significance of the differences between average spectra cannot be estimated.
10. In transmission mode the infrared light passes through the sample, much like visible light does in standard microscopy. Therefore, this mode is the most similar to visible light microscopy. However, standard microscopy glass slides cannot be used for infrared microscopy, as they absorb infrared light. Instead, samples should be mounted on infrared-transparent windows (e.g., BaF₂, CaF₂, ZnSe, NaCl) (Subheading 3.3.1).
In addition, refraction by the window causes a focal shift of the infrared light [16]. This focal shift should be compensated for with the condenser (Subheading 3.3.2).
11. In reflection mode, the sample is mounted on a carrier with a highly reflective surface (most commonly gold or aluminum mirrors) (Subheading 3.3.1). The infrared light first passes through the sample, is reflected by the mirror, and passes through the sample in the reverse direction before reaching the detector. Since the light passes through the sample twice, the effective path length (and thus the absorbance) is doubled.
12. In ATR mode the sample is viewed and measured via an ATR objective instead of the standard objective of the infrared microscope. As such, the measurements are essentially surface-specific ATR measurements (see Subheading 1.2) of selected sample areas.
13. Too small an aperture results in spectral distortions and a decreased signal-to-noise ratio due to diffraction and limited light intensity. The 50 µm × 50 µm area is given as a safe limit for an average infrared microscope setup with a conventional source and a 10–20 µm thick sample section of average quality. However, the exact size of the smallest usable aperture is determined by the physical, chemical, and optical properties of the sample as well as by the infrared source. With a synchrotron source, high-intensity infrared radiation can be focused on very small areas, providing the best available spatial resolution of all infrared microspectroscopic techniques (in the µm range).
14. Arrays can be square (16 × 16, 32 × 32, 64 × 64, or 128 × 128) or linear (1 × 16, 1 × 32, etc.). The size of the array determines the number of spectra recorded simultaneously (4,096 for a 64 × 64 array) and also the area that can be recorded in a single image, since the size of an individual detector element is fixed.
15. Older generation FPA detectors had larger detector elements and consequently larger fields of view.
Typically, these detectors had an individual element size of ca. 4.5 µm × 4.5 µm, resulting in a field of view of 285 µm × 285 µm for a 64 × 64 FPA. However, even at this element size, the spatial resolution is limited by diffraction (see Note 16) rather than by the detector element size.
16. The spatial resolution (Δx) is diffraction limited [8] and can be calculated by the formula $\Delta x \geq 0.61\lambda/\mathrm{NA}$, where λ is the wavelength of the light and NA is the numerical aperture of the objective. The useful spectral range using an FPA detector on plant samples is ca. 2,000–950 cm⁻¹. This gives a spatial resolution of ca. 10–20 µm, assuming NA = 0.3 for a 20× Cassegrain-type objective. In practical terms, the spatial resolution is ca. 13 µm for the aromatic –C=C– vibration (lignins, monolignols) at 1,600 cm⁻¹, but only ca. 20 µm for the carbohydrate bands (from cellulose, hemicelluloses, pectins, starch, etc.) between 1,000 and 1,100 cm⁻¹.
17. Originally, SE detectors had the advantage of being considerably faster than FPA detectors. However, with the arrival of the latest generation of FPA detectors with updated electronics (including data communication channels), this is no longer the case.
18. The scanner velocity for FPA measurements lies in the range where the frequency of everyday vibrations (e.g., from walking, printers, computers) can easily interfere and create resonance patterns (fringes) in the spectra. Thus, it is important to provide a vibration-free environment for FPA measurements by using a vibration-proof table in a nonresonant room (e.g., a basement instead of a high floor).
19. Software problems can mean any process that takes priority on the PC, ranging from automatic updates of the operating system to scans triggered by antivirus software. These take resources from the PC, which interrupts data streaming from the FPA detector.
In addition, due to the complexity of the FPA detector and the extremely high data flow, freezes and bugs are more frequent than for SE measurements.
20. A generic FT-IR spectrometer with low space and maintenance requirements can be purchased at relatively low cost. For high-quality data, however, a vacuum bench in a thermostated and vibration-proof environment is recommended. The basic setup for microspectroscopy includes a microscopy accessory equipped with a single element (SE) detector and knife-edge apertures. This generally allows a single spectrum to be collected from an area of at least 50 × 50 µm with traditional infrared sources. A focal plane array (FPA) detector increases the cost (purchase, running, maintenance, and service) and the required computing power, but provides the highest (diffraction limited, see Note 16) spatial resolution available with traditional sources, at the highest speed, by recording thousands of spectra simultaneously.
21. The ball mill should be able to operate at 30 Hz and hold a minimum of 50 mg of sample at a time. If milling is done with KBr (Subheading 3.1.1), a capacity of 2.5 g is needed.
22. Must be made of agate, not ceramic. Pure agate is nonabsorbing in the infrared region, resistant to wear and tear, and nonreactive.
23. Standard microscopy slides can be used to store a large number of sections. However, sections cannot be measured on glass slides, so they have to be transferred after drying to infrared-transparent windows (see Note 24) or mirrors (see Note 25) for measurement. The transfer often damages the sections and can make them difficult to flatten.
24. Infrared-transparent windows are available in different sizes, shapes, thicknesses, and materials. NaCl windows are cheap but very sensitive to moisture and are not recommended. For Arabidopsis sections, the best materials are BaF₂, CaF₂, and ZnSe.
BaF₂ and CaF₂ are colorless and only very weakly soluble in cold water, whereas ZnSe is orange colored and practically insoluble (except in acids). All are brittle and easily scratched and therefore must be handled with care. They differ in infrared transparency, with the following cutoff edges at 50 % transmittance (i.e., below these limits they transmit less than half of the infrared light and therefore should not be used in that spectral range): BaF₂ ca. 850 cm⁻¹, CaF₂ ca. 1,050 cm⁻¹, ZnSe ca. 550 cm⁻¹. These windows are reusable but may be too expensive for sample storage.
25. Much like infrared-transparent windows (see Note 24), carrier mirrors are available in different sizes, shapes, and materials. The best options for Arabidopsis sections are gold, silver, and aluminum mirrors, which all provide practically 100 % reflectance over the entire spectral range. They are reusable, but care must be taken to clean them only with soft cotton pads, as they are easily scratched. Gold is the least reactive but also the most expensive.
26. It is critical that drying and powdering are performed in the same way for all samples; otherwise, spectral differences between samples will partly reflect differences in sample preparation rather than chemical differences. Ball milling affects particle size, which in turn affects optical and thus spectral properties. In addition, it can also affect the degree of polymerization [14, 15]. Ball milling also generates heat and may thus burn the sample. Therefore, the optimal time and frequency for ball milling may need to be fine-tuned for different sample types. However, when samples have widely different physical and chemical properties, even standardized ball milling cannot neutralize all differences.
27. Other nonreactive and nontoxic IR-transparent diluents can also be used.
KBr is the most common because the optical components of spectrometers are often also made of KBr. Thus, KBr mixed with the sample will not impose any additional spectral restrictions. However, only KBr that is specifically labeled as “Infrared spectroscopy grade” should be used. Other grades may contain very small quantities of IR-active contaminants (e.g., nitrate). Note that dry KBr is hygroscopic, so care must be taken to avoid humidity and to keep all equipment dry.
28. The dilution with KBr can vary depending on the amount and IR properties of the sample, but it should be very similar for all samples within an experiment. Ideally, the resulting mixtures should have band intensities between 0.3 and 0.8 Abs units.
29. If measurement conditions are stable, background measurements are not required before every sample. Usually it is sufficient to record the background once or twice per day, typically at the start and after longer breaks. OPUS automatically uses the last recorded background unless parameters (such as spectral range or spectral resolution) have been changed, making the background and sample measurements incompatible.
30. On newer systems, there is no real advantage in setting the number of scans to a power of two, but there is no harm in doing so either. The number of scans can be increased to gain a higher signal-to-noise ratio. However, the signal-to-noise ratio only increases with the square root of the number of scans (e.g., quadrupling the number of scans only doubles the signal-to-noise ratio).
31. Increasing the spectral resolution (to 2 or 1 cm⁻¹ instead of 4 cm⁻¹) is only important when narrow bands or small positional shifts need to be determined precisely. This is rarely the case for plant materials, because band widths are in the region of tens of cm⁻¹. Increasing the spectral resolution results in longer measurement times.
32. Excessively long measurement times should be avoided, because background fluctuations and other disturbances can occur during measurement. This is particularly important when less stable purged benches are used.
33. Typically, only the 400–2,000 cm⁻¹ region of the spectrum is used, as the broad OH band obscures most features around 3,000 cm⁻¹ and makes standardization difficult by introducing a large integral value with high uncertainty into the total sum. The excluded high-wavenumber region can, however, contain valuable information, so it can be advantageous to record the spectra over this range too.
34. The interferogram data blocks are small and thus do not increase the spectrum file size considerably. However, they are valuable if Fourier transformation with different parameters (or manual phasing) is required [18].
35. When a zero filling factor of 2 is applied with a 4 cm⁻¹ spectral resolution, the resulting spectrum will list absorbance values at every 2 cm⁻¹ (4 cm⁻¹/2). However, this is only a result of the zero filling; in reality, the spectral resolution remains 4 cm⁻¹.
36. This protocol describes the measurement of individual samples. However, there are DRIFTS accessories available that enable sample automation for virtually all spectrometer types. These can be in the form of carousels or well plates and often come with their own bundled software. For automated measurements, refer to the software manual of your sample automation accessory, and provide the measurement parameters for each sample as outlined in Subheading 3.1.2.
37. It is necessary to standardize all spectra simultaneously to ensure that they are treated in exactly the same way.
38. For non-OPUS users, a high-order polynomial baseline correction should be applied whenever a linear baseline cannot be used.
39. The only way to obtain quantitative information is to use a nonnative internal standard with a precisely known concentration. This means that a nonreactive compound is added to each sample in a precise quantity. This compound should produce a distinct and well-resolved band that is used for normalization and calibration.
Currently, no compound is available that meets the criteria of a general internal standard for plant samples.
40. Vector normalization uses the sum of squares as the constant, while area normalization uses the sum of the intensities. This means that in vector normalization larger bands carry a higher weight. This is ideal for suppressing the contribution of noise but also disfavors small bands.
41. The reference band is often a distinct band of a compound to which everything will be related, i.e., the observed changes will be relative to this band. Min–Max normalization is not disturbed by band position shifts as long as the shifted band and the baseline region remain within the frequency range used for the normalization (see Note 42). However, changes in band width can introduce errors, since the normalization is based on band height instead of band area.
42. The frequency range should be chosen so that it contains the peak of the band used for referencing and a baseline point where there are no bands. It is crucial that the frequency range does not contain any bands of higher intensity than the reference band.
43. Data point table (.dpt) files are more convenient than JCAMP (.dx) files, because they can be opened in any standard text editor (Notepad on Windows or TextEdit on Mac) and copied and pasted from there. However, .dpt files are more sensitive to regional settings, most notably decimal points vs. decimal commas. Files exported in JCAMP (.dx) format are less prone to such errors.
44. In version 7, OPUS also offers the option to save files in MATLAB (.mat) format.
45. OPUS allows the creation of macros to automate tasks, such as batch exporting. However, the creation of macros is beyond the scope of the presented protocols.
46. The size and shape of the IRE depend on the ATR accessory used. The most common materials are ZnSe, Ge, and diamond.
The diamond crystal is the hardest and allows the highest applied pressure. It is also the most resistant to mechanical wear and chemicals and allows the entire spectral range to be used. Ge and ZnSe IREs impose cutoff edges, but these are usually outside the spectral region used for the analysis of plants.
47. Since OPUS allows different files to be specified for background subtraction, it is often a good strategy to record both the empty IRE and the sample solvent.
48. Depositing a solution/suspension onto the IRE and evaporating the solvent is also a good strategy when solvent bands interfere. If that is the case, changing the solvent (e.g., H₂O to D₂O) is also an option. Although evaporation under certain conditions can induce changes in protein structure, it has been demonstrated that proteins normally retain enough solvent molecules to keep their solution structures relatively intact [19].
49. Section thickness should be halved for reflection mode measurements compared to transmission mode measurements, because the infrared light passes through the sample twice in reflection mode.
50. Using a cryomicrotome has the advantage that the material can be stored, and the method is easy and does not require tedious embedding. The plant material is attached directly to the sample holder with O.C.T. compound, and the sample is trimmed with a razor blade to remove as much of the mounting medium as possible. Sectioning is preferably done at −20 °C with well-sharpened steel knives. During sectioning, the sections can be collected directly on an object glass with the help of a brush. To remove excess mounting medium, the sections can be carefully rinsed with water directly on the object glass.
51. For vibratome sectioning, samples are molded in agarose (3–8 %) in Eppendorf tubes. The agarose plug is removed from the tube and glued on the sample holder with cyanoacrylate glue.
To transfer the section from the water bath to the object glass, a plastic Pasteur pipette can be used, with the tip cut to make the opening appropriate to the sample size.
52. Samples embedded in paraffin and sectioned on a microtome may provide well-preserved sections, but this procedure is more time-consuming. Moreover, care must be taken not to smear paraffin over the sample during sectioning. In addition, the spectrum of the paraffin used for embedding should be recorded separately, and the sample spectra should always be compared to this reference to make sure that no traces of paraffin interfere with the analysis. Ideally, no embedding should be used.
53. Creating a section that is thin enough for spectroscopy can be challenging in the case of Arabidopsis, as such sections become very fragile and tear, fold, or otherwise lose shape, making anatomical features unrecognizable. To limit such damage, paraffin embedding can be used. Care must be taken, however, not to smear paraffin over the sample during sectioning. In addition, the spectrum of the paraffin used for embedding should be recorded separately, and the sample spectra should always be compared to it to make sure that no traces of paraffin interfere with the analysis. Ideally, no embedding should be used.
54. Always mount several sections on the same carrier to be able to select the best one for measurement. In addition, consecutive sections can be saved for staining and inspection by light microscopy. However, never stain the sections that are to be measured by FT-IR microspectroscopy, as the dye(s) will appear in the spectrum.
55. The exact time required for drying depends on the sample, its water content, and the desiccator capacity, but no water vapor should be detected in the spectrum at measurement.
56. In addition to the background position, OPUS has four sample positions with predefined names: Load Position 1 and 2 and Special Position 1 and 2.
All four are equivalent, meaning that four different sample positions can be predefined, and OPUS keeps them until they are overwritten or until OPUS is shut down. This is very useful, as positions can be defined in one measurement and found again in subsequent measurements (see Note 65).
57. This is similar to the way FPA data files are built up (Subheading 3.4.3). However, SE data blocks are numbered in the order in which they were marked and measured, while FPA data blocks are numbered by pixel number in the FPA image.
58. For optimal results, the sample tray should be boxed in, although by default it might be an open design. A boxed sample tray limits fluctuations in H₂O and CO₂ levels and allows purging with dried instrument air or N₂.
59. Start with a low number of scans and only increase it if necessary (i.e., if the signal-to-noise ratio is low). Usually, the problem in microscopy is too high sample intensities (because the sections are too thick), not the opposite. Increasing the spectral resolution (to 2 or 1 cm⁻¹ instead of 4 cm⁻¹) is only important when narrow bands or small positional shifts need to be determined precisely. This is usually not the case for plant materials, where band widths are in the range of tens of cm⁻¹. Increasing the spectral resolution results in longer measurement times.
60. If the visible image is solid black with no features, make sure that (a) the correct mode (transmittance or reflectance) is set on the microscope accessory, (b) the light intensity is sufficiently high, and (c) the light is directed towards the video camera and not towards the front LCD display.
61. OPUS keeps the defined background position until a new one is defined or until OPUS is shut down.
62. Snapshots can be taken independently of the measurements and afterwards attached to a file by selecting “Attach Video Image” in the “Edit” menu.
This is not always straightforward, however, and taking snapshots during the measurements is much preferred.
63. Several snapshots can be taken and kept in a single file (e.g., an overview image, see Note 64, or “before” and “after” images). All images are numbered consecutively and can later be accessed by number.
64. An overview of a larger area can be created by stitching together several images. To do so, right-click anywhere inside the green square in the Live Video Pane and select “Video Image…” then “Set + Scan Overview Image Area” in the contextual menu that opens. Move to the bottom left corner of the area to be overviewed and click on the “Set Area” button in the open dialog window. Move to the top right corner of the area to be overviewed and click on “Set Area,” then on the “Overview Area Now Defined” button. The tray will start moving from the bottom left corner to the top right corner (the area can only be a rectangle), taking a snapshot at each position and stitching these individual pictures together into one large overview image. The overview image is displayed in the right “Still Image Pane” of the measurement workspace and can be used to move quickly between positions: right-click on it and select “Mouse mode…” and “Move to position.” The cursor changes, and left-clicking anywhere on the overview image moves the tray to that position. To stop quick movement, right-click on the overview image and select “Mouse mode…” and “No action” in the contextual menu. Similarly, distances can be measured in the overview image: right-click and select “Mouse mode…” and “Measure distances.” When the cursor changes, left-click at one position, hold down the left mouse button, and move to another position. A straight line will be drawn from the initial position to the new position, with the distance displayed in µm.
To exit the distance measurement mode, right-click on the overview image, select “Mouse mode…” and “No action” in the contextual menu. 65. Moving to any of the defined positions can be done by rightclicking anywhere inside the green square in the Live Video Pane and selecting “Moving to Defined Positions…” and then the position name (Background Position, Load Position 1 and 2, Special Position 1 and 2). 66. The software control window for the microscopy accessory in theory allows changing between visible and infrared light, moving the tray, adjusting light intensity, and changing between transmission and reflectance modes. However, it is less reliable than the direct hardware control and often reacts sluggishly. Therefore, it is recommended to change these parameters on the direct hardware control of the microscopy accessory. 67. There is a direct “Collect Background at Background Position” command, which also involves the automatic movement of the tray to the predefined background position and the background measurement. While this is convenient, sometimes the xy stage control fails and the tray does not move. It is therefore advisable to first move to the desired position while using visible light to make sure that the tray reacts, then change to infrared light and finally start the measurement. 68. Changing to visible light after the measurement is not critical but good practice. It enables the user to determine if there have been changes to the sample during measurement (shifting out of focus, curling, etc.) and makes it more convenient to start up the next measurement. 69. The file is only saved when the measurement workspace is closed. If OPUS is closed, it is lost. 70. Instead of manually marking individual positions, linear, rectangular, and elliptical grids can also be created automatically 348 András Gorzsás and Björn Sundberg (right-click: “Measurement Spots/Grid…” and “Define Linear/Rectangular/Elliptical Grid” options). 71. 
To stop marking positions, right-click in the Live Video Pane and select “Mouse mode…” then “No action.” 72. The term “extraction” is somewhat misleading because the extracted spectra are not removed from the original 3D file but copied and saved in a new file. 73. Several spectra can be extracted at the same time with incrementing filenames (e.g., spectrum1.0, spectrum2.0, spectrum3.0) or extensions (spectrum1.0, spectrum1.1, spectrum1.2) by selecting the appropriate option in the dialog window. Generally, it is best to use incremented filenames rather than extensions. 74. “The series of single blocks” option ensures that each extracted spectrum is stored as an individual single-spectrum file. Another option for 3D files containing several spectra is “Average block,” which means that all extracted spectra are first averaged and only the resulting average spectrum is stored in a file, not the individual spectra. There is no good reason to choose this averaging option, and it is not recommended even if average spectra are needed. Averages are best created afterwards, via the “Manipulate” menu and “Averaging” option, to make sure that no outlier and/or poor-quality spectrum is included. Moreover, it is always recommended to keep the individual spectra for statistical and multivariate analysis. 75. Loading the spectra is recommended to make sure they are all of acceptable quality before proceeding with standardization. 76. Unlike the SE detector, the FPA detector has no direct temperature indicator light. This is an unfortunate oversight, as any attempt to start an FPA measurement while the detector is not cold enough results in an error message, and OPUS is likely to freeze and require a complete PC shutdown and restart. Generally, after the detector is slightly overfilled (i.e., a small amount of liquid N2 spills over), it takes 10–20 min to reach operating temperature (usually below 87 K).
For an exact temperature readout, see step 12 in Subheading 3.4.2. 77. Even though some carriers would allow for lower wavenumber limits (see step 7 in Subheading 3.3.2), the FPA detector itself has a cutoff at ca. 900 cm−1. 78. FPA files can easily exceed 200 MB, depending on spectral range, resolution, number of scans, and data blocks saved. If data storage is not a limitation, it is recommended to save all data blocks in case troubleshooting or error backtracking is required. 79. We do not list guideline values for the scanner velocity here, since they can vary enormously: an older generation FPA could use a scanner velocity of 168 Hz, whereas a new generation FPA can use scanner velocities of several kHz. 80. In contrast to SE measurements (Subheading 3.3.2), there is no point in using apertures at all. 81. The window that opens is the same as the one accessed from the “Check Signal” tab of the measurement setup (step 12 in Subheading 3.4.2). 82. The FPA manufacturers suggest keeping the gain as low as possible. On the other hand, in our experience with a new generation FPA in a Bruker Hyperion 3000 system, high gains resulted in better quality spectra. Although we cannot confirm whether this is a general rule or an exception, it is worthwhile to test several gain settings on the same sample and compare the results in order to determine the optimal gain value. 83. All pixels are individual detectors and have their own readouts, some more intense than others. However, bad pixels are the ones that produce erroneous readouts and differ significantly from the rest. They can be detected in the Live FPA Image Pane as the pixels that do not change color (i.e., do not change intensity) in response to changes such as moving the sample or changing the condenser.
These should be marked and compensated for by right-clicking on the Live FPA Image Pane, selecting the “Bad Pixel…” option for marking, saving the list of bad pixels, and choosing “Correct Bad Pixels.” Correction is made by automatically replacing the readout of each bad pixel with the average of the readouts of the pixels immediately surrounding it. 84. The condenser has no function in reflection mode. 85. There is an “Optimize” button in this window, which should initiate an automated process to find the best offset and gain settings. However, it does not always work, and setting offset and gain values manually (and keeping a record of the settings) is recommended. 86. Due to the extreme flow of data from the FPA, the computer is unable to show the progress of the measurement (i.e., no scan count is displayed). Unfortunately, even when the measurement is finished, OPUS still displays a green status bar as if the measurement were still in progress. The best indicator of status is therefore the Live Video Image Pane. While it is blank dark blue, the measurement is still ongoing. When it has turned back to black with a green square and crosshair marking the FPA size and position, the measurement is finished. It is important not to do anything on the PC during measurements (no copying of files, etc.), as any activity can interrupt data transfer from the FPA to the PC. 87. Pixels are numbered consecutively, row-wise from left to right and from bottom to top, starting from number 0, not from number 1. 88. The defined integration method produces heat maps that are crude and generic and as such should not be considered accurate chemical images. Although the right and left edges can be fine-tuned for each defined integral to produce more accurate maps, that is not the purpose here, which is why neither baseline correction nor normalization had to be performed prior to this integration. 89.
From version 7, OPUS automatically performs a dummy integration and opens a new type of view called “Chemical Imaging” when a 3D file is opened. Therefore, steps 1–5 are not needed for OPUS 7 users. 90. When using OPUS version 7, the error message “An invalid argument was encountered” can pop up. It is a harmless bug that can safely be ignored. 91. The “Select trace” drop-down list in the Mapping tab should contain all the Labels given to bands in step 3 in Subheading 3.4.3, i.e., “lignin 1595,” “lignin 1510,” “-C=O 1740,” and “carbohydrates.” Choose the label that gives the chemical image (heat map) with the most details and features. Often, this will be the C=O band at 1,740 cm−1 because it is usually intense, nonoverlapping, and present in virtually all tissues. In addition, the C=O band is mapped with the highest spatial resolution because it lies at the higher wavenumber end of the spectral region (see Note 16). 92. The actual numbers in the “Contours” drop-down menu may not be up to date and may refer to the intensity levels of a different band. Therefore, this setting may need to be revisited after clicking “Apply” and changing to a different tab within the same dialog window, which should update its values. 93. There are many different view options for chemical imaging (heat maps), which can be used according to personal preference. It is most important to create a heat map that is detailed and can easily be correlated to visible features in the section to allow exact positioning and orientation within the image. 94. OPUS remembers these view settings for the following “TRC” blocks, unless it quits unexpectedly. Therefore, to avoid having to reset these parameters, do not close the Map + Vid + Spec 3D view. Instead, just unload the file (right-click on the file name and select “Unload File”). 95.
Microsoft Windows and video card driver settings can cause the “No contour” option to return a blank white image, with no visible snapshot shown. Make sure the “Video Image” option is selected in the “3D Properties” tab and that the correct snapshot number is selected in the “Select image” drop-down list in the “Mapping” tab. If the Image pane remains blank, try swapping the left and right Image panes: display “No contours” in the right pane and the chemical image (heat map) with “Contour lines and colors” in the left pane. If still unsuccessful, update the video card driver. 96. Unfortunately, “click and point” moving of the cursor is only available in the Chemical Imaging view of OPUS 7. For earlier OPUS versions, only stepwise moving is possible, and even that can cause frequent crashes that result in the loss of all view settings in OPUS (see Note 94). For stepwise moving, use the X and Z controls in the Spectrum pane. However, the X and Z controls can only take values below the X′ and Z′ control values. Therefore, if the crosshair does not move further in the X or Z direction, set the X′ or Z′ values to their maxima. 97. “The series of single blocks” is the only meaningful choice here (see Note 74). 98. Unlike with the single element detector (see Note 75, Subheading 3.3.3) and Method 2 of Subheading 3.4.3, loading all spectra is not recommended here because of their large number in an FPA image. This is why the quality of the spectra must be checked and bad pixels excluded in step 8 of Method 1 in Subheading 3.4.3. 99. Version 7 of OPUS offers another major convenience in addition to its “click and point” feature for moving (see Note 96): the pixels marked during “click and point” are all loaded in the “Spectra” tab of the Spectrum pane and listed in the “List” tab.
To extract these spectra, select all of them, right-click, and choose “Extract Spectrum….” In the dialog box that opens, the names of the spectra can be constructed from placeholder blocks, such as filename and index (pixel number). This way, spectra are automatically named with their pixel numbers, without the need to type the names manually. Thus, steps 15–19 of Method 2 in Subheading 3.4.3 are not needed for OPUS 7 users. 100. Since only the first block is extracted, it does not matter whether “Block” or “End of file” is specified in the “Extraction Range” because those values are ignored in this case. Acknowledgements The authors thank Dr. John Loring and Dr. Janice Kenney for comments and discussions and Kjell Olofsson for assistance in sectioning. The protocols were developed and tested using the instruments of the Vibrational Spectroscopy Platform of the Chemical Biological Centre, Umeå University and Swedish University of Agricultural Sciences, Umeå, Sweden. Chapter 19 A Pipeline for 15N Metabolic Labeling and Phosphoproteome Analysis in Arabidopsis thaliana Benjamin B. Minkoff, Heather L. Burch, and Michael R. Sussman Abstract Within the past two decades, the biological application of mass spectrometric technology has seen great advances in terms of innovations in hardware, software, and reagents. Concurrently, the burgeoning field of proteomics has followed closely (Yates et al., Annu Rev Biomed Eng 11:49–79, 2009)—and with it, importantly, the ability to globally assay altered levels of posttranslational modifications in response to a variety of stimuli. Though many posttranslational modifications have been described, a major focus of these efforts has been protein-level phosphorylation of serine, threonine, and tyrosine residues (Schreiber et al., Proteomics 8:4416–4432, 2008). The desire to examine changes across signal transduction cascades and networks in their entirety using a single mass spectrometric analysis accounts for this push—namely, preservation and enrichment of the transient yet informative phosphoryl side group. Analyzing global changes in phosphorylation allows inferences surrounding cascades/networks as a whole to be made. Towards this same end, much work has explored ways to permit quantitation and combine experimental samples such that more than one replicate or experimental condition can be identically processed and analyzed, cutting down on experimental and instrument variability, in addition to instrument run time.
One such technique that has emerged is metabolic labeling (Gouw et al., Mol Cell Proteomics 9:11–24, 2010), wherein biological samples are labeled in living cells with nonradioactive heavy isotopes such as 15N or 13C. Since metabolic labeling in living organisms allows one to combine the material to be processed at the earliest possible step, before the tissue is homogenized, it provides a unique and excellent method for comparing experimental samples in a high-throughput, reproducible fashion with minimal technical variability. This chapter describes a pipeline used for labeling living Arabidopsis thaliana plants with nitrogen-15 (15N) and how this can be used, in conjunction with a technique for enrichment of phosphorylated peptides (phosphopeptides), to determine changes in A. thaliana’s phosphoproteome on an untargeted, global scale. Key words Phosphorylation, Metabolic labeling, Stable isotope labeling, Phosphoproteomics, Mass spectrometry. Jose J. Sanchez-Serrano and Julio Salinas (eds.), Arabidopsis Protocols, Methods in Molecular Biology, vol. 1062, DOI 10.1007/978-1-62703-580-4_19, © Springer Science+Business Media New York 2014. 1 Introduction This protocol introduces two methods necessary for quantitative phosphoproteomics: (1) 15N-labeling of Arabidopsis thaliana and (2) titanium dioxide (TiO2)-based phosphopeptide enrichment [2]. We assume that the reader is already familiar with the basic theory and use of high-resolution mass spectrometers [1] (e.g., the Thermo Fisher LTQ Orbitrap or similar instruments) and the software needed to systematically analyze their output. The reader is also referred to recent, more general reviews on this subject [4]. The concept of metabolic labeling to a very high enrichment level (to correct for variability in homogenization and protein extraction) was first coupled with mass spectrometry in 2002 using deuterated leucine in myoblast and fibroblast cell lines [5].
Since that time, many forms of metabolic labeling have been described [3]. Generally, the question asked and the model system used dictate the isotope used for labeling. In A. thaliana, the most cost-effective and logical means of labeling is clearly 15N. Growing two sets of plants, one supplied solely with 15N as the nitrogen source and the other with 14N, yields one set of plants with >98 % 15N incorporation [6] and another with the natural isotopic distribution of nitrogen (overwhelmingly 14N) throughout. From a proteomic standpoint, this is key: 15N is incorporated not only into the nitrogen-containing amino acid side chains but also into every amide bond within the peptide backbone. The same phosphopeptides from both sets, prepared and processed identically, will therefore differ in mass by an amount directly proportional to the number of nitrogen atoms they contain (and in m/z by that amount divided by the charge state). Because of the large number of nitrogen atoms in each peptide, a given peptide’s isotopic envelope is very complex, and the abundance of the peptide is hard to quantify when the degree of enrichment is below 90–95 %. As long as this degree of labeling is obtainable, readily available software can separately identify and quantify the amount of any one peptide that is labeled in vivo with either 15N or 14N. By shifting the m/z of the 15N-labeled peptide population away from its non-labeled counterpart, the samples can be combined and the two peptide populations detected in a single analysis. It is this concept that drives the field of metabolic labeling associated with mass spectrometry. Furthermore, it follows from this concept that changes in global phosphorylation resulting from treating one of the two sets of plants (14N or 15N) and mock-treating the other can be assayed using a high-resolution mass spectrometer.
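The proportionality between nitrogen count and mass difference can be made concrete with a short calculation. This is a minimal sketch and not part of the authors' pipeline. It assumes complete labeling, whereas the chapter reports >98 %. It counts only nitrogen in unmodified standard residues and uses the standard monoisotopic masses of 14N and 15N. The function names and the example sequence PEPTIDEK are hypothetical.

```python
# Sketch (not the chapter's software): predicted mass/m-z shift between the fully
# 14N- and fully 15N-labeled forms of a peptide. Assumes 100 % labeling and
# unmodified standard residues; names and example sequence are hypothetical.

N14_MASS = 14.0030740  # monoisotopic mass of 14N (Da)
N15_MASS = 15.0001089  # monoisotopic mass of 15N (Da)
DELTA_N = N15_MASS - N14_MASS  # ~0.997 Da added per labeled nitrogen atom

# Nitrogen atoms per residue: one backbone amide N for every residue,
# plus side-chain nitrogens for R, H, K, N, Q and W.
NITROGENS = {aa: 1 for aa in "ACDEFGILMPSTVY"}
NITROGENS.update({"R": 4, "H": 3, "K": 2, "N": 2, "Q": 2, "W": 2})


def nitrogen_count(sequence: str) -> int:
    """Count metabolically labeled nitrogen atoms in a peptide sequence.

    Phosphorylation adds no nitrogen. Reagent-derived nitrogen (e.g., from
    iodoacetamide alkylation) is not counted because it is unlabeled in both forms.
    """
    return sum(NITROGENS[aa] for aa in sequence.upper())


def label_shift(sequence: str, charge: int = 1) -> float:
    """m/z difference between the fully 15N- and fully 14N-labeled forms."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return nitrogen_count(sequence) * DELTA_N / charge


if __name__ == "__main__":
    for z in (1, 2, 3):
        print(f"PEPTIDEK z={z}: shift = {label_shift('PEPTIDEK', z):.4f}")
```

For the hypothetical peptide PEPTIDEK, which has nine nitrogen atoms, the predicted shift is about 9 Da at charge 1+ and about 4.5 m/z units at 2+. Nitrogen introduced by reagents after the 14N and 15N tissues are combined does not change the shift, because it is unlabeled in both populations. Carbamidomethylation of cysteine by iodoacetamide during digestion is one example.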
Specifically, comparing the signal obtained from a 14N-containing phosphopeptide to the signal from the 15N-containing one gives the observer the relative degree to which the treatment changes the amount of phosphorylated peptide in A. thaliana (see Fig. 1). Experiments such as these provide meaningful data that can be used to probe the biological relevance of phosphorylation events to a specific response. It should be noted that the reciprocal experiment is done concurrently, i.e., if in the original set of plants the 14N-containing plants were treated and the 15N-containing plants were mock-treated, a concurrent experiment should be done in which 15N-containing plants are treated and 14N-containing plants are mock-treated. This reciprocal experiment is an important experimental control: a relative change in the first experiment should be reflected as a reciprocal relative change in the second (see Fig. 1) [7]. By performing the reciprocal experiment, artifacts that might arise from the labeling itself can be detected and excluded. Fig. 1 Experiment one, on the left, shows the treatment given to the 14N-containing Arabidopsis. A change in a phosphopeptide’s abundance as a result of treatment, reflected in the isotopic envelopes and extracted ion chromatograms after combination, homogenization, digestion, enrichment, and analysis, is shown. As seen on the right side, a reciprocal change is reflected in the reciprocal experiment. It should also be noted that the relatively small size of the Arabidopsis seed makes it uniquely suitable for in vivo labeling. Were the seed larger, the contribution of 14N from storage protein in the seed would prevent the growing tissue from becoming labeled to the 90–95 % 15N content needed for accurate quantitation. Because the seed is so small (c. 20 μg dry weight), the endogenous 14N is negligible, and after a week’s growth, the tissue harvested from the seedling is well within the range needed to make the experiments possible. For example, the model legume Medicago truncatula has seeds only 30–50 times more massive than those of Arabidopsis, yet endogenous unlabeled protein complicates the use of metabolic labeling in these plants. There are ways around this issue, including software-based approaches [6] and growing tissue culture cells rather than seedlings, but these are outside the scope of this chapter. Another problem that limits phosphoproteomic analysis is obtaining a sample sufficiently concentrated in phosphopeptides that instrument analysis time is used optimally for detecting and assaying phosphopeptides rather than unmodified peptides. In an unenriched sample, phosphopeptides make up a vanishingly small proportion of the total (they are masked by highly abundant peptides from proteins that are constitutively expressed at disproportionately high levels); thus, much work has been done to design and optimize methods for enriching phosphopeptides from a total protein extract [8]. Many techniques and variants have been described, and the preferred technique is usually laboratory dependent. The method described herein uses spherical TiO2 particles packed as a chromatographic column, over which a trypsin-digested protein extract is run. Under highly acidic conditions (pH ≤ 3), acidic amino acid side-chain groups are protonated, whereas a phosphoserine or phosphothreonine (pKa < 1.7) remains negatively charged [9]. This remaining negative charge binds to the TiO2, whereas peptides lacking a phosphoryl group wash through the column. Phosphopeptides are eluted using a highly basic ammonium hydroxide solution: the excess OH− ions outcompete bound phosphopeptides, and the TiO2 (pKa/pKb = 4.4/7.7) becomes negatively charged [10].
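The charge argument behind TiO2 loading at pH ≤ 3 can be illustrated semi-quantitatively with the Henderson–Hasselbalch relation. This is a sketch under simplifying assumptions. It treats each group as a simple monoprotic acid. It uses textbook side-chain pKa values for Asp (~3.9) and Glu (~4.1), which are assumptions not given in the chapter. Only the 1.7 value for phosphoserine/phosphothreonine comes from the text, and because the text gives it as an upper bound, the phosphate estimate is conservative. The sketch ignores the lactic acid in the loading buffer and any competitive binding, so it shows the trend rather than predicting enrichment selectivity.

```python
# Sketch: Henderson-Hasselbalch estimate of the deprotonated (negatively charged)
# fraction of an acidic group at a given pH. Monoprotic approximation only.
# Asp/Glu pKa values are assumed textbook numbers, not taken from the chapter.


def charged_fraction(pka: float, ph: float) -> float:
    """Deprotonated fraction of a monoprotic acid: 1 / (1 + 10**(pKa - pH))."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


GROUPS = {
    "phosphate (pSer/pThr, pKa bound from text)": 1.7,
    "Asp side-chain carboxyl (assumed pKa)": 3.9,
    "Glu side-chain carboxyl (assumed pKa)": 4.1,
}

if __name__ == "__main__":
    for ph in (3.0, 2.0):
        for name, pka in GROUPS.items():
            print(f"pH {ph}: {name}: {charged_fraction(pka, ph):.1%} deprotonated")
```

Under these assumptions, at pH 3 the phosphate group is about 95 % deprotonated. The Asp and Glu carboxyls are only about 11 % and 7 % deprotonated, consistent with the selective retention described above.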
The reader should be aware that there are many available means of enriching complex biological samples for phosphopeptides, but TiO2 has been implemented in many laboratories and is currently the most prevalent. This chapter thus describes the process of metabolically labeling A. thaliana with 15N-containing salts, processing the tissue, enriching extracted, digested protein for phosphoryl-containing peptides, analyzing concentrated samples on an Orbitrap-based mass spectrometer, and processing the data after analysis, guiding the reader from seeds and media to a file output that is easy to work with and has undergone standard quality assurance (QA) checks. 2 Materials 2.1 Growing Plant Material 1. Magenta boxes, GA 7, or 250 mL Erlenmeyer flasks (see Note 1). 2. Wild-type or mutant seeds of A. thaliana, 12 mg of seeds (about 20 μg/seed, or 600 seeds) per box. 3. 10× Murashige and Skoog (MS) Micronutrient Solution. Store at 4 °C (see Note 2). 4. 1 M calcium chloride (CaCl2). 5. 1 M magnesium sulfate (MgSO4). 6. Monobasic potassium phosphate (KH2PO4). 7. 2-(N-morpholino)ethanesulfonic acid (MES). 8. Sucrose. 9. Ammonium nitrate (NH4NO3): 15N-NH4NO3, 98+ % (Cambridge Isotope Laboratories), and natural abundance 14N-NH4NO3 (Sigma). No special care is needed in handling the stable (nonradioactive) 15N isotope. 10. Potassium nitrate (KNO3): 15N-KNO3, 99 % (Cambridge Isotope Laboratories), and natural abundance 14N-KNO3 (Sigma). 11. 1 M potassium hydroxide (KOH). 12. 18 MΩ ultrapure deionized H2O (see Note 3). 13. 2 L graduated cylinder. 2.2 Sterilizing Seeds: Liquid Method (See Note 4) 1. 1.7 mL microcentrifuge tubes. 2. 95 % v/v ethanol. 3. 70 % ethanol, 0.1 % Triton X-100, 2 % bleach solution (all in H2O, v/v). 4. Whatman filter paper. 2.3 Sterilizing Seeds: Vapor Method (See Note 4) 1. 250 mL glass beaker. 2. Sealable desiccator. 3. 1.7 mL microcentrifuge tubes. 4. Bleach, 5–10 % sodium hypochlorite (NaOCl). 5.
Concentrated hydrochloric acid. 2.4 Plant Growth 1. Orbital shaker with platform. 2. Fluorescent lights above shaker platform, 2,600–3,200 lx light intensity. 2.5 Tissue Harvest 1. 3.5 in. porcelain mortar and pestle (CoorsTek). 2. Liquid nitrogen. 3. Dry ice. 4. 50 mL disposable centrifuge tubes. 5. Paper towels. 6. Salad spinner (see Note 5). 7. Disposable spatula. 2.6 Full Homogenization 1. 4 g fresh weight combined, ground tissue: 2 g 14N-containing tissue, 2 g 15N-containing tissue. Store at −80 °C. 2. 50 mL disposable centrifuge tubes. 3. 50 mL Oak Ridge tubes. 4. Centrifuge capable of housing a rotor for 50 mL Oak Ridge tubes and of 1,500 × g spins at 4 °C. 5. 100 mL tricornered polypropylene beakers. 6. Homogenization buffer: 290 mM sucrose, 250 mM Tris–HCl (pH 8), 25 mM EDTA, 25 mM sodium fluoride, 50 mM sodium pyrophosphate, 1 mM ammonium molybdate, 0.5 % w/v polyvinylpyrrolidone, in H2O. Store at 4 °C. 7. 5 μg/μL α/β (alpha/beta) casein stock in H2O (see Note 6). 8. 500 mM dithiothreitol (DTT) in H2O. Store at −20 °C. 9. Saturated stock of phenylmethylsulfonyl fluoride (PMSF) in isopropanol. Store at 4 °C. 10. 21.5 mM leupeptin in H2O. Store at −20 °C. 11. 1.5 mM pepstatin in ethanol. Store at −20 °C. 12. 2 mM bestatin in H2O. Store at −20 °C. 13. 50 mM 1,10-phenanthroline in 95 % ethanol. Store at 4 °C. 14. 100 mM vanadate. Store at −20 °C (see Note 7). 15. 2.8 mM E64 in H2O (Sigma). Store at −20 °C. 16. Sonicator with 1 cm probe (see Note 8). 17. Miracloth (Calbiochem). 2.7 Methanol/Chloroform Protein Extraction 1. Methanol. 2. Chloroform. 3. 18 MΩ ultrapure deionized H2O (see Note 3). 4. 50 mL polypropylene disposable centrifuge tubes. 5. 8 M urea in 50 mM ammonium bicarbonate. Store at −20 °C. 6. PhosSTOP tablets (Roche). Store at 4 °C. 7. 80 % v/v acetone. 8. 1.5 mL LoBind microcentrifuge tubes. 9. Tabletop sonicator with 3 mm probe, capable of an output of ≥3 W. 2.8 In-Solution Trypsin Digestion 1.
50 mM ammonium bicarbonate. 2. PhosSTOP tablets (Roche). Store at 4 °C. 3. BCA (bicinchoninic acid) assay kit. 4. 500 mM DTT. Store at −20 °C. 5. 500 mM iodoacetamide (IAA) in H2O. Store at −20 °C. Fig. 2 Packaging, pre- and post-modification syringes, and the Dremel tool used, from left to right. Also shown is the adapter (1, 3, and 6 mL). 6. Trypsin, lyophilized. Store at −80 °C. 7. 15 mL disposable centrifuge tubes. 2.9 Solid Phase Extraction/Peptide Concentration (All Liquid Solutions Are v/v) 1. tC18 3 cc Sep-Pak cartridges (Waters). See Note 9. 2. Acetonitrile. 3. 0.1 % trifluoroacetic acid in 18 MΩ ultrapure deionized H2O. 4. 80 % acetonitrile in 18 MΩ ultrapure deionized H2O/0.5 % formic acid (see Note 10). 5. 1.5 mL LoBind microcentrifuge tubes. 6. Vacuum centrifuge. 7. Glass syringe pipettes (Hamilton). 8. Ring stand/tube clamps. 9. 20 mL Luer-Lok tip syringes with manual modification (see Fig. 2 and Note 11). 10. Syringe adapter, 1, 3, and 6 mL (Varian) (see Fig. 2 and Note 11). 2.10 Titanium Dioxide-Based Phosphopeptide Enrichment (All Liquid Solutions Are v/v) 1. 19-gauge machined metal sheath with internal, sliding wire. 2. Empore high-performance extraction discs, C-8, 47 mm (3M). 3. GeneMate P200 yellow pipette tips (see Note 12). 4. 10 μm Titansphere titanium dioxide resin (GL Sciences Inc, Japan). 5. Methanol. 6. 18 MΩ ultrapure deionized H2O (see Note 3). Fig. 3 Apparatus and materials are shown in (a), and a mock-enrichment setup is shown in (b). During enrichment, downward pressure must be applied to the syringe and column together, as well as to the syringe plunger itself, to prevent high pressure from forcing the syringe and tubing off the tip. 7. 300 mg/mL lactic acid in 80 % acetonitrile/0.1 % trifluoroacetic acid, pH ≤ 3. 8. 80 % acetonitrile/0.1 % trifluoroacetic acid. 9. 30 % acetonitrile/0.1 % formic acid. 10.
1 % ammonium hydroxide in 18 MΩ ultrapure deionized H2O, pH ≥ 10. 11. 1.5 mL LoBind microcentrifuge tubes. 12. Formic acid. 13. Vacuum centrifuge. 14. In-house-constructed apparatus for procedure (see Fig. 3 and Note 13). 15. 5 mL Luer-Lok tip syringe. 16. 5 mm outer diameter latex tubing. 17. Clamp from ring stand. 18. 500 fmol/μL angiotensin II phosphate peptide solution (Calbiochem). 2.11 Mass Spectrometric Analysis 1. 0.1 % v/v formic acid in 18 MΩ ultrapure deionized H2O. 2. Orbitrap-based mass spectrometer/online liquid chromatography system (see Note 14). 2.12 Data Processing: From Direct Output to Database Search (See Note 15) 1. Trans-Proteomic Pipeline [11]. 2. Mascot Daemon (Matrix Science). See Note 16. 3. TAIR Arabidopsis Proteomic Forward/Reverse Database. 4. PC running Windows 2000 or newer. 5. .raw file(s) from Orbitrap. 2.13 Data Processing: From Database Search Output to Census Processing (See Note 15) 1. Trans-Proteomic Pipeline [11]. 2. In-house-developed false discovery rate script (see Notes 15 and 17). 3. Microsoft WordPad or equivalent text editor. 4. Census processing script and viewer, freely available [12]. See Note 18. 5. Census config file. 6. PC running Windows 2000 or newer. 7. .dat file(s) output from Mascot database search. 2.14 Post-census Processing 1. In-house-developed TAIR Area Ratio Script (see Notes 15 and 17). 2. Microsoft Excel (or spreadsheet program capable of viewing and editing .tsv files). 3. census_chro.xml output file(s) from Census analysis. 3 Methods 3.1 Media Preparation (See Note 19) 1. Prepare 2 L modified MS media: combine 200 mL 10× MS Micronutrient Solution, 3 mL 1 M CaCl2, 1.5 mL 1 M MgSO4, 170 mg KH2PO4, and 1 g MES in a 2 L graduated cylinder. Bring volume to 1.8 L using H2O. 2. Split into two 900 mL aliquots (for final volumes of 1 L 14N medium and 1 L 15N medium). 3. To the 14N solution, add 0.825 g natural abundance (14N) NH4NO3 and 0.96 g natural abundance (14N) KNO3. 4. To the 15N solution, add 0.825 g 15N-NH4NO3 and 0.96 g 15N-KNO3. 5. Mix thoroughly. 6. Adjust pH to 5.7 using 1 M KOH while stirring. 7. Add 10 g sucrose and stir until fully dissolved. 8. Bring both solutions to 1 L using H2O. 9. Aliquot 75 mL per magenta box. 10. Replace lids on magenta boxes and apply autoclave tape to lid. 11. Autoclave on liquid setting for 45 min. 12. After sterilization, remove from autoclave and let cool to room temperature. 3.2 Seed Sterilization: Liquid Method (See Note 4) 1. Perform all steps in a sterile laminar flow hood, using aseptic technique. 2. Wet sections of filter paper (one section per magenta cube to be grown) using 1 mL 95 % ethanol. 3. Add 12 mg of seeds to a 1.7 mL microcentrifuge tube per magenta cube to be grown (see Note 20). 4. Add 1 mL 70 % ethanol, 0.1 % Triton X-100, 2 % bleach solution to seeds. Mix by inverting. 5. Soak for 5 min, shaking tube(s) at roughly 1 min intervals. 6. Allow seeds to sink to bottom of tube. Remove as much liquid as possible. 7. Rinse with 1 mL 95 % ethanol, shake, and allow seeds to settle to bottom. Remove as much liquid as possible. 8. Add an additional 1 mL 95 % ethanol. Suspend seeds in liquid. 9. Pipette seeds and ethanol onto filter paper (see Note 21). 10. Allow to dry in sterile hood. 3.3 Seed Sterilization: Vapor Method (See Note 4) 1. Place seeds (no more than 0.5 mL) into a 1.7 mL microcentrifuge tube. Many tubes can be done at once to increase throughput. 2. In a sealable desiccator under a hood, place the rack containing the tubes and a 250 mL beaker containing 100 mL bleach. 3. Attach a piece of tape with Sharpie writing onto the test tube rack (see Note 22). 4. Add 3 mL concentrated HCl to bleach and immediately seal container. 5. Allow fumes to sterilize seeds for 4–16 h. 6. Open desiccator, seal microfuge tubes, and dispose of bleach/HCl appropriately. Fig. 4 Plants growing on shaker.
Plant mass is right for processing; plants were frozen and ground shortly after the picture was taken

3.4 Plant Growth
1. When the Magenta boxes and media have cooled to room temperature and the seeds and filter paper have dried, label the boxes and transfer the seeds to the Magenta boxes (see Note 23).
2. Place the boxes on an orbital shaker under the following conditions: approximately 30 rpm, 24 h light, room temperature (~23 °C). See Note 24 and Fig. 4.

3.5 Treatment and Tissue Harvest
1. Allow the plants to grow to sufficient mass (10–12 days). See Fig. 5.
2. Administer treatment in reciprocal fashion (see Note 25).
3. Label and prechill 50 mL disposable centrifuge tubes on dry ice (one for each Magenta cube).
4. Carefully remove the tissue from the box, rinse with H2O, and dry using the preferred method (see Note 5).
5. Transfer the plant tissue to a prechilled mortar (see Note 26).
6. Pour additional liquid nitrogen into the mortar, covering the plant tissue.
7. When most of the liquid nitrogen has evaporated, grind the plant tissue quickly and completely into a fine powder.
8. Transfer the plant powder into the prechilled, pre-labeled 50 mL disposable centrifuge tube, using a disposable spatula to scrape the powder from the mortar. Keep on dry ice or place immediately in a −80 °C freezer until further processing.

Fig. 5 Close-up of a plant at the mass ready for processing (~12 days of growth)

3.6 Full Homogenization
1. Combine 2 g 14N-labeled tissue with 2 g 15N-labeled tissue corresponding to one reciprocal experiment in a 15 mL disposable centrifuge tube. Keep submerged in dry ice as much as possible (see Note 27).
2. In 100 mL tricornered polypropylene beakers, aliquot 40 mL homogenization buffer. Keep on ice throughout the procedure (see Note 28).
3. Prior to addition of tissue, add the following from stock solutions:
(a) 80 μL 500 mM DTT.
(b) 40 μL saturated PMSF stock.
(c) 40 μL 21.5 mM leupeptin stock.
(d) 40 μL 1.5 mM pepstatin stock.
(e) 20 μL 2 mM bestatin stock.
(f) 80 μL 50 mM 1,10-phenanthroline stock.
(g) 800 μL 100 mM vanadate stock.
(h) 40 μL 2.8 mM E64 stock.
(i) 20 μL 5 μg/μL α/β casein stock.
4. Add the pre-weighed, combined tissue to the beaker. Stir with a pipette tip to allow minor thawing and homogeneous distribution of the powdered tissue.
5. On ice, sonicate five times for 10 s each using a 1 cm probe.
6. Pour the resulting homogenate through one or two layers of Miracloth into a polycarbonate Oak Ridge tube (see Note 29).
7. Centrifuge the filtrate for 15 min at 1,500 × g and 4 °C to remove debris.
8. Collect the supernatant in a 50 mL disposable centrifuge tube. Discard the pellet.
9. The supernatant can be stored at −80 °C or immediately aliquoted and further processed via methanol/chloroform extraction.

3.7 Methanol/Chloroform Protein Extraction (See Note 30)
1. Divide the ~40 mL supernatant from homogenization (Subheading 3.6) into 5 mL aliquots in 50 mL polypropylene disposable centrifuge tubes.
2. To each 5 mL aliquot, add in the following order:
(a) 3 parts methanol (15 mL).
(b) 1 part chloroform (5 mL).
(c) 4 parts H2O (20 mL).
3. Centrifuge for 10 min at room temperature and 5,000 × g.
4. Remove and discard the upper aqueous phase from each tube, taking care not to disturb the protein precipitate located at the phase interface (see Note 31).
5. Add 4 parts (20 mL) methanol onto the interface and lower phase remaining in each tube.
6. Centrifuge for 5 min at room temperature and 1,500 × g.
7. Discard the supernatant and wash with 5 mL 80 % acetone.
8. Centrifuge for 5 min at room temperature and 1,500 × g (see Note 32).
9. Discard the supernatant.
10. For each conical tube, solubilize/denature the protein pellet with 300 μL 8 M urea + 1× PhosSTOP cocktail (one tablet/10 mL). As each pellet is solubilized, it can be added to a combined portion in a 15 mL disposable centrifuge tube.
11. Sonicate on ice using a 3 mm desktop probe, pulsing the mixture lightly until a uniform color is reached.
12. Sample(s) can be frozen and stored at −80 °C or immediately digested with trypsin.

3.8 In-Solution Trypsin Digestion
1. Dilute samples to 1 M urea using 50 mM ammonium bicarbonate + 1× PhosSTOP.
2. Perform a BCA assay to determine protein concentration.
3. Aliquot 5 mg protein to a 15 mL disposable centrifuge tube.
4. Using 500 mM DTT, bring the solution to 5 mM DTT.
5. Incubate at 50 °C for 45 min.
6. Using 500 mM IAA, bring the solution to 15 mM IAA.
7. Incubate in the dark at room temperature for 45 min.
8. Add trypsin at a ratio of 1:100 trypsin to protein.
9. Incubate with shaking at 37 °C for 4 h to overnight.
10. Remove the digestion mixture from the 37 °C incubation.
11. To arrest digestion, bring the solution to 0.3 % formic acid. Check that pH ≤ 3 using indicator strips (see Note 33).
12. Sample(s) can be frozen and stored at −80 °C or immediately concentrated.

3.9 Solid Phase Extraction/Peptide Concentration (See Note 34)
1. Set up a 3 cc tC18 Sep-Pak cartridge (Waters) in a tube clamp attached to a ring stand, using a 50 mL disposable centrifuge tube for waste collection.
2. Equilibrate the cartridge with 4 mL acetonitrile.
3. Wash the column with 4 mL 80 % acetonitrile/0.5 % formic acid.
4. Wash the column with 6 mL H2O/0.1 % trifluoroacetic acid.
5. Saving the flow-through, load the sample onto the column and, using the modified syringes (see Note 11), push it through at a rate no faster than ~1 drop/s.
6. Reload the flow-through onto the column and push it through a second time, no faster than ~1 drop/s.
7. Wash the column with 6 mL H2O/0.1 % trifluoroacetic acid.
8. Elute slowly (≤1 drop/s) into a 1.5 mL LoBind microcentrifuge tube using 1 mL 80 % acetonitrile/0.5 % formic acid.
9. Collect a second elution in a second LoBind microcentrifuge tube using 500 μL 80 % acetonitrile/0.5 % formic acid and 500 μL acetonitrile.
10. Dry down the elutions in a vacuum centrifuge until the liquid is gone (see Note 35).
11. The dried pellet/powder can be frozen at −80 °C or immediately solubilized and enriched for phosphopeptides.

3.10 Titanium Dioxide-Based Phosphopeptide Enrichment
1. Using the 19-gauge machined tubing with internal sliding wire, punch a single circle of C-8 material from a 3M Empore extraction disc (see Fig. 6).
2. Using the sliding wire, gently push the material out of the shaft into the bottom of a GeneMate P200 pipette tip, forming a tight plug.
3. Weigh out titanium dioxide and suspend it in 100 μL H2O (see Note 36).
4. Set up the apparatus as pictured (Fig. 3) and pipette the suspended titanium dioxide into the tip. Push the liquid through, forming a tight, dry column of material.
5. Resuspend the dried pellet/powder from the solid phase extraction (SPE) elutions in 100 μL lactic acid solution (see Note 37).
6. Add 2 μL (1 pmol) phosphorylated angiotensin-II peptide to the resuspended peptide solution (see Note 38).
7. Wash the titanium dioxide column with 60 μL methanol.
8. Pass 100 μL lactic acid solution through the column. Repeat once.
9. Load the sample onto the column, using a 1.5 mL LoBind microcentrifuge tube to collect the “flow-through” fraction.
10. Wash the column twice with 100 μL lactic acid solution, collecting in the same tube as the flow-through.
11. Wash the column twice with 100 μL 80 % acetonitrile/0.1 % trifluoroacetic acid. Collect in a separate tube as the “wash.”
12. Elute peptides from the column into a third tube (“elution”) by washing twice with 50 μL 1.0 % ammonium hydroxide solution.
13. Perform a second elution into the elution tube with 50 μL 30 % acetonitrile/0.1 % formic acid.
14. Add 3.5 μL neat formic acid directly to the eluate to acidify the solution.
15. Using a vacuum centrifuge, dry down the total elution volume to ~2–3 μL.
16. The dried phosphopeptide pellet/solution can be stored at −80 °C or immediately solubilized/diluted for Orbitrap analysis.

Fig. 6 Materials used to make TiO2 column

3.11 Mass Spectrometric Analysis
1. Solubilize the phosphopeptide solution/pellet in 0.1 % formic acid (in 18 MΩ ultrapure deionized water or pure LC/MS (liquid chromatography/mass spectrometry) grade water). See Note 39.
2. Analyze using an Orbitrap mass spectrometer. For details of our separation/data collection conditions, see Notes 40 and 41.

Fig. 7 Contents of a folder named “2x1x,” representing a single experimental replicate, at different stages of data processing: (a) prior to Mascot database searching, (b) prior to Census processing, and (c) following both Census and Area Script processing

3.12 Data Processing: From Direct Output to Database Search (see Fig. 7)
1. Create a folder (in Windows Explorer) referring to the experiment/reciprocal treatment performed. Avoid spaces and characters other than letters, numbers, or underscores. This also applies to every directory in the path leading to the newly created folder.
2. Copy the database .fasta file to the folder.
3. Move all .raw files (untouched Orbitrap output) into the folder.
4. Using the Trans-Proteomic Pipeline, convert the .raw files into .mzXML files:
(a) Log in to the Trans-Proteomic Pipeline and specify “Mascot” as the analysis pipeline to be used.
(b) Under the header “mzXML Utils,” navigate to the “mzXML” button.
(c) Specify the files to be converted and click “Convert to mzXML.”
5. Using the Trans-Proteomic Pipeline, convert the .mzXML files into .mgf files:
(a) Log in to the Trans-Proteomic Pipeline and specify “Mascot” as the analysis pipeline to be used.
(b) Under the header “mzXML Utils,” navigate to “convert mzXML files.”
(c) Ensure that Mascot generic format (.mgf) is the output file format. Do not modify other settings.
(d) Specify the files to convert and click “Convert files.”

Fig. 8 Parameters used in database search. Variable N/Q deamidation, under “Variable modifications,” is not shown but is also selected

6. Using Mascot Daemon, perform a database search using the processed file(s).
For a suggested set of search parameters, see Note 42 and Fig. 8.

3.13 Data Processing: From Database Search Output to Census Processing (see Fig. 7)
1. To access completed searches, use Windows Explorer to navigate to Mascot’s “data” folder.
2. The output files (.dat) will have been given arbitrary names by the software but will all be contained within a folder named for the date on which the searches were run.
3. The log file in Mascot Daemon’s interface shows which .dat file corresponds to which database search. Copy the .dat files to the folder that now contains the .raw, .mzXML, and .mgf files. Rename each .dat file using its respective file name, keeping the .dat extension.
4. Using the Trans-Proteomic Pipeline, convert the .dat files into .pep.xml files:
(a) Log in to the Trans-Proteomic Pipeline and specify “Mascot” as the analysis pipeline to be used.
(b) Under the header “Analysis Pipeline (Mascot),” navigate to “pepXML.”
(c) Add all .dat files to convert.
(d) Add the database used in the Mascot search.
(e) Begin file conversion.
(f) Ensure that the .pep.xml output file(s) have names identical to all previous file types associated with the original .raw file.
5. Within the originally created folder, create a subfolder for each experimental replicate (.raw file), named accordingly.
6. Move the respective .raw, .mzXML, .dat, .mgf, and .pep.xml files into the corresponding subfolders.
7. Copy the “config” .xml file, the runCensus.bat script, and the database .fasta file into each subfolder individually.
8. Open FDR v5 and set the parameter “Processing Mode” to either “single file” or “batch mode” for analyzing one or more than one .pep.xml file, respectively.
9. Add the files to be analyzed and specify the database used in the search.
10. Run FDR v5. For each replicate/subfolder, ensure that the output files include a …peptable.tsv file, a …chargesep.tsv file, a …filtered_bycharge.pep.xml file, and a …filtered_bycharge_reformattedmods.pep.xml file.
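Step 10 relies on the in-house FDR v5 script, whose internals are not described in this chapter. Its output names (…chargesep.tsv, …filtered_bycharge…) suggest charge-specific filtering against the reverse sequences of the forward/reverse database listed in Subheading 2.12. Purely as an illustration of that general kind of target-decoy filtering, and not as the authors' implementation, the following minimal Python sketch picks a separate score cutoff for each charge state. The PSM record, the decoys/targets FDR estimator, and the 1 % default threshold are assumptions.

```python
# Hypothetical sketch of per-charge target-decoy FDR filtering.
# Not the authors' FDR v5 Perl script: the PSM fields, the decoys/targets
# estimator, and the 1 % default threshold are illustrative assumptions.
from collections import defaultdict
from itertools import groupby
from typing import Dict, List, NamedTuple, Optional


class PSM(NamedTuple):
    score: float    # search score; higher is assumed to be better
    charge: int     # precursor charge state (2+ or 3+ per Note 42)
    is_decoy: bool  # True if matched to a reverse (decoy) sequence


def score_cutoff(psms: List[PSM], max_fdr: float) -> Optional[float]:
    """Return the lowest score whose estimated FDR (decoys/targets) is <= max_fdr."""
    cutoff: Optional[float] = None
    targets = decoys = 0
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    # PSMs with tied scores are counted together, because a cutoff keeps all of them.
    for score, tied in groupby(ranked, key=lambda p: p.score):
        for psm in tied:
            if psm.is_decoy:
                decoys += 1
            else:
                targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


def filter_by_charge(psms: List[PSM], max_fdr: float = 0.01) -> Dict[int, List[PSM]]:
    """Choose a separate cutoff per charge state; keep only target PSMs at or above it."""
    by_charge: Dict[int, List[PSM]] = defaultdict(list)
    for psm in psms:
        by_charge[psm.charge].append(psm)
    kept: Dict[int, List[PSM]] = {}
    for charge, group in by_charge.items():
        cutoff = score_cutoff(group, max_fdr)
        kept[charge] = [] if cutoff is None else [
            p for p in group if not p.is_decoy and p.score >= cutoff
        ]
    return kept


if __name__ == "__main__":
    demo = [PSM(50, 2, False), PSM(45, 2, False), PSM(40, 2, False),
            PSM(35, 2, True), PSM(30, 2, False), PSM(60, 3, True)]
    for charge, hits in sorted(filter_by_charge(demo).items()):
        print(charge, [p.score for p in hits])  # prints "2 [50, 45, 40]" then "3 []"
```

In the demo, the 3+ group contains only a decoy, so no cutoff meets the threshold and nothing is kept for that charge. Some pipelines estimate FDR as 2 × decoys/(targets + decoys) instead; only the ratio inside score_cutoff would need to change to use that convention.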
Here, “…” stands for the name of each file submitted to the FDR script for analysis; this convention applies to every step hereafter.
11. Perform the following steps for each experimental replicate (each .raw file and its associated downstream files, now all in individual folders).
12. Open the …filtered_bycharge_reformattedmods.pep.xml file using Microsoft WordPad.
13. Manually modify this file as follows (see Fig. 9):
(a) In the second line, where the file reads summary_xml=“c:/…,” delete everything within the quotation marks except the file name and .pep.xml extension.
(b) In the third line, where the file reads msms_run_summary base_name=“c:/…,” delete the entire path, leaving only the name of the file. There should be no file extension associated with the name.
(c) In the seventh line, where the file reads search_summary base_name=“c:/…,” delete the entire path, leaving only the name of the file and the .pep.xml file extension.
(d) Using the find and replace function (Edit -> Replace or Ctrl + H), find every instance of the word “ionscore” and replace it with the word “xcorr” (no period).

Fig. 9 Reformatting of the “…_filtered_bycharge_reformattedmods.pep.xml” file from the FDR v5 script. Relevant portions are in bold. (a) Prior to manual changes; (b) following changes. Not shown is the change replacing every instance of “ionscore” with “xcorr” (no period; Subheading 3.13, step 13d)

Fig. 10 runCensus.bat, opened with WordPad. Relevant portions are in bold. Line breaks have been inserted after each input command for easier visualization

14. Save this modified file as …filtered_bycharge_reformattedmods_xcorr.pep.xml in the same folder as the unmodified file.
15. Using Microsoft WordPad, open the runCensus.bat file you copied into each subfolder. Three input options must be altered in this script (see Fig. 10):
(a) The first input, a file path, refers to the location of the “…_filtered_bycharge_reformattedmods_xcorr.pep.xml” file modified in step 13 and saved in step 14; change it to reflect the location and specific name of this file.
(b) The second input, a directory, refers to the directory containing the information necessary for processing files using Census. If the preceding directions have been followed, simply change this to the specific subfolder containing the reformatted mods file specified above.
(c) The third input refers to the config file used by Census during analysis. This file was copied into the folder in step 7; specify the directory that contains it, followed by the file name and extension. This should be the same directory as that specified in step 15b.
16. Overwrite the copied runCensus.bat file with this newly modified, replicate-specific file.
17. Begin Census analysis by navigating to the specific subfolder and double-clicking runCensus.bat.
18. Ensure that Census analysis yields a census_chro.xml file and a “…”.tgz file.

3.14 Post-census Processing
1. Open the Census software.
2. Open the census_chro.xml file using the Census interface.
3. Export the report as an editable file with File → Export Report. Ensure “No Filters” is checked. Click Export.
4. Verify that the census-out.txt and census-out_singleton.txt files now exist in the subfolder.
5. Open the in-house-developed “Census 1.72 Area Script (TAIR).”
6. Using the interface, specify “census-out.txt” as the “Census Outfile,” “census-out_singleton.txt” as the “Census Singleton” file, “…_filtered_bycharge_reformattedmods.pep.xml” (prior to manual modification) as the “reformatted mods pep xml file,” and the .fasta database as the database file.
7. Click Run.
8. Ensure that the software yields a “histogram_all_peptides.gif,” “histograms_unique_peptides.gif,” “peptideSummary.tsv,” and a “peptideSummary_withScores.tsv.” See Note 43.
9. Open and visualize “peptideSummary_withScores.tsv” using Microsoft Excel or other appropriate spreadsheet software.

4 Notes
1. Plants can be grown in either 250 mL Erlenmeyer flasks (capped with aluminum foil) or GA7 Magenta boxes; each has advantages and disadvantages. Erlenmeyer flasks have a lower incidence of contamination, but during removal the plant must be compressed through the top of the flask, potentially inducing mechanosensitive responses.
2. A modified MS solution can be made from scratch [13, 14], eliminating the need for the 10× Micronutrient Solution; however, using the 10× solution potentially reduces both variability and preparation time. The modified MS salt solution contains, per liter of water: 6.2 mg boric acid, 166.1 mg CaCl2, 0.025 mg CoCl2·6H2O, 0.025 mg cupric sulfate·5H2O, 37.26 mg disodium ethylenediaminetetraacetic acid, 27.8 mg ferrous sulfate·7H2O, 90.35 mg MgSO4, 16.9 mg MnSO4, 0.25 mg Na2MoO4, 0.83 mg KI, 85 mg KH2PO4 (monobasic potassium phosphate), 8.6 mg ZnSO4·7H2O, 0.825 g NH4NO3, 0.96 g KNO3, 0.5 g MES, and 10 g D-sucrose. If the modified MS solution is used routinely, making stock solutions of the components increases consistency and efficiency.
3. Though specified as 18 MΩ ultrapure deionized H2O throughout Subheading 2, it is simply referred to as H2O in Subheading 3.
4. Both seed sterilization methods are used routinely with comparable results. The liquid method can be done quickly and without a fume hood, whereas the vapor method, though less hands-on, requires a fume hood and a longer sterilization time.
5. There are two methods for drying plant material prior to freezing. Removing excess water and media before freezing and grinding is necessary because both freeze into ice, which makes efficient homogenization nearly impossible and can compromise accurate weighing of the sample later in the pipeline. One method is a quick spin in a kitchen salad spinner.
Place the plant material in the spinner and spin quickly for roughly 3 s, then allow the spinner to continue turning for ~5 s. Stop the spinner and immediately freeze the plant material. The second method is to gently blot the plant material dry on paper towels. Place the plant material on several layers of paper towels and cover with one layer. Gently blot, move the plant mass to a dry spot on the towel, and repeat. Both of these methods potentially induce mechanosensitive responses; however, no better methods had been described at the time of this publication. The most important point is to handle all samples that will be compared in the same way.
6. Equal amounts of α- and β-casein should be mixed to attain this concentration. Casein is added at this step as an experimental control for every subsequent step and should be observed in every sample analysis after processing. Casein phosphopeptides have been added to the provided database (see Note 15) for searching.
7. The recipe used to make 100 mM vanadate:
(a) Make a 200 mM solution of sodium orthovanadate in 1 M Tris–HCl.
(b) Boil until colorless.
(c) Mix 1:1 with 10 mM H2O2.
(d) Bring the solution to pH 9.5.
(e) Store at −20 °C.
8. The sonicator used in this laboratory is a Heat Systems-Ultrasonics, Inc. Sonicator/Cell Disruptor, Model W-375. It is used at a 50 % duty cycle.
9. The Sep-Pak cartridge described here has a peptide capacity of 5 mg. The procedure(s) described in this chapter can be scaled down proportionally. For example, Waters sells Sep-Pak cartridges labeled “1 cc,” which have a peptide capacity of 1 mg.
10. The acid used will vary with individual laboratory procedures. Formic acid is used here because the buffers used for liquid chromatography/mass spectrometry (LC/MS) contain formic acid. It can be substituted with other acids, such as acetic acid.
Importantly, the acid-containing solution used to elute peptides and to solubilize the phospho-enriched pellet should contain the same acid as the LC buffers.
11. To modify Luer-Lok tip syringes to fit into the specified adapters (which in turn fit into Waters’ Sep-Pak cartridges), a handheld Dremel tool with a cutting bit is used. The threaded portion of the tip is carefully cut off, leaving only the ~40 mm internal tip (see Fig. 2). The adapter fits onto the exposed internal tip.
12. GeneMate brand tips must be used; because they taper at the tip, they are the only brand that can be plugged sufficiently with the C-18 disc.
13. The apparatus used for TiO2 enrichment was constructed as follows (see Fig. 3): the two “legs” are made of three blue, stacked, 96-well PCR (polymerase chain reaction) tube racks, and the “bridge” portion is an orange, 96-well PCR tube rack. At the four corners of the bridge, a 1 mL syringe plunger and a P200 tip are used to hold the bridge to the legs. Generally, any apparatus that leaves space for a test tube rack and 1.7 mL LoBind microcentrifuge tubes while withstanding a moderate amount of pressure is sufficient.
14. The mass spectrometer used in this protocol is a Thermo LTQ Orbitrap XL. The LC system consists entirely of Agilent 1100 hardware: a Nanopump, an isocratic pump, a column-switching valve, a micro-well plate autosampler (held at 4 °C), and a degasser. The trapping column is 5 mm × 300 μm inner diameter, packed with Agilent StableBond C18. The analytical column is 360 μm × 75 μm (outer × inner diameter) and is pulled and packed in-house using a Sutter P-2000 laser puller and pressure bombs, respectively, with Magic 200 Å C18 material. The analytical column is packed to a length of 10–15 cm.
15. All in-house-built scripts and the configured database can be downloaded from the laboratory’s website.
Links are also provided to all the relevant pieces of software (http://www.biotech.wisc.edu/sussmanlab/home).
16. Mascot Daemon requires Windows 2000 or newer.
17. In-house-built scripts were coded in Perl. To run Perl scripts on a Windows-based system, ActivePerl must be downloaded.
18. Census is built in Java and is thus operating system independent; however, it requires Java 6 or newer.
19. 1 L of liquid medium provides for approximately 13 Magenta boxes or flasks. Because fungal and/or bacterial contamination does occur, it is recommended to start approximately 25 % more boxes/flasks than are needed to obtain the desired number of replicates.
20. A small seed scoop can be made by melting a 0.2 mL PCR tube onto a metal E. coli loop. Trim the tube to just above the loop with a razor blade. Scoop volumes will vary; take several measurements to find the average amount of seeds held. Adjust the number of scoops added per box/flask accordingly to achieve approximately 12 mg/container.
21. Suspend the seeds fully in ethanol. The suspension can then be spread evenly across sterile filter paper for quick and efficient drying.
22. The Sharpie writing is used as a crude metric for the effectiveness of the sterilizing fumes: within roughly 3 h, it should begin to fade due to the corrosive nature of the fumes. Overnight (15–16 h), fine and ultrafine Sharpie markings will fade almost entirely, and thicker markings will fade by ~50–60 %. Though crude, this indicator has revealed failures of the procedure in the past caused by poor desiccator sealing.
23. Take care to pour the seeds directly into the media to avoid getting seeds stuck on the wall of the box/flask. Also handle the boxes/flasks gently to avoid seeds adhering to the sides; seeds are very difficult to return to the media once adhered. Some seeds will usually adhere anyway, while the cubes are being moved and while they are shaking during growth.
24. The orbital shaker is set up with fluorescent light fixtures placed approximately 12 in. above the shaking platform. A fan is used to circulate air and counteract potential heating from the lights (see Fig. 4).
25. Each experiment contains two treatment sets: set one contains 14N-treated and 15N-control flasks, and set two contains 14N-control and 15N-treated flasks. See Fig. 1 [7].
26. Prechill the mortar and pestle with liquid nitrogen immediately prior to adding and grinding tissue.
27. The easiest way to do this is to keep the previously collected 50 mL disposable centrifuge tubes on dry ice, quickly weigh out the material, and combine it in a 15 mL disposable centrifuge tube, also on dry ice. As the material begins to thaw, it becomes much harder to work with, sticking to any surface due to moisture.
28. It is important to perform this as quickly and at as low a temperature as possible. Use a cold room if available.
29. Because polycarbonate Oak Ridge tubes have relatively small openings, it is recommended to filter into a 50 mL disposable centrifuge tube (which is much easier) and then pour the filtrate into an Oak Ridge tube.
30. The scale of the methanol/chloroform extraction is limited solely by each laboratory’s capacity for growing and processing tissue. It can be scaled up or down while maintaining the described ratios, provided that room-temperature centrifugation capacity exists and appropriately sized polypropylene tubes (for use with chloroform) can be acquired.
31. The phases will be clearly separated. Protein will appear as a white layer between the clear upper aqueous phase and the green lower organic phase. Some protein may be lost; however, it is best to err on the side of leaving part of the aqueous phase rather than removing some of the protein layer.
32. The acetone wash step can be done in duplicate or triplicate if desired.
33. Be sure to use glass syringes with neat formic acid.
If, after addition of formic acid, the pH is > 3, add up to 0.5 % formic acid and retest the pH. A pH ≤ 3 is necessary to arrest digestion.
34. Due to the nature of the setup described in this protocol, the pressure applied manually to push solution/peptides through the column will vary.
35. The dried sample may appear either as a white powder or as a gelatinous pellet, potentially light brown or yellow. The powder is much more easily solubilized than the pellet; both should be vortexed to resuspend fully. This variability in appearance has had no observable effect on further processing and analysis.
36. The amount of resin to use will vary with the amount of protein digested. For a 5 mg digestion (the maximum capacity of Waters’ 3 cc tC18 Sep-Pak cartridges), use less than 5 mg TiO2 resin. Phosphorylated peptides are of such low abundance that little resin is needed to capture a sufficient percentage for analysis; moreover, increasing the amount of resin also increases the binding capacity for unphosphorylated peptides.
37. The ammonium hydroxide solution should be made fresh each day that TiO2 enrichment is performed. It is suggested that 50 μL of lactic acid solution be used to solubilize the first elution, which is then transferred to the second elution, where a further 50 μL of lactic acid solution is added and the mixture is vortexed.
38. The phosphorylated angiotensin-II peptide is used as a control for the TiO2 enrichment. If the appropriately modified database was downloaded from the website given (see Note 15), all experimental controls have been added.
39. The volume used to solubilize the pellet is variable. Note that very little peptide is recovered from an enriched sample, so as little 0.1 % formic acid as possible should be used.
Generally, 4–6 μL of solubilized pellet are injected onto the LC column for analysis. Ideally, three injections of the same sample should be performed to assess instrument reproducibility, maximize phosphopeptide identification, and provide statistics on the observed ratio measurements for each reciprocal experiment.
40. The following is a recommended set of buffers, LC gradient/flow conditions, and data collection methods. Buffer A consists of Fisher 0.1 % formic acid in water. Buffer B consists of 95 % acetonitrile/5 % Fisher 0.1 % formic acid in water. The buffer for the isocratic pump consists of 1 % acetonitrile/0.1 % formic acid in water. Samples are loaded onto the trapping column for 20 min at 15 μL/min using the isocratic pump, while the Nanopump flows 1.0 % B at 200 nL/min onto the analytical column. Column switching then occurs, and while the isocratic pump flows under the same conditions directly to waste, the Nanopump flows through the trapping column and onto the analytical column at 200 nL/min from 1.0 to 40.0 % B over 195 min, from 40.0 to 60.0 % B over 5 min, and then from 60.0 to 100.0 % B over 3 min, where it is held at 100.0 % B for 2 min. Following this, the Nanopump flows from 100.0 to 1.0 % B over 1 min, after which it flows 1.0 % B for 15 min.
41. A total analysis time of 240 min per sample is used, in conjunction with the LC system. MS scans are acquired at a resolving power of 100,000, and FTMS preview mode is enabled. The top five ions, excluding charge state 1 and unassigned charge states, are selected for MS/MS. Dynamic exclusion is used for 40 s with a repeat count of 1 and a list size/capacity of 500. Precursor ions are fragmented via collision-induced dissociation using a normalized collision energy of 35.0, an activation Q of 0.25, an activation time of 30 ms, and an isolation width of 2.5.
42. The search conditions (Mascot Daemon v2.2) are as follows.
Using the provided database and Trypsin, the “AUTO” option under “top hits” is used, allowing one missed cleavage and a peptide tolerance of ±30 ppm. The monoisotopic peaks are used, as well as a 13C count of 2. 2+ and 3+ peptides are 378 Benjamin B. Minkoff et al. specified. Phosphorylated S/T/Y and deamidated N/Q are set as variable modifications, and carbamidomethylation of cysteine is set as a fixed modification. For MS/MS ion search conditions, a tolerance of ±0.6 Da is set. 43. The most useful file is the peptide summary with scores. Examining the histograms is useful as well; they should center around 1, with the important changes falling towards the outskirts of what should be an approximately Gaussian distribution. If mixing of the 14N/15N tissue was skewed from a perfect 1:1 ratio, this can be reflected by a skewed average ratio. Normalization is done on a per replicate basis prior to combining replicates and producing a larger dataset, based on the histogram after examination. Acknowledgements The authors would like to acknowledge Greg Barrett-Wilt and Kelli Kline for work involved with development and implementation of the methods described in this chapter, as well as the University of Wisconsin-Madison Biotechnology Center Mass Spectrometry/Proteomics facility for instrument time, various reagents, lab space, and advice throughout this process. References 1. Yates JR, Ruse CI, Nakorchevsky A (2009) Proteomics by mass spectrometry: approaches, advances, and applications. Annu Rev Biomed Eng 11:49–79 2. Schreiber TB, Mausbacher N, Breitkopf SB, Grundner-Culemann K, Daub H (2008) Quantitative phosphoproteomics—an emerging key technology in signal-transduction research. Proteomics 8:4416–4432 3. Gouw JW, Krijgsveld J, Heck AJ (2010) Quantitative proteomics by metabolic labeling of model organisms. Mol Cell Proteomics 9:11–24 4. Kline KG, Sussman MR (2010) Protein quantitation using isotope-assisted mass spectrometry. Annu Rev Biophys 39:291–308 5. 
Ong SE, Blagoev B, Kratchmarova I, Kristensen DB, Steen H, Pandey A, Mann M (2002) Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol Cell Proteomics 1:376–386 6. Huttlin EL, Hegeman AD, Harms AC, Sussman MR (2007) Comparison of full versus partial metabolic labeling for quantitative proteomics analysis in Arabidopsis thaliana. Mol Cell Proteomics 6:860–881 7. Kline KG, Barrett-Wilt GA, Sussman MR (2010) In planta changes in protein phosphorylation induced by the plant hormone abscisic acid. Proc Natl Acad Sci U S A 107:15986–15991 8. Dunn JD, Reid GE, Bruening ML (2010) Techniques for phosphopeptide enrichment prior to analysis by mass spectrometry. Mass Spectrom Rev 29:29–54 9. Vogel HJ (1989) Phosphorus-31 nuclear magnetic resonance of phosphoproteins. Methods Enzymol 177:263–282 10. Yoshiyuki Koizumia MT (2002) Kinetic evaluation of biocidal activity of titanium dioxide against phage MS2 considering interaction between the phage and photocatalyst particles. Biochem Eng J 12:107–116 11. Keller A, Eng J, Zhang N, Li XJ, Aebersold R (2005) A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Mol Syst Biol 1(2005):0017 12. Park SK, Venable JD, Xu T, Yates JR III (2008) A quantitative analysis software tool for mass spectrometry-based proteomics. Nat Methods 5:319–322 13. Nelson CJ, Huttlin EL, Hegeman AD, Harms AC, Sussman MR (2007) Implications of 15 N Labeling for Phosphoproteomics 15 N-metabolic labeling for automated peptide identification in Arabidopsis thaliana. Proteomics 7:1279–1292 14. Hegeman AD, Schulte CF, Cui Q, Lewis IA, Huttlin EL, Eghbalnia H, Harms AC, Ulrich 379 EL, Markley JL, Sussman MR (2007) Stable isotope assisted assignment of elemental compositions for metabolomics. 
Anal Chem 79:6912–6921

Chapter 20

Gene Expression Profiling Using DNA Microarrays

Kyonoshin Maruyama, Kazuko Yamaguchi-Shinozaki, and Kazuo Shinozaki

Abstract

In Arabidopsis research, microarrays have typically been used to measure gene expression under different conditions. Microarray analysis is often used to compare gene expression between wild-type (control) and mutant plants, and to analyze the effects of varying environmental conditions and of hormones. It is also used to analyze differences in gene expression between growth stages and between tissues. Other array applications include comparative genomic hybridization, chromatin immunoprecipitation, mutation detection, and genotyping. This chapter focuses on gene expression profiling, which is typically performed by competitive hybridization of two samples, each labeled with a fluorescent dye such as cyanine 3-CTP or cyanine 5-CTP. We describe the steps involved in obtaining data from DNA microarrays, from RNA purification to data analysis.

Key words RNA purification, DNA microarray, Expression profiling, Microarray data analysis

1 Introduction

DNA microarray technology is a powerful research tool that enables global measurement of the differences between paired nucleic acid samples. Nearly two decades have passed since the first microarrays were created. Over that time, various applications have been developed, including gene expression profiling, comparative genomic hybridization (CGH), single-nucleotide polymorphism (SNP) analysis, ChIP-on-chip, and DNA methylation analysis. This chapter focuses on gene expression profiling, which can be considered a five-step process: (1) RNA purification, (2) labeling of the samples, (3) hybridization and washing of the slides, (4) signal detection, and (5) data analysis. The RNA purification protocol described here is suitable for Arabidopsis as well as for soybean and rice.
During total RNA purification, it is extremely important to keep the plant material frozen throughout grinding by repeatedly adding excess liquid N2. The low temperature is needed to suppress the activity of cellular RNases. RNAiso Plus and TRIzol Reagent are ready-to-use, monophasic solutions of phenol and guanidine isothiocyanate that are suitable for the purification of total RNA [1–5]. Assessing RNA quality is also an essential step in microarray analysis, because intact total RNA is required to obtain reliable results. We recommend using the Agilent 2100 Bioanalyzer to determine the quality of the total RNA. This bioanalyzer, with its RNA kit, is the industry standard for RNA quality control.

In recent years, both the number of companies that produce microarray platforms and the variety of protocols available to researchers have increased. Such companies include Affymetrix, Agilent Technologies, Illumina, Applied Biosystems, and GE Healthcare. DNA microarray analysis typically uses either a one-color or a two-color platform to measure transcripts. Microarrays are now affordable and offer acceptable reproducibility and accuracy for many applications. The MicroArray Quality Control (MAQC) project demonstrated that six representative microarray platforms provided high reproducibility and that data quality was essentially equivalent between the one- and two-color approaches [6, 7]. In this chapter, we recommend the Agilent Technologies platform for gene expression profiling. This platform is the most sensitive, and the results it generates are highly reproducible [8].

Jose J. Sanchez-Serrano and Julio Salinas (eds.), Arabidopsis Protocols, Methods in Molecular Biology, vol. 1062, DOI 10.1007/978-1-62703-580-4_20, © Springer Science+Business Media New York 2014
Agilent’s Low Input Quick Amp Labeling Kit generates fluorescent complementary RNA (cRNA) from 10–200 ng of total RNA. This method uses T7 RNA polymerase, which simultaneously amplifies the target material and incorporates cyanine 3- or cyanine 5-labeled CTP. With this kit, amplification from total RNA to cRNA is typically at least 100-fold. Because there is no standard method for microarray data analysis, the data analysis step is both the most important and the most difficult. Many articles on analytical methods for microarray data have been published [5, 9–14]. Most journals now require both the deposition of microarray data and appropriate statistical analyses as conditions for publication. Nonetheless, choosing appropriate statistical methods for microarray data is difficult, and the choice often depends on the experimental design. In some cases, the GeneSpring software is recommended; it can be used even by biologists with limited experience in microarray analysis.

2 Materials

2.1 RNA Analysis

1. Latex gloves.
2. Mortar and pestle (grinding equipment).
3. Spatula.
4. 2 ml Eppendorf tubes.
5. Vortex mixer.
6. Microtube mixer.
7. High-speed microcentrifuge.
8. Centrifuge desiccator.
9. NanoDrop ND-1000 UV–VIS Spectrophotometer (Thermo Fisher Scientific Inc.).
10. Liquid N2.
11. Ultrapure water.
12. 75 % (v/v) ethanol.
13. 99.5 % (v/v) ethanol.
14. Isopropanol.
15. RNAiso Plus (Takara) or TRIzol Reagent (Invitrogen).
16. 3 M sodium acetate, pH 5.2.
17. High-salt buffer (0.8 M sodium citrate and 1.2 M sodium chloride).
18. RNA 6000 Nano Kit (Agilent Technologies).
19. Agilent 2100 Bioanalyzer (Agilent Technologies).
20. IKA vortex mixer (Agilent Technologies).

2.2 Microarray Analysis

1. Low Input Quick Amp Labeling Kit, Two-Color (Agilent Technologies).
2. RNA Spike-In Kit, Two-Color (Agilent Technologies).
3. Gene Expression Hybridization Kit (Agilent Technologies).
4.
Gene Expression Wash Buffer Kit (Agilent Technologies).
5. DNase/RNase-free distilled water (Agilent Technologies).
6. RNeasy Mini Kit (Qiagen).
7. 99.5 % (v/v) ethanol.
8. Microarray Scanner (Agilent Technologies).
9. Hybridization Chamber, stainless (Agilent Technologies).
10. Hybridization Chamber gasket slides (Agilent Technologies).
11. Hybridization oven (Agilent Technologies).
12. Hybridization oven rotator (Agilent Technologies).
13. Nuclease-free 1.5 ml tubes.
14. Magnetic stir bars (×2).
15. Microcentrifuge.
16. NanoDrop ND-1000 UV–VIS Spectrophotometer (Thermo Fisher Scientific Inc.).
17. Slide-staining dishes, with slide rack (×3).
18. Thermal cycler.
19. Clean forceps.
20. Powder-free gloves.
21. Vortex mixer.

3 Methods

3.1 RNA Purification

3.1.1 Purification of Total RNA

1. Harvest the plants, and place them in liquid N2 as soon as possible (within 10 s).
2. Transfer the frozen plants (150–300 mg) to a mortar containing liquid N2, and grind them to a very fine powder with the pestle. Keep the plants frozen during grinding by adding liquid N2 (see Note 1).
3. Using a precooled spatula, transfer the powdered material (~100 mg) to a 2 ml Eppendorf tube precooled in liquid N2, and keep each tube in liquid N2.
4. When the liquid N2 has evaporated, add 1 ml of RNAiso Plus (or TRIzol Reagent) to each tube, and mix well using a microtube mixer for 5–10 min (see Note 2).
5. Centrifuge at 12,000 × g for 15 min at 4 °C, and transfer 800 μl of the supernatant to a new tube (see Note 3).
6. Add chloroform (200–400 μl) to each sample, and mix well using a microtube mixer for 5 min at room temperature.
7. Centrifuge at 12,000 × g for 10 min at 4 °C, and transfer 400 μl of the upper layer to a new tube (see Note 4).
8. Add 250 μl of high-salt buffer and 250 μl of isopropanol to each tube, and mix well using a microtube mixer for 5 min at room temperature.
9.
Centrifuge at 12,000 × g for 10 min at 4 °C, carefully remove the supernatant, and dissolve the pellet in 100 μl of ultrapure water.
10. Add 10 μl of 3 M sodium acetate (pH 5.2) and 250 μl of 99.5 % (v/v) ethanol to each tube, and mix using a microtube mixer for 60 s at room temperature.
11. Centrifuge at 12,000 × g for 10 min at 4 °C, remove the supernatant, and add 400 μl of 75 % ethanol to each sample.
12. Centrifuge at 12,000 × g for 10 min at 4 °C, and discard the supernatant, retaining the RNA pellet.
13. Dry the RNA pellet using the centrifuge desiccator, and resuspend it in 30 μl of ultrapure water.
14. Quantitate the total RNA using the NanoDrop 1000 Spectrophotometer, and prepare a solution of 200 ng/μl of total RNA (see Note 5).

3.1.2 Quality Control of Total RNA

1. Place 550 μl of RNA 6000 Nano gel matrix in a spin filter, and centrifuge at 1,500 × g for 10 min at room temperature.
2. Before use, allow the RNA 6000 Nano dye solution to equilibrate to room temperature for 30 min, and mix it well using a vortex mixer. Transfer 65 μl of the filtered gel into a 0.5 ml RNase-free microfuge tube, add 1 μl of RNA 6000 Nano dye solution, and mix well using a vortex mixer. Then centrifuge at 12,000 × g for 10 min at room temperature.
3. Place a new RNA 6000 Nano chip on the chip-priming station, and load 9.0 μl of the gel-dye mix into the well marked G. Make sure that the plunger is positioned at 1 ml, and then close the chip-priming station. Press the plunger until it is held by the clip. Wait exactly 30 s, and then release the clip. After another 10 s, slowly pull the plunger back to the 1 ml position, and open the chip-priming station.
4. Transfer 9.0 μl of the gel-dye mix into each of the remaining wells marked G. Load 5 μl of the RNA 6000 Nano Marker into each of the 12 sample wells and into the well marked “ladder.”
Load 1 μl of the prepared ladder into the well marked “ladder.” Load 1 μl of each RNA sample into each of the 12 sample wells, and add 1 μl of the RNA 6000 Nano Marker to any unused sample well.
5. Place the chip horizontally in the adapter of the IKA vortex mixer, and mix for 60 s at 14.5 × g.
6. Run the chip in the Agilent 2100 Bioanalyzer within 5 min.

3.2 Preparation of Labeled Samples

3.2.1 Preparation of Cyanine 3-CTP or Cyanine 5-CTP Labeling Reactions

1. Place 1.5 μl of diluted total RNA (200 ng), 2 μl of the final diluted Spike mixture, and 1.8 μl of the diluted T7 promoter primer mixture in a 0.2 ml microcentrifuge tube. Each tube should now contain a total volume of 5.3 μl (see Notes 6 and 7).
2. Incubate the reactions in a thermal cycler for 10 min at 65 °C to denature the primer and the RNA sample.
3. Place the reactions on ice for 5 min, and centrifuge each sample briefly to collect the contents at the bottom of the tube.
4. Add 4.7 μl of cDNA master mixture to each sample tube, mix by pipetting up and down, and incubate the reactions at 40 °C in a thermal cycler for 2 h. Each tube should now contain a total volume of 10 μl (see Note 8).
5. Incubate the reactions in a thermal cycler for 15 min at 70 °C to inactivate the AffinityScript enzyme. Place the reactions on ice for 5 min, and centrifuge each sample briefly to collect the contents at the bottom of the tube.
6. Add 6 μl of transcription master mixture to each sample tube. Mix gently by pipetting, and incubate the samples in a thermal cycler for 2 h at 40 °C. Each tube should now contain a total volume of 16 μl (see Note 9).

3.2.2 Purification of Labeled/Amplified cRNA

1. Transfer the cRNA sample to a 1.5 ml tube, and add 84 μl of nuclease-free water for a total volume of 100 μl.
2. Add 350 μl of Buffer RLT and 250 μl of 99.5 % ethanol, and mix by pipetting up and down.
Centrifuge each sample briefly to collect the contents at the bottom of the tube. Each tube should now contain a total volume of 700 μl.
3. Transfer the 700 μl cRNA sample to an RNeasy Mini column in a 2 ml collection tube. Centrifuge at 12,000 × g for 60 s at 4 °C. Discard the flow-through and the collection tube.
4. Transfer the RNeasy column to a new collection tube, and add 500 μl of Buffer RPE to the column. Centrifuge at 12,000 × g for 60 s at 4 °C. Discard the flow-through, and reuse the collection tube.
5. Add another 500 μl of Buffer RPE to the column. Centrifuge at 12,000 × g for 60 s at 4 °C. Discard the flow-through and the collection tube.
6. If any Buffer RPE remains on or near the rim of the column, transfer the RNeasy column to a new 1.5 ml collection tube. Centrifuge at 12,000 × g for 60 s at 4 °C to remove any remaining traces of Buffer RPE. Discard this collection tube, and use a fresh tube to elute the clean cRNA sample.
7. To elute the clean cRNA sample, transfer the RNeasy column to a new 1.5 ml collection tube, and add 30 μl of RNase-free water directly onto the RNeasy filter membrane. Wait 60 s, and then centrifuge at 12,000 × g for 60 s at 4 °C.
8. Keep the flow-through, which contains the cRNA, on ice.
9. Quantitate the labeled/amplified cRNA using the NanoDrop 1000 Spectrophotometer (see Note 5).
10. Determine the yield of each labeled/amplified cRNA from its concentration (ng/μl) as follows: (concentration of cRNA, ng/μl) × 30 μl (elution volume)/1,000 = μg of cRNA.

3.3 Hybridization and Washing of the Slides

3.3.1 Hybridization

1. Combine 825 ng of cyanine 3-labeled linearly amplified cRNA, 825 ng of cyanine 5-labeled linearly amplified cRNA, 11 μl of diluted 10× Blocking Agent, and 2.2 μl of 25× Fragmentation Buffer. Bring the total volume to 55 μl with nuclease-free water, and mix gently by pipetting.
2.
Incubate the reaction mixtures at 60 °C for exactly 30 min to fragment the RNA.
3. Place the reactions on ice for 60 s.
4. Add 55 μl of 2× GEx Hybridization Buffer HI-RPM to stop the fragmentation reaction, and mix well by careful pipetting. Take care to avoid introducing bubbles, and do not mix using a vortex mixer. Centrifuge at 12,000 × g for 60 s at room temperature to collect the contents at the bottom of the tube.
5. Use the sample immediately; do not store it. Place the sample on ice, and load it onto the array as soon as possible.
6. Load a clean gasket slide onto the Agilent SureHyb chamber base with the label facing up and aligned with the rectangular section of the chamber base. Ensure that the gasket slide is flush with the chamber base and not ajar.
7. Slowly dispense 100 μl of hybridization sample onto the gasket well in a “drag and dispense” manner.
8. Slowly place an array “active side” down onto the SureHyb gasket slide, so that the “Agilent”-labeled barcode faces down and the numeric barcode faces up. Make sure that the sandwich pair is properly aligned.
9. Place the SureHyb chamber cover onto the sandwiched slides, and slide the clamp assembly onto both pieces.
10. Hand-tighten the clamp onto the chamber.
11. Rotate the assembled chamber vertically to wet the gasket, and assess the mobility of the bubbles. If necessary, tap the assembly on a hard surface to move stationary bubbles.
12. Place the assembled slide chamber on the rotator in a hybridization oven set to 65 °C. When using 2× GEx Hybridization Buffer HI-RPM, set the rotator to 10 rpm.
13. Hybridize at 65 °C for 17 h.

3.3.2 Washing the Microarray Slides

1. With the sandwich completely submerged in Gene Expression Wash Buffer 1, pry the sandwich open from the barcode end only. Slip one of the blunt ends of the forceps between the slides, and gently turn the forceps upward or downward to separate the slides.
Let the gasket slide drop to the bottom of the staining dish. Remove the microarray slide, and place it into the slide rack in slide-staining dish 2, which contains Gene Expression Wash Buffer 1 at room temperature. Minimize the exposure of the slide to air, and touch only the barcode portion or the edges of the microarray slide (see Notes 10–12).
2. When all of the slides in the group have been placed into the slide rack in slide-staining dish 2, stir for 1 min at room temperature.
3. During this wash step, remove Gene Expression Wash Buffer 2 from the 37 °C water bath, and pour it into slide-staining dish 3.
4. Transfer the slide rack to slide-staining dish 3, which contains the warm Gene Expression Wash Buffer 2, and stir for 1 min.
5. Slowly remove the slide rack, minimizing droplets on the slides. Removing the rack should take 5–10 s.
6. Place the slides in a slide holder with the Agilent barcode facing up. Scan the slides immediately to minimize the impact of environmental oxidants on the signal intensities.

3.4 Signal Detection (See Note 13)

1. Place the assembled slide holders into the scanner carousel.
2. In the Scan Control main window, choose the slot number of the first slide as the Start Slot and the slot number of the last slide as the End Slot.
3. Select the AgilentHD_GX_2Color profile for 4×44K microarrays.
4. In the Scan Control main window, click Scan Slot m-n, where m is the slot of the first slide and n is the slot of the last slide.
5. Open the Agilent Feature Extraction (FE) software, and open the images (.tif).
6. Save the FE Project (.fep) by selecting File > Save As and browsing to the desired location, and then click Start Extracting.
7. After the extraction has completed successfully, view the QC report for each extraction set by double-clicking the QC Report link in the Summary Report tab. Use the Spot Finding tool at the four corners of the array to check that the grid has been placed properly.
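Step 16 of Subheading 3.5 leaves two computations to GeneSpring GX: Lowess normalization of the two-channel data and Benjamini and Hochberg corrected P values. The sketch below shows what these computations do in principle. It is a minimal illustration, not GeneSpring's implementation, and rests on several assumptions. It assumes background-subtracted, strictly positive Cy5 (red) and Cy3 (green) feature signals. It uses a span of 0.3 with no robustness iterations, which are parameters chosen here for illustration. The function names `ma_values`, `lowess_fit`, `lowess_normalize`, and `benjamini_hochberg` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def ma_values(red: Sequence[float], green: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return M = log2(R/G) and A = 0.5*log2(R*G) for each feature.

    Assumes background-subtracted, strictly positive Cy5 (red) and Cy3 (green) signals.
    """
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


def lowess_fit(x: Sequence[float], y: Sequence[float], frac: float = 0.3) -> List[float]:
    """Locally weighted linear regression of y on x (tricube weights, no robustness steps)."""
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # number of points in each local window
    fitted = []
    for xi in x:
        # Bandwidth = distance from xi to its k-th nearest point (xi itself included).
        h = sorted(abs(xj - xi) for xj in x)[k - 1] or 1e-12
        w = [(1.0 - min(abs(xj - xi) / h, 1.0) ** 3) ** 3 for xj in x]
        sw = sum(w)
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xi - mx))
    return fitted


def lowess_normalize(red: Sequence[float], green: Sequence[float], frac: float = 0.3) -> List[float]:
    """Remove intensity-dependent dye bias: M minus its Lowess trend against A."""
    m, a = ma_values(red, green)
    trend = lowess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, keeping the adjusted values monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

`lowess_normalize(red, green)` returns dye-bias-corrected log2 ratios, one per feature. `benjamini_hochberg(pvalues)` returns adjusted P values in the input order, so features can then be filtered at a chosen threshold such as 0.05. The nested loop in `lowess_fit` is written for clarity and scales quadratically. For a full 4×44K array, a vectorized implementation such as the `lowess` function in statsmodels would be more practical.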
3.5 Data Analysis (See Note 14)

1. Open the GeneSpring GX program, and select Project > New Project. In the Create New Project window, enter a project name, and click the OK button.
2. In the Experiment Selection Dialog window, select Create new experiment, and click the OK button.
3. In the Experiment description window, enter an experiment name. Select Agilent expression Two color as the Experiment type and Guided Workflow-Find-differentially expressed Genes as the Workflow type. Then click the OK button.
4. In the Load Data window, click Choose Files, select your microarray .txt files, and click the Next >> button.
5. Confirm the dye-swap arrays, and click the Finish button.
6. In the Summary Report window, click the Next >> button.
7. In the Experiment Grouping window, click the Add Parameter… button, and enter a parameter name. Select NonNumeric as the Parameter type, enter the Parameter Values, and click the OK button.
8. Confirm the Experiment Grouping window, and click the Next >> button.
9. Confirm the QC on samples window, and click the Next >> button.
10. Confirm the Filter Probesets window, and click the Next >> button.
11. Confirm the Significance Analysis window, and click the Next >> button.
12. Confirm the Fold Change window, and click the Next >> button.
13. Confirm the GO Analysis window, and click the Next >> button.
14. Confirm the Find Significant Pathways Results window, and click the Finish button.
15. In the Project Navigator, open your experiment > Analysis folder, right-click T-test, P < 0.05, and select Export List. Then save the list as a text file.
16. This workflow normalizes the raw microarray data using the Lowess normalization method. GeneSpring GX also calculates the expression log ratios and the Benjamini and Hochberg false discovery rate P values (Corrected P-value).

4 Notes

1. The plants can be ground easily using grinding equipment.
2.
After RNAiso Plus (or TRIzol Reagent) is added, the solution often freezes. Homogenize the frozen solution as quickly as possible using a vortex or microtube mixer. The isolated total RNA is intact and free of contaminating DNA and protein. It can be used for microarray, qRT-PCR, and RNA gel blot analyses.
3. Transfer the supernatant to a new tube, taking care not to collect any cellular debris.
4. After centrifugation, the solution separates into three layers. The upper layer is a clear liquid containing the RNA. The middle layer is semisolid and contains the DNA. The bottom layer is a red organic phase containing proteins, polysaccharides, fatty acids, cellular debris, and a small amount of DNA. Be careful not to collect any of the middle layer. If the middle layer has been mixed with the upper layer, repeat steps 5 and 6. When isolating RNA from rice or soybean, always repeat steps 5 and 6.
5. The NanoDrop 1000 Spectrophotometer accurately measures RNA concentrations of up to 3,000 ng/μl without dilution. An aliquot of 1.5–2 μl of RNA sample is recommended to ensure that a liquid sample column forms and that the sample completely covers the light path.
6. To prepare the final diluted Spike mixture: (1) Mix the thawed Spike A or B mixture well using a vortex mixer, incubate it for 5 min at 37 °C, and mix well a second time. Centrifuge briefly to collect the contents at the bottom of the tube. (2) Transfer 2 μl of the Spike A or B mixture into a new tube, and add 38 μl of the Dilution Buffer provided in the Spike-In Kit (1:20). Mix well using a vortex mixer, and centrifuge briefly to collect the contents at the bottom of the tube. This tube contains the first dilution. (3) Transfer 2 μl of the first dilution into a new tube, and add 78 μl of Dilution Buffer (1:40).
Mix well using a vortex mixer, and centrifuge briefly to collect the contents at the bottom of the tube. This tube contains the second dilution. (4) Transfer 2 μl of the second dilution into a new tube, and add 30 μl of Dilution Buffer (1:16). Mix well using a vortex mixer, and centrifuge briefly to collect the contents at the bottom of the tube. This tube contains the final diluted Spike mixture.
7. To prepare the diluted T7 promoter primer mixture, mix 0.8 μl of T7 Promoter Primer and 1 μl of nuclease-free water.
8. To prepare the cDNA master mixture, mix 2 μl of 5× First Strand Buffer, 1 μl of 0.1 M DTT, 0.5 μl of 10 mM dNTP mix, and 1.2 μl of AffinityScript RNase Block Mix.
9. To prepare the transcription master mixture, mix 0.75 μl of nuclease-free water, 3.2 μl of 5× Transcription Buffer, 0.6 μl of 0.1 M DTT, 1 μl of NTP mix, 0.21 μl of T7 RNA Polymerase Blend, and 0.24 μl of cyanine 3-CTP or cyanine 5-CTP.
10. The microarray wash procedure for Agilent’s two-color platform must be performed in an environment in which the ozone level is 5 ppb or less.
11. To prepare the 10× Blocking Agent, add 500 μl of nuclease-free water to the vial containing the lyophilized 10× Blocking Agent supplied with the Agilent Gene Expression Hybridization Kit. Centrifuge briefly to collect the contents at the bottom of the tube.
12. To set up the apparatus for the washes, completely fill slide-staining dish 1 with Gene Expression Wash Buffer 1 at room temperature. Place a slide rack and a magnetic stir bar into slide-staining dish 2. Fill dish 2 with enough Gene Expression Wash Buffer 1 at room temperature to cover the slide rack, and place it on a magnetic stir plate. Place the empty dish 3 on the stir plate, and add a magnetic stir bar. Do not add the prewarmed (37 °C) Gene Expression Wash Buffer 2 until the first wash step has begun.
Remove one hybridization chamber from the incubator, and record the time. Record whether bubbles formed during hybridization and whether all of the bubbles are rotating freely.
13. The microarrays are scanned using an Agilent dual-laser DNA microarray scanner with SureScan technology. The data are extracted from the images with the Agilent Feature Extraction software.
14. Raw microarray data are analyzed with the GeneSpring GX software. For a more detailed analysis, we recommend consulting the GeneSpring PDF manual.

References

1. Wallace DM (1987) Large- and small-scale phenol extractions. Methods Enzymol 152:33–41
2. Coombs LM et al (1990) Simultaneous isolation of DNA, RNA, and antigenic protein exhibiting kinase activity from small tumor samples using guanidine isothiocyanate. Anal Biochem 188:338–343
3. Nicolaides NC, Stoeckert CJ Jr (1990) A simple, efficient method for the separate isolation of RNA and DNA from the same cells. Biotechniques 8:154–156
4. Chomczynski P, Sacchi N (1987) Single-step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction. Anal Biochem 162:156–159
5. Raha S et al (1990) Simultaneous isolation of total cellular RNA and DNA from tissue culture cells using phenol and lithium chloride. Genet Anal Tech Appl 7:173–177
6. MAQC Consortium (2006) The MicroArray Quality Control (MAQC) Project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol 24:1151–1161
7. Patterson TA et al (2006) Performance comparison of one-color and two-color platforms within the MicroArray Quality Control (MAQC) Project. Nat Biotechnol 24:1140–1150
8. Hardiman G (2004) Microarray platforms—comparisons and contrasts. Pharmacogenomics 5:487–502
9. Draghici S et al (2006) Reliability and reproducibility issues in DNA microarray measurements. Trends Genet 22:101–109
10. Ioannidis JP et al (2009) Repeatability of published microarray gene expression analyses. Nat Genet 41:149–155
11. Jafari P, Azuaje F (2006) An assessment of recently published gene expression data analyses: reporting experimental design and statistical factors. BMC Med Inform Decis Mak 6:27
12. Reiner A, Yekutieli D, Benjamini Y (2003) Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics 19:368–375
13. Storey JD, Tibshirani R (2003) Statistical significance for genomewide studies. Proc Natl Acad Sci U S A 100:9440–9445
14. Konishi T (2011) Microarray test results should not be compensated for multiplicity of gene contents. BMC Syst Biol 5:S6

Chapter 21

Forward Chemical Genetic Screening

Hyunmo Choi, Jun-Young Kim, Young Tae Chang, and Hong Gil Nam

Abstract

Chemical genetics uses small molecules to perturb biological processes. Conventional genetic methods alter genetic information, mostly with lasting effects. Chemical genetics, by contrast, allows temporary and reversible alteration of biological processes. Furthermore, it enables biological processes to be altered in a dose-dependent manner, which is another advantage over conventional genetics. In this chapter, we describe the general procedures of forward chemical genetic screening, which can be performed in three steps. The first step is the identification, from a chemical library, of small molecules that induce phenotypic or physiological changes in a biological system. In the second step, the cellular targets that interact with the isolated chemical, which are mostly proteins, are identified. Although several methods can be applied in the second step, the most common is an affinity pull-down assay of target proteins that bind the isolated compound. However, the affinity pull-down of target proteins is a formidable barrier in forward chemical genetics. We introduced a tagged chemical library approach that significantly facilitates the identification of target proteins.
The third step consists of validating the target protein, which should include an assessment of target specificity. This step is critical because small molecules often show pleiotropic effects due to low specificity. The specificity test may include a competition assay using cold competitors and a genetic study using mutants or transgenic lines modified for the cellular target.

Key words Forward chemical genetics, Chemical screening, Target identification, Tagged chemical library, Specificity

1 Introduction

Arabidopsis is a well-established plant genetic model for investigating various aspects of plant biology, owing to its rich genetic resources and genetic amenability. These have led to unprecedented success in the molecular genetic characterization of various plant processes. The critical advantages of Arabidopsis as a plant genetic model system include established collections of insertion mutants and the facile generation of transgenic lines. However, genetic mutants and transgenic lines are of limited value for elucidating some important aspects of plant biology. For example, the mechanism of action of lethal genes may not be easily revealed, because mutations in these genes simply lead to lethality. An antisense approach may be used to overcome some of these lethality problems [1]. Furthermore, conventional genetic methods are associated with long-lasting effects that can hamper the observation of the direct or immediate effects of a gene of interest. Several strategies have been developed to overcome the shortcomings of conventional genetic mutations. Chemical genetics is an emerging approach that relies on the ability of small-molecule chemicals to mimic genetic mutations by acting on cellular targets.

Jose J. Sanchez-Serrano and Julio Salinas (eds.), Arabidopsis Protocols, Methods in Molecular Biology, vol. 1062, DOI 10.1007/978-1-62703-580-4_21, © Springer Science+Business Media New York 2014
Chemical treatment to modulate the activity or function of cellular targets offers several advantages over conventional genetic approaches. The duration of the effect on the target can be adjusted, and the effect can be reversed, enabling a more direct assessment of a cellular target. Chemicals may be applied locally, thus mimicking tissue- or organ-specific modulation of gene function. Graded variations in target gene function or activity can be examined by using different doses of a chemical, which also allows the study of lethal genes. Furthermore, chemical genetics can be applied to species that are not genetically tractable. Chemicals identified in a plant species such as Arabidopsis can be used to investigate the function of a homologous gene in a related species. Chemical genetics has now been used successfully to elucidate various complex mechanisms that may not have been accessible with conventional genetics. Examples include auxin signaling [2–4], endomembrane system components [5], and vacuolar sorting [6]. Nevertheless, the cellular effects of small-molecule chemicals may not be as specific as the mutation of a given gene, and this point must be borne in mind when applying chemical genetics (Fig. 1).

Chemical genetics can be classified into forward chemical genetics (i.e., a phenotype-based approach) and reverse chemical genetics (i.e., a target-based approach). Forward chemical genetics proceeds from an altered phenotype or physiology to the corresponding target genes, similar to classical forward genetics [7]. However, whereas conventional genetics is based on screening a pool of mutant plants, chemical genetics screens a pool of small molecules for their effects on phenotype or physiology. Chemicals that alter the phenotype or physiology of interest are then isolated and used to identify a target.

Fig. 1 Forward and reverse chemical genetic approaches
In reverse chemical genetics, as in conventional reverse genetics, the targets, which are usually proteins, are defined first, and a chemical library is screened for compounds that interact with the target protein. These compounds are then used to determine the phenotypic or physiological consequences of altering the function of the target protein in a cellular context. This chapter describes a forward chemical genetics protocol that covers screening small molecules for a given phenotype or physiology, identifying target proteins, and validating the target [8].

A key step in forward chemical genetics is the identification of the cellular targets, which can be approached in several ways. In one approach, cellular targets are inferred from the responses of plants to the chemical and from known physiological responses. They are later confirmed by various means, such as binding of the chemical to the inferred target in vitro or in planta [4, 9]. Screening for genetic mutants with altered sensitivity to the chemical can also provide information on the target of the compound. The mutated genes are candidate cellular targets, although the altered sensitivity of a mutant to a specific compound may be due to an indirect effect of the chemical [10]. The cellular target can also be identified by pulling down the target protein that binds to the chemical. This usually involves attaching a linker molecule to the hit chemical without affecting its activity, at a position determined by structure–activity relationship (SAR) studies. The compound with the added linker is then attached to a solid-phase matrix such as agarose beads to make an affinity matrix, which is used to pull down binding targets from cellular extracts. The matrix-bound proteins are usually separated by SDS-PAGE, and the target protein is identified by mass spectrometry [11, 12].
After a candidate target is identified, functional validation is necessary to confirm that the bound proteins are the actual targets. This can be done by examining the phenotypic or physiological effects of the chemical in knockout, knockdown, and overexpression plants (Fig. 2).
Fig. 2 Modification of hit chemical for target identification
The affinity-based approach for target identification has practical difficulties. Adding a linker to a chemical compound at an appropriate position while minimizing the effect on its activity requires a thorough SAR study. This step is time-consuming and laborious, and most biology laboratories are not familiar with the procedure. Sometimes the modification is not feasible without loss of activity. To overcome these difficulties, the tagged chemical library approach was introduced [13]. In a tagged chemical library, the chemicals already contain the linker molecule necessary for the preparation of an affinity matrix, so subsequent modification of the hit compound is not required. The tagged chemical library used in this study contained a triethylene glycol (TG)-based linker with a terminal amine that is used to immobilize the chemical on a solid matrix [14]. Here, we describe a protocol based on the tagged chemical library.
2 Materials
1. Dry seeds of Arabidopsis thaliana. 2. 1.5-mL microfuge tubes. 3. 24-well plates. 4. 10-cm Petri dishes. 5. 3-mL cartridges with a 20-μm pore-size polyethylene frit. 6. 20-mL glass vials. 7. Microcentrifuge. 8. Orbital shaker. 9. Rotary mixer. 10. Tagged chemical library: the chemicals of the tagged triazine library used here contain a triethylene glycol-based linker with a terminal amine (TG-NH2) [13, 14]. 11. Sterilized double-distilled water. 12. Dimethyl sulfoxide (DMSO). 13. Murashige and Skoog (MS) medium (0.5×) containing microelements, without sucrose (pH 5.7), with 0.8 % phyto agar.
Note that the medium must be adjusted to the screening strategy for the phenotype of interest, because the medium affects the phenotype, especially that of seedlings. 14. Affigel-10. 15. N,N-Diisopropylethylamine (DIEA). 16. Ethanolamine. 17. Sodium azide. 18. Liquid nitrogen (N2). 19. Seed surface-sterilization solution: 10 % sodium hypochlorite solution containing 0.1 % Triton X-100. 20. Extraction buffer: 20 mM Tris–HCl (pH 7.5), 150 mM NaCl, 2 mM EDTA, 1 mM NaF, 1 mM PMSF, 1 mM DTT, 10 mM β-glycerophosphate, protease inhibitor cocktail (Roche, Mannheim, Germany), 1 % Triton X-100, and 0.1 % sodium dodecyl sulfate (SDS). The washing buffer has the same composition but does not include Triton X-100 or SDS. 21. SDS-PAGE gels: 0.375 M Tris–HCl (pH 8.8), 10 % acrylamide/bis-acrylamide solution, 0.1 % SDS, 0.05 % ammonium persulfate, 0.05 % (v/v) TEMED. 22. Coomassie blue staining solutions: fixing solution (50 % methanol and 10 % glacial acetic acid), staining solution (0.1 % Coomassie Brilliant Blue R-250, 50 % methanol, and 10 % glacial acetic acid), and destaining solution (40 % methanol and 10 % glacial acetic acid).
3 Methods
3.1 Primary Chemical Screening
The scheme is diagrammed in Fig. 3. The candidate chemicals from this screening are called "hit" compounds.
Fig. 3 Schematic diagram of the primary chemical screening
1. Pour 1.5 mL of 0.5× MS medium with 0.8 % phyto agar into each well of a 24-well plate. Allow the agar to solidify at room temperature (see Notes 1–3). 2. Apply each chemical from the chemical library to the surface of the medium in a separate well. As a control, one or two wells in each plate should contain only solvent (2 μm of chemical was used for primary screening in our case) (see Note 4). 3. Prepare enough seeds to sow three seeds in each well. 4.
Surface-sterilize the seeds in microfuge tubes by adding 1 mL of 10 % (v/v) sodium hypochlorite solution containing 0.1 % Triton X-100 as a surfactant and shaking or vortexing for 5 min. Collect the seeds by centrifugation in a microcentrifuge for a few seconds and remove the supernatant. Wash the seeds five times with sterile water and incubate them in the tubes for 2 days at 4 °C in the dark to synchronize germination. After sowing the seeds on the medium, place the plates under white light for 24 h to promote germination (see Note 5). Make sure that the seeds sown in each well do not stick together. 5. Maintain the experimental conditions until the phenotype can be observed (see Note 6). 6. Sort the chemicals into hit and inactive chemicals according to their phenotypic effects. Chemicals whose effects are similar to those of the solvent control in the same plate should not be considered hits. In addition to finding a hit chemical, it is also important to identify an inactive compound with a structure similar to that of the hit compound, which can be used as a negative control.
3.2 Secondary Screening
Once a hit chemical with a phenotypic effect is found, it must be confirmed by a secondary screening that includes a dose–response assessment and determination of the IC50 value, as shown in Fig. 4.
Fig. 4 Schematic diagram of the secondary screening of chemicals
1. Pour 60 mL of 0.5× MS medium with 0.8 % phyto agar into each 10-cm Petri dish (see Note 4). 2. Prepare enough plates by mixing the medium with one of the following: solvent only, a hit chemical (from low to high concentration), or an inactive control chemical (from low to high concentration) (see Note 7). 3. Prepare enough seeds to sow more than 30 seeds in each Petri dish (see Note 8). 4. Surface-sterilize the seeds, stratify them, and sow them on the medium plates (see Subheading 3.1, step 4). 5.
Maintain the experimental conditions for phenotypic screening (see Note 9).
3.3 Target Identification
3.3.1 Bead Conjugation
To identify the cellular targets of the hit compound by a pull-down assay, the tagged chemical must be covalently attached to a solid-phase matrix through the terminal amine of its linker. 1. Shake the bottle of Affigel-10 gently to obtain a homogeneous suspension. 2. Transfer 0.5 mL (7.5 μmol) of Affigel-10 to a 3-mL cartridge with a 20-μm pore-size polyethylene frit. 3. Drain the supernatant solvent and wash the gel with DMSO. 4. Place 375 μL of 10 mM TG-NH2-linked chemical (dissolved in DMSO) in a 20-mL vial. Add 125 μL of DMSO to bring the total volume of the solution to 0.5 mL. 5. Add 50 μL of DIEA to the vial. 6. Transfer the contents of the vial to the 3-mL cartridge containing the Affigel-10. 7. Shake for 3 h at room temperature on an orbital shaker set to 500 rpm. 8. Drain the solution and wash the product with DMSO. 9. To block the remaining reactive groups on the gel and prevent side reactions, add 1 mL of 50 mM ethanolamine in DMSO and 15 μL of DIEA to the reaction cartridge. Shake for 3 h at room temperature on an orbital shaker set to 500 rpm. 10. Drain the solution and wash the product with DMSO, then water, and then 2 % sodium azide in water to protect the product from bacterial contamination. 11. The Affigel-10 product can be stored in a microcentrifuge tube in 1 mL of 2 % sodium azide in water at 4 °C. 12. Before the affinity pull-down assay, remove the sodium azide solution: spin down the bead-conjugated chemical for 1–2 s at 800 × g in a microcentrifuge at 4 °C, drain the supernatant containing the sodium azide, and wash the pellet three times with washing buffer.
3.3.2 Affinity Pull-Down Assay
The Affigel-bound chemical is incubated with the cell extract to isolate cellular target proteins, and SDS-PAGE is used to examine the pulled-down proteins.
To exclude nonspecific binding proteins, a competition assay is used. The isolated proteins are then identified by mass spectrometry. 1. Prepare 200 Arabidopsis seeds in a microfuge tube, surface-sterilize and stratify them, and sow them on medium plates (see Subheading 3.1, step 4). 2. Incubate the plates under the given experimental conditions until the phenotype can be observed (see Note 10). 3. Freeze the plant samples in liquid N2 and grind them to a very fine powder in a precooled (−70 °C) mortar and pestle. 4. Transfer the powder to a microfuge tube. Add extraction buffer (200 μL per 0.1 g of sample) and mix thoroughly. Keep on ice for 10 min with occasional inversion of the tube. 5. Centrifuge the mixture for 10 min at 4,000 × g in a microcentrifuge at 4 °C. 6. Transfer the supernatant to a new tube on ice and measure the protein concentration in each tube. 7. Adjust the protein concentration so that all tubes have the same concentration. 8. Add 5 volumes of washing buffer to the cell lysate on ice. 9. To reduce nonspecific binding, add 30 μL of Affigel-10 and incubate for 1 h at 4 °C on a rotary mixer with gentle mixing. 10. Spin down the Affigel-10 for 1–2 s at 800 × g in a microcentrifuge at 4 °C and keep the supernatant (precleared lysate). 11. Divide the lysate among five new microfuge tubes on ice, and add washing buffer to a total volume of 1 mL (1 μg/μL protein) per tube. The tubes are as follows: Tube 1. Target screening tube, to pull down target proteins from the cell extract using the bead-conjugated hit compound (prepared as described in Subheading 3.3.1). Tube 2. Competition assay tube, to pull down proteins from the cell extract using the bead-conjugated hit compound after preincubation of the extract with the unconjugated hit compound (bead conjugate prepared as described in Subheading 3.3.1). Tube 3. Cell extract. Tube 4.
Bead control, to pull down proteins from the cell extract that bind nonspecifically to unconjugated beads. Tube 5. Inactive control, to pull down proteins from the cell extract using the bead-conjugated inactive compound (prepared as described in Subheading 3.3.1). 12. Add the unconjugated hit compound to tube 2 only, and add the same volume of solvent to the other four tubes (tubes 1, 3, 4, and 5). 13. Incubate all tubes for 1 h at 4 °C on a rotary mixer with gentle mixing. 14. Add the beads conjugated to the hit compound to tubes 1 and 2. 15. As controls, add unconjugated agarose beads to tube 4 and beads conjugated to the inactive compound to tube 5 (Fig. 5). 16. Incubate the five tubes for 2–4 h at 4 °C on a rotary mixer with gentle mixing. 17. Centrifuge the tubes for 1–2 s at 800 × g in a microcentrifuge at 4 °C. 18. Drain the supernatant, keep the tubes on ice, and wash the pellet three times with washing buffer. 19. Add SDS gel-loading buffer to each tube and heat the tubes for 5 min at 95 °C. 20. Perform SDS-PAGE. A gradient gel run at constant current was used in our case (Subheading 2, item 21). 21. Visualize the protein bands by Coomassie blue staining (Subheading 2, item 22) or EBT silver staining [15]. 22. Excise the band of interest from the gel and place it in a microfuge tube. 23. Determine the identity of the target proteins by mass spectrometry.
Fig. 5 Schematic diagram of the affinity pull-down assay
3.4 Biological Validation
Seeds of Arabidopsis mutant lines for candidate targets are available from the Arabidopsis Biological Resource Center (ABRC) at Ohio State University and the Nottingham Arabidopsis Stock Centre (NASC). If a mutant for a specific target is not available from public resources, an RNAi or overexpression line must be generated.
Even if mutant lines are available, generating RNAi and/or overexpression lines is recommended to examine whether the chemical-induced phenotype depends on the expression level of the target protein. 1. Pour 60 mL of 0.5× MS medium (with 0.8 % phyto agar) into 10-cm Petri dishes. 2. Prepare media with no solvent, solvent only, a hit chemical (from low to high concentration), or an inactive control chemical (from low to high concentration) (see Note 7). 3. Prepare 300 seeds of each candidate mutant line (and of RNAi and overexpression lines, if available) (see Note 8). Surface-sterilize and stratify the seeds (see Subheading 3.1, step 4), and sow them on the medium plates with no solvent. 4. After germination, transfer the seedlings to each of the medium plates prepared in step 2. 5. Maintain the plants under the experimental conditions for the required period.
4 Notes
1. Culture medium components can affect the phenotype during growth. For general practice, we recommend 0.5× MS medium (with microelements and without sucrose). 2. Solvent effect: plant phenotypes can be affected by the concentration of the solvent used to dissolve the chemical compounds. For DMSO, no apparent effect on the growth of Arabidopsis seedlings was observed at concentrations up to 0.1 % (v/v) in the culture medium. 3. Edge effect: the medium in the wells at the edges of a 24-well plate dries out more easily than that in the central wells, which can affect plant growth. Wrapping the plate edges with film after closing the lid significantly reduces this edge effect. 4. If plants are grown for more than 2 weeks in 24-well plates or Petri dishes, the medium may dry out. If such a long incubation is required to observe the phenotype, increase the volume of the medium and switch to 12- or 6-well plates to provide sufficient space for growth.
5. If the experiment is not specifically related to photomorphogenesis, temperature responses, or the circadian clock, we recommend long-day conditions (16 h light/8 h dark) or continuous light at 22 °C. To avoid pleiotropic effects of the compounds on germination or on very young seedlings, compounds may be applied a few days after germination. 6. Deciding which phenotype to observe is critical for the proper selection of chemical compounds. Various phenotypes can be observed, depending on the purpose of the experiment, such as organ swelling [16], inhibition of hypocotyl growth [17], agravitropic response [5], pin-formed inflorescence [4], and leaf bleaching [18]. If a noticeable phenotypic alteration is observed, the chemical may be categorized as a hit compound. Compounds without a phenotypic effect but with a similar structure should be kept for use as inactive controls in target identification. 7. In primary high-throughput screening, incorporating the chemicals into the medium may be impractical because of the large number of plates required. For the small volumes of medium used in 24-well plates, it is sufficient to apply the chemical compound to the surface of the medium, because it diffuses easily into the medium. In this case, sow the seeds at least 3 h after applying the chemicals to allow them to diffuse evenly. However, if a larger volume of medium is used, the chemicals must be mixed thoroughly with the medium before it solidifies; the medium should first be cooled enough to avoid damaging temperature-sensitive chemicals. 8. The effect of a chemical compound should be validated by statistical analysis. For each compound, more than 30 plants should be used to determine statistical significance.
If the phenotypic difference for a compound is significant at p ≤ 0.05, the compound can be established as a hit. Because the phenotypes of plants grown in different Petri dishes can vary, all experiments should be repeated at least three times. 9. Once a candidate hit is discovered, optimized derivatives can be generated later, as required. 10. Proteins extracted from whole plants up to the seedling stage can be used for target identification. However, once the true leaves emerge, the amount of plastid proteins such as RuBisCO increases strongly. In this case, an affinity column with antibodies against plastid proteins can be used to reduce their concentration in the total protein lysate. If the root is the target organ, root-extracted protein can be used directly.
References
1. Jun J, Kim CS, Cho DS, Kwak JM, Ha CM, Park YS, Cho BH, Patton DA, Nam HG (2002) Random antisense cDNA mutagenesis as an efficient functional genomic approach in higher plants. Planta 214:668–674 2. Zhao Y, Dai X, Blackwell HE, Schreiber SL, Chory J (2003) SIR1, an upstream component in auxin signaling identified by chemical genetics. Science 301:1107–1110 3. Armstrong JI, Yuan S, Dale JM, Tanner VN, Theologis A (2004) Identification of inhibitors of auxin transcriptional activation by means of chemical genetics in Arabidopsis. Proc Natl Acad Sci U S A 101:14978–14983 4. Kim JY, Henrichs S, Bailly A, Vincenzetti V, Sovero V, Mancuso S, Pollmann S, Kim D, Geisler M, Nam HG (2010) Identification of an ABCB/P-glycoprotein-specific inhibitor of auxin transport by chemical genomics. J Biol Chem 285:23309–23317 5. Surpin M, Rojas-Pierce M, Carter C, Hicks GR, Vasquez J, Raikhel NV (2005) The power of chemical genomics to study the link between endomembrane system components and the gravitropic response. Proc Natl Acad Sci U S A 102:4902–4907 6.
Zouhar J, Hicks GR, Raikhel NV (2004) Sorting inhibitors (sortins): chemical compounds to study vacuolar sorting in Arabidopsis. Proc Natl Acad Sci U S A 101:9497–9501 7. Blackwell HE, Zhao Y (2003) Chemical genetic approaches to plant biology. Plant Physiol 133:448–455 8. Das RK, Samanta S, Ghosh K, Zhai D, Xu W, Su D, Leong C, Chang YT (2011) Target identification: a challenging step in forward chemical genetics. IBC 3(3):1–16 9. Crews CM, Splittgerber U (1999) Chemical genetics: exploring and controlling cellular processes with chemical probes. Trends Biochem Sci 24:317–320 10. Zheng XFS, Chan TF, Zhou HH (2004) Genetic and genomic approaches to identify and study the targets of bioactive small molecules. Chem Biol 11:609–618 11. Khersonsky SM, Chang YT (2004) Strategies for facilitated forward chemical genetics. Chembiochem 5:903–908 12. Kim YK, Chang YT (2007) Tagged library approach facilitates forward chemical genetics. Mol Biosyst 3:392–397 13. Khersonsky SM, Jung DW, Kang TW, Walsh DP, Moon HS, Jo H, Jacobson EM, Shetty V, Neubert TA, Chang YT (2003) Facilitated forward chemical genetics using tagged triazine library and zebrafish embryo screening. J Am Chem Soc 125:11804–11805 14. Ahn YH, Chang YT (2007) Tagged small molecule library approach for facilitated chemical genetics. Acc Chem Res 40:1025–1033 15. Jin L, Hwang S, Yoo G, Choi J (2006) A mass spectrometry compatible silver staining method for protein incorporating a new silver sensitizer in sodium dodecyl sulfate-polyacrylamide electrophoresis gels. Proteomics 6:2334–2337 16. DeBolt S, Gutierrez R, Ehrhardt DW, Melo CV, Ross L, Cutler SR, Somerville C, Bonetta D (2007) Morlin, an inhibitor of cortical microtubule dynamics and cellulose synthase movement. Proc Natl Acad Sci U S A 104:5854–5859 17. Asami T, Min YK, Nagata N, Yamagishi K, Takatsuto S, Fujioka S, Murofushi N, Yamaguchi I, Yoshida S (2000) Characterization of brassinazole, a triazole-type brassinosteroid biosynthesis inhibitor.
Plant Physiol 123:93–100 18. Walsh TA, Bauer T, Neal R, Merlo AO, Schmitzer PR, Hicks GR, Honma M, Matsumura W, Wolff K, Davies JP (2007) Chemical genetic identification of glutamine phosphoribosylpyrophosphate amidotransferase as the target for a novel bleaching herbicide in Arabidopsis. Plant Physiol 144:1292–1304
Chapter 22
Highly Reproducible ChIP-on-Chip Analysis to Identify Genome-Wide Protein Binding and Chromatin Status in Arabidopsis thaliana
Jong-Myong Kim, Taiko Kim To, Maho Tanaka, Takaho A. Endo, Akihiro Matsui, Junko Ishida, Fiona C. Robertson, Tetsuro Toyoda, and Motoaki Seki
Abstract
Gene activity is regulated via chromatin dynamics in eukaryotes. In plants, alterations in histone modifications are correlated with gene regulation during development, vernalization, and abiotic stress responses. Using ChIP, ChIP-on-chip, and ChIP-seq analyses, the direct binding regions of transcription factors and alterations in histone modifications can be identified on a genome-wide level. We have established reliable and reproducible ChIP and ChIP-on-chip methods that have been optimized for the Arabidopsis model system. These methods are useful not only for identifying the direct binding of transcription factors and chromatin status but also for scanning the regulatory network in Arabidopsis.
Key words Arabidopsis, Histone, Chromatin, ChIP-on-chip
1 Introduction
Posttranslational modifications, such as histone modifications, are critical events in the regulation of transcription and genome structure in eukaryotes [1–7]. In plants, the regulation of genes involved in flowering, vernalization, and abiotic stress responses is correlated with histone modifications [8–13]. "ChIP-on-chip" and "ChIP-seq" are very powerful techniques for detecting genome-wide changes in DNA–protein binding and chromatin status; they combine chromatin immunoprecipitation ("ChIP") with tiling array technology ("chip") and high-throughput sequencing technology, respectively [14–17].
Although genome-wide ChIP-on-chip analyses of both chromatin marks and transcription factor binding have been reported for Arabidopsis [18–22], the ChIP-on-chip assay has not yet become a widespread technique in Arabidopsis.
Jose J. Sanchez-Serrano and Julio Salinas (eds.), Arabidopsis Protocols, Methods in Molecular Biology, vol. 1062, DOI 10.1007/978-1-62703-580-4_22, © Springer Science+Business Media New York 2014
Fig. 1 Workflow of ChIP-on-chip analysis. Steps in Methods: fixation of fresh plants (3.1); breaking the fixed plants (3.2); chromatin shearing (3.3); immunoprecipitation and ChIPed DNA purification (3.4–3.5); amplification of purified ChIPed DNA (3.6); preparation of hybridization probe for tiling array (3.7–3.8); hybridization with tiling array (3.9–3.10); scanning and data analysis (3.11–3.12)
This limited uptake is primarily due to the difficulty of optimizing the ChIP assay conditions and of generating reproducible results. Moreover, the ChIP-on-chip procedure involves numerous steps, which makes troubleshooting each step complicated. We have established a ChIP [11] and ChIP-on-chip protocol that has been optimized for Arabidopsis and has proven reliable and reproducible (Fig. 1). In this protocol, fresh plants are used without freeze-thawing to avoid disrupting the protein interactions of interest. The method can handle both small- and large-scale ChIP assays (Fig. 2). In the ChIP-on-chip method, we combined a T/A ligation technique for attaching a dsDNA adaptor with an in vitro transcription system to amplify a sufficient amount of cRNA. During in vitro transcription, the biotin label used for detection is incorporated into the amplified cRNA fragments for hybridization. This gives our results high integrity and reproducibility.
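The T/A ligation step relies on the adaptor oligos listed in Subheading 2.6 (item 16). In T/A ligation, an adaptor carrying a single 3′ T overhang is joined to DNA fragments carrying a single 3′ A overhang. The Python sketch below is an illustrative check, not part of the published protocol: it anneals T/A ds_R to T/A ds_F, reports the unpaired 5′ tail and 3′ overhang of ds_F, and confirms that ds_F contains the minimal T7 RNA polymerase promoter (TAATACGACTCACTATAG), consistent with the use of in vitro transcription to amplify cRNA. The helper names are hypothetical, and the sketch assumes perfect, gap-free complementarity over the annealed region.

```python
T7_PROMOTER = "TAATACGACTCACTATAG"  # minimal T7 RNA polymerase promoter
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def anneal(top: str, bottom: str):
    """Anneal two oligos written 5'->3'; return (top 5' tail, top 3' overhang).

    Assumes `bottom` pairs perfectly with one contiguous stretch of `top`.
    """
    site = reverse_complement(bottom)
    start = top.upper().find(site)
    if start < 0:
        raise ValueError("bottom oligo is not complementary to top oligo")
    end = start + len(site)
    return top[:start], top[end:]


ds_F = "GCGGCCGCGAAATTAATACGACTCACTATAGGGAGT"  # T/A ds_F
ds_R = "CTCCCTATAGTGAGTCGTATTAATTT"            # T/A ds_R

tail_5p, overhang_3p = anneal(ds_F, ds_R)
print(tail_5p, overhang_3p)  # GCGGCCGCG T
print(T7_PROMOTER in ds_F)   # True
```

The single unpaired T at the 3′ end of ds_F is the overhang that makes the annealed duplex a T/A adaptor.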
In addition, ChIPed DNA prepared by our ChIP method can be used directly for ChIP-seq analysis with the optimized amplification and sequencing procedures provided by the manufacturers of high-throughput sequencers. In this chapter, we describe the protocols that we have developed for Arabidopsis ChIP and ChIP-on-chip analyses to identify site-specific and genome-wide DNA–protein binding and chromatin status.
Fig. 2 Schematic diagram of chromatin immunoprecipitation
2 Materials
2.1 Fixation and Quenching of Plants
1. Two-week-old whole Arabidopsis seedlings (roots and entire aerial parts) grown in Petri dishes containing GM agar (0.85 %) medium supplemented with 1 % sucrose under a 16 h light/8 h dark cycle (40–80 μmol photons m−2 s−1; light period: 5:00 a.m. to 9:00 p.m.) [11]. 2. 1 M HEPES, adjusted to pH 7.5 with 10 N KOH. Autoclave and store at room temperature. 3. Formaldehyde. Store at room temperature. 4. Vacuum system, including pumps and a plastic bell jar, connected to a freeze-dryer. 5. 2.5 M glycine solution. Store at room temperature.
2.2 Extraction of Cell Lysate from Plants
1. Metal tubes, SUST-0050 (Bio Medical Science). 2. Protease inhibitor cocktail tablets (Complete, EDTA-free, Roche). Dissolve one tablet in 50 mL of 50 mM HEPES buffer just before use. 3. Tungsten balls, SS150-0050 (Bio Medical Science). 4. Aluminum tube holder unit, AB 50-0005 (Bio Medical Science). 5. Plant shredding equipment: Shake Master Auto, BMS A20-TP (Bio Medical Science). 6. Filter unit: cell strainer, 100 μm, No. 352360 (BD Falcon). 7. Protein low-binding plastic tubes, 50 mL: SUMILON Proteosave, MS-5950 (SUMILON). 8. Protein low-binding plastic tubes, 15 mL: SUMILON Proteosave, MS-59150 (SUMILON). 9.
2× 150 mM lysis buffer: 25 mL of 1 M HEPES buffer, pH 7.5, 50 mL of 3 M NaCl, 2 mL of 0.5 M EDTA, 50 mL of 20 % Triton X-100, 10 mL of 10 % Na deoxycholate, and 5 mL of 20 % SDS. Make up to 500 mL with distilled water and store at room temperature.
2.3 Chromatin Shearing
1. 150 mM lysis buffer: 25 mL of 1 M HEPES buffer, pH 7.5, 25 mL of 3 M NaCl, 1 mL of 0.5 M EDTA, 25 mL of 20 % Triton X-100, 5 mL of 10 % Na deoxycholate, and 2.5 mL of 20 % SDS. Make up to 500 mL with distilled water and store at room temperature. 2. Self-standing plastic tubes, 25 mL: Centrifuge Tubes Mini with triple seal cap, No. 2362-025 (IWAKI, Japan). 3. Sonicator: Astraon 3000, Model S3000-600, with a 1/2 flat probe tip (Misonix). 4. Phenol/chloroform/isoamyl alcohol 25:24:1 saturated with 10 mM Tris–HCl, pH 8.0, 1 mM EDTA (Sigma-Aldrich). Store at 4 °C. 5. Ethanol. 6. 3 M sodium acetate buffer solution, S7899-100ML (Sigma-Aldrich). Store at room temperature. 7. Glycogen for Mol. Biol., No. 0-90-393-001 (Roche). Store at −20 °C. 8. Agilent DNA 1000 kit (Agilent). Store at room temperature. 9. Agilent 2100 Bioanalyzer (Agilent Technologies).
2.4 Chromatin Immunoprecipitation
1. Dynabeads Protein G (Dynal). 2. Nutator (BD Clay Adams brand). 3. Anti-histone H4 tetra-acetylation polyclonal antibody, 06-866 (Millipore). 4. Magnet rack for 50 mL plastic tubes: Dynal MPC-1 (Dynal). 5. 1.7 mL SafeSeal Microcentrifuge Tubes (Sorenson BioScience). 6. Magnet rack for 1.5 mL plastic tubes: Dynal MPC-S (Dynal). 7. 500 mM lysis buffer: 25 mL of 1 M HEPES buffer, pH 7.5, 83.3 mL of 3 M NaCl, 1 mL of 0.5 M EDTA, 25 mL of 20 % Triton X-100, 5 mL of 10 % Na deoxycholate, and 2.5 mL of 20 % SDS. Make up to 500 mL with distilled water and store at room temperature. 8. Deoxycholate buffer: 5 mL of 1 M Tris buffer, adjusted to pH 8.0 with 10 N HCl, 31.25 mL of 4 M LiCl, 12.5 mL of 20 % NP-40 (Sigma-Aldrich), and 25 mL of 10 % Na deoxycholate.
Make up to 500 mL with distilled water, autoclave, and store at room temperature. 9. 10× TE, pH 8.0. Store at room temperature. 10. Elution buffer: 2.5 mL of 1 M Tris buffer, adjusted to pH 8.0 with 10 N HCl, 1 mL of 0.5 M EDTA, and 2.5 mL of 20 % SDS. Make up to 50 mL with distilled water and store at room temperature. 11. Hybridization incubator HB-80 (TAITEC). 12. RNase A (7,000 units/mL), No. 19101 (Qiagen). Store at 4 °C. 13. Proteinase K (>600 mAU/mL), No. 19131 (Qiagen). Store at 4 °C. 14. QIAamp DNA Micro Kit (Qiagen). Store at room temperature. 15. DNase-/RNase-free water. Store at room temperature.
2.5 Evaluation of ChIPed DNA Quality and Enrichment
1. ExTaq DNA polymerase (5 units/μL), No. RR001A (TaKaRa), and 10× ExTaq PCR buffer. Store at −20 °C. 2. Primers to detect enrichment of an internal control, the Arabidopsis ACT7 (At5g09810) gene: forward primer ACT7-F, 5′-CGTTTCGCTTTCCTTAGTGTTAGCT; reverse primer ACT7-R, 5′-AGCGAACGGATCTAGAGACTCACCTTG (see Note 1). 3. 6 % acrylamide gel: 3 mL of acrylamide/bis-acrylamide (19:1) gel solution (Bio-Rad) and 3 mL of 5× TBE, made up to 15 mL with distilled water. Add 40 μL of 30 % ammonium persulfate and 20 μL of TEMED, mix, and pour into the gel preparation system (glass plate size: 9 cm × 15 cm × 1 mm) of the BE-22R electrophoresis system (BIO CRAFT). Prepare just before use. 4. Gel detection system: VISTA FluorImager SI, filter 610RG, and ImageQuant software (GE Healthcare). 5. SYBR Gold nucleic acid gel stain, 10,000× concentrate in DMSO (Life Technologies). 6. Thermal cycler.
2.6 Amplification of ChIPed DNA Fragments for Tiling Array Hybridization
1. dNTP mixture (2.5 mM each), No. 4030 (TaKaRa). 2. T4 DNA polymerase (3,000 units/mL), No. M0203L (NEB). Store at −20 °C. 3. 10× NEB2 buffer (NEB). Store at −20 °C. 4. DNase-/RNase-free water. Store at room temperature. 5. 0.5 M EDTA. Store at room temperature. 6. 1× TE solution. Store at room temperature. 7.
T4 polynucleotide kinase (10 units/μL), No. 18004-010 (Life Technologies). 8. 100 mM ATP (TaKaRa). 9. 5× forward buffer (NBE). 10. Phenol/chloroform/isoamyl alcohol 25:24:1 saturated with 10 mM Tris–HCl, pH 8.0, 1 mM EDTA. Store at 4 °C. 11. Ethanol. 12. 3 M sodium acetate buffer solution, S7899-100ML (Sigma-Aldrich). Store at room temperature. 13. Glycogen for Mol. Biol., No. 0-90-393-001 (Roche). Store at −20 °C. 14. QIAamp DNA Micro Kit (Qiagen). Store at room temperature. 15. NanoDrop ND-1000 Spectrophotometer (Thermo Fisher Scientific). 16. T/A oligos for production of the dsDNA adaptor (see Note 2): T/A ds_F, 5′-GCGGCCGCGAAATTAATACGACTCACTATAGGGAGT; T/A ds_R, 5′-CTCCCTATAGTGAGTCGTATTAATTT. 17. T4 DNA ligase (2,000,000 cohesive end units/mL), No. M0202T (NEB), and 10× ligation buffer. Store at −20 °C. 18. T7-c primer: 5′-CTTGGCGCGAAATTAATACGACTCACTATAGGGAGT. 19. ExTaq DNA polymerase (5 units/μL), No. RR001A (TaKaRa), and 10× ExTaq PCR buffer. Store at −20 °C. 20. Thermal cycler.
2.7 Synthesis of Biotin-Labeled cRNA with IVT Reaction
1. NanoDrop ND-1000 Spectrophotometer (Thermo Fisher Scientific). 2. One-Cycle Target Labeling and Control Reagents (Affymetrix), which supply the following reagents and materials: 10× IVT labeling buffer, IVT labeling NTP mix, IVT labeling enzyme mix, IVT cRNA cleanup spin column, IVT cRNA binding buffer, and IVT cRNA wash buffer. Store the IVT cRNA cleanup spin column at 4 °C and the IVT cRNA binding buffer at −20 °C. Store IVT cRNA at −80 °C. Store the wash buffer at room temperature and all other reagents at −20 °C. 3. Ethanol.
2.8 Fragmentation of the cRNA
1. 5× fragmentation buffer (Affymetrix). Store at room temperature. 2. Thermal cycler. 3. DNase-/RNase-free water. Store at room temperature. 4. Agilent 2100 Bioanalyzer (Agilent Technologies).
2.9 Hybridization
1. GeneChip Hybridization Oven 640 (Affymetrix). 2.
GeneChip Arabidopsis tiling array (1.0F Array, Affymetrix) (see Note 3). It can be stored for up to 6 months at 4 °C in the dark. 3. 250 μL micropipette tips, HR-250S (Rainin) (see Note 4), used to apply the hybridization buffer to the tiling array. 4. 5 M NaCl (DNase-/RNase-free, Ambion). 5. 0.5 M EDTA. Store at room temperature. 6. 2× hybridization buffer: add 8.3 mL of 12× MES stock buffer, 17.7 mL of 5 M NaCl, 4 mL of 0.5 M EDTA, and 0.1 mL of 10 % Tween-20 to 19 mL of RNase-free water, and make up to 50 mL. Store at 4 °C in the dark. 7. GeneChip Eukaryotic Hybridization Control Kit (Affymetrix), which supplies 3 nM control oligo B2 and 20× eukaryotic hybridization controls. Store at −20 °C. 8. 10 mg/mL herring sperm DNA (Promega). Store at −20 °C. 9. 50 mg/mL bovine serum albumin (BSA) (Invitrogen). Store at −20 °C. 10. Dimethyl sulfoxide (DMSO). Store at room temperature. 11. DNase-/RNase-free water. Store at room temperature. 12. Heat block.
2.10 Washing and Staining
1. GeneChip Fluidics Station 450 (Affymetrix). 2. 20× SSPE (3 M NaCl, 0.2 M NaH2PO4, 0.02 M EDTA; Cambrex). 3. Wash buffer A: add 300 mL of 20× SSPE and 1 mL of 10 % Tween-20 (Pierce Chemical) to 650 mL of autoclaved distilled water and make up to 1,000 mL with autoclaved distilled water. Filter through a 0.2 μm filter. Wash buffer A can be stored for 3 months at 4 °C in the dark. 4. 12× 2-[N-morpholino]ethanesulfonic acid (MES) stock buffer: add 3.2 g of MES free acid monohydrate and 9.7 g of MES sodium salt (Sigma-Aldrich) to 40 mL of DNase-/RNase-free water (Gibco) and make up to 50 mL with DNase-/RNase-free water. Filter through a 0.2 μm filter. The stock can be stored for 3 months at 4 °C in the dark. 5. Wash buffer B: add 41.7 mL of 12× MES stock buffer, 2.6 mL of 5 M NaCl, and 0.5 mL of 10 % Tween-20 to 400 mL of autoclaved distilled water and make up to 500 mL with autoclaved distilled water.
Filter wash buffer B through a 0.2 μm filter. This can be stored for 3 months at 4 °C in the dark. 6. 10 mg/mL goat IgG stock: Add 50 mg of goat IgG (Sigma-Aldrich) to 5 mL of 150 mM NaCl solution (prepared from 5 M NaCl solution). If a larger volume of the 10 mg/mL IgG stock is prepared, aliquot it and store at −20 °C until use. After thawing, store the solution at 4 °C. Avoid repeated freeze–thaw cycles. 7. 2× stain buffer: Add 41.7 mL of 12× MES stock buffer, 92.5 mL of 5 M NaCl, and 2.5 mL of 10 % Tween-20 to 113.3 mL of RNase-free water. Filter the 2× stain buffer through a 0.2 μm filter. This can be stored at 4 °C in the dark. 8. Anti-streptavidin antibody (goat), biotinylated (Vector Laboratories). Store at −20 °C. 9. 1 mg/mL streptavidin phycoerythrin (SAPE) solution (Molecular Probes). Store at 4 °C. 10. 50 mg/mL bovine serum albumin (BSA) (Invitrogen). Store at −20 °C. 11. DNase-/RNase-free water. Store at room temperature. 2.11 Array Scanning 1. GeneChip Scanner 3000 7G (Affymetrix). 2. Tough-Spots (USA Scientific). 2.12 Computational Analysis of ChIP-on-Chip Data 1. MAS 5.0 algorithm (Affymetrix: http://www.affymetrix.com/support/technical/whitepapers/sadd_whitepaper.pdf). 3 Methods 3.1 Fixation and Quenching of Plants 1. Grow Arabidopsis plants in petri dishes (20 plants per dish) containing GM agar (0.85 %) medium supplemented with 1 % sucrose under a 16 h light/8 h dark cycle (40–80 μmol photons m−2 s−1; light period 5:00 a.m. to 9:00 p.m.) [11]. 2. Degas 200 mL of 50 mM HEPES buffer using an aspirator. Warm the HEPES buffer to 22 °C and keep it in an incubator until use. 3. Pre-run the vacuum system and cool down the water trap chamber inside the freeze-dryer for at least 30 min before use. 4. To make the fixation buffer, add 6 mL of formaldehyde (final concentration 1 %) to the 200 mL of 50 mM HEPES buffer in a 500 mL beaker just before use. 5. Carefully remove the plants from the plates, making sure that no agar is transferred.
6. Harvest whole plants (1 g fresh weight) and immediately submerge them in the 206 mL of fixation buffer containing formaldehyde (see Note 5). 7. Cover the beaker with a two-ply layer of parafilm and make 20 holes in the parafilm using forceps. 8. Place the beaker containing the samples on the heating plate inside the plastic bell for vacuum infiltration. Stack enough paper towels on top of the beaker to prevent splashes of fixation buffer from forming ice cores. 9. Start the vacuum at the maximum vacuum speed to remove air from the samples. Maintain the vacuum pressure between 60 and 133 Pa for 5 min (see Note 6). After 5 min, open the air valve to release the vacuum quickly. 10. Briefly swirl the samples and confirm that the fixation buffer has infiltrated the plants (see Note 7). 11. Repeat the vacuum infiltration as described above. 12. Keep the fixed samples at 22 °C in an incubator for 45 min. 13. Remove the parafilm cover and wipe away any fluid collected on the inside surface of the beaker. 14. Add 10 mL of 2.5 M glycine solution and mix gently by swirling. 15. Again, cover the beaker with a two-ply layer of parafilm and make 20 holes using forceps. 16. To quench the formaldehyde, perform the vacuum infiltration twice in the same manner as described for fixation (steps 8–11). 17. Keep the plants at 22 °C in an incubator for at least 30 min to complete quenching. 18. Decant the solution. 19. Add 200 mL of 50 mM HEPES buffer and wash the fixed plants. 20. Repeat this wash two more times with 200 mL of 50 mM HEPES buffer. 21. Blot the fixed plants dry with paper towels (see Note 8). 3.2 Extraction of Cell Lysate for Chromatin Immunoprecipitation 1. Transfer the samples to prechilled metal tubes and keep them on ice for a few minutes (see Note 9). 2. Add two tungsten balls and 4 mL of prechilled 50 mM HEPES buffer containing Complete tablet. Cover with the metal lid and wrap with parafilm. 3. 
To ensure that liquid does not leak from the lid, compress the parafilm by rolling the tube on the bench top (see Note 10). 4. Place the metal tube in the aluminum tube holder and place the holder in the plant shredding machine. 5. Grind the samples with strong shaking for 13 min using the Shake Master Auto. 6. Remove the holder from the shredding machine and immediately place it on ice. 7. Take 10 μL of the whole cell lysate and check the grinding efficiency by microscopy. 8. Add 1 mL of 50 mM HEPES buffer containing Complete tablet to the sample tube. 9. Resuspend the ground samples by pipetting. 10. Pour 5 mL of the ground samples onto the filter unit set on a 50 mL protein low-binding plastic tube. 11. Cover with parafilm and centrifuge for 5 min at 400 × g, 4 °C. 12. Replace the filter unit and pour the remaining ground samples onto the filter unit. 13. Add 1–2 mL of 50 mM HEPES buffer containing Complete tablet to the metal tube and transfer all of the remaining ground samples to the filter unit. 14. Centrifuge for 10 min at 400 × g, 4 °C. 15. Remove the filter unit and transfer the supernatant (cell lysate) to a 15 mL protein low-binding plastic tube. Make up the cell lysate to 7.5 mL with 50 mM HEPES buffer containing Complete tablet. 16. Add 7.5 mL of 2× 150 mM lysis buffer (see Note 11). 3.3 Chromatin Shearing 1. Transfer the 15 mL of cell lysate in 150 mM lysis buffer to a 25 mL self-standing plastic tube and keep it in ice water (see Note 12). 2. Sonicate the cell lysate at an output level of 8.5 for 30 s and immediately return the tube to ice water for at least 1 min (see Note 13). 3. Repeat this cycle 14 times. 4. Transfer the sonicated cell lysate to a 50 mL protein low-binding plastic tube. 5. Centrifuge for 10 min at 20,000 × g, 4 °C. 6. The resulting aqueous whole cell extract (WCE) is used to prepare Input DNA and ChIPed DNA. 7. 
For the Input DNA, take 200 μL of WCE and extract the DNA by phenol/chloroform extraction and ethanol precipitation. 8. Check the fragment size range of the sheared DNA (see Note 14) using the Agilent 2100 Bioanalyzer (Agilent Technologies). 3.4 Chromatin Immunoprecipitation 1. To remove material that binds nonspecifically to the beads, add 30 μL of magnetic beads (see Note 15) to 15 mL of WCE in a 50 mL protein low-binding plastic tube. 2. Mix for 30 min on a nutator at 4 °C. 3. Collect the magnetic beads using a magnet rack. 4. Transfer the WCE supernatant to a new 50 mL protein low-binding tube. 5. Dispense 500 μL of the precleared WCE into a 1.7 mL protein low-binding tube. 6. Add 4 μL of antibody (see Note 16). 7. Mix overnight on a nutator at 4 °C. 8. Add 30 μL of magnetic beads. 9. Mix for 4 h on a nutator at 4 °C. 10. Collect the magnetic beads using a magnet rack. 11. Discard the aqueous supernatant using an aspirator. 12. Add 1 mL of 150 mM lysis buffer to wash the beads. 13. Invert the tube to resuspend the magnetic beads. 14. Collect the magnetic beads using a magnet rack. 15. Remove the aqueous supernatant using an aspirator. 16. Repeat this washing step (steps 12–15) three times. 17. Add 1 mL of 150 mM lysis buffer. 18. Resuspend the magnetic beads by inversion. 19. Mix for 10 min on a nutator at room temperature. 20. Collect the magnetic beads using a magnet rack. 21. Remove the aqueous supernatant using an aspirator. 22. Add 1 mL of 500 mM lysis buffer. 23. Resuspend the magnetic beads by inversion. 24. Mix for 10 min on a nutator at room temperature. 25. Collect the magnetic beads using a magnet rack and remove the aqueous supernatant using an aspirator. 26. Add 1 mL of deoxycholate buffer. 27. Resuspend the magnetic beads by inversion. 28. Mix for 10 min on a nutator at room temperature. 29. Collect the magnetic beads using a magnet rack and remove the aqueous supernatant using an aspirator. 30. Add 1 mL of 1× TE. 31. Resuspend the magnetic beads by inversion. 32. 
Mix for 10 min on a nutator at room temperature. 33. Collect the magnetic beads using a magnet rack and remove the aqueous supernatant using an aspirator. 34. Add 400 μL of elution buffer, resuspend the magnetic beads by inversion, and transfer to a new 1.7 mL DNA low-binding tube. 35. To reverse the cross-links, incubate the samples overnight at 65 °C in a hybridization oven. 36. Add 2 μL of RNase A and incubate for 30 min at 50 °C. 37. Add 5 μL of proteinase K and incubate for 30 min at 50 °C. 38. After cooling the samples to room temperature, extract the DNA (ChIPed DNA) by phenol/chloroform extraction and ethanol precipitation. Allow the ChIPed DNA to air dry briefly. 39. Dissolve the ChIPed DNA in 100 μL of 1× TE. 40. Purify the ChIPed DNA using the QIAamp DNA micro kit (see Note 17) and elute with 30 μL of DNase-/RNase-free water (see Note 18). 3.5 Evaluation of ChIPed DNA Quality and Enrichment 1. Mix 1 μL of 1 ng/mL ChIPed DNA, 1 μL of ExTaq DNA polymerase, 4 μL of 10 mM dNTP, 0.25 μL of 100 μM primers for the target region, 0.25 μL of 100 μM primers for the ACT7 region, and 2.5 μL of 10× ExTaq PCR buffer in a total reaction volume of 25 μL (see Note 19). 2. Amplify Input DNA and ChIPed DNA by PCR. Cycle conditions are 94 °C for 5 min; [94 °C for 15 s, 58 °C for 30 s, 72 °C for 90 s] × 25 cycles; and 72 °C for 1 min, followed by a hold at 4 °C (see Note 20). 3. Load 3 μL of PCR product into each well of a 6 % polyacrylamide gel (see Note 21). 4. Separate the PCR products by electrophoresis for 40 min at 200 V. 5. Stain the DNA fragments in the gel with 1 μL of SYBR Gold in 300 mL of distilled water, shaking gently for 5 min. 6. Gently agitate the stained gel in distilled water for 10 min at room temperature to remove background fluorescence. 7. Measure the fluorescence intensity of the bands using a FluorImager. Calculate the signal intensity and fold enrichment using ImageQuant imaging software (see Notes 22 and 23).
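The fold-enrichment calculation in step 7 follows the formula given in Note 22: the target-to-ACT7 band-density ratio in ChIPed DNA divided by the same ratio in Input DNA. The Python sketch below simply encodes that formula. The function name and the example densities are hypothetical, and the densities are assumed to be unsaturated band values (see Note 21) exported from the imaging software.

```python
def enrichment_ratio(target_chip: float, act7_chip: float,
                     target_input: float, act7_input: float) -> float:
    """Ratio of enrichment following Note 22.

    (target / ACT7 band density in ChIPed DNA) divided by
    (target / ACT7 band density in Input DNA).
    All arguments are band densities from the multiplex PCR gel.
    """
    densities = {
        "target_chip": target_chip,
        "act7_chip": act7_chip,
        "target_input": target_input,
        "act7_input": act7_input,
    }
    # Non-positive densities (e.g., an undetected band) make the ratio undefined.
    for name, value in densities.items():
        if value <= 0:
            raise ValueError(f"{name} must be a positive band density, got {value}")

    chip_ratio = target_chip / act7_chip       # target normalized to ACT7 in ChIPed DNA
    input_ratio = target_input / act7_input    # target normalized to ACT7 in Input DNA
    return chip_ratio / input_ratio


if __name__ == "__main__":
    # Hypothetical band densities, for illustration only
    ratio = enrichment_ratio(target_chip=12000, act7_chip=4000,
                             target_input=3000, act7_input=3000)
    print(f"ratio of enrichment: {ratio:.2f}")  # ratio of enrichment: 3.00
```

Dividing by the ACT7 band within each multiplex reaction is intended to cancel differences in template amount and gel loading between the ChIPed and Input lanes, so both densities in each ratio should come from the same reaction.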
3.6 Amplification of ChIPed DNA Fragments for Hybridization to the Tiling Array 1. Gently mix 200 ng of ChIPed DNA with 0.5 μL of T4 DNA polymerase, 4.4 μL of dNTP, 11 μL of 10× NEB2 buffer, and DNase-/RNase-free water in a total volume of 110 μL. 2. Incubate the mixture for 15 min at 12 °C in a thermal cycler. 3. Add 1.1 μL of 0.5 M EDTA to stop the reaction. 4. Add 90 μL of 1× TE. 5. Purify the DNA fragments by phenol/chloroform extraction and ethanol precipitation. 6. Briefly air dry the DNA fragments. 7. Dissolve the DNA in 16.5 μL of DNase-/RNase-free water. 8. Add 1 μL of T4 polynucleotide kinase, 2.5 μL of 100 mM ATP, and 5 μL of 5× forward buffer. Mix gently by pipetting. 9. Incubate for 10 min at 37 °C in a thermal cycler. 10. Immediately place the sample tube on ice. 11. Add 175 μL of 1× TE. 12. Purify the DNA fragments by phenol/chloroform extraction and ethanol precipitation. 13. Briefly air dry the DNA fragments. 14. Dissolve them in 40.75 μL of DNase-/RNase-free water. 15. Add 0.25 μL of ExTaq DNA polymerase, 0.4 μL of 100 mM dATP, and 5 μL of 10× ExTaq PCR buffer. 16. Incubate for 30 s at 50 °C and then for 20 min at 72 °C in a thermal cycler. 17. Immediately place the sample tube on ice. 18. Purify the DNA fragments using the QIAamp DNA micro kit (Qiagen) according to the manufacturer’s instructions. Elute with 30 μL of DNase-/RNase-free water. 19. Check the DNA concentration by measuring the absorbance. 20. Gently mix 200 ng of DNA with 1 μL of 100 μM T/A dsDNA adaptor (see Note 2), 1 μL of T4 DNA ligase, 1.5 μL of 10× ligation buffer, and DNase-/RNase-free water in a total volume of 15 μL. 21. Incubate overnight at 16 °C in a thermal cycler. 22. Purify and elute the adaptor-ligated DNA fragments and check the DNA concentration as described in steps 18 and 19 of this section. 23. 
Mix 30 ng of the adaptor-ligated DNA fragments with 1 μL of 100 μM T7-c primer, 1 μL of ExTaq DNA polymerase (Takara), 4 μL of dNTP mixture (Takara), 4 μL of 10× ExTaq PCR buffer (Takara), and DNase-/RNase-free water in a total volume of 50 μL. 24. Amplify the adaptor-ligated DNA fragments by PCR using the following cycle conditions: 94 °C for 5 min; [94 °C for 30 s, 55 °C for 30 s, 72 °C for 90 s] × 15 cycles; and 72 °C for 4 min, followed by a hold at 4 °C (see Note 24). 25. Purify the DNA using a Qiagen PCR purification kit according to the manufacturer’s instructions. Elute with 30 μL of DNase-/RNase-free water. 26. Check the DNA concentration by measuring the absorbance. 27. Also check the size of the amplified DNA fragments using the Agilent 2100 Bioanalyzer (Agilent Technologies) (see Note 25). 3.7 Synthesis of Biotin-Labeled cRNA Using the IVT Reaction 1. Transfer 200 ng of the amplified DNA sample to an RNase-free microfuge tube and add 4 μL of 10× IVT labeling buffer, 12 μL of IVT labeling NTP mix, and 4 μL of IVT labeling enzyme mix. Adjust to a final volume of 40 μL with RNase-free water. 2. Mix gently and spin down to collect the solution. 3. Incubate at 37 °C for 18 h in an air incubator (see Note 26). 4. To clean up the biotin-labeled cRNA (see Note 27), add 60 μL of RNase-free water to the IVT reaction mixture from step 3 and vortex for 3 s. 5. Add 350 μL of IVT cRNA binding buffer (see Note 28) to the sample and mix by vortexing for 3 s. 6. Add 250 μL of ethanol and mix well by pipetting (see Note 29). 7. Apply the 700 μL of sample onto an IVT cRNA cleanup spin column placed in a 2 mL collection tube. Centrifuge for 15 s at 6,000 × g. Discard the flow-through and the collection tube. 8. Transfer the spin column into a new 2 mL collection tube. Apply 500 μL of IVT cRNA wash buffer onto the spin column. Centrifuge for 15 s at 6,000 × g. Discard the flow-through. 9. Apply 500 μL of 80 % ethanol onto the spin column.
Centrifuge for 15 s at 6,000 × g. Discard the flow-through. 10. Open the cap of the spin column and centrifuge for 5 min at 20,000 × g. Discard the flow-through and the collection tube. 11. Transfer the spin column into a 1.5 mL collection tube and apply 11 μL of RNase-free water onto the membrane of the spin column. Centrifuge for 1 min at 20,000 × g to elute the cRNA. 12. Apply another 10 μL of RNase-free water onto the membrane of the spin column, centrifuge for 1 min at 20,000 × g, and collect the eluate. 13. Check the concentration of the biotin-labeled cRNA by measuring the absorbance (see Note 30). 3.8 Fragmentation of the cRNA 1. Prepare the fragmentation reaction containing 45 μg of cRNA (1–21 μL) and 8 μL of 5× fragmentation buffer in a 0.2 mL tube. Adjust to a final volume of 40 μL with DNase-/RNase-free water. 2. Incubate at 94 °C for 35 min in a thermal cycler. Place on ice immediately after the incubation. 3. Check the fragmentation with an Agilent 2100 Bioanalyzer (see Note 31). 3.9 Hybridization 1. Incubate the 20× eukaryotic hybridization controls for 5 min at 65 °C to dissolve the components completely. 2. Prepare the hybridization cocktail. For each target sample, add the following reagents to 15 μg of the fragmented cRNA sample: 5 μL of 3 nM control oligo B2, 15 μL of 20× eukaryotic hybridization controls, 3 μL of 10 mg/mL herring sperm DNA, 3 μL of 50 mg/mL BSA, 150 μL of 2× hybridization buffer, and 30 μL of DMSO. Adjust to a final volume of 300 μL with DNase-/RNase-free water. 3. Allow the tiling array to equilibrate to room temperature (see Note 32). 4. Prehybridize the array by filling it through a septum with 200 μL of 1× hybridization buffer using a micropipette (see Note 33), and incubate the array for 10 min at 45 °C with rotation. 5. Heat the hybridization cocktail at 99 °C for 5 min on a heat block. 6. Transfer the hybridization cocktail to a 45 °C heat block and keep it there for 5 min. 7. 
Centrifuge the hybridization cocktail at 20,000 × g for 5 min to remove any insoluble material. 8. Remove the prehybridization buffer from the array and fill the array with 200 μL of the hybridization cocktail (see Note 34). 9. Incubate the array for 18 h at 45 °C with 60 rpm rotation in the hybridization oven. 3.10 Washing and Staining 1. For each target sample, prepare three tubes: streptavidin phycoerythrin (SAPE) solution for the first stain, antibody solution for the second stain, and SAPE solution for the third stain. For each sample, prepare 1,200 μL of SAPE solution mix containing 600 μL of 2× stain buffer, 48 μL of 50 mg/mL BSA, 12 μL of 1 mg/mL SAPE, and 540 μL of DNase-/RNase-free water. Divide it into two 600 μL aliquots, one for the first stain and one for the third stain (see Note 35). 2. For each sample, prepare 600 μL of antibody solution mix containing 300 μL of 2× stain buffer, 24 μL of 50 mg/mL BSA (see Note 36), 6 μL of 10 mg/mL goat IgG stock, 3.6 μL of 0.5 mg/mL biotinylated antibody, and 266.4 μL of DNase-/RNase-free water. 3. After 18 h of hybridization, remove the hybridization cocktail from the array (see Note 37) and completely fill the array with about 250 μL of non-stringent wash buffer A. 4. Place wash buffer A and wash buffer B in the fluidics station and run the protocol “Prime_450.” 5. Place the SAPE solutions and the antibody solution in the fluidics station. 6. Select the protocol “EuKGE-ws2v4” on the fluidics station. Insert the array into the designated module of the fluidics station and start the run (see Note 38). The washing and staining procedure is as follows: (a) Post-hyb wash #1: 10 cycles of 2 mixes/cycle with wash buffer A at 30 °C. (b) Post-hyb wash #2: 4 cycles of 15 mixes/cycle with wash buffer B at 50 °C. (c) Stain: Stain the array for 10 min in SAPE solution at 35 °C. (d) Post stain wash: 10 cycles of 4 mixes/cycle with wash buffer A at 30 °C. 
(e) Second stain: Stain the array for 10 min in antibody solution at 35 °C. (f) Third stain: Stain the array for 10 min in SAPE solution at 35 °C. (g) Final wash: 15 cycles of 4 mixes/cycle with wash buffer A at 35 °C. The loading temperature is 25 °C. 7. Turn on the scanner approximately 30 min before the end of the protocol (see Note 39). The “Eject” sign appears 1 h 20 min after the start of the run; remove the array at this time (see Note 40). 3.11 Array Scanning 1. Wipe off excess solution around the septum on the back of the array. Cover the septum with a “Tough-Spots” seal, keeping the surface of the seal flat (see Note 41). 2. Scan the array using the 570 nm filter at 0.7 μm resolution on a GeneChip Scanner 3000 7G. When entering the experimental information in GCOS (GeneChip Operating Software) ver. 1.3, select “At35b_MF_v04” for the 1.0F Array. 3.12 Computational Analyses of ChIP-on-Chip Data 1. Obtain the Arabidopsis genome sequence and annotation from the Arabidopsis genome release (ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR∗∗_genome_release/; see Note 42) at The Arabidopsis Information Resource (TAIR). 2. Map the probes of the Affymetrix Arabidopsis whole-genome tiling array (1.0F Array) onto the Arabidopsis genomic sequence. 3. For the analysis of protein enrichment across the Arabidopsis genome, normalize the intensities of all 6.4 million 25 nt oligonucleotide probes for one strand of the genomic sequence (3.2 million perfect match (PM) and 3.2 million mismatch (MM) probes) across the individual replicates of all samples simultaneously by quantile normalization [23] (see Notes 43–45). 4. Calculate the signal intensities and genomic positions using the MAS5.0 algorithm (Affymetrix). 5. 
Normalize the data between ChIPed DNA and Input DNA using a rank consistency filter, which selects representative probes whose intensity rank order is stable between the two experiments (see Note 46). 6. Analyze the enrichment of histone H4 tetra-acetylation at the genome-wide level. 4 Notes 1. Suitable genes must be chosen as internal controls for the ChIP assay to detect enrichment of the target protein. To determine the enrichment of histone H4 tetra-acetylation, we use the ACT7 region as an internal control for the ChIP assay. 2. The T/A dsDNA adaptor (Fig. 3) is designed to increase ligation efficiency through T/A ligation and to allow amplification of the fragments by in vitro transcription (IVT) with T7 RNA polymerase [24]. The forward and reverse strand oligos are annealed very slowly in vitro to make dsDNA. The annealed dsDNA is then purified by PAGE, extracting the band that corresponds to correctly annealed dsDNA. We ordered the annealing and purification of the dsDNA from Sigma-Aldrich Japan. The quality of the purified dsDNA directly affects the efficiency of the subsequent ligation and IVT reaction. 3. Information on the tiling array platform can be found in the Gene Expression Omnibus (GEO) at NCBI (http://www.ncbi.nlm.nih.gov/geo/). The 1.0F array comprises 25 nt oligonucleotides chosen from the reverse-strand genomic sequence; its sequence information is available as platform GPL1980. Each Affymetrix Arabidopsis whole-genome tiling array (1.0F Array) contains 6.4 million 25 nt oligonucleotide probes [18]: 3.2 million perfect match (PM) probes that perfectly match the genomic sequence and 3.2 million mismatch (MM) probes whose central base (position 13 of 25) is replaced by its complement. 4. These tips have an ultrathin end. Fig. 3 T/A dsDNA adaptor (see Note 2) 5. 
To obtain well-enriched signals in the ChIP assay and to minimize variation between experiments, use at least 1 g (fresh weight) of plant material per sample. Moreover, avoid freeze–thaw steps as much as possible, because freeze–thaw treatment dissociates very weak direct protein binding and indirect protein interactions. 6. To prevent plants from escaping or sticking to the sides of the beaker as the fixation buffer bubbles, carefully control the rate at which the vacuum forms using the three-way valve connected to the vacuum pump. 7. Well-fixed plants sink to the bottom of the beaker and become darker in color. 8. Well-fixed plants should appear “crispy.” 9. Prechill the stainless steel tubes, tungsten balls, aluminum tube holder, and 50 mM HEPES buffer on ice before use. 10. To prevent the samples from blowing out of the tubes, replace the O-ring on the metal lid before every experiment and eliminate air bubbles from the parafilm seal on top of the tubes. 11. The minimum sample volume for effective sonication is 15 mL; with smaller volumes, foaming occurs during sonication. 12. Tubes must be kept in ice water. 13. During sonication, keep the samples in ice water to prevent warming and foaming. 14. Fragment sizes should range from 150 to 500 bp, peaking at around 250–300 bp. Minor amounts of longer fragments, usually up to 1,000 bp, are also produced. 15. Wash the magnetic beads three times with 150 mM lysis buffer just before use. 16. The amount of antibody needed to detect protein–chromatin interactions depends on the antibody titer. Check the titer of each antibody and optimize the amount added using the ChIP-PCR assay. 17. Follow the kit instructions.
Using other DNA purification columns (e.g., the Qiagen MinElute column) in this step is not recommended, because their recovery of low concentrations of Arabidopsis genomic DNA fragments is poor. 18. Use DNase-/RNase-free water, rather than the TE or AE buffer provided with the kit, to elute the purified DNA from the column; carry-over of excess salts inhibits the small-scale PCR and adaptor ligation reactions. 19. To evaluate ChIP efficiency, design primers that amplify a region of DNA known to be enriched in the sample as follows: Tm 58–62 °C, primer length ~25 nt, PCR product length 100–250 bp. The primer sequences should be specific to the target region of interest. The ACT7 region is used as an internal control in the multiplex PCR to analyze enrichment of histone H4 acetylation. 20. To ensure reliable quantification, limit the number of PCR cycles to fewer than 27. 21. Adjust the volume of PCR product loaded onto the gel so that the bands are not saturated. 22. Measure the density of each band using the “Histogram Peak” tool in the ImageQuant software. Calculate the ratio of enrichment using the following formula: ratio of enrichment = {(band density of the target region in ChIPed DNA)/(band density of the ACT7 region in ChIPed DNA)}/{(band density of the target region in Input DNA)/(band density of the ACT7 region in Input DNA)}. 23. ChIPed DNA prepared by these steps can also be used as template DNA for ChIP-seq analysis. To amplify template DNA for ChIP-seq, we recommend the amplification and sequencing procedures optimized by the manufacturer of the high-throughput sequencer being used. 24. To ensure linear PCR amplification, limit the number of PCR cycles to fewer than 15. 25. 
For correctly amplified DNA fragments, the main peak shifts from 250–300 bp to 320–370 bp. 26. If the biotin-labeled cRNA is not cleaned up immediately, store it at −20 or −70 °C. 27. Perform the cleanup of the biotin-labeled cRNA at room temperature. 28. If precipitates have formed in the IVT cRNA binding buffer, warm it to 30 °C and then keep it at room temperature before use. 29. Do not centrifuge the samples after mixing. 30. More than 30 μg of biotinylated cRNA should be obtained, with an A260/A280 ratio between 1.9 and 2.1. 31. The RNA fragments should range from 35 to 200 nt. Store the fragmented cRNA samples at −70 °C until use for hybridization. 32. When the tiling array has just been brought from cold storage to room temperature, the rubber septa are still hard and can easily crack. 33. Use the HR-250S pipette tips to push through the septum and fill the array with hybridization buffer. Note that a cracked septum causes deposition of hybridization buffer. 34. Do not transfer the insoluble material at the bottom of the tube, and do not introduce bubbles into the array. 35. Mix the SAPE solution thoroughly by tapping before use. 36. Centrifuge the BSA, IgG, and antibody stocks and use the supernatants to prepare the antibody solution mix. 37. If less than 170 μL of hybridization cocktail is recovered, the center of the array might not have been filled with the cocktail. 38. Make sure that the buffer moves up and down in the array. If bubbles stay in the same position, stop the run and manually refill the array with wash buffer A. If the wrong buffer is used, the run stops. 39. The scanner should be warmed up for at least 15 min before scanning. 40. Check whether bubbles remain in the array. If they do, reinsert the array into the cartridge holder and repeat the washing and staining. 
However, excessive washing reduces the signal intensity of each probe. Scan the array immediately after washing; arrays waiting to be scanned should be kept in the dark at room temperature. 41. This step prevents leakage of the solution during scanning. 42. Using the latest version of the Arabidopsis genome annotation is recommended. At the time of writing, the latest version is TAIR10 (ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR10_genome_release/). 43. Because our preliminary analyses using (PM − MM) intensities gave better results for identifying stress-responsive genes than analyses using PM intensities alone, we used (PM − MM) intensities for the analyses [25]. 44. In our tiling array analysis, the following probes were excluded from the data analysis: (1) PM probes that perfectly matched more than two positions and (2) MM probes that perfectly matched positions other than their original ones. 45. After quantile normalization, the intensities of all replicates of the different samples have a common median. All normalized intensities for each expressed spot are then averaged over the replicates of the same sample to obtain a single value. 46. We analyzed only the probes within the 90th percentile, because probes with dark (no signal) or saturated signals should not be counted even if their rank order is consistent. To detect and visualize the ChIP-on-chip results, we applied smoothing with a Parzen window function of 250 bp width. The appropriate width depends on the binding features of the target protein. Acknowledgements This research was supported by a Grant-in-Aid for Scientific Research (Priority Areas no. 20127033 and 23012036; Innovative Areas 23119522) from the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan (to MS) and by grants from the RIKEN Plant Science Center (to MS). References 1. 
Wolffe AP (1998) Packaging principle: how DNA methylation and histone acetylation control the transcriptional activity of chromatin. J Exp Zool 282:239–244 2. Jenuwein T, Allis CD (2001) Translating the histone code. Science 293:683–692 3. Kurdistani SK, Grunstein M (2003) Histone acetylation and deacetylation in yeast. Nat Rev Mol Cell Biol 4:276–284 4. Nightingale KP, O’Neill LP, Turner BM (2006) Histone modifications: signaling receptors and potential elements of a heritable epigenetic code. Curr Opin Genet Dev 16:125–136 5. Kouzarides T (2007) Chromatin modifications and their function. Cell 128:693–705 6. Bhaumik SR, Smith E, Shilatifard A (2007) Covalent modifications of histones during development and disease pathogenesis. Nat Struct Mol Biol 14:1008–1016 7. Bártová E et al (2008) Histone modifications and nuclear architecture: a review. J Histochem Cytochem 56:711–721 8. Pfluger J, Wagner D (2007) Histone modifications and dynamic regulation of genome accessibility in plants. Curr Opin Plant Biol 10:645–652 9. To TK et al (2011) Arabidopsis HDA6 is required for freezing tolerance. Biochem Biophys Res Commun 406:414–419 10. Sokol A et al (2007) Up-regulation of stress-inducible genes in tobacco and Arabidopsis cells in response to abiotic stresses and ABA treatment correlates with dynamic changes in histone H3 and H4 modifications. Planta 227:245–254 11. Kim JM et al (2008) Alterations of lysine modifications on the histone H3 N-tail under drought stress conditions in Arabidopsis thaliana. Plant Cell Physiol 49:1580–1588 12. Kim JM et al (2010) Chromatin regulation function in plant abiotic stress responses. Plant Cell Environ 33:604–611 13. Kwon CS et al (2009) Histone occupancy-dependent removal of H3K27 trimethylation at cold-responsive genes in Arabidopsis. Plant J 60:112–121 14. Katou Y et al (2003) S-phase checkpoint proteins Tof1 and Mrc1 form a stable replication-pausing complex. Nature 424:1078–1083 15. 
Cawley S et al (2004) Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116:499–509 16. Katou Y et al (2006) Genomic approach for the understanding of dynamic aspect of chromosome behavior. Methods Enzymol 409:389–410 17. Lee TI, Johnstone SE, Young RA (2006) Chromatin immunoprecipitation and microarray-based analysis of protein location. Nat Protoc 1:729–748 18. Zhang X et al (2006) Genome-wide high-resolution mapping and functional analysis of DNA methylation in Arabidopsis. Cell 126:1189–1201 19. Zilberman D et al (2006) Genome-wide analysis of Arabidopsis thaliana DNA methylation uncovers an interdependence between methylation and transcription. Nat Genet 39:61–69 20. Zhang X et al (2007) The Arabidopsis LHP1 protein colocalizes with histone H3 Lys27 trimethylation. Nat Struct Mol Biol 14:869–871 21. Lee J et al (2007) Analysis of transcription factor HY5 genomic binding sites revealed its hierarchical role in light regulation of development. Plant Cell 19:731–749 22. Morohashi K, Grotewold E (2009) A systems approach reveals regulatory circuitry for Arabidopsis trichome initiation by the GL3 and GL1 selectors. PLoS Genet 5:e1000396 23. Bolstad BM et al (2003) A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19:185–193 24. Liu CL, Schreiber SL, Bernstein BE (2003) Development and validation of a T7 based linear amplification for genomic DNA. BMC Genomics 4:19 25. Matsui A et al (2008) Arabidopsis transcriptome analysis under drought, cold, high-salinity and ABA treatment conditions using tiling array. Plant Cell Physiol 49:1135–1149 Part V Cell Biological Techniques Chapter 23 Fluorescence Microscopy Sébastien Peter, Klaus Harter, and Frank Schleifenbaum Abstract Optical microscopy has become an indispensable tool for Arabidopsis cell biology.
This is due to its high sensitivity, good spatial resolution, and minimal invasiveness, and to the availability of autofluorescent proteins, which can be specifically fused to a distinct protein of interest. In this chapter, we introduce the theoretical concepts of fluorescence emission necessary to accomplish quantitative and functional cell biology using optical microscopy. The main focus lies on spectroscopic techniques, which, in addition to intensity-based studies, provide functional insight into cellular processes.
Key words Fluorescence microscopy, Spectromicroscopy, FRET, Autofluorescent proteins, Fluorescence sensors

1 Introduction
Modern plant science research on systems such as Arabidopsis aspires to a precise understanding of the molecular processes controlling the function of the plant at the subcellular level. To this end, diverse techniques are available, ranging from genetics through biochemical approaches to electron microscopy with atomic resolution. Despite their huge potential, these methods drastically interfere with the native functionality of a living system, so that true in vivo studies are difficult to achieve with them. This is where optical microscopy comes into play. As noninvasive techniques that are also applicable to cells in their native tissue, optical approaches allow the undisturbed observation of cellular function [1]. With a diffraction-limited spatial resolution of around 200 nm, subcellular structures become accessible and different functional compartments inside a cell can be distinguished. Moreover, the information content of an optical measurement can be drastically enhanced when fluorescence emission is used to create the microscopy image: only those areas of a sample that host specific fluorescent dyes become visible. This chapter focuses on fluorescence microscopy and presents different readout modes and specific notes on experimental parameters.

Jose J. Sanchez-Serrano and Julio Salinas (eds.), Arabidopsis Protocols, Methods in Molecular Biology, vol. 1062, DOI 10.1007/978-1-62703-580-4_23, © Springer Science+Business Media New York 2014

Fluorescence emission differs from other optical techniques in that the emission of a fluorescence dye is red-shifted relative to the excitation wavelength [2]. As a result, fluorescence is, in principle, detected against a zero background: using designated optical filters, only light actively emitted by a fluorophore contributes to the microscopy image. This property makes fluorescence microscopy one of the most sensitive techniques known so far, allowing even the observation of single isolated molecules [3]. To exploit the full potential of fluorescence microscopy, it is essential to understand the origin of fluorescence emission. A short introduction to this field is provided in the following; for further reading, we refer to more specialized literature [2, 4–7]. Fluorescence emission is due to an electronic quantum transition in an electronically excited molecule. The principal processes can be summarized in a Jablonski diagram as depicted in Fig. 1. According to Boltzmann statistics, at room temperature a molecule is, to a good approximation, in both its electronic (S0) and its vibronic ground state. If this molecule interacts with electromagnetic radiation whose energy matches the energy gap between the electronic ground state and some vibronic states of the first electronically excited state (S1), the molecule undergoes a transition between these states within a few femtoseconds (absorption). After this excitation, the molecule loses part of the excitation energy thermally, through vibrations and collisions with adjacent molecules. This process, commonly referred to as thermal equilibration (TE), occurs on a sub-picosecond timescale.
After TE, the molecule remains in the vibronic ground state of the electronically excited state for a certain time until further relaxation into the electronic ground state occurs. Here, two competing mechanisms have to be distinguished: a non-radiative one, which is not accompanied by light emission, and a radiative one, which is commonly referred to as fluorescence. The probability of these relaxation processes can be expressed by the overall relaxation rate $\Gamma = \Gamma_{\mathrm{rad}} + \Gamma_{\mathrm{nonrad}}$. The radiative rate $\Gamma_{\mathrm{rad}}$ determines the fluorescence photon flux, while the reciprocal of the overall rate, $\tau = 1/\Gamma = 1/(\Gamma_{\mathrm{rad}} + \Gamma_{\mathrm{nonrad}})$, represents the typical time span a molecule remains in the electronically excited state before it returns to the ground state. This so-called fluorescence lifetime (FLT) is an important spectroscopic parameter, which provides valuable insight into functional cell biology, as will be discussed later on.
Fig. 1 Jablonski diagram for a schematic illustration of quantum transitions during fluorescence emission
The radiative transition can end in any vibronically excited level of the S0 state. This gives rise to the shape of fluorescence spectra, which do not consist of a single line but are composed of a number of broad bands. Each of these bands corresponds to a transition to a vibronic state from which, in turn, non-radiative TE occurs. Albeit rather broad, the fluorescence spectrum of a given molecule is characteristic and can be used to identify a distinct emitter among others. Moreover, fluorescence emission is not solely a property of the fluorescence dye but is also influenced by its local chemical nano-environment. As a consequence, fluorescence spectroscopic information can be used to probe the local surroundings of a fluorophore and can thus report distinct changes in physicochemical parameters such as the pH or the redox potential.
Before these options are discussed, a basic introduction to modern fluorescence microscopes is given [8]. Note that, in addition to the processes discussed above, the molecule can also undergo a transition to a different electron spin configuration, the triplet state, via intersystem crossing (ISC). From this state, delayed emission can occur, which is commonly referred to as phosphorescence. However, as phosphorescence does not occur to a significant extent for the fluorescence dyes used in cell biology, this effect will not be treated further.

2 Confocal Microscopy
FLT and fluorescence spectra are the most prominent spectroscopic characteristics that can be read out with spatial resolution in addition to a fluorescence intensity image. It is therefore crucial to record the spectroscopic information in a very well-defined spatial area and to exclude cross talk from other regions. Confocal laser scanning microscopy (CLSM) is one very prominent approach to reach this goal [1]. The basic principle of CLSM, which was invented by Marvin Minsky in 1957, is straightforward and consists of three confocal spots for (1) excitation, (2) light collection, and (3) detection, as depicted schematically in Fig. 2a [9]. Owing to the confocal arrangement, only one highly confined sample area, the focal spot, is irradiated, and fluorescence light is collected only from this defined area. The third confocal element, which typically consists of a pinhole, blocks any light that does not originate from the focal spot. Moreover, in contrast to a conventional microscope, a confocal image is confined to a well-defined image plane.
Fig. 2 (a) Scheme of the confocal principle. The indices i, ii, and iii refer to the respective confocal focusing elements. (b) Confocal beam path using one focusing element for both excitation and collection. (c) Scheme of a typical confocal setup
Light that can pass the pinhole in x- and y-direction but is not tightly focused in z-direction (see Fig. 2c for axis assignment) results in a broadened spot in the pinhole plane. Accordingly, its intensity is spread over a broader area and only a small fraction is directed to the detector. These considerations yield the basic concept of point-to-point imaging. This, in turn, requires raster scanning of the focal spot relative to the sample and reassembly of the intensity information for each image point (i.e., pixel) to obtain a microscopy image. The scheme depicted in Fig. 2a is not very convenient, though, mainly because the maximum thickness of the sample is limited and because alignment is difficult, as two foci from different lenses have to be aligned to exactly the same spot. Hence, typical confocal configurations use the same lens for both excitation and detection. This is achieved by introducing a dichroic beam splitter into the beam path, as shown in Fig. 2b. This component reflects light below a certain cutoff wavelength and transmits radiation of longer wavelengths. In this way, the excitation light is effectively directed onto the sample, while the fluorescence light can pass to the detector undisturbed. Current high-end confocal setups use acousto-optical beam splitters instead of dichroic mirrors. Offering basically the same functionality, they are highly flexible in varying the cutoff wavelength and can be adjusted electronically to any fluorescent dye system without the need to change optical parts. Figure 2c shows a scheme of a typical confocal setup, equipped with a spectrally integrating detector and an additional spectrometer attached to a CCD camera, which allows the confocal acquisition of fluorescence spectra.
As the blocking efficiency of the dichroic beam splitter is too low to suppress all excitation light reflected from the sample, additional long-pass or band-pass filters are introduced in the detection beam path. One important parameter in basically any microscopy application is the spatial resolution Δd. For optical microscopy, the resolution is limited by the diffraction of light waves in the focal spot according to
$$\Delta d = \frac{\lambda}{2\,n \sin\theta} = \frac{\lambda}{2\,\mathrm{NA}} \qquad (1)$$
with λ being the wavelength, θ the half opening angle of the focusing lens, and n the refractive index of the medium between lens and sample [10]. For convenience, the product of n and sin θ is referred to as the numerical aperture (NA) of the focusing element. Equation 1 shows that the optical resolution is physically limited and directly depends on the wavelength. Moreover, the NA also limits the spatial resolution. As a consequence, microscope objectives with a high numerical aperture are typically used as focusing elements in confocal microscopy. Theoretically, the maximum value of the NA in air would be 1, but this would require an opening angle of 180°, which cannot be achieved with lens systems. However, the NA can effectively be enhanced by introducing immersion liquids. These are substances that are highly transparent in the optical spectral range, to provide maximum transmission of fluorescence light, and that have a refractive index significantly higher than 1. Typical immersion liquids are water (n = 1.33) and specific immersion oils (n = 1.51). In this way, microscope objectives with an NA of 1.35 and a magnification factor of 100 are available, and even objectives with NA = 1.46 are used for some special applications [9, 11]. See Note 7 for a short guide to selecting a suitable objective for a distinct experiment. Using Eq. 1, one obtains a maximal resolution of Δd ≈ 185 nm for blue light (500 nm) and an objective with NA = 1.35.
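Equation 1 is simple enough to evaluate directly. The following minimal Python sketch (the function names are illustrative and not part of any microscope software) reproduces the 185 nm estimate given above and shows how the resolution degrades for lower NA values; it covers only the diffraction limit of Eq. 1 and ignores pinhole size, aberrations, and axial resolution.

```python
import math


def numerical_aperture(n: float, half_angle_deg: float) -> float:
    """NA = n * sin(theta), theta being the half opening angle of the lens."""
    return n * math.sin(math.radians(half_angle_deg))


def lateral_resolution_nm(wavelength_nm: float, na: float) -> float:
    """Diffraction-limited resolution according to Eq. 1: lambda / (2 * NA)."""
    if na <= 0:
        raise ValueError("NA must be positive")
    return wavelength_nm / (2.0 * na)


if __name__ == "__main__":
    # Objective from the text (NA = 1.35) compared with illustrative lower NAs
    for na in (1.35, 1.2, 0.95, 0.5):
        print(f"NA = {na:4.2f}: delta d = {lateral_resolution_nm(500.0, na):5.1f} nm")
    # A dry lens cannot exceed NA = 1, even at a 90 degree half opening angle
    print(f"NA in air at theta = 90 deg: {numerical_aperture(1.0, 90.0):.2f}")
```

Keeping the NA as an explicit argument makes it easy to compare objectives before choosing one for a given experiment.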
It is important to note that the magnification of an objective does not directly influence the obtainable spatial resolution. For a lower magnification, only the size of the confocal pinhole, which has to match the image of the focal spot Δd in the pinhole plane, has to be adapted accordingly. Low magnification factors are nevertheless avoided, because a lower magnification is typically accompanied by a larger working distance between objective and sample, which translates into a smaller NA. Considerable efforts have recently been made to circumvent the diffraction-limited optical resolution, and different methods such as photoactivated localization microscopy [PALM, also referred to as stochastic optical reconstruction microscopy (STORM)] [12–15], stimulated emission depletion (STED) microscopy [16, 17], and super-resolution optical fluctuation imaging (SOFI) [18] have been established. These techniques are very powerful and bring optical microscopy close to the molecular level. Yet, there are hardly any applications to plant cells so far, mainly because of properties of plant cells such as their strong autofluorescence, which hampers the detection of single emitters inherent in PALM/STORM, and because the high laser powers required in STED deposit too much energy in the tissue. Hence, these techniques are not discussed in this chapter, and we refer to more specialized literature.

3 Fluorescence Readout Modes
Besides the intensity information, the fluorescence emission also carries information about the local environment of the fluorophore [8]. In the following section, the nature of this information, and how it can be measured, is introduced. The determination of the FLT τ is of special interest [4, 19]. This value characterizes the probability that an excited dye molecule emits a photon within a given time window after excitation. Any disturbance of the excited state of the molecule will result in a change of this probability [2].
The origins of these disturbances are manifold, ranging from mechanical stress to changes in the refractive index, the local pH value, or the electric field [20–22]. As fluorescence dyes differ in their sensitivity to changes of these parameters, the FLT of a suitably chosen dye can be used as a valuable local probe. The decay of the excited electronic state is significantly slower than processes like thermal equilibration, but is still typically in the nanosecond range. Hence, fairly sophisticated data acquisition techniques and electronics are required. The most prominent approach to determine τ, which is used in the vast majority of confocal fluorescence microscopes, relies on the statistical analysis of photon arrival times. This time-correlated single-photon counting (TCSPC) uses a short laser pulse, with pulse lengths in the picosecond range well below the FLT, to locally excite the sample. Synchronously, an electronic stopwatch is started, which is stopped by the first fluorescence photon detected by a highly sensitive detector such as an avalanche photodiode (APD) or a photomultiplier tube (PMT) operated in photon-counting mode. The arrival time is translated into a discrete time value by an analogue-to-digital converter and histogrammed. The procedure is repeated many thousands of times, resulting in a fluorescence decay histogram [23]. As the histogram describes the probability of the metastable excited state to decay into the ground state, it can be treated mathematically like a radioactive decay or a chemical reaction with first-order kinetics. Hence, the time evolution of the fluorescence intensity $I(t)$ follows an exponential decay with a time constant τ, defined as the time span after which the intensity has dropped to 1/e of its initial value $I_0$.
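The TCSPC procedure can be mimicked with a short simulation, which may help to see why a histogram of many single start–stop intervals reproduces the exponential decay. The following Python sketch is illustrative only: it assumes an ideal instrument (no instrument response function, no background, no dead time or pile-up), a single emitter species with an assumed lifetime of 2.5 ns, and estimates τ with a count-weighted linear fit of the logarithm of the histogram, i.e., using the monoexponential model of Eq. 2. The function names are hypothetical.

```python
import math
import random


def simulate_tcspc(tau_ns, n_photons, bin_width_ns=0.05, window_ns=25.0, seed=1):
    """Histogram of simulated start-stop times for an ideal TCSPC measurement."""
    rng = random.Random(seed)
    n_bins = int(window_ns / bin_width_ns)
    counts = [0] * n_bins
    for _ in range(n_photons):
        # First-order kinetics: arrival times are exponentially distributed
        t = rng.expovariate(1.0 / tau_ns)
        k = int(t / bin_width_ns)
        if k < n_bins:  # photons arriving after the time window are discarded
            counts[k] += 1
    return counts


def fit_lifetime(counts, bin_width_ns):
    """Weighted least-squares fit of ln(counts) vs. time; slope = -1/tau (Eq. 2).

    Each bin is weighted by its counts, because for Poisson noise the
    variance of ln(counts) is roughly 1/counts. Empty bins are skipped.
    """
    pts = [((k + 0.5) * bin_width_ns, math.log(c), c)
           for k, c in enumerate(counts) if c > 0]
    w_sum = sum(w for _, _, w in pts)
    t_mean = sum(w * t for t, _, w in pts) / w_sum
    y_mean = sum(w * y for _, y, w in pts) / w_sum
    s_ty = sum(w * (t - t_mean) * (y - y_mean) for t, y, w in pts)
    s_tt = sum(w * (t - t_mean) ** 2 for t, _, w in pts)
    return -s_tt / s_ty  # tau = -1 / slope, with slope = s_ty / s_tt


if __name__ == "__main__":
    histogram = simulate_tcspc(tau_ns=2.5, n_photons=100_000)
    # Should be close to the simulated 2.5 ns
    print(f"estimated lifetime: {fit_lifetime(histogram, 0.05):.2f} ns")
```

Real FLIM software usually fits the decay directly (often by reconvolution with the instrument response); the log-linear fit is used here only because it maps one-to-one onto the rearranged form of Eq. 2.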
The fluorescence decay histogram can therefore be described by
$$I(t) = I_0 \exp\left(-\frac{t}{\tau}\right) \qquad (2)$$
Equation 2 can be rearranged to read $\tau = t/\ln\left(I_0/I(t)\right)$, so that the FLT can be extracted directly. Knowledge of the FLT can give useful information about the local environment of the chromophore under investigation. See Notes 4–6 for experimental settings and possible pitfalls in FLT measurements.
Fluorescence microscopy of Arabidopsis and other plant cells generally suffers from autofluorescence background. The strong emission from chloroplasts is confined to the red spectral region and can hence be filtered out by short-pass or band-pass filters with acceptable spectral bleed-through. In contrast, unspecific emission from other compartments, and especially from the cell wall, exhibits a strong spectral and temporal overlap with the emission properties of typical fluorescence dyes. As a consequence, this autofluorescence contribution cannot easily be removed with conventional filters, and different approaches have to be developed to discriminate the autofluorescence from the specific label signal. One very robust approach uses the statistics of the species contributing to a local fluorescence decay recorded with a standard TCSPC fluorescence lifetime imaging microscopy (FLIM) setup. This fluorescence intensity decay shape analysis microscopy (FIDSAM) offers a robust means to discriminate background from target emission [24–26]. To this end, the shape of the fluorescence decay is compared to a reference function, i.e., a monoexponential fit function. If only the pure label dye contributes to the measured fluorescence, the decay signal is well described by the reference function, and the resulting error value, representing the deviation of the fitted curve from the experimental curve, is small. In contrast, autofluorescent tissue contains a multitude of unspecific emitters, each of which exhibits its individual fluorescence decay statistics.
As a consequence, the recorded fluorescence decay represents the sum of a large number of decay statistics and thus becomes multiexponential. Describing this multiexponential decay with the reference function results in relatively large error values. The error value therefore provides a quantitative measure of the autofluorescence contribution to a signal recorded with spatial resolution. Hence, multiplying the original intensity value of an image pixel by the inverse error value attenuates this pixel in the FIDSAM-corrected image according to its autofluorescence contribution. The most prominent feature of the FIDSAM technique is its robustness and its applicability to basically any label dye without prior assumptions. The basic concept rests on the reasonable assumption that the brightness of a well-designed dye molecule, which is the product of its extinction coefficient at a given excitation wavelength and its fluorescence quantum yield, exceeds the brightness of an arbitrary autofluorescent biomolecule. Accordingly, the impact of a single molecule contributing to the autofluorescence background is small compared to that of a fluorescence dye. In this way, the relative contribution to the local fluorescence decay is higher for (bright) fluorescence dyes, and hence, as soon as these compounds contribute to a measured fluorescence decay, they dominate the shape of the decay curve.

3.1 Optical Protein–Protein Interaction Studies
Besides localization studies, which clarify the appearance of a distinct protein of interest in a certain cellular compartment, the investigation of the interaction of two or more proteins is a major field in modern cell and molecular biology. Protein–protein interactions are generally regarded as major players in signal transduction, regulation of enzymatic activities, gene regulation, etc.
Protein–protein interactions can initiate different phosphorylation states, can block distinct binding sites in a competitive or noncompetitive fashion, or can directly influence transcription factors and, hence, the regulation of gene expression [27]. The identification and quantification of protein–protein interactions in the living cell context are, therefore, of high interest. Unfortunately, mere imaging of protein distributions cannot reveal such interactions because of the limited spatial resolution of ~200 nm. Nevertheless, optical fluorescence microscopy offers two concepts that circumvent this restriction and combine the advantages of optical microscopy, such as noninvasiveness and dynamic readouts, with the possibility to identify molecular interactions on a nanometer scale.

3.1.1 Fluorescence Resonance Energy Transfer
Fluorescence resonance energy transfer (FRET), which was first described by Theodor Förster in 1948 [28], exploits the distance dependence of the electromagnetic coupling of two dye molecules in the optical near field. FRET requires a pair of dyes consisting of a “donor” and an “acceptor.” The dyes are chosen such that the absorbance spectrum of the acceptor reasonably overlaps with the fluorescence emission spectrum of the donor. Provided that the dyes are closely adjacent and properly oriented with respect to each other, energy that has been used to excite the donor can be transferred non-radiatively to the acceptor. Owing to energy conservation, this energy transfer causes the donor to be quenched into the electronic ground state while the acceptor is promoted to the excited state. The acceptor may return to the ground state in a conventional manner, such as by fluorescence.
As will be discussed later on, there are different techniques to determine the FRET efficiency, based on intensity readouts or using time-domain techniques (FRET-FLIM). A theoretical treatment of FRET considers the two dyes as oscillating dipoles acting as transmitting and receiving antennas [2]. This treatment yields a set of equations that account for the interchromophoric distance $r$ of the two emitters, the orientation factor $\kappa^2$ describing the relative orientation of their transition dipole moments, the fluorescence quantum yield $Q_D$ of the donor chromophore, and the overlap integral $J(\lambda)$ of donor emission and acceptor absorption, and describe the energy transfer efficiency $E$ according to
$$E = \frac{R_0^6}{R_0^6 + r^6} \qquad (3)$$
with
$$R_0^6 = \frac{9000\,(\ln 10)\,\kappa^2 Q_D}{128\,\pi^5 N n^4}\,J(\lambda) \qquad \text{and} \qquad J(\lambda) = \frac{\int_0^\infty F_D(\lambda)\,\varepsilon(\lambda)\,\lambda^4\,d\lambda}{\int_0^\infty F_D(\lambda)\,d\lambda}$$
where N is Avogadro’s constant, n the refractive index, $F_D(\lambda)$ the donor emission spectrum, and $\varepsilon(\lambda)$ the molar extinction coefficient of the acceptor. The Förster radius $R_0$ is a system constant and describes the interchromophoric distance at which 50 % of the excitation energy is transferred from the donor to the acceptor. This parameter is of practical relevance, as it is commonly used to characterize a FRET pair. Equation 3 reveals an inverse sixth-power dependence of the FRET efficiency on distance. As a consequence, FRET is most effective for chromophores in close contact and typically decays to less than 1 % within 10–15 nm, depending on the actual dye system. Hence, FRET can be used as a nanoruler to determine interchromophoric distances and their changes on a length scale two orders of magnitude smaller than the diffraction-limited optical resolution. Accordingly, two interacting proteins in close proximity that are labeled with an appropriate donor–acceptor pair will give rise to FRET, whereas proteins that are located in the same compartment but do not interact will exhibit much less or no FRET activity [29–32].
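The steep distance dependence of Eq. 3 is easiest to appreciate numerically. The Python sketch below evaluates Eq. 3 and its inversion, r = R0 (1/E - 1)^(1/6), for an assumed, purely illustrative Förster radius of 5 nm; real values of R0 depend on the dye pair and have to be taken from the literature or computed from the expression for R0^6 above. The function names are hypothetical.

```python
def fret_efficiency(r_nm: float, r0_nm: float) -> float:
    """Energy transfer efficiency according to Eq. 3: R0^6 / (R0^6 + r^6)."""
    return r0_nm**6 / (r0_nm**6 + r_nm**6)


def fret_distance(efficiency: float, r0_nm: float) -> float:
    """Inverse of Eq. 3: r = R0 * (1/E - 1)^(1/6), defined for 0 < E < 1."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


if __name__ == "__main__":
    R0_NM = 5.0  # assumed Förster radius, for illustration only
    for r in (2.5, 5.0, 7.5, 10.0, 12.5):
        print(f"r = {r:4.1f} nm -> E = {fret_efficiency(r, R0_NM):.3f}")
    # Distance at which the transfer efficiency has dropped to 1 %
    print(f"E = 1 % at r = {fret_distance(0.01, R0_NM):.1f} nm")
```

With R0 = 5 nm, E falls to 1 % at about 10.8 nm (roughly 2.15 R0), in line with the 10–15 nm range quoted above.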
Whereas qualitative FRET studies can distinguish interacting from noninteracting proteins, quantitative FRET uses the full potential of the method to determine actual interchromophoric distances with nanometer accuracy. For example, different binding domains that lead to a different composition of a protein dimer can be differentiated. Several approaches are known to date to determine the FRET efficiency and thereby the interchromophoric distance. In the following, the most common approaches and their limitations are presented. The most straightforward approach relies on an intensity-based data evaluation, using the quenched donor emission $F_{DA}$ relative to the donor fluorescence $F_D$ in the absence of an acceptor:
$$E = 1 - \frac{F_{DA}}{F_D} \qquad (4)$$
The main restriction of this approach is that it requires knowledge of the unquenched donor emission intensity [2, 33]. This parameter is not always accessible, especially because intensities measured in different plant cells are often not comparable. Another intensity-based approach for FRET detection relies on sensitized acceptor emission, where the detection channel is chosen to match the acceptor emission wavelength. This approach often suffers from cross talk caused by donor emission leaking into the acceptor channel and by direct excitation of the acceptor. Moreover, quantification is difficult, as the absolute acceptor emission intensity for the highest FRET efficiency, E = 1, remains unknown. The two intensity-based methods can be significantly improved when two detectors, matching the emission of the donor and the acceptor, respectively, are installed. With this configuration, the FRET efficiency can be determined directly by a ratiometric measurement according to
$$E = \frac{F_A/\phi_A}{F_{DA}/\phi_D + F_A/\phi_A} \qquad (5)$$
with $F_A$ as the acceptor fluorescence intensity and $F_{DA}$ as the donor intensity in the presence of the acceptor; $\phi_A$ and $\phi_D$ represent the fluorescence quantum yields of the acceptor and the donor, respectively.
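Both intensity-based expressions can be evaluated directly from background-corrected intensities. The sketch below implements Eq. 4 and Eq. 5 in Python; it assumes that cross talk and background have already been removed and that both detection channels have equal detection efficiency, which in practice requires calibration. The function names are hypothetical, and the quantum yields and intensities in the example are invented numbers chosen so that both formulas describe the same hypothetical sample with E = 0.3.

```python
def efficiency_donor_quenching(f_da: float, f_d: float) -> float:
    """Eq. 4: donor intensity with acceptor (F_DA) vs. without acceptor (F_D)."""
    return 1.0 - f_da / f_d


def efficiency_ratiometric(f_a: float, f_da: float, phi_a: float, phi_d: float) -> float:
    """Eq. 5: ratiometric FRET efficiency from the donor and acceptor channels."""
    transferred = f_a / phi_a       # excitations passed on to the acceptor
    not_transferred = f_da / phi_d  # excitations that remained on the donor
    return transferred / (not_transferred + transferred)


if __name__ == "__main__":
    # Hypothetical sample: 1000 excitations, E = 0.3, phi_D = 0.6, phi_A = 0.8
    # -> donor alone emits 600 photons, donor with acceptor 700 * 0.6 = 420,
    #    acceptor emits 300 * 0.8 = 240
    print(efficiency_donor_quenching(f_da=420.0, f_d=600.0))
    print(efficiency_ratiometric(f_a=240.0, f_da=420.0, phi_a=0.8, phi_d=0.6))
```

Both calls return values of about 0.3, illustrating that Eq. 5 avoids the separate donor-only reference required by Eq. 4 at the price of needing both quantum yields.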
However, while this approach is suited for the quantitative determination of FRET, it suffers from limitations intrinsic to intensity-based quantitative analysis of fluorescence. The main problem of this readout modality is the uncertainty of absolute donor or acceptor concentrations. These uncertainties may be caused by incomplete protein labeling, e.g., due to imperfect expression of the fluorescent protein tag or partial degradation of the fusion protein, or by photobleaching. Moreover, slight misalignments of the optical setup strongly affect the obtained values of the energy transfer efficiency. A more sophisticated intensity-based approach to quantitative FRET relies on gradual acceptor photobleaching, where the acceptor of a FRET pair is bleached by direct resonant excitation and the recovery of the quenched donor emission is monitored [34]. While this readout technique works well at the single-molecule level, where photobleaching of the acceptor can be precisely visualized, its application to bulk samples containing many FRET pairs, as is typically the case in functionally labeled biological samples, poses some restrictions. Mainly, it is not trivial to ensure complete photobleaching of all acceptors in the detection volume. As a consequence, not all donor emission is recovered, and the obtained values for the FRET efficiency tend to be too low. Accordingly, there is a strong need for alternative readouts for a quantitative description of FRET. One prominent technique uses time-domain spectroscopy to analyze the fluorescence energy transfer (FRET-FLIM) [30, 31, 35]. Using a TCSPC setup (see Subheading 3), the rates of the transition from the excited to the ground state of a fluorophore are investigated. If energy transfer to an acceptor occurs, an additional relaxation channel opens through which the donor can lose its excitation energy.
Accordingly, the radiative transition has to compete with an additional non-radiative pathway, causing the donor FLT to shift to shorter values. This reduction in the FLT is connected to the energy transfer efficiency and can be quantified according to
$$E_{\mathrm{ET}} = 1 - \frac{\tau_{DA}}{\tau_D} \qquad (6)$$
with $\tau_D$ as the donor FLT in the absence of an acceptor and $\tau_{DA}$ as the donor FLT when energy transfer can occur. According to Eq. 6, the time-domain approach provides one discrete number, i.e., the quenched donor FLT, to precisely determine the FRET efficiency. However, this method also requires careful data analysis, since the fluorescence intensity decay of the quenched donor intrinsically follows a biexponential or even higher-order exponential decay function. Accordingly, data fitting must be performed carefully, taking into account that the individual amplitudes and decay time constants are coupled parameters. There are efforts to circumvent this limitation, for example by recording the acceptor rise time [36, 37]. This very promising approach, however, lacks sensitivity and can only give reliable results for lower energy transfer efficiencies. For this reason, most time-domain FRET studies rely on the analysis of the FLT of the quenched donor chromophore. This is valid as long as only relative, semiquantitative statements are made or if the second time component of the donor decay function is kept constant for different transfer efficiencies. A more detailed discussion can be found in [33]. FLT imaging is a powerful tool for the quantitative determination of FRET processes. However, evaluating the FRET efficiency requires knowledge of the FLT of the FRET donor in the absence of the FRET acceptor. Therefore, control samples in which no FRET occurs are required. The FLT is an intrinsic and specific property of a given chromophore, however only in a defined environment.
In a cell, this means that the observed FLT of a chromophore such as an autofluorescent protein (AFP) may deviate considerably between measurements, depending on the fusion partner of the fluorescent protein, its cellular localization, pH value, ionic strength, and other factors. Therefore, the control sample should differ from the actual samples only in that no specific interaction with an acceptor, and therefore no FRET, occurs. All other parameters, such as the protein to which the fluorescent protein is attached and its localization, should be kept constant to avoid misinterpretation of the data. At very high expression levels of fluorescent proteins, some FRET may occur simply because the high concentrations of the target proteins create a small probability that a donor and an acceptor come close enough for FRET to occur. In those cases, a control in which the FRET donor is expressed along with a FRET acceptor that lacks its fusion partner but is still targeted to the appropriate cellular compartment (e.g., via a nuclear localization signal) may be more suitable than measuring the FRET donor alone. In this way, FRET activity without specific protein–protein interaction can be singled out. While FRET-FLIM is valuable for studies such as those listed above, even a qualitative interpretation of protein–protein interaction studies becomes difficult for very low energy transfer efficiencies. Unfortunately, many biological studies involve large interacting proteins that force the donor and acceptor chromophores to relatively large distances, even for a positive interaction. As a consequence, the FRET efficiency is very low and, according to Eq. 6, the reduction of the donor FLT is only marginal. Together with local inhomogeneities of the FLT caused by the individual nano-environment sensed by the chromophore, this often makes it difficult to judge whether a protein–protein interaction is positive or not.
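The problem of low transfer efficiencies can be made concrete with Eq. 6. The Python sketch below (hypothetical function names) converts between donor lifetimes and FRET efficiency and prints the lifetime shortening expected for an assumed donor lifetime of 2.5 ns. It treats τDA as a single effective lifetime, which, as noted above, is a simplification of the multiexponential decay of the quenched donor.

```python
def efficiency_from_lifetimes(tau_da_ns: float, tau_d_ns: float) -> float:
    """Eq. 6: E = 1 - tau_DA / tau_D."""
    return 1.0 - tau_da_ns / tau_d_ns


def quenched_lifetime(tau_d_ns: float, efficiency: float) -> float:
    """Eq. 6 rearranged: expected donor lifetime in the presence of FRET."""
    return tau_d_ns * (1.0 - efficiency)


if __name__ == "__main__":
    TAU_D_NS = 2.5  # assumed donor lifetime without acceptor (illustrative)
    for e in (0.30, 0.10, 0.05, 0.02):
        shift_ps = (TAU_D_NS - quenched_lifetime(TAU_D_NS, e)) * 1000.0
        print(f"E = {e:.2f}: donor FLT shortened by about {shift_ps:.0f} ps")
```

For E = 0.02, the shortening amounts to only 50 ps for this assumed donor, a shift that may be hard to separate from the local FLT inhomogeneities mentioned above.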
Here, the FIDSAM technique can also be applied, as it uncovers FRET activity through the inherently multiexponential decays in FRET-active sample regions. In this way, even marginal reductions of the donor FLT due to FRET can be discriminated from FLT reductions caused by environmental factors such as the pH value [38]. For a more sophisticated protocol using multiple FRET systems, see Note 1.
FLIM is a method that generates almost no false-positive results; however, there is a risk of false negatives. The apparent FLT of the FRET donor may be assigned to a free donor although an interaction with a second protein fused to an acceptor actually occurs in the following cases:

Cause: Expression level of the FRET donor much higher than that of the acceptor
Remarks: The measured FLT is a mixture of free FRET donor and FRET donor with an acceptor bound to it. Depending on the stoichiometry, this value may be quite close to the FLT of a free FRET donor. Therefore, the expression levels of donor and acceptor chromophore should be comparable.

Cause: Large interaction partners
Remarks: The FRET donor FLT may not be shortened although its fusion partner interacts with an acceptor-labeled protein, when the target proteins are so large that the two fluorophores are separated by a distance well beyond the Förster radius. It may be useful to attach the fluorescent labels to different sites of the proteins in order to check different steric arrangements of the two proteins of interest.

Cause: Blocking of interaction sites by the fluorescent protein
Remarks: An interaction between two target proteins may be prevented when the attached fluorescent proteins impose a steric barrier that blocks interaction sites within the proteins. To overcome this, see the previous entry.

Cause: Proteolytic cleavage at the linker between target protein and fluorescent protein
Remarks: The fluorescent protein may be cleaved from its fusion partner in vivo. This can be evaluated by checking the size of the fusion protein with downstream methods such as SDS-PAGE with subsequent western blot. If the fluorescent protein is cleaved from its fusion partner, vectors with different linkers may be used or the attachment site of the fluorescent protein may be changed.

3.1.2 Bimolecular Fluorescence Complementation
Besides the FRET analysis discussed above, there is another prominent optical technique to determine protein–protein interactions in vivo, which relies on bimolecular fluorescence complementation (BiFC) [39–41]. The BiFC approach exploits a particular property of AFPs, which will be introduced in detail in Subheading 4.1.

3.2 In Vivo Diffusion Studies
To understand the dynamics of cellular function, the investigation of protein mobility inside the cell is of high importance. This can be achieved by diffusion studies of distinct fluorescently labeled proteins. The most accurate techniques to obtain precise diffusion coefficients are fluorescence correlation spectroscopy (FCS) methods [42]. These techniques are quite sophisticated and typically require extensive data acquisition times of several hours. Moreover, their application to Arabidopsis and other plant cells has only been demonstrated in exceptional cases. This is mainly because FCS intrinsically relies on the detection of single emitters, and the signal-to-noise ratio drastically decreases in the presence of background contributions. However, there is another technique that provides access to molecular diffusion in a living tissue context. This method uses fluorescence recovery after photobleaching (FRAP) and can be performed with basically any commercial fluorescence microscope [43]. In FRAP, an intensity map is recorded in a defined region of interest (ROI). Next, the ROI is irradiated with a high-power laser source, typically operating in pulsed mode to obtain high power densities.
This way, the fluorescent dyes in this area are irreversibly converted into a nonfluorescent state by photobleaching, and the intensity in the ROI drops to background level. The fluorescence intensity in the ROI then recovers with time as unbleached molecules diffuse in. By recording the evolution of the fluorescence intensity until a steady-state level is reached (that is, until the fluorescently labeled proteins are again distributed homogeneously), the diffusion coefficient D can be deduced directly.

4 Fluorescence Labels

4.1 Fluorescence Probes

The key player in fluorescence microscopy is the fluorescent dye under investigation. An ideal dye system has to meet certain requirements regarding its photophysical performance and its suitability for the specific biological problem. First, the dye should be highly photostable, meaning that it does not undergo photobleaching during the time of observation. Moreover, high brightness is desirable to achieve good image contrast. The brightness is defined as the product of the extinction coefficient at a given excitation wavelength and the fluorescence quantum yield [2]. The extinction coefficient describes the molecule's ability to absorb the excitation light and is connected to the Lambert–Beer law according to $\varepsilon(\lambda) = E(\lambda)/(c \cdot d)$, with $E(\lambda)$ as the absorbance, $c$ as the concentration of the dye in solution, and $d$ as the cuvette thickness. The fluorescence quantum yield $\Phi$ defines the probability that a dye decays radiatively after excitation and is expressed as the ratio of the radiative decay rate to the total (radiative plus non-radiative) decay rate, $\Phi = \Gamma_{\mathrm{rad}}/(\Gamma_{\mathrm{rad}} + \Gamma_{\mathrm{nonrad}})$, with values $0 < \Phi < 1$. Besides these photophysical requirements, the spectral properties of the fluorescent dye have to fit the biological question. This means that the emission should not overlap with intrinsic background luminescence.
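As a numerical illustration of the brightness and quantum-yield definitions given above, the following sketch computes an extinction coefficient from a hypothetical absorbance measurement, a quantum yield from hypothetical radiative and non-radiative rates, and their product. It also uses the standard relation τ = 1/(Γ_rad + Γ_nonrad), which is not stated in this section, to show that the same rates fix the fluorescence lifetime. All input numbers are assumptions for illustration, not values for a particular dye.

```python
def extinction_coefficient(absorbance: float, conc_molar: float, path_cm: float) -> float:
    """Lambert-Beer law, epsilon = E / (c * d), in M^-1 cm^-1."""
    return absorbance / (conc_molar * path_cm)


def quantum_yield(gamma_rad: float, gamma_nonrad: float) -> float:
    """Phi = Gamma_rad / (Gamma_rad + Gamma_nonrad), always between 0 and 1."""
    return gamma_rad / (gamma_rad + gamma_nonrad)


def brightness(epsilon: float, phi: float) -> float:
    """Brightness as the product of extinction coefficient and quantum yield."""
    return epsilon * phi


# Hypothetical absorbance of 0.28 for a 5 uM dye solution in a 1 cm cuvette
eps = extinction_coefficient(0.28, 5e-6, 1.0)          # 56,000 M^-1 cm^-1
# Hypothetical decay rates in s^-1
g_rad, g_nonrad = 3.0e8, 1.0e8
phi = quantum_yield(g_rad, g_nonrad)                    # 0.75
# The same rates also fix the fluorescence lifetime, tau = 1 / (G_rad + G_nonrad)
tau_ns = 1e9 / (g_rad + g_nonrad)                       # 2.5 ns
print(f"epsilon = {eps:.0f} M^-1 cm^-1, Phi = {phi:.2f}, "
      f"brightness = {brightness(eps, phi):.0f}, tau = {tau_ns:.1f} ns")
```

Comparing dyes by this product, rather than by extinction coefficient or quantum yield alone, reflects the definition of brightness used in the text.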
Such spectral overlap is a particular issue in plant systems and complicates the use of dyes emitting in the far-red spectral region, as their emission would overlap with the strong fluorescence of the chloroplasts. Moreover, the excitation wavelength of the dye should ideally lie in a spectral region where the cellular components absorb little or not at all. This is important because any absorbance can lead to unspecific emission and gives rise to a strong background signal. In addition, light absorption, especially in the near-ultraviolet region, can induce physiological effects such as DNA damage. In Arabidopsis, the plant's photoreceptors also have to be considered. Here, it is not always possible to find a fluorescent marker that does not interfere with the activity of the photoreceptors. However, if photoreceptor activation is not critical for the specific study, or if its effects develop on a timescale significantly longer than the microscopic observation, these influences can be neglected. Another requirement concerns the specificity of the fluorescent marker. In contrast to fluorescence-based techniques such as cell sorting, in fluorescence microscopy it is crucial to label exclusively a desired cellular compartment or even a specific type of protein. To meet these requirements, industry offers a variety of high-performance fluorescent markers. Two prominent suppliers are Invitrogen (Molecular Probes), which sells, amongst others, the well-known Alexa dyes, and Atto-tec, which provides the so-called Atto dyes, which are outstanding in brightness and photostability. Despite their good photophysical properties, these synthetic dyes suffer from the fact that they have to be introduced into the cell from outside. While this is relatively straightforward for mammalian cells, it is an issue for plant cells because the cell wall has to be penetrated. The use of protoplasts, which lack the cell wall, is an accepted way to circumvent this problem.
However, precise localization studies are then no longer possible. A further problem is the specificity of these synthetic dyes. One approach uses highly specialized markers such as the MitoTracker dyes, which exclusively label mitochondria. Here, oxidation from a nonfluorescent to a fluorescent form is combined with thiol-specific binding of the MitoTracker in the mitochondria. Other approaches to selective labeling use specific antibodies that are covalently bound to the marker fluorophores. This way, distinct, well-defined proteins bind to the antibody, and specific protein labeling becomes feasible. Despite these promising developments in external fluorescence staining, these techniques have two major intrinsic limitations. The first concerns cell toxicity, which is an issue for most synthetic fluorescent dyes because their functional principle relies on extended conjugated aromatic systems. The second is that specific antibodies can drastically influence the functionality of the labeled proteins. On the one hand, antibodies are frequently comparable in size to the labeled protein or even larger. On the other hand, specific binding of an antibody might block functional binding sites of the protein, thus interfering with biological processes such as signal transduction. These limitations are largely overcome by the family of AFPs, which have revolutionized fluorescence cell biology within the past 15 years [44–48]. In contrast to conventional fluorescent label dyes, AFPs are proteins that intrinsically contain a chromophoric unit. Accordingly, the AFP genes can be fused to the gene of interest by molecular cloning techniques. After transient or stable transformation, the corresponding fusion protein is expressed in the Arabidopsis cells. Depending on the presence of the appropriate targeting sequence, the AFP fusion proteins can be directed to different subcellular compartments.
While the first AFP, the green fluorescent protein (GFP), was limited in its spectral and functional properties, site-directed point mutations led to significantly improved and modified spectroscopic properties of GFP. One prominent example of this work is enhanced GFP (eGFP), a variant of wild-type GFP (containing, among other changes, the S65T mutation) with improved photostability and higher brightness due to an increased extinction coefficient and fluorescence quantum yield. Almost all recent work that uses GFP as a fluorescent tag uses this enhanced form, even if not explicitly stated. The group of the Nobel laureate Roger Tsien also extended the spectral range of the AFPs, which now covers the complete visible range, from the deep blue (enhanced blue fluorescent protein, eBFP, λem = 440 nm) to the far red (mPlum, λem = 648 nm). Thus, multicolor fluorescence labeling in vivo is possible [49]. All AFPs known to date are relatively small proteins composed of about 250 amino acids, with a molecular weight of around 25 kDa (GFP: 238 amino acids, MW = 26.9 kDa). The tertiary structure of the AFPs is a barrel composed of 11 β-strands, which wind helically around a central axis. This β-barrel is capped by α-helical structures that shield the interior of the barrel from penetration by larger molecules or ions. In wild-type GFP, the chromophore is formed from three amino acids (Ser65–Tyr66–Gly67), which protrude into the interior of the protein shell. After expression of the protein, these amino acids undergo a maturation process involving cyclization, dehydration, and oxidation.
Interestingly, attempts to synthesize the isolated chromophores without the protein shell resulted in nonfluorescent compounds, indicating that the protein shell significantly affects the optical properties of the AFPs by stabilizing the three-dimensional structure of the chromophoric unit. Fine-tuning the emission properties of these proteins involves modifications of both the chromophoric unit and the surrounding protein shell. In one prominent modification of the chromophore itself, tyrosine 66 is replaced by histidine (Y66H), which shortens the delocalized π-system and causes the hypsochromic shift of the blue variant of GFP, (e)BFP. Conversely, a significant red shift of the fluorescence emission, which closes the gap between GFP and DsRed-type AFPs, can be achieved by replacing threonine 203 with tyrosine (T203Y). In the folded protein, the π-system of this tyrosine is positioned close to the chromophoric unit without forming a covalent bond. This way, a π-stack is formed, which lowers the energy gap between the S0 and S1 states and shifts the emission maximum from 505 to 530 nm in the yellow fluorescent protein (YFP) [50]. A different class of autofluorescent proteins, DsRed, which was found in the reef coral Discosoma sp., exhibits red fluorescence. Wild-type DsRed intrinsically forms tetramers, which cannot be separated by physical or chemical means into functional monomeric subunits. The formation of these tetramers hindered a broader use of DsRed as an in vivo label, despite its outstanding spectral properties, with emission showing less cross talk with the autofluorescence background. This changed only with the introduction of the family of monomeric red fluorescent proteins (mRFPs: mCherry, mPlum, mStrawberry, etc.
often referred to as mFruits), which are variants of wild-type DsRed carrying point mutations that modify the protein shell so that it loses its tendency to aggregate but still retains its fluorescence [49]. As a result of these efforts, a variety of AFPs covering the complete spectral region from blue to red is available today.

A remarkable class of AFPs comprises photoswitchable proteins such as DRONPA, which exhibits intense green fluorescence when excited at λexc = 488 nm. Increasing the laser power causes the protein to switch to a nonfluorescent dark state. In contrast to photobleaching, this dark state is formed reversibly, and fluorescence can be recovered by irradiation at 400 nm. As switching between on- and off-states requires irradiation intensities about two orders of magnitude lower than photobleaching, DRONPA is highly suited for FRAP studies with less risk of cell damage. Moreover, owing to the reversibility of the switching process, even complex kinetic studies in a single cell are feasible [51]. Note 2 provides an overview of frequently used fluorescent dyes together with appropriate filter sets.

The BiFC approach [39] to studying protein–protein interactions exploits a unique property of AFPs: fluorescence is observed only when the chromophore is enclosed in its well-defined protein shell environment. Accordingly, if only one part of the protein is expressed, this fragment is nonfluorescent. This can be exploited by splitting the AFP gene into two subsequences, which are then fused to the genes encoding the proteins of interest. If these fusion proteins interact, the two AFP fragments come into close proximity and can orient such that they reconstitute the complete protein structure, forming a functional AFP.
BiFC, which was initially demonstrated for YFP and is often referred to as split-YFP, is a very elegant technique to investigate protein–protein interactions, not only because it is very specific but also because fragments of different AFP mutants may complement each other, forming BiFC products with distinct spectral emission (multicolor BiFC) [52]. It is, therefore, possible to investigate multiple competing interactions simultaneously. Moreover, the BiFC technique is very sensitive, as it works against a virtually zero background, and spectral cross talk, often an issue in FRET studies, cannot occur. Despite these fascinating possibilities, BiFC also has some restrictions. The most important one is the irreversibility of the protein complementation, which renders this method unsuitable for dynamic investigations in which transient protein–protein interactions are to be monitored over time. Moreover, BiFC can give false positives if the affinity of the two fragments is high enough to form a functional AFP even without a specific interaction between the two fusion proteins.

4.2 AFP-FRET Pairs

The most prominent AFP-FRET pair is formed by the cyan fluorescent protein (CFP) and YFP. While the properties of this system concerning spectral overlap and spectral cross talk are rather ideal, other photophysical parameters restrict its applicability. First, the optimal excitation wavelength for the donor CFP is 438 nm, which evokes a strong autofluorescence background. Moreover, CFP is not very bright and has rather low photostability. These restrictions call for alternative AFP-FRET pairs, which are available nowadays thanks to the broad spectral variety of AFPs. We, therefore, suggest using a red-shifted FRET pair if the individual experimental design allows for it.

Fig. 3 Triple FRET arrangement composed of TagBFP, TagGFP, and TagRFP
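The role of spectral overlap in choosing a FRET pair can be made quantitative through the Förster radius R0. The sketch below uses the standard textbook expression R0 = 0.0211 (κ² n⁻⁴ Φ_D J)^(1/6) nm (see, e.g., [2]), where J is the spectral overlap integral in M⁻¹ cm⁻¹ nm⁴, Φ_D the donor quantum yield, n the refractive index, and κ² the orientation factor. Gaussian band shapes stand in for measured spectra, and all parameters (peak positions, band widths, ε_max = 80,000 M⁻¹ cm⁻¹, Φ_D = 0.4, κ² = 2/3, n = 1.4) are assumptions for illustration, not data for a specific AFP pair.

```python
import math


def gaussian(x: float, center: float, sigma: float) -> float:
    """Band shape with peak value 1 (a stand-in for a measured spectrum)."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def trapezoid(ys: list[float], dx: float) -> float:
    """Trapezoidal rule for equally spaced samples."""
    return dx * (sum(ys) - 0.5 * (ys[0] + ys[-1]))


def overlap_integral(wl: list[float], donor_em: list[float], acceptor_eps: list[float]) -> float:
    """J = int F_D * eps_A * lambda^4 dlambda / int F_D dlambda.

    With wavelengths in nm and eps_A in M^-1 cm^-1, J is in M^-1 cm^-1 nm^4.
    """
    dx = wl[1] - wl[0]
    num = trapezoid([f * e * lam ** 4 for lam, f, e in zip(wl, donor_em, acceptor_eps)], dx)
    return num / trapezoid(donor_em, dx)


def forster_radius_nm(j: float, phi_d: float, kappa2: float = 2.0 / 3.0, n: float = 1.4) -> float:
    """R0 = 0.0211 * (kappa^2 * n^-4 * Phi_D * J)^(1/6), in nm."""
    return 0.0211 * (kappa2 * n ** -4 * phi_d * j) ** (1.0 / 6.0)


# Hypothetical Gaussian spectra on a 0.5 nm grid from 350 to 750 nm
wl = [350.0 + 0.5 * i for i in range(801)]
donor = [gaussian(lam, 475.0, 15.0) for lam in wl]      # donor emission shape
for acceptor_peak in (490.0, 514.0, 560.0):             # acceptor absorption maxima
    eps = [80000.0 * gaussian(lam, acceptor_peak, 15.0) for lam in wl]
    j = overlap_integral(wl, donor, eps)
    print(f"acceptor peak {acceptor_peak:.0f} nm: J = {j:.2e}, R0 = {forster_radius_nm(j, 0.4):.2f} nm")
```

Because R0 scales only with the sixth root of J, a large loss of overlap changes R0 moderately, whereas the transfer efficiency at a given distance depends on (r/R0)^6. In practice, J would be computed from measured donor emission and acceptor absorption spectra.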
While a combination of GFP and mRFP already gives results superior to the CFP–YFP combination, newer, brighter, and more photostable constructs such as the Tag family are available. This family also includes a blue-emitting variant with fairly good spectroscopic properties, so that triple FRET studies can be carried out. Triple FRET extends conventional FRET by a third component, thus generating an energy migration cascade. In triple FRET, the excitation energy of a donor chromophore is first transferred non-radiatively to a first acceptor. This acceptor, in turn, can act as a second donor and transfer its excitation energy to a third chromophore, which acts as the final acceptor. This way, complex interaction studies with up to three participating proteins can be carried out. Fig. 3 depicts a triple FRET arrangement composed of AFPs from the Tag family (TagBFP → TagGFP → TagRFP). When a cyan-emitting AFP is indispensable, we recommend using Cerulean rather than CFP [53]. Spectrally almost identical to CFP, Cerulean offers higher brightness and photostability, although it still does not reach the levels of eGFP or YFP. Moreover, in contrast to CFP, Cerulean exhibits a monoexponential fluorescence decay. This is crucial for FRET-FLIM studies, as the intrinsic biexponential decay of CFP complicates data evaluation.

4.3 AFPs as Local Biosensors

The particular properties of the AFPs make them attractive as molecular in vivo sensors. Because the protein shell shields the chromophore from the environment, mainly protons can interact directly with the fluorophore. As the chromophore equilibrates between a protonated and a deprotonated form, this equilibrium is influenced by the local proton concentration. This makes AFPs in general, and GFP in particular, very sensitive local pH sensors [54].
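The protonation equilibrium described above can be sketched with the Henderson–Hasselbalch relation. The example assumes a simple two-state model in which the measured FLT is the population-weighted mean of the lifetimes of the protonated and the deprotonated chromophore, and inverts this relation to convert pixel lifetimes into pH values, which is the idea behind translating a FLIM image into a pH map. The apparent pKa and both lifetimes are hypothetical placeholders; a real sensor would have to be calibrated, ideally in situ, with buffers of known pH.

```python
import math

PKA = 6.0          # hypothetical apparent pKa of the sensor
TAU_PROT = 1.8     # ns, hypothetical lifetime of the protonated form
TAU_DEPROT = 2.8   # ns, hypothetical lifetime of the deprotonated form


def protonated_fraction(ph: float, pka: float = PKA) -> float:
    """Henderson-Hasselbalch: fraction of chromophores in the protonated form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def mean_lifetime(ph: float) -> float:
    """Two-state model: lifetimes weighted by the populations of both forms."""
    f = protonated_fraction(ph)
    return f * TAU_PROT + (1.0 - f) * TAU_DEPROT


def ph_from_lifetime(tau_ns: float, pka: float = PKA) -> float:
    """Invert the two-state model; only defined between the two limiting lifetimes."""
    if not TAU_PROT < tau_ns < TAU_DEPROT:
        raise ValueError("lifetime outside the calibrated range")
    f = (TAU_DEPROT - tau_ns) / (TAU_DEPROT - TAU_PROT)   # protonated fraction
    return pka + math.log10((1.0 - f) / f)


# Convert a row of hypothetical FLIM pixel lifetimes (ns) into pH values
pixel_lifetimes = [2.0, 2.3, 2.6]
print([round(ph_from_lifetime(t), 2) for t in pixel_lifetimes])
```

In this model the lifetime changes most steeply with pH around the pKa, so such a sensor is most informative when its pKa lies within the pH range of the compartment under study.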
Among other readouts, the protonation state can be determined by FLT measurements, so that a FLIM image can be translated into a pH map. How easily protons penetrate the AFP barrel also depends on the protein to which the AFP is fused. For example, a BRI1-eGFP construct is sensitive to changes in the local membrane potential due to a specific brassinolide-activated increase in P-ATPase activity [55]. Another approach uses directed mutations to induce sensitivity to specific external parameters. In one prominent example, two cysteines were introduced into GFP at positions 147 and 204, where they are adjacent to each other (roGFP) [56]. Depending on the local redox potential, a disulfide bond can form reversibly between the thiol groups of the two cysteines. Formation of this bond alters the protonation equilibrium of the chromophore, so that roGFP acts as an optical sensor of the local redox potential. A further development toward a sensor that is exclusively sensitive to changes in the local H2O2 concentration is the HyPer probe, in which a circularly permuted yellow fluorescent protein (cpYFP) was inserted into the regulatory domain of the prokaryotic H2O2-sensing protein OxyR [57]. AFPs can also be used to sense local ion concentrations. The most prominent examples are the so-called cameleons, protein constructs that link the FRET pair CFP and YFP via calmodulin [58, 59]. Calmodulin undergoes a conformational change in the presence of Ca2+ ions, which forces the AFPs closer together. This way, the FRET efficiency varies with the Ca2+ concentration, and the cameleon construct can be used for local and highly sensitive Ca2+ probing.

5 Conclusions

This review provides an overview of current fluorescence microscopy techniques used in Arabidopsis research today.
While classical applications used merely the intensity of the signal as a source of information, state-of-the-art applications exploit specific spectroscopic properties of fluorescent labels to increase the information content of every single measurement. The major benefit of these techniques stems from the dependence of a fluorophore's spectroscopic properties on its local nanoenvironment. Thus, fluorescence microscopy is a valuable tool for biochemical imaging with subcellular resolution, helping researchers to further understand biological processes on a molecular scale. Future developments will certainly continue in this direction. One important step for this highly sensitive and noninvasive technology will be further developments in super-resolution microscopy beyond the diffraction limit. Most likely, in the next few years, these techniques will find their way into plant research and offer fascinating insights, eventually even with molecular spatial resolution. If super-resolution techniques are combined with local spectral readout modalities such as the FLT or the fluorescence emission or excitation spectrum, optical microscopy will further emerge as an analytical technique of the highest potential. However, the amount of data to be recorded, interpreted, and correlated will also increase tremendously and, hence, highly sophisticated mathematical techniques for data evaluation and multivariate data analysis will play a major role. In this way, the disciplines of biology, chemistry, physics, mathematics, and informatics will merge further, and deep and so far unforeseeable insights into cellular processes will be gained.

6 Notes

1. Triple FRET excitation schema: While triple FRET is a very powerful tool to single out complex protein interactions, it requires a careful experimental concept that incorporates different alternating excitation sources, first to verify the presence of the individual proteins and then to determine their interaction. Such an excitation schema, which requires either pulsed laser sources or fast-switchable continuous wave (cw) lasers, is depicted in Table 1.

Table 1 Excitation schema for a triple FRET study. Using three independent excitation wavelengths, the presence of and the interaction between three proteins can be deduced (D1, D2, D3: first, second, and third dye of the energy migration chain; λ1, λ2, λ3: excitation wavelengths of D1, D2, and D3)
Excitation | Emission from | Interpretation
λ3 | D3 | D3 present
λ3 | none | No D3 present
λ2 | D2 | D2 present; no D2 → D3 interaction or D3 not present
λ2 | D3 | D2 present; D2 → D3 interaction
λ2 | none | No D2 present
λ1 | D1 | D1 present; no D1 → D2 interaction or D2 not present
λ1 | D2 | D1 and D2 present; D1 → D2 interaction; no D2 → D3 interaction or D3 not present
λ1 | D3 | D1, D2, and D3 present; D1 → D2 → D3 interaction
λ1 | none | No D1 present

2. Excitation wavelengths: Choosing the appropriate excitation wavelength and filter sets is crucial for fluorescence microscopy. Table 2 lists optimal and acceptable excitation wavelengths and suitable emission filters for common fluorescent labels. For FRET applications, the shortest suitable excitation wavelength should be chosen for the donor to avoid direct acceptor excitation. As for filters, a band-pass filter is required at least for the donor to block acceptor emission. For the acceptor, or for the last dye in a triple FRET energy migration chain, a long-pass filter will work fine.

Table 2 Optimal and acceptable excitation wavelengths and emission filters for common in vivo labeling dyes. LP long-pass, BP band-pass. BP notation AAA/BB: AAA = central wavelength (nm); BB = spectral width (nm)
Dye | Optimal λexc (nm) | Acceptable λexc (nm) | Emission filter
BFP, DAPI | 360 | 405 | LP420
CFP, eCFP, Cerulean | 438 | 457 | LP460, BP480/40
GFP, FITC (fluorescein), Alexa488, Atto488 | 488 | 457 | LP500, BP525/50
YFP, eYFP, Venus, Citrine | 514.5 | 488 | LP520, BP540/35
mRFP | 550 | 532, 514.5 | LP600, BP610/20
mCherry | 580 | 532 | LP600, BP640/80

3. FLIM excitation rate: For FLIM studies, specific settings of the pulsed excitation source are required. First, an appropriate repetition rate should be chosen. With common fluorescent dyes, repetition rates between 10 and 40 MHz are well suited; 80 MHz may work as well, but then the subsequent pulse might arrive before the fluorescence intensity has completely decayed. Repetition rates lower than 10 MHz should be avoided because, owing to the long time span between two pulses, significant readout and thermal noise accumulates.

4. FLIM excitation power: The excitation power in a FLIM experiment should be set to values at which 1 % of the excitation pulses cause a photon detection event on the detector. Hence, detection rates must not exceed 100 kHz (at a 10 MHz excitation rate) to 800 kHz (at an 80 MHz excitation rate). At higher detection rates, the probability increases that a second photon is emitted by the sample while the detector is still in its dead time, during which photons cannot be counted. This, in turn, leads to an overrepresentation of early photon arrival times, and the resulting FLTs are too short. This effect is often referred to as the "pile-up" effect and should be carefully avoided.

5. FLIM channel width: Modern TCSPC electronics allow for channel widths as small as 1 ps. Usually, such short binning times are not required, and time intervals between 32 and ~200 ps provide fitting results comparable to those obtained at high time resolution. The larger time intervals, in turn, build up the histogram faster, as photons with similar arrival times are binned together, which leads to significantly shorter data acquisition times. We recommend the highest time resolution only for measurements in which ultrafast dynamics have to be monitored. This is, e.g., the case for recording the acceptor rise time in quantitative FRET studies.

6. Instrument response function (IRF) in FLIM studies: In TCSPC data analysis, the laser pulse would ideally be a perfect delta function. The IRF accounts for the deviations from this delta function that are inherent in any experimental configuration: it is broadened and asymmetric because of the finite width of the laser pulse and electronically caused timing delays. To obtain quantitative data, the IRF must be known so that it can be convolved with the fit function (Eq. 2). To record an IRF, one may use the back reflection of the laser beam at a coverslip without any emission filter, since the dichroic beam splitter does not block all back-reflected light. We therefore recommend using grey (neutral density) filters to reduce the laser intensity when recording the IRF. Some pulsed laser diodes provide a mechanical power adjustment; this option should be used only in exceptional cases, as the pulse shape can vary with the output power. Alternatively, an IRF can be recorded using a luminescence process that proceeds on a very fast timescale. For example, commercially available gold nanoparticles at a concentration in the micromolar range exhibit strong red-shifted luminescence due to excited surface plasmons, which emit quasi-instantaneously after excitation, so that the additional time jitter can be neglected.
Since the fit quality strongly depends on the IRF, and since the IRF is highly sensitive to any change in the experimental setup (especially the excitation repetition rate, but also filters or changed detection modalities), we recommend recording an IRF at least once a day.

7. Objectives: For optimal spatial resolution, high-NA objectives are required. Optimal results are obtained using oil immersion objectives with a magnification of 60× to 100×. These objectives, however, require observation of the sample through a coverslip (typical thickness 0.18 mm). If this limitation is not acceptable for a particular investigation, we suggest the use of air objectives with 100× magnification. Note that changing the objective NA and magnification requires re-dimensioning of the image pinhole.

References

1. Stephens DJ, Allan VJ (2003) Light microscopy techniques for live cell imaging. Science 300:82–86
2. Lakowicz JR (2006) Principles of fluorescence spectroscopy. Kluwer, New York
3. Schleifenbaum F, Blum C, Subramaniam V, Meixner AJ (2009) Single molecule spectral dynamics at room temperature. Mol Phys 107:1923–1942
4. van Munster EB, Gadella TW (2005) Fluorescence lifetime imaging microscopy (FLIM). Adv Biochem Eng Biotechnol 95:143–175
5. Ntziachristos V (2006) Fluorescence molecular imaging. Annu Rev Biomed Eng 8:1–33
6. Pepperkok R, Ellenberg J (2006) High-throughput fluorescence microscopy for systems biology. Nat Rev Mol Cell Biol 7:690–696
7. Suzuki T, Matsuzaki T, Hagiwara H, Aoki T, Takata K (2007) Recent advances in fluorescent labeling techniques for fluorescence microscopy. Acta Histochem Cytochem 40:131–137
8. Valeur B (2002) Molecular fluorescence: principles and applications. Wiley-VCH, Weinheim
9. Shotton DM (1989) Confocal scanning optical microscopy and its applications for biological specimens. J Cell Sci 97:175–206
10. Abbe E (1904) Abhandlungen über die Theorie des Mikroskops. Verlag G. Fischer, Jena
11. Axelrod D (2003) Total internal reflection fluorescence microscopy in cell biology. In: Marriott G, Parker I (eds) Methods in enzymology, vol 361, Biophotonics, Part B. Academic Press/Elsevier, Amsterdam, pp 1–33
12. Betzig E et al (2006) Imaging intracellular fluorescent proteins at nanometer resolution. Science 313:1642–1645
13. Rust M, Bates M, Zhuang X (2006) Subdiffraction-limit imaging by stochastic optical reconstruction microscopy (STORM). Nat Methods 3:793–796
14. Heilemann M et al (2008) Subdiffraction-resolution fluorescence imaging with conventional fluorescent probes. Angew Chem Int Ed 47:6172–6176
15. Lippincott-Schwartz J, Patterson GH (2009) Photoactivatable fluorescent proteins for diffraction-limited and super-resolution imaging. Trends Cell Biol 19:555–565
16. Hell S (2004) Strategy for far-field optical