Advances in Corpus Applications in Literary and Translation Studies

Professor Riccardo Moratto and Professor Defeng Li present contributions focusing on the interdisciplinarity of corpus studies, with a special emphasis on literary and translation studies, which offer a broad and varied picture of the promise and potential of methods and approaches. Inside, scholars share their research findings concerning current advances in corpus applications in literary and translation studies and explore possible and tangible collaborative research projects. The volume is split into two sections focusing on the applications of corpora in literary studies and translation studies. Issues explored include historical backgrounds, current trends, theories, methodologies, operational methods, and techniques, as well as training of research students.

This international, dynamic, and interdisciplinary exploration of corpus studies and corpus application in various cultural contexts and different countries will provide valuable insights for any researcher in literary or translation studies who wishes to have a better understanding when working with corpora.

Riccardo Moratto (PhD, FCIL) is Professor of Translation and Interpreting Studies and Chinese Literature in Translation at the Graduate Institute of Interpretation and Translation, Shanghai International Studies University.

Defeng Li, Professor of Translation Studies, is Associate Dean of Faculty of Arts and Humanities and Director of the Centre for Studies of Translation, Interpreting, and Cognition (CSTIC) at the University of Macau.
Routledge Advances in Translation and Interpreting Studies

Translating Controversial Texts in East Asian Contexts: A Methodology for the Translation of “Controversy”
Adam Zulawnik

Using Technologies for Creative-Text Translation
Edited by James Hadley, Kristiina Taivalkoski-Shilov, Carlos da Silva Cardoso Teixeira, and Antonio Toral

Relevance Theory in Translation and Interpreting: A Cognitive-Pragmatic Approach
Fabrizio Gallai

Towards a Feminist Translator Studies: Intersectional Activism in Translation and Publishing
Helen Vassallo

The Behavioral Economics of Translation
Douglas Robinson

Online Collaborative Translation in China and Beyond: Community, Practice, and Identity
Chuan Yu

Advances in Corpus Applications in Literary and Translation Studies
Edited by Riccardo Moratto and Defeng Li

Institutional Translator Training
Edited by Tomáš Svoboda, Łucja Biel, and Vilelmini Sosoni

For more information about this series, please visit www.routledge.com/Routledge-Advances-in-Translation-and-Interpreting-Studies/book-series/RTS.

Advances in Corpus Applications in Literary and Translation Studies
Edited by Riccardo Moratto and Defeng Li

First published 2023
by Routledge
4 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
and by Routledge
605 Third Avenue, New York, NY 10158

Routledge is an imprint of the Taylor & Francis Group, an informa business

© 2023 selection and editorial matter, Riccardo Moratto and Defeng Li; individual chapters, the contributors

The right of Riccardo Moratto and Defeng Li to be identified as the authors of the editorial material, and of the authors for their individual chapters, has been asserted in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved.
No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-1-032-28738-6 (hbk)
ISBN: 978-1-032-28740-9 (pbk)
ISBN: 978-1-003-29832-8 (ebk)
DOI: 10.4324/9781003298328

Typeset in Times New Roman by Apex CoVantage, LLC

Contents

List of Tables
List of Figures
Notes on Editors
Notes on Contributors

Introduction
RICCARDO MORATTO AND DEFENG LI

1 Diachronic Trends in Fiction Authors’ Conceptualizations of their Practices
DARRYL HOCKING AND PAUL MOUNTFORT

2 Within-Author Style Variation in Literary Nonfiction: The Situational Perspective
MARIANNA GRACHEVA AND JESSE A. EGBERT

3 Charles Dickens’s Influence on Benito Pérez Galdós Revisited: A Corpus-Stylistic Approach
PABLO RUANO SAN SEGUNDO

4 A Corpus-Stylistic Approach to the Literary Representation of Narrative Space in Ruiz Zafón’s The Cemetery of Forgotten Books Series
GUADALUPE NIETO CABALLERO AND PABLO RUANO SAN SEGUNDO

5 Analyzing Who, What, and Where in a Mediæval Chinese Corpus: A Case Study on the Chinese Buddhist Canon
TAK-SUM WONG AND JOHN SIE YUEN LEE

6 Corpora and Literary Translation
TITIKA DIMITROULIA

7 Orality in Translated and Non-Translated Fictional Dialogues
YANFANG SU AND KANGLONG LIU

8 The Avoidance of Repetition in Translation: A Multifactorial Study of Repeated Reporting Verbs in the Italian Translation of the Harry Potter Series
LORENZO MASTROPIERRO

9 Feminist Translation of Sexual Content: A Quantitative Study on Chinese Versions of The Color Purple
XINYI ZENG AND JOHN SIE YUEN LEE

10 Benefits of a Corpus-based Approach to Translations: The Example of Huckleberry Finn
RONALD JENN AND AMEL FRAISSE

11 Are Translated Chinese Wuxia Fiction and Western Heroic Literature Similar? A Stylometric Analysis Based on Stylistic Panoramas
KAN WU AND DECHAO LI

12 Translating Personal Reference: A Corpus-Based Study of the English Translation of Legends of the Condor Heroes
JING FANG AND SHIWEI FU

13 Lexical Bundles in the Fictional Dialogues of Two Hongloumeng Translations: A Corpus-Assisted Approach
KANGLONG LIU, JOYCE OIWUN CHEUNG, AND RICCARDO MORATTO

14 Mapping Culture-Specific and Creative Metaphors in Lu Xun’s Short Stories by L1 and L2 English Translators: A Corpus-Assisted Relevance-Theoretical Account
LINPING HOU AND DEFENG LI

15 On a Historical Approach to Cantonese Studies: A Corpus-Based Contrastive Analysis of the Use of Classifiers in Historical and Recent Translations of the Four Gospels
TAK-SUM WONG AND WAI-MUN LEUNG

Index

Tables

1.1 Word Composition of the FAC
1.2 High-Frequency Upward-Trending Lemmas in the FAC (freq. ≥ 50, p < 0.05)
1.3 High-Frequency Downward-Trending Lemmas in the FAC (freq. ≥ 50, p < 0.05)
1.4 1950s Keywords
1.5 1960s Keywords
1.6 1970s Keywords
1.7 1980s Keywords
1.8 1990s Keywords
1.9 2000s Keywords
1.10 2010s Keywords
1.11 2020s Keywords
2.1 Range of Variation along Dimensions
2.2 “Interactive vs. Informational Style” Dimension Scores across Communicative Purposes in Phillip Lopate’s Essays
2.3 “Immediate vs. Removed Style” Dimension Scores across Communicative Purposes in David Shields’s Essays
2.4 Clusters Identified in Ander Monson’s Essays
3.1 Annotation Tags Used to Annotate Fortunata and Jacinta
4.1 Novels by Ruiz Zafón
4.2 Clusters in Ruiz Zafón’s Novels
5.1 Named Entity Recognition Performance on the Test Set
5.2 Precision and Recall in Subject-Verb Pair Extraction from L&K Treebank
5.3 Most Frequent Characters (As Nominal Subjects) in the Corpus
5.4 Most Frequent Characters in the Two Subcorpora of the Canon
5.5 Most Frequent Verbs with Buddha (Left) and Other Characters (Right) as Subject
5.6 Most Frequent Verbs of Three Different Characters
5.7 Precision and Recall in Character-Toponym Pair Extraction in the Test Set
5.8 Most Frequent Places Associated with the Three Major Epithets of Buddha and with Other Characters
7.1 Composition of the Fictional Dialogue Corpus
7.2 Score of Dimension 1
7.3 Results of Mann-Whitney U-Test
8.1 Overview of the Data and Sub-Datasets
8.2 Reporting Verb Taxonomy
8.3 Generalized Linear Model: Series
8.4 Generalized Linear Model: Translator 1
8.5 Generalized Linear Model: Translator 2
8.6 Generalized Linear Model: HbP
9.1 Breakdown 197 Sentences in Our Corpus According to Sexual Content Type
9.2 The Spectrum of Translation Strategies Annotated in Our Corpus
9.3 Translation Strategies Illustrated with Example Translations
9.4 Breakdown of Translation Strategy Frequency in Feminist (Tao) and Non-Feminist (Yang) Translation
9.5 Sentence-Level Comparison of the Translation Strategies in the Feminist (Tao) and Non-Feminist (Yang) Translations
9.6 Example Sentences for Each Sexual Content Type
11.1 Details of the Translated Wuxia Stories
11.2 Details of the Chivalric Stories
11.3 Details of the Heroic Fantasies
11.4 Statistics of the Stylistic Indices across the Genres
11.5 Average Distances between Each Wuxia Translation and the Western Counterparts
11.6 Reception of the Six Wuxia Translations in English (Up to 02/2021)
12.1 Personal References Used by Guo Jing in LoCH
12.2 Translation Equivalence by Social Status (Speaker-Reference)
12.3 Chi-Square Test Result of Translation Equivalence and Social Status (Speaker-Reference)
12.4 Translation Equivalence by Social Status (Addressee-Reference)
12.5 Chi-Square Test Result of Translation Equivalence and Social Status (Addressee-Reference)
13.1 Descriptive Statistics of Fictional Dialogues in HD and YD
13.2 Types and Tokens of 3-Word and 4-Word LBs in HD and YD
13.3 Structural Classifications of Key-LBs in HD and YD
13.4 Statistics of VP-Based Key-LBs in HD and YD
13.5 Statistics of PP-Based Key-LBs in YD
13.6 Functional Classifications of Key-LBs in HD and YD
13.7 Statistics of Stance Key-LBs in HD and YD
13.8 Statistics of Referential Key-LBs in HD and YD
14.1 Rendering Strategies of Conventional Culture-Specific Metaphors
14.2 Rendering Strategies of Creative Culture-Specific Metaphors
14.3 Rendering Strategies of Conventional Culture-Universal Metaphors
14.4 Rendering Strategies of Creative Culture-Universal Metaphors
15.1 List of Top 10 Classifiers Present in the Contemporary Cantonese Translation of the Four Gospels
15.2 List of Top 10 Classifiers Present in the Historical Cantonese Translation of the Four Gospels
15.3 The Most Frequently Observed Classifiers Present in the Recent Cantonese Translation of the Four Gospels
15.4 The Most Frequently Observed Classifiers Present in the Historical Cantonese Translation of the Four Gospels
15.5 Reduplicated Classifiers in the Cantonese Translation of the 2010 Edition of the Four Gospels (N = 11)
15.6 Reduplicated Classifiers in the Cantonese Translation of the 1880s edition of the Four Gospels (N = 32)

Figures

2.1 Phillip Lopate’s essays: spread of scores on “interactive vs. informational style.”
2.2 David Shields’s essays: spread of scores on “immediate vs. removed style.”
2.3 Ander Monson’s essays: spread of scores on “abstract expository vs. concrete descriptive style.”
2.4 Phillip Lopate: variation by communicative purpose on “interactive vs. informational style.”
2.5 David Shields: variation by communicative purpose on “immediate vs. removed style.”
2.6 Ander Monson: variation by communicative purpose on “abstract expository vs. concrete descriptive style.”
2.7 Ander Monson’s essays: three-cluster solution.
2.8 Clusters identified in Ander Monson’s essays.
3.1 Example 2 with annotation.
3.2 Screenshot of 50 suspensions in Fortunata and Jacinta.
3.3 Screenshot of CLiC tool with 20 suspensions from Oliver Twist.
4.1 Screenshot of concordance a la puerta de la in Ruiz Zafón’s novels.
5.1 Example dependency tree to illustrate character-verb pair extraction (K229).
5.2 Most frequent verbs with nominal subjects.
5.3 Dependency tree with a character-toponym pair involving a verb.
5.4 Dependency tree with a character-toponym pair involving a preposition.
5.5 Most frequent verbs that take toponyms as direct objects.
5.6 Most frequent prepositions that take toponyms as prepositional objects.
5.7 The ten toponyms most frequently mentioned with a character.
6.1 Melby’s eight types of translation technology (1998, 1).
7.1 Scores for dimension 1 of different registers.
8.1 Concordance sample for reporting verbs attributed to Harry.
8.2 Query for reply in Treq.
8.3 Query for urged in WordNet.
8.4 “Freq” effect plots.
8.5 “Freq” effect plots without said.
9.1 Overall distribution of translation strategies for sexual content in feminist (Tao) and non-feminist (Yang) translation.
9.2 Overall distribution of translation strategies according to sexual content type in feminist (Tao) and non-feminist (Yang) translation.
9.3 Sentence-level comparison: the number of sentences translated with a more explicit strategy in the feminist (Tao) than non-feminist (Yang) translation, and vice versa.
10.1 Excerpt of the Translation Dashboard for Basque, Bulgarian, Dutch, Finnish, German, Hungarian, Polish, Portuguese, Russian, and Ukrainian (first five chapters).
10.2 Example of paragraph alignment for the Basque version.
10.3 Example of paragraph alignment for the Bulgarian version.
11.1 Stylistic panoramas of the three subgenres, from a global view.
11.2 Cluster dendrogram of the HCA-based stylistic panoramas.
11.3 Sample parallel list for the MFWs in the selected works.
11.4 Sample parallel list for the MFWSs (2-grams) in the selected works.
11.5 Sample parallel list for the MFWSs (3-grams) in the selected works.
11.6 PCA graph of individuals, based on the top 1,000 MFWs.
11.7 PCA graph of individuals, based on the top 1,000 2-grams.
11.8 PCA graph of individuals, based on the top 1,000 3-grams.
12.1 Lexicogrammatical realization of speech roles in LoCH (ST).
14.1 RT translation route of metaphors by L1 and L2 translators.

Notes on Editors

Riccardo Moratto (PhD, FCIL) is Professor of Translation and Interpreting Studies and Chinese Literature in Translation at the Graduate Institute of Interpretation and Translation, Shanghai International Studies University; Chartered Linguist and Fellow Member of CIOL; Editor in Chief of Interpreting Studies for Shanghai Foreign Language Education Press (外教社); General Editor of Routledge Studies in East Asian Interpreting; and General Editor of Routledge Interdisciplinary and Transcultural Approaches to Chinese Literature.
Professor Moratto is also Honorary Guest Professor at the College of Foreign Studies, Nanjing Agricultural University; Honorary Research Fellow at the Center for Translation Studies of Guangdong University of Foreign Studies; and Expert Member of the Translators Association of China (TAC). Professor Moratto is an international conference interpreter and renowned literary translator. He has published extensively in the field of translation and interpreting studies and Chinese literature in translation.

Defeng Li, Professor of Translation Studies, is Associate Dean of Faculty of Arts and Humanities and Director of the Centre for Studies of Translation, Interpreting, and Cognition (CSTIC) at the University of Macau. Prior to his current appointment, he served as Chair of the Centre for Translation Studies and Reader in Translation Studies at SOAS, University of London; Director of the MA in Translation and Associate Professor at the Chinese University of Hong Kong; Dean and Chair Professor at Shandong University (adjunct); and Chair Professor at Shanghai Jiaotong University (adjunct). He is currently President of the International Association of Translation, Interpreting, and Cognition (IATIC) and World Interpreter and Translator Training Association (WITTA). He has researched and published extensively in the field of cognitive translation studies, corpus-assisted translation studies, curriculum development in translator training, research methods in translation studies, professional translation (e.g., business, journalistic, legal translation), as well as second language education.

Notes on Contributors

CHEUNG, Joyce Oiwun, is a PhD student at the Department of Chinese and Bilingual Studies of the Hong Kong Polytechnic University. Her current research interests include corpus linguistics, translation studies, and discourse analysis. She has previously published in linguistic journals such as Social Semiotics and Recall.

DIMITROULIA, Titika, is Professor of Translation Studies at Aristotle University of Thessaloniki, Greece, a translator and a literary critic. Her research interests range from literary translation to translation technologies, with emphasis on text analysis and translation history. Her publications include the volumes Translation and Memory (2021), Literary Translation (2015), and Digital Literary Studies (2015), and articles in various journals and edited volumes. She is a member of the board of the Petra-E network for the education and training of literary translators, and she coordinates the Clarin-Apollonis project at Aristotle University. She has translated numerous libretti and published books and articles on literary criticism.

EGBERT, Jesse A., is Associate Professor of Applied Linguistics at Northern Arizona University. Jesse specializes in register variation, quantitative methods in linguistics, and corpus linguistic approaches to legal interpretation. He is Founding General Editor of Register Studies, Technical Strand Editor for the series Cambridge Elements in Corpus Linguistics, and Co-Editor of the Routledge series Advances in Corpus Linguistics. He has published more than 75 peer-reviewed papers. Recent books include Using Corpus Methods to Triangulate Linguistic Analysis (Routledge, 2019), Doing Linguistics with a Corpus: Methodological Considerations for the Everyday User (Cambridge, 2020), and Designing and Evaluating Language Corpora (Cambridge, 2022).

FANG, Jing, is a lecturer in the Translation and Interpreting Program at Macquarie University. She has a PhD in linguistics and a master’s degree in translation and interpreting. Her current research interests include systemic functional linguistics, corpus-based translation studies, and sight translation research.

FRAISSE, Amel, is Associate Professor at Université de Lille, working on information science and digital humanities.
Her research focuses on information extraction, knowledge acquisition and visualization, multilingualism and multiculturalism, and under-resourced languages and cultures. She received a PhD in Computer Science in 2010 from Université de Grenoble. Her dissertation focused on issues of software localization, globalization, and internationalization process. From 2013 to 2015, she worked as a CNRS postdoctoral fellow at the LIMSI-CNRS laboratory (Orsay-Saclay, south of Paris), where she worked on natural language processing tasks and more specifically on building multilingual corpora and lexicon for sentiment analysis and opinion mining tasks.

FU, Shiwei, is PhD Candidate in the Department of Linguistics at Macquarie University. She is also a PhD candidate in the School of Foreign Languages and Literature at Wuhan University. Her research interests include Chinese literary translation, corpus-based translation studies, and translation theories.

GRACHEVA, Marianna, is PhD Candidate in Applied Linguistics at Northern Arizona University. Marianna’s linguistic interests include register and style variation, grammar, and corpus linguistics. Her dissertation research focuses on intra-speaker style variation across registers in literary, academic, and political domains, as well as inter-speaker variation within registers, and investigates the relationship between individual linguistic style and the situation of use.

HOCKING, Darryl, is Senior Lecturer at Auckland University of Technology, New Zealand. His research primarily uses corpus, genre, and discourse analysis to examine the interactional genres and communicative practices in art and design settings and how these impact on creative activity. He is the author of the books Communicating Creativity (Palgrave Macmillan) and The Impact of Everyday Language Change on the Practices of Visual Artists (Cambridge University Press).

HOU, Linping, received his PhD in English Linguistics, specializing in Cognitive Translation Studies, from the Centre for Studies of Translation, Interpreting, and Cognition (CSTIC) at the University of Macau. He is Professor of Translation Studies in the English Department at Shandong University of Science and Technology and Director of the Center for Cognitive Translation Studies (CCTS). His main research interests include corpus-assisted translation studies, cognitive translation studies, and cognitive pragmatics. He has published more than 20 articles in all these areas.

JENN, Ronald, is Professor of Translation Studies at Université de Lille, France. He coauthored Mark Twain & France (University of Missouri Press, 2017) with Paula Harrington. After a France-Berkeley project on the “‘French Marginalia’ of Mark Twain’s Personal Recollections of Joan of Arc” with Linda Morris, his research now focuses on how Translation Studies and digital humanities can interact, using Huckleberry Finn as a case in point. This is what the France-Stanford ROSETTA/Global Huck project has done, in collaboration with Shelley Fisher Fishkin and Amel Fraisse (https://rosetta.huma-num.fr/worldmap/).

LEE, John Sie Yuen, is Associate Professor at the Department of Linguistics and Translation at City University of Hong Kong. He obtained his PhD in Computer Science at the Massachusetts Institute of Technology. His research interest is in natural language processing and its applications in digital humanities, translation studies, and computer-assisted language learning.

LEUNG, Wai-Mun, is Associate Professor of Applied Chinese Linguistics at the Department of Chinese and Bilingual Studies, the Hong Kong Polytechnic University. Her research interests include Cantonese studies, language planning and policy, Chinese-language education, and teaching Chinese to non-Chinese-speaking students.
She recently constructed “The 19th Century (1865–1894) Cantonese Christian Writings Database 十九世紀中後期(1865–1894)粵語基督教典籍資料庫.” She is also the first author of the monograph Biliteracy and Trilingualism: Language Education Policy Research in Hong Kong (兩文三語:香港語文教育政策研究, CityU Press, 2020), in which a chapter is devoted to discussing Cantonese.

LI, Dechao, is Associate Professor of the Department of Chinese and Bilingual Studies, the Hong Kong Polytechnic University. He also serves as Chief Editor of Translation Quarterly, a journal published by the Hong Kong Translation Society. His main research areas include corpus-based translation studies, empirical approaches to translation process research, history of translation in the late Qing and early Republican periods, and PBL and translator/interpreter training.

LI, Defeng, is Professor of Translation Studies and Director of the Centre for Studies of Translation, Interpreting, and Cognition (CSTIC) at the University of Macau. Previously, he served as Chair of the Centre for Translation Studies and Reader in Translation Studies at SOAS, University of London. He is President of the International Association of Translation, Interpreting, and Cognition (IATIC) and World Interpreter and Translator Training Association (WITTA). He has researched and published extensively in the field of cognitive translation studies, corpus-assisted translation studies, curriculum development in translator training, research methods in translation studies, professional translation, as well as second-language education.

LIU, Kanglong, is Assistant Professor in the Department of Chinese and Bilingual Studies, the Hong Kong Polytechnic University. He specializes in corpus-based translation studies, and his main interests include empirical approaches to translation studies, translation pedagogy, and corpus-based translation research.

MASTROPIERRO, Lorenzo, is Lecturer in English Language and Translation at the University of Insubria (Como, Italy).
He holds a PhD in English Linguistics from the University of Nottingham (UK). His research sits at the intersection between corpus linguistics, stylistics, and translation studies. He has worked extensively on corpus stylistic approaches to literary translation, publishing a monograph with Bloomsbury (Corpus Stylistics in Heart of Darkness and its Italian Translations) and several papers/chapters on topics such as translated cohesive networks, translator style as opposed to author style, the translation of repeated items, reader-response analysis, and the translation of reporting verbs.

MORATTO, Riccardo, is Professor of Translation and Interpreting Studies and Chinese Literature in Translation at the Graduate Institute of Interpretation and Translation, Shanghai International Studies University; Chartered Linguist and Fellow Member of CIOL (FCIL); Editor in Chief of Interpreting Studies for Shanghai Foreign Language Education Press (外教社); General Editor of Routledge Studies in East Asian Interpreting; and General Editor of Routledge Interdisciplinary and Transcultural Approaches to Chinese Literature. Professor Moratto is also Honorary Guest Professor at the College of Foreign Studies, Nanjing Agricultural University, and Expert Member of the Translators Association of China (TAC). Professor Moratto has published extensively in the field of translation and interpreting studies and Chinese literature in translation.

MOUNTFORT, Paul, is Associate Professor in Auckland University of Technology’s School of Communication Studies, and former Chair of the AUT Centre for Creative Writing. His research interests are narrative design and transmedia studies.

NIETO CABALLERO, Guadalupe, is Postdoctoral Researcher at the Universidad Complutense de Madrid, Spain. Her main research interests are in twentieth-century Hispanic prose. She also works on digital humanities and on corpus stylistics, with a particular focus on Spanish authors.
She has published several articles on corpus stylistics, and she is the coauthor of Estilística de corpus: nuevos enfoques en el análisis de textos literarios (2020, Peter Lang).

RUANO SAN SEGUNDO, Pablo, is Senior Lecturer at the University of Extremadura, Spain. His research interests are in corpus linguistics, corpus stylistics, and corpus translation studies, with a particular interest in Charles Dickens’s narrative fiction. He has published a number of articles and chapters in edited books in this area and is coauthor of the book Estilística de corpus: nuevos enfoques en el análisis de textos literarios (Peter Lang, 2020).

SU, Yanfang, is currently a PhD student in the Department of Chinese and Bilingual Studies, the Hong Kong Polytechnic University. Her research interests include corpus linguistics, corpus-based translation studies, and computer-assisted language learning.

WONG, Tak-Sum (黃得森), is Postdoctoral Fellow in the Department of Chinese and Bilingual Studies at the Hong Kong Polytechnic University. He received his BEng in Computer Science from Hong Kong University of Science and Technology (2004), and PhD in Linguistics from City University of Hong Kong (2018). He built a treebank of the Tripiṭaka Koreana during his doctoral study and has been working on the quantitative study of historical syntax. His research expertise covers Chinese historical linguistics, Cantonese linguistics, corpus linguistics, computer-assisted language learning, Chinese dialectology, and Chinese paleography. Recently, he has been working on historical Sinitic brush-talk materials from East Asian nations.

WU, Kan, is Lecturer of Translation Studies, School of Foreign Languages, Zhejiang University of Finance and Economics, Dongfang College. His research interests include corpus-based translation studies and digital humanities.

ZENG, Xinyi, earned her master’s degree in Conference Interpreting and Translation from the University of Essex.
She is currently a PhD candidate at the Department of Linguistics and Translation at City University of Hong Kong. Her research focus is on feminist translation and corpus-based translation studies.

Introduction

Riccardo MORATTO and Defeng LI

One of the most striking developments in the humanities over the past decades is probably their increasing integration with computing, which has resulted in the growth and popularity of digital humanities (DH) as a new discipline. As technology develops fast, so do the tools made available to humanities scholars and their applications in humanities research. Consequently, it has become very difficult to define DH, as its definitions become outdated very quickly. However, the difficulty in defining DH has not deterred the increasingly wide use of digital resources and tools in humanities research and the analysis of such applications. As a matter of fact, recent years have seen such applications accelerate. A fine example is the introduction and integration of corpus linguistics in the study of literature across the world. As Gonçalves states:

Corpus Linguistics can be a powerful tool in the analysis of literary texts, especially when allied with non-computational approaches, to bring into light interpretations, thematic details, critically important words in a text, and other information that to other types of analysis might go unseen. By enabling the researcher to process a large quantity of data, and by giving a statistical treatment to the information obtained, Corpus Linguistics provides an ideal approach to study various characteristics of a literary text that would otherwise have gone unnoticed. (2016, 42)

Biber (2011, 15) surveyed corpus-assisted analytical techniques for the analysis of literature.
He pointed out that “most of these studies focus on the distribution of words (analyzing keywords, extended lexical phrases, or collocations) to identify textual features that are especially characteristic of an author or particular text.” Fischer-Starcke (2010) applied corpus stylistics to the analysis of Northanger Abbey by Jane Austen, with the assistance of two corpora, one made up of six novels by Austen, and the other of texts contemporary with Austen’s. Her analysis shows how corpus tools can reveal textual linguistic features and how these features affect the literary meanings of the texts. Thompson and Sealey (2007) made a corpus-based comparison of children’s literature, adult literature, and newspaper texts and conducted a quantitative analysis of the most frequent words and sequences of words. They found that adult and children’s fiction are similar in some characteristics, but subtle differences also exist between them in the frequency of some linguistic items, and that the differences between fiction and news texts are apparent.

While many scholars pursue the application of computational and corpus linguistics in the study of literature as creative writings (e.g., Mahlberg 2007, 2012, 2013; Mahlberg and McIntyre 2011; Mahlberg et al. 2013; McIntyre 2010), Baker (2000) applied corpus methods in the analysis of literature as translations, a subsystem to the system of literature. In her seminal study of translator’s style, she defined translator’s style as “a kind of thumb-print that is expressed in a range of linguistic – as well as non-linguistic – features . . . the preferred or recurring patterns of linguistic behaviors, rather than individual or one-off instances of intervention” (245).
Saldanha (2011) argues that the stylistic features of a translation are not just those of a translator but rather a combination of the linguistic choices of the authors, translators, editors, and others who might have a role in the production and revision of the translated text. She revised the definition of translator’s style as:

[A] “way of translating” which (1) is felt to be recognizable across a range of translations by the same translator, (2) distinguishes the translator’s work from that of others, (3) constitutes a coherent pattern of choice, (4) is “motivated,” in the sense that it has a discernable function or functions, and (5) cannot be explained purely with reference to the author or source-text style, or as the result of linguistic constraints. (Saldanha 2011, 31)

Inspired by Baker (2000), many translation scholars applied corpus methods in the investigations of translator’s style involving different languages (e.g., Bosseaux 2001, 2004; Winters 2004a, 2004b, 2007, 2009; Li et al. 2011; Wang and Li 2011; Chen and Li 2022).

As research advances on both fronts, that is, creative as well as translated literature, scholars have also made efforts to expand research topics and innovate research designs. In order to capture the recent developments along these lines, an online roundtable seminar was organized by the Centre for Studies of Translation, Interpreting, and Cognition (CSTIC) of the Faculty of Arts and Humanities, University of Macau, at the end of 2020, with speakers from around the world. Due to the COVID-19 pandemic and the consequent travel restrictions, the seminar was held online. Despite the pandemic raging at the time, the speakers at the roundtable seminar spared no effort to share their research on how corpora can be utilized in the exploration of both original and translated literary texts.
To celebrate these new developments and to commemorate the remarkable research efforts despite the pandemic, the editors of the present volume invited some speakers of the roundtable seminar to develop their presentations into full-length chapters for inclusion in this book. In addition, some leading and active researchers were also approached to contribute a chapter to this book. Therefore, the present volume is the outcome of joint efforts of the speakers at the roundtable seminar and those who were not able to take part in the seminar but most generously agreed to share their recent research with us. This book consists of five chapters on the application of corpora to literary studies and nine on the application of corpora to literary translation studies. To facilitate reading, the abstracts of the authors will be presented in the following passages as a summary of each chapter. In Chapter 1, Darryl Hocking and Paul Mountfort argue that recent years have seen a proliferation in the diachronic analysis of narrative fiction. Using large ready-made, digitized collections of fictional works, or smaller corpora compiled by the researchers themselves, conclusions about developments in narrative fiction traditionally emerge through a focus on the fictional text itself. This chapter shows how the author interview, an increasingly pervasive genre in which authors speak candidly about their writing practices, also has much potential for contributing to the understanding of diachronic change in narrative fiction. Using a self-compiled corpus of interviews with fiction writers from the 1950s to the present day, the chapter identifies, firstly, shifts in the way that contemporary authors of narrative fiction have conceptualized their practices over time and, secondly, the more salient conceptualizations of fiction writing that have emerged for each decade since the 1950s. In Chapter 2, Marianna Gracheva and Jesse A.
Egbert show that authors vary their language use according to the situational characteristics of individual texts rather than simply their idiosyncratic preferences for certain language features, by building on the results of a multidimensional analysis, which reveals considerable within-author stylistic variation in modern literary nonfiction essays. In particular, the study traces the relationship between communicative purpose and stylistic choices made by authors across their works and shows that style reflects functional considerations rather than only individual language attitudes. The findings contribute to the field of corpus stylistics and have practical value for literary analysis, translation, and creative writing, as they demonstrate that stylistic effects achieved through strategic use of specific linguistic devices are closely associated with the situation of use. In Chapter 3, Pablo Ruano San Segundo uses a corpus-stylistic approach and investigates the alleged influence of Charles Dickens on the style of the Spanish novelist Benito Pérez Galdós. To do so, he has developed an annotation system of Galdós’ novels to identify suspensions. A suspension is a protracted interruption by the narrator of a character’s speech. Stylistically speaking, suspensions have received attention as one of the techniques typical of Dickens’ style. In this chapter, Pablo Ruano San Segundo compares Dickens’ use of suspensions to that of Galdós in Fortunata and Jacinta, the novel for which the Spanish novelist is best known. The results show that there are patterns in form and function hitherto unremarked in literary appreciations of Galdós’ style that show how the Spanish novelist may have incorporated this device into his style to achieve similar effects as those conveyed by Dickens. In Chapter 4, Guadalupe Nieto Caballero and Pablo Ruano San Segundo analyze Ruiz Zafón’s The Cemetery of Forgotten Books series using a corpus-stylistic methodology.
The authors look into how Zafón shapes narrative space in the series. More specifically, they intend to show how certain aspects discussed by literary critics are enacted in the same way throughout the series, thus unveiling aspects of Ruiz Zafón’s craftsmanship hitherto unremarked in literary appreciations of his style. To do so, the authors have carried out a cluster analysis, with which they have identified textual building blocks and analyzed them systematically. The analysis is meant to make a contribution to the still-emerging field of corpus stylistics in Spanish, illustrating how the analysis of literary works can benefit greatly from the use of innovative corpus tools. In Chapter 5, Tak-sum Wong and John Sie Yuen Lee argue that information extraction from historical text is challenging because of the lack of data to train natural language processing tools. This chapter evaluates the utility of in-domain training data for data-driven profiling of characters, verbs, and toponyms and reports a case study on a corpus of Chinese Buddhist text. As is typical for such a corpus, the Chinese Buddhist Canon has few annotated linguistic resources other than names, places, and domain-specific terms. The authors apply a lexicon-based approach for named entity recognition and then report an analysis of the “who,” “what,” and “where” of the Canon: who the characters were, what they did, and where they were. Experimental results also show that even a small amount of word segmentation, part-of-speech, and dependency annotation can improve accuracy in named entity recognition and in extraction of character-verb associations. In Chapter 6, Titika Dimitroulia aims at discussing the use of corpora in literary translation practice, teaching, and research in the frame of the emerging field of computer-assisted literary translation (CALT). 
Applied corpus-based translation studies (CBTS or CTS) have not so far extensively investigated the use of corpora in literary translation practice and education, whereas in descriptive CBTS, in which literary translation holds an eminent place, digital humanities (DH) techniques that explore large corpora in innovative ways offer new perspectives in the study of literary translation as a complex sociocultural discursive event at the heart of world literature. First, the author attempts an outline of the use of corpora by professional literary translators during the translation process, as reshaped by electronic tools, and draws out the implications for literary translator education. Based on this, a number of new approaches to literary translation in CBTS will be presented, casting light on translation’s complexity and the potential of corpus analysis with the use of advanced techniques and methodologies. In Chapter 7, Yanfang Su and Kanglong Liu argue that in fiction, fictional dialogues are created with the purpose of character building, plot development, and reader appeal. To achieve these purposes, the orality features present in fictional dialogues are designed to mimic authentic conversations, which also pose great challenges for translation. Previous studies have investigated how orality features are translated in fictional dialogues. There is, however, a lack of quantitative analysis of representative orality features in existing studies of orality in fictional dialogues. The objective of this study is to fill the research gap by comparing orality in translated and non-translated fictional dialogues using a comparable corpus design. To ensure a robust comparison of the two text types, the authors made use of the linguistic features of dimension 1 in the multidimensional analysis approach, together with the dimension scores of the two text types.
It was found that both text types display an orality tendency toward the interactive texts, but the dimension score for non-translated fictional dialogues was higher than that for translated dialogues, and 11 out of the 28 linguistic features were more frequently used in non-translated fictional dialogues. The study further explores possible explanations for the different profiles of translated and non-translated fictional dialogues. In Chapter 8, Lorenzo Mastropierro explores the translation of repeated reporting verbs in the Harry Potter series in Italian. The author applies a multifactorial approach to investigate whether and to what extent four factors, representing linguistic features of the source text verbs, have an effect on the reproduction of repetition in translation or its avoidance. The factors are (i) the frequency, (ii) the number of possible translation equivalents, (iii) the number of different meanings, and (iv) the semantic category of the source text verbs. This study moves beyond the focus on the effects of avoiding repetition in literary translation, or on the strategies used by translators to avoid repetition, as seen in the existing literature, to provide instead a data-based and multidimensional description of the phenomenon itself in the context of reporting verbs. In Chapter 9, Xinyi Zeng and John Sie Yuen Lee present a quantitative analysis of a feminist translation of The Color Purple by Alice Walker. The analysis identifies distinctive translation strategies in the Chinese version of the novel by Jie Tao, a prominent feminist translator, through comparison with the version by Renjing Yang, who did not identify as feminist. The authors annotated 197 sentences in the novel in terms of their sexual content type and the translation strategy adopted by Tao and Yang.
Results show that the feminist version constitutes a slightly more faithful translation, with more frequent use of explicit translation strategies and less frequent use of conservative ones. Further, it exhibits distinctive choices in translation strategy for different sexual content types. The feminist perspective likely motivated the relatively explicit treatment of references to private parts, body explorations, and female bodily phenomena and relatively conservative treatment of rape, illicit relations, and stigma related to virginity. In Chapter 10, Ronald Jenn and Amel Fraisse describe how corpora-based DH projects can bring together scholars from different fields, such as natural language processing, information science, translation studies, and American studies with equal benefits. This chapter uses Adventures of Huckleberry Finn as a case in point to explore the necessary steps to be taken and defines a number of criteria to emulate other corpora-based interdisciplinary projects that would use literary texts. One important aspect is also the retrieval of existing scholarship on the translated texts, allowing for wider multilingual approaches, significantly broadening the scope of fine-grained textual analysis. In Chapter 11, Kan Wu and Dechao Li investigate the extent to which English translations of Chinese Wuxia fiction and Western heroic literature in modern English are stylistically similar through stylometric analyses. This chapter adds to literary translation research by highlighting possible stylistic connections between heroic literature in the East and that in the West, clues that may help understand the current reception of Wuxia translations. It also contributes to stylometric studies by introducing the stylistic panorama, a novel concept proposed to describe the stylistic picture of a (translated) text in a relatively holistic and functional way.
Examining six English translations of Wuxia novels and 12 chivalric stories and heroic fantasies in modern English, the study finds that the Wuxia translations differ from the two Western subgenres in stylistic panoramas built by formal features (dispersion of word lengths, average sentence length, etc.), as well as the most frequent words and the most frequent word sequences. Such differences have foregrounded the unique stylistic features (richer Wuxia-specific vocabularies, shorter paragraph lengths, etc.) of these translations, which has contributed in part to their favorable reception among English-speaking readers. It is hoped that this study will encourage new applications for the concept of stylistic panoramas in future stylometric studies. In Chapter 12, Jing Fang and Shiwei Fu draw upon a parallel corpus of the Chinese martial arts novel Legends of the Condor Heroes (射雕英雄傳) and present a linguistic exploration of the English translation of personal reference used in the novel. In particular, the examination focuses on the protagonist Guo Jing (郭靖), investigating how the character makes reference to himself and to his listeners in conversations, through which his personal trait of humbleness is portrayed. Reviews from English readers of the translation show that Guo’s humbleness has been successfully rendered in the translated text, despite the linguistic disparity between the two languages that poses a challenge in translating personal reference, which is a significant contributor to the portrayal of humbleness. By examining and comparing the lexicogrammatical realization of personal reference in the source and the target texts, the authors try to explore how an equally humble character is developed in the translation. 
Findings indicate that the translator’s choices in translating personal reference are closely related to the translator’s analysis of the social status of the characters, and the translator also uses compensation strategies at both the microlevel of sentence and the macrolevel of text to render an equivalent humble image in the target text. The study is expected to shed light on how a pragmatically equivalent character could be developed in translation when the two languages are culturally distant. In Chapter 13, Kanglong Liu, Joyce Oiwun Cheung, and Riccardo Moratto argue that the use of lexical bundles (LBs) has been affirmed to be a reliable indicator of translators’ style as they can reveal the idiosyncrasies beyond the use of words. Using LBs as an indicator, the authors investigate how fictional dialogues in two full-length English translations of Hongloumeng diverge in style. This corpus-assisted study is based on the first 80 chapters of two full-length Hongloumeng translations, that is, one translated by the British sinologist David Hawkes (who translated the first 80 chapters) and John Minford (who translated the remaining 40 chapters), and the other co-translated by the Chinese translator Xianyi Yang and his British wife, Gladys Yang. The results of the study show that Hawkes used more tokens and types of LBs than did the Yang couple. Further structural and functional analysis revealed that Hawkes overused verb phrases and stance markers, whereas the Yangs overused prepositional phrases and referential markers. The divergences in style are discussed with reference to the translators’ language backgrounds, life experiences, and translation purposes. In Chapter 14, Linping Hou and Defeng Li compare English translations of metaphors in Lu Xun’s short stories translated by L1 and L2 translators, adopting a corpus-assisted method.
The patterns of the translation strategies were analyzed within the framework of relevance theory. The authors found that paraphrasing as a rendering strategy was used more frequently in translating culture-specific metaphors than culture-universal ones, and more recurrently in translating conventional metaphors than creative ones. This suggests that more cognitive effort was required to translate these conventional, culture-specific metaphors, typical barriers in literary translation, to achieve optimal relevance. The authors also found that L1 translators were more target-oriented in rendering the metaphors than L2 translators, indicating an impact of translation direction on the translators’ performance. Finally, the research attested to the interplay of the two routes (i.e., direct translation and indirect translation) in literary translation and demonstrated that the interaction between the two routes was regulated by the principle of relevance and modulated by such essential factors as the source input and the translation direction. In Chapter 15, Tak-sum Wong and Wai-Mun Leung provide a statistical account and a contrastive study of the use of classifiers in historical Cantonese and contemporary Cantonese documents. The authors have conducted a statistical analysis of classifiers present in the Cantonese translations of the 1880s edition and the 2010 edition of the four canonical gospels in the Christian New Testament; 94 classifiers are observed in the 2010 edition, but only 80 are found in the 1880s edition. The results show that while some classifiers have been used more regularly since the nineteenth century, for example, kɔ33個 (a general classifier), kin22件 “piece,” thiu11條 “strip,” tsɛk33隻 (mostly for counting animals and dolls), and ti55 的/啲, the frequency of some classifiers in the 2010 edition drops drastically as a result of lexical replacement, for example, tat33笪 (for counting fields) in place of fai33塊.
The authors have also found that the reduction in frequency of reduplicated classifiers is a result of changes in translation strategy rather than a real reduction in usage in contemporary Cantonese.

We believe that such an international, dynamic, and interdisciplinary exploration will provide valuable insights for anyone who wishes to have a better understanding of the relationship between corpora, literature, and literary translation. We also believe it is appropriate to take this very opportunity to thank all the contributors of this volume for their dedication and their efforts to produce their best research and share it with the readers of the book.

References

Baker, Mona. 2000. “Towards a Methodology for Investigating the Style of a Literary Translator.” Target 12, no. 2: 241–66.
Biber, Douglas. 2011. “Corpus Linguistics and the Scientific Study of Literature: Back to the Future?” Scientific Study of Literature 1, no. 1: 15–23.
Bosseaux, Charlotte. 2001. “A Study of the Translator’s Voice and Style in the French Translation of Virginia Woolf’s The Waves.” In CTIS Occasional Papers, edited by Maeve Olohan, 55–75. Manchester: Centre for Translation and Intercultural Studies, UMIST.
Bosseaux, Charlotte. 2004. “Point of View in Translation: A Corpus-based Study of French Translations of Virginia Woolf’s To the Lighthouse.” Across Languages and Cultures 5, no. 1: 107–22.
Chen, Fengde, and Defeng Li. 2022. “Patronage and Ideology: A Corpus-assisted Investigation of Eileen Chang’s Style of Translating Herself and the Other.” Digital Scholarship in the Humanities: fqac015. https://doi.org/10.1093/llc/fqac015.
Fischer-Starcke, Bettina. 2010. Corpus Linguistics in Literary Analysis: Jane Austen and Her Contemporaries. London: Continuum.
Gonçalves, Lourdes Bernardes. 2016. “A Contribution of Corpus Linguistics to Literary Analysis.” Transversal – Revista em Tradução, Fortaleza 2, no. 2: 42–53.
Li, Defeng, Chunling Zhang, and Kanglong Liu. 2011.
“Translation Style and Ideology: A Corpus-assisted Analysis of Two English Translations of Hongloumeng.” Literary and Linguistic Computing 3: 1–14.
Mahlberg, Michaela. 2007. “Clusters, Key Clusters and Local Textual Functions in Dickens.” Corpora 2, no. 1: 1–31.
Mahlberg, Michaela. 2012. “Corpus Analysis of Literary Texts.” In The Encyclopedia of Applied Linguistics, edited by C. A. Chapelle, 1479–85. Oxford: Blackwell.
Mahlberg, Michaela. 2013. Corpus Stylistics and Dickens’s Fiction. New York and London: Routledge.
Mahlberg, Michaela, and Dan McIntyre. 2011. “A Case for Corpus Stylistics: Ian Fleming’s Casino Royale.” English Text Construction 4, no. 2: 204–27.
Mahlberg, Michaela, Catherine Smith, and Simon Preston. 2013. “Phrases in Literary Contexts: Patterns and Distributions of Suspensions in Dickens’s Novels.” International Journal of Corpus Linguistics 18, no. 1: 35–56.
McIntyre, Dan. 2010. “Dialogue and Characterization in Quentin Tarantino’s Reservoir Dogs: A Corpus Stylistic Analysis.” In Language and Style, edited by Dan McIntyre and Beatrix Busse, 162–82. Basingstoke: Palgrave Macmillan.
Saldanha, Gabriela. 2011. “Translator Style: Methodological Consideration.” The Translator 17, no. 1: 25–50.
Thompson, Paul, and Alison Sealey. 2007. “Through Children’s Eyes?: Corpus Evidence of the Features of Children’s Literature.” International Journal of Corpus Linguistics 12, no. 1: 1–23.
Wang, Qing, and Defeng Li. 2011. “Looking for Translator’s Fingerprints: A Corpus-based Study on Chinese Translations of Ulysses.” Literary and Linguistic Computing 11: 1–13.
Winters, Marion. 2004a. “German Translations of F. Scott Fitzgerald’s The Beautiful and Damned: A Corpus-based Study of Modal Particles as Features of Translators’ Style.” In Using Corpora and Databases in Translation, edited by Ian Kemble, 71–89. London: University of Portsmouth.
Winters, Marion. 2004b. “F.
Scott Fitzgerald’s Die Schönen und Verdammten: A Corpus-based Study of Loan Words and Code Switches as Features of Translators’ Style.” Language Matters, Studies in the Languages of Africa 35, no. 1: 248–58.
Winters, Marion. 2007. “F. Scott Fitzgerald’s Die Schönen und Verdammten: A Corpus-based Study of Speech-act Report Verbs as a Feature of Translators’ Style.” Meta 52, no. 3: 412–25.
Winters, Marion. 2009. “Modal Particles Explained: How Modal Particles Creep into Translations and Reveal Translators’ Styles.” Target 21, no. 1: 74–97.

1 Diachronic Trends in Fiction Authors’ Conceptualizations of their Practices

Darryl Hocking and Paul Mountfort

1.1 Introduction

In recent years, there has been a considerable increase in studies that investigate diachronic corpora of narrative fiction in order to establish how fiction writing has changed over time (McIntyre and Walker 2019). Many of these diachronic analyses involve the development of researcher-compiled corpora and focus on the stylistic evolution of individual authors, for instance, Hoover’s (2007) study on the work of Henry James. Some, such as Klaussner and Vogel’s (2018), have sought to compare diachronic shifts in style between individual authors, while others examine broader stylistic developments over time in collections of authors’ works from a particular historical period. An early example of the latter can be found in Biber and Finegan’s (1989) diachronic study of fiction writing, which examines stylistic change in a small corpus of 33 English literary works from the seventeenth century onward. Although identifying some pockets of resistance, particularly in the eighteenth century, Biber and Finegan provide strong evidence of a progression from a more elaborated and impersonal style in their fictional texts toward one more characteristic of oral language.
With the growing access to large digitized collections of time-stamped historical texts of fictional works, for example Project Gutenberg or The Google Books Corpus, the number of diachronic analyses of narrative fiction is further expanding, as are the specific foci of the studies. Underwood (2019), for example, investigates a corpus of English fiction drawn from the HathiTrust Digital Library from 1700 until the early twenty-first century. Among his observations, he finds that over time fiction has increasingly become distanced stylistically from biographical and nonfictional texts, often the result of a growth in more concrete descriptions of the human body, physical actions, and sensory perceptions. Underwood also finds that since the 1700s, changes in the pacing of fictional works have occurred, showing that in the average 250-word passage of an eighteenth century novel, several days are likely to have passed, but by the twentieth century, a passage of a similar length will typically only describe a period of 30 minutes. In another example, Sun and Wang (2022) use a 32,851 million token Google Books subcorpus of English fiction from the 1820s to the 1990s to examine whether the language of English fiction has tended toward the more concrete or the more abstract. They find that, over the last two centuries, English fiction has become increasingly concrete, and conclude that fiction today is less difficult to read than it was in the nineteenth century. Also using the English fiction dataset of the Google Books corpus, but restricting their analysis to the years between 1900 and 2000, Morin and Acerbi (2017) found that the presence of emotional words significantly decreased in English fiction. To validate their findings, however, they carried out a similar examination of two other smaller, self-built corpora of English fiction and came to similar conclusions.

DOI: 10.4324/9781003298328-2
Rather than looking for more general diachronic trends, a number of studies have targeted specific areas of diachronic development in narrative fiction, for instance, Busse’s (2020) examination of how fictional characters’ thought and speech are presented in nineteenth century fictional works, or Kung’s (2007) focus on the word melancholy in pre-Romantic and Romantic British novels. Among other findings, Kung found that melancholy is typically associated with reasoning in the pre-Romantic period and emotion in the Romantic period. These diachronic studies all develop their findings through a focus on the fictional text itself. A related genre, however, which has untapped potential to contribute to the analysis of diachronic change in narrative fiction, but whose focus lies beyond the work, is the author interview. In interviews, authors generally discuss a broad range of issues related to the conceptualization of their practices, and given the marked proliferation since the 1950s of print and online publications containing interviews with fiction authors (e.g., Freiburg 1999), a further opportunity is provided for the use of corpus-based analytical tools to examine and identify shifting trends in fiction and fiction writing practice. Taking this into account, this chapter uses a self-compiled corpus of interviews with fiction authors, firstly, to identify diachronic changes in the way that these authors have conceptualized their practices from the 1950s to the present day and, secondly, to identify the more salient conceptualizations of fiction writing practice for each of the decades since the 1950s. The chapter concludes by evaluating the potential contribution of the author interview to the quantitative analysis of fiction writing.
1.2 Methods

In order to investigate changes in the way that fiction authors have used language to conceptualize their writing practices from 1950 until the present day, a trend mapping analysis (Baker 2011; Stanyer and Mihelj 2016; Hocking 2022) and a diachronic keyword analysis (Baker et al. 2013; Csomay and Young 2021) were carried out on a 433,000-word diachronic fiction authors’ corpus (hereafter FAC), consisting of interviews with fiction authors about their practices. Trend mapping identifies words in a corpus that have exhibited statistically significant decreases or increases in frequency over a particular period of time, while a diachronic keyword analysis enables the identification of words found in a specific time period of a corpus that are comparatively more frequent than those found in the rest of the corpus. In the case of the FAC, trend mapping can provide insights into the wider shifts over time that have occurred in fiction writers’ conceptualization of their practices, while a diachronic keyword analysis can help reveal the salient conceptualizations of authors’ practices during specific time intervals within the corpus. To carry out the analysis, the study employed the Sketch Engine online corpus-analytical tool (Kilgarriff et al. 2014). After providing a description of the FAC and the criteria used for its compilation, the methods used to analyze the corpus will be discussed in more detail.

1.2.1 The Fiction Authors’ Corpus (FAC)

The FAC consists of 232 interviews with fiction authors from 1950 to the present. The 1950s were selected as the earliest decade for the corpus due to the difficulty of easily obtaining author interviews prior to this period. To facilitate a manageable analysis of trends, the FAC was divided into time intervals representing eight calendar decades (Baker et al. 2013).
The texts in the corpus were sourced online, some from online magazines and others from print magazines reproduced online. Importantly, as the focus of the corpus was on authors of fiction (i.e., novels and short stories), interviews with authors who primarily discussed, or whose oeuvre primarily involved, poetry, theatre, essayist literature, or the writing of screenplays were excluded. Furthermore, as all interviews were required to be in English, authors from English-speaking contexts were largely represented in the corpus. Nevertheless, interviews of authors whose first language was not English were still included if there was no evidence that the interview had been translated from another language. In order to specifically focus on the author’s conceptualization of their practices, the questions and comments of interviewers were omitted from the corpus data. Moreover, where appropriate, sections of the interviews that explicitly focused on the authors’ discussions of their personal history, rather than their writing practices, were also removed from the texts. The FAC is ethnically diverse and includes emergent, mid-career, and late-career writers of fiction. While the latter could be argued to conceptualize their practice in ways representing earlier decades, we would suggest that like all creative individuals, their practices continue to be influenced by conceptual shifts in the field. Sketch Engine, the corpus tool used for the analysis, indicates that the FAC contains 433,281 words. The size of the FAC was determined by the number of suitable texts that could be obtained to represent each of the earlier decades, and the decision that the later decades of the corpus should contain a corresponding number of words. Hence, to produce a reasonably sized corpus yet prevent the stylistic peculiarities of any single author emerging as a generality (McEnery et al. 2006), individual interviews were limited to 2,500 words.
The FAC might be criticized as a relatively small corpus (Aston 1997); however, small context-specific corpora are frequently viewed as advantageous for the analysis of specialized language use (Bowker and Pearson 2002; Vaughan and Clancy 2013), because the analyst is able to develop a familiarity with the overall nature of the texts, which, along with their typically broad knowledge of the context from which they arise, enhances the capacity to generate and interpret findings (Aston 1997; Koester 2010). Details of the composition of the FAC can be seen in Table 1.1. Each time interval represents 12.5% of the corpus and contains approximately 54,000 words.

Table 1.1 Word Composition of the FAC

Decade   Words     Percentage of Corpus   Texts   Average Words per Text
1950s    54,300    12.5                   26      2,088
1960s    54,317    12.5                   29      1,873
1970s    53,986    12.5                   26      2,076
1980s    54,047    12.5                   27      2,002
1990s    54,107    12.5                   27      2,004
2000s    54,025    12.5                   28      1,929
2010s    54,189    12.5                   30      1,806
2020s    54,310    12.5                   39      1,393
TOTAL    433,281   100                    232     1,868

1.2.2 Trend Mapping

Using the Trends tool in the corpus analytical software Sketch Engine, the trend mapping analysis involved identifying those lemmas which exhibited statistically significant (p-value ≤ 0.05) increasing or decreasing frequency trends over the eight decades of the FAC. The tool uses the Theil-Sen estimator to provide a linear approximation of the slope of the frequencies of an item over time by calculating the median slope between all individual pairs of frequency points. In the case of this study, the analytical focus was on the lemma representing a particular part of speech category (referred to as a lempos in Sketch Engine). In Sketch Engine, the Theil-Sen slope is represented by a numerical value that identifies the direction and magnitude of the trend. The Mann-Kendall test is also employed to identify the significance level of the trend statistic by providing a p-value.
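The two statistics named above can be illustrated with a minimal Python sketch. It is not Sketch Engine's implementation: the function names are hypothetical, the per-decade frequencies are invented for illustration, the Mann-Kendall p-value uses the standard normal approximation without a correction for ties, and the slope is left in raw units, so it will not match the scaled trend values that Sketch Engine reports.

```python
from itertools import combinations
from math import erf, sqrt
from statistics import median


def theil_sen_slope(xs, ys):
    """Theil-Sen estimator: the median of the slopes between all pairs of points."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
              if x2 != x1]
    return median(slopes)


def mann_kendall(ys):
    """Mann-Kendall trend test.

    Returns (S, two-sided p-value). Uses the normal approximation with a
    continuity correction and assumes no tied values (no tie correction).
    """
    n = len(ys)
    # S counts later-minus-earlier increases (+1) and decreases (-1).
    s = sum((b > a) - (b < a) for a, b in combinations(ys, 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / sqrt(var_s)
    elif s < 0:
        z = (s + 1) / sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal CDF.
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return s, p


# Invented frequencies of one lemma across eight decade intervals.
decades = [1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020]
freqs = [4, 6, 5, 9, 10, 12, 13, 17]

slope = theil_sen_slope(decades, freqs)
s_stat, p_value = mann_kendall(freqs)
print(f"slope per year: {slope:.3f}, S = {s_stat}, p = {p_value:.4f}")
```

Because the slope is a median rather than a mean, a single unusual decade has little influence on it, which suits a series of only eight points.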
Further details of Sketch Engine’s Trends tool can be found in Kilgarriff et al. (2015). Drawing upon Baker (2011), Lazzeretti (2016), and Hocking (2022), it was also determined that the trend analysis would only include those lemmas that occurred a minimum of 50 times in the corpus. Furthermore, it was decided to include only those lemmas which exhibited a relatively strong slope over time, represented by a trend value higher than 1 for increasing trends and lower than -1 for decreasing trends. These criteria resulted in 51 increasing or decreasing lemmas of interest.

1.2.3 Diachronic Keyword Analysis

Keywords are those words in a target corpus whose frequencies are unusually high when set against the frequencies of the same words in a larger reference corpus (McEnery et al. 2006). They are typically used to capture the overall “aboutness” (Scott and Tribble 2006, 55) of a target corpus. Following Baker et al. (2013) and Csomay and Young (2021), a diachronic keyword analysis, however, involves establishing the keywords for a particular time interval within a diachronic corpus by referencing that time interval against the rest of the corpus. This enables the analyst to identify the aboutness of each particular time interval in the corpus, which in the case of the FAC can indicate certain salient conceptualizations of the fiction authors’ practices within successive decades. The diachronic keyword analysis complements the trend analysis, which focuses on increasing and decreasing trends from the earlier time intervals of a corpus to the later time intervals. To evaluate keyness, Sketch Engine uses a “simple maths” formula to establish a statistic identifying the degree of keyness and then ranks the results (Kilgarriff 2009). As in the trend analysis, the focus in the diachronic keyword analysis is on lemmas representing a particular part of speech category.
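As a rough illustration of the “simple maths” statistic (Kilgarriff 2009), the sketch below divides the smoothed frequency per million (FPM) in the target decade by the smoothed FPM in the rest of the corpus. The smoothing constant of 100 is an assumption: the chapter does not state the value used, but with the FPM figures from Table 1.4 it gives scores that agree with that table's keyness column to one decimal place. The function and variable names are hypothetical.

```python
def simple_maths_keyness(fpm_focus, fpm_rest, smoothing=100.0):
    """Ratio of smoothed frequencies per million: (focus + N) / (rest + N)."""
    return (fpm_focus + smoothing) / (fpm_rest + smoothing)


# (FPM 1950s, FPM rest of the FAC) for the six 1950s key lemmas in Table 1.4.
fpm_1950s = {
    "critic": (592.56, 121.15),
    "technique": (512.48, 107.43),
    "hero": (464.44, 93.72),
    "simple": (384.36, 91.43),
    "play": (512.48, 160.00),
    "ought": (240.23, 48.00),
}

scores = {lemma: round(simple_maths_keyness(focus, rest), 1)
          for lemma, (focus, rest) in fpm_1950s.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranked)
```

A larger smoothing constant favors more frequent words over rare ones, which is one reason the choice of value affects which lemmas surface as key.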
Furthermore, in order to ensure that any key lemmas identified are substantively representative of the target decade and not the result of repeated use by the producers of just a few texts in the corpus, the keyword list for each decade only includes the top six key lemmas which occur in at least one-third of all texts from that decade. All keywords in the tables are listed in order of their keyness. Finally, in order to support the diachronic analysis by enabling the trend and diachronic keyword findings to be examined in context, the more conventional corpus-based tool of concordance analysis (Baker 2006) was also employed.

1.3 Results and Discussion

Table 1.2 provides a list of all upward-trending lemmas in the FAC that meet the criteria outlined in the previous section, while Table 1.3 provides a list of the downward-trending lemmas. In both tables, the lemmas are ranked according to their trend strength. The tables also show the part of speech, total frequency, and p-value for each lemma.

Table 1.2 High-Frequency Upward-Trending Lemmas in the FAC (freq. ≥ 50, p < 0.05)

| Rank | Word | Part of Speech | Trend Strength | Freq. | p-value |
|---|---|---|---|---|---|
| 1 | community | noun | 2.36 | 66 | 0.00930 |
| 2 | space | noun | 2.05 | 100 | 0.00930 |
| 3 | collection | noun | 2.05 | 54 | 0.01800 |
| 4 | parent | noun | 1.88 | 75 | 0.00440 |
| 5 | like | adjective | 1.60 | 71 | 0.03500 |
| 6 | school | noun | 1.54 | 107 | 0.00190 |
| 7 | kid | noun | 1.54 | 89 | 0.01800 |
| 8 | grow | verb | 1.48 | 156 | 0.00930 |
| 9 | relationship | noun | 1.48 | 121 | 0.03500 |
| 10 | include | verb | 1.43 | 61 | 0.00930 |
| 11 | experience | verb | 1.43 | 54 | 0.03500 |
| 12 | need | verb | 1.38 | 260 | 0.00930 |
| 13 | love | verb | 1.38 | 264 | 0.00930 |
| 14 | also | adverb | 1.33 | 530 | 0.00930 |
| 15 | family | noun | 1.33 | 186 | 0.00440 |
| 16 | home | noun | 1.33 | 128 | 0.01800 |
| 17 | her | determiner | 1.23 | 673 | 0.00930 |
| 18 | allow | verb | 1.23 | 105 | 0.01800 |
| 19 | narrative | noun | 1.23 | 85 | 0.01800 |
| 20 | lot | noun | 1.19 | 559 | 0.00440 |
| 21 | structure | noun | 1.11 | 110 | 0.00930 |
| 22 | everyone | noun | 1.11 | 87 | 0.00930 |

Table 1.3 High-Frequency Downward-Trending Lemmas in the FAC (freq. ≥ 50, p < 0.05)

| Rank | Word | Part of Speech | Trend Strength | Freq. | p-value |
|---|---|---|---|---|---|
| 1 | technique | noun | -2.90 | 79 | 0.0093 |
| 2 | critic | noun | -1.88 | 90 | 0.0180 |
| 3 | modern | adjective | -1.73 | 57 | 0.0044 |
| 4 | typewriter | noun | -1.66 | 54 | 0.0350 |
| 5 | simply | adverb | -1.66 | 158 | 0.0044 |
| 6 | play | noun | -1.66 | 102 | 0.0350 |
| 7 | hero | noun | -1.60 | 70 | 0.0019 |
| 8 | style | noun | -1.60 | 157 | 0.0350 |
| 9 | unless | conjunction | -1.60 | 51 | 0.0350 |
| 10 | use | noun | -1.60 | 50 | 0.0180 |
| 11 | several | adjective | -1.60 | 63 | 0.0350 |
| 12 | must | modal | -1.38 | 267 | 0.0093 |
| 13 | deal | noun | -1.33 | 82 | 0.0180 |
| 14 | business | noun | -1.33 | 82 | 0.0350 |
| 15 | he | determiner | -1.23 | 2,005 | 0.0093 |
| 16 | rewrite | verb | -1.19 | 54 | 0.0180 |
| 17 | accept | verb | -1.19 | 76 | 0.0350 |
| 18 | man | noun | -1.11 | 610 | 0.0019 |
| 19 | matter | noun | -1.11 | 154 | 0.0350 |
| 20 | himself | determiner | -1.11 | 145 | 0.0044 |
| 21 | no | – | -1.07 | 860 | 0.0044 |
| 22 | influence | noun | -1.07 | 66 | 0.018 |
| 23 | great | adjective | -1.04 | 423 | 0.0350 |
| 24 | begin | verb | -1.04 | 320 | 0.0093 |
| 25 | yes | – | -1.00 | 441 | 0.0093 |
| 26 | any | – | -1.00 | 612 | 0.0044 |
| 27 | no | adverb | -1.00 | 167 | 0.0180 |
| 28 | nothing | noun | -1.00 | 216 | 0.0093 |
| 29 | James | noun | -1.00 | 67 | 0.0350 |

In order to foreground the wider and more consequential shifts over time in fiction authors’ conceptualization of their practices, the following discussion of these trend results is organized around four major themes, all of which consider the association between particular upward- and downward-trending items. By consequential we mean those shifts which appear to constellate within a larger pattern and, as a result, may represent broader epistemological shifts in the discourse around writing.

1.3.1 A Shift from Formal and Generic Influence Toward the Possibilities and Subjectivity of Experience

A shift from formal concerns toward greater openness, experience, and multiple exigencies is visible, for instance, in the declining use of technique, influence,
In the 1950s, technique is ubiquitous, occurring 32 times compared to only twice in the 2010s, with the 1980s being the pivot to precipitous decline from 14 occurrences in the 1970s to only 3 in the 2020s. While technique may be framed subjectively (i.e., as stemming from writers’ instincts or nature) and there are disavowals of technique’s importance, the influence of particular books, authors, and literary movements, or adjacent media such as poems, plays, and films, is frequently cited. This is reinforced by the concomitant decline of influence, which peaked with 14 occurrences in the 1960s, where it is similarly deployed (“Jamesian influence,” “Hemingway’s influence,” etc.), as compared to only 4 occurrences per decade for the 2010s and 2020s. Furthermore, there is a declining sense of imperative, with use of the modal verb must halving by the 1990s from its 1960s peak, and falling to a fifth of that peak in the last two decades. Use of the adverb simply declines concomitantly from 41 in the 1950s to only 8 in the 2020s, perhaps signifying that notions of the self-evident and straightforward can no longer be taken for granted. By comparison, use of the verb experience has grown from only 1 instance in the 1950s to 5 per decade in the 1960s, 1970s, 1990s, and 2000s, with peaks of 9 in the 1980s, 10 in the 2010s, and 14 in the 2020s. The sole 1950s example linked age and experience, while the 1960s through the 1990s, despite often discussing experience in the literal sense, increasingly emphasize the role of consciousness in shaping reality, pivoting in the 1980s toward acknowledging heterodox ways of experiencing the world. By the twenty-first century, this sense has bedded in, along with reference to the interior experiences of both fictional characters and readers of the fictional work. Consistent upward trending of the verbs include and allow, along with the adverb also, suggests a similar “loosening up” of possibility.
Include varied between three and four usages per decade from the 1950s through the 1980s before coming to average almost triple that in the last four decades. It now frequently refers to a variety of formative factors, including the role of an author’s life experience in constructing a fictional work. Include can also connote inclusivity of choice regarding a writer’s options in developing a narrative (“a story can be anything, can include anything”). Usage of allow has quadrupled from the 1950s through to the 2020s and similarly suggests the multiplicity of affordances contemporary writers enjoy (“you are allowed to practice your way and I’m allowed to practice mine”), while also has risen from 22 instances in the 1950s to 124 in the 2020s. Its additive function may correlate with an increasing emphasis on being able to contain contradictory impulses, such as “works that are full of energy but also full of vulgarity, crudity, and incompetence.”

1.3.2 A Shift from Stylistics to Structure and Contested Narratives

“Style and technique” are commonly collocated, with style, another of the top 10 downwardly trending terms, declining more than fourfold from 40 usages per decade in the 1950s and 1960s to 8 today. As with technique, there are equivocal views on certain styles (i.e., “the conventional conception”), dubiousness about the very notion (“a fancy form of writing”), and associated notions of genre (“this so-called style. I don’t know what they’re talking about ‘tough,’ ‘hard boiled’”). There is a notable trend in the last two decades to frame style in terms of both contrivance (“a contrived style”) and heterodoxy (“each story brings its own style,” “alter egos who wrote in distinct styles”), suggesting a postmodern shift toward style as artifice rather than some essential attribute. Modern also declines notably from 13 and 14 occurrences in the 1950s and 1960s, respectively, to 2 and 3 in the 2010s and 2020s.
In the 1950s, it was commonly applied to literary genres, novels, and especially, writers. By the 2010s and 2020s, there are only two such usages, and one to do with “something that sounds modern.” The use of the adjective great is also in serious retrenchment, from its peak of 83 in the 1960s to 24 this decade. Though scattered references remain to great “writers,” “stories,” “tales,” and “fiction,” there is increasing reference to great “questions,” “mysteries,” “loves,” and “heartbreaks.” The inference would be that, along with specific styles, modernism’s concern with canonical writers and “great works” has lost currency to the demand for relatable content. By comparison, structure and narrative are on the incline. The growth of the former has been pronounced, increasing fivefold from the 1950s to 25 occurrences in the 2020s. Structure was explicitly linked in the 1950s and 1960s to formalism, including “form,” “logic,” “craft,” and “plan.” However, as early as the restive 1960s and 1970s, such overdetermination was increasingly questioned. Structure is, instead, something that emerges from the unknown and “can’t be planned in advance,” defying attempts at the “linear” and “well-defined.” The 1980s and 1990s go further, with writing itself imposing structure on reality, including power structures which must be contested. Such usages continue, though in the 2000s there was increasing emphasis on storytelling structures, such as “story structure,” “novelistic structure,” “Aristotelian structures,” and “folkloric structures.” Despite this, recent usage evidences a growing sense of the relativity and multiplicity of narrative structure/s. Use of narrative has more than doubled in frequency from 9 occurrences in the 1950s to 21 this decade, with a spike in the 1990s of 24 occurrences.
The shift is from fairly utilitarian deployment in the 1950s, often linked to “expositional and narrative writing” and “narrative technique,” to a more explicit focus on voice and characters’ own conceptualizations about themselves, often pluralistic in nature (“narrative points of view”). By the 2020s, there is increasing consciousness of how narratives are used to frame and indeed construct fictional “realities” (“different narrative[s],” “the nation’s narrative,” “a new kind of narrative”) and contestations of identity, such as gender and sexual orientation. Thus, we find mention of “men controlling the narrative” and “straight narrative,” versus “women get[ting] hold of the narrative,” “coming-out narratives,” and “trans narratives.”

1.3.3 Critics, the Business of Writing, and the Rise of Affect

Critic is the second steepest-declining lemma. The 1950s, with 37 occurrences, were the critics’ heyday, a time when the profession was institutionalized in literary broadsheets and practitioners acted as cultural arbiters. The frequency then declines by approximately half per decade from the 1960s, reaching 5 instances in the 1980s. Brief upticks to 7 per decade in the 1990s and 2000s were quickly effaced, with a slump back to 5 occurrences in the 2010s and only 1 in the 2020s. Ambivalence toward critics has always been proverbial among authors, and a high proportion of usages in the 1950s, despite acknowledgment of the critics’ job being to critique, are negative. They stand variously accused of reading too much into a work, being hard to please, having agendas, not listening, and being too fierce. However, such disavowals prove that critical opinions or consensus mattered to, frequently irritated, and sometimes wounded writers.
This ambivalence carries on – one choice phrase from the 1960s describes “a certain type of critic, the ferrity, human-interest fiend, the jolly vulgarian,” while a 1970s writer impugns their willful refusal “to see what [their] books are really about” – until the nadir of the term. By the 1980s we find mention of the limitations of the author as critic of their own work, and in the 2010s, the need to outrun the “inner critic” along with mention of the particular tastes of “gay readers and critics,” while the sole usage in the 2020s is a writer claiming they were “able to be [their] own critic.” In other words, criticism came to be discussed less in terms of external recognition and more with regard to intrinsic factors and fitness for target readerships. Perhaps concomitantly, framing writing in terms of business has also declined from peaks of 16 and 19 in the 1950s and 1970s, respectively, to just under 7 per decade from the 1980s to today. Business occurs most frequently to mean the work of writing, with occasional reference to the “business plan” and publishers as “businesspeople,” among more general references to “this business of roots and stories” and “the business of living.” Against these declines, a raft of subjectively toned words connoting positive, even passionate, engagement are in the ascendency, such as feels, open, wanted, love, and felt. Space does not permit sustained unpacking here, but it can be noted that feels has tripled this century from 8 uses in the 2000s to 24 in the 2020s, where it has come to foreground the feelings of writers, characters, and readers in relation to critical issues. Metonymic of this wider shift, “the idea that there’s somehow less art in writing that feels close to the bone is just an old, ancient, disempowering story.” This affective turn or “structure of feeling,” as Williams (1977, 133) put it, is reflected in similar if less-steep inclines for wanted and felt. 
Wanted is very frequently used now in relation to writers’ priorities when developing work (e.g., [I] wanted “to explore,” “to write,” or “speak” about, to “focus on,” “to capture,” “to show,” “to suggest,” “to defamiliarize”), while felt connotes a growing emphasis on writers’ self-reflexivity toward their own work ([it] felt “so enormous,” “more urgent and true,” “productive,” “subversive!” “limited,” “more apolitical,” etc.). Arguably, then, affective concerns have displaced hierarchical impositions from the outside, whether critical or prescriptive, of how writers conceptualize their “work.”

1.3.4 Centrality of Community and Space

Cumulatively, these shifts would appear to reconfigure earlier formalist concerns with technique and style, which are hedged around with imperatives, toward more holistic conceptualizations of the writing process and its social context. Arguably, this culminates in a major shift toward the foregrounding of community and space, the first and second upward-trending terms in the corpus, respectively (Table 1.2). The rise in usage of community has been meteoric – from a couple of instances in the 1960s and 1970s to 23 this decade. A spike in the 1980s, particularly in relation to ethnic communities, fan cultures, and “the gay community,” prefigured its increasing mobilization since then. We find many such increasingly heterodox usages this century, from the “Jewish/communist anarchist/community,” “the Māori community,” and “Bronson Alcott’s failed utopian community,” to the “radical feminist alternative community” and “the queer community.” Such communities are sometimes the subject of literary works or allude to the writer’s cultural background but may also be the sites in, or for, which the works are produced. Space has also risen from similar obscurity in the 1950s and 1960s to 25 instances in the 2010s and 31 this decade.
In the 1970s, there is an almost-perfect split in its 12 occurrences between “outer space” and “inner space, psychological space,” a dichotomy that persists through the 1980s. By the 1990s space is increasingly to do with gaps and ellipses, including spatial layout; a narrator’s “own demands for space” also figures. These uses intensify in the 2000s, from the need for “space for the imagination,” to the inherent politics of space (“those spaces of silence that exist in Nigerian literature”). In the 2010s, both these trends are extended via discussion of “spaces within novels” and “the idea that there exist multiple and distinct cultural and racial spaces.” By the 2020s we have a plethora of usages, from “female,” “feminist,” “gendered,” and “queer” spaces to “weird,” “tonal,” “emotional,” “online,” “coworking,” and “safe” spaces. Despite the fact that community and space do not specifically collocate in the corpus, the terms do appear to be complementary in suggesting sites for the production and circulation of literature that are less prescriptive in terms of formalist concerns and more rooted in the context of diverse communities, along with the affordances their associated spaces – literal and conceptual – provide.

1.3.5 Diachronic Keyword Analysis

Tables 1.4 to 1.11 provide an indication of the top keywords for each decade of the corpus. As indicated previously, to ensure that all lemmas identified as key are particularly representative of the target decade, they must occur in at least one-third of all interviews from that decade. This document frequency statistic is found in the final column. The tables also provide the keyness score for each lemma, as well as its frequency and frequency per million (FPM) for the target decade, and its frequency per million for the rest of the FAC.

Table 1.4 1950s Keywords

| Rank | Lemma | POS | Freq. | Keyness | FPM 1950s | FPM Rest | Doc Freq. Focus |
|---|---|---|---|---|---|---|---|
| 1 | critic | noun | 37 | 3.1 | 592.56 | 121.15 | 16 |
| 2 | technique | noun | 32 | 3.0 | 512.48 | 107.43 | 10 |
| 3 | hero | noun | 29 | 2.9 | 464.44 | 93.72 | 10 |
| 4 | simple | adjective | 24 | 2.5 | 384.36 | 91.43 | 11 |
| 5 | play | noun | 32 | 2.4 | 512.48 | 160.00 | 10 |
| 6 | ought | modal | 15 | 2.3 | 240.23 | 48.00 | 9 |

Table 1.5 1960s Keywords

| Rank | Lemma | POS | Freq. | Keyness | FPM 1960s | FPM Rest | Doc Freq. Focus |
|---|---|---|---|---|---|---|---|
| 1 | must | modal | 72 | 2.3 | 1,152.15 | 445.77 | 23 |
| 2 | power | noun | 31 | 2.2 | 496.06 | 166.88 | 10 |
| 3 | man | noun | 150 | 2.2 | 2,400.31 | 1,051.57 | 24 |
| 4 | picture | noun | 23 | 2.2 | 368.05 | 116.59 | 11 |
| 5 | country | noun | 45 | 2.0 | 720.09 | 306.33 | 15 |
| 6 | style | noun | 40 | 2.0 | 640.08 | 267.46 | 13 |

Table 1.6 1970s Keywords

| Rank | Lemma | POS | Freq. | Keyness | FPM 1970s | FPM Rest | Doc Freq. Focus |
|---|---|---|---|---|---|---|---|
| 1 | serious | adjective | 25 | 2.4 | 399.99 | 112.02 | 9 |
| 2 | author | noun | 37 | 2.1 | 591.98 | 233.18 | 15 |
| 3 | paper | noun | 23 | 1.9 | 367.99 | 141.74 | 13 |
| 4 | fiction | noun | 96 | 1.9 | 1,535.95 | 768.12 | 17 |
| 5 | terribly | adverb | 13 | 1.9 | 207.99 | 64.01 | 10 |
| 6 | money | noun | 22 | 1.9 | 351.99 | 141.74 | 11 |

Table 1.7 1980s Keywords

| Rank | Lemma | POS | Freq. | Keyness | FPM 1980s | FPM Rest | Doc Freq. Focus |
|---|---|---|---|---|---|---|---|
| 1 | basically | adverb | 16 | 2.4 | 255.07 | 48.03 | 10 |
| 2 | moral | adjective | 27 | 2.3 | 430.42 | 132.66 | 12 |
| 3 | large | adjective | 29 | 2.1 | 462.31 | 164.68 | 16 |
| 4 | imagination | noun | 26 | 1.9 | 414.48 | 171.54 | 10 |
| 5 | beyond | preposition | 14 | 1.8 | 223.18 | 77.77 | 11 |
| 6 | rather | adverb | 54 | 1.8 | 860.85 | 432.29 | 16 |

Table 1.8 1990s Keywords

| Rank | Lemma | POS | Freq. | Keyness | FPM 1990s | FPM Rest | Doc Freq. Focus |
|---|---|---|---|---|---|---|---|
| 1 | face | noun | 16 | 2.1 | 255.89 | 68.59 | 10 |
| 2 | act | noun | 24 | 2.1 | 383.83 | 130.31 | 9 |
| 3 | voice | noun | 52 | 2.1 | 831.64 | 349.79 | 10 |
| 4 | talk | noun | 14 | 1.8 | 223.90 | 77.73 | 9 |
| 5 | English | adjective | 21 | 1.7 | 335.85 | 153.18 | 10 |
| 6 | truth | noun | 25 | 1.7 | 399.83 | 192.04 | 13 |

The keywords for the 1950s reinforce the earlier trend analysis. For example, the modal ought (“the writer ought to help the reader as much as he can”) aligns with the modal must, which, as indicated previously, frames practices of fiction writing as constituted by certain obligatory actions and values (“the writer must be disengaged, or else he is writing politics”).
It is also notable that while hero figures prominently here, it ranks seventh in terms of overall decline (from 29 in the 1950s to 2 in the 2020s), with frequent ambivalence toward the notion, even in the 1950s, culminating in the practical redundancy of the term in today’s conversations around fiction.

The frequency of the proclamatory must is at its apex in the 1960s, but what is of particular related interest here is fiction writers’ obsession with power, either to conceptualize their own writing as associated with a sense of personal power (“people tend to underestimate the power of my imagination,” “it lies within our power, as writers . . . to do something for others”) or as a motivating concern for their writing (“I think there is no question that power is a great temptation”). Perhaps concomitantly, man is a term that rides high in the 1960s but is one of a number of male-gendered lemmas from the corpus, others being he and himself, that are in sharp overall decline.

Widespread use of the adjective serious prefixing such terms as “story,” “novel,” “work,” “fiction,” “audience,” and “film critics” lends literary matters considerable gravitas in the 1970s. Concomitantly, this is the only decade where author appears among the top six key lemmas, with interviewees often talking about “an author” or “the author” in generic terms (“the author’s voice”). Fiction is used largely descriptively to refer to the mega-genre of prose storytelling but can increasingly stand in for a “novel” or “story” (“every middle-class fiction is basically a story of adultery,” “the strange amorphous fictions of Barthes and Robbe-Grillet”) in a way suggestive of a postmodern turn toward framing stories as fictions.

Adverbs and adjectives dominate the 1980s keywords, with broad-sweeping terms, such as the essentializing basically, the intensifier rather, and the aggrandizing large, lavished over the decade of excess.
At a time when relativists faced off against social conservatives, the term moral retains force but is often inflected with doubt, irony, or even derision (“the moral code of the middle-class writer”). For the first time, writing as a product of imagination, which also engages that of readers, becomes a discernible concern.

The 1990s’ key lemmas are particularly interesting for their shared semantic relationships with representation, in a decade when meaning making was increasingly understood as performative. Act is used to refer to the act of writing, the mimetic acting out of roles, and the multitudinous acts of “love,” “lust,” “hate,” “indifference,” “violence,” “despair,” “manipulation,” and “hope.” Talk may refer to attending talks, speaking voices and idiolects (“this guy talk voice”), and discourse (“talk about”) around issues critical to writers and writing. Voice can refer literally to how people speak, but also to differing narrative voices, including points of view (POVs) within fictional works, and the concerns writers give voice to. These mediating factors frequently complicate the notion of unitary truth (“I shall argue with you about your interpretation of the truth”).

Table 1.9 2000s Keywords

| Rank | Lemma | POS | Freq. | Keyness | FPM 2000s | FPM Rest | Doc Freq. Focus |
|---|---|---|---|---|---|---|---|
| 1 | landscape | noun | 19 | 2.4 | 303.12 | 70.90 | 10 |
| 2 | boy | noun | 30 | 2.3 | 478.61 | 150.94 | 14 |
| 3 | pleasure | noun | 21 | 2.2 | 335.02 | 96.05 | 13 |
| 4 | recently | adjective | 16 | 2.0 | 255.26 | 77.76 | 14 |
| 5 | kid | noun | 24 | 1.9 | 382.89 | 148.66 | 13 |
| 6 | hope | noun | 12 | 1.8 | 191.44 | 66.32 | 9 |

In the 2000s, landscape becomes a concern in senses cinematic (being “just as important as the [human] figures”), psychogeographic (“the emotional landscape”), and with regard to the literary landscape itself, the metaphor likely owing its rise to an increasingly visual and multimodal culture. Boy and kid often relate to writers’ own upbringings, but the focus on adolescent characters may be connected to the growth in young adult (YA) literature. A spike in the abstract nouns pleasure and hope – the first often applying to the “particular, private pleasure” of both text and writing, the second finely balanced with hopelessness (“hopes and desires for what we want the world to look like”) – perhaps reflects a decade in which multiple existential threats to civilization became visible.

Table 1.10 2010s Keywords

| Rank | Lemma | POS | Freq. | Keyness | FPM 2010s | FPM Rest | Doc Freq. Focus |
|---|---|---|---|---|---|---|---|
| 1 | perspective | noun | 17 | 2.3 | 271.84 | 61.73 | 10 |
| 2 | childhood | noun | 18 | 2.2 | 287.83 | 75.45 | 10 |
| 3 | draft | noun | 40 | 2.1 | 639.62 | 246.92 | 12 |
| 4 | art | noun | 49 | 2.0 | 783.54 | 336.08 | 14 |
| 5 | research | noun | 20 | 2.0 | 319.81 | 107.45 | 10 |
| 6 | kid | noun | 23 | 1.9 | 367.78 | 150.89 | 10 |

Several ranked nouns in the 2010s suggest increasingly conscious engagement with the act of writing. Perspective functions both in regard to characterological POVs (e.g., “the ‘god’ perspective”) and in regard to heterodox ways of seeing (“a variety of voices from all different perspectives”). Draft connotes concern with process and reworking material multiple times. Interestingly, art is used less often to refer to writing than to related visual arts, but it is notable that, as in visual arts (Hocking 2022), research has increasing currency in creative writing praxis, though sometimes in disavowal (“I try to do as little research as possible when writing” and “research deadens fiction”). Complementary to the 2000s, kid is again on the rise, as is childhood, for related reasons.

It is striking that identity is the top key lemma of the 2020s, a decade when identitarian concerns are very much part of public discourse. It is no surprise, then, to see it used directly after possessives such as “their,” “your,” and “me,” connoting ownership, and after heterodox identity markers (e.g., “gay,” “racial,” “marginal,” and “hybrid”).
Whereas male-gendered terms are falling in use, her is on the rise.

Table 1.11 2020s Keywords

| Rank | Lemma | POS | Freq. | Keyness | FPM 2020s | FPM Rest | Doc Freq. Focus |
|---|---|---|---|---|---|---|---|
| 1 | identity | noun | 25 | 3.3 | 403.08 | 50.24 | 13 |
| 2 | family | noun | 64 | 3.0 | 1,031.88 | 278.60 | 19 |
| 3 | parent | noun | 27 | 2.6 | 435.32 | 109.61 | 13 |
| 4 | her | determiner | 180 | 2.4 | 2,902.15 | 1,125.80 | 30 |
| 5 | space | noun | 31 | 2.3 | 499.81 | 157.57 | 17 |
| 6 | figure | verb | 17 | 2.3 | 274.09 | 66.22 | 13 |

Complementary to these shifts, space is a key term. As discussed in Section 1.3.4, it clearly now denotes both literal and more abstract zones within which these newly empowered identities, and their associated communities, can contest their corner. Figure is most commonly collocated with “out,” reinforcing the sense of a time when meaning is not set but is being contested both through the act of writing and outside of the boundaries of the text, as if to say, “We are still figuring it out.”

1.4 Conclusion

Using a self-compiled diachronic corpus of interviews with narrative fiction authors from the 1950s until the present day, this chapter has identified four major thematic trends in their conceptualizations around their literary practices. In general, they suggest shifts from formalist concerns to more holistic conceptualizations of the writing process and its social context. Of course, there are nuances within this broad arc, such as the declining influence of technique and style set against rising and increasingly polymorphous conceptualizations of structure and narrative. Additionally, the study identified the more salient conceptualizations of fiction writing practice for each of the decades since the 1950s. It was shown that obligatory pressures associated with the powerful demands of cultural arbiters have given way to writers’ more affective engagement with identity, creative process, and heterodox readerships.
Given how these findings chime with widely identifiable trends in our cultural moment, we hope to have shown that the artist interview has considerable potential to contribute to the quantitative analysis of fiction writing.

References

Aston, Guy. 1997. “Small and Large Corpora in Language Learning.” In Practical Applications in Language Corpora, edited by Barbara Lewandowska-Tomaszczyk and Patrick James Melia, 51–62. Łódź: Łódź University Press.

Baker, Paul. 2006. Using Corpora in Discourse Analysis. London: Continuum.

Baker, Paul. 2011. “Times May Change But We’ll Always Have Money: A Corpus Driven Examination of Vocabulary Change in Four Diachronic Corpora.” Journal of English Linguistics 39, no. 1: 65–88.

Baker, Paul, Costas Gabrielatos, and Tony McEnery. 2013. Discourse Analysis and Media Attitudes: The Representation of Islam in the British Press. Cambridge: Cambridge University Press.

Biber, Douglas, and Edward Finegan. 1989. “Drift and the Evolution of English Style: A History of Three Genres.” Language 65, no. 3: 487–517.

Bowker, Lynne, and Jennifer Pearson. 2002. Working with Specialized Language: A Practical Guide to Using Corpora. London: Routledge.

Busse, Beatrix. 2020. Speech, Writing, and Thought Presentation in 19th-Century Narrative Fiction: A Corpus-Assisted Approach. Oxford: Oxford University Press.

Csomay, Enriko, and Ryan Young. 2021. “Language Use in Pop Culture Over Three Decades: A Diachronic Keyword Analysis of Star Trek Dialogues.” International Journal of Corpus Linguistics 26, no. 1: 71–94.

Freiburg, Rudolf, and Jan Schnitker. 1999. Do You Consider Yourself a Postmodern Author?: Interviews with Contemporary English Writers. Münster: Lit Verlag.

Hocking, Darryl. 2022. The Impact of Everyday Language Change on the Practices of Visual Artists. Cambridge: Cambridge University Press.

Hoover, David L. 2007. “Corpus Stylistics, Stylometry, and the Styles of Henry James.” Style 41, no. 2: 174–203.
Kilgarriff, Adam. 2009. “Simple Maths for Keywords.” In Proceedings of Corpus Linguistics Conference CL2009, edited by Michaela Mahlberg, Victorina González Díaz, and Catherine Smith. Liverpool: University of Liverpool.

Kilgarriff, Adam, Vít Baisa, Jan Bušta, Miloš Jakubíček, Vojtěch Kovář, Jan Michelfeit, Pavel Rychlý, and Vít Suchomel. 2014. “The Sketch Engine: Ten Years On.” Lexicography ASIALEX, no. 1: 7–36.

Kilgarriff, Adam, Ondřej Herman, Jan Bušta, Vojtěch Kovář, Vít Baisa, and Miloš Jakubíček. 2015. “DIACRAN: A Framework for Diachronic Analysis.” www.sketchengine.eu/wpcontent/uploads/Diacran_CL2015.pdf.

Klaussner, Carmen, and Carl Vogel. 2018. “A Diachronic Corpus for Literary Style Analysis.” In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). Miyazaki, Japan: European Language Resources Association (ELRA). https://aclanthology.org/L18-1552.pdf.

Koester, Almut. 2010. “Building Small Specialised Corpora.” In The Routledge Handbook of Corpus Linguistics, edited by Anne O’Keeffe and Michael McCarthy, 66–79. Oxon: Routledge.

Kung, Sally. 2007. “Unit 5 Case Studies 5.2 a Diachronic Study of Melancholy in a British Novel Corpus.” www.birmingham.ac.uk/Documents/college-artslaw/corpus/Intro/ Unit52Melancholy.pdf.

Lazzeretti, Cecilia. 2016. The Language of Museum Communication: A Diachronic Perspective. London: Palgrave Macmillan.

McEnery, Tony, Richard Xiao, and Yukio Tono. 2006. Corpus-based Language Studies: An Advanced Resource Book. New York: Routledge.

McIntyre, Dan, and Brian Walker. 2019. Corpus Stylistics: Theory and Practice. Edinburgh: Edinburgh University Press.

Morin, Olivier, and Alberto Acerbi. 2017. “Birth of the Cool: A Two-centuries Decline in Emotional Expression in Anglophone Fiction.” Cognition and Emotion 31, no. 8: 1663–75.

Scott, Mike, and Christopher Tribble. 2006. Textual Patterns: Key Words and Corpus Analysis in Language Education. Amsterdam: John Benjamins.
Sun, Kun, and Rong Wang. 2022. “The Evolutionary Pattern of Language in English Fiction Over the Last Two Centuries: Insights from Linguistic Concreteness and Imageability.” SAGE Open 12, no. 1: 1–13. https://doi.org/10.1177/21582440211069386.
Stanyer, James, and Sabina Mihelj. 2016. “Taking Time Seriously? Theorizing and Researching Change in Communication and Media Studies.” Journal of Communication 66, no. 2: 266–79.
Underwood, Ted. 2019. Distant Horizons. London: University of Chicago Press.
Vaughan, Elaine, and Brian Clancy. 2013. “Small Corpora and Pragmatics.” In Yearbook of Corpus Linguistics and Pragmatics, edited by Jesús Romero-Trillo, 53–73. London: Springer.
Williams, Raymond. 1977. Marxism and Literature. Oxford: Oxford University Press.

2 Within-Author Style Variation in Literary Nonfiction
The Situational Perspective

Marianna Gracheva and Jesse A. Egbert

2.1 Introduction

Studies of style focus on consistent patterns in an author’s works with the goal of revealing pervasive trends in their language use. It has been observed, however, that style is not uniform: a single author may display varying degrees of versatility across their works. Often, such versatility is associated with the evolution of style over time. Hoover (2007), for example, examines Henry James’s style diachronically, identifying pervasive vocabulary in 20 of his novels, and distinguishes between substyles within them as well as James’s early, intermediate, and late stylistic trends. Moss (2014) performs a diachronic analysis of James’s syntax in two novels and finds increased syntactic complexity in non-dialogic sections of his later work. It does not seem common, however, for such studies to identify reasons for this variation other than the effect of time.
DOI: 10.4324/9781003298328-3

Biber and Conrad (2019, 16) note that research on style is primarily concerned with the aesthetic value of linguistic choices, which are “not directly functional.” In fact, functional underpinnings of these choices are often seen as either irrelevant or impossible to identify. Discussing James’s frequent use of pronouns without clear referents, for example, Moss (2014, 78) notes that “sometimes James uses these stylistic devices without clear reasons, apparently mimicking the pattern set by those sentences in which such unusual structures have been meaningful,” thus suggesting that there is essentially no functional reason for this use and it is purely idiosyncratic. The same idea is expressed in her statement regarding sentence length: “No reason has been identified for the relatively short clauses . . . ; it seems simply to be a stylistic variation” (Moss 2014, 173). It is possible, however, that language produced with attention to aesthetics at the same time fulfills certain communicative functions. In their synchronic study of eighteenth-century fiction and essays, Biber and Finegan (1994, 13–4) demonstrate that a method based on systematic functional co-occurrence of linguistic features (multidimensional analysis) can be applied to stylistic analysis. To this end, the styles of Johnson, Addison, Defoe, and Swift are analyzed with respect to three dimensions of variation identified by Biber (1988): “informational vs. involved production” (preference for nominal [informational] features vs. clausal ones, characteristic of spoken, more involved discourse), “elaborated vs. situation-dependent reference” (use of wh-relative clauses for elaboration vs. discourse situated through time, place, and other adverbials), and “abstract style” (use of passive structures). The study illustrates that some authors exhibit noticeable variation across their works.
For example, on the dimension of “elaborated vs. situation-dependent reference,” Swift’s fiction is “moderately non-elaborated,” but one text is the most situated of all the texts in the study. In essays, Defoe’s texts are divided between markedly situated and relatively elaborated; Addison shows a wide range as well, from highly elaborated texts to quite situated ones. Swift’s and Addison’s essays also extend over a range of variation along the dimension of “informational vs. involved production.” Biber and Finegan attribute this range to “stylistic adaptation to various topics and purposes,” thus suggesting that these preferences for different sets of features are functional and serve the communicative goal of the text. In a study of nineteenth-century fiction, Egbert (2012) observes considerable within-author variation in the styles of several authors, the most striking of which is the case of Mark Twain on the dimension of “thought presentation vs. description.” Twain’s works The Prince and the Pauper and Tom Sawyer, Detective occupy opposite ends of the dimension, the former being highly descriptive, rich in attributive adjectives and prepositional phrases modifying other nouns, and the latter heavily focused on character thought and emotion conveyed through a variety of verbal features and their complements expressing stance. The study further examines within-novel variation across the chapters of these two works. The results again show a considerable range in scores on the dimension, indicating vast differences even within a single novel, yet no overlap in the dimension scores of the individual chapters between the two books. While further analysis of these trends was outside the scope of the study, these results suggest that systematic choices made by the author in favor of certain sets of features fulfilling the purpose of presenting ideas or that of description are guided by functional considerations.
Extensive within-author differences along dimensions of style variation were observed in another literary register – modern nonfiction essays (Gracheva 2022). The study identifies three main dimensions of variation in the corpus: “interactive vs. informational style,” “abstract expository vs. concrete descriptive style,” and “immediate vs. removed style.” While some authors show stylistic consistency, their texts gravitating toward one or the other end of the spectrum, each dimension also features authors who exhibit substantial variation across their works. Analysis of texts from the opposite poles suggests that this internal variability may reflect consistent differences in situational characteristics. In particular, the communicative purpose of each text may be a key consideration influencing the choice of style. For example, in the works of the same author, the purpose of entertaining and sharing personal experiences results in a highly interactive style, while the purpose of informing warrants a focus on information presentation; the goal to evaluate abstract notions leads to an abstract expository style, while a concrete descriptive style is chosen by the same writer to depict a future event; finally, the immediate style creating a sense of urgency is employed in a persuasive text aiming to instigate change, while a past-oriented removed style is opted for by the same author in a cultural and historical treatise.

While these observations, based on qualitative analysis of select authors’ works, warrant only tentative conclusions, the present study aims to consistently distinguish between situational characteristics of the essays and trace the relationship between communicative purpose and an author’s stylistic choices across their works.
Since the term “essay” is applied to a wide variety of factually accurate texts detailing events from the author’s life or real-world events (Gutkind 2007), authors writing in this register have an unusual amount of creative freedom, and essays are commonly classified as personal, memoir, literary journalism, narrative, rhetorical, meditative, collage, braided, or lyric, among others (Silverman 2008; Hartsock 2016). It is highly likely that this variety of essay types reflects differences in communicative purpose; it is therefore natural to expect that authors who do not limit themselves to a single type exhibit variation in purpose across their texts. Alternatively, a single essay type may integrate a variety of communicative purposes into a single text. In this attempt to investigate the relationship between author style and communicative purpose, this study problematizes the notion of style as merely a reflection of aesthetic or idiosyncratic preferences. Since linguistic features have been shown to be functional in nature (e.g., Biber 1988), if an author consistently associates a particular communicative purpose with a certain group of features, it may be problematic to separate purely aesthetic preference from functional use. The study poses the following research question:

1. To what extent does communicative purpose predict within-author stylistic variation on the identified dimensions (“interactive vs. informational style,” “abstract expository vs. concrete descriptive style,” “immediate vs. removed style”)?

The next section provides a brief description of the three dimensions and the method used in the present study.

2.2 Method

2.2.1 Dimensions of Style Variation in Modern Literary Nonfiction

The dimensions of variation outlined previously were identified by a multidimensional analysis conducted on a corpus of 300 creative nonfiction essays written by 17 modern (predominantly twenty-first century) authors (Gracheva 2022). “Interactive vs.
informational style” is a cline from authors whose works involve interactivity with the reader or between the characters to those who prioritize informational density. Texts scoring high in interactivity are marked by typical features of oral discourse (Biber 1988): first and second person pronouns, demonstrative and indefinite pronouns, “that” deletion, wh-clauses controlled by private verbs, conditional adverbial clauses, emphatics, wh-questions, among others (see Appendix for a complete list of features comprising each dimension). At the informational pole, the texts are rich in features functioning as information packaging devices – prepositional phrases, nominalizations, attributive adjectives, and by-passives – while features of oral discourse are rare. “Abstract expository vs. concrete descriptive style” is a continuum from authors focusing on exposition of abstract concepts, stance, and reasoning to writers whose essays are full of concrete descriptive detail. This dimension is characterized by a contrast between abstract and concrete nouns: abstract nouns denote complex concepts explicated through noun complements, verb complements, and causative adverbial clauses (these clausal features serving the function of reasoning), while concrete nouns contribute to descriptions enhanced by present and past reduced relative clauses. Again, these sets of features are in complementary distribution: texts with high rates of occurrence of the first set of features typically do not contain features from the opposite pole, and vice versa. “Immediate vs. removed style” represents a cline from present-oriented texts to texts set in the past, with high rates of occurrence of past tense, perfect aspect, third-person pronouns, and public verbs.
This present or past orientation appears to reflect the author’s presence and involvement in or distance from the described events, which gives the dimension its label. As each text from each author receives a dimension score, that is, an indicator of how high or low the text occurs on a given dimension, it is possible to observe each author’s stylistic tendencies on each dimension. As was stated earlier, while clear stylistic preferences were observed in the case of some authors, each dimension also featured authors whose works were spread wide along the spectrum. It is these authors’ texts that are of interest to the present study, which is concerned with identifying the basis for this substantial linguistic variation across their works.

2.2.2 Present Study: Corpus and Quantitative Analysis

The author who exhibited the widest range of variation was selected on each dimension (Table 2.1). The descriptive statistics in Table 2.1 show the means (M) and standard deviations (SD) of the dimension scores of each author’s texts on their respective dimension.

Table 2.1 Range of Variation along Dimensions

Author           Number of Texts   Dimension                                            M       SD
Phillip Lopate   32                Interactive vs. Informational Style                  -0.17   1.03
Ander Monson     19                Abstract Expository vs. Concrete Descriptive Style   -0.06   0.91
David Shields    27                Immediate vs. Removed Style                          0.11    1.12

The spread of scores across each author’s works is illustrated in Figures 2.1–2.3. Thus, Lopate’s style is represented by essays relying on both interactivity and informational focus, Monson’s essays are equally divided between the expository and descriptive ends of the spectrum, and Shields’s essays are varied in their present or past orientation and narrator presence. To investigate the role of communicative purpose in this linguistic variation, each text was coded for purpose, resulting in several purposes commonly found across an author’s works.
In these cases, one-way ANOVAs were run, with purpose treated as the independent variable. Author style, operationalized through the dimension scores of the author’s texts on their respective dimension, was the dependent variable in each case study. In the case of one author, however (Monson on the dimension of “abstract expository vs. concrete descriptive style”), coding the texts for purpose did not reveal clear distinctions, as purposes of all texts included reflection. This uniformity makes purpose (at least in the way we coded it) an unlikely reason for the substantial linguistic variation observed across the author’s works. To examine the possibility of other factors determining Monson’s choice of style, a hierarchical cluster analysis was conducted on his essays, with the goal of grouping texts based only on proximity in dimension scores. Thus, instead of using a top-down method of coding for communicative purpose and measuring differences among them, as with the authors on the first two dimensions, for Monson’s essays we used a bottom-up approach grouping texts that are maximally similar to each other in their scores and maximally different from the texts in the other clusters. Each cluster of Monson’s essays could then be examined for a possible functional basis of these groupings.

Figure 2.1 Phillip Lopate’s essays: spread of scores on “interactive vs. informational style.”
Figure 2.2 David Shields’s essays: spread of scores on “immediate vs. removed style.”
Figure 2.3 Ander Monson’s essays: spread of scores on “abstract expository vs. concrete descriptive style.”

2.3 Results and Discussion

2.3.1 Phillip Lopate: “Interactive vs. Informational Style”

The communicative purposes identified in Phillip Lopate’s essays, which varied on the dimension of “interactive vs.
informational style,” include addressing a question, description of a person, narration and reflection, speculation on or analysis of an issue, and review (Table 2.2). Essays addressing readers’ or students’ questions are concerned with writing techniques and practices, such as the ethics of writing about other people, research and personal writing, and ways of effectively ending an essay. These texts explicitly state the question and are structured in the form of an answer, offering guidance, explaining the rationale behind the advice, and providing the necessary background into essay writing. The focus of essays describing a person, or rather an interaction, is on a memorable encounter, an experience that involved other people (e.g., renting a place), or a relationship. These essays describe people through conversations the narrator has with them and place emphasis on interpersonal matters. The purpose to narrate and reflect is found in essays that tell a story and share the narrator’s thoughts, opinions, and feelings about the events. Essays whose purpose is to speculate on or analyze an issue differ from the narrative essays containing reflection in that there is usually no specific event or episode that underlies the analysis. These speculative essays typically present the problem they examine and the author’s view without the foundation of a concrete experience. Finally, reviews are the author’s evaluations of directorial efforts or works of writing. Thus, Lopate’s essays are quite diverse in purpose, and purpose is found to be a statistically significant predictor of linguistic variation on the dimension (F [4, 27] = 15.67, p < .05, R² = 0.7).
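The relationship between the F statistic and R² reported for a one-way ANOVA can be made concrete with a small calculation. The following is a minimal sketch, not the authors' analysis script: the function names and the toy scores are hypothetical, and it assumes that R² is the share of total variance in dimension scores explained by purpose (between-group sum of squares over total sum of squares). Under that assumption, R² can also be recovered from F and its degrees of freedom, which gives values consistent with those reported in this chapter.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within, R squared) for a dict of label -> scores."""
    all_scores = [s for scores in groups.values() for s in scores]
    grand_mean = mean(all_scores)
    # Variation of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of individual texts around their own group mean
    ss_within = sum((s - mean(g)) ** 2 for g in groups.values() for s in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    r_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, r_squared


def r_squared_from_f(f_stat, df_between, df_within):
    """Recover R squared from a reported F statistic and its degrees of freedom."""
    return f_stat * df_between / (f_stat * df_between + df_within)


# Hypothetical dimension scores grouped by coded purpose (invented for illustration)
toy_scores = {
    "purpose_a": [2.1, 1.4, 0.9],
    "purpose_b": [0.3, -0.1, 0.5],
    "purpose_c": [-0.8, -1.2, -0.6],
}
f_stat, df_b, df_w, r2 = one_way_anova(toy_scores)
print(f"F[{df_b}, {df_w}] = {f_stat:.2f}, R squared = {r2:.2f}")

# Values reported in the chapter: F[4, 27] = 15.67 and F[3, 23] = 36.43
print(round(r_squared_from_f(15.67, 4, 27), 2))  # 0.7
print(round(r_squared_from_f(36.43, 3, 23), 2))  # 0.83
```

The sketch omits the p-value, which would require the F distribution from a statistics library.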
The range of purposes includes some that explain the higher level of interactivity in certain texts through author engagement with the reader (the purpose of addressing a question) or through reproducing an encounter (the purpose of describing a person), and others that warrant informational density (texts whose goal is to narrate and reflect, speculate, or review a work of art). This split between the “interactive” and “informational” purposes can be seen quite clearly in Figure 2.4. The three essays with the goal of describing a person or an encounter are all found on the “interactive” pole (positive scores of 3.2, 2, and 0.8), as well as two of the three essays labelled as addressing a question (scores of 1.3 and 0.6). In contrast, essays fulfilling the three communicative purposes which suggest an emphasis on information presentation, namely, narration and reflection, speculation, and review, are predominantly found on the “informational” pole. The purpose of describing a person, the most extreme representation of interactivity in Lopate’s essays, differs significantly (p < .05; d > 0.8) from essays with all three informational purposes, while the purpose of addressing a question is statistically different from reviews (p < .05; d = 1.9). The differences between communicative purposes appear even more nuanced than this general distinction between the “interactive” and “informational” extremes: the two purposes on the “interactive” end are also statistically different (p < .05, d = 1.5).

Table 2.2 “Interactive vs. Informational Style” Dimension Scores across Communicative Purposes in Phillip Lopate’s Essays

Purpose                       Number of Texts   M       SD
Describe a person/encounter   3                 1.99    1.19
Address a question            3                 0.42    0.93
Narrate and reflect           6                 -0.04   0.30
Speculate/analyze             9                 -0.45   0.61
Review                        11                -0.92   0.39

Figure 2.4 Phillip Lopate: variation by communicative purpose on “interactive vs. informational style.”
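The effect sizes quoted above can be approximated from the summary statistics in Table 2.2. The sketch below is illustrative only: the chapter does not state which variant of Cohen's d was used, so the denominator here (the square root of the unweighted mean of the two group variances) is an assumption, and the function name is hypothetical. With the rounded means and standard deviations from the table, this variant gives values that round to the reported d = 1.5 and d = 1.9.

```python
from math import sqrt


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d from summary statistics.

    Assumption: the two group variances are averaged without weighting
    by group size before taking the square root.
    """
    return (mean_a - mean_b) / sqrt((sd_a ** 2 + sd_b ** 2) / 2)


# (M, SD) per purpose, copied from Table 2.2
describe_person = (1.99, 1.19)
address_question = (0.42, 0.93)
review = (-0.92, 0.39)

# Describing a person vs. addressing a question
print(round(cohens_d(*describe_person, *address_question), 1))  # 1.5

# Addressing a question vs. review
print(round(cohens_d(*address_question, *review), 1))  # 1.9
```

A size-weighted pooled standard deviation would give the same result for the first pair (both groups have three texts) but a larger value for the second, which is why the unweighted variant is assumed here.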
The purpose of addressing a question, found in two essays with positive scores (0.6 and 1.3) but one text with a moderate negative score (-0.6), appears to encompass a wider linguistic range. Considering that the answer presented in these essays may rely to different extents on specific strategies, such as explanations, examples, or references to theory, it is not surprising that essays fulfilling this purpose differ linguistically, perhaps reflecting these finer-grained distinctions. The excerpts that follow illustrate the five communicative purposes of Lopate’s writing from the most interactive essays to the most informational ones (scores indicated in parentheses) and the differences in their linguistic representation (interactive features bolded, informational italicized).

Texts 1 and 2 represent the “interactive” pole of the dimension, but it is apparent that the emphasis on interactivity manifests itself to different degrees. In Text 1, describing a relationship through a conversation, the focus on interpersonal matters is conveyed through first- and second-person pronouns, private verbs of feeling and mental state, followed by an expression of stance through a wh-complement clause, another complement with that deletion, a conditional clause, and emphatics. The passage also features other indicators of oral discourse, such as demonstrative and indefinite pronouns:

Text 1: Description of a person or an encounter, Motel (3.15)

I don’t really know what I’m trying to say, but I always felt [that deletion] there was something you were holding back. It doesn’t work that way, Phillip. As long as you weren’t forthcoming with it, I didn’t see any way I could allow myself to. It’s like a game. You put your chip down, I put my chip down, you put another, and I put another. It doesn’t work without being mutual.
As stated earlier, despite being found on the “interactive” end, the purpose of addressing a question is significantly different from that of describing a person. Unlike Text 1, Text 2 is monologic, with the author stating the questions he is commonly asked and relating them to his background as a writer. This goal does not demand the same emphasis on interactivity; as a result, the typical features of interactivity in this excerpt, wh-questions, first-person, indefinite, and demonstrative pronouns, and private verbs, are less prominent. In fact, the text features several information packaging devices, such as prepositional phrases, nominalizations, and attributive adjectives, reflecting the importance of information presentation for essays with this purpose:

Text 2: Addressing a question, On the Ethics of Writing about Others (1.27)

Whenever I speak in public about autobiographical nonfiction or simply give a reading of my own work, I am invariably asked in the Q-and-A session: How should one deal with writing about one’s family members or intimates? How does one balance the need to tell one’s story with the pain others might feel in being exposed this way? The assumption is that since I have written candidly about family and friends in the past, I must know the answer to this difficult question.

Narration and reflection is the purpose that gravitates toward the “informational” end; however, the negative scores are moderate, and these texts tend to combine features of informational density and interactivity. Text 3 illustrates this balance. The narrative is personal, which is reflected in the use of first-person pronouns; however, the reflection accompanying the narrative requires a heavy informational focus. Nominalizations, particularly frequent in the text, express mental and emotional states (“semblance of rationality,” “expression of sympathy and knowingness,” “detachment,” “skepticism,” “reaction”) and other complex constructs (“personality,” “essayist’s equipment”), often further modified through prepositional phrases.

Text 3: Narration and reflection, A Mother’s Tale (-0.07)

When I was about eight years old, not long after I had mastered speech and a semblance of rationality, I became, or rather, fashioned myself into becoming, an ideal interlocutor for my mother. She would come to me with her troubles (usually complaints about my father) and I would listen with an expression of sympathy and knowingness, which I had learned to sham at an early age. . . . I now see that large parts of my adult personality and professional demeanor were formed in reaction to my mother: habits of detachment, skepticism, and thinking against oneself, which are classic essayist’s equipment.

The purpose of speculation is found further down on the “informational” end. Text 4 demonstrates that reliance on a combination of nominalizations and attributive adjectives (“cosmopolitan worldliness,” “painstaking, willed achievement”) among other features conveys the author’s judgment and contributes to the speculative tone:

Text 4: Speculation, Brooklyn the Unknowable (-0.62)

Brooklyn’s provincialism, be it said, is not, or not entirely, a failure to achieve cosmopolitan worldliness; it is also a painstaking, willed achievement. It’s not easy to be situated next to the most au courant place on the planet and hold onto one’s rough edges. Though Tiresias’s passage between genders has always struck me as exhausting, I seem to have conducted my life so as to crisscross the identity border between Manhattanite and Brooklynite.

Finally, reviews are essays with the largest negative scores on the dimension.
Text 5, a film review, shows that here informational features contribute to value judgments and their expressivity (“greatest performances,” “grim hard-nosed comedy,” “the embodiment of a corrupt past,” “parable of regional and class tensions”) as well as depict the on-screen reality (“optimistic modern Northern Italy,” “poor, backward, fatalistic South,” “a go-getter foreman in a Milanese car factory”), thus fulfilling primary goals of a review.

Text 5: Review, Strained Relations (-1.49)

That situation has been rectified by Rialto Pictures, which is releasing a lustrously restored and newly subtitled version of Lattuada’s supremely grim, hard-nosed comedy. The plot centers on Nino, a go-getter foreman in a Milanese car factory, played by Alberto Sordi in one of his greatest performances. . . . The film can be seen as a parable of regional and class tensions: between the gullibly optimistic modern Northern Italy of the economic boom, and the poor, backward, fatalistic South, still ruled by bandits and gangsters, the embodiment of a corrupt past that has never gone away.

2.3.2 David Shields: “Immediate vs. Removed Style”

Analysis of David Shields’s essays, showing variation on the dimension of “immediate vs. removed style,” revealed four distinct communicative purposes (Table 2.3): to analyze and reflect on a problem or a phenomenon, to argue a point with reliance on evidence, to narrate, and to review. The purpose of analysis and reflection characterizes texts presenting the author’s thoughts on forms of writing, literature, norms and conventions of the writing industry. Unlike the argumentative purpose (to argue a point), this kind of reflection is typically not supported by concrete evidence and conveys mainly the author’s opinions and personal observations. In contrast, the purpose to argue a point involves evidence in the form of research findings, historical facts, or statements made by authority figures.
Narrative essays share past experiences, and reviews focus on film and works of writing. Purpose is again found to be a significant predictor of variation across texts (F [3, 23] = 36.43, p < .05, R² = 0.83), with all pairs except reviews and analyses/reflections showing significant differences (p < .05, d > 0.8). The essays of these two groups are both found on the positive, present-oriented end of the dimension. This lack of significance is unsurprising in view of the overlap between these two purposes – most reviews are likely to contain analysis of the content of the piece and its artistic merit. Figure 2.5 illustrates how the texts unified by a single purpose cluster on the dimension. As expected, Figure 2.5 shows the divide between analyses and reviews occupying the present-oriented pole and narratives on the past-oriented end of the dimension.

Table 2.3 “Immediate vs. Removed Style” Dimension Scores across Communicative Purposes in David Shields’s Essays

Purpose                          Number of Texts   M       SD
Analyze/reflect                  7                 1.03    0.67
Review                           4                 1.01    0.41
Argue a point                    10                0.13    0.32
Narrate/describe an experience   6                 -1.62   0.55

Figure 2.5 David Shields: variation by communicative purpose on “immediate vs. removed style.”

Text 6, an example of an analysis/reflection, demonstrates immediacy and author presence as he expresses stance and makes statements about mental processes and their reflection in writing. This goal necessitates the use of present tense (bolded) and exemplifies what Langer (1953, 208) calls the “timeless present,” best suited for conceptualizations of ideas and their interrelationships.

Text 6: Analysis/reflection, Contradiction (2.16)

It’s natural to enter into dialogues and disputes with others, because it’s natural to enter into disputes with oneself: the mind works by contradiction. Great art is clear thinking about mixed feelings. One of the tricks in writing a personal essay is that you have to develop a dialogue between the parts of yourself that in a way corresponds to the conflict in fiction. You cop to various tendencies, and then you struggle with these tendencies.

Similarly, Text 7, a review, presents the author’s interpretation of a literary work, facts he considers generally true and, therefore, “timeless” (Langer 1953):

Text 7: Review, Autobiographic Rapture (0.80)

The contrast between the title of Vladimir Nabokov’s autobiography and the title of his first English novel suggests a distinction between autobiography and fiction. . . . But the comedy, as always with Nabokov, cuts considerably deeper. If autobiography is a physical place to which one can return, and if memory has words with which to communicate, then consciousness is tangible and the imagination is real.

Argumentative essays differ significantly from the two purposes found on the “immediate” end (p < .05, d > 0.8); however, Figure 2.5 shows that these essays are a mixture of the “immediate” and “removed” styles. While the present tense may be expected in argumentation, it is interesting to notice the role of the negative features, namely, past tense, perfect aspect, public verbs, and third-person pronouns, which here are related to evidentiality. Text 8, an excerpt from an essay arguing for the need to borrow from other art forms to avoid stagnation, illustrates that these essays often present a balance of present and past orientation, the former serving the goal of outlining the problem and the latter providing evidence for the claim from past trends in history.
It is thus this emphasis on evidence that makes argumentative texts distinct from the purely analytical/reflective ones discussed previously and warrants the use of narrative features (italicized):

Text 8: Arguing a point, Reality Hunger (-0.02)

Why is hip-hop stagnant right now, why is rock dead, why is the conventional novel moribund? Because they’re ignoring the culture around them, where new, more exciting forms . . . are being found (or rediscovered). American R & B was enormously popular in Jamaica in the 1950s. . . . The music culture was based around DJs playing records at public dances; huge public-address systems were set up for these dances. DJs started acting more and more as taste editors.

Finally, the narrative purpose relies on the past to a considerably greater extent. Text 9 tells a personal story and contains multiple references to past events and their participants. It is clear from Figure 2.5 that such texts only occupy the negative end of the dimension.

Text 9: Narration, Notes for Eulogy for My Father (-1.46)

Since I was six years old, the first thing he and I have done every morning is read the sports page. . . . [W]hen Mike Marshall hit a three-run home run in the tenth inning to win it for the Dodgers, he and I looked at each other and we were both, a little weirdly, crying. Games have held us together, but also words. I’ve always loved his love of puns . . . ; admired his ability to tell a joke and a story.

2.3.3 Ander Monson: “Abstract Expository vs. Concrete Descriptive Style”

The picture is somewhat more complex in the case of Ander Monson, who exhibits substantial linguistic variation on the dimension of “abstract expository vs. concrete descriptive style.” The process of coding his essays for communicative purpose, however, revealed a limited range, including such purposes as to reflect, describe and reflect, narrate and reflect, engage the reader, and present an idea. It becomes apparent that reflection is the underlying goal in all of Monson’s essays, with even the essays that present an idea or directly address and engage the reader in the thought process heavily focused on reflection. This homogeneity makes it unlikely that communicative purpose is the factor accounting for variability in the author’s choice of language, which is confirmed by a lack of statistical significance of purpose as a predictor of variation (F [4, 14] = 0.57, p > .05; Figure 2.6).

Figure 2.6 Ander Monson: variation by communicative purpose on “abstract expository vs. concrete descriptive style.”

To explore the possibility of other functional reasons for the extensive range of variation shown by the author on the dimension, we used hierarchical cluster analysis to identify possible bottom-up “text types” among essays based on proximity in their dimension scores. That is, this approach does not operate on the assumption that the reason for variation lies in the differences in communicative purpose, and the texts were not coded for any situational parameters. Rather, texts close in scores form clusters, which subsequently allows the researcher to identify the possible functional basis for the clustering through a qualitative analysis of essays that are linguistically similar within clusters and maximally different between them.

Figure 2.7 Ander Monson’s essays: three-cluster solution. [Dendrogram of agnes(x = Monson_reduced, method = "ward"); vertical axis: Height; agglomerative coefficient = 0.97.]

Table 2.4 Clusters Identified in Ander Monson’s Essays

Cluster     Number of Texts   M       SD
Cluster 1   7                 -0.98   0.31
Cluster 2   7                 -0.06   0.16
Cluster 3   5                 1.19    0.51
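The bottom-up grouping described above can be sketched in a few lines. The dendrogram label in the chapter indicates that the clustering was run with agnes and Ward's method; the code below is not that analysis but a minimal pure-Python illustration of the same idea on one-dimensional scores, and the function name and scores are hypothetical. Each text starts as its own cluster, and at every step the two clusters whose merger adds the least within-cluster variance are joined until the requested number of clusters remains.

```python
from statistics import mean


def ward_clusters(scores, n_clusters):
    """Agglomerative clustering of 1-D scores using Ward's criterion.

    Returns the clusters as sorted lists of scores, ordered from the
    lowest-scoring cluster to the highest.
    """
    clusters = [[s] for s in scores]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Ward's merge cost: increase in within-cluster sum of squares
                cost = len(a) * len(b) / (len(a) + len(b)) * (mean(a) - mean(b)) ** 2
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Hypothetical dimension scores for nine essays (invented for illustration)
toy_scores = [-1.6, -1.1, -0.9, -0.2, 0.0, 0.1, 0.8, 1.2, 1.7]

for number, cluster in enumerate(ward_clusters(toy_scores, 3), start=1):
    print(f"Cluster {number}: n = {len(cluster)}, M = {mean(cluster):.2f}")
```

Because no purpose labels enter the procedure, any functional interpretation of the resulting groups has to come from a separate qualitative reading of the texts in each cluster.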
The cluster analysis yielded three clusters (dendrogram in Figure 2.7). Table 2.4 and Figure 2.8 show that Cluster 1 includes highly descriptive essays found entirely on the negative end; in contrast, Cluster 3 includes essays scoring high on the positive “abstract” end of the dimension, while Cluster 2 occupies an intermediate position. The differences between the clusters are statistically significant (F [2,16] = 51.61, p < .05, R² = 0.86), with all three clusters significantly different from each of the others (p < .05, d > 0.8).

Figure 2.8 Clusters identified in Ander Monson’s essays.

Qualitative analysis of the essays within the clusters suggests that the quantitative differences between them reflect varying degrees of reliance on concrete illustration in Monson’s essays. The reflections in Cluster 1 seem unique in that they use concrete descriptions of objects, places, or events to convey ideas central to the essay without naming the concepts or problems being analyzed or using features of reasoning. The reflection often revolves around an object or a place which seems to have triggered it. Features of this object, place, or event are constantly brought into focus, and thought presentation is structured around them. In some essays, these descriptions (italicized) are used to frame the central message, as in Text 10:

Text 10, Cluster 1, Forecast (-1.56)
It is not about the leather jackets or the letter jackets or the sun that gleams off the hood of his Trans Am as it arrives in a cloud of dust that looks like the beginnings of a blizzard. It’s not about the shoplifted jewelry or the supply of illicit alcohol although all of these are factors. . . . It is the spray-painted slogans on the overpasses-not just the message but the branding and the fact of them. All this is why Heidi ends up reclined in the passenger side of the high school burnout dropout counselor’s-nightmares car.
Essays in Cluster 2 are distinctive in that the analyzed concepts are explicitly named, and as they are explained, the essay provides concrete illustrations. Text 11 is an excerpt from an essay that likens essay writing to hacking. While hacking is first discussed in terms of abstract processes and concepts such as “exploration,” “opening up,” “problem solving,” and “magic,” accompanied by features of reasoning such as verb complement clauses and adverbs (bolded), the text shifts to concreteness (italicized), as an example is provided:

Text 11, Cluster 2, Essay Hack (-0.04)
Hacking is at heart a creative activity. It is first, simply, an exploration, an opening up, of a system. A kind of problem solving. . . . More loosely, a hack is an ingenious use of technology to accomplish something that is otherwise impossible to accomplish. It is a bridge from one land mass to another over deep water. It appears, like any sufficiently advanced technology, as a kind of magic. . . . For instance, a famous hardware hack, the red box, repurposes a Radio Shack autodialer (a portable, pre-cell-phone device that could store and automatically dial numbers) via some soldering to mimic the tone (technically a series of four tones) that indicates to a pay phone that a quarter has been deposited.

Cluster 3 abounds in abstract notions and analysis: essays present an explanation or the narrator’s thought process in an attempt to arrive at an understanding of some complex phenomenon.
Text 12 illustrates thought presentation and reasoning through complement clauses controlled by mental verbs, noun complement clauses, and stance adverbials (bolded), but abstract concepts are not exemplified through an account of a specific case or event:

Text 12, Cluster 3, On Selah Straterstrom (1.94)
Just as I don’t believe in the religion behind the ritual but I love the ritual, I don’t really believe in the practice of augury but I do love the idea of it, that by chance (my personal preference for prognostication . . .) some design presents itself to me. I recognize that what I’m probably doing is allowing chance to access some internally stored information or route, but either way I like the feeling of giving up control.

2.4 Conclusion: Implications for Style Research, Limitations, Future Directions

In this chapter, we have shown that stylistic variation across an author’s works may have a functional basis, specifically reflecting the communicative intent of the text. This is the case for Lopate and Shields, whose styles vary with regard to the degree of interactive or informational focus and present or past orientation of the text, respectively. Lopate’s essays found on the “interactive” end fulfill such purposes as to describe a person or an encounter and address students’ or readers’ questions, thus explaining the inclination toward interactivity, while his heavily informational essays aim to narrate and reflect, speculate on a problem, or review an artistic work, justifying the increased use of information packaging devices. The purposes of Shields’s essays, which warrant present-orientedness and immediacy, are reviews and analyses/reflections, while essays whose goal is to share a personal story are set in the past.
Shields’s argumentative essays occupy an intermediate position on the dimension due to their reliance on evidentiality, with the present tense serving to state the claim and the past providing evidence. Linguistic variation has a different basis in the writing of Ander Monson, whose essays, all reflective in nature, did not reveal variation in communicative purpose. However, the observed differences between the three clusters, identified in a bottom-up way, also suggest a functional basis and reflect different approaches to communicating ideas: by creating a highly specific description in itself sufficient to convey a bigger message, by explicitly stating the concepts discussed and accompanying them with illustration, or by presenting thoughts surrounding abstract notions without the support of concrete exemplification. It is worth noting that whether these distinctions result from an intentional or subconscious choice (the latter often viewed as a defining characteristic of style, e.g., Argamon and Levitan 2005) is a separate consideration, not addressed in this study. It is demonstrated, however, that regardless of its nature, within-author variation contains analyzable patterns that lend themselves to a functional interpretation. An important implication of this line of work for style research lies in the need to acknowledge the effect of situational considerations, such as the communicative intent of the text, on an author’s linguistic choices. This acknowledgment will result in a finer-grained approach to style as individual language use in response to a specific communicative need rather than arbitrary choices made in isolation from a social reality. 
Awareness of these nuanced situational distinctions, particularly the communicative purpose of a text, within a single author’s body of work seems important in literary translation, which aims at preservation of the original authorial style and a reflection of style shifts of the original in the translated work (e.g., Chesterman 2007). The situational perspective and the idea that authorial preferences have functional underpinnings reflected in specific linguistic choices provide systematic guidance in the task of achieving stylistic similarity to the original, not offered by the view of stylistic choices as arbitrary or purely idiosyncratic. The functional approach is also highly informative for writing practice, as it illustrates that certain stylistic effects can be achieved through strategic use of linguistic devices, which is not a common consideration in the teaching of writing (Bryant 2016).

One limitation of the present study is its broad-strokes approach to the operationalization of purpose, as it identifies one overarching communicative purpose of each essay. Recent research indicates that texts are not monolithic units and points to the existence of a far greater granularity in textual units than what is marked by existing boundaries, such as beginning and end of an essay. Egbert and Schnur (2018, 162–3), for example, define a text as a recognizably self-contained and functional language unit, suggesting that larger texts can be subdivided into more granular ones that are also self-contained and functional. These smaller textual units often reflect shifts in purpose within a longer unit of discourse. Alternative, more granular textual units have been explored in studies of the IMRD structure of research articles (Biber and Finegan 1994), discourse units in conversation (Biber et al. 2021), or narration and speech within fiction (Egbert and Mahlberg 2020).
Building on that research, Egbert and Gracheva (Forthcoming) observe additional linguistic variation associated with increased textual and situational granularity within the registers of fiction, presidential memoirs, political speeches, and introductory textbooks. Modern literary essays are a clear example of a highly varied register, as follows from a lack of agreement on a definition of an essay as well as the results of empirical studies, such as the ones used as the basis for this investigation and reported in this chapter. Thus, it is almost certain that a single essay features multiple communicative purposes, prioritized by the author to different extents. To account for within-author linguistic variability on this fine-grained level, future studies may undertake text segmentation based on purpose shifts or perform continuous coding for purpose, pioneered by Biber et al. (2020) in their study of web registers. Finally, accounting for audience, topic, or other specific situational factors can substantially enhance our understanding of individual language use.

References

Argamon, Shlomo, and Shlomo Levitan. 2005. “Measuring the Usefulness of Function Words for Authorship Attribution.” Paper presented at ACH/ALLC Conference, Victoria, Canada, June. https://doi.org/10.1.1.71.6935&rep=rep1&type=pdf.
Biber, Douglas. 1988. Variation across Speech and Writing. Cambridge: Cambridge University Press.
Biber, Douglas, and Susan Conrad. 2019. Register, Genre, and Style. Cambridge: Cambridge University Press.
Biber, Douglas, Jesse Egbert, and Daniel Keller. 2020. “Reconceptualizing Register in a Continuous Situational Space.” Corpus Linguistics and Linguistic Theory 16, no. 3: 581–616.
Biber, Douglas, Jesse Egbert, Daniel Keller, and Stacey Wizner. 2021. “Towards a Taxonomy of Conversational Discourse Types: An Empirical Corpus-based Analysis.” Journal of Pragmatics 171: 20–35.
Biber, Douglas, and Edward Finegan. 1994. “Multi-dimensional Analyses of Authors’ Styles: Some Case Studies from the Eighteenth Century.” Research in Humanities Computing 3: 3–17.
Bryant, Stacy. 2016. “Teaching Authorial Style and Literary Technique: Exemplo XI of El Conde Lucanor.” Hispania 99, no. 2: 234–45.
Chesterman, Andrew. 2007. “Similarity Analysis and the Translation Profile.” Belgian Journal of Linguistics 21, no. 1: 53–66.
Egbert, Jesse. 2012. “Style in Nineteenth Century Fiction. A Multidimensional Analysis.” Scientific Study of Literature 2, no. 2: 167–98.
Egbert, Jesse, and Marianna Gracheva. Forthcoming. “Linguistic Variation within Registers. Granularity in Textual Units and Situational Parameters.” Corpus Linguistics and Linguistic Theory. https://www.degruyter.com/document/doi/10.1515/cllt-2022-0034/html
Egbert, Jesse, and Michaela Mahlberg. 2020. “Fiction – One Register or Two?” Register Studies 2, no. 1: 72–101.
Egbert, Jesse, and Erin Schnur. 2018. “The Role of the Text in Corpus and Discourse Analysis: Missing the Trees for the Forest.” In Corpus Approaches to Discourse, edited by Charlotte Taylor and Anna Marchi, 159–73. New York: Routledge.
Gracheva, Marianna. 2022. “Style of Creative Nonfiction: A Multidimensional Analysis of Literary Essays.” Scientific Study of Literature 12, no. 1. https://doi.org/10.1075/ssol.22002.gra.
Gutkind, Lee. 2007. The Best Creative Nonfiction. New York: W. W. Norton.
Hartsock, John. 2016. Literary Journalism and the Aesthetics of Experience. Amherst, MA: University of Massachusetts Press.
Hoover, David. 2007. “Corpus Stylistics, Stylometry, and the Styles of Henry James.” Style 41, no. 2: 174–203.
Langer, Susanne. 1953. Feeling and Form. New York: Charles Scribner’s Sons.
Moss, Lesley. 2014. “Corpus Stylistics and Henry James’s Syntax.” PhD diss., University College London.
Silverman, Sue. 2008. “The Meandering River: An Overview of the Subgenres of Creative Nonfiction.” Association of Writers and Writing Programs. Last modified September 2008. www.awpwriter.org/magazine_media/writers_chronicle_view/2507/the_meandering_river_an_overview_of_the_subgenres_of_creative_nonfiction.

Appendix

Dimensions of Variation: Features and Factor Loadings

Dimension 1: Interactive vs. Informational Style
Positive features: “that” deletion + private verbs (0.63), private verbs (0.62), pronoun IT (0.57), conditional clauses (0.56), 2nd person (0.56), 1st person (0.55), indefinite pronouns (0.49), demonstrative pronouns (0.44), “that” deletion + public verbs (0.44), emphatics (0.41), hedges (0.36), proverb DO (0.34), wh-questions (0.34), wh-clauses + private verbs (0.33), be as main verb (0.32)
Negative features: attributive adjectives (-0.76), prepositional phrases (-0.62), nominalizations (-0.61), by-passive (-0.40)

Dimension 2: Abstract Expository vs. Concrete Descriptive Style
Positive features: verb complements + private verbs (0.55), adverbs (0.49), noun complements (0.47), causative clauses (0.42), abstract nouns (0.39), verb complements + public verbs (0.37), suasive verbs (0.31)
Negative features: concrete nouns (-0.76), present participial (-0.37), past participial (-0.33)

Dimension 3: Immediate vs. Removed Style
Positive features: present tense (0.58)
Negative features: past tense (-0.87), perfect aspect (-0.55), public verbs (-0.54), 3rd person (-0.38)

3 Charles Dickens’s Influence on Benito Pérez Galdós Revisited: A Corpus-Stylistic Approach¹

Pablo Ruano San Segundo

3.1 Introduction

In this chapter, we compare Charles Dickens’s and Benito Pérez Galdós’s style to investigate the alleged influence of the former on the latter. Benito Pérez Galdós is a well-known nineteenth-century Spanish novelist whose craftsmanship has been frequently compared to Dickens’s (see Section 3.2).
In this chapter, we scrutinize this influence from a corpus-stylistic point of view. To do so, we have developed an annotation system of Galdós’s novels to identify suspensions. A suspension (also known as suspended quotation) is a “protracted interruption by the narrator of a character’s speech” of at least five words (Lambert 1981, 6). They are characteristic of Dickens’s style (Newsom 2000, 556). As shown in example 1, suspensions are projecting clauses with which narrators introduce stretches of direct speech. A suspension can have several functions, such as organizing discourse, offering character information, or creating specific literary effects, such as an impression of simultaneity between the words of a character and their actions. In example 1, for instance, the suspension contributes to the effect of synchronicity between Mr. Gradgrind’s words and his body language (pondering with his hands in his pockets, and his cavernous eyes on the fire).

(1) “Whether,” said Gradgrind, pondering with his hands in his pockets, and his cavernous eyes on the fire, “whether any instructor or servant can have suggested anything? Whether Louisa or Thomas can have been reading anything? Whether, in spite of all precautions, any idle story-book can have got into the house? Because, in minds that have been practically formed by rule and line, from the cradle upwards, this is so curious, so incomprehensible.” (Hard Times, Chapter 4)²

Thanks to the advances in corpus stylistics, suspensions have been systematically analyzed in Charles Dickens’s novels (Mahlberg and Smith 2012; Mahlberg et al. 2013, 2016, among others). In this chapter, we use a similar methodology to compare Dickens’s use of suspensions to that of Galdós in Fortunata and Jacinta, the novel for which the Spanish novelist is best known.
The aim of the chapter is to discuss patterns in form and function hitherto unremarked in literary appreciations of Galdós’s style that show how the Spanish novelist may have incorporated a Dickensian device into his style to achieve effects similar to those conveyed by Dickens. In doing so, hopefully, the chapter will also contribute to illustrating the potential of corpus stylistics in the analysis of literary texts in Spanish, for which the number of studies using computer-assisted methodologies is still small.

DOI: 10.4324/9781003298328-4

The chapter is organized as follows. First, we provide a brief overview of the alleged influence of Charles Dickens on Benito Pérez Galdós (Section 3.2). Then, the annotation system used to identify suspensions in Fortunata and Jacinta is explained, and the results obtained are shown (Section 3.3). These results are analyzed in Section 3.4, which is divided into two subsections. In Section 3.4.1, we discuss the similarities in form and function of suspensions between Dickens and Galdós. In Section 3.4.2 we focus on specific textual functions of the suspensions identified in Galdós’s novel that are similar to those discussed in Dickens in previous studies. The chapter concludes with some remarks on the potential of computer-assisted methods that combine quantitative and qualitative analyses, from which the study of literary texts written in Spanish can benefit greatly.

3.2 The Influence of Charles Dickens on Pérez Galdós

The influence that great writers – such as Cervantes (Goldman 1971; Benítez 1990), Balzac (Lacosta 1968; Ollero 1973), writers from the Russian schools such as Tolstoy and Dostoevsky (Gilman 1981; Ley 1977, 294–95), and of course Dickens (McGovern 2000; Tambling 2013) – exerted on Galdós is well-known. This was admitted by the Spanish novelist himself.
In his autobiographical Memorias de un desmemoriado, for example, Galdós states:

I regarded Charles Dickens as my most beloved master. In my literary apprenticeship, still in my conceited youthfulness and having barely devoured Balzac’s The Human Comedy, I zealously set myself to reading Dickens’s vast oeuvre. (Pérez Galdós 1980, 1693) (our translation)³

Galdós’s translation of Dickens’s Pickwick Papers for the Spanish newspaper La Nación is situated precisely in this literary apprenticeship. Not only did Galdós pay tribute to Dickens in this translation, but he also absorbed the Victorian author’s style (Wright 1979, 24).⁴ He translated Pickwick Papers in 1868, when he was barely 24 years old, and while he was fully engaged in the writing of his first novel, La Fontana de Oro, published a couple of years later. This is probably the reason this novel has plenty of Dickensian echoes, such as the detailed and exaggerated descriptions of characters’ physical appearance or the narrator’s visible stance and his animosity toward oppressors (Nieto Caballero 2019a, 323). These would be the first of the many Dickensian parallelisms found in Galdós’s works, which have been widely discussed as part of the alleged influence of Dickens on Galdós.

The presence of Dickensian reminiscences in Galdós’s novels has been the subject of numerous studies, in which it is not difficult to find examples that refer to Fortunata and Jacinta, the text under analysis here. For instance, the use of quia and en toda la extensión de la palabra by Fortunata and Doña Lupe, respectively, is frequently cited as a prototypical example of Dickens’s well-known use of catchphrases with which he singles out his characters’ speeches. We can also mention examples related to specific Dickensian situations and characters that seem to be transported into Galdós’s fictional universe.
The chapter “Una visita al Cuarto Estado” in Fortunata and Jacinta, for instance, has been frequently likened to the search for an orphan that takes place in Our Mutual Friend (Gilman 1981, 218–19). As for parallelisms in characters, José Izquierdo strongly resembles Mr. Casby, while Doña Guillermina seems to be inspired by Mrs. Jellyby, both from Little Dorrit (Gilman 1981, 271). All these echoes in Fortunata and Jacinta are just a paradigmatic example of the numerous traces of the English author in Galdós’s oeuvre, on which the influence pointed out by critics is based. Without a doubt, they illustrate the influence of Dickens on Galdós. However, it should be noted that this influence is mostly based on a compendium of references rather than on textual analyses of both authors. In other words, although the influence of Dickens on Galdós is indisputable, no systematic analyses have investigated such influence. On the contrary, scholarship has gauged the Dickensian echo in Galdós on the basis of a collection of novelistic reminiscences, such as characters’ use of catchphrases or the Dickensian situations and characters that we have just pointed out. Needless to say, this should not be understood as a criticism of the (invaluable) work carried out by literary critics, which has contributed to a better understanding of the parallelisms between Dickens and Galdós in general and to the influence of the former on the latter in particular. In fact, scholars have admitted this problem. Tambling (2013, 191) recognizes that the work carried out is “speculative,” whereas McGovern (2000, 1) also admits that due to the “immensity” of the literary production of both authors, it is difficult to carry out analyses of intertextuality.
The fact that the influence of Dickens on Galdós’s style has commonly been accepted on the basis of a compilation of Dickensian references, such as the ones detailed above, not only justifies but also requires systematic textual approaches to the style of both authors. This would make it possible to gauge whether and to what extent Dickens’s style influenced Galdós’s beyond the novelistic reminiscences referred to in traditional literary criticism. Corpus stylistics can be helpful in this regard. Thanks to computer-assisted methodologies, systematic textual analyses of the literary production of both authors are possible. Nieto Caballero (2019a), for instance, has analyzed clusters containing body part nouns in Dickens’s and Galdós’s novels and demonstrated how both authors make use of similar body language constructions that contribute to characterization. In this chapter, we focus on suspensions, a characteristic feature of Dickens’s style that, thanks to the annotation system explained next, can be systematically scrutinized in Galdós’s craftsmanship. As will be shown in Section 3.4, there are patterns in form and function in Galdós’s use of this unit in Fortunata and Jacinta that suggest that Dickens’s influence on his novels is more profound than usually thought – or at least demonstrated.

3.3 Methodology and Results

To carry out our analysis of suspensions in Fortunata and Jacinta, we have annotated the novel following the annotation system of Dickens’s novels explained in Mahlberg et al. (2016). The annotation of Fortunata and Jacinta is part of an annotation of a corpus of Galdós’s novels (c. 6.4 million words) that is being carried out as part of a larger corpus-stylistic project currently underway. As in the case of Dickens’s novels, this annotation distinguishes between several textual subsets of the novel.
The main distinction is that between characters’ speech (and thoughts) (also known as “quotes”) on the one hand and narration (also known as “non-quotes”) on the other. Suspensions are a special type of non-quote. Suspensions, italicized in example 2, can be short or long. Short suspensions are up to four words long (dijo Villalonga), whereas long suspensions are five or more words long (le dijo en secreto Guillermina, deteniéndola, y ambas se miraban con picardía).⁵

(2) Jacinta pasó al salón, más que por enterarse de las noticias, por ver a su marido que aquel día no había comido en casa. “Oye” – le dijo en secreto Guillermina, deteniéndola, y ambas se miraban con picardía – “con veinte duros que le sonsaques hay bastante.” “En Bolsa no se supo nada. Yo lo supe en el Bolsín a las diez” – dijo Villalonga –. “Fui al Casino a llevar la noticia. Cuando volví al Bolsín, se estaba haciendo el consolidado a 20.” (Fortunata and Jacinta, Part 1, Chapter 6)

To create the annotated version of Fortunata and Jacinta, we have used a plain text file of the novel and converted it to an XML file with the help of a set of Python scripts. Specifically, we have used XML elements (<element> </element>) to annotate paragraphs and sentences on the one hand and empty elements, also known as milestones (<milestone/>), to annotate examples of characters’ speech and suspensions, on the other. The elements that form the nested hierarchy of paragraphs and sentences contain the text between an opening element and a closing element (<p> and </p> in the case of paragraphs, for example), while the empty elements that annotate characters’ speech and suspensions act as their own place markers to indicate the start or end of the annotated phenomenon (<qs/> and <qe/> to indicate the start and end of a character’s speech, for example). In Table 3.1 we show the tags that we have used, which are similar to the ones used in the annotation of Dickens’s novels by Mahlberg et al. (2016).
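The milestone scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project’s actual scripts: the function name `annotate_paragraph`, the reliance on curly quotation marks to delimit speech, and the exact placement of milestones relative to punctuation are all hypothetical, and sentence-level <s> elements are omitted for brevity.

```python
import re

QUOTE = re.compile(r"“[^”]*”")  # a stretch of direct speech in curly quotes
WORD = re.compile(r"\w+")       # word tokens; dashes and punctuation are ignored


def annotate_paragraph(text, long_min=5):
    """Wrap one plain-text paragraph in <p> and insert milestone tags.

    Quotes receive <qs/> ... <qe/>. Narration between the end of one quote
    and the start of the next is marked as a long suspension
    (<sls/> ... <sle/>) if it has at least `long_min` words, and as a short
    suspension (<sss/> ... <sse/>) otherwise.
    """
    out = []
    pos = 0
    seen_quote = False
    for match in QUOTE.finditer(text):
        gap = text[pos:match.start()]
        if seen_quote and WORD.search(gap):
            n_words = len(WORD.findall(gap))
            if n_words >= long_min:
                start, end = "<sls/>", "<sle/>"
            else:
                start, end = "<sss/>", "<sse/>"
            core = gap.strip()
            lead, _, trail = gap.partition(core)  # keep surrounding spaces outside the tags
            out.append(lead + start + core + end + trail)
        else:
            out.append(gap)
        out.append("<qs/>" + match.group(0) + "<qe/>")
        pos = match.end()
        seen_quote = True
    out.append(text[pos:])
    return "<p>" + "".join(out) + "</p>"


# The two turns of speech from example 2, treated here as separate paragraphs.
p1 = ("“Oye” – le dijo en secreto Guillermina, deteniéndola, y ambas se miraban "
      "con picardía – “con veinte duros que le sonsaques hay bastante.”")
p2 = ("“En Bolsa no se supo nada. Yo lo supe en el Bolsín a las diez” – dijo "
      "Villalonga –. “Fui al Casino a llevar la noticia.”")

print(annotate_paragraph(p1))
print(annotate_paragraph(p2))
```

Because the milestones are empty elements, a suspension can be marked without disturbing the nesting of paragraphs and sentences, even when it starts in one sentence and ends in another.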
To annotate suspensions in Fortunata and Jacinta, we first annotated characters’ speech (quotes) with <qs/> (“start of quoted text”) and <qe/> tags (“end of quoted text”). Then, following Lambert’s (1981, 6) definition of suspensions as an interruption of a character’s speech of at least five words, we marked up any text of five or more words that occurs between a <qe/> tag and a <qs/> tag with the tags <sls/> and <sle/> to annotate long suspensions, and any text of four or fewer words that occurs between a <qe/> tag and a <qs/> tag with the tags <sss/> and <sse/> to annotate short suspensions. In Figure 3.1 we show example 2 with the annotation that we have used.

Table 3.1 Annotation Tags Used to Annotate Fortunata and Jacinta
Tag       Meaning
<p>       Paragraph (start)
</p>      Paragraph (end)
<s>       Sentence (start)
</s>      Sentence (end)
<qs/>     Quote (start)
<qe/>     Quote (end)
<sls/>    Suspension (long) (start)
<sle/>    Suspension (long) (end)
<sss/>    Suspension (short) (start)
<sse/>    Suspension (short) (end)

Figure 3.1 Example 2 with annotation.

In this chapter, we focus on long suspensions, which are more likely to contribute to meaningful lexicogrammatical patterns in narrative fiction (Mahlberg and Smith 2010). In the particular case of Dickens’s fiction, several analyses have demonstrated that long suspensions are a useful place to check a text for character information, especially in the form of descriptions of body language (Mahlberg et al. 2016, 445), to look into patterns of characterization (Stockwell and Mahlberg 2015), and to find information related to characters’ psychological dimension (Ruano San Segundo 2018, 340), and that they can even serve as a device for conveying specific literary effects in the act of reading, such as the impression of simultaneity between speech and body language (Mahlberg et al. 2013) or the retrospective narration of pauses (Mahlberg and Smith 2012, 61).
In our case, the annotation of Fortunata and Jacinta has made it possible to identify 687 long suspensions in Galdós’s novel, with which we will be able to compare Dickens’s and Galdós’s use of this element and investigate the alleged influence of the former on the latter from an innovative perspective. In Figure 3.2, we show a screenshot with 50 of the 687 suspensions, arranged in alphabetical order. The fact that a group of specific suspensions can be viewed together in the form of a concordance makes it possible to read and analyze them vertically (Tognini-Bonelli 2001, 3). Thanks to this vertical reading, a range of co-occurrence patterns of words can be investigated, which can be meaningful for the literary appreciation of the novel and, in the particular case of this chapter, for comparing Galdós’s and Dickens’s use of suspensions from a stylistic point of view.

Figure 3.2 Screenshot of 50 suspensions in Fortunata and Jacinta.

To compare Galdós’s and Dickens’s use of suspensions, we have also benefited from the CLiC tool (Mahlberg et al. 2016), in which all the suspensions from Dickens’s 15 novels can be visualized. In Figure 3.3 we show a screenshot of 20 suspensions in Oliver Twist. As can be observed on the right side of the screenshot, the CLiC tool contains search options that make it possible to focus on stretches of text within suspensions. Such search options have opened up novel ways of using concordances to link lexicogrammatical and textual patterns (Mahlberg et al. 2016, 433). In this chapter, we have searched for patterns in the suspensions identified in Fortunata and Jacinta, comparing our results in Galdós’s novel to those in Dickens’s novels that can be visualized in the CLiC tool.

Figure 3.3 Screenshot of CLiC tool with 20 suspensions from Oliver Twist.
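The alphabetically ordered concordance of long suspensions described above could be produced from milestone-annotated text along the following lines. The sketch is only illustrative: the function name `long_suspensions`, the context width, and the hand-annotated sample (built from examples quoted in this chapter, with milestone placement assumed) are not taken from the project’s scripts, and sorting relies on plain Unicode order rather than Spanish collation.

```python
import re

LONG_SUSPENSION = re.compile(r"<sls/>(.*?)<sle/>", re.DOTALL)
TAG = re.compile(r"<[^>]+>")


def long_suspensions(annotated, width=30):
    """Return (left context, suspension, right context) rows sorted by suspension text."""
    rows = []
    for match in LONG_SUSPENSION.finditer(annotated):
        # Strip the markup from the co-text so that only words are displayed
        left = TAG.sub("", annotated[:match.start()])[-width:]
        right = TAG.sub("", annotated[match.end():])[:width]
        rows.append((left, match.group(1).strip(), right))
    # Ignore the opening dash and letter case when ordering the rows
    return sorted(rows, key=lambda row: row[1].strip("–- ").casefold())


# Hand-annotated sample; the milestone placement is assumed for illustration.
sample = (
    "<p><qs/>“Oye”<qe/> <sls/>– le dijo en secreto Guillermina, deteniéndola, "
    "y ambas se miraban con picardía –<sle/> "
    "<qs/>“con veinte duros que le sonsaques hay bastante.”<qe/></p>"
    "<p><qs/>“Dame un abrazo”<qe/> <sls/>– le dijo Maxi arrojándose a ella "
    "medio vestido –<sle/>. <qs/>“Así te quiero.”<qe/></p>"
    "<p><qs/>“¡Bah!”<qe/> <sls/>– exclamó apartando la vista de su hermano con "
    "un movimiento desdeñoso de la cabeza –<sle/>. "
    "<qs/>“No quiero oír sermones.”<qe/></p>"
)

rows = long_suspensions(sample)
for left, suspension, right in rows:
    print(f"{left:>30} | {suspension} | {right}")
```

Aligning the suspensions in one column is what makes the vertical reading possible: recurrent openings such as le dijo followed by a gerund become visible at a glance. For a full novel, the co-text would more sensibly be cut from a window around each match rather than from the whole string.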
3.4 Analysis

3.4.1 Form and Function of Suspensions

Without a doubt, the clearest similarity between Dickens and Galdós in the use of suspensions is found in the functional pattern that dominates suspensions in general. As Mahlberg et al. (2013, 40) state, by interrupting a character’s speech, suspensions can create an impression of simultaneity between the speech and the contextual information described by the narrator, which in turn can suggest similarities to the simultaneous occurrence of speech and body language in real life. This is the function par excellence of suspensions in Dickens’s novels, which is enacted not only by the interruption of the character’s speech but also by the formal pattern found in the suspension: in addition to the reporting verb and the name of the character whose speech is being reported, suspensions frequently contain an -ing clause to describe the body language. Let us take example 3. As can be observed, Mrs. Sowerberry is taking up a dim and dirty lamp and leading the way upstairs as she speaks. This impression of synchronicity between her words and her body language is conveyed not only by the use of the -ing clauses but also by the placement of these clauses so that they interrupt the character’s speech.

(3) “Then come with me,” said Mrs. Sowerberry: taking up a dim and dirty lamp, and leading the way upstairs; “your bed’s under the counter. You don’t mind sleeping among the coffins, I suppose? But it doesn’t much matter whether you do or don’t, for you can’t sleep anywhere else. Come; don’t keep me here all night!” (Oliver Twist, chapter 4)

In Fortunata and Jacinta, we can see a similar pattern, as shown in examples 4 and 5. In example 4, for instance, we see how Maxi throws himself at Fortunata as he asks her to hug him (Dame un abrazo).
As in the examples from Dickens’s novels, the impression of simultaneity is enacted both by the -ing clause that describes the body language (arrojándose a ella medio vestido) and by suspending Maxi’s speech to describe this body language. It is only fair to state that this device is frequently used in fictional narratives to create the effect of synchronicity between speech and movement. As Korte (1997, 97) points out in an analysis of body language in literary texts, this impression is frequently created thanks to “[a] the interruption of the character’s speech by a description of the body language, and [b] the syntactical subordination of the body language to the character’s speech.” However, the systematicity with which Dickens does that makes it a stylistically marked choice (Newsom 2000, 556). In the case of Fortunata and Jacinta, we also find a repeated use of this construction in suspensions: it occurs in 407 of the 687 suspensions identified with the annotation discussed in Section 3.3. In other words, 59.24% of the suspensions follow the pattern that characterizes Dickens’s use of suspensions.

(4) “Dame un abrazo” – le dijo Maxi arrojándose a ella medio vestido –. “Así te quiero. Tú has padecido, tú has pecado . . . luego eres mía.” (Fortunata and Jacinta, Part 4, Chapter 1)

(5) “¡Vacía, enteramente vacía!” – exclamó esta levantándola en alto y mirándola al trasluz –. “Y estaba casi llena, pues apenas.” (Fortunata and Jacinta, Part 2, Chapter 6)

In addition to the formal and functional pattern of suspensions, elements outside the suspensions also buttress the Dickensian echo in Galdós’s use of this element. We can see this both in the first and in the second stretch of direct speech that surround the suspension. On the one hand, the stretches of direct speech that precede suspensions in Dickens’s novels frequently contain elements such as vocatives, exclamations, and imperatives, which are separated from the remainder of the speech by the suspension.
Example 3 contains an imperative (come with me). Examples 6 and 7 show a vocative (Nicholas) and an exclamation (My good fellow!), respectively.

(6) “Nicholas,” cried Kate, throwing herself on her brother’s shoulder, “do not say so. My dear brother, you will break my heart. Mama, speak to him. Do not mind her, Nicholas; she does not mean it, you should know her better. Uncle, somebody, for Heaven’s sake speak to him.” (Nicholas Nickleby, Chapter 20)

(7) “My good fellow!” exclaimed Martin, clutching him by both arms, “I have never seen her since I left my grandfather’s house.” (Martin Chuzzlewit, Chapter 14)

Separating off vocatives, imperatives, and exclamations from the remainder of the speech highlights the first stretch of the text and enacts the effect of simultaneity between the characters’ speech and their body language, as all the examples contain references to body language in the suspensions. An identical pattern can be observed in Fortunata and Jacinta, in which suspensions are also frequently preceded by stretches of direct speech that contain these elements. In example 4, we have shown a suspension preceded by an imperative (Dame un abrazo), whereas in example 5, the suspension is preceded by an exclamation (¡Vacía, enteramente vacía!). Examples 8, 9, and 10 are further instances from Fortunata and Jacinta that follow this structural pattern, with an imperative (Siéntate un ratito), a vocative (Primo), and an exclamation (¡Bah!), respectively. It is interesting to note that in those cases in which the suspension is preceded by an exclamation, as in examples 5 and 10, Galdós makes use of the verb exclamar (exclaim). This is in line with Mahlberg et al.’s (2013, 51–52) finding that Dickens also chooses this verb when he separates an exclamation from the remainder of the speech with a suspension, as can be observed in example 7.
The fact that Galdós does the same in his novel reinforces the alleged Dickensian echo discussed here.

(8) “Siéntate un ratito” – dijo Moreno, haciéndolo en el sofá y dando una palmada en el asiento –. “Más santidad que en oír siete misas, hay en practicar las obras de misericordia, acompañando a los enfermos y dando un ratito de conversación a quien se ha pasado toda la noche en vela. Dime una cosa. ¿Cómo llevas las obras de tu asilo?” (Fortunata and Jacinta, Part 4, Chapter 2)

(9) “Primo” – le dijo el otro mirándole con socarronería –; “si quieres hijos, haberlo pensado antes.” (Fortunata and Jacinta, Part 4, Chapter 2)

(10) “¡Bah!” – exclamó apartando la vista de su hermano con un movimiento desdeñoso de la cabeza –. “No quiero oír sermones. Yo sé bien lo que debo hacer.” (Fortunata and Jacinta, Part 2, Chapter 4)

With regard to the remainder of the speech, on the other hand, Lambert (1981, 44) found that Dickens sometimes repeated the stretch of direct speech that preceded the suspension, as shown in example 11, in which the first stretch of direct speech (And yet) is repeated immediately after the suspension. This happens only occasionally and mostly when suspensions are unusually lengthy: the suspension in example 11 is made up of 15 words. It is interesting that Galdós also does that in Fortunata and Jacinta. Two instances are provided in examples 12 and 13. In both cases, the suspensions are lengthy too (19 and 24 words, respectively), which suggests that Galdós, in a Dickensian manner, might have repeated the words of the character to remind his readers of what the character was saying before (s)he is interrupted by the narrator. This seems to be in line with the function that Lambert discusses when he analyzes Dickens’s use of suspensions. In his view, the suspension “seems to be fundamentally a sort of aggression” by a jealous author (Lambert 1981, 35).
Although these rather provocative claims by Lambert cannot be tested from a stylistic point of view, it is clear that the suspension in example 11, as in examples 12 and 13, interrupts a sentence in progress: just after the character has started to speak, the narrator interposes a (lengthy) comment “so that ancillary details may be given, accompanying circumstances indicated” (Lambert 1981, 51), after which the words of the character from the first stretch of direct speech are repeated.

(11) “And yet,” said Ralph, speaking in a very marked manner, and looking furtively, but fixedly, at Kate, “and yet I would not. I would spare the feelings of his – of his sister. And his mother of course.” (Nicholas Nickleby, Chapter 20)

(12) “Ahí tienes” – le dijo doña Lupe moviendo la mano derecha, con dos dedos de ella muy tiesos, en ademán enteramente episcopal –; “ahí tienes lo que pasa por no hacer lo que yo te digo . . . Si hubieras seguido los consejos que te di este verano, no te verías como te ves.” (Fortunata and Jacinta, Part 2, Chapter 3)

(13) “De modo” – exclamó Feijoo en voz alta, abriendo los brazos y tomando un tono que no se podría decir si era de indignación o de burla –, “de modo que ya no hay patriotismo.” (Fortunata and Jacinta, Part 3, Chapter 1)

Whether and to what extent this repetition is caused by a jealous author in Dickens’s novels is open to debate. However, what cannot be denied is that this formal feature is found in Galdós under similar circumstances (there seems to be a relationship between repeating the words of the character and the length of the suspension). This, together with the parallelisms in the first stretch of direct speech that precedes the suspension and the functional pattern of the suspensions discussed before, unveils a parallel use that suggests a Dickensian echo hitherto unremarked in literary appreciations of the influence of Dickens on the style of Galdós.
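The two observations made about examples 11 to 13 (the length of the suspension and the repetition of the opening words) can be checked mechanically. The following Python sketch is only an illustration and is not the annotation system discussed in Section 3.3: it assumes that each suspension has already been extracted by hand, together with the stretches of direct speech before and after it, and the function names are hypothetical.

```python
import re

# A word is a run of letters or digits, optionally joined by an apostrophe (brother's).
WORD = re.compile(r"\w+(?:[’']\w+)*")


def words(text):
    """Return lowercased word tokens; punctuation and dashes are ignored."""
    return [w.lower() for w in WORD.findall(text)]


def suspension_length(suspension):
    """Number of words in the narrator's interruption."""
    return len(words(suspension))


def repeats_opening(before, after):
    """True if the speech after the suspension restarts with the words spoken before it."""
    opening = words(before)
    return bool(opening) and words(after)[:len(opening)] == opening


# Hand-extracted triples (speech before, suspension, speech after) for examples 11-13.
examples = {
    11: ("And yet,",
         "said Ralph, speaking in a very marked manner, and looking furtively, "
         "but fixedly, at Kate,",
         "and yet I would not."),
    12: ("Ahí tienes",
         "le dijo doña Lupe moviendo la mano derecha, con dos dedos de ella muy "
         "tiesos, en ademán enteramente episcopal",
         "ahí tienes lo que pasa por no hacer lo que yo te digo"),
    13: ("De modo",
         "exclamó Feijoo en voz alta, abriendo los brazos y tomando un tono que "
         "no se podría decir si era de indignación o de burla",
         "de modo que ya no hay patriotismo."),
}

for number, (before, suspension, after) in examples.items():
    print(number, suspension_length(suspension), repeats_opening(before, after))
```

Run on these three triples, the sketch reports 15, 19, and 24 words, which agrees with the counts given above, and it flags the repetition in all three cases. A vocative or exclamation that is not repeated, as in example 7, would return False.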
3.4.2 Local Textual Functions

The parallelism in Dickens’s and Galdós’s use of suspensions is further reinforced by local textual functions identified in Fortunata and Jacinta. Local textual functions (Mahlberg 2005, 2007, 2009) “describe the patterns of a (set of) lexical item(s) in a specific (set of) text(s)” (Mahlberg et al. 2013, 37). Generally speaking, the concept of local textual function makes it possible to relate lexical patterns to a range of textual properties. In Dickens’s novels, for instance, patterns of body part nouns that contribute to the creation of characters have been widely discussed (cf. Mahlberg 2013), some of them connected to suspensions. In this section, we focus on two aspects identified in suspensions in Fortunata and Jacinta that seem to have a Dickensian origin too: the positioning of characters and the link between movement and characters’ thoughts.

Firstly, the way in which characters stand in relation to spatial references is an aspect where clear similarities have been detected between Dickens’s and Galdós’s use of suspensions. It is well known that Dickens used “words that refer to parts of buildings and furniture or words that give other spatial information” as references to explain characters’ positions and movements in the scene (Mahlberg 2013, 134). This is also frequent in Galdós’s fictional narratives. Doors, for instance, usually act as the reference in which the characters are placed and from which the scene that is presented to us is described (Nieto Caballero 2019b). When references to doors are found in suspensions, they seem to be used in a Dickensian manner. In Dickens’s works, references to doors in suspensions are frequently used to define characters’ positions (rather than movement). Thus, characters are frequently described as standing next to a door, as in example 14, or stopping at doors, as in examples 15 and 16. As can be observed, Mr. Perch’s, Mr.
Pecksniff’s, and Quilp’s positions are defined by the door, which is used as a spatial reference to define the character’s position. To do so, a prepositional phrase is used (at the door). These examples show how, in addition to creating an effect of synchronicity between speech and body language, suspensions can also contribute to defining the scene by means of references to narrative space.

(14) “Yes, Sir. Begging your pardon, Sir,” said Mr Perch, hesitating at the door, “he’s rough, Sir, in appearance.” (Dombey and Son, Chapter 22)

(15) “I am afraid,” said Mr Pecksniff, pausing at the door, and giving his head a melancholy roll, “I am afraid that this looks artful. I am afraid, Mrs Lupin, do you know, that this looks very artful!” (Martin Chuzzlewit, Chapter 3)

(16) “There she is,” said Quilp, stopping short at the door, and wrinkling up his eyebrows as he looked towards Miss Sally; “there is the woman I ought to have married – there is the beautiful Sarah – there is the female who has all the charms of her sex and none of their weaknesses. Oh Sally, Sally!” (The Old Curiosity Shop, Chapter 33)

Interestingly, Galdós makes a similar use of en la puerta (at the door) in Fortunata and Jacinta. Thus, this prepositional phrase is used in suspensions to show where characters stand, as shown in examples 17 and 18, or where they stop, as in example 19. As in the examples identified in Dickens’s novels, these suspensions do not contribute to creating an impression of simultaneity between speech and body language, but to defining the narrative space of the story world by means of circumstantial information in the form of a prepositional phrase.

(17) “La culpa la tienes tú” – añadió severamente doña Lupe, en la puerta –, “porque te pones a jugar con ella, le ríes las gracias, y ya ves. Cuando quieres que te respete, no puede ser.
Es muy mal criada.” (Fortunata and Jacinta, Part 2, Chapter 1)

(18) “Consolarse” – le dijo Segismundo en la puerta –. “La vida es así; hoy pena, mañana una alegría. Hay que tener calma, y tomar las cosas como vienen, y no ligar todo nuestro ser a una sola persona. Cuando una vela se acaba, debe encenderse otra . . . Conque tengamos valor, y aprendamos a despreciar . . . Quien no sabe despreciar, no es digno de los goces del amor . . . Y por último, simpática amiga mía, ya sabe que estoy a sus órdenes, que tiene en mí el más rendido de los servidores para cuanto se le ocurra, amigo diligente, reservadísimo, buena persona . . . Abur.” (Fortunata and Jacinta, Part 4, Chapter 3)

(19) “Amigo” – dijo parándose en la puerta de la botica –. “Su mujer de usted me ha parecido una mujer defectuosísima. Aunque la he tratado poco puedo asegurar que tiene buen fondo; pero carece de fuerza moral. Será siempre lo que quieran hacer de ella los que la traten.” (Fortunata and Jacinta, Part 3, Chapter 4)

Secondly, references to narrative space are also frequently related to thought presentation. Because of the ubiquity of direct speech in Dickens’s novels, the synchronicity conveyed by suspensions is normally between the characters’ body language and their speech (see Mahlberg and Smith 2012; and Mahlberg et al. 2013). However, as shown in Ruano San Segundo’s (2018, 341) analysis of Dickens’s use of direct thought presentation, the Victorian author also used suspensions to achieve the same effect when reporting his characters’ thoughts, thus creating an impression that comes close to synchronicity of presentation between characters’ thoughts and their body language. Two instances are shown in examples 20 and 21. As can be observed, both suspensions contain references to narrative space: as he crossed to the door in example 20 and walking on tiptoe to another door near the bedside in example 21.
By placing this information in the suspension, Dickens creates an effect of simultaneity between the thoughts and actions of Clennam and Mr. Snodgrass, respectively.

(20) “The house,” thought Clennam, as he crossed to the door, “is as little changed as my mother’s, and looks almost as gloomy. But the likeness ends outside. I know its staid repose within. The smell of its jars of old rose-leaves and lavender seems to come upon me even here.” (Little Dorrit, Chapter 13)

(21) “Very lucky I had the presence of mind to avoid them,” thought Mr. Snodgrass with a smile, and walking on tiptoe to another door near the bedside; “this opens into the same passage, and I can walk quietly and comfortably away.” (Pickwick Papers, Chapter 54)

This strategy is also observed in Fortunata and Jacinta. Literary critics have traditionally referred to the close relationship between kinetic references and the psychological dimension of characters (see Padilla 2000; Arroyo Díez 2011, 104). However, this has been mostly discussed in dialogue novels, in which it is frequent that the narrator projects characters’ thoughts by making use of soliloquies or thought presentation strategies, such as the interior monologue (Jiménez Gómez 2020, 369). These strategies are combined with the kinetic references mentioned before, which results in a relation of interdependence between characters’ movements and the representation of their thoughts. This aspect, however, remains underexplored outside dialogue novels. In Fortunata and Jacinta, we have found that the relationship between kinetic references and the psychological dimension of characters is enacted by means of suspensions, as shown in examples 22, 23, and 24. As in Dickens’s works, Galdós makes use of references to narrative space – el establecimiento (the establishment), la sala (the room), and la alcoba (the chamber) – in suspensions.
They are part of -ing clauses – penetrando en el establecimiento in example 22, for instance – with which the impression of simultaneity between characters’ thoughts and their movements is further reinforced.

(22) “¡Dátiles! . . . ¡Cuántos le he comprado yo! Las golosinas la venden. Se despepita por ellas . . .” – pensó el razonador, penetrando en el establecimiento, sin ver nada de lo que en él había –. “Come dátiles . . . luego no está mala; los dátiles son muy indigestos. Y puesto que ella los come, la causa del no salir, no es enfermedad . . . Luego, es otra cosa . . .” (Fortunata and Jacinta, Part 4, Chapter 5)

(23) “Pues lo que es hoy sí que no me quedo con esto dentro del cuerpo” – pensó mi hombre al otro día, entrando en la sala, hecho un sol de limpio y despidiendo, como todas las mañanas al salir de su casa, un fuerte olor a colonia –. “¿Y dónde está?, ¿qué hace que no sale? Es un encanto esa mujer, y tengo al tal Santa Cruz por el gaznápiro más grande que come pan . . . ¡Cuánto me hace esperar!” (Fortunata and Jacinta, Part 3, Chapter 4)

(24) “Pues lo que es mañana temprano” – se dijo volviendo a la alcoba –, “mañana tempranito, antes de que salga para el obrador, voy y la acogoto . . .” (Fortunata and Jacinta, Part 4, Chapter 6)

This function of suspensions to connect thought presentation with characters’ movements or the role of doors to define characters’ positions, together with the effect of synchronicity between characters’ speech and body language and the formal aspects discussed at the beginning of Section 3.4 (both in the suspension and in the stretches of direct speech before and after the interruption), reveals hitherto unremarked textual parallelisms between Dickens and Galdós that serve to reinforce the influence of the former on the latter.
Without a doubt, the identification of these patterns has only been possible thanks to the annotation of Galdós’s Fortunata and Jacinta, which has made it possible to systematically scrutinize 687 instances of suspensions throughout the novel. This proves the potential of corpus stylistics and how computer-assisted approaches can unveil meaningful textual patterns that cannot be detected with more traditional approaches and from which the analysis of literary texts can benefit greatly.

3.5 Conclusion

The Dickensian element in Benito Pérez Galdós’s craftsmanship is indisputable. The study of this element in the works of the Spanish novelist, however, has been built upon a compilation of impressionistic references (based mostly on shared themes, scenes, and characters) rather than on textual analyses of their works. This lack of stylistic analyses is partly due to the difficulty of analyzing both authors’ works systematically, as some critics have admitted (see Section 3.2). Thanks to the emergence of new computer-assisted disciplines such as corpus stylistics, new avenues of analysis have been disclosed. By combining methods and theories from literary stylistics and corpus linguistics alike, corpus stylistics makes it possible to identify meaningful patterns that have traditionally gone unremarked in critical appreciations of literary texts, and thereby furthers our understanding of the effects that these patterns have on the way in which readers create meanings from texts. This is precisely what we have set out to do in this chapter. Specifically, we have shown how suspensions can be systematically explored and compared in the works of Charles Dickens and Benito Pérez Galdós thanks to an annotation of the texts under analysis. A suspension is an interruption of a character’s speech. Stylistically speaking, the suspension is a textual device for which Dickens is well known.
The Victorian author makes extensive use of suspensions with different purposes, such as organizing discourse, offering character information, or creating specific literary effects, such as an impression of simultaneity between the words of a character and their actions. Thanks to the annotation system that we have developed, we have looked into some of these Dickensian traits in Fortunata and Jacinta, the novel for which Galdós is best known. As has been shown, Galdós makes a Dickensian use of suspensions from both a formal and a functional point of view. From a formal point of view, on the one hand, Galdós frequently incorporates an -ing clause into the suspension, with which he conveys an impression of simultaneity between the words of the character and their body language. Besides, like Dickens, he also frequently makes use of vocatives, exclamations, or imperatives in the stretch of direct speech that precedes the suspensions and also tends to repeat the stretch of direct speech that preceded the suspension in the remainder of the speech when suspensions are lengthy, as Dickens occasionally does in his novels. From a functional point of view, on the other hand, we have looked into some textual functions identified in Fortunata and Jacinta which seem to have a Dickensian origin too. Thus, in addition to the frequent effect of synchronicity between characters’ speech and their body language, we have also identified more specific functions, such as the positioning of characters next to doors or the relation of interdependence between characters’ movements and the representation of their thoughts. The remarkable systematicity of these patterns has unveiled a more subtle similarity than has so far been noticed between Dickens and Galdós, thus opening new avenues of analysis in the study of the well-known (yet still underexplored) influence of the former on the latter from a stylistic point of view.
Notes

1 The research reported on in this chapter has been funded by the Spanish government (Ayuda del Programa de Recualificación del Sistema Universitario Español. Modalidad de recualificación del profesorado universitario funcionario o contratado), which we acknowledge here.
2 All the examples shown in the chapter are taken from e-texts (see Section 3.3). Therefore, we provide the chapter location rather than page references.
3 The Spanish quote reads: “Consideraba yo a Carlos Dickens como mi maestro más amado. En mi aprendizaje literario, cuando no había salido yo de mi mocedad petulante, apenas devorada La Comedia humana de Balzac, me apliqué con loco afán a la copiosa obra de Dickens” (Pérez Galdós 1980, 1693).
4 The translation of Pickwick Papers was preceded by the essay “Carlos Dickens,” in which Galdós explained to readers of La Nación the most prominent features of the Victorian author’s style.
5 For further information on the rationale behind the division of the subsets, see Mahlberg et al. (2016).

References

Arroyo Díez, María Cristina. 2011. “Aspectos Espaciales y Visuales en las Primeras Novelas Contemporáneas Benito Pérez Galdós y su Repercusión en la Novela Española Actual.” PhD diss., Universidad de Valladolid.
Benítez, Rubén. 1990. Cervantes en Galdós. Murcia: Universidad de Murcia.
Gilman, Stephen. 1981. Galdós and the Art of the European Novel: 1867–1887. Princeton, NJ: Princeton University Press.
Goldman, Peter. 1971. “Galdós and Cervantes: Two Articles and a Fragment.” Anales Galdosianos 4: 99–106.
Jiménez Gómez, Cristina. 2020. “Galdós y su Narrativa: La Polifonía Textual Como Mecanismo Configurador de las Voces Ajenas.” Boletín de la Real Academia de Córdoba 169: 361–82.
Korte, Barbara. 1997. Body Language in Literature. Toronto: University of Toronto Press.
Lacosta, Francisco C. 1968. “Galdós y Balzac.” Cuadernos Hispanoamericanos 224–5: 345–74.
Lambert, Mark. 1981. Dickens and the Suspended Quotation. New Haven, CT: Yale University Press.
Ley, Charles David. 1977. “Galdós Comparado con Balzac y Dickens, Como Novelista Nacional.” In Actas del primer congreso internacional galdosiano, 291–95. Las Palmas de Gran Canaria: Cabildo de Gran Canaria.
Mahlberg, Michaela. 2005. English General Nouns: A Corpus Theoretical Approach. Amsterdam: John Benjamins.
Mahlberg, Michaela. 2007. “Lexical Items in Discourse: Identifying Local Textual Functions of Sustainable Development.” In Text, Discourse and Corpora. Theory and Analysis, edited by Michael Hoey, Michaela Mahlberg, Michael Stubbs, and Wolfgang Teubert, 191–218. London: Continuum.
Mahlberg, Michaela. 2009. “Local Textual Functions of Move in Newspaper Story Patterns.” In Exploring the Lexis-Grammar Interface, edited by Ute Römer and Rainer Schulze, 265–87. Amsterdam: John Benjamins.
Mahlberg, Michaela. 2013. Corpus Stylistics and Dickens’s Fiction. London: Routledge.
Mahlberg, Michaela, and Catherine Smith. 2010. “Corpus Approaches to Prose Fiction: Civility and Body Language in Pride and Prejudice.” In Language and Style, edited by Dan McIntyre and Beatrix Busse, 449–67. Basingstoke: Palgrave Macmillan.
Mahlberg, Michaela, and Catherine Smith. 2012. “Dickens, the Suspended Quotation and the Corpus.” Language and Literature 21, no. 1: 51–65. https://doi.org/10.1177/0963947011432058.
Mahlberg, Michaela, Catherine Smith, and Simon Preston. 2013. “Phrases in Literary Contexts: Patterns and Distributions of Suspensions in Dickens’s Novels.” International Journal of Corpus Linguistics 18, no. 1: 35–56. https://doi.org/10.1075/ijcl.18.1.05mah.
Mahlberg, Michaela, Peter Stockwell, Johan de Joode, Catherine Smith, and Matthew Brook O’Donnell. 2016. “CLiC Dickens: Novel Uses of Concordances for the Integration of Corpus Stylistics and Cognitive Poetics.” Corpora 11, no. 3: 433–63.
McGovern, Timothy. 2000. Dickens in Galdós. New York: Peter Lang.
Newsom, Robert. 2000. “Style of Dickens.” In The Oxford Reader’s Companion to Charles Dickens, edited by Paul Schlicke, 553–57. Oxford: Oxford University Press.
Nieto Caballero, Guadalupe. 2019a. “Análisis de la Influencia de Charles Dickens en el Estilo de Benito Pérez Galdós a Través del Lenguaje Gestual de sus Personajes: Un Estudio de Corpus.” Dicenda. Estudios de lengua y literatura españolas 37: 321–41.
Nieto Caballero, Guadalupe. 2019b. “El Espacio como Eje Vertebrador en la Creación del Universo Ficticio Galdosiano: Un Estudio de Corpus.” Signa. Revista de la Asociación Española de Semiótica 28: 1203–38.
Ollero, Carlos. 1973. “Galdós y Balzac.” In Benito Pérez Galdós. El escritor y la crítica, edited by Douglass M. Rogers, 185–93. Madrid: Taurus.
Padilla Mangas, Ana María. 2000. “Del Galdós Narrador al Galdós Dramaturgo: Un Acercamiento al Problema de las Didascalias.” In Actas VI Congreso Internacional Galdosiano 1997, edited by Carmen Yolanda Arencibia Santana, María del Prado Escobar Bonilla, and Rosa María Quintana Domínguez, 782–93. Las Palmas de Gran Canaria: Cabildo de Gran Canaria.
Pérez Galdós, Benito. 1980. Obras Completas. Novelas. Tomo III. Madrid: Aguilar.
Ruano San Segundo, Pablo. 2018. “A Corpus-based Approach to Charles Dickens’s Use of Direct Thought Presentation.” Corpora 13, no. 3: 319–45.
Stockwell, Peter, and Michaela Mahlberg. 2015. “Mind-Modelling with Corpus Stylistics in David Copperfield.” Language and Literature 24, no. 2: 129–47.
Tambling, Jeremy. 2013. “Dickens and Galdós.” In The Reception of Charles Dickens in Europe, edited by Michael Hollington, 191–96. London: Bloomsbury.
Tognini-Bonelli, Elena. 2001. Corpus Linguistics at Work. Amsterdam: John Benjamins.
Wright, Chad C. 1979. “Artifacts and Effigies: The Porreño Household Revisited.” Anales Galdosianos 14: 13–24.
4 A Corpus-Stylistic Approach to the Literary Representation of Narrative Space in Ruiz Zafón’s The Cemetery of Forgotten Books Series

Guadalupe Nieto Caballero and Pablo Ruano San Segundo

4.1 Introduction

In this chapter we show the application of corpus linguistic techniques to the analysis of literary texts written in Spanish. This methodology falls within the realm of corpus stylistics (McIntyre and Walker 2019), an area of corpus linguistics that applies “corpus methods to the analysis of literary texts, giving particular emphasis to the relationship between linguistic description and literary appreciation” (Mahlberg 2014, 378). Although corpus stylistics is a well-established approach in the analysis of literary texts in general, the use of computer-assisted methodologies has not yet been fully developed in the Spanish-speaking world (Nieto Caballero and Ruano San Segundo 2020, 19). This chapter sets out to demonstrate how the analysis of literary texts written in Spanish can also benefit greatly from quantitative methods to retrieve data that can then be subjected to a qualitative analysis. To do so, we analyze Carlos Ruiz Zafón’s The Cemetery of Forgotten Books (El Cementerio de los Libros Olvidados) series. Ruiz Zafón is the most-read Spanish author of the twentieth and twenty-first centuries, whose books have been translated into more than 50 different languages (Ramos Nogueira 2016). The Cemetery of Forgotten Books series is made up of four books: La sombra del viento (The Shadow of the Wind), El juego del ángel (The Angel’s Game), El prisionero del cielo (The Prisoner of Heaven), and El laberinto de los espíritus (The Labyrinth of Spirits). La sombra del viento (Ruiz Zafón 2001) is a Gothic mystery that involves Daniel Sempere’s quest to track down the man responsible for destroying every book written by author Julian Carax. El juego del ángel (Ruiz Zafón 2008) is a prequel to La sombra del viento, also set in Barcelona, but during the 1920s and 1930s.
It follows David Martín, a young writer who is approached by a mysterious figure to write a book. The next book in the cycle is El prisionero del cielo (Ruiz Zafón 2011). It returns to La sombra del viento’s Daniel Sempere and his journey back to the 1940s to resolve a buried secret. El laberinto de los espíritus (Ruiz Zafón 2016) is the fourth and final book in the Cemetery of Forgotten Books series. The novel is set in the Barcelona of the late 1950s and early 1960s. Daniel, overwhelmed by rage and the need to avenge the death of his mother, Isabella, will discover a network of crimes and violations of Francoist Spain, and a new protagonist, Alicia Gris, will help him solve the mysteries.

The four books are related and share motifs, themes, symbolism, etc. Using a corpus-stylistic methodology, in this chapter we intend to show how certain aspects discussed by literary critics are enacted in the same way in the four novels, thus unveiling aspects of Zafón’s craftsmanship hitherto unremarked in literary appreciations of his style. More specifically, we will look into how the author shapes narrative space in the series. To do so, we will carry out a cluster analysis, with which we will identify textual building blocks used in the four novels and analyze them systematically. As pointed out before, the analysis is meant to make a contribution to the still-emerging field of corpus stylistics in Spanish, illustrating how the analysis of literary works can benefit greatly from the use of innovative corpus tools.

DOI: 10.4324/9781003298328-5

The chapter is organized as follows. First, we offer a general overview of Ruiz Zafón’s treatment of narrative space (Section 4.2). Then, we explain the methodology used to identify the clusters under analysis, and we show the results obtained (Section 4.3).
In Section 4.4, we analyze some of the examples, discussing meaningful patterns that contribute to the creation of particular literary effects. This section is divided into two subsections, which concentrate on clusters that contribute to the creation of fictional universes and characterization (Section 4.4.1) and on clusters that offer contextualizing information (Section 4.4.2). The chapter finishes with some remarks on the potential of corpus stylistics for the analysis of literary texts in the Spanish-speaking world.

4.2 Narrative Space in Ruiz Zafón’s Novels

The treatment of space in fictional narratives has received a great deal of scholarly attention, which has resulted in different approaches and a distinction between different types of space, such as narrative space, the space that serves as context for the text, the space taken by the text itself, and the spatial form of the text, among others (Ryan et al. 2016; Buchholz and Manfred 2005). In this chapter, we concentrate on narrative space. Narrative space refers to “the space (and the places) providing the physical environment in which the characters of narrative live and move” (Ryan et al. 2016, 3). This is a fundamental dimension of any fictional narrative (Álvarez Méndez 2002), as it is closely related to the rest of the elements that shape the narrative story world: every story implies a series of events that take place in a given time and space (Zubiaurre 2000, 20). Narrative space is of paramount importance to create an effect of believability, especially in the case of non-transportable narratives that are rooted and inscribed in specific locations (Matzat 2007). Authors such as Dickens, Galdós, or Balzac, to name but three canonical novelists, are well known for their treatment of space to inscribe some of their novels in London, Madrid, and Paris, respectively. Ruiz Zafón’s The Cemetery of Forgotten Books series is another good example, with Barcelona as the setting of the story.
The way in which textual space is mapped in the four novels that make up the series provides the reader with a physical environment that contributes to an effect of plausibility.

In addition to a faithful reproduction of the fictional universe, narrative space can also convey a wide range of possibilities of interpretation (Zoran 1984, 319). In other words, narrative space not only serves as a physical environment for characters but also plays a stylistic role that usually goes beyond the geographical reality in which the characters of a narrative live and move. In the case of Ruiz Zafón, his masterful treatment of space is also connected to literary functions that go beyond the description of the scene. Settings like houses, train stations, and cemeteries are frequently given a symbolic value. Houses, for instance, often hide secrets, intriguing plots, or unsolved mysteries that the new tenant who comes to them must solve. They may also stand in for the characters themselves, as in El palacio de la medianoche, in which Chandra Chatterghee’s house becomes an alter ego to the awkward personality of the character, or the cabin on the desert island where David Martín takes refuge after his long adventure in El juego del ángel, which is frequently seen as a representation of David’s inner life (Ruiz Tosaus 2009). Train stations are also a common setting in Ruiz Zafón’s novels and are frequently used symbolically too, especially in The Cemetery of Forgotten Books series. In La sombra del viento and El juego del ángel, the train station Estación de Francia in Barcelona becomes a prominent setting in the story. In La sombra del viento, on the one hand, the station is the place in which frustration overcomes the happiness of the union between lovers. In El juego del ángel, on the other, this train station symbolizes escape and the search for a new life (Ruiz Tosaus 2009).
Finally, cemeteries are also frequently described in terms that suggest something otherworldly. The Cemetery of Forgotten Books is referred to as a palace, a temple, a honeycomb filled with honey (Hrabova et al. 2020). Specific cemeteries are also mentioned throughout the series, such as the cemetery in Sarriá in La sombra del viento or the cemetery of Pueblo Nuevo and the cemetery of San Gervasio in El juego del Ángel. These real cemeteries are frequently likened to labyrinths, which contributes to shaping the entangled and mystical image of the Cemetery of Forgotten Books in the series (Ruiz Tosaus 2009).

Literary appreciations of Ruiz Zafón’s craftsmanship have mostly focused on this symbolic dimension of narrative space, to the detriment of the configuration of the spaces and places that make up the physical environment in which the characters of his narratives live and move. However, in the configuration of narrative space, we can also identify hitherto-unremarked habits that can contribute to a better understanding of Ruiz Zafón’s style. It is true that some scholars have referred to aspects of Ruiz Zafón’s writing in connection to his configuration of narrative space. Ruiz Tosaus (2009), for instance, mentions the abundance of movement in Ruiz Zafón’s novels, which seems to be related to the episodic nature of his writing. Romero Frías and Galiñanes Gallén (2009) scrutinize the descriptions of narrative space in La sombra del viento as filtered by a narrator-protagonist whose view occasionally distorts what is really happening. Apart from these isolated references, however, literary critics have not yet methodically approached the way in which Ruiz Zafón shapes the narrative space in his novels from a textual point of view. The clusters analyzed in this chapter reveal formal and functional patterns that will help to better understand the author’s style in this regard.
To do so, a corpus-stylistic approach has been used. The methodology used to identify the examples under analysis is explained next.

4.3 Methodology and Results

The corpus with which we have conducted our analysis is made up of the four novels of The Cemetery of Forgotten Books series: La sombra del viento (Ruiz Zafón 2001), El juego del ángel (Ruiz Zafón 2008), El prisionero del cielo (Ruiz Zafón 2011), and El laberinto de los espíritus (Ruiz Zafón 2016). These four novels add up to 612,027 words, distributed as shown in Table 4.1. To carry out the analysis and identify textual building blocks related to the shaping of narrative space, we have used WordSmith Tools (Scott 2016), the most frequently used tool in corpus stylistics (Archer 2007, 249). More specifically, we have used this software tool to identify clusters in our corpus of Ruiz Zafón’s novels.¹ From a formal point of view, a cluster is a sequence of two or more words that repeatedly occur consecutively in a corpus of texts (Cheng 2012, 72). Different terms have been used to refer to clusters, such as “recurrent word-combinations” (Altenberg 1998), “chains” (Stubbs and Barth 2003), or “n-grams” (Anthony 2019). Following the terminology of the WordSmith Tools manual (Scott 2013), we have opted for the term cluster, as it is defined by purely formal aspects (Scott 2019). From a functional viewpoint, clusters identified in a corpus are supposed to have “identifiable discourse functions in texts” (Conrad and Biber 2005, 58), and they can lead to the identification of meaningful textual building blocks (Mahlberg 2013, 26). The potential functional value of clusters makes it very important to carefully define the criteria that guide their identification. There are three aspects that should be borne in mind in this regard: the length of the clusters, their distribution across the corpus, and their frequency.
Firstly, regarding the length of the examples, analyses tend to oscillate between three- and five-word clusters. Biber et al. (1999, 992) explain that three-word clusters, although more numerous due to their limited length, tend to be related to grammatical aspects. Lengthier clusters, on the contrary, are “more phrasal in nature and correspondingly less common” (ibid.). These clusters are more useful for identifying stylistically relevant functions in literary texts, even if fewer examples are identified. As Mahlberg points out in her study of Victorian fiction in general and of Dickens’s works in particular, a “length of five has been shown to be a useful starting point for the analysis of fiction” (Mahlberg 2013). This is also the length of the examples discussed in Nieto Caballero’s (2019) analysis of Galdós’s novels. These studies seem a good reference, and we have decided to analyze five-word clusters too, ruling out any example of four or fewer words.

Table 4.1 Novels by Ruiz Zafón

| Novel | Words |
|---|---|
| La sombra del viento | 158,802 |
| El juego del ángel | 156,248 |
| El prisionero del cielo | 68,237 |
| El laberinto de los espíritus | 228,740 |

Secondly, as for the distribution of the examples identified, it is necessary to emphasize that examples should be found across a number of texts in the corpus so that they can be analyzed as a stylistic mark that goes beyond the idiosyncratic use in one text. In our study, we have only analyzed examples that have been found in all four novels included in our corpus. Frequency, finally, is the aspect about which it is most difficult to make an informed decision when working with clusters. As Kopaczyk states, “(t)here is no uniform practice in lexical bundle studies. . . . Every researcher takes an informed but idiosyncratic decision” (Kopaczyk 2013, 152). Sometimes, frequency is measured in absolute terms, whereas in other cases, it is normalized (per million words, for example).
Absolute frequencies are the preferred option when working with specialized corpora, while normalized frequencies seem a good alternative when working with more extensive corpora (Biber et al. 1999, 993). Because our corpus of study is a specialized corpus, made up of novels by one novelist, we have opted for absolute frequencies. Specifically, we have established a threshold of ten occurrences in our search criteria. In sum, we have selected clusters with a length of five words that occur at least ten times in the corpus and that are found in each of the four texts of the corpus. With these criteria, we have identified 28 examples. Results are shown in Table 4.2.

Table 4.2 Clusters in Ruiz Zafón’s Novels

| N | Word | Freq. |
|---|---|---|
| 1 | AL OTRO LADO DE LA | 54 |
| 2 | CEMENTERIO DE LOS LIBROS OLVIDADOS | 54 |
| 3 | ME DI CUENTA DE QUE | 39 |
| 4 | COMO SI SE TRATASE DE | 36 |
| 5 | EN LO ALTO DE LA | 25 |
| 6 | OTRO LADO DE LA CALLE | 24 |
| 7 | A LA PUERTA DE LA | 23 |
| 8 | EL CEMENTERIO DE LOS LIBROS | 21 |
| 9 | DEL CEMENTERIO DE LOS LIBROS | 19 |
| 10 | PUSO LOS OJOS EN BLANCO | 19 |
| 11 | AL FIN Y AL CABO | 17 |
| 12 | A LAS PUERTAS DE LA | 16 |
| 13 | SE ENCOGIÓ DE HOMBROS Y | 16 |
| 14 | SI SE TRATASE DE UN | 14 |
| 15 | CON LÁGRIMAS EN LOS OJOS | 13 |
| 16 | CON UN HILO DE VOZ | 13 |
| 17 | LA VERDAD ES QUE NO | 13 |
| 18 | QUE SE TRATABA DE UN | 13 |
| 19 | SE DIO LA VUELTA Y | 13 |
| 20 | A LA ENTRADA DE LA | 12 |
| 21 | AL CEMENTERIO DE LOS LIBROS | 12 |
| 22 | Y SE ENCOGIÓ DE HOMBROS | 12 |
| 23 | EN EL INTERIOR DE LA | 11 |
| 24 | A LA ESPERA DE QUE | 10 |
| 25 | A LOS PIES DE LA | 10 |
| 26 | DE LA CALLE SANTA ANA | 10 |
| 27 | LA IGLESIA DE SANTA ANA | 10 |
| 28 | TUVE LA IMPRESIÓN DE QUE | 10 |

Of the 28 examples identified, 13 are directly connected to narrative space: AL OTRO LADO DE LA, CEMENTERIO DE LOS LIBROS OLVIDADOS, EN LO ALTO DE LA, OTRO LADO DE LA CALLE, A LA PUERTA DE LA, A LA ENTRADA DE LA, EL CEMENTERIO DE LOS LIBROS, DEL CEMENTERIO DE LOS LIBROS, EN EL INTERIOR DE LA, A LOS PIES DE LA, DE LA CALLE SANTA ANA, LA IGLESIA DE SANTA ANA, and A LAS PUERTAS DE LA.
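The three selection criteria summarized above (cluster length, frequency threshold, and distribution across texts) can be illustrated with a minimal Python sketch. It is not the WordSmith Tools procedure used in the chapter: the tokenizer, the function names (`tokenize`, `ngrams`, `select_clusters`, `per_million`), and the toy texts are assumptions made purely for illustration, and clusters here are allowed to run across punctuation, which a dedicated tool may handle differently.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (letters only, accents kept)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def ngrams(tokens, n):
    """Yield every run of n consecutive tokens as a tuple."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def select_clusters(texts, n=5, min_freq=10, min_range=None):
    """Return {cluster: total frequency} for n-word clusters meeting both thresholds.

    texts     -- dict mapping a text label to its raw string (hypothetical input)
    min_freq  -- minimum number of occurrences in the corpus as a whole
    min_range -- minimum number of texts in which the cluster must occur
                 (defaults to every text, mirroring the chapter's criterion)
    """
    if min_range is None:
        min_range = len(texts)
    per_text = {label: Counter(ngrams(tokenize(raw), n)) for label, raw in texts.items()}
    total = Counter()
    for counts in per_text.values():
        total.update(counts)
    selected = {}
    for cluster, freq in total.items():
        text_range = sum(1 for counts in per_text.values() if cluster in counts)
        if freq >= min_freq and text_range >= min_range:
            selected[" ".join(cluster).upper()] = freq
    return selected


def per_million(freq, corpus_size):
    """Normalize an absolute frequency to occurrences per million words."""
    return freq / corpus_size * 1_000_000


# Toy corpus (invented sentences, not quotations from the novels).
demo_texts = {
    "text_a": "Al otro lado de la calle. Al otro lado de la plaza.",
    "text_b": "Esperaba al otro lado de la calle.",
}
demo_clusters = select_clusters(demo_texts, n=5, min_freq=3)
print(demo_clusters)
print(round(per_million(54, 612_027), 2))
```

With the chapter's settings the call would be `select_clusters(novels, n=5, min_freq=10)`, where `novels` is a hypothetical dictionary holding the full text of the four novels. The `per_million` helper shows the normalized alternative discussed above: 54 occurrences in a 612,027-word corpus correspond to roughly 88 occurrences per million words.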
The fact that almost half of the clusters identified are related to the configuration of narrative space reveals a dimension about which Ruiz Zafón seems particularly concerned in his novels. Thus, in addition to giving a particular symbolic value to physical environments, narrative space is also frequently brought to the forefront of his writing. Some of the examples identified are part of the same textual block, such as CEMENTERIO DE LOS LIBROS OLVIDADOS, CEMENTERIO DE LOS LIBROS, and DEL CEMENTERIO DE LOS LIBROS. However, most of the examples refer to different aspects that contribute to the configuration of the narrative space throughout the series. The most striking aspect is probably the number of references to specific places and settings found in the examples. There are references to doors (puertas, as in A LA PUERTA DE LA), streets (calles, as in OTRO LADO DE LA CALLE), entrances (entradas, as in A LA ENTRADA DE LA), or churches (iglesias, as in LA IGLESIA DE SANTA ANA). This contributes to defining the physical environment in which characters move and live and also to creating the effect of believability discussed in Section 4.2. In example 1, for instance, we show an example of LA IGLESIA DE SANTA ANA in El laberinto de los espíritus. The systematic references to real places such as the church of Santa Ana contribute to rooting the story world of The Cemetery of Forgotten Books series in specific locations. They are repeatedly referred to in the four books of the series – occurrences of LA IGLESIA DE SANTA ANA in El prisionero del cielo, La sombra del viento, and El juego del ángel are shown in examples 18, 19, and 20. (1) Como todos los domingos desde que se había quedado viudo, más de veinte años atrás, Juan Sempere se levantaba temprano, se preparaba un café bien cargado y se enfundaba su traje y su sombrero de señor de Barcelona para bajar a la iglesia de Santa Ana. (El laberinto de los espíritus.) 
(Ruiz Zafón 2016, 582)

In order to explore the functions of clusters and how Ruiz Zafón shapes the configuration of narrative space, we have searched for meaningful patterns that contribute to the creation of specific effects across the four novels under analysis. This will disclose hitherto-unremarked stylistic traits of Ruiz Zafón’s craftsmanship, which may contribute to a better understanding of his literary language.

4.4 Analysis

A cluster analysis can be conducted in different ways. The most straightforward approach is perhaps by running a concordance of the clusters identified. Concordances make possible a “vertical reading” (Tognini-Bonelli 2001, 18) of the examples, which can in turn unveil unnoticed aspects of how a cluster is used. For example, the cluster CEMENTERIO DE LOS LIBROS OLVIDADOS, which refers to the mysterious place around which the novel revolves, is used in the exact same manner when a character welcomes another to that place, as shown in examples 2 to 8. As can be observed, the stretch of direct speech in which the cluster appears follows the same pattern every time: there is a vocative together with the phrase “bienvenido/a al” (welcome to) and the cluster.

(2) – Daniel, bienvenido al Cementerio de los Libros Olvidados. (La sombra del viento) (Ruiz Zafón 2001, 8)

(3) – Ignatius B. Samson, bienvenido al Cementerio de los Libros Olvidados. (El juego del Ángel) (Ruiz Zafón 2008, 89)

(4) – Bienvenida al Cementerio de los Libros Olvidados, Isabella. (El juego del Ángel) (Ruiz Zafón 2008, 355)

(5) – Fermín, bienvenido al Cementerio de los Libros Olvidados. (El prisionero del cielo) (Ruiz Zafón 2011, 271)

(6) – Alicia – dijo por fin –. Bienvenida al Cementerio de los Libros Olvidados. (El laberinto de los espíritus) (Ruiz Zafón 2016, 53)

(7) – Alicia, bienvenida de nuevo al Cementerio de los Libros Olvidados.
(El laberinto de los espíritus) (Ruiz Zafón 2016, 460)

(8) – Julián, bienvenido al Cementerio de los Libros Olvidados. (El laberinto de los espíritus) (Ruiz Zafón 2016, 621)

The conciseness of the sentences in which the Cemetery of Forgotten Books is mentioned suggests how Ruiz Zafón encapsulates a moment of heightened intensity in a short stretch of language. Interestingly, this fits in with what some authors admired by Ruiz Zafón do in similar moments. Charles Dickens, for instance, uses short sentences in parataxis to convey the same effect (Gordon 1966, 150). Gordon uses the example of Little Nell’s death in The Old Curiosity Shop, concisely described in a short one-sentence paragraph. According to Brook (1970, 29), this Dickensian trait is the result of the direct influence of Romantic writers, in whose work this linguistic feature was extensively used in moments of heightened intensity. As shown in these examples, Ruiz Zafón shows a similar tendency. Although this would require a more detailed analysis than editorial constraints permit here, it is not far-fetched to suggest that this could be connected to the alleged Dickensian influence in Ruiz Zafón’s writing. As Calle Rosingana (2012, 32) states in relation to Ruiz Zafón and Dickens, “Ruiz Zafón’s admiration for nineteenth-century fiction has been expressed by the writer in many interviews and is one of the central elements on which he bases his literary style” (our translation). As can be seen in examples 2 to 8, far from being circumstantial, the cluster CEMENTERIO DE LOS LIBROS OLVIDADOS shows a consistent pattern in Zafón’s style, as it is used similarly throughout the four books in the series. This could be a textual trace of nineteenth-century authors in general and of Dickens in particular.

Figure 4.1 Screenshot of concordance a la puerta de la in Ruiz Zafón’s novels.

Concordances can also help us identify lexicogrammatical patterns in the use of clusters.
Let us take A LA PUERTA DE LA as an example. A screenshot of a concordance run with WordSmith Tools is shown in Figure 4.1. As we can see, there are elements used repeatedly with this cluster: verbs that indicate movement, such as acercarse (get closer) (concordance lines 8, 9, 20, and 21) and aproximarse (approach) (concordance line 15); verbs that indicate position, such as detenerse (stop) (concordance lines 10, 11), permanecer (remain) (concordance line 4), esperar (wait) (concordance line 12), or apostarse (position) (concordance lines 16, 17, 18, and 19); adverbs that describe an action, such as lentamente (slowly) (concordance lines 8 and 9); and nouns that refer to means of transportation, such as coche (car) (concordance lines 6 and 21). A vertical reading of the concordance can help us see how the author uses these elements with the cluster and look into how the configuration of narrative space is shaped. Although editorial constraints do not allow us to cover them all in this chapter, some of them are discussed next, as they are repeatedly used with other clusters too, resulting in meaningful functional patterns. We will focus on two aspects: the use of spatial references in relation to the construction of fictional universes and characterization (Section 4.4.1) and the combination of spatial and time references as a means of offering contextualizing information (Section 4.4.2).

4.4.1 Fictional Universes and Characterization

The construction of fictional universes and characterization in The Cemetery of Forgotten Books series is frequently related to references to narrative space. Descriptions of settings and characters often occur after one of the clusters under analysis here. This is the case in example 9 from El juego del Ángel, in which AL OTRO LADO DE LA precedes a description of a bookshop. This is the function par excellence of the clusters that refer to narrative space.
Not only are these the most frequent examples, but they also fulfill the primary function of space references: to provide a corporeal reality of the fictional universe we are seeing. These examples are frequently related to eye behavior – as seen in “nadie podía vernos” and “En el interior se podía ver” in (9) – which gives us the impression of being informed about the bookshop through the lens of the narrator and his own feelings at that moment. This is in line with Romero Frías and Galiñanes Gallén’s (2009) findings in their analysis of descriptions of narrative space in La sombra del viento, frequently filtered by a narrator-protagonist whose view occasionally distorts what is really happening (see Section 4.2). As seen in example 9 from El juego del Ángel, this goes beyond La sombra del viento and seems a textual device that characterizes Ruiz Zafón’s craftsmanship in general. (9) Barcelona nunca me parecía tan hermosa y tan triste como aquella tarde. Cuando empezaba a anochecer nos acercamos hasta la librería de Sempere e Hijos. Nos apostamos en un portal al otro lado de la calle, donde nadie podía vernos. El escaparate de la vieja librería proyectaba un soplo de luz sobre los adoquines húmedos y brillantes. En el interior se podía ver a Isabella aupada a una escalera ordenando libros en el último estante, mientras el hijo de Sempere hacía como que repasaba un libro de contabilidad tras el mostrador y le miraba los tobillos de refilón. (El juego del Ángel.) (Ruiz Zafón 2008, 258) In addition to eye behavior, clusters related to the construction of fictional universes and characterization are frequently used in connection to references to means of transportation, as commented on in the previous section. In examples 10 and 11, we show two similar examples of A LAS PUERTAS DE LA from two different novels. 
In both cases, a taxi takes a character to a place, then stops at the doors of a building (A LAS PUERTAS DE LA), and then the narrator starts describing a particular place or character. In example 10 from El juego del Ángel, the narrator describes the headquarters of the newspaper La Vanguardia, whereas in example 11 from El laberinto de los espíritus, the description is of a particular character. Another example from El laberinto de los espíritus is shown in example 12, this time with the cluster EN LO ALTO DE LA and with a tram (“tranvía”) instead of a taxi. This suggests a consistent textual pattern in Ruiz Zafón’s series. Thus, not only are different clusters that recur across the four novels (A LA PUERTA DE LA, A LAS PUERTAS DE LA, EN LO ALTO DE LA) used as reference points from which a description starts, but they are also frequently related to a means of transportation with which a character arrives at a place; as if we were with him or her, we start seeing what is going on. Consider example 10. Ruiz Zafón uses David Martín’s arrival at the headquarters of the newspaper La Vanguardia to start describing what the building is like (“todo allí desprendía un aire de señorío y opulencia”) and to describe the people working inside (“un chaval con trazas de meritorio que me recordaba a mí mismo en mis años de Pepito Grillo”). This further reinforces Romero Frías and Galiñanes Gallén’s (2009) discussion of our perception of events through the lens of the narrator-protagonist and buttresses the relevance of narrative space in how Ruiz Zafón shapes his fictional universe in The Cemetery of Forgotten Books series, both with regard to settings and to the characters that move and live in the physical environment of the story world.

(10) Media hora más tarde, un taxi me dejaba a las puertas de la sede de La Vanguardia en la calle Pelayo.
A diferencia de la siniestra decrepitud de mi antiguo diario, todo allí desprendía un aire de señorío y opulencia. Me identifiqué en el mostrador de conserjería y un chaval con trazas de meritorio que me recordaba a mí mismo en mis años de Pepito Grillo fue enviado a dar aviso a don Basilio de que tenía visita. (El juego del Ángel) (Ruiz Zafón 2008, 209)

(11) Cuando el taxi los dejó a las puertas de la Sección Trece, el que parecía haber sido designado como cancerbero del lugar esperaba ya en el umbral portando un manojo de llaves prendido del cinto y con un semblante que hubiera cosechado premios en un concurso de enterradores. (El laberinto de los espíritus) (Ruiz Zafón 2016, 205)

(12) El tranvía lo dejó en lo alto de la avenida y se perdió de nuevo en la niebla, sus luces desvaneciéndose cuesta abajo en un espejismo de vapor. La plazoleta estaba desierta a aquellas horas. La luz de una farola solitaria dibujaba apenas las siluetas de dos coches negros apostados frente al restaurante La Venta. Policía, pensó Fernandito. (El laberinto de los espíritus) (Ruiz Zafón 2016, 342)

4.4.2 Contextualizing Information

These descriptions lead us to another consistent pattern found throughout the different clusters under analysis here: how time and space references are frequently combined. Example 10 contains a time reference (“media hora más tarde”) together with the cluster A LAS PUERTAS DE LA and a means of transportation (a taxi), which contributes to offering the necessary context that precedes the description of the headquarters of La Vanguardia and the people working there. This example is not coincidental but part of a regular pattern found throughout the series.
To understand how the clusters under analysis here are combined with time references to offer contextualizing information, we should first briefly refer to Ruiz Zafón’s frequent use of time references as a defining trait of his writing. As Ruiz Tosaus (2009) explains, time is one of the main themes of Ruiz Zafón’s novels. It frequently plays an important role in the plot of his novels, such as in the bildungsroman El príncipe de la niebla. But most important, time is key to understanding and deciphering the mysteries of Ruiz Zafón’s novels (Sáiz Ripoll 2004, 14). In addition to these functions discussed by literary critics, references to time are also used less conspicuously throughout his novels to offer contextualizing information. Interestingly, these references are used in conjunction with references to narrative space. Together, they are used to provide contextualizing information and result in a preferred textual option that Ruiz Zafón uses repeatedly to define the context that precedes a new story episode. The combination of space and time as a means of offering contextualizing information can be observed in different clusters. In examples 13 to 15 we show three occurrences of A LAS PUERTAS DE LA. The pattern is the same as that shown in example 10. As can be observed, time references – “Diez minutos más tarde” in (13), “Una semana más tarde” in (14), and “En apenas un par de minutos de paseo” in (15) – are followed by the reference to narrative space, which is used as the starting point from which the episode begins. These examples are from three different novels (El juego del Ángel, La sombra del viento, and El laberinto de los espíritus), which suggests a pattern across the series.

(13) Diez minutos más tarde llegaba a las puertas de la estación de Francia. Las taquillas ya estaban cerradas, pero aún podían verse varios trenes alineados en los andenes bajo la gran bóveda de cristal y acero.
(El juego del Ángel) (Ruiz Zafón 2008, 284)

(14) Una semana más tarde, a las puertas de la escuela de música de la calle Diputación, Sophie se encontró con don Ricardo Aldaya, que la esperaba fumando y ojeando un periódico. (La sombra del viento) (Ruiz Zafón 2001, 429)

(15) En apenas un par de minutos de paseo a través de callejas heladas y desiertas llegaron a las puertas de la vieja fábrica. Los raíles de una línea ferroviaria se desvanecían a sus pies y se adentraban en el recinto. Un gran portón de piedra con la leyenda VAPOR BARCINO presidía la entrada. (El laberinto de los espíritus) (Ruiz Zafón 2016, 272)

Other clusters that refer to narrative space, such as AL OTRO LADO DE LA and EN EL INTERIOR DE LA, are also used jointly with time references, as shown in examples 16 and 17. These examples serve to reinforce the hypothesis that the configuration of narrative space is frequently related to eye behavior and how we perceive space through the lens of characters, as mentioned in Section 4.4.1. As can be observed, time references – “cerca de media hora” in example 16 and “las dos de la madrugada” in example 17 – add to the context with which the narrator describes the scene from his own viewpoint. To do so, Ruiz Zafón uses a time reference in combination with a reference to narrative space, which is directly related to the narrator’s gaze. In example 16, for instance, “cerca de media hora” is the context information that Ruiz Zafón uses as the basis on which to define the scene: Daniel spends half an hour on the other side of the street (“al otro lado de la calle”), watching (“vigilando” and “viendo”) the silhouettes of Mr. Aguilar and his wife. To explain to his readers what is going on, Ruiz Zafón combines time and space. The exact same pattern is observed in example 17.
(16) Aun así, tan desprovisto de dignidad como de abrigo apropiado para la gélida temperatura, me resguardé del viento en un portal al otro lado de la calle y permanecí allí cerca de media hora, vigilando las ventanas y viendo pasar las siluetas del señor Aguilar y de su esposa. No había rastro de Bea. (La sombra del viento) (Ruiz Zafón 2001, 351)

(17) Cuando llegué a casa eran casi las dos de la madrugada. Iba a enfilar el portal cuando vi que había luz en el interior de la librería, un resplandor débil tras la cortina de la trastienda. (El prisionero del cielo) (Ruiz Zafón 2011, 192)

Finally, clusters referring to narrative space are also combined with time references to bring us to a previous time. This is in line with Ruiz Zafón’s constant recollection of past events (Ruiz Tosaus 2009). Interestingly, Zafón’s idiosyncratic practice of bringing us back to a previous time is enacted by making constant references to narrative space too. In examples 18 to 20 we show three occurrences of LA IGLESIA DE SANTA ANA. They are from three different novels (El prisionero del cielo, La sombra del viento, and El juego del Ángel), which suggests a consistent formal pattern that contributes to buttressing literary critics’ discussions with regard to the frequent connections of Ruiz Zafón’s stories with past events to make sense of the story world. Another similar example of LA IGLESIA DE SANTA ANA with a time reference in El laberinto de los espíritus was shown in example 1.

(18) La novia vestía de blanco y, aunque no lucía grandes alhajas ni adornos, no ha habido en la historia una mujer que fuese más hermosa a los ojos de su prometido que la Bernarda aquel día primerizo de febrero reluciente de sol en la plaza de la iglesia de Santa Ana. (El prisionero del cielo) (Ruiz Zafón 2011, 281)

(19) Bea y yo nos casamos en la iglesia de Santa Ana dos meses más tarde.
El señor Aguilar, que todavía me hablaba en monosílabos y seguiría haciéndolo hasta el fin de los tiempos, me había concedido la mano de su hija ante la imposibilidad de obtener mi cabeza en bandeja. (La sombra del viento) (Ruiz Zafón 2001, 530) (20) En la carta, Sempere hijo me contaba que Isabella y él, tras varios años de noviazgo tormentoso e interrumpido, habían contraído matrimonio el 18 de enero de 1935 en la iglesia de Santa Ana. La ceremonia, contra todo pronóstico, la había celebrado el nonagenario sacerdote que había pronunciado la eulogia en el entierro del señor Sempere y que, a pesar de todos los intentos y afanes del obispado, se resistía a morir y seguía haciendo las cosas a su manera. (El juego del Ángel) (Ruiz Zafón 2008, 358) This relationship of narrative space with time references, together with the configuration of fictional universes shown in Section 4.4.1, has served to unveil hitherto-unremarked aspects of Ruiz Zafón’s craftsmanship in his treatment of narrative space. Although some of the clusters are clearly exclusive to The Cemetery of Forgotten Books series, such as CEMENTERIO DE LOS LIBROS OLVIDADOS, the systematicity of the formal and functional patterns discussed throughout this chapter suggests a well-established practice in the treatment of narrative that may be part of Ruiz Zafón’s style in general. The analysis of the clusters identified has shed new light on the Spanish author’s treatment of space beyond the symbolic value addressed by literary critics mentioned in Section 4.2. Hopefully, this has served to demonstrate the potential of corpus stylistics to reveal “meanings of literary texts that cannot be detected either by intuitive techniques as in literary studies” (Fischer-Starcke 2010, 2), thus complementing them and opening new avenues of analysis in the study of literary texts written in Spanish. 
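The “vertical reading” of concordance lines used throughout Section 4.4 can be approximated with a simple keyword-in-context routine. The sketch below is an illustrative assumption, not the WordSmith Tools concordancer: the function name `kwic`, its `width` parameter, and the tokenization are hypothetical choices, and the two passages are the opening sentences of examples 10 and 13.

```python
import re


def kwic(text, cluster, width=4):
    """Return (left, node, right) context triples for each occurrence of cluster.

    Matching is case-insensitive and ignores punctuation; width is the number
    of words of context kept on each side of the cluster.
    """
    tokens = re.findall(r"[^\W\d_]+", text)
    lowered = [token.lower() for token in tokens]
    target = cluster.lower().split()
    n = len(target)
    hits = []
    for i in range(len(tokens) - n + 1):
        if lowered[i:i + n] == target:
            left = " ".join(tokens[max(0, i - width):i])
            node = " ".join(tokens[i:i + n])
            right = " ".join(tokens[i + n:i + n + width])
            hits.append((left, node, right))
    return hits


# Opening sentences of examples 10 and 13, as quoted in this chapter.
passages = [
    "Media hora más tarde, un taxi me dejaba a las puertas de la sede de "
    "La Vanguardia en la calle Pelayo.",
    "Diez minutos más tarde llegaba a las puertas de la estación de Francia.",
]

lines = [hit for passage in passages
         for hit in kwic(passage, "A LAS PUERTAS DE LA", width=3)]

# Print the node in a fixed column so the lines can be read vertically.
for left, node, right in lines:
    print(f"{left:>25} | {node} | {right}")
```

Aligning the cluster in one column makes the recurring left-hand context (here, the time reference and the verb of arrival) easy to compare across lines, which is the kind of pattern the analysis above relies on.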
4.5 Conclusion

Corpus stylistics is a well-established field of corpus linguistics that applies corpus methodologies to the analysis of literary texts. The potential of corpus-stylistic analyses has been shown with many authors, especially in the English-speaking world. This potential contrasts with a clear absence of computer-assisted approaches to the analysis of Spanish novelists. In this chapter, we have set out to demonstrate the potential of corpus stylistics in the analysis of Spanish-speaking authors. To do so, we have conducted a cluster analysis to investigate Carlos Ruiz Zafón’s treatment of narrative space in The Cemetery of Forgotten Books series. Our brief (of necessity) account of some findings on how the configuration of narrative space is shaped by Ruiz Zafón has hopefully shown how a computer-assisted approach that combines corpus linguistics with traditional stylistics can provide meaningful insight into the style of the author. As has been shown, some of the findings have revealed previously unremarked stylistic traits of Zafón’s craftsmanship, demonstrating how corpus stylistics “can reveal patterns that we as readers may not be aware of, although such patterns might still contribute to the effects we perceive” (Mahlberg 2013, 27). Of course, the purpose of this chapter was not to present corpus stylistics as an approach that seeks to replace more traditional approaches – this perception seems to be one of the reasons that stylisticians in the Spanish-speaking world are reluctant to embrace corpus approaches. Needless to say, this form of analysis does not supplant other studies. Rather, it “should be seen as a complementary approach to more traditional approaches” (Biber et al. 1998, 7–8), from which the study of literary texts can benefit greatly.
We hope that the case study presented here contributes to opening new avenues of analysis and to encouraging stylisticians in the Spanishspeaking world to incorporate computer-assisted approaches to their analytical tool kit. Note 1 For a more detailed account of clusters in general and how to identify them using WordSmith Tools, see Scott (2013). References Altenberg, Bengt. 1998. “On the Phraseology of Spoken English: The Evidence of Recurrent Word-Combinations.” In Phraseology. Theory, Analysis, and Applications, edited by Anthony Paul Cowie, 101–22. Oxford: Oxford University Press. Álvarez Méndez, Natalia. 2002. Espacios Narrativos. León: Servicio de Secretariado y Relaciones internacionales de la Universidad de León. Anthony, Laurence. 2019. AntConc (Version 3.5.8) [Computer Software]. Tokyo: Waseda University. Archer, Dawm. 2007. “Computer-Assisted Literary Stylistics: The State of the Field.” In Contemporary Stylistics, edited by Marina Lambrou and Peter Stockwell, 244–57. London: Continuum. Biber, Douglas, Susan Conrad, and Randi Reppen. 1998. Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press. Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad, and Edward Finegan. 1999. Longman Grammar of Spoken and Written English. Harlow: Longman. Brook, George L. 1970. The Language of Dickens. London: Andre Deutsch. Buchholz, Sabine, and Manfred Jahn. 2005. “Space in Narrative.” In The Routledge Encyclopedia of Narrative Theory, edited by David Herman, Manfred Jahn, and Marie-Laure Ryan, 551–54. London: Routledge. A Corpus-Stylistic Approach to Ruiz Zafón’s Series 79 Calle Rosingana, Gonzalo. 2012. “Perspectiva lingüística y cognitiva del estilo de Carlos Ruiz Zafón en La sombra del viento.” PhD diss., Universitat de Vic. Cheng, Winnie. 2012. “Corpus-Based Linguistic Approaches to Critical Discourse Analysis.” In The Encyclopedia of Applied Linguistics, edited by Carol A. Chapelle, 1–8. Oxford: Wiley-Blackwell. 
Conrad, Susan, and Douglas Biber. 2005. “The Frequency and Use of Lexical Bundles in Conversation and Academic Prose.” In The Corpus Approach to Lexicography, Thematischer Teil von Lexicographica. Internationales Jahrbuch für Lexikographie 20, edited by Wolfgang Teubert and Michaela Mahlberg, 56–71. Tübingen: Niemeyer. Fischer-Starcke, Bettina. 2010. Corpus Linguistics in Literary Analysis: Jane Austen and her Contemporaries. London: Continuum. Gordon, Ian. 1966. The Movement of English Prose. London: Longman. Hrabova, Valeria, Larisa Аdonina, Olga Medvedeva, and Olga Shutova. 2020. “Philological Concept of the Novel ‘The Shadow of the Wind’ by Carlos Ruiz Zafón.” Revista Inclusiones: Revista de Humanidades y Ciencias Sociales 7, no. 19: 344–52. Kopaczyk, Joanna. 2013. The Legal Language of Scottish Burghs: Standardization and Lexical Bundles (1380–1560). Oxford: Oxford University Press. Mahlberg, Michaela. 2013. Corpus Stylistics and Dickens’s Fiction. London: Routledge. Mahlberg, Michaela. 2014. “Corpus Stylistics.” In The Routledge Handbook of Stylistics, edited by Michael Burke, 387–92. London: Routledge. Matzat, Wolfgang. 2007. Espacios y Discursos en la Novela Española: Del Realismo a la Actualidad. Frankfurt: Iberoamericana Vervuert. McIntyre, Dan, and Brian Walker. 2019. Corpus Stylistics. Theory and Practice. Edinburgh: Edinburgh University Press. Nieto Caballero, Guadalupe. 2019. “El Espacio como Eje Vertebrador en la Creación del Universo Ficticio Galdosiano: Un Estudio de Corpus.” Signa 28: 1203–38. Nieto Caballero, Guadalupe, and Pablo Ruano San Segundo. 2020. Estilística de Corpus: Nuevos Enfoques en el Análisis de Textos Literarios. Bern: Peter Lang. Ramos Nogueira, Luis Carlos. 2016. “Motivación y Vigencia en Seis Locuciones del Universo de Carlos Ruiz Zafón.” Language Design 18: 45–70. Romero Frías, Marina, and Marta Galiñanes Gallén. 2009. 
“La Importancia del Análisis del Discurso Narrativo en la Traducción: L’ombra del vento de Carlos Ruiz Zafón.” Espéculo: Revista de Estudios Literarios 41. http://webs.ucm.es/info/especulo/numero41/lombrav.html. Accessed 1 April 2022. Ruiz Tosaus, Eduardo. 2009. “Motivos, Símbolos y Obsesiones en la Narrativa de Carlos Ruiz Zafón.” Espéculo: Revista de Estudios Literarios 41. https://webs.ucm.es/info/especulo/numero41/motivzaf.html. Accessed 1 April 2022. Ruiz Zafón, Carlos. 2001. La Sombra del Viento. Barcelona: Planeta. Ruiz Zafón, Carlos. 2008. El Juego del Ángel. Barcelona: Planeta. Ruiz Zafón, Carlos. 2011. El Prisionero del Cielo. Barcelona: Planeta. Ruiz Zafón, Carlos. 2016. El Laberinto de los Espíritus. Barcelona: Planeta. Ryan, Marie-Laure, Kenneth Foote, and Maoz Azaryahu. 2016. Narrating Space/Spatializing Narrative: Where Narrative Theory and Geography Meet. Columbus, OH: Ohio State University Press. Sáiz Ripoll, Anabel. 2004. “Sólo Recordamos lo que Nunca Sucedió: Análisis de la Obra de Carlos Ruiz Zafón.” Cuadernos de literatura infantil y juvenil 177: 7–27. Scott, Mike. 2013. WordSmith Tools Manual. Version 6.0. Liverpool: Lexical Analysis Software. Scott, Mike. 2016. WordSmith Tools Version 7. Stroud: Lexical Analysis Software. Scott, Mike. 2019. “Single Words v. Clusters.” https://lexically.net/downloads/version7/HTML/single_words.html. Accessed 1 April 2022. Stubbs, Michael, and Isabel Barth. 2003. “Using Recurrent Phrases as Text-Type Discriminators. A Quantitative Method and Some Findings.” Functions of Language 10, no. 1: 61–104. Tognini-Bonelli, Elena. 2001. Corpus Linguistics at Work. Amsterdam: John Benjamins. Zoran, Gabriel. 1984. “Towards a Theory of Space in Narrative.” Poetics Today 5, no. 2: 309–35. Zubiaurre, María Teresa. 2000. El Espacio en la Novela Realista. México: Fondo de Cultura Económica.
5 Analyzing Who, What, and Where in a Mediæval Chinese Corpus
A Case Study on the Chinese Buddhist Canon1
Tak-sum Wong and John Sie Yuen Lee

5.1 Introduction

As more literary and historical texts become digitized, researchers increasingly complement traditional, manual analysis with data-driven, quantitative methods. These methods have been applied to study various textual properties, such as literary style (e.g., Holmes 1994), evolution of literary genres (e.g., Moretti 2007), text reuse and intertextuality (e.g., Büchler et al. 2010), authorship (e.g., Hung et al. 2010; Sayoud 2012), and novel structure (e.g., Clement 2008). Automatic methods have also been developed to recognize named entities (e.g., Van Dalen-Oskam et al. 2014; Bornet and Kaplan 2017) and to retrieve location, time, participant, and action from events recorded in a narrative (e.g., Vossen et al. 2008). These methods can in turn facilitate character analysis, including the study of personas (e.g., Bamman et al. 2013) and social networks (e.g., Moretti 2007; Elson et al. 2010; Agarwal et al. 2012). While automatic analysis cannot match the depth of scholarly work, it offers greater breadth by covering larger amounts of text than can ever be processed by individual scholars.

Information extraction from historical text is challenging because of the lack of metadata such as ontologies (Gutierrez et al. 2016), as well as syntactically annotated data. This chapter addresses the challenges in information extraction for the three “wh-questions” – who were the characters, what did they do, and where were they? – in a historical corpus written in a low-resource language. In particular, we investigate whether and how to exploit a limited amount of in-domain training data to improve extraction accuracy and report a case study on the Chinese Buddhist Canon (henceforth, “the Canon”), the largest collection of Medieval Chinese texts.
We apply a lexicon-based approach for named entity recognition (NER) on the Canon and then perform part-of-speech (POS) tagging and dependency parsing to identify significant relations between characters, verbs, and toponyms. NER and relation extraction are essential for information retrieval from textual data (Lizarralde et al. 2019), but they remain underexplored for literary texts, especially those written in low-resource languages: the only previous study on these tasks for the Canon focused on manual annotations in a limited number of texts (Bingenheimer et al. 2009). Existing NLP tools for Chinese tend not to perform well on Medieval Chinese, since they are trained on Modern Standard Chinese.

DOI: 10.4324/9781003298328-6

This study shows that even a small amount of in-domain data for word segmentation, POS, and syntactic structure can improve accuracy in NER and in retrieving significant character-verb associations. Further, it presents the first data-driven profiling of the characters, verbs, and toponyms over the entire Canon and produces the largest Medieval Chinese corpus to date that is automatically annotated with named entities, character-verb, and character-toponym associations.

The rest of the chapter is organized as follows. In the next section, we present the textual material of our corpus and review previous work in information extraction. In Section 5.3, we exploit word segmentation training data to improve named entity annotation. In Sections 5.4 and 5.5, we evaluate the impact of POS and dependency training data on retrieving significant verbs and toponyms, respectively. In Section 5.6, we conclude with a summary of our contributions and suggestions for future work.

5.2 Background

5.2.1 Corpus and Linguistic Resources

The Chinese Buddhist Canon is a collection of Medieval Chinese texts that are deemed canonical for Chinese Buddhism.
In this study, we use the Korean edition of the Canon,2 the Tripiṭaka Koreana 高麗藏 (Lancaster and Park 1979), for which a digital version has been made available (Lancaster 2010). This edition, with over 40 million characters, was derived from the printing blocks stored at the Haein Monastery海印寺 in Korea, the most complete set of currently available blocks. The Canon can be divided into two main subcorpora, the Mahāyāna and the Hīnayāna,3 named after the two main schools of Buddhism. Mahāyāna, the “great vehicle,” is today dominant among the Chinese, Koreans, Japanese, Vietnamese, and Tibetans; Hīnayāna, the “lesser vehicle,” is most widespread in South and Southeast Asian countries. Unlike Modern Chinese, which enjoys a wide range of linguistic resources, Medieval Chinese remains a low-resource language. The only existing large POS-tagged corpus for this language is Huainanzi 淮南子 (Song and Xia 2014), and the only treebank (henceforth, the “L&K Treebank”) consists of only about 50,000 Chinese characters drawn from four sūtras in the Canon (Lee and Kong 2016). For word segmentation and part-of-speech tagging, both resources largely followed the guidelines for the Penn Chinese Treebank (Xue et al. 2005). For dependency relations, the L&K Treebank used the Stanford Dependencies for Modern Chinese (Chang et al. 2009). Although Buddhism scholars have developed a number of in-domain digital lexica, there is not yet any attempt to mark up the myriad of characters and places in the entire Canon. To tackle this task, we will utilize four of the largest lexica: the Person Authority Database (DDBC 2008a), which contains 39,277 personal names; the Place Authority Database (DDBC 2008b), which contains 18,017 geographical names; a Dictionary of Chinese Buddhist Terms (Soothill and Hodous 1937), which has 16,687 entries; and 720 Sanskrit-transliterated terms harvested from Chu (1996, 1998, 1999). 
5.2.2 Information Extraction from Literary Works

Named entity recognition (NER) aims to identify entities such as people, organizations, and locations in unstructured text. Information extraction (IE) then labels their relationships, such as the location of the headquarters of an organization or the organization to which a person belongs (Doddington et al. 2004). In the historical domain, the KYOTO framework defines the historical event model with four slots: location, time, participant, and action (Vossen et al. 2008). Our work follows this framework in extracting associations between characters, verbs, and locations but excludes temporal information.

In supervised approaches, statistical classifiers are trained to identify relation mentions that link two entities. A variety of lexical, syntactic, and semantic features has been explored (e.g., Surdeanu and Ciaramita 2007; Ji and Grishman 2008; Agarwal and Rambow 2010). In the news domain, large corpora, such as those from the NIST Automatic Content Extraction (ACE) program, have served as training datasets. The Namescape project, for example, trained existing named entity recognizers on Dutch literary texts to recognize names in Dutch fiction (Van Dalen-Oskam et al. 2014).

While most approaches previously mentioned used POS tags and regular expressions, a number of studies showed that syntactic features can help improve performance. Bornet and Kaplan (2017) included grammatical structure as part of their rule-based system to recognize proper names in French novels. Zhou et al. (2007) reported that syntactic features can improve the accuracy in information extraction, while Mintz et al. (2009) found that the combination of syntactic and lexical features provides better performance than either feature set on its own.
However, it remains an open question whether these improvements can hold in low-resource domains, for which the paucity of in-domain training data typically precludes high-accuracy automatic parsing. We seek to answer this question with a case study on the Chinese Buddhist Canon, the first attempt at automatic NER and IE in Medieval Chinese text. With the exception of a manually annotated corpus of nexus points – groups of people at particular locations – for the Biography of Eminent Monks 高僧傳 (Bingenheimer et al. 2009), most existing Chinese NER and IE corpora are for Modern Chinese (e.g., Finkel et al. 2005; Shih et al. 2004). Our experimental results, which investigate whether a limited amount of in-domain syntactic annotation can improve NER and IE accuracy, will have implications for future analyses of historical corpora.

5.3 Named Entity Tagging

We first describe our lexicon-based method for marking up personal and geographical names (Section 5.3.1) and then present an evaluation of this method (Section 5.3.2).

5.3.1 Approach

Baseline
Our baseline uses the Stanford Chinese word segmenter tool, a widely used segmenter trained on Modern Chinese (Chang et al. 2008).

In-Domain Segmenter
We trained a Chinese word segmenter on the L&K Treebank with conditional random fields (Lafferty et al. 2001) using the CRF++ implementation. We adopted the features proposed by Zhao et al. (2007) and used the four resources mentioned in Section 5.2.1 as external lexica. Compared to Modern Chinese, fewer words in Medieval Chinese text contain more than two syllables. Therefore, we followed Peng et al. (2004) and Tseng et al. (2005) in adopting a two-tag set for word segmentation.

Following word segmentation with either approach, we performed Forward Maximal Matching (FMM) on the text using the Person Authority Database and Place Authority Database (Section 5.2.1).
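As a minimal sketch of the FMM step, the following Python function scans a segmented sentence from left to right and, at each position, merges the longest run of consecutive tokens whose concatenation is a lexicon entry. Applying FMM over tokens rather than raw characters, the function name `fmm_over_tokens`, the `max_span` limit, and the toy lexicon are all assumptions made for illustration; the chapter does not specify these details.

```python
def fmm_over_tokens(tokens, lexicon, max_span=4):
    """Forward Maximal Matching over a segmented sentence.

    Returns a list of (string, is_name) pairs. At each position the longest
    run of consecutive tokens found in `lexicon` is merged into one name;
    tokens that start no lexicon entry are passed through unchanged.
    `max_span` (an illustrative limit) caps how many tokens may be merged.
    """
    output = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, then shrink.
        for j in range(min(len(tokens), i + max_span), i, -1):
            candidate = "".join(tokens[i:j])
            if candidate in lexicon:
                output.append((candidate, True))
                i = j
                break
        else:
            # No lexicon entry starts here: keep the token as it is.
            output.append((tokens[i], False))
            i += 1
    return output


# Toy example: the segmenter has wrongly split the name 須菩提 (Subhūti).
toy_lexicon = {"須菩提"}
toy_tokens = ["須", "菩提", "白", "佛"]
merged = fmm_over_tokens(toy_tokens, toy_lexicon)
```

In this toy input the over-split name is recovered as a single entity, which is the kind of error that a lexicon lookup after segmentation is meant to repair.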
The FMM step improves recall, since most word segmentation errors involve splitting a word rather than merging two words.

5.3.2 Evaluation

5.3.2.1 Data

We created a test set from the L&K Treebank as follows. For each word that is tagged as a proper noun (NR), we search in the Person or Place Authority Database to decide whether it is a personal name or toponym. In cases where the word is found in neither database, an annotator with a background in Medieval Chinese performed the classification. This test set contains a total of 1,914 characters and 114 toponyms.

Naive use of these lexica would lead to false alarms because they cover not only terms found in the Canon but also a much wider range of people and locations related to Buddhism, many of which share the same form with common nouns and verbs.4 To increase NER precision, we filter out terms of Chinese origins from the lexica:5 since the Canon consists mostly of translations from Indic languages, most named entities are of non-Chinese origins.

5.3.2.2 Results

Table 5.1 shows the NER precision and recall using the FMM approach previously described. For recognizing personal names, the baseline achieved 77% precision and 51% recall. Since the Stanford segmenter is trained on Modern Chinese, its relatively poor performance on Medieval Chinese is not unexpected. Using the CRF model trained on in-domain data, precision improved to 87%, and recall to 69%. Toponym recognition turned out to be a more challenging task. The baseline achieved 58% precision and 19% recall. Our proposed approach performed significantly better, at 82% precision and 48% recall.

Table 5.1 Named Entity Recognition Performance on the Test Set
Word Segmentation Method    Characters Precision    Characters Recall    Toponyms Precision    Toponyms Recall
In-domain segmenter         0.87                    0.69                 0.82                  0.48
Baseline                    0.77                    0.51                 0.58                  0.19
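For readers who want to reproduce figures of the kind reported in Table 5.1, precision and recall can be computed from sets of predicted and gold annotations. The sketch below assumes that each annotation is represented as a hashable tuple such as (start, end, type) and that only exact matches count as correct; this representation and the toy data are illustrative assumptions, not the evaluation script used in the study.

```python
def precision_recall(predicted, gold):
    """Exact-match precision and recall over two collections of annotations."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Guard against empty sets so the function never divides by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented toy annotations: (start offset, end offset, entity type).
toy_predicted = {(0, 3, "PER"), (5, 7, "LOC"), (9, 11, "PER")}
toy_gold = {(0, 3, "PER"), (5, 7, "LOC"), (12, 14, "PER"), (20, 22, "LOC")}
toy_precision, toy_recall = precision_recall(toy_predicted, toy_gold)
```

Two of the three predictions are correct and two of the four gold entities are found, so a system of this kind can have reasonable precision while still missing many names, the pattern observed for toponyms.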
For both personal names and toponyms, recall suffered from limited coverage of the dictionaries, which do not include personal names, such as Hēsàduō 訶薩多 and Bōtóumó 砵頭摩 (K426), or places, such as Jiāluóhuánsì 加羅洹寺 (K1002), a temple in Śrāvastī; Shājiégǔ 沙竭古 (K1002), a country in the north of Central Asia; and Băoshān 寶山 (K426), a mountain. Using this method, we identified 588,871 personal names (871 unique names) and 53,033 toponyms (480 unique names) in the Canon.

5.4 Characters and Verbs

While raw frequency of a character name can give a rough indication of the significance of the character, it is inflated when the name is embedded in expressions that do not imply the character’s presence. For instance, the terms rúláijiětuōmén 如來解脫門 and fóchà 佛剎 contain “Buddha” (rúlái and fó, respectively) but imply no action on the part of Śākyamuni. Instead, we consider the number of times the character appears in a character-verb pair, that is, the number of times a character serves as the subject of a verb.

We now seek to identify the most significant characters in the corpus and the verbs that are associated with them. Following an outline of our approach (Section 5.4.1), we report an evaluation (Section 5.4.2) and then analyze a number of characters with their most frequent verbs (Section 5.4.3).

5.4.1 Approach

Baseline
Our baseline does not have access to POS information. Given that Chinese is an SVO language, it assumes a character name and the following word (except punctuation) form a character-verb pair.

POS-Based Approach
We trained a part-of-speech (POS) tagger with CRF++ (Lafferty et al. 2001) on the POS tags in the L&K Treebank. In addition to the standard unigram and bigram features, we also included a feature for proper nouns, based on the Person and Place Authority Databases and the Sanskrit-transliterated terms from Chu (Section 5.2.1).
For each personal name in the NER-annotated corpus (Section 5.3), we retrieved the verb (VV) that either immediately follows it or is separated from it by an adverb (AD). Take the sentence in Figure 5.1 as an example. The word immediately following Bhiət, “Buddha,” is ngiò, “meet”; we thus included the character-verb pair (Bhiət, ngiò) in our dataset.

Figure 5.1 Example dependency tree to illustrate character-verb pair extraction (K229). Sentence: Njiε̌ zhiə (that time) Bhiət (Buddha) ngiò (encounter) Biunghuàn (stroke), dang (must) sio (need) ngiou (bull) njiǒ (milk). “At that time, Buddha had a stroke and must need (some) milk.”

Dependency-Based Approach
Dependency structures facilitate extraction of character names separated by longer distance from their verbs. We trained a Minimum-Spanning Tree parser (McDonald et al. 2006) on the L&K Treebank to automatically derive dependency structures (Wong and Lee 2016). To collect character-verb pairs, we retrieved all “nominal subject” (nsubj) relations in the treebank: the child word, if tagged as a personal name, is the character; the parent word is the verb. The nsubj relation in Figure 5.1, for example, indicates that Bhiət, “Buddha,” is the subject of the verb ngiò, “meet”; we thus included the character-verb pair (Bhiət, ngiò) in our dataset.

Dependency structures are especially useful in serial verb constructions. The verbs in these constructions, which are common in Medieval Chinese, are linked with the “conjunct” (conj) relation, as exemplified by the verbs ngiò, “meet,” and sio, “need,” in Figure 5.1. Although Bhiət, “Buddha,” is the subject for both verbs, the second verb (sio) is not directly linked to it. We attributed the subject to all verbs in a serial verb construction, hence, in this case, also recognizing “Bhiət, sio” as a character-verb pair.

5.4.2 Evaluation

5.4.2.1 Data

Our test set consists of 896 character-verb pairs in the L&K Treebank.
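The POS-based and dependency-based extraction rules of Section 5.4.1 can be sketched in a few lines of Python. The data structures are assumptions made for illustration: a tagged sentence is a list of (word, tag) pairs, and a parsed sentence is a list of (word, head index, relation) triples with 0-based head indices and -1 for the root. The toy parse loosely follows the sentence of Figure 5.1; apart from the nsubj and conj relations described in the text, its heads and labels are hypothetical.

```python
def pos_based_pairs(tagged, persons):
    """Pair each personal name with a verb (VV) that immediately follows it,
    optionally skipping one intervening adverb (AD)."""
    pairs = []
    for i, (word, _tag) in enumerate(tagged):
        if word not in persons:
            continue
        j = i + 1
        if j < len(tagged) and tagged[j][1] == "AD":
            j += 1  # allow a single adverb between name and verb
        if j < len(tagged) and tagged[j][1] == "VV":
            pairs.append((word, tagged[j][0]))
    return pairs


def dependency_pairs(parsed, persons):
    """Pair each personal name that is an nsubj child with its parent verb,
    and also with every verb attached to that parent by a conj relation."""
    pairs = []
    for word, head, relation in parsed:
        if relation != "nsubj" or word not in persons:
            continue
        pairs.append((word, parsed[head][0]))
        # Propagate the subject to the other verbs of a serial verb construction.
        for other_word, other_head, other_relation in parsed:
            if other_relation == "conj" and other_head == head:
                pairs.append((word, other_word))
    return pairs


persons = {"Bhiət"}
toy_tagged = [("Bhiət", "NR"), ("ngiò", "VV"), ("Biunghuàn", "NN"),
              ("dang", "VV"), ("sio", "VV"), ("njiǒ", "NN")]
toy_parsed = [("Bhiət", 1, "nsubj"), ("ngiò", -1, "root"),
              ("Biunghuàn", 1, "dobj"), ("dang", 4, "mmod"),
              ("sio", 1, "conj"), ("njiǒ", 4, "dobj")]

pos_pairs = pos_based_pairs(toy_tagged, persons)
dep_pairs = dependency_pairs(toy_parsed, persons)
```

On this toy sentence the POS-based rule finds only (Bhiət, ngiò), while the dependency-based rule also recovers (Bhiət, sio) through the conj link, which mirrors the recall gain attributed to serial verb constructions.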
These pairs were retrieved from the child and parent words of the “nominal subject” (nsubj) relations, where the child word is annotated as a personal name (Section 5.3.2.1).

5.4.2.2 Results

To estimate the quality of the automatically retrieved character-verb pairs, we evaluated the accuracy of the nsubj relations with tenfold cross-validation on the test set. The precision and recall of the baseline approach are 0.46 and 0.64, respectively (Table 5.2). Despite the small amount of training data, the precision and recall of the POS-based approach reached 0.77 and 0.71. The use of dependency information further improved the precision to 0.91, and the recall to 0.93. Recognition of character-verb pairs in serial verb constructions contributed to the gain in recall. The remaining recall errors were due to the parser’s mislabeling of nominal subject relations as noun modifier (nn) or adverbial modifier (advmod). Most precision errors were caused by mistaking a vocative as a nominal subject.

Table 5.2 Precision and Recall in Subject-Verb Pair Extraction from L&K Treebank
Method              Precision    Recall
Baseline            0.46         0.64
POS-based           0.77         0.71
Dependency-based    0.91         0.93

5.4.3 Analysis

The procedure previously described harvested 36,666 unique character-verb pairs. We analyze the most significant characters (Section 5.4.3.1) and the most frequent verbs associated with each character (Section 5.4.3.2).

5.4.3.1 Character Distribution

Table 5.3 lists the ten most frequent characters.6 The founder of Buddhism, Śākyamuni Buddha, is the dominant character, serving as the nominal subject 81% of the time.7 Four of his prominent disciples – Subhūti, Ānanda, Śāriputra, and Maudgalyāyana – trail him, at much lower frequencies. Ranked fourth is a bodhisattva, Mañjuśrī. Bodhisattvas are “enlightened beings” who, out of compassion, delay their entry into nirvāṇa in order to aid others.
In contrast, the Pratyeka-buddha aims to attain nirvāṇa for himself rather than helping others. The three remaining characters among the top ten are Mahāyānadeva (602−664), a prolific translator of Buddhist scriptures into Chinese; Sudhana (also known as the Child of Wealth), an acolyte of Bodhisattva Avalokiteśvara; and Devadatta, a cousin and rival of Śākyamuni, who cultivated magical powers.

Table 5.3 Most Frequent Characters (As Nominal Subjects) in the Corpus
Character                        Frequency    Character                   Frequency
Śākyamuni Buddha 釋迦牟尼佛      81.4%        Pratyeka-buddha 辟支佛      1.0%
Ānanda 阿難                      2.8%         Maudgalyāyana 目犍連        0.6%
Subhūti 須菩提                   1.9%         Mahāyānadeva 玄奘           0.4%
Mañjuśrī 文殊菩薩                1.5%         Devadatta 提婆達多          0.3%
Śāriputra 舍利弗                 1.4%         Sudhana 善財童子            0.3%

To compare our method with scholarly work, we consulted the glossary items in the book Buddhism: A Modern Perspective (Prebish 2000). There are 50 characters among these glossary items that are listed in the Person Authority Database (DDBC 2008a). Our method retrieved 48% of these characters among our top 50 characters. Among those ranked below 50 were Yaśodharā 耶輸陀羅, the wife of Śākyamuni, and Viśākhā 毘舍佉, a prominent follower. Other omissions consisted mainly of monks, writers, and kings who played important roles in the history of Buddhism but did not appear in the Buddhist Canon.

The high-ranking characters in our list that are not mentioned in Prebish (2000) were mostly deities, such as Śakra 帝釋天, and translators, such as Amoghavajra 不空 (705−774, ranked 15th). Two other notable characters excluded from the glossary were Aniruddha 阿那律 (ranked 12th), one of the ten principal disciples of Buddha, and Channa 闡那 (ranked 25th), the head charioteer of Prince Siddhārtha.

As discussed in Section 5.2.1, the Canon has two main subcorpora, containing material from two different religious traditions, known as Mahāyāna and Hīnayāna. Not all characters are equally represented in both subcorpora.
The log-likelihood metric can identify words that can best discriminate between the two subcorpora (Rayson 2008): words with high log-likelihood scores tend to be emphasized by one subcorpus but rarely mentioned in the other. We computed this metric on all character names.

The character who yielded the highest log-likelihood score was Subhūti, one of Buddha’s disciples. In the Mahāyāna section, Subhūti is second only to Buddha in terms of frequency. Known for his knowledge of “emptiness,” one of the most important Mahāyāna doctrines, Subhūti played a significant role in the exposition of the Mahāyāna scriptures, especially the prajñā-pāramitā sūtras 般若經; his role in the Hīnayāna is much less important (Table 5.4).

The second highest-scoring character was Mañjuśrī, one of the best-known bodhisattvas. The concept of bodhisattva was a major doctrinal difference between the two subcorpora. As summarized by I Tsing 義淨 (635−713), a seventh-century Chinese monk, “those who venerate the bodhisattvas and read the Mahāyāna sūtras are called the Mahāyānists, while those who do not perform these are called the Hīnayānists.” In the Mahāyāna section, Mañjuśrī features prominently, ranking third, behind only Buddha and Subhūti. Reflecting the doctrinal differences, however, he receives less prominence (0.2%) in the Hīnayāna section (Table 5.4).

Table 5.4 Most Frequent Characters in the Two Subcorpora of the Canon
Mahāyāna Character    Frequency    Hīnayāna Character    Frequency
Śākyamuni Buddha      80.9%        Śākyamuni Buddha      83.8%
Subhūti               3.6%         Ānanda                4.4%
Mañjuśrī              2.3%         Śāriputra             1.2%
Ānanda                1.9%         Maudgalyāyana         1.0%
Śāriputra             1.7%         Mahāyānadeva          0.5%

5.4.3.2 Verb Distribution

Having identified the main characters, we now analyze what they did, based on the most common verbs that portray the characters in action. We first examine the verb profile of the protagonist against those of other characters.
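The log-likelihood comparison of the two subcorpora used in Section 5.4.3.1 can be illustrated with the standard two-corpus formulation: for a name observed a times in a corpus of size N1 and b times in a corpus of size N2, the expected counts are E1 = N1(a + b)/(N1 + N2) and E2 = N2(a + b)/(N1 + N2), and the score is 2(a ln(a/E1) + b ln(b/E2)). The sketch below implements this formula; the function name and the counts in the example are invented for illustration and are not taken from the Canon.

```python
import math


def log_likelihood(a, b, n1, n2):
    """Two-corpus log-likelihood score for one item.

    a, b   -- observed frequencies of the item in corpus 1 and corpus 2
    n1, n2 -- total sizes of corpus 1 and corpus 2
    """
    expected1 = n1 * (a + b) / (n1 + n2)
    expected2 = n2 * (a + b) / (n1 + n2)
    score = 0.0
    for observed, expected in ((a, expected1), (b, expected2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


# Invented counts: a name that is frequent in one subcorpus and rare in the other
# scores high, while a name with the same relative frequency in both scores zero.
skewed = log_likelihood(360, 5, 10_000, 5_000)
balanced = log_likelihood(200, 100, 10_000, 5_000)
```

A name is then ranked by its score, and the direction of the difference is read off from which subcorpus has the higher relative frequency.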
We then show how the profiles of three of the top ten characters suggest their contrasting roles in the corpus.

Figure 5.2 lists the ten most frequent verbs in our corpus. The saying verbs dominate this list, occupying the top three positions – yán, “to speak”; shuō, “to talk”; and gào, “to tell.” Their prominence reflects the Canon’s status as the remembered word of the Buddha. When Buddha “talked” with an object, he frequently talked about “sūtra” (jīng 經), the “dharma” (fǎ 法), and “verse” (jì 偈), the form in which much of the Canon was written to facilitate memorization and chanting, reinforcing the authority of the texts by emphasizing that Śākyamuni himself “spoke” them. He not only gave sermons but also engaged in dialogues with many characters. The other two verbs, yán and gào, often serve as quotative verbs to introduce direct speech.

As shown in Table 5.5, the saying verbs dominate all characters’ verb profiles – yán, shuō, and gào for Buddha, and yán, bái, shuō for the other characters. Three verbs co-occur significantly more with other characters than with Buddha. The first one is bái, ranked second for other characters but conspicuously absent in the list for Buddha (Table 5.5). The uneven distribution is due to its honorific usage: bái is a quotative verb for reporting speech from an inferior to a superior (Kieschnick 2015). The second verb is fèng, “to receive (order)” (ranked fifth), which acknowledges those who served the imperial court to translate the Buddhist texts into Chinese from their sources in Indic languages; Buddha himself, however, never engaged in this activity. Another contrastive verb is wèn, “to ask” (ranked fourth), which suggests that more questions flowed from other characters to Buddha, rather than the other way around. Conversely, two verbs, both dealing with locations, are more connected with Buddha than with the other characters.
The frequent appearances of zài, “dwell in,” and zhù, “stay in” (Table 5.5, left column), result from the meticulous recording of the venues where Buddha delivered his sermons, including the formulaic sentence “Once upon a time, Buddha dwelled in (zài) so-and-so” that appears in the preamble in a majority of the sūtras.

Figure 5.2 Most frequent verbs with nominal subjects (bar chart; verbs shown: 言 yán ‘to speak’, 說 shuō ‘to talk’, 告 gào ‘to tell’, 爲 wéi ‘to do’, 在 zài ‘to dwell’, 白 bái ‘to address’, 無 wú ‘not exist’, 有 yǒu ‘to have’, 住 zhù ‘to stay’, and 知 zhī ‘to know’; frequencies range from 10.1% to 1.2%).

Table 5.5 Most Frequent Verbs with Buddha (Left) and Other Characters (Right) as Subject
Buddha: Verb                Freq.     Other Characters: Verb            Freq.
言 yán “to say”             10.0%     言 yán “to speak”                 10.6%
說 shuō “to speak”          10.0%     白 bái “to address”               9.0%
告 gào “to tell”            8.7%      說 shuō “to talk”                 3.7%
爲 wéi “to do”              2.2%      問 wèn “to ask”                   2.9%
在 zài “to dwell in”        2.1%      奉 fèng “to receive (order)”      2.5%
無 wú “not have”            1.9%      無 wú “not have”                  1.6%
知 zhī “to know”            1.6%      爲 wéi “to do”                    1.6%
住 zhù “to stay in”         1.5%      作 zuò “to make”                  1.5%
有 yǒu “to have”            1.2%      答 dá “to reply”                  1.5%
得 dé “to attain”           1.2%      行 xíng “to practice”             1.4%

Table 5.6 Most Frequent Verbs of Three Different Characters
Pratyeka-Buddha: Verb    Freq.    Mahāyānadeva: Verb         Freq.     Ānanda: Verb            Freq.
無 wú “not have”         4.7%     奉 fèng “to receive”       51.0%     白 bái “to address”     14.0%
得 dé “to attain”        3.3%     譯 yì “to translate”       0.3%      言 yán “to speak”       6.1%
有 yǒu “to have”         3.2%     論 lùn “to discuss”        0.2%      問 wèn “to ask”         3.6%
知 zhī “to know”         2.9%     弘 hóng “to spread”        0.2%      聞 wén “to hear”        3.3%
作 zuò “to make”         2.8%     言 yán “to speak”          0.2%      知 zhī “to know”        2.8%

We further contrast the verb profiles of three of the top ten characters: Ānanda, Mahāyānadeva, and Pratyeka-buddha (Table 5.6). Ānanda was Buddha’s personal attendant and closest disciple.
His most frequent verb is bái, “to address,” an honorific quotative verb for reporting his conversations with Buddha. While bái also co-occurs often with other disciples, the verb wén, “to hear,” distinguishes Ānanda from them. By tradition, he was the one who heard all of Buddha’s sūtras and later recited them to be canonized. As a result, Ānanda is also called the one “who heard much,” a name corroborated by the high frequency of wén.

For Mahāyānadeva, the most common verbs are yì, “translate,” and fèng, “to receive (order).” Unlike Ānanda and most other characters, he had hardly any verbal interactions. Indeed, Mahāyānadeva was an eminent translator of Buddhist scriptures. It is often said that he “received order” from the Chinese emperor and “translated” the scriptures for the Chinese to read.

The Pratyeka-buddha, “lone buddha,” is one who seeks enlightenment for himself and does not bring others to it; he also has a distinctive verb profile. The Mahāyāna subcorpus (Section 5.2.1) tends to cast him in an unfavorable light. His second most frequent verb, dé, “to attain,” mostly collocates with fǎ, “dharma,” to form the phrase “attain (some) dharma.” A quarter of the occurrences of his fourth most frequent verb, zhī, “to know,” are negated with bù to yield “not able to know.”

5.5 Characters and Locations

We now move to the analysis of “where” to examine prominent toponyms that indicate the scene of action for various characters. Raw frequencies again do not suffice for this task, since not every mention of a toponym serves this purpose. For example, “Ganges River” is disproportionately frequent, even though it is usually not tied to any character. Rather, it appears most of the time in the phrase “the sands of the Ganges River,” a stock expression to convey innumerability.8 We instead construct a set of character-toponym pairs by identifying instances of toponyms that indicate the characters’ whereabouts.
After outlining our algorithm (Section 5.5.1), we report an evaluation (Section 5.5.2) and present an analysis of the most frequent toponyms (Section 5.5.3).

5.5.1 Approach

Baseline
We take all personal and geographical names that occur within the same sentence as character-toponym pairs. This baseline requires no POS tagging or dependency parsing.

POS-Based Approach
We retrieved the verbs and prepositions that immediately precede the toponyms in our corpus. These verbs and prepositions are rather limited in variety: the two most common verbs, zài, “to dwell in,” and zhù, “to stay in,” account for almost a third of the instances, while the most frequent preposition, yú, “at,” constitutes more than half of the instances. We considered all verbs and prepositions whose frequency exceeds 0.1% as location markers. We extracted a personal name and a toponym as a character-toponym pair when one of these markers occurs between them.

Dependency Approach
Toponyms typically appear in one of two kinds of dependency structures. A toponym may serve as the direct object of a verb; for example, in Figure 5.3, Rājagṛha is the direct object of the locative verb dzhə̌i, “to dwell in.” It may also serve as the object of a preposition; for example, in Figure 5.4, Rājagṛha is the object of the preposition dzhiong, “from.” The location can be a simple proper noun or one that modifies a localizer (e.g., “the north of Gṛdhrakūṭa”).9 We retrieved all verbs and prepositions in our corpus that take a toponym as the direct object or prepositional object (Figures 5.5, 5.6). Similar to the POS-based approach, we considered all verbs and prepositions whose frequency exceeds 0.1% as location markers.10

Figure 5.3 Dependency tree with a character-toponym pair involving a verb. Example sentence: “For this reason, Buddha usually dwelled at Rājagṛha.”
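The baseline and POS-based approaches of Section 5.5.1 can be sketched as follows. The sketch assumes that a sentence is a list of word tokens, that personal names and toponyms are looked up in two sets, and that a marker counts as occurring "between" a name and a toponym only when the name comes first; these choices, the function names, and the toy sentences (using 王舍城 for Rājagṛha) are illustrative assumptions rather than the procedure used in the study.

```python
from collections import Counter


def location_markers(preceding_words, threshold=0.001):
    """Select words whose share among all toponym-preceding words exceeds
    the threshold (0.1% in the chapter)."""
    counts = Counter(preceding_words)
    total = sum(counts.values())
    return {word for word, count in counts.items() if count / total > threshold}


def baseline_pairs(sentence, persons, places):
    """Pair every personal name with every toponym in the same sentence."""
    names = [word for word in sentence if word in persons]
    toponyms = [word for word in sentence if word in places]
    return [(name, toponym) for name in names for toponym in toponyms]


def marker_pairs(sentence, persons, places, markers):
    """Pair a name with a later toponym only if a location marker lies between them."""
    pairs = []
    for i, word in enumerate(sentence):
        if word not in persons:
            continue
        for j in range(i + 1, len(sentence)):
            between = sentence[i + 1:j]
            if sentence[j] in places and any(w in markers for w in between):
                pairs.append((word, sentence[j]))
    return pairs


# Invented counts of words that immediately precede a toponym.
toy_preceding = ["在"] * 6000 + ["於"] * 3999 + ["說"] * 1
toy_markers = location_markers(toy_preceding)

persons, places = {"佛"}, {"王舍城"}
dwelling = ["佛", "在", "王舍城"]   # "Buddha dwelled in Rājagṛha"
speaking = ["佛", "說", "王舍城"]   # hypothetical sentence without a location marker
```

The baseline accepts both toy sentences, whereas the marker-based rule accepts only the first. This illustrates why the stricter approaches can gain precision at the cost of recall whenever the relevant verb is too infrequent to qualify as a marker.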
Figure 5.4 Dependency tree with a character-toponym pair involving a preposition. (Example sentence: “At that time, Bhagavat departed from Rājagṛha.”)

Figure 5.5 Most frequent verbs that take toponyms as direct objects. (Bar chart of relative frequencies, ranging from 1.18% to 30.60%; the verbs shown include zài 在 ‘dwell in’, zhù 住 ‘stay in’, zhì 至 ‘arrive at’, rù 入 ‘enter’, yì 詣 ‘go to’, dào 到 ‘reach’, huán 還 ‘return’, yóu 遊 ‘travel to’, and xiàng 向 ‘face’.)

Figure 5.6 Most frequent prepositions that take toponyms as prepositional objects. (Bar chart of relative frequencies, ranging from 1.80% to 54.50%; the prepositions shown are yú 於 ‘at’, zài 在 ‘at’, zhì 至 ‘to’, cóng 從 ‘from’, and yú 于 ‘at’.)

Then, we examined the character-verb pairs collected in Section 5.4 and identified those that fall into one of the following two cases. First, if the verb is a location marker and if its direct object is a location, then the character and the location are considered a character-toponym pair. For example, Figure 5.3 contributes the pair “Bhiət, Hiuangshiàzhiɛng” (“Buddha, Rājagṛha”). Second, if the verb is modified by a preposition that is a location marker and if its prepositional object is a location, then the character and the location are also included as a character-toponym pair. From Figure 5.4, for example, we obtained the pair “Shiɛìtzuən, Hiuangshiàzhiɛng” (“Bhagavat, Rājagṛha”).

5.5.2 Evaluation

5.5.2.1 Data

Since the L&K Treebank contains too few toponyms for a reliable evaluation, we expanded it with Pi nai yeh (Vinaya) (K936), an important text on vinaya. An annotator with a background in Medieval Chinese studies first identified all toponyms in K936 and then examined whether each indicates the location of a person.
Of the 663 instances of toponyms, 201 were included in a character-toponym pair.

5.5.2.2 Results

The precision and recall on the test set are shown in Table 5.7. The baseline gave a strong performance, at 0.99 precision and 0.73 recall. The use of toponyms in the test set was rather regular: they almost always indicate the location of a character if one is present in the same sentence. The main source of error for recall is the distance between the character and the toponym. About 15% of the gold character-toponym pairs had personal names and toponyms in different sentences.11 We restricted the retrieval of character-toponym pairs to the same sentence to emphasize precision, which is important for our analysis.

Table 5.7 Precision and Recall in Character-Toponym Pair Extraction in the Test Set

Information Extraction Method    Precision    Recall
Baseline                         0.99         0.73
POS-based                        1.00         0.66
Dependency-based                 1.00         0.66

Unlike the case of character-verb pair extraction, POS and dependencies did not improve the overall performance. Both the POS-based and dependency-based approaches attained 100% precision. Their more stringent requirements on lexical choice and linguistic structure, however, led to a degradation in recall to 0.66. The most common errors were due to verbs that were not considered a location marker due to relative infrequency.12

5.5.3 Analysis

The previous procedure yielded 1,113 unique character-toponym pairs. Figure 5.7 lists the most frequently mentioned locations that are associated with a character. The top entries are all places where Buddha was active, involving Śrāvastī, Rājagṛha, or a more specific location in their environs. The second-ranked location, Śrāvastī, was the city where Buddha spent much of his monastic life. This city was in turn located in Kosala (ranked ninth).
During the nine months of favorable weather in northeast India, Buddha and his disciples wandered from place to place to teach. During the monsoon season, they retreated to a monastery, where Buddha taught and gave discourses. They spent most of these seasons in Śrāvastī, at a monastery at Jetavana (ranked first). Rājagṛha (ranked third), another major location of Buddha’s preaching, and two other places at or near this city are also frequently named. One was the Bamboo Grove 大林精舍, the first Buddhist monastery, where Buddha often stayed during the winter. The other was Vulture’s Peak 靈鷲山, where Buddha delivered the celebrated prajñā-pāramitā sutras, among many others. Of the five remaining places, two played significant roles at the beginning and end of Buddha’s ministry: Vaiśālī, where he preached his last sermon before his death and announced his “great nirvāṇa,” and Varanasī, where Buddha gave his first sermon in Sarnath 鹿野苑. Buddha also established several early monastic precincts in Kauśāmbī. The remaining one, in contrast, is not connected with the historical Śākyamuni. The disciples are often said to be at “Buddha’s place” 佛所 (ranked eighth) and rarely independently reported to be at a geographical location.

Figure 5.7 The ten toponyms most frequently mentioned with a character. (Bar chart of relative frequencies, ranging from 1.1% to 31.7%; the ten toponyms are Jetavana, Śrāvastī, Rājagṛha, Bamboo Grove, Vulture’s Peak, Vaiśālī, Varanasī, Kauśāmbī, Buddha’s place, and Kosala.)

For a comparison with scholarly work, we again turned to the glossary items in Prebish (2000). The glossary items include 16 toponyms that are listed in the Place Authority Database (DDBC 2008b). Our top 16 list covered 9 out of these 16 toponyms (56%). Most of the omitted toponyms refer to locations that were not mentioned in the Canon but played significant roles in Buddhist history, for example, Bodh-gayā, Koliya and Licchavi.
A number of high-ranking locations in our list, such as Śrāvastī and Vulture’s Peak, are not items in their own right but referenced in other items. A few other locations, such as Varanasī and Trāyastriṃśa 忉利天, occur frequently in our dataset but were not deemed significant enough to be a glossary item in Prebish (2000).

Different characters vary in terms of their range of locations, just as they do with respect to verb profiles. Buddha’s most frequent toponyms – topped by Jetavana, Śrāvastī, and Rājagṛha (leftmost column, Table 5.8) – largely overlap with those in Figure 5.7, given his dominance in the Canon. Toponyms associated with “Bhagavat,” one of Buddha’s other epithets, are also similar (third column, Table 5.8). The epithet “Tathāgata,” however, is distinctive in being less often associated with places of Śākyamuni’s ministry, such as Śrāvastī and Rājagṛha (second column, Table 5.8). Under this epithet, he is often said to be at the place of Dīpaṃkara Buddha, “Lamp-bearer Buddha” 然燈佛所 (ranked third), one of the so-called celestial buddhas who reached enlightenment eons before Śākyamuni. Finally, the toponyms of other characters in the Canon also differ significantly. They are more frequently said to be at “Buddha’s place” and “Bhagavat’s place” (rightmost column, Table 5.8) rather than any specific geographical location.

Table 5.8 Most Frequent Places Associated with the Three Major Epithets of Buddha and with Other Characters

Buddha Tathāgata Location Freq. Jetavana 37.8% Varanasī 18.4% Śrāvastī Śrāvastī 30.4% Kuśinagara 10.0% Jetavana Rājagṛha Bamboo Grove Vaiśālī Location Bhagavat Freq. Location All Other Characters Freq. Location Freq. 8.6% Dīpaṃkara Buddha’s place 4.2% Śrāvastī 9.5% Rājagṛha 20.1% Buddha’s place 14.1% Bhagavat’s place 11.5% Śrāvastī 19.4% 5.8% Varanasī 7.9% Varanasī 5.4% 3.3% Jetavana 4.2% Vaiśālī 7.3% Jetavana at Śrāvastī 3.9% 9.5% 6.7%
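The per-character percentages discussed in Section 5.5.3 and the glossary coverage figure (9 of 16 toponyms) rest on simple counting. The following minimal Python sketch shows one way such profiles could be derived from a list of extracted character-toponym pairs. The function names and the toy pairs are hypothetical, and the sketch is not the authors' code.

```python
from collections import Counter, defaultdict


def toponym_profiles(pairs):
    """Group character-toponym pairs into one toponym Counter per character."""
    profiles = defaultdict(Counter)
    for character, toponym in pairs:
        profiles[character][toponym] += 1
    return profiles


def top_shares(counter, n=5):
    """Return the n most frequent toponyms with their share of all mentions."""
    total = sum(counter.values())
    return [(place, count / total) for place, count in counter.most_common(n)]


def coverage(ranked, reference):
    """Share of reference toponyms that also appear in a ranked list."""
    return len(set(ranked) & set(reference)) / len(reference)


# Toy pairs (hypothetical counts, for illustration only).
TOY_PAIRS = (
    [("Buddha", "Jetavana")] * 3
    + [("Buddha", "Śrāvastī")]
    + [("Ānanda", "Buddha's place")] * 2
)
PROFILES = toponym_profiles(TOY_PAIRS)
BUDDHA_TOP = top_shares(PROFILES["Buddha"])
```

With real data, the same `coverage` call applied to the top 16 extracted toponyms and the 16 glossary toponyms would reproduce the ratio reported above, 9/16, or about 56%.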
5.6 Conclusions

Information extraction from historical text is challenging because of the lack of training data. This chapter investigated whether and how to exploit a limited amount of in-domain training data to improve extraction accuracy and presented a case study on data-driven profiling of the characters and toponyms in the Chinese Buddhist Canon, the largest collection of Medieval Chinese texts. We applied lexicon-based NER on the Canon and then extracted significant character-verb and character-toponym associations.

The contribution of this chapter is threefold. First, we have created the largest Medieval Chinese corpus to date that is automatically annotated with named entities, as well as associations between characters, verbs, and toponyms.13 Second, we have shown that even a relatively small amount of in-domain linguistic annotation is useful for this kind of analysis. In our case, a word segmenter, POS tagger, and dependency parser trained on 50,000 Chinese characters were able to improve accuracy in NER and the extraction of character-verb pairs. These results suggest that future analyses on historical corpora in low-resource languages may also benefit from annotation on a similar scale. Third, we illustrated the utility of these annotations with a quantitative analysis of “who,” “what,” and “where”: who the characters were and how they reflect doctrinal differences within the Canon; what they did, as gleaned from their verb profiles; where they were and how characters vary in their association with locations.

This research can be extended in a number of directions. Using manual annotation resulting from this study, NER extraction accuracy can potentially be further improved by bootstrap learning (e.g., Wang et al. 2018) and distant supervision (Mintz et al. 2009).
Likewise, word segmentation, POS tagging, and dependency annotation performance may also benefit from domain adaptation from modern Chinese resources (Song and Xia 2014). It would also be interesting to expand the scope from “who,” “what,” and “where” to a full “sketch” (Kilgarriff et al. 2014) of characters and places, and from the Chinese Buddhist Canon to other seminal works of literature whose size precludes manual analysis.

Notes

1 This book chapter is an extension of a conference paper presented at the 24th Australasian Document Computing Symposium (Wong and Lee 2019).
2 We used the Chinese Canon since many Buddhist texts, especially those from the Mahāyāna tradition, survive only in their Medieval Chinese translation.
3 The Mahāyāna section runs from K1 to K646, the Hīnayāna from K647 to K978.
4 For example, 不可思議 biətkɑ̌siə’ngyɛ̀ is listed in the dictionary as the name of a monk in the T‘ang period; it also happens to be a common adjective meaning “inconceivable.” Consider, for example, the sentence 讚歎阿彌陀佛不可思議功德。(K192) Tz ǹ t ǹ Qɑmiɛdhɑbhiət biətkɑ̌siə’ngyɛ̀ gungdək, “Praise the unimaginable merit of Amitabha Buddha.”
5 We did so by examining whether the lexicon provides the “original language” for the term; for example, the entry for Shìjiāmóunífó 釋迦牟尼佛 includes the Sanskrit original Śākyamuni, while those for Kŏngqiū 孔丘, “Confucius,” and the monk Biətkɑ̌siə’ngyε̌ 不可思義 do not.
6 Alternative names for a character are considered the same character. In the Personal Authority Database, a character can be referred to by different names or epithets. Buddha, for example, can be called fó, “Buddha”; rúlái, “Tathāgata”; and shìzūn, “Bhagavat.” We combine these statistics into the same character.
7 When in plural, the term “Buddha” refers not to Śākyamuni but to any Buddha among the myriads that populate the cosmos or belong to the previous lineage of teachers that led to Śākyamuni.
Hence, we excluded all names with the plural marker 諸, zhū (e.g., 諸佛, zhūfó, “buddhas”).
8 For example, Subhūti is not depicted as being at the Ganges River despite the co-occurrence of the two words in the sentence “須菩提,如恒河中所有沙數。” “Subhūti! As numerous as the sands of the Ganges River.”
9 In addition, we included the localizer involving the word 所, suŏ, “place,” which is common in our corpus. The word suǒ is typically modified by a personal name, for example, 佛所, fó suŏ, “place of Buddha.”
10 A number of verbs, though frequently associated with geographical names, do not indicate the location of their object; for example, míng, 名, “call,” is used for presenting place-names. Further, a number of prepositions, though frequently associated with geographical names, do not indicate the location of a person; for example, rú, 如, “such as,” is used in citing examples. We removed these verbs and prepositions from our list.
11 For example, the character Ānanda and the toponym Jetavana appear in different sentences in this passage: (K936) “At that time, (after) Saint Ānanda has preached with Kiæpbiət, he left from seat. (He) gradually begged (for food) and arrived Jetavana at Śrāvastī.”
12 E.g., the directional verb shàng, 上, “ascend,” in the sentence Njiuləi zhiǎng S mzhips mten’iɛ̀m, 如來上三十三天焰.
13 This corpus, and other relevant resources, are publicly accessible at http://mega.lt.cityu.edu.hk/~tswong/tkt/.

References

Agarwal, Apoorv, Augusto Corvalan, Jacob Jensen, and Owen Rambow. 2012. “Social Network Analysis of Alice in Wonderland.” In Proceedings of the Workshop on Computational Linguistics for Literature, edited by David Elson, Anna Kazantseva, Rada Mihalcea, and Stan Szpakowicz, 88–96. Stroudsburg, PA: Association for Computational Linguistics. https://aclanthology.org/W12-2513/.
Agarwal, Apoorv, and Owen Rambow. 2010.
“Automatic Detection and Classification of Social Events.” In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP’10), edited by Hang Li and Lluís Màrquez, 1024–34. Stroudsburg, PA: Association for Computational Linguistics. https://aclanthology.org/D10-1100/.
Bamman, David, Brendan O’Connor, and Noah A. Smith. 2013. “Learning Latent Personas of Film Characters.” In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL), edited by Hinrich Schuetze, Pascale Fung, and Massimo Poesio, 352–61. Stroudsburg, PA: Association for Computational Linguistics. https://aclanthology.org/P13-1035/.
Bingenheimer, Marcus, Jen-Jou Hung, and Simon Wiles. 2009. “Markup Meets GIS – Visualizing the ‘Biographies of Eminent Buddhist Monks’.” In Proceedings Information Visualization IV 2009, edited by Ebad Banissi et al., 550–4. Danvers, MA: The Institute of Electrical and Electronics Engineers, Inc. www.computer.org/csdl/proceedings-article/iv/2009/3733a550/12OmNvnOwrT.
Bornet, Cyril, and Frédéric Kaplan. 2017. “A Simple Set of Rules for Characters and Place Recognition in French Novels.” Frontiers in Digital Humanities 4, no. 6: 21. www.frontiersin.org/articles/10.3389/fdigh.2017.00006/full.
Büchler, Marco, Annette Geßner, Thomas Eckart, and Gerhard Heyer. 2010. “Unsupervised Detection and Visualisation of Textual Reuse on Ancient Greek Texts.” Journal of the Chicago Colloquium on Digital Humanities and Computer Science 1, no. 2: 1–17.
Chang, Pi-Chuan, Michel Galley, and Christopher D. Manning. 2008. “Optimizing Chinese Word Segmentation for Machine Translation Performance.” In Proceedings of the Third Workshop on Statistical Machine Translation, edited by Chris Callison-Burch and Philipp Koehn, 224–32. Stroudsburg, PA: Association for Computational Linguistics. https://dl.acm.org/doi/10.5555/1626394.1626430.
Chang, Pi-Chuan, Huihsin Tseng, Dan Jurafsky, and Christopher D. Manning. 2009.
“Discriminative Reordering with Chinese Grammatical Relations Features.” In Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation (SSST-3), edited by Dekai Wu and David Chiang, 51–59. Stroudsburg, PA: Association for Computational Linguistics.
Chu Chia-ning 竺家寧. 1996. Vocabulary of Buddhist Sutras of the West Chin Dynasty 《早期佛經詞彙研究:西晉佛經詞彙研究》 (Technical Report). Taipei: National Science Council.
Chu Chia-ning 竺家寧. 1998. The Lexicology of Buddhist Sutra in Ancient China (II) – Three Kingdoms (Technical Report). Taipei: National Science Council.
Chu Chia-ning 竺家寧. 1999. The Lexicology of Buddhist Sutra in Ancient China (III) – Eastern Han Dynasty 《早期佛經詞彙研究:東漢佛經詞彙研究》 (Technical Report). Taipei: National Science Council.
Clement, Tanya E. 2008. “‘A Thing Not Beginning and Not Ending’: Using Digital Tools to Distant-read Gertrude Stein’s The Making of Americans.” Literary and Linguistic Computing 23, no. 3: 361–81.
DDBC. 2008a. “Buddhist Studies Person Authority Databases (Beta Version).” Buddhist Studies Authority Database Project, Dharma Drum Buddhist College. http://authority.ddbc.edu.tw/person/. Accessed 14 September 2019.
DDBC. 2008b. “Buddhist Studies Place Authority Databases (Beta Version).” Buddhist Studies Authority Database Project, Dharma Drum Buddhist College. http://authority.ddbc.edu.tw/place/. Accessed 14 September 2019.
Doddington, George, Alexis Mitchell, Mark Przybocki, Lance Ramshaw, Stephanie Strassel, and Ralph Weischedel. 2004. “The Automatic Content Extraction (ACE) Program – Tasks, Data, and Evaluation.” In Fourth International Conference on Language Resources and Evaluation Proceedings (LREC 2004), edited by Maria Teresa Lino, Maria Francisca Xavier, Fátima Ferreira, Rute Costa, and Raquel Silva, 837–40. Paris: European Language Resources Association. https://aclanthology.org/L04-1011/.
Elson, David K., Nicholas Dames, and Kathleen R. McKeown. 2010.
“Extracting Social Networks from Literary Fiction.” In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL2010), edited by Jan Hajič, 138–47. Stroudsburg, PA: Association for Computational Linguistics. https://aclanthology.org/P10-1015/.
Finkel, Jenny Rose, Trond Grenager, and Christopher Manning. 2005. “Incorporating Nonlocal Information into Information Extraction Systems by Gibbs Sampling.” In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, edited by Kevin Knight, Hwee Tou Ng, and Kemal Oflazer, 363–70. Stroudsburg, PA: Association for Computational Linguistics. https://aclanthology.org/P05-1045/.
Gutierrez, Fernando, Dejing Dou, Stephen Fickas, Daya Wimalasuriya, and Hui Zong. 2016. “A Hybrid Ontology-based Information Extraction System.” Journal of Information Science 42, no. 6: 798–820.
Holmes, David I. 1994. “Authorship Attribution.” Computers and the Humanities 28: 87–106.
Hung, Jen-Jou, Marcus Bingenheimer, and Simon Wiles. 2010. “Quantitative Evidence for a Hypothesis Regarding the Attribution of Early Buddhist Translations.” Literary and Linguistic Computing 25, no. 1: 119–34.
Ji, Heng, and Ralph Grishman. 2008. “Refining Event Extraction through Unsupervised Cross-document Inference.” In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics with the Human Language Technology Conference (HLT) of the North American Chapter of the ACL (ACL-08: HLT), edited by Kathleen McKeown, 254–62. Stroudsburg, PA: Association for Computational Linguistics. https://cs.nyu.edu/~hengji/CrossDocIE_Ji.pdf.
Kieschnick, John. 2015. A Primer in Chinese Buddhist Writings: Volume One: Foundations. Stanford, CA: Stanford University. https://religiousstudies.stanford.edu/people/john-kieschnick/primer-chinese-buddhist-writings. Accessed 14 September 2019.
Kilgarriff, Adam, Vít Baisa, Jan Bušta, Miloš Jakubíček, Vojtěch Kovář, Jan Michelfeit, Pavel Rychlý, and Vít Suchomel. 2014. “The Sketch Engine: Ten Years On.” Lexicography 1, no. 1: 7–36.
Lafferty, John D., Andrew McCallum, and Fernando C. N. Pereira. 2001. “Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data.” In Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), edited by Carla E. Brodley and Andrea Pohoreckyj Danyluk, 282–89. Williamstown, MA; San Francisco, CA: Morgan Kaufmann Publishers Inc. https://repository.upenn.edu/cgi/viewcontent.cgi?article=1162&context=cis_papers.
Lancaster, Lewis. 2010. “From Text to Image to Analysis: Visualization of Chinese Buddhist Canon.” In Digital Humanities 2010: Conference Abstracts, edited by Elena Pierazzo, Charlotte Tupman and Camille Desenclos, 185–7. Oxford: Office for Humanities Communication and Centre for Computing in the Humanities. https://repository.upenn.edu/cgi/viewcontent.cgi?article=1162&context=cis_papers. Accessed 14 September 2019.
Lancaster, Lewis R., and Sung-bae Park. 1979. The Korean Buddhist Canon: A Description Catalogue. Berkeley, CA: University of California Press. www.acmuller.net/descriptive_catalogue/. Accessed 14 September 2019.
Lee, John, and Yin Hei Kong. 2016. “A Dependency Treebank of Chinese Buddhist Texts.” Literary and Linguistic Computing 31, no. 1: 140–51.
Lizarralde, Ignacio, Cristian Mateos, Juan Manuel Rodriguez, and Alejandro Zunino. 2019. “Exploiting Named Entity Recognition for Improving Syntactic-based Web Service Discovery.” Journal of Information Science 45, no. 3: 398–415.
McDonald, Ryan, Kevin Lerman, and Fernando Pereira. 2006.
“Multilingual Dependency Analysis with a Two-stage Discriminative Parser.” In Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X), edited by Lluís Màrquez and Dan Klein, 216–20. Stroudsburg, PA: Association for Computational Linguistics. https://aclanthology.org/W06-2932/.
Mintz, Mike, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. “Distant Supervision for Relation Extraction without Labeled Data.” In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2, edited by Keh-Yih Su, Jian Su, Janyce Wiebe, and Haizhou Li, 1003–11. Stroudsburg, PA: Association for Computational Linguistics. https://aclanthology.org/P09-1113/.
Moretti, Franco. 2007. Graphs, Maps, Trees: Abstract Models for Literary History. London: Verso.
Peng, Fuchun, Fangfang Feng, and Andrew McCallum. 2004. “Chinese Segmentation and New Word Detection using Conditional Random Fields.” In Proceedings of the 20th International Conference on Computational Linguistics (COLING’04). Stroudsburg, PA: Association for Computational Linguistics. https://aclanthology.org/C04-1081/.
Prebish, Charles S. 2000. Buddhism: A Modern Perspective. University Park, PA: Penn State University Press.
Rayson, Paul. 2008. “From Key Words to Key Semantic Domains.” International Journal of Corpus Linguistics 13, no. 4: 519–49.
Sayoud. 2012. “Author Discrimination between the Holy Quran and Prophet’s Statements.” Literary and Linguistic Computing 27, no. 4: 424–44.
Shih, Cheng-Wei, Tzong-Han Tsai, Shih-Hung Wu, Chiu-Chen Hsieh, and Wen-Lian Hsu. 2004. “The Construction of a Chinese Named Entity Tagged Corpus: CNEC1.0.” In Proceedings of the 16th Conference on Computational Linguistics and Speech Processing, edited by Lee-Feng Chien and Hsin-Min Wang, 305–13. Taipei: The Association for Computational Linguistics and Chinese Language Processing. https://aclanthology.org/O04-1032/.
Song, Yan, and Fei Xia. 2014. “Modern Chinese Helps Archaic Chinese Processing: Finding and Exploiting the Shared Properties.” In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), edited by Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, 3129–36. Paris: European Language Resources Association (ELRA). www.lrec-conf.org/proceedings/lrec2014/pdf/138_Paper.pdf.
Soothill, William Edward, and Lewis Hodous. 1937. A Dictionary of Chinese Buddhist Terms: With Sanskrit and English Equivalents and a Sanskrit-Pali Index. Carter Lane, EC: Kegan Paul, Trench, Trubner & Company, Limited. http://mahajana.net/texts/soothill-hodous.html. Accessed 14 September 2019.
Surdeanu, Mihai, and Massimiliano Ciaramita. 2007. “Robust Information Extraction with Perceptrons.” In Proceedings of the NIST 2007 Automatic Content Extraction Workshop (ACE07). Paris: European Language Resources Association (ELRA). http://surdeanu.info/mihai/papers/ace07a.pdf.
Tseng, Huihsin, Pichuan Chang, Galen Andrew, Daniel Jurafsky, and Christopher Manning. 2005. “A Conditional Random Field Word Segmenter for Sighan Bakeoff 2005.” In Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing (IJCNLP-05), edited by Chu-Ren Huang and Gina-Anne Levow, 168–71. Singapore: Asian Federation of Natural Language Processing. https://aclanthology.org/I05-3027/.
van Dalen-Oskam, Karina, Jesse de Does, Maarten Marx, Isaac Sijaranamual, Katrien Depuydt, Boukie Verheij, and Valentijn Geirnaert. 2014. “Named Entity Recognition and Resolution for Literary Studies.” Computational Linguistics in the Netherlands 4: 121–36.
Vossen, Piek, Eneko Agirre, Nicoletta Calzolari, Christiane Fellbaum, Shu-kai Hsieh, Chu-Ren Huang, Hitoshi Isahara, Kyoko Kanzaki, Andrea Marchetti, Monica Monachini, Federico Neri, Remo Raffaelli, German Rigau, Maurizio Tescon, and Joop Vangent. 2008. “KYOTO: A System for Mining, Structuring and Distributing Knowledge across Languages and Cultures.” In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08), edited by Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, and Daniel Tapias, 1462–9. Paris: European Language Resources Association (ELRA). https://aclanthology.org/L08-1250/.
Wang, Xiaoyu, Yujia Zhai, Yuanhai Lin, and Fang Wang. 2018. “Mining Layered Technological Information in Scientific Papers: A Semi-supervised Method.” Journal of Information Science 45, no. 6: 779–93.
Wong, Tak-sum, and John Lee. 2016. “A Dependency Treebank of the Chinese Buddhist Canon.” In Proceedings of the Tenth International Conference on Language Resources and Evaluation, edited by Nicoletta Calzolari, 1679–83. Paris: European Language Resources Association. https://aclanthology.org/L16-1265/.
Wong, Tak-sum, and John Lee. 2019. “Character Profiling in Low-Resource Language Documents.” In Proceedings of the 24th Australasian Document Computing Symposium (ADCS 2019), edited by Gianluca Demartini and Paul Thomas, 1–4. New York: Association for Computing Machinery.
Xue, Naiwen, Fei Xia, Fu-Dong Chiou, and Martha Palmer. 2005. “The Penn Chinese Treebank: Phrase Structure Annotation of a Large Corpus.” Natural Language Engineering 11: 207–38.
Zhao, Hai, Chang-Ning Huang, and Mu Li. 2007. “An Improved Chinese Word Segmentation System with Conditional Random Field.” In Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, edited by Hwee Tou Ng and Olivia Oi Yee Kwong, 162–5. Stroudsburg, PA: Association for Computational Linguistics. https://aclanthology.org/W06-0127/.
Zhou, Guodong, Min Zhang, Donghong Ji, and Qiaoming Zhu. 2007. “Tree Kernel-based Relation Extraction with Context-sensitive Structured Parse Tree Information.” In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), edited by Jason Eisner, 728–36. Stroudsburg, PA: Association for Computational Linguistics. https://aclanthology.org/D07-1076.pdf.

6 Corpora and Literary Translation

Titika Dimitroulia

DOI: 10.4324/9781003298328-7

6.1 Introduction: Literary Translation, Tools, and Corpora

In her seminal paper “Corpus linguistics and Translation Studies: Implications and applications,” which introduced corpus linguistics (CL) into translation studies (TS) and founded the subfield of corpus-based translation studies (CBTS or CTS), Mona Baker predicted that the use of large electronic corpora and their interrogation would reshape and expand the new discipline (1993, 235). Since then, corpora and their technology have had an ever-increasing role in both descriptive and applied translation studies on the one hand and translation practice on the other. Translated literary texts have been investigated in CBTS since its very emergence (Anderman and Rogers 2008, 13), while recent surveys confirm that literary translation remains one of the most popular research fields in both TS (Van Doorslaer and Gambier 2015; Zanettin et al. 2015) and CBTS (Granger and Laufer 2022). Following the initiative of Mona Baker (1993, 2000), a large amount of research has been done on the “regularities of translated texts, regularities of translators and regularities of languages” (Zanettin 2014, 178) with the use of various models of analysis (see Sun and Li 2020 and Zanettin 2017 for an overview).
Corpus-based literary translation research has progressively adopted approaches to literary translation that situate its investigation in wider sociocultural and historical settings, grounded in the previous achievements of CBTS and using methodologies drawn from digital humanities.

The popularity of literary translation in the descriptive corpus-based research domain contrasts with the scarcity of applied research on the specific use of corpora and corpus-based translation tools in literary translation practice and literary translator education. The use of corpus technology and computer-aided or assisted translation (CAT) tools has long been investigated exclusively in non-literary translation practice and non-literary translator education. This is probably due to the commonly held view that the use of technology in literary translation practice is insignificant. The emergence of CALT (Computer-Aided or Assisted Literary Translation) opens new paths of research in the field. Notwithstanding this, reports on the use of corpora and CAT tools in literary translation practice are still missing. In general surveys, such as the UK Translator Survey 2016, which, among other things, looked into the use of tools by translators, a specific question on corpora was included, and some useful data can be drawn from the responses of literary translators (EC Representation in the UK, CIOL and ITI 2017, 35–37). Recently, more specific surveys have also been conducted to investigate the interaction of literary translators with tools, including corpora, such as Slessor’s (2020), which included 40 Canadian literary translators, and Ruffo’s (2021), which surveyed 150 literary translators working with various language pairs in various countries. Other surveys, such as Şahin and Gürses’s (2021) in Turkey, have focused on specific tools, namely, machine translation (MT).
As a result, more data on the use of corpora in literary translation practice are now available as a basis for further investigation, but much remains to be done. Data from these surveys confirm that, for the time being, the relation of literary translators to technology generally remains restricted to a minimum, consisting of the use of word processors and the internet for communication and documentation purposes. At the same time, one of the most interesting findings in Slessor’s (2020, 248) survey, partially confirmed by Ruffo’s data (2021, 83 and 135), is that (user-friendly) corpus technology seems to prevail among the wishes of literary translators as regards tools that meet their specific needs.

The technological needs of literary translators are determined by “the inseparability of form and content in literary language” (Taivalkoski-Shilov 2019, 694) and the specific nature of the texts literary translators deal with, which “are often characterized by a remarkable, vocal multilayeredness and deliberate ambiguity . . . plural interpretations,” etc. (ibid., 695–696). Literary translation is an interpretative act which entails enhanced conceptual, linguistic, and (inter)cultural skills, as well as creativity that is considered to be hindered by technology. Literary translators put forward the very nature of their work in order to account for their negative perceptions of CAT tools and, in particular, MT, while they willingly adhere to non-translation-specific tools, among which are corpora, used as translation aids (Youdale and Rothwell 2022; Ruffo 2021; Slessor 2020). It is noteworthy, however, that the most popular research area among scholars interested in tools in literary translation practice is MT (see Youdale 2020, 19–23, for a brief overview).
It is clear that the use of corpora and corpus technology by literary translators needs to be further studied, against the background of the increasing technologization of the translation profession, which does not leave literary translation unaffected, and the emergence of the research field of CALT. I would like to contend that corpus technology is an open and flexible technology that may well respond to the needs of literary translators. Corpus use in literary translation practice can enhance translators’ creativity, while (corpus-based) translation-specific tools can also help them in specific settings. Corpora are also of paramount importance for the further study of literary translation as “one of the major shaping elements in the processes of transmission of ideas, texts and cultural practices” (Bassnett and Johnston 2019, 183). Digital humanities (DH) methodologies and hermeneutical text analysis (Sinclair and Rockwell 2016) can enrich CBTS approaches and cast light on both the textual and extratextual terrains of literary translation, which lies at the heart of World Literature (Damrosch 2003).

In what follows, I will first provide an outline of the field of CALT, with a focus on the current and prospective uses of corpus technology in literary translation practice and its integration in literary translator education. On this basis, I will then present some new trends in corpus-based literary translation research, with an emphasis on perspectives that explore “the wider cultural and historical context [of literary translation], including the circumstances of production and reception of both source and target texts” (Zanettin 2017).
6.2 CAT Tools and Corpus Technology in Literary Translation Practice and Literary Translator Education

CAT is mainly defined today in two ways: either as encompassing all tools that can potentially assist translators, including corpus technology, or as referring only to translation production proper and therefore to translation memories (TM), termbases (TB), and machine translation (MT) (Zhang and Nunes Vieira 2021; EMT Group 2017; PACTE Group 2018; Frérot 2016; Loock 2016). The broad definition of CAT is adopted here, which includes both “active” and “passive language technologies” (Taravella 2011), that is, “generative” and “look-up technologies” (outils de production and outils de consultation, respectively, according to Frérot and Karagouch 2016). Corpora and concordancers are part of the latter (ibid.). [note 1] Zanettin reminds us that CALT was first conceptualized in the field of CBTS by Jan-Mirko Maczewski (1996), a pioneer in corpus-based literary translation research, “who proposed the acronym CoALiTS (Computer Assisted Literary Translation Studies) to denote a field of research combining literary and linguistic computing with literary translation” (2017). Almost 30 years later, this field seems to be consolidating, and corpus linguistics ranks among the CALT technologies to be further investigated. The chart of the eight main types of translation technologies (see Figure 6.1) proposed in 1998 by the computational linguist Alan K. Melby can be useful in situating this field. The first seven types will be commented upon in what follows, with special reference to corpora; the eighth one, dealing with project and billing management, does not apply to most settings in literary translation practice.
Figure 6.1 Melby’s eight types of translation technology (1998, 1): organization of the eight translation tool functions.
Infrastructure.
Before translation. Term level: term candidate extraction; terminology research. Segment level: new text segmentation, previous source-target text alignment, and indexing.
During translation. Term level: automatic terminology lookup. Segment level: translation memory lookup; machine translation.
After translation. Term level: terminology consistency check and non-allowed terminology check. Segment level: missing segment detection and format and grammar checks.
Translation workflow and billing management.

6.2.1 Corpora and Literary Translators’ (Complex) Needs

Melby’s infrastructure covers document creation and management systems, telecommunications (including e-mail, web browsing, etc.), and looking up in terminology databases (1998, 1). In other words, it refers to non-translation-specific tools, including those needed for documentation in both the reception and formulation phases of the translation process model introduced by Holmes for the evaluation of literary translations (1988, 84). [note 2] “Web browsing” is related to passive language technologies, namely, the “various search tools and databases,” which seem to be the most praised tools among translators (Koskinen and Ruokonen 2017, 106). Although corpora are included in these technologies, it is widely admitted that they are not used systematically by non-literary translators (Peraldi 2019; Frérot 2016; Bowker and Corpas Pastor 2015; Frankenberg-Garcia 2015). This also seems to be true of literary translators. Slessor does not explicitly mention corpora among general technologies; thus, only inferences about their use can be made. In Ruffo’s survey, a few literary translators confirm that they use corpora (2021, 135).
Only 6% of those who have received technology-specific training in academic settings were trained in corpus technology, while corpora are not mentioned at all in the case of vocational training (Ruffo 2021, 93–94). There is no reference to the use of “robust” or “stable” corpora, in Zanettin’s (2002) words, that is, mainly large reference corpora, which can be very useful in literary translation. It is true that they are not available for all languages, and very often they are not easily adapted to literary translators’ needs (Gallego-Hernández 2015, 376). Even when reference corpora can be found for a language and can be employed for the study of language in use, literary translators are probably not inclined to use them if they are not trained to do so. As regards parallel corpora, Zanettin stresses “the scarcity of easily available sources of parallel corpora beyond a few domains and text types” (2012, 154). Literary parallel corpora are the hardest to find for reasons related to the greater effort needed for their creation and to copyright issues. The absence of available reliable corpora, the lack of knowledge about their availability and interrogation, and even more, “the widespread lack of knowledge about the very concept of corpus” (Peraldi 2019, 271) explain why literary translators, together with their non-literary colleagues, systematically use for their linguistic inquiries the “new generation dictionaries” on the web, such as Linguee, Reverso, or Tradooit, among others. In Ruffo’s survey, they are occasionally characterized by literary translators as “corpus dictionaries” (2021, 196). This means that some literary translators are aware that these resources are parallel-corpus-based applications. It is not at all certain, however, that they are equally aware that, as a result, they depend heavily on the quality of the corpora upon which they rely, which cannot be easily evaluated, as they are not often directly accessible (Loock 2016, 91–92). 
It is equally unclear whether they perceive how these resources differ from bilingual dictionaries, which are “repertoires of lexical equivalents,” whereas parallel corpora are “repertoires of strategies deployed by past translators, as well as repertoires of translation equivalents” (Zanettin 2002, 11), which may not suit their specific needs. While 75% of participants in Slessor’s survey use Linguee, regarded as a “translation search engine,” 93% of them use “general search engines” (2020, 244), confirming Koskinen and Ruokonen’s findings on the popularity of search tools (2017). This means that they interrogate the web as a monolingual or comparable (and multimodal) “mega-corpus,” which, of course, is “not as balanced and reliable as a carefully chosen collection of texts” (Kübler 2008, 28). While the absence of reliable corpora of various types in several languages makes the “web-as-corpus” much more important, the risks of its use remain considerable with regard to the quality of the information (Loock 2016, 89–91). Peraldi’s remark about most non-literary translators seems to be valid also for literary translators:

[They] do not even realise that classic and daily-used functionalities such as concordance searches in translation memories, the use of parallel-corpus based applications such as Linguee, or even plain collocates search requests on web engines already fall into corpus-based proficiency (Picton et al. 2015) and could therefore be boosted by the use of more powerful tools.

Nevertheless, literary translators seem to vaguely perceive the usefulness of corpus technology for their work, as most items in the technological wish list provided by the participants in Slessor’s survey, in answer to the question “If you could design a technological tool for literary translators, what would you want it to do?” (2020, 248), are related to corpora:

1. Replace typing.
2. An effective tool for creating bitexts.
3. For literary translations, it would be nice to have tool for capturing printed texts (throughout a computer or smartphone camera?).
4. Provide me a comparison of contexts in which a given word or phrase is used, specific to genre.
5. Determine the frequency of the most-used terms of phrases, to avoid repetitions in close proximity or verbal tics.
6. Automated creation of glossary, with references.
7. Literary terminology organized by subject, culture, period.
8. A database of all books ever written, which could be searched with a few keywords. That would be useful to track down original or translated quotations.
9. To give the translator all possible translations of a specific term.

This is in line with literary translators’ wishes expressed in Ruffo’s survey, where corpora and concordance search rank among several other appealing tools (2021, 136). Still, too few of them describe corpus technology as “the best method to observe the effective linguistic use of words” (Ruffo 2021, 83) or as a “powerful resource,” in that it helps uncover “patterns which close reading alone can either only subjectively assess or not register at all, e.g., sentence length and repetition” and improve consistency (2021, 135–6).

The potential of corpus technology remains generally invisible to most literary translators. This invisibility can explain why they ask for a tool that displays term occurrences, so that they can avoid repetitions in their translation, and words in context, so that they can better understand their use, when so many concordancers are available today. Some concordancers, such as the one provided by Voyant tools, are online and free, [note 3] which means that literary translators do not need to install a tool on their computer but can start working immediately on their queries.
In addition, Voyant tools support many text formats and therefore do not require a conversion effort from the translator, as other concordancers do. They presuppose only basic knowledge for the exploration of their basic features, while they offer advanced features for hermeneutical text analysis (Sinclair and Rockwell 2016). Concordancers can be used in multiple ways by literary translators as an aid in the translation process if they are taught how to use them. They can provide words in context, so that their meaning is fully perceived, and word frequencies, so that translators can avoid repetitions, as they wish. Even more, they can unravel all sorts of hidden patterns in the texts and enhance a fine-grained interpretation, while parallel concordancers, such as Sketch Engine’s or ParaConc, allow the display and exploration of bitexts. An interesting experiment at the interface of research and practice illustrates the potential of corpus use in the translation process. Youdale (2020) has applied a hybrid, flexible methodology, called Close-Distant Reading (CDR), to the study of literary style and the stylistic self-analysis of the translator. CDR draws on corpus linguistics and DH and links human close reading with corpus-driven distant reading, enhanced by visualization techniques, in both the reception and formulation phases of the translation process. Nevertheless, not all information required by literary translators can be found in online corpora, corpus-based resources, and the web searched as corpus, or retrieved from the text analysis of the foreign text and, eventually, its retranslation(s). Questions “for which there are often no clear answers in dictionaries, glossaries, Google searches and other tools and resources that [translators] are accustomed to using” (Frankenberg-Garcia 2015, 352) may lead literary translators to build their own corpora.
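Whether the texts come from an online resource or from a translator’s own small corpus, the two basic concordancer functions mentioned above, words in context and word frequencies, can be illustrated with a minimal Python sketch. It is not how Voyant tools, Sketch Engine, or ParaConc work internally; the function names (tokenize, kwic, word_frequencies), the simple tokenization rule, and the sample passage are assumptions made only for this illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (internal apostrophes kept)."""
    return re.findall(r"[^\W_]+(?:['’][^\W_]+)*", text.lower())


def kwic(text, keyword, window=4):
    """Keyword in context: one (left context, keyword, right context) tuple per hit."""
    tokens = tokenize(text)
    target = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


def word_frequencies(text, top=10):
    """The `top` most frequent word forms with their counts."""
    return Counter(tokenize(text)).most_common(top)


# Hypothetical draft passage, invented for this illustration.
draft = ("The sea was calm. The sea was grey, and the sea kept its secrets. "
         "Nobody spoke of the storm.")

# Print a small concordance for "sea", then the three most frequent word forms.
for left, word, right in kwic(draft, "sea", window=3):
    print(f"{left:>20} [{word}] {right}")
print(word_frequencies(draft, top=3))
```

Run on a draft translation, the frequency list points to possible unwanted repetitions, while the context lines show how a given word is actually used; a dedicated concordancer adds sorting, lemmatization, and collocation statistics on top of this basic idea.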
Even the wish for a tool displaying words in context in a specific genre implies the creation of a do-it-yourself (DIY) corpus (Zanettin 2002), which can be interrogated for specific purposes. Still, literary translators cannot easily find the digital texts needed for its creation (Zanettin 2012), especially when they work with peripheral languages. Non-literary translators can use tools to collect electronic texts, such as BootCaT (Baroni and Bernardini 2004), to automatically build DIY corpora, whereas the shortage of online resources relevant to literature in many languages explains why literary translators need to be trained in corpus analysis as well as in corpus creation. Five participants in Ruffo’s survey mentioned, however, that they use “a software to build their own corpus of translations” (2021, 83). Although no further information is provided on the type of corpora created and the purposes of their use, we can assume that, along with technical skills, methodologies drawn from corpus-based research may eventually be needed for the effective compilation and interrogation of DIY corpora. Clearly formulated hypotheses in literary translation depend on theoretical assumptions. For instance, various translation-driven corpora, as described by Zanettin (2012), and not only parallel ones, may have to be combined so that the translator apprehends the foreign text in depth and finds a range of solutions for given translation problems, which will not hinder his/her creativity (Malmkjær 2008). Their use can, for instance, help the translator reconstitute, even partly, the intertextual node of a classic text and therefore better understand its position and functioning in its context (Venuti 2009).
Venuti’s theory of intertextuality in and of translation, along with Theo Hermans’s “translation-specific intertextuality” (2003, 40) and Cay Dollerup’s (2000) “support-translation” concept, may apply not ex post, as a method of reading a translation, but ex ante, as a methodology for creating a translation on the basis of multiple information drawn from various corpora. Along with print resources, several types of corpora may be of help for the translation of such a text:

• The parallel corpus of the source text and its retranslations, if any.
• The monolingual corpus of all works by the author.
• A parallel corpus containing those of his or her works that have already been translated, along with their translations in the target language or in other languages.
• A monolingual comparable corpus including works of the same genre written by other authors in the same period in the source language, and a subcorpus containing texts of authors sharing the same aesthetic ideas.
• A monolingual comparable corpus including non-literary texts written in the same period in the source language.
• A multilingual comparable corpus composed of the reviews of his/her works.
• A monolingual corpus containing literary texts of the same period written in the target language.
• A monolingual reference corpus, etc.

These methodological uses of corpora cannot simply be invented by a trainee or a professional translator and should be part of their education. Corpus technology “builds on human intuition by acting very often as a validation tool” (Peraldi 2019, 273). It mainly generates questions on the basis of data to be interpreted by the translator and therefore increases the range of the translator’s choices. Corpus technology thus seems well suited to the needs of literary translators. Still, they need to be trained in its use.
A corpus-based course for literary translators could provide them with competences in digitization, data curation and editing, corpus creation, and analysis, grounded in a solid methodological and theoretical framework. Practice-led experiments, such as Youdale’s, can be used in their education to shed light on the humanist and humanizing dimension of technology, also highlighted by the founder of the field of CAT, Martin Kay, whose “translator’s amanuensis” is once again attracting the interest of TS scholars (Alonso and Vieira 2017). Although the task may seem demanding, as tools become increasingly easy to use, a corpus-based course for literary translators, such as the one hinted at previously, may be quite feasible soon. [note 4]

6.2.2 What CAT Tools Can Do for Literary Translators

CAT tools are translation-specific tools, that is, generative, active language technologies. According to data drawn from the two surveys mentioned in Section 6.1, literary translators rarely use translation-specific tools (Ruffo 2021, 112–3; Slessor 2020, 245–7). A positive attitude toward CAT tools still seems to be an exception (Ruffo, ibid.; Youdale and Rothwell 2022, 382–4; Hansen 2021). However, a quick look at literary translators’ needs, along with related experiments (Youdale and Rothwell 2022; Hansen 2021), hints at the potential for the future use of (corpus-based) CAT tools in the literary translation process and justifies the need for relevant, specialized literary translator education, adapted to the specificity of literary translation. CAT tools, which are presented separately in Melby’s model, are today embedded in integrated environments. The three phases of the translation process as described by Melby (before, during, and after) are thus often carried out by different features of the same tool, whose overall performance depends on the quality of its “hidden” corpora (Loock 2016, 33–34).
At the term-level process, Melby mentions term extraction and research for the building of the terminology database (before), terminology look-up (during), and terminology consistency check (after). Many TMs today provide term extractors, but building a term base usually requires considerable effort and manual work. Terminology look-up can be useful for literary translators who have decided to invest this effort in specific cases, for example, in mathematical fiction or science-fiction texts. In any case, the automatic creation of glossaries can be found among the translators’ wishes in Slessor’s survey (2020, 248), while terminology consistency ranks among the advantages of the use of technology in Ruffo’s survey (2021, 136). At the segment-level process, Melby mentions alignment (before), translation memory look-up and machine translation (during), and missing segment detection, along with format and grammar checks (after). Translation memories are databases which store aligned texts, that is, texts along with their translations, segmented at sentence or paragraph level. So they are, in fact, parallel corpora of a specific type. Texts can be aligned directly by the TM or by another tool and then imported into the TM. There are many open-source, well-performing alignment tools, for example, LF Aligner, YouAlign, etc. Post-editing (PE) is always needed, and even more so in literary texts. A TM’s parallel corpora can be built up entirely from a translator’s own works but commonly include bitexts imported from various sources, either ready-made or aligned by the translator. When segments of the new text to be translated match, exactly or partially (fuzzy matches), already-existing translated TM segments, these segments are automatically recalled from the database and proposed to the translator so that s/he does not translate the same segment twice.
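The recall of exact and fuzzy matches described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the matching algorithm of any actual CAT tool: similarity is measured here with the character-based ratio of Python’s standard difflib module, and the function names (similarity, tm_lookup), the 0.75 threshold, and the two stored translation units are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Character-based similarity between 0.0 and 1.0 (difflib's ratio, case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def tm_lookup(segment, memory, threshold=0.75):
    """Return (score, source, target) for every stored unit scoring at or above
    the threshold, best match first. A score of 1.0 is an exact match; lower
    scores are fuzzy matches."""
    hits = []
    for source, target in memory:
        score = similarity(segment, source)
        if score >= threshold:
            hits.append((score, source, target))
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


# Hypothetical translation memory: aligned (source, target) segments.
memory = [
    ("She opened the window.", "Elle ouvrit la fenêtre."),
    ("He closed the door quietly.", "Il ferma doucement la porte."),
]

# A new segment that resembles, but does not equal, a stored source segment.
for score, source, target in tm_lookup("She opened the door.", memory):
    print(f"{score:.0%} match: {source!r} -> proposed: {target!r}")
```

The proposed target segment is only a suggestion: for a fuzzy match the translator still has to adapt it, which is why such proposals need critical examination, especially in literary texts.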
The specific resources needed by literary translators are, once again, unlikely to be easily found online so as to be imported as bitexts in a TM. For this reason, in literary translation, TMs are more likely to be useful when built by translators with their own bitexts and with strictly controlled import of selected, aligned resources. In general, segment look-up is not expected to be very useful for literary translators, due to the high discursive complexity of translated texts, but could be useful in specific cases, for example, when retranslations are imported, although the suggested translations need to be critically examined one by one by the translator; or for the translation of humanities and social sciences texts, where reiterations of terms and phrases can be observed (Mazoyer 2021) but on which we do not focus here. Until recently, this recall worked only with segments, and most CAT tools provided a functionality called “concordance search,” allowing the translator to search for a specific word, sequence, or phrase. Nowadays, subsegment matches, which can be of greater interest for literary translators, are also available. The fact that CAT tools can be useful when used in specific settings and mainly as concordancers is confirmed by an experiment conducted by Andrew Rothwell, who translated Zola’s novel La joie de vivre with the use of a TM (Youdale and Rothwell 2022). Starting from the different French and English reception of the novel and the theory of retranslation, Rothwell aligned Zola’s text and Ernest Alfred Vizetelly’s translation, which introduced it into the English literary space in the late nineteenth century and remained dominant throughout the twentieth century. Both were imported in a TM, and the version the translator produced constantly referred to the Victorian text, whose choices and important omissions reflected the era’s ideology.
Although segments of this translation were proposed by the tool for insertion, they were rarely accepted by Rothwell for different reasons, ranging from linguistic to cultural and ideological ones. The overall value of the project lies at the level of interpretation, as the older translation was “casting a different, historical light, sentence by sentence, on the difficulties and possibilities of the ST” (Youdale and Rothwell 2022, 397). Rothwell’s experiment illustrates the pros and cons of TMs in literary translation. Repetition is almost insignificant in literary texts, while segmentation may hinder creativity. When retranslations are used in a hermeneutical perspective, they seem to be counterproductive. But his experiment also illustrates a possible hermeneutical use of CAT tools, which triggers reflection and supports a multilayered interpretation of the text that guides the translator’s decisions. Machine translation is also embedded today in most TMs. The majority of literary translators seem to abhor MT (Ruffo 2021, 112; Slessor 2020, 246), while, in contrast, it attracts the interest of the academic community from various perspectives (see Youdale 2020, 19–23, for a brief overview; also see Moratto 2010). Yet many scholars stress various ethical implications of the use of MT in literary translation with regard to the conceptualization of literature and literariness or the translator’s voice (Costa and da Silva 2020; Kenny and Winters 2020; Taivalkoski-Shilov 2019). In a recent study which aimed at exploring creativity in literary translation through the comparison of three versions of a short story by Kurt Vonnegut from English to Catalan and Dutch, produced by neural MT (NMT), NMT plus human PE, and a translator (Human Translation, HT), the translator’s version had, unsurprisingly, the highest creativity score, followed by PE, while MT resulted in a very poor translation (Guerberof-Arenas and Toral 2022; cf. Guerberof-Arenas and Toral 2020).
Furthermore, the researchers point out that also “during PE translators are less creative (more errors and fewer creative shifts)” (Guerberof-Arenas and Toral 2022, 26). The theoretical debate on the use of MT in literary translation cannot be dealt with in detail here, as it exceeds the scope of our research. Still, some experiments, like Youdale’s, which explores MT suggestions in CAT tools as potentially stimulating creativity (2022, 384–93), or Rothwell’s (2009), using MT in his translation of dada poetry with reference to Walter Benjamin’s translation theory, suggest that MT may be useful for specific purposes and in specific settings. The last tools mentioned by Melby at the segment-level process are related to quality assurance, concerning misspellings, missing tags, incorrect punctuation, etc. They can be useful for literary translators, as is suggested by Ruffo’s survey (2021, 136). Still, their value needs to be assessed with regard to the overall effort needed for the use of a TM.

6.3 The Encounter between CBTS and DH

In descriptive corpus-based and corpus-driven research, literary translation holds a prominent place. Yet the detailed discussion of this abundant research, which has progressively been expanding its scope by building upon more variegated and coherent methodologies (Sun and Li 2020, 647), is beyond the scope of this paper (for an overview, see Zanettin 2013, 2017). We will focus instead on some multidimensional approaches to literary translation which seem to “find a more robust grounding in natural language processing (NLP) techniques” (Zanettin 2014, 182) and investigate literary translation in line with the demand formulated by Bassnett and Johnston (2019, 187), who argue that:

We need to expand our ideas about translation beyond the linguistic and to seek a redefinition of what translation actually is. We also need to understand how translation has functioned in the past, and how attitudes to translation in some contexts have come to be.

Based on the achievements in corpus-based literary translation research and their theoretical grounding, these new approaches use large corpora and advanced techniques to enrich the array of languages and genres investigated and to study texts in broader time spans and in complex (inter)cultural settings, which include a variety of translation agents. These techniques, drawn from DH, include, among others, text and data mining, stylometry, visualization, and mapping tools. Some of these tools are specifically developed for literary translation research, such as the “TL-Explorer,” created for mapping and analyzing translated literature (Zhai et al. 2020), and specific projects. DH methods, most often combined with close reading, allow more thorough investigation in existing research areas, the revision of previous assumptions on the basis of more extensive and new data, or the generation of new research hypotheses. In any case, they allow translation to be situated in wider contexts and its complex role in cultural transfers to be studied in depth, as they “unravel the intricate social, cultural and literary networks, in which the activities of literary translation are conducted” (Sun and Li 2020, 651).
This is the case of the team project “Prismatic Jane Eyre,” led by Matthew Reynolds and investigating translation not as “a single act involving one source-text in one language and one translation-text in one another language, which just happens to occur again and again, but rather as paradigmatically generating multiple texts, so that ‘translation’ becomes the process of turning from one language into others, da una lingua in altre, producing chains of signifiers in target languages, creating multiple equivalent, authentic texts, while ‘a translation’ correspondingly figures as just one of many actual and/or possible linguistic ‘realisations’” (Reynolds 2019, 2). Drawing on cultural and post-colonial approaches in TS, the project explores translation through this “prismatic” lens, with Charlotte Brontë’s Jane Eyre as a case study. The team gathered 595 translations into 67 languages and 675 “acts of translation,” that is, either the publication of a new translation or the republication of an existing translation in a new location. They situated them on two maps, providing knowledge on the spatial and temporal distribution of the novel, “as a trans-temporal and geographically multiple text, with many writer-translators, publishers, and readers collaborating to bring ever-new energy to its plural existence” (Reynolds and Vitali 2021, 10). Another “map” displays the available covers of the translations, which reflect on a different level its various receptions. While they study the texts by close reading, they also analyze a number of translations with corpus technology in order to “zoom in on moments of particular interpretive interest” (Reynolds and Vitali 2021, 13). Text analysis is in progress, and the outcomes of the project, along with the theory on which it is based, are available online. [note 5]

Cheesman et al. (2017) explore a large corpus of drama retranslations, starting from the principle that translation is an interpretative act and that its analysis is not undertaken per se but, as Mona Baker has suggested (2000, 258), in order to unveil the complex sociocultural, ideological, and cognitive background against which translators’ decisions are made. The project is based on the hypothesis that “the more important an item is for a text’s meaning, the less translators tend to agree about translating it (though each one is consistent in using their selected terms)” (2017, 742). In their view, quantitative variation in a corpus of retranslations may lead to the qualitative annotation of the translated texts. Inspired by the close reading of texts against a solid theoretical background in literary studies, TS, and CBTS, this interdisciplinary group of researchers has developed a web-based system that enables users to create and explore a parallel corpus of retranslations by using various visualization tools and constantly referring back to texts. This recursive loop reflects the complex hermeneutical premise underlying the project. Automatic and manual annotation is also supported, and stylometric analysis is carried out, revealing the importance of the period in which the translations were made. The kind of retranslations to be explored is already an engaging choice, as it radically challenges the instrumental perception of translation (Venuti 2019) and illustrates the complex ways through which it reflects societies while informing them.
So, retranslations explored by the system can be:

Complete, fragmentary, edited, adapted versions; versions derived from (a version of) the original-language translated work, or from intermediaries in the translating language, and/or other languages; versions in various media; for various audiences (popular, scholarly, restricted); in mono-, bi-, or plurilingual formats; from various periods and places; produced and received under various economic, political, institutional, and cultural-linguistic conditions. (Cheesman et al. 2017, 743)

A very appealing feature of the project is the visualization of the complex interconnection between texts, writers, and cultures in a joint close and distant reading perspective or, as the authors put it, “from the how and why of variation among translations, back to the varying capacity of the translated text to provoke variation” (2017, 740). What is of particular importance is the number of new questions it generates and the new projects it suggests. Some other projects explore and contribute to different areas of TS research, such as translation history and sociology. Drawing from the idea of “history from below,” Michelle Jia Ye (2022) also combines methods in historical studies, translation history, and network studies with the network visualization tool Gephi to reconstitute and represent the translators’ presences and connections in a popular magazine network in early twentieth-century China. She casts light on translation as “a discursive mode of production that enabled and popularized the very act of publication” (2022, 49), contributing at the same time to translation history and translators’ sociographies in this historical period.
These approaches, among many others, illustrate the potential of the encounter between CBTS and DH for corpus-based literary translation research and TS as a whole, “illustrating the need for the expanding of horizons within and beyond the contours of the discipline,” for its outward turn (Bassnett and Johnston 2019, 187).

6.4 Conclusion

The technologization of literary translation seems to be imminent today. In this new landscape, delineated by the increasing use of tools in literary translation practice, literary translator education can help literary translators deal with technology in a more sustainable and humanizing perspective. Web-based collaborative environments for computer-assisted translation, designed for cultural texts such as “Traduxio” (Henkel and Lacour 2021), can also be of great interest for literary translator practice and training. Corpus technology, which supports interpretation and enhances creativity, should be at the heart of this education, in which corpus-based CAT tools may also find a place, if used in a similar perspective. Applied corpus-based literary translation should confront these challenges in due time. In the same line of thinking, descriptive CBTS is nowadays transformed through its synergies with DH, which can broaden its scope and allow for the formulation of new complex hypotheses, grounded on literary translation theory as reshaped by interdisciplinary exchanges. Reflecting on applied CBTS in literary translation practice and literary translator education can lead to a cross-fertilization of both. Much needs to be done, but it seems that through the effective exploration of corpora and their technologies, “the last bastion of human translation” (Toral and Way 2015, 213) that is literary translation can outline a model of how technology can be used in a humanizing perspective in the translation profession.
Notes

1 The first CALT conference’s rationale mentions “CAT tools, corpus linguistics, natural language processing, text analysis and visualization and in particular Neural Machine Translation (NMT)” (emphasis added). https://calt2021conference.wordpress.com.
2 Austermühl adapts Holmes’s model to include tools in its different phases (2001, 13).
3 https://voyant-tools.org.
4 I have designed and taught since 2012 a postgraduate course on the use of tools in literary translation, with emphasis on corpora (Master “Translation of Literature and the Humanities,” Aristotle University of Thessaloniki), whose brief description is available on the Course Registry of Dariah project: “The course aims to enhance the digital competences of the future literary translators and focuses on: Corpora in literature and literary translation (concordancers, annotation tools, corpora design, compilation and use, e.g., sketch engine, bootcat, voyant, catma) – Translation technologies (translation memories, terminology management, Translation Environment Tools (TenTs) as matecat etc.) – Collaborative translation platforms (traduxio),” https://dhcr.clarin-dariah.eu/#370.
5 https://prismaticjaneeyre.org.

References

Alonso, Elisa, and Lucas Nunes Vieira. 2017. “The Translator’s Amanuensis 2020.” JoSTrans: The Journal of Specialised Translation 28: 345–61. www.jostrans.org/issue28/art_alonso.php.
Anderman, Gunilla, and Margaret Rogers. 2008. Incorporating Corpora. The Linguist and the Translator. Clevedon: Multilingual Matters.
Austermühl, Frank. 2001. Electronic Tools for Translators. London: Routledge.
Baker, Mona. 1993. “Corpus Linguistics and Translation Studies: Implications and Applications.” In Text and Technology: In Honour of John Sinclair, edited by Mona Baker, Gill Francis and Elena Tognini-Bonelli, 233–50. Amsterdam: John Benjamins.
Baker, Mona. 2000. “Towards a Methodology for Investigating the Style of a Literary Translator.” Target 12, no. 2: 241–66.
Baroni, Marco, and Silvia Bernardini. 2004. “BootCaT: Bootstrapping Corpora and Terms from the Web.” In Proceedings of LREC 2004, 1313–16. Lisbon: LREC. www.lrec-conf.org/proceedings/lrec2004/pdf/509.pdf.
Bassnett, Susan, and David Johnston. 2019. “The Outward Turn in Translation Studies.” The Translator 25, no. 3: 181–88. https://doi.org/10.1080/13556509.2019.1701228.
Bowker, Lynne, and Gloria Corpas Pastor. 2015. “Translation Technology.” In The Oxford Handbook of Computational Linguistics (2nd ed.), edited by Ruslan Mitkov, 871–905. Oxford: Oxford University Press.
Cheesman, Tom, Kevin Flanagan, Stephan Thiel, Jan Rybicki, Robert S. Laramee, Jonathan Hope, and Avraham Roos. 2017. “Multi-Retranslation Corpora: Visibility, Variation, Value, and Virtue.” Digital Scholarship in the Humanities 32, no. 4: 739–60. https://doi.org/10.1093/llc/fqw027.
Costa, Cynthia Beatrice, and Igor A. L. da Silva. 2020. “On the Translation of Literature as a Human Activity Par Excellence.” Aletria: Revista de Estudos de Literatura. https://periodicos.ufmg.br/index.php/aletria/article/view/22047.
Damrosch, David. 2003. What Is World Literature? Princeton, NJ and Oxford: Princeton University Press.
Dollerup, Cay. 2000. “Relay and Support Translations.” In Translation in Context, edited by Andrew Chesterman, Natividad Gallardo San Salvador, and Yves Gambier, 17–26. Amsterdam and Philadelphia, PA: John Benjamins.
EC Representation in the UK, CIOL and ITI. 2017. “UK Translator Survey. Final Report.” www.ciol.org.uk/sites/default/files/UKTS2016-Final-Report-Web.pdf.
EMT Group. 2017. “European Master’s in Translation (EMT) Competence Framework.” https://ec.europa.eu/info/sites/default/files/emt_competence_fwk_2017_en_web.pdf.
Frankenberg-Garcia, Ana. 2015. “Training Translators to Use Corpora Hands On: Challenges and Reactions by a Group of 13 Students at a UK University.” Corpora 10, no. 2: 351–80.
Frérot, Cécile. 2016. “Corpora and Corpus Technology for Translation Purposes in Professional and Academic Environments. Major Achievements and New Perspectives.” Cadernos de Tradução 36, no. 1: 36–61. https://doi.org/10.5007/2175-7968.2016v36nesp1p36.
Frérot, Cécile, and Lionel Karagouch. 2016. “Outils d’aide à la Traduction et Formation de Traducteurs: Vers une Adéquation des Contenus Pédagogiques avec la Réalité Technologique des Traducteurs.” ILCEA 27. https://doi.org/10.4000/ilcea.3849.
Gallego-Hernández, Daniel. 2015. “The Use of Corpora as Translation Resources: A Study Based on a Survey of Spanish Professional Translators.” Perspectives 23, no. 3: 375–91. https://doi.org/10.1080/0907676X.2014.964269.
Granger, Sylviane, and Marie-Aude Lefer. 2022. “Corpus-based Translation and Interpreting Studies. A Forward-looking Review.” In Extending the Scope of Corpus-based Translation Studies, edited by Sylviane Granger and Marie-Aude Lefer, 13–41. London: Bloomsbury.
Guerberof-Arenas, Ana, and Antonio Toral. 2020. “The Impact of Post-Editing and Machine Translation on Creativity and Reading Experience.” Translation Spaces 9, no. 2: 255–82. https://doi.org/10.1075/ts.20035.gue.
Guerberof-Arenas, Ana, and Antonio Toral. 2022. “Creativity in Translation: Machine Translation as a Constraint for Literary Texts.” Translation Spaces, Online-First Articles. https://doi.org/10.1075/ts.21025.gue.
Hansen, Damien. 2021. “Défis et Pertinence de la Traduction Littéraire Assistée par Ordinateur.” La main de Thôt 9. https://revues.univ-tlse2.fr:443/lamaindethot/index.php?id=982.
Henkel, Daniel, and Philippe Lacour. 2021. “Collaboration Strategies in Multilingual Online Literary Translation.” In When Translation Goes Digital: Case Studies and Critical Reflections, edited by Renée Desjardins, Claire Larsonneur, and Philippe Lacour, 153–71. London: Palgrave Macmillan.
Hermans, Theo. 2003. “Translation, Equivalence and Intertextuality.” Wasafiri 40: 39–41.
Holmes, James S. 1988. Translated! – Papers on Literary Translation and Translation Studies. Amsterdam: Rodopi.
Kenny, Dorothy, and Marion Winters. 2020. “Machine Translation, Ethics and the Literary Translator’s Voice.” Translation Spaces 9, no. 1: 123–49. https://doi.org/10.1075/ts.00024.ken.
Koskinen, Kaisa, and Minna Ruokonen. 2017. “Love Letters or Hate Mail? Translators’ Technology Acceptance in the Light of Their Emotional Narratives.” In Human Issues in Translation Technology, edited by Dorothy Kenny, 8–24. London: Routledge.
Kübler, Nathalie. 2008. “Corpora and LSP Translation.” In Corpora in Translator Education, edited by Federico Zanettin, Silvia Bernardini, and Dominic Stewart, 25–42. Manchester: St. Jerome.
Loock, Rudy. 2016. La Traductologie de Corpus. Villeneuve d’Ascq: Presses Universitaires du Septentrion.
Malmkjær, Kirsten. 2008. “On a Pseudo-subversive Use of Corpora in Translator Training.” In Corpora in Translator Education, edited by Federico Zanettin, Silvia Bernardini, and Dominic Stewart, 119–34. Manchester: St. Jerome.
Mazoyer, Renaud. 2021. “Traduction d’essai et TAO. Le Racisme est un Problème de Blancs de Reni Eddo-Lodge: Une Etude de Cas.” La main de Thôt 9. https://revues.univ-tlse2.fr:443/lamaindethot/index.php?id=991.
Melby, Alan K. 1998. “Eight Types of Translation Technology.” Paper presented at American Translators Association ATA 39th Annual Conference, November 4–9, Hilton Head Island, SC. www.ttt.org/technology/8types.pdf.
Moratto, Riccardo. 2010. “Designing Translation Curricula in the Machine Translation Era (MTE): Challenges of a New Approach. Student Perspectives.” In Huigu yu qianjing 回顧與前瞻, Proceedings of 15th Taiwan Symposium on Translation and Interpretation at Changrong University, Tainan, edited by Li Gong-Wei and Li Hui-Rong, 69–89. Tainan: Changrong University.
PACTE Group. 2018. “Competence Levels in Translation: Working Towards a European Framework.” The Interpreter and Translator Trainer 12, no. 2: 111–31. www.doi.org/10.1080/1750399X.2018.1466093.
Peraldi, Sandrine. 2019. “Integrating Corpus-based Tools into Translators’ Work Environments: Cognitive and Professional Implications.” Revista Internacional de Organizaciones 23: 265–92.
Picton, Aurélie, Mathilde Fontanet, Mélanie Maradan, and Donatella Pulitano. 2015. “Corpora in Translation: Addressing the Gap between the Scholars’ and the Translators’ Point of View.” Presented at Corpus Use and Learning to Translate (CULT), Alicante, Spain. https://archive-ouverte.unige.ch/unige:86881.
Reynolds, Matthew. 2019. “Introduction.” In Prismatic Translation, edited by Matthew Reynolds, 1–18. Cambridge: Legenda.
Reynolds, Matthew, and Giovanni Pietro Vitali. 2021. “Mapping and Reading a World of Translations: Prismatic Jane Eyre.” Modern Languages Open 1: 1–18. https://doi.org/10.3828/mlo.v0i0.375.
Rothwell, Andrew. 2009. “Translating ‘Pure Nonsense’: Walter Benjamin Meets Systran on the Dissecting Table of Dada.” Romance Studies 27, no. 4: 259–72. https://doi.org/10.1179/026399009X12523296128713.
Ruffo, Paola. 2021. “In-between Role and Technology: Literary Translators on Navigating the New Socio-technological Paradigm.” PhD diss., Heriot-Watt University, Edinburgh.
Şahin, Mehmet, and Sabri Gürses. 2021. “English – Turkish Literary Translation Through Human – Machine Interaction.” Revista Tradumàtica. Tecnologies de la Traducció 19: 171–203. https://doi.org/10.5565/rev/tradumatica.284.
Sinclair, Stéfan, and Geoffrey Rockwell. 2016. Hermeneutica: Computer-Assisted Interpretation in the Humanities. Cambridge, MA: The MIT Press.
Slessor, Stephen. 2020. “Tenacious Technophobes or Nascent Technophiles? A Survey of the Technological Practices and Needs of Literary Translators.” Perspectives 28, no. 2: 238–52.
Sun, Yifeng, and Dechao Li. 2020. “Digital Humanities Approaches to Literary Translation.” Comparative Literature Studies 57, no. 4: 640–54. https://doi.org/10.5325/complitstudies.57.4.0640.
Taivalkoski-Shilov, Kristiina. 2019. “Ethical Issues Regarding Machine(-assisted) Translation of Literary Texts.” Perspectives 27, no. 5: 689–703. https://doi.org/10.1080/0907676X.2018.1520907.
Taravella, Anne-Marie. 2011. Rapport Sommaire et Préliminaire sur les Résultats de l’Enquête Menée auprès des Utilisateurs de Technologies Langagières en Avril-mai 2011. Gatineau: Centre de recherche en technologies langagières. www.crtl.ca/display265.
Toral, Antonio, and Andy Way. 2015. “Machine-Assisted Translation of Literary Text: A Case Study.” Translation Spaces 4, no. 2: 240–67.
Van Doorslaer, Luc, and Yves Gambier. 2015. “Measuring Relationships in Translation Studies. On Affiliations and Keyword Frequencies in the Translation Studies Bibliography.” Perspectives 23, no. 2: 305–19. https://doi.org/10.1080/0907676X.2015.1026360.
Venuti, Lawrence. 2009. “Translation, Intertextuality, Interpretation.” Romance Studies 27, no. 3: 157–73. https://doi.org/10.1179/174581509X455169.
Venuti, Lawrence. 2019. Contra Instrumentalism: A Translation Polemic. Lincoln, NE: University of Nebraska Press.
Ye, Jia Michelle. 2022. “A History from Below: Translators in the Publication Network of Four Magazines Issued by the China Book Company, 1913–1923.” Translation Studies 15, no. 1: 37–53. https://doi.org/10.1080/14781700.2021.1950043.
Youdale, Roy. 2020. Using Computers in the Translation of Literary Style: Challenges and Opportunities. London and New York: Routledge.
Youdale, Roy, and Andrew Rothwell. 2022. “Computer-assisted Translation (CAT) Tools, Translation Memory, and Literary Translation.” In The Routledge Handbook of Translation and Memory, edited by Sharon Deane-Cox and Anneleen Spiessens, 381–402. London: Routledge.
Zanettin, Federico. 2002. “Corpora in Translation Practice.” In Language Resources for Translation Work and Research, LREC 2002 Workshop Proceedings, edited by Elia Yuste, 10–14. www.lrec-conf.org/proceedings/lrec2002/pdf/ws8.pdf.
Zanettin, Federico. 2012. Translation-driven Corpora. Manchester: St Jerome Publishing.
Zanettin, Federico. 2013. “Corpus Methods for Descriptive Translation Studies.” Procedia – Social and Behavioral Sciences 95: 20–32. https://doi.org/10.1016/j.sbspro.2013.10.618.
Zanettin, Federico. 2014. “Corpora in Translation.” In Translation: A Multidisciplinary Approach, edited by Julian House, 178–99. London: Palgrave Macmillan.
Zanettin, Federico. 2017. “Issues in Computer-Assisted Literary Translation Studies.” Intralinea. www.intralinea.org/specials/article/issues_in_computer_assisted_literary_translation_studies.
Zanettin, Federico, Gabriela Saldanha, and Sue-Ann Harding. 2015. “Sketching Landscapes in Translation Studies: A Bibliographic Study.” Perspectives: Studies in Translatology 23, no. 2: 161–82. https://doi.org/10.1080/0907676X.2015.1010551.
Zhai, Alex, Zheng Zhang, Amel Fraisse, Ronald Jenn, Shelley Fisher Fishkin, and Pierre Zweigenbaum. 2020. “TL-Explorer: A Digital Humanities Tool for Mapping and Analyzing Translated Literature.” In Proceedings of the 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, 167–71. International Committee on Computational Linguistics. https://aclanthology.org/2020.latechclfl-1.20.
Zhang, Xiaochun, and Lucas Nunes Vieira. 2021. “CAT Teaching Practices: An International Survey.” JoSTrans: The Journal of Specialised Translation 36: 99–124. www.jostrans.org/issue36/art_zhang.pdf.

7 Orality in Translated and Non-Translated Fictional Dialogues

Yanfang Su and Kanglong Liu1

7.1 Introduction

Fictional dialogues are speech or conversational exchanges between (among) characters in fiction (Koivisto and Nykänen 2016; Bednarek 2018). Fictional dialogues are usually carefully scripted by the author to imitate the orality features of authentic conversations so as to shape characters, develop the storyline, and facilitate author-reader interaction.
It is acknowledged that devising fictional dialogues is a demanding task for the literary author. For translators, it is equally challenging to translate fictional dialogues because linguistic, cultural, and aesthetic considerations need to be taken into account (Ettobi 2015). The challenges posed to translators are reflected in previous research regarding how and how well the orality features can be retained in translation. Many studies reported a certain degree of unnaturalness or a reduced degree of orality in translated fictional dialogues, such as Leppihalme’s (2000) study on the translation of nonstandard language, Rosa’s (2000) analysis of diachronic changes in translating forms of address (i.e., pronouns, verbs, titles, and nouns used to address a specific speaker), and Ettobi’s (2015) research on cultural assimilation and non-assimilation in translating orality. Most of these studies are qualitative in nature, in that they relied on a small set of selected orality features to study how translated fictional dialogues deviate from the corresponding source texts. Despite some innovative findings, such a qualitative method cannot offer a holistic picture of the orality features in translated fictional dialogues, nor can it compare the similarities and differences of orality between translated and non-translated fictional dialogues. Therefore, in order to address this gap, this study utilized a corpus of representative original English fiction and a corpus of representative fiction translated from Chinese into English to examine how orality features are represented in fictional dialogues of translated and non-translated fiction. The present study extends the extant literature by adopting a multidimensional analysis approach (MDA), thus increasing the range of orality features being explored and providing more quantitative insights into this line of inquiry.
In addition, we aimed to uncover the discrepancies between translated and non-translated texts in terms of orality, hoping to gain a better understanding of the distinctive features of translated texts and offer practical suggestions for similar future research.

DOI: 10.4324/9781003298328-8

7.2 Literature Review

7.2.1 Orality of Fictional Dialogues

Orality refers to a way of dealing with “knowledge and verbalization” in oral speech (Ong 1982, 1). It is assumed that features of orality are epitomized in spontaneous face-to-face conversations (Bublitz 2017). Literary writers strive to imitate the linguistic features of authentic conversations in creating fictional dialogues. The features of fictional dialogues are thus very different from those of narration in fiction. For this reason, some scholars have challenged the traditional approach of regarding speech and narration in fiction as one register (Egbert and Mahlberg 2020). At the same time, some researchers have also explored the perceptual quality and naturalness of fictional dialogues. In particular, early studies on fictional dialogues mainly adopted a qualitative approach to describe the orality features in fictional dialogues (e.g., Short 1996; Thomas 1997, 2002). For example, Ferguson (1998) analyzed the use of dialect in Dickens’s Bleak House, Brontë’s Wuthering Heights, and Hardy’s Tess of the d’Urbervilles. By carefully examining the characters’ sociocultural background and the historical settings, she argued that the use of dialect in Victorian novels was inconsistent and deviated from readers’ expectations of genuine conversations. Recently, corpus-based quantitative approaches and statistical analyses have been used to analyze the orality features of fictional dialogues.
For instance, Quaglio (2009) made use of two corpora and utilized multidimensional analysis and the log-likelihood test to compare the linguistic features and the corresponding functions of fictional dialogues and authentic conversations. Jucker (2021) compared the orality features of performed fictional dialogues and spontaneous conversations by making use of five large-scale corpora and quantitatively analyzed the frequency distribution of common inserts and contractions, which are believed to characterize orality. He found that the scripted fictional dialogues used orality features less frequently than the unscripted conversations. Both corpus-based quantitative investigations and qualitative descriptions revealed that scripted fictional dialogues, although carefully contrived, shared some similarities with written texts (Ikeo 2019; Jucker 2021) and still diverged from unscripted spontaneous conversations or impromptu speeches in many respects (Short 1996; Bublitz 2017).

Another strand of research strives to understand orality in fictional dialogues and its perceptive functions in fiction. At the microlevel, the orality features can facilitate the communicative purposes of fictional dialogues to reflect the state of mind of the characters (Leech and Short 2007; Koivisto and Nykänen 2016) and promote the development of the plot (Locher and Jucker 2021). Moreover, the identity of characters and the power hierarchies in the fictional world also emerge through intersubject interactions (Bucholtz and Hall 2005; Holmes and Wilson 2017). Therefore, the orality features of fictional dialogues, including conversation structures, syntactic characteristics, wording, spelling, and tone, are carefully designed by the author to offer important cues to the age, gender, region, ethnicity, and social status of the characters (Locher and Jucker 2021).
At the macrolevel, orality features can help promote the interaction between the author and the readers. Specifically, the author reconstructs the activity and imparts the contextual information to the readers through fictional dialogues (Locher and Jucker 2021). The readers follow the logical progression of the novel and take the initiative to portray the characters in the sociocultural context of the novel with the help of the orality features (Nykänen and Koivisto 2016). Bublitz (2017) noted that although the orality features were reduced in fictional dialogues, the readers managed to create meanings and contexts through interacting with the dialogues. In brief, orality as represented in fictional dialogues plays an important part in constructing the fictional world. However, most of the studies still adopted a deductive method by analyzing a limited range of orality features in some extracts of fictional dialogues (Jucker 2021). In this regard, we believe that a more inductive corpus-based analysis of a wider variety of features can offer better insights into this line of inquiry.

7.2.2 Translating Orality of Fictional Dialogues

In view of the complicated linguistic features and important functions of fictional dialogues, translating fictional dialogues in a natural-sounding and culturally appropriate way poses unique challenges for translators. Since the cultural relations between the source texts and the translated texts are dissimilar in various aspects (Ettobi 2015), translating the social and cultural values connoted in orality features of fictional dialogues is a challenging and sometimes even impossible task (Tiittula and Nuolijärvi 2016; Newmark 1987). Such an argument is supported by research findings that translations of fictional dialogues often seem to fail to reproduce the orality features effectively.
For example, Leppihalme (2000) analyzed how translators dealt with nonstandard language related to regionalism in literary dialogues. She found the law of growing standardization (Toury 2012) (i.e., translation tends to lose its source language features and variations but instead conforms to target language conventions) is dominant in the translation, in addition to other strategies such as domestication, compensation, addition, and foreignization. However, the use of many strategies further led to a loss of features that could have distinguished the author’s literary works and reduced the traits of the characters’ social status. Rosa (2000) analyzed the diachronic changes in Portuguese translations of the forms of address in Robinson Crusoe. She found that the power relationship between Robinson and Friday was distorted in some translated versions, such as the three versions in the 1980s and 1990s. She further elucidated that the changes in translation were a negotiation of the source text, the target text, and the developing translation norms. In 2015, Rosa compared some examples of dialogues extracted from the original version and the translated version of Charles Dickens’s Oliver Twist. She found that many nonstandard usages of English were obliterated or even standardized in the translated version, and consequently, the discursive representation of otherness was totally wiped out. While the aforementioned researchers solely focused on the translated fictional dialogues, Arhire (2019) compared the use of lexical emphasis and ellipsis between the translated Romanian fictional dialogues and their English originals; he argued that untranslatability occurred occasionally due to the structural differences between the two languages, which further led to an underrepresentation of emotions and reduced identity-shaping power of the dialogues.
Despite some interesting findings, most studies in this field are largely descriptive in nature and based on some representative excerpts extracted from the novels. To sum up, though great efforts have been made in investigating the orality features of translated fictional dialogues, most studies are still based on purely qualitative methods to analyze examples selected from one particular work of fiction. Besides, the choice of orality features varies from one study to another, which has not only created problems for the generalizability of findings, as the selected language features might not be adequate to distinguish one text from another (Xiao 2009; Biber 2014), but also restricted the pursuit of further scholarly investigations. Overall, in this field of research, quantitative evidence is still lacking, leaving the findings and conclusions resting heavily on the insight of individual researchers. In other words, studies based on a corpus of representative fictional dialogues remain relatively scarce. In addition, much of the extant literature on orality features in fictional dialogues focused solely on either original texts or translated texts, and it is unclear whether differences exist between translated and non-translated fictional dialogues (Nevalainen 2004). Of the few studies comparing translated and non-translated fictional dialogues, most centered on translations between European languages. For example, Nevalainen (2004) utilized the Corpus of Translated Finnish to examine the colloquial features of translated texts, that is, nonstandardized spelling and wording. Nevertheless, investigations based on language pairs with distant genetic relationships, such as English and Chinese, might yield more fruitful results. In view of the limitations of the research methods of previous studies, we propose using MDA to compare the multiple linguistic features of different text types.
7.2.3 The Multidimensional Analysis Approach and Studies on Orality

The multidimensional analysis approach (MDA) was originally proposed and developed by Biber (1988) to identify, interpret, and compare the “co-occurrence” patterns of certain linguistic features in corpora and the reflected “shared functions” (Biber et al. 2002, 14). Biber (1988) analyzed the register variation of English using a batch of linguistic features. Six dimensions proved effective in discriminating among the different registers, that is, (1) involved vs. informational language, (2) narrative vs. non-narrative language, (3) elaborated vs. situation-dependent discourse, (4) overt expression of persuasion, (5) abstract vs. non-abstract discourse, and (6) online informational elaboration. Biber’s (1988) proposal of the MDA model was epoch-making (Biber et al. 2002). First, it is corpus-based, making analysis of a large number of representative texts possible. The use of computational tools also facilitates the thorough quantitative analysis of a wide range of linguistic features, ensuring more accurate and consistent results. In addition, the same computational tools or corpus data can be applied and replicated in different studies, which can further strengthen the generalizability of research findings. Since its introduction, the MDA model has prompted subsequent researchers to adopt it in a variety of studies. One of the prominent research strands is the study of orality. For example, Biber et al. (2002) compared speech and writing in academic discourse and found that spoken and written texts contrasted remarkably in dimensions 1, 2, 3, and 5, with some variations in disciplines. Quaglio (2009) utilized dimension 1 of MDA to compare the language of television dialogues used in the situation comedy Friends and the language of natural conversations.
He found that the television dialogues most resembled the linguistic features in the involved registers proposed in Biber’s (1988) study, indicating the endeavors of scriptwriters and actors to mimic natural conversations. Jonsson (2015) compared the linguistic features of synchronous and super-synchronous computer-mediated communication (CMC) with oral conversations using MDA. He found that oral and written texts contrasted notably in dimension 1, dimension 3, and dimension 5 of Biber (1988), with dimension 1 being the most significant one. Following the methodology of Jonsson (2015), Biber and Egbert (2020) compared the orality of searchable web registers with face-to-face conversations and found that the searchable web varies in terms of registers and the interactive registers are barely represented in this discourse domain. Xiao (2009) further developed the MDA approach by adding more semantic features, which resulted in a total of nine dimensions comprising 141 linguistic features. He used the new model to compare the register variation in five varieties of English. Among all the factors, the dimension which differentiated the interactive casual texts and the informative elaborate texts exhibited the most prominent contrastive power among different registers (Xiao 2009). Previous multidimensional analyses of orality show that most studies used authentic conversations as the benchmark for comparison. Besides, findings of previous studies indicate that dimension 1 of Biber’s (1988) MDA is particularly effective in characterizing the orality of texts.

7.2.4 Research Questions

In view of the research gaps revealed by the foregoing review, the present study intends to adopt a corpus-based approach to systematically compare the degree of orality in translated and non-translated fictional dialogues. Specifically, two research questions are addressed.
The first research question concerns the orality of translated and non-translated fictional dialogues at the macrolevel. The second research question further examines how orality differs between translated and non-translated fictional dialogues in specific language features.

1 Do translated fictional dialogues display a lesser degree of orality than non-translated fictional dialogues, as represented by Biber’s (1988) dimension 1?
2 If differences are identified between the two types of texts in dimension 1, in what ways do the individual linguistic features associated with dimension 1 differ between translated and non-translated fictional dialogues?

7.3 Methods

7.3.1 The Corpora

With the aim of comparing the orality of translated and non-translated fictional dialogues, we compiled a corpus of fictional dialogues with one translated subcorpus and one non-translated subcorpus. The first step of corpus compilation was the selection of high-quality and comparable translated and non-translated fiction works. To ensure the quality of the novels, we referred to the list of Time’s top 100 best novels (1923 to 2005) in selecting the original English novels and the top 100 twentieth-century Chinese novels recommended by Asia Weekly in selecting the translated Chinese novels. In addition, to make sure the translated and non-translated fictional dialogues were comparable, the publication time of the English novels and the translated novels was limited to the period from the 1970s to the 2010s. Ten original English novels and ten translated novels were selected. Then, in the second step of corpus compilation, the fictional dialogues were extracted from the novels using a self-written Python program that detected quotation marks. The fictional dialogue data were then manually checked for consistency and accuracy.
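A quotation-mark-based extraction step of the kind described above might look like the following sketch. The authors' program is not reproduced in the chapter, so the regular expression, the function names, and the sample passage are all assumptions made for illustration. Single quotation marks are deliberately ignored because they are easily confused with apostrophes, which is one reason a manual check remains necessary.

```python
import re

# Assumed pattern: text enclosed in curly or straight double quotation marks.
# Single quotes are skipped on purpose because they collide with apostrophes.
QUOTE_PATTERN = re.compile(r'“([^”]+)”|"([^"]+)"')


def extract_dialogue(text: str) -> list[str]:
    """Return the quoted stretches of `text` in order of appearance."""
    turns = []
    for match in QUOTE_PATTERN.finditer(text):
        # Only one of the two groups is filled for any given match.
        quoted = match.group(1) or match.group(2)
        cleaned = " ".join(quoted.split())  # collapse line breaks and space runs
        if cleaned:
            turns.append(cleaned)
    return turns


def count_words(turns: list[str]) -> int:
    """Whitespace-based word count, a rough stand-in for a real tokenizer."""
    return sum(len(turn.split()) for turn in turns)


# Invented sample passage mixing curly and straight quotation marks.
sample = (
    "“Where are you going?” she asked, not looking up.\n"
    'He shrugged. "Home, I suppose."'
)
dialogue = extract_dialogue(sample)
print(dialogue)               # ['Where are you going?', 'Home, I suppose.']
print(count_words(dialogue))  # 7
```

A pattern this simple will miss dialogue set in single quotation marks and will mis-handle unclosed quotations that run across paragraphs, so its output should be treated as a first pass for manual checking rather than as a finished subcorpus.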
In the end, we compiled the Fictional Dialogue Corpus, comprising one subcorpus of non-translated fictional dialogues (250,950 words) and one of translated fictional dialogues (132,516 words) (see Table 7.1 for the corpus structure).

Table 7.1 Composition of the Fictional Dialogue Corpus

Fiction | Publication Year | Word Count
Translated Fiction | | 132,516
Border Town (《邊城》) | 2009 | 7,758
Rickshaw Boy: A Novel (《駱駝祥子》) | 2010 | 9,565
Taipei People | 2000 | 15,849
The Taste of Apples (《兒子的大玩偶》) | 2001 | 18,138
The Deer and the Cauldron (《鹿鼎記》) | 2002 | 25,251
Alien Realm (《異域》) | 1996 | 3,426
Blades from the Willow (《蜀山劍俠傳》) | 1991 | 13,003
Schoolmaster (《倪煥之》) | 1978 | 20,171
Spring Peach (《春桃》) | 1995 | 2,803
Farewell to My Concubine (《霸王別姬》) | 1994 | 16,552
Non-Translated Fiction | | 250,950
American Pastoral | 1997 | 30,178
Atonement | 1987 | 4,266
Beloved | 2000 | 15,409
The Blind Assassin | 2000 | 14,286
Song of Solomon | 1977 | 29,651
Falconer | 1977 | 9,795
Gravity’s Rainbow | 1973 | 41,236
Never Let Me Go | 2005 | 18,963
Snow Crash | 1992 | 37,928
White Teeth | 2000 | 49,238

7.3.2 Linguistic Features

In the present study, the 28 linguistic features in dimension 1 of Biber’s (1988) MDA are chosen for comparing the orality of translated and non-translated fictional dialogues. The reason for such a choice is twofold. Firstly, previous studies have confirmed that dimension 1, which distinguishes “highly interactive, affective discourse produced under real-time constraints” and “highly informational discourse produced without time constraints” (Biber 1988, 135), is particularly useful in distinguishing oral from literate texts. Secondly, MDA has been established as a widely accepted analytical model with representative linguistic features. The use of this dimension together with the language features therein can not only increase the rigor of the study but also render comparison with other registers possible.
In particular, dimension 1 consists of two categories of linguistic features. One category contains features with positive loadings, meaning that a higher frequency of such features will push the texts toward interactivity and orality; the other category consists of features with negative loadings, indicating that a higher frequency of these features will render the texts more informational and literate. The positive-loadings features include amplifiers, causative adverbial subordinators, discourse particles, subordinator “that” deletion, wh-clauses, and pronoun “it,” among others. The negative-loadings features include nouns, word length, prepositional phrases, type-token ratio, and attributive adjectives. There are more positive-loadings features than negative ones, as the former are more commonly found in spoken registers, which are described as “verbal, interactional, affective, fragmented, reduced in form, and generalized in content” (Biber 1988, 105).

7.3.3 Data Analysis

To compare the orality of translated and non-translated fictional dialogues, the first step is to grammatically annotate the corpus data and extract the statistics of linguistic features needed for further quantitative analysis. To this end, the Multidimensional Analysis Tagger (MAT) (Nini 2019), which was designed to replicate Biber’s (1988) MDA, was adopted in the present study. The MAT first tags the input texts with the linguistic features proposed by Biber (1988). Then, the program automatically calculates the normalized distribution (the frequency per 100 tokens) and computes the z-scores of the linguistic features in the corpus. Subsequently, based on the z-scores of linguistic variables, the dimension scores of the input texts are also calculated. The MAT also automatically matches the input texts with the closest register (Nini 2019).
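
The three computations just described (normalization per 100 tokens, z-scores, and a dimension score) can be sketched in a few lines of Python. This is not the MAT’s code. It assumes that z-scores are taken against a set of reference means and standard deviations, and that the dimension score is the sum of the z-scores of the positive-loading features minus the sum for the negative-loading features, following the procedure usually attributed to Biber (1988). The feature names, counts, and reference values below are invented for illustration, and only three of the 28 features are shown.

```python
def normalize_per_100(count, total_tokens):
    """Frequency of a feature per 100 tokens."""
    return 100 * count / total_tokens


def z_score(value, ref_mean, ref_sd):
    """Standardize a normalized frequency against reference statistics."""
    return (value - ref_mean) / ref_sd


def dimension_score(z_scores, positive, negative):
    """Sum of positive-loading z-scores minus sum of negative-loading z-scores."""
    return sum(z_scores[f] for f in positive) - sum(z_scores[f] for f in negative)


# Invented raw counts for one text of 1,000 tokens.
total_tokens = 1000
counts = {"private_verbs": 30, "contractions": 40, "nouns": 150}

# Invented reference (mean, standard deviation) pairs, NOT Biber's values.
reference = {
    "private_verbs": (2.0, 1.0),
    "contractions": (1.5, 2.0),
    "nouns": (18.0, 3.0),
}

z_scores = {}
for feature, count in counts.items():
    mean, sd = reference[feature]
    z_scores[feature] = z_score(normalize_per_100(count, total_tokens), mean, sd)

score = dimension_score(
    z_scores,
    positive=["private_verbs", "contractions"],  # features loading toward orality
    negative=["nouns"],                          # features loading toward literacy
)
print(z_scores, score)
```

Because the negative-loading features are subtracted, a text with few nouns relative to the reference contributes positively to the score, which is why a higher dimension 1 score is read as a higher degree of orality.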
The output of the MAT analysis includes the normalized frequency and the z-scores of the individual linguistic features, the dimension scores of the input texts, a dimension graph, and a text-type graph.

After pre-processing the corpus data and obtaining the statistics of the linguistic features, quantitative analyses were conducted to compare the degree of orality in translated and non-translated fictional dialogues. We first conducted the one-sample Kolmogorov-Smirnov test and Levene’s test to check the assumptions of normal distribution and homogeneous variance (Larson-Hall 2015). The alpha level was set at .05 for this study. The results of the one-sample Kolmogorov-Smirnov test indicated that the dimension scores of both the translated fictional dialogues (p = .20) and the non-translated fictional dialogues (p = .20) were normally distributed. Levene’s test showed that the two groups of data had equal variances (p = .77). As the assumptions were fulfilled, the independent samples t-test was conducted to examine whether the translated and non-translated fictional dialogues differ in the score of dimension 1 (RQ1). To get a more holistic picture of the overall degree of orality in translated and non-translated fictional dialogues, the dimension scores of various text types used by Biber (1988) were also given as references.

In addition to the independent samples t-test, the normalized frequency scores of individual linguistic features in the translated and non-translated fictional dialogues were also compared to reveal how the two text types differ in these features. The Mann-Whitney U test was utilized, as certain linguistic features did not fulfil the assumptions of normality or equality of variances (RQ2). The effect size of the features exhibiting significant differences (p < .05) was also calculated. The features that distinguish the two text types are discussed in detail with qualitative examples.
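
The two group comparisons can be illustrated with a short, dependency-free Python sketch. It is not the authors’ procedure, which is not specified beyond the test names. The sketch assumes a pooled-variance t statistic, a normal approximation of U without tie correction, and the common convention r = |Z| / √N for the effect size; the chapter does not state which effect-size formula was used, so that formula in particular is an assumption. The function names are hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic (equal variances assumed) from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


def mann_whitney_u(x, y):
    """Smaller of the two U statistics; tied pairs count as half."""
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    return min(u_x, len(x) * len(y) - u_x)


def u_to_z(u, n1, n2):
    """Normal approximation of U (no tie correction, so tied data give slightly different values)."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u


def effect_size_r(z, n_total):
    """Assumed convention: r = |Z| / sqrt(N)."""
    return abs(z) / math.sqrt(n_total)


# Summary statistics as reported in Table 7.2 (non-translated vs. translated).
t, df = pooled_t(22.04, 6.80, 10, 15.62, 7.13, 10)

# U as reported for private verbs in Table 7.3, with ten texts per group.
z = u_to_z(2, 10, 10)
r = effect_size_r(z, 20)

# Toy data with complete separation between the groups.
toy_u = mann_whitney_u([5, 6, 7], [1, 2, 3])
print(round(t, 2), df, round(z, 2), round(r, 2), toy_u)
```

Under these assumptions the sketch gives t ≈ 2.06 with df = 18 and, for private verbs, Z ≈ -3.63, in line with the values reported in the Results section. For features with tied ranks, the uncorrected approximation deviates slightly from the tabled Z values.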
7.4 Results

7.4.1 Overall Dimension Scores

Table 7.2 presents the mean score and the standard deviation on dimension 1, as well as the closest genre, of translated and non-translated fictional dialogues. As shown in Table 7.2, both translated and non-translated fictional dialogues received a positive score on dimension 1. Although the mean score of non-translated fictional dialogues (M = 22.04, SD = 6.80) was higher than that of translated fictional dialogues (M = 15.62, SD = 7.13), the independent samples t-test showed that differences between the two text types were only marginally significant in terms of the overall score of dimension 1 (t = 2.06, p = .054, df = 18). Rather than being taken to mean that the two text types are not statistically different from each other, such a marginally significant result needs to be treated with caution. One possible explanation for such a result might be the small sample size in both groups (Huck 2011), that is, only ten translated novels and ten non-translated novels were involved in the analysis.

Table 7.2 Score of Dimension 1

Text Type | N | Mean Score | Std. Deviation | Closest Genre
Non-translated | 10 | 22.04 | 6.80 | Personal Letters
Translated | 10 | 15.62 | 7.13 | Personal Letters

Figure 7.1 Scores for dimension 1 of different registers.

Figure 7.1 shows the spread of scores of different registers regarding dimension 1, in which the degree of orality of translated and non-translated fictional dialogues is illustrated together with related registers. The registers for comparison include face-to-face conversations, broadcasts, prepared speeches, personal letters, general fiction, press reportage, academic prose, and official documents (note the statistics are taken from Biber [1988]). The dots in the middle represent the mean dimension score of the register, and the upper and lower whiskers show the dispersion of the scores.
As indicated in Figure 7.1, written registers, like official documents, press reportage, and academic prose, receive negative mean scores. Broadcasts and general fiction also exhibit negative mean scores, but the variation of the scores for these two registers is large, which is possibly due to the influence of sub-genres (Biber 1988). The rest of the registers, including conversations, prepared speeches, personal letters, and translated and non-translated fictional dialogues, received positive mean scores. Among all the registers receiving a positive mean score, prepared speeches score the lowest, face-to-face conversations the highest, and personal letters, translated fictional dialogues, and non-translated fictional dialogues fall in between. From a functional perspective, the positive scores suggest that these text types are more involved and interactive in nature, and the spread of the dimension scores indicates their differing tendencies toward orality. The mean scores for both translated and non-translated fictional dialogues are higher than that of prepared speeches, lower than the mean score of face-to-face conversations, and similar to that of personal letters. This shows that both translated and non-translated fictional dialogues are characterized by a relatively high degree of orality: they contain fewer informational features than prepared speeches but fewer orality features than face-to-face conversations. In addition, compared with the large variation of face-to-face conversations, the variation of fictional dialogues is smaller, indicating that the scripted fictional dialogues are relatively narrower in linguistic features than authentic conversations. Finally, it should also be noted that although both fictional dialogues resemble personal letters regarding mean scores, the variation of fictional dialogues is larger.
This shows that fictional dialogues display a lower degree of internal consistency regarding orality than personal letters.

To sum up, translated fictional dialogues contain fewer orality features than their non-translated counterparts. However, the overall dimension scores indicate that both text types display a similarly high degree of orality. Based on the dimension scores, fictional dialogues show great similarity with personal letters but are not directly comparable to face-to-face conversations.

7.4.2 Distribution of Linguistic Features

As indicated by the results of the independent samples t-test, translated fictional dialogues did not differ significantly from non-translated fictional dialogues regarding the overall degree of orality. However, since the difference was marginally significant (p = .054), a closer look at the distribution of individual linguistic features might yield more insights into the similarities and disparities between the two text types. Therefore, the Mann-Whitney U test was conducted to compare the distribution of individual linguistic features between the two text types. Table 7.3 shows the results of the Mann-Whitney U test, including the mean rank differences, the Mann-Whitney U, the z-score, and the p-value. The mean rank differences reveal the discrepancy of individual features between the two text types. A p-value smaller than .05 indicates that the linguistic feature is significantly different between the two text types. As we can see from Table 7.3, 17 out of 28 linguistic features are not significantly different between the two text types.
The translated and non-translated fictional dialogues receive similar scores regarding personal pronouns (including first-person pronouns and second-person pronouns), questions (direct wh-questions), present tense, sentence relatives, independent clause coordinators, be as a main verb, amplifiers, emphatics, contractions, possibility modals, and analytic negation. In addition, both translated and non-translated fictional dialogues receive similar negative scores regarding the other group of features that are representative of informational texts, like nouns (excluding nominalization and gerunds), word length, prepositional phrases, attributive adjectives, and type-token ratio.

Table 7.3 Results of Mann-Whitney U-Test

Feature | Mean Rank Diff.a | Mann-Whitney U | Z | Sig. (2-Tailed)
Private verbs | 9.6 | 2 | -3.63 | <0.001**
Wh-clauses | 8.5 | 7.5 | -3.216 | 0.001**
Hedges | 7.9 | 10.5 | -2.995 | 0.003**
Pronoun it | 7.2 | 14 | -2.722 | 0.006**
Subordinator that deletion | 6.6 | 17 | -2.496 | 0.013**
Stranded preposition | 6.3 | 18.5 | -2.386 | 0.017**
Indefinite pronouns | 6.2 | 19 | -2.35 | 0.019**
Discourse particles | 5.7 | 21.5 | -2.16 | 0.031*
Demonstrative pronouns | 5.5 | 22.5 | -2.083 | 0.037*
Pro-verb do | 5.4 | 23 | -2.043 | 0.041*
Causative adverbial subordinators | 5.3 | 23.5 | -2.007 | 0.045*
Present tense | 4.4 | 28.0 | -1.663 | 0.096
Analytic negation | 4.0 | 30.0 | -1.512 | 0.130
Contractions | 3.8 | 31.0 | -1.436 | 0.151
Total prepositional phrases | -3.1 | 34.5 | -1.173 | 0.241
Amplifiers | 2.6 | 37.0 | -0.984 | 0.325
Word length | -2.3 | 38.5 | -0.870 | 0.384
Attributive adjectives | -2.2 | 39.0 | -0.832 | 0.406
Emphatics | -2.1 | 39.5 | -0.794 | 0.427
Be as main verb | 2.0 | 40.0 | -0.756 | 0.450
Possibility modals | -1.1 | 44.5 | -0.416 | 0.677
Sentence relatives | 0.8 | 46.0 | -0.311 | 0.756
First-person pronouns | 0.8 | 46.0 | -0.302 | 0.762
Second-person pronouns | -0.7 | 46.5 | -0.265 | 0.791
Type-token ratio | 0.2 | 49.0 | -0.076 | 0.940
Direct wh-questions | 0.2 | 49.0 | -0.076 | 0.940
Total other nouns | -0.2 | 49.0 | -0.076 | 0.940
Independent clause coordination | 0.1 | 49.5 | -0.038 | 0.970

a Mean rank diff. = mean rank (non-translated fictional dialogues) – mean rank (translated fictional dialogues). ** Large effect size (r > 0.5). * Medium effect size (0.5 > r ≥ 0.3).

On the other hand, the Mann-Whitney U test also identifies 11 (out of 28) features that exhibit significantly different distributions in translated and non-translated fictional dialogues: causative adverbial subordinators, demonstrative pronouns, discourse particles, hedges, indefinite pronouns, pronoun it, private verbs, pro-verb do, stranded preposition, subordinator that deletion, and wh-clauses. The mean rank differences of these 11 features are positive, meaning that the normalized frequency of these features, which are positively correlated with orality, is higher in non-translated than in translated fictional dialogues. Among the 11 features, 7, that is, private verbs, wh-clauses, hedges, pronoun it, subordinator that deletion, stranded preposition, and indefinite pronouns, exhibit a large effect size. The remaining 4 features, namely discourse particles, demonstrative pronouns, pro-verb do, and causative adverbial subordinators, have a medium effect size.

Specifically, among the 11 significantly different features, one group of features is related to attitudinal or interpersonal expressions, which are overrepresented in non-translated fictional dialogues. In comparison, personal feelings or attitudes are relatively underrepresented in the translated fictional dialogues. The feature with the largest effect size is private verbs (e.g., feel, perceive), which are symbolic of attitudinal or affective expressions and indicative of interpersonal communication. Private verbs are also one of the features that have the strongest power to distinguish involved from informational texts.
The non-translated fictional dialogues use significantly more private verbs than translated ones, revealing that the characters express their personal feelings and thinking more explicitly in the former. For instance, in a typical example from the non-translated novel Atonement, the character uses private verbs to express personal ideas or feelings in a series of consecutive sentences.

You’d be forgiven for thinking me mad wandering into your house barefoot, or snapping your antique vase. The truth is, I feel rather lightheaded and foolish in your presence, Cee, and I don’t think I can blame the heat! Will you forgive me? – Robbie

In addition to private verbs, the more frequent use of causative adverbial subordinators in non-translated dialogues also suggests an overrepresentation of attitudinal or affective expressions in this text type. For example, in the original English novel American Pastoral, “because” is frequently used and emphasized (in the form of “b-because” and “b-b-because”) to express the strong emotions of the character when s/he is arguing with another person.

Another feature that is typical of interactive texts is wh-clauses, which function as “structural elaboration” and provide a way to “talk about questions” (Biber 1988, 220). Wh-clauses are more prevalent in conversations and speeches. Likewise, the non-translated fictional dialogues exhibited a higher frequency of wh-clauses in comparison to translated ones.

Another typical group of features in non-translated fictional dialogues is impersonal pronouns, including pronoun it, indefinite pronouns, and demonstrative pronouns. These pronouns are used as general referents as they carry limited information. Such linguistic features are often associated with a lack of careful thinking, thus featuring one of the typical traits of spoken texts.
The higher representation of pronouns and underuse of nominal referents in non-translated fictional dialogues reveal a stronger degree of uncertainty typical of real-time conversations. Therefore, in this aspect, the non-translated fictional dialogues are less informational and share more similarities with real-time conversations than translated fictional dialogues. Besides impersonal pronouns, the underrepresentation of hedges (e.g., maybe, possibly, kind of) in translated fictional dialogues also implies that translated fictional dialogues carry a higher degree of perceptual certainty. In the example extracted from the non-translated novel Beloved, the character uses the pronoun it to refer to the “ghost” that she is unsure about, and the hedge maybe further highlights the uncertainty. Then another character resolves her doubt by first using it to refer to the ghost, as the first character did, and then shifting from it to she when referring to the ghost.

I don’t know about lonely, Mad, maybe, but I don’t see how it could be lonely spending every minute with us like it does. Must be something you got it wants. It’s just a baby. My sister, she died in this house.

Using do as a pro-verb, a distinctive feature of oral conversations, is also typical of non-translated texts. The word do is polysemous and can be used as a general verb in different contexts. The overuse of such a feature in the non-translated subcorpus implies a reduced information density and enhanced orality in non-translated fictional dialogues. In comparison, translated texts prefer precise wording to the general verb do.

Moreover, compared with translated fictional dialogues, non-translated fictional dialogues preferred more reduced forms, represented by the frequent omission of subordinator that and the more frequent use of stranded prepositions.
On the other hand, translated fictional dialogues preserve the subordinator that more frequently. For example, in the translated novel Border Town, the translator chose to retain the subordinator that when the character was referring to other people’s ideas in the utterance.

No. 2, my Cuicui tells me that one night during the last month she had a dream. It was strange. She said that in her dream someone’s songs floated her up to the bluffs across the creek, where she picked a handful of saxifrage!

In comparison, in the non-translated novel The Blind Assassin, subordinator that is often omitted, as evidenced in both of the following sentences.

They said it was a matter of costs. After the button factory was burned, they said it would take too much to rebuild it.

In addition, stranded prepositions are underrepresented in translated, while overrepresented in non-translated, fictional dialogues. For example, in one of the non-translated novels, Snow Crash, the sentence “But you know that bug you were talking to earlier?” contains the stranded preposition “to,” which is separated from its nominal. Such a feature is typical of orality, whereas the non-stranded counterpart is representative of formal discourse. Clearly, the non-translated fictional dialogues display a tendency toward the spoken end of the cline compared with the translated ones.

Discourse particles can serve different pragmatic functions (Aijmer 2002) and are used to express the attitudes and beliefs of speakers regarding the propositional content of an utterance. As such, they also help to maintain textual coherence, especially when the text is fragmented. Two extracts taken respectively from the translated and non-translated texts are used to illustrate the interesting distinction between the two text types.
In both examples, the character interacts with the other character in a bad mood and expresses his/her idea about what the other character has told him/her. In the translated novel Schoolmaster, the speaker expresses his disagreement by directly putting forward his suggestion using a rhetorical question.

Why not let him get on with it? If it’s something simple that one can manage oneself, there’s no point in troubling someone else to do it.

In comparison, in the non-translated novel Snow Crash, “oh” and “well” are utilized to express the speaker’s discontent and signal the confrontational situation. From the conversation, it can be inferred that the speaker is not satisfied with the answer given by the other speaker. The use of discourse particles clearly indicates the speaker’s displeasure or even indignation. Also, using the discourse particles has, to some extent, mitigated the face-threatening situation caused by the rhetorical question.

“Where do you want to go on the Kowloon?”
“The Raft.”
“Oh, well, why didn’t you say so, that’s where our other passenger is going.”

The underuse of discourse particles in translated fictional dialogues as opposed to non-translated ones suggests that translations might lack the authenticity and naturalness of face-to-face conversations compared to the originals.

In summary, although the difference between translated and non-translated fictional dialogues in dimension scores is only marginally significant, the two text types differ significantly in the distribution of various individual features. Particularly, in comparison with non-translated fictional dialogues, the translated fictional dialogues are characterized by an underrepresentation of personal attitudes and emotions, an underuse of discourse particles, and more complete and precise expressions.
7.5 Discussion

This study reveals the similarities and differences in orality between translated and non-translated fictional dialogues by making use of the multidimensional analysis approach. By treating fictional dialogue as a genre in its own right, we have arrived at some interesting findings that might otherwise remain undetected if fiction is treated as one single genre. In this study, it is found that fictional dialogue shares more similarities with personal letters but nonetheless still exhibits a considerable degree of orality. Fictional dialogue, both translated and non-translated, does not resemble general fiction, as reflected by the overall dimension scores, which confirms the proposal of previous researchers that fictional dialogues and narration are indeed two different genres that should be analyzed separately (Axelsson 2009). One possible explanation might be that fictional dialogues are scripted texts that are artfully created to simulate real-life conversations representative of the sociocultural background of the characters (Bublitz 2017; Jucker 2021). The findings of the present study also corroborate Bednarek (2018) and Jucker (2021) in showing that the scripted language of fiction displays different features from unscripted conversations regarding orality and thus can never be the same as spontaneous conversations. Notwithstanding the efforts to model and reproduce real-life conversations (Leech and Short 2007), as argued by Chaume (2007, 215), the scripted language is “very normative indeed.”

The marginally significant difference between translated and non-translated fictional dialogues suggests that the two text types still differ considerably. Such differences are supported by the discrepant normalized frequency distribution of individual linguistic features between the two text types.
This is in line with the findings of Brodovich (1997) that translations differ from originals in their portrayal of characters speaking nonstandard language. Such a difference is reflected in vocabulary as well as grammar features, which can partly be attributed to the translators’ efforts to standardize translated texts. As far as the current study is concerned, the omission of subordinator that and stranded prepositions are found less often in translated fictional dialogues, indicating that translated language favors more standardized structures over reduced or fragmented forms. The quantitative findings of the present study also corroborate the qualitative findings of previous research that translated fictional dialogues tend toward standardization of language use (e.g., Read 2013; Tiittula and Nuolijärvi 2016; Nevalainen 2004). In addition, some distinct features of translated fictional dialogues might also be related to translators’ decision to explicitate the source text (Blum-Kulka 1986/2000). For example, expressions that indicate vagueness and uncertainties, such as hedges, do as a pro-verb, pronoun it, indefinite pronouns, and demonstrative pronouns, are often underused in translated fictional dialogues. The findings give clear support for the worries of Ben-Shahar (1994) that translators prefer more specific lexemes and explicit verbalization to generalized or uncertain expressions.

Another possible explanation for the differences between translated and non-translated fictional dialogues might be the influence of the source language. For instance, the present study contradicts Nevalainen (2004), who found that translators frequently used interjections and speech fillers to retain orality in the translation. In the current study, we found that non-translated texts used discourse particles at a higher frequency than translated texts, suggesting that the translations might be subject to unnaturalness and incoherence.
One possible reason for the divergent findings might be the influence of the source language. As Liu (2013) found, people with different first languages might have different ways of using discourse particles. In the present study, the source language is Chinese, while in Nevalainen’s (2004) study, the fictional dialogues were translated from Finnish. Source language clearly has a role to play in the translation of fictional dialogues. The source texts written in different languages might have a different proportion of attitudinal or affective expressions, which are then transferred to the translated texts. As argued by Bishop (1956), compared to Western fiction, emotions in Chinese fiction tend to be implicitly expressed and often conveyed through the narrator’s voice rather than the fictional dialogues. Consequently, when the translators follow the source language norm by opting for a more faithful approach, it is natural that emotional and affective language might be underrepresented in the translated fictional dialogues.

Despite the influence of source language, it should also be noted that the pragmatic functions of certain linguistic features might also be lost in the translation process of standardization or explicitation. The reduced degree of orality might also result in unnaturalness and lack of spontaneity in translated fictional dialogues (Ben-Shahar 1994). As fictional characters who come from different sociocultural backgrounds can be portrayed to exhibit divergent characteristics of speech (Locher and Jucker 2021), the degree of orality should be treated with extra attention in translated fiction, which conveys the source sociocultural backgrounds in the target language.
So far as the current study is concerned, the relatively lower degree of orality regarding certain linguistic features and the tendency toward standardization and explicitation in translated fictional dialogues, as warned by Tiittula and Nuolijärvi (2016), can influence the shaping of characters and even misrepresent the relationship between characters intended by the author. We suspect that, like practitioners of other types of translation, translators of fictional dialogues are also trapped in a dilemma of either employing a more “literate” approach to produce more faithful but less “authentic” fictional dialogues or opting for a more “adaptational” approach to render less faithful but more “natural” dialogues.

7.6 Conclusion

The current study has used multidimensional analysis to examine the orality features in translated and non-translated fictional dialogues. In comparison to other models, the consistency and perceived robustness of this model have greatly increased the generalizability of the research findings. Our study has found that translation, as an important variable, has played a crucial role in affecting the profiling of translated fictional dialogues, which differ significantly from non-translated ones in a range of language features.

Notwithstanding the interesting findings, we acknowledge that some limitations exist in the present study. The analysis has concentrated on fiction works that were published or translated from the 1970s to the early twenty-first century. Since translation is influenced by negotiation between sociocultural powers and the prevalent translation norms (Rosa 2000), future studies could compile a bigger corpus by including more fiction works for analysis. Another limitation arises from the nature of translation.
The findings of this study are restricted to the design of the comparable corpus comprised of translated and non-translated fictional dialogues without referring to the source texts; therefore, the influence of the source texts on the orality features of translation remains unknown. Future studies can be conducted to examine to what extent the differences in orality between these two text types are a result of translation or source language influence. In this regard, the use of a composite bilingual corpus (Laviosa 2006, 268) integrating both comparable and parallel corpora can be fruitfully utilized to explore a number of interrelated variables in translated fictional dialogues.

Note

1 Corresponding author.

References

Aijmer, Karin. 2002. English Discourse Particles: Evidence from a Corpus (Vol. 10). Amsterdam/Philadelphia: John Benjamins Publishing.
Arhire, Mona. 2019. “Lexical Emphasis in the Literary Dialogue: A Translational Perspective.” Acta Universitatis Sapientiae, Philologica 11, no. 3: 105–18.
Axelsson, Karin. 2009. “Research on Fiction Dialogue: Problems and Possible Solutions.” In Corpora: Pragmatics and Discourse, edited by Andreas H. Jucker, Daniel Schreier, and Marianne Hundt, 189–201. Leiden: Brill.
Bednarek, Monika. 2018. Language and Television Series: A Linguistic Approach to TV Dialogue. Cambridge: Cambridge University Press.
Ben-Shahar, Rina. 1994. “Translating Literary Dialogue: A Problem and Its Implications for Translation into Hebrew.” Target. International Journal of Translation Studies 6, no. 2: 95–121.
Biber, Douglas. 1988. Variation across Speech and Writing. Cambridge: Cambridge University Press.
Biber, Douglas. 2014. “The Ubiquitous Oral Versus Literate Dimension: A Survey of Multidimensional Studies.” In Measured Language: Quantitative Studies of Acquisition, Assessment, and Variation, edited by Jeffrey Connor-Linton and Luke W. Amoroso, 1–20.
Washington, DC: Georgetown University Press.
Biber, Douglas, Susan Conrad, Randi Reppen, Pat Byrd, and Marie Helt. 2002. “Speaking and Writing in the University: A Multidimensional Comparison.” TESOL Quarterly 36, no. 1: 9–48.
Biber, Douglas, and Jesse Egbert. 2020. “Orality on the Searchable Web: A Comparison of Involved Web Registers and Face-to-Face Conversation.” In Voices Past and Present Studies of Involved, Speech-related and Spoken Texts. In Honor of Merja Kytö, edited by Ewa Jonsson and Tove Larsson, 317–36. Amsterdam: John Benjamins.
Bishop, John L. 1956. “Some Limitations of Chinese Fiction.” The Far Eastern Quarterly 15, no. 2: 239–47.
Blum-Kulka, Shoshana. 1986/2000. “Shift of Cohesion and Coherence in Translation.” In The Translation Studies Reader (2nd ed.), edited by Lawrence Venuti and Mona Baker, 298–313. London and New York: Routledge.
Brodovich, Olga I. 1997. “Translation Theory and Non-standard Speech in Fiction.” Perspectives: Studies in Translatology 5, no. 1: 25–31.
Bublitz, Wolfram. 2017. “Oral Features in Fiction.” In Pragmatics of Fiction, edited by M. A. Locher and A. H. Jucker, 235–64. Berlin: de Gruyter.
Bucholtz, Mary, and Kira Hall. 2005. “Identity and Interaction: A Sociocultural Linguistic Approach.” Discourse Studies 7, no. 4–5: 585–614.
Chaume, Frederic. 2007. “Dubbing Practices in Europe: Localisation Beats Globalisation.” Linguistica Antwerpiensia 6: 201–17.
Egbert, Jesse, and Michaela Mahlberg. 2020. “Fiction – One Register or Two? Speech and Narration in Novels.” Register Studies 2, no. 1: 72–101.
Ettobi, Mustapha. 2015. “Translating Orality in the Postcolonial Arabic Novel: A Study of Two Cases of Translation into English and French.” Translation Studies 8, no. 2: 226–40.
Ferguson, Susan L. 1998. “Drawing Fictional Lines: Dialect and Narrative in the Victorian Novel.” Style 32, no. 1: 1–17.
Holmes, Janet, and Nick Wilson. 2017. An Introduction to Sociolinguistics. London: Routledge.
Huck, Schuyler W. 2011. Reading Statistics and Research. London: Pearson Education.
Ikeo, Reiko. 2019. “‘Colloquialization’ in Fiction: A Corpus-Driven Analysis of Present-Tense Fiction.” Language and Literature: International Journal of Stylistics 28, no. 3: 280–304.
Jonsson, Ewa. 2015. Conversational Writing: A Multidimensional Study of Synchronous and Supersynchronous Computer-Mediated Communication. Bern: Peter Lang.
Jucker, Andreas H. 2021. “Features of Orality in the Language of Fiction: A Corpus-based Investigation.” Language and Literature 30, no. 4: 341–60.
Koivisto, Aino, and Elise Nykänen. 2016. “Introduction: Approaches to Fictional Dialogue.” International Journal of Literary Linguistics 5, no. 2: 1–14.
Larson-Hall, Jennifer. 2015. A Guide to Doing Statistics in Second Language Research Using SPSS and R. New York and London: Routledge.
Laviosa, Sara. 2006. “Data-Driven Learning for Translating Anglicisms in Business Communication.” IEEE Transactions on Professional Communication 49, no. 3: 267–74.
Leech, Geoffrey N., and Mick Short. 2007. Style in Fiction: A Linguistic Introduction to English Fictional Prose (No. 13). London: Pearson Education.
Leppihalme, Ritva. 2000. “The Two Faces of Standardization: On the Translation of Regionalisms in Literary Dialogue.” The Translator 6, no. 2: 247–69.
Liu, Binmei. 2013. “Effect of First Language on the Use of English Discourse Markers by L1 Chinese Speakers of English.” Journal of Pragmatics 45, no. 1: 149–72.
Locher, Miriam A., and Andreas H. Jucker. 2021. The Pragmatics of Fiction: Literature, Stage and Screen Discourse. Edinburgh: Edinburgh University Press.
Nevalainen, Sampo. 2004. “Colloquialisms in Translated Text. Double Illusion?” Across Languages and Cultures 5, no. 1: 67–88.
Newmark, Peter. 1987. A Textbook of Translation. Hoboken, NJ: Prentice-Hall International.
Nini, Andrea. 2019. “The Multi-Dimensional Analysis Tagger.” In Multi-Dimensional Analysis: Research Methods and Current Issues, edited by T.
Berber Sardinha and M. Veirano Pinto, 67–94. London and New York: Bloomsbury Academic.
Nykänen, Elise, and Aino Koivisto. 2016. “Fictional Dialogue and the Construction of Interaction in Rosa Liksom’s Short Stories.” International Journal of Literary Linguistics 5, no. 2: 1–30.
Ong, Walter J. 1982. Orality and Literacy: The Technologizing of the Word. London: Methuen.
Quaglio, Paulo. 2009. Television Dialogue: The Sitcom Friends vs. Natural Conversation. Amsterdam and Philadelphia, PA: John Benjamins.
Read, Andrew. 2013. “Translating and Adapting Fictional Speech: The Case of Philip Pullman’s Northern Lights.” PhD diss., University of Manchester.
Rosa, Alexandra Assis. 2000. “The Negotiation of Literary Dialogue in Translation: Forms of Address in Robinson Crusoe Translated into Portuguese.” Target. International Journal of Translation Studies 12, no. 1: 31–62.
Rosa, Alexandra Assis. 2015. “Translating Orality, Recreating Otherness.” Translation Studies 8, no. 2: 209–25.
Short, Mick. 1996. Exploring the Language of Poems, Plays, and Prose. London: Longman.
Toury, Gideon. 2012. Descriptive Translation Studies – And Beyond (Revised Edition). Amsterdam and Philadelphia: John Benjamins Publishing Company.
Thomas, Bronwen E. 1997. “‘It’s Good to Talk’? An Analysis of a Telephone Conversation from Evelyn Waugh’s Vile Bodies.” Language and Literature: International Journal of Stylistics 6, no. 2: 105–19.
Thomas, Bronwen E. 2002. “Multiparty Talk in the Novel: The Distribution of Tea and Talk in a Scene from Evelyn Waugh’s Black Mischief.” Poetics Today 23, no. 4: 657–84.
Tiittula, Liisa, and Pirkko Nuolijärvi. 2016. “Changing Norms in Translated Finnish Fiction: A Study of Non-Standard Varieties.” International Journal of Literary Linguistics 5, no. 3: 1–26.
Xiao, Richard. 2009. “Multidimensional Analysis and the Study of World Englishes.” World Englishes 28, no. 4: 421–50.
8 The Avoidance of Repetition in Translation
A Multifactorial Study of Repeated Reporting Verbs in the Italian Translation of the Harry Potter Series
Lorenzo Mastropierro

8.1 Introduction

Repetition in language is ubiquitous. It is a “central linguistic meaning-making strategy” (Tannen 1989, 97) that can occur on all linguistic levels (Wales 2011, 366) and in all kinds of registers and discourses (McCarthy and Carter 2014, 144f). As corpus linguistics insights have shown, the repetition of the same linguistic item or pattern can signal functional relevance (Mahlberg 2010, 297). A repeated feature can, for example, fulfill discursive functions, such as expressing stances or organizing discourses, which may be register-specific (e.g., Conrad and Biber 2004; Biber 2009); it can impart attitudinal and evaluating meanings by co-occurring regularly with another item (Sinclair 2004; Partington 1998; Stubbs 2002), or it can establish cohesive links throughout a text (Halliday and Hasan 1976; Flowerdew and Mahlberg 2009). More relevantly for the scope of this chapter, repetition can also be used for emphasis to foreground a textual element (Leech and Short 2007).

The foregrounding function of repeated linguistic elements is particularly important in the study of style, where repetition is seen as one of the main devices through which stylistic effects are created. Repetition has always played a central role in the discipline of stylistics, since the early formalist discussions on the defamiliarizing potentials of poetic language (Mukařovský 1932). Wales (2011, 366) maintains that “it is impossible not to appreciate the significance of repetition” in literary language, and this significance has been demonstrated by extensive research, both qualitative and quantitative.
For example, Prusse (2012) and Paton (2009) adopt qualitative approaches, manually identifying relevant examples of repetition in McGahern’s work and in Beckett’s Lessness, respectively, and discussing their stylistic function. However, it is quantitative studies that have demonstrated more clearly the pervasive role of repetition as a foregrounding device. In fact, corpus linguistic tools facilitate and improve the analysis of repetition, at the same time widening the range of repeated patterns identifiable to encompass phenomena impossible to detect manually. Corpus approaches have been used to explore the stylistic relevance of repetition in a variety of patterns, such as clusters and n-grams (Ikeo 2016; Mahlberg 2012, 2013), speech-bundles (Mahlberg et al. 2019), keywords (Vincent and Clarke 2017; Mahlberg and McIntyre 2011), keyword networks (Mastropierro and Mahlberg 2017), and collocations (Hori 2004), as well as to show how repetition can achieve a range of different functions, for example, building characterization (Ruano San Segundo 2017, 2018a, 2018b), creating narrative prospection (Toolan 2016), and establishing point of view (Ikeo 2016). Overall, the literature agrees in considering repetition a linguistic feature that can importantly contribute to the creation of a variety of stylistic effects and functions.

[Chapter DOI: 10.4324/9781003298328-9]

When it comes to writing conventions, though, repetition is seen unfavorably, especially in creative and literary writing, where “too much repetition . . . can be tedious” (Leech and Short 2007, 199). “Repetitiveness” or “redundancy” have pejorative connotations and, as McCarthy and Carter (2014, 145) remark, being told “You’re repeating yourself” or “This piece is too repetitive” is usually perceived as a criticism.
This is especially true for lexical repetition, which is easily noticeable and can suggest a lack of premeditation typical of ordinary speech, giving a written text the impression of being unsophisticated (Wales 2011, 366). Wales (2011, 36) explains that, for this reason, repetition is often avoided in writing, in favor of synonymity and substitution, or, in Leech and Short’s (2007, 199) words, “elegant variation.” This stigma is mirrored in translation too, where avoiding repetition has been famously equated to a universal norm by Ben-Ari (1998, 3): “One of the most persistent and inflexible norms in translation, in all languages studied (by myself and by colleagues elsewhere), is that of avoiding repetitions.” Even though the empirical research on the topic is insufficient to confirm the universal nature of this tendency, the existing studies (see Section 8.2) have shown that the avoidance of repetition in favor of lexical variety occurs in many of the language pairs and contexts analyzed, suggesting that such a tendency does exist. Given that repetition is a stylistically relevant linguistic phenomenon, it is then safe to expect that removing repetition in translation or altering the patterns it creates in the source text (ST) can impact importantly on the role that repetition has in the creation of stylistic effects. Despite the importance of the topic and of its implications for translation practice and training, repetition in translation has not been studied as extensively as it deserves, especially with data-based approaches, such as corpus linguistics. As the next section will show, most of the existing studies focus on investigating the stylistic consequences on the target text (TT) of replacing and/or altering repeated patterns in the original. 
Even though this approach provides important evidence of the impact that translator choices like these can have on the style of a literary text, it sheds little light on the nature of the phenomenon itself, concentrating instead on its consequences. Moreover, the latest developments in empirical translation studies have demonstrated the importance of approaching translation as a multidimensional phenomenon in which a variety of different factors interact simultaneously to shape the final product. De Sutter and Lefer (2019, 6) argue that multifactorial research designs are essential to capture the multidimensional nature of the translation phenomenon, and yet no multifactorial analysis of repetition in translation is currently available. This chapter helps to redress these gaps with a multifactorial study of repetition in translation that aims to describe the linguistic context in which a repeated ST item is replaced by lexical variety in the TT. More specifically, it investigates a set of factors (frequency of the ST item, number of possible translation equivalents of the ST item, number of different meanings of the ST item, semantic category of the ST item) as predictors of the replacement of repetition with lexical variety. Through an analysis of the translation of repeated reporting verbs in the Italian version of the Harry Potter series, this chapter offers a first multidimensional overview of the phenomenon and of its occurrence.

8.2 Repetition, Translation, and Style

In line with its functional and stylistic relevance in linguistics, repetition has been considered in translation studies as an “absolutely essential” feature of the ST that “must be reproduced” in translation (Boase-Beier 1994, 407). Yet the avoidance of repetition in favor of variety is seen as a dominant trend in real-world translation practice (Ben-Ari 1998), and many studies have provided evidence of this tendency.
Zhu (2004), for instance, finds that leitmotifs that build on lexical repetition in Galsworthy’s The Apple Tree and in Fitzgerald’s The Great Gatsby are disrupted in their Chinese translations because the repeated lexical items are translated as discrete units instead of networks. Similarly, al-Khafaji (2006) shows that avoiding or minimizing repetition is the most commonly used strategy to translate lexical repetition chains in the English version of an Arabic short story. Jawad (2009) works on the same language pair (Arabic to English) and reaches comparable conclusions: the foregrounding potential of rhetorical repetition in Tāhā Hussein’s al-Ayyām is affected by the replacement of repetition with “pervasive variation” (Jawad 2009, 768) in the English translation. Finally, Zupan (2006) explains that repeated patterns in Poe’s “The Fall of the House of Usher” are toned down in its Slovene translation, affecting the stylistic effect the patterns enact in the original. The spread of corpus linguistics methods in translation studies (i.e., corpus-based translation studies [CBTS], Granger and Lefer 2022; Mikhailov and Cooper 2016; Hu 2016; Kruger et al. 2013) has brought additional evidence for the tendency to avoid repetition in translation. However, even though most CBTS research investigates repeated patterns in corpora of translated texts, only a minority of these studies focus on the translation of repetition explicitly. Although limited, the existing literature confirms the trend identified by the qualitative research discussed previously. This is especially evident in the work by Čermáková, who shows how different types of repeated patterns – keywords and clusters – are rendered in literary translation. Čermáková and Fárová (2010) analyze the translation into Czech and Finnish of plot-relevant keywords in Harry Potter and the Philosopher’s Stone.
They show that, when multiple equivalents are available, the translators vary their choices instead of using the same term consistently as in the original. Similar findings are seen in the translation of clusters. Čermáková (2015, 2018) finds that the repetition of long, text-specific clusters in Irving’s A Widow for One Year (Čermáková 2015) and in the Winnie the Pooh books (Čermáková 2018) is avoided or toned down in translation. She suggests that “repetition seems to be a source of discomfort for many translators” (Čermáková 2018, 130), as both the Czech and Finnish translators opt for different solutions that disrupt the verbatim recurrence of the clusters. To the best of my knowledge, the only study that reaches different conclusions is that of Károly (2010). Her comparison of different types of repetition – as a feature of textual cohesion and coherence – in Hungarian news texts and their English translations does not confirm repetition avoidance. The TTs do not contain fewer instances of repetition than the STs; on the contrary, the English texts present a higher occurrence of simple, verbatim repetitions than the Hungarian originals, although the difference is not statistically significant (Károly 2010, 63).

Although not aimed at studying the translation of repetition directly, research on the translation of reporting verbs has produced findings aligned with the results of the studies reported so far. This research is highly relevant, as it focuses on the translation of a repeated linguistic feature in narrative prose that is commonly recognized as having important stylistic implications.
In fact, given their fundamental contribution to characterization (Culpeper 2001; Ruano San Segundo 2016, 2017; Eberhardt 2017; Mastropierro forthcoming), reporting verbs are often seen as an important stylistic feature in literary texts, one that should be reproduced in translation “in order to replicate the characterising traits they endow to the characters associated with them” (Mastropierro 2020, 243). Yet the existing literature on the topic shows the same tendency to replace repetition with lexical variety, through translating the same reporting verbs into multiple target language items, especially from English into other languages. An increase in the lexical variety of reporting verbs from source to target texts is reported in translations from English into Spanish (Rojo and Valenzuela 2001; Bourne 2002), Czech (Corness 2010; Čermáková and Mahlberg 2018), Hungarian (Klaudy and Károly 2005), German (Winters 2007), and Italian (Mastropierro 2020). The verb most often translated in many different ways is said, by far the most frequent English reporting verb. For example, Corness (2010, 162) finds that the 9,992 occurrences of said in his English novels corpus are translated in 1,323 different ways, while Mastropierro (2020, 254) reports that 5,825 repetitions of said in the Harry Potter series are translated into 207 different Italian verbs. The predominance of said in reporting verb usage in English narrative prose, compared to the use of its equivalent in other languages, has led some scholars (e.g., Rojo and Valenzuela 2001, 476; Levý 2011, 113) to suggest that the reason why this type of repetition is replaced with lexical variety may be a difference in narrative styles between English and the target languages, the former tolerating the repetition of said far more than the latter.
Even if this is the case, Mastropierro (2020) shows that replacing said, an “interpretatively empty” reporting verb (Ruano San Segundo 2017, 111), with more semantically loaded verbs can disrupt the pattern relating verb usage to specific characters, affecting the potential of patterns of verbs to convey characterization.

More generally, the impact of avoiding repetition in translation has been convincingly demonstrated. Translating repeated items or patterns as discrete units disrupts the networks of significance that repetition creates. This disruption may be local, when repetition occurs in a short portion of text. For example, Zupan (2006, 265) discusses the choice of the translator to remove the repetition of three appositions occurring one after the other in a short passage of Poe’s “The Fall of the House of Usher.” More often, though, repeated items build a network of recurrences that develops throughout a text, with each individual occurrence emphasizing and strengthening the overall pattern. In these cases, the disruption resulting from altering the patterns of repetition is not local but textual; it affects, to put it in Zupan’s (2006, 267) words, “the way the translated text functions at the macrostructural level, compared to the original text.” Macrostructural changes have been reported, for example, in the Italian translation of the Harry Potter series (Mastropierro 2020), where the replacement of patterns of repeated reporting verbs with lexical variety at the level of verb usage results in potential shifts in character development throughout the series. Similarly, Čermáková and Mahlberg (2018) find macrostructural changes in the modelling of Alice and the Queen in the Czech version of Carroll’s Alice’s Adventures in Wonderland as a result of an inconsistent translation of the ST repeated reporting verbs and body language patterns throughout the novel.
However, while the stylistic and interpretational implications of repetition avoidance have been evidenced (e.g., Mastropierro 2020; Čermáková and Mahlberg 2018; Čermáková 2015, 2018; Čermáková and Fárová 2010; Zupan 2006), and the strategies used by the translators to avoid repetition discussed (e.g., Ben-Ari 1998; al-Khafaji 2006; Jawad 2009), insufficient attention has been paid to the description of the phenomenon itself. There is, for instance, little or no discussion, substantiated by corpus data, of when repetition avoidance occurs, with what types of items, and in what textual contexts. Aiming to help redress this gap, the present chapter presents a study of repetition avoidance in translation that focuses on the occurrence of the phenomenon itself rather than on its translational consequences. In particular, it investigates multiple linguistic factors as predictors of repetition avoidance. In this way, this chapter sheds some light on the linguistic nature of those items the repetition of which is avoided in translation, furthering our understanding of the reasons why the phenomenon may be occurring so frequently in translation. Although essentially descriptive, the findings of this chapter can be beneficial for improving translation practice and training: we need to study how and when repetition avoidance happens so we can address why it is happening.

The remainder of this chapter is structured as follows. The next section describes the methodology for data collection and analysis, as well as the factors taken into account as potential predictors of repetition avoidance. Section 8.4 presents the results of the multifactorial analysis and discusses them, while Section 8.5 provides some concluding remarks.

8.3 Methods and Data

Recent developments in empirical translation studies have emphasized the multidimensional nature of the translation phenomenon.
Like any other linguistic phenomenon, translation too is shaped by a multitude of factors, the interaction of which defines the final product. This is true at both the macrolevel, where aspects such as translator expertise, language pair, or communicative purpose of the TT frame translating practices, and at the microlevel, where linguistic factors like polysemy, availability of equivalents, or lexicosyntactic constraints influence the choices of the translator. De Sutter and Lefer (2019) argue that, to understand fully the multidimensional nature of translation phenomena, multifactorial research designs are essential. Multifactorial studies allow the researcher to investigate simultaneously the effect of multiple variables on a phenomenon, providing a prediction of whether, and to what extent, such variables influence the observed phenomenon and/or its features. They have been successfully employed in a number of recent studies in CBTS (e.g., Kajzer-Wietrzny et al. 2021; Kajzer-Wietrzny and Grabowski 2021; De Sutter and Lefer 2019; Kruger 2018), which demonstrates that multifactorial approaches can be used to account flexibly for a multitude of factors that govern the translation process and define its products. As such, a multifactorial design is employed in this study too.

More specifically, this chapter explores the avoidance of repetition of reporting verbs in the Italian translation of the Harry Potter series. Reporting verbs were specifically selected for this study because, as mentioned in Section 8.2, they are a highly significant stylistic feature, the repetition of which has been demonstrated to play a fundamental role in the characterization process. Moreover, their contribution to characterization and character development in the Harry Potter books has already been established (Eberhardt 2017; Mastropierro 2020).
By using reporting verbs, I can therefore ensure that the type of repetition taken into account is one that is stylistically relevant and of importance in translation. This type of repetition is investigated in terms of the extent to which four factors, representing linguistic features of the ST verbs, affect the chance that a given ST reporting verb is translated into multiple Italian reporting verbs. These factors are (i) the frequency of the ST verb, (ii) the number of different Italian equivalents of the ST verb, (iii) the number of different meanings of the ST verb, and (iv) the semantic category of the ST verb. I will discuss the factors in detail below, but first the procedure for data gathering and the data itself will be illustrated.

Reporting verbs used to report the speech of the three protagonists of the Harry Potter books – Harry Potter, Ron Weasley, and Hermione Granger – were collected using InterCorp (Čermák and Rosen 2012), a multilingual parallel corpus developed in the context of the Czech National Corpus project (www.korpus.cz/). InterCorp offers a wide range of literary texts and their translation in several languages, including the seven Harry Potter novels and their Italian translations. A query combining both CQL syntax and regular expressions was employed to retrieve the reporting verbs from the original texts, through InterCorp’s web-based concordancer, KonText (Machálek 2014). The query identified all instances of closing quotation marks, followed by the name of one of the three characters, followed by a simple past verb – or closing quotation marks, followed by a simple past verb, followed by the name of the character. Figure 8.1 shows a sample of concordance lines retrieved in this way for Harry.

[Figure 8.1: Concordance sample for reporting verbs attributed to Harry.]
These lines were manually checked, and reporting verbs were recorded, while verbs occurring in the specified position that were not reporting verbs (as with heard in line 6 of Figure 8.1) were removed. This query did not identify instances in which the characters were referred to with pronouns, but it allowed me to focus on the series’ protagonists without having to disambiguate manually the referent of each individual pronoun. After manually cleaning the data and removing instances of reporting verbs that were not repeated (that is, verbs with a frequency < 2), 7,806 reporting verbs were gathered. In addition to using the data as a whole, the verbs were also divided into subsets to reflect different levels of distribution across the series. Three subsets were used. The first two divide the series on the basis of the translator. In fact, in Italian the novels were translated by two different translators: Marina Astrologo translated the first two novels, while Beatrice Masini translated the following five. The third subset comprises one novel only, Harry Potter and the Half-Blood Prince (HbP). In this way, this study considers repetition translation and avoidance in incremental sizes of one book, two books (translated by Astrologo), five books (translated by Masini), and seven books (whole series). Moreover, given that the series has been translated by two different translators, a cross-comparison between them is possible, which could help understand whether the findings are translator-specific or could be indicative of a more generalized trend. Table 8.1 provides an overview of the data and subsets. The Italian translations of the reporting verbs were retrieved using the alignment functionalities of InterCorp. The corpus automatically aligns STs and TTs at the level of sentences, providing parallel concordance lines. Thus, English reporting verbs were searched individually so that all their Italian translations could be identified and retrieved.
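The retrieval pattern described above (closing quotation marks, then a character name and a simple past verb, in either order) can be approximated outside KonText with a plain regular expression. The following is a minimal sketch only: the chapter's actual CQL query is not reproduced in the text, so the pattern, the short verb list standing in for a part-of-speech tag, the sample sentences, and all names below are illustrative assumptions.

```python
import re

# All names and patterns here are hypothetical; the real query used CQL with POS tags.
NAMES = r"(?:Harry|Ron|Hermione)"
CLOSING_QUOTE = '[”"]'
# Crude stand-in for a "simple past verb" tag: regular -ed forms plus a few irregulars.
PAST_VERB = r"(?:\w+ed|said|told|heard)"

PATTERN = re.compile(
    rf"{CLOSING_QUOTE}\s+(?:(?P<name1>{NAMES})\s+(?P<verb1>{PAST_VERB})"
    rf"|(?P<verb2>{PAST_VERB})\s+(?P<name2>{NAMES}))\b"
)

# Verbs that fit the slot but do not report speech; in the chapter these
# were removed by manual inspection of the concordance lines.
NOT_REPORTING = {"heard"}


def find_candidates(text):
    """Return (character, verb) pairs for quote + name + verb or quote + verb + name."""
    hits = []
    for match in PATTERN.finditer(text):
        name = match.group("name1") or match.group("name2")
        verb = match.group("verb1") or match.group("verb2")
        hits.append((name, verb))
    return hits


# Invented sample sentences, not taken from the novels.
sample = (
    "“I don’t know,” said Harry. "
    "“We should go,” Hermione whispered. "
    "“Listen!” Ron heard a noise."
)
candidates = find_candidates(sample)
reporting = [(name, verb) for name, verb in candidates if verb not in NOT_REPORTING]
```

As in the chapter, the positional pattern over-generates (here it picks up heard), which is why a manual check, mimicked by the `NOT_REPORTING` set, is still needed after retrieval.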
Table 8.1 Overview of the Data and Sub-Datasets

                         HbP (one book)   Translator 1 (two books)   Translator 2 (five books)   Series (seven books)
Reporting Verb Types     29               44                         79                          85
Reporting Verb Tokens    1,165            1,072                      6,715                       7,806

For each verb, four features or factors were recorded. Factor 1, labelled “Freq,” is the raw frequency of the reporting verb in each dataset. This was simply retrieved directly from the data gathered through InterCorp as described previously. Factor 2, labelled “Trans,” is the number of different possible Italian translations of the reporting verb. To retrieve this number, Treq (Vavřín and Rosen 2015) was used. Treq is a translation equivalents database freely available online as part of the Czech National Corpus suite of tools (https://treq.korpus.cz/#). It uses InterCorp to provide a list of equivalents for a query word in any of the language pairs available in the parallel corpus. For example, Figure 8.2 shows the results of a query for reply. Lemmas were searched, and only reporting verbs with a proportion of occurrence > 4% were recorded. In this case, risposta, rispondere, and replicare are used as translations of reply in 43.1%, 43%, and 5% of the cases, respectively. However, risposta is not a reporting verb, so it was excluded; hence, “Trans” for reply would be 2.

[Figure 8.2: Query for reply in Treq.]

Factor 3, labelled “Senses,” is the number of possible different meanings of the ST verb. To retrieve this number, WordNet 3.1 (Fellbaum 1998) was employed. WordNet (https://wordnet.princeton.edu/) is a popular lexical database of English that provides a list of distinct concepts a word can refer to. Figure 8.3 shows a screenshot of a query for urged, for which three senses are provided; hence, “Senses” for urged would be 3.

[Figure 8.3: Query for urged in WordNet.]
Finally, Factor 4, labelled “Verb_type,” indicates the type of reporting verb based on Caldas-Coulthard’s (1987) taxonomy. This taxonomy categorizes reporting verbs into seven main types, encompassing both linguistic and paralinguistic features. Table 8.2 provides an overview of all the main types and subtypes of reporting verbs, with some examples. The types of verbs that are relevant for this study will be discussed in more detail in the analysis section, while for a full description of the taxonomy, see Caldas-Coulthard (1987).

Table 8.2 Reporting Verb Taxonomy

Category              Subcategory           Examples
Neutral                                     say, tell
Structuring                                 ask, inquire, reply, answer
Metapropositional     Assertive             exclaim, proclaim, agree
                      Directive             urge, instruct, order
                      Expressive            accuse, lament, swear
Metalinguistic                              narrate, quote, recount
Prosodic                                    cry, shout, scream
Paralinguistic        Voice qualifier       whisper, murmur, mutter
                      Voice qualification   laugh, sigh, groan
Signaling discourse                         repeat, add, go on, hesitate

Source: Adapted from Caldas-Coulthard (1987).

A multifactorial analysis was employed to study the impact of these four factors, or predictor variables, on a fifth variable, the outcome variable, that is, the number of different Italian translations each ST reporting verb is translated into, labelled “Types.” In other words, the analysis will show whether the linguistic features of the ST reporting verbs that the factors represent have an effect on the translation of repetition or on its avoidance. To do so, a generalized linear model with Poisson distribution was fitted, as “Types” is a count variable with which Poisson regression is typically employed (Winter 2020, 218). R (R Core Team 2021) was used to run the analysis, and specifically the glm function. The analysis was repeated for each dataset (HbP, Translator 1, Translator 2, Series), and the results were compared. The results of the analysis and their discussion are presented in the next section.
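To make the variables in this section concrete, the sketch below shows one way the per-verb values could be assembled before model fitting: “Freq” and “Types” from aligned source and target verb pairs, and “Trans” from a list of equivalents with their proportions. It is an illustration under stated assumptions, not the procedure actually used in the chapter: the function names and the toy aligned pairs are invented, and the logarithm is assumed to be the natural log. Only the proportions for reply come from the Treq example given above.

```python
import math
from collections import Counter, defaultdict


def summarize_verbs(aligned_pairs, min_freq=2):
    """Map each repeated ST verb to Freq, LogFreq and Types (distinct TT renderings)."""
    freq = Counter(st for st, _ in aligned_pairs)
    renderings = defaultdict(set)
    for st, tt in aligned_pairs:
        renderings[st].add(tt)
    return {
        st: {"Freq": n, "LogFreq": math.log(n), "Types": len(renderings[st])}
        for st, n in freq.items()
        if n >= min_freq  # verbs that are not repeated are dropped
    }


def count_trans(equivalents, reporting_verbs, threshold=4.0):
    """Count equivalents above the proportion threshold that are reporting verbs."""
    return sum(
        1
        for lemma, pct in equivalents.items()
        if pct > threshold and lemma in reporting_verbs
    )


# Toy aligned data, invented for illustration (not taken from InterCorp).
pairs = [
    ("said", "disse"), ("said", "rispose"), ("said", "disse"),
    ("said", "osservò"), ("asked", "chiese"), ("asked", "chiese"),
    ("yelled", "urlò"),
]
table = summarize_verbs(pairs)

# Proportions for "reply" as reported in the Treq example above.
reply_equivalents = {"risposta": 43.1, "rispondere": 43.0, "replicare": 5.0}
trans_reply = count_trans(reply_equivalents, {"rispondere", "replicare"})
```

In the toy data, said occurs four times with three distinct renderings, asked occurs twice with one, and yelled is discarded because it is not repeated; `trans_reply` evaluates to 2, matching the worked example for reply.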
8.4 Multifactorial Analysis

Preliminary analyses were run to find the most efficient model, in line with what Brezina (2018, 123) defines as a “hybrid procedure.” Starting with a model with no predictor variables, predictors were added and/or deleted, and the model reassessed with each change on the basis of the Akaike information criterion (AIC). AIC was used to test whether the fit of the model improved with the addition or deletion of a variable. These preliminary tests showed that the most efficient models, the ones with the lowest AIC, included two variables, “Freq” and “Verb_type,” while the addition of “Senses” and “Trans” did not improve the overall fit. Moreover, during the tests it was also noticed that the distribution of “Freq,” a numerical factor, was highly skewed, so the variable was logarithmically transformed, as is usually done in variational-linguistic research (De Sutter and Lefer 2019, 10).

Results (in Tables 8.3 to 8.6) show that “Freq” and “Verb_type” are significant predictors of “Types,” while “Trans” and “Senses” do not have a significant effect. This means that how many times a reporting verb occurs and what type of verb it is influence the chances of seeing that verb translated into multiple target language items. On the other hand, the number of translation equivalents and meanings of the ST verb do not determine whether its repetition is reproduced or avoided. This outcome is consistent across all datasets, meaning that the same factors are significant predictors independently of the size of the data taken into account (one book, two books, five books, or the whole series) and of the translator. The plots in Figure 8.4 show the nature and size of the effect that the predictor “Freq” has on “Types.” The more often an ST reporting verb occurs (x axis), the more likely it is that the verb is translated into multiple different translations (y axis).
In other words, the more often a verb is repeated in the original, the less likely the repetition is to be reproduced in translation. Again, the same outcome is seen in all datasets. In addition to providing further evidence for the tendency to avoid repetition seen in previous studies, this finding also seems to confirm that such a tendency could be the result of an intentional choice. In fact, it is the most marked instances of repetition – hence the most noticeable – that are more likely to be replaced by lexical variety in translation. However, as the widening of confidence intervals suggests, the certainty of the models decreases when “Freq” increases, as there are not many data points with very high frequency.

Table 8.3 Generalized Linear Model: Series

            Df   Deviance   Resid. Df   Resid. Dev   Pr(>Chi)
NULL                        84          1123.26
Verbtype    7    711.85     77          411.41       <2.20E-16 ***
LogFreq     1    362.06     76          49.35        <2.20E-16 ***

Coefficients:
                Estimate   Std. Error   z Value   Pr(>|z|)
(Intercept)     -0.03215   0.57814      -0.056    0.956
VerbtypeMprop    0.12438   0.5884        0.211    0.833
VerbtypeN        1.01132   0.62759       1.611    0.107
VerbtypePros     0.04203   0.60895       0.069    0.945
VerbtypeSdis     0.16711   0.60355       0.277    0.782
VerbtypeStr     -0.2436    0.64947      -0.375    0.708
VerbtypeVier     0.11579   0.60846       0.19     0.849
VerbtypeVion     0.22753   0.58926       0.386    0.699
LogFreq          0.48313   0.03297      14.654    <2e-16

Table 8.4 Generalized Linear Model: Translator 1

            Df   Deviance   Resid. Df   Resid. Dev   Pr(>Chi)
NULL                        43          265.009
Verbtype    6    179.807    37          85.202       <2.20E-16 ***
LogFreq     1    69.052     36          16.15        <2.20E-16 ***

Coefficients:
                Estimate    Std. Error   z Value   Pr(>|z|)
(Intercept)     -0.002834   0.209837     -0.014    0.9892
VerbtypeN        0.114604   0.491797      0.233    0.8157
VerbtypePros    -0.679034   0.448877     -1.513    0.1303
VerbtypeSdis    -0.064428   0.417583     -0.154    0.8774
VerbtypeStr     -2.119339   1.03578      -2.046    0.0407
VerbtypeVier     0.029688   0.297997      0.1      0.9206
VerbtypeVion     0.033445   0.260963      0.128    0.898
LogFreq          0.583401   0.087144      6.695    2.16E-11
Table 8.5 Generalized Linear Model: Translator 2

            Df   Deviance   Resid. Df   Resid. Dev   Pr(>Chi)
NULL                        78          1025.16
Verbtype    7    657.42     71          367.74       <2.20E-16 ***
LogFreq     1    323.43     70          44.31        <2.20E-16 ***

Coefficients:
                Estimate   Std. Error   z Value   Pr(>|z|)
(Intercept)     -0.33539   1.0003       -0.335    0.737
VerbtypeMprop    0.36548   1.00804       0.363    0.717
VerbtypeN        1.30278   1.03439       1.259    0.208
VerbtypePros     0.36762   1.02007       0.36     0.719
VerbtypeSdis     0.44711   1.01678       0.44     0.66
VerbtypeStr      0.15346   1.04405       0.147    0.883
VerbtypeVier     0.23736   1.02466       0.232    0.817
VerbtypeVion     0.49293   1.00811       0.489    0.625
LogFreq          0.48387   0.03519      13.75     <2e-16

Table 8.6 Generalized Linear Model: HbP

            Df   Deviance   Resid. Df   Resid. Dev   Pr(>Chi)
NULL                        28          565.11
Verbtype    6    444.06     22          121.04       <2.20E-16 ***
LogFreq     1    117.43     21          3.61         <2.20E-16 ***

Coefficients:
                Estimate   Std. Error   z Value   Pr(>|z|)
(Intercept)     -0.63511   0.37724      -1.684    0.0923
VerbtypeN        1.56509   0.50393       3.106    0.0019
VerbtypePros     0.71357   0.45649       1.563    0.118
VerbtypeSdis     0.19087   0.5           0.382    0.7027
VerbtypeStr     -0.28812   0.60491      -0.476    0.6339
VerbtypeVier     0.34638   0.50241       0.689    0.4905
VerbtypeVion     0.58982   0.44569       1.323    0.1857
LogFreq          0.55745   0.07348       7.586    3.29E-14

Figure 8.4 “Freq” effect plots.

There is actually one point only, the frequency of which is much higher than that of all other verbs, as can be clearly seen in the plots in Figure 8.4. That point represents said, which is the reporting verb that in all datasets occurs by far the most frequently and has been translated into the largest number of different translations. In order to check that such an outlier was not skewing the data and excessively affecting the predictions of the models, the analyses were repeated without said. The resulting effect plots are collected in Figure 8.5.
As the plots show, removing said does not change the model predictions, which still clearly show a positive correlation between the number of different translations and the frequency of the reporting verb.

Figure 8.5 “Freq” effect plots without said.

Moving on to “Verb_type,” the other predictor with a significant effect on “Types,” the same consistency of results among all datasets can be noticed. The coefficients’ estimates (seen in Tables 8.3 to 8.6) indicate that the verb types that are more or less likely to be translated in many different ways are the same between the two translators and across the different numbers of novels taken into account. The repetition of neutral verbs (“VerbtypeN”) and voice qualification verbs (“VerbtypeVion”) is more likely to be avoided in favor of lexical variety, while the repetition of structuring verbs (“VerbtypeStr”) is more likely to be reproduced unaltered. Neutral verbs (e.g., said and told) simply indicate the illocutionary act and are “interpretatively empty” (Ruano San Segundo 2017, 111). As blank verbs, they lend themselves to being translated in multiple ways, by adding meanings to the neutral “baseline.” For instance, example 1 shows a case in which the neutral verb said has been translated into the metapropositional verb decretò (“decreed”), adding decisiveness and finality to what Ron said.

(1) ENG: “He can’t have,” said Ron.
ITA: “Impossibile,” decretò Ron. [“Impossible,” decreed Ron.]
(Harry Potter and the Deathly Hallows)

Moreover, neutral verbs are extremely frequent, especially said, which is the most frequent reporting verb in the data and in English more generally (see Section 8.2). As explained in Section 8.2, previous research (Rojo and Valenzuela 2001, 476; Levý 2011, 113) has suggested that the repetition of said is much more common and tolerated in English than the repetition of its equivalents is in other languages, including Italian (Mastropierro 2020).
Thus, in addition to adding meaning to supplement the neutral nature of the original verb, the translators may have also aimed to conform to the stylistic norms of the target language.

Voice qualification verbs (e.g., gasped, hissed, panted, roared) are a subcategory of paralinguistic verbs that “mark the attitude of the speaker in relation to what is being said” (Caldas-Coulthard 1987, 163) through paralinguistic cues. Unlike neutral verbs, they do convey an interpretational meaning that is added to the propositional content of the reported speech; in other words, how something is said complements the meaning of what is said. For example, in “I don’t believe this,” snarled Harry, the intensity and fervor of what Harry says are conveyed by the use of the verb snarled rather than by the proposition he utters. The repetition of voice qualification verbs in the STs is replaced with a range of verb types in the TTs, resulting in a wider lexical variety compared to the original. They are translated not only with Italian voice qualification verbs but also with metapropositional, signaling discourse, and neutral verbs. Some instances can be seen in examples 2, 3, and 4.

(2) ENG: “No – everyone’s fine – ” gasped Harry.
ITA: “No . . . Stanno tutti bene . . .” balbettò Harry. [“No . . . everyone is fine,” stammered Harry.]
(Harry Potter and the Order of the Phoenix)

(3) ENG: “Water” panted Harry.
ITA: “Acqua” ripeté Harry. [“Water,” repeated Harry.]
(Harry Potter and the Half-Blood Prince)

(4) ENG: “I don’t believe this,” snarled Harry.
ITA: “Non ci posso credere” sbottò Harry. [“I can’t believe this,” snapped Harry.]
(Harry Potter and the Order of the Phoenix)

At the other end of the cline, that is, among the types of verbs that are less likely to be translated in many different ways, are structuring verbs.
Structuring verbs indicate that the reported utterance is part of a speech act exchange (Caldas-Coulthard 1987, 155), signaling either prospection (e.g., asked or enquired) or retrospection (e.g., answered or replied). The vast majority of the originals’ structuring verbs are translated with Italian structuring verbs, reproducing the repetition from the STs in the TTs. Structuring verbs quite literally enact a structuring function, establishing the answer-reply sequence. They cannot be replaced with another type of verb without affecting their role in structuring the exchange. This may be the reason why the repetition of this type of verb is more likely to be reproduced in translation, compared to other types of verbs, neutral and voice qualification verbs especially. Example 5 shows an instance of an ST structuring verb translated into an Italian structuring verb.

(5) ENG: “All right if we join you?” asked Ron.
ITA: “Ti va bene se ci sediamo qui?” le chiese Ron. [“Is it okay with you if we sit here?” Ron asked her.]
(Harry Potter and the Deathly Hallows)

Even though understanding the reasons behind the choice of the translators to avoid or maintain the repetition of certain types of verbs is beyond the scope of this chapter, as further research is needed to do so, a tentative hypothesis can be suggested. The decision to translate the same verb in many different ways may depend on the perceived extent of the alteration. Translators may feel that by replacing a neutral verb with a metapropositional or prosodic verb, they are simply adding some extra flavor to an otherwise-blank basis (i.e., said). Equally, by replacing a voice qualification verb with a different voice qualification verb, translators may have the perception of modifying the flavor of the verb without substantially altering its core denotational meaning.
However, replacing a structuring verb may have been perceived as a more marked change, involving not simply a shift of meaning but an alteration of the very organization of the reported exchange. This perception is misleading, though, as it narrowly fixates on the individual case, disregarding the bigger picture. As has been discussed in Section 8.2, individual changes in favor of lexical variety, when reiterated throughout a text, can result in macrostructural alterations. Repeated patterns that develop across the text are built on the reiteration of individual items; altering the repetition of the individual items affects the overall pattern. Hence, even changes that may seem uninfluential at first, like translating a verb into a different verb from the same verb category or replacing a neutral verb like said with a more meaningful verb, when reiterated can have an impact on how a text functions on the macrostructural level. This has been shown in the context of the Harry Potter series specifically (Mastropierro 2020), where differences in the ratios and proportions of different verb types across the novels between the STs and the TTs result in potential alterations to characters’ development.

8.5 Conclusion

Repetition in translation is an underexplored phenomenon, but the existing research has pointed out a tendency to avoid repetition in favor of lexical variety. Many studies have emphasized the potential effects that such a tendency can have in translation – especially in terms of its impact on the style of literary texts – or described the strategies used by translators to avoid repetition. However, little or no attention has been paid to the description of the actual phenomenon itself, for example in terms of contexts in which it can occur or factors that could influence it.
The present study shed some light on these aspects of the phenomenon, using repeated reporting verbs in the Harry Potter series and its Italian translation as a case study. It used a multifactorial research design to understand the linguistic features of the verbs that are more or less likely to be translated in multiple different ways. It showed that some of these features can indeed predict the way verb repetition is translated. Two factors were shown to have a significant effect on the avoidance of repetition in translation: the frequency and type of the ST verb. The more frequent a verb, hence the more noticeable its repetition, the more likely is its translation into different target language verbs. At the same time, some types of verbs, such as neutral and voice qualification verbs, are more likely to be translated in multiple ways compared to other types, such as structuring verbs. In contrast, the numbers of different meanings and different translation equivalents of the ST verb do not influence how the repetition of that verb is treated in translation, suggesting that polysemy and wider availability of target language options are not significant factors in determining the avoidance of repetition or otherwise. These findings were identified across a range of datasets that consisted of different numbers of novels (one, two, five, and seven) and two different translators, suggesting a certain degree of generalizability. Of course, these remain preliminary findings, as further research with more and different data is needed to confirm the tendencies identified here. However, it is hoped that the present study showed an alternative approach to the study of repetition in translation, one in which the data-based and multifactorial description of the phenomenon itself takes center stage instead of its consequences.
We know what the effects of repetition avoidance can be; we now need to understand when, how, and why repetition avoidance happens, to inform better approaches to translation training and practice.

References

Al-Khafaji, Rasoul. 2006. “In Search of Translational Norms: The Case of Shifts in Lexical Repetition in Arabic-English Translations.” Babel 52, no. 1: 39–65.
Ben-Ari, Nitsa. 1998. “The Ambivalent Case of Repetitions in Literary Translation. Avoiding Repetitions: A ‘Universal’ of Translation.” Meta 43, no. 1: 68–78.
Biber, Douglas. 2009. “A Corpus-driven Approach to Formulaic Language in English: Multi-word Patterns in Speech and Writing.” International Journal of Corpus Linguistics 14, no. 3: 275–311.
Boase-Beier, Jean. 1994. “Translating Repetition.” Journal of European Studies 24: 403–9.
Bourne, Julian. 2002. “Controlling Illocutionary Force in the Translation of Literary Dialogue.” Target 14, no. 2: 241–61.
Brezina, Vaclav. 2018. Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.
Caldas-Coulthard, Carmen Rosa. 1987. “Reported Speech in Written Narrative Texts.” In Discussing Discourse, edited by Malcolm Coulthard, 149–67. Birmingham: University of Birmingham.
Čermák, František, and Alexandr Rosen. 2012. “The Case of InterCorp, a Multilingual Parallel Corpus.” International Journal of Corpus Linguistics 17, no. 3: 411–27.
Čermáková, Anna. 2015. “Repetition in John Irving’s Novel A Widow for One Year. A Corpus Stylistic Approach to Literary Translation.” International Journal of Corpus Linguistics 20, no. 3: 355–77.
Čermáková, Anna. 2018. “Translating Children’s Literature: Some Insights from Corpus Stylistics.” Ilha Desterro 71, no. 1: 117–33.
Čermáková, Anna, and Lenka Fárová. 2010. “Keywords in Harry Potter and Their Czech and Finnish Translation Equivalents.” In InterCorp: Exploring a Multilingual Corpus, edited by František Čermák, Patrick Corness, and Aleš Klégr, 177–88. Prague: NLN.
Čermáková, Anna, and Michaela Mahlberg. 2018. “Translating Fictional Characters – Alice and the Queen from the Wonderland in English and Czech.” In The Corpus Linguistics Discourse. In Honour of Wolfgang Teubert, edited by Anna Čermáková and Michaela Mahlberg, 223–53. Amsterdam and Philadelphia, PA: John Benjamins.
Conrad, Susan, and Douglas Biber. 2004. “The Frequency and Use of Lexical Bundles in Conversation and Academic Prose.” Lexicographica 20: 56–71.
Corness, Patrick. 2010. “Shifts in Czech Translation of the Reporting Verb Said in English Fiction.” In InterCorp: Exploring a Multilingual Corpus, edited by František Čermák, Patrick Corness, and Aleš Klégr, 177–88. Prague: NLN.
Culpeper, Jonathan. 2001. Language and Characterisation: People in Plays and Other Texts. Harlow: Pearson Education.
De Sutter, Gert, and Marie-Aude Lefer. 2019. “On the Need for a New Research Agenda for Corpus-based Translation Studies: A Multi-methodological, Multifactorial and Interdisciplinary Approach.” Perspectives 28, no. 1: 1–23.
Eberhardt, Maeve. 2017. “Gendered Representations Through Speech: The Case of the Harry Potter Series.” Language and Literature 26, no. 3: 227–46.
Fellbaum, Christiane, ed. 1998. WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
Flowerdew, John, and Michaela Mahlberg, eds. 2009. Lexical Cohesion and Corpus Linguistics. Amsterdam and Philadelphia, PA: John Benjamins.
Granger, Sylviane, and Marie-Aude Lefer, eds. 2022. Extending the Scope of Corpus-based Translation Studies. London: Bloomsbury.
Halliday, Michael Alexander Kirkwood, and Ruqaiya Hasan. 1976. Cohesion in English. London: Longman.
Hori, Masahiro. 2004. Investigating Dickens’ Style. A Collocational Analysis. Basingstoke: Palgrave Macmillan.
Hu, Kaibao. 2016. Introducing Corpus-based Translation Studies. Heidelberg and Berlin: Springer.
Ikeo, Reiko. 2016.
“An Analysis of Viewpoints by the Use of Frequent-multi-word Sequences in DH Lawrence’s Lady Chatterley’s Lover.” Language and Literature 25, no. 2: 159–84.
Jawad, Hisham. 2009. “Repetition in Literary Arabic: Foregrounding, Backgrounding, and Translation Strategies.” Meta – Translators’ Journal 54, no. 4: 753–69.
Kajzer-Wietrzny, Marta, and Łukasz Grabowski. 2021. “Formulaicity in Constrained Communication: An Intermodal Approach.” MonTI 13: 148–83.
Kajzer-Wietrzny, Marta, Ilmari Ivaska, and Adriano Ferraresi. 2021. “‘Lost’ in Interpreting and ‘Found’ in Translation: Using an Intermodal, Multidirectional Parallel Corpus to Investigate the Rendition of Numbers.” Perspectives 29, no. 4: 469–88.
Károly, Krisztina. 2010. “Shifts in Repetition vs. Shifts in Text Meaning. A Study of the Textual Role of Lexical Repetition in Non-literary Translation.” Target 22, no. 1: 40–70.
Klaudy, Kinga, and Krisztina Károly. 2005. “Implicitation in Translation: Empirical Evidence for Operational Asymmetry in Translation.” Across Languages and Cultures 6, no. 1: 13–28.
Kruger, Alet, Kim Wallmach, and Jeremy Munday, eds. 2013. Corpus-based Translation Studies: Research and Applications. London: Bloomsbury.
Kruger, Haidee. 2018. “That Again: A Multivariate Analysis of the Factors Conditioning Syntactic Explicitness in Translated English.” Across Languages and Cultures 20, no. 1: 1–33.
Leech, Geoffrey, and Mick Short. 2007. Style in Fiction: A Linguistic Introduction to English Fictional Prose. 2nd ed. Harlow: Pearson Longman.
Levý, Jiří. 2011. The Art of Translation. Translated by Patrick Corness. Amsterdam and Philadelphia, PA: John Benjamins.
Machálek, Tomáš. 2014. KonText – Application for Working with Language Corpora [Computer software]. Prague: FF UK. http://kontext.korpus.cz. Accessed November 2019.
Mahlberg, Michaela. 2010. “Corpus Linguistics and the Study of Nineteenth-century Fiction.” Journal of Victorian Culture 15, no. 2: 292–98.
Mahlberg, Michaela. 2012.
“The Corpus Stylistic Analysis of Fiction or the Fiction of Corpus Stylistics?” In Corpus Linguistics and Variation in English: Theory and Description, edited by Joybrato Mukherjee and Magnus Huber, 77–95. Amsterdam: Rodopi.
Mahlberg, Michaela. 2013. Corpus Stylistics and Dickens’s Fiction. London and New York: Routledge.
Mahlberg, Michaela, and Dan McIntyre. 2011. “A Case for Corpus Stylistics: Ian Fleming’s Casino Royale.” English Text Construction 4, no. 2: 204–27.
Mahlberg, Michaela, Viola Wiegand, Peter Stockwell, and Anthony Hennessey. 2019. “Speech-bundles in the 19th-century English Novel.” Language and Literature 28, no. 4: 326–53.
Mastropierro, Lorenzo. 2020. “The Translation of Reporting Verbs in Italian: The Case of the Harry Potter Series.” International Journal of Corpus Linguistics 25, no. 3: 241–69.
Mastropierro, Lorenzo. (forthcoming). “Gendered Voices in Translation: Reporting Verbs in the Italian Translation of the Harry Potter Series.” In Good Girls and Brave Boys: 19th Century and Contemporary Children’s Literature and Childhood, edited by Michaela Mahlberg and Anna Čermáková. London: Bloomsbury.
Mastropierro, Lorenzo, and Michaela Mahlberg. 2017. “Key Words and Translated Cohesion in Lovecraft’s At the Mountains of Madness and One of Its Italian Translations.” English Text Construction 10, no. 1: 78–105.
McCarthy, Michael, and Ronald Carter. 2014. Language as Discourse: Perspectives for Language Teaching. London and New York: Routledge.
Mikhailov, Mikhail, and Robert Cooper. 2016. Corpus Linguistics for Translation and Contrastive Studies: A Guide for Research. London and New York: Routledge.
Mukařovský, Jan. 1932. “Standard Language and Poetic Language.” In A Prague School Reader on Aesthetics, Literary Structure and Style, edited and translated by Paul Garvin, 17–30. Washington, DC: Georgetown University Press.
Partington, Alan. 1998.
“Connotation and Semantic Prosody.” In Patterns and Meaning: Using Corpora for English Language Research and Teaching, edited by Alan Partington, 65–78. Amsterdam and Philadelphia, PA: John Benjamins.
Paton, Steven. 2009. “Time-Lessness, Simultaneity and Successivity: Repetition in Beckett’s Short Prose.” Language and Literature 18, no. 4: 357–66.
Prusse, Michael. 2012. “Repetition, Difference and Chiasmus in John McGahern’s Narratives.” Language and Literature 21, no. 4: 363–80.
R Core Team. 2021. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.
Rojo, Ana, and Javier Valenzuela. 2001. “How to Say Things with Words: Ways of Saying in English and Spanish.” Meta 46, no. 3: 467–77.
Ruano San Segundo, Pablo. 2016. “A Corpus-stylistic Approach to Dickens’ Use of Speech Verbs: Beyond Mere Reporting.” Language and Literature 25, no. 2: 113–29.
Ruano San Segundo, Pablo. 2017. “Reporting Verbs as a Stylistic Device in the Creation of Fictional Personalities in Literary Texts.” Journal of the Spanish Association of Anglo-American Studies 39, no. 2: 105–24.
Ruano San Segundo, Pablo. 2018a. “An Analysis of Charles Dickens’s Gender-based Use of Speech Verbs.” Gender and Language 12, no. 2: 192–217.
Ruano San Segundo, Pablo. 2018b. “Dickens’s Hyperbolic Style Revisited: Verbs That Describe Sounds Made by Animals Used to Report the Words of Male Villains.” Style 52, no. 4: 475–93.
Sinclair, John. 2004. Trust the Text: Language, Corpus and Discourse. London and New York: Routledge.
Stubbs, Michael. 2002. Words and Phrases: Corpus Studies of Lexical Semantics. Oxford: Blackwell Publishers.
Tannen, Deborah. 1989. Talking Voices: Repetition, Dialogue and Imagery in Conversational Discourse. Cambridge: Cambridge University Press.
Toolan, Michael. 2016. Making Sense of Narrative Text: Situation, Repetition, and Picturing in the Reading of Short Stories. London and New York: Routledge.
Vavřín, Martin, and Alexandr Rosen. 2015.
“Treq (v. 2.1).” https://treq.korpus.cz/. Accessed January 2022.
Vincent, Benet, and Jim Clarke. 2017. “The Language of A Clockwork Orange: A Corpus Stylistic Approach to Nadsat.” Language and Literature 26, no. 3: 247–64.
Wales, Katie. 2011. A Dictionary of Stylistics. 3rd ed. London and New York: Routledge.
Winter, Bodo. 2020. Statistics for Linguists. An Introduction using R. London and New York: Routledge.
Winters, Marion. 2007. “F. Scott Fitzgerald’s Die Schönen und Verdammten: A Corpus-based Study of Speech-act Report Verbs as a Feature of Translators’ Style.” Meta 52, no. 3: 412–25.
Zhu, Chunshen. 2004. “Repetition and Signification: A Study of Textual Accountability and Perlocutionary Effect in Literary Translation.” Target 16, no. 2: 227–52.
Zupan, Simon. 2006. “Repetition and Translation Shifts.” ELOPE: English Language Overseas Perspectives and Enquiries 3, no. 1–2: 257–68.

9 Feminist Translation of Sexual Content
A Quantitative Study on Chinese Versions of The Color Purple

Xinyi Zeng and John Sie Yuen Lee

9.1 Introduction

Many cultures consider references to genitals, sexual intercourse, and other sex-related content to be taboo. Translators therefore often walk a tightrope, balancing their desire for faithful translation, the need to be explicit for the sake of clarity, and the pressure to omit or mitigate sensitive material for acceptability. While omission of sexual content in a translation is a well-known phenomenon (e.g., Han 2008; Santaemilia 2011; Wu 2010), few quantitative analyses have been reported on the frequency and choice of translation strategies for different types of sexual content. This chapter aims to identify distinctive strategies used in a feminist translation through a comparison between two Chinese versions of an English novel, one feminist and the other non-feminist.
Our study focuses on the acclaimed feminist novel The Color Purple (Walker 1985), winner of the National Book Award and the Pulitzer Prize, and its translations by Jie Tao (1998) and Renjing Yang (1987). One of the few translators who have identified themselves as feminist (Mu 2008, 25), Tao specializes in feminist literature and has published extensively on feminist consciousness in female writers (Tao 1994, 1995, 2004). Yang, also an expert in American literature, focuses on Hemingway and does not identify as feminist (Yang 1989). While the sexual content in these two translations has been discussed in the research literature (e.g., Han 2008; Lee 2013), no quantitative comparison has been given. This study presents a parallel text analysis that quantitatively identifies the characteristics of a feminist translation, as represented by Jie Tao. Specifically, we address the following research questions on the translation strategies employed in the two versions:

• (Q1) How does a feminist translation differ from a non-feminist one in terms of translation strategy? Adopting a corpus-based methodology, we annotate the translation strategies used in the feminist and non-feminist versions, focusing on deletion, implicit substitution and transcreation, faithful translation, and explicit substitution and transcreation.
• (Q2) How does the feminist translation strategy vary according to the nature of the sexual content? We investigate differences in the choice of translation strategy for various sexual content types, including references to private parts, body exploration, bodily phenomena, stigma, rape, illicit relations, and sexual intercourse.

DOI: 10.4324/9781003298328-10

Quantitative research methods have been increasingly applied in translation studies (Mellinger and Hanson 2017).
However, according to a recent survey, only 1 out of 33 reviewed articles on feminist translation of novels employed a corpus-based method (Irshad and Yasmin 2022). To the best of our knowledge, this is the first parallel text study that statistically characterizes feminist translation practice and its treatment of different types of sex-related materials.

The rest of the chapter is organized as follows. We first present background information on the text and summarize previous research on the translation of sexual content (Section 9.2). We then describe our parallel corpus in terms of the annotation on sexual content type (Section 9.3) and translation strategy (Section 9.4). Finally, we discuss the results (Section 9.5) and conclude (Section 9.6).

9.2 Background

9.2.1 Novel and Translators

Celie, the heroine of The Color Purple, is often considered a model for female empowerment. At the beginning of the novel, she was raped by her stepfather but knew nothing about it. Her sexual experience with her husband, who did not care about her, was numb and hurtful. Then she met her true love, Shug, who taught her to find herself and appreciate her own body. Though criticized for being “too dirty” and “too explicit” (Holt 1996, 102), the sexual depictions in the novel are integral to its plot and themes. They immerse the reader in Celie’s world to experience her awakening from her painful childhood and sexual ignorance. This novel therefore provides valuable material for studying the translation of sexual content.

The personal background of a translator plays an important role in his or her translation practice. Jie Tao and Renjing Yang are both Chinese experts in American literature, but they are poles apart in terms of feminist consciousness. While Yang (1989, 29–35) recognized certain feminist issues, such as the oppression of Black women, he rarely discussed his translation of sexual content in paratexts.
In contrast, Tao, Vice Chairperson of the Domestic and Foreign Women’s Study Center at Peking University, paid high tribute to Walker’s feminist views and her proposition of racial and gender equality (Tao 2004, 277). She emphasized the awakening of the female protagonist and saw the description of the love, growth, and transformation of Black women as a great achievement of the novel. A comparative study on these two translators can therefore yield insights into the characteristics of feminist translation.

9.2.2 Feminist Translation and Parallel Text Analysis

Language is a powerful tool to express one’s ideology, ideas, and subversion. Sex-related content in literature often serves to shape the characters, develop the plot, and express the writer’s feelings. In feminist translation, such content typically focuses on the female body, inner desire, and sex life (von Flotow 1991, 69–84). Parallel text analysis, which compares translations of the same source text, has been conducted on free indirect speech (Bosseaux 2004, 107–22), report verbs (Winters 2007, 412–25), style (Huang and Chu 2014, 122–41), and discourse presentation (Huang 2014, 94–102). Research on sexual content, however, has mostly been limited to qualitative analyses of anecdotal examples. The relationship between body and language was examined in two English translations of Jin Ping Mei (Liang 2017, 1–20). Textual relations and aesthetic effects of erotic materials were analyzed in English versions of The Peony Pavilion (Lee and Ngai 2012, 73–94). It has also been argued that omissions in the translation of Sha Fu not only muddle the plot but also diminish the feminist sentiment (Wu 2010). In a previous study on The Color Purple, Han (2008, 69–85) showed that both Tao and Yang “cleansed” the source text because of the subconscious working of the sex taboo, rather than ethical considerations.
Lee (2013) investigated the woman-identified translation approach in this novel with regard to sexual coercion, sex subjugation, extramarital affairs, and same-sex sexuality. However, none of these studies offers systematic comparisons of feminist translation strategies or considers the role of sexual content types.

9.3 Annotation of Sexual Content Types

To quantify the treatment of different types of sexual content, we classified the content of sex-related sentences in The Color Purple (this section) and then constructed a parallel corpus and annotated the translation strategies (Section 9.4).

9.3.1 Identification of Sex-Related Sentences

To identify sentences with sex-related content, we constructed a list of 31 taboo keywords with reference to existing lists (Kutner and Brogan 1974, 474–84; Geer and Bellard 1996, 379–95; Grosser and Walsh 1966, 219–27; Cooper 2007, 27–50; Slamia 2020, 82–98; Yuan 2015, 19–44). These keywords alone would have missed many sentences that express sexual content through words that are non-erotic in their primary sense. We therefore also constructed a list of 55 implicit keywords that can acquire a sexual connotation in context (e.g., “button,” “fresh,” and “clean”).

Since the presence of a keyword does not necessarily mean that the sentence carries sexual content, human judgment is necessary. To reduce the subjectivity in interpretation, each sentence was independently labelled by two annotators, both non-native speakers but proficient in English, as either “sex-related” or “non-sex-related” in its context. The annotators were not shown the keyword in the sentence. Since manpower constraints did not allow exhaustive annotation of the entire novel, we retrieved between 1 and 4 example sentences from the novel per keyword, yielding 80 sentences with taboo keywords and 193 with implicit keywords, hence a total of 273 sentences.
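When two annotators label the same sentences independently, their agreement is commonly summarized with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below computes it in plain Python from a 2 x 2 agreement table, using the counts reported in the following paragraph (197 sentences labelled sex-related by both annotators, 65 labelled non-sex-related by both, 11 disagreements). How the 11 disagreements divide between the two annotators is not reported, so the sketch simply tries every possible split; the function name cohen_kappa is hypothetical.

```python
def cohen_kappa(both_yes, both_no, only_a, only_b):
    """Cohen's kappa for two annotators and two labels.

    both_yes / both_no: sentences on which the annotators agree.
    only_a / only_b: sentences labelled "yes" by one annotator only.
    """
    n = both_yes + both_no + only_a + only_b
    observed = (both_yes + both_no) / n          # raw agreement
    a_yes = (both_yes + only_a) / n              # annotator A's "yes" rate
    b_yes = (both_yes + only_b) / n              # annotator B's "yes" rate
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)  # chance agreement
    return (observed - expected) / (1 - expected)


# Agreement counts taken from the chapter; the split of the 11
# disagreements is an unknown, so every split from 0/11 to 11/0 is tried.
kappas = [cohen_kappa(197, 65, k, 11 - k) for k in range(12)]
print(f"kappa ranges from {min(kappas):.4f} to {max(kappas):.4f}")
```

Whatever the split, the value stays within a very narrow band close to the reported coefficient, so the unreported detail does not affect the interpretation of the agreement level.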
Feminist Translation of Sexual Content 161

Among these 273 sentences, 197 sentences were annotated as sex-related by both annotators; 65 were annotated as non-sex-related by both, and the remaining 11 elicited different judgments. The Cohen’s kappa coefficient was 0.89, corresponding to the “almost perfect” level of agreement (Landis and Koch 1977, 165). Most disagreements stemmed from interpretations of implicit keywords. For example, the sentence “He be on her all-time” was considered by one annotator to be a description of sexual intercourse, while the other annotator interpreted it without sexual tones to mean simply that they stayed together. The remainder of this article will focus on the 197 sentences considered sex-related by both annotators (see Supplemental File).

9.3.2 Sexual Content Type

Sex-related content can serve a variety of literary functions, from sexual innuendos that denigrate women, to descriptions of sexual pleasure that advocate self-discovery and self-acceptance. We identified the dominant content type in each of the 197 sentences according to the following taxonomy.

Stigma (e.g., “slut,” “whore”) covers sentences containing insults, epithets, and other sexually derogatory terms, usually aimed at female characters. Rape (e.g., “wiggle it around,” “hurt”) includes sentences that depict sexual assault, mostly men forcibly having sex with women. Illicit relations (e.g., “fornication,” “incest”) refer to adultery, incest, and other illicit sexual relationships, mostly heterosexual. Sexual intercourse (e.g., “go to bed with,” “sleep with”) is the category for sentences describing sexually intimate behavior, either heterosexual or lesbian, that does not constitute rape or illicit relations. The remaining three types are directly related to the female body, in the context of this novel. Body exploration (e.g., “haul up,” “shiver”) describes masturbation and one’s sexual feelings.
Bodily phenomena (e.g., “pregnant,” “blood”) are natural processes in the female body that are often considered “unclean,” such as menstruation and pregnancy. Finally, private parts (e.g., “breast,” “nipple”) are mentions of sexual organs, mostly female genitals, outside the context of sexual intercourse, body exploration, or bodily phenomena. Table 9.1 shows the number of sentences belonging to each type.

Table 9.1 Breakdown of Sentences in Our Corpus According to Sexual Content Type

Sexual Content Type    # Sentences
Private parts          28
Stigma                 22
Bodily phenomena       22
Rape                   18
Sexual intercourse     62
Illicit relations      19
Body exploration       26

9.4 Annotation of Translation Strategy

Our annotation scheme for translation strategies can be visualized as a spectrum (Table 9.2). Faithful translation, the neutral strategy, occupies the middle, with conservative strategies placed to its left (Section 9.4.1) and explicit strategies to its right (Section 9.4.2). In a comparison of two strategies (Sections 9.5.2, 9.5.3), the strategy positioned to the right of the other will be said to be “more explicit.” All example sentences cited in this section can be found in Table 9.3.

A translation is deemed “faithful” if the sexual content has a similar degree of explicitness as the source text. In our context, this means explicit English terms are kept explicit in the Chinese translation, for example, “nipples” in example 4a, and implicit ones are kept implicit, for example, “his thing” in example 4b.

9.4.1 Conservative Strategies

Deletion means the omission of sex-related content by skipping the entire sentence (example 1a), by removing only the erotic parts of the sentence, or by hiding those parts with ellipses or suspension points (example 1b). At the conservative end of the spectrum, deletion can efficiently remove obscenity from the text but can also damage plot consistency.
Implicit transcreation means rewriting the sentence to reduce indecency, as guided by the translator’s understanding of the source text. A graphic description, such as “put his thing up gainst my hip,” can be paraphrased as dong qi shoujiao lai, “move his hand and feet [on me]” (example 2). Besides avoiding sexual cues and vulgarity, this strategy can lead to better comprehension by the target readers than deletion (Spinzi et al. 2018).

Implicit substitution uses euphemistic expressions to replace erotic content, typically through substitutions of specific terms. The word “privates,” for example, can be substituted with xiao duzi, “stomach,” to reduce the obscenity (example 3). This strategy is considered closer to faithful translation than implicit transcreation, which makes more significant changes to sentence structure and places less emphasis on fidelity and equivalence (Du 2020, 754).

9.4.2 Explicit Strategies

Explicit substitution uses words that are even more explicit or bold than the original. It serves to clarify the eroticism by decoding slang terms, puns, sexual metaphors, etc. For example, the word “privates” was explicitly substituted with yinjing, “penis” (example 5), in direct contrast to the implicit substitution in example 3. In a feminist translation, these substitutions can focus the reader’s attention on the female experience.

Table 9.2 The Spectrum of Translation Strategies Annotated in Our Corpus

<<< Conservative Strategies <<<: Deletion (D) | Implicit Transcreation (IT) | Implicit Substitution (IS)
Middle of the spectrum: Faithful Translation (FT)
>>> Explicit Strategies >>>: Explicit Substitution (ES) | Explicit Transcreation (ET)

Table 9.3 Translation Strategies Illustrated with Example Translations

Deletion
1a. Source text: Then he push his thing into my pussy.
    Translation: None
1b. Source text: Listening to you sing, folks git to thinking bout a good screw.
    Translation: 聽聽你唱歌,聽眾能夠想起一次 . . . “Listening to you singing, audiences want to . . .” (Yang).

Implicit Transcreation
2. Source text: First he put his thing up gainst my hip.
    Translation: 他動起手腳來。 “He moves his hand and feet on me” (Tao).

Implicit Substitution
3. Source text: He punch her in the stomach, she double over groaning but come up with both hands lock right under his privates.
    Translation: 他掐住她的肚子,她大聲呻吟,立即用兩隻拳頭打中他的小肚子。 “He nipped her stomach, she yelling loudly and punched his stomach immediately” (Yang).

Faithful Translation
4a. Source text: First time I got the full sight of Shug Avery long black body with it black plum nipples, look like her mouth, I thought I have turned into a man.
    Translation: 我第一次看到莎格•艾弗裡瘦長的黑身體和像她嘴唇一樣的黑梅子似的乳頭的時候,我以為我變成男人了。 “First time I saw Shug Avery’s long black body and her nipples which looks like her mouth, I thought I have turned into a man” (Tao).
4b. Source text: First he put his thing up gainst my hip
    Translation: 他把那東西隆起,頂著我的屁股 “His thing perked up and gainst my hip” (Yang).

Explicit Substitution
5. Source text: He punch her in the stomach, she double over groaning but come up with both hands lock right under his privates.
    Translation: 他一拳打在她的肚子上,她彎下身子大聲哼哼,可馬上把兩手緊緊攥住他的陰莖。 “He punches her in the stomach; she bent down with groaning but gripped his penis” (Tao).

Explicit Transcreation
6. Source text: How it hurt and how much I was surprise.
    Translation: 他搞得我多疼。我多麼吃驚啊。 “How much he fucked me. How much I was surprised” (Yang).

Explicit transcreation also uses language that is more explicit or bold than the original, but in a “looser” translation. As mentioned in Section 9.4.1, the translator’s understanding of the text is prioritized in transcreation over equivalence to the source text. Example 6 shows a creative rendering of “how it hurt” into a more explicit expression.

9.4.3 Annotation and Inter-Annotator Agreement

Since the line between faithful translation and other strategies is often blurred, annotation of translation strategy can be subjective due to individual background and interpretation.
To enhance the reliability of the analysis, we recruited eight native speakers of Chinese to participate in the annotation. All were professional translators, and six of them had graduated from professional translation programs. We obtained draft sentence segmentation and English-Chinese sentence alignment from SDL Trados (Note 1) and then manually corrected the alignments.

The translation strategies in each of the 197 sentences translated by Tao and the same sentences translated by Yang were independently classified by two annotators: the first author of this article, and one of these eight professional translators. The kappa coefficients were 0.58 and 0.56 for Tao and Yang, respectively, corresponding to a moderate level of agreement (Landis and Koch 1977, 165). In case of disagreement, the two annotators reconciled their differences through discussion to finalize the label.

The most common disagreement was between faithful translation and explicit transcreation. Some annotators were more sensitive in detecting sexual emphases in the translation. Consider the sentence “When that hurt, I cry” and its translation, Wo teng de hen, han le qilai (我疼得很,喊了起來), “I feel pain and cry.” It was judged to be faithful translation by one annotator, but explicit transcreation by another, which may arguably be attributed to an overinterpretation of the word han, “cry.”

9.5 Results and Analysis

We first compare the overall distribution of translation strategies in the feminist and non-feminist translations (Section 9.5.1) and then examine sentences for which the two versions differ in translation strategies (Section 9.5.2). Next, we investigate the role of sexual content types (Section 9.5.3), followed by a discussion on the significance of the results (Section 9.5.4).
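The agreement figures reported in Sections 9.3.1 and 9.4.3 are Cohen’s kappa values, that is, observed agreement corrected for the agreement expected by chance. As a minimal sketch, the value for the sex-relatedness labels can be approximated from the counts given in Section 9.3.1 (197 joint “sex-related,” 65 joint “non-sex-related,” 11 disagreements). The chapter does not report how the 11 disagreements split between the two annotators, so the 6/5 split below is an assumption, and `cohen_kappa` is a hypothetical helper name.

```python
def cohen_kappa(both_yes, both_no, only_a_yes, only_b_yes):
    """Cohen's kappa for two annotators assigning a binary label."""
    total = both_yes + both_no + only_a_yes + only_b_yes
    observed = (both_yes + both_no) / total      # proportion of identical labels
    a_yes = (both_yes + only_a_yes) / total      # annotator A's rate of "yes"
    b_yes = (both_yes + only_b_yes) / total      # annotator B's rate of "yes"
    # Chance agreement: both say "yes" or both say "no" independently.
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return (observed - expected) / (1 - expected)


# Counts from Section 9.3.1; the 6/5 split of the 11 disagreements is assumed.
kappa = cohen_kappa(both_yes=197, both_no=65, only_a_yes=6, only_b_yes=5)
print(f"{kappa:.3f}")
```

Whatever split of the 11 disagreements is assumed, the result stays between 0.89 and 0.90, which is consistent with the reported coefficient and with the “almost perfect” band of Landis and Koch (1977).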
9.5.1 Overall Comparison

We use the term “conservative strategies” to refer to deletion, implicit substitution, and implicit transcreation (Section 9.4.1); and the term “explicit strategies” to refer to explicit substitution and explicit transcreation (Section 9.4.2). Figure 9.1 presents the overall distribution of translation strategies. Faithful translation was the most common strategy, adopted in about two-thirds of the sentences in both the feminist (67.0%) and non-feminist (64.0%) versions. The feminist translation uses explicit strategies more often (7.6%) than the non-feminist one (5.6%); conversely, the non-feminist translation uses conservative strategies more often (30.5%) than the feminist one (25.4%). The breakdown of the frequency of individual strategies, shown in Table 9.4, is consistent with the overall trend and suggests a slightly more tolerant attitude for the feminist translator.

Figure 9.1 Overall distribution of translation strategies for sexual content in feminist (Tao) and non-feminist (Yang) translation.

Table 9.4 Breakdown of Translation Strategy Frequency in Feminist (Tao) and Non-Feminist (Yang) Translation

Translation Strategy      Feminist   Non-feminist
Deletion                  31         38
Implicit Transcreation    2          3
Implicit Substitution     17         19
Faithful Translation      132        126
Explicit Substitution     7          6
Explicit Transcreation    8          5

9.5.2 Sentence-Level Comparison

The sentence-level comparison investigates whether the feminist or non-feminist version translates a sentence in a more explicit manner. The feminist sentence is considered “more explicit” if its translation strategy is positioned on the spectrum (Table 9.2) to the right of the strategy taken in the non-feminist sentence, and vice versa.

The feminist and non-feminist translations adopt the same strategy in a majority of sentences (128 out of 197), as shown in the diagonal cells in Table 9.5. Among the remaining sentences, the feminist ones are more explicit than the non-feminist counterparts most of the time. This can be seen in the 43 cases in the cells in the lower triangle, in comparison to the 26 cases in the cells in the upper triangle. The bolded figures show the four sentences that take opposite sides of the strategy spectrum – that is, one conservative and one explicit – in the two versions. Elsewhere, the gap is always smaller: we first discuss cases contrasting a faithful translation and a conservative one (Section 9.5.2.1), then those involving a faithful translation and an explicit one (Section 9.5.2.2).

Table 9.5 Sentence-Level Comparison of the Translation Strategies in the Feminist (Tao) and Non-Feminist (Yang) Translations

Non-feminist → ↓ Feminist: Deletion | Implicit Transc. | Implicit Subst. | Faithful Translation | Explicit Subst. | Explicit Transc.
22 0 1 0 0 0 8 1 0 0 0 1 3 0 7 6 1 0 13 0 2 0 10 1 99 6 4 0 4 0 0 0 1 6 1 0

9.5.2.1 Conservative vs. Faithful Translation

There are 23 sentences that were translated faithfully by the feminist but conservatively by the non-feminist. The majority of the 13 sentences deleted by Yang (7 out of 13) deal with masturbation, for example when Shug taught Celie to touch and explore her body. In the other 10 sentences, Yang mostly used implicit substitution for private parts and heterosexual scenes.

Conversely, 15 sentences were translated faithfully by the non-feminist but conservatively by the feminist. Tao used deletion in 8 sentences carrying a variety of sexual content, including rape, adultery, masturbation, and sexual intercourse. She applied implicit substitution and transcreation on three sentences describing heterosexual intercourse, one describing rape, and three with sexual insults on female characters.
Yang gave a faithful translation despite the negative light cast on the female characters.

9.5.2.2 Explicit vs. Faithful Translation

The treatment of pronouns is responsible for the bulk of the 12 sentences that were translated faithfully by the non-feminist but explicitly by the feminist. Tao used specific terms such as huaiyun (懷孕), “pregnancy,” and yuejing (月經), “menstruation,” to clarify the pronoun “it,” while Yang maintained the ambiguity with the euphemism zhezhong shi (這種事), “this kind of thing.” Likewise, Tao turned “kind of love” into the more vivid zuo’ai (做愛), “make love,” which facilitates better understanding than the oblique hint in Yang’s nazhong aiqing (那種愛情), “that love.”

Conversely, eight sentences have faithful translation in the feminist version but explicit translation in the non-feminist version. All these sentences involve sexual assault or stigma. Consider the expression “How it hurt” in example 6 (Table 9.3). Yang elaborated the pronoun “it” into ta gao de wo duo teng, “how much he fucked me,” using the pejorative verb gao in a vivid description of the rape. In contrast, Tao gave a neutral translation, nage tengtong de ziwei (那個疼痛的滋味), “that hurtful feeling.”

9.5.3 Sexual Content Types

We now assess whether different kinds of sexual content have any bearing on the choice of translation strategy. Our discussion will refer to the overall rates of faithful translation, conservative, and explicit strategies shown in Figure 9.2, using the same definitions for “explicit strategies” and “conservative strategies” as in Section 9.5.1. These rates are shown according to sexual content type, listed in decreasing rate of conservative translation in the feminist version. We first discuss the types in which the feminist translation exhibits a lower rate of conservative strategy than the non-feminist (Section 9.5.3.1), and then those in which it has a higher rate (Section 9.5.3.2).
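The sentence-level comparison of Section 9.5.2, which this section reuses per content type, amounts to comparing two positions on an ordered scale. The sketch below encodes the spectrum of Table 9.2 and classifies a pair of strategy labels. The function names and the short list of pairs are illustrative assumptions: the pairs are read off the examples in Tables 9.3 and 9.6 and are not the full corpus.

```python
from collections import Counter

# Spectrum of Table 9.2, from most conservative (left) to most explicit (right).
SPECTRUM = ["D", "IT", "IS", "FT", "ES", "ET"]
POSITION = {code: index for index, code in enumerate(SPECTRUM)}
CONSERVATIVE = {"D", "IT", "IS"}
EXPLICIT = {"ES", "ET"}


def compare(feminist, non_feminist):
    """Say which version uses the strategy further to the right on the spectrum."""
    difference = POSITION[feminist] - POSITION[non_feminist]
    if difference > 0:
        return "feminist more explicit"
    if difference < 0:
        return "non-feminist more explicit"
    return "same strategy"


def opposite_sides(feminist, non_feminist):
    """True when one version is conservative and the other explicit."""
    pair = {feminist, non_feminist}
    return bool(pair & CONSERVATIVE) and bool(pair & EXPLICIT)


# (feminist, non-feminist) labels for a few sentences illustrated in the chapter.
pairs = [
    ("ES", "IS"),  # "privates": Table 9.3, examples 5 (Tao) and 3 (Yang)
    ("IT", "FT"),  # "put his thing up gainst my hip": examples 2 (Tao) and 4b (Yang)
    ("FT", "D"),   # "button": Table 9.6, example 5
    ("FT", "ES"),  # "after he through": Table 9.6, example 6
]
tally = Counter(compare(f, n) for f, n in pairs)
print(tally)
```

Applied to all 197 aligned sentence pairs, the same tally would correspond to the lower-triangle, upper-triangle, and diagonal counts of Table 9.5.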
Our discussion will also refer to the sentence-level comparison in Figure 9.3. Using the same definition as Section 9.5.2, a translation strategy is said to be “more explicit” if it is positioned to the right of another strategy on the spectrum. All example sentences cited in this section can be found in Table 9.6.

Figure 9.2 Overall distribution of translation strategies according to sexual content type in feminist (Tao) and non-feminist (Yang) translation. [Bar chart with one bar each for Tao and Yang per content type: body exploration, private parts, rape, sexual intercourse, illicit relations, bodily phenomena, stigma. Horizontal axis from 0.00% to 100.00%; segments: Conservative (D+IS+IT), Faithful translation, Explicit (ES+ET).]

Figure 9.3 Sentence-level comparison: the number of sentences translated with a more explicit strategy in the feminist (Tao) than non-feminist (Yang) translation, and vice versa.

Table 9.6 Example Sentences for Each Sexual Content Type

Stigma
1. Source text: He beat me for dressing trampy.
    Faithful Translation: 他揍我,因為我穿得像個蕩婦, “He beats me because I dress like a tramp” (Tao).
    Implicit Transcreation: 他打我,怪我穿得邋裡邋遢的。 “He beats me and blames me to dress dirty” (Yang).

Bodily Phenomena
2. Source text: When I start to hurt and then my stomach start moving and then that little baby come out my pussy chewing on it fist.
    Implicit Substitution: 我的肚子突然一陣疼痛,肚子動了起來,一個小娃娃從我那個地方掉了出來,啃著手指頭。 “When I felt hurt my belly started to move and one little baby came out from that place, chewing fingers” (Tao).
    Deletion: 我開始覺得疼時,緊接著,我的肚子就蠕動起來,後來,那小傢伙就生出來了,用嘴巴咬著小拳頭,簡直叫我大吃一驚。 “When I felt hurt, my belly started moving, then the little guy came out and biting fist which surprised me” (Yang).

Sexual Intercourse
3. Source text: Much as I still want to be with her.
    Explicit Transcreation: 儘管我非常想跟她親熱 “even though I very much want to make out with her” (Tao).
    Faithful Translation: 我仍然很想跟她同床 “I still very much want to sleep in the same bed with her” (Yang).

Private Parts
4. Source text: All dressed up for Harpo’s, smelling good and everything, but scared to look at your own pussy.
    Implicit Substitution: 哈波酒店去的時候打扮得漂漂亮亮,抹得香噴噴的,可就不敢看自己的下身。 “Dressed beautiful to Harpo’s hotel, smelling good but you dare not to look down the lower part of your body” (Tao).
    Deletion: 上哈 打扮得漂漂亮亮的, 的 去, 香 , , 的 . . . 。 “Dressed beautiful to Harpo’s bar, perfume is all around, everything is good, but you dare not to . . .” (Yang).

Body Exploration
5. Source text: My little button sort of perk up too.
    Faithful Translation: 我下身 來了。 “The little button from the lower part of the body has perked up” (Tao).
    Deletion: None (Yang).

Rape
6. Source text: After he through, I say, he make me finish trimming his hair.
    Faithful Translation: 事後,我說,他讓我把他的頭髮 。 “After it is done, I said, he made me trim his hair” (Tao).
    Explicit Substitution: 他搞過以後,我說,就叫我把他的頭 。 “After he fucked me, I said, [he] asked me to trim his hair” (Yang).

Illicit Relations
7. Source text: Got they legs open to every Tom, Dick and Harry.
    Implicit Transcreation: 跟随便哪個湯姆、迪克、哈里之類 可以睡覺。 “[They] may sleep with anyone, Tom, Dick, Harry and the like” (Tao).
    Explicit Transcreation: 她們跟湯姆、狄克和哈里鄉亂搞, 不論 人 。 “They fucked with Tom, Dick and Harry, fucked with all kinds of men” (Yang).

9.5.3.1 Less Conservative in Feminist Translation

STIGMA

In the feminist translation, stigma is subject to the lowest rate of conservative translation (4.5%) among all sexual content types (Figure 9.2). The feminist translator used conservative strategies more sparingly than did the non-feminist counterpart (18.2%), in line with the sentence-level comparison (Figure 9.3), which shows her to be slightly more explicit. These include four instances of faithful translation by Tao and implicit substitution or transcreation by Yang.
In example 1, Celie was scolded by her stepfather as “trampy” when she dressed up to attract his attention and protect her sister from sexual harassment. Since the context suggests a slang usage of the word for a disreputable, promiscuous woman, Tao rendered “trampy” with the derogatory term dangfu, “slut.” In contrast, Yang translated it as lalilata, “messy,” taking the less sexually charged sense of a dirty wanderer living on the street.

Among cases of explicit treatment of stigma, however, the non-feminist translation outnumbers the feminist. Many of these sentences are associated with virginity, for example, with the words “fresh” and “old-maid.” Yang highlighted the insinuation using explicit substitution with chunv (處女), “virgin,” while Tao deemphasized it, either with lao guniang (老姑娘), “old spinster,” or with the metaphorical term huanghua guinü (黃花閨女), “chrysanthemum girl.”

BODILY PHENOMENA

These phenomena incur the second-lowest rate of conservative translation for the feminist (9.1%) and the lowest rate for the non-feminist (13.6%). Terms related to these common and natural phenomena, such as menstruation and childbirth, appear to be more acceptable than those in most other types. Although Yang was only slightly more conservative than Tao overall, the sentence-level comparison shows Tao to be invariably more explicit when the two diverged. A case in point is her implicit substitution of “pussy” with nage difang, “that place,” in contrast to the outright deletion by Yang (example 2). Her disambiguation of the pronoun “it” into “menstruation,” discussed in Section 9.5.2.2, is another one.

SEXUAL INTERCOURSE

The feminist translation is slightly less conservative (28.0%) than the non-feminist one (34.0%) overall. The sentence-level comparison reveals a larger gap, with Tao being more explicit in 16 out of the 25 sentences in which they employed different strategies. The lesbian scenes involving Celie and Shug contributed much to the difference.
Among the 12 sentences in this category, Tao used explicit strategy in three instances, while Yang never did so. In example 3, Tao stressed Celie’s active desire for sex by amplifying “to be with her” as gen ta qinre, “make out with her”; however, Yang preserved the ambivalence in the source text with the faithful translation gen ta tongchuang, “sleep in the same bed with her.”

PRIVATE PARTS

The vast majority are references to female genitals, with only four on the male counterparts. The non-feminist translation applies conservative strategies to 50% of these references, significantly more often than did the feminist (35.7%). Tao was more explicit in a majority of sentences (7 out of 10). For example, in the sentence “I got my eyes glue there too,” she boldly rendered “there” as xiongpu (胸脯), “bosom,” while Yang evaded the reference with wo yeshi yiyang (我也是一樣), “me too.” While both translators seemed comfortable with private parts in the upper body (e.g., “nipples”), they almost always avoided the lower body, for example, with “pussy” in example 4.

BODY EXPLORATION

This content type triggers the highest rate of conservative treatment for both the feminist and non-feminist, likely because of the lack of socially acceptable terms for masturbation (Millett 2000, 55–58). Yang was more conservative (57.7%) than Tao (38.5%) overall, often deleting sensitive words describing one’s feelings during masturbation. The sentence-level comparison corroborates this trend with a clear contrast. Tao was more explicit than Yang in 7 out of 9 cases, including the treatment of “button” in example 5.

9.5.3.2 More Conservative in Feminist Translation

RAPE

In a departure from the content types discussed previously, when dealing with rape, the feminist translation is both more conservative (27.8% vs. 16.7%) and less explicit (0% vs. 16.7%) than the non-feminist (Figure 9.2).
The sentence-level comparison yields the same observation, showing Yang to be more explicit (Figure 9.3). Yang ensured the reader understands “after he through” in example 6 with ta gao guo yihou, “after he fucked me,” while Tao obscured the harm done to Celie with shihou, “after it is done.” A similar contrast can be observed in example 4b in Table 9.3, where Yang preserved the graphic depiction of Celie’s rape by her stepfather, while Tao opted to remove the details with an implicit transcreation (example 2).

ILLICIT RELATIONS

Tao was more conservative than Yang overall (26.3% vs. 15.8%) when describing socially unaccepted relations, mostly adultery and incest. The direct comparison also shows Yang to be more explicit in 3 out of 5 cases. Two examples are Yang’s translation of “screw” as tongjian (通姦), “fornicate,” in the sentence “Feed fifty men, screw fifty-five,” and of “got they legs open” as luan gao, “fuck wantonly,” in example 7. In both of these cases, Tao chose implicit substitution with the euphemistic term shuijiao, “sleep.”

9.5.4 Discussion

Compared to a non-feminist translator, Jie Tao would be expected to hold a more open attitude for sex-related content, especially in a novel that “contributes to feminism and woman’s liberation” (Tao 2004, 276–7). However, Renjing Yang chose conservative strategies only slightly more often than did Tao (30.5% vs. 25.4%). He also opted for faithful translation almost as frequently (64.0% vs. 67.0%), all the more remarkable considering that his publication (1987) preceded hers (1998), separated by a decade of liberalization. Hence, a high rate of faithful translation, by itself, may not be a decisive metric for the feminist perspective.
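The overall rates quoted in this discussion can be re-derived from the strategy counts in Table 9.4 by grouping the six strategies as in Section 9.5.1. The short sketch below does this arithmetic; the dictionary layout and the name `group_rates` are illustrative choices, while the counts are those of Table 9.4.

```python
# Strategy counts per translator, as listed in Table 9.4 (197 sentences each).
COUNTS = {
    "Tao": {"D": 31, "IT": 2, "IS": 17, "FT": 132, "ES": 7, "ET": 8},
    "Yang": {"D": 38, "IT": 3, "IS": 19, "FT": 126, "ES": 6, "ET": 5},
}

# Grouping used in Section 9.5.1.
GROUPS = {
    "conservative": ("D", "IT", "IS"),
    "faithful": ("FT",),
    "explicit": ("ES", "ET"),
}


def group_rates(counts):
    """Percentage of sentences per strategy group, rounded to one decimal."""
    total = sum(counts.values())
    return {
        group: round(100 * sum(counts[code] for code in codes) / total, 1)
        for group, codes in GROUPS.items()
    }


rates = {translator: group_rates(counts) for translator, counts in COUNTS.items()}
for translator, values in rates.items():
    print(translator, values)
```

For example, Tao’s conservative rate is (31 + 2 + 17) / 197, or about 25.4%, and Yang’s is (38 + 3 + 19) / 197, or about 30.5%, matching the figures cited above.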
The rate could be attributed instead to the translator’s view on text acceptability, such as in the case of Yang, who considered the sexual content of the novel to be measured (Note 2).

The overall rate of faithful translation masks the underlying differences in sexual content types, which we found to be more revealing for the feminist perspective. On topics directly related to the female body, such as private parts, bodily phenomena, and body exploration, the feminist translation applies conservative strategies more sparingly (Section 9.5.3.1). Faithful or explicit translation of masturbation, for example, can be motivated by the translator’s approval of Celie’s self-discovery of her body, in accordance with feminist convictions. Frank descriptions of private parts and bodily phenomena can likewise be explained as a feminist effort to highlight the female protagonist’s growing sexual consciousness. Nonetheless, among all content types, private parts and body exploration are most often translated conservatively (Figure 9.3), indicating their relative unacceptability among the Chinese readership.

The feminist attitude to stigma is nuanced. Perhaps similar to its treatment of the female body, the feminist translation prefers an unvarnished portrayal of sexual insults (e.g., “trampy”) to expose the systemic gender-based oppression. It is, however, more conservative when the stigma involves virginity (e.g., “old maid”). Since virginity is often bound up with the control of the female body, denial of sexual desire, and depression of sexual knowledge (Millett 2000, 55), the feminist translation might be inclined to de-emphasize these associations.

The feminist translation is also more conservative than the non-feminist on illicit relations and sexual assault (Section 9.5.3.2).
Since the illicit relations often involve promiscuity and adultery, which are frowned upon by traditional Confucian culture, the conservative strategies could be an attempt to minimize these stereotypes for the novel’s heroines. The victims of sexual assault are almost always female characters in this novel. Unlike verbal violence such as stigma, physical violence is depicted less explicitly in the feminist translation, likely to shield readers from the graphic abuse. Less sensitive to this need, the non-feminist translation finds the pain and cruelty in the description more acceptable. A case in point is Yang’s frequent use of the verb gao, which can mean “rape” and “molest” in a sexual context (Note 3), despite the possible unpleasant effect on readers.

Our results should be interpreted with a number of limitations in mind. With regard to translation strategies, while our study has examined six common ones (Section 9.4), others such as archaicization, footnoting, generalization, and hijacking could potentially yield further insights (von Flotow 1991, 69–84; Vinay and Darbelnet 1995; Lee and Ngai 2012, 73–94). With regard to the textual material, a corpus with more translators could support more comprehensive analyses, such as the role of the publisher, as well as the translator’s gender, which may influence the description of private parts and other aspects of the female body. A larger pool of translators in both genders involving a variety of publishers could reduce these confounding variables.

9.6 Conclusion

Despite the extensive literature on the translation of sexual content, most studies on feminist translation of novels have been limited to qualitative methods (Irshad and Yasmin 2022). We have presented the first quantitative comparison between feminist and non-feminist translations of a novel, through a parallel text analysis of the Chinese versions of The Color Purple by Jie Tao and Renjing Yang.
We constructed a parallel corpus consisting of their translations of 197 sex-related sentences, annotated with their sexual content types and translation strategies. Our analysis has revealed that while faithful translation dominates both the feminist and non-feminist versions, the former employs explicit strategies slightly more frequently and conservative ones less frequently. Importantly, our results have identified distinctive strategy choices for different sexual content types. On content related to body exploration, private parts, and bodily phenomena, the feminist translation is less conservative than the non-feminist one; however, on rape, illicit relations, and stigma related to virginity, the feminist translation is more conservative.

Further research can pursue a more thorough examination of the translators’ metatexts and paratexts to present a fuller picture on the distinctive practices in feminist translation. The corpus could also be expanded to include more feminist and non-feminist translators to investigate other variations in the patterns of feminist translation strategies.

Notes

1 www.trados.com.
2 “Thinly disguised, whereas it is reasonable and not indecent” (in the original Chinese, 露而不穢,較有分寸) (Yang 1989, 34).
3 Cf. examples from Wen (2012), e.g., gao nüren (搞女人), “molest young woman”; zhege liumang gaole ta, 這個流氓搞了她, “the hooligan raped her.”

References

Bosseaux, Charlotte. 2004. “Point of View in Translation: A Corpus-based Study of French Translations of Virginia Woolf’s To The Lighthouse.” Across Languages and Cultures 5, no. 1: 107–22. https://doi.org/10.1556/Acr.5.2004.1.6.
Cooper, Burns. 2007. “Taboo Terms in a Sexual Abuse Criminal Trial.” International Journal of Speech Language and the Law 14, no. 1: 27–50. https://doi.org/10.1558/ijsll.v14i1.27.
Du, Chen. 2020. “New Interpretation and Techniques of Transcreation.” APTIF 9 – Reality Vs. Illusion 66, no. 4–5: 750–64. https://doi.org/10.1075/babel.00178.che.
Geer, James H., and Heidi S. Bellard. 1996. “Sexual Content Induced Delays in Unprimed Lexical Decisions: Gender and Context Effects.” Archives of Sexual Behavior 25, no. 4: 379–95. https://doi.org/10.1007/BF02437581.
Grosser, George S., and Anthony A. Walsh. 1966. “Sex Differences in the Differential Recall of Taboo and Neutral Words.” The Journal of Psychology 63, no. 2: 219–27. https://doi.org/10.1080/00223980.1966.10543035.
Han, Ziman. 2008. “Sex Taboo in Literary Translation in China: A Study of the Two Chinese Versions of the Color Purple.” Babel 54, no. 1: 69–85.
Holt, Patricia. 1996. Alice Walker Banned. San Francisco, CA: Aunt Lute Books.
Huang, Libo. 2014. “Discourse Presentation Translation as an Indicator of Translator’s Style: A Case Study of Lao She’s Luotuo Xiangzi and Its Three English Translations Style.” In Style in Translation: A Corpus-Based Perspective, 57–77. Berlin: Springer. https://doi.org/10.1007/978-3-662-45566-1_5.
Huang, Libo, and Chiyu Chu. 2014. “Translator’s Style or Translational Style? A Corpus-Based Study of Style in Translated Chinese Novels.” Asia Pacific Translation and Intercultural Studies 1, no. 2: 122–41. https://doi.org/10.1080/23306343.2014.883742.
Irshad, Isra, and Musarat Yasmin. 2022. “Feminism and Literary Translation: A Systematic Review.” Heliyon 8, no. 3: 1–12. https://doi.org/10.1016/j.heliyon.2022.e09082.
Kutner, Nancy G., and Donna Brogan. 1974. “An Investigation of Sex-Related Slang Vocabulary and Sex-Role Orientation among Male and Female University Students.” Journal of Marriage and the Family 36, no. 3: 474–84. https://doi.org/10.2307/350718.
Landis, J. Richard, and Gary G. Koch. 1977. “The Measurement of Observer Agreement for Categorical Data.” Biometrics 33, no. 1: 159. https://doi.org/10.2307/2529310.
Lee, Tong-King, and Cindy Ngai. 2012. “Translating Eroticism in Traditional Chinese Drama: Three English Versions of the Peony Pavilion.” Babel 58, no. 1: 73–94.
Lee, Tzu-yi. 2013. “Woman-identified Approach in Practice: A Case Study of Four Chinese Translations of the Novel The Color Purple.” In Bridging the Gap between Theory and Practice in Translation and Gender Studies, edited by Eleonora Federici and Vanessa Leonardi, 75–85. Cambridge: Cambridge Scholars Publishers. Liang Wayne Wen 梁文駿. 2017. “Lun zhongguo yanqing xiaoshuo fanyi: yi Jinpingmei 豔情小說翻譯:以《金瓶梅》英譯本為例” [Translating Chiyingyiben weili nese Erotic Literature: A Case Study of the English Translations of Jin Ping Mei]. SPECTRUM: NCUE Studies in Language, Literature, Translation 英語文暨口筆譯學集刊 15, no. 1: 1–20. Mellinger, Christopher, and Thomas A. Hanson. 2017. Quantitative Research Methods in Translation and Interpreting Studies. London and New York: Routledge. Millett, Kate. 2000. Sexual Politics. Urbana, IL: University of Illinois Press. Mu Lei 穆雷. 2008. Fanyi yanjiu zhong de xingbie shijiao 翻譯研究中的性別視角 [Gender Perspective in Translation Studies]. Wuhan: Wuhan University Press. Santaemilia, José. 2011. “The Translation of Sexually Explicit Language: Almudena Grandes’ Las edades de Lulu (1989) in English.” In Translation and Opposition, edited by D. Asimakoulas and Margaret Rogers, 256–82. Bristol: Multilingual Matters. Slamia, Fatma Ben. 2020. “Translation Strategies of Taboo Words in Interlingual Film Subtitling.” International Journal of Linguistics, Literature and Translation 3, no. 6: 82–98. http://doi.org/10.32996/ijllt. Spinzi, Cinzia, A. Rizzo, and M. L. Zummo, eds. 2018. Translation or Transcreation? Discourse, Text and Visuals. Newcastle upon Tyne: Cambridge Scholars Publishing. Tao Jie 陶潔. 1994. “Ailisi Mengluo bixia de xiaozhen funü” 艾麗絲·蒙蘿筆下的小鎮 婦女 [Ladies in the town from Alice Ann Munro]. In Xinling de Guiji Jianada Wenxue Lunwen Ji 心靈的軌跡 : 加拿大文學論文集 [The Track of the True Heart: Canadian Literature], edited by Mingli Qin 秦名利, Tao Jie 陶潔 and Weng Dexiu 翁德修, 1–16. Beijing: China Federation of Literary and Art Circles Publishing House. 
Feminist Translation of Sexual Content 175 Tao Jie 陶潔. 1995. Yuwai nvxing 域外女性 [Foreign Females]. Beijing: Peking University. Tao Jie 陶潔. 1998. Ziyanse 紫顏色 [The Color Purple]. Nanjing: Yilin Publication. Tao Jie 陶潔. 2004. Dengxia xichuang meiguo wenxue he meiguo wenhua 燈下西窗 – 美 學和美 化 [Light from the Window on the West: American Literary Analysis]. Beijing: Beijing University. Vinay, Jean-Paul, and Jean Darbelnet. 1995. Comparative Stylistics of French and English Translated and English: A Methodology for Translation. Amsterdam: John Benjamins. Von Flotow, Luise. 1991. “Feminist Translation: Contexts, Practices and Theories.” TTR: Traduction, Terminologie, Redaction 4, no. 2: 69–84. Walker, Alice. 1985. The Color Purple. New York: Pocket Books. Wen Shaoxian 溫紹賢. 2012. Zhongwen changyong ciju ji zhongying duibi yu fanyi 中文 常用詞句及中英對比與翻譯 [Common Chinese Words and sentences and Comparison with English and Translation]. Hong Kong: Everflow Publications. Winters, Marion. 2007. “F. Scott Fitzgerald’s Die Schönen und Verdammten: A Corpusbased Study of Speech-act Report Verbs as a Feature of translator’s Style.” Meta 52, no. 3: 412–25. Wu, Yi-ping. 2010. “A Study in the English Translation of Eroticism: The Case of Li Ang’s Sha Fu.” In The Erotic in Context, edited by M. Soraya García-Sánchez, Cara Judea Alhadeff, and Joel Kuennen, 161–70. Leiden: Brill. Yang Renjing 楊仁敬. 1987. Zise 紫色 [The Color Purple]. Beijing: Shiyue Wenyi Publication. Yang Renjing 楊仁敬. 1989. “Meiguo heiren wenxue de xin tupo – ping Ailisi Woke Zise” 美 黑人文學的新突破 – 評愛麗絲·沃克《紫色》 [A New Breakthrough in American Black Humanities – Review of Alice Walker’s The Color Purple]. Foreign Literature Studies 學研究 3, no. 1: 29–36. Yuan, Long. 2015. “The Subtitling of Sexual Taboo from English to Chinese.” PhD dissertation, Imperial College. https://doi.org/10.25560/31546. 
10 Benefits of a Corpus-based Approach to Translations: The Example of Huckleberry Finn
Ronald Jenn and Amel Fraisse
10.1 Introduction
By combining an open call to literature and translation studies scholars, existing bibliographical data, UNESCO’s Index Translationum, and crowdsourcing input, we compiled a multilingual parallel corpus of translations of Adventures of Huckleberry Finn, aligned it by chapter and paragraph, and exploited it at different levels of granularity and expertise to the benefit of diverse scholarly communities. Besides better knowledge of how a single author’s ideas and texts were translated and interpreted in different languages, this approach to translated texts resulted in a refinement of NLP approaches to literary texts and helped rare and under-resourced languages along the way. This chapter discusses the different stages of an interdisciplinary and international project as a blueprint for future collaborations. The necessary steps consist of finding a sufficiently well-traveled author and text that can provide ample material for study, mobilizing the tools for mining databases, extracting the texts, and aligning them to establish a comparable corpus. Once the digital phase is over and the texts are retrieved and aligned by NLP and information science experts, humanities scholars versed in the local languages and cultures can step in to conduct refined textual analysis.1
10.2 Adventures of Huckleberry Finn Described
One essential ingredient of a corpus-based approach to translations is the existence of an active and dedicated international community of scholars willing to engage in a transnational approach to its author, text, or genre.
The Mark Twain community, within which a globalized and transnational approach has been trending for over a decade, grew interested in the fate of Adventures of Huckleberry Finn, worked together with experts from other fields, and called on the international community to identify and collect existing translations in different languages. Mark Twain was first prized as the quintessence of the American spirit, the American frontier, the adventurous West, and, as is the case for Huckleberry Finn, the sultry and violence-prone South, and his fame continued unabated even after those cultural features became history. The novel deals with transnational and universal topics such as childhood, coming of age, freedom, racism, and slavery.2 Thanks to its many universal ingredients, Huckleberry Finn survived the upheavals and evolutions of the twentieth century. Its international fame was kept alive and revived several times in the contexts of mass literacy, the ideological divides of the twentieth century, and the emerging teenage culture, and currently, well into the twenty-first century, it nourishes ethical questions linked to race relations. The text has a number of linguistic specificities, such as the use of dialects, that would hypothetically bar it from being successfully translated, and yet it proved far from untranslatable and achieved worldwide fame (see Section 10.8). Published in 1885 in the United States and the previous year in Great Britain, Huckleberry Finn traveled fast. Within a short time span, Denmark and Sweden (1885), France (1886), Russia (1888), Germany (1890), and Poland (1898) came up with their own versions of the novel. These earlier versions and the succeeding ones across a wide range of countries mainly catered to educational institutions and publishers bent on providing reading material to a wide readership in a context of mass literacy.
DOI: 10.4324/9781003298328-11
After World War II, there was a shift of emphasis as the educational aspect faded to the background while political implications came to the fore as, paradoxically, the East and West competed for the American icon in a bipolar world.3 At about the same time mass literacy was completed, the civil rights movement in the United States and the worldwide independence movement entailed growing sensitivity to African American voices in the novel and its translations. All this results in an astounding number of available translations (into 64 languages, sometimes multiple times over a century and a half) and makes Huckleberry Finn an ideal text to use as a prototype in an investigation of the global circulation of literary texts. Translations of Mark Twain’s work received relatively little scholarly attention until the 1982 publication of Robert Rodney’s landmark study Mark Twain International. A goldmine of bibliographical references of foreign editions of Mark Twain’s work in English and in translation, Rodney’s work is foundational.4 Progress on the international dimension of Mark Twain accelerated in the 2010s with contributions by Tsuyoshi Ishihara (Mark Twain in Japan, 2011), Selina Lai-Henderson (Mark Twain in China, 2015), and Paula Harrington and Ronald Jenn (Mark Twain and France, 2017). Also during this time, Shelley Fisher Fishkin advocated a more comprehensive and global approach (Mark Twain Anthology: Great Writers on His Life and Work 2010), which included translations of foreign criticism.5 In a landmark article (“Deep Maps” 2011), she was also the first to foresee how the rise of digital tools applied to translation studies could benefit transnational approaches to Mark Twain. Undoubtedly, the centennial of Mark Twain’s death in 2010, which came with the long-expected and much-touted publication of the Autobiography, spurred renewed interest in his works and spawned new translations and scholarship.
10.3 Global Huck and Rosetta
The Global Huck project was created amid efforts to develop new ways of consolidating an understanding of global translations of Mark Twain’s Huckleberry Finn and exploring potential directions for the future, building on the scholarship of the past. It was first presented during the Eighth International Conference on the State of Mark Twain Studies in 2017, as part of “The Place of Mark Twain in Digital Humanities” workshop. It involved identifying global editions of Huckleberry Finn and having scholars contribute comments about the cultural work that each translation does in specific cultural contexts.6 In 2019, Global Huck morphed into the Rosetta Project, helped by the France-Stanford Fund and the Center for Interdisciplinary Studies and Stanford’s Center for Spatial and Textual Analysis (CESTA). Rosetta looked more specifically at versions from a number of low-resourced languages. The projects first relied on crowdsourcing as well as inclusive, interactive, and collaborative user-based approaches for data and information collection. Natural language processing methods were used to generate multilingual corpora with a view to providing material for language resources (corpora, thesauri, dictionaries). Information science experts located and retrieved digital versions and aligned them, first by chapter, then by paragraph and line by line. These tasks, conducted by IT experts, do not require an intimate knowledge of the text or Mark Twain scholarship. Among the results of Rosetta is an interactive map, currently supported by Huma-Num, a very large research infrastructure for facilitating the digitization of research in the humanities and social sciences funded by the European Commission. The map’s URL is: https://rosetta.huma-num.fr/worldmap/index.html. Through this map, users and scholars have the opportunity to gain insight into the global circulation of the novel.
The items on the map, compiled from the different inputs (see Section 10.4), display the title in the target language, the first year of publication, the name of the translator, and the publisher, when available.
10.4 Method of Gathering Material
In this section, we look at how the corpus was gathered, starting with UNESCO’s Index Translationum, a most reliable source to assess the scope of translations accumulated across the world for a given author. The Index Translationum quantifies the volume of translations worldwide and breaks them down by author, particular title, language, and country. Mark Twain is ranked 15th in the top 20 of the most translated authors and comes only after behemoths like Agatha Christie, Jules Verne, Alexandre Dumas, and Conan Doyle. Like other prolific nineteenth-century writers, Mark Twain was popular right from the start and continued to be translated into many languages throughout the twentieth century and well into the twenty-first. Adventures of Huckleberry Finn is only the second most commonly translated of Mark Twain’s books, after The Adventures of Tom Sawyer. However, the former was early on deemed essential and central to the canon of American literature, and there clearly is more interest among Mark Twain scholars in Huckleberry Finn, in which the coming-of-age dimension cuts across generations, whereas Tom Sawyer may look more childhood-bound.7
Although the bulk of the data was provided by UNESCO, an international institution, and Rodney, an individual bibliographer, there also were additions by individual Mark Twain scholars as well as through a crowdsourcing experiment that mobilized the power of anonymous crowds. Using the title in the target languages, we crawled the web and mined online digital libraries and national archives in order to find the full texts.
In some cases, we came across a full online version that was in the public domain (provided by public institutions), in which case we downloaded it, whatever its format. When dealing with versions in PDF or EPUB format, we converted them into a text format that could later be processed. There were other instances when we knew of an existing version but it was not readily available online. In that case, we turned to the national libraries and archives and asked them if they were willing to collaborate with us by digitizing their printed versions. In total, we collected 64 metadata records (the title, the language, the translator’s name, the year of publication, and the publishing house) and 30 full-text files. Volunteer contributors and scholars provided us with 34 metadata records and 7 full-text files. The crowdsourcing provided us with 18 metadata records and 18 full-text files, and five translations were collected by crawling different digital library collections, which accounts for the 30 full-text files (7 + 18 + 5). Given the significant number of existing translations and the growing number of digital versions made available online, the crowdsourcing allowed us to gather data that would have otherwise been beyond our reach. Crowdsourcing helped reduce the amount of time spent on the task and increased the number of translations available as a basis for developing parallel corpora for under-resourced languages. These efforts allowed the roster of translations of Huckleberry Finn to be brought up to date.
As of today, the novel exists in 64 languages: Afrikaans, Albanian, Alemannic, Arabic, Armenian, Assamese, Basque, Bengali, Bulgarian, Burmese, Catalan, Chinese, Chuvash, Croatian, Czech, Danish, Dutch, Esperanto, Estonian, Finnish, French, German, Georgian, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kazakh, Kirghiz, Korean, Latvian, Lithuanian, Macedonian, Malay, Malayalam, Marathi, Moldavian, Norwegian, Oriya, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Sinhalese, Slovak, Slovenian, Spanish, Swedish, Tamil, Tatar, Telugu, Thai, Turkish, Turkmen, Ukrainian, Uzbek, Vietnamese, and Yiddish.
10.5 Natural Language Processing and Alignment
Alignment serves the needs of both the NLP community and translation studies scholars. While the interpretive nature of literary translations has caused a lag in their adoption as a source for NLP development, multiple projects have developed parallel corpora based on literary texts, including the Harry Potter series and Le Petit Prince (Stolz 2007). Few alignment tools are designed for literary texts, which are more challenging for alignment approaches because the entire corpus must be aligned and the alignment should be as confident as possible (Xu et al. 2015). Usually, parallel corpora focus on very specific and specialized domains, which can be efficient but also shows limitations for machine translation.
Figure 10.1 Excerpt of the Translation Dashboard for Basque, Bulgarian, Dutch, Finnish, German, Hungarian, Polish, Portuguese, Russian, and Ukrainian (first five chapters).
The advantage of using a work of fiction such as Huckleberry Finn is that it uses a very broad vocabulary linked to everyday life, which makes it a valuable asset for those languages that are currently lacking computational resources (see Section 10.6).
The paragraph and word alignment algorithms used in the project were modest compared to the current cutting edge of NLP, but the tool was nonetheless significant as an example of how to apply “just enough” NLP in the service of research priorities shaped by another field, such as library and information science or translation studies. For example, chapter alignment quickly gives an idea of how complete the version under scrutiny might be. Anything short of the original 43 chapters indicates incompleteness or an adaptation of sorts. So a number of versions matching the chapter count were gathered and first aligned by chapter. The next step was to create a tool that would allow a quick overview of how the paragraph count of each chapter matches the original, which is how the Translation Dashboard came about (Figure 10.1). The Translation Dashboard reads as follows (starting in the top left corner): Chapter 1 of the English version (en) has 9 paragraphs, and the Basque (ba) version has 10, the Bulgarian (bu) 9, the Dutch (du) 9, and so on. The per-chapter paragraph count is based on new lines and white space. The dashboard meets the needs of translation studies scholars by creating a convenient environment enabling them to conduct parallel readings and visualizations. Scholars can easily see patterns of structural convergence and divergence between the source text and translations, at different levels of granularity. A translation studies scholar in the literary tradition may use this information to select chapters for a close reading analysis.
Figure 10.2 Example of paragraph alignment for the Basque version.
To get paragraph alignments in each chapter, chapters were divided into three major categories based on the differences in their paragraph counts compared to the original English version: exact-match, large-difference, and small-difference. Different paragraph aligners may apply to different categories.
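The counting and categorization steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: the function names are hypothetical, paragraphs are assumed to be separated by blank lines (the chapter says only that the count is "based on new lines and white space"), and the cut-off between small and large differences (`small_max`) is an invented value, since the chapter does not give one.

```python
import re

# Chapter count of the English original, as stated in the text.
ORIGINAL_CHAPTER_COUNT = 43


def count_paragraphs(chapter_text: str) -> int:
    """Count paragraphs, assuming they are separated by blank lines."""
    blocks = re.split(r"\n\s*\n", chapter_text.strip())
    return sum(1 for block in blocks if block.strip())


def looks_complete(chapters: list[str]) -> bool:
    """True only if the version has the same chapter count as the original."""
    return len(chapters) == ORIGINAL_CHAPTER_COUNT


def categorize_chapter(source_count: int, target_count: int,
                       small_max: int = 3) -> str:
    """Assign one of the three categories named in the text.

    small_max is a hypothetical cut-off chosen for illustration only.
    """
    diff = abs(source_count - target_count)
    if diff == 0:
        return "exact-match"
    if diff <= small_max:
        return "small-difference"
    return "large-difference"


def dashboard_row(chapter_by_lang: dict[str, str]) -> dict[str, int]:
    """Paragraph counts for one chapter across languages (one dashboard row)."""
    return {lang: count_paragraphs(text)
            for lang, text in chapter_by_lang.items()}
```

With the Chapter 1 counts quoted in the text, `categorize_chapter(9, 10)` would place the Basque chapter in the small-difference category, and `categorize_chapter(9, 9)` would place the Bulgarian one in exact-match. A real pipeline would likely tune the cut-off per corpus rather than fix it in advance.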
An exceedingly high divergence from the source paragraph count alerts the scholar that there may be data-cleaning issues (e.g., one instance where each line in a poem embedded in a narrative was treated as a new paragraph), but a moderate divergence can reflect the translator’s deliberate stylistic choices about how the flow of the narrative should be rendered. The degree of convergence and divergence also determines the color variation in the heat map of the Translation Dashboard (Figure 10.1): the darker the cell, the greater the divergence in paragraph numbers. For exact-match chapters, the hypothesis is that their paragraphs were translated one-to-one. No further paragraph alignment methods are needed. This hypothesis has been confirmed for most of the exact-match cases by a human validation experiment. Large-difference cases are normally caused by different ways of splitting quotations. Thus, we provide a text pre-processing option before paragraph alignment when long quotations have been found under large-difference cases (Fraisse et al. 2019, 303). This pre-processing option splits quotations into paragraphs according to the same standard in all translations. Experiments have shown that it can significantly reduce differences in paragraph counts and sometimes move a chapter from the large-difference category to the small-difference category. Figures 10.2 and 10.3 show examples of paragraph alignment based on Chapter 1 of the Basque version and Chapter 43 of the Bulgarian one, which are small-difference chapters.
Figure 10.3 Example of paragraph alignment for the Bulgarian version.
Deeper analysis for aligned paragraphs. An expected finding during the human validation stage is that some aligned paragraphs, even though correctly aligned, still contain some words or sentences that have not been translated. Further processing with sentence alignment, word alignment, text similarity, and text summarization can support a deeper analysis.
User-uploaded corpora support. As the NLP algorithms we implemented are not bound to the novel Huckleberry Finn, an option could be added for users to upload other parallel corpora that interest them and visualize them in the same way.
10.6 Under-Resourced Defined
A large but still modest number of languages, close to a hundred, have the so-called Basic Language Resource Kit (BLARK): monolingual and bilingual corpora, machine-readable dictionaries, thesauri, part-of-speech taggers, morphological analyzers, parsers, etc. (Krauwer 2003; Arppe et al. 2016). This means that, as mentioned by Scannell (2007), over 98% of world languages lack most, and usually all, of these language resources. Consequently, these languages, and with them the knowledge they encode, are threatened and their preservation is at risk. Digital language resources can help prevent the disappearance of diverse knowledge systems, ensure their preservation and transmission, and foster their cross-fertilization. Even for well-endowed languages, parallel corpora, a valuable resource for sustaining linguistic diversity, are rare despite the great need for them. Such corpora are often used for testing new tools and methods to develop under-resourced languages. Because translated texts are de facto parallel corpora, and because we know for a fact that translated language materials do exist in under-resourced languages, using these translations can help build cheap, efficient, and reliable corpora for the purpose of preserving under-resourced languages. Those translations are mostly available in print and still awaiting digitization. They are all the more precious because, when translation does occur, it is currently into commercially dominant languages.
The roster of translations of Huckleberry Finn includes 22 under-resourced languages: Assamese, Basque, Bengali, Burmese, Catalan, Chuvash, Hindi, Indonesian, Kazakh, Kirghiz, Malayalam, Marathi, Moldavian, Oriya, Persian, Sinhalese, Tamil, Tatar, Telugu, Turkmen, Uzbek, and Yiddish. In many of these languages, there have been multiple translations over time, reflecting different moments in history and different ideological perspectives on the part of the translators or publishers, as well as different attitudes toward the United States, childhood, minorities and minority dialects, and race and racism.
Texts gathered through Global Huck and Rosetta were used to create a parallel corpus containing translations of Huckleberry Finn as a basis for developing NLP resources for under-resourced and endangered languages (Fraisse et al. 2018), as shown in Section 10.5. The Slavic (Bulgarian, Polish, Russian, Ukrainian) and Finno-Ugric (Hungarian and Finnish) translations have served as the initial datasets for testing the project’s text alignment algorithms. Before we move on to some results of the fine-grained literary analysis, it is important to underline that a comprehensive approach to translated texts cannot be limited to collecting and aligning the versions. Translations are more often than not accompanied by scholarship. Collecting that scholarship in different languages helps gain insight into local strategies and ways of handling the text so as to get a global image and perceive patterns. In the following section, observations are made as to the chronology of those studies, as they appear to be fairly recent and increasing in number. The ascent of translation studies as a self-standing discipline is not solely accountable for this surge in interest. Increasing attention paid to race relations and the issue of dialect translation make Huckleberry Finn an alluring mix for translation studies scholars.
Most of the scholarship was published in the twenty-first century, and the bulk came after 2010.
10.7 The Scholarship
A consistent wave of scholarship, too abundant to be detailed within the scope of this chapter, swelled in the late 2000s, gained traction in the 2010s, and recently culminated in a special forum of the Journal of Transnational American Studies, the first time the entirety of a journal issue was devoted to translations of Huckleberry Finn (Fishkin et al. 2021). Scholarship on translations of Huckleberry Finn covers languages spoken on five continents, although the greater part originates from Asia and Europe. Some studies stem from languages with but few speakers on the world stage, such as Czech, Slovenian, or Tatar, while others encompass large segments of the world population, whether because they tend to be international, like Arabic, French, or Spanish, or because they have a large population of speakers, and therefore wide readerships at home, such as Chinese, German, Hindi, Indonesian, Persian, and Vietnamese. Added up, these populations amount to a large portion of the world, and those studies offer a vantage point into the readers’ experience of Huckleberry Finn globally. China and Chinese-speaking scholars stand out as the single biggest providers of scholarship. The scholarly writings emanate from fledgling as well as confirmed scholars, and they come in all shapes and sizes: journal articles, book chapters, and book-length studies. Studies vary in scope, ranging from the examination of merely a few sentences or a single chapter to thorough and painstaking surveys complete with figures and statistics. Most of the scholarship is made of what could be dubbed single-language studies since they focus on translation into one language. Only a handful of studies provide a more international angle by looking at translations into multiple languages.
Some studies are blind to the context in which the translations were produced; others are extremely concerned with the geographical, historical, sociological, and geopolitical contexts that make up the translations’ backdrop. Very rarely do they dwell on the identity and profile of the translators themselves, or the publishing houses, except when they are otherwise famous. While they have varying methodologies and different goals, the studies from foreign shores rely on a shared pool of authorities on Twain originating in the United States and are well grounded in the novel’s reception and progress in its homeland. Analysis of what actually happened to the translated texts is exemplified by the recent publication of the special forum “Global Huck: Mapping the Cultural Work of Translations of Mark Twain’s Adventures of Huckleberry Finn” in Journal of Transnational American Studies, 12(2) (Fishkin et al. 2021). Within the scope of the present study, we will dwell on one particular aspect: the presence of dialects in Huckleberry Finn and how they were dealt with by translators around the world.
10.8 Fine-Grained Textual Analysis: The Dialects Go Global
One major theoretical objection to the translation of Huckleberry Finn often put forward by American Studies scholars is the presence of dialects, which are reputedly untranslatable. Indeed, considering that the dialect-for-dialect strategy was adopted in only a handful of translations, the odds seem to be against this approach.8 One underlying assumption against using dialects seems to be that they are tied to specific territories in ways that national languages are not. This is paradoxical because reading a translation implies, just as is the case for fiction, a suspension of disbelief, a process through which readers acknowledge having access to a foreign reality in their own tongue.
Dialects in translations somehow disrupt this process, and suspension of disbelief works best when national languages are at play. This most probably originates in the way national idioms were shaped in opposition to regional dialects, notwithstanding their linguistic ties. Amid the intricate web of dialects claimed by the author of Huckleberry Finn, the structural opposition between Black and White voices is explored in this section to show the benefits of a multilingual approach. Culturally motivated and textually observable, the divide between Black and White voices is convenient to handle. Observers and critics of Huckleberry Finn agree that a clear distinction between Black and White voices was carefully crafted by the author and that Black voices coincide, from a linguistic point of view, with African American Vernacular English, sometimes referred to as Black English. The translation of African American voices in the novel, which is at once rife with technical and ethical questions, has come under increasing scrutiny since the late 2000s.9 To circumvent the dialect challenges and the strong belief that rendering dialect for dialect is bound to fail, translators worldwide turned to registers. Registers can conveniently be broken down into three universally shared categories: low, standard, and high. The low register, which is distinct from broken language, appears to be confined to Chinese, but further research might show that other languages have developed it. Standardization is the most widespread strategy, while high register is marginal yet noteworthy. In mainland China, some translators used the specificities of their language regarding registers and tones to implement successful strategies.
In a 1956 version that has since been canonized, “[t]he translators tap into three register varieties available to them in Chinese: the vulgar, the colloquial and the standard formal varieties” (Jing Yu 2017, 60). The standard formal variety is only used for posters, letters, and inscriptions, while all White characters share the same casual colloquial tone. As a result, Huckleberry, a “marginal social outcast,” morphs into “a legitimate member of that society in full command of its standard oral code.” While Huckleberry’s difference is erased, Black voices are rooted in the vulgar register, which, in the context of socialist China, has positive connotations: “Jim represents the oppressed on the bottom rung of the social ladder,” and he is “given the working-class voice” (Jing Yu 2017, 60–63).10 In the same low-register vein, other translators played on the discrepancy between oral and written Mandarin Chinese and its different tones for spoken words in order to offer a parallel to Black characters’ mispronunciations, otherwise known as malapropisms, although it should be remembered that malapropisms are shared by many characters in the novel. Such is the case with a 1989 translation, whose use of the wrong tone results in shifts of meaning that “captured the spirit of Jim’s speech and mannerisms” (Lai-Henderson 2015, 114). These strategies can be deemed successful because through vulgar register or tonal malapropisms, slaves come across as unschooled and illiterate, fitting their social status, but they retain strong reasoning abilities. They stand out as humane and likeable characters that readers can relate to. Not all Chinese versions explored this path, and some resorted to another play on register and what is the most commonly shared strategy overall: standardization.
In this case, although Jim, the main Black person in the novel, still is a full-blown character, his speech is “translated into normal spoken Chinese, just as Huck’s is,” and “it is hard to distinguish his vernacular or his low social status” (ibid., 117). Standardization and homogenization, which result in all characters speaking alike, are the dominant strategies. Considering the economy of the original text and Black characters’ outstanding identifiable voices, this is mostly considered another case of untranslatability, as stated in a great number of studies, of which Vietnamese and Indonesian provide a good example. This is what a scholar has to say about a Vietnamese translation:
Jim’s distinctive and nonstandard voice does not come through at all. Although it is clear that finding an adequate Vietnamese dialectal equivalent for Jim’s voice is a difficult task. The task is even more challenging due to the fact that there is no perfect equivalent of Black and White race relations in the Vietnamese speaking world. (Hang 2019, 53)
Likewise, a study of the 2012 Indonesian version states that standard Indonesian, which is “spoken mostly by the more educated members of society,” sounds “unnatural for daily conversation” and eliminates “the impression that the speaker comes from a lower class” (Dewi et al. 2018, 378). Standardization could easily be perceived as a lesser effort and a recognition of the shortcomings of translation and its inability to produce a viable equivalent of the original. This is mainly due to the fact that most translators are silent about their work and there is a deficiency of discourse on their part.
In the Russian tradition, however, far from being a silent process, standardization has been verbalized, conceptualized, and valued as a strategy, most certainly speaking for many translators around the world who have left little trace of their strategy other than in the occasional foreword, preface, or footnote. Famed Russian translator and theorist Chukovsky, who discussed Daruzes’s landmark translation of Huckleberry Finn and would later translate the novel himself, invoked a long-standing tradition of “masters of translation who have completely refused to reproduce colloquial speech in translation” (Marinova 2021, 66). Acknowledging the untranslatability of dialects, they have embraced standardization, also called “blandscript,” as a way of conveying the true essence of characters in the novel, including Jim: Rather than seeing it as a deficiency, blandscript for him [Chukovsky] became the best possible answer to the problem of reencoding different regional dialects and registers in Russian. In Chukovsky’s professional assessment, Daruzes’s translation of Huck Finn with its “most pure, correct, neutral language, without straining after any dialects,” offered an admirable example of successful reaccentuation of the original. (Marinova 2021, 127) Russian translators’ writings are precious testimonies insofar as they enable us to consider standardization not as a failure but as a thoroughly thought-about strategy. Not translating dialects is considered no impediment to a feeling of relatedness to Jim and other dialect speakers, which was judged attainable through standardization: To put it differently, it was precisely because of the erasure of the original dialects that Soviet readers thought they could understand Huck and Jim so well. By stripping the cumbersome linguistic layer of difference, translators allowed Russian boys and girls the opportunity to access the true essence of the fictional heroes. 
(Marinova 2021, 127)

Incidentally, in his assessment of Daruzes’s translation, Chukovsky praised the fact that its language resembled Turgenev’s style of natural descriptions more than Twain’s. This reference to a writer of renown leads us to what could be termed the high-register strategy.

A high-register strategy consists in considerably enhancing Black voices and translating their dialect into literary language. This strategy is akin to standardization but takes it a notch higher. It is found mostly in national literatures that had little to no tradition of written representation of oral speech at their disposal; it should be remembered that Huckleberry Finn was itself a pioneering venture from that point of view. Persian and Arabic are two examples of literatures with strong written traditions. To take the example of the first version in Iran by Golestan (1949), even though the translations back into English provided by scholars still make Jim’s speech sound incredibly enhanced, the recourse to colloquial language in this translation made a lasting impression and is considered a “significant step in the history of Persian translation” (Fomeshi 2021, 30). In Arabic, the same high-register strategy can be observed in the first version by Egyptian translator Naseem (1958). He used a high register and “polished the slave’s dialect into Classical Arabic, occasionally using Quranic terms” (Abdulmalik 2016, 43). Although a proper rendition of dialects would certainly help dramatize the novel to an even greater degree, it is striking that the novel has been praised for its universal dimension and generations have enjoyed reading it without direct access to the colorful and pithy language of the original. This is a reminder that a strictly linguistic appraisal of translated texts has limitations and that deeper mechanisms are at play.
The intensity of narrated events, the falsely naive tone of some characters, the exposure of universal human traits, whether downfalls or outstanding qualities, came across sufficiently unscathed for the reading to be enjoyed.

10.9 Conclusion

To summarize, an across-the-board corpus-based approach needs an author who ranks high in the Index Translationum, whose international career has been the subject of some preliminary bibliographic work and who is studied by a community bent on discovering how their writings reverberated throughout the world. Once a well-traveled text or author has been identified, retrieval of the texts can begin with a view to aligning them from chapter to paragraph. Building such comparable corpora can help salvage endangered languages by strengthening available linguistic material. The collected translations help build multilingual parallel corpora and highlight the transnational circulation of knowledge. The world’s cultural heritage is largely built by sharing texts through translations, which are locally impacted as they are both written and read in a specific context and culture. Comparable corpora allow scholars to explore the social, cultural, and political agendas of translators and publishers and look at how the cultural demands of readers shaped the book’s translation, fostering fine-grained literary analysis of what actually happened within the translated texts. Translators and translation studies scholars have been largely left to their own national devices, and a multilingual approach can abolish the seclusion of individual languages and cultures. Comparable corpora gesture to a fascinating array of perspectives on a series of issues, continuities, and divides that break through the monolingual and nation-bound silos that usually constrain literary and translation studies.

Notes

1 This sequential description is partially hypothetical and broken down for the sake of demonstration.
In actuality, experts from the different fields involved work hand in hand on different tasks at different stages of the project in a dynamic and feedback-based fashion.

2 We tend to think of Huckleberry Finn (1885), in the wake of Ernest Hemingway and many others after him, as the foundation of American literature, so there is a tendency to overlook the fact that Huckleberry Finn was the sequel to, if not the companion piece of, The Adventures of Tom Sawyer (1876).

3 The Cold War period was characterized by deep ideological divides that, in many cases, overran the commercial aspect and took precedence over it. As early as 1929, an official list of about 800 books – political manifestoes or essays, along with literary works – was issued by the Komintern in Moscow and sent to all the affiliated publishing houses around the world. The result was a flurry of translations in languages like Khirgiz, Chuvach, or even Turkmen.

4 As of 1982, Huckleberry Finn alone had been translated into 53 languages, in 47 countries, for a total of 841 editions.

5 Originally published in Chinese, Danish, French, German, Italian, Japanese, Russian, Spanish, and Yiddish.

6 The collaboration of Fishkin in English and American studies at Stanford and Amel Fraisse in information sciences at Université de Lille and Ronald Jenn in translation studies at the same institution received support from MESHS in Lille.

7 Our findings in this project indicate that Huckleberry Finn is catching up with Tom Sawyer, not so much in terms of quantity, but definitely in terms of quality, as it is gradually acquiring the same canonical status abroad it enjoys at home, an observation based on the growing number of scholarly annotated editions complete with the original illustrations. The most recent example is O’Shea’s 2019 translation, which allows Brazil to join the ranks of countries with scholarly translations, complete with elaborate notes and the original Kemble illustrations.
O’Shea authored close to 150 notes, an introduction, and a note on the translation. Mark Twain and Huckleberry Finn are inching their way toward full literary recognition one country at a time.

8 It was implemented in an experimental manner in the Atlantic world, where comparable ethnic, cultural, and historical contexts matching the original could be found.

9 This section is the result of observation of strategies in 18 languages: Arabic, Chinese, Czech, French, Hindi, Indonesian, Nordic (Danish, Norwegian, Swedish), Persian, Portuguese, Russian, Spanish, Slovenian, Tatar, Ukrainian, Vietnamese. For obvious reasons linked to limitations in the command of all the languages at play, this section relies exclusively on scholarship on the translation of Jim’s speech written in English. Those studies that proposed translations back into English (and not all of them did) proved most helpful to evaluate the shifts in register for an international readership.

10 The emphasis on the colloquial and the omission of the standard variety reflects the newly constructed power relations in socialist China in the 1950s, where “[t]he majority of the population were illiterate workers and peasants who had been regarded as the proud owners and leading class of the country since 1949, when the People’s Republic of China was founded” (Jing 2017, 63). See also Lai-Henderson (2015, 112).

References

Abdulmalik, Mariam. 2016. “Adventures of Huckleberry Finn in Arabic Translations: A Case Study.” PhD diss., Binghamton University.

Arppe, Antti, Jordan Lachler, Trond Trosterud, Lene Antonsen, and Sjur Moshagen. 2016. “Basic Language Resource Kits for Endangered Languages: A Case Study of Plains Cree.” Proceedings of the 2nd Workshop on Collaboration and Computing for Under-Resourced Languages Workshop (CCURL 2016), 1–8.

Dewi, Ida Kusuma, M.R. Nababan, Riyadi Santosa, and Djatmika. 2018.
“The Characters’ Background in the African-American English Dialect of The Adventures of Huckleberry Finn: Should the Translation Retain It?” Journal of Social Studies Education Research 9, no. 4: 382–402.

Fishkin, Shelley Fisher, ed. 2010. The Mark Twain Anthology: Great Writers on His Life and Work. New York: Library of America.

Fishkin, Shelley Fisher. 2011. “‘Deep Maps’: A Brief for Digital Palimpsest Mapping Projects (DPMPs, or ‘Deep Maps’).” Journal of Transnational American Studies 3, no. 2. https://escholarship.org/uc/item/92v100t0. Accessed 11 April 2022.

Fishkin, Shelley Fisher, Tsuyoshi Ishihara, Ronald Jenn, Holger Kersten, and Selina Lai-Henderson. 2021. “Special Forum Global Huck: Mapping the Cultural Work of Translations of Mark Twain’s Adventures of Huckleberry Finn.” Journal of Transnational American Studies 12, no. 2. https://doi.org/10.5070/T812255976.

Fomeshi, Behnam. 2021. “Persian Huck: On the Reception of Huckleberry Finn in Iran.” Journal of Transnational American Studies 12, no. 2: 27–45.

Fraisse, Amel, Quoc-Tan Tran, Ronald Jenn, Patrick Paroubek, and Shelley Fishkin. 2018. “TransLiTex: A Parallel Corpus of Translated Literary Texts.” In Eleventh International Conference on Language Resources and Evaluation (LREC 2018). Miyazaki: Beijing Advanced Innovation Center for Language Resources.

Fraisse, Amel, Zheng Zhang, Alex Zhai, Ronald Jenn, Shelley Fisher Fishkin, Pierre Zweigenbaum, Laurence Favier, and Widad Mustafa El Hadi. 2019. “A Sustainable and Open Access Knowledge Organization Model to Preserve Cultural Heritage and Language Diversity.” Information 10, no. 10: 303.

Hang, Hoang Thi Diem. 2019. “An Assessment of the Vietnamese Translation of The Adventures of Huckleberry Finn Chapter XX Using House’s Translation Quality Assessment Model.” VNU Journal of Foreign Studies 35, no. 1: 35–54.

Ishihara, Tsuyoshi. 2011. Mark Twain in Japan: The Cultural Reception of an American Icon. Columbia, MO: University of Missouri Press.

Jing, Yu. 2017.
“Translating ‘Others’ as ‘Us’ in Huckleberry Finn: Dialect, Register and the Heterogeneity of Standard Language.” Language and Literature 26, no. 1: 54–65.

Krauwer, Steven. 2003. “The Basic Language Resource Kit (BLARK) as the First Milestone for the Language Resources Roadmap.” Proceedings of the International Workshop Speech and Computer, Moscow, Russia.

Lai-Henderson, Selina. 2015. Mark Twain in China. Stanford, CA: Stanford University Press.

Marinova, Magdalena. 2021. “Huck Finn’s Adventures in the Land of the Soviet People.” Journal of Transnational American Studies 12, no. 2: 119–47.

Rodney, Robert M., ed. and comp. 1982. Mark Twain International – A Bibliography and Interpretation of His Worldwide Popularity. Westport, CT: Greenwood Press.

Scannell, Kevin. 2007. “The Crubadan Project: Corpus Building for Under-resourced Languages.” In Building and Exploring Web Corpora: Proceedings of the 3rd Web as Corpus Workshop, 5–15. Belgium: European Language Resources Association.

Stolz, Thomas. 2007. “Harry Potter Meets Le petit prince – On the Usefulness of Parallel Corpora in Crosslinguistic Investigations.” Language Typology and Universals 60, no. 2: 100–17.

Xu, Yong, Aurélien Max, and François Yvon. 2015. “Sentence Alignment for Literary Texts.” Linguistic Issues in Language Technology 12: 1–25. Stanford, CA: CSLI Publications.

11 Are Translated Chinese Wuxia Fiction and Western Heroic Literature Similar? A Stylometric Analysis Based on Stylistic Panoramas¹

Kan Wu and Dechao Li

11.1 Introduction

Wuxia, or Chinese martial arts fiction, is a traditional genre of Eastern heroic literature that originated from unique historical and cultural contexts during China’s Warring States Period (475–221 BC) (Huang 2018, 152).
Previous research on Wuxia (Flannery 2012; Vander Elst 2017; Keulemans 2020) has attempted to compare this type of Chinese heroic literature with Western chivalric stories and heroic fantasies – two subgenres of heroic literature deriving from a medieval background (Honegger 2010, 61). Those studies have demonstrated that Wuxia could be very different from the two Western subgenres in terms of cultural values, religious belief, and above all, worldviews, even though they share the heroic theme. Wu and Li, however, discovered that when readers read a Wuxia translation, they sometimes experience a déjà vu–like reminder of chivalric stories or heroic fantasies (Wu and Li 2018, 102–3). This raises the question as to whether there are any possible stylistic connections between heroic literature in the East and that in the West. An examination of such stylistic connections may give us clues about the current reception of translated Wuxia and is hence our first research objective.

To conduct the investigation, we turn to stylometry – the statistical analysis of literary styles (Holmes 1998, 111) – for methodological support. Existing stylometric research on (translated) texts of varied genres has employed a number of stylistic indices at such linguistic levels as characters (Daelemans 2013; Eder et al. 2016), words/lexes (Jones and Nulty 2019; Melka and Místecký 2020), n-grams/clusters (Mastropierro 2018; Valencia et al. 2019), sentences and paragraphs (Rong et al. 2006), tones and rimes (Hou and Huang 2020), and their combinations (Brocardo et al. 2014; Liu and Xiao 2020). These studies have unveiled the features of (translated) texts from multiple stylometric perspectives and revealed their most noticeable stylistic features. Nevertheless, they are not without methodological limitations.
For one thing, they lack a panoramic view of different stylistic features in a text; for another, they do not always explain the selection criteria for the stylistic indices to be investigated. We believe that both the adoption of a panoramic view and the justification for selection criteria are vital in stylometric analyses because the former makes results more comparable across research and the latter reveals intended functions associated with the chosen indices. Therefore, the second objective of the present study is to introduce the stylistic panorama, a novel concept proposed to describe the stylistic profile of a (translated) text in a relatively holistic and functional way.

To achieve the two research objectives, we formulate the following research questions:

RQ1: To what extent is translated Wuxia fiction similar to and/or different from Western chivalric stories and heroic fantasies, from a stylometric perspective?

RQ2: In what ways could such similarities and/or differences reveal current reception of English translations of Wuxia?

RQ3: How do the findings shed light on the use of stylistic panoramas in stylometric analyses?

Whereas an investigation of RQ1 is expected to contribute to the hypothesized stylistic connection(s) between translated Wuxia and the two subgenres of Western heroic literature, RQ2 contributes to practical outcomes for target readers of translated Wuxia and RQ3 explores theoretical implications for stylometric research into literary translation, specifically Chinese-to-English translation of Wuxia fiction.

DOI: 10.4324/9781003298328-12

11.2 Stylistic Panorama as a Stylistic Profile of a (Translated) Text

A stylistic panorama is defined in this research as a relatively complete stylistic profile that is based on a set of functionally related stylistic indices at multiple linguistic levels and is intended to satisfy certain research purpose(s).
The theoretical significance of the concept can be observed in both stylometric analyses and translation studies. In stylometric analyses, the stylistic panorama emphasizes the combined efforts of varied stylistic features when exploring a (translated) text. For translation studies, the panorama values the stylometric approach in its methodological design. In other words, the concept can be a bridge connecting empirical translation studies with stylometric analyses: it lends viable methodological support to empirical translation studies while enriching the research scope of stylometric analyses.

To generate the stylistic panorama of a text, the first step is to choose proper stylistic indices. Such a choice varies and depends largely on research needs. Considering that the research aim here is to conduct an overall stylistic comparison between the translated Wuxia and the two Western subgenres of heroic literature, we tend to select stylistic indices that could cover wide linguistic levels (words, word sequences, sentences, paragraphs, etc.) to capture basic features of this type of literature. Hence, the indices we choose for this research are the average word length (AWL), the dispersion of word lengths (DWL), the moving-average type-token ratio (MATTR), the verb-adjective ratio (VAR), the average sentence length (ASL), the dispersion of sentence lengths (DSL), the average paragraph length (APL), the most frequent words (MFWs), and the most frequent word sequences (MFWSs). Next, we perform multivariate analyses on those indices to produce interpretable stylistic panoramas. This is because multivariate analyses are powerful enough to account for holistic stylistic similarities between the translated Wuxia and the two Western subgenres, as well as being amenable to varied sample sizes.
Finally, for easier investigation, we categorize the stylistic panoramas according to the functions of the selected indices. Hence, two types of stylistic panoramas emerge in this work: one based on formal indices, and the other based on word and word sequence frequencies (MFWs and MFWSs, respectively).

The stylistic panorama based on formal indices describes formal features of a text. It is founded on such indices as AWL, DWL, MATTR, VAR, ASL, DSL, and APL, which are connected to comparatively small sample sizes (ca. 126 in total) in this research. At the word level, the AWL and DWL are selected to show the orthographical complexity of the heroic texts, and the MATTR and VAR are chosen to demonstrate the lexical richness of these texts. This ensures a breadth of feature types under classification. It is worth stressing that, out of many possible indices depicting lexical richness, we prefer the MATTR and VAR for practical reasons. The MATTR is favored because “it takes into account all possible segmentation of the text” (Březina 2018, 58) and is thus believed to capture features of vocabulary richness of a heroic text. The VAR is adopted because it reflects the lexical richness of the heroic text – a genre that is likely to contain a multitude of verbs and adjectives depicting kung fu fighting scenes (Wu and Li 2018, 102). For sentences and paragraphs, the ASL and APL are chosen to partly reveal the typological complexity, and above all, the readability, of the heroic texts, which we believe have certain connections to the reception of translated Wuxia. In addition, we select the DSL because computations of sentence dispersion can unveil the rhythm and likewise the readability of a heroic text: a lower dispersion value suggests a higher level of repetitiveness in a text, and vice versa.

The stylistic panorama built on MFWs/MFWSs is associated with lexes and their sequences in a text. In this study, the two stylistic indices are related to larger sample sizes (ca. 3,000 in total). The MFWs index is adopted because it has a long tradition of being an efficient classifier to distinguish stylistic features of one text from those of another (Burrows 2002). Meanwhile, word sequences (or n-grams, the contiguous sequences of n items of words within a given text or speech) have been widely applied in computational linguistics and information sciences for predictive or attributive purposes (Broder et al. 1997, 1157). In some quantitative linguistic/translation research (Rybicki 2012; Mastropierro 2018), frequency patterns of the MFWs and/or the MFWSs have been compared in an effort to assess their overall similarities and/or differences between texts. The MFWs and the MFWSs are often determined by the proportion of each word or word sequence in a text and are presented in parallel lists. The present study confines the scope of MFWSs to 2-grams and 3-grams, because they are the most common word sequences in (translated) literary texts (see Burrows 2002; Rybicki 2012; Mastropierro 2018). The choice of 2- and/or 3-grams as the MFWSs is further justified by the possibility that heroic literature is more likely to contain short phrases (of two to three words) that depict the quick action and short dialogue sequences of fighting scenes in the stories (Wu and Li 2018, 97).

Overall, the stylistic panorama is a concept that attempts to bind selected stylistic indices in texts together in a relatively holistic and functional way. In other words, when measuring the stylistic features of a text, the indices are no longer examined in isolation but are instead measured in a more comprehensive and interrelated way in line with specific research needs. For this work, because one objective is to explore potential stylistic connection(s) between the heroic literature of the East and that of the West, the selected stylistic indices and their resulting panoramas are expected to meet this goal.
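As a rough illustration of how MFW and MFWS lists of the kind described in this section might be derived, the following Python sketch counts words and 2-grams in a token list and reports each item's proportion of the text. It is not the procedure implemented in the software used in the study; the whitespace tokenizer, the helper names `ngrams` and `relative_frequencies`, and the sample sentence are all assumptions made for the sake of illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-word sequences of a token list as strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequencies(items, top):
    """Return the `top` most frequent items with their proportion of all items."""
    counts = Counter(items)
    total = len(items)
    return [(item, count / total) for item, count in counts.most_common(top)]


# Hypothetical sample text; a real study would use a full, cleaned novel.
text = "the knight drew the sword and the knight struck"
tokens = text.lower().split()  # naive whitespace tokenization (assumption)

mfw = relative_frequencies(tokens, 3)                # most frequent words
bigrams = relative_frequencies(ngrams(tokens, 2), 2)  # most frequent 2-grams

print(mfw[0])      # most frequent word with its proportion
print(bigrams[0])  # most frequent 2-gram with its proportion
```

Running the same two functions over each text yields lists that can be lined up side by side, which is one plausible reading of the "parallel lists" mentioned above; 3-grams follow by calling `ngrams(tokens, 3)`.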
11.3 Data and Methodology

11.3.1 Data and Corpora

The Wuxia novels² used in this study are the English translations of six different works by Louis Cha, a renowned Hong Kong Wuxia novelist. We choose those six works because they are the only English Wuxia translations published at the time of writing. The novels are translated by five translators who are experienced in rendering Chinese Wuxia fiction into English: Minford, Earnshaw, and Mok are sinologists dedicated to Chinese literary translation, whereas Holmwood and Chang are new generation translators who are interested in the dissemination of Chinese Wuxia overseas (Wu and Li 2018, 95).

For the Western heroic literature, the selected works³ are chivalric stories translated into modern English and heroic fantasies written in modern English. We choose these works for a balance between comparability and representativeness. First, the choice of works in modern English is meant to ensure linguistic comparability across the subgenres and, as a consequence, make it easier to locate the style of the Wuxia translations comparatively. Second, we expect the great popularity (based on Amazon/Goodreads ratings) and diverse source languages (i.e., English, German, Spanish) of the selected translated works would increase the representativeness in the two subgenres of Western heroic literature. Details of all the works used in this research are summarized in Tables 11.1 through 11.3, including the year of publication of the versions used in the study. Comparability is additionally enhanced through similar token sizes (ca. 1.3 million) in texts across the subgenres.
Representativeness is further captured through the inclusion of both earlier works (before the 1850s) and modern collections (the 1850s or later) for the chivalric stories.

Table 11.1 Details of the Translated Wuxia Stories

| Translated Wuxia Stories | Year | Translator | Token Size |
| --- | --- | --- | --- |
| Fox Volant of the Snowy Mountain | 1993 | Olivia Mok | 120,613 |
| The Book and Sword | 2004 | Graham Earnshaw | 192,439 |
| The Deer and the Cauldron | 1997 | John Minford | 617,949 |
| A Hero Born | 2018 | Anna Holmwood | 127,123 |
| A Bond Undone | 2019 | Gigi Chang | 160,851 |
| A Snake Lies Waiting | 2020 | Gigi Chang | 140,539 |
| Total Size | | | 1,359,514 |

Table 11.2 Details of the Chivalric Stories

| Chivalric Stories | Year | Author/Editor | Token Size |
| --- | --- | --- | --- |
| Don Quixote (translated from Spanish) | 2003 | M. De Cervantes | 402,546 |
| Ivanhoe | 2005 | W. Scott | 182,732 |
| Parzival (translated from German) | 1980 | W. Von Eschenbach | 154,902 |
| In the Days of Chivalry | 2004 | E. Everett-Green | 153,079 |
| Heroes and Heroines of Chivalry | 2004 | W. Patten | 135,866 |
| Castles, Knights, and Chivalry | 2015 | Kaufman et al. | 351,311 |
| Total Size | | | 1,380,436 |

Table 11.3 Details of the Heroic Fantasies

| Heroic Fantasies | Year | Author | Token Size |
| --- | --- | --- | --- |
| The Lord of the Rings 1–2 | 1954–1955 | J. R. R. Tolkien | 188,361 |
| A Song of Ice and Fire 1 | 1996 | G. R. R. Martin | 293,856 |
| Conan the Barbarian | 1954 | R. E. Howard | 104,972 |
| Wheel of Time 1 | 1990 | R. Jordan | 319,124 |
| The Chronicles of Amber 1–4 | 1970–1976 | R. Zelazny | 239,940 |
| The Chronicles of Narnia 1–5 | 1950–1954 | C. S. Lewis | 233,369 |
| Total Size | | | 1,379,622 |

11.3.2 Calculations and Algorithms

Whereas the average word, sentence, and paragraph length (AWL/ASL/APL) values are retrieved from the outputs of Wordsmith 6.0 (Scott 2012), the most frequent word and word sequences (MFWs/MFWSs) and their proportions in the texts are obtained from Intelligent Archive 3.0 (Craig 2018) as parallel lists.
The dispersion of word and sentence lengths (DWL/DSL), moving-average type-token ratio (MATTR), and verb-adjective ratio (VAR) values, on the other hand, are calculated according to formulas 1 through 3, as follows.

$$SD = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - X_0\right)^2} \tag{1}$$

$$VAR = \frac{verbs}{verbs + adjectives} \tag{2}$$

$$MATTR = \frac{\sum_{i=1}^{N-L} V_i}{L\,(N - L + 1)} \tag{3}$$

In formula 1, standard deviation (SD) gives the statistical expression of the DWL/DSL (Liu and Xiao 2020, 35), where $n$ is the number of words/sentences in the text, $X_i$ is the length of a single word/sentence, and $X_0$ is the average word/sentence length. In formula 2, verbs and adjectives represent the total numbers of verbs and adjectives in a text, respectively. Stanford Tagger 4.2.0 (Stanford NLP Group 2021) is used to obtain the numbers of verbs/adjectives in the texts through POS annotation. In formula 3, $N$ is the total length of a text, $L$ is the selected length of a text chunk, and $V_i$ is the number of types in the text chunk. To operationalize, the chunk size in the MATTR calculation is set to 500, a setting that has previously produced reliable results (see Covington and McFall 2010; Kettunen 2014).

Data related to the formal indices are normalized according to formula 4, while the Euclidean distance between two stylistic panoramas built on the MFWs and MFWSs is calculated based on formula 5, as follows:

$$Y = \frac{X}{\sqrt{\sum_{k=1}^{n} X_k^2}} \tag{4}$$

$$AB = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} \tag{5}$$

Formula 4 is the L2 regularization. $Y$ represents the normalized value, $X$ is the original value, and $\sum_{k=1}^{n} X_k^2$ is the sum of the squares of all original values in a dataset. A benefit of L2 regularization is that it handles the problem of overfitting well when the dataset is relatively small. Formula 5 is used to calculate the Euclidean distance between two stylistic panoramas built on the MFWs/MFWSs, where $x$ and $y$ are the coordinates of each panorama and $AB$ is the distance between the panoramas.
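The five formulas can be sketched in a few lines of Python. This is only an illustrative reading, not the authors' own computation: the function names are hypothetical, the MATTR helper averages the type-token ratio over every full window of L tokens (an assumption about how formula 3 is meant to be applied), and formula 4 is read as division by the square root of the sum of squares. Under that reading, the ASL column of Table 11.4 gives a chivalric total of about 1.58, which agrees with the normalized ASL value reported for the chivalric stories in Section 11.4.1.1.

```python
import math


def dispersion(lengths):
    """Formula 1: standard deviation of word or sentence lengths."""
    n = len(lengths)
    mean = sum(lengths) / n  # X_0, the average length
    return math.sqrt(sum((x - mean) ** 2 for x in lengths) / n)


def verb_adjective_ratio(verbs, adjectives):
    """Formula 2: number of verbs divided by verbs plus adjectives."""
    return verbs / (verbs + adjectives)


def mattr(tokens, window=500):
    """Mean type-token ratio over every full window of `window` tokens."""
    if len(tokens) < window:
        raise ValueError("text is shorter than the window")
    starts = range(len(tokens) - window + 1)
    ratios = [len(set(tokens[i:i + window])) / window for i in starts]
    return sum(ratios) / len(ratios)


def l2_normalize(values):
    """Formula 4: divide each value by the root of the sum of squares."""
    norm = math.sqrt(sum(x * x for x in values))
    return [x / norm for x in values]


def euclidean_distance(a, b):
    """Formula 5: distance between points (x1, y1) and (x2, y2)."""
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)


# ASL column of Table 11.4, in table order:
# six chivalric stories, six heroic fantasies, six translated Wuxia novels.
asl = [48.17, 33.07, 19.62, 25.10, 23.12, 11.75,
       14.63, 11.82, 26.43, 18.75, 11.12, 17.47,
       19.73, 13.07, 39.18, 22.82, 21.67, 17.41]
normalized = l2_normalize(asl)
chivalric_total = sum(normalized[:6])  # tally for the chivalric subgenre
print(round(chivalric_total, 2))  # 1.58
```

Normalizing each column across all 18 works before tallying by subgenre keeps indices with very different scales, such as MATTR and APL, on a comparable footing.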
For the algorithms used to analyze the stylistic indices, the study employs hierarchical cluster analysis (HCA) and principal component analysis (PCA) for the reasons elaborated in Section 11.2. HCA is an unsupervised machine learning procedure that groups similar objects into a category (Christopher et al. 2008, 321) and is amenable to small sample sizes with the O = 2^k principle⁴ (Formann 1984). By contrast, PCA is based on the idea of reducing a substantial number of variables into a smaller number of transformed variables (Manly 2016, 103) and can thus help measure the overall similarities and/or differences between (translated) texts.

11.3.3 Analytic Steps

The stylometric analyses of the selected works require several steps. First, the data are cleaned by removing all the paratexts (prefaces, appendices, footnotes, etc.) as a preparatory step to minimize any possible influence of these on the results. Next, we retrieve the stylistic indices of the raw data from each text in the research, using software computation and manual calculations. Third, we build stylistic panoramas with the retrieved data, by using R 4.03 to perform HCA and PCA separately on the normalized data. This step reveals the potential stylistic connections between the translated Wuxia and the two subgenres of Western heroic literature. Finally, the results are interpreted in light of the current reception of some Wuxia translations, with reflection on theoretical implications on the use of stylistic panoramas in stylometric analyses.

11.4 Results

11.4.1 Stylistic Panoramas Based on Formal Indices

Table 11.4 summarizes the raw values of the seven indices (AWL, DWL, MATTR, VAR, ASL, DSL, and APL).
Table 11.4 Statistics of the Stylistic Indices across the Genres (word level: AWL, DWL, MATTR, VAR; sentence level: ASL, DSL; paragraph level: APL)

| Subgenre | Fiction | AWL | DWL | MATTR | VAR | ASL | DSL | APL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Chivalric Stories | Don Quixote | 4.27 | 2.21 | 0.50 | 0.77 | 48.17 | 26.56 | 71.83 |
| Chivalric Stories | Ivanhoe | 4.46 | 2.35 | 0.55 | 0.71 | 33.07 | 18.29 | 56.57 |
| Chivalric Stories | Parzival | 4.22 | 2.14 | 0.54 | 0.76 | 19.62 | 9.68 | 75.78 |
| Chivalric Stories | In the Days of Chivalry | 4.25 | 2.09 | 0.51 | 0.71 | 25.10 | 18.79 | 82.88 |
| Chivalric Stories | Heroes and Heroines of Chivalry | 4.07 | 1.93 | 0.46 | 0.80 | 23.12 | 14.53 | 67.60 |
| Chivalric Stories | Castles, Knights, and Chivalry | 4.32 | 2.14 | 0.53 | 0.81 | 11.75 | 6.72 | 29.38 |
| Heroic Fantasies | The Lord of the Rings, 1–2 | 4.09 | 1.92 | 0.50 | 0.75 | 14.63 | 10.14 | 45.70 |
| Heroic Fantasies | A Song of Ice and Fire, 1 | 4.16 | 1.93 | 0.52 | 0.77 | 11.82 | 8.49 | 33.60 |
| Heroic Fantasies | Conan the Barbarian | 4.44 | 2.23 | 0.55 | 0.72 | 26.43 | 10.25 | 49.68 |
| Heroic Fantasies | Wheel of Time, 1 | 4.26 | 2.08 | 0.53 | 0.79 | 18.75 | 8.71 | 48.26 |
| Heroic Fantasies | The Chronicles of Amber, 1–4 | 4.09 | 2.15 | 0.50 | 0.80 | 11.12 | 8.46 | 38.89 |
| Heroic Fantasies | The Chronicles of Narnia, 1–5 | 4.14 | 1.99 | 0.50 | 0.76 | 17.47 | 11.51 | 31.28 |
| Translated Wuxia Fiction | Fox Volant of the Snowy Mountain | 4.56 | 2.36 | 0.53 | 0.75 | 19.73 | 10.76 | 46.84 |
| Translated Wuxia Fiction | The Book and Sword | 4.29 | 2.10 | 0.50 | 0.80 | 13.07 | 8.20 | 33.31 |
| Translated Wuxia Fiction | The Deer and the Cauldron | 4.38 | 2.26 | 0.52 | 0.76 | 39.18 | 14.84 | 26.00 |
| Translated Wuxia Fiction | A Hero Born | 4.39 | 2.20 | 0.55 | 0.78 | 22.82 | 7.63 | 33.70 |
| Translated Wuxia Fiction | A Bond Undone | 4.47 | 2.27 | 0.56 | 0.77 | 21.67 | 7.40 | 29.06 |
| Translated Wuxia Fiction | A Snake Lies Waiting | 4.34 | 2.13 | 0.54 | 0.79 | 17.41 | 7.73 | 31.27 |

Figure 11.1 Stylistic panoramas of the three subgenres, from a global view.

These raw values suggest that the translated Wuxia works and the two subgenres of Western heroic literature are stylistically similar at the word level but divergent in terms of sentences and paragraphs. This trend could reflect multiple factors, from different literary norms to translatorial/authorial idiosyncrasies, thus giving target readers of the three subgenres varied reading experiences. To further probe the stylistic connections, we examine the stylistic panoramas formed by these formal indices from both global and local perspectives.
While the examination at the global level attempts to capture the panorama of each subgenre, the investigation of local perspectives compares the panoramas of single works in the three subgenres.

11.4.1.1 Stylistic Panoramas from a Global Perspective

A global comparison of the stylistic panoramas between the Wuxia translations and the chivalric stories/heroic fantasies is meant to locate the stylistic features of the Wuxia translations in relation to those of the Western heroic literature. Before that comparison, however, the formation of a stylistic panorama requires that we normalize the data under each column in Table 11.4 for better data comparability across the indices, using formula 4 (explained in Section 11.3.2). Then, we tally the normalized data for each stylistic index within each subgenre to obtain a total value, which is the statistical ingredient of the stylistic panorama at this global level. The stylistic panoramas that are formed in each of the three subgenres are presented in Figure 11.1 as radar charts. They reveal several stylistic patterns.

At the word level, a conspicuous pattern is that the translated Wuxia has higher normalized values than the two subgenres of Western heroic literature. This is easily detected in the moving-average type-token ratio (MATTR) values, in which the normalized value for the translated Wuxia novels is 1.44 and the values for the chivalric stories and heroic fantasies are both 1.40. That result may point to the use of a comparatively richer vocabulary in the translated Wuxia. In addition, the highest dispersion of word lengths (DWL) value, 1.47, belongs to the translated Wuxia novels, suggesting that word length in them is generally more variable than in the two Western subgenres. Likewise, a normalized average word length (AWL) of 1.45 for the translated Wuxia stories indicates the use of longer and more complex words.
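The normalize-then-tally step behind these subgenre totals can be sketched in a few lines. Formula 4 is not reproduced in this section, so the sketch assumes vector (L2) normalization, in which each raw value is divided by the square root of the sum of squares of its column. That choice is an assumption, although it does reproduce the ASL totals reported in Section 11.4.1.1 (1.58, 0.99, and 1.32). The function name `subgenre_totals` is illustrative, and Python stands in here for the R workflow that the study used.

```python
import math

# ASL (average sentence length) values from Table 11.4, grouped by subgenre.
ASL = {
    "chivalric stories": [48.17, 33.07, 19.62, 25.10, 23.12, 11.75],
    "heroic fantasies": [14.63, 11.82, 26.43, 18.75, 11.12, 17.47],
    "translated Wuxia": [19.73, 13.07, 39.18, 22.82, 21.67, 17.41],
}


def subgenre_totals(column):
    """L2-normalize one index across all works, then sum within each subgenre.

    Assumption: formula 4 is taken to be vector normalization, x / sqrt(sum of x^2).
    """
    norm = math.sqrt(sum(v * v for values in column.values() for v in values))
    return {genre: sum(v / norm for v in values) for genre, values in column.items()}


totals = subgenre_totals(ASL)
for genre, total in totals.items():
    print(f"{genre}: {total:.2f}")
# chivalric stories: 1.58
# heroic fantasies: 0.99
# translated Wuxia: 1.32
```

Repeating the call for each of the seven columns would give the seven spokes of one radar chart per subgenre.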
One potential reason for such a tendency would be a shared preference by the Wuxia translators to use longer and more complex words for explanatory renditions in English, because the original Chinese versions by Cha contain many culturally loaded Wuxia concepts. Also, the highest verb-adjective ratio (VAR) value is found in the Wuxia translations, which suggests that they use proportionally more verbs than their Western counterparts do. This may have several stylistic effects, including a more vivid reading experience for target readers.

At the sentence and paragraph levels, there are two noticeable patterns. First, the chivalric stories have higher values overall. For instance, the normalized ASL (average sentence length) and DSL (dispersion of sentence lengths) values for the chivalric stories, being 1.58 and 1.76, respectively, are much higher than those of the other two subgenres. This could mean that most sentences in the chivalric stories are more complex and more varied than those in the other two subgenres. Similarly, the normalized APL (average paragraph length) for the chivalric stories is 1.83, a value far greater than those for the translated Wuxia novels and the heroic fantasies. A possible explanation would be the use of different literary norms between the subgenres of heroic literature. Close reading of the selected chivalric stories reveals that they tend to pack more sentences into a single paragraph, thus often pushing their APLs to higher values and presenting readers with longer paragraphs. All of this may suggest that, as an old form of heroic literature, chivalric stories are stylistically more complex and more varied in terms of sentences and paragraphs than the other two subgenres. Second, the ASL and APL values in the Wuxia translations and the heroic fantasies demonstrate a less consistent but meaningful trend.
Whereas the Wuxia translations have a higher ASL value of 1.32 yet a lower APL value of 0.96, the heroic fantasies bear a greater APL value of 1.18 but a lower ASL value of 0.99. This indicates that the Wuxia translations may have many short paragraphs built on relatively longer sentences – an important stylistic feature which distinguishes the Wuxia translations from the heroic fantasies.

11.4.1.2 Stylistic Panoramas from a Local Perspective

A local comparison is meant to investigate whether the stylistic panorama pattern deriving from the global comparison would vary when we make comparisons across the three subgenres based on single heroic works. The study holds that such a local comparison is necessary because it reveals how such extrastylometric factors as translatorial motivations and publication years might affect the stylistic patterns. To make that comparison, we use the normalized data and resort to HCA to produce stylistic panoramas. The HCA-based stylistic panoramas are produced according to the Euclidean distance between the texts and using the maximum distance method in computation. The output from R 4.03 is presented in Figure 11.2 as a cluster dendrogram: the horizontal axis shows the titles of the 18 selected works, and the vertical axis records the divergence of clusters.

Figure 11.2 Cluster dendrogram of the HCA-based stylistic panoramas.

The results show that the Wuxia translations differ in important ways from their Western counterparts and from each other. The Wuxia translations published in more recent years (i.e., 2018, 2019, 2020) appear to form a distinct category that is not only stylistically different from the selected chivalric stories and heroic fantasies but also differs from the rest of Wuxia translations in the dendrogram.
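As a rough illustration of how such a dendrogram is built, the sketch below runs agglomerative clustering on Euclidean distances. It reads the "maximum distance method" as complete (furthest-neighbour) linkage, which is an interpretation rather than something the chapter states outright. The four texts and their two-value vectors are invented for illustration; the study itself clustered 18 works on seven normalized indices in R.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_gap(points, a, b):
    """Complete linkage: largest distance between any member of a and any member of b."""
    return max(euclidean(points[i], points[j]) for i in a for j in b)


def complete_linkage(points):
    """Merge the two closest clusters until one remains.

    Returns one (cluster_a, cluster_b, height) tuple per merge; the heights
    correspond to the vertical axis of a dendrogram.
    """
    clusters = [(name,) for name in points]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_gap(points, *pair))
        merges.append((a, b, cluster_gap(points, a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical normalized index vectors, invented for illustration only.
texts = {
    "text_A": (0.0, 0.0),
    "text_B": (0.0, 1.0),
    "text_C": (5.0, 0.0),
    "text_D": (5.0, 2.0),
}

merges = complete_linkage(texts)
for a, b, height in merges:
    print(a, "+", b, "at height", round(height, 3))
```

Texts that join at a low height are stylistically close under the chosen indices, while a cluster that joins the rest only at a large height stands apart as a separate group in the dendrogram.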
This trend is clearly shown by the stylistic panoramas of A Snake Lies Waiting, A Bond Undone, and A Hero Born, which indicate that the three recent Wuxia translations may share certain similarities in terms of language use. In that regard, the study suggests that a short publication span and close translatorial cooperation could be two factors that shape this stylistic similarity among the three Wuxia translations. Notably, the three translations were consecutively published in 2018, 2019, and 2020, and the two translators, Holmwood and Chang, had worked with a kindred spirit in their translations in an effort to “see the foreign interest in China and its culture” (Mei 2019).

By contrast, the Wuxia translations from earlier periods (i.e., the 1990s and the 2000s) are stylistically divergent from each other but close to some works of the heroic fantasies. It is noteworthy in Figure 11.2 that the stylistic panoramas of The Book and Sword, Fox Volant of the Snowy Mountain, and The Deer and the Cauldron are comparatively divergent from each other, even though the three translations were produced within a relatively short span of 11 years. Instead, the stylistic panoramas of Fox Volant of the Snowy Mountain and The Book and Sword are respectively similar to those of Wheel of Time 1 and A Song of Ice and Fire 1, two works of heroic fantasies published likewise in the 1990s. The stylistic panorama of The Deer and the Cauldron is another story: it differs from that of the other works in the three subgenres and forms an independent category, which places the Wuxia translation somewhere between the chivalric stories and the heroic fantasies in terms of style. That uniqueness may indicate variations in the language use and thus the readability of the three Wuxia translations – a scenario that we tend to associate with different translatorial motivations.
For example, the motivation behind the translation of Fox Volant of the Snowy Mountain may have been “promoting Chinese martial arts cultures overseas” (Wu and Li 2018, 100), while that behind the production of The Deer and the Cauldron may have been “winning overseas readership,” and, by contrast, the motivation underlying the translation of The Book and Sword may have been “learning the Chinese language/culture” (Wu and Li 2018, 101). As a result, the translators of the three earlier Wuxia translations are likely to have used different vocabularies (more verbs, different culturally loaded words, etc.) and to have varied the sentence/paragraph lengths in their translations.

11.4.2 Stylistic Panoramas Based on MFWs/MFWSs

The stylistic panoramas based on the formal indices show that the Wuxia translations are not only largely different from the works of the two Western subgenres but also divergent from each other. Therefore, the study seeks to determine whether such stylistic patterns would remain or change when we compare the panoramas on the basis of the most frequent words (MFWs) and most frequent word sequences (MFWSs) through the PCA.

11.4.2.1 Parallel Lists of MFWs/MFWSs

Parallel lists of the MFWs/MFWSs in the texts (shown partially in Figures 11.3–11.5; see note 5) carry a header row with the MFWs/MFWSs label and the title of each work. The remaining rows list the words/word sequences and their proportions in each work. To facilitate comparability, proportions are analyzed rather than raw frequencies, given the different lengths of the works. Finally, contractions such as “I’ve,” “he’ll,” and “you’d” in all works are analyzed as single words rather than as separate words. Such a decision contrasts with previous studies (see Tognini-Bonelli 2001; Laviosa 2002; Mastropierro 2018), which treated those forms as separate words for certain morphological and/or phonetic reasons. However, these linguistic options have stylistic impact in the present study, so it is reasonable to account for them as instances of stylistic choice.

Figure 11.3 Sample parallel list for the MFWs in the selected works.

Figure 11.4 Sample parallel list for the MFWSs (2-grams) in the selected works.

Figure 11.5 Sample parallel list for the MFWSs (3-grams) in the selected works.

Before proceeding to the actual analyses, it is important to decide how many words/word sequences in each text to consider from the tops of the parallel lists in order to obtain the MFWs/MFWSs-based panoramas. Because previous studies (see Burrows 2002; Rybicki 2012; Grabowski 2013; Mastropierro 2018) share no agreement on this number, we determine it through repeated pilot studies according to this principle: the stability of the analytic results improves as the number of MFWs/MFWSs increases but remains relatively unchanged once the number reaches a certain point, at which a stylistic panorama is formed. Therefore, in the present analysis, we test with numbers from 100 to 2,000, using increments of 50, to arrive at the point at which stable results form a stylistic panorama. That point turns out to be the top 1,000 in the case of the MFWs-based panoramas, the top 950 in the case of 2-grams, and the top 900 in the case of 3-grams. For better consistency, we use the top 1,000 MFWs/MFWSs entries as the benchmark for conducting PCA.

11.4.2.2 Overall Patterns of Stylistic Panoramas

The PCA results based on the MFWs, 2-grams, and 3-grams are depicted in Figures 11.6 through 11.8, respectively.

Figure 11.6 PCA graph of individuals, based on the top 1,000 MFWs.

Figure 11.7 PCA graph of individuals, based on the top 1,000 2-grams.

Figure 11.6 shows the overall extent to which the selected works from the three subgenres differ on the basis of the stylistic panoramas formed by the top 1,000 most frequent words (MFWs).
The dots in the figure are the stylistic panoramas of the heroic literary works represented by the MFWs, and the horizontal and vertical axes are the two principal components (dimensions) that represent the majority of data variance in the parallel word list. The metric distance between two dots signifies a possible diversity level between the stylistic panoramas of two works. The general principle is that the greater the metric distance between two dots, the higher the diversity level between the stylistic panoramas of the two works. The axes are unlabeled because they are the results of a dimensionality reduction (see note 6) through the principal component analysis (PCA) – specifically, an unsupervised machine learning method, in which datasets are often unlabeled, unclassified, or uncategorized (Saslow 2018). The two percentages in the brackets along the axes are the level of variance carried by the two components (dimensions): the first component (Dim 1) represents 16.80% of the variance across the data, whereas the second component (Dim 2) represents 12.74% of that variance. Those results show that Dim 1 has more data variance than Dim 2 does, thus implying that the distances between the data points along the horizontal axis bear greater variance than those along the vertical axis do.

Figure 11.8 PCA graph of individuals, based on the top 1,000 3-grams.

In that light, the message conveyed by Figure 11.6 is clear: the relatively short distances between the dots representing the stylistic panoramas of the six Wuxia translations suggest that these translations share similarities in their MFWs. Meanwhile, longer distances between these dots and dots symbolizing the panoramas of the chivalric stories and heroic fantasies suggest that the translated Wuxia works are very different from the two Western subgenres in terms of their MFWs.
Similarly, the dots that represent the chivalric stories (except for Castles, Knights, and Chivalry) and the heroic fantasies are mainly packed within their own subgenres and are distant from the ones representing works of other subgenres. That orientation indicates that most heroic works belonging to the same subgenre tend to share MFWs.

When it comes to the panoramas formed by the top 1,000 most frequent word sequences (MFWSs), the stylistic pictures are largely similar to those stemming from the top 1,000 MFWs, despite some slight differences. In Figure 11.7, the first principal component (Dim 1) shows 16.98% of the variance across the data, whereas the second principal component (Dim 2) carries 12.26% of that variance – a pattern that resembles the MFWs-based PCA results. Likewise, five of the six dots representing the Wuxia translations are close to each other but distant from the dots representing the works of other subgenres, except for the one for The Deer and the Cauldron, which is closest to the dot representing A Song of Ice and Fire 1, a work of heroic fantasies. This suggests that Minford’s Wuxia translation may bear strong similarities to the novel by Martin in terms of 2-grams. In addition to the dots for the Wuxia translations, the dots symbolizing works of the two Western subgenres largely stay within their own subgenres, with the exception of the dot for Castles, Knights, and Chivalry, which lies closer to dots representing the Wuxia translations. In Figure 11.8, the stylistic panoramas formed by 3-grams demonstrate a mirrored but otherwise almost identical pattern to that in Figure 11.7, even though Dim 1 on the horizontal axis shows 15.03% of the data variance and Dim 2 on the vertical axis has 10.39% of such variance.
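The percentages attached to Dim 1 and Dim 2 are shares of total variance: each principal component's eigenvalue divided by the sum of all eigenvalues of the covariance matrix. The sketch below shows that calculation on a tiny invented dataset, using power iteration with deflation so that it needs no external libraries. It is a teaching sketch, not the study's R pipeline; the function names and the six toy rows are hypothetical, and the starting vector is assumed not to be orthogonal to the leading eigenvector.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of the columns of rows (one row per text)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(k)]
            for i in range(k)]


def top_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and its unit eigenvector, found by power iteration."""
    k = len(matrix)
    vec = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # no variance left to explain
            return 0.0, vec
        vec = [x / norm for x in nxt]
    value = sum(vec[i] * matrix[i][j] * vec[j] for i in range(k) for j in range(k))
    return value, vec


def explained_variance(rows, components=2):
    """Share of total variance carried by each of the first principal components."""
    cov = covariance_matrix(rows)
    k = len(cov)
    total = sum(cov[i][i] for i in range(k))
    shares = []
    for _ in range(components):
        value, vec = top_eigenpair(cov)
        shares.append(value / total)
        # Deflation: subtract the component just found before seeking the next one.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(k)]
               for i in range(k)]
    return shares


# Hypothetical feature rows (three variables per text), invented for illustration.
toy_rows = [(2, 0, 0), (-2, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 0.5), (0, 0, -0.5)]
shares = explained_variance(toy_rows)
print([f"{100 * s:.2f}%" for s in shares])
```

With 1,000 frequency variables per text, the first two components inevitably carry only part of the variance, which is why shares in the range of 10–17% per dimension, as in Figures 11.6 through 11.8, are unsurprising.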
11.4.2.3 Metric Distances between Stylistic Panoramas

The above PCA results reveal how the MFWs/MFWSs-based stylistic panoramas of the Wuxia translations are similar to and/or different from those of the chivalric stories and heroic fantasies. However, they have only illustrated a general side of the stylistic picture wherein the exact level of similarities between a Wuxia translation and other works in the graphs is still unknown. To determine that level, we need to calculate the metric distance between each pair of dots through their coordinates, which are simultaneously generated in the PCA. With the coordinates of each dot, we use formula 5 to compute their metric distances, and then we focus on the average distances between the dots representing the Wuxia translations and the dots of the chivalric stories/heroic fantasies. In that way, the exact metric distances between the Wuxia translations and the chivalric stories/heroic fantasies are measured. Those average distances are reported in Table 11.5.

Table 11.5 Average Distances between Each Wuxia Translation and the Western Counterparts

| Fiction | MFWs | 2-grams | 3-grams |
|---|---|---|---|
| Fox Volant of the Snowy Mountain | 38.32 | 34.91 | 31.34 |
| The Book and Sword | 43.12 | 31.86 | 31.44 |
| The Deer and the Cauldron | 32.80 | 26.55 | 25.91 |
| A Hero Born | 42.74 | 37.88 | 38.75 |
| A Bond Undone | 39.15 | 35.47 | 33.43 |
| A Snake Lies Waiting | 42.72 | 37.42 | 39.35 |

As the table shows, the MFWs-based stylistic panorama of The Deer and the Cauldron by Minford has the shortest average distance (32.80) to the panoramas of the selected chivalric stories and heroic fantasies, whereas the MFWs-based panorama of A Hero Born by Holmwood has the longest average distance to the other subgenres (42.74).
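The averaging step can be sketched as follows. Formula 5 is not reproduced in this section, so the sketch assumes plain Euclidean distance between (Dim 1, Dim 2) coordinates; the coordinates themselves and the function name `average_distance` are invented for illustration and are not the study's PCA output.

```python
import math


def average_distance(target, others):
    """Mean Euclidean distance from one PCA point to a set of other points."""
    return sum(math.dist(target, other) for other in others) / len(others)


# Hypothetical (Dim 1, Dim 2) coordinates, invented for illustration only.
wuxia_text = (3.0, 4.0)
western_texts = [(0.0, 0.0), (6.0, 8.0), (3.0, 0.0)]

avg = average_distance(wuxia_text, western_texts)
print(round(avg, 2))  # distances 5, 5, and 4 average to 4.67
```

Applied to one Wuxia translation and the twelve Western works, a calculation of this kind would yield one cell of Table 11.5.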
The average distances from the MFWs-based panoramas of the remaining Wuxia translations to those of the chivalric stories and heroic fantasies fall within the range of 38.32 to 42.74 and hence are considerably greater than that of Minford’s translation. Of the MFWSs-based panoramas, the previously described stylistic scenario seems to repeat itself in the case of 2-grams, but it bears nuances in the case of 3-grams. The 3-gram-based panoramas illustrate that even though Minford’s translation still has a noticeably shorter average distance to the other subgenres at 25.91, A Snake Lies Waiting by Chang has the longest distance of 39.35, a value that is slightly higher than that of Holmwood’s translation at 38.75. All these numbers suggest that with regard to the top 1,000 MFWs- and MFWSs-based panoramas, most of the chosen Wuxia translations are stylistically different from the selected chivalric stories and heroic fantasies, with the exception of The Deer and the Cauldron by Minford. This result of metric distances is in line with our direct observations of the PCA individuals, as shown in Figures 11.6 through 11.8, which we would relate again to translatorial motivations: Minford’s motivation to win readerships in the English-speaking world (cf. Section 11.4.1) may partly explain why his Wuxia translation resembles the chivalric stories/heroic fantasies in terms of the MFWs/MFWSs.

11.5 Discussion

With the two types of stylistic panoramas, the present study has illustrated the extent to which translated Wuxia and the two Western subgenres are stylistically connected. The stylistic panoramas based on the formal indices show that there are few similarities between the Wuxia translations and the works of the two Western subgenres, although certain similarities do exist between those Wuxia translations and heroic fantasies published in the 1990s/2000s.
The MFWs-/MFWSs-based stylistic panoramas reveal that the Wuxia translations are stylistically different from most of the chivalric stories and heroic fantasies, with the exception of The Deer and the Cauldron. These findings have practical and theoretical implications with respect to the research questions.

On the practical side, the findings suggest possible reasons for the reception of Wuxia translations: unique stylistic features (richer Wuxia-specific vocabularies, shorter paragraph lengths, etc.) which distinguish Wuxia from both chivalric stories and heroic fantasies could be a possible reason that these Wuxia translations are well received. Table 11.6 summarizes the five-scale ratings of the six Wuxia translations by readers from four well-known websites of book promotions and reviews. Because some ratings are not available in Novelupdates and/or Audible, we focus on the average rating of each translation for better comparability. The table shows that A Hero Born, A Bond Undone, and A Snake Lies Waiting have the top three average ratings. We attribute the favorable ratings of the three Wuxia translations to their stylistic uniqueness, which is partly shown in the following two aspects.

Table 11.6 Reception of the Six Wuxia Translations in English (Up to 02/2021)

| Fiction | Amazon | Goodreads | Novelupdates | Audible | Average |
|---|---|---|---|---|---|
| Fox Volant of the Snowy Mountain | 3.40 of 5 | 3.84 of 5 | 3.00 of 5 | n/a | 3.41 of 5 |
| The Book and Sword | 4.80 of 5 | 3.89 of 5 | 3.20 of 5 | n/a | 3.96 of 5 |
| The Deer and the Cauldron | 4.20 of 5 | 4.28 of 5 | 4.40 of 5 | n/a | 4.29 of 5 |
| A Hero Born | 4.60 of 5 | 4.02 of 5 | 4.30 of 5 | 4.70 of 5 | 4.41 of 5 |
| A Bond Undone | 4.70 of 5 | 4.39 of 5 | n/a | 5.00 of 5 | 4.70 of 5 |
| A Snake Lies Waiting | 4.80 of 5 | 4.39 of 5 | n/a | 4.70 of 5 | 4.63 of 5 |

First, regarding the stylistic panoramas founded on the formal indices, relatively higher MATTR but lower DSL and APL values could contribute in part to favorable ratings.
A high MATTR value suggests a rich vocabulary, which in Wuxia translations could mean that readers receive an extended cultural experience of martial arts through a greater use of Wuxia-specific words. For instance, when rendering the original names of martial heroes and kung fu fighters, Holmwood and Chang both use more creative words, such as “Ryder Han,” “Ironheart Yang,” “Twice Foul Dark Wind,” “Nine Yin Skeleton Claw,” and the like. By contrast, in the earlier (the 1990s/2000s) Wuxia translations, those elements are sometimes presented less interestingly because of transliteration and/or omission. In addition, a lower DSL value in these translations could indicate a more repetitive yet consistent translation of the Wuxia-specific terms across sentences. Such consistency would help readers form a coherent impression of the fictional Wuxia world created by these terms. Finally, the lower APL values may reduce readers’ reading effort when they come across certain culturally alien and linguistically idiosyncratic Wuxia elements. For instance, when readers encounter a paragraph which contains many Wuxia-specific words (Ryder Han, Ironheart Yang, etc.), a relatively short paragraph with a low APL value around 30 (see Table 11.4) is more likely to reduce their cognitive load as they process these Wuxia elements. All these unique stylistic features may motivate readers to rate the three translations favorably.

Second, in terms of the MFWs-/MFWSs-based stylistic panoramas, a greater use of words and word sequences related to body language, body parts, mood, or inner feelings could likewise lead to more favorable ratings. When we look through the parallel lists, body-language words, such as “sighed,” “pointed,” “nodded,” “shouted,” and the like, appear frequently in the three translations, and words related to mood, such as “worried,” “angry,” “surprised,” “scared,” and so on, are also widely used in the same translations.
In addition, 2- and/or 3-grams about body parts, such as “his neck,” “his chest,” “head and arms,” and “in his hand,” as well as ones for inner feelings, such as “dared to,” “refused to,” “had no idea,” and “he wondered about,” create vivid characterizations. This is because a preservation of the original descriptions of body language in the Wuxia translations may shorten the psychological distance between target readers and the reconstructed Wuxia heroes/heroines, who are “alive” with perceptible human kinetic and/or mental presentations.

On the theoretical side, the study casts light on the use of stylistic panoramas in stylometric analyses in the following ways. First, the study attaches importance to the intended function of the chosen stylistic indices when using them as building blocks of a stylistic panorama. The seven formal indices are selected to show the general stylistic features of the heroic works, while the purpose in analyzing the MFWs and MFWSs is to identify the lexical resources in the same works. This could be important to a stylometric study because it binds indices together through a shared function. Nonetheless, the selection criteria of stylistic indices in some previous studies (Hossain et al. 2017; Liu and Xiao 2020) are not always made clear to readers. As a result, possible functions associated with those indices are often underexplored, which could lead to a tenuous connection between the selected indices.

Second, the study values triangulation of different types of stylistic panoramas when exploring holistic stylistic pictures of (translated) texts under investigation. For example, when the study concludes that the Wuxia translations are stylistically different from the chivalric stories and heroic fantasies, it has done so by triangulating the results stemming from the stylistic panoramas based on the seven formal indices and ones built by the MFWs and MFWSs.
In this way, the study not only takes multiple functionally related stylistic features into account but also locates the stylistic pictures of the same genre from different stylistic perspectives. By contrast, some existing stylometric analyses of (translated) texts have confined their stylistic explorations to formal indices (Hossain et al. 2017; Liu and Xiao 2020) or MFWs/MFWSs (Eder 2017; Haverals et al. 2022) without attempting to triangulate the results from both sides. Consequently, extra stylistic pictures stemming from such triangulation are sometimes ignored in those analyses. In this light, we hold that such triangulation of stylistic panoramas may benefit stylometric analyses, as it helps the analyses transcend a single stylistic perspective by bringing multiple stylistic perspectives into play.

Third, the study recognizes a possible weakness of stylistic panoramas in stylometric studies that highlight linguistic characteristics at a single level. This is especially evident when we attempt to use both types of stylistic panoramas. Despite the edge offered through triangulation, it would be less appropriate to use them simultaneously in studies dedicated to such single-level characteristics as words, word sequences, or sentences, since the scope of such investigations would be too narrow. Nevertheless, as the concept of stylistic panoramas is now in its infancy, it still has room for further development to satisfy the theoretical and methodological needs of varied stylometric studies.

11.6 Conclusion

Returning to the original research interest, the study can now give a clear answer: stylistic connections between translated Chinese Wuxia and Western heroic literature are weak because the stylistic panoramas founded on the formal indices and the MFWs/MFWSs have demonstrated important stylistic differences across the genres.
Despite these divergences, the study has made the following contributions to Wuxia translation research and stylometric studies. First, it highlights possible stylistic connections between heroic literature in the East and that in the West, clues that may help explain the reception of Chinese Wuxia in the West. Second, it demonstrates the use of the stylistic panorama, a concept that seeks to describe the stylistic picture of a (translated) text in a relatively holistic way by binding different stylistic indices together with respect to function.

Nonetheless, this study has several limitations, one of which is that the stylometric analyses are founded on a relatively small number of Wuxia translations. Even though the study has included all the English Wuxia translations published at the time of writing, it is assumed that when there are more Wuxia translations in the future, the results might be slightly different due to various translatorial (translators’ motivations, preferences, etc.) and/or extratranslatorial (patronage intervention, sociocultural influences, etc.) reasons. Furthermore, the present selection of stylistic indices in the formation of panoramas has considered only general features, whereas an alternative selection that favors more idiosyncratic features (hapax legomena related to martial arts, chivalry, fantasies, etc.) pertaining to the heroic literature could be equally potent in uncovering stylistic connections between heroic literature in the East and that in the West. For further research along this line, the publication of additional Wuxia translations could allow works by different authors to be incorporated into the corpus to produce more insightful results. In the meantime, stylistic indices that focus on various idiosyncratic features of heroic literature can be considered to widen the scope of meaningful research.
Funding This work was supported by the Research Grants Council of the Hong Kong Special Administrative Region, China [PolyU/RGC 15602621]. Notes 1 The article was originally published on April 23, 2022, in Digital Scholarship in the Humanities (DSH), DOI: 10.1093/llc/fqac019. It is reused under license 5304200620907 permitted by Oxford University Press. Credit goes to DSH, Oxford University Press, the European Association for Digital Humanities (EADH), and Alliance of Digital Humanities Organizations (ADHO). 2 All Wuxia translations used in the study were purchased from Amazon.com as e-books. 3 All the Western heroic works used in the study were available in the public domain and were downloaded freely from Gutenberg.org as “txt” files. 4 O is the minimum sample size, and k is the number of variables. 5 Full tables are available in Figshare (DOI: 10.6084/m9.figshare.19361468). 6 Because PCA is a multivariate statistical analysis that operates according to dimensionality reduction (Manly 2016, 102–3), multiple dimensions in the analysis were compressed into two dimensions – a more manageable scale for the present work. Translated Chinese Wuxia Fiction and Western Heroic Literature 211 References Březina, Vaclav. 2018. Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press. Brocardo, Marcelo Luiz, Issa Traore, and Isaac Woungang. 2014. “Toward a Framework for Continuous Authentication using Stylometry.” 2014 IEEE 28th International Conference on Advanced Information Networking and Applications, Victoria, BC, Canada, 106–15. Broder, Andrei Z., Steven C. Glassman, Mark S. Manasse, and Geoffrey Zweig. 1997. “Syntactic Clustering of the Web.” Computer Networks and ISDN Systems 29, no. 8: 1157–66. Burrows, John. 2002. “ ‘Delta’: A Measure of Stylistic Difference and a Guide to Likely Authorship.” Literary and Linguistic Computing 17, no. 3: 267–87. Christopher, Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008. 
Introduction to Information Retrieval. Cambridge: Cambridge University Press. Covington, Michael, and Joe D. McFall. 2010. “Cutting the Gordian Knot: The MovingAverage Type – Token Ratio MATTR.” Journal of Quantitative Linguistics 17, no. 2: 94–100. Craig, Hugh. 2018. Intelligent Archive 3.0. Newcastle: University of Newcastle. Daelemans, Walter. 2013. “Explanation in Computational Stylometry.” In Computational Linguistics and Intelligent Text Processing, edited by Alexander Gelbukh, 451–62. Berlin: Springer. Eder, Maciej. 2017. “Visualization in Stylometry: Cluster Analysis Using Networks.” Digital Scholarship in the Humanities 32, no. 1: 50–64. Eder, Maciej, Jan Rybicki, and Mike Kestemont. 2016. “Stylometry with R: A Package for Computational Text Analysis.” The R Journal 8, no. 1: 107–21. Flannery, Mary. 2012. “The Concept of Shame in Late‐Medieval English Literature.” Literature Compass 9, no. 2: 166–82. Formann, Anton. K. 1984. Die Latent-class-analyse: Einführung in Theorie und Anwendung. Weinheim: Beltz. Grabowski, Łukasz. 2013. “Interfacing Corpus Linguistics and Computational Stylistics: Translation Universals in Translational Literary.” International Journal of Corpus Linguistics 18, no. 2: 254–80. Haverals, Wouter, Lindsey Geybels, and Vanessa Joosen. 2022. “A Style for Every Age: A Stylometric Inquiry into Crosswriters for Children, Adolescents and Adults.” Language and Literature 31, no. 1: 62–84. Holmes, David I. 1998. “The Evolution of Stylometry in Humanities Scholarship.” Literary and Linguistic Computing 13, no. 3: 111–17. Honegger, Thomas. 2010. “Heroic Fantasy and the Middle Ages – Strange Bedfellows or An Ideal Cast?” Itinéraires: Littérature, Textes, Cultures, no. 3: 61–71. Hossain, M. Tahmid, Md. Moshiur Rahman, Sabir Ismail, and Md Saiful Islam. 2017. “A Stylometric Analysis on Bengali Literature for Authorship Attribution.” In ICCIT 2017: 20th International Conference on Computer and Information Technology, 1–5. Dhaka, Bangladesh: IEEE. 
Hou, Renkui, and Churen Huang. 2020. “Robust Stylometric Analysis and Author Attribution Based on Tones and Rimes.” Natural Language Engineering 26, no. 1: 49–71.
Huang, Yonglin. 2018. Narrative of Chinese and Western Popular Fiction. Berlin: Springer.
Jones, Ewan, and Paul Nulty. 2019. “Quantitative Measures of Lexical Complexity in Modern Prose Fiction.” Digital Scholarship in the Humanities 34, no. 4: 914–37.
Kettunen, Kimmo. 2014. “Can Type-Token Ratio Be Used to Show Morphological Complexity of Languages?” Journal of Quantitative Linguistics 21, no. 3: 223–45.
Keulemans, Paize. 2020. Sound Rising from the Paper: Nineteenth-Century Martial Arts Fiction and the Chinese Acoustic Imagination. Leiden: Brill.
Laviosa, Sara. 2002. Corpus-based Translation Studies: Theory, Findings, Applications. Amsterdam: Rodopi.
Liu, Ying, and Tianjiu Xiao. 2020. “A Stylistic Analysis for Gu Long’s Kung Fu Novels.” Journal of Quantitative Linguistics 27, no. 2: 32–61.
Manly, Bryan F. J. 2016. Multivariate Statistical Methods: A Primer (4th ed.). London: Chapman and Hall.
Mastropierro, Lorenzo. 2018. Corpus Stylistics in Heart of Darkness and Its Italian Translations. London: Bloomsbury Publishing.
Mei, Jia. 2019. “Turning Action into Words.” China Daily, April 19. http://global.chinadaily.com.cn/a/201904/19/WS5cb91987a3104842260b70d3_2.html. Accessed 28 February 2021.
Melka, Tomi S., and Michal Místecký. 2020. “On Stylometric Features of H. Beam Piper’s Omnilingual.” Journal of Quantitative Linguistics 2, no. 7: 204–43.
Rong, Zheng, Li Jiexun, and Chen Hsinchun. 2006. “A Framework for Authorship Identification of Online Messages: Writing-Style Features and Classification Techniques.” Journal of the American Society for Information Science and Technology 57, no. 3: 378–93.
Rybicki, Jan. 2012. “The Great Mystery of the Almost Invisible Translator.” In Quantitative Methods in Corpus-Based Translation Studies, edited by M. P. Oakes and M. Ji, 231–48. Amsterdam: John Benjamins.
Saslow, Elliott. 2018. “Unsupervised Machine Learning.” Towards Data Science. https://towardsdatascience.com/unsupervised-machine-learning-9329c97d6d9f. Accessed 28 February 2021.
Scott, Mike. 2012. WordSmith Tools version 6, Stroud: Lexical Analysis Software. https://www.lexically.net/publications/citing_wordsmith.htm.
Stanford NLP Group. 2021. “Stanford Tagger 4.2.0.” https://nlp.stanford.edu/software/tagger.html. Accessed 28 February 2021.
Tognini-Bonelli, Elena. 2001. Corpus Linguistics at Work. Amsterdam: John Benjamins.
Valencia, Alex I., Helena Gómez-Adorno, Christophe Rhodes, and Gibran Fuentes Pineda. 2019. “Bots and Gender Identification Based on Stylometry of Tweet Minimal Structure and N-Grams Model.” In Working Notes of CLEF 2019 – Conference and Labs of the Evaluation Forum, 1–8. Lugano: CLEF.
Vander Elst, Stefan. 2017. The Knight, the Cross, and the Song: Crusade Propaganda and Chivalric Literature, 1100–1400. Philadelphia, PA: University of Pennsylvania Press.
Wu, Kan, and Dechao Li. 2018. “Lexical Normalization in English Translations of Jin Yong’s Martial Arts Fiction: A Corpus-Based Study.” In Asia Pacific Interdisciplinary Translation Studies, edited by X. Luo, 93–106. Beijing: Tsinghua University Press.

12 Translating Personal Reference
A Corpus-Based Study of the English Translation of Legends of the Condor Heroes
Jing Fang and Shiwei Fu

12.1 Introduction
Louis Cha Leung-yung, better known by his pen name, Jin Yong, was the most famous martial arts novelist in China. One of his most popular novels, Legends of the Condor Heroes (射雕英雄傳, henceforth LoCH), first published in 1957, has been read by Chinese communities worldwide for more than six decades. More recently, a four-volume English translation of the novel, published between 2018 and 2021, has sparked a new wave of enthusiasm in the English-speaking world, attracting more than 5,000 ratings and almost 1,000 comments on Goodreads.
The English translation of the novel has also received attention from scholars in translation studies. Readers’ reviews have been analyzed (e.g., Zhang and Wang 2020), and intratextual and extratextual translation factors have been explored in terms of readers’ reception (e.g., Chen and Dai 2021; Xu and Chang 2020). However, these studies have paid little attention to character development in the translation, even though LoCH is known for its three-dimensional characters. For example, when the main protagonist of the novel, Guo Jing, refers to himself in conversation with other characters, Louis Cha uses not only the first-person pronoun but also third-person nominal groups (NGs) to portray Guo Jing’s humbleness, a key personality trait of the character that is widely recognized by readers of the Chinese source text (ST). In English, however, the lexicogrammatical choices for realizing these personal references are limited. This disparity between the two languages poses a challenge in translating these referential meanings, which in turn may affect the target text (TT) readers’ understanding of the character, as personal reference in the speech situation is recognized as an important linguistic marker that reflects a character’s personality and his perception of social relations (Ireland and Mehl 2014). Interestingly, our review of the comments made by readers of the translated LoCH on Goodreads (www.goodreads.com/) suggests that the English translation has successfully portrayed the humbleness of the protagonist Guo Jing, with 22% of the reviewers specifically mentioning “humble” and “modest” when commenting on Guo Jing’s personality. It is therefore worth exploring how the translator managed to adequately render the character’s humbleness in the TT through the translation of personal reference in conversations.
DOI: 10.4324/9781003298328-13
Adopting a corpus-based approach, we will investigate and compare the lexicogrammatical choices used in the ST and the TT to realize personal reference, examining how an equally humble character is developed in the translation.

12.2 Character Development in Literary Translation
It is necessary to first define the scope of the study. We believe that fictional characters are imitations of real people. As Mead (1990) points out, fictional characters can be recognized or appreciated in the same way as people in real life, and they should not be dehumanized as a purely textual existence. In analyzing the personality traits of real people, it has been found that language is a valid and reliable mode of measuring and understanding personality (Caplan et al. 2020). By the same token, it is reasonable to adopt a linguistic approach to fictional characters by examining the language these characters use. In great novels, the depiction of a character’s actions and manners of thought makes the character who he is, and through it a sense of relatively stable or abiding personal quality is achieved (Chatman 1978). Likewise, such character development in the source text (ST) can be reproduced in translation, which concerns the recreation of meaning in context through choice, including both the choice in the interpretation of the original text and the choice in the creation of the translated text (Ma and Wang 2020). In other words, a good translator is able to depict a character equally effectively for the TT readers by reconstructing elements such as the character’s traits, emotions, instincts, and relationships through lexicogrammatical choices.
In fact, some research has been conducted on the reshaping of characters in literary translation, focusing on various linguistic features, such as transitivity shifts (e.g., Barbosa de Vasconcellos 1998; Lee 2018), personal pronouns (e.g., Bosseaux 2006; Lin 2020), speech verbs (e.g., Ruano 2017; Mastropierro 2020), phonological features in dialectal representations (e.g., Rodríguez Herrera 2015), and grammatical and lexical choices of geolects (e.g., Alsúa and Edurne 2018). Among these linguistics-based studies of character-building in translation, the work of Bosseaux (2006) and Lin (2020) is particularly relevant to the current study because both examine the speaker’s usage of personal reference in relation to character development, which is also the focus of the current chapter. In both studies, pronouns were counted as an indicator of characterization. Lin (2020), for example, compared the frequency of the first-person pronoun I across different translations and drew conclusions about the speaker’s subjectivity. However, the data collected in Lin’s study were insufficient to support these conclusions, as no ST data were included in the analysis, and the TT data were analyzed out of context. Bosseaux (2006) argued that, due to the ambiguity in the use of the second-person pronoun you in the ST, translators were obliged to explicitate the reference when you was used in the TT. However, it is not clear whether the vagueness had been effectively eliminated through the explicitation of the reference, nor did the author provide any readers’ comments to support her argument. It seems that a more comprehensive project, one which involves a context-based analysis of both ST and TT and which takes into account the feedback of the TT readers, is needed to provide an in-depth examination of character-building in literary translation.
Against this background, the current chapter uses the translation of LoCH as a case to explore this topic. In particular, using personal reference as the operational variable, we will examine how the translator of LoCH managed to effectively re-establish the humbleness of the protagonist, as evidenced by the TT readers’ reviews.

12.3 Methodology
Our analysis of the translation of LoCH will focus on Guo Jing (郭靖), the main protagonist in the novel, and we will investigate how the character makes reference to himself and to his listeners in conversations, through which his personal trait of humbleness is portrayed. It is therefore necessary to briefly introduce this key character. In the novel, Guo Jing is portrayed as a hero of great integrity living in the Song Dynasty (960–1279) who is extremely loyal to his country, respectful and humble to his shifu (masters who teach him martial arts), affectionate to his lover, Lotus (aka Huang Rong), and ruthless to enemies who invade his country. In the society of his time, as a young man who has only just started his adventure in the martial arts world, Guo Jing is considered junior to many characters in the novel, and he is fully aware of his inferior status, which explains his humbleness as portrayed in the book. Guo Jing understandably shows his humbleness and modesty mainly in front of people who are in a cordial relationship with him, such as his teachers and friends. Therefore, our data collection will focus on his conversations with this group of people, and his conversations with his enemies will not be included.

12.3.1 Text Collection and Corpus Building
A parallel corpus was established for this research, with the Chinese ST data collected from Legends of the Condor Heroes (Jin Yong 1980/2002) and the English TT data from the translation by Anna Holmwood, Gigi Chang, and Shelly Bryant published by MacLehose Press.
The ST was written in vernacular Mandarin Chinese with classical Chinese stylistic features, providing a rich repository of personal reference choices. After the data collection, we used CorpusWordParser 3.0.0.0 (Xiao 2014) to segment the Chinese ST into sentences, and used “郭靖 (Guo Jing)” as the search term in AntConc 3.2.0 (Anthony 2006) to extract his directly projected locutions, that is, the direct conversations that Guo Jing engages in with other characters. We then manually removed the conversations between Guo Jing and his enemies, as these data are not the focus of the current project. The remaining conversations and their English translations form the corpus for analysis, in which all the instances of Guo Jing’s self-reference (i.e., in referring to himself as the speaker) and his reference to his addressees were tagged, including first- and second-person pronouns and NGs that function as reference items to either the speaker Guo Jing or the addressee that Guo Jing talks to. Altogether, the parallel corpus consists of 20,927 tokens in the ST and 24,114 tokens in the TT. Observation and analysis of the data were carried out in a textual environment through concordance.

12.3.2 Analytical Framework
12.3.2.1 Personal Reference and Its Realization in Lexicogrammar
As we will focus on the handling of personal reference in the translation, it is necessary to introduce the term based on the definition given by Halliday and Hasan (1976). According to Halliday and Hasan (1976), personal reference is reference by means of function in the speech situation, through which readers can identify the speech roles involved in a conversation. These speech roles can generally be realized by pronouns: in English, for example, I is used to refer to the speaker, we to the speaker plus others, and you to the addressee(s).
In addition, connotations of social distance and social hierarchy are embedded in the use of personal reference, indicating politeness and constructing social identity and interpersonal relations (Wales 1996; Brown and Gilman 2012). In English, NGs are typically used to refer to the third person, a non-interactant who is not involved in a conversation. NGs are rarely used as speaker-reference in English, where such usage tends to be interpreted as baby talk or liturgy (Wales 1996, 57). However, the use of a third-person NG for speaker-reference is not uncommon in the vernacular Mandarin Chinese spoken in Guo Jing’s time, when China was under the imperial Song dynasty. In Chinese culture, the notion of “self” is positioned in a heavily relational model involving intricate social networks based on age and gender, family ties, affinity, and social hierarchy (Feng 1961). Personal reference terms in Chinese are therefore closely calibrated to social distance and hierarchy, and even more so in historical periods, when social ranks were much stricter than in the modern age. Third-person NGs were often used as a politeness strategy to refer to the speaker or to the addressee in Guo Jing’s time, indicating the social relationship between the speaker and the addressee. The use of third-person NGs is therefore an important marker of humbleness, as showing humility is a key politeness strategy in Chinese culture (Huang 2008). Based on the data observation, the reference items used by Guo Jing in the ST in referring to the two speech roles (i.e., speaker and addressee) are summarized in Figure 12.1. As shown in Figure 12.1, both pronouns and third-person NGs are used in the Chinese text in realizing the reference to the speaker himself and to the addressees, and these choices carry significant implications for the social relationship between Guo Jing and his addressees.
Figure 12.1 Lexicogrammatical realization of speech roles in LoCH (ST).

12.3.2.2 Speech Roles and Their Social Status
From a social perspective, speech roles, such as “speaker” and “addressee” in a speech situation, can generally be viewed along two dimensions underlying social relationships: social status and interpersonal distance. This categorization echoes the “power and solidarity” of Brown and Gilman (2012, 252), with hierarchical status implying either an equal or an unequal (therefore “hierarchical”) relationship, and interpersonal distance implying either a cordial or a hostile relationship. In LoCH, a hierarchical relationship exists in both Guo Jing’s cordial circles (such as with his teachers and his friends) and his distant circles (such as with his enemies), implying a sophisticated social network in which the main protagonist functions. However, as we will only focus on the conversations between Guo Jing and characters who are in a cordial relationship with him, the variable “social distance” will be excluded from the data analysis. As a result, “social status” becomes the only variable for analysis when we interpret different speech roles in a social context. In terms of social status, we have found that characters who are in conversation with Guo Jing are either in an equal or a hierarchical relationship with the protagonist. The characters who are of equal social status to Guo Jing are generally fellow youths who are junior in the martial arts society. In comparison, the characters who are of hierarchical social status are generally senior in age and/or hold senior positions in a martial arts organization. It should also be pointed out that all the characters listed in the “hierarchical” group are in fact of a higher social status than Guo Jing in the novel.
This is probably because, in the novel, Guo Jing is described as a young man, raised by a working-class single mother, who has only just started his martial arts career. So it is not surprising that many characters in the book, except those who are as young as Guo Jing, have a higher social status, as they are senior in age and/or in martial arts status.

12.3.2.3 Translation Equivalence in Lexicogrammar
Once the data were analyzed in terms of personal reference (as speaker-reference or as addressee-reference) and in terms of their lexicogrammatical realization (as pronouns or as third-person NGs), we compared the lexicogrammatical choices in realizing these referential meanings between ST and TT. Based on the comparison, translation choices in realizing personal reference were categorized as either “equivalence” or “non-equivalence.” A translation is labelled as “equivalence” when the same type of lexicogrammatical choice is made in the TT in achieving the referential meaning (as either the speaker or the addressee). For example, when a third-person NG is used in the ST to refer to the speaker Guo Jing and is realized as a third-person NG referring to the speaker Guo Jing in the TT, this translation is counted as an equivalence. In comparison, a translation is labelled as “non-equivalence” when a different lexicogrammatical strategy is used to achieve the referential meaning in the TT. For example, when a third-person NG referring to the speaker in the ST is translated as the first-person pronoun I in the TT, this translation is labelled as a non-equivalence. It is important to clarify that, in this chapter, the category of “non-equivalence” should not be interpreted as a case where semantic meaning is not equal between the ST and the TT.
Rather, by “non-equivalence” we aim to focus on the different lexicogrammatical choices made by the translators in achieving the same (or equal) referential meanings in semantics. In other words, a semantically equivalent translation of the personal reference may be viewed as a case of “non-equivalence” when the TT choice in lexicogrammar is different from that of the ST. Following the analysis of the parallel data, statistical analysis was conducted. The statistical analysis aims to examine whether the translation of personal reference is associated with the social status of the speech roles. As pointed out earlier, readers’ comments indicate that the TT managed to effectively portray Guo Jing’s humbleness despite the interlingual differences between Chinese and English in realizing these referential meanings. By examining the relationship between translation equivalence and the social status of the speaker and his addressee, we try to find possible explanations for such an effective rendition of Guo Jing’s humbleness in the TT, as we assume that the translator was probably highly sensitive to the social status of different characters and made conscious translation choices on that basis to reflect the social relations between these characters.

12.4 Findings and Discussions
12.4.1 Personal Reference: The Speaker and the Addressee
Table 12.1 presents all the lexicogrammatical choices in achieving personal reference in both the ST and the TT with a speech role either as the speaker or the addressee in Guo Jing’s utterances.
Table 12.1 Personal References Used by Guo Jing in LoCH

Social status: Equal
Characters: 黃蓉 [Lotus Huang]; 楊康 [Yang Kang]; 周伯通 [Zhou Botong the Hoary Urchin]; 穆念慈 [Mercy Mu]; 華箏 [Khojin]; 陸乘風 [Zephyr Lu]; 傻姑 [Silly]; 魯有腳 [Surefoot Lu]
ST, speaker (= Guo Jing): 我 [I]; 兄弟 [your brother]; 小弟 [your little brother]; 在下 [this insignificant person]; 小可 [this worthless person]
ST, addressee (= you): 你 [you]; 您 [you]; 兄長 [my elder brother]; 兄弟 [my brother]; 大哥 [big brother]; 兄 [mate]; 哥 [bro]; 賢弟 [my good brother]; 世妹 [younger sister]; 妹子 [sis]; 姑娘 [Miss]; 先生 [Mister]; 莊主 [lord of the town]
TT, speaker (= Guo Jing): I
TT, addressee (= you): you, brother, my brother, sister, master

Social status: Hierarchical
Characters: 江南七怪 [The Seven Freaks of the South]; 洪七公 [Count Seven Hong]; 丘處機 [Qiu Chuji]; 王處一 [Wang Chuyi]; 馬鈺 [Ma Yu]; 黃藥師 [Apothecary Huang]; 成吉思汗 [Genghis Khan]; 李萍 [Lily Li]; 梅超風 [Cyclone Mei]
ST, speaker (= Guo Jing): 我 [I]; 晚輩 [the junior]; 在下 [this insignificant person]; 弟子 [your student]; 徒兒 [your disciple]; 小人 [this worthless person]; 孩兒 [this son]
ST, addressee (= you): 你 [you]; 您 [you]; 師父 [master]; 恩師 [my respected teacher]; 道長 [reverence]; 前輩 [sir]; 您老 [elder]/您老人家 [the elder]; 老丈 [the senior]; 大叔 [uncle]; 師伯 [uncle]; 島主 [lord of the island]; 岳父 [father in law]/岳父爹爹 [dad in law]; 大汗 [the Khan]; 長老 [elder]
TT, speaker (= Guo Jing): I, me, we, your student, your disciple
TT, addressee (= you): shifu, his reverence, elder/the elder, sir, uncle, master, the Khan/the Great Khan

As illustrated in Table 12.1, in both ST and TT, Guo Jing’s self-reference items can be divided into two categories in terms of lexicogrammatical choices: the first-person pronoun and third-person NGs. The use of third-person NGs in referring to himself has important pragmatic implications reflecting his understanding of the social relationship with his addressee(s).
For example, from the self-effacing references “小可” (this worthless person) and “孩兒” (this child), Chinese readers can easily tell that Guo Jing is modest, owing to his high sensitivity to the Confucian social rites which value the proper order between older and younger. In terms of reference to the addressees, the choices generally fall into two categories: the second-person pronoun and third-person NGs. In the ST, these choices are highly indicative of different interpersonal relations. For example, both “世妹” (younger sister, a formal term to address a female friend who is younger than the speaker) and “妹子” (sis, a colloquial term used to address a close female friend who is younger than the speaker) can be used to refer to a female friend, but the situational contexts in which they are used are not the same, thus creating different pragmatic indications. “妹子” (sis) indicates a closer relation between the addressee and the speaker Guo Jing than “世妹” (younger sister) does. Guo Jing’s choice of these referential items manifests his sensitivity to the level of rapport, demonstrating the idea of “orderly propriety” that is deeply rooted in his character. As Table 12.1 shows, in terms of cordial relations, choices of third-person NGs are more diverse in the Chinese ST, with 37 different NGs, including 10 for the speaker-reference and 27 for the addressee-reference. In comparison, this variety diminishes in the TT, with only 11 different NGs, including 4 for the speaker-reference and 7 for the addressee-reference. It seems that many NGs used as speaker-reference in the ST are translated as the first-person pronoun I in the TT. To further explore the situation, a quantitative approach is adopted in analyzing the translation of these personal reference items.
12.4.2 Analyzing the Translation of Personal Reference
As explained earlier, third-person NGs are rarely used in English to refer to an interactant speech role, either as the speaker or as the addressee. Meanwhile, the use of these NGs in the Chinese ST seems to be significant in constructing and reflecting the social relations between the speaker and the addressee. Ignoring these interlingual differences in translation would be expected to affect the reshaping of the character in the TT. Interestingly, TT readers’ reviews indicate that the English translation has effectively portrayed Guo Jing’s personal trait of humbleness. Therefore, it is worth exploring how the translators managed to achieve this despite the challenges posed by the interlingual differences in realizing personal reference. To explore this question, a chi-square test of independence was adopted, with “social status” (either “equal” or “hierarchical”) as the categorical independent variable and “translation equivalence in lexicogrammar” (either “equivalence” or “non-equivalence”) as the categorical dependent variable. The test aims to check the hypothesis that the lexicogrammatical choices made in the translation to realize the speaker-reference and the addressee-reference are associated with social status. In other words, we assume that the social status of Guo Jing and his addressee is a factor that the translator took into account when translating the personal reference, which ensured an effective rendition.

12.4.2.1 Translating Speaker-Reference
Table 12.2 presents the statistical results of translation equivalence in the case of translating speaker-reference (i.e., the way Guo Jing refers to himself in a conversation).
As shown in Table 12.2, when his addressee has an equal social status to Guo Jing, the translator is more likely to use an equivalent lexicogrammatical choice in achieving the speaker-reference (72.6%). However, when the addressee is in a hierarchical relationship with Guo Jing, a much lower percentage of equivalence is found in the translation (46.7%). Overall, regardless of social status, the translators make more equivalent choices in lexicogrammar (62.1%) than non-equivalent choices (37.9%). The relationship between the two variables is further explored by a chi-square test (Table 12.3).

Table 12.2 Translation Equivalence by Social Status (Speaker-Reference)
Translation Equivalence     Social Status
in Lexicogrammar            Equal            Hierarchical     Total
                            Count   %        Count   %        Count   %
Non-Equivalence             91      27.4%    121     53.3%    212     37.9%
Equivalence                 241     72.6%    106     46.7%    347     62.1%
Total                       332     100.0%   227     100.0%   559     100.0%

Table 12.3 Chi-Square Test Result of Translation Equivalence and Social Status (Speaker-Reference)
                      Value     df   Asymptotic Significance (2-Sided)   Exact Sig. (2-Sided)   Exact Sig. (1-Sided)
Pearson Chi-Square    38.399a   1    <.001
N of Valid Cases      559
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 86.09.
b. Computed only for a 2 × 2 table.

The result in Table 12.3 indicates that there is a highly significant relationship between the two variables, χ²(1, n = 559) = 38.399, p < .001. This means that the translators’ choices between using an equivalent and using a non-equivalent lexicogrammatical item to translate the speaker-reference are closely related to the social status of Guo Jing and his addressee. A closer look at the data shows that, when the speaker Guo Jing is communicating with someone of a higher status, it is more likely that a non-equivalent choice will be made in the TT in translating the speaker-reference, compared with the situation when he talks to someone of equal status. This tendency is possibly related to the interlingual differences in achieving the speaker-reference. We find that, in the case of hierarchical relations, the author in the ST uses many NGs to refer to the speaker himself, such as “晚輩” (the junior) and “在下” (the inferior), because, as mentioned above, Guo Jing is highly sensitive to orderly propriety in social status, and many characters in a cordial relationship with Guo Jing enjoy a higher social status, such as his teachers (shifu) and other senior martial arts masters. When Guo Jing talks to these characters, the use of third-person NGs in referring to himself as someone junior and/or inferior effectively depicts his modesty and humbleness. However, as it is not common to use such third-person NGs in English, it is not surprising that the translator had to resort to a non-equivalent choice in translating the speaker-reference. Based on the observation of the data, the non-equivalent translation of speaker-reference is generally realized in three ways: shifting the lexicogrammatical choice, omitting the translation of the speaker-reference, or compensating for the loss of humbleness by enhancing the sense somewhere else in the sentence. Examples of each of these non-equivalent types are presented in the following:

(1) Shift in lexicogrammatical choice.
弟子做錯了事, 但憑六師父責罰。[ST]
The pupil did it wrong, (who) is ready to accept any punishment from the Sixth Shifu. [literal translation]
I have been foolish, I will accept my Sixth Shifu’s punishment. [TT]

(2) Omission.
弟子 不 壞事,又沒荒廢了學武,因此沒稟告恩師。[ST]
The pupil thought this was not something bad and did no harm to the learning of martial arts, so [I] didn’t report it to [my] obliging teacher. [literal translation]
It didn’t seem to be doing any harm to my training. [TT]

(3) Compensation.
小可不解,請先生指教。[ST]
This worthless person does not understand. Please enlighten [me], sir. [literal translation]
May I humbly apologize? Please enlighten us. [TT]

In example 1, Guo Jing is talking to his Sixth Shifu (teacher), to whom he admits his mistake. In the ST, a third-person NG, 弟子 (the pupil), is used to refer to himself, and another third-person NG, 六師父 (Sixth Shifu), is used to refer to the addressee. In the TT, the translator has made a shift by using the first-person pronoun “I” to refer to the speaker, though an equivalent choice is made in reference to the addressee. There are at least two possible reasons that could explain the shift from a third-person NG in the ST to the first-person pronoun in the TT. Firstly, the use of a third-person NG in referring to the speaker is rare in English; thus, a translation equivalent in lexicogrammar would sound very awkward to the TT readers. By shifting to the first-person pronoun “I,” the speaker-reference becomes very natural to the TT readers. Secondly, again because the use of third-person NGs is uncommon in referring to the speaker, an equivalent translation may confuse English readers, as they may not be able to immediately figure out who “the pupil” is. By shifting to a highly unmarked choice in the translation, the referential meaning becomes less ambiguous to the TT readers. This explanation echoes the findings of Bosseaux (2006), who argues that sometimes translators are obliged to explicitate the references in order to reduce ambiguity. In example 2, Guo Jing is talking to another shifu (teacher). In this instance, the third-person NG, 弟子 (the pupil), is used again to refer to himself, and another third-person NG, 恩師 (my obliging teacher), is used to refer to the addressee. In the translation, however, both the speaker-reference and the addressee-reference are omitted, and the translator instead provides a summary of the main points in Guo Jing’s utterance.
A possible explanation for the translation omission is that perhaps the translator considers the referential meanings in the ST as trivial, and a summary translation of the main idea would suffice, especially when the speaker-reference choice in the ST does not have an immediate equivalent in the TT. However, whether the translation omission should be considered justifiable is still debatable, as the omission of the referential meanings of “I” and “you” could make the conversation less interactive and lead to the loss of important linguistic clues of interpersonal meanings that could be significant in portraying Guo Jing’s humbleness. In example 3, Guo Jing is talking to an educated stranger who obviously looks older than him. In this situation, the addressee has a higher status because age is an important factor in deciding the social status in the Chinese culture, especially in Guo Jing’s time. In the ST, again, third-person NGs are used in reference to the speaker, 小可 (this worthless person), and to the addressee, 先生 (sir). It would be very awkward if an equivalent NG is used in English as a speaker-reference. The translator chose to use the first-person pronoun “I” to refer to the speaker, and in order to prevent the loss of humbleness in Guo Jing’s utterance, an adverb, “humbly,” was added to the translation to compensate for the loss of the sense in the self-reference. From the examples, we can see that, in translating the speaker-reference in a hierarchical relationship, the translator may make non-equivalent choices to avoid awkwardness and ambiguity in the TT while trying to find ways to still convey the character’s humbleness. Such choices can be very effective, as demonstrated by example 3. 
However, as demonstrated in example 2, such attempts may not always be viewed as appropriate, as a non-equivalent choice such as omission may damage character development in the TT through the loss of the interpersonal implications carried by the omitted reference items.
12.4.2.2 Translating Addressee-Reference
Table 12.4 presents the statistical results of translation equivalence in the case of translating addressee-reference (i.e., the way Guo Jing refers to his audience in a conversation).

Table 12.4 Translation Equivalence by Social Status (Addressee-Reference)
Translation Equivalence    Social Status
in Lexicogrammar           Equal              Hierarchical       Total
                           Count    %         Count    %         Count    %
Non-Equivalence            151      38.7%     45       20.8%     196      32.3%
Equivalence                239      61.3%     171      79.2%     410      67.7%
Total                      390      100.0%    216      100.0%    606      100.0%

Table 12.5 Chi-Square Test Result of Translation Equivalence and Social Status (Addressee-Reference)
                      Value      df    Asymptotic Significance (2-Sided)    Exact Sig. (2-Sided)    Exact Sig. (1-Sided)
Pearson Chi-Square    20.319a    1     <.001
N of Valid Cases      606
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 69.86.
b. Computed only for a 2 × 2 table.

The results in Table 12.4 show that, in a hierarchical relationship, the translator most often chose an equivalent lexicogrammatical item in the TT to refer to the addressee (79.2%), whereas the equivalence rate is noticeably lower when the addressee has the same status as Guo Jing (61.3%). Another chi-square test of independence was carried out to explore the relationship between the two variables (i.e., social status and translation equivalence) in translating the addressee-reference (see Table 12.5). As shown in Table 12.5, the test indicates a significant relationship between social status and translation equivalence, χ² (1, n = 606) = 20.32, p < .001.
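The statistic reported in Table 12.5 can be checked directly from the counts in Table 12.4. The sketch below is an illustration only, not the authors' procedure (the layout of Table 12.5 suggests a statistics package was used); the function name `pearson_chi_square` is our own, and the closed-form p-value applies only to the one-degree-of-freedom case.

```python
import math

# Observed counts from Table 12.4.
# Rows: non-equivalence, equivalence; columns: equal status, hierarchical status.
observed = [[151, 45],
            [239, 171]]

def pearson_chi_square(table):
    """Return (chi2, df, expected) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count of a cell = row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

chi2, df, expected = pearson_chi_square(observed)

# With df = 1, the upper-tail probability has a closed form via erfc.
p_value = math.erfc(math.sqrt(chi2 / 2))
min_expected = min(min(row) for row in expected)

print(f"chi2({df}) = {chi2:.3f}, p = {p_value:.2g}, "
      f"minimum expected count = {min_expected:.2f}")
```

The minimum expected count computed this way corresponds to footnote a of Table 12.5, which is the usual check that the chi-square approximation is appropriate for the table.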
More specifically, when Guo Jing and his addressee are in a hierarchical relationship, the translator is more likely to use an equivalent lexicogrammatical item in the TT in reference to the addressee than when Guo Jing talks to someone of equal status. This tendency again suggests that the translator’s choices about addressee-reference are related to the social hierarchy implied in the context. Example 4 illustrates how an equivalent translation of the addressee-reference is achieved in the TT.
(4) Equivalent translation of the addressee-reference.
弟子與前輩輩分差著兩輩,倘若依了前輩之言,必定為人笑駡。[ST]
Your pupil is two generations the Elder’s junior. If [I] follow the Elder’s advice, [I] must be laughed at by people. [literal translation]
Your student is two generations the Master’s junior. If I do as the Elder instructs, I shall be laughed out of the martial world. [TT]
In example 4, Guo Jing talks to a martial arts master who is senior both in age and in martial arts level. In the ST, third-person NGs are used in reference to the speaker Guo Jing, 弟子 (your pupil), and to the addressee, 前輩 (the Elder). In the translation, except for the first instance, where an equivalent NG (“your pupil”) is used in reference to the speaker, a shift occurs in the next two instances of speaker-reference, where the pronoun “I” is used to refer to Guo Jing. In comparison, equivalent third-person NGs are used in both instances where Guo Jing makes reference to the addressee: first as “the Master,” and then as “the Elder.” The use of a third-person NG to refer to a second-person speech role is highly marked in English, which again indicates that the translator has probably taken social status into account when translating the personal reference. The uncommon lexicogrammatical items used as reference in the English text are probably related to the translator’s intention to portray Guo Jing’s humbleness in the TT.
When comparing the findings on addressee-reference with those on speaker-reference (see Section 12.4.2.1), we find that the translator’s strategies seem to differ between the two types of personal reference. More specifically, when Guo Jing and his addressee are in a hierarchical relationship, a much lower equivalence rate is found in the translation of the speaker-reference (46.7% in Table 12.2) than in the translation of the addressee-reference (79.2% in Table 12.4). A possible explanation for the difference is that, while trying to avoid awkwardness in the TT by making some non-equivalent translation choices, the translator also tries to compensate for the potential loss of interpersonal meanings originally embedded in the referential items in the ST, which are important indicators of the character’s personality. Such compensation may happen within a single sentence when, for example, the lost sense of humbleness is restored in another part of the sentence, as illustrated in example 3. In addition to this local compensation strategy, the translator also seems to have adopted another type of compensation which operates more broadly: in response to non-equivalence in translating one type of reference, the translator maintains equivalence in another type, which can still carry the interpersonal implications. In our case, the translator maintained a high level of equivalence in translating the addressee-reference, but a comparatively lower level of equivalence in translating the reference to the speaker Guo Jing.
It is possible to interpret the high equivalence achieved in translating the addressee-reference as a counterbalance that compensates for the non-equivalence (such as shifts and omissions) in translating the speaker-reference, since both types of reference typically co-occur in the same utterance and both can convey the speaker’s humbleness.
12.5 Conclusion
In this paper, we have explored the translation of personal reference, including the reference to the speaker and to the addressee, in the novel LoCH. Focusing on the data involving utterances of the main character Guo Jing, we have investigated how Guo Jing as the speaker makes reference to himself and to his audience, and how these referential meanings, which are important indicators of Guo Jing’s humbleness, are translated into English. As the conventional lexicogrammatical means of realizing personal reference differ greatly between Chinese and English, we have also explored how the translator managed to portray an equivalent humble image of Guo Jing, as reflected in the TT readers’ comments. Our findings show that, in both Chinese and English, personal pronouns, including first- and second-person pronouns, are used in referring to the speaker and to the addressee. However, many third-person NGs are also used in referring to the two speech roles in Chinese; these often carry implications of the hierarchical social relations between Guo Jing and his audience, through which his humbleness is effectively portrayed. In the English translation, the translator’s decision to maintain or not to maintain equivalence in lexicogrammatical choices is significantly associated with the social status of Guo Jing and his addressee. When the two parties are in a hierarchical relationship, the translator is more likely to make a non-equivalent choice, such as a shift or an omission, in translating the speaker-reference.
Meanwhile, the translator is likely to make an equivalent choice, such as maintaining the use of third-person NGs, in translating the addressee-reference. Our findings also indicate that, when making a non-equivalent choice, the translator tries to find other ways to compensate for the potential loss of meaning (in our case, the loss of the sense of humbleness). Such compensation may happen at the local sentence level or at the macrotextual level, where non-equivalence in the translation of self-reference is counterbalanced by equivalence in the translation of the addressee-reference. By exploring the translation of personal reference in LoCH, we hope to offer useful implications for literary translators, who constantly face challenges caused by intercultural and interlingual differences. The study is also expected to shed light on how an equivalent character can be developed in translation through informed choices in lexicogrammar when the two languages are culturally distant.
References
Alsúa, Goñi, and Miren Edurne. 2018. “Translating Characters: Eliza Doolittle Rendered into Spanish.” Special Issue, Estudios Irlandeses 13, no. 2: 103–19. https://doi.org/10.24162/EI.
Anthony, Laurence. 2006. “AntConc (Version 3.2.0) Computer Software.” www.laurenceanthony.net. Accessed 12 April 2022.
Barbosa de Vasconcellos, Maria L. 1998. “‘Araby’ and Meaning Production in the Source and Translated Texts: A Systemic Functional View of Translation Quality Assessment.” Cadernos de Tradução, no. 3: 215–54. https://doi.org/10.5007/%25x.
Bosseaux, Charlotte. 2006. “Who’s Afraid of Virginia’s You: A Corpus-based Study of the French Translations of The Waves.” Meta 51, no. 3: 599–610. https://doi.org/10.7202/013565ar.
Brown, Roger, and Albert Gilman. 2012. “The Pronouns of Power and Solidarity.” In Readings in the Sociology of Language, edited by Joshua A. Fishman, 252–75. Berlin: De Gruyter Mouton.
Caplan, Jennifer E., Kiki Adams, and Ryan L. Boyd. 2020. “Personality and Language.” In The Wiley Encyclopedia of Personality and Individual Differences, edited by Bernardo J. Carducci and Christopher S. Nave, 311–16. Hoboken: Wiley-Blackwell.
Chatman, Seymour Benjamin. 1978. Story and Discourse: Narrative Structure in Fiction and Film, Cornell Paperbacks. Ithaca, NY: Cornell University Press.
Chen, Lin, and Ruoyu Dai. 2021. “Translator’s Narrative Intervention in the English Translation of Jin Yong’s The Legend of Condor Heroes.” Perspectives 1–16. https://doi.org/10.1080/0907676X.2021.1974062.
Feng Youlan 馮友蘭. 1961. Zhongguo zhexueshi 中國哲學史 [History of Chinese Philosophy]. Beijing: Zhong Hua Book Company.
Halliday, Michael A. K., and Ruqaiya Hasan. 1976. Cohesion in English. London and New York: Routledge.
Huang, Yongliang. 2008. “Politeness Principle in Cross-Culture Communication.” English Language Teaching 1, no. 1: 96–101. https://eric.ed.gov/?id=EJ1082589.
Ireland, Molly E., and Matthias R. Mehl. 2014. “Natural Language Use as a Marker of Personality.” In The Oxford Handbook of Language and Social Psychology, edited by Thomas M. Holtgraves, 201–18. New York: Oxford University Press.
Jin Yong 金庸. 1980/2002. Shediao yingxiong zhuan 射雕英雄傳 [Legends of the Condor Heroes]. Guangzhou: Guangzhou Publishing House.
Lee, Sang B. 2018. “Shifts in Characterization in Literary Translation: Representation of the ‘I’-Protagonist of Yi Sang’s Wings.” Acta Koreana 21, no. 1: 283–307. https://doi.org/10.18399/acta.2018.21.1.011.
Lin, Deng. 2020. “A Corpus-Based Study on Character Image Shaping in English Translated Version of Kuang Ren Ri Ji.” In Proceedings of the 2nd International Conference on Literature, Art and Human Development (ICLAHD 2020), edited by Malini Ganapathy, et al., 344–53. Amsterdam and Paris: Atlantis Press. https://dx.doi.org/10.2991/assehr.k.201215.404.
Ma, Yuanyi, and Bo Wang. 2020. “Demystifying Translation as Recreation of Meaning Through Choice.” In Translating Tagore’s Stray Birds into Chinese, edited by Yuanyi Ma and Bo Wang, 15–30. London: Routledge.
Mastropierro, Lorenzo. 2020. “The Translation of Reporting Verbs in Italian: The Case of Harry Potter Series.” International Journal of Corpus Linguistics 25, no. 3: 241–69. https://doi.org/10.1075/ijcl.19124.mas.
Mead, Gerald. 1990. “The Representation of Fictional Character.” Style 24, no. 3: 440–52.
Rodríguez Herrera, José M. 2015. “The Adventures of Huckleberry Finn and Jim in China: A Case of What Corpus Pragmatics Can Do for the Translation of Dialect.” Digital Scholarship in the Humanities 32, no. 2: 385–97. https://doi.org/10.1093/llc/fqv058.
Ruano, Pablo. 2017. “Corpus Methodologies in Literary Translation Studies: An Analysis of Speech Verbs in Four Spanish Translations of Hard Times.” Meta 62, no. 1: 94–113. https://doi.org/10.7202/1040468ar.
Wales, Katie. 1996. Personal Pronouns in Present-Day English. Cambridge: Cambridge University Press.
Xiao, Hang. 2014. “CorpusWordParser (Version 3.0.0.0) Computer Software.” www.cncorpus.org. Accessed 12 April 2022.
Xu Xueying 徐雪英, and Gigi Chang 張菁. 2020. “Cong Jin Yong Shediao yingxiong zhuan yingyi kan Zhongguo wenhua zouxiang shijie” 從金庸《射雕英雄傳》英譯看中國文化如何走向世界 [How Chinese culture goes global from the translation of Jin Yong’s Legends of the Condor Heroes]. Zhejiang Academic Journal 浙江学刊, no. 3: 42–53.
Zhang Mi 張汨, and Wang Zhiwei 王志偉. 2020. “Jin Yong Shediao yingxiong zhuan zai yingyu shijie de jieshou yu pingjia” 金庸《射雕英雄傳》在英語世界的接受與評價 [The Reception and Evaluation of Jin Yong’s Legends of the Condor Heroes in the English World]. East Journal of Translation 東方翻譯, no. 5: 18–25.
13 Lexical Bundles in the Fictional Dialogues of Two Hongloumeng Translations
A Corpus-Assisted Approach
Kanglong Liu, Joyce Oiwun Cheung, and Riccardo Moratto
13.1 Introduction
Acclaimed¹ as one of China’s four great classical novels, Dream of the Red Chamber, or in Chinese, Hongloumeng (hereinafter HLM), has drawn attention from both literary and translation researchers for decades. The work is widely acknowledged as one of the greatest works of Chinese fiction, for it paints a vivid picture of aristocratic families against the broad social background of the Qing Dynasty (1644–1911). The first 80 chapters of this 120-chapter chronicle were composed by the Qing writer Cao Xueqin, and the Qing scholar Gao E completed the remaining 40 chapters after Cao’s death (Cao and Gao 1982). As a renowned Chinese literary work, the novel has been translated numerous times, providing scholars with a good source for comparative translation analysis. From 1979 to 2013, over 1,300 HLM research articles were published, the majority focusing on the English translations of this classic (Ran and Yang 2013). There are three full-length versions, namely, The Story of the Stone, translated by David Hawkes and his son-in-law, John Minford; A Dream of Red Mansions, by Xianyi Yang and his wife, Gladys Yang; and The Red Chamber Dream, by B. S. Bonsall. The Bonsall version has never been officially published but is archived in the University of Hong Kong Library (Bonsall 2004), whereas the first two versions have been read by many people across the globe. Hawkes translated the first 80 chapters, and Minford finished the remaining 40, which parallels the division of labor between the two HLM writers, Cao Xueqin and Gao E. Xianyi Yang, on the other hand, seems to have been the main translator of the Yangs’ HLM, while his wife, Gladys Yang, played an assisting role. As stated by their daughter Chi Yang (cited in Li et al.
2011, 163):
When he [Xianyi Yang] was translating at his top speed, he didn’t write, but simply rendered orally while my mother would type the translation on a typewriter. While she was typing the text, she also polished or edited it. So the translation was ready when all this was done.
DOI: 10.4324/9781003298328-14
Wang (2016) comments that Hawkes and Minford’s HLM translation is far more popular among the general reading public than the Yangs’ version. This difference in popularity has led to a number of studies exploring the linguistic differences between the two versions and the translation strategies employed by the respective translators. The advances in corpus-based translation studies initiated by Baker (1993) have provided an impetus for research on translation and translator style. According to Baker (2000, 244), “it is as impossible to produce a stretch of language in a totally impersonal way as it is to handle an object without leaving one’s fingerprints on it.” Thus, as in research on translation universals, researchers have made use of various language indicators, such as type-token ratio, sentence length, and lexical density, which are believed to reflect the translators’ “characteristic use of language and linguistic habits” (Baker 2000, 245), to examine how translators or translations differ. As far as HLM translations are concerned, researchers have compiled parallel and comparable corpora to examine how translations differ in a range of the aforementioned indicators. For example, previous research on HLM translations has found that Hawkes diverged from the Yangs in various stylistic features (Li et al. 2011; Liu 2008; Liu and Afzaal 2021). In particular, based on the first 15 chapters of the two translation versions, Li et al.
(2011) found that Hawkes’s version contained more tokens and used longer sentences than did the Yangs’, whereas the latter used a wider range of words, as reflected in a higher type-token ratio. Other linguistic indicators that have been used to study HLM translations include nominalization (Hou 2013), vocabulary richness (Fang and Liu 2015), and even idioms (Su 2021). Researchers have thus largely confined themselves to word-level indicators in approaching the style of HLM translations. As argued by Mastropierro (2018), lexical bundles (LBs), or key clusters, can serve as a reliable indicator of translator’s style, as they can reveal the translators’ idiosyncrasies beyond the use of individual words. Following Mastropierro, the current study uses lexical bundles as a linguistic indicator to examine the fictional dialogues of the first 80 HLM chapters as translated by David Hawkes and by Xianyi Yang and Gladys Yang.
13.2 Literature Review
13.2.1 Translation Style Research
In order to define “translation style” properly, we must first consider the definition of style in literary studies. Crystal (1999, 323) stated that style is “any situationally distinctive use of language, and of the choices made by individuals and social groups in their use of language.” Leech and Short (1981) proposed four main categories for style analysis in literary works: lexical categories, grammatical categories, figures of speech, and cohesion and context. Style research in translation studies borrows heavily from similar research in literary studies. With the rise of descriptive
With the rise of descriptive Lexical Bundles in the Dialogues of Hongloumeng Translations 231 translation studies (DTS), which aims at studying translation in its own right and situating it within the target social-cultural background, translation style research has attracted considerable scholarly attention from researchers working in corpusbased translation studies. The traditional prescriptive notion that translation should be faithful to the source text has largely lost its appeal due to the shift toward DTS. Generally speaking, style research mainly falls into two major strands: translator style and translation style. The first one concerns the use of a comparable corpus (Bosseaux 2007; Saldanha 2011) to study the oeuvre of a translator as opposed to the other by capturing “the translator’s characteristic use of language, his or her individual profile of linguistic habits, compared to other translators” (Baker 2000, 245). On the other hand, translation style research is often conducted based on a parallel corpus to examine how two or more translations of a particular work diverge from each other in certain linguistic indicators or features (Li et al. 2011; Mastropierro 2018). However, the two terms are sometimes used interchangeably, as it is practically impossible to examine all the translated works of a translator. Similar to the translation universals (TUs) research, which has benefited from the use of corpus tools, translation style research has also benefited from the methodology of TUs research, including the use of linguistic indicators and analytical frameworks. In the case of HLM, the two full-length translations, which were done at roughly the same time (i.e., 1970s–1980s), have provided a good source for the current study to examine how they differ in style. 13.2.2 Previous Studies on Style in English Translations of HLM Over the years, HLM and its translations have attracted much attention from translation scholars. 
As a monumental literary work, HLM has multiple translations, both partial and complete. So far, most research efforts have been devoted to comparing the two full-length translation versions, namely, the one translated by David Hawkes and John Minford, and the other by Xianyi Yang and Gladys Yang. Early research on HLM translations is mainly based on qualitative deliberations. According to Yan’s (2005) systematic review of 50 research articles on HLM translations, the majority adopted comparative methods to study a wide range of topics, from poems to rhetorical devices. Some of the most frequently investigated topics in HLM translations include culture-specific items, book titles, idioms, character names, rhetorical devices, and the history of translation. More recent publications have also investigated how social terms (Tsao 2020) and material culture-loaded words (Yu 2020) are translated in HLM translations. Other recent works have scrutinized letters exchanged between translators to discuss the commissioners behind HLM translations (Tong and Morgan 2021). Qualitative HLM research in general has studied a wide range of issues related to HLM translations in a descriptive yet case-by-case manner.²
With the rise of corpus linguistics in the field of translation studies, corpus methods have also been adopted to analyze style in the HLM translations systematically. To this end, researchers have often compiled parallel corpora consisting of the Chinese source text and the English translations. For example, Liu (2008) compared how titles and honorifics were handled in HLM translations. Ji and Oakes (2018) used corpus methods to study earlier HLM translations produced in the nineteenth century and found that Edward Bowra used more conjunctions and genitives, while H. Bencraft Joly used more determiners, a feature that largely characterized Joly’s translator style.
Joly’s version was also compared with the Yangs’ in Hou (2013), which revealed that nominalization construed formality in Joly’s version but conciseness in the Yangs’ version. Hawkes’s and the Yangs’ HLM translations have been studied in detail in two doctoral theses. Mu (2012) found that Hawkes’s style emphasized events and feelings by following the Western narrative convention, whereas the Yangs’ style was found to be nonevent-oriented and less direct. Wu (2021) used Biber’s multidimensional analysis to analyze the acceptability of Hawkes’s and the Yangs’ versions respectively. Corpus research on HLM translations has thus drawn on a variety of linguistic indicators: tokens and lexical types in Li et al. (2011), sTTR and lambda in Fang and Liu (2015), metaphorical idioms in Su (2021), and lexical bundles in Liu and Afzaal (2021).
13.2.3 Lexical Bundles as an Indicator in Translation Style Research
Lexical bundles (LBs), also known as multiword expressions (MWEs), n-grams, and formulaic sequences, are recurring lexical sequences in a register (Biber et al. 2004). In the field of second language acquisition, the use of LBs has been found to be one of the features distinguishing native from non-native English (e.g., Chen and Baker 2010; Wei 2007); recently, LBs have also been shown to be an effective indicator of translator’s style. Mastropierro (2018) compared LBs in two English-Italian translations of a thriller and found that one translator used significantly more bundles than the other. While acknowledging the merits of using LBs in translation style research, Mastropierro (2018) proposed that LBs can be categorized into groups which may disclose a translator’s linguistic patterns and habits. As noted by Mahlberg et al.
(2019), LBs are sometimes marked features of a specific character; thus, the use of different LBs can help construct characters through their various functions of “negotiation of information, turn-taking, politeness, and first-person narration” (Mahlberg and Hoey 2012, 76). In translation, translators’ use of LBs not only shows their linguistic preferences and their characterization of the fictional characters but also affects the readability of their translations. Shrefler (2011) argued that Martin Luther’s German translation of the Bible is more reader-friendly because of his frequent use of verb-related LBs. Accordingly, the use of LBs is closely connected with translation style research. Indeed, LBs have already been used in Hongloumeng translation research. Based on the first 15 chapters of HLM translations, Liu and Afzaal (2021) demonstrated that Hawkes’s translation contains a greater number and variety of LBs than the Yangs’ version. Although their study has shown major differences in the use of LBs between the two HLM translations, a study covering all 80 chapters should yield more robust results. Moreover, in response to Axelsson’s (2008) call to treat fictional dialogue and narration as two separate genres, a study examining the use of LBs in HLM fictional dialogues can yield new insights into HLM translation style research. The current study therefore focuses on the dialogue in both translations (all 80 chapters) to examine how the two diverge in translation style. The representation of LBs in the respective translations serves as a point of departure for the identification of “the specific translator’s idiosyncrasies and conscious interpretive or unconscious idiolectal choices” (Munday 2012, 144).
13.2.4 Research Questions
Based on the foregoing review, we can see that lexical bundles can be used as a reliable indicator for translation style research. Though this indicator has been used to explore some parts of HLM translations (Liu and Afzaal 2021), no research has been conducted to systematically examine all 80 chapters translated by Hawkes and the Yangs. Nor has any research so far attempted to separate HLM into fictional dialogues and narration. Thus, we believe that a study examining how lexical bundles are represented in HLM fictional dialogues can provide novel insights into this line of research. In this study, we aim to address the following three research questions:
(1) Do the two Hongloumeng translations differ in style as represented by the frequency and types of lexical bundles?
(2) If such differences are identified, do they diverge in terms of the structural and functional categories of the key lexical bundles?
(3) What are the possible factors contributing to the different use of lexical bundles in the two Hongloumeng translations?
13.3 Data and Procedure
13.3.1 Corpus
The current study made use of the English-Chinese Parallel Corpus of Hongloumeng, which was built by Li et al. (2011). The corpus was compiled by either scanning hard copies or downloading soft copies from the internet. It consists of three parts running in parallel, namely, the original Chinese texts, the translation by Hawkes and Minford, and the translation by the Yangs. The current research is based on the first 80 chapters of the two translations. In other words, the part translated by Minford is not included in our study. A self-written Python program was used to extract the dialogues automatically, relying on punctuation (in this case, quotation marks) to separate fictional dialogue from narration.
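The extraction program itself is not reproduced in the chapter. As a rough illustration of the idea only, the sketch below pulls out text enclosed in double quotation marks and treats the remainder as narration. The regular expression, the function name `split_dialogue`, and the sample passage are all assumptions made for this example; in particular, it assumes double quotation marks (straight or curly) that do not nest.

```python
import re

# Text between straight or curly double quotation marks.
# Assumption: dialogue is delimited by double quotes and quotes do not nest.
QUOTED = re.compile(r'["“]([^"“”]+)["”]')

def split_dialogue(text):
    """Return (list of quoted segments, remaining narration) for one passage."""
    dialogue = [match.group(1).strip() for match in QUOTED.finditer(text)]
    # Remove the quoted spans, then normalise the leftover whitespace.
    narration = " ".join(QUOTED.sub(" ", text).split())
    return dialogue, narration

# Invented sample passage, not taken from either translation.
sample = '“Where are you going?” asked Baoyu. She smiled. "To the garden," she said.'
dialogue, narration = split_dialogue(sample)

print(dialogue)
print(narration)
```

An edition that marks speech with single quotation marks would need a different pattern, since apostrophes in contractions and possessives are easily mistaken for closing quotes.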
The data were then manually proofread to ensure accuracy, as some quotation marks are used to mark titles or emphasize certain details instead of indicating dialogue. The result is two corpora, namely, the Yangs Dialogue Corpus (YD) and the Hawkes Dialogue Corpus (HD). YD consists of 219,478 tokens (i.e., the total number of orthographic words separated by spaces and punctuation) and 9,801 types (i.e., the number of distinct words in the corpus), while HD has 280,682 tokens and 10,734 types (see Table 13.1). Although Hawkes used more words to translate the first 80 chapters, dividing the number of types by the number of tokens (i.e., type-token ratio or TTR) shows a higher TTR in YD, indicating that the Yangs used a wider range of distinct words. As YD and HD differ in size, the standardized TTR (sTTR) of each corpus was also calculated by averaging the TTRs of successive 1,000-word segments. YD has a higher sTTR than HD, confirming that the Yangs indeed used more distinct words than Hawkes did.

Table 13.1 Descriptive Statistics of Fictional Dialogues in HD and YD
Measures    HD         YD
Tokens      280,716    219,768
Types       10,730     9,801
TTR         3.82       4.47
sTTR        39.28      42.14
Note: TTR = type-token ratio; sTTR = standardized type-token ratio.

13.3.2 Analytical Framework
In order to identify the representative LBs used by Hawkes and the Yangs, we used WordSmith 8.0 (Scott 2020) first to turn both corpora into index files, which were then used to generate lists of three-word and four-word LBs with their corresponding frequencies. Most studies have opted for a frequency threshold for retrieving LBs, ranging from 10 (Biber et al. 1999) through 20 (Cortes 2004; Hyland 2008) to 40 times (Biber et al. 2004; Pan et al. 2016) per million words (pmw). In view of the corpus size and the purpose of the current study, we have opted for a threshold of three times to retrieve the three-word and four-word LBs.
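The study retrieved its bundles with WordSmith; the sketch below is not that tool's procedure but a minimal stand-in showing what thresholded n-gram retrieval involves. It assumes a simple lowercase tokenizer, counts sequences within each utterance only, and reads the threshold of three as a raw count, which is our interpretation. The function names and sample utterances are invented for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())

def extract_bundles(utterances, n, min_count=3):
    """Count n-word sequences inside each utterance and keep those occurring
    at least min_count times (raw frequency; an assumption of this sketch)."""
    counts = Counter()
    for utterance in utterances:
        tokens = tokenize(utterance)
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return {bundle: c for bundle, c in counts.items() if c >= min_count}

def per_million(count, corpus_tokens):
    """Normalise a raw count to occurrences per million words (pmw)."""
    return count * 1_000_000 / corpus_tokens

# Invented utterances, not taken from either translation.
utterances = [
    "I do not know what to do",
    "I do not know",
    "You do not know him",
    "I do not think so",
]
three_word = extract_bundles(utterances, 3)
four_word = extract_bundles(utterances, 4)

print(three_word)
print(four_word)
```

For comparison with the per-million thresholds cited above, `per_million(3, 219_478)` gives roughly 13.7 pmw for a corpus the size of YD, if the threshold is indeed a raw count.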
Details of the retrieved LBs can be seen in Table 13.2. Based on the statistics, YD contains fewer tokens and types of both three-word and four-word LBs than HD. This is to be expected, given that YD is smaller than HD. Further comparison of the TTRs reveals that YD has higher TTRs in both three-word and four-word LBs.

Table 13.2 Types and Tokens of 3-Word and 4-Word LBs in HD and YD
Measures                HD        YD
Tokens of 3-word LBs    60,538    32,692
Types of 3-word LBs     10,498    6,235
TTR of 3-word LBs       17.34     19.07
Tokens of 4-word LBs    12,867    5,972
Types of 4-word LBs     2,931     1,413
TTR of 4-word LBs       22.78     23.66

Based on the two lists of LBs, we further adopted the structural and functional classification framework proposed by Biber et al. (2004) to investigate how Hawkes and the Yangs used LBs differently. Structural classification is a system which broadly categorizes expressions into groups based on their part-of-speech (POS) information. LBs which contain at least one verb component are classified as verb-phrase-based (VP-based). LBs which do not have any verb components are classified as noun-phrase-based (NP-based) if a noun component comes before prepositions or other POS components. If a preposition comes before any nouns, the expression is classified as prepositional-phrase-based (PP-based). LBs without any verbs, nouns, or prepositions are classified as others. While structural classification is useful in differentiating the structural patterns of LBs preferred by the respective translators, functional classification enables a comparison of the LBs in terms of their communicative goals. The LBs can be broadly categorized into stance, discourse organizers, referential, and special conversational functions, depending on their use in context. Sometimes an expression may perform more than one function.
For example, I want to can be a discourse organizer which introduces a topic; alternatively, I want to can also be used to express desire. To decide on the major function of an expression, we employed context-based annotation: each LB was examined in context before it was annotated with its key function.

In this study, we conducted two rounds of Key-LB analysis. In the first round, we compared the YD LBs against the HD LBs as the reference corpus to identify the Key-LBs used in YD. In the second round, the two lists were reversed in order to identify the Key-LBs in HD. LBs that passed the keyness test (i.e., log-likelihood > 6.63) were considered Key-LBs, meaning that they have an unusually high frequency in their respective corpus (see note 3). Among these LBs, some content expressions, mainly character names and place-names such as Our Old Lady, are irrelevant to the analysis and were removed, leaving 57 and 139 LB types in YD and HD, respectively. We applied the structural classification (i.e., NP-based, VP-based, PP-based, and others) and the functional classification (i.e., stance, discourse organizers, referential, and special conversational functions) (Biber et al. 2004) to the Key-LBs, with the ultimate aim of identifying how HD and YD diverge in style as represented by the use of LBs.

13.4 Results

13.4.1 Structural Patterns

Although YD yielded a higher TTR of LBs than HD, we identified only 57 Key-LBs in YD; HD, on the other hand, showed a lower TTR of LBs but recorded 139 Key-LBs (see Table 13.3). This suggests that TTR might not be a reliable indicator when comparing two LB lists that differ in length. We found that HD and YD differ not only in the number of Key-LBs but also in structures and functions.
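The keyness test behind these Key-LB counts can be sketched as follows. This is a minimal illustration of the two-corpus log-likelihood calculation documented by UCREL, not the WordSmith routine itself; the function names are hypothetical, and the frequencies and sizes passed in would have to come from the two LB lists.

```python
import math


def log_likelihood(freq_a: int, size_a: int, freq_b: int, size_b: int) -> float:
    """Two-corpus log-likelihood for one bundle.

    freq_a, freq_b: occurrences of the bundle in corpus A and corpus B.
    size_a, size_b: total size of each corpus (or bundle list).
    """
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


def is_key(freq_study: int, size_study: int, freq_ref: int, size_ref: int,
           threshold: float = 6.63) -> bool:
    """True if the bundle is relatively more frequent in the study corpus
    and its log-likelihood exceeds the threshold (6.63, i.e., p < 0.01)."""
    overused = freq_study / size_study > freq_ref / size_ref
    return overused and log_likelihood(
        freq_study, size_study, freq_ref, size_ref) > threshold


# Hypothetical counts, invented for illustration only.
print(is_key(freq_study=20, size_study=1000, freq_ref=0, size_ref=1000))
```

Because log-likelihood is symmetric in the two corpora, the direction check is what separates a bundle that is key in the study corpus from one that is key in the reference corpus; swapping the arguments corresponds to the second round of analysis.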
While the Key-LBs in both HD and YD are mostly VP-based (i.e., containing a verb component), HD has a higher proportion of VP-based Key-LBs (75.54%) than YD (61.40%). HD is thus closer to Conrad and Biber’s (2005) finding that 90% of the LBs used in spoken British English involve verb components. On the other hand, a higher proportion of PP-based Key-LBs (i.e., bundles starting with a preposition) is found in YD (17.54%) than in HD (7.91%). Since the majority of Key-LBs in HD and YD are VP-based, we proceeded to study their subpatterns (see Table 13.4). Of the VP-based Key-LBs in HD, 40.95% start with a personal pronoun (e.g., I, you, she), 29.52% start with a verb (e.g., be, do, have, modal, or other verbs), and 20.95% start with a conjunction or a linking word such as that and to. Likewise, the PP-based Key-LBs were further categorized into subcategories (see Table 13.5).
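The structural rules set out in the analytical framework, on which the structural counts rest, can be sketched as a simple decision procedure. This is only an illustration: the coarse tag set (VERB, NOUN, PREP, and so on) is an assumption, the tags are supplied by hand rather than by a tagger, and borderline bundles in the study were resolved by the annotators rather than by rule.

```python
def classify_structure(pos_tags: list[str]) -> str:
    """Assign a bundle to a structural class from its POS tags.

    The tag names are hypothetical; any tagger output would first have to be
    mapped onto them.
    """
    # Any verb component makes the bundle VP-based.
    if "VERB" in pos_tags:
        return "VP-based"
    first_noun = next((i for i, t in enumerate(pos_tags) if t == "NOUN"), None)
    first_prep = next((i for i, t in enumerate(pos_tags) if t == "PREP"), None)
    # No verb, and a noun comes before any preposition: NP-based.
    if first_noun is not None and (first_prep is None or first_noun < first_prep):
        return "NP-based"
    # No verb, and a preposition comes first (or no noun at all): PP-based.
    if first_prep is not None:
        return "PP-based"
    # No verbs, nouns, or prepositions.
    return "others"


# Hand-tagged examples; the tags are illustrative assumptions.
print(classify_structure(["PRON", "VERB", "PRON"]))   # e.g., "I think you"
print(classify_structure(["PREP", "DET", "NOUN"]))    # e.g., "for no reason"
```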
Table 13.3 Structural Classifications of Key-LBs in HD and YD

Structural Classifications   HD Key-LBs   %       YD Key-LBs   %
NP-based                     21           15.11   9            15.79
VP-based                     105          75.54   35           61.4
PP-based                     11           7.91    10           17.54
Others                       2            1.44    3            5.26
Total                        139          100     57           100

Table 13.4 Statistics of VP-Based Key-LBs in HD and YD

VP-Based Key-LBs                                                             Types in HD   %       Types in YD   %
Starting with personal pronouns                                              43            40.95   5             14.29
Starting with verbs (including be, do, have, modal verbs, and other verbs)   31            29.52   15            42.86
Starting with conjunctions, that, to, or not to                              22            20.95   6             17.14
Starting with wh-words                                                       5             4.76    8             22.86
Starting with existential markers (including there and this)                 2             1.9     1             2.86
Starting with an adjective                                                   2             1.9     0             0
Total                                                                        105           100     35            100

Table 13.5 Statistics of PP-Based Key-LBs in YD

PP-Based Key-LBs                                HD   %     YD   %
Starting with a preposition and a determiner    5    45%   6    60.00
Starting with two prepositions                  0    0%    1    10.00
Starting with conjunction                       3    27%   3    30.00
Total                                           3    27%   10   100

13.4.2 Contextual Use of Key VP-Based and PP-Based LBs

In this section, the two most common types of VP-based Key-LBs (i.e., those starting with personal pronouns and those starting with verbs) and the PP-based Key-LBs are discussed with reference to examples extracted from HD and YD. Many of Hawkes’s VP-based Key-LBs are headed by a personal pronoun. I think you is the LB that differs most significantly between HD and YD (LL: 49.70), showing a clear overrepresentation in HD. This phrase usually appears at the beginning of a sentence and manifests the subject prominence of English. As we can see in excerpt 1, the suggestion of paying someone a visit is expressed in the form I think you should (i.e., first-person pronoun + base verb + second-person pronoun) in HD.
Meanwhile, such a subject-predicate relation is absent in YD, which simply uses the directive Go to express the character’s permission for the visit, a topic already introduced in the previous dialogue exchange. YD prioritized the topic (Go), whereas HD adhered to the English convention of subject prominence (e.g., She is, I think you). As can be seen, Hawkes tended to use subject-predicate structures (e.g., personal pronoun + verb), whereas such structures are less common in the Yangs’ version.

Excerpt 1
“你看看就過去罷,那 侄兒媳婦。” [Source] (Chapter 11)
“Yes,” “she is your nephew’s wife. I think you should. Just look in for a moment, though, and then join the rest of us.” [Hawkes]
“Go if you want, but don’t be long,” “Remember she’s your nephew’s wife.” [Yangs]

A similar contrast is also observed in Key-LBs which begin with a verb. Ought to be is the most significant verb-initial Key-LB in HD (LL: 36.02). As we can see in excerpt 2, ought to be follows the subject you in HD. Hawkes translated the invitation 請, qing (literal translation: please), using a subject (you) and its predicate (ought to be getting back . . .). The Yangs, on the other hand, did not use the subject-predicate structure but instead retained the semantic meaning (please) of 請, qing, in the source text. Since please is a near equivalent of 請, qing, the Yangs translated literally, following the sentence order of the source text. The subject is again omitted in the Yangs’ version. Excerpts 1 and 2 are just two of the many examples contrasting Hawkes’s and the Yangs’ preferences for subject-predicate and topic-comment structures, respectively. Overall, we can conclude that Hawkes’s Key-LBs follow the spoken English convention in which most LBs involve verb components (Conrad and Biber 2005) structured in the form of personal pronoun + verb (Biber 2009).
Excerpt 2
“如今來回老祖宗,債主已去,不用躲了。已預 希嫩的野鶏,請吃 晚飯去,再遲一會子就老了。” [Source] (Chapter 50)
“So now your creditors have gone, you can come out of hiding. You ought to be getting back now in any case. You’ve got some nice, tender pheasant for dinner and if you leave it much longer it will spoil.” [Hawkes]
“Now I’ve come to report to our Old Ancestress: Your duns have gone, you can come out of hiding. I’ve some very tender pheasant ready. So please come back for dinner. If you leave it any later, it’ll be overcooked.” [Yangs]

However, this is not the case in YD. Although more than half of the Yangs’ Key-LBs are still VP-based, this proportion is lower than that of HD, while 17.54% of the Yangs’ Key-LBs are PP-based. Meanwhile, only 7.91% of Hawkes’s Key-LBs are PP-based. This indicates that the Yangs used more PP-based LBs, which were significantly underused by Hawkes when translating Hongloumeng (see Table 13.3). Yip (1995, 78) pointed out that bare noun phrases are often placed at the beginning of a Chinese sentence to refer to a topic, owing to topic prominence, but such a syntactic structure (i.e., a sentence beginning with a bare noun) is not natural in English. Hence, Yip believed that Chinese speakers strategically use prepositional phrases to encapsulate a bare noun phrase when they need to introduce a topic first. Our results show that using a prepositional phrase to start a sentence is more prevalent in YD than in HD. For example, the Yangs used If not for (LL: 29.64) significantly more frequently than Hawkes did. If not for is a typical prepositional phrase which consists of the conjunction if, the adverb not, and the preposition for. In excerpt 3, we can see that the source text in Chinese is structured as 要 不 (if not) and 我 (me), which the Yangs directly translated into If not for me.
As the focus is on the speaker holding the other person back from attacking someone, the Yangs kept this topic in the translation and used the prepositional phrase If not for to topicalize the object me. The syntactic order of If not for me is almost equivalent to that of the dependent clause 要不 (literally: if not me) in the Chinese source text. Conversely, Hawkes followed the subject-prominent convention by using a verb phrase to start the sentence. He used the verb-pronoun-verb clause Suppose I hadn’t been here to describe a condition that is contrary to fact.

Excerpt 3
“要不 ,你要傷了他的命,這會子可怎麽樣?” [Source] (Chapter 44)
“If not for me you might have killed her. What do you intend to do now?” [Yangs]
“Suppose I hadn’t been here to protect her and you really had done her an injury, what would you have had to say for yourself then, I wonder?” [Hawkes]

The Yangs also used prepositional phrases at the end of sentences. For example, they extensively used for no reason to express the absurdity of a situation. For no reason, one of the Key-LBs in YD, consists of a preposition, a determiner, and a noun and yielded a very high keyness value (LL: 31.29), meaning that it is overrepresented in YD relative to HD. PP-based LBs like for no reason, when placed at the end of a sentence, often serve as adverbials. From excerpt 4 we can see that the Yangs used this prepositional phrase to describe the unlikelihood that someone would offend those people. The Yangs not only used prepositional phrases to make noun phrase topics grammatically well-formed (e.g., excerpt 3) but also used them to describe actions. However, no such substantial use of prepositional phrases was found in Hawkes’s dialogue translation. Hawkes used a variety of linguistic choices to achieve the same purpose; in this case, he used the adverb possibly to express the unlikelihood of the event.
Excerpt 4
“誰可 的得罪着他?” [Source] (Chapter 78)
“Why should anyone offend them for no reason.” [Yangs]
“Who could possibly have offended her?” [Hawkes]

So far, our study has found more distinctive VP-based LBs in HD and more distinctive PP-based LBs in YD. The Yangs seemed to prefer using prepositions to introduce noun topics, while Hawkes used more verb phrases to express subject-predicate relations.

13.4.3 Functional Classifications

After manual classification, it was found that 47.48% of Hawkes’s Key-LBs mainly expressed stance, while 36.84% of the Yangs’ Key-LBs mainly served as referential bundles (see Table 13.6). This means that almost half of Hawkes’s distinctive LBs come from his use of stance markers. These two functional categories were therefore examined in detail. To show how HD diverged from YD in the use of stance markers, we further categorized the stance Key-LBs into subpatterns (see Table 13.7). Likewise, we categorized the referential Key-LBs in HD and YD into subpatterns (see Table 13.8).
Table 13.6 Functional Classifications of Key-LBs in HD and YD

Functional Classifications         HD    %       YD   %
Stance                             66    47.48   10   17.54
Discourse organizers               31    22.3    10   17.54
Referential                        37    26.62   21   36.84
Special conversational functions   5     3.6     16   28.07
Total                              139   100     57   100

Table 13.7 Statistics of Stance Key-LBs in HD and YD

Stance Functions                      HD   %        YD   %
Epistemic stance                      20   30.30    3    30
Overall attitudinal/modality stance   4    6.06     0    0
Desire                                4    6.06     0    0
Obligation/directive                  19   28.79    4    40
Intention/prediction                  13   19.70    1    10
Ability                               6    9.09     2    20
Total                                 66   100.00   10   100

Table 13.8 Statistics of Referential Key-LBs in HD and YD

Referential Functions           HD   %       YD   %
Identification/focus            7    18.92   4    19.05
Imprecision                     6    16.22   1    4.76
Quantity/specification          9    24.32   5    23.81
Intangible framing attributes   6    16.21   4    19.05
Place reference                 1    2.70    1    4.76
Time reference                  3    14.29   3    14.29
Multifunctional reference       1    2.70    3    14.29
Total                           37   100     21   100

13.4.4 Contextual Use of Key Stance and Referential LBs

According to Biber and Barbieri (2007), the predominant function of LBs in all spoken registers (i.e., teaching, class management, office, study groups, and service encounters) is to express stance. It seems that Hawkes tended to use stance markers to translate the fictional dialogue. Among Hawkes’s Key-LBs which are classified as stance, 30.30% construe an epistemic stance, while 28.79% convey obligations/directives (see Table 13.7). The rest are distributed among intention/prediction, desire, ability, etc. This means that most of Hawkes’s stance Key-LBs perform either an epistemic or a directive function. For instance, one of Hawkes’s Key-LBs, I think I (LL: 33.52), is a very common epistemic marker in conversational English. It indicates personal opinions and sometimes functions as a hedge to soften the illocutionary force of an assertion. In excerpt 5, Hawkes added I think I to express the speaker’s decision to stay overnight.
This use of hedging in decision-making is, however, not found in the source text. It is solely Hawkes’s interpretation that a certain degree of hedging might be required in this context. Such stance markers are found neither in the source text nor in YD. The Yangs used shan’t, the contracted form of shall not, to keep the formality and courtesy conveyed in the source text. That is, the Yangs rendered the source text literally without adding any epistemic stance in relation to the context.

Excerpt 5
“有的 炕,只管睡。 二爺使我送月銀的,交 了奶奶, 不回去了。” [Source] (Chapter 65)
“There’s plenty of room here for you to sleep. Make yourselves at home. Actually, I came here to bring the mistress her monthly allowance. Now that I’ve given it to her, I think I shall spend the night here as well.” [Hawkes]
“Well, there’s plenty of room on the kang, just lie down as you like. Second Master sent me to bring the monthly allowance to the mistress, so I shan’t be going back either.” [Yangs]

Apart from epistemic stances, Hawkes used significantly more LBs to perform directive speech acts. Among his stance Key-LBs, 28.79% assert obligations/directives. You ought to (LL: 28.64) is one of the LBs with a high keyness value; the speaker uses it to imply that the listener has a duty or moral obligation to undertake a certain task. Clearly, HD contains more expressions conveying obligations and directives than YD. Take a translation pair as an example (see excerpt 6): the source text 你細想去 (literal translation: you carefully think about) does not contain any sense of obligation. However, Hawkes used you ought to be able to in his translation, which signals an obligation for the listener to work things out by themselves. Such a sense of obligation is not found (at least literally) in the source text, so the Yangs simply used the adverb just to begin the subjectless command work it out for yourself.
Excerpt 6
“ 。我哥哥已經相准了,只等來年就下定了, 不必提出人來,我 方才說你認不得娘,你細想去。” [Source] (Chapter 57)
“No, that’s not the reason. It’s because someone has already been chosen for my brother. We are only waiting for him to come home to make it public. I don’t need to name names. If I tell you that you can’t possibly become Mamma’s god daughter, you ought to be able to work it out for yourself.” [Hawkes]
“No, it’s because my brother has already set his mind on someone, and it’ll be fixed up as soon as he returns. I needn’t name any names. Why did I say you couldn’t take her as your mother? Just work it out for yourself!” [Yangs]

Given that there are more stance Key-LBs in HD (66) than in YD (10), it can be postulated that Hawkes tended to add stance LBs in his translation while the Yangs used stance LBs to a lesser degree. Hawkes mainly used these stance Key-LBs to convey epistemic stances or obligations/directives, as exemplified in excerpts 5 and 6.

Unlike those in HD, many of the Key-LBs in YD are referential markers. Results show that 36.84% of the Key-LBs in YD were used to refer to different attributes. The referential Key-LBs in YD are distributed across many subfunctions, including identification/focus, imprecision, quantity/specification, intangible framing attributes, place, time, and multifunctional reference (see Table 13.8). Since the Yangs’ referential Key-LBs are fairly evenly distributed across the subfunctions, we have selected two referential Key-LBs for detailed analysis based on their exceptionally high keyness values. The first one is this is just (LL: 24.70), which functions as an identification/focus marker. The Yangs used this is just significantly more frequently than Hawkes did (see excerpt 7).
This is just what (YD) differs from this way of carrying on (HD) in that the former refers to a vague subject matter that readers cannot infer from the literal meaning, whereas the latter identifies the exact misbehavior. In the Chinese source text 正爲勸你這些 (literal translation: just persuading you these), the word 這, zhe (literal translation: this), is an identifier. When a sentence starts with the identifier 這, zhe, Chinese speakers can easily follow the topic, which need not be reintroduced repeatedly. Largely following a literal translation approach, the Yangs used identifiers (e.g., this) in their translation by adhering closely to the source text. We assume that the overuse of identification LBs in YD is probably a result of the direct translation of the Chinese identifier 這, zhe (i.e., this), which is a more economical way of introducing a mutually known topic. Hawkes, on the other hand, felt the need to explicate the topic clearly.

Excerpt 7
“ 的,正爲勸你這些,更說的狠了。” [Source] (Chapter 19)
“This is just what I wanted to warn you against, yet here you go, talking more wildly than ever.” [Yangs]
“It’s precisely this way of carrying on that I was going to talk to you about, and here you go, ranting away worse than ever!” [Hawkes]

Another function of the referential Key-LBs in YD is to express imprecision. On like this is one Key-LB in this subcategory with a high keyness value (LL: 32.94), and it is overused in YD relative to HD. This LB does not specify what qualities it is referring to. Instead, it makes the circumstances off the record and leaves readers some room for imagination. For example, in excerpt 8, the Yangs used on like this to refer to the girl’s poor situation, which is not explicitly mentioned in the corresponding source text. The source text 這個形景 (literal translation: this situation) does not specify clearly what situation the girl is in.
In contrast, Hawkes did not use the imprecise LB on like this as the Yangs did but instead used the noun phrase her outward behaviour. Again, Hawkes has given his own interpretation of the expression 這個形景 (i.e., this situation).

Excerpt 8
“這女孩子一定有 麽話說不出來的大心事,才這麽個形景。外面既 這個形景,心裏不知怎麽熬煎。看他的模樣兒這般單薄,心裏那裏還 擱的住熬煎。可恨我不能替你分些過來。” [Source] (Chapter 30)
“She must have some secret anxiety preying on her mind to carry on like this, yet she looks too delicate to stand much anxiety. I wish I could share her troubles.” [Yangs]
“One can see from her outward behaviour how much she must be suffering inwardly. And she looks so frail. Too frail for suffering. I wish I could bear some of it for you, my dear!” [Hawkes]

13.5 Discussion

This chapter has applied keyword analysis to identify the three-word and four-word lexical bundles (LBs) which are significantly more frequent in each of the Hongloumeng translations compared to meaningful LBs of other lengths. It is found that many of Hawkes’s Key-LBs (i.e., lexical bundles unusually frequent in Hawkes’s dialogue translation but infrequent in the Yangs’ dialogue translation) are verb phrases, while many of the Yangs’ Key-LBs (i.e., bundles unusually frequent in the Yangs’ dialogue translation but infrequent in Hawkes’s dialogue translation) are prepositional phrases. We have also found that almost half of Hawkes’s Key-LBs function as stance markers, while the largest proportion of the Yangs’ Key-LBs are referential markers. In this section, Hawkes’s and the Yangs’ use of LBs will be discussed with reference to their language backgrounds, life experiences, and respective translation purposes.

13.5.1 Language Backgrounds

David Hawkes is a native English speaker, while Xianyi Yang is a native Chinese speaker. Although his wife, Gladys Yang, is a native English speaker, she mainly typed “the translation on a typewriter.
While she was typing the text, she also polished or edited it” (Li et al. 2011, 163). In our study, Hawkes is found to have used more VP-based LBs, which is in line with Biber’s (2009) finding that 50% of the LBs used in native spoken English are structured as “personal pronoun + verb components.” This shows that Hawkes’s translation of fictional dialogues largely follows the norm of spoken English in this respect. In contrast, Xianyi Yang, as a native Chinese speaker, is found to have used more PP-based LBs. This is consistent with findings that L2 speakers (e.g., native Chinese speakers) tend to overuse certain LBs which native English speakers seldom use (Chen and Baker 2010) and that Chinese speakers use more prepositions to construct lexical bundles than their native English counterparts do (Wei 2007; Chen and Baker 2010). As Chinese is a topic-prominent language (Yip 1995), it is not surprising that Chinese speakers adhere to the topic-prominence convention by using prepositions combined with a bare noun phrase in the topic position to ensure grammaticality in English. On the other hand, English is a subject-prominent language which often structures sentences in a subject-predicate relation (ibid.); thus, half of the LBs in spoken English are made up of “pronoun + verb” (Biber 2009). Hawkes’s VP-based Key-LBs, such as I think you and ought to be, are manifestations of subject prominence in English; the Yangs’ PP-based Key-LBs, such as if not for and for no reason, may be influenced by topic prominence, in which prepositional phrases often serve as adverbials in Chinese. This supports previous findings (e.g., Yip 1995; Biber and Barbieri 2007, 2009; Conrad and Biber 2005) that LBs in spoken English are mostly verb phrases and that Chinese speakers tend to use prepositional phrases to topicalize bare nouns or noun phrases when they speak English.
As for the functional aspects, Hawkes’s Key-LBs, such as I think I and you ought to, also resonate with the convention that the most prominent function of LBs in spoken English is stance-making: asserting epistemic stance and giving directives (Biber and Barbieri 2007). Meanwhile, the Yangs’ less frequent use of stance bundles might be related to the fact that Chinese speakers often underuse participant-oriented LBs (Wei and Lei 2011; Pan and Liu 2019). The Yangs’ overuse of LBs such as this is just and on like this reflects Chinese speakers’ frequent use of identifiers to express mutually known topics. Hence, Hawkes’s frequent use of verb phrases and stance LBs, as well as the Yangs’ frequent use of prepositional phrases and referential LBs, suggests that the divergent translation styles can be attributed to the different language backgrounds of the respective translators.

13.5.2 Life Experiences

David Hawkes went to China and received postgraduate education in Beijing in 1948, while Xianyi Yang started his university education at Oxford University in 1936. According to Minford’s foreword to Xianyi Yang’s (2002) autobiography White Tiger, Xianyi and Gladys Yang would visit David and Jean Hawkes, and the couples knew each other well. David Hawkes and Xianyi Yang were intellectuals of much the same historical period, and they published their translations of Hongloumeng at about the same time as well (i.e., both finished their translations by 1980). On the other hand, Hawkes and Xianyi Yang contrast in their walks of life. David Hawkes was a sinologist who first encountered Hongloumeng when he studied at Peking University. He read the novel under the guidance of a Chinese-speaking “laoxiansheng,” 老先生 (translation: old scholar), who was a former government clerk from Hebei province. Hawkes described the reading journey as “direct method gone mad” in the sense that he barely understood what the teacher said.
Perhaps due to this unpleasant experience, Hawkes preferred a more fluent approach in rendering the fictional dialogues (more VP-based LBs), which sound as if they were naturally spoken to the readers in English. Out of his passion for the novel, Hawkes resigned from his chair professorship at Oxford in 1971 to devote himself fully to his translation of Hongloumeng (Minford 2012). At that time, Hawkes was already an established scholar who had a research fellowship to live on. He translated not for money but for sheer joy. Unlike David Hawkes, Xianyi Yang did not have the luxury of spending years polishing his translated work. After he and his wife joined the official translation bureau in 1943, and subsequently the Foreign Languages Press in 1952, the couple was in charge of translating literary works in new China. In the 1950s, Xianyi Yang was drained by translating foreign works into Chinese, as he also had to fulfil the “voluntary physical labor” at the same time; from 1968 to 1972, the couple endured imprisonment amid the political unrest of the Cultural Revolution. During the two years of translating Hongloumeng for the Foreign Languages Press, they lost their beloved son. According to Xianyi Yang’s autobiography (2012), they were never paid for the extra work on translation except Hongloumeng, which was commissioned by the magazine Chinese Literature. Our findings corroborate Li, Zhang, and Liu’s (2011) observation that Xianyi Yang and Gladys Yang translated under censorship, grief, and a tight schedule, yet with little remuneration. This probably explains why the Yangs employed a more literal approach in rendering the fictional dialogues.

13.5.3 Translation Purposes

Finally, David Hawkes’s translation purpose was to entertain readers and literary enthusiasts.
To help reconstruct the dialogues, Hawkes adopted a more liberal approach in his translation. For example, one of the most frequently occurring reporting verb phrases in the source text, 笑道, xiao dao (literally: said with a smile), was translated by Hawkes in various ways (e.g., chide, laugh, with a broad smile, with a meaningful smile, with a proud smile) depending on the context. Hawkes justified this approach as a measure to compensate for the absence of the tone of voice (Minford n.d., 32). In his preface to The Story of the Stone Volume 1: The Golden Days, Hawkes (1973, 46) stated his major concern in translating the novel: “If I can convey to the reader even a fraction of the pleasure this Chinese novel has given me, I shall not have lived in vain.” When translating the dialogues, Hawkes preferred stance bundles, as they serve many communicative functions (e.g., expressing attitudes, desire, directives, intentions, predictions, abilities) which render the dialogues more engaging. On the other hand, the Publisher’s Note of A Dream of Red Mansions Volume 1 stated that Hongloumeng is a book “about political struggle” (1978, iv), which “by presenting the prosperity and decline of the four typical noble families it truthfully lays bare the corruption and decadence of the feudal ruling class and points out its inevitable doom” (1978, vii). Though such a remark might result from self-censorship due to the political atmosphere of the time, such a depiction clearly shows that ideological factors greatly outweighed aesthetic ones in the case of the Yangs. When translation becomes a task assigned by officials, the translated work is meant to promote ideologies and hence leaves the translators little room for interpretation. Therefore, it is plausible that the Yangs opted for a more rigid approach in translating the novel.

13.6 Conclusion

This study set out to compare different translators’ use of lexical bundles in two Hongloumeng translations.
In line with Mastropierro’s (2018) suggestion, we affirm that lexical bundles can serve as a reliable indicator, beyond other lexical devices, for differentiating style in different translations. By examining the syntactic structures and functions of the key lexical bundles in Hawkes’s and the Yangs’ translations, we have found that the Yangs adopted a more literal and seemingly rigid approach to translating Hongloumeng, as evidenced by their use of key lexical bundles, which differs from that of Hawkes. Our study has yielded some preliminary evidence that translators’ styles may be influenced by the respective translator’s language background, life experiences, and translation purposes. This study is, however, not without limitations. Only translation works by two groups of translators (i.e., Hawkes and the Yangs) were sampled in the current study. Future studies can compare more translation versions of Hongloumeng to examine whether the use of lexical bundles differs among translators as a result of their sociocultural backgrounds and translation purposes. Besides, as argued by Li and Zhang (2010, 250), “[a] corpus as well as a statistical presentation of translation or language facts is not the ultimate goal of our research, but rather the beginning and foundation for real research on whatever research questions the project is addressing.” In this regard, more documentary evidence needs to be collected to verify the claims made on the basis of corpus frequency data.

Notes

1 An earlier version first appeared in Translation Quarterly (2020), Issue 98, pp. 79–101. This present version is updated and modified based on the earlier version.
2 For a more detailed review of recent studies on HLM, readers may refer to Moratto et al. (2022).
3 Based on UCREL’s (https://ucrel.lancs.ac.uk/llwizard.html) instructions on calculating log-likelihood and effect size, a log-likelihood value at or above the critical value of 6.63 means that the null hypothesis can be rejected (i.e., p < 0.01). Therefore, a log-likelihood value of 6.63 is set as the threshold for Key-LBs in the current study.
References
Axelsson, Karin. 2008. “Research on Fiction Dialogue: Problems and Possible Solutions.” In Corpora: Pragmatics and Discourse, edited by Andreas H. Jucker, Daniel Schreier, and Marianne Hundt, 189–201. Leiden: Brill.
Baker, Mona. 1993. “Corpus Linguistics and Translation Studies: Implications and Applications.” In Text and Technology: In Honor of John Sinclair, edited by Mona Baker, G. Francis, and E. Tognini-Bonelli. Amsterdam: John Benjamins.
Baker, Mona. 2000. “Towards a Methodology for Investigating the Style of a Literary Translator.” Target 12, no. 2: 241–66.
Biber, Douglas. 2009. “A Corpus-Driven Approach to Formulaic Language in English: Multiword Patterns in Speech and Writing.” International Journal of Corpus Linguistics 14, no. 3: 275–311.
Biber, Douglas, and Federica Barbieri. 2007. “Lexical Bundles in University Spoken and Written Registers.” English for Specific Purposes 26, no. 3: 263–86.
Biber, Douglas, Susan Conrad, and Viviana Cortes. 2004. “If You Look At . . . : Lexical Bundles in University Teaching and Textbooks.” Applied Linguistics 25, no. 3: 371–405.
Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad, and Edward Finegan. 1999. Longman Grammar of Spoken and Written English. London: Longman.
Bonsall, Bramwell Seaton, trans. 2004. The Red Chamber (Hongloumeng). Hong Kong: The University of Hong Kong. https://lib.hku.hk/bonsall/hongloumeng/title.pdf.
Bosseaux, Charlotte. 2007. How Does It Feel? Point of View in Translation: The Case of Virginia Woolf into French. Amsterdam: Rodopi.
Cao Xueqin, and E. Gao. 1982. Hong Lou Meng [in Chinese]. Beijing: People’s Literature Press.
Chen, Yuhua, and Paul Baker. 2010. “Lexical Bundles in L1 and L2 Academic Writing.” Language Learning & Technology 14, no. 2: 30–49.
Conrad, Susan M., and Douglas Biber. 2005. “The Frequency and Use of Lexical Bundles in Conversation and Academic Prose.” Lexicographica 20: 56–71. https://doi.org/10.1515/9783484604674.56.
Cortes, Viviana. 2004. “Lexical Bundles in Published and Student Disciplinary Writing: Examples from History and Biology.” English for Specific Purposes 23, no. 4: 397–423.
Crystal, David. 1999. The Penguin Dictionary of Language. London: Penguin Books.
Fang, Yu, and Haitao Liu. 2015. “Comparison of Vocabulary Richness in Two Translated Hongloumeng.” Glottometrics 31: 54–75.
Hawkes, David. 1973. The Story of the Stone Volume 1: The Golden Days. London: Penguin.
Hou, Yu. 2013. “A Corpus-Based Study of Nominalization as a Feature of Translator’s Style (Based on the English Versions of Hong Lou Meng).” Meta 58, no. 3: 556–73.
Hyland, Ken. 2008. “As Can be Seen: Lexical Bundles and Disciplinary Variation.” English for Specific Purposes 27, no. 1: 4–21.
Ji, Meng, and Michael P. Oakes. 2018. “A Corpus Study of Early English Translations of Cao Xueqin’s Hongloumeng.” In Quantitative Methods in Corpus-based Translation Studies, edited by Michael P. Oakes and Meng Ji, 177–208. Amsterdam: John Benjamins.
Leech, Geoffrey N., and Mick Short. 1981. Style in Fiction: A Linguistic Introduction to English Fictional Prose. London: Longman.
Li, Defeng, and Chunling Zhang. 2010. “Sense-Making in Corpus-Assisted Translation Research.” In Using Corpora in Contrastive and Translation Studies, edited by Richard Xiao, 235–54. Newcastle upon Tyne: Cambridge Scholars Publishing.
Li, Defeng, Chunling Zhang, and Kanglong Liu. 2011. “Translation Style and Ideology: A Corpus-Assisted Analysis of Two English Translations of Hongloumeng.” Literary and Linguistic Computing 26, no. 2: 153–66.
Liu, Kanglong, and Muhammad Afzaal. 2021. “Translator’s Style Through Lexical Bundles: A Corpus-Driven Analysis of Two English Translations of Hongloumeng.” Frontiers in Psychology 12. https://doi.org/10.3389/fpsyg.2021.633422.
Liu, Zequan. 2008. “Translating Tenor: With Reference to the English Versions of Hongloumeng.” Meta 53, no. 38: 528–48.
Mahlberg, Michaela, and Michael Hoey. 2012. Corpus Stylistics and Dickens’s Fiction. London: Routledge.
Mahlberg, Michaela, Viola Wiegand, Peter Stockwell, and Anthony Hennessey. 2019. “Speech Bundles in the 19th Century English Novel.” Language and Literature 28, no. 4: 326–53.
Mastropierro, Lorenzo. 2018. “Key Clusters as Indicators of Translator Style.” Target 30, no. 2: 240–59.
Minford, John. 2012. “A Tribute to Brother Stone.” In Style, Wit and Word-Play: Essays in Translation Studies in Memory of David Hawkes, edited by Tao Tao Liu, Laurence K. P. Wong, and Sin-wai Chan, 1–14. Newcastle upon Tyne: Cambridge Scholars Publishing.
Minford, John. n.d. “Hawkes’ Approaches to Translating Fiction.” In A335 Culture and Translation [Course Materials]. Hong Kong: OUHK.
Moratto, Riccardo, Kanglong Liu, and Di-kai Chao, eds. 2022. Dream of the Red Chamber: Literary and Translation Perspectives. London and New York: Routledge.
Mu, Yuanyuan. 2012. “Towards a Quantitative & Qualitative Stylistic Approach to Ideational Construal in the Translation of Narrative Discourse: Norms and Readers’ Responses Revisited – A Corpus-based Study on Hong-lou Meng and Its Two English Translations.” PhD diss., City University of Hong Kong.
Munday, Jeremy. 2012. Evaluation in Translation: Critical Points of Translator Decision Making. London and New York: Routledge.
Pan, Fan, and Chen Liu. 2019. “Comparing L1-L2 Differences in Lexical Bundles in Student and Expert Writing.” Southern African Linguistics and Applied Language Studies 37, no. 2: 142–57.
Pan, Fan, Randi Reppen, and Douglas Biber. 2016. “Comparing Patterns of L1 Versus L2 English Academic Professionals: Lexical Bundles in Telecommunications Research Journals.” Journal of English for Academic Purposes 21: 60–71.
Ran, Shiyang, and Ping Yang. 2013. “Breaking Through the Bottleneck: A Comparative Investigation into the Chinese-English Translation Studies of ‘Hong Lou Meng’ [in Chinese].” China Publishing Journal 12: 61–3.
Saldanha, Gabriela. 2011. “Translator Style: Methodological Considerations.” The Translator 17, no. 1: 25–50.
Scott, Mike. 2020. WordSmith Tools Version 8. Stroud: Lexical Analysis Software.
Shrefler, Nathan. 2011. “Lexical Bundles and German Bibles.” Literary and Linguistic Computing 26, no. 1: 89–106.
Su, Ke. 2021. “Translation of Metaphorical Idioms: A Case Study of Two English Versions of Hongloumeng.” Babel 67, no. 3: 332–54.
Tong, Jasmine Man, and David Morgan. 2021. “‘Twice Bitten’ Two Men and a Translation: The Making of the Stone.” Babel 67, no. 6: 791–818.
Tsao, Liqun. 2020. “On Translation of Social Terms in Hong Lou Meng.” Arts Studies and Criticism 1, no. 3: 68–73.
Wang, Ning. 2016. “Chinese Literature as World Literature.” Canadian Review of Comparative Literature 43, no. 3: 380–92.
Wei, Naixing. 2007. “Phraseological Characteristics of Chinese Learners’ Spoken English: Evidence of Lexical Chunks from COLSEC.” Modern Foreign Languages 30, no. 3: 281–91.
Wei, Yaoyu, and Lei Lei. 2011. “Lexical Bundles in the Academic Writing of Advanced Chinese EFL Learners.” RELC Journal 42, no. 2: 155–66.
Wu, Chungming. 2021. “Towards Norms in Two Translations of Hong Lou Meng: A Corpus-based Study.” PhD diss., The Hong Kong Polytechnic University.
Yan Minmin 閆敏敏. 2005. “Ershinianlai de Hongloumeng yingyi yanjiu 二十年來的《紅樓夢》英譯研究 [Twenty Years’ Studies on the Translation of Hongloumeng into English].” Foreign Language Education 外語教學 26, no. 4: 64–68.
Yang, Hsien-Yi, and Gladys Yang, trans. 1978. A Dream of Red Mansions Volume 1. Beijing: Foreign Languages Press.
Yang, Xianyi. 2002.
White Tiger: An Autobiography of Yang Xianyi. Hong Kong: Chinese University Press.
Yip, Virginia. 1995. Interlanguage and Learnability: From Chinese to English. Amsterdam: John Benjamins.
Yu, Ke. 2020. “A Comparative Study of the Translation of Material Culture-loaded Words of Hongloumeng in the Light of Skopostheorie.” Journal of Language Teaching and Research 11, no. 2: 318–23.

Appendix A Yangs’ 3-Word and 4-Word Key-LBs

Key-LBs                Freq.  BIC    Log-Likelihood  Log-Ratio  P-Value
A FEW CUPS             10     3.34   16.47           1,059.58   <0.001
ARE WE TO              11     4.99   18.11           1,059.71   <0.001
AS THE PROVERB         19     4.11   17.23           3.02       <0.001
AS THE PROVERB SAYS    18     2.76   15.88           2.94       <0.001
BOUND TO BE            20     19.81  32.94           1,060.58   <0.001
BUT MIND YOU           10     3.34   16.47           1,059.58   <0.001
CARRY ON LIKE          12     6.64   19.76           1,059.84   <0.001
COULD IT BE            10     3.34   16.47           1,059.58   <0.001
COUPLE OF DAYS         30     6.41   19.54           2.26       <0.001
DO SUCH A              10     3.34   16.47           1,059.58   <0.001
DO YOU EXPECT          11     4.99   18.11           1,059.71   <0.001
DOES IT MATTER         12     6.64   19.76           1,059.84   <0.001
DON’T YOU KNOW         11     4.99   18.11           1,059.71   <0.001
EVEN IF HE             10     3.34   16.47           1,059.58   <0.001
FOR A COUPLE           18     2.76   15.88           2.94       <0.001
FOR A COUPLE OF        18     2.76   15.88           2.94       <0.001
FOR A STROLL           12     6.64   19.76           1,059.84   <0.001
FOR A WHILE            29     7.26   20.38           2.40       <0.001
FOR NO REASON          19     18.17  31.29           1,060.50   <0.001
HAVE SUCH A            10     3.34   16.47           1,059.58   <0.001
HAVE THE SAME          12     6.64   19.76           1,059.84   <0.001
HIGH AND LOW           11     4.99   18.11           1,059.71   <0.001
HOW CAN I              36     13.68  26.81           2.52       <0.001
HOW CAN WE             25     28.05  41.17           1,060.90   <0.001
HOW CAN YOU            61     29.16  42.29           2.38       <0.001
HOW COULD I            20     5.47   18.59           3.09       <0.001
HOW IT IS              14     9.93   23.05           1,060.06   <0.001
HURRY UP AND           37     2.67   15.79           1.66       <0.001
I MEANT TO             17     14.87  27.99           1,060.34   <0.001
I’D NO IDEA            10     3.34   16.47           1,059.58   <0.001
IF NOT FOR             18     16.52  29.64           1,060.43   <0.001
IT’S NO USE            24     26.4   39.52           1,060.84   <0.001
IT’S NOT THAT          10     3.34   16.47           1,059.58   <0.001
JUST WHAT I            11     4.99   18.11           1,059.71   <0.001
MUCH THE BETTER        12     6.64   19.76           1,059.84   <0.001
MY ADVICE AND          12     6.64   19.76           1,059.84   <0.001
NOTHING BUT A          11     4.99   18.11           1,059.71   <0.001
ON LIKE THIS           20     19.81  32.94           1,060.58   <0.001
ON THE SLY             14     9.93   23.05           1,060.06   <0.001
SAY ONE WORD           10     3.34   16.47           1,059.58   <0.001
SO AS TO               30     19.64  32.76           3.68       <0.001
SO HOW CAN             13     8.28   21.41           1,059.96   <0.001
SO LONG AS             15     11.58  24.70           1,060.16   <0.001
SO MUCH THE BETTER     12     6.64   19.76           1,059.84   <0.001
TAKE MY ADVICE         13     8.28   21.41           1,059.96   <0.001
TAKE MY ADVICE AND     11     4.99   18.11           1,059.71   <0.001
THE BLAME ON           11     4.99   18.11           1,059.71   <0.001
THIS CHANCE TO         11     4.99   18.11           1,059.71   <0.001
THIS IS JUST           15     11.58  24.70           1,060.16   <0.001
TO ASK FOR             26     3.68   16.80           2.25       <0.001
TO SEE TO              13     8.28   21.41           1,059.96   <0.001
TO SHOW MY             10     3.34   16.47           1,059.58   <0.001
WHAT DOES IT MATTER    12     6.64   19.76           1,059.84   <0.001
WHY NOT GO             11     4.99   18.11           1,059.71   <0.001
WHY SHOULD WE          15     11.58  24.70           1,060.16   <0.001
WOULDN’T THAT BE       20     19.81  32.94           1,060.58   <0.001
YOU DON’T UNDERSTAND   18     16.52  29.64           1,060.43   <0.001

Source: * Only Key-LBs with log-likelihood > 6.63 (for p-value < 0.01) are listed here.

Appendix B Hawkes’s 3-Word and 4-Word Key-LBs

Key-LBs                Freq.  BIC    Log-Likelihood  Log-Ratio  P-Value
A BIT AND              16     5.37   18.49           1,059.90   <0.001
A BIT BETTER           17     6.53   19.65           1,059.99   <0.001
A BIT OF               67     36.24  49.36           3.39       <0.001
A BIT TOO              15     4.22   17.34           1,059.81   <0.001
A FEW MINUTES          15     4.22   17.34           1,059.81   <0.001
A GOOD JOB             17     6.53   19.65           1,059.99   <0.001
A LOT OF               91     16.88  30.01           1.69       <0.001
A MATTER OF            33     3.66   16.78           2.37       <0.001
A QUESTION OF          16     5.37   18.49           1,059.90   <0.001
A THING LIKE THIS      14     3.06   16.18           1,059.71   <0.001
A WORD WITH            36     6.32   19.44           2.49       <0.001
ABLE TO SEE            14     3.06   16.18           1,059.71   <0.001
AND AFTER THAT         14     3.06   16.18           1,059.71   <0.001
AND GET IT             21     11.15  24.27           1,060.29   <0.001
AND I DON’T            18     7.68   20.81           1,060.07   <0.001
AND IN ANY             14     3.06   16.18           1,059.71   <0.001
AND IN ANY CASE        14     3.06   16.18           1,059.71   <0.001
ARE GOING TO           33     9.31   22.43           3.11       <0.001
ARE IN THE             14     3.06   16.18           1,059.71   <0.001
AS A MATTER            29     2.61   15.73           2.50       <0.001
AS A MATTER OF         29     2.61   15.73           2.50       <0.001
AWAY WITH IT           15     4.22   17.34           1,059.81   <0.001
BE A BIT               20     9.99   23.12           1,060.22   <0.001
EXACTLY THE SAME       14     3.06   16.18           1,059.71   <0.001
FOR A BIT              35     8.13   21.25           2.78       <0.001
GET ON WITH            43     10.03  23.15           2.49       <0.001
GOING TO BE            47     19.75  32.87           3.20       <0.001
GOING TO DO            20     9.99   23.12           1,060.22   <0.001
GOT TO HEAR            14     3.06   16.18           1,059.71   <0.001
HAVEN’T GOT ANY        15     4.22   17.34           1,059.81   <0.001
HEAR ABOUT IT          14     3.06   16.18           1,059.71   <0.001
I AM AFRAID            31     22.71  35.83           1,060.86   <0.001
I AM NOT               18     7.68   20.81           1,060.07   <0.001
I AM SURE              28     19.24  32.36           1,060.71   <0.001
I DON’T KNOW WHY       19     8.84   21.96           1,060.15   <0.001
I DON’T THINK          53     16.44  29.56           2.57       <0.001
I HAVE BEEN            20     9.99   23.12           1,060.22   <0.001
I HOPE YOU             23     13.46  26.59           1,060.42   <0.001
I SHOULD HAVE          52     6.74   19.87           1.89       <0.001
I SHOULD LIKE          28     19.24  32.36           1,060.71   <0.001
I SHOULD LIKE TO       21     11.15  24.27           1,060.29   <0.001
I SHOULD THINK         19     8.84   21.96           1,060.15   <0.001
I THINK I              29     20.40  33.52           1,060.76   <0.001
I THINK IT             25     15.77  28.90           1,060.55   <0.001
I THINK IT’S           19     8.84   21.96           1,060.15   <0.001
I THINK WE             27     18.09  31.21           1,060.66   <0.001
I THINK WE OUGHT       14     3.06   16.18           1,059.71   <0.001
I THINK YOU            43     36.58  49.70           1,061.33   <0.001
I THOUGHT I’D          14     3.06   16.18           1,059.71   <0.001
I THOUGHT YOU          14     3.06   16.18           1,059.71   <0.001
I WONDER IF            15     4.22   17.34           1,059.81   <0.001
IF YOU ARE             32     5.34   18.46           2.65       <0.001
IF YOU ASK             23     13.46  26.59           1,060.42   <0.001
IF YOU ASK ME          20     9.99   23.12           1,060.22   <0.001
IF YOU WILL            25     15.77  28.90           1,060.55   <0.001
I’M AFRAID I           25     15.77  28.90           1,060.55   <0.001
I’M NOT SURPRISED      16     5.37   18.49           1,059.90   <0.001
I’M SURE YOU           19     8.84   21.96           1,060.15   <0.001
IN ANY CASE            69     23.09  36.21           2.43       <0.001
IS A VERY              20     9.99   23.12           1,060.22   <0.001
IS GOING TO BE         14     3.06   16.18           1,059.71   <0.001
IS SUCH A              16     5.37   18.49           1,059.90   <0.001
IS THE ONE             14     3.06   16.18           1,059.71   <0.001
IT MUST HAVE           14     3.06   16.18           1,059.71   <0.001
IT SEEMS THAT          23     13.46  26.59           1,060.42   <0.001
I’VE JUST BEEN         15     4.22   17.34           1,059.81   <0.001
KNOW WHAT THEY         16     5.37   18.49           1,059.90   <0.001
LOOK AT YOU            16     5.37   18.49           1,059.90   <0.001
ME ABOUT IT            19     8.84   21.96           1,060.15   <0.001
NOT GOING TO           38     10.97  24.09           2.89       <0.001
OF THESE DAYS          17     6.53   19.65           1,059.99   <0.001
ONE OF THESE DAYS      17     6.53   19.65           1,059.99   <0.001
OUGHT NOT TO           15     4.22   17.34           1,059.81   <0.001
OUGHT TO BE            60     22.89  36.02           2.75       <0.001
OUT OF HERE            15     4.22   17.34           1,059.81   <0.001
SAY THAT I             14     3.06   16.18           1,059.71   <0.001
SHALL BE ABLE          16     5.37   18.49           1,059.90   <0.001
SHALL BE ABLE TO       16     5.37   18.49           1,059.90   <0.001
SHE HAS BEEN           16     5.37   18.49           1,059.90   <0.001
SHOULD LIKE TO         22     12.31  25.43           1,060.36   <0.001
SORT OF PERSON         18     7.68   20.81           1,060.07   <0.001
SORT OF THING          45     14.62  27.74           2.82       <0.001
SUPPOSED TO BE         20     9.99   23.12           1,060.22   <0.001
SURE TO BE             15     4.22   17.34           1,059.81   <0.001
TALK TO YOU            19     8.84   21.96           1,060.15   <0.001
TELL HER THAT          16     5.37   18.49           1,059.90   <0.001
TELL THEM THAT         18     7.68   20.81           1,060.07   <0.001
THAT I AM              27     18.09  31.21           1,060.66   <0.001
THAT I SHALL           15     4.22   17.34           1,059.81   <0.001
THAT I WAS             15     4.22   17.34           1,059.81   <0.001
THAT IF I              17     6.53   19.65           1,059.99   <0.001
THAT IF YOU            17     6.53   19.65           1,059.99   <0.001
THAT IT IS             14     3.06   16.18           1,059.71   <0.001
THAT SORT OF           34     26.18  39.30           1,060.99   <0.001
THAT SORT OF THING     18     7.68   20.81           1,060.07   <0.001
THAT THEY ARE          18     7.68   20.81           1,060.07   <0.001
THAT WE SHOULD         15     4.22   17.34           1,059.81   <0.001
THAT YOU ARE           30     6.39   19.51           2.97       <0.001
THAT YOU HAVE          37     7.22   20.34           2.53       <0.001
THE WAY I              17     6.53   19.65           1,059.99   <0.001
THERE IS A             27     3.52   16.64           2.82       <0.001
THERE WOULD BE         14     3.06   16.18           1,059.71   <0.001
THING LIKE THAT        17     6.53   19.65           1,059.99   <0.001
THINGS LIKE THAT       19     8.84   21.96           1,060.15   <0.001
THINK OF IT            15     4.22   17.34           1,059.81   <0.001
THINK WE OUGHT         19     8.84   21.96           1,060.15   <0.001
THINK WE OUGHT TO      19     8.84   21.96           1,060.15   <0.001
THINK YOU OUGHT        18     7.68   20.81           1,060.07   <0.001
THINK YOU OUGHT TO     17     6.53   19.65           1,059.99   <0.001
TO DO IS               22     12.31  25.43           1,060.36   <0.001
TO DO SOMETHING        14     3.06   16.18           1,059.71   <0.001
TO HAVE BEEN           17     6.53   19.65           1,059.99   <0.001
TO HEAR ABOUT          16     5.37   18.49           1,059.90   <0.001
TO TALK TO YOU         14     3.06   16.18           1,059.71   <0.001
TO TELL ME             35     5.42   18.55           2.45       <0.001
TO THINK THAT          16     5.37   18.49           1,059.90   <0.001
TO YOU ABOUT           14     3.06   16.18           1,059.71   <0.001
WANT TO GO             30     21.55  34.68           1,060.81   <0.001
WE OUGHT TO            49     10.45  23.57           2.26       <0.001
WHAT IT IS             25     15.77  28.90           1,060.55   <0.001
WHAT YOU ARE           16     5.37   18.49           1,059.90   <0.001
WHAT YOU HAVE          20     9.99   23.12           1,060.22   <0.001
WHEN YOU ARE           27     18.09  31.21           1,060.66   <0.001
WHILE YOU ARE          14     3.06   16.18           1,059.71   <0.001
YOU ARE GOING          17     6.53   19.65           1,059.99   <0.001
YOU ARE GOING TO       15     4.22   17.34           1,059.81   <0.001
YOU ARE NOT            20     9.99   23.12           1,060.22   <0.001
YOU ARE TOO            17     6.53   19.65           1,059.99   <0.001
YOU DON’T NEED TO      14     3.06   16.18           1,059.71   <0.001
YOU KNOW WHAT          35     3.08   16.20           2.19       <0.001
YOU OUGHT TO           100    15.51  28.64           1.53       <0.001
YOU OUGHT TO BE        22     12.31  25.43           1,060.36   <0.001
YOU THINK THAT         16     5.37   18.49           1,059.90   <0.001
YOU WILL BE            28     4.47   17.59           2.87       <0.001
YOU WOULD BE           17     6.53   19.65           1,059.99   <0.001
YOU WOULD HAVE         14     3.06   16.18           1,059.71   <0.001
YOU’LL BE ABLE         15     4.22   17.34           1,059.81   <0.001
YOU’LL BE ABLE TO      15     4.22   17.34           1,059.81   <0.001
YOU’VE GOT TO          16     5.37   18.49           1,059.90   <0.001

Source: * Only Key-LBs with log-likelihood > 6.63 (for p-value < 0.01) are listed here.

14 Mapping Culture-Specific and Creative Metaphors in Lu Xun’s Short Stories by L1 and L2 English Translators: A Corpus-Assisted Relevance-Theoretical Account
Linping Hou and Defeng Li

14.1 Introduction
Lu Xun (1881–1936), the father of modern Chinese literature, is a pioneering writer of Chinese short stories in a new Western style (Hsia 1961, 28). His short stories have been translated into more than 50 foreign languages, available to the reader or critics in the Eastern and Western world.
To date, four English translations of Lu’s short stories have been most influential: three selections (Lyell 1990; Wang 1941; Yang and Yang 1956) and one complete version (Lovell 2009). Lu arguably gained his reputation in the English-speaking world owing to these (re)translations by L1 English translators (hereafter, L1 translators) as well as L2 English translators (hereafter, L2 translators). The English translations of Lu’s short stories have attracted considerable attention from researchers in translation studies. Methodologically, most of the inquiries were prescriptive case studies; only a few empirical explorations undertook a descriptive analysis of these English (re)translations, along with social or cultural explanations. Among these empirical studies, a corpus-assisted method has been used to investigate linguistic (lexical or syntactic) features (e.g., Shao 2018; Shao and Wang 2018) and translator style (e.g., Li et al. 2018; Xu and Jiang 2020; Yan and Han 2015). It is worth pointing out that the directionality of these English translations, that is, forward (L1 to L2) translation and backward (L2 to L1) translation, has not been investigated systematically. Among the few corpus-assisted studies of directionality, Xu and Jiang (2020) conducted a case analysis of translation direction by examining linguistic features of L1 and L2 English translations of Ah Q Zhengzhuan (阿Q正傳), one of Lu’s short stories. These corpus-assisted studies hardly targeted the patterns of translation strategies for specific concepts (e.g., metaphor) or their underlying cognitive traits. One feature of Lu’s narrative art lies in its rhetoric, particularly the deployment of metaphorical expressions.
DOI: 10.4324/9781003298328-15
Most of these linguistic metaphors are characterized by culture-specific colors, which are regarded as “the most important particular problem” (Newmark 1988, 104) in translation for their “creative violation of the semantic rules in a linguistic system” (Schäffner 2017, 249). However, as a salient rhetorical feature in the source text (ST) of Lu’s short stories, metaphor and its transformation into the target text (TT) have scarcely been investigated with the assistance of corpus tools. This might be due to the challenge of manual semantic annotation and the complexity of different metaphorical mappings across languages and time (Stefanowitsch 2006, 2). Beyond research on the English rendition of metaphors in Lu’s short stories, Rodríguez Márquez (2010) and Shuttleworth (2017) used corpus tools to examine the patterns of translation strategies for political and scientific metaphors, respectively. However, both were product-oriented studies from the perspective of conceptual metaphor. In this scenario, there is a need for a corpus-assisted study of the rendering process of linguistic metaphors in Lu’s stories from an alternative (e.g., cognitive-pragmatic) perspective. A corpus-assisted process analysis of metaphor mapping across languages and time might also be more challenging to conduct than a corpus-assisted product analysis. The former has seldom been employed by researchers (Lang and Li 2020, 91), although such purely process-oriented methods as verbal reports, keystroke logging, and/or eye tracking have gradually attracted the attention of translation process researchers exploring metaphors in translation (Schäffner 2017; Schäffner and Chilton 2020; Schäffner and Shuttleworth 2013). A few scholars (e.g., Chou et al.
2016; Huang 2020) used corpus-assisted process analysis to investigate the translation strategies and neurocognitive processing routes of culture-specific items in the Chinese rendition of literary texts written in English. However, they treated metaphorical expressions as a whole under the umbrella term of culture-specific items, without considering culture-universal metaphors or distinguishing conventional from creative metaphors. In the same vein, Lang and Li’s (2020) corpus-assisted process study of culture-specific metaphors in simultaneous interpreting did not scrutinize culture-universal or conventional/creative metaphors. More importantly, Hou (2017) conducted a corpus-assisted cognitive study of translation strategy patterns of linguistic metaphors in English translations of Lu’s short stories by taking metaphors as a whole without further categorizing them into different types in terms of culture specificity and conventionality. Moreover, Hou (ibid.) considered processing routes of metaphors from a neurocognitive perspective instead of a communicative one. To fill this gap, the present study contrasted mapping strategies of culture-specific, culture-universal, conventional, and creative metaphors within a cognitive-pragmatic framework with the help of a bilingual parallel corpus. There were two objectives in the present study: pattern description and theoretical explanation. First, it aimed to describe and contrast translation-strategy patterns of metaphors quantitatively. To this end, a bidirectional parallel corpus was constructed to identify and contrast the patterns of translation strategies adopted by L1 and L2 professional translators in their rendition of metaphors as a whole as well as of their subcategories (i.e., culture-specific, culture-universal, conventional, and creative ones). Second, it aimed to analyze the patterns within the theoretical framework of relevance theory (RT; Sperber and Wilson 1986/1995).
Within the RT framework, the between-group (i.e., L1 vs. L2 translators in dealing with different categories of metaphors) and within-group (i.e., among the individual translators) analyses drew on cognitive-pragmatic principles as well as on the degree to which translators’ subjectivity led them to violate these principles in the translation process of linguistic metaphors. Accordingly, there were three research questions in the current study, as follows.
(1) What were the rendering patterns of metaphors in English translations of Lu’s short stories by L1 and L2 translators respectively?
(2) Contrastively, what were the similarities and differences between and within these groups of translators?
(3) Cognitive-pragmatically, why were there such similarities and differences?
To answer the questions, Section 14.2 of the present chapter introduced RT as a theoretical framework in metaphor translation. Section 14.3 described the bidirectional bilingual parallel corpus for identifying translation-strategy patterns of metaphors. Section 14.4 reported the corpus results and data analyses, and Section 14.5 discussed the findings from the perspective of RT.

14.2 Relevance Theory as the Theoretical Framework
RT, proposed by Sperber and Wilson (1986/1995), has been viewed as the primary theoretical framework in cognitive pragmatics and as “the only cognitive-pragmatic approach within translation studies” (Gallai 2019, 51). It thus serves as a hub to relate a translator’s mental process to his/her rendering activity, a communicative act. This section presented a relevance-theoretic approach to pragmatics and translation in general and its applications to metaphors in translation in particular.

14.2.1 RT and Translation
Sperber and Wilson (1986/1995) proposed two modes of human communication: the coding-decoding mode and the inferential mode.
The former views communication as a linear coding-decoding process, while the latter sees it as an inferential computing process with the help of contextual assumptions, that is, the cognitive environment. The two modes are necessarily utilized in a complex communicative interaction between speakers and hearers in the form of a search for relevance. Defined as “a property of inputs to cognitive processes and analyzed in terms of the notions of cognitive effect and processing effort” (Wilson 2000, 420), relevance could be achieved by the audience in an assumed ostensive-inferential process of either explicit or implicit communication. Relevance-oriented communication is governed by two principles proposed by Sperber and Wilson (1995, 260).
(1) Cognitive principle of relevance. Human cognition tends to be geared to the maximization of relevance.
(2) Communicative principle of relevance. Every act of communication conveys a presumption of its own optimal relevance.
The degree of relevance is regulated by the listener’s cognitive effect in a given context of mutual manifestation and mediated by his/her effort to infer the communicator’s informative or communicative intentions (Sperber and Wilson 2012, 102). Cognitively, other things being equal, the greater the positive effect yielded by an input, the greater its relevance to an individual, and the more effort is worth exerting. However, the cognitive principle implies that an audience with a given level of inferential abilities and preferences tends to get the maximal effect by expending the least necessary effort, resulting in maximal relevance. Communicatively, every communicative act is viewed as a relevance-optimizing process during which the addressee exerts enough effort to understand the addresser’s intentions. The communicative principle indicates that no extra effort is needed as long as the optimal effect is achieved.
Altogether, the process of communication is economically conditioned by the principle of relevance in terms of cognitive (processing) effort and cognitive (contextual) effect: (1) the least effort but the most effect, and (2) enough effort with satisfactory effect. RT was first applied to translation as a unified theoretical framework by Gutt (1991/2000) and empirically advanced in written translation by such scholars as Alves (1995) and Alves and Gonçalves (2003, 2007). Its applications to interpreting and audiovisual translation (for a review, see Gallai 2019; also see Alves 2020) were not presented here because the present study is concerned with written translation. Although the RT approach to translation has been criticized by some researchers (Gallai 2019, 61–62, 65), this line of research is fruitful in translation studies by viewing translation as an interlingual interpretive use of language and by considering the translator as a carrier striving for interpretive resemblance between the ST and the TT at the level of propositional form, thought, or utterance. The interpretive resemblance could be realized by “direct translation,” a term defined by Gutt (2000, 171) as follows:
[A] receptor language utterance is a direct translation of a source language utterance if and only if it purports to interpretively resemble the original completely in the context envisaged for the original.
In this respect, a translator is guided by the ST author’s communicative clue, that is, a sign invoking inference and response or a starting point for poetic effect (Carston 2002, 117; Mackenzie 2002, 5; also see Walker 2021, 187), to seek contextual resemblance to that of the ST. By direct translation, the authentic meaning of the original should be “unaffected by the translator’s own interpretation effort” (Gutt 1991/2000, 186). Conversely, the other term, “indirect translation,” is “designed to function on their own – e.g.
a touristic leaflet – and may be modified in order to achieve maximal relevance for the users” (Gallai 2019, 58). Apart from the interpretive resemblance between the ST and the TT in context, translation could be viewed as a secondary communication, “an act of communication between translator and target audience only” (Gutt 2000, 213). Based on his/her judgment of the contextual assumptions of the TT receiver, the translator is therefore required to meet the TT reader’s expectations by making decisions on such crucial choices as employing direct or indirect translation, conveying informative and communicative intentions, and strategically using explicitation and implicitation. The translator and the TT reader, therefore, need to seek shared basic assumptions about interpretive resemblance and agreement between the translator’s intentions and the reader’s expectations in order to succeed in their cross-language communication (Gutt 2000, 192). Additionally, the three players in translation (i.e., the ST author, the translator, and the TT reader) might have different cognitive environments. To satisfy the expectations of relevance, the translator performs a kind of “mutual adjustment” (Carston 2002) of two parallel processes in translation, that is, decoding the ST linguistic items into explicature and making contextual assumptions from the explicit content. Importantly, translation competence is a vital factor in the translator’s adjustment of inference (Gutt 2005). In principle, other things being equal, higher translation competence tends to result in successful translation by means of optimal relevance. Alves and Gonçalves (2007) modeled the translator’s competence within the relevance-theoretical framework and claimed that competence could ensure the interpretive resemblance between the ST and the TT.
Moreover, human cognition is governed by a min-max relation between effort and effect, but more effort could be exerted if there is a reward for the reader in the form of poetic effects (Gutt 2000, 164). Greater effect justifies greater effort when the ST input is implicit rather than explicit content from which a translator must make inferences on the basis of contextual assumptions. Other things being equal, this implicitness requires more effort to process, just as unshared, creative, and poetic items function as communicative clues in translation (Walker 2021, 186). To sum up, the translator’s choices within the RT framework are guided by the principle of relevance. The degree of relevance is, in general, evaluated by the translator’s cognitive effect and processing effort and, in particular, triggered by the source input (e.g., implicitness/explicitness, culture-shared/unshared items, creative/conventional usage) and modulated by his/her translation competence in the context of background information (e.g., the bilingual and bicultural background knowledge and inferential skills used in the process of translation to yield new cognitive effects). Theoretically, the translator’s decision-making is embedded in the search for an interactive balance between the interpretive resemblance of the original content and the satisfaction of the expectations of the target reader. In other words, the translator is expected to produce an equivalent effect between the ST and TT readers (Nida 1964, 159) so that cognitive equivalence could be realized by reaching a similar cognitive effect between the ST and TT readers in terms of the rendition of communicative clues (Walker 2021, 197–98). In practice, however, he/she might maintain interpretive resemblance but sacrifice the expectations of the target reader so as to show his/her own translation intention (e.g., attitude or voice).
It is therefore advisable for an RT translation researcher to take source input and the translator’s competence as the essential influential factors.

14.2.2 RT and Metaphors in Translation
Linguistic metaphor within the RT framework has been explored robustly in monolingual communication (Carston 2017) but not in interlingual translation. This subsection presented RT views on metaphor and examined their plausibility in accounting for metaphor in translation.

14.2.2.1 RT Views on Metaphor Understanding
A metaphor might be a departure from the normal use of language, as explored in classical rhetoric, or a way of thinking, as examined in cognitive linguistics. However, metaphor in RT is viewed as a phenomenon of loose (i.e., not strictly literal) use of language, requiring no special mechanism or procedure beyond those used for literal and other loose uses in verbal communication (Sperber and Wilson 1995, 237; also see Carston 2017, 43). Thus, the RT approach to metaphor comprehension is complementary to metaphor research from the perspective of cognitive linguistics (Gibbs and Tendahl 2006; Tendahl and Gibbs 2008; Wilson 2011). There are two distinct relevance-driven processing routes to metaphor comprehension: one is via an ad hoc concept, the other via literal meaning. The deployment of the two routes depends on “a range of factors including the degree of familiarity, complexity, and creativity of the metaphor” (Carston 2017, 43). The pragmatic account of metaphorical utterance aims to “explain how hearers recognize the intended meaning of a metaphorical utterance in context” (Wilson 2011, 180). According to the traditional view in RT, hearers understand linguistic metaphors by mutual parallel adjustment of linguistic and contextual clues to create a novel ad hoc concept, which is different from the linguistically encoded meaning (Wilson 2011, 179).
In principle, metaphorical comprehension, like the understanding of other literal or nonliteral uses, is regulated by relevance; that is, the audience exerts the least effort in search of implications and stops at the first interpretation that satisfies its expectations of implicatures. As for the array of weak implicatures recognized in some metaphorical utterances, they might activate a powerful poetic effect on the hearers and encourage further exploration. Compared with conventional metaphors, novel metaphors might involve analogical processing in deriving implicatures and generating ad hoc concepts to understand them (Wearing 2014). Departing from the traditional RT view that metaphorical utterances (esp. conventional ones) are comprehended merely by resorting to weak implicatures for ad hoc concept construction, Carston (2017, 51–52) claimed that there was an alternative, relevance-driven route to novel metaphor understanding. The alternative is far from “the quick, local, on-line meaning-adjustment process,” “but a slower, more global appraisal of the literal meaning of the metaphorical language from which inferences about the speaker’s meaning are made” (Carston 2017, 52). For example, image metaphors (esp. new and creative ones) should be interpreted more slowly and globally. Such image metaphors produce nonpropositional (experiential) effects, whether sensory, imagistic, or affective, which cannot be represented by explicatures or implicatures but require further reflective inferential processing on the basis of literal meaning. Some experimental data are consistent with Carston’s current position, suggesting that metaphoric reading is far from cost-free with respect to familiarity and creativity (Noveck 2018, 166, 169).
14.2.2.2 RT and Interlingual/Intercultural Metaphor Rendition

Most previous studies of metaphor in translation, whether as product or process, have been conducted from the perspective of conceptual metaphor theory (see Schäffner 2004, 2017; Schäffner and Chilton 2020, 326–43; Schäffner and Shuttleworth 2013; Trim and Śliwa 2019), but little attention has been paid to linguistic metaphors in translation from the angle of RT (see Alves 2020; Gallai 2019). One of the core issues of the former line of research is the “culture-specific or universal metaphors and consequences for translation” (Schäffner 2017, 252), where the mapping patterns of metaphor in translation have been identified and analyzed within the theoretical framework of conceptual metaphor, with or without the assistance of bilingual or multilingual corpora.

Cultural factors have been addressed quite frequently in Translation Studies, with cultural differences being identified as obstacles to the semantic transfer of metaphors. . . . In investigating cultural differences, Translation Studies scholars also build on Metaphor Studies research into the ways conceptual metaphors are expressed linguistically in different languages. (Schäffner 2017, 252)

The culture-specific constraint on rendering metaphors is therefore one of the main reasons for the choice of a translation strategy as well as for some identified patterns (Schäffner and Chilton 2020, 339). However, it is worth pointing out that, monolingually, a corpus-assisted pragmatic study of both conventional and novel metaphors within the RT framework is rare in comparison with the line of research in cognitive corpus linguistics, which focuses on conventional rather than novel uses of language (Kolaiti and Wilson 2014). The same holds for corpus-assisted pragmatic studies of the bilingual translation process of novel and culture-specific metaphors.
Kolaiti and Wilson’s (2014) corpus analysis of monolingual lexical pragmatics would shed light on bilingual translations of the aforementioned types of metaphors. As early as the 1990s, Gutt (1991) investigated metaphorical expression within the RT framework in his pioneering study. More recently, this line of study has been followed up by other scholars (e.g., Boase-Beier 2011). These early cognitive-pragmatic explorations of metaphor in translation paved the methodological way for further study, although they were not assisted by corpus tools. In this case, corpus-assisted research on metaphors in translation within the RT framework needs to build on monolingual studies (e.g., Kolaiti and Wilson 2014), since there are no previous bilingual ones for reference. Furthermore, professional translators’ selection of strategies for translating novel and/or culture-specific metaphors needs to be revisited, because translation competence is a critical factor constraining the metaphor rendition strategy. In short, it is plausible to adopt RT to analyze metaphors in translation, as early studies have shown. The choice of translation strategy for metaphors within RT depends on key constraints such as the translator’s metaphor competence (e.g., familiarity) and the types of metaphors (e.g., novel/creative or conventional; culture-specific or culture-universal).

14.2.3 Plausibility of RT in the Current Study

The main reason for applying RT to the present research lay in the gap-filling exploration of rendering strategies in terms of the subcategories of metaphorical expressions as well as the translator’s competence. As mentioned in Section 14.1, metaphors in English translations of Lu’s short stories were mainly examined from linguistic/social/cultural perspectives rather than a cognitive-pragmatic one.
Additionally, most relevance-theoretical studies of metaphors in translation neglected the novelty and culture specificity of literary poetic metaphors, not to mention those in Lu’s short stories as rendered by L1 and L2 translators. It is worth noting that both the L1 and L2 translators in the present study were professional translators. Therefore, translation competence in the current research was viewed as a well-defined influential factor. The other reasons lay in the seamless connection between RT and metaphor in translation. First, RT has adequate explanatory power in interlingual and intercultural communication (i.e., translation; for a detailed review, see Gallai 2019, 51–72). It is thus advisable to apply RT to examine metaphors, one of the important and difficult issues in translation. Second, unlike external social-cultural analysis and abstract cognitive-linguistic explanation, RT is characterized by psychological reality (Gibbs and Tendahl 2006, 379–403) and focuses on the inner (mental) world of translators and the communicative context (the speaker’s intention and the reader’s expectation). In this respect, RT could provide a cognitive-pragmatic analysis for translating metaphors in the present study. Third, with the development of metaphor research from usage to cognition, there is a trend of taking communication as an angle from which to investigate metaphors (Steen 2011, 26). Accordingly, the exploration of metaphor in cross-language/culture communication needs a theoretical framework that keeps pace with the trend in cognitive-pragmatic analysis. It is therefore reasonable to say that, as connecting metaphor with the relevance-driven idea would be a groundbreaking research area (Carston 2017, 42–43), RT might be one of the best choices for exploring metaphor in translation. It could therefore be predicted that RT possesses considerable power in its cognitive-pragmatic account of the mental processes of metaphor rendition.
RT might tell us much more about the aspects of cognition in communication than the cognitive-linguistic approach to renditions of metaphors. It could serve as a sound theoretical tool to probe into the interplay of effort and effect in the L1 and L2 translators’ translation processes in the present research.

14.3 Research Method

As mentioned previously, it is plausible for the researchers to adopt the corpus-assisted process research method to investigate the renditions of novel and culture-specific metaphors. Additionally, these metaphors could be explained within the RT framework by following Kolaiti and Wilson’s (2014) practice of applying the corpus-assisted research method. To this end, we constructed a self-built bilingual parallel corpus with the ST of Lu’s short stories and four English TTs by L1 and L2 professional translators, respectively.

14.3.1 Bidirectional Multi-translational Bilingual Parallel Corpus

The corpus consisted of Lu’s ten short stories as the ST subcorpus and their four English versions (i.e., multi-translations) as the TT subcorpus (for the detailed word count of each subcorpus, see Li et al. 2018). The TT subcorpus was further divided into the TT_L1Eng and TT_L2Eng subcorpora. The TT_L1Eng subcorpus was composed of the English translations of Lu’s ten short stories by Lyell (1990) and Lovell (2009). Accordingly, the TT_L2Eng subcorpus was made up of the English versions by Wang (1941) and Yang and Yang (1956). The Yangs were viewed as L2 translators in this study because Yang was the principal translator and his wife was a TT polisher in their collaborative translation, which was in line with Wang’s (2011, 897) proposal in his historical review of directionality in Chinese translation practice. The ST and the TT(s) in each subcorpus were segmented at the sentence level and aligned into parallel texts.
Using the XML format, the ST subcorpus was annotated with four types of metaphors, and the TT subcorpus was tagged with four translation strategies for these metaphors, as shown in the following subsections.

14.3.2 Metaphor Identification

We resorted to an authoritative definition in order to circumscribe and identify metaphors in the ST subcorpus. Metaphor in the ST was defined following Dickins:

A metaphor is a figure of speech in which a word or phrase is used in a nonbasic sense, this non-basic sense suggesting a likeness or analogy . . . with another more basic sense of the same word or phrase. (Dickins 2005, 228)

Apart from this definition, we strictly followed the metaphor identification procedure (MIP) put forward by the Pragglejaz Group (2007). Categories of metaphors in the ST were classified in terms of time and space, that is, (1) conventional and creative metaphors in the temporal dimension and (2) culture-specific and culture-universal metaphors in the spatial dimension. We focused on culture-specific and creative metaphorical references rather than on the subclassification of metaphors at different linguistic (e.g., lexical, phrasal, or clausal) levels or in different word classes (e.g., nominal, verbal, adjectival, adverbial, or prepositional ones). Therefore, the metaphorical references in the ST in our corpus were lexical in nature, specific in culture, and creative in time. Note that creative or novel metaphors in Lu’s short stories refer to individually used but not popularly accepted metaphorical expressions, although Lu’s short stories were produced around the 1920s–1930s. Additionally, culture-universal and conventional ones were identified but viewed as the control group of items.
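As an illustration only, a two-layer annotation of this kind (metaphor type on the ST side, translation strategy on the TT side) could be represented and queried along the following lines. The element and attribute names (unit, st, tt, met, str, scope, novelty, type) are hypothetical, since the chapter does not publish its tag set. The sample uses the lenglengde case discussed in this chapter, with the two renderings whose strategies are stated explicitly (omission in Wang 1941, substitution in Lovell 2009).

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical markup: tag and attribute names are invented for illustration.
# <met> marks a source-text metaphor; <str> marks the strategy used in a target text.
SAMPLE = """<unit id="ex4">
  <st>[ta] <met scope="universal" novelty="conventional">lenglengde</met> shuo</st>
  <tt translator="Wang 1941">he <str type="OM"/>said</tt>
  <tt translator="Lovell 2009">he <str type="SUB">stonily</str> responded</tt>
</unit>"""


def summarize(xml_text):
    """Return the annotated metaphors and a tally of strategy tags in one aligned unit."""
    root = ET.fromstring(xml_text)
    # Each metaphor is described by its text plus the two classification dimensions.
    metaphors = [(m.text, m.get("scope"), m.get("novelty")) for m in root.iter("met")]
    # Strategy codes follow the chapter's abbreviations: TRC, PAR, SUB, OM.
    strategies = Counter(s.get("type") for s in root.iter("str"))
    return metaphors, strategies


metaphors, strategies = summarize(SAMPLE)
print(metaphors)
print(dict(strategies))
```

Keeping the two dimensions as separate attributes, rather than one combined label, mirrors the decision to annotate culture specificity and creativity separately.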
The overlapping space and time dimensions resulted in four mixed subcategories of metaphors, namely, “culture-universal and conventional,” “culture-universal and creative,” “culture-specific and conventional,” and “culture-specific and creative” ones. In this case, we chose culture specificity as the first dimension and creativity as the second to identify and annotate them separately. Following the operational procedures of MIP (Pragglejaz Group 2007), we focused on the contextualized use of the nonbasic or indirect sense of metaphorical expressions compared with their basic or literal sense. For instance, literally referring to “cold” and implicating “unsympathetic,” lenglengde (冷冷地) in the expression “lenglengde shuo (冷冷地說)” was identified as a culture-universal and conventional metaphorical reference due to its culture-shared and conventionalized features in both Chinese and English languages/cultures in the present corpus. Take suipian (碎片) in the expression “[ta]sixiang li chuxian baikui baijia de suipian ([他]思想裡出現白盔白甲的碎片)” as another example: it literally means small broken pieces of some concrete thing and metaphorically means parts of experienced events. Suipian in the present corpus was identified as a culture-universal and creative metaphorical expression because it is not a fixed or conventionalized expression in the Chinese language but is a metaphor shared by both Chinese and English cultures, as established by consulting such authoritative Chinese and English dictionaries as the Modern Chinese Dictionary (現代漢語詞典) and the Oxford English Dictionary, along with inquiries with native speakers. Similarly, koufeng (口風) in the phrase “tan gemingdang de koufeng (探革命黨的口風)” is a metaphorical reference, with the literal meaning of “wind from one’s mouth” and the indirect meaning of “attitude.” It was identified as a Chinese culture-specific and conventional metaphor in the present corpus, considering its usage in the Chinese and English languages.
With the same identification procedures and considering the literal/nonliteral usage across time and culture, wachu (挖出) in the verbal expression “cong shouli wachu yizhang zhitiao (從手 裡挖出一張紙條)” was recognized as a culture-specific and creative metaphor.

14.3.3 Translation Strategies

The strategies for translating metaphors were divided into four categories, namely, transcoding, paraphrasing, substitution, and omission, based on existing literature (e.g., Chou et al. 2016; Lang and Li 2020; Sjørup 2013, 75; Toury 2012, 108). We fully followed the definitions of the four translation strategies by Lang and Li (2020), as follows.

Transcoding meant that the target retained the image and language form of the source; paraphrasing meant that the target explained the meaning of the source while discarding the image and the form of the source expression; substitution referred to the replacement of the source metaphor with another metaphor entailing a different image in the target language; omission meant no corresponding translation in the target output. (Lang and Li 2020, 97)

Take wachu (挖出), one of the metaphorical references in Subsection 14.3.2, as an example. It had a literal meaning of “to dig something out” and a metaphorical meaning of “to catch and then hold something carefully by one’s hands” in the atypical collocation “cong shouli wachu yizhang zhitiao” (從手 裡挖出一張紙條; literally, to dig out a sheet of paper from the handkerchief). When the English translation “to dig out” was adopted, it was the result of transcoding, a strategy used to retain the original form and image. The English translation “to take out” was a result of paraphrasing, as the original image and form were changed into an expression with explicit meaning. Substitution was at work when the English idiomatic expression “to fish out” was used to replace the original image with a similar metaphorical meaning.
Finally, omission was identified where the target text contained no corresponding translation at all.

14.3.4 Annotation and Cross-Check

Fully automatic annotation of metaphors with software remains a castle in the air at present, as suggested by some scholars (e.g., Shuttleworth 2013, 119; Lang and Li 2020, 97). It might be unavoidable for a researcher to annotate metaphors in the ST manually. Human effort was also required to annotate translation strategies in the TTs based on the operational definitions. Under these circumstances, we built a semi-automatic tool, using a macro add-in in Microsoft Word and the XML format, to cope with the time-consuming annotation. All the metaphors and their translation strategies were annotated manually with this semi-automatic tool. The annotation in the present study was cross-checked by three individual researchers to ensure its reliability and validity.

14.4 Results and Analyses

After the completion of annotation, we identified 779 metaphors in the ST, among which there were 724 culture-specific and 55 culture-universal ones. The culture-specific metaphors comprised 266 conventional and 458 creative ones, while the culture-universal metaphors consisted of 41 conventional and 14 creative ones. The corresponding strategies for rendering these metaphors by L1 and L2 translators were identified and tabulated in Tables 14.1–14.4, respectively. These strategies could be integrated into Gutt’s (1991/2000) cognitive-pragmatic translation routes within the framework of RT, namely, direct translation and indirect translation, by mapping transcoding onto the former and merging paraphrasing, substitution, and omission into the latter, as seen in Figure 14.1.
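A minimal sketch of this merging step, assuming the counts in Tables 14.1–14.4 are transcribed as (TRC, PAR, SUB, OM) tuples, one per translator. The dictionary layout and the function name route_shares are illustrative and are not the authors’ tooling.

```python
# Counts transcribed from Tables 14.1-14.4 as (TRC, PAR, SUB, OM) per translator.
# L2 = Wang (1941), Yang and Yang (1956); L1 = Lyell (1990), Lovell (2009).
COUNTS = {
    "Con(S)": {"L2": [(69, 138, 30, 29), (56, 150, 49, 11)],
               "L1": [(66, 122, 64, 14), (41, 133, 50, 42)]},
    "Crt(S)": {"L2": [(170, 200, 55, 33), (162, 211, 70, 15)],
               "L1": [(162, 164, 113, 19), (99, 199, 86, 74)]},
    "Con(U)": {"L2": [(36, 2, 1, 2), (23, 13, 5, 0)],
               "L1": [(23, 11, 5, 2), (19, 18, 3, 1)]},
    "Crt(U)": {"L2": [(13, 0, 0, 1), (11, 2, 0, 1)],
               "L1": [(11, 2, 1, 0), (11, 2, 0, 1)]},
}


def route_shares(rows):
    """Collapse strategy counts into (direct %, indirect %) for one translator group."""
    direct = sum(row[0] for row in rows)      # transcoding maps to direct translation
    total = sum(sum(row) for row in rows)
    indirect = total - direct                 # PAR + SUB + OM map to indirect translation
    return round(100 * direct / total, 2), round(100 * indirect / total, 2)


for category, groups in COUNTS.items():
    for group in ("L2", "L1"):
        direct_pct, indirect_pct = route_shares(groups[group])
        print(f"{category} {group}: direct {direct_pct:.2f}%, indirect {indirect_pct:.2f}%")
```

Pooling the two translators in each group before taking percentages, rather than averaging their individual percentages, is the choice that matches the group-level shares plotted in Figure 14.1.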
Table 14.1 Rendering Strategies of Conventional Culture-Specific Metaphors

| Group | Translator | TRC (%) | PAR (%) | SUB (%) | OM (%) | Total |
|---|---|---|---|---|---|---|
| L2 Translators | Wang (1941) | 69 (25.94) | 138 (51.88) | 30 (11.28) | 29 (10.90) | 266 |
| L2 Translators | Yang and Yang (1956) | 56 (21.05) | 150 (56.39) | 49 (18.42) | 11 (4.14) | 266 |
| L1 Translators | Lyell (1990) | 66 (24.81) | 122 (45.86) | 64 (24.06) | 14 (5.26) | 266 |
| L1 Translators | Lovell (2009) | 41 (15.41) | 133 (50.00) | 50 (18.80) | 42 (15.79) | 266 |

Note: TRC = transcoding; PAR = paraphrasing; SUB = substitution; OM = omission.

Table 14.2 Rendering Strategies of Creative Culture-Specific Metaphors

| Group | Translator | TRC (%) | PAR (%) | SUB (%) | OM (%) | Total |
|---|---|---|---|---|---|---|
| L2 Translators | Wang (1941) | 170 (37.12) | 200 (43.67) | 55 (12.01) | 33 (7.21) | 458 |
| L2 Translators | Yang and Yang (1956) | 162 (35.37) | 211 (46.07) | 70 (15.28) | 15 (3.28) | 458 |
| L1 Translators | Lyell (1990) | 162 (35.37) | 164 (35.81) | 113 (24.67) | 19 (4.15) | 458 |
| L1 Translators | Lovell (2009) | 99 (21.62) | 199 (43.45) | 86 (18.78) | 74 (16.16) | 458 |

Note: TRC = transcoding; PAR = paraphrasing; SUB = substitution; OM = omission.

Table 14.3 Rendering Strategies of Conventional Culture-Universal Metaphors

| Group | Translator | TRC (%) | PAR (%) | SUB (%) | OM (%) | Total |
|---|---|---|---|---|---|---|
| L2 Translators | Wang (1941) | 36 (87.80) | 2 (4.88) | 1 (2.44) | 2 (4.88) | 41 |
| L2 Translators | Yang and Yang (1956) | 23 (56.10) | 13 (31.71) | 5 (12.20) | 0 (0) | 41 |
| L1 Translators | Lyell (1990) | 23 (56.10) | 11 (26.83) | 5 (12.20) | 2 (4.88) | 41 |
| L1 Translators | Lovell (2009) | 19 (46.34) | 18 (43.90) | 3 (7.32) | 1 (2.44) | 41 |

Note: TRC = transcoding; PAR = paraphrasing; SUB = substitution; OM = omission.

Table 14.4 Rendering Strategies of Creative Culture-Universal Metaphors

| Group | Translator | TRC (%) | PAR (%) | SUB (%) | OM (%) | Total |
|---|---|---|---|---|---|---|
| L2 Translators | Wang (1941) | 13 (92.86) | 0 (0) | 0 (0) | 1 (7.14) | 14 |
| L2 Translators | Yang and Yang (1956) | 11 (78.57) | 2 (14.29) | 0 (0) | 1 (7.14) | 14 |
| L1 Translators | Lyell (1990) | 11 (78.57) | 2 (14.29) | 1 (7.14) | 0 (0) | 14 |
| L1 Translators | Lovell (2009) | 11 (78.57) | 2 (14.29) | 0 (0) | 1 (7.14) | 14 |

Note: TRC = transcoding; PAR = paraphrasing; SUB = substitution; OM = omission.
[Figure 14.1: bar chart showing the percentages of direct translation (DirTran) and indirect translation (IndTran) for Con(S), Crt(S), Con(U), and Crt(U) metaphors, plotted separately for L2 and L1 translators.]

Figure 14.1 RT translation route of metaphors by L1 and L2 translators
Note: Con(S) = Conventional Culture-Specific Metaphors; Crt(S) = Creative Culture-Specific Metaphors; Con(U) = Conventional Culture-Universal Metaphors; Crt(U) = Creative Culture-Universal Metaphors; DirTran = Direct Translation; IndTran = Indirect Translation.

14.4.1 Patterns of Translation Strategies

The distributions of translation strategies for culture-specific metaphors were tabulated in Tables 14.1–14.2. Table 14.1 illustrated translation strategies for conventional culture-specific ones, while Table 14.2 depicted those for creative culture-specific ones. Accordingly, the distribution of translation strategies for culture-universal metaphors was tabulated in Table 14.3 for conventional culture-universal ones and in Table 14.4 for creative culture-universal ones. The patterns of rendering strategies between groups as well as within groups are analyzed below. As shown in Tables 14.1 and 14.2, paraphrasing was the dominant strategy used by both L1 and L2 translators for rendering culture-specific metaphors, followed by transcoding, substitution, and omission, regardless of the conventional or creative subcategory. Compared with conventional culture-specific metaphors, the creative ones as a whole in both translation directions accounted for a slightly lower share of paraphrasing but a slightly higher share of transcoding. Direction-wise, L2 translators tended to employ more paraphrasing but less substitution than L1 translators, irrespective of conventional or creative culture-specific metaphors.
Individual differences within the groups of professional translators could be found in both Table 14.1 and Table 14.2. Specifically, transcoding was used to the greatest extent by Wang (1941) in comparison with the other translators in the present research, paraphrasing by Yang and Yang (1956), substitution by Lyell (1990), and omission by Lovell (2009). This suggested that each translator had his/her own translation style in terms of the strategies used to render both subcategories of culture-specific metaphors. Tables 14.3 and 14.4 indicated the frequency distribution and the percentage of rendering strategies for culture-universal metaphors in terms of conventionality or creativity. Transcoding took the lion’s share among the translation strategies, with a higher percentage for creative ones, and was adopted especially by L2 translators in forward translation. In general, paraphrasing was the second most used strategy by both L1 and L2 translators (but more for the former) for rendering culture-universal metaphors. However, there was no central tendency in the use of substitution and omission either between or within groups. Taken together, our corpus data revealed that paraphrasing was the most adopted strategy by both L1 and L2 translators for rendering culture-specific metaphors, while transcoding was the major one for culture-universal metaphors, suggesting that culture-specific metaphors should be more difficult to translate than culture-universal ones. Additionally, the percentage of transcoding used by both L1 and L2 translators for translating conventional metaphors was lower than that for creative metaphors as a whole. Finally, more paraphrasing but less substitution was used by L2 translators than by L1 translators for culture-specific metaphors, suggesting that more effort should be exerted by L1 translators to achieve a better cognitive effect.
In a similar vein, more paraphrasing and substitution were adopted by L1 translators for translating culture-universal metaphors, suggesting that more cognitive effect could be achieved owing to the greater cognitive effort exerted by L1 translators.

14.4.2 Patterns of Cognitive-Pragmatic Translation Routes

As mentioned previously, the four rendering strategies were mapped onto two cognitive-pragmatic translation routes, namely, direct and indirect routes, for further analysis within the RT framework. For the sake of illustration, the present research visualized the percentages of translation routes in Figure 14.1; the exact frequency distributions can be found in Tables 14.1–14.4. Figure 14.1 demonstrated that indirect translation overwhelmingly dominated the rendition of culture-specific metaphors. However, direct translation accounted for the biggest share in the rendition of culture-universal metaphors, in contrast to indirect translation. Furthermore, the use of direct and indirect translation routes by L2 translators for rendering culture-specific metaphors as a whole was statistically different from that for culture-universal metaphors (χ2 = 86.97, p < 0.05). This suggested that metaphorical references with culture-universality in the ST required more direct translation, while those with culture specificity engaged more indirect translation. However, the rendition of conventional metaphors as a whole by L2 translators demanded more indirect translation than that of creative ones (χ2 = 9.85, p < 0.05). More interestingly, L1 translators shared L2 translators’ central tendency, that is, more direct translation was employed for culture-universal metaphors in comparison with culture-specific ones (χ2 = 54.78, p < 0.05), but less for conventional ones in contrast to creative ones (χ2 = 6.06, p < 0.05). However, L1 and L2 translators adopted statistically different translation routes, although a similar central tendency was observed between them.
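The route comparisons above rest on 2 × 2 contingency tables (translation route by metaphor category). The sketch below assumes an uncorrected Pearson chi-square; the chapter does not state which variant or software was used. The pooled counts are derived here by summing the transcoding cells of Tables 14.1–14.4 as direct translation and the remaining cells as indirect translation, and the function name pearson_chi2 is illustrative.

```python
def pearson_chi2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]] (df = 1)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Pooled (direct, indirect) counts per translator group, summed over both translators
# and over the conventional and creative subcategories.
L2_SPECIFIC = (457, 991)    # 69 + 56 + 170 + 162 direct out of 1448
L2_UNIVERSAL = (83, 27)     # 36 + 23 + 13 + 11 direct out of 110
L1_SPECIFIC = (368, 1080)   # 66 + 41 + 162 + 99 direct out of 1448
L1_UNIVERSAL = (64, 46)     # 23 + 19 + 11 + 11 direct out of 110

CRITICAL_05_DF1 = 3.841     # chi-square critical value at alpha = 0.05, df = 1

l2_stat = pearson_chi2(*L2_SPECIFIC, *L2_UNIVERSAL)
l1_stat = pearson_chi2(*L1_SPECIFIC, *L1_UNIVERSAL)

for label, stat in (("L2: specific vs. universal", l2_stat),
                    ("L1: specific vs. universal", l1_stat)):
    verdict = "significant" if stat > CRITICAL_05_DF1 else "not significant"
    print(f"{label}: chi2 = {stat:.2f} ({verdict} at 0.05)")
```

Under these assumptions the two statistics should come out close to the reported 86.97 and 54.78. A continuity-corrected variant would give somewhat smaller values, which matters mainly for the small culture-universal cells.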
Direct translation was used more frequently by L2 translators, while indirect translation was employed more frequently by L1 translators in terms of the culture specificity and creativity of the original metaphors. Specifically, there were statistically significant differences between the direct and indirect translation used by L1 and L2 translators for translating culture-specific and culture-universal metaphors (culture-specific: L1 vs. L2 translators, χ2 = 13.43, p < 0.05; culture-universal: L1 vs. L2 translators, χ2 = 7.4, p < 0.05). The same was true for conventional and creative metaphors, each taken as a whole (conventional: L1 vs. L2 translators, χ2 = 5.05, p < 0.05; creative: L1 vs. L2 translators, χ2 = 12.6, p < 0.05). It seemed that the culture specificity and creativity of metaphors in the ST had a significant impact on the cognitive-pragmatic translation routes employed by L1 and L2 translators in the present research. The four mixed subcategories of metaphors (i.e., conventional culture-specific, creative culture-specific, conventional culture-universal, and creative culture-universal ones) shared a similar tendency with the merged categories in terms of culture specificity or creativity. Statistically, direct translation was more frequently adopted by L2 translators than by L1 translators in translating creative culture-specific metaphors as well as conventional culture-universal ones (creative culture-specific, χ2 = 12.57, p < 0.05; conventional culture-universal, χ2 = 7.45, p < 0.05).
However, there was no statistical difference between L1 and L2 translators’ adoption of translation routes for rendering conventional culture-specific and creative culture-universal ones, respectively (conventional culture-specific, χ2 = 1.79, p > 0.05; creative culture-universal, χ2 = 0.12, p > 0.05), although direct translation was employed more frequently by L2 translators than by L1 translators in these conditions (3.39% higher for conventional culture-specific metaphors and 7.14% higher for creative culture-universal ones for L2 than for L1 translators). This suggested that the selective nuances of translation routes by L1 and L2 translators could be manifested in the process of rendering the mixed subcategories of metaphors.

14.5 RT Accounts and Discussions

As shown in Section 14.4, the findings of the present study could affirmatively answer the first two research questions raised in Section 14.1.3. Concerning the third question, this section provides RT accounts for the similarities and differences within and between L1 and L2 translators in terms of translation strategies and routes.

14.5.1 Dominant Strategies/Routes by L1 and L2 Translators

One of the salient features of translating metaphorical references in Lu’s short stories lay in the dominant transfer shared by L1 and L2 translators. For rendering culture-specific metaphors (esp. conventional ones), regardless of translation direction, paraphrasing was the dominant strategy, as shown in Tables 14.1–14.2, and indirect translation was the most frequently adopted route, as visualized in Figure 14.1. However, transcoding was the overwhelmingly dominant strategy for rendering culture-universal metaphors (esp. creative ones) by L1 and L2 translators, as indicated in Tables 14.3–14.4, and direct translation was the most frequently adopted route, as demonstrated in Figure 14.1.
In general, this suggested that culture-specific metaphors might cause more cognitive effort in translation than culture-universal ones. Paraphrasing as the dominant strategy is also one of the major findings of Hou (2017), who viewed metaphor as a whole without further classification. This might be because Lu’s short stories are rich in culture-specific metaphors. In this respect, the use of the paraphrasing strategy shows no significant difference between metaphors as a whole and culture-specific metaphors. As culture-specific metaphors violate the semantic rules in the ST and are regarded as barriers in translation (Newmark 1988, 104; Schäffner 2017, 249), a direct translation of the metaphorical image tends to fail in cross-cultural communication. Cognitive-pragmatically, most specific original images should not be transplanted into the TT if there is no counterpart, as indicated in the four TTs of fanlian (翻臉) in example 1, for fear that the foreignizing image could not evoke the cognitive effect among the target readers. On the other hand, it was the translator’s optional decision whether to violate or observe cross-cultural communicative conventions. In this regard, a minority of culture-specific metaphors could be optionally transcoded into the TT (as shown in the TT3 of example 2) on the assumption that the target readers would exert extra effort for poetic effect once they recognized and inferred the communicative clue of the stylistically salient feature. Therefore, it seems that direct translation could be an alternative minor choice for professional translators to cope with culture-specific metaphors in literary translation, except where they deliberately intend to introduce these ST metaphors into the target language.
In general, direct translation could not dominate the rendition of culture-specific metaphors, although a professional translator needs to arrive at an interpretive balance between the ST and TT (Gutt 1991/2000, 171).

Example 1
ST: 況且他們一翻臉,便說人是惡人。
TT1: They have a way of branding anyone they don’t like as a wicked man. (Wang 1941)
TT2: . . . and once they are angry they will call anyone a bad character. (Yang and Yang 1956)
TT3: What’s more, as soon as they turn against someone, they’ll say he’s evil anyway. (Lyell 1990)
TT4: And they can turn on you in an instant. (Lovell 2009)

Example 2
ST: . . . 似乎想探革命黨的口風。
TT1: . . . seeking to discover the attitude of the revolution towards himself. (Wang 1941)
TT2: . . . as if sounding out the revolutionaries’ attitude. (Yang and Yang 1956)
TT3: . . . in an attempt to determine which way the revolutionary winds were blowing. (Lyell 1990)
TT4: . . . (Lovell 2009)

Examples 1–2 demonstrate the typical pictures of translating culture-specific metaphors in Lu’s short stories by L1 and L2 translators. These examples imply that the majority of the original images of culture-specific metaphors should be filtered or mediated in the TT, except where the translator assumes that the target readers are capable of and willing to make extra effort to infer weak implicatures from ad hoc concept construction (Kolaiti and Wilson 2014). Additionally, the intended meaning of culture-specific metaphors needs to be comprehended by the TT readers from the literal meaning of the transferred metaphors, which are creative ones in the TT, as claimed by the latest generation of relevance theorists (e.g., Carston 2017; Noveck 2018). As far as the translator’s decision-making process is concerned, his/her mental operations in translating metaphors can also be supported by some non-RT experimental studies (e.g., Sjørup 2013; Tirkkonen-Condit 2002).
Based on her research, Tirkkonen-Condit (2002) claimed that transcoding or direct transfer was the default translation strategy for the translator to cope with culture-specific metaphors. Similarly, Sjørup (2013, 207) pointed out, “the translators had a default direct transfer strategy, which was only replaced by an indirect translation strategy when the translator found it necessary.” In short, the dominant indirect translation of culture-specific metaphors could be interpreted as evidence for a cultural filter or concept mediation resulting from the translator’s final inferential processing to meet the TT readers’ expectations in terms of relevance. However, an in-depth examination of the different types of metaphors in the current study shows that the most frequently used strategy and processing route for rendering culture-universal metaphors differ from those for culture-specific ones. Not viewed as translation production problems or difficulties, culture-universal metaphors in the ST could be directly translated into the TT without any obstacles to crossing over. The translator might subconsciously follow the cognitive and pragmatic principles to keep the shared image in the TT via direct translation in most cases, as shown in all the TTs in example 3. Moreover, he/she might consciously transform the literal sense of the original metaphor into its metaphorical sense, null meaning, or an alternative image in a particular case. This particular translation performance would result in the use of the paraphrasing, omission, or substitution strategy (i.e., the strategic realization of indirect translation). In such a case, the target reader’s cognitive effort would increase in the search for poetic effect with the translator’s deliberate employment of image substitution, or decrease by means of semantic explicitation or omission.
Example 4 illustrates that the translator of TT4 might increase the target reader’s cognitive effort, in return for an added poetic effect, by replacing the original image with a new one in the TT, while the translator of TT1 might lessen the target reader’s cognitive effort by omitting the original image.

Example 3
ST: 漸漸地,小報上有匿名人來攻擊他 . . .
TT1: Gradually there appeared in the tabloid papers anonymous attacks upon him . . . (Wang 1941)
TT2: Gradually anonymous attacks appeared in the less reputable papers . . . (Yang and Yang 1956)
TT3: Anonymous attacks on him gradually began appearing in the local papers . . . (Lyell 1990)
TT4: Small local newspapers began launching anonymous attacks on him . . . (Lovell 2009)

Example 4
ST: . . . [他] 地說,“但哪裡去呢?”
TT1: . . . he said after I had told him of my desire to go elsewhere to find a position. “But where to go?” (Wang 1941)
TT2: . . . he said coldly, after I asked him to recommend me to a job somewhere else. “But where will you go?” (Yang and Yang 1956)
TT3: . . . he said coldly after I had asked him to help me find a job someplace else. “But where else can you go?” (Lyell 1990)
TT4: . . . he stonily responded to my request for help in tracking down a new situation. “But where else can you go?” (Lovell 2009)

It is worth pointing out that the translator’s routine practice for rendering culture-universal metaphors as a whole is to adopt the transcoding strategy or direct translation route, as revealed by the results of our research data in Section 14.4. The overall positive transfer of shared images of culture-universal metaphors from the ST to the TT could be “a preferred/unmarked choice” (Mauranen 2004, 80), although fluent bilinguals, especially professional translators, were reported to use processing routes flexibly in text-based translation tasks (Hatzidaki and Pothos 2008).
In this respect, the translator’s deliberate manipulation in a given cognitive context occurs only in a few special cases. It seems that the translator’s operation is first regulated by the cognitive principle of maximal relevance and then adjusted by the communicative principle of optimal relevance, as the translator gradually becomes conscious of the features of the input in cognition and communication.

However, the temporal dimension of culture-universal metaphors in this study shows that transcoding and the direct translation route are used more often for translating creative culture-universal metaphors than for conventional ones. Probably, Lu borrowed conventional Western metaphors into his literary writings in the early 1900s to offer a fresh flavor to the ST readers. These creative culture-universal metaphors in the ST could be understood without any effort by the TT readers because they are conventional and shared in the TT. As illustrated by the creative culture-universal metaphor “suipian” (碎片) in example 5, the TT1 and TT4 translators used the transcoding strategy and direct translation route to meet the target reader’s expectations, although the cognitive equivalence of the ST and TT readers might not be balanced because of the creativity in the ST and the conventionality in the TT. In general, the maintenance of ST and TT communicative conventions by means of direct translation leads to the central tendency in rendering culture-universal metaphors.

Example 5
ST: . . . [他]思想裡出 白 白 的 。
TT1: . . . and in his fertile imagination there again appeared fragments of shattered white helmets and white armor. (Wang 1941)
TT2: . . . and in his mind’s eye saw fragmentary visions of white helmets and white armour once more. (Yang and Yang 1956)
TT3: . . . and glimpses of white helmets and white armor began to flicker across his brain once again. (Lyell 1990)
TT4: . . .
and fragments of white helmets and armour drifted back into his thoughts. (Lovell 2009)

Apart from the RT analysis of these patterns, our results partially support previous corpus-assisted studies of metaphors or culture-specific items in texts other than Lu’s short stories. It is worth mentioning that our results cannot be compared with those of corpus-assisted studies of linguistic features or translator’s style in English translations of Lu’s short stories (e.g., Li et al. 2018; Shao 2018; Shao and Wang 2018; Xu and Jiang 2020), because the latter do not explore the translation strategies or routes of metaphors. Taking metaphors with culture specificity as a whole, our research is consistent with some corpus-assisted research on culture-specific or difficult items in literary translation (Huang 2020) and even in consecutive interpreting (Dam 2001) or simultaneous interpreting (Lang and Li 2020), although these studies do not deal with Lu’s short stories. All the studies mentioned here support the finding that a paraphrasing strategy or meaning-based translation is used on a large scale for rendering culture-specific items, suggesting that the translator/interpreter needs to make more effort to deal with culture-specific items when considering their relevance to the target reader. More interestingly, our finding that transcoding and the direct translation route are used more for rendering creative culture-universal metaphors is also consistent with the findings on English-Chinese translation of culture-specific items by Chou and her colleagues (Chou et al. 2016). It should be mentioned that Chou and her colleagues (ibid.) selected the English novel The Joy Luck Club by Amy Tan as the ST, which is rich in Chinese idiomatic expressions written in English (e.g., metaphor as a subcategory of these expressions).
The items borrowed from the Chinese language are treated as creative culture-specific ones in the English ST (ibid.), but they have corresponding items in the Chinese TT. In this case, the translators seem to transfer the items back into the TT with less effort by using the transcoding strategy, compared with unshared items, in terms of the relevance of effort and effect.

14.5.2 L1 Translators’ Adaptation vs. L2 Translators’ Adoption

Taking metaphors as items processed in different translation directions, our study demonstrated that, as an overall tendency, direct translation (realized by adoption) was favored by L2 translators while indirect translation (manifested by adaptation) was preferred by L1 translators. With respect to the overall tendency of translators’ selection of translation strategies in both directions, our result is aligned with that of Huang’s (2015, 79–94) corpus-based study of English translations of Jia Pingwa’s novels. Huang (2015) concluded that L1 translators preferred adaptation while L2 translators were more likely to use adoption. In general, the finding could be descriptively explained from the perspective of “interference” and “standardization” (Toury 2012, 303–15). Taking L2 translators as an example, interference from Chinese (the source language) was stronger on them because Chinese is their native language and because they aimed to introduce Chinese literature to the Western world. In other words, the larger number of L1 formal equivalents in L2 production in forward translation would result in a higher proportion of adopted transfer. This was also correlated with the priming or cue effect of the source language in translation comprehension, transfer, and production (Wen and van Heuven 2017). In this case, L2 translators would subconsciously prefer adoption to adaptation.
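The directional tendency described above is, at bottom, a comparison of proportions: for each translation direction, what share of metaphor renderings counts as adoption (direct translation) and what share as adaptation (indirect translation). As a minimal sketch of how such a tally might be computed from an annotated parallel corpus, the Python snippet below assumes a hypothetical list of (direction, strategy) records. The record format, the function name `strategy_shares`, and the toy records are illustrative assumptions only; they are not the study's data, annotation scheme, or tooling.

```python
from collections import Counter, defaultdict


def strategy_shares(records):
    """Return {direction: {strategy: proportion}} for (direction, strategy) pairs.

    `records` is a hypothetical annotation format: one pair per rendered
    metaphor, e.g. ("L2", "adoption").
    """
    counts = defaultdict(Counter)
    for direction, strategy in records:
        counts[direction][strategy] += 1

    shares = {}
    for direction, counter in counts.items():
        total = sum(counter.values())
        shares[direction] = {s: n / total for s, n in counter.items()}
    return shares


# Toy records invented purely to exercise the function; NOT the study's data.
toy_records = [
    ("L1", "adaptation"), ("L1", "adaptation"),
    ("L1", "adoption"), ("L1", "adaptation"),
    ("L2", "adoption"), ("L2", "adoption"),
    ("L2", "adaptation"), ("L2", "adoption"),
]

shares = strategy_shares(toy_records)
for direction in sorted(shares):
    for strategy in sorted(shares[direction]):
        print(f"{direction} {strategy}: {shares[direction][strategy]:.2f}")
```

A tally of this kind only describes the distribution; whether a difference between directions is reliable would still need a significance test and a look at individual translators, both of which this sketch leaves out.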
However, a cognitive-pragmatic explanation within the RT framework might shed much more light on the directional effect in translation. The principle of relevance entails two components in communication (or translation): one is the least-effort or min-max principle; the other is the less-effort or optimal principle. Cognitive-pragmatically, the former is realized as adopted transfer, while the latter is manifested as adapted transfer. Governed by the principle of relevance, the two processing routes (i.e., direct and indirect translation) are available at any time; the former has priority, whereas the latter takes over if the former fails. It could be inferred that professional translators subconsciously observe the principle of relevance in different translation directions and that they take such factors as concept and intention in their cognitive environments into consideration to an optimal degree. The unbalanced adoption and adaptation suggest that L2 translators’ cognitive environment is different from that of L1 translators. Concept-wise, it is impossible for professional translators to know all metaphorical references in the ST and their counterparts in the TT, given the limits of their mental lexicon. Professional translators might have an unbalanced mental lexicon in terms of the L1 and L2 realizations of a given concept. L2 professional translators are therefore much more knowledgeable about Chinese items but less so about English ones. Otherwise, they would be translation experts with balanced knowledge of both languages and cultures, who may exist in theory but are rarely found in practice. Conversely, L1 translators are equipped with a larger mental store of English items. This asymmetry of the mental lexicon in the translator’s cognitive environment could explain why L2 translators use more adoption and L1 translators more adaptation.
With regard to intention, the L2 translators in the present study aimed to promote Chinese culture to the English-speaking world by keeping some potentially acceptable culture-specific images unchanged in the TT, while the L1 translators aimed to introduce it as a novel for general target readers, resulting in some loss of foreign flavor. In this respect, L2 professional translators, influenced by their cognitive environment, are more engaged with adoption than L1 professional translators. This is likely to generate a slightly larger proportion of direct translation or adoption, but a lower share of concept-mediated transfer, among L2 professional translators than among L1 professional translators.

As stated in Section 14.4, the preference for translation strategies of metaphors varies across individual translators as a result of the different deployment of direct or indirect processing routes. In general, it is also individual differences in the cognitive environment that regulate the employment of direct or indirect translation. Compared with other translators, Wang (1941), as an L2 translator, employed much more direct transfer. This implies that he was more adoption-weighted owing to the source-oriented purpose of promoting Chinese culture in his cognitive environment. By contrast, Lovell (2009) used more indirect transfer to adapt Chinese metaphors, fulfilling her more target-oriented purpose and meeting potential target readers’ expectations.

Apart from observing the general tendency, both L1 and L2 translators could adapt culture-specific metaphors in the TTs in some special cases. In examples 6–7, maimo (埋沒, literally referring to “bury”) and yinxian (引線, literally referring to “fuse”) were rendered indirectly to lower the cognitive effort of the target readers in all TTs except for TT3 in example 7.

Example 6
ST: . . . 他的刊物 決不會埋沒 稿子的。
TT1: . . . his publication would never turn down a good manuscript. (Wang 1941)
TT2: . . .
his magazine would never ignore a good manuscript. (Yang and Yang 1956)
TT3: . . . he would never sit on a good manuscript. (Lyell 1990)
TT4: . . . Freedom’s Friend never turned away good work. (Lovell 2009)

Example 7
ST: . . .加以油雞們又大起來了,更容易成為兩家爭吵的引線。
TT1: – growing larger and larger, and more and more frequently the cause of quarrels between the two families. (Wang 1941)
TT2: The chicks had grown into hens now, and were more of a bone of contention than ever between the two families. (Yang and Yang 1956)
TT3: . . . and as if the chicks weren’t enough, then they had to go and grow into chickens and thereby provide even shorter fuses for the quarrels between our family and the landlord’s. (Lyell 1990)
TT4: And now the hens were fully grown up, arguments between the two households in the compound became more frequent. (Lovell 2009)

As mentioned previously, the categories of metaphors could modulate the use of translation strategies by L1 and L2 translators. Our data revealed that culture specificity and creativity influenced the degree of adoption and adaptation by L1 and L2 translators, respectively, although the preference for translation strategies or processing routes varied with individual translators in both translation directions. Specifically, in comparison with L1 translators, L2 translators showed a more salient preference for adoption with culture-universal metaphors than with culture-specific ones. Similarly, in contrast to L1 translators, L2 translators tended to use more adoption for creative metaphors than for conventional ones. In other words, L2 translators exerted more cognitive effort to render conventional culture-specific metaphors than L1 translators did. The interaction of stimuli (different types of metaphors as source input) and cognitive environment leads to the asymmetry of processing routes between L1 and L2 translators.
As mentioned earlier, universal terms are easier to translate into the TT than specific ones. In this case, L2 translators, with more ST knowledge and the purpose of culture promotion, would generally employ direct translation to transfer more original images into the TT than L1 translators. However, most of the creative metaphors in the ST were borrowed from those in the TT, as our data show, so the translators would not need much more effort to render them than to render conventional ones. It is also the case that ST-knowledgeable, culture-motivated L2 translators employ more direct translation than L1 translators when translating creative metaphors.

Overall, the cognitive-pragmatic account of these frequency distributions of strategies or routes lies in the governance of the principle of relevance in terms of effort and effect embedded in the cognitive environments of the translator and the TT reader. By observing the principle of relevance, both L1 and L2 professional translators in our pool of data subconsciously deployed the least effortful route (i.e., direct translation). Crucially, if the most economical route is not available owing to input difficulty or intentional-contextual considerations, the translator would search for a less economical route (i.e., indirect translation). The interplay of processing routes could result in the dominant transfer and its variation by L1 and L2 translators in rendering metaphors as a whole or as individual subcategories.

14.6 Concluding Remarks

The present research yielded some similarities and differences between L1 and L2 translators in their translation of metaphors. They were explained by the interplay of effort and effect in terms of RT in order to understand the creative process of literary translation in both translation directions. Four points need further consideration, as follows.
First, compared with L2 translators, L1 translators tended to adapt metaphors for the target reader, particularly in the cases of conventional and culture-specific metaphors in the present study. The findings indicated that the categories of metaphors and translation direction were two interacting factors in the process of translating metaphors, with the directional effect modulated by the categories of metaphors. The interactive effects need to be tested with further corpus-assisted evidence, for few corpus-assisted studies of metaphors in translation have focused on them.

Second, the corpus-assisted relevance-theoretical analysis could be triangulated with behavioral or neurocognitive studies of translation in terms of research methodology. The former complements and mutually corroborates the latter, although it is far from fine-grained research (e.g., a controlled research design in psychological studies). This complementary line of research supports the idea of “corpus and psychological triangulation” advanced by some scholars (e.g., Halverson 2010; Lang and Li 2020; Mahlberg et al. 2014), namely that corpus-assisted studies of professional literary translators’ performance at the textual level could be triangulated with psychological studies. The triangulation needs further exploration with evidence. Specifically, the categorical effect, that is, the more frequent use of transcoding for shared items and of paraphrasing for unshared ones, needs further investigation that combines a corpus-assisted method with an alternative experimental method. This line of triangulation is also invaluable for studying the directional effect on L1 and L2 translators’ performance.

Third, the application of indirect translation is not restricted to such appellative texts as tourist leaflets; it could be extended to the translation of metaphors in literary texts by L1 or L2 professional translators.
Cognitive-pragmatically, the indirect reproduction of metaphors in literary texts should also conform to the principle of relevance. In other words, the interplay of effort and effect determines the use of direct or indirect translation by both L1 and L2 translators. In this scenario, RT advanced our understanding of the effect of directionality on the translator’s performance in literary translation. Our results revealed that L1 translators more frequently used indirect translation, whereas L2 translators resorted more to direct translation. This suggests that L1 translators exerted much more effort to activate the cognitive context of the target reader than L2 translators did. In addition, the impact of the input category on the effort made by L1 and L2 translators in the current research could be explained by the translator’s effort varying with the difficulty of the rendered items: the more difficult an item is, the more effort a translator expends.

Fourth, the present study has implications for the cross-language creation of literary works and the promotion of Chinese culture. As stated previously, both L1 and L2 competent translators (sub)consciously took cognitive effort and contextual effect into consideration and made an optimal choice in putting the (un)shared and (non)creative information into the target language. This implies that translation competence, historical-cultural context, information type, the translator’s intention, and the reader’s expectation are among the most influential factors for literary translation as well as for culture promotion and reception. Directionality interacts with these factors, as the differences between L1 and L2 translators’ performance in the present study indicate.
In this scenario, we claim that L2 translators in the new era of Chinese culture promotion should keep pace and peace with L1 translators’ contribution to the reception or consumption of Chinese literary works and culture.

References

Alves, Fabio. 1995. Zwischen Schweigen und Sprechen: Wie bildet sich eine transkulturelle Brücke? Hamburg: Dr. Kovac.
Alves, Fabio. 2020. “Translation, Pragmatics and Cognition.” In The Routledge Handbook of Translation and Cognition, edited by Fabio Alves and Arnt Lykke Jakobsen, 133–46. London: Routledge.
Alves, Fabio, and José Luiz Gonçalves. 2003. “A Relevance Theory Approach to the Investigation of Inferential Processes in Translation.” In Triangulating Translation: Perspectives in Process Oriented Research, edited by Fabio Alves, 3–24. Amsterdam: John Benjamins.
Alves, Fabio, and José Luiz Gonçalves. 2007. “Modeling Translator’s Competence: Relevance and Expertise Under Scrutiny.” In Doubts and Directions in Translation Studies, edited by Yves Gambier, Miriam Shlesinger, and Radegundis Stolze, 41–55. Amsterdam: John Benjamins.
Boase-Beier, Jean. 2011. A Critical Introduction to Translation Studies. London: Continuum.
Carston, Robyn. 2002. Thoughts and Utterances: The Pragmatics of Explicit Communication. Oxford: Blackwell.
Carston, Robyn. 2017. “Relevance Theory and Metaphor.” In The Routledge Handbook of Metaphor and Language, edited by Elena Semino and Zsófia Demjén, 42–55. London: Routledge.
Chou, Isabelle, Victoria Lei, Defeng Li, and Yuanjian He. 2016. “Translational Ethics from a Cognitive Perspective: A Corpus-assisted Study on Multiple English-Chinese Translations.” In Rereading Schleiermacher: Translation, Cognition and Culture, edited by Teresa Seruya and José Miranda Justo, 159–73. Heidelberg: Springer.
Dam, Helle V. 2001.
“On the Option Between Form-based and Meaning-based Interpreting: The Effect of Source Text Difficulty on Lexical Target Text Form in Simultaneous Interpreting.” The Interpreters’ Newsletter 11: 27–54.
Dickins, James. 2005. “Two Models for Metaphor Translation.” Target 17, no. 2: 227–73.
Gallai, Fabrizio. 2019. “Cognitive Pragmatics and Translation.” In The Routledge Handbook of Translation and Pragmatics, edited by Rebecca Tipton and Louisa Desilla, 51–72. London: Routledge.
Gibbs, Raymond W., and Markus Tendahl. 2006. “Cognitive Effort and Effects in Metaphor Comprehension: Relevance Theory and Psycholinguistics.” Mind and Language 21, no. 3: 379–403.
Gutt, Ernst-August. 1991/2000. Translation and Relevance: Cognition and Context. Manchester: St. Jerome.
Gutt, Ernst-August. 2005. “On the Significance of the Cognitive Core of Translation.” The Translator 11, no. 1: 25–49.
Halverson, Sandra L. 2010. “Cognitive Translation Studies: Developments in Theory and Method.” In Translation and Cognition, edited by Gregory M. Shreve and Erik Angelone, 349–69. London: Routledge.
Hatzidaki, Anna, and Emmanuel M. Pothos. 2008. “Bilingual Language Representation and Cognitive Processes in Translation.” Applied Psycholinguistics 29, no. 1: 125–50.
Hou, Linping. 2017. “A Corpus-assisted Case Study of Translational Directionality: Translations of Chinese Short Stories into English by L1- and L2- Translators.” PhD diss., University of Macau.
Hsia, Chih-Tsing. 1961. A History of Modern Chinese Fiction. New Haven, CT: Yale University Press.
Huang, Libo. 2015. Style in Translation: A Corpus-based Perspective. Heidelberg: Springer.
Huang, Qiuhong. 2020. A Corpus-assisted Contrastive Study on Translating Culture-specific and Non-culture-specific Items. Hong Kong: Xin Hwa Book Co., Limited.
Kolaiti, Patricia, and Deirdre Wilson. 2014. “Corpus Analysis and Lexical Pragmatics: An Overview.” International Review of Pragmatics 6, no. 2: 211–39.
Lang, Yue, and Defeng Li. 2020. “Cognitive Processing Routes of Culture-specific Linguistic Metaphor in Simultaneous Interpreting: A Corpus-assisted Study.” In Key Issues in Translation Studies in China, edited by Lily Lim and Defeng Li, 91–109. Singapore: Springer.
Li Defeng 李德鳳, He Wenzhao 賀文照, and Hou Linping 侯林平. 2018. “Lan Shiling fanyi fengge kuzhu yanjiu” 藍詩玲翻譯風格庫助研究 [A Corpus-assisted Study of Julia Lovell’s Translating Style]. Foreign Language Education 外語教學 39, no. 1: 70–6.
Lovell, Julia (trans.). 2009. The Real Story of Ah-Q and Other Tales of China. London: Penguin Books.
Lyell, William (trans.). 1990. Diary of a Madman and Other Stories. Honolulu, HI: University of Hawaii Press.
Mackenzie, Ian. 2002. Paradigms of Reading: Relevance Theory and Deconstruction. New York: Palgrave Macmillan.
Mahlberg, Michaela, Kathy Conklin, and Marie-Josée Bisson. 2014. “Reading Dickens’s Characters: Employing Psycholinguistic Methods to Investigate the Cognitive Reality of Patterns in Texts.” Language and Literature 23, no. 4: 369–88.
Mauranen, Anna. 2004. “Corpora, Universals and Interference.” In Translation Universals: Do They Exist?, edited by Anna Mauranen and Pekka Kujamäki, 65–82. Amsterdam: John Benjamins.
Newmark, Peter. 1988. A Textbook of Translation. New York: Prentice Hall International.
Nida, Eugene A. 1964. Toward a Science of Translating with Special Reference to Principles and Procedures Involved in Bible Translating. Leiden: E. J. Brill.
Noveck, Ira. 2018. Experimental Pragmatics: The Making of a Cognitive Science. Cambridge: Cambridge University Press.
Pragglejaz Group. 2007. “MIP: A Method for Identifying Metaphorically Used Words in Discourse.” Metaphor and Symbol 22, no. 1: 1–39.
Rodríguez Márquez, María. 2010. “Patterns of Translation of Metaphor in Annual Reports in American English and Mexican Spanish.” PhD diss., University of Surrey.
Schäffner, Christina. 2004.
“Metaphor and Translation: Some Implications of a Cognitive Approach.” Journal of Pragmatics 36, no. 7: 1823–64.
Schäffner, Christina. 2017. “Metaphor in Translation.” In The Routledge Handbook of Metaphor and Language, edited by Elena Semino and Zsófia Demjén, 247–62. London: Routledge.
Schäffner, Christina, and Paul Chilton. 2020. “Translation, Metaphor and Cognition.” In The Routledge Handbook of Translation and Cognition, edited by Fabio Alves and Arnt Lykke Jakobsen, 326–43. London: Routledge.
Schäffner, Christina, and Mark Shuttleworth. 2013. “Metaphor in Translation: Possibilities for Process Research.” Target 25, no. 1: 93–106.
Shao Li 邵莉. 2018. “Lu Xun xiaoshuo yizuozhong de cihui ouhua xianxiang – jiyu yuliaoku de lishi yuyanxue yanjiu” 魯迅小說譯作中的詞彙歐化 – 基於語料庫的歷時語言學研究 [Lexical Europeanization in Lu Xun’s Fictional Translation Works: A Corpus-based Diachronic Linguistic Study]. Journal of PLA University of Foreign Languages 解放軍外國語學院學報 41, no. 6: 98–106.
Shao Li 邵莉, and Wang Kefei 王克非. 2018. “Lu Xun baihua xiaoshuo yizuozhong jufa ouhua xianxiang de lishi bianhua – jiyu yuliaoku de yanjiu fangfa” 魯迅白話小說譯作中句法歐化 的歷時變化 – 基於語料庫的研究方法 [The Diachronic Change of Syntactic Europeanization in Lu Xun’s Vernacular Fictional Translation Works: A Corpus-based Approach]. Foreign Language and Their Teaching 外語與外語教學, no. 6: 133–42.
Shuttleworth, Mark. 2013. “Metaphor in Translation: A Multilingual Investigation into Language Use at the Frontiers of Science Knowledge.” PhD diss., Imperial College London.
Shuttleworth, Mark. 2017. Studying Scientific Metaphor in Translation: An Inquiry into Cross-lingual Translation Practices. Abingdon: Routledge.
Sjørup, Annette. 2013. “Cognitive Effort in Metaphor Translation: An Eye-tracking and Key-logging Study.” PhD diss., Copenhagen Business School.
Sperber, Dan, and Deirdre Wilson. 1986/1995. Relevance: Communication and Cognition. Oxford: Blackwell.
Sperber, Dan, and Deirdre Wilson. 2012.
“A Deflationary Account of Metaphors.” In Meaning and Relevance, edited by Deirdre Wilson and Dan Sperber, 97–122. Cambridge: Cambridge University Press.
Steen, Gerard. 2011. “The Contemporary Theory of Metaphor – Now New and Improved!” Review of Cognitive Linguistics 9, no. 1: 26–64.
Stefanowitsch, Anatol. 2006. “Corpus-based Approaches to Metaphor and Metonymy.” In Corpus-based Approaches to Metaphor and Metonymy, edited by Anatol Stefanowitsch and Stefan Th. Gries, 1–17. Berlin: Mouton de Gruyter.
Tendahl, Markus, and Raymond W. Gibbs. 2008. “Complementary Perspectives on Metaphor: Cognitive Linguistics and Relevance Theory.” Journal of Pragmatics 40, no. 11: 1823–64.
Tirkkonen-Condit, Sonja. 2002. “Metaphoric Expressions in Translation Processes.” Across Language and Cultures 3, no. 1: 101–16.
Toury, Gideon. 2012. Descriptive Translation Studies and Beyond (rev. ed.). Amsterdam: John Benjamins.
Trim, Richard, and Dorota Śliwa. 2019. Metaphor and Translation. Newcastle upon Tyne: Cambridge Scholars Press.
Walker, Callum. 2021. An Eye-Tracking Study of Equivalent Effect in Translation: The Reader Experience of Literary Style. Cham: Palgrave Macmillan.
Wang, Baorong. 2011. “Translation Practices and the Issue of Directionality in China.” Meta 56, no. 4: 896–914.
Wang, Chi-Chen (trans.). 1941. Ah Q and Others: Selected Stories of Lusin. New York: Columbia University Press.
Wearing, Catherine. 2014. “Interpreting Novel Metaphors.” International Review of Pragmatics 6, no. 1: 78–102.
Wen, Yun, and Walter J. van Heuven. 2017. “Chinese Translation Norms for 1,429 English Words.” Behavior Research Methods 49, no. 3: 1006–19.
Wilson, Deirdre. 2000. “Metarepresentation in Linguistic Communication.” In Metarepresentations: A Multidisciplinary Perspective, edited by Dan Sperber, 411–48. Oxford: Oxford University Press.
Wilson, Deirdre. 2011.
“Parallels and Differences in the Treatment of Metaphor in Relevance Theory and Cognitive Linguistics.” Intercultural Pragmatics 8: 177–96.
Xu Ming 許明, and Jiang Yue 蔣躍. 2020. “Ah Q Zhengzhuan yiru yichu wenben de fengge jiliangxue duibi”《阿Q正傳》譯入譯出文本的風格計量學對比 [A Stylometric Comparison of L1 Translations and L2 Translations of The True Story of Ah Q]. Foreign Languages Research 外語研究 37, no. 3: 86–92.
Yan Yidan 嚴苡丹, and Han Ning 韓寧. 2015. “Jiyu yuliaoku de yizhe fengge yanjiu – yi Lu Xun xiaoshuo liangge yingyiben wei li” 基於語料庫的譯者風格研究 – 以魯迅小說兩個英譯本為例 [A Corpus-based Study of Translator’s Style in the Two English Versions of Lu Xun’s Fiction]. Foreign Language Education 外語教學 36, no. 2: 109–13.
Yang, Hsien-yi, and Gladys Yang (trans.). 1956. Lu Xun: Selected Works (Vol 1). Beijing: Foreign Language Press.

15 On a Historical Approach to Cantonese Studies
A Corpus-Based Contrastive Analysis of the Use of Classifiers in Historical and Recent Translations of the Four Gospels
Tak-sum Wong and Wai-mun Leung

15.1 Introduction

Supported by the Lord Wilson Heritage Trust, the “Database of the 19th Century (1865–1894) Cantonese Christian Writings” provides a public data repository through the digitization of 15 Cantonese Christian classics published in the mid- to late nineteenth century (Tóngguāng 同光 period of the Qing Dynasty), with a total of approximately 466,000 characters. The database is accessible to those who are interested in the history of Christianity in Hong Kong and provides valuable and reliable documents for scholars in the fields of linguistics, theology, religion, translation, and other academic disciplines.1 Since Robert Morrison (1782–1834) arrived in Guangzhou at the beginning of the nineteenth century, marking the beginning of Protestant missions in China, many missionaries have followed in his footsteps and come to the East.
To facilitate the dissemination of Christian teachings, missionaries who came to the Guangdong region (including Hong Kong) learned the local language, Cantonese, and began to translate, write, and publish Christian books in Cantonese dialects, such as prayers, evangelistic books, and hymns. In addition to the various books of the Bible, many influential Christian books were gradually translated into or written in Cantonese during the mid- to late nineteenth century, such as Coming Close to Jesus (1865), The Pilgrim’s Progress (1871), and Questions and Answers on the Gospel of John (1888).

The historical value of the works available in this database is enormous for the study of Christian missionary activities in the Guangdong area and the history of early Cantonese translations. For example, the database provides not only materials for the study of the progress of scholars’ interpretation of ancient biblical manuscripts but also documents for the study of the historical development of Cantonese, textual analysis and interpretation of Cantonese, comparison of expressions and styles in English-Cantonese translations, and the historical formation of written Cantonese.

DOI: 10.4324/9781003298328-16

The four key features of this database are as follows:
1 High diversity of literature.
Full texts of the 15 Cantonese Christian classics from the mid- to late nineteenth century were digitized, covering the following four categories:
• Books of the Bible:
The Old Testament: Genesis (1873), Exodus (1888), Deuteronomy (1888)
The New Testament: Acts (1872), Matthew (1882), Mark (1882), Luke (1883), John (1883), Selected Readings of the Gospel of Luke (circa the 1880s, Chinese-English-Romanization edition)
• Allegorical novels: The Pilgrim’s Progress (1871), The Pilgrim’s Progress II (1870)
• Spiritual missions: Coming Close to Jesus (1865), That Sweet Story of Old (1874)
• Teaching materials: Questions and Answers on the Gospel of John (1888), Readings in Cantonese Colloquial (1894)
2 Easy searching and exporting. Our database provides retrieval and advanced query functions, and users can set the number of results per page from 10 to 100 entries. The preceding and ensuing three sentences of each search result are displayed on the result page to help users understand its context. Results can be easily copied or exported to a spreadsheet for further processing.
3 Displaying images of original materials. Scanned images of the original texts of all 15 documents are provided to facilitate close reading of primary sources by users.
4 Facilitating the comparison of different translations.
The Old Testament. The following translation is provided for users to compare different translations of verses in Genesis, Exodus, and Deuteronomy:
• The Mandarin version published in Shanghai in 1919 (“The Old and New Testaments,” Chinese Union Version Bible, published by the American Bible Society)
The New Testament.
The following two editions are provided for users to access selected readings from Matthew, Mark, Luke, John, and Acts for text comparison: • • The Mandarin version published in Shanghai in 1919 (“The Old and New Testaments,” Chinese Union Version Bible, published by the American Bible Society) The contemporary Cantonese translation published in Hong Kong in 2010 (Cantonese Bible: New Cantonese Version, published by the Hong Kong Bible Society, first edition published in 2006) On a Historical Approach to Cantonese Studies 283 In the first stage of development of our database, 15 historical Christian writings were digitalized and made publicly accessible. In the second stage, we planned to provide linguistic tagging for all texts. At present the tagging of the 1880s (Noyes et al. 1882a, 1882b, 1883a, 1883b) and 2010 editions of the four canonical gospels in the Christian New Testament (“Four Gospels,” hereinafter) was finished. In this chapter, we will focus on these eight texts and provide a statistical account and a contrastive study on the use of classifiers therein. For the linguistic value of studying the translations of the Four Gospels, please refer to Leung (2011, 2016, 2021). On the study of digitalizing the early Cantonese Bible, the reader may refer to Kataoka (2021). 15.2 Classifiers in Cantonese In most European languages, the use of measure words is marked. They are only employed when actualizing the semantic boundary of nouns (Bisang 1999, 121) is desired. In some cases, the natural boundary is absent (e.g., a cup of coffee, and a drop of water), while in other cases, the use of natural boundaries is not intended (e.g., a basket of fruit, and a gang of people). In the context when the natural boundary is adopted when counting, measure words are always absent (e.g., an apple, a man, and a bean). 
In other parts of the world, by contrast, the use of measure words is mandatory in a number of languages, even when the natural boundary is adopted in counting. The measure words in these languages are often referred to as classifiers. For example, in contemporary Cantonese:

(1) 一個哥哥
jɐt5 kɔ33 kɔ11kɔ55
one cl elder.brother
“an elder brother”

(2) 兩隻眼
lœŋ13 tsɛk3 ŋan13
two cl eye
“two eyes”

(3) 三個姑娘
sam55 kɔ33 ku55nœŋ11
three cl young.lady
“three young ladies”

(4) 六隻貓
lok2 tsɛk3 mau55
six cl cat
“six cats”

The absence of classifiers is ungrammatical when counting (with rare exceptions), for example:

(1)’ *一哥哥
*jɐt5 kɔ11kɔ55
one elder.brother
“an elder brother”

(2)’ *兩眼
*lœŋ13 ŋan13
two eye
“two eyes”

(3)’ *三姑娘
*sam55 ku55nœŋ11
three young.lady
“three young ladies”

(4)’ *六貓
*lok2 mau55
six cat
“six cats”

Classifiers can be used to count not only nouns but also actions, for example:

(5) 賭一鋪
tou35 jɐt5 pʰou55
bet one cl
“to take a gamble”

(6) 打十下
ta35 sɐp2 ha13
hit ten cl
“hit ten times”

Classifiers for counting objects, as shown in examples 1 to 4, are commonly known as numerical classifiers, while those for counting actions, as shown in examples 5 and 6, are commonly called verbal classifiers. When nouns are premodified with demonstrative and interrogative pronouns, the use of classifiers is also mandatory, such as:

(7) 呢個姑娘
ni55 kɔ33 ku55nœŋ11
this cl young.lady
“this young lady”

(8) 嗰隻貓
kɔ35 tsɛk3 mau55
that cl cat
“that cat”

(9) 邊隻眼?
pin55 tsɛk3 ŋan13 ?
which cl eye
“Which eye?”

Besides being commonly used for counting and referential purposes in Cantonese (and the majority of Sinitic languages), noun classifiers can also undergo reduplication to form reduplicated classifiers denoting each individual (Wu 2017), for example:

(10) 個個姑娘都好靚
kɔ33kɔ33 ku55nœŋ11 tou55 hou35 lɛŋ33
cl-cl young.lady also very pretty
“Every young lady is pretty.”

In example 10, the general classifier kɔ33個 is reduplicated to form the construction kɔ33kɔ33個個, “everyone,” referring to every young lady. For a comprehensive account of the usage of classifiers in contemporary Cantonese, readers can refer to Cheung (2007, 344–6) as well as Matthews and Yip (2011, 39, 109–26).

15.3 A Contrastive Analysis of the Use of Classifiers in Historical and Recent Translations of the Four Gospels

In this section, we will compare the use of classifiers as observed in the Cantonese translations of the 2010 edition and the 1880s edition of the four canonical gospels in the Christian New Testament. In Section 15.3.1, classifiers for counting and referential purposes will be analyzed, while reduplicated classifiers will be discussed in Section 15.3.2.

15.3.1 Classifiers for Counting and Referential Purposes

The ten most frequently used classifiers for counting and referential purposes as observed in the 2010 edition of the contemporary Cantonese translation of the Four Gospels are listed in Table 15.1.
Table 15.1 List of Top 10 Classifiers Present in the Contemporary Cantonese Translation of the Four Gospels

Matthew (N = 684) | Mark (N = 432) | Luke (N = 720) | John (N = 465)
Classifier (63) # | Classifier (50) # | Classifier (72) # | Classifier (47) #
個† kɔ33 247 | 個† kɔ33 152 | 個† kɔ33 296 | 個† kɔ33 167
啲† ti55 144 | 啲† ti55 89 | 啲† ti55 110 | 啲† ti55 103
日† jɐt2 42 | 日† jɐt2 22 | 日† jɐt2 51 | 位† wɐi35 43
隻 tsɛk3 25 | 隻 tsɛk3 15 | 隻 tsɛk3 20 | 日† jɐt2 36
班 pan55 19 | 次† tsʰi33 15 | 件† kin22 20 | 件† kin22 16
件† kin22 17 | 條† tʰiu11 13 | 人 jɐn11 19 | 次† tsʰi33 11
條† tʰiu11 17 | 件† kin22 11 | 次† tsʰi33 16 | 條† tʰiu11 9
位† wɐi35 15 | 班 pan55 11 | 位† wɐi35 15 | 年 nin11 6
次† tsʰi33 15 | 座 tsɔ22 9 | 年 nin11 13 | 班 pan55 6
句 kɵy33 11 | 位† wɐi35 7 | 條† tʰiu11 11 | 羣 kʷʰɐn11 6

In Table 15.1, N denotes the total number of classifier tokens in each gospel, while the total number of classifier types is shown in row 2. For instance, 63 different classifiers are found in the Gospel of Matthew, while 684 tokens are present. It can be observed that 7 classifiers overlap in the top 10 classifier lists across these four Gospels (marked with †). Note that in contemporary Cantonese, kɔ33個 is a general classifier used in a countable context in which the number or amount to be expressed is exact, while ti55啲 is a general classifier used in an uncountable context or when the number/amount to be expressed is unspecified.
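The overlap count reported for Table 15.1 can be rechecked with a simple set intersection. The following is a minimal sketch, not the authors' procedure: the lists are copied by hand from Table 15.1, and the names `top10_2010` and `shared_classifiers` are hypothetical.

```python
# Top 10 classifiers per gospel, copied from Table 15.1 (2010 edition).
# The variable and function names are illustrative assumptions.
top10_2010 = {
    "Matthew": ["個", "啲", "日", "隻", "班", "件", "條", "位", "次", "句"],
    "Mark":    ["個", "啲", "日", "隻", "次", "條", "件", "班", "座", "位"],
    "Luke":    ["個", "啲", "日", "隻", "件", "人", "次", "位", "年", "條"],
    "John":    ["個", "啲", "位", "日", "件", "次", "條", "年", "班", "羣"],
}


def shared_classifiers(lists):
    """Return the classifiers present in every list, in the order of the first list."""
    names = list(lists)
    first = lists[names[0]]
    common = set(first).intersection(*(lists[name] for name in names[1:]))
    return [classifier for classifier in first if classifier in common]


shared = shared_classifiers(top10_2010)
print(len(shared), shared)
```

With the lists as transcribed here, the intersection contains seven classifiers, which agrees with the count given in the text; 隻 drops out because it is not in the top 10 for John, and 班 because it is not in the top 10 for Luke.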
One example for each classifier is presented in the following for illustration:

(11) 個 kɔ33
五個餅 (Luke 9:13, 2010)
ŋ13 kɔ33 pɛŋ35
five cl loaf
“five loaves”

(12) 啲 ti55
呢啲工作 (Luke 4:43, 2010)
ni55 ti55 koŋ55tsɔk3
DEM cl work
“these tasks”

(13) 日 jɐt2
三日 (Mark 8:2, 2010)
sam55 jɐt2
three day
“three days”

(14) 件 kin22
呢件事 (Luke 1:18, 2010)
ni55 kin22 si22
DEM cl matter
“this issue”

(15) 條 tʰiu11
兩條魚 (Luke 9:13, 2010)
lœŋ13 tʰiu11 jy35
two cl fish
“two fishes”

(16) 位 wɐi35
嗰位天使 (Luke 2:13, 2010)
kɔ35 wɐi35 tʰin55si33
that cl angel
“that angel”

(17) 次 tsʰi33
得罪你七次 (Luke 17:4, 2010)
tɐk5tsɵy22 nei13 tsʰɐt5 tsʰi33
trespass.against 2SG seven cl
“to trespass against thee seven times”

It should be noted that the absence of some frequently observed classifiers from the top 10 list of a gospel does not imply their absence from that gospel. In most cases, those classifiers merely occupy a lower position in the frequency list. For example, the sortal classifier commonly used for counting animals, tsɛk3隻, appears in all the Four Gospels: the Gospel of Matthew (25 tokens), the Gospel of Mark (15 tokens), the Gospel of Luke (20 tokens), and the Gospel of John (3 tokens). Its absence from the top 10 list of the Gospel of John is just a result of its low frequency, which is even lower than that of the tenth most frequently observed classifier, namely, kʷʰɐn11羣, “crowd” (6 tokens), which is a collective classifier and can also be used to count animals.

Having introduced the distribution of classifiers in the contemporary Cantonese translation of the Four Gospels of the 2010 edition, we now turn to the 1880s. The distribution of classifiers for counting and referential purposes in the historical Cantonese translation of the 1880s edition of the Four Gospels is shown in Table 15.2.
Table 15.2 List of Top 10 Classifiers Present in the Historical Cantonese Translation of the Four Gospels

Matthew (N = 678) | Mark (N = 398) | Luke (N = 798) | John (N = 476)
Classifier # | Classifier # | Classifier # | Classifier #
個† kɔ33 266 | 個† kɔ33 184 | 個† kɔ33 392 | 個† kɔ33 200
的† ti53 210 | 的† ti53 107 | 的† ti53 202 | 的† ti55 174
陣 tʃɐn22 64 | 隻 tʃɛk3 19 | 日† jɐt2 59 | 日† jɐt2 36
日† jɐt2 43 | 日† jɐt2 17 | 隻 tʃɛk3 33 | 陣 tʃɐn22 15
隻 tʃɛk3 27 | 條† tʰiu11 16 | 陣 tʃɐn22 28 | 條† tʰiu11 15
條† tʰiu11 22 | 樣† jœŋ22 16 | 件 kin22 21 | 件 kin22 10
樣† jœŋ22 20 | kan53 11 | 條† tʰiu11 21 | 樣† jœŋ22 10
kan53 9 | 件 kin22 10 | 樣† jœŋ22 16 | 次 tsʰɿ33 6
次 tsʰɿ33 9 | 句 ky33 10 | kan53 15 | 處 ʃy33 5
人 jɐn11 8 | 隊 tui22 8 | 年 nin11 11 | 位 wɐi22 5

Similarly, the overlapping classifiers are marked with †. One example for each of these commonly observed classifiers in historical Cantonese will be given in the following for illustration purposes:

(18) 個 kɔ33
十個城 (Luke 19:17, 1883)
ʃɐp2 kɔ33 ʃeŋ11
ten cl city
“ten cities” (see note 2)

(19) 的 ti53
呢的衆人 (Mark 8:2, 1882)
ni53 ti53 tʃoŋ33jɐn11
dem cl multitude
“these people”

(20) 日 jɐt2
三日 (Mark 8:2, 1882)
sam53 jɐt2
three day
“three days”

(21) 條 tʰiu11
呢條標 (John 19:20, 1883)
ni53 tʰiu11 piu53
DEM cl title
“this title”

(22) 樣 jœŋ22
各樣嘅私慾 (Mark 4:19, 1882)
kɔk3 jœŋ22 kɛ33 sɿ53jok2
every cl ADN lust
“the lusts of other things”

Likewise, the absence of some commonly observed classifiers from the top 10 list of a gospel in Table 15.2 does not imply their absence from that gospel. For instance, as shown in Table 15.2, the sortal classifier kan53, which is commonly used for counting buildings, appears in the top 10 lists of all Four Gospels except the Gospel of John. Its absence there is merely a consequence of its low frequency in the Gospel of John, where only one instance is found. Apparently, three classifiers are shared among the top 10 lists of both the 1880s and 2010 editions, namely, kɔ33個 [(11), (18)], jɐt2日 [(13), (20)], and tʰiu11條 [(15), (21)].
Readers who have a basic mastery of the Chinese language should be able to notice the graphical similarity between the classifiers in examples (12) and (19), namely, “啲” and “的.” In fact, the two allographs are semantically and phonologically identical. The former is used predominantly in contemporary Cantonese but had already appeared as early as 1877 in other Cantonese historical documents, while the latter appears frequently in historical documents published in the nineteenth century. However, in the 1880s edition of the Four Gospels, only the graph “的” is present, possibly a result of direct transference from earlier translations. The insertion of the mouth radical “口” to the left of the graph “的” is probably related to a historical sound change of this classifier. On the etymology and historical development of “啲” and “的,” readers can refer to Wong (2010) for details. It is also worth noting that four instances of the graph “的” are observed in the 2010 edition, despite its rarity, if not absence, in contemporary Cantonese vernacular writing. This suggests that in the course of preparing the 2010 edition, the translator(s) might have referred to the 1880s edition rather than translated from scratch. Thus, four classifiers are in fact shared among the top 10 lists of the four Gospels in both editions, namely, kɔ33個, jɐt2日, tʰiu11條, and ti53/ti55的/啲.

Tables 15.3 and 15.4 list the most frequently observed classifiers that together account for about 95% of all classifier tokens, based on cumulative frequency, in the 2010 and 1880s editions of the Four Gospels, respectively. In the 2010 edition, 96 classifiers are used, but in the 1880s edition, only 81 are present. Among the top 10 classifiers, 6 are found in both editions, namely, kɔ33個, ti55/ti53啲/的, jɐt2日, kin22件, tsɛk3/tʃɛk3隻, and tʰiu11條, which suggests the prevalent usage of these classifiers in Cantonese since the nineteenth century.
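The effect of treating “啲” and “的” as one classifier can be illustrated with a short sketch over the per-gospel top 10 lists of Tables 15.1 and 15.2. This is only an illustration under stated assumptions: the lists are transcribed by hand, the classifier whose graph is not legible in Table 15.2 is keyed by its transcription `kan53`, and all names (`TOP10_2010`, `TOP10_1880S`, `ALLOGRAPHS`, `shared_in_all_lists`) are hypothetical.

```python
# Per-gospel top 10 lists copied from Table 15.1 (2010) and Table 15.2 (1880s).
# "kan53" stands in for the classifier whose graph is missing in the table.
TOP10_2010 = {
    "Matthew": ["個", "啲", "日", "隻", "班", "件", "條", "位", "次", "句"],
    "Mark":    ["個", "啲", "日", "隻", "次", "條", "件", "班", "座", "位"],
    "Luke":    ["個", "啲", "日", "隻", "件", "人", "次", "位", "年", "條"],
    "John":    ["個", "啲", "位", "日", "件", "次", "條", "年", "班", "羣"],
}
TOP10_1880S = {
    "Matthew": ["個", "的", "陣", "日", "隻", "條", "樣", "kan53", "次", "人"],
    "Mark":    ["個", "的", "隻", "日", "條", "樣", "kan53", "件", "句", "隊"],
    "Luke":    ["個", "的", "日", "隻", "陣", "件", "條", "樣", "kan53", "年"],
    "John":    ["個", "的", "日", "陣", "條", "件", "樣", "次", "處", "位"],
}

# Assumption for this sketch: the two graphs are counted as one classifier.
ALLOGRAPHS = {"啲": "的"}


def shared_in_all_lists(editions, merge_allographs):
    """Intersect every per-gospel list of every edition."""
    sets = []
    for edition in editions:
        for classifiers in edition.values():
            if merge_allographs:
                classifiers = [ALLOGRAPHS.get(c, c) for c in classifiers]
            sets.append(set(classifiers))
    return set.intersection(*sets)


by_graph = shared_in_all_lists([TOP10_2010, TOP10_1880S], merge_allographs=False)
merged = shared_in_all_lists([TOP10_2010, TOP10_1880S], merge_allographs=True)
print(sorted(by_graph))  # graphs compared as written
print(sorted(merged))    # 啲 and 的 treated as the same classifier
```

Compared graph by graph, three classifiers are common to all eight lists; once the two allographs are merged, the count rises to four, in line with the figures given in the text.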
It is interesting to see that the cumulative frequency of the tenth most frequently used classifier in the 1880s edition, ky33句, “sentence,” has reached 86.8% already, but its rank counterpart in the 2010 edition, tsɔ22座, is 80.9% only, a difference of almost 6 percentage points.

Table 15.3 The Most Frequently Observed Classifiers Present in the Recent Cantonese Translation of the Four Gospels

Rank | Classifier | Frequency | Rel. Freq. | Cum. Freq. | Cum. Rel. Freq.
1 | 個 kɔ33 | 862 | 37.8% | 862 | 37.8%
2 | 啲/的 ti55 | 450 | 19.7% | 1312 | 57.5%
3 | 日 jɐt2 | 151 | 6.6% | 1463 | 64.2%
4 | 位 wɐi35 | 80 | 3.5% | 1543 | 67.7%
5 | 件 kin22 | 64 | 2.8% | 1607 | 70.5%
6 | 隻 tsɛk3 | 63 | 2.8% | 1670 | 73.2%
7 | 次 tsʰi33 | 57 | 2.5% | 1727 | 75.7%
8 | 條 tʰiu11 | 50 | 2.2% | 1777 | 77.9%
9 | 班 pan55 | 40 | 1.8% | 1817 | 79.7%
10 | 座 tsɔ22 | 28 | 1.2% | 1845 | 80.9%
11 | 羣 kʷʰɐn11 | 27 | 1.2% | 1872 | 82.1%
12 | 句 kɵy33 | 25 | 1.1% | 1897 | 83.2%
13 | 年 nin11 | 21 | 0.9% | 1918 | 84.1%
14 | 人 jɐn11 | 19 | 0.8% | 1937 | 85.0%
15 | 家 ka55 | 18 | 0.8% | 1955 | 85.7%
16 | 倍 pʰui13 | 16 | 0.7% | 1971 | 86.4%
17 | 嚿 kɐu22 | 15 | 0.7% | 1986 | 87.1%
18 | 樖 pʰɔ55 | 14 | 0.6% | 2000 | 87.7%
19 | 塊 fɐi33 | 14 | 0.6% | 2014 | 88.3%
20 | kan53 | 13 | 0.6% | 2027 | 88.9%
21 | 籃 lam11 | 12 | 0.5% | 2039 | 89.4%
22 | 種 tsong35 | 12 | 0.5% | 2051 | 90.0%
23 | 粒 nɐp5 | 12 | 0.5% | 2063 | 90.5%
24 | 張 tsœŋ55 | 12 | 0.5% | 2075 | 91.0%
25 | 代 tɔi22 | 12 | 0.5% | 2087 | 91.5%
26 | 樣 jœŋ22 | 11 | 0.5% | 2098 | 92.0%
27 | 晚 man13 | 11 | 0.5% | 2109 | 92.5%
28 | 組 tsou35 | 8 | 0.4% | 2117 | 92.9%
29 | 兩 lœŋ35 | 8 | 0.4% | 2125 | 93.2%
30 | 身 sɐn55 | 8 | 0.4% | 2133 | 93.6%
31 | 歲 sɵy33 | 6 | 0.3% | 2139 | 93.8%
32 | 隊 tɵy22 | 6 | 0.3% | 2145 | 94.1%
33 | 段 tyn22 | 6 | 0.3% | 2151 | 94.3%
34 | 邊 pin55 | 5 | 0.2% | 2156 | 94.6%
35 | 雙 sœŋ55 | 5 | 0.2% | 2161 | 94.8%
(Other 61) | | 119 | 5% | 2280 | 100%

In Table 15.3, among the classifiers that make up the top 95% in modern Cantonese, three are not found in the entire Four Gospels of the 1880s edition, namely, pan55班, tsong35種, and tsou35組. All this suggests that the diversity of classifiers used in the 2010 edition is higher than that in the 1880s edition. It is also interesting to see that the relative frequency of some classifiers underwent a drastic change.
For example, there was a reduction in the relative frequency of tui22/tɵy22隊 from 0.7% in the 1880s edition to 0.3% in the 2010 edition, while the relative frequency of jœŋ22樣 fell from 2.3% to 0.5%.

Table 15.4 The Most Frequently Observed Classifiers Present in the Historical Cantonese Translation of the Four Gospels

Rank | Classifier | Frequency | Rel. Freq. | Cum. Freq. | Cum. Rel. Freq.
1 | 個 kɔ33 | 1042 | 38.8% | 1042 | 38.8%
2 | 的 ti53 | 693 | 25.8% | 1735 | 64.5%
3 | 日 jɐt2 | 155 | 5.8% | 1890 | 70.3%
4 | 陣 tʃɐn22 | 114 | 4.2% | 2004 | 74.6%
5 | 隻 tʃɛk3 | 83 | 3.1% | 2087 | 77.6%
6 | 條 tʰiu11 | 74 | 2.8% | 2161 | 80.4%
7 | 樣 jœŋ22 | 62 | 2.3% | 2223 | 82.7%
8 | 件 kin22 | 49 | 1.8% | 2272 | 84.5%
9 | kan53 | 36 | 1.3% | 2308 | 85.9%
10 | 句 ky33 | 26 | 1.0% | 2334 | 86.8%
11 | 嚿 kɐu22 | 25 | 0.9% | 2359 | 87.8%
12 | 次 tsʰɿ33 | 24 | 0.9% | 2383 | 88.7%
13 | 人 jɐn11 | 18 | 0.7% | 2401 | 89.3%
14 | 年 nin11 | 18 | 0.7% | 2419 | 90.0%
15 | 倍 pʰui13 | 18 | 0.7% | 2437 | 90.7%
16 | 隊 tui22 | 18 | 0.7% | 2455 | 91.3%
17 | 位 wɐi22 | 14 | 0.5% | 2469 | 91.9%
18 | 斤 kɐn53 | 11 | 0.4% | 2480 | 92.3%
19 | 處 ʃy33 | 11 | 0.4% | 2491 | 92.7%
20 | 籃 lam11 | 11 | 0.4% | 2502 | 93.1%
21 | 粒 nɐp5 | 10 | 0.4% | 2512 | 93.5%
22 | 笪 tat3 | 10 | 0.4% | 2522 | 93.8%
23 | 世 ʃɐi33 | 9 | 0.3% | 2531 | 94.2%
24 | 代 tɔi22 | 9 | 0.3% | 2540 | 94.5%
25 | 張 tʃœŋ53 | 8 | 0.3% | 2548 | 94.8%
(Other 56) | | 140 | 5.2% | 2688 | 100%

Do the absence of the three classifiers in the 1880s edition and the drastic change in the relative frequency of some classifiers also suggest that there existed a process of lexical replacement in the history of Cantonese? A comparison of identical verses containing these three classifiers in the two editions was conducted to investigate this conjecture. Our analysis found that while in most cases, the reduction in the use of classifiers is a result of the employment of other strategies in the course of translation, in other cases, lexical replacement took place.
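The relative and cumulative relative frequencies in Tables 15.3 and 15.4 follow directly from the raw counts and the edition totals (2,280 and 2,688 tokens). The sketch below recomputes them for the ten top-ranked classifiers of each edition; it is an illustration only, with counts copied from the two tables and hypothetical names (`cumulative_table`, `rows_1880s`, `rows_2010`). The classifier whose graph is missing in Table 15.4 is labelled by its transcription `kan53`.

```python
def cumulative_table(counts, total):
    """Return (rank, classifier, freq, rel %, cum freq, cum rel %) rows."""
    rows = []
    running = 0
    for rank, (classifier, freq) in enumerate(counts, start=1):
        running += freq
        rows.append((
            rank,
            classifier,
            freq,
            round(100 * freq / total, 1),     # relative frequency in %
            running,                          # cumulative frequency
            round(100 * running / total, 1),  # cumulative relative frequency in %
        ))
    return rows


# Top 10 counts copied from Table 15.4 (1880s edition, 2688 tokens in total).
counts_1880s = [("個", 1042), ("的", 693), ("日", 155), ("陣", 114), ("隻", 83),
                ("條", 74), ("樣", 62), ("件", 49), ("kan53", 36), ("句", 26)]
# Top 10 counts copied from Table 15.3 (2010 edition, 2280 tokens in total).
counts_2010 = [("個", 862), ("啲/的", 450), ("日", 151), ("位", 80), ("件", 64),
               ("隻", 63), ("次", 57), ("條", 50), ("班", 40), ("座", 28)]

rows_1880s = cumulative_table(counts_1880s, total=2688)
rows_2010 = cumulative_table(counts_2010, total=2280)

for row in rows_1880s:
    print(*row, sep="\t")

# Coverage of the ten most frequent classifiers in each edition.
coverage_1880s = rows_1880s[-1][5]
coverage_2010 = rows_2010[-1][5]
print(coverage_1880s, coverage_2010, round(coverage_1880s - coverage_2010, 1))
```

Under these inputs the ten most frequent classifiers cover 86.8% of the tokens in the 1880s edition and 80.9% in the 2010 edition, a gap of 5.9 percentage points, which is the difference of "almost 6%" discussed in the text.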
Example (23) shows a case which employed tui22隊 as a collective classifier of jɐn11人, “human being,” in historical Cantonese, while tsou35組 was employed in contemporary Cantonese translation.

(23) Luke 9:14
[. . .] 耶穌又對門生話、呌大衆排開坐倒處、每隊五十人。 (1883)
[. . .] jɛ11su53 jɐu22 tui33 mun11ʃɐŋ53 wa22 , kiu33 tai22tʃoŋ33 pʰai11 hɔi53 tsʰɔ13 tou35 ʃy33 mui13 tui22 ŋ13ʃɐp2 jɐn11 .
Jesus also to disciple say ask masses line.up PRT sit at place each cl fifty human
[. . .] 耶穌對 哋話:「叫羣眾一組一組坐落,每組約五十人。」 (2010)
[. . .] jɛ11sou55 tɵy33 kʰɵy13tei22 wa22 : “ kiu33 kʷʰɐn11tsoŋ33 jɐt5 tsou35 jɐt5 tsou35 tsʰɔ13 lɔk2 , mui13 tsou35 jœk3 ŋ13sɐp2 jɐn11 . ”
Jesus to 3PL say ask throng one cl one cl sit PRT each cl approximately fifty human
“[. . .] And he [Jesus] said to his disciples, Make them sit down by fifties in a company.”

In contemporary Cantonese, tɵy22隊 is often used to count teams, while the collective classifier for counting groups (of people) is tsou35組; but in historical Cantonese, apparently, tui22隊 can also be used to count groups, while tsou35/tsu35組 is absent in the four Gospels of the 1880s edition. Example (24) shows a similar case which employed tui22隊 as the collective classifier of pigs in historical Cantonese, while kʷʰɐn11羣 was employed in contemporary Cantonese translation.

(24) Luke 8:32
[. . .]個的鬼求耶 個隊猪處 [. . .] (1883)
[. . .] kɔ33 ti53 kʷɐi35 kʰɐu11 jɛ11su53 tʃun35 kʰy13 jɐp2 kɔ33 tui22 tʃy55 ʃy33 [. . .]
DEM CL ghost beseech Jesus allow 3SG enter DEM cl swine place
[. . .] 鬼就央求耶穌, 哋去羣豬處 [. . .] (2010)
[. . .] kʷɐi35 tsɐu22 jœŋ55kʰɐu11 jɛ11sou55 , tsɵn35 kʰɵy13tei22 hɵy33 kʷʰɐn11 tsy55 sy33 [. . .]
ghost then implore Jesus allow 3PL go cl swine place
“[. . .] and they [devils] besought him [Jesus] that he would suffer them to enter into them [. . .]”

In this example, the classifier for counting pigs is kʷʰɐn11羣, depicting a crowd of pigs.
In contemporary Cantonese, it is also grammatical to say jɐt5 tɵy22 tsy55一隊豬, but only in the case when pigs are “lining up.” Example (25) shows an instance which employed jœŋ22樣, “kind,” as the generic classifier of an abstract concept, namely, sɿ53jok2私欲, “lust,” in historical Cantonese, while tsong35種, “kind,” was employed in contemporary Cantonese translation.

(25) Mark 4:19
[. . .] 與及各樣嘅私慾 嚟偪死道理 [. . .] (1882)
[. . .] jy13kʰɐp2 kɔk3 jœŋ22 kɛ33 sɿ53jok2 , tou53 lɐi11 pek5 sɿ35 tou22li13 [. . .]
and every cl ADN lust also come choke die argument
[. . .] 同其他各種慾望入嚟窒息信息嘅生機 [. . .] (2010)
[. . .] tʰoŋ11 kʰei11tʰa55 kɔk3 tsoŋ35 jok2mɔŋ2 jɐp2lai11 tsɐt2sek5 sɵn33sek5 kɛ33 sɐŋ55kei55 [. . .]
with other every cl desire go.into choke message ADN vitality
“[. . .] and the lusts of other things entering in, choke the word [. . .]”

In contemporary Cantonese, the use of jœŋ22樣 is more restricted, such that it can only be used to count a finite set of nouns (e.g., jɛ13嘢, “thing; issue”), but tsong35種 can be used in combination with any noun. As reflected in the Four Gospels, in historical Cantonese, jœŋ22樣 seems to have been used in combination with any noun, abstract or concrete, for example, tou22li13道理, “argument” (John 4:25), tʃɐn53li13真理, “truth” (John 16:13), sɿ22事, “issue” (Mark 1:38), tʃeng33症, “disease” (Mark 1:34), pʰi33jy22譬喻, “parable” (Mark 4:13), ʃin22ji22善義, “righteousness” (Matthew 3:15), and peng22tʰong33 痛, “sickness” (Matthew 4:23). Example (26) employed wui11囘, “time,” as a verbal classifier of the actions tɐk5tsui22得罪, “trespass against,” and fan53tʃyn33番轉, “turn round,” in historical Cantonese, while tsʰi33次, “time,” was employed in contemporary Cantonese translation.

(26) Luke 17:4
倘若 一日七囘得罪你、亦七囘番轉嚟話 [. . .] (1883)
tʰɔŋ35jœk2 kʰy13 jɐt5 jɐt2 tsʰɐt5 wui11 tɐk5tsui22 ni13 , jek2 tsʰɐt5 wui11 fan53tʃyn33 lɐi11 wa22 [. . .]
if 3SG one day seven cl trespass.against 2SG also seven cl turn.round PRT say
若 喺一日內得罪你七次,每 回頭對你話 [. . .] (2010)
jœk2 kʰɵy13 hɐi35 jɐt5 jɐt2 nɔi22 tɐk5tsɵy22 nei13 tsʰɐt5 tsʰi33 , mui13 jɐt5 tsʰi33 tou55 wui11tʰɐu11 tɵy33 nei13 wa22 [. . .]
if 3SG LOC one day inside trespass.against 2SG seven cl each one cl also turn.round to 2SG say
“And if he trespass against thee seven times in a day, and seven times in a day turn again to thee, saying [. . .].”

In contemporary Cantonese, tsʰi33次 is an unmarked classifier for counting the number of times of an action. Although the word order differs between the historical and contemporary Cantonese translations, in this context, the use of tsʰi33 is still an unmarked choice in colloquial contemporary Cantonese even if the classifier is in a preverbal position. The use of wui11囘 as a classifier is no longer common in contemporary Cantonese; it is usually used idiomatically in some particular contexts, like m11 hɐi22 jɐt5 wui11 si22唔係一回事, “not the same thing/issue.” Example (27) shows a verse which employs tat3笪 as a classifier of tʰin11田, “field,” in historical Cantonese, while fɐi33塊 is used in contemporary Cantonese translation:

(27) Matthew 13:44
[. . .] 歡喜去賣嘵所有嘅、嚟買個笪田。 (1882)
[. . .] hou35 fun53hi35 hy33 mai22 hiu53 ʃɔ35jɐu13 kɛ33, lɐi11 mai13 kɔ33 tat3 tʰin11.
very joyous go sell PFV all NOM PRT buy DEM cl field
[. . .] 然後 高興將自己所有嘅 賣,去買嗰塊田。 (2010)
[. . .] jin11hɐu22 hou35 kou55heŋ33 tsœŋ55 tsi22kei35 sɔ35jɐu13 kɛ33 tou55 pin33mai22, hɵy33 mai13 kɔ35 fai33 tʰin11.
afterwards very joyous PRT self all NOM also sell.off go buy that cl field
“[. . .] and for joy thereof goeth and selleth all that he hath, and buyeth that field.”

The previous example shows a typical case of lexical replacement.
The classifier tat3 survives in contemporary Cantonese but is only used to count places or land parcels (e.g., jɐt5 tat3 tei22fɔŋ55一笪地方, “a place”), as seen in example 28, while the canonical classifier for tʰin11, “field,” is fai33.

(28) Mark 14:32
 哋到一笪地方,名客西馬尼 [. . .] (2010)
kʰɵy13tei22 tou33 jɐt5 tat3 tei22fɔŋ55 , meŋ11 hak3sɐi55ma13nei11 [. . .]
3PL arrive one cl place name GN
“And they came to a place which was named Gethsemane . . .”

It should be noted that, among the classifiers whose relative frequency changes drastically between Tables 15.3 and 15.4, only some cases reflect the process of lexical replacement, while many others result from the application of different translation strategies. As shown in example (29), the lexical item kʷʰɐn11tʃong33羣衆, “throng,” was used in the 1880s edition, whereas jɐt5 tai22 pan55 jɐn11一大班人, “a huge group of people,” is used in the 2010 edition. In contemporary Cantonese, jɐt5 tai22 pan55 jɐn11 sounds more colloquial, while kʷʰɐn11tsong33 is usually used in a higher register.

(29) John 6:5
耶穌舉眼、見羣衆嚟到 處 [. . .] (1883)
jɛ11su53 ky35 ŋan13 , kin33 kʷʰɐn11tsoŋ33 lɐi11 tou33 kʰy13 ʃy33 [. . .]
Jesus lift eye see throng come to 3SG place
耶穌抬頭,睇見一大班人嚟到 前 [. . .] (2010)
jɛ11sou55 tʰɔi11tʰɐu11 , tʰɐi35kin33 jɐt5 tai22 pan55 jɐn11 lɐi11 tou33 kʰɵy13 min22tsʰin11 . . .
Jesus gain.ground see one big cl human come to 3SG in.front.of
“When Jesus then lifted up his eyes, and saw a great company come unto him [. . .]”

In example (30), the general classifier kɔ33個 is used to count the noun tʰin53sɿ33天使, “angel,” in the 1880s edition, but the honorific classifier for counting people, wɐi35位, is utilized in contemporary Cantonese translation.
In the 1880s edition, wɐi22 was also observed, for example, in verse (30), when it is employed to count tʰin53sɿ33天使, “angel.” In this case, the selection of classifiers seems to have been a matter of the translators’ choice, with no linguistic factor involved.

(30) Luke 2:13
忽然 有大隊天軍、同埋個天使讚美上帝話。 (1883)
fɐt5jin11kan53, jɐu13 tai22 tui22 tʰin53kʷɐn53, tʰoŋ11mai11 kɔ33 tʰin53sɿ33 tsan33mi13 ʃœŋ22tɐi33 wa22.
suddenly EXIST big CL heavenly.host and cl angel praise God say
忽然,有大隊天軍同嗰位天使,讚美上帝話: (2010)
fɐt5jin11, jɐu13 tai22tɵy22 tʰin55kʷɐn55 tʰoŋ11 kɔ35 wɐi35 tʰin55si33, tsan33mei13 sœŋ22tɐi33 wa22:
suddenly EXIST big CL heavenly.host and that cl angel praise God say
“And suddenly there was with the angel a multitude of the heavenly host praising God, and saying.”

15.3.2 Classifier Reduplication

Statistics of classifier reduplication are excluded from Tables 15.1 to 15.4. They are presented in Tables 15.5 and 15.6. Table 15.5 shows the statistics of the reduplicated classifiers present in the 2010 edition. It can be observed that only jɐn11jɐn11人人, “everybody,” and jɐt2jɐt2日日, “every day,” are observed more than once. Table 15.6 shows the statistics of the 1880s edition. It can be seen that kɔ33kɔ33個個 exists in all Four Gospels, while jɐn11jɐn11人人 is present in three Gospels but not in the Gospel of John. Apparently, the number of reduplicated classifiers was reduced from 32 in the 1880s edition to 11 in the 2010 edition. Does this reflect a historical syntactic change in Cantonese? A comparison of the same verses in both editions shows that the reduction in the usage of reduplicated classifiers is usually a result of a change of translation strategy for expressing the idea of each individual. In some cases, a universal quantifier was used.
For example: Table 15.5 Reduplicated Classifiers in the Cantonese Translation of the 2010 Edition of the Four Gospels (N = 11) Matthew Mark Luke John Type # Type # Type # Type # 人人jɐɐn11jɐɐn11 句句kɵy33kɵy33 1 1 種種tsoŋ35tsoŋ35 1 人人jɐɐn11jɐɐn11 日日jɐ ɐt2jɐɐt2 2 2 個個kɔɔ33kɔɔ33 2 日日jɐ ɐt2jɐɐt2 1 樣樣jœŋ22 jœŋ22 1 Table 15.6 Reduplicated Classifiers in the Cantonese Translation of the 1880s edition of the Four Gospels (N = 32) Matthew Mark Luke John Type # Type # Type # Type # 個個kɔ ɔ33kɔɔ33 世世ʃɐɐi33ʃɐɐi33 4 1 個個kɔɔ33kɔɔ33 人人jɐɐn11jɐɐn11 3 1 個個kɔɔ33kɔɔ33 人人jɐ ɐn11jɐɐn11 4 3 個個kɔ ɔ33kɔɔ33 3 人人jɐɐn11jɐɐn11 1 件件kin22kin22 1 日日jɐ ɐt2jɐɐt2 3 句句ky ky 33 1 樣樣jœŋ jœŋ 1 處處ʃy33ʃy33 1 2 1 世世ʃɐɐi ʃɐɐi 1 33 日日jɐ ɐt jɐɐt 2 22 22 33 33 對對tui33tui33 年年nin nin 11 樣樣jœŋ jœŋ 22 1 1 11 22 1 296 Tak-sum Wong and Wai-mun Leung (31) Luke 1:65 topic comment 鄰里個個驚慌 [. . .] (1883) lun11li13 kɔ33kɔ33 keŋ53fɔŋ53 neighbour everybody panic subject predicate 鄰居 奇 [. . .] (2010) lөn11kөy55 tou55 hou35 neighbour also very keŋ55khei11 surprised [. . .] [. . .] “And fear came on all that dwelt round about them [. . .]” In the 1880s edition, the reduplicated classifier kɔ ɔ33kɔɔ33個個 is used to express the idea of every neighbour. In the 2010 edition, the universal quantifier tou55 is used to express the idea of all neighbours. In addition, there also exists a change in syntactic construction. In example (31), topic-comment construction is used in the 1883 ɔ 53個 edition such that lun11li13鄰里, “neighbour,” is the topic, while kɔɔ33kɔɔ33 keŋ53fɔŋ 個驚慌, “everybody is panicking,” is the comment. In the 2010 edition, the subjectpredicate construction is used, with lɵn11kɵy55鄰居, “neighbour,” being the subject, 奇, “all being very surprised,” is the predicate. while tou55 hou35 keŋ55kʰei11 The objective truth expressed by these two translations is identical even though different linguist constructions were used, which also leads to a shift in focus. 
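Reduplicated classifiers of the kind counted in Tables 15.5 and 15.6 can be located in raw text with a backreference pattern. The sketch below is an assumption-laden illustration and not the tagging procedure used for the database: the function name `find_reduplicated` and the small classifier inventory are hypothetical, and the two test strings are the 1883 clause of example (31) and example (1).

```python
import re


def find_reduplicated(text, classifiers):
    """Return reduplicated classifiers (XX) found in text, in order of appearance."""
    if not classifiers:
        return []
    # Build an alternation of the known classifiers and require an immediate repeat.
    alternatives = "|".join(re.escape(c) for c in sorted(classifiers))
    pattern = re.compile(rf"({alternatives})\1")
    return [match.group(0) for match in pattern.finditer(text)]


# Hypothetical inventory for illustration; a real study would use the full list.
CLASSIFIERS = {"個", "人", "日", "處", "樣", "句"}

print(find_reduplicated("鄰里個個驚慌", CLASSIFIERS))  # 1883 clause of example (31)
print(find_reduplicated("一個哥哥", CLASSIFIERS))      # example (1): 哥哥 is not a classifier
```

Restricting the pattern to a classifier inventory keeps reduplicated nouns such as 哥哥 out of the results. A purely string-based match can still over-count when two identical characters meet by chance across a word boundary, so any hits would need manual checking against the tagged text.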
In other cases, other lexical items were used to express the identical objective truth. For instance:

(32) Luke 4:20
topic + comment
. . . 在會堂嘅、人人 定眼睇住 。 (1883)
. . . tsɔi22 wui22tʰɔŋ11 kɛ33 , jɐn11jɐn11 tou53 teŋ22 ŋan13 tʰɐi35 tʃy22 kʰy13 .
LOC synagogue NOM human-human also fasten eye see ASP 3SG
subject + predicate
. . . 全會堂嘅人 定眼睇住 。 (2010)
. . . tsʰyn11 wui22tʰɔŋ11 kɛ33 jɐn11 tou55 teŋ22 ŋan13 tʰɐi35 tsy22 kʰɵy13 .
entire synagogue ATTR human also fasten eye see ASP 3SG
“. . . And the eyes of all them that were in the synagogue were fastened on him.”

The reduplicated classifier jɐn11jɐn11人人, literally “human-human,” is used to express the idea of everybody in the 1880s edition, while the universal quantifier tsʰyn11全, “entire,” is used with wui22tʰɔŋ11 kɛ33 jɐn11會堂嘅人 to convey the idea of people in the whole synagogue in the 2010 edition. There also exists a difference in sentence construction such that a topic-comment construction is used in the former while a subject-predicate construction is used in the latter edition. Similarly, the objective truth expressed by these two constructions is identical, although there is a subtle difference in focus.

In a number of cases, the concept of each individual is expressed by other constructions, such as:

(33) Luke 11:3
我哋需用嘅糧、日日俾我哋。 (1883)
ŋɔ13ti22 sy53 joŋ22 kɛ33 lœŋ11 , jɐt2jɐt2 pi35 ŋɔ13ti22 .
1PL need use ATTR grain day-day give 1PL
賜俾我哋每日需要嘅飲食。 (2010)
tsʰi33 pei35 ŋɔ13tei22 mui13 jɐt2 sɵy55jiu33 kɛ33 jɐm35sek2 .
bestow to 1PL each day need ATTR diet
“Give us day by day our daily bread.”

The reduplicated classifier jɐt2jɐt2日日, literally “day-day,” is used to express the idea of every day in the 1880s edition, while in the 2010 edition, the determiner mui13每, “every” + classifier, is used to express the same idea. It is also worth noting that in some cases, other lexical items are used to convey the idea of each individual, like:

(34) Luke 9:6
[. . .] 處處傳福音、醫人嘅 。 (1883)
[. . .] ʃy33ʃy33 tʃʰyn11 fok5jɐm55 , ji53 jɐn11 kɛ33 peng22 .
place-place preach gospel cure human POSS sickness
[. . .] 傳福音,到處醫 。 (2010)
[. . .] tsʰyn11 fok5jɐm55 , tou33tsʰy33 ji55 pɛŋ22 .
preach gospel everywhere cure sickness
“[. . .] preaching the gospel, and healing every where.”

In example (34), the reduplicated classifier ʃy33ʃy33處處, literally, “place-place,” is used to express the idea of everywhere in the 1880s edition, while in the 2010 edition, the lexical item tou33tsʰy33到處, “everywhere,” is used instead. In terms of lexical choice, in contemporary Cantonese, ʃy33ʃy33 is rarely used, while tou33tsʰy33 is only used in a formal context (e.g., news reports). In the context of example (34), the word tsɐu55wɐi11周圍 is most frequently used in colloquial Cantonese according to the authors’ native intuition.

In examples (31) to (34), other strategies are employed in the 2010 edition to replace the reduplicated classifiers of the 1880s edition in expressing the idea of each individual. Readers may wonder whether other strategies were replaced by the reduplicated classifiers in the 2010 edition. Let us take a look at the following example:

(35) Luke 4:15
喺各會堂敎人、衆人歸榮 。 (1883)
hɐi35 kɔk3 wui22tʰɔŋ11 kau33 jɐn11 , tʃoŋ33jɐn11 kʷɐi53weŋ11 kʰy13 .
LOC every synagogue teach human everybody glorify 3SG
 喺各會堂敎導人,人人 稱讚 。 (2010)
kʰɵy13 hɐi35 kɔk3 wui22tʰɔŋ11 kau33tou22 jɐn11 , jɐn11jɐn11 tou55 tsʰeŋ55tsan33 kʰɵy13 .
3SG LOC every synagogue teach human human-human also glorify 3SG
“And he taught in their synagogues, being glorified of all.”

In the 1880s edition, the pronoun tʃoŋ33jɐn11衆人, “everybody,” is used to refer to all the people in the synagogue, but in the 2010 edition, the reduplicated classifier jɐn11jɐn11人人, literally “human-human,” is used to convey the same objective truth, albeit with a different focus.
In terms of lexical choice, in contemporary Cantonese, tʃoŋ33jɐn ɐ 11衆人 is only used in a formal context, while jɐn ɐ 11jɐn ɐ 11人人 is often used in a colloquial context. This seems to suggest that the construction employed for expressing a collective concept is likely a matter of the choice of the translators. Some readers may make a conjecture that reduplicated classifiers become less popular in contemporary Cantonese as observed from their reduced usage in the 2010 edition. As native speakers, the authors confirm that the use of reduplicated classifiers is still prevalent in contemporary Cantonese. For this reason, investigations into more Cantonese historical documents should be made before jumping to a rash conclusion. 15.4 Conclusion In this chapter, we first introduced the “Database of the 19th Century (1865–1894) Cantonese Christian Writings,” which provides a public data repository by digitizing 15 Cantonese Christian classics published in mid- to late nineteenth century with approximately 466,000 characters. Then, we provided a statistical account and a contrastive study on the use of classifiers present in the Cantonese translations of the 1880s edition and the 2010 edition of the four canonical gospels in the Christian New Testament. Our results show that while some classifiers have been used most regularly since the nineteenth century, such as kɔɔ33個 (a general classifier), kin22件 (piece), tʰiu11條 (strip), tsɛk33隻 (mostly for counting animals and dolls), and ti55 的/啲, the frequency of some classifiers in the 2010 edition drops drastically as a result of lexical replacement. For example, tat33笪 (for counting fields) is replaced by fai33塊. We also found that the reduction in frequency of reduplicated classifiers is a result of changes in translation strategy rather than a reduction in usage in contemporary Cantonese. References Bisang, Walter. 1999. 
“Classifiers in East and Southeast Asian Languages: Counting and Beyond.” In Numeral Types and Changes Worldwide, edited by J. Gvozdanović, 113–85. Berlin: Mouton de Gruyter.
Cheung Hung-nin Samuel 張洪年. 2007. Xiānggǎng yuèyǔ yǔfǎ de yánjiū 香港粵語語法的硏究 [A Grammar of Cantonese as Spoken in Hong Kong] (rev. ed.). Hong Kong: The Chinese University Press.
Hong Kong Bible Society 香港聖經公會, ed. and trans. 2010. Sān Gwóngdūngwá Singgīng 新廣東話聖經 [Cantonese Bible: New Cantonese Version]. Hong Kong: Author.
Kataoka Shin 片岡新. 2021. “Jiànlì ‘Zǎoqī yuèyǔ shèngjīng zīliàokù’: Yuèyǔ shèngjīng de shùmǎ rénwénxué yánjiū” 建立《早期粵語聖經資料庫》: 粵語聖經的數碼人文學研究 [The Development of the ‘Early Cantonese Bible Database’: A Resource for Digital Humanities Research on Early Cantonese]. Current Research in Chinese Linguistics 中國語文通訊 100, no. 2: 213–28.
Leung, Wai-mun 梁慧敏. 2011. “Shíjiǔ shìjì ‘Shèngjīng’ Yuèyǔ yìběn de yánjiū jiàzhí” 十九世紀《聖經》粵語譯本的研究價值 [The Research Value of the 19th Century Cantonese Bible Translations]. Journal of Jinan University (Philosophy and Social Sciences) 暨南學報 (哲學社會科學版) 155: 125–29.
Leung, Wai-mun 梁慧敏. 2016. “Lùn Yuèyǔ jùmòzhùcí ‘ze1’ de zhǔguānxìng” 論粵語句末助詞“啫”的主觀性 [Analysis of the Subjectivity of Sentence-final Particles: The Case of ze1 in Cantonese]. Studies of the Chinese Language 3: 339–48.
Leung, Wai-mun 梁慧敏. 2021. “Shíjiǔ Shìjì Mò ‘Xīnyuē Sì Fúyīn’ Yuèyǔ Yìběn de Yǔyánxué Jiàzhí” 十九世紀末《新約四福音》粵語譯本的語言學價值 [The Linguistic Value of the Cantonese Translation of the Four Gospels Published in Late 19th Century]. www.lordwilson-heritagetrust.org.hk/filemanager/archive/project_doc/27-9105/2.pdf.
Matthews, Stephen, and Virginia Yip. 2011. Cantonese: A Comprehensive Grammar (2nd ed.). London and New York: Routledge.
Noyes, Henry V., George H. Piercy, and P. J. Masters, eds. and trans. 1882a. Máhhó Fūkyāmchyùhn: Yèuhngsìhng Tóupahk 馬可福音傳:羊城土白 [Gospel of Mark: Cantonese Dialect]. Shanghai: The American Bible Society.
Noyes, Henry V., George H. Piercy, and P. J. Masters, eds. and trans. 1882b. Máhtai Fūkyāmchyùhn: Yèuhngsìhng Tóupahk 馬太福音傳:羊城土白 [Gospel of Matthew: Cantonese Dialect]. Shanghai: The American Bible Society.
Noyes, Henry V., George H. Piercy, and P. J. Masters, eds. and trans. 1883a. Louhgāchyùhn Fūkyāmsyū: Yèuhngsìhng Tóupahk 路加傳福音書:羊城土白 [Gospel of Luke: Cantonese Dialect]. Canton: The British and Foreign Bible Society.
Noyes, Henry V., George H. Piercy, and P. J. Masters, eds. and trans. 1883b. Yeukhohnchyùhn Fūkyāmsyū: Yèuhngsìhng Tóupahk 約翰傳福音書:羊城土白 [Gospel of John: Cantonese Dialect]. Canton: The British and Foreign Bible Society.
Wong Tak-sum 黃得森. 2010. “Guǎngzhōuhuà ‘dī’, ‘dīt’ yǔ ‘dīk’ de lìshí fāzhǎn” 廣州話“啲”、“尐”與“的”之歷時發展 [The Diachronic Development of Di, Dit and Dik in Cantonese]. Yue Dialect Research 粵語研究 6&7: 75–82.
Wu Yicheng 吳義誠. 2017. “Numeral Classifiers in Sinitic Languages: Semantic Content, Contextuality, and Semi-lexicality.” Linguistics 55, no. 2: 333–69.

Notes

1 The database is publicly accessible at www.polyu.edu.hk/cbs/hkchristdb.
2 All English translations of Bible verses are taken from the King James Version unless otherwise specified.
<www.o-bible.com/kjv.html>.