Advances in Corpus Applications in
Literary and Translation Studies
Professor Riccardo Moratto and Professor Defeng Li present contributions focusing on the interdisciplinarity of corpus studies, with a special emphasis on literary
and translation studies, which offer a broad and varied picture of the promise and
potential of methods and approaches. Inside, scholars share their research findings concerning current advances in corpus applications in literary and translation studies and explore possible and tangible collaborative research projects. The
volume is split into two sections focusing on the applications of corpora in literary
studies and translation studies. Issues explored include historical backgrounds,
current trends, theories, methodologies, operational methods, and techniques, as
well as training of research students.
This international, dynamic, and interdisciplinary exploration of corpus studies
and corpus application in various cultural contexts and different countries will
provide valuable insights for any researcher in literary or translation studies who
wishes to have a better understanding when working with corpora.
Riccardo Moratto (PhD, FCIL) is Professor of Translation and Interpreting Studies and Chinese Literature in Translation at the Graduate Institute of Interpretation
and Translation, Shanghai International Studies University.
Defeng Li, Professor of Translation Studies, is Associate Dean of Faculty of Arts
and Humanities and Director of the Centre for Studies of Translation, Interpreting,
and Cognition (CSTIC) at the University of Macau.
Routledge Advances in Translation and Interpreting Studies
Translating Controversial Texts in East Asian Contexts
A Methodology for the Translation of “Controversy”
Adam Zulawnik
Using Technologies for Creative-Text Translation
Edited by James Hadley, Kristiina Taivalkoski-Shilov, Carlos da Silva Cardoso
Teixeira, and Antonio Toral
Relevance Theory in Translation and Interpreting
A Cognitive-Pragmatic Approach
Fabrizio Gallai
Towards a Feminist Translator Studies
Intersectional Activism in Translation and Publishing
Helen Vassallo
The Behavioral Economics of Translation
Douglas Robinson
Online Collaborative Translation in China and Beyond
Community, Practice, and Identity
Chuan Yu
Advances in Corpus Applications in Literary and Translation Studies
Edited by Riccardo Moratto and Defeng Li
Institutional Translator Training
Edited by Tomáš Svoboda, Łucja Biel, and Vilelmini Sosoni
For more information about this series, please visit www.routledge.com/
Routledge-Advances-in-Translation-and-Interpreting-Studies/book-series/RTS.
Advances in Corpus
Applications in Literary and
Translation Studies
Edited by Riccardo Moratto and
Defeng Li
First published 2023
by Routledge
4 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
and by Routledge
605 Third Avenue, New York, NY 10158
Routledge is an imprint of the Taylor & Francis Group, an informa
business
© 2023 selection and editorial matter, Riccardo Moratto and Defeng Li;
individual chapters, the contributors
The right of Riccardo Moratto and Defeng Li to be identified as the
authors of the editorial material, and of the authors for their individual
chapters, has been asserted in accordance with sections 77 and 78 of the
Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this book may be reprinted or reproduced or
utilised in any form or by any electronic, mechanical, or other means, now
known or hereafter invented, including photocopying and recording, or in
any information storage or retrieval system, without permission in writing
from the publishers.
Trademark notice: Product or corporate names may be trademarks or
registered trademarks, and are used only for identification and explanation
without intent to infringe.
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
ISBN: 978-1-032-28738-6 (hbk)
ISBN: 978-1-032-28740-9 (pbk)
ISBN: 978-1-003-29832-8 (ebk)
DOI: 10.4324/9781003298328
Typeset in Times New Roman
by Apex CoVantage, LLC
Contents
List of Tables
List of Figures
Notes on Editors
Notes on Contributors
Introduction
RICCARDO MORATTO AND DEFENG LI
1 Diachronic Trends in Fiction Authors’ Conceptualizations of their Practices
DARRYL HOCKING AND PAUL MOUNTFORT
2 Within-Author Style Variation in Literary Nonfiction: The Situational Perspective
MARIANNA GRACHEVA AND JESSE A. EGBERT
3 Charles Dickens’s Influence on Benito Pérez Galdós Revisited: A Corpus-Stylistic Approach
PABLO RUANO SAN SEGUNDO
4 A Corpus-Stylistic Approach to the Literary Representation of Narrative Space in Ruiz Zafón’s The Cemetery of Forgotten Books Series
GUADALUPE NIETO CABALLERO AND PABLO RUANO SAN SEGUNDO
5 Analyzing Who, What, and Where in a Mediæval Chinese Corpus: A Case Study on the Chinese Buddhist Canon
TAK-SUM WONG AND JOHN SIE YUEN LEE
6 Corpora and Literary Translation
TITIKA DIMITROULIA
7 Orality in Translated and Non-Translated Fictional Dialogues
YANFANG SU AND KANGLONG LIU
8 The Avoidance of Repetition in Translation: A Multifactorial Study of Repeated Reporting Verbs in the Italian Translation of the Harry Potter Series
LORENZO MASTROPIERRO
9 Feminist Translation of Sexual Content: A Quantitative Study on Chinese Versions of The Color Purple
XINYI ZENG AND JOHN SIE YUEN LEE
10 Benefits of a Corpus-based Approach to Translations: The Example of Huckleberry Finn
RONALD JENN AND AMEL FRAISSE
11 Are Translated Chinese Wuxia Fiction and Western Heroic Literature Similar? A Stylometric Analysis Based on Stylistic Panoramas
KAN WU AND DECHAO LI
12 Translating Personal Reference: A Corpus-Based Study of the English Translation of Legends of the Condor Heroes
JING FANG AND SHIWEI FU
13 Lexical Bundles in the Fictional Dialogues of Two Hongloumeng Translations: A Corpus-Assisted Approach
KANGLONG LIU, JOYCE OIWUN CHEUNG, AND RICCARDO MORATTO
14 Mapping Culture-Specific and Creative Metaphors in Lu Xun’s Short Stories by L1 and L2 English Translators: A Corpus-Assisted Relevance-Theoretical Account
LINPING HOU AND DEFENG LI
15 On a Historical Approach to Cantonese Studies: A Corpus-Based Contrastive Analysis of the Use of Classifiers in Historical and Recent Translations of the Four Gospels
TAK-SUM WONG AND WAI-MUN LEUNG
Index
Tables
1.1 Word Composition of the FAC
1.2 High-Frequency Upward-Trending Lemmas in the FAC (freq. ≥ 50, p < 0.05)
1.3 High-Frequency Downward-Trending Lemmas in the FAC (freq. ≥ 50, p < 0.05)
1.4 1950s Keywords
1.5 1960s Keywords
1.6 1970s Keywords
1.7 1980s Keywords
1.8 1990s Keywords
1.9 2000s Keywords
1.10 2010s Keywords
1.11 2020s Keywords
2.1 Range of Variation along Dimensions
2.2 “Interactive vs. Informational Style” Dimension Scores across Communicative Purposes in Phillip Lopate’s Essays
2.3 “Immediate vs. Removed Style” Dimension Scores across Communicative Purposes in David Shields’s Essays
2.4 Clusters Identified in Ander Monson’s Essays
3.1 Annotation Tags Used to Annotate Fortunata and Jacinta
4.1 Novels by Ruiz Zafón
4.2 Clusters in Ruiz Zafón’s Novels
5.1 Named Entity Recognition Performance on the Test Set
5.2 Precision and Recall in Subject-Verb Pair Extraction from L&K Treebank
5.3 Most Frequent Characters (As Nominal Subjects) in the Corpus
5.4 Most Frequent Characters in the Two Subcorpora of the Canon
5.5 Most Frequent Verbs with Buddha (Left) and Other Characters (Right) as Subject
5.6 Most Frequent Verbs of Three Different Characters
5.7 Precision and Recall in Character-Toponym Pair Extraction in the Test Set
5.8 Most Frequent Places Associated with the Three Major Epithets of Buddha and with Other Characters
7.1 Composition of the Fictional Dialogue Corpus
7.2 Score of Dimension 1
7.3 Results of Mann-Whitney U-Test
8.1 Overview of the Data and Sub-Datasets
8.2 Reporting Verb Taxonomy
8.3 Generalized Linear Model: Series
8.4 Generalized Linear Model: Translator 1
8.5 Generalized Linear Model: Translator 2
8.6 Generalized Linear Model: HbP
9.1 Breakdown 197 Sentences in Our Corpus According to Sexual Content Type
9.2 The Spectrum of Translation Strategies Annotated in Our Corpus
9.3 Translation Strategies Illustrated with Example Translations
9.4 Breakdown of Translation Strategy Frequency in Feminist (Tao) and Non-Feminist (Yang) Translation
9.5 Sentence-Level Comparison of the Translation Strategies in the Feminist (Tao) and Non-Feminist (Yang) Translations
9.6 Example Sentences for Each Sexual Content Type
11.1 Details of the Translated Wuxia Stories
11.2 Details of the Chivalric Stories
11.3 Details of the Heroic Fantasies
11.4 Statistics of the Stylistic Indices across the Genres
11.5 Average Distances between Each Wuxia Translation and the Western Counterparts
11.6 Reception of the Six Wuxia Translations in English (Up to 02/2021)
12.1 Personal References Used by Guo Jing in LoCH
12.2 Translation Equivalence by Social Status (Speaker-Reference)
12.3 Chi-Square Test Result of Translation Equivalence and Social Status (Speaker-Reference)
12.4 Translation Equivalence by Social Status (Addressee-Reference)
12.5 Chi-Square Test Result of Translation Equivalence and Social Status (Addressee-Reference)
13.1 Descriptive Statistics of Fictional Dialogues in HD and YD
13.2 Types and Tokens of 3-Word and 4-Word LBs in HD and YD
13.3 Structural Classifications of Key-LBs in HD and YD
13.4 Statistics of VP-Based Key-LBs in HD and YD
13.5 Statistics of PP-Based Key-LBs in YD
13.6 Functional Classifications of Key-LBs in HD and YD
13.7 Statistics of Stance Key-LBs in HD and YD
13.8 Statistics of Referential Key-LBs in HD and YD
14.1 Rendering Strategies of Conventional Culture-Specific Metaphors
14.2 Rendering Strategies of Creative Culture-Specific Metaphors
14.3 Rendering Strategies of Conventional Culture-Universal Metaphors
14.4 Rendering Strategies of Creative Culture-Universal Metaphors
15.1 List of Top 10 Classifiers Present in the Contemporary Cantonese Translation of the Four Gospels
15.2 List of Top 10 Classifiers Present in the Historical Cantonese Translation of the Four Gospels
15.3 The Most Frequently Observed Classifiers Present in the Recent Cantonese Translation of the Four Gospels
15.4 The Most Frequently Observed Classifiers Present in the Historical Cantonese Translation of the Four Gospels
15.5 Reduplicated Classifiers in the Cantonese Translation of the 2010 Edition of the Four Gospels (N = 11)
15.6 Reduplicated Classifiers in the Cantonese Translation of the 1880s edition of the Four Gospels (N = 32)
Figures
2.1 Phillip Lopate’s essays: spread of scores on “interactive vs. informational style.”
2.2 David Shields’s essays: spread of scores on “immediate vs. removed style.”
2.3 Ander Monson’s essays: spread of scores on “abstract expository vs. concrete descriptive style.”
2.4 Phillip Lopate: variation by communicative purpose on “interactive vs. informational style.”
2.5 David Shields: variation by communicative purpose on “immediate vs. removed style.”
2.6 Ander Monson: variation by communicative purpose on “abstract expository vs. concrete descriptive style.”
2.7 Ander Monson’s essays: three-cluster solution.
2.8 Clusters identified in Ander Monson’s essays.
3.1 Example 2 with annotation.
3.2 Screenshot of 50 suspensions in Fortunata and Jacinta.
3.3 Screenshot of CLiC tool with 20 suspensions from Oliver Twist.
4.1 Screenshot of concordance a la puerta de la in Ruiz Zafón’s novels.
5.1 Example dependency tree to illustrate character-verb pair extraction (K229).
5.2 Most frequent verbs with nominal subjects.
5.3 Dependency tree with a character-toponym pair involving a verb.
5.4 Dependency tree with a character-toponym pair involving a preposition.
5.5 Most frequent verbs that take toponyms as direct objects.
5.6 Most frequent prepositions that take toponyms as prepositional objects.
5.7 The ten toponyms most frequently mentioned with a character.
6.1 Melby’s eight types of translation technology (1998, 1).
7.1 Scores for dimension 1 of different registers.
8.1 Concordance sample for reporting verbs attributed to Harry.
8.2 Query for reply in Treq.
8.3 Query for urged in WordNet.
8.4 “Freq” effect plots.
8.5 “Freq” effect plots without said.
9.1 Overall distribution of translation strategies for sexual content in feminist (Tao) and non-feminist (Yang) translation.
9.2 Overall distribution of translation strategies according to sexual content type in feminist (Tao) and non-feminist (Yang) translation.
9.3 Sentence-level comparison: the number of sentences translated with a more explicit strategy in the feminist (Tao) than non-feminist (Yang) translation, and vice versa.
10.1 Excerpt of the Translation Dashboard for Basque, Bulgarian, Dutch, Finnish, German, Hungarian, Polish, Portuguese, Russian, and Ukrainian (first five chapters).
10.2 Example of paragraph alignment for the Basque version.
10.3 Example of paragraph alignment for the Bulgarian version.
11.1 Stylistic panoramas of the three subgenres, from a global view.
11.2 Cluster dendrogram of the HCA-based stylistic panoramas.
11.3 Sample parallel list for the MFWs in the selected works.
11.4 Sample parallel list for the MFWSs (2-grams) in the selected works.
11.5 Sample parallel list for the MFWSs (3-grams) in the selected works.
11.6 PCA graph of individuals, based on the top 1,000 MFWs.
11.7 PCA graph of individuals, based on the top 1,000 2-grams.
11.8 PCA graph of individuals, based on the top 1,000 3-grams.
12.1 Lexicogrammatical realization of speech roles in LoCH (ST).
14.1 RT translation route of metaphors by L1 and L2 translators.
Notes on Editors
Riccardo Moratto (PhD, FCIL) is Professor of Translation and Interpreting Studies and Chinese Literature in Translation at the Graduate Institute of Interpretation
and Translation, Shanghai International Studies University; Chartered Linguist
and Fellow Member of CIOL; Editor in Chief of Interpreting Studies for Shanghai
Foreign Language Education Press (外教社); General Editor of Routledge Studies in East Asian Interpreting; and General Editor of Routledge Interdisciplinary
and Transcultural Approaches to Chinese Literature. Professor Moratto is also
Honorary Guest Professor at the College of Foreign Studies, Nanjing Agricultural
University; Honorary Research Fellow at the Center for Translation Studies of
Guangdong University of Foreign Studies; and Expert Member of the Translators
Association of China (TAC). Professor Moratto is an international conference
interpreter and renowned literary translator. He has published extensively in the
field of translation and interpreting studies and Chinese literature in translation.
Defeng Li, Professor of Translation Studies, is Associate Dean of Faculty of Arts
and Humanities and Director of the Centre for Studies of Translation, Interpreting,
and Cognition (CSTIC) at the University of Macau. Prior to his current appointment, he served as Chair of the Centre for Translation Studies and Reader in
Translation Studies at SOAS, University of London; Director of the MA in Translation and Associate Professor at the Chinese University of Hong Kong; Dean
and Chair Professor at Shandong University (adjunct); and Chair Professor at
Shanghai Jiaotong University (adjunct). He is currently President of the International Association of Translation, Interpreting, and Cognition (IATIC) and World
Interpreter and Translator Training Association (WITTA). He has researched and
published extensively in the field of cognitive translation studies, corpus-assisted
translation studies, curriculum development in translator training, research methods in translation studies, professional translation (e.g., business, journalistic,
legal translation), as well as second language education.
Notes on Contributors
CHEUNG, Joyce Oiwun, is a PhD student at the Department of Chinese and
Bilingual Studies of the Hong Kong Polytechnic University. Her current research
interests include corpus linguistics, translation studies, and discourse analysis. She
has previously published in linguistic journals such as Social Semiotics and Recall.
DIMITROULIA, Titika, is Professor of Translation Studies at Aristotle University of Thessaloniki, Greece, a translator and a literary critic. Her research interests
range from literary translation to translation technologies, with emphasis on text
analysis and translation history. Her publications include the volumes Translation
and Memory (2021), Literary Translation (2015), and Digital Literary Studies
(2015), and articles in various journals and edited volumes. She is a member of the
board of Petra-E network for the education and training of literary translators,
and she coordinates at Aristotle University the Clarin-Apollonis project. She has
translated numerous libretti and published books and articles on literary criticism.
EGBERT, Jesse A., is Associate Professor of Applied Linguistics at Northern
Arizona University. Jesse specializes in register variation, quantitative methods in
linguistics, and corpus linguistic approaches to legal interpretation. He is Founding General Editor of Register Studies, Technical Strand Editor for the series
Cambridge Elements in Corpus Linguistics, and Co-Editor of the Routledge series
Advances in Corpus Linguistics. He has published more than 75 peer-reviewed
papers. Recent books include Using Corpus Methods to Triangulate Linguistic
Analysis (Routledge, 2019), Doing Linguistics with a Corpus: Methodological
Considerations for the Everyday User (Cambridge, 2020), and Designing and
Evaluating Language Corpora (Cambridge, 2022).
FANG, Jing, is a lecturer in the Translation and Interpreting Program at Macquarie University. She has a PhD in linguistics and a master’s degree in translation and interpreting. Her current research interests include systemic functional
linguistics, corpus-based translation studies, and sight translation research.
FRAISSE, Amel, is Associate Professor at Université de Lille, working on
information science and digital humanities. Her research focuses on information
extraction, knowledge acquisition and visualization, multilingualism and multiculturalism, and under-resourced languages and cultures. She received a PhD in
Computer Science in 2010 from Université de Grenoble. Her dissertation focused
on issues of software localization, globalization, and internationalization process.
From 2013 to 2015, she worked as a CNRS postdoctoral fellow at the LIMSI-CNRS laboratory (Orsay-Saclay, south of Paris), where she worked on natural
language processing tasks and more specifically on building multilingual corpora
and lexicon for sentiment analysis and opinion mining tasks.
FU, Shiwei, is PhD Candidate in the Department of Linguistics at Macquarie
University. She is also a PhD candidate in the School of Foreign Languages and
Literature at Wuhan University. Her research interests include Chinese literary
translation, corpus-based translation studies, and translation theories.
GRACHEVA, Marianna, is PhD Candidate in Applied Linguistics at Northern Arizona University. Marianna’s linguistic interests include register and style
variation, grammar, and corpus linguistics. Her dissertation research focuses on
intra-speaker style variation across registers in literary, academic, and political
domains, as well as inter-speaker variation within registers, and investigates the
relationship between individual linguistic style and the situation of use.
HOCKING, Darryl, is Senior Lecturer at Auckland University of Technology,
New Zealand. His research primarily uses corpus, genre, and discourse analysis
to examine the interactional genres and communicative practices in art and design
settings and how these impact on creative activity. He is the author of the books
Communicating Creativity (Palgrave Macmillan) and The Impact of Everyday
Language Change on the Practices of Visual Artists (Cambridge University Press).
HOU, Linping, received his PhD in English Linguistics, specializing in Cognitive Translation Studies, from the Centre for Studies of Translation, Interpreting,
and Cognition (CSTIC) at the University of Macau. He is Professor of Translation Studies in the English Department at Shandong University of Science and
Technology and Director of the Center for Cognitive Translation Studies (CCTS).
His main research interests include corpus-assisted translation studies, cognitive
translation studies, and cognitive pragmatics. He has published more than 20 articles in all these areas.
JENN, Ronald, is Professor of Translation Studies at Université de Lille, France.
He coauthored Mark Twain & France (University of Missouri Press, 2017) with
Paula Harrington. After a France-Berkeley project on the “ ‘French Marginalia’
of Mark Twain’s Personal Recollections of Joan of Arc” with Linda Morris, his
research now focuses on how Translation Studies and digital humanities can interact, using Huckleberry Finn as a case in point. This is what the France-Stanford
ROSETTA/Global Huck project has done, in collaboration with Shelley Fisher
Fishkin and Amel Fraisse (https://rosetta.huma-num.fr/worldmap/).
LEE, John Sie Yuen, is Associate Professor at the Department of Linguistics and
Translation at City University of Hong Kong. He obtained his PhD in Computer
Science at the Massachusetts Institute of Technology. His research interest is in
natural language processing and its applications in digital humanities, translation
studies, and computer-assisted language learning.
LEUNG, Wai-Mun, is Associate Professor of Applied Chinese Linguistics at the
Department of Chinese and Bilingual Studies, the Hong Kong Polytechnic University. Her research interests include Cantonese studies, language planning and
policy, Chinese-language education, and teaching Chinese to non-Chinese-speaking students. She recently constructed “The 19th Century (1865–1894) Cantonese
Christian Writings Database 十九世紀中後期(1865–1894)粵語基督教典籍
資料庫.” She is also the first author of the monograph Biliteracy and Trilingualism: Language Education Policy Research in Hong Kong (兩文三語:香港語文
教育政策研究, CityU Press, 2020), in which a chapter is devoted to discussing
Cantonese.
LI, Dechao, is Associate Professor of the Department of Chinese and Bilingual
Studies, the Hong Kong Polytechnic University. He also serves as Chief Editor of
Translation Quarterly, a journal published by the Hong Kong Translation Society. His main research areas include corpus-based translation studies, empirical
approaches to translation process research, history of translation in the late Qing
and early Republican periods, and PBL and translator/interpreter training.
LI, Defeng, is Professor of Translation Studies and Director of the Centre for
Studies of Translation, Interpreting, and Cognition (CSTIC) at the University of
Macau. Previously, he served as Chair of the Centre for Translation Studies and
Reader in Translation Studies at SOAS, University of London. He is President of
the International Association of Translation, Interpreting, and Cognition (IATIC)
and World Interpreter and Translator Training Association (WITTA). He has
researched and published extensively in the field of cognitive translation studies,
corpus-assisted translation studies, curriculum development in translator training, research methods in translation studies, professional translation, as well as
second-language education.
LIU, Kanglong, is Assistant Professor in the Department of Chinese and Bilingual Studies, the Hong Kong Polytechnic University. He specializes in corpus-based translation studies, and his main interests include empirical approaches to
translation studies, translation pedagogy, and corpus-based translation research.
MASTROPIERRO, Lorenzo, is Lecturer in English Language and Translation
at the University of Insubria (Como, Italy). He holds a PhD in English Linguistics from the University of Nottingham (UK). His research sits at the intersection between corpus linguistics, stylistics, and translation studies. He has worked
extensively on corpus stylistic approaches to literary translation, publishing a
monograph with Bloomsbury (Corpus Stylistics in Heart of Darkness and its
Italian Translations) and several papers/chapters on topics such as translated
cohesive networks, translator style as opposed to author style, the translation of
repeated items, reader-response analysis, and the translation of reporting verbs.
MORATTO, Riccardo, is Professor of Translation and Interpreting Studies
and Chinese Literature in Translation at the Graduate Institute of Interpretation
and Translation, Shanghai International Studies University; Chartered Linguist
and Fellow Member of CIOL (FCIL); Editor in Chief of Interpreting Studies
for Shanghai Foreign Language Education Press (外教社); General Editor of
Routledge Studies in East Asian Interpreting; and General Editor of Routledge
Interdisciplinary and Transcultural Approaches to Chinese Literature. Professor
Moratto is also Honorary Guest Professor at the College of Foreign Studies, Nanjing Agricultural University, and Expert Member of the Translators Association of
China (TAC). Professor Moratto has published extensively in the field of translation and interpreting studies and Chinese literature in translation.
MOUNTFORT, Paul, is Associate Professor in Auckland University of Technology’s School of Communication Studies, and former Chair of the AUT Centre
for Creative Writing. His research interests are narrative design and transmedia
studies.
NIETO CABALLERO, Guadalupe, is Postdoctoral Researcher at the Universidad Complutense de Madrid, Spain. Her main research interests are in twentieth-century Hispanic prose. She also works on digital humanities and on corpus
stylistics, with a particular focus on Spanish authors. She has published several
articles on corpus stylistics, and she is the coauthor of Estilística de corpus:
nuevos enfoques en el análisis de textos literarios (2020, Peter Lang).
RUANO SAN SEGUNDO, Pablo, is Senior Lecturer at the University of Extremadura, Spain. His research interests are in corpus linguistics, corpus stylistics, and
corpus translation studies, with a particular interest in Charles Dickens’s narrative
fiction. He has published a number of articles and chapters in edited books in
this area and is coauthor of the book Estilística de corpus: nuevos enfoques en el
análisis de textos literarios (Peter Lang, 2020).
SU, Yanfang, is currently a PhD student in the Department of Chinese and
Bilingual Studies, the Hong Kong Polytechnic University. Her research interests include corpus linguistics, corpus-based translation studies, and computer-assisted language learning.
WONG, Tak-Sum (黃得森), is Postdoctoral Fellow in the Department of Chinese and Bilingual Studies at the Hong Kong Polytechnic University. He received
his BEng in Computer Science from Hong Kong University of Science and
Technology (2004), and PhD in Linguistics from City University of Hong Kong
(2018). He has built a treebank of the Tripiṭaka Koreana during his doctoral study
and has been working on the quantitative study of historical syntax. His research
expertise covers Chinese historical linguistics, Cantonese linguistics, corpus linguistics, computer-assisted language learning, Chinese dialectology, and Chinese
paleography. Recently, he has been working on historical Sinitic brush-talk materials
from East Asian nations.
WU, Kan, is Lecturer of Translation Studies, School of Foreign Languages,
Zhejiang University of Finance and Economics, Dongfang College. His research
interests include corpus-based translation studies and digital humanities.
ZENG, Xinyi, earned her master’s degree in Conference Interpreting and Translation from the University of Essex. She is currently a PhD candidate at the
Department of Linguistics and Translation at City University of Hong Kong. Her
research focus is on feminist translation and corpus-based translation studies.
Introduction
Riccardo MORATTO and Defeng LI
DOI: 10.4324/9781003298328-1
One of the most striking developments in the area of humanities over the past decades is probably the increasing integration of humanities with computing, which
has resulted in the growth and popularity of digital humanities (DH) as a new discipline of studies. As technology develops fast, so do the tools made available to
humanities scholars and their applications in humanities research. Consequently,
it has become very difficult to define DH, as its definitions become outdated very
quickly.
However, the difficulty in defining DH has not deterred the increasingly wider
use of digital resources and tools in humanities research and the analysis of such
applications. As a matter of fact, recent years have seen these applications accelerate. A fine example is the introduction and integration of corpus linguistics in
the study of literature across the world. As Gonçalves states:
Corpus Linguistics can be a powerful tool in the analysis of literary texts,
especially when allied with non-computational approaches, to bring into
light interpretations, thematic details, critically important words in a text,
and other information that to other types of analysis might go unseen. By
enabling the researcher to process a large quantity of data, and by giving a
statistical treatment to the information obtained, Corpus Linguistics provides
an ideal approach to study various characteristics of a literary text that would
otherwise have gone unnoticed.
(2016, 42)
Biber (2011, 15) surveyed corpus-assisted analytical techniques for the analysis
of literature. He pointed out that “most of these studies focus on the distribution of words (analyzing keywords, extended lexical phrases, or collocations)
to identify textual features that are especially characteristic of an author or particular text.” Fischer-Starcke (2010) applied corpus stylistics to the analysis of
Northanger Abbey by Jane Austen, with the assistance of two corpora, one made
up of six novels by Austen, and the other of texts contemporary with Austen’s.
Her analysis shows how corpus tools can reveal textual linguistic features and
how these features affect the literary meanings of the texts. Thompson and Sealey
(2007) made a corpus-based comparison of children’s literature, adult literature,
and newspaper texts and conducted a quantitative analysis of the most frequent
words and sequences of words. They found that adult and children’s fiction are
similar in some characteristics, but subtle differences also exist between them in
the frequency of some linguistic items, and that the differences between fiction
and news texts are apparent.
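The kind of comparison described above, counting the most frequent words and word sequences and contrasting their frequencies across corpora, can be sketched in a few lines of Python. This is only an illustrative sketch and not the procedure used in any of the studies cited: the function names, the naive tokenizer, and the two toy texts are all assumptions introduced here.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n=1):
    """Count contiguous n-token sequences; n=1 gives plain word frequencies."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def relative_frequencies(counts, per=1000):
    """Normalize raw counts to a rate per `per` n-grams so corpora of different sizes are comparable."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count * per / total for gram, count in counts.items()}


# Toy stand-ins for two subcorpora (hypothetical text, not from any real corpus).
children_fiction = "the cat sat on the mat and the dog sat on the rug"
news_text = "the minister said on monday that the budget would rise"

child_tokens = tokenize(children_fiction)
news_tokens = tokenize(news_text)

# Most frequent single words and two-word sequences in one subcorpus.
top_words = ngram_counts(child_tokens, n=1).most_common(3)
top_bigrams = ngram_counts(child_tokens, n=2).most_common(2)

# Normalized rates allow the same item to be compared across subcorpora.
child_rates = relative_frequencies(ngram_counts(child_tokens))
news_rates = relative_frequencies(ngram_counts(news_tokens))

print(top_words)
print(top_bigrams)
print(child_rates[("the",)], news_rates[("the",)])
```

Normalizing to a rate per thousand items is a common convention when the corpora differ in size; a real study would also need principled tokenization and a significance test before interpreting any difference.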
While many scholars pursue the application of computational and corpus linguistics in the study of literature as creative writing (e.g., Mahlberg 2007, 2012,
2013; Mahlberg and McIntyre 2011; Mahlberg et al. 2013; McIntyre 2010), Baker
(2000) applied corpus methods in the analysis of literature as translations, a subsystem within the system of literature. In her seminal study of translator’s style, she
defined translator’s style as “a kind of thumb-print that is expressed in a range of
linguistic – as well as non-linguistic – features . . . the preferred or recurring patterns of linguistic behaviors, rather than individual or one-off instances of intervention” (245). Saldanha (2011) argues that the stylistic features of a translation
are not just those of a translator but rather a combination of the linguistic choices
of the authors, translators, editors, and others who might have a role in the production and revision of the translated text. She revised the definition of translator’s
style as:
[A] “way of translating” which (1) is felt to be recognizable across a range
of translations by the same translator, (2) distinguishes the translator’s work
from that of others, (3) constitutes a coherent pattern of choice, (4) is “motivated,” in the sense that it has a discernable function or functions, and (5)
cannot be explained purely with reference to the author or source-text style,
or as the result of linguistic constraints.
(Saldanha 2011, 31)
Inspired by Baker (2000), many translation scholars applied corpus methods in the
investigations of translator’s style involving different languages (e.g., Bosseaux
2001, 2004; Winters 2004a, 2004b, 2007, 2009; Li et al. 2011; Wang and Li 2011;
Chen and Li 2022).
As research advances on both fronts, that is, creative as well as translated literature, scholars have also made efforts to expand research topics and innovate
research designs. In order to capture the recent developments along these lines, an
online roundtable seminar was organized by the Centre for Studies of Translation,
Interpreting, and Cognition (CSTIC) of the Faculty of Arts and Humanities, University of Macau, at the end of 2020, with speakers from around the world. Due to
the COVID-19 pandemic and the consequent travel restrictions, the seminar was
held online. Despite the disruption caused by the pandemic, the speakers at the roundtable
seminar spared no effort to share their research on how corpora can be utilized
in the exploration of both original and translated literary texts. To celebrate these
new developments and to commemorate the remarkable research efforts despite
the pandemic, the editors of the present volume invited some speakers of the
roundtable seminar to develop their presentations into full-length chapters for
inclusion in this book. Besides, some additional leading and active researchers
were also approached to contribute a chapter to this book. The
present volume is therefore the outcome of the joint efforts of the speakers at the roundtable
seminar and those who were not able to take part in the seminar but most generously
agreed to share their recent research with us.
This book consists of five chapters on the application of corpora to literary
studies and ten on the application of corpora to literary translation studies. To
facilitate reading, the abstracts of the authors will be presented in the following
passages as summary to each chapter.
In Chapter 1, Darryl Hocking and Paul Mountfort argue that recent years have
seen a proliferation in the diachronic analysis of narrative fiction. Using large
ready-made, digitized collections of fictional works, or smaller corpora compiled
by the researchers themselves, conclusions about developments in narrative fiction traditionally emerge through a focus on the fictional text itself. This chapter
shows how the author interview, an increasingly pervasive genre in which authors
speak candidly about their writing practices, also has much potential for contributing to the understanding of diachronic change in narrative fiction. Using a self-compiled corpus of interviews with fiction writers from the 1950s to the present
day, the chapter identifies, firstly, shifts in the way that contemporary authors
of narrative fiction have conceptualized their practices over time and, secondly,
the more salient conceptualizations of fiction writing that have emerged for each
decade since the 1950s.
In Chapter 2, Marianna Gracheva and Jesse A. Egbert show that authors vary
their language use according to the situational characteristics of individual texts
rather than simply their idiosyncratic preferences for certain language features,
by building on the results of a multidimensional analysis, which reveals considerable within-author stylistic variation in modern literary nonfiction essays. In
particular, the study traces the relationship between communicative purpose and
stylistic choices made by authors across their works and shows that style reflects
functional considerations rather than only individual language attitudes. The findings contribute to the field of corpus stylistics and have practical value for literary
analysis, translation, and creative writing, as they demonstrate that stylistic effects
achieved through strategic use of specific linguistic devices are closely associated
with the situation of use.
In Chapter 3, Pablo Ruano San Segundo uses a corpus-stylistic approach and
investigates the alleged influence of Charles Dickens on the style of the Spanish
novelist Benito Pérez Galdós. To do so, he has developed an annotation system
of Galdós’ novels to identify suspensions. A suspension is a protracted interruption by the narrator of a character’s speech. Stylistically speaking, suspensions
have received attention as one of the techniques typical of Dickens’ style. In this
chapter, Pablo Ruano San Segundo compares Dickens’ use of suspensions to that
of Galdós in Fortunata and Jacinta, the novel for which the Spanish novelist is
best known. The results show that there are patterns in form and function hitherto
unremarked in literary appreciations of Galdós’ style that show how the Spanish
novelist may have incorporated this device into his style to achieve effects similar
to those conveyed by Dickens.
In Chapter 4, Guadalupe Nieto Caballero and Pablo Ruano San Segundo analyze Ruiz Zafón’s The Cemetery of Forgotten Books series using a corpus-stylistic methodology. The authors look into how Zafón shapes narrative space in the
series. More specifically, they intend to show how certain aspects discussed by
literary critics are enacted in the same way throughout the series, thus unveiling
aspects of Ruiz Zafón’s craftsmanship hitherto unremarked in literary appreciations of his style. To do so, the authors have carried out a cluster analysis, with
which they have identified textual building blocks and analyzed them systematically. The analysis is meant to make a contribution to the still-emerging field of
corpus stylistics in Spanish, illustrating how the analysis of literary works can
benefit greatly from the use of innovative corpus tools.
In Chapter 5, Tak-sum Wong and John Sie Yuen Lee argue that information
extraction from historical text is challenging because of the lack of data to train
natural language processing tools. This chapter evaluates the utility of in-domain
training data for data-driven profiling of characters, verbs, and toponyms and
reports a case study on a corpus of Chinese Buddhist text. As is typical for such
a corpus, the Chinese Buddhist Canon has few annotated linguistic resources
other than names, places, and domain-specific terms. The authors apply a lexicon-based approach for named entity recognition and then report an analysis of
the “who,” “what,” and “where” of the Canon: who the characters were, what
they did, and where they were. Experimental results also show that even a small
amount of word segmentation, part-of-speech, and dependency annotation can
improve accuracy in named entity recognition and in extraction of character-verb
associations.
In Chapter 6, Titika Dimitroulia aims at discussing the use of corpora in literary translation practice, teaching, and research in the frame of the emerging field
of computer-assisted literary translation (CALT). Applied corpus-based translation studies (CBTS or CTS) have not extensively investigated so far the use
of corpora in literary translation practice and education, whereas in descriptive
CBTS, in which literary translation holds an eminent place, digital humanities
(DH) techniques that explore large corpora in innovative ways offer new perspectives in the study of the literary translation as a complex sociocultural discursive event at the heart of world literature. First, the author attempts an outline
of the use of corpora by professional literary translators during the translation
process, as reshaped by electronic tools, and draws their implications for literary translator education. Based on this, a number of new approaches to literary
translation in CBTS will be presented, casting light on translation’s complexity
and the potential of corpus analysis with the use of advanced techniques and
methodologies.
In Chapter 7, Yanfang Su and Kanglong Liu argue that in fiction, fictional dialogues are created with the purpose of character building, plot development, and
reader appeal. To achieve these purposes, the orality features present in fictional
dialogues are designed to mimic authentic conversations, which also pose great
challenges for translation. Previous studies have investigated how orality features
are translated in fictional dialogues. There is, however, a lack of quantitative analysis of representative orality features in existing studies of orality in fictional
dialogues. The objective of this study is to fill the research gap by comparing
orality in translated and non-translated fictional dialogues using a comparable
corpus design. To ensure a robust comparison of the two text types, the authors
made use of the linguistic features of Dimension 1 in the multidimensional analysis
approach, together with the dimension scores of the two text types. It was found
that both text types display an orality tendency toward the interactive texts, but
the dimension score for non-translated fictional dialogues was higher than that for
translated dialogues, and 11 out of the 28 linguistic features were more frequently
used in non-translated fictional dialogues. The study further explores possible
explanations for the different profiling between translated and non-translated fictional dialogues.
In Chapter 8, Lorenzo Mastropierro explores the translation of repeated reporting verbs in the Harry Potter series in Italian. The author applies a multifactorial approach to investigate whether and to what extent four factors, representing
linguistic features of the source text verbs, have an effect on the reproduction of
repetition in translation or its avoidance. The factors are (i) the frequency, (ii) the
number of possible translation equivalents, (iii) the number of different meanings,
and (iv) the semantic category of the source text verbs. This study moves beyond
the focus on the effects of avoiding repetition in literary translation, or on the
strategies used by translators to avoid repetition, as seen in the existing literature,
to provide instead a data-based and multidimensional description of the phenomenon itself in the context of reporting verbs.
In Chapter 9, Xinyi Zeng and John Sie Yuen Lee present a quantitative analysis on a feminist translation of The Color Purple by Alice Walker. The analysis
identifies distinctive translation strategies in the Chinese version of the novel
by Jie Tao, a prominent feminist translator, through comparison with the version
by Renjing Yang, who did not identify as feminist. The authors annotated 197
sentences in the novel in terms of their sexual content type and the translation
strategy adopted by Tao and Yang. Results show that the feminist version constitutes a slightly more faithful translation, with more frequent use of explicit
translation strategies and less frequent use of conservative ones. Further, it
exhibits distinctive choices in translation strategy for different sexual content
types. The feminist perspective likely motivated the relatively explicit treatment
of references to private parts, body explorations, and female bodily phenomena
and relatively conservative treatment of rape, illicit relations, and stigma related
to virginity.
In Chapter 10, Ronald Jenn and Amel Fraisse describe how corpora-based DH
projects can bring together scholars from different fields, such as natural language
processing, information science, translation studies, and American studies with
equal benefits. This chapter uses Adventures of Huckleberry Finn as a case in
point to explore the necessary steps to be taken and defines a number of criteria
to emulate other corpora-based interdisciplinary projects that would use literary
texts. One important aspect is also the retrieval of existing scholarship on the
translated texts, allowing for wider multilingual approaches, significantly broadening the scope of fine-grained textual analysis.
In Chapter 11, Kan Wu and Dechao Li investigate the extent to which English
translations of Chinese Wuxia fiction and Western heroic literature in modern English are stylistically similar through stylometric analyses. This chapter adds to literary translation research by highlighting possible stylistic connections between
heroic literature in the East and that in the West, clues that may help understand the
current reception of Wuxia translations. It also contributes to stylometric studies by
introducing the stylistic panorama, a novel concept proposed to describe the stylistic
picture of a (translated) text in a relatively holistic and functional way. Examining six English translations of Wuxia novels and 12 chivalric stories and heroic
fantasies in modern English, the study finds that the Wuxia translations differ from
the two Western subgenres in stylistic panoramas built by formal features (dispersion of word lengths, average sentence length, etc.), as well as the most frequent
words and the most frequent word sequences. Such differences have foregrounded
the unique stylistic features (richer Wuxia-specific vocabularies, shorter paragraph
lengths, etc.) of these translations, which has contributed in part to their favorable
reception among English-speaking readers. It is hoped that this study will encourage
new applications for the concept of stylistic panoramas in future stylometric studies.
In Chapter 12, Jing Fang and Shiwei Fu draw upon a parallel corpus of the
Chinese martial arts novel Legends of the Condor Heroes (射雕英雄傳) and present a linguistic exploration of the English translation of personal reference used
in the novel. In particular, the examination focuses on the protagonist Guo Jing
(郭靖), investigating how the character makes reference to himself and to his
listeners in conversations, through which his personal trait of humbleness is portrayed. Reviews from English readers of the translation show that Guo’s humbleness has been successfully rendered in the translated text, despite the linguistic
disparity between the two languages that poses a challenge in translating personal
reference, which is a significant contributor to the portrayal of humbleness. By
examining and comparing the lexicogrammatical realization of personal reference in the source and the target texts, the authors try to explore how an equally
humble character is developed in the translation. Findings indicate that the translator’s choices in translating personal reference are closely related to the translator’s analysis of the social status of the characters, and the translator also uses
compensation strategies at both the microlevel of sentence and the macrolevel of
text to render an equivalent humble image in the target text. The study is expected
to shed light on how a pragmatically equivalent character could be developed in
translation when the two languages are culturally distant.
In Chapter 13, Kanglong Liu, Joyce Oiwun Cheung, and Riccardo Moratto
argue that the use of lexical bundles (LBs) has been affirmed to be a reliable
indicator of translators’ style as they can reveal the idiosyncrasies beyond the
use of words. Using LBs as an indicator, the authors investigate how fictional
dialogues in two full-length English translations of Hongloumeng diverge in style.
This corpus-assisted study is based on the first 80 chapters of two full-length
Hongloumeng translations, that is, one translated by the British sinologist David
Hawkes (who translated the first 80 chapters) and John Minford (who translated
the remaining 40 chapters), and the other co-translated by the Chinese translator
Xianyi Yang and his British wife, Gladys Yang. The results of the study show that
Hawkes used more tokens and types of LBs than did the Yang couple. Further
structural and functional analysis revealed that Hawkes overused verb phrases
and stance markers, whereas the Yangs overused prepositional phrases and referential markers. The divergences in style are discussed with reference to the translators’ language backgrounds, life experiences, and translation purposes.
In Chapter 14, Linping Hou and Defeng Li compare English translations of
metaphors in Lu Xun’s short stories translated by L1 and L2 translators, adopting
a corpus-assisted method. The patterns of the translation strategies were analyzed
within the framework of the relevance theory. The authors found that paraphrasing as a rendering strategy was used more frequently in translating culture-specific metaphors than culture-universal ones, and more recurrently in translating
conventional metaphors than creative ones. This suggested that more cognitive effort
was required to translate these conventional, culture-specific metaphors, typical
barriers in literary translation, to achieve optimal relevance. The authors also
found that L1 translators were more target-oriented in rendering the metaphors
than L2 translators, indicating an impact of translation direction on the translators’
performance. Finally, the present research attested to the interplay of the two routes
(i.e., direct translation and indirect translation) in literary translation and demonstrated that the interaction between the two routes was regulated by the principle
of relevance and modulated by such essential factors as the source input and the
translation direction.
In Chapter 15, Tak-sum Wong and Wai-Mun Leung provide a statistical
account and a contrastive study on the use of classifiers in historical Cantonese
and contemporary Cantonese documents. The authors have conducted a statistical
analysis of classifiers present in the Cantonese translations of the 1880s edition
and the 2010 edition of the four canonical gospels in the Christian New Testament; 94 classifiers are observed in the 2010 edition, but only 80 are found in the
1880s edition. The results show that while some classifiers have been used more
regularly since the nineteenth century, for example, kɔ³³ 個 (a general classifier),
kin²² 件 “piece,” tʰiu¹¹ 條 “strip,” tsɛk³³ 隻 (mostly for counting animals and dolls),
and ti⁵⁵ 的/啲, the frequency of some classifiers in the 2010 edition drops drastically as a result of lexical replacement, for example, tat³³ 笪 (for counting fields)
in place of fai³³ 塊. The authors have also found that the reduction in frequency of
reduplicated classifiers is a result of changes in translation strategy rather than a
real reduction in usage in contemporary Cantonese.
We believe that such an international, dynamic, and interdisciplinary exploration will provide valuable insights for anyone who wishes to have a better understanding of the relationship between corpora, literature, and literary translation.
We also believe it is appropriate to take this very opportunity to thank all the
contributors of this volume for their dedication and their efforts to produce their
best research and share it with the readers of the book.
References
Baker, Mona. 2000. “Towards a Methodology for Investigating the Style of a Literary
Translator.” Target 12, no. 2: 241–66.
Biber, Douglas. 2011. “Corpus Linguistics and the Scientific Study of Literature: Back to
the Future?” Scientific Study of Literature 1, no. 1: 15–23.
Bosseaux, Charlotte. 2001. “A Study of the Translator’s Voice and Style in the French
Translation of Virginia Woolf’s The Waves.” In CTIS Occasional Papers, edited by
Maeve Olohan, 55–75. Manchester: Centre for Translation and Intercultural Studies,
UMIST.
Bosseaux, Charlotte. 2004. “Point of View in Translation: A Corpus-based Study of French
Translations of Virginia Woolf’s To the Lighthouse.” Across Languages and Cultures 5,
no. 1: 107–22.
Chen, Fengde and Defeng Li. 2022. “Patronage and Ideology: A Corpus-assisted Investigation of Eileen Chang’s Style of Translating Herself and the Other.” Digital Scholarship
in the Humanities: fqac015. https://doi.org/10.1093/llc/fqac015.
Fischer-Starcke, Bettina. 2010. Corpus Linguistics in Literary Analysis: Jane Austen and
Her Contemporaries. London: Continuum.
Gonçalves, Lourdes Bernardes. 2016. “A Contribution of Corpus Linguistics to Literary
Analysis.” Transversal – Revista em Tradução, Fortaleza 2, no. 2: 42–53.
Li, Defeng, Chunling Zhang, and Kanglong Liu. 2011. “Translation Style and Ideology:
A Corpus-assisted Analysis of Two English Translations of Hongloumeng.” Literary and
Linguistic Computing 3: 1–14.
Mahlberg, Michaela. 2007. “Clusters, Key Clusters and Local Textual Functions in Dickens.” Corpora 2, no. 1: 1–31.
Mahlberg, Michaela. 2012. “Corpus Analysis of Literary Texts.” In The Encyclopedia of
Applied Linguistics, edited by C. A. Chapelle, 1479–85. Oxford: Blackwell.
Mahlberg, Michaela. 2013. Corpus Stylistics and Dickens’s Fiction. New York and London: Routledge.
Mahlberg, Michaela, and Dan McIntyre. 2011. “A Case for Corpus Stylistics: Ian Fleming’s Casino Royale.” English Text Construction 4, no. 2: 204–27.
Mahlberg, Michaela, Catherine Smith, and Simon Preston. 2013. “Phrases in Literary Contexts: Patterns and Distributions of Suspensions in Dickens’s Novels.” International
Journal of Corpus Linguistics 18, no. 1: 35–56.
McIntyre, Dan. 2010. “Dialogue and Characterization in Quentin Tarantino’s Reservoir
Dogs: A Corpus Stylistic Analysis.” In Language and Style, edited by Dan McIntyre and
Beatrix Busse, 162–82. Basingstoke: Palgrave Macmillan.
Saldanha, Gabriela. 2011. “Translator Style: Methodological Consideration.” The Translator 17, no. 1: 25–50.
Thompson, Paul, and Alison Sealey. 2007. “Through Children’s Eyes?: Corpus Evidence
of the Features of Children’s Literature.” International Journal of Corpus Linguistics
12, no. 1: 1–23.
Wang, Qing, and Defeng Li. 2011. “Looking for Translator’s Fingerprints: A Corpus-based
Study on Chinese Translations of Ulysses.” Literary and Linguistic Computing 11: 1–13.
Winters, Marion. 2004a. “German Translations of F. Scott Fitzgerald’s The Beautiful and
Damned: A Corpus-based Study of Modal Particles as Features of Translators’ Style.”
In Using Corpora and Databases in Translation, edited by Ian Kemble, 71–89. London:
University of Portsmouth.
Winters, Marion. 2004b. “F. Scott Fitzgerald’s Die Schönen und Verdammten: A Corpus-based Study of Loan Words and Code Switches as Features of Translators’ Style.” Language Matters, Studies in the Languages of Africa 35, no. 1: 248–58.
Winters, Marion. 2007. “F. Scott Fitzgerald’s Die Schönen und Verdammten: A Corpus-based Study of Speech-act Report Verbs as a Feature of Translators’ Style.” Meta 52,
no. 3: 412–25.
Winters, Marion. 2009. “Modal Particles Explained: How Modal Particles Creep into
Translations and Reveal Translators’ Styles.” Target 21, no. 1: 74–97.
1 Diachronic Trends in Fiction Authors’ Conceptualizations of their Practices
Darryl Hocking and Paul Mountfort
1.1 Introduction
In recent years, there has been a considerable increase in studies that investigate
diachronic corpora of narrative fiction in order to establish how fiction writing
has changed over time (McIntyre and Walker 2019). Many of these diachronic
analyses involve the development of researcher-compiled corpora and focus on
the stylistic evolution of individual authors, for instance, Hoover’s (2007) study
on the work of Henry James. Some, such as Klaussner and Vogel’s (2018), have
sought to compare diachronic shifts in style between individual authors, while
others examine broader stylistic developments over time in collections of authors’
works from a particular historical period. An early example of the latter can be
found in Biber and Finegan’s (1989) diachronic study of fiction writing, which
examines stylistic change in a small corpus of 33 English literary works from the
seventeenth century onward. Although identifying some pockets of resistance,
particularly in the eighteenth century, Biber and Finegan provide strong evidence
of a progression from a more elaborated and impersonal style in their fictional
texts toward one more characteristic of oral language.
With the growing access to large digitized collections of time-stamped historical texts of fictional works, for example Project Gutenberg or The Google Books
Corpus, the number of diachronic analyses of narrative fiction is further expanding, as are the specific foci of the studies. Underwood (2019), for example, investigates a corpus of English fiction drawn from the HathiTrust Digital Library from
1700 until the early twenty-first century. Among his observations, he finds that
over time fiction has increasingly become distanced stylistically from biographical and nonfictional texts, often the result of a growth in more concrete descriptions of the human body, physical actions, and sensory perceptions. Underwood
also finds that since the 1700s, changes in the pacing of fictional works have
occurred, showing that in the average 250-word passage of an eighteenth century
novel, several days are likely to have passed, but by the twentieth century, a passage of a similar length will typically only describe a period of 30 minutes.
In another example, Sun and Wang (2022) use a 32,851 million token Google
Books subcorpus of English fiction from the 1820s to the 1990s to examine
whether the language of English fiction has tended toward the more concrete or
DOI: 10.4324/9781003298328-2
the more abstract. They find that, over the last two centuries, English fiction has
become increasingly concrete, and conclude that fiction today is less difficult to
read than it was in the nineteenth century. Also using the English fiction dataset
of the Google Books corpus, but restricting their analysis to the years between
1900 and 2000, Morin and Acerbi (2017) found that the presence of emotional words significantly decreased in English fiction. To validate their findings,
they carried out a similar examination of two other smaller, self-built
corpora of English fiction and came to similar conclusions.
Rather than looking for more general diachronic trends, a number of studies
have targeted specific areas of diachronic development in narrative fiction, for
instance, Busse’s (2020) examination of how fictional characters’ thought and
speech are presented in nineteenth century fictional works, or Kung’s (2007) focus
on the word melancholy in pre-Romantic and Romantic British novels. Among
other findings, Kung found that melancholy is typically associated with reasoning
in the pre-Romantic period and emotion in the Romantic period.
These diachronic studies all develop their findings through a focus on the fictional text itself. A related genre, however, which has untapped potential to contribute to the analysis of diachronic change in narrative fiction, but whose focus
lies beyond the work, is the author interview. In interviews, authors generally
discuss a broad range of issues related to the conceptualization of their practices,
and given the marked proliferation since the 1950s of print and online publications containing interviews with fiction authors (e.g., Freiburg 1999), a further
opportunity is provided for the use of corpus-based analytical tools to examine
and identify shifting trends in fiction and fiction writing practice. Taking this
into account, this chapter uses a self-compiled corpus of interviews with fiction
authors, firstly, to identify diachronic changes in the way that these authors have
conceptualized their practices from the 1950s to the present day and, secondly, to
identify the more salient conceptualizations of fiction writing practice for each of
the decades since the 1950s. The chapter concludes by evaluating the potential
contribution of the author interview to the quantitative analysis of fiction writing.
1.2 Methods
In order to investigate changes in the way that fiction authors have used language
to conceptualize their writing practices from 1950 until the present day, a trend
mapping analysis (Baker 2011; Stanyer and Mihelj 2016; Hocking 2022) and a diachronic keyword analysis (Baker et al. 2013; Csomay and Young 2021) were carried
out on a 433,000-word diachronic fiction authors’ corpus (hereafter FAC), consisting
of interviews with fiction authors about their practices. Trend mapping identifies
words in a corpus that have exhibited statistically significant decreases or increases
in frequency over a particular period of time, while a diachronic keyword analysis
enables the identification of words found in a specific time period of a corpus that are
comparatively more frequent than those found in the rest of the corpus. In the case of
the FAC, trend mapping can provide insights into the wider shifts over time that have
occurred in fiction writers’ conceptualization of their practices, while a diachronic
keyword analysis can help reveal the salient conceptualizations of authors’ practices
during specific time intervals within the corpus. To carry out the analysis, the study
employed the Sketch Engine online corpus-analytical tool (Kilgarriff et al. 2014).
After providing a description of the FAC and the criteria used for its compilation, the
methods used to analyze the corpus will be discussed in more detail.
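As a rough illustration of the two measures described above, the following Python sketch computes a trend slope and a keyword score for a single word. It rests on several assumptions: the per-decade counts are invented, the slope is the median of pairwise slopes (the Theil-Sen estimator named in Section 1.2.2), and the keyword score is a smoothed ratio of normalized frequencies, since the chapter does not state which keyness statistic was used. Significance testing is omitted, and the sketch is not Sketch Engine's implementation.

```python
from itertools import combinations
from statistics import median


def relative_freq(count, corpus_size, per=1_000_000):
    """Normalize a raw count to a frequency per `per` words."""
    return count * per / corpus_size


def theil_sen_slope(freqs):
    """Median of the slopes between every pair of time points."""
    points = list(enumerate(freqs))  # (interval index, frequency)
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(points, 2)
    ]
    return median(slopes)


def keyness(focus_count, focus_size, ref_count, ref_size, smoothing=1.0):
    """Smoothed frequency ratio (an assumed statistic, not the chapter's).

    Values above 1 mean the word is relatively more frequent in the
    focus interval than in the rest of the corpus.
    """
    focus = relative_freq(focus_count, focus_size)
    ref = relative_freq(ref_count, ref_size)
    return (focus + smoothing) / (ref + smoothing)


# Invented counts for one hypothetical word across eight decade intervals,
# each assumed to hold roughly 54,000 words.
counts = [10, 12, 15, 14, 18, 21, 20, 25]
interval_size = 54_000

per_million = [relative_freq(c, interval_size) for c in counts]
slope = theil_sen_slope(per_million)

# Keyword score for the final interval against the other seven combined.
score = keyness(counts[-1], interval_size, sum(counts[:-1]), 7 * interval_size)

print(f"trend slope: {slope:.1f} per million words per interval")
print(f"keyness of final interval vs. rest: {score:.2f}")
```

A positive slope points to an increasing trend and a negative one to a decreasing trend; the smoothing constant keeps the ratio defined when a word is absent from the reference part of the corpus.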
1.2.1 The Fiction Authors’ Corpus (FAC)
The FAC consists of 232 interviews with fiction authors from 1950 to the present.
The 1950s were selected as the earliest decade for the corpus due to the difficulty
of easily obtaining author interviews prior to this period. To facilitate a manageable analysis of trends, the FAC was divided into time intervals representing eight
calendar decades (Baker et al. 2013).
The texts in the corpus were located from online interviews, some from online
magazines, and others from print magazines reproduced online. Importantly, as the
focus of the corpus was on authors of fiction (i.e., novels and short stories), interviews with authors who primarily discussed, or whose oeuvre primarily involved,
poetry, theatre, essayist literature, or the writing of screenplays were excluded.
Furthermore, as all interviews were required to be in English, authors from English-speaking contexts were largely represented in the corpus. Nevertheless, interviews of authors whose first language was not English were still included if there
was no evidence that the interview had been translated from another language.
In order to specifically focus on the author’s conceptualization of their practices, the questions and comments of interviewers were omitted from the corpus data. Moreover, where appropriate, sections of the interviews that explicitly
focused on the authors’ discussions of their personal history, rather than their writing practices, were also removed from the texts. The FAC is ethnically diverse and
includes emergent, mid-career, and late-career writers of fiction. While the latter
could be argued as conceptualizing their practice in ways representing earlier decades, we would suggest that like all creative individuals, their practices continue
to be influenced by conceptual shifts in the field.
Sketch Engine, the corpus tool used for the analysis, indicates that the FAC
contains 433,281 words. The size of the FAC was determined by the number of
suitable texts that could be obtained to represent each of the earlier decades, and
the decision that the later decades of the corpus should contain a corresponding
word count. Hence, to maintain a reasonably sized corpus
yet prevent the stylistic peculiarities of any single author emerging as a generality (McEnery et al. 2006), individual interviews were limited to 2,500 words.
The FAC might be criticized as a relatively small corpus (Aston 1997); however, small context-specific corpora are frequently viewed as advantageous for
the analysis of specialized language use (Bowker and Pearson 2002; Vaughan
and Clancy 2013), because the analyst is able to develop a familiarity with the
overall nature of the texts, which, along with their typically broad knowledge of
the context from which they arise, enhances the capacity to generate and interpret
findings (Aston 1997; Koester 2010). Details of the composition of the FAC can
be seen in Table 1.1. Each time interval represents 12.5% of the corpus and contains approximately 54,000 words.

Table 1.1 Word Composition of the FAC

Decade   Words     Percentage of Corpus   Texts   Average Words per Text
1950s    54,300    12.5                   26      2,088
1960s    54,317    12.5                   29      1,873
1970s    53,986    12.5                   26      2,076
1980s    54,047    12.5                   27      2,002
1990s    54,107    12.5                   27      2,004
2000s    54,025    12.5                   28      1,929
2010s    54,189    12.5                   30      1,806
2020s    54,310    12.5                   39      1,393
TOTAL    433,281   100                    232     1,868
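The figures in Table 1.1 can be cross-checked with a few lines of arithmetic. The following is a minimal Python sketch and not part of the original study; the variable names are illustrative, and the word and text counts are copied from Table 1.1.

```python
# Word and text counts per decade, copied from Table 1.1.
composition = {
    "1950s": (54_300, 26),
    "1960s": (54_317, 29),
    "1970s": (53_986, 26),
    "1980s": (54_047, 27),
    "1990s": (54_107, 27),
    "2000s": (54_025, 28),
    "2010s": (54_189, 30),
    "2020s": (54_310, 39),
}

total_words = sum(words for words, _ in composition.values())
total_texts = sum(texts for _, texts in composition.values())

# Share of the corpus (in percent) and mean words per text for each decade.
shares = {d: round(100 * w / total_words, 1) for d, (w, _) in composition.items()}
averages = {d: round(w / t) for d, (w, t) in composition.items()}

for decade in composition:
    print(decade, shares[decade], averages[decade])
print("TOTAL", total_words, total_texts, round(total_words / total_texts))
```

The lower average for the 2020s reflects the same word count being spread across more texts (39) than in any other decade.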
1.2.2 Trend Mapping
Using the Trends tool in the corpus analytical software Sketch Engine, the trend
mapping analysis involved identifying those lemmas which exhibited statistically
significant (p-value ≤ 0.05) increasing or decreasing frequency trends over the
eight decades of the FAC. The tool uses the Theil-Sen estimator to provide a linear
approximation of the slope of the frequencies of an item over time by calculating
the median slope between all individual pairs of frequency points. In the case of
this study, the analytical focus was on the lemma representing a particular part of
speech category (referred to as a lempos in Sketch Engine). In Sketch Engine, the
Theil-Sen slope is represented by a numerical value that identifies the direction
and magnitude of the trend. The Mann-Kendall test is also employed to identify
the significance level of the trend statistic by providing a p-value. Further details of
Sketch Engine’s Trends tool can be found in Kilgarriff et al. (2015). Drawing upon
Baker (2011), Lazzeretti (2016), and Hocking (2022), it was also determined that
the trend analysis would only include those lemmas that occurred a minimum of
50 times in the corpus. Furthermore, it was also decided to only include those lemmas which exhibited a relatively strong slope over time. These were represented
by a trend value higher than 1 for increasing trends and less than -1 for decreasing
trends. These criteria resulted in 51 increasing or decreasing lemmas of interest.
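The two statistics named above can be illustrated with a short, standard-library Python sketch. It is not Sketch Engine's implementation: the function names are illustrative, the Mann-Kendall p-value uses the normal approximation with a continuity correction and no correction for ties (an assumption), and the input is the per-decade counts for the verb experience as reported in Section 1.3.1. The slope is expressed in raw occurrences per decade and is not necessarily on the same scale as the trend values reported by the Trends tool.

```python
from itertools import combinations
from math import erfc, sqrt
from statistics import median


def theil_sen_slope(xs, ys):
    """Median of the slopes between all pairs of (x, y) points."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1
    ]
    return median(slopes)


def mann_kendall(ys):
    """Return (S, two-sided p-value) for a series ordered in time.

    Uses the normal approximation with continuity correction and
    no tie correction (an assumption made for this sketch).
    """
    n = len(ys)
    # S counts later-minus-earlier increases minus decreases.
    s = sum((b > a) - (b < a) for a, b in combinations(ys, 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / sqrt(var_s)
    elif s < 0:
        z = (s + 1) / sqrt(var_s)
    else:
        z = 0.0
    p = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return s, p


# Occurrences of the verb "experience" per decade, 1950s to 2020s (Section 1.3.1).
counts = [1, 5, 5, 9, 5, 5, 10, 14]
decades = list(range(len(counts)))  # 0 = 1950s, ..., 7 = 2020s

slope = theil_sen_slope(decades, counts)
s, p = mann_kendall(counts)
print(f"slope={slope:.3f} per decade, S={s}, p={p:.4f}")
```

Under these assumptions the series yields S = 18 and a p-value of roughly 0.035, in line with the value listed for experience in Table 1.2, although whether Sketch Engine computes the statistic in exactly this way is not confirmed here.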
1.2.3 Diachronic Keyword Analysis
Keywords are those words in a target corpus whose frequencies are unusually high
when set against the frequencies of the same words in a larger reference corpus
(McEnery et al. 2006). They are typically used to capture the overall “aboutness”
(Scott and Tribble 2006, 55) of a target corpus. Following Baker et al. (2013) and
Csomay and Young (2021), a diachronic keyword analysis, however, involves
establishing the keywords for a particular time interval within a diachronic corpus by referencing that time interval against the rest of the corpus. This enables
the analyst to identify the aboutness of each particular time interval in the corpus,
which in the case of the FAC can indicate certain salient conceptualizations of
the fiction authors’ practices within successive decades. The diachronic keyword
analysis complements the trend analysis, which focuses on increasing and decreasing trends from the earlier time intervals of a corpus to the later time intervals. To
evaluate keyness, Sketch Engine uses a simple maths formula to establish a statistic
identifying the degree of keyness and then ranks the results (Kilgarriff 2009). As in
the trend analysis, the focus in the diachronic keyword analysis is on lemmas representing a particular part of speech category. Furthermore, in order to ensure that
any key lemmas identified are substantively representative of the target decade and
not the result of repeated use by the producers of just a few texts in the corpus, the
keyword list for each decade only includes the top six key lemmas which occur in
at least one-third of all texts from that decade. All keywords in the tables are listed in order
of their keyness. Finally, in order to assist the diachronic analysis by facilitating an
analysis of the trend and diachronic keyword findings in context, the more conventional, corpus-based tool of concordance analysis (Baker 2006) was also employed.
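The “simple maths” statistic (Kilgarriff 2009) is the ratio of the two normalized frequencies after a smoothing constant has been added to each. A minimal Python sketch follows, assuming a smoothing constant of 100; that value is an assumption rather than something the chapter states, although it is consistent with the rounded scores in Table 1.4. The function name and data layout are illustrative.

```python
def simple_maths_keyness(fpm_focus, fpm_reference, smoothing=100.0):
    """Ratio of smoothed frequencies per million (focus vs. reference)."""
    return (fpm_focus + smoothing) / (fpm_reference + smoothing)


# (FPM in the 1950s, FPM in the rest of the FAC), copied from Table 1.4.
fpm_1950s = {
    "critic": (592.56, 121.15),
    "technique": (512.48, 107.43),
    "hero": (464.44, 93.72),
    "simple": (384.36, 91.43),
    "play": (512.48, 160.00),
    "ought": (240.23, 48.00),
}

scores = {
    lemma: simple_maths_keyness(focus, rest)
    for lemma, (focus, rest) in fpm_1950s.items()
}

# Rank lemmas from most to least key.
ranking = sorted(scores, key=scores.get, reverse=True)
for lemma in ranking:
    print(f"{lemma:<10} {scores[lemma]:.1f}")
```

A larger smoothing constant shifts the ranking toward more frequent words, while a smaller one favors rarer words, so the choice of constant affects which lemmas surface as key.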
1.3 Results and Discussion
Table 1.2 provides a list of all upward-trending lemmas in the FAC that meet the
criteria outlined in the previous section, while Table 1.3 provides a list of the
downward-trending lemmas. In both tables, the lemmas are ranked according to their
trend strength. The tables also show the part of speech, total frequency, and p-value
for each lemma. In order to foreground the wider and more consequential shifts over
time in fiction authors’ conceptualization of their practices, the following discussion
of these trend results is organized around four major themes, all of which consider
the association between particular upward- and downward-trending items. By consequential we mean those shifts which appear to constellate within a larger pattern
and, as a result, may represent broader epistemological shifts in the discourse around
writing.

Table 1.2 High-Frequency Upward-Trending Lemmas in the FAC (freq. ≥ 50, p < 0.05)

Rank   Word           Part of Speech   Trend Strength   Freq.   p-value
1      community      noun             2.36             66      0.00930
2      space          noun             2.05             100     0.00930
3      collection     noun             2.05             54      0.01800
4      parent         noun             1.88             75      0.00440
5      like           adjective        1.60             71      0.03500
6      school         noun             1.54             107     0.00190
7      kid            noun             1.54             89      0.01800
8      grow           verb             1.48             156     0.00930
9      relationship   noun             1.48             121     0.03500
10     include        verb             1.43             61      0.00930
11     experience     verb             1.43             54      0.03500
12     need           verb             1.38             260     0.00930
13     love           verb             1.38             264     0.00930
14     also           adverb           1.33             530     0.00930
15     family         noun             1.33             186     0.00440
16     home           noun             1.33             128     0.01800
17     her            determiner       1.23             673     0.00930
18     allow          verb             1.23             105     0.01800
19     narrative      noun             1.23             85      0.01800
20     lot            noun             1.19             559     0.00440
21     structure      noun             1.11             110     0.00930
22     everyone       noun             1.11             87      0.00930

Table 1.3 High-Frequency Downward-Trending Lemmas in the FAC (freq. ≥ 50, p < 0.05)

Rank   Word         Part of Speech   Trend Strength   Freq.   p-value
1      technique    noun             -2.90            79      0.0093
2      critic       noun             -1.88            90      0.0180
3      modern       adjective        -1.73            57      0.0044
4      typewriter   noun             -1.66            54      0.0350
5      simply       adverb           -1.66            158     0.0044
6      play         noun             -1.66            102     0.0350
7      hero         noun             -1.60            70      0.0019
8      style        noun             -1.60            157     0.0350
9      unless       conjunction      -1.60            51      0.0350
10     use          noun             -1.60            50      0.0180
11     several      adjective        -1.60            63      0.0350
12     must         modal            -1.38            267     0.0093
13     deal         noun             -1.33            82      0.0180
14     business     noun             -1.33            82      0.0350
15     he           determiner       -1.23            2,005   0.0093
16     rewrite      verb             -1.19            54      0.0180
17     accept       verb             -1.19            76      0.0350
18     man          noun             -1.11            610     0.0019
19     matter       noun             -1.11            154     0.0350
20     himself      determiner       -1.11            145     0.0044
21     no           –                -1.07            860     0.0044
22     influence    noun             -1.07            66      0.0180
23     great        adjective        -1.04            423     0.0350
24     begin        verb             -1.04            320     0.0093
25     yes          –                -1.00            441     0.0093
26     any          –                -1.00            612     0.0044
27     no           adverb           -1.00            167     0.0180
28     nothing      noun             -1.00            216     0.0093
29     James        noun             -1.00            67      0.0350

1.3.1 A Shift from Formal and Generic Influence Toward the
Possibilities and Subjectivity of Experience
A shift from formal concerns toward greater openness, experience, and multiple
exigencies is visible, for instance, in the declining use of technique, influence,
must, and simply versus the rising use of experience, include, allow, and also. In
the 1950s, technique is ubiquitous, occurring 32 times compared to only twice in
the 2010s, with the 1980s being the pivot to precipitous decline from 14 occurrences in the 1970s to only 3 in the 2020s. While technique may be framed subjectively (i.e., stemming from writers’ instincts or nature) and there are disavowals
of technique’s importance, the influence of particular books, authors, and literary
movements, or adjacent media such as poems, plays, and films, is frequently cited.
This is reinforced by the concomitant decline of influence, which peaked with 14
occurrences in the 1960s, where it is similarly deployed (“Jamesian influence,”
“Hemmingway’s influence,” etc.), as compared to only 4 occurrences per decade
for the 2010s and 2020s. Furthermore, there is a declining sense of imperative,
with use of the modal verb must halving by the 1990s from its 1960s peak, and to
a fifth in the last two decades. Use of the adverb simply declines concomitantly
from 41 in the 1950s to only 8 in the 2020s, perhaps signifying that notions of the
self-evident and straightforward can no longer be taken for granted.
By comparison, use of the verb experience has consistently grown from only
1 instance in the 1950s to 5 per decade in the 1960s, 1970s, 1990s, and 2000s,
with peaks of 9 in the 1980s, 10 in the 2010s, and 14 in the 2020s. The sole 1950s
example linked age and experience, while the 1960s through the 1990s, despite
often discussing experience in the literal sense, increasingly emphasize the role
of consciousness in shaping reality, pivoting in the 1980s toward acknowledging
heterodox ways of experiencing the world. By the twenty-first century, this sense
has bedded in, along with reference to the interior experiences of both fictional
characters and readers of the fictional work.
Consistent upward trending of the verbs include and allow, along with the
adverb also, suggests a similar “loosening up” of possibility. Include varied
between three and four usages from the 1950s through the 1980s before coming
to average almost triple that in the last four decades. It now frequently refers to
a variety of formative factors, including the role of an author’s life experience
in constructing a fictional work. Include can also connote inclusivity of choice
regarding a writer’s options in developing a narrative (“a story can be anything,
can include anything”). Usage of allow has quadrupled from the 1950s through to the
2020s and similarly suggests the multiplicity of affordances contemporary writers
enjoy (“you are allowed to practice your way and I’m allowed to practice mine”),
while also has risen from 22 instances in the 1950s through to 124 in the 2020s.
Its additive function may correlate with an increasing emphasis on being able to
contain contradictory impulses, such as “works that are full of energy but also full
of vulgarity, crudity, and incompetence.”
1.3.2 A Shift from Stylistics to Structure and Contested Narratives
“Style and technique” are commonly collocated, with style, another of the top
10 downwardly trending terms, declining more than fourfold from 40 usages per
decade in the 1950s and 1960s to 8 today. As with technique, there are equivocal
views on certain styles (i.e., “the conventional conception”), dubiousness about
the very notion (“a fancy form of writing”), and associated notions of genre (“this
so-called style. I don’t know what they’re talking about ‘tough,’ ‘hard boiled’”).
There is a notable trend in the last two decades to frame style in terms of both contrivance (“a contrived style”) and heterodoxy (“each story brings its own style,”
“alter egos who wrote in distinct styles”), suggesting a postmodern shift toward
style as artifice rather than some essential attribute.
Modern also declines notably from 13 and 14 occurrences in the 1950s and
1960s, respectively, to 2 and 3 in the 2010s and 2020s. In the 1950s, it was commonly applied to literary genres, novels, and especially, writers. By the 2010s and
2020s, there are only two such usages, and one to do with “something that sounds
modern.” The use of the adjective great is also in serious retrenchment, from its
peak of 83 in the 1960s to 24 this decade. Though scattered references remain to
great “writers,” “stories,” “tales,” and “fiction,” there is increasing reference to
great “questions,” “mysteries,” “loves,” and “heartbreaks.” The inference would
be that, along with specific styles, modernism’s concern with canonical writers
and “great works” has lost currency to the demand for relatable content.
By comparison, structure and narrative are on the incline. The growth of the
former has been protean, increasing fivefold from the 1950s to 25 occurrences in
the 2020s. Structure was explicitly linked in the 1950s and 1960s to formalism,
including “form,” “logic,” “craft,” and “plan.” However, as early as the restive
1960s and 1970s, such overdetermination was increasingly questioned. Structure
is, instead, something that emerges from the unknown and “can’t be planned in
advance,” defying attempts at the “linear” and “well-defined.” The 1980s and
1990s go further, with writing itself imposing structure on reality, including
power structures which must be contested. Such usages continue, though in the
2000s there was increasing emphasis on storytelling structures, such as “story
structure,” “novelistic structure,” “Aristotelian structures,” and “folkloric structures.” Despite this, recent usage evidences a growing sense of the relativity and
multiplicity of narrative structure/s.
Use of narrative has doubled in frequency from 9 occurrences in the 1950s to
21 this decade, with a spike in the 1990s of 24 occurrences. The shift is from fairly
utilitarian deployment in the 1950s, often linked to “expositional and narrative
writing” and “narrative technique,” to a more explicit focus on voice and characters’ own conceptualizations about themselves, often pluralistic in nature (“narrative points of view”). By the 2020s, there is increasing consciousness of how
narratives are used to frame and indeed construct fictional “realities” (“different
narrative[s],” “the nation’s narrative,” “a new kind of narrative”) and contestations of identity, such as gender and sexual orientation. Thus, we find mention of
“men controlling the narrative” and “straight narrative,” versus “women get[ting]
hold of the narrative,” “coming-out narratives,” and “trans narratives.”
1.3.3 Critics, the Business of Writing, and the Rise of Affect
Critic is the second steepest-declining lemma. The 1950s, with 37 occurrences,
were the critics’ heyday, a time when the profession was institutionalized in
literary broadsheets and practitioners acted as cultural arbiters. Usage then declines by
approximately half per decade from the 1960s, reaching 5 instances in the 1980s. Brief
upticks to 7 per decade in the 1990s and 2000s were quickly effaced, with a slump
back to 5 occurrences again in the 2010s and only 1 in the 2020s. Ambivalence
toward critics has always been proverbial from authors, and a high proportion of
usages in the 1950s, despite acknowledgment of the critics’ job being to critique,
are negative. They stand variously accused of reading too much into a work, being
hard to please, having agendas, not listening, and being too fierce. However, such
disavowals prove that critical opinions or consensus mattered to, frequently irritated, and sometimes wounded writers. This ambivalence carries on – one choice
phrase from the 1960s describes “a certain type of critic, the ferrity, human-interest fiend, the jolly vulgarian,” while a 1970s writer impugns their willful refusal
“to see what [their] books are really about” – until the nadir of the term.
By the 1980s we find mention of the limitations of the author as critic of their
own work, and in the 2010s, the need to outrun the “inner critic” along with mention of the particular tastes of “gay readers and critics,” while the sole usage in
the 2020s is a writer claiming they were “able to be [their] own critic.” In other
words, criticism came to be discussed less in terms of external recognition and
more with regard to intrinsic factors and fitness for target readerships. Perhaps
concomitantly, framing writing in terms of business has also declined from peaks
of 16 and 19 in the 1950s and 1970s, respectively, to just under 7 per decade from
the 1980s to today. Business occurs most frequently to mean the work of writing,
with occasional reference to the “business plan” and publishers as “businesspeople,” among more general references to “this business of roots and stories” and
“the business of living.”
Against these declines, a raft of subjectively toned words connoting positive,
even passionate, engagement are in the ascendency, such as feels, open, wanted,
love, and felt. Space does not permit sustained unpacking here, but it can be noted
that feels has tripled this century from 8 uses in the 2000s to 24 in the 2020s, where
it has come to foreground the feelings of writers, characters, and readers in relation
to critical issues. Metonymic of this wider shift, “the idea that there’s somehow
less art in writing that feels close to the bone is just an old, ancient, disempowering
story.” This affective turn or “structure of feeling,” as Williams (1977, 133) put
it, is reflected in similar if less-steep inclines for wanted and felt. Wanted is very
frequently used now in relation to writers’ priorities when developing work (i.e., [I]
wanted “to explore,” “to write,” or “speak” about, to “focus on,” “to capture,” “to
show,” “to suggest,” “to defamiliarize”), while felt connotes a growing emphasis
on writers’ self-reflexivity toward their own work ([it] felt “so enormous,” “more
urgent and true,” “productive,” “subversive!” “limited,” “more apolitical,” etc.).
Arguably, then, affective concerns have displaced hierarchical impositions from the
outside, whether critical or prescriptive, on how writers conceptualize their “work.”
1.3.4 Centrality of Community and Space
Cumulatively, these shifts would appear to reconfigure earlier formalist concerns
with technique and style, hedged around with imperatives, toward more
holistic conceptualizations of the writing process and its social context. Arguably, this culminates in a major shift toward the foregrounding of community
and space, the first and second upward-trending terms in the corpus, respectively (Table 1.2).
The rise in usage of community has been meteoric – from a couple of instances
in the 1960s and 1970s to 23 this decade. A spike in the 1980s, particularly in
relation to ethnic communities, fan cultures, and “the gay community,” has prefigured its increasing mobilization since then. We find many such increasingly
heterodox usages this century, from the “Jewish/communist anarchist/community,” “the Māori community,” and “Bronson Alcott’s failed utopian community,”
to the “radical feminist alternative community” and “the queer community.” Such
communities are sometimes the subject of literary works or allude to the writer’s
cultural background but may also be the sites in, or for, which the works are
produced.
Space has also risen from similar obscurity in the 1950s and 1960s to 25
instances in the 2010s and 31 this decade. In the 1970s, there is an almost-perfect
split in its 12 occurrences between “outer space” and “inner space, psychological space,” a dichotomy that persists through the 1980s. By the 1990s space is
increasingly to do with gaps and ellipses, including spatial layout; a narrator’s
“own demands for space” also figures. These uses intensify in the 2000s, from
the need for “space for the imagination,” to the inherent politics of space (“those
spaces of silence that exist in Nigerian literature”). In the 2010s, both these trends
are extended via discussion of “spaces within novels” and “the idea that there
exist multiple and distinct cultural and racial spaces.” By the 2020s we have a
plethora of usages, from “female,” “feminist,” “gendered,” and “queer” spaces to
“weird,” “tonal,” “emotional,” “online,” “coworking,” and “safe” spaces. Despite
the fact that community and space do not specifically collocate in the corpus, the
terms do appear to be complementary in suggesting sites for the production and
circulation of literature that are less prescriptive in terms of formalist concerns
and more rooted in the context of diverse communities, along with the affordances
their associated spaces – literal and conceptual – provide.
1.3.5 Diachronic Keyword Analysis
Tables 1.4 to 1.11 provide an indication of the top keywords for each decade of
the corpus. As indicated previously, to ensure that all lemmas identified as key
are particularly representative of the target decade, they must occur in at least
one-third of all interviews from that decade. This document frequency statistic
is found in the final column. The tables also provide the keyness score for the
lemma, as well as their frequency and frequency per million for the target decade,
and frequency per million for the rest of the FAC.

Table 1.4 1950s Keywords

Rank   Lemma       POS         Freq.   Keyness   FPM 1950s   FPM Rest   Doc Freq. Focus
1      critic      noun        37      3.1       592.56      121.15     16
2      technique   noun        32      3.0       512.48      107.43     10
3      hero        noun        29      2.9       464.44      93.72      10
4      simple      adjective   24      2.5       384.36      91.43      11
5      play        noun        32      2.4       512.48      160.00     10
6      ought       modal       15      2.3       240.23      48.00      9

Table 1.5 1960s Keywords

Rank   Lemma     POS     Freq.   Keyness   FPM 1960s   FPM Rest   Doc Freq. Focus
1      must      modal   72      2.3       1,152.15    445.77     23
2      power     noun    31      2.2       496.06      166.88     10
3      man       noun    150     2.2       2,400.31    1,051.57   24
4      picture   noun    23      2.2       368.05      116.59     11
5      country   noun    45      2.0       720.09      306.33     15
6      style     noun    40      2.0       640.08      267.46     13

Table 1.6 1970s Keywords

Rank   Lemma      POS         Freq.   Keyness   FPM 1970s   FPM Rest   Doc Freq. Focus
1      serious    adjective   25      2.4       399.99      112.02     9
2      author     noun        37      2.1       591.98      233.18     15
3      paper      noun        23      1.9       367.99      141.74     13
4      fiction    noun        96      1.9       1,535.95    768.12     17
5      terribly   adverb      13      1.9       207.99      64.01      10
6      money      noun        22      1.9       351.99      141.74     11

Table 1.7 1980s Keywords

Rank   Lemma         POS           Freq.   Keyness   FPM 1980s   FPM Rest   Doc Freq. Focus
1      basically     adverb        16      2.4       255.07      48.03      10
2      moral         adjective     27      2.3       430.42      132.66     12
3      large         adjective     29      2.1       462.31      164.68     16
4      imagination   noun          26      1.9       414.48      171.54     10
5      beyond        preposition   14      1.8       223.18      77.77      11
6      rather        adverb        54      1.8       860.85      432.29     16

Table 1.8 1990s Keywords

Rank   Lemma     POS         Freq.   Keyness   FPM 1990s   FPM Rest   Doc Freq. Focus
1      face      noun        16      2.1       255.89      68.59      10
2      act       noun        24      2.1       383.83      130.31     9
3      voice     noun        52      2.1       831.64      349.79     10
4      talk      noun        14      1.8       223.90      77.73      9
5      English   adjective   21      1.7       335.85      153.18     10
6      truth     noun        25      1.7       399.83      192.04     13

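The selection rule described above (keep only lemmas attested in at least one-third of a decade's interviews, then take the six with the highest keyness) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' procedure: the names `Candidate` and `top_key_lemmas` are hypothetical, the first six rows are copied from Table 1.4, and the final candidate is invented purely to show the filter at work.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    lemma: str
    keyness: float
    doc_freq: int  # number of texts in the decade containing the lemma


def top_key_lemmas(candidates, n_texts, k=6):
    """Keep lemmas found in at least one-third of the texts, ranked by keyness."""
    eligible = [c for c in candidates if 3 * c.doc_freq >= n_texts]
    eligible.sort(key=lambda c: c.keyness, reverse=True)
    return eligible[:k]


# Rows from Table 1.4, plus one hypothetical lemma with high keyness
# that is concentrated in only a few texts.
candidates_1950s = [
    Candidate("critic", 3.1, 16),
    Candidate("technique", 3.0, 10),
    Candidate("hero", 2.9, 10),
    Candidate("simple", 2.5, 11),
    Candidate("play", 2.4, 10),
    Candidate("ought", 2.3, 9),
    Candidate("hypothetical_lemma", 3.5, 4),  # invented for illustration
]

# The 1950s interval contains 26 texts (Table 1.1).
selected = top_key_lemmas(candidates_1950s, n_texts=26)
for c in selected:
    print(c.lemma, c.keyness, c.doc_freq)
```

With 26 texts in the 1950s interval, a lemma must appear in at least 9 of them to qualify, so the invented lemma is dropped despite its higher keyness score.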
The keywords for the 1950s reinforce the earlier trend analysis. For example, the
modal ought (“the writer ought to help the reader as much as he can”) aligns with
the modal must, which, as indicated previously, frames practices of fiction writing as
constituted by certain obligatory actions and values (“the writer must be disengaged,
or else he is writing politics”). It is also notable that while hero figures prominently
here, it ranks seventh in terms of overall decline (from 29 in the 1950s to 2 in the
2020s), with frequent ambivalence toward the notion, even in the 1950s, culminating in the practical redundancy of the term in today’s conversations around fiction.
The frequency of the proclamatory must is at its apex in the 1960s, but what is
of particular related interest here is fiction writers’ obsession with power, either
to conceptualize their own writing as associated with a sense of personal power
(“people tend to underestimate the power of my imagination,” “it lies within our
power, as writers . . . to do something for others”) or as a motivating concern
for their writing (“I think there is no question that power is a great temptation”).
Perhaps concomitantly, man is a term that rides high in the 1960s but is one of a
number of male-gendered lemmas from the corpus, others being he and himself,
that are in sharp overall decline.
Widespread use of the adjective serious prefixing such terms as “story,”
“novel,” “work,” “fiction,” “audience,” and “film critics” lends literary matters
considerable gravitas in the 1970s. Concomitantly, this is the only decade where
author occupies the top six key lemmas, with interviewees often talking about
“an author” or “the author” in generic terms (“the author’s voice”). Fiction is
used largely descriptively to refer to the mega-genre of prose storytelling but
can increasingly stand in for a “novel” or “story” (“every middle-class fiction
is basically a story of adultery,” “the strange amorphous fictions of Barthes and
Robbe-Grillet”) in a way suggestive of a postmodern turn toward framing stories
as fictions.
Adverbs and adjectives dominate the 1980s keywords, with broad-sweeping
terms, such as the essentializing basically, intensifier rather, and the aggrandizing large lavished over the decade of excess. At a time when relativists faced off
against social conservatives, the term moral retains force but is often inflected
with doubt, irony, or even derision (“the moral code of the middle-class writer”).
For the first time, writing as a product of imagination, which also engages that of
readers, becomes a discernible concern.
The 1990s’ key lemmas are particularly interesting for their shared semantic
relationships with representation, in a decade when meaning making was increasingly understood as performative. Act is used to refer to the act of writing, the mimetic acting out of roles, and the multitudinous acts of “love,” “lust,”
“hate,” “indifference,” “violence,” “despair,” “manipulation,” and “hope.” Talk
may refer to attending talks, speaking voices and idiolects (“this guy talk voice”),
and discourse (“talk about”) around issues critical to writers and writing. Voice
can refer literally to how people speak, but also differing narrative voices, including points of view (POVs) within fictional works, and the concerns writers give
voice to. These mediating factors frequently complicate the notion of unitary truth
(“I shall argue with you about your interpretation of the truth”).
In the 2000s, landscape becomes a concern in senses cinematic (being “just
as important as the [human] figures”), psychogeographic (“the emotional
landscape”), and with regard to the literary landscape itself, the metaphor likely
owing its rise to an increasingly visual and multimodal culture. Boy and kid
often relate to writers’ own upbringings, but the focus on adolescent characters may be connected to the growth in young adult (YA) literature. A spike
in abstract nouns pleasure and hope – the first often applying to the “particular, private pleasure” of both text and writing, the second finely balanced with
hopelessness (“hopes and desires for what we want the world to look like”) –
perhaps reflects a decade in which multiple existential threats to civilization
became visible.

Table 1.9 2000s Keywords

Rank   Lemma       POS         Freq.   Keyness   FPM 2000s   FPM Rest   Doc Freq. Focus
1      landscape   noun        19      2.4       303.12      70.90      10
2      boy         noun        30      2.3       478.61      150.94     14
3      pleasure    noun        21      2.2       335.02      96.05      13
4      recently    adjective   16      2.0       255.26      77.76      14
5      kid         noun        24      1.9       382.89      148.66     13
6      hope        noun        12      1.8       191.44      66.32      9

Table 1.10 2010s Keywords

Rank   Lemma         POS    Freq.   Keyness   FPM 2010s   FPM Rest   Doc Freq. Focus
1      perspective   noun   17      2.3       271.84      61.73      10
2      childhood     noun   18      2.2       287.83      75.45      10
3      draft         noun   40      2.1       639.62      246.92     12
4      art           noun   49      2.0       783.54      336.08     14
5      research      noun   20      2.0       319.81      107.45     10
6      kid           noun   23      1.9       367.78      150.89     10

Several ranked nouns suggest increasingly conscious engagement with the
act of writing. Perspective functions in regard both to characterological POVs
(e.g., “the ‘god’ perspective”) and to heterodox ways of seeing (“a variety of
voices from all different perspectives”). Draft connotes concern with process and
reworking material multiple times. Interestingly, art is used less often to refer to
writing than related visual arts, but it is notable that, as in visual arts (Hocking
2022), research has increasing currency in creative writing praxis, though sometimes in disavowal (“I try to do as little research as possible when writing” and
“research deadens fiction”). Complementary to the 2000s, kid is again on the rise,
as is childhood, for related reasons.
It is striking that identity is the top key lemma for the 2020s, a decade when identitarian concerns are very much part of public discourse. It is no surprise, then, to see
it used directly after possessives such as “their,” “your,” and “me,” connoting
ownership, and after heterodox identity markers (e.g., “gay,” “racial,” “marginal,”
and “hybrid”). Whereas male-gendered terms are falling in use, her is on the rise.
Complementary to these shifts, space is a key term. As discussed in Section 1.3.4,
it clearly now denotes both literal and more abstract zones within which these
newly empowered identities, and their associated communities, can contest their
corner. Figure is most commonly collocated with “out,” reinforcing the sense of a
time when meaning is not set but is being contested both through the act of writing
and outside of the boundaries of the text, as if to say, “We are still figuring it out.”

Table 1.11 2020s Keywords

Rank   Lemma      POS          Freq.   Keyness   FPM 2020s   FPM Rest   Doc Freq. Focus
1      identity   noun         25      3.3       403.08      50.24      13
2      family     noun         64      3.0       1,031.88    278.60     19
3      parent     noun         27      2.6       435.32      109.61     13
4      her        determiner   180     2.4       2,902.15    1,125.80   30
5      space      noun         31      2.3       499.81      157.57     17
6      figure     verb         17      2.3       274.09      66.22      13

1.4 Conclusion
Using a self-compiled diachronic corpus of interviews with narrative fiction
authors from the 1950s until the present day, this chapter has identified four major
thematic trends in their conceptualizations around their literary practices. In general, they suggest shifts from formalist concerns to more holistic conceptualizations of the writing process and its social context. Of course, there are nuances
within this broad arc, such as the declining influence of technique and style set
against rising and increasingly polymorphous conceptualizations of structure and
narrative. Additionally, the study identified the more salient conceptualizations
of fiction writing practice for each of the decades since the 1950s. It was shown
that obligatory pressures associated with the powerful demands of cultural arbiters have given way to writers’ more affective engagement with identity, creative
process, and heterodox readerships. Given how these findings chime with widely
identifiable trends in our cultural moment, we hope to have shown that the artist
interview has considerable potential to contribute to the quantitative analysis of
fiction writing.
References
Aston, Guy. 1997. “Small and Large Corpora in Language Learning.” In Practical Applications in Language Corpora, edited by Barbara Lewandowska-Tomaszczyk and Patrick
James Melia, 51–62. Łódź: Łódź University Press.
Baker, Paul. 2006. Using Corpora in Discourse Analysis. London: Continuum.
Baker, Paul. 2011. “Times May Change But We’ll Always Have Money: A Corpus Driven
Examination of Vocabulary Change in Four Diachronic Corpora.” Journal of English
Linguistics 39, no. 1: 65–88.
Baker, Paul, Costas Gabrielatos, and Tony McEnery. 2013. Discourse Analysis and Media
Attitudes: The Representation of Islam in the British Press. Cambridge: Cambridge University Press.
Biber, Douglas, and Edward Finegan. 1989. “Drift and the Evolution of English Style:
A History of Three Genres.” Language 65, no. 3: 487–517.
Bowker, Lynne, and Jennifer Pearson. 2002. Working with Specialized Language: A Practical Guide to Using Corpora. London: Routledge.
Busse, Beatrix. 2020. Speech, Writing, and Thought Presentation in 19th-Century Narrative Fiction: A Corpus-Assisted Approach. Oxford: Oxford University Press.
Csomay, Eniko, and Ryan Young. 2021. “Language Use in Pop Culture Over Three Decades: A Diachronic Keyword Analysis of Star Trek Dialogues.” International Journal of
Corpus Linguistics 26, no. 1: 71–94.
Freiburg, Rudolf, and Jan Schnitker. 1999. Do You Consider Yourself a Postmodern
Author?: Interviews with Contemporary English Writers. Munster: Lit Verlag.
Hocking, Darryl. 2022. The Impact of Everyday Language Change on the Practices of
Visual Artists. Cambridge: Cambridge University Press.
Hoover, David L. 2007. “Corpus Stylistics, Stylometry, and the Styles of Henry James.”
Style 41, no. 2: 174–203.
Kilgarriff, Adam. 2009. “Simple Maths for Keywords.” In Proceedings of Corpus Linguistics Conference CL2009, edited by Michaela Mahlberg, Victorina González Díaz, and
Catherine Smith. Liverpool: University of Liverpool.
Kilgarriff, Adam, Vit Baisa, Jan Bušta, Miloš Jakubíček, Vojtěch Kovář, Jan Michelfeit,
Pavel Rychlý, and Vít Suchomel. 2014. “The Sketch Engine: Ten Years On.” Lexicography ASIALEX, no. 1: 7–36.
Kilgarriff, Adam, Ondřej Herman, Jan Bušta, Vojtěch Kovář, Vít Baisa, and Miloš
Jakubíček. 2015. “DIACRAN: A Framework for Diachronic Analysis.” www.sketchengine.eu/wpcontent/uploads/Diacran_CL2015.pdf.
Klaussner, Carmen, and Carl Vogel. 2018. “A Diachronic Corpus for Literary Style Analysis.” In Proceedings of the Eleventh International Conference on Language Resources
and Evaluation (LREC 2018). Miyazaki, Japan: European Language Resources Association (ELRA). https://aclanthology.org/L18-1552.pdf.
Koester, Almut. 2010. “Building A Small Specialised Corpora.” In The Routledge Handbook of Corpus Linguistics, edited by Anne O’Keeffe and Michael McCarthy, 66–79.
Oxon: Routledge.
Kung, Sally. 2007. “Unit 5 Case Studies 5.2 a Diachronic Study of Melancholy in a British Novel Corpus.” www.birmingham.ac.uk/Documents/college-artslaw/corpus/Intro/
Unit52Melancholy.pdf.
Lazzeretti, Cecilia. 2016. The Language of Museum Communication: A Diachronic Perspective. London: Palgrave Macmillan.
McEnery, Tony, Richard Xiao, and Yukio Tono. 2006. Corpus-based Language Studies: An
Advanced Resource Book. New York: Routledge.
McIntyre, Dan, and Brian Walker. 2019. Corpus Stylistics: Theory and Practice. Edinburgh: Edinburgh University Press.
Morin, Oliver, and Alberto Acerbi. 2017. “Birth of the Cool: A Two-centuries Decline
in Emotional Expression in Anglophone Fiction.” Cognition and Emotion 31, no. 8:
1663–75.
Scott, Mike, and Christopher Tribble. 2006. Textual Patterns: Key Words and Corpus Analysis in Language Education. Amsterdam: John Benjamins.
Sun, Kun, and Rong Wang. 2022. “The Evolutionary Pattern of Language in English Fiction Over the Last Two Centuries: Insights from Linguistic Concreteness and Imageability.” SAGE Open 12, no. 1: 1–13. https://doi.org/10.1177/21582440211069386.
Stanyer, James, and Sabina Mihelj. 2016. “Taking Time Seriously? Theorizing and
Researching Change in Communication and Media Studies.” Journal of Communication 66, no. 2: 266–79.
Underwood, Ted. 2019. Distant Horizons. London: University of Chicago Press.
Vaughan, Elaine, and Brian Clancy. 2013. “Small Corpora and Pragmatics.” In Yearbook
of Corpus Linguistics and Pragmatics, edited by Jesús Romero-Trillo, 53–73. London:
Springer.
Williams, Raymond. 1977. Marxism and Literature. Oxford: Oxford University Press.
2
Within-Author Style Variation
in Literary Nonfiction
The Situational Perspective
Marianna Gracheva and Jesse A. Egbert
2.1 Introduction
Studies of style focus on consistent patterns in an author’s works with the goal
of revealing pervasive trends in their language use. It has been observed, however, that style is not uniform: a single author may display varying degrees of
versatility across their works. Often, such versatility is associated with the evolution
of style over time. Hoover (2007), for example, examines Henry James’s style
diachronically, identifying pervasive vocabulary in 20 of his novels, and distinguishes between substyles within them as well as James’s early, intermediate, and
late stylistic trends. Moss (2014) performs a diachronic analysis of James’s syntax
in two novels and finds increased syntactic complexity in non-dialogic sections
of his later work.
It does not seem common, however, for such studies to identify reasons for this
variation other than the effect of time. Biber and Conrad (2019, 16) note that research
on style is primarily concerned with the aesthetic value of linguistic choices, which
are “not directly functional.” In fact, functional underpinnings of these choices are
often seen as either irrelevant or impossible to identify. Discussing James’s frequent use of pronouns without clear referents, for example, Moss (2014, 78) notes
that “sometimes James uses these stylistic devices without clear reasons, apparently
mimicking the pattern set by those sentences in which such unusual structures have
been meaningful,” thus suggesting that there is essentially no functional reason for
this use and it is purely idiosyncratic. The same idea is expressed in her statement
regarding sentence length: “No reason has been identified for the relatively short
clauses . . . ; it seems simply to be a stylistic variation” (Moss 2014, 173).
DOI: 10.4324/9781003298328-3
It is possible, however, that language produced with attention to aesthetics at the
same time fulfills certain communicative functions. In their synchronic study of
eighteenth-century fiction and essays, Biber and Finegan (1994, 13–4) demonstrate
that a method based on systematic functional co-occurrence of linguistic features
(multidimensional analysis) can be applied to stylistic analysis. To this end, the
styles of Johnson, Addison, Defoe, and Swift are analyzed with respect to three
dimensions of variation identified by Biber (1988): “informational vs. involved
production” (preference for nominal [informational] features vs. clausal ones, characteristic of spoken, more involved discourse), “elaborated vs. situation-dependent
reference” (use of wh-relative clauses for elaboration vs. discourse situated through
time, place, and other adverbials), and “abstract style” (use of passive structures).
The study illustrates that some authors exhibit noticeable variation across their
works. For example, on the dimension of “elaborated vs. situation-dependent reference,” Swift’s fiction is “moderately non-elaborated,” but one text is the most
situated of all the texts in the study. In essays, Defoe’s texts are divided between
markedly situated and relatively elaborated; Addison shows a wide range as well
from highly elaborated texts to quite situated. Swift’s and Addison’s essays also
extend over a range of variation along the dimension of “informational vs. involved
production.” Biber and Finegan attribute this range to “stylistic adaptation to various topics and purposes,” thus suggesting that these preferences for different sets of
features are functional and serve the communicative goal of the text.
In a study of nineteenth-century fiction, Egbert (2012) observes considerable within-author variation in the styles of several authors, the most striking of
which is the case of Mark Twain on the dimension of “thought presentation vs.
description.” Twain’s works The Prince and the Pauper and Tom Sawyer, Detective occupy opposite ends of the dimension, the former being highly descriptive,
rich in attributive adjectives and prepositional phrases modifying other nouns, and
the latter heavily focused on character thought and emotion conveyed through a
variety of verbal features and their complements expressing stance. The study further examines within-novel variation across the chapters of these two works. The
results again show a considerable range in scores on the dimension, indicating
vast differences even within a single novel, yet no overlap in the dimension scores
of the individual chapters between the two books. While further analysis of these
trends was outside the scope of the study, these results suggest that systematic
choices made by the author in favor of certain sets of features fulfilling the purpose
of presenting ideas or that of description are guided by functional considerations.
Extensive within-author differences along dimensions of style variation were
observed in another literary register – modern nonfiction essays (Gracheva 2022).
The study identifies three main dimensions of variation in the corpus: “interactive
vs. informational style,” “abstract expository vs. concrete descriptive style,” and
“immediate vs. removed style.” While some authors show stylistic consistency,
their texts gravitating toward one or the other end of the spectrum, each dimension also features authors who exhibit substantial variation across their works.
Analysis of texts from the opposite poles suggests that this internal variability
may reflect consistent differences in situational characteristics. In particular, the
communicative purpose of each text may be a key consideration influencing the
choice of style. For example, in the works of the same author, the purpose of
entertaining and sharing personal experiences results in a highly interactive style,
while the purpose of informing warrants a focus on information presentation; the
goal to evaluate abstract notions leads to an abstract expository style, while a concrete descriptive style is chosen by the same writer to depict a future event; finally,
the immediate style creating a sense of urgency is employed in a persuasive text
aiming to instigate change, while a past-oriented removed style is opted for by the
same author in a cultural and historical treatise.
While these observations, based on qualitative analysis of select authors’
works, warrant only tentative conclusions, the present study aims to consistently
distinguish between situational characteristics of the essays and trace the relationship between communicative purpose and an author’s stylistic choices across their
works. Since the term “essay” is applied to a wide variety of factually accurate
texts detailing events from the author’s life or real-world events (Gutkind 2007),
authors writing in this register have an unusual amount of creative freedom, and
essays are commonly classified as personal, memoir, literary journalism, narrative, rhetorical, meditative, collage, braided, or lyric, among others (Silverman
2008; Hartsock 2016). It is highly likely that this variety of essay types reflects
differences in communicative purpose; it is therefore natural to expect that authors
who do not limit themselves to a single type exhibit variation in purpose across
their texts. Alternatively, a single essay type may integrate a variety of communicative purposes into a single text.
In this attempt to investigate the relationship between author style and communicative purpose, this study problematizes the notion of style as merely a reflection of aesthetic or idiosyncratic preferences. Since linguistic features have been
shown to be functional in nature (e.g., Biber 1988), if an author consistently associates a particular communicative purpose with a certain group of features, it may
be problematic to separate purely aesthetic preference from functional use. The
study poses the following research question:
1. To what extent does the communicative purpose predict within-author stylistic variation on the identified dimensions (“interactive vs. informational
style,” “abstract expository vs. concrete descriptive style,” “immediate vs.
removed style”)?
The next section provides a brief description of the three dimensions and the
method used in the present study.
2.2 Method
2.2.1 Dimensions of Style Variation in Modern Literary Nonfiction
The dimensions of variation outlined previously were identified by a multidimensional analysis conducted on a corpus of 300 creative nonfiction essays written by 17 modern (predominantly twenty-first century) authors (Gracheva 2022).
“Interactive vs. informational style” is a cline from authors whose works involve
interactivity with the reader or between the characters to those who prioritize
informational density. Texts scoring high in interactivity are marked by typical features of oral discourse (Biber 1988): first and second person pronouns,
demonstrative and indefinite pronouns, “that” deletion, wh-clauses controlled
by private verbs, conditional adverbial clauses, emphatics, wh-questions, among
others (see Appendix for a complete list of features comprising each dimension).
At the informational pole, the texts are rich in features functioning as
information packaging devices – prepositional phrases, nominalizations, attributive adjectives, and by-passives – while features of oral discourse are rare.
“Abstract expository vs. concrete descriptive style” is a continuum from authors
focusing on exposition of abstract concepts, stance, and reasoning to writers
whose essays are full of concrete descriptive detail. This dimension is characterized by a contrast between abstract and concrete nouns, abstract nouns denoting
complex concepts explicated through noun complements, verb complements, and
causative adverbial clauses (these clausal features serving the function of reasoning) and concrete nouns contributing to descriptions enhanced by present and
past reduced relative clauses. Again, these sets of features are in complementary
distribution: texts with high rates of occurrence of the first set of features typically
do not contain features from the opposite pole, and vice versa.
“Immediate vs. removed style” represents a cline from present-oriented texts
to texts set in the past, with high rates of occurrence of past tense, perfect aspect,
third-person pronouns, and public verbs. This present or past orientation appears
to reflect the author’s presence and involvement in or distance from the described
events, which gives the dimension its label.
As each text from each author receives a dimension score, that is, an indicator
of how high or low the text falls on a given dimension, it is possible to observe
each author’s stylistic tendencies on each dimension. As was stated earlier, while
clear stylistic preferences were observed in the case of some authors, each dimension also featured authors whose works were spread wide along the spectrum.
It is these authors’ texts that are of interest to the present study, which is concerned with identifying the basis for this substantial linguistic variation across
their works.
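As an illustration of how a dimension score of this kind might be derived, the sketch below follows a convention common in multidimensional analysis: each feature's rate is standardized to a z-score across the corpus, and a text's score is the sum of z-scores for features on the positive pole minus the sum for features on the negative pole. The chapter does not spell out this computation, so the procedure, the function names, the selection of features, and all rates are assumptions made for illustration only.

```python
from statistics import mean, stdev

# Hypothetical rates per 1,000 words for three texts; not data from the study.
RATES = {
    "text_a": {"first_person_pronouns": 60.0, "wh_questions": 4.0,
               "nominalizations": 10.0, "prepositional_phrases": 80.0},
    "text_b": {"first_person_pronouns": 30.0, "wh_questions": 2.0,
               "nominalizations": 25.0, "prepositional_phrases": 110.0},
    "text_c": {"first_person_pronouns": 5.0, "wh_questions": 0.0,
               "nominalizations": 45.0, "prepositional_phrases": 150.0},
}
# Assumed pole membership, loosely following the feature lists in the text.
INTERACTIVE = ["first_person_pronouns", "wh_questions"]
INFORMATIONAL = ["nominalizations", "prepositional_phrases"]


def standardize(rates, feature):
    """Return the z-score of one feature for every text in the corpus."""
    values = [text[feature] for text in rates.values()]
    centre, spread = mean(values), stdev(values)
    return {name: (text[feature] - centre) / spread
            for name, text in rates.items()}


def dimension_scores(rates, positive, negative):
    """Sum z-scores of positive-pole features and subtract negative-pole ones."""
    z = {feature: standardize(rates, feature) for feature in positive + negative}
    return {name: sum(z[f][name] for f in positive)
                  - sum(z[f][name] for f in negative)
            for name in rates}


scores = dimension_scores(RATES, INTERACTIVE, INFORMATIONAL)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:+.2f}")
```

Under these assumptions, a text rich in interactive features and poor in informational ones receives a high positive score, and the reverse profile yields a negative score, which is the sense in which a text sits "high" or "low" on the dimension.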
2.2.2 Present Study: Corpus and Quantitative Analysis
The author who exhibited the widest range of variation was selected on each
dimension (Table 2.1). The descriptive statistics in Table 2.1 show dimension score
means (M) and standard deviations (SD) of each author’s texts on their respective dimension.
Table 2.1 Range of Variation along Dimensions
                                                                                      Variation along Dimension
Author          Number of Texts  Dimension                                            M       SD
Phillip Lopate  32               Interactive vs. Informational Style                  -0.17   1.03
Ander Monson    19               Abstract Expository vs. Concrete Descriptive Style   -0.06   0.91
David Shields   27               Immediate vs. Removed Style                           0.11   1.12
The spread of scores across each author’s works is illustrated in Figures 2.1–
2.3. Thus, Lopate’s style is represented by essays relying on both interactivity and
informational focus, Monson’s essays are equally divided between the expository
and descriptive ends of the spectrum, and Shields’s essays are varied in their present or past orientation and narrator presence.
To investigate the role of communicative purpose in this linguistic variation,
each text was coded for purpose, resulting in several purposes commonly found
across an author’s works. In these cases, one-way ANOVAs were run, with purpose treated as the independent variable. Author style, operationalized through
the dimension scores of the author’s texts on their respective dimension, was the
dependent variable in each case study.
Figure 2.1 Phillip Lopate’s essays: spread of scores on “interactive vs. informational style.”
Figure 2.2 David Shields’s essays: spread of scores on “immediate vs. removed style.”
In the case of one author, however (Monson on the dimension of “abstract
expository vs. concrete descriptive style”), coding the texts for purpose did not
reveal clear distinctions, as purposes of all texts included reflection. This uniformity makes purpose (at least in the way we coded it) an unlikely reason for the
substantial linguistic variation observed across the author’s works. To examine
the possibility of other factors determining Monson’s choice of style, a hierarchical cluster analysis was conducted on his essays, with the goal of grouping
texts based only on proximity in dimension scores. Thus, instead of using a top-down method of coding for communicative purpose and measuring differences
among them, as with the authors on the other two dimensions, for Monson’s essays
we used a bottom-up approach grouping texts that are maximally similar to each
other in their scores and maximally different from the texts in the other clusters.
Each cluster of Monson’s essays could then be examined for a possible functional
basis of these groupings.
Figure 2.3 Ander Monson’s essays: spread of scores on “abstract expository vs. concrete descriptive style.”
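The bottom-up grouping described above can be sketched as a small agglomerative procedure over one-dimensional scores. The chapter does not report the linkage criterion or the number of clusters retained, so the centroid-distance rule, the function name `cluster_scores`, the choice of three clusters, and the scores themselves are all hypothetical.

```python
from itertools import combinations
from statistics import mean


def cluster_scores(scores, n_clusters):
    """Agglomerative clustering of 1-D scores.

    Starts with one cluster per text and repeatedly merges the two clusters
    whose mean scores are closest (an assumed centroid-distance criterion)
    until only n_clusters remain.
    """
    clusters = [[score] for score in sorted(scores)]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest gap between their means.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: abs(mean(clusters[pair[0]]) - mean(clusters[pair[1]])),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected by the deletion
    return clusters


# Hypothetical dimension scores for nine essays; not values from the study.
essay_scores = [1.3, -0.1, -1.4, 0.2, 1.0, -0.9, 0.0, 1.5, -1.1]
groups = cluster_scores(essay_scores, n_clusters=3)
for group in groups:
    print(group)
```

Each resulting group contains texts that sit close together on the dimension, and it is these groups, rather than pre-assigned purpose labels, that would then be inspected for a shared functional basis.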
2.3 Results and Discussion
2.3.1 Phillip Lopate: “Interactive vs. Informational Style”
The communicative purposes identified in Phillip Lopate’s essays, which varied on
the dimension of “interactive vs. informational style,” include addressing a question,
description of a person, narration and reflection, speculation on or analysis of an
issue, and review (Table 2.2). Essays addressing readers’ or students’ questions are
concerned with writing techniques and practices, such as the ethics of writing about
other people, research and personal writing, and ways of effectively ending an essay.
These texts explicitly state the question and are structured in the form of an answer,
offering guidance, explaining the rationale behind the advice, and providing the necessary background into essay writing. The focus of essays describing a person, or
rather an interaction, is on a memorable encounter, an experience that involved other
people (e.g., renting a place), or a relationship. These essays describe people through
conversations the narrator has with them and place emphasis on interpersonal matters. The purpose to narrate and reflect is found in essays that tell a story and share
the narrator’s thoughts, opinions, and feelings about the events. Essays whose purpose is to speculate on or analyze an issue differ from the narrative essays containing reflection in that there is usually no specific event or episode that underlies the
analysis. These speculative essays typically present the problem they examine and
the author’s view without the foundation of a concrete experience. Finally, reviews
are the author’s evaluations of directorial efforts or works of writing.
Thus, Lopate’s essays are quite diverse in purpose, and purpose is found to
be a statistically significant predictor of linguistic variation on the dimension (F[4, 27] = 15.67, p < .05, R² = 0.7). This range includes purposes that explain the
higher level of interactivity in some texts through author engagement with the
reader (the purpose of addressing a question) or through reproducing an encounter
(the purpose of describing a person) and warrant informational density in others
(texts whose goal is to narrate and reflect, speculate, or review a work of art). This
split between the “interactive” and “informational” purposes can be seen quite
clearly in Figure 2.4. The three essays with the goal of describing a person or an
encounter are all found on the “interactive” pole (positive scores of 3.2, 2, and
0.8), as well as two of the three essays labelled as addressing a question (scores
of 1.3 and 0.6). In contrast, essays fulfilling the three communicative purposes
which suggest an emphasis on information presentation, namely, narration and
reflection, speculation, and review, are predominantly found on the “informational” pole. The purpose of describing a person, the most extreme representation
of interactivity in Lopate’s essays, differs significantly (p < .05; d > 0.8) from
essays with all three informational purposes, while the purpose of addressing a
question is statistically different from reviews (p < .05; d = 1.9).
Table 2.2 “Interactive vs. Informational Style” Dimension Scores across Communicative Purposes in Phillip Lopate’s Essays
                                               Descriptive Statistics
Purpose                      Number of Texts   M       SD
Describe a person/encounter  3                  1.99   1.19
Address a question           3                  0.42   0.93
Narrate and reflect          6                 -0.04   0.30
Speculate/analyze            9                 -0.45   0.61
Review                       11                -0.92   0.39
Figure 2.4 Phillip Lopate: variation by communicative purpose on “interactive vs. informational style.”
The differences between communicative purposes appear even more nuanced
than this general distinction between the “interactive” and “informational”
extremes: the two purposes on the “interactive” end are also statistically different
(p < .05, d = 1.5). Specifically, the purpose of addressing a question, found in two
essays with positive scores (0.6 and 1.3) but one text with a moderate negative
score (-0.6), appears to encompass a wider linguistic range. Considering that the
answer presented in these essays may rely to different extents on specific strategies, such as explanations, examples, or references to theory, it is not surprising
that essays fulfilling this purpose differ linguistically, perhaps reflecting these
finer-grained distinctions. The excerpts that follow illustrate the five communicative purposes of Lopate’s writing from the most interactive essays to the most
informational ones (scores indicated in parentheses) and the differences in their
linguistic representation (interactive features bolded, informational italicized).
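The effect sizes quoted above can be approximated from the means and standard deviations in Table 2.2. The chapter does not say which variant of Cohen's d was used; the sketch below assumes the version that divides the mean difference by the root mean square of the two group SDs, and the function name `cohens_d` is illustrative. With the rounded table values, this variant gives figures that round to the reported d = 1.5 and d = 1.9, although that agreement does not establish that it is the authors' formula.

```python
from math import sqrt


def cohens_d(mean_1, sd_1, mean_2, sd_2):
    """Mean difference divided by the root mean square of the two SDs.

    This is one common variant of Cohen's d; it is an assumption here,
    not a formula stated in the chapter.
    """
    return (mean_1 - mean_2) / sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)


# (M, SD) pairs taken from Table 2.2.
describe_person = (1.99, 1.19)
address_question = (0.42, 0.93)
review = (-0.92, 0.39)

d_describe_vs_address = cohens_d(*describe_person, *address_question)
d_address_vs_review = cohens_d(*address_question, *review)

print(round(d_describe_vs_address, 1))  # 1.5
print(round(d_address_vs_review, 1))    # 1.9
```

Because the two "interactive" groups each contain three texts, a sample-size-weighted pooled SD would give the same value for that pair; the two variants diverge only for unequal groups, such as the comparison with the 11 reviews.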
Texts 1 and 2 represent the “interactive” pole of the dimension, but it is apparent that the emphasis on interactivity manifests itself to different degrees. In Text
1, describing a relationship through a conversation, the focus on interpersonal
matters is conveyed through first- and second-person pronouns, private verbs of
feeling and mental state, followed by an expression of stance through a wh-complement clause, another complement with that deletion, a conditional clause, and
emphatics. The passage also features other indicators of oral discourse, such as
demonstrative and indefinite pronouns:
Text 1: Description of a person or an encounter, Motel (3.15)
I don’t really know what I’m trying to say, but I always felt [that deletion] there was something you were holding back. It doesn’t work that way,
Phillip. As long as you weren’t forthcoming with it, I didn’t see any way I
could allow myself to. It’s like a game. You put your chip down, I put my
chip down, you put another, and I put another. It doesn’t work without
being mutual.
As stated earlier, despite being found on the “interactive” end, the purpose of
addressing a question is significantly different from that of describing a person.
Unlike Text 1, Text 2 is monologic, with the author stating the questions he is
commonly asked and relating them to his background as a writer. This goal does
not demand the same emphasis on interactivity; as a result, the typical features
of interactivity in this excerpt, wh-questions, first-person, indefinite, and demonstrative pronouns, and private verbs, are less prominent. In fact, the text features
several information packaging devices, such as prepositional phrases, nominalizations, and attributive adjectives, reflecting the importance of information presentation for essays with this purpose:
Text 2: Addressing a question, On the Ethics of Writing about Others (1.27)
Whenever I speak in public about autobiographical nonfiction or simply give a reading of my own work, I am invariably asked in the Q-and-A
session: How should one deal with writing about one’s family members or
intimates? How does one balance the need to tell one’s story with the pain
others might feel in being exposed this way? The assumption is that since I
have written candidly about family and friends in the past, I must know the
answer to this difficult question.
Narration and reflection is the purpose that gravitates toward the “informational” end; however, the negative scores are moderate, and these texts tend to
combine features of informational density and interactivity. Text 3 illustrates this
balance. The narrative is personal, which is reflected in the use of first-person
pronouns; however, the reflection accompanying the narrative requires a heavy
informational focus. Nominalizations, particularly frequent in the text, express
mental and emotional states (“semblance of rationality,” “expression of sympathy
and knowingness,” “detachment,” “skepticism,” “reaction”) and other complex
constructs (“personality,” “essayist’s equipment”), often further modified through
prepositional phrases.
Text 3: Narration and reflection, A Mother’s Tale (-0.07)
When I was about eight years old, not long after I had mastered speech and a
semblance of rationality, I became, or rather, fashioned myself into becoming,
an ideal interlocutor for my mother. She would come to me with her troubles
(usually complaints about my father) and I would listen with an expression of
sympathy and knowingness, which I had learned to sham at an early age. . . .
I now see that large parts of my adult personality and professional demeanor
were formed in reaction to my mother: habits of detachment, skepticism, and
thinking against oneself, which are classic essayist’s equipment.
The purpose of speculation is found further down on the “informational” end.
Text 4 demonstrates that reliance on a combination of nominalizations and attributive adjectives (“cosmopolitan worldliness,” “painstaking, willed achievement”)
among other features conveys the author’s judgment and contributes to the speculative tone:
Text 4: Speculation, Brooklyn the Unknowable (-0.62)
Brooklyn’s provincialism, be it said, is not, or not entirely, a failure to
achieve cosmopolitan worldliness; it is also a painstaking, willed achievement. It’s not easy to be situated next to the most au courant place on the
planet and hold onto one’s rough edges. Though Tiresias’s passage between
genders has always struck me as exhausting, I seem to have conducted my life
so as to crisscross the identity border between Manhattanite and Brooklynite.
Finally, reviews are essays with the largest negative scores on the dimension.
Text 5, a film review, shows that here informational features contribute to value
judgments and their expressivity (“greatest performances,” “grim hard-nosed
comedy,” “the embodiment of a corrupt past,” “parable of regional and class
tensions”) as well as depict the on-screen reality (“optimistic modern Northern
Italy,” “poor, backward, fatalistic South,” “a go-getter foreman in a Milanese car
factory”), thus fulfilling primary goals of a review.
Text 5: Review, Strained Relations (-1.49)
That situation has been rectified by Rialto Pictures, which is releasing
a lustrously restored and newly subtitled version of Lattuada’s supremely
grim, hard-nosed comedy. The plot centers on Nino, a go-getter foreman in
a Milanese car factory, played by Alberto Sordi in one of his greatest performances. . . . The film can be seen as a parable of regional and class tensions:
between the gullibly optimistic modern Northern Italy of the economic boom,
and the poor, backward, fatalistic South, still ruled by bandits and gangsters,
the embodiment of a corrupt past that has never gone away.
2.3.2 David Shields: “Immediate vs. Removed Style”
Analysis of David Shields’s essays, showing variation on the dimension of
“immediate vs. removed style,” revealed four distinct communicative purposes
(Table 2.3): to analyze and reflect on a problem or a phenomenon, to argue a point
with reliance on evidence, to narrate, and to review. The purpose of analysis and
reflection characterizes texts presenting the author’s thoughts on forms of writing,
literature, norms and conventions of the writing industry. Unlike the argumentative purpose (to argue a point), this kind of reflection is typically not supported by
concrete evidence and conveys mainly the author’s opinions and personal observations. In contrast, the purpose to argue a point involves evidence in the form of
research findings, historical facts, or statements made by authority figures. Narrative essays share past experiences, and reviews focus on film and works of writing.
Purpose is again found to be a significant predictor of variation across texts (F[3, 23] = 36.43, p < .05, R² = 0.83), with all pairs except reviews and analyses/
reflections showing significant differences (p < .05, d > 0.8). The essays of these
two groups are both found on the positive, present-oriented end of the dimension.
This lack of significance is unsurprising in view of the overlap between these two
purposes – most reviews are likely to contain analysis of the content of the piece
and its artistic merit.
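To make the reported test concrete, a one-way ANOVA can be approximated from group summary statistics alone. The sketch below uses the group sizes, means, and standard deviations given for Shields's essays in Table 2.3; the function name `anova_from_summary` is illustrative, and because the table values are rounded, the output should land near the reported F and R² without matching them exactly.

```python
def anova_from_summary(groups):
    """One-way ANOVA from (n, mean, sd) triples.

    Returns (F, df_between, df_within, r_squared), where r_squared is the
    share of total variation accounted for by group membership.
    """
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    # Between-group variation: how far each group mean lies from the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-group variation, recovered from each group's sample SD.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    r_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, r_squared


# (number of texts, M, SD) per communicative purpose, from Table 2.3.
shields_groups = [
    (7, 1.03, 0.67),    # analyze/reflect
    (4, 1.01, 0.41),    # review
    (10, 0.13, 0.32),   # argue a point
    (6, -1.62, 0.55),   # narrate/describe an experience
]

f_stat, df_between, df_within, r_squared = anova_from_summary(shields_groups)
print(f"F[{df_between}, {df_within}] = {f_stat:.2f}, R2 = {r_squared:.2f}")
```

The degrees of freedom follow directly from four purposes and 27 texts, which is why the statistic is reported as F[3, 23].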
Figure 2.5 illustrates how the texts unified by a single purpose cluster on the
dimension. Expectedly, Figure 2.5 shows the divide between analyses and reviews
occupying the present-oriented pole and narratives on the past-oriented end of the
dimension.
Text 6, an example of an analysis/reflection, demonstrates immediacy and
author presence as he expresses stance and makes statements about mental processes and their reflection in writing. This goal necessitates the use of present
tense (bolded) and exemplifies what Langer (1953, 208) calls the “timeless present,” best suited for conceptualizations of ideas and their interrelationships.
Text 6: Analysis/reflection, Contradiction (2.16)
It’s natural to enter into dialogues and disputes with others, because it’s
natural to enter into disputes with oneself: the mind works by contradiction.
Great art is clear thinking about mixed feelings. One of the tricks in writing
a personal essay is that you have to develop a dialogue between the parts of
yourself that in a way corresponds to the conflict in fiction. You cop to various tendencies, and then you struggle with these tendencies.
Table 2.3 “Immediate vs. Removed Style” Dimension Scores across Communicative Purposes in David Shields’s Essays
                                                  Descriptive Statistics
Purpose                         Number of Texts   M       SD
Analyze/reflect                 7                  1.03   0.67
Review                          4                  1.01   0.41
Argue a point                   10                 0.13   0.32
Narrate/describe an experience  6                 -1.62   0.55
Figure 2.5 David Shields: variation by communicative purpose on “immediate vs. removed style.”
Similarly, Text 7, a review, presents the author’s interpretation of a literary
work, facts he considers generally true and, therefore, “timeless” (Langer 1953):
Text 7: Review, Autobiographic Rapture (0.80)
The contrast between the title of Vladimir Nabokov’s autobiography and the
title of his first English novel suggests a distinction between autobiography
and fiction. . . . But the comedy, as always with Nabokov, cuts considerably
deeper. If autobiography is a physical place to which one can return, and if
memory has words with which to communicate, then consciousness is tangible and the imagination is real.
Argumentative essays differ significantly from the two purposes found on the
“immediate” end (p < .05, d > 0.8); however, Figure 2.5 shows that these essays
are a mixture of the “immediate” and “removed” styles. While the present tense
may be expected in argumentation, it is interesting to notice the role of the negative features, namely, past tense, perfect aspect, public verbs, and third-person
pronouns, which here are related to evidentiality. Text 8, an excerpt from an essay
arguing for the need to borrow from other art forms to avoid stagnation, illustrates
that these essays often present a balance of present and past orientation, the former serving the goal of outlining the problem and the latter providing evidence
for the claim from past trends in history. It is thus this emphasis on evidence that
makes argumentative texts distinct from the purely analytical/reflective ones discussed previously and warrants the use of narrative features (italicized):
Text 8: Arguing a point, Reality Hunger (-0.02)
Why is hip-hop stagnant right now, why is rock dead, why is the conventional novel moribund? Because they’re ignoring the culture around them,
where new, more exciting forms . . . are being found (or rediscovered).
American R & B was enormously popular in Jamaica in the 1950s. . . . The
music culture was based around DJs playing records at public dances; huge
public-address systems were set up for these dances. DJs started acting more
and more as taste editors.
Finally, the narrative purpose relies on the past to a considerably greater extent.
Text 9 tells a personal story and contains multiple references to past events and
their participants. It is clear from Figure 2.5 that such texts only occupy the negative end of the dimension.
Text 9: Narration, Notes for Eulogy for My Father (-1.46)
Since I was six years old, the first thing he and I have done every morning
is read the sports page. . . . [W]hen Mike Marshall hit a three-run home run
in the tenth inning to win it for the Dodgers, he and I looked at each other
and we were both, a little weirdly, crying. Games have held us together, but
also words. I’ve always loved his love of puns . . . ; admired his ability to tell
a joke and a story.
2.3.3 Ander Monson: “Abstract Expository vs. Concrete Descriptive Style”
The picture is somewhat more complex in the case of Ander Monson, who exhibits substantial linguistic variation on the dimension of “abstract expository vs.
concrete descriptive style.” The process of coding his essays for communicative
purpose, however, revealed a limited range, including such purposes as to reflect,
describe and reflect, narrate and reflect, engage the reader, and present an idea. It
becomes apparent that reflection is the underlying goal in all of Monson’s essays,
with even the essays that present an idea or directly address and engage the reader
in the thought process heavily focused on reflection. This homogeneity makes it
unlikely that communicative purpose is the factor accounting for variability in the
author’s choice of language, which is confirmed by a lack of statistical significance of purpose as a predictor of variation (F[4, 14] = 0.57, p > .05; Figure 2.6).
Figure 2.6 Ander Monson: variation by communicative purpose on “abstract expository vs. concrete descriptive style.”
To explore the possibility of other functional reasons for the extensive range
of variation shown by the author on the dimension, we used hierarchical cluster
analysis to identify possible bottom-up “text types” among essays based on proximity in their dimension scores. That is, this approach does not operate on the
assumption that the reason for variation lies in the differences in communicative
purpose, and the texts were not coded for any situational parameters. Rather, texts
close in scores form clusters, which subsequently allows the researcher to identify the possible functional basis for the clustering through a qualitative analysis
of essays that are linguistically similar within clusters and maximally different
between them.
Figure 2.7 Ander Monson’s essays: three-cluster solution. [Dendrogram of agnes(x = Monson_reduced, method = "ward"); vertical axis: Height; agglomerative coefficient = 0.97.]
Table 2.4 Clusters Identified in Ander Monson’s Essays
Cluster     Number of Texts   M       SD
Cluster 1   7                 -0.98   0.31
Cluster 2   7                 -0.06   0.16
Cluster 3   5                 1.19    0.51
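The bottom-up grouping described above can be illustrated with a minimal Python sketch of Ward’s agglomerative criterion applied to one-dimensional scores. It is not the authors’ implementation (the dendrogram in Figure 2.7 appears to come from the agnes function in R); the function name ward_clusters and the nine scores are hypothetical and are not the study’s data.

```python
from itertools import combinations


def ward_clusters(scores, k):
    """Group 1-D dimension scores into k clusters with Ward's criterion.

    Every text starts in its own cluster; at each step the two clusters
    whose merger least increases the within-cluster sum of squares are joined.
    Returns clusters as sorted lists of indices into `scores`, ordered by mean.
    """
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of texts")
    clusters = [[i] for i in range(len(scores))]

    def mean(cluster):
        return sum(scores[i] for i in cluster) / len(cluster)

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged.
        na, nb = len(a), len(b)
        return na * nb / (na + nb) * (mean(a) - mean(b)) ** 2

    while len(clusters) > k:
        a, b = min(combinations(clusters, 2), key=lambda pair: merge_cost(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))
    return sorted(clusters, key=mean)


# Hypothetical dimension scores for nine essays (illustrative values only).
toy_scores = [-1.5, -1.1, -0.9, -0.1, 0.0, 0.1, 0.9, 1.3, 1.9]
print(ward_clusters(toy_scores, 3))  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```

Because no situational coding enters the procedure, the resulting groups still have to be interpreted qualitatively, as the chapter does for the three clusters.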
The cluster analysis yielded three clusters (dendrogram in Figure 2.7). Table 2.4
and Figure 2.8 show that Cluster 1 includes highly descriptive essays found
entirely on the negative end; in contrast, Cluster 3 includes essays scoring high on
the positive “abstract” end of the dimension, while Cluster 2 occupies an intermediate position. The differences between the clusters are statistically significant
(F[2, 16] = 51.61, p < .05, R² = 0.86), with all three clusters significantly different
from each of the others (p < .05, d > 0.8).
Figure 2.8 Clusters identified in Ander Monson’s essays.
Qualitative analysis of the essays within the clusters suggests that the quantitative differences between them reflect varying degrees of reliance on concrete
illustration in Monson’s essays. The reflections in Cluster 1 seem unique in that
they use concrete descriptions of objects, places, or events to convey ideas central
to the essay without naming the concepts or problems being analyzed or using
features of reasoning. The reflection often revolves around an object or a place
which seems to have triggered it. Features of this object, place, or event are constantly brought into focus, and thought presentation is structured around them. In
some essays, these descriptions (italicized) are used to frame the central message,
as in Text 10:
Text 10, Cluster 1, Forecast (-1.56)
It is not about the leather jackets or the letter jackets or the sun that gleams
off the hood of his Trans Am as it arrives in a cloud of dust that looks like the
beginnings of a blizzard. It’s not about the shoplifted jewelry or the supply
of illicit alcohol although all of these are factors. . . . It is the spray-painted
slogans on the overpasses-not just the message but the branding and the fact
of them. All this is why Heidi ends up reclined in the passenger side of the
high school burnout dropout counselor’s-nightmares car.
Essays in Cluster 2 are distinctive in that the analyzed concepts are explicitly
named, and as they are explained, the essay provides concrete illustrations. Text
11 is an excerpt from an essay equating essay writing with hacking.
While hacking is first discussed in terms of abstract processes and concepts such
as “exploration,” “opening up,” “problem solving,” and “magic,” accompanied by
features of reasoning such as verb complement clauses and adverbs (bolded), the
text shifts to concreteness (italicized), as an example is provided:
Text 11, Cluster 2, Essay Hack (-0.04)
Hacking is at heart a creative activity. It is first, simply, an exploration,
an opening up, of a system. A kind of problem solving. . . . More loosely,
a hack is an ingenious use of technology to accomplish something that is
otherwise impossible to accomplish. It is a bridge from one land mass to
another over deep water. It appears, like any sufficiently advanced technology, as a kind of magic. . . . For instance, a famous hardware hack, the
red box, repurposes a Radio Shack autodialer (a portable, pre-cell-phone
device that could store and automatically dial numbers) via some soldering
to mimic the tone (technically a series of four tones) that indicates to a pay
phone that a quarter has been deposited.
Cluster 3 abounds in abstract notions and analysis: essays present an explanation or the narrator’s thought process in an attempt to arrive at an understanding of
some complex phenomenon. Text 12 illustrates thought presentation and reasoning through complement clauses controlled by mental verbs, noun complement
clauses, and stance adverbials (bolded), but abstract concepts are not exemplified
through an account of a specific case or event:
Text 12, Cluster 3, On Selah Straterstrom (1.94)
Just as I don’t believe in the religion behind the ritual but I love the ritual,
I don’t really believe in the practice of augury but I do love the idea of
it, that by chance (my personal preference for prognostication . . .) some
design presents itself to me. I recognize that what I’m probably doing is
allowing chance to access some internally stored information or route, but
either way I like the feeling of giving up control.
2.4 Conclusion: Implications for Style Research, Limitations, Future Directions
In this chapter, we have shown that stylistic variation across an author’s works
may have a functional basis, specifically reflecting the communicative intent of
the text. This is the case for Lopate and Shields, whose styles vary with regard
to the degree of interactive or informational focus and present or past orientation
of the text, respectively. Lopate’s essays found on the “interactive” end fulfill
such purposes as to describe a person or an encounter and address students’ or
readers’ questions, thus explaining the inclination toward interactivity, while his
heavily informational essays aim to narrate and reflect, speculate on a problem,
or review an artistic work, justifying the increased use of information packaging
devices. The purposes of Shields’s essays, which warrant present-orientedness
and immediacy, are reviews and analyses/reflections, while essays whose goal is
to share a personal story are set in the past. Shields’s argumentative essays occupy
an intermediate position on the dimension due to their reliance on evidentiality,
with the present tense serving to state the claim and the past providing evidence.
Linguistic variation has a different basis in the writing of Ander Monson, whose
essays, all reflective in nature, did not reveal variation in communicative purpose. However, the observed differences between the three clusters, identified in
a bottom-up way, also suggest a functional basis and reflect different approaches
to communicating ideas: by creating a highly specific description in itself sufficient to convey a bigger message, by explicitly stating the concepts discussed
and accompanying them with illustration, or by presenting thoughts surrounding abstract notions without the support of concrete exemplification. It is worth
noting that whether these distinctions result from an intentional or subconscious
choice (the latter often viewed as a defining characteristic of style, e.g., Argamon
and Levitan 2005) is a separate consideration, not addressed in this study. It is
demonstrated, however, that regardless of its nature, within-author variation contains analyzable patterns that lend themselves to a functional interpretation.
An important implication of this line of work for style research lies in the need
to acknowledge the effect of situational considerations, such as the communicative intent of the text, on an author’s linguistic choices. This acknowledgment will
result in a finer-grained approach to style as individual language use in response
to a specific communicative need rather than arbitrary choices made in isolation
from a social reality. Awareness of these nuanced situational distinctions, particularly the communicative purpose of a text, within a single author’s body of work
seems important in literary translation, which aims at preservation of the original
authorial style and a reflection of style shifts of the original in the translated work
(e.g., Chesterman 2007). The situational perspective and the idea that authorial
preferences have functional underpinnings reflected in specific linguistic choices
provide systematic guidance in the task of achieving stylistic similarity to the
original, not offered by the view of stylistic choices as arbitrary or purely idiosyncratic. The functional approach is also highly informative for writing practice, as
it illustrates that certain stylistic effects can be achieved through strategic use of
linguistic devices, which is not a common consideration in the teaching of writing
(Bryant 2016).
One limitation of the present study is its broad-strokes approach to the operationalization of purpose, as it identifies one overarching communicative purpose
of each essay. Recent research indicates that texts are not monolithic units and
points to the existence of a far greater granularity in textual units than what is
marked by existing boundaries, such as beginning and end of an essay. Egbert
and Schnur (2018, 162–3), for example, define a text as a recognizably self-contained and functional language unit, suggesting that larger texts can be subdivided into more granular ones that are also self-contained and functional. These
smaller textual units often reflect shifts in purpose within a longer unit of discourse. Alternative, more granular textual units have been explored in studies of
the IMRD structure of research articles (Biber and Finegan 1994), discourse units
in conversation (Biber et al. 2021), or narration and speech within fiction (Egbert
and Mahlberg 2020). Building on that research, Egbert and Gracheva (Forthcoming) observe additional linguistic variation associated with increased textual and
situational granularity within the registers of fiction, presidential memoirs, political speeches, and introductory textbooks. Modern literary essays are a clear example of a highly varied register, as follows from a lack of agreement on a definition
of an essay as well as the results of empirical studies, such as the ones used as the
basis for this investigation and reported in this chapter. Thus, it is almost certain
that a single essay features multiple communicative purposes, prioritized by the
author to different extents. To account for within-author linguistic variability on
this fine-grained level, future studies may undertake text segmentation based on
purpose shifts or perform continuous coding for purpose, pioneered by Biber et al.
(2020) in their study of web registers. Finally, accounting for audience, topic, or
other specific situational factors can substantially enhance our understanding of
individual language use.
References
Argamon, Shlomo, and Shlomo Levitan. 2005. “Measuring the Usefulness of Function
Words for Authorship Attribution.” Paper presented at ACH/ALLC Conference, Victoria, Canada, June. https://doi.org/10.1.1.71.6935&rep=rep1&type=pdf.
Biber, Douglas. 1988. Variation across Speech and Writing. Cambridge: Cambridge University Press.
Biber, Douglas, and Susan Conrad. 2019. Register, Genre, and Style. Cambridge: Cambridge University Press.
Biber, Douglas, Jesse Egbert, and Daniel Keller. 2020. “Reconceptualizing Register in
a Continuous Situational Space.” Corpus Linguistics and Linguistic Theory 16, no. 3:
581–616.
Biber, Douglas, Jesse Egbert, Daniel Keller, and Stacey Wizner. 2021. “Towards a Taxonomy of Conversational Discourse Types: An Empirical Corpus-based Analysis.” Journal
of Pragmatics 171: 20–35.
Biber, Douglas, and Edward Finegan. 1994. “Multi-dimensional Analyses of Authors’
Styles: Some Case Studies from the Eighteenth Century.” Research in Humanities Computing 3: 3–17.
Bryant, Stacy. 2016. “Teaching Authorial Style and Literary Technique: Exemplo XI of El
Conde Lucanor.” Hispania 99, no. 2: 234–45.
Chesterman, Andrew. 2007. “Similarity Analysis and the Translation Profile.” Belgian
Journal of Linguistics 21, no. 1: 53–66.
Egbert, Jesse. 2012. “Style in Nineteenth Century Fiction. A Multidimensional Analysis.”
Scientific Study of Literature 2, no. 2: 167–98.
Egbert, Jesse, and Marianna Gracheva. Forthcoming. “Linguistic Variation within Registers.
Granularity in Textual Units and Situational Parameters.” Corpus Linguistics and Linguistic Theory. https://www.degruyter.com/document/doi/10.1515/cllt-2022-0034/html
Egbert, Jesse, and Michaela Mahlberg. 2020. “Fiction – One Register or Two?” Register
Studies 2, no. 1: 72–101.
Egbert, Jesse, and Erin Schnur. 2018. “The Role of the Text in Corpus and Discourse
Analysis: Missing the Trees for the Forest.” In Corpus Approaches to Discourse, edited
by Charlotte Taylor and Anna Marchi, 159–73. New York: Routledge.
Gracheva, Marianna. 2022. “Style of Creative Nonfiction: A Multidimensional Analysis
of Literary Essays.” Scientific Study of Literature 12, no. 1. https://doi.org/10.1075/ssol.22002.gra.
Gutkind, Lee. 2007. The Best Creative Nonfiction. New York: W. W. Norton.
Hartsock, John. 2016. Literary Journalism and the Aesthetics of Experience. Amherst,
MA: University of Massachusetts Press.
Hoover, David. 2007. “Corpus Stylistics, Stylometry, and the Styles of Henry James.” Style
41, no. 2: 174–203.
Langer, Susanne. 1953. Feeling and Form. New York: Charles Scribner’s Sons.
Moss, Lesley. 2014. “Corpus Stylistics and Henry James’s Syntax.” PhD diss., University
College London.
Silverman, Sue. 2008. “The Meandering River: An Overview of the Subgenres of Creative Nonfiction.” Association of Writers and Writing Programs. Last modified September 2008. www.awpwriter.org/magazine_media/writers_chronicle_view/2507/the_meandering_river_an_overview_of_the_subgenres_of_creative_nonfiction.
Appendix
Dimensions of Variation: Features and Factor Loadings
Dimension 1: Interactive vs. Informational Style
Positive Features: “That” deletion + private verbs (0.63), private verbs (0.62), pronoun IT (0.57), conditional clauses (0.56), 2nd person (0.56), 1st person (0.55), indefinite pronouns (0.49), demonstrative pronouns (0.44), “that” deletion + public verbs (0.44), emphatics (0.41), hedges (0.36), proverb DO (0.34), wh-questions (0.34), wh-clauses + private verbs (0.33), be as main verb (0.32)
Negative Features: Attributive adjectives (-0.76), prepositional phrases (-0.62), nominalizations (-0.61), by-passive (-0.40)
Dimension 2: Abstract Expository vs. Concrete Descriptive Style
Positive Features: Verb complements + private verbs (0.55), adverbs (0.49), noun complements (0.47), causative clauses (0.42), abstract nouns (0.39), verb complements + public verbs (0.37), suasive verbs (0.31)
Negative Features: Concrete nouns (-0.76), present participial (-0.37), past participial (-0.33)
Dimension 3: Immediate vs. Removed Style
Positive Features: Present tense (0.58)
Negative Features: Past tense (-0.87), perfect aspect (-0.55), public verbs (-0.54), 3rd person (-0.38)
3 Charles Dickens’s Influence on Benito Pérez Galdós Revisited
A Corpus-Stylistic Approach1
Pablo Ruano San Segundo
3.1 Introduction
In this chapter, we compare Charles Dickens’s and Benito Pérez Galdós’s style to
investigate the alleged influence of the former on the latter. Benito Pérez Galdós is
a well-known nineteenth-century Spanish novelist whose craftsmanship has been
frequently compared to Dickens’s (see Section 3.2). In this chapter, we scrutinize
this influence from a corpus-stylistic point of view. To do so, we have developed
an annotation system of Galdós’s novels to identify suspensions. A suspension
(also known as suspended quotation) is a “protracted interruption by the narrator of a character’s speech” of at least five words (Lambert 1981, 6). Suspensions are
characteristic of Dickens’s style (Newsom 2000, 556). As shown in example 1,
suspensions are projecting clauses with which narrators introduce stretches of
direct speech. A suspension can have several functions, such as organizing discourse, offering character information, or creating specific literary effects, such as
an impression of simultaneity between the words of a character and their actions.
In example 1, for instance, the suspension contributes to the effect of synchronicity between Mr. Gradgrind’s words and his body language (pondering with his
hands in his pockets, and his cavernous eyes on the fire).
(1) “Whether,” said Gradgrind, pondering with his hands in his pockets, and
his cavernous eyes on the fire, “whether any instructor or servant can have
suggested anything? Whether Louisa or Thomas can have been reading anything? Whether, in spite of all precautions, any idle story-book can have got
into the house? Because, in minds that have been practically formed by rule
and line, from the cradle upwards, this is so curious, so incomprehensible.”
(Hard Times, Chapter 4)2
Thanks to the advances in corpus stylistics, suspensions have been systematically analyzed in Charles Dickens’s novels (Mahlberg and Smith 2012; Mahlberg
et al. 2013, 2016, among others). In this chapter, we use a similar methodology to
compare Dickens’s use of suspensions to that of Galdós in Fortunata and Jacinta,
the novel for which the Spanish novelist is best known. The aim of the chapter is to
discuss patterns in form and function hitherto unremarked in literary appreciations
of Galdós’s style that show how the Spanish novelist may have incorporated a
Dickensian device into his style to achieve effects similar to those conveyed by
Dickens. In doing so, we hope the chapter will also help illustrate the
potential of corpus stylistics in the analysis of literary texts in Spanish, for which the
number of studies using computer-assisted methodologies is still small.
DOI: 10.4324/9781003298328-4
The chapter is organized as follows. First, we provide a brief overview of the
alleged influence of Charles Dickens on Benito Pérez Galdós (Section 3.2). Then,
the annotation system used to identify suspensions in Fortunata and Jacinta is
explained, and the results obtained are shown (Section 3.3). These results are
analyzed in Section 3.4, which is divided into two subsections. In Section 3.4.1,
we discuss the similarities in form and function of suspensions between Dickens
and Galdós. In Section 3.4.2 we focus on specific textual functions of the suspensions identified in Galdós’s novel that are similar to those discussed in Dickens
in previous studies. The chapter concludes with some remarks on the potential
of computer-assisted methods that combine quantitative and qualitative analyses,
from which the study of literary texts written in Spanish can benefit greatly.
3.2 The Influence of Charles Dickens on Pérez Galdós
The influence that great writers – such as Cervantes (Goldman 1971; Benítez
1990), Balzac (Lacosta 1968; Ollero 1973), writers from the Russian schools
such as Tolstoy and Dostoevsky (Gilman 1981; Ley 1977, 294–95), and of course
Dickens (McGovern 2000; Tambling 2013) – exerted on Galdós is well known.
This was admitted by the Spanish novelist himself. In his autobiographical Memorias de un desmemoriado, for example, Galdós states:
I regarded Charles Dickens as my most beloved master. In my literary apprenticeship, still in my conceited youthfulness and having barely devoured
Balzac’s The Human Comedy, I zealously set myself to reading Dickens’s
vast oeuvre.
(Pérez Galdós 1980, 1693) (our translation)3
Galdós’s translation of Dickens’s Pickwick Papers for the Spanish newspaper La
Nación is situated precisely in this literary apprenticeship. Not only did Galdós
pay tribute to Dickens in this translation, but he also absorbed the Victorian
author’s style (Wright 1979, 24).4 He translated Pickwick Papers in 1868, when
he was barely 24 years old, and while he was fully engaged in the writing of his
first novel, La Fontana de Oro, published a couple of years later. This is probably the reason this novel has plenty of Dickensian echoes, such as the detailed
and exaggerated descriptions of characters’ physical appearance or the narrator’s
visible stance and his animosity toward oppressors (Nieto Caballero 2019a, 323).
These would be the first of the many Dickensian parallelisms found in Galdós’s
works, which have been widely discussed as part of the alleged influence of Dickens on Galdós. The presence of Dickensian reminiscences in Galdós’s novels has
been the subject of numerous studies, in which it is not difficult to find examples
that refer to Fortunata and Jacinta, the text under analysis here. For instance, the
use of quia and en toda la extensión de la palabra by Fortunata and Doña Lupe,
respectively, is frequently cited as a prototypical example of Dickens’s well-known use of catchphrases, with which he singles out his characters’ speech.
We can also mention examples related to specific Dickensian situations and characters that seem to be transported into Galdós’s fictional universe. The chapter
“Una visita al Cuarto Estado” in Fortunata and Jacinta, for instance, has been
frequently likened to the search for an orphan that takes place in Our Mutual
Friend (Gilman 1981, 218–19). As for parallelisms in characters, José Izquierdo
strongly resembles Mr. Casby from Little Dorrit, while Doña Guillermina seems to be inspired by
Mrs. Jellyby from Bleak House (Gilman 1981, 271).
All these echoes in Fortunata and Jacinta are just a paradigmatic example of
the numerous traces of the English author in Galdós’s oeuvre, on which the influence pointed out by critics is based. Without a doubt, they illustrate the influence
of Dickens on Galdós. However, it should be noted that this influence is mostly
based on a compendium of references rather than on textual analyses of both
authors. In other words, although the influence of Dickens on Galdós is indisputable, no studies have systematically investigated such
influence. On the contrary, scholarship has gauged the Dickensian echo in
Galdós on the basis of a collection of novelistic reminiscences, such as characters’
use of catchphrases or the Dickensian situations and characters that we have just
pointed out. Needless to say, this should not be understood as a criticism of the
(invaluable) work carried out by literary critics, which has contributed to a better
understanding of the parallelisms between Dickens and Galdós in general and to
the influence of the former on the latter in particular. In fact, scholars have admitted this problem. Tambling (2013, 191) recognizes that the work carried out is
“speculative,” whereas McGovern (2000, 1) also admits that due to the “immensity” of the literary production of both authors, it is difficult to carry out analyses
of intertextuality. The fact that the influence of Dickens on Galdós’s style has
commonly been accepted on the basis of a compilation of Dickensian references,
such as the ones detailed above, not only justifies but also requires systematic textual approaches to the style of both authors. This would make it possible to gauge
whether and to what extent Dickens’s style influenced Galdós’s beyond the novelistic reminiscences referred to in traditional literary criticism. Corpus stylistics
can be helpful in this regard. Thanks to computer-assisted methodologies, systematic textual analyses of the literary production of both authors are possible. Nieto
Caballero (2019a), for instance, has analyzed clusters containing body part nouns
in Dickens’s and Galdós’s novels and demonstrated how both authors make use
of similar body language constructions that contribute to characterization. In this
chapter, we focus on suspensions, a characteristic feature of Dickens’s style that,
thanks to the annotation system explained next, can be systematically scrutinized
in Galdós’s craftsmanship. As will be shown in Section 3.4, there are patterns in
form and function in Galdós’s use of this unit in Fortunata and Jacinta that suggest that Dickens’s influence on his novels is more profound than usually thought
– or at least demonstrated.
3.3 Methodology and Results
To carry out our analysis of suspensions in Fortunata and Jacinta, we have annotated the novel following the annotation system of Dickens’s novels explained in
Mahlberg et al. (2016). The annotation of Fortunata and Jacinta is part of the annotation of a corpus of Galdós’s novels (c. 6.4 million words) that is being carried out
as part of a larger corpus-stylistic project currently underway. As in the case of Dickens’s novels, this annotation distinguishes between several textual subsets of the
novel. The main distinction is that between characters’ speech (and thoughts) (also
known as “quotes”) on the one hand and narration (also known as “non-quotes”)
on the other. Suspensions are a special type of non-quote. Suspensions, italicized in
example 2, can be short or long. Short suspensions have a length of up to four words
(dijo Villalonga), whereas long suspensions have a length of five or more words
(le dijo en secreto Guillermina, deteniéndola, y ambas se miraban con picardía).5
(2) Jacinta pasó al salón, más que por enterarse de las noticias, por ver a su
marido que aquel día no había comido en casa.
“Oye” – le dijo en secreto Guillermina, deteniéndola, y ambas se miraban
con picardía – “con veinte duros que le sonsaques hay bastante.”
“En Bolsa no se supo nada. Yo lo supe en el Bolsín a las diez” – dijo Villalonga –. “Fui al Casino a llevar la noticia. Cuando volví al Bolsín, se estaba
haciendo el consolidado a 20.”
(Fortunata and Jacinta, Part 1, Chapter 6)
To create the annotated version of Fortunata and Jacinta, we have used a plain
text file of the novel and converted it to an XML file with the help of a set of
Python scripts. Specifically, we have used XML elements (<element> </element>)
to annotate paragraphs and sentences on the one hand and empty elements, also
known as milestones (<milestone/>), to annotate examples of characters’ speech
and suspensions, on the other. The elements that form the nested hierarchy of paragraphs and sentences contain the text between an opening element and a closing
element (<p> and </p> in the case of paragraphs, for example), while the empty
elements that annotate characters’ speech and suspensions contain their own place
marker to indicate the start or end of the annotated phenomenon (<qs/> and <qe/>
to indicate the start and end of a character’s speech, for example). In Table 3.1 we
show the tags that we have used, which are similar to the ones used in the annotation of Dickens’s novels by Mahlberg et al. (2016).
To annotate suspensions in Fortunata and Jacinta, we first annotated characters’
speech (quotes) with <qs/> (“start of quoted text”) and <qe/> tags (“end of quoted
text”). Then, following Lambert’s (1981, 6) definition of suspensions as an interruption of a character’s speech of at least five words, we marked up any text of five or
more words that occurs between a <qe/> tag and a <qs/> tag with the tags <sls/> and
<sle/> to annotate long suspensions, and any text of four or fewer words that occurs
between a <qe/> tag and a <qs/> tag with the tags <sss/> and <sse/> to annotate short
suspensions. In Figure 3.1 we show example 2 with the annotation that we have used.
Table 3.1 Annotation Tags Used to Annotate Fortunata and Jacinta
Tag      Meaning
<p>      Paragraph (start)
</p>     Paragraph (end)
<s>      Sentence (start)
</s>     Sentence (end)
<qs/>    Quote (start)
<qe/>    Quote (end)
<sls/>   Suspension (long) (start)
<sle/>   Suspension (long) (end)
<sss/>   Suspension (short) (start)
<sse/>   Suspension (short) (end)
Figure 3.1 Example 2 with annotation.
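The two annotation steps described above (marking quotes first, then classifying the narration between a quote end and the next quote start by word count) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the scripts used for the chapter: the function name annotate_paragraph is hypothetical, quotes are assumed to be delimited by typographic quotation marks, words are counted as runs of word characters, and sentence tags are omitted.

```python
import re
from xml.sax.saxutils import escape

QUOTE = re.compile(r"“[^”]*”")  # assumption: speech is enclosed in “ ”
BETWEEN = re.compile(r"<qe/>(.*?)<qs/>", re.DOTALL)
LONG_MIN_WORDS = 5  # five-word threshold from Lambert (1981, 6)


def annotate_paragraph(paragraph):
    """Add quote and suspension milestones to one plain-text paragraph."""
    # Step 1: put <qs/> and <qe/> milestones around every quoted stretch.
    marked = QUOTE.sub(lambda m: f"<qs/>{m.group(0)}<qe/>", escape(paragraph))

    # Step 2: wrap narration found between a quote end and the next quote start.
    def wrap(match):
        narration = match.group(1)
        n_words = len(re.findall(r"\w+", narration))
        if n_words == 0:
            return match.group(0)  # nothing but spaces or punctuation
        if n_words >= LONG_MIN_WORDS:
            start, end = "<sls/>", "<sle/>"
        else:
            start, end = "<sss/>", "<sse/>"
        return f"<qe/>{start}{narration}{end}<qs/>"

    return f"<p>{BETWEEN.sub(wrap, marked)}</p>"


line = ("“Oye” – le dijo en secreto Guillermina, deteniéndola, y ambas se "
        "miraban con picardía – “con veinte duros que le sonsaques hay bastante.”")
print(annotate_paragraph(line))
```

Working paragraph by paragraph keeps narration that separates quotes in different paragraphs from being tagged as a suspension; whether the original scripts draw the boundary in the same way is not stated in the chapter.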
In this chapter, we focus on long suspensions, which are more likely to contribute to meaningful lexicogrammatical patterns in narrative fiction (Mahlberg
and Smith 2010). In the particular case of Dickens’s fiction, several analyses have
demonstrated how long suspensions are a potentially useful place to check a text
for character information, especially in the form of descriptions of body language
(Mahlberg et al. 2016, 445), to look into patterns of characterization (Stockwell
and Mahlberg 2015), and to find information related to characters’ psychological dimension (Ruano San Segundo 2018, 340). Long suspensions have also been shown to work as a device that conveys specific literary effects in the act of reading, such as the impression of simultaneity
between speech and body language (Mahlberg et al. 2013) or the retrospective
narration of pauses (Mahlberg and Smith 2012, 61).
In our case, the annotation of Fortunata and Jacinta has made it possible to
identify 687 long suspensions in Galdós’s novel, with which we will be able to
compare Dickens’s and Galdós’s use of this element and investigate the alleged
influence of the former on the latter from an innovative perspective. In Figure 3.2,
we show a screenshot with 50 of the 687 suspensions, arranged in alphabetical
order. The fact that a group of specific suspensions can be viewed together in the
form of a concordance makes it possible to read and analyze them vertically (Tognini-Bonelli 2001, 3). Thanks to this vertical reading, a range of co-occurrence
patterns of words can be investigated, which can be meaningful for the literary
appreciation of the novel and, in the particular case of this chapter, for comparing
Galdós’s and Dickens’s use of suspensions from a stylistic point of view.
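The kind of alphabetically ordered listing that supports vertical reading can be sketched in a few lines of Python. This is a hedged illustration rather than the tool used for Figure 3.2: the function name long_suspensions is hypothetical, and the input is assumed to be text already annotated with the <sls/> and <sle/> tags from Table 3.1.

```python
import re
from typing import List

# Everything between the start and end tags of a long suspension.
LONG_SUSPENSION = re.compile(r"<sls/>(.*?)<sle/>", re.DOTALL)


def long_suspensions(annotated_text: str) -> List[str]:
    """Return all long suspensions of a text in alphabetical order."""
    found = []
    for raw in LONG_SUSPENSION.findall(annotated_text):
        # Normalize whitespace and drop the dashes and punctuation
        # that frame the suspension in the printed text.
        cleaned = " ".join(raw.split()).strip(" –-,;.")
        found.append(cleaned)
    # casefold() gives a case-insensitive sort; it does not apply
    # Spanish collation rules for accented letters.
    return sorted(found, key=str.casefold)


# Examples 4 and 5, with tags added by hand for this illustration.
sample = (
    "<qs/>“Dame un abrazo”<qe/><sls/> – le dijo Maxi arrojándose a ella "
    "medio vestido –. <sle/><qs/>“Así te quiero.”<qe/> "
    "<qs/>“¡Vacía, enteramente vacía!”<qe/><sls/> – exclamó esta "
    "levantándola en alto y mirándola al trasluz –. <sle/>"
    "<qs/>“Y estaba casi llena, pues apenas.”<qe/>"
)
for line in long_suspensions(sample):
    print(line)
```

Listing the suspensions one per line, sorted by their first word, brings together those that open with the same reporting verb (dijo, exclamó, and so on), which is what makes recurrent patterns visible at a glance.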
To compare Galdós’s and Dickens’s use of suspensions, we have also benefited
from the CLiC tool (Mahlberg et al. 2016), in which all the suspensions from
Dickens’s 15 novels can be visualized. In Figure 3.3 we show a screenshot of 20
suspensions in Oliver Twist. As can be observed on the right side of the screenshot,
the CLiC tool contains search options that make it possible to focus on stretches
of text within suspensions. Such search options have opened up novel ways of
using concordances to link lexicogrammatical and textual patterns (Mahlberg
et al. 2016, 433). In this chapter, we have searched for patterns in suspensions
in Fortunata and Jacinta, comparing our results in Galdós’s novel to
those in Dickens’s novels that can be visualized in the CLiC tool.
Figure 3.2 Screenshot of 50 suspensions in Fortunata and Jacinta.
Figure 3.3 Screenshot of CLiC tool with 20 suspensions from Oliver Twist.
3.4 Analysis
3.4.1 Form and Function of Suspensions
The clearest similarity between Dickens
and Galdós in the use of suspensions lies in the functional pattern that dominates suspensions in general. As Mahlberg et al. (2013, 40) state,
by interrupting a character’s speech, suspensions can create an impression of
simultaneity between the speech and the contextual information described by
the narrator, which in turn can suggest similarities to the simultaneous occurrence of speech and body language in real life.
This is the function par excellence of suspensions in Dickens’s novels, which is
enacted not only by the interruption of the character’s speech but also by the formal pattern found in the suspension: in addition to the reporting verb and the name
of the character whose speech is being reported, suspensions frequently contain
an -ing clause to describe the body language. Let us take example 3. As can
be observed, Mrs. Sowerberry is taking up a dim and dirty lamp and leading the way
upstairs as she speaks. This impression of synchronicity between her words and
her body language is conveyed not only by the use of the -ing clauses but also by
placing these clauses so that they interrupt the character’s speech.
(3) “Then come with me,” said Mrs. Sowerberry: taking up a dim and dirty
lamp, and leading the way upstairs; “your bed’s under the counter. You don’t
mind sleeping among the coffins, I suppose? But it doesn’t much matter
whether you do or don’t, for you can’t sleep anywhere else. Come; don’t
keep me here all night!”
(Oliver Twist, chapter 4)
In Fortunata and Jacinta, we can see a similar pattern, as shown in examples 4 and 5. In example 4, for instance, we see how Maxi throws himself at
Fortunata as he asks her to hug him (Dame un abrazo). As in the examples
from Dickens’s novels, the impression of simultaneity is enacted both by the
-ing clause that describes the body language (arrojándose a ella medio vestido) and by suspending Maxi’s speech to describe this body language. It is
only fair to state that this device is frequently used in fictional narratives to
create the effect of synchronicity between speech and movement. As Korte
(1997, 97) points out in her analysis of body language in literary texts, this
impression is frequently created thanks to “[a] the interruption of the character’s speech by a description of the body language, and [b] the syntactical
subordination of the body language to the character’s speech.” However, the
systematicity with which Dickens does that makes it a stylistically marked
choice (Newsom 2000, 556). In the case of Fortunata and Jacinta, we also
find a repeated use of this construction in suspensions. Specifically, it occurs in 407 of
the 687 suspensions identified with the annotation discussed in Section 3.3. In
other words, 59.24% of the suspensions follow the pattern that characterizes
Dickens’s use of suspensions.
(4) “Dame un abrazo” – le dijo Maxi arrojándose a ella medio vestido –. “Así
te quiero. Tú has padecido, tú has pecado . . . luego eres mía.”
(Fortunata and Jacinta, Part 4, Chapter 1)
(5) “¡Vacía, enteramente vacía!” – exclamó esta levantándola en alto y
mirándola al trasluz –. “Y estaba casi llena, pues apenas.”
(Fortunata and Jacinta, Part 2, Chapter 6)
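The formal pattern discussed above, a suspension containing the Spanish counterpart of an -ing clause (a gerund), and the proportion reported for it can be illustrated with a short Python sketch. The chapter does not state how the 407 suspensions were identified, so the regular expression below is only a rough heuristic of our own: it matches words ending in -ando, -iendo, or -yendo, with optional enclitic pronouns, and it will also match some non-gerunds (the stoplist is illustrative and incomplete; proper names such as Fernando would be false positives). The names has_gerund and share are hypothetical.

```python
import re

# Heuristic pattern for Spanish gerunds: -ando / -iendo / -yendo, or the
# accented stems followed by one or two enclitic pronouns
# (arrojándose, mirándola, haciéndolo).
CLITIC = r"(?:me|te|se|nos|os|l[aeo]s?)"
GERUND = re.compile(
    rf"\b\w*(?:(?:a|ie|ye)ndo|(?:á|ié|yé)ndo{CLITIC}{{1,2}})\b",
    re.IGNORECASE,
)
# Frequent words that match the pattern but are not gerunds
# (an illustrative, incomplete list).
NOT_GERUNDS = {"cuando"}


def has_gerund(suspension: str) -> bool:
    """True if the suspension contains at least one likely gerund."""
    return any(
        m.group(0).casefold() not in NOT_GERUNDS
        for m in GERUND.finditer(suspension)
    )


def share(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


print(has_gerund("le dijo Maxi arrojándose a ella medio vestido"))  # True
print(has_gerund("le dijo Segismundo en la puerta"))                # False
print(share(407, 687))                                              # 59.24
```

The last line reproduces the arithmetic behind the figure given in the text: 407 / 687 ≈ 0.5924, that is, 59.24%. Any automatic count obtained with a heuristic like this one would still need manual checking before being compared with that figure.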
In addition to the formal and functional pattern of suspensions, elements outside the suspensions also buttress the Dickensian echo in Galdós’s use of this
element. We can see this both in the first and in the second stretch of direct
speech that surround the suspension. On the one hand, the stretches of direct
speech that precede suspensions in Dickens’s novels frequently contain elements such as vocatives, exclamations, and imperatives, which are separated
from the remainder of the speech by the suspension. In example 3 we have
shown an example that contains an imperative (come with me). In examples 6
and 7, two examples of a vocative (Nicholas) and an exclamation (My good fellow!) are shown.
(6) “Nicholas,” cried Kate, throwing herself on her brother’s shoulder, “do
not say so. My dear brother, you will break my heart. Mama, speak to him.
Do not mind her, Nicholas; she does not mean it, you should know her better.
Uncle, somebody, for Heaven’s sake speak to him.”
(Nicholas Nickleby, Chapter 20)
(7) “My good fellow!” exclaimed Martin, clutching him by both arms, “I have
never seen her since I left my grandfather’s house.”
(Martin Chuzzlewit, Chapter 14)
By separating off vocatives, imperatives, and exclamations from the remainder
of the speech, the first stretch of the text is highlighted, and the effect of simultaneity between the character’s speech and their body language enacted, as all the
examples contain references to their body language in the suspensions. An identical pattern can be observed in Fortunata and Jacinta, in which suspensions are
also frequently preceded by stretches of direct speech that contain these elements.
In example 4, we have shown a suspension preceded by an imperative (Dame un
abrazo), whereas in example 5, the suspension is preceded by an exclamation
(¡Vacía, enteramente vacía!). In examples 8, 9, and 10, we can see other examples
from Fortunata and Jacinta that follow this structural pattern – with examples of
an imperative (Siéntate un ratito), a vocative (Primo) and an exclamation (¡Bah!),
respectively. It is interesting to note that in those cases in which the suspension
is preceded by an exclamation, as in examples 5 and 10, Galdós makes use of the
verb exclamar (exclaim). This is in line with Mahlberg et al.’s (2013, 51–52) finding that Dickens also chooses this verb when he separates an exclamation from
the remainder of the speech with a suspension, as can be observed in example 7.
The fact that Galdós does the same in his novel reinforces the alleged Dickensian
echo discussed here.
(8) “Siéntate un ratito” – dijo Moreno, haciéndolo en el sofá y dando una
palmada en el asiento –. “Más santidad que en oír siete misas, hay en practicar las obras de misericordia, acompañando a los enfermos y dando un ratito
de conversación a quien se ha pasado toda la noche en vela. Dime una cosa.
¿Cómo llevas las obras de tu asilo?”
(Fortunata and Jacinta, part 4, chapter 2)
(9) “Primo” – le dijo el otro mirándole con socarronería – ; “si quieres hijos,
haberlo pensado antes.”
(Fortunata and Jacinta, part 4, chapter 2)
(10) “¡Bah!” – exclamó apartando la vista de su hermano con un movimiento
desdeñoso de la cabeza –. “No quiero oír sermones. Yo sé bien lo que debo
hacer.”
(Fortunata and Jacinta, part 2, chapter 4)
With regard to the remainder of the speech, on the other hand, Lambert (1981,
44) found that Dickens sometimes repeated the stretch of direct speech that
preceded the suspension, as shown in example 11, in which the first stretch of
direct speech (And yet) is repeated immediately after the suspension. This happens only occasionally and mostly when suspensions are unusually lengthy:
the suspension in example 11 is made up of 15 words. It is interesting that Galdós
also does that in Fortunata and Jacinta. Two examples are provided in examples 12 and 13. In both cases, suspensions are lengthy too (19 and 24 words,
respectively), which suggests that Galdós, in a Dickensian manner, might have
repeated the words of the character to remind his readers of what the character was saying before (s)he is interrupted by the narrator. This seems to be
in line with the function that Lambert discusses when he analyzes Dickens’s
use of suspensions. In his view, the suspension “seems to be fundamentally a
sort of aggression” by a jealous author (Lambert 1981, 35). Although these
rather provocative claims by Lambert cannot be tested from a stylistic point
of view, it is clear that the suspension in example 11, as in examples 12 and
13, interrupts a sentence in progress: just after the character has started to
speak the narrator interposes a (lengthy) comment “so that ancillary details
may be given, accompanying circumstances indicated” (Lambert 1981, 51),
after which the words of the character from the first stretch of direct speech
are repeated.
(11) “And yet,” said Ralph, speaking in a very marked manner, and looking
furtively, but fixedly, at Kate, “and yet I would not. I would spare the feelings
of his – of his sister. And his mother of course.”
(Nicholas Nickleby, Chapter 20)
(12) “Ahí tienes” – le dijo doña Lupe moviendo la mano derecha, con dos
dedos de ella muy tiesos, en ademán enteramente episcopal – ; “ahí tienes lo
que pasa por no hacer lo que yo te digo . . . Si hubieras seguido los consejos
que te di este verano, no te verías como te ves.”
(Fortunata and Jacinta, Part 2, Chapter 3)
(13) “De modo” – exclamó Feijoo en voz alta, abriendo los brazos y tomando
un tono que no se podría decir si era de indignación o de burla –, “de modo
que ya no hay patriotismo.”
(Fortunata and Jacinta, Part 3, Chapter 1)
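The two observations made about examples 11, 12, and 13 (the length of the suspension and the repetition of the opening words after it) lend themselves to a simple check, sketched here in Python. The helper names words, suspension_length, and repeats_opening are hypothetical, and the three stretches of each example are assumed to have been separated beforehand.

```python
import re
from typing import List

WORD = re.compile(r"\w+")


def words(text: str) -> List[str]:
    """Lower-cased word tokens, ignoring punctuation and dashes."""
    return [w.casefold() for w in WORD.findall(text)]


def suspension_length(suspension: str) -> int:
    """Number of words in the narrator's interruption."""
    return len(words(suspension))


def repeats_opening(before: str, after: str) -> bool:
    """True if the speech after the suspension restarts with the
    same words that were spoken before it."""
    first = words(before)
    return bool(first) and words(after)[: len(first)] == first


# (speech before, suspension, speech after) for examples 11, 12 and 13.
examples = {
    11: ("And yet,",
         "said Ralph, speaking in a very marked manner, and looking "
         "furtively, but fixedly, at Kate,",
         "and yet I would not."),
    12: ("Ahí tienes",
         "le dijo doña Lupe moviendo la mano derecha, con dos dedos de "
         "ella muy tiesos, en ademán enteramente episcopal",
         "ahí tienes lo que pasa por no hacer lo que yo te digo"),
    13: ("De modo",
         "exclamó Feijoo en voz alta, abriendo los brazos y tomando un "
         "tono que no se podría decir si era de indignación o de burla",
         "de modo que ya no hay patriotismo."),
}
for number, (before, suspension, after) in examples.items():
    print(number, suspension_length(suspension), repeats_opening(before, after))
```

Applied to the three examples, the sketch reports 15, 19, and 24 words respectively and detects the repetition in each case, in line with the counts given in the text.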
Whether and to what extent this repetition is caused by a jealous author in
Dickens’s novels is open to debate. However, what cannot be denied is that this
formal feature is found in Galdós under similar circumstances (there seems to
be a relationship between repeating the words of the character and the length of
the suspension). This, together with the parallelisms in the first stretch of direct
speech that precedes the suspension and the functional pattern of the suspensions
discussed before, unveils a parallel use that suggests a Dickensian echo hitherto
unremarked in literary appreciations of the influence of Dickens on the style
of Galdós.
3.4.2 Local Textual Functions
The parallelism in Dickens’s and Galdós’s use of suspensions is further reinforced
by local textual functions identified in Fortunata and Jacinta. Local textual functions (Mahlberg 2005, 2007, 2009) “describe the patterns of a (set of) lexical
item(s) in a specific (set of) text(s)” (Mahlberg et al. 2013, 37). Generally speaking, the concept of local textual function makes it possible to relate lexical patterns to a range of textual properties. In Dickens’s novels, for instance, patterns of
body part nouns that contribute to the creation of characters have been widely discussed (cf. Mahlberg 2013), some of them connected to suspensions. In this section, we focus on two aspects identified in suspensions in Fortunata and Jacinta
that seem to have a Dickensian origin too: the positioning of characters and the
link between movement and characters’ thoughts.
Firstly, the way in which characters stand in relation to spatial references is
an aspect where clear similarities have been detected between Dickens’s and
Galdós’s use of suspensions. It is well-known that Dickens used “words that refer
to parts of buildings and furniture or words that give other spatial information”
as references to explain characters’ positions and movements in the scene (Mahlberg 2013, 134). This is also frequent in Galdós’s fictional narratives. Doors, for
instance, usually act as the reference in which the characters are placed and from
which the scene that is presented to us is described (Nieto Caballero 2019b). When
references to doors are found in suspensions, they seem to be used in a Dickensian
manner. In Dickens’s works, references to doors in suspensions are frequently
used to define characters’ positions (rather than movement). Thus, characters are
frequently described as standing next to a door, as in example 14, or stopping at
doors, as in examples 15 and 16. As can be observed, Mr. Perch’s, Mr. Pecksniff’s,
and Quilp’s positions are defined by the door, which is used as a spatial reference
to define the character’s position. To do so, a prepositional phrase is used (at the
door). These examples show how, in addition to creating an effect of synchronicity between speech and body language, suspensions can also contribute to defining the scene by means of references to narrative space.
(14) “Yes, Sir. Begging your pardon, Sir,” said Mr Perch, hesitating at the
door, “he’s rough, Sir, in appearance.”
(Dombey and Son, Chapter 22)
(15) “I am afraid,” said Mr Pecksniff, pausing at the door, and giving his head
a melancholy roll, “I am afraid that this looks artful. I am afraid, Mrs Lupin,
do you know, that this looks very artful!”
(Martin Chuzzlewit, Chapter 3)
(16) “There she is,” said Quilp, stopping short at the door, and wrinkling up
his eyebrows as he looked towards Miss Sally; “there is the woman I ought to
have married – there is the beautiful Sarah – there is the female who has all
the charms of her sex and none of their weaknesses. Oh Sally, Sally!”
(The Old Curiosity Shop, Chapter 33)
Interestingly, Galdós makes a similar use of en la puerta (at the door) in Fortunata and Jacinta. Thus, this prepositional phrase is used in suspensions to show
where characters stand, as shown in examples 17 and 18, or where they stop, as in
example 19. As in the examples identified in Dickens’s novels, these suspensions
contribute not to creating an impression of simultaneity between speech and
body language but to defining the narrative space of the story world by means of
circumstantial information in the form of a prepositional phrase.
(17) “La culpa la tienes tú” – añadió severamente doña Lupe, en la puerta –,
“porque te pones a jugar con ella, le ríes las gracias, y ya ves. Cuando quieres
que te respete, no puede ser. Es muy mal criada.”
(Fortunata and Jacinta, Part 2, Chapter 1)
(18) “Consolarse” – le dijo Segismundo en la puerta –. “La vida es así; hoy
pena, mañana una alegría. Hay que tener calma, y tomar las cosas como
vienen, y no ligar todo nuestro ser a una sola persona. Cuando una vela se
acaba, debe encenderse otra . . . Conque tengamos valor, y aprendamos a
despreciar . . . Quien no sabe despreciar, no es digno de los goces del amor
. . . Y por último, simpática amiga mía, ya sabe que estoy a sus órdenes, que
tiene en mí el más rendido de los servidores para cuanto se le ocurra, amigo
diligente, reservadísimo, buena persona . . . Abur.”
(Fortunata and Jacinta, Part 4, Chapter 3)
(19) “Amigo” – dijo parándose en la puerta de la botica –. “Su mujer de
usted me ha parecido una mujer defectuosísima. Aunque la he tratado poco
puedo asegurar que tiene buen fondo; pero carece de fuerza moral. Será siempre lo que quieran hacer de ella los que la traten.”
(Fortunata and Jacinta, Part 3, Chapter 4)
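A search for a phrase such as en la puerta that is restricted to suspensions, in the spirit of the CLiC search options mentioned in Section 3.3, can be sketched as follows. This is an assumption-laden illustration and does not reproduce CLiC or the procedure used in this chapter: the function name search_suspensions is hypothetical, and the input is assumed to carry the suspension tags from Table 3.1.

```python
import re
from typing import List, Tuple

# Long (<sls/>…<sle/>) and short (<sss/>…<sse/>) suspensions alike.
SUSPENSION = re.compile(r"<s([ls])s/>(.*?)<s\1e/>", re.DOTALL)


def search_suspensions(annotated_text: str, phrase: str) -> List[Tuple[str, str, str]]:
    """Return (left context, hit, right context) for every occurrence
    of `phrase` inside a suspension; text outside suspensions is ignored."""
    needle = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    hits = []
    for match in SUSPENSION.finditer(annotated_text):
        suspension = " ".join(match.group(2).split())
        for hit in needle.finditer(suspension):
            left = suspension[: hit.start()].strip()
            right = suspension[hit.end():].strip()
            hits.append((left, hit.group(0), right))
    return hits


# Examples 18 and 19 (speech abridged), with tags added by hand.
sample = (
    "<qs/>“Consolarse”<qe/><sls/> – le dijo Segismundo en la puerta –. "
    "<sle/><qs/>“La vida es así”<qe/> "
    "<qs/>“Amigo”<qe/><sls/> – dijo parándose en la puerta de la botica –. "
    "<sle/><qs/>“Su mujer de usted me ha parecido una mujer defectuosísima.”<qe/>"
)
for left, hit, right in search_suspensions(sample, "en la puerta"):
    print(f"{left:>30} | {hit} | {right}")
```

Keeping the left and right context separate makes it easy to see which verb introduces the prepositional phrase (dijo, parándose) and what follows it, which is the information needed to distinguish positions from movements.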
Secondly, references to narrative space are also frequently related to thought
presentation. Because of the ubiquity of direct speech in Dickens’s novels, the
synchronicity conveyed by suspensions is normally between the characters’ body
language and their speech (see Mahlberg and Smith 2012; and Mahlberg et al.
2013). However, as shown in Ruano San Segundo’s (2018, 341) analysis of Dickens’s use of direct thought presentation, the Victorian author also used suspensions to achieve the same effect when reporting his characters’ thoughts, thus
creating an impression that comes close to synchronicity of presentation between
characters’ thoughts and their body language. Two examples are shown in 20 and
21. As can be observed, both suspensions contain references to narrative space
– as he crossed to the door in example 20 and walking on tiptoe to another door
near the bedside in example 21. By placing this information in the suspension,
Dickens creates an effect of simultaneity between the thoughts and actions of
Clennam and Mr. Snodgrass, respectively.
(20) “The house,” thought Clennam, as he crossed to the door, “is as little
changed as my mother’s, and looks almost as gloomy. But the likeness ends
outside. I know its staid repose within. The smell of its jars of old rose-leaves
and lavender seems to come upon me even here.”
(Little Dorrit, Chapter 13)
(21) “Very lucky I had the presence of mind to avoid them,” thought Mr.
Snodgrass with a smile, and walking on tiptoe to another door near the bedside;
“this opens into the same passage, and I can walk quietly and comfortably away.”
(Pickwick Papers, Chapter 54)
This strategy is also observed in Fortunata and Jacinta. Literary critics have
traditionally referred to the close relationship between kinetic references and the
psychological dimension of characters (see Padilla Mangas 2000; Arroyo Díez 2011, 104).
However, this has been mostly discussed in dialogue novels, in which it is frequent
that the narrator projects characters’ thoughts by making use of soliloquies or
thought presentation strategies, such as the interior monologue (Jiménez Gómez
2020, 369). These strategies are combined with the kinetic references mentioned
before, which results in a relation of interdependence between characters’ movements and the representation of their thoughts. This aspect, however, remains
underexplored outside dialogue novels. In Fortunata and Jacinta, we have found
that the relationship between kinetic references and the psychological dimension
of characters is enacted by means of suspensions, as shown in examples 22, 23,
and 24. As in Dickens’s works, Galdós makes use of references to narrative space
– el establecimiento (the establishment), la sala (the room), and la alcoba (the
chamber) – in suspensions. They are part of -ing clauses – penetrando en el establecimiento in example 22, for instance – with which the impression of simultaneity between characters’ thoughts and their movements is further reinforced.
(22) “¡Dátiles! . . . ¡Cuántos le he comprado yo! Las golosinas la venden.
Se despepita por ellas . . .” – pensó el razonador, penetrando en el establecimiento, sin ver nada de lo que en él había –. “Come dátiles . . . luego no está
mala; los dátiles son muy indigestos. Y puesto que ella los come, la causa del
no salir, no es enfermedad . . . Luego, es otra cosa . . .”
(Fortunata and Jacinta, Part 4, Chapter 5)
(23) “Pues lo que es hoy sí que no me quedo con esto dentro del cuerpo” –
pensó mi hombre al otro día, entrando en la sala, hecho un sol de limpio y
despidiendo, como todas las mañanas al salir de su casa, un fuerte olor a
colonia –. “¿Y Dónde está?, ¿qué hace que no sale? Es un encanto esa mujer,
y tengo al tal Santa Cruz por el gaznápiro más grande que come pan . . .
¡Cuánto me hace esperar!”
(Fortunata and Jacinta, Part 3, Chapter 4)
(24) “Pues lo que es mañana temprano” – se dijo volviendo a la alcoba –,
“mañana tempranito, antes de que salga para el obrador, voy y la acogoto . . .”
(Fortunata and Jacinta, Part 4, Chapter 6)
This function of suspensions to connect thought presentation with characters’
movements or the role of doors to define characters’ positions, together with the
effect of synchronicity between characters’ speech and body language and the formal aspects discussed at the beginning of Section 3.4 (both in the suspension and
in the stretches of direct speech before and after the interruption), reveals hitherto unremarked textual parallelisms between Dickens and Galdós that serve to reinforce the influence of the former on the latter. Without a doubt, the identification
of these patterns has only been possible thanks to the annotation of Galdós’s Fortunata and Jacinta, which has made it possible to systematically scrutinize 687
instances of suspensions throughout the novel. This proves the potential of corpus
stylistics and how computer-assisted approaches can unveil meaningful textual
patterns that cannot be detected with more traditional approaches and from which
the analysis of literary texts can benefit greatly.
3.5 Conclusion
The Dickensian element in Benito Pérez Galdós’s craftsmanship is indisputable.
The study of this element in the works of the Spanish novelist, however, has been
built upon a compilation of impressionistic references (based mostly on shared
themes, scenes, and characters) rather than on textual analyses of their works.
This lack of stylistic analyses is partly due to the difficulty of analyzing both
authors’ works systematically, as some critics have admitted (see Section 3.2).
Thanks to the emergence of new computer-assisted disciplines such as corpus
stylistics, new avenues of analysis have been disclosed. By combining methods
and theories from literary stylistics and corpus linguistics alike, corpus stylistics makes it possible to identify meaningful patterns that have traditionally gone
unremarked in critical appreciations of literary texts and whose identification has contributed to
furthering our understanding of the effects that these patterns have on the way in
which readers create meanings from texts.
This is precisely what we have set out to do in this chapter. Specifically, we
have shown how suspensions can be systematically explored and compared in
the works of Charles Dickens and Benito Pérez Galdós thanks to an annotation of
the texts under analysis. A suspension is an interruption of a character’s speech.
Stylistically speaking, the suspension is a textual device for which Dickens is
well-known. The Victorian author makes extensive use of suspensions for
different purposes, such as organizing discourse, offering character information
or creating specific literary effects, such as an impression of simultaneity between
the words of a character and their actions. Thanks to the annotation system that
we have developed, we have looked into some of these Dickensian traits in Fortunata and Jacinta, the novel for which Galdós is best known. As has been shown,
Galdós makes a Dickensian use of suspensions from both a formal and a functional
point of view. From a formal point of view, on the one hand, Galdós frequently
incorporates an -ing clause into the suspension, with which he conveys an impression of simultaneity between the words of the character and their body language.
Besides, like Dickens, he also frequently makes use of vocatives, exclamations,
or imperatives in the stretch of direct speech that precedes the suspension and
also tends to repeat that stretch of direct speech in
the remainder of the speech when suspensions are lengthy, as Dickens sometimes
does in his novels. From a functional point of view, on the other hand, we have
looked into some textual functions identified in Fortunata and Jacinta which
seem to have a Dickensian origin too. Thus, in addition to the frequent effect
of synchronicity between characters’ speech and their body language, we have
also identified more specific functions, such as the positioning of characters next
to doors or the relation of interdependence between characters’ movements and
the representation of their thoughts. The remarkable systematicity of these patterns has unveiled a more subtle similarity than has so far been noticed between
Dickens and Galdós, thus opening new avenues of analysis in the study of the
well-known (yet still underexplored) influence of the former on the latter from a
stylistic point of view.
Notes
1 The research reported on in this chapter has been funded by the Spanish government
(Ayuda del Programa de Recualificación del Sistema Universitario Español. Modalidad
de recualificación del profesorado universitario funcionario o contratado), which we
acknowledge here.
2 All the examples shown in the chapter are taken from e-texts (see Section 3.3). Therefore, we provide the chapter location rather than page references.
3 The Spanish quote reads: “Consideraba yo a Carlos Dickens como mi maestro más
amado. En mi aprendizaje literario, cuando no había salido yo de mi mocedad petulante,
apenas devorada La Comedia humana de Balzac, me apliqué con loco afán a la copiosa
obra de Dickens” (Pérez Galdós 1980, 1693).
4 The translation of Pickwick Papers was preceded by the essay “Carlos Dickens,” in
which Galdós explained to readers of La Nación the most prominent features of the
Victorian author’s style.
5 For further information on the rationale behind the division of the subsets, see Mahlberg
et al. (2016).
References
Arroyo Díez, María Cristina. 2011. “Aspectos Espaciales y Visuales en las Primeras Novelas Contemporáneas Benito Pérez Galdós y su Repercusión en la Novela Española
Actual.” PhD diss., Universidad de Valladolid.
Benítez, Rubén. 1990. Cervantes en Galdós. Murcia: Universidad de Murcia.
Gilman, Stephen. 1981. Galdós and the Art of the European Novel: 1867–1887. Princeton,
NJ: Princeton University Press.
Goldman, Peter. 1971. “Galdós and Cervantes: Two Articles and a Fragment.” Anales Galdosianos 4: 99–106.
Jiménez Gómez, Cristina. 2020. “Galdós y su Narrativa: La Polifonía Textual Como
Mecanismo Configurador de las Voces Ajenas.” Boletín de la Real Academia de Córdoba 169: 361–82.
Korte, Barbara. 1997. Body Language in Literature. Toronto: University of Toronto Press.
Lacosta, Francisco C. 1968. “Galdós y Balzac.” Cuadernos Hispanoamericanos 224–5:
345–74.
Lambert, Michael. 1981. Dickens and the Suspended Quotation. New Haven, CT: Yale
University Press.
Ley, Charles David. 1977. “Galdós Comparado con Balzac y Dickens, Como Novelista
Nacional.” In Actas del primer congreso internacional galdosiano, 291–95. Las Palmas
de Gran Canaria: Cabildo de Gran Canaria.
Mahlberg, Michaela. 2005. English General Nouns: A Corpus Theoretical Approach.
Amsterdam: John Benjamins.
Mahlberg, Michaela. 2007. “Lexical Items in Discourse: Identifying Local Textual Functions of Sustainable Development.” In Text, Discourse and Corpora. Theory and
Analysis, edited by Michael Hoey, Michaela Mahlberg, Michael Stubbs, and Wolfgang
Teubert, 191–218. London: Continuum.
Mahlberg, Michaela. 2009. “Local Textual Functions of Move in Newspaper Story Patterns.” In Exploring the Lexis-Grammar Interface, edited by Ute Römer and Rainer
Schulze, 265–87. Amsterdam: John Benjamins.
Mahlberg, Michaela. 2013. Corpus Stylistics and Dickens’s Fiction. London: Routledge.
Mahlberg, Michaela, and Catherine Smith. 2010. “Corpus Approaches to Prose Fiction:
Civility and Body Language in Pride and Prejudice.” In Language and Style, edited by
Dan McIntyre and Beatrix Busse, 449–67. Basingstoke: Palgrave Macmillan.
Mahlberg, Michaela, and Catherine Smith. 2012. “Dickens, the Suspended Quotation and the Corpus.” Language and Literature 21, no. 1: 51–65. https://doi.org/10.1177/0963947011432058.
Mahlberg, Michaela, Catherine Smith, and Simon Preston. 2013. “Phrases in Literary Contexts: Patterns and Distributions of Suspensions in Dickens’s Novels.” International
Journal of Corpus Linguistics 18, no. 1: 35–56. https://doi.org/10.1075/ijcl.18.1.05mah.
Mahlberg, Michaela, Peter Stockwell, Johan de Joode, Catherine Smith, and Matthew
Brook O’Donnell. 2016. “CLiC Dickens: Novel Uses of Concordances for the Integration of Corpus Stylistics and Cognitive Poetics.” Corpora 11, no. 3: 433–63.
McGovern, Timothy. 2000. Dickens in Galdós. New York: Peter Lang.
Newsom, Robert. 2000. “Style of Dickens.” In The Oxford Reader’s Companion to Charles
Dickens, edited by Paul Schlicke, 553–57. Oxford: Oxford University Press.
Nieto Caballero, Guadalupe. 2019a. “Análisis de la Influencia de Charles Dickens en el
Estilo de Benito Pérez Galdós a Través del Lenguaje Gestual de sus Personajes: Un
Estudio de Corpus.” Dicenda. Estudios de lengua y literatura españolas 37: 321–41.
Nieto Caballero, Guadalupe. 2019b. “El Espacio como Eje Vertebrador en la Creación del
Universo Ficticio Galdosiano: Un Estudio de Corpus.” Signa. Revista de la Asociación
Española de Semiótica 28: 1203–38.
Ollero, Carlos. 1973. “Galdós y Balzac.” In Benito Pérez Galdós. El escritor y la crítica,
edited by Douglass M. Rogers, 185–93. Madrid: Taurus.
Padilla Mangas, Ana María. 2000. “Del Galdós Narrador al Galdós Dramaturgo: Un Acercamiento al Problema de las Didascalias.” In Actas VI Congreso Internacional Galdosiano 1997, edited by Carmen Yolanda Arencibia Santana, María del Prado Escobar
Bonilla, and Rosa María Quintana Domínguez, 782–93. Las Palmas de Gran Canaria:
Cabildo de Gran Canaria.
Pérez Galdós, Benito. 1980. Obras Completas. Novelas. Tomo III. Madrid: Aguilar.
Ruano San Segundo, Pablo. 2018. “A Corpus-based Approach to Charles Dickens’s Use of
Direct Thought Presentation.” Corpora 13, no. 3: 319–45.
Stockwell, Peter, and Michaela Mahlberg. 2015. “Mind-Modelling with Corpus Stylistics
in David Copperfield.” Language and Literature 24, no. 2: 129–47.
Tambling, Jeremy. 2013. “Dickens and Galdós.” In The Reception of Charles Dickens in
Europe, edited by Michael Hollington, 191–96. London: Bloomsbury.
Tognini-Bonelli, Elena. 2001. Corpus Linguistics at Work. Amsterdam: John Benjamins.
Wright, Chad C. 1979. “Artifacts and Effigies: The Porreño Household Revisited.” Anales
Galdosianos 14: 13–24.
4 A Corpus-Stylistic Approach to the Literary Representation of Narrative Space in Ruiz Zafón’s The Cemetery of Forgotten Books Series
Guadalupe Nieto Caballero and Pablo Ruano San Segundo
4.1 Introduction
In this chapter we show the application of corpus linguistic techniques to the
analysis of literary texts written in Spanish. This methodology falls in the realm
of corpus stylistics (McIntyre and Walker 2019), an area of corpus linguistics that
applies “corpus methods to the analysis of literary texts, giving particular emphasis to the relationship between linguistic description and literary appreciation”
(Mahlberg 2014, 378). Although corpus stylistics is a well-established approach
in the analysis of literary texts in general, the use of computer-assisted methodologies has not yet been fully developed in the Spanish-speaking world (Nieto
Caballero and Ruano San Segundo 2020, 19). This chapter sets out to demonstrate
how the analysis of literary texts written in Spanish can also benefit greatly from
quantitative methods to retrieve data that can then be subjected to a qualitative
analysis. To do so, we analyze Carlos Ruiz Zafón’s The Cemetery of Forgotten
Books (El Cementerio de los Libros Olvidados) series. Ruiz Zafón is the most-read Spanish author of the twentieth and twenty-first centuries, whose books have
been translated into more than 50 different languages (Ramos Nogueira 2016).
The Cemetery of Forgotten Books series is made up of four books: La sombra del
viento (The Shadow of the Wind), El juego del ángel (The Angel’s Game), El prisionero del cielo (The Prisoner of Heaven), El laberinto de los espíritus (The Labyrinth of Spirits). La sombra del viento (Ruiz Zafón 2001) is a Gothic mystery that
involves Daniel Sempere’s quest to track down the man responsible for destroying
every book written by author Julián Carax. El juego del ángel (Ruiz Zafón 2008)
is a prequel to La sombra del viento, also set in Barcelona, but during the 1920s
and 1930s. It follows David Martín, a young writer who is approached by a mysterious figure to write a book. The next book in the cycle is El prisionero del cielo
(Ruiz Zafón 2011). It returns to La sombra del viento’s Daniel Sempere and his
journey back to the 1940s to resolve a buried secret. El laberinto de los espíritus
(Ruiz Zafón 2016) is the fourth and final book in the Cemetery of Forgotten Books
series. The novel is set in the Barcelona of the late 1950s and early 1960s. Daniel,
overwhelmed by rage and the need to avenge the death of his mother, Isabella,
will discover a network of crimes and violations of Francoist Spain, and a new
protagonist, Alicia Gris, will help him solve the mysteries.
DOI: 10.4324/9781003298328-5
The four books are related and share motifs, themes, symbolism, etc. Using a
corpus-stylistic methodology, in this chapter we intend to show how certain aspects
discussed by literary critics are enacted in the same way in the four novels, thus
unveiling aspects of Zafón’s craftsmanship hitherto unremarked in literary appreciations of his style. More specifically, we will look into how the author shapes narrative space in the series. To do so, we will carry out a cluster analysis, with which
we will identify textual building blocks used in the four novels and analyze them
systematically. As pointed out before, the analysis is meant to make a contribution
to the still-emerging field of corpus stylistics in Spanish, illustrating how the analysis of literary works can benefit greatly from the use of innovative corpus tools.
The chapter is organized as follows. First, we offer a general overview of Ruiz
Zafón’s treatment of narrative space (Section 4.2). Then, we explain the methodology used to identify the clusters under analysis, and we show the results obtained
(Section 4.3). In Section 4.4, we analyze some of the examples, discussing meaningful patterns that contribute to the creation of particular literary effects. This
section is divided into two subsections, which concentrate on clusters that contribute to the creation of fictional universes and characterization (Section 4.4.1)
and on clusters that offer contextualizing information (Section 4.4.2). The chapter
finishes with some remarks on the potential of corpus stylistics for the analysis of
literary texts in the Spanish-speaking world.
4.2 Narrative Space in Ruiz Zafón’s Novels
The treatment of space in fictional narratives has received a great deal of scholarly
attention, which has resulted in different approaches and a distinction between
different types of space, such as narrative space, the space that serves as context
for the text, the space taken by the text itself, and the spatial form of the text,
among others (Ryan et al. 2016; Buchholz and Manfred 2005). In this chapter,
we concentrate on narrative space. Narrative space refers to “the space (and the
places) providing the physical environment in which the characters of narrative
live and move” (Ryan et al. 2016, 3). This is a fundamental dimension of any
fictional narrative (Álvarez Méndez 2002), as it is closely related to the rest of
the elements that shape the narrative story world: every story implies a series of
events that take place in a given time and space (Zubiaurre 2000, 20). Narrative
space is of paramount importance to create an effect of believability, especially in
the case of non-transportable narratives that are rooted and inscribed in specific
locations (Matzat 2007). Authors such as Dickens, Galdós, or Balzac, to name but
three canonical novelists, are well-known for their treatment of space to inscribe
some of their novels in London, Madrid, and Paris, respectively. Ruiz Zafón’s
The Cemetery of Forgotten Books series is another good example, with Barcelona
as the setting of the story. The way in which textual space is mapped in the four
novels that make the series provides the reader with a physical environment that
contributes to an effect of plausibility.
In addition to a faithful reproduction of the fictional universe, narrative space
can also convey a wide range of possibilities of interpretation (Zoran 1984, 319).
In other words, narrative space not only serves as a physical environment for
characters but also plays a stylistic role that usually goes beyond the geographical
reality in which the characters of a narrative live and move. In the case of Ruiz
Zafón, his masterful treatment of space is also connected to literary functions that
go beyond the description of the scene. Settings like houses, train stations, and
cemeteries are frequently given a symbolic value. Houses, for instance, often hide
secrets, intriguing plots, or unsolved mysteries that the new tenant who comes to
them must solve. They are also frequently given a symbolic value, as in El palacio
de la medianoche, in which Chandra Chatterghee’s house becomes an alter ego to
the awkward personality of the character, or the cabin on the desert island where
David Martín takes refuge after his long adventure in El juego del Ángel, which is
frequently seen as a representation of David’s inner life (Ruiz Tosaus 2009). With
regard to train stations, they are a common setting in Ruiz Zafón’s novels and are
frequently used symbolically too, especially in The Cemetery of Forgotten Books
series. In La sombra del viento and El juego del Ángel, the train station Estación
de Francia in Barcelona becomes a prominent setting in the story. In La sombra
del viento, on the one hand, the station is the place in which frustration overcomes
the happiness of the union between lovers. In El juego del Ángel, on the other, this
train station symbolizes escape and the search for a new life (Ruiz Tosaus 2009).
Finally, cemeteries are also frequently described in terms that suggest something
otherworldly. The Cemetery of Forgotten Books is referred to as a palace, a temple, a honeycomb filled with honey (Hrabova et al. 2020). Specific cemeteries are
also mentioned throughout the series, such as the cemetery in Sarriá in La sombra
del viento or the cemetery of Pueblo Nuevo and the cemetery of San Gervasio in
El juego del Ángel. These real cemeteries are frequently likened to labyrinths,
which contributes to shaping the entangled and mystical image of the Cemetery
of Forgotten Books in the series (Ruiz Tosaus 2009).
Literary appreciations of Ruiz Zafón’s craftsmanship have mostly focused on
this symbolic dimension of narrative space, to the detriment of the configuration of the spaces and places that make up the physical environment in which
the characters of his narratives live and move. However, in the configuration of
narrative space, we can also identify habits hitherto unremarked that can contribute to a better understanding of Ruiz Zafón’s style. It is true that some scholars
have referred to aspects of Ruiz Zafón’s writing in connection to his configuration of narrative space. Ruiz Tosaus (2009), for instance, mentions the abundance
of movement in Ruiz Zafón’s novels, which seems to be related to the episodic
nature of his writing. Romero Frías and Galiñanes Gallén (2009) scrutinize the
descriptions of narrative space in La sombra del viento as filtered by a narrator-protagonist whose view occasionally distorts what is really happening. Apart
from these isolated references, however, literary critics have not yet methodically
approached the way in which Ruiz Zafón shapes the narrative space in his novels
from a textual point of view. The clusters analyzed in this chapter reveal formal
and functional patterns that will help to better understand the author’s style in this
regard. To do so, a corpus-stylistic approach has been used. The methodology
used to identify the examples under analysis is explained next.
4.3 Methodology and Results
The corpus with which we have conducted our analysis is made up of the four
novels of The Cemetery of Forgotten Books series: La sombra del viento (Ruiz
Zafón 2001), El juego del ángel (Ruiz Zafón 2008), El prisionero del cielo (Ruiz
Zafón 2011), and El laberinto de los espíritus (Ruiz Zafón 2016). These four
novels add up to 612,027 words, distributed as shown in Table 4.1. To carry out
the analysis and identify textual building blocks related to the shaping of narrative
space, we have used WordSmith Tools (Scott 2016), the most frequently used tool
in corpus stylistics (Archer 2007, 249). More specifically, we have used this software tool to identify clusters in our corpus of Ruiz Zafón’s novels.1 From a formal
point of view, a cluster is a sequence of two or more words that repeatedly occur
consecutively in a corpus of texts (Cheng 2012, 72). Different terms have been
used to refer to clusters, such as “recurrent word-combinations” (Altenberg 1998),
“chains” (Stubbs and Barth 2003), or “n-grams” (Anthony 2019). Following the
taxonomy used in WordSmith Tools user guide (2013), we have opted for the term
cluster, as it is defined by purely formal aspects (Scott 2019). From a functional
viewpoint, clusters identified in a corpus are supposed to have “identifiable discourse functions in texts” (Conrad and Biber 2005, 58), and they can lead to the
identification of meaningful textual building blocks (Mahlberg 2013, 26). The
potential functional value of clusters makes it very important to carefully define
the criteria that guide their identification. There are three aspects that should be
borne in mind in this regard: the length of the clusters, their distribution across the
corpus, and their frequency.
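The formal definition of a cluster given above can be illustrated with a minimal sketch. It is not WordSmith Tools' procedure; the tokenizer and the function names `tokenize` and `clusters` are assumptions introduced only for illustration, and the toy input reuses the two welcome sentences quoted in examples 2 and 5 of this chapter.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (letters only)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def clusters(tokens, n=5):
    """Return every run of n consecutive tokens as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Toy input: the two welcome sentences quoted in examples 2 and 5.
toy_text = (
    "Daniel, bienvenido al Cementerio de los Libros Olvidados. "
    "Fermín, bienvenido al Cementerio de los Libros Olvidados."
)

counts = Counter(clusters(tokenize(toy_text), n=5))

# Keep only sequences that recur, which is what makes them clusters.
repeated = {gram: freq for gram, freq in counts.items() if freq > 1}
for gram, freq in sorted(repeated.items()):
    print(freq, gram.upper())
```

Because the five-word window slides one token at a time, a longer recurring phrase surfaces as several overlapping clusters. Note also that this sketch lets windows run across sentence boundaries; a dedicated corpus tool may be configured differently.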
Firstly, regarding the length of the examples, analyses tend to oscillate between
three- and five-word clusters. Biber et al. (1999, 992) explain that three-word
clusters, although more numerous due to their limited length, tend to be related
to grammar aspects. Lengthier clusters, on the contrary, are “more phrasal in
nature and correspondingly less common” (ibid.). These clusters are more useful
in order to identify stylistically relevant functions in literary texts, even if fewer
examples are identified. As Mahlberg points out in her study of Victorian fiction in general and of Dickens’s works in particular, a “length of five has been
shown to be a useful starting point for the analysis of fiction” (Mahlberg 2013).
This is also the length of the examples discussed in Nieto Caballero’s (2019)
analysis of Galdós’s novels. These studies seem a good reference, and we have
decided to analyze five-word clusters too, ruling out any example of four or fewer
words. Secondly, as for the distribution of the examples identified, it is necessary to emphasize that examples should be found across a number of texts in
the corpus so that they can be analyzed as a stylistic mark that goes beyond the
idiosyncratic use in one text. In our study, we have only analyzed examples that
have been found in the four novels included in our corpus. Frequency, finally, is
the aspect about which it is most difficult to make an informed decision when
working with clusters. As Kopaczyk states, “(t)here is no uniform practice in
lexical bundle studies. . . . Every researcher takes an informed but idiosyncratic
decision” (Kopaczyk 2013, 152). Sometimes, frequency is measured in absolute terms, whereas in other cases, it is normalized (per million words, for
example). Absolute frequencies are the preferred option when working with specialized corpora, while normalized frequencies seem a good alternative when
working with more extensive corpora (Biber et al. 1999, 993). Because our corpus of study is a specialized corpus, made up of novels by one novelist, we have
opted for absolute frequencies. Specifically, we have established a threshold of
ten occurrences in our search criteria. In sum, we have selected clusters with a
length of five words that occur at least ten times in the corpus and that are found
in each of the four texts of the corpus. With these criteria, we have identified 28
examples. Results are shown in Table 4.2.
Table 4.1 Novels by Ruiz Zafón
Novel                           Words
La sombra del viento            158,802
El juego del ángel              156,248
El prisionero del cielo          68,237
El laberinto de los espíritus   228,740
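The three selection criteria described in this section (cluster length, distribution across texts, and an absolute frequency threshold) can be sketched as a simple filter. This is a hedged illustration rather than the procedure WordSmith Tools follows: `ngram_counts`, `select_clusters`, and the toy token lists are hypothetical, and the toy call uses three-word clusters with a threshold of three only so that the data stay small. The chapter's own settings correspond to `n=5, min_freq=10` applied to the four novels.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens in one text."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def select_clusters(corpus, n=5, min_freq=10):
    """Return clusters of length n that reach min_freq and occur in every text.

    corpus maps a text name to its list of tokens.
    """
    per_text = {name: ngram_counts(tokens, n) for name, tokens in corpus.items()}

    # Absolute frequency is the sum over all texts in the corpus.
    total = Counter()
    for text_counts in per_text.values():
        total.update(text_counts)

    selected = {}
    for gram, freq in total.items():
        in_every_text = all(gram in text_counts for text_counts in per_text.values())
        if freq >= min_freq and in_every_text:
            selected[" ".join(gram)] = freq
    return selected


# Hypothetical toy corpus of two very short "texts".
toy_corpus = {
    "text_a": "al otro lado de la calle al otro lado".split(),
    "text_b": "se quedó al otro lado de la puerta".split(),
}

print(select_clusters(toy_corpus, n=3, min_freq=3))
```

In the toy corpus only "al otro lado" survives: "de la calle" is excluded because it occurs in a single text, and "otro lado de" occurs in both texts but falls below the frequency threshold.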
Table 4.2 Clusters in Ruiz Zafón’s Novels
N    Cluster                               Freq.
1    AL OTRO LADO DE LA                    54
2    CEMENTERIO DE LOS LIBROS OLVIDADOS    54
3    ME DI CUENTA DE QUE                   39
4    COMO SI SE TRATASE DE                 36
5    EN LO ALTO DE LA                      25
6    OTRO LADO DE LA CALLE                 24
7    A LA PUERTA DE LA                     23
8    EL CEMENTERIO DE LOS LIBROS           21
9    DEL CEMENTERIO DE LOS LIBROS          19
10   PUSO LOS OJOS EN BLANCO               19
11   AL FIN Y AL CABO                      17
12   A LAS PUERTAS DE LA                   16
13   SE ENCOGIÓ DE HOMBROS Y               16
14   SI SE TRATASE DE UN                   14
15   CON LÁGRIMAS EN LOS OJOS              13
16   CON UN HILO DE VOZ                    13
17   LA VERDAD ES QUE NO                   13
18   QUE SE TRATABA DE UN                  13
19   SE DIO LA VUELTA Y                    13
20   A LA ENTRADA DE LA                    12
21   AL CEMENTERIO DE LOS LIBROS           12
22   Y SE ENCOGIÓ DE HOMBROS               12
23   EN EL INTERIOR DE LA                  11
24   A LA ESPERA DE QUE                    10
25   A LOS PIES DE LA                      10
26   DE LA CALLE SANTA ANA                 10
27   LA IGLESIA DE SANTA ANA               10
28   TUVE LA IMPRESIÓN DE QUE              10
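The frequencies in Table 4.2 are absolute. For readers who wish to compare them with figures from larger corpora, the per-million normalization mentioned in the methodology can be sketched as follows. The helper `per_million` is a hypothetical name; the corpus size and the two frequencies are the values reported in this chapter.

```python
def per_million(freq, corpus_size):
    """Convert an absolute frequency into occurrences per million words."""
    return freq * 1_000_000 / corpus_size


CORPUS_SIZE = 612_027  # total words in the four novels, as reported above

# Two absolute frequencies taken from Table 4.2.
absolute = {
    "AL OTRO LADO DE LA": 54,
    "LA IGLESIA DE SANTA ANA": 10,
}

for cluster, freq in absolute.items():
    rate = per_million(freq, CORPUS_SIZE)
    print(f"{cluster}: {freq} occurrences, {rate:.1f} per million words")
```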
Of the 28 examples identified, 13 are directly connected to narrative space:
AL OTRO LADO DE LA, CEMENTERIO DE LOS LIBROS OLVIDADOS,
EN LO ALTO DE LA, OTRO LADO DE LA CALLE, A LA PUERTA DE LA,
A LA ENTRADA DE LA, EL CEMENTERIO DE LOS LIBROS, DEL CEMENTERIO DE LOS LIBROS, EN EL INTERIOR DE LA, A LOS PIES DE LA, DE
LA CALLE SANTA ANA, LA IGLESIA DE SANTA ANA and A LAS PUERTAS DE LA. The fact that almost half of the clusters identified are related to
the configuration of narrative space reveals a dimension about which Ruiz Zafón
seems particularly concerned in his novels. Thus, in addition to giving a particular symbolic value to physical environments, narrative space is also frequently
brought to the forefront of his writing. Some of the examples identified are part
of the same textual block, such as CEMENTERIO DE LOS LIBROS OLVIDADOS, EL CEMENTERIO DE LOS LIBROS, and DEL CEMENTERIO DE LOS
LIBROS. However, most of the examples refer to different aspects that contribute
to the configuration of the narrative space throughout the series. The most striking
aspect is probably the number of references to specific places and settings found
in the examples. There are references to doors (puertas, as in A LA PUERTA DE
LA), streets (calles, as in OTRO LADO DE LA CALLE), entrances (entradas,
as in A LA ENTRADA DE LA), or churches (iglesias, as in LA IGLESIA DE
SANTA ANA). This contributes to defining the physical environment in which
characters move and live and also to creating the effect of believability discussed
in Section 4.2. In example 1, for instance, we show an example of LA IGLESIA
DE SANTA ANA in El laberinto de los espíritus. The systematic references to
real places such as the church of Santa Ana contribute to rooting the story world
of The Cemetery of Forgotten Books series in specific locations. They are repeatedly referred to in the four books of the series – occurrences of LA IGLESIA DE
SANTA ANA in El prisionero del cielo, La sombra del viento, and El juego del
ángel are shown in examples 18, 19, and 20.
(1) Como todos los domingos desde que se había quedado viudo, más de
veinte años atrás, Juan Sempere se levantaba temprano, se preparaba un café
bien cargado y se enfundaba su traje y su sombrero de señor de Barcelona
para bajar a la iglesia de Santa Ana. (El laberinto de los espíritus.)
(Ruiz Zafón 2016, 582)
In order to explore the functions of clusters and how Ruiz Zafón shapes the
configuration of narrative space, we have searched for meaningful patterns that
contribute to the creation of specific effects across the four novels under analysis.
This will disclose hitherto-unremarked stylistic traits of Ruiz Zafón’s craftsmanship, which may contribute to a better understanding of his literary language.
4.4 Analysis
A cluster analysis can be conducted in different ways. The most straightforward approach is perhaps by running a concordance of the clusters identified.
Concordances make possible a “vertical reading” (Tognini-Bonelli 2001, 18) of
the examples, which can in turn unveil unnoticed aspects of how a cluster is used.
For example, the cluster CEMENTERIO DE LOS LIBROS OLVIDADOS, which
refers to the mysterious place around which the novel revolves, is used in the
exact same manner when a character welcomes another to that place, as shown
in examples 2 to 8. As can be observed, the stretch of direct speech in which the
cluster appears follows the same pattern every time: there is a vocative together
with the phrase “bienvenido/a al” (welcome to) and the cluster.
(2) – Daniel, bienvenido al Cementerio de los Libros Olvidados. (La sombra
del viento) (Ruiz Zafón 2001, 8)
(3) – Ignatius B. Samson, bienvenido al Cementerio de los Libros Olvidados.
(El juego del Ángel) (Ruiz Zafón 2008, 89)
(4) – Bienvenida al Cementerio de los Libros Olvidados, Isabella. (El juego del
Ángel) (Ruiz Zafón 2008, 355)
(5) – Fermín, bienvenido al Cementerio de los Libros Olvidados. (El prisionero
del cielo) (Ruiz Zafón 2011, 271)
(6) – Alicia – dijo por fin –. Bienvenida al Cementerio de los Libros Olvidados.
(El laberinto de los espíritus) (Ruiz Zafón 2016, 53)
(7) – Alicia, bienvenida de nuevo al Cementerio de los Libros Olvidados. (El
laberinto de los espíritus) (Ruiz Zafón 2016, 460)
(8) – Julián, bienvenido al Cementerio de los Libros Olvidados. (El laberinto
de los espíritus)
(Ruiz Zafón 2016, 621)
The conciseness of the sentences in which the Cemetery of Forgotten Books is
mentioned suggests that Ruiz Zafón encapsulates a moment of heightened intensity in a short stretch of language. Interestingly, this matches what some authors
admired by Ruiz Zafón do in similar moments. Charles Dickens, for instance, uses
short sentences in parataxis to convey the same effect (Gordon 1966, 150). Gordon uses the example of Little Nell’s death in The Old Curiosity Shop, concisely
described in a short one-sentence paragraph. According to Brook (1970, 29), this
Dickensian trait is the result of the direct influence of Romantic writers, in whose
work this linguistic feature was extensively used in moments of heightened intensity.
As shown in these examples, Ruiz Zafón shows a similar tendency. Although this
would require a more detailed analysis than editorial constraints permit here, it is
not far-fetched to suggest that this is connected to the alleged Dickensian influence
in Ruiz Zafón’s writing. As Calle Rosingana (2012, 32) states in relation to Ruiz
Zafón and Dickens, “Ruiz Zafón’s admiration for nineteenth-century fiction has
been expressed by the writer in many interviews and is one of the central elements
on which he bases his literary style” (our translation). As can be seen in examples 2 to 8, far from being circumstantial, the cluster CEMENTERIO DE LOS
LIBROS OLVIDADOS shows a consistent pattern in Zafón’s style, as it is used
similarly throughout the four books in the series. This could be a textual trace of
nineteenth-century authors in general and of Dickens in particular.
Figure 4.1 Screenshot of concordance a la puerta de la in Ruiz Zafón’s novels.
Concordances can also help us identify lexicogrammatical patterns in the use of
clusters. Let us take A LA PUERTA DE LA as an example. A screenshot of a concordance run with WordSmith Tools is shown in Figure 4.1. As we can see, there
are elements used repeatedly with this cluster: verbs that indicate movement, such
as acercarse (get closer) (concordance lines 8, 9, 20, and 21) and aproximarse
(approach) (concordance line 15); verbs that indicate position, such as detenerse (stop) (concordance lines 10, 11), permanecer (remain) (concordance line 4),
esperar (wait) (concordance line 12), or apostarse (position) (concordance lines
16, 17, 18, and 19); adverbs that describe an action, such as lentamente (slowly)
(concordance lines 8 and 9); and nouns that refer to means of transportation, such
as coche (car) (concordance lines 6 and 21).
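A concordance of the kind described here can be approximated with a short key-word-in-context (KWIC) routine. The sketch below is an assumption-laden illustration, not the output of WordSmith Tools: the function `kwic` is hypothetical, and the two sample sentences are invented for the purpose, not quoted from the novels.

```python
import re


def kwic(text, cluster, width=5):
    """Return (left context, node, right context) for each hit of the cluster."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    node = cluster.lower().split()
    n = len(node)
    lines = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            lines.append((left, " ".join(node), right))
    return lines


# Invented sample sentences, used only to show the layout.
sample = (
    "El coche se detuvo a la puerta de la librería. "
    "Me acerqué lentamente a la puerta de la casa."
)

for left, node, right in kwic(sample, "A LA PUERTA DE LA", width=3):
    # Right-align the left context so that the node forms a vertical column.
    print(f"{left:>25} | {node.upper()} | {right}")
```

Aligning the node in a single column is what allows the vertical reading: recurrent items to the left of the cluster, such as verbs of movement or position, line up and become easy to spot.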
A vertical reading of the concordance can help us see how the author uses
these elements with the cluster and look into how the configuration of narrative space is shaped. Although editorial constraints do not allow us to cover them
all in this chapter, some of them are discussed next, as they are repeatedly used
with other clusters too, resulting in meaningful functional patterns. We will focus
on two aspects: the use of spatial references in relation to the construction of
fictional universes and characterization (Section 4.4.1) and the combination of
spatial and time references as a means of offering contextualizing information
(Section 4.4.2).
4.4.1 Fictional Universes and Characterization
The construction of fictional universes and characterization in The Cemetery
of Forgotten Books series is frequently related to references to narrative space.
Descriptions of settings and characters often occur after one of the clusters under
analysis here. This would be the case of example 9 from El juego del Ángel, in
which AL OTRO LADO DE LA precedes a description of a bookshop. This is
the function par excellence of the clusters that refer to narrative space. Not only
are these the most frequent examples, but they also fulfill the primary function
of space references: to provide a corporeal reality of the fictional universe we
are seeing. These examples are frequently related to eye behavior – as seen in
“nadie podía vernos” and “En el interior se podía ver” in (9) – which gives us
the impression of being informed about the bookshop through the lens of the
narrator and his own feelings at that moment. This is in line with Romero Frías
and Galiñanes Gallén’s (2009) findings in their analysis of descriptions of narrative space in La sombra del viento, frequently filtered by a narrator-protagonist
whose view occasionally distorts what is really happening (see Section 4.2). As
seen in example 9 from El juego del Ángel, this goes beyond La sombra del
viento and seems a textual device that characterizes Ruiz Zafón’s craftsmanship
in general.
(9) Barcelona nunca me parecía tan hermosa y tan triste como aquella tarde.
Cuando empezaba a anochecer nos acercamos hasta la librería de Sempere e
Hijos. Nos apostamos en un portal al otro lado de la calle, donde nadie podía
vernos. El escaparate de la vieja librería proyectaba un soplo de luz sobre los
adoquines húmedos y brillantes. En el interior se podía ver a Isabella aupada
a una escalera ordenando libros en el último estante, mientras el hijo de Sempere hacía como que repasaba un libro de contabilidad tras el mostrador y le
miraba los tobillos de refilón. (El juego del Ángel.)
(Ruiz Zafón 2008, 258)
In addition to eye behavior, clusters related to the construction of fictional universes and characterization are frequently used in connection to references to means
of transportation, as commented on in the previous section. In examples 10 and
11, we show two similar examples of A LAS PUERTAS DE LA from two different novels. In both cases, a taxi takes a character to a place, then stops at the doors
of a building (A LAS PUERTAS DE LA), and then the narrator starts describing a
particular place or character. In example 10 from El juego del Ángel, the narrator
describes the headquarters of the newspaper La Vanguardia, whereas in example 11
from El laberinto de los espíritus, the description is of a particular character. Another
example from El laberinto de los espíritus is shown in example 12, this time with
the cluster EN LO ALTO DE LA and with a tram (“tranvía”) instead of a taxi. This
suggests a consistent textual pattern in Ruiz Zafón’s series. Thus, the different clusters used repeatedly across the four novels (A LA PUERTA DE LA, A LAS
PUERTAS DE LA, EN LO ALTO DE LA) not only serve as references from which a
description starts but are also frequently related to a means of transportation
with which a character arrives at a place, so that, as if we were with him/her, we start
seeing what is going on. Let us take example 10 as an illustration. Ruiz Zafón uses
David Martín’s arrival at the headquarters of the newspaper La Vanguardia
to start describing what the building is like (“todo allí desprendía un aire de señorío
y opulencia”) and the people working inside (“un chaval con
trazas de meritorio que me recordaba a mí mismo en mis años de Pepito Grillo”).
This further reinforces Romero Frías and Galiñanes Gallén’s (2009) discussion of
our perception of events through the lens of the narrator-protagonist and buttresses
the relevance of narrative space in how Ruiz Zafón shapes his fictional universe in
The Cemetery of Forgotten Books series, both with regard to settings and also to the
characters that move and live in the physical environment of the story world.
(10) Media hora más tarde, un taxi me dejaba a las puertas de la sede de La
Vanguardia en la calle Pelayo. A diferencia de la siniestra decrepitud de mi
antiguo diario, todo allí desprendía un aire de señorío y opulencia. Me identifiqué en el mostrador de conserjería y un chaval con trazas de meritorio que
me recordaba a mí mismo en mis años de Pepito Grillo fue enviado a dar aviso
a don Basilio de que tenía visita.
(El juego del Ángel) (Ruiz Zafón 2008, 209)
(11) Cuando el taxi los dejó a las puertas de la Sección Trece, el que parecía
haber sido designado como cancerbero del lugar esperaba ya en el umbral
portando un manojo de llaves prendido del cinto y con un semblante que
hubiera cosechado premios en un concurso de enterradores.
(El laberinto de los espíritus) (Ruiz Zafón 2016, 205)
(12) El tranvía lo dejó en lo alto de la avenida y se perdió de nuevo en la
niebla, sus luces desvaneciéndose cuesta abajo en un espejismo de vapor. La
plazoleta estaba desierta a aquellas horas. La luz de una farola solitaria dibujaba apenas las siluetas de dos coches negros apostados frente al restaurante
La Venta. Policía, pensó Fernandito.
(El laberinto de los espíritus) (Ruiz Zafón 2016, 342)
4.4.2 Contextualizing Information
These descriptions lead us to another consistent pattern found throughout the different clusters under analysis here: how time and space references are frequently
combined. In example 10 we have shown an example that contains a time reference (“media hora más tarde”) together with the cluster A LAS PUERTAS DE
LA and a means of transportation (a taxi), which contributes to offering the necessary context that precedes the description of the headquarters of La Vanguardia and the people working there. This example is not coincidental but part of a
regular pattern found throughout the series. To understand how the clusters under
analysis here are combined with time references to offer contextualizing information, we should first briefly consider Ruiz Zafón’s frequent time
references as a defining trait of his writing. As Ruiz Tosaus (2009) explains, time
is one of the main themes of Ruiz Zafón’s novels. It frequently plays an important role in the plot of his novels, such as in the bildungsroman El príncipe de
la niebla. But most important, time is key to understanding and deciphering the
mysteries of Ruiz Zafón’s novels (Sáiz 2004, 14). In addition to these functions
discussed by literary critics, references to time are also used less conspicuously
throughout his novels to offer contextualizing information. Interestingly, these
references are used in conjunction with references to narrative space. Together,
they are used to provide contextualizing information and result in a preferred
textual option that Ruiz Zafón uses repeatedly to define the context that precedes
a new story episode.
The combination of space and time as a means of offering contextualizing
information can be observed in different clusters. In examples 13 to 15 we show
three occurrences of A LAS PUERTAS DE LA. The pattern is the same as that
shown in example 10. As can be observed, time references – “Diez minutos más
tarde” in (13), “Una semana más tarde” in (14), and “En apenas un par de minutos
de paseo” in (15) – are followed by the reference to narrative space, which is used
as the starting point from which the episode begins. These examples are from
three different novels (El juego del Ángel, La sombra del viento, and El laberinto
de los espíritus), which suggests a pattern across the series.
(13) Diez minutos más tarde llegaba a las puertas de la estación de Francia.
Las taquillas ya estaban cerradas, pero aún podían verse varios trenes alineados en los andenes bajo la gran bóveda de cristal y acero.
(El juego del Ángel) (Ruiz Zafón 2008, 284)
(14) Una semana más tarde, a las puertas de la escuela de música de la calle
Diputación, Sophie se encontró con don Ricardo Aldaya, que la esperaba
fumando y ojeando un periódico.
(La sombra del viento) (Ruiz Zafón 2001, 429)
(15) En apenas un par de minutos de paseo a través de callejas heladas y
desiertas llegaron a las puertas de la vieja fábrica. Los raíles de una línea
ferroviaria se desvanecían a sus pies y se adentraban en el recinto. Un gran
portón de piedra con la leyenda VAPOR BARCINO presidía la entrada.
(El laberinto de los espíritus) (Ruiz Zafón 2016, 272)
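The pattern observed in examples 10 and 13 to 15, in which a time reference precedes the spatial cluster, could be checked semi-automatically along the following lines. This is only a sketch under stated assumptions: the list of time markers in `TIME_MARKERS` is a small hypothetical inventory chosen for these examples, the function `time_before_cluster` is an illustrative name, and the sentences are the openings of the examples quoted in this chapter (example 11 is included as a contrast).

```python
import re

# Hypothetical, deliberately small inventory of Spanish time markers.
TIME_MARKERS = re.compile(r"\b(?:más tarde|minutos?|horas?|semanas?)\b", re.IGNORECASE)


def time_before_cluster(sentence, cluster):
    """True if a time marker occurs before the first occurrence of the cluster."""
    position = sentence.lower().find(cluster.lower())
    if position == -1:
        return False
    return TIME_MARKERS.search(sentence[:position]) is not None


CLUSTER = "A LAS PUERTAS DE LA"

# Opening stretches of the examples quoted in this chapter, keyed by number.
openings = {
    10: "Media hora más tarde, un taxi me dejaba a las puertas de la sede de La Vanguardia",
    11: "Cuando el taxi los dejó a las puertas de la Sección Trece",
    13: "Diez minutos más tarde llegaba a las puertas de la estación de Francia.",
    14: "Una semana más tarde, a las puertas de la escuela de música de la calle Diputación",
    15: "En apenas un par de minutos de paseo a través de callejas heladas y desiertas "
        "llegaron a las puertas de la vieja fábrica.",
}

results = {number: time_before_cluster(text, CLUSTER) for number, text in openings.items()}
print(results)
```

Example 11 is not flagged because "Cuando" is not in the marker list, which shows the limits of such a check: it can point to candidates, but the qualitative reading of each concordance line remains necessary.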
Other clusters that refer to narrative space, such as AL OTRO LADO DE LA
and EN EL INTERIOR DE LA, are also used jointly with time references, as
shown in examples 16 and 17. These examples serve to reinforce the hypothesis that the configuration of narrative space is frequently related to eye behavior and how we perceive space through the lens of characters, as mentioned in
Section 4.4.1. As can be observed, time references – “cerca de media hora” in
example 16 and “las dos de la madrugada” in example 17 – add to the context
with which the narrator describes the scene from his own viewpoint. To do so,
Ruiz Zafón uses a time reference in combination with a reference to narrative
space, which is directly related to his gaze. In example 16, for instance, “cerca
de media hora” is the context information that Ruiz Zafón uses as the basis
on which to define the scene: Daniel spends half an hour on the other side of
the street (“al otro lado de la calle”), watching (“vigilando” and “viendo”) the
silhouettes of Mr. Aguilar and his wife. To explain to his readers what is going
on, Ruiz Zafón combines time and space. The exact same pattern is observed in
example 17.
(16) Aun así, tan desprovisto de dignidad como de abrigo apropiado para
la gélida temperatura, me resguardé del viento en un portal al otro lado
de la calle y permanecí allí cerca de media hora, vigilando las ventanas y
viendo pasar las siluetas del señor Aguilar y de su esposa. No había rastro
de Bea.
(La sombra del viento) (Ruiz Zafón 2001, 351)
(17) Cuando llegué a casa eran casi las dos de la madrugada. Iba a enfilar el
portal cuando vi que había luz en el interior de la librería, un resplandor débil
tras la cortina de la trastienda.
(El prisionero del cielo) (Ruiz Zafón 2011, 192)
Finally, clusters referring to narrative space are also combined with time references to bring us to a previous time. This is in line with Ruiz Zafón’s constant recollection of past events (Ruiz Tosaus 2009). Interestingly, Zafón’s idiosyncratic
practice of bringing us back to a previous time is enacted by making constant
references to narrative space too. In examples 18 to 20 we show three occurrences
of LA IGLESIA DE SANTA ANA. They are from three different novels (El prisionero del cielo, La sombra del viento, and El juego del Ángel), which suggests
a consistent formal pattern that contributes to buttressing literary critics’ discussions with regard to the frequent connections of Ruiz Zafón’s stories with past
events to make sense of the story world. Another similar example of LA IGLESIA
DE SANTA ANA with a time reference in El laberinto de los espíritus was shown
in example 1.
(18) La novia vestía de blanco y, aunque no lucía grandes alhajas ni adornos,
no ha habido en la historia una mujer que fuese más hermosa a los ojos de su
prometido que la Bernarda aquel día primerizo de febrero reluciente de sol en
la plaza de la iglesia de Santa Ana.
(El prisionero del cielo) (Ruiz Zafón 2011, 281)
(19) Bea y yo nos casamos en la iglesia de Santa Ana dos meses más tarde. El
señor Aguilar, que todavía me hablaba en monosílabos y seguiría haciéndolo
hasta el fin de los tiempos, me había concedido la mano de su hija ante la
imposibilidad de obtener mi cabeza en bandeja.
(La sombra del viento) (Ruiz Zafón 2001, 530)
(20) En la carta, Sempere hijo me contaba que Isabella y él, tras varios años
de noviazgo tormentoso e interrumpido, habían contraído matrimonio el
18 de enero de 1935 en la iglesia de Santa Ana. La ceremonia, contra todo
pronóstico, la había celebrado el nonagenario sacerdote que había pronunciado la eulogia en el entierro del señor Sempere y que, a pesar de todos los
intentos y afanes del obispado, se resistía a morir y seguía haciendo las cosas
a su manera.
(El juego del Ángel) (Ruiz Zafón 2008, 358)
This relationship of narrative space with time references, together with the
configuration of fictional universes shown in Section 4.4.1, has served to unveil
hitherto-unremarked aspects of Ruiz Zafón’s craftsmanship in his treatment
of narrative space. Although some of the clusters are clearly exclusive to The
Cemetery of Forgotten Books series, such as CEMENTERIO DE LOS LIBROS
OLVIDADOS, the systematicity of the formal and functional patterns discussed
throughout this chapter suggests a well-established practice in the treatment of
narrative that may be part of Ruiz Zafón’s style in general. The analysis of the
clusters identified has shed new light on the Spanish author’s treatment of space
beyond the symbolic value addressed by literary critics mentioned in Section 4.2.
Hopefully, this has served to demonstrate the potential of corpus stylistics to
reveal “meanings of literary texts that cannot be detected either by intuitive
techniques as in literary studies” (Fischer-Starcke 2010, 2), thus complementing
them and opening new avenues of analysis in the study of literary texts written
in Spanish.
4.5 Conclusion
Corpus stylistics is a well-established field of corpus linguistics that applies
corpus methodologies to the analysis of literary texts. The potential of corpus-stylistic analyses has been shown with many authors, especially in the English-speaking world. This potential contrasts with a clear absence of computer-assisted
approaches to the analysis of Spanish novelists. In this chapter, we have set out to
demonstrate the potential of corpus stylistics in the analysis of Spanish-speaking
authors. To do so, we have conducted a cluster analysis to investigate Carlos Ruiz
Zafón’s treatment of narrative space in The Cemetery of Forgotten Books series.
Our brief (of necessity) account of some findings on how the configuration of
narrative space is shaped by Ruiz Zafón has hopefully shown how a computer-assisted approach that combines corpus linguistics with traditional stylistics can
provide meaningful insight into the style of the author. As has been shown, some
of the findings have revealed previously unremarked stylistic traits of Zafón’s
craftsmanship, demonstrating how corpus stylistics “can reveal patterns that we
as readers may not be aware of, although such patterns might still contribute to the
effects we perceive” (Mahlberg 2013, 27).
Of course, the purpose of this chapter was not to set corpus stylistics as an
approach that seeks to replace more traditional approaches – this seems one of the
reasons that stylisticians in the Spanish-speaking world are reluctant to embrace
corpus approaches. Needless to say, this form of analysis does not supplant other
studies. Rather, it “should be seen as a complementary approach to more traditional approaches” (Biber et al. 1998, 7–8), from which the study of literary
texts can benefit greatly. We hope that the case study presented here contributes to
opening new avenues of analysis and to encouraging stylisticians in the Spanish-speaking world to incorporate computer-assisted approaches into their analytical
toolkit.
Note
1 For a more detailed account of clusters in general and how to identify them using WordSmith Tools, see Scott (2013).
References
Altenberg, Bengt. 1998. “On the Phraseology of Spoken English: The Evidence of Recurrent Word-Combinations.” In Phraseology. Theory, Analysis, and Applications, edited
by Anthony Paul Cowie, 101–22. Oxford: Oxford University Press.
Álvarez Méndez, Natalia. 2002. Espacios Narrativos. León: Servicio de Secretariado y
Relaciones internacionales de la Universidad de León.
Anthony, Laurence. 2019. AntConc (Version 3.5.8) [Computer Software]. Tokyo: Waseda
University.
Archer, Dawn. 2007. “Computer-Assisted Literary Stylistics: The State of the Field.” In
Contemporary Stylistics, edited by Marina Lambrou and Peter Stockwell, 244–57. London: Continuum.
Biber, Douglas, Susan Conrad, and Randi Reppen. 1998. Corpus Linguistics: Investigating
Language Structure and Use. Cambridge: Cambridge University Press.
Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad, and Edward Finegan.
1999. Longman Grammar of Spoken and Written English. Harlow: Longman.
Brook, George L. 1970. The Language of Dickens. London: Andre Deutsch.
Buchholz, Sabine, and Manfred Jahn. 2005. “Space in Narrative.” In The Routledge Encyclopedia of Narrative Theory, edited by David Herman, Manfred Jahn, and Marie-Laure
Ryan, 551–54. London: Routledge.
Calle Rosingana, Gonzalo. 2012. “Perspectiva lingüística y cognitiva del estilo de Carlos
Ruiz Zafón en La sombra del viento.” PhD diss., Universitat de Vic.
Cheng, Winnie. 2012. “Corpus-Based Linguistic Approaches to Critical Discourse Analysis.” In The Encyclopedia of Applied Linguistics, edited by Carol A. Chapelle, 1–8.
Oxford: Wiley-Blackwell.
Conrad, Susan, and Douglas Biber. 2005. “The Frequency and Use of Lexical Bundles in
Conversation and Academic Prose.” In The Corpus Approach to Lexicography, Thematischer Teil von Lexicographica. Internationales Jahrbuch für Lexikographie 20, edited
by Wolfgang Teubert and Michaela Mahlberg, 56–71. Tübingen: Niemeyer.
Fischer-Starcke, Bettina. 2010. Corpus Linguistics in Literary Analysis: Jane Austen and
her Contemporaries. London: Continuum.
Gordon, Ian. 1966. The Movement of English Prose. London: Longman.
Hrabova, Valeria, Larisa Adonina, Olga Medvedeva, and Olga Shutova. 2020. “Philological Concept of the Novel ‘The Shadow of the Wind’ by Carlos Ruiz Zafón.” Revista
Inclusiones: Revista de Humanidades y Ciencias Sociales 7, no. 19: 344–52.
Kopaczyk, Joanna. 2013. The Legal Language of Scottish Burghs: Standardization and
Lexical Bundles (1380–1560). Oxford: Oxford University Press.
Mahlberg, Michaela. 2013. Corpus Stylistics and Dickens’s Fiction. London: Routledge.
Mahlberg, Michaela. 2014. “Corpus Stylistics.” In The Routledge Handbook of Stylistics,
edited by Michael Burke, 387–92. London: Routledge.
Matzat, Wolfgang. 2007. Espacios y Discursos en la Novela Española: Del Realismo a la
Actualidad. Frankfurt: Iberoamericana Vervuert.
McIntyre, Dan, and Brian Walker. 2019. Corpus Stylistics. Theory and Practice. Edinburgh: Edinburgh University Press.
Nieto Caballero, Guadalupe. 2019. “El Espacio como Eje Vertebrador en la Creación del
Universo Ficticio Galdosiano: Un Estudio de Corpus.” Signa 28: 1203–38.
Nieto Caballero, Guadalupe, and Pablo Ruano San Segundo. 2020. Estilística de Corpus:
Nuevos Enfoques en el Análisis de Textos Literarios. Bern: Peter Lang.
Ramos Nogueira, Luis Carlos. 2016. “Motivación y Vigencia en Seis Locuciones del Universo de Carlos Ruiz Zafón.” Language Design 18: 45–70.
Romero Frías, Marina, and Marta Galiñanes Gallén. 2009. “La Importancia del Análisis
del Discurso Narrativo en la Traducción: L’ombra del vento de Carlos Ruiz Zafón.”
Espéculo: Revista de Estudios Literarios 41. http://webs.ucm.es/info/especulo/
numero41/lombrav.html. Accessed 1 April 2022.
Ruiz Tosaus, Eduardo. 2009. “Motivos, Símbolos y Obsesiones en la Narrativa de Carlos Ruiz Zafón.” Espéculo: Revista de Estudios Literarios 41. https://webs.ucm.es/info/
especulo/numero41/motivzaf.html. Accessed 1 April 2022.
Ruiz Zafón, Carlos. 2001. La Sombra del Viento. Barcelona: Planeta.
Ruiz Zafón, Carlos. 2008. El Juego del Ángel. Barcelona: Planeta.
Ruiz Zafón, Carlos. 2011. El Prisionero del Cielo. Barcelona: Planeta.
Ruiz Zafón, Carlos. 2016. El Laberinto de los Espíritus. Barcelona: Planeta.
Ryan, Marie-Laure, Kenneth Foote, and Maoz Azaryahu. 2016. Narrating Space/Spatializing Narrative: Where Narrative Theory and Geography Meet. Columbus, OH: Ohio
State University Press.
Sáiz Ripoll, Anabel. 2004. “Sólo Recordamos lo que Nunca Sucedió: Análisis de la Obra
de Carlos Ruiz Zafón.” Cuadernos de literatura infantil y juvenil 177: 7–27.
Scott, Mike. 2013. WordSmith Tools Manual. Version 6.0. Liverpool: Lexical Analysis
Software.
Scott, Mike. 2016. WordSmith Tools Version 7. Stroud: Lexical Analysis Software.
Scott, Mike. 2019. “Single Words v. Clusters.” https://lexically.net/downloads/version7/
HTML/single_words.html. Accessed 1 April 2022.
Stubbs, Michael, and Isabel Barth. 2003. “Using Recurrent Phrases as Text-Type Discriminators. A Quantitative Method and Some Findings.” Functions of Language 10, no. 1:
61–104.
Tognini-Bonelli, Elena. 2001. Corpus Linguistics at Work. Amsterdam: John Benjamins.
Zoran, Gabriel. 1984. “Towards a Theory of Space in Narrative.” Poetics Today 5, no. 2:
309–35.
Zubiaurre, María Teresa. 2000. El Espacio en la Novela Realista. México: Fondo de
Cultura Económica.
5 Analyzing Who, What, and Where in a Mediæval Chinese Corpus
A Case Study on the Chinese Buddhist Canon1
Tak-sum Wong and John Sie Yuen Lee
5.1 Introduction
As more literary and historical texts become digitized, researchers increasingly
complement traditional, manual analysis with data-driven, quantitative methods.
They have been applied to study various textual properties, such as literary style
(e.g., Holmes 1994), evolution of literary genres (e.g., Moretti 2007), text reuse and
intertextuality (e.g., Büchler et al. 2010), authorship (e.g., Hung et al. 2010; Sayoud
2012), and novel structure (e.g., Clement 2008). Automatic methods have also been
developed to recognize named entities (e.g., Van Dalen-Oskam et al. 2014; Bornet
and Kaplan 2017) and to retrieve location, time, participant, and action from events
recorded in a narrative (e.g., Vossen et al. 2008). These methods can in turn facilitate the analysis of characters, such as their personas (e.g., Bamman et al. 2013) and social
networks (e.g., Moretti 2007; Elson et al. 2010; Agarwal et al. 2012). While automatic analysis cannot match the depth of scholarly work, it offers greater breadth
by covering a larger amount of text than can ever be processed by individual scholars.
Information extraction from historical text is challenging because of the lack of
metadata such as ontologies (Gutierrez et al. 2016), as well as syntactically annotated data. This chapter addresses the challenges in information extraction for the
three “wh-questions” – who were the characters, what did they do, and where were
they? – in a historical corpus written in a low-resource language. In particular, we
investigate whether and how to exploit a limited amount of in-domain training
data to improve extraction accuracy and report a case study on the Chinese Buddhist Canon (henceforth, “the Canon”), the largest collection of Medieval Chinese
texts. We apply a lexicon-based approach for named entity recognition (NER) on
the Canon and then perform part-of-speech (POS) tagging and dependency parsing to identify significant relations between characters, verbs, and toponyms. NER
and relation extraction are essential for information retrieval from textual data
(Lizarralde et al. 2019), but they remain underexplored for literary texts, especially those written in low-resource languages: the only previous study on these
tasks for the Canon focused on manual annotations in a limited number of texts
(Bingenheimer et al. 2009). Existing NLP tools for Chinese tend not to perform
well on Medieval Chinese, since they are trained on Modern Standard Chinese.
DOI: 10.4324/9781003298328-6
This study shows that even a small amount of in-domain data for word segmentation, POS, and syntactic structure can improve accuracy in NER and in retrieving
significant character-verb associations. Further, it presents the first data-driven
profiling of the characters, verbs, and toponyms over the entire Canon and produces the largest Medieval Chinese corpus to date that is automatically annotated
with named entities and with character-verb and character-toponym associations.
The rest of the chapter is organized as follows. In the next section, we present the textual material of our corpus and review previous work in information
extraction. In Section 5.3, we exploit word segmentation training data to improve
named entity annotation. In Sections 5.4 and 5.5, we evaluate the impact of POS
and dependency training data on retrieving significant verbs and toponyms, respectively. In Section 5.6, we conclude with a summary of our contributions and suggestions for future work.
5.2 Background
5.2.1 Corpus and Linguistic Resources
The Chinese Buddhist Canon is a collection of Medieval Chinese texts that are
deemed canonical for Chinese Buddhism. In this study, we use the Korean edition
of the Canon,2 the Tripiṭaka Koreana 高麗藏 (Lancaster and Park 1979), for which
a digital version has been made available (Lancaster 2010). This edition, with over
40 million characters, was derived from the printing blocks stored at the Haein
Monastery 海印寺 in Korea, the most complete set of currently available blocks.
The Canon can be divided into two main subcorpora, the Mahāyāna and the
Hīnayāna,3 named after the two main schools of Buddhism. Mahāyāna, the “great
vehicle,” is today dominant among the Chinese, Koreans, Japanese, Vietnamese,
and Tibetans; Hīnayāna, the “lesser vehicle,” is most widespread in South and
Southeast Asian countries.
Unlike Modern Chinese, which enjoys a wide range of linguistic resources,
Medieval Chinese remains a low-resource language. The only existing large
POS-tagged corpus for this language is Huainanzi 淮南子 (Song and Xia 2014),
and the only treebank (henceforth, the “L&K Treebank”) consists of only about
50,000 Chinese characters drawn from four sūtras in the Canon (Lee and Kong
2016). For word segmentation and part-of-speech tagging, both resources largely
followed the guidelines for the Penn Chinese Treebank (Xue et al. 2005). For
dependency relations, the L&K Treebank used the Stanford Dependencies for
Modern Chinese (Chang et al. 2009).
Although Buddhism scholars have developed a number of in-domain digital
lexica, no attempt has yet been made to mark up the myriad characters and places
in the entire Canon. To tackle this task, we will utilize four of the largest lexica:
the Person Authority Database (DDBC 2008a), which contains 39,277 personal
names; the Place Authority Database (DDBC 2008b), which contains 18,017 geographical names; a Dictionary of Chinese Buddhist Terms (Soothill and Hodous
1937), which has 16,687 entries; and 720 Sanskrit-transliterated terms harvested
from Chu (1996, 1998, 1999).
5.2.2 Information Extraction from Literary Works
Named entity recognition (NER) aims to identify entities such as people, organizations,
and locations in unstructured text. Information extraction (IE) then labels their relationships, such as the location of the headquarters of an organization or the organization to
which a person belongs (Doddington et al. 2004). In the historical domain, the KYOTO
framework defines the historical event model with four slots: location, time, participant,
and action (Vossen et al. 2008). Our work follows this framework in extracting associations between characters, verbs, and locations but excludes temporal information.
In supervised approaches, statistical classifiers are trained to identify relation mentions that link two entities. A variety of lexical, syntactic, and semantic features has
been explored (e.g., Surdeanu and Ciaramita 2007; Ji and Grishman 2008; Agarwal
and Rambow 2010). In the news domain, large corpora, such as those from the NIST
Automatic Content Extraction (ACE) program, have served as training datasets. The
Namescape project, for example, trained existing named entity recognizers on Dutch
literary texts to recognize names in Dutch fiction (Van Dalen-Oskam et al. 2014).
While most approaches previously mentioned used POS tags and regular
expressions, a number of studies showed that syntactic features can help improve
performance. Bornet and Kaplan (2017) included grammatical structure as part of
their rule-based system to recognize proper names in French novels. Zhou et al.
(2007) reported that syntactic features can improve the accuracy in information
extraction, while Mintz et al. (2009) found that the combination of syntactic and
lexical features provides better performance than either feature set on its own.
However, it remains an open question whether these improvements can hold in
low-resource domains, for which the paucity of in-domain training data typically
precludes high-accuracy automatic parsing.
We seek to answer this question with a case study on the Chinese Buddhist
Canon, the first attempt at automatic NER and IE in Medieval Chinese text. With
the exception of a manually annotated corpus of nexus points – groups of people at
particular locations – for the Biography of Eminent Monks 高僧傳 (Bingenheimer
et al. 2009), most existing Chinese NER and IE corpora are for Modern Chinese
(e.g., Finkel et al. 2005; Shih et al. 2004). Our experimental results, which investigate whether a limited amount of in-domain syntactic annotation can improve NER
and IE accuracy, will have implications for future analyses of historical corpora.
5.3 Named Entity Tagging
We first describe our lexicon-based method for marking up personal and geographical
names (Section 5.3.1) and then present an evaluation of this method (Section 5.3.2).
5.3.1 Approach
Baseline
Our baseline uses the Stanford Chinese word segmenter tool, a widely used segmenter trained on Modern Chinese (Chang et al. 2008).
In-Domain Segmenter
We trained a Chinese word segmenter on the L&K Treebank with conditional random fields (Lafferty et al. 2001) using the CRF++ implementation. We adopted the
features proposed by Zhao et al. (2007) and used the four resources mentioned in
Section 5.2.1 as external lexica. Compared to Modern Chinese, fewer words in Medieval Chinese text contain more than two syllables. Therefore, we followed Peng
et al. (2004) and Tseng et al. (2005) in adopting a two-tag set for word segmentation.
Following word segmentation with either approach, we performed Forward
Maximal Matching (FMM) on the text using the Person Authority Database and
Place Authority Database (Section 5.2.1). This improves recall since most word
segmentation errors involve the splitting of a word rather than the merging of two words.
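The matching step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `forward_maximal_match`, the `max_len` cap, and the toy lexicon are assumptions, and the sketch scans a raw character string instead of segmenter output.

```python
def forward_maximal_match(text, lexicon, max_len=8):
    """Scan left to right, always taking the longest lexicon entry
    that starts at the current position.

    Returns a list of (start, end, entry) spans. Characters that do not
    begin any lexicon entry are skipped one at a time.
    """
    spans = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shrink the window.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                match = (i, j, text[i:j])
                break
        if match:
            spans.append(match)
            i = match[1]  # continue after the matched entry
        else:
            i += 1
    return spans


# Hypothetical lexicon excerpt; the chapter uses the Person and Place
# Authority Databases instead.
names = {"阿難", "舍利弗", "須菩提"}
print(forward_maximal_match("佛告阿難及舍利弗", names))
```

Because the longest candidate is tried first, a name that a segmenter has split into shorter pieces can still be recovered as one unit, which is consistent with the recall gain described above.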
5.3.2 Evaluation
5.3.2.1 Data
We created a test set from the L&K Treebank as follows. For each word that is
tagged as a proper noun (NR), we search in the Person or Place Authority Database to decide whether it is a personal name or toponym. In cases where the word
is found in neither database, an annotator with a background in Medieval Chinese
performed the classification. This test set contains a total of 1,914 characters and
114 toponyms.
Naive use of these lexica would lead to false alarms because they cover not
only terms found in the Canon but also a much wider range of people and locations related to Buddhism, many of which share the same form with common
nouns and verbs.4 To increase NER precision, we filter out terms of Chinese origins from the lexica:5 since the Canon consists mostly of translations from Indic
languages, most named entities are of non-Chinese origins.
5.3.2.2 Results
Table 5.1 shows the NER precision and recall using the FMM approach previously
described. For recognizing personal names, the baseline achieved 77% precision
and 51% recall. Since the Stanford segmenter is trained on Modern Chinese, its
relatively poor performance on Medieval Chinese is not unexpected. Using the
CRF model trained on in-domain data, precision improved to 87%, and recall to
69%. Toponym recognition turned out to be a more challenging task. The baseline achieved 58% precision and 19% recall. Our proposed approach performed
significantly better, at 82% precision and 48% recall. For both personal names
and toponyms, recall suffered from limited coverage of the dictionaries, which
do not include personal names, such as Hēsàduō 訶薩多 and Bōtóumó 砵頭摩
(K426), or places, such as Jiāluóhuánsì 加羅洹寺 (K1002), a temple in Śrāvastī;
Shājiégǔ 沙竭古 (K1002), a country in the north of Central Asia; and Bǎoshān 寶山
(K426), a mountain.
Table 5.1 Named Entity Recognition Performance on the Test Set
Word Segmentation Method    Characters               Toponyms
                            Precision    Recall      Precision    Recall
In-domain segmenter         0.87         0.69        0.82         0.48
Baseline                    0.77         0.51        0.58         0.19
Using this method, we identified 588,871 personal names (871 unique names)
and 53,033 toponyms (480 unique names) in the Canon.
5.4 Characters and Verbs
While raw frequency of a character name can give a rough indication of the significance of the character, it is inflated when the name is embedded in expressions
that do not imply the character’s presence. For instance, the terms rúláijiětuōmén
如來解脫門 and fóchà 佛剎 contain “Buddha” (rúlái and fó, respectively) but
imply no action on the part of Śākyamuni. Instead, we consider the number of
times the character appears in a character-verb pair, that is, the number of times a
character serves as the subject of a verb.
We now seek to identify the most significant characters in the corpus and the
verbs that are associated with them. Following an outline of our approach (Section 5.4.1), we report an evaluation (Section 5.4.2) and then analyze a number of
characters with their most frequent verbs (Section 5.4.3).
5.4.1 Approach
Baseline
Our baseline does not have access to POS information. Given that Chinese is an
SVO language, it assumes a character name and the following word (except punctuation) form a character-verb pair.
POS-Based Approach
We trained a part-of-speech (POS) tagger with CRF++ (Lafferty et al. 2001) on
the POS tags in the L&K Treebank. In addition to the standard unigram and
bigram features, we also included a feature for proper nouns, based on the Person and Place Authority Databases and the Sanskrit-transliterated terms from Chu
(Section 5.2.1).
For each personal name in the NER-annotated corpus (Section 5.3), we retrieved
the verb (VV) that either immediately follows it or is separated from it by an adverb
(AD). Take the sentence in Figure 5.1 as an example. The word immediately following Bhiət, “Buddha,” is ngiò, “meet”; we thus included the character-verb pair
(Bhiət, ngiò) in our dataset.
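The retrieval rule just described can be sketched as below. This is an illustrative sketch, not the authors' code: the function name `pos_character_verb_pairs` is hypothetical, the rule is assumed to allow at most one intervening adverb, and the POS tags assigned to the example sentence (other than VV and AD, which the text names) are assumptions.

```python
def pos_character_verb_pairs(tagged, person_names):
    """Extract (character, verb) pairs from one POS-tagged sentence.

    tagged: list of (word, pos) tuples.
    A pair is recorded when a personal name is followed by a verb (VV),
    optionally with a single adverb (AD) in between.
    """
    pairs = []
    for i, (word, _) in enumerate(tagged):
        if word not in person_names:
            continue
        j = i + 1
        # Skip one intervening adverb, if present (assumed limit of one).
        if j < len(tagged) and tagged[j][1] == "AD":
            j += 1
        if j < len(tagged) and tagged[j][1] == "VV":
            pairs.append((word, tagged[j][0]))
    return pairs


# The example sentence from Figure 5.1, with assumed POS tags.
sentence = [
    ("Njiε̌ zhiə", "NT"), ("Bhiət", "NR"), ("ngiò", "VV"),
    ("Biunghuàn", "NN"), (",", "PU"), ("dang", "AD"),
    ("sio", "VV"), ("ngiou", "NN"), ("njiǒ", "NN"),
]
print(pos_character_verb_pairs(sentence, {"Bhiət"}))
```

Under these assumptions the sketch finds only the pair (Bhiət, ngiò). The second verb, sio, is too far from the name to be reached by an adjacency rule.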
Njiε̌ zhiə    Bhiət     ngiò         Biunghuàn  ,   dang    sio     ngiou   njiǒ
that time    Buddha    encounter    stroke         must    need    bull    milk
“At that time, Buddha had a stroke and must need (some) milk.”
Figure 5.1 Example dependency tree to illustrate character-verb pair extraction (K229).
Dependency-Based Approach
Dependency structures facilitate extraction of character names separated by
longer distance from their verbs. We trained a Minimum-Spanning Tree parser
(McDonald et al. 2006) on the L&K Treebank to automatically derive dependency structures (Wong and Lee 2016). To collect character-verb pairs, we
retrieved all “nominal subject” (nsubj) relations in the treebank: the child word,
if tagged as a personal name, is the character; the parent word is the verb. The
nsubj relation in Figure 5.1, for example, indicates that Bhiət, “Buddha,” is the
subject of the verb ngiò, “meet”; we thus included the character-verb pair (Bhiət,
ngiò) in our dataset.
Dependency structures are especially useful in serial verb constructions. The
verbs in these constructions, which are common in Medieval Chinese, are linked
with the “conjunct” (conj) relation, as exemplified by the verbs ngiò, “meet,” and
sio, “need,” in Figure 5.1. Although Bhiət, “Buddha,” is the subject for both verbs,
the second verb (sio) is not directly linked to it. We attributed the subject to all
verbs in a serial verb construction, hence, in this case, also recognizing “Bhiət,
sio” as a character-verb pair.
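A minimal sketch of this extraction, including the propagation of the subject along conj links, is given below. It is an illustration under stated assumptions, not the authors' code: the token representation (dicts with `form`, `head`, and `rel`), the function name, and every relation label in the example other than nsubj and conj are hypothetical.

```python
def dependency_character_verb_pairs(tokens, person_names):
    """Extract (character, verb) pairs from one dependency-parsed sentence.

    tokens: list of dicts with keys 'form', 'head' (index of the parent
    token, -1 for the root), and 'rel' (relation to the parent).
    The subject of a verb is also attributed to every verb linked to it,
    directly or transitively, by a 'conj' relation.
    """
    pairs = []
    for tok in tokens:
        if tok["rel"] != "nsubj" or tok["form"] not in person_names:
            continue
        head = tok["head"]
        pairs.append((tok["form"], tokens[head]["form"]))
        # Follow conj links downward from the governing verb.
        stack = [head]
        while stack:
            h = stack.pop()
            for k, other in enumerate(tokens):
                if other["rel"] == "conj" and other["head"] == h:
                    pairs.append((tok["form"], other["form"]))
                    stack.append(k)
    return pairs


# The sentence from Figure 5.1; labels other than nsubj/conj are assumed.
sentence = [
    {"form": "Njiε̌ zhiə", "head": 2, "rel": "tmod"},
    {"form": "Bhiət", "head": 2, "rel": "nsubj"},
    {"form": "ngiò", "head": -1, "rel": "root"},
    {"form": "Biunghuàn", "head": 2, "rel": "dobj"},
    {"form": ",", "head": 2, "rel": "punct"},
    {"form": "dang", "head": 6, "rel": "advmod"},
    {"form": "sio", "head": 2, "rel": "conj"},
    {"form": "ngiou", "head": 8, "rel": "nn"},
    {"form": "njiǒ", "head": 6, "rel": "dobj"},
]
print(dependency_character_verb_pairs(sentence, {"Bhiət"}))
```

Unlike an adjacency rule, this version also yields (Bhiət, sio), because sio is reached through its conj link to ngiò.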
5.4.2 Evaluation
5.4.2.1 Data
Our test set consists of 896 character-verb pairs in the L&K Treebank. These pairs
were retrieved from the child and parent words of the “nominal subject” (nsubj)
relations, where the child word is annotated as a personal name (Section 5.3.2.1).
5.4.2.2 Results
To estimate the quality of the automatically retrieved character-verb pairs, we
evaluated the accuracy of the nsubj relations using tenfold cross-validation on the
test set. The precision and recall of the baseline approach are 0.46 and 0.64,
respectively (Table 5.2). Despite the small amount of training data, the precision and
recall of the POS-based approach reached 0.77 and 0.71.
Table 5.2 Precision and Recall in Subject-Verb Pair Extraction from L&K Treebank
Method               Precision    Recall
Baseline             0.46         0.64
POS-based            0.77         0.71
Dependency-based     0.91         0.93
The use of dependency information further improved the precision to 0.91, and
the recall to 0.93. Recognition of character-verb pairs in serial verb constructions
contributed to the gain in recall. The remaining recall errors were due to the parser’s mislabeling of nominal subject relations as noun modifier (nn) or adverbial
modifier (advmod). Most precision errors were caused by mistaking a vocative as
a nominal subject.
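For readers who want to reproduce this kind of scoring, the two measures can be computed as in the sketch below. It assumes that predicted and gold pairs are compared as sets, which is a simplification. The chapter does not state whether pairs are scored per occurrence, and the function name and example pairs are hypothetical.

```python
def precision_recall(predicted, gold):
    """Precision and recall of predicted pairs against gold pairs.

    precision = correct / number of predicted pairs
    recall    = correct / number of gold pairs
    """
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example: one of two predictions is right,
# and one of two gold pairs is found.
gold = [("Bhiət", "ngiò"), ("Bhiət", "sio")]
predicted = [("Bhiət", "ngiò"), ("Bhiət", "dang")]
print(precision_recall(predicted, gold))
```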
5.4.3 Analysis
The procedure previously described harvested 36,666 unique character-verb pairs.
We analyze the most significant characters (Section 5.4.3.1) and the most frequent
verbs associated with each character (Section 5.4.3.2).
5.4.3.1 Character Distribution
Table 5.3 lists the ten most frequent characters.6 The founder of Buddhism,
Śākyamuni Buddha, is the dominant character, serving as the nominal subject
81% of the time.7 Four of his prominent disciples – Subhūti, Ānanda, Śāriputra,
and Maudgalyāyana – trail him, at much lower frequencies. Ranked fourth is a
bodhisattva, Mañjuśrī. Bodhisattvas are “enlightened beings” who, out of compassion, delay their entry into nirvāṇa in order to aid others. In contrast, Pratyekabuddha aims to attain nirvāṇa for himself rather than helping others. The three
remaining characters among the top ten are Mahāyānadeva (602−664), a prolific
translator of Buddhist scriptures into Chinese; Sudhana (also known as the Child
of Wealth), an acolyte of Bodhisattva Avalokiteśvara; and Devadatta, a cousin and
rival of Śākyamuni, who cultivated magical powers.
To compare our method with scholarly work, we consulted the glossary items
in the book Buddhism: A Modern Perspective (Prebish 2000). There are 50 characters among these glossary items that are listed in the Person Authority Database (DDBC 2008a). Our method ranked 48% of these characters among our top 50
characters. Among those ranked below 50 were Yaśodharā 耶輸陀羅, the wife
of Śākyamuni, and Viśākhā 毘舍佉, a prominent follower. Other omissions consisted mainly of monks, writers, and kings who played important roles in the history of Buddhism but did not appear in the Buddhist Canon.
Table 5.3 Most Frequent Characters (As Nominal Subjects) in the Corpus
Character            Chinese       Frequency
Śākyamuni Buddha     釋迦牟尼佛    81.4%
Ānanda               阿難          2.8%
Subhūti              須菩提        1.9%
Mañjuśrī             文殊菩薩      1.5%
Śāriputra            舍利弗        1.4%
Pratyeka-buddha      辟支佛        1.0%
Maudgalyāyana        目犍連        0.6%
Mahāyānadeva         玄奘          0.4%
Devadatta            提婆達多      0.3%
Sudhana              善財童子      0.3%
The high-ranking characters in our list that are not mentioned in Prebish (2000)
were mostly deities, such as Śakra 帝釋天, and translators, such as Amoghavajra
不空 (705−774, ranked 15th). Two other notable characters excluded from the
glossary were Aniruddha 阿那律 (ranked 12th), one of the ten principal disciples of Buddha, and Channa 闡那 (ranked 25th), the head charioteer of Prince
Siddhārtha.
As discussed in Section 5.2.1, the Canon has two main subcorpora, containing material from two different religious traditions, known as Mahāyāna and
Hīnayāna. Not all characters are equally represented in both subcorpora. The
log-likelihood metric can identify words that can best discriminate between the
two subcorpora (Rayson 2008): words with high log-likelihood scores tend to be
emphasized by one subcorpus but rarely mentioned in the other. We computed this
metric on all character names.
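As an illustration, the sketch below computes the log-likelihood score in the two-corpus form commonly used in corpus linguistics: LL = 2 (a ln(a/E1) + b ln(b/E2)), where a and b are the frequencies of a name in the two subcorpora, c and d are the subcorpus sizes, E1 = c(a + b)/(c + d), and E2 = d(a + b)/(c + d). Whether the authors used exactly this variant is not stated, and the function name and all counts below are hypothetical.

```python
import math


def log_likelihood(a, b, c, d):
    """Two-corpus log-likelihood score for one item.

    a, b: observed frequency of the item in subcorpus 1 and subcorpus 2.
    c, d: total size of subcorpus 1 and subcorpus 2 (same unit as a, b).
    """
    # Expected frequencies if the item were spread evenly by corpus size.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    # A zero observed count contributes nothing (limit of x * ln x is 0).
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


# Hypothetical counts: a name concentrated in one subcorpus scores
# higher than a name spread evenly across both.
skewed = log_likelihood(36, 1, 1000, 1000)
even = log_likelihood(20, 18, 1000, 1000)
print(skewed > even)
```

The score is zero when a name occurs in exact proportion to the subcorpus sizes and grows as its distribution becomes more lopsided, which is why it is suited to finding characters emphasized by only one tradition.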
The character who yielded the highest log-likelihood score was Subhūti, one of
Buddha’s disciples. In the Mahāyāna section, Subhūti is second only to Buddha
in terms of frequency. Known for his knowledge of “emptiness,” one of the most
important Mahāyāna doctrines, Subhūti played a significant role in the exposition
of the Mahāyāna scriptures, especially the prajñā-pāramitā sūtras 般若經; his role
in the Hīnayāna is much less important (Table 5.4).
The second highest-scoring character was Mañjuśrī, one of the best-known
bodhisattvas. The concept of bodhisattva was a major doctrinal difference between
the two subcorpora. As summarized by I Tsing 義淨 (635−713), a seventh-century Chinese monk, “those who venerate the bodhisattvas and read the Mahāyāna
sūtras are called the Mahāyānists, while those who do not perform these are called
the Hīnayānists.” In the Mahāyāna section, Mañjuśrī features prominently, ranking third, behind only Buddha and Subhūti. Reflecting the doctrinal differences,
however, he receives less prominence (0.2%) in the Hīnayāna section (Table 5.4).
Table 5.4 Most Frequent Characters in the Two Subcorpora of the Canon
Mahāyāna Character    Frequency    Hīnayāna Character    Frequency
Śākyamuni Buddha      80.9%        Śākyamuni Buddha      83.8%
Subhūti               3.6%         Ānanda                4.4%
Mañjuśrī              2.3%         Śāriputra             1.2%
Ānanda                1.9%         Maudgalyāyana         1.0%
Śāriputra             1.7%         Mahāyānadeva          0.5%
5.4.3.2 Verb Distribution
Having identified the main characters, we now analyze what they did, based on
the most common verbs that portray the characters in action. We first examine the
verb profile of the protagonist against those of other characters. We then show
how the profiles of three of the top ten characters suggest their contrasting roles
in the corpus.
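A verb profile of this kind can be sketched as a table of relative frequencies. The example below is illustrative only: it assumes that the character-verb pairs are already available as (character, verb) tuples, and the function name and toy data are hypothetical.

```python
from collections import Counter, defaultdict


def verb_profiles(pairs, top_n=5):
    """Return, for each character, its top_n verbs with relative frequencies.

    pairs: iterable of (character, verb) tuples, one per extracted pair.
    """
    counts = defaultdict(Counter)
    for character, verb in pairs:
        counts[character][verb] += 1
    profiles = {}
    for character, verb_counts in counts.items():
        total = sum(verb_counts.values())
        # Share of each verb among all verbs governed by this character.
        profiles[character] = [
            (verb, n / total) for verb, n in verb_counts.most_common(top_n)
        ]
    return profiles
```

Frequencies are normalized per character, so the profiles of a very frequent character such as Buddha and of a rare one remain comparable.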
Figure 5.2 lists the ten most frequent verbs in our corpus. The saying verbs
dominate this list, occupying the top three positions – yán, “to speak”; shuō, “to
talk”; and gào, “to tell.” Their prominence reflects the Canon as the remembered
word of the Buddha. When shuō, “to talk,” takes an object with Buddha as its subject, the object is frequently
“sūtra” (jīng 經), the “dharma” (fǎ 法), or “verse” (jì 偈), the form in
which much of the Canon was written to facilitate memorization and chanting.
This usage reinforces the authority of the texts by emphasizing that Śākyamuni himself
“spoke” them. He not only gave sermons but also engaged in dialogues with many
characters. The other two verbs, yán and gào, often serve as quotative verbs to
introduce direct speech.
As shown in Table 5.5, the saying verbs dominate all characters’ verb profiles –
yán, shuō, and gào for Buddha, and yán, bái, shuō for the other characters. Three
verbs co-occur significantly more with other characters than with Buddha. The
first one is bái, ranked second for other characters but conspicuously absent in
the list for Buddha (Table 5.5). The uneven distribution is due to its honorific
usage: bái is a quotative verb for reporting speech from an inferior to a superior
(Kieschnick 2015). The second verb is fèng, “to receive (order)” (ranked fifth),
which acknowledges those who served the imperial court to translate the Buddhist texts into Chinese from their sources in Indic languages; Buddha himself,
however, never engaged in this activity. Another contrastive verb is wèn, “to ask”
(ranked fourth), which suggests that more questions flowed from other characters
to Buddha, rather than the other way around.
Conversely, two verbs, both dealing with locations, are more connected with
Buddha than with the other characters. The frequent appearances of zài, “dwell
in,” and zhù, “stay in” (Table 5.5, left column), result from the meticulous recording of the venues where Buddha delivered his sermons, including the formulaic
sentence “Once upon a time, Buddha dwelled in (zài) so-and-so” that appears in
the preamble in a majority of the sūtras.
[Figure 5.2: bar chart of relative frequency (axis from 0% to 12%) for ten verbs: 言 yán ‘to speak’, 說 shuō ‘to talk’, 告 gào ‘to tell’, 爲 wéi ‘to do’, 在 zài ‘to dwell’, 白 bái ‘to address’, 無 wú ‘not exist’, 有 yǒu ‘to have’, 住 zhù ‘to stay’, 知 zhī ‘to know’; bar values range from 10.1% to 1.2%.]
Figure 5.2 Most frequent verbs with nominal subjects.
Table 5.5 Most Frequent Verbs with Buddha (Left) and Other Characters (Right) as Subject
Buddha                           Other Characters
Verb                    Freq.    Verb                            Freq.
言 yán “to say”         10.0%    言 yán “to speak”               10.6%
說 shuō “to speak”      10.0%    白 bái “to address”             9.0%
告 gào “to tell”        8.7%     說 shuō “to talk”               3.7%
爲 wéi “to do”          2.2%     問 wèn “to ask”                 2.9%
在 zài “to dwell in”    2.1%     奉 fèng “to receive (order)”    2.5%
無 wú “not have”        1.9%     無 wú “not have”                1.6%
知 zhī “to know”        1.6%     爲 wéi “to do”                  1.6%
住 zhù “to stay in”     1.5%     作 zuò “to make”                1.5%
有 yǒu “to have”        1.2%     答 dá “to reply”                1.5%
得 dé “to attain”       1.2%     行 xíng “to practice”           1.4%
Table 5.6 Most Frequent Verbs of Three Different Characters
Pratyeka-Buddha                Mahāyānadeva                     Ānanda
Verb                  Freq.    Verb                    Freq.    Verb                   Freq.
無 wú “not have”      4.7%     奉 fèng “to receive”    51.0%    白 bái “to address”    14.0%
得 dé “to attain”     3.3%     譯 yì “to translate”    0.3%     言 yán “to speak”      6.1%
有 yǒu “to have”      3.2%     論 lùn “to discuss”     0.2%     問 wèn “to ask”        3.6%
知 zhī “to know”      2.9%     弘 hóng “to spread”     0.2%     聞 wén “to hear”       3.3%
作 zuò “to make”      2.8%     言 yán “to speak”       0.2%     知 zhī “to know”       2.8%
We further contrast the verb profiles of three of the top ten characters:
Ānanda, Mahāyānadeva, and Pratyeka-buddha (Table 5.6). Ānanda was
Buddha’s personal attendant and closest disciple. His most frequent verb is
bái, “to address,” an honorific quotative verb for reporting his conversations
with Buddha. While bái also co-occurs often with other disciples, the verb wén,
“to hear,” distinguishes Ānanda from them. By tradition, he was the one who
heard all of Buddha’s sūtras and later recited them to be canonized. As a result,
Ānanda is also called the one “who heard much,” a name corroborated by the
high frequency of wén.
For Mahāyānadeva, the most common verbs are yì, “to translate,” and fèng, “to
receive (order).” Unlike Ānanda and most other characters, he had hardly
any verbal interactions. Indeed, Mahāyānadeva was an eminent translator of Buddhist scriptures. It is often said that he “received an order” from the Chinese emperor
and “translated” the scriptures for the Chinese to read.
The Pratyeka-buddha, “lone buddha,” is one who seeks enlightenment for
himself and does not bring others to it; he also has a distinctive verb profile.
The Mahāyāna subcorpus (Section 5.4.2) tends to cast him in an unfavorable
light. His second most frequent verb, dé, “to attain,” mostly collocates with fǎ,
“dharma,” to form the phrase “attain (some) dharma.” A quarter of the occurrences of his fourth
most frequent verb, zhī, “to know,” are negated with bù to yield “not able to
know.”
5.5 Characters and Locations
We now move to the analysis of “where” to examine prominent toponyms
that indicate the scene of action for various characters. Raw frequencies again
do not suffice for this task, since not every mention of a toponym serves this
purpose. For example, “Ganges River” is disproportionately frequent, even
though it is usually not tied to any character. Rather, it appears most of the time
in the phrase “the sands of the Ganges River,” a stock expression to convey
innumerability.8
We instead construct a set of character-toponym pairs by identifying instances
of toponyms that indicate the characters’ whereabouts. After outlining our algorithm (Section 5.5.1), we report an evaluation (Section 5.5.2) and present an analysis of the most frequent toponyms (Section 5.5.3).
5.5.1 Approach
Baseline
We take all personal and geographical names that occur within the same sentence
as character-toponym pairs. This baseline requires no POS tagging or dependency
parsing.
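The baseline amounts to pairing every personal name with every toponym in the same sentence. The sketch below illustrates this under stated assumptions: each sentence is a list of (token, label) tuples produced by the NER step, and the labels "PER" and "LOC" are hypothetical names chosen for the example.

```python
from itertools import product


def baseline_pairs(sentence):
    """Pair every personal name with every toponym in one sentence.

    sentence: list of (token, label) tuples; "PER" marks a personal name,
    "LOC" a toponym, and any other label is ignored.
    """
    persons = [token for token, label in sentence if label == "PER"]
    places = [token for token, label in sentence if label == "LOC"]
    return list(product(persons, places))
```

For a toy sentence such as `[("佛", "PER"), ("在", "O"), ("王舍城", "LOC")]`, the function returns the single pair of the name and the toponym. A sentence without both a name and a toponym yields no pairs.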
POS-Based Approach
We retrieved the verbs and prepositions that immediately precede the toponyms in
our corpus. These verbs and prepositions are rather limited in variety: the two most
common verbs, zài, “to dwell in,” and zhù, “to stay in,” account for almost a third
of the instances, while the most frequent preposition, yú, “at,” constitutes more than
half of the instances. We treated all verbs and prepositions whose frequency
exceeds 0.1% as location markers. We extracted a personal name and a toponym
as a character-toponym pair when one of these markers occurs between them.
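The two steps of the POS-based approach can be sketched as follows. This is an illustration under assumptions, not the implementation used in the study: tokens are dictionaries with hypothetical keys `form`, `pos` and `ner`; the tags "V" and "P" stand for verbs and prepositions; and the 0.1% threshold is taken relative to all verbs and prepositions that immediately precede a toponym.

```python
from collections import Counter


def tok(form, pos, ner="O"):
    """Build one token record (hypothetical format for this example)."""
    return {"form": form, "pos": pos, "ner": ner}


def location_markers(sentences, threshold=0.001):
    """Collect verbs and prepositions that immediately precede a toponym."""
    counts = Counter()
    for sentence in sentences:
        for previous, current in zip(sentence, sentence[1:]):
            if current["ner"] == "LOC" and previous["pos"] in {"V", "P"}:
                counts[previous["form"]] += 1
    total = sum(counts.values())
    # Keep only markers whose relative frequency exceeds the threshold.
    return {form for form, n in counts.items() if n / total > threshold}


def pos_based_pairs(sentence, markers):
    """Pair a name and a toponym when a location marker lies between them."""
    pairs = []
    for i, person in enumerate(sentence):
        if person["ner"] != "PER":
            continue
        for j, place in enumerate(sentence):
            if place["ner"] != "LOC":
                continue
            low, high = sorted((i, j))
            between = sentence[low + 1:high]
            if any(t["form"] in markers for t in between):
                pairs.append((person["form"], place["form"]))
    return pairs
```

Because a marker must intervene, a name and a toponym that merely share a sentence are not paired, which is what makes this approach stricter than the baseline.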
Dependency Approach
A toponym typically appears in one of two kinds of dependency structures. It may
serve as the direct object of a verb; for example, in Figure 5.3, Rājagṛha is the
direct object of the locative verb dzhə̌i, “to dwell in.” It may also serve as the
object of a preposition; for example, in Figure 5.4, Rājagṛha is the object of the
preposition dzhiong, “from.” The location can be a simple proper noun or one
that modifies a localizer (e.g., “the north of Gṛdhrakūṭa”).9 We retrieved
all verbs and prepositions in our corpus that take a toponym as the direct object
or prepositional object (Figures 5.5, 5.6). As in the POS-based approach, we
considered all verbs and prepositions whose frequency exceeds 0.1% as location
markers.10
[Figure 5.3: dependency tree for a sentence glossed word by word as “this reason Buddha usually dwell.in Rājagṛha.”]
“For this reason, Buddha usually dwelled at Rājagṛha.”
Figure 5.3 Dependency tree with a character-toponym pair involving a verb.
[Figure 5.4: dependency tree for a sentence glossed word by word as “that time Bhagavat from Rājagṛha depart.”]
“At that time, Bhagavat departed from Rājagṛha.”
Figure 5.4 Dependency tree with a character-toponym pair involving a preposition.
[Figure 5.5: bar chart of relative frequency (axis from 0% to 35%): zài 在 ‘dwell in’ 30.60%; zhù 住 ‘stay in’ 17.42%; zhì 至 ‘arrive at’ 5.46%; rù 入 ‘enter’ 3.20%; yì 詣 ‘go to’ 2.35%; zhù 住 ‘go to’ 2.00%; dào 到 ‘reach’ 1.96%; huán 還 ‘return’ 1.83%; yóu 遊 ‘travel to’ 1.55%; xiàng 向 ‘face’ 1.18%.]
Figure 5.5 Most frequent verbs that take toponyms as direct objects.
[Figure 5.6: bar chart of relative frequency (axis from 0% to 60%): yú 於 ‘at’ 54.50%; zài 在 ‘at’ 7.80%; zhì 至 ‘to’ 7.00%; cóng 從 ‘from’ 5.80%; yú 于 ‘at’ 1.80%.]
Figure 5.6 Most frequent prepositions that take toponyms as prepositional objects.
Then, we examined the character-verb pairs collected in Section 5.4 and identified those that fall into one of the following two cases. First, if the verb is a location marker and its direct object is a location, then the character and the location
are considered a character-toponym pair. For example, Figure 5.3 contributes
the pair “Bhiət, Hiuangshiàzhiɛng” (“Buddha, Rājagṛha”). Second, if the verb is
modified by a preposition that is a location marker and its prepositional object
is a location, then the character and the location are also included as a character-toponym pair. From Figure 5.4, for example, we obtained the pair “Shiɛìtzuən,
Hiuangshiàzhiɛng” (“Bhagavat, Rājagṛha”).
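The two cases can be sketched over a parsed sentence as follows. The token format and the relation labels (`nsubj`, `dobj`, `prep`, `pobj`) are assumptions made for this illustration and may differ from the label set of the treebank; the marker sets would come from the frequency threshold described above.

```python
def dependency_pairs(tokens, verb_markers, prep_markers):
    """Extract (character, toponym) pairs from one parsed sentence.

    tokens: list of dicts with hypothetical keys id, form, head, deprel, ner.
    """
    children = {}
    for token in tokens:
        children.setdefault(token["head"], []).append(token)

    pairs = []
    for verb in tokens:
        dependents = children.get(verb["id"], [])
        subjects = [d for d in dependents
                    if d["deprel"] == "nsubj" and d["ner"] == "PER"]
        if not subjects:
            continue
        places = []
        # Case 1: a location-marker verb with a toponym as direct object.
        if verb["form"] in verb_markers:
            places += [d for d in dependents
                       if d["deprel"] == "dobj" and d["ner"] == "LOC"]
        # Case 2: a location-marker preposition modifying the verb,
        # with a toponym as its prepositional object.
        for prep in dependents:
            if prep["deprel"] == "prep" and prep["form"] in prep_markers:
                places += [g for g in children.get(prep["id"], [])
                           if g["deprel"] == "pobj" and g["ner"] == "LOC"]
        pairs += [(s["form"], p["form"]) for s in subjects for p in places]
    return pairs
```

Unlike the POS-based sketch, this version requires the name to be the subject of the verb that governs the toponym, directly or through a preposition, so a name that only appears nearby is not paired.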
5.5.2 Evaluation
5.5.2.1 Data
Since the L&K Treebank contains too few toponyms for a reliable evaluation, we
expanded it with Pi nai yeh (Vinaya) (K936), an important text on vinaya.
An annotator with a background in Medieval Chinese studies first identified all
toponyms in K936 and then examined whether each indicates the location of a person.
Of the 663 instances of toponyms, 201 were included in a character-toponym pair.
5.5.2.2 Results
The precision and recall on the test set are shown in Table 5.7. The baseline gave a
strong performance, at 0.99 precision and 0.73 recall. The use of toponyms in the
test set was rather regular: they almost always indicate the location of a character
if one is present in the same sentence. The main source of error for recall is the
distance between the character and the toponym. About 15% of the gold character-toponym pairs had personal names and toponyms in different sentences.11 We
restricted the retrieval of character-toponym pairs within the same sentence to
emphasize precision, which is important for our analysis.
Table 5.7 Precision and Recall in Character-Toponym Pair Extraction in the Test Set
Information Extraction Method    Precision    Recall
Baseline                         0.99         0.73
POS-based                        1.00         0.66
Dependency-based                 1.00         0.66
Unlike the case of character-verb pair extraction, POS and dependencies did
not improve the overall performance. Both the POS-based and dependency-based
approaches attained 100% precision. Their more stringent requirements on lexical
choice and linguistic structure, however, led to a degradation in recall to 0.66. The
most common errors were due to verbs that were not considered location markers
because of their relative infrequency.12
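For reference, precision and recall over extracted pairs can be computed as in the minimal sketch below. It assumes that predicted and gold pairs are hashable items, for example tuples that also carry a sentence identifier so that repeated pairs stay distinct; the function name and toy data are hypothetical.

```python
def precision_recall(predicted, gold):
    """Precision and recall of predicted pairs against gold pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of extracted pairs that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold pairs that were extracted.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

A method that extracts only two of three gold pairs, and nothing else, scores a precision of 1.0 and a recall of 2/3, the same trade-off between strictness and coverage as in Table 5.7.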
5.5.3 Analysis
The previous procedure yielded 1,113 unique character-toponym pairs. Figure 5.7
lists the most frequently mentioned locations that are associated with a character. The top entries are all places where Buddha was active, involving Śrāvastī,
Rājagṛha, or a more specific location in their environs.
The second-ranked location, Śrāvastī, was the city where Buddha spent
much of his monastic life. This city was in turn located in Kosala (ranked
ninth). During the nine months of favorable weather in northeast India, Buddha
and his disciples wandered from place to place to teach. During the monsoon
season, they retreated to a monastery, where Buddha taught and gave discourses.
They spent most of these seasons in Śrāvastī, at a monastery at Jetavana
(ranked first). Rājagṛha (ranked third), another major location of Buddha’s
preaching, and two other places at or near this city are also frequently named. One
was the Bamboo Grove 大林精舍, the first Buddhist monastery, where Buddha
often stayed during the winter. The other was Vulture’s Peak 靈鷲山, where Buddha delivered the celebrated prajñā-pāramitā sūtras, among many others.
Of the five remaining places, two played significant roles at the beginning and end
of Buddha’s ministry: Vaiśālī, where he preached his last sermon before his
death and announced his “great nirvāṇa,” and Varanasī, where Buddha gave
his first sermon in Sarnath 鹿野苑. Buddha also established several early monastic
precincts in Kauśāmbī. The remaining one, “Buddha’s place” 佛所 (ranked eighth), is in contrast not a
geographical location connected with the historical Śākyamuni: the disciples are often said to be at “Buddha’s place”
and are rarely reported independently to be at a geographical location.
[Figure 5.7: bar chart of relative frequency (axis from 0% to 35%) for ten toponyms: Jetavana, Śrāvastī, Rājagṛha, Bamboo Grove, Vulture’s Peak, Vaiśālī, Varanasī, Buddha’s place, Kosala, and Kauśāmbī; bar values range from 31.7% to 1.1%.]
Figure 5.7 The ten toponyms most frequently mentioned with a character.
For a comparison with scholarly work, we again turned to the glossary items
in Prebish (2000). The glossary items include 16 toponyms that are listed in the
Place Authority Database (DDBC 2008b). Our top 16 list covered 9 out of these
16 toponyms (56%). Most of the omitted toponyms refer to locations that were
not mentioned in the Canon but played significant roles in Buddhist history, for
example, Bodh-gayā, Koliya and Licchavi. A number of high-ranking locations in
our list, such as Śrāvastī and Vulture’s Peak, are not items in their own right but
referenced in other items. A few other locations, such as Varanasī and Trāyastriṃśa
忉利天, occur frequently in our dataset but were not deemed significant enough
to be glossary items in Prebish (2000).
Different characters vary in terms of their range of locations, just as they do
with respect to verb profiles. Buddha’s most frequent toponyms – topped by
Jetavana, Śrāvastī, and Rājagṛha (leftmost column, Table 5.8) – largely overlap
with those in Figure 5.7, given his dominance in the Canon. Toponyms associated with “Bhagavat,” one of Buddha’s other epithets, are also similar (third
column, Table 5.8). The epithet “Tathāgata,” however, is distinctive in being
less often associated with places of Śākyamuni’s ministry, such as Śrāvastī and
Rājagṛha (second column, Table 5.8). Under this epithet, he is often said to be
at the place of Dīpaṃkara Buddha, “Lamp-bearer Buddha” 然燈佛所 (ranked
third), one of the so-called celestial buddhas who reached enlightenment eons
before Śākyamuni. Finally, the toponyms of other characters in the Canon also
differ significantly. They are more frequently said to be at “Buddha’s place”
and “Bhagavat’s place” (rightmost column, Table 5.8) rather than any specific
geographical location.
Table 5.8 Most Frequent Places Associated with the Three Major Epithets of Buddha and with Other Characters
Buddha                   Tathāgata                            Bhagavat             All Other Characters
Location        Freq.    Location                    Freq.    Location    Freq.    Location                Freq.
Jetavana        37.8%    Varanasī                    18.4%    Śrāvastī    20.1%    Buddha’s place          19.4%
Śrāvastī        30.4%    Kuśinagara                  10.0%    Jetavana    14.1%    Bhagavat’s place        9.5%
Rājagṛha        8.6%     Dīpaṃkara Buddha’s place    9.5%     Rājagṛha    11.5%    Śrāvastī                6.7%
Bamboo Grove    4.2%     Śrāvastī                    5.8%     Varanasī    7.9%     Varanasī                5.4%
Vaiśālī         3.3%     Jetavana                    4.2%     Vaiśālī     7.3%     Jetavana at Śrāvastī    3.9%
5.6 Conclusions
Information extraction from historical text is challenging because of the lack
of training data. This chapter investigated whether and how to exploit a limited
amount of in-domain training data to improve extraction accuracy and presented a
case study on data-driven profiling of the characters and toponyms in the Chinese
Buddhist Canon, the largest collection of Medieval Chinese texts. We applied
lexicon-based NER on the Canon and then extracted significant character-verb
and character-toponym associations.
The contribution of this chapter is threefold. First, we have created the largest Medieval Chinese corpus to date that is automatically annotated with named
entities, as well as associations between characters, verbs, and toponyms.13 Second, we have shown that even a relatively small amount of in-domain linguistic
annotation is useful for this kind of analysis. In our case, a word segmenter, POS
tagger, and dependency parser trained on 50,000 Chinese characters were able to
improve accuracy in NER and the extraction of character-verb pairs. These results
suggest that future analyses on historical corpora in low-resource languages may
also benefit from annotation on a similar scale. Third, we illustrated the utility
of these annotations with a quantitative analysis of “who,” “what,” and “where”:
who the characters were and how they reflect doctrinal differences within the
Canon; what they did, as gleaned from their verb profiles; where they were and
how characters vary in their association with locations.
This research can be extended in a number of directions. Using manual annotation resulting from this study, NER extraction accuracy can potentially be further
improved by bootstrap learning (e.g., Wang et al. 2018) and distant supervision
(Mintz et al. 2009). Likewise, word segmentation, POS tagging, and dependency
annotation performance may also benefit from domain adaptation from modern
Chinese resources (Song and Xia 2014). It would also be interesting to expand the
scope from “who,” “what,” and “where” to a full “sketch” (Kilgarriff et al. 2014)
of characters and places, and from the Chinese Buddhist Canon to other seminal
works of literature whose size precludes manual analysis.
Notes
1 This book chapter is an extension of a conference paper presented at the 24th Australasian Document Computing Symposium (Wong and Lee 2019).
2 We used the Chinese Canon since many Buddhist texts, especially those from the
Mahāyāna tradition, survive only in their Medieval Chinese translation.
3 The Mahāyāna section runs from K1 to K646, the Hīnayāna from K647 to K978.
4 For example, 不可思議 biətkɑ̌siə’ngyɛ̀ is listed in the dictionary as the name of a monk in the
T‘ang period; it also happens to be a common adjective meaning “inconceivable.” Consider,
for example, the sentence 讚歎阿彌陀佛不可思議功德。(K192) Tz ǹ t ǹ Qɑmiɛdhɑbhiət
biətkɑ̌siə’ngyɛ̀ gungdək, “Praise the unimaginable merit of Amitabha Buddha.”
5 We did so by examining whether the lexicon provides the “original language” for the
term; for example, the entry for Shìjiāmóunífó 釋迦牟尼佛 includes the Sanskrit original Śākyamuni, while those for Kŏngqiū 孔丘, “Confucius,” and the monk Biətkɑ̌siə’ngyɛ̀
不可思議 do not.
6 Alternative names for a character are considered the same character. In the Personal
Authority Database, a character can be referred to by different names or epithets. Buddha, for example, can be called fó, “Buddha”; rúlái, “Tathāgata”; and shìzūn, “Bhagavat.” We combine these statistics into the same character.
7 When in plural, the term “Buddha” refers not to Śākyamuni but to any Buddha among
the myriads that populate the cosmos or belong to the previous lineage of teachers that
led to Śākyamuni. Hence, we excluded all names with the plural marker 諸, zhū (e.g.,
諸佛, zhūfó, “buddhas”).
8 For example, Subhūti is not depicted to be at Ganges River despite the co-occurrence
of the two words in the sentence “須菩提,如恒河中所有沙數。” “Subhūti! As
numerous as the sands of the Ganges River.”
9 In addition, we included the localizer involving the word 所, suŏ, “place,” which is
common in our corpus. The word suǒ is typically modified by a personal name, for
example, 佛所, fó suŏ, “place of Buddha.”
10 A number of verbs, though frequently associated with geographical names, do not
indicate the location of its object; for example, míng, 名, “call,” is used for presenting place-names. Further, a number of prepositions, though frequently associated with
geographical names, do not indicate the location of a person; for example, rú, 如, “such
as,” is used in citing examples. We removed these verbs and prepositions from our list.
11 For example, the character Ānanda and the toponym Jetavana appear in different sentences in this passage:
(K936) “At that time, (after) Saint Ānanda had preached with Kiæpbiət,
he left (his) seat. (He) gradually begged (for food) and arrived at Jetavana at Śrāvastī.”
12 E.g., the directional verb shàng, 上, “ascend,” in the sentence Njiuləi zhiǎng
S mzhips mten’iɛ̀m, 如來上三十三天焰.
13 This corpus, and other relevant resources, are accessible publicly at http://mega.lt.cityu.edu.hk/~tswong/tkt/.
References
Agarwal, Apoorv, Augusto Corvalan, Jacob Jensen, and Owen Rambow. 2012. “Social
Network Analysis of Alice in Wonderland.” In Proceedings of the Workshop on Computational Linguistics for Literature, edited by David Elson, Anna Kazantseva, Rada
Mihalcea, and Stan Szpakowicz, 88–96. Stroudsburg, PA: Association for Computational Linguistics. https://aclanthology.org/W12-2513/.
Agarwal, Apoorv, and Owen Rambow. 2010. “Automatic Detection and Classification of
Social Events.” In Proceedings of the 2010 Conference on Empirical Methods in Natural
Language Processing (EMNLP’10), edited by Hang Li and Luís Màrquez, 1024–34.
Stroudsburg, PA: Association for Computational Linguistics. https://aclanthology.org/D10-1100/.
Bamman, David, Brendan O’Connor, and Noah A Smith. 2013. “Learning Latent Personas of Film Characters.” In Proceedings of the 51st Annual Meeting of the Association
for Computational Linguistics (ACL), edited by Hinrich Schuetze, Pascale Fung, and
Massimo Poesio, 352–61. Stroudsburg, PA: Association for Computational Linguistics.
https://aclanthology.org/P13-1035/.
Bingenheimer, Marcus, Jen-Jou Hung, and Simon Wiles. 2009. “Markup Meets GIS – Visualizing the ‘Biographies of Eminent Buddhist Monks’.” In Proceedings Information Visualization IV 2009, edited by Ebad Banissi et al., 550–4. Danvers, MA: The Institute of
Electrical and Electronics Engineers, Inc. www.computer.org/csdl/proceedings-article/
iv/2009/3733a550/12OmNvnOwrT.
Bornet, Cyril, and Frédéric Kaplan. 2017. “A Simple Set of Rules for Characters and Place
Recognition in French Novels.” Frontiers in Digital Humanities 4, no. 6: 21. www.frontiersin.org/articles/10.3389/fdigh.2017.00006/full.
Büchler, Marco, Annette Geßner, Thomas Eckart, and Gerhard Heyer. 2010. “Unsupervised Detection and Visualisation of Textual Reuse on Ancient Greek Texts.” Journal of
the Chicago Colloquium on Digital Humanities and Computer Science 1, no. 2: 1–17.
Chang, Pi-Chuan, Michel Galley, and Christopher D. Manning. 2008. “Optimizing Chinese Word Segmentation for Machine Translation Performance.” In Proceedings of the
Third Workshop on Statistical Machine Translation, edited by Chris Callison-Burch and
Philipp Koehn, 224–32. Stroudsburg, PA: Association for Computational Linguistics.
https://dl.acm.org/doi/10.5555/1626394.1626430.
Chang, Pi-Chuan, Huihsin Tseng, Dan Jurafsky, and Christopher D. Manning. 2009. “Discriminative Reordering with Chinese Grammatical Relations Features.” In Proceedings
of the Third Workshop on Syntax and Structure in Statistical Translation (SSST-3), edited
by Dekai Wu and David Chiang, 51–59. Stroudsburg, PA: Association for Computational Linguistics.
Chu Chia-ning 竺家寧. 1996. Vocabulary of Buddhist Sutras of the West Chin Dynasty 《
早期佛經詞彙研究:西晉佛經詞彙研究》 (Technical Report). Taipei: National Science Council.
Chu Chia-ning 竺家寧. 1998. The Lexicology of Buddhist Sutra in Ancient China (II) –
Three Kingdoms (Technical Report). Taipei: National Science Council.
Chu Chia-ning 竺家寧. 1999. The Lexicology of Buddhist Sutra in Ancient China (III) –
Eastern Han Dynasty 《早期佛經詞彙研究:東漢佛經詞彙研究》 (Technical
Report). Taipei: National Science Council.
Clement, Tanya E. 2008. “ ‘A Thing Not Beginning and Not Ending’: Using Digital Tools
to Distant-read Gertrude Stein’s The Making of Americans.” Literary and Linguistic
Computing 23, no. 3: 361–81.
DDBC. 2008a. “Buddhist Studies Person Authority Databases (Beta Version).” Buddhist
Studies Authority Database Project, Dharma Drum Buddhist College. http://authority.
ddbc.edu.tw/person/. Accessed 14 September 2019.
DDBC. 2008b. “Buddhist Studies Place Authority Databases (Beta Version).” Buddhist
Studies Authority Database Project, Dharma Drum Buddhist College. http://authority.
ddbc.edu.tw/place/. Accessed 14 September 2019.
Doddington, George, Alexis Mitchell, Mark Przybocki, Lance Ramshaw, Stephanie Strassel, and Ralph Weischedel. 2004. “The Automatic Content Extraction (ACE) Program –
Tasks, Data, and Evaluation.” In Fourth International Conference on Language
Resources and Evaluation Proceedings (LREC 2004), edited by Maria Teresa Lino,
Maria Francisca Xavier, Fátima Ferreira, Rute Costa, and Raquel Silva, 837–40. Paris:
European Language Resources Association. https://aclanthology.org/L04-1011/.
Elson, David K., Nicholas Dames, and Kathleen R. McKeown. 2010. “Extracting Social
Networks from Literary Fiction.” In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL2010), edited by Jan Hajič, 138–47. Stroudsburg, PA: Association for Computational Linguistics. https://aclanthology.org/P10-1015/.
Finkel, Jenny Rose, Trond Grenager, and Christopher Manning. 2005. “Incorporating Nonlocal Information into Information Extraction Systems by Gibbs Sampling.” In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, edited by
Kevin Knight, Hwee Tou Ng, and Kemal Oflazer, 363–70. Stroudsburg, PA: Association
for Computational Linguistics. https://aclanthology.org/P05-1045/.
Gutierrez, Fernando, Dejing Dou, Stephen Fickas, Daya Wimalasuriya, and Hui Zong.
2016. “A Hybrid Ontology-based Information Extraction System.” Journal of Information Science 42, no. 6: 798–820.
Holmes, David I. 1994. “Authorship Attribution.” Computers and the Humanities 28:
87–106.
Hung, Jen-Jou, Marcus Bingenheimer, and Simon Wiles. 2010. “Quantitative Evidence for
a Hypothesis Regarding the Attribution of Early Buddhist Translations.” Literary and
Linguistic Computing 25, no. 1: 119–34.
Ji, Heng, and Ralph Grishman. 2008. “Refining Event Extraction through Unsupervised
Cross-document Inference.” In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics with the Human Language Technology Conference
(HLT) of the North American Chapter of the ACL (ALT-08: HLT), edited by Kathleen
McKeown, 254–62. Stroudsburg, PA: Association for Computational Linguistics.
https://cs.nyu.edu/~hengji/CrossDocIE_Ji.pdf.
Kieschnick, John. 2015. A Primer in Chinese Buddhist Writings: Volume One: Foundations. Stanford, CA: Stanford University. https://religiousstudies.stanford.edu/people/
john-kieschnick/primer-chinese-buddhist-writings. Accessed 14 September 2019.
Kilgarriff, Adam, Vít Baisa, Jan Bušta, Miloš Jakubíček, Vojtěch Kovář, Jan Michelfeit,
Pavel Rychlý, and Vít Suchomel. 2014. “The Sketch Engine: Ten Years On.” Lexicography 1, no. 1: 7–36.
Lafferty, John D., Andrew McCallum, and Fernando C. N. Pereira. 2001. “Conditional
Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data.” In
Proceedings of the Eighteenth International Conference on Machine Learning (ICML
2001), edited by Carla E. Brodley and Andrea Pohoreckyj Danyluk, 282–89. Williamstown, MA; San Francisco, CA: Morgan Kaufmann Publishers Inc. https://repository.
upenn.edu/cgi/viewcontent.cgi?article=1162&context=cis_papers.
Lancaster, Lewis. 2010. “From Text to Image to Analysis: Visualization of Chinese Buddhist
Canon.” In Digital Humanities 2010: Conference Abstracts, edited by Elena Pierazzo,
Charlotte Tupman and Camille Desenclos, 185–7. Oxford: Office for Humanities Communication and Centre for Computing in the Humanities. https://repository.upenn.edu/
cgi/viewcontent.cgi?article=1162&context=cis_papers. Accessed 14 September 2019.
Lancaster, Lewis R., and Sung-bae Park. 1979. The Korean Buddhist Canon: A Description
Catalogue. Berkeley, CA: University of California Press. www.acmuller.net/descriptive_catalogue/. Accessed 14 September 2019.
Lee, John, and Yin Hei Kong. 2016. “A Dependency Treebank of Chinese Buddhist Texts.”
Literary and Linguistic Computing 31, no. 1: 140–51.
Lizarralde, Ignacio, Cristian Mateos, Juan Manuel Rodriguez, and Alejandro Zunino.
2019. “Exploiting Named Entity Recognition for Improving Syntactic-based Web Service Discovery.” Journal of Information Science 45, no. 3: 398–415.
McDonald, Ryan, Kevin Lerman, and Fernando Pereira. 2006. “Multilingual Dependency Analysis with a Two-stage Discriminative Parser.” In Proceedings of the Tenth
Conference on Computational Natural Language Learning (CoNLL-X), edited by Lluís
Màrquez and Dan Klein, 216–20. Stroudsburg, PA: Association for Computational Linguistics. https://aclanthology.org/W06-2932/.
Mintz, Mike, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. “Distant Supervision for
Relation Extraction without Labeled Data.” In Proceedings of the Joint Conference
of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on
Natural Language Processing of the AFNLP: Volume 2, edited by Keh-Yih Su, Jian Su,
Janyce Wiebe, and Haizhou Li, 1003–11. Stroudsburg, PA: Association for Computational Linguistics. https://aclanthology.org/P09-1113/.
Moretti, Franco. 2007. Graphs, Maps, Trees: Abstract Models for Literary History. London: Verso.
Peng, Fuchun, Fangfang Feng, and Andrew McCallum. 2004. “Chinese Segmentation and
New Word Detection using Conditional Random Fields.” In Proceedings of the 20th
International Conference on Computing Linguistics (COLING’04). Stroudsburg, PA:
Association for Computational Linguistics. https://aclanthology.org/C04-1081/.
Prebish, Charles S. 2000. Buddhism: A Modern Perspective. University Park, PA: Penn
State University Press.
Rayson, Paul. 2008. “From Key Words to Key Semantics Domains.” International Journal
of Corpus Linguistics 13, no. 4: 519–49.
Sayoud. 2012. “Author Discrimination between the Holy Quran and Prophet’s Statements.” Literary and Linguistic Computing 27, no. 4: 424–44.
Shih, Cheng-Wei, Tzong-Han Tsai, Shih-Hung Wu, Chiu-Chen Hsieh, and Wen-Lian Hsu.
2004. “The Construction of a Chinese Named Entity Tagged Corpus: CNEC1.0.” In
Proceedings of the 16th Conference on Computational Linguistics and Speech Processing, edited by Lee-Feng Chien and Hsin-Min Wang, 305–13. Taipei: The Association
for Computational Linguistics and Chinese Language Processing. https://aclanthology.
org/O04-1032/.
Song, Yan, and Fei Xia. 2014. “Modern Chinese Helps Archaic Chinese Processing: Finding and Exploiting the Shared Properties.” In Proceedings of the Ninth International
Conference on Language Resources and Evaluation (LREC’14), edited by Nicoletta
Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph
Mariani, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, 3129–36. Paris: European
Language Resources Association (ELRA). www.lrec-conf.org/proceedings/lrec2014/pdf/138_Paper.pdf.
Soothill, William Edward, and Lewis Hodous. 1937. A Dictionary of Chinese Buddhist
Terms: With Sanskrit and English Equivalents and a Sanskrit-Pali Index. Carter Lane,
EC: Kegan Paul, Trench, Trubner & Company, Limited. http://mahajana.net/texts/soothill-hodous.html. Accessed 14 September 2019.
Surdeanu, Mihai, and Massimiliano Ciaramita. 2007. “Robust Information Extraction with
Perceptrons.” In Proceedings of the NIST 2007 Automatic Content Extraction Workshop
(ACE07). Paris: European Language Resources Association (ELRA). http://surdeanu.info/mihai/papers/ace07a.pdf.
Tseng, Huihsin, Pichuan Chang, Galen Andrew, Daniel Jurafsky, and Christopher Manning. 2005. “A Conditional Random Field Word Segmenter for Sighan Bakeoff 2005.”
In Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing
(IJCNLP-05), edited by Chu-Ren Huang and Gina-Anne Levow, 168–71. Singapore:
Asian Federation of Natural Language Processing. https://aclanthology.org/I05-3027/.
van Dalen-Oskam, Karina, Jesse de Does, Maarten Marx, Isaac Sijaranamual, Katrien
Depuydt, Boukie Verheij, and Valentijn Geirnaert. 2014. “Named Entity Recognition
and Resolution for Literary Studies.” Computational Linguistics in the Netherlands 4:
121–36.
Vossen, Piek, Eneko Agirre, Nicoletta Calzolari, Christiane Fellbaum, Shu-kai Hsieh,
Chu-Ren Huang, Hitoshi Isahara, Kyoko Kanzaki, Andrea Marchetti, Monica Monachini, Federico Neri, Remo Raffaelli, German Rigau, Maurizio Tescon, and Joop Vangent. 2008. “KYOTO: A System for Mining, Structuring and Distributing Knowledge
across Languages and Cultures.” In Proceedings of the Sixth International Conference on Language
Resources and Evaluation (LREC’08), edited by Nicoletta Calzolari, Khalid Choukri,
Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, and Daniel Tapias, 1462–9.
Paris: European Language Resources Association (ELRA). https://aclanthology.org/L08-1250/.
Wang, Xiaoyu, Yujia Zhai, Yuanhai Lin, and Fang Wang. 2018. “Mining Layered Technological Information in Scientific Papers: A Semi-supervised Method.” Journal of Information Science 45, no. 6: 779–93.
Wong, Tak-sum, and John Lee. 2016. “A Dependency Treebank of the Chinese Buddhist
Canon.” In Proceedings of the Tenth International Conference on Language Resources
and Evaluation, edited by Nicoletta Calzolari, 1679–83. Paris: European Language
Resources Association. https://aclanthology.org/L16-1265/.
Wong, Tak-sum, and John Lee. 2019. “Character Profiling in Low-Resource Language
Documents.” In Proceedings of the 24th Australasian Document Computing Symposium
(ADCS 2019), edited by Gianluca Demartini and Paul Thomas, 1–4. New York: Association for Computing Machinery.
Xue, Naiwen, Fei Xia, Fu-Dong Chiou, and Martha Palmer. 2005. “The Penn Chinese Treebank: Phrase Structure Annotation of a Large Corpus.” Natural Language Engineering
11: 207–38.
Zhao, Hai, Chang-Ning Huang, and Mu Li. 2007. “An Improved Chinese Word Segmentation System with Conditional Random Field.” In Proceedings of the Fifth SIGHAN
Workshop on Chinese Language Processing, edited by Hwee Tou Ng and Olivia Oi Yee
Kwong, 162–5. Stroudsburg, PA: Association for Computational Linguistics. https://aclanthology.org/W06-0127/.
Zhou, Guodong, Min Zhang, Donghong Ji, and Qiaoming Zhu. 2007. “Tree Kernel-based
Relation Extraction with Content-sensitive Structured Parse Tree Information.” In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language
Processing and Computational Natural Language Learning (EMNLP-CoNLL), edited
by Jason Eisner, 728–36. Stroudsburg, PA: Association for Computational Linguistics.
https://aclanthology.org/D07-1076.pdf.
6
Corpora and Literary
Translation
Titika Dimitroulia
6.1 Introduction: Literary Translation, Tools, and Corpora
In her seminal paper “Corpus linguistics and Translation Studies: Implications
and applications,” which introduced corpus linguistics (CL) in translation studies
(TS) and founded the subfield of corpus-based translation studies (CBTS or CTS),
Mona Baker predicted that the use of large electronic corpora and their interrogation would reshape and expand the new discipline (1993, 235). Since then, corpora and their technology have had an ever-increasing role in both descriptive and
applied translation studies on the one hand and translation practice on the other.
Translated literary texts have been investigated in CBTS since its very emergence (Anderman and Rogers 2008, 13), while recent surveys confirm that literary
translation remains one of the most popular research fields in both TS (Van
Doorslaer and Gambier 2015; Zanettin et al. 2015) and CBTS (Granger and Laufer
2022). With the initiative of Mona Baker (1993, 2000), a large amount of research
has been done on the “regularities of translated texts, regularities of translators and
regularities of languages” (Zanettin 2014, 178) with the use of various models of
analysis (see Sun and Li 2020 and Zanettin 2017 for an overview). Corpus-based
literary translation research has progressively adopted approaches to literary translation that situate its investigation in wider sociocultural and historical settings,
grounded in the previous achievements of CBTS and using methodologies drawn
from digital humanities.
Evidence of the popularity of literary translation in the descriptive corpus-based
research domain contrasts with the scarcity of applied research on the specific use
of corpora and corpus-based translation tools in literary translation practice and
literary translator education. The use of corpus technology and computer-aided or
assisted translation (CAT) tools has long been investigated exclusively in non-literary translation practice and non-literary translator education. This is probably
due to the commonly held view that the use of technology in literary translation
practice is insignificant. The emergence of CALT (Computer-Aided or Assisted
Literary Translation) opens new paths of research in the field.
Notwithstanding this, reports on the use of corpora and CAT tools in literary
translation practice are still missing. In general surveys, such as the UK Translator
Survey 2016, which looked, among other things, into the use of tools by translators,
a specific question on corpora was included, and some useful data can be drawn
from the responses of literary translators (EC Representation in the UK,
CIOL and ITI 2017, 35–37). Recently, more specific surveys have also been conducted to investigate the interaction of literary translators with tools, including
corpora, such as Slessor’s (2020), which included 40 Canadian literary translators, and
Ruffo’s (2021), which surveyed 150 literary translators working with various language
pairs in various countries. Other surveys, such as Şahin and Gürses’s (2021) in
Turkey, have focused on specific tools, namely, machine translation (MT). As
a result, more data on the use of corpora in literary translation practice are now
available as a basis for further investigation, but much remains to be done.
DOI: 10.4324/9781003298328-7
Data from these surveys confirm that, for the time being, the relation of literary
translators to technology generally remains restricted to a minimum, consisting of
the use of word processors and the internet for communication and documentation purposes. At the same time, one of the most interesting findings in Slessor’s
(2020, 248) survey, partially confirmed by Ruffo’s data (2021, 83 and 135), is that
(user-friendly) corpus technology seems to prevail among the wishes of literary
translators as regards tools that meet their specific needs.
The technological needs of literary translators are determined by “the inseparability of form and content in literary language” (Taivalkoski-Shilov 2019, 694)
and the specific nature of the texts literary translators deal with, which “are often
characterized by a remarkable, vocal multilayeredness and deliberate ambiguity
. . . plural interpretations,” etc. (ibid., 695–696). Literary translation is an interpretative act which entails enhanced conceptual, linguistic, and (inter)cultural skills,
as well as creativity that is considered to be hindered by technology. Literary
translators put forward the very nature of their work in order to account for their
negative perceptions of CAT tools and, in particular, MT, while they willingly
adhere to non-translation-specific tools, among which are corpora, used as translation aids (Youdale and Rothwell 2022; Ruffo 2021; Slessor 2020). It is noteworthy, however, that the most popular research area among scholars interested in
tools in literary translation practice is MT (see Youdale 2020, 19–23, for a brief
overview).
It is clear that the use of corpora and corpus technology by literary translators
needs to be further studied, against the background of the increasing technologization of the translation profession, which does not leave literary
translation unaffected, and the emergence of the research field of CALT. I would like to
contend that corpus technology is an open and flexible technology that may
well respond to the needs of literary translators. Corpus use in literary translation practice can definitely enhance translators’ creativity, while (corpus-based)
translation-specific tools can also be of help for them in specific settings. Corpora are also of paramount importance for the further study of literary translation as “one of the major shaping elements in the processes of transmission of
ideas, texts and cultural practices” (Bassnett and Johnston 2019, 183). Digital
humanities (DH) methodologies and hermeneutical text analysis (Sinclair and
Rockwell 2016) can enrich CBTS approaches and cast light both on the textual
and extratextual terrains of literary translation, which lies at the heart of World
Literature (Damrosch 2003).
In what follows, I will, first, provide an outline of the field of CALT, with a
focus on the current and prospective uses of corpus technology in literary translation practice and its integration in literary translator education. On the basis
of this, I will then present some new trends in corpus-based literary translation
research, with an emphasis on perspectives that explore “the wider cultural and
historical context [of literary translation], including the circumstances of production and reception of both source and target texts” (Zanettin 2017).
6.2 CAT Tools and Corpus Technology in Literary
Translation Practice and Literary Translator Education
CAT is mainly defined today in two ways: either as encompassing all tools that
can potentially assist translators, including corpus technology, or as referring
only to translation production proper and therefore to translation memories (TM),
termbases (TB), and machine translation (MT) (Zhang and Nunes Vieira 2021;
EMT Group 2017; PACTE Group 2018; Frérot 2016; Loock 2016). The broad definition of CAT is adopted here, which includes both “active” and “passive language
technologies” (Taravella 2011), that is, “generative” and “look-up technologies”
(outils de production and outils de consultation, respectively, according to Frérot
and Karagouch 2016). Corpora and concordancers are part of the latter (ibid.).1
Zanettin reminds us that CALT was first conceptualized in the field of CBTS
by Jan-Mirko Maczewski (1996), a pioneer in corpus-based literary translation
research, “who proposed the acronym CoALiTS (Computer Assisted Literary
Translation Studies) to denote a field of research combining literary and linguistic computing with literary translation” (2017). Almost 30 years later, this field
seems to be consolidating, and corpus linguistics ranks among the CALT technologies
to be further investigated. The chart of the eight main types of translation technologies (see Figure 6.1) proposed in 1998 by the computational linguist Alan K.
Melby can be useful in situating this field.
The first seven types will be commented upon in what follows, with special
reference to corpora; the eighth one, dealing with project and billing management,
does not apply to most settings in literary translation practice.
INFRASTRUCTURE
BEFORE TRANSLATION
Term level: term candidate extraction; terminology research
Segment level: new text segmentation, previous source-target text alignment, and indexing
DURING TRANSLATION
Term level: automatic terminology lookup
Segment level: translation memory lookup; machine translation
AFTER TRANSLATION
Term level: terminology consistency check and non-allowed terminology check
Segment level: missing segment detection and format and grammar checks
TRANSLATION WORKFLOW AND BILLING MANAGEMENT
Figure 6.1 Melby’s eight types of translation technology (1998, 1): organization of the eight translation tool functions.
6.2.1 Corpora and Literary Translators’ (Complex) Needs
Melby’s infrastructure covers document creation and management systems,
telecommunications (including e-mail, web browsing, etc.), and looking up in
terminology databases (1998, 1). In other words, it refers to non-translation-specific tools, including those needed for documentation in both the reception and
formulation phase in the translation process model introduced by Holmes for the
evaluation of literary translations (1988, 84).2
“Web browsing” is related to passive language technologies, namely, the “various
search tools and databases,” which seem to be the most praised tools among translators (Koskinen and Ruokonen 2017, 106). Although corpora are included in these
technologies, it is widely admitted that they are not used systematically by non-literary
translators (Peraldi 2019; Frérot 2016; Bowker and Corpas Pastor 2015; Frankenberg-Garcia 2015). This also seems to be true of literary translators.
Slessor does not explicitly mention corpora among general technologies; thus,
only inferences about their use can be made. In Ruffo’s survey, a few literary
translators confirm that they use corpora (2021, 135). Only 6% of those who have
received technology-specific training in academic settings were trained in corpus
technology, while corpora are not mentioned at all in the case of vocational training (Ruffo 2021, 93–94).
There is no reference to the use of “robust” or “stable” corpora, in Zanettin’s
(2002) words, that is, mainly large reference corpora, which can be very useful in literary translation. It is true that they are not available for all languages, and very often
they are not easily adapted to literary translators’ needs (Gallego-Hernández 2015,
376). Even when reference corpora can be found for a language and can be employed
for the study of language in use, literary translators are probably not inclined to use
them if they are not trained to do so. As regards parallel corpora, Zanettin stresses
“the scarcity of easily available sources of parallel corpora beyond a few domains
and text types” (2012, 154). Literary parallel corpora are the hardest to find for reasons related to the greater effort needed for their creation and to copyright issues.
The absence of available reliable corpora, the lack of knowledge about their availability and interrogation, and even more, “the widespread lack of knowledge about
the very concept of corpus” (Peraldi 2019, 271) explain why literary translators,
together with their non-literary colleagues, systematically use for their linguistic
inquiries the “new generation dictionaries” on the web, such as Linguee, Reverso,
or Tradooit, among others. In Ruffo’s survey, they are occasionally characterized by
literary translators as “corpus dictionaries” (2021, 196). This means that some literary translators are aware that these resources are parallel-corpus-based applications.
It is not at all certain, however, that they are equally aware that, as a result, they
depend heavily on the quality of the corpora upon which they rely, which cannot be
easily evaluated, as they are not often directly accessible (Loock 2016, 91–92). Nor is it
obvious that they perceive how these resources differ from bilingual dictionaries,
which are “repertoires of lexical equivalents,” whereas parallel corpora are “repertoires of strategies deployed by past translators, as well as repertoires of translation
equivalents” (Zanettin 2002, 11), which may not suit their specific needs.
While 75% of participants in Slessor’s survey use Linguee, regarded as a
“translation search engine,” 93% of them use “general search engines” (2020,
244), confirming Koskinen and Ruokonen’s findings on the popularity of search
tools (2017). This means that they interrogate the web as a monolingual or comparable (and multimodal) “mega-corpus,” which, of course, is “not as balanced
and reliable as a carefully chosen collection of texts” (Kübler 2008, 28). While
the absence of reliable corpora of various types in several languages makes the
“web-as-corpus” much more important, the risks in its use remain considerable
with regard to the information’s quality (Loock 2016, 89–91).
Peraldi’s remark about most non-literary translators seems to be valid also for
literary translators:
[They] do not even realise that classic and daily-used functionalities such
as concordance searches in translation memories, the use of parallel-corpus
based applications such as Linguee, or even plain collocates search requests
on web engines already fall into corpus-based proficiency (Picton et al. 2015)
and could therefore be boosted by the use of more powerful tools.
Nevertheless, literary translators seem to vaguely perceive the usefulness of corpus technology for their work, as most items in the technological
wish list provided by the participants in Slessor’s survey, in answer to the question “If you could design a technological tool for literary translators, what would
you want it to do?” (2020, 248), are related to corpora:
1. Replace typing.
2. An effective tool for creating bitexts.
3. For literary translations, it would be nice to have tool for capturing printed texts (throughout a computer or smartphone camera?).
4. Provide me a comparison of contexts in which a given word or phrase is used, specific to genre.
5. Determine the frequency of the most-used terms of phrases, to avoid repetitions in close proximity or verbal tics.
6. Automated creation of glossary, with references.
7. Literary terminology organized by subject, culture, period.
8. A database of all books ever written, which could be searched with a few keywords. That would be useful to track down original or translated quotations.
9. To give the translator all possible translations of a specific term.
This is in line with literary translators’ wishes expressed in Ruffo’s survey, where
corpora and concordance search figure among several other appealing tools (2021,
136). Still, too few of them describe corpus technology as “the best method to
observe the effective linguistic use of words” (Ruffo 2021, 83) or as a “powerful
resource,” in that it helps uncover “patterns which close reading alone can
either only subjectively assess or not register at all, e.g., sentence length and repetition” and improve consistency (2021, 135–6).
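The two patterns named in this quotation, sentence length and repetition, can be made concrete with a few lines of code. The following is a minimal Python sketch, not a description of any tool used by the survey respondents; the function names, the ten-token window, and the four-letter threshold are illustrative assumptions, and the sentence splitter is deliberately naive.

```python
import re

def sentence_lengths(text):
    """Return the length in words of each sentence (naive split on . ! ?)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def close_repetitions(text, window=10, min_length=4):
    """List (word, earlier_position, later_position) for every word of at
    least `min_length` letters that recurs within `window` tokens."""
    tokens = re.findall(r"\w+", text.lower())
    last_seen = {}
    hits = []
    for position, token in enumerate(tokens):
        if len(token) < min_length:
            continue  # skip short function words such as "the" or "and"
        previous = last_seen.get(token)
        if previous is not None and position - previous <= window:
            hits.append((token, previous, position))
        last_seen[token] = position
    return hits

# Hypothetical draft sentence pair, invented for this illustration.
draft = ("The grey house stood alone. "
         "Nobody entered the house, and the house decayed slowly.")
print(sentence_lengths(draft))
print(close_repetitions(draft))
```

A translator could run such a check on a draft chapter and then decide, case by case, whether a flagged repetition is a verbal tic or a deliberate stylistic echo of the source text.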
The potential of corpus technology remains generally invisible for most literary
translators. This invisibility can explain why they ask for a tool to display term
occurrences so that they avoid repetitions in their translation, as well as words in
context in order to better understand their use, while so many concordancers are
today available.
Some concordancers, such as the one provided by Voyant Tools, are online and
free,3 which means that literary translators do not need to install a tool on their
computer but can start working immediately on their queries. In addition, Voyant
Tools supports many text formats and therefore does not require a conversion effort
from the translator, as other concordancers do, and presupposes only basic knowledge for the exploration of its basic features, while it offers advanced features
for hermeneutical text analysis (Sinclair and Rockwell 2016). Concordancers can
be used in multiple ways by literary translators as a translation aid in the translation process if they are taught how to use them. They can provide words in context
so that their meaning is fully perceived and word frequencies so that translators
can avoid repetitions, as they wish. Moreover, they can unravel all sorts of hidden patterns in the texts and support a fine-grained interpretation, while parallel
concordancers, such as Sketch Engine’s or ParaConc, allow the display and
exploration of bitexts.
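To illustrate what a concordancer does at its simplest, the sketch below produces keyword-in-context (KWIC) lines for a search word. It is a hedged illustration in Python and does not reproduce the implementation of Voyant Tools, Sketch Engine, or ParaConc; the function name `kwic`, the `width` parameter, and the sample sentence are assumptions made for the example.

```python
import re

def kwic(text, keyword, width=30):
    """Return (left context, match, right context) for each whole-word,
    case-insensitive occurrence of `keyword` in `text`."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    lines = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append((left.replace("\n", " "),
                      match.group(0),
                      right.replace("\n", " ")))
    return lines

# Hypothetical sample text, invented for this illustration.
sample = "The sea was calm. She loved the sea, and the Sea loved her."
for left, hit, right in kwic(sample, "sea", width=12):
    # Right-align the left context so that the keyword forms a column.
    print(f"{left:>12} [{hit}] {right}")
```

Reading the aligned column of occurrences is what allows a translator to see at a glance how a word behaves across a text, which is the function the survey respondents ask for.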
An interesting experiment at the interface of research and practice illustrates
the potential of corpus use in the translation process. Youdale (2020) has applied
a hybrid, flexible methodology, called Close-Distant Reading (CDR), to the study
of literary style and the stylistic self-analysis of the translator. CDR draws from
corpus linguistics and DH and links human close-reading with corpus-driven distant reading, enhanced by visualization techniques, in both phases of reception
and formulation of the translation process.
Nevertheless, not all information required by literary translators can be found in
online corpora, corpus-based resources, and the web searched as corpus or retrieved
from the text analysis of the foreign text and, eventually, its retranslation(s).
Questions “for which there are often no clear answers in dictionaries, glossaries,
Google searches and other tools and resources that [translators] are accustomed to
using” (Frankenberg-Garcia 2015, 352) may lead literary translators to build their
own corpora. Even the wish for a tool displaying words in context in a specific
genre implies the creation of a do-it-yourself (DIY) corpus (Zanettin 2002), which
can be interrogated for specific purposes. Still, literary translators cannot easily find
the digital texts needed for their creation (Zanettin 2012), especially when
they work with peripheral languages. Non-literary translators can use tools to
collect electronic texts, such as BootCaT (Baroni and Bernardini 2004), to automatically build DIY corpora, whereas the shortage of online resources relevant to
literature in many languages explains why literary translators need to be trained
in corpus analysis as well as in corpus creation.
Five participants in Ruffo’s survey mentioned, however, that they use “a software to build their own corpus of translations” (2021, 83). Although no further
information is provided on the type of corpora created and the purposes of their
use, we can assume that, along with technical skills, methodologies drawn from
corpus-based research may eventually be needed for the effective compilation and
interrogation of DIY corpora. Clearly formulated hypotheses in literary translation depend on theoretical assumptions.
For instance, various translation-driven corpora, as described by Zanettin (2012),
and not only parallel ones, may have to be combined so that the translator apprehends the foreign text in depth and finds a range of solutions for given translation
problems, which will not hinder his/her creativity (Malmkjær 2008). Their use can,
for instance, help the translator reconstitute, even partly, the intertextual node of a
classic text and therefore better understand its position and functioning in its context (Venuti 2009). Venuti’s theory of intertextuality in and of translation, along
with Theo Hermans’s “translation-specific intertextuality” (2003, 40) and Cay Dollerup’s (2000) “support-translation” concept, may apply not ex post, as a method
of reading a translation, but ex ante, as a methodology for creating a translation on
the basis of multiple information drawn from various corpora. Along with print
resources, several types of corpora may be of help for the translation of such a text:
• The parallel corpus of the source text and its retranslations, if any.
• The monolingual corpus of all works by the author.
• A parallel corpus containing those of his or her works that have already been translated, along with their translations in the target language or in other languages.
• A monolingual comparable corpus including works of the same genre written by other authors in the same period in the source language, and a subcorpus containing texts of authors sharing the same aesthetic ideas.
• A monolingual comparable corpus including non-literary texts written in the same period in the source language.
• A multilingual comparable corpus composed of the reviews of his/her works.
• A monolingual corpus containing literary texts of the same period written in the target language.
• A monolingual reference corpus, etc.
These methodological uses of corpora cannot be simply invented by a trainee or a
professional translator and should be part of their education.
Corpus technology “builds on human intuition by acting very often as a validation tool” (Peraldi 2019, 273). It mainly generates questions on the basis of data
to be interpreted by the translator and therefore increases the range of translator’s
choices. Therefore, corpus technology seems to meet the needs of literary translators perfectly. Still, they need to be trained in its use. A corpus-based course for
literary translators could provide them with competences in digitization, data curation
and editing, corpus creation, and analysis, grounded in a solid methodological and
theoretical framework. Practice-led experiments, such as Youdale’s, can be used
in their education to shed light on the humanist and humanizing dimension of technology, highlighted equally by the founder of the field of CAT, Martin Kay, whose
“translator’s amanuensis” is again attracting the interest of TS scholars today (Alonso
and Vieira 2017). Although the task may seem demanding, as tools
become increasingly easy to use, a corpus-based course for literary translators, such as
the one hinted at previously, may be quite feasible soon.4
6.2.2 What CAT Tools Can Do for Literary Translators
CAT tools are translation-specific tools, generative, active language technologies.
According to data drawn from the two surveys mentioned in Section 6.1, literary
translators rarely use translation-specific tools (Ruffo 2021, 112–3; Slessor 2020,
245–7). The positive attitude toward CAT tools seems to still be an exception (Ruffo,
ibid.; Youdale and Rothwell 2022, 382–4; Hansen 2021). However, a quick look at
literary translators’ needs, along with related experiments (Youdale and Rothwell
2022; Hansen 2021), hints at the potential for the future use of (corpus-based)
CAT tools in the literary translation process and justifies the need for relevant, specialized literary translator education, adapted to the specificity of literary translation.
CAT tools, which are presented separately in Melby’s model, are today embedded in integrated environments. The three phases of the translation process as
described by Melby (before, during, and after) are thus often carried out by different features of the same tool, whose overall performance depends on the quality
of its “hidden” corpora (Loock 2016, 33–34).
At the term-level process, Melby mentions term extraction and research for the
building of the terminology database (before), terminology look-up (during), and
terminology consistency check (after). Many TMs today provide term extractors,
but building a term base usually requires considerable effort and manual work.
Terminology look-up can be useful for literary translators who have decided to invest
this effort in specific cases, for example, in mathematical fiction or science-fiction
texts. In any case, the automatic creation of glossaries can be found among the
translators’ wishes in Slessor’s survey (2020, 248), while terminology consistency
figures among the pros of the use of technology in Ruffo’s survey (2021, 136).
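The terminology consistency check that Melby places after translation can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the algorithm of any particular CAT tool: the glossary is a plain dictionary, matching is by case-insensitive substring (so inflected forms are missed), and the science-fiction term pair and sentences are invented for the example.

```python
def check_term_consistency(segment_pairs, glossary):
    """Flag aligned segments where a glossary source term occurs but the
    approved target term is absent from the translation.

    Returns a list of (segment_index, source_term, expected_target_term).
    """
    issues = []
    for index, (source, target) in enumerate(segment_pairs):
        for source_term, target_term in glossary.items():
            in_source = source_term.lower() in source.lower()
            in_target = target_term.lower() in target.lower()
            if in_source and not in_target:
                issues.append((index, source_term, target_term))
    return issues

# Hypothetical glossary and aligned segments (English to French).
glossary = {"warp drive": "moteur à distorsion"}
segments = [
    ("The warp drive failed.", "Le moteur à distorsion a lâché."),
    ("Engage the warp drive!", "Enclenchez la propulsion !"),
]
print(check_term_consistency(segments, glossary))
```

In a literary text a flagged segment is not necessarily an error, since variation may be intentional; the check only points the translator to places where a decision was made.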
At the segment-level process, Melby mentions alignment (before), translation
memory look-up, machine translation (during), and missing segment detection,
along with format and grammar checks (after). Translation memories are databases
which store aligned texts, that is, texts along with their translations, segmented at
sentence or paragraph level. They are thus, in fact, parallel corpora of a specific type.
Texts can be directly aligned by the TM or by another tool and imported into the TM.
There are many effective open-source alignment tools, for example, LF Aligner,
YouAlign, etc. Post-editing (PE) is always needed, and even more so in literary texts.
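The principle of sentence-level alignment, and the reason post-editing is needed, can be shown with a deliberately naive sketch. The Python code below is an assumption-laden illustration and is not how LF Aligner or any other tool named above works: it pairs sentences strictly one to one and merely flags pairs whose lengths diverge strongly, whereas real aligners also handle sentences that a translator has merged or split. The function names and the `max_ratio` threshold are hypothetical.

```python
import re

def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def align_one_to_one(source_text, target_text, max_ratio=2.0):
    """Pair source and target sentences in order.

    Returns (source_sentence, target_sentence, needs_review) tuples, where
    needs_review is True when one sentence is more than `max_ratio` times
    longer (in characters) than the other.
    """
    source = split_sentences(source_text)
    target = split_sentences(target_text)
    if len(source) != len(target):
        raise ValueError("Sentence counts differ; manual alignment is needed.")
    pairs = []
    for src, tgt in zip(source, target):
        ratio = max(len(src), len(tgt)) / max(1, min(len(src), len(tgt)))
        pairs.append((src, tgt, ratio > max_ratio))
    return pairs

# Hypothetical French source and English translation.
bitext = align_one_to_one("Il pleuvait. Elle sortit quand même.",
                          "It was raining. She went out anyway.")
for src, tgt, review in bitext:
    print(src, "|", tgt, "| review" if review else "")
```

Literary translators often restructure sentences, so the mismatch branch and the review flag would be triggered frequently in practice, which is consistent with the remark that post-editing of alignments matters even more for literary texts.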
TM’s parallel corpora can be entirely built up from a translator’s own works but
commonly include bitexts imported from various sources, either ready-made or
aligned by the translator. When segments of the new text to be translated match –
exactly or partially (fuzzy matches) – already-existing translated TM segments,
these segments are automatically recalled from the database and proposed to the
translator so that s/he does not translate the same segment twice.
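The recall of exact and fuzzy matches described above can be approximated with Python's standard `difflib` module. This is a minimal sketch under stated assumptions: commercial TM systems use their own similarity measures, which are not documented here, and the function name `tm_lookup`, the 0.75 threshold, and the sample memory are invented for the illustration.

```python
from difflib import SequenceMatcher

def tm_lookup(new_segment, memory, threshold=0.75):
    """Return stored translations whose source segment resembles `new_segment`.

    `memory` is a list of (source_segment, target_segment) pairs. The result
    is a list of (score, source_segment, target_segment), best match first;
    a score of 1.0 is an exact match, lower scores are fuzzy matches.
    """
    matches = []
    for stored_source, stored_target in memory:
        score = SequenceMatcher(None, new_segment.lower(),
                                stored_source.lower()).ratio()
        if score >= threshold:
            matches.append((round(score, 2), stored_source, stored_target))
    return sorted(matches, reverse=True)

# Hypothetical translation memory (English to French).
memory = [
    ("She opened the window.", "Elle ouvrit la fenêtre."),
    ("The rain had stopped.", "La pluie avait cessé."),
]
print(tm_lookup("She opened the window.", memory))  # exact match
print(tm_lookup("She opened the door.", memory))    # fuzzy match
```

The fuzzy proposal still contains the wrong noun, so the translator has to edit it; the tool saves retyping, not judgment.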
The specific resources needed by literary translators are once again more likely
not to be found easily online so as to be imported as bitexts in a TM. For this reason, in literary translation, TMs are more likely to be useful when built by translators with their own bitexts and with strictly controlled import of selected, aligned
resources. In general, segment look-up is not expected to be very useful for literary
translators, due to the high discursive complexity of translated texts, but could be
useful in specific cases, as, for example, when retranslations are imported, although
the suggested translations need to be critically examined one by one by the translator; or for the translation of humanities and social sciences texts, where reiterations of terms and phrases can be observed (Mazoyer 2021), but which are not our
focus here. Until recently, this recall worked only with segments, and most CAT
tools provided a functionality called “concordance search,” allowing the translator
to search for a specific word, sequence, or phrase. Nowadays, subsegment matches,
which can be of greater interest for literary translators, are also available.
The fact that CAT tools can be useful when used in specific settings and mainly as
concordancers is confirmed by an experiment conducted by Andrew Rothwell, who
translated Zola’s novel La joie de vivre with the use of TM (Youdale and Rothwell
2022). Starting from the different French and English reception of the novel and the
theory of retranslation, Rothwell aligned Zola’s text and Ernest Alfred Vizetelly’s
translation that introduced it in the English literary space in the late nineteenth century and remained dominant throughout the twentieth century. Both were imported into a
TM, and the version the translator produced constantly referred to the Victorian text,
whose choices and important omissions reflected the era’s ideology. Although segments of this translation were proposed by the tool for insertion, they were rarely
accepted by Rothwell for different reasons, ranging from linguistic to cultural and
ideological ones. The overall value of the project lies at the level of interpretation, as
the older translation was “casting a different, historical light, sentence by sentence,
on the difficulties and possibilities of the ST” (Youdale and Rothwell 2022, 397).
Rothwell’s experiment illustrates the pros and cons of TMs in literary translation. Repetition is almost insignificant in literary texts, while segmentation may
hinder creativity. When retranslations are used in a hermeneutical perspective,
they seem to be counterproductive. But his experiment illustrates also a possible
hermeneutical use of CAT tools, which triggers reflection and supports a multilayered interpretation of the text that guides the translator’s decisions.
Machine translation is also embedded today in most TMs. The majority of literary translators seem to abhor MT (Ruffo 2021, 112; Slessor 2020, 246), while in
contrast it triggers the interest of the academic community in various perspectives
(see Youdale 2020, 19–23, for a brief overview; also see Moratto 2010). Yet many
scholars stress various ethical implications of the use of MT in literary translation
with regard to the conceptualization of literature and literariness or the translator’s
voice (Costa and da Silva 2020; Kenny and Winters 2020; Taivalkoski-Shilov 2019).
In a recent study which aimed at exploring creativity in literary translation through
the comparison of three versions of a short story by Kurt Vonnegut from English to
Catalan and Dutch, produced by neural MT (NMT), NMT plus human PE, and a
translator (Human Translation, HT), the translator’s version had, unsurprisingly, the
highest creativity score, followed by PE, while MT resulted in a very poor translation (Guerberof-Arenas and Toral 2022; cf. Guerberof-Arenas and Toral 2020).
Furthermore, the researchers also point out that “during PE translators are less creative (more errors and fewer creative shifts)” (Guerberof-Arenas and Toral 2022, 26).
The theoretical debate on the use of MT in literary translation cannot be dealt
with in detail here as it exceeds the scope of our research. Still, some experiments,
112
Titika Dimitroulia
like Youdale’s, which explores MT suggestions in CAT tools as potentially stimulating creativity (2022, 384–93), or Rothwell’s (2009), using MT in his translation
of dada poetry with reference to Walter Benjamin’s translation theory, suggest that
MT may be useful for specific purposes and in specific settings.
The last tools mentioned by Melby at the segment level process are related to
quality assurance, concerning misspellings, missing tags, incorrect punctuation,
etc. They can be useful for literary translators, as is suggested by Ruffo’s survey
(2021, 136). Still, their value needs to be assessed with regard to the overall effort
needed for the use of a TM.
6.3 The Encounter between CBTS and DH
In descriptive corpus-based and corpus-driven research, literary translation holds
a prominent place. Yet the detailed discussion of this abundant research, which
has progressively been expanding its scope by building upon more variegated and
coherent methodologies (Sun and Li 2020, 647), is beyond the scope of this paper
(for an overview, see Zanettin 2013, 2017). We will focus instead on some multidimensional approaches to literary translation which seem to “find a more robust
grounding in natural language processing (NLP) techniques” (Zanettin 2014, 182)
and investigate literary translation in line with the demand formulated by Bassnett
and Johnston (2019, 187), who argue that:
We need to expand our ideas about translation beyond the linguistic and to
seek a redefinition of what translation actually is. We also need to understand
how translation has functioned in the past, and how attitudes to translation in
some contexts have come to be.
Based on the achievements in corpus-based literary translation research and
their theoretical grounding, these new approaches use large corpora and advanced
techniques to enrich the array of languages and genres investigated and to study
texts in broader time spans and in complex (inter)cultural settings, which include
a variety of translation agents. These techniques, drawn from DH, include, among
others, text and data mining, stylometry, visualization, and mapping tools. Some
of these tools are specifically developed for literary translation research, such as
the “TL-Explorer,” created for mapping and analyzing translated literature (Zhai
et al. 2020), or for specific projects.
DH methods, most often along with close reading, allow more thorough
investigation in existing research areas, the revision of previous assumptions
on the basis of more extensive and new data, or the generation of new research
hypotheses. In any case, they allow translation to be situated in wider contexts
and an in-depth study of its complex role in cultural transfers, as they “unravel the
intricate social, cultural and literary networks, in which the activities of literary
translation are conducted” (Sun and Li 2020, 651).
This is the case of the team project “Prismatic Jane Eyre,” led by Matthew Reynolds and investigating translation not as “a single act involving one source-text
Corpora and Literary Translation 113
in one language and one translation-text in another language, which just happens to occur again and again, but rather as paradigmatically generating multiple
texts, so that ‘translation’ becomes the process of turning from one language into
others, da una lingua in altre, producing chains of signifiers in target languages,
creating multiple equivalent, authentic texts, while ‘a translation’ correspondingly
figures as just one of many actual and/or possible linguistic ‘realisations’” (Reynolds 2019, 2). Drawing from cultural and post-colonial approaches in TS, the
project explores translation through this “prismatic” lens, with Charlotte Brontë’s
Jane Eyre as a case study. The team gathered 595 translations and 675 “acts of
translation,” that is, either the publication of a new translation or the republication of an existing translation in a new location, into 67 languages. They situated
them on two maps, providing knowledge on the spatial and temporal distribution
of the novel, “as a trans-temporal and geographically multiple text, with many
writer-translators, publishers, and readers collaborating to bring ever-new energy
to its plural existence” (Reynolds and Vitali 2021, 10). Another “map” displays
the available covers of the translations, which reflect on a different level its various receptions. While they study the texts by close reading, they also analyze with
corpus technology a number of translations in order to “zoom in on moments of
particular interpretive interest” (Reynolds and Vitali 2021, 13). Text analysis is
in progress, and the outcomes of the project, along with the theory on which it is
based, are available online.5
Cheesman et al. (2017) explore a large corpus of drama retranslations, starting from the principle that translation is an interpretative act and its analysis is
not undertaken per se but, as Mona Baker has suggested (2000, 258), in order
to unveil the complex sociocultural, ideological, cognitive background against
which translators’ decisions are informed. The project is based on the hypothesis
that “the more important an item is for a text’s meaning, the less translators tend
to agree about translating it (though each one is consistent in using their selected
terms)” (2017, 742). In their view, quantitative variation in a corpus of retranslations may lead to the qualitative annotation of the translated texts. Inspired by the
close reading of texts against a solid theoretical background in literary studies, TS
and CBTS, this interdisciplinary group of researchers has developed a web-based
system that enables users to create and explore a parallel corpus of retranslations
by using various visualization tools and constantly referring back to texts. This
recursive loop reflects the complex hermeneutical premise underlying the project.
Automatic and manual annotation is also supported, and stylometric analysis is
carried out, revealing the importance of the period in which the translations were
made.
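The hypothesis that translators disagree most where an item matters most can be approximated, very roughly, by scoring how much the versions of each aligned segment differ from one another. The sketch below uses mean pairwise Jaccard distance over word sets as a stand-in measure; it is not the metric implemented by Cheesman et al., and the segment identifiers and sample renderings are invented for illustration.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - |A∩B| / |A∪B| over lowercase word sets; 0 means identical vocabulary."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return 1 - len(sa & sb) / len(sa | sb)

def segment_variation(versions):
    """Mean pairwise distance among all versions of one aligned segment."""
    pairs = list(combinations(versions, 2))
    if not pairs:  # fewer than two versions: nothing to compare
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)

# Hypothetical aligned data: each list holds several renderings of one source segment.
aligned = {
    "seg1": ["he came in", "he came in", "he came in"],
    "seg2": ["my heart is heavy", "grief weighs on me", "sorrow fills my breast"],
}

scores = {seg: segment_variation(v) for seg, v in aligned.items()}
# Rank segments from most to least varied, as a pointer for close reading.
for seg, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(seg, round(score, 2))
```

High-scoring segments are only candidates for attention: in line with the recursive loop described above, the quantitative ranking is meant to send the reader back to the texts rather than replace interpretation.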
The kind of retranslations to be explored is already an engaging choice, as it
radically challenges the instrumental perception of translation (Venuti 2019) and
illustrates the complex ways through which it reflects societies while informing
them. So, retranslations explored by the system can be:
Complete, fragmentary, edited, adapted versions; versions derived from (a
version of) the original-language translated work, or from intermediaries in
the translating language, and/or other languages; versions in various media;
for various audiences (popular, scholarly, restricted); in mono-, bi-, or plurilingual formats; from various periods and places; produced and received under
various economic, political, institutional, and cultural-linguistic conditions.
(Cheesman et al. 2017, 743)
A very appealing feature of the project is the visualization of the complex interconnection between texts, writers, and cultures in a joint close and distant reading
perspective or, as the authors put it, “from the how and why of variation among
translations, back to the varying capacity of the translated text to provoke variation” (2017, 740). What is of particular importance is the number of new questions it generates and the new projects it suggests.
Some other projects explore and contribute to different areas of TS research,
such as translation history and sociology. Drawing from the idea of “history from
below,” Michelle Jia Ye (2022) also combines methods in historical studies, translation history, and network studies with the network visualization tool Gephi to
reconstitute and represent the translators’ presences and connections in a popular
magazine network in early twentieth-century China. She casts light on translation
as “a discursive mode of production that enabled and popularized the very act of
publication” (2022, 49), contributing at the same time to translation history and
translators’ sociographies in this historical period.
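As an illustration of how bibliographic records can be prepared for a network visualization tool, the sketch below counts how often pairs of contributors appear in the same magazine issue and serializes the result as an edge table. It assumes the common Source, Target, Weight column layout accepted by spreadsheet importers such as Gephi's; the issue labels and translator names are invented placeholders and do not come from Ye's data.

```python
import csv
import io
from collections import Counter
from itertools import combinations

# Hypothetical records: contributors credited in each magazine issue.
issues = {
    "Magazine A, issue 1": ["Translator X", "Translator Y"],
    "Magazine A, issue 2": ["Translator X", "Translator Y", "Translator Z"],
    "Magazine B, issue 1": ["Translator Z"],
}

def co_occurrence_edges(records):
    """Count how often each pair of names appears in the same record."""
    edges = Counter()
    for names in records.values():
        # sorted(set(...)) gives each unordered pair one canonical form.
        for a, b in combinations(sorted(set(names)), 2):
            edges[(a, b)] += 1
    return edges

def to_edge_csv(edges):
    """Serialize edges as CSV text with Source, Target, Weight columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in sorted(edges.items()):
        writer.writerow([a, b, weight])
    return buf.getvalue()

edges = co_occurrence_edges(issues)
print(to_edge_csv(edges))
```

Saving the printed text to a file and importing it as an undirected edge list should yield a weighted co-presence network; which ties count as a "connection" remains a historical judgment that the code cannot make.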
These approaches, among many others, illustrate the potential of the encounter
between CBTS and DH for corpus-based literary translation research and TS as a
whole, “illustrating the need for the expanding of horizons within and beyond the
contours of the discipline,” for its outward turn (Bassnett and Johnston 2019, 187).
6.4 Conclusion
The technologization of literary translation seems to be imminent today. In this
new landscape, delineated by the increasing use of tools in literary translation practice, literary translator education can help literary translators deal with technology in a more sustainable and humanizing perspective. Web-based collaborative
environments for computer-assisted translation, designed for cultural texts such
as “Traduxio” (Henkel and Lacour 2021), can also be of great interest for literary
translator practice and training. Corpus technology, which supports interpretation
and enhances creativity, should be at the heart of this education, in which corpus-based CAT tools may also find a place, if used in a similar perspective. Applied
corpus-based literary translation should confront these challenges in due time.
In the same line of thinking, descriptive CBTS is nowadays transformed through
its synergies with DH, which can broaden its scope and allow for the formulation
of new complex hypotheses, grounded on literary translation theory as reshaped
by interdisciplinary exchanges. Reflecting on applied CBTS in literary translation
practice and literary translator education can lead to a cross-fertilization of both.
Much needs to be done, but it seems that through the effective exploration of
corpora and their technologies, “the last bastion of human translation” (Toral and
Way 2015, 213) that is literary translation can outline a model of how technology
can be used in a humanizing perspective in the translation profession.
Notes
1 The first CALT conference’s rationale mentions “CAT tools, corpus linguistics, natural
language processing, text analysis and visualization and in particular Neural Machine
Translation (NMT)” (emphasis added). https://calt2021conference.wordpress.com.
2 Austermühl adapts Holmes’s model to include tools in its different phases (2001, 13).
3 https://voyant-tools.org.
4 I have designed and taught since 2012 a postgraduate course on the use of tools in literary
translation, with emphasis on corpora (Master “Translation of Literature and the Humanities,” Aristotle University of Thessaloniki), whose brief description is available on the
Course Registry of Dariah project: “The course aims to enhance the digital competences
of the future literary translators and focuses on: Corpora in literature and literary translation (concordancers, annotation tools, corpora design, compilation and use, e.g., sketch
engine, bootcat, voyant, catma) – Translation technologies (translation memories, terminology management, Translation Environment Tools (TenTs) as matecat etc.) – Collaborative translation platforms (traduxio),” https://dhcr.clarin-dariah.eu/#370.
5 https://prismaticjaneeyre.org.
References
Alonso, Elisa, and Lucas Nunes Vieira. 2017. “The Translator’s Amanuensis 2020.”
JoSTrans: The Journal of Specialised Translation 28: 345–61.
www.jostrans.org/issue28/art_alonso.php.
Anderman, Gunilla, and Margaret Rogers. 2008. Incorporating Corpora. The Linguist and
the Translator. Clevedon: Multilingual Matters.
Austermühl, Frank. 2001. Electronic Tools for Translators. London: Routledge.
Baker, Mona. 1993. “Corpus Linguistics and Translation Studies: Implications and Applications.” In Text and Technology: In Honour of John Sinclair, edited by Mona Baker,
Gill Francis and Elena Tognini-Bonelli, 233–50. Amsterdam: John Benjamins.
Baker, Mona. 2000. “Towards a Methodology for Investigating the Style of a Literary
Translator.” Target 12, no. 2: 241–66.
Baroni, Marco, and Silvia Bernardini. 2004. “BootCaT: Bootstrapping Corpora and Terms
from the Web.” In Proceedings of LREC 2004, 1313–16. Lisbon: LREC.
www.lrec-conf.org/proceedings/lrec2004/pdf/509.pdf.
Bassnett, Susan, and David Johnston. 2019. “The Outward Turn in Translation Studies.”
The Translator 25, no. 3: 181–88. https://doi.org/10.1080/13556509.2019.1701228.
Bowker, Lynne, and Gloria Corpas Pastor. 2015. “Translation Technology.” In The Oxford
Handbook of Computational Linguistics (2nd ed.), edited by Ruslan Mitkov, 871–905.
Oxford: Oxford University Press.
Cheesman, Tom, Kevin Flanagan, Stephan Thiel, Jan Rybicki, Robert S. Laramee, Jonathan Hope, and Avraham Roos. 2017. “Multi-Retranslation Corpora: Visibility, Variation, Value, and Virtue.” Digital Scholarship in the Humanities 32, no. 4: 739–60.
https://doi.org/10.1093/llc/fqw027.
Costa, Cynthia Beatrice, and Igor A. L. da Silva. 2020. “On the Translation of Literature
as a Human Activity Par Excellence.” Aletria: Revista de Estudos de Literatura.
https://periodicos.ufmg.br/index.php/aletria/article/view/22047.
Damrosch, David. 2003. What Is World Literature? Princeton, NJ and Oxford: Princeton
University Press.
Dollerup, Cay. 2000. “Relay and Support Translations.” In Translation in Context, edited
by Andrew Chesterman, Natividad Gallardo San Salvador, and Yves Gambier, 17–26.
Amsterdam and Philadelphia, PA: John Benjamins.
EC Representation in the UK, CIOL and ITI. 2017. “UK Translator Survey. Final Report.”
www.ciol.org.uk/sites/default/files/UKTS2016-Final-Report-Web.pdf.
EMT Group. 2017. “European Master’s in Translation (EMT) Competence Framework.”
https://ec.europa.eu/info/sites/default/files/emt_competence_fwk_2017_en_web.pdf.
Frankenberg-Garcia, Ana. 2015. “Training Translators to Use Corpora Hands On: Challenges
and Reactions by a Group of 13 Students at a UK University.” Corpora 10, no. 2: 351–80.
Frérot, Cécile. 2016. “Corpora and Corpus Technology for Translation Purposes in Professional and Academic Environments. Major Achievements and New Perspectives.” Cadernos de Tradução 36, no. 1: 36–61. https://doi.org/10.5007/2175-7968.2016v36nesp1p36.
Frérot, Cécile, and Lionel Karagouch. 2016. “Outils d’aide à la Traduction et Formation
de Traducteurs: Vers une Adéquation des Contenus Pédagogiques avec la Réalité Technologique des Traducteurs.” ILCEA 27. https://doi.org/10.4000/ilcea.3849.
Gallego-Hernández, Daniel. 2015. “The Use of Corpora as Translation Resources: A Study
Based on a Survey of Spanish Professional Translators.” Perspectives 23, no. 3: 375–91.
https://doi.org/10.1080/0907676X.2014.964269.
Granger, Sylvianne, and Marie-Aude Laufer. 2022. “Corpus-based Translation and Interpreting Studies. A Forward-looking Review.” In Extending the Scope of Corpus-based
Translation Studies, edited by Sylvianne Granger and Marie-Aude Laufer, 13–41. London:
Bloomsbury.
Guerberof-Arenas, Ana, and Antonio Toral. 2020. “The Impact of Post-Editing and
Machine Translation on Creativity and Reading Experience.” Translation Spaces 9, no.
2: 255–82. https://doi.org/10.1075/ts.20035.gue.
Guerberof-Arenas, Ana, and Antonio Toral. 2022. “Creativity in Translation: Machine
Translation as a Constraint for Literary Texts.” Translation Spaces, Online-First Articles. https://doi.org/10.1075/ts.21025.gue.
Hansen, Damien. 2021. “Défis et Pertinence de la Traduction Littéraire Assistée par
Ordinateur.” La main de Thôt 9.
https://revues.univ-tlse2.fr:443/lamaindethot/index.php?id=982.
Henkel, Daniel, and Philippe Lacour. 2021. “Collaboration Strategies in Multilingual
Online Literary Translation.” In When Translation Goes Digital: Case Studies and Critical Reflections, edited by Renée Desjardins, Claire Larsonneur, and Philippe Lacour,
153–71. London: Palgrave Macmillan.
Hermans, Theo. 2003. “Translation, Equivalence and Intertextuality.” Wasafiri 40: 39–41.
Holmes, James S. 1988. Translated! – Papers on Literary Translation and Translation
Studies. Amsterdam: Rodopi.
Kenny, Dorothy, and Marion Winters. 2020. “Machine Translation, Ethics and the Literary Translator’s Voice.” Translation Spaces 9, no. 1: 123–49.
https://doi.org/10.1075/ts.00024.ken.
Koskinen, Kaisa, and Minna Ruokonen. 2017. “Love Letters or Hate Mail? Translators’
Technology Acceptance in the Light of Their Emotional Narratives.” In Human Issues in
Translation Technology, edited by Dorothy Kenny, 8–24. London: Routledge.
Kübler, Nathalie. 2008. “Corpora and LSP Translation.” In Corpora in Translator Education, edited by Federico Zanettin, Silvia Bernardini, and Dominic Stewart, 25–42.
Manchester: St. Jerome.
Loock, Rudy. 2016. La Traductologie de Corpus. Villeneuve d’Ascq: Presses Universitaires du Septentrion.
Malmkjær, Kirsten. 2008. “On a Pseudo-subversive Use of Corpora in Translator Training.” In Corpora in Translator Education, edited by Federico Zanettin, Silvia Bernardini, and Dominic Stewart, 119–34. Manchester: St. Jerome.
Mazoyer, Renaud. 2021. “Traduction d’essai et TAO. Le Racisme est un Problème de
Blancs de Reni Eddo-Lodge: Une Etude de Cas.” La main de Thôt 9.
https://revues.univ-tlse2.fr:443/lamaindethot/index.php?id=991.
Melby, Alan K. 1998. “Eight Types of Translation Technology.” Paper presented at American Translators Association ATA 39th Annual Conference, November 4–9, Hilton Head
Island, SC. www.ttt.org/technology/8types.pdf.
Moratto, Riccardo. 2010. “Designing Translation Curricula in the Machine Translation Era
(MTE): Challenges of a New Approach. Student Perspectives.” In Huigu yu qianjing 回
顧與前瞻, Proceedings of 15th Taiwan Symposium on Translation and Interpretation at
Changrong University, Tainan, edited by Li Gong-Wei and Li Hui-Rong, 69–89. Tainan:
Changrong University.
PACTE Group. 2018. “Competence Levels in Translation: Working Towards a European
Framework.” The Interpreter and Translator Trainer 12, no. 2: 111–31.
www.doi.org/10.1080/1750399X.2018.1466093.
Peraldi, Sandrine. 2019. “Integrating Corpus-based Tools into Translators’ Work Environments: Cognitive and Professional Implications.” Revista Internacional de Organizaciones 23: 265–92.
Picton, Aurélie, Mathilde Fontanet, Mélanie Maradan, and Donatella Pulitano. 2015.
“Corpora in Translation: Addressing the Gap between the Scholars’ and the Translators’
Point of View.” Presented at Corpus Use and Learning to Translate (CULT), Alicante,
Spain. https://archive-ouverte.unige.ch/unige:86881.
Reynolds, Matthew. 2019. “Introduction.” In Prismatic Translation, edited by Matthew
Reynolds, 1–18. Cambridge: Legenda.
Reynolds, Matthew, and Giovanni Pietro Vitali. 2021. “Mapping and Reading a World
of Translations: Prismatic Jane Eyre.” Modern Languages Open 1: 1–18.
https://doi.org/10.3828/mlo.v0i0.375.
Rothwell, Andrew. 2009. “Translating ‘Pure Nonsense’: Walter Benjamin Meets Systran on
the Dissecting Table of Dada.” Romance Studies 27, no. 4: 259–72.
https://doi.org/10.1179/026399009X12523296128713.
Ruffo, Paola. 2021. “In-between Role and Technology: Literary Translators on Navigating the New Socio-technological Paradigm.” PhD diss., Heriot-Watt University,
Edinburgh.
Şahin, Mehmet, and Sabri Gürses. 2021. “English – Turkish Literary Translation Through
Human – Machine Interaction.” Revista Tradumàtica. Tecnologies de la Traducció 19:
171–203. https://doi.org/10.5565/rev/tradumatica.284.
Sinclair, Stéfan, and Geoffrey Rockwell. 2016. Hermeneutica: Computer-Assisted Interpretation in the Humanities. Cambridge, MA: The MIT Press.
Slessor, Stephen. 2020. “Tenacious Technophobes or Nascent Technophiles? A Survey of
the Technological Practices and Needs of Literary Translators.” Perspectives 28, no. 2:
238–52.
Sun, Yifeng, and Dechao Li. 2020. “Digital Humanities Approaches to Literary Translation.” Comparative Literature Studies 57, no. 4: 640–54.
https://doi.org/10.5325/complitstudies.57.4.0640.
Taivalkoski-Shilov, Kristiina. 2019. “Ethical Issues Regarding Machine(-assisted) Translation of Literary Texts.” Perspectives 27, no. 5: 689–703.
https://doi.org/10.1080/0907676X.2018.1520907.
Taravella, Anne-Marie. 2011. Rapport Sommaire et Préliminaire sur les Résultats de
l’Enquête Menée auprès des Utilisateurs de Technologies Langagières en Avril-mai 2011.
Gatineau: Centre de recherche en technologies langagières. www.crtl.ca/display265.
Toral, Antonio, and Andy Way. 2015. “Machine-Assisted Translation of Literary Text:
A Case Study.” Translation Spaces 4, no. 2: 240–67.
Van Doorslaer, Luc, and Yves Gambier. 2015. “Measuring Relationships in Translation
Studies. On Affiliations and Keyword Frequencies in the Translation Studies Bibliography.” Perspectives 23, no. 2: 305–19. https://doi.org/10.1080/0907676X.2015.1026360.
Venuti, Lawrence. 2009. “Translation, Intertextuality, Interpretation.” Romance Studies 27,
no. 3: 157–73. https://doi.org/10.1179/174581509X455169.
Venuti, Lawrence. 2019. Contra Instrumentalism: A Translation Polemic. Lincoln, NE:
University of Nebraska Press.
Ye, Jia Michelle. 2022. “A History from Below: Translators in the Publication Network of
Four Magazines Issued by the China Book Company, 1913–1923.” Translation Studies
15, no. 1: 37–53. https://doi.org/10.1080/14781700.2021.1950043.
Youdale, Roy. 2020. Using Computers in the Translation of Literary Style: Challenges and
Opportunities. London and New York: Routledge.
Youdale, Roy, and Andrew Rothwell. 2022. “Computer-assisted Translation (CAT) Tools,
Translation Memory, and Literary Translation.” In The Routledge Handbook of Translation and Memory, edited by Sharon Deane-Cox and Anneleen Spiessens, 381–402.
London: Routledge.
Zanettin, Federico. 2002. “Corpora in Translation Practice.” In Language Resources for
Translation Work and Research, LREC 2002 Workshop Proceedings, edited by Elia
Yuste, 10–14. www.lrec-conf.org/proceedings/lrec2002/pdf/ws8.pdf.
Zanettin, Federico. 2012. Translation-driven Corpora. Manchester: St Jerome Publishing.
Zanettin, Federico. 2013. “Corpus Methods for Descriptive Translation Studies.” Procedia –
Social and Behavioral Sciences 95: 20–32. https://doi.org/10.1016/j.sbspro.2013.10.618.
Zanettin, Federico. 2014. “Corpora in Translation.” In Translation: A Multidisciplinary
Approach, edited by Julian House, 178–99. London: Palgrave Macmillan.
Zanettin, Federico. 2017. “Issues in Computer-Assisted Literary Translation Studies.”
Intralinea. www.intralinea.org/specials/article/issues_in_computer_assisted_literary_translation_studies.
Zanettin, Federico, Gabriela Saldanha, and Sue-Ann Harding. 2015. “Sketching Landscapes in Translation Studies: A Bibliographic Study.” Perspectives: Studies in Translatology 23, no. 2: 161–82. https://doi.org/10.1080/0907676X.2015.1010551.
Zhai, Alex, Zheng Zhang, Amel Fraisse, Ronald Jenn, Shelley Fisher Fishkin, and Pierre
Zweigenbaum. 2020. “TL-Explorer: A Digital Humanities Tool for Mapping and Analyzing Translated Literature.” In Proceedings of the 4th Joint SIGHUM Workshop on
Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, 167–71. International Committee on Computational Linguistics. https://aclanthology.org/2020.latechclfl-1.20.
Zhang, Xiaochun, and Lucas Nunes Vieira. 2021. “CAT Teaching Practices: An International Survey.” JoSTrans: The Journal of Specialised Translation 36: 99–124.
www.jostrans.org/issue36/art_zhang.pdf.
7 Orality in Translated and Non-Translated Fictional Dialogues
Yanfang Su and Kanglong Liu1
7.1 Introduction
Fictional dialogues are speech or conversational exchanges between (among)
characters in fiction (Koivisto and Nykänen 2016; Bednarek 2018). Fictional dialogues are usually carefully scripted by the author to imitate the orality features of
authentic conversations so as to shape characters, develop the storyline, and facilitate author-reader interaction. It is acknowledged that devising fictional dialogues
is a demanding task for the literary author. For translators, it is equally challenging to translate fictional dialogues because linguistic, cultural, and aesthetic considerations need to be taken into account (Ettobi 2015). The challenges posed to
translators are reflected in previous research regarding how and how well the orality features can be retained in translation. Many studies reported a certain degree
of unnaturalness or reduced degree of orality in translated fictional dialogues,
such as Leppihalme’s (2000) study on the translation of nonstandard language,
Rosa’s (2000) analysis on diachronic changes in translating forms of address (i.e.,
pronouns, verbs, titles, and nouns used to address a specific speaker), and Ettobi’s
(2015) research on cultural assimilation and non-assimilation in translating orality. Most of these studies are qualitative in nature, in that they resorted to the use
of certain orality features to study how translated fictional dialogues deviate from
the corresponding source texts. Despite some innovative findings, such a qualitative method cannot offer a holistic picture of the orality features in translated
fictional dialogues, nor can it compare the similarities and differences of orality
between translated and non-translated fictional dialogues. Therefore, in order to
address such a gap, this study utilized a corpus of representative original English fiction and a corpus of representative Chinese-English translated fiction
to examine how orality features are represented in fictional dialogues of translated and non-translated fiction. The present study extends the extant literature by
adopting a multidimensional analysis (MDA) approach, thus increasing the range
of orality features being explored and providing more quantitative insights into
this line of inquiry. In addition, we also aimed at uncovering the discrepancies
between translated and non-translated texts in terms of orality, hoping to gain a
better understanding of the distinctive features of translated texts and offer practical suggestions for similar future research.
DOI: 10.4324/9781003298328-8
7.2 Literature Review
7.2.1 Orality of Fictional Dialogues
Orality refers to a way of dealing with “knowledge and verbalization” in oral
speech (Ong 1982, 1). It is assumed that features of orality are epitomized in
spontaneous face-to-face conversations (Bublitz 2017). Literary writers strive
to imitate the linguistic features of authentic conversations in creating fictional
dialogues. The features of fictional dialogues are thus very different from narration in fiction. For this reason, some scholars have challenged the traditional
approach of regarding speech and narration in fiction as one register (Egbert and
Mahlberg 2020). At the same time, some researchers also explored the perceptual quality and naturalness of fictional dialogues. In particular, early studies on
fictional dialogues mainly adopted a qualitative approach to describe the orality features in fictional dialogues (e.g., Short 1996; Thomas 1997, 2002). For
example, Ferguson (1998) analyzed the use of dialect in Dickens’s Bleak House,
Brontë’s Wuthering Heights, and Hardy’s Tess of the d’Urbervilles. By carefully
examining the characters’ sociocultural background and the historical settings,
she argued that the use of dialect in Victorian novels was inconsistent and deviated from readers’ expectations of genuine conversations. Recently, quantitative
approaches utilizing corpus-based approaches and statistical analyses were used
to analyze the orality features of fictional dialogues. For instance, Quaglio (2009)
made use of two corpora and utilized the multidimensional analysis and the log-likelihood test to compare the linguistic features and the corresponding functions
of fictional dialogues and authentic conversations. Jucker (2021) compared the
orality features between performed fictional dialogues and spontaneous conversations by making use of five large-scale corpora and quantitatively analyzed the
frequency distribution of common inserts and contractions which are believed to
characterize orality. He found that the scripted fictional dialogues underused the
orality features compared with the unscripted conversations. Both corpus-based quantitative
investigations and qualitative descriptions revealed that the scripted fictional dialogues, although carefully contrived, shared some similarities with written texts
(Ikeo 2019; Jucker 2021) but still diverged from the unscripted spontaneous conversations or impromptu speeches in many aspects (Short 1996; Bublitz 2017).
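The kind of frequency comparison described above can be sketched as follows: a feature's count is normalized per thousand words in each corpus, and the log-likelihood statistic (G2) indicates whether the difference is larger than chance would suggest. The code uses the two-term form of G2 that is common in corpus linguistics; it is not claimed to be the exact procedure used by Quaglio or Jucker, and the counts are invented for illustration.

```python
import math

def per_thousand(count, corpus_size):
    """Normalized frequency: occurrences per 1,000 tokens."""
    return 1000 * count / corpus_size

def log_likelihood(count_a, size_a, count_b, size_b):
    """Two-term log-likelihood (G2) for one feature in two corpora."""
    total = count_a + count_b
    # Expected counts if the feature were equally frequent in both corpora.
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2

# Hypothetical counts of contractions in two corpora of 10,000 tokens each.
scripted_count, scripted_size = 300, 10_000
spontaneous_count, spontaneous_size = 500, 10_000

g2 = log_likelihood(scripted_count, scripted_size,
                    spontaneous_count, spontaneous_size)
print(per_thousand(scripted_count, scripted_size),
      per_thousand(spontaneous_count, spontaneous_size),
      round(g2, 2))
```

With one degree of freedom, a G2 value above 3.84 corresponds to p < 0.05; the statistic shows only that a difference exists, so its direction has to be read from the normalized frequencies.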
Another strand of research strives to understand orality in fictional dialogues
and their perceptive functions in fiction. At the microlevel, the orality features can
facilitate the communicative purposes of fictional dialogues to reflect the state of
the mind of the characters (Leech and Short 2007; Koivisto and Nykänen 2016)
and promote the development of the plot (Locher and Jucker 2021). Moreover, the
identity of characters and the power hierarchies in the fictional world also emerged
through intersubject interactions (Bucholtz and Hall 2005; Holmes and Wilson
2017). Therefore, the orality features of fictional dialogues are carefully designed
by the author, including conversation structures, syntactic characteristics, wording, spelling, and tone, to offer important cues to the age, gender, region, ethnicity, and social status of the characters (Locher and Jucker 2021). At the macrolevel,
orality features can help promote the interaction between the author and the readers. Specifically, the author reconstructs the activity and imparts the contextual
information to the readers through fictional dialogues (Locher and Jucker 2021).
The readers follow the logical progression of the novel and take the initiative to
portray the characters in the sociocultural context of the novel with the help of the
orality features (Nykänen and Koivisto 2016). Bublitz (2017) noted that although
the orality features were reduced in fictional dialogues, the readers managed to
create meanings and contexts through interacting with the dialogues. In brief, orality as represented in fictional dialogues plays an important part in constructing the
fictional world. However, most of the studies still adopted a deductive method by
analyzing a limited range of orality features of some extracts of fictional dialogues
(Jucker 2021). In this regard, we believe that a more inductive corpus-based analysis of a wider variety of features can offer better insights into this line of inquiry.
7.2.2 Translating Orality of Fictional Dialogues
In view of the complicated linguistic features and important functions of fictional
dialogues, translating fictional dialogues in a natural-sounding and culturally
appropriate way poses unique challenges for translators. Since the cultural relations between the source texts and the translated texts are dissimilar in various
aspects (Ettobi 2015), translating the social and cultural values connoted in orality features of fictional dialogues is a challenging and sometimes even impossible task (Tiittula and Nuolijärvi 2016; Newmark 1987). This argument is
supported by research findings that translations of fictional dialogues often
fail to reproduce the orality features of the source texts effectively.
For example, Leppihalme (2000) analyzed how translators dealt with nonstandard
language related to regionalism in literary dialogues. She found the law of growing standardization (Toury 2012) (i.e., translation tends to lose its source language
features and variations but instead conforms to target language conventions) is
dominant in the translation, in addition to other strategies such as domestication,
compensation, addition, and foreignization. However, the use of many strategies
further led to a loss of features that could have distinguished the author’s literary
works and reduced the traits of the characters’ social status. Rosa (2000) analyzed
the diachronic changes in Portuguese translations of the forms of address in Robinson Crusoe. She found that the power relationship between Robinson and Friday
was distorted in some translated versions, such as the three versions in the 1980s
and 1990s. She further elucidated that the changes in translation were a negotiation of the source text, the target text, and the developing translation norms. In
2015, Rosa compared some examples of dialogues extracted from the original
version and the translated version of Charles Dickens’s Oliver Twist. She found
that many nonstandard usages of English were obliterated or even standardized in
the translated version, and consequently, the discursive representation of otherness
was totally wiped out. While the aforementioned researchers solely focused on the
translated fictional dialogues, Arhire (2019) compared the use of lexical emphasis and ellipsis between the translated Romanian fictional dialogues and their
Yanfang Su and Kanglong Liu
English originals; he argued that untranslatability occurred occasionally due to the
structural differences between the two languages, which further led to an underrepresentation of emotions and reduced identity-shaping power of the dialogues.
Despite some interesting findings, most studies in this field are largely descriptive
in nature based on some representative excerpts extracted from the novels.
To sum up, though great efforts have been made in investigating the orality features of translated fictional dialogues, most studies are still based on purely qualitative methods to analyze examples selected from one particular work of fiction. Besides,
the orality features examined vary from one study to another, which has not only
created problems for generalizability of findings, as the selected language features
might not be adequate to distinguish one text from another (Xiao 2009; Biber
2014), but also restricted the pursuit of further scholarly investigations. Overall,
in this field of research, quantitative evidence is still lacking, leaving the findings
and conclusions resting heavily on the insight of individual researchers. In other
words, studies based on a corpus of representative fictional dialogues remain relatively scarce. In addition, much of the extant literature on orality features in fictional dialogues focused solely on either original texts or translated texts, and it is
unclear whether differences exist between translated and non-translated fictional
dialogues (Nevalainen 2004). For the few studies comparing translated and non-translated fictional dialogues, most of them centered on the translations between
European languages. For example, Nevalainen (2004) utilized the Corpus of
Translated Finnish to examine the colloquial features of translated texts, that is,
nonstandardized spelling and wording. Nevertheless, investigations based on language pairs with distant genetic relationships, such as English and Chinese, might
yield more fruitful results. In view of the limitations of the research methods of
previous studies, we propose using MDA to compare the multiple linguistic
features of different text types.
7.2.3 The Multidimensional Analysis Approach and Studies on Orality
The multidimensional analysis approach (MDA) was originally proposed and
developed by Biber (1988) to identify, interpret, and compare the “co-occurrence”
patterns of certain linguistic features in corpora and the reflected “shared functions” (Biber et al. 2002, p. 14). Biber (1988) analyzed the register variation of
English using a batch of linguistic features. Six dimensions turned out to yield
important results to discriminate the different registers, that is, (1) involved vs.
informational language, (2) narrative vs. non-narrative language, (3) elaborated
vs. situation-dependent discourse, (4) overt expression of persuasion, (5) abstract
vs. non-abstract discourse, and (6) online informational elaboration. Biber’s
(1988) proposal of the MDA model was epoch-making (Biber et al. 2002). First,
it is corpus-based, making analysis of a large number of representative texts possible. The use of computational tools also facilitates the thorough analysis of a wide
range of linguistic features quantitatively, ensuring more accurate and consistent
results. In addition, the same computational tools or corpora data can be applied
and replicated in different studies, which can further strengthen the generalizability of research findings. Since its introduction, the MDA model has prompted
subsequent researchers to adopt it in a variety of studies.
One of the prominent research strands is the study of orality. For example,
Biber et al. (2002) compared speech and writing in academic discourse and found
that spoken and written texts contrasted remarkably in dimensions 1, 2, 3, and
5, with some variations in disciplines. Quaglio (2009) utilized dimension 1 of
MDA to compare the language of television dialogues used in the situation comedy Friends and the language of natural conversations. He found that the television dialogues most resembled the linguistic features in the involved registers
proposed in Biber’s (1988) study, indicating the endeavors of scriptwriters and
actors to mimic natural conversations. Jonsson (2015) compared the linguistic
features of synchronous and super-synchronous computer-mediated communication (CMC) with oral conversations using MDA. He found that oral and written texts contrasted notably in dimension 1, dimension 3, and dimension 5 of
Biber (1988), with dimension 1 being the most significant one. Following the
methodology of Jonsson (2015), Biber and Egbert (2020) compared the orality
of searchable web registers with face-to-face conversations and found that the
searchable web varies in terms of registers and the interactive registers are barely
represented in this discourse domain. Xiao (2009) further developed the MDA
approach by adding more semantic features, which result in a total of nine dimensions comprised of 141 linguistic features. He used the new model to compare the
register variation in five varieties of English. Among all the factors, the dimension which differentiated the interactive casual texts and the informative elaborate
texts exhibited the most prominent contrastive power among different registers
(Xiao 2009). In sum, previous multidimensional analyses of orality show that most studies used authentic conversations as the benchmark for
comparison. Besides, findings of previous studies indicate that dimension 1 of
Biber’s (1988) MDA is particularly effective in characterizing the orality of texts.
7.2.4 Research Questions
In view of the research gaps revealed by the foregoing review, the present study
intends to adopt a corpus-based approach to systematically compare the degree
of orality in translated and non-translated fictional dialogues. Specifically, two
research questions are addressed. The first research question concerns the
orality of translated and non-translated fictional dialogues at the macrolevel. The
second research question further examines how orality differs between translated
and non-translated fictional dialogues in specific language features.
1. Do translated fictional dialogues display a lesser degree of orality than non-translated fictional dialogues, as represented by Biber’s (1988) dimension 1?
2. If differences are identified between the two types of texts in dimension 1, in what ways do the individual linguistic features associated with dimension 1 differ between translated and non-translated fictional dialogues?
7.3 Methods
7.3.1 The Corpora
With the aim of comparing the orality of translated and non-translated fictional
dialogues, we compiled a corpus of fictional dialogues with one translated subcorpus and one non-translated subcorpus. The first step of corpus compilation was
the selection of high-quality and comparable translated and non-translated fiction
works. To ensure the quality of the novels, we referred to the list of Time’s top
100 best novels (1923 to 2005) in selecting the original English novels and the top
100 twentieth-century Chinese novels recommended by Asia Weekly in selecting
the translated Chinese novels. In addition, to make sure the translated and non-translated fictional dialogues were comparable, the publication time of the English novels and the translated novels was limited to the period of 1970s–2010s.
Ten original English novels and ten translated novels were selected. Then, in the
second step of corpus compilation, the fictional dialogues were extracted from the
novels using a self-written Python program by detecting the quotation marks. The
fictional dialogue data were then manually checked for consistency and accuracy.
In the end, we compiled the Fictional Dialogue Corpus, comprising one
subcorpus of non-translated fictional dialogues (250,950 words) and one of translated fictional dialogues (132,516 words) (see Table 7.1 for the corpus structure).
Table 7.1 Composition of the Fictional Dialogue Corpus

Fiction                                    Publication Year   Word Count
Translated Fiction                                            132,516
Border Town (《邊城》)                      2009               7,758
Rickshaw Boy: A Novel (《駱駝祥子》)        2010               9,565
Taipei People                              2000               15,849
The Taste of Apples (《兒子的大玩偶》)      2001               18,138
The Deer and the Cauldron (《鹿鼎記》)      2002               25,251
Alien Realm (《異域》)                      1996               3,426
Blades from the Willow (《蜀山劍俠傳》)     1991               13,003
Schoolmaster (《倪煥之》)                   1978               20,171
Spring Peach (《春桃》)                     1995               2,803
Farewell to My Concubine (《霸王別姬》)     1994               16,552
Non-Translated Fiction                                        250,950
American Pastoral                          1997               30,178
Atonement                                  1987               4,266
Beloved                                    2000               15,409
The Blind Assassin                         2000               14,286
Song of Solomon                            1977               29,651
Falconer                                   1977               9,795
Gravity’s Rainbow                          1973               41,236
Never Let Me Go                            2005               18,963
Snow Crash                                 1992               37,928
White Teeth                                2000               49,238
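The extraction step described in Section 7.3.1 can be illustrated with a minimal sketch. The authors' own program is not reproduced in the chapter, so the pattern, the function names (extract_dialogues, word_count), and the sample sentence below are all assumptions made for illustration. The sketch assumes that dialogue is enclosed in paired straight or curly double quotation marks and that quotations do not nest.

```python
import re

# Matches text inside straight ("...") or curly (“...”) double quotation marks.
# Assumption: quotations do not nest and every opening mark has a closing mark.
QUOTE_PATTERN = re.compile(r'"([^"]+)"|“([^”]+)”')


def extract_dialogues(text: str) -> list[str]:
    """Return the quoted stretches of a text, in order of appearance."""
    dialogues = []
    for match in QUOTE_PATTERN.finditer(text):
        # Exactly one of the two groups is filled, depending on the quote style.
        quoted = match.group(1) or match.group(2)
        dialogues.append(quoted.strip())
    return dialogues


def word_count(dialogues: list[str]) -> int:
    """Count whitespace-separated tokens across all extracted dialogues."""
    return sum(len(d.split()) for d in dialogues)


# Invented sample sentence (not taken from the corpus).
sample = '“Where do you want to go?” she asked. "The Raft," he said.'
extracted = extract_dialogues(sample)
print(extracted)
print(word_count(extracted))
```

A purely pattern-based approach of this kind will also capture quoted titles or scare quotes and will miss dialogue marked by dashes, which is presumably why the extracted data were checked manually afterwards.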
7.3.2 Linguistic Features
In the present study, the 28 linguistic features in dimension 1 of Biber’s (1988)
MDA are chosen for comparing the orality of translated and non-translated
fictional dialogues. The reason for such a choice is twofold. Firstly, previous studies have confirmed that dimension 1, which distinguishes “highly interactive, affective discourse produced under real-time constraints” from “highly informational
discourse produced without time constraints” (Biber 1988, 135), is particularly
useful in distinguishing oral from literate texts. Secondly, MDA has been established as a widely accepted analytical model with representative linguistic features. The use of this dimension together with the language features therein not
only increases the rigor of the study but also renders comparison with other
registers possible.
In particular, dimension 1 consists of two categories of linguistic features.
One category contains features with positive loadings, meaning that a higher frequency of such features will render the texts toward interactivity and orality; the
other category consists of features with negative loadings, indicating that a higher
frequency of these features will render the texts more informational and literate.
The positive-loadings features include amplifiers, causative adverbial subordinators, discourse particles, subordinator “that” deletion, wh-clauses, pronoun
“it,” among others. The negative-loadings features include nouns, word length,
prepositional phrases, type-token ratio, and attributive adjectives. There are more
positive-loadings features than negative ones, as the former are more commonly
found in spoken registers, which are described as “verbal, interactional, affective,
fragmented, reduced in form, and generalized in content” (Biber 1988, 105).
7.3.3 Data Analysis
To compare the orality of translated and non-translated fictional dialogues, the
first step is to grammatically annotate the corpus data and extract the statistics of
linguistic features needed for further quantitative analysis. To this end, the Multidimensional Analysis Tagger (MAT) (Nini 2019), which was designed to replicate
Biber’s (1988) MDA, was adopted in the present study. The MAT firstly tags
the input texts with the linguistic features proposed by Biber (1988). Then, the
program automatically calculates the normalized distribution (the frequency per
100 tokens) and computes the z-scores of the linguistic features in the corpus.
Subsequently, based on the z-scores of linguistic variables, the dimension scores
of the input texts are also calculated. The MAT also automatically matches the
input texts with the closest register (Nini 2019). The output of the MAT analysis
includes the normalized frequency and the z-scores of the individual linguistic
features, the dimension scores of the input texts, a dimension graph, and a text-type graph.
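The chain from raw counts to a dimension score described above can be sketched as follows. This is an illustrative reconstruction, not the MAT source code: it assumes that each frequency is normalized per 100 tokens, standardized against a reference mean and standard deviation, and that the dimension score is the sum of the z-scores of positive-loading features minus the sum of the z-scores of negative-loading features. The class FeatureReference and all numerical values are hypothetical placeholders, not figures from Biber (1988).

```python
from dataclasses import dataclass


@dataclass
class FeatureReference:
    """Reference statistics for one linguistic feature (hypothetical values)."""
    mean: float              # reference mean frequency per 100 tokens
    sd: float                # reference standard deviation
    positive_loading: bool   # True if the feature loads positively on dimension 1


def normalized_frequency(count: int, total_tokens: int) -> float:
    """Frequency of a feature per 100 tokens."""
    return count * 100 / total_tokens


def z_score(frequency: float, ref: FeatureReference) -> float:
    """Standardize a normalized frequency against the reference statistics."""
    return (frequency - ref.mean) / ref.sd


def dimension_score(counts: dict, total_tokens: int, references: dict) -> float:
    """Add z-scores of positive-loading features, subtract negative-loading ones."""
    score = 0.0
    for feature, count in counts.items():
        ref = references[feature]
        z = z_score(normalized_frequency(count, total_tokens), ref)
        score += z if ref.positive_loading else -z
    return score


# Hypothetical reference values and counts, for illustration only.
references = {
    "private_verbs": FeatureReference(mean=2.0, sd=1.0, positive_loading=True),
    "nouns": FeatureReference(mean=18.0, sd=4.0, positive_loading=False),
}
counts = {"private_verbs": 40, "nouns": 140}
score = dimension_score(counts, total_tokens=1000, references=references)
print(score)
```

In this toy case a text with more private verbs and fewer nouns than the reference values receives a positive score, that is, it moves toward the involved, oral end of the dimension.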
After pre-processing the corpus data and obtaining the statistics of the linguistic features, quantitative analyses were conducted to compare the degree of orality in translated and non-translated fictional dialogues. We first conducted the
one-sample Kolmogorov-Smirnov test and Levene’s test to check the assumptions of normal distribution and homogeneity of variance (Larson-Hall 2015). The
alpha level was set at .05 for this study. The results of the one-sample Kolmogorov-Smirnov test indicated that the dimension scores of both the translated fictional
dialogues (p = .20) and the non-translated fictional dialogues (p = .20) were
normally distributed. Levene’s test showed that the variances of the two groups were equal (p = .77). As the assumptions were fulfilled, an
independent samples t-test was conducted to test whether the translated and
non-translated fictional dialogues differ in the score of dimension 1 (RQ1). To
get a more holistic picture of the overall degree of orality in translated and non-translated fictional dialogues, the dimension scores of various text types used by
Biber (1988) were also given as references.
In addition to the independent samples t-test, the normalized frequency scores
of individual linguistic features in the translated and non-translated fictional dialogues were also compared to reveal how the two text types differ in these features. The Mann-Whitney U test was utilized, as certain linguistic features did not
fulfil the assumptions of normality or equality of variances (RQ2). The effect size
of the features exhibiting significant differences (p<.05) was also calculated. The
features that distinguish the two text types were discussed in detail with qualitative examples.
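The Mann-Whitney U test used for RQ2 can be sketched in plain Python as below. The sketch assumes the normal approximation without a correction for ties, so its z-values may differ slightly from those of statistical packages that apply such a correction. The function names and the two small samples are hypothetical and are not drawn from the corpus.

```python
import math


def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U, z, two-tailed p) using the normal approximation, no tie correction."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u = min(u_a, n_a * n_b - u_a)          # report the smaller of the two U values
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))   # two-tailed p from the standard normal
    return u, z, p


# Hypothetical normalized frequencies of one feature in two groups of texts.
non_translated = [1.2, 1.5, 1.9, 2.4]
translated = [0.4, 0.8, 1.1, 1.3]
u, z, p = mann_whitney_u(non_translated, translated)
print(u, round(z, 3), round(p, 3))
```

Because the test works on ranks rather than raw values, it does not require the normality or equal-variance assumptions that some of the individual features failed to meet.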
7.4 Results
7.4.1 Overall Dimension Scores
Table 7.2 presents the mean score, the standard deviation of dimension 1, as well
as the closest genre of translated and non-translated fictional dialogues. As shown
in Table 7.2, both translated and non-translated fictional dialogues received a
positive score on dimension 1. Although the mean score of non-translated fictional dialogues (M = 22.04, SD = 6.80) was higher than that of translated fictional
dialogues (M = 15.62, SD = 7.13), the independent samples t-test showed that
the difference between the two text types was only marginally significant in terms of
the overall score of dimension 1 (t = 2.06, p = .054, df = 18). Rather than taking this
to mean that the two text types do not differ from each other, we treat this marginally
significant result with caution. One possible explanation for
such a result might be the small sample size in both groups (Huck 2011), that is,
only ten translated fictions and ten non-translated fictions were involved in the
analysis.
Table 7.2 Score of Dimension 1

Text Type        N    Mean Score   Std. Deviation   Closest Genre
Non-translated   10   22.04        6.80             Personal Letters
Translated       10   15.62        7.13             Personal Letters
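As a consistency check, the reported t-value can be recomputed from the summary statistics in Table 7.2. The sketch below assumes the standard pooled-variance (equal variances) form of the independent samples t-test, which matches the reported df = 18; the function name is hypothetical.

```python
import math


def pooled_t_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Independent samples t-test (equal variances) from means, SDs, and group sizes."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df


# Values from Table 7.2: non-translated vs. translated fictional dialogues.
t, df = pooled_t_from_summary(22.04, 6.80, 10, 15.62, 7.13, 10)
print(round(t, 2), df)  # 2.06 18
```

The result agrees with the reported t = 2.06 with 18 degrees of freedom.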
Figure 7.1 Scores for dimension 1 of different registers.
Figure 7.1 shows the spread of scores of different registers regarding dimension 1, in which the degree of orality of translated and non-translated fictional
dialogues is illustrated together with related registers. The registers for comparison include face-to-face conversations, broadcasts, prepared speeches, personal
letters, general fiction, press reportage, academic prose, and official documents
(note the statistics are taken from Biber [1988]). The dots in the middle represent
the mean dimension score of the register, and the upper and lower whiskers show
the dispersion of the scores. As indicated in Figure 7.1, written registers, like
official documents, press reportage, and academic prose, receive negative mean
scores. Broadcasts and general fiction also exhibit negative mean scores, but the
variation of the scores for these two registers is large, which is possibly due to the
influence of sub-genres (Biber 1988). The rest of the registers, including conversations, prepared speeches, personal letters, and translated and non-translated fictional dialogues, received positive mean scores. Among all the registers receiving
a positive mean score, prepared speech is the lowest, face-to-face conversations
the highest, and personal letters, translated fictional dialogues, and non-translated
fictional dialogues range in between. From a functional perspective, the positive
scores suggest that these text types are more involved and interactive in nature,
and the spread of the dimension scores indicates the different tendency toward
orality. The mean scores for both translated and non-translated fictional dialogues
are higher than that of prepared speeches, lower than that of face-to-face conversations, and similar to that of personal letters. This shows that both
translated and non-translated fictional dialogues are characterized by a relatively high
degree of orality: they contain fewer informational features than prepared
speeches but fewer orality features than face-to-face conversations. In addition,
compared with the large variation of face-to-face conversations, the variation
of fictional dialogues is smaller, indicating that the scripted fictional dialogues
are relatively narrower in linguistic features than authentic conversations. In the
end, it should also be noted that although both fictional dialogues resemble personal letters regarding mean scores, the variation of fictional dialogues is larger.
This shows that fictional dialogues display a lower degree of internal consistency
regarding orality than personal letters.
To sum up, translated fictional dialogues contain fewer orality features than their
non-translated counterparts, although the difference is only marginally significant. The overall dimension scores indicate that
both text types display a similarly high degree of orality. Based on the dimension scores, fictional
dialogues show great similarity with personal letters but are not directly comparable to face-to-face conversations.
7.4.2 Distribution of Linguistic Features
As indicated by the results of independent samples t-test, translated fictional dialogues did not differ significantly from non-translated fictional dialogues regarding the overall degree of orality. However, since the difference was marginally
significant (p = .054), a closer look at the distribution of individual linguistic
features might yield more insights into the similarities and disparities between
the two text types. Therefore, the Mann-Whitney U test was conducted to compare the distribution of individual linguistic features between the two text types.
Table 7.3 shows the results of the Mann-Whitney U test, including the mean rank
differences, the Mann-Whitney U, the z-score, and the p-value. The mean rank
differences reveal the discrepancy of individual features between the two text
types. A p-value smaller than .05 indicates that the linguistic feature
is significantly different between the two text types.
As we can see from Table 7.3, 17 out of 28 linguistic features are not significantly different between the two text types. The translated and non-translated
fictional dialogues receive similar scores regarding personal pronouns (including
first-person pronouns and second-person pronouns), questions (direct wh-questions), present tense, sentence relatives, independent clause coordinators, be as
a main verb, amplifiers, emphatics, contractions, possibility modals, and analytic
negation. In addition, both translated and non-translated fictional dialogues receive
similar negative scores regarding the other group of features that are representative
of informational texts, like nouns (excluding nominalization and gerunds), word
length, prepositional phrases, attributive adjectives, and type-token ratio.
On the other hand, the Mann-Whitney U test also identifies 11 (out of 28)
features that exhibit significantly different distributions in translated and
non-translated fictional dialogues, namely causative adverbial subordinators, demonstrative pronouns, discourse particles, hedges, indefinite pronouns,
pronoun it, private verbs, pro-verb do, stranded preposition, subordinator that
deletion, and wh-clauses. The mean rank differences of these 11 features are positive, meaning that the normalized frequency of these features, which are
positively correlated with orality, is higher in non-translated than in translated
fictional dialogues. Among the 11 features, 7 features, that is, private verbs, wh-clauses, hedges, pronoun it, subordinator that deletion, stranded preposition, and
indefinite pronouns, exhibit a large effect size. The remaining 4 features, that is,
discourse particles, demonstrative pronouns, pro-verb do, and causative adverbial subordinators,
have a medium effect size.

Table 7.3 Results of Mann-Whitney U-Test

Feature                             Mean Rank Diff.a   Mann-Whitney U   Z        Sig. (2-Tailed)
Private verbs                       9.6                2                -3.63    <0.001**
Wh-clauses                          8.5                7.5              -3.216   0.001**
Hedges                              7.9                10.5             -2.995   0.003**
Pronoun it                          7.2                14               -2.722   0.006**
Subordinator that deletion          6.6                17               -2.496   0.013**
Stranded preposition                6.3                18.5             -2.386   0.017**
Indefinite pronouns                 6.2                19               -2.35    0.019**
Discourse particles                 5.7                21.5             -2.16    0.031*
Demonstrative pronouns              5.5                22.5             -2.083   0.037*
Pro-verb do                         5.4                23               -2.043   0.041*
Causative adverbial subordinators   5.3                23.5             -2.007   0.045*
Present tense                       4.4                28.0             -1.663   0.096
Analytic negation                   4.0                30.0             -1.512   0.130
Contractions                        3.8                31.0             -1.436   0.151
Total prepositional phrases         -3.1               34.5             -1.173   0.241
Amplifiers                          2.6                37.0             -0.984   0.325
Word length                         -2.3               38.5             -0.870   0.384
Attributive adjectives              -2.2               39.0             -0.832   0.406
Emphatics                           -2.1               39.5             -0.794   0.427
Be as main verb                     2.0                40.0             -0.756   0.450
Possibility modals                  -1.1               44.5             -0.416   0.677
Sentence relatives                  0.8                46.0             -0.311   0.756
First-person pronouns               0.8                46.0             -0.302   0.762
Second-person pronouns              -0.7               46.5             -0.265   0.791
Type-token ratio                    0.2                49.0             -0.076   0.940
Direct wh-questions                 0.2                49.0             -0.076   0.940
Total other nouns                   -0.2               49.0             -0.076   0.940
Independent clause coordination     0.1                49.5             -0.038   0.970

a Mean rank diff. = mean rank (non-translated fictional dialogues) - mean rank (translated fictional dialogues).
** Large effect size (r > 0.5).
* Medium effect size (0.3 < r ≤ 0.5).
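The chapter does not state how the effect size r was obtained. A common choice for the Mann-Whitney U test is r = |Z| / √N, where N is the total number of texts (here 20). The sketch below assumes that formula and applies it to four of the reported Z-values; the function names are hypothetical, and the thresholds follow the table notes (large: r > 0.5; medium: 0.3 < r ≤ 0.5).

```python
import math


def effect_size_r(z: float, n_total: int) -> float:
    """Effect size r = |Z| / sqrt(N); an assumed formula, not stated in the chapter."""
    return abs(z) / math.sqrt(n_total)


def effect_label(r: float) -> str:
    """Classify r using the thresholds given in the table notes."""
    if r > 0.5:
        return "large"
    if r > 0.3:
        return "medium"
    return "small"


# Reported Z-values for four of the significant features (N = 10 + 10 texts).
reported_z = {
    "Private verbs": -3.63,
    "Indefinite pronouns": -2.35,
    "Discourse particles": -2.16,
    "Causative adverbial subordinators": -2.007,
}
for feature, z in reported_z.items():
    r = effect_size_r(z, n_total=20)
    print(f"{feature}: r = {r:.2f} ({effect_label(r)})")
```

Under this assumption, the seven features with |Z| of 2.35 or above come out as large effects and the remaining four as medium effects, which is consistent with the asterisks reported for the 11 significant features.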
Specifically, among the 11 significantly different features, one group of features
is related to attitudinal or interpersonal expressions, which are overrepresented in
non-translated fictional dialogues. In comparison, personal feelings or attitudes
are relatively underrepresented in the translated fictional dialogues. The feature with the largest effect size is private verbs (e.g., feel, perceive), which are
symbolic of attitudinal or affective expressions and indicative of interpersonal
communication. Private verbs are also one of the features that have the strongest
power to distinguish involved from informational texts. The non-translated fictional dialogues use significantly more private verbs than translated ones, revealing that the characters express their personal feelings and thinking more explicitly
in the former. For example, in one typical example of the non-translated fiction
Atonement, the character uses private verbs to express personal ideas or feelings
in a series of consecutive sentences.
You’d be forgiven for thinking me mad wandering into your house barefoot,
or snapping your antique vase. The truth is, I feel rather lightheaded and
foolish in your presence, Cee, and I don’t think I can blame the heat! Will
you forgive me?
– Robbie
In addition to private verbs, more frequent use of causative adverbial subordinators in non-translated dialogues also suggests an overrepresentation of more
attitudinal or affective expressions in this text type. For example, in the original
English fiction American Pastoral, “because” is frequently used and emphasized
(in the form of “b-because” and “b-b-because”) to express the strong emotions of
the character when s/he is arguing with another person. Another feature that is typical of interactive texts is wh-clauses, which function as “structural elaboration” and
provide a way to “talk about questions” (Biber 1988, 220). Wh-clauses are more
prevalent in conversations and speeches. Likewise, the non-translated fictional dialogues exhibited a higher frequency of wh-clauses in comparison to translated ones.
Another typical group of features in non-translated fictional dialogues are
impersonal pronouns, including pronoun it, indefinite pronouns, and demonstrative pronouns. These pronouns are used as general referents as they carry limited
information. Such linguistic features are often associated with a lack of careful thinking, thus featuring one of the typical traits of spoken texts. The higher
representation of pronouns and underuse of nominal referents in non-translated
fictional dialogues reveal a stronger degree of uncertainty typical of real-time
conversations. Therefore, in this aspect, the non-translated fictional dialogues are
less informational and share more similarities with real-time conversations than
translated fictional dialogues. Besides impersonal pronouns, the underrepresentation of hedges (e.g., maybe, possibly, kind of) in translated fictional dialogues
also implies that translated fictional dialogues carry a higher degree of perceptual
certainty. In the example extracted from the non-translated fiction Beloved, the
character uses the pronoun it to refer to the “ghost” that she was unsure about,
and the hedge maybe further highlights the uncertainty. Then another character
resolved her doubt by firstly using it to refer to the ghost, as the first character did,
and then shifting from it to she when referring to the ghost.
I don’t know about lonely, Mad, maybe, but I don’t see how it could be lonely
spending every minute with us like it does. Must be something you got it
wants.
It’s just a baby. My sister, she died in this house.
Using do as a pro-verb, as a distinctive feature of oral conversations, is also
typical of non-translated texts. The word do is polysemous and can be used as a
general verb in different contexts. The overuse of such a feature in the non-translated subcorpus implies a reduced information density and enhanced orality in
non-translated fictional dialogues. In comparison, translated texts prefer precise
wording to using the general verb do.
Moreover, compared with translated fictional dialogues, non-translated fictional dialogues preferred more reduced forms, represented by the constant omission of subordinator that and the more frequent use of stranded prepositions. On
the other hand, translated fictional dialogues preserve the subordinator that more
frequently. For example, in the translated fiction Border Town, the translator chose
to retain the subordinator that when the character was referring to other people’s
ideas in the utterance.
No. 2, my Cuicui tells me that one night during the last month she had a
dream.
It was strange. She said that in her dream someone’s songs floated her up
to the bluffs across the creek, where she picked a handful of saxifrage!
In comparison, in the non-translated fiction The Blind Assassin, subordinator
that is often omitted, as evidenced in both of the following sentences.
They said it was a matter of costs. After the button factory was burned, they
said it would take too much to rebuild it.
In addition, stranded prepositions are underrepresented in translated, while
overrepresented in non-translated, fictional dialogues. For example, in one of the
non-translated fictions, Snow Crash, the sentence “But you know that bug you
were talking to earlier?” contains the stranded preposition “to,” which is
separated from its nominal complement (“that bug”). Such a feature is typical of orality, whereas the non-stranded counterpart is representative of formal discourse.
Clearly, the non-translated fictional dialogues display a tendency toward the spoken end of the cline compared with the translated ones.
Discourse particles can serve different pragmatic functions (Aijmer 2002) and
are used to express the attitudes and beliefs of speakers regarding the propositional content of an utterance. As such, they also help to maintain textual coherence
especially when the text is fragmented. Two extracts taken respectively from the
translated and non-translated texts are used to illustrate the interesting distinction
between the two text types. In both examples, the character interacts with the
other character in a bad mood and expresses his/her idea about what the other
character has told him/her. In the translated fiction Schoolmaster, the speaker
expresses his disagreement by directly putting forward his suggestion using a
rhetorical question.
Why not let him get on with it?
If it’s something simple that one can manage oneself, there’s no point in
troubling someone else to do it.
In comparison, in the non-translated fiction Snow Crash, “oh” and “well” were
utilized to express the speaker’s discontent and signal the confrontational situation. From the conversation, it can be inferred that the speaker is not satisfied
with the answer given by the other speaker. The use of discourse particles clearly
indicates the speaker’s displeasure or even indignation. Also, using the discourse
particles has, to some extent, mitigated the face-threatening situation caused by
the rhetorical question.
“Where do you want to go on the Kowloon?”
“The Raft.”
“Oh, well, why didn’t you say so, that’s where our other passenger is
going.”
The underuse of discourse particles in translated fictional dialogues as opposed
to non-translated ones suggests that translations might lack the authenticity and
naturalness of face-to-face conversations compared to the originals.
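The contrast between the two extracts can be made concrete with a simple frequency count. The following Python sketch is illustrative only and is not the procedure used in this study. It assumes a two-item particle list (“oh” and “well”, the particles discussed above), plain string matching, and a normalization base of 1,000 words; the function name per_thousand is hypothetical.

```python
import re

# Assumed particle list: only the two particles discussed in the text.
DISCOURSE_PARTICLES = {"oh", "well"}

def per_thousand(text, particles=DISCOURSE_PARTICLES):
    """Return the frequency of the given particles per 1,000 word tokens."""
    # Normalize curly apostrophes so that contractions stay single tokens.
    tokens = re.findall(r"[a-z']+", text.lower().replace("’", "'"))
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in particles)
    return 1000.0 * hits / len(tokens)

# The dialogue lines quoted above from the two extracts.
translated = "Why not let him get on with it?"
non_translated = (
    "Oh, well, why didn't you say so, "
    "that's where our other passenger is going."
)

print("translated:", round(per_thousand(translated), 1))
print("non-translated:", round(per_thousand(non_translated), 1))
```

String matching of this kind overcounts, since “well” can also function as an adverb, and two single lines are far too short to support any generalization. A tagged corpus would be needed for a reliable comparison.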
In summary, although the differences between the translated and non-translated fictional dialogues are
only marginally significant in terms of dimension scores, the two text types differ significantly in
terms of the distribution of various individual features. In particular, in comparison with non-translated fictional dialogues, the translated fictional dialogues are
characterized by an underrepresentation of personal attitudes and emotions, an underuse of discourse particles, and more complete and precise expressions.
7.5 Discussion
This study reveals the similarities and differences in orality between translated
and non-translated fictional dialogues by making use of the multidimensional
analysis approach. By treating fictional dialogue as a genre in its own right,
we have come up with some interesting findings that might otherwise remain
undetected if fiction is treated as one single genre. In this study, it is found that
fictional dialogue shares more similarities with personal letters but nonetheless
still exhibits a considerable degree of orality. Fictional dialogue, both translated
and non-translated, does not resemble general fiction, as reflected by the overall dimension scores, which confirmed the proposal of previous researchers that
fictional dialogues and narration are indeed two different genres that should be
analyzed separately (Axelsson 2009). One possible explanation might be that fictional dialogues are scripted texts that are artfully created to simulate real-life
conversations representative of the sociocultural background of the characters
(Bublitz 2017; Jucker 2021). The findings of the present study also corroborate
those of Bednarek (2018) and Jucker (2021) that the scripted language of fiction displays
different features from unscripted conversations regarding orality and thus can
never be the same as spontaneous conversations. Notwithstanding the efforts to
model and reproduce real-life conversations (Leech and Short 2007), scripted
language is, as argued by Chaume (2007, 215), “very normative indeed.”
The marginally significant differences between translated and non-translated
fictional dialogues suggest that the two text types still differ considerably. Such differences are supported by the discrepant normalized frequency
distribution of individual linguistic features between the two text types. This is in
line with the findings of Brodovich (1997) that translations differ from originals
in their portrayal of characters speaking nonstandard language. Such a difference
is reflected in vocabulary as well as grammar features, which can partly be attributed to the translators’ efforts to standardize translated texts. As far as the current
study is concerned, the omission of the subordinator that and stranded prepositions
are less frequent in translated fictional dialogues, indicating that translated language
favors more standardized structures over reduced forms or fragmented ones.
The quantitative findings of the present study also corroborate the qualitative findings of previous research that translated fictional dialogues tend toward
standardization of language use (e.g., Read 2013; Tiittula and Nuolijärvi 2016;
Nevalainen 2004). In addition, some distinct features of translated fictional dialogues might also be related to translators’ decision to explicitate the source text
(Blum-Kulka 1986/2000). For example, expressions that indicate vagueness and
uncertainties, such as hedges, do as a pro-verb, pronoun it, indefinite pronouns,
and demonstrative pronouns, are often underused in translated fictional dialogues.
The findings give clear support for the worries of Ben-Shahar (1994) that translators prefer more specific lexemes and explicit verbalization to generalized or
uncertain expressions.
Another possible explanation for the differences between translated and non-translated fictional dialogues might be the influence of the source language.
For instance, the present study contradicts Nevalainen (2004), who found that
translators frequently used interjections and speech fillers to retain orality in the
translation. In the current study, we found that non-translated texts used discourse
particles at a higher frequency than translated texts, suggesting that the translations might be subject to unnaturalness and incoherence. One possible reason
for the divergent findings might be the influence of the source language. As Liu
(2013) found, people with different first languages might have different ways of
using discourse particles. In the present study, the source language is Chinese,
while in Nevalainen’s (2004) study, the fictional dialogues were translated from
Finnish. Source language clearly has a role to play in the translation of fictional
dialogues. The source texts written in different languages might have a different
proportion of attitudinal or affective expressions, which are then transferred to
the translated texts. As argued by Bishop (1956), compared to Western fiction,
emotions in Chinese fiction tend to be implicitly expressed and often conveyed
through the narrator’s voice rather than the fictional dialogues. Consequently,
when the translators follow the source language norm by opting for a more faithful approach, it is natural that emotional and affective language might be underrepresented in the translated fictional dialogues.
Despite the influence of source language, it should also be noted that the pragmatic functions of certain linguistic features might also be lost in the translation process of standardization or explicitation. The reduced degree of orality
might also result in unnaturalness and lack of spontaneity in translated fictional
dialogues (Ben-Shahar 1994). As fictional characters who come from different
sociocultural backgrounds can be portrayed to exhibit divergent characteristics
of speech (Locher and Jucker 2021), the degree of orality should be treated with
extra attention in translated fiction, which renders the source sociocultural backgrounds in the target language. So far as the current study is concerned,
the relatively lower degree of orality regarding certain linguistic features and
the tendency toward standardization and explicitation in translated fictional dialogues, as warned by Tiittula and Nuolijärvi (2016), can influence the shaping of
characters and even misrepresent the relationship between characters intended
by the author. We suspect that, like other types of translation activities, translators of fictional dialogues are also trapped in a dilemma of either employing a
more “literate” approach to produce more faithful but less “authentic” fictional
dialogues or opting for a more “adaptational” approach to render less faithful but
more “natural” dialogues.
7.6 Conclusion
The current study has used multidimensional analysis to examine the orality features in translated and non-translated fictional dialogues. In comparison to other
models, the consistency and perceived robustness of this model have greatly
increased the generalizability of the research findings. Our study has found that
translation as an important variable has played a crucial role in affecting the profiling of translated fictional dialogues, which differ significantly from non-translated ones in a range of language features.
Notwithstanding the interesting findings, it is admitted that some limitations
exist in the present study. The analysis has concentrated on fiction works that
were published or translated from the 1970s to the early twenty-first century.
Since translation is influenced by negotiation between sociocultural powers and
the prevalent translation norms (Rosa 2000), future studies could compile a bigger corpus by including more fiction works for analysis. Another limitation arises
from the nature of translation. The findings of this study are restricted to the design
of the comparable corpus comprised of translated and non-translated fictional dialogues without referring to the source texts; therefore, the influence of the source
texts on the orality features of translation remains unknown. Future studies can
be conducted to examine to what extent the differences in orality between these
two text types are a result of translation or source language influence. In this
regard, a composite bilingual corpus (Laviosa 2006, 268) integrating both
comparable and parallel corpora can be fruitfully utilized to explore a number of
interrelated variables in translated fictional dialogues.
Note
1 Corresponding author.
References
Aijmer, Karin. 2002. English Discourse Particles: Evidence from a Corpus (Vol. 10).
Amsterdam/Philadelphia: John Benjamins Publishing.
Arhire, Mona. 2019. “Lexical Emphasis in the Literary Dialogue: A Translational Perspective.” Acta Universitatis Sapientiae, Philologica 11, no. 3: 105–18.
Axelsson, Karin. 2009. “Research on Fiction Dialogue: Problems and Possible Solutions.”
In Corpora: Pragmatics and Discourse, edited by Andreas H. Jucker, Daniel Schreier,
and Marianne Hundt, 189–201. Leiden: Brill.
Bednarek, Monika. 2018. Language and Television Series: A Linguistic Approach to TV
Dialogue. Cambridge: Cambridge University Press.
Ben-Shahar, Rina. 1994. “Translating Literary Dialogue: A Problem and Its Implications
for Translation into Hebrew.” Target. International Journal of Translation Studies 6, no.
2: 95–121.
Biber, Douglas. 1988. Variation across Speech and Writing. Cambridge: Cambridge University Press.
Biber, Douglas. 2014. “The Ubiquitous Oral Versus Literate Dimension: A Survey of Multidimensional Studies.” In Measured Language: Quantitative Studies of Acquisition,
Assessment, and Variation, edited by Jeffrey Connor-Linton and Luke W. Amoroso,
1–20. Washington, DC: Georgetown University Press.
Biber, Douglas, Susan Conrad, Randi Reppen, Pat Byrd, and Marie Helt. 2002. “Speaking
and Writing in the University: A Multidimensional Comparison.” TESOL Quarterly 36,
no. 1: 9–48.
Biber, Douglas, and Jesse Egbert. 2020. “Orality on the Searchable Web: A Comparison
of Involved Web Registers and Face-to-Face Conversation.” In Voices Past and Present
Studies of Involved, Speech-related and Spoken Texts. In Honor of Merja Kytö, edited by
Ewa Jonsson and Tove Larsson, 317–36. Amsterdam: John Benjamins.
Bishop, John L. 1956. “Some Limitations of Chinese Fiction.” The Far Eastern Quarterly
15, no. 2: 239–47.
Blum-Kulka, Shoshana. 1986/2000. “Shift of Cohesion and Coherence in Translation.” In
The Translation Studies Reader (2nd ed.), edited by Lawrence Venuti and Mona Baker,
298–313. London and New York: Routledge.
Brodovich, Olga I. 1997. “Translation Theory and Non-standard Speech in Fiction.” Perspectives: Studies in Translatology 5, no. 1: 25–31.
Bublitz, Wolfram. 2017. “Oral Features in Fiction.” In Pragmatics of Fiction, edited by M.
A. Locher and A. H. Jucker, 235–64. Berlin: de Gruyter.
Bucholtz, Mary, and Kira Hall. 2005. “Identity and Interaction: A Sociocultural Linguistic
Approach.” Discourse Studies 7, no. 4–5: 585–614.
Chaume, Frederic. 2007. “Dubbing Practices in Europe: Localisation Beats Globalisation.”
Linguistica Antwerpiensia 6: 201–17.
Egbert, Jesse, and Michaela Mahlberg. 2020. “Fiction – One Register or Two? Speech and
Narration in Novels.” Register Studies 2, no. 1: 72–101.
Ettobi, Mustapha. 2015. “Translating Orality in the Postcolonial Arabic Novel: A Study
of Two Cases of Translation into English and French.” Translation Studies 8, no. 2:
226–40.
Ferguson, Susan L. 1998. “Drawing Fictional Lines: Dialect and Narrative in the Victorian
Novel.” Style 32, no. 1: 1–17.
Holmes, Janet, and Nick Wilson. 2017. An Introduction to Sociolinguistics. London:
Routledge.
Huck, Schuyler W. 2011. Reading Statistics and Research. London: Pearson Education.
Ikeo, Reiko. 2019. “ ‘Colloquialization’ in Fiction: A Corpus-Driven Analysis of Present-Tense Fiction.” Language and Literature: International Journal of Stylistics 28, no. 3:
280–304.
Jonsson, Ewa. 2015. Conversational Writing: A Multidimensional Study of Synchronous
and Supersynchronous Computer-Mediated Communication. Bern: Peter Lang.
Jucker, Andreas H. 2021. “Features of Orality in the Language of Fiction: A Corpus-based
Investigation.” Language and Literature 30, no. 4: 341–60.
Koivisto, Aino, and Elise Nykänen. 2016. “Introduction: Approaches to Fictional Dialogue.” International Journal of Literary Linguistics 5, no. 2: 1–14.
Larson-Hall, Jennifer. 2015. A Guide to Doing Statistics in Second Language Research
Using SPSS and R. New York and London: Routledge.
Laviosa, Sara. 2006. “Data-Driven Learning for Translating Anglicisms in Business Communication.” IEEE Transactions on Professional Communication 49, no. 3: 267–74.
Leech, Geoffrey N., and Mick Short. 2007. Style in Fiction: A Linguistic Introduction to
English Fictional Prose (No. 13). London: Pearson Education.
Leppihalme, Ritva. 2000. “The Two Faces of Standardization: On the Translation of
Regionalisms in Literary Dialogue.” The Translator 6, no. 2: 247–69.
Liu, Binmei. 2013. “Effect of First Language on the Use of English Discourse Markers by
L1 Chinese Speakers of English.” Journal of Pragmatics 45, no. 1: 149–72.
Locher, Miriam A., and Andreas H. Jucker. 2021. The Pragmatics of Fiction: Literature,
Stage and Screen Discourse. Edinburgh: Edinburgh University Press.
Nevalainen, Sampo. 2004. “Colloquialisms in Translated Text. Double Illusion?” Across
Languages and Cultures 5, no. 1: 67–88.
Newmark, Peter. 1987. A Textbook of Translation. Hoboken, NJ: Prentice-Hall International.
Nini, Andrea. 2019. “The Multi-Dimensional Analysis Tagger.” In Multi-Dimensional
Analysis: Research Methods and Current Issues, edited by T. Berber Sardinha and M.
Veirano Pinto, 67–94. London and New York: Bloomsbury Academic.
Nykänen, Elise, and Aino Koivisto. 2016. “Fictional Dialogue and the Construction of
Interaction in Rosa Liksom’s Short Stories.” International Journal of Literary Linguistics 5, no. 2: 1–30.
Ong, Walter J. 1982. Orality and Literacy: The Technologizing of the Word. London:
Methuen.
Quaglio, Paulo. 2009. Television Dialogue: The Sitcom Friends vs. Natural Conversation.
Amsterdam and Philadelphia, PA: John Benjamins.
Read, Andrew. 2013. “Translating and Adapting Fictional Speech: The Case of Philip Pullman’s Northern Lights.” PhD diss., University of Manchester.
Rosa, Alexandra Assis. 2000. “The Negotiation of Literary Dialogue in Translation: Forms
of Address in Robinson Crusoe Translated into Portuguese.” Target. International Journal of Translation Studies 12, no. 1: 31–62.
Rosa, Alexandra Assis. 2015. “Translating Orality, Recreating Otherness.” Translation
Studies 8, no. 2: 209–25.
Short, Mick. 1996. Exploring the Language of Poems, Plays, and Prose. London: Longman.
Toury, Gideon. 2012. Descriptive Translation Studies – And Beyond (Revised Edition).
Amsterdam and Philadelphia: John Benjamins Publishing Company.
Thomas, Bronwen E. 1997. “ ‘It’s Good to Talk’? An Analysis of a Telephone Conversation
from Evelyn Waugh’s Vile Bodies.” Language and Literature: International Journal of
Stylistics 6, no. 2: 105–19.
Thomas, Bronwen E. 2002. “Multiparty Talk in the Novel: The Distribution of Tea and Talk
in a Scene from Evelyn Waugh’s Black Mischief.” Poetics Today 23, no. 4: 657–84.
Tiittula, Liisa, and Pirkko Nuolijärvi. 2016. “Changing Norms in Translated Finnish Fiction: A Study of Non-Standard Varieties.” International Journal of Literary Linguistics
5, no. 3: 1–26.
Xiao, Richard. 2009. “Multidimensional Analysis and the Study of World Englishes.”
World Englishes 28, no. 4: 421–50.
8 The Avoidance of Repetition in Translation
A Multifactorial Study of Repeated Reporting Verbs in the Italian Translation of the Harry Potter Series
Lorenzo Mastropierro
8.1 Introduction
Repetition in language is ubiquitous. It is a “central linguistic meaning-making
strategy” (Tannen 1989, 97) that can occur on all linguistic levels (Wales 2011,
366) and in all kinds of registers and discourses (McCarthy and Carter 2014,
144f). As corpus linguistics insights have shown, the repetition of the same linguistic item or pattern can signal functional relevance (Mahlberg 2010, 297).
A repeated feature can, for example, fulfill discursive functions, such as expressing stances or organizing discourses, which may be register-specific (e.g., Conrad
and Biber 2004; Biber 2009); it can impart attitudinal and evaluative meanings
by co-occurring regularly with another item (Sinclair 2004; Partington 1998;
Stubbs 2002), or it can establish cohesive links throughout a text (Halliday and
Hasan 1976; Flowerdew and Mahlberg 2009). More relevantly for the scope of
this chapter, repetition can also be used for emphasis to foreground a textual element (Leech and Short 2007). The foregrounding function of repeated linguistic
elements is particularly important in the study of style, where repetition is seen as
one of the main devices through which stylistic effects are created.
Repetition has always played a central role in the discipline of stylistics, since
the early formalist discussions on the defamiliarizing potentials of poetic language (Mukařovský 1932). Wales (2011, 366) maintains that “it is impossible not
to appreciate the significance of repetition” in literary language, and this significance has been demonstrated by extensive research, both qualitative and quantitative. For example, Prusse (2012) and Paton (2009) adopt qualitative approaches,
manually identifying relevant examples of repetition in McGahern’s work and
in Beckett’s Lessness, respectively, and discussing their stylistic function. However, it is quantitative studies that have demonstrated more clearly the pervasive
role of repetition as a foregrounding device. In fact, corpus linguistic tools facilitate and improve the analysis of repetition, at the same time widening the range
of repeated patterns identifiable to encompass phenomena impossible to detect
manually. Corpus approaches have been used to explore the stylistic relevance
of repetition in a variety of patterns, such as clusters and n-grams (Ikeo 2016;
DOI: 10.4324/9781003298328-9
Mahlberg 2012, 2013), speech-bundles (Mahlberg et al. 2019), keywords (Vincent and Clarke 2017, Mahlberg and McIntyre 2011), keyword networks (Mastropierro and Mahlberg 2017), and collocations (Hori 2004), as well as to show
how repetition can achieve a range of different functions, for example, building
characterization (Ruano San Segundo 2017, 2018a, 2018b), creating narrative
prospection (Toolan 2016), and establishing point of view (Ikeo 2016). Overall,
the literature agrees in considering repetition a linguistic feature that can importantly contribute to the creation of a variety of stylistic effects and functions.
When it comes to writing conventions, though, repetition is seen unfavorably, especially in creative and literary writing, where “too much repetition . . .
can be tedious” (Leech and Short 2007, 199). “Repetitiveness” or “redundancy”
have pejorative connotations and, as McCarthy and Carter (2014, 145) remark,
being told “You’re repeating yourself” or “This piece is too repetitive” is usually perceived as a criticism. This is especially true for lexical repetition, which
is easily noticeable and can suggest a lack of premeditation typical of ordinary
speech, giving a written text the impression of being unsophisticated (Wales 2011,
366). Wales (2011, 36) explains that, for this reason, repetition is often avoided in
writing, in favor of synonymity and substitution, or, in Leech and Short’s (2007,
199) words, “elegant variation.” This stigma is mirrored in translation too, where
avoiding repetition has been famously equated to a universal norm by Ben-Ari
(1998, 3): “One of the most persistent and inflexible norms in translation, in all
languages studied (by myself and by colleagues elsewhere), is that of avoiding repetitions.” Even though the empirical research on the topic is insufficient
to confirm the universal nature of this tendency, the existing studies (see Section 8.2) have shown that the avoidance of repetition in favor of lexical variety
occurs in many of the language pairs and contexts analyzed, suggesting that such
a tendency does exist. Given that repetition is a stylistically relevant linguistic
phenomenon, it is then safe to expect that removing repetition in translation or
altering the patterns it creates in the source text (ST) can impact importantly on
the role that repetition has in the creation of stylistic effects.
Despite the importance of the topic and of its implications for translation practice and training, repetition in translation has not been studied as extensively as
it deserves, especially with data-based approaches, such as corpus linguistics. As
the next section will show, most of the existing studies focus on investigating the
stylistic consequences on the target text (TT) of replacing and/or altering repeated
patterns in the original. Even though this approach provides important evidence
of the impact that translator choices like these can have on the style of a literary text, it sheds little light on the nature of the phenomenon itself, concentrating instead on its consequences. Moreover, the latest developments in empirical
translation studies have demonstrated the importance of approaching translation
as a multidimensional phenomenon in which a variety of different factors interact
simultaneously to shape the final product. De Sutter and Lefer (2019, 6) argue
that multifactorial research designs are essential to capture the multidimensional
nature of the translation phenomenon, and yet no multifactorial analysis of repetition in translation is currently available. This chapter contributes to redressing these
gaps with a multifactorial study of repetition in translation that aims to describe
the linguistic context in which a repeated ST item is replaced by lexical variety in
the TT. More specifically, it investigates a set of factors (frequency of the ST item,
number of possible translation equivalents of the ST item, number of different
meanings of the ST item, semantic category of the ST item) as predictors of the
replacement of repetition with lexical variety. Through an analysis of the translation of repeated reporting verbs in the Italian version of the Harry Potter series,
this chapter offers a first multidimensional overview of the phenomenon and of
its occurrence.
8.2 Repetition, Translation, and Style
In line with its functional and stylistic relevance in linguistics, repetition has
been considered in translation studies as an “absolutely essential” feature of the
ST that “must be reproduced” in translation (Boase-Beier 1994, 407). Yet the
avoidance of repetition in favor of variety is seen as a dominant trend in real-world translation practice (Ben-Ari 1998), and many studies have provided evidence of this tendency. Zhu (2004), for instance, finds that leitmotifs that build on
lexical repetition in Galsworthy’s The Apple Tree and in Fitzgerald’s The Great
Gatsby are disrupted in their Chinese translations because the repeated lexical
items are translated as discrete units instead of networks. Similarly, al-Khafaji
(2006) shows that avoiding or minimizing repetition is the most commonly used
strategy to translate lexical repetition chains in the English version of an Arabic
short story. Jawad (2009) works on the same language pair (Arabic to English)
and reaches comparable conclusions: the foregrounding potential of rhetorical
repetition in Tāhā Hussein’s al-Ayyām is affected by the replacement of repetition
with “pervasive variation” (Jawad 2009, 768) in the English translation. Finally,
Zupan (2006) explains that repeated patterns in Poe’s “The Fall of the House of
Usher” are toned down in its Slovene translation, affecting the stylistic effect the
patterns enact in the original.
The spread of corpus linguistics methods in translation studies (i.e., corpus-based translation studies [CBTS], Granger and Lefer 2022; Mikhailov and
Cooper 2016; Hu 2016; Kruger et al. 2013) has brought additional evidence for
the tendency to avoid repetition in translation. However, even though most CBTS
research investigates repeated patterns in corpora of translated texts, only a minority of these studies focus on the translation of repetition explicitly. Although limited, the existing literature confirms the trend identified by the qualitative research
discussed previously. This is especially evident in the work by Čermáková, who
shows how different types of repeated patterns – keywords and clusters – are
rendered in literary translation. Čermáková and Fárová (2010) analyze the translation into Czech and Finnish of plot-relevant keywords in Harry Potter and the
Philosopher’s Stone. They show that, when multiple equivalents are available,
the translators vary their choices instead of using the same term consistently as
in the original. Similar findings are seen in the translation of clusters. Čermáková
(2015, 2018) finds that the repetition of long, text-specific clusters in Irving’s
A Widow for One Year (Čermáková 2015) and in the Winnie the Pooh books
(Čermáková 2018) is avoided or toned down in translation. She suggests that
“repetition seems to be a source of discomfort for many translators” (Čermáková
2018, 130), as both the Czech and Finnish translators opt for different solutions
that disrupt the verbatim recurrence of the clusters. To the best of my knowledge,
the only study that reaches different conclusions is that of Károly (2010). Her
comparison of different types of repetition – as a feature of textual cohesion and
coherence – in Hungarian news texts and their English translations does not confirm repetition avoidance. The TTs do not contain fewer instances of repetition
than the STs; on the contrary, the English texts present a higher occurrence of
simple, verbatim repetitions than the Hungarian original, although the difference
is not statistically significant (Károly 2010, 63).
Although not aimed at studying the translation of repetition directly, research
on the translation of reporting verbs has produced findings aligned with the results
of the studies reported so far. This research is highly relevant, as it focuses on the
translation of a repeated linguistic feature in narrative prose that is commonly recognized as having important stylistic implications. In fact, given their fundamental
contribution to characterization (Culpeper 2001; Ruano San Segundo 2016, 2017;
Eberhardt 2017; Mastropierro forthcoming), reporting verbs are often seen as an
important stylistic feature in literary texts, one that should be reproduced in translation “in order to replicate the characterising traits they endow to the characters
associated with them” (Mastropierro 2020, 243). Yet the existing literature on the
topic shows the same tendency to replace repetition with lexical variety, through
translating the same reporting verbs into multiple target language items, especially
from English into other languages. An increase in the lexical variety of reporting
verbs from source to target texts is reported in translations from English into Spanish (Rojo and Valenzuela 2001; Bourne 2002), Czech (Corness 2010; Čermáková
and Mahlberg 2018), Hungarian (Klaudy and Károly 2005), German (Winters
2007), and Italian (Mastropierro 2020). It is often the verb said, by far the most frequent
English reporting verb, that is translated in many different ways. For example,
Corness (2010, 162) finds that the 9,992 occurrences of said in his English novels
corpus are translated into 1,323 different ways, while Mastropierro (2020, 254)
reports that 5,825 repetitions of said in the Harry Potter series are translated into
207 different Italian verbs. The predominance of said in reporting verb usage in
English narrative prose, compared to the use of its equivalent in other languages,
has led some scholars (e.g., Rojo and Valenzuela 2001, 476; Levý 2011, 113) to
suggest that a difference in narrative styles between English and the
target languages – the former tolerating the repetition of said far more than the latter – may be the reason why this type of repetition is replaced with lexical variety. Even if
this is the case, Mastropierro (2020) shows that replacing said, an “interpretatively
empty” reporting verb (Ruano San Segundo 2017, 111), with more semantically
loaded verbs can disrupt the pattern relating verb usage to specific characters,
affecting the potentials of patterns of verbs to convey characterization.
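The kind of count reported by Corness (2010) and Mastropierro (2020), that is, how many different target-language renderings correspond to one repeated source verb, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the input is taken to be a list of already aligned (source verb, target verb) pairs, the function name target_variety is hypothetical, and the data are invented for the example and come from no corpus.

```python
from collections import defaultdict

def target_variety(aligned_pairs):
    """Map each source verb to (occurrences, number of distinct target renderings)."""
    renderings = defaultdict(list)
    for source_verb, target_verb in aligned_pairs:
        renderings[source_verb].append(target_verb)
    return {
        src: (len(tgts), len(set(tgts)))
        for src, tgts in renderings.items()
    }

# Invented alignment data for illustration only; not taken from any corpus.
pairs = [
    ("said", "disse"),
    ("said", "rispose"),
    ("said", "disse"),
    ("said", "esclamò"),
    ("asked", "chiese"),
    ("asked", "chiese"),
]

for verb, (occurrences, distinct) in target_variety(pairs).items():
    print(verb, occurrences, distinct)
```

A source verb whose number of distinct renderings exceeds one has had its repetition at least partly replaced with lexical variety; the higher that number relative to the number of occurrences, the more pervasive the variation.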
More generally, the impact of avoiding repetition in translation has been convincingly demonstrated. Translating repeated items or patterns as discrete units
disrupts the networks of significance that repetition creates. This disruption may
be local, when repetition occurs in a short portion of text. For example, Zupan
(2006, 265) discusses the choice of the translator to remove the repetition of
three appositions occurring one after the other in a short passage of Poe’s “The
Fall of the House of Usher.” More often, though, repeated items build a network
of recurrences that develops throughout a text, with each individual occurrence
emphasizing and strengthening the overall pattern. In these cases, the disruption
resulting from altering the patterns of repetition is not local but textual: it affects, to
put it in Zupan’s (2006, 267) words, “the way the translated text functions at the
macrostructural level, compared to the original text.” Macrostructural changes
have been reported, for example, in the Italian translation of the Harry Potter
series (Mastropierro 2020), where the replacement of patterns of repeated reporting verbs with lexical variety at the level of verb usage results in potential shifts in
character development throughout the series. Similarly, Čermáková and Mahlberg
(2018) find macrostructural changes in the modelling of Alice and the Queen in
the Czech version of Carroll’s Alice’s Adventures in Wonderland as a result of
an inconsistent translation of the ST repeated reporting verbs and body language
patterns throughout the novel. However, if the stylistic and interpretational implications of repetition avoidance have been evidenced (e.g., Mastropierro 2020;
Čermáková and Mahlberg 2018; Čermáková 2015, 2018; Čermáková and Fárová
2010; Zupan 2006), and the strategies used by the translators to avoid repetition
discussed (e.g., Ben-Ari 1998; al-Khafaji 2006; Jawad 2009), insufficient attention has instead been paid to the description of the phenomenon itself. There is,
for instance, little or no discussion, substantiated by corpus data, of when repetition avoidance occurs, with what types of items, and in what textual contexts.
To help redress this gap, the present chapter presents a study of
repetition avoidance in translation that focuses on the occurrence of the phenomenon itself rather than on its translational consequences. In particular, it investigates multiple linguistic factors as predictors of repetition avoidance. In this way,
this chapter sheds some light on the linguistic nature of those items the repetition
of which is avoided in translation, furthering our understanding of the reasons
that the phenomenon may be occurring so frequently in translation. Although
essentially descriptive, the findings of this chapter can be beneficial for improving translation practice and training: we need to study how and when repetition
avoidance happens so that we can address why it is happening. The remainder of
this chapter is structured as follows. The next section describes the methodology
for data collection and analysis, and describes the factors taken into account as
potential predictors of repetition avoidance. Section 8.4 presents the results of
the multifactorial analysis and discusses them, while Section 8.5 provides some
concluding remarks.
8.3 Methods and Data
Recent developments in empirical translation studies have emphasized the multidimensional nature of the translation phenomenon. Like any other linguistic
phenomenon, translation too is shaped by a multitude of factors, the interaction
of which defines the final product. This is true at both the macrolevel, where
aspects such as translator expertise, language pair, or communicative purpose of
the TT frame translating practices, and at the microlevel, where linguistic factors like polysemy, availability of equivalents, or lexicosyntactic constraints
influence the choices of the translator. De Sutter and Lefer (2019) argue that, to
understand fully the multidimensional nature of translation phenomena, multifactorial research designs are essential. Multifactorial studies allow the researcher
to investigate simultaneously the effect of multiple variables on a phenomenon,
providing a prediction of whether, and to what extent, such variables influence the
observed phenomenon and/or its features. They have been successfully employed
in a number of recent studies in CBTS (e.g., Kajzer-Wietrzny et al. 2021; Kajzer-Wietrzny and Grabowski 2021; De Sutter and Lefer 2019; Kruger 2018), which
demonstrates that multifactorial approaches can be used to account flexibly for a
multitude of factors that govern the translation process and define its products. As
such, a multifactorial design is employed in this study too.
More specifically, this chapter explores the avoidance of repetition of reporting verbs in the Italian translation of the Harry Potter series. Reporting verbs
were specifically selected for this study because, as mentioned in Section 8.2,
they are a highly significant stylistic feature, the repetition of which has been
demonstrated to play a fundamental role in the characterization process. Moreover, their contribution to characterization and character development in the Harry
Potter books has already been established (Eberhardt 2017; Mastropierro 2020).
By using reporting verbs, I can therefore ensure that the type of repetition taken
into account is one that is stylistically relevant and of importance in translation.
This type of repetition is investigated in terms of the extent to which four factors,
representing linguistic features of the ST verbs, have an effect on the chance of
a given ST reporting verb being translated into multiple Italian reporting verbs.
These factors are (i) the frequency of the ST verb, (ii) the number of different
Italian equivalents of the ST verb, (iii) the number of different meanings of the
ST verb, and (iv) the semantic category of the ST verb. I will discuss the factors
in detail below, but first the procedure for data gathering and the data itself will
be illustrated.
Reporting verbs used to report the speech of the three protagonists of the Harry
Potter books – Harry Potter, Ron Weasley, and Hermione Granger – were collected using InterCorp (Čermák and Rosen 2012), a multilingual parallel corpus
developed in the context of the Czech National Corpus project (www.korpus.cz/).
InterCorp offers a wide range of literary texts and their translation in several
languages, including the seven Harry Potter novels and their Italian translations.
A query combining both CQL syntax and regular expressions was employed to
retrieve the reporting verbs from the original texts, through InterCorp’s web-based
concordancer, KonText (Machálek 2014). The query identified all instances of
closing quotation marks, followed by the name of one of the three characters,
followed by a simple past verb – or closing quotation marks, followed by a simple past verb, followed by the name of the character. Figure 8.1 shows a sample of
concordance lines retrieved in this way for Harry. These lines were manually
checked, and reporting verbs were recorded, while verbs occurring in the specified position that were not reporting verbs (as with heard in line 6 of Figure 8.1)
were removed.
Figure 8.1 Concordance sample for reporting verbs attributed to Harry.
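The shape of the query described above can be approximated with a plain regular expression. The following is only a minimal sketch in Python, not the CQL query actually run in KonText: it works on raw text rather than on POS-tagged corpus data, so it cannot check that the word next to the name is a simple past verb, and the names `candidate_verbs`, `PATTERN`, and `sample` are illustrative assumptions. The third sample sentence is invented to show the kind of false positive (heard) that manual checking removes.

```python
import re

# Assumption: a plain-text regex stands in for the chapter's CQL query, which
# ran on POS-tagged corpus text. Verb tense is NOT checked here.
NAMES = r"(?:Harry|Ron|Hermione)"
CLOSING_QUOTE = '[”"]'

# Closing quote, then either NAME + word or word + NAME.
PATTERN = re.compile(
    rf"{CLOSING_QUOTE}\s+"
    rf"(?:(?P<name_first>{NAMES})\s+(?P<verb_after>\w+)"
    rf"|(?P<verb_first>\w+)\s+(?P<name_after>{NAMES}))\b"
)


def candidate_verbs(text):
    """Return (character, candidate verb) pairs that follow closing quotes."""
    pairs = []
    for match in PATTERN.finditer(text):
        if match.group("name_first"):
            pairs.append((match.group("name_first"), match.group("verb_after")))
        else:
            pairs.append((match.group("name_after"), match.group("verb_first")))
    return pairs


# The first two sentences come from the chapter's examples; the third is
# invented to illustrate a hit that is not a reporting verb.
sample = (
    "“He can’t have,” said Ron. "
    "“I don’t believe this,” snarled Harry. "
    "“Over here!” Harry heard someone call."
)
print(candidate_verbs(sample))
```

Every pair returned is only a candidate: as in the chapter, each hit still has to be checked by hand before it is counted as a reporting verb.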
This query did not identify instances in which the characters were referred
to with pronouns, but it allowed me to focus on the series’ protagonists without having to disambiguate manually the referent of each individual pronoun.
After manually cleaning the data and removing instances of reporting verbs that
were not repeated (that is, verbs with a frequency < 2), 7,806 reporting
verbs were gathered. In addition to using the data as a whole, the verbs were also
divided into subsets to reflect different levels of distribution across the series.
Three subsets were used. The first two divide the series on the basis of the translator. In fact, in Italian the novels were translated by two different translators:
Marina Astrologo translated the first two novels, while Beatrice Masini translated
the following five. The third subset comprises one novel only, Harry Potter and
the Half-Blood Prince (HbP). In this way, this study considers the translation and avoidance of repetition in incremental sizes of one book, two books (translated by
Astrologo), five books (translated by Masini), and seven books (whole series).
Moreover, given that the series has been translated by two different translators, a
cross-comparison between them is possible, which could help understand whether
the findings are translator-specific or could be indicative of a more generalized
trend. Table 8.1 provides an overview of the data and subsets.
The Italian translations of the reporting verbs were retrieved using the alignment
functionalities of InterCorp. The corpus automatically aligns STs and TTs at the
level of sentences, providing parallel concordance lines. Thus, English reporting verbs were searched individually so that all their Italian translations could be
identified and retrieved.
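Once each English reporting verb has been paired with its Italian rendering, the two per-verb counts that the chapter goes on to label “Freq” and “Types” can be derived mechanically. The sketch below is a hypothetical illustration in Python, not the procedure actually used: the function name `summarise`, the `min_freq` parameter, and the list of aligned pairs are all invented for the example.

```python
from collections import Counter, defaultdict


def summarise(pairs, min_freq=2):
    """Map each ST verb to its frequency and number of distinct translations."""
    freq = Counter(st_verb for st_verb, _ in pairs)
    translations = defaultdict(set)
    for st_verb, tt_verb in pairs:
        translations[st_verb].add(tt_verb)
    # Keep only repeated verbs, mirroring the exclusion of verbs with frequency < 2.
    return {
        verb: {"Freq": count, "Types": len(translations[verb])}
        for verb, count in freq.items()
        if count >= min_freq
    }


# Invented aligned pairs (ST reporting verb, Italian reporting verb).
aligned_pairs = [
    ("said", "disse"), ("said", "decretò"), ("said", "disse"),
    ("asked", "chiese"), ("asked", "chiese"),
    ("panted", "ripeté"),
]
print(summarise(aligned_pairs))
```

In this toy input, said occurs three times with two distinct translations, asked occurs twice with one, and panted is dropped because it is not repeated.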
Table 8.1 Overview of the Data and Sub-Datasets
Dataset                      Reporting Verb Types   Reporting Verb Tokens
HbP (one book)               29                     1,165
Translator 1 (two books)     44                     1,072
Translator 2 (five books)    79                     6,715
Series (seven books)         85                     7,806
Figure 8.2 Query for reply in Treq.
For each verb, four features or factors were recorded. Factor 1, labelled “Freq,”
is the raw frequency of the reporting verb in each dataset. This was simply retrieved
directly from the data gathered through InterCorp as described previously. Factor
2, labelled “Trans,” is the number of different possible Italian translations of the
reporting verb. To retrieve this number, Treq (Vavřín and Rosen 2015) was used.
Treq is a translation equivalents database freely available online as part of the
Czech National Corpus suite of tools (https://treq.korpus.cz/#). It uses InterCorp
to provide a list of equivalents for a query word in any of the language pairs
available on the parallel corpus. For example, Figure 8.2 shows the results of a
query for reply. Lemmas were searched, and only equivalents that are reporting verbs with a
proportion of occurrence > 4% were recorded. In this case, risposta, rispondere,
and replicare are used as translations of reply in 43.1%, 43%, and 5% of the cases,
respectively. However, risposta is not a reporting verb, so it was excluded; hence,
“Trans” for reply would be 2.
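The counting rule for “Trans” can be written out as a short filter. This is a minimal sketch in Python under stated assumptions: the percentages are those reported above for reply, whereas the function name `count_trans` and the hand-made set of Italian reporting verbs are hypothetical stand-ins for the manual judgement of which equivalents are reporting verbs.

```python
def count_trans(equivalents, reporting_verbs, threshold=4.0):
    """Count equivalents above the proportion threshold that are reporting verbs."""
    return sum(
        1
        for lemma, percentage in equivalents
        if percentage > threshold and lemma in reporting_verbs
    )


# Treq proportions for "reply" as reported in the chapter.
reply_equivalents = [("risposta", 43.1), ("rispondere", 43.0), ("replicare", 5.0)]
# Assumption: membership in this set encodes the manual decision that
# "risposta" (a noun) is not a reporting verb.
italian_reporting_verbs = {"rispondere", "replicare"}

print(count_trans(reply_equivalents, italian_reporting_verbs))  # 2
```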
Factor 3, labelled “Senses,” is the number of possible different meanings of the
ST verb. To retrieve this number, WordNet 3.1 (Fellbaum 1998) was employed.
WordNet (https://wordnet.princeton.edu/) is a popular lexical database of English
that provides a list of distinct concepts a word can refer to. Figure 8.3 shows
a screenshot of a query for urged, for which three senses are provided; hence,
“Senses” for urged would be 3.
Figure 8.3 Query for urged in WordNet.
Finally, Factor 4, labelled “Verb_type,” indicates the type of reporting verb
based on Caldas-Coulthard’s (1987) taxonomy. This taxonomy categorizes reporting verbs into seven main types, encompassing both linguistic and paralinguistic
features. Table 8.2 provides an overview of all the main types and subtypes of reporting
verbs, with some examples. The types of verbs that are relevant for this study will
be discussed in more detail in the analysis section, while for a full description of
the taxonomy, see Caldas-Coulthard (1987).
Table 8.2 Reporting Verb Taxonomy
Category              Subcategory           Examples
Neutral               -                     say, tell
Structuring           -                     ask, inquire, reply, answer
Metapropositional     Assertive             exclaim, proclaim, agree
Metapropositional     Directive             urge, instruct, order
Metapropositional     Expressive            accuse, lament, swear
Metalinguistic        -                     narrate, quote, recount
Prosodic              -                     cry, shout, scream
Paralinguistic        Voice qualifier       whisper, murmur, mutter
Paralinguistic        Voice qualification   laugh, sigh, groan
Signaling discourse   -                     repeat, add, go on, hesitate
Source: Adapted from Caldas-Coulthard (1987).
A multifactorial analysis was employed to study the impact of these four factors, or predictor variables, on a fifth variable, the outcome variable, that is, the
number of different Italian translations each ST reporting verb is translated into,
labelled “Types.” In other words, the analysis will show whether the linguistic
features of the ST reporting verbs that the factors represent have an effect on
the translation of repetition or on its avoidance. To do so, a generalized linear
model with Poisson distribution was fitted, as “Types” is a count variable with
which Poisson regression is typically employed (Winter 2020, 218). R (R Core
Team 2021) was used to run the analysis, and specifically the glm function. The
analysis was repeated for each dataset (HbP, Translator 1, Translator 2, Series),
and the results were compared. The results of the analysis and their discussion are
presented in the next section.
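To make the modelling step concrete, the sketch below fits a Poisson regression with a log link by Newton's method, which is what a Poisson GLM estimates. It is an illustration under stated assumptions, not the analysis itself: the chapter used R's glm function, whereas this is a standard-library Python stand-in; it has a single numeric predictor (the logarithm of frequency) and omits the categorical verb-type predictor; and the frequency and “Types” values are invented.

```python
import math


def fit_poisson(x, y, iterations=25):
    """Fit log(E[y]) = b0 + b1 * x by Newton's method (Poisson, log link)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Entries of the 2x2 Fisher information matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton update: add (information matrix)^-1 times the score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Invented data: ST verb frequencies and number of distinct translations.
freq = [2, 3, 5, 8, 20, 50, 120, 400, 3000]
types = [1, 1, 2, 2, 3, 4, 6, 9, 20]
log_freq = [math.log(f) for f in freq]

intercept, slope = fit_poisson(log_freq, types)
print(round(intercept, 3), round(slope, 3))
```

A positive slope in such a model means that the expected number of distinct translations grows with the frequency of the ST verb.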
8.4 Multifactorial Analysis
Preliminary analyses were run to find the most efficient model, in line with what
Brezina (2018, 123) defines as a “hybrid procedure.” Starting with a model with no
predictor variables, predictors were added and/or deleted, and the model reassessed with each change on the basis of Akaike information criterion (AIC). AIC
was used to test whether the fit of the model improved with the addition or deletion of a variable. These preliminary tests showed that the most efficient models,
the ones with the lowest AIC, included two variables, “Freq” and “Verb_type,”
while the addition of “Senses” and “Trans” did not improve the overall fit. Moreover, during the tests it was also noticed that the distribution of “Freq,” a numerical
factor, was highly skewed, so the variable was logarithmically transformed, as
is usually done in variational-linguistic research (De Sutter and Lefer 2019, 10).
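The AIC comparison behind this kind of model selection can be illustrated with a toy case. AIC is computed as 2k minus twice the maximized log-likelihood, where k is the number of estimated parameters, and the model with the lower value is preferred. The sketch below is hypothetical: the counts are invented, and it compares only an intercept-only Poisson model with one containing a single categorical predictor, for which the maximum-likelihood fitted values are simply the overall mean and the group means respectively.

```python
import math


def poisson_loglik(y, mu):
    """Poisson log-likelihood of counts y given fitted means mu."""
    return sum(
        yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(y, mu)
    )


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * n_params - 2 * loglik


# Invented "Types" counts grouped by verb type.
types_by_verb_type = {"N": [20, 6], "Str": [1, 2, 1, 2], "Vion": [4, 3, 5]}
y = [value for values in types_by_verb_type.values() for value in values]

# Intercept-only model: every fitted value is the overall mean.
overall_mean = sum(y) / len(y)
aic_null = aic(poisson_loglik(y, [overall_mean] * len(y)), 1)

# Model with the categorical predictor only: fitted values are group means.
mu_factor = [
    sum(values) / len(values)
    for values in types_by_verb_type.values()
    for _ in values
]
aic_factor = aic(poisson_loglik(y, mu_factor), len(types_by_verb_type))

print(aic_factor < aic_null)  # True: the predictor improves the fit here
```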
Results (in Tables 8.3 to 8.6) show that “Freq” and “Verb_type” are significant predictors of “Types,” while “Trans” and “Senses” do not have a significant
effect. This means that how many times a reporting verb occurs and what type of
verb it is influence the chances of seeing that verb translated into multiple target
language items. On the other hand, the number of translation equivalents and
meanings of the ST verb do not determine whether its repetition is reproduced or
avoided. This outcome is consistent across all datasets, meaning that the same
factors are significant predictors regardless of the size of the data taken into
account (one book, two books, five books, or the whole series) and of which of the
two translators produced the translation.
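The upper part of each of Tables 8.3 to 8.6 can be read sequentially: every term reduces the residual deviance left by the terms before it. The short Python sketch below reproduces this arithmetic for the Series model, using the deviance values reported in Table 8.3; the function names and the summary measure of explained deviance are added here for illustration and are not reported in the chapter.

```python
def sequential_deviance(null_deviance, reductions):
    """Residual deviance after each term is added in turn."""
    rows, residual = [], null_deviance
    for term, reduction in reductions:
        residual -= reduction
        rows.append((term, round(residual, 2)))
    return rows


def explained_share(null_deviance, final_residual):
    """Proportion of the null deviance accounted for by the full model."""
    return 1 - final_residual / null_deviance


# Values from Table 8.3 (Series): null deviance and per-term reductions.
series = sequential_deviance(1123.26, [("Verbtype", 711.85), ("LogFreq", 362.06)])
print(series)  # [('Verbtype', 411.41), ('LogFreq', 49.35)]
print(round(explained_share(1123.26, 49.35), 3))  # 0.956
```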
The plots in Figure 8.4 show the nature and size of the effect that the predictor
“Freq” has on “Types.” The more often an ST reporting verb occurs (x axis), the
more likely it is that the verb is translated into multiple different translations (y
axis). In other words, the more often a verb is repeated in the original, the less
likely the repetition is reproduced in translation. Again, the same outcome is seen
in all datasets. In addition to providing further evidence for the tendency to avoid
Table 8.3 Generalized Linear Model: Series
Term            Df   Deviance   Resid. Df   Resid. Dev   Pr(> Chi)
NULL            -    -          84          1123.26      -
Verbtype        7    711.85     77          411.41       < 2.20E-16 ***
LogFreq         1    362.06     76          49.35        < 2.20E-16 ***

Coefficients:   Estimate    Std. Error   z Value   Pr(>|z|)
(Intercept)     -0.03215    0.57814      -0.056    0.956
VerbtypeMprop   0.12438     0.5884       0.211     0.833
VerbtypeN       1.01132     0.62759      1.611     0.107
VerbtypePros    0.04203     0.60895      0.069     0.945
VerbtypeSdis    0.16711     0.60355      0.277     0.782
VerbtypeStr     -0.2436     0.64947      -0.375    0.708
VerbtypeVier    0.11579     0.60846      0.19      0.849
VerbtypeVion    0.22753     0.58926      0.386     0.699
LogFreq         0.48313     0.03297      14.654    <2e-16
Table 8.4 Generalized Linear Model: Translator 1
Term            Df   Deviance   Resid. Df   Resid. Dev   Pr(> Chi)
NULL            -    -          43          265.009      -
Verbtype        6    179.807    37          85.202       < 2.20E-16 ***
LogFreq         1    69.052     36          16.15        < 2.20E-16 ***

Coefficients:   Estimate    Std. Error   z Value   Pr(>|z|)
(Intercept)     -0.002834   0.209837     -0.014    0.9892
VerbtypeN       0.114604    0.491797     0.233     0.8157
VerbtypePros    -0.679034   0.448877     -1.513    0.1303
VerbtypeSdis    -0.064428   0.417583     -0.154    0.8774
VerbtypeStr     -2.119339   1.03578      -2.046    0.0407
VerbtypeVier    0.029688    0.297997     0.1       0.9206
VerbtypeVion    0.033445    0.260963     0.128     0.898
LogFreq         0.583401    0.087144     6.695     2.16E-11
repetition seen in previous studies, this finding seems also to confirm that such a
tendency could be the result of an intentional choice. In fact, it is the most marked
instances of repetition – hence the most noticeable – that are more likely to be
replaced by lexical variety in translation. However, as the widening of confidence
intervals suggests, certainty of the models decreases when “Freq” increases, as
there are not many data points with very high frequency. There is actually one
Table 8.5 Generalized Linear Model: Translator 2
Term            Df   Deviance   Resid. Df   Resid. Dev   Pr(> Chi)
NULL            -    -          78          1025.16      -
Verbtype        7    657.42     71          367.74       < 2.20E-16 ***
LogFreq         1    323.43     70          44.31        < 2.20E-16 ***

Coefficients:   Estimate    Std. Error   z Value   Pr(>|z|)
(Intercept)     -0.33539    1.0003       -0.335    0.737
VerbtypeMprop   0.36548     1.00804      0.363     0.717
VerbtypeN       1.30278     1.03439      1.259     0.208
VerbtypePros    0.36762     1.02007      0.36      0.719
VerbtypeSdis    0.44711     1.01678      0.44      0.66
VerbtypeStr     0.15346     1.04405      0.147     0.883
VerbtypeVier    0.23736     1.02466      0.232     0.817
VerbtypeVion    0.49293     1.00811      0.489     0.625
LogFreq         0.48387     0.03519      13.75     <2e-16
Table 8.6 Generalized Linear Model: HbP
Term            Df   Deviance   Resid. Df   Resid. Dev   Pr(> Chi)
NULL            -    -          28          565.11       -
Verbtype        6    444.06     22          121.04       < 2.20E-16 ***
LogFreq         1    117.43     21          3.61         < 2.20E-16 ***

Coefficients:   Estimate    Std. Error   z Value   Pr(>|z|)
(Intercept)     -0.63511    0.37724      -1.684    0.0923
VerbtypeN       1.56509     0.50393      3.106     0.0019
VerbtypePros    0.71357     0.45649      1.563     0.118
VerbtypeSdis    0.19087     0.5          0.382     0.7027
VerbtypeStr     -0.28812    0.60491      -0.476    0.6339
VerbtypeVier    0.34638     0.50241      0.689     0.4905
VerbtypeVion    0.58982     0.44569      1.323     0.1857
LogFreq         0.55745     0.07348      7.586     3.29E-14
point only, the frequency of which is much higher than that of all other verbs, as
can be clearly seen in the plots in Figure 8.4. That point represents said, which
is the reporting verb that occurs by far the most frequently in all datasets and has
been translated into the largest number of different translations.
Figure 8.4 “Freq” effect plots.
In order to check that this outlier was not skewing the data and excessively affecting the predictions of the models, the analyses were repeated without said. The
resulting effect plots are collected in Figure 8.5. As the plots show, removing said
does not change the model predictions, which still clearly show a positive correlation
between the number of different translations and the frequency of the reporting verb.
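Because the models use a log link, the coefficients in Tables 8.3 to 8.6 act multiplicatively on the expected value of “Types.” The following Python sketch shows one way of reading them, using estimates from Table 8.3. It rests on two assumptions that the chapter does not state: that “Freq” was transformed with the natural logarithm, and that verb-type coefficients are contrasts against a reference category, which the tables do not name. The function names are illustrative.

```python
import math


def doubling_factor(log_freq_coef):
    """Multiplicative change in expected Types when frequency doubles.

    Assumes LogFreq is the natural log of frequency, so that
    E[Types] is proportional to Freq ** coefficient.
    """
    return 2 ** log_freq_coef


def rate_ratio(coef):
    """Multiplicative change in expected Types relative to the reference level."""
    return math.exp(coef)


# Estimates taken from Table 8.3 (Series).
print(round(doubling_factor(0.48313), 3))  # 1.398
print(round(rate_ratio(1.01132), 3))       # 2.749
```

On these assumptions, a verb that is twice as frequent would be expected to have roughly 40% more distinct translations, other things being equal.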
Moving on to “Verb_type,” the other predictor with a significant effect on
“Types,” the same consistency of results among all datasets can be noticed. The
coefficients’ estimates (seen in Tables 8.3 to 8.6) indicate that the verb types
that are more or less likely to be translated in many different ways are the same
between the two translators and across the different number of novels taken into
account. The repetition of neutral verbs (“VerbtypeN”) and voice qualification
verbs (“VerbtypeVion”) is more likely to be avoided in favor of lexical variety,
while the repetition of structuring verbs (“VerbtypeStr”) is more likely to be reproduced unaltered. Neutral verbs (e.g., said and told) simply indicate the illocutionary act and are “interpretatively empty” (Ruano San Segundo 2017, 111). As
blank verbs, they lend themselves to being translated in multiple ways, with
additional meanings added to the neutral “baseline.” For instance, example 1 shows a
case in which the neutral verb said has been translated into the metapropositional
verb decretò (“decreed”), adding decisiveness and finality to what Ron said.
(1) ENG: “He can’t have,” said Ron.
ITA: “Impossibile,” decretò Ron. [“Impossible,” decreed Ron.]
(Harry Potter and the Deathly Hallows)
Figure 8.5 “Freq” effect plots without said.
Moreover, neutral verbs are extremely frequent, especially said, which is the
most frequent reporting verb in the data and in English more generally (see Section 8.2). As explained in Section 8.2, previous research (Rojo and Valenzuela
2001, 476; Levý 2011, 113) has suggested that the repetition of said is much
more common and tolerated in English than its equivalents are in other languages,
including Italian (Mastropierro 2020). Thus, in addition to adding meaning to
supplement the neutral nature of the original verb, the translators may have also
aimed to conform to the stylistic norms of the target language.
Voice qualification verbs (e.g., gasped, hissed, panted, roared) are a subcategory of paralinguistic verbs that “mark the attitude of the speaker in relation to
what is being said” (Caldas-Coulthard 1987, 163) through paralinguistic cues.
Unlike neutral verbs, they do convey an interpretational meaning that
is added to the propositional content of the reported speech; in other words, how
something is said complements the meaning of what is said. For example, in
“I don’t believe this,” snarled Harry, the intensity and fervor of what Harry says
is conveyed by the use of the verb snarled rather than by the proposition he utters.
The repetition of voice qualification verbs in the STs is replaced with a range of
verb types in the TTs, resulting in a wider lexical variety compared to the original. Not only Italian voice qualification verbs are used to translate them, but also
metapropositional, signaling discourse, and neutral verbs. Some instances can be
seen in examples 2, 3, and 4.
(2) ENG: “No – everyone’s fine – ” gasped Harry.
ITA: “No . . . Stanno tutti bene . . .” balbettò Harry. [“No . . . everyone is
fine,” stammered Harry.]
(Harry Potter and the Order of the Phoenix)
(3) ENG: “Water” panted Harry.
ITA: “Acqua” ripeté Harry. [“Water,” repeated Harry.]
(Harry Potter and the Half-Blood Prince)
(4) ENG: “I don’t believe this,” snarled Harry.
ITA: “Non ci posso credere” sbottò Harry. [“I can’t believe this,” snapped
Harry.]
(Harry Potter and the Order of the Phoenix)
On the other end of the cline, that is, the types of verbs that are less likely to be
translated in many different ways, are structuring verbs. Structuring verbs indicate
that the reported utterance is part of a speech act exchange (Caldas-Coulthard 1987,
155), signaling either prospection (e.g., asked or enquired) or retrospection (e.g.,
answered or replied). The vast majority of the originals’ structuring verbs are translated
with Italian structuring verbs, reproducing the repetition from the STs to the TTs.
Structuring verbs quite literally enact a structuring function, establishing the answer-reply sequence. They cannot be replaced with another type of verb without affecting their role in structuring the exchange. This may be the reason why the repetition
of this type of verb is more likely to be reproduced in translation, compared to other
types of verbs, neutral and voice qualification verbs especially. Example 5 shows an
instance of an ST structuring verb translated into an Italian structuring verb.
(5) ENG: “All right if we join you?” asked Ron.
ITA: “Ti va bene se ci sediamo qui?” le chiese Ron. [“Is it okay with you if
we sit here?” Ron asked her.]
(Harry Potter and the Deathly Hallows)
Even though understanding the reasons behind the choice of the translators
to avoid or maintain the repetition of certain types of verbs is beyond the scope
of this chapter, as further research is needed to do so, a tentative hypothesis can
be suggested. The decision to translate the same verb in many different ways
may depend on the perceived extent of the alteration. Translators may feel that
by replacing a neutral verb with a metapropositional or prosodic verb, they are
simply adding some extra flavor to an otherwise-blank basis (i.e., said). Equally,
by replacing a voice qualification verb with a different voice qualification verb,
translators may have the perception of modifying the flavor of the verb without
altering substantially its core denotational meaning. However, replacing a structuring verb may have been perceived as a more marked change, involving not
simply a shift of meaning but an alteration of the very organization of the reported
exchange. This perception is misleading though, as it narrowly fixates on the
individual case, disregarding the bigger picture. As has been discussed in Section 8.2, individual changes in favor of lexical variety, when reiterated throughout
a text, can result in macrostructural alterations. Repeated patterns that develop
across the text are built on the reiteration of individual items; altering the repetition of the individual items affects the overall pattern. Hence, even changes that
may seem uninfluential at first, like translating a verb into a different verb from
the same verb category or replacing a neutral verb like said with a more meaningful verb, when reiterated can have an impact on how a text functions on the macrostructural level. This has been shown in the context of the Harry Potter series
specifically (Mastropierro 2020), where differences in the ratios and proportions
of different verb types across the novels between the STs and the TTs result in
potential alterations to characters’ development.
8.5 Conclusion
Repetition in translation is an underexplored phenomenon, but the existing
research pointed out a tendency to avoid repetition in favor of lexical variety.
Many studies emphasized the potential effects that such a tendency can have in
translation – especially in terms of its impact on the style of literary texts – or
described the strategies used by translators to avoid repetition. However, little or
no attention has been paid to the description of the actual phenomenon itself, for
example in terms of contexts in which it can occur or factors that could influence
it. The present study shed some light on these aspects of the phenomenon, using
repeated reporting verbs in the Harry Potter series and its Italian translation as
a case study. It used a multifactorial research design to understand the linguistic
features of the verbs that are more or less likely to be translated in multiple different ways. It showed that some of these features can indeed predict the way verb
repetition is translated.
Two factors were shown to have a significant effect on the avoidance of repetition in translation: the frequency and type of the ST verb. The more frequent a
verb, hence the more noticeable its repetition, the more likely is its translation into
different target language verbs. At the same time, some types of verbs, such as
neutral and voice qualification verbs, are more likely to be translated in multiple
ways compared to other types, such as structuring verbs. In contrast, the number
of different meanings and of different translation equivalents of the ST verb do not
influence how the repetition of that verb is treated in translation, suggesting that
polysemy and wider availability of target language options are not significant
factors in determining the avoidance of repetition or otherwise. These findings
were identified across a range of datasets that consisted of different numbers of
novels (one, two, five, and seven) and two different translators, suggesting a certain degree of generalizability. Of course, these remain preliminary findings, as
further research with more and different data is needed to confirm the tendencies
identified here. However, it is hoped that the present study showed an alternative
approach to the study of repetition in translation, one in which the data-based and
multifactorial description of the phenomenon itself takes center stage instead of
its consequences. We know what the effects of repetition avoidance can be; we
need now to understand when, how, and why repetition avoidance happens, to
inform better approaches to translation training and practice.
References
Al-Khafaji, Rasoul. 2006. “In Search of Translational Norms: The Case of Shifts in Lexical
Repetition in Arabic-English translations.” Babel 52, no. 1: 39–65.
Ben-Ari, Nitsa. 1998. “The Ambivalent Case of Repetitions in Literary Translation. Avoiding Repetitions: A ‘Universal’ of Translation.” Meta 43, no. 1: 68–78.
Biber, Douglas. 2009. “A Corpus-driven Approach to Formulaic Language in English:
Multi-word Patterns in Speech and Writing.” International Journal of Corpus Linguistics 14, no. 3: 275–311.
Boase-Beier, Jean. 1994. “Translating Repetition.” Journal of European Studies 24: 403–9.
Bourne, Julian. 2002. “Controlling Illocutionary Force in the Translation of Literary Dialogue.” Target 14, no. 2: 241–61.
Brezina, Vaclav. 2018. Statistics in Corpus Linguistics: A Practical Guide. Cambridge:
Cambridge University Press.
Caldas-Coulthard, Carmen Rosa. 1987. “Reported Speech in Written Narrative Texts.” In
Discussing Discourse, edited by Malcolm Coulthard, 149–67. Birmingham: University
of Birmingham.
Čermák, František, and Alexandr Rosen. 2012. “The Case of InterCorp, a Multilingual
Parallel Corpus.” International Journal of Corpus Linguistics 17, no. 3: 411–27.
Čermáková, Anna. 2015. “Repetition in John Irving’s Novel A Widow for One Year. A Corpus Stylistic Approach to Literary Translation.” International Journal of Corpus Linguistics 20, no. 3: 355–77.
Čermáková, Anna. 2018. “Translating Children’s Literature: Some Insights from Corpus
Stylistics.” Ilha do Desterro 71, no. 1: 117–33.
Čermáková, Anna, and Lenka Fárová. 2010. “Keywords in Harry Potter and Their Czech
and Finnish Translation Equivalents.” In InterCorp: Exploring a Multilingual Corpus,
edited by František Čermák, Patrick Corness, and Aleš Klégr, 177–88. Prague: NLN.
Čermáková, Anna, and Michaela Mahlberg. 2018. “Translating Fictional Characters – Alice
and the Queen from the Wonderland in English and Czech.” In The Corpus Linguistics
Discourse. In Honour of Wolfgang Teubert, edited by Anna Čermáková and Michaela
Mahlberg, 223–53. Amsterdam and Philadelphia, PA: John Benjamins.
Conrad, Susan, and Douglas Biber. 2004. “The Frequency and Use of Lexical Bundles in
Conversation and Academic Prose.” Lexicographica 20: 56–71.
Corness, Patrick. 2010. “Shifts in Czech Translation of the Reporting Verb Said in English
Fiction.” In InterCorp: Exploring a Multilingual Corpus, edited by František Čermák,
Patrick Corness, and Aleš Klégr, 177–88. Prague: NLN.
Culpeper, Jonathan. 2001. Language and Characterisation: People in Plays and Other
Texts. Harlow: Pearson Education.
De Sutter, Gert, and Marie-Aude Lefer. 2019. “On the Need for a New Research Agenda
for Corpus-based Translation Studies: A Multi-methodological, Multifactorial and Interdisciplinary Approach.” Perspectives 28, no. 1: 1–23.
Eberhardt, Maeve. 2017. “Gendered Representations Through Speech: The Case of the
Harry Potter Series.” Language and Literature 26, no. 3: 227–46.
Fellbaum, Christiane, ed. 1998. WordNet: An Electronic Lexical Database. Cambridge,
MA: MIT Press.
Flowerdew, John, and Michaela Mahlberg, eds. 2009. Lexical Cohesion and Corpus Linguistics. Amsterdam and Philadelphia, PA: John Benjamins.
Granger, Sylviane, and Marie-Aude Lefer, eds. 2022. Extending the Scope of Corpus-based Translation Studies. London: Bloomsbury.
Halliday, Michael Alexander Kirkwood, and Ruqaiya Hasan. 1976. Cohesion in English.
London: Longman.
Hori, Masahiro. 2004. Investigating Dickens’ Style. A Collocational Analysis. Basingstoke:
Palgrave Macmillan.
Hu, Kaibao. 2016. Introducing Corpus-based Translation Studies. Heidelberg and Berlin:
Springer.
Ikeo, Reiko. 2016. “An Analysis of Viewpoints by the Use of Frequent-multi-word
Sequences in DH Lawrence’s Lady Chatterley’s Lover.” Language and Literature 25,
no. 2: 159–84.
Jawad, Hisham. 2009. “Repetition in Literary Arabic: Foregrounding, Backgrounding, and
Translation Strategies.” Meta – Translators’ Journal 54, no. 4: 753–69.
Kajzer-Wietrzny, Marta, and Łukasz Grabowski. 2021. “Formulaicity in Constrained Communication: An Intermodal Approach.” MonTI 13: 148–83.
Kajzer-Wietrzny, Marta, Ilmari Ivaska, and Adriano Ferraresi. 2021. “ ‘Lost’ in Interpreting and ‘Found’ in Translation: Using an Intermodal, Multidirectional Parallel Corpus to
Investigate the Rendition of Numbers.” Perspectives 29, no. 4: 469–88.
Károly, Krisztina. 2010. “Shifts in Repetition vs. Shifts in Text Meaning. A Study of the
Textual Role of Lexical Repetition in Non-literary Translation.” Target 22, no. 1: 40–70.
Klaudy, Kinga, and Krisztina Károly. 2005. “Implicitation in Translation: Empirical Evidence for Operational Asymmetry in Translation.” Across Languages and Cultures 6,
no. 1: 13–28.
Kruger, Alet, Kim Wallmach, and Jeremy Munday, eds. 2013. Corpus-based Translation
Studies: Research and Applications. London: Bloomsbury.
Kruger, Haidee. 2018. “That Again: A Multivariate Analysis of the Factors Conditioning Syntactic Explicitness in Translated English.” Across Languages and Cultures 20, no. 1: 1–33.
Leech, Geoffrey, and Mick Short. 2007. Style in Fiction: A Linguistic Introduction to English Fictional Prose. 2nd ed. Harlow: Pearson Longman.
Levý, Jiří. 2011. The Art of Translation. Translated by Patrick Corness. Amsterdam and
Philadelphia, PA: John Benjamins.
Machálek, Tomáš. 2014. KonText – Application for Working with Language Corpora [Computer software]. Prague: FF UK. http://kontext.korpus.cz. Accessed November 2019.
Mahlberg, Michaela. 2010. “Corpus Linguistics and the Study of Nineteenth-century Fiction.” Journal of Victorian Culture 15, no. 2: 292–98.
Mahlberg, Michaela. 2012. “The Corpus Stylistic Analysis of Fiction or the Fiction of Corpus Stylistics?” In Corpus Linguistics and Variation in English: Theory and Description,
edited by Joybrato Mukherjee and Magnus Huber, 77–95. Amsterdam: Rodopi.
Mahlberg, Michaela. 2013. Corpus Stylistics and Dickens’s Fiction. London and New
York: Routledge.
Mahlberg, Michaela, and Dan McIntyre. 2011. “A Case for Corpus Stylistics: Ian Fleming’s Casino Royale.” English Text Construction 4, no. 2: 204–27.
Mahlberg, Michaela, Viola Wiegand, Peter Stockwell, and Anthony Hennessey. 2019.
“Speech-bundles in the 19th-century English Novel.” Language and Literature 28, no.
4: 326–53.
Mastropierro, Lorenzo. 2020. “The Translation of Reporting Verbs in Italian: The Case of
the Harry Potter Series.” International Journal of Corpus Linguistics 25, no. 3: 241–69.
Mastropierro, Lorenzo. (forthcoming). “Gendered Voices in Translation: Reporting
Verbs in the Italian Translation of the Harry Potter Series.” In Good Girls and Brave
Boys: 19th Century and Contemporary Children’s Literature and Childhood, edited by
Michaela Mahlberg and Anna Čermáková. London: Bloomsbury.
Mastropierro, Lorenzo, and Michaela Mahlberg. 2017. “Key Words and Translated Cohesion in Lovecraft’s At the Mountains of Madness and One of Its Italian Translations.”
English Text Construction 10, no. 1: 78–105.
McCarthy, Michael, and Ronald Carter. 2014. Language as Discourse: Perspectives for
Language Teaching. London and New York: Routledge.
Mikhailov, Mikhail, and Robert Cooper. 2016. Corpus Linguistics for Translation and
Contrastive Studies: A Guide for Research. London and New York: Routledge.
Mukařovský, Jan. 1932. “Standard Language and Poetic Language.” In A Prague School
Reader on Aesthetics, Literary Structure and Style, edited and translated by Paul Garvin,
17–30. Washington, DC: Georgetown University Press.
Partington, Alan. 1998. “Connotation and Semantic Prosody.” In Patterns and Meaning:
Using Corpora for English Language Research and Teaching, edited by Alan Partington,
65–78. Amsterdam and Philadelphia, PA: John Benjamins.
Paton, Steven. 2009. “Time-Lessness, Simultaneity and Successivity: Repetition in Beckett’s Short Prose.” Language and Literature 18, no. 4: 357–66.
Prusse, Michael. 2012. “Repetition, Difference and Chiasmus in John McGahern’s Narratives.” Language and Literature 21, no. 4: 363–80.
R Core Team. 2021. R: A Language and Environment for Statistical Computing. Vienna: R
Foundation for Statistical Computing.
Rojo, Ana, and Javier Valenzuela. 2001. “How to Say Things with Words: Ways of Saying
in English and Spanish.” Meta 46, no. 3: 467–77.
Ruano San Segundo, Pablo. 2016. “A Corpus-stylistic Approach to Dickens’ Use of Speech
Verbs: Beyond Mere Reporting.” Language and Literature 25, no. 2: 113–29.
Ruano San Segundo, Pablo. 2017. “Reporting Verbs as a Stylistic Device in the Creation of
Fictional Personalities in Literary Texts.” Journal of the Spanish Association of Anglo-American Studies 39, no. 2: 105–24.
Ruano San Segundo, Pablo. 2018a. “An Analysis of Charles Dickens’s Gender-based Use
of Speech Verbs.” Gender and Language 12, no. 2: 192–217.
Ruano San Segundo, Pablo. 2018b. “Dickens’s Hyperbolic Style Revisited: Verbs That
Describe Sounds Made by Animals Used to Report the Words of Male Villains.” Style
52, no. 4: 475–93.
Sinclair, John. 2004. Trust the Text: Language, Corpus and Discourse. London and New
York: Routledge.
Stubbs, Michael. 2002. Words and Phrases: Corpus Studies of Lexical Semantics. Oxford:
Blackwell Publishers.
Tannen, Deborah. 1989. Talking Voices: Repetition, Dialogue and Imagery in Conversational Discourse. Cambridge: Cambridge University Press.
Toolan, Michael. 2016. Making Sense of Narrative Text: Situation, Repetition, and Picturing in the Reading of Short Stories. London and New York: Routledge.
Vavřín, Martin, and Alexandr Rosen. 2015. “Treq (v. 2.1).” https://treq.korpus.cz/.
Accessed January 2022.
Vincent, Benet, and Jim Clarke. 2017. “The Language of A Clockwork Orange: A Corpus
Stylistic Approach to Nadsat.” Language and Literature 26, no. 3: 247–64.
Wales, Katie. 2011. A Dictionary of Stylistics. 3rd ed. London and New York: Routledge.
Winter, Bodo. 2020. Statistics for Linguists. An Introduction using R. London and New
York: Routledge.
Winters, Marion. 2007. “F. Scott Fitzgerald’s Die Schönen und Verdammten: A Corpus-based Study of Speech-act Report Verbs as a Feature of Translators’ Style.” Meta 52,
no. 3: 412–25.
Zhu, Chunshen. 2004. “Repetition and Signification: A Study of Textual Accountability
and Perlocutionary Effect in Literary Translation.” Target 16, no. 2: 227–52.
Zupan, Simon. 2006. “Repetition and Translation Shifts.” ELOPE: English Language
Overseas Perspectives and Enquiries 3, no. 1–2: 257–68.
9 Feminist Translation of Sexual Content
A Quantitative Study on Chinese Versions of The Color Purple
Xinyi Zeng and John Sie Yuen Lee
9.1 Introduction
Many cultures consider references to genitals, sexual intercourse, and other sex-related content to be taboo. Translators therefore often walk a tightrope balancing
their desire for faithful translation, the need to be explicit for the sake of clarity,
and the pressure to omit or mitigate sensitive material for acceptability. While
omission of sexual content in a translation is a well-known phenomenon (e.g.,
Han 2008; Santaemilia 2011; Wu 2010), few quantitative analyses have been
reported on the frequency and choice of translation strategies for different types
of sexual content. This chapter aims to identify distinctive strategies used in a
feminist translation through a comparison between two Chinese versions of an
English novel, one feminist and the other non-feminist.
Our study focuses on the acclaimed feminist novel The Color Purple (Walker
1985), winner of the National Book Award and the Pulitzer Prize, and its translations by Jie Tao (1998) and Renjing Yang (1987). Among the few translators who
have identified themselves as feminist (Mu 2008, 25), Tao specializes in feminist
literature and has published extensively on feminist consciousness in female writers (Tao 1994, 1995, 2004). Yang, also an expert in American literature, focuses
on Hemingway and does not identify as feminist (Yang 1989). While the sexual
content in these two translations has been discussed in the research literature (e.g.,
Han 2008; Lee 2013), no quantitative comparison has been given.
This study presents a parallel text analysis that quantitatively identifies the
characteristics of a feminist translation, as represented by Jie Tao. Specifically, we
address the following research questions on the translation strategies employed in
the two versions:
• (Q1) How does a feminist translation differ from a non-feminist one in terms of translation strategy? Adopting a corpus-based methodology, we annotate the translation strategies used in the feminist and non-feminist versions, focusing on deletion, implicit substitution and transcreation, faithful translation, and explicit substitution and transcreation.
• (Q2) How does the feminist translation strategy vary according to the nature of the sexual content? We investigate differences in the choice of translation strategy for various sexual content types, including references to private parts, body exploration, bodily phenomena, stigma, rape, illicit relations, and sexual intercourse.
DOI: 10.4324/9781003298328-10
Quantitative research methods have been increasingly applied in translation studies (Mellinger and Hanson 2017). However, according to a recent survey, only 1
out of 33 reviewed articles on feminist translation of novels employed a corpus-based method (Irshad and Yasmin 2022). To the best of our knowledge, this is the
first parallel text study that statistically characterizes feminist translation practice
and its treatment of different types of sex-related materials. The rest of the chapter
is organized as follows. We first present background information on the text and
summarize previous research on the translation of sexual content (Section 9.2).
We then describe our parallel corpus in terms of the annotation on sexual content
type (Section 9.3) and translation strategy (Section 9.4). Finally, we discuss the
results (Section 9.5) and conclude (Section 9.6).
9.2 Background
9.2.1 Novel and Translators
Celie, the heroine of The Color Purple, is often considered a model for female
empowerment. At the beginning of the novel, she was raped by her stepfather but
knew nothing about it. Her sexual experience with her husband, who did not care
about her, was numb and hurtful. Then she met her true love, Shug, who taught
her to find herself and appreciate her own body. Though criticized for being “too
dirty” and “too explicit” (Holt 1996, 102), the sexual depictions in the novel are
integral to its plot and themes. They immerse the reader in Celie’s world to experience her awakening from her painful childhood and sexual ignorance. This novel
therefore provides valuable material for studying the translation of sexual content.
The personal background of a translator plays an important role in his or her
translation practice. Jie Tao and Renjing Yang are both Chinese experts in American
literature, but they are poles apart in terms of feminist consciousness. While Yang
(1989, 29–35) recognized certain feminist issues, such as the oppression of Black
women, he rarely discussed his translation of sexual content in paratexts. In contrast, Tao, Vice Chairperson of the Domestic and Foreign Women’s Study Center at
Peking University, paid high tribute to Walker’s feminist views and her proposition
of racial and gender equality (Tao 2004, 277). She emphasized the awakening of the
female protagonist and saw the description of the love, growth, and transformation of
Black women as a great achievement of the novel. A comparative study on these two
translators can therefore yield insights into the characteristics of feminist translation.
9.2.2 Feminist Translation and Parallel Text Analysis
Language is a powerful tool to express one’s ideology, ideas, and subversion. Sex-related content in literature often serves to shape the characters, develop the plot,
and express the writer’s feelings. In feminist translation, such content typically
focuses on the female body, inner desire, and sex life (von Flotow 1991, 69–84).
Parallel text analysis, which compares translations of the same source text, has
been conducted on free indirect speech (Bosseaux 2004, 107–22), report verbs (Winters 2007, 412–25), style (Huang and Chu 2014, 122–41), and discourse presentation (Huang 2014, 94–102). Research on sexual content, however, has mostly been
limited to qualitative analyses on anecdotal examples. The relationship between
body and language was examined in two English translations of Jin Ping Mei
(Liang 2017, 1–20). Textual relations and aesthetic effects of erotic materials were
analyzed in English versions of The Peony Pavilion (Lee and Ngai 2012, 73–94). It
has also been argued that omissions in the translation of Sha Fu not only muddle the
plot but also diminish the feminist sentiment (Wu 2010). In a previous study on The
Color Purple, Han (2008, 69–85) showed that both Tao and Yang “cleansed” the
source text because of the subconscious working of the sex taboo, rather than ethical
considerations. Lee (2013) investigated the woman-identified translation approach
in this novel with regard to sexual coercion, sexual subjugation, extramarital affairs,
and same-sex sexuality. However, none of these studies offers a systematic comparison of feminist translation strategies or considers the role of sexual content types.
9.3 Annotation of Sexual Content Types
To quantify the treatment of different types of sexual content, we classified the
content of sex-related sentences in The Color Purple (this section) and then constructed a parallel corpus and annotated the translation strategies (Section 9.4).
9.3.1 Identification of Sex-Related Sentences
To identify sentences with sex-related content, we constructed a list of 31 taboo
keywords with reference to existing lists (Kutner and Brogan 1974, 474–84;
Geer and Bellard 1996, 379–95; Grosser and Walsh 1966, 219–27; Cooper 2007,
27–50; Slamia 2020, 82–98; Yuan 2015, 19–44). These keywords alone would
have missed many sentences that express sexual content through words that
are non-erotic in their primary sense. We therefore also constructed a list of 55
implicit keywords that can acquire a sexual connotation in context (e.g., “button,”
“fresh,” and “clean”).
Since the presence of a keyword does not necessarily mean that the sentence
carries sexual content, human judgment is necessary. To reduce the subjectivity in
interpretation, each sentence was independently labelled by two annotators, both
non-native speakers but proficient in English, as either “sex-related” or “non-sex-related” in its context. The annotators were not shown the keyword in the
sentence. Since manpower constraints did not allow exhaustive annotation of the
entire novel, we retrieved between 1 and 4 example sentences from the novel per
keyword, yielding 80 sentences with taboo keywords and 193 with implicit keywords, hence a total of 273 sentences.
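The retrieval step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the keyword lists below are tiny hypothetical stand-ins (the chapter's full lists of 31 taboo and 55 implicit keywords are not reproduced), the sentence splitter is naive, and taking the first matches per keyword is our assumption, since the chapter does not say how the 1 to 4 examples were chosen. The second sentence in SAMPLE is made up to show a non-sexual use of "button".

```python
import re

# Hypothetical mini-lists for illustration only.
TABOO = ["pussy"]
IMPLICIT = ["button", "fresh", "clean"]

SAMPLE = (
    "My little button sort of perk up too. "
    "She sew a button on the shirt. "  # invented non-sexual use of "button"
    "All dressed up for Harpo's, smelling good and everything, "
    "but scared to look at your own pussy."
)

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def retrieve_candidates(text, keywords, max_per_keyword=4):
    """Map each keyword to at most max_per_keyword sentences containing it."""
    hits = {kw: [] for kw in keywords}
    for sentence in split_sentences(text):
        for kw in keywords:
            pattern = rf"\b{re.escape(kw)}\b"
            if len(hits[kw]) < max_per_keyword and re.search(pattern, sentence, re.IGNORECASE):
                hits[kw].append(sentence)
    return hits

candidates = retrieve_candidates(SAMPLE, TABOO + IMPLICIT)
for keyword, sentences in candidates.items():
    print(keyword, len(sentences))
```

Both "button" sentences are retrieved, which is exactly why the keyword match is only a first filter and each candidate still needs a human judgment in context.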
Among these 273 sentences, 197 sentences were annotated as sex-related by
both annotators; 65 were annotated as non-sex-related by both, and the remaining 11 elicited different judgments. The Cohen’s kappa coefficient was 0.89, corresponding to the “almost perfect” level of agreement (Landis and Koch 1977,
165). Most disagreements stemmed from interpretations of implicit keywords.
For example, the sentence “He be on her all-time” was considered by one annotator to be a description of sexual intercourse, while the other annotator interpreted
it without sexual overtones to mean simply that they stayed together. The remainder
of this chapter will focus on the 197 sentences considered as sex-related by both
annotators (see Supplemental File).
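The agreement figure can be checked with a short calculation. The sketch below computes Cohen's kappa for a two-category task from the counts given above (197 joint "sex-related", 65 joint "non-sex-related", 11 disagreements). The chapter does not say how the 11 disagreements divide between the two possible directions, so the loop tries every split; the function name is ours.

```python
def cohen_kappa_2x2(both_yes, both_no, yes_no, no_yes):
    """Cohen's kappa for two annotators and two categories.

    yes_no: annotator A says yes, annotator B says no; no_yes: the reverse.
    """
    n = both_yes + both_no + yes_no + no_yes
    p_observed = (both_yes + both_no) / n
    a_yes = both_yes + yes_no  # sentences annotator A labelled "sex-related"
    b_yes = both_yes + no_yes  # sentences annotator B labelled "sex-related"
    p_expected = (a_yes * b_yes + (n - a_yes) * (n - b_yes)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)

# Try every way of splitting the 11 disagreements between the two directions.
kappas = [cohen_kappa_2x2(197, 65, k, 11 - k) for k in range(12)]
print(f"{min(kappas):.4f} to {max(kappas):.4f}")
```

Whatever the split, the value stays at roughly 0.895, in line with the reported coefficient of 0.89.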
9.3.2 Sexual Content Type
Sex-related content can serve a variety of literary functions, from sexual innuendos that denigrate women, to descriptions of sexual pleasure that advocate self-discovery and self-acceptance. We identified the dominant content type in each of
the 197 sentences according to the following taxonomy.
Stigma (e.g., “slut,” “whore”) covers sentences containing insults, epithets, and
other sexually derogatory terms, usually aimed at female characters. Rape (e.g.,
“wiggle it around,” “hurt”) includes sentences that depict sexual assault, mostly
men forcibly having sex with women. Illicit relations (e.g., “fornication,” “incest”)
refer to adultery, incest, and other illicit sexual relationships, mostly heterosexual.
Sexual intercourse (e.g., “go to bed with,” “sleep with”) is the category for sentences describing sexually intimate behavior, either heterosexual or lesbian, that
does not constitute rape or illicit relations. The remaining three types are directly
related to the female body, in the context of this novel. Body exploration (e.g.,
“haul up,” “shiver”) describes masturbation and one’s sexual feelings. Bodily phenomena (e.g., “pregnant,” “blood”) are natural processes in the female body that
are often considered “unclean,” such as menstruation and pregnancy. Finally, private parts (e.g., “breast,” “nipple”) are mentions of sexual organs, mostly female
genitals, outside the context of sexual intercourse, body exploration, or bodily phenomena. Table 9.1 shows the number of sentences belonging to each type.
Table 9.1 Breakdown of Sentences in Our Corpus According to Sexual Content Type
Sexual Content Type | # Sentences
Private parts | 28
Stigma | 22
Bodily phenomena | 22
Rape | 18
Sexual intercourse | 62
Illicit relations | 19
Body exploration | 26
9.4 Annotation of Translation Strategy
Our annotation scheme for translation strategies can be visualized as a spectrum
(Table 9.2). Faithful translation, the neutral strategy, occupies the middle, with
conservative strategies placed to its left (Section 9.4.1) and explicit strategies to
its right (Section 9.4.2). In a comparison of two strategies (Sections 9.5.2, 9.5.3),
the strategy positioned to the right of the other will be said to be “more explicit.”
All example sentences cited in this section can be found in Table 9.3.
A translation is deemed “faithful” if the sexual content has a similar degree of
explicitness as the source text. In our context, this means explicit English terms
are kept explicit in the Chinese translation, for example, “nipples” in example 4a,
and implicit ones are kept implicit, for example, “his thing” in example 4b.
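The spectrum lends itself to an ordered encoding. The following is a minimal sketch, assuming integer ranks that follow the left-to-right order of Table 9.2; the numeric values, the class name, and the helper functions are our own illustration, and only the ordering comes from the annotation scheme.

```python
from enum import IntEnum

class Strategy(IntEnum):
    """Translation strategies, ordered left to right as in Table 9.2."""
    D = 0   # Deletion
    IT = 1  # Implicit Transcreation
    IS = 2  # Implicit Substitution
    FT = 3  # Faithful Translation
    ES = 4  # Explicit Substitution
    ET = 5  # Explicit Transcreation

def group(strategy):
    """Classify a strategy as conservative, faithful, or explicit."""
    if strategy < Strategy.FT:
        return "conservative"
    if strategy > Strategy.FT:
        return "explicit"
    return "faithful"

def more_explicit(a, b):
    """True if strategy a lies to the right of strategy b on the spectrum."""
    return a > b

# Examples 5 and 3 in Table 9.3: explicit vs. implicit substitution of "privates".
print(more_explicit(Strategy.ES, Strategy.IS), group(Strategy.IS))
```

Encoding the strategies as ranks makes the later sentence-level comparisons a matter of comparing two integers.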
9.4.1 Conservative Strategies
Deletion means the omission of sex-related content by skipping the entire sentence (example 1a), by removing only the erotic parts of the sentence, or by hiding
those parts with ellipses or suspension points (example 1b). At the conservative
end of the spectrum, deletion can efficiently remove obscenity from the text but
can also damage plot consistency.
Implicit transcreation means rewriting the sentence to reduce indecency, as
guided by the translator’s understanding of the source text. A graphic description,
such as “put his thing up gainst my hip,” can be paraphrased as dong qi shoujiao
lai, “move his hand and feet [on me]” (example 2). Besides avoiding sexual cues
and vulgarity, this strategy can lead to better comprehension by the target readers
than deletion (Spinzi et al. 2018).
Implicit substitution uses euphemistic expressions to replace erotic content, typically through substitutions of specific terms. The word “privates,” for
example, can be substituted with xiao duzi, “stomach,” to reduce the obscenity
(example 3). This strategy is considered closer to faithful translation than implicit
transcreation, which makes more significant changes to sentence structure and
places less emphasis on fidelity and equivalence (Du 2020, 754).
9.4.2 Explicit Strategies
Explicit substitution uses words that are even more explicit or bold than the
original. It serves to clarify the eroticism by decoding slang terms, puns, sexual metaphors, etc. For example, the word “privates” was explicitly substituted
with yinjing, “penis” (example 5), in direct contrast to the implicit substitution
in example 3. In a feminist translation, these substitutions can focus the reader’s
attention on the female experience.
Explicit transcreation also uses language that is more explicit or bold than the
original, but in a “looser” translation. As mentioned in Section 9.4.1, the translator’s understanding of the text is prioritized in transcreation over equivalence to
the source text. Example 6 shows a creative rendering of “how it hurt” into a more
explicit expression.
Table 9.2 The Spectrum of Translation Strategies Annotated in Our Corpus
<<< Conservative Strategies <<<                                        >>> Explicit Strategies >>>
Deletion (D) | Implicit Transcreation (IT) | Implicit Substitution (IS) | Faithful Translation (FT) | Explicit Substitution (ES) | Explicit Transcreation (ET)
Table 9.3 Translation Strategies Illustrated with Example Translations
Strategy | Source Text | Translation
Deletion | 1a. Then he push his thing into my pussy. | None
Deletion | 1b. Listening to you sing, folks git to thinking bout a good screw. | 聽聽你唱歌,聽眾能夠想起一次 . . . “Listening to you singing, audiences want to . . .” (Yang).
Implicit Transcreation | 2. First he put his thing up gainst my hip. | 他動起手腳來。 “He moves his hand and feet on me” (Tao).
Implicit Substitution | 3. He punch her in the stomach, she double over groaning but come up with both hands lock right under his privates. | 他掐住她的肚子,她大聲呻吟,立即用兩隻拳頭打中他的小肚子。 “He nipped her stomach, she yelling loudly and punched his stomach immediately” (Yang).
Faithful Translation | 4a. First time I got the full sight of Shug Avery long black body with it black plum nipples, look like her mouth, I thought I have turned into a man. | 我第一次看到莎格•艾弗裡瘦長的黑身體和像她嘴唇一樣的黑梅子似的乳頭的時候,我以為我變成男人了。 “First time I saw Shug Avery’s long black body and her nipples which looks like her mouth, I thought I have turned into a man” (Tao).
Faithful Translation | 4b. First he put his thing up gainst my hip | 他把那東西隆起,頂著我的屁股 “His thing perked up and gainst my hip” (Yang).
Explicit Substitution | 5. He punch her in the stomach, she double over groaning but come up with both hands lock right under his privates. | 他一拳打在她的肚子上,她彎下身子大聲哼哼,可馬上把兩手緊緊攥住他的陰莖。 “He punches her in the stomach; she bent down with groaning but gripped his penis” (Tao).
Explicit Transcreation | 6. How it hurt and how much I was surprise. | 他搞得我多疼。 我多麼吃驚啊。 “How much he fucked me. How much I was surprised” (Yang).
9.4.3 Annotation and Inter-Annotator Agreement
Since the line between faithful translation and other strategies is often blurred,
annotation of translation strategy can be subjective due to individual background
and interpretation. To enhance the reliability of the analysis, we recruited eight
native speakers of Chinese to participate in the annotation. All were professional
translators, and six of them had graduated from professional translation programs.
We obtained draft sentence segmentation and English-Chinese sentence alignment from SDL Trados¹ and then manually corrected the alignments. The translation strategies in each of the 197 sentences translated by Tao and the same
sentences translated by Yang were independently classified by two annotators:
the first author of this chapter, and one of these eight professional translators. The
kappa coefficients were 0.58 and 0.56 for Tao and Yang, respectively, corresponding to a moderate level of agreement (Landis and Koch 1977, 165). In case of
disagreement, the two annotators reconciled their differences through discussion
to finalize the label. The most common disagreement was between faithful translation and explicit transcreation. Some annotators were more sensitive in detecting sexual emphases in the translation. Consider the sentence “When that hurt,
I cry” and its translation, Wo teng de hen, han le qilai (我疼得很,喊了起來),
“I feel pain and cry.” It was judged to be faithful translation by one annotator, but
explicit transcreation by another, which may arguably be attributed to an overinterpretation of the word han, “cry.”
9.5 Results and Analysis
We first compare the overall distribution of translation strategies in the feminist
and non-feminist translations (Section 9.5.1) and then examine sentences for
which the two versions differ in translation strategies (Section 9.5.2). Next, we
investigate the role of sexual content types (Section 9.5.3), followed by a discussion on the significance of the results (Section 9.5.4).
9.5.1 Overall Comparison
We use the term “conservative strategies” to refer to deletion, implicit substitution, and implicit transcreation (Section 9.4.1); and the term “explicit strategies”
to refer to explicit substitution and explicit transcreation (Section 9.4.2). Figure 9.1 presents the overall distribution of translation strategies. Faithful translation was the most common strategy, adopted in about two-thirds of the sentences
in both the feminist (67.0%) and non-feminist (64.0%) versions. The feminist
translation uses explicit strategies more often (7.6%) than the non-feminist one
(5.6%); conversely, the non-feminist translation uses conservative strategies more
often (30.5%) than the feminist one (25.4%). The breakdown of the frequency of
individual strategies, shown in Table 9.4, is consistent with the overall trend and
suggests a slightly more tolerant attitude for the feminist translator.
Figure 9.1 Overall distribution of translation strategies for sexual content in feminist (Tao)
and non-feminist (Yang) translation.
Table 9.4 Breakdown of Translation Strategy Frequency in Feminist (Tao) and Non-Feminist (Yang) Translation
Translation Strategy | Feminist | Non-feminist
Deletion | 31 | 38
Implicit Transcreation | 2 | 3
Implicit Substitution | 17 | 19
Faithful Translation | 132 | 126
Explicit Substitution | 7 | 6
Explicit Transcreation | 8 | 5
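The percentages quoted in this section follow directly from the counts in Table 9.4. As a worked check, the sketch below groups the six strategies into conservative, faithful, and explicit and converts the counts into shares of the 197 sentences; the variable and function names are ours.

```python
# Strategy counts from Table 9.4 (D, IT, IS, FT, ES, ET).
COUNTS = {
    "Tao":  {"D": 31, "IT": 2, "IS": 17, "FT": 132, "ES": 7, "ET": 8},
    "Yang": {"D": 38, "IT": 3, "IS": 19, "FT": 126, "ES": 6, "ET": 5},
}

GROUPS = {
    "conservative": ("D", "IT", "IS"),
    "faithful": ("FT",),
    "explicit": ("ES", "ET"),
}

def group_shares(counts):
    """Percentage of sentences per strategy group, rounded to one decimal."""
    total = sum(counts.values())
    return {
        name: round(100 * sum(counts[s] for s in members) / total, 1)
        for name, members in GROUPS.items()
    }

for translator, counts in COUNTS.items():
    print(translator, sum(counts.values()), group_shares(counts))
```

Both columns sum to 197, and the shares reproduce the figures in the text: 25.4%, 67.0%, and 7.6% for Tao, and 30.5%, 64.0%, and 5.6% for Yang.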
9.5.2 Sentence-Level Comparison
The sentence-level comparison investigates whether the feminist or non-feminist
version translates a sentence in a more explicit manner. The feminist sentence is
considered “more explicit” if its translation strategy is positioned on the spectrum
(Table 9.2) to the right of the strategy taken in the non-feminist sentence, and vice
versa.
The feminist and non-feminist translations adopt the same strategy in a majority of sentences (128 out of 197), as shown in the diagonal cells in Table 9.5.
Among the remaining sentences, the feminist ones are more explicit than the
non-feminist counterparts most of the time. This can be seen in the 43 cases in the
cells in the lower triangle, in comparison to the 26 cases in the cells in the upper
triangle. The bolded figures show the four sentences that take opposite sides of
the strategy spectrum – that is, one conservative and one explicit – in the two versions. Elsewhere, the gap is always smaller: we first discuss cases contrasting a
faithful translation and a conservative one (Section 9.5.2.1), then those involving
a faithful translation and an explicit one (Section 9.5.2.2).
Table 9.5 Sentence-Level Comparison of the Translation Strategies in the Feminist (Tao)
and Non-Feminist (Yang) Translations
↓ Feminist / Non-feminist → | Deletion | Implicit Transc. | Implicit Subst. | Faithful Translation | Explicit Subst. | Explicit Transc.
Deletion | 22 | 1 | 0 | 8 | 0 | 0
Implicit Transc. | 0 | 0 | 0 | 1 | 0 | 1*
Implicit Subst. | 3 | 0 | 7 | 6 | 1* | 0
Faithful Translation | 13 | 2 | 10 | 99 | 4 | 4
Explicit Subst. | 0 | 0 | 1* | 6 | 0 | 0
Explicit Transc. | 0 | 0 | 1* | 6 | 1 | 0
* Bolded figure: one version uses a conservative strategy and the other an explicit one.
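The counts cited in this comparison can be recomputed from the cross-tabulation. The sketch below encodes our reading of Table 9.5 as a 6 × 6 matrix (rows for the feminist version, columns for the non-feminist version, both in the order D, IT, IS, FT, ES, ET) and sums the diagonal, the lower triangle, and the upper triangle. The matrix layout is an assumption on our part, cross-checked only against the row and column totals in Table 9.4.

```python
# Rows: feminist (Tao); columns: non-feminist (Yang). Order: D, IT, IS, FT, ES, ET.
M = [
    [22, 1,  0,  8, 0, 0],  # D
    [0,  0,  0,  1, 0, 1],  # IT
    [3,  0,  7,  6, 1, 0],  # IS
    [13, 2, 10, 99, 4, 4],  # FT
    [0,  0,  1,  6, 0, 0],  # ES
    [0,  0,  1,  6, 1, 0],  # ET
]
n = len(M)

same = sum(M[i][i] for i in range(n))
# Row index above column index: the feminist strategy lies further right.
feminist_more = sum(M[i][j] for i in range(n) for j in range(n) if i > j)
non_feminist_more = sum(M[i][j] for i in range(n) for j in range(n) if i < j)

conservative, explicit = (0, 1, 2), (4, 5)
opposite = sum(M[i][j] + M[j][i] for i in conservative for j in explicit)

print(same, feminist_more, non_feminist_more, opposite)
```

Under this reading the sums are 128, 43, 26, and 4, matching the figures given in the text.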
9.5.2.1 Conservative vs. Faithful Translation
There are 23 sentences that were translated faithfully by the feminist but conservatively by the non-feminist. The majority of the sentences deleted by Yang (7 out of
13) deal with masturbation, for example when Shug
taught Celie to touch and explore her body. In the other 10 sentences, Yang mostly
used implicit substitution for private parts and heterosexual scenes.
Conversely, 15 sentences were translated faithfully by the non-feminist but
conservatively by the feminist. Tao used deletion in 8 sentences carrying a variety
of sexual content, including rape, adultery, masturbation, and sexual intercourse.
She applied implicit substitution and transcreation to three sentences describing
heterosexual intercourse, one describing rape, and three with sexual insults directed at
female characters. Yang gave a faithful translation despite the negative light cast
on the female characters.
9.5.2.2 Explicit vs. Faithful Translation
The treatment of pronouns is responsible for the bulk of the 12 sentences that were
translated faithfully by the non-feminist but explicitly by the feminist. Tao used
specific terms such as huaiyun (懷孕), “pregnancy,” and yuejing (月經), “menstruation,” to clarify the pronoun “it,” while Yang maintained the ambiguity with
the euphemism zhezhong shi (這種事), “this kind of thing.” Likewise, Tao turned
“kind of love” into the more vivid zuo’ai (做愛), “make love,” which facilitates
better understanding than the oblique hint in Yang’s nazhong aiqing (那種愛情),
“that love.”
Conversely, eight sentences have faithful translation in the feminist version but
explicit translation in the non-feminist version. All these sentences involve sexual
assault or stigma. Consider the expression “How it hurt” in example 6 (Table 9.3).
Yang elaborated the pronoun “it” into ta gao de wo duo teng, “how much he
fucked me,” using the pejorative verb gao in a vivid description of the rape. In
contrast, Tao gave a neutral translation, nage tengtong de ziwei (那個疼痛的滋
味), “that hurtful feeling.”
9.5.3 Sexual Content Types
We now assess whether different kinds of sexual content have any bearing on
the choice of translation strategy. Our discussion will refer to the overall rates
of faithful translation, conservative, and explicit strategies shown in Figure 9.2,
using the same definitions for “explicit strategies” and “conservative strategies”
as in Section 9.5.1. These rates are shown according to sexual content type, listed
in decreasing rate of conservative translation in the feminist version. We first discuss the types in which the feminist translation exhibits a lower rate of conservative strategy than the non-feminist (Section 9.5.3.1), and then those in which it has
a higher rate (Section 9.5.3.2).
Our discussion will also refer to the sentence-level comparison in Figure 9.3.
Using the same definition as Section 9.5.2, a translation strategy is said to be
“more explicit” if it is positioned to the right of another strategy on the spectrum.
All example sentences cited in this section can be found in Table 9.6.
Figure 9.2 Overall distribution of translation strategies according to sexual content type in
feminist (Tao) and non-feminist (Yang) translation. [Bar chart: one pair of bars (Tao, Yang) for each sexual content type (body exploration, private parts, rape, sexual intercourse, illicit relations, bodily phenomena, stigma), divided into Conservative (D+IS+IT), Faithful translation, and Explicit (ES+ET); axis from 0% to 100%.]
Figure 9.3 Sentence-level comparison: the number of sentences translated with a more
explicit strategy in the feminist (Tao) than non-feminist (Yang) translation, and
vice versa.
Table 9.6 Example Sentences for Each Sexual Content Type
Source text | Strategy | Translation
Stigma
1. He beat me for dressing trampy. | Faithful Translation | 他揍我,因為我穿得像個蕩婦, “He beats me because I dress like a tramp” (Tao).
   | Implicit Transcreation | 他打我,怪我穿得邋裡邋遢的。 “He beats me and blames me to dress dirty” (Yang).
Bodily Phenomena
2. When I start to hurt and then my stomach start moving and then that little baby come out my pussy chewing on it fist. | Implicit Substitution | 我的肚子突然一陣疼痛,肚子動了起來,一個小娃娃從我那個地方掉了出來,啃著手指頭。 “When I felt hurt my belly started to move and one little baby came out from that place, chewing fingers” (Tao).
   | Deletion | 我開始覺得疼時,緊接著,我的肚子就蠕動起來,後來,那小傢伙就生出來了,用嘴巴咬著小拳頭,簡直叫我大吃一驚。 “When I felt hurt, my belly started moving, then the little guy came out and biting fist which surprised me” (Yang).
Sexual Intercourse
3. Much as I still want to be with her. | Explicit Transcreation | 儘管我非常想跟她親熱 “even though I very much want to make out with her” (Tao).
   | Faithful Translation | 我仍然很想跟她同床 “I still very much want to sleep in the same bed with her” (Yang).
Private Parts
4. All dressed up for Harpo’s, smelling good and everything, but scared to look at your own pussy. | Implicit Substitution | 哈波酒店去的時候打扮得漂漂亮亮,抹得香噴噴的,可就不敢看自己的下身。 “Dressed beautiful to Harpo’s hotel, smelling good but you dare not to look down the lower part of your body” (Tao).
   | Deletion | 上哈[…]打扮得漂漂亮亮的,[…]的[…]去,[…]香[…],[…]的 . . . 。 “Dressed beautiful to Harpo’s bar, perfume is all around, everything is good, but you dare not to . . .” (Yang).
Body Exploration
5. My little button sort of perk up too. | Faithful Translation | 我下身[…]來了。 “The little button from the lower part of the body has perked up” (Tao).
   | Deletion | None (Yang)
Rape
6. After he through, I say, he make me finish trimming his hair. | Faithful Translation | 事後,我說,他讓我把他的頭髮[…]。 “After it is done, I said, he made me trim his hair” (Tao).
   | Explicit Substitution | 他搞過以後,我說,就叫我把他的頭[…]。 “After he fucked me, I said, [he] asked me to trim his hair” (Yang).
Illicit Relations
7. Got they legs open to every Tom, Dick and Harry. | Implicit Transcreation | 跟随便哪個湯姆、迪克、哈里之類可以睡覺。 “[They] may sleep with anyone, Tom, Dick, Harry and the like” (Tao).
   | Explicit Transcreation | 她們跟湯姆、狄克和哈里鄉亂搞,不論[…]人[…]。 “They fucked with Tom, Dick and Harry, fucked with all kinds of men” (Yang).
9.5.3.1 Less Conservative in Feminist Translation
STIGMA
In the feminist translation, stigma is subject to the lowest rate of conservative translation (4.5%) among all sexual content types (Figure 9.2). The feminist translator
used conservative strategies more sparingly than did the non-feminist counterpart
(18.2%), in line with the sentence-level comparison (Figure 9.3), which shows her to
be slightly more explicit. These include four instances of faithful translation by Tao
and implicit substitution or transcreation by Yang. In example 1, Celie was scolded
by her stepfather as “trampy” when she dressed up to attract his attention and protect her sister from sexual harassment. Since the context suggests a slang usage of
the word for a disreputable, promiscuous woman, Tao rendered “trampy” with the
derogatory term dangfu, “slut.” In contrast, Yang translated it as lalilata, “messy,”
taking the less sexually charged sense of a dirty wanderer living on the street.
Among cases of explicit treatment of stigma, however, the non-feminist translation outnumbers the feminist. Many of these sentences are associated with virginity, for example, with the words “fresh” and “old-maid.” Yang highlighted the
insinuation using explicit substitution with chunü (處女), “virgin,” while Tao de-emphasized it, either with lao guniang (老姑娘), “old spinster,” or with the metaphorical term huanghua guinü (黃花閨女), “chrysanthemum girl.”
BODILY PHENOMENA
These phenomena incur the second-lowest rate of conservative translation for the
feminist (9.1%) and the lowest rate for the non-feminist (13.6%). Terms related
to these common and natural phenomena, such as menstruation and childbirth,
appear to be more acceptable than those in most other types. Although Yang was
only slightly more conservative than Tao overall, the sentence-level comparison
shows Tao to be invariably more explicit when the two diverged. A case in point
is her implicit substitution of “pussy” with nage difang, “that place,” in contrast
to the outright deletion by Yang (example 2). Her disambiguation of the pronoun
“it” into “menstruation,” discussed in Section 9.5.2.2, is another one.
SEXUAL INTERCOURSE
The feminist translation is slightly less conservative (28.0%) than the non-feminist one (34.0%) overall. The sentence-level comparison reveals a larger gap, with
Tao being more explicit in 16 out of the 25 sentences in which they employed different strategies. The lesbian scenes involving Celie and Shug contributed much
to the difference. Among the 12 sentences in this category, Tao used an explicit strategy in three instances, while Yang never did so. In example 3, Tao stressed Celie’s
active desire for sex by amplifying “to be with her” as gen ta qinre, “make out
with her”; however, Yang preserved the ambivalence in the source text with the
faithful translation gen ta tongchuang, “sleep in the same bed with her.”
PRIVATE PARTS
The vast majority are references to female genitals, with only four to their male
counterparts. The non-feminist translation applies conservative strategies to 50%
of these references, significantly more often than does the feminist (35.7%).
Tao was more explicit in a majority of sentences (7 out of 10). For example, in the
sentence “I got my eyes glue there too,” she boldly rendered “there” as xiongpu
(胸脯), “bosom,” while Yang evaded the reference with wo yeshi yiyang
(我也是一樣), “me too.” While both translators seemed comfortable with private parts in
the upper body (e.g., “nipples”), they almost always avoided the lower body, for
example, with “pussy” in example 4.
BODY EXPLORATION
This content type triggers the highest rate of conservative treatment for both the
feminist and non-feminist, likely because of the lack of socially acceptable terms
for masturbation (Millett 2000, 55–58). Yang was more conservative (57.7%)
than Tao (38.5%) overall, often deleting sensitive words describing one’s feelings
during masturbation. The sentence-level comparison corroborates this trend with
a clear contrast. Tao was more explicit than Yang in 7 out of 9 cases, including the
treatment of “button” in example 5.
9.5.3.2 More Conservative in Feminist Translation
RAPE
In a departure from the content types discussed previously, when dealing with
rape, the feminist translation is both more conservative (27.8% vs. 16.7%) and less
explicit (0% vs. 16.7%) than the non-feminist (Figure 9.2). The sentence-level
comparison yields the same observation, showing Yang to be more explicit (Figure 9.3). Yang ensured that the reader understands “after he through” in example 6
with ta gao guo yihou, “after he fucked me,” while Tao obscured the harm done to
Celie with shihou, “after it is done.” A similar contrast can be observed in example
4b in Table 9.3, where Yang preserved the graphic depiction of Celie’s rape by her
stepfather, while Tao opted to remove the details with an implicit transcreation.
ILLICIT RELATIONS
Tao was more conservative than Yang overall (26.3% vs. 15.8%) when describing
socially unaccepted relations, mostly adultery and incest. The direct comparison
also shows Yang to be more explicit in 3 out of 5 cases. Two examples are his
translation of “screw” as tongjian (通姦), “fornicate,” in the sentence “Feed fifty
men, screw fifty-five,” and of “got they legs open” as luan gao, “fuck wantonly,” in
example 7. In both of these cases, Tao chose implicit substitution with the euphemistic term shuijiao, “sleep.”
9.5.4 Discussion
Compared to a non-feminist translator, Jie Tao would be expected to hold a more
open attitude toward sex-related content, especially in a novel that “contributes to
feminism and woman’s liberation” (Tao 2004, 276–7). However, Renjing Yang
chose conservative strategies only slightly more often than did Tao (30.5% vs.
25.4%). He also opted for faithful translation almost as frequently (64.0% vs.
67.0%), all the more remarkable considering that his publication (1987) preceded
hers (1998), separated by a decade of liberalization. Hence, a high rate of faithful
translation, by itself, may not be a decisive metric for the feminist perspective.
The rate could be attributed instead to the translator’s view on text acceptability,
such as in the case of Yang, who considered the sexual content of the novel to be
measured.2
The overall rate of faithful translation masks the underlying differences in sexual
content types, which we found to be more revealing for the feminist perspective. On
topics directly related to the female body, such as private parts, bodily phenomena,
and body exploration, the feminist translation applies conservative strategies more
sparingly (Section 9.5.3.1). Faithful or explicit translation of masturbation, for example, can be motivated by the translator’s approval of Celie’s self-discovery of her
body, in accordance with feminist convictions. Frank descriptions of private parts
and bodily phenomena can likewise be explained as a feminist effort to highlight the
female protagonist’s growing sexual consciousness. Nonetheless, among all content
types, private parts and body exploration are most often translated conservatively
(Figure 9.3), indicating their relative unacceptability among the Chinese readership.
The feminist attitude to stigma is nuanced. Perhaps similar to its treatment of
the female body, the feminist translation prefers an unvarnished portrayal of sexual insults (e.g., “trampy”) to expose the systemic gender-based oppression. It is,
however, more conservative when the stigma involves virginity (e.g., “old maid”).
Since virginity is often bound up with the control of the female body, denial of
sexual desire, and depression of sexual knowledge (Millett 2000, 55), the feminist
translation might be inclined to de-emphasize these associations.
The feminist translation is also more conservative than the non-feminist on
illicit relations and sexual assault (Section 9.5.3.2). Since the illicit relations often
involve promiscuity and adultery, which are frowned upon by traditional Confucian culture, the conservative strategies could be an attempt to minimize these stereotypes for the novel’s heroines. The victims of sexual assault are almost always
female characters in this novel. Unlike verbal violence such as stigma, physical
violence is depicted less explicitly in the feminist translation, likely to shield readers from the graphic abuse. Less sensitive to this need, the non-feminist translation finds the pain and cruelty in the description more acceptable. A case in point
is Yang’s frequent use of the verb gao, which can mean “rape” and “molest” in a
sexual context,3 despite the possible unpleasant effect on readers.
Our results should be interpreted with a number of limitations in mind. With
regard to translation strategies, while our study has examined six common ones
(Section 9.4), others such as archaicization, footnoting, generalization, and hijacking could potentially yield further insights (von Flotow 1991, 69–84; Vinay and
Darbelnet 1995; Lee and Ngai 2012, 73–94). With regard to the textual material,
a corpus with more translators could support more comprehensive analyses, such
as the role of the publisher, as well as the translator’s gender, which may influence
the description of private parts and other aspects of the female body. A larger pool
of translators of both genders, involving a variety of publishers, could help control for these
confounding variables.
9.6 Conclusion
Despite the extensive literature on the translation of sexual content, most studies
on feminist translation of novels have been limited to qualitative methods (Irshad
and Yasmin 2022). We have presented the first quantitative comparison between
feminist and non-feminist translations of a novel, through a parallel text analysis
of the Chinese versions of The Color Purple by Jie Tao and Renjing Yang. We
constructed a parallel corpus consisting of their translations of 197 sex-related
sentences, annotated with their sexual content types and translation strategies.
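The tallies behind such a comparison can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' procedure: the grouping of the six strategies into conservative, faithful, and explicit levels, the function names, and the four sample rows are all assumptions made for the example.

```python
from collections import Counter

# Assumed grouping of the six strategies into three explicitness levels:
# -1 = conservative, 0 = faithful, 1 = explicit. Illustrative only.
EXPLICITNESS = {
    "deletion": -1,
    "implicit substitution": -1,
    "implicit transcreation": -1,
    "faithful translation": 0,
    "explicit substitution": 1,
    "explicit transcreation": 1,
}


def conservative_rate(annotations, translator):
    """Share of sentences per content type that a translator rendered conservatively."""
    totals = Counter()
    conservative = Counter()
    for row in annotations:
        totals[row["type"]] += 1
        if EXPLICITNESS[row[translator]] < 0:
            conservative[row["type"]] += 1
    return {t: conservative[t] / totals[t] for t in totals}


def sentence_level_comparison(annotations, a, b):
    """Among sentences where the two levels differ, count who was more explicit."""
    wins = Counter()
    for row in annotations:
        level_a = EXPLICITNESS[row[a]]
        level_b = EXPLICITNESS[row[b]]
        if level_a > level_b:
            wins[a] += 1
        elif level_b > level_a:
            wins[b] += 1
    return dict(wins)


# Invented sample rows, not taken from the corpus.
rows = [
    {"type": "rape", "tao": "faithful translation", "yang": "explicit substitution"},
    {"type": "rape", "tao": "implicit transcreation", "yang": "faithful translation"},
    {"type": "body exploration", "tao": "faithful translation", "yang": "deletion"},
    {"type": "stigma", "tao": "faithful translation", "yang": "faithful translation"},
]

print(conservative_rate(rows, "tao"))
print(sentence_level_comparison(rows, "tao", "yang"))
```

Keeping the overall rate and the sentence-level comparison as separate functions mirrors the two views reported in the analysis: the first summarizes each translator independently, while the second looks only at sentences where the two translators diverged.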
Our analysis has revealed that while faithful translation dominates both the
feminist and non-feminist versions, the former employs explicit strategies slightly
more frequently and conservative ones less frequently. Importantly, our results
have identified distinctive strategy choices for different sexual content types. On
content related to body exploration, private parts, and bodily phenomena, the
feminist translation is less conservative than the non-feminist one; however, on
rape, illicit relations, and stigma related to virginity, the feminist translation is
more conservative.
Further research can pursue a more thorough examination of the translators’
metatexts and paratexts to present a fuller picture of the distinctive practices in
feminist translation. The corpus could also be expanded to include more feminist
and non-feminist translators to investigate other variations in the patterns of feminist translation strategies.
Notes
1 www.trados.com.
2 “Thinly disguised, whereas it is reasonable and not indecent” (in the original Chinese, 露而不穢,較有分寸) (Yang 1989, 34).
3 Cf. examples from Wen (2012), e.g., gao nüren (搞女人), “molest young woman”;
zhege liumang gaole ta (這個流氓搞了她), “the hooligan raped her.”
References
Bosseaux, Charlotte. 2004. “Point of View in Translation: A Corpus-based Study of French
Translations of Virginia Woolf’s To The Lighthouse.” Across Languages and Cultures 5,
no. 1: 107–22. https://doi.org/10.1556/Acr.5.2004.1.6.
Cooper, Burns. 2007. “Taboo Terms in a Sexual Abuse Criminal Trial.” International
Journal of Speech Language and the Law 14, no. 1: 27–50. https://doi.org/10.1558/ijsll.v14i1.27.
Du, Chen. 2020. “New Interpretation and Techniques of Transcreation.” APTIF 9 – Reality
Vs. Illusion 66, no. 4–5: 750–64. https://doi.org/10.1075/babel.00178.che.
Geer, James H., and Heidi S. Bellard. 1996. “Sexual Content Induced Delays in Unprimed
Lexical Decisions: Gender and Context Effects.” Archives of Sexual Behavior 25, no. 4:
379–95. https://doi.org/10.1007/BF02437581.
Grosser, George S., and Anthony A. Walsh. 1966. “Sex Differences in the Differential
Recall of Taboo and Neutral Words.” The Journal of Psychology 63, no. 2: 219–27.
https://doi.org/10.1080/00223980.1966.10543035.
Han, Ziman. 2008. “Sex Taboo in Literary Translation in China: A Study of the Two Chinese Versions of the Color Purple.” Babel 54, no. 1: 69–85.
Holt, Patricia. 1996. Alice Walker Banned. San Francisco, CA: Aunt Lute Books.
Huang, Libo. 2014. “Discourse Presentation Translation as an Indicator of Translator’s
Style: A Case Study of Lao She’s Luotuo Xiangzi and Its Three English Translations
Style.” In Style in Translation: A Corpus-Based Perspective, 57–77. Berlin: Springer.
https://doi.org/10.1007/978-3-662-45566-1_5.
Huang, Libo, and Chiyu Chu. 2014. “Translator’s Style or Translational Style? A Corpus-Based Study of Style in Translated Chinese Novels.” Asia Pacific Translation and Intercultural Studies 1, no. 2: 122–41. https://doi.org/10.1080/23306343.2014.883742.
Irshad, Isra, and Musarat Yasmin. 2022. “Feminism and Literary Translation: A Systematic
Review.” Heliyon 8, no. 3: 1–12. https://doi.org/10.1016/j.heliyon.2022.e09082.
Kutner, Nancy G., and Donna Brogan. 1974. “An Investigation of Sex-Related Slang
Vocabulary and Sex-Role Orientation among Male and Female University Students.”
Journal of Marriage and the Family 36, no. 3: 474–84. https://doi.org/10.2307/350718.
Landis, J. Richard, and Gary G. Koch. 1977. “The Measurement of Observer Agreement
for Categorical Data.” Biometrics 33, no. 1: 159. https://doi.org/10.2307/2529310.
Lee, Tong-King, and Cindy Ngai. 2012. “Translating Eroticism in Traditional Chinese
Drama: Three English Versions of the Peony Pavilion.” Babel 58, no. 1: 73–94.
Lee, Tzu-yi. 2013. “Woman-identified Approach in Practice: A Case Study of Four Chinese
Translations of the Novel The Color Purple.” In Bridging the Gap between Theory and
Practice in Translation and Gender Studies, edited by Eleonora Federici and Vanessa
Leonardi, 75–85. Cambridge: Cambridge Scholars Publishers.
Liang Wayne Wen 梁文駿. 2017. “Lun zhongguo yanqing xiaoshuo fanyi: yi Jinpingmei yingyiben weili
論中國豔情小說翻譯:以《金瓶梅》英譯本為例” [Translating Chinese Erotic Literature: A Case Study of the English Translations of Jin Ping Mei]. SPECTRUM: NCUE Studies in Language, Literature, Translation 英語文暨口筆譯學集刊
15, no. 1: 1–20.
Mellinger, Christopher, and Thomas A. Hanson. 2017. Quantitative Research Methods in
Translation and Interpreting Studies. London and New York: Routledge.
Millett, Kate. 2000. Sexual Politics. Urbana, IL: University of Illinois Press.
Mu Lei 穆雷. 2008. Fanyi yanjiu zhong de xingbie shijiao 翻譯研究中的性別視角 [Gender Perspective in Translation Studies]. Wuhan: Wuhan University Press.
Santaemilia, José. 2011. “The Translation of Sexually Explicit Language: Almudena
Grandes’ Las edades de Lulu (1989) in English.” In Translation and Opposition, edited
by D. Asimakoulas and Margaret Rogers, 256–82. Bristol: Multilingual Matters.
Slamia, Fatma Ben. 2020. “Translation Strategies of Taboo Words in Interlingual Film
Subtitling.” International Journal of Linguistics, Literature and Translation 3, no. 6:
82–98. http://doi.org/10.32996/ijllt.
Spinzi, Cinzia, A. Rizzo, and M. L. Zummo, eds. 2018. Translation or Transcreation? Discourse, Text and Visuals. Newcastle upon Tyne: Cambridge Scholars Publishing.
Tao Jie 陶潔. 1994. “Ailisi Mengluo bixia de xiaozhen funü” 艾麗絲·蒙蘿筆下的小鎮
婦女 [Ladies in the town from Alice Ann Munro]. In Xinling de Guiji Jianada Wenxue
Lunwen Ji 心靈的軌跡 : 加拿大文學論文集 [The Track of the True Heart: Canadian
Literature], edited by Mingli Qin 秦名利, Tao Jie 陶潔 and Weng Dexiu 翁德修, 1–16.
Beijing: China Federation of Literary and Art Circles Publishing House.
Tao Jie 陶潔. 1995. Yuwai nvxing 域外女性 [Foreign Females]. Beijing: Peking University.
Tao Jie 陶潔. 1998. Ziyanse 紫顏色 [The Color Purple]. Nanjing: Yilin Publication.
Tao Jie 陶潔. 2004. Dengxia xichuang meiguo wenxue he meiguo wenhua 燈下西窗 – 美國文學和美國文化 [Light from the Window on the West: American Literary Analysis].
Beijing: Beijing University.
Vinay, Jean-Paul, and Jean Darbelnet. 1995. Comparative Stylistics of French and English:
A Methodology for Translation. Amsterdam: John Benjamins.
Von Flotow, Luise. 1991. “Feminist Translation: Contexts, Practices and Theories.” TTR:
Traduction, Terminologie, Redaction 4, no. 2: 69–84.
Walker, Alice. 1985. The Color Purple. New York: Pocket Books.
Wen Shaoxian 溫紹賢. 2012. Zhongwen changyong ciju ji zhongying duibi yu fanyi 中文
常用詞句及中英對比與翻譯 [Common Chinese Words and sentences and Comparison
with English and Translation]. Hong Kong: Everflow Publications.
Winters, Marion. 2007. “F. Scott Fitzgerald’s Die Schönen und Verdammten: A Corpus-based Study of Speech-act Report Verbs as a Feature of Translator’s Style.” Meta 52, no.
3: 412–25.
Wu, Yi-ping. 2010. “A Study in the English Translation of Eroticism: The Case of Li Ang’s
Sha Fu.” In The Erotic in Context, edited by M. Soraya García-Sánchez, Cara Judea
Alhadeff, and Joel Kuennen, 161–70. Leiden: Brill.
Yang Renjing 楊仁敬. 1987. Zise 紫色 [The Color Purple]. Beijing: Shiyue Wenyi
Publication.
Yang Renjing 楊仁敬. 1989. “Meiguo heiren wenxue de xin tupo – ping Ailisi Woke Zise”
美國黑人文學的新突破 – 評愛麗絲·沃克《紫色》 [A New Breakthrough in American Black Humanities – Review of Alice Walker’s The Color Purple]. Foreign Literature Studies
外國文學研究 3, no. 1: 29–36.
Yuan, Long. 2015. “The Subtitling of Sexual Taboo from English to Chinese.” PhD dissertation, Imperial College. https://doi.org/10.25560/31546.
10 Benefits of a Corpus-based Approach to Translations: The Example of Huckleberry Finn
Ronald Jenn and Amel Fraisse
10.1 Introduction
Combining an open call to literature and translation studies scholars, existing
bibliographical data, the UNESCO’s “Index Translationum,” and crowdsourcing
input, a multilingual parallel corpus of translations of Adventures of Huckleberry
Finn was compiled, aligned by chapter and paragraph, and exploited at different
levels of granularity and expertise to the benefit of diverse scholarly communities.
Besides better knowledge of how a single author’s ideas and texts were translated
and interpreted in different languages, this approach to translated texts resulted
in a refinement of NLP approaches to literary texts and helped rare and under-resourced languages along the way.
This chapter discusses the different stages of an interdisciplinary and international project as a blueprint for future collaborations. The necessary steps consist in finding a sufficiently well-traveled author and text that can provide ample
material for study, mobilizing the tools for mining databases, extracting the texts,
and aligning them to establish a comparable corpus. Once the digital phase is over
and the texts are retrieved and aligned by NLP and information science experts,
humanities scholars versed in the local languages and cultures can step in to conduct refined textual analysis.1
10.2 Adventures of Huckleberry Finn Described
One essential ingredient of a corpus-based approach to translations is the existence
of an active and dedicated international community of scholars willing to engage
in a transnational approach to its author, text, or genre. The Mark Twain community, within which a globalized and transnational approach has been trending
for over a decade, grew interested in the fate of Adventures of Huckleberry Finn,
worked together with experts of other fields, and called on the international community in order to identify and collect existing translations in different languages.
Mark Twain was first prized as the quintessence of the American spirit, the American frontier,
the adventurous West, and, in the case of Huckleberry Finn, the sultry and
violence-prone South, yet his fame continued unabated even after those cultural
features became history. The novel deals with transnational and universal topics
DOI: 10.4324/9781003298328-11
such as childhood, coming of age, freedom, racism, and slavery.2 Thanks to its
many universal ingredients, Huckleberry Finn survived the upheavals and evolutions of the twentieth century. Its international fame was actually kept alive and
revived several times in the contexts of mass literacy, the ideological divides of
the twentieth century, as well as the emerging teenage culture, and currently, well
into the twenty-first century, it nourishes ethical questions linked to race relations.
The text has a number of linguistic specificities, such as the use of dialects that
would hypothetically bar it from being successfully translated, and yet it proved
far from untranslatable and achieved worldwide fame (see Section 10.7).
Published in 1885 in the United States and the previous year in Great Britain,
Huckleberry Finn traveled fast. Within a short time span, Denmark and Sweden
(1885), France (1886), Russia (1888), Germany (1890), and Poland (1898) came
up with their own versions of the novel. These earlier versions and the succeeding ones across a wide range of countries mainly catered to educational institutions and publishers bent on providing reading material to a wide readership in a
context of mass literacy. After World War II, there was a shift of emphasis as the
educational aspect faded to the background while political implications came to
the fore as, paradoxically, the East and West competed for the American icon in
a bipolar world.3 At about the same time mass literacy was completed, the civil
rights movement in the United States and the worldwide independence movement
entailed growing sensitivity to African American voices in the novel and its translations. All this results in an astounding number of available translations (into 64
languages, sometimes multiple times over a century and a half) and makes Huckleberry Finn an ideal text to use as a prototype in an investigation of the global
circulation of literary texts.
Translations of Mark Twain’s work received relatively little scholarly attention
until the 1982 publication of Robert Rodney’s landmark study Mark Twain International. A goldmine of bibliographical references of foreign editions of Mark
Twain’s work in English and in translation, Rodney’s work is foundational.4 Progress on the international dimension of Mark Twain accelerated in the 2010s with
contributions by Tsuyoshi Ishihara (Mark Twain in Japan, 2011), Selina Lai-Henderson (Mark Twain in China, 2015), and Paula Harrington and Ronald Jenn (Mark
Twain and France, 2017). Also during this time, Shelley Fisher Fishkin advocated
a more comprehensive and global approach (Mark Twain Anthology: Great Writers on His
Life and Work 2010), which included translations of foreign criticism.5 She was
also the first to foresee how the rise of digital tools applied to translation studies
could benefit transnational approaches to Mark Twain in a landmark article (“Deep
Maps” 2011). Undoubtedly, the centennial of Mark Twain’s death in 2010, which
came with the long-expected and much-touted publication of the Autobiography, spurred
renewed interest in his works and spawned new translations and scholarship.
10.3 Global Huck and Rosetta
Amid efforts to develop new ways of consolidating an understanding of global
translations of Mark Twain’s Huckleberry Finn and exploring potential directions
for the future, building on the scholarship of the past, the Global Huck project was
created. It was first presented during the Eighth International Conference on the
State of Mark Twain Studies in 2017, as part of “The Place of Mark Twain in Digital Humanities” workshop. It involved identifying global editions of Huckleberry
Finn and having scholars contribute comments about the cultural work that each
translation does in specific cultural contexts.6
In 2019, Global Huck morphed into the Rosetta Project, helped by the France-Stanford Fund and the Center for Interdisciplinary Studies and Stanford’s Center
for Spatial and Textual Analysis (CESTA). Rosetta looked more specifically at
versions from a number of low-resourced languages. The projects first relied
on crowdsourcing as well as inclusive, interactive, and collaborative user-based
approaches for data and information collection. Natural language processing
methods were used to generate multilingual corpora with a view to providing material for language resources (corpora, thesauri, dictionaries). Information science
experts located and retrieved digital versions and aligned them, first by chapter,
then by paragraph and line by line. These tasks, conducted by IT experts, do not
require an intimate knowledge of the text or Mark Twain scholarship.
Among the results of Rosetta is an interactive map, which is
currently supported by Huma-Num, a very large research infrastructure for facilitating the digitization of research in the humanities and social sciences funded
by the European Commission. The map’s URL is: https://rosetta.huma-num.fr/worldmap/index.html.
Through this map, users and scholars have the opportunity to gain insight into
the global circulation of the novel. The items on the map, compiled from the different inputs (see Section 10.4), display the title in the target language, the first
year of publication, the name of the translator, and the publisher, when available.
10.4 Method of Gathering Material
In this section, we look at how the corpus was gathered, starting with the UNESCO’s Index Translationum, a most reliable source to assess the scope of translations accumulated across the world for a given author.
The UNESCO’s Index Translationum quantifies the volume of translations
worldwide and breaks them down by author, particular title, language, and country. Mark Twain is ranked 15th in the top 20 of the most translated authors and
comes only after behemoths like Agatha Christie, Jules Verne, Alexandre Dumas,
and Conan Doyle. Like other prolific nineteenth-century writers, Mark Twain was
popular right from the start and continued to be translated in many languages
throughout the twentieth century and well into the twenty-first.
Adventures of Huckleberry Finn is only the second most commonly translated
of Mark Twain’s books, after The Adventures of Tom Sawyer. However, the former was early on deemed essential and central to the canon of American literature,
and there clearly is more interest among Mark Twain scholars in Huckleberry
Finn, in which the coming-of-age dimension cuts across generations, whereas
Tom Sawyer may look more childhood-bound.7
Although the bulk of the data was provided by UNESCO, an international institution, and Rodney, an individual bibliographer, there also were additions by individual Mark Twain scholars as well as through a crowdsourcing experiment that
mobilized the power of anonymous crowds.
Using the title in the target languages, we crawled the web and mined online
digital libraries and national archives in order to find the full texts. In some cases,
we came across full online versions that were in the public domain (provided by
public institutions), in which case we downloaded them, whatever their format.
When dealing with versions in PDF or EPUB format, we converted them into plain text
format that could later be processed. There were other instances when we knew of
an existing version but it was not readily available online. In that case, we turned
to the national libraries and archives and asked them if they were willing to collaborate with us by digitizing their printed versions.
In total, we collected 64 metadata records (the title, the language, the translator’s name,
the year of publication, and the publishing house) and 30 full text files. Volunteer
contributors and scholars provided us with 34 metadata records and 7 full text files. The
crowdsourcing provided us with 18 metadata records with their full text files, and five translations were collected by crawling the collections of different digital libraries. Due to
the significant number of existing translations and the growing number of digital
versions made available online, the crowdsourcing allowed us to gather data that
would have otherwise been beyond our reach. Crowdsourcing helped reduce the
amount of time spent on the task and increased the number of translations available as a basis for developing parallel corpora for under-resourced languages.
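A metadata record of the kind listed above could be modeled as follows. This is only a sketch: the class name `TranslationRecord`, its field names, and the placeholder values are hypothetical, and the optional fields reflect the fact that some details are shown only "when available."

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranslationRecord:
    """One translation of the novel; optional fields may be unknown."""
    title: str                        # title in the target language
    language: str
    year: Optional[int] = None        # first year of publication
    translator: Optional[str] = None
    publisher: Optional[str] = None

    def label(self) -> str:
        """Build a display label, skipping details that are not available."""
        head = f"{self.title} ({self.language})"
        extras = [str(v) for v in (self.year, self.translator, self.publisher)
                  if v is not None]
        return ", ".join([head] + extras)


# Placeholder values, not real bibliographic data.
record = TranslationRecord(title="Example title", language="xx", year=1900)
print(record.label())
```

Leaving unknown details as `None` rather than empty strings keeps "not available" distinct from "empty," which matters when records from several sources are merged.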
These efforts allowed the roster of translations of Huckleberry Finn to be
brought up to date. As of today, the novel exists in 64 languages: Afrikaans, Albanian, Alemannic, Arabic, Armenian, Assamese, Basque, Bengali, Bulgarian, Burmese, Catalan, Chinese, Chuvash, Croatian, Czech, Danish, Dutch, Esperanto,
Estonian, Finnish, French, German, Georgian, Greek, Hebrew, Hindi, Hungarian,
Icelandic, Indonesian, Italian, Japanese, Kazakh, Kirghiz, Korean, Latvian, Lithuanian, Macedonian, Malay, Malayalam, Marathi, Moldavian, Norwegian, Oriya,
Persian, Polish, Portuguese, Romanian, Russian, Serbian, Sinhalese, Slovak, Slovenian, Spanish, Swedish, Tamil, Tatar, Telugu, Thai, Turkish, Turkmen, Ukrainian, Uzbek, Vietnamese, and Yiddish.
10.5 Natural Language Processing and Alignment
Alignment serves the needs of both the NLP community and translation studies
scholars. While the interpretive nature of literary translations has caused a lag in
their adoption as a source for NLP development, multiple projects have developed parallel corpora based on literary texts, including the Harry Potter series and Le
Petit Prince (Stolz 2007). Nearly all alignment tools, however, are not designed for literary
texts, which are more challenging to align because the
entire corpus must be aligned and the alignment should be as reliable as possible
(Xu et al. 2015). Usually, parallel corpora focus on very specific and specialized
domains, which can be efficient but also show limitations for machine translation.
Figure 10.1 Excerpt of the Translation Dashboard for Basque, Bulgarian, Dutch, Finnish,
German, Hungarian, Polish, Portuguese, Russian, and Ukrainian (first five
chapters).
The advantage of using a work of fiction such as Huckleberry Finn is that it
uses a very broad vocabulary linked to everyday life, which makes it a valuable
asset for those languages that are currently lacking computational resources (see
Section 10.6).
The paragraph and word alignment algorithms used in the project were modest compared to the current cutting-edge of NLP, but the tool was nonetheless
significant as an example of how to apply “just enough” NLP in the service of
research priorities shaped by another field, such as library and information science or translation studies.
For example, chapter alignment quickly gives an idea of how complete the version under scrutiny might be. Anything short of the original 43 chapters indicates
incompleteness or an adaptation of sorts. So a number of versions matching the
chapter count were gathered and first aligned by chapter. The next step was to create a tool that would allow a quick overview of how the paragraph count of each
chapter matches the original – that’s how the Translation Dashboard came about
(Figure 10.1):
This Translation Dashboard reads as follows (starting in the top left corner):
Chapter 1 of the English version (en) has 9 paragraphs, and the Basque (ba) version has 10, the Bulgarian (bu) 9, the Dutch (du) 9, and so on. The per-chapter
paragraph count is based on new lines and white space. It meets the needs of
translation studies scholars by creating a convenient environment enabling them
to conduct parallel readings and visualizations. Scholars can easily see patterns of
structural convergence and divergence between the source text and translations, at
different levels of granularity. A translation studies scholar in the literary tradition
may use this information to select chapters for a close reading analysis.
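A per-chapter paragraph count of this kind can be sketched as follows. The sketch assumes that paragraphs are separated by blank lines, which is one reading of "new lines and white space"; the function names are hypothetical and the project's actual tool may differ.

```python
import re

ORIGINAL_CHAPTERS = 43  # chapter count of the English original, as stated above


def count_paragraphs(chapter_text: str) -> int:
    """Count text blocks separated by one or more blank (whitespace-only) lines."""
    blocks = re.split(r"\n\s*\n", chapter_text.strip())
    return sum(1 for block in blocks if block.strip())


def dashboard_row(chapters: list[str]) -> list[int]:
    """Paragraph counts for every chapter of one version (one dashboard row)."""
    return [count_paragraphs(chapter) for chapter in chapters]


def looks_complete(chapters: list[str]) -> bool:
    """A version with fewer or more chapters than the original needs a closer look."""
    return len(chapters) == ORIGINAL_CHAPTERS


sample_chapter = "First paragraph.\n\nSecond paragraph,\nstill the second.\n \nThird."
print(count_paragraphs(sample_chapter))
```

A single newline inside a paragraph is not treated as a break here, so hard-wrapped text is counted correctly, whereas a source file that uses single newlines between paragraphs would need a different splitting rule.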
Figure 10.2 Example of paragraph alignment for the Basque version
To get paragraph alignments in each chapter, chapters were divided into three
major categories based on the differences in their paragraph counts compared to
the original English version: exact-match, large-difference, and small-difference.
Different paragraph aligners may apply to different categories. An exceedingly
high divergence from the source paragraph count alerts the scholar that there may
be data-cleaning issues (e.g., one instance where each line in a poem embedded in
a narrative was treated as a new paragraph), but a moderate divergence can reflect
the translator’s deliberate stylistic choices about how the flow of the narrative
should be rendered. The degree of convergence and divergence also determines
the color variation in the heat map of the Translation Dashboard (Figure 10.1): the
darker, the greater the divergence in paragraph numbers.
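The three categories and the heat-map shading can be illustrated with a short sketch. The cut-off between small and large differences is not given in the text, so the `small_max` parameter below is an assumed value, as are the function names and the use of relative divergence for shading.

```python
def classify_chapter(source_count: int, target_count: int, small_max: int = 3) -> str:
    """Assign a chapter to one of the three categories by paragraph-count difference.

    small_max is a hypothetical threshold chosen for illustration only.
    """
    diff = abs(target_count - source_count)
    if diff == 0:
        return "exact-match"
    if diff <= small_max:
        return "small-difference"
    return "large-difference"


def divergence(source_count: int, target_count: int) -> float:
    """Relative divergence from the source count; larger values map to darker cells."""
    if source_count <= 0:
        raise ValueError("source_count must be positive")
    return abs(target_count - source_count) / source_count


# Chapter 1 counts quoted in the text: English 9, Basque 10, Bulgarian 9.
print(classify_chapter(9, 10))
print(classify_chapter(9, 9))
```

With the assumed threshold, the Basque Chapter 1 falls into the small-difference category, which agrees with how that chapter is described in the text.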
For exact-match chapters, the hypothesis is that their paragraphs were translated
one-to-one. No further paragraph alignment methods are needed. This hypothesis
has been confirmed for most of the exact-match cases by a human validation experiment. Large-difference cases are normally caused by different ways of splitting quotations. Thus, we provide a text pre-processing option before paragraph alignment
when long quotations have been found under large-difference cases (Fraisse et al.
2019, 303). This pre-processing option splits quotations into paragraphs according
to the same standard in all translations. Experiments have shown that it can significantly reduce differences in paragraph counts and sometimes move a chapter from
the large-difference category to the small-difference category. Figures 10.2 and 10.3
show examples of paragraph alignment based on Chapter 1 of the Basque version
and Chapter 43 of the Bulgarian one, which are small-difference chapters:
Deeper analysis for aligned paragraphs. An expected finding during the human
validation stage is that some aligned paragraphs, even though correctly aligned,
still contain some words or sentences that have not been translated. Further processing with sentence alignment, word alignment, text similarity, and text summarization can support a deeper analysis.
User-uploaded corpora support. As the NLP algorithms we implemented are
not bound to the novel Huckleberry Finn, the option for users to upload other
parallel corpora that interest them, and visualize them in the same way, could be
added.
Figure 10.3 Example of paragraph alignment for the Bulgarian version.
10.6 Under-Resourced Defined
A large but still modest number of languages, close to a hundred, have the so-called
Basic Language Resource Kit (BLARK): monolingual and bilingual corpora,
machine-readable dictionaries, thesauri, part-of-speech taggers, morphological analyzers, parsers, etc. (Krauwer 2003; Arppe et al. 2016). This means that, as mentioned by Scannell (2007), over 98% of the world’s languages lack most, and usually all,
of these language resources. Consequently, these languages and, subsequently, the
knowledge encoded in these languages are threatened and their preservation is at risk.
Digital language resources can help prevent the disappearance of diverse knowledge
systems, ensure their preservation and transmission, and foster their cross-fertilization.
Even for well-endowed languages, parallel corpora, a valuable resource for
sustaining linguistic diversity, are rare despite the great need there is for them.
Such corpora are often used for testing new tools and methods to develop under-resourced languages. Because translated texts are de facto parallel corpora, and
because we know for a fact that translated language materials do exist in under-resourced languages, using these translations can help build cheap, efficient, and
reliable corpora for the purpose of preserving under-resourced languages. Those
translations are mostly available in print and still awaiting digitization. They are
all the more precious because, when translation does occur, it is currently into
commercially dominant languages.
The roster of translations of Huckleberry Finn includes 22 under-resourced languages: Assamese, Basque, Bengali, Burmese, Catalan, Chuvash, Hindi, Indonesian, Kazakh, Kirghiz, Malayalam, Marathi, Moldovan, Oriya, Persian, Sinhalese,
Tamil, Tatar, Telugu, Turkmen, Uzbek, and Yiddish. In many of these languages,
there have been multiple translations over time, reflecting different moments in
history, and different ideological perspectives on the part of the translators or publishers, as well as different attitudes toward the United States, childhood, minorities and minority dialects, and race and racism.
Texts gathered through Global Huck and Rosetta were used to create a parallel
corpus containing translations of Huckleberry Finn as a basis for developing NLP
resources for under-resourced and endangered languages (Fraisse et al. 2018), as
shown in Section 10.4. The Slavic (Bulgarian, Polish, Russian, Ukrainian) and
Finno-Ugric (Hungarian and Finnish) translations have served as the initial datasets for testing the project’s text alignment algorithms.
Before we move on to some results of the fine-grained literary analysis, it is
important to underline that a comprehensive approach to translated texts cannot
be limited to collecting and aligning the versions. Translations are more often
than not accompanied by scholarship. Collecting that scholarship in different languages helps gain insight into local strategies and ways of handling the text so as
to get a global image and perceive patterns.
In the following section, observations are made as to the chronology of those
studies as they appear to be fairly recent and increasing in number. The ascent
of translation studies as a self-standing discipline is not solely accountable for
this surge in interest. Increasing attention paid to race relations and the issue of
dialect translation make Huckleberry Finn an alluring mix for translation studies
scholars. Most of the scholarship was published in the twenty-first century, and
the bulk came after 2010.
10.7 The Scholarship
A consistent wave of scholarship, too numerous to be detailed within the scope of
this chapter, swelled in the late 2000s, gained traction in the 2010s, and recently
culminated in a special forum of the Journal of Transnational American Studies,
the first time the entirety of a journal issue was devoted to translations of Huckleberry Finn (Fishkin et al. 2021).
Scholarship on translations of Huckleberry Finn covers languages spoken on
five continents, although the greater part originates from Asia and Europe. Some
studies stem from languages with but few speakers on the world stage, such as
Czech, Slovenian, or Tatar, while others encompass large segments of the world
population, whether because they tend to be international, like Arabic, French, or
Spanish, or because they have a large population of speakers, and therefore wide
readerships at home, such as Chinese, German, Hindi, Indonesian, Persian, and
Vietnamese. Added up, these populations amount to a large portion of the world,
and those studies offer a vantage point into the readers’ experience of Huckleberry
Finn globally. China and Chinese-speaking scholars stand out as the single biggest providers of scholarship.
The scholarly writings emanate from fledgling as well as confirmed scholars,
and they come in all shapes and sizes – journal articles, book chapters, and book-length studies. Studies vary in scope, ranging from the examination of merely
a few sentences or a single chapter to thorough and painstaking surveys complete with figures and statistics. Most of the scholarship is made of what could be
dubbed single-language studies since they focus on translation into one language.
Only a handful of studies provide a more international angle by looking at translations into multiple languages.
Some studies are blind to the context in which the translations were produced;
others are extremely concerned with the geographical, historical, sociological,
and geopolitical contexts that make up the translations’ backdrop. Very rarely do
they dwell on the identity and profile of the translators themselves, or the publishing houses, except when they are otherwise famous. While they have varying methodologies and different goals, the studies from foreign shores rely on a
shared pool of authorities originating in the United States on Twain and are well
grounded in the novel’s reception and progress in its homeland.
Analysis of what actually happened to the translated texts is exemplified by
the recent publication of the special forum “Global Huck: Mapping the Cultural
Work of Translations of Mark Twain’s Adventures of Huckleberry Finn” in Journal of Transnational American Studies, 12(2) (Fishkin et al. 2021). Within the
scope of the present study, we will dwell on one particular aspect: the presence of
dialects in Huckleberry Finn and how they were dealt with by translators around
the world.
10.8 Fine-Grained Textual Analysis: The Dialects Go Global
One major theoretical objection to the translation of Huckleberry Finn often
put forward by American Studies scholars is the presence of dialects, which are
reputedly untranslatable. Indeed, considering that the dialect-for-dialect strategy was adopted in only a handful of translations, the odds seem to be against
this approach.8 One underlying assumption against using dialects seems to be
that they are tied to specific territories in ways that national languages are not.
This is paradoxical because reading a translation implies, just as is the case for
fiction, a suspension of disbelief, a process through which readers acknowledge having access to a foreign reality in their own tongue. Dialects in translations somehow disrupt this process, and suspension of disbelief works best
when national languages are at play. This most probably originates in the way
national idioms were shaped in opposition to regional dialects, notwithstanding
their linguistic ties.
Amid the intricate web of dialects claimed by the author of Huckleberry Finn,
the structural opposition between Black and White voices is explored in this section to show the benefits of a multilingual approach. Culturally motivated and
textually observable, the divide between Black and White voices is convenient
to handle. Observers and critics of Huckleberry Finn agree that a clear distinction between Black and White voices was carefully crafted by the author and that
Black voices coincide, from a linguistic point of view, with African American
Vernacular English, sometimes referred to as Black English. The translation of
African American voices in the novel, which is at once rife with technical and
ethical questions, has come under increasing scrutiny since the late 2000s.9
To circumvent the dialect challenges and the strong belief that rendering dialect
for dialect is bound to fail, translators worldwide turned to registers. Registers
can conveniently be broken down into three universally shared categories: low,
standard, and high. The low register, which is distinct from broken language,
appears to be confined to the Chinese language, but further research might show that
other languages have developed it. Standardization is the most widespread strategy, while high register is marginal yet noteworthy.
In mainland China, some translators used the specificities of their language
regarding registers and tones to implement successful strategies. In a 1956 version
that has since been canonized, “[t]he translators tap into three register varieties
available to them in Chinese: the vulgar, the colloquial and the standard formal
varieties” (Jing 2017, 60). The standard formal variety is only used for posters, letters, and inscriptions, while all White characters share the same casual
colloquial tone. As a result, Huckleberry, a “marginal social outcast,” morphs into
“a legitimate member of that society in full command of its standard oral code.”
While Huckleberry’s difference is erased, Black voices are rooted in the vulgar
register, which, in the context of socialist China, has positive connotations: “Jim
represents the oppressed on the bottom rung of the social ladder,” and he is “given
the working-class voice” (Jing 2017, 60–63).10
In the same low-register vein, other translators played on the discrepancy
between oral and written Mandarin Chinese and its different tones for spoken
words in order to offer a parallel to Black characters’ mispronunciations, otherwise known as malapropisms – although it should be remembered that malapropisms
are shared by many characters in the novel. Such is the case with a 1989 translation, whose use of the wrong tone results in shifts of meaning that “captured the
spirit of Jim’s speech and mannerisms” (Lai-Henderson 2015, 114).
These strategies can be deemed successful because through vulgar register or
tonal malapropisms, slaves come across as unschooled and illiterate, fitting their
social status, but they retain strong reasoning abilities. They stand out as humane
and likeable characters that readers can relate to.
Not all Chinese versions explored this path, and some resorted to another play
on register and what is the most commonly shared strategy overall: standardization. In this case, although Jim, the main Black person in the novel, still is a
full-blown character, his speech is “translated into normal spoken Chinese, just as
Huck’s is,” and “it is hard to distinguish his vernacular or his low social status”
(ibid., 117).
Standardization and homogenization – which result in all characters speaking
alike – are the dominant strategies. Considering the economy of the original text
and Black characters’ outstanding identifiable voices, this is mostly considered
another case of untranslatability, as stated in a great number of studies, of which
those on Vietnamese and Indonesian provide good examples. This is what one scholar has to
say about a Vietnamese translation:
Jim’s distinctive and nonstandard voice does not come through at all.
Although it is clear that finding an adequate Vietnamese dialectal equivalent
for Jim’s voice is a difficult task. The task is even more challenging due to
the fact that there is no perfect equivalent of Black and White race relations
in the Vietnamese speaking world.
(Hang 2019, 53)
Likewise, a study of the 2012 Indonesian version states that standard Indonesian, which is “spoken mostly by the more educated members of society,” sounds
“unnatural for daily conversation” and eliminates “the impression that the speaker
comes from a lower class” (Dewi et al. 2018, 378).
Standardization could easily be perceived as a lesser effort and a recognition
of the shortcomings of translation and its inability to produce a viable equivalent
of the original. This is mainly due to the fact that most translators are silent about
their work and there is a deficiency of discourse on their part. In the Russian tradition, however, far from being a silent process, standardization has been verbalized, conceptualized, and valued as a strategy, most certainly speaking for many
translators around the world who have left little trace of their strategy other than
in the occasional foreword, preface, or footnote.
Famed Russian translator and theorist Chukovsky, who discussed Daruzes’s
landmark translation of Huckleberry Finn and would later translate the novel himself, invoked a long-standing tradition of “masters of translation who have completely refused to reproduce colloquial speech in translation” (Marinova 2021,
66). Acknowledging the untranslatability of dialects, they have embraced standardization, also called “blandscript,” as a way of conveying the true essence of
characters in the novel, including Jim:
Rather than seeing it as a deficiency, blandscript for him [Chukovsky] became
the best possible answer to the problem of reencoding different regional
dialects and registers in Russian. In Chukovsky’s professional assessment,
Daruzes’s translation of Huck Finn with its “most pure, correct, neutral language, without straining after any dialects,” offered an admirable example of
successful reaccentuation of the original.
(Marinova 2021, 127)
Russian translators’ writings are precious testimonies insofar as they enable us to
consider standardization not as a failure but as a thoroughly thought-about strategy. Not translating dialects is considered no impediment to a feeling of relatedness to Jim and other dialect speakers, which was judged attainable through
standardization:
To put it differently, it was precisely because of the erasure of the original
dialects that Soviet readers thought they could understand Huck and Jim so
well. By stripping the cumbersome linguistic layer of difference, translators
allowed Russian boys and girls the opportunity to access the true essence of
the fictional heroes.
(Marinova 2021, 127)
Incidentally, in his assessment of Daruzes’s translation, Chukovsky praised the
fact that its language resembled Turgenev’s style of natural descriptions more
than Twain’s. This reference to a writer of renown leads us to what could be
termed the high-register strategy.
A high-register strategy consists in considerably enhancing Black voices and
translating their dialect into literary language. This strategy is akin to standardization but takes it a notch higher. It will be found mostly in national literatures that
had little to no tradition of written representation of oral speech at their disposal,
since, it should be remembered, Huckleberry Finn was a pioneering venture from that point
of view. Persian and Arabic are two examples of literatures with strong written
traditions.
To take the example of the first version in Iran by Golestan (1949), even though
the translations back into English provided by scholars still make Jim’s speech
sound incredibly enhanced, the recourse to colloquial language in this translation
made a lasting impression and is considered a “significant step in the history of
Persian translation” (Fomeshi 2021, 30).
In Arabic, the same high-register strategy can be observed in the first version
by Egyptian translator Naseem (1958). He used a high register and “polished the
slave’s dialect into Classical Arabic, occasionally using Quranic terms” (Abdulmalik 2016, 43).
Although a proper rendition of dialects would certainly help dramatize the
novel in even greater proportions, it is striking that the novel has been praised for
its universal dimension and generations have enjoyed reading it without direct
access to the colorful and pithy language of the original. This is a reminder that
a strictly linguistic appraisal of translated texts has limitations and that deeper
mechanisms are at play. The intensity of narrated events, the falsely naive tone
of some characters, the exposure of universal human traits, whether downfalls
or outstanding qualities, came across sufficiently unscathed for the reading to be
enjoyed.
10.9 Conclusion
To summarize, an across-the-board corpus-based approach needs an author who
ranks high in the Index Translationum, whose international career has been the
subject of some preliminary bibliographic work and who is studied by a community bent on discovering how their writings reverberated throughout the world.
Once a well-traveled text or author has been identified, retrieval of the texts can
begin with a view to aligning them from chapter to paragraph. Building such comparable corpora can help salvage endangered languages by strengthening available linguistic material. The collected translations help build multilingual parallel
corpora and highlight the transnational circulation of knowledge.
The world’s cultural heritage is largely built by sharing texts through translations, which are locally impacted as they are both written and read in a specific
context and culture. Comparable corpora allow scholars to explore the social,
cultural, and political agendas of translators and publishers and look at how the
cultural demands of readers shaped the book’s translation, fostering fine-grained
literary analysis of what actually happened within the translated texts.
Translators and translation studies scholars have been largely left to their
own national devices, and a multilingual approach can abolish the seclusion of
individual languages and cultures. Comparable corpora gesture to a fascinating
array of perspectives on a series of issues, continuities, and divides that break
through the monolingual and nation-bound silos that usually constrain literary and
translation studies.
Notes
1 This sequential description is partially hypothetical and broken down for the sake of
demonstration. In actuality, experts from the different fields involved work hand in
hand on different tasks at different stages of the project in a dynamic and feedback-based fashion.
2 We tend to think of Huckleberry Finn (1885), in the wake of Ernest Hemingway and
many others after him, as the foundation of American literature, so there is a tendency
to overlook the fact that Huckleberry Finn was the sequel to, if not the companion
piece of, The Adventures of Tom Sawyer (1876).
3 The Cold War period was characterized by deep ideological divides that, in many
cases, overrode the commercial aspect and took precedence over it. As early as 1929,
an official list of about 800 books – political manifestoes or essays, along with literary
works – was issued by the Comintern in Moscow and sent to all the affiliated publishing houses around the world. The result was a flurry of translations in languages like
Kirghiz, Chuvash, or even Turkmen.
4 As of 1982, Huckleberry Finn alone had been translated into 53 languages, in 47 countries, for a total of 841 editions.
5 Originally published in Chinese, Danish, French, German, Italian, Japanese, Russian,
Spanish, and Yiddish.
6 The collaboration of Fishkin in English and American studies at Stanford and Amel
Fraisse in information sciences at Université de Lille and Ronald Jenn in translation
studies at the same institution received support from MESHS in Lille.
7 Our findings in this project indicate that Huckleberry Finn is catching up with Tom
Sawyer, not so much in terms of quantity, but definitely in terms of quality, as it is
gradually acquiring the same canonical status abroad it enjoys at home, an observation based on the growing number of scholarly annotated editions complete with the
original illustrations. The most recent example is O’Shea’s 2019 translation, which
allows Brazil to join the ranks of countries with scholarly translations, complete with
elaborate notes and the original Kemble illustrations. O’Shea authored close to 150
notes, an introduction, and a note on the translation. Mark Twain and Huckleberry Finn
are inching their way toward full literary recognition one country at a time.
8 It was implemented in an experimental manner in the Atlantic world, where comparable ethnic, cultural, and historical contexts matching the original could be found.
9 This section is the result of observation of strategies in 18 languages: Arabic, Chinese, Czech, French, Hindi, Indonesian, Nordic (Danish, Norwegian, Swedish), Persian, Portuguese, Russian, Spanish, Slovenian, Tatar, Ukrainian, Vietnamese. For
obvious reasons linked to limitations in the command of all the languages at play,
this section relies exclusively on scholarship on the translation of Jim’s speech written in English. Those studies that proposed translations back into English (and not all
of them did) proved most helpful to evaluate the shifts in register for an international
readership.
10 The emphasis on the colloquial and the omission of the standard variety reflect the
newly constructed power relations in socialist China in the 1950s, where “[t]he
majority of the population were illiterate workers and peasants who had been regarded
as the proud owners and leading class of the country since 1949, when the People’s
Republic of China was founded” (Jing 2017, 63). See also Lai-Henderson (2015, 112).
References
Abdulmalik, Mariam. 2016. “Adventures of Huckleberry Finn in Arabic Translations:
A Case Study.” PhD diss., Binghamton University.
Arppe, Antti, Jordan Lachler, Trond Trosterud, Lene Antonsen, and Sjur Moshagen. 2016.
“Basic Language Resource Kits for Endangered Languages: A Case Study of Plains
Cree.” Proceedings of the 2nd Workshop on Collaboration and Computing for Under-Resourced Languages Workshop (CCURL 2016), 1–8.
Dewi, Ida Kusuma, M.R. Nababan, Riyadi Santosa, and Djatmika. 2018. “The Characters’
Background in the African-American English Dialect of The Adventures of Huckleberry
Finn: Should the Translation Retain It?” Journal of Social Studies Education Research
9, no. 4: 382–402.
Fishkin, Shelley Fisher, ed. 2010. The Mark Twain Anthology: Great Writers on His Life
and Work. New York: Library of America.
Fishkin, Shelley Fisher. 2011. “ ‘Deep Maps’: A Brief for Digital Palimpsest Mapping Projects (DPMPs, or “Deep Maps”).” Journal of Transnational American Studies 3, no. 2.
https://escholarship.org/uc/item/92v100t0. Accessed 11 April 2022.
Fishkin, Shelley Fisher, Tsuyoshi Ishihara, Ronald Jenn, Holger Kersten, and Selina Lai-Henderson. 2021. Special Forum Global Huck: Mapping the Cultural Work of Translations of Mark Twain’s Adventures of Huckleberry Finn. Journal of Transnational
American Studies 12, no. 2. https://doi.org/10.5070/T812255976.
Fomeshi, Benham. 2021. “Persian Huck: On the Reception of Huckleberry Finn in
Iran.” Journal of Transnational American Studies 12, no. 2: 27–45.
Fraisse, Amel, Quoc-Tan Tran, Ronald Jenn, Patrick Paroubek, and Shelley Fishkin. 2018.
“TransLiTex: A Parallel Corpus of Translated Literary Texts.” In Eleventh International
Conference on Language Resources and Evaluation (LREC 2018). Miyazaki: Beijing
Advanced Innovation Center for Language Resources.
Fraisse, Amel, Zheng Zhang, Alex Zhai, Ronald Jenn, Shelley Fisher Fishkin, Pierre
Zweigenbaum, Laurence Favier, and Widad Mustafa El Hadi. 2019. “A Sustainable and
Open Access Knowledge Organization Model to Preserve Cultural Heritage and Language Diversity.” Information 10, no. 10: 303.
Hang, Hoang Thi Diem. 2019. “An Assessment of the Vietnamese Translation of The
Adventures of Huckleberry Finn Chapter XX Using House’s Translation Quality Assessment Model.” VNU Journal of Foreign Studies 35, no. 1: 35–54.
Ishihara, Tsuyoshi. 2011. Mark Twain in Japan: The Cultural Reception of an American
Icon. Columbia, MO: University of Missouri Press.
Jing, Yu. 2017. “Translating ‘Others’ as ‘Us’ in Huckleberry Finn: Dialect, Register and
the Heterogeneity of Standard Language.” Language and Literature 26, no. 1: 54–65.
Krauwer, Steven. 2003. “The Basic Language Resource Kit (BLARK) as the First Milestone for the Language Resources Roadmap.” Proceedings of the International Workshop Speech and Computer, Moscow, Russia.
Lai-Henderson, Selina. 2015. Mark Twain in China. Stanford, CA: Stanford University
Press.
Marinova, Magdalena. 2021. “Huck Finn’s Adventures in the Land of the Soviet People.”
Journal of Transnational American Studies 12, no. 2: 119–47.
Rodney, Robert M. ed. and comp. 1982. Mark Twain International – A Bibliography and
Interpretation of His Worldwide Popularity. Westport, CT: Greenwood Press.
Scannell, Kevin. 2007. “The Crubadan Project: Corpus Building for Under-resourced Languages. Building and Exploring Web Corpora.” In Proceedings of the 3rd Web as Corpus
Workshop, 5–15. Belgium: European Language Resources Association.
Stolz, Thomas. 2007. “Harry Potter Meets Le petit prince – On the Usefulness of Parallel
Corpora in Crosslinguistic Investigations.” Language Typology and Universals 60, no.
2: 100–17.
Xu, Yong, Aurélien Max, and François Yvon. 2015. “Sentence Alignment for Literary
Texts.” Linguistic Issues in Language Technology 12: 1–25. Stanford, CA: CSLI
Publications.
11 Are Translated Chinese Wuxia Fiction and Western Heroic Literature Similar?
A Stylometric Analysis Based on Stylistic Panoramas1
Kan Wu and Dechao Li
11.1 Introduction
Wuxia, or Chinese martial arts fiction, is a traditional genre of Eastern heroic
literature that originated from unique historical and cultural contexts during China’s Warring States Period (475–221 BC) (Huang 2018, 152). Previous research
on Wuxia (Flannery 2012; Vander Elst 2017; Keulemans 2020) has attempted to
compare this type of Chinese heroic literature with Western chivalric stories and
heroic fantasies – two subgenres of heroic literature deriving from a medieval
background (Honegger 2010, 61). Those studies have demonstrated that Wuxia
could be very different from the two Western subgenres in terms of cultural values, religious belief, and above all, worldviews, even though they share the heroic
theme. Wu and Li, however, discovered that when readers read a Wuxia translation,
they sometimes experience a déjà vu–like reminder of chivalric stories or heroic
fantasies (Wu and Li 2018, 102–3). This raises the question as to whether there are
any possible stylistic connections between heroic literature in the East and that in
the West. An examination of such stylistic connections may give us clues about
the current reception of translated Wuxia and is hence our first research objective.
To conduct the investigation, we turn to Stylometry – the statistical analysis of
literary styles (Holmes 1998, 111) – for methodological support.
Existing stylometric research on (translated) texts of varied genres has employed
a number of stylistic indices at such linguistic levels as characters (Daelemans
2013; Eder et al. 2016), words/lexes (Jones and Nulty 2019; Melka and Místecký
2020), n-grams/clusters (Mastropierro 2018; Valencia et al. 2019), sentences and
paragraphs (Rong et al. 2006), tones and rimes (Hou and Huang 2020), and their
combinations (Brocardo et al. 2014; Liu and Xiao 2020). These studies have
unveiled the features of (translated) texts from multiple stylometric perspectives
and revealed their most noticeable stylistic features. Nevertheless, they are not
without methodological limitations. For one thing, they lack a panoramic view
of different stylistic features in a text; for another, they do not always explain the
selection criteria for the stylistic indices to be investigated. We believe that both
the adoption of a panoramic view and the justification for selection criteria are
vital in stylometric analyses because the former makes results more comparable
DOI: 10.4324/9781003298328-12
across research and the latter reveals intended functions associated with the chosen indices. Therefore, the second objective of the present study is to introduce the
stylistic panorama, a novel concept proposed to describe the stylistic profile of a
(translated) text in a relatively holistic and functional way.
To achieve the two research objectives, we formulate the following research
questions:
RQ1: To what extent is translated Wuxia fiction similar to and/or different
from Western chivalric stories and heroic fantasies, from a stylometric
perspective?
RQ2: In what ways could such similarities and/or differences reveal current
reception of English translations of Wuxia?
RQ3: How do the findings shed light on the use of stylistic panoramas in stylometric analyses?
Whereas an investigation of RQ1 is expected to contribute to the hypothesized
stylistic connection(s) between translated Wuxia and the two subgenres of Western
heroic literature, RQ2 contributes to practical outcomes for target readers of translated Wuxia and RQ3 explores theoretical implications for stylometric research into
literary translation, specifically Chinese-to-English translation of Wuxia fiction.
11.2 Stylistic Panorama as a Stylistic Profile of a
(Translated) Text
A stylistic panorama is defined in this research as a relatively complete stylistic profile that is based on a set of functionally related stylistic indices at multiple linguistic
levels and is intended to satisfy certain research purpose(s). The theoretical significance of the concept can be observed in both stylometric analyses and translation
studies. In stylometric analyses, the stylistic panorama emphasizes the combined
efforts of varied stylistic features when exploring a (translated) text. For translation
studies, the panorama values the stylometric approach in its methodological design.
In other words, the concept can be a bridge connecting empirical translation studies with stylometric analyses: it lends viable methodological support to empirical
translation studies, enriching the research scope of stylometric analyses.
To generate the stylistic panorama of a text, the first step is to choose proper
stylistic indices. Such a choice is a varied process that depends largely on research
needs. Considering that the research aim here is to conduct an overall stylistic
comparison between the translated Wuxia and the two Western subgenres of heroic
literature, we tend to select stylistic indices that could cover wide linguistic levels (words, word sequences, sentences, paragraphs, etc.) to capture basic features
of this type of literature. Hence, the indices we choose for this research are the
average word length (AWL), the dispersion of word lengths (DWL), the moving-average type-token ratio (MATTR), the verb-adjective ratio (VAR), the average
sentence length (ASL), the dispersion of sentence lengths (DSL), the average paragraph length (APL), the most frequent words (MFWs), and the most frequent word
sequences (MFWSs). Next, we perform multivariate analyses on those indices to
produce interpretable stylistic panoramas. This is because multivariate analyses
are powerful enough to account for holistic stylistic similarities between the translated Wuxia and the two Western subgenres, as well as being amenable to varied
sample sizes. Finally, for easier investigation, we categorize the stylistic panoramas according to the functions of the selected indices. Hence, two types of stylistic
panoramas emerge in this work: one based on formal indices, and the other based
on word and word sequence frequencies (MFWs and MFWSs, respectively).
The stylistic panorama based on formal indices describes formal features of a text.
It is founded on such indices as AWL, DWL, MATTR, VAR, ASL, DSL, and APL,
which are connected to comparatively small sample sizes (ca. 126 in total) in this
research. At the word level, the AWL and DWL are selected to show the orthographical complexity of the heroic texts, and the MATTR and VAR are chosen to demonstrate the lexical richness of these texts. This ensures a breadth of feature types under
classification. It is worth stressing that, out of many possible indices depicting lexical richness, we prefer the MATTR and VAR for practical reasons. The MATTR is
favored because “it takes into account all possible segmentation of the text” (Březina
2018, 58) and is thus believed to capture features of vocabulary richness of a heroic
text. The VAR is adopted because it reflects the lexical richness of the heroic text – a
genre that is likely to contain a multitude of verbs and adjectives depicting kung fu
fighting scenes (Wu and Li 2018, 102). For sentences and paragraphs, the ASL and
APL are chosen to partly reveal the typological complexity, and above all, the readability, of the heroic texts, which we believe have certain connections to the reception
of translated Wuxia. In addition, we select the DSL because computations of sentence
dispersion can unveil the rhythm and likewise the readability of a heroic text: a lower
dispersion value suggests a higher level of repetitiveness in a text, and vice versa.
The stylistic panorama built on MFWs/MFWSs associates with lexes and their
sequences in a text. In this study, the two stylistic indices are related to larger
sample sizes (ca. 3,000 in total). The MFWs index is adopted because it has a long
tradition of being an efficient classifier to distinguish stylistic features of one text
from those of another (Burrows 2002). Meanwhile, word sequences (or n-grams,
the contiguous sequences of n items of words within a given text or speech) have
been widely applied in computational linguistics and information sciences for
predictive or attributive purposes (Broder et al. 1997, 1157). In some quantitative
linguistic/translation research (Rybicki 2012; Mastropierro 2018), frequency patterns of the MFWs and/or the MFWSs have been compared in an effort to assess
their overall similarities and/or differences between texts. The MFWs and the
MFWSs are often determined by the proportion of each word or word sequence
in a text and are presented in parallel lists. The present study confines the scope
of MFWSs to 2-grams and 3-grams, because they are the most common word
sequences in (translated) literary texts (see Burrows 2002; Rybicki 2012; Mastropierro 2018). The choice of 2- and/or 3-grams as the MFWSs is further justified by the possibility that heroic literature is likely to contain short
phrases (of two to three words) that depict the quick action and short dialogue
sequences of fighting scenes in the stories (Wu and Li 2018, 97).
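The way MFWs and MFWSs are expressed as proportions can be illustrated with a minimal Python sketch. It is not the Intelligent Archive procedure used in the study; the function names `tokenize` and `ngram_proportions`, the simple regular-expression tokenizer, and the sample sentence are assumptions made for illustration. Contractions are kept as single tokens, in line with the treatment described later in the chapter.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and keep contractions such as "i've" as one token."""
    # Assumption: straight apostrophes only; curly quotes would need mapping first.
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def ngram_proportions(tokens, n=1, top=5):
    """Return the `top` most frequent n-grams with their share of all n-grams."""
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    total = len(grams)
    return [(gram, count / total) for gram, count in Counter(grams).most_common(top)]


# Invented sample text, not taken from any of the selected works.
sample = "He drew his sword. He drew back, and the sword fell from his hand."
tokens = tokenize(sample)
mfws = ngram_proportions(tokens, n=1, top=3)      # most frequent words
bigrams = ngram_proportions(tokens, n=2, top=3)   # most frequent 2-grams
```

Using proportions rather than raw counts keeps works of different lengths comparable, which is why each n-gram count is divided by the total number of n-grams in the text.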
Overall, the stylistic panorama is a concept that attempts to bind selected stylistic indices in texts together in a relatively holistic and functional way. In other
words, when measuring the stylistic features of a text, the indices are no longer
examined in isolation but are instead measured in a more comprehensive and
interrelated way in line with specific research needs. For this work, because one
objective is to explore potential stylistic connection(s) between the heroic literature of the East and that of the West, the selected stylistic indices and their resulting panoramas are expected to meet this goal.
11.3 Data and Methodology
11.3.1 Data and Corpora
The Wuxia novels (note 2) used in this study are the English translations of six different
works by Louis Cha, a renowned Hong Kong Wuxia novelist. We choose those
six works because they are the only English Wuxia translations published at the
time of writing. The novels are translated by five translators who are experienced
in rendering Chinese Wuxia fiction into English: Minford, Earnshaw, and Mok
are sinologists dedicated to Chinese literary translation, whereas Holmwood and
Chang are new generation translators who are interested in the dissemination of
Chinese Wuxia overseas (Wu and Li 2018, 95).
For the Western heroic literature, the selected works (note 3) are chivalric stories
translated into modern English and heroic fantasies written in modern English.
We choose these works for a balance between comparability and representativeness. First, the choice of works in modern English is meant to ensure linguistic comparability across the subgenres and, as a consequence, to help locate
the style of the Wuxia translations relative to them. Second, we
expect that the great popularity (based on Amazon/Goodreads ratings) and diverse
source languages (i.e., English, German, Spanish) of the selected translated
works will increase the representativeness of the two subgenres of Western
heroic literature.
Details of all the works used in this research are summarized in Tables 11.1
through 11.3, including the publication years of the versions used in
the study. Comparability is additionally enhanced through similar token sizes (ca.
1.3 million) in texts across the subgenres. Representativeness is further captured
through the inclusion of both earlier works (before the 1850s) and modern collections (the 1850s or later) for the chivalric stories.

Table 11.1 Details of the Translated Wuxia Stories

| Translated Wuxia Stories | Year | Translator | Token Size |
|---|---|---|---|
| Fox Volant of the Snowy Mountain | 1993 | Olivia Mok | 120,613 |
| The Book and Sword | 2004 | Graham Earnshaw | 192,439 |
| The Deer and the Cauldron | 1997 | John Minford | 617,949 |
| A Hero Born | 2018 | Anna Holmwood | 127,123 |
| A Bond Undone | 2019 | Gigi Chang | 160,851 |
| A Snake Lies Waiting | 2020 | Gigi Chang | 140,539 |
| Total Size | | | 1,359,514 |

Table 11.2 Details of the Chivalric Stories

| Chivalric Stories | Year | Author/Editor | Token Size |
|---|---|---|---|
| Don Quixote (translated from Spanish) | 2003 | M. De Cervantes | 402,546 |
| Ivanhoe | 2005 | W. Scott | 182,732 |
| Parzival (translated from German) | 1980 | W. Von Eschenbach | 154,902 |
| In the Days of Chivalry | 2004 | E. Everett-Green | 153,079 |
| Heroes and Heroines of Chivalry | 2004 | W. Patten | 135,866 |
| Castles, Knights, and Chivalry | 2015 | Kaufman et al. | 351,311 |
| Total Size | | | 1,380,436 |

Table 11.3 Details of the Heroic Fantasies

| Heroic Fantasies | Year | Author | Token Size |
|---|---|---|---|
| The Lord of the Rings 1–2 | 1954–1955 | J. R. R. Tolkien | 188,361 |
| A Song of Ice and Fire 1 | 1996 | G. R. R. Martin | 293,856 |
| Conan the Barbarian | 1954 | R. E. Howard | 104,972 |
| Wheel of Time 1 | 1990 | R. Jordan | 319,124 |
| The Chronicles of Amber 1–4 | 1970–1976 | R. Zelazny | 239,940 |
| The Chronicles of Narnia 1–5 | 1950–1954 | C. S. Lewis | 233,369 |
| Total Size | | | 1,379,622 |
11.3.2 Calculations and Algorithms
Whereas the average word, sentence, and paragraph length (AWL/ASL/APL)
values are retrieved from the outputs of Wordsmith 6.0 (Scott 2012), the most
frequent word and word sequences (MFWs/MFWSs) and their proportions in
the texts are obtained from Intelligent Archive 3.0 (Craig 2018) as parallel lists.
The dispersion of word and sentence lengths (DWL/DSL), moving-average type-token ratio (MATTR), and verb-adjective ratio (VAR) values, on the other hand,
are calculated according to formulas 1 through 3, as follows.
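$$\mathrm{SD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - X_0\right)^2} \tag{1}$$

$$\mathrm{VAR} = \frac{\text{verbs}}{\text{verbs} + \text{adjectives}} \tag{2}$$

$$\mathrm{MATTR} = \frac{\sum_{i=1}^{N-L} V_i}{L\,(N - L + 1)} \tag{3}$$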
In formula 1, standard deviation (SD) gives the statistical expression of the DWL/
DSL (Liu and Xiao 2020, 35), where n is the number of words/sentences in the text,
Xi is the length of a single word/sentence, and X0 is the average word/sentence length.
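As a small worked illustration of formula 1, the sketch below computes the average word length (AWL) and its dispersion (DWL) for a toy word list. It assumes the population form of the standard deviation (division by n, as in the formula); the helper name `mean_and_dispersion` and the sample words are hypothetical and do not come from the study, which took AWL values from WordSmith output.

```python
import math


def mean_and_dispersion(lengths):
    """Return (mean, population standard deviation) of a list of lengths."""
    n = len(lengths)
    mean = sum(lengths) / n                      # X_0 in formula 1
    variance = sum((x - mean) ** 2 for x in lengths) / n
    return mean, math.sqrt(variance)


# Invented toy input: word lengths are 3, 6, 4, 5.
words = ["the", "knight", "rode", "north"]
awl, dwl = mean_and_dispersion([len(w) for w in words])
```

The same function applies to sentence lengths (ASL/DSL) if the input list holds the number of words per sentence instead of the number of letters per word.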
In formula 2, verbs and adjectives represent the total numbers of verbs and adjectives in a text, respectively. Stanford Tagger 4.2.0 (Stanford NLP Group 2021) is
used to obtain the numbers of verbs/adjectives in the texts through POS annotation.
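Formula 2 can be sketched as a count over part-of-speech tags. The example assumes Penn Treebank-style tags, in which verb tags begin with "VB" and adjective tags begin with "JJ"; which exact tags were counted in the study is not stated, so this grouping is an assumption. The tagged sentence is hand-annotated for illustration and is not tagger output.

```python
def verb_adjective_ratio(tagged_tokens):
    """Formula 2: verbs / (verbs + adjectives), from (word, tag) pairs."""
    # Assumption: Penn Treebank-style tags (VB, VBD, VBG, ... and JJ, JJR, JJS).
    verbs = sum(1 for _, tag in tagged_tokens if tag.startswith("VB"))
    adjectives = sum(1 for _, tag in tagged_tokens if tag.startswith("JJ"))
    if verbs + adjectives == 0:
        raise ValueError("no verbs or adjectives in input")
    return verbs / (verbs + adjectives)


# Hand-tagged toy sentence: three verbs and one adjective.
tagged = [
    ("The", "DT"), ("old", "JJ"), ("master", "NN"), ("leapt", "VBD"),
    (",", ","), ("turned", "VBD"), ("and", "CC"), ("struck", "VBD"),
    (".", "."),
]
var = verb_adjective_ratio(tagged)
```

A value above 0.5 means verbs outnumber adjectives; the ratio says nothing about how frequent either class is relative to the whole text.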
In formula 3, N is the total length of a text, L is the selected length of a text
chunk, and Vi is the number of types in the text chunk. To operationalize, the
chunk size in the MATTR calculation is set to 500, a setting that has previously
produced reliable results (see Covington and McFall 2010; Kettunen 2014).
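The MATTR computation can be sketched with a sliding window that updates type counts incrementally. This is an illustrative implementation, not the tool used in the study; it averages over all N − L + 1 windows, which matches the denominator of formula 3, and the function name `mattr` is hypothetical. The default window of 500 follows the chunk size reported above.

```python
def mattr(tokens, window=500):
    """Moving-average type-token ratio over every window of `window` tokens."""
    n = len(tokens)
    if n < window:
        raise ValueError("text is shorter than the window")
    counts = {}
    for tok in tokens[:window]:
        counts[tok] = counts.get(tok, 0) + 1
    total_types = len(counts)              # number of types in the first window
    for i in range(window, n):
        # Slide by one token: drop the token leaving, add the token entering.
        old = tokens[i - window]
        counts[old] -= 1
        if counts[old] == 0:
            del counts[old]
        new = tokens[i]
        counts[new] = counts.get(new, 0) + 1
        total_types += len(counts)
    return total_types / (window * (n - window + 1))


# Toy check with a window of 3: windows hold 2, 2, and 3 types.
toy = "a b a b c".split()
toy_mattr = mattr(toy, window=3)
```

Because every window has the same length, the measure is not depressed by overall text length in the way a plain type-token ratio is, which is the motivation for preferring it here.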
Data related to the formal indices are normalized according to formula 4, while
the Euclidean distance between two stylistic panoramas built on the MFWs and
MFWSs is calculated based on formula 5, as follows:
$$Y = \frac{X}{\sqrt{\sum_{k=1}^{n} X_k^2}} \tag{4}$$

$$AB = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} \tag{5}$$
Formula 4 is the L2 normalization, also described as L2 regularization. Y represents the normalized value, X is the
original value, and $\sum_{k=1}^{n} X_k^2$ is the sum of the squares of all original values in a
dataset. A benefit claimed for this procedure is that it handles the problem of overfitting
well when the dataset is relatively small.
Formula 5 is used to calculate the Euclidean distance between two stylistic
panoramas built on the MFWs/MFWSs, where x and y are the coordinates of each
panorama and AB is the distance between the panoramas.
For the algorithms used to analyze the stylistic indices, the study employs hierarchical cluster analysis (HCA) and principal component analysis (PCA) for the
reasons elaborated in Section 11.2. HCA is an unsupervised machine learning procedure that groups similar objects into a category (Christopher et al. 2008, 321)
and is amenable to small sample sizes with the O = 2^k principle (note 4) (Formann 1984).
By contrast, PCA is based on the idea of reducing a substantial number of variables
into a smaller number of transformed variables (Manly 2016, 103) and can thus
help measure the overall similarities and/or differences between (translated) texts.
11.3.3 Analytic Steps
The stylometric analyses of the selected works require several steps. First, the
data are cleaned by removing all the paratexts (prefaces, appendices, footnotes,
etc.) as a preparatory step to minimize any possible influence of these on the
results. Next, we retrieve the stylistic indices of the raw data from each text in the
research, using software computation and manual calculations. Third, we build
stylistic panoramas with the retrieved data, by using R 4.0.3 to perform HCA and
PCA separately on the normalized data. This step reveals the potential stylistic
connections between the translated Wuxia and the two subgenres of Western
heroic literature. Finally, the results are interpreted in light of the current reception of some Wuxia translations, with reflection on the theoretical implications for the
use of stylistic panoramas in stylometric analyses.
11.4 Results
11.4.1 Stylistic Panoramas Based on Formal Indices
Table 11.4 summarizes the raw values of the seven indices (AWL, DWL, MATTR,
VAR, ASL, DSL, and APL). These raw values suggest that the translated Wuxia
works and the two subgenres of Western heroic literature are stylistically similar
at the word level but divergent in terms of sentences and paragraphs. This trend
could reflect multiple factors, from different literary norms to translatorial/authorial idiosyncrasies, thus giving target readers of the three subgenres varied reading
experiences. To further probe the stylistic connections, we examine the stylistic
panoramas formed by these formal indices from both global and local perspectives. While the examination at the global level attempts to capture the panorama
of each subgenre, the investigation of local perspectives compares the panoramas
of single works in the three subgenres.

Table 11.4 Statistics of the Stylistic Indices across the Genres
Index levels: Word (AWL, DWL, MATTR, VAR); Sentence (ASL, DSL); Para. (APL)

| Subgenre | Fiction | AWL | DWL | MATTR | VAR | ASL | DSL | APL |
|---|---|---|---|---|---|---|---|---|
| Chivalric Stories | Don Quixote | 4.27 | 2.21 | 0.50 | 0.77 | 48.17 | 26.56 | 71.83 |
| Chivalric Stories | Ivanhoe | 4.46 | 2.35 | 0.55 | 0.71 | 33.07 | 18.29 | 56.57 |
| Chivalric Stories | Parzival | 4.22 | 2.14 | 0.54 | 0.76 | 19.62 | 9.68 | 75.78 |
| Chivalric Stories | In the Days of Chivalry | 4.25 | 2.09 | 0.51 | 0.71 | 25.10 | 18.79 | 82.88 |
| Chivalric Stories | Heroes and Heroines of Chivalry | 4.07 | 1.93 | 0.46 | 0.80 | 23.12 | 14.53 | 67.60 |
| Chivalric Stories | Castles, Knights, and Chivalry | 4.32 | 2.14 | 0.53 | 0.81 | 11.75 | 6.72 | 29.38 |
| Heroic Fantasies | The Lord of the Rings, 1–2 | 4.09 | 1.92 | 0.50 | 0.75 | 14.63 | 10.14 | 45.70 |
| Heroic Fantasies | A Song of Ice and Fire, 1 | 4.16 | 1.93 | 0.52 | 0.77 | 11.82 | 8.49 | 33.60 |
| Heroic Fantasies | Conan the Barbarian | 4.44 | 2.23 | 0.55 | 0.72 | 26.43 | 10.25 | 49.68 |
| Heroic Fantasies | Wheel of Time, 1 | 4.26 | 2.08 | 0.53 | 0.79 | 18.75 | 8.71 | 48.26 |
| Heroic Fantasies | The Chronicles of Amber, 1–4 | 4.09 | 2.15 | 0.50 | 0.80 | 11.12 | 8.46 | 38.89 |
| Heroic Fantasies | The Chronicles of Narnia, 1–5 | 4.14 | 1.99 | 0.50 | 0.76 | 17.47 | 11.51 | 31.28 |
| Translated Wuxia Fiction | Fox Volant of the Snowy Mountain | 4.56 | 2.36 | 0.53 | 0.75 | 19.73 | 10.76 | 46.84 |
| Translated Wuxia Fiction | The Book and Sword | 4.29 | 2.10 | 0.50 | 0.80 | 13.07 | 8.20 | 33.31 |
| Translated Wuxia Fiction | The Deer and the Cauldron | 4.38 | 2.26 | 0.52 | 0.76 | 39.18 | 14.84 | 26.00 |
| Translated Wuxia Fiction | A Hero Born | 4.39 | 2.20 | 0.55 | 0.78 | 22.82 | 7.63 | 33.70 |
| Translated Wuxia Fiction | A Bond Undone | 4.47 | 2.27 | 0.56 | 0.77 | 21.67 | 7.40 | 29.06 |
| Translated Wuxia Fiction | A Snake Lies Waiting | 4.34 | 2.13 | 0.54 | 0.79 | 17.41 | 7.73 | 31.27 |

Figure 11.1 Stylistic panoramas of the three subgenres, from a global view.
11.4.1.1 Stylistic Panoramas from a Global Perspective
A global comparison of the stylistic panoramas between the Wuxia translations
and the chivalric stories/heroic fantasies is meant to locate the stylistic features of
the Wuxia translations in relation to those of the Western heroic literature. Before
that comparison, however, the formation of a stylistic panorama requires that we
normalize the data under each column in Table 11.4 for better data comparability across the indices, using formula 4 (explained in Section 11.3.2). Then, we
tally the normalized data for each stylistic index to obtain a total value, which
is the statistical ingredient of the stylistic panorama at this global level. The
stylistic panoramas that are formed in each of the three subgenres are presented in
Figure 11.1 as radar charts. They reveal several stylistic patterns.
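The normalize-then-tally step can be checked against the data with a short Python sketch. It applies formula 4 to the ASL column of Table 11.4 (all 18 works) and sums the normalized values within each subgenre; the variable names are illustrative, and the calculation is a reading of the procedure described above rather than the authors' own script. Under that reading, the rounded totals agree with the normalized ASL values quoted in this section (1.58 for the chivalric stories, 1.32 for the Wuxia translations, and 0.99 for the heroic fantasies).

```python
import math

# ASL (average sentence length) values from Table 11.4, grouped by subgenre.
asl = {
    "chivalric": [48.17, 33.07, 19.62, 25.10, 23.12, 11.75],
    "fantasy": [14.63, 11.82, 26.43, 18.75, 11.12, 17.47],
    "wuxia": [19.73, 13.07, 39.18, 22.82, 21.67, 17.41],
}

# Formula 4: divide each value by the L2 norm of the whole column.
norm = math.sqrt(sum(x * x for values in asl.values() for x in values))

# Tally the normalized values within each subgenre.
totals = {genre: sum(x / norm for x in values) for genre, values in asl.items()}
rounded = {genre: round(total, 2) for genre, total in totals.items()}
```

Because each column is scaled by its own norm, indices measured in different units (letters, words, sentences) end up on a common scale before they are drawn on the same radar chart.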
At the word level, a conspicuous pattern is that the translated Wuxia has higher
normalized values than the two subgenres of Western heroic literature. This is
easily detected in the moving-average type-token ratio (MATTR) values, in which
the normalized value for the translated Wuxia novels is 1.44 and the values for the
chivalric stories and heroic fantasies are both 1.40. That result may point to the
use of a comparatively richer vocabulary in the translated Wuxia. In addition, the
highest dispersion of word lengths (DWL) value is 1.47, demonstrating that the
word length in the translated Wuxia novels is generally more variable than that in
the two Western subgenres. Likewise, a normalized average word length (AWL)
of the translated Wuxia stories at 1.45 indicates the use of longer and more complex words. One potential reason for such a tendency would be a shared preference
by the Wuxia translators to use longer and more complex words for explanative
renditions in English, because the original Chinese versions by Cha contain many
culturally loaded Wuxia concepts. Also, the highest verb-adjective ratio (VAR)
value is found in the Wuxia translations, which suggests that they use proportionally more verbs relative to adjectives
than their Western counterparts do. This may have several stylistic effects, including
a more vivid reading experience for the target readers.
At the sentence and paragraph levels, there are two noticeable patterns. First,
the chivalric stories have higher values overall. For instance, the normalized ASL
(average sentence length) and DSL (dispersion of sentence lengths) values for the
chivalric stories, being 1.58 and 1.76, respectively, are much higher than those
of the other two subgenres. This could mean that most sentences in the chivalric
stories are more complex and more varied than those in the other two subgenres.
Similarly, the normalized APL (average paragraph length) for the chivalric stories
is 1.83, a value far greater than those for the translated Wuxia novels and the heroic
fantasies. A possible explanation would be the use of different literary norms
between the subgenres of heroic literature. Close reading of the selected chivalric
stories reveals that they tend to pack more sentences into a single paragraph, thus
often pushing their APLs to higher values and presenting readers with longer paragraphs. All these may suggest that, as an old form of heroic literature, chivalric
stories are stylistically more complex and more varied in terms of sentences and
paragraphs than the other two subgenres. Second, the ASL and APL values in the
Wuxia translations and the heroic fantasies demonstrate a less consistent but meaningful trend. Whereas the Wuxia translations have a higher ASL value of 1.32 yet
a lower APL value of 0.96, the heroic fantasies bear a greater APL value of 1.18
but a lower ASL value of 0.99. This indicates that the Wuxia translations may have
many short paragraphs built on relatively longer sentences – an important stylistic
feature which distinguishes the Wuxia translations from the heroic fantasies.
11.4.1.2 Stylistic Panoramas from a Local Perspective
Figure 11.2 Cluster dendrogram of the HCA-based stylistic panoramas.
A local comparison is meant to investigate whether the stylistic panorama pattern
deriving from the global comparison would vary when we make comparisons
across the three subgenres based on single heroic works. The study holds that
such a local comparison is necessary because it reveals how such extra-stylometric factors as translatorial motivations and publication years might affect the stylistic patterns. To make that comparison, we use the normalized data and resort
to HCA to produce stylistic panoramas. The HCA-based stylistic panoramas are
produced according to the Euclidean distance between the texts and using the
maximum distance method in computation. The output from R 4.0.3 is presented
in Figure 11.2 as a cluster dendrogram: the horizontal axis shows the titles of the
18 selected works, and the vertical axis records the divergence of clusters. The
results show that the Wuxia translations differ in important ways from their Western counterparts and from each other.
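The clustering step can be sketched in pure Python. The example assumes that the "maximum distance method" corresponds to complete linkage, in which the distance between two clusters is the distance between their two farthest members; that reading, the function name `complete_linkage`, and the toy coordinates are assumptions, and the study itself ran the analysis in R.

```python
import math


def complete_linkage(points):
    """Agglomerative clustering with Euclidean distance and complete linkage.

    `points` maps a label to a coordinate tuple. Returns the merge history
    as (cluster_a, cluster_b, height) tuples; clusters are tuples of labels.
    """
    clusters = [(label,) for label in points]
    history = []

    def gap(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(math.dist(points[p], points[q]) for p in a for q in b)

    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest complete-linkage gap.
        height, i, j = min(
            (gap(a, b), i, j)
            for i, a in enumerate(clusters)
            for j, b in enumerate(clusters)
            if i < j
        )
        history.append((clusters[i], clusters[j], height))
        merged = tuple(sorted(clusters[i] + clusters[j]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history


# Hypothetical two-dimensional points, not values from the study.
toy_points = {"A": (0, 0), "B": (0, 1), "C": (5, 0), "D": (5, 2)}
merges = complete_linkage(toy_points)
```

The merge heights in `merges` correspond to what the vertical axis of a dendrogram records: works that join at a low height are stylistically close, and clusters that join only at a large height are divergent.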
The Wuxia translations published in more recent years (i.e., 2018, 2019,
2020) appear to form a distinct category that is not only stylistically different
from the selected chivalric stories and heroic fantasies but also differs from the
rest of Wuxia translations in the dendrogram. This trend is clearly shown by
the stylistic panoramas of A Snake Lies Waiting, A Bond Undone, and A Hero
Born, which indicate that the three recent Wuxia translations may share certain
similarities in terms of language use. In that regard, the study suggests that
a short publication span and close translatorial cooperation could be two factors
that shape this stylistic similarity among the three Wuxia translations. Notably,
the three translations were consecutively published in 2018, 2019, and 2020,
and the two translators, Holmwood and Chang, had worked with a kindred spirit
in their translations in an effort to “see the foreign interest in China and its culture” (Mei 2019).
By contrast, the Wuxia translations from earlier periods (i.e., the 1990s and the
2000s) are stylistically divergent from each other but close to some works of the
heroic fantasies. It is noteworthy in Figure 11.2 that the stylistic panoramas of
The Book and Sword, Fox Volant of the Snowy Mountain, and The Deer and the
Cauldron are comparatively divergent from each other, even though the three translations were produced within a relatively short span of 11 years. Instead, the stylistic
panoramas of Fox Volant of the Snowy Mountain and The Book and Sword are
respectively similar to those of Wheel of Time 1 and A Song of Ice and Fire 1, two
works of heroic fantasies published likewise in the 1990s. The stylistic panorama
of The Deer and the Cauldron is another story: it differs from that of the other
works in the three subgenres and forms an independent category, which places the
Wuxia translation somewhere between the chivalric stories and the heroic fantasies
in terms of style. That uniqueness may indicate variations in the language use and
thus the readability of the three Wuxia translations – a scenario that we tend to associate with different translatorial motivations. For example, the motivation behind
the translation of Fox Volant of the Snowy Mountain may have been “promoting
Chinese martial arts cultures overseas” (Wu and Li 2018, 100), while that behind
the production of The Deer and the Cauldron may have been “winning overseas
readership,” and by juxtaposition, the motivation underlying the translation of The
Book and Sword may have been “learning the Chinese language/culture” (Wu and
Li 2018, 101). As a result, the translators of the three earlier Wuxia translations are
likely to have used different vocabularies (more verbs, different culturally loaded
words, etc.) and to have varied the sentence/paragraph lengths in their translations.
11.4.2 Stylistic Panoramas Based on MFWs/MFWSs
The stylistic panoramas based on the formal indices show that the Wuxia translations are not only largely different from the works of the two Western subgenres but also divergent from each other. Therefore, the study seeks to determine
whether such stylistic patterns would remain or change when we compare the
panoramas on the basis of the most frequent words (MFWs) and most frequent
word sequences (MFWSs) through the PCA.
11.4.2.1 Parallel Lists of MFWs/MFWSs
Parallel lists of the MFWs/MFWSs in the texts (shown partially in Figures 11.3 through 11.5; note 5)
include information about the MFWs/MFWSs and the titles of each work. The
remaining rows list the words/word sequences and their proportions in each work.
To facilitate comparability, proportions are analyzed rather than raw frequencies, given the different lengths of the works. Finally, contractions such as “I’ve,”
“he’ll,” and “you’d” in all works are analyzed as single words rather than as separate words. Such a decision contrasts with previous studies (see Tognini-Bonelli
2001; Laviosa 2002; Mastropierro 2018), which treated those forms as separate
words for certain morphological and/or phonetic reasons. However, these linguistic options have stylistic impact in the present study, so it is reasonable to account
for them as instances of stylistic choice.
Figure 11.3 Sample parallel list for the MFWs in the selected works.
Figure 11.4 Sample parallel list for the MFWSs (2-grams) in the selected works.
Figure 11.5 Sample parallel list for the MFWSs (3-grams) in the selected works.
Before proceeding to the actual analyses, it is important to decide how many
words/word sequences in each text to consider from the tops of the parallel lists
in order to obtain the MFWs/MFWSs-based panoramas. Because previous studies
(see Burrows 2002; Rybicki 2012; Grabowski 2013; Mastropierro 2018) share no
agreement on this number, we determine it through repeated pilot studies according to this principle: the stability of analytic results improves with the increase of
the MFWs/MFWSs numbers but remains relatively unchanged once the numbers
reach a certain point, at which a stylistic panorama is formed. Therefore, in the
present analysis, we test with numbers from 100 to 2,000, using increments of
50, to arrive at the point at which stable results form a stylistic panorama. That
point turns out to be the top 1,000 in the case of the MFWs-based panoramas, the
top 950 in the case of 2-grams, and the top 900 in the case of 3-grams. For better
consistency, we use the top 1,000 MFWs/MFWSs entries as the benchmark for
conducting PCA analysis.
11.4.2.2 Overall Patterns of Stylistic Panoramas
The PCA results based on the MFWs, 2-grams, and 3-grams are depicted in Figures 11.6 through 11.8, respectively.
Figure 11.6 PCA graph of individuals, based on the top 1,000 MFWs.
Figure 11.7 PCA graph of individuals, based on the top 1,000 2-grams.
Figure 11.6 shows the overall extent to which the selected works from the three
subgenres differ on the basis of the stylistic panoramas formed by the top 1,000
most frequent words (MFWs). The dots in the figure are the stylistic panoramas of
the heroic literary works represented by the MFWs, and the horizontal and vertical axes are the two principal components (dimensions) that represent the majority of data variance in the parallel word list. The metric distance between two dots
signifies a possible diversity level between the stylistic panoramas of two works.
The general principle is that the greater the metric distance between two dots, the
higher the diversity level between the stylistic panoramas of the two works. The
axes are unlabeled because they are the results of a dimensionality reduction (note 6)
through the principal component analysis (PCA) – specifically, an unsupervised
machine learning method, in which datasets are often unlabeled, unclassified, or
uncategorized (Saslow 2018). The two percentages in the brackets along the axes
are the level of variance carried by the two components (dimensions): the first
component (Dim 1) represents 16.80% of the variance across the data, whereas
the second component (Dim 2) represents 12.74% of that variance. Those results
show that Dim 1 has more data variance than Dim 2 does, thus implying that the
distances between the data points along the horizontal axis bear greater variance
than those along the vertical axis do.
Figure 11.8 PCA graph of individuals, based on the top 1,000 3-grams.
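The percentages attached to Dim 1 and Dim 2 are shares of total variance. The sketch below shows how such shares arise, using a pure-Python power iteration on the covariance matrix; it is an illustration under simplifying assumptions (a tiny invented dataset, a hypothetical function name `pca_explained_variance`, no scaling of variables) and not the R routine used in the study.

```python
import math


def pca_explained_variance(rows, components=2, iterations=500):
    """Share of total variance carried by the leading principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centred columns.
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    total = sum(cov[j][j] for j in range(d))
    shares = []
    for _ in range(components):
        # Power iteration: repeated multiplication converges to the top eigenvector.
        v = [1.0 / math.sqrt(d)] * d
        for _step in range(iterations):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(
            v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)
        )
        shares.append(eigenvalue / total)
        # Deflation: remove the component just found before the next pass.
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    return shares


# Invented data: variance 8/3 along the first axis, 2/3 along the second.
toy_rows = [(2, 0, 0), (-2, 0, 0), (0, 1, 0), (0, -1, 0)]
dim1, dim2 = pca_explained_variance(toy_rows)
```

In this toy case the first component carries 80% of the variance and the second 20%. With 1,000 word-frequency variables, as in the study, the variance is spread over many more components, which is consistent with the two plotted dimensions carrying well under half of it.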
In that light, the message conveyed by Figure 11.6 is clear: the relatively short
distances between the dots representing the stylistic panoramas of the six Wuxia
translations suggest that these translations share similarities in their MFWs. Meanwhile, longer distances between these dots and dots symbolizing the panoramas of
the chivalric stories and heroic fantasies suggest that the translated Wuxia works
are very different from the two Western subgenres in terms of their MFWs. Similarly, the dots that represent the chivalric stories (except for Castles, Knights, and
Chivalry) and the heroic fantasies are mainly packed within their own subgenres
and are distant from the ones representing works of other subgenres. That orientation indicates that most heroic works belonging to the same subgenre are prone to
sharing their MFWs in texts.
When it comes to the panoramas formed by the top 1,000 most frequent word
sequences (MFWSs), the stylistic pictures are largely similar to those stemming
from the top 1,000 MFWs, despite there being some slight differences. In Figure 11.7, the first principal component (Dim 1) shows 16.98% of the variance
across the data, whereas the second principal component (Dim 2) carries 12.26%
of that variance – a pattern that resembles the MFWs-based PCA results. Likewise, five of the six dots representing the Wuxia translations are close to each
other but distant from the dots representing the works of other subgenres, except
for the one for The Deer and the Cauldron, which is closest to the dot representing A Song of Ice and Fire 1, a work of heroic fantasies. This suggests that
Minford’s Wuxia translation may bear strong similarities to the novel by Martin in terms of 2-grams. In addition to the dots for the Wuxia translations, the
dots symbolizing works of the two Western subgenres largely stay within their
own subgenres, with the exception of the dot for Castles, Knights, and Chivalry,
which lies closer to dots representing the Wuxia translations. In Figure 11.8, the
stylistic panoramas formed by 3-grams demonstrate a mirrored but otherwise
almost identical pattern with that in Figure 11.7, even though Dim 1 on the horizontal axis shows 15.03% of the data variance and Dim 2 on the vertical axis has
10.39% of such variance.
11.4.2.3 Metric Distances between Stylistic Panoramas
The above PCA results reveal how the MFWs/MFWSs-based stylistic panoramas
of the Wuxia translations are similar to and/or different from those of the chivalric
stories and heroic fantasies. However, they have only illustrated a general side of
the stylistic picture wherein the exact level of similarities between a Wuxia translation and other works in the graphs is still unknown. To determine that level, we
need to calculate the metric distances between each pair of dots through their
coordinates, which are simultaneously generated in the PCA. With the coordinates of each dot, we use formula 5 to compute their metric distances, and then we
focus on the average distances between the dots representing the Wuxia translations and the dots of the chivalric stories/heroic fantasies. In that way, the exact
metric distances between the Wuxia translations and the chivalric stories/heroic
fantasies are measured. Those average distances are reported in Table 11.5.
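The averaging of formula 5 over several works can be sketched as follows. The coordinates are hypothetical and chosen only to make the arithmetic easy to follow; they are not PCA output from the study, and the function name `average_distance` is illustrative.

```python
import math


def average_distance(point, others):
    """Mean Euclidean distance (formula 5) from `point` to each of `others`."""
    return sum(math.dist(point, other) for other in others) / len(others)


# Hypothetical PCA coordinates for one Wuxia translation and three Western works.
wuxia_work = (3.0, 4.0)
western_works = [(0.0, 0.0), (3.0, 0.0), (6.0, 8.0)]
avg = average_distance(wuxia_work, western_works)  # distances are 5, 4, and 5
```

A smaller average indicates that a translation sits closer, in the two plotted dimensions, to the Western works as a group.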
Table 11.5 Average Distances between Each Wuxia Translation and the Western Counterparts

| Fiction | MFWs | 2-grams | 3-grams |
|---|---|---|---|
| Fox Volant of the Snowy Mountain | 38.32 | 34.91 | 31.34 |
| The Book and Sword | 43.12 | 31.86 | 31.44 |
| The Deer and the Cauldron | 32.80 | 26.55 | 25.91 |
| A Hero Born | 42.74 | 37.88 | 38.75 |
| A Bond Undone | 39.15 | 35.47 | 33.43 |
| A Snake Lies Waiting | 42.72 | 37.42 | 39.35 |

As the table shows, the MFWs-based stylistic panorama of The Deer and the
Cauldron by Minford has the shortest average distance (32.80) to the panoramas
of the selected chivalric stories and heroic fantasies, whereas the MFWs-based
panorama of A Hero Born by Holmwood has the longest average distance to the
other subgenres (42.74). The average distances from the MFWs-based panoramas
of the remaining Wuxia translations to those of the chivalric stories and heroic
fantasies fall within the range of 38.32 to 42.74 and hence are significantly greater
than that of Minford’s translation. Of the MFWSs-based panoramas, the previously described stylistic scenario seems to repeat itself in the case of 2-grams, but
it bears nuances in the case of 3-grams. The 3-grams-founded panoramas illustrate that even though Minford’s translation still has a noticeably shorter average
distance to the other subgenres at 25.91, A Snake Lies Waiting by Chang has the
longest distance of 39.35, a value that is slightly higher than that of Holmwood’s
translation at 38.75.
All these numbers suggest that with regard to the top 1,000 MFWs- and
MFWSs-based panoramas, most of the chosen Wuxia translations are stylistically
different from the selected chivalric stories and heroic fantasies, with the exception of The Deer and the Cauldron by Minford. This result of metric distances
is in line with our direct observations of the PCA individuals, as shown in Figures 11.3 through 11.5, which we would relate again to translatorial motivations:
Minford’s motivation to win readerships in the English-speaking world (cf. Section 11.4.1) may partly explain why his Wuxia translation resembles the chivalric
stories/heroic fantasies in terms of the MFWs/MFWSs.
11.5 Discussion
With the two types of stylistic panoramas, the present study has illustrated the
extent to which translated Wuxia and the two Western subgenres are stylistically
connected. The stylistic panoramas based on the formal indices show that there
are few similarities between the Wuxia translations and the stories of the two Western
subgenres, although certain similarities do exist between those Wuxia translations and heroic fantasies published in the 1990s/2000s. The MFWs-/MFWSs-based stylistic panoramas reveal that the Wuxia translations are stylistically
different from most of the chivalric stories and heroic fantasies, with the
exception of The Deer and the Cauldron. These findings, together with the study,
have practical and theoretical implications with respect to the research questions.
On the practical side, the findings indicate reasons for the reception of Wuxia
translations: unique stylistic features (richer Wuxia-specific vocabularies, shorter
paragraph lengths, etc.) which distinguish Wuxia from both chivalric stories and
heroic fantasies could be a possible reason that these Wuxia translations are well
received. Table 11.6 summarizes the five-point ratings of the six Wuxia translations by readers from four well-known websites of book promotions and reviews.
Because some ratings are not available in Novelupdates and/or Audible, we focus
on the average rating of each translation for better comparability. The table shows
that A Hero Born, A Bond Undone, and A Snake Lies Waiting have the top three
average ratings. We attribute the favorable ratings of the three Wuxia translations
to their stylistic uniqueness, which is partly shown in the following two aspects.
Table 11.6 Reception of the Six Wuxia Translations in English (Up to 02/2021; ratings out of 5)
Fiction                             Amazon   Goodreads   Novelupdates   Audible   Average
Fox Volant of the Snowy Mountain    3.40     3.84        3.00           n/a       3.41
The Book and Sword                  4.80     3.89        3.20           n/a       3.96
The Deer and the Cauldron           4.20     4.28        4.40           n/a       4.29
A Hero Born                         4.60     4.02        4.30           4.70      4.41
A Bond Undone                       4.70     4.39        n/a            5.00      4.70
A Snake Lies Waiting                4.80     4.39        n/a            4.70      4.63
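The average column of Table 11.6 can be reproduced with a short sketch. It assumes that each average is taken only over the sites where a rating is available, with "n/a" entries skipped; this assumption matches the reported averages to two decimals. The names `ratings` and `average_available` are illustrative only.

```python
from statistics import mean

# Ratings out of 5 in the order Amazon, Goodreads, Novelupdates, Audible,
# as listed in Table 11.6; None marks an unavailable ("n/a") rating.
ratings = {
    "Fox Volant of the Snowy Mountain": [3.40, 3.84, 3.00, None],
    "The Book and Sword": [4.80, 3.89, 3.20, None],
    "The Deer and the Cauldron": [4.20, 4.28, 4.40, None],
    "A Hero Born": [4.60, 4.02, 4.30, 4.70],
    "A Bond Undone": [4.70, 4.39, None, 5.00],
    "A Snake Lies Waiting": [4.80, 4.39, None, 4.70],
}

def average_available(values):
    """Mean of the ratings that exist, ignoring missing ones."""
    return mean(v for v in values if v is not None)

averages = {title: average_available(vals) for title, vals in ratings.items()}
top_three = sorted(averages, key=averages.get, reverse=True)[:3]
print(top_three)  # ['A Bond Undone', 'A Snake Lies Waiting', 'A Hero Born']
```

Averaging over available sites only means that translations are compared on different numbers of sources, which is worth keeping in mind when reading the ranking.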
First, regarding the stylistic panoramas founded on the formal indices, relatively higher MATTR but lower DSL and APL values could contribute in part
to favorable ratings. A high MATTR value suggests a rich vocabulary, which in
Wuxia translations could mean the readers may receive an extended cultural experience of martial arts with a greater use of Wuxia-specific words. For instance,
when rendering the original names of martial heroes and kung fu fighters, Holmwood and Chang both use more creative words, such as “Ryder Han,” “Ironheart Yang,” “Twice Foul Dark Wind,” “Nine Yin Skeleton Claw,” and the like.
By contrast, in the earlier (the 1990s/2000s) Wuxia translations, those elements
are sometimes presented less interestingly because of transliteration and/or omission. In addition, a lower DSL value in these translations could indicate a more
repetitive yet consistent translation of the Wuxia-specific terms across sentences.
Such consistency would help readers form a coherent impression of the fictional
Wuxia world created by these terms. Finally, the lower APL values may reduce
the readers’ reading efforts when they come across certain culturally alien and
linguistically idiosyncratic Wuxia elements. For instance, when the readers read
a paragraph which contains many Wuxia-specific words (Ryder Han, Ironheart
Yang, etc.), a relatively short paragraph with a low APL value around 30 (see
Table 11.4) is more likely to reduce their cognitive load as they process these
Wuxia elements. All these unique stylistic features may motivate readers to rate
the three translations favorably.
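For readers unfamiliar with MATTR, the following is a minimal sketch of the moving-average type-token ratio in the sense of Covington and McFall (2010): the type-token ratio is computed for every window of a fixed number of consecutive tokens, and the results are averaged. The window size of 500 is an assumption made here for illustration; the window used in the study is not stated in this section.

```python
def mattr(tokens, window=500):
    """Moving-average type-token ratio over all windows of `window` tokens."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    if len(tokens) <= window:
        # Text shorter than one window: fall back to the plain type-token ratio.
        return len(set(tokens)) / len(tokens)
    ratios = [
        len(set(tokens[start:start + window])) / window
        for start in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)

# Toy example with a tiny window so the arithmetic is easy to follow.
sample = "the hero drew the sword and the hero struck".split()
print(round(mattr(sample, window=4), 3))  # six windows, mean of 5.5 / 6
```

Because every window has the same length, the measure does not shrink mechanically as texts get longer, which is what makes it suitable for comparing novels of different sizes.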
Second, in terms of the MFWs-/MFWSs-based stylistic panoramas, a greater
use of words and word sequences related to body language, body parts, mood,
or inner feelings could likewise lead to more favorable ratings. When we look
through the parallel lists, body-language words, such as “sighed,” “pointed,”
“nodded,” “shouted,” and the like, appear frequently in the three translations,
and words related to mood, such as “worried,” “angry,” “surprised,” “scared,”
and so on, are also widely used in the same translations. In addition, 2- and/or
3-grams about body parts, such as “his neck,” “his chest,” “head and arms,” and
“in his hand,” as well as ones for inner feelings, such as “dared to,” “refused to,”
“had no idea,” and “he wondered about,” create vivid characterizations. This is
because a preservation of the original descriptions of body language in the Wuxia
translations may shorten the psychological distance between target readers and
the reconstructed Wuxia heroes/heroines, who are “alive” with perceptible human
kinetic and/or mental presentations.
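Lists of most frequent words (MFWs) and word sequences (MFWSs) such as those discussed above can be approximated with a simple counter. This is a sketch under stated assumptions, not the procedure of the stylometric software used in the study: the tokenizer is a naive regular expression, n-grams are allowed to cross sentence boundaries, and the sample sentence is invented.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    """All sequences of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def most_frequent(text, n=1, top=1000):
    """The `top` most frequent n-grams with their counts (n=1 gives MFWs)."""
    return Counter(ngrams(tokenize(text), n)).most_common(top)

sample = "He nodded. In his hand was a sword, and in his hand it trembled."
print(most_frequent(sample, n=3, top=1))  # [('in his hand', 2)]
```

The default of `top=1000` mirrors the top 1,000 items mentioned in the chapter; in practice the counts would be converted to relative frequencies before any PCA.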
On the theoretical side, the study casts light on the use of stylistic panoramas in
stylometric analyses in the following ways. First, the study attaches importance
to the intended function of the chosen stylistic indices when using them as building blocks of a stylistic panorama. The seven formal indices are selected to show
the general stylistic features of the heroic works, while the purpose in analyzing
the MFWs and MFWSs is to identify the lexical resources in the same works.
This could be important to a stylometric study because it binds indices together
through a shared function. Nonetheless, the selection criteria of stylistic indices
in some previous studies (Hossain et al. 2017; Liu and Xiao 2020) are not always
made clear to readers. As a result, possible functions associated with those indices
are often underexplored, which could lead to a tenuous connection between the
selected indices.
Second, the study values triangulation of different types of stylistic panoramas
when exploring holistic stylistic pictures of (translated) texts under investigation.
For example, when the study concludes that the Wuxia translations are stylistically different from the chivalric stories and heroic fantasies, it has done so by triangulating the results stemming from the stylistic panoramas based on the seven
formal indices and ones built by the MFWs and MFWSs. In this way, the study
not only takes multiple functionally related stylistic features into account but also
locates the stylistic pictures of the same genre from different stylistic perspectives. By contrast, some existing stylometric analyses of (translated) texts have
confined their stylistic explorations to formal indices (Hossain et al. 2017; Liu and
Xiao 2020) or MFWs/MFWSs (Eder 2017; Haverals et al. 2022) without attempting to triangulate the results from both sides. Consequently, extra stylistic pictures
stemming from such triangulation are sometimes ignored in those analyses. In
this light, we hold that such triangulation of stylistic panoramas may benefit stylometric analyses, as it helps the analyses transcend a single stylistic perspective
by bringing multiple stylistic perspectives into play.
Third, the study acknowledges that stylistic panoramas may have weaknesses when used in stylometric studies that highlight linguistic characteristics at a single level. This is especially evident when we attempt to use both types
of stylistic panoramas. Despite the edge offered through triangulation, it would be
less appropriate to use them simultaneously in studies dedicated to such single-level characteristics as words, word sequences, or sentences, since their scopes
of investigation would be too narrow. Nevertheless, as the concept of stylistic
panoramas is now in its infancy, it still has room for further development to satisfy
the theoretical and methodological needs of varied stylometric studies.
11.6 Conclusion
Returning to the original research interest, the study can now give a clear answer:
stylistic connections between translated Chinese Wuxia and Western heroic literature
are weak because the stylistic panoramas founded on the formal indices and the
MFWs/MFWSs have demonstrated important stylistic differences across the genres. Despite these divergences, the study has made the following contributions to
Wuxia translation research and stylometric studies: first, it highlights possible stylistic connections between heroic literature in the East and that in the West, clues
which may help understand the reception of Chinese Wuxia in the West. Second, it
demonstrates the use of the stylistic panorama, a concept that seeks to describe the
stylistic picture of a (translated) text in a relatively holistic way by binding different
stylistic indices together, with respect to function.
Nonetheless, this study has several limitations, one of which is that the stylometric analyses are founded on a relatively small number of Wuxia translations.
Even though the study has included all the English Wuxia translations published
at the time of writing, it is assumed that when there are more Wuxia translations in
the future, the results might be slightly different due to various translatorial (translators’ motivations, preferences, etc.) and/or extratranslatorial (patronage intervention, sociocultural influences, etc.) reasons. Furthermore, the present selection
of stylistic indices in the formation of panoramas has considered only general
features, whereas an alternative selection that favors more idiosyncratic features
(hapax legomena related to martial arts, chivalry, fantasies, etc.) pertaining to
the heroic literature could be equally potent in uncovering stylistic connections
between heroic literature in the East and that in the West.
For further research along this line, the publication of additional Wuxia translations could allow works by different authors to be incorporated into the corpus to
produce more insightful results. In the meantime, stylistic indices that focus on
various idiosyncratic features of heroic literature can be considered to widen the
scope of meaningful research.
Funding
This work was supported by the Research Grants Council of the Hong Kong Special Administrative Region, China [PolyU/RGC 15602621].
Notes
1 The article was originally published on April 23, 2022, in Digital Scholarship in the
Humanities (DSH), DOI: 10.1093/llc/fqac019. It is reused under license 5304200620907
permitted by Oxford University Press. Credit goes to DSH, Oxford University Press, the
European Association for Digital Humanities (EADH), and Alliance of Digital Humanities Organizations (ADHO).
2 All Wuxia translations used in the study were purchased from Amazon.com as e-books.
3 All the Western heroic works used in the study were available in the public domain and
were downloaded freely from Gutenberg.org as “txt” files.
4 O is the minimum sample size, and k is the number of variables.
5 Full tables are available in Figshare (DOI: 10.6084/m9.figshare.19361468).
6 Because PCA is a multivariate statistical analysis that operates according to dimensionality reduction (Manly 2016, 102–3), multiple dimensions in the analysis were compressed into two dimensions – a more manageable scale for the present work.
References
Březina, Vaclav. 2018. Statistics in Corpus Linguistics: A Practical Guide. Cambridge:
Cambridge University Press.
Brocardo, Marcelo Luiz, Issa Traore, and Isaac Woungang. 2014. “Toward a Framework
for Continuous Authentication using Stylometry.” 2014 IEEE 28th International Conference on Advanced Information Networking and Applications, Victoria, BC, Canada,
106–15.
Broder, Andrei Z., Steven C. Glassman, Mark S. Manasse, and Geoffrey Zweig. 1997.
“Syntactic Clustering of the Web.” Computer Networks and ISDN Systems 29, no. 8:
1157–66.
Burrows, John. 2002. “ ‘Delta’: A Measure of Stylistic Difference and a Guide to Likely
Authorship.” Literary and Linguistic Computing 17, no. 3: 267–87.
Manning, Christopher, Prabhakar Raghavan, and Hinrich Schütze. 2008. Introduction to
Information Retrieval. Cambridge: Cambridge University Press.
Covington, Michael, and Joe D. McFall. 2010. “Cutting the Gordian Knot: The Moving-Average Type–Token Ratio (MATTR).” Journal of Quantitative Linguistics 17, no. 2:
94–100.
Craig, Hugh. 2018. Intelligent Archive 3.0. Newcastle: University of Newcastle.
Daelemans, Walter. 2013. “Explanation in Computational Stylometry.” In Computational
Linguistics and Intelligent Text Processing, edited by Alexander Gelbukh, 451–62. Berlin: Springer.
Eder, Maciej. 2017. “Visualization in Stylometry: Cluster Analysis Using Networks.” Digital Scholarship in the Humanities 32, no. 1: 50–64.
Eder, Maciej, Jan Rybicki, and Mike Kestemont. 2016. “Stylometry with R: A Package for
Computational Text Analysis.” The R Journal 8, no. 1: 107–21.
Flannery, Mary. 2012. “The Concept of Shame in Late‐Medieval English Literature.” Literature Compass 9, no. 2: 166–82.
Formann, Anton K. 1984. Die Latent-class-analyse: Einführung in Theorie und Anwendung. Weinheim: Beltz.
Grabowski, Łukasz. 2013. “Interfacing Corpus Linguistics and Computational Stylistics:
Translation Universals in Translational Literary.” International Journal of Corpus Linguistics 18, no. 2: 254–80.
Haverals, Wouter, Lindsey Geybels, and Vanessa Joosen. 2022. “A Style for Every Age:
A Stylometric Inquiry into Crosswriters for Children, Adolescents and Adults.” Language and Literature 31, no. 1: 62–84.
Holmes, David I. 1998. “The Evolution of Stylometry in Humanities Scholarship.” Literary and Linguistic Computing 13, no. 3: 111–17.
Honegger, Thomas. 2010. “Heroic Fantasy and the Middle Ages – Strange Bedfellows or
An Ideal Cast?” Itinéraires: Littérature, Textes, Cultures, no. 3: 61–71.
Hossain, M. Tahmid, Md. Moshiur Rahman, Sabir Ismail, and Md Saiful Islam. 2017.
“A Stylometric Analysis on Bengali Literature for Authorship Attribution.” In ICCIT
2017: 20th International Conference on Computer and Information Technology, 1–5.
Dhaka, Bangladesh: IEEE.
Hou, Renkui, and Churen Huang. 2020. “Robust Stylometric Analysis and Author Attribution Based on Tones and Rimes.” Natural Language Engineering 26, no. 1: 49–71.
Huang, Yonglin. 2018. Narrative of Chinese and Western Popular Fiction. Berlin: Springer.
Jones, Ewan, and Paul Nulty. 2019. “Quantitative Measures of Lexical Complexity in
Modern Prose Fiction.” Digital Scholarship in the Humanities 34, no. 4: 914–37.
Kettunen, Kimmo. 2014. “Can Type-Token Ratio Be Used to Show Morphological Complexity of Languages?” Journal of Quantitative Linguistics 21, no. 3: 223–45.
Keulemans, Paize. 2020. Sound Rising from the Paper: Nineteenth-Century Martial Arts
Fiction and the Chinese Acoustic Imagination. Leiden: Brill.
Laviosa, Sara. 2002. Corpus-based Translation Studies: Theory, Findings, Applications.
Amsterdam: Rodopi.
Liu, Ying, and Tianjiu Xiao. 2020. “A Stylistic Analysis for Gu Long’s Kung Fu Novels.”
Journal of Quantitative Linguistics 27, no. 2: 32–61.
Manly, Bryan F. J. 2016. Multivariate Statistical Methods: A Primer (4th ed.). London:
Chapman and Hall.
Mastropierro, Lorenzo. 2018. Corpus Stylistics in Heart of Darkness and Its Italian Translations. London: Bloomsbury Publishing.
Mei, Jia. 2019. “Turning Action into Words.” China Daily, April 19. http://global.chinadaily.com.cn/a/201904/19/WS5cb91987a3104842260b70d3_2.html. Accessed 28
February 2021.
Melka, Tomi S., and Michal Místecký. 2020. “On Stylometric Features of H. Beam Piper’s
Omnilingual.” Journal of Quantitative Linguistics 2, no. 7: 204–43.
Rong, Zheng, Li Jiexun, and Chen Hsinchun. 2006. “A Framework for Authorship Identification of Online Messages: Writing-Style Features and Classification Techniques.”
Journal of the American Society for Information Science and Technology 57, no. 3:
378–93.
Rybicki, Jan. 2012. “The Great Mystery of the Almost Invisible Translator.” In Quantitative Methods in Corpus-Based Translation Studies, edited by M. P. Oakes and M. Ji,
231–48. Amsterdam: John Benjamins.
Saslow, Elliott. 2018. “Unsupervised Machine Learning.” Towards Data Science. https://
towardsdatascience.com/unsupervised-machine-learning-9329c97d6d9f. Accessed 28
February 2021.
Scott, Mike. 2012. WordSmith Tools version 6, Stroud: Lexical Analysis Software. https://
www.lexically.net/publications/citing_wordsmith.htm.
Stanford NLP Group. 2021. “Stanford Tagger 4.2.0.” https://nlp.stanford.edu/software/tagger.html. Accessed 28 February 2021.
Tognini-Bonelli, Elena. 2001. Corpus Linguistics at Work. Amsterdam: John Benjamins.
Valencia, Alex I., Helena Gómez-Adorno, Christophe Rhodes, and Gibran Fuentes Pineda.
2019. “Bots and Gender Identification Based on Stylometry of Tweet Minimal Structure
and N-Grams Model.” In Working Notes of CLEF 2019 – Conference and Labs of the
Evaluation Forum, 1–8. Lugano: CLEF.
Vander Elst, Stefan. 2017. The Knight, the Cross, and the Song: Crusade Propaganda and
Chivalric Literature, 1100–1400. Philadelphia, PA: University of Pennsylvania Press.
Wu, Kan, and Dechao Li. 2018. “Lexical Normalization in English Translations of Jin
Yong’s Martial Arts Fiction: A Corpus-Based Study.” In Asia Pacific Interdisciplinary
Translation Studies, edited by X. Luo, 93–106. Beijing: Tsinghua University Press.
12 Translating Personal Reference
A Corpus-Based Study of the English Translation of Legends of the Condor Heroes
Jing Fang and Shiwei Fu
12.1 Introduction
Louis Cha Leung-yung, better known by his pen name, Jin Yong, was the most
famous martial arts novelist in China. One of his most popular novels, Legend of
the Condor Heroes (射雕英雄傳, henceforth LoCH), first published in 1957,
has been read by Chinese communities worldwide for
more than six decades. More recently, a four-volume English translation of the
novel, published between 2018 and 2021, has sparked a new wave of enthusiasm in
the English-speaking world, attracting more than 5,000 ratings and almost 1,000
comments on Goodreads.
The English translation of the novel has also received attention from scholars in translation studies. Readers’ reviews have been analyzed (e.g., Zhang and
Wang 2020), and intratextual and extratextual translation factors have been further
explored in terms of readers’ reception (e.g., Chen and Dai 2021; Xu and Chang
2020). However, among these studies, little attention has been given to the character development in the translation, though LoCH is known for its three-dimensional
building of characters. For example, when the main protagonist of the novel, Guo
Jing, makes reference to himself in conversation with other characters, Louis Cha
used not only the first-person pronoun but also third-person nominal groups (NGs)
to portray Guo Jing’s humbleness, a key personality trait of the character that has
been widely recognized by the readers of the Chinese source text (ST). In English,
however, the lexicogrammatical choices in realizing these personal references are
limited. The disparity between the two languages has posed a challenge in translating these referential meanings, which in turn may impact the target text (TT) readers’ understanding of the character, as personal reference in the speech situation is
recognized as an important linguistic marker that reflects a character’s personality
and his perception of social relations (Ireland and Mehl 2014).
Interestingly, our review of the comments made by readers of the translated LoCH
on Goodreads (www.goodreads.com/) suggests that the English translation has successfully portrayed the humbleness of the protagonist Guo Jing, with 22% of the reviewers
particularly mentioning “humble” and “modest” when commenting on Guo Jing’s personality. It would be interesting to explore how the translator managed to adequately
render the character’s humbleness in the TT through the translation of personal reference in conversations. Adopting a corpus-based approach, we will investigate and compare the lexicogrammatical choices used in the ST and the TT in achieving personal
reference, examining how an equally humble character is developed in the translation.
DOI: 10.4324/9781003298328-13
12.2 Character Development in Literary Translation
It is necessary to first define the scope of the study. We believe that fictional characters are imitations of real people. As Mead (1990) points out, fictional characters
can be recognized or appreciated in the same way as people in real life, and they
should not be dehumanized as a purely textual existence. And in analyzing personality traits of real people, it has been found that language is a valid and reliable
mode of measuring and understanding personality (Caplan et al. 2020). Therefore,
by the same token, it is reasonable to adopt a linguistic approach in analyzing
fictional characters by examining the language being used by these characters.
In great novels, the depiction of a character’s actions and manner of thinking
makes the character who he is, through which a sense of relatively stable or abiding
personal quality is achieved (Chatman 1978). Likewise, such character development in the source text (ST) can be reproduced in translation, which concerns the
recreation of meaning in context through choice, including both the choice in the
interpretation of the original text and the choice in the creation of the translated
text (Ma and Wang 2020). In other words, a good translator is able to depict a
character equally effectively for the TT readers by reconstructing elements such as
the character’s traits, emotions, instincts, and relationships through lexicogrammatical choices.
In fact, some research has been conducted to study the reshaping of characters
in literary translation, focusing on various linguistic features, such as transitivity
shifts (e.g., Barbosa de Vasconcellos 1998; Lee 2018), personal pronouns (e.g.,
Bosseaux 2006; Lin 2020), speech verbs (e.g., Ruano 2017; Mastropierro 2020),
phonological features in dialectal representations (e.g., Rodríguez Herrera 2015),
and grammatical and lexical choices of geolects (e.g., Alsúa and Edurne 2018).
Among these linguistics-based studies of character-building in translation, the
work of Bosseaux (2006) and Lin (2020) is particularly relevant to the current
study because both of the studies examine the speaker’s usage of personal reference in relation to character development, which is also the focus of the current chapter. According to their studies, pronouns were counted as an indicator to
evaluate characterization. Lin (2020), for example, compared the frequency of the
first-person pronoun I that had been distributed in different translations and drew
conclusions about the speaker’s subjectivity. However, the data collected in Lin’s
study were not powerful enough, as no ST data were included in the analysis, and
the TT data were analyzed out of context. In the study conducted by Bosseaux
(2006), the author argued that, due to the ambiguity in the use of the second-person pronoun you in the ST, translators were obliged to explicitate the reference
when you was used in the TT. However, it is not clear whether the vagueness had
been effectively eliminated through the explicitation of the reference, nor did the
author provide any readers’ comments to support her argument.
It seems that a more comprehensive project which involves a context-based
analysis of both ST and TT and which takes into account the feedback of the TT
readers is needed to provide an in-depth examination of the character-building in
the literary translation. Against this background, the current paper aims to use the
translation of the LoCH as a case to explore this topic. In particular, using personal reference as the operational variable, we will examine how the translator of
LoCH managed to effectively re-establish the humbleness of the protagonist, as
evidenced by the TT readers’ reviews.
12.3 Methodology
Our analysis of the translation of LoCH will focus on Guo Jing (郭靖), the main
protagonist in the novel, and we will investigate how the character makes reference
to himself and to his listeners in conversations, through which his personal trait of
humbleness is portrayed. It is therefore necessary to briefly introduce this key character. In the novel, Guo Jing is portrayed as a hero with great integrity living in the Song
Dynasty (960–1279) who is extremely loyal to his country, respectful and humble to
his shifu (masters who teach him martial arts), affectionate to his lover, Lotus (aka
Huang Rong), and ruthless to enemies who invade his country. In the society of his
time, as a young man who has only just started his adventure in the martial arts world,
Guo Jing is considered junior to many characters in the novel, and he is fully aware of
his inferior status, which explains his humbleness as portrayed in the book. Guo Jing
understandably shows his humbleness and modesty mainly in front of people who are
in a cordial relationship with him, such as his teachers and friends. Therefore, our data
collection will focus on his conversations with this group of people, and his conversations with his enemies will not be included in our data collection.
12.3.1 Text Collection and Corpus Building
A parallel corpus was established for this research, with the Chinese ST data collected from Legends of the Condor Heroes (Jin Yong 1980/2002) and the English
TT data from the translation done by Anna Holmwood, Gigi Chang, and Shelly
Bryant published by MacLehose Press. The ST was written in vernacular Mandarin Chinese with features of classical Chinese style, providing a rich repository for
personal reference choices.
After the data collection, we used CorpusWordParser 3.0.0.0 (Xiao 2014) to
segment the Chinese ST text into sentences, and used “郭靖 (Guo Jing)” as the
search term in AntConc 3.2.0 (Anthony 2006) to extract his direct projection of
locutions – the direct conversations that Guo Jing engages in with other characters.
Then we manually removed those conversations between Guo Jing and his enemies, as these data are not the focus of the current project. The remaining conversations and their English translations then form the corpus for analysis, where all the
instances of Guo Jing’s self-reference (i.e., in referring to himself as the speaker)
and his reference to his addressees in the conversations were tagged, including
first- and second-person pronouns and NGs that function as reference items to
either the speaker Guo Jing or the addressee that Guo Jing talks to.
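The first two steps above (sentence segmentation, then retrieval of sentences in which Guo Jing speaks) can be approximated in a few lines. This is only a standalone sketch and does not reproduce the behaviour of CorpusWordParser or AntConc: the segmentation rule, the set of quotation marks, and the function names are assumptions, and the sample sentences are invented rather than quoted from the novel. The manual removal of conversations with enemies and the tagging of reference items are not automated here.

```python
import re

SPEAKER = "郭靖"          # search term used in the chapter
OPEN_QUOTES = "「『“"      # assumed opening quotation marks; adjust to the edition

def split_sentences(text):
    """Split after 。！？, keeping a closing quotation mark with its sentence."""
    return re.findall(r"[^。！？]+[。！？]?[」』”]?", text)

def candidate_speech(text):
    """Sentences that name the speaker and contain an opening quotation mark."""
    return [s for s in split_sentences(text)
            if SPEAKER in s and any(q in s for q in OPEN_QUOTES)]

# Invented sample text for illustration only.
sample = "郭靖道：「弟子不敢。」黃蓉笑了。郭靖翻身上馬。"
print(candidate_speech(sample))  # ['郭靖道：「弟子不敢。」']
```

A filter of this kind over-collects (Guo Jing may be named in a sentence where someone else speaks), which is one reason a manual pass over the hits remains necessary.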
Altogether, the parallel corpus consists of 20,927 tokens in the ST and 24,114
tokens in the TT. Observation and analysis of the data were carried out in a textual
environment through concordance.
12.3.2 Analytical Framework
12.3.2.1 Personal Reference and Its Realization in Lexicogrammar
As we will focus on the handling of personal reference in the translation, it is
necessary to introduce the term based on the definition given by Halliday and
Hasan (1976). According to Halliday and Hasan (1976), personal reference is the
reference by means of function in the speech situation, through which readers
can identify the speech roles involved in a conversation. These speech roles can
be generally realized by pronouns: in English, for example, I is used to refer to
the speaker, we to the speaker plus others, and you to the addressee(s). In addition, connotations of social distance and social hierarchy are embedded in the
use of personal reference, indicating politeness and constructing social identity
and interpersonal relations (Wales 1996; Brown and Gilman 2012). In English,
NGs are typically used to refer to the third person, a non-interactant who is not
involved in a conversation. NGs are rarely used as speaker-reference in English
except for being interpreted as baby talk or liturgy (Wales 1996, 57).
However, the use of a third-person NG for speaker-reference is not uncommon
in the vernacular Mandarin Chinese spoken in Guo Jing’s time, when China was
in the imperial dynasty of Song. In the Chinese culture, the notion of “self” is
positioned in a heavily relational model which involves intricate social networks based
on age and gender, family ties, affinity, and social hierarchy (Feng 1961). Personal reference terms in general are therefore heavily social distance–calculated
and hierarchy-based in Chinese, and even more so in the historical periods, when
social ranks were much stricter than in the modern age. And third-person NGs
were often used as a politeness strategy to refer to the speaker or to the addressee
in Guo Jing’s time, bringing indications of the social relationship between the
speaker and the addressee. The use of third-person NGs is therefore an important
marker of humbleness, as showing humility is a key politeness strategy in the
Chinese culture (Huang 2008).
Based on the data observation, the reference items used by Guo Jing in the
ST in referring to the two speech roles (i.e., speaker and addressee) are summarized in Figure 12.1. As shown in Figure 12.1, both pronouns and third-person
NGs are used in the Chinese text in realizing the reference to the speaker himself
and to the addressees, which carry significant implications of the social relationship between Guo Jing and his addressees.
Figure 12.1 Lexicogrammatical realization of speech roles in LoCH (ST).
12.3.2.2 Speech Roles and Their Social Status
From a social perspective, speech roles, such as “speaker” and “addressee” in a
speech situation, can generally be viewed along two dimensions underlying social relationships: social status and interpersonal distance. This categorization
echoes the “power and solidarity” of Brown and Gilman (2012, 252), with the
hierarchical status implying an either equal or unequal (therefore “hierarchical”) relationship, and the interpersonal distance implying an either cordial or
hostile relationship. In LoCH, a hierarchical relationship exists in both Guo
Jing’s cordial circles (such as with his teachers and his friends) and his distant circles (such as with his enemies), implying a sophisticated social network where
the main protagonist functions. However, as we will only focus on the conversations between Guo Jing and characters who are in a cordial relationship with him,
the variable “social distance” will be excluded from the data analysis. As a result,
“social status” becomes the only variable for analysis when we interpret different
speech roles in a social context.
In terms of social status, we have found that characters who are in conversation with Guo Jing are either in an equal or a hierarchical relationship with the
protagonist. The characters who are of equal social status to Guo Jing are generally fellow youths who are junior in the martial arts society. In comparison, the
characters who are of hierarchical social status are generally senior in age and/or
hold senior positions in a martial arts organization.
It should also be pointed out that, in the “hierarchical” group, all the characters
listed in this category are in fact of a higher social status than Guo Jing in the
novel. This is probably because, in the novel, Guo Jing is described as a
young man who was raised by a working-class single mother and has only just started his
martial arts career. So it is not surprising to see that many characters in the book,
except those who are as young as Guo Jing, have a higher social status, as they are
either senior in age and/or senior in the martial arts status.
Jing Fang and Shiwei Fu
12.3.2.3 Translation Equivalence in Lexicogrammar
Once the data were analyzed in terms of personal reference (as speaker-reference
or as addressee-reference) and in terms of their lexicogrammatical realization
(as pronouns or as third-person NGs), we then compared the lexicogrammatical choices in realizing these referential meanings between ST and TT. Based
on the comparison, translation choices in realizing personal reference were
then categorized as either “equivalence” or “non-equivalence.” A translation is
labelled as “equivalence” when the same type of lexicogrammatical choice is
made in the TT in achieving the referential meaning (as either the speaker or the
addressee). For example, when a third-person NG is used in the ST to refer
to the speaker Guo Jing and is realized as a third-person NG referring
to the speaker Guo Jing in the TT, this translation is counted as an equivalence.
In comparison, a translation is labelled as “non-equivalence” when a different
lexicogrammatical strategy is used to achieve the referential meaning in the
TT. For example, when a third-person NG in reference to the speaker in the
ST is translated as a first-person pronoun I in the TT, this translation is labelled
as a non-equivalence. It is important to clarify that, in this paper, the category
of “non-equivalence” cannot be interpreted as a case where semantic meaning
is not equal between the ST and the TT. Rather, by “non-equivalence” we aim
to focus on the different lexicogrammatical choices made by the translators in
achieving the same (or equal) referential meanings in semantics. In other words,
a semantically equivalent translation of the personal reference may be viewed
as a case of “non-equivalence” when a TT choice in lexicogrammar is different
from the ST.
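The labelling rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' annotation procedure: the `Realization` categories, the `label_equivalence` function, and the sample pairs are hypothetical names introduced here, and an omitted reference in the TT is assumed to count as non-equivalence.

```python
from enum import Enum


class Realization(Enum):
    """Lexicogrammatical realization of a personal reference (hypothetical labels)."""
    PRONOUN = "pronoun"
    THIRD_PERSON_NG = "third-person NG"
    OMITTED = "omitted"  # assumption: no reference item appears in the TT


def label_equivalence(st: Realization, tt: Realization) -> str:
    """Return "equivalence" only when ST and TT use the same type of realization."""
    return "equivalence" if st == tt else "non-equivalence"


# Illustrative ST/TT pairs modelled on the cases described in the text.
pairs = [
    (Realization.THIRD_PERSON_NG, Realization.THIRD_PERSON_NG),  # NG kept as NG
    (Realization.THIRD_PERSON_NG, Realization.PRONOUN),          # NG rendered as "I"
    (Realization.PRONOUN, Realization.PRONOUN),                  # pronoun kept
]
labels = [label_equivalence(st, tt) for st, tt in pairs]
print(labels)
```

Note that the label depends only on the type of lexicogrammatical choice, not on whether the referential meaning is preserved, which mirrors the distinction drawn in the paragraph above.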
Following the analysis of the parallel data, statistical analysis was conducted.
The statistical analysis aims to examine whether the translation of personal reference is associated with the social status of the speech roles. As pointed out
earlier, readers’ comments indicate that the TT managed to effectively portray
Guo Jing’s humbleness despite the interlingual differences between Chinese
and English in realizing these referential meanings. By examining the relationship between translation equivalence and the social status of the speaker and his
addressee, we try to find possible explanations behind such effective rendition
of Guo Jing’s humbleness in the TT, as we assume that the translator was probably highly sensitive to the social status of different characters, based on which
a conscious translation choice was made to reflect the social relations between
these characters.
12.4 Findings and Discussions
12.4.1 Personal Reference: The Speaker and the Addressee
Table 12.1 presents all the lexicogrammatical choices in achieving personal reference in both the ST and the TT with a speech role either as the speaker or the
addressee in Guo Jing’s utterances.
Table 12.1 Personal References Used by Guo Jing in LoCH
Social status: Equal
  Characters: 黃蓉 [Lotus Huang]; 楊康 [Yang Kang]; 周伯通 [Zhou Botong the Hoary Urchin]; 穆念慈 [Mercy Mu]; 華箏 [Khojin]; 陸乘風 [Zephyr Lu]; 傻姑 [Silly]; 魯有腳 [Surefoot Lu]
  ST, Speaker (= Guo Jing): 我 [I]; 兄弟 [your brother]; 小弟 [your little brother]; 在下 [this insignificant person]; 小可 [this worthless person]
  ST, Addressee (= You): 你 [you]; 您 [you]; 兄長 [my elder brother]; 兄弟 [my brother]; 大哥 [big brother]; 兄 [mate]; 哥 [bro]; 賢弟 [my good brother]; 世妹 [younger sister]; 妹子 [sis]; 姑娘 [Miss]; 先生 [Mister]; 莊主 [lord of the town]
  TT, Speaker (= Guo Jing): I
  TT, Addressee (= You): you; brother; my brother; sister; master
Social status: Hierarchical
  Characters: 江南七怪 [The Seven Freaks of the South]; 洪七公 [Count Seven Hong]; 丘處機 [Qiu Chuji]; 王處一 [Wang Chuyi]; 馬鈺 [Ma Yu]; 黃藥師 [Apothecary Huang]; 成吉思汗 [Genghis Khan]; 李萍 [Lily Li]; 梅超風 [Cyclone Mei]
  ST, Speaker (= Guo Jing): 我 [I]; 晚輩 [the junior]; 在下 [this insignificant person]; 弟子 [your student]; 徒兒 [your disciple]; 小人 [this worthless person]; 孩兒 [this son]
  ST, Addressee (= You): 你 [you]; 您 [you]; 師父 [master]; 恩師 [my respected teacher]; 道長 [reverence]; 前輩 [sir]; 您老 [elder]/您老人家 [the elder]; 老丈 [the senior]; 大叔 [uncle]; 師伯 [uncle]; 島主 [lord of the island]; 岳父 [father in law]/岳父爹爹 [dad in law]; 大汗 [the Khan]; 長老 [elder]
  TT, Speaker (= Guo Jing): I; me; we; your student; your disciple
  TT, Addressee (= You): shifu; his reverence; elder/the elder; sir; uncle; master; the Khan/the Great Khan
As illustrated in Table 12.1, in both ST and TT, Guo Jing’s self-reference items
can be divided into two categories in terms of lexicogrammatical choices: as
the first-person pronoun and as the third-person NGs. The use of third-person
NGs in referring to himself has important pragmatic implications reflecting his
understanding of the social relationship with his addressee(s). For example, from
the self-effacing references “小可” (this worthless person) and “孩兒” (this child),
Chinese readers can easily tell that Guo Jing is modest, owing to his high sensitivity to the Confucian social rites, which value the proper order between older and
younger.
In terms of reference to the addressees, the choices generally fall into two categories: as the second-person pronoun and as the third-person NGs. In the ST, these
choices are highly indicative of different interpersonal relations. For example, both
“世妹” (younger sister, a formal term to address a female friend who is younger
than the speaker) and “妹子” (sis, a colloquial term used to address a close female
friend who is younger than the speaker) can be used to refer to a female friend, but
the situational context in which they are used is not the same, thus creating different pragmatic indications. “妹子” (sis) indicates a closer relation with the speaker
Guo Jing than “世妹” (younger sister) does. Guo Jing’s choice of these
referential items manifests his sensitivity to the level of rapport, demonstrating the
idea of “orderly propriety” that is deeply rooted in his character.
As Table 12.1 shows, in terms of cordial relations, choices of third-person NGs
are more diverse in the Chinese ST, with 37 different NGs, including 10 for the
speaker-reference and 27 for the addressee-reference. In comparison, this variety
diminishes in the TT, with only 11 different NGs, including 4 for the speaker-reference and 7 for the addressee-reference. It seems that many NGs used as
speaker-reference in the ST are translated as the first-person pronoun I in the TT.
To further explore the situation, a quantitative approach is adopted in analyzing
the translation of these personal reference items.
12.4.2 Analyzing the Translation of Personal Reference
As explained earlier, the third-person NGs are rarely used in English to refer to an
interactant speech role, either as the speaker or as the addressee. Meanwhile, the
use of these NGs in the Chinese ST seems to be significant in constructing and
reflecting the social relations between the speaker and the addressee. Ignoring
these interlingual differences in translation would be expected to affect the reshaping of
the character in the TT. Interestingly, TT readers’ reviews indicate that the English translation effectively portrays Guo Jing’s personal trait of humbleness.
Therefore, it is worth exploring how the translators managed to achieve this
despite the challenges posed by the interlingual differences in realizing personal
reference.
To explore this question, a chi-square test of independence was
adopted, with “social status” (either “equal” or “hierarchical”) as the categorical
independent variable and “translation equivalence in lexicogrammar” (either
“equivalence” or “non-equivalence”) as the categorical dependent variable.
The test checks the hypothesis that the lexicogrammatical
choices made in translating the speaker-reference and the addressee-reference are associated with the social status of the speech roles. In other words, we assume that
the social status of Guo Jing and his addressee is a factor that the translator
took into account when translating the personal reference, which ensured an
effective rendition.
12.4.2.1 Translating Speaker-Reference
Table 12.2 presents the statistical results of translation equivalence in the case of
translating speaker-reference (i.e., the way Guo Jing refers to himself in a
conversation). As shown in Table 12.2, when his addressee has an equal social
status to Guo Jing, the translator is more likely to use an equivalent lexicogrammatical choice in achieving the speaker-reference (72.6%). However, when the
addressee is in a hierarchical relationship with Guo Jing, a much lower percentage
of equivalence is found in the translation (46.7%). Overall, regardless of social status, the translators make more equivalent choices in lexicogrammar (62.1%) than non-equivalent choices (37.9%).
The relationship between the two variables is further explored by a chi-square
test (Table 12.3). The result in Table 12.3 indicates that there is a highly significant relationship between the two variables, χ² (1, n = 559) = 38.399, p < .001.
This means that the translators’ choices between using an equivalent and using
a non-equivalent lexicogrammatical item to translate the speaker-reference are
closely related to the social status of Guo Jing and his addressee.
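The reported statistic can be re-derived from the counts in Table 12.2. The sketch below is a minimal illustration using only the Python standard library, not the statistical package output shown in Table 12.3; the function name `pearson_chi_square` is introduced here, and the test is assumed to be the Pearson chi-square without continuity correction, which is the row the table reports.

```python
import math

# Observed counts from Table 12.2
# (rows: non-equivalence, equivalence; columns: equal, hierarchical).
observed = [[91, 121],
            [241, 106]]


def pearson_chi_square(table):
    """Pearson chi-square (no continuity correction) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence of the two variables.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, n


chi2, df, n = pearson_chi_square(observed)
# For df = 1 only, the upper-tail chi-square probability equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi2({df}, n = {n}) = {chi2:.3f}, p < .001: {p_value < 0.001}")
```

The closed-form p-value used here holds only for one degree of freedom, which is the case for any 2 × 2 table such as this one.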
A closer look at the data shows that, when the speaker Guo Jing is communicating with someone of a higher status, a non-equivalent
choice is more likely to be made in the TT in translating the speaker-reference than
when he talks to someone of equal status. This tendency is
possibly related to the interlingual differences in achieving the speaker-reference.
We find that, in the case of hierarchical relations, the ST uses many
NGs to refer to the speaker himself, such as “晚輩” (the junior) and “在下” (the
inferior), because, as mentioned above, Guo Jing is highly sensitive to orderly propriety in social status, and many characters in a cordial relationship
with Guo Jing enjoy a higher social status, such as his teachers (shifu) and other
senior martial arts masters. When Guo Jing talks to these characters, the use of
third-person NGs in referring to himself as someone junior and/or inferior
effectively depicts his modesty and humbleness. However, as it is not common
to use such third-person NGs in English, it is not surprising that the translator had
to resort to a non-equivalent choice in translating the speaker-reference.
Table 12.2 Translation Equivalence by Social Status (Speaker-Reference)
Translation Equivalence    Social Status
in Lexicogrammar           Equal              Hierarchical       Total
                           Count    %         Count    %         Count    %
Non-Equivalence            91       27.4%     121      53.3%     212      37.9%
Equivalence                241      72.6%     106      46.7%     347      62.1%
Total                      332      100.0%    227      100.0%    559      100.0%
Table 12.3 Chi-Square Test Result of Translation Equivalence and Social Status (Speaker-Reference)
                       Value      df    Asymptotic Significance (2-Sided)    Exact Sig. (2-Sided)    Exact Sig. (1-Sided)
Pearson Chi-Square     38.399a    1     <.001
N of Valid Cases       559
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 86.09.
b. Computed only for a 2 × 2 table.
Based on the observation of the data, the non-equivalent translation of speaker-reference is generally realized in three ways: a shift in lexicogrammatical choice, omission of the speaker-reference, or compensation for
the loss of humbleness by enhancing the sense elsewhere in the sentence.
Examples of each of these non-equivalent types are presented in the following:
(1) Shift in lexicogrammatical choice.
弟子做錯了事,但憑六師父責罰。[ST]
The pupil did it wrong, (who) is ready to accept any punishment from the Sixth Shifu. [literal translation]
I have been foolish, I will accept my Sixth Shifu’s punishment. [TT]
(2) Omission.
弟子 不 壞事,又沒荒廢了學武,因此沒稟告恩師。[ST]
The pupil thought this was not something bad and did no harm to the learning of martial arts, so [I] didn’t report it to [my] obliging teacher. [literal translation]
It didn’t seem to be doing any harm to my training. [TT]
(3) Compensation.
小可不解,請先生指教。[ST]
This worthless person does not understand. Please enlighten [me], sir. [literal translation]
May I humbly apologize? Please enlighten us. [TT]
In example 1, Guo Jing is talking to his Sixth Shifu (teacher), to whom he
admits his mistake. In the ST, a third-person NG, 弟子 (the pupil), is used to
refer to himself, and another third-person NG, 六師父 (Sixth Shifu), is used to
refer to the addressee. In the TT, the translator has made a shift by using the first-person pronoun “I” to refer to the speaker, though an equivalent choice is made
in reference to the addressee. There are at least two possible reasons for
the shift from a third-person NG in the ST to the first-person pronoun in
the TT. Firstly, the use of a third-person NG in referring to the speaker is rare in
English; an equivalent lexicogrammatical choice would therefore
sound very awkward to the TT readers. By shifting to the first-person pronoun “I,”
the speaker-reference becomes very natural to the TT readers. Secondly,
because third-person NGs are uncommon in referring to the speaker,
an equivalent translation may confuse
English readers, who may not be able to immediately figure out who “the
pupil” is. By shifting to a highly unmarked choice in the translation, the referential meaning becomes less ambiguous to the TT readers. This explanation echoes the findings of Bosseaux (2006), who argues that sometimes translators are
obliged to explicitate the references in order to reduce ambiguity.
In example 2, Guo Jing is talking to another shifu (teacher). In this instance, the
third-person NG, 弟子 (the pupil), is used again to refer to himself, and another
third-person NG, 恩師 (my obliging teacher), is used to refer to the addressee. In
the translation, however, both the speaker-reference and the addressee-reference are
omitted, and the translator instead provides a summary of the main points in Guo
Jing’s utterance. A possible explanation for the translation omission is that perhaps
the translator considers the referential meanings in the ST as trivial, and a summary translation of the main idea would suffice, especially when the speaker-reference choice in the ST does not have an immediate equivalent in the TT. However,
whether the translation omission should be considered justifiable is still debatable,
as the omission of the referential meanings of “I” and “you” could make the conversation less interactive and lead to the loss of important linguistic clues of interpersonal meanings that could be significant in portraying Guo Jing’s humbleness.
In example 3, Guo Jing is talking to an educated stranger who is visibly
older than him. In this situation, the addressee has a higher status because age is an
important factor in determining social status in Chinese culture, especially in Guo
Jing’s time. In the ST, again, third-person NGs are used in reference to the speaker,
小可 (this worthless person), and to the addressee, 先生 (sir). It would be very awkward if an equivalent NG were used in English as a speaker-reference. The translator
chose to use the first-person pronoun “I” to refer to the speaker, and in order to
prevent the loss of humbleness in Guo Jing’s utterance, an adverb, “humbly,” was
added to the translation to compensate for the loss of the sense in the self-reference.
From the examples, we can see that, in translating the speaker-reference in a
hierarchical relationship, the translator may make non-equivalent choices to avoid
awkwardness and ambiguity in the TT while trying to find ways to still convey
the character’s humbleness. Such choices can be very effective, as demonstrated
by example 3. However, as demonstrated in example 2, such attempts may not
always be viewed as appropriate, as a non-equivalent choice such as omission
may cause damage to the character development in the TT due to the loss of the
interpersonal implications carried by the omitted reference items.
12.4.2.2 Translating Addressee-Reference
Table 12.4 presents the statistical results of translation equivalence in the case of
translating addressee-reference (i.e., the way Guo Jing refers to his audience
in a conversation).
Table 12.4 Translation Equivalence by Social Status (Addressee-Reference)
Translation Equivalence    Social Status
in Lexicogrammar           Equal              Hierarchical       Total
                           Count    %         Count    %         Count    %
Non-Equivalence            151      38.7%     45       20.8%     196      32.3%
Equivalence                239      61.3%     171      79.2%     410      67.7%
Total                      390      100.0%    216      100.0%    606      100.0%
Table 12.5 Chi-Square Test Result of Translation Equivalence and Social Status (Addressee-Reference)
                       Value      df    Asymptotic Significance (2-Sided)    Exact Sig. (2-Sided)    Exact Sig. (1-Sided)
Pearson Chi-Square     20.319a    1     <.001
N of Valid Cases       606
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 69.86.
b. Computed only for a 2 × 2 table.
The results in Table 12.4 show that, in the case of a hierarchical relationship,
most of the time the translator chose to use an equivalent lexicogrammatical item
in the TT to refer to the addressee (79.2%), whereas the equivalence rate is
noticeably lower when the addressee has an equal status to Guo Jing (61.3%).
Another chi-square test of independence was conducted to explore the relationship between the two variables (i.e., social status and translation equivalence) in
translating the addressee-reference (see Table 12.5).
As shown in Table 12.5, the chi-square test of independence indicates a significant relationship between social status and translation equivalence, χ² (1,
n = 606) = 20.32, p < .001. More specifically, when Guo Jing and his addressee
are in a hierarchical relationship, the translator is more likely to use an equivalent
lexicogrammatical item in the TT in reference to the addressee, compared to the
situation when Guo Jing talks to someone equal in status. This tendency again
suggests that the translators’ choices about addressee-reference are related to the
orderly social hierarchy implied in the context.
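The figures reported for the addressee-reference can be checked directly against the counts in Table 12.4. The following sketch is a hedged illustration in plain Python rather than a reproduction of the authors' workflow; the variable names are introduced here. It recomputes the expected counts under independence (including the minimum expected count given in the note to Table 12.5), the Pearson chi-square value, and the equivalence rate for each social-status group.

```python
# Observed counts from Table 12.4
# (rows: non-equivalence, equivalence; columns: equal, hierarchical).
observed = [[151, 45],
            [239, 171]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]
min_expected = min(min(row) for row in expected)

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Share of equivalent translations within each social-status column.
equivalence_rates = [observed[1][j] / col_totals[j] for j in range(len(col_totals))]

print(f"n = {n}, minimum expected count = {min_expected:.2f}")
print(f"chi2 = {chi2:.3f}")
print([f"{rate:.1%}" for rate in equivalence_rates])
```

Because every expected count is well above 5, the usual condition for relying on the asymptotic chi-square approximation is met, as the table note states.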
Example 4 illustrates how an equivalent translation of the addressee-reference
is achieved in the TT.
(4) Equivalent translation of the addressee-reference.
弟子與前輩輩分差著兩輩,倘若依了前輩之言,必定為人笑駡。[ST]
Your pupil is two generations the Elder’s junior. If [I] follow the Elder’s advice, [I] must be laughed at by people. [literal translation]
Your student is two generations the Master’s junior. If I do as the Elder instructs, I shall be laughed out of the martial world. [TT]
In example 4, Guo Jing talks to a martial arts master who is senior both in age
and in martial arts rank. In the ST, third-person NGs are used in reference to the
speaker Guo Jing, 弟子 (your pupil), and to the addressee, 前輩 (the Elder). In the
translation, except for the first instance, where an equivalent NG (“your student”) is used
in reference to the speaker, a shift occurs in the next two instances of speaker-reference, where the pronoun “I” is used to refer to the speaker Guo Jing. In comparison, equivalent third-person NGs are used in both instances where Guo Jing
makes reference to the addressee: first as “the Master,” and then as “the Elder.”
The use of a third-person NG to refer to a second-person speech role is highly
marked in English, which again indicates that the translator has probably taken
social status into account when translating the personal reference. The uncommon lexicogrammatical items used as reference in the English text are probably
related to the translator’s intention to portray Guo Jing’s humbleness in the TT.
When comparing the findings in reference to the addressee with those in reference to the speaker (see Section 12.4.2.1), we find that the translator’s strategies
seem to be different in translating these two types of personal reference. More specifically, when Guo Jing and his addressee are in a hierarchical relationship, a much
lower equivalence rate is found in the translation of the speaker-reference (46.7%
in Table 12.2) than in the translation of the addressee-reference (79.2% in Table 12.4).
A possible explanation for the difference is that, while trying to avoid awkwardness
in the TT by making some non-equivalent translation choices, the translator also tries
to find ways to compensate for the potential loss of interpersonal meanings originally
embedded in the referential items in the ST, which are important personality indicators of the character. Such compensation strategies may happen in a local sentence
environment when, for example, the lost sense of humbleness is restored in
another part of the sentence, as previously illustrated in example 3. In addition to
this local compensation strategy, the translator also seems to have adopted another
type of compensation which takes place in a broader sense: in response to a non-equivalence in translating one type of the reference, the translator maintains equivalence in another type, which can still carry the interpersonal implications. In our case,
the translator maintained a high level of equivalence in translating the addressee-reference, but a comparatively lower level of equivalence in translating the reference
to the speaker Guo Jing. It is possible to interpret the high equivalence achieved in
translating the addressee-reference as a counterbalancing measure to compensate for
the non-equivalence (such as shifts and omissions) in translating the speaker-reference, since both types of reference typically co-occur in the same utterance setting and
both of them can imply the sense of humbleness of the speaker.
12.5 Conclusion
In this paper, we have explored the translation of personal reference, including
the reference to the speaker and to the addressee, in the novel LoCH. Focusing
on the data involving utterances of the main character Guo Jing, we investigated how
Guo Jing as the speaker makes reference to himself and to his audience, and how
these referential meanings, which are important indicators of Guo Jing’s humbleness, are translated into English. Meanwhile, as the conventional lexicogrammatical
ways of realizing personal reference are very different in Chinese and in English,
we explored how the translator managed to effectively portray an equivalent humble image of Guo Jing, as reflected in the TT readers’ comments.
Our findings show that, in both Chinese and English, personal pronouns,
including first- and second-person pronouns, are used in referring to the speaker
and to the addressee. However, many third-person NGs are also used in referring to the two speech roles in Chinese, which often carry implications of the
hierarchical social relations between Guo Jing and his audience, through which
his humbleness is effectively portrayed. In the English translation, it has been
found that the translator’s decision to maintain or not to maintain an equivalence in lexicogrammatical choices is significantly associated with the social
status of Guo Jing and his addressee. When the two parties are in a hierarchical
relationship, the translator is more likely to make a non-equivalent choice, such
as a shift or an omission, in translating the speaker-reference. Meanwhile, the
translator is likely to make an equivalent choice, such as maintaining the use of
third-person NGs, in translating the addressee-reference. Our findings also indicate that, when making a non-equivalent choice, the translator tries to find
other ways to compensate for the potential meaning loss (in our case, the loss of
the sense of humbleness). Such compensation may happen at the
local sentence level or at the macrotextual level, where the non-equivalence in
translation of self-reference is counterbalanced by the equivalence in translation
of the addressee-reference.
By exploring the effective practice of translating personal reference
in LoCH, we hope to offer useful implications for literary translators who constantly face translation challenges caused by intercultural and interlingual differences. The study is also expected to shed light on how an equivalent character
could be developed in the translation through informed choices in lexicogrammar
when the two languages are culturally distant.
References
Alsúa, Goñi, and Miren Edurne. 2018. “Translating Characters: Eliza Doolittle Rendered into Spanish.” Special Issue, Estudios Irlandeses 13, no. 2: 103–19. https://doi.org/10.24162/EI.
Anthony, Laurence. 2006. “AntConc (Version 3.2.0) Computer Software.” www.laurenceanthony.net. Accessed 12 April 2022.
Barbosa de Vasconcellos, Maria L. 1998. “ ‘Araby’ and Meaning Production in the Source
and Translated Texts: A Systemic Functional View of Translation Quality Assessment.”
Cadernos de Tradução, no. 3: 215–54. https://doi.org/10.5007/%25x.
Bosseaux, Charlotte. 2006. “Who’s Afraid of Virginia’s You: A Corpus-based Study
of the French Translations of the Waves.” Meta 51, no. 3: 599–610. https://doi.org/10.7202/013565ar.
Brown, Roger, and Albert Gilman. 2012. “The Pronouns of Power and Solidarity.” In
Readings in the Sociology of Language, edited by Joshua A. Fishman, 252–75. Berlin:
De Gruyter Mouton.
Caplan, Jennifer E., Kiki Adams, and Ryan L. Boyd. 2020. “Personality and Language.”
In The Wiley Encyclopedia of Personality and Individual Differences, edited by Bernardo J. Carducci and Christopher S. Nave, 311–16. Hoboken:
Wiley-Blackwell.
Chatman, Seymour Benjamin. 1978. Story and Discourse: Narrative Structure in Fiction
and Film, Cornell Paperbacks. Ithaca, NY: Cornell University Press.
Chen, Lin, and Ruoyu Dai. 2021. “Translator’s Narrative Intervention in the English Translation of Jin Yong’s The Legend of Condor Heroes.” Perspectives 1–16. https://doi.org/10.1080/0907676X.2021.1974062.
Feng Youlan 馮友蘭. 1961. Zhongguo zhexueshi 中國哲學史 [History of Chinese Philosophy]. Beijing: Zhong Hua Book Company.
Halliday, Michael A. K., and Ruqaiya Hasan. 1976. Cohesion in English. London and New
York: Routledge.
Huang, Yongliang. 2008. “Politeness Principle in Cross-Culture Communication.” English
Language Teaching 1, no. 1: 96–101. https://eric.ed.gov/?id=EJ1082589.
Ireland, Molly E., and Matthias R. Mehl. 2014. “Natural Language Use as a Marker of
Personality.” In The Oxford Handbook of Language and Social Psychology, edited by
Thomas M. Holtgraves, 201–18. New York: Oxford University Press.
Jin Yong 金庸. 1980/2002. Shediao yingxiong zhuan 射雕英雄傳 [Legends of the Condor
Heroes]. Guangzhou: Guangzhou Publishing House.
Lee, Sang B. 2018. “Shifts in Characterization in Literary Translation: Representation of
the ‘I’-Protagonist of Yi Sang’s Wings.” Acta Koreana 21, no. 1: 283–307. https://doi.org/10.18399/acta.2018.21.1.011.
Lin, Deng. 2020. “A Corpus-Based Study on Character Image Shaping in English Translated Version of Kuang Ren Ri Ji.” In Proceedings of the 2nd International Conference
on Literature, Art and Human Development (ICLAHD 2020), edited by Malini Ganapathy, et al., 344–53. Amsterdam and Paris: Atlantis Press. https://dx.doi.org/10.2991/assehr.k.201215.404.
Ma, Yuanyi, and Bo Wang. 2020. “Demystifying Translation as Recreation of Meaning
Through Choice.” In Translating Tagore’s Stray Birds into Chinese, edited by Yuanyi Ma
and Bo Wang, 15–30. London: Routledge.
Mastropierro, Lorenzo. 2020. “The Translation of Reporting Verbs in Italian: The Case of
Harry Potter Series.” International Journal of Corpus Linguistics 25, no. 3: 241–69.
https://doi.org/10.1075/ijcl.19124.mas.
Mead, Gerald. 1990. “The Representation of Fictional Character.” Style 24, no. 3: 440–52.
Rodríguez Herrera, José M. 2015. “The Adventures of Huckleberry Finn and Jim in
China: A Case of What Corpus Pragmatics Can Do for the Translation of Dialect.”
Digital Scholarship in the Humanities 32, no. 2: 385–97. https://doi.org/10.1093/llc/fqv058.
Ruano, Pablo. 2017. “Corpus Methodologies in Literary Translation Studies: An Analysis
of Speech Verbs in Four Spanish Translations of Hard Times.” Meta 62, no. 1: 94–113.
https://doi.org/10.7202/1040468ar.
Wales, Katie. 1996. Personal Pronouns in Present-Day English. Cambridge: Cambridge
University Press.
Xiao, Hang. 2014. “CorpusWordParser (Version 3.0.0.0) Computer Software.” www.cncorpus.org. Accessed 12 April 2022.
Xu Xueying 徐雪英, and Gigi Chang 張菁. 2020. “Cong Jin Yong Shediao yingxiong
zhuan yingyi kan Zhongguo wenhua zouxiang shijie” 從金庸《射雕英雄傳》英譯看中國文化如何走向世界
[How Chinese culture goes global from the translation of Jin
Yong’s Legends of the Condor Heroes]. Zhejiang Academic Journal 浙江学刊, no. 3:
42–53.
Zhang Mi 張汨, and Wang Zhiwei 王志偉. 2020. “Jin Yong Shediao yingxiong zhuan zai
yingyu shijie de jieshou yu pingjia” 金庸《射雕英雄傳》在英語世界的接受與評價
[The Reception and Evaluation of Jin Yong’s Legends of the Condor Heroes in the English World]. East Journal of Translation 東方翻譯, no. 5: 18–25.
13 Lexical Bundles in the
Fictional Dialogues of Two
Hongloumeng Translations
A Corpus-Assisted Approach
Kanglong Liu, Joyce Oiwun Cheung, and
Riccardo Moratto
13.1 Introduction
Acclaimed1 as one of China’s four great classical novels, the Chinese classic
Dream of the Red Chamber, or in Chinese, Hongloumeng (hereinafter HLM),
has drawn attention from both literary and translation researchers over decades. The work is widely acknowledged as one of the greatest works of Chinese fiction,
for it paints a vivid picture of the aristocratic families against the broad social
background of the late Qing Dynasty (1644–1911). The first 80 chapters of this
120-chapter chronicle were composed by the Qing writer Cao Xueqin, and the
Qing scholar Gao E completed the remaining 40 chapters after Cao’s death (Cao
and Gao 1982).
As a renowned Chinese literary work, the novel has been translated numerous
times, hence providing scholars with a good source for comparative translation
analysis. From 1979 to 2013, over 1,300 HLM research articles were published,
with a majority focusing on the English translations of this classic (Ran and Yang
2013). There are three full-length versions, namely, The Story of the Stone, translated by David Hawkes and his son-in-law, John Minford; A Dream of Red Mansions, by Xianyi Yang and his wife, Gladys Yang; and The Red Chamber Dream,
by B. S. Bonsall. The Bonsall version has never been officially published but
is currently archived in the University of Hong Kong Library (Bonsall 2004),
whereas the first two versions have been read by many people across the globe.
Hawkes translated the first 80 chapters, and Minford finished the remaining 40,
which parallels the division of labor between the two HLM writers, Cao Xueqin
and Gao E. On the other hand, Xianyi Yang seemed to be the major translator of
HLM, while his wife, Gladys Yang, served an assisting role. As stated by their
daughter Chi Yang (cited in Li et al. 2011, 163):
When he [Xianyi Yang] was translating at his top speed, he didn’t write, but
simply rendered orally while my mother would type the translation on a typewriter. While she was typing the text, she also polished or edited it. So the
translation was ready when all this was done.
DOI: 10.4324/9781003298328-14
Wang (2016) comments that Hawkes and Minford’s HLM translation is extremely
popular among the broad reading public in comparison to the Yangs’ version. Such
a difference in popularity has led to a number of studies exploring how the two versions differ in their linguistic features and in the translation strategies employed
by their respective translators.
The advances in corpus-based translation studies initiated by Baker (1993)
have provided an impetus for translation/translator style research. According to
Baker (2000, 244), “it is as impossible to produce a stretch of language in a
totally impersonal way as it is to handle an object without leaving one’s fingerprints on it.” Thus, similar to the research on translation universals, researchers
have made use of various linguistic indicators, such as type-token ratio, sentence length, and lexical density, which are believed to reflect the translators’ “characteristic use of language and linguistic habits” (Baker 2000, 245), to examine
how translators or translations differ. So far as HLM translations are concerned,
researchers have compiled parallel and comparable corpora to examine how
translations differ in a range of the aforementioned indicators. For example, previous research on HLM translations has identified that Hawkes diverged from
the Yangs in various stylistic features (Li et al. 2011; Liu 2008; Liu and Afzaal
2021). In particular, based on the first 15 chapters of the two translation versions, Li et al. (2011) found that Hawkes’s version contained more tokens and
used longer sentences than did the Yangs’, whereas the latter used a wider range
of words, as reflected in a higher type-token ratio. Other linguistic indicators that
have been used to study HLM translations include nominalization (Hou 2013),
vocabulary richness (Fang and Liu 2015), and even idioms (Su 2021). To a large
extent, researchers have confined themselves to word-level indicators when
approaching the style of HLM translations. As argued by Mastropierro (2018), the
use of lexical bundles (LBs), or key clusters, can serve as a reliable indicator of
a translator’s style, as they can reveal the translator’s idiosyncrasies beyond the
use of individual words. Following Mastropierro, the current study makes use of lexical
bundles as a linguistic indicator to examine the fictional dialogues of the first 80
HLM chapters as translated by David Hawkes and by Xianyi Yang and
Gladys Yang.
13.2 Literature Review
13.2.1 Translation Style Research
In order to properly define “translation style,” we must know the definition of
style in the field of literary studies. Crystal (1999, 323) stated that style is “any
situationally distinctive use of language, and of the choices made by individuals and social groups in their use of language.” Leech and Short (1981) specifically proposed four main categories for style analysis in literary works, including
lexical category, grammatical category, figures of speech, as well as cohesion and
context. Style research in the field of translation studies, to a large extent, borrows heavily from similar research in literary studies. With the rise of descriptive
Lexical Bundles in the Dialogues of Hongloumeng Translations
translation studies (DTS), which aims at studying translation in its own right and
situating it within the target social-cultural background, translation style research
has attracted considerable scholarly attention from researchers working in corpus-based translation studies. The traditional prescriptive notion that translation should
be faithful to the source text has largely lost its appeal due to the shift toward DTS.
Generally speaking, style research mainly falls into two major strands: translator
style and translation style. The first one concerns the use of a comparable corpus
(Bosseaux 2007; Saldanha 2011) to study the oeuvre of a translator as opposed
to that of other translators by capturing “the translator’s characteristic use of language, his or
her individual profile of linguistic habits, compared to other translators” (Baker
2000, 245). On the other hand, translation style research is often conducted based
on a parallel corpus to examine how two or more translations of a particular work
diverge from each other in certain linguistic indicators or features (Li et al. 2011;
Mastropierro 2018). However, the two terms are sometimes used interchangeably,
as it is practically impossible to examine all the translated works of a translator.
Similar to the translation universals (TUs) research, which has benefited from the
use of corpus tools, translation style research has also benefited from the methodology of TUs research, including the use of linguistic indicators and analytical
frameworks. In the case of HLM, the two full-length translations, which were
done at roughly the same time (i.e., 1970s–1980s), have provided a good source
for the current study to examine how they differ in style.
13.2.2 Previous Studies on Style in English Translations of HLM
Over the years, HLM and its translations have attracted much attention from
translation scholars. As a monumental literary work, HLM has multiple translations, including some partial and complete translations. So far, most research
efforts have been devoted to comparing the two full-length translation versions,
namely, the one translated by David Hawkes and John Minford, and the other
by Xianyi Yang and Gladys Yang. Early research on HLM translations is mainly
based on qualitative deliberations. According to Yan’s (2005) systematic review
of 50 research articles on HLM translations, a majority have adopted comparative
methods to study a wide range of topics, ranging from poems to rhetorical devices.
Some of the most frequently investigated topics in HLM translations include
culture-specific items, book titles, idioms, character names, rhetorical devices,
and history of translation. More recent publications also investigated how social
terms (Tsao 2020) and material culture-loaded words (Yu 2020) are translated in
HLM translations. Some other recent works also scrutinized letters exchanged
between translators to discuss the commissioners behind HLM translations (Tong
and Morgan 2021). Qualitative HLM research in general has studied a wide range
of issues related to HLM translations in a descriptive yet case-by-case manner.2
With the rise of corpus linguistics in the field of translation studies, corpus
methods have also been adopted to systematically analyze styles in the HLM
translations. To this end, researchers have often compiled parallel corpora consisting
of the Chinese source text and the English translations. For example, Liu (2008)
compared how titles and honorifics were handled in HLM translations. Ji and
Oakes (2018) studied earlier HLM translations produced in the nineteenth century
using corpus methods and found that Edward Bowra used more conjunctions and
genitives, while H. Bencraft Joly used more determiners, which largely characterized Joly’s translator style. Joly’s version was also compared with the Yangs’
in Hou (2013), which revealed that nominalization construed formality in Joly’s
version but conciseness in the Yangs’ version. In two doctoral theses, Hawkes’s
and the Yangs’ HLM translations have been studied in detail: Mu (2012) found
that Hawkes’s style emphasized events and feelings by following the Western
narrative convention; on the other hand, the Yangs’ style was found to be non-event-oriented and less direct. Wu (2021) further used Biber’s multidimensional
analysis to analyze the acceptability of Hawkes’s and the Yangs’ versions respectively. From the development of corpus research on HLM translations, we can see
the use of various linguistic indicators – from tokens and lexical types in Li et al.
(2011), sTTR and lambda in Fang and Liu (2015), to metaphorical idioms in Su
(2021) and lexical bundles in Liu and Afzaal (2021).
13.2.3 Lexical Bundles as an Indicator in Translation Style Research
Lexical bundles (LBs), also known as multiword expressions (MWEs), n-grams,
and formulaic sequences, are recurring lexical sequences in a register (Biber
et al. 2004). In the field of second language acquisition, the use of LBs has been
found to be one of the features distinguishing native from non-native English
(e.g., Chen and Baker 2010; Wei 2007); recently, LBs have also been affirmed as an
effective indicator for investigating translator’s style. Mastropierro (2018)
compared LBs in two English-Italian translations of a thriller and found that one
translator used significantly more bundles than the other. While acknowledging
the merits of using LBs in translation style research, Mastropierro (2018) proposed
that LBs can be categorized into groups which may disclose a translator’s linguistic
patterns and habits. As noted by Mahlberg et al. (2019), LBs are sometimes marked
features of a specific character; thus, the use of different LBs can help construct
characters through their various functions of “negotiation of information, turn-taking,
politeness, and first-person narration” (Mahlberg and Hoey 2012, 76). In terms of
translation, translators’ use of LBs not only shows their linguistic preferences and
characterization of the fictional characters but also impacts on the readability of
their translations. Shrefler (2011) argued that Martin Luther’s German translation
of the Bible is more reader-friendly because of his frequent use of verb-related LBs.
Accordingly, the use of LBs is closely connected with translation style research.
As a matter of fact, LBs have been used in Hongloumeng translation research.
Based on the first 15 chapters of HLM translations, Liu and Afzaal (2021) demonstrated that Hawkes’s translation is embedded with a greater number and variety
of LBs than the Yangs’ version. Although their study has shown major differences in the use of LBs between the two HLM translations, it is believed that a
study taking all 80 chapters into consideration should yield more rigorous results.
Moreover, responding to Axelsson’s (2008) call for treating fictional dialogue and
narration as two separate genres, a study examining the use of LBs in HLM fictional dialogues can yield some new insights into HLM translation style research.
Therefore, the current study focuses on the dialogue part of both translations (all
80 chapters) to examine how the two diverge in translation style. The representation of LBs in the respective translations serves as a departure point for the identification of “the specific translator’s idiosyncrasies and conscious interpretive or
unconscious idiolectal choices” (Munday 2012, 144).
13.2.4 Research Questions
Based on the foregoing review, we can see that lexical bundles can be used as
a reliable indicator for translation style research. Though such an indicator has
been used to explore some parts of HLM translations (Liu and Afzaal 2021), no
research has been conducted to systematically examine all 80 chapters translated
by Hawkes and the Yangs. Besides, no research has so far attempted to separate
HLM into fictional dialogues and narration. Thus, we believe that a study aiming
at examining how lexical bundles are represented in HLM fictional dialogues can
provide novel insights into this line of research. In this study, we aim at addressing
the following three research questions:
(1) Do the two Hongloumeng translations differ in style as represented by the
frequency and types of lexical bundles?
(2) If such differences are identified, do they diverge in terms of the structural
and functional categories of the key lexical bundles?
(3) What are the possible factors contributing to the different use of lexical bundles in the two Hongloumeng translations?
13.3 Data and Procedure
13.3.1 Corpus
The current study made use of the English-Chinese Parallel Corpus of Hongloumeng, which was built by Li et al. (2011). The corpus was compiled by either
scanning hard copies or downloading soft copies from the internet. It consists of
three parts running in parallel, namely, the original Chinese texts, the translation
by Hawkes and Minford, and the translation by the Yangs. The current research
is based on the first 80 chapters of the two translations. In other words, the part
translated by Minford is not included in our study.
A self-written Python program was utilized to automatically extract the dialogues using punctuation (in this case, quotation marks) to separate fictional dialogues from narrations. The data were then manually proofread to ensure accuracy,
as some quotation marks are used to mark titles or emphasize certain details instead
of indicating dialogues. Upon completion, we have compiled two corpora, namely,
the Yangs Dialogue Corpus (YD) and the Hawkes Dialogue Corpus (HD). YD consists of 219,478 tokens (i.e., the total number of orthographic words separated by
spaces and punctuation) and 9,801 types (i.e., the number of distinct words in the
corpus), while HD has 280,682 tokens and 10,734 types (see Table 13.1). Although
Hawkes used more words to translate the first 80 chapters, by dividing the number
of types by tokens (i.e., type-token ratio or TTR) we can see a higher TTR in YD,
showing that the Yangs used a wider range of distinct words. As YD and HD differ
in size, standardized TTR (sTTR) of the two corpora was also calculated by working out the average of all the TTRs per 1,000 words. YD has a higher sTTR than
HD, confirming that the Yangs indeed used more distinct words than Hawkes did.

Table 13.1 Descriptive Statistics of Fictional Dialogues in HD and YD

Measures    HD         YD
Tokens      280,716    219,768
Types       10,730     9,801
TTR         3.82       4.47
sTTR        39.28      42.14

Note: TTR = type-token ratio; sTTR = standardized type-token ratio.
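The extraction step and the descriptive statistics described above can be sketched in a few lines of Python. The chapter does not reproduce the authors' program, so everything below is an illustrative assumption: the function names (extract_dialogue, tokenize, ttr, sttr), the tokenization rule, and the decision to drop the final partial chunk when averaging. It is a minimal sketch rather than the authors' implementation, and its output would still need the manual proofreading mentioned above, since quotation marks also enclose titles and emphasized words.

```python
import re

# Hypothetical pattern: text between curly or straight double quotation marks.
QUOTED = re.compile(r'“([^”]*)”|"([^"]*)"')


def extract_dialogue(text):
    """Return every stretch of text enclosed in double quotation marks."""
    return [curly or straight for curly, straight in QUOTED.findall(text)]


def tokenize(text):
    """Lowercase orthographic words; apostrophes and hyphens stay word-internal."""
    return re.findall(r"[a-z0-9]+(?:['’-][a-z0-9]+)*", text.lower())


def ttr(tokens):
    """Type-token ratio expressed as a percentage."""
    return 100 * len(set(tokens)) / len(tokens) if tokens else 0.0


def sttr(tokens, chunk=1000):
    """Mean TTR over consecutive full chunks of `chunk` tokens.

    Assumption: a trailing partial chunk is ignored; if the text is shorter
    than one chunk, the plain TTR is returned instead.
    """
    full = [tokens[i:i + chunk] for i in range(0, len(tokens) - chunk + 1, chunk)]
    if not full:
        return ttr(tokens)
    return sum(ttr(c) for c in full) / len(full)
```

Applied to the counts reported in the text, 100 × 9,801 / 219,478 ≈ 4.47 for YD and 100 × 10,734 / 280,682 ≈ 3.82 for HD, which agrees with the TTR values in Table 13.1.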
13.3.2 Analytical Framework
In order to identify the representative LBs used by Hawkes and the Yangs, we used
WordSmith 8.0 (Scott 2020) first to turn both corpora into index files, which were
then used to generate lists of three-word and four-word LBs with their corresponding frequencies. Most studies have opted for a frequency threshold for retrieving
LBs, ranging from 10 (Biber et al. 1999), 20 (Cortes 2004; Hyland 2008), to 40
times (Biber et al. 2004; Pan et al. 2016) per million words (pmw). In view of the
corpus size and the purpose of the current study, we have opted for a threshold of
three times to retrieve the three-word and four-word LBs. Details of the retrieved
LBs can be seen in Table 13.2. Based on the statistics, YD contains fewer tokens
and types of both three-word and four-word LBs than HD. This is to be expected, considering the relatively smaller size of YD compared to HD. Further comparison of
the TTRs reveals that YD has higher TTRs in both three-word and four-word LBs.
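The retrieval step can be illustrated with a short Python sketch. It is not a reproduction of WordSmith's index-based procedure: the function names (lexical_bundles, bundle_stats, per_million) are hypothetical, the threshold of three is assumed to refer to raw occurrences, and bundles are counted over a flat token list without regard to sentence or utterance boundaries.

```python
from collections import Counter


def lexical_bundles(tokens, n, min_freq=3):
    """Count contiguous n-word sequences and keep those with freq >= min_freq."""
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {" ".join(gram): freq for gram, freq in counts.items() if freq >= min_freq}


def bundle_stats(bundles):
    """Return (tokens, types, TTR in percent) for a bundle frequency list."""
    tokens = sum(bundles.values())   # total occurrences of all retained bundles
    types = len(bundles)             # number of distinct retained bundles
    ratio = 100 * types / tokens if tokens else 0.0
    return tokens, types, ratio


def per_million(raw_freq, corpus_size):
    """Convert a raw frequency into a rate per million words (pmw)."""
    return raw_freq * 1_000_000 / corpus_size
```

Under the raw-count assumption, per_million shows how the chosen cut-off relates to the thresholds cited above: with the token counts in Table 13.1, three occurrences correspond to roughly 10.7 pmw in HD and roughly 13.7 pmw in YD. The TTR column of Table 13.2 follows the same arithmetic as bundle_stats, for example 100 × 10,498 / 60,538 ≈ 17.34 for the three-word LBs in HD.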
Table 13.2 Types and Tokens of 3-Word and 4-Word LBs in HD and YD

Measures                 HD        YD
Tokens of 3-word LBs     60,538    32,692
Types of 3-word LBs      10,498    6,235
TTR of 3-word LBs        17.34     19.07
Tokens of 4-word LBs     12,867    5,972
Types of 4-word LBs      2,931     1,413
TTR of 4-word LBs        22.78     23.66
Based on the two lists of LBs, we further adopted the structural and functional
classifications framework proposed by Biber et al. (2004) to investigate how
Hawkes and the Yangs used LBs differently. Structural classification is a system
which broadly categorizes expressions into different groups based on their part
of speech (POS) information. LBs which contain at least one verb component are classified as verb phrase-based (VP-based). LBs which do not
have any verb components are classified as noun phrase-based (NP-based) if
a noun component comes before prepositions or other POS components. In case a
preposition comes before nouns, the expression is classified as prepositional
phrase-based (PP-based). Those without any verbs, nouns, or prepositions
are classified as others. While structural classification is useful in differentiating the structural patterns of LBs preferred by the respective translators, functional
classification enables a comparison of the LBs in terms of their communicative
goals. The LBs can be broadly categorized into stance, discourse organizers, referential, and special conversational functions, depending on their use in context.
Sometimes an expression may perform more than one function. For example,
I want to can be a discourse organizer which introduces a topic; alternatively, I
want to can also be used to express desire. To decide on the major function of
an expression, we employed a context-based annotation. In other words, the LBs
were studied in the context before we ultimately annotated the expression with its
key function.
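The structural rules described above can be expressed as a small decision procedure. The sketch below is one possible reading of those rules, not the authors' annotation tool: it assumes each bundle has already been POS-tagged, and it uses coarse tags in the style of the Universal POS tagset (VERB, AUX, NOUN, PROPN, ADP), which the chapter does not specify. Whether pronouns count as noun components is not stated in the text; here they do not.

```python
# Assumed tag groupings (hypothetical; the chapter names no tagset).
VERB_TAGS = {"VERB", "AUX"}    # lexical verbs, auxiliaries, and modals
NOUN_TAGS = {"NOUN", "PROPN"}  # common and proper nouns
PREP_TAGS = {"ADP"}            # prepositions


def classify_structure(pos_tags):
    """Assign a bundle to VP-based, NP-based, PP-based, or others.

    Rule order follows the description in the text:
    1. any verb component         -> VP-based
    2. a noun before any preposition -> NP-based
    3. a preposition before any noun -> PP-based
    4. no verb, noun, or preposition -> others
    """
    tags = [t.upper() for t in pos_tags]
    if any(t in VERB_TAGS for t in tags):
        return "VP-based"
    for t in tags:
        if t in NOUN_TAGS:
            return "NP-based"
        if t in PREP_TAGS:
            return "PP-based"
    return "others"
```

With these assumptions, a bundle tagged PRON VERB PRON (such as I think you) is VP-based, ADP DET NOUN (such as for no reason) is PP-based, and SCONJ PART ADP (such as if not for) is also PP-based because the preposition is the first noun or preposition component encountered. Functional classification is not covered here, since the authors assigned functions by reading each bundle in context.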
In this study, we conducted two rounds of Key-LBs analysis. In the first
round, we compared the YD LBs against the HD LBs as the reference corpus
to identify the Key-LBs used in YD. In the second round, the two lists were
reversed in order to identify the Key-LBs in HD. LBs that passed the keyness
test in these analyses (i.e., log-likelihood > 6.63) were considered Key-LBs,
meaning that these LBs have an unusually high frequency in their respective
corpus.3 Among these LBs, some content expressions, mainly character and
place names, such as Our Old Lady, which are irrelevant to the analysis, were
removed, leaving us with 57 and 139 LB types in YD and HD, respectively.
We applied the structural classification (i.e., NP-based, VP-based, PP-based, and
others) and functional classification (i.e., stance, discourse organizers, referential, and special conversational functions) (Biber et al. 2004) to classify the
Key-LBs, with the ultimate aim of identifying how HD and YD diverge in style
as represented by the use of LBs.
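A keyness test of this kind can be sketched as follows. The code uses the two-term log-likelihood formula that is common in corpus linguistics; whether this matches the exact computation performed by WordSmith, and which totals served as the corpus sizes, are assumptions, as the chapter does not give these details. The function names log_likelihood and is_key are hypothetical. The cut-off of 6.63 is approximately the chi-square critical value for p < 0.01 with one degree of freedom.

```python
import math


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Two-term log-likelihood for one item in a study and a reference corpus."""
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected frequencies if the item were spread evenly across both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    ll = 0.0
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll


def is_key(freq_study, size_study, freq_ref, size_ref, threshold=6.63):
    """True if the item is overrepresented in the study corpus above the threshold."""
    overused = freq_study / size_study > freq_ref / size_ref
    return overused and log_likelihood(freq_study, size_study, freq_ref, size_ref) > threshold
```

Running the comparison twice with the roles of the two corpora swapped mirrors the two rounds described above: the first call treats YD as the study corpus and HD as the reference, and the second reverses them. The overuse check in is_key keeps only bundles that are more frequent in the study corpus, since the log-likelihood value alone does not indicate the direction of the difference.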
13.4 Results
13.4.1 Structural Patterns
Although YD yielded a higher TTR of LBs than HD, we only identified 57 Key-LBs in YD; HD, on the other hand, showed a lower TTR of LBs but recorded 139
Key-LBs (see Table 13.3). This reveals that TTR might not be a reliable indicator
if we are comparing two LB lists that differ in length. We found that HD and YD
differ not only in the number of Key-LBs but also in structures and functions.
While the Key-LBs in both HD and YD are mostly VP-based (i.e., containing a
verb component), HD has a higher proportion of VP-based Key-LBs (75.54%) than
YD (61.40%). The result shows that HD is closer to Conrad and Biber’s
(2005) finding that 90% of the LBs used in spoken British English involve verb
components. On the other hand, a higher proportion of PP-based Key-LBs (i.e.,
bundles starting with a preposition) is found in YD (17.54%) than in HD (7.91%).
Since the majority of Key-LBs in HD and YD are VP-based, we proceeded to study
their subpatterns (see Table 13.4). Our findings revealed that 40.95% of the VP-based Key-LBs in HD started with a
personal pronoun (e.g., I, you, she), 29.52% started with a verb (e.g., be, do, have,
modal, or other verbs), and 20.95% started with either a conjunction or linking
words, such as that and to. Likewise, the PP-based Key-LBs were further categorized into subcategories (see Table 13.5).
Table 13.3 Structural Classifications of Key-LBs in HD and YD

Structural Classifications    HD Key-LBs    %        YD Key-LBs    %
NP-based                      21            15.11    9             15.79
VP-based                      105           75.54    35            61.40
PP-based                      11            7.91     10            17.54
Others                        2             1.44     3             5.26
Total                         139           100      57            100
Table 13.4 Statistics of VP-Based Key-LBs in HD and YD

VP-Based Key-LBs                                                               Types in HD    %        Types in YD    %
Starting with personal pronouns                                                43             40.95    5              14.29
Starting with verbs (including be, do, have, modal verbs, and other verbs)     31             29.52    15             42.86
Starting with conjunctions, that, to, or not to                                22             20.95    6              17.14
Starting with wh-words                                                         5              4.76     8              22.86
Starting with existential markers (including there and this)                   2              1.90     1              2.86
Starting with an adjective                                                     2              1.90     0              0
Total                                                                          105            100      35             100
Table 13.5 Statistics of PP-Based Key-LBs in YD

PP-Based Key-LBs                                HD    %      YD    %
Starting with a preposition and a determiner    5     45%    6     60.00
Starting with two prepositions                  0     0%     1     10.00
Starting with conjunction                       3     27%    3     30.00
Total                                           3     27%    10    100
13.4.2 Contextual Use of Key VP-Based and PP-Based LBs
In this section, the two most common types of VP-based Key-LBs (i.e., those
starting with personal pronouns and those starting with verbs) and the PP-based
Key-LBs will be further discussed in relation to some examples extracted from
HD and YD.
Many of Hawkes’s VP-based Key-LBs are headed by a personal pronoun.
I think you is the LB that is most significantly different between HD and YD
(LL: 49.70), showing a clear overrepresentation in HD. This phrase usually
appears at the beginning of a sentence and manifests the subject prominence in
English. As we can see in excerpt 1, the suggestion of paying someone a visit is
expressed in the form I think you should (i.e., first-person pronoun + base verb
+ second-person pronoun) in HD. Meanwhile, such a subject-predicate relation
is absent in YD, which simply used the directive Go to express the character’s
permission for the visit, a topic that has already been introduced in
the previous dialogue exchange. YD prioritized the topic (Go), whereas HD
adhered to the English convention of subject prominence (e.g., She is, I think
you). As can be seen, Hawkes tended to use subject-predicate structures (e.g.,
personal pronoun + verbs), whereas such structures are less common in the Yangs’
version.
Excerpt 1
“你看看就過去罷,那 侄兒媳婦。” [Source] (Chapter 11)
“Yes,” “she is your nephew’s wife. I think you should. Just look in for a
moment, though, and then join the rest of us.” [Hawkes]
“Go if you want, but don’t be long,” “Remember she’s your nephew’s
wife.” [Yangs]
A similar contrast is also observed in Key-LBs which begin with a verb. Ought
to be (LL: 36.02) is the most significant Key-LB in HD that starts with a verb
component. As we can see in excerpt 2, ought to be follows the subject you in
HD. In his rendition, Hawkes translated the invitation 請, qing (literal translation: please), using a subject (you) and its predicate (ought to be getting back
. . .). The Yangs, on the other hand, did not use the subject-predicate structure but
instead retained the semantic meaning (please) of 請, qing, in the source text.
Since please is a near equivalent of 請, qing, the Yangs used literal translation
by following the same sentence order as that of the source text. The subject is again
omitted in the Yangs’ version. Excerpts 1 and 2 are just two of the many examples
contrasting Hawkes’s and the Yangs’ preferences for subject-predicate and topic-comment structures, respectively. Overall, the evidence suggests that Hawkes’s
Key-LBs follow the spoken English convention in which most of the LBs involve
verb components (Conrad and Biber 2005) structured in the form of personal
pronouns + verb (Biber 2009).
Excerpt 2
“如今來回老祖宗,債主已去,不用躲了。已預
希嫩的野鶏,請吃
晚飯去,再遲一會子就老了。” [Source] (Chapter 50)
“So now your creditors have gone, you can come out of hiding. You ought
to be getting back now in any case. You’ve got some nice, tender pheasant for
dinner and if you leave it much longer it will spoil.” [Hawkes]
“Now I’ve come to report to our Old Ancestress: Your duns have gone,
you can come out of hiding. I’ve some very tender pheasant ready. So please
come back for dinner. If you leave it any later, it’ll be overcooked.” [Yang]
However, this is not the case in YD. Although more than half of the Yangs’
Key-LBs are still VP-based, this proportion is lower than that of HD because
17.54% are PP-based LBs. Meanwhile, only 7.91% of Hawkes’s Key-LBs
are PP-based. This indicates that YD used more PP-based LBs, which were significantly underused by Hawkes when translating Hongloumeng (see Table 13.3).
Yip (1995, 78) pointed out that bare noun phrases are often placed at the beginning
of a Chinese sentence to refer to a topic due to topic prominence, but such a syntactic structure (i.e., sentences beginning with a bare noun) is not really natural in
English. Hence, Yip believed that Chinese speakers strategically use prepositional
phrases to encapsulate a bare noun phrase when they need to first talk about a topic.
Based on the results, it can be seen that using a prepositional phrase to start
a sentence is more prevalent in YD than in HD. For example, the Yangs used If not
for (LL: 29.64) significantly more frequently than Hawkes did. If not for is a typical prepositional phrase which consists of the conjunction if, the adverb not, and
the preposition for. In excerpt 3, we can see that the source text in Chinese is structured as 要 不
(if not) and 我 (me), which the Yangs directly translated into If
not for me. As the focus is on the speaker holding back the other one from attacking people, the Yangs kept this topic in the translation and used the prepositional
phrase If not for to topicalize the object me. The syntactic order of If not for me is
almost an equivalent to the dependent clause 要不
(literally: if not me) in the
Chinese source text. Conversely, Hawkes followed the subject-prominent convention by using a verb phrase to start the sentence. He used the verb-pronoun-verb
clause Suppose I hadn’t been here to describe a condition that is contrary to fact.
Excerpt 3
“要不
,你要傷了他的命,這會子可怎麽樣?” [Source] (Chapter 44)
“If not for me you might have killed her. What do you intend to do now?”
[Yang]
“Suppose I hadn’t been here to protect her and you really had done her
an injury, what would you have had to say for yourself then, I wonder?”
[Hawkes]
The Yangs also used prepositional phrases at the end of sentences. For example,
they extensively used for no reason to express the absurdity of a situation. For no
reason is one of the Key-LBs in YD consisting of a preposition, a determiner, and
a noun, which yielded a very high keyness value (LL: 31.29), meaning that it is
overrepresented in YD relative to HD. PP-based LBs like for no reason, when placed at
the end of a sentence, often serve as an adverbial. From excerpt 4 we can see that
the Yangs used this prepositional phrase to describe the unlikelihood that someone would offend those people. The Yangs not only used prepositional phrases
to make noun phrase topics grammatically well-formed (e.g., excerpt 3) but also
used them to describe actions. However, no such substantial use of prepositional
phrases was found in Hawkes’s dialogue translation. Hawkes used a variety of
linguistic choices to achieve the same purpose; in this case, he used the adverb
possibly to express the unlikelihood of the event. So far, our study has found that
there are more unique VP-based LBs in HD and more distinctive PP-based LBs in
YD. Our findings revealed that the Yangs seemed to prefer using prepositions to
introduce noun topics, while Hawkes used more verb phrases to express subject-predicate relations.
Excerpt 4
“誰可
的得罪着他?” [Source] (Chapter 78)
“Why should anyone offend them for no reason.” [Yang]
“Who could possibly have offended her?” [Hawkes]
13.4.3 Functional Classifications
After manual classification, it was found that 47.48% of Hawkes’s Key-LBs
mainly expressed stances, while 36.84% of the Yangs’ Key-LBs mainly served
as referential bundles (see Table 13.6). This means almost half of Hawkes’s
unique LBs come from his use of stance markers. Thus, these two functional
categories were further examined in detail. In order to show how HD diverged
from the YD in the use of stance markers, we further categorized the stance
markers for their subpatterns (see Table 13.7). Likewise, we also further categorized the referential Key-LBs in HD and YD for their subpatterns (see
Table 13.8).
Table 13.6 Functional Classifications of Key-LBs in HD and YD

Functional Classifications          HD     %        YD    %
Stance                              66     47.48    10    17.54
Discourse organizers                31     22.30    10    17.54
Referential                         37     26.62    21    36.84
Special conversational functions    5      3.60     16    28.07
Total                               139    100      57    100
Table 13.7 Statistics of Stance Key-LBs in HD and YD

Stance Functions                       HD    %         YD    %
Epistemic stance                       20    30.30     3     30
Overall attitudinal/modality stance    4     6.06      0     0
Desire                                 4     6.06      0     0
Obligation/directive                   19    28.79     4     40
Intention/prediction                   13    19.70     1     10
Ability                                6     9.09      2     20
Total                                  66    100.00    10    100
Table 13.8 Statistics of Referential Key-LBs in HD and YD

Referential Functions            HD    %        YD    %
Identification/focus             7     18.92    4     19.05
Imprecision                      6     16.22    1     4.76
Quantity/specification           9     24.32    5     23.81
Intangible framing attributes    6     16.21    4     19.05
Place reference                  1     2.70     1     4.76
Time reference                   3     14.29    3     14.29
Multifunctional reference        1     2.70     3     14.29
Total                            37    100      21    100
13.4.4 Contextual Use of Key Stance and Referential LBs
According to Biber and Barbieri (2007), the predominant function of LBs in all
spoken registers (i.e., teaching, class management, office, study groups, and service
encounters) is to express stance. It seems that Hawkes tended to use stance markers
to translate the fictional dialogue. Among Hawkes’s Key-LBs which are classified as
stance, 30.30% construe an epistemic stance, while 28.79% convey obligations/directives (see Table 13.7). The rest are distributed among intention/prediction, desire,
ability, etc. This means most of Hawkes’s stance Key-LBs perform either an epistemic or
a directive function. For instance, one of Hawkes’s Key-LBs, I think I (LL: 33.52),
is a very common epistemic marker in conversational English. It indicates personal
opinions and sometimes functions as a hedge to soften the illocutionary force of an
assertion. In excerpt 5, Hawkes added I think I to express the speaker’s decision to stay
overnight. This use of hedging in decision-making is, however, not found in the source
text. It is solely Hawkes’s interpretation that a certain degree of hedging might be
required in this context. Such stance markers are found neither in the source text nor
in YD. The Yangs used shan’t, the contracted form of shall not, to keep the formality
and courtesy conveyed in the source text; that is, the Yangs rendered the source text literally without adding any epistemic stance in relation to the context.
Excerpt 5
“有的 炕,只管睡。 二爺使我送月銀的,交 了奶奶, 不回去了。” [Source] (Chapter 65)
“There’s plenty of room here for you to sleep. Make yourselves at home.
Actually, I came here to bring the mistress her monthly allowance. Now that
I’ve given it to her, I think I shall spend the night here as well.” [Hawkes]
“Well, there’s plenty of room on the kang, just lie down as you like. Second
Master sent me to bring the monthly allowance to the mistress, so I shan’t be
going back either.” [Yang]
Apart from epistemic stances, Hawkes used significantly more LBs to perform
directive speech acts. Among his stance Key-LBs, 28.79% assert obligation/directives. You ought to (LL: 28.64) is one of the LBs with a high keyness
value which is used by the speaker to imply that the listener has a duty or
moral obligation to undertake a certain task. Clearly, HD contains more expressions conveying obligations and directives than YD. Take a translation pair as an example
(see excerpt 6): the source text 你細想去 (literal translation: you carefully think
about) does not contain any sense of obligation. However, Hawkes used you ought
to be able to in his translation, which signaled an obligation for the listener to work
things out by themselves. No such sense of obligation is found (at least literally)
in the source text, so the Yangs simply used the adverb just to begin the subjectless command work it out yourself. Given that there are more stance
Key-LBs in HD (66) than in YD (10), it can be postulated that Hawkes tended
to add stance LBs in his translation while the Yangs used stance LBs to a lesser
degree. Hawkes mainly used these stance Key-LBs to convey epistemic stances or obligation/directives, as exemplified in excerpts 5 and 6.
Excerpt 6
“ 。我哥哥已經相准了,只等來年就下定了, 不必提出人來,我方才說你認不得娘,你細想去。” [Source] (Chapter 57)
“No, that’s not the reason. It’s because someone has already been chosen
for my brother. We are only waiting for him to come home to make it public. I don’t need to name names. If I tell you that you can’t possibly become
Mamma’s god daughter, you ought to be able to work it out for yourself.”
[Hawkes]
“No, it’s because my brother has already set his mind on someone, and it’ll
be fixed up as soon as he returns. I needn’t name any names. Why did I say
you couldn’t take her as your mother? Just work it out for yourself!” [Yang]
Unlike those in HD, many of the Key-LBs in YD are referential markers. Results
show that 36.84% of the frequently occurring LBs in YD were used to refer to
different attributes. The referential Key-LBs in YD are distributed across many
subfunctions, including identification/focus, imprecision, quantity/specification,
intangible framing attributes, place, time, and multifunctional reference (see
Table 13.8). Since the Yangs’ referential Key-LBs are evenly distributed across
all subfunctions, we have selected two referential Key-LBs for detailed analysis
based on the two LBs’ exceptionally high keyness values. The first one is this is
just (LL: 24.70), which functions as an identification/focus marker. The Yangs
used this is just significantly more frequently than Hawkes did (see excerpt 7).
This is just what (YD) differs from this way of carrying on (HD): the former
refers only to a vague subject matter that readers cannot infer from the
literal meaning, whereas the latter identifies the exact misbehavior. In the source Chinese text 正爲勸你這些 (literal translation: just persuading you these), the word
這, zhe (literal translation: this), is precisely an identifier in Chinese. By starting a
sentence with the identifier 這, zhe, Chinese speakers can easily follow the topic,
which need not be reintroduced repeatedly. Largely adopting a literal translation approach,
the Yangs used identifiers (e.g., this) in their translation by adhering closely to the
source text. We assume that the overuse of identification LBs in YD is thus probably a result of direct translation of the Chinese identifier 這, zhe (i.e., this), which is
a more economical way of introducing a mutually known topic. Hawkes, on the
other hand, felt the need to explicate the topic clearly.
Excerpt 7
“ 的,正爲勸你這些,更說的狠了。” [Source] (Chapter 19)
“This is just what I wanted to warn you against, yet here you go, talking
more wildly than ever.” [Yang]
“It’s precisely this way of carrying on that I was going to talk to you about,
and here you go, ranting away worse than ever!” [Hawkes]
Another function of the Key-LBs in YD is to express imprecision. On
like this is one Key-LB in this subcategory with a high keyness value (LL: 32.94),
and it is used more often in YD than in HD. This LB does not specify what qualities it is referring to. Instead, it keeps the circumstances off the record and leaves readers some
room for imagination. For example, in excerpt 8, the Yangs used on like this to
refer to the girl’s poor situation, which is not explicitly mentioned in the corresponding source text. The source text 這個形景 (literal translation: this situation)
does not specify clearly what situation the girl is in. By contrast, Hawkes did
not use the imprecise LB on like this as the Yangs did but instead used the noun
phrase her outward behavior. Again, Hawkes gave his own interpretation of the expression 這個形景 (i.e., this situation).
Excerpt 8
“這女孩子一定有 麽話說不出來的大心事,才這麽個形景。外面既
這個形景,心裏不知怎麽熬煎。看他的模樣兒這般單薄,心裏那裏還
擱的住熬煎。可恨我不能替你分些過來。” [Source] (Chapter 30)
“She must have some secret anxiety preying on her mind to carry on like
this, yet she looks too delicate to stand much anxiety. I wish I could share her
troubles.” [Yang]
“One can see from her outward behaviour how much she must be suffering
inwardly. And she looks so frail. Too frail for suffering. I wish I could bear
some of it for you, my dear!” [Hawkes]
13.5 Discussion
This chapter has applied keyword analysis to identify the three-word and four-word lexical bundles (LBs) which are significantly more frequent in each of the
Hongloumeng translations compared to meaningful LBs of other lengths. It is
found that many of Hawkes’s Key-LBs (i.e., lexical bundles unusually frequent
in Hawkes’s dialogue translation but infrequent in the Yangs’ dialogue translation) are verb phrases, while many of the Yangs’ Key-LBs (i.e., bundles unusually
frequent in the Yangs’ dialogue translation but infrequent in Hawkes’s dialogue
translation) are prepositional phrases. We have also found that almost half of
Hawkes’s Key-LBs function as stance markers, while the largest proportion of the
Yangs’ Key-LBs are referential markers. In this section, Hawkes’s and the Yangs’
use of LBs will be discussed with reference to their language backgrounds, life
experiences, and respective translation purposes.
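To make the first step of such an analysis concrete, the following Python sketch shows how three-word and four-word sequences could be extracted and counted from a dialogue text. It is a minimal illustration under stated assumptions: the tokenizer, the function names, and the minimum frequency cut-off are hypothetical and do not reproduce the settings or software used in this study.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and keep word forms, including contractions
    such as "don't" (straight or curly apostrophe). Hypothetical tokenizer."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())

def extract_bundles(tokens, n):
    """Count every contiguous n-word sequence in a list of tokens."""
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )

def candidate_bundles(tokens, sizes=(3, 4), min_freq=10):
    """Return the 3- and 4-word sequences that reach a minimum frequency.
    The cut-off of 10 is an assumption chosen for illustration only."""
    counts = Counter()
    for n in sizes:
        counts.update(extract_bundles(tokens, n))
    return {bundle: freq for bundle, freq in counts.items() if freq >= min_freq}

if __name__ == "__main__":
    sample = "You ought to be able to work it out. You ought to try."
    print(candidate_bundles(tokenize(sample), min_freq=2))
```

The sketch treats the text as one continuous token stream, so sequences may span sentence or utterance boundaries; a fuller treatment would segment the dialogues first and would compare the resulting frequencies across the two translations before any bundle is called key.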
13.5.1 Language Backgrounds
David Hawkes is a native English speaker, while Xianyi Yang is a native Chinese
speaker. Although his wife, Gladys Yang, is a native English speaker, she mainly
typed “the translation on a typewriter. While she was typing the text, she also
polished or edited it” (Li et al. 2011, 163). In our study, it is found that Hawkes
used more VP-based LBs, which is in line with Biber’s (2009) finding that 50%
of the LBs used in native spoken English are structured as “personal pronoun +
verb components.” This shows that Hawkes’s translation of fictional dialogues is
largely in line with the norm of spoken English in this respect. On the contrary,
Xianyi Yang, as a native Chinese speaker, is found to have used more PP-based
LBs. This is also consistent with findings that L2 speakers (e.g., native Chinese speakers) tend to overuse certain LBs which native English speakers seldom
use (Chen and Baker 2010) and that Chinese speakers use more prepositions to
construct lexical bundles than do their native English counterparts (Wei 2007;
Chen and Baker 2010). As Chinese is a topic-prominent language (Yip 1995), it is
not surprising that Chinese speakers adhere to the topic-prominence convention by
using prepositions combined with a bare noun phrase in the topic position to ensure
grammaticality in English. On the other hand, English is a subject-prominent language which often structures sentences in a subject-predicate relation (ibid.); thus,
half of the LBs in spoken English are made up of “pronoun + verb” (Biber 2009).
Hawkes’s VP-based Key-LBs, such as I think you and ought to be, are manifestations of subject prominence in English; the Yangs’ PP-based Key-LBs, such as
if not for and for no reason, may be influenced by topic prominence, in which
prepositional phrases often serve as adverbials in Chinese. This supports previous
research (e.g., Yip 1995; Biber and Barbieri 2007, 2009; Conrad and Biber 2005)
that LBs in spoken English are mostly verb phrases, and Chinese speakers tend to
use prepositional phrases to topicalize the bare nouns or noun phrases when they
speak English.
As for the functional aspects, Hawkes’s Key-LBs, such as I think I and you
ought to, also resonate with the convention that the most prominent function of
LBs in spoken English is stance-making: to assert epistemic stance and give directives (Biber and Barbieri 2007). Meanwhile, the Yangs’ less frequent use of stance
bundles might be related to the fact that Chinese speakers often underuse participant-oriented LBs (Wei and Lei 2011; Pan and Liu 2019). The Yangs’ overuse
of LBs such as this is just and on like this reflects Chinese speakers’ frequent use
of identifiers to express mutually known topics. Hence, Hawkes’s frequent use of
verb phrases and stance LBs, as well as the Yangs’ frequent use of prepositional
phrases and referential LBs, reveal that divergent translation styles can be attributed to the different language backgrounds of the respective translators.
13.5.2 Life Experiences
David Hawkes went to China and received postgraduate education in Beijing in
1948, while Xianyi Yang started his university education at Oxford University in
1936. According to Minford’s foreword to Xianyi Yang’s (2002) autobiography
White Tiger, Xianyi and Gladys Yang would visit David and Jean Hawkes and the
couples knew each other well. David Hawkes and Xianyi Yang were intellectuals
in pretty much the same historical time, and they published their translations of
Hongloumeng about the same time as well (i.e., both finished their translations
by 1980). On the other hand, Hawkes and Xianyi Yang contrast in their walks of
life. David Hawkes was a sinologist who first encountered Hongloumeng when he
studied at Peking University. He read the novel under the guidance of a Chinese-speaking “laoxiansheng,” 老先生 (translation: old scholar), who was a former
government clerk from Hebei province. Hawkes described the reading journey
as “direct method gone mad” in the sense that he barely understood what the teacher
said. Perhaps because of this unpleasant experience, Hawkes preferred a more fluent
approach in rendering the fictional dialogues (more VP-based LBs), which sound
as if they were naturally spoken to the readers in English. Out of his passion for
the novel, Hawkes resigned from his chair professorship at Oxford in 1971 to be
fully devoted to his translation of Hongloumeng (Minford 2012). At that time,
Hawkes was already an established scholar who had a research fellowship to live
on. He did not translate for money’s sake but for sheer joy. In contrast to David
Hawkes, Xianyi Yang did not have the luxury of spending years polishing his
translated work. After he and his wife joined the official translation bureau in
1943, and subsequently the Foreign Languages Press in 1952, the couple was in
charge of translating literary works in new China. In the 1950s, Xianyi Yang was
drained by translating foreign works into Chinese, as he also had to perform “voluntary physical labor” at the same time; from 1968 to 1972, the couple endured
imprisonment amid the political unrest brought about by the Cultural
Revolution. During the two years of translating Hongloumeng for the Foreign
Languages Press, they lost their beloved son. According to Xianyi Yang’s autobiography (2002), they were never paid for the extra work on translation except
Hongloumeng, which was commissioned by the magazine Chinese Literature.
Our findings corroborate Li, Zhang, and Liu’s (2011) observation that Xianyi Yang and
Gladys Yang translated under censorship, grief, and a tight schedule, yet with little
remuneration. This probably explains why a more literal approach was employed
by the Yangs in rendering the fictional dialogues.
13.5.3 Translation Purposes
Finally, David Hawkes’s translation purpose was to entertain readers and literary
enthusiasts. To help reconstruct the dialogues, Hawkes adopted a more liberal
approach in his translation. For example, one of the most frequently occurring
reporting verb phrases in the source text, 笑道, xiao dao (literally: said with a
smile), was translated in various ways (e.g., chide, laugh, with a broad smile,
with a meaningful smile, with a proud smile) by Hawkes in relation to the context.
Hawkes justified this approach as a measure to compensate for the absence of the
tone of voice (Minford n.d., 32). In his preface to The Story of the Stone Volume
1: The Golden Days, Hawkes (1973, 46) stated his major concern in translating the novel: “If I can convey to the reader even a fraction of the pleasure this
Chinese novel has given me, I shall not have lived in vain.” When translating the
dialogues, Hawkes preferred stance bundles, as they serve many communicative
functions (e.g., expressing attitudes, desire, directives, intentions, predictions,
abilities) which render the dialogues more engaging. On the other hand, in the
Publisher’s Note of A Dream of Red Mansions Volume 1, it was stated that Hongloumeng is a book “about political struggle” (1978, iv) and that “by presenting the
prosperity and decline of the four typical noble families it truthfully lays bare the
corruption and decadence of the feudal ruling class and points out its inevitable
doom” (1978, vii). Though such a remark might result from self-censorship due
to the political atmosphere of the time, such a depiction clearly shows that
ideological factors greatly outweighed aesthetic ones in the case of the Yangs.
When translation becomes a task assigned by officials, the translated work is
meant to promote ideologies and hence leaves the translators little room for interpretation. Therefore, it is plausible that the Yangs opted for a more rigid approach to
translate the novel.
13.6 Conclusion
This study sets out to compare different translators’ use of lexical bundles in
two Hongloumeng translations. In line with Mastropierro’s (2018) suggestion,
we affirmed that lexical bundles can serve as a reliable indicator beyond other
lexical devices for differentiating style in different translations. By examining the
syntactic structures and functions of the key lexical bundles in Hawkes and the
Yangs, we have found that the Yangs adopted a more literal and seemingly rigid
approach to translating Hongloumeng, as evidenced by a use of key
lexical bundles that differs from Hawkes’s. Our study has yielded some preliminary evidence
that translators’ styles may be influenced by the respective translator’s language
background, life experiences, and translation purposes. This study is, however,
not without limitations. Only translation works by two groups of translators (i.e.,
Hawkes and the Yangs) were sampled in the current study. Future studies can
compare more translation versions of Hongloumeng to examine whether the use
of lexical bundles differs among translators as a result of their sociocultural backgrounds and translation purposes. Besides, as argued by Li and Zhang
(2010, 250), “[a] corpus as well as a statistical presentation of translation or language facts is not the ultimate goal of our research, but rather the beginning and
foundation for real research on whatever research questions the project is addressing.” In this regard, more documentary evidence needs to be collected to verify
the claims made based on corpus frequency data.
Notes
1 An earlier version first appeared in Translation Quarterly (2020), Issue 98, pp. 79–101.
This present version is updated and modified based on the earlier version.
2 For a more detailed review of recent studies on HLM, readers may refer to Moratto et al.
(2022).
3 Based on UCREL’s (https://ucrel.lancs.ac.uk/llwizard.html) instructions on calculating
log-likelihood and effect size, a log-likelihood value above the critical value of 6.63 means that the null
hypothesis is rejected at p < 0.01. Therefore, a log-likelihood value of
6.63 is set as the threshold for Key-LBs in the current study.
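As an illustration of the threshold described in note 3, the sketch below computes the log-likelihood statistic for one bundle across two corpora using the observed-versus-expected formulation documented on the UCREL page, and then applies the 6.63 cut-off. The function names, the corpus sizes, and the frequencies in the usage example are hypothetical values chosen for illustration; they are not figures reported in this study.

```python
import math

CRITICAL_LL = 6.63  # critical value for p < 0.01 with one degree of freedom

def log_likelihood(freq_a, freq_b, size_a, size_b):
    """Log-likelihood (G2) for one item observed freq_a times in corpus A
    (size_a tokens) and freq_b times in corpus B (size_b tokens)."""
    total_freq = freq_a + freq_b
    expected_a = size_a * total_freq / (size_a + size_b)
    expected_b = size_b * total_freq / (size_a + size_b)
    ll = 0.0
    # A zero observed frequency contributes nothing to the sum.
    if freq_a > 0:
        ll += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        ll += freq_b * math.log(freq_b / expected_b)
    return 2 * ll

def is_key_in_a(freq_a, freq_b, size_a, size_b):
    """True if the item exceeds the threshold and is relatively more
    frequent in corpus A than in corpus B."""
    overused = freq_a / size_a > freq_b / size_b
    return overused and log_likelihood(freq_a, freq_b, size_a, size_b) > CRITICAL_LL

if __name__ == "__main__":
    # Hypothetical corpus sizes (in tokens) and bundle frequencies.
    print(is_key_in_a(freq_a=10, freq_b=0, size_a=220_000, size_b=280_000))
```

The direction check is included because the log-likelihood value alone does not indicate which corpus overuses the item; it only indicates that the two relative frequencies differ.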
References
Axelsson, Karin. 2008. “Research on Fiction Dialogue: Problems and Possible Solutions.”
In Corpora: Pragmatics and Discourse, edited by Andreas H. Jucker, Daniel Schreier,
and Marianne Hundt, 189–201. Leiden: Brill.
Baker, Mona. 1993. “Corpus Linguistics and Translation Studies: Implications and Applications.” In Text and Technology: In Honor of John Sinclair, edited by Mona Baker, G.
Francis, and E. Tognini-Bonelli. Amsterdam: John Benjamins.
Baker, Mona. 2000. “Towards a Methodology for Investigating the Style of a Literary
Translator.” Target 12, no. 2: 241–66.
Biber, Douglas. 2009. “A Corpus-driven Approach to Formulaic Language in English:
Multiword Patterns in Speech and Writing.” International Journal of Corpus Linguistics
14, no. 3: 275–311.
Biber, Douglas, and Federica Barbieri. 2007. “Lexical Bundles in University Spoken and
Written Registers.” English for Specific Purposes 26, no. 3: 263–86.
Biber, Douglas, Susan Conrad, and Viviana Cortes. 2004. “If You Look At . . . : Lexical
Bundles in University Teaching and Textbooks.” Applied Linguistics 25, no. 3: 371–405.
Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad, and Edward Finegan.
1999. Longman Grammar of Spoken and Written English. London: Longman.
Bonsall, Bramwell Seaton, Trans. 2004. The Red Chamber (Hongloumeng). Hong Kong:
The University of Hong Kong. https://lib.hku.hk/bonsall/hongloumeng/title.pdf.
Bosseaux, Charlotte. 2007. How Does It Feel? Point of View in Translation: The Case of
Virginia Woolf into French. Amsterdam: Rodopi.
Cao Xueqin, and E. Gao. 1982. Hong Lou Meng [in Chinese]. Beijing: People’s Literature
Press.
Chen, Yuhua, and Paul Baker. 2010. “Lexical Bundles in L1 and L2 Academic Writing.”
Language Learning & Technology 14, no. 2: 30–49.
Conrad, Susan M., and Douglas Biber. 2005. “The Frequency and Use of Lexical Bundles in Conversation and Academic Prose.” Lexicographica 20: 56–71. https://doi.org/10.1515/9783484604674.56.
Cortes, Viviana. 2004. “Lexical Bundles in Published and Student Disciplinary Writing:
Examples from History and Biology.” English for Specific Purposes 23, no. 4: 397–423.
Crystal, David. 1999. The Penguin Dictionary of Language. London: Penguin Books.
Fang, Yu, and Haitao Liu. 2015. “Comparison of Vocabulary Richness in Two Translated
Hongloumeng.” Glottometrics 31: 54–75.
Hawkes, David. 1973. The Story of the Stone Volume 1: The Golden Days. London: Penguin.
Hou, Yu. 2013. “A Corpus-Based Study of Nominalization as a Feature of Translator’s
Style (Based on the English Versions of Hong Lou Meng).” Meta 58, no. 3: 556–73.
Hyland, Ken. 2008. “As Can be Seen: Lexical Bundles and Disciplinary Variation.” English for Specific Purposes 27, no. 1: 4–21.
Ji, Meng, and Michael P. Oakes. 2018. “A Corpus Study of Early English Translations of
Cao Xueqin’s Hongloumeng.” In Quantitative Methods in Corpus-based Translation
Studies, edited by Michael P. Oakes and Meng Ji, 177–208. Amsterdam: John Benjamins.
Leech, Geoffrey N., and Mick Short. 1981. Style in Fiction: A Linguistic Introduction to
English Fictional Prose. London: Longman.
Li, Defeng, and Chunling Zhang. 2010. “Sense-Making in Corpus-Assisted Translation
Research.” In Using Corpora in Contrastive and Translation Studies, edited by Richard
Xiao, 235–54. Newcastle upon Tyne: Cambridge Scholars Publishing.
Li, Defeng, Chunling Zhang, and Kanglong Liu. 2011. “Translation Style and Ideology:
A Corpus-assisted Analysis of Two English Translations of Hongloumeng.” Literary and
Linguistic Computing 26, no. 2: 153–66.
Liu, Kanglong, and Muhammad Afzaal. 2021. “Translator’s Style Through Lexical Bundles: A Corpus-driven Analysis of Two English Translations of Hongloumeng.” Frontiers
in Psychology 12. https://doi.org/10.3389/fpsyg.2021.633422.
Liu, Zequan. 2008. “Translating Tenor: With Reference to the English Versions of Hongloumeng.” Meta 53, no. 3: 528–48.
Mahlberg, Michaela, and Michael Hoey. 2012. Corpus Stylistics and Dickens’s Fiction.
London: Routledge.
Mahlberg, Michaela, Viola Wiegand, Peter Stockwell, and Anthony Hennessey. 2019.
“Speech Bundles in the 19th Century English Novel.” Language and Literature 28, no.
4: 326–53.
Mastropierro, Lorenzo. 2018. “Key Clusters as Indicators of Translator Style.” Target 30,
no. 2: 240–59.
Minford, John. 2012. “A Tribute to Brother Stone.” In Style, Wit and Word-Play: Essays in
Translation Studies in Memory of David Hawkes, edited by Tao Tao Liu, Laurence K. P.
Wong, and Sin-wai Chan, 1–14. Newcastle upon Tyne: Cambridge Scholars Publishing.
Minford, John. n.d. “Hawkes’ Approaches to Translating Fiction.” In A335 Culture and
Translation [Course Materials]. Hong Kong: OUHK.
Moratto, Riccardo, Kanglong Liu, and Di-kai Chao, eds. 2022. Dream of the Red Chamber: Literary and Translation Perspectives. London and New York: Routledge.
Mu, Yuanyuan. 2012. “Towards a Quantitative & Qualitative Stylistic Approach to
Ideational Construal in the Translation of Narrative Discourse: Norms and Readers’
Responses Revisited – A Corpus-based Study on Hong-lou Meng and Its Two English
Translations.” PhD diss., City University of Hong Kong.
Munday, Jeremy. 2012. Evaluation in Translation: Critical Points of Translator Decision
Making. London and New York: Routledge.
Pan, Fan, and Chen Liu. 2019. “Comparing L1-L2 Differences in Lexical Bundles in Student and Expert Writing.” Southern African Linguistics and Applied Language Studies
37, no. 2: 142–57.
Pan, Fan, Randi Reppen, and Douglas Biber. 2016. “Comparing Patterns of L1 Versus
L2 English Academic Professionals: Lexical Bundles in Telecommunications Research
Journals.” Journal of English for Academic Purposes 21: 60–71.
Ran, Shiyang, and Ping Yang. 2013. “Breaking Through the Bottleneck: A Comparative
Investigation into the Chinese-English Translation Studies of ‘Hong Lou Meng’ [in Chinese].” China Publishing Journal 12: 61–3.
Saldanha, Gabriela. 2011. “Translator Style: Methodological Considerations.” The Translator 17, no. 1: 25–50.
Scott, Mike. 2020. WordSmith Tools Version 8. Stroud: Lexical Analysis Software.
Shrefler, Nathan. 2011. “Lexical Bundles and German Bibles.” Literary and Linguistic
Computing 26, no. 1: 89–106.
Su, Ke. 2021. “Translation of Metaphorical Idioms: A Case Study of Two English Versions
of Hongloumeng.” Babel 67, no. 3: 332–54.
Tong, Jasmine Man, and David Morgan. 2021. “ ‘Twice Bitten’ Two Men and a Translation:
The Making of the Stone.” Babel 67, no. 6: 791–818.
Tsao, Liqun. 2020. “On Translation of Social Terms in Hong Lou Meng.” Arts Studies and
Criticism 1, no. 3: 68–73.
Wang, Ning. 2016. “Chinese Literature as World Literature.” Canadian Review of Comparative Literature 43, no. 3: 380–92.
Wei, Naixing. 2007. “Phraseological Characteristics of Chinese Learners’ Spoken English:
Evidence of Lexical Chunks from COLSEC.” Modern Foreign Languages 30, no. 3:
281–91.
Wei, Yaoyu, and Lei Lei. 2011. “Lexical Bundles in the Academic Writing of Advanced
Chinese EFL Learners.” RELC Journal 42, no. 2: 155–66.
Wu, Chungming. 2021. “Towards Norms in Two Translations of Hong Lou Meng: A Corpus-based Study.” PhD diss., The Hong Kong Polytechnic University.
Yan Minmin 閆敏敏. 2005. “Ershinianlai de Hongloumeng yingyi yanjiu 二十年來的《紅樓夢》英譯研究 [Twenty Years’ Studies on the Translation of Hongloumeng into
English].” Foreign Language Education 外語教學 26, no. 4: 64–68.
Yang, Hsien-Yi, and Gladys Yang, trans. 1978. A Dream of Red Mansions Volume 1. Beijing: Foreign Languages Press.
Yang, Xianyi. 2002. White Tiger: An Autobiography of Yang Xianyi. Hong Kong: Chinese
University Press.
Yip, Virginia. 1995. Interlanguage and Learnability: From Chinese to English. Amsterdam: John Benjamins.
Yu, Ke. 2020. “A Comparative Study of the Translation of Material Culture-loaded Words
of Hongloumeng in the Light of Skopostheorie.” Journal of Language Teaching and
Research 11, no. 2: 318–23.
Appendix A
Yangs’ 3-Word and 4-Word Key-LBs
Key-LBs               Freq.  BIC    Log-Likelihood  Log-Ratio  P-Value
A FEW CUPS            10     3.34   16.47           1,059.58   <0.001
ARE WE TO             11     4.99   18.11           1,059.71   <0.001
AS THE PROVERB        19     4.11   17.23           3.02       <0.001
AS THE PROVERB SAYS   18     2.76   15.88           2.94       <0.001
BOUND TO BE           20     19.81  32.94           1,060.58   <0.001
BUT MIND YOU          10     3.34   16.47           1,059.58   <0.001
CARRY ON LIKE         12     6.64   19.76           1,059.84   <0.001
COULD IT BE           10     3.34   16.47           1,059.58   <0.001
COUPLE OF DAYS        30     6.41   19.54           2.26       <0.001
DO SUCH A             10     3.34   16.47           1,059.58   <0.001
DO YOU EXPECT         11     4.99   18.11           1,059.71   <0.001
DOES IT MATTER        12     6.64   19.76           1,059.84   <0.001
DON’T YOU KNOW        11     4.99   18.11           1,059.71   <0.001
EVEN IF HE            10     3.34   16.47           1,059.58   <0.001
FOR A COUPLE          18     2.76   15.88           2.94       <0.001
FOR A COUPLE OF       18     2.76   15.88           2.94       <0.001
FOR A STROLL          12     6.64   19.76           1,059.84   <0.001
FOR A WHILE           29     7.26   20.38           2.40       <0.001
FOR NO REASON         19     18.17  31.29           1,060.50   <0.001
HAVE SUCH A           10     3.34   16.47           1,059.58   <0.001
HAVE THE SAME         12     6.64   19.76           1,059.84   <0.001
HIGH AND LOW          11     4.99   18.11           1,059.71   <0.001
HOW CAN I             36     13.68  26.81           2.52       <0.001
HOW CAN WE            25     28.05  41.17           1,060.90   <0.001
HOW CAN YOU           61     29.16  42.29           2.38       <0.001
HOW COULD I           20     5.47   18.59           3.09       <0.001
HOW IT IS             14     9.93   23.05           1,060.06   <0.001
HURRY UP AND          37     2.67   15.79           1.66       <0.001
I MEANT TO            17     14.87  27.99           1,060.34   <0.001
I’D NO IDEA           10     3.34   16.47           1,059.58   <0.001
IF NOT FOR            18     16.52  29.64           1,060.43   <0.001
IT’S NO USE           24     26.40  39.52           1,060.84   <0.001
IT’S NOT THAT         10     3.34   16.47           1,059.58   <0.001
JUST WHAT I           11     4.99   18.11           1,059.71   <0.001
MUCH THE BETTER       12     6.64   19.76           1,059.84   <0.001
MY ADVICE AND         12     6.64   19.76           1,059.84   <0.001
NOTHING BUT A         11     4.99   18.11           1,059.71   <0.001
ON LIKE THIS          20     19.81  32.94           1,060.58   <0.001
ON THE SLY            14     9.93   23.05           1,060.06   <0.001
SAY ONE WORD          10     3.34   16.47           1,059.58   <0.001
SO AS TO              30     19.64  32.76           3.68       <0.001
SO HOW CAN            13     8.28   21.41           1,059.96   <0.001
SO LONG AS            15     11.58  24.70           1,060.16   <0.001
SO MUCH THE BETTER    12     6.64   19.76           1,059.84   <0.001
TAKE MY ADVICE        13     8.28   21.41           1,059.96   <0.001
TAKE MY ADVICE AND    11     4.99   18.11           1,059.71   <0.001
THE BLAME ON          11     4.99   18.11           1,059.71   <0.001
THIS CHANCE TO        11     4.99   18.11           1,059.71   <0.001
THIS IS JUST          15     11.58  24.70           1,060.16   <0.001
TO ASK FOR            26     3.68   16.80           2.25       <0.001
TO SEE TO             13     8.28   21.41           1,059.96   <0.001
TO SHOW MY            10     3.34   16.47           1,059.58   <0.001
WHAT DOES IT MATTER   12     6.64   19.76           1,059.84   <0.001
WHY NOT GO            11     4.99   18.11           1,059.71   <0.001
WHY SHOULD WE         15     11.58  24.70           1,060.16   <0.001
WOULDN’T THAT BE      20     19.81  32.94           1,060.58   <0.001
YOU DON’T UNDERSTAND  18     16.52  29.64           1,060.43   <0.001
Note: Only Key-LBs with log-likelihood > 6.63 (p-value < 0.01) are listed here.
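The BIC column is not explained in the chapter. One reading, offered here only as an assumption, is that it follows the common approximation BIC ≈ LL − ln(N), where N is the combined token count of the two corpora. The Python sketch below checks whether a few (log-likelihood, BIC) pairs copied from Appendix A are consistent with a constant offset and derives the value of N that such an offset would imply. The function names are hypothetical, and the implied N is an inference from rounded figures rather than a corpus size reported by the authors.

```python
import math

# (log-likelihood, BIC) pairs copied from three rows of Appendix A.
ROWS = {
    "A FEW CUPS": (16.47, 3.34),
    "ON LIKE THIS": (32.94, 19.81),
    "THIS IS JUST": (24.70, 11.58),
}

def ll_minus_bic(rows):
    """Difference between log-likelihood and BIC for each bundle."""
    return {bundle: round(ll - bic, 2) for bundle, (ll, bic) in rows.items()}

def implied_total_tokens(rows):
    """Token count N implied by the mean offset, ASSUMING BIC = LL - ln(N)."""
    offsets = [ll - bic for ll, bic in rows.values()]
    return math.exp(sum(offsets) / len(offsets))

if __name__ == "__main__":
    print(ll_minus_bic(ROWS))
    print(round(implied_total_tokens(ROWS)))
```

If the offsets agree to within rounding, the assumption is at least compatible with the table, although it does not confirm how the values were actually produced.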
Appendix B
Hawkes’s 3-Word and 4-Word Key-LBs
Key-LBs
Freq.
BIC
Log-Likelihood
Log-Ratio
P-Value
A BIT AND
A BIT BETTER
A BIT OF
A BIT TOO
A FEW MINUTES
A GOOD JOB
A LOT OF
A MATTER OF
A QUESTION OF
A THING LIKE THIS
A WORD WITH
ABLE TO SEE
AND AFTER THAT
AND GET IT
AND I DON’T
AND IN ANY
AND IN ANY CASE
ARE GOING TO
ARE IN THE
AS A MATTER
AS A MATTER OF
AWAY WITH IT
16
17
67
15
15
17
91
33
16
14
36
14
14
21
18
14
14
33
14
29
29
15
5.37
6.53
36.24
4.22
4.22
6.53
16.88
3.66
5.37
3.06
6.32
3.06
3.06
11.15
7.68
3.06
3.06
9.31
3.06
2.61
2.61
4.22
18.49
19.65
49.36
17.34
17.34
19.65
30.01
16.78
18.49
16.18
19.44
16.18
16.18
24.27
20.81
16.18
16.18
22.43
16.18
15.73
15.73
17.34
1,059.90
1,059.99
3.39
1,059.81
1,059.81
1,059.99
1.69
2.37
1,059.90
1,059.71
2.49
1,059.71
1,059.71
1,060.29
1,060.07
1,059.71
1,059.71
3.11
1,059.71
2.50
2.50
1,059.81
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
Lexical Bundles in the Dialogues of Hongloumeng Translations
251
Appendix B (Continued)
Key-LBs
Freq.
BIC
Log-Likelihood
Log-Ratio
P-Value
BE A BIT
EXACTLY THE SAME
FOR A BIT
GET ON WITH
GOING TO BE
GOING TO DO
GOT TO HEAR
HAVEN’T GOT ANY
HEAR ABOUT IT
I AM AFRAID
I AM NOT
I AM SURE
I DON’T KNOW WHY
I DON’T THINK
I HAVE BEEN
I HOPE YOU
I SHOULD HAVE
I SHOULD LIKE
I SHOULD LIKE TO
I SHOULD THINK
I THINK I
I THINK IT
I THINK IT’S
I THINK WE
I THINK WE OUGHT
I THINK YOU
I THOUGHT I’D
I THOUGHT YOU
I WONDER IF
IF YOU ARE
IF YOU ASK
IF YOU ASK ME
IF YOU WILL
I’M AFRAID I
I’M NOT SURPRISED
I’M SURE YOU
IN ANY CASE
IS A VERY
IS GOING TO BE
IS SUCH A
IS THE ONE
IT MUST HAVE
IT SEEMS THAT
I’VE JUST BEEN
KNOW WHAT THEY
LOOK AT YOU
ME ABOUT IT
NOT GOING TO
20
14
35
43
47
20
14
15
14
31
18
28
19
53
20
23
52
28
21
19
29
25
19
27
14
43
14
14
15
32
23
20
25
25
16
19
69
20
14
16
14
14
23
15
16
16
19
38
9.99
3.06
8.13
10.03
19.75
9.99
3.06
4.22
3.06
22.71
7.68
19.24
8.84
16.44
9.99
13.46
6.74
19.24
11.15
8.84
20.40
15.77
8.84
18.09
3.06
36.58
3.06
3.06
4.22
5.34
13.46
9.99
15.77
15.77
5.37
8.84
23.09
9.99
3.06
5.37
3.06
3.06
13.46
4.22
5.37
5.37
8.84
10.97
23.12
16.18
21.25
23.15
32.87
23.12
16.18
17.34
16.18
35.83
20.81
32.36
21.96
29.56
23.120
26.59
19.87
32.36
24.27
21.96
33.52
28.90
21.96
31.21
16.18
49.70
16.18
16.18
17.34
18.46
26.59
23.12
28.90
28.90
18.49
21.96
36.21
23.12
16.18
18.49
16.18
16.18
26.59
17.34
18.49
18.49
21.96
24.09
1,060.22
1,059.71
2.78
2.49
3.20
1,060.22
1,059.71
1,059.81
1,059.71
1,060.86
1,060.07
1,060.71
1,060.15
2.57
1,060.22
1,060.42
1.89
1,060.71
1,060.29
1,060.15
1,060.76
1,060.55
1,060.15
1,060.66
1,059.71
1,061.33
1,059.71
1,059.71
1,059.81
2.65
1,060.42
1,060.22
1,060.55
1,060.55
1,059.90
1,060.15
2.43
1,060.22
1,059.71
1,059.90
1,059.71
1,059.71
1,060.42
1,059.81
1,059.90
1,059.90
1,060.15
2.89
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
(Continued)
252
Kanglong Liu, Joyce Oiwun Cheung, and Riccardo Moratto
Appendix B (Continued)
Key-LBs
Freq.
BIC
Log-Likelihood
Log-Ratio
P-Value
OF THESE DAYS
ONE OF THESE DAYS
OUGHT NOT TO
OUGHT TO BE
OUT OF HERE
SAY THAT I
SHALL BE ABLE
SHALL BE ABLE TO
SHE HAS BEEN
SHOULD LIKE TO
SORT OF PERSON
SORT OF THING
SUPPOSED TO BE
SURE TO BE
TALK TO YOU
TELL HER THAT
TELL THEM THAT
THAT I AM
THAT I SHALL
THAT I WAS
THAT IF I
THAT IF YOU
THAT IT IS
THAT SORT OF
THAT SORT OF THING
THAT THEY ARE
THAT WE SHOULD
THAT YOU ARE
THAT YOU HAVE
THE WAY I
THERE IS A
THERE WOULD BE
THING LIKE THAT
THINGS LIKE THAT
THINK OF IT
THINK WE OUGHT
THINK WE OUGHT TO
THINK YOU OUGHT
THINK YOU OUGHT
TO
TO DO IS
TO DO SOMETHING
TO HAVE BEEN
TO HEAR ABOUT
TO TALK TO YOU
TO TELL ME
TO THINK THAT
TO YOU ABOUT
WANT TO GO
17
17
15
60
15
14
16
16
16
22
18
45
20
15
19
16
18
27
15
15
17
17
14
34
18
18
15
30
37
17
27
14
17
19
15
19
19
18
17
6.53
6.53
4.22
22.89
4.22
3.06
5.37
5.37
5.37
12.31
7.68
14.62
9.99
4.22
8.84
5.37
7.68
18.09
4.22
4.22
6.53
6.53
3.06
26.18
7.68
7.68
4.22
6.39
7.22
6.53
3.52
3.060
6.53
8.84
4.22
8.84
8.84
7.68
6.53
19.65
19.65
17.34
36.02
17.34
16.18
18.49
18.49
18.49
25.43
20.81
27.74
23.12
17.34
21.96
18.49
20.81
31.21
17.34
17.34
19.65
19.65
16.18
39.30
20.81
20.81
17.34
19.51
20.34
19.65
16.64
16.18
19.65
21.96
17.34
21.96
21.96
20.81
19.65
1,059.99
1,059.99
1,059.81
2.75
1,059.81
1,059.71
1,059.90
1,059.90
1,059.90
1,060.36
1,060.07
2.82
1,060.22
1,059.81
1,060.15
1,059.90
1,060.07
1,060.66
1,059.81
1,059.81
1,059.99
1,059.99
1,059.71
1,060.99
1,060.07
1,060.07
1,059.81
2.97
2.53
1,059.99
2.82
1,059.71
1,059.99
1,060.15
1,059.81
1,060.15
1,060.15
1,060.07
1,059.99
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
22
14
17
16
14
35
16
14
30
12.31
3.06
6.53
5.37
3.06
5.42
5.37
3.06
21.55
25.43
16.18
19.65
18.49
16.18
18.55
18.49
16.18
34.68
1,060.36
1,059.71
1,059.99
1,059.90
1,059.71
2.45
1,059.90
1,059.71
1,060.81
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
<0.001
Lexical Bundles in the Dialogues of Hongloumeng Translations
253
Appendix B (Continued)
Key-LBs              Freq.     BIC   Log-Likelihood   Log-Ratio   P-Value
WE OUGHT TO             49   10.45            23.57        2.26    <0.001
WHAT IT IS              25   15.77            28.90    1,060.55    <0.001
WHAT YOU ARE            16    5.37            18.49    1,059.90    <0.001
WHAT YOU HAVE           20    9.99            23.12    1,060.22    <0.001
WHEN YOU ARE            27   18.09            31.21    1,060.66    <0.001
WHILE YOU ARE           14    3.06            16.18    1,059.71    <0.001
YOU ARE GOING           17    6.53            19.65    1,059.99    <0.001
YOU ARE GOING TO        15    4.22            17.34    1,059.81    <0.001
YOU ARE NOT             20    9.99            23.12    1,060.22    <0.001
YOU ARE TOO             17    6.53            19.65    1,059.99    <0.001
YOU DON’T NEED TO       14    3.06            16.18    1,059.71    <0.001
YOU KNOW WHAT           35    3.08            16.20        2.19    <0.001
YOU OUGHT TO           100   15.51            28.64        1.53    <0.001
YOU OUGHT TO BE         22   12.31            25.43    1,060.36    <0.001
YOU THINK THAT          16    5.37            18.49    1,059.90    <0.001
YOU WILL BE             28    4.47            17.59        2.87    <0.001
YOU WOULD BE            17    6.53            19.65    1,059.99    <0.001
YOU WOULD HAVE          14    3.06            16.18    1,059.71    <0.001
YOU’LL BE ABLE          15    4.22            17.34    1,059.81    <0.001
YOU’LL BE ABLE TO       15    4.22            17.34    1,059.81    <0.001
YOU’VE GOT TO           16    5.37            18.49    1,059.90    <0.001
Note: * Only Key-LBs with a log-likelihood value > 6.63 (p-value < 0.01) are listed here.
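As a rough illustration of how the threshold in the note can be applied, the following Python sketch computes a two-corpus log-likelihood value for one item and checks it against 6.63. The function names, the two-cell form of the statistic, and the BIC approximation (log-likelihood minus the natural logarithm of the combined corpus size) are assumptions made for illustration; the appendix does not state which tool or formula produced its figures. The near-constant gap of about 13.12 between the Log-Likelihood and BIC columns is at least consistent with an approximation of this kind.

```python
import math


def log_likelihood(freq_study, freq_ref, size_study, size_ref):
    """Two-cell log-likelihood (G2) for one item in a study vs. a reference corpus."""
    combined = freq_study + freq_ref
    total = size_study + size_ref
    # Expected frequencies if the item were spread evenly over both corpora.
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    ll = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


def bic_approx(ll, size_study, size_ref):
    """Assumed approximation: BIC = LL - ln(N), with N the combined corpus size."""
    return ll - math.log(size_study + size_ref)


def is_key(ll, critical=6.63):
    """Apply the cut-off quoted in the note (6.63, i.e., p < 0.01)."""
    return ll > critical


# Hypothetical counts, not taken from the appendix.
example_ll = log_likelihood(49, 10, 100_000, 100_000)
print(round(example_ll, 2), is_key(example_ll))
```

The corpus sizes and frequencies in the usage lines are invented; only the cut-off of 6.63 comes from the note.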
14 Mapping Culture-Specific and Creative Metaphors in Lu Xun’s Short Stories by L1 and L2 English Translators
A Corpus-Assisted Relevance-Theoretical Account
Linping Hou and Defeng Li
14.1 Introduction
Lu Xun (1881–1936), the father of modern Chinese literature, is a pioneering
writer of Chinese short stories in a new Western style (Hsia 1961, 28). His short
stories have been translated into more than 50 languages and are available to
readers and critics in both the East and the West. To date, the four most influential
English translations of Lu’s short stories are three selections
(Lyell 1990; Wang 1941; Yang and Yang 1956) and one complete version (Lovell
2009). Lu arguably gained his reputation in the English-speaking world owing to
these (re)translations by L1 English translators (hereafter, L1 translators) as well
as L2 English translators (hereafter, L2 translators).
The English translations of Lu’s short stories have attracted considerable attention
from researchers in translation studies. Methodologically, most inquiries were
prescriptive case studies; only a few empirical explorations offered
descriptive analyses of these English (re)translations, along with
social or cultural explanations. Among these empirical studies, corpus-assisted
methods have been used to investigate linguistic (lexical or syntactic) features (e.g.,
Shao 2018; Shao and Wang 2018) and translator style (e.g., Li et al. 2018; Xu and
Jiang 2020; Yan and Han 2015). It is worth pointing out that the directionality of
these English translations, that is, forward (L1 to L2) translation and backward
(L2 to L1) translation, has not been investigated systematically. Among the
corpus-assisted studies of directionality, Xu and Jiang (2020) conducted a case analysis
of translation direction by examining linguistic features of L1 and L2 English
translations of Ah Q Zhengzhuan (阿Q正傳), one of Lu’s short stories. These
corpus-assisted studies hardly targeted the patterns of translation strategies for
specific concepts (e.g., metaphor) or their underlying cognitive traits.
DOI: 10.4324/9781003298328-15
Part of Lu’s narrative art is embedded in its rhetorical features, notably the deployment
of metaphorical expressions. Most of these linguistic metaphors carry
culture-specific coloring and are regarded as “the most important particular
problem” (Newmark 1988, 104) in translation for their “creative violation of the
semantic rules in a linguistic system” (Schäffner 2017, 249). However, although metaphor is a salient
rhetorical feature in the source text (ST) of Lu’s short stories, its
transformation into the target text (TT) has scarcely been investigated with the
assistance of corpus tools. This might be due to the challenge of manual semantic
annotation and the complexity of different metaphorical mappings across languages and
time (Stefanowitsch 2006, 2). Beyond research on the English rendition of
metaphors in Lu’s short stories, Rodríguez Márquez (2010) and Shuttleworth (2017)
used corpus tools to examine the patterns of translation strategies for political and
scientific metaphors, respectively. However, these were product-oriented studies
conducted from the perspective of conceptual metaphor. There is therefore a need for
a corpus-assisted study of the rendering process of linguistic metaphors in Lu’s
stories from an alternative (e.g., cognitive-pragmatic) perspective.
Compared with corpus-assisted product analysis, a corpus-assisted process study
of metaphor mapping across languages and time might be an even bigger challenge.
The former has seldom been employed by
researchers (Lang and Li 2020, 91), although such purely process-oriented methods
as verbal reports, keystroke logging, and/or eye tracking have
gradually attracted the attention of translation process researchers exploring
metaphors in translation (Schäffner 2017; Schäffner and Chilton 2020; Schäffner and
Shuttleworth 2013). A few scholars (e.g., Chou et al. 2016; Huang 2020) used
corpus-assisted process analysis to investigate the translation strategies and
neurocognitive processing routes of culture-specific items in the Chinese rendition
of literary texts written in English. However, they treated metaphorical expressions
as a whole under the umbrella term of culture-specific items, without considering
culture-universal metaphors or distinguishing conventional from creative
metaphors. In the same vein, Lang and Li’s (2020) corpus-assisted process study
of culture-specific metaphors in simultaneous interpreting did not scrutinize
culture-universal or conventional/creative metaphors. More importantly,
Hou (2017) conducted a corpus-assisted cognitive study of translation strategy
patterns of linguistic metaphors in English translations of Lu’s short stories by
taking metaphors as a whole, without further categorizing them into different types
in terms of culture specificity and conventionality. Moreover, Hou (ibid.) considered
the processing routes of metaphors from a neurocognitive perspective instead of
a communicative one. To fill this gap, the present study contrasted mapping strategies
of culture-specific, culture-universal, conventional, and creative metaphors within a
cognitive-pragmatic framework with the help of a bilingual parallel corpus.
There were two objectives in the present study: pattern description and theoretical explanation. First, it aimed to describe and contrast translation-strategy patterns of metaphors quantitatively. To this end, a bidirectional parallel corpus was
constructed to identify and contrast the patterns of translation strategies adopted
by L1 and L2 professional translators in their rendition of metaphors as a whole
as well as subcategories (i.e., culture-specific, culture-universal, conventional,
and creative ones). Second, it aimed to analyze the patterns within the theoretical framework of relevance theory (RT, Sperber and Wilson 1986/1995). Within
the RT framework, the between-group (i.e., L1 vs. L2 translators in dealing with
different categories of metaphors) and within-group (i.e., among the individual
translators) analyses drew on cognitive-pragmatic principles as well as on
the degree of translator’s subjectivity to violate these principles in the translation
process of linguistic metaphors.
Accordingly, there were three research questions in the current study, as follows.
(1) What were the rendering patterns of metaphors in English translations of Lu’s
short stories by L1 and L2 translators respectively?
(2) Contrastively, what were the similarities and differences between and within
these groups of translators?
(3) Cognitive-pragmatically, why were there such similarities and differences?
To answer the questions, Section 14.2 of the present chapter introduced RT
as a theoretical framework in metaphor translation. Section 14.3 described the
bidirectional bilingual parallel corpus for identifying translation-strategy patterns
of metaphors. Section 14.4 reported the corpus results and data analyses, and Section 14.5 discussed the findings from the perspective of RT.
14.2 Relevance Theory as the Theoretical Framework
RT proposed by Sperber and Wilson (1986/1995) has been viewed as the primary
theoretical framework in cognitive pragmatics and as “the only cognitive-pragmatic approach within translation studies” (Gallai 2019, 51). It thus serves as a
hub to relate a translator’s mental process to his/her rendering activity, a communicative act. This section presented a relevance-theoretic approach to pragmatics and translation in general and its applications to metaphors in translation in
particular.
14.2.1 RT and Translation
Sperber and Wilson (1986/1995) proposed two modes of human communication: a coding-decoding mode and an inferential mode. The former views communication as a linear coding-decoding process, while the latter sees it as an inferential
computing process that draws on contextual assumptions, that is, the cognitive
environment. Both modes are necessarily used in complex communicative interaction between speakers and hearers, in the form of a search
for relevance. Defined as “a property of inputs to cognitive processes and analyzed in terms of the notions of cognitive effect and processing effort” (Wilson
2000, 420), relevance can be achieved by the audience in an assumed ostensive-inferential process of either explicit or implicit communication. Relevance-oriented communication is governed by two principles proposed by Sperber and
Wilson (1995, 260).
(1) Cognitive principle of relevance. Human cognition tends to be geared to the
maximization of relevance.
(2) Communicative principle of relevance. Every act of communication conveys
a presumption of its own optimal relevance.
The degree of relevance is regulated by the listener’s cognitive effect in a given
context of mutual manifestness and mediated by his/her effort to infer the communicator’s informative or communicative intentions (Sperber and Wilson 2012,
102). Cognitively, other things being equal, the greater the positive effect yielded
by an input, the greater its relevance to an individual, and the more effort is worth
exerting. However, the cognitive principle implies that an audience with a given
level of inferential abilities and preferences tends to obtain the maximal effect with
the least necessary effort, resulting in maximal relevance. Communicatively,
every communicative act is viewed as a relevance-optimizing process during
which the addressee exerts enough effort to understand the addresser’s intentions.
The communicative principle indicates that no extra effort is needed as
long as the optimal effect is achieved. Altogether, the process of communication
is economically conditioned by the principle of relevance in terms of cognitive
(processing) effort and cognitive (contextual) effect: (1) the least effort for the
most effect, and (2) enough effort for a satisfactory effect.
RT was first applied to translation as a unified theoretical framework by Gutt
(1991/2000) and empirically advanced in written translation by such scholars as
Alves (1995) and Alves and Gonçalves (2003, 2007). Its applications to interpreting and audiovisual translation (for a review, see Gallai 2019; also see Alves
2020) are not reviewed here because the present study focuses on written
translation. Although the RT approach to translation has been criticized by some
researchers (Gallai 2019, 61–62, 65), this line of research has proved fruitful in translation studies: it views translation as an interlingual interpretive use of language
and regards the translator as a carrier striving for interpretive resemblance
between the ST and the TT at the level of propositional form, thought, or utterance. Interpretive resemblance can be realized by “direct translation,” a
term defined by Gutt (2000, 171) as follows:
[A] receptor language utterance is a direct translation of a source language
utterance if and only if it purports to interpretively resemble the original completely in the context envisaged for the original.
In this respect, a translator is guided by the ST author’s communicative clues,
that is, signs invoking inference and response or serving as a starting point for poetic effect
(Carston 2002, 117; Mackenzie 2002, 5; also see Walker 2021, 187), to seek contextual resemblance to that of the ST. In direct translation, the authentic meaning of the original should be “unaffected by the translator’s own interpretation
effort” (Gutt 1991/2000, 186). Conversely, texts produced by “indirect translation”
are “designed to function on their own – e.g. a touristic leaflet – and may be modified in order to achieve maximal relevance for the users” (Gallai 2019, 58).
Apart from the interpretive resemblance between the ST and the TT in context, translation could be viewed as a secondary communication, “an act of
communication between translator and target audience only” (Gutt 2000, 213).
Based on his/her judgment of the contextual assumptions of the TT receiver, the
translator is therefore required to meet the TT reader’s expectations by making
crucial choices, such as employing direct or indirect translation, conveying informative and communicative intentions, and strategically
using explicitation and implicitation. The translator and the TT reader, therefore,
need to share basic assumptions about interpretive resemblance and to reach
agreement between the translator’s intentions and the reader’s expectations in order to
succeed in their cross-language communication (Gutt 2000, 192).
Additionally, the three players in translation (i.e., the ST author, the translator,
and the TT reader) might have different cognitive environments. To satisfy the
expectations of relevance, the translator performs a kind of “mutual adjustment”
(Carston 2002) of two parallel processes in translation, that is, decoding the ST linguistic items into explicatures and making contextual assumptions
from the explicit content. Importantly, translation competence is a vital factor in
the translator’s adjustment of inference (Gutt 2005). In principle, other things being
equal, higher translation competence tends to result in successful translation by
means of optimal relevance. Alves and Gonçalves (2007) modeled the translator’s
competence within the relevance-theoretical framework and claimed that competence could ensure interpretive resemblance between the ST and the TT. Moreover, human cognition is geared to a min-max relation between effort and effect, but
more effort may be exerted if there is a reward for the reader in the form of poetic
effects (Gutt 2000, 164). More effect deserves more effort when the
ST input is implicit rather than explicit and the translator has to make inferences
from contextual assumptions. Other things being equal, such implicitness requires more
effort to process, just as unshared, creative, and poetic items function as
communicative clues in translation (Walker 2021, 186).
To sum up, the translator’s choices within the RT framework are guided by
the principle of relevance. The degree of relevance is, in general, evaluated by
the translator’s cognitive effect and processing effort and, in particular, triggered
by the source input (e.g., implicitness/explicitness, culture-shared/unshared items,
creative/conventional usage) and modulated by his/her translation competence in
the context of background information (e.g., the bilingual and bicultural background knowledge and inferential skills used in the process of translation to yield
new cognitive effects). Theoretically, the translator’s decision-making is embedded in the search for an interactive balance between the interpretive resemblance
of the original content and the satisfaction of the expectations of the target reader.
In other words, the translator is expected to produce an equivalent effect on
the ST and TT readers (Nida 1964, 159), so that cognitive equivalence can be
realized by reaching a similar cognitive effect for the ST and TT readers in
terms of the rendition of communicative clues (Walker 2021, 197–98). In practice, however,
he/she might assume interpretive resemblance but sacrifice the expectations of the target reader so as to show his/her own translation intention (e.g.,
attitude or voice). It is therefore advisable for an RT translation researcher to treat
source input and the translator’s competence as essential influencing factors.
14.2.2 RT and Metaphors in Translation
Linguistic metaphor within the RT framework has been explored robustly in
monolingual communication (Carston 2017) but not in interlingual translation.
This subsection presented RT views on metaphor and examined whether they can plausibly
account for metaphor in translation.
14.2.2.1 RT Views on Metaphor Understanding
A metaphor might be viewed as a departure from the normal use of language, as explored in classical rhetoric, or as a way of thinking, as examined in cognitive linguistics. However, metaphor in RT is viewed as a phenomenon of loose
(i.e., not strictly literal) use of language, requiring no specific mechanism or procedures compared with literal and loose uses in verbal communication (Sperber
and Wilson 1995, 237; also see Carston 2017, 43). Thus, the RT approach to metaphor comprehension is complementary to metaphor research from the perspective of cognitive linguistics (Gibbs and Tendahl 2006; Tendahl and Gibbs 2008;
Wilson 2011).
There are two distinct relevance-driven processing routes to metaphor comprehension: one is via ad hoc concept, the other via literal meaning. The deployment
of two routes depends on “a range of factors including the degree of familiarity,
complexity, and creativity of the metaphor” (Carston 2017, 43). The pragmatic
account of metaphorical utterance aims to “explain how hearers recognize the
intended meaning of a metaphorical utterance in context” (Wilson 2011, 180).
According to the traditional view in RT, hearers understand linguistic metaphors
by mutual parallel adjustment of linguistic and contextual clues to create a novel
ad hoc concept, which is different from linguistically encoded meaning (Wilson
2011, 179). In principle, metaphorical comprehension, like understanding other
literal or nonliteral uses, is regulated by relevance; that is, hearers exert the
least effort in search of implications and stop at the first interpretation that satisfies their expectations of implicatures. An array of weak implicatures recognized in some metaphorical utterances might have a powerful poetic
effect on hearers and encourage further exploration. Unlike conventional metaphors, novel metaphors might
involve analogical processing in deriving implicatures and generating an ad hoc concept to understand them (Wearing 2014).
Different from the traditional RT view on the comprehension of metaphorical
utterances (esp. conventional ones) by merely resorting to weak implicatures for
ad hoc concept construction, Carston (2017, 51–52) claimed that there was an
alternative, relevance-driven route to novel metaphor understanding. The alternative is far from “the quick, local, on-line meaning-adjustment process,” “but a
slower, more global appraisal of the literal meaning of the metaphorical language
from which inferences about the speaker’s meaning are made” (Carston 2017,
52). For example, image metaphors (esp., those new and creative ones) should
be interpreted more slowly and globally. Those image metaphors produce such
nonpropositional (experiential) effects as sensory, imagistic, or affective ones,
which cannot be represented by explicatures or implicatures but are in
need of further reflective inferential processing on the basis of literal meaning.
Some experimental data are consistent with Carston’s current position, suggesting
that metaphoric reading is far from cost-free in case of familiarity and creativity
(Noveck 2018, 166, 169).
14.2.2.2 RT and Interlingual/Intercultural Metaphor Rendition
Most previous studies of metaphor in translation, whether as product or process,
have adopted the perspective of conceptual metaphor theory (see
Schäffner 2004, 2017; Schäffner and Chilton 2020, 326–43; Schäffner and Shuttleworth 2013; Trim and Śliwa 2019), but little attention has been paid to linguistic
metaphors in translation from the angle of RT (see Alves 2020; Gallai 2019). One
of the core issues of the former line of research is “culture-specific or universal metaphors and consequences for translation” (Schäffner 2017, 252), where the
mapping patterns of metaphor in translation have been identified and analyzed
within the theoretical framework of conceptual metaphor, with or without the
assistance of bilingual or multilingual corpora.
Cultural factors have been addressed quite frequently in Translation Studies,
with cultural differences being identified as obstacles to the semantic transfer
of metaphors. . . . In investigating cultural differences, Translation Studies
scholars also build on Metaphor Studies research into the ways conceptual
metaphors are expressed linguistically in different languages.
(Schäffner 2017, 252)
Culture-specific constraint on rendering metaphors is therefore one of the main
reasons for the choice of a translation strategy as well as some identified patterns
(Schäffner and Chilton 2020, 339). However, it is worth pointing out that monolingually, a corpus-assisted pragmatic study of both conventional and novel metaphors within the RT framework is rare in comparison with the line of research in
cognitive corpus linguistics, which focuses on conventional rather than novel uses
of language (Kolaiti and Wilson 2014). It is also the case for a corpus-assisted
pragmatic study of the bilingual translation process of novel and culture-specific
metaphors. Kolaiti and Wilson’s (2014) corpus analysis of monolingual lexical
pragmatics would shed light on bilingual translations of the aforementioned types
of metaphors.
As early as the 1990s, Gutt (1991) investigated metaphorical expressions within
the RT framework in his pioneering study. More recently, this kind of study
has been taken up by other scholars (e.g., Boase-Beier 2011). These early cognitive-pragmatic explorations of metaphor in translation paved the methodological way
for further study, although they were not assisted by corpus tools. Accordingly,
corpus-assisted research on metaphors in translation within the RT framework needs to build on monolingual studies (e.g., Kolaiti
and Wilson 2014), since no previous bilingual studies are available for reference.
Furthermore, professional translators’ selection of translation strategies for novel
and/or culture-specific metaphors needs to be revisited because translation competence is a critical factor constraining the metaphor rendition strategy.
In short, it is plausible to adopt RT to analyze metaphors in translation, as early
studies have shown. The choice of translation strategy for metaphors within RT
depends on key constraints such as the translator’s metaphor competence (e.g.,
familiarity) and the type of metaphor (e.g., novel/creative or conventional; culture-specific or culture-universal).
14.2.3 Plausibility of RT in the Current Study
The main reason for applying RT in the present research was to fill a gap: the
exploration of rendering strategies in terms of the subcategories of metaphorical
expressions as well as the translator’s competence. As mentioned in Section 14.1,
metaphors in English translations of Lu’s short stories were mainly examined
from linguistic, social, or cultural perspectives rather than a cognitive-pragmatic one.
Additionally, most relevance-theoretical studies of metaphors in translation
neglected the novelty and culture specificity of literary poetic metaphors, let
alone those in Lu’s short stories as rendered by L1 and L2 translators. It is worth noting
that both L1 and L2 translators in the present study were professional translators.
Therefore, translation competence in the current research was viewed as a well-defined influencing factor.
The other reasons concerned the close fit between RT and metaphor in translation. First, RT has adequate explanatory power in interlingual and intercultural
communication (i.e., translation; for a detailed review, see Gallai 2019, 51–72).
It is thus advisable to apply RT to examine metaphors, one of the important and
difficult issues in translation. Second, unlike external social-cultural analysis
and abstract cognitive-linguistic explanation, RT is characterized by psychological reality (Gibbs and Tendahl 2006, 379–403) and focuses on the inner (mental)
world of translators and the communicative context (the speaker’s intention and the reader’s expectation). In this respect, RT could provide a cognitive-pragmatic analysis
of metaphor translation in the present study. Third, with the development of
metaphor research from usage to cognition, there is a trend toward taking communication as an angle from which to investigate metaphors (Steen 2011, 26). Accordingly, the
exploration of metaphor in cross-language/culture communication is in need of a
theoretical framework to keep pace with the trend in cognitive-pragmatic analysis. It is, therefore, reasonable to say that, since connecting metaphor with the
relevance-driven idea would be a groundbreaking research area (Carston 2017,
42–43), RT might be one of the best choices for exploring metaphor in translation.
It could therefore be predicted that RT offers considerable power in its
cognitive-pragmatic account of the mental processes of metaphor rendition. RT
might tell us much more about the aspects of cognition in communication than the cognitive-linguistic approach to renditions of metaphors. It could
serve as a sound theoretical tool to probe into the interplay of effort and effect in
L1 and L2 translators’ translation processes in the present research.
14.3 Research Method
As mentioned previously, it is plausible for researchers to adopt the corpus-assisted process research method to investigate the renditions of novel and culture-specific metaphors. Additionally, these metaphors could be explained within
the RT framework by following Kolaiti and Wilson’s (2014) practice of applying
the corpus-assisted research method. To this end, we constructed a self-built
bilingual parallel corpus with the ST of Lu’s short stories and four English TTs by
L1 and L2 professional translators.
14.3.1 Bidirectional Multi-translational Bilingual Parallel Corpus
The corpus consisted of Lu’s ten short stories as the ST subcorpus and their four
English versions (i.e., multi-translations) as the TT subcorpus (for the detailed word
count of each subcorpus, see Li et al. 2018). The TT subcorpus was further divided
into TT_L1Eng and TT_L2Eng subcorpora. TT_L1Eng subcorpus was composed
of the English translations of Lu’s ten short stories by Lyell (1990) and Lovell
(2009). Accordingly, TT_L2Eng subcorpus was made up of the English versions by
Wang (1941) and Yang and Yang (1956). The Yangs were viewed as L2 translators
in this study because Yang was the principal translator and his wife was a TT polisher in their collaborative translation, which was in line with Wang’s (2011, 897)
proposal in his historical review of directionality in Chinese translation practice.
The ST and the TT(s) in each subcorpus were segmented at the sentence level
and aligned into parallel texts. Using XML markup, the ST subcorpus was
annotated with four types of metaphors, and the TT subcorpus was tagged with four
translation strategies for these metaphors, as shown in the following subsections.
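The annotation scheme just described can be pictured with a small Python sketch. The element and attribute names below (unit, st, tt, met, ts, culture, novelty, group, type), the translator labels T1 to T3, and the convention of reading a missing ts element as omission are all hypothetical; the chapter specifies only that the corpus was sentence-aligned and annotated in XML, not its actual schema.

```python
import xml.etree.ElementTree as ET

# One hypothetical alignment unit: an ST sentence with a tagged metaphor and
# three TT sentences, each optionally carrying a tagged translation strategy.
SAMPLE = """<unit id="u1">
  <st>cong shouli <met culture="specific" novelty="creative">wachu</met> yizhang zhitiao</st>
  <tt translator="T1" group="L1"><ts type="transcoding">dig out</ts> a slip of paper</tt>
  <tt translator="T2" group="L2"><ts type="paraphrasing">take out</ts> a slip of paper</tt>
  <tt translator="T3" group="L2">a slip of paper</tt>
</unit>"""


def read_unit(xml_text):
    """Return one record per target text, pairing the ST metaphor with its strategy."""
    unit = ET.fromstring(xml_text)
    met = unit.find("st/met")
    records = []
    for tt in unit.findall("tt"):
        ts = tt.find("ts")
        records.append({
            "metaphor": met.text,
            "culture": met.get("culture"),
            "novelty": met.get("novelty"),
            "group": tt.get("group"),
            # Assumed convention: no <ts> element means the metaphor was omitted.
            "strategy": ts.get("type") if ts is not None else "omission",
        })
    return records


for record in read_unit(SAMPLE):
    print(record)
```

Flattening each aligned unit into records of this kind would make it straightforward to count strategies by translator group and metaphor subcategory later on.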
14.3.2 Metaphor Identification
We relied on an authoritative definition to delimit
and identify metaphors in the ST subcorpus. Metaphor in the ST was defined
following Dickins:
A metaphor is a figure of speech in which a word or phrase is used in a non-basic sense, this non-basic sense suggesting a likeness or analogy . . . with
another more basic sense of the same word or phrase.
(Dickins 2005, 228)
In addition to this definition, we strictly followed the metaphor identification
procedure (MIP) put forward by the Pragglejaz Group (2007). Metaphors in the ST were classified in terms of time and space, that is, (1) conventional
and creative metaphors in the temporal dimension and (2) culture-specific and
culture-universal metaphors in the spatial dimension. We focused on culture-specific and creative metaphorical references rather than on the subclassification of metaphors at different linguistic (e.g., lexical, phrasal, or clausal) levels or different
word classes (e.g., nominal, verbal, adjectival, adverbial, or prepositional ones).
Therefore, the metaphorical references in the ST in our corpus were lexical in
nature, specific in culture, and creative in time. Note that creative or novel metaphors in Lu’s short stories refer to individually used but not popularly accepted
metaphorical expressions, although Lu’s short stories were produced around
the 1920s–1930s. Additionally, culture-universal and conventional ones were
identified but treated as the control group.
The overlapping space and time dimensions resulted in four mixed subcategories
of metaphors, namely, “culture-universal and conventional,” “culture-universal and
creative,” “culture-specific and conventional,” and “culture-specific and creative”
ones. We chose culture specificity as the first dimension and creativity as
the second to identify and annotate them separately. Following the operational procedures of MIP (Pragglejaz Group 2007), we focused on the contextualized use of the non-basic or indirect sense of metaphorical expressions compared with their basic or literal
sense. For instance, literally referring to “cold” and implicating “unsympathetic,” lenglengde (冷冷地) in the expression “lenglengde shuo (冷冷地說)” was identified as
a culture-universal and conventional metaphorical reference due to its culture-shared
and conventionalized features in both Chinese and English languages/cultures in the
present corpus. Another example is suipian (碎片) in the expression “[ta]sixiang li chuxian baikui
baijia de suipian ([他]思想裡出現白盔白甲的碎片),” which literally means
small broken pieces of some concrete things and metaphorically means parts of experienced events. Suipian in the present corpus was identified as a culture-universal and
creative metaphorical expression because it is not a fixed or conventionalized expression in the Chinese language but is a metaphor shared by both Chinese and English cultures,
as established by consulting such authoritative Chinese and English dictionaries as the Modern Chinese Dictionary (現代漢語詞典) and the Oxford English Dictionary, along with
inquiries to native speakers. Similarly, koufeng (口風) in the phrase
“tan gemingdang de koufeng (探革命黨的口風)” is a metaphorical reference, with
the literal meaning of “wind from one’s mouth” and the indirect meaning of “attitude.” It
was identified as a Chinese culture-specific and conventional metaphor in the present
corpus, considering its usage in the Chinese and English languages. With the same identification procedures and considering the literal/nonliteral usage across time and culture, wachu (挖出) in the verbal expression “cong shouli wachu yizhang zhitiao (從手裡挖出一張紙條)” was recognized as a culture-specific and creative metaphor.
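The two-dimensional classification can be sketched in Python as follows. The class and field names are hypothetical and stand in for whatever labels the authors actually used; the four sample items and their categories are those given in the paragraph above.

```python
from dataclasses import dataclass
from enum import Enum


class Culture(Enum):
    UNIVERSAL = "culture-universal"
    SPECIFIC = "culture-specific"


class Novelty(Enum):
    CONVENTIONAL = "conventional"
    CREATIVE = "creative"


@dataclass(frozen=True)
class MetaphorTag:
    item: str          # pinyin of the metaphorical reference
    culture: Culture   # first dimension: culture specificity
    novelty: Novelty   # second dimension: creativity

    @property
    def subcategory(self):
        """Combine the two dimensions into one of the four mixed subcategories."""
        return f"{self.culture.value} and {self.novelty.value}"


# The four examples discussed in this subsection.
EXAMPLES = [
    MetaphorTag("lenglengde", Culture.UNIVERSAL, Novelty.CONVENTIONAL),
    MetaphorTag("suipian", Culture.UNIVERSAL, Novelty.CREATIVE),
    MetaphorTag("koufeng", Culture.SPECIFIC, Novelty.CONVENTIONAL),
    MetaphorTag("wachu", Culture.SPECIFIC, Novelty.CREATIVE),
]

for tag in EXAMPLES:
    print(tag.item, "->", tag.subcategory)
```

Keeping the two dimensions as separate fields mirrors the decision to annotate culture specificity first and creativity second, while the combined label is derived only when needed.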
14.3.3 Translation Strategies
The strategies for translating metaphors were divided into four categories, namely,
transcoding, paraphrasing, substitution, and omission, based on existing literature
(e.g., Chou et al. 2016; Lang and Li 2020; Sjørup 2013, 75; Toury 2012, 108).
We fully followed the definitions of the four translation strategies by Lang and Li
(2020), as follows.
Transcoding meant that the target retained the image and language form of
the source; paraphrasing meant that the target explained the meaning of the
source while discarding the image and the form of the source expression;
substitution referred to the replacement of the source metaphor with another
metaphor entailing a different image in the target language; omission meant
no corresponding translation in the target output.
(Lang and Li 2020, 97)
Take wachu (挖出), one of the metaphorical references in Subsection 14.3.2, as
an example. It had a literal meaning of “to dig something out” and a metaphorical
meaning of “to catch and then hold something carefully by one’s hands” in the
atypical collocation “cong shouli wachu yizhang zhitiao” (從手 裡挖出一張紙條;
literally, to dig out a sheet of paper from the handkerchief). When the English translation “to dig out” was adopted, it was the result of transcoding, a strategy used to retain the original form and image. The English translation “to take
out” was a result of paraphrasing, as the original image and form
were changed into an expression with explicit meaning. It was a case of substitution when the English idiomatic expression “to fish out” was used to replace the
original image with a similar metaphorical meaning. Finally, omission applied
where the target text contained no corresponding translation at all.
14.3.4 Annotation and Cross-Check
The complete automatic annotation of metaphors with software may currently be a castle
in the air, as suggested by some scholars (e.g., Shuttleworth 2013, 119;
Lang and Li 2020, 97). It might be unavoidable for researchers to annotate
metaphors in the ST manually. Human effort was also required
to annotate translation strategies in the TTs based on the operational definitions.
Under such circumstances, we built a semi-automatic annotation tool as a
macro add-in for Microsoft Word, with annotations stored in XML format, to cope with the
time-demanding annotation. All the metaphors and their translation strategies
were annotated manually with this semi-automatic tool. The annotation in the
present study was cross-checked by three individual researchers to ensure its
reliability and validity.
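To make the annotation workflow more concrete, the following is a minimal Python sketch of how XML-style annotations of this kind could be tallied. It is an illustration only: the element and attribute names (segment, metaphor, render, strategy, translator), the translator labels A to D, and the assignment of renderings to those labels are hypothetical, since the chapter does not describe the actual schema produced by the Word macro add-in. Only the four strategy labels and the wachu renderings are taken from Subsection 14.3.3.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical markup: tag names, attribute names, and translator labels A-D
# are invented for illustration and are not the authors' actual schema.
SAMPLE = """
<segment id="s1">
  <st><metaphor id="m1" culture="specific" novelty="creative">wachu</metaphor></st>
  <tt translator="A"><render ref="m1" strategy="TRC">dig out</render></tt>
  <tt translator="B"><render ref="m1" strategy="PAR">take out</render></tt>
  <tt translator="C"><render ref="m1" strategy="SUB">fish out</render></tt>
  <tt translator="D"><render ref="m1" strategy="OM"/></tt>
</segment>
"""

# The four strategy labels used in the chapter.
STRATEGIES = {"TRC", "PAR", "SUB", "OM"}


def count_strategies(xml_text):
    """Count (translator, strategy) pairs, rejecting unknown strategy labels."""
    root = ET.fromstring(xml_text.strip())
    counts = Counter()
    for tt in root.iter("tt"):
        for render in tt.iter("render"):
            strategy = render.get("strategy")
            if strategy not in STRATEGIES:
                raise ValueError(f"unknown strategy label: {strategy!r}")
            counts[(tt.get("translator"), strategy)] += 1
    return counts


counts = count_strategies(SAMPLE)
for (translator, strategy), n in sorted(counts.items()):
    print(translator, strategy, n)
```

Rejecting unknown labels at parse time is one simple way to support a cross-check: a typo in a manually entered strategy code surfaces immediately instead of silently creating a fifth category.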
14.4 Results and Analyses
After the completion of annotation, we identified 779 metaphors in the ST, among
which there were 724 culture-specific and 55 culture-universal ones. The total
number of culture-specific metaphors fell into 266 conventional and 458 creative ones, while culture-universal metaphors consisted of 41 conventional and 14
creative ones. The corresponding strategies for rendering these metaphors by L1
and L2 translators were identified and tabulated in Tables 14.1–14.4, respectively.
These strategies could be integrated into Gutt’s (1991/2000) cognitive-pragmatic
translation routes within the framework of RT, namely, direct translation and
indirect translation, by mapping transcoding onto the former and by merging
paraphrasing, substitution, and omission into the latter, as seen in Figure 14.1.
Table 14.1 Rendering Strategies of Conventional Culture-Specific Metaphors
Group            Translator             TRC (%)       PAR (%)       SUB (%)       OM (%)       Total
L2 Translators   Wang (1941)            69 (25.94)    138 (51.88)   30 (11.28)    29 (10.90)   266
L2 Translators   Yang and Yang (1956)   56 (21.05)    150 (56.39)   49 (18.42)    11 (4.14)    266
L1 Translators   Lyell (1990)           66 (24.81)    122 (45.86)   64 (24.06)    14 (5.26)    266
L1 Translators   Lovell (2009)          41 (15.41)    133 (50.00)   50 (18.80)    42 (15.79)   266
Note: TRC = transcoding; PAR = paraphrasing; SUB = substitution; OM = omission.
Table 14.2 Rendering Strategies of Creative Culture-Specific Metaphors
Group            Translator             TRC (%)       PAR (%)       SUB (%)       OM (%)       Total
L2 Translators   Wang (1941)            170 (37.12)   200 (43.67)   55 (12.01)    33 (7.21)    458
L2 Translators   Yang and Yang (1956)   162 (35.37)   211 (46.07)   70 (15.28)    15 (3.28)    458
L1 Translators   Lyell (1990)           162 (35.37)   164 (35.81)   113 (24.67)   19 (4.15)    458
L1 Translators   Lovell (2009)          99 (21.62)    199 (43.45)   86 (18.78)    74 (16.16)   458
Note: TRC = transcoding; PAR = paraphrasing; SUB = substitution; OM = omission.
Table 14.3 Rendering Strategies of Conventional Culture-Universal Metaphors
Group            Translator             TRC (%)       PAR (%)       SUB (%)       OM (%)       Total
L2 Translators   Wang (1941)            36 (87.80)    2 (4.88)      1 (2.44)      2 (4.88)     41
L2 Translators   Yang and Yang (1956)   23 (56.10)    13 (31.71)    5 (12.20)     0 (0)        41
L1 Translators   Lyell (1990)           23 (56.10)    11 (26.83)    5 (12.20)     2 (4.88)     41
L1 Translators   Lovell (2009)          19 (46.34)    18 (43.90)    3 (7.32)      1 (2.44)     41
Note: TRC = transcoding; PAR = paraphrasing; SUB = substitution; OM = omission.
Table 14.4 Rendering Strategies of Creative Culture-Universal Metaphors
Group            Translator             TRC (%)       PAR (%)       SUB (%)       OM (%)       Total
L2 Translators   Wang (1941)            13 (92.86)    0 (0)         0 (0)         1 (7.14)     14
L2 Translators   Yang and Yang (1956)   11 (78.57)    2 (14.29)     0 (0)         1 (7.14)     14
L1 Translators   Lyell (1990)           11 (78.57)    2 (14.29)     1 (7.14)      0 (0)        14
L1 Translators   Lovell (2009)          11 (78.57)    2 (14.29)     0 (0)         1 (7.14)     14
Note: TRC = transcoding; PAR = paraphrasing; SUB = substitution; OM = omission.
[Figure 14.1: bar chart of the shares of direct (DirTran) and indirect (IndTran) translation by metaphor category, in two panels (L2 Translators, L1 Translators); vertical axis from 0.00% to 90.00%.]
L2 Translators, DirTran/IndTran: Con(S) 23.50%/76.50%; Crt(S) 36.24%/63.76%; Con(U) 71.95%/28.05%; Crt(U) 85.71%/14.29%.
L1 Translators, DirTran/IndTran: Con(S) 20.11%/79.89%; Crt(S) 28.49%/71.51%; Con(U) 51.22%/48.78%; Crt(U) 78.57%/21.43%.
Figure 14.1 RT translation route of metaphors by L1 and L2 translators
Note: Con(S) = Conventional Culture-Specific Metaphors; Crt(S) = Creative Culture-Specific Metaphors; Con(U) = Conventional Culture-Universal Metaphors; Crt(U) = Creative Culture-Universal
Metaphors; DirTran = Direct Translation; IndTran = Indirect Translation.
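The percentages in Figure 14.1 can be recomputed from Tables 14.1–14.4 by treating transcoding as direct translation and pooling the other three strategies as indirect translation. The Python sketch below does this under that assumption; the dictionary and function names are ours, and only the counts come from the tables.

```python
# Transcoding (TRC) counts per translator, copied from Tables 14.1-14.4.
TRC = {
    "Con(S)": {"Wang": 69, "Yang and Yang": 56, "Lyell": 66, "Lovell": 41},
    "Crt(S)": {"Wang": 170, "Yang and Yang": 162, "Lyell": 162, "Lovell": 99},
    "Con(U)": {"Wang": 36, "Yang and Yang": 23, "Lyell": 23, "Lovell": 19},
    "Crt(U)": {"Wang": 13, "Yang and Yang": 11, "Lyell": 11, "Lovell": 11},
}
# Number of ST metaphors in each category (the "Total" column).
TOTAL = {"Con(S)": 266, "Crt(S)": 458, "Con(U)": 41, "Crt(U)": 14}
# Translator groups as labelled in the tables.
GROUPS = {"L2": ("Wang", "Yang and Yang"), "L1": ("Lyell", "Lovell")}


def route_shares(category, group):
    """Return (direct %, indirect %) for one metaphor category and group."""
    translators = GROUPS[group]
    direct = sum(TRC[category][t] for t in translators)
    total = TOTAL[category] * len(translators)
    pct_direct = 100 * direct / total
    # Paraphrasing, substitution, and omission are pooled as indirect translation.
    return round(pct_direct, 2), round(100 - pct_direct, 2)


for group in GROUPS:
    for category in TOTAL:
        print(group, category, route_shares(category, group))
```

Pooling the two translators of each group before dividing, rather than averaging their individual percentages, gives the same result here because every translator rendered the same set of ST metaphors.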
14.4.1 Patterns of Translation Strategies
The distributions of translation strategies for culture-specific metaphors were tabulated in Tables 14.1–14.2. Table 14.1 illustrated translation strategies for conventional culture-specific ones, while Table 14.2 depicted creative culture-specific
ones. Accordingly, the distribution of translation strategies for culture-universal
metaphors was tabulated in Table 14.3 for conventional culture-universal ones
and in Table 14.4 for creative culture-universal ones. The patterns of rendering
strategies between groups as well as within groups are analyzed below.
As shown in Tables 14.1 and 14.2, paraphrasing was the dominant strategy used
by both L1 and L2 translators for rendering culture-specific metaphors, followed by
transcoding, substitution, and omission, regardless of either conventional or creative subcategories. Compared with conventional culture-specific metaphors, the
creative ones as a whole in both translation directions accounted for a slightly lower
share for paraphrasing strategy but a slightly higher one for transcoding. Direction-wise, L2 translators tended to employ more paraphrasing but less substitution than
L1 translators irrespective of conventional or creative culture-specific metaphors.
Individual differences within the groups of professional translators could be
found in both Table 14.1 and Table 14.2. Specifically, transcoding was used to
the greatest extent by Wang (1941) in comparison with any other translators in
the present research, paraphrasing by Yang and Yang (1956), substitution by Lyell
(1990), and omission by Lovell (2009). It suggested that the individual translator
had his/her translation style in terms of translation strategies used to render both
subcategories of culture-specific metaphors.
Tables 14.3 and 14.4 indicated the frequency distribution and the percentage of
rendering strategies of culture-universal metaphors in terms of conventionality or
creativity. Transcoding took the lion’s share among the translation strategies with
a higher percentage for conventional ones, especially, adopted by L2 translators in
forward translation. In general, paraphrasing was the second most used strategy by
both L1 and L2 translators (but more for the former) for rendering culture-universal metaphors. However, there was no central tendency for the use of substitution
and omission either between or within groups.
Taken together, our corpus data revealed that paraphrasing was the most adopted
strategy by both L1 and L2 translators for rendering culture-specific metaphors,
while transcoding was the major one for culture-universal metaphors, suggesting
that culture-specific metaphors should be more difficult to translate than culture-universal ones. Additionally, the percentage of transcoding used by both L1 and L2
translators for translating conventional metaphors was lower than that for creative metaphors as a whole. Finally, more paraphrasing but less substitution was used by L2
translators than by L1 translators for culture-specific metaphors, suggesting
that more effort should be exerted by L1 translators to achieve better cognitive
effect. In a similar vein, more paraphrasing and substitution were adopted by L1
translators for translating culture-universal metaphors, suggesting that more cognitive
effect could be achieved through the greater cognitive effort exerted by L1 translators.
14.4.2 Patterns of Cognitive-Pragmatic Translation Routes
As mentioned previously, the four rendering strategies were mapped onto two
cognitive-pragmatic translation routes, namely, direct and indirect routes, for further analysis within the RT framework. For the sake of illustration, the present
research visualized the percentages of translation routes in Figure 14.1, and the
exact frequency distributions can be found in Tables 14.1–14.4.
Figure 14.1 demonstrated that indirect translation overwhelmingly dominated the rendition of culture-specific metaphors. However, direct translation
for culture-universal metaphors accounted for the biggest share in contrast to
indirect translation. Furthermore, the use of direct and indirect translation routes
by L2 translators for transforming culture-specific metaphors as a whole was
statistically different from that of culture-universal metaphors (χ2 = 86.97, p <
0.05). It suggested that metaphorical references with culture-universality in the
ST required more direct translation, while those of culture specificity engaged
more indirect translation. However, the rendition of conventional metaphors as a
whole by L2 translators demanded more indirect translation than that of creative
ones (χ2 = 9.85, p < 0.05). More interestingly, L1 translators shared L2 translators’
central tendency, that is, more direct translation employed for culture-universal
metaphors in comparison with culture-specific ones (χ2 = 54.78, p < 0.05), but less
for conventional ones in contrast to creative ones (χ2 = 6.06, p < 0.05).
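The test statistics reported above can be checked against the tables. The sketch below assumes a Pearson chi-square test on a 2×2 table (route by metaphor type) with one degree of freedom and no continuity correction; the chapter does not name the test variant, so this is an inference from the fact that the resulting values agree with the reported ones. The cell counts are sums of the transcoding counts in Tables 14.1–14.4, with all other strategies pooled as indirect translation, and the function name is ours.

```python
def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (df = 1, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Critical value of the chi-square distribution for df = 1 at alpha = 0.05.
CRITICAL_05 = 3.841

# Cells: (specific direct, specific indirect, universal direct, universal indirect).
# L2: 457 = 69 + 56 + 170 + 162 out of 2 * (266 + 458) = 1448 culture-specific
#     renderings; 83 = 36 + 23 + 13 + 11 out of 2 * (41 + 14) = 110 universal ones.
L2 = (457, 991, 83, 27)
# L1: 368 = 66 + 41 + 162 + 99 out of 1448; 64 = 23 + 19 + 11 + 11 out of 110.
L1 = (368, 1080, 64, 46)

for label, cells in (("L2", L2), ("L1", L1)):
    chi2 = pearson_chi2_2x2(*cells)
    print(label, round(chi2, 2), "significant" if chi2 > CRITICAL_05 else "n.s.")
```

Under these assumptions the two statistics come out at about 86.97 for L2 translators and 54.78 for L1 translators, both well above the 3.841 threshold.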
However, L1 and L2 translators adopted statistically different translation
routes, although a similar central tendency was observed between them. Direct
translation was recruited more frequently by L2 translators, while indirect translation was employed more frequently by L1 translators in terms of culture specificity and creativity of the original metaphors. Specifically, there were statistically
significant differences between direct and indirect translation used by L1 and L2
translators for translating culture-specific and culture-universal metaphors (culture-specific: L1 vs. L2 translators, χ2 = 13.43, p < 0.05; culture-universal: L1 vs.
L2 translators, χ2 = 7.4, p < 0.05). The same held for conventional and creative
metaphors, respectively, as a whole (conventional: L1 vs. L2 translators, χ2 = 5.05,
p < 0.05; creative: L1 vs. L2 translators, χ2 = 12.6, p < 0.05). It seemed that culture specificity and creativity of metaphors in the ST had a significant impact on
the cognitive-pragmatic translation routes employed by L1 and L2 translators in the
present research.
With regard to the four mixed subcategories of metaphors (i.e., conventional
culture-specific, creative culture-specific, conventional culture-universal, and
creative culture-universal ones), they shared a similar tendency to those merged
ones in terms of culture specificity or creativity. Statistically, direct translation was
more frequently adopted by L2 translators than L1 translators in their translating
creative culture-specific metaphors as well as conventional culture-universal ones
(creative culture-specific, χ2 = 12.57, p < 0.05; conventional culture-universal,
χ2 = 7.45, p < 0.05). However, there was no statistical difference between L1
and L2 translators’ choice of translation routes for rendering conventional
culture-specific and creative culture-universal ones, respectively (conventional
culture-specific, χ2 = 1.79, p > 0.05; creative culture-universal, χ2 = 0.12, p >
0.05), although direct translation was employed more frequently by L2 translators
than L1 translators in such a condition (3.39 percentage points higher for conventional culture-specific metaphors and 7.14 percentage points higher for creative culture-universal ones for L2 than
L1 translators). It suggested that the selective nuances of translation routes by
L1 and L2 translators could be manifested in the process of rendering the mixed
subcategories of metaphors.
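For the four mixed subcategories, the same kind of 2×2 comparison (translator group by route) can be sketched in Python. One detail is worth hedging: the chapter does not say whether a continuity correction was applied. In our recomputation, the uncorrected Pearson statistic agrees with the reported values for the three larger subcategories, whereas the reported 0.12 for creative culture-universal metaphors (only 14 items per translator) agrees with the value obtained under Yates’ continuity correction. This is our observation, not a statement from the source, and the function and variable names are ours.

```python
def chi2_2x2(a, b, c, d, yates=False):
    """Chi-square statistic (df = 1) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates' continuity correction shrinks |ad - bc| by n/2 (floored at 0).
        diff = max(0.0, diff - n / 2)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Cells: (L2 direct, L2 indirect, L1 direct, L1 indirect), pooled over the two
# translators of each group; direct = transcoding counts in Tables 14.1-14.4.
SUBCATEGORIES = {
    "creative culture-specific": (332, 584, 261, 655),
    "conventional culture-universal": (59, 23, 42, 40),
    "conventional culture-specific": (125, 407, 107, 425),
    "creative culture-universal": (24, 4, 22, 6),
}

for name, cells in SUBCATEGORIES.items():
    plain = chi2_2x2(*cells)
    corrected = chi2_2x2(*cells, yates=True)
    print(f"{name}: uncorrected {plain:.2f}, Yates-corrected {corrected:.2f}")
```

The correction matters most when cell counts are small, which is the situation in the creative culture-universal subcategory, where two of the four cells hold fewer than ten renderings.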
14.5 RT Accounts and Discussions
As shown in Section 14.4, the findings of the present study could affirmatively
answer the first two research questions raised in Section 14.1.3. Concerning the
third question, this section served as RT accounts for the similarities and differences within and between L1 and L2 translators in terms of translation strategies
and routes.
14.5.1 Dominant Strategies/Routes by L1 and L2 Translators
One of the salient features of translating metaphorical references in Lu’s short
stories lay in the dominant transfer shared by L1 and L2 translators. As for rendering culture-specific metaphors (esp., conventional ones) regardless of translation
direction, paraphrasing was the dominant strategy, as shown in Tables 14.1–14.2,
and indirect translation was the most frequently adopted route, as visualized in
Figure 14.1. However, transcoding was the overwhelmingly dominant strategy
for rendering culture-universal metaphors (esp., creative ones) by L1 and L2
translators, as indicated in Tables 14.3–14.4, and direct translation was the most
frequently adopted route, as demonstrated in Figure 14.1. In general, it suggested
that culture-specific metaphors might demand more cognitive effort in translation than culture-universal ones.
Paraphrasing as the dominant strategy is also one of the major findings of Hou
(2017), who viewed metaphor as a whole without further classification. This might be
because Lu’s short stories are rich in culture-specific metaphors. In
this respect, the use of paraphrasing does not differ significantly between
metaphors as a whole and culture-specific metaphors. As culture-specific metaphors violate the semantic rules in the ST and are regarded as barriers in translation
(Newmark 1988, 104; Schäffner 2017, 249), a direct translation of the metaphorical image tends to fail in cross-cultural communication. Cognitive-pragmatically, most culture-specific original images should not be transplanted into the
TT if there is no counterpart, as indicated in the four TTs of fanlian (翻臉) in
example 1, for fear that the foreignizing image could not evoke the cognitive effect
among the target readers. On the other hand, it was the translator’s optional decision to violate or observe cross-cultural communicative conventions. In
this regard, a minority of culture-specific metaphors could be optionally transcoded into the TT (as shown in TT3 of example 2) on the assumption that the
target readers would exert extra effort for poetic effect, provided that they recognized
and inferred the communicative clue of the stylistically salient feature. Therefore, it
seems that direct translation could be a minor alternative for professional
translators coping with culture-specific metaphors in literary translation, unless
they deliberately aim to introduce these ST metaphors into the target language. In general, direct translation could not dominate the rendition of
culture-specific metaphors, although a professional translator needs to arrive at an
interpretive balance between the ST and TT (Gutt 1991/2000, 171).
Example 1
ST: 況且他們一翻臉，便說人是惡人。
TT1: They have a way of branding anyone they don’t like as a wicked man.
(Wang 1941)
TT2: . . . and once they are angry they will call anyone a bad character. (Yang
and Yang 1956)
TT3: What’s more, as soon as they turn against someone, they’ll say he’s evil
anyway. (Lyell 1990)
TT4: And they can turn on you in an instant. (Lovell 2009)
Example 2
ST: . . . 似乎想探革命黨的口風。
TT1: . . . seeking to discover the attitude of the revolution towards himself.
(Wang 1941)
TT2: . . . as if sounding out the revolutionaries’ attitude. (Yang and Yang 1956)
TT3: . . . in an attempt to determine which way the revolutionary winds were
blowing. (Lyell 1990)
TT4: . . . (Lovell 2009)
Examples 1–2 demonstrate the typical pictures of translating culture-specific
metaphors in Lu’s short stories by L1 and L2 translators. These examples imply
that the majority of the original image of culture-specific metaphors should be
filtered or mediated in the TT, except for the translator’s assumption that the target
readers are capable of and willing to make extra effort to infer weak implicature
from ad hoc concept construction (Kolaiti and Wilson 2014). Additionally, the
intended meaning of culture-specific metaphors needs to be comprehended by the
TT readers from the literal meaning of the transferred metaphors, the creative
ones in the TT, as claimed by the latest generation of relevance theorists (e.g.,
Carston 2017; Noveck 2018). As far as the translator’s decision-making process
is concerned, his/her mental operations in translating metaphors can also be supported by some non-RT experimental studies (e.g., Sjørup 2013; Tirkkonen-Condit
2002). Based on her research, Tirkkonen-Condit (2002) claimed that transcoding
or direct transfer was the default translation strategy for the translator to cope
with the culture-specific metaphors. Similarly, Sjørup (2013, 207) pointed out,
“the translators had a default direct transfer strategy, which was only replaced by
an indirect translation strategy when the translator found it necessary.” In short,
dominant indirect translation of the culture-specific metaphors could be interpreted
as evidence for culture filter or concept mediation due to the translator’s final
inferential processing to meet the TT readers’ expectations in terms of relevance.
However, with an in-depth examination of the different types of metaphors in the
current study, the most frequently used strategy and processing route for rendering
the culture-universal ones are different from that of the culture-specific ones. Not
viewed as translation production problems or difficulties, culture-universal metaphors in the ST could be directly translated into the TT without any obstacles to
crossing over. The translator might subconsciously follow the cognitive and pragmatic principles to keep the shared image in the TT via direct translation in most
cases, as shown in all the TTs in example 3. Moreover, he/she might consciously
transform the literal sense of the original metaphor into its metaphorical sense, null
meaning, or alternative image in a given particular case. The particular translation performance would result in the use of paraphrasing, deleting, or substitution
strategy (i.e., strategic realization of indirect translation). It was a case where the
target reader’s cognitive effort would increase in search of poetic effect with the
translator’s deliberate employment of image substitution or decrease by means of
semantic explicitation or omission. Example 4 illustrates that the translator in TT4
might enhance the target reader’s cognitive effort to enjoy the added poetic effect
by replacing the original image with a new one in the TT, while the translator of
TT1 might lessen the target readers’ cognitive effort by omitting the original image.
Example 3
ST: 漸漸地,小報上有匿名人來攻擊他 . . .
TT1: Gradually there appeared in the tabloid papers anonymous attacks upon
him . . . (Wang 1941)
TT2: Gradually anonymous attacks appeared in the less reputable papers . . .
(Yang and Yang 1956)
TT3: Anonymous attacks on him gradually began appearing in the local papers
. . . (Lyell 1990)
TT4: Small local newspapers began launching anonymous attacks on him . . .
(Lovell 2009)
Example 4
ST: . . . [他]冷冷地說，“但哪裡去呢？”
TT1: . . . he said after I had told him of my desire to go elsewhere to find a
position. “But where to go?” (Wang 1941)
TT2: . . . he said coldly, after I asked him to recommend me to a job somewhere
else. “But where will you go?” (Yang and Yang 1956)
TT3: . . . he said coldly after I had asked him to help me find a job someplace
else. “But where else can you go?” (Lyell 1990)
TT4: . . . he stonily responded to my request for help in tracking down a new
situation. “But where else can you go?” (Lovell 2009)
It is worth pointing out that the translator’s routine practice for rendering culture-universal metaphors as a whole is to adopt the transcoding strategy or direct
translation route, as revealed by the results of our research data in Section 14.4.
The overall positive transfer of shared images of culture-universal metaphors
from the ST to the TT could be “a preferred/unmarked choice” (Mauranen 2004,
80), although fluent bilinguals, especially professional translators, were reported
to flexibly use processing routes in text-based translation tasks (Hatzidaki and
Pothos 2008). In this respect, the translator’s deliberate manipulation in a given
cognitive context merely occurs in a few special cases. It seems that the translator’s operation should be first regulated by the cognitive principle of maximal
relevance and then adjusted by the communicative principle of optimal relevance
due to the gradual consciousness of the features of the input source in cognition
and communication.
However, the temporal dimension of culture-universal metaphors in this study
shows that more transcoding and the direct translation route are used for translating
creative culture-universal metaphors than conventional culture-universal ones.
Probably, Lu borrowed conventional expressions of western metaphors into
his literary writings in the early 1900s for the fresh flavor they offered the ST readers.
These creative culture-universal metaphors in the ST could be understood without
any effort by the TT readers because they are conventional and shared
metaphors in the TT. As illustrated by the creative culture-universal metaphor “suipian”
(碎片) in example 5, the TT1 and TT4 translators used the
transcoding strategy and direct translation route to meet the target reader’s expectations, although the cognitive equivalence of the ST and TT readers might not be
balanced due to the creativity in the ST and conventionality in the TT. In general,
the maintenance of ST and TT communicative conventions by means of direct
translation leads to the central tendency to render culture-universal metaphors.
Example 5
ST: . . . [他]思想裡出現白盔白甲的碎片。
TT1: . . . and in his fertile imagination there again appeared fragments of shattered white helmets and white armor. (Wang 1941)
TT2: . . . and in his mind’s eye saw fragmentary visions of white helmets and
white armour once more. (Yang and Yang 1956)
TT3: . . . and glimpses of white helmets and white armor began to flicker
across his brain once again. (Lyell 1990)
TT4: . . . and fragments of white helmets and armour drifted back into his
thoughts. (Lovell 2009)
Apart from the RT analysis of these patterns, our results partially support the
previous corpus-assisted studies of metaphors or culture-specific items in other
texts other than Lu’s short stories. It is worth mentioning that our results could
not be compared with those of corpus-assisted study of linguistic features or translator’s style in English translations of Lu’s short stories (e.g., Li et al. 2018; Shao
2018; Shao and Wang 2018; Xu and Jiang 2020) due to the fact that the latter ones
are not explorations of translation strategies/routes of metaphors. Taking metaphors with culture specificity as a whole, our research is correlated with some
corpus-assisted research of culture-specific or difficult items in literary translation
(Huang 2020) and even in consecutive interpreting (Dam 2001) or simultaneous
interpreting (Lang and Li 2020), although they do not involve Lu’s short
stories. All the here-mentioned studies support the finding that paraphrasing strategy or meaning-based translation is used for rendering culture-specific items at
a large scale, suggesting that more efforts should be required for the translator/
interpreter to deal with the culture-specific items in considering the target reader’s
relevance to those items. More interestingly, our finding of more transcoding and
direct translation route for rendering creative culture-universal metaphors is also
consistent with that of English-Chinese translation of culture-specific items by
Chou and her colleagues (Chou et al. 2016). It should be mentioned that Chou
and her colleagues (ibid.) selected the English novel The Joy Luck Club by Amy Tan
as the ST, which is rich in Chinese idiomatic expressions written in English (e.g.,
metaphor as a subcategory of these expressions). The borrowed items from the
Chinese language are treated as creative culture-specific ones in the English ST
(ibid.), but they are corresponding ones between English ST and Chinese TT. In
this case, the translators seem to transfer the items back into the TT with less effort
by using transcoding strategy compared with those unshared ones in terms of the
relevance of effort and effect.
14.5.2 L1 Translators’ Adaptation vs. L2 Translators’ Adoption
Taking metaphors as those being processed in different translation directions,
our study demonstrated that as an overall tendency, direct translation (realized by
adoption) was favored by L2 translators while indirect translation (manifested by
adaptation) was preferred by L1 translators. With respect to the overall tendency of
translators’ selecting translation strategies in both directions, our result is aligned
with that of Huang’s (2015, 79–94) corpus-based study of English translations
of Jia Pingwa’s novels. Huang (2015) concluded that L1 translators preferred
adaptation while L2 translators were more likely to use adoption. In general, the
finding could be descriptively explained from the perspective of “interference”
and “standardization” (Toury 2012, 303–15). Take L2 translators for instance:
interference from the Chinese language (the source one) was stronger on the L2
translators because Chinese is their native language and because they aimed to introduce Chinese literature to the western world. In other words, the larger number
of L1 formal equivalences in L2 production in forward translation would result in
a higher proportion of adopted transfer. It was also correlated with the priming
or cue effect of the source language in translation comprehension, transfer, and
production (Wen and van Heuven 2017). In this case, L2 translators would subconsciously prefer the treatment of adoption rather than adaptation.
However, cognitive-pragmatic explanation within the RT framework might
shed much more light on the directional effect in translation. The principle of
relevance entails two components in communication (or translation): one is the
least effort or min-max principle, the other is the less effort or optimal principle.
Cognitive-pragmatically, the former is realized as adopted transfer, while the latter is manifested as the adapted transfer. Governed by the principle of relevance,
the two processing routes (i.e., direct and indirect translations) are available at any
time, and the former is the priority, whereas the latter will take over if the former
fails. It could be inferred that professional translators subconsciously observe the
principle of relevance in different translation directions and that they take such
factors as concept and intention in their cognitive environments into consideration to an optimal degree. The unbalanced adoption and adaptation suggest that L2
translators’ cognitive environment is different from L1 translators’.
Concept-wise, it is impossible for professional translators to know all metaphorical references in the ST and their counterparts in the TT due to the unavailability of their mental lexicon. Professional translators might have an unbalanced
mental lexicon in terms of L1 and L2 realization of a given concept. L2 professional translators are therefore much more knowledgeable in Chinese items but
less in English ones. If not, they could be translation experts with balanced knowledge in both languages and cultures in theory, but few could be found in practice.
Conversely, L1 translators are equipped with larger mental storage of English
items. This asymmetry of the mental lexicon in translator’s cognitive environment
could explain the result of more adoption by L2 translators and more adaptation
by L1 translators. With regard to intention, L2 translators in the present study
aimed to promote Chinese culture to the English world by keeping some potentially acceptable culture-specific images unchanged in the TT, while L1 translators aimed to introduce it as a novel for the general target readers, resulting in
some loss of foreign flavor. In this respect, L2 professional translators,
influenced by their cognitive environment, are more engaged with adoption than
L1 professional translators. This would be likely to generate a slightly larger proportion of direct translation or adoption but a lower share of concept-mediated transfer by L2 professional translators than by L1 professional translators.
As stated in Section 14.4, preference for translation strategies of metaphors varies across individual translators as the result of the different deployment of direct
or indirect processing routes. In general, it is also individual differences in terms
of the cognitive environment that regulate the employment of direct or indirect
translation. Compared with other translators, Wang (1941), as an L2 translator,
employed much more direct transfer. This implies that he was more adoption-weighted due to his source-oriented purpose of promoting Chinese culture in his
cognitive environment. By contrast, Lovell (2009) used more indirect transfer
to adapt Chinese metaphors to fulfill her more target-oriented purpose and meet
potential target readers’ expectations. Apart from observing the general tendency,
both L1 and L2 translators could adapt the culture-specific metaphor in the TTs in
some special cases. In examples 6–7, maimo (埋沒, literally referring to “bury”)
and yinxian (引線, literally referring to “fuse”) were rendered indirectly to lower
the cognitive effort of the target readers in all TTs except for TT3 in example 7.
Example 6
ST: . . . 他的刊物 決不會埋沒 稿子的。
TT1: . . . his publication would never turn down a good manuscript. (Wang
1941)
TT2: . . . his magazine would never ignore a good manuscript. (Yang and Yang
1956)
TT3: . . . he would never sit on a good manuscript. (Lyell 1990)
TT4: . . . Freedom’s Friend never turned away good work. (Lovell 2009)
Example 7
ST: . . .加以油雞們又大起來了,更容易成為兩家爭吵的引線。
TT1: – growing larger and larger, and more and more frequently the cause of
quarrels between the two families. (Wang 1941)
TT2: The chicks had grown into hens now, and were more of a bone of contention than ever between the two families. (Yang and Yang 1956)
TT3: . . . and as if the chicks weren’t enough, then they had to go and grow into
chickens and thereby provide even shorter fuses for the quarrels between
our family and the landlord’s. (Lyell 1990)
TT4: And now the hens were fully grown up, arguments between the two
households in the compound became more frequent. (Lovell 2009)
Mapping Metaphors in Lu Xun’s Short Stories
As mentioned previously, the categories of metaphors could modulate the use
of translation strategies by L1 and L2 translators. Our data revealed that culture
specificity and creativity influenced the degree of adoption and adaption by L1
and L2 translators, respectively, although preference of translation strategy or
processing routes varied with individual translators in both translation directions. Specifically, L2 translators were inclined to recruit more salient adoption
for culture-universal metaphors than culture-specific ones in comparison with
L1 translators. Similarly, L2 translators tended to use more adoption for creative
metaphors than conventional ones in contrast to L1 translators. In other words, L2
translators exerted more cognitive effort to render conventional culture-specific
metaphors compared with L1 translators. The interaction of stimuli (different
types of metaphors as source input) and cognitive environment leads to the asymmetry of processing routes between L1 and L2 translators. As mentioned earlier,
universal terms are easier to translate into the TT than specific ones. In this
case, L2 translators with more ST knowledge and the purpose of culture promotion would generally employ direct translation to transfer more original images
into the TT than L1 translators. However, as our data show, most of the creative metaphors in the
ST were borrowed from those in the TT, so that the translators
would not need much more effort to render them than conventional ones. It is
also the case that ST-knowledgeable, culture-motivated L2 translators employ
more direct translation than L1 translators when translating creative metaphors.
Overall, the cognitive-pragmatic account for those frequency distributions of
the strategies or routes lies in the governance of the principle of relevance in terms
of effort and effect embedded in the cognitive environments of the translator and
the TT reader. By observing the principle of relevance, both L1 and L2 professional translators in our pool of data subconsciously deployed the least effortful
route (i.e., direct translation). Crucially, if the most economical route is not available due to input difficulty or intentional-contextual consideration, he or she would
search for a less economical route (i.e., indirect translation). The interplay of processing routes could result in the dominant transfer and its variation by L1 and L2
translators for rendering metaphors as a whole or as an individual subcategory.
14.6 Concluding Remarks
The present research yielded some similarities and differences between L1 and L2
translators in their translating of metaphors. They were explained by the interplay
of effort and effect in terms of RT to understand the creative process of literary
translation in both translation directions. Four points needed further consideration, as follows.
First, compared with L2 translators, L1 translators tended to adapt metaphors
for the target reader, particularly in the cases of conventional and culture-specific
metaphors in the present study. The findings indicated that the categories of metaphors and translation direction were two interactive factors during the process
of translating metaphors where the directional effect was modulated by categories of metaphors. The interactive effects need to be tested against corpus-assisted
Linping Hou and Defeng Li
evidence, for few corpus-assisted studies of metaphors in translation have focused
on the interactive effect.
Second, the corpus-assisted relevance-theoretical analysis could be triangulated with behavioral or neurocognitive studies in translation in terms of research
methodology. The former complements and corroborates the
latter, although it is far from fine-grained research (e.g., a controlled research
design in psychological studies). This complementary line of research supports
the idea of “corpus and psychological triangulation” advanced by some scholars (e.g., Halverson 2010; Lang and Li 2020; Mahlberg et al. 2014), namely that corpus-assisted studies of professional literary translators’ performance at the textual
level could be triangulated with psychological studies. The triangulation needs further exploration with evidence. Specifically, the categorical effect of the more
frequent engagement of transcoding for shared items and paraphrasing for
unshared ones needs further investigation by combining a corpus-assisted method
with an alternative experimental method. This line of triangulation is also invaluable
for the directional effect on L1 and L2 translators’ performance.
Third, the application of indirect translation is not restricted to such appellative
texts as touristic leaflets, but it could be expanded to the translation of metaphors
in literary texts by L1 or L2 professional translators. Cognitive-pragmatically, indirect reproduction of metaphors in literary texts should also conform to the
principle of relevance. In other words, the interplay of effort and effect determines
the use of direct or indirect translation by both L1 and L2 translators.
In this scenario, RT advanced our understanding of the effect of directionality on
the translator’s performance in literary translation. Our present research results
revealed that L1 translators were more frequently engaged with indirect translation, whereas L2 translators resorted to more direct translation. This suggests that L1
translators exerted much more effort to activate the cognitive context of the target
reader than L2 translators did. In addition, the impact of the input category on the
effort made by L1 and L2 translators in the current research could be explained as
the translator’s effort varying with the difficulty of the rendered items, namely, the
more difficult an item is, the more effort a translator employs.
Fourth, the present study has implications for the cross-language creation of
literary works and promotion of Chinese culture. As stated previously, both L1 and L2
competent translators (sub)consciously took their cognitive effort and contextual
effect into consideration and made an optimal choice to put the (un)shared and
(non)creative information into the target language. It is implied that translation
competence, historical-cultural context, information type, translator’s intention,
and reader’s expectation are the most influential factors among others for literary translation as well as culture promotion and reception. With these factors,
it is directionality that plays an interactive role, as the differences of L1 and L2
translators’ performance indicated in this present study. In this scenario, we claim
that L2 translators in the new era of Chinese culture promotion should keep pace
and peace with L1 translators’ contribution to the reception or consumption of
Chinese literary works and culture.
References
Alves, Fabio. 1995. Zwischen Schweigen und Sprechen: Wie bildet sich eine transkulturelle Brücke? Hamburg: Dr. Kovac.
Alves, Fabio. 2020. “Translation, Pragmatics and Cognition.” In The Routledge Handbook
of Translation and Cognition, edited by Fabio Alves and Arnt Lykke Jakobsen, 133–46.
London: Routledge.
Alves, Fabio, and José Luiz Gonçalves. 2003. “A Relevance Theory Approach to the Investigation of Inferential Processes in Translation.” In Triangulating Translation: Perspectives in Process Oriented Research, edited by Fabio Alves, 3–24. Amsterdam: John
Benjamins.
Alves, Fabio, and José Luiz Gonçalves. 2007. “Modeling Translator’s Competence: Relevance and Expertise Under Scrutiny.” In Doubts and Directions in Translation Studies,
edited by Yves Gambier, Miriam Shlesinger, and Radegundis Stolze, 41–55. Amsterdam: John Benjamins.
Boase-Beier, Jean. 2011. A Critical Introduction to Translation Studies. London:
Continuum.
Carston, Robyn. 2002. Thoughts and Utterances: The Pragmatics of Explicit Communication. Oxford: Blackwell.
Carston, Robyn. 2017. “Relevance Theory and Metaphor.” In The Routledge Handbook of
Metaphor and Language, edited by Elena Semino and Zsófia Demjén, 42–55. London:
Routledge.
Chou, Isabelle, Victoria Lei, Defeng Li, and Yuanjian He. 2016. “Translational Ethics from
a Cognitive Perspective: A Corpus-assisted Study on Multiple English-Chinese Translations.” In Rereading Schleiermacher: Translation, Cognition and Culture, edited by
Teresa Seruya and José Miranda Justo, 159–73. Heidelberg: Springer.
Dam, Helle V. 2001. “On the Option Between Form-based and Meaning-based Interpreting: The Effect of Source Text Difficulty on Lexical Target Text Form in Simultaneous
Interpreting.” The Interpreters’ Newsletter 11: 27–54.
Dickins, James. 2005. “Two Models for Metaphor Translation.” Target 17, no. 2: 227–73.
Gallai, Fabrizio. 2019. “Cognitive Pragmatics and Translation.” In The Routledge Handbook of Translation and Pragmatics, edited by Rebecca Tipton and Louisa Desilla,
51–72. London: Routledge.
Gibbs, Raymond W., and Markus Tendahl. 2006. “Cognitive Effort and Effects in Metaphor Comprehension: Relevance Theory and Psycholinguistics.” Mind and Language
21, no. 3: 379–403.
Gutt, Ernst-August. 1991/2000. Translation and Relevance: Cognition and Context. Manchester: St. Jerome.
Gutt, Ernst-August. 2005. “On the Significance of the Cognitive Core of Translation.” The
Translator 11, no. 1: 25–49.
Halverson, Sandra L. 2010. “Cognitive Translation Studies: Developments in Theory and
Method.” In Translation and Cognition, edited by Gregory M. Shreve and Erik Angelone,
349–69. London: Routledge.
Hatzidaki, Anna, and Emmanuel M. Pothos. 2008. “Bilingual Language Representation
and Cognitive Processes in Translation.” Applied Psycholinguistics 29, no. 1: 125–50.
Hou, Linping. 2017. “A Corpus-assisted Case Study of Translational Directionality: Translations of Chinese Short Stories into English by L1- and L2- Translators.” PhD diss.,
University of Macau.
Hsia, Chih-Tsing. 1961. A History of Modern Chinese Fiction. New Haven, CT: Yale University Press.
Huang, Libo. 2015. Style in Translation: A Corpus-based Perspective. Heidelberg: Springer.
Huang, Qiuhong. 2020. A Corpus-assisted Contrastive Study on Translating Culture-specific and Non-culture-specific Items. Hong Kong: Xin Hwa Book Co., Limited.
Kolaiti, Patricia, and Deirdre Wilson. 2014. “Corpus Analysis and Lexical Pragmatics: An
Overview.” International Review of Pragmatics 6, no. 2: 211–39.
Lang, Yue, and Defeng Li. 2020. “Cognitive Processing Routes of Culture-specific Linguistic Metaphor in Simultaneous Interpreting: A Corpus-assisted Study.” In Key Issues
in Translation Studies in China, edited by Lily Lim and Defeng Li, 91–109. Singapore:
Springer.
Li Defeng 李德鳳, He Wenzhao 賀文照 and Hou Linping 侯林平. 2018. “Lan Shiling
fanyi fengge kuzhu yanjiu” 藍詩玲翻譯風格庫助研究 [A Corpus-assisted Study of
Julia Lovell’s Translating Style]. Foreign Language Education 外語教學 39, no. 1: 70–6.
Lovell, Julia (trans.). 2009. The Real Story of Ah-Q and Other Tales of China. London:
Penguin Books.
Lyell, William (trans.). 1990. Diary of a Madman and Other Stories. Honolulu, HI: University of Hawaii Press.
Mackenzie, Ian. 2002. Paradigms of Reading: Relevance Theory and Deconstruction. New
York: Palgrave Macmillan.
Mahlberg, Michaela, Kathy Conklin, and Marie-Josée Bisson. 2014. “Reading Dickens’s
Characters: Employing Psycholinguistic Methods to Investigate the Cognitive Reality of
Patterns in Texts.” Language and Literature 23, no. 4: 369–88.
Mauranen, Anna. 2004. “Corpora, Universals and Interference.” In Translation Universals: Do They Exist?, edited by Anna Mauranen and Pekka Kujamäki, 65–82. Amsterdam: John Benjamins.
Newmark, Peter. 1988. A Textbook of Translation. New York: Prentice Hall International.
Nida, Eugene A. 1964. Toward a Science of Translating with Special Reference to Principles and Procedures Involved in Bible Translating. Leiden: E. J. Brill.
Noveck, Ira. 2018. Experimental Pragmatics: The Making of a Cognitive Science. Cambridge: Cambridge University Press.
Pragglejaz Group. 2007. “MIP: A Method for Identifying Metaphorically Used Words in
Discourse.” Metaphor and Symbol 22, no. 1: 1–39.
Rodríguez Márquez, María. 2010. “Patterns of Translation of Metaphor in Annual Reports
in American English and Mexican Spanish.” PhD diss., University of Surrey.
Schäffner, Christina. 2004. “Metaphor and Translation: Some Implications of a Cognitive
Approach.” Journal of Pragmatics 36, no. 7: 1823–64.
Schäffner, Christina. 2017. “Metaphor in Translation.” In The Routledge Handbook of
Metaphor and Language, edited by Elena Semino and Zsófia Demjén, 247–62. London:
Routledge.
Schäffner, Christina, and Paul Chilton. 2020. “Translation, Metaphor and Cognition.” In
The Routledge Handbook of Translation and Cognition, edited by Fabio Alves and Arnt
Lykke Jakobsen, 326–43. London: Routledge.
Schäffner, Christina, and Mark Shuttleworth. 2013. “Metaphor in Translation: Possibilities
for Process Research.” Target 25, no. 1: 93–106.
Shao Li 邵莉. 2018. “Lu Xun xiaoshuo yizuozhong de cihui ouhua xianxiang – jiyu yuliaoku de lishi yuyanxue yanjiu” 魯迅小說譯作中的詞彙歐化現象 – 基於語料庫的歷時語言學研究 [Lexical Europeanization in Lu Xun’s Fictional Translation Works:
A Corpus-based Diachronic Linguistic Study]. Journal of PLA University of Foreign
Languages 解放軍外國語學院學報 41, no. 6: 98–106.
Shao Li 邵莉, and Wang Kefei 王克非. 2018. “Lu Xun baihua xiaoshuo yizuozhong jufa
ouhua xianxiang de lishi bianhua – jiyu yuliaoku de yanjiu fangfa” 魯迅白話小說譯作中句法歐化現象的歷時變化 – 基於語料庫的研究方法 [The Diachronic Change of
Syntactic Europeanization in Lu Xun’s Vernacular Fictional Translation Works: A Corpus-based Approach]. Foreign Language and Their Teaching 外語與外語教學, no. 6: 133–42.
Shuttleworth, Mark. 2013. “Metaphor in Translation: A Multilingual Investigation into
Language Use at the Frontiers of Science Knowledge.” PhD diss., Imperial College
London.
Shuttleworth, Mark. 2017. Studying Scientific Metaphor in Translation: An Inquiry into
Cross-lingual Translation Practices. Abingdon: Routledge.
Sjørup, Annette. 2013. “Cognitive Effort in Metaphor Translation: An Eye-tracking and
Key-logging Study.” PhD diss., Copenhagen Business School.
Sperber, Dan, and Deirdre Wilson. 1986/1995. Relevance: Communication and Cognition.
Oxford: Blackwell.
Sperber, Dan, and Deirdre Wilson. 2012. “A Deflationary Account of Metaphors.” In Meaning and Relevance, edited by Deirdre Wilson and Dan Sperber, 97–122. Cambridge:
Cambridge University Press.
Steen, Gerard. 2011. “The Contemporary Theory of Metaphor – Now New and Improved!”
Review of Cognitive Linguistics 9, no. 1: 26–64.
Stefanowitsch, Anatol. 2006. “Corpus-based Approaches to Metaphor and Metonymy.” In
Corpus-based Approaches to Metaphor and Metonymy, edited by Anatol Stefanowitsch
and Stefan Th. Gries, 1–17. Berlin: Mouton de Gruyter.
Tendahl, Markus, and Raymond W. Gibbs. 2008. “Complementary Perspectives on Metaphor: Cognitive Linguistics and Relevance Theory.” Journal of Pragmatics 40, no. 11:
1823–64.
Tirkkonen-Condit, Sonja. 2002. “Metaphoric Expressions in Translation Processes.”
Across Language and Cultures 3, no. 1: 101–16.
Toury, Gideon. 2012. Descriptive Translation Studies and Beyond (rev. ed.). Amsterdam:
John Benjamins.
Trim, Richard, and Dorota Śliwa. 2019. Metaphor and Translation. Newcastle upon Tyne:
Cambridge Scholars Press.
Walker, Callum. 2021. An Eye-Tracking Study of Equivalent Effect in Translation: The
Reader Experience of Literary Style. Cham: Palgrave Macmillan.
Wang, Baorong. 2011. “Translation Practices and the Issue of Directionality in China.”
Meta 56, no. 4: 896–914.
Wang, Chi-Chen (trans.). 1941. Ah Q and Others: Selected Stories of Lusin. New York:
Columbia University Press.
Wearing, Catherine. 2014. “Interpreting Novel Metaphors.” International Review of Pragmatics 6, no. 1: 78–102.
Wen, Yun, and Walter J. van Heuven. 2017. “Chinese Translation Norms for 1,429 English
Words.” Behavior Research Methods 49, no. 3: 1006–19.
Wilson, Deirdre. 2000. “Metarepresentation in Linguistic Communication.” In Metarepresentations: A Multidisciplinary Perspective, edited by Dan Sperber, 411–48. Oxford:
Oxford University Press.
Wilson, Deirdre. 2011. “Parallels and Differences in the Treatment of Metaphor in Relevance Theory and Cognitive Linguistics.” Intercultural Pragmatics 8: 177–96.
Xu Ming 許明, and Jiang Yue 蔣躍. 2020. “Ah Q Zhengzhuan yiru yichu wenben de fengge
jiliangxue duibi”《阿Q正傳》譯入譯出文本的風格計量學對比 [A Stylometric Comparison of L1 Translations and L2 Translations of The True Story of Ah Q]. Foreign
Languages Research 外語研究 37, no. 3: 86–92.
Yan Yidan 嚴苡丹, and Han Ning 韓寧. 2015. “Jiyu yuliaoku de yizhe fengge yanjiu –
yi Lu Xun xiaoshuo liangge yingyiben wei li” 基於語料庫的譯者風格研究 – 以魯
迅小說兩個英譯本為例 [A Corpus-based Study of Translator’s Style in the Two English Versions of Lu Xun’s Fiction]. Foreign Language Education 外語教學 36, no. 2:
109–13.
Yang, Hsien-yi, and Gladys Yang (trans.). 1956. Lu Xun: Selected Works (Vol 1). Beijing:
Foreign Language Press.
15 On a Historical Approach to
Cantonese Studies
A Corpus-Based Contrastive
Analysis of the Use of Classifiers
in Historical and Recent
Translations of the Four Gospels
Tak-sum Wong and Wai-mun Leung
15.1 Introduction
Supported by the Lord Wilson Heritage Trust, the “Database of the 19th Century
(1865–1894) Cantonese Christian Writings” provides a public data repository
through the digitization of 15 Cantonese Christian classics published in the mid-
to late nineteenth century (Tóngguāng 同光 period of the Qing Dynasty), with a total
of approximately 466,000 characters. The database is accessible to those who
are interested in the history of Christianity in Hong Kong and provides valuable
and reliable documents for scholars in the fields of linguistics, theology, religion,
translation, and other academic disciplines.1
Since Robert Morrison (1782–1834) arrived in Guangzhou at the beginning of
the nineteenth century, marking the beginning of Protestant missions in China,
many missionaries have followed in his footsteps and come to the East. To facilitate
the dissemination of Christian teachings, missionaries who came to Guangdong
learned the local language, Cantonese, in the Guangdong region (including Hong
Kong) and began to translate, write, and publish Christian books in Cantonese
dialects, such as prayers, evangelistic books, and hymns. In addition to the various
books of the Bible, many influential Christian books were gradually translated to
or written in Cantonese during the mid- to late nineteenth century, such as Coming
Close to Jesus (1865), The Pilgrim’s Progress (1871), and Questions and Answers
on the Gospel of John (1888).
The historical value of the works available in this database is enormous for the
study of Christian missionary activities in the Guangdong area and the history
of early Cantonese translations. For example, it provides not only materials for
the study of the progress of scholars’ interpretation of ancient biblical manuscripts but also documents for the study of the historical development of Cantonese, textual analysis and interpretation of Cantonese, comparison of expressions
and styles in English-Cantonese translations, and historical formation of written
Cantonese.
DOI: 10.4324/9781003298328-16
The four key features of this database are as follows:
1 High diversity of literature. Full texts of the 15 Cantonese Christian classics from the mid- to late nineteenth century were digitalized, covering the following four categories:
• Books of the Bible:
The Old Testament: Genesis (1873), Exodus (1888), Deuteronomy (1888)
The New Testament: Acts (1872), Matthew (1882), Mark (1882), Luke (1883), John (1883), Selected Readings of the Gospel of Luke (circa the 1880s, Chinese-English-Romanization edition)
• Allegorical novels: The Pilgrim’s Progress (1871), The Pilgrim’s Progress II (1870)
• Spiritual missions: Coming Close to Jesus (1865), That Sweet Story of Old (1874)
• Teaching materials: Questions and Answers on the Gospel of John (1888), Readings in Cantonese Colloquial (1894)
2 Easy searching and exporting. Our database provides retrieval and advanced query functions such that users can set the number of results per page from 10 to 100 entries. The preceding and ensuing three sentences of each search result are displayed on the result page to help users understand its context. Results can be easily copied or exported to a spreadsheet for further processing.
3 Displaying images of original materials. Scanned images of original texts of all the 15 documents are provided to facilitate close reading of primary sources by users.
4 Facilitating the comparison of different translations.
The Old Testament. The following translation is provided for users to compare different translations of verses in Genesis, Exodus, and Deuteronomy:
• The Mandarin version published in Shanghai in 1919 (“The Old and New Testaments,” Chinese Union Version Bible, published by the American Bible Society)
The New Testament. The following two editions are provided for users to access selected readings from Matthew, Mark, Luke, John, and Acts for text comparison:
• The Mandarin version published in Shanghai in 1919 (“The Old and New Testaments,” Chinese Union Version Bible, published by the American Bible Society)
• The contemporary Cantonese translation published in Hong Kong in 2010 (Cantonese Bible: New Cantonese Version, published by the Hong Kong Bible Society, first edition published in 2006)
In the first stage of development of our database, 15 historical Christian writings
were digitalized and made publicly accessible. In the second stage, we planned to
provide linguistic tagging for all texts. At present, the tagging of the 1880s (Noyes
et al. 1882a, 1882b, 1883a, 1883b) and 2010 editions of the four canonical gospels in the Christian New Testament (“Four Gospels,” hereinafter) has been completed.
In this chapter, we will focus on these eight texts and provide a statistical account
and a contrastive study on the use of classifiers therein. For the linguistic value of
studying the translations of the Four Gospels, please refer to Leung (2011, 2016,
2021). On the study of digitalizing the early Cantonese Bible, the reader may refer
to Kataoka (2021).
15.2 Classifiers in Cantonese
In most European languages, the use of measure words is marked. They are only
employed when actualizing the semantic boundary of nouns (Bisang 1999, 121)
is desired. In some cases, the natural boundary is absent (e.g., a cup of coffee,
and a drop of water), while in other cases, the use of natural boundaries is not
intended (e.g., a basket of fruit, and a gang of people). In the context when the
natural boundary is adopted when counting, measure words are always absent
(e.g., an apple, a man, and a bean). On the other hand, in another part of the
world, the use of measure words is mandatory for a number of languages, even
when the natural boundary is adopted when counting. The measure words in
these languages are often referred to as classifiers. For example, in contemporary Cantonese:
(1) 一個哥哥
jɐt5 kɔ33 kɔ11kɔ55
one cl elder.brother
“an elder brother”
(2) 兩隻眼
lœŋ13 tsɛk3 ŋan13
two cl eye
“two eyes”
(3) 三個姑娘
sam55 kɔ33 ku55nœŋ11
three cl young.lady
“three young ladies”
(4) 六隻貓
lok2 tsɛk3 mau55
six cl cat
“six cats”
The absence of classifiers is ungrammatical when counting (with rare exceptions),
for example:
(1)’ *一哥哥
*jɐt5 kɔ11kɔ55
one elder.brother
“an elder brother”
(2)’ *兩眼
*lœŋ13 ŋan13
two eye
“two eyes”
(3)’ *三姑娘
*sam55 ku55nœŋ11
three young.lady
“three young ladies”
(4)’ *六貓
*lok2 mau55
six cat
“six cats”
Classifiers can be used to count not only nouns but also actions, for example:
(5) 賭一鋪
tou35 jɐt5 pʰou55
bet one cl
“to take a gamble”
(6) 打十下
ta35 sɐp2 ha13
hit ten cl
“hit ten times”
Classifiers for counting objects, as shown in examples 1 to 4, are commonly
known as numerical classifiers, while those for counting actions, as shown in
examples 5 and 6, are commonly called verbal classifiers.
When nouns are premodified with demonstrative and interrogative pronouns,
the use of classifiers is also mandatory, such as:
(7) 呢個姑娘
ni55 kɔ33 ku55nœŋ11
this cl young.lady
“this young lady”
(8) 嗰隻貓
kɔ35 tsɛk3 mau55
that cl cat
“that cat”
(9) 邊隻眼?
pin55 tsɛk3 ŋan13 ?
which cl eye
“Which eye?”
Being commonly used for counting and referential purposes in Cantonese (and
the majority of Sinitic languages), noun classifiers can also undergo reduplication to form reduplicated classifiers denoting each individual (Wu 2017), for
example:
(10) 個個姑娘都好靚
kɔ33kɔ33 ku55nœŋ11 tou55 hou35 lɛŋ33
cl-cl young.lady also very pretty
“Every young lady is pretty.”
In example 10, the general classifier kɔ33個 is reduplicated to form the construction kɔ33kɔ33個個, “everyone,” referring to every young lady.
For a comprehensive usage of classifiers in contemporary Cantonese, readers can refer to Cheung (2007, 344–6) as well as Matthews and Yip (2011, 39,
109–26).
15.3 A Contrastive Analysis of the Use of Classifiers in
Historical and Recent Translations of the Four Gospels
In this section, we will compare the use of classifiers as observed in the Cantonese translations of the 2010 edition and the 1880s edition of the four canonical
gospels in the Christian New Testament. In Section 15.3.1, classifiers for counting
and referential purposes will be analyzed, while reduplicated classifiers will be
discussed in section 15.3.2.
15.3.1 Classifiers for Counting and Referential Purposes
The ten most frequently used classifiers for counting and referential purposes as
observed in the 2010 edition of the contemporary Cantonese translation of the
Four Gospels are listed in Table 15.1.
Table 15.1 List of Top 10 Classifiers Present in the Contemporary Cantonese Translation of the Four Gospels
Matthew (N = 684)       Mark (N = 432)          Luke (N = 720)          John (N = 465)
Classifier (63)    #    Classifier (50)    #    Classifier (72)    #    Classifier (47)    #
個 kɔ33          247    個 kɔ33          152    個 kɔ33          296    個 kɔ33          167
啲 ti55          144    啲 ti55           89    啲 ti55          110    啲 ti55          103
日 jɐt2           42    日 jɐt2           22    日 jɐt2           51    位 wɐi35          43
隻 tsɛk3          25    隻 tsɛk3          15    隻 tsɛk3          20    日 jɐt2           36
班 pan55          19    次 tsʰi33         15    件 kin22          20    件 kin22          16
件 kin22          17    條 tʰiu11         13    人 jɐn11          19    次 tsʰi33         11
條 tʰiu11         17    件 kin22          11    次 tsʰi33         16    條 tʰiu11          9
位 wɐi35          15    班 pan55          11    位 wɐi35          15    年 nin11           6
次 tsʰi33         15    座 tsɔ22           9    年 nin11          13    班 pan55           6
句 kɵy33          11    位 wɐi35           7    條 tʰiu11         11    羣 kʷʰɐn11         6
In Table 15.1, N denotes the total number of classifier tokens in each gospel,
while the total number of classifier types is shown in row 2. For instance, 63 different classifiers are found in the Gospel of Matthew, while 684 tokens are present. It can be observed that 7 classifiers overlap in the top 10 classifier
lists across these four Gospels (highlighted). Note that in contemporary Cantonese, kɔ33個 is a general classifier used in a countable context in which the number
or amount to be expressed is exact, while ti55啲 is a general classifier used in an
uncountable context or when the number/amount to be expressed is unspecified.
One example for each classifier is presented in the following for illustration:
(11) 個 kɔ33
五個餅 (Luke 9:13, 2010)
ŋ13 kɔ33 pɛŋ35
five cl loaf
“five loaves”
(12) 啲 ti55
呢啲工作 (Luke 4:43, 2010)
ni55 ti55 koŋ55tsɔk3
DEM cl work
“these tasks”
(13) 日 jɐt2
三日 (Mark 8:2, 2010)
sam55 jɐt2
three day
“three days”
(14) 件 kin22
呢件事 (Luke 1:18, 2010)
ni55 kin22 si22
DEM cl matter
“this issue”
(15) 條 tʰiu11
兩條魚 (Luke 9:13, 2010)
lœŋ13 tʰiu11 jy35
two cl fish
“two fishes”
(16) 位 wɐi35
嗰位天使 (Luke 2:13, 2010)
kɔ35 wɐi35 thin55si33
that cl angel
“that angel”
(17) 次 tsʰi33
得罪你七次 (Luke 17:4, 2010)
tɐk5tsɵy22 nei13 tshɐt5 tsʰi33
trespass.against 2SG seven cl
“to trespass against thee seven times”
It should be noted that the absence of some frequently observed classifiers in the
top 10 list of a gospel does not imply its absence in the original text. In most cases,
those classifiers merely occupy a lower position in the frequency list. For example, the sortal classifier commonly used for counting animals, tsɛk3隻, appears
in all the Four Gospels: the Gospel of Matthew (25 tokens), the Gospel of Mark
(15 tokens), the Gospel of Luke (20 tokens), and the Gospel of John (3 tokens). Its
absence in the top 10 list of the Gospel of John is just a result of its low frequency,
even lower than the tenth most frequently observed classifier, namely, kʷʰɐn11羣,
“crowd” (6 tokens), which is a collective classifier and can also be used to count
animals.
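The overlap reported for Table 15.1 can be checked mechanically. The following is a minimal sketch in Python, assuming only the top 10 lists printed in Table 15.1; the names `TOP10_2010`, `shared_classifiers`, and `missing_from` are hypothetical and are not part of the database or of the authors' own tooling.

```python
# Top 10 classifiers per gospel, copied from Table 15.1 (2010 edition).
TOP10_2010 = {
    "Matthew": ["個", "啲", "日", "隻", "班", "件", "條", "位", "次", "句"],
    "Mark":    ["個", "啲", "日", "隻", "次", "條", "件", "班", "座", "位"],
    "Luke":    ["個", "啲", "日", "隻", "件", "人", "次", "位", "年", "條"],
    "John":    ["個", "啲", "位", "日", "件", "次", "條", "年", "班", "羣"],
}


def shared_classifiers(top_lists):
    """Return the classifiers found in every list, in the order of the first list."""
    lists = list(top_lists.values())
    common = set(lists[0]).intersection(*lists[1:])
    return [c for c in lists[0] if c in common]


def missing_from(top_lists, classifier):
    """Return the names of the lists that do not contain the given classifier."""
    return [name for name, items in top_lists.items() if classifier not in items]


shared = shared_classifiers(TOP10_2010)
print(len(shared), shared)            # seven classifiers are common to all four lists
print(missing_from(TOP10_2010, "隻"))  # the only top 10 list without 隻 is John's
```

The seven classifiers returned are the ones illustrated in examples 11 to 17, and the second call reflects the point made above about tsɛk3隻: it is missing only from the top 10 list of the Gospel of John, not from the gospel itself.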
Having introduced the distribution of classifiers in the contemporary Cantonese
translation of the Four Gospels of the 2010 edition, we travel back to the 1880s!
The distribution of classifiers for counting and referential purposes in the historical Cantonese translation of the 1880s edition of the Four Gospels is shown in
Table 15.2.
Table 15.2 List of Top 10 Classifiers Present in the Historical Cantonese Translation of the Four Gospels
Matthew (N = 678)       Mark (N = 398)          Luke (N = 798)          John (N = 476)
Classifier         #    Classifier         #    Classifier         #    Classifier         #
個 kɔ33          266    個 kɔ33          184    個 kɔ33          392    個 kɔ33          200
的 ti53          210    的 ti53          107    的 ti53          202    的 ti55          174
陣 tʃɐn22         64    隻 tʃɛk3          19    日 jɐt2           59    日 jɐt2           36
日 jɐt2           43    日 jɐt2           17    隻 tʃɛk3          33    陣 tʃɐn22         15
隻 tʃɛk3          27    條 tʰiu11         16    陣 tʃɐn22         28    條 tʰiu11         15
條 tʰiu11         22    樣 jœng22         16    件 kin22          21    件 kin22          10
樣 jœng22         20    kan53             11    條 tʰiu11         21    樣 jœng22         10
kan53              9    件 kin22          10    樣 jœng22         16    次 tsʰɿ33          6
次 tsʰɿ33          9    句 ky33           10    kan53             15    處 ʃy33            5
人 jɐn11           8    隊 tui22           8    年 nin11          11    位 wɐi22           5
Similarly, the overlapping classifiers are highlighted. One example for each of
these commonly observed classifiers in historical Cantonese will be given in the
following for illustration purposes:
(18) 個 kɔ33
十個城 (Luke 19:17, 1883)
ʃɐp2 kɔ33 ʃeŋ11
ten cl city
“ten cities”2
(19) 的 ti53
呢的衆人 (Mark 8:2, 1882)
ni53 ti53 tʃoŋ33jɐn11
dem cl multitude
“these people”
(20) 日 jɐt2
三日 (Mark 8:2, 1882)
sam53 jɐt2
three day
“three days”
(21) 條 tʰiu11
呢條標 (John 19:20, 1883)
ni53 tʰiu11 piu53
DEM cl title
“this title”
(22) 樣 jœŋ22
各樣嘅私慾 (Mark 4:19, 1882)
kɔk3 jœŋ22 kɛ33 sɿ53jok2
every cl ADN lust
“the lusts of other things”
Likewise, the absence of some commonly observed classifiers in the top 10 list
of a gospel in Table 15.2 does not imply its absence in that gospel. For instance,
as shown in Table 15.2, the sortal classifier kan53, which is commonly used
for counting buildings, appears in the top 10 lists of all Four Gospels except the Gospel of John.
Its absence there is merely a consequence of its low frequency in the Gospel of John: only one
instance is found.
Apparently, three classifiers are shared among both top 10 lists of the 1880s
and 2010 editions, namely, kɔ33個 [(11), (18)], jɐt2日 [(13), (20)], and tʰiu11條
[(15), (21)]. Readers who have a basic mastery of the Chinese language should
be able to notice the graphical similarity between the classifiers in examples 12 and 19, namely,
“啲” and “的.” In fact, the two allographs are semantically and phonologically
identical; the former is used predominantly in contemporary Cantonese
but appeared as early as 1877 in other Cantonese historical documents,
while the latter appears frequently in historical documents published in the nineteenth century. However, in the 1880s edition of
the Four Gospels, only the preserved graph “的” is present, possibly a result of
direct transference from earlier translations. The insertion of the mouth radical
“口” to the left of the graph “的” is probably related to a historical sound change
of this classifier. On the etymology and historical development of “啲” and
“的,” readers can refer to Wong (2010) for details. It is also worth noting that
four instances of the graph “的” are observed in the 2010 edition, despite its
rare presence, if not absence, in contemporary Cantonese vernacular writing.
This suggests that in the course of preparing the 2010 edition, the translator(s)
might have referred to the 1880s edition rather than translated from scratch.
Thus, four classifiers are in fact shared among the top 10 lists of the four Gospels
in both editions, namely:
kɔ33個, jɐt2日, tʰiu11條, and ti53/ti55的/啲
Tables 15.3 and 15.4 list the top 95% most frequently observed classifiers,
based on cumulative frequency, in the 2010 and 1880s editions of the Four Gospels, respectively.
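The ranking behind such a "top 95%" list can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' procedure: the function name `cumulative_table`, the cutoff rule (keep rows while the cumulative share does not exceed the threshold), and the toy counts are all assumptions made for the example.

```python
from collections import Counter


def cumulative_table(counts, coverage=0.95):
    """Rank items by frequency; keep rows whose cumulative share stays within `coverage`."""
    total = sum(counts.values())
    rows = []
    running = 0
    for rank, (item, freq) in enumerate(counts.most_common(), start=1):
        running += freq
        if running / total > coverage:
            break  # this row would push the cumulative share past the cutoff
        # (rank, item, frequency, relative freq., cumulative freq., cumulative relative freq.)
        rows.append((rank, item, freq, freq / total, running, running / total))
    return rows


# Hypothetical toy counts, NOT the chapter's data.
toy_counts = Counter({"個": 50, "啲": 30, "日": 12, "位": 5, "件": 3})
for rank, item, freq, rel, cum, cum_rel in cumulative_table(toy_counts):
    print(f"{rank}\t{item}\t{freq}\t{rel:.1%}\t{cum}\t{cum_rel:.1%}")
```

With the toy counts, the fourth item would raise the cumulative share to 97%, so only the first three rows are kept; the remaining items would be lumped into an "Other" row.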
In the 2010 edition, 96 classifiers are used, but in the 1880s edition, only 81
are present. Among the top 10 classifiers, 6 are found in both editions, namely,
kɔ33個, ti55/ti53啲/的, jɐt2日, kin22件, tsɛk3/tʃɛk3隻, and tʰiu11條, which suggests the
prevalent usage of these classifiers in Cantonese since the nineteenth century. It is
interesting to see that the cumulative frequency of the tenth most frequently used
classifier in the 1880s edition, ky33句, “sentence,” has reached 86.8% already, but
290
Tak-sum Wong and Wai-mun Leung
Table 15.3 The Most Frequently Observed Classifiers Present in the Recent Cantonese
Translation of the Four Gospels
Rank  Classifier  Frequency  Rel. Freq.  Cum. Freq.  Cum. Rel. Freq.
1  個 kɔ33  862  37.8%  862  37.8%
2  啲/的 ti55  450  19.7%  1312  57.5%
3  日 jɐt2  151  6.6%  1463  64.2%
4  位 wɐi35  80  3.5%  1543  67.7%
5  件 kin22  64  2.8%  1607  70.5%
6  隻 tsɛk3  63  2.8%  1670  73.2%
7  次 tsʰi33  57  2.5%  1727  75.7%
8  條 tʰiu11  50  2.2%  1777  77.9%
9  班 pan55  40  1.8%  1817  79.7%
10  座 tsɔ22  28  1.2%  1845  80.9%
11  羣 kʷʰɐn11  27  1.2%  1872  82.1%
12  句 kɵy33  25  1.1%  1897  83.2%
13  年 nin11  21  0.9%  1918  84.1%
14  人 jɐn11  19  0.8%  1937  85.0%
15  家 ka55  18  0.8%  1955  85.7%
16  倍 pʰui13  16  0.7%  1971  86.4%
17  嚿 kɐu22  15  0.7%  1986  87.1%
18  樖 pʰɔ55  14  0.6%  2000  87.7%
19  塊 fɐi33  14  0.6%  2014  88.3%
20  kan53  13  0.6%  2027  88.9%
21  籃 lam11  12  0.5%  2039  89.4%
22  種 tsong35  12  0.5%  2051  90.0%
23  粒 nɐp5  12  0.5%  2063  90.5%
24  張 tsœŋ55  12  0.5%  2075  91.0%
25  代 tɔi22  12  0.5%  2087  91.5%
26  樣 jœng22  11  0.5%  2098  92.0%
27  晚 man13  11  0.5%  2109  92.5%
28  組 tsou35  8  0.4%  2117  92.9%
29  兩 lœŋ35  8  0.4%  2125  93.2%
30  身 sɐn55  8  0.4%  2133  93.6%
31  歲 sɵy33  6  0.3%  2139  93.8%
32  隊 tɵy22  6  0.3%  2145  94.1%
33  段 tyn22  6  0.3%  2151  94.3%
34  邊 pin55  5  0.2%  2156  94.6%
35  雙 sœŋ55  5  0.2%  2161  94.8%
(Other 61)  119  5%  2280  100%
its rank counterpart in the 2010 edition, tsɔ22座, is 80.9% only, a difference
of almost 6%. In Table 15.3, among the 95% most frequently used classifiers in
modern Cantonese, three are not found in the entire Four Gospels of the 1880s
edition, namely, pan55班, tsong35種, and tsou35組. All these suggest that the diversity
of classifiers used in the 2010 edition is higher than that in the 1880s edition.
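The overlap between the two top 10 lists can be checked mechanically. The sketch below is a minimal illustration under stated assumptions: the two lists are transcribed from Tables 15.3 and 15.4, the classifier whose character did not survive in the tables is represented by its romanization `kan53`, and the `ALLOGRAPHS` mapping (treating 啲 and 的 as one classifier, as the chapter does) is a simplification introduced here.

```python
# Assumption: 啲 and 的 are counted as the same classifier, following the chapter.
ALLOGRAPHS = {"啲": "的"}


def normalise(classifier):
    """Map an allograph to a single canonical graph."""
    return ALLOGRAPHS.get(classifier, classifier)


# Top 10 classifiers as listed in Table 15.3 (2010) and Table 15.4 (1880s).
top10_2010 = ["個", "啲", "日", "位", "件", "隻", "次", "條", "班", "座"]
top10_1880s = ["個", "的", "日", "陣", "隻", "條", "樣", "件", "kan53", "句"]

set_2010 = {normalise(c) for c in top10_2010}
set_1880s = {normalise(c) for c in top10_1880s}

shared = set_2010 & set_1880s        # in both top 10 lists
only_2010 = set_2010 - set_1880s     # top 10 in 2010 only
only_1880s = set_1880s - set_2010    # top 10 in the 1880s only

print(sorted(shared), len(shared))
```

The intersection contains six classifiers (個, 的/啲, 日, 件, 隻, 條), in line with the count given in the text. Note that absence from a top 10 list is not absence from the edition; checking that would require the full inventories.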
It is also interesting to see that the relative frequency of some classifiers underwent a drastic change. For example, the relative frequency
of tui22/tɵy22 隊 fell from 0.7% in the 1880s edition to 0.3% in the 2010 edition, and
that of jœŋ22 樣 fell from 2.3% to 0.5%. Do the absence
Table 15.4 The Most Frequently Observed Classifiers Present in the Historical Cantonese
Translation of the Four Gospels
Rank  Classifier  Frequency  Rel. Freq.  Cum. Freq.  Cum. Rel. Freq.
1  個 kɔ33  1042  38.8%  1042  38.8%
2  的 ti53  693  25.8%  1735  64.5%
3  日 jɐt2  155  5.8%  1890  70.3%
4  陣 tʃɐn22  114  4.2%  2004  74.6%
5  隻 tʃɛk3  83  3.1%  2087  77.6%
6  條 tʰiu11  74  2.8%  2161  80.4%
7  樣 jœng22  62  2.3%  2223  82.7%
8  件 kin22  49  1.8%  2272  84.5%
9  kan53  36  1.3%  2308  85.9%
10  句 ky33  26  1.0%  2334  86.8%
11  嚿 kɐu22  25  0.9%  2359  87.8%
12  次 tsʰɿ33  24  0.9%  2383  88.7%
13  人 jɐn11  18  0.7%  2401  89.3%
14  年 nin11  18  0.7%  2419  90.0%
15  倍 pʰui13  18  0.7%  2437  90.7%
16  隊 tui22  18  0.7%  2455  91.3%
17  位 wɐi22  14  0.5%  2469  91.9%
18  斤 kɐn53  11  0.4%  2480  92.3%
19  處 ʃy33  11  0.4%  2491  92.7%
20  籃 lam11  11  0.4%  2502  93.1%
21  粒 nɐp5  10  0.4%  2512  93.5%
22  笪 tat3  10  0.4%  2522  93.8%
23  世 ʃɐi33  9  0.3%  2531  94.2%
24  代 tɔi22  9  0.3%  2540  94.5%
25  張 tʃœŋ53  8  0.3%  2548  94.8%
(Other 56)  140  5.2%  2688  100%
of the three classifiers in the 1880s edition and the drastic change in the relative
frequency of some classifiers also suggest that there existed a process of lexical
replacement in the history of Cantonese? A comparison of identical verses containing these three classifiers in the two editions was conducted to investigate this
conjecture. Our analysis found that while in most cases, the reduction in the use of
classifiers is a result of the employment of other strategies in the course of translation, in other cases, lexical replacement took place.
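A verse-by-verse comparison of this kind can be approximated programmatically once the two editions are aligned by verse reference. The following is a rough sketch only: the mini-inventory `CLASSIFIERS`, the function name `classifier_shifts`, and the character-level matching are hypothetical simplifications (a real analysis would need word segmentation and manual checking), and the verse fragments are short excerpts quoted from examples (23) and (27) of this chapter.

```python
# Hypothetical mini-inventory of classifier characters to track.
CLASSIFIERS = {"隊", "組", "笪", "塊"}

# Verse-aligned fragments (excerpts from examples (23) and (27)).
verses_1880s = {"Luke 9:14": "每隊五十人", "Matthew 13:44": "嚟買個笪田"}
verses_2010 = {"Luke 9:14": "每組約五十人", "Matthew 13:44": "去買嗰塊田"}


def classifier_shifts(old, new, inventory):
    """For verses present in both editions, report tracked classifiers lost and gained."""
    shifts = {}
    for ref in sorted(old.keys() & new.keys()):
        old_clf = {ch for ch in old[ref] if ch in inventory}
        new_clf = {ch for ch in new[ref] if ch in inventory}
        if old_clf != new_clf:
            shifts[ref] = (sorted(old_clf - new_clf), sorted(new_clf - old_clf))
    return shifts


for ref, (lost, gained) in classifier_shifts(verses_1880s, verses_2010, CLASSIFIERS).items():
    print(ref, "1880s only:", lost, "2010 only:", gained)
```

A shift flagged this way is only a candidate: whether it reflects lexical replacement or a different translation strategy still has to be decided by reading the verses, as the authors do.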
Example (23) shows a case which employed tui22隊 as a collective classifier of
jɐn11人, “human being,” in historical Cantonese, while tsou35組 was employed in
contemporary Cantonese translation.
(23) Luke 9:14
[. . .] 耶穌又對門生話、呌大衆排開坐倒處、每隊五十人。 (1883)
[. . .] jɛ11su53 jɐu22 tui33 mun11ʃɐŋ53 wa22 , kiu33 tai22tʃoŋ33 phai11 hɔi53 tshɔ13 tou35 ʃy33 mui13 tui22 ŋ13ʃɐp2 jɐn11 .
Jesus also to disciple say ask masses line.up PRT sit at place each cl fifty human
[. . .] 耶穌對 哋話:「叫羣眾一組一組坐落,每組約五十人。」 (2010)
[. . .] jɛ11sou55 tɵy33 khɵy13tei22 wa22 : “ kiu33 kwhɐn11tsoŋ33 jɐt5 tsou35 jɐt5 tsou35 tshɔ13 lɔk2 , mui13 tsou35 jœk3 ŋ13sɐp2 jɐn11 . ”
Jesus to 3PL say ask throng one cl one cl sit PRT each cl approximately fifty human
“[. . .] And he [Jesus] said to his disciples, Make them sit down by fifties in a company.”
In contemporary Cantonese, tɵy22隊 is often used to count teams, while the collective classifier for counting groups (of people) is tsou35組; but in historical Cantonese, apparently, tui22隊 could also be used to count groups, while tsou35/tsu35組
is absent in the Four Gospels of the 1880s edition. Example (24) shows a similar
example which employed tui22隊 as the collective classifier of pigs in historical
Cantonese, while kʷʰɐn11羣 was employed in contemporary Cantonese translation.
(24) Luke 8:32
[. . .]個的鬼求耶
個隊猪處 [. . .] (1883)
[. . .] kɔ33 ti53 kwɐi35 khɐu11 jɛ11su53 tʃun35 khy13 jɐp2 kɔ33 tui22 tʃy55 ʃy33 [. . .]
DEM CL ghost beseech Jesus allow 3SG enter DEM cl swine place
[. . .] 鬼就央求耶穌,
哋去羣豬處 [. . .] (2010)
[. . .] kwɐi35 tsɐu22 jœŋ55khɐu11 jɛ11sou55 , tsɵn35 khɵy13tei22 hɵy33 kʷʰɐn11 tsy55 sy33 [. . .]
ghost then implore Jesus allow 3PL go cl swine place
“[. . .] and they [devils] besought him [Jesus] that he would suffer them to enter into them [. . .]”
In this example, the classifier for counting pigs is kʷʰɐn11羣, depicting a crowd
of pigs. In contemporary Cantonese, it is also grammatical to say jat5tɵy22tsy55 一隊豬,
but only in the case when pigs are “lining up.”
Example (25) shows an instance which employed jœŋ22樣, “kind,” as the
generic classifier of an abstract concept, namely, sɿ53jok2私欲, “lust,” in historical
Cantonese, while tsong35種, “kind,” was employed in contemporary Cantonese
translation.
(25) Mark 4:19
[. . .] 與及各樣嘅私慾
嚟偪死道理 [. . .] (1882)
[. . .] jy13khɐp2 kɔk3 jœŋ22 kɛ33 sɿ53jok2 , tou53 lɐi11 pek5 sɿ35 tou22li13 [. . .]
and every cl ADN lust also come choke die argument
[. . .] 同其他各種慾望入嚟窒息信息嘅生機 [. . .] (2010)
[. . .] thoŋ11 khei11tha55 kɔk3 tsoŋ35 jok2mɔŋ2 jɐp2lai11 tsɐt2sek5 sɵn33sek5 kɛ33 sɐŋ55kei55 [. . .]
with other every cl desire go.into choke message ADN vitality
“[. . .] and the lusts of other things entering in, choke the word [. . .]”
In contemporary Cantonese, the use of jœŋ22樣 is more restricted, such that
it can only be used to count a finite set of nouns (e.g., jɛ13嘢, “thing; issue”),
but tsong35種 can be used in combination with any noun. As reflected in the Four
Gospels, in historical Cantonese, jœŋ22樣 seems to have been used in combination with any noun, abstract or concrete, for example, tou22li13道理, “argument” (John 4:25), tʃɐn53li13真理, “truth” (John 16:13), sɿ22事, “issue” (Mark
1:38), tʃeng33症, “disease” (Mark 1:34), pʰi33jy22譬喻, “parable” (Mark 4:13),
ʃin22ji22善義, “righteousness” (Matthew 3:15), and peng22tʰong33 痛, “sickness” (Matthew 4:23).
Example (26) employed wui11 囘, “time,” as a verbal classifier of the actions
tɐk5tsui22 得罪, “trespass against,” and fan53tʃyn33 番轉, “turn round,” in historical Cantonese, while tsʰi33 次, “time,” was employed in contemporary Cantonese
translation.
(26) Luke 17:4
倘若 一日七囘得罪你、亦七囘番轉嚟話 [. . .] (1883)
thɔŋ35jœk2 khy13 jɐt5 jɐt2 tshɐt5 wui11 tɐk5tsui22 ni13 , jek2 tshɐt5 wui11 fan53tʃyn33 lɐi11 wa22 [...]
if 3SG one day seven cl trespass.against 2SG also seven cl turn.round PRT say
若 喺一日內得罪你七次,每
回頭對你話 [. . .] (2010)
jœk2 khɵy13 hɐi35 jɐt5 jɐt2 nɔi22 tɐk5tsɵy22 nei13 tshɐt5 tsʰi33 , mui13 jɐt5 tsʰi33 tou55 wui11thɐu11 tɵy33 nei13 wa22 [...]
if 3SG LOC one day inside trespass.against 2SG seven cl each one cl also turn.round to 2SG say
“And if he trespass against thee seven times in a day, and seven times in a day turn again to thee,
saying [...].”
In contemporary Cantonese, tsʰi33次 is an unmarked classifier for counting the
number of times of an action. Although there exists a difference in the word order
between historical and contemporary Cantonese translation, in this context, the
use of tsʰi33 is still an unmarked choice in colloquial contemporary Cantonese even
if the classifier is in a preverbal position. The use of wui11囘 as a classifier is no
longer common in contemporary Cantonese; it is usually used idiomatically in
some particular contexts, like m11 hɐi22 jɐt5 wui11 si22唔係一回事, “not the same
thing/issue.”
Example (27) shows a verse which employs tat3笪 as a classifier of tʰin11 田,
“field,” in historical Cantonese, while fɐi33塊 is used in contemporary Cantonese
translation:
(27) Matthew 13:44
[. . .] 歡喜去賣嘵所有嘅、嚟買個笪田。 (1882)
[. . .] hou35 fun53hi35 hy33 mai22 hiu53 ʃɔ35jɐu13 kɛ33, lɐi11 mai13 kɔ33 tat3 thin11.
very joyous go sell PFV all NOM PRT buy DEM cl field
[. . .] 然後 高興將自己所有嘅
賣,去買嗰塊田。 (2010)
[. . .] jin11hɐu22 hou35 kou55heŋ33 tsœŋ55 tsi22kei35 sɔ35jɐu13 kɛ33 tou55 pin33mai22, hɵy33 mai13 kɔ35 fai33 thin11.
afterwards very joyous PRT self all NOM also sell.off go buy that cl field
“[. . .] and for joy thereof goeth and selleth all that he hath, and buyeth that field.”
The previous example shows a typical case of lexical replacement. The classifier tat33 survives in contemporary Cantonese but is only used to count places or
land parcels (e.g., jɐt5 tat3 tei22fɔŋ55一笪地方, “a place”), as seen in example (28),
while the canonical classifier for thin11, “field,” is fai33.
(28) Mark 14:32
哋到一笪地方,名客西馬尼 [. . .] (2010)
khɵy13tei22 tou33 jɐt5 tat3 tei22fɔŋ55 , meŋ11 hak3sɐi55ma13nei11 [...]
3PL arrive one cl place name GN
“And they came to a place which was named Gethsemane . . .”
It should be noted that, among the classifiers with a drastic change of the relative frequency in Tables 15.3 and 15.4, only a number of cases reflect the process of lexical replacement, while many other cases result from the
application of different translation strategies. As shown in example (29), the lexical item kʷʰɐn11tʃong33羣衆, “throng,” was used in the 1880s edition, whereas jɐt5
tai22 pan55 jɐn11一大班人, “a huge group of people,” is used in the 2010 edition.
In contemporary Cantonese, jɐt5 tai22 pan55 jɐn11 sounds more colloquial, while
kʷʰɐn11tsong33 is usually used in a higher register.
(29) John 6:5
耶穌舉眼、見羣衆嚟到 處 [. . .] (1883)
jɛ11su53 ky35 ŋan13 , kin33 kʷʰɐn11tsoŋ33 lɐi11 tou33 khy13 ʃy33 [. . .]
Jesus lift eye see throng come to 3SG place
耶穌抬頭,睇見一大班人嚟到
前 [. . .] (2010)
jɛ11sou55 thɔi11thɐu11 , thɐi35kin33 jɐt5 tai22 pan55 jɐn11 lɐi11 tou33 khɵy13 min22tshin11 . . .
Jesus gain.ground see one big cl human come to 3SG in.front.of
“When Jesus then lifted up his eyes, and saw a great company come unto him [. . .]”
In example (30), the general classifier kɔ33個 is used to count the noun
tʰin53sɿ33天使, “angel,” in the 1880s edition, but the honorific classifier for
counting people, wɐi35位, is utilized in contemporary Cantonese translation. In
the 1880s edition, wɐi22 was also observed, for example, in verse (30), when it
is employed to count tʰin53sɿ33天使, “angel.” In this case, the selection of classifiers seems to have been a matter of the choice of the translators, and no linguistic
factor was involved.
(30) Luke 2:13
忽然
有大隊天軍、同埋個天使讚美上帝話。 (1883)
fɐt5jin11kan53, jɐu13 tai22 tui22 thin53kwɐn53, thoŋ11mai11 kɔ33 thin53sɿ33 tsan33mi13 ʃœŋ22tɐi33 wa22.
suddenly EXIST big CL heavenly.host and cl angel praise God say
忽然,有大隊天軍同嗰位天使,讚美上帝話: (2010)
fɐt5jin11, jɐu13 tai22tɵy22 thin55kwɐn55 thoŋ11 kɔ35 wɐi35 thin55si33, tsan33mei13 sœŋ22tɐi33 wa22:
suddenly EXIST big CL heavenly.host and that cl angel praise God say
“And suddenly there was with the angel a multitude of the heavenly host praising God,
and saying.”
15.3.2 Classifier Reduplication
Statistics of classifier reduplication are excluded from Tables 15.1 to 15.4. They
are presented in Tables 15.5 and 15.6.
Table 15.5 shows the statistics of the reduplicated classifiers present in the
2010 edition. It can be observed that only jɐn11jɐn11人人, “everybody,” and
jɐt2jɐt2日日, “every day,” are observed more than once. Table 15.6 shows the
statistics of the 1880s edition. It can be seen that kɔ33kɔ33個個 exists in all Four
Gospels, while jɐn11jɐn11人人 is present in three Gospels but not in the Gospel
of John.
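Counts of reduplicated classifiers such as these can be gathered with a simple pattern match. The sketch below is an assumption-laden illustration, not the authors' method: it treats any immediately repeated character that belongs to a given classifier inventory as a reduplicated classifier, which will over-count in places where two identical characters happen to meet across a word boundary. The inventory and the sample text (fragments quoted from examples in this chapter) are chosen for the example.

```python
import re
from collections import Counter

# A character immediately followed by itself, e.g. 個個 or 日日.
REDUP = re.compile(r"(.)\1")


def count_reduplications(text, inventory):
    """Count doubled characters whose base character is in the classifier inventory."""
    return Counter(
        m.group(0) for m in REDUP.finditer(text) if m.group(1) in inventory
    )


# Hypothetical inventory and sample fragments.
inventory = {"個", "處", "日", "人"}
sample = "鄰里個個驚慌。處處傳福音。日日俾我哋。"
print(count_reduplications(sample, inventory))
```

Candidates found this way would still need manual checking before being tallied per gospel.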
Apparently, the number of reduplicated classifiers was reduced from 32 in
the 1880s edition to 11 in the 2010 edition. Does it reflect a historical syntactic
change in Cantonese?
By comparing the same verse in both editions, it is found that the reduction
in usage of reduplicated classifiers is usually a result of a change of translating
strategy when the idea of each individual is uttered. In some cases, a universal
quantifier was used. For example:
Table 15.5 Reduplicated Classifiers in the Cantonese Translation of the 2010 Edition of
the Four Gospels (N = 11)
Matthew
Mark
Luke
John
Type
#
Type
#
Type
#
Type
#
人人jɐɐn11jɐɐn11
句句kɵy33kɵy33
1
1
種種tsoŋ35tsoŋ35
1
人人jɐɐn11jɐɐn11
日日jɐ
ɐt2jɐɐt2
2
2
個個kɔɔ33kɔɔ33
2
日日jɐ
ɐt2jɐɐt2
1
樣樣jœŋ22 jœŋ22
1
Table 15.6 Reduplicated Classifiers in the Cantonese Translation of the 1880s edition of
the Four Gospels (N = 32)
Matthew
Mark
Luke
John
Type
#
Type
#
Type
#
Type
#
個個kɔ
ɔ33kɔɔ33
世世ʃɐɐi33ʃɐɐi33
4
1
個個kɔɔ33kɔɔ33
人人jɐɐn11jɐɐn11
3
1
個個kɔɔ33kɔɔ33
人人jɐ
ɐn11jɐɐn11
4
3
個個kɔ
ɔ33kɔɔ33
3
人人jɐɐn11jɐɐn11
1
件件kin22kin22
1
日日jɐ
ɐt2jɐɐt2
3
句句ky ky
33
1
樣樣jœŋ jœŋ
1
處處ʃy33ʃy33
1
2
1
世世ʃɐɐi ʃɐɐi
1
33
日日jɐ
ɐt jɐɐt
2
22
22
33
33
對對tui33tui33
年年nin nin
11
樣樣jœŋ jœŋ
22
1
1
11
22
1
(31) Luke 1:65
topic comment
鄰里個個驚慌 [. . .] (1883)
lun11li13 kɔ33kɔ33 keŋ53fɔŋ53 [. . .]
neighbour everybody panic
subject predicate
鄰居
奇 [. . .] (2010)
lɵn11kɵy55 tou55 hou35 keŋ55khei11 [. . .]
neighbour also very surprised
“And fear came on all that dwelt round about them [. . .]”
In the 1880s edition, the reduplicated classifier kɔ33kɔ33個個 is used to express the
idea of every neighbour. In the 2010 edition, the universal quantifier tou55 is used
to express the idea of all neighbours. In addition, there also exists a change in syntactic construction. In example (31), a topic-comment construction is used in the 1883
edition such that lun11li13鄰里, “neighbour,” is the topic, while kɔ33kɔ33 keŋ53fɔŋ53個個驚慌,
“everybody is panicking,” is the comment. In the 2010 edition, a subject-predicate
construction is used, with lɵn11kɵy55鄰居, “neighbour,” being the subject,
while tou55 hou35 keŋ55kʰei11 奇, “all being very surprised,” is the predicate.
The objective truth expressed by these two translations is identical even though different linguistic constructions were used, which also leads to a shift in focus.
In other cases, other lexical items were used to express the identical objective
truth. For instance:
(32) Luke 4:20
topic comment
. . . 在會堂嘅、人人 定眼睇住 。 (1883)
. . . tsɔi22 wui22thɔŋ11 kɛ33 , jɐn11jɐn11 tou53 teŋ22 ŋan13 thɐi35 tʃy22 khy13 .
LOC synagogue NOM human-human also fasten eye see ASP 3SG
subject predicate
. . . 全會堂嘅人 定眼睇住 。 (2010)
. . . tsʰyn11 wui22thɔŋ11 kɛ33 jɐn11 tou55 teŋ22 ŋan13 thɐi35 tsy22 khɵy13 .
entire synagogue ATTR human also fasten eye see ASP 3SG
“. . . And the eyes of all them that were in the synagogue were fastened on him.”
The reduplicated classifier jɐn11jɐn11人人, literally “human-human,” is used to
express the idea of everybody in the 1880s edition, while the universal quantifier
tsʰyn11全, “entire,” is used with wui22tʰɔŋ11 kɛ33 jɐn11會堂嘅人 to convey the idea
of people in the whole synagogue in the 2010 edition. There also exists a difference in sentence construction such that a topic-comment is used in the former
while a subject-predicate is used in the latter edition. Similarly, the objective truth
expressed by these two constructions is identical, although there is a subtle difference in focus.
In a number of cases, the concept of each individual is expressed by other constructions, such as:
(33) Luke 11:3
我哋需用嘅糧、日日俾我哋。 (1883)
ŋɔ13ti22 sy53 joŋ22 kɛ33 lœŋ11 , jɐt2jɐt2 pi35 ŋɔ13ti22 .
1PL need use ATTR grain day-day give 1PL
賜俾我哋每日需要嘅飲食。 (2010)
tshi33 pei35 ŋɔ13tei22 mui13 jɐt2 sɵy55jiu33 kɛ33 jɐm35sek2 .
bestow to 1PL each day need ATTR diet
“Give us day by day our daily bread.”
The reduplicated classifier jɐt2jɐt2日日, literally “day-day,” is used to express
the idea of every day in the 1880s edition, while in the 2010 edition, the determiner mui13每, “every” + classifier, is used to express the same idea.
It is also worth noting that in some cases, other lexical items are used to convey
the idea of each individual, like:
(34) Luke 9:6
[. . .] 處處傳福音、醫人嘅 。(1883)
[. . .] ʃy33ʃy33 tʃhyn11 fok5jɐm55 , ji53 jɐn11 kɛ33 peŋ22 .
place-place preach gospel cure human POSS sickness
[. . .] 傳福音,到處醫 。 (2010)
[. . .] tshyn11 fok5jɐm55 , tou33tsʰy33 ji55 pɛŋ22 .
preach gospel everywhere cure sickness
“[. . .] preaching the gospel, and healing every where.”
In example (34), the reduplicated classifier ʃy33ʃy33處處, literally, “place-place,” is
used to express the idea of everywhere in the 1880s edition, while in the 2010 edition, the
lexical item tou33tsʰy33到處, “everywhere,” is used instead. In terms of lexical choice, in
contemporary Cantonese, ʃy33ʃy33 is rarely used, while tou33tsʰy33 is only used in a formal
context (e.g., news reports). In the context of example (34), the word tsɐu55wɐi11周圍 is
most frequently used in colloquial Cantonese according to the authors’ native intuition.
In examples (31) to (34), other strategies are employed to replace the reduplicated
classifiers in the 1880s edition to express the idea of each individual in the 2010
edition. Readers may wonder whether other strategies were replaced by the reduplicated classifiers in the 2010 edition. Let us take a look at the following example:
(35) Luke 4:15
喺各會堂敎人、衆人歸榮 。 (1883)
hɐi35 kɔk3 wui22thɔŋ11 kau33 jɐn11 , tʃoŋ33jɐn11 kwɐi53weŋ11 khy13 .
LOC every synagogue teach human everybody glorify 3SG
喺各會堂敎導人,人人 稱讚 。 (2010)
khɵy13 hɐi35 kɔk3 wui22thɔŋ11 kau33tou22 jɐn11 , jɐn11jɐn11 tou55 tsheŋ55tsan33 khɵy13 .
3SG LOC every synagogue teach human human-human also glorify 3SG
“And he taught in their synagogues, being glorified of all.”
In the 1880s edition, the pronoun tʃoŋ33jɐn11衆人, “everybody,” is used to refer
to all the people in the synagogue, but in the 2010 edition, the reduplicated classifier jɐn11jɐn11人人, literally “human-human,” is used to convey the same objective truth, albeit with a different focus. In terms of lexical choice, in contemporary
Cantonese, tʃoŋ33jɐn11衆人 is only used in a formal context, while jɐn11jɐn11人人
is often used in a colloquial context. This seems to suggest that the construction
employed for expressing a collective concept is likely a matter of the choice of
the translators. Some readers may make a conjecture that reduplicated classifiers
become less popular in contemporary Cantonese as observed from their reduced
usage in the 2010 edition. As native speakers, the authors confirm that the use
of reduplicated classifiers is still prevalent in contemporary Cantonese. For this
reason, investigations into more Cantonese historical documents should be made
before jumping to a rash conclusion.
15.4 Conclusion
In this chapter, we first introduced the “Database of the 19th Century (1865–1894)
Cantonese Christian Writings,” which provides a public data repository by digitizing 15 Cantonese Christian classics published in mid- to late nineteenth century
with approximately 466,000 characters. Then, we provided a statistical account
and a contrastive study on the use of classifiers present in the Cantonese translations of the 1880s edition and the 2010 edition of the four canonical gospels in the
Christian New Testament. Our results show that while some classifiers have been
used most regularly since the nineteenth century, such as kɔ33個 (a general classifier), kin22件 (piece), tʰiu11條 (strip), tsɛk33隻 (mostly for counting animals and
dolls), and ti55 的/啲, the frequency of some classifiers in the 2010 edition drops
drastically as a result of lexical replacement. For example, tat33笪 (for counting
fields) is replaced by fai33塊. We also found that the reduction in frequency of
reduplicated classifiers is a result of changes in translation strategy rather than a
reduction in usage in contemporary Cantonese.
References
Bisang, Walter. 1999. “Classifiers in East and Southeast Asian Languages: Counting and
Beyond.” In Numeral Types and Changes Worldwide, edited by J. Gvozdanović, 113–85.
Berlin: Mouton de Gruyter.
Cheung Hung-nin Samuel 張洪年. 2007. Xiānggǎng yuèyǔ yǔfǎ de yánjiū 香港粵語語法
的硏究 [A Grammar of Cantonese as Spoken in Hong Kong] (rev. ed.). Hong Kong: The
Chinese University Press.
Hong Kong Bible Society 香港聖經公會, ed. and trans. 2010. Sān Gwóngdūngwá Singgīng
新廣東話聖經 [Cantonese Bible: New Cantonese Version]. Hong Kong: Author.
Kataoka Shin 岡新. 2021. “Jiànlì ‘Zǎoqī yuèyǔ shèngjīng zīliàokù’: Yuèyǔ shèngjīng de
shùmǎ rénwénxué yánjiū” 建立《早期粵語聖經資料庫》: 粵語聖經的數碼人文學研
究 [The Development of the ‘Early Cantonese Bible Database’: A Resource for Digital
Humanities Research on Early Cantonese]. Current Research in Chinese Linguistics 中
通訊 100, no. 2: 213–28.
Leung, Wai-mun 梁慧敏. 2011. “Shíjiǔ shìjì ‘Shèngjing’ Yuèyǔ yìběn de yánjiū jiàzhí”
十九世紀《聖經》粵語譯本的研究價值 [The Research Value of the 19th Century
Cantonese Bible Translations]. Journal of Jinan University (Philosophy and Social Sciences) 暨南學報 (哲學社會科學版) 155: 125–29.
Leung, Wai-mun 梁慧敏. 2016. “Lùn Yuèyǔ jùmòzhùcí “ze1” de zhǔguānxìng” 論粵語
句末助詞“啫”的主觀性 [Analysis of the Subjectivity of Sentence-final Particles: The
Case of ze1 in Cantonese]. Studies of the Chinese Language 3: 339−48.
Leung Wai-mun 梁慧敏. 2021. “Shíjiǔ Shìjì Mò “Xīnyuē Sì Fúyīn” Yuèyǔ Yìběn de
Yǔyánxué Jiàzhí” 十九世紀末《新約四福音》粵語譯本的語言學價值 [The Linguistic Value of the Cantonese Translation of the Four Gospels Published in Late 19th
Century]. www.lordwilson-heritagetrust.org.hk/filemanager/archive/project_doc/27-9105/2.pdf.
Matthews, Stephen, and Virginia Yip. 2011. Cantonese: A Comprehensive Grammar (2nd
ed.). London and New York: Routledge.
Noyes, Henry V., George H. Piercy, and P. J. Masters, eds. and trans. 1882a. Máhhó
Fūkyāmchyùhn: Yèuhngsìhng Tóupahk 馬可福音傳:羊城土白 [Gospel of Mark: Cantonese Dialect]. Shanghai: The American Bible Society.
Noyes, Henry V., George H. Piercy, and P. J. Masters, eds. and trans. 1882b. Máhtai
Fūkyāmchyùhn: Yèuhngsìhng Tóupahk 馬太福音傳:羊城土白 [Gospel of Matthew:
Cantonese Dialect]. Shanghai: The American Bible Society.
Noyes, Henry V., George H. Piercy, and P. J. Masters, eds. and trans. 1883a. Louhgāchyùhn
Fūkyāmsyū: Yèuhngsìhng Tóupahk 路加傳福音書:羊城土白 [Gospel of Luke: Cantonese Dialect]. Canton: The British and Foreign Bible Society.
Noyes, Henry V., George H. Piercy, and P. J. Masters, eds. and trans. 1883b. Yeukhohnchyùhn Fūkyāmsyū: Yèuhngsìhng Tóupahk 約翰傳福音書:羊城土白 [Gospel of
John: Cantonese Dialect]. Canton: The British and Foreign Bible Society.
Wong Tak-sum 黃得森. 2010. “Guǎngzhōuhuà “dī”, “dīt” yǔ “dīk” de lìshí fāzhǎn” 廣州
話“啲”、“尐”與“的”之歷時發展 [The Diachronic Development of Di, Dit and Dik in
Cantonese]. Yue Dialect Research 粵語研究 6&7: 75–82.
Wu Yicheng 吳義誠. 2017. “Numeral Classifiers in Sinitic languages: Semantic Content,
Contextuality, and Semi-lexicality.” Linguistics 55, no. 2: 333–69.
Notes
1 The database is accessible publicly through this link: www.polyu.edu.hk/cbs/hkchristdb.
2 All the English translations of the verses in the Bible are adopted from the King James
Version unless otherwise specified. <www.o-bible.com/kjv.html>.
DOI: 10.4324/9781003298328-
Index
absolute frequencies 69
addressee-reference 218, 220, 223–6
alignment 105, 110, 164, 179–83
alignment algorithms 180, 183
amplifiers 125, 128–9
analytic negation 128–9
author interview 12
authorship 81
author style 26–47
automatic analysis 81
average paragraph length (APL) 192,
199, 208
average sentence length (ASL) 192, 193,
195, 197, 199
believability 66, 70
black characters 185
bodhisattva 87–8
bodily phenomena 159, 167–8, 170,
172–3
body exploration 159, 161, 167, 169,
171–3
body language 48, 50, 52–6, 59, 61–2,
142, 208
CAT tools 103–5, 110–12, 114–15
Cemetery of Forgotten Books, The 65–80
Cervantes 49, 195
chains 68, 113, 140
character analysis 81
character development 142–3, 213–14,
223
Chinese Buddhist Canon 81–98
Chinese fiction 133, 229
Chinese literature 245, 254, 273
close reading 107, 112–13, 180, 199, 282
cluster analysis 31, 40–1, 66, 70, 77, 196
cognitive effect 256–8, 267, 269
cognitive effort 7, 267–8, 270, 274–6
Cold War 188
Color Purple, The 158–73
communicative purpose 27–8, 30, 32–4,
37–8, 40, 44–5, 120, 143
compensation 121, 222, 225–6
computational resources 180
Computer-Aided or Assisted Literary
Translation (CALT) 103–5
concordance 14, 53, 70, 72, 107, 108, 111,
115, 143, 144
conservative strategies 162, 164, 167,
169–70, 172
contractions 120, 128–9, 201
conventional metaphor 259, 263, 267
co-occurrence 26, 53, 122
corpus analysis 108
corpus-assisted process study 255
corpus-based applications 106
corpus-based translation studies (CBTS)
103–5, 112–13, 140, 143, 230
corpus creation 108–9
corpus linguistics 61, 65, 77–8, 103, 105,
108, 115, 138–40, 231, 260
corpus stylistics 48–50, 61, 65–6, 68,
77–8
corpus technology 104–9, 113–14
creative metaphor 254–76
creative process 23, 275
creative writing 22
crowdsourcing 176, 178–9
culture-specific metaphor 254–76
culture-universal metaphor 254–76
dependency parsing 81, 92
dependency structures 86, 92
diachronic analysis 14, 26
diachronic corpora 10
diachronic trends 10–23
dialects 177, 182, 184, 186–7, 281
dialogue novels 60
Dickens, Charles 48–62
digital humanities 1, 4, 103–4, 178, 210
dimensions of style variation 27–8
direct speech 48, 55–7, 59, 61–2, 71, 89
direct translation 7, 242, 257–8, 267–9,
271–6
direct WH-questions 128–9
dispersion of sentence lengths (DSL) 192,
193, 195, 197, 199, 208
distant reading 108, 114
El juego del Ángel 65, 67–8, 70–1, 73–7
El laberinto de los espíritus 65, 68, 70, 71,
74–6
El prisionero del cielo 65, 68, 70–1, 76–7
emphatics 28, 35, 47, 128–9
equivalence in lexicogrammar 218, 220–2,
224
evolution of literary genres 81
explicitation 134, 214, 258, 270
explicit strategies 162, 164, 167, 173
faithful translation 5, 158, 162, 164–70,
172–3
feminist translation 158–73
fictional dialogues 119–35
fiction writing 10–11, 21, 23
Fortunata and Jacinta 48–62
functional approach 44
functional relevance 138
functional use 28
Galdós, Benito Pérez 49–62
general technologies 106
Hawkes, David 229–46
heroic literature 191, 210
hierarchical 18, 31, 40, 196, 217, 219–26
Hīnayāna 82, 88–9, 98
historical corpus 81
Hongloumeng 229–46
Human Comedy, The 49
humbleness 213–16, 218, 220, 222–3,
225–6
illicit relations 159, 161, 167, 169, 171–3
independent clause coordinators 128
indirect translation 257–8, 267–8, 270,
273–6
information extraction 81–3, 95, 97
interactant 216, 220
interactive 27–30, 33–5, 43, 47, 123, 125,
127, 130, 178, 223, 258, 276
interior monologue 60
intertextuality 50, 81, 109
Jie Tao 158–73
keyword analysis 11, 12, 13–14, 19,
243
keyword networks 139
L1 English translator 254–76
L2 English translator 254–76
La sombra del viento 65, 67–8, 70–1, 73,
75–7
lexical bundles 229–46
lexical repetition 139–40
lexicogrammatical choice 213–14,
218–22, 226
linguistic diversity 182
literary register 27
literary style 71, 81, 108, 191
literary translator education 103, 105, 110,
114
low-resource language 81–2, 97
Lu Xun 254–76
machine translation 104–5, 110–11,
115, 179
Mahāyāna 81, 88–9, 91, 98
martial arts 191, 201, 208, 210, 213, 215,
217, 222, 225
Medieval Chinese 81–4, 86, 94, 97–8
metaphor in translation 259–61
Minford, John 194, 206–7, 229–31, 233,
244–5
modern literary nonfiction 28
multidimensional analysis 26, 28, 119–20,
122, 125, 132, 134, 232
Multidimensional Analysis Tagger
(MAT) 125
multifactorial approach 143
named entity recognition 81, 83–4
narrative fiction 10, 11, 23, 52
narrative space 59, 60, 65, 66–8, 70,
72–8
n-grams 68, 138, 191, 193
non-equivalence 218, 220, 224–6
non-interactant 216
novel structure 81
Oliver Twist 53–5, 121
omission 87, 111, 131, 133, 160, 152, 188,
208, 222–3, 225–6, 263–7, 270
orality 119–35
paragraph count 180–1
parallel corpora 106, 110, 135, 179,
182, 187
parallel text analysis 158–60, 173
paraphrasing 263–70, 272, 276
personal reference 213–26
possibility modals 128–9
prepositional phrases 29, 35–6, 47, 125,
128–9, 238–9, 243–4
principal component analysis 196, 204
private parts 159, 161, 166–7, 169–73
racism 177, 182
rape 159, 161, 166–7, 169, 171–3
raw frequency 85, 145
recurrent word-combinations 68
reference corpora 106
referential markers 241–3
referential meaning 213, 218, 223, 226
relevance theory 255–6
Renjing Yang 158–9, 172–3
Ruiz Zafón, Carlos 65–78
Śākyamuni 85, 87, 89, 95–8
scripted language 133
self-reference 215, 219, 223, 226
sentence relatives 128–9
sexual content 158–73
sexual intercourse 158–73
short story 111, 140, 263
simultaneity 53–6, 59–61
situational characteristics 27–8
sketch engine 12–14, 108, 115
slavery 177
small corpus 10, 12
social networks 81, 216
speaker-reference 216, 218, 220–3, 225–6
speech role 216–20, 225–6
standardization 121, 133–4, 185–7, 273
stigma 139, 159, 161, 167–70, 172–3
style variation 26–45
stylistic change 10
stylistic panoramas 191–210
stylistics 16, 48–50, 61, 65–6, 68, 77–8,
138
stylometric 113, 191–210
stylometry 112, 191
substitution 139, 158, 162–6, 168–70,
263–7, 270
suspended quotation 48
suspensions 48–62
text reuse 81
thought presentation 27, 42, 43, 59–61
traditional stylistics 78
transcoding 263–8, 270–2, 276
translated fiction 119, 122–4, 126–35
translation choice 218, 225
translation dashboard 180
translation memory 105, 110
translation route 264, 266–8, 271–1
translation strategy 158–9, 162, 164–5,
167, 255, 260–1, 270, 275, 298
translation style 230–3, 244, 266
translator style 230–2, 254
transnational 176–7, 183–4, 187
trend mapping 11, 13
visualization tools 113
within-author variation 27, 44
world literature 104
writing process 19, 23
Wuxia 191–210
Yang, Gladys 229–37
Yang, Xianyi 229–37