Corpus-based Translation Studies – innovations in the new digital age
Marion Winters, Heriot-Watt University, Edinburgh, UK
Sofia Malamatidou, University of Birmingham, UK
Corpus-based Translation Studies (CTS) has developed into a major paradigm in Translation Studies since corpora were first introduced to the field some twenty years ago. Corpora are now used as CAT tools, in machine translation and translation memories, as well as in translator training. The application of corpus methodologies has allowed a better understanding of the nature of translated texts and their relationship to non-translated productions, offering new insights into the translation process and translator behaviour and style and moving the discipline of Translation Studies forward. As a field of study, CTS is truly interdisciplinary, closely informed by developments in a range of related fields, such as corpus linguistics and computational linguistics. Research in those fields has recently seen great progress, offering the potential of exploring new and more complex types of corpora, such as multimodal corpora, while at the same time developing new means for corpus interrogation, together with new tools and techniques of analysis. If CTS is to expand its methods and applications, new technological advancements need to be fully embraced and new tools need to be developed. This needs to happen in collaboration with other disciplines, since Translation Studies scholars often do not have the expertise to adapt tools to their needs or develop new ones, while computational linguists are often unaware of the needs of Translation Studies scholars. Similarly, maintaining a constructive dialogue with corpus linguistics will inform practices and offer the necessary theoretical insights. This panel aims to bring together the linguistic and computational sides of corpus methodologies.
It will discuss innovations in corpus methodologies and in analysis and annotation tools, with particular reference to translation and the translation profession, and provide a framework for collaboration and technological development in CTS to open up further avenues of research in this field.
For informal enquiries: [mDOTwintersAThwDOTacDOTuk]
Marion Winters is Senior Lecturer in Translation Studies/German at Heriot-Watt University, Edinburgh, Scotland, where she teaches translation technology, theory and practice. She is founding editor of the IATIS journal New Voices in Translation Studies, member of the IATIS Publications Committee and a professional member of the German and Irish translators' associations (BDÜ, ITIA). She has published several articles on translator style and is currently involved in a project on autobiographical writings and translation. Her main research interests include autobiographies in translation, corpus-based translation studies, translational stylistics and more specifically translator style and characterization in translation.
Sofia Malamatidou is a Lecturer in Translation Studies at the Birmingham Centre for Translation. She has worked as a research assistant on the Translational English Corpus project at the University of Manchester, UK. Her main research interests are in the field of corpus-based translation studies and she is currently working on developing corpus triangulation techniques for the study of translated texts. She is also interested in developing multimodal corpora that would allow for a systematic interrogation of images. She has written a number of articles on corpus-based translation studies and is the IATIS Chair of the Social Media and Outreach team.
SESSION PLAN
Discussion time at the end of each paper
SESSION 1: Innovations in Corpus Methodologies
Introduction (10 min) – Marion Winters
PAPER 1:
Title: Taking Translation Corpora Further: An Introduction to Combined Corpus-Based Methods
Speaker: Sofia Malamatidou, University of Birmingham
PAPER 2:
Title: Training translators to use corpora hands-on: challenges and reactions by a group of 13 students at a UK university
Speaker: Ana Frankenberg-Garcia, University of Surrey
PAPER 3:
Title: New computational tools in Corpus-based Translation Studies
Speaker: Marion Winters, Heriot-Watt University
PAPER 4:
Title: An annotation system for sign language corpora
Speaker: Ella Wehrmeyer, North-West University, Vanderbijlpark, South Africa
PAPER 5:
Title: Assisting comprehension in specialized fields using corpus data: Comparing the effectiveness of raw and annotated contexts
Speakers: Elizabeth Marshman, University of Ottawa, and Marie-Claude L'Homme, Université de Montréal
Wrap up session (10 min) – Sofia Malamatidou
PAPER TITLES, ABSTRACTS AND BIONOTES
PAPER 1:
Title: Taking Translation Corpora Further: An Introduction to Combined Corpus-Based Methods
Speaker: Sofia Malamatidou, University of Birmingham
Abstract:
Corpus-based research has yielded important insights into translation; however, single types of corpora have traditionally been privileged, thus neglecting the advantages of combined corpus-based methods. This study aims to introduce a unique corpus methodology in which corpora (diachronic, synchronic, comparable and parallel) can be used complementarily for the analysis of linguistic features of translated texts and their impact on non-translated texts. The language pair examined is English-Greek. The corpus analysed is a diachronic (1990-2010) corpus of Greek non-translated and translated popular science articles, along with their English source texts, consisting of approximately half a million words and divided into three subcorpora. The first subcorpus consists of non-translated Greek texts published in 1990-1991. The second subcorpus consists of non-translated and translated Greek articles published in 2003-2004, as well as the source texts of the translations. The third subcorpus includes non-translated as well as translated texts and their source texts, all published in 2010-2011. The analysis of the corpus consists of three stages: (a) the diachronic analysis of a corpus of non-translated texts to examine whether there is any development in the language over time; (b) the synchronic analysis of the comparable corpus to examine whether this development is mirrored in translated texts; and (c) the synchronic analysis of the parallel corpus to trace the development back to the source texts. Results suggest that certain linguistic features of Greek texts, such as the frequency of passive voice reporting verbs, have changed under the influence of translation from English and are now closer to the patterns found in the corresponding English texts.
Through the systematic application of the methodology to data from the genre of popular science, the study demonstrates how the proposed methodology can be fruitfully employed to deepen our understanding not only of translated texts, but also of the texts influencing and being influenced by them.
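As a rough illustration of how the frequency comparisons behind stages (a) to (c) might be computed, the sketch below normalizes raw counts per 1,000 words so that subcorpora of different sizes can be compared. It is not the study's own procedure: the class name, function names and all figures are hypothetical placeholders, not data from the corpus described above.

```python
from dataclasses import dataclass


@dataclass
class Subcorpus:
    label: str   # hypothetical label, e.g. "non-translated, 1990-1991"
    tokens: int  # total running words in the subcorpus
    hits: int    # occurrences of the feature under study


def per_thousand(sub: Subcorpus) -> float:
    """Normalized frequency: occurrences per 1,000 running words."""
    return 1000 * sub.hits / sub.tokens


def relative_change(earlier: Subcorpus, later: Subcorpus) -> float:
    """Proportional change in normalized frequency between two subcorpora."""
    before = per_thousand(earlier)
    return (per_thousand(later) - before) / before


# Placeholder figures for illustration only (NOT results from the study).
early = Subcorpus("non-translated, earlier period", tokens=2000, hits=10)
late = Subcorpus("non-translated, later period", tokens=4000, hits=30)

print(per_thousand(early), per_thousand(late), relative_change(early, late))
```

The same two functions could be reused for the comparable and parallel stages by swapping in counts from translated texts and their source texts.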
Bionote:
Sofia Malamatidou is a Lecturer in Translation Studies at the Birmingham Centre for Translation. She has worked as a research assistant on the Translational English Corpus project at the University of Manchester. Her main research interests are in the field of corpus-based translation studies and she is currently working on developing corpus triangulation techniques for the study of translated texts. She is also interested in developing multimodal corpora that would allow for a systematic interrogation of images. She has written a number of articles on corpus-based translation studies and is the IATIS Chair of the Social Media and Outreach team.
PAPER 2:
Title: Training translators to use corpora hands-on: challenges and reactions by a group of 13 students at a UK university
Speaker: Ana Frankenberg-Garcia, University of Surrey
Abstract:
With the proliferation of online off-the-peg corpora over the past decade or so, the use of corpora is no longer restricted to a small community of researchers working on language description and natural language processing. Anyone with an internet connection is now able to access corpora to help them with everyday questions about language, including questions for which dictionaries, grammars and other language resources do not always have clear answers. Translators are among those who have much to gain from using corpora, as widely acknowledged in the literature (see, for example, Zanettin 1998, Maia 2002, Bowker and Pearson 2002, Zanettin et al. 2003, and Beeby et al. 2009). Yet in contrast to the pressure that exists to train translators in the use of computer-assisted translation tools, there seems to be little or no incentive to teach translators to use corpora. Moreover, most of the research at the crossroads of translation and corpora seems to focus on the use of corpora in Translation Studies, and there is not yet enough information about the use of corpora in actual translation training and practice.
This paper discusses some of the challenges of training translators to use corpora, and then describes how a group of 13 students studying for an MA in Translation at the University of Surrey reacted to a hands-on module on learning to use corpora in everyday translation. The analysis of the students' reactions draws on (1) their responses to an anonymous questionnaire and (2) a corpus of graded assignments, where the students were required to write a report on their use of corpora in translation (after having been asked from day one to keep a diary with examples of using corpora in their everyday translation practice). The corpus of student reports was submitted to both a quantitative and a qualitative analysis. The quantitative analysis focuses on verifying the extent to which the students made reference to terms such as concordance, lemma, collocation, part-of-speech tagging, normalized frequency and so on, and the extent to which the actual queries described in the reports involved the use of those concepts. The qualitative analysis details a selection of examples of how different students used corpora and also their views of the experience.
The students' opinions of corpora were generally very favourable, despite the steep learning curve entailed. The analysis also indicated that while some students remained underusers of corpora, others were quite capable of carrying out sophisticated queries that provided them with answers which they would not have been able to find in other more conventional tools and resources.
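The quantitative step described above, checking how often reports mention terms such as concordance, lemma or collocation, could be approximated along the following lines. This is a minimal sketch under stated assumptions, not the analysis actually carried out: the term list is taken from the abstract, while the function names, the matching rule and the sample sentence are hypothetical.

```python
import re
from collections import Counter

# Terms named in the abstract; any real study would likely use a longer list.
TERMS = [
    "concordance",
    "lemma",
    "collocation",
    "part-of-speech tagging",
    "normalized frequency",
]


def count_term_mentions(text: str, terms=TERMS) -> Counter:
    """Count case-insensitive mentions of each term in one report.

    Assumption: simple suffixed forms count as mentions, so "concordances"
    and "concordancer" are both counted under "concordance".
    """
    lowered = text.lower()
    counts = Counter()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\w*"
        counts[term] = len(re.findall(pattern, lowered))
    return counts


def normalized(counts: Counter, text: str, per: int = 1000) -> dict:
    """Express counts per `per` words, using whitespace-separated words."""
    n_words = len(text.split())
    return {term: per * c / n_words for term, c in counts.items()}


# Invented sample sentence for illustration only.
report = "I ran a concordance and checked each collocation. Concordances helped."
mentions = count_term_mentions(report)
print(mentions)
print(normalized(mentions, report))
```

Counting mentions only shows that a term was used; deciding whether a query actually applied the concept would still need the manual, qualitative reading the abstract describes.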
Bionote:
Ana Frankenberg-Garcia is Senior Lecturer in Translation Studies and Programme Director of the MA in Translation at the University of Surrey. She was responsible for creating COMPARA, a parallel corpus of English and Portuguese (www.linguateca.pt/COMPARA). Her work on the applied uses of corpora has been published in international, peer-reviewed publications, including International Journal of Lexicography, International Journal of Corpus Linguistics, Corpora and ReCALL. In 2011 she co-edited New Trends in Corpora and Language Learning (Bloomsbury). She has been working for Oxford University Press since 2011 as chief editor of a new corpus-based Portuguese-English dictionary to be published in 2015.
PAPER 3:
Title: New computational tools in Corpus-based Translation Studies
Speaker: Marion Winters, Heriot-Watt University, Edinburgh
Abstract:
The aim of the present paper is to establish the stylistic profile of an author and translator using corpus-based methodologies. It is based on literary German-English parallel corpora of specific authors (F. Scott Fitzgerald, Natascha Wodin) and specific translators (Hans-Christian Oeser, Renate Orth-Guttmann). While corpus-based investigations of translator style, features of translation etc. have mostly used text-analysis software that is well established in corpus-based translation studies (CTS), such as WordSmith Tools, ParaConc and other concordancers, I intend to explore a variety of other software and methods. I will explore which software used in corpus linguistics or computational linguistics could usefully be applied in CTS and which information on author/translator style could be extracted from a corpus, for example, through application of tools for semantic profiling, semantic mirroring and distributional semantics. Concluding remarks will reflect upon strengths and limitations of corpus-analysis tools for profiling the style of an author/translator and identify desirable features of these tools for a more efficient application in CTS. Thus this study is also a call for collaboration between corpus-based translation studies and computational linguistics in developing and optimizing suitable corpus-analysis tools for CTS.
Bionote:
Marion Winters is Senior Lecturer in Translation Studies/German at Heriot-Watt University, Edinburgh, Scotland, where she teaches translation technology, theory and practice. She is founding editor of the IATIS journal New Voices in Translation Studies, member of the IATIS Publications Committee and a professional member of the German and Irish translators' associations (BDÜ, ITIA). She has published several articles on translator style and is currently involved in a project on autobiographical writings and translation. Her main research interests include autobiographies in translation, corpus-based translation studies, translational stylistics and more specifically translator style and characterization in translation.
PAPER 4:
Title: An annotation system for sign language corpora
Speaker: Ella Wehrmeyer, North-West University, Vanderbijlpark, South Africa
Abstract:
The proposed paper presents a transcription and annotation system for sign language corpora which allows transcripts of interpretations to be analysed using readily available text-based corpus packages such as WordSmith Tools and AntConc. The transcription system is based on context-free lemmatized glosses that distinguish between different aspects of the sign language lexicon such as established signs, the productive lexicon, finger-spelling and the number system. The annotation system built onto the transcription system is designed to overcome the many obstacles faced by researchers in recording features of face-to-face communication. It allows the concise description of four aspects of signed interpretation of interest to a researcher in Interpreting Studies. Firstly, phonological features of sign language, such as handshape, movement, direction, facial expression and head/body movements can be recorded. Secondly, production features such as clarity and accuracy of sign articulation, signing speed, lag time, background noises, hesitations and chunking segmentation can be included. Thirdly, it allows for the categorization and analysis of interpreting features such as additions, omissions, skewed substitutions or strategies, as well as interpreting errors and corrections. Fourthly, the system allows for further annotations in terms of language use, such as parts of speech, different features of the productive lexicon and sign language discourse features such as topic marking and referencing. The system was designed in order to investigate issues relating to incomprehension of news broadcasts interpreted into South African Sign Language (SASL). The theoretical basis of the research is built on signed language interpreting studies, signed language corpus studies and the descriptive translation framework of norm-driven shifts to identify interpreting strategies.
It adapts existing annotation systems used by corpus-based researchers in sign language linguistics, but specifically redesigns annotation codes so that they can be used in readily available software packages, thereby allowing the researcher to analyse and compare multiple interpretations. Although primarily designed for sign language interpreting research, the annotations can also be used or adapted to meet the requirements of corpus-based/driven research into spoken language (i.e. oral) interpretation, especially in terms of annotating non-verbal features of interpretation as well as interpreting strategies.
Bionote:
Ella Wehrmeyer is a senior lecturer in Translation Studies at the School of Languages, North-West University, Vanderbijlpark, South Africa where she teaches translation theory, literary translation and interpreting studies. She holds a D. Litt. et Phil. from the University of South Africa, Pretoria. Her dissertation investigated sign language interpreting on television using questionnaires, focus groups, eye-tracking and corpus analysis. Her research interests include sign language interpreting, interpreting strategies, corpus-driven research, eye-tracking, children's literature, ideology in translation and the development of theoretical models of translation and interpreting.
PAPER 5:
Title: Assisting comprehension in specialized fields using corpus data: Comparing the effectiveness of raw and annotated contexts
Speakers: Elizabeth Marshman, University of Ottawa, and Marie-Claude L'Homme, Université de Montréal
Abstract:
Student translators must acquire a number of new abilities: translation strategies, research techniques, and—especially when working in specialized fields—domain knowledge. This knowledge can be gained in several ways. Scholars have highlighted the potential of corpora for accessing domain and terminological knowledge. Some terminological resources have incorporated contexts extracted from corpora and annotated with key information to assist users in acquiring this knowledge. However, choosing and annotating contexts requires significant investment of time and effort from resource developers, which multiplies as the size of the resources increases. This raises questions: What is the return on this investment? Are annotated contexts more useful and effective than access to the raw corpus data?
In this study, we will compare translation students' comprehension of a small sample of terms in the field of renewable energies achieved after exploiting either "raw" corpus data in English or French or selected contexts from the same corpora, annotated with frame elements (based on principles of frame semantics, as in the DiCoEnviro). By studying how a sample of approximately 20 students match terms with their definitions from existing resources, we will investigate whether students are better able to differentiate between closely related concepts after studying the annotated contexts as compared to the raw corpus data. By evaluating how these students write their own definitions, we will look for possible differences in definition content and quality when students use the raw and annotated contexts for knowledge acquisition. We hypothesize that richer and more focused information provided by annotated contexts will help students to more accurately identify and describe concepts and differentiate them from others.
Through quantitative analysis of the proportion of correctly identified definitions for participants who consulted either annotated or raw contexts, and qualitative analysis of the accuracy and appropriateness of the written definitions (e.g. the inclusion of key, accurately identified and appropriately expressed defining characteristics), we hope to better evaluate and describe the usefulness of annotating contexts, and ultimately guide the development of terminological resources that can effectively and efficiently assist users in understanding specialized concepts.
Bionote:
Elizabeth Marshman is an Associate Professor at the University of Ottawa's School of Translation and Interpretation and a member of the Observatoire de linguistique Sens-Texte (OLST). Her research focuses on corpus-based applications in terminology and terminological relations in specialized fields.
Marie-Claude L'Homme is a Professor at the Université de Montréal's Département de linguistique et de traduction, Director of the OLST, and head of a team that develops terminological resources including the DiCoEnviro dictionary of environment terminology. She is currently researching applications of the FrameNet methodology in terminology.