Difference between revisions of "Background to translation"

From Translate Science
(inti; taken from what was formerly "background on", now "background to")
 
(The example with the "beautiful pianist" was a bad one, as "beautiful" tends to be used with women, "handsome" with men, and MT tools deal with that quite well. However, with the so-to-speak gender-neutral adjective "sexy" we can still see gender bias, as this tends to turn the English "pianist" into a feminine-gendered form in German ("Pianistin"))
 

Latest revision as of 14:57, 17 November 2022


This document contains general notes on how to approach the task of translating science, touching upon the most prevalent basic notions and advice relevant to the task. The two main points presented here are (a) an introduction to a present-day functionalist view of translation, which provides for a wide range of purpose-driven strategic translation options, and (b) key caveats when making use of digital support tools for translation, including machine translation. These general notes are meant for people who read academic texts at a postgraduate level and have experience with scholarly publishing, but may have little to no experience in translation. As technological tools are nowadays omnipresent in translation processes, they are included here as part of the basic background to translation. At the end of the document, there are pointers to deeper discussions of more specific issues in translation, from which somewhat experienced translators may benefit as well.

Translation

Translation is a cluster concept (Tymoczko 2005) that is constituted by various cultural practices with complex overlapping similarities. This includes what is sometimes referred to as ‘translation proper’, i.e. ‘transferring’ a (mostly) written source text in one language to a target text in another language. Interpreting, i.e. the ‘transfer’ of (mostly) spoken language, is part of the cluster concept, just as much as localization (of software, video games and the like), sur-/subtitling, transcreation, etc. In the following, the terms translation and translate shall include all these practices.

On a side note: It is exactly this understanding of translation as a cluster of cultural practices which opens up the possibility of studying not just the linguistic differences between two texts, but the whole range of patterns of practices and power concerning translation, including, but not limited to, such questions as what is translated and who commissions translations, what conscious and subconscious translation strategies are being taught and applied, how censorship and translation interact, and so on. This wiki page introduces key concepts and issues that inform the pragmatics of translating a specific text.

Functionalism and translation strategies

Functionalist theories of translation (see, e.g., Vermeer 1989) have highlighted that translation is a purposeful activity, i.e. it is text production with a goal and an audience, and with a precursor, the source text, which may require different levels of adaptation to the target culture. Nord (1989) introduces the spectrum between documentary and instrumental translation: The former is meant to highlight the original make-up of the source text, with interlinear glosses being an extreme form; the latter aims at producing a text which is meant to act as a target culture text and should be indistinguishable from texts originally written in the target language. All in all, a functionalist approach to translation offers us a wide array of translation strategies, keeping in mind that, following Nord, we should remain loyal to both the creators of the source text and the intended audience of the target text.

Applied to the purpose of translating science, you might ask yourself, for instance, how to go about the subtitling of a video which introduces a scientific topic. While, of course, you will want to get the terminology and the science right, do think about what the idea of the source text is: Is it purely informative, or does the video at hand also aim to entertain? Assuming it does, what is your goal in translation: Do you mostly care about the science, or do you want to entertain as well? What you probably will not want is a ‘close’ translation in a structural sense, i.e. trying to mimic the syntactic or lexical structures of the source text, unless you are aiming, e.g., at documenting which linguistic strategies can be used in a certain language for edutainment videos. Another example is the translation of Bron Taylor’s book “Dark green religion” into German, where the author explicitly encouraged the translator Kocku von Stuckrad to add comments explaining how historical US-related circumstances compare to those in Germany, making the text more accessible to a German audience (von Stuckrad in Taylor 2020: 303).

Technical translation and cultural influences

A common misconception is that terminology (or language on the whole) in the natural and engineering sciences is near-‘objective’ in the sense that it allows for a ‘simple’ one-to-one transfer between languages. However, cultural influences also abound in technical language, influencing terminology, phraseology, style, text structures, argumentation patterns, etc. Cultural influence here does not solely refer to the larger setting of regional, national, areal or global cultures, but also to the cultures of specific scientific fields and subfields (i.e., shared assumptions, traditions, practices, etc.). Even within a language, creating something like a common terminology may be quite an undertaking, especially in younger fields of research (see, e.g., Avizienis et al. 2004 for the field of dependable and fault-tolerant computing). Between languages, even the slightest differences in conceptualizations and uses can pose a challenge. On top of this, the influence of larger cultural contexts is omnipresent not just in the humanities or social sciences, with the discussion about the English master/slave terminology in computing and electrical engineering as a very prominent and illustrative example (Charboneau 2020). As pointed out above, these differences may extend to other linguistic levels such as phraseology, argumentation patterns or text structures, in some cases giving rise to translation strategies which are often subsumed under adaptation, i.e. making deep(er) changes to the make-up of a (stretch of) text in order to make it more adequate to the target culture and more fitting to the purpose, which can be quite in line with Nord’s principle of loyalty. Whichever strategy you choose, be aware of these cultural factors even in technical language.

Translating science

Who can translate science?

Translation is, more often than not, co-creation. Professional translators will have learned how to quickly adapt to the terminology, phraseology and style of a field, how to invent new terminology, how to perform effective research in cases of doubt and, actually one of the most challenging and frequent problems in translation, how to deal with faulty, ambiguous or badly formulated (stretches of) source texts; but technical expertise is still often required for translation, inside as much as outside of translating science. Many works are translated by (groups of) people with domain knowledge and the necessary linguistic competences, and it is not unusual for MA or PhD translation students to be career changers from fields completely different from linguistics who bring a certain background in their languages and cultures of interest.

This provides us with a number of options when it comes to the question of who could translate science: It could be scientists, alone or in groups with complementary domain or language skills; some institutions might even have translation services that can spare at least some time to translate science or to aid in translating it; or some stakeholders might have money on the side to commission translation. In all cases, however, the domain knowledge of scientists will be crucial, and should you be in the position to commission a translation, be prepared to answer questions on linguistic and other aspects of the field in question.

Aiding (commissioned) translation / translators

In any case, as a scientist, be prepared to act as the domain expert for a commissioned translation. You can aid the linguistic side of a (commissioned) translation if you can put some sort of terminology resource (e.g. any dictionary for your field that you have at hand) at the disposal of those who translate, or if you have a collection of texts (ideally in all languages involved) which you can make available so that term candidates and collocation patterns can be extracted quickly by means of the appropriate tools (see, e.g., on this wiki; professional translators should have acquired access to such tools). If you have commissioned a translation, the use of tools which allow for collaborative work can be a great help, e.g. in order to quickly comment on questions translators might have.
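To give a rough idea of what extracting term candidates from a text collection can mean in practice, here is a minimal sketch in Python. It simply counts recurring two-word sequences that contain no function words. The function name `term_candidates`, the small stopword list and the two sample sentences are assumptions made for illustration only; dedicated terminology extraction tools use considerably more refined methods.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real tools use fuller lists).
STOPWORDS = {"the", "of", "a", "an", "and", "in", "to", "is", "for", "on", "with"}


def term_candidates(texts, min_count=2):
    """Return two-word sequences occurring at least min_count times.

    In this simple sketch, sequences may span sentence boundaries
    within a single text.
    """
    counts = Counter()
    for text in texts:
        # Lowercase and keep letter-only tokens (works for non-ASCII letters too).
        tokens = re.findall(r"[^\W\d_]+", text.lower())
        for first, second in zip(tokens, tokens[1:]):
            if first in STOPWORDS or second in STOPWORDS:
                continue
            counts[(first, second)] += 1
    return [(" ".join(pair), n) for pair, n in counts.most_common() if n >= min_count]


# Hypothetical mini-collection of domain texts.
texts = [
    "Fault tolerance is a key property of dependable computing.",
    "Dependable computing relies on fault tolerance and fault removal.",
]
candidates = term_candidates(texts)
for term, count in candidates:
    print(term, count)
```

The resulting list is only a starting point: a domain expert still has to decide which candidates are genuine terms and how they should be rendered in the target language.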

Translation tools and their caveats

Translation tools (often referred to as CAT tools for ‘computer-aided translation’) are a great means of streamlining some of the elements of a translation process, such as checking terminology or retrieving existing translations (from so-called Translation Memories). Modern versions of these tools allow for web-based, collaborative translation, giving collaborators such possibilities as revising and/or commenting on proposed translations, evaluating existing translations or adding machine translation (henceforth MT) support. Modern MT systems are based on artificial neural networks, which have boosted quality considerably since roughly the mid-2010s.

CAT tools, with or without added machine translation support, have been studied from various angles. While they in general increase efficiency and often ease the task of translation as translators do not have to start from scratch, there are some caveats to be kept in mind when working with them. Here are some of the more important ones.

A major problem which has been described is lack of consistency. This does not only concern the terminological level, as shown, e.g., by Čulo & Nitzke (2016); a system may also suddenly change its output style, switching between different forms of addressing readers, for instance. Translators do not necessarily spot these inconsistencies, a problem which is sometimes also attributed to how CAT tools display source and target text (mostly in segments of sentences, aligned left-to-right): a sort of peephole effect arises, as translators check sentence by sentence and thus do not easily perceive the text as a whole in their revision. Sentence-by-sentence evaluation is also the reason why MT systems often used to score better in their evaluation than they deserved and sometimes still do (see, e.g., Castilho 2021; Krüger 2022): When systems are evaluated by checking translations of single sentences only, inconsistencies are not spotted and thus not penalised.
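A simple way to counter the peephole effect for terminology is to check the whole text at once for competing renderings of one source term. The following Python sketch illustrates the idea under stated assumptions: the function `find_term_variants`, the list of target candidates and the English-German sample segments are all hypothetical, and plain substring matching will miss inflected or compounded forms.

```python
from collections import defaultdict


def find_term_variants(segments, source_term, target_candidates):
    """Map each target candidate to the indices of the segments in which
    the source term occurs and that candidate appears in the target."""
    hits = defaultdict(list)
    for index, (source, target) in enumerate(segments):
        if source_term.lower() not in source.lower():
            continue
        for candidate in target_candidates:
            # Naive substring match; inflection is not handled in this sketch.
            if candidate.lower() in target.lower():
                hits[candidate].append(index)
    return dict(hits)


# Hypothetical aligned (source, target) segments.
segments = [
    ("The fault is logged.", "Der Fehler wird protokolliert."),
    ("A fault can cause a failure.", "Eine Störung kann einen Ausfall verursachen."),
    ("No fault was detected.", "Es wurde kein Fehler erkannt."),
]
variants = find_term_variants(segments, "fault", ["Fehler", "Störung"])
if len(variants) > 1:
    print("Inconsistent renderings of 'fault':", variants)
```

More than one key in the result signals that the same source term has been rendered in different ways across the text, which a reviser can then resolve deliberately.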

A second very serious problem, as known from other fields of AI, is that neural MT systems reproduce biases that are implicitly or explicitly encoded in the training texts, a notable issue being gender bias. This shows when translating from a language that has little or no grammatical gender, such as English, into a language such as German, which differentiates between a grammatical ‘masculine’, ‘feminine’ and ‘neuter’ gender (which often, but not necessarily, coincides with (supposed) biological sex for nouns referring to humans): Try translating “sexy pianist” and “clever pianist” into German with MT systems like DeepL. At the time of writing the first version of these notes, the former translates into “sexy Pianistin” (feminine gender), the latter into “geschickter Pianist” (masculine gender). Also, gendering across a text can be wildly inconsistent. Moreover, highlighting the non-deterministic and adaptive nature of such systems, the results can vary not only between systems, but even for one system over time.

Third, watch out for missing or even spurious additional text. Koehn & Knowles (2017) describe some of the challenges of early neural machine translation research, some of which have been addressed in the meantime, but an important one remains: MT hallucination, also called MT fiction. Neural MT systems basically operate by trying to predict the next most likely output based on previous input (which, in principle, is the same mechanism that allows for search completion in a web search bar). Take a moment to reflect on the options you are given in a search query completion: some of them may be very fitting, others quite nonsensical. Modern MT systems have become very good at picking out the fitting options, but when they cannot ‘make sense’ of the input, they may omit something, just try to ‘guess’ or even add material that is not there in the source text.
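One crude aid for spotting omitted or added material is to compare the length of each source segment with that of its translation and to flag unusual ratios for closer human inspection. The Python sketch below illustrates this heuristic. The function `flag_length_outliers`, the thresholds 0.5 and 2.0 and the sample segments are assumptions chosen for illustration; suitable thresholds depend on the language pair and the text type, and an unremarkable ratio does not guarantee a complete translation.

```python
def flag_length_outliers(segments, low=0.5, high=2.0):
    """Return (index, ratio) for segments whose target/source word-count
    ratio falls outside the interval [low, high]."""
    flagged = []
    for index, (source, target) in enumerate(segments):
        source_words = len(source.split())
        target_words = len(target.split())
        if source_words == 0:
            continue  # nothing to compare against
        ratio = target_words / source_words
        if ratio < low or ratio > high:
            flagged.append((index, round(ratio, 2)))
    return flagged


# Hypothetical aligned (source, target) segments.
segments = [
    ("The sample was heated to 80 degrees for two hours.",
     "Die Probe wurde zwei Stunden lang auf 80 Grad erhitzt."),
    ("The results were then compared with the control group.",
     "Die Ergebnisse."),  # most of the sentence is missing
    ("See Table 2.",
     "Siehe Tabelle 2, die alle Messwerte der Studie zeigt."),  # material added
]
flagged = flag_length_outliers(segments)
for index, ratio in flagged:
    print(f"Check segment {index}: length ratio {ratio}")
```

Such a check only narrows down where to look; whether a flagged segment actually contains an omission or an addition remains a judgement for the person revising the text.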

Last but not least, data ethics should be raised as an issue here. Note that for web-based CAT tools and/or machine translation systems (including those that you can plug into your locally installed CAT tool), the source text will be copied over to and processed by multiple other machines. Even if you have permission to produce a translation that is accessible under more liberal terms, this can technically be a violation of copyright for the source text if it falls under stricter copyright terms. Anonymization of people, which may not have been much of an issue for printed, narrowly distributed material, can also pose a problem in such settings, even if you choose to anonymize the target text. Ecological matters may apply as well, giving rise to the question of how often and at which stage(s) MT should be used: it requires, after all, quite a bit of computing power. For a more in-depth discussion of ethics and the use of machine translation, see Moorkens (2022).

Literature

Avizienis, Algirdas, J-C Laprie, Brian Randell, and Carl Landwehr. 2004. ‘Basic Concepts and Taxonomy of Dependable and Secure Computing’. IEEE Transactions on Dependable and Secure Computing 1 (1): 11–33.

Castilho, Sheila. 2021. ‘Towards document-level human MT evaluation: On the issues of annotator agreement, effort and misevaluation’. In Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval), 34–45. Online: Association for Computational Linguistics. https://aclanthology.org/2021.humeval-1.4.

Charboneau, Tyler. 2020. ‘How “Master” and “Slave” Terminology Is Being Reexamined in Electrical Engineering - News’. Accessed 1 August 2022. https://www.allaboutcircuits.com/news/how-master-slave-terminology-reexamined-in-electrical-engineering/.

Čulo, Oliver, and Jean Nitzke. 2016. ‘Patterns of Terminological Variation in Post-Editing and of Cognate Use in Machine Translation in Contrast to Human Translation’. Baltic Journal of Modern Computing 4 (2): 106–14. https://aclanthology.org/W16-3401.pdf.

Koehn, Philipp, and Rebecca Knowles. 2017. ‘Six Challenges for Neural Machine Translation’. In Proceedings of the First Workshop on Neural Machine Translation, 28–39. Vancouver: Association for Computational Linguistics. https://doi.org/10.18653/v1/W17-3204.

Krüger, Ralph. 2022. ‘Some Translation Studies Informed Suggestions for Further Balancing Methodologies for Machine Translation Quality Evaluation’. Translation Spaces, March. https://doi.org/10.1075/ts.21026.kru.

Moorkens, Joss. 2022. ‘Ethics and Machine Translation’. In Machine Translation for Everyone: Empowering Users in the Age of Artificial Intelligence, edited by Dorothy Kenny, 121–40. Translation and Multilingual Natural Language Processing 18. Language Science Press. https://zenodo.org/record/6653406.

Nord, Christiane. 1989. ‘Loyalität Statt Treue. Vorschläge zu einer funktionalen Übersetzungstypologie’. Lebende Sprachen, no. 3: 100–105.

Taylor, Bron. 2020. Dunkelgrüne Religion: Naturspiritualität und die Zukunft des Planeten. Translated by Kocku von Stuckrad. Leiden, Netherlands: Brill, Wilhelm Fink.

Tymoczko, Maria. 2005. ‘Trajectories of Research in Translation Studies’. Meta 50 (4): 1082–97.

Vermeer, Hans J. 1989. ‘Skopos and commission in translational action’. In Readings in Translation Theory, edited by Andrew Chesterman, 173–87. Helsinki: Oy Finn Lectura AB.