TTR Changes in Different Directions of Translation

Volume 17, No. 1
January 2013

Born in 1978 in Kyiv (Ukraine), Sergiy Fokin, PhD, graduated from the Faculty of Foreign Philology at Kyiv Taras Shevchenko University in 2000 and, simultaneously, within a student exchange double degree program, from the Faculty of Philosophy and Literature at Granada University (Spain) in 2001.

In 2004 he published and defended the PhD thesis “Grammar voice transformation in translation from Spanish into Ukrainian,” directed by AP O.M. Kalustova. From 2003 to 2010 he was an assistant and, from 2010 to the present, Associate Professor at the “Mykola Zerov” Chair of Theory and Practice of Translation from Romance Languages, Kyiv Taras Shevchenko University.

Scientific and didactic interests: theory and practice of translation, linguistic statistics, computational lexicography, note-taking in consecutive interpreting.

Has published 30 articles and papers. Translation practice: numerous written technical, legal, and medical translations (Spanish-Ukrainian, Italian-Ukrainian, Ukrainian-Spanish); consecutive interpreting, Spanish-Ukrainian and Ukrainian-Spanish: commercial (entrepreneur missions at the Embassy of Spain in Kyiv, Spanish companies: Atlantic Agricola, Negarra, Baicha, Kroma Telecom and others).

Sergiy can be reached at
sergiyborysovych@ukr.net.

TTR Changes in Different Directions of Translation

by Sergiy Fokin, PhD, AP,
Kyiv Taras Shevchenko University (Ukraine)

Abstract

The types-tokens ratio (TTR), which is calculated by dividing the number of different word forms (types) in a text by the total number of words (tokens), roughly characterizes the lexical variety of the text. This makes it interesting, from both the theoretical and the practical point of view, to compare this parameter in original texts and in their translations. We analyzed our own empirical material, four Spanish-Ukrainian and four Ukrainian-Spanish translations compared with their respective originals, along with the results of other researchers in different language combinations. The analysis shows that TTR changes follow common tendencies that depend on the typological characteristics of the source and target languages and on the direction of translation, rather than on the lexical variety of the text.

1. Introduction

There have been a number of attempts to describe the quantitative characteristics of vocabulary, both of a language or sublanguage system and of the language of a particular author, in terms of the number and frequency of types, tokens, and lexemes. We will not attempt to offer an exhaustive list of these. Gradually, with the development of corpus linguistics, translation theorists have borrowed several quantitative ideas from linguistics and, seeking objective criteria for evaluating the differences between original and target texts and the lexical-stylistic adequacy of a translation, have started to calculate those parameters, a task made easy by computing techniques. Within the last decade there has been a boom in research into quantitative parameters in translation studies, due to the use of electronic corpora. Although attempts to create corpora started earlier, corpus-based research did not emerge until the late 1990s (Kruger, 70), and it was theoretically generalized in translation studies by M. Baker (Baker, 175-186) and other theoreticians. The availability of corpora, their relatively easy compilation, and their compatibility with personal computers meant that investigations by individual researchers, even with manually compiled corpora, became possible and popular.

With electronic text analysis tools, it became possible to calculate the number of words and also of word forms (‘types,’ also called ‘orthographic words’) in a text automatically, literally with one click of a mouse; previously this required exhaustive manual extraction of samples or similar methods. Logically, the number of types compared to the total number of words in a text gives a coefficient that indirectly indicates the lexical richness of the text. “The higher the ratio, the more varied the vocabulary, i.e. the implication is that there is little repetition” (A. Kruger, 74). This coefficient, obtained by dividing the number of types by the number of tokens (also called ‘running words’), was first named TTR (types/tokens ratio), presumably by M. Templin in 1957 in the area of language didactics (cit. after Rhea, 2007, 476). It opens a wide field for investigation in particular and general translation studies, with the aim of unveiling one more universal parameter of translation.

For example, the sentence “I have to buy some bread, because I have no bread” is stylistically awkward, and its TTR is low (three word forms are repeated, TTR = 8/11 = 0,73), whereas “I’ve run out of bread, so I need to buy some” is much better stylistically and richer lexically, and its TTR is higher (only one word form, “I,” is repeated, TTR = 10/11 = 0,91).
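The calculation in this example can be reproduced with a few lines of code. The following is only a minimal sketch: the `tokenize` and `ttr` helpers are hypothetical names introduced here, and the tokenization rule (lowercasing, keeping a contraction such as “I’ve” as a single token) is an assumption, not a description of any particular text analysis tool.

```python
import re

def tokenize(text):
    # Assumption of this sketch: lowercase the text and keep runs of letters,
    # allowing an internal apostrophe so that "I've" stays one token.
    return re.findall(r"[^\W\d_]+(?:['’][^\W\d_]+)*", text.lower())

def ttr(text):
    # Types-tokens ratio: distinct word forms divided by running words.
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

awkward = "I have to buy some bread, because I have no bread"
better = "I’ve run out of bread, so I need to buy some"

print(len(set(tokenize(awkward))), len(tokenize(awkward)))  # 8 11
print(round(ttr(awkward), 2))                               # 0.73
print(round(ttr(better), 2))                                # 1.0
```

With this tokenizer the second sentence yields 11 types, because “I’ve” and “I” are counted as different word forms; the figure of 10/11 corresponds to treating them as a repetition of “I.” The difference shows how much even a tiny TTR depends on the tokenization convention.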

However, this rule applies only with the reservations expressed by words such as ‘likely’ and ‘indirectly.’ We should also add to A. Kruger’s implication another one: that there are few repetitions of different types of the same lexeme. As the number of types may be quite extensive due to the large number of grammatical forms of a lexeme in inflectional or incorporating languages, TTR should be very sensitive to the variety of grammatical forms in the text. For instance, in the present indicative an English verbal lexeme has two inflected forms, whereas in Ukrainian, as in Spanish, it has six. This is why a high TTR may indirectly indicate not only lexical richness but also grammatical (morphological) richness. A natural question is which of the two factors, lexical or grammatical richness, weighs more in a TTR. In spite of this doubt, it would be hard to deny that, within the same language, a text characterized by a higher TTR is certainly richer from the lexical point of view. The same statement is questionable, however, when comparing texts in different languages, as usually happens in translation.
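The effect of morphology on the number of types can be illustrated with a single verb. The sketch below lists the six person/number slots of the present indicative for one verb in each language and counts the distinct forms; the verbs chosen (“to speak,” “hablar,” “говорити”) and the names `PRESENT_INDICATIVE` and `count_types` are illustrative choices made here, not part of the study.

```python
# Present indicative, six person/number slots per verb (illustrative data).
PRESENT_INDICATIVE = {
    "English (to speak)": ["speak", "speak", "speaks", "speak", "speak", "speak"],
    "Spanish (hablar)": ["hablo", "hablas", "habla", "hablamos", "habláis", "hablan"],
    "Ukrainian (говорити)": ["говорю", "говориш", "говорить",
                             "говоримо", "говорите", "говорять"],
}

def count_types(forms):
    # Distinct orthographic forms among the given tokens.
    return len(set(forms))

for language, forms in PRESENT_INDICATIVE.items():
    print(language, count_types(forms), "types /", len(forms), "tokens")
```

One lexeme thus contributes two types in English but six in Spanish or Ukrainian, although the lexical content is the same in all three cases.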

2. Related works and discussion

It has been stated that translated texts in a language differ from their original by a lower TTR (V.Pápai, 157), which can suggest that they are less rich lexically.

For instance, V. Pápai, having researched explicitation strategies in translation using four English-Hungarian fiction translations in her work “Explicitation: a universal of translated text?”, argues that TTR is lower in translated texts than in non-translated texts in Hungarian (V. Pápai, 159). But this does not necessarily mean that TTR should be lower in a translated text than in the original. For instance, A. Kutuzov, from Tyumen State University, shows that in English-Russian translation the TTR becomes higher (A. Kutuzov, 10). Meanwhile, A. Kruger demonstrates that in English-German translations the TTR is lower than in the original (the empirical base was four Shakespeare texts) (A. Kruger, 74). So, a preliminary theoretical analysis suggests that TTR changes show a noticeable dependence on the language combination. As these changes may also depend on the translation direction, in the present research we examine this hypothesis using both Spanish-Ukrainian and Ukrainian-Spanish translations, and try to reveal the regularities of these dependencies.

Before introducing our results, we should consider the strengths and weaknesses of TTR comparison as a method for describing the lexical richness of a text. It must be accepted that this method is simple and approximate. The ratio is undeniably sensitive to text or corpus length: the longer a text, the more likely it is that words will be repeated, which lowers the ratio, while in short texts the ratio is not representative. Nevertheless, the ratio is widely used, since it can be easily calculated by any text analysis tool and the functioning of these tools does not depend on the language system.
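The dependence on text length is easy to demonstrate on an artificial example: repeating the same sentence adds tokens but no new types, so the ratio falls. The sketch below also includes a chunk-averaged variant, `mean_chunk_ttr`, which is a common workaround in corpus tools and is not used in the present study; both function names are hypothetical.

```python
def ttr(tokens):
    # Types-tokens ratio of a list of tokens.
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def mean_chunk_ttr(tokens, size):
    # Average TTR over consecutive full chunks of `size` tokens
    # (an assumption of this sketch, not the method of the article).
    chunks = [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]
    if not chunks:
        return ttr(tokens)
    return sum(ttr(chunk) for chunk in chunks) / len(chunks)

sentence = "i have to buy some bread because i have no bread".split()

for copies in (1, 2, 4):
    text = sentence * copies
    print(len(text), round(ttr(text), 3))
# 11 0.727
# 22 0.364
# 44 0.182

print(round(mean_chunk_ttr(sentence * 4, 11), 3))  # 0.727
```

The plain ratio halves each time the text doubles, while the chunk-averaged value stays at the single-sentence level, which is why texts of very different lengths are hard to compare by raw TTR alone.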

However, a high number of types does not necessarily mean a high number of lexemes. To be more exact, if we want to calculate the lexical variety of a text, we should divide the number of lexemes (i.e. of their respective lemmas used in the text) by the number of tokens. Since counting lexemes is quite time-consuming, their number is usually not taken into account. Let us note incidentally that the number of lemmas can also be calculated by specific software known as a lemmatizer, which has to be designed for each language separately and usually requires the time-consuming processing of a great number of morphological rules, exceptions, and vocabulary. Such software is usually not freeware. Thus, the TTR seems to show easily, indirectly, and roughly the variety of words, rather than the lexical richness of a text; it is “a simple indication of the superficial lexical complexity of a text” (Munday 1998:4), along with its grammatical complexity, we might add. In spite of the above, we by no means deny its theoretical usefulness. A. Kutuzov, for instance, after researching the variation of TTR from the original to the translated text, concludes that their graphs are extremely similar from chapter to chapter (A. Kutuzov, 8-9). A. Kutuzov’s method can itself be another useful tool to ‘measure’ the adequacy of translation. Unfortunately, we cannot afford to concentrate here on other important and interesting uses of TTR, although they do exist.
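The difference between the types/tokens ratio and the lemmas/tokens ratio can be sketched as follows. The lemma table `TOY_LEMMAS` is a hypothetical hand-made stand-in for a real lemmatizer, and the function names are likewise introduced only for illustration.

```python
# Hypothetical toy lemma table standing in for a real lemmatizer.
TOY_LEMMAS = {
    "walks": "walk", "walked": "walk", "walking": "walk",
    "are": "be",
}

def type_token_ratio(tokens):
    return len(set(tokens)) / len(tokens)

def lemma_token_ratio(tokens, lemmas):
    # Forms missing from the table are treated as their own lemma.
    return len({lemmas.get(token, token) for token in tokens}) / len(tokens)

tokens = "she walks he walked they walk we are walking".split()

print(round(type_token_ratio(tokens), 2))               # 1.0
print(round(lemma_token_ratio(tokens, TOY_LEMMAS), 2))  # 0.67
```

All nine word forms are distinct, so the TTR is 1.0, yet they reduce to only six lemmas; the gap between the two ratios is the contribution of grammatical forms rather than of vocabulary.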

3. Hypothesis

As shown above, the number of types in a text may depend on two basic factors: the number of lexemes and the number of different grammatical forms. Hypothetically, in inflectional and incorporating languages the TTR should be higher, as the same lexeme will present a large number of types, while in ‘more analytic’ languages the TTR should be lower and tend to approach the ‘lemmas/tokens’ ratio, since most lemmas would present only one type (number of types ≈ number of lemmas). This hypothesis (hypothesis #1), which is both plausible and logical, should not, we suppose, meet serious contradictions, although it is still to be proven in the area of contrastive linguistics. It needs to be tested by comparing original (untranslated) texts in different languages with the same or similar content, such as international agreements, constitutions, laws, similar literary genres, etc. Nevertheless, as indicated above, we have seen a clear dependence of the changes in TTR on the language combination and translation direction. Ch. Ho-Jeong has observed in English-Korean and Korean-English translations that several changes, such as contraction/expansion of the text, depend on the direction of translation (Ch. Ho-Jeong, 362). On the other hand, E. Kelih, investigating the translation of a Russian novel into 11 Slavic languages (E. Kelih, 179), implicitly proves that the TTR changes depend on the source and target languages. Let us note incidentally that we deduced this by attentively reading his article, because the researcher miscalculated the TTR by confusing the divisor and the dividend.

Our actual hypothesis (hypothesis #2) will refer to translation studies, not to contrastive linguistics: when the degree of synthetism of the language increases from the original to the translation, the TTR rises, and, vice versa, when the degree of synthetism decreases from the original to the translation, the TTR will decrease. If hypothesis #2 is correct, it may also indirectly confirm hypothesis #1.

4. Empirical test

If hypotheses #1 and #2 are correct, the TTR should rise when translating from a more analytic language into a more inflectional one and, vice versa, decrease when translating from an inflectional or incorporating language into a more analytic one (naturally, there should be room for exceptions due to the influence of extralinguistic factors). In the case of Spanish and Ukrainian texts, Spanish is the more analytic language of the two. After analyzing four Spanish-Ukrainian and four Ukrainian-Spanish fiction translations, we obtained the following results:

Table 1. TTR changes in Spanish-Ukrainian and Ukrainian-Spanish translation.

Work	Total types in the original	Total tokens in the original	Types / tokens ratio in the original	Total types in the translation	Total tokens in the translation	Types / tokens ratio in the translation	TTR change	Translator	Confirmation of the hypothesis
Spanish-Ukrainian translation
G. García Márquez “El amor en los tiempos del cólera”	15 352	145 108	0,1058	28 357	126 394	0,2244	0,47 (rises)	V. Shovkun	+
B. Pérez Galdós “Doña Perfecta”	11 117	65 177	0,1705	15 827	54 474	0,2905	0,59 (rises)	Zh. Konyeva	+
P.A. de Alarcón “El sombrero de tres picos”	5 572	25 768	0,2162	7 303	20 622	0,3541	0,61 (rises)	Zh. Konyeva	+
P.A. de Alarcón “El sombrero de tres picos”	5 572	25 768	0,2162	6 881	20 117	0,3420	0,63 (rises)	L. Dobryans’ka, L. Kolesnyk	+
Ukrainian-Spanish translation
І. Франко “Захар Беркут”	13 352	50 372	0,2651	10 049	61 472	0,1635	-0,62 (decreases)	S. Ryzvaniuk	+
М. Коцюбинський “Тіні забутих предків”	6 197	15 766	0,3811	5 639	26 027	0,2167	-0,57 (decreases)	J. Borysyuk	+
О. Довженко “Зачарована Десна”	6 081	15 956	0,3811	5 523	19 828	0,2785	-0,73 (decreases)	R. Hupalo	+
Ю. Яновський “Вершники”	10 122	27 123	0,3732	8 647	38 325	0,2263	-0,6 (decreases)	S. Ryzvaniuk	+

As we see from Table 1, the TTR rises in all instances of Spanish-Ukrainian translation and decreases in all instances of Ukrainian-Spanish translation in our corpus. This tendency does not seem to depend on the translator.
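The figures in Table 1 can be recomputed from the type and token counts. The sketch below does so for one row from each direction. It assumes that the “TTR change” column is the smaller TTR divided by the larger one, with the direction given in words; this reading is inferred from the numbers and is not stated in the table itself. The function names are hypothetical.

```python
def ttr(types, tokens):
    # Types-tokens ratio from raw counts.
    return types / tokens

def describe_change(orig_ttr, trans_ttr):
    # Assumed reading of the "TTR change" column: smaller TTR divided by
    # the larger one, plus the direction of the change.
    ratio = min(orig_ttr, trans_ttr) / max(orig_ttr, trans_ttr)
    if trans_ttr > orig_ttr:
        direction = "rises"
    elif trans_ttr < orig_ttr:
        direction = "decreases"
    else:
        direction = "unchanged"
    return round(ratio, 2), direction

# (types, tokens) in the original and in the translation, from Table 1.
rows = {
    "El amor en los tiempos del cólera (Spanish-Ukrainian)":
        ((15352, 145108), (28357, 126394)),
    "Захар Беркут (Ukrainian-Spanish)":
        ((13352, 50372), (10049, 61472)),
}

for title, (orig, trans) in rows.items():
    o, t = ttr(*orig), ttr(*trans)
    print(title, round(o, 4), round(t, 4), describe_change(o, t))
```

For these two rows the recomputed values agree with the table: 0.1058 and 0.2244 with a change of 0.47 (rises), and 0.2651 and 0.1635 with a change of 0.62 (decreases).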

5. Data interpretation and generalization

As we can see from Table 1, the results for the eight randomly chosen translations and their respective originals show that the TTR rises in Spanish-Ukrainian translation and decreases in the opposite direction. This seems to be, if not a universally valid rule, at least quite a clear tendency for this pair of languages. As this conclusion is valid solely for Spanish-Ukrainian and Ukrainian-Spanish translations, in order to extrapolate from the results of particular studies to the general case, we propose a table that indicates the general tendency. We have gathered several researchers’ results in Table 2.

Table 2. TTR changes in translation within different language combinations.

No.	Direction of translation	Degree of synthetism of the target language	TTR	Researcher	Confirmation of the hypothesis
1	English-Russian	rises	rises	A. Kutuzov (Kutuzov, 10)	+
2	English-German	rises	decreases	A. Kruger (Kruger, 74)	-
3	Spanish-English	decreases	decreases	J. Munday (Munday 1998, 4)	+
4	English-Chinese	decreases	decreases	Y. Tsai (Tsai, 75)	+
5	English-Polish	decreases	decreases	R. Uzar (Uzar, 259)	+
6	Russian-Macedonian	decreases	decreases	E. Kelih (Kelih, 179)	+
7	Russian-Serbian	decreases	decreases	E. Kelih (Kelih, 179)	+
8	Russian-Bulgarian	decreases	decreases	E. Kelih (Kelih, 179)	+
9	Russian-Slovene	decreases	decreases	E. Kelih (Kelih, 179)	+
10	Russian-Croatian	decreases	decreases	E. Kelih (Kelih, 179)	+
11	Spanish-Ukrainian	rises	rises	S. Fokin (the present study)	+
12	Ukrainian-Spanish	decreases	decreases	S. Fokin (the present study)	+
13	Finnish-Russian	decreases	decreases	M. Kopotev (Копотев, 379)	+

Therefore, the general picture mostly confirms the hypothesis. Exception number 2 (English-German translation, A. Kruger’s data) may be explained by extralinguistic factors. However, we believe that this will remain a tendency and not a general rule, because translation can be a conscious process: translators may sometimes consciously influence the TTR for their own reasons, for example to show the richness of their vocabulary or of their native language, although it is quite absurd to imagine that a translator would try to artificially increase the number of grammatical forms in the translated text.
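The tally behind Table 2 can be expressed as a simple check of hypothesis #2: a row confirms the hypothesis when the TTR moves in the same direction as the degree of synthetism. The sketch below transcribes the two relevant columns of Table 2 as printed; the names `TABLE_2` and `confirms` are hypothetical.

```python
# (direction of translation, change in synthetism, change in TTR),
# transcribed from Table 2 as printed.
TABLE_2 = [
    ("English-Russian", "rises", "rises"),
    ("English-German", "rises", "decreases"),
    ("Spanish-English", "decreases", "decreases"),
    ("English-Chinese", "decreases", "decreases"),
    ("English-Polish", "decreases", "decreases"),
    ("Russian-Macedonian", "decreases", "decreases"),
    ("Russian-Serbian", "decreases", "decreases"),
    ("Russian-Bulgarian", "decreases", "decreases"),
    ("Russian-Slovene", "decreases", "decreases"),
    ("Russian-Croatian", "decreases", "decreases"),
    ("Spanish-Ukrainian", "rises", "rises"),
    ("Ukrainian-Spanish", "decreases", "decreases"),
    ("Finnish-Russian", "decreases", "decreases"),
]

def confirms(synthetism_change, ttr_change):
    # Hypothesis #2: the TTR changes in the same direction as synthetism.
    return synthetism_change == ttr_change

exceptions = [pair for pair, s, t in TABLE_2 if not confirms(s, t)]
confirmed = len(TABLE_2) - len(exceptions)

print(f"{confirmed} of {len(TABLE_2)} rows confirm the hypothesis")
print("Exceptions:", exceptions)
```

Run on the table as given, the check reports 12 confirming rows out of 13, with English-German as the only exception.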

We cannot deny that translated texts show a lower TTR in comparison with the original texts, but ‘the third code,’ evidently, is not the only factor that influences the changes in TTR in translation; the typological differences between the source language and the target language turn out to be a much more powerful factor.

6. Conclusion

TTR can be a useful parameter for comparing translations with their respective originals from the practical and theoretical points of view. Changes in the TTR in translation can indirectly indicate modifications in lexical variety; thus, the TTR can be important for roughly evaluating this aspect of the adequacy of translation, as well as the translator’s and the author’s idiostyle. More important, from our point of view, is its theoretical significance. Apart from the reported universal that a translated text is characterized by a lower TTR than the original (and is consequently less varied lexically), the change in the TTR in translation follows a common tendency. When translating from an analytic language into a more synthetic one, the TTR rises; when translating in the opposite direction, it decreases. While this is a strong tendency, it is not a universal law, because of the strong influence of extralinguistic factors. To make this kind of research more precise, the ratio of the number of lemmas used to the number of tokens (lemmas-tokens ratio) should be applied when evaluating the lexical richness of a text in the original and the translation, although this method is complicated by the lack of lemmatizers for several languages and by their high cost.

References

Baker, M. (1996). Corpus-based translation studies: The challenges that lie ahead. In Terminology, LSP, and translation: studies in language engineering in honour of Juan C. Sager. – Amsterdam: John Benjamins. – P. 175-186.

Dovzhenko, A. (1972). El Desná encantado / Traducido por R. Hupalo. – Kiev: Dnipro.– 86 p.

Franko, I. (1983). Zakhar Bérkut / Traducido por S. Ryzvaniuk. – Kiev: Dnipro. – 199 p.

García Márquez G. (1986). El amor en los tiempos del cólera. – La Habana: Arte y literatura sólo para Cuba. – 460 p.

Ho-Jeong, Ch. (2006). Target Text Contraction in English-into-Korean Translations: A Contradiction of Presumed Translation Universals? In Meta: journal des traducteurs, vol. 51 – n° 2. – P. 343-367.

Janovskyj, J. (1982). Los jinetes. / Traducido por S. Ryzvaniuk. – Kiev: Dnipro. – 127 p.

Kelih, E. (2009). Preliminary Analysis of a Slavic Parallel Corpus. – NLP, Corpus Linguistics, Corpus Based Grammar Research. Fifth International Conference Smolenice, Slovakia, 25-27 November 2009. – Bratislava: Tribun. – P. 175-183. (Accessed online on 21 November 2012 at http://www.uni-graz.at/emmerich.kelih/Publikationen/2009_slovko_slavic_parallel_corpora_kak_zakaljalas_stal_kelih.pdf )

Kotsiubinskiy, M. (1972). La sombra de los antepasados olvidados y otros relatos / Traducido por J. Borysiuk. – Kiev: Dnipro. – 330 p.

Kruger, A. (2002). Corpus-based translation research: its development and implications for general, literary and Bible translation. In Acta Theologica, Supplementum 2. – P. 70-106.

Kutuzov, A. (2010). Change of word types to word tokens ratio in the course of translation (based on Russian translations of K. Vonnegut's novels). In International Computational Linguistic Conference “Dialog-21” (Accessed online on 21 November 2012 at http://arxiv.org/ftp/arxiv/papers/1003/1003.0337.pdf)

Munday, J. (1998). A computer-assisted approach to the analysis of translation shifts. In Meta: journal des traducteurs, vol. 43, n° 4. – P. 542-556.

Pápai, V. (2004). Explicitation: a universal of translated text? In Translation universals: Do they exist? / Edited by Anna Mauranen, Pekka Kujamäki. – Amsterdam: John Benjamins. – P. 145-164.

Pérez Galdós, B. (1964). Doña Perfecta. – Москва: Радуга. – 276 с.

Rhea, P. (2007). Language disorders from infancy through adolescence: assessment & intervention, 3rd edn. – St. Louis: Mosby/Elsevier. – 784 p.

Tsai, Y. (2010). Text Analysis of Patent Abstracts. In The Journal of Specialised Translation. – Issue 13. – National Taiwan University. – P. 61-80. (Accessed online on 21 November 2012 at http://www.jostrans.org/issue13/art_tsai.pdf)

Uzar, R. (2002). A Corpus Methodology for Analysing Translation. In Cadernos de Tradução. – Universidade Federal de Santa Catarina. – P. 237-265.

Аларкон, П.-А. (1958). Трикутний капелюх / пер. Л. Добрянської і Л. Колесник. – К.: Державне видавництво художньої літератури. – 80 с.

Аларкон, П.-А. (1983). Трикутний капелюх / пер. Ж. Конєвої. – К.: Дніпро. – 176 с.

Ґарсія Маркес, Ґ. (1999). Кохання в час холери. – Львів: Класика. – 346 с.

Довженко, О. (1957). Зачарована Десна. Кіноповісті. – Київ: Радянський письменник. – С. 459-507.

Копотев, М. (2010). Я никогда не буду так говорить. Языковая компетенция и языковая рефлексия американской финки из СССР. In Slavica Helsingiensia 40 Instrumentarium of Linguistics. Sociolinguistic Approach to Non-Standard Russian. – Helsinki. (Accessed online on 21 November 2012 at http://www.helsinki.fi/slavicahelsingiensia/preview/sh40/pdf/26-sh40.pdf)

Коцюбинський, М. (1989). Тіні забутих предків. In Подарунок на іменини. Оповідання, новели, повісті. – К.

Перес Гальдос, Б. (1978). Донья Перфекта; Сарагоса / пер. Ж. Конєвої. – Київ: Дніпро. – 350 с.

Франко, І. (1994). Захар Беркут: Роман / Микола Костомаров. Чернигівка: Повість. – Київ: Укр. Центр духовної культури. – 312 с.

Яновський, Ю. (1984). Оповідання, романи, п'єси. – Київ: Наукова думка. – 578 с.