Context is everything: a context-aware annotation typology for dialogue translation quality assessment

Miguel Menezes; Amin Farajian; Helena Moniz; João Graça

doi:10.5565/rev/tradumatica.508


Miguel Menezes

INESC-ID, ULisboa, Unbabel

Amin Farajian

Unbabel

Helena Moniz

University of Lisbon, INESC-ID, CLUL

João Graça

Until recently, most machine translation (MT) systems translated sentences in isolation, neglecting key document-level context because of the scarcity of discourse-focused training data and the lack of robust evaluation methods. We present a context-aware annotation framework, validated on a customer support dataset with substantial inter-annotator agreement (Cohen's κ = 0.73), which could offer a new standard for context-aware MT evaluation.
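The agreement figure reported above is Cohen's κ, which compares the agreement observed between two annotators with the agreement expected by chance from each annotator's label distribution. A minimal sketch of that calculation follows, assuming each annotator assigns exactly one categorical label per item. The function name `cohens_kappa`, the label names, and the toy data are hypothetical and are not taken from the article or its dataset.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items that received the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label counts.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy labels invented for illustration; not data from the article.
annotator_1 = ["error", "ok", "ok", "error", "ok", "ok"]
annotator_2 = ["error", "ok", "error", "error", "ok", "ok"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

In this toy case the annotators agree on five of six items, and the chance agreement is 0.5. The resulting κ is therefore lower than the raw agreement rate, which is the correction the coefficient is designed to apply.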

Keywords

machine translation, discourse phenomena, context, translation quality assessment workflows, context-aware annotation framework

How to cite

Menezes, Miguel et al. "Context is everything: a context-aware annotation typology for dialogue translation quality assessment". Tradumàtica: traducció i tecnologies de la informació i la comunicació, 2025, no. 23, pp. 383-16, doi:10.5565/rev/tradumatica.508.

Rights

This work is licensed under a Creative Commons Attribution 4.0 International license.

(c) Miguel Menezes, Amin Farajian, Helena Moniz, João Graça, 2025

