The use of artificial intelligence in the quality assessment of live subtitling: the NER Buddy


Pablo Romero-Fresco
Óscar Alonso Amigo
Luis Alonso Bacigalupe

Translation quality assessment is always subject to a high degree of subjectivity. In areas such as audiovisual translation, however, it has become common practice to assess the quality of live television subtitles objectively. For intralingual live subtitling (an access service for people with hearing loss in which the subtitles are in the same language as the original audio), Romero-Fresco and Martínez (2015) proposed the NER model. This model, however, is complex and time-consuming to apply. The purpose of this article is to present the results of our research on the development of an AI-based application for the (semi)automatic assessment of live subtitles using the NER methodology, an application that is currently being tested by several international broadcasters.
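For reference, the NER model scores accuracy as (N - E - R) / N x 100, where N is the number of words in the subtitles and E and R are edition and recognition errors, each weighted by severity (0.25 minor, 0.5 standard, 1 serious); 98% is generally taken as the threshold for acceptable quality. The following minimal Python sketch illustrates the calculation; the data structures and function names are illustrative and do not reflect the internals of the NER Buddy itself.

from dataclasses import dataclass

# Severity weights as defined in the NER model
# (Romero-Fresco and Martínez, 2015).
WEIGHTS = {"minor": 0.25, "standard": 0.5, "serious": 1.0}

@dataclass
class SubtitleError:
    # Illustrative container, not part of the NER Buddy.
    kind: str      # "edition" or "recognition"
    severity: str  # "minor", "standard" or "serious"

def ner_score(n_words: int, errors: list[SubtitleError]) -> float:
    """Return the NER accuracy rate: (N - E - R) / N * 100."""
    e = sum(WEIGHTS[x.severity] for x in errors if x.kind == "edition")
    r = sum(WEIGHTS[x.severity] for x in errors if x.kind == "recognition")
    return (n_words - e - r) / n_words * 100

# Example: 700 subtitled words, two standard recognition errors and one
# serious edition error -> (700 - 1 - 1) / 700 * 100 = 99.71%.
errors = [SubtitleError("recognition", "standard"),
          SubtitleError("recognition", "standard"),
          SubtitleError("edition", "serious")]
score = ner_score(700, errors)
print(f"NER accuracy: {score:.2f}% "
      f"({'meets' if score >= 98 else 'below'} the 98% threshold)")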

Keywords
live subtitling, respeaking, automatic speech recognition, ASR, the NER model, artificial intelligence, AI, large language models, LLM, automatic assessment, NER Buddy


How to cite
Romero-Fresco, Pablo et al. «El uso de la inteligencia artificial en la evaluación de la calidad de la subtitulación en vivo: el NER Buddy». Tradumàtica: traducció i tecnologies de la informació i la comunicació, 2024, no. 22, pp. 450-7, doi:10.5565/rev/tradumatica.408.
References

Bender, E. M., Gebru, T., McMillan-Major, A. and Shmitchell, S. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? FAccT '21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 610–623. <https://doi.org/10.1145/3442188.3445922>. [Accessed 20240825].

Brown, T. et al. (2020). Language Models are Few-Shot Learners. <https://doi.org/10.48550/arXiv.2005.14165>. [Accessed 20240815].

CRTC (2019a). Broadcasting Notice of Consultation CRTC 2019-9. Ottawa. <https://crtc.gc.ca/eng/archive/2019/2019-9.htm>. [Accessed 20240625].

CRTC (2019b). Broadcasting Regulatory Policy CRTC 2019-308. Ottawa. <https://crtc.gc.ca/eng/archive/2019/2019-308.htm>. [Accessed 20240625].

Devlin, J., Chang, M.-W., Lee, K. and Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. <https://doi.org/10.48550/arXiv.1810.04805>. [Accessed 20240621].

Dumouchel, P., Boulianne, G. and Brousseau, J. (2011). Measures for quality of closed captioning, in: A. Şerban, A. Matamala and J. M. Lavaur (eds.). Audiovisual translation in close-up: Practical and theoretical approaches. Bern: Peter Lang, pp. 161–172.

HLAA (Hearing Loss Association of America) (2018). Hearing Loss: Facts and Statistics. <https://www.hearingloss.org/wp-content/uploads/HLAA_HearingLoss_Facts_Statistics.pdf?pdf=FactStats>. [Accessed 20240511].

Hughes, J. (2023). Introducing Ursa from Speechmatics. Speechmatics. <https://www.speechmatics.com/company/articles-and-news/introducing-ursa-the-worlds-most-accurate-speech-to-text>. [Accessed 20240908].

Kocmi, T. and Federmann, C. (2023). Large language models are state-of-the-art evaluators of translation quality, in European Association for Machine Translation (EAMT). <https://arxiv.org/abs/2302.14520>. [Accessed 20240807].

Lambourne, A. (2006). Subtitle Respeaking, in Eugeni, C. and Mack, G. (eds.). inTRAlinea, Special Issue on Respeaking. <https://www.intralinea.org/specials/article/1686>. [Accessed 20240906].

Marsh, A. (2006). Respeaking for the BBC, in Eugeni, C. and Mack, G. (eds.). inTRAlinea, Special Issue on Respeaking. <https://www.intralinea.org/specials/article/Respeaking_for_the_BBC>. [Accessed 20240906].

Mykhalevych, N. (2022). Survey: Why America is obsessed with subtitles. <https://preply.com/en/blog/americas-subtitles-use/>. [Accessed 20221020].

Papers with Code (2024). Multi-task Language Understanding on MMLU. <https://paperswithcode.com/sota/multi-task-language-understanding-on-mmlu>. [Accessed 20240908].

Pezeshkpour, P. and Hruschka, E. (2023). Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions. <https://doi.org/10.48550/arXiv.2308.11483>. [Accessed 20240525].

Radford, A., Narasimhan, K., Salimans, T. and Sutskever, I. (2018). Improving Language Understanding by Generative Pre-Training. OpenAI. <https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf>. [Accessed 20240908].

Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C. and Sutskever, I. (2022). Robust Speech Recognition via Large-Scale Weak Supervision. <https://doi.org/10.48550/arXiv.2212.04356>. [Accessed 20240910].

Romero-Fresco, P. (2011). Subtitling Through Speech Recognition: Respeaking. Manchester: Routledge.

Romero-Fresco, P. (2020). Negotiating quality assessment in media accessibility: the case of live subtitling. Universal Access in the Information Society 20, pp. 741–751. <https://doi.org/10.1007/s10209-020-00735-6>. [Accessed 20240602].

Romero-Fresco, P. and Martínez, J. (2015). Accuracy rate in live subtitling: the NER model, in Díaz-Cintas, J., Baños, R. (eds.). Audiovisual Translation in a Global Context: Mapping an Ever-changing Landscape. London: Palgrave Macmillan, pp. 28–50. <https://doi.org/10.1057/9781137552891_3>. [Accessed 20240602].

Romero-Fresco, P. and Eugeni, C. (2020). Live subtitling through respeaking, in Bogucki, Ł. and Deckert, M. (eds.). The Palgrave Handbook of Audiovisual Translation and Media Accessibility. London: Palgrave Macmillan, pp. 269–297. <https://doi.org/10.1007/978-3-030-42105-2_14>. [Accessed 20240602].

Romero-Fresco, P. and Fresno, N. (2023). The accuracy of automatic and human live captions in English. Linguistica Antverpiensia, New Series – Themes in Translation Studies, 22. <https://doi.org/10.52034/lans-tts.v22i.774>. [Accessed 20240715].

Stinson, M. S. (2015). Speech-to-text interpreting, in Pöchhacker, F. (ed.). Routledge Encyclopedia of Interpreting Studies. Manchester: Routledge, pp. 399–400.

Stureborg, R., Alikaniotis, D. and Suhara, Y. (2024). Large Language Models are Inconsistent and Biased Evaluators. <https://doi.org/10.48550/arXiv.2405.01724>. [Accessed 20240908].

Tang, L., Shalyminov, I., Wing-mei Wong, A., Burnsky, J., Vincent, J.W., Yang, Y., Singh, S., Feng, S., Song, H., Su, H., Sun, L., Zhang, Y., Mansour, S. and McKeown, K. (2024). TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization. <https://paperswithcode.com/paper/tofueval-evaluating-hallucinations-of-llms-on>. [Accessed 20240525].

UNE (2012). Subtitulado para personas sordas y personas con discapacidad auditiva. Madrid: UNE. <https://www.une.org/encuentra-tu-norma/busca-tu-norma/norma?c=N0049426>. [Accessed 20240521].

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L. and Polosukhin, I. (2017). Attention Is All You Need. <https://doi.org/10.48550/arXiv.1706.03762>. [Accessed 20240502].

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., and Zhou, D. (2023). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. <https://doi.org/10.48550/arXiv.2201.11903>. [Accessed 20240716].

Wells, T., Christoffels, D., Vogler, C., Kushalnagar, R. (2022). Comparing the Accuracy of ACE and WER Caption Metrics When Applied to Live Television Captioning, in Miesenberger, K., Kouroupetroglou, G., Mavrou, K., Manduchi, R., Covarrubias Rodriguez, M., Penáz, P. (eds.). Computers Helping People with Special Needs. ICCHP-AAATE 2022. Lecture Notes in Computer Science, vol 13341. Springer, Cham. <https://doi.org/10.1007/978-3-031-08648-9_61>. [Accessed 20240602].