Lemmatization and grammatical annotation of the Corpus Histórico Judeoespañol (CORHIJE): problems, solutions, and resolutions
Aitor García Moreno
Instituto de Lenguas y Culturas del Mediterráneo y Oriente Próximo
Francisco Javier Pueyo Mena
College of the Holy Cross
After a brief review of the most salient features of the Corpus Histórico Judeoespañol (CORHIJE), which was presented at the 3rd edition of the Congreso de Corpus Diacrónicos en lenguas Iberorrománicas (CODILI, Zurich 2014), this paper describes the ongoing lemmatization and grammatical annotation of the corpus. We focus on the challenges encountered during the annotation process and the solutions applied to them. In some cases these solutions required relatively arbitrary decisions, made in line with the descriptive and analytical goals we were pursuing: hence the problems, solutions, and resolutions named in the title.
Keywords
Linguistic Corpora, Digital Corpus Design, Judeo-Spanish, Diachrony
How to Cite
García Moreno, Aitor; and Pueyo Mena, Francisco Javier. “Lemmatization and grammatical annotation of the Corpus Histórico Judeoespañol (CORHIJE): problems, solutions, and resolutions”. Scriptum digital. Revista de corpus diacrònics i edició digital en Llengües iberoromàniques, no. 6, pp. 69-82, https://raco.cat/index.php/scriptumdigital/article/view/329260.