The Old Spanish Textual Archive, design and development of a corpus of medieval texts : lemmatization and pos tagging

Francisco Gago Jover
Francisco Javier Pueyo Mena
This paper discusses how forms, lemmas, grammatical analysis, and texts are processed in the Old Spanish Textual Archive (OSTA), a linguistic corpus of more than 32 million words built from more than 400 semi-paleographic transcriptions of medieval texts written in Castilian, Asturian, Leonese, Navarro-Aragonese, and Aragonese, prepared by collaborators of the Hispanic Seminary of Medieval Studies (HSMS). It also describes the tagging and lemmatization process, which uses FreeLing, a natural language processing tool, and HSMS-app, a textual analysis tool developed for this project.
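As a rough illustration of what lemmatization and POS tagging mean for a corpus with variable medieval spelling, the following Python sketch maps spelling variants to a single lemma and a coarse part-of-speech label. It is only a toy under stated assumptions: the lexicon entries, the label set, and the names `TOY_LEXICON`, `Analysis`, and `annotate` are hypothetical, and the sketch does not reproduce the FreeLing API, the HSMS-app, or the lemmatization conventions actually adopted in OSTA.

```python
from typing import List, NamedTuple


class Analysis(NamedTuple):
    """One annotated token: surface form, lemma, and coarse POS label."""
    form: str
    lemma: str
    pos: str


# Hypothetical toy lexicon (an assumption for illustration only):
# lowercase spelling variant -> (lemma, coarse POS label).
TOY_LEXICON = {
    "fazer": ("hacer", "VERB"),
    "hazer": ("hacer", "VERB"),
    "omne": ("hombre", "NOUN"),
    "ome": ("hombre", "NOUN"),
    "el": ("el", "DET"),
}


def annotate(tokens: List[str]) -> List[Analysis]:
    """Look up each token in the toy lexicon.

    Unknown forms keep their lowercase form as lemma and get the label "UNK".
    """
    result = []
    for token in tokens:
        key = token.lower()
        lemma, pos = TOY_LEXICON.get(key, (key, "UNK"))
        result.append(Analysis(token, lemma, pos))
    return result


if __name__ == "__main__":
    for analysis in annotate("El omne quiere fazer".split()):
        print(analysis.form, analysis.lemma, analysis.pos, sep="\t")
```

A real pipeline would replace the dictionary lookup with a morphological analyzer and a context-sensitive tagger, since a single medieval form can correspond to more than one lemma or category.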
Keywords
electronic corpus design, corpus annotation, digital medieval Spanish corpus, medieval Spanish

How to Cite
Gago Jover, Francisco; Pueyo Mena, Francisco Javier. “The Old Spanish Textual Archive, design and development of a corpus of medieval texts : lemmatization and pos tagging”. Scriptum digital. Revista de corpus diacrònics i edició digital en Llengües iberoromàniques, 2018, no. 7, pp. 25-35, https://raco.cat/index.php/scriptumdigital/article/view/343462.