Corpus of Notarial Texts from Extremadura (CORTENEX, 17th c.): Editing a Historical-Linguistic Corpus in the Field of Digital Humanities

Inmaculada González Sopeña

This article describes the methodology followed in preparing a corpus of seventeenth-century notarial documentation from Extremadura (CORTENEX), which follows the proposals of the TEI Consortium for the encoding and tagging of historical documents. The transcriptions are encoded in the XML markup language, and the texts are processed linguistically on the TEITOK digital platform in four fundamental phases: tokenization, normalization, stemming and morphosyntactic annotation. CORTENEX is a subcorpus of Oralia diacrónica del español (ODE). Part of its documentation is already accessible and, given the type of texts it includes, its main interest lies in analyzing the lexical variation of the Spanish used in the territory that corresponds to Extremadura. There are practically no diachronic studies of this variety that would allow the language of the region to be analyzed from a historical perspective.

Keywords

notarial documentation, corpus linguistics, XML, TEITOK, history of Spanish lexicon

How to Cite

González Sopeña, Inmaculada. “Corpora of Notarial Texts from Extremadura (Cortenex S. XVII): Editing Historical Linguistic Corpus in the Field of Digital Humanities”. Dialectologia: revista electrònica, no. 31, pp. 105-26, https://raco.cat/index.php/Dialectologia/article/view/419490.

Rights

Copyright

Dialectologia: revista electrònica is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 license.
