This article presents the results of a series of experiments with open-source neural network OCR software on a total of 88 medieval manuscripts ranging from the ninth through thirteenth centuries. Our scope in these experiments focused mainly on manuscripts written in Caroline minuscule, as well as a handful of test cases toward the end of our date range written in what may be called “Late Caroline” and “Early Gothic” scripts (termed “transitional” when taken together). In the following, we discuss the possibilities and challenges of using OCR on medieval manuscripts, neural network technology and its use in OCR software, the process and results of our experiments, and how these results offer a baseline for future research. Our results show potential for contributing to not only text recognition as such but also other areas of bibliography like paleographical analysis. In all of this, we want to emphasize the use of open-source software and sharing of data for decentralized, large-scale OCR with manuscripts in order to open up new collaborative avenues for innovation in the digital humanities and medieval studies.
Creative Commons License
This work is licensed under a Creative Commons Attribution-No Derivative Works 4.0 License.
Hawk, Brandon; Karaisl, Antonia; and White, Nick, "Modelling Medieval Hands: Practical OCR for Caroline Minuscule" (2018). Faculty Publications. 416.