The Schoenberg Institute for Manuscript Studies Online Lecture Series presents regularly scheduled lectures related to the study of premodern manuscript books and global manuscript culture.
Recent advances in machine learning, combined with the availability of millions of images of manuscript pages, mean that we are now able to produce automatic transcriptions of medieval and other manuscripts, with over 99% accuracy in the right circumstances. This is extremely promising and opens up many new possibilities but, as with any new approach, it naturally raises challenges and questions as well. Perhaps the first question is how we can best make use of this opportunity: in other words, how to read a million manuscripts. At the same time, machine learning and other "big data" approaches also raise questions about representation, since by definition they only work for scripts and languages that are already available in large quantities, whereas rare or historical languages that have fewer resources are ignored all the more. This talk will address these questions in the context of kraken and eScriptorium, a pair of tools for automatic transcription of handwritten and printed documents, especially those in rare and historical scripts, whose development is led by the Digital Humanities team in the lab "Archéologie et Philologie d'Orient et d'Occident" at the École Pratique des Hautes Études – Université PSL, in Paris.