Deep Neural Networks Could Be Key to Ancient Text Restoration and Attribution, Research Shows
Uncovering the truths of ancient history can be complicated. Researchers must study texts inscribed in stone and clay, a discipline called epigraphy, but these inscriptions can be illegible after centuries of damage. Recent research suggests that deep neural networks (DNNs) could help with ancient text restoration.
In a recently published study, DeepMind, a British artificial intelligence (AI) lab, teamed up with University of Oxford researchers to develop a DNN to improve epigraphy. Beyond restoring inscriptions to make them more legible, the neural network can attribute them to their original time and place more accurately. Tools like this could unlock a new era of historical discovery.
How Neural Networks Can Restore Ancient Documents
This new DNN, Ithaca, builds on an earlier DeepMind project called Pythia. For Pythia, researchers trained a machine learning model on ancient Greek text from the seventh century BCE to the fifth century CE. The team used natural language processing (NLP) techniques to teach Pythia to predict missing characters based on grammatical and syntactical context.
First, Pythia analyzed texts to recognize characters and interpret what sentences and words it could. Then, it used this context on a character and word level to predict missing text. Pythia had a 30.1% character error rate, compared to 57.3% for human epigraphists.
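To make the character error rate figure concrete, here is a minimal sketch of one common way to compute it: the edit distance between the predicted and reference text, divided by the reference length. The article does not say exactly how the study computed its metric, so this definition and the function names `edit_distance` and `character_error_rate` are assumptions for illustration only.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by reference length (assumed CER definition)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Under this definition, a restoration that gets one character wrong in a four-character gap scores a 25% character error rate, so lower is better.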
Ithaca takes this research several steps further. It was trained on the same dataset as Pythia but uses different task heads tailored to three epigraphical processes: restoration, region prediction, and date prediction. These specialized heads work together to analyze long-term context and produce more comprehensive, reliable results.
Ithaca also presents a set of 20 likely predictions for missing or illegible text instead of a single interpretation. This allows human epigraphists to use their expertise to find the most likely answer. This cooperation produced impressive results in tests.
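The idea of offering a ranked shortlist rather than one answer can be sketched as follows. This is not Ithaca's code: the function `top_k_candidates`, the raw scores, and the placeholder candidate names are all hypothetical, and the sketch simply assumes a model that assigns a score to each candidate restoration.

```python
import math


def top_k_candidates(scores: dict[str, float], k: int = 20) -> list[tuple[str, float]]:
    """Return the k highest-scoring candidates with softmax probabilities."""
    if not scores:
        return []
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores.values())
    weights = {cand: math.exp(s - max_score) for cand, s in scores.items()}
    total = sum(weights.values())
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [(cand, weight / total) for cand, weight in ranked[:k]]


# Hypothetical scores for three placeholder restorations of a damaged span.
example_scores = {"candidate_a": 2.0, "candidate_b": 1.0, "candidate_c": 0.0}
shortlist = top_k_candidates(example_scores, k=2)
```

Showing probabilities next to each candidate lets an epigraphist see how strongly the model prefers one reading over another before applying their own judgment.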
By itself, Ithaca achieved 62% accuracy in ancient text restoration, compared to just 25% for historians working alone. When historians and Ithaca worked together, they achieved 72% accuracy. Ithaca could also attribute inscriptions to their original locations with 71% accuracy and date them to within 30 years of their origin.
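A dating result such as "within 30 years" implies some measure of distance between a predicted date and the accepted one. The sketch below assumes that an inscription's accepted date is given as a range of years and that the error is zero whenever the prediction falls inside that range. That convention, the function name `years_from_range`, and the example years are illustrative assumptions, not details taken from the article.

```python
def years_from_range(predicted_year: int, earliest: int, latest: int) -> int:
    """Distance in years from a predicted date to an accepted date range.

    Years BCE are written as negative integers, so 450 BCE is -450.
    """
    if earliest > latest:
        raise ValueError("earliest must not be later than latest")
    if predicted_year < earliest:
        return earliest - predicted_year
    if predicted_year > latest:
        return predicted_year - latest
    return 0  # the prediction falls inside the accepted range


# Hypothetical inscription dated by historians to 450-400 BCE.
error = years_from_range(predicted_year=-380, earliest=-450, latest=-400)
```

Averaging this distance over a test set would give a single figure comparable to the 30-year result quoted above, under the stated assumptions.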
Implications for Future Historical Research
Widespread use of systems like Ithaca could vastly improve historical research. These tests suggest that working with DNNs could make epigraphy almost three times as accurate as entirely manual processes (72% versus 25% restoration accuracy). With more reliable restorations, researchers could understand inscriptions better and uncover lost truths about ancient civilizations.
In addition to improving accuracy, automated systems like this help streamline analytical processes, letting researchers accomplish more in less time. Those time savings are crucial, considering how complex historical research can be. Historians could use DNNs like Ithaca to accelerate epigraphy so they can then focus on the understanding and application of these texts.
As researchers restore and attribute additional inscriptions, these models could become more accurate in this work. Historians could learn more about ancient languages, leading to further discoveries. The world could step into a new era of learning and understanding.
Remaining Obstacles
As with any neural network use case, there are still some challenges for epigraphy DNNs like Ithaca. Ithaca can accurately date inscriptions, but it can only do so because it looks at texts within a set historical period, between 800 BCE and 800 CE. That’s a tremendous range, but it limits these models’ ability to analyze newly discovered texts that may be older.
NLP also struggles with languages’ continually evolving structures and meanings. A language can change beyond recognition in as little as 500 years, and words can take on entirely new meanings long before then. This rapid evolution makes it difficult for humans and DNNs to reliably interpret ancient texts, especially if their date of origin is uncertain.
The nature of historical research poses some difficulties, too. As new evidence emerges, historians must continually revise what they once accepted as fact about the past. Consequently, the rules of grammar and mechanics we think we know about dead languages may not be accurate, leading to unreliable transcriptions. Certainty improves with more evidence, but that can be scarce with ancient civilizations.
AI Can Bring New Life to the Past
Despite these challenges, Ithaca represents a considerable step forward for machine learning and historical research, going beyond just ancient text restoration. It highlights the advantages of human-AI collaboration and paves the way for a new era of historical discovery. Neural networks could transform our understanding of the past as this research and its real-world applications expand.