Epigraphy, the study of texts inscribed on durable materials such as stone and metal, provides key first-hand evidence for understanding the Roman world. The field faces many challenges: fragmentary inscriptions, uncertain dates, diverse geographical origins, widespread use of abbreviations, and a large, rapidly growing corpus of over 176,000 Latin inscriptions, with roughly 1,500 new inscriptions added each year.
To address these challenges, Google DeepMind has developed Aeneas: a transformer-based generative neural network that restores damaged text segments, estimates chronology and geographic origin, and contextualizes inscriptions by retrieving relevant parallels.

Challenges in Latin epigraphy
Latin inscriptions span nearly two millennia, from the 7th century BC to the 8th century AD, and are spread across more than 60 provinces of the vast Roman Empire. They range from imperial decrees and legal documents to tombstones and votive altars. Traditionally, epigraphers apply detailed knowledge of language, formulae, and cultural context to restore partially lost or illegible texts and to attribute inscriptions to particular time ranges and locations by comparing linguistic and material evidence.
However, many inscriptions suffer physical damage, with missing segments of uncertain length. Wide geographical dispersion and diachronic linguistic change make dating and attribution complex, especially given the sheer size of the corpus. Manual identification of parallels is labor-intensive and often limited by expertise localized to particular regions or periods.


Latin Epigraphic Dataset (LED)
Aeneas was trained on the Latin Epigraphic Dataset (LED), an integrated, harmonized corpus of 176,861 Latin inscriptions aggregating records from three major databases. The dataset comprises approximately 16 million characters, covering inscriptions from the 7th century BC to the 8th century AD. About 5% of these inscriptions have associated grayscale images.
The dataset uses character-level transcriptions with special placeholder tokens: `-` marks missing text of known length, while `#` marks a missing segment of unknown length. Metadata includes provenance across 62 Roman provinces and dates assigned to decade-wide intervals.
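To make the placeholder scheme concrete, here is a minimal illustrative sketch (the helper names and example text are assumptions, not part of the released dataset tooling) showing how a transcription might be masked with the two token types:

```python
# Sketch of LED's character-level placeholder scheme (illustrative only):
# '-' marks each missing character in a gap of known length,
# '#' marks a gap whose length is unknown.

def mask_known_gap(text: str, start: int, length: int) -> str:
    """Replace `length` characters starting at `start` with '-' markers."""
    return text[:start] + "-" * length + text[start + length:]

def mask_unknown_gap(text: str, start: int, end: int) -> str:
    """Replace the span [start, end) with a single '#' unknown-length marker."""
    return text[:start] + "#" + text[end:]

inscription = "imp caesar divi f augustus"
print(mask_known_gap(inscription, 4, 6))      # 6-character gap of known length
print(mask_unknown_gap(inscription, 11, 17))  # gap of uncertain length
```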
Model architecture and inputs
At the heart of Aeneas is a deep, narrow transformer decoder based on the T5 architecture, with rotary positional embeddings for efficient processing of local and contextual features. Textual input is processed alongside an optional inscription image (when available), which passes through a shallow convolutional network (ResNet-8); the image embedding feeds only the geographic attribution head.
The model includes multiple specialized task heads:
- Restoration: predicts missing characters, with an auxiliary neural classifier to support gaps of unknown length.
- Geographic attribution: classifies inscriptions among 62 provinces by combining textual and visual embeddings.
- Dating: estimates the text's date via a predicted probability distribution aligned with historical date ranges.
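The shared-trunk, multi-head structure described above can be sketched as follows. This is not DeepMind's code; all layer sizes, the decade-bin count, and the random weights are assumptions standing in for trained parameters, purely to show how one embedding can feed three classification heads:

```python
import numpy as np

# Illustrative multi-task head sketch: a shared trunk embedding feeds three
# heads, mirroring Aeneas' restoration / geography / dating split.
rng = np.random.default_rng(0)

EMB = 64          # trunk embedding width (assumed)
VOCAB = 25        # Latin character vocabulary size (assumed)
PROVINCES = 62    # provinces in LED
DECADES = 160     # decade bins spanning the corpus' date range (assumed)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Randomly initialised head weights stand in for trained parameters.
W_restore = rng.normal(size=(EMB, VOCAB))
W_geo = rng.normal(size=(EMB, PROVINCES))
W_date = rng.normal(size=(EMB, DECADES))

trunk_embedding = rng.normal(size=(EMB,))  # stands in for the decoder output

restoration_probs = softmax(trunk_embedding @ W_restore)  # per-character dist.
province_probs = softmax(trunk_embedding @ W_geo)         # 62-way classification
date_probs = softmax(trunk_embedding @ W_date)            # distribution over decades

print(province_probs.shape, date_probs.shape)
```

Each head emits a proper probability distribution, which is what lets the dating head express uncertainty (e.g., a bimodal date distribution) rather than a single point estimate.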
Furthermore, the model produces a unified, historically enriched embedding by combining the outputs of the core and the task heads. Using cosine similarity, this embedding supports retrieval of ranked parallel inscriptions that capture linguistic, epigraphic, and broader cultural analogies rather than exact text matches.
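Embedding-based retrieval of this kind reduces to a nearest-neighbor search under cosine similarity. The following minimal sketch (random vectors stand in for real inscription embeddings; the function name is an assumption) shows the mechanism:

```python
import numpy as np

# Minimal sketch of cosine-similarity retrieval over inscription embeddings
# (assumed setup, not the released Aeneas code).

def cosine_rank(query: np.ndarray, corpus: np.ndarray, top_k: int = 3):
    """Return indices of the top_k corpus rows most similar to `query`."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    sims = c @ q                      # cosine similarity to every corpus row
    return np.argsort(-sims)[:top_k], sims

rng = np.random.default_rng(42)
corpus = rng.normal(size=(1000, 128))             # 1000 inscriptions, 128-dim
query = corpus[17] + 0.01 * rng.normal(size=128)  # near-duplicate of entry 17

top, sims = cosine_rank(query, corpus)
print(top[0])  # entry 17 ranks first, being a near-duplicate of the query
```

Because the embedding blends restoration, dating, and provenance signals, nearest neighbors can share formulaic or cultural traits even when their surface text differs.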
Training setup and data augmentation
Training was performed on TPU v5e hardware with batch sizes of up to 1,024 text-image pairs. Per-task losses are combined with optimized weightings. Data augmentation includes random text masking (up to 75% of characters), text clipping, word deletion, punctuation dropping, and image augmentation (zoom, rotation, brightness/contrast adjustment), along with dropout and label smoothing.
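A few of the text-side augmentations can be sketched as below. This is an assumed, simplified pipeline for illustration (the paper's exact augmentation code and parameters are not reproduced here): punctuation dropping, random word deletion, and random character masking under a 75% budget.

```python
import random

# Illustrative text-augmentation sketch (assumptions, not the paper's exact
# pipeline): punctuation drop, random word deletion, random character masking.

def augment(text: str, rng: random.Random, max_mask_frac: float = 0.75) -> str:
    # Drop punctuation, keeping letters, digits, and whitespace.
    text = "".join(ch for ch in text if ch.isalnum() or ch.isspace())
    # Randomly delete one word (if more than one remains).
    words = text.split()
    if len(words) > 1:
        del words[rng.randrange(len(words))]
    text = " ".join(words)
    # Mask a random fraction of characters (up to max_mask_frac) with '-'.
    chars = list(text)
    n_mask = rng.randrange(int(len(chars) * max_mask_frac) + 1)
    for i in rng.sample(range(len(chars)), n_mask):
        chars[i] = "-"
    return "".join(chars)

rng = random.Random(0)
print(augment("dis manibus. sacrum", rng))
```

Masking with the same `-` token used for real gaps teaches the model to reconstruct damaged text from context.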
At inference time, restoration of gaps of unknown length uses dedicated non-sequential decoding logic that produces multiple restoration candidates, ranked by joint probability and length.
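The ranking step can be illustrated with a toy example. This sketch assumes each candidate carries per-character log-probabilities and orders hypotheses by joint log-probability while reporting length alongside; it is a simplification, not Aeneas' actual decoding code:

```python
import math

# Sketch of ranking restoration hypotheses by joint probability (an assumed
# simplification of the approach, not Aeneas' exact decoding logic).

def rank_candidates(candidates):
    """candidates: list of (restored_text, [char_log_probs]).
    Returns (text, joint_log_prob, length) tuples sorted best-first."""
    scored = [(text, sum(logps), len(text)) for text, logps in candidates]
    return sorted(scored, key=lambda t: t[1], reverse=True)

hypotheses = [
    ("augusto", [math.log(0.9)] * 7),  # confident, longer restoration
    ("augur",   [math.log(0.6)] * 5),  # less confident, shorter
    ("august",  [math.log(0.8)] * 6),
]

for text, joint, length in rank_candidates(hypotheses):
    print(f"{text:8s} joint={joint:.2f} len={length}")
```

Reporting length alongside probability matters because candidates of different lengths are not directly comparable; surfacing several ranked hypotheses leaves the final judgment to the historian.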
Performance and evaluation
Aeneas was evaluated on the LED test set and through a collaborative study with 23 historians, demonstrating significant improvements:
- Restoration: with Aeneas' support, historians' character error rate (CER) fell to about 21%, compared with 39% unaided. The model alone reached 23% CER on the test set.
- Geographic attribution: the model classified the correct province among 62 options with about 72% accuracy. With Aeneas' aid, historians reached up to 68% accuracy, surpassing their unaided performance.
- Dating: Aeneas' average dating error is about 13 years; working with the model, historians reduced their average error from 31 years to 14 years.
- Contextual parallels: in about 90% of cases, the retrieved parallel inscriptions were accepted as useful starting points for historical research, and they increased historians' confidence by an average of 44%.
These improvements are statistically significant and underscore the model's practical value in augmenting expert scholarship.
Case studies
Res Gestae Divi Augusti:
Aeneas’ analysis of this monumental inscription reveals a bimodal date distribution, mirroring scholarly debate over its compositional layers and stages (late 1st century BC and early 1st century AD). Saliency maps highlight date-sensitive linguistic forms, archaic spellings, institutional titles, and personal names, echoing expert epigraphic knowledge. The retrieved parallels consist mainly of imperial legal texts and official senatorial decrees sharing formulaic and ideological traits.
Votive altar from Mainz (CIL XIII, 6665):
The inscription, dedicated by a military official in 211 AD, was accurately attributed geographically to Germania Superior and related provinces. Saliency maps identify key consular dating formulae and dedicatory references. Aeneas retrieved highly relevant parallels, including an altar from 197 CE sharing rare textual formulae and iconography, revealing historically meaningful connections beyond direct textual overlap or shared spatial metadata.
Integration into research workflows and education
Aeneas is a collaborative tool, not a replacement for historians. It accelerates the search for epigraphic parallels, aids restoration, and refines attribution, freeing scholars to focus on higher-level interpretation. The tool and dataset are publicly available through the Predicting the Past platform under a permissive license. An educational course, co-developed with high school students and educators, promotes interdisciplinary digital literacy by bridging AI and classical studies.
FAQ 1: What is Aeneas and what tasks does it perform?
Aeneas is a generative multimodal neural network for Latin epigraphy developed by Google DeepMind. It helps historians by restoring damaged or missing text in ancient Latin inscriptions, estimating their dates to within about 13 years, attributing their geographical origin with about 72% accuracy, and retrieving historically related parallel inscriptions for contextual analysis.
FAQ 2: How does Aeneas deal with incomplete or damaged inscriptions?
Aeneas can predict missing text segments even when the length of the gap is unknown (a capability called arbitrary-length restoration). It uses a transformer-based architecture and specialized neural network heads to generate multiple plausible restoration hypotheses, ranked by probability, facilitating expert evaluation and further research.
FAQ 3: How does Aeneas integrate into the historian’s workflow?
Aeneas provides historians with ranked parallel inscriptions and predictive hypotheses for restoration, dating, and provenance. These outputs increase historians' confidence and accuracy, reduce research time by quickly surfacing relevant texts, and support collaborative human-AI analysis. The model and dataset are publicly accessible via the Predicting the Past platform.
Check out the Paper, Project, and Google DeepMind Blog. All credit for this research goes to the researchers on the project.

Michal Sutter is a data science professional with a master’s degree in data science from the University of Padua. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels in transforming complex datasets into actionable insights.
