A New Lens into the Archive
You are in an archive. You find a document in a language you don't understand. You take a photo and input it into Gemini 3 Pro. 60 seconds later you have a transcription, transliteration, and translation.
The world changed on Tuesday, November 18th 2025, when Google launched Gemini 3 Pro Preview. For the last sixteen days I have been testing Gemini 3’s capabilities to perform high quality transcription, transliteration and translation in multiple languages. The product is a game changer in the world of machine transcription. The camera fantasy is not quite there yet, but it is coming. A Google Lens into the Archive.
I can confirm everything Professor Mark Humphries has said in his two important Substack posts (Generative History), and more.1 2 Especially for non-archivist and non-librarian users like me, interested in transcribing our own research collections of a few images up to tens of thousands, wanting flexibility and control of work flows, and the ability to summarise and enrich those transcriptions on the fly as part of our historical research.
This article
This Substack article concentrates on just two of my twenty-eight experiments in the last sixteen days. They concern a late C19th Armenian celebratory letter, and a mixed Russian/Armenian birth certificate from 1907. But I have also successfully transcribed, transliterated, translated, précised, and contextualised scholarly and liturgical Armenian texts from earlier periods using Gemini 3. Rough and ready? Yes. How rough and ready I will leave it to Armenian scholars to judge. I’m seeking with this article to stimulate experiment, not to set myself up as a reader of Armenian.
Tip: You may need to enlarge the text on your screen to see some of the embedded images in this article
I will publish a second Substack article next week about my experiments with English High Court of Admiralty (HCA) legal depositions, 1573 - 1689, which I know like the back of my hand, having worked with them since 2012. I have 30,000 untranscribed images just waiting for machine transcription and a ground truth of over 6 million words sitting on the MarineLives website, in Transkribus, and in multiple Cloud accounts.
I will let you know in that article how my large scale batch processing of HCA depositions using Gemini 3 Pro stacks up against processing the same images in Transkribus and in Leo, in terms of quality, reliability, reproducibility, scalability, cost, and flexibility to integrate into broader work processes.
Leo, like Gemini 3 Pro, is based on Large Language Model technology, whereas Transkribus uses a mix of technologies - Convolutional Neural Networks, Recurrent Neural Networks, and Transformers, the latter used for “Super Models”. Transformers use self-attention mechanisms to model long-range dependencies across an entire sequence, and can lead to improved accuracy, especially with varied scripts and languages.
I have deep affection for Transkribus, as an early adopter in 2018 and a member of the READ-COOP, the cooperative which manages product development and delivery. But the world moves on, and we are now in a multi-polar world of transcription technologies in which Transkribus is no longer the sole credible player.
The challenge
Why on earth did I choose Armenian to test Gemini 3 Pro’s non-English capabilities? For no better reason than I have been married to an Armenian for forty years and would love to be able to read Armenian manuscripts. Note “would love to”. This means that I am offering my analysis with one HUGE caution. My checking of accuracy has been done by testing the outputs conceptually, by using a second LLM (ChatGPT 5.1) to audit the work, and by getting a second instance of Gemini 3 Pro (which is unaware of the output of the first instance) to forensically dissect and critique the first Gemini 3 Pro’s work. And believe me, they are critical and insightful, and had useful process improvement suggestions, which I implemented. I look forward to a critique of my approach.
I am not arguing that Gemini 3 Pro is producing near 100% accuracy in terms of character by character, word by word, line by line, or region by region accuracy. And I am not presenting quantified results against benchmark data, which I will leave to others.3
I am arguing qualitatively, as an experienced user of machine transcription and as an expert English language palaeographer, that Gemini 3 Pro is producing output which means it needs to be taken very seriously as users of all sorts consider their options for transcription. My central assertion is that Gemini 3 Pro appears damned good prima facie at transcribing, transliterating and translating Armenian (and probably Russian) and is likely to have analogous utility for other medium and low resource languages.
The key, I believe, is Gemini 3 Pro’s deep visual grounding at pixel, line and region level, rather than extensive exposure to medium and low resource languages in its training. This, I hypothesise, enables it to extract more value out of less training data. But that is a hypothesis, no more. I am absolutely happy to be contradicted and encourage my readers to do so and to share their own data for medium and low resource languages, together with what workarounds they have tried. For me this was an unexpected and surprising result. Moreover I find Gemini 3 Pro to be a highly flexible, customisable tool, with which historical researchers can engage in a very granular way.
I am hoping this article will inspire historians and hobbyists (I am a bit of both) to test Gemini 3 Pro on classical Bengali, Sanskrit and Yiddish, to name just three languages of importance to scholarship, which are inaccessible to the vast majority of scholars, let alone to general readers.
Contrasting images
Take a look at three images below. I have produced excellent results with all three in terms of transcription, transliteration and translation. I have tested a good number of additional Modern and C17th English language images, including documents with significant Latin, and personal letters from the mid-C20th in the handwriting of my parents, but these three Armenian language examples are useful to demonstrate a variety of document types and layouts with which Gemini 3 can work. They are all in good condition, but I have had good results with faded ink and blotted text. Reportedly (and hopefully), Gemini 3 is also good with bad microfilmed images, but I have yet to test this.
Image 1: The first is a C17th Armenian language manuscript held in the Armenian Rarities Collection, African and Middle Eastern Division at the Library of Congress, with script in black and red. It discusses the number of Sibyls (prophetesses), and is based on prior Latin language sources.
Image 2: The second is a celebratory letter written by a large number of former students of educator and writer Smbat Shahazizean. There is a second page, which I have not reproduced, which contains further text and signatures. There is a large ornamental first letter and text of varying width and colour. It dates from 1897.
Image 3: The third is a birth certificate of Arusiak Adamyants, born on April 24, 1904 in Shushi, issued in the Municipality of Shushi (then part of the Russian Empire) in 1907. It is a bilingual text with printed pre-reform Russian Cyrillic and Armenian print, and Russian and Armenian handwriting. It is arranged in two columns with an archival accession stamp. Columns are usually a hassle to deal with in machine transcription, though Transkribus has made great progress with columns and tables in terms of layout analysis.4
I have sourced the three images online and wish to thank Muhannad Salhi and Julia Hintlian for the first image, and Ruben Malayan for the second two. All three are publicly available. The birth certificate is that of Ruben Malayan’s paternal grandmother. 5 6 If you follow the references and links to the three images online, you will see some limited contextualisation, but importantly no English language translation for Gemini 3 to access.
How do I know all this? Because Gemini 3 Pro has transcribed, transliterated and translated the documents for me. It has summarised them, contextualised them, and has suggested research questions and approaches to dig deeper into the documents. It has checked up on identified Named Entities (people, places, events) against the historical record, where it exists, has told me if the data are dodgy, and has provided verifiable citations to support its contextualisation.
We have had a conversation, in which I was the blind non-Armenian speaking researcher asking probing questions. I specified the research approach, specified and co-designed the outputs, which we structured in JSON, and co-designed the testing protocols to be used by ChatGPT 5.1 and a second instance of Gemini 3 Pro to audit the output and approach taken by Gemini, and to check whether they agree on the contextualisation.
Here is the second instance of Gemini 3 Pro commenting on the context of the Celebratory letter. Given that it is based on just one page of a two page letter it may not be entirely accurate, but is a start.
Let’s dig in
So let’s dig in. What did I do? What did Gemini 3 do? How did we work together? How did I check the work?
I’m going to take you through parts of two worked examples, which use my second and third images (my Experiment 22 and Experiment 28). Of course I learned from my experiments, and I am showing you my improved work flow and improved outputs. I had plenty of failures along the way as I learned to machine transcribe with an LLM, which is different from Transkribus (with which I have a large amount of experience).
Google AI Studio
I chose to work with Google AI Studio, which is Google’s platform for developers to access the Gemini API directly, and which offers greater control than the consumer chatbot we all know. But if I can use AI Studio, I assure you that you too can. It is intuitive and easy to use. You can use a free version, or a paid API key. I went with the paid API, linked to my Google Cloud account, to give me greater project control.
System Instructions and User Prompt(s)
Below is an image of the Google AI Studio console for Experiment 28. The model I am using is Gemini 3 Pro (top right-hand corner of the image). The System Instructions have been entered below the model choice. I am using a set of instructions I have named “Expert Russian and Armenian Digital Palaeographer V1.2”. The first User Prompt has been entered in the central chat box and I have pasted into the same box an image of the Russian/Armenian language document which I simply copied from the web.
The System Instructions specify an expert persona and define output formatting rules. In this version I request a four-part JSON, which is to comprise a précis, a transcription, a transliteration and a translation. In a later version I added a section five to the JSON, which was a short contextualisation of the document, and a section six, which logged both the System Instructions used and all User Prompts.
The instructions specify a full diplomatic transcription, and instruct Gemini to strongly visually ground its work in visual regions, lines and pixels. The latter is key, since it reinforces Gemini 3’s very strong visual grounding skills, and it is this, together with its strong multi-modal design, which renders Gemini 3 so strong at machine transcription tasks.
Here are the System Instructions:
**Expert Russian and Armenian digital palaeographer V.1.2:** You are an expert digital palaeographer with expert knowledge of the Armenian and Russian languages and writing. You understand how to reproduce Armenian and Russian characters, how to transliterate them into Roman characters, and how to translate them into English. Your approach to palaeography is to produce full diplomatic transcription strongly grounded in visual regions, lines, and pixels.
**Output Formatting Rules:** Create a four-part JSON containing: Part One (Precis), Part Two (Transcription), Part Three (Transliteration), and Part Four (Translation).
**Crucial Transcription Schema:** `part_two_transcription`, `part_three_transliteration`, and `part_four_translation` must be a JSON object (not a single string) that separates the text into logical regions (e.g., `"header"`, `"salutation"`, `"main_body"`, `"margin_notes"`, `"signature"`).
The content of each region must be formatted in one of two ways, depending on the visual layout:
1. **For single-column text:** An **array of strings**, where each string represents exactly one visual line of text.
2. **For multi-column text:** A **nested JSON object containing labeled arrays** (e.g., `{"russian_column": [...], "armenian_column": [...]}`), where each array corresponds to a specific vertical column in the document.
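The two permitted region layouts can be checked mechanically once the model's reply has been parsed. What follows is a minimal sketch in Python, not part of the experiments described here: the function names (`is_line_array`, `validate_region`, `validate_part`) and the sample content are hypothetical, and it assumes the reply is valid JSON that follows the schema above.

```python
import json

def is_line_array(value):
    """True if value is a list in which every item is a string (one visual line)."""
    return isinstance(value, list) and all(isinstance(line, str) for line in value)

def validate_region(value):
    """Return 'single' or 'multi', or raise ValueError if the region fits neither layout."""
    if is_line_array(value):
        return "single"
    if isinstance(value, dict) and value and all(is_line_array(col) for col in value.values()):
        return "multi"
    raise ValueError("region is neither an array of lines nor an object of labelled arrays")

def validate_part(part):
    """Check one of parts two to four; return a mapping of region name to layout."""
    if not isinstance(part, dict):
        raise ValueError("part must be a JSON object of regions, not a single string")
    return {name: validate_region(region) for name, region in part.items()}

# Placeholder content for illustration only, not output from the experiments.
sample = json.loads("""
{
  "part_two_transcription": {
    "header": ["line 1", "line 2"],
    "main_body": {"russian_column": ["line 1"], "armenian_column": ["line 1"]}
  }
}
""")

print(validate_part(sample["part_two_transcription"]))
# prints {'header': 'single', 'main_body': 'multi'}
```

A check of this kind only confirms the shape of the output. It says nothing about whether the transcription itself is accurate.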
The User Prompt is probably overkill, but describes a sequential process for Gemini to follow. Importantly it alerts Gemini to the two column layout in the image, though the formatting rubric in the System Instructions alerts Gemini to the possibility of single or multi-column images, and Gemini should auto-detect the layout. I’m still experimenting with this, but the layout recognition of Gemini 3 is very strong.
“Copy this uploaded image which contains Armenian handwriting on a printed Russian official form. Do this very precisely, paying great attention to the different text regions. Transcribe the text accurately, carefully grounded in the pixels in each line within the regions. Layout your transcription according to the layout of the image. After completing the transcription in the Armenian and Russian languages, proceed to create a transliteration using Roman characters, again laying the transliteration out according to the layout of the image, respecting the columns and regions in the image. Finally translate the transliteration into English and layout according to the layout of the image, respecting the original lineation.
This document contains a distinct two-column layout (Russian on the left, Armenian on the right). In Parts Two, Three, and Four, please structure the JSON to explicitly separate these columns within each region using the keys `russian_column` and `armenian_column`.”
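Because the prompt asks all three parts to respect the regions, columns and original lineation, one simple sanity check is to compare line counts across the transcription, transliteration and translation. The following Python sketch illustrates the idea under stated assumptions: the function names `line_counts` and `lineation_mismatches` are hypothetical, the sample values are placeholders rather than real output, and the input is assumed to follow the region and column schema requested above.

```python
def line_counts(part):
    """Map (region, column) to a line count; single-column regions use column None."""
    counts = {}
    for region, content in part.items():
        if isinstance(content, dict):
            for column, lines in content.items():
                counts[(region, column)] = len(lines)
        else:
            counts[(region, None)] = len(content)
    return counts

def lineation_mismatches(transcription, transliteration, translation):
    """List the (region, column) keys whose line counts differ between the three parts."""
    all_counts = [line_counts(p) for p in (transcription, transliteration, translation)]
    keys = set().union(*all_counts)
    return sorted(
        (key for key in keys if len({c.get(key) for c in all_counts}) > 1),
        key=lambda k: (k[0], k[1] or ""),
    )

# Placeholder lines for illustration only.
transcription = {"main_body": {"russian_column": ["r1", "r2"], "armenian_column": ["a1", "a2"]}}
transliteration = {"main_body": {"russian_column": ["r1", "r2"], "armenian_column": ["a1", "a2"]}}
translation = {"main_body": {"russian_column": ["r1 r2"], "armenian_column": ["a1", "a2"]}}

print(lineation_mismatches(transcription, transliteration, translation))
# prints [('main_body', 'russian_column')]
```

A mismatch is a prompt for a closer look rather than proof of an error, since a translation may legitimately merge or split lines where English word order demands it.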
JSON Output
So what does the output look like?
Here are the transcription, transliteration and translation JSON outputs for Experiment 22, the celebratory letter. The letter was written, according to Gemini, and according to Ruben Malayan, in ‘Sła’gir (Շխագիր) cursive script, which is rarely seen in manuscripts and was used mainly for notes and personal letters. I would expect that the quality of the output would be improved if I fed the second page of the two page letter into Gemini and asked Gemini to cross check between the two pages.
And here is part four of the more complex JSON output from Experiment 28, which incorporates the two column structure and multi-lingual JSON tagging required by the more complex underlying document.
IIIF
Post-processing of the JSON output is possible. For example, you can ask Gemini 3 to convert the JSON output into TEI/XML according to any schema you specify.
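The same kind of conversion can also be done locally. As a rough illustration, the Python sketch below turns one part of the JSON into a bare TEI body using only the standard library. It is an assumption-laden example rather than the conversion Gemini produces: the function names `add_lines` and `part_to_tei` are hypothetical, the choice of `<div>`, `<ab>` and `<lb/>` elements is one of several reasonable encodings, and it omits the `teiHeader` that a valid TEI document requires.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

def add_lines(parent, lines):
    """Append an <ab> block in which each visual line follows a numbered <lb/>."""
    block = ET.SubElement(parent, "ab")
    for number, line in enumerate(lines, start=1):
        lb = ET.SubElement(block, "lb", n=str(number))
        lb.tail = line  # the text of the line sits after its line-break marker

def part_to_tei(part):
    """Convert one part (regions of lines or labelled columns) into a bare TEI tree."""
    tei = ET.Element("TEI", xmlns=TEI_NS)
    body = ET.SubElement(ET.SubElement(tei, "text"), "body")
    for region, content in part.items():
        div = ET.SubElement(body, "div", type=region)
        if isinstance(content, dict):
            # Multi-column region: one nested <div> per labelled column.
            for column, lines in content.items():
                add_lines(ET.SubElement(div, "div", type="column", n=column), lines)
        else:
            add_lines(div, content)
    return tei

# Placeholder content for illustration only.
part = {
    "header": ["line 1", "line 2"],
    "main_body": {"russian_column": ["line 1"], "armenian_column": ["line 1"]},
}
print(ET.tostring(part_to_tei(part), encoding="unicode"))
```

Keeping one `<lb/>` per visual line preserves the lineation of the diplomatic transcription, which is the property the JSON schema was designed to protect.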
You can also, if you feel ambitious, ask Gemini to create a IIIF package including image and text to publish on a IIIF server or as a static demonstration. I can confirm that this works and that Gemini writes viable code. Using Nano Banana, the snazzily branded image generator which is actually under the hood called Gemini 3 Pro Image Preview, I asked Gemini to create detailed instructions and also a simple infographic showing how to mount the IIIF package as a demo.
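For readers unfamiliar with IIIF, the core of such a package is a manifest, a JSON file that ties an image to a canvas and can carry the text as an annotation. The Python sketch below builds a minimal manifest following the general shape of the IIIF Presentation API 3.0. It is an illustration only and not the code Gemini generated: the function name `build_manifest`, the `example.org` URLs, the image dimensions and the sample text are all hypothetical placeholders.

```python
import json

def build_manifest(base_url, image_url, width, height, label, lines):
    """Build a one-canvas manifest with the image painted on it and the text attached."""
    canvas_id = f"{base_url}/canvas/1"
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": f"{base_url}/manifest.json",
        "type": "Manifest",
        "label": {"en": [label]},
        "items": [{
            "id": canvas_id,
            "type": "Canvas",
            "width": width,
            "height": height,
            # The page image itself, attached with the "painting" motivation.
            "items": [{
                "id": f"{canvas_id}/page/1",
                "type": "AnnotationPage",
                "items": [{
                    "id": f"{canvas_id}/annotation/image",
                    "type": "Annotation",
                    "motivation": "painting",
                    "body": {"id": image_url, "type": "Image", "format": "image/jpeg",
                             "width": width, "height": height},
                    "target": canvas_id,
                }],
            }],
            # The text, attached to the whole canvas as a "supplementing" annotation.
            "annotations": [{
                "id": f"{canvas_id}/annotations/1",
                "type": "AnnotationPage",
                "items": [{
                    "id": f"{canvas_id}/annotation/text",
                    "type": "Annotation",
                    "motivation": "supplementing",
                    "body": {"type": "TextualBody", "value": "\n".join(lines),
                             "format": "text/plain", "language": "en"},
                    "target": canvas_id,
                }],
            }],
        }],
    }

# Hypothetical URLs, dimensions and text, for illustration only.
manifest = build_manifest(
    base_url="https://example.org/iiif/demo",
    image_url="https://example.org/iiif/demo/page1.jpg",
    width=2000,
    height=3000,
    label="Demonstration document",
    lines=["translated line 1", "translated line 2"],
)
print(json.dumps(manifest, indent=2)[:200])
```

A fuller package would target individual regions of the canvas rather than the whole page, which is where the region and column structure of the JSON output becomes useful.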
In conclusion
The world has got more exciting since November 18th. Reach out to the British Library website, the Library of Congress, the Bibliothèque Nationale de France. Go back into your local archives and photograph some hard to read documents that fascinate you. Clip some manuscript images. The more exotic the better, whether it be layout, language or some other feature.
Test Gemini 3 Pro. Try your images in Transkribus and Leo. Try doing the transcription and translation by hand. Share your findings. And consider joining our AI + History Collaboratory. The first meeting of our new series is on Tuesday, December 9th 2025 @ 4 pm UK time; 5 pm Paris, Berlin, Madrid; 11 am EST by ZOOM. I will be demonstrating the design and development of a Socratic Research Agent for historical research and we will be making some changes to the Research Agent on the fly, in response to Collaboratory feedback. Finally, come along to our January meeting (January 20th 2026) at which we will be discussing HTR, NER and the impact of Gemini 3 Pro and other LLMs on the world of machine transcription, named entity extraction and annotation.
Humphries, Mark, ‘Gemini 3 Solves Handwriting Recognition and it’s a Bitter Lesson: Testing shows that Gemini 3 has effectively solved handwriting on English texts, one of the oldest problems in AI, achieving expert human levels of performance.’, Generative History, November 24th 2025, accessed 04/12/2025. Click here.
Humphries, Mark, ‘The Sugar Loaf Test: How an 18th-Century Ledger Reveals Gemini 3.0’s Emergent Reasoning: A deep dive into my experience testing the new Gemini 3.0 Pro and the growing evidence I’ve seen for emergent neuro-symbolic reasoning.’, Generative History, November 18th 2025, accessed 04/12/2025. Click here.
Crosilla G, Klic L, Colavizza G (2025), “Benchmarking large language models for handwritten text recognition”. Journal of Documentation, Vol. 81 No. 7 pp. 334–354, doi: https://doi.org/10.1108/JD-03-2025-0082, accessed 04/12/2025. Click here
Muhannad Salhi, ‘New Research on Armenian Manuscripts Held at the Library of Congress’, Blog post, October 10, 2024, Library of Congress. Image accompanied by text ‘A page on sibyls from a 17th-century Armenian manuscript from the Armenian Rarities Collection, African and Middle Eastern Division’. Click [here]. Salhi reports that the manuscript from which this image is taken was discussed by Julia Hintlian of Harvard University in a paper titled “Searching for Molino: Sibyls and Amazons in a 17th-Century Armenian Manuscript.” This manuscript can be viewed on a IIIF server at the Library of Congress as part of a 402 image gallery of mixed manuscript material ‘[Vasn erkutsʻ ... ]’. Click [here].
‘Letter from students to Smbat Shahazizean, GAT (Fond Lazarean) 1897 Moscow. Illustrated by Vardges Sureniants’, with additional text’, discussed in ‘Historical Background’ webpage on website titled ‘The Art of Armenian Calligraphy by Ruben Malayan’, accessed 03/12/2025. Click [here]; ‘Birth certificate Author’s Paternal Grandmother, Arusiak Adamyants born on April 24, 1904 in Shushi. The text is bilingual with Russian on the left. This is an good example of developed writing culture present in Artsakh at the turn of the 20th century as it is executed by an anonymous clerk in the Municipality of Shushi (then part of the Russian Empire). Image: personal archive of Ruben Malayan’, discussed in ‘Historical Background’ webpage on website titled ‘The Art of Armenian Calligraphy by Ruben Malayan’, accessed 03/12/2025. Click [here];









