Advances in HTR technology make handwritten documents even more accessible and discoverable
Scanning historical documents and making them available to scholars in digital format holds great promise for increasing the accessibility of primary-source materials. But researchers have faced key limitations in accessing handwritten letters, manuscripts, and other materials online.
While Optical Character Recognition (OCR) technology allows librarians and archivists to scan and search text that is printed or typed, cursive writing and other handwritten texts have proven to be more of a challenge. Although these materials can be digitally scanned, researchers have had to rely on the accompanying metadata when searching for specific information or, if no typewritten transcript exists, painstakingly read through the documents page by page to find what they’re looking for.
New advancements in technology are changing that. For instance, Handwritten Text Recognition (HTR) technology is getting better at identifying handwritten characters with accuracy—and Quartex, a cloud-based digital collections solution from AM, has integrated HTR into its platform. This ground-breaking development makes digitised letters, manuscripts, and other handwritten materials fully keyword searchable, saving an enormous amount of time for researchers.
Quartex has extended this functionality with an automated transcription feature as well. The enhancement of HTR with Transcription means that librarians and archivists can now generate a fully editable and searchable transcript of each handwritten asset with a single click. This makes primary sources even more easily discoverable, especially for researchers who rely on transcriptions to support screen reader technology.
It’s really an invaluable resource to be able to search the full text of handwritten materials, rather than spending hours poring over page after page of manuscripts.
Opening new avenues for research
At McGill, Sundberg and her colleagues are using the Quartex platform to digitise and promote the university’s extensive collection of documents related to the fur trade business in and around Montreal. These documents include handwritten letters, invoices, and accounting ledgers from leaders of the fur trade business, including the university’s founder, James McGill.
“Before now, you could look at these documents if you came in and consulted our archives,” Sundberg says. There is a finding aid for each of the folders that make up the collection. However, researchers would still have to search through pages of documents in each folder to find what they’re looking for.
“What we have now is a giant leap forward,” she says. All of the files have been digitised and are available in one place online. All of the documents have been scanned using OCR for typewritten text and HTR for handwritten text.
Although the character recognition rate is not quite 100 percent, “it’s an impressive number,” Sundberg says. “What we have now is a workable search tool. If you search for a person or place name, you’re going to get a very good rate of return on those keyword phrases across the whole collection.”
If someone types a search term into the platform, such as “Northwest Company,” all of the documents that contain that phrase appear in the results. Researchers are no longer limited to the collection’s descriptors or metadata when searching for information.
The technology gives researchers “a more comprehensive look into the documents,” she observes.
This creates new avenues for research that might not have been discovered using traditional techniques. It opens up a lot of doors.
A ‘game changer’ for researchers
Baylor University in Texas is using HTR with Transcription to enhance its Armstrong Browning Collection, which includes extensive correspondence to and from the Victorian poets Elizabeth Barrett and Robert Browning. Darryl Stuhr, director of digitization and digital preservation services at Baylor, calls the technology a “game changer” for researchers.
The Armstrong Browning Collection actually consists of separate collections, including the Browning Letters, which have already been transcribed by a Browning scholar; the Victorian Letters, a collection of more than 3,300 other letters from the Victorian era, some of which have been transcribed by graduate students; and the Browning Manuscripts, which contains handwritten manuscripts of the Brownings’ works.
Baylor is using HTR with Transcription to generate automatic transcripts for the items that need these. “We’re hoping the system can also correct some of the mistakes our graduate students might have made in transcribing,” Stuhr says.
Assembling the Armstrong Browning Collection has been a collaborative process. Baylor initially embarked on the project with Wellesley College, which owned 500 love letters the Brownings wrote to each other. Since then, Baylor has also digitised letters and manuscripts owned by the University of Texas, Texas A&M, the Ohio State University, and other institutions—and it’s hoping to add materials from the Bodleian Library at Oxford and Eton College as well.
“Nobody’s going to give up the letters in their archives,” Stuhr says. “Really, the only way for all these items to come together is in a digital collections management system like Quartex, which gives us the ability to present a more comprehensive collection.”
Twenty years ago, scholars had to travel from library to library to access these materials. This platform makes them much more widely available. Using HTR with Transcription to provide a complete set of transcriptions of every handwritten document across the collection should also make their contents more discoverable, to the benefit of our research community around the world.
A version of this article was originally published in Library Journal.