Under Jewish law, religious texts cannot simply be thrown away once they're worn out. While many texts were buried, many synagogues also operated genizahs, or storerooms, to store disused holy texts. The Cairo Genizah is one of the most valuable sources of primary documents for medieval historians and religious scholars. The 350,000 fragments found in the Genizah include religious texts, as well as social and commercial documents, dating from the 9th to the 19th century. The collection is scattered among 70 institutions worldwide, and scholars are hampered both by its wide dispersal and by the fragmentary condition of the documents.
Researchers at Tel Aviv University are working to piece this collection together, bringing the pages of the texts back together for the first time in centuries. The results are being made available to scholars around the world through a website. Professors Lior Wolf and Nachum Dershowitz of TAU's Blavatnik School of Computer Science have developed sophisticated software, based on facial recognition technology, that can identify digitized Genizah fragments thought to be part of the same work and propose editorial "joins."
Whereas scholars concentrate primarily on content, the software looks at features of the writing itself, since it cannot read what is written. Using computer vision and image processing tools developed at TAU, the software analyzes fragments based on parameters such as the handwriting, the physical properties of the page, and the spacing between lines of writing. The program scans digitized fragments for "matches," and joins them together in a kind of digital loose-leaf binder. "Its big advantage is that it doesn't tire after examining thousands of fragments," Dershowitz says. A scholar must then review and verify the computer-proposed "joins."
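To give a rough sense of what scanning for "matches" can look like, here is a minimal sketch in Python. It is not the TAU software. It assumes each fragment has already been reduced to a short list of numbers describing its handwriting, page and line spacing, and it uses cosine similarity with an arbitrary cutoff to shortlist pairs for a scholar to check. The names Fragment, cosine_similarity and propose_joins, the 0.95 threshold, and the sample fragments are all invented for illustration.

```python
from dataclasses import dataclass
from itertools import combinations
from math import sqrt


@dataclass(frozen=True)
class Fragment:
    """A digitized fragment (hypothetical representation)."""
    label: str                   # made-up identifier, not a real shelfmark
    features: tuple[float, ...]  # assumed numeric summary of script and layout


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def propose_joins(fragments, threshold=0.95):
    """Compare every pair and return likely joins, best score first.

    The threshold is an assumption; the output is only a shortlist
    that a scholar would still need to review and verify.
    """
    candidates = []
    for left, right in combinations(fragments, 2):
        score = cosine_similarity(left.features, right.features)
        if score >= threshold:
            candidates.append((score, left.label, right.label))
    return sorted(candidates, reverse=True)


# Invented sample data: A and B are similar, C is not.
fragments = [
    Fragment("A", (0.90, 0.10, 0.40)),
    Fragment("B", (0.88, 0.12, 0.41)),
    Fragment("C", (0.10, 0.90, 0.20)),
]

for score, left, right in propose_joins(fragments):
    print(f"possible join: {left} + {right} (similarity {score:.3f})")
```

Comparing every pair this way grows quickly with the number of fragments, which is one reason a tireless program suits the task better than a human reader.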
So far, Wolf says, the researchers have had a great deal of success. Within a few months, they made some 1,000 confirmed "joins," almost as many as were made in 100 years of Cairo Genizah scholarship. One exciting find, he notes, was the identification of pages from a work by Saadia Gaon, a prominent rabbi and philosopher from the 10th century. "All extant specimens of his work were thought to have been already discovered," he explains.