
NEW YORK, May 15, 2003: Sanskrit is among the world’s oldest recorded languages, but putting works created over the last 3,000 years onto the Web has not been easy. Documents written in Devanagari, a compound word that translates literally as “city of immortals” and the name of the script used for Sanskrit and other South Asian languages, can be scanned as images. However, optical character recognition (O.C.R.) software, which turns Devanagari texts into digital information that can be searched and reformatted, has not been commercially available. In an effort to accelerate the development of O.C.R. software for Devanagari, the Center of Excellence in Document Analysis and Recognition, or Cedar, at the State University of New York at Buffalo, and the Indian Statistical Institute are distributing a script-recognition tool that they hope will become the international standard for software that recognizes Devanagari. Their software, which can be downloaded free at www.cedar.buffalo.edu/ILT, separates the lines and individual characters of the flowing script. It then offers an on-screen transliteration in Roman characters for proofreading.