For our HackOH5 Hackathon there is a sample dataset posted on the event’s website at . According to the metatags in the *.xml version of the files in the sample dataset, the file was parsed with into the original …/TheFiveCollegesOfOhio_2012-Paper. The obvious interpretation is that these scans were created using the ABBYY OCR engine FineReader version 9, released . ABBYY recently released FineReader version 14 on . Since OCR’ing old newspapers from microfiche is notoriously error-prone, one could expect significant improvements over the last 5 version updates released over the past 9 years. Fortunately, ABBYY offers a limited 30-day trial download of their newest OCR software (only the Windows version has the newest FineReader OCR engine). As poor text quality is perhaps the biggest limitation in our project, addressing this issue is a high priority. The question is, how much of an improvement would there be?

First, let’s look at the issue of The Wooster Voice, which is printed in a book-like, dual-column format of largely long-form editorial content. Using this, let’s quantify how much improvement we see using several representative data points across the sample dataset.
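The post doesn’t name the measure it uses to quantify the improvement. One common choice for OCR is the character error rate (CER): the edit distance between the OCR output and a correct transcription, divided by the length of that transcription. Below is a minimal sketch that assumes you have hand-corrected a short reference passage, for example from one column of the Wooster Voice issue. The function names `levenshtein`, `char_error_rate` and `improvement` are hypothetical helpers, not part of any ABBYY tool.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the first i-1 chars of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if chars match)
            ))
        prev = curr
    return prev[-1]


def char_error_rate(ocr_text: str, reference: str) -> float:
    """Edit distance normalised by the reference length (lower is better)."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ocr_text, reference) / len(reference)


def improvement(old_ocr: str, new_ocr: str, reference: str) -> float:
    """Relative reduction in CER going from the old OCR output to the new one.
    Returns 0.0 if the old output was already perfect."""
    old = char_error_rate(old_ocr, reference)
    new = char_error_rate(new_ocr, reference)
    return (old - new) / old if old else 0.0
```

For each sample page, you would call `improvement(old_text, new_text, reference_text)`. Here `old_text` is the text from the dataset’s existing FineReader 9 output, `new_text` is the trial FineReader 14 output, and `reference_text` is your hand-checked transcription. This is the plain dynamic-programming edit distance, which is fine for passages of a few thousand characters. For whole pages, a dedicated OCR-evaluation tool would be faster.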