High Accuracy OCR Saving Operating Costs

Overview

This document shows how the Prime Recognition High Accuracy OCR engine can reduce the ongoing labor costs of OCR by 65% at a lower initial investment than conventional OCR.

OCR costs vary widely with image quality, labor rates, the number of shifts, and so on. This analysis uses simple (and conservative) assumptions to make it easy to follow. For most applications it will understate the savings of using Prime Recognition's OCR engine. For example, the analysis uses a base configuration that generates 65% fewer errors, while a "high end" version is available that can reduce errors by 82%.

Importance of OCR Cost Reduction

An image of a document, i.e., a piece of paper converted into pixels in computer memory, is worthless unless you also electronically capture information about the image's content, which is what allows later electronic retrieval. Ideally you would capture at least all the text that appears on the document.

There is tremendous value in electronically capturing the text information of an image. This is evidenced by the fast growth of imaging systems in recent years for automated processing of insurance forms, medical claims, legal documents, and other types of data on paper.

Manual data entry is an accurate way to capture this data but is very expensive because of the cost of labor. OCR is popular because it is usually significantly less expensive than manual data entry. However, OCR is less accurate than "triple key" data entry, even after OCR error correction.

Projects that do capture the full text of each page using OCR find that OCR error correction is typically 50-60% of the full imaging system's cost! Because OCR is so expensive (and manual data entry is typically more expensive still), the majority of imaging systems do not capture the full text of the page. Instead, OCR or manual data entry is used to capture one or several "indexes" for the page: key words or phrases that users hope will allow them to retrieve the page in the future as needed. This is obviously not what users would prefer, but unless their data is of very high value, e.g., medical claims processing, they cannot afford to perform full OCR today.

The calculations below show how Prime Recognition lowers the cost of OCR, both in the initial investment and in the ongoing costs, while at the same time increasing the accuracy of the data going into the user's application. Many more users will now be able to cost-effectively capture data from paper documents using OCR.

Conventional OCR
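Calculations

1 second of conventional OCR produces 420 characters and generates:
8.4 errors
5.0 of the errors are marked as suspicious (60% of errors)
21 suspicious characters which are correct
7.50 seconds of error correction time (5.0 errors * 1.5 sec per error)
10.5 seconds of checking suspicious characters (21 * 0.5 sec per char)
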
Or in other words: 1 second of conventional OCR processing generates 18 seconds of editing time (7.50 + 10.5) and 3.4 errors (8.4 - 5.0) that get past manual error correction.

Prime Recognition High Accuracy OCR Engine

Key Benefits
Lowers OCR errors by 65%
Lowers OCR "suspicious" characters marked for clerical labor review by 65%
Data has 65% fewer OCR errors AFTER manual error correction

Key Cost
3.3 times slower than conventional OCR

Prime Recognition High Accuracy OCR Calculations

Prime Recognition will take 3.3 seconds to produce 420 characters, and it will generate the following:
3.0 errors (65% fewer errors)
1.8 of the errors are marked as suspicious (60% of errors, the same ratio as conventional OCR but on a 65% smaller base)
7.4 suspicious characters which are correct (65% fewer suspicious characters)
2.70 seconds of error correction time (1.8 errors * 1.5 sec per error)
3.70 seconds of checking suspicious characters (7.4 * 0.5 sec per char)

Or in other words: for the same throughput (420 characters), the Prime Recognition High Accuracy OCR engine generates 6.4 seconds of editing time (2.70 + 3.70) and 1.2 errors (3.0 - 1.8) that get past manual error correction.

Summary
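
As a cross-check, the per-batch arithmetic for both engines can be recomputed in a few lines. This is a minimal sketch, not part of the original analysis. The class and function names are illustrative. The Prime Recognition inputs are quoted directly from the text. The conventional inputs are back-derived from the quoted totals (8.4 errors and 5.0 flagged errors from "8.4 - 5.0"; 21 false alarms from 10.5 seconds at 0.5 seconds each) and should be read as an assumption.

```python
from dataclasses import dataclass

# Per-item correction times quoted in the text.
SEC_PER_FLAGGED_ERROR = 1.5  # fixing an error that was marked suspicious
SEC_PER_FALSE_ALARM = 0.5    # checking a suspicious character that is correct


@dataclass(frozen=True)
class OcrBatch:
    """Figures for one 420-character batch of OCR output."""
    ocr_seconds: float     # OCR processing time for the batch
    errors: float          # total OCR errors in the batch
    flagged_errors: float  # errors that were marked as suspicious
    false_alarms: float    # correct characters that were marked as suspicious

    @property
    def editing_seconds(self) -> float:
        """Manual time spent fixing flagged errors and checking false alarms."""
        return (self.flagged_errors * SEC_PER_FLAGGED_ERROR
                + self.false_alarms * SEC_PER_FALSE_ALARM)

    @property
    def residual_errors(self) -> float:
        """Errors never flagged, so they get past manual correction."""
        return self.errors - self.flagged_errors


def reduction(before: float, after: float) -> float:
    """Fractional reduction going from `before` to `after`."""
    return 1.0 - after / before


# Assumption: conventional inputs are back-derived from the quoted totals.
CONVENTIONAL = OcrBatch(ocr_seconds=1.0, errors=8.4,
                        flagged_errors=5.0, false_alarms=21.0)
# Quoted directly in the text.
PRIME = OcrBatch(ocr_seconds=3.3, errors=3.0,
                 flagged_errors=1.8, false_alarms=7.4)

for name, batch in (("conventional", CONVENTIONAL), ("prime", PRIME)):
    print(f"{name:>12}: {batch.ocr_seconds:.1f} s OCR, "
          f"{batch.editing_seconds:.1f} s editing, "
          f"{batch.residual_errors:.1f} residual errors")

print(f"editing time reduction:   "
      f"{reduction(CONVENTIONAL.editing_seconds, PRIME.editing_seconds):.1%}")
print(f"residual error reduction: "
      f"{reduction(CONVENTIONAL.residual_errors, PRIME.residual_errors):.1%}")
```

With the rounded inputs above, both reductions come out slightly under the quoted 65% (roughly 64% to 65%), which is consistent with the text's figures having been rounded to one decimal place.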
Financial Calculations
Assume a system that requires the throughput of one conventional OCR package running on one PC.
Conventional OCR
Capital Costs
Ongoing Costs (per year)
Prime Recognition High Accuracy OCR
Capital Costs
Ongoing Costs (per year)
Summary
Conclusions

1. Prime Recognition's OCR engines result in lower capital costs. You must look beyond the cost of the OCR engines and include the cost of manual error correction workstations.
2. Prime Recognition's OCR engines result in dramatically lower labor costs on an ongoing basis. Prime Recognition's cost advantage will increase over time as the cost of PC power decreases by 25% per year and labor costs increase.
3. Prime Recognition generates 65% fewer errors that get past manual error correction (up to 82% with added options). This saves costs in applications that are sensitive to errors in data. For example, some applications use mainframe, database, or other application logic to reject data with errors. These rejects must then be manually checked to see whether the errors were caused by OCR or some other source, and fixed.
4. The analysis above assumes that users want accurate data, and hence that they use manual error correction to clean up data. This assumption applies to most applications. Some applications are purportedly not sensitive to errors in the data, e.g., full text searches with the new "fuzzy" search engines, so users are contemplating using OCR without manual error correction. However, even fuzzy searches assume a significant level of accuracy in the data. If the error rate goes beyond that level, perhaps on bad quality pages, those pages may not be found by electronic retrieval. Each user will have to decide how much risk they want to incur, e.g., is it OK if only 95% of the relevant documents show up in a search?

Prime Recognition offers a lower cost product (as low as $1,300) for applications that do not want to manually correct OCR errors. This engine is lower cost because it does not need to generate the information required by error correction software, such as character confidence levels and suspicious character image "bounding boxes".
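
Conclusions 1 and 2 can be explored with a small sizing sketch built on the per-batch workload figures from the calculations (1 second of OCR and 18 seconds of editing for conventional OCR; 3.3 seconds of OCR and 6.4 seconds of editing for Prime Recognition). Everything else here is an assumption made for illustration: that OCR PCs and editing workstations scale linearly with workload, that one workstation corresponds to one operator, and that prices, labor rates, and hours are supplied by the reader. No dollar figures from the original analysis are reproduced, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Seconds of work generated per 420-character batch."""
    ocr_sec: float      # OCR processing time
    editing_sec: float  # manual error correction time


# Per-batch figures taken from the calculations in the text.
CONVENTIONAL = Workload(ocr_sec=1.0, editing_sec=18.0)
PRIME = Workload(ocr_sec=3.3, editing_sec=6.4)


def resources(w: Workload, batches_per_sec: float = 1.0) -> tuple[float, float]:
    """Return (OCR PCs, editing workstations) needed to keep up.

    Assumption: linear scaling, fractional units allowed. The default rate
    of one batch per second matches one conventional OCR package on one PC.
    """
    return w.ocr_sec * batches_per_sec, w.editing_sec * batches_per_sec


def capital_cost(w: Workload, ocr_unit_price: float,
                 edit_station_price: float,
                 batches_per_sec: float = 1.0) -> float:
    """Capital cost; `ocr_unit_price` is a caller-supplied price for one
    OCR PC including its engine license (hypothetical parameter)."""
    pcs, stations = resources(w, batches_per_sec)
    return pcs * ocr_unit_price + stations * edit_station_price


def yearly_labor_cost(w: Workload, hourly_rate: float, hours_per_year: float,
                      batches_per_sec: float = 1.0) -> float:
    """Ongoing labor cost, assuming one operator per editing workstation."""
    _, stations = resources(w, batches_per_sec)
    return stations * hourly_rate * hours_per_year


def labor_saving_fraction(base: Workload, alt: Workload) -> float:
    """Labor saving of `alt` over `base`; independent of the labor rate."""
    return 1.0 - alt.editing_sec / base.editing_sec


print(resources(CONVENTIONAL))  # OCR PCs and editing workstations
print(resources(PRIME))
print(f"{labor_saving_fraction(CONVENTIONAL, PRIME):.1%} labor saving")
```

Under these assumptions the labor saving depends only on the ratio of editing times, which is why it stays close to the 65% quoted in the overview regardless of the labor rate. The capital comparison, by contrast, depends on the prices passed in, since the slower engine needs more OCR hardware but far fewer editing workstations.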
PRIME RECOGNITION