Prime Recognition OCR Software

High Accuracy OCR Saving Operating Costs

Overview

This document shows how the Prime Recognition High Accuracy OCR engine can reduce the ongoing labor costs of OCR by 65% at a lower initial investment than conventional OCR.

OCR costs vary widely with image quality, labor rates, the number of shifts, and other factors. This analysis uses simple, conservative assumptions so that it is easy to follow. For most applications it will understate the savings of using Prime Recognition’s OCR engine.

For example, this analysis uses a base configuration that generates 65% fewer errors.  A "high end" version is available that can reduce errors by 82%.

Importance of OCR Cost Reduction

An image of a document, i.e., a piece of paper converted into pixels in computer memory, is worthless unless you also electronically capture information about the image’s content. This will allow later electronic retrieval. 

Ideally you would at least capture all the text that appears on the document. There is tremendous value in electronically capturing the text information of an image. This is evidenced by the fast growth of imaging systems in recent years for automated processing of insurance forms, medical claims, legal documents, and other types of data on paper.

Manual data entry is an accurate way to capture this data but very expensive because of the cost of labor. OCR is popular because it is usually significantly less expensive than manual data entry. However, OCR is less accurate than "triple key" data entry, even after OCR error correction.

Projects that do capture the full text of each page using OCR find that OCR error correction is typically 50-60% of the full imaging system’s cost! Because OCR is so expensive (and manual data entry is typically more expensive still), the majority of imaging systems do not capture the full text of the page. Instead, OCR or manual data entry is used to capture one or several "indexes" for the page - key words or phrases that users hope will allow them to retrieve the page in the future as needed. This is obviously not what users would prefer, but unless their data is of very high value, e.g., medical claims processing, they cannot afford to perform full OCR today.

The calculations below show how Prime Recognition lowers the cost of OCR, both in the initial investment and in the ongoing costs, while at the same time increasing the accuracy of the data going into the user’s application. Many more users will now be able to capture data from paper documents cost-effectively using OCR.

Conventional OCR

Assumptions

Assumption: Average OCR accuracy rate is 98%.
Note: 40 characters out of 2000 on a typical full text page will be wrong. This is a typical average error rate on "real world" documents in real production sites. Note that error rates are highly dependent on image quality.

Assumption: OCR throughput is 420 characters/sec.
Note: Assumes a 700 MHz Pentium PC.

Assumption: 60% of OCR errors are marked as "suspicious" characters.
Note: "Suspicious" characters are reviewed by data entry clerks to find and correct OCR errors. Errors that are not marked as suspicious - 40% of all errors - do not get reviewed, and are included in the final output. Users must use logical checks in their mainframe, database, or other target application to find and reject the data that includes errors (if possible).

Assumption: Number of correct characters marked as suspicious is 2.5 times the total number of OCR errors.
Note: The total number of suspicious characters marked is highly variable between OCR engines and is configurable. This number is an average across the top 5 OCR engines and represents the setting that finds the most errors but also marks the most correct characters.

Assumption: Data entry clerk time is 0.5 seconds per suspicious character that is a correct character, and 1.5 seconds per suspicious character that is an error.
Note: Most analysts quote a simpler number of 5 seconds per error. This number includes suspicious character processing and error correction. Prime Recognition’s number works out to 2.75 seconds per error.

Calculations

1 second of conventional OCR generates:

  • 420 characters
  • 8.4 errors (420 char/sec * 2% error rate, i.e., 98% accuracy rate)
  • 5.0 of the errors are marked as suspicious (8.4 errors * 60% marked)
  • 21.0 suspicious characters which are correct (8.4 errors * 2.5)
  • 7.50 seconds of error correction time (5.0 errors * 1.5 sec per error)
  • 10.5 seconds of checking suspicious characters (21.0 char * 0.5 sec per char)

Or in other words:

1 second of conventional OCR processing generates 18 seconds of editing time (7.50+10.5) and 3.4 errors (8.4-5.0) that get past manual error correction.
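
The arithmetic above can be checked with a short script. The following Python sketch is illustrative only: the constant and function names are assumptions introduced here, and the inputs are simply the assumptions listed earlier. Because it does not round intermediate values, it arrives at 18.06 seconds of editing time and 3.36 residual errors, which round to the 18 seconds and 3.4 errors quoted above.

```python
# Inputs copied from the assumptions above; all names are illustrative.
CHARS_PER_SECOND = 420            # conventional OCR throughput
ERROR_RATE = 0.02                 # 98% accuracy
FLAGGED_SHARE = 0.60              # fraction of errors marked suspicious
FALSE_FLAGS_PER_ERROR = 2.5       # correct characters flagged, per OCR error
SECONDS_PER_FLAGGED_ERROR = 1.5   # clerk time to fix a flagged error
SECONDS_PER_FALSE_FLAG = 0.5      # clerk time to dismiss a correct character


def conventional_workload(ocr_seconds=1.0):
    """Editing work generated by `ocr_seconds` of conventional OCR."""
    errors = CHARS_PER_SECOND * ocr_seconds * ERROR_RATE
    flagged_errors = errors * FLAGGED_SHARE
    false_flags = errors * FALSE_FLAGS_PER_ERROR
    editing_seconds = (flagged_errors * SECONDS_PER_FLAGGED_ERROR
                       + false_flags * SECONDS_PER_FALSE_FLAG)
    return {
        "errors": errors,
        "flagged_errors": flagged_errors,
        "false_flags": false_flags,
        "editing_seconds": editing_seconds,
        "residual_errors": errors - flagged_errors,
    }


if __name__ == "__main__":
    for name, value in conventional_workload().items():
        print(f"{name}: {value:.2f}")
```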

Prime Recognition High Accuracy OCR Engine

Key Benefits

  • Lowers OCR errors by 65%
  • Lowers OCR "suspicious" characters marked for clerical labor review by 65%
  • Data has 65% fewer OCR errors AFTER manual error correction.

Key Cost

  • 3.3 times slower than conventional OCR

Prime Recognition High Accuracy OCR Calculations

Prime Recognition will take 3.3 seconds to produce 420 characters, and it will generate the following:

  • 3.0 errors (65% fewer errors)
  • 1.8 of the errors are marked as suspicious (60% of errors, the same ratio as conventional OCR but on a 65% smaller base)
  • 7.4 suspicious characters which are correct (65% fewer suspicious characters)
  • 2.70 seconds of error correction time (1.8 errors * 1.5 sec per error)
  • 3.70 seconds of checking suspicious characters (7.4 char * 0.5 sec per char)

Or in other words:

For the same throughput (420 characters) the Prime Recognition High Accuracy OCR engine generates 6.4 seconds of editing time (2.70+3.70) and 1.2 errors (3.0-1.8) that get past manual error correction.
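
As a rough check, the figures for the high accuracy engine can be reproduced by scaling the conventional OCR numbers (8.4 errors and 21.0 correct characters flagged per 420 characters) down by 65%. The Python sketch below is illustrative, and its function and parameter names are hypothetical. It assumes, as the text does, that 60% of errors are still flagged and that clerk review times are unchanged. Without intermediate rounding it gives about 6.3 seconds of editing time and about 1.2 residual errors; the 6.4 seconds quoted above results from rounding the intermediate values to 3.0 errors and 7.4 flagged characters.

```python
def high_accuracy_workload(base_errors=8.4, base_false_flags=21.0,
                           error_reduction=0.65):
    """Editing work for 420 characters after an `error_reduction` cut.

    Assumes the flagged share (60%) and the per-character review times
    (1.5 s per flagged error, 0.5 s per false flag) stay the same.
    """
    errors = base_errors * (1 - error_reduction)
    flagged_errors = errors * 0.60
    false_flags = base_false_flags * (1 - error_reduction)
    editing_seconds = flagged_errors * 1.5 + false_flags * 0.5
    return {
        "errors": errors,
        "flagged_errors": flagged_errors,
        "false_flags": false_flags,
        "editing_seconds": editing_seconds,
        "residual_errors": errors - flagged_errors,
    }


if __name__ == "__main__":
    for name, value in high_accuracy_workload().items():
        print(f"{name}: {value:.2f}")
```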

Summary

                                                     Conventional OCR    Prime Recognition OCR
OCR time                                             1.0 second          3.3 seconds
Error correction time                                18.0 seconds        6.4 seconds
Total processing time                                19.0 seconds        9.7 seconds
Errors left in data after manual error correction    3.4 errors          1.2 errors

Financial Calculations

Assumptions

Assumption: Cost of PC is $5,000.
Note: This is simplistic because an OCR PC typically sits in a closet with no monitor, or with an inexpensive monitor and graphics adapter. An editing workstation, on the other hand, requires a large screen and sophisticated graphics adapter, plus chairs/desks/cubicles for the data entry clerk.

Assumption: Cost of data entry clerk is $20.00 per hour.
Note: Direct hourly rate is $9.75 per hour and the remainder is the overhead costs of labor, including fringe benefits, sick time, vacation time, personal time, direct supervisory salaries, human resource and accounting overhead, cost of real estate per person, etc.

Assumption: Cost of Prime Recognition software per PC is $14,940.
Note: This is a "loaded" version. Versions of PrimeOCR exist that cost $5,000.

Assumption: Cost of conventional OCR and editing workstation software per PC is $8,000.

Assume a system which requires the throughput of one conventional OCR package running on one PC.

Conventional OCR

Capital Costs
  OCR PC                  $  5,000    1 station * $5,000
  OCR S/W                 $  8,000    1 station * $8,000
  Error Correction PCs    $ 90,000    18 stations * $5,000 (ratio of error correction time to OCR time is 18 seconds to 1 second, therefore 18 stations)
  Error Correction S/W    $144,000    18 stations * $8,000

Ongoing Costs (per year)
  Data Entry Clerks       $691,200    18 clerks * $20/hour * 8 hours/day * 240 work days per year

Prime Recognition High Accuracy OCR

Capital Costs
  OCR PC                  $ 16,500    3.3 stations * $5,000
  OCR S/W                 $ 49,500    3.3 stations * $14,940 (approximate; the exact product is $49,302)
  Error Correction PCs    $ 32,000    6.4 stations * $5,000 (ratio of error correction time to OCR time is 6.4 seconds to 3.3 seconds, therefore 6.4 stations for 3.3 OCR stations)
  Error Correction S/W    $ 51,200    6.4 stations * $8,000

Ongoing Costs (per year)
  Data Entry Clerks       $245,800    6.4 clerks * $20/hour * 8 hours/day * 240 work days per year (rounded)

Summary

                               Conventional OCR    Prime Recognition OCR
Capital Costs                  $247,000            $149,200
Ongoing Costs (per year)       $691,200            $245,800
Cost of errors left in data    Not Quantified      Not Quantified
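
The cost figures in this section follow from a few multiplications, which the Python sketch below reproduces. It is an illustration under the stated assumptions, not a pricing tool; the names are hypothetical, and it assumes one clerk per error correction station, as the tables do. Using exact products it gives $149,002 in capital costs and $245,760 per year in labor for Prime Recognition, slightly below the rounded $149,200 and $245,800 shown above, and a labor saving of about 64%.

```python
PC_COST = 5_000             # per station
EDITING_SW_COST = 8_000     # conventional OCR and editing software, per PC
PRIME_SW_COST = 14_940      # Prime Recognition software, per PC
CLERK_RATE = 20.00          # loaded cost per hour
HOURS_PER_YEAR = 8 * 240    # 8 hours/day * 240 work days


def system_costs(ocr_stations, ocr_sw_cost, editing_stations):
    """Return (capital cost, yearly labor cost) for one configuration."""
    capital = (ocr_stations * (PC_COST + ocr_sw_cost)
               + editing_stations * (PC_COST + EDITING_SW_COST))
    labor = editing_stations * CLERK_RATE * HOURS_PER_YEAR
    return capital, labor


# Station counts come from the processing-time ratios derived earlier.
conventional = system_costs(1, EDITING_SW_COST, 18)
prime = system_costs(3.3, PRIME_SW_COST, 6.4)
labor_saving = 1 - prime[1] / conventional[1]

if __name__ == "__main__":
    print(f"Conventional: capital ${conventional[0]:,.0f}, "
          f"labor ${conventional[1]:,.0f}/year")
    print(f"Prime:        capital ${prime[0]:,.0f}, "
          f"labor ${prime[1]:,.0f}/year")
    print(f"Labor saving: {labor_saving:.1%}")
```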


Conclusions

1. Prime Recognition’s OCR engines create lower capital costs. You must look beyond the cost of the OCR engines and include the costs of manual error correction workstations.

2. Prime Recognition’s OCR engines create dramatically lower labor costs on an ongoing basis. Prime Recognition’s cost advantage will increase over time as the cost of PC power decreases by 25% per year and labor costs increase.

3. Prime Recognition generates 65% fewer errors that get past manual error correction (up to 82% with added options). This saves costs in applications that are sensitive to errors in data. For example, some applications use mainframe, database, or other application logic to reject data with errors. These rejects then must be manually checked to see if the errors were caused by OCR or some other source, and fixed.

4. The analysis above assumes that users want accurate data, and hence they use manual error correction to clean up data. This assumption applies to most applications. Some applications are purportedly not sensitive to errors in the data, e.g., full text searches with the new "fuzzy" search engines, so users are contemplating using OCR but without manual error correction. However, even fuzzy searches assume a significant level of accuracy in the data. If the error rate goes beyond that, perhaps on bad quality pages, those pages may not be found by electronic retrieval. Each user will have to decide how much risk they want to incur, e.g., is it OK if only 95% of the relevant documents show up in a search?

Prime Recognition offers a lower cost product (as low as $1,300) for applications that do not want to manually correct OCR errors. This engine is lower cost because it does not need to generate the information required by error correction software, such as character confidence levels, and suspicious character image "bounding boxes".
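
Conclusion 2 can be illustrated with a rough projection. The Python sketch below compares the extra OCR hardware the slower engine needs (2.3 additional PCs) with the yearly labor it saves (11.6 clerks) for a system bought some years from now. The 25% yearly decline in PC cost comes from the conclusion above; the 3% yearly growth in labor cost is a purely hypothetical placeholder, since this document gives no figure. All names are illustrative, and software costs are left out for simplicity.

```python
PC_COST_TODAY = 5_000
PC_COST_DECLINE = 0.25            # per year, from conclusion 2
CLERK_RATE_TODAY = 20.00          # loaded cost per hour
HOURS_PER_YEAR = 8 * 240
EXTRA_OCR_STATIONS = 3.3 - 1.0    # additional OCR PCs the slower engine needs
CLERKS_SAVED = 18 - 6.4           # fewer error correction clerks


def tradeoff(year, labor_growth=0.03):
    """Extra OCR hardware cost vs. yearly labor saved, `year` years out.

    labor_growth is a HYPOTHETICAL rate; the document gives no figure.
    """
    pc_cost = PC_COST_TODAY * (1 - PC_COST_DECLINE) ** year
    clerk_rate = CLERK_RATE_TODAY * (1 + labor_growth) ** year
    extra_hardware = EXTRA_OCR_STATIONS * pc_cost
    labor_saved = CLERKS_SAVED * clerk_rate * HOURS_PER_YEAR
    return extra_hardware, labor_saved


if __name__ == "__main__":
    for year in range(4):
        hardware, labor = tradeoff(year)
        print(f"year {year}: extra OCR hardware ${hardware:,.0f}, "
              f"labor saved ${labor:,.0f}/year")
```

Under these assumptions the hardware premium shrinks each year while the labor saving grows, which is the direction the conclusion describes; the actual magnitudes depend on the labor growth rate chosen.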

[Figure: bar graph indicating that PrimeOCR has lower operational costs for character error cleanup.]

    PRIME RECOGNITION
    High Accuracy OCR Engine
    Copyright © 1996-2012
    Prime Recognition
    All rights reserved.

    Contact Us

    Information and Sales:
    sales@primerecognition.com

    Support:
    support@primerecognition.com

    Call Us
    (425) 895-0550
