PDF Conversion Details

Overview

Prime Recognition software can convert scanned images into PDF files. Several Prime Recognition products support PDF output: PrimeOCR, an award-winning, high-accuracy "Voting" OCR engine; PrimeZone (image to PDF only); and PrimePost (PRO to PDF).
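
The "Voting" idea can be illustrated with a generic sketch: several OCR engines read the same text, and each output character is chosen by majority. This is not PrimeOCR's algorithm, which this page does not describe. The sketch assumes the engine outputs are already aligned to equal length; a production voting engine would also have to align outputs of different lengths and weigh per-character confidence. The function name `vote_characters` is hypothetical.

```python
from collections import Counter


def vote_characters(candidates: list[str]) -> str:
    """Combine aligned OCR outputs by a per-position majority vote.

    Assumes every candidate string has the same length (already aligned).
    Ties go to the candidate listed first, i.e. the most trusted engine.
    """
    if not candidates:
        return ""
    length = len(candidates[0])
    if any(len(c) != length for c in candidates):
        raise ValueError("candidates must be aligned to equal length")
    result = []
    for position in range(length):
        column = [c[position] for c in candidates]
        counts = Counter(column)
        best = max(counts.values())
        # The first character (in engine order) that reaches the top count wins.
        result.append(next(ch for ch in column if counts[ch] == best))
    return "".join(result)


print(vote_characters(["Inv0ice", "Invoice", "lnvoice"]))  # Invoice
```

Because ties fall back to the first engine in the list, ordering the engines by trust acts as a simple tie-breaker.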

PrimeOCR's PDF output provides the most accurate OCR results available to the production imaging market, while full compression minimizes PDF file size and the original image and text layout are retained.

Supports Adobe Acrobat

Three styles of PDF documents can be produced:

PDF Image Only

These documents contain a bitmap image of the original scanned document. No text is included in this type of document.

PDF Normal

These documents contain the formatted text output by the PrimeOCR engine, plus any image zones. The resulting files are significantly smaller than the original compressed bitmap image files.

PDF Image with Hidden Text

These documents combine the PDF Image Only and PDF Normal types. The original bitmap image is included in the document, and the OCR results are hidden behind the image. This type of document is useful when the original image must be retained while the OCR results can still be indexed, searched, or copied into another application.
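
One common way to build the Image with Hidden Text style is to paint the scanned bitmap and then place each OCR word at its position using PDF text rendering mode 3 (`3 Tr`), which neither fills nor strokes the glyphs: the text can be searched and selected but is never drawn. The sketch below builds only the page content stream. It assumes the page's resource dictionary names the image `/Im1` and a font `/F1` (hypothetical names), and the helper functions are illustrative, not part of any Prime Recognition API.

```python
def escape_pdf_string(text: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def hidden_text_page_stream(page_w: float, page_h: float,
                            words: list[tuple[str, float, float, float]]) -> str:
    """Build a page content stream: scanned image plus invisible OCR text.

    words holds (text, x, y, font_size) in PDF points from the lower-left corner.
    Assumes the page resources define image /Im1 and font /F1 (hypothetical names).
    """
    lines = [
        "q",
        f"{page_w:g} 0 0 {page_h:g} 0 0 cm",  # scale the unit-square image to the page
        "/Im1 Do",                             # paint the scanned bitmap
        "Q",
        "BT",
        "3 Tr",                                # rendering mode 3: neither fill nor stroke
    ]
    for text, x, y, size in words:
        lines.append(f"/F1 {size:g} Tf")
        lines.append(f"1 0 0 1 {x:g} {y:g} Tm")  # absolute position for each word
        lines.append(f"({escape_pdf_string(text)}) Tj")
    lines.append("ET")
    return "\n".join(lines)


print(hidden_text_page_stream(612, 792, [("Invoice", 72, 720, 12), ("(draft)", 150, 720, 12)]))
```

Setting the text matrix with `Tm` for every word keeps positions absolute, so each word lands over its image location regardless of the words before it.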

Advantages of using PrimeOCR for PDF Creation

OCR Accuracy

PrimeOCR generates 50-80% fewer character recognition errors than other OCR engines.

Designed for high volume unattended production environments

Memory management for robust operation. Many current products that produce PDF files have difficulty processing large numbers of documents in batch mode or handling multi-page TIFFs. Prime Recognition products manage memory effectively, so thousands of images and multi-page TIFFs can be processed quickly and without complications.

Capability to process batches of images in directories and subdirectories, enabling hands-off operation on large imaging jobs.
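
Collecting a batch from a directory tree can be sketched with Python's standard `pathlib` module. The set of extensions and the function name `find_images` are assumptions for illustration, not PrimeOCR settings.

```python
from pathlib import Path

# Extensions treated as input images in this sketch (an assumption, not a product setting).
IMAGE_SUFFIXES = {".tif", ".tiff", ".pcx", ".jpg", ".jpeg", ".pdf"}


def find_images(root: str) -> list[Path]:
    """Return every image file under root, including subdirectories, in a stable order."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

For example, `find_images("scans")` returns every matching file under `scans` and its subdirectories, sorted by path, so a rerun processes the files in the same order.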

Fault tolerance and process logging. Image and OCR errors are captured and recorded in log files, and processing continues automatically. The software is designed for robust, continuous operation.
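
The log-and-continue behavior described here follows a familiar pattern: wrap each conversion in its own error handler, record the failure, and move on to the next file. A minimal sketch using Python's standard `logging` module, where `convert` stands in for a hypothetical per-file conversion function:

```python
import logging
from pathlib import Path
from typing import Callable, Iterable

# Logs go to stderr here; pass filename="batch.log" to basicConfig to keep a log file.
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("batch")


def run_batch(paths: Iterable[Path], convert: Callable[[Path], None]) -> list[Path]:
    """Convert each file in turn; log any failure and keep going. Returns failed paths."""
    failed: list[Path] = []
    for path in paths:
        try:
            convert(path)
        except Exception:
            # log.exception records the message plus the full traceback.
            log.exception("failed on %s", path)
            failed.append(path)
        else:
            log.info("converted %s", path)
    return failed
```

Returning the failed paths makes it easy to retry them later or route them to manual review without stopping the rest of the batch.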

Support for long filenames and NTFS compressed drives. Prime Recognition offers the latest in Windows compatibility.

Three zoning modes are supported: automatic zoning within OCR, automatic zoning with manual QA, and manual zoning before OCR.

Image enhancement, including deskew, auto-rotation, and despeckle, is user-controlled and can run either as a separate step before OCR or within the OCR process.
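
Despeckling removes small noise specks from a scanned page. A minimal version of the idea, assuming a bitonal image stored as rows of 0 (white) and 1 (black), clears every black pixel that has no black neighbor. This illustrates the technique only: the filter PrimeOCR actually applies is not documented here, and production despeckle filters usually remove larger specks as well. The function name is hypothetical.

```python
def despeckle(image: list[list[int]]) -> list[list[int]]:
    """Remove isolated black pixels from a bitonal image (1 = black, 0 = white).

    A black pixel is cleared when none of its eight neighbors is black.
    The input is left unchanged; a cleaned copy is returned.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            has_neighbor = any(
                image[ny][nx] == 1
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_neighbor:
                cleaned[y][x] = 0
    return cleaned
```

Reading neighbors from the original image rather than the copy keeps the result independent of scan order.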

Speed

The single-engine (Level 1) version of PrimeOCR is over 85% faster than other production imaging solutions.

The very high accuracy (Level 6) version of PrimeOCR is at least 15% faster than alternatives.

Process Time

OCR conversion of 21 multi-page TIFF files to PDF Image with Hidden Text

OCR process	Time (min:sec)	% faster with PrimeOCR
Other product	7:30	n/a
PrimeOCR Level 1	1:05	85%
PrimeOCR Level 3	3:00	55%
PrimeOCR Level 6	5:50	15%

Conversion of TIFF images to the PDF Image Only document format is 92% faster than alternatives.

Process Time

Conversion of 21 multi-page TIFF files to PDF Image Only

Process	Time (sec)	% faster with PrimeOCR
Other product	80	n/a
PrimeOCR	6	92%
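
The "% faster" figures in both timing tables can be read as the reduction in elapsed time relative to the other product, (other - PrimeOCR) / other, with the OCR times given as minutes:seconds. Under that assumption, the short calculation below reproduces the Level 1 figure (about 85.6%) and the Image Only figure (92.5%). The Level 3 and Level 6 rows work out to 60% and about 22.2%, so the published 55% and 15% may have been computed differently. The function names are illustrative.

```python
def to_seconds(mm_ss: str) -> int:
    """Convert an "m:ss" duration such as "7:30" to seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


def percent_time_saved(other_s: float, prime_s: float) -> float:
    """Reduction in processing time relative to the other product, in percent."""
    return (other_s - prime_s) / other_s * 100


other = to_seconds("7:30")  # the other product's OCR time from the first table
for label, t in [("Level 1", "1:05"), ("Level 3", "3:00"), ("Level 6", "5:50")]:
    print(label, round(percent_time_saved(other, to_seconds(t)), 1))
print("Image Only", round(percent_time_saved(80, 6), 1))  # times in seconds
```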

File Size

Prime Recognition's PDF output can use up to 80% less disk space than alternatives, depending on the PDF file type.

Conversion of 21 multi-page TIFF files (876.6 KB total)

PDF file type	PrimeOCR file size (KB)	Other product file size (KB)	% saving with PrimeOCR
Normal	117.0	620.1	80%
Image Only	926.0	1263.0	25%
Image with Hidden Text	988.0	1560.5	35%
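
The "% saving" column can be reproduced from the file sizes, taking the saving as (other - PrimeOCR) / other. The exact values are about 81.1%, 26.7%, and 36.7%; the table appears to round these down to the nearest 5%, which is an inference from the numbers rather than a stated rule. The helper names are illustrative.

```python
import math

# (PDF file type, PrimeOCR size in KB, other product size in KB), from the table above.
ROWS = [
    ("Normal", 117.0, 620.1),
    ("Image Only", 926.0, 1263.0),
    ("Image with Hidden Text", 988.0, 1560.5),
]


def saving_percent(prime_kb: float, other_kb: float) -> float:
    """Disk space saved by the PrimeOCR file relative to the other product, in percent."""
    return (other_kb - prime_kb) / other_kb * 100


def round_down_to(value: float, step: int = 5) -> int:
    """Round down to a multiple of step (how the table's figures appear to be rounded)."""
    return math.floor(value / step) * step


for name, prime_kb, other_kb in ROWS:
    exact = saving_percent(prime_kb, other_kb)
    print(f"{name}: {exact:.1f}% -> {round_down_to(exact)}%")
```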

All fonts are mapped to the base fonts available in every PDF reader, which reduces file size because no font data needs to be embedded. However, the look and feel of a PDF Normal document may suffer when the base fonts do not closely match the fonts in the original document.
Both text and images are compressed within the PDF file to minimize file size.
To reduce file size further, PrimeOCR PDF output can downsample the images within a PDF file. The target resolution is fully configurable from 50 dpi to 600 dpi.
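
Downsampling shrinks an embedded image by lowering its resolution, and the pixel count falls with the square of the dpi ratio: halving the resolution keeps one quarter of the pixels. A sketch of the size calculation, assuming the 50 to 600 dpi range stated above; the helper name `downsampled_size` is hypothetical.

```python
MIN_DPI, MAX_DPI = 50, 600  # configurable range stated for PrimeOCR PDF output


def downsampled_size(width_px: int, height_px: int,
                     source_dpi: int, target_dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page image resampled from source_dpi down to target_dpi."""
    if not MIN_DPI <= target_dpi <= MAX_DPI:
        raise ValueError(f"target_dpi must be between {MIN_DPI} and {MAX_DPI}")
    if target_dpi > source_dpi:
        raise ValueError("downsampling cannot raise the resolution")
    scale = target_dpi / source_dpi
    return round(width_px * scale), round(height_px * scale)
```

A US Letter page scanned at 300 dpi is 2550 x 3300 pixels; `downsampled_size(2550, 3300, 300, 150)` returns `(1275, 1650)`, a quarter of the original pixel count.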

PrimeOCR PDF I/O Specifications

Input File Formats:

TIFF, including large multi-page files (thousands of pages)
PCX
Bitonal, color, and grayscale images
JPEG
PDF
Many others
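
Batch tools often identify an input file by its leading signature bytes rather than trusting the extension. A sketch for the formats listed above, using their well-known signatures: TIFF files begin with `II*\0` or `MM\0*`, JPEG with `FF D8 FF`, PDF with `%PDF-`, and PCX with the manufacturer byte `0x0A`. The function names are hypothetical, and this is not necessarily how PrimeOCR detects formats.

```python
from pathlib import Path

# Leading signature bytes for the input formats above, checked in order.
SIGNATURES = [
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"%PDF-", "PDF"),
    (b"\x0a", "PCX"),  # one-byte check is weak, so it is tried last
]


def sniff_format(header: bytes) -> str:
    """Name the format whose signature starts the header, or "unknown"."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: Path) -> str:
    """Read the first few bytes of a file and identify its format."""
    with open(path, "rb") as f:
        return sniff_format(f.read(8))
```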

Output File Formats:

PDF Image Only
PDF Normal
PDF Image with Hidden Text
Color and grayscale output supported
Optimized PDF output
PDF/A:
  PDF/A-1a - Level A compliance
  PDF/A-1b - Level B compliance
Section 508 compliant PDF output (tagged PDF)
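
A PDF/A-1 file declares its conformance level in its XMP metadata using the PDF/A identification schema: `pdfaid:part` is 1 and `pdfaid:conformance` is A or B. The sketch below produces only that identification fragment. It assumes the fragment is placed inside the `rdf:RDF` element of the document's XMP metadata stream, and it does not by itself make a file compliant: PDF/A-1 imposes many other requirements, such as embedding all fonts. The function name is hypothetical.

```python
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"  # PDF/A identification schema namespace


def pdfa_identification(conformance: str) -> str:
    """Return an XMP rdf:Description declaring PDF/A-1 level A or B."""
    level = conformance.upper()
    if level not in {"A", "B"}:
        raise ValueError("PDF/A-1 conformance level must be 'A' or 'B'")
    return (
        f'<rdf:Description rdf:about="" xmlns:pdfaid="{PDFAID_NS}">\n'
        "  <pdfaid:part>1</pdfaid:part>\n"
        f"  <pdfaid:conformance>{level}</pdfaid:conformance>\n"
        "</rdf:Description>"
    )


print(pdfa_identification("a"))  # identification fragment for PDF/A-1a
```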

Additional Information

Information and Sales:
sales@primerecognition.com

Support:
support@primerecognition.com

Call Us
(425) 895-0550

"The University of Michigan Digital Library Production Services is extraordinarily pleased with the increase in OCR quality made possible through the use of PrimeOCR. Scalability is a critical issue in digital libraries, and Prime Recognition has contributed to our creating a large and scalable digital library production service."
~ John Price-Wilkin, University of Michigan

"PrimeOCR gives us a much cleaner document before verification than most OCR packages do after verification." ~ Doug Thompson, Scan Center of America
