PrimeOCR PDF Accessibility | Prime Recognition

Accessible PDF Overview

PrimeOCR accessible PDF output offers a number of features that increase the accessibility of the contents of the PDF file for people with disabilities. These features also help government agencies and businesses comply with US government regulations as outlined in Section 508 of the Rehabilitation Act.

Section 508 requires that Federal agencies' electronic documents, including PDF files, are accessible to people with disabilities.

Features:

PrimeOCR adds accessibility and usability features to the OCRed text within scanned image PDF files so that the files meet Section 508 standards.

Higher character recognition accuracy delivers more accurate results
Text reading order can be identified
Alternate text can be inserted for each graphic on a scanned page
Natural language of recognized text can be included
Text can be viewed in "Reflow" view mode, including text from Image plus Hidden Text files
Text retains formatting and paragraph wrapping when exported to another application

508 requirements:	PrimeOCR PDF output:
A text equivalent for every non-text element shall be provided	Alternate text is provided for each significant graphic on the page. The user can supply alternate text for each zoned graphic on the scanned image (using PrimeView). If the operator does not provide alternate text during manual zoning, PrimeOCR automatically inserts the default string "This is a graphic from a scanned page". The alternate text can be updated later with a PDF tag editor.
Documents shall be organized so they are readable without requiring an associated style sheet	Reading order is identified and tagged in the PDF file along with paragraph markers. Reading order is determined by autozoning the scanned page or set manually by the user in PrimeView. The language of the document is identified and tagged for each page. (The language for the page comes from the language setting in the processing template, as defined in PrimeView or the API call.) Text retains formatting, so when the OCR text is exported to RTF from a PDF viewer, the output text reflows within paragraphs and character formatting is retained. OCRed text can be viewed in "Reflow" mode for all versions of PDF output, including image plus hidden text.
Row and column headers shall be identified for data tables	Rows and columns are defined and tagged in the PDF file. Rows and columns must be manually defined (using PrimeView) within the template; otherwise, autozoning will segment the table as a single text zone or as several text zones.
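
The alternate-text fallback described in the first row of the table can be illustrated with a minimal sketch. This is not PrimeOCR code: the function name and its `operator_text` parameter are hypothetical, and only the default string is taken from the text above.

```python
# Default string quoted in the table above.
DEFAULT_ALT_TEXT = "This is a graphic from a scanned page"


def resolve_alt_text(operator_text=None):
    """Return the alternate text to attach to a graphic zone.

    `operator_text` is a hypothetical stand-in for whatever the operator
    entered during manual zoning; None or a blank string means that
    nothing was supplied, so the default string is used instead.
    """
    if operator_text is not None and operator_text.strip():
        return operator_text.strip()
    return DEFAULT_ALT_TEXT
```

Under these assumptions, a graphic zoned with the text "Company logo" keeps that text, while a graphic with no operator input receives the default string and can be corrected later in a PDF tag editor.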

Automated processing:

Autozoning may be used to automatically add accessibility features to PDF output. Autozoning automatically finds graphics on the scanned page and identifies columns of text to be recognized. If the document is too complex and autozoning does not reliably determine the correct reading order, then manual zoning with PrimeView may be used (see below). When autozoning is used during OCR, the accessible PDF output will include:

Reading order of the text that is recognized during OCR. Text will be identified and tagged in the PDF output file.
Paragraphs will be identified within the text for later export into another application. Formatting of the text is retained during export.
Graphics within the scanned page will be identified and tagged with a generic alternative text tag - "This is a graphic from a scanned page". The alternative text for the graphic can be easily modified later with PDF editing software.
Tables will not be recognized or tagged when using autozoning. Tables can only be identified by manually zoning the page prior to OCR taking place.
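
The list above can be summarized as a small sketch of how an autozoned page might map to tagged content. Everything here is illustrative: the `Zone` record and `plan_tags` function are hypothetical, and the use of the standard PDF structure types `P` and `Figure` is an assumption about the output rather than a documented PrimeOCR detail.

```python
from dataclasses import dataclass

# Generic alternative text quoted in the list above.
DEFAULT_ALT_TEXT = "This is a graphic from a scanned page"


@dataclass
class Zone:
    """Hypothetical zone record; not a PrimeOCR type."""
    kind: str               # "text" or "graphic" (autozoning yields no tables)
    reading_order: int      # position assigned by autozoning
    paragraphs: tuple = ()  # recognized paragraphs, for text zones only


def plan_tags(zones):
    """Return (tag, content) pairs in reading order for an autozoned page."""
    tags = []
    for zone in sorted(zones, key=lambda z: z.reading_order):
        if zone.kind == "graphic":
            # Graphics receive the generic alternative text.
            tags.append(("Figure", DEFAULT_ALT_TEXT))
        elif zone.kind == "text":
            # Each recognized paragraph becomes its own paragraph tag.
            tags.extend(("P", paragraph) for paragraph in zone.paragraphs)
        else:
            # Tables require manual zoning, so they never appear here.
            raise ValueError(f"unsupported zone kind for autozoning: {zone.kind}")
    return tags


# Example page: zones listed out of order, sorted by reading order when tagged.
example_page = [
    Zone("text", 2, ("Second column.",)),
    Zone("graphic", 1),
    Zone("text", 0, ("Heading.", "First column.")),
]
example_tags = plan_tags(example_page)
```

In this sketch a table on the page would simply arrive as one or more text zones, which mirrors the limitation noted in the last item of the list.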

Manual processing:

Depending on the document style, a less automated approach may need to be used to provide more accurate accessible OCR text in PDF output. By using new features in PrimeView, specifically designed for adding accessibility attributes to PDF output, an operator can quickly zone text columns, zone graphics on a scanned page, provide alternate text for each graphic, and identify table rows and columns.

The zoning information collected from PrimeView is provided to PrimeOCR when the PDF output is generated from the scanned image to create accessible PDF output.

Additional information - links: