PrimeOCR Job Server Installation F.A.Q.

The PrimeOCR Job Server includes many features for production OCR processing. Listed below are common operating configurations for the software. Send us an email if you have a specific need that you cannot find here or in the PrimeOCR Job Server.

How do I convert image only PDFs to searchable PDFs?
How can I configure four licenses of the Job Server running on four different machines to work on the same repository of images at the same time?
Can I generate two different kinds of output formats during processing?
How can I process images from different directories through PrimeOCR?
How can I configure the PrimeOCR Job Server to use watched folders?
How can I improve processing speed?
My images have mixed orientation. Can PrimeOCR automatically find the correct orientation of the text and automatically rotate them for me?
When would I use both the Primary and the Low Priority Job Directories?
How can I enable the many PDF output options, such as downsampling images, adding thumbnails, optimizing PDF output and creating Accessible PDF output?
I want to save the confid.txt file (processing statistics) with the images that are processed. How do I change its location?
What is the difference between lexical check and lexical plus?
Can PrimeOCR move my images to a fixed output directory during processing?
I see that my output file (ASCII or RTF) includes a carriage return on every line of text. How can I have the text wrap in paragraphs?
Can I save the OCR output to a different directory?

How do I convert image only PDFs to searchable PDFs?

For single-page image-only PDF files, the default settings can be used. They write the searchable PDF output to the input directory, so the searchable PDF output file overwrites the image-only PDF file.
For multi-page PDF files, under Setup\Output select "Save output to a different directory" and select a path. PrimeOCR will read image only PDF files from the input directory and output them to an output directory.
If you want the input files to be deleted when processing is complete then under Setup\Input\More Settings select "Erase image files after processing".

How can I configure four licenses of the Job Server running on four different machines to work on the same repository of images at the same time?

Install PrimeOCR onto each of the four PCs.
Mount the repository file server directory on each PrimeOCR server as the same mapped drive (for example, z:\images) on each PC.
Create a job to process the images with a defined template.
Save the job file into the job directory then copy that job to each job directory of the four PCs.
Under Setup\Output set "Skip files with existing output" to "Yes". Select Ok to exit Setup and save changes.
Press the START button on each of the four Job Servers. Observe that each OCR server processes the next available file in the repository.
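
The step of copying one job file into the job directory of each PC can be scripted. The following is a minimal Python sketch, not part of PrimeOCR. It assumes each PC's job directory is reachable as a folder from the machine running the script; the function name and any paths passed to it are hypothetical.

```python
import shutil
from pathlib import Path


def distribute_job(job_file: Path, job_dirs: list[Path]) -> list[Path]:
    """Copy one job file into every job directory and return the new paths."""
    copies = []
    for job_dir in job_dirs:
        # Fail early rather than silently skipping a PC whose share is offline.
        if not job_dir.is_dir():
            raise FileNotFoundError(f"Job directory not found: {job_dir}")
        # copy2 returns the path of the file it created inside job_dir.
        copies.append(Path(shutil.copy2(job_file, job_dir)))
    return copies
```

Called with the saved job file and a list of the four job directories, it returns the four copies, or raises an error if any job directory cannot be reached.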

Can I generate two different kinds of output formats during processing?

Yes. Under Setup\OCR Engine\More Settings ... in the "Pre OCR String Option 1" type in: "file_output2,13".
PrimeOCR will output an additional PDF file during processing with this string option.

How can I process images from different directories through PrimeOCR?

Using the wizard create a job for each directory of images you want to process. Each job file is a simple text file that includes two lines of key information - the first is a path to the images to process and the second a path to the template to be used during processing.
PrimeOCR will read the first job, process the images then read the images from the second job and so forth until completion.
Images can be stored either on the local PC or on a remote drive.
If the images are stored on a remote drive then mount the drive to a drive letter prior to creating a job (for example, z:\images). For example:

Prime Recognition Job File
Version=4.20
1
z:\images\*.tif
z:\images\tif2pdf.ptm

As an alternative to mounted drives, a UNC path to the images can also be referenced in the job file. A simple text editor (notepad.exe) can be used to modify the simple job file format and insert the UNC path to the images and template. For example:

Prime Recognition Job File
Version=4.20
1
\\server01\images\*.tif
\\server01\images\tif2pdf.ptm
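
Because the job file is plain text, it can also be generated by a script when there are many directories to process. The following Python sketch reproduces the five-line layout of the examples above. The function names are hypothetical, the line containing "1" is copied verbatim from the examples because its meaning is not described here, and the Windows-style line endings are an assumption.

```python
from pathlib import Path

JOB_HEADER = "Prime Recognition Job File"
JOB_VERSION = "Version=4.20"


def build_job_text(image_spec: str, template_path: str) -> str:
    """Return job file text laid out like the examples above."""
    lines = [
        JOB_HEADER,
        JOB_VERSION,
        "1",  # copied verbatim from the examples; meaning not documented here
        image_spec,      # path to the images to process
        template_path,   # path to the template used during processing
    ]
    # Assumption: Windows (CRLF) line endings, as Notepad would write them.
    return "\r\n".join(lines) + "\r\n"


def write_job(job_path: Path, image_spec: str, template_path: str) -> None:
    """Write the job text to job_path without altering the line endings."""
    with open(job_path, "w", newline="") as handle:
        handle.write(build_job_text(image_spec, template_path))
```

Compare a generated file against one created by the wizard before using it in production.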

How can I configure the PrimeOCR Job Server to use watched folders?

Under Setup\Input set "Once OCR of job is complete:" to "Do not Erase Job File".
Under Setup\Input set "Once all jobs are complete:" to "Continuously poll for next job".
Under Setup\Output set "Skip files with existing output?" to "Yes". Select Ok to exit Setup and save changes.
With these settings the PrimeOCR Job Server will process images that are inserted into the watched directories. Once output exists for a file PrimeOCR will not re-process the file.
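
The effect of "Skip files with existing output" can be pictured with a short Python sketch. It is only an illustration of the rule, not PrimeOCR's implementation. It assumes the output file has the same base name as the image with a ".pdf" extension; the function name, the file pattern and the extension are all hypothetical.

```python
from pathlib import Path


def pending_images(input_dir: Path, output_dir: Path,
                   pattern: str = "*.tif", output_ext: str = ".pdf") -> list[Path]:
    """List images in input_dir that have no matching output file yet."""
    pending = []
    for image in sorted(input_dir.glob(pattern)):
        # Assumed naming rule: scan1.tif produces scan1.pdf in output_dir.
        expected_output = output_dir / (image.stem + output_ext)
        if not expected_output.exists():
            pending.append(image)
    return pending
```

Each time the watched directory is polled, only the files returned by such a check would need processing, which is why images dropped into the folder later are picked up while finished ones are left alone.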

How can I improve processing speed?

Under Setup\OCR Engine, deselect both logging settings.
- Logging is useful when first setting up the PrimeOCR Job Server but slows processing during production.
- Errors that occur during processing will still be recorded in the PrimeOCR log even though both logging settings are disabled.
Under Setup\OCR Engine\More Settings\Variable Processing On then Configure, select "Low quality images are processed quickly" and "High quality images are processed more quickly". See the Setup help notes and review the user's manual before using this setting in production.
Under Setup\OCR Engine\More Settings\# of CPUs, the setting should be left at Auto, the default. Auto senses how many CPUs are on the PC, including hyperthreaded CPUs, and compares that with how many CPUs are licensed for processing.

My images have mixed orientation. Can PrimeOCR automatically find the correct orientation of the text and automatically rotate them for me?

PrimeOCR has three functions for rotating images. All three can be defined in a template using the first screen of the wizard.
To rotate each image with a fixed rotation (90, 180 or 270) prior to OCR - perhaps all pages were scanned with a landscape orientation - use the rotate function. Do not use the rotate feature if using auto-rotate or strong auto-rotate since the rotation will occur after auto-rotation takes place.
If the scanned documents include a mix of orientations then use the auto-rotate function. The auto-rotate function is a fast algorithm that attempts to find the correct orientation of the page. It may be useful for many projects but may not provide the accuracy required for all projects.
If you find that auto-rotate is not accurate enough for your documents then also use strong auto-rotate. Strong auto-rotate is the most accurate solution for finding the correct orientation of documents but it can contribute to longer processing times. It should be used in conjunction with auto-rotate when scanned documents have mixed orientations.

When would I use both the Primary and the Low Priority Job Directories?

Customers that usually have a mix of high priority and low priority jobs enable both the primary and the low priority job directories. A typical scenario would be that you have a job that can be processed in the background (low priority) and when a job comes up that needs to be completed sooner it can be placed in the Primary (high priority) Job Directory.
Most customers just use the Primary Directory and process their jobs sequentially. Other customers, who usually manage several different kinds of conversion projects, use both job directories to manage their work through PrimeOCR.
The Primary Job Directory is always enabled. The use of the Low Priority Job Directory can be enabled as an option.
The PrimeOCR Job Server will look for jobs in the primary job directory first. If a job is not found in the Primary Job Directory or the job has been completed then the PrimeOCR Job Server will look for jobs in the Low Priority Job Directory if the "Enable low priority job directory" checkbox is checked.
The PrimeOCR Job Server will process a set number of images in a low priority job before returning to check for new jobs in the Primary Job Directory. Under Setup\Input set the "Number of low priority images before poll" to modify the number of images to process before changing over to the high priority directory.
Under Setup\Input set the "Number of seconds before poll" to the number of seconds that should pass before the PrimeOCR Job Server reads the Primary Job Directory for new jobs.
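
The polling order described above can be summarized in a small Python model. It is a simplified sketch of the decision only, not the Job Server's actual scheduler: the function and parameter names are hypothetical, and the "Number of seconds before poll" timer is not modeled.

```python
def choose_next_work(primary_jobs: list[str], low_priority_images: list[str],
                     low_priority_enabled: bool, images_before_poll: int) -> list[str]:
    """Return the work to do before the Primary Job Directory is checked again."""
    if primary_jobs:
        # A job in the Primary Job Directory always goes first.
        return [primary_jobs[0]]
    if low_priority_enabled:
        # Process a limited batch of low priority images, then poll again.
        return low_priority_images[:images_before_poll]
    # Low Priority Job Directory disabled and no primary jobs: nothing to do.
    return []
```

A smaller "Number of low priority images before poll" value means a newly arrived high priority job waits behind fewer low priority images.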

How can I enable the many PDF output options, such as downsampling images, adding thumbnails, optimizing PDF output and creating Accessible PDF output?

These options are set under Setup\Output\More Settings\Change PDF defaults\Details.

I want to save the confid.txt file (processing statistics) with the images that are processed. How do I change its location?

Under Setup\OCR Engine\More Settings\Change confidence log attributes\Configure, select "Save in output directory".

What is the difference between lexical check and lexical plus?

Lexical check is basic lexical processing within each internal voting OCR engine and lexical plus is a post OCR function that acts as an advanced spell checker.
Lexical check is functionality that exists within each internal voting OCR engine. Each internal voting OCR engine includes some level of lexical review to see how recognized characters fit into word context. The result of the internal lexical check may improve recognition results.
Lexical plus is a powerful separate software module that analyzes the OCR results once OCR has been completed by all of the voting OCR engines.
Lexical plus can auto-correct words that have characters that are not correct (for example: changing misissipi to mississippi) provided most of the characters have high confidence.
Lexical plus is most useful on documents that contain English text. It does not have any capability to correct numeric data or non-English language words.
There are a number of advanced settings that can be adjusted for lexical plus.
Lexical plus can also be used to reduce the number of characters required to be verified by 60-90%.
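
The general idea of dictionary-based auto-correction gated by character confidence can be illustrated in Python. This sketch is not the lexical plus algorithm; it uses the standard library's difflib fuzzy matching as a stand-in, and the function name, the 0-to-1 confidence scale and all thresholds are hypothetical.

```python
import difflib


def correct_word(word, confidences, dictionary,
                 high_confidence=0.9, min_confident_share=0.7, cutoff=0.8):
    """Return a close dictionary word, or the word unchanged if unsure."""
    if not word or word in dictionary:
        return word
    # Only attempt a correction when most characters were read confidently.
    confident = sum(1 for c in confidences if c >= high_confidence)
    if confident / len(word) < min_confident_share:
        return word
    # Stand-in for a real lexical engine: closest dictionary entry by similarity.
    matches = difflib.get_close_matches(word, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


words = ["mississippi", "missouri"]
print(correct_word("misissipi", [0.95] * 9, words))  # prints "mississippi"
print(correct_word("misissipi", [0.50] * 9, words))  # prints "misissipi"
```

The second call leaves the word alone because too few characters have high confidence, which mirrors the condition described above.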

Can PrimeOCR move my images to a fixed output directory during processing?

Yes. Define a template using at least one image enhancement function (for example, deskew on the first screen of the wizard) then on the next screen of the wizard select to save the processed image either under its original name or as a .fix file.
Under Setup\Input\More Settings, select "Erase image files after processing".
The incoming image will then be written to the output directory and the input image will be deleted during processing.

I see that my output file (ASCII or RTF) includes a carriage return on every line of text. How can I have the text wrap in paragraphs?

For RTF output, go to Setup\Output\More Settings\Change RTF defaults\Details and select "Wrapped paragraphs".
For ASCII output, go to Setup\OCR Engine\More Settings\Pre Recognition string and type in "ASCII_DEFAULTS, 1".

Can I save the OCR output to a different directory?

Under Setup\Output select "Save output to a different directory".
You can then select the option to preserve the subdirectory structure if you are processing multi-level directories and want to retain the image directory structure.
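
To show what preserving the subdirectory structure means in practice, the following Python sketch computes where the output for an image would land with and without that option. It is an illustration only; the function name and the example paths are hypothetical.

```python
from pathlib import PureWindowsPath


def output_directory(image, input_root, output_root, preserve_subdirs):
    """Return the directory that would receive the output for one image."""
    if not preserve_subdirs:
        # Everything lands directly in the output directory.
        return PureWindowsPath(output_root)
    # Keep the part of the image's folder path that lies below the input root.
    relative_dir = PureWindowsPath(image).parent.relative_to(PureWindowsPath(input_root))
    return PureWindowsPath(output_root) / relative_dir


image = r"z:\images\2024\jan\scan1.tif"
print(output_directory(image, r"z:\images", r"d:\ocr_out", True))   # d:\ocr_out\2024\jan
print(output_directory(image, r"z:\images", r"d:\ocr_out", False))  # d:\ocr_out
```

Without the option, images with the same file name in different subdirectories would map to the same output location, which is one reason to preserve the structure for multi-level directories.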