PrimeOCR Job Server Configuration FAQ
The PrimeOCR Job Server includes many features for production OCR processing. Listed below are common operating configurations for the software. Send us an email if you have a specific need that you cannot find here or in the PrimeOCR Job Server.
How do I convert image-only PDFs to searchable PDFs?
How can I configure four licenses of the Job Server running on four different machines to work on the same repository of images at the same time?
Can I generate two different output formats during processing?
How can I process images from different directories through PrimeOCR?
How can I configure the PrimeOCR Job Server to use watched folders?
How can I improve processing speed?
My images have mixed orientation. Can PrimeOCR automatically find the correct orientation of the text and rotate the images for me?
When would I use both the Primary and the Low Priority Job Directories?
How can I enable the many PDF output options, such as downsampling images, adding thumbnails, optimizing PDF output and creating Accessible PDF output?
I want to save the confid.txt file (processing statistics) with the images that are processed. How do I change its location?
What is the difference between lexical check and lexical plus?
Can PrimeOCR move my images to a fixed output directory during processing?
I see that my output file (ASCII or RTF) includes a carriage return on every line of text. How can I have the text wrap in paragraphs?
Can I save the OCR output to a different directory?
How do I convert image-only PDFs to searchable PDFs?
For single-page image-only PDF files, the default settings can be used. They write the searchable PDF output to the input directory, where it overwrites the original image-only PDF file.
For multi-page PDF files, select "Save output to a different directory" under Setup\Output and choose a path. PrimeOCR will then read the image-only PDF files from the input directory and write the searchable PDFs to the output directory.
If you want the input files to be deleted when processing is complete, select "Erase image files after processing" under Setup\Input\More Settings.
How can I configure four licenses of the Job Server running on four different machines to work on the same repository of images at the same time?
Install PrimeOCR on each of the four PCs.
Map the repository's file server directory on each PrimeOCR server to the same drive and path (for example, z:\images) on every PC.
Create a job to process the images with a defined template.
Save the job file into the job directory, then copy that job file to the job directory of each of the four PCs.
On each of the four PCs, under Setup\Output, set "Skip files with existing output" to "Yes". Select OK to exit Setup and save the changes.
Press the START button on each of the four Job Servers. Each OCR server processes the next available file in the repository.
Can I generate two different output formats during processing?
Yes. Under Setup\OCR Engine\More Settings ... in "Pre OCR String Option 1", type: "file_output2,13".
With this string option, PrimeOCR outputs an additional PDF file during processing.
How can I process images from different directories through PrimeOCR?
Using the wizard, create a job for each directory of images you want to process. Each job file is a simple text file that includes two lines of key information: the first is the path to the images to process, and the second is the path to the template to be used during processing.
PrimeOCR reads the first job and processes its images, then reads the second job and processes its images, and so on until all jobs are complete.
Images can be stored either on the local PC or on a remote drive.
If the images are stored on a remote drive, map the drive to a drive letter (for example, z:\images) before creating the job. For example:
Prime Recognition Job File
Version=4.20
1
z:\images\*.tif
z:\images\tif2pdf.ptm
As an alternative to mapped drives, a UNC path to the images can be referenced in the job file. A simple text editor (such as notepad.exe) can be used to edit the job file and insert the UNC paths to the images and template. For example:
Prime Recognition Job File
Version=4.20
1
\\server01\images\*.tif
\\server01\images\tif2pdf.ptm
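When there are many image directories, the job files could also be generated by a script instead of the wizard. The Python sketch below writes the same five-line layout as the examples above. It copies the "Prime Recognition Job File", "Version=4.20" and "1" header lines verbatim without interpreting them; the helper names, the `*.tif` pattern and the `jobNNN.job` file names are hypothetical, so compare the output with a job file saved by the wizard before placing it in a job directory.

```python
from pathlib import Path

# Header lines copied verbatim from the sample job files above; their meaning
# (for example, what the "1" line controls) is not interpreted here.
JOB_HEADER = ["Prime Recognition Job File", "Version=4.20", "1"]


def job_file_text(image_glob: str, template_path: str) -> str:
    """Return job file text: header lines, image path, then template path."""
    lines = JOB_HEADER + [image_glob, template_path]
    # Path.write_text() translates "\n" to the platform line ending (CRLF on Windows).
    return "\n".join(lines) + "\n"


def write_jobs(job_dir: Path, image_dirs: list[str], template_path: str,
               pattern: str = "*.tif") -> list[Path]:
    """Write one job file per image directory (hypothetical helper and naming)."""
    written = []
    for index, image_dir in enumerate(image_dirs, start=1):
        # Join with a backslash so drive-letter and UNC paths stay Windows-style.
        image_glob = image_dir.rstrip("\\") + "\\" + pattern
        job_path = job_dir / f"job{index:03d}.job"  # hypothetical file name
        job_path.write_text(job_file_text(image_glob, template_path))
        written.append(job_path)
    return written


if __name__ == "__main__":
    # Reproduces the UNC example job file shown above.
    print(job_file_text(r"\\server01\images\*.tif", r"\\server01\images\tif2pdf.ptm"), end="")
```

Keeping the text builder separate from the file writer makes it easy to check the generated layout against a wizard-created job file before writing anything into a live job directory.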
How can I configure the PrimeOCR Job Server to use watched folders?
Under Setup\Input, set "Once OCR of job is complete:" to "Do not Erase Job File".
Under Setup\Input, set "Once all jobs are complete:" to "Continuously poll for next job".
Under Setup\Output, set "Skip files with existing output?" to "Yes". Select OK to exit Setup and save the changes.
With these settings, the PrimeOCR Job Server processes images as they are added to the watched directories. Once output exists for a file, PrimeOCR will not reprocess it.
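The "skip files with existing output" setting is what makes continuous polling safe: every poll rescans the same folders, and only images without output are processed. The Python sketch below models that loop outside PrimeOCR; it is an illustration, not PrimeOCR's implementation. It assumes, hypothetically, that the output for `name.tif` is `name.pdf` in an output folder, and `find_unprocessed`, `poll` and `fake_ocr` are illustrative names.

```python
import tempfile
import time
from pathlib import Path


def find_unprocessed(input_dir: Path, output_dir: Path,
                     pattern: str = "*.tif", output_ext: str = ".pdf") -> list[Path]:
    """Images in input_dir with no output file yet (hypothetical naming rule:
    name.tif -> name.pdf in output_dir)."""
    return [image for image in sorted(input_dir.glob(pattern))
            if not (output_dir / (image.stem + output_ext)).exists()]


def poll(input_dir: Path, output_dir: Path, process, interval_s: float = 10.0,
         max_passes=None) -> None:
    """Rescan input_dir every interval_s seconds; max_passes=None polls forever."""
    passes = 0
    while max_passes is None or passes < max_passes:
        for image in find_unprocessed(input_dir, output_dir):
            process(image)  # stand-in for OCR; it must create the output file
        passes += 1
        time.sleep(interval_s)


def demo() -> list[str]:
    """Two polling passes over a temporary folder; returns the images processed."""
    processed = []
    with tempfile.TemporaryDirectory() as tmp:
        in_dir, out_dir = Path(tmp, "in"), Path(tmp, "out")
        in_dir.mkdir()
        out_dir.mkdir()
        for name in ("a.tif", "b.tif"):
            (in_dir / name).touch()
        (out_dir / "a.pdf").touch()  # a.tif already has output, so it is skipped

        def fake_ocr(image: Path) -> None:
            processed.append(image.name)
            (out_dir / (image.stem + ".pdf")).touch()  # output marks it as done

        poll(in_dir, out_dir, fake_ocr, interval_s=0, max_passes=2)
    return processed


if __name__ == "__main__":
    print(demo())  # b.tif is processed once; the second pass finds nothing new
```

In this model, deleting an output file is enough to have its image processed again on the next pass.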
How can I improve processing speed?
Under Setup\OCR Engine, deselect both logging settings.
Logging is useful when first setting up the PrimeOCR Job Server, but it slows processing in production.
Errors that occur during processing are still recorded in the PrimeOCR log even when both logging settings are disabled.
Under Setup\OCR Engine\More Settings, turn Variable Processing on, then select Configure and choose "Low quality images are processed quickly" and "High quality images are processed more quickly". See the Setup help notes and review the user's manual before using this setting in production.
Under Setup\OCR Engine\More Settings, check "# of CPUs". This setting should be left at Auto, the default. Auto senses how many CPUs are on the PC, including hyperthreaded CPUs, and compares that with how many CPUs are licensed for processing.
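As a rough model of the Auto setting, the Python sketch below takes the smaller of the detected and licensed CPU counts. That "smaller of the two" rule is an assumption about how the comparison is resolved, and `auto_cpu_count` is a hypothetical name. Python's `os.cpu_count()` reports logical CPUs, so hyperthreaded cores are included, matching the description above.

```python
import os
from typing import Optional


def auto_cpu_count(licensed_cpus: int, detected_cpus: Optional[int] = None) -> int:
    """Assumed model of "# of CPUs = Auto": never use more CPUs than are
    present on the PC or licensed for processing, whichever is smaller."""
    if detected_cpus is None:
        # os.cpu_count() counts logical CPUs (hyperthreads included);
        # it can return None, so fall back to 1.
        detected_cpus = os.cpu_count() or 1
    return min(detected_cpus, licensed_cpus)


if __name__ == "__main__":
    print(auto_cpu_count(licensed_cpus=4, detected_cpus=8))  # 4: the license is the limit
    print(auto_cpu_count(licensed_cpus=4, detected_cpus=2))  # 2: the hardware is the limit
```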
My images have mixed orientation. Can PrimeOCR automatically find the correct orientation of the text and rotate the images for me?
PrimeOCR has three functions for rotating images. All three can be defined in a template on the first screen of the wizard.
To rotate every image by a fixed amount (90, 180 or 270 degrees) before OCR, for example when all pages were scanned in landscape orientation, use the rotate function. Do not use the rotate function together with auto-rotate or strong auto-rotate, since the fixed rotation is applied after auto-rotation takes place.
If the scanned documents include a mix of orientations, use the auto-rotate function. Auto-rotate is a fast algorithm that attempts to find the correct orientation of each page. It is useful for many projects but may not provide the accuracy that every project requires.
If you find that auto-rotate is not accurate enough for your documents, also use strong auto-rotate. Strong auto-rotate is the most accurate solution for finding the correct orientation of documents, but it can increase processing time. It should be used in conjunction with auto-rotate when scanned documents have mixed orientations.
When would I use both the Primary and the Low Priority Job Directories?
Customers who usually have a mix of high-priority and low-priority jobs enable both the Primary and the Low Priority Job Directories. In a typical scenario, a job that can be processed in the background is placed in the Low Priority Job Directory, and when a job comes up that needs to be completed sooner, it is placed in the Primary (high priority) Job Directory.
Most customers use only the Primary Job Directory and process their jobs sequentially. Other customers, usually those managing several different kinds of conversion projects, use both job directories to manage their work through PrimeOCR.
The Primary Job Directory is always enabled; the Low Priority Job Directory can be enabled as an option.
The PrimeOCR Job Server looks for jobs in the Primary Job Directory first. If no job is found in the Primary Job Directory, or its jobs have been completed, the PrimeOCR Job Server looks for jobs in the Low Priority Job Directory, provided the "Enable low priority job directory" checkbox is checked.
The PrimeOCR Job Server processes a set number of images in a low priority job before returning to check for new jobs in the Primary Job Directory. Under Setup\Input, set "Number of low priority images before poll" to change how many images are processed before the Job Server switches back to the Primary (high priority) Job Directory.
Under Setup\Input, set "Number of seconds before poll" to the number of seconds that should pass before the PrimeOCR Job Server checks the Primary Job Directory for new jobs.
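The polling order described above can be modeled as a small scheduler. The Python sketch below is a simplified simulation, not PrimeOCR's implementation: jobs are plain lists of image names, `low_batch` stands in for "Number of low priority images before poll", `arrivals` is a hypothetical way to drop new primary jobs in part-way through, and the time-based "Number of seconds before poll" check is left out.

```python
from collections import deque


def schedule(primary_jobs, low_jobs, low_batch, arrivals=None):
    """Return image names in the order a simplified Job Server would process them.

    primary_jobs, low_jobs: lists of jobs; each job is a list of image names.
    low_batch: low-priority images processed before the primary directory is
        polled again (models "Number of low priority images before poll").
    arrivals: optional {images_processed_so_far: job} for primary jobs that are
        dropped in later; they are only noticed when the primary directory is polled.
    """
    if low_batch < 1:
        raise ValueError("low_batch must be at least 1")
    primary = deque(deque(job) for job in primary_jobs)
    low = deque(deque(job) for job in low_jobs)
    pending = dict(arrivals or {})
    order = []

    while True:
        # Poll the primary directory: pick up any jobs that have "arrived" by now.
        for when in sorted(k for k in pending if k <= len(order)):
            primary.append(deque(pending.pop(when)))

        if primary:
            job = primary.popleft()
            while job:  # primary jobs run to completion
                order.append(job.popleft())
        elif low:
            job = low[0]
            for _ in range(min(low_batch, len(job))):
                order.append(job.popleft())
            if not job:
                low.popleft()  # low-priority job finished
        else:
            break  # both directories empty; later arrivals are ignored in this sketch
    return order


if __name__ == "__main__":
    print(schedule([["P1"]], [["L1", "L2", "L3", "L4", "L5"]], low_batch=2,
                   arrivals={3: ["H1", "H2"]}))
    # ['P1', 'L1', 'L2', 'H1', 'H2', 'L3', 'L4', 'L5']
```

The example shows the trade-off behind the setting: a smaller batch size lets an urgent job (H1, H2) start sooner, at the cost of polling the Primary Job Directory more often.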
How can I enable the many PDF output options, such as downsampling images, adding thumbnails, optimizing PDF output and creating Accessible PDF output?
I want to save the confid.txt file (processing statistics) with the images that are processed. How do I change its location?
What is the difference between lexical check and lexical plus?
Lexical check is basic lexical processing within each internal voting OCR engine, while lexical plus is a post-OCR function that acts as an advanced spell checker.
Lexical check is built into each internal voting OCR engine. Each engine includes some level of lexical review to see how recognized characters fit into the context of a word. The result of this internal lexical check may improve recognition results.
Lexical plus is a powerful, separate software module that analyzes the OCR results once all of the voting OCR engines have completed OCR.
Lexical plus can auto-correct words that contain incorrectly recognized characters (for example, changing "misissipi" to "mississippi"), provided most of the word's characters have high confidence.
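The lexical plus algorithm itself is not described here. As a rough illustration of the general idea (dictionary lookup gated by per-character confidence), the Python sketch below uses the standard library's `difflib.get_close_matches`. The word list, the 0.9 confidence threshold, the 60% "mostly confident" share and the 0.8 similarity cutoff are all hypothetical values, and `correct_word` is an illustrative name.

```python
from difflib import get_close_matches


def correct_word(word, char_confidences, dictionary,
                 min_conf=0.9, min_confident_share=0.6, cutoff=0.8):
    """Illustrative post-OCR correction, not PrimeOCR's lexical plus algorithm.

    word: recognized word; char_confidences: one confidence (0-1) per character;
    dictionary: list of lowercase words. All thresholds are hypothetical.
    """
    if word.lower() in dictionary:
        return word  # already a known word
    confident = sum(c >= min_conf for c in char_confidences)
    if confident < min_confident_share * max(len(char_confidences), 1):
        return word  # too many doubtful characters: leave it for verification
    # Closest dictionary entry by difflib's similarity ratio, if one is close enough.
    # Note: the match is returned in the dictionary's (lowercase) form.
    matches = get_close_matches(word.lower(), dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


if __name__ == "__main__":
    words = ["mississippi", "missouri", "minnesota"]
    print(correct_word("misissipi", [0.95] * 9, words))  # mississippi
    print(correct_word("misissipi", [0.50] * 9, words))  # misissipi (left unchanged)
```

The confidence gate mirrors the condition stated above: a word is only rewritten when most of its characters were recognized with high confidence; otherwise it is left for manual verification.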
Lexical plus is most useful on documents that contain English text. It cannot correct numeric data or non-English words.
There are a number of advanced settings that can be adjusted for lexical plus.
Lexical plus can also be used to reduce the number of characters that need to be verified by 60-90%.
Can PrimeOCR move my images to a fixed output directory during processing?
Yes. Define a template that uses at least one image enhancement function (for example, deskew on the first screen of the wizard). Then, on the next screen of the wizard, choose to save the processed image either under its original name or as a .fix file.
Under Setup\Input\More Settings, select "Erase image files after processing".
The image will then be written to the output directory, and the input image will be deleted during processing.
I see that my output file (ASCII or RTF) includes a carriage return on every line of text. How can I have the text wrap in paragraphs?
For RTF output, go to Setup\Output\More Settings\Change RTF defaults\Details and select "Wrapped paragraphs".
For ASCII output, go to Setup\OCR Engine\More Settings and, in the Pre Recognition string, type "ASCII_DEFAULTS, 1".
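For ASCII files that were already produced with a line break on every line, a post-processing script can rejoin the lines. The Python sketch below is not a PrimeOCR feature: it assumes a blank line separates paragraphs in the raw output, and it does not rejoin words hyphenated across lines. `unwrap_paragraphs` is a hypothetical name.

```python
def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines into paragraphs.

    Assumes a blank line marks a paragraph break; every other line break
    inside a paragraph is replaced by a single space.
    """
    paragraphs = []
    current = []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())       # still inside a paragraph
        elif current:
            paragraphs.append(" ".join(current))  # blank line closes the paragraph
            current = []
    if current:
        paragraphs.append(" ".join(current))   # last paragraph without a trailing blank line
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    raw = "The quick\nbrown fox\n\njumps over\nthe dog\n"
    print(unwrap_paragraphs(raw))
```

Where the output settings above are available, they are the better fix, since the OCR engine knows where its paragraphs begin and end.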
Can I save the OCR output to a different directory?
Yes. Under Setup\Output, select "Save output to a different directory".
You can then select the option to preserve the subdirectory structure if you are processing multi-level directories and want the output to keep the same directory structure as the images.
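Preserving the subdirectory structure means each output file lands at the same relative path under the output directory as its image had under the input directory. The Python sketch below shows that mapping next to the flattened alternative. The `.pdf` output extension, the example paths and the `output_path` name are assumptions for illustration only.

```python
from pathlib import PureWindowsPath


def output_path(image_path: str, input_root: str, output_root: str,
                preserve_subdirs: bool = True, output_ext: str = ".pdf") -> PureWindowsPath:
    """Where the output for image_path would go (illustrative; .pdf is assumed)."""
    image = PureWindowsPath(image_path)
    if preserve_subdirs:
        # Keep the path below the input root, e.g. 2021\jan\scan001.tif
        relative = image.relative_to(input_root)
    else:
        # Flatten: every output file lands directly in output_root
        relative = PureWindowsPath(image.name)
    return PureWindowsPath(output_root) / relative.with_suffix(output_ext)


if __name__ == "__main__":
    src = r"z:\images\2021\jan\scan001.tif"
    print(output_path(src, r"z:\images", r"d:\ocr_out"))         # d:\ocr_out\2021\jan\scan001.pdf
    print(output_path(src, r"z:\images", r"d:\ocr_out", False))  # d:\ocr_out\scan001.pdf
```

Flattening can make two images with the same file name in different subdirectories map to the same output file, which is one reason to preserve the structure when processing multi-level directories.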