Digitization is the process of converting information into a digital format. Digitizing information makes it easier to preserve, access, and share. There is a growing trend towards digitization of historically and culturally significant data.
Author files may arrive as MS Word, RTF, or PDF documents, or as scanned copies.
Character-level synchronization between the data and the page image is preserved throughout the process.
Provision for clean-up of low-confidence (suspect) characters that are flagged in the common format.
Spell-check option with a customized dictionary based on both language and content.
Interactive identification and correction of special characters that are not defined in the project symbol list.
Ability to run project-based validation rules, such as punctuation checks, emphasis (bold, italic, underline) verification, and spacing rules, with character-level image synchronization.
Collect metrics on the number of suspect characters reported and corrected for productivity measurement.
Track the changes made by the operator to monitor quality and operational efficiency. Operator information is also stored with every change, which helps with review and feedback.
Collect details on corrected suspect characters for analysis and to train the OCR engine to improve its efficiency.
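The suspect-character clean-up, change tracking, and metrics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual tool: the names OcrChar, Correction, and ReviewSession, the 0.80 confidence threshold, and the bounding-box layout are all hypothetical. It shows one way to flag low-confidence characters, log who changed what, and count reported versus corrected suspects while leaving each character's image coordinates untouched.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class OcrChar:
    """One recognized character (hypothetical structure)."""
    text: str
    confidence: float                 # assumed engine confidence in [0, 1]
    bbox: Tuple[int, int, int, int]   # (x, y, width, height) on the page image


@dataclass
class Correction:
    """Audit record for a single operator change."""
    index: int
    old: str
    new: str
    operator: str


@dataclass
class ReviewSession:
    chars: List[OcrChar]
    threshold: float = 0.80           # assumed cut-off for "suspect" characters
    log: List[Correction] = field(default_factory=list)

    def suspects(self) -> List[int]:
        """Indexes of characters whose confidence is below the threshold."""
        return [i for i, c in enumerate(self.chars) if c.confidence < self.threshold]

    def correct(self, index: int, new_text: str, operator: str) -> None:
        """Replace a character and record the change with the operator's ID."""
        ch = self.chars[index]
        if new_text != ch.text:
            self.log.append(Correction(index, ch.text, new_text, operator))
            ch.text = new_text        # bbox is untouched, so image sync is kept

    def metrics(self) -> Dict[str, int]:
        """Count suspects reported and how many of them were corrected."""
        reported = set(self.suspects())
        corrected = {c.index for c in self.log}
        return {
            "suspects_reported": len(reported),
            "suspects_corrected": len(reported & corrected),
        }

    def text(self) -> str:
        return "".join(c.text for c in self.chars)


# Example: the OCR engine read "c1ean"; two characters are low-confidence.
session = ReviewSession([
    OcrChar("c", 0.99, (0, 0, 10, 12)),
    OcrChar("1", 0.42, (10, 0, 6, 12)),    # suspect, actually an "l"
    OcrChar("e", 0.97, (16, 0, 10, 12)),
    OcrChar("a", 0.65, (26, 0, 10, 12)),   # suspect, but already correct
    OcrChar("n", 0.98, (36, 0, 10, 12)),
])

for i in session.suspects():
    if session.chars[i].text == "1":
        session.correct(i, "l", operator="operator_01")

print(session.text())
print(session.metrics())
print(session.log)
```

The correction log doubles as the record of corrected suspect characters: each entry pairs the engine's original output with the operator's fix, which is the kind of detail that could later be analyzed or fed back to the OCR engine.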