Explain why it is important to use frames. Explain data capture and OMR. How is it different from OCR?
Before recognition, document analysis usually splits the pages, corrects their orientation, and detects text blocks, pictures and other objects. After OCR, document synthesis either rebuilds the structure and layout of the document (for content reuse) or only retrieves the correct reading order for complex documents with several text columns and pictures (for archiving). Depending on the task, the resulting text is exported as plain text or as a document in a supported format.
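The reading-order step for multi-column pages can be illustrated with a minimal sketch. It assumes, hypothetically, that layout analysis has already produced text blocks with a top-left position, and that blocks in the same column share roughly the same left edge. The `Block` type, the `reading_order` function and the `column_gap` parameter are illustrative names, not part of any particular OCR product.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Block:
    """Hypothetical text block found by layout analysis: top-left corner and text."""
    x: int
    y: int
    text: str

def reading_order(blocks: List[Block], column_gap: int = 50) -> List[Block]:
    """Order blocks for a simple multi-column page.

    Blocks whose left edges lie within `column_gap` pixels of the first block
    in a column are grouped into that column. Columns are read left to right,
    and the blocks inside each column top to bottom.
    """
    columns: List[List[Block]] = []
    for block in sorted(blocks, key=lambda b: b.x):
        if columns and block.x - columns[-1][0].x <= column_gap:
            columns[-1].append(block)
        else:
            columns.append([block])
    ordered: List[Block] = []
    for column in columns:
        ordered.extend(sorted(column, key=lambda b: b.y))
    return ordered

# Blocks arrive in detection order, not reading order.
page = [
    Block(320, 40, "right column, paragraph 1"),
    Block(20, 300, "left column, paragraph 2"),
    Block(20, 40, "left column, paragraph 1"),
    Block(320, 300, "right column, paragraph 2"),
]
print([b.text for b in reading_order(page)])
```

Grouping by left edge is the simplest possible heuristic; headings or pictures that span several columns would need extra handling in a real system.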
OCR
OCR (Optical Character Recognition), also called Optical Character Reader, is a system that provides full alphanumeric recognition of printed or handwritten characters at electronic speed, simply by scanning the form. More recently, the term Intelligent Character Recognition (ICR) has been used to describe the process of interpreting image data, in particular alphanumeric text. Forms containing character images are scanned, and the recognition engine of the OCR system then interprets the images and turns the handwritten or printed characters into ASCII data (machine-readable characters) (Baecker, 2005).
OCR therefore lets users quickly automate data capture from forms, eliminating keystrokes to reduce data-entry costs while maintaining the high level of accuracy that forms-processing applications require. The technology provides a complete forms-processing and document-capture solution. OCR systems usually have an open, scalable, workflow-controlled modular architecture that includes forms definition, scanning, image pre-processing, and recognition.
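The modular architecture described above can be sketched as separate stages composed into one pipeline. This is a minimal sketch under stated assumptions: scanning is taken to have already produced a small grayscale pixel grid, the form definition maps hypothetical field names to rectangular zones, pre-processing is plain global thresholding, and the recognition stage is passed in as a function so it can be swapped. The `has_ink` recogniser is only a stand-in for a real recognition engine.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]           # grayscale pixels: 0 = black, 255 = white
Zone = Tuple[int, int, int, int]  # (top, left, height, width) in pixels

def binarise(image: Image, threshold: int = 128) -> Image:
    """Pre-processing: map dark pixels to 1 (ink) and light pixels to 0 (paper)."""
    return [[1 if px < threshold else 0 for px in row] for row in image]

def crop(image: Image, zone: Zone) -> Image:
    """Cut one field out of the page using its zone from the form definition."""
    top, left, height, width = zone
    return [row[left:left + width] for row in image[top:top + height]]

def process_form(image: Image, form_definition: Dict[str, Zone],
                 recognise: Callable[[Image], str]) -> Dict[str, str]:
    """Pre-process the page once, then run the recogniser on every defined field."""
    clean = binarise(image)
    return {name: recognise(crop(clean, zone)) for name, zone in form_definition.items()}

def has_ink(field: Image) -> str:
    """Stand-in recogniser: reports only whether the field contains any ink."""
    return "ink" if any(any(row) for row in field) else "blank"

# A tiny 4x6 "scan" with dark pixels in the left half only.
page = [
    [255, 255, 255, 255, 255, 255],
    [255,  20,  30, 255, 255, 255],
    [255,  25, 255, 255, 255, 255],
    [255, 255, 255, 255, 255, 255],
]
form = {"initials": (0, 0, 4, 3), "signature": (0, 3, 4, 3)}
print(process_form(page, form, has_ink))  # {'initials': 'ink', 'signature': 'blank'}
```

Keeping each stage a separate function mirrors the modular idea: a different thresholding method or recognition engine can be plugged in without touching the form definition.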
ICR
Intelligent Character Recognition (ICR) is the module of OCR that turns images of handwritten or printed characters into ASCII data. OCR is therefore sometimes referred to as ICR.
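The step of turning a character image into ASCII data can be illustrated with a toy recogniser based on nearest-template matching: the scanned glyph is compared pixel by pixel with stored reference bitmaps, and the closest one wins. This technique was chosen for brevity and is not how any specific OCR or ICR product works; production engines use trained classifiers. The 5x3 bitmaps and the names `TEMPLATES`, `pixel_distance` and `recognise` are hypothetical.

```python
from typing import Dict, Tuple

Bitmap = Tuple[str, ...]  # rows of '#' (ink) and '.' (paper)

# Hypothetical 5x3 reference bitmaps for three characters.
TEMPLATES: Dict[str, Bitmap] = {
    "1": (".#.", "##.", ".#.", ".#.", "###"),
    "7": ("###", "..#", ".#.", ".#.", ".#."),
    "L": ("#..", "#..", "#..", "#..", "###"),
}

def pixel_distance(a: Bitmap, b: Bitmap) -> int:
    """Count the pixels that differ between two bitmaps of the same size."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))

def recognise(glyph: Bitmap) -> str:
    """Return the character whose template is closest to the glyph."""
    return min(TEMPLATES, key=lambda ch: pixel_distance(glyph, TEMPLATES[ch]))

# A scanned "7" with one noisy pixel (an extra ink dot in the fourth row).
noisy_seven: Bitmap = ("###", "..#", ".#.", ".##", ".#.")
char = recognise(noisy_seven)
print(char, ord(char))  # 7 55
```

The noisy glyph differs from the "7" template by one pixel but from "1" and "L" by eight and eleven, so the matcher still returns "7", whose ASCII code is 55.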
Difference between ICR, OMR and OCR
ICR and OCR are recognition engines used with imaging, whereas OMR is a data collection technology that does not require a recognition engine. Consequently, OMR cannot recognize hand-printed or machine-printed characters; it only detects whether predefined positions on the form are marked. However, within OCR technology, reading answers given as a “tick” or “mark” is also referred to as OCR.
Optical Mark Reader (OMR)
Forms
An OMR works with specialized documents that contain timing tracks along one edge of the form to indicate to the scanner where to read marks. The forms also contain form ID marks, which look like black boxes at the top or bottom of the form. The forms must be cut very precisely, and the bubbles must be in the same location on every form.
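The role of timing tracks can be shown with a simplified OMR reader. In this sketch all layout values are hypothetical: the sheet is a grid of characters where `#` is a dark cell, the timing track runs down column 0, and each timing mark identifies a row to read. Because the bubbles sit at the same positions on every form, the reader only checks fixed columns for darkness; no character recognition is involved. A real reader would measure the darkness of each bubble area against a threshold rather than test single cells.

```python
from typing import List

# Hypothetical layout: timing track in column 0, bubbles for options A-D at
# fixed columns that are identical on every printed copy of the form.
TRACK_COLUMN = 0
BUBBLE_COLUMNS = {"A": 2, "B": 4, "C": 6, "D": 8}

def timing_rows(sheet: List[str]) -> List[int]:
    """Return the centre row of each run of dark cells in the timing track."""
    rows: List[int] = []
    start = None
    for r in range(len(sheet) + 1):
        # Treat the position past the last row as light so a final run is closed.
        dark = r < len(sheet) and sheet[r][TRACK_COLUMN] == "#"
        if dark and start is None:
            start = r
        elif not dark and start is not None:
            rows.append((start + r - 1) // 2)
            start = None
    return rows

def read_answers(sheet: List[str]) -> List[str]:
    """Read one answer per timing mark: an option letter, 'BLANK' or 'MULTIPLE'."""
    answers: List[str] = []
    for r in timing_rows(sheet):
        marked = [opt for opt, col in BUBBLE_COLUMNS.items() if sheet[r][col] == "#"]
        if not marked:
            answers.append("BLANK")
        elif len(marked) > 1:
            answers.append("MULTIPLE")
        else:
            answers.append(marked[0])
    return answers

sheet = [
    "#...#....",  # question 1: bubble B filled
    "#...#....",
    ".........",
    "#........",  # question 2: nothing filled
    "#........",
    ".........",
    "#.#.....#",  # question 3: bubbles A and D filled
    "#.#.....#",
]
print(read_answers(sheet))  # ['B', 'BLANK', 'MULTIPLE']
```

Reporting `BLANK` and `MULTIPLE` separately leaves it to the application to decide how to score questions with no mark or more than one mark.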
Storage
With OMR, the image of a document is not scanned and stored.
Accuracy
Because OMR is simpler than OCR, it achieves higher accuracy than OCR, provided that the forms and the system are designed properly.
OCR/ICR
Forms
OCR/ICR is more flexible because it requires no timing tracks or block-like form IDs. In addition, the image can float on the page: OCR/ICR technology uses registration marks at the four corners of a document to locate the image during recognition. Respondents place one character per box on this ...