In an ideal world, data entry is done by the customer using an online, web-based form. However ubiquitous computers seem, companies still find themselves dealing with paper forms.
To capture the data on a paper form, companies typically employ OCR (optical character recognition; for hand printed text the term ICR, intelligent character recognition, is also used) with varying results. When OCR fails, in-house staff must enter the data by hand, which costs more.
The accuracy of OCR is greatly affected by the design of the form. Features such as machine or hand printed characters, scanned or faxed forms, black ink or drop out ink will affect the OCR engine’s ability to read characters accurately. In general, clean images of a standard size with clearly written characters are the best candidates for OCR. (Again, that ideal world we want.)
Processing faxed or hand printed forms poses additional challenges. An OCR application can process these forms, but they will not yield the high accuracy rates of clean, machine printed images, especially without a solid form design.
Here are guidelines to follow when designing an OCR-friendly form:
- Use constrained boxes for each character instead of lines or one single box. This guides customers to print separate, evenly spaced characters.
- Use circles or ovals instead of check boxes. This encourages customers to completely fill the mark instead of placing a check or an ‘X’ in the mark.
- Use drop out ink for any form elements that can interfere with recognition, such as labels or dividing lines. Drop out ink is often a pastel yellow, red or orange. When these forms are scanned using a scanner that supports the drop out color, the labels and lines are removed from the image. As a result, they do not interfere with recognition.
- If drop out ink cannot be used, then move labels further from the hand printed fields or marks. For example, if the forms will be faxed instead of scanned, then ink will not drop out. By placing labels away from the data, the forms removal software can delete the labels without removing the data.
- Whether using drop out ink or black ink forms, make sure there is sufficient white space surrounding the data fields to prevent the OCR engine from reading the wrong data. Leave at least ¼” of white space between rows of fields.
- Process a scanned or faxed image of the original paper form. Discourage customers from submitting photocopies of the form or faxing the page multiple times.
- Create separate fields wherever possible. For example, city, state, and zip code should be three separate fields, instead of one field labeled Address.
- Always print a small sample of your form and test it with your OCR engine before distributing a large number of forms to your customers.
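The white space guideline above is easy to check before a form goes to print. The following Python sketch is an illustration only: it assumes you can export each row of fields from your form design as a top position and a height in inches, and the `Row` class and `rows_too_close` function are hypothetical names, not part of any OCR product.

```python
from dataclasses import dataclass

# Minimum white space between rows of fields, in inches (the 1/4 inch guideline).
MIN_ROW_GAP_IN = 0.25


@dataclass
class Row:
    """One row of data fields on the form (hypothetical layout record)."""
    name: str
    top: float     # distance from the top of the page to the top of the row, in inches
    height: float  # height of the row's character boxes, in inches


def rows_too_close(rows, min_gap=MIN_ROW_GAP_IN):
    """Return (upper, lower, gap) for each pair of adjacent rows closer than min_gap."""
    ordered = sorted(rows, key=lambda r: r.top)
    problems = []
    for upper, lower in zip(ordered, ordered[1:]):
        # White space is measured from the bottom of one row to the top of the next.
        gap = lower.top - (upper.top + upper.height)
        if gap < min_gap:
            problems.append((upper.name, lower.name, gap))
    return problems


if __name__ == "__main__":
    form = [
        Row("name", top=1.0, height=0.25),
        Row("address", top=1.5, height=0.25),
        Row("city_state_zip", top=1.875, height=0.25),
    ]
    for upper, lower, gap in rows_too_close(form):
        print(f"Only {gap} in. between '{upper}' and '{lower}'")
```

A check like this only catches spacing problems in the layout itself. It does not replace printing a sample and running it through your OCR engine.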