Extracting Prices from Store Shelves
Extracting Data from Data Strips Photos
The client is an international shopfitting company. The OCR solution created by SmartSoft lets their personnel take photos of store shelves with a mobile device. The software then extracts the data from the shelf data strips and populates a database.
OCR (optical character recognition) software is good at identifying standard black-on-white letters (such as those seen in this document). However, when taking a picture in a store, we want to extract only the text and prices on the actual data strips so we can enter them into a database. Here are some challenges we had to overcome:
- Finding data strips in a picture
- Finding the text in a photo of a data strip
- Determining if a picture is focused well enough to be OCR-readable
- Fixing perspective defects
- Digitally enhancing the focus of the picture (unlike a “sharpen” filter, which actually loses information)
The software transforms a generic picture taken by a cell phone into an OCR-readable format.
Finding data strips in a picture
- The image is transformed from the standard red-green-blue (RGB) color space into grayscale.
- Connected components (also known as blobs) are identified.
- Statistical and shape-based algorithms reduce the number of components, and the remaining ones are displayed as candidates for data strips.
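The steps above can be sketched as follows. This is an illustrative pipeline, not SmartSoft's actual implementation: the grayscale weights, fixed threshold, and minimum-area filter are assumptions standing in for the statistical and shape-based criteria described above.

```python
import numpy as np
from scipy import ndimage

def find_strip_candidates(rgb, thresh=128, min_area=50):
    """Grayscale -> threshold -> connected components -> filter blobs."""
    # Luminance-weighted grayscale conversion (ITU-R BT.601 weights).
    gray = rgb @ np.array([0.299, 0.587, 0.114])
    # Binarize: data strips are usually light, so bright pixels are foreground.
    mask = gray > thresh
    # Label connected components ("blobs").
    labels, n = ndimage.label(mask)
    candidates = []
    for i in range(1, n + 1):
        area = int((labels == i).sum())
        if area >= min_area:  # crude shape filter: discard tiny blobs
            candidates.append(i)
    return labels, candidates

# Tiny synthetic example: one bright 10x10 patch on a dark background.
img = np.zeros((32, 32, 3))
img[5:15, 5:15] = 255
labels, cands = find_strip_candidates(img)
```

In a real photo the area filter would be joined by aspect-ratio and rectangularity checks, since data strips are long and thin.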
Finding text in a photo of a data strip
- An algorithm similar to the one above is used. The difference is that here we look for blocks of letters and words that are connected horizontally or vertically.
- The resulting image is shown on the cell phone so the user can decide whether to keep or discard the given price tag candidate.
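One common way to implement the "connected horizontally" criterion is to bridge the small gaps between character blobs before labeling, so letters of the same word merge into one component. A minimal sketch, assuming a pre-binarized image and a fixed gap width:

```python
import numpy as np
from scipy import ndimage

def group_text_blobs(binary, gap=3):
    """Merge character blobs separated by small horizontal gaps, then label."""
    # Horizontal structuring element wide enough to span inter-letter gaps.
    horiz = np.ones((1, 2 * gap + 1), dtype=bool)
    merged = ndimage.binary_dilation(binary, structure=horiz)
    labels, n = ndimage.label(merged)
    return labels, n

# Two "letters" 2 px apart merge into one word; a distant blob stays separate.
img = np.zeros((10, 30), dtype=bool)
img[2:6, 2:5] = True    # letter 1
img[2:6, 7:10] = True   # letter 2 (gap of 2 px)
img[2:6, 22:25] = True  # far-away blob
labels, n = group_text_blobs(img, gap=3)  # n == 2 groups
```

A vertical structuring element applied the same way would connect lines of a paragraph.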
Assessing the readability of an image
- There are a few types of image blur, which can be simplified into motion blur and Gaussian (omnidirectional) blur. Generally, single-direction (motion) blur is more destructive to an image’s readability than Gaussian blur.
- This algorithm sorts a set of similar images by a metric that estimates their compound OCR readability.
- This lets us assess the quality of each image, so that we can use the best one or retake photos on the spot if necessary.
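The article does not disclose the actual readability metric, but a widely used stand-in is the variance of the Laplacian: sharp images have strong second derivatives, so higher variance suggests better focus. A sketch under that assumption:

```python
import numpy as np
from scipy import ndimage

def sharpness(gray):
    """Variance-of-Laplacian focus measure (higher = sharper)."""
    lap = ndimage.laplace(gray.astype(float))
    return lap.var()

def best_shot(images):
    """Sort candidate shots by estimated sharpness, best first."""
    return sorted(images, key=sharpness, reverse=True)

rng = np.random.default_rng(0)
sharp = rng.random((64, 64))                       # high-frequency detail
blurred = ndimage.gaussian_filter(sharp, sigma=3)  # simulated defocus
```

A production metric would also need to distinguish motion blur from Gaussian blur, since the two degrade OCR differently as noted above.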
Fixing perspective defects
- An image of a rectangle taken at a non-normal incidence angle distorts the actual object. This dramatically reduces the accuracy of the OCR reading the price tag.
- The algorithm assumes that the price tag is actually rectangular. Once the tag is identified, an inverse perspective transform is applied.
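The inverse transform amounts to finding the homography that maps the four skewed tag corners found in the photo onto the corners of an upright rectangle. A minimal sketch using the standard direct linear transform (the rectangle size and corner coordinates below are made up for illustration):

```python
import numpy as np

def homography(src, dst):
    """Solve for the 3x3 homography H mapping src -> dst (4 point pairs)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The null vector of A gives H up to scale (smallest singular vector).
    _, _, vt = np.linalg.svd(np.array(A, dtype=float))
    return vt[-1].reshape(3, 3)

def apply_h(H, pt):
    """Apply H to a 2D point in homogeneous coordinates."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return x / w, y / w

# Skewed quadrilateral (tag seen at an angle) -> upright 100x40 rectangle.
quad = [(10, 10), (120, 20), (115, 55), (5, 50)]
rect = [(0, 0), (100, 0), (100, 40), (0, 40)]
H = homography(quad, rect)
corner = apply_h(H, quad[1])  # maps (close) to (100, 0)
```

Once H is known, the full rectified image is produced by sampling the source photo at the inverse-mapped position of every destination pixel.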
Digitally enhancing the focus of the picture
- Deconvolution is a method of digitally restoring a blurred picture. Carrying it out requires two steps:
- Finding the kernel of the blur filter (a description of exactly how the image was blurred relative to a perfectly sharp one),
- Applying a deconvolution algorithm to refocus the image.
- Because of hand tremor, motion blur is usually the dominant component of the blur.
- We estimate a motion blur kernel by analyzing the picture and then apply our variation of the Lucy-Richardson deconvolution algorithm.
- The result is that letters and words 12 pixels high and taller become OCR-readable (without deconvolution, letters 18–24+ pixels high are needed).
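The standard (unmodified) Richardson–Lucy iteration can be sketched as below; the kernel-estimation step is omitted, and a horizontal box kernel simulates motion blur. This is a textbook version, not SmartSoft's proprietary variation.

```python
import numpy as np
from scipy import ndimage

def richardson_lucy(observed, psf, iters=30):
    """Basic Richardson-Lucy deconvolution with a known blur kernel (psf).
    Each iteration multiplies the estimate by the back-projected ratio of
    the observed image to the current estimate's re-blurred version."""
    est = np.full_like(observed, observed.mean())
    psf_flip = psf[::-1, ::-1]  # flipped psf implements correlation
    for _ in range(iters):
        conv = ndimage.convolve(est, psf, mode="reflect")
        ratio = observed / np.maximum(conv, 1e-12)  # avoid division by zero
        est *= ndimage.convolve(ratio, psf_flip, mode="reflect")
    return est

# Simulate horizontal motion blur with a 1x5 box kernel, then deconvolve.
psf = np.ones((1, 5)) / 5.0
rng = np.random.default_rng(1)
sharp = rng.random((32, 32))
blurred = ndimage.convolve(sharp, psf, mode="reflect")
restored = richardson_lucy(blurred, psf)
```

With the correct kernel and little noise, the restored image moves measurably closer to the original than the blurred input, which is what lowers the minimum readable letter height.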