QR-code localization — the important recognition step that has been neglected
We are sure that there is not a single Habr reader today who doesn’t know what a QR-code is. These two-dimensional barcodes are literally everywhere. Naturally, there are plenty of tools around the world that let you add QR-code support to your project. The catch is that the result depends directly on the quality of the tools you use. This leads to the classic dilemma: either the problem is solved (very) well but at a (very) high cost, or it is solved for free but the quality is so-so. Can the free option be refined so that the problem is solved well? If you are curious, keep reading!
QR-code recognition in an image is a well-studied problem in machine vision. First, the QR-code itself was designed from the start to be convenient to recognize. Second, the problem splits into a few clear sub-tasks: QR-code localization, QR-code orientation, and QR-code decoding. Excellent freely available libraries already solve the last two: orientation and decoding. Their only downside is that, for reliable decoding, they expect a clean binary image of the QR-code as input. The localization problem, by contrast, receives little attention.
We have learned from our own experience that the more precisely you locate the object to be recognized, the easier it is to choose the right pre-processing tools and to perform the recognition itself. So if you want to improve QR-code recognition quality in your project, start by improving QR-code localization. Even if you still have to binarize the image afterwards, binarizing only the area that contains the barcode is far more efficient, in both computation and quality, than binarizing the entire image.
In this article we’ll explain how you can improve QR-code localization quality using classic image processing methods, and share the figures that demonstrate the effectiveness of our algorithm.
We’ll talk about our own QR-code localization method, which is based on a modified Viola-Jones object detection framework.
Background: the article’s main concepts
In this section we describe the main QR-code characteristics that our localization method relies on, and briefly outline the original Viola-Jones object detection method.
QR-code
QR-code (short for Quick Response Code) is a two-dimensional barcode that was developed in Japan in the mid-1990s for the automotive industry. Thanks to its fast readability and its larger storage capacity compared to standard one-dimensional barcodes, the QR-code became popular in many fields all over the world.
Unlike standard one-dimensional barcodes, which are usually scanned with specialized hardware, a QR-code is usually scanned with a camera. The QR-code structure is fully defined by the ISO/IEC 18004 standard. To make robust recognition possible, a QR-code contains several reference elements that form its function patterns: three squares in the corners of the code (finder patterns) and smaller synchronizing squares distributed across the code (alignment patterns). These reference elements make it possible to normalize the image size and orientation.
Although all QR-codes look alike, different types of QR-codes can have a different layout of elements depending on the amount of encoded data. In addition, so-called designer QR-codes are becoming increasingly popular: they replace part of the redundant data, which normally ensures better recognition quality, with other graphic elements (logos, emblems, signs, etc.). All these QR-code specifics have to be considered when developing localization and recognition methods.
The Viola-Jones method
Everyone and their dog has already written about the Viola-Jones method here on Habr, and we have done it a few times ourselves. Still, we consider it necessary to recap the main points of the method in a few words.
The Viola-Jones object detection method was developed for real-time face detection in images. It reduces detection to binary classification at every image position: each rectangular subwindow, taken at every shift and every scale, is checked for the presence of the object by a pre-trained classifier.
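The exhaustive scan over shifts and sizes can be sketched in a few lines of Python. This is only an illustration: the function name `sliding_windows`, the base window size, the scale step, and the stride are all assumptions chosen for the example, not parameters of our detector.

```python
def sliding_windows(image_w, image_h, base=24, scale_step=1.25, stride_frac=0.1):
    """Yield (x, y, size) for square subwindows at every scale and shift.

    All default values are illustrative assumptions.
    """
    size = base
    while size <= min(image_w, image_h):
        # The shift grows with the window so that large windows move faster.
        stride = max(1, int(size * stride_frac))
        for y in range(0, image_h - size + 1, stride):
            for x in range(0, image_w - size + 1, stride):
                yield x, y, size
        # Always grow by at least one pixel so the loop terminates.
        size = max(size + 1, int(round(size * scale_step)))


if __name__ == "__main__":
    count = sum(1 for _ in sliding_windows(640, 480))
    print(f"{count} subwindows to classify in a 640x480 image")
```

Every yielded subwindow would be passed to the classifier, which is why the per-window cost has to be tiny.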
The Viola-Jones framework uses rectangular Haar-like features as its feature space. The value of such a feature is the difference between the sums of pixel brightness values within adjacent rectangles. To calculate Haar feature values efficiently, the method uses an integral image, also known as a summed-area table. Each Haar feature is associated with a binary “weak” classifier h(x): X → {-1, +1}. This classifier is usually a decision tree with a single split (a decision stump); in the standard formulation, with v(x) denoting the value of the feature on subwindow x, it reads:

$$h(x) = \begin{cases} +1, & p\,v(x) < p\,\theta \\ -1, & \text{otherwise} \end{cases}$$

where θ and p are the feature threshold and the classifier parity, respectively. The next step is to construct a “strong” classifier as a linear combination of these “weak” classifiers, which is done with the AdaBoost machine learning algorithm. The high speed of the Viola-Jones framework comes from using a cascade of “strong” classifiers, which rejects “empty” image areas (those containing no object) after only a few calculations.
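To make these building blocks concrete, here is a minimal Python sketch of an integral image, a two-rectangle Haar-like feature, and the weak and strong classifiers. It is an illustration rather than our production code: the function names, the particular two-rectangle layout, and the convention that the stump answers +1 when p·v < p·θ (v being the feature value) are assumptions, and the weights `alphas` are presumed to come from AdaBoost training.

```python
import numpy as np


def integral_image(img):
    """Summed-area table with an extra zero row and column at the top/left."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in a w x h rectangle at (x, y) using four lookups."""
    return int(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])


def haar_two_rect(ii, x, y, w, h):
    """Left half minus right half of a (2*w) x h window (one possible layout)."""
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)


def weak_classifier(feature_value, theta, p):
    """Decision stump: +1 if p * v < p * theta, otherwise -1 (assumed convention)."""
    return 1 if p * feature_value < p * theta else -1


def strong_classifier(feature_values, stumps, alphas):
    """Sign of the weighted vote of the weak classifiers."""
    score = sum(
        alpha * weak_classifier(value, theta, p)
        for value, (theta, p), alpha in zip(feature_values, stumps, alphas)
    )
    return 1 if score >= 0 else -1
```

Because `rect_sum` needs only four table lookups, the cost of a feature does not depend on the size of its rectangles, which is what makes scanning every subwindow affordable.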
QR-code detection algorithm
When designing our QR-code localization approach, we were guided by the following requirements. First, the algorithm has to be fast enough to be used in real-time recognition systems. Second, it has to be robust to possible distortions of the barcode in the image. Third, it has to account for the existing variability of QR-codes.
As mentioned above, we used the Viola-Jones object detection framework as the underlying method. It has proved effective in various rigid object detection problems while providing high performance. However, we can’t use the Viola-Jones framework in its original form, for the following reasons:
- the original Viola-Jones method uses Haar features that “highlight” the texture of an object; a QR-code, however, is made up of black and white elements whose distribution varies greatly from one barcode to another.
- the original method is designed to detect uniform objects in a predetermined orientation, which does not hold in our case.
To apply the Viola-Jones method to our problem, we use our own family of edge Haar-like features and a cascade classifier organized as a decision tree. The first modification lets us focus on the edge characteristics of an object rather than its texture. The second modification lets us build a single classifier that is able to detect different variations of an object. Let’s look at each modification in more detail.
Gradient Haar-like features for QR-code scanning
We used a special family of gradient Haar features to construct an effective QR-code detector. These are rectangular Haar-like features calculated over a map of directed edges, which makes their generalizing power much stronger.
The map of directed edges is an image of absolute gradient values that takes into account the predominant gradient orientation at each point (x, y); the orientation is obtained by discretizing the edge direction into horizontal, vertical, +45° and -45°. To construct the QR-code detector we used two types of maps of directed edges: a map of straight edges and a map of diagonal edges.
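Let’s suppose we have an original image f(x,y). Then we can approximate its derivatives along the horizontal and vertical directions using the Sobel operator, where * denotes filtering the image with the given 3×3 mask:

$$g_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} * f, \qquad g_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix} * f$$

In addition, using g_x and g_y we can calculate the direction of the gradient at each point of the image:

$$\varphi(x,y) = \arctan\frac{g_y(x,y)}{g_x(x,y)}$$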
A map of straight edges consists mostly of horizontal and vertical edges and is calculated using the following formula:
A map of diagonal edges consists predominantly of the edges along the diagonals and is calculated using the following formula:
The rectangular Haar features are calculated over the constructed map of directed edges (diagonal or straight). Unlike the classic Haar features, these edge features generalize well to objects with a large number of edges.
Figure. Illustration of maps of directed edges: a) the original QR-code image, b) the straight edge map, c) the rotated QR-code image, d) the diagonal edge map of the rotated QR-code.
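The idea of the two maps can be sketched in Python as follows. Please treat this as one plausible reading, not as our exact formulas: the sketch keeps the gradient magnitude where the gradient direction is closer to horizontal or vertical (straight map) or closer to ±45° (diagonal map) and writes zero elsewhere. The 22.5° split, the border handling, and all function names are assumptions made for the example.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T


def filter3x3(img, kernel):
    """Apply a 3x3 mask to the image, replicating the border pixels."""
    padded = np.pad(img.astype(np.float64), 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def directed_edge_maps(img):
    """Return (straight, diagonal) edge maps of a grayscale image."""
    gx = filter3x3(img, SOBEL_X)
    gy = filter3x3(img, SOBEL_Y)
    magnitude = np.hypot(gx, gy)
    # Gradient direction folded into [0, 180) degrees.
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    # Angular distance to the nearest axis: 0 = horizontal/vertical, 45 = diagonal.
    off_axis = np.abs((angle + 45.0) % 90.0 - 45.0)
    # Assumed split: directions within 22.5 degrees of an axis count as straight.
    straight = np.where(off_axis <= 22.5, magnitude, 0.0)
    diagonal = np.where(off_axis > 22.5, magnitude, 0.0)
    return straight, diagonal
```

With this split, an upright QR-code lights up mostly in the straight map, while the same code rotated by 45° lights up mostly in the diagonal map, which matches the behavior shown in the figure.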
A decision tree of strong classifiers
A decision tree of strong classifiers is a binary decision tree in which each node is a strong classifier: the right branch of a node receives the subwindows that presumably contain an object, and the left branch receives the ones that were not recognized as an object. The final answer is given only in the leaves. The classic cascade classifier described in the original Viola and Jones paper is a special case of such a tree, with only one “positive” output (leaf) and many “negative” outputs.
It can be shown that any path from the root to a leaf of a decision tree-like classifier can be represented as a cascade in which some strong classifiers have their answer inverted. Consequently, a decision tree-like classifier can be trained with an algorithm that applies the training procedure of a classic cascade classifier to individual paths.
A decision tree-like classifier allows us to train more effective (in terms of recall) classifiers for variable objects compared to classic cascade classifiers.
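The difference between a cascade and a tree of strong classifiers is easy to see in a small Python sketch. The names `Node`, `evaluate`, and `cascade` are hypothetical, and the "strong classifiers" below are toy functions on numbers rather than trained detectors; the sketch only illustrates how a subwindow travels through the structure.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Node:
    """A strong classifier plus where to go for each of its answers."""
    classify: Callable[[float], bool]  # True means "presumably an object"
    right: Union["Node", bool]         # followed when the classifier accepts
    left: Union["Node", bool]          # followed when the classifier rejects


def evaluate(tree, window):
    """Walk from the root to a leaf; only the leaf gives the final answer."""
    node = tree
    while not isinstance(node, bool):
        node = node.right if node.classify(window) else node.left
    return node


def cascade(stages):
    """A classic cascade: every rejection is final, one positive leaf at the end."""
    tree = True
    for stage in reversed(stages):
        tree = Node(stage, right=tree, left=False)
    return tree


if __name__ == "__main__":
    # Toy tree: the left branch of the root gets a second chance to accept.
    tree = Node(
        lambda v: v > 0,
        right=Node(lambda v: v > 10, right=True, left=False),
        left=Node(lambda v: v < -10, right=True, left=False),
    )
    print(evaluate(tree, 20), evaluate(tree, -20), evaluate(tree, 5))
```

In the toy tree, a window rejected by the root can still be accepted further down the left branch, which is how a tree can cover object variants that a single cascade would discard.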
Experimental results of QR-code detection
For the experiments, we prepared a dataset of 264 images to assess the effectiveness of our algorithm. The image resolution was about 1 MP. Each image contained exactly one QR-code in an arbitrary orientation, and the barcode occupied at least 10% of the image area. Some examples of images from this set are shown below.
The dataset was split into 88 training images and 176 test images.
The training images were used to produce both positive and negative examples. Since there were far fewer positive examples initially, we applied data augmentation: for example, rotation around the barcode center in increments of 15°. After augmentation, the number of positive examples was 2088.
Using the same positive and negative examples we trained three QR-code detectors: the classic cascade classifier with standard Haar features, the classic cascade classifier with edge Haar features, and the decision tree-like classifier with edge Haar features. The first cascade classifier consisted of 12 levels and a total of 58 features. The second cascade classifier consisted of 8 levels and a total of 39 features. The trained decision tree-like classifier consisted of 39 nodes and a total of 110 features, and its longest path from root to leaf had length 9. An example of a trained decision tree-like classifier is shown below.
To assess the quality of the trained QR-code detectors, we used the barcode decoding module from the open-source computer vision library OpenCV. On the test set (176 images, as mentioned above), we ran the decoding module with no pre-processing, and also after a preliminary QR-code search with each of the trained detectors. The barcode decoding results are presented below:
| No. | Experiment | Decoded images | Decoding quality |
|---|---|---|---|
| 1 | Only OpenCV | 104 | 59.09% |
| 2 | VJ (Grayscale Features, Cascade Classifier) + OpenCV | 105 | 59.66% |
| 3 | VJ (Edge Features, Cascade Classifier) + OpenCV | 123 | 69.89% |
| 4 | VJ (Edge Features, Tree Classifier) + OpenCV | 136 | 77.27% |
We can see from the table that preliminary QR-code localization with the described method considerably improves the barcode decoding quality (the number of decoding errors decreased by 44%). In addition, the results show that the original Viola-Jones framework (with standard Haar features and a cascade classifier) is not effective for the QR-code localization problem.
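For readers who want to check the arithmetic, the percentages and the 44% figure follow directly from the decoded image counts and the test set size of 176. The short Python snippet below reproduces them; the function names are ours, introduced only for this check.

```python
TEST_IMAGES = 176  # size of the test set

# Decoded image counts taken from the results table.
decoded = {
    "Only OpenCV": 104,
    "VJ (Grayscale Features, Cascade Classifier) + OpenCV": 105,
    "VJ (Edge Features, Cascade Classifier) + OpenCV": 123,
    "VJ (Edge Features, Tree Classifier) + OpenCV": 136,
}


def decoding_quality(decoded_count, total=TEST_IMAGES):
    """Share of successfully decoded images, in percent."""
    return 100.0 * decoded_count / total


def error_reduction(baseline_count, improved_count, total=TEST_IMAGES):
    """Relative decrease in the number of undecoded images, in percent."""
    baseline_errors = total - baseline_count
    improved_errors = total - improved_count
    return 100.0 * (baseline_errors - improved_errors) / baseline_errors


if __name__ == "__main__":
    for name, count in decoded.items():
        print(f"{name}: {decoding_quality(count):.2f}%")
    print(f"Error reduction: {error_reduction(104, 136):.1f}%")
```

Without localization 72 of 176 images stay undecoded; with the tree classifier only 40 do, and (72 - 40) / 72 is about 44%.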
Now let’s look at how accurately each classifier locates a QR-code. The image on the left shows the localization results for the same barcode obtained with the classic cascade classifier with standard Haar features, the classic cascade classifier with edge features, and the tree-like classifier with edge features. The tree-like classifier provides the most accurate QR-code localization because it accounts for QR-code variability.
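The "locate first, then decode" pipeline from the experiment can be sketched as follows. This is a hedged illustration: we assume the located region arrives as a bounding box (x, y, w, h), we assume a 10% margin around it, and we assume that OpenCV's `cv2.QRCodeDetector` class is used for decoding; the article does not pin down any of these details, and the helper names are hypothetical.

```python
import numpy as np


def crop_with_margin(image, box, margin=0.1):
    """Crop box = (x, y, w, h), enlarged by `margin` per side and clipped to the image."""
    x, y, w, h = box
    dx, dy = int(round(w * margin)), int(round(h * margin))
    top, left = max(y - dy, 0), max(x - dx, 0)
    bottom = min(y + h + dy, image.shape[0])
    right = min(x + w + dx, image.shape[1])
    return image[top:bottom, left:right]


def decode_qr(image, box=None):
    """Decode a QR-code with OpenCV, optionally only inside a located region."""
    import cv2  # imported lazily so the cropping helper works without OpenCV

    region = image if box is None else crop_with_margin(image, box)
    text, _points, _rectified = cv2.QRCodeDetector().detectAndDecode(region)
    return text or None  # OpenCV returns an empty string when decoding fails
```

Calling `decode_qr(image)` corresponds to the "Only OpenCV" row of the table, while `decode_qr(image, box)` with a box from one of the detectors corresponds to the other rows.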
Conclusion
These days QR-codes are used in many areas of life: in advertising to encode URLs, in the public sector to provide e-services, and so on. Although QR-code scanning is extremely widespread today, the existing open-source libraries focus primarily on QR-code decoding and neglect the localization problem. To be honest, the real goal of this article was not so much to describe an effective QR-code localization method as to show you, dear reader, how free libraries can be brought almost to an industrial level with the help of scientific thinking, systems analysis, and a working knowledge of classic digital image processing tools.