What is a machine-readable zone, and how do machines read it?
The MRZ, or Machine-Readable Zone, is a codified element of identity documents. Its purpose is to make automated scanning of the document holder’s basic personal details easier: their full name, document number, nationality, date of birth, and the document expiration date. Today the MRZ can be found on the photo page of any international passport, as well as on many types of ID cards, residence permits, visas, and more. A document that contains an MRZ is referred to as a Machine Readable Travel Document, or MRTD for short.
Today, automatic document analysis and recognition technologies have progressed far enough for the MRZ to be read efficiently and easily on consumer devices such as smartphones. For high-precision MRZ scanning, Smart Engines has developed the Smart Code Engine product.
The purpose of MRZ
The MRZ was introduced in the 1980s to facilitate and speed up identity checks in places with controlled access. Take an airport, for example: before boarding the plane, each traveler needs to be identified, and more than once. Their travel document is checked at the check-in desk and then at various control points. To make this work, a well-coordinated, orderly, and stable security organization has to be in place. Since its introduction, the MRTD has proved to be an effective part of the identification process in cross-border travel, and it has been universally recognized as such.
Eventually, large business centers and other places with restricted access and a continuous flow of people adopted the practice of automated identification. Today, even small offices and health clinics use automatic document check systems, for both security and convenience.
The machine-readable zone serves multiple purposes. Firstly, the MRZ presents the personal details of the passport’s owner in a standardized format, so that they can be quickly recognized and registered by a special scanner. Secondly, the international standardization of the MRZ format allows officials from multiple countries to quickly decode and verify the personal data of citizens of other countries. Thirdly, the MRZ provides an additional level of security for the encoded personal data, as it contains check digits and syntactic rules which help prevent some forgery attempts. In addition, MRZ scanning allows quick access to the RFID chip placed inside biometric identity documents, such as biometric passports. This chip contains extended information about the holder of the document, and it can be accessed only after providing the passport number, date of birth, and the passport’s expiration date.
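To make the chip access step more concrete, here is a minimal Python sketch of how the three MRZ fields can be turned into an access key seed. The article does not name the protocol. The derivation shown (SHA-1 over the three fields with their check digits, keeping the first 16 bytes) is the Basic Access Control scheme of ICAO Doc 9303 as we understand it, not a description of any particular reader. The function names are hypothetical, and the sample line is the widely published specimen passport of the fictional state Utopia.

```python
import hashlib


def mrz_information(bottom_line: str) -> str:
    """Join the three check-digit-protected fields of a passport's bottom MRZ line."""
    if len(bottom_line) != 44:
        raise ValueError("a passport MRZ line must be 44 characters long")
    document_number = bottom_line[0:10]  # 9 characters + check digit
    birth_date = bottom_line[13:20]      # YYMMDD + check digit
    expiry_date = bottom_line[21:28]     # YYMMDD + check digit
    return document_number + birth_date + expiry_date


def access_key_seed(bottom_line: str) -> bytes:
    """First 16 bytes of SHA-1 over the MRZ information (assumed BAC key seed)."""
    digest = hashlib.sha1(mrz_information(bottom_line).encode("ascii")).digest()
    return digest[:16]


specimen = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"
print(mrz_information(specimen))        # the 24 characters fed into the hash
print(access_key_seed(specimen).hex())  # 16-byte seed, printed as hex
```

A real reader derives session keys from this seed and then runs a challenge-response exchange with the chip. That part is out of scope here.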
There are currently several types of ICAO standard machine-readable zones, which vary in the number of lines and characters in each line:
- TD-1 (e.g. citizen’s identification card, EU ID cards, US Green Card): consists of 3 lines, 30 characters each.
- TD-2 (e.g. Romania ID, old type of German ID), and MRV-B (machine-readable visas type B — e.g. Schengen visa): consists of 2 lines, 36 characters each.
- MRP (all international passports, also known as TD-3), and MRV-A (machine-readable visas type A — issued by the USA, Japan, China, and others): consists of 2 lines, 44 characters each.
Technically, only the documents covered by the ICAO standard contain what we call an MRZ. Other documents might also have machine-readable zones, but these may deviate from the ICAO standard in both the number of lines and the content. Such MRZ-like codes can be found in some national ID cards, driver’s licenses, vehicle registration certificates, and other documents.
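Since the ICAO formats listed above differ in the number of lines and the line length, a first guess at the MRZ type can be made from its shape alone. The sketch below is a hypothetical helper, not part of any SDK. It uses only the dimensions given above, plus the leading V that marks visa documents, and returns None for anything that does not match, such as the MRZ-like codes of non-ICAO documents.

```python
from typing import Optional

# (number of lines, characters per line) -> (non-visa format, visa format)
ICAO_SHAPES = {
    (3, 30): ("TD-1", None),
    (2, 36): ("TD-2", "MRV-B"),
    (2, 44): ("TD-3", "MRV-A"),
}


def classify_mrz(text: str) -> Optional[str]:
    """Guess the ICAO MRZ format from the line count and line length."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    lengths = {len(line) for line in lines}
    if len(lengths) != 1:
        return None  # empty input, or lines of different lengths
    shape = ICAO_SHAPES.get((len(lines), lengths.pop()))
    if shape is None:
        return None  # not one of the ICAO shapes
    document_format, visa_format = shape
    if visa_format is not None and lines[0].startswith("V"):
        return visa_format
    return document_format


passport = "\n".join([
    "P<UTOERIKSSON<<ANNA<MARIA".ljust(44, "<"),
    "L898902C36UTO7408122F1204159ZE184226B<<<<<10",
])
print(classify_mrz(passport))
```

Shape alone cannot confirm that a code is valid. It only tells a parser which field layout to try next.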
MRZ on a passport
A national passport is a document that allows cross-border travel, so it has to be recognized equally well in every modern airport in the world. This means the content and the structure of the identity page have to be standardized.
The identity page of the passport consists of two parts: the Visual Inspection Zone (VIZ) and the Machine Readable Zone (MRZ). The VIZ provides the personal details of the passport owner, their photo, and the passport details, displayed in a format understandable to a human. The MRZ is located at the bottom of the page. Its content corresponds to the VIZ fields, but it is meant to be read by a machine.
The MRZ of a passport always consists of two lines of characters which, as mentioned above, correspond to the following data from the VIZ:
– Document code
– State code, or code of the government agency (organization) that issued the passport
– Full Name
– Document number
– Nationality
– Date of birth
– Sex
– Passport’s expiration date
– Other data provided at the discretion of the issuing authority (this may include national ID number, issue date, etc.)
The MRZ on a passport also contains several check digits, which make it possible to detect crude attempts at document falsification, as well as some machine recognition errors.
Now, based on the example of a national passport, let us take a closer look at the MRZ composition.
The top line of the passport’s MRZ
The first character indicates the type of document: P means a machine-readable passport (as opposed to, for example, V in the MRV-A and MRV-B types of MRZ, which corresponds to a visa). The issuing state or organization can use the second character to specify the passport type more precisely (civil, official, diplomatic, service, etc.). If the passport type is not specified, a placeholder (<) is inserted instead.
The following three characters indicate the country that issued the passport, in accordance with ISO 3166-1 alpha-3 with some minor exceptions, or the organization that is authorized to issue passports and other machine-readable documents (for example, UN, Interpol, EU Council).
The next 39 characters of the first line provide the name of the passport’s owner. First comes the primary identifier, or the last name. If the last name consists of several words, a placeholder (<) is used between them. Punctuation marks used in the VIZ, such as hyphens, apostrophes, and commas, do not appear in the machine-readable lines: a hyphen or a comma between name parts is replaced with a placeholder, while an apostrophe is simply omitted.
In the machine-readable zone, the last name is separated from the given name(s) with two placeholder characters (<<). As with the last name, several given names, or given names that consist of several words, are separated by single placeholder characters.
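The layout of the top line described above maps directly onto fixed character positions, so it can be split with plain string slicing. The following is a minimal sketch in Python. The type and function names are hypothetical, and the sample is the widely published specimen passport of the fictional state Utopia.

```python
from typing import NamedTuple


class TopLine(NamedTuple):
    document_code: str
    issuer: str
    surname: str
    given_names: str


def parse_passport_top_line(line: str) -> TopLine:
    """Split the 44-character top line of a passport MRZ into its fields."""
    if len(line) != 44:
        raise ValueError("a passport MRZ line must be 44 characters long")
    document_code = line[0:2].rstrip("<")  # "P" plus an optional subtype letter
    issuer = line[2:5].rstrip("<")         # three-letter state or organization code
    name_field = line[5:44]                # 39 characters for the name
    # The first "<<" separates the last name from the given names.
    primary, _, secondary = name_field.partition("<<")
    surname = primary.replace("<", " ").strip()
    given_names = secondary.replace("<", " ").strip()
    return TopLine(document_code, issuer, surname, given_names)


specimen = "P<UTOERIKSSON<<ANNA<MARIA".ljust(44, "<")
print(parse_passport_top_line(specimen))
```

Note that the conversion is lossy: a placeholder inside a name may stand for a space or a hyphen in the VIZ, so the parsed name cannot always be restored to its printed form.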
The number of characters per line is limited. For a passport, each MRZ line must contain exactly 44 characters. Therefore, if the full name is too long to fit into one line, the given name gets abbreviated, as it is the secondary identifier, while the last name is the primary one.
In a machine-readable zone, only Latin characters without diacritics are used, thus specific transliteration rules have to be applied to names which are written with diacritical marks or using other alphabets.
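The opposite direction, building the 39-character name field from a printed name, can be sketched as follows. This is a deliberate simplification and not the official procedure: diacritics are stripped with Unicode decomposition instead of the transliteration tables of ICAO Doc 9303, apostrophes are dropped (our reading of Doc 9303), and an overlong name is simply cut at 39 characters, whereas real issuing rules for truncation are more nuanced. The function names are hypothetical.

```python
import re
import unicodedata

NAME_FIELD_LENGTH = 39


def to_mrz_letters(text: str) -> str:
    """Reduce a printed name to A-Z with '<' between its parts (simplified)."""
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop combining marks, so that "í" becomes "i" and "ü" becomes "u".
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    plain = plain.upper().replace("'", "").replace("\u2019", "")
    # Spaces, hyphens, and commas between name parts become one placeholder.
    plain = re.sub(r"[\s,\-]+", "<", plain.strip()).strip("<")
    if not re.fullmatch(r"[A-Z<]*", plain):
        raise ValueError(f"cannot represent {text!r} without a transliteration table")
    return plain


def encode_name_field(surname: str, given_names: str) -> str:
    """Build the name field: last name, '<<', given names, padded with '<'."""
    field = to_mrz_letters(surname) + "<<" + to_mrz_letters(given_names)
    return field[:NAME_FIELD_LENGTH].ljust(NAME_FIELD_LENGTH, "<")


print(encode_name_field("Eriksson", "Anna María"))
print(encode_name_field("Smith-Jones", "Mary Ann"))
```

Names in non-Latin scripts raise an error here on purpose, because they need a proper transliteration table rather than a guess.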
The bottom line of the passport’s MRZ
The first 9 characters of the second line of the passport’s machine-readable zone are the document number. Most countries that use machine-readable zones in their documents issue passport numbers that fit this 9-character form, but in some cases the number may be longer or shorter. If the number is longer, the characters that do not fit in the allotted 9 places go into the “optional data” zone; if it is shorter, the unused places are filled with placeholders. The 10th character is a check digit that verifies the correctness of the number and is calculated from the first 9 characters using a special algorithm.
The following three characters indicate the citizenship of the passport holder. The citizenship code is written in the ISO 3166-1 alpha-3 international format (with some minor exceptions); there are additional codes, such as XXA for stateless persons, or XXB and XXC for refugees.
The next 6 digits are the date of birth in the YYMMDD format, followed by a check digit calculated from the date of birth with the same algorithm. The next character indicates the gender of the passport holder: male (M), female (F), or a placeholder (<) if the gender is not specified. The next 6 digits indicate the expiration date of the passport in the YYMMDD format, followed by its check digit.
The next 14 characters represent optional data at the discretion of the issuer, such as a personal number, followed by a check digit. If there is no personal number and no other information, the entire field is filled with placeholders, and its check digit is given either as 0 or as a placeholder.
The last character on the bottom line of the passport’s MRZ is a check digit calculated over all the characters in the bottom line, except for those indicating the gender and citizenship.
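The article only refers to “a special algorithm” for the check digits. In ICAO Doc 9303 it is a weighted sum: digits keep their value, letters A to Z count as 10 to 35, the placeholder counts as 0, the weights 7, 3, 1 repeat along the field, and the check digit is the sum modulo 10. The sketch below applies it to the bottom line of a passport. The function names and the returned structure are hypothetical, and the sample is the widely published Utopia specimen.

```python
WEIGHTS = (7, 3, 1)


def char_value(ch: str) -> int:
    """Numeric value of one MRZ character for check digit purposes."""
    if ch == "<":
        return 0
    if "0" <= ch <= "9":
        return ord(ch) - ord("0")
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    raise ValueError(f"character not allowed in an MRZ: {ch!r}")


def check_digit(data: str) -> str:
    """Weighted sum of the character values (weights 7, 3, 1), modulo 10."""
    total = sum(char_value(ch) * WEIGHTS[i % 3] for i, ch in enumerate(data))
    return str(total % 10)


def parse_passport_bottom_line(line: str):
    """Return (fields, checks) for the 44-character bottom line of a passport."""
    if len(line) != 44:
        raise ValueError("a passport MRZ line must be 44 characters long")
    optional = line[28:42]
    fields = {
        "document_number": line[0:9].rstrip("<"),
        "nationality": line[10:13].rstrip("<"),
        "birth_date": line[13:19],   # YYMMDD
        "sex": line[20],             # M, F, or <
        "expiry_date": line[21:27],  # YYMMDD
        "optional_data": optional.rstrip("<"),
    }
    # The overall check digit skips the citizenship (10-12) and gender (20).
    composite = line[0:10] + line[13:20] + line[21:43]
    checks = {
        "document_number": line[9] == check_digit(line[0:9]),
        "birth_date": line[19] == check_digit(line[13:19]),
        "expiry_date": line[27] == check_digit(line[21:27]),
        # An empty optional field may carry "<" instead of "0".
        "optional_data": line[42] == check_digit(optional)
        or (line[42] == "<" and optional == "<" * 14),
        "composite": line[43] == check_digit(composite),
    }
    return fields, checks


specimen = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"
fields, checks = parse_passport_bottom_line(specimen)
print(fields)
print(checks)
```

A failed check does not say whether the document was altered or a character was misread. It only signals that the field should not be trusted as recognized.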
Other documents with MRZ
The range of identity documents that need a high level of security and support for automated scanning extends beyond international passports. The global economy also requires the processing of national identity cards, driver’s licenses, visas, and residence documents. For this reason, a vast majority of identity documents worldwide include a machine-readable zone, in compliance with international conventions. In 2019 the European Union strengthened its regulation of identity documents of EU citizens, their family members, and non-nationals. In addition to a wide range of procedural and physical document security requirements, it enforces the use of the MRZ on identity cards and residence permits as a functional and standardized way of document data entry and verification. A similar regulation with regard to visa documents gives hope that having the MRZ on an official travel document will remain a global requirement.
How machines read it
Specialized scanners, such as those you can encounter in airport and access control terminals, typically work with documents of a single specific type (such as passports). They take advantage of controlled capturing conditions, a known MRZ position in the captured image, and a known MRZ subtype.
In more general-purpose scanning software, such as Smart Code Engine, the scanning is performed on a user device, most frequently on a live camera feed, so the lighting conditions, camera angle, and other capturing parameters can be unknown. Nevertheless, the MRZ has a characteristic periodic structure, which allows mobile software solutions to employ computationally efficient computer vision methods to detect it in the frame even under severe perspective distortions. After the MRZ is detected and the localized MRZ image is rectified, the high-precision and resource-efficient Green OCR technology can be used to segment the MRZ lines into individual characters, which can then be individually classified. Then, using prior knowledge about the standardized syntax and semantics of the MRZ types specified by ICAO Doc 9303, as well as known deviations from the standards, the recognized information undergoes filtering, post-processing, possible correction, and verification before it is returned to the SDK caller.
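To illustrate what syntax-aware post-processing can look like, here is a small Python sketch that repairs a numeric field, such as a date, after character classification. It is not the method used in Smart Code Engine, only a generic illustration: letters that OCR commonly confuses with digits are mapped back, and the result is accepted only if it matches the field’s check digit. The confusion table and function names are assumptions made for this example. The weights 7, 3, 1 are those of the ICAO Doc 9303 check digit.

```python
from typing import Optional

# Letters that are easily mistaken for digits (an illustrative, incomplete table).
LOOKALIKES = {"O": "0", "Q": "0", "D": "0", "I": "1", "L": "1",
              "Z": "2", "S": "5", "G": "6", "B": "8"}

WEIGHTS = (7, 3, 1)


def numeric_check_digit(digits: str) -> str:
    """ICAO-style check digit for a field that contains digits only."""
    total = sum(int(ch) * WEIGHTS[i % 3] for i, ch in enumerate(digits))
    return str(total % 10)


def repair_numeric_field(raw: str, raw_check_digit: str) -> Optional[str]:
    """Map look-alike letters to digits and keep the result only if it verifies."""
    candidate = "".join(LOOKALIKES.get(ch, ch) for ch in raw)
    expected = LOOKALIKES.get(raw_check_digit, raw_check_digit)
    if not (candidate.isascii() and candidate.isdigit()):
        return None  # something other than a digit look-alike was recognized
    if numeric_check_digit(candidate) != expected:
        return None  # the correction does not satisfy the check digit
    return candidate


print(repair_numeric_field("74O8I2", "2"))  # letter O and letter I misread
print(repair_numeric_field("740813", "2"))  # a wrong digit cannot be repaired
```

The same idea does not carry over to alphanumeric fields such as the document number, where both O and 0 are legitimate. There the check digit can only confirm or reject a reading.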
MRZ recognition technologies
Over the past two decades we at Smart Engines have been evolving our optical character recognition (OCR) technologies, constantly raising their quality through the use of our latest achievements in computational intelligence and deep learning. Using original neural network models, we were able to bring the quality of automatic document recognition to a new level. In terms of MRZ recognition, Smart Engines technology not only handles the formats of the ICAO Doc 9303 international standard, but also supports a number of MRZ-like codes out of the box, such as the ones used in French ID cards, Bulgarian vehicle registration certificates, Swiss driver’s licenses, and more. To learn more about MRZ recognition and other Smart Engines solutions, visit the OCR Engines page.
Smart Code Engine is a software solution designed to quickly and efficiently scan the MRZ and other codified objects. With the Smart Code Engine SDK you can add a state-of-the-art MRZ recognition and parsing module to your application, whether mobile or server-side. The software can detect the MRZ in single photos and scans, as well as in a real-time video feed from a device’s camera, capture the data in unknown and uncontrolled lighting and perspective conditions, accurately read the information, split it into constituent fields, and perform verification checks. If you want to try this functionality out on your device, you can check out the demo applications for iOS and Android, and if you wish to get your hands on the SDK distribution, contact our sales team.