Data augmentation for neural network training: an example for printed character recognition

Creating training samples from natural images

Training examples based on natural images are built from real data. The process consists of the following stages:

  1. collecting graphic data (photographing objects of interest, capturing a video stream with a camera, cropping part of an image from a web page);
  2. filtering: verifying that images meet the requirements (adequate lighting of the objects, presence of the required object, etc.);
  3. preparing annotation tools (custom development or adaptation of existing ones);
  4. annotation (marking quadrangles, character positions, and other image regions of interest);
  5. labelling every image (the label is a letter or the title of the object in the image).
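The result of the pipeline above is a set of labelled regions. As a minimal sketch, one annotated sample could be represented by a record like the following (the field names and the example values are assumptions, not a format prescribed by the article):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class AnnotatedSample:
    """One labelled training example cut from a natural image."""
    image_path: str              # source photo or video frame (hypothetical path)
    quad: List[Tuple[int, int]]  # four corners of the annotated region
    label: str                   # the character (or object title) inside the quad

sample = AnnotatedSample(
    image_path="frames/document_0001.png",
    quad=[(120, 40), (150, 40), (150, 80), (120, 80)],
    label="K",
)
assert len(sample.quad) == 4
```

A collection of such records, one per annotated region, forms the training database described above.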

Creating training samples from artificial images

Another approach is to generate the training data artificially. Several templates, or 'ideal' examples (e.g. sets of fonts), can be used, and the required number of training samples is created by applying various distortions. The following distortions can be used:

  1. geometric (affine, projective, etc.);
  2. brightness and color modification;
  3. background replacement;
  4. distortions typical for the task at hand: light specks, noise, blur, etc.
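To make the idea concrete, here is a minimal sketch of brightness, noise, and blur distortions applied to a grayscale character template using only NumPy (the parameter ranges are illustrative assumptions, not values from the article):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img: np.ndarray) -> np.ndarray:
    """Apply a random brightness shift, additive noise, and a light box blur
    to a grayscale character template (pixel values in [0, 255])."""
    out = img.astype(np.float32)
    out += rng.uniform(-40, 40)                  # brightness modification
    out += rng.normal(0, 8, size=out.shape)      # sensor-like noise
    # 3x3 box blur via edge padding and neighbour averaging
    p = np.pad(out, 1, mode="edge")
    out = sum(p[i:i + out.shape[0], j:j + out.shape[1]]
              for i in range(3) for j in range(3)) / 9.0
    return np.clip(out, 0, 255).astype(np.uint8)

template = np.full((32, 32), 255, dtype=np.uint8)  # blank 'ideal' glyph canvas
distorted = augment(template)
assert distorted.shape == template.shape
```

In practice each call with a different random seed yields a new training sample from the same template, which is how a small set of fonts can be turned into an arbitrarily large training database.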

Creation of artificial training examples based on natural images

A natural extension of the previous method is to generate artificial images from real data instead of templates and 'ideal' initial examples. Adding distortions can substantially improve a recognition system. To decide which distortions should be applied, a part of the real data must be set aside for validation: it is used to identify the most common errors, and images distorted in the corresponding way are then added to the training database.
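The validation-driven selection described above can be sketched as follows. This is a hypothetical illustration: the error categories, the error-to-distortion mapping, and the 5% threshold are assumptions for the example, not part of the article's method.

```python
from collections import Counter

def choose_distortions(validation_errors, threshold=0.05):
    """Map the most frequent error categories observed on the validation set
    to the distortions that imitate them in the training set."""
    error_to_distortion = {          # assumed mapping, one entry per error class
        "acentric": "shift",
        "rotated": "rotation",
        "lines": "extra_lines",
        "highlights": "specular_speck",
    }
    counts = Counter(validation_errors)
    total = max(sum(counts.values()), 1)
    return [error_to_distortion[kind]
            for kind, n in counts.most_common()
            if kind in error_to_distortion and n / total >= threshold]

errors = ["acentric"] * 40 + ["lines"] * 10 + ["rotated"] * 2
print(choose_distortions(errors))   # only 'shift' and 'extra_lines' pass the 5% cut
```

Only the error categories frequent enough to matter trigger new augmentations, which keeps the training database focused on the failures the system actually makes.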

Comparing neural network training quality on natural samples, fully artificial samples, and samples generated from natural ones

To test the approach, a neural network is trained on images of MRZ characters. The MRZ (Machine-Readable Zone) is a field of an identity document formatted in accordance with the international recommendations of Doc 9303, Machine Readable Travel Documents, of the International Civil Aviation Organization. The following quality metrics are measured:

  • percentage of erroneously recognized characters;
  • percentage of fully and correctly recognized fields (an MRZ is considered fully recognized if all its characters are recognized correctly);
  • errors on acentric (off-center) images;
  • errors on rotated images;
  • errors on images with extra lines;
  • errors on images with highlights;
  • errors in other complicated cases.
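The two headline metrics above are straightforward to compute. Here is a minimal sketch, assuming the recognizer's outputs and the ground-truth fields are aligned lists of equal-length strings (the example data is made up for illustration):

```python
def char_error_rate(predicted: list, reference: list) -> float:
    """Percentage of erroneously recognized characters over aligned fields."""
    chars = errs = 0
    for pred, ref in zip(predicted, reference):
        chars += len(ref)
        errs += sum(p != r for p, r in zip(pred, ref))
    return 100.0 * errs / max(chars, 1)

def field_accuracy(predicted: list, reference: list) -> float:
    """Percentage of fully and correctly recognized fields
    (every character in the field must be right)."""
    ok = sum(pred == ref for pred, ref in zip(predicted, reference))
    return 100.0 * ok / max(len(reference), 1)

ref  = ["P<USA", "123456789", "ABC"]
pred = ["P<USA", "123456780", "ABC"]
print(char_error_rate(pred, ref))   # 1 wrong character out of 17
print(field_accuracy(pred, ref))    # 2 of 3 fields fully correct
```

Note how a single wrong character barely moves the character error rate but costs a whole field in the field-level metric, which is why both are tracked.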
The experiment proceeds in the following steps:

  1. Adding a ‘shift’ distortion, which corresponds to errors on ‘acentric’ images.
  2. A series of experiments: training several neural networks.
  3. Quality assessment on a testing set: MRZ recognition quality increased by 9%.
  4. Analysis of the most frequent recognition errors on the validation set.
  5. Adding images with extra lines to the training database.
  6. Another series of experiments.
  7. Testing: MRZ recognition quality on the testing set increased by 3.5%.
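The ‘shift’ distortion from step 1 is simple to implement. As a sketch (the background fill value and shift ranges are assumptions), a grayscale glyph can be translated by a few pixels to imitate the off-center crops seen on ‘acentric’ images:

```python
import numpy as np

def shift(img: np.ndarray, dx: int, dy: int, fill: int = 255) -> np.ndarray:
    """Translate a grayscale character image by (dx, dy) pixels,
    padding the uncovered border with the background value."""
    h, w = img.shape
    out = np.full_like(img, fill)
    src = img[max(-dy, 0):h - max(dy, 0), max(-dx, 0):w - max(dx, 0)]
    out[max(dy, 0):h - max(-dy, 0), max(dx, 0):w - max(-dx, 0)] = src
    return out

glyph = np.zeros((8, 8), dtype=np.uint8)   # an all-ink 8x8 stand-in glyph
shifted = shift(glyph, dx=2, dy=1)
assert shifted[0, 0] == 255   # uncovered corner is background
assert shifted[4, 4] == 0     # translated ink is still present
```

Applying such shifts with random small offsets to each training image teaches the network to tolerate imperfect character centering.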


To sum up, a specific algorithm for obtaining training data must be chosen in each particular case. If no initial data is available, artificial generation must be applied. If real data can be obtained, a training dataset built solely on it can be used. But if the real data is insufficient, or some errors occur only rarely, the most appropriate approach is to augment the set of natural images. In our experience, the latter case occurs most frequently.



Smart Engines

A software development company with a broad expertise in computer vision, document processing and OCR, mobile recognition systems and AI.