Machine Learning Data Capture

The machine learning (ML) mode is a next-generation approach to data capture that relies on an underlying neural network to learn from already labeled documents. It works in combination with the traditional rule-based approach to increase the accuracy of data capture.

The ML mode is enabled at the document type level. Go to Admin panel ‣ Document types ‣ Edit document type and enable Machine learning.


Internally, the ML settings are assigned to the DFD (Document Form Definition), and the DFD is attached to the document type. So when you change the ML settings for a given document type, they also change for any other document types that share the same DFD. You will typically have a one-to-one relationship between DFDs and document types.
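The sharing behaviour described above can be illustrated with a minimal sketch. The class and attribute names here (DFD, DocumentType, ml_enabled) are hypothetical and chosen only for illustration; they are not the product's actual data model or API.

```python
from dataclasses import dataclass


@dataclass
class DFD:
    """Hypothetical stand-in for a Document Form Definition."""
    name: str
    ml_enabled: bool = False  # the ML setting lives on the DFD


@dataclass
class DocumentType:
    """Hypothetical document type that only points at a DFD."""
    name: str
    dfd: DFD

    @property
    def ml_enabled(self) -> bool:
        # The document type has no ML setting of its own; it reads the DFD's.
        return self.dfd.ml_enabled

    def set_ml_enabled(self, value: bool) -> None:
        # Changing the setting through one document type changes the DFD.
        self.dfd.ml_enabled = value


# Two document types share one DFD; a third has its own (the typical case).
shared = DFD("invoice-form")
invoices = DocumentType("Invoices", shared)
credit_notes = DocumentType("Credit notes", shared)
orders = DocumentType("Orders", DFD("order-form"))

invoices.set_ml_enabled(True)
```

In this sketch, enabling ML through "Invoices" also enables it for "Credit notes", because both read the same DFD, while "Orders" is unaffected.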


Machine learning model

The ML model is the training data the system uses to capture data from your documents. You can create many ML models in one system and assign each of them to a different document type, so that each model is responsible for processing a certain document type. However, a single model may also know how to read multiple document types.

An ML model can be periodically upgraded to increase its accuracy based on the latest documents processed in the system and the user’s input in the Verify screen.

Assigning an ML model to a document type

After enabling ML for a document type, you need to assign an ML model. You can:

  • select a model existing in the system (models can be shared among document types)

  • create a new empty model (it will be later trained based on processed documents)

  • upload an existing model

Training the ML model

An ML model can be periodically trained based on the latest processed documents. Two modes are available:

  • Automatic (every 200 documents or 600 pages, whichever comes first)

  • After every N documents (the model is trained after every 3 * N pages; for example, if N is set to 500, the model is trained after 1500 pages are processed)
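The arithmetic behind the two training modes can be sketched as follows. This is an illustration only: the function and constant names are hypothetical, and the sketch assumes that in the "After every N documents" mode only the 3 * N page threshold applies, since that is the only trigger the description states for that mode.

```python
from typing import Optional

# Values taken from the description of the Automatic mode.
AUTO_DOCUMENTS = 200
AUTO_PAGES = 600
# In the "After every N documents" mode, N is converted to pages as 3 * N.
PAGES_PER_DOCUMENT = 3


def page_threshold(n_documents: Optional[int] = None) -> int:
    """Return the number of processed pages after which training starts.

    n_documents=None stands for the Automatic mode (hypothetical convention).
    """
    if n_documents is None:
        return AUTO_PAGES
    if n_documents <= 0:
        raise ValueError("N must be a positive number of documents")
    return PAGES_PER_DOCUMENT * n_documents


def training_due(docs_since_training: int,
                 pages_since_training: int,
                 n_documents: Optional[int] = None) -> bool:
    """Decide whether the model should be trained again."""
    if n_documents is None:
        # Automatic: 200 documents or 600 pages, whichever comes first.
        return (docs_since_training >= AUTO_DOCUMENTS
                or pages_since_training >= AUTO_PAGES)
    # Assumption: only the page count is checked in the N-documents mode.
    return pages_since_training >= page_threshold(n_documents)
```

For example, page_threshold(500) gives 1500, matching the example in the list above, and in the Automatic mode training_due(200, 300) is true because the document limit is reached before the page limit.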

Installing the ML module

The ML module works as part of the Data Capture client. When installing the Automated clients, make sure the Machine learning module is selected.