Data extraction with modules

Data extraction can be configured using pre-built extractor components called 'modules'. Modules provide a quick and easy way to utilise the power and flexibility of Aluma's data extraction capability without needing to build an extractor from scratch.

Modules come pre-configured to extract one or more fields from common document types. For example, the aluma.email_address module contains a field named 'Email', which will find email addresses within the document, and the aluma.invoice_amounts module contains the 'Net Total', 'Tax Total' and 'Gross Total' fields.
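As a rough illustration of how modules relate to fields, the sketch below records the field names listed above in a plain Python dictionary and mimics what an 'Email' field might return. It is not Aluma's API: MODULE_FIELDS, EMAIL_PATTERN and extract_email_field are hypothetical names, and the regular expression is a deliberately simple stand-in rather than the matching logic the real module uses.

```python
import re

# Field names per module, taken from the description above.
# The dictionary itself is a hypothetical illustration, not an Aluma structure.
MODULE_FIELDS = {
    "aluma.email_address": ["Email"],
    "aluma.invoice_amounts": ["Net Total", "Tax Total", "Gross Total"],
}

# Simplified pattern for illustration only (assumption, not Aluma's logic).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_email_field(text: str) -> dict:
    """Return a result keyed by field name, listing every match in the text."""
    return {"Email": EMAIL_PATTERN.findall(text)}


if __name__ == "__main__":
    sample = "Contact sales@example.com or billing@example.org."
    print(extract_email_field(sample))
```

The point of the sketch is only the shape of the result: each module contributes a fixed set of named fields, and each field yields the values it finds in the document text.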

Some modules also expose parameters, which allow their behaviour to be customised for your own use cases. For example, the aluma.purchase_order module exposes the Order number format parameter, which lets you specify a regular expression that is used to search the text of the document. This parameter is optional: if it is unspecified, a default regular expression is used.

Other parameters must be set for the module to function. The aluma.supplier_identity module connects to a database to retrieve information about the supplier that issued an invoice, and it needs a connection string to do so. This is set by the connection_string parameter, which is required because the module cannot function without it.
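The difference between optional and required parameters can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not Aluma's implementation: ParameterSpec and resolve_parameters are hypothetical names, and the default pattern shown for Order number format is a placeholder, since the actual default regular expression is not given here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ParameterSpec:
    """Hypothetical description of one module parameter."""
    name: str
    required: bool
    default: Optional[str] = None


# Parameter names come from the text; the default regex is a placeholder assumption.
PURCHASE_ORDER_PARAMS = [
    ParameterSpec("Order number format", required=False, default=r"PO-\d+"),
]
SUPPLIER_IDENTITY_PARAMS = [
    ParameterSpec("connection_string", required=True),
]


def resolve_parameters(specs: List[ParameterSpec], supplied: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Use supplied values, fall back to defaults, and reject missing required ones."""
    resolved: Dict[str, Optional[str]] = {}
    for spec in specs:
        if spec.name in supplied:
            resolved[spec.name] = supplied[spec.name]
        elif spec.required:
            raise ValueError(f"Missing required parameter: {spec.name}")
        else:
            resolved[spec.name] = spec.default
    return resolved


if __name__ == "__main__":
    # Optional parameter omitted: the default is used.
    print(resolve_parameters(PURCHASE_ORDER_PARAMS, {}))
    # Required parameter omitted: configuration is rejected.
    try:
        resolve_parameters(SUPPLIER_IDENTITY_PARAMS, {})
    except ValueError as err:
        print(err)
```

In this model an omitted optional parameter silently takes its default, while an omitted required parameter fails early, which mirrors the behaviour described for aluma.purchase_order and aluma.supplier_identity.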