Extract invoice data

About this guide

Aluma makes it easy to reliably extract key data from any invoice, regardless of layout. Scanned documents are read automatically, using on-demand OCR where necessary.

You can capture data such as the invoice number, date, invoice amounts and the supplier's tax ID without any configuration.

If you have a database that contains details of your purchase orders and suppliers, you can also capture line items and match supplier and customer identities against your records.

In this guide we'll extract basic invoice data from some example UK invoices, including both scanned documents and digitally created PDFs.

Working through this guide should take about 5 minutes.

Before you begin

Before you start you must have:

  • Installed the Aluma CLI and logged in to connect it to your account
  • Installed the example documents

If you have not completed these steps, follow the Getting started guide and then return to this one.

Extract data from invoices

We'll extract some basic invoice data from the UK invoice example documents in the examples/invoices/uk directory.

Before starting, make sure that you are in the directory where you installed the examples.

Aluma has a built-in invoice data extractor for UK invoices that captures the following basic fields:

  • Invoice Number
  • Purchase Order Number
  • Invoice Date
  • Tax Point
  • Net Total, Tax Total and Gross Total
  • Currency
  • Document Type
  • Supplier Tax ID, IBAN, Bank Code, Bank Account Number and Company Number

Line items and supplier identity

You can also capture line items and match supplier and customer identities against your records if you have a database available with this data.

Let's use this built-in extractor to extract data from the sample invoice documents.

Enter the following command:

aluma extract aluma.invoices.gb examples/invoices/uk/*.* -f csv

The aluma extract command streams results to the console as each file is processed. The files are processed in parallel, so the order of the results may vary from run to run and may not match the order of the input files. You will see output like this:

Filename,Invoice Number,PO Number,Invoice Date,Tax Point,Net Total,Tax Total,Gross Total,Currency,Document Type,Supplier Tax ID,Supplier IBAN,Supplier Bank Code,Supplier Bank Account Number,Supplier Company Number
invoices\007.pdf,2507542,,07/02/2018,07/02/2018,9.95,,9.95,GBP,Invoice,,GB75CHAS60924241033583,609242,41033583,
invoices\004.tif,45111,,29/08/2014,29/08/2014,88.50,17.70,106.20,GBP,Invoice,416673738,,201003,83106543,
invoices\006.tif,OM1204,,01/06/2017,01/06/2017,240.00,48.00,288.00,GBP,Invoice,154439894,,090222,10083798,07690103
invoices\001.tif,69792,,31/01/2017,31/01/2017,75.00,15.00,90.00,GBP,Invoice,770560628,,301355,01050669,04172508
...
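
Because the results can arrive in any order, you may want to sort and sanity-check them before using them elsewhere. The following is a minimal Python sketch, not part of Aluma itself, that parses CSV text in the format shown above, sorts the rows by filename, and checks that Net Total plus Tax Total equals Gross Total. The sample rows are copied from the output above; the function names are illustrative, and treating an empty amount as zero is an assumption made for this example.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal

# Sample rows copied from the example output above. In practice you would
# read the CSV text from wherever you saved the command's output.
SAMPLE = r"""Filename,Invoice Number,PO Number,Invoice Date,Tax Point,Net Total,Tax Total,Gross Total,Currency,Document Type,Supplier Tax ID,Supplier IBAN,Supplier Bank Code,Supplier Bank Account Number,Supplier Company Number
invoices\007.pdf,2507542,,07/02/2018,07/02/2018,9.95,,9.95,GBP,Invoice,,GB75CHAS60924241033583,609242,41033583,
invoices\004.tif,45111,,29/08/2014,29/08/2014,88.50,17.70,106.20,GBP,Invoice,416673738,,201003,83106543,
invoices\006.tif,OM1204,,01/06/2017,01/06/2017,240.00,48.00,288.00,GBP,Invoice,154439894,,090222,10083798,07690103
invoices\001.tif,69792,,31/01/2017,31/01/2017,75.00,15.00,90.00,GBP,Invoice,770560628,,301355,01050669,04172508
"""


def load_invoices(text):
    """Parse CSV text into dicts, sorted by filename for a stable order."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return sorted(rows, key=lambda row: row["Filename"])


def to_decimal(value):
    """Convert an amount to Decimal. Assumption: an empty field means zero."""
    return Decimal(value) if value else Decimal("0")


def totals_consistent(row):
    """Return True if Net Total + Tax Total equals Gross Total."""
    net = to_decimal(row["Net Total"])
    tax = to_decimal(row["Tax Total"])
    return net + tax == to_decimal(row["Gross Total"])


def invoice_date(row):
    """Parse the Invoice Date, which appears as DD/MM/YYYY in the sample."""
    return datetime.strptime(row["Invoice Date"], "%d/%m/%Y").date()


invoices = load_invoices(SAMPLE)
for row in invoices:
    print(row["Filename"], invoice_date(row), totals_consistent(row))
```

Amounts are parsed with Decimal rather than float so that the totals comparison is exact for currency values.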

Invoices from other countries

You can also use Aluma to extract data from invoices originating in other countries. We have extractors for many countries and can make them available on request. Please reach out to us at [email protected] to request access.