Extracting supplier identity from invoices

About this guide

Aluma's invoice data modules allow you to intelligently match invoices against live supplier information stored in a database in order to extract supplier identity fields for each invoice processed.

In this guide we'll configure an Azure SQL database with the appropriate table and some sample data and create a simple extractor that extracts supplier identity fields for a document.

Working through this guide should take about 20 minutes.

📘

MySQL also available

You can also use a MySQL database to hold the supplier information. Please contact us for information on how to do this.

Before you begin

You will need:

  • An Azure SQL database, and a valid connection string to connect to it
  • Sufficient permissions to configure the database parent Azure SQL Server's firewall rules

Overview

The Aluma Supplier Identity extraction module uses a database lookup to retrieve the identity of the supplier from whom an invoice was sent.

In the database, we will create a suppliers table that contains each supplier's identity, name, address and tax ID. The module does a "fuzzy" match between the information in the invoice and the information in the table to identify the appropriate supplier, and returns the supplier identity in extraction result fields.

1 - Create the suppliers table

The first thing we need to do is create a table that will contain each supplier's identity, name, address and tax ID.

We'll call our table suppliers but you can choose a different name if you prefer, or create it as a view on existing data.

Run the following table creation script in a database console, or create a table with the same columns using a graphical editor.

create table dbo.suppliers
(
    ID varchar(100) not null
        constraint suppliers_pk
            primary key nonclustered,
    Name varchar(100),
    Address1 varchar(100),
    Address2 varchar(100),
    Address3 varchar(100),
    Address4 varchar(100),
    PostalCode varchar(20),
    TaxID varchar(20)
)
go

create unique index suppliers_ID_uindex
    on dbo.suppliers (ID)
go

2 - Populate some example supplier data

Now let's insert some sample data that corresponds to the example invoice we'll test with later. You can run this script, or insert the data manually into the database using your database tool.

INSERT INTO suppliers (ID, Name, Address1, Address2, Address3, Address4, PostalCode, TaxID) VALUES ('S001HOT', 'Hot Numbers Coffee Ltd', 'Unit 6 Dales Brewery', 'Gwydir Street', 'Cambridge', null, 'CB1 2LJ', '121026379');

The ID is required for each supplier, but all the other fields are optional. Aluma will use any information available for each supplier to find the appropriate match.

3 - Make the database accessible to Aluma

Before we can use the data in the table, we need to make the database accessible to Aluma.

All requests from Aluma to external services will come from this IP address: 35.246.42.139. Let's add a rule permitting requests from this address to the Azure SQL server's firewall.

In the Azure portal, navigate to your Azure SQL Server resource and open the "Firewalls and virtual networks" blade.

Add a rule with 35.246.42.139 as the start and end IP. Don't forget to click "Save" at the top of the blade to commit the change. Once you are finished, your firewall rules should include a rule like this:

4 - Test

For this test we'll use one of our UK invoice example documents. Click this link to open the document and save it to your computer: https://downloads.aluma.io/example-documents/invoices/uk/Example-Invoice-3.pdf.

Now open the Aluma dashboard, and click "Data Extraction" then "Create Extractor" to open the Extractor creation page.

Click "Open Document" and select the example invoice you just downloaded.

Now let's add a Supplier Identity module and configure it to use the data in our database.

Click "Add new module" and select the Supplier Identity module.

Set the module parameters as follows:

Property

Value

Database connection string

The connection string to your database. You can find this in the Azure portal. Navigate to the database resource and open the "Connection strings" blade. Copy the "ADO.NET" version of the connection string to an editor and edit it to replace the password placeholder with the password for the user specified in the string.

Database type

mssql

Database schema name

dbo (unless you created the table in a different schema)

Supplier table name

suppliers (unless you created it with a different name)

Leave the other properties blank.

Click "Extract Data" to run the extract the data on the document. You should see the following results:

Optimise supplier identification - provide purchase order numbers

If you have purchase order numbers and the associated supplier identity available then you can optimise the supplier identification process by providing this information in a database table and specifying the table as a Supplier Identity module parameter.

Create a purchase_order_supplier lookup table with the following script, or the equivalent schema using an editor:

create table dbo.purchase_order_suppliers
(
    PurchaseOrderNumber varchar(100) not null
        constraint purchase_order_suppliers_pk
            primary key nonclustered,
    ID varchar(100) not null
        constraint purchase_order_suppliers_suppliers_ID_fk
            references dbo.suppliers
)
go

Each row in the table should contain a purchase order number and its associated supplier ID as specified in the suppliers table.

To make use of the table, specify its name in the Purchase order supplier table name parameter.

If your purchase order numbers are not 5-10 digit numbers then you should also specify a regular expression that matches your purchase order numbers format and specify it in the Purchase order number format parameter to optimise identification of the purchase order numbers.

Optimise supplier identification - provide customer information

Some organisations (the "customer") receive invoices from other parts of the same organisation. In these cases your own organisation's details may appear in the suppliers table.

Because the customer details will appear frequently on invoices, there is an increased risk of incorrectly identifying the supplier in these cases.

You can optimise the supplier identification process to account for this by providing the organisation's details in a database customer information table and specifying the table as a Supplier Identity module parameter.

The table does not have to contain details of other (external) customers, but can do if you are also using the table for customer identification using the Customer Identity module.

Create a customers lookup table with the following script, or the equivalent schema using an editor:

create table dbo.customers
(
    ID varchar(100) not null
        constraint customers_pk
            primary key nonclustered,
    Name varchar(100),
    Address1 varchar(100),
    Address2 varchar(100),
    Address3 varchar(100),
    Address4 varchar(100),
    PostalCode varchar(20),
    TaxID varchar(20)
)
go

To make use of this table, specify its name in the Customer table name parameter.