Extraction output formats

The CLI prints extraction results as a human-readable table by default. Use the --format (-f) parameter of the extract command to select one of the output formats in the following table.

--format (-f)    Description
table            Table with column headings (the default)
csv              Comma-separated values
json             JSON string

Using table format

The table format is designed to be easy to read. It is the default, so you do not need to specify the --format parameter:

aluma extract myextractor *.pdf

You will get output like this, with each field in a separate column:

FILE                                Order                    Date
00001.pdf                           GB29374                  01/05/2017
00002.pdf                           GB93731                  01/05/2017

Using CSV format

The csv output format returns plain comma-separated text with a header row, as the example output in this section shows. This makes the output easy to pass to other commands and tools for further processing.

Running the preceding example with -f csv produces the following output:

aluma extract myextractor *.pdf -f csv
Filename,Order,Date
C:\examples\00001.pdf,GB29374,01/05/2017
C:\examples\00002.pdf,GB93731,01/05/2017

Note that the csv output includes the full file path as well as the extracted data.

The next examples pipe the csv output to the PowerShell ConvertFrom-Csv cmdlet to select specific results from the output of the extract command. This one keeps only the Filename and Date fields:

aluma extract myextractor *.pdf -f csv | ConvertFrom-Csv | select "Filename", "Date"

This example keeps only results where the Date is 01/01/2017, then converts them back to CSV:

aluma extract myextractor *.pdf -f csv | ConvertFrom-Csv | where { $_.Date -eq "01/01/2017" } | ConvertTo-Csv -NoTypeInformation
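Outside PowerShell, the same selection and filtering can be done in any language with a CSV parser. The following is a minimal Python sketch, assuming the csv output starts with the header row shown in the example output above. The sample text is copied from that example, and the helper names (parse_extract_csv, select_columns, where_equals) are hypothetical and not part of the CLI.

```python
import csv
import io

# Sample text copied from the csv example above. In practice this would be
# the captured standard output of `aluma extract ... -f csv`.
SAMPLE_CSV = (
    "Filename,Order,Date\n"
    "C:\\examples\\00001.pdf,GB29374,01/05/2017\n"
    "C:\\examples\\00002.pdf,GB93731,01/05/2017\n"
)


def parse_extract_csv(text):
    """Return one dict per result row, keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def select_columns(rows, columns):
    """Keep only the named columns, like PowerShell's `select`."""
    return [{name: row[name] for name in columns} for row in rows]


def where_equals(rows, column, value):
    """Keep only rows whose column equals value, like `where { ... -eq ... }`."""
    return [row for row in rows if row[column] == value]


rows = parse_extract_csv(SAMPLE_CSV)
dates = select_columns(rows, ["Filename", "Date"])
matches = where_equals(rows, "Date", "01/05/2017")
```

Here dates holds only the Filename and Date values for each file, and matches holds the rows dated 01/05/2017.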

Using JSON format

The json output format returns a JSON string containing all available output fields. It is intended for commands and tools that process the output and need access to the more advanced output fields.

aluma extract myextractor test/*.pdf -f json

The output is in this form (some output omitted for brevity):

[{
  "filename": "001.pdf",
  "field_results": [
    {
      "field_name": "Order",
      "rejected": false,
      "reject_reason": "None",
      "result": {
        "text": "GB29374",
        "value": null,
        "rejected": false,
        "reject_reason": "None",
        "proximity_score": 100,
        "match_score": 100,
        "text_score": 98.65839,
        "areas": [
          {
            "top": 261.360016,
            "left": 155.16,
            "bottom": 269.54,
            "right": 195.02,
            "page_number": 1
          }
        ]
      },
      ...
}]

Note that the JSON output is an array of results. The exception is the --multiple-files/-m parameter, which writes one result file per input file; each of those files contains a single result rather than an array.
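A consumer of the JSON output may want to handle both shapes (array and single result) in one place. The following is a minimal Python sketch under stated assumptions: the sample is hand-trimmed to the keys shown in the example above, so real output contains more than this; the helper names (load_results, accepted_fields) are hypothetical; and skipping fields whose rejected flag is true, or whose result is null, is an illustrative choice rather than documented behaviour.

```python
import json

# Hand-trimmed sample built only from the keys shown in the example above.
# Real output contains more fields and more properties per result.
SAMPLE_JSON = """
[{
  "filename": "001.pdf",
  "field_results": [
    {
      "field_name": "Order",
      "rejected": false,
      "reject_reason": "None",
      "result": {
        "text": "GB29374",
        "value": null,
        "rejected": false,
        "reject_reason": "None",
        "proximity_score": 100,
        "match_score": 100,
        "text_score": 98.65839,
        "areas": [
          {"top": 261.360016, "left": 155.16, "bottom": 269.54,
           "right": 195.02, "page_number": 1}
        ]
      }
    }
  ]
}]
"""


def load_results(text):
    """Parse JSON output and always return a list of per-file results."""
    data = json.loads(text)
    # With --multiple-files/-m each file holds a single object, not an array.
    return data if isinstance(data, list) else [data]


def accepted_fields(result):
    """Map field name to extracted text for fields not marked rejected."""
    fields = {}
    for field in result.get("field_results", []):
        inner = field.get("result")
        # Assumption: skip rejected fields and fields with no result object.
        if field.get("rejected") or inner is None:
            continue
        fields[field["field_name"]] = inner.get("text")
    return fields


results = load_results(SAMPLE_JSON)
summary = {r["filename"]: accepted_fields(r) for r in results}
```

With the sample above, summary maps each filename to its accepted field texts, which is a convenient starting point before reading the scores or areas.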

Talk to us if you are using JSON output

We recommend that you chat to us if you think you need to use the advanced properties in the JSON output, so we can help make sure your configuration is optimised and you are using the properties in the correct way. Just reach out to us at [email protected].