Extraction output formats
The CLI prints a human-readable table by default, but the extract command can also produce other output formats. Use the --format (-f) parameter to select one of the output types in the following table.
--format (-f) | Description |
---|---|
table | table with column headings (the default) |
csv | comma-separated values |
json | JSON string |
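If you call the CLI from a script, the format choice can be validated before the command is run. The following is a minimal Python sketch, not part of the CLI itself: the helper name build_extract_command is hypothetical, and it assumes the aluma executable is on your PATH and that file patterns are passed through unchanged.

```python
# Output types listed in the table above.
VALID_FORMATS = ("table", "csv", "json")


def build_extract_command(extractor, file_patterns, output_format="table"):
    """Return the argument list for an `aluma extract` call (hypothetical helper)."""
    if output_format not in VALID_FORMATS:
        raise ValueError(
            f"unknown format {output_format!r}; expected one of {VALID_FORMATS}"
        )
    command = ["aluma", "extract", extractor, *file_patterns]
    # table is the default, so --format is only added for the other formats.
    if output_format != "table":
        command += ["--format", output_format]
    return command


print(build_extract_command("myextractor", ["*.pdf"], "csv"))
```

The returned list can be handed to subprocess.run; building a list rather than a single string avoids quoting problems with file names that contain spaces.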
Using table format
The table format provides output in an easy-to-read format. This output format is the default, so you do not need to specify the --format parameter:
aluma extract myextractor *.pdf
You will get output like this, with each field in a separate column:
FILE Order Date
00001.pdf GB29374 01/05/2017
00002.pdf GB93731 01/05/2017
Using CSV format
The csv output format returns simple comma-separated text with a header row. This format makes it easy to pass the output to other commands and tools that need to process it.
Using the preceding example with the csv option outputs the following comma-separated results:
aluma extract myextractor *.pdf -f csv
Filename,Order,Date
C:\examples\00001.pdf,GB29374,01/05/2017
C:\examples\00002.pdf,GB93731,01/05/2017
Note that the csv output includes the full file path as well as the extracted data.
The next examples show how the csv output can be piped to the PowerShell ConvertFrom-Csv cmdlet to select specific results from the output of the extract command. In this case, the output is reduced to the Filename and Date fields:
aluma extract myextractor *.pdf -f csv | ConvertFrom-Csv | select "Filename", "Date"
In this case, the output is filtered to include only results where the Date is 01/01/2017:
aluma extract myextractor *.pdf -f csv | ConvertFrom-Csv | where { $_.Date -eq "01/01/2017" } | ConvertTo-Csv -NoTypeInformation
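Outside PowerShell, the same selection and filtering can be done with any CSV-aware tool. As a rough Python sketch, assuming the output starts with the Filename,Order,Date header row shown in the csv example above (the function names are illustrative, and the sample text is copied from that example rather than produced by a real run):

```python
import csv
import io


def read_extract_csv(csv_text):
    """Parse csv-format output into a list of dicts keyed by column heading."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def select_fields(rows, field_names):
    """Keep only the named columns, like `select` in the PowerShell example."""
    return [{name: row[name] for name in field_names} for row in rows]


def where_date(rows, date_text):
    """Keep rows whose Date column equals date_text (plain string comparison)."""
    return [row for row in rows if row["Date"] == date_text]


# Sample text taken from the csv example above.
sample = (
    "Filename,Order,Date\n"
    "C:\\examples\\00001.pdf,GB29374,01/05/2017\n"
    "C:\\examples\\00002.pdf,GB93731,01/05/2017\n"
)

rows = read_extract_csv(sample)
print(select_fields(rows, ["Filename", "Date"]))
print(where_date(rows, "01/05/2017"))
```

In practice the text would come from the captured standard output of the extract command instead of a literal string. The date is compared as text, so "01/05/2017" and "1/5/2017" would not match.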
Using JSON format
The json output format returns a JSON string containing all available output fields. This format is intended for other commands and tools that need to process the output and need access to the more advanced output fields.
aluma extract myextractor test/*.pdf -f json
The output is in this form (some output omitted for brevity):
[{
  "filename": "001.pdf",
  "field_results": [
    {
      "field_name": "Order",
      "rejected": false,
      "reject_reason": "None",
      "result": {
        "text": "GB29374",
        "value": null,
        "rejected": false,
        "reject_reason": "None",
        "proximity_score": 100,
        "match_score": 100,
        "text_score": 98.65839,
        "areas": [
          {
            "top": 261.360016,
            "left": 155.16,
            "bottom": 269.54,
            "right": 195.02,
            "page_number": 1
          }
        ]
      },
      ...
}]
Note that the JSON is an array of results, except when you use the --multiple-files (-m) parameter to write one result file per input file, in which case each file contains a single JSON result.
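A consumer of the JSON output may want to handle both shapes. The following Python sketch is one possible approach, not an official client: it uses only the property names visible in the sample above (filename, field_results, field_name, rejected, result, text), the function names are hypothetical, and treating a missing or null result as "no text" is an assumption. The sample string is a trimmed copy of the output shown above.

```python
import json


def load_results(json_text):
    """Return a list of results whether the JSON is an array or a single result."""
    data = json.loads(json_text)
    return data if isinstance(data, list) else [data]


def summarise(result):
    """Map each field name to its extracted text and rejected flag."""
    fields = {}
    for field in result.get("field_results", []):
        # Assumption: "result" may be absent or null when nothing was extracted.
        inner = field.get("result") or {}
        fields[field["field_name"]] = {
            "text": inner.get("text"),
            "rejected": field.get("rejected", False),
        }
    return result.get("filename"), fields


# Trimmed copy of the sample output above.
sample = """
[{
  "filename": "001.pdf",
  "field_results": [
    {
      "field_name": "Order",
      "rejected": false,
      "reject_reason": "None",
      "result": {"text": "GB29374", "value": null, "rejected": false}
    }
  ]
}]
"""

for result in load_results(sample):
    print(summarise(result))
```

Normalising to a list up front means the rest of the code does not need to know whether the result was written with --multiple-files.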
Talk to us if you are using JSON output
We recommend that you chat to us if you think you need to use the advanced properties in the JSON output, so we can help make sure your configuration is optimised and you are using the properties in the correct way. Just reach out to us at [email protected].