classify
The classify command classifies one or more files using a classifier and writes the results to the console, to a single file or to one output file per file processed. Results can be written in table, CSV and JSON formats.
Basic usage
To classify files using a classifier and write the results to the console, specify the file or files to classify and the name of the classifier:
aluma classify classifier-name *.pdf
See Selecting files to process for examples of how to use file patterns to select multiple files.
The default output format is the table format. This is an easy-to-read format, but only includes the most commonly-used output fields. Specifically, the "relative confidence" and individual document type scores are omitted.
FILE          DOCUMENT TYPE    CONFIDENT
00001a.pdf    Expenses         true
00001b.pdf    Invoice          true
Writing results in CSV format
Classification results can be written in CSV format. This format makes it easy to pass the output to other commands and tools that need to process it.
To specify CSV format output, use the --format csv parameter or the shorter -f csv version.
aluma classify classifier-name *.pdf -f csv
C:\examples\00001a.pdf,Expenses,true,1.619
C:\examples\00001b.pdf,Invoice,true,3.159
CSV format output includes the following fields (in this order):
- Full file path
- Document type
- Confident (true or false)
- Relative confidence
You can pipe CSV format output to the PowerShell ConvertFrom-Csv cmdlet to select specific results from the output of the classify command. In this example, we select results where the document type is "Invoice".
aluma classify myclassifier *.* -f csv | `
ConvertFrom-Csv -Header "File", "Type", "Confident", "Confidence" | `
where { $_.Type -eq "Invoice" }
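The same kind of filtering can be sketched in Python for scripts that capture the CSV output as text. This is a minimal illustration, not part of the aluma tooling: the field names in FIELDS and the helper parse_classify_csv are our own labels, chosen to match the field order listed above, and the sample rows are modelled on the example output.

```python
import csv
import io

# Hypothetical column labels; the CSV output itself has no header row.
# The order follows the field list documented above.
FIELDS = ["file", "document_type", "confident", "relative_confidence"]


def parse_classify_csv(text):
    """Parse headerless CSV text into a list of dicts with typed values."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        row = dict(zip(FIELDS, record))
        row["confident"] = row["confident"].strip().lower() == "true"
        row["relative_confidence"] = float(row["relative_confidence"])
        rows.append(row)
    return rows


# Sample rows modelled on the example output above.
sample = (
    "C:\\examples\\00001a.pdf,Expenses,true,1.619\n"
    "C:\\examples\\00001b.pdf,Invoice,true,3.159\n"
)

results = parse_classify_csv(sample)
invoices = [r for r in results if r["document_type"] == "Invoice"]
for r in invoices:
    print(r["file"], r["relative_confidence"])
```

In practice the text would come from the captured standard output of the classify command rather than from a literal string.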
Writing results in JSON format
Classification results can be written in a JSON format that contains all available output fields. This format is intended for commands and tools that process the output and need access to the more advanced output fields.
To specify JSON format output, use the --format json parameter or the shorter -f json version.
aluma classify classifier-name *.pdf -f json
The JSON output is an array of results, except when the --multiple-files (or -m) parameter is used to write one result file per input file, in which case each file contains a single result object:
[{
"filename": "C:\\examples\\00001a.pdf",
"classification_results": {
"document_type": "Expenses",
"is_confident": true,
"relative_confidence": 1.6189158,
"document_type_scores": [
{
"document_type": "Expenses",
"score": 49.467617
},
{
"document_type": "Invoice",
"score": 34.63108
}
]
}
},
{
"filename": "C:\\examples\\00001b.pdf",
...
}]
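A consumer of the JSON output has to cope with both shapes described above: an array of results, or a single result object. The following Python sketch shows one way to normalise the two and pull out the main fields. The helper names load_results and summarize are hypothetical, and the sample is a single result taken from the example above.

```python
import json


def load_results(text):
    """Return a list of results whether the JSON is an array or a single object."""
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


def summarize(result):
    """Extract the main fields and sort the per-type scores, highest first."""
    cr = result["classification_results"]
    scores = sorted(
        cr["document_type_scores"], key=lambda s: s["score"], reverse=True
    )
    return {
        "filename": result["filename"],
        "document_type": cr["document_type"],
        "is_confident": cr["is_confident"],
        "relative_confidence": cr["relative_confidence"],
        "top_scores": [(s["document_type"], s["score"]) for s in scores],
    }


# A single result object, as written when one result file is produced per input file.
sample = r"""
{
  "filename": "C:\\examples\\00001a.pdf",
  "classification_results": {
    "document_type": "Expenses",
    "is_confident": true,
    "relative_confidence": 1.6189158,
    "document_type_scores": [
      {"document_type": "Expenses", "score": 49.467617},
      {"document_type": "Invoice", "score": 34.63108}
    ]
  }
}
"""

results = load_results(sample)
summary = summarize(results[0])
print(summary["filename"], summary["document_type"], summary["is_confident"])
```

Wrapping a single object in a list keeps the downstream code identical for both output modes.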
Using a read-profile to classify non-English documents
To specify a read-profile containing a language or languages to use if the document must be read before classifying, use the -r switch with the name of the profile:
aluma classify classifier-name 001.tif -r read-profile-name