Document Studio

Document Studio allows for fast testing, prototyping, and development of projects, as well as classifier training.

Importing Documents

To import files into Document Studio, use either Ctrl + I or File > Import Documents (supported document types here). Alternatively, use Import Folder, which imports all documents in the selected folder and all its subfolders. After selecting the documents, you can apply auto-splitting and type classification. By default, each file is treated as a single document and no splitting occurs, but this can be changed to split on each page (each page becomes a document) or every n pages. The document type can also be set automatically to the file's parent folder name, and documents can optionally be imported as test samples.
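The three splitting modes can be pictured as a grouping of page numbers. The sketch below is illustrative only and is not Document Studio's implementation: the function name `split_pages` and its `every_n` parameter are hypothetical, and it assumes that a trailing group shorter than n pages still becomes its own document.

```python
from typing import List, Optional


def split_pages(page_count: int, every_n: Optional[int] = None) -> List[List[int]]:
    """Group 1-based page numbers into documents.

    every_n=None -> no splitting (the whole file is one document)
    every_n=1    -> split on each page
    every_n=n    -> split every n pages (assumption: the last group may be shorter)
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    pages = list(range(1, page_count + 1))
    if every_n is None:
        return [pages]
    if every_n < 1:
        raise ValueError("every_n must be at least 1")
    # Slice the page list into consecutive chunks of at most every_n pages.
    return [pages[i:i + every_n] for i in range(0, page_count, every_n)]


print(split_pages(5))             # default: one document
print(split_pages(5, every_n=1))  # one document per page
print(split_pages(5, every_n=2))  # a new document every 2 pages
```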

📘

Top Tip - Easy Classifier Setup

To easily train a classifier from documents that are already grouped into folders by type, use Import Folder and select the parent folder of all the sub-folders. Ensure each sub-folder is named after the document type it contains. When prompted, select the option to use the folder name as the document type. This pre-labels all the documents, ready for classification.

Parent Folder
├── Doc Type 1
│   ├── sample1
│   └── sample2
└── Doc Type 2
    ├── sample1
    └── sample2
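Before importing, it can help to check which label each sample would receive from a layout like the one above. The following is a minimal sketch that runs outside Document Studio using only the Python standard library; the helper name `preview_labels` is hypothetical, and it assumes samples sit directly inside each document type folder.

```python
from pathlib import Path
from typing import Dict, List


def preview_labels(parent: Path) -> Dict[str, List[str]]:
    """Map each immediate sub-folder name (the would-be document type)
    to the sorted names of the files directly inside it."""
    labels: Dict[str, List[str]] = {}
    for sub in sorted(p for p in parent.iterdir() if p.is_dir()):
        labels[sub.name] = sorted(f.name for f in sub.iterdir() if f.is_file())
    return labels


# Example call (the path is a placeholder):
# for doc_type, samples in preview_labels(Path("Parent Folder")).items():
#     print(doc_type, len(samples), "samples")
```

An empty list for a sub-folder, or a folder name that does not match an intended document type, is worth correcting before running the import.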

Loading a Project

Project files (.alproj) are loaded using Project > Load. Once loaded, the project runs by default as it was configured. To test only certain functions, select or deselect them under Project > Function.... To run only a specific document type's extractor, select it from the drop-down menu in Project > Function > Extraction....

Setting Credentials

To run a project on your account, set up the API Service Credentials by navigating to Tools > Settings... and entering the API secret and key obtained from the Developer tab of the admin portal.

Loading Reference Data

Reference data can be loaded into Document Studio to allow efficient checking of a project's output accuracy. It can be set from the current Document Studio state using Run > Set Current Results as Reference (this output state can also be downloaded as an .xlsx, .xls or .csv file using File > Export Data...). If you already have a spreadsheet or CSV populated with the reference values, it can be imported with Run > Load Reference Data. This spreadsheet or CSV must contain the file path of each sample under the column heading File, which links the expected values to the samples that have been loaded. The remaining columns are matched automatically to any extracted field names, if present.
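As an illustration of the expected layout, the sketch below builds a reference CSV with Python's standard `csv` module. Only the `File` column heading comes from the description above; the helper name `build_reference_csv`, the sample paths, and the field names `InvoiceNumber` and `Total` are hypothetical and should be replaced with your own samples and extracted field names.

```python
import csv
import io
from typing import Dict, Iterable, List


def build_reference_csv(rows: Iterable[Dict[str, str]], field_names: List[str]) -> str:
    """Return CSV text with a leading 'File' column followed by one column per field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["File"] + field_names, lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)  # missing fields are written as empty cells
    return buffer.getvalue()


# Hypothetical sample paths and field names, for illustration only.
rows = [
    {"File": "C:/samples/invoice_001.pdf", "InvoiceNumber": "INV-1001", "Total": "250.00"},
    {"File": "C:/samples/invoice_002.pdf", "InvoiceNumber": "INV-1002", "Total": "99.50"},
]
print(build_reference_csv(rows, ["InvoiceNumber", "Total"]))
```

Saving the returned text to a .csv file gives a file shaped for Run > Load Reference Data, provided the `File` values match the paths of the loaded samples.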

Discrepancy Analysis

If the extracted field value differs from the reference data, the cell is highlighted red or yellow:

  • red - the value differs from the reference value and the extraction is confident,
  • yellow - the value differs from the reference value and the extraction is uncertain.
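
The colour rule can be summarised as a small decision function. This is a sketch of the rule as described, not Document Studio's code: the name `highlight` is hypothetical, confidence is modelled as a simple boolean, and values are compared as exact strings, which may differ from how the application compares them.

```python
from typing import Optional


def highlight(extracted: str, reference: str, confident: bool) -> Optional[str]:
    """Return the highlight colour for a cell, or None when the values match."""
    if extracted == reference:
        return None  # no discrepancy, so no highlight
    # A mismatch is red when the extraction was confident, yellow otherwise.
    return "red" if confident else "yellow"


print(highlight("250.00", "250.00", confident=True))
print(highlight("250.00", "260.00", confident=True))
print(highlight("250.00", "260.00", confident=False))
```

Red cells are usually the ones to review first, since they indicate a wrong value that was reported with confidence.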