Add a single sample file to a classifier

The request body must contain the raw binary contents of the sample file, and the Content-Type header must be set to the file's MIME type.
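A minimal Python sketch of such a request, built with the standard library's `urllib.request`. The endpoint URL is not given in this section, so `sample_url` is a hypothetical placeholder. The POST method and the `Bearer` token scheme in the Authorization header are also assumptions. The function only builds the request, so it can be inspected before it is sent.

```python
import urllib.parse
import urllib.request


def build_add_sample_request(sample_url: str, token: str, file_bytes: bytes,
                             content_type: str, retrain: bool) -> urllib.request.Request:
    """Build (but do not send) a request that uploads one sample file.

    Assumptions: sample_url is a hypothetical placeholder for this endpoint's
    URL, the method is POST, and the token uses the "Bearer" scheme.
    """
    # Append the retrain query parameter as "true" or "false".
    query = urllib.parse.urlencode({"retrain": "true" if retrain else "false"})
    separator = "&" if "?" in sample_url else "?"
    return urllib.request.Request(
        sample_url + separator + query,
        data=file_bytes,  # raw binary contents of the sample file
        headers={
            "Content-Type": content_type,  # MIME type of the file
            "Authorization": f"Bearer {token}",  # assumed token scheme
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(req)` would send it. Note that `urlopen` raises `urllib.error.HTTPError` for error statuses such as those listed under RESPONSES.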

Correct use of the "retrain" query parameter

Once samples have been added to a classifier, the classifier must be "trained". During this process the classifier analyses the samples and determines the defining characteristics of each document type. Training is only possible once non-empty samples exist for at least two document types.
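The training precondition can be expressed as a small check. This is a sketch: samples are modelled as hypothetical `(document_type, contents)` pairs, and "not empty" is approximated as "has at least one byte", whereas the service may judge emptiness by the text it extracts from the file.

```python
from collections.abc import Iterable


def can_train(samples: Iterable[tuple[str, bytes]]) -> bool:
    """Return True if the samples satisfy the training precondition.

    Each sample is a hypothetical (document_type, contents) pair. Training
    needs non-empty samples for at least two distinct document types.
    """
    # Collect the document types that have at least one non-empty sample.
    types_with_content = {doc_type for doc_type, contents in samples if contents}
    return len(types_with_content) >= 2
```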

For best performance, train only once, after all the samples you intend to add have been added. Training more than once does no harm, but each training run makes the request that triggers it slower.

The retrain query parameter can be used to control whether training happens after the sample is added.

When starting from a new (empty) classifier, you must set retrain=false for the first samples until you have added samples for at least two document types, because training is not possible before then.

Ideally, set retrain=false for every sample except the very last one you add, so that training is performed only once.
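The recommended pattern can be sketched as a helper that decides the retrain value for each upload in a batch. The function name and the idea of passing the document type of each sample in upload order are illustrative assumptions, not part of this API.

```python
def retrain_flags(doc_types: list[str]) -> list[bool]:
    """Return the retrain value for each sample, in upload order.

    doc_types[i] is the (hypothetical) document type of the i-th sample.
    Only the last upload uses retrain=true, so training runs exactly once.
    """
    if len(set(doc_types)) < 2:
        # Training needs samples of at least two document types.
        raise ValueError("need samples for at least two document types")
    last = len(doc_types) - 1
    return [i == last for i in range(len(doc_types))]
```

Because every upload before the last one uses retrain=false, a new classifier never attempts training before it has samples of two document types.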

Supported file types

Files of the following file types can be used as samples:

  • PDFs that contain electronic content
  • Microsoft Office Word, Excel or PowerPoint documents
  • Text files

Image files and PDFs without electronic content cannot be used as samples. Run OCR on these first and use the resulting documents as samples instead.

File type and required Content-Type header:

  • PDFs (with electronic content): application/pdf
  • Microsoft Office Word (.docx) documents: application/vnd.openxmlformats-officedocument.wordprocessingml.document
  • Microsoft Office Excel (.xlsx) documents: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  • Microsoft Office PowerPoint (.pptx) documents: application/vnd.openxmlformats-officedocument.presentationml.presentation
  • Text files: text/plain
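A sketch of choosing the Content-Type header from a file's extension, using the MIME types listed above. It assumes plain-text samples use the `.txt` extension. An extension check cannot tell whether a PDF contains electronic content, so such files may still be rejected with 415.

```python
from pathlib import Path

# File extension -> Content-Type header, from the supported file types above.
# Assumption: plain-text samples use the ".txt" extension.
SAMPLE_CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".txt": "text/plain",
}


def content_type_for(path: str) -> str:
    """Return the Content-Type header value for a sample file path."""
    suffix = Path(path).suffix.lower()
    try:
        return SAMPLE_CONTENT_TYPES[suffix]
    except KeyError:
        # Images and other unsupported types cannot be used as samples.
        raise ValueError(f"unsupported sample file type: {path!r}") from None
```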


Note that you will need to make multiple requests to this endpoint to train a classifier sufficiently. It is generally easier to use the Add samples from ZIP file endpoint, which is also substantially faster when adding multiple samples. Use this endpoint only if you are tightly integrating training into another system, have very large samples or a very large number of samples, or find it inconvenient to build a ZIP file.

RESPONSES

200 The sample was added to the classifier.
400 There is no file supplied in the body.
401 There is no Authorization header, or the access token is invalid.
404 The specified classifier does not exist.
415 The Content-Type header is missing, contains an unsupported type, or does not match the actual contents of the file, or the file is a PDF that does not include electronic content. Details of the exact problem are included in the error response.
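A client can turn these statuses into errors with a readable message. In this sketch, `AddSampleError` and `check_add_sample_status` are hypothetical names, and `body` is assumed to be the error response text already read by whatever HTTP client you use, since that is where the 415 details appear.

```python
# Meanings of the status codes documented for this endpoint.
STATUS_MEANINGS = {
    200: "sample added",
    400: "no file supplied in the request body",
    401: "missing Authorization header or invalid access token",
    404: "classifier does not exist",
    415: "missing, unsupported, or mismatched Content-Type, or PDF without content",
}


class AddSampleError(Exception):
    """Hypothetical exception for a failed add-sample request."""

    def __init__(self, status: int, body: str = "") -> None:
        meaning = STATUS_MEANINGS.get(status, "unexpected status")
        detail = f" ({body})" if body else ""
        super().__init__(f"{status}: {meaning}{detail}")
        self.status = status


def check_add_sample_status(status: int, body: str = "") -> None:
    """Raise AddSampleError unless the response status is 200."""
    if status != 200:
        raise AddSampleError(status, body)
```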
