Read (OCR) document

Note that reading is supported only for documents created from PDFs: either PDFs containing images (i.e. scan to PDF) or digitally-created PDFs.

Every page in a multi-page PDF is processed and included in read results.

For small documents, reading will usually be very quick, but for very large documents you should expect response times of up to tens of seconds.

If you read a document created from a digitally-created PDF, the PDF is rendered as an image (or images) and then OCR is performed on the resulting image(s).

If you try to read a document for which reading is not supported, such as a Microsoft Office or text document, you will receive a 422 Unprocessable Entity response.

Once this request is complete, you can obtain the OCR results as either a searchable PDF, with OCR text embedded, or as raw text by making a GET request to the same URL.
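As a rough sketch of fetching the results, the snippet below builds a GET request with Python's standard library. The URL, the Bearer token scheme, and the use of an Accept header to choose between the searchable PDF and raw text are all assumptions made for illustration; this page does not confirm them, so check the Get Read Results endpoint for the actual mechanism.

```python
import urllib.request

# Hypothetical placeholder: substitute the real read URL for your document.
READ_URL = "https://api.example.com/documents/DOCUMENT_ID/reads"


def build_results_request(url, token, as_pdf=False):
    """Build (but do not send) a GET request for the OCR results."""
    # Assumption: the result format is selected with the Accept header.
    accept = "application/pdf" if as_pdf else "text/plain"
    return urllib.request.Request(
        url,
        # Assumption: the access token is sent as a Bearer token.
        headers={"Authorization": f"Bearer {token}", "Accept": accept},
        method="GET",
    )


request = build_results_request(READ_URL, "ACCESS_TOKEN")
# To send it: urllib.request.urlopen(request, timeout=60)
```

The request is only constructed here, not sent, so the sketch can be adapted before any call is made.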

Using a read-profile

To use a read-profile, which specifies the languages to use for the read, set the Content-Type header to application/json and include a request body containing the name of the read-profile, e.g.:

{
  "read_profile": "german"
}
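The following is a minimal sketch of how a client might attach that body, using only Python's standard library. The URL, the Bearer token scheme, and the POST method are assumptions made for illustration (this page states only that the results are fetched with GET from the same URL); the helper name build_read_request is hypothetical.

```python
import json
import urllib.request

# Hypothetical placeholder: substitute the real read URL for your document.
READ_URL = "https://api.example.com/documents/DOCUMENT_ID/reads"


def build_read_request(url, token, read_profile=None, method="POST"):
    """Build (but do not send) a read request, optionally with a read-profile."""
    # Assumption: the access token is sent as a Bearer token.
    headers = {"Authorization": f"Bearer {token}"}
    body = None
    if read_profile is not None:
        # A read-profile requires a JSON body and a matching Content-Type.
        headers["Content-Type"] = "application/json"
        body = json.dumps({"read_profile": read_profile}).encode("utf-8")
    # Assumption: the HTTP method defaults to POST; adjust if the API differs.
    return urllib.request.Request(url, data=body, headers=headers, method=method)


request = build_read_request(READ_URL, "ACCESS_TOKEN", read_profile="german")
# To send it, allowing for slow reads of very large documents:
# urllib.request.urlopen(request, timeout=60)
```

Omitting read_profile produces a request with no body and no Content-Type header.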

RESPONSES

201 The Document was read. The results are available from the Get Read Results endpoint.
400 No Document ID is specified.
401 There is no Authorization header or the access token is invalid.
404 The specified Document does not exist.
422 The content type of the specified document is not supported for this operation.
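One way a client might act on these status codes is sketched below. The mapping mirrors the list above; the ReadError class and check_read_status function are hypothetical names introduced for this example and are not part of the API.

```python
# Error status codes for the read request, as listed above.
READ_ERRORS = {
    400: "No Document ID is specified",
    401: "There is no Authorization header or the access token is invalid",
    404: "The specified Document does not exist",
    422: "The content type of the specified document is not supported "
         "for this operation",
}


class ReadError(Exception):
    """Raised when a read request does not return 201 (hypothetical helper)."""

    def __init__(self, status, message):
        super().__init__(f"{status}: {message}")
        self.status = status


def check_read_status(status):
    """Return normally on 201; raise ReadError for any other status."""
    if status == 201:
        return
    raise ReadError(status, READ_ERRORS.get(status, "Unexpected response"))
```

For example, check_read_status(422) raises a ReadError whose status attribute is 422, which a caller could use to skip unsupported documents such as Microsoft Office files.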