Best Practices

The API is straightforward to use, and you can process documents successfully with just a few requests. Before you use the API in a production service, however, there are several best practices you should follow so that your integration is resilient and efficient.

1. Renew your access tokens when necessary

Access tokens expire periodically, and you must obtain a new token to continue making requests to the other API endpoints.

Your code should check the expiration time of the token (the exp claim of the JWT) and request a new token shortly before that time. Do not request new tokens more often than this, or you may be rate-limited.
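As a rough illustration of checking the expiry time, the sketch below reads the exp claim from the token and renews the token only when it is close to expiring. The fetch_token callable, the TokenCache class, and the 60-second leeway are assumptions made for this example, not part of the API.

```python
import base64
import json
import time


def jwt_expiry(token: str) -> int:
    """Return the exp claim (seconds since the Unix epoch) from a JWT.

    The payload is decoded without verifying the signature, which is enough
    for reading the expiry time of a token you obtained yourself.
    """
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url-encoded with the padding stripped; restore it.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return int(payload["exp"])


class TokenCache:
    """Hands out a cached token and renews it shortly before it expires."""

    def __init__(self, fetch_token, leeway_seconds=60, clock=time.time):
        self._fetch_token = fetch_token  # hypothetical: calls the token endpoint
        self._leeway = leeway_seconds    # assumed safety margin before expiry
        self._clock = clock
        self._token = None
        self._expires_at = 0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._leeway:
            self._token = self._fetch_token()
            self._expires_at = jwt_expiry(self._token)
        return self._token
```

Calling get() before every request keeps the renewal logic in one place and avoids asking for a new token more often than the expiry time requires.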

2. Use an appropriate retry policy

To be resilient to transient errors, you should wrap all requests to the API in a retry policy. We recommend a limited exponential back-off policy, a circuit breaker, or a combination of the two. A circuit breaker is a useful pattern that lets you degrade your application's service while the remote dependency is unavailable, whereas an infinite retry policy will cause your application to hang.

Your retry policy should retry requests only in the following cases:

  • Any request that fails because of a network error
  • Any request that returns a response with HTTP status code >= 500 (server errors) or 408 (request timeout)

Be sure to include jitter in your exponential back-off policy, so that your retries are not all sent at the same moment, and place a limit on the number of retries you attempt.
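The retry rules above could be sketched roughly as follows. This is a minimal illustration, not a recommended implementation: send, TransientError, and the default delays and attempt count are all assumptions, and in practice you may prefer the retry support in your HTTP client or a resilience library.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for a network failure raised by your HTTP client."""


def is_retryable_status(status: int) -> bool:
    # Server errors (>= 500) and 408 (request timeout) are worth retrying.
    return status >= 500 or status == 408


def call_with_retries(send, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call send() until it succeeds, fails permanently, or attempts run out.

    send is a hypothetical zero-argument callable that returns (status, body)
    and raises TransientError on a network failure.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        last_attempt = attempt == max_attempts - 1
        try:
            status, body = send()
        except TransientError:
            if last_attempt:
                raise
        else:
            if not is_retryable_status(status) or last_attempt:
                return status, body
        # Full jitter: wait a random time up to the capped exponential delay.
        sleep(rng() * min(max_delay, base_delay * 2 ** attempt))
```

Responses with other 4xx status codes are returned immediately, since repeating the same request is unlikely to change the outcome.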

3. Delete all document resources after use

The document resource is the foundation of the API. To process your files (PDFs, TIFFs, etc.), you will:

  1. Create a new document resource using the Create document (upload) or Create document (import) endpoints
  2. OCR, classify, extract data from, or redact the document using the appropriate endpoints and the document's ID
  3. Delete the document resource

Your API client has a maximum number of document resources that can exist at any time. This number is deliberately kept low to encourage you to keep your documents within the service for as short a time as possible.

You should ensure that you always delete the document resource once you've processed your document, even in error cases. Otherwise you will hit your document limit and the Create document request will return a 403 response code.
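One way to guarantee deletion in error cases is a try/finally block, as in the sketch below. The client object and its create_document, extract_data, and delete_document methods are hypothetical stand-ins for whatever wrapper you write around the real endpoints.

```python
def process_document(client, file_bytes):
    """Create a document, process it, and always delete it afterwards.

    client is a hypothetical wrapper around the API; its three methods
    stand in for the corresponding endpoint calls.
    """
    document_id = client.create_document(file_bytes)
    try:
        return client.extract_data(document_id)
    finally:
        # Runs on success and on failure, so the resource never lingers.
        client.delete_document(document_id)
```

If the delete request itself can fail, it is worth wrapping it in the same retry policy as your other requests.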

4. Process documents in parallel

If you have many documents and want to process them as quickly as possible, you can create and use multiple document resources at the same time.

You should process a maximum of 10 documents at a time, even if your API client permits you to create more documents than this. If your volumes are sufficiently large that it would be valuable for you to parallelise to a greater degree, please talk to us.

When processing documents in parallel, the elapsed time for each individual document may increase slightly, even though the overall time to process all documents decreases.

A simple approach to parallelisation is to take a batch of documents (say 10), process them all, and wait until all are complete before processing another batch. However, each document takes a different amount of time to process, so capacity sits idle while the slowest document in the batch finishes. You will therefore maximise throughput if you start the next document as soon as any document has completed.

One common pattern used to accomplish this is:

  • Push all documents onto a thread-safe queue
  • Create a pool of workers on different threads (as many as the degree of parallelisation you want)
  • Each worker pulls the next document from the queue, processes it, does whatever it needs to with results (perhaps pushing to a results queue) and then pulls another document
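The queue-and-workers pattern above might look something like this in Python. It is only a sketch: process_one is a hypothetical function that handles a single document end to end (create, process, delete), and the default of 10 workers mirrors the suggested maximum.

```python
import queue
import threading


def process_all(documents, process_one, workers=10):
    """Process documents on a fixed pool of worker threads.

    process_one is a hypothetical function that handles one document and
    returns its result. Results are returned in the input order as
    ("ok", value) or ("error", exception) pairs.
    """
    documents = list(documents)
    pending = queue.Queue()  # thread-safe queue of (index, document)
    for index, document in enumerate(documents):
        pending.put((index, document))
    results = [None] * len(documents)

    def worker():
        while True:
            try:
                index, document = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            try:
                results[index] = ("ok", process_one(document))
            except Exception as error:  # record the failure and keep going
                results[index] = ("error", error)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Because each worker pulls the next document as soon as it finishes the previous one, a slow document delays only its own worker rather than a whole batch.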

Using the Bulkhead pattern lets you constrain the parallelism while buffering HTTP requests that can't currently be serviced.
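A very small approximation of a bulkhead can be built from a semaphore, as sketched below. The Bulkhead class and its call method are illustrative names only; resilience libraries offer fuller implementations, typically with a bounded waiting queue and timeouts.

```python
import threading


class Bulkhead:
    """Allows at most max_concurrent calls to run at once; others wait."""

    def __init__(self, max_concurrent=10):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, *args, **kwargs):
        # Blocks while every slot is taken, so excess calls wait their turn.
        with self._slots:
            return func(*args, **kwargs)
```

Routing every API request through a single shared Bulkhead instance keeps the number of in-flight requests under the chosen limit, regardless of how many threads are issuing them.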