Filter out data

This article explains how to use a standard property on a search to filter out surplus data. If you aren't familiar with searches, you may find it helpful to read the Searches overview article.

Unwanted data can occasionally find its way into your fields, usually when the regular expression is fairly loose and the extraction area depends on fine margins, so that neighboring data sits on the borderline of being included. If you know what the unwanted data tends to look like, you can include it as an optional part of your regular expression and use the Output Format property to return only the required groups (everything except the known bad data).

This is one of many pre-formatting options that you can apply within the search itself, and it is classed as a transformation operation.

For example, the regular expression for the 'Description' column of this table needs to be vague because the column could contain any text, but the 'Item No.' data is sometimes also included when the column boundaries are not calculated accurately.

Figure: Line items with surplus data

Instead of just .+ you could change the regular expression for this description to ([A-Z]\d{3})?(.+) so that it always defines two groups. The first, optional group will be empty when the columns are correctly separated and the description text is clean. When the item number does creep into the description, it is matched by the optional expression and captured in the first group.

You would then set the Output Format property to '$2', because the characters in the second capturing group are always the ones you require.
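
To see how the two groups behave, the pattern can be tried outside the product. The following is a minimal sketch in Python, assuming that Python's re module treats this pattern the same way as the search's regular expression engine. The extract_description helper and the sample values are hypothetical, and returning group 2 only imitates the effect of an Output Format of '$2'.

```python
import re

# Pattern from the article: an optional item number (one capital letter
# followed by three digits), then the description itself.
PATTERN = re.compile(r"([A-Z]\d{3})?(.+)")


def extract_description(raw: str) -> str:
    """Return only the second capturing group.

    This imitates an Output Format of '$2'; it is an illustration,
    not the product's own implementation.
    """
    match = PATTERN.match(raw)
    if match is None:
        return ""
    return match.group(2)


# Hypothetical sample values: one clean description, and two where the
# item number has crept into the description column.
samples = ["Steel bracket", "B204Steel bracket", "B204 Steel bracket"]

for sample in samples:
    print(repr(extract_description(sample)))
```

In this sketch, any space that separates the item number from the description stays at the start of the second group, so the third sample is returned with a leading space. If that matters for your data, the whitespace would need to be matched outside the second group or trimmed afterwards.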