Proximity Rules

Searches can be used to extract data unconditionally from anywhere on the document.

However, if you want to find a more specific piece of data that lies in a certain position, you sometimes need to use other elements to guide Aluma to the exact location.

There are two main ways to specify a relative position:

  • An Area Constraint, which finds the keyword text first and then uses the constraint properties to produce a bounded area within which the target Search runs

  • A Proximity Rule, which is useful when a Search has already identified several candidate results and you need to confirm or eliminate some of them based on surrounding keywords

Each of these can work in a 'geometric' mode, which uses physical distances across the page in units such as inches, millimeters and points. Alternatively, they can work in a 'logical' mode, where distances are measured along the natural flow of text in words and text lines; this is useful for capturing data that wraps across paragraphs and pages.

The main thing to note is that proximity rules come into effect after the target search has run, in order to decide which of its candidate results is valid.

Proximity rules can work in combination or in isolation, and each is defined as a connector between the keyword search and the target search.

Proximity rules can be positive, promoting a certain result (and eliminating all others) if the condition is met. Alternatively, a rule can be negative (an Exception), ruling out all alternatives that meet the condition.
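The difference between a positive rule and an Exception can be illustrated with a minimal Python sketch. This is not Aluma's implementation: `apply_rule` and `condition_met` are hypothetical names, and the sketch assumes that a positive rule simply keeps the candidates for which the condition is met, while an Exception keeps the candidates for which it is not.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def apply_rule(
    candidates: List[T],
    condition_met: Callable[[T], bool],
    exception: bool = False,
) -> List[T]:
    """Filter target-search candidates with one proximity rule (illustrative only).

    condition_met(candidate) stands in for "a keyword result lies in the
    required position relative to this candidate".
    """
    if exception:
        # Negative rule: rule out every candidate that meets the condition.
        return [c for c in candidates if not condition_met(c)]
    # Positive rule: keep only the candidates that meet the condition.
    return [c for c in candidates if condition_met(c)]
```

For example, if a date search returns two candidates and only one of them has the keyword nearby, the positive form keeps that one and the Exception form keeps the other.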

Geometric Proximity Rules

Geometric rules use Regions, which are written in a specific notation: a direction followed by an operational area (width multiplied by height).

<Direction>:<Width> x <Height>

Multiple regions are separated by semi-colons, for example:

N:1" x 1;W:2 x 0.25"

Distances can be given in explicit units (mm, inches, etc.) or as multiples of the keyword search size.
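To make the notation concrete, here is a small Python sketch that splits a region string into its parts. It is an illustration only, not Aluma's parser: the names `Dimension`, `Region` and `parse_regions` are hypothetical, the unit spellings `mm`, `in` and `pt` are assumptions, and the sketch assumes that a number with no unit is a multiple of the keyword search size. Direction labels are not validated, since only N and W appear in the example above.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# A number followed by an optional unit. The unit spellings are assumptions;
# the inch mark is accepted in both straight and curly form.
_DIM = r'(\d+(?:\.\d+)?)\s*(mm|in|pt|"|”)?'
_REGION = re.compile(rf'^\s*([A-Za-z]+)\s*:\s*{_DIM}\s*[xX]\s*{_DIM}\s*$')


@dataclass(frozen=True)
class Dimension:
    value: float
    unit: Optional[str]  # None is taken to mean "multiple of the keyword size"


@dataclass(frozen=True)
class Region:
    direction: str
    width: Dimension
    height: Dimension


def _unit(raw: Optional[str]) -> Optional[str]:
    # Normalise either style of inch mark to "in"; leave other units as given.
    return "in" if raw in ('"', '”') else raw


def parse_regions(spec: str) -> List[Region]:
    """Parse '<Direction>:<Width> x <Height>' entries separated by semi-colons."""
    regions = []
    for part in spec.split(";"):
        if not part.strip():
            continue  # tolerate a trailing semi-colon
        match = _REGION.match(part)
        if match is None:
            raise ValueError(f"Unrecognised region: {part!r}")
        direction, w, w_unit, h, h_unit = match.groups()
        regions.append(
            Region(
                direction.upper(),
                Dimension(float(w), _unit(w_unit)),
                Dimension(float(h), _unit(h_unit)),
            )
        )
    return regions
```

Parsing the example string above with this sketch yields two regions: a North region 1 inch wide and 1 keyword-multiple high, and a West region 2 keyword-multiples wide and 0.25 inches high.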

If a result of the keyword search lies within this area and direction of a result from the target search, the rule condition is met for the target result.

Remember that regions are relative to the target, so in this example a West proximity rule is used to connect the "Delivery date" keyword search to the generic date target search.

[Figure: Region Directions for a Geometric Proximity Rule]
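The West rule described above can be sketched geometrically in Python. This is a rough illustration under stated assumptions rather than Aluma's actual geometry: `Box`, `west_region` and `rule_met` are hypothetical names, all coordinates are in inches with y increasing down the page, the region is assumed to sit immediately to the left of the target and to be vertically centred on it, and the condition is assumed to be met when any keyword result overlaps that region.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Box:
    left: float
    top: float
    right: float
    bottom: float  # y grows downward; all values in inches


def west_region(target: Box, width: float, height: float) -> Box:
    """Rectangle of width x height directly west of the target (assumed layout)."""
    mid_y = (target.top + target.bottom) / 2
    return Box(target.left - width, mid_y - height / 2, target.left, mid_y + height / 2)


def intersects(a: Box, b: Box) -> bool:
    # Two rectangles overlap when they overlap on both axes.
    return a.left < b.right and b.left < a.right and a.top < b.bottom and b.top < a.bottom


def rule_met(target: Box, keywords: Iterable[Box], width: float, height: float) -> bool:
    """True if any keyword result falls inside the target's West region."""
    region = west_region(target, width, height)
    return any(intersects(region, keyword) for keyword in keywords)


# "Delivery date" keyword, with one date on the same line and one further down.
delivery_label = Box(1.0, 2.0, 2.0, 2.2)
date_on_same_line = Box(2.5, 2.0, 3.3, 2.2)
date_elsewhere = Box(2.5, 5.0, 3.3, 5.2)

print(rule_met(date_on_same_line, [delivery_label], width=2.0, height=0.25))  # True
print(rule_met(date_elsewhere, [delivery_label], width=2.0, height=0.25))     # False
```

Note that the region is built from the target's position, not the keyword's, which is why the rule is West even though the date sits to the east of the "Delivery date" label.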

Working with multiple rules

You can connect multiple rules to the same 'value' search such that they all apply and, depending on which rules pass and fail, determine whether the value is approved or excluded.

Why, then, would you use multiple rules rather than building a single rule with a regular expression that covers all your scenarios?

  • It is sometimes hard to create an all-encompassing regular expression that does not become long and convoluted, and therefore hard to maintain
  • Some of the rules may be affirmations (the default, approval of a particular result) but others may be exceptions, ruling out the piece of data
  • The rules may need to be different types (Geometric or Logical)
  • You may need to set other parameters on each search to be specific to only one keyword / part of the regular expression. For example, you might need one particular keyword strictly north of the text and another particular keyword strictly west, hence the Regions must be defined on separate searches for the relevant keywords only

The evaluation appears to work as follows:

  • If all the green rules pass as a set (AND / OR logic satisfied), or there are no green rules, the value is approved
  • If all the red rules pass as a set (AND / OR logic satisfied), the value is excluded
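The evaluation described above can be sketched in Python as follows. This is a tentative reading, not a confirmed description of Aluma's behavior: `evaluate` is a hypothetical function, 'green' and 'red' are taken to be affirmation and exception rules respectively, the AND / OR logic is represented by passing `all` or `any`, and two points the text does not state are assumed: that having no red rules never excludes a value, and that exclusion takes precedence over approval.

```python
from typing import Callable, Iterable, Sequence

Combine = Callable[[Iterable[bool]], bool]  # e.g. all (AND) or any (OR)


def evaluate(
    green: Sequence[bool],
    red: Sequence[bool],
    combine_green: Combine = all,
    combine_red: Combine = all,
) -> str:
    """Combine per-rule pass/fail results into a verdict for one value."""
    # Approved if there are no green rules, or the green set passes.
    approved = (not green) or combine_green(green)
    # Excluded only if there is at least one red rule and the red set passes
    # (assumption: an empty red set never excludes).
    excluded = bool(red) and combine_red(red)
    if excluded:
        return "excluded"  # assumption: exclusion wins over approval
    return "approved" if approved else "not approved"
```

For instance, `evaluate([True, False], [])` gives "not approved" under AND logic, while the same results with `combine_green=any` give "approved".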