NeighborFinder

From fmepedia

NeighborFinder is a Workbench Transformer.



Description

The NeighborFinder locates the nearest 'candidate' feature to each 'base' feature and copies the candidate's attributes over to the base feature.


neighborfinder.gif



NeighborFinder and Format Attributes

Problems may occur when the base and candidate features have different geometry types.

For example, when point features are the base and line features are the candidates, all attributes of the nearest candidate are copied to each base feature. This includes 'format attributes' such as fme_geometry. So after processing, the base point features have a geometry type of mif_polyline (for example) when they ought to be mif_point.

From FME 2005 onwards this is resolved, as long as a destination dataset is defined. A workspace with only a source dataset, a NeighborFinder and a Visualizer will still show the incorrect geometry type, but a source and destination routed to the Viewer will show the correct result.
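The overwrite described above can be pictured with plain Python dictionaries standing in for features. This is only a sketch of the effect, not FME's internal code: the attribute name `format_type`, the sample values and the `copy_attributes` helper with its `protected` argument are all assumptions made for illustration.

```python
# Features are modelled as plain dicts. "format_type" is a hypothetical
# attribute name and the address/road values are invented for this sketch.
base_point = {"address": "100 Main St", "format_type": "mif_point"}
candidate_line = {"road_name": "Main St", "format_type": "mif_polyline"}


def copy_attributes(base, candidate, protected=()):
    """Return a copy of base with the candidate's attributes merged in.

    Attributes named in `protected` keep the base feature's own value.
    """
    merged = dict(base)
    for name, value in candidate.items():
        if name in protected and name in base:
            continue  # keep the base feature's value for this attribute
        merged[name] = value
    return merged


naive = copy_attributes(base_point, candidate_line)
guarded = copy_attributes(base_point, candidate_line, protected=("format_type",))

print(naive["format_type"])    # the point now claims to be a polyline
print(guarded["format_type"])  # the base geometry type survives the copy
```

The naive merge is the problem case: the candidate's geometry type silently replaces the base's. The guarded merge shows one way such a clash could be avoided, without claiming that this is how FME resolves it.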



NeighborFinder Lists

When a list name is specified in the NeighborFinder, the list contains all of the candidate features that were within the maximum distance of the base. The FME Functions and Factories Manual gives the name of the setting that causes this as CLOSED_CANDIDATES_LIST.

BUT! Before FME 2006 there was no such setting. Any NeighborFinder with a list created in FME 2005 or earlier uses a setting called TESTED_CANDIDATES_LIST. This setting causes the list to contain all candidate features that were within the base feature's bounding box! Therefore an older NeighborFinder may return more list entries than a newer version.
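The difference between the two list settings can be sketched in a few lines of Python. This assumes the "bounding box" test behaves like a square search window extending the maximum distance on each side of a point base feature; that interpretation, the coordinates and the function names are assumptions for illustration only.

```python
import math

MAX_DISTANCE = 10.0  # arbitrary units, chosen for the illustration

base = (0.0, 0.0)
candidates = {
    "A": (3.0, 4.0),   # 5.0 away: inside the box and inside the distance
    "B": (9.0, 9.0),   # about 12.7 away: inside the box, beyond the distance
    "C": (20.0, 0.0),  # 20.0 away: outside both
}


def in_search_box(base, point, max_distance):
    """True if point lies in the square of half-width max_distance around base."""
    return (abs(point[0] - base[0]) <= max_distance
            and abs(point[1] - base[1]) <= max_distance)


def in_range(base, point, max_distance):
    """True if point lies within max_distance of base in a straight line."""
    return math.dist(base, point) <= max_distance


# Older behaviour (box test) versus newer behaviour (true distance test).
tested_list = [n for n, p in candidates.items() if in_search_box(base, p, MAX_DISTANCE)]
close_list = [n for n, p in candidates.items() if in_range(base, p, MAX_DISTANCE)]

print(tested_list)  # ['A', 'B']
print(close_list)   # ['A']
```

Candidate B sits in a corner of the box, so it passes the box test but fails the distance test. That is the kind of extra list entry an older NeighborFinder may return.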

NB: It is the NeighborFinder version that is important - not the FME version! If you place a NeighborFinder inside a workspace using FME 2005 then the setting used will be the 2005 version (TESTED_CANDIDATES_LIST) - even if you run it in FME 2006! To get the 2006 functionality you would need to open the workspace for editing in FME 2006 and replace all of the NeighborFinder transformers.


NB2: In FME 2006-GB onwards a newer version of the NeighborFinder allows the user to enter different list names for the TESTED and CLOSED settings. This, together with updated documentation, should prevent further confusion. I'll leave the above info though in case there are old versions of the NeighborFinder floating around out there.



NeighborFinder Consider Self

Older versions of the NeighborFinder had a consider-self setting. As the help said, "Unless every feature is connected to both the BASE port and the CANDIDATE port, Consider Self should be set to Yes." If you set it to NO then the BASE port gets ignored completely.

In newer versions of the transformer - where that setting isn't present - simply leave the BASE port disconnected and the same actions will apply.



Example

The attached workspace shows an example use of the NeighborFinder transformer.

This example demonstrates a typical use for this transformer, and also how the Candidates First setting can be applied to improve performance.



Scenario

Here we have two source datasets: one of residences (addresses) in GeoMedia format, and one of city landmarks in AutoCAD DWG. We wish to find the nearest landmark to each address, with a maximum distance to the landmark of one mile (5280 feet).
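As a rough picture of what the scenario asks for, here is a brute-force Python sketch. It is not how the transformer is implemented; the feature layout (dicts with `xy` and `name` keys), the function names and the sample coordinates are all invented for illustration, and only the 5280 ft limit comes from the scenario.

```python
import math

MAX_DISTANCE_FT = 5280.0  # one mile, as in the scenario


def find_neighbor(base, candidates, max_distance):
    """Return (candidate, distance) for the nearest candidate in range, else None."""
    best = None
    for candidate in candidates:
        d = math.dist(base["xy"], candidate["xy"])
        if d <= max_distance and (best is None or d < best[1]):
            best = (candidate, d)
    return best


def match_all(bases, candidates, max_distance):
    """Split bases into matched (with neighbor attributes added) and unmatched."""
    matched, unmatched = [], []
    for base in bases:
        hit = find_neighbor(base, candidates, max_distance)
        if hit is None:
            unmatched.append(base)
            continue
        candidate, d = hit
        result = dict(base)
        result["landmark"] = candidate["name"]  # attribute copied from the neighbor
        result["distance_ft"] = d
        matched.append(result)
    return matched, unmatched


# Invented sample data: coordinates are in feet.
landmarks = [{"name": "Landmark A", "xy": (0.0, 0.0)},
             {"name": "Landmark B", "xy": (10000.0, 0.0)}]
addresses = [{"id": 1, "xy": (300.0, 400.0)},
             {"id": 2, "xy": (5000.0, 6000.0)}]

matched, unmatched = match_all(addresses, landmarks, MAX_DISTANCE_FT)
print(matched)    # address 1, 500 ft from Landmark A
print(unmatched)  # address 2, more than a mile from both landmarks
```

The two output lists correspond to the matched and unmatched results the workspace produces. A loop over every candidate is used here purely for clarity.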



Workspace

This workspace uses the standard FME sample dataset (http://www.safe.com/support/onlinelearning/fmesampledata.php).


Image:Neighbor0.png
Above: The workspace. Approximately 13,000 addresses are being processed against 110 landmarks.



Results

The results are output according to whether or not a match was found. A second workspace was created that included a Bufferer and Dissolver to show the 5280 foot boundary and prove the results are correct.


Filtering

Image:Neighbor1.png
Above: Red addresses are inside the one mile limit and are matches. Grey addresses are outside and unmatched. The blue points are the landmark features.


Attributes

A second result is that the neighbor's attributes are automatically copied onto the base feature referencing it.


Image:Neighbor1a.png
Above: On querying this address we find that it is 383ft from the nearest landmark, which is Windsor Village.


Visualization

One interesting point is that a couple of matched addresses appear to fall outside the boundary in the above image. That's because - for purposes of speed* - the buffer Interpolation Angle setting was left at 22.5 degrees, which produced only a crude approximation of the true one-mile circle. This shows that a) the processing uses the true distance, so the result is correct; and b) you shouldn't be complacent about default values in the settings dialog. If you set the buffer angle to 1 then the results look a lot better.
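A quick calculation gives a feel for how crude the 22.5 degree buffer is. The sketch below assumes the buffer outline is a polygon whose vertices lie on the true circle, spaced by the interpolation angle; that is an assumption about the Bufferer's behaviour, not something confirmed here. Under that assumption the midpoint of each polygon edge falls short of the true radius by r * (1 - cos(angle / 2)).

```python
import math

RADIUS_FT = 5280.0  # the one-mile limit used in the example


def max_shortfall(radius, angle_degrees):
    """Largest gap between a true circle and an inscribed polygon whose
    vertices are angle_degrees apart, measured at the midpoint of an edge."""
    half_angle = math.radians(angle_degrees) / 2.0
    return radius * (1.0 - math.cos(half_angle))


for angle in (22.5, 1.0):
    gap = max_shortfall(RADIUS_FT, angle)
    print(f"{angle:>5} degrees: buffer edge up to {gap:.1f} ft inside the true circle")
```

With these assumptions the 22.5 degree outline can sit roughly 100 ft inside the true one-mile circle, while at 1 degree the gap shrinks to a fraction of a foot. That is consistent with correctly matched addresses appearing just outside the drawn boundary.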


Image:Neighbor2.png
Above: Although this looks poor, the results are actually good.



Candidates First

When set to "Yes", the Candidates First setting informs the transformer that Candidate features will arrive first. Consequently, when the first base feature arrives the Candidates port is closed. Now the transformer can process Base features immediately, without having to cache them in case some candidates are yet to arrive.

The benefit is improved performance: the process will be faster and use less memory.
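The effect of the setting on caching can be mimicked with a toy stream processor in Python. This is a simplified model under stated assumptions, not FME code: features are plain numbers, "nearest" means smallest absolute difference, and `run_neighbor_stream` is a made-up name.

```python
def run_neighbor_stream(stream, candidates_first):
    """Process ("candidate" | "base", value) pairs in arrival order.

    Returns (results, peak_cached_bases, ignored_candidates).
    """
    candidates, cached_bases, results = [], [], []
    peak_cached = ignored = 0
    port_closed = False

    def nearest(base):
        # None when no candidates were received at all.
        return min(candidates, key=lambda c: abs(c - base)) if candidates else None

    for kind, value in stream:
        if kind == "candidate":
            if port_closed:
                ignored += 1            # arrived after the first base: dropped
            else:
                candidates.append(value)
        elif candidates_first:
            port_closed = True          # the first base closes the candidate port
            results.append((value, nearest(value)))
        else:
            cached_bases.append(value)  # must wait: more candidates may arrive
            peak_cached = max(peak_cached, len(cached_bases))

    # Without Candidates First, bases can only be processed once input ends.
    results.extend((b, nearest(b)) for b in cached_bases)
    return results, peak_cached, ignored


stream = [("candidate", 10), ("candidate", 50),
          ("base", 12), ("base", 47), ("base", 31)]

print(run_neighbor_stream(stream, candidates_first=True))
print(run_neighbor_stream(stream, candidates_first=False))
```

Both runs produce the same matches when the candidates really do arrive first, but only the second run has to hold every base feature before it can start work. In the real transformer that cache is where the extra memory and time go.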


Image:Neighbor3.png
Above: The NeighborFinder settings dialog.


However, setting it to Yes does require that candidates arrive first. The user controls Candidate/Base order by rearranging dataset readers in the Navigator pane. The uppermost reader should be the candidates reader, as it will be read first and therefore arrive first at the transformer. Right-click a reader to find options for moving it up/down the list.


Image:Neighbor4.png
Above: The Navigator Pane. Note how Landmarks (our Candidates) is the uppermost dataset.


Log Files

At first the Candidates First option appears to be slowing things down, taking 11.4 seconds to read the data, rather than 7.4:

  • Candidates First = No
2008-08-07 12:27:25|   7.4|  1.0|INFORM|Reading source feature # 10000
  • Candidates First = Yes
2008-08-07 12:29:10|  11.4|  2.1|INFORM|Reading source feature # 10000


But this is misleading and you shouldn't be taken in. The Candidates First = Yes translation is not just reading the data, but processing it as it goes. Therefore it is almost complete. The Candidates First = No translation has only read the data and cached it to disk. It must still do the processing.


  • Candidates First = No
2008-08-07 12:27:31|  13.3|  0.0|INFORM|Translation was SUCCESSFUL with 0 warning(s) (0 feature(s)/0 coordinate(s) output)
2008-08-07 12:27:31|  13.3|  0.0|INFORM|FME Session Duration: 13.2 seconds. (CPU: 12.0s user, 0.5s system)
2008-08-07 12:27:31|  13.3|  0.0|INFORM|END - ProcessID: 4368, peak process memory usage: 96308 kB, current process memory usage: 58552 kB.
  • Candidates First = Yes
2008-08-07 12:29:12|  12.7|  0.0|INFORM|Translation was SUCCESSFUL with 0 warning(s) (0 feature(s)/0 coordinate(s) output)
2008-08-07 12:29:12|  12.7|  0.0|INFORM|FME Session Duration: 12.8 seconds. (CPU: 11.5s user, 0.5s system)
2008-08-07 12:29:12|  12.7|  0.0|INFORM|END - ProcessID: 6940, peak process memory usage: 56208 kB, current process memory usage: 48244 kB.


That's more like it! The Candidates First = Yes translation has finished first. More importantly, its peak memory usage is only about 58% of the regular translation's (56208 kB against 96308 kB).
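Rather than eyeballing the figures, the peak memory values can be pulled out of the two END lines quoted above and compared. The sketch below is ordinary Python text parsing; the helper name `peak_memory_kb` is made up for this example, and the pattern simply mirrors the wording of these two log lines.

```python
import re

# The two END lines from the logs above, keyed by the Candidates First setting.
END_LINES = {
    "No": ("2008-08-07 12:27:31|  13.3|  0.0|INFORM|END - ProcessID: 4368, "
           "peak process memory usage: 96308 kB, "
           "current process memory usage: 58552 kB."),
    "Yes": ("2008-08-07 12:29:12|  12.7|  0.0|INFORM|END - ProcessID: 6940, "
            "peak process memory usage: 56208 kB, "
            "current process memory usage: 48244 kB."),
}

PEAK_PATTERN = re.compile(r"peak process memory usage: (\d+) kB")


def peak_memory_kb(log_line):
    """Extract the peak process memory figure (in kB) from an END log line."""
    match = PEAK_PATTERN.search(log_line)
    if match is None:
        raise ValueError("no peak memory figure in log line")
    return int(match.group(1))


peaks = {setting: peak_memory_kb(line) for setting, line in END_LINES.items()}
ratio = peaks["Yes"] / peaks["No"]
print(f"Peak memory with Candidates First = Yes is {ratio:.0%} of the No run")
```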


Caution

It's important to use this setting only when you are sure of the feature order.

Firstly, the candidates input port is closed the moment the first base feature arrives, therefore any subsequent candidates will be ignored.

2008-08-07 13:30:42|   9.0|  0.0|WARN  |NeighborFinder(ProximityFactory): Extra Candidate feature(s) encountered and ignored

Above: This is the log message that indicates ignored candidate features.


Secondly, if there are no candidates at all (i.e. a base feature is the first to arrive) then the transformer will NOT halt the translation. It will carry on and ALL base features will be classed as unmatched. This is not the same behaviour as the Clipper, which will stop the translation should there be no Clippers.

2008-08-07 13:42:08|   9.1|  0.0|STATS |NeighborFinder(ProximityFactory): Input Summary:  12292 Base feature(s), 0 Candidate features(s)

Above: You will always get a summary that will tell you exactly how many bases and candidates were used, but as a statistic and not a warning.
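Because the zero-candidates case is only reported as a statistic, it can be worth scanning the log after a run. The following Python sketch checks for the two messages quoted above. The function name and the wording of the returned problems are made up for this example, and the patterns are based solely on the two sample log lines, so other FME versions may phrase them differently.

```python
import re

# Patterns based on the two sample log lines quoted in this section.
SUMMARY = re.compile(
    r"Input Summary:\s+(\d+) Base feature\(s\), (\d+) Candidate features?\(s\)")
IGNORED = "Extra Candidate feature(s) encountered and ignored"


def check_neighborfinder_log(lines):
    """Return a list of human-readable problems found in the log lines."""
    problems = []
    for line in lines:
        if IGNORED in line:
            problems.append("candidates arrived after the first base and were ignored")
        summary = SUMMARY.search(line)
        if summary and int(summary.group(2)) == 0:
            problems.append(
                f"{summary.group(1)} base feature(s) processed against 0 candidates")
    return problems


sample_log = [
    "2008-08-07 13:30:42|   9.0|  0.0|WARN  |NeighborFinder(ProximityFactory): "
    "Extra Candidate feature(s) encountered and ignored",
    "2008-08-07 13:42:08|   9.1|  0.0|STATS |NeighborFinder(ProximityFactory): "
    "Input Summary:  12292 Base feature(s), 0 Candidate features(s)",
]

for problem in check_neighborfinder_log(sample_log):
    print(problem)
```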



Summary

Although this example only demonstrates a small difference in speed, the source datasets were relatively small. The larger the datasets (particularly the Base features) the greater the difference. In fact at some point it is going to make the difference between a translation succeeding or failing due to a lack of memory, and that's why this setting is so useful.



* OK, the real reason is that I plain forgot the Bufferer setting and blamed the bad output on the Viewer. As usual PEBKAC (http://en.wikipedia.org/wiki/Pebkac)!

Attached Files
file                        size      date
NeighborFinderExample.zip   220.8 kB  11/13/08