StatisticsCalculator

From fmepedia

StatisticsCalculator is a Workbench Transformer.



Description

The StatisticsCalculator analyses a chosen attribute across a set of features and produces a set of statistical results. It can calculate the minimum, maximum, mean, median, mode, sum, range and standard deviation of the attribute, as well as the total number of features.

For example, this set of features...

Feature   Value
   1        2
   2        4
   3        1
   4        9
   5        9

...would give these results...

Max        9
Min        1
Range      8
Mean       5
Sum       25
Total      5   (the number of features, not the sum of values)
Mode       9
Median     4
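The figures above can be checked with a short Python sketch using the standard library's statistics module. It only reproduces the arithmetic of the example; it is not how the transformer is implemented, and the dictionary keys are illustrative names, not the transformer's output attribute names.

```python
import statistics

def summarize(values):
    """Compute the summary statistics listed in the example above."""
    return {
        "max": max(values),
        "min": min(values),
        "range": max(values) - min(values),
        "mean": statistics.mean(values),
        "sum": sum(values),
        "total": len(values),               # number of features, not the sum
        "mode": statistics.mode(values),     # most frequent value
        "median": statistics.median(values), # middle value once sorted: 1, 2, 4, 9, 9
    }

values = [2, 4, 1, 9, 9]  # the Value column from the example
print(summarize(values))
```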



Q+A

Q) What does the 'Pass Through Features' setting do?

If 'Pass Through' is set to Yes, every feature in the source set is output, each carrying the statistical results as attributes. Use this where each feature is to be tested against the statistics afterwards; for example, the test "Attribute Value > Mean (Attribute Value)" keeps only the features whose value is above average.
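As a rough illustration of the pass-through case, the Python sketch below copies every feature, attaches the mean to each copy, and then applies the "Attribute Value > Mean (Attribute Value)" test. Features are modelled as plain dictionaries, and the `value_mean` attribute name is a hypothetical choice for this sketch, not FME's naming.

```python
import statistics

def pass_through(features, attr):
    """Return a copy of every feature with the mean of `attr` attached.

    The '<attr>_mean' attribute name is hypothetical.
    """
    mean = statistics.mean([f[attr] for f in features])
    return [{**f, attr + "_mean": mean} for f in features]

features = [{"id": i, "value": v} for i, v in enumerate([2, 4, 1, 9, 9], start=1)]
enriched = pass_through(features, "value")

# Equivalent of testing "Attribute Value > Mean (Attribute Value)"
above_average = [f for f in enriched if f["value"] > f["value_mean"]]
print([f["id"] for f in above_average])
```

All five input features come back out, so every one of them can be tested individually.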

If 'Pass Through' is set to No, only a single feature is output, again containing the statistical results as attributes. Use this where you only need a set of statistics for the feature set and do not need to assess each feature individually. Prefer this method wherever possible: because each feature is consumed as it is read rather than held for output, it saves memory.
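The memory saving can be pictured with a simplified Python model in which features arrive one at a time and only running totals are kept. This is an assumption-laden sketch, not FME's implementation: it covers only statistics that can be accumulated in a single pass (count, sum, min, max, mean), since median and mode would need the values to be retained.

```python
def summary_only(feature_stream, attr):
    """Consume features one at a time, keeping only running totals.

    No feature is retained after it is read, so memory use does not grow
    with the number of features.
    """
    count = 0
    total = 0
    lo = None
    hi = None
    for feature in feature_stream:
        v = feature[attr]
        count += 1
        total += v
        lo = v if lo is None or v < lo else lo
        hi = v if hi is None or v > hi else hi
    mean = total / count if count else None
    return {"count": count, "sum": total, "min": lo, "max": hi, "mean": mean}

# A generator stands in for features being read from a source
stream = ({"value": v} for v in [2, 4, 1, 9, 9])
print(summary_only(stream, "value"))
```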




Q) Why is the Standard Deviation result -1? If there is an error, why not use 0?

A large set of very large values can exceed what the Standard Deviation calculation can handle numerically, causing an error. A result of -1 means this error has occurred. -1 is used because it can never be a genuine Standard Deviation result (a standard deviation is never negative); 0 would not be a good error indicator because 0 is a valid result, for example when all values are identical.
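The Python sketch below shows one way such an error can arise and mirrors the -1 convention. It assumes the naive sum-of-squares formula for the population standard deviation; whether FME uses this formula, or the population rather than the sample form, is not stated in the original. With values around 1e200, squaring overflows to infinity, the variance becomes NaN, and the function returns -1 instead of a misleading number.

```python
import math

def stdev_or_sentinel(values):
    """Population standard deviation via the naive sum-of-squares formula.

    Returns -1.0 when the calculation fails (empty input or a non-finite
    intermediate result), since -1 can never be a valid standard deviation.
    """
    n = len(values)
    if n == 0:
        return -1.0
    total = 0.0
    total_sq = 0.0
    for x in values:
        total += x
        total_sq += x * x  # overflows to inf for very large x
    mean = total / n
    variance = total_sq / n - mean * mean  # inf - inf gives nan
    if not math.isfinite(variance):
        return -1.0
    # Rounding can leave a tiny negative variance; clamp it to zero
    return math.sqrt(max(variance, 0.0))

print(stdev_or_sentinel([2, 4, 1, 9, 9]))
print(stdev_or_sentinel([3, 3, 3]))       # valid result of 0
print(stdev_or_sentinel([1e200, 1e200]))  # overflow, so the sentinel
```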



Example

The attached workspace shows how the StatisticsCalculator transformer might be used, in particular its Group-By function, which was new in FME2008.


Scenario

The simple scenario here is to find the average annual expenditure on toothpaste for each postal code in the City of Interopolis.


Source Data

The source data is a set of address points, and a set of postcode (zipcode) boundaries.

Toothpaste expenditure per household is simulated by attaching a random number to each address.


Above: Sample of source data showing addresses and postcode boundaries


Workspace

The first thing to do in the workspace - after reading the source and generating some random data - is to overlay the address points and zipcode boundaries so that each address is tagged with a postcode attribute.
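The original does not say which transformer performs the overlay, so the Python sketch below simply illustrates the idea with a standard ray-casting point-in-polygon test: each address point is tagged with the zipcode of the first boundary that contains it. The function names, attribute names and coordinates are all hypothetical, and points lying exactly on a shared boundary are not handled specially.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: is (x, y) inside the polygon given as (x, y) vertices?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge cross the horizontal line through y, to the right of x?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def tag_with_zipcode(addresses, zones):
    """Attach the zipcode of the first boundary containing each address point."""
    tagged = []
    for addr in addresses:
        zipcode = None  # stays None if the point falls outside every boundary
        for zone in zones:
            if point_in_polygon(addr["x"], addr["y"], zone["ring"]):
                zipcode = zone["zipcode"]
                break
        tagged.append({**addr, "zipcode": zipcode})
    return tagged

zones = [
    {"zipcode": "A1", "ring": [(0, 0), (10, 0), (10, 10), (0, 10)]},
    {"zipcode": "B2", "ring": [(10, 0), (20, 0), (20, 10), (10, 10)]},
]
addresses = [{"id": 1, "x": 5, "y": 5}, {"id": 2, "x": 15, "y": 2}, {"id": 3, "x": 25, "y": 5}]
for a in tag_with_zipcode(addresses, zones):
    print(a["id"], a["zipcode"])
```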

The StatisticsCalculator is then set to calculate the number of addresses per postcode and the average toothpaste expenditure per postcode. This is done with the Group-By function, which here is set to the zipcode attribute.
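Conceptually, grouping means the statistics are calculated separately for each distinct value of the group-by attribute. The Python sketch below models that for the toothpaste scenario, producing one record per zipcode with an address count and a mean expenditure. The attribute names and sample values are made up for illustration.

```python
from collections import defaultdict

def stats_by_group(features, attr, group_by):
    """Count features and average `attr` separately for each value of `group_by`."""
    groups = defaultdict(list)
    for f in features:
        groups[f[group_by]].append(f[attr])
    # One output record per distinct group-by value
    return {
        key: {"count": len(vals), "mean": sum(vals) / len(vals)}
        for key, vals in groups.items()
    }

features = [
    {"zipcode": "A1", "toothpaste": 20.0},
    {"zipcode": "A1", "toothpaste": 30.0},
    {"zipcode": "B2", "toothpaste": 12.0},
]
print(stats_by_group(features, "toothpaste", "zipcode"))
```

However many features go in, the number of records out equals the number of distinct zipcodes.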



Above: The StatisticsCalculator settings dialog. Note the parameters for the attribute to analyze and the group-by attribute.



Above: The workspace as a whole [Note: zipcode on the output is coloured red because the group-by attributes are erroneously not being exposed; this is expected to be fixed in a later release.]


Output

Note in the workspace above how 12292 features enter the StatisticsCalculator but only 7 features exit. One feature is output per group, so there are 7 groups (different postcodes) for which statistics were calculated.

The output is written to a plain CSV file.
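For readers reproducing the result outside FME, the Python sketch below writes per-group statistics as a plain CSV using the standard csv module. The column names and sample values are hypothetical; an in-memory buffer is used here, and writing to disk would use `open(path, "w", newline="")` instead.

```python
import csv
import io

def write_group_stats_csv(stats, out):
    """Write one CSV row per group; `stats` maps group key -> {'count', 'mean'}."""
    writer = csv.writer(out)
    writer.writerow(["zipcode", "address_count", "mean_toothpaste"])
    for zipcode in sorted(stats):
        row = stats[zipcode]
        writer.writerow([zipcode, row["count"], f"{row['mean']:.2f}"])

buffer = io.StringIO()
write_group_stats_csv(
    {"B2": {"count": 1, "mean": 12.0}, "A1": {"count": 2, "mean": 25.0}}, buffer
)
print(buffer.getvalue())
```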



Above: The output (tidied up slightly for clarity)

Attached Files

File                      Size      Date
SC1.jpg                   28.5 kB   08/18/09
SC2.jpg                   50.1 kB   08/18/09
SC3.jpg                   94.7 kB   08/18/09
SC4.jpg                   15.6 kB   08/18/09
StatisticsCalculator.zip  262.1 kB  08/18/09