Sigma Data Cleansing Filter
The foglamp-filter-sigmacleanse filter cleanses a data stream by either removing outliers or adding a label to them to facilitate further processing upstream. It builds an average and standard deviation for the data over time, then removes or labels any value that differs from that average by more than a configurable factor of the standard deviation.
The plugin is designed for situations in which a sensor or item of equipment produces occasional anomalous results. These will be removed from the data passed onward within the system to provide a cleaner data stream. Care should be taken, however, that the values removed really do represent sensor anomalies and are not the result of problems with the condition being monitored. If a sensor produces a high percentage of anomalous results, it should be considered for replacement.
To monitor the rate of anomalies, the plugin can optionally produce an hourly statistics report showing the number of readings that have been forwarded as good, the number labeled as possible anomalies, and the number discarded.
The method used to determine whether a value is anomalous is based on the premise that data from a given sensor will follow a normal distribution around the mean value sampled over time. The probability of a value being valid reduces as the value differs more greatly from the mean. This gives rise to the classical bell-shaped distribution of values as shown below.
The filter saves its state on shutdown and reloads it on startup, so that when restarted it knows whether it should start rejecting data or continue to determine what is normal. It also saves the sigma map, which contains the normalization statistics for each datapoint, on shutdown and reloads it on startup.
It can be seen from the diagram above how the probability drops as the value moves away from the mean. The sigma values here are the standard deviations observed for good data samples. Outlier values that are discarded do not contribute to the calculation of the standard deviation.
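The approach described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the plugin's actual implementation: the names SigmaTracker and cleanse are hypothetical, the running statistics use Welford's method with a population standard deviation (an assumption), and the learning period is expressed as a number of samples rather than the hours the plugin uses.

```python
import math


class SigmaTracker:
    """Running mean and standard deviation for one datapoint (Welford's method).

    Hypothetical helper for illustration; not part of the plugin.
    """

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared differences from the mean

    def add(self, value):
        """Fold one good sample into the running statistics."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def stddev(self):
        # Population standard deviation (assumption); 0.0 until two samples exist.
        return math.sqrt(self._m2 / self.count) if self.count > 1 else 0.0

    def is_outlier(self, value, factor=3.0):
        """True if value differs from the mean by more than factor * stddev."""
        return abs(value - self.mean) > factor * self.stddev


def cleanse(values, training_samples, factor=3.0):
    """Return the values kept after an initial learning period.

    training_samples is an assumed simplification: the plugin builds its
    initial statistics over a number of hours, not a number of samples.
    """
    tracker = SigmaTracker()
    kept = []
    for value in values:
        if tracker.count >= training_samples and tracker.is_outlier(value, factor):
            continue  # discarded outliers do not update the statistics
        tracker.add(value)
        kept.append(value)
    return kept
```

For example, calling cleanse([10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 50.0, 10.1], training_samples=6) drops the 50.0 reading and keeps the rest, because 50.0 lies far more than three standard deviations from the mean learned from the first six samples.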
To add a sigma cleansing filter to your service:
Click on the Applications add icon for your service or task.
Select the sigmacleanse plugin from the list of available plugins.
Name your cleansing filter.
Click Next and you will be presented with the following configuration page.
Configure your sigma cleanse filter
Sample Size: The number of hours over which an initial mean and standard deviation are built before any cleansing commences.
Sigma: The factor to apply to the standard deviation; the default is 3. With the default, any value that differs from the mean by more than 3 standard deviations will be removed or labeled, depending on the Action setting.
Statistics Asset: If this is not empty, a statistics asset will be added every hour that details the number of readings that have been forwarded by the filter and the number removed. The name of that asset matches the value entered here.
Action: The action to take if the reading being processed is detected as a possible anomaly.
The reading can either be labeled, by adding a new datapoint to the reading, or it may be removed from the data stream.
Label: The name of the data point to be used when labeling the reading as a possible anomaly.
Enable your filter and click Done.
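The effect of the Action and Label settings, together with the counts reported in the statistics asset, can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the function name apply_action, the "label" and "remove" strings, the counter names, and the use of True as the value of the label datapoint are hypothetical and are not taken from the plugin.

```python
def apply_action(reading, is_anomaly, action="remove", label="anomaly", stats=None):
    """Return the reading to forward, or None if it is dropped.

    reading is a dict of datapoint name to value. stats is an optional dict
    of counters, similar in spirit to the hourly statistics report.
    """
    if stats is None:
        stats = {}
    if not is_anomaly:
        stats["forwarded"] = stats.get("forwarded", 0) + 1
        return reading
    if action == "label":
        # Add a new datapoint marking the reading as a possible anomaly.
        stats["labeled"] = stats.get("labeled", 0) + 1
        return {**reading, label: True}
    # Any other action removes the reading from the stream.
    stats["discarded"] = stats.get("discarded", 0) + 1
    return None
```

With action set to "label" and label set to "suspect", an anomalous reading {"temp": 99.0} is forwarded as {"temp": 99.0, "suspect": True}. With action set to "remove", the same reading is dropped and counted as discarded.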