Automatically Detecting Abnormal Behavior in Computing Systems
RAACD: Roberts' Automatic Abnormal Conduct Detector
19 April 2013
J. Frank Roberts
Computer Science, University of Kentucky
We collect a lot of computer-health data.
We don't always know what interesting behavior looks like.
Classifying interesting behavior requires information that we don't have.
We don't know what interesting behavior looks like, but we know it's not normal.
We need a scheme that can detect abnormal behavior.
We prefer a scheme that can learn from true positive results.
I attempt to automatically discover abnormal behavior and to learn from known abnormal behavior.
RAACD provides a list of hosts that it thinks are behaving abnormally.
A host's page includes graphs for each property.
A host's page includes graphs of the anomaly score for each property:
RAACD reduces the amount of data the administrator must examine.
RAACD doesn't require any a priori definition of abnormal behavior.
RAACD applies an anomaly-detection algorithm to detect abnormal behavior.
Abnormal behavior is usually also anomalous.
I apply an anomaly detection algorithm to each property for each host.
If a host has multiple simultaneously anomalous properties, I consider its behavior abnormal.
This technique flags more than just bad or interesting behavior, but it does filter out information about hosts that are behaving normally.
I have implemented three techniques to detect anomalies: WP analysis, baseline analysis, and profile search.
Profile search is my extension to the baseline analysis method.
All three methods rely on Symbolic Aggregate approXimation (SAX) to summarize the time series. The name has two parts:
Given the series
[25, 24, 20, 15, 12, 12, 12, 13, 11, 9, 6, 3]
SAX normalizes the series by subtracting from each sample the mean and then dividing the result by the standard deviation:
[1.73, 1.58, 0.98, 0.23, -0.23, -0.23, -0.23, -0.08, -0.38, -0.68, -1.13, -1.58]
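The normalization step can be sketched in a few lines of Python. This is a minimal illustration, not RAACD's code: the function name `z_normalize` is hypothetical, and the use of the sample standard deviation (dividing by n - 1) is an assumption inferred from the numbers shown above.

```python
import statistics

def z_normalize(series):
    """Subtract the mean, then divide by the sample standard deviation."""
    mean = statistics.mean(series)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    std = statistics.stdev(series)
    return [(x - mean) / std for x in series]

series = [25, 24, 20, 15, 12, 12, 12, 13, 11, 9, 6, 3]
normalized = z_normalize(series)
print([round(x, 2) for x in normalized])
```

Rounded to two decimal places, the result matches the normalized series above.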
SAX then partitions the series and computes the mean of each partition:
mean([ 1.73,  1.58,  0.98]) -->  1.43
mean([ 0.23, -0.23, -0.23]) --> -0.08
mean([-0.23, -0.08, -0.38]) --> -0.23
mean([-0.68, -1.13, -1.58]) --> -1.13
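The partitioning step might look like the following sketch. The function name `partition_means` is hypothetical, and the code assumes the series length is an exact multiple of the number of partitions, as it is in this example.

```python
def partition_means(values, n_partitions):
    """Split values into equal-sized partitions and return each partition's mean."""
    # Assumption: len(values) is divisible by n_partitions.
    size = len(values) // n_partitions
    return [
        sum(values[i * size:(i + 1) * size]) / size
        for i in range(n_partitions)
    ]

normalized = [1.73, 1.58, 0.98, 0.23, -0.23, -0.23,
              -0.23, -0.08, -0.38, -0.68, -1.13, -1.58]
means = partition_means(normalized, 4)
print([round(m, 2) for m in means])
```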
My SAX implementation uses the alphabet "abcd".
To ensure that each of the four symbols appears with equal probability, SAX uses the following table to assign symbols:
| Symbol | Range |
| a | < -0.675 |
| b | -0.675 .. 0.0 |
| c | 0.0 .. 0.675 |
| d | > 0.675 |

| Mean | Symbol |
| 1.43 | d |
| -0.08 | b |
| -0.23 | b |
| -1.13 | a |
SAX converts the series to the word "dbba".
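Symbol assignment is a lookup against the three breakpoints in the table. The sketch below uses Python's `bisect` module; the names `to_symbol` and `to_word` are hypothetical, and sending a value that falls exactly on a breakpoint to the higher symbol is an assumption, since the table does not say which side owns the boundary.

```python
from bisect import bisect_right

ALPHABET = "abcd"
BREAKPOINTS = [-0.675, 0.0, 0.675]  # boundaries between a|b, b|c, c|d

def to_symbol(value):
    """Map one partition mean to its symbol."""
    # bisect_right counts how many breakpoints are <= value (0..3).
    return ALPHABET[bisect_right(BREAKPOINTS, value)]

def to_word(means):
    """Map a list of partition means to a SAX word."""
    return "".join(to_symbol(m) for m in means)

print(to_word([1.43, -0.08, -0.23, -1.13]))  # dbba
```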
SAX converts longer series into lists of words:
I convert computer-health data to the SAX representation.
I use the method introduced by Wei et al. to compute the distance between two sets of words. For each set of words:
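def histogram_distance(A, B):
    # A and B map each subword to its normalized count.
    dist = 0.0
    # Visit every subword seen in either histogram; a missing subword counts as 0.
    for subword in set(A) | set(B):
        dist += (A.get(subword, 0.0) - B.get(subword, 0.0)) ** 2
    return dist

Words: "abbbd", "bdaab"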
Count n-symbol subwords:
word: "abbbd"   word: "bdaab"
aa: 0           aa: 1
ab: 1           ab: 1
bb: 2           bb: 0
bd: 1           bd: 1
da: 0           da: 1
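The counts above use n = 2. A sliding slice over each word reproduces them; the function name `count_subwords` is hypothetical, and subwords with a count of zero are simply absent from the result.

```python
from collections import Counter

def count_subwords(word, n):
    """Count every run of n consecutive symbols in word."""
    return Counter(word[i:i + n] for i in range(len(word) - n + 1))

print(dict(count_subwords("abbbd", 2)))
print(dict(count_subwords("bdaab", 2)))
```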
Normalize each count by the largest count in its word, then subtract:
word: "abbbd" word: "bdaab" difference (squared)
aa: 0.00 aa: 1.00 -1.00 (1.00)
ab: 0.50 ab: 1.00 -0.50 (0.25)
bb: 1.00 bb: 0.00 1.00 (1.00)
bd: 0.50 bd: 1.00 -0.50 (0.25)
da: 0.00 da: 1.00 -1.00 (1.00)
Total: 3.50

WP analysis slides two windows across the list of words obtained from SAX.
The lead window should be long enough to capture one cycle of normal behavior.
The lag window should be 2 or 3 times the length of the lead window.
At each time step in the series, WP analysis computes subword histograms from the lead and lag windows and computes the distance between them.
I use the distance as the anomaly score and associate the score with the sample at the border between the two windows.
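The two-window scheme can be sketched as follows. This is an illustration under stated assumptions rather than RAACD's implementation: the function names are hypothetical, the lag window is assumed to cover the words immediately before the border and the lead window the words immediately after it, and histograms are scaled by their largest count to match the worked distance example.

```python
from collections import Counter

def subword_histogram(words, n=2):
    """Count n-symbol subwords over all words, scaled so the largest count is 1.0."""
    counts = Counter(w[i:i + n] for w in words for i in range(len(w) - n + 1))
    peak = max(counts.values(), default=1)
    return {sub: c / peak for sub, c in counts.items()}

def histogram_distance(a, b):
    """Sum of squared differences over every subword seen in either histogram."""
    return sum((a.get(s, 0.0) - b.get(s, 0.0)) ** 2 for s in set(a) | set(b))

def wp_scores(words, lead_len, lag_len, n=2):
    """Return {t: score}, where t is the index of the border between the windows."""
    scores = {}
    for t in range(lag_len, len(words) - lead_len + 1):
        lag = subword_histogram(words[t - lag_len:t], n)    # words before the border
        lead = subword_histogram(words[t:t + lead_len], n)  # words after the border
        scores[t] = histogram_distance(lead, lag)
    return scores

# Synthetic input: a repeating word with one out-of-place word in the middle.
words = ["abba"] * 8 + ["dddd"] + ["abba"] * 8
scores = wp_scores(words, lead_len=2, lag_len=4)
for t in sorted(scores):
    print(t, round(scores[t], 2))
```

Applied to the single words "abbbd" and "bdaab", `histogram_distance` returns the total of 3.50 from the worked example. In the synthetic run, the score is zero wherever both windows contain only the repeating word and rises when either window covers the odd word.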
I applied WP analysis to a synthesized series:
WP analysis produces a double peak around a point anomaly.
Baseline analysis slides only one window, the inspection window, across the list of words.
At each time step, baseline analysis builds a subword histogram from the inspection window and computes the distance to a precomputed subword histogram.
I build the precomputed subword histogram from a series that represents normal behavior.
Baseline analysis associates the anomaly score with the center of the inspection window.
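A single-window version might look like the sketch below. The names are hypothetical, the histogram helpers follow the worked distance example (counts scaled by the largest count), and using `start + window_len // 2` as the window center is an assumption.

```python
from collections import Counter

def subword_histogram(words, n=2):
    """Count n-symbol subwords over all words, scaled so the largest count is 1.0."""
    counts = Counter(w[i:i + n] for w in words for i in range(len(w) - n + 1))
    peak = max(counts.values(), default=1)
    return {sub: c / peak for sub, c in counts.items()}

def histogram_distance(a, b):
    """Sum of squared differences over every subword seen in either histogram."""
    return sum((a.get(s, 0.0) - b.get(s, 0.0)) ** 2 for s in set(a) | set(b))

def baseline_scores(words, baseline_hist, window_len, n=2):
    """Return {center_index: score} for each position of the inspection window."""
    scores = {}
    for start in range(len(words) - window_len + 1):
        window = words[start:start + window_len]
        center = start + window_len // 2
        scores[center] = histogram_distance(subword_histogram(window, n), baseline_hist)
    return scores

# The baseline histogram comes from a series that represents normal behavior.
baseline = subword_histogram(["abba"] * 10)
observed = ["abba"] * 5 + ["dddd"] + ["abba"] * 5
scores = baseline_scores(observed, baseline, window_len=3)
for center in sorted(scores):
    print(center, round(scores[center], 2))
```

Because only one window moves, the score is nonzero only while the inspection window covers the unusual word.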
I applied baseline analysis to a synthesized series:
Baseline analysis produces a sharp peak at a point anomaly.
A profile search is very similar to baseline analysis.
I build a precomputed profile from a series that contains a specific anomaly.
I produce the anomaly scores from the vector of distances as follows:
I can use this technique to detect specific patterns that I've previously discovered with other tools.
I searched for the 26 samples surrounding the anomaly:
Profile search generates a broad peak around an anomaly.
WP analysis generates a double peak.
WP correctly detects several anomalies.
Baseline analysis also performs well.
Profile search does not perform well, even on synthesized data with noise.
The peaks in the anomaly score are too imprecise to be of any use.
The anomaly-score curves make it easy for a human to see anomalies in the data.
A computer needs a cutoff point.
The examples show that there is no clear threshold for "anomalous."
I have to look at more than one curve to detect abnormal behavior.
I apply multi-property detection on a per-host basis.
I look at all of the properties for one host simultaneously.
I first normalize the anomaly-score curves to have a maximum value of 1.0.
The algorithm looks for a region where multiple anomaly-score curves are most anomalous.
RAACD implements multi-property detection on top of baseline analysis.
Baseline analysis handles real data better than WP analysis.
RAACD uses a threshold value of 0.6.
When three or more anomaly scores exceed 0.6, RAACD adds the current host to a list of hosts exhibiting abnormal behavior.
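The threshold rule can be sketched as below. The property names and score values are made up for illustration, the function names are hypothetical, and checking each time step independently is an assumption about how "three or more anomaly scores exceed 0.6" is evaluated.

```python
def normalize_curve(scores):
    """Scale an anomaly-score curve so that its maximum value is 1.0."""
    peak = max(scores)
    return [s / peak if peak > 0 else 0.0 for s in scores]

def abnormal_times(curves, threshold=0.6, min_properties=3):
    """Return the time steps at which enough normalized curves exceed the threshold.

    curves maps a property name to its list of anomaly scores (equal lengths).
    """
    normalized = [normalize_curve(c) for c in curves.values()]
    hits = []
    for t in range(len(normalized[0])):
        count = sum(1 for curve in normalized if curve[t] > threshold)
        if count >= min_properties:
            hits.append(t)
    return hits

# Hypothetical anomaly-score curves for one host.
curves = {
    "cpu_load": [0.1, 0.2, 2.0, 0.3],
    "memory":   [0.5, 0.4, 4.0, 3.0],
    "network":  [1.0, 0.1, 0.9, 0.2],
    "disk_io":  [0.0, 0.0, 0.0, 0.0],
}
hits = abnormal_times(curves)
print(hits)  # [2]
```

A host would be added to the abnormal list when `hits` is non-empty. Here only time step 2 qualifies, because three of the four normalized curves exceed 0.6 there.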
RAACD generates HTML pages from this list.
I implemented RAACD as a presentation front-end to NodeScape v2.
NodeScape v2 collects and stores computer-health data.
The NodeScape project is the first result of Aggregate.org's research on smarter computer monitoring.
KAOS moved into Marksbury => new machine room
Big windows warrant a pretty display
We devised NodeScape
The first version of NodeScape employs novel presentation techniques:
NodeScape v1 is about making the information easy for a human to reduce.
RAACD reduces the amount of information a human has to see.
In addition to presentation techniques, RAACD employs new analysis techniques.
These tools are all great for collecting and presenting computer-health data.
None of these tools detects abnormal behavior.
Pulsar detects alarming behavior.
Pulsar both reduces the amount of data presented and makes the data presented easy to reduce.
Drawback: The administrator must know what sort of behavior to expect when deriving a formula to compute the comfort level.
RAACD can detect unexpected behavior.
I want to spend more time studying the behavior of these anomaly-detection methods.
I want to find a new way to represent information about anomalies.
I want to release RAACD publicly.
I want to study multi-host detection.
I want to improve multi-property detection.
I want to try 5-symbol SAX.
I reimplemented the SAX method.
I implemented anomaly-detection methods similar to those presented by Wei et al.
I developed my own anomaly-detection method, profile search.
I developed the multi-property method for detecting abnormal behavior.
I implemented multi-property detection in RAACD as a front-end to NodeScape and currently use it to monitor approximately 30 hosts for abnormal behavior.