Automatically Detecting Abnormal Behavior in Computing Systems

RAACD: Roberts' Automatic Abnormal Conduct Detector

19 April 2013

J. Frank Roberts

Computer Science, University of Kentucky

Computer-health Data

Looks like this:

The result of my analysis looks like this.

Computer-health Data

We collect a lot of computer-health data.

What does interesting behavior look like?

We don't always know what interesting behavior looks like.



Classifying interesting behavior requires information that we don't have.

Interesting behavior is a subset of abnormal behavior.

We don't know what interesting behavior looks like, but we know it's not normal.

We need a scheme that can detect abnormal behavior.

We prefer a scheme that can learn from true positive results.

I attempt to automatically discover abnormal behavior and to learn from known abnormal behavior.

Enter RAACD

RAACD provides a list of hosts that it thinks are behaving abnormally.

Violet.cs

A host's page includes graphs for each property.

Violet.cs

A host's page includes graphs of the anomaly score for each property:

RAACD

RAACD reduces the amount of data the administrator must examine.

RAACD doesn't require any a priori definition of abnormal behavior.

RAACD applies an anomaly-detection algorithm to detect abnormal behavior.

Terms

Detecting Abnormal Behavior

Abnormal behavior is usually also anomalous.

I apply an anomaly detection algorithm to each property for each host.

If a host has multiple simultaneously anomalous properties, I consider its behavior abnormal.

This technique flags more than just bad or interesting behavior, but it does filter out information about hosts that are behaving normally.

An Example of Abnormal Behavior

Anomaly-detection Techniques

I have implemented three techniques to detect anomalies:

Anomaly-detection Techniques


The WP analysis and baseline analysis methods are based directly on techniques presented by Wei et al. [1].

Profile search is my extension to the baseline analysis method.

All three methods rely on Symbolic Aggregate approXimation (SAX) [2] to summarize the time series.

[1] Li Wei, Nitin Kumar, Venkata Lolla, Eamonn J. Keogh, Stefano Lonardi, and Chotirat Ratanamahatana. Assumption-free anomaly detection in time series. In Proceedings of the 17th International Conference on Scientific and Statistical Database Management, SSDBM'2005, pages 237-240, Berkeley, CA, USA, 2005. Lawrence Berkeley Laboratory.
[2] Jessica Lin, Eamonn Keogh, Stefano Lonardi, and Bill Chiu. A symbolic representation of time series, with implications for streaming algorithms. In Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, DMKD '03, pages 2-11, New York, NY, USA, 2003. ACM.

Symbolic Aggregate Approximation - SAX

The name has two parts:

SAX

Given the series

[25, 24, 20, 15, 12, 12, 12, 13, 11, 9, 6, 3]

SAX normalizes the series by subtracting from each sample the mean and then dividing the result by the standard deviation:

[1.73, 1.58, 0.98, 0.23, -0.23, -0.23, -0.23, -0.08, -0.38, -0.68, -1.13, -1.58]
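A minimal Python sketch of the normalization step. The function name z_normalize is hypothetical. The use of the sample standard deviation (n - 1 denominator) is an assumption, chosen because it is the variant that reproduces the values shown above.

```python
from statistics import mean, stdev

def z_normalize(series):
    """Subtract the mean from each sample, then divide by the standard deviation."""
    mu = mean(series)
    sigma = stdev(series)  # assumption: sample standard deviation (n - 1 denominator)
    return [(x - mu) / sigma for x in series]

series = [25, 24, 20, 15, 12, 12, 12, 13, 11, 9, 6, 3]
normalized = [round(v, 2) for v in z_normalize(series)]
print(normalized)
```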

SAX then partitions the series and computes the mean of each partition:

mean([ 1.73,   1.58,   0.98]) -->  1.43
mean([ 0.23,  -0.23,  -0.23]) --> -0.08
mean([-0.23,  -0.08,  -0.38]) --> -0.23 
mean([-0.68,  -1.13,  -1.58]) --> -1.13
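The partition step can be sketched in Python as follows. The name partition_means is hypothetical, and the sketch assumes the series length is an exact multiple of the number of partitions, as it is in this example.

```python
def partition_means(values, partitions):
    """Split values into equal-length partitions and return the mean of each."""
    if len(values) % partitions != 0:
        raise ValueError("this sketch assumes the length divides evenly")
    size = len(values) // partitions
    return [sum(values[i:i + size]) / size for i in range(0, len(values), size)]

normalized = [1.73, 1.58, 0.98, 0.23, -0.23, -0.23,
              -0.23, -0.08, -0.38, -0.68, -1.13, -1.58]
means = [round(m, 2) for m in partition_means(normalized, 4)]
print(means)
```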

Assign symbols

My SAX implementation uses the alphabet "abcd".

To ensure that each of the four symbols appears with equal probability, SAX uses the following table to assign symbols:

Symbol   Range
a        < -0.675
b        -0.675 .. 0.0
c        0.0 .. 0.675
d        > 0.675

Mean     Symbol
 1.43    d
-0.08    b
-0.23    b
-1.13    a

SAX converts the series to the word "dbba".
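A short Python sketch of the symbol assignment, using the ranges from the table above. The names to_word, ALPHABET, and BREAKPOINTS are hypothetical. The table does not say which symbol a value exactly on a boundary receives; this sketch assumes the higher symbol.

```python
from bisect import bisect_right

ALPHABET = "abcd"
BREAKPOINTS = [-0.675, 0.0, 0.675]  # boundaries between a|b, b|c, c|d

def to_word(means):
    """Map each partition mean to the symbol whose range contains it."""
    # bisect_right counts how many breakpoints are <= the mean, which is
    # the index of the matching symbol (assumption: ties go to the higher symbol).
    return "".join(ALPHABET[bisect_right(BREAKPOINTS, m)] for m in means)

word = to_word([1.43, -0.08, -0.23, -1.13])
print(word)
```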

Longer series

SAX converts longer series into lists of words:

I convert computer-health data to the SAX representation.

Computing Distance

I use the method introduced by Wei et al. to compute the distance between two sets of words. For each set of words:

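def histogram_distance(a, b):
    """Sum of squared differences between two subword histograms (dicts)."""
    dist = 0.0
    # Visit every subword seen in either histogram; a missing subword counts as 0.
    for subword in set(a) | set(b):
        dist += (a.get(subword, 0.0) - b.get(subword, 0.0)) ** 2
    return dist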

Example Distance Calculation

Words: "abbbd", "bdaab"

Count n-symbol subwords (here n = 2):

word: "abbbd"    word: "bdaab"
aa: 0            aa: 1
ab: 1            ab: 1
bb: 2            bb: 0
bd: 1            bd: 1
da: 0            da: 1

Normalize each histogram by its largest count, then subtract:

word: "abbbd"   word: "bdaab"   difference (squared)
aa: 0.00        aa: 1.00         -1.00      (1.00)
ab: 0.50        ab: 1.00         -0.50      (0.25)
bb: 1.00        bb: 0.00          1.00      (1.00)
bd: 0.50        bd: 1.00         -0.50      (0.25)
da: 0.00        da: 1.00         -1.00      (1.00)
                                    Total:   3.50
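The worked example can be reproduced with a short Python sketch. The name subword_histogram is hypothetical. Dividing each count by the largest count in its histogram is inferred from the numbers in the example, not from a stated rule.

```python
from collections import Counter

def subword_histogram(word, n=2):
    """Count n-symbol subwords, then scale so the largest count becomes 1.0."""
    counts = Counter(word[i:i + n] for i in range(len(word) - n + 1))
    peak = max(counts.values())  # assumes the word has at least n symbols
    return {sub: c / peak for sub, c in counts.items()}

def histogram_distance(a, b):
    """Sum of squared differences; a subword missing from one side counts as 0."""
    return sum((a.get(s, 0.0) - b.get(s, 0.0)) ** 2 for s in set(a) | set(b))

left = subword_histogram("abbbd")
right = subword_histogram("bdaab")
total = histogram_distance(left, right)
print(total)
```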

WP analysis

WP analysis slides two windows across the list of words obtained from SAX.

The lead window should be long enough to capture one cycle of normal behavior.

The lag window should be 2 or 3 times the length of the lead window.

At each time step in the series, WP analysis computes subword histograms from the lead and lag windows and computes the distance between them.

I use the distance as the anomaly score and associate the score with the sample at the border between the two windows.
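A minimal Python sketch of the two-window scan. It is an illustration under stated assumptions, not RAACD's code: window lengths are measured in words, the lag window covers the words before the border and the lead window the words after it, and the names wp_scores and window_histogram are hypothetical.

```python
from collections import Counter

def window_histogram(words, n=2):
    """Normalized n-symbol subword histogram over every word in a window."""
    counts = Counter(w[i:i + n] for w in words for i in range(len(w) - n + 1))
    peak = max(counts.values(), default=1)
    return {sub: c / peak for sub, c in counts.items()}

def histogram_distance(a, b):
    """Sum of squared differences; a missing subword counts as 0."""
    return sum((a.get(s, 0.0) - b.get(s, 0.0)) ** 2 for s in set(a) | set(b))

def wp_scores(words, lead_len, lag_len):
    """Anomaly score keyed by the index of the border between the two windows."""
    scores = {}
    for border in range(lag_len, len(words) - lead_len + 1):
        lag = window_histogram(words[border - lag_len:border])     # assumed: past
        lead = window_histogram(words[border:border + lead_len])   # assumed: ahead
        scores[border] = histogram_distance(lag, lead)
    return scores

# Synthetic word list: a repeating word with one reversed word at index 12.
words = ["abcd"] * 12 + ["dcba"] + ["abcd"] * 12
scores = wp_scores(words, lead_len=2, lag_len=4)
```

In this synthetic list the score is zero wherever both windows contain only the repeating word and rises once either window covers the reversed word.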

WP Example

I applied WP analysis to a synthesized series:

WP analysis produces a double peak around a point anomaly.

Baseline analysis

Baseline analysis slides only one window, the inspection window, across the list of words.

At each time step, baseline analysis builds a subword histogram from the inspection window and computes the distance to a precomputed subword histogram.

I build the precomputed subword histogram from a series that represents normal behavior.

Baseline analysis associates the anomaly score with the center of the inspection window.
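The single-window scan can be sketched in Python in the same spirit. The names baseline_scores and window_histogram are hypothetical, the window length is assumed to be measured in words, and the center is taken as the window start plus half the window length.

```python
from collections import Counter

def window_histogram(words, n=2):
    """Normalized n-symbol subword histogram over every word in a window."""
    counts = Counter(w[i:i + n] for w in words for i in range(len(w) - n + 1))
    peak = max(counts.values(), default=1)
    return {sub: c / peak for sub, c in counts.items()}

def histogram_distance(a, b):
    """Sum of squared differences; a missing subword counts as 0."""
    return sum((a.get(s, 0.0) - b.get(s, 0.0)) ** 2 for s in set(a) | set(b))

def baseline_scores(words, baseline_words, window_len):
    """Score each inspection window against a histogram built once from normal data."""
    reference = window_histogram(baseline_words)  # the precomputed histogram
    scores = {}
    for start in range(len(words) - window_len + 1):
        inspected = window_histogram(words[start:start + window_len])
        center = start + window_len // 2
        scores[center] = histogram_distance(inspected, reference)
    return scores

# Synthetic data: the baseline is all "abcd"; the inspected list has one reversed word.
baseline = ["abcd"] * 8
words = ["abcd"] * 5 + ["dcba"] + ["abcd"] * 5
scores = baseline_scores(words, baseline, window_len=3)
```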

Baseline Example

I applied baseline analysis to a synthesized series:

Baseline analysis produces a sharp peak at a point anomaly.

Profile Search

A profile search is very similar to baseline analysis.

I build a precomputed profile from a series that contains a specific anomaly.

I produce the anomaly scores from the vector of distances as follows:

I can use this technique to detect specific patterns that I've previously discovered with other tools.

Profile Search

I searched for the 26 samples surrounding the anomaly:

Profile search generates a broad peak around an anomaly.

Analyzing Real Data

WP analysis generates a double peak.

WP correctly detects several anomalies.

Analyzing Real Data

Baseline analysis also performs well.

Analyzing Real Data

Profile search performs poorly even on synthesized data with noise.

The peaks in the anomaly score are too imprecise to be of any use.

Detection still isn't automatic.

The anomaly-score curves make it easy for a human to see anomalies in the data.

A computer needs a cutoff point.

The examples show that there is no clear threshold for "anomalous."

I have to look at more than one curve to detect abnormal behavior.

Multi-property detection

I apply multi-property detection on a per-host basis.

I look at all of the properties for one host simultaneously.

I first normalize the anomaly-score curves to have a maximum value of 1.0.

The algorithm looks for a region where multiple anomaly-score curves are most anomalous.

RAACD

RAACD implements multi-property detection on top of baseline analysis.

Baseline analysis handles real data better than WP.

RAACD uses a threshold value of 0.6.

When three or more anomaly scores exceed 0.6, RAACD adds the current host to a list of hosts exhibiting abnormal behavior.
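A minimal Python sketch of the threshold rule for one host. It assumes that "simultaneously" means the same time index in every curve and that each curve is a list of non-negative scores. The function names and the property names in the example are hypothetical.

```python
def normalize_curve(scores):
    """Scale an anomaly-score curve so that its maximum value is 1.0."""
    peak = max(scores, default=0.0)
    return [s / peak for s in scores] if peak > 0 else [0.0] * len(scores)

def abnormal_times(curves, threshold=0.6, min_properties=3):
    """Time indices where at least min_properties normalized scores exceed threshold."""
    normalized = [normalize_curve(c) for c in curves.values()]
    length = min((len(c) for c in normalized), default=0)
    return [t for t in range(length)
            if sum(c[t] > threshold for c in normalized) >= min_properties]

# Hypothetical anomaly-score curves for four properties of one host.
curves = {
    "load": [0.1, 0.2, 4.0, 0.3],
    "mem":  [1.0, 1.0, 9.0, 2.0],
    "net":  [0.0, 5.0, 50.0, 1.0],
    "disk": [0.5, 3.0, 1.0, 0.2],
}
flagged = abnormal_times(curves)
host_is_abnormal = bool(flagged)
```

Here only index 2 is flagged: three curves peak there, while the single peak in "disk" at index 1 is not enough on its own.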

RAACD generates HTML pages from this list.

RAACD and NodeScape

I implemented RAACD as a presentation front-end to NodeScape v2.

NodeScape v2 collects and stores computer-health data.

The NodeScape project is the first result of Aggregate.org's research on smarter computer monitoring.

Smarter Monitoring

KAOS moved into Marksbury => new machine room

Big windows warrant a pretty display

We devised NodeScape

NodeScape and the "bigger, faster, better" problem:

NodeScape v1

The first version of NodeScape employs novel presentation techniques:

NodeScape v1 is about making the information easy for a human to reduce.

Smarter Monitoring and RAACD

RAACD reduces the amount of information a human has to see.

In addition to presentation techniques, RAACD employs new analysis techniques.

Other Monitoring Tools

These tools are all great for collecting and presenting computer-health data.

None of these tools detects abnormal behavior.

Pulsar

Pulsar detects alarming behavior.

Pulsar both reduces the amount of data presented and makes the data presented easy to reduce.

Drawback: The administrator must know what sort of behavior to expect when deriving a formula to compute the comfort level.

RAACD can detect unexpected behavior.

Future Work

I want to spend more time studying the behavior of these anomaly-detection methods.

I want to find a new way to represent information about anomalies.

I want to release RAACD publicly.

I want to study multi-host detection.

I want to improve multi-property detection.

I want to try 5-symbol SAX.

Conclusion

I reimplemented the SAX method.

I implemented anomaly-detection methods similar to those presented by Wei et al.

I developed my own anomaly-detection method, profile search.

I developed the multi-property method for detecting abnormal behavior.

I implemented multi-property detection in RAACD as a front-end to NodeScape and currently use it to monitor approximately 30 hosts for abnormal behavior.

Questions?

J. Frank Roberts

Computer Science, University of Kentucky