Segment anomaly detection#

The task#

Segment anomaly detection is the task of identifying segments of a time series where the data behaves differently from what is expected. The goal is to estimate the start and end of each such segment. It is an important special case of change detection in which some segments are deemed “normal” and others “anomalous”. In most settings, the vast majority of the data is “normal”.

We use the same data as in the change detection intro, but now we consider the segments 100:140 and 220:300 as segment anomalies, and the remaining data as “normal” or “baseline” data.

[1]:
from skchange.datasets import generate_piecewise_normal_data

x = generate_piecewise_normal_data(
    means=[0, [8.0, 0.0, 0.0], 0.0, [2.0, 3.0, 5.0]],
    lengths=[100, 40, 80, 80],
    seed=8,
)
x.columns = ["var0", "var1", "var2"]
x.index.name = "time"
x
[1]:
var0 var1 var2
time
0 -2.312582 -0.188897 -0.957229
1 0.893600 0.956847 1.392258
2 0.767470 -0.053030 0.859794
3 1.505481 -0.653595 0.610351
4 -0.042674 1.440017 -0.836895
... ... ... ...
295 2.945742 3.326710 5.213129
296 3.370899 2.608499 5.397341
297 1.733972 4.521629 4.686234
298 1.735389 0.345493 5.232872
299 1.625178 3.552635 3.516013

300 rows × 3 columns
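
The anomalous intervals quoted above follow directly from the segment lengths passed to the generator. As a minimal sketch in plain Python (the variable names are illustrative and not part of skchange), the segment boundaries can be recovered with cumulative sums:

```python
from itertools import accumulate

lengths = [100, 40, 80, 80]  # segment lengths passed to the generator above

# Cumulative sums give the end of each segment; shifting them gives the starts.
ends = list(accumulate(lengths))
starts = [0] + ends[:-1]
segments = list(zip(starts, ends))  # half-open intervals [start, end)

# Assumption for this example: the 2nd and 4th segments are the anomalous
# ones, matching the intervals 100:140 and 220:300 named in the text.
anomalous = [segments[1], segments[3]]
print(segments)
print(anomalous)
```

The intervals are half-open, so index 140 already belongs to the baseline again.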

[2]:
import plotly.express as px
import plotly.io as pio

pio.renderers.default = "notebook"

px.line(x)

As with change detection, segment anomalies can affect the data in many other ways than sudden jumps in the mean.

Composable segment anomaly detectors#

Let us use the CAPA detector to detect segment anomalies in the toy data. It is built from the same kinds of components as the change detector we used before: a detector (CAPA), an interval score (segment_saving) and a penalty (segment_penalty). “Savings” are one of two types of anomaly scores supported in Skchange. You can read more about them in the Concepts section.

[3]:
from skchange.anomaly_detectors import CAPA
from skchange.anomaly_scores import L2Saving

detector = CAPA(
    segment_saving=L2Saving(),
    segment_penalty=20,
)
detector
/tmp/ipykernel_615/1966638737.py:1: FutureWarning: The current skchange API will be removed in 0.17.0 and replaced by the API currently previewed in `skchange.new_api`. Pin `skchange<0.16` to keep the current API. See the migration guide for details: https://github.com/NorskRegnesentral/skchange/blob/main/skchange/new_api/MIGRATION_GUIDE.md
  from skchange.anomaly_detectors import CAPA
[3]:
CAPA(segment_penalty=20, segment_saving=L2Saving())
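
To make the idea of a saving more concrete, here is a minimal sketch in plain Python. It assumes that a saving is the cost of a segment under a fixed baseline parameter minus its cost under the best-fitting parameter, with a squared-error cost and a baseline mean of 0. The helper names are hypothetical and not skchange functions, and L2Saving may differ in its details.

```python
def squared_error_cost(values, location):
    """Sum of squared deviations of the values from a given location."""
    return sum((v - location) ** 2 for v in values)


def mean_saving(values, baseline_location=0.0):
    """Cost under the baseline mean minus cost under the segment's own mean."""
    segment_mean = sum(values) / len(values)
    baseline_cost = squared_error_cost(values, baseline_location)
    optimised_cost = squared_error_cost(values, segment_mean)
    return baseline_cost - optimised_cost


penalty = 20  # same value as segment_penalty above, for illustration only

shifted = [8.0, 7.5, 8.5, 8.0]    # mean far from the baseline
centred = [0.5, -0.5, 0.5, -0.5]  # mean equal to the baseline

print(mean_saving(shifted), mean_saving(shifted) > penalty)
print(mean_saving(centred), mean_saving(centred) > penalty)
```

Under these assumptions the saving equals the segment length times the squared distance between the segment mean and the baseline mean, so only segments whose mean is clearly shifted earn a saving larger than the penalty.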

fit#

We fit the detector to obtain a fitted instance.

[4]:
detector.fit(x)
[4]:
CAPA(segment_penalty=20, segment_saving=L2Saving())

predict#

As with change detection, predict is used to detect segment anomalies in test data x. The output is a pd.DataFrame with the "ilocs" column holding the integer locations of segment anomalies as pd.Intervals, and the "labels" column holding a unique label for each segment. The intervals are closed on the left and open on the right, and the labels run from 1, …, K, where K is the number of detected segment anomalies.

[5]:
detections = detector.predict(x)
detections
[5]:
ilocs labels
0 [100, 140) 1
1 [220, 300) 2
[6]:
from skchange.utils.plotting import plot_detections

plot_detections(x, detections, data_repr="line")

transform#

The transform method labels the data according to the segment anomaly segmentation. The output is a pd.DataFrame with the same index as the input x and an integer column "labels" indicating which segment the index belongs to. The label 0 denotes the normal segments, and the labels >0 denote the segment anomalies.

[7]:
labels = detector.transform(x)
labels
[7]:
labels
time
0 0
1 0
2 0
3 0
4 0
... ...
295 2
296 2
297 2
298 2
299 2

300 rows × 1 columns
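
The relationship between the detected intervals and the per-index labels can be sketched in plain Python. This is only an illustration of the labelling rule described above, not the library's implementation; the function name is hypothetical.

```python
def labels_from_segments(n_samples, segments):
    """Label each index with 0 (normal) or the 1-based number of its segment."""
    labels = [0] * n_samples
    for label, (start, end) in enumerate(segments, start=1):
        # Intervals are half-open: start is included, end is excluded.
        for i in range(start, end):
            labels[i] = label
    return labels


# The two intervals reported by predict above.
sketch_labels = labels_from_segments(300, [(100, 140), (220, 300)])
print(sketch_labels[99], sketch_labels[100], sketch_labels[139], sketch_labels[140])
print(sketch_labels[219], sketch_labels[220], sketch_labels[299])
```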

[8]:
px.line(labels)

transform_scores#

CAPA also supports the transform_scores method. It returns the cumulative optimal penalised saving at each index.

[9]:
capa_scores = detector.transform_scores(x)
px.line(capa_scores)
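
A cumulative optimal penalised saving can be illustrated with a simple dynamic program. The sketch below is a simplified, univariate, quadratic-time version written for this tutorial. It assumes a squared-error saving against a baseline mean of 0 and a hypothetical minimum segment length of 2, and it omits the refinements of the library's implementation, so its values need not match those of transform_scores.

```python
def capa_like_scores(x, penalty, min_len=2):
    """Return cumulative optimal penalised savings and the optimal segments."""
    n = len(x)
    prefix = [0.0]
    for value in x:
        prefix.append(prefix[-1] + value)

    def saving(start, end):
        # Squared-error saving against a baseline mean of 0: n * mean**2.
        length = end - start
        return length * ((prefix[end] - prefix[start]) / length) ** 2

    best = [0.0] * (n + 1)   # best[t]: optimal penalised saving of x[:t]
    prev = [None] * (n + 1)  # start of the anomaly ending at t, if any
    for t in range(1, n + 1):
        best[t] = best[t - 1]  # option 1: index t - 1 is normal
        for s in range(0, t - min_len + 1):
            # Option 2: x[s:t] is an anomalous segment.
            candidate = best[s] + saving(s, t) - penalty
            if candidate > best[t]:
                best[t], prev[t] = candidate, s

    # Backtrack from the end to recover the optimal anomalous segments.
    segments, t = [], n
    while t > 0:
        if prev[t] is None:
            t -= 1
        else:
            segments.append((prev[t], t))
            t = prev[t]
    return best[1:], segments[::-1]


toy = [0.0] * 10 + [3.0] * 5 + [0.0] * 10
scores, segments = capa_like_scores(toy, penalty=20)
print(segments)
print(scores[11], scores[14], scores[-1])
```

In this toy series the score stays at zero until the elevated segment has accumulated enough saving to outweigh the penalty, and it is flat again once the series returns to the baseline.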