Detectors#

Skchange detectors inherit from and extend Sktime's BaseDetector class. This gives change and anomaly detection a unified interface, making it easy to switch between detectors and reuse surrounding code such as preprocessing, evaluation, tuning, and visualisation. It also facilitates the development of new detectors by providing a clear structure and set of guidelines.

Conceptual model#

All detectors in Skchange and Sktime are built around the conceptual model below.

  1. Input: A time series.

  2. Output: Locations of events in the time series.

    • Changepoints.

    • Segments.

    • Point anomalies.

    • Segment anomalies.

Length(output) = Number of detected events.

Change detectors#

The task#

Change detection is the task of identifying abrupt changes in the distribution of a time series. The goal is to estimate the time points at which the distribution changes. These points are called change points (or change-points or changepoints).

Here is some 3-dimensional toy data with three changes in the mean of a Gaussian time series with unit variance. This data will be used in the examples throughout this section.

[1]:
from skchange.datasets import generate_piecewise_normal_data

x = generate_piecewise_normal_data(
    means=[0, [8.0, 0.0, 0.0], 0.0, [2.0, 3.0, 5.0]],
    lengths=[100, 40, 80, 80],
    seed=8,
)
x.columns = ["var0", "var1", "var2"]
x.index.name = "time"
x
[1]:
var0 var1 var2
time
0 -2.312582 -0.188897 -0.957229
1 0.893600 0.956847 1.392258
2 0.767470 -0.053030 0.859794
3 1.505481 -0.653595 0.610351
4 -0.042674 1.440017 -0.836895
... ... ... ...
295 2.945742 3.326710 5.213129
296 3.370899 2.608499 5.397341
297 1.733972 4.521629 4.686234
298 1.735389 0.345493 5.232872
299 1.625178 3.552635 3.516013

300 rows × 3 columns

[2]:
import plotly.express as px
import plotly.io as pio

pio.renderers.default = "notebook"

px.line(x)

Changes may occur in much more complex ways. For example, changes can affect:

  • Variance.

  • Shape of the distribution.

  • Auto-correlation.

  • The slope of a linear trend.

  • Relationships between variables in multivariate time series.

  • An unknown, small portion of variables in a high-dimensional time series.

Skchange supports detecting changes in all of these scenarios, amongst others.

Composable change detectors#

Let us estimate the change points in the toy data using a change detector.

[3]:
from skchange.change_detectors import MovingWindow
from skchange.change_scores import CUSUM

detector = MovingWindow(
    change_score=CUSUM(),
    penalty=10,
)
detector
/tmp/ipykernel_700/2754083811.py:1: FutureWarning: The current skchange API will be removed in 0.17.0 and replaced by the API currently previewed in `skchange.new_api`. Pin `skchange<0.16` to keep the current API. See the migration guide for details: https://github.com/NorskRegnesentral/skchange/blob/main/skchange/new_api/MIGRATION_GUIDE.md
  from skchange.change_detectors import MovingWindow
[3]:
MovingWindow(change_score=CUSUM(), penalty=10)

Let us look at each part of the detector in more detail:

  1. change_score: Represents the choice of feature to detect changes in. CUSUM is a popular choice for detecting changes in the mean of a time series.

  2. penalty: Used to control the complexity of the change point model. The higher the penalty, the fewer change points will be detected.

  3. detector: The search algorithm for detecting change points. It governs the slices of data the change score is evaluated on and how the results are compiled to a final set of detected change points.

In Skchange, all detectors follow the same pattern. They are composed of a score to be evaluated on data intervals, and a penalty. See the section on Interval scorers for more information.
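To make the interplay of the three components concrete, here is a minimal univariate sketch of a moving-window search using a textbook two-window CUSUM score for a change in mean. It is an illustration under simplifying assumptions (unit variance, equal window sizes, plain Python lists), not Skchange's implementation; the function name `moving_window_cusum` and the toy data are hypothetical.

```python
import math


def moving_window_cusum(x, bandwidth, penalty):
    """Penalised two-window mean-change score at every candidate split t.

    Illustrative only: compares the mean of the `bandwidth` points before t
    with the mean of the `bandwidth` points from t onwards.
    """
    n = len(x)
    scores = [float("nan")] * n  # undefined where a full window does not fit
    scale = math.sqrt(bandwidth / 2)  # sqrt(b * b / (b + b)) for equal windows
    for t in range(bandwidth, n - bandwidth + 1):
        left_mean = sum(x[t - bandwidth:t]) / bandwidth
        right_mean = sum(x[t:t + bandwidth]) / bandwidth
        # Score (the feature), minus the penalty (the complexity control).
        scores[t] = scale * abs(right_mean - left_mean) - penalty
    return scores


# Toy series with a single mean shift at index 10.
toy = [0.0] * 10 + [4.0] * 10
toy_scores = moving_window_cusum(toy, bandwidth=5, penalty=3.0)

# The search step: pick the split with the highest penalised score.
valid = [t for t, s in enumerate(toy_scores) if not math.isnan(s)]
best_split = max(valid, key=lambda t: toy_scores[t])
```

In this sketch the score plays the role of `change_score`, the subtraction of `penalty` controls how many splits survive, and the loop over windows stands in for the search algorithm.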

To detect changes and segment anomalies, Skchange follows a familiar scikit-learn-type API. All detectors inherit from the BaseDetector class of Sktime, which makes them interoperable with the Sktime ecosystem of tools such as pipelines, preprocessing, transformations and performance evaluation. This also means that you can use the same API to detect both changes and segment anomalies, regardless of which detector you choose.

fit#

After initialising your detector of choice, you need to fit it to training data before you can use it to detect change points on test data. fit always returns the fitted detector itself. Some detectors have no parameters to fit, in which case fit does nothing. This is the case for our example MovingWindow detector.

[4]:
detector.fit(x)
[4]:
MovingWindow(change_score=CUSUM(), penalty=10)
[5]:
detector.is_fitted
[5]:
True
[6]:
detector.get_fitted_params()
[6]:
{}
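The convention that fit returns the fitted object itself can be illustrated with a small stand-in class. The class `ToyThresholdDetector` below is hypothetical and unrelated to any Skchange or Sktime class; it only sketches the scikit-learn-style pattern of learning state in fit, returning `self`, and using that state in predict.

```python
from statistics import mean, pstdev


class ToyThresholdDetector:
    """Hypothetical detector, only to show the fit/predict convention."""

    def __init__(self, n_std=3.0):
        self.n_std = n_std

    def fit(self, x):
        # Learn a threshold from the training data; trailing underscore
        # marks a fitted attribute, following scikit-learn convention.
        self.threshold_ = mean(x) + self.n_std * pstdev(x)
        return self  # returning self allows detector.fit(x).predict(y)

    def predict(self, x):
        # Integer locations of values above the fitted threshold.
        return [i for i, v in enumerate(x) if v > self.threshold_]


train = [0.0, 1.0, 0.0, -1.0] * 5
toy_detector = ToyThresholdDetector(n_std=3.0)
fitted = toy_detector.fit(train)
flagged = fitted.predict([0.0, 5.0, 1.0, 3.0])
```

Unlike this toy class, a detector with nothing to learn would simply return `self` from fit without storing anything, which is consistent with the empty dictionary of fitted parameters shown above.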

predict#

After fitting the detector, you can use it to detect change points in test data x. The predict method returns a pd.DataFrame with the "ilocs" column holding the integer locations of the detected changepoints.

[7]:
detections = detector.predict(x)
detections
[7]:
ilocs
0 100
1 140
2 220
[8]:
from skchange.utils.plotting import plot_detections

plot_detections(x, detections, data_repr="line")

In Skchange, the change points indicate the inclusive start of a new segment. That is, the segmentation according to the detected changepoints in this example is 0:100, 100:140, 140:220 and 220:300.
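The convention can be written out in a few lines of plain Python. The helper `segments_from_changepoints` is a hypothetical name used only for this sketch; it turns the detected integer locations into half-open `(start, end)` pairs, where `start` is included and `end` is excluded.

```python
def segments_from_changepoints(changepoints, n):
    """Half-open (start, end) segments implied by sorted changepoints.

    Each changepoint is the first index of a new segment, and n is the
    length of the series.
    """
    bounds = [0, *changepoints, n]
    return list(zip(bounds[:-1], bounds[1:]))


toy_segments = segments_from_changepoints([100, 140, 220], n=300)
# toy_segments == [(0, 100), (100, 140), (140, 220), (220, 300)]
```

Three changepoints therefore give four segments, and the segment lengths add up to the length of the series.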

transform#

You can use the transform method to label the data according to the change point segmentation. The output is a pd.DataFrame with the same index as the input x and an integer column "labels" indicating which segment the index belongs to.

[9]:
labels = detector.transform(x)
labels
[9]:
labels
time
0 0
1 0
2 0
3 0
4 0
... ...
295 3
296 3
297 3
298 3
299 3

300 rows × 1 columns

[10]:
px.line(labels)

This is useful for group-by operations per segment, for example.

[11]:
x.join(labels).groupby("labels").agg(["mean", "std"])
[11]:
var0 var1 var2
mean std mean std mean std
labels
0 0.091136 1.088356 0.101373 1.050778 -0.170998 0.960050
1 7.858835 1.023154 0.195224 1.132525 -0.116078 0.958947
2 0.006880 0.904106 -0.046340 0.925909 0.151141 0.981135
3 2.009350 1.005611 2.738063 1.113823 4.982940 1.056325
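The labelling and the per-segment aggregation above can be approximated without Skchange or pandas. The sketch below assumes the changepoints are sorted and uses hypothetical helper names (`segment_labels`, `mean_per_segment`) and made-up noiseless values; it is meant to show the logic, not to reproduce the numbers in the table.

```python
from bisect import bisect_right
from statistics import mean


def segment_labels(changepoints, n):
    """Label index i by the number of changepoints at or before i."""
    return [bisect_right(changepoints, i) for i in range(n)]


def mean_per_segment(values, labels):
    """Group values by label and average each group."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {label: mean(group) for label, group in groups.items()}


toy_labels = segment_labels([100, 140, 220], n=300)
# Noiseless stand-in data: 8.0 inside segment 1, 0.0 elsewhere.
toy_values = [8.0 if label == 1 else 0.0 for label in toy_labels]
segment_means = mean_per_segment(toy_values, toy_labels)
```

Because a changepoint marks the first index of a new segment, index 99 receives label 0 and index 100 receives label 1.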

transform_scores#

Some detectors also support the transform_scores method, which returns the penalised change scores for each data point. This is the case for MovingWindow.

[12]:
detection_scores = detector.transform_scores(x)
detection_scores
[12]:
bandwidth 20
time
0 NaN
1 -6.440360
2 -8.462907
3 -8.855577
4 -8.631866
... ...
295 -7.886370
296 -8.512903
297 -8.371328
298 -7.973151
299 -7.280008

300 rows × 1 columns

[13]:
px.line(detection_scores)

For the MovingWindow detector, the peaks in the penalised scores correspond to the detected change points.
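One plausible way to read change points off a penalised score series is to take the highest point within each run of consecutive positive scores. The sketch below implements that rule on made-up numbers; the helper name `peaks_above_zero` is hypothetical, and the selection rule actually used by MovingWindow may differ in its details.

```python
def peaks_above_zero(scores):
    """Index of the maximum within each run of consecutive positive scores."""
    peaks, run = [], []
    for i, score in enumerate(scores):
        if score > 0:  # NaN compares False, so NaN entries end a run
            run.append(i)
        elif run:
            peaks.append(max(run, key=lambda j: scores[j]))
            run = []
    if run:  # close a run that reaches the end of the series
        peaks.append(max(run, key=lambda j: scores[j]))
    return peaks


toy_penalised = [float("nan"), -3.0, 1.0, 4.0, 2.0, -1.0, -2.0, 0.5, 3.0, -4.0]
toy_peaks = peaks_above_zero(toy_penalised)
# toy_peaks == [3, 8]
```

Since the penalty has already been subtracted, zero acts as the detection threshold: a larger penalty pushes more of the curve below zero and leaves fewer peaks.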

Segment anomaly detectors#

The task#

Segment anomaly detection is the task of identifying segments of a time series where the data behaves differently than expected. The goal is to estimate starts and ends of such segments. It is an important special case of change detection where certain segments are deemed “normal” and others are “anomalous”. In most settings, a vast majority of the data is “normal”.

We use the same data as before, but now we consider the segments 100:140 and 220:300 as segment anomalies, and the remaining data as “normal” or “baseline” data.

[14]:
px.line(x)

As for change detection, segment anomalies may also affect the data in numerous other ways than sudden jumps in the mean.

Composable segment anomaly detectors#

Let us use the CAPA detector to detect segment anomalies in the toy data. It consists of the same components as the change detector we used before: A detector (CAPA), an interval score (segment_saving) and a penalty (segment_penalty). “Savings” is one of two types of anomaly scores supported in Skchange. You can read more about them in the Concepts section.

[15]:
from skchange.anomaly_detectors import CAPA
from skchange.anomaly_scores import L2Saving

detector = CAPA(
    segment_saving=L2Saving(),
    segment_penalty=20,
)
detector
[15]:
CAPA(segment_penalty=20, segment_saving=L2Saving())
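As a rough intuition for what a saving measures, the sketch below computes a squared-error saving for a single segment: the cost of explaining the segment with a fixed baseline mean, minus the cost of explaining it with its own mean. The baseline mean of 0 and the function name `l2_saving` are assumptions made for this illustration; refer to the Concepts section for how L2Saving is defined in Skchange.

```python
def l2_saving(segment, baseline_mean=0.0):
    """Squared-error cost at the baseline mean minus cost at the segment mean."""
    n = len(segment)
    segment_mean = sum(segment) / n
    cost_baseline = sum((v - baseline_mean) ** 2 for v in segment)
    cost_optimal = sum((v - segment_mean) ** 2 for v in segment)
    # Algebraically equal to n * (segment_mean - baseline_mean) ** 2.
    return cost_baseline - cost_optimal


shifted = l2_saving([8.0, 8.0, 8.0, 8.0])      # far from the baseline
centred = l2_saving([1.0, -1.0, 1.0, -1.0])    # mean equals the baseline
```

A segment whose mean sits far from the baseline yields a large saving, whereas a segment centred on the baseline yields none, however noisy it is.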

fit#

We fit the detector to obtain a fitted instance.

[16]:
detector.fit(x)
[16]:
CAPA(segment_penalty=20, segment_saving=L2Saving())

predict#

As for change detection, predict is used to detect segment anomalies in test data x. The output is a pd.DataFrame with the "ilocs" column holding the integer locations of segment anomalies as pd.Intervals, and the "labels" column holding a unique label for each segment. The labels run from 1 to K, where K is the number of detected segment anomalies.

[17]:
detections = detector.predict(x)
detections
[17]:
ilocs labels
0 [100, 140) 1
1 [220, 300) 2
[18]:
plot_detections(x, detections, data_repr="line")

transform#

The transform method labels the data according to the segment anomaly segmentation. The output is a pd.DataFrame with the same index as the input x and an integer column "labels" indicating which segment the index belongs to. The label 0 denotes the normal segments, and the labels >0 denote the segment anomalies.

[19]:
labels = detector.transform(x)
labels
[19]:
labels
time
0 0
1 0
2 0
3 0
4 0
... ...
295 2
296 2
297 2
298 2
299 2

300 rows × 1 columns

[20]:
px.line(labels)
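The mapping from detected intervals to per-index labels can be sketched in plain Python. Here the left-closed intervals from predict are represented as `(start, end)` tuples with `end` excluded, and the helper name `anomaly_labels` is hypothetical.

```python
def anomaly_labels(intervals, n):
    """Label 0 for normal indices, k for indices inside the k-th interval."""
    labels = [0] * n
    for k, (start, end) in enumerate(intervals, start=1):
        for i in range(start, end):  # start included, end excluded
            labels[i] = k
    return labels


toy_anomaly_labels = anomaly_labels([(100, 140), (220, 300)], n=300)
```

Index 139 is the last point of the first anomaly, and index 140 is back in the normal regime with label 0.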

transform_scores#

CAPA also supports the transform_scores method. It returns the cumulative optimal penalised saving at each index.

[21]:
capa_scores = detector.transform_scores(x)
px.line(capa_scores)
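To give a feel for what a cumulative optimal penalised saving is, the sketch below runs a simplified dynamic programme over a univariate series: at each index it either treats the latest point as normal or closes an anomalous segment ending there, keeping whichever gives the larger total. It assumes a baseline mean of 0 and a squared-error saving, ignores point anomalies and minimum segment lengths, and runs in quadratic time, so it should be read as a simplified illustration rather than the CAPA algorithm as implemented in Skchange. The function name is hypothetical.

```python
def cumulative_penalised_saving(x, penalty, baseline_mean=0.0):
    """Best total (saving - penalty) achievable using x[0], ..., x[t]."""
    n = len(x)
    # Prefix sums of deviations from the baseline, for O(1) segment sums.
    prefix = [0.0]
    for value in x:
        prefix.append(prefix[-1] + (value - baseline_mean))

    best = [0.0] * (n + 1)  # best[t] covers the first t observations
    for t in range(1, n + 1):
        best[t] = best[t - 1]  # option 1: x[t - 1] is normal
        for s in range(t):  # option 2: x[s:t] is an anomalous segment
            segment_sum = prefix[t] - prefix[s]
            saving = segment_sum ** 2 / (t - s)
            best[t] = max(best[t], best[s] + saving - penalty)
    return best[1:]


toy_series = [0.0] * 5 + [3.0] * 4 + [0.0] * 5
toy_cumulative = cumulative_penalised_saving(toy_series, penalty=5.0)
```

On this toy series the curve stays at 0 through the normal stretch, climbs while the anomalous segment accumulates evidence, and then flattens once the segment ends.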