SeededBinarySegmentation#

class SeededBinarySegmentation(change_score: BaseIntervalScorer | None = None, penalty: ArrayLike | float | None = None, penalty_scale: float = 1.0, agg: str = 'sum', min_subinterval_length: int = 5, max_interval_length: int | None = None, growth_factor: float = 1.5, selection_method: str = 'greedy')[source][source]#

Seeded binary segmentation algorithm for multiple changepoint detection.

Binary segmentation type changepoint detection algorithms recursively split the data into two segments and test whether the two segments are different. The seeded binary segmentation algorithm is an efficient version of such algorithms that tests for changepoints in intervals of exponentially growing length. It has the same theoretical guarantees as the original binary segmentation algorithm but runs in log-linear time regardless of the changepoint configuration.

The penalty, penalty_scale, and agg parameters are ignored when change_score is an inherently penalised scorer (tag penalised=True); in that case the scorer owns aggregation and penalisation.

Parameters:
change_scoreBaseIntervalScorer or None, default=None

Change score to use in the algorithm. Must be an instance of BaseIntervalScorer with score_type="change_score". If None, defaults to CUSUM().

Standard usage is to pass an unpenalised score and set penalty separately. If the score already has tag penalised=True, it owns its own aggregation and penalty, and penalty / penalty_scale are ignored.

penaltyfloat, array-like of shape (n_features,) or None, default=None

Penalty subtracted from the aggregated feature-wise score. A candidate changepoint is accepted only when the penalised score is positive.

  • float: scalar penalty subtracted from the aggregated score (see agg).

  • array-like of length n_features, non-decreasing: element i is the penalty for i+1 features jointly affected; the detector picks the k largest feature scores maximising sum(top_k) - penalty[k-1] (handles sparse changes). A strictly linear array uses a faster code path. Only consistent with the default agg="sum".

  • None: defaults to change_score.get_default_penalty() (BIC-based) at fit time.

Ignored when change_score is already penalised.

penalty_scalefloat, default=1.0

Multiplicative factor applied to the effective penalty (whether penalty is user-provided or the default). Use as a single scalar tuning knob that preserves the shape of array penalties. Must be positive. Ignored when change_score is already penalised.

agg{“sum”, “max”}, default=”sum”

How feature-wise raw scores are aggregated into a single score per interval before subtracting a scalar penalty:

  • "sum": sum(scores, axis=1) - penalty (dense change assumption).

  • "max": max(scores, axis=1) - penalty (a single, unknown feature changes).

Ignored when change_score is already aggregated. Must be "sum" when penalty is array-valued (array penalties imply top-k aggregation).

min_subinterval_lengthint, default=5

Minimum length of a subinterval on each side of a candidate split point within each evaluated interval. The effective minimum used is max(min_subinterval_length, change_score.min_size), so the actual minimum may be larger than the value provided here when the change score requires more samples per segment. Note that this does not impose a lower bound on the spacing between detected changepoints.

max_interval_lengthint or None, default=None

The maximum length of an interval to evaluate a changepoint in. Must be at least 2 * min_subinterval_length. If None, defaults to min(200, n_samples) after fitting.

growth_factorfloat, default=1.5

Growth factor for the seeded intervals. Intervals grow in size according to interval_len = max(interval_len + 1, floor(growth_factor * interval_len)), starting at interval_len = 2 * min_subinterval_length. It also governs the amount of overlap between intervals of the same length, as the start of each interval is shifted by a factor of 1 - 1 / growth_factor. Larger values produce fewer, less-overlapping intervals (faster but coarser); smaller values produce more, more-overlapping intervals (slower but finer). Must be in (1, 2].

selection_methodstr, default=”greedy”

Method for selecting the final set of changepoints from candidate intervals with positive penalised score. Options:

  • "greedy": Select the interval with the highest score, remove all overlapping intervals containing the detected changepoint, and repeat until no intervals remain with a positive score.

  • "narrowest": Among intervals with positive scores, select the narrowest one, remove all overlapping intervals containing the detected changepoint, and repeat. Corresponds to the narrowest-over-threshold approach of [2].

Attributes:
change_score_BaseIntervalScorer

Fitted change score. When change_score is unpenalised this is the fitted unpenalised scorer (same type as the input). When the input is already penalised, it is that fitted penalised scorer.

penalty_float, np.ndarray or None

Effective penalty actually used at detection time: the resolved base penalty (either penalty or, if None, change_score.get_default_penalty()) multiplied by penalty_scale. None when change_score is inherently penalised, in which case the scorer owns its own penalty.

min_subinterval_length_int

Effective minimum split size used.

max_interval_length_int

Effective maximum interval length used.

Methods

fit(X[, y])

Fit the change score to training data.

fit_predict(X[, y])

Fit to data, then predict per-sample segment labels.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

predict(X)

Detect changepoints, returning per-sample segment labels.

predict_all(X)

Run seeded binary segmentation and return all outputs in a single pass.

predict_changepoints(X)

Detect changepoints in a time series.

predict_scores(X[, return_index])

Return the per-interval scoring objective evaluated on X.

set_params(**params)

Set the parameters of this estimator.

Notes

Typical usage recipes:

  • Default:

    SeededBinarySegmentation()
    
  • Tune sensitivity on a log grid:

    SeededBinarySegmentation(penalty_scale=5.0)
    
  • Detect changes in a single, unknown feature (sparse case):

    SeededBinarySegmentation(agg="max")
    
  • Adaptive sparsity via an array penalty (top-k aggregation):

    SeededBinarySegmentation(penalty=linear_chi2_penalty(n, n_features))
    
  • Power-user path with a joint scorer that owns aggregation and penalty:

    SeededBinarySegmentation(change_score=ESACScore(...))
    

References

[1]

Kovács, S., Bühlmann, P., Li, H., & Munk, A. (2023). Seeded binary segmentation: a general methodology for fast and optimal changepoint detection. Biometrika, 110(1), 249-256.

[2]

Baranowski, R., Chen, Y., & Fryzlewicz, P. (2019). Narrowest-over-threshold detection of multiple change points and change-point-like features. Journal of the Royal Statistical Society Series B: Statistical Methodology, 81(3), 649-672.

Examples

>>> import numpy as np
>>> from skchange.new_api.detectors import SeededBinarySegmentation
>>> rng = np.random.default_rng(2)
>>> X = np.concatenate([rng.normal(0, 1, (100, 1)),
...                     rng.normal(10, 1, (100, 1))])
>>> detector = SeededBinarySegmentation()
>>> detector.fit(X).predict_changepoints(X)
array([100])
fit(X: ArrayLike, y: ArrayLike | None = None) Self[source][source]#

Fit the change score to training data.

Parameters:
XArrayLike of shape (n_samples, n_features)

Training time series data.

yArrayLike | None, default=None

Ignored.

Returns:
selfSeededBinarySegmentation

Fitted detector.

predict_all(X: ArrayLike) dict[source][source]#

Run seeded binary segmentation and return all outputs in a single pass.

Parameters:
XArrayLike of shape (n_samples, n_features)

Time series to analyse for changepoints.

Returns:
resultdict with keys:
"changepoints"np.ndarray of shape (n_changepoints,)

Sorted integer indices of detected changepoints.

"interval_starts"np.ndarray

Start indices of the seeded intervals evaluated.

"interval_ends"np.ndarray

End indices of the seeded intervals evaluated.

"interval_max_scores"np.ndarray

Maximum penalised score within each seeded interval.

"interval_argmax_splits"np.ndarray

Index of the best split (changepoint candidate) per interval.

predict_changepoints(X: ArrayLike) ndarray[source][source]#

Detect changepoints in a time series.

Parameters:
XArrayLike of shape (n_samples, n_features)

Time series to analyse for changepoints.

Returns:
changepointsnp.ndarray of shape (n_changepoints,)

Sorted integer indices of detected changepoints. A changepoint at index t means sample t is the first sample of a new segment, i.e. a structural break occurs between samples t-1 and t. Empty array if no changepoints are detected.

predict_scores(X: ArrayLike, return_index: bool = False) ndarray | tuple[ndarray, dict[str, ndarray]][source][source]#

Return the per-interval scoring objective evaluated on X.

Returns the per-interval maximum of the aggregated, penalised change score across candidate splits, using the exact same seeded intervals and candidate splits as predict. The output is what predict reduces over via the selection step, without the selection itself.

For penalty calibration, use the free function skchange.new_api.tuning.unpenalised_scores, which fits a clone of this detector with the penalty parameter set to zero and returns the resulting unpenalised scores.

Parameters:
XArrayLike of shape (n_samples, n_features)

Time series to evaluate.

return_indexbool, default=False

If True, also return a dict locating each score on the time axis. See the Returns section for the keys.

Returns:
scoresnp.ndarray of shape (n_intervals,)

Aggregated, penalised change-score values, one per seeded interval, at the best split within that interval. Returned alone when return_index=False.

indexdict, optional

Only returned when return_index=True. Contains:

  • "starts" : np.ndarray of shape (n_intervals,) Start indices of the seeded intervals.

  • "ends" : np.ndarray of shape (n_intervals,) End indices of the seeded intervals.

  • "argmax_splits" : np.ndarray of shape (n_intervals,) Index of the maximising split within each interval.

fit_predict(X, y: ArrayLike | None = None, **fit_params) ndarray[source]#

Fit to data, then predict per-sample segment labels.

Equivalent to calling fit(X, y).predict(X).

Parameters:
XArrayLike of shape (n_samples, n_features)

Time series data.

yNone

Ignored. Exists for sklearn API compatibility.

**fit_paramsdict

Additional parameters passed to fit().

Returns:
labelsnp.ndarray of shape (n_samples,)

Dense integer segment labels, one per sample.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routingMetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
paramsdict

Parameter names mapped to their values.

predict(X: ArrayLike) ndarray[source]#

Detect changepoints, returning per-sample segment labels.

Default implementation calls predict_changepoints() and converts the changepoint indices to an array with segment labels per sample. Subclasses may override this directly when the algorithm natively produces labels.

Parameters:
XArrayLike of shape (n_samples, n_features)

Time series data.

Returns:
labelsnp.ndarray of shape (n_samples,)

Dense integer segment labels, one per sample. Segment 0 is the first segment, segment 1 the next, and so on.

Examples

>>> labels = detector.fit(X_train).predict(X_test)
>>> labels.shape
(n_samples,)
set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.