SeededBinarySegmentation#
- class SeededBinarySegmentation(change_score: BaseIntervalScorer | None = None, penalty: ArrayLike | float | None = None, penalty_scale: float = 1.0, agg: str = 'sum', min_subinterval_length: int = 5, max_interval_length: int | None = None, growth_factor: float = 1.5, selection_method: str = 'greedy')[source][source]#
Seeded binary segmentation algorithm for multiple changepoint detection.
Binary segmentation type changepoint detection algorithms recursively split the data into two segments and test whether the two segments are different. The seeded binary segmentation algorithm is an efficient version of such algorithms that tests for changepoints in intervals of exponentially growing length. It has the same theoretical guarantees as the original binary segmentation algorithm but runs in log-linear time regardless of the changepoint configuration.
The
penalty,penalty_scale, andaggparameters are ignored whenchange_scoreis an inherently penalised scorer (tagpenalised=True); in that case the scorer owns aggregation and penalisation.- Parameters:
- change_scoreBaseIntervalScorer or None, default=None
Change score to use in the algorithm. Must be an instance of
BaseIntervalScorerwithscore_type="change_score". IfNone, defaults toCUSUM().Standard usage is to pass an unpenalised score and set
penaltyseparately. If the score already has tagpenalised=True, it owns its own aggregation and penalty, andpenalty/penalty_scaleare ignored.- penaltyfloat, array-like of shape (n_features,) or None, default=None
Penalty subtracted from the aggregated feature-wise score. A candidate changepoint is accepted only when the penalised score is positive.
float: scalar penalty subtracted from the aggregated score (seeagg).array-likeof lengthn_features, non-decreasing: elementiis the penalty fori+1features jointly affected; the detector picks theklargest feature scores maximisingsum(top_k) - penalty[k-1](handles sparse changes). A strictly linear array uses a faster code path. Only consistent with the defaultagg="sum".None: defaults tochange_score.get_default_penalty()(BIC-based) at fit time.
Ignored when
change_scoreis already penalised.- penalty_scalefloat, default=1.0
Multiplicative factor applied to the effective penalty (whether
penaltyis user-provided or the default). Use as a single scalar tuning knob that preserves the shape of array penalties. Must be positive. Ignored whenchange_scoreis already penalised.- agg{“sum”, “max”}, default=”sum”
How feature-wise raw scores are aggregated into a single score per interval before subtracting a scalar penalty:
"sum":sum(scores, axis=1) - penalty(dense change assumption)."max":max(scores, axis=1) - penalty(a single, unknown feature changes).
Ignored when
change_scoreis already aggregated. Must be"sum"whenpenaltyis array-valued (array penalties imply top-k aggregation).- min_subinterval_lengthint, default=5
Minimum length of a subinterval on each side of a candidate split point within each evaluated interval. The effective minimum used is
max(min_subinterval_length, change_score.min_size), so the actual minimum may be larger than the value provided here when the change score requires more samples per segment. Note that this does not impose a lower bound on the spacing between detected changepoints.- max_interval_lengthint or None, default=None
The maximum length of an interval to evaluate a changepoint in. Must be at least
2 * min_subinterval_length. IfNone, defaults tomin(200, n_samples)after fitting.- growth_factorfloat, default=1.5
Growth factor for the seeded intervals. Intervals grow in size according to
interval_len = max(interval_len + 1, floor(growth_factor * interval_len)), starting atinterval_len = 2 * min_subinterval_length. It also governs the amount of overlap between intervals of the same length, as the start of each interval is shifted by a factor of1 - 1 / growth_factor. Larger values produce fewer, less-overlapping intervals (faster but coarser); smaller values produce more, more-overlapping intervals (slower but finer). Must be in(1, 2].- selection_methodstr, default=”greedy”
Method for selecting the final set of changepoints from candidate intervals with positive penalised score. Options:
"greedy": Select the interval with the highest score, remove all overlapping intervals containing the detected changepoint, and repeat until no intervals remain with a positive score."narrowest": Among intervals with positive scores, select the narrowest one, remove all overlapping intervals containing the detected changepoint, and repeat. Corresponds to the narrowest-over-threshold approach of [2].
- Attributes:
- change_score_BaseIntervalScorer
Fitted change score. When
change_scoreis unpenalised this is the fitted unpenalised scorer (same type as the input). When the input is already penalised, it is that fitted penalised scorer.- penalty_float, np.ndarray or None
Effective penalty actually used at detection time: the resolved base penalty (either
penaltyor, ifNone,change_score.get_default_penalty()) multiplied bypenalty_scale.Nonewhenchange_scoreis inherently penalised, in which case the scorer owns its own penalty.- min_subinterval_length_int
Effective minimum split size used.
- max_interval_length_int
Effective maximum interval length used.
Methods
fit(X[, y])Fit the change score to training data.
fit_predict(X[, y])Fit to data, then predict per-sample segment labels.
Get metadata routing of this object.
get_params([deep])Get parameters for this estimator.
predict(X)Detect changepoints, returning per-sample segment labels.
predict_all(X)Run seeded binary segmentation and return all outputs in a single pass.
Detect changepoints in a time series.
predict_scores(X[, return_index])Return the per-interval scoring objective evaluated on
X.set_params(**params)Set the parameters of this estimator.
Notes
Typical usage recipes:
Default:
SeededBinarySegmentation()
Tune sensitivity on a log grid:
SeededBinarySegmentation(penalty_scale=5.0)
Detect changes in a single, unknown feature (sparse case):
SeededBinarySegmentation(agg="max")
Adaptive sparsity via an array penalty (top-k aggregation):
SeededBinarySegmentation(penalty=linear_chi2_penalty(n, n_features))
Power-user path with a joint scorer that owns aggregation and penalty:
SeededBinarySegmentation(change_score=ESACScore(...))
References
[1]Kovács, S., Bühlmann, P., Li, H., & Munk, A. (2023). Seeded binary segmentation: a general methodology for fast and optimal changepoint detection. Biometrika, 110(1), 249-256.
[2]Baranowski, R., Chen, Y., & Fryzlewicz, P. (2019). Narrowest-over-threshold detection of multiple change points and change-point-like features. Journal of the Royal Statistical Society Series B: Statistical Methodology, 81(3), 649-672.
Examples
>>> import numpy as np >>> from skchange.new_api.detectors import SeededBinarySegmentation >>> rng = np.random.default_rng(2) >>> X = np.concatenate([rng.normal(0, 1, (100, 1)), ... rng.normal(10, 1, (100, 1))]) >>> detector = SeededBinarySegmentation() >>> detector.fit(X).predict_changepoints(X) array([100])
- fit(X: ArrayLike, y: ArrayLike | None = None) Self[source][source]#
Fit the change score to training data.
- Parameters:
- XArrayLike of shape (n_samples, n_features)
Training time series data.
- yArrayLike | None, default=None
Ignored.
- Returns:
- selfSeededBinarySegmentation
Fitted detector.
- predict_all(X: ArrayLike) dict[source][source]#
Run seeded binary segmentation and return all outputs in a single pass.
- Parameters:
- XArrayLike of shape (n_samples, n_features)
Time series to analyse for changepoints.
- Returns:
- resultdict with keys:
"changepoints"np.ndarray of shape (n_changepoints,)Sorted integer indices of detected changepoints.
"interval_starts"np.ndarrayStart indices of the seeded intervals evaluated.
"interval_ends"np.ndarrayEnd indices of the seeded intervals evaluated.
"interval_max_scores"np.ndarrayMaximum penalised score within each seeded interval.
"interval_argmax_splits"np.ndarrayIndex of the best split (changepoint candidate) per interval.
- predict_changepoints(X: ArrayLike) ndarray[source][source]#
Detect changepoints in a time series.
- Parameters:
- XArrayLike of shape (n_samples, n_features)
Time series to analyse for changepoints.
- Returns:
- changepointsnp.ndarray of shape (n_changepoints,)
Sorted integer indices of detected changepoints. A changepoint at index
tmeans sampletis the first sample of a new segment, i.e. a structural break occurs between samplest-1andt. Empty array if no changepoints are detected.
- predict_scores(X: ArrayLike, return_index: bool = False) ndarray | tuple[ndarray, dict[str, ndarray]][source][source]#
Return the per-interval scoring objective evaluated on
X.Returns the per-interval maximum of the aggregated, penalised change score across candidate splits, using the exact same seeded intervals and candidate splits as
predict. The output is whatpredictreduces over via the selection step, without the selection itself.For penalty calibration, use the free function
skchange.new_api.tuning.unpenalised_scores, which fits a clone of this detector with the penalty parameter set to zero and returns the resulting unpenalised scores.- Parameters:
- XArrayLike of shape (n_samples, n_features)
Time series to evaluate.
- return_indexbool, default=False
If
True, also return a dict locating each score on the time axis. See the Returns section for the keys.
- Returns:
- scoresnp.ndarray of shape (n_intervals,)
Aggregated, penalised change-score values, one per seeded interval, at the best split within that interval. Returned alone when
return_index=False.- indexdict, optional
Only returned when
return_index=True. Contains:"starts": np.ndarray of shape (n_intervals,) Start indices of the seeded intervals."ends": np.ndarray of shape (n_intervals,) End indices of the seeded intervals."argmax_splits": np.ndarray of shape (n_intervals,) Index of the maximising split within each interval.
- fit_predict(X, y: ArrayLike | None = None, **fit_params) ndarray[source]#
Fit to data, then predict per-sample segment labels.
Equivalent to calling fit(X, y).predict(X).
- Parameters:
- XArrayLike of shape (n_samples, n_features)
Time series data.
- yNone
Ignored. Exists for sklearn API compatibility.
- **fit_paramsdict
Additional parameters passed to fit().
- Returns:
- labelsnp.ndarray of shape (n_samples,)
Dense integer segment labels, one per sample.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A
MetadataRequestencapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(X: ArrayLike) ndarray[source]#
Detect changepoints, returning per-sample segment labels.
Default implementation calls predict_changepoints() and converts the changepoint indices to an array with segment labels per sample. Subclasses may override this directly when the algorithm natively produces labels.
- Parameters:
- XArrayLike of shape (n_samples, n_features)
Time series data.
- Returns:
- labelsnp.ndarray of shape (n_samples,)
Dense integer segment labels, one per sample. Segment 0 is the first segment, segment 1 the next, and so on.
Examples
>>> labels = detector.fit(X_train).predict(X_test) >>> labels.shape (n_samples,)
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as
Pipeline). The latter have parameters of the form<component>__<parameter>so that it’s possible to update each component of a nested object.- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.