CROPS#

class CROPS(cost: BaseCost | None = None, min_penalty: float | None = None, max_penalty: float | None = None, selection_method: str = 'bic', min_segment_length: int | None = None, step_size: int = 1, split_cost: float = 0.0, prune: bool = True, pruning_margin: float = 0.0, middle_penalty_nudge: float = 1e-05)[source][source]#

CROPS algorithm for path solutions to the PELT algorithm.

This change detector solves for all penalised optimal partitionings within the penalty range [min_penalty, max_penalty] using the CROPS algorithm [1], which in turn employs the PELT algorithm to repeatedly solve penalised optimal partitioning problems for different penalties.

When predicting changepoints through predict_changepoints (or predict_all), the detector selects the best segmentation among the optimal partitionings within the penalty range, using the selection_method criterion.

Parameters:
costBaseCost or None, default=None

The cost function to use. Must be a BaseCost instance with score_type='cost'. If None, defaults to L2Cost().

min_penaltyfloat or None, default=None

The start of the penalty solution interval. Must be non-negative and strictly less than max_penalty. If None, defaults to 0.5 * cost.get_default_penalty() after fitting.

max_penaltyfloat or None, default=None

The end of the penalty solution interval. Must be strictly greater than min_penalty. If None, defaults to 5.0 * cost.get_default_penalty() after fitting.

selection_methodstr, default=”bic”

The segmentation selection method to use when selecting the best segmentation among the optimal segmentations found within the penalty range [min_penalty, max_penalty]. Options:

  • "bic": Select the segmentation with the lowest Bayesian Information Criterion, computed as segmentation_cost + n_segments * cost.get_default_penalty().

  • "elbow": Select the segmentation with the highest elbow score, defined as the improvement in squared residuals when allowing a change in slope of segmentation cost regressed on the number of changepoints, evaluated at each intermediate number of changepoints.

min_segment_lengthint or None, default=None

Minimum number of samples in a segment. Must be at least cost.min_size. If None, defaults to 2 * cost.min_size after fitting. The 2x factor provides a finite-sample safety floor that prevents spurious short segments from scale-estimating costs (e.g. Gaussian, Laplace).

step_sizeint, default=1

Only indices that are multiples of step_size from the start are considered as potential changepoints. Implicitly ensures that min_segment_length >= step_size, but it is an error to specify min_segment_length greater than step_size when step_size > 1.

split_costfloat, default=0.0

The cost of splitting a segment, to ensure that cost(X[t:p]) + cost(X[p:(s+1)]) + split_cost <= cost(X[t:(s+1)]), for all valid splits 0 <= t < p < s <= len(X) - 1. The default of 0.0 is sufficient for log-likelihood cost functions.

prunebool, default=True

If False, drop the pruning step, reverting to optimal partitioning. Useful for debugging and testing.

pruning_marginfloat, default=0.0

The pruning margin to use. Used to reduce pruning of the admissible starts set, which can be useful when the cost function is imprecise (e.g. based on solving an optimisation problem with large tolerance).

middle_penalty_nudgefloat, default=1e-5

When computing the threshold penalty value separating PELT solutions with differing numbers of changepoints, the penalty is nudged upwards in order to solve for the segmentation with fewer changepoints. The default of 1e-5 is sufficient for most cases.

Attributes:
cost_BaseCost

Fitted cost scorer.

min_penalty_float

Effective min_penalty used after resolving the default.

max_penalty_float

Effective max_penalty used after resolving the default.

min_segment_length_int

Effective minimum segment length used.

Methods

fit(X[, y])

Fit the cost to training data.

fit_predict(X[, y])

Fit to data, then predict per-sample segment labels.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

predict(X)

Detect changepoints, returning per-sample segment labels.

predict_all(X)

Run CROPS and return all outputs in a single pass.

predict_changepoints(X)

Detect changepoints in a time series.

set_params(**params)

Set the parameters of this estimator.

References

[1]

Haynes, K., Eckley, I. A., & Fearnhead, P. (2017). Computationally efficient changepoint detection for a range of penalties. Journal of Computational and Graphical Statistics, 26(1), 134-143.

Examples

>>> import numpy as np
>>> from skchange.new_api.detectors import CROPS
>>> rng = np.random.default_rng(2)
>>> X = np.concatenate([rng.normal(0, 1, (100, 1)),
...                     rng.normal(10, 1, (100, 1))])
>>> detector = CROPS(min_penalty=1.0, max_penalty=50.0)
>>> detector.fit(X).predict_changepoints(X)
array([100])
fit(X: ArrayLike, y: ArrayLike | None = None) Self[source][source]#

Fit the cost to training data.

Parameters:
XArrayLike of shape (n_samples, n_features)

Training time series data.

yArrayLike | None, default=None

Ignored.

Returns:
selfCROPS

Fitted detector.

predict_all(X: ArrayLike) dict[source][source]#

Run CROPS and return all outputs in a single pass.

Parameters:
XArrayLike of shape (n_samples, n_features)

Time series to analyse for changepoints.

Returns:
resultdict with keys:
"changepoints"np.ndarray of shape (n_changepoints,)

Sorted integer indices of the selected changepoints.

"changepoints_metadata"dict of np.ndarray

Dictionary with keys num_changepoints, penalty, segmentation_cost, optimum_value, and either bic_value or elbow_score. All arrays have the same length and rows are aligned by index, sorted by num_changepoints ascending.

"changepoints_lookup"dict[int, np.ndarray]

Mapping from number of changepoints to the corresponding changepoint indices for each segmentation in the path.

"optimal_penalty"float

Penalty corresponding to the selected segmentation.

predict_changepoints(X: ArrayLike) ndarray[source][source]#

Detect changepoints in a time series.

Parameters:
XArrayLike of shape (n_samples, n_features)

Time series to analyse for changepoints.

Returns:
changepointsnp.ndarray of shape (n_changepoints,)

Sorted integer indices of detected changepoints.

fit_predict(X, y: ArrayLike | None = None, **fit_params) ndarray[source]#

Fit to data, then predict per-sample segment labels.

Equivalent to calling fit(X, y).predict(X).

Parameters:
XArrayLike of shape (n_samples, n_features)

Time series data.

yNone

Ignored. Exists for sklearn API compatibility.

**fit_paramsdict

Additional parameters passed to fit().

Returns:
labelsnp.ndarray of shape (n_samples,)

Dense integer segment labels, one per sample.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routingMetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
paramsdict

Parameter names mapped to their values.

predict(X: ArrayLike) ndarray[source]#

Detect changepoints, returning per-sample segment labels.

Default implementation calls predict_changepoints() and converts the changepoint indices to an array with segment labels per sample. Subclasses may override this directly when the algorithm natively produces labels.

Parameters:
XArrayLike of shape (n_samples, n_features)

Time series data.

Returns:
labelsnp.ndarray of shape (n_samples,)

Dense integer segment labels, one per sample. Segment 0 is the first segment, segment 1 the next, and so on.

Examples

>>> labels = detector.fit(X_train).predict(X_test)
>>> labels.shape
(n_samples,)
set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.