CROPS#
- class CROPS(cost: BaseCost | None = None, min_penalty: float | None = None, max_penalty: float | None = None, selection_method: str = 'bic', min_segment_length: int | None = None, step_size: int = 1, split_cost: float = 0.0, prune: bool = True, pruning_margin: float = 0.0, middle_penalty_nudge: float = 1e-05)[source][source]#
CROPS algorithm for path solutions to the PELT algorithm.
This change detector solves for all penalised optimal partitionings within the penalty range
[min_penalty, max_penalty]using the CROPS algorithm [1], which in turn employs the PELT algorithm to repeatedly solve penalised optimal partitioning problems for different penalties.When predicting changepoints through
predict_changepoints(orpredict_all), the detector selects the best segmentation among the optimal partitionings within the penalty range, using theselection_methodcriterion.- Parameters:
- costBaseCost or None, default=None
The cost function to use. Must be a
BaseCostinstance withscore_type='cost'. IfNone, defaults toL2Cost().- min_penaltyfloat or None, default=None
The start of the penalty solution interval. Must be non-negative and strictly less than
max_penalty. IfNone, defaults to0.5 * cost.get_default_penalty()after fitting.- max_penaltyfloat or None, default=None
The end of the penalty solution interval. Must be strictly greater than
min_penalty. IfNone, defaults to5.0 * cost.get_default_penalty()after fitting.- selection_methodstr, default=”bic”
The segmentation selection method to use when selecting the best segmentation among the optimal segmentations found within the penalty range
[min_penalty, max_penalty]. Options:"bic": Select the segmentation with the lowest Bayesian Information Criterion, computed assegmentation_cost + n_segments * cost.get_default_penalty()."elbow": Select the segmentation with the highest elbow score, defined as the improvement in squared residuals when allowing a change in slope of segmentation cost regressed on the number of changepoints, evaluated at each intermediate number of changepoints.
- min_segment_lengthint or None, default=None
Minimum number of samples in a segment. Must be at least
cost.min_size. IfNone, defaults to2 * cost.min_sizeafter fitting. The 2x factor provides a finite-sample safety floor that prevents spurious short segments from scale-estimating costs (e.g. Gaussian, Laplace).- step_sizeint, default=1
Only indices that are multiples of
step_sizefrom the start are considered as potential changepoints. Implicitly ensures thatmin_segment_length >= step_size, but it is an error to specifymin_segment_lengthgreater thanstep_sizewhenstep_size > 1.- split_costfloat, default=0.0
The cost of splitting a segment, to ensure that
cost(X[t:p]) + cost(X[p:(s+1)]) + split_cost <= cost(X[t:(s+1)]), for all valid splits0 <= t < p < s <= len(X) - 1. The default of0.0is sufficient for log-likelihood cost functions.- prunebool, default=True
If
False, drop the pruning step, reverting to optimal partitioning. Useful for debugging and testing.- pruning_marginfloat, default=0.0
The pruning margin to use. Used to reduce pruning of the admissible starts set, which can be useful when the cost function is imprecise (e.g. based on solving an optimisation problem with large tolerance).
- middle_penalty_nudgefloat, default=1e-5
When computing the threshold penalty value separating PELT solutions with differing numbers of changepoints, the penalty is nudged upwards in order to solve for the segmentation with fewer changepoints. The default of
1e-5is sufficient for most cases.
- Attributes:
- cost_BaseCost
Fitted cost scorer.
- min_penalty_float
Effective
min_penaltyused after resolving the default.- max_penalty_float
Effective
max_penaltyused after resolving the default.- min_segment_length_int
Effective minimum segment length used.
Methods
fit(X[, y])Fit the cost to training data.
fit_predict(X[, y])Fit to data, then predict per-sample segment labels.
Get metadata routing of this object.
get_params([deep])Get parameters for this estimator.
predict(X)Detect changepoints, returning per-sample segment labels.
predict_all(X)Run CROPS and return all outputs in a single pass.
Detect changepoints in a time series.
set_params(**params)Set the parameters of this estimator.
References
[1]Haynes, K., Eckley, I. A., & Fearnhead, P. (2017). Computationally efficient changepoint detection for a range of penalties. Journal of Computational and Graphical Statistics, 26(1), 134-143.
Examples
>>> import numpy as np >>> from skchange.new_api.detectors import CROPS >>> rng = np.random.default_rng(2) >>> X = np.concatenate([rng.normal(0, 1, (100, 1)), ... rng.normal(10, 1, (100, 1))]) >>> detector = CROPS(min_penalty=1.0, max_penalty=50.0) >>> detector.fit(X).predict_changepoints(X) array([100])
- fit(X: ArrayLike, y: ArrayLike | None = None) Self[source][source]#
Fit the cost to training data.
- Parameters:
- XArrayLike of shape (n_samples, n_features)
Training time series data.
- yArrayLike | None, default=None
Ignored.
- Returns:
- selfCROPS
Fitted detector.
- predict_all(X: ArrayLike) dict[source][source]#
Run CROPS and return all outputs in a single pass.
- Parameters:
- XArrayLike of shape (n_samples, n_features)
Time series to analyse for changepoints.
- Returns:
- resultdict with keys:
"changepoints"np.ndarray of shape (n_changepoints,)Sorted integer indices of the selected changepoints.
"changepoints_metadata"dict of np.ndarrayDictionary with keys
num_changepoints,penalty,segmentation_cost,optimum_value, and eitherbic_valueorelbow_score. All arrays have the same length and rows are aligned by index, sorted bynum_changepointsascending."changepoints_lookup"dict[int, np.ndarray]Mapping from number of changepoints to the corresponding changepoint indices for each segmentation in the path.
"optimal_penalty"floatPenalty corresponding to the selected segmentation.
- predict_changepoints(X: ArrayLike) ndarray[source][source]#
Detect changepoints in a time series.
- Parameters:
- XArrayLike of shape (n_samples, n_features)
Time series to analyse for changepoints.
- Returns:
- changepointsnp.ndarray of shape (n_changepoints,)
Sorted integer indices of detected changepoints.
- fit_predict(X, y: ArrayLike | None = None, **fit_params) ndarray[source]#
Fit to data, then predict per-sample segment labels.
Equivalent to calling fit(X, y).predict(X).
- Parameters:
- XArrayLike of shape (n_samples, n_features)
Time series data.
- yNone
Ignored. Exists for sklearn API compatibility.
- **fit_paramsdict
Additional parameters passed to fit().
- Returns:
- labelsnp.ndarray of shape (n_samples,)
Dense integer segment labels, one per sample.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A
MetadataRequestencapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(X: ArrayLike) ndarray[source]#
Detect changepoints, returning per-sample segment labels.
Default implementation calls predict_changepoints() and converts the changepoint indices to an array with segment labels per sample. Subclasses may override this directly when the algorithm natively produces labels.
- Parameters:
- XArrayLike of shape (n_samples, n_features)
Time series data.
- Returns:
- labelsnp.ndarray of shape (n_samples,)
Dense integer segment labels, one per sample. Segment 0 is the first segment, segment 1 the next, and so on.
Examples
>>> labels = detector.fit(X_train).predict(X_test) >>> labels.shape (n_samples,)
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as
Pipeline). The latter have parameters of the form<component>__<parameter>so that it’s possible to update each component of a nested object.- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.