Coding guidelines#

Skchange follows the scikit-learn developer guidelines as closely as possible. This page collects the conventions that are most relevant when writing or reviewing code for Skchange, with links to the scikit-learn documentation for the full rationale.

Formatting and linting#

Formatting and linting are handled by Ruff, configured under [tool.ruff] in pyproject.toml. The same checks run via pre-commit locally and in the continuous integration pipeline.

Key settings:

  • Line length is 88 characters, matching the Black default.

  • Imports are sorted with isort rules.

  • Docstrings are checked with pydocstyle using the numpy convention.

Run all formatting and lint checks locally with:

pre-commit run --all-files

Naming conventions#

Skchange follows scikit-learn naming conventions:

  • snake_case for modules, functions, variables, and parameters.

  • CamelCase for class names.

  • UPPER_CASE for module-level constants.

  • Private helpers and modules are prefixed with a single underscore (e.g. _utils.py, _compute_score).

  • Attributes that are set during fit end with a trailing underscore (e.g. self.threshold_, self.changepoints_). This mirrors the scikit-learn estimated attribute convention and is how callers tell fitted state apart from user-supplied parameters.

Common variable names align with scikit-learn:

  • X is the input data of shape (n_samples, n_features).

  • y is the target or label array. It is rarely used in Skchange’s unsupervised setting.

  • n_samples is the number of timepoints.

  • n_features is the number of variables or channels.
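
The naming rules above can be seen together in one small module. This is a minimal sketch: `DEFAULT_MIN_SEGMENT_LENGTH`, `_segment_means`, and `MeanSummary` are hypothetical names invented for illustration and are not part of Skchange.

```python
import numpy as np

# Module-level constant: UPPER_CASE.
DEFAULT_MIN_SEGMENT_LENGTH = 2


def _segment_means(X, start, end):
    """Private helper (hypothetical): single leading underscore."""
    return X[start:end].mean(axis=0)


class MeanSummary:
    """Hypothetical class, used only to illustrate the naming rules."""

    def __init__(self, min_segment_length=DEFAULT_MIN_SEGMENT_LENGTH):
        # User-supplied parameter: snake_case, no trailing underscore.
        self.min_segment_length = min_segment_length

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        n_samples, n_features = X.shape
        # Attributes set during fit: trailing underscore.
        self.n_features_in_ = n_features
        self.mean_ = _segment_means(X, 0, n_samples)
        return self
```

A caller can then tell `est.min_segment_length` (supplied by the user) apart from `est.mean_` (estimated from data) by the trailing underscore alone.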

Estimator API conventions#

Detectors, costs, and other estimators in Skchange follow the scikit-learn estimator API. The most important rules are:

  • No work in __init__. __init__ only stores the constructor arguments verbatim on self, with no validation or transformation. See scikit-learn instantiation rules.

  • Parameter names match attribute names. Every constructor argument foo is stored as self.foo so that get_params / set_params and cloning work correctly.

  • All learning happens in fit. fit validates the input, sets the fitted attributes (ending in _), and returns self. See scikit-learn fitting.

  • predict and friends are stateless: they return values and do not modify self.
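
The four rules above might look as follows in practice. This is a sketch under stated assumptions, not an actual Skchange detector: `MeanShiftDetector`, its `threshold_scale` parameter, and the jump-based scoring are all hypothetical. A real detector would also inherit from the appropriate base class, which is omitted here to keep the example self-contained.

```python
import numpy as np


class MeanShiftDetector:
    """Hypothetical changepoint detector, for illustration only."""

    def __init__(self, threshold_scale=3.0):
        # No validation or transformation here: store the argument verbatim
        # under the same name so get_params / set_params and cloning work.
        self.threshold_scale = threshold_scale

    def fit(self, X, y=None):
        # Validation belongs in fit, not in __init__.
        if self.threshold_scale <= 0:
            raise ValueError("threshold_scale must be positive.")
        X = np.asarray(X, dtype=float)
        if X.ndim != 2:
            raise ValueError("X must have shape (n_samples, n_features).")
        # Fitted attributes end with a trailing underscore.
        self.n_features_in_ = X.shape[1]
        self.threshold_ = self.threshold_scale * X.std(axis=0).mean()
        return self

    def predict_changepoints(self, X):
        # Stateless: reads fitted state but never writes to self.
        X = np.asarray(X, dtype=float)
        # Size of the jump between consecutive samples, summed over features.
        jumps = np.abs(np.diff(X, axis=0)).sum(axis=1)
        # A jump between samples i and i + 1 starts a new segment at i + 1.
        return np.flatnonzero(jumps > self.threshold_) + 1
```

Because `__init__` does nothing but store its argument, an invalid `threshold_scale` is only reported when `fit` is called, which is the behaviour the scikit-learn instantiation rules ask for.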

Detectors expose the following predict_* methods. predict and predict_changepoints are always implemented; the others depend on what the algorithm computes:

  • predict(X) returns an ndarray of shape (n_samples,) containing dense per-sample integer segment labels. It is part of the universal interface and is always implemented.

  • predict_changepoints(X) returns an ndarray of shape (n_changepoints,) containing sorted start indices of segments. It is part of the universal interface and is always implemented.

  • predict_segment_anomalies(X) returns an ndarray of shape (n_anomalies, 2) containing start (inclusive) and end (exclusive) indices of anomalous segments. It is only implemented on detectors that identify anomalous segments.

  • predict_scores(X, return_index=False) returns the detector’s internal scoring objective as a 1D ndarray whose length is detector-specific. When return_index=True, it returns the tuple (scores, index_dict) where the dictionary carries algorithm-specific metadata that locates each score on the timeline.

  • predict_all(X) is a convenience method that returns all outputs computed in a single pass as a dictionary. The keys are detector-specific and are not a stable cross-detector contract.
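
To make the output shapes concrete, the sketch below converts between them with plain NumPy. The helper names are hypothetical, and the example assumes that changepoints exclude index 0 and that segment labels count up from 0; whether Skchange numbers labels exactly this way is an assumption.

```python
import numpy as np


def changepoints_to_labels(changepoints, n_samples):
    """Convert segment start indices to dense per-sample segment labels."""
    changepoints = np.asarray(changepoints, dtype=int)
    # Each sample's label is the number of changepoints at or before it.
    return np.searchsorted(changepoints, np.arange(n_samples), side="right")


def segment_anomalies_to_mask(anomalies, n_samples):
    """Convert (start, end) pairs, end exclusive, to a boolean mask."""
    mask = np.zeros(n_samples, dtype=bool)
    for start, end in np.asarray(anomalies, dtype=int).reshape(-1, 2):
        mask[start:end] = True
    return mask


# Shape (n_changepoints,) in, shape (n_samples,) out.
labels = changepoints_to_labels([3, 7], n_samples=10)
# Shape (n_anomalies, 2) in, shape (n_samples,) out.
mask = segment_anomalies_to_mask([[2, 4], [8, 10]], n_samples=10)
```

With these inputs, samples 0 to 2 fall in the first segment, 3 to 6 in the second, and 7 to 9 in the third, while the mask marks samples 2, 3, 8, and 9 as anomalous.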

Skchange intentionally deviates from scikit-learn in a few respects to fit the time series context. Most notably, Skchange detectors are not expected to be invariant to the order of the samples, so they fail the check_methods_sample_order_invariance check.

Docstrings#

All public modules, classes, functions, and methods must have a docstring written in the numpydoc style. numpydoc validation is enabled in the documentation build, so missing or malformed sections will be flagged.

A minimal example:

def my_function(x, threshold=0.5):
    """Summary line in imperative mood.

    Extended description, optional.

    Parameters
    ----------
    x : array-like of shape (n_samples, n_features)
        The input data.
    threshold : float, default=0.5
        Description of ``threshold``.

    Returns
    -------
    result : ndarray of shape (n_samples,)
        Description of the return value.
    """

Guidelines:

  • Start with a one-line summary in the imperative mood (“Compute …”, not “Computes …”).

  • Document every parameter and return value with type and shape.

  • Use default=<value> rather than optional for parameters that have a default, matching scikit-learn.

  • Cross-reference other Skchange objects with :class:`~skchange.detectors.PELT` or :func:`~skchange.metrics.f1_score`.

  • Include a References section with citations and an Examples section with a runnable doctest where it adds value.
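
As a sketch of the last guideline, the function below adds an Examples section and then runs it as a doctest. `count_exceedances` is a hypothetical function invented for this illustration; the doctest is run through the standard library's `doctest` module so the block stays self-contained.

```python
import doctest

import numpy as np


def count_exceedances(x, threshold=0.5):
    """Count the entries of each sample that exceed a threshold.

    Hypothetical function, used only to illustrate the docstring layout.

    Parameters
    ----------
    x : array-like of shape (n_samples, n_features)
        The input data.
    threshold : float, default=0.5
        Entries strictly greater than ``threshold`` are counted.

    Returns
    -------
    counts : ndarray of shape (n_samples,)
        Number of exceedances in each row of ``x``.

    Examples
    --------
    >>> count_exceedances([[0.1, 0.9], [0.7, 0.8]]).tolist()
    [1, 2]
    """
    return (np.asarray(x, dtype=float) > threshold).sum(axis=1)


# Collect and run the examples found in the docstring.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner()
for test in finder.find(
    count_exceedances,
    name="count_exceedances",
    globs={"count_exceedances": count_exceedances},
):
    runner.run(test)
results = runner.summarize()
```

Calling `.tolist()` in the example keeps the expected output independent of how NumPy prints arrays, which tends to make doctests less brittle across versions.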

Tests#

Tests live next to the code they cover, in a tests/ subpackage (e.g. skchange/detectors/tests/). Naming follows pytest conventions:

  • Test files are named test_*.py.

  • Test functions are named test_*.

  • Use pytest.mark.parametrize for varying inputs rather than loops.
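
A parametrized test following these conventions might look like the sketch below. `first_jump` is a hypothetical helper defined only so the test has something to exercise.

```python
import numpy as np
import pytest


def first_jump(x):
    """Hypothetical helper: index of the first sample that differs from its predecessor."""
    jumps = np.flatnonzero(np.diff(np.asarray(x, dtype=float)) != 0)
    return int(jumps[0]) + 1 if jumps.size else None


@pytest.mark.parametrize(
    ("x", "expected"),
    [
        ([0, 0, 1, 1], 2),
        ([5, 5, 5, 2], 3),
        ([1, 1, 1, 1], None),  # constant input: no jump
    ],
)
def test_first_jump(x, expected):
    assert first_jump(x) == expected
```

Each tuple becomes its own test case in the pytest report, so a failure points directly at the offending input rather than at a loop iteration.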

Each subpackage also contains a shared contract test module that exercises every implementation in that subpackage against a common set of API checks. These modules are named test_all.py in the new API (e.g. skchange/new_api/detectors/tests/test_all.py, skchange/new_api/interval_scorers/tests/test_all.py, skchange/new_api/metrics/tests/test_all.py). The set of instances they run against is declared in a sibling _registry.py file. When you add a new detector, scorer, or metric, register a representative set of instances in the corresponding _registry.py so the shared checks are exercised against your implementation automatically.

These contract tests cover the constructor contract, fit and predict round trips, the fitted-attribute naming convention, cloning, and other estimator-level invariants. They are inspired by scikit-learn’s check_estimator.
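
One way such a contract check could be written is sketched below for the constructor contract. Everything here is hypothetical: `_ToyDetector` stands in for a registered estimator, `INSTANCES` stands in for whatever the real _registry.py exposes, and the actual test_all.py modules may be organised quite differently.

```python
import inspect

import pytest


class _ToyDetector:
    """Hypothetical estimator standing in for a registered instance."""

    def __init__(self, penalty=1.0, min_segment_length=2):
        self.penalty = penalty
        self.min_segment_length = min_segment_length


# Hypothetical registry contents; the real _registry.py may look different.
INSTANCES = [_ToyDetector(), _ToyDetector(penalty=5.0, min_segment_length=10)]


@pytest.mark.parametrize("estimator", INSTANCES)
def test_init_stores_params_verbatim(estimator):
    """Every constructor argument ``foo`` must be stored as ``self.foo``."""
    signature = inspect.signature(type(estimator).__init__)
    param_names = [name for name in signature.parameters if name != "self"]
    for name in param_names:
        assert hasattr(estimator, name), f"missing attribute {name!r}"

    # Rebuilding from the stored values must give back the same objects,
    # which fails if __init__ copies or transforms its arguments.
    params = {name: getattr(estimator, name) for name in param_names}
    rebuilt = type(estimator)(**params)
    for name, value in params.items():
        assert getattr(rebuilt, name) is value
```

Parametrizing over the registry means a new implementation is covered as soon as its instances are added to the list, without touching the test itself.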

Run the full test suite locally with:

pytest

Type hints#

Type hints are encouraged for new public functions but are not required everywhere. When adding hints, prefer numpy.typing aliases (e.g. npt.NDArray[np.floating]) and standard typing constructs over custom protocols.
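
A hinted public function in this style could look like the following sketch. `cumulative_mean` is a hypothetical function; `npt.ArrayLike` and `npt.NDArray` are the aliases provided by numpy.typing.

```python
import numpy as np
import numpy.typing as npt


def cumulative_mean(x: npt.ArrayLike) -> npt.NDArray[np.floating]:
    """Return the running mean of a 1D input."""
    values = np.asarray(x, dtype=np.float64)
    # Divide each cumulative sum by the number of samples seen so far.
    return np.cumsum(values) / np.arange(1, values.size + 1)
```

Accepting `npt.ArrayLike` and returning a concrete `NDArray` mirrors the usual pattern of being permissive about inputs and precise about outputs.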

Further reading#