Coding guidelines#
Skchange follows the scikit-learn developer guidelines as closely as possible. This page collects the conventions that are most relevant when writing or reviewing code for Skchange, with links to the scikit-learn documentation for the full rationale.
Formatting and linting#
Formatting and linting are handled by Ruff,
configured under [tool.ruff] in pyproject.toml. The same checks run via
pre-commit locally and in the continuous integration pipeline.
Key settings:
Line length is 88 characters, matching the Black default.
Imports are sorted with isort rules.
Docstrings are checked with pydocstyle using the numpy convention.
Run all formatting and lint checks locally with:
pre-commit run --all-files
Naming conventions#
Skchange follows scikit-learn naming conventions:
snake_case for modules, functions, variables, and parameters.
CamelCase for class names.
UPPER_CASE for module-level constants.
Private helpers and modules are prefixed with a single underscore (e.g. _utils.py, _compute_score).
Attributes that are set during fit end with a trailing underscore (e.g. self.threshold_, self.changepoints_). This mirrors the scikit-learn estimated attribute convention and is how callers tell fitted state apart from user-supplied parameters.
Common variable names align with scikit-learn:
X is the input data of shape (n_samples, n_features).
y is the target or label array. It is rarely used in Skchange’s unsupervised setting.
n_samples is the number of timepoints.
n_features is the number of variables or channels.
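The naming rules above can be seen together in one small sketch. Everything in it (DEFAULT_THRESHOLD, _compute_score, MeanShiftFlagger) is a hypothetical name chosen for illustration and is not part of Skchange.

```python
"""Hypothetical module illustrating the naming conventions (not Skchange code)."""

# Module-level constant: UPPER_CASE.
DEFAULT_THRESHOLD = 1.0


def _compute_score(value, mean):
    """Private helper: snake_case with a leading underscore."""
    return abs(value - mean)


class MeanShiftFlagger:
    """Toy class (CamelCase) that flags samples far from the fitted mean."""

    def __init__(self, threshold=DEFAULT_THRESHOLD):
        # User-supplied parameter: no trailing underscore.
        self.threshold = threshold

    def fit(self, X, y=None):
        # X has shape (n_samples, n_features); only the first feature is used.
        n_samples = len(X)
        # Fitted state: trailing underscore.
        self.mean_ = sum(row[0] for row in X) / n_samples
        return self

    def predict(self, X):
        scores = [_compute_score(row[0], self.mean_) for row in X]
        return [int(score > self.threshold) for score in scores]


X = [[0.0], [0.1], [-0.1], [4.0]]
flagger = MeanShiftFlagger(threshold=2.0).fit(X)
print(flagger.predict(X))
```

After fit, flagger.threshold is still the value the caller passed in, while flagger.mean_ exists only because fit was called.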
Estimator API conventions#
Detectors, costs, and other estimators in Skchange follow the scikit-learn estimator API. The most important rules are:
No work in __init__. __init__ only stores the constructor arguments verbatim on self, with no validation or transformation. See scikit-learn instantiation rules.
Parameter names match attribute names. Every constructor argument foo is stored as self.foo so that get_params / set_params and cloning work correctly.
All learning happens in fit. fit validates the input, sets the fitted attributes (ending in _), and returns self. See scikit-learn fitting.
predict and friends are stateless: they return values and do not modify self.
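A minimal sketch of why these rules matter, using only the standard library. ToyBase is a hypothetical stand-in for a scikit-learn style base class and ThresholdDetector is a made-up estimator. Neither is Skchange or scikit-learn code, and the input is simplified to a flat list of floats.

```python
import inspect


class ToyBase:
    """Hypothetical stand-in for a scikit-learn style base class."""

    def get_params(self):
        # Read each constructor argument back from the attribute of the same name.
        names = inspect.signature(type(self).__init__).parameters
        return {name: getattr(self, name) for name in names if name != "self"}

    def clone(self):
        # Rebuild an unfitted copy from the constructor parameters alone.
        return type(self)(**self.get_params())


class ThresholdDetector(ToyBase):
    def __init__(self, threshold=1.0, min_segment_length=2):
        # Store arguments verbatim: no validation, no transformation.
        self.threshold = threshold
        self.min_segment_length = min_segment_length

    def fit(self, X, y=None):
        # Validation belongs in fit, not in __init__.
        if self.threshold <= 0:
            raise ValueError("threshold must be positive.")
        self.baseline_ = sum(X) / len(X)  # fitted attribute
        return self

    def predict(self, X):
        # Stateless: reads fitted state, never writes to self.
        return [int(abs(x - self.baseline_) > self.threshold) for x in X]


detector = ThresholdDetector(threshold=2.0).fit([1.0, 1.0, 1.0, 1.0])
fresh = detector.clone()
print(detector.predict([1.0, 5.0]))
print(fresh.get_params())
```

Because every argument is stored under its own name and untouched, get_params can recover the parameters by introspection, and clone yields an unfitted copy with identical settings. If __init__ renamed or transformed an argument, both would break.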
Detectors may expose any of the following predict_* methods, depending on
what the algorithm computes.
predict(X) returns an ndarray of shape (n_samples,) containing dense per-sample integer segment labels. It is part of the universal interface and is always implemented.
predict_changepoints(X) returns an ndarray of shape (n_changepoints,) containing sorted start indices of segments. It is part of the universal interface and is always implemented.
predict_segment_anomalies(X) returns an ndarray of shape (n_anomalies, 2) containing start (inclusive) and end (exclusive) indices of anomalous segments. It is only implemented on detectors that identify anomalous segments.
predict_scores(X, return_index=False) returns the detector’s internal scoring objective as a 1D ndarray whose length is detector-specific. When return_index=True, it returns the tuple (scores, index_dict) where the dictionary carries algorithm-specific metadata that locates each score on the timeline.
predict_all(X) is a convenience method that returns all outputs computed in a single pass as a dictionary. The keys are detector-specific and are not a stable cross-detector contract.
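To make the output formats concrete, the sketch below relates them to each other using plain Python lists in place of ndarrays. The helper names are hypothetical. It also assumes that the changepoint array does not list index 0 as the start of the first segment, which the description above does not settle.

```python
from bisect import bisect_right


def changepoints_to_labels(changepoints, n_samples):
    """Map sorted segment start indices to dense per-sample segment labels.

    Assumption: index 0 (start of the first segment) is not listed.
    """
    return [bisect_right(changepoints, i) for i in range(n_samples)]


def labels_to_changepoints(labels):
    """Recover segment start indices from dense segment labels."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


def segment_anomalies_to_mask(anomalies, n_samples):
    """Expand (start, end) pairs, end exclusive, into a per-sample 0/1 mask."""
    mask = [0] * n_samples
    for start, end in anomalies:
        for i in range(start, end):
            mask[i] = 1
    return mask


labels = changepoints_to_labels([3, 5], n_samples=7)
print(labels)  # [0, 0, 0, 1, 1, 2, 2]
print(labels_to_changepoints(labels))  # [3, 5]
print(segment_anomalies_to_mask([(2, 4)], n_samples=7))  # [0, 0, 1, 1, 0, 0, 0]
```

The exclusive end index means that an anomaly (2, 4) covers samples 2 and 3 only, matching Python slicing.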
Skchange intentionally deviates from scikit-learn in a few respects to fit the
time series context. Most notably, Skchange detectors are not expected to be invariant
to the order of the samples, so they fail the check_methods_sample_order_invariance
check.
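The following toy example illustrates why order invariance cannot hold for changepoint detection. The check compares the output on permuted samples with the permuted output on the original samples. best_split is a hypothetical single-changepoint detector written for this illustration, not a Skchange algorithm.

```python
def best_split(x):
    """Return the split index that minimises the within-segment squared error."""

    def sse(segment):
        mean = sum(segment) / len(segment)
        return sum((value - mean) ** 2 for value in segment)

    return min(range(1, len(x)), key=lambda k: sse(x[:k]) + sse(x[k:]))


def predict_labels(x):
    """Label samples before the split with 0 and samples after it with 1."""
    k = best_split(x)
    return [0] * k + [1] * (len(x) - k)


x = [0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 5.0]
order = list(reversed(range(len(x))))  # a permutation of the sample indices

labels_then_permute = [predict_labels(x)[i] for i in order]
permute_then_labels = predict_labels([x[i] for i in order])

print(labels_then_permute)  # [1, 1, 1, 1, 1, 0, 0, 0]
print(permute_then_labels)  # [0, 0, 0, 0, 0, 1, 1, 1]
```

The two results differ because segment labels encode position in time, so reordering the samples changes what the labels mean.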
Docstrings#
All public modules, classes, functions, and methods must have a docstring
written in the
numpydoc style.
numpydoc validation is enabled in the documentation build, so missing
or malformed sections will be flagged.
A minimal example:
def my_function(x, threshold=0.5):
    """Summary line in imperative mood.

    Extended description, optional.

    Parameters
    ----------
    x : array-like of shape (n_samples, n_features)
        The input data.
    threshold : float, default=0.5
        Description of ``threshold``.

    Returns
    -------
    result : ndarray of shape (n_samples,)
        Description of the return value.
    """
Guidelines:
Start with a one-line summary in the imperative mood (“Compute …”, not “Computes …”).
Document every parameter and return value with type and shape.
Use default=<value> rather than optional for parameters that have a default, matching scikit-learn.
Cross-reference other Skchange objects with :class:`~skchange.detectors.PELT` or :func:`~skchange.metrics.f1_score`.
Include a References section with citations and an Examples section with a runnable doctest where it adds value.
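As a sketch of an Examples section that doubles as a doctest, consider the hypothetical function count_segments below. It is not part of Skchange. The final call uses the standard library doctest module to execute the example embedded in the docstring.

```python
import doctest


def count_segments(labels):
    """Count the segments in a sequence of dense segment labels.

    Parameters
    ----------
    labels : sequence of int of shape (n_samples,)
        Segment label of each sample, constant within a segment.

    Returns
    -------
    n_segments : int
        Number of maximal runs of equal labels. Zero for empty input.

    Examples
    --------
    >>> count_segments([0, 0, 1, 1, 1, 2])
    3
    """
    if len(labels) == 0:
        return 0
    changes = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return changes + 1


# Prints a failure report only if the documented output does not match.
doctest.run_docstring_examples(count_segments, {"count_segments": count_segments})
```

Keeping the example this small makes it cheap to run and hard to let drift out of date.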
Tests#
Tests live next to the code they cover, in a tests/ subpackage
(e.g. skchange/detectors/tests/). Naming follows
pytest conventions.
Test files are named test_*.py.
Test functions are named test_*.
Use pytest.mark.parametrize for varying inputs rather than loops.
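A short illustration of the parametrize guideline. The function under test, count_changepoints, is hypothetical and defined inline so that the sketch is self-contained. In a real test file it would be imported from the package.

```python
import pytest


def count_changepoints(labels):
    """Hypothetical function under test: count the label changes."""
    return sum(1 for a, b in zip(labels, labels[1:]) if a != b)


@pytest.mark.parametrize(
    ("labels", "expected"),
    [
        ([], 0),
        ([0, 0, 0], 0),
        ([0, 0, 1, 1], 1),
        ([0, 1, 2], 2),
    ],
)
def test_count_changepoints(labels, expected):
    assert count_changepoints(labels) == expected
```

With parametrize, each input is reported as its own test case, so one failing input does not hide the results for the others as a loop inside a single test would.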
Each subpackage also contains a shared contract test module that exercises
every implementation in that subpackage against a common set of API checks.
These modules are named test_all.py in the new API
(e.g. skchange/new_api/detectors/tests/test_all.py,
skchange/new_api/interval_scorers/tests/test_all.py,
skchange/new_api/metrics/tests/test_all.py). The set of instances they run
against is declared in a sibling _registry.py file. When you add a new
detector, scorer, or metric, register a representative set of instances in the
corresponding _registry.py so the shared checks are exercised against your
implementation automatically.
These contract tests cover the constructor contract, fit and predict
round trips, the fitted-attribute naming convention, cloning, and other
estimator-level invariants. They are inspired by
scikit-learn’s check_estimator.
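The registry pattern described above might look roughly like the sketch below. It is an assumption about the general shape only. The actual contents of _registry.py and test_all.py in Skchange are not shown on this page, and ToyDetector, DETECTOR_INSTANCES, and the test name are hypothetical.

```python
import pytest


class ToyDetector:
    """Hypothetical detector used only to populate the registry."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self

    def predict(self, X):
        return [int(abs(x - self.mean_) > self.threshold) for x in X]


# Would live in _registry.py: a representative set of instances.
DETECTOR_INSTANCES = [ToyDetector(), ToyDetector(threshold=3.0)]


# Would live in tests/test_all.py: one check, run against every instance.
@pytest.mark.parametrize("detector", DETECTOR_INSTANCES)
def test_fit_predict_round_trip(detector):
    X = [0.0, 0.0, 0.0, 10.0]
    # fit returns self and sets at least one trailing-underscore attribute.
    assert detector.fit(X) is detector
    assert any(name.endswith("_") for name in vars(detector))
    # predict returns one label per sample.
    assert len(detector.predict(X)) == len(X)
```

Adding an instance to the list is then enough to have every shared check run against it.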
Run the full test suite locally with:
pytest
Type hints#
Type hints are encouraged for new public functions but are not required
everywhere. When adding hints, prefer numpy.typing aliases (e.g.
npt.NDArray[np.floating]) and standard typing constructs over
custom protocols.
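A minimal sketch of the preferred hinting style, assuming NumPy is installed. segment_means is a hypothetical helper, not a Skchange function.

```python
import numpy as np
import numpy.typing as npt


def segment_means(
    x: npt.NDArray[np.floating],
    changepoints: npt.NDArray[np.integer],
) -> npt.NDArray[np.floating]:
    """Return the mean of each segment of a 1D series."""
    # np.split cuts x at the given indices, yielding one array per segment.
    segments = np.split(x, changepoints)
    return np.array([segment.mean() for segment in segments])


x = np.array([0.0, 0.0, 4.0, 4.0, 4.0])
print(segment_means(x, np.array([2])))
```

The aliases document the expected dtype family without requiring a custom protocol, and static type checkers can read them directly.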