generate_piecewise_data

Generate data with a piecewise constant distribution.

Generate consecutive segments of data, each drawn from a scipy.stats distribution. Parameters that are not specified, such as the segment lengths, are randomly generated.

Parameters:

distributions : list of scipy.stats.rv_continuous or scipy.stats.rv_discrete, optional (default=None)

The distributions for generating piecewise data. They are recycled to match the number of segments specified by lengths or n_segments. If None, segments alternate between scipy.stats.norm() (mean 0) and scipy.stats.norm(5) (mean 5). Each distribution is expected to be a scipy.stats distribution instance, such as scipy.stats.norm(0, 1) or scipy.stats.uniform(). See scipy.stats for a list of all available distributions.
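One reading of "recycled" is that the list is cycled through in order until every segment has a distribution, and truncated if it is longer than the number of segments. The sketch below illustrates that reading only; the helper name `recycle` is hypothetical (not part of skchange), and strings stand in for scipy distribution objects so the snippet runs with the standard library alone.

```python
from itertools import cycle, islice
from typing import Sequence, TypeVar

T = TypeVar("T")


def recycle(items: Sequence[T], n: int) -> list[T]:
    """Repeat items in order until there are exactly n of them (hypothetical helper)."""
    if not items:
        raise ValueError("items must be non-empty")
    # cycle() repeats the sequence forever; islice() stops after n elements,
    # which also truncates a list that is longer than n.
    return list(islice(cycle(items), n))


# Two stand-in distributions recycled over five segments.
print(recycle(["norm()", "norm(5)"], 5))
# ['norm()', 'norm(5)', 'norm()', 'norm(5)', 'norm()']
```

Whether skchange truncates an over-long list in exactly this way is an assumption; the docstring only states that distributions are recycled to match the segment count.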

lengths : int, list of int or np.ndarray, optional (default=None)

The segment lengths. There are three possible cases:

list or numpy array: Custom set of segment lengths.
int: The common length of n_segments equal-length segments.
None: n_segments segments of random lengths that sum to n_samples.

n_segments : int, optional (default=3)

Number of segments to generate if lengths is an integer or None.

n_samples : int, optional (default=100)

Total number of samples to generate if lengths is not specified.
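The three cases for lengths, together with n_segments and n_samples, can be sketched as a small resolution step. This is a minimal illustration, not skchange's implementation: the helper `resolve_lengths` is hypothetical, and for the random case it assumes one plausible scheme (drawing distinct cut points so every segment has at least one sample); the library may sample lengths differently.

```python
from __future__ import annotations

import random
from typing import Sequence


def resolve_lengths(
    lengths: int | Sequence[int] | None,
    n_segments: int = 3,
    n_samples: int = 100,
    rng: random.Random | None = None,
) -> list[int]:
    """Turn the `lengths` argument into an explicit list (hypothetical helper)."""
    if lengths is None:
        if n_segments > n_samples:
            raise ValueError("n_segments cannot exceed n_samples")
        rng = rng or random.Random()
        # Assumed scheme: pick n_segments - 1 distinct cut points in 1..n_samples-1,
        # so each segment is non-empty and the lengths sum to n_samples.
        cuts = sorted(rng.sample(range(1, n_samples), n_segments - 1))
        bounds = [0, *cuts, n_samples]
        return [end - start for start, end in zip(bounds[:-1], bounds[1:])]
    if isinstance(lengths, int):
        # n_segments segments that all share the given length.
        return [lengths] * n_segments
    # A list or array of custom lengths is used as given.
    return list(lengths)


print(resolve_lengths([7, 3]))           # [7, 3]
print(resolve_lengths(5, n_segments=3))  # [5, 5, 5]
random_lengths = resolve_lengths(None, n_segments=4, n_samples=100, rng=random.Random(0))
print(len(random_lengths), sum(random_lengths))  # 4 100
```

Drawing cut points rather than independent lengths is a convenient way to hit the n_samples total exactly without rescaling.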

seed : np.random.Generator | int | None, optional

Seed for the random number generator, or a numpy.random.Generator instance to draw from. Passing the same integer seed gives reproducible output across calls.
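The three accepted seed types line up with what `numpy.random.default_rng` accepts. Whether skchange normalizes seed through this exact call is an assumption; the snippet only demonstrates the NumPy behavior the parameter mirrors.

```python
import numpy as np

# An int seed yields a reproducible generator: same seed, same draws.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
print(np.array_equal(rng_a.normal(size=3), rng_b.normal(size=3)))  # True

# An existing Generator is returned unaltered, so its state is shared
# and advances with every draw (repeated calls will not repeat values).
existing = np.random.default_rng(1)
print(np.random.default_rng(existing) is existing)  # True

# None draws fresh entropy from the operating system, so output differs per run.
fresh = np.random.default_rng(None)
```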

return_params : bool, optional (default=False)

If True, the function returns a tuple of the generated array and a dictionary with the parameters used to generate the data.

Returns:

np.ndarray of shape (n_samples, n_variables)

Array with generated data.

dict, optional

A dictionary containing the parameters used to generate the data. Only returned if return_params is True. It has the following keys:

“n_segments” : number of segments generated.
“n_samples” : total number of samples generated.
“distributions” : list of scipy.stats.rv_continuous or scipy.stats.rv_discrete with the distributions used for each segment.
“lengths” : list of lengths for each segment.
“change_points” : list of change points, which are the starting indices of each segment in the data.
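Reading the key descriptions literally, "n_samples" is the sum of the segment lengths and "change_points" follows from "lengths" by a cumulative sum, with 0 as the start of the first segment. The hypothetical helper below sketches that relationship. If the returned list only holds the starts of segments after the first (a common convention for change points), drop the leading 0; which convention skchange uses is not stated here.

```python
from itertools import accumulate


def change_points_from_lengths(lengths: list[int]) -> list[int]:
    """Starting index of every segment, given segment lengths (hypothetical helper)."""
    # Segment k starts where segments 0..k-1 end: 0, l0, l0 + l1, ...
    return [0, *accumulate(lengths[:-1])]


print(change_points_from_lengths([7, 3]))        # [0, 7]
print(change_points_from_lengths([30, 50, 20]))  # [0, 30, 80]
print(sum([30, 50, 20]))                         # 100, the total n_samples
```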

Examples

>>> # Example 1: Two normal segments
>>> from skchange.new_api.datasets import generate_piecewise_data
>>> from scipy.stats import norm
>>> generate_piecewise_data(
...     distributions=[norm(0, 1), norm(10, 0.1)],
...     lengths=[7, 3],
...     seed=1,
... )
array([[ 0.345584...],
       ...
       [10.029413...]])
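To make the behavior concrete, here is a rough re-implementation sketch with NumPy and scipy.stats that mirrors the documented inputs and outputs. It is not skchange's code: the function `piecewise_sketch` is hypothetical, it takes explicit lengths only, and because its internal draw order may differ, its numbers are not expected to match the example output above.

```python
from itertools import accumulate, cycle, islice

import numpy as np
from scipy.stats import norm


def piecewise_sketch(distributions, lengths, seed=None):
    """Stack draws from each distribution into one column (hypothetical re-implementation)."""
    rng = np.random.default_rng(seed)
    # Recycle distributions so there is exactly one per segment.
    dists = list(islice(cycle(distributions), len(lengths)))
    # Draw each segment from its frozen scipy distribution using the shared generator.
    segments = [d.rvs(size=n, random_state=rng) for d, n in zip(dists, lengths)]
    data = np.concatenate(segments).reshape(-1, 1)  # shape (n_samples, 1)
    params = {
        "n_segments": len(lengths),
        "n_samples": int(sum(lengths)),
        "distributions": dists,
        "lengths": list(lengths),
        "change_points": [0, *accumulate(lengths[:-1])],
    }
    return data, params


data, params = piecewise_sketch([norm(0, 1), norm(10, 0.1)], [7, 3], seed=1)
print(data.shape)               # (10, 1)
print(params["change_points"])  # [0, 7]
```

Sharing one Generator across all segments keeps the whole array reproducible from a single seed, which is the property the seed parameter describes.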