generate_piecewise_data
- generate_piecewise_data(distributions: rv_continuous | rv_discrete | list[rv_continuous] | list[rv_discrete] | None = None, lengths: int | list[int] | ndarray | None = None, *, n_segments: int = 3, n_samples: int = 100, seed: int | Generator | None = None, return_params: bool = False) → ndarray | tuple[ndarray, dict]
Generate data with a piecewise constant distribution.
Generate piecewise segments of data from scipy.stats distributions, where unspecified parameters are randomly generated.
- Parameters:
- distributions : list of scipy.stats.rv_continuous or scipy.stats.rv_discrete, optional (default=None)
The distributions for generating piecewise data. They are recycled to match the number of segments specified by lengths or n_segments. If None, alternating segments of scipy.stats.norm() and scipy.stats.norm(5) are used. Each distribution is expected to be a scipy distribution instance (e.g., scipy.stats.norm, scipy.stats.uniform). See scipy.stats for a list of all available distributions.
- lengths : int, list of int or np.ndarray, optional (default=None)
The segment lengths. There are three possible cases:
list or numpy array: Custom set of segment lengths.
int: Common length of each of the n_segments segments.
None: Generate n_segments random segment lengths with a total sample size of n_samples.
- n_segments : int (default=3)
Number of segments to generate if lengths is an integer or None.
- n_samples : int (default=100)
Total number of samples to generate if lengths is not specified.
- seed : np.random.Generator | int | None, optional
Seed for the random number generator or a numpy random generator instance. If specified, this ensures reproducible output across multiple calls.
- return_params : bool, optional (default=False)
If True, the function returns a tuple of the generated array and a dictionary with the parameters used to generate the data.
- Returns:
- np.ndarray of shape (n_samples, n_variables)
Array with generated data.
- dict, optional
A dictionary containing the parameters used to generate the data. Only returned if return_params is True. It has the following keys:
“n_segments” : number of segments generated.
“n_samples” : total number of samples generated.
“distributions” : list of scipy.stats.rv_continuous or scipy.stats.rv_discrete with the distributions used for each segment.
“lengths” : list of lengths for each segment.
“change_points” : list of change points, which are the starting indices of each segment in the data.
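To make the relationship between segment lengths and segment starting indices concrete, here is a minimal library-free sketch. The helper name `segment_starts` is hypothetical and not part of skchange. Whether the library's “change_points” list includes the leading index 0 is not stated here, so the sketch returns the start of every segment and leaves that choice to the reader.

```python
from itertools import accumulate


def segment_starts(lengths):
    """Return the starting index of each segment (hypothetical helper)."""
    # The running total of the lengths is the exclusive end of each segment;
    # prepending 0 and dropping the final total yields the segment starts.
    ends = list(accumulate(lengths))
    return [0] + ends[:-1]


lengths = [7, 3]
starts = segment_starts(lengths)
print(starts)        # [0, 7]
print(sum(lengths))  # 10
```

With lengths [7, 3], the second segment starts at index 7, and the total number of samples is the sum of the lengths.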
Examples
>>> # Example 1: Two normal segments
>>> from skchange.new_api.datasets import generate_piecewise_data
>>> from scipy.stats import norm
>>> generate_piecewise_data(
...     distributions=[norm(0, 1), norm(10, 0.1)],
...     lengths=[7, 3],
...     seed=1,
... )
array([[ 0.345584...],
       ...
       [10.029413...]])
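The three documented cases for lengths, and the recycling of distributions across segments, can be illustrated with a small standard-library sketch. This is not the library's implementation: `resolve_lengths` and `recycle` are hypothetical helpers, the cut-point scheme used for random lengths is an assumption (the actual scheme is not described here), and plain strings stand in for scipy.stats distributions.

```python
import random
from itertools import cycle, islice


def resolve_lengths(lengths=None, n_segments=3, n_samples=100, seed=None):
    """Hypothetical helper mirroring the three documented cases for lengths."""
    if lengths is None:
        # Assumption: draw n_segments - 1 distinct cut points so that every
        # segment has at least one sample and the lengths sum to n_samples.
        rng = random.Random(seed)
        cuts = sorted(rng.sample(range(1, n_samples), n_segments - 1))
        bounds = [0, *cuts, n_samples]
        return [b - a for a, b in zip(bounds, bounds[1:])]
    if isinstance(lengths, int):
        # A single int is the common length of all n_segments segments.
        return [lengths] * n_segments
    # A list (or array) is taken as the custom segment lengths.
    return list(lengths)


def recycle(items, n):
    """Repeat items cyclically until there are exactly n of them."""
    return list(islice(cycle(items), n))


random_lengths = resolve_lengths(seed=1)            # 3 lengths summing to 100
equal_lengths = resolve_lengths(25, n_segments=4)   # [25, 25, 25, 25]
custom_lengths = resolve_lengths([7, 3])            # [7, 3]

# Two stand-in distributions recycled over five segments.
labels = recycle(["norm()", "norm(5)"], 5)
print(labels)  # ['norm()', 'norm(5)', 'norm()', 'norm(5)', 'norm()']
```

Note that in the int case the total sample size is lengths * n_segments in this sketch, and n_samples only matters when lengths is None, which matches the parameter descriptions above.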