generate_piecewise_data
- generate_piecewise_data(distributions: rv_continuous | rv_discrete | list[rv_continuous] | list[rv_discrete] | None = None, lengths: int | list[int] | ndarray | None = None, *, n_segments: int = 3, n_samples: int = 100, seed: int | Generator | None = None, return_params: bool = False) -> DataFrame | tuple[DataFrame, dict]
Generate data with a piecewise constant distribution.
Generate piecewise segments of data from scipy.stats distributions, where unspecified parameters are randomly generated.
- Parameters:
- distributions : scipy.stats.rv_continuous, scipy.stats.rv_discrete, or list of these, optional (default=None)
The distributions for generating piecewise data. They are recycled to match the number of segments specified by lengths or n_segments. If None, alternating segments of scipy.stats.norm() and scipy.stats.norm(5) are used. Each distribution is expected to be a scipy distribution instance (e.g., scipy.stats.norm, scipy.stats.uniform). See scipy.stats for a list of all available distributions.
- lengths : int, list of int or np.ndarray, optional (default=None)
The segment lengths. There are three possible cases:
list or numpy array: Custom set of segment lengths.
int: The common length of each of the n_segments equal-length segments.
None: Generate n_segments random segment lengths with a total sample size of n_samples.
- n_segments : int (default=3)
Number of segments to generate if lengths is an integer or None.
- n_samples : int (default=100)
Total number of samples to generate if lengths is not specified.
- seed : np.random.Generator | int | None, optional (default=None)
Seed for the random number generator or a numpy random generator instance. If specified, this ensures reproducible output across multiple calls.
- return_params : bool, optional (default=False)
If True, the function returns a tuple of the generated DataFrame and a dictionary with the parameters used to generate the data.
- Returns:
- pd.DataFrame
Data frame with generated data.
- dict, optional
A dictionary containing the parameters used to generate the data. Only returned if return_params is True. It has the following keys:
“n_segments” : number of segments generated.
“n_samples” : total number of samples generated.
“distributions” : list of scipy.stats.rv_continuous or scipy.stats.rv_discrete with the distributions used for each segment.
“lengths” : list of lengths for each segment.
“change_points” : list of change points, which are the starting indices of each segment in the data.
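The relationship between "lengths" and "change_points" can be illustrated in plain Python. This is a minimal sketch, not skchange's implementation: the helper name `segment_starts` is hypothetical, and the description above does not say whether the start of the first segment (index 0) is included in the returned list, so the sketch prints both variants.

```python
from itertools import accumulate


def segment_starts(lengths):
    """Return the starting index of every segment (hypothetical helper)."""
    # Cumulative sums give the exclusive end index of each segment.
    # Dropping the last end and prepending 0 yields the segment starts.
    ends = list(accumulate(lengths))
    return [0] + ends[:-1]


lengths = [7, 3]
starts = segment_starts(lengths)
print(starts)      # [0, 7]  (all segment starts)
print(starts[1:])  # [7]     (starts after the first segment only)
```

With lengths [7, 3], the second segment begins at row 7, which matches the first example below, where the values jump to around 10 from index 7 onward.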
Examples
>>> # Example 1: Two normal segments
>>> from skchange.datasets import generate_piecewise_data
>>> from scipy.stats import norm
>>> generate_piecewise_data(
...     distributions=[norm(0, 1), norm(10, 0.1)],
...     lengths=[7, 3],
...     seed=1,
... )
           0
0   0.345584
1   0.821618
2   0.330437
3  -1.303157
4   0.905356
5   0.446375
6  -0.536953
7  10.058112
8  10.036457
9  10.029413
>>> # Example 2: Two Poisson segments
>>> from scipy.stats import poisson
>>> generate_piecewise_data(
...     distributions=[poisson(1), poisson(10)],
...     lengths=[5, 5],
...     seed=2,
... )
    0
0   0
1   0
2   1
3   2
4   0
5   8
6  11
7   9
8   9
9   9
>>> # Example 3: Specify int lengths and n_segments
>>> generate_piecewise_data(
...     distributions=[norm(0), norm(5)],
...     lengths=3,
...     n_segments=3,
...     seed=3,
... )
          0
0  2.040919
1 -2.555665
2  0.418099
3  4.432230
4  4.547351
5  4.784403
6 -2.019986
7 -0.231932
8 -0.865213
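
The three cases for lengths and the recycling of distributions described under Parameters can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: `resolve_lengths` and `recycle` are hypothetical helpers, and the way random lengths are drawn here (distinct random cut points, so every segment has at least one sample) is an assumption, since the documentation does not specify the scheme.

```python
import random
from itertools import cycle, islice


def resolve_lengths(lengths=None, n_segments=3, n_samples=100, seed=None):
    """Turn the `lengths` argument into an explicit list (hypothetical helper)."""
    if lengths is None:
        # Assumed scheme: pick n_segments - 1 distinct cut points strictly
        # inside the sample range, then take differences between bounds.
        rng = random.Random(seed)
        cuts = sorted(rng.sample(range(1, n_samples), n_segments - 1))
        bounds = [0] + cuts + [n_samples]
        return [b - a for a, b in zip(bounds, bounds[1:])]
    if isinstance(lengths, int):
        # An int means n_segments segments of that common length.
        return [lengths] * n_segments
    # A list or array is taken as the custom segment lengths.
    return list(lengths)


def recycle(distributions, n_segments):
    """Repeat the distributions cyclically, one per segment (hypothetical helper)."""
    return list(islice(cycle(distributions), n_segments))


print(resolve_lengths(3, n_segments=3))  # [3, 3, 3]
print(resolve_lengths([7, 3]))           # [7, 3]
random_lengths = resolve_lengths(None, n_segments=3, n_samples=100, seed=0)
print(len(random_lengths), sum(random_lengths))  # 3 100
print(recycle(["norm(0)", "norm(5)"], 3))  # ['norm(0)', 'norm(5)', 'norm(0)']
```

The last line mirrors Example 3: with two distributions and three segments, the third segment reuses the first distribution, which is why its values are centred near 0 again.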