generate_piecewise_regression_data
- generate_piecewise_regression_data(lengths: int | list[int] | ndarray | None = None, *, n_segments: int = 3, n_samples: int = 100, n_features: int = 1, n_informative: int = 1, n_targets: int = 1, bias: float = 0.0, effective_rank: int | None = None, tail_strength: float = 0.5, noise: float = 1.0, shuffle: bool = True, seed: int | Generator | None = None, return_params: bool = False) → tuple[DataFrame, list[str], list[str]] | tuple[DataFrame, list[str], list[str], dict]
Generate piecewise linear regression data.
Each segment is generated independently with sklearn.datasets.make_regression, and the segments are concatenated into a single dataset.
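The idea can be illustrated with a minimal, standard-library-only sketch. It does not call sklearn.datasets.make_regression; instead, the hypothetical helper piecewise_linear_sketch draws standard normal features, fresh coefficients for every segment (uniform on [0, 100), an assumption for illustration) and additive Gaussian noise, so the linear relationship between features and target changes at each segment boundary.

```python
import random


def piecewise_linear_sketch(lengths, n_features=1, bias=0.0, noise=1.0, seed=None):
    """Concatenate independent linear-regression segments (illustrative sketch).

    Hypothetical helper, not the library's implementation: each segment draws
    its own coefficients, so the linear model changes at segment borders.
    """
    rng = random.Random(seed)
    X, y, coefs = [], [], []
    for length in lengths:
        # New coefficients per segment (assumed uniform on [0, 100) here).
        coef = [100 * rng.random() for _ in range(n_features)]
        coefs.append(coef)
        for _ in range(length):
            # Standard normal features, one row per sample.
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            target = bias + sum(c * x for c, x in zip(coef, row))
            # Additive Gaussian noise with standard deviation `noise`.
            y.append(target + rng.gauss(0.0, noise))
            X.append(row)
    return X, y, coefs


X, y, coefs = piecewise_linear_sketch([30, 40, 30], n_features=2, seed=1)
print(len(X), len(y), len(coefs))  # 100 100 3
```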
- Parameters:
- lengths : int, list of int or np.ndarray, optional (default=None)
The segment lengths. There are three possible cases:
list or np.ndarray: Custom segment lengths, one entry per segment.
int: n_segments segments, each of this length.
None: n_segments segments of random length whose total is n_samples.
- n_segments : int (default=3)
Number of segments to generate if lengths is an int or None.
- n_samples : int (default=100)
Total number of samples to generate if lengths is None.
- n_features : int (default=1)
The total number of features.
- n_informative : int (default=1)
The number of informative features.
- n_targets : int (default=1)
The number of target variables.
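- bias : float (default=0.0)
The bias term of the linear model. Used across all segments.
- effective_rank : int or None (default=None)
The approximate effective rank of the feature matrix. If None, the features are well-conditioned, centered Gaussian with unit variance. Used across all segments.
- tail_strength : float (default=0.5)
The relative importance of the fat noisy tail of the singular value profile of the feature matrix. Only has an effect if effective_rank is not None. Used across all segments.
- noise : float (default=1.0)
The standard deviation of the Gaussian noise applied to the output. Used across all segments.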
- shuffle : bool (default=True)
Whether to shuffle the samples and features within each segment.
- seed : np.random.Generator, int or None, optional (default=None)
Seed for the random number generator, or a NumPy random generator instance. If specified, the output is reproducible across multiple calls.
- return_params : bool (default=False)
Whether to also return the parameters used to generate the data.
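The three cases for lengths can be sketched as follows. resolve_segment_lengths is a hypothetical helper, not part of the library, and the random split used when lengths is None (distinct interior cut points drawn uniformly, so every segment is non-empty) is an assumption; the actual implementation may sample lengths differently.

```python
import numbers
import random


def resolve_segment_lengths(lengths=None, n_segments=3, n_samples=100, seed=None):
    """Turn the `lengths` argument into a list of segment lengths (sketch).

    Hypothetical helper mirroring the three documented cases; the random
    split used for `lengths=None` is an assumption, not the library's method.
    """
    if lengths is None:
        if not 1 <= n_segments <= n_samples:
            raise ValueError("need 1 <= n_segments <= n_samples")
        rng = random.Random(seed)
        # Pick n_segments - 1 distinct interior cut points, so every segment
        # is non-empty and the lengths sum to n_samples.
        cuts = sorted(rng.sample(range(1, n_samples), n_segments - 1))
        bounds = [0, *cuts, n_samples]
        return [b - a for a, b in zip(bounds, bounds[1:])]
    if isinstance(lengths, numbers.Integral):
        # n_segments equal segments of the given length.
        return [int(lengths)] * n_segments
    # A list or array: use the given lengths as-is.
    return [int(length) for length in lengths]


print(resolve_segment_lengths(20))           # [20, 20, 20]
print(resolve_segment_lengths([10, 50, 5]))  # [10, 50, 5]
print(sum(resolve_segment_lengths(None, n_segments=4, n_samples=100, seed=0)))  # 100
```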
- Returns:
- pd.DataFrame
The generated data as a DataFrame with columns named “feature_0”, “feature_1”, …, “target_0”, “target_1”, …
- list[str]
A list of feature column names.
- list[str]
A list of target column names.
- dict, optional
If return_params is True, a dictionary containing the parameters used to generate the data, including segment lengths, coefficients, change points, total number of samples and total number of segments.
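A small sketch of the naming pattern for the returned columns and of how change points relate to segment lengths. Both helpers are hypothetical, and the convention that a change point is the index where a new segment starts is an assumption that should be checked against the returned parameters dict.

```python
from itertools import accumulate


def column_names(n_features, n_targets):
    """Column names following the documented 'feature_i' / 'target_j' pattern."""
    feature_cols = [f"feature_{i}" for i in range(n_features)]
    target_cols = [f"target_{j}" for j in range(n_targets)]
    return feature_cols, target_cols


def change_points_from_lengths(lengths):
    """Start index of every segment after the first (assumed convention)."""
    # Cumulative sums of all but the last length, e.g. [30, 40, 30] -> [30, 70].
    return list(accumulate(lengths[:-1]))


features, targets = column_names(n_features=2, n_targets=1)
print(features, targets)  # ['feature_0', 'feature_1'] ['target_0']
print(change_points_from_lengths([30, 40, 30]))  # [30, 70]
```

With return_params=False the result unpacks as three values (DataFrame, feature column names, target column names); with return_params=True the parameters dict is appended as a fourth element.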