pflacco.classical_ela_features

pflacco.classical_ela_features.calculate_cm_angle(X: Union[DataFrame, ndarray, List[List[float]]], y: Union[Series, ndarray, List[float]], lower_bound: Union[List[float], float], upper_bound: Union[List[float], float], blocks: Optional[Union[List[int], ndarray, int]] = None, force: bool = False, minimize: bool = True) → Dict[str, Union[int, float]]

Cell Mapping Angle features. These features are based on the location of the worst and best element within each cell. To be precise, their distance to the cell center and the angle between these three elements (at the center) are the foundation:

dist_ctr2{best, worst}.{mean, sd}: arithmetic mean and standard deviation of distances from the cell center to the best / worst observation within the cell (over all cells)
angle.{mean, sd}: arithmetic mean and standard deviation of angles (in degrees) between worst, center and best element of a cell (over all cells)
y_ratio_best2worst.{mean, sd}: arithmetic mean and standard deviation of the ratios between the distance of the worst and best element within a cell and the worst and best element in the entire initial design (over all cells); note that the distances are only measured in the objective space
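
As a rough illustration of the per-cell quantities described above, the following sketch computes the two distances and the angle for a single cell. It is not pflacco's implementation; the helper name cell_angle_features and the example coordinates are assumptions made for this example only.

```python
import math

def cell_angle_features(center, best, worst):
    """Distances from the cell center to the best / worst point and the
    angle (in degrees) between them, measured at the center.

    Hypothetical helper for illustration; it assumes neither point
    coincides with the center (otherwise the angle is undefined).
    """
    to_best = [b - c for b, c in zip(best, center)]
    to_worst = [w - c for w, c in zip(worst, center)]
    dist_best = math.hypot(*to_best)
    dist_worst = math.hypot(*to_worst)
    dot = sum(u * v for u, v in zip(to_best, to_worst))
    # Clamp to [-1, 1] to guard against rounding before calling acos.
    cos_angle = max(-1.0, min(1.0, dot / (dist_best * dist_worst)))
    return dist_best, dist_worst, math.degrees(math.acos(cos_angle))

# One cell of the unit square with center (0.5, 0.5).
d_best, d_worst, angle = cell_angle_features(
    center=(0.5, 0.5), best=(1.0, 0.5), worst=(0.5, 1.0)
)
print(d_best, d_worst, angle)
```

The features listed above would then be the mean and standard deviation of these per-cell values over all cells.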

Parameters:

X (Union[pd.DataFrame, np.ndarray, List[List[float]]]) – A collection-like object which contains a sample of the decision space. Can be created with pflacco.sampling.create_initial_sample().
y (Union[pd.Series, np.ndarray, List[float]]) – A list-like object which contains the respective objective values of X.
lower_bound (Union[List[float], float]) – Lower bound of variables of the decision space.
upper_bound (Union[List[float], float]) – Upper bound of variables of the decision space.
blocks (Optional[Union[List[int], np.ndarray, int]], optional) – Number of blocks per dimension, by default None.
force (bool, optional) – The recommended number of blocks per dimension is >2 and the minimum number of observations per cell is 3. This means that X has to have at least blocks^dim * 3 observations. This requirement can be circumvented by setting force to True. ATTENTION: The resulting feature values are then not in line with any recommendation and may not have any predictive power. By default False.
minimize (bool, optional) – Indicator whether the objective function should be minimized or maximized, by default True.

Returns:

Dictionary consisting of the calculated features.

Return type:

Dict[str, Union[int, float]]

pflacco.classical_ela_features.calculate_cm_conv(X: Union[DataFrame, ndarray, List[List[float]]], y: Union[Series, ndarray, List[float]], lower_bound: Union[List[float], float], upper_bound: Union[List[float], float], blocks: Optional[Union[List[int], ndarray, int]] = None, force: bool = False, minimize: bool = True, cm_conv_diag: bool = False, cm_conv_fast_k: float = 0.05) → Dict[str, Union[int, float]]

Cell Mapping Convexity features. Each cell will be represented by an observation (of the initial design), which is located closest to the cell center. Then, the objectives of three neighbouring cells are compared:

{convex, concave}.hard: if the objective of the inner cell is above / below the two outer cells, there is strong evidence for convexity / concavity
{convex, concave}.soft: if the objective of the inner cell is above / below the arithmetic mean of the two outer cells, there is weak evidence for convexity / concavity

Parameters:

X (Union[pd.DataFrame, np.ndarray, List[List[float]]]) – A collection-like object which contains a sample of the decision space. Can be created with pflacco.sampling.create_initial_sample().
y (Union[pd.Series, np.ndarray, List[float]]) – A list-like object which contains the respective objective values of X.
lower_bound (Union[List[float], float]) – Lower bound of variables of the decision space.
upper_bound (Union[List[float], float]) – Upper bound of variables of the decision space.
blocks (Optional[Union[List[int], np.ndarray, int]], optional) – Number of blocks per dimension, by default None.
force (bool, optional) – The recommended number of blocks per dimension is >2 and the minimum number of observations per cell is 3. This means that X has to have at least blocks^dim * 3 observations. This requirement can be circumvented by setting force to True. ATTENTION: The resulting feature values are then not in line with any recommendation and may not have any predictive power. By default False.
minimize (bool, optional) – Indicator whether the objective function should be minimized or maximized, by default True.
cm_conv_diag (bool, optional) – Indicator which, when True, also considers cells on the diagonal as neighbours, by default False.
cm_conv_fast_k (float, optional) – Percentage of elements that should be considered within the nearest neighbour computation, by default 0.05.

Returns:

Dictionary consisting of the calculated features.

Return type:

Dict[str, Union[int, float]]

pflacco.classical_ela_features.calculate_cm_grad(X: Union[DataFrame, ndarray, List[List[float]]], y: Union[Series, ndarray, List[float]], lower_bound: Union[List[float], float], upper_bound: Union[List[float], float], blocks: Optional[Union[List[int], ndarray, int]] = None, force: bool = False, minimize: bool = True) → Dict[str, Union[int, float]]

Cell Mapping Gradient Homogeneity features. Within a cell of the initial grid, the gradients between each observation and its nearest neighbour observation are computed. Those gradients are then directed towards the smaller of the two objective values and afterwards normalized. Then, the length of the sum of all the directed and normalized gradients within a cell is computed. Based on those measurements (one per cell) the following features are computed:

{mean, sd}: arithmetic mean and standard deviation of the aforementioned lengths
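
The per-cell measurement described above can be sketched as follows. This is a simplified illustration rather than pflacco's code: the function name gradient_homogeneity is hypothetical, ties between equally distant neighbours are resolved by taking the first one, and the raw length of the summed vectors is returned without any further rescaling the library may apply.

```python
import math

def gradient_homogeneity(points, values):
    """Length of the sum of unit vectors that point, for every observation
    in one cell, along the line to its nearest neighbour and towards the
    smaller objective value.

    Illustrative sketch; assumes at least two distinct points per cell.
    """
    total = [0.0] * len(points[0])
    for i, p in enumerate(points):
        # Nearest neighbour of p among the other observations in the cell.
        j = min(
            (k for k in range(len(points)) if k != i),
            key=lambda k: math.dist(p, points[k]),
        )
        q = points[j]
        # Direct the vector towards the smaller of the two objective values.
        start, end = (p, q) if values[j] < values[i] else (q, p)
        length = math.dist(start, end)
        for d in range(len(total)):
            total[d] += (end[d] - start[d]) / length
    return math.hypot(*total)

# Three points on a line with steadily decreasing objective values:
# all directed vectors agree, so the length equals the number of points.
print(gradient_homogeneity([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], [3.0, 2.0, 1.0]))
```

Large values indicate that the directed gradients in a cell agree with each other, whereas values near zero indicate that they cancel out.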

Parameters:

X (Union[pd.DataFrame, np.ndarray, List[List[float]]]) – A collection-like object which contains a sample of the decision space. Can be created with pflacco.sampling.create_initial_sample().
y (Union[pd.Series, np.ndarray, List[float]]) – A list-like object which contains the respective objective values of X.
lower_bound (Union[List[float], float]) – Lower bound of variables of the decision space.
upper_bound (Union[List[float], float]) – Upper bound of variables of the decision space.
blocks (Optional[Union[List[int], np.ndarray, int]], optional) – Number of blocks per dimension, by default None.
force (bool, optional) – The recommended number of blocks per dimension is >2 and the minimum number of observations per cell is 3. This means that X has to have at least blocks^dim * 3 observations. This requirement can be circumvented by setting force to True. ATTENTION: The resulting feature values are then not in line with any recommendation and may not have any predictive power. By default False.
minimize (bool, optional) – Indicator whether the objective function should be minimized or maximized, by default True.

Returns:

Dictionary consisting of the calculated features.

Return type:

Dict[str, Union[int, float]]

pflacco.classical_ela_features.calculate_dispersion(X: Union[DataFrame, ndarray, List[List[float]]], y: Union[Series, ndarray, List[float]], disp_quantiles: List[float] = [0.02, 0.05, 0.1, 0.25], dist_method: str = 'euclidean', dist_p: int = 2, minimize: bool = True) → Dict[str, Union[int, float]]

Dispersion features. Computes features based on the comparison of the dispersion of pairwise distances among the ‘best’ elements and the entire initial design:

{ratio, diff}_{mean, median}_{02, 05, 10, 25}: ratio and difference of the mean / median pairwise distances among the ‘best’ observations (per quantile) vs. among ‘all’ observations
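
To make the comparison concrete, here is a minimal sketch for a single quantile using the mean of Euclidean pairwise distances. It is an illustration under assumptions, not the library's code: the name dispersion_mean is hypothetical, minimization is assumed, and the 'best' subset is taken as the ceil(quantile * n) observations with the smallest objective values, which may differ from how pflacco selects it.

```python
import math
from itertools import combinations
from statistics import mean

def mean_pairwise_distance(points):
    """Mean Euclidean distance over all unordered pairs of points."""
    return mean(math.dist(a, b) for a, b in combinations(points, 2))

def dispersion_mean(X, y, quantile=0.25):
    """Ratio and difference of mean pairwise distances: best subset vs. all.

    Hypothetical helper; assumes minimization and at least two points
    in the best subset.
    """
    n_best = max(2, math.ceil(quantile * len(y)))
    order = sorted(range(len(y)), key=lambda i: y[i])
    best = [X[i] for i in order[:n_best]]
    d_best = mean_pairwise_distance(best)
    d_all = mean_pairwise_distance(X)
    return d_best / d_all, d_best - d_all

# One-dimensional sample with a single optimum at x = 3.5.
X = [(float(i),) for i in range(8)]
y = [(x[0] - 3.5) ** 2 for x in X]
ratio, diff = dispersion_mean(X, y, quantile=0.25)
print(ratio, diff)
```

A ratio below 1 (or a negative difference) suggests that the best observations are more tightly clustered than the sample as a whole.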

Parameters:

X (Union[pd.DataFrame, np.ndarray, List[List[float]]]) – A collection-like object which contains a sample of the decision space. Can be created with pflacco.sampling.create_initial_sample().
y (Union[pd.Series, np.ndarray, List[float]]) – A list-like object which contains the respective objective values of X.
disp_quantiles (List[float], optional) – Quantiles which are used to determine the best elements of the entire sample, by default [0.02, 0.05, 0.1, 0.25].
dist_method (str, optional) – Determines which distance method is used. The given value is passed over to scipy.spatial.distance.pdist, by default ‘euclidean’.
dist_p (int, optional) – The p-norm to apply for Minkowski. This is only considered when dist_method = ‘minkowski’, by default 2.
minimize (bool, optional) – Indicator whether the objective function should be minimized or maximized, by default True.

Returns:

Dictionary consisting of the calculated features.

Return type:

Dict[str, Union[int, float]]

pflacco.classical_ela_features.calculate_ela_conv(X: Union[DataFrame, ndarray, List[List[float]]], y: Union[Series, ndarray, List[float]], f: Callable[[List[float]], float], ela_conv_nsample: int = 1000, ela_conv_threshold: float = 1e-10, seed: Optional[int] = None) → Dict[str, Union[int, float]]

ELA Convexity features. Two observations are chosen randomly from the initial design. Then, a linear (convex) combination of those observations is calculated based on a random weight from [0, 1]. The corresponding objective value is compared to the linear combination of the objectives of the two original observations. This process is replicated ela_conv_nsample (by default 1000) times and the results are then aggregated:

{convex_p, linear_p}: percentage of convexity / linearity
linear_dev.{orig, abs}: average (original / absolute) deviation between the linear combination of the objectives and the objective of the linear combination of the observations
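
The sampling procedure above can be sketched in a few lines. This is a simplified illustration and not pflacco's implementation: the function name convexity_sketch is hypothetical, the dictionary keys simply reuse the names from the list above, and the sign convention (objective of the combination minus combination of the objectives) is an assumption.

```python
import random

def convexity_sketch(X, y, f, nsample=1000, threshold=1e-10, seed=None):
    """Estimate convexity / linearity from random convex combinations.

    Hypothetical helper. delta < 0 means the function value at the
    combined point lies below the chord, which counts as convex here.
    """
    rng = random.Random(seed)
    deltas = []
    for _ in range(nsample):
        i, j = rng.sample(range(len(X)), 2)  # two distinct observations
        w = rng.random()                     # random weight
        x_new = [w * a + (1 - w) * b for a, b in zip(X[i], X[j])]
        deltas.append(f(x_new) - (w * y[i] + (1 - w) * y[j]))
    return {
        "convex_p": sum(d < -threshold for d in deltas) / nsample,
        "linear_p": sum(abs(d) <= threshold for d in deltas) / nsample,
        "linear_dev.orig": sum(deltas) / nsample,
        "linear_dev.abs": sum(abs(d) for d in deltas) / nsample,
    }

def sphere(x):
    return sum(v * v for v in x)

# 5 x 5 grid as a stand-in for an initial design.
X = [(float(i), float(j)) for i in range(5) for j in range(5)]
y = [sphere(x) for x in X]
print(convexity_sketch(X, y, sphere, nsample=200, seed=1))
```

Note that, unlike most other feature sets on this page, this one needs access to the objective function f because new points are evaluated.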

Parameters:

X (Union[pd.DataFrame, np.ndarray, List[List[float]]]) – A collection-like object which contains a sample of the decision space. Can be created with pflacco.sampling.create_initial_sample().
y (Union[pd.Series, np.ndarray, List[float]]) – A list-like object which contains the respective objective values of X.
f (Callable[[List[float]], float]) – Objective function to be optimized.
ela_conv_nsample (int, optional) – Number of samples that are drawn for calculating the convexity features, by default 1000.
ela_conv_threshold (float, optional) – Threshold of the linearity, i.e., the tolerance to/deviation from perfect linearity, in order to still be considered linear, by default 1e-10.
seed (Optional[int], optional) – Seed for reproducibility, by default None.

Returns:

Dictionary consisting of the calculated features.

Return type:

Dict[str, Union[int, float]]

pflacco.classical_ela_features.calculate_ela_curvate(X: Union[DataFrame, ndarray, List[List[float]]], y: Union[Series, ndarray, List[float]], f: Callable[[List[float]], float], dim: int, lower_bound: Union[List[float], float], upper_bound: Union[List[float], float], sample_size_factor: int = 100, delta: float = 0.0001, eps: float = 0.0001, zero_tol: float = 6.378748342528005e-156, seed: Optional[int] = None) → Dict[str, Union[int, float]]

ELA Curvature features.

First, sample_size_factor * dim samples (by default 100 * dim, with dim being the dimensionality of the decision space) are randomly chosen. Then, the gradient and Hessian of the function are estimated based on those points and the following features are computed:

grad_norm.{min, lq, mean, median, uq, max, sd, nas}: aggregations (minimum, lower quartile, arithmetic mean, median, upper quartile, maximum, standard deviation and percentage of NAs) of the gradients’ lengths
grad_scale.{min, lq, mean, median, uq, max, sd, nas}: aggregations of the ratios between biggest and smallest (absolute) gradient directions
hessian_cond.{min, lq, mean, median, uq, max, sd, nas}: aggregations of the ratios of biggest and smallest eigenvalue of the hessian matrices
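
For a single point, the quantities behind grad_norm and grad_scale can be sketched as below. Plain central differences stand in here for the numDeriv-style estimators referenced in the parameter descriptions, so this is an approximation of the idea rather than the library's method; the function names are hypothetical.

```python
import math

def central_gradient(f, x, delta=1e-4):
    """Central-difference estimate of the gradient of f at x (a sketch)."""
    grad = []
    for d in range(len(x)):
        hi, lo = list(x), list(x)
        hi[d] += delta
        lo[d] -= delta
        grad.append((f(hi) - f(lo)) / (2 * delta))
    return grad

def grad_norm_and_scale(f, x, delta=1e-4):
    """Gradient length and ratio of largest to smallest absolute component.

    Hypothetical helper; the ratio is undefined when a component is zero.
    """
    g = central_gradient(f, x, delta)
    abs_g = [abs(v) for v in g]
    return math.hypot(*g), max(abs_g) / min(abs_g)

def ellipsoid(x):
    return x[0] ** 2 + 10 * x[1] ** 2

# The exact gradient at (1, 1) is (2, 20).
norm, scale = grad_norm_and_scale(ellipsoid, [1.0, 1.0])
print(norm, scale)
```

The features listed above are aggregations (minimum, quartiles, mean and so on) of such per-point values over all sampled points.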

Parameters:

X (Union[pd.DataFrame, np.ndarray, List[List[float]]]) – A collection-like object which contains a sample of the decision space. Can be created with pflacco.sampling.create_initial_sample().
y (Union[pd.Series, np.ndarray, List[float]]) – A list-like object which contains the respective objective values of X.
f (Callable[[List[float]], float]) – Objective function to be optimized.
dim (int) – Dimensionality of the decision space.
lower_bound (Union[List[float], float]) – Lower bound of variables of the decision space.
upper_bound (Union[List[float], float]) – Upper bound of variables of the decision space.
sample_size_factor (int, optional) – Factor which determines the sample size by sample_size_factor * dim, by default 100.
delta (float, optional) – Parameter used to approximate the gradient and hessian. See grad and hessian of the R-package numDeriv for more details, by default 10**-4.
eps (float, optional) – Parameter used to approximate the gradient and hessian. See grad and hessian of the R-package numDeriv for more details, by default 10**-4.
zero_tol (float, optional) – Parameter used to approximate the gradient and hessian. See grad and hessian of the R-package numDeriv for more details, by default np.sqrt(np.nextafter(0, 1)/70**-7).
seed (Optional[int], optional) – Seed for reproducibility, by default None.

Returns:

Dictionary consisting of the calculated features.

Return type:

Dict[str, Union[int, float]]

pflacco.classical_ela_features.calculate_ela_distribution(X: Union[DataFrame, ndarray, List[List[float]]], y: Union[Series, ndarray, List[float]], ela_distr_skewness_type: int = 3, ela_distr_kurtosis_type: int = 3) → Dict[str, Union[int, float]]

ELA Distribution features. Calculation is based on the objective values alone.

skewness: skewness of the objective values
kurtosis: kurtosis of the objective values
number_of_peaks: number of peaks based on an estimation of the density of the objective values

Parameters:

X (Union[pd.DataFrame, np.ndarray, List[List[float]]]) – A collection-like object which contains a sample of the decision space. Can be created with pflacco.sampling.create_initial_sample().
y (Union[pd.Series, np.ndarray, List[float]]) – A list-like object which contains the respective objective values of X.
ela_distr_skewness_type (int, optional) – Integer indicating which algorithm to use, by default 3: Type 1: g_1 = m_3 / m_2^(3/2). Typical definition in older textbooks. Type 2: G_1 = g_1 * sqrt(n(n-1)) / (n-2). Used in SAS and SPSS. Type 3: b_1 = m_3 / s^3 = g_1 ((n-1)/n)^(3/2). Used in MINITAB and BMDP.
ela_distr_kurtosis_type (int, optional) – Integer indicating which algorithm to use, by default 3: Type 1: g_2 = m_4 / m_2^2 - 3. Typical definition in older textbooks. Type 2: G_2 = ((n+1) g_2 + 6) * (n-1) / ((n-2)(n-3)). Used in SAS and SPSS. Type 3: b_2 = m_4 / s^4 - 3 = (g_2 + 3) (1 - 1/n)^2 - 3. Used in MINITAB and BMDP.
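
The three skewness variants follow directly from the formulas given for ela_distr_skewness_type. The sketch below transcribes them in plain Python; the function name skewness is chosen for this example and is not part of pflacco.

```python
import math

def skewness(y, skew_type=3):
    """Sample skewness of type 1, 2 or 3, following the formulas above."""
    n = len(y)
    mean = sum(y) / n
    m2 = sum((v - mean) ** 2 for v in y) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in y) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    if skew_type == 1:
        return g1
    if skew_type == 2:
        return g1 * math.sqrt(n * (n - 1)) / (n - 2)
    if skew_type == 3:
        return g1 * ((n - 1) / n) ** 1.5
    raise ValueError("skew_type must be 1, 2 or 3")

y = [1.0, 2.0, 3.0, 4.0, 10.0]
for t in (1, 2, 3):
    print(t, skewness(y, t))
```

All three types agree in sign and converge to the same value as n grows; they differ only in the small-sample correction.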

Returns:

Dictionary consisting of the calculated features.

Return type:

Dict[str, Union[int, float]]

pflacco.classical_ela_features.calculate_ela_level(X: Union[DataFrame, ndarray, List[List[float]]], y: Union[Series, ndarray, List[float]], ela_level_quantiles: List[float] = [0.1, 0.25, 0.5], interface_mda_from_R: bool = False, ela_level_resample_iterations: int = 10) → Dict[str, Union[int, float]]

ELA Levelset features.

mmce_{methods}_{quantiles}: mean misclassification error of each pair of classification method and quantile
{method1}_{method2}_{quantiles}: ratio of all pairs of classification methods for all quantiles

Parameters:

X (Union[pd.DataFrame, np.ndarray, List[List[float]]]) – A collection-like object which contains a sample of the decision space. Can be created with pflacco.sampling.create_initial_sample().
y (Union[pd.Series, np.ndarray, List[float]]) – A list-like object which contains the respective objective values of X.
ela_level_quantiles (List[float], optional) – Cutpoints (quantiles of the objective values) for splitting the objective space, by default [0.1, 0.25, 0.5].
interface_mda_from_R (bool, optional) – Indicator whether to interface missing functionality from R, by default False.
ela_level_resample_iterations (int, optional) – Number of iterations of the resampling method, by default 10.

Returns:

Dictionary consisting of the calculated features.

Return type:

Dict[str, Union[int, float]]

pflacco.classical_ela_features.calculate_ela_local(X: Union[DataFrame, ndarray, List[List[float]]], y: Union[Series, ndarray, List[float]], f: Callable[[List[float]], float], dim: int, lower_bound: Union[List[float], float], upper_bound: Union[List[float], float], minimize: bool = True, ela_local_local_searches_factor: int = 50, ela_local_optim_method: str = 'L-BFGS-B', ela_local_clust_method: str = 'single', seed: Optional[int] = None, **minimizer_kwargs) → Dict[str, Union[int, float]]

ELA Local Search features. Starting from randomly chosen points of the initial design, a pre-defined number of local searches (ela_local_local_searches_factor * dim, by default 50 * dim) are executed. Their optima are then clustered (using hierarchical clustering), assuming that local optima that are located close to each other likely belong to the same basin. Given those basins, the following features are computed:

n_loc_opt.{abs, rel}: the absolute / relative amount of local optima
best2mean_contr.orig: each cluster is represented by its center; this feature is the ratio of the objective values of the best and average cluster
best2mean_contr.ratio: each cluster is represented by its center; this feature is the ratio of the differences in the objective values of average to best and worst to best cluster
basin_sizes.avg_{best, non_best, worst}: average basin size of the best / non-best / worst cluster(s)
fun_evals.{min, lq, mean, median, uq, max, sd}: aggregations of the performed local searches

Parameters:

X (Union[pd.DataFrame, np.ndarray, List[List[float]]]) – A collection-like object which contains a sample of the decision space. Can be created with pflacco.sampling.create_initial_sample().
y (Union[pd.Series, np.ndarray, List[float]]) – A list-like object which contains the respective objective values of X.
f (Callable[[List[float]], float]) – Objective function to be optimized.
dim (int) – Dimensionality of the decision space.
lower_bound (Union[List[float], float]) – Lower bound of variables of the decision space.
upper_bound (Union[List[float], float]) – Upper bound of variables of the decision space.
minimize (bool, optional) – Indicator whether the objective function should be minimized or maximized, by default True.
ela_local_local_searches_factor (int, optional) – Factor which determines the number of local searches by ela_local_local_searches_factor * dim, by default 50.
ela_local_optim_method (str, optional) – Type of solver. Any of scipy.optimize.minimize can be used, by default ‘L-BFGS-B’.
ela_local_clust_method (str, optional) – Hierarchical clustering method to use, by default ‘single’.
seed (Optional[int], optional) – Seed for reproducibility, by default None.

Returns:

Dictionary consisting of the calculated features.

Return type:

Dict[str, Union[int, float]]

pflacco.classical_ela_features.calculate_ela_meta(X: Union[DataFrame, ndarray, List[List[float]]], y: Union[Series, ndarray, List[float]]) → Dict[str, Union[int, float]]

ELA Meta features. Given an initial design, linear and quadratic models of the form objective ~ features are created. Both versions are created with and without simple interactions (e.g., x1:x2). Based on those models, the following features are computed:

lin_simple.{adj_r2, intercept}: adjusted R^2 (i.e. model fit) and intercept of a simple linear model
lin_simple.coef.{min, max, max_by_min}: smallest and biggest (non-intercept) absolute coefficients of the simple linear model, and their ratio
{lin_w_interact, quad_simple, quad_w_interact}.adj_r2: adjusted R^2 (i.e. the model fit) of a linear model with interactions, and a quadratic model with and without interactions
quad_simple.cond: condition of a simple quadratic model (without interactions), i.e. the ratio of its (absolute) biggest and smallest coefficients

Parameters:

X (Union[pd.DataFrame, np.ndarray, List[List[float]]]) – A collection-like object which contains a sample of the decision space. Can be created with pflacco.sampling.create_initial_sample().
y (Union[pd.Series, np.ndarray, List[float]]) – A list-like object which contains the respective objective values of X.

Returns:

Dictionary consisting of the calculated features.

Return type:

Dict[str, Union[int, float]]

pflacco.classical_ela_features.calculate_information_content(X: Union[DataFrame, ndarray, List[List[float]]], y: Union[Series, ndarray, List[float]], ic_sorting: str = 'nn', ic_nn_neighborhood: int = 20, ic_nn_start: Optional[int] = None, ic_epsilon: List[float] = array([0.00000000e+00, 1.00000000e-05, 1.04717682e-05, ..., 9.11926760e+14, 9.54948564e+14, 1.00000000e+15]), ic_settling_sensitivity: float = 0.05, ic_info_sensitivity: float = 0.5, seed: Optional[int] = None) → Dict[str, Union[int, float]]

Information Content features. Computes features based on the Information Content of Fitness Sequences (ICoFiS) approach [1]. In this approach, the information content of a continuous landscape, i.e. its smoothness, ruggedness, or neutrality, is quantified. While common analysis methods were able to calculate the information content of discrete landscapes, the ICoFiS approach provides an adaptation to continuous landscapes that accounts e.g. for variable step sizes in random walk sampling:

h_max: “maximum information content” (entropy) of the fitness sequence, cf. equation (5)
eps_s: “settling sensitivity”, indicating the epsilon for which the sequence nearly consists of zeros only, cf. equation (6)
eps_max: similar to eps_s, but in contrast to the former, eps_max guarantees non-missing values; this simply is the epsilon value for which H(eps_max) == h_max
eps_ratio: “ratio of partial information sensitivity”, cf. equation (8), where the ratio is 0.5
m0: “initial partial information”, cf. equation (7)
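
The entropy H(eps) that underlies h_max can be sketched as follows, based on the symbol-sequence construction in [1]. This is a simplified illustration, not pflacco's implementation: the function name information_content is hypothetical, the tour through the sample is taken as given (the library derives it from ic_sorting), and consecutive points are assumed to be distinct.

```python
import math
from collections import Counter

def information_content(X, y, eps):
    """Entropy H(eps) of the symbol sequence along a given tour (a sketch).

    Each step is encoded as -1, 0 or 1, depending on whether the slope of
    the objective along the step is below -eps, within [-eps, eps] or
    above eps. Only blocks of two different consecutive symbols count.
    """
    symbols = []
    for i in range(len(y) - 1):
        slope = (y[i + 1] - y[i]) / math.dist(X[i], X[i + 1])
        symbols.append(-1 if slope < -eps else (1 if slope > eps else 0))
    pairs = Counter(zip(symbols, symbols[1:]))
    n_pairs = len(symbols) - 1
    return -sum(
        (c / n_pairs) * math.log(c / n_pairs, 6)
        for (a, b), c in pairs.items()
        if a != b
    )

# A small zig-zag landscape visited from left to right.
X = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
y = [0.0, 1.0, 0.0, 1.0, 0.0]
eps_grid = [0.0, 0.5, 10.0]
h_values = [information_content(X, y, e) for e in eps_grid]
print(h_values, max(h_values))  # the maximum plays the role of h_max
```

For a sufficiently large epsilon every step is encoded as 0 and the entropy drops to zero, which is the behaviour the settling sensitivity eps_s captures.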

Parameters:

X (Union[pd.DataFrame, np.ndarray, List[List[float]]]) – A collection-like object which contains a sample of the decision space. Can be created with pflacco.sampling.create_initial_sample().
y (Union[pd.Series, np.ndarray, List[float]]) – A list-like object which contains the respective objective values of X.
ic_sorting (str, optional) – Sorting strategy, which is used to define the tour through the landscape. Possible values are ‘nn’ and ‘random’, by default ‘nn’.
ic_nn_neighborhood (int, optional) – Number of neighbours to be considered in the computation, by default 20.
ic_nn_start (Optional[int], optional) – Index of the observation which should be used as starting point. When none is supplied, it is chosen randomly, by default None.
ic_epsilon (List[float], optional) – Epsilon values as described in section V.A of [1], by default np.insert(10 ** np.linspace(start = -5, stop = 15, num = 1000), 0, 0).
ic_settling_sensitivity (float, optional) – Threshold, which should be used for computing the settling sensitivity of [1], by default 0.05.
ic_info_sensitivity (float, optional) – Portion of partial information sensitivity of [1], by default 0.5.
seed (Optional[int], optional) – Seed for reproducibility, by default None.

Returns:

Dictionary consisting of the calculated features.

Return type:

Dict[str, Union[int, float]]

References

[1] Muñoz, M.A., Kirley, M. and Halgamuge, S.K., 2014.: Exploratory landscape analysis of continuous space optimization problems using information content. IEEE transactions on evolutionary computation, 19(1), pp.74-87.

pflacco.classical_ela_features.calculate_limo(X: Union[DataFrame, ndarray, List[List[float]]], y: Union[Series, ndarray, List[float]], lower_bound: Union[List[float], float], upper_bound: Union[List[float], float], blocks: Optional[Union[List[int], ndarray, int]] = None, force: bool = False) → Dict[str, Optional[Union[int, float]]]

Linear Model features. Linear models are computed per cell, provided the decision space is divided into a grid of cells. Each one of the models has the form objective ~ features.

avg_length.{reg, norm}: length of the average coefficient vector (based on regular and normalized vectors)
length_{mean, sd}: arithmetic mean and standard deviation of the lengths of all coefficient vectors
cor.{reg, norm}: correlation of all coefficient vectors (based on regular and normalized vectors)
ratio_{mean, sd}: arithmetic mean and standard deviation of the ratios of (absolute) maximum and minimum (non-intercept) coefficients per cell
sd_{ratio, mean}.{reg, norm}: max-by-min-ratio and arithmetic mean of the standard deviations of the (non-intercept) coefficients (based on regular and normalized vectors)

Parameters:

X (Union[pd.DataFrame, np.ndarray, List[List[float]]]) – A collection-like object which contains a sample of the decision space. Can be created with pflacco.sampling.create_initial_sample().
y (Union[pd.Series, np.ndarray, List[float]]) – A list-like object which contains the respective objective values of X.
lower_bound (Union[List[float], float]) – Lower bound of variables of the decision space.
upper_bound (Union[List[float], float]) – Upper bound of variables of the decision space.
blocks (Optional[Union[List[int], np.ndarray, int]], optional) – Number of blocks per dimension, by default None.
force (bool, optional) – The recommended number of blocks per dimension is >2 and the minimum number of observations per cell is 3. This means that X has to have at least blocks^dim * 3 observations. This requirement can be circumvented by setting force to True. ATTENTION: The resulting feature values are then not in line with any recommendation and may not have any predictive power. By default False.

Returns:

Dictionary consisting of the calculated features.

Return type:

Dict[str, Optional[Union[int, float]]]

pflacco.classical_ela_features.calculate_nbc(X: Union[DataFrame, ndarray, List[List[float]]], y: Union[Series, ndarray, List[float]], fast_k: float = 0.05, dist_tie_breaker: str = 'sample', minimize: bool = True) → Dict[str, Union[int, float]]

Nearest Better Clustering features. Computes features based on the comparison of nearest neighbour and nearest better neighbour, i.e., the nearest neighbour with a better performance / objective value.

nn_nb.{sd, mean}_ratio: ratio of standard deviations and arithmetic mean based on the distances among the nearest neighbours and the nearest better neighbours
nn_nb.cor: correlation between distances of the nearest neighbours and the distances of the nearest better neighbours
dist_ratio.coeff_var: coefficient of variation of the distance ratios
nb_fitness.cor: correlation between fitness value and count of observations for which the current observation is the nearest better neighbour (the so-called “indegree”)
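
The two distance sets that these features compare can be sketched with a brute-force search. This is an illustration only: the function name nearest_distances is hypothetical, minimization is assumed, no fast_k shortcut or tie-breaking strategy is applied, and the best observation is simply skipped because it has no better neighbour.

```python
import math
from statistics import mean

def nearest_distances(X, y):
    """Distances to the nearest neighbour and to the nearest better neighbour.

    Hypothetical brute-force helper assuming minimization. The best
    observation contributes no nearest-better distance.
    """
    nn, nb = [], []
    for i, p in enumerate(X):
        others = [k for k in range(len(X)) if k != i]
        nn.append(min(math.dist(p, X[k]) for k in others))
        better = [k for k in others if y[k] < y[i]]
        if better:
            nb.append(min(math.dist(p, X[k]) for k in better))
    return nn, nb

X = [(0.0,), (1.0,), (3.0,), (7.0,)]
y = [0.0, 1.0, 2.0, 3.0]
nn, nb = nearest_distances(X, y)
mean_ratio = mean(nn) / mean(nb)  # the idea behind a mean ratio of the two sets
print(nn, nb, mean_ratio)
```

When nearest better neighbours are much farther away than nearest neighbours, the ratio becomes small, which hints at several separated basins.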

Parameters:

X (Union[pd.DataFrame, np.ndarray, List[List[float]]]) – A collection-like object which contains a sample of the decision space. Can be created with pflacco.sampling.create_initial_sample().
y (Union[pd.Series, np.ndarray, List[float]]) – A list-like object which contains the respective objective values of X.
fast_k (float, optional) – Controls the percentage of observations that should be considered when looking for the nearest better neighbour quickly. If no better neighbour is found, the whole sample is considered, by default 0.05.
dist_tie_breaker (str, optional) – Strategy to break ties between observations. Currently allows ‘sample’ (which samples randomly), ‘first’ (which deterministically picks the first found occurrence), and ‘last’ (which deterministically picks the last found occurrence), by default ‘sample’.
minimize (bool, optional) – Indicator whether the objective function should be minimized or maximized, by default True.

Returns:

Dictionary consisting of the calculated features.

Return type:

Dict[str, Union[int, float]]

pflacco.classical_ela_features.calculate_pca(X: Union[DataFrame, ndarray, List[List[float]]], y: Union[Series, ndarray, List[float]], prop_cov_x: float = 0.9, prop_cor_x: float = 0.9, prop_cov_init: float = 0.9, prop_cor_init: float = 0.9) → Dict[str, Union[int, float]]

Principal component (analysis) features.

expl_var.{cov, cor}_{x, init}: proportion of the explained variance when applying PCA to the covariance / correlation matrix of the decision space (x) or the entire initial design (init)
expl_var_PC1.{cov, cor}_{x, init}: proportion of variance, which is explained by the first principal component when applying PCA to the covariance / correlation matrix of the decision space (x) or the entire initial design (init)

Parameters:

X (Union[pd.DataFrame, np.ndarray, List[List[float]]]) – A collection-like object which contains a sample of the decision space. Can be created with pflacco.sampling.create_initial_sample().
y (Union[pd.Series, np.ndarray, List[float]]) – A list-like object which contains the respective objective values of X.
prop_cov_x (float, optional) – Proportion of the explained variance by the first PC based on the covariance matrix of the decision space (x), by default 0.9.
prop_cor_x (float, optional) – Proportion of the explained variance by the first PC based on the correlation matrix of the decision space (x), by default 0.9.
prop_cov_init (float, optional) – Proportion of the explained variance by the first PC based on the covariance matrix of the entire initial design (init), by default 0.9.
prop_cor_init (float, optional) – Proportion of the explained variance by the first PC based on the correlation matrix of the entire initial design (init), by default 0.9.

Returns:

Dictionary consisting of the calculated features.

Return type:

Dict[str, Union[int, float]]