pflacco.misc_features

pflacco.misc_features.calculate_fitness_distance_correlation(X: Union[DataFrame, ndarray, List[List[float]]], y: Union[Series, ndarray, List[float]], f_opt: Optional[float] = None, proportion_of_best: float = 0.1, minimize: bool = True, minkowski_p: int = 2) → Dict[str, Union[int, float]]

Calculation of Fitness Distance Correlation features in accordance with [1] and [2].

fd_{correlation, cov}: Correlation/Covariance between the fitness values f_i and the respective distance d_i, where d_i is the distance in the decision space between the given observation x_i and the sampled x*
distance_{mean, std}: Mean and standard deviation of all distances
fitness_{mean, std}: Mean and standard deviation of all fitness values

Parameters:

X (Union[pd.DataFrame, np.ndarray, List[List[float]]]) – A collection-like object which contains a sample of the decision space. Can be created with pflacco.sampling.create_initial_sample().
y (Union[pd.Series, np.ndarray, List[float]]) – A list-like object which contains the respective objective values of X.
f_opt (Optional[float], optional) – Objective value of the global optimum (if known), by default None.
proportion_of_best (float, optional) – Proportion used to split the provided observations X and y into the top `proportion_of_best * 100`% of individuals and the remainder. Must be within the interval (0, 1], by default 0.1.
minimize (bool, optional) – Indicator whether the objective function should be minimized or maximized, by default True.
minkowski_p (int, optional) – The p-norm to apply for Minkowski, by default 2.

Returns:

Dictionary consisting of the calculated features.

Return type:

Dict[str, Union[int, float]]

References

[1] Jones, T. and Forrest, S., 1995, July.: Fitness distance correlation as a measure of problem difficulty for genetic algorithms. In ICGA (Vol. 95, pp. 184-192).
[2] Müller, C.L. and Sbalzarini, I.F., 2011, April.: Global characterization of the CEC 2005 fitness landscapes using fitness-distance analysis. In European conference on the applications of evolutionary computation (pp. 294-303).
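
Example

The following sketch illustrates the quantities listed above in plain Python. It is not pflacco's implementation: it assumes that x* is simply the best observation in the sample, it uses sample (n - 1) estimators, and it ignores f_opt and proportion_of_best. The helper names minkowski and fdc_sketch, as well as the dictionary keys, are hypothetical.

```python
import statistics


def minkowski(a, b, p=2):
    """Minkowski distance of order p between two points."""
    return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1.0 / p)


def fdc_sketch(X, y, minimize=True, p=2):
    # Assumption: x* is the best observation of the sample itself.
    sign = 1.0 if minimize else -1.0
    best = min(range(len(y)), key=lambda i: sign * y[i])
    d = [minkowski(x, X[best], p) for x in X]

    y_mean, d_mean = statistics.fmean(y), statistics.fmean(d)
    y_std, d_std = statistics.stdev(y), statistics.stdev(d)
    # Sample covariance with n - 1 in the denominator (an assumption).
    cov = sum((yi - y_mean) * (di - d_mean) for yi, di in zip(y, d)) / (len(y) - 1)
    return {
        "fd_correlation": cov / (y_std * d_std),
        "fd_cov": cov,
        "distance_mean": d_mean,
        "distance_std": d_std,
        "fitness_mean": y_mean,
        "fitness_std": y_std,
    }


# One-dimensional sphere function evaluated on four points.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [x[0] ** 2 for x in X]
features = fdc_sketch(X, y)
print(features)
```

With pflacco installed, the corresponding call according to the signature above would be calculate_fitness_distance_correlation(X, y), with minimize=False for maximization problems.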

pflacco.misc_features.calculate_gradient_features(f: Callable[[List[float]], float], dim: int, lower_bound: Union[List[float], float], upper_bound: Union[List[float], float], step_size: Optional[float] = None, budget_factor_per_dim: int = 100, seed: Optional[int] = None) → Dict[str, Union[int, float]]

Calculation of Gradient features in accordance with [1]. A random walk is performed, and the gradient of the fitness space between each pair of consecutive steps is estimated.

g_avg: Average of the estimated gradients
g_std: Standard deviation of the estimated gradients

Parameters:

f (Callable[[List[float]], float]) – Objective function to be optimized.
dim (int) – Dimensionality of the decision space.
lower_bound (Union[List[float], float]) – Lower bound of variables of the decision space.
upper_bound (Union[List[float], float]) – Upper bound of variables of the decision space.
step_size (Optional[float], optional) – Step size of the random walk, by default None.
budget_factor_per_dim (int, optional) – The realized budget is calculated with budget_factor_per_dim * dim, by default 100.
seed (Optional[int], optional) – Seed for reproducibility, by default None.

Returns:

Dictionary consisting of the calculated features.

Return type:

Dict[str, Union[int, float]]

References

[1] Malan, K.M. and Engelbrecht, A.P., 2013, June.: Ruggedness, funnels and gradients in fitness landscapes and the effect on PSO performance. In 2013 IEEE Congress on Evolutionary Computation (pp. 963-970). IEEE.
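
Example

The idea of estimating gradients along a random walk can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not pflacco's implementation: each step changes one randomly chosen variable by plus or minus step_size, the walk is reflected at the bounds, the default step size of 5% of the variable range is chosen for this sketch only, the features are computed on gradient magnitudes, and any normalization used in [1] is omitted. The name gradient_sketch is hypothetical.

```python
import random
import statistics


def gradient_sketch(f, dim, lower, upper, step_size=None,
                    budget_factor_per_dim=100, seed=None):
    rng = random.Random(seed)
    if step_size is None:
        # Assumption for this sketch only: 5% of the variable range.
        step_size = (upper - lower) / 20.0

    x = [rng.uniform(lower, upper) for _ in range(dim)]
    fx = f(x)
    magnitudes = []
    # The budget is treated here as the number of random walk steps.
    for _ in range(budget_factor_per_dim * dim):
        i = rng.randrange(dim)
        delta = rng.choice((-step_size, step_size))
        if not lower <= x[i] + delta <= upper:
            delta = -delta  # reflect the step at the boundary
        x_new = list(x)
        x_new[i] += delta
        f_new = f(x_new)
        # Finite-difference estimate of the gradient along this step.
        magnitudes.append(abs(f_new - fx) / step_size)
        x, fx = x_new, f_new

    return {
        "g_avg": statistics.fmean(magnitudes),
        "g_std": statistics.stdev(magnitudes),
    }


def sphere(x):
    return sum(v * v for v in x)


print(gradient_sketch(sphere, dim=2, lower=-5.0, upper=5.0, seed=1))
```

For a linear objective the estimated magnitudes are constant, so g_std is close to zero, while a rugged objective yields a larger spread.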

pflacco.misc_features.calculate_hill_climbing_features(f: Callable[[List[float]], float], dim: int, lower_bound: Union[List[float], float], upper_bound: Union[List[float], float], n_runs: int = 100, budget_factor_per_run: int = 1000, method: str = 'L-BFGS-B', minimize: bool = True, seed: Optional[int] = None, minkowski_p: int = 2) → Dict[str, Union[int, float]]

Calculation of Hill Climbing features in accordance with [1]. The feature set is calculated from a number of independent hill climbing runs.

{avg, std}_dist_between_opt: Average and standard deviation of the distances between the found optima
{avg, std}_dist_local_to_global: Average and standard deviation of the distances between the best found optimum and all other local optima

Parameters:

f (Callable[[List[float]], float]) – Objective function to be optimized.
dim (int) – Dimensionality of the decision space.
lower_bound (Union[List[float], float]) – Lower bound of variables of the decision space.
upper_bound (Union[List[float], float]) – Upper bound of variables of the decision space.
n_runs (int, optional) – Number of independent solver runs to create the sample, by default 100.
budget_factor_per_run (int, optional) – Budget factor for each individual solver run. The realized budget is calculated with budget_factor_per_run * dim, by default 1000.
method (str, optional) – Type of solver. Any method supported by scipy.optimize.minimize can be used, by default ‘L-BFGS-B’.
minimize (bool, optional) – Indicator whether the objective function should be minimized or maximized, by default True.
seed (Optional[int], optional) – Seed for reproducibility, by default None.
minkowski_p (int, optional) – The p-norm to apply for Minkowski, by default 2.

Returns:

Dictionary consisting of the calculated features.

Return type:

Dict[str, Union[int, float]]

References

[1] Abell, T., Malitsky, Y. and Tierney, K., 2013, January.: Features for exploiting black-box optimization problem structure. In International Conference on Learning and Intelligent Optimization (pp. 30-36).
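
Example

The sketch below shows how the distance features described above could be derived from a set of local searches. It is an illustration only and not pflacco's implementation: a simple compass search stands in for the scipy.optimize.minimize solvers, only minimization is handled, and sample standard deviations are used. The names minkowski, compass_search and hill_climbing_sketch are hypothetical.

```python
import random
import statistics
from itertools import combinations


def minkowski(a, b, p=2):
    """Minkowski distance of order p between two points."""
    return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1.0 / p)


def compass_search(f, x, lower, upper, budget, step=1.0, tol=1e-6):
    """Minimize f by coordinate-wise trial moves, halving the step when stuck."""
    fx, evals = f(x), 1
    # The budget is checked once per sweep over all coordinates.
    while step > tol and evals < budget:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                cand = list(x)
                cand[i] = min(max(cand[i] + delta, lower), upper)
                fc = f(cand)
                evals += 1
                if fc < fx:
                    x, fx, improved = cand, fc, True
        if not improved:
            step /= 2.0
    return x, fx


def hill_climbing_sketch(f, dim, lower, upper, n_runs=100,
                         budget_factor_per_run=1000, seed=None, p=2):
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        start = [rng.uniform(lower, upper) for _ in range(dim)]
        runs.append(compass_search(f, start, lower, upper,
                                   budget_factor_per_run * dim))

    optima = [x for x, _ in runs]
    best = min(runs, key=lambda run: run[1])[0]
    # Pairwise distances between all found optima.
    between = [minkowski(a, b, p) for a, b in combinations(optima, 2)]
    # Distances from the best found optimum to every other optimum.
    to_best = [minkowski(x, best, p) for x in optima if x is not best]
    return {
        "avg_dist_between_opt": statistics.fmean(between),
        "std_dist_between_opt": statistics.stdev(between),
        "avg_dist_local_to_global": statistics.fmean(to_best),
        "std_dist_local_to_global": statistics.stdev(to_best),
    }


def double_well(x):
    # Two equally good basins at x[0] = -1 and x[0] = +1.
    return (x[0] ** 2 - 1.0) ** 2 + x[1] ** 2


print(hill_climbing_sketch(double_well, dim=2, lower=-3.0, upper=3.0, seed=0))
```

On a unimodal function all runs end at nearly the same point, so the distances are close to zero. Several basins of attraction produce larger averages and spreads.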

pflacco.misc_features.calculate_length_scales_features(f: Callable[[List[float]], float], dim: int, lower_bound: Union[List[float], float], upper_bound: Union[List[float], float], budget_factor_per_dim: int = 100, seed: Optional[int] = None, minimize: bool = True, sample_size_from_kde: int = 500) → Dict[str, Union[int, float]]

Calculation of Length-Scale features in accordance with [1].

shanon_entropy: Entropy of the distribution of length scales, i.e. distances in the objective space divided by the corresponding distances in the decision space of a given sample
{mean, std}: Mean and standard deviation of said distribution
distribution.{second, third, fourth}_moment: Respective moments of the said distribution

Parameters:

f (Callable[[List[float]], float]) – Objective function to be optimized.
dim (int) – Dimensionality of the decision space.
lower_bound (Union[List[float], float]) – Lower bound of variables of the decision space.
upper_bound (Union[List[float], float]) – Upper bound of variables of the decision space.
budget_factor_per_dim (int, optional) – The realized budget is calculated with budget_factor_per_dim * (dim ** 2), by default 100.
seed (Optional[int], optional) – Seed for reproducibility, by default None.
minimize (bool, optional) – Indicator whether the objective function should be minimized or maximized, by default True.
sample_size_from_kde (int, optional) – Sample size which is sampled from the fitted kde distribution, by default 500.

Returns:

Dictionary consisting of the calculated features.

Return type:

Dict[str, Union[int, float]]

References

[1] Morgan, R. and Gallagher, M., 2017.: Analysing and characterising optimization problems using length scale. Soft Computing, 21(7), pp.1735-1752.
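
Example

The ratio underlying these features, a distance in the objective space divided by the corresponding distance in the decision space, can be sketched in plain Python as shown below. This is a simplified illustration rather than pflacco's implementation: points are drawn uniformly at random, Euclidean distances are assumed, and the kernel density estimate, the entropy and the higher moments are omitted. The name length_scale_sketch is hypothetical.

```python
import math
import random
import statistics
from itertools import combinations


def length_scale_sketch(f, dim, lower, upper, budget_factor_per_dim=100, seed=None):
    rng = random.Random(seed)
    # Sample size as described for budget_factor_per_dim above.
    n = budget_factor_per_dim * dim ** 2
    X = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n)]
    y = [f(x) for x in X]

    scales = []
    for i, j in combinations(range(n), 2):
        dist = math.dist(X[i], X[j])  # Euclidean distance (an assumption)
        if dist > 0.0:  # skip coinciding points
            scales.append(abs(y[i] - y[j]) / dist)

    return {"mean": statistics.fmean(scales), "std": statistics.stdev(scales)}


def sphere(x):
    return sum(v * v for v in x)


# A reduced budget keeps the number of point pairs small in this example.
print(length_scale_sketch(sphere, dim=2, lower=-5.0, upper=5.0,
                          budget_factor_per_dim=25, seed=0))
```

For a one-dimensional linear function every ratio equals the absolute slope, so the distribution collapses to a single value. Nonlinear functions spread the distribution out.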

pflacco.misc_features.calculate_sobol_indices_features(f: Callable[[List[float]], float], dim: int, lower_bound: Union[List[float], float], upper_bound: Union[List[float], float], sampling_coefficient: int = 10000, n_bins: int = 20, min_obs_per_bin_factor: float = 1.5, seed: Optional[int] = None) → Dict[str, Union[int, float]]

Calculation of Sobol Indices, Fitness- and State-Distribution features. These features are based on the Sobol method as well as on distribution moments of the raw samples and on histogram structures.

sobol_indices.degree_of_variable_interaction: Describes the degree of variable interaction
sobol_indices.coeff_var_x_sensitivy: Describes how sensitive the objective function reacts to changes in the decision space
fitness_variance: Variance of the objective values
state_variance: Variance of the averaged distances within a histogram bin
fitness_skewness: Skewness of the normalized objective values
state_skewness: Skewness of the averaged distances within a histogram bin

Parameters:

f (Callable[[List[float]], float]) – Objective function to be optimized.
dim (int) – Dimensionality of the decision space.
lower_bound (Union[List[float], float]) – Lower bound of variables of the decision space.
upper_bound (Union[List[float], float]) – Upper bound of variables of the decision space.
sampling_coefficient (int, optional) – Factor which determines the sample size. The actual sample size used in the paper is sampling_coefficient * (dim + 2), by default 10000.
n_bins (int, optional) – Number of bins used in the construction of the histogram, by default 20.
min_obs_per_bin_factor (float, optional) – Bins with fewer than min_obs_per_bin_factor * dim observations are ignored in the computation (see Equation 5 of [1]), by default 1.5.
seed (Optional[int], optional) – Seed for reproducibility, by default None.

Returns:

Dictionary consisting of the calculated features.

Return type:

Dict[str, Union[int, float]]

References

[1] Waibel, C., Mavromatidis, G. and Zhang, Y.W., 2020, July.: Fitness Landscape Analysis Metrics based on Sobol Indices and Fitness-and State-Distributions. In 2020 IEEE Congress on Evolutionary Computation (CEC) (pp. 1-8).
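
Example

To illustrate where a sample size of sampling_coefficient * (dim + 2) comes from, the sketch below estimates first-order Sobol indices with a pick-freeze estimator in plain Python. It is not pflacco's implementation. The estimator, the uniform sampling, and the definition of the degree of variable interaction as one minus the sum of the first-order indices are all assumptions of this sketch. The fitness- and state-distribution features are omitted, and the name first_order_sobol_sketch is hypothetical.

```python
import random
import statistics


def first_order_sobol_sketch(f, dim, lower, upper, n=10000, seed=None):
    rng = random.Random(seed)

    def sample():
        return [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n)]

    # Two independent sample matrices: 2 * n evaluations.
    A, B = sample(), sample()
    fA = [f(x) for x in A]
    fB = [f(x) for x in B]
    f_all = fA + fB
    f0 = statistics.fmean(f_all)
    var = statistics.fmean([(v - f0) ** 2 for v in f_all])

    indices = []
    for i in range(dim):
        # Rows of A with variable i taken from B: n evaluations per variable,
        # which gives n * (dim + 2) evaluations in total.
        fABi = [f(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        v_i = statistics.fmean(
            [(fb - f0) * (fab - fa) for fb, fab, fa in zip(fB, fABi, fA)]
        )
        indices.append(v_i / var)

    return {
        "first_order": indices,
        # Assumed definition for this sketch: share of variance that is not
        # explained by the first-order effects.
        "degree_of_variable_interaction": 1.0 - sum(indices),
    }


def additive(x):
    # No interaction between the variables.
    return x[0] + 2.0 * x[1]


print(first_order_sobol_sketch(additive, dim=2, lower=0.0, upper=1.0, n=2000, seed=0))
```

For the additive function the analytical first-order indices are 0.2 and 0.8, so the estimated degree of variable interaction should be close to zero up to sampling noise.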