matcal.core.surrogates
Functions
|
Negative Log Predictive Density. Only applicable for GPR. |
|
Classes
|
This class takes the results of the |
|
This class takes the results of the |
|
This class is responsible for taking source data and a parameter set and generating an efficient surrogate for predicting probe based quantities of interest. |
- class matcal.core.surrogates.SurrogateGenerator(evaluation_information, interpolation_field=None, interpolation_locations=200, training_fraction=0.8, surrogate_type='PCA Multiple Regressors', regressor_type='Gaussian Process', test_eval_info=None, **regressor_kwargs)[source]
This class is responsible for taking source data and a parameter set and generating an efficient surrogate for predicting probe-based quantities of interest. The generator uses Principal Component Analysis (PCA) to build an efficient representation of the data and then trains a predictor in the latent space identified by the PCA. To perform these calculations, sklearn is used for the scaling, PCA, and predictor training.
- Parameters:
evaluation_information (StudyResults) – A container of the information needed to build a surrogate from a body of data. This is intended to be the results of a sampling study conducted with MatCal. In addition, joblib files from previously generated surrogates can be passed to rerun the surrogate generation process with new settings.
training_fraction (float) – The fraction of the source data to use as training data. The value must satisfy 0 < training_fraction <= 1. If training_fraction == 1, test_eval_info must be provided.
interpolation_field (str) – the field that will be the independent field for surrogate results.
surrogate_type (str) – The type of surrogate to generate. Each type is described in that surrogate’s documentation. Currently the only available options are “PCA Multiple Regressors” and “PCA Monolithic Regressor”. The default is “PCA Multiple Regressors” because it has better performance, although it uses more memory than the monolithic surrogate.
regressor_type (str) – The identifier key for what core regressor form to use as the predictor. Only “Random Forest” and “Gaussian Process” are accepted. Currently, MatCal uses the implementations of these tools from the sklearn library.
test_eval_info – A container of the relevant information to test a surrogate generated from a MatCal sampling study. This data is only used and must be provided if training_fraction == 1.0.
regressor_kwargs – A keyword selection of parameters to pass to the predictor used. Please refer to the sklearn documentation for more information for what can be passed to the predictors.
interpolation_locations (int or array-like) – The number of interpolation locations for the surrogate to output at, or an array-like of values for the interpolation locations. If a number of locations is given, the surrogate will linearly space the points between the minimum and maximum values of the interpolation field across all evaluations.
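The linear spacing described for interpolation_locations can be sketched with the standard library alone. This is not MatCal's implementation: the helper name `linear_locations` and the sample data are hypothetical, and the sketch assumes that the minimum and maximum "for all evaluations" mean the global extrema across every evaluation.

```python
def linear_locations(evaluations, count):
    """Return `count` evenly spaced points spanning all evaluations.

    `evaluations` is a list of sequences, one per evaluation, holding the
    values of the interpolation field (hypothetical data layout).
    """
    if count < 2:
        raise ValueError("count must be at least 2")
    # Assumption: use the global extrema over every evaluation.
    low = min(min(values) for values in evaluations)
    high = max(max(values) for values in evaluations)
    step = (high - low) / (count - 1)
    return [low + i * step for i in range(count)]


# Three hypothetical evaluations of the interpolation field (e.g. time).
evaluations = [[0.0, 0.5, 1.0], [0.1, 0.6, 1.2], [0.0, 0.4, 0.9]]
locations = linear_locations(evaluations, 5)
```

Passing an explicit array-like instead of a count skips this step, since the locations are then given directly.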
- set_model_and_state(model_name=None, state=None)[source]
Set the evaluation set and state to select from the study results.
- Parameters:
model_name – This is the model name for which the surrogate will generate results. If no argument is passed, the surrogate generator will expect the study to have a single model.
state (str or State) – This specifies the state of the model for which the surrogate will generate results. It can be either a State object or a state name. If no argument is provided, this method will assume that only a single state is associated with the model for which the surrogate is being generated.
- set_PCA_details(decomp_var=0.99, reconstruction_error=None)[source]
- Parameters:
decomp_var (float) – The fraction of the total variance that the PCA decomposition should account for. Values closer to 1 keep more modes than lower values. The more modes that are kept, the more difficult it can become to train the predictors. The default of 0.99 is a common conventional choice: it explains the vast majority of the observed behavior and, for an appropriate data set, can lead to very few modes being retained.
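To make the effect of decomp_var concrete, the sketch below counts how many modes are needed to reach a target fraction of the total variance. It is an illustration only: the function name `modes_to_keep` and the per-mode variances are hypothetical, and how MatCal passes decomp_var to sklearn is not stated in this documentation.

```python
from itertools import accumulate


def modes_to_keep(mode_variances, decomp_var=0.99):
    """Smallest number of leading modes whose variance share reaches decomp_var.

    `mode_variances` must be sorted from largest to smallest, as PCA
    variances normally are.
    """
    total = sum(mode_variances)
    for count, cumulative in enumerate(accumulate(mode_variances), start=1):
        if cumulative / total >= decomp_var:
            return count
    return len(mode_variances)


# Hypothetical per-mode variances; cumulative shares are 0.6, 0.9, 0.975, 0.995, 1.0.
variances = [6.0, 3.0, 0.75, 0.2, 0.05]
kept_default = modes_to_keep(variances)        # target of 0.99
kept_relaxed = modes_to_keep(variances, 0.95)  # lower target keeps fewer modes
```

With these numbers, lowering the target from 0.99 to 0.95 drops one mode, which is the trade-off between fidelity and training difficulty that the parameter description refers to.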
- set_surrogate_details(surrogate_type='PCA Multiple Regressors', regressor_type='Gaussian Process', training_fraction=0.8, interpolation_locations=None, test_eval_info=None, **regressor_kwargs)[source]
This method provides another way to alter the surrogate generation parameters after initialization.
- Parameters:
surrogate_type (str) – The type of surrogate to generate. Each type is described in that surrogate’s documentation. Currently the only available options are “PCA Multiple Regressors” and “PCA Monolithic Regressor”. The default is “PCA Multiple Regressors” because it has better performance, although it uses more memory than the monolithic surrogate.
training_fraction (float) – The fraction of the source data to use as training data. The value must satisfy 0 < training_fraction <= 1. If training_fraction == 1, test_eval_info must be provided.
regressor_type (str) – The identifier key for what core regressor form to use as the predictor. Only “Random Forest” and “Gaussian Process” are accepted. Currently, MatCal uses the implementations of these tools from the sklearn library.
test_eval_info – A container of the relevant information to test a surrogate generated from a MatCal sampling study. This data is only used and must be provided if training_fraction == 1.0.
regressor_kwargs – A keyword selection of parameters to pass to the predictor used. Please refer to the sklearn documentation for more information for what can be passed to the predictors.
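The role of training_fraction, and the reason separate test data is needed when it equals 1, can be illustrated with a short sketch. The helper `split_samples`, the shuffling, and the rounding rule are assumptions made for illustration; the documentation does not describe how MatCal partitions the samples.

```python
import random


def split_samples(n_samples, training_fraction=0.8, seed=0):
    """Split sample indices into training and test subsets (illustrative only)."""
    if not 0 < training_fraction <= 1:
        raise ValueError("training_fraction must satisfy 0 < f <= 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumption: random assignment
    n_train = round(n_samples * training_fraction)
    return indices[:n_train], indices[n_train:]


train, test = split_samples(10, 0.8)       # 8 training samples, 2 test samples
all_train, no_test = split_samples(10, 1)  # nothing is left over for testing
```

When the fraction is 1 the test subset is empty, so scores on unseen data can only come from an independent data set such as the one supplied through test_eval_info.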
- set_fields_to_log_scale(*field_names)[source]
For fields of interest that span several orders of magnitude, it can be easier to train on the natural log of the data rather than on the raw data. Passing fields here informs the surrogate and the generator that these fields should be evaluated on the natural log scale. Any predictions given by the surrogate will be at the original scale; log scaling only adds a scaling/descaling step inside the surrogate. Note that data with values less than or equal to zero must be scaled or modified by the user before the field is selected for log scaling.
- Parameters:
field_names (str) – a series of field names to train on the log scale
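The scaling/descaling round trip described above amounts to a natural log before training and an exponential after prediction. The sketch below shows only that round trip; the helper names are hypothetical and MatCal's internal handling may differ.

```python
import math


def to_log_scale(values):
    """Map strictly positive data to the natural log scale."""
    if any(v <= 0 for v in values):
        # Mirrors the documented restriction on non-positive data.
        raise ValueError("log scaling requires strictly positive data")
    return [math.log(v) for v in values]


def from_log_scale(log_values):
    """Map log-scale values (e.g. predictions) back to the original scale."""
    return [math.exp(v) for v in log_values]


raw = [1.0e-3, 1.0, 1.0e3]          # spans six orders of magnitude
logged = to_log_scale(raw)          # spans roughly -6.9 to 6.9
restored = from_log_scale(logged)   # back at the original scale
```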
- set_fields_of_interest(*fields_of_interest)[source]
Specify which data fields the surrogate should model.
By default the surrogate generator attempts to build a model for every field present in the source data (aside from the independent interpolation field). Use this method to limit the surrogate to a user‑selected subset of fields.
- Parameters:
fields_of_interest (str) – One or more field names that should be included in the surrogate model.
Note
The independent interpolation field (if any) is never treated as a field of interest and is automatically excluded; you should not pass it here.
Fields that are not listed will be ignored during surrogate generation and will not appear in the surrogate’s output.
- generate(save_filename: str, preprocessing_function: Callable = None, plot_n_worst: int = 0) → Callable[source]
Generates a surrogate based on the information passed to it upon initialization
- Parameters:
save_filename (str) – the base of a filename without any extensions to be used to record the surrogate.
preprocessing_function (Callable) – an optional function that modifies the model data before it is passed to the tools that generate the surrogate model.
source_data_dict (dict(str, Array-Like)) – a dictionary of training data from which to generate the surrogate. Its keys are the field names for the data; rows contain data samples and columns are the data points at each independent variable location. Not intended to be an argument for users. Passing data this way will take the place of any other data source.
plot_n_worst (int) – Generate plots that show the worst recreations made by the surrogate. The number of plots made is equal to the value passed to this argument. Any value less than 1 results in no plots being generated and no worst-case analysis being performed.
- Returns:
a callable surrogate
- Return type:
Callable
- class matcal.core.surrogates.MatCalSurrogateBase(latent_scores, fields_to_log_scale, interp_field, interp_locs, parameter_scaler, regressors, decomposers, data_scalers, latent_scalers, param_ranges)[source]
Surrogate abstract base class from which all surrogates should be derived in MatCal.
- property scores
The test and train R2 scores for the surrogate.
- property max_errors
The test and train max errors for the surrogate in the given field’s units.
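For readers unfamiliar with the two metrics, the sketch below computes an R2 score and a max error from their standard definitions. How MatCal aggregates these values over fields, locations, and the train and test sets is not documented here, so the function names and data are illustrative only.

```python
def r2_score(truth, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(truth) / len(truth)
    ss_res = sum((t - p) ** 2 for t, p in zip(truth, predicted))
    ss_tot = sum((t - mean) ** 2 for t in truth)
    return 1.0 - ss_res / ss_tot


def max_error(truth, predicted):
    """Largest absolute difference, in the units of the field."""
    return max(abs(t - p) for t, p in zip(truth, predicted))


# Hypothetical reference values and surrogate predictions for one field.
truth = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
score = r2_score(truth, predicted)
worst = max_error(truth, predicted)
```

A score close to 1 on the training data combined with a much lower score on the test data is a common sign of overfitting.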
- enforce_training_data_parameter_range(enforce_training_data_parameter_range=True)[source]
By default the surrogate will error if called with a parameter set outside of the parameter ranges used in the training data set. To call the surrogate for parameters outside of the training data range, call this method with the argument set to False. Adherence to the training data range can be reactivated by calling this method with the argument set to True.
- Parameters:
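enforce_training_data_parameter_range (bool) – Flag that controls whether the training data parameter range is enforced. Pass False to allow calls outside of the range.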
- set_parameter_ranges(*args, **param_ranges)[source]
Update the admissible parameter ranges that the user can call the surrogate to evaluate.
The surrogate stores, for each input parameter, a lower and upper bound that define the region of parameter space where the surrogate is considered valid. When the surrogate is called, values that fall outside of these ranges trigger a RuntimeError unless enforce_training_data_parameter_range() has been disabled.
Only keyword arguments are accepted; each keyword corresponds to a parameter name and must map to a two-element sequence (lower, upper) describing the allowed range for that parameter.
- Parameters:
param_ranges (dict or OrderedDict where each value is an iterable of two float/int values) – Mapping of parameter names to (lower, upper) bounds.
- Raises:
RuntimeError – If any positional arguments are supplied, or if a required parameter is missing from param_ranges.
RuntimeError – If a supplied parameter name is not part of the surrogate’s parameter_order (i.e., it was not present in the training data).
ValueError – If the lower bound is greater than the upper bound for any parameter.
TypeError – If either bound is not a real number (i.e., not an instance of numbers.Real).
Example
>>> surrogate.set_parameter_ranges(
...     temperature=(300.0, 800.0),
...     pressure=(1e5, 5e5)
... )
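The documented checks can be summarized in a stand-alone sketch. The function `validate_ranges` is hypothetical and is not part of MatCal; it only mirrors the error conditions listed for set_parameter_ranges, and the order in which the checks run is an assumption.

```python
import numbers


def validate_ranges(parameter_order, *args, **param_ranges):
    """Check (lower, upper) ranges against the documented error conditions."""
    if args:
        raise RuntimeError("only keyword arguments are accepted")
    for name in param_ranges:
        if name not in parameter_order:
            raise RuntimeError(f"unknown parameter: {name}")
    for name in parameter_order:
        if name not in param_ranges:
            raise RuntimeError(f"missing range for parameter: {name}")
    for name, (lower, upper) in param_ranges.items():
        for bound in (lower, upper):
            if not isinstance(bound, numbers.Real):
                raise TypeError(f"bounds for {name} must be real numbers")
        if lower > upper:
            raise ValueError(f"lower bound exceeds upper bound for {name}")
    # Return the ranges in the surrogate's parameter order.
    return {name: tuple(param_ranges[name]) for name in parameter_order}


order = ["temperature", "pressure"]  # hypothetical parameter_order
ranges = validate_ranges(order, temperature=(300.0, 800.0), pressure=(1e5, 5e5))
```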
- matcal.core.surrogates.nlpd(regressor, input_values, evals)[source]
Negative Log Predictive Density (NLPD). Only applicable for Gaussian process regressors (GPR).
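As background, the negative log predictive density of a Gaussian prediction can be computed from the predicted mean and standard deviation at each point, which is why the metric only applies to regressors that report predictive uncertainty. The sketch below uses the standard Gaussian formula and averages over points; the function name, its arguments, and the choice to average rather than sum are assumptions and do not reflect the signature or internals of nlpd().

```python
import math


def gaussian_nlpd(means, std_devs, observations):
    """Mean negative log density of observations under Gaussian predictions."""
    terms = [
        0.5 * math.log(2.0 * math.pi * s ** 2) + (y - m) ** 2 / (2.0 * s ** 2)
        for m, s, y in zip(means, std_devs, observations)
    ]
    return sum(terms) / len(terms)


# Hypothetical predictive means, standard deviations, and observed values.
means = [0.0, 1.0, 2.0]
std_devs = [1.0, 0.5, 0.5]
observed = [0.1, 1.2, 1.9]
value = gaussian_nlpd(means, std_devs, observed)
```

Lower values are better: the metric rewards predictions that are both accurate and appropriately confident, and penalizes overconfident misses heavily.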
- class matcal.core.surrogates.MatCalPCASurrogateBase(latent_scores, fields_to_log_scale, interp_field, interp_locs, parameter_scaler, regressors, decomposers, data_scalers, latent_scalers, param_ranges)[source]
Surrogate abstract base class from which all surrogates should be derived in MatCal.
- property parameter_order
A list of strings that describe the correct order to input parameters into the surrogate prediction.
- property independent_field
The name of the independent field used in the surrogate prediction
- property prediction_locations
The array of locations that the surrogate predicts at
- enforce_training_data_parameter_range(enforce_training_data_parameter_range=True)
By default the surrogate will error if called with a parameter set outside of the parameter ranges used in the training data set. To call the surrogate for parameters outside of the training data range, call this method with the argument set to False. Adherence to the training data range can be reactivated by calling this method with the argument set to True.
- Parameters:
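enforce_training_data_parameter_range (bool) – Flag that controls whether the training data parameter range is enforced. Pass False to allow calls outside of the range.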
- property max_errors
The test and train max errors for the surrogate in the given field’s units.
- property scores
The test and train R2 scores for the surrogate.
- set_parameter_ranges(*args, **param_ranges)
Update the admissible parameter ranges that the user can call the surrogate to evaluate.
The surrogate stores, for each input parameter, a lower and upper bound that define the region of parameter space where the surrogate is considered valid. When the surrogate is called, values that fall outside of these ranges trigger a RuntimeError unless enforce_training_data_parameter_range() has been disabled.
Only keyword arguments are accepted; each keyword corresponds to a parameter name and must map to a two-element sequence (lower, upper) describing the allowed range for that parameter.
- Parameters:
param_ranges (dict or OrderedDict where each value is an iterable of two float/int values) – Mapping of parameter names to (lower, upper) bounds.
- Raises:
RuntimeError – If any positional arguments are supplied, or if a required parameter is missing from param_ranges.
RuntimeError – If a supplied parameter name is not part of the surrogate’s parameter_order (i.e., it was not present in the training data).
ValueError – If the lower bound is greater than the upper bound for any parameter.
TypeError – If either bound is not a real number (i.e., not an instance of numbers.Real).
Example
>>> surrogate.set_parameter_ranges(
...     temperature=(300.0, 800.0),
...     pressure=(1e5, 5e5)
... )
- class matcal.core.surrogates.MatCalMonolithicPCASurrogate(latent_scores, fields_to_log_scale, interp_field, interp_locs, parameter_scaler, regressors, decomposers, data_scalers, latent_scalers, param_ranges)[source]
This class takes the results of generate() and creates a callable object that can generate predictions.
- Parameters:
surrogate_information – The file path to, or the lists of information generated by, generate().
Surrogate abstract base class from which all surrogates should be derived in MatCal.
- enforce_training_data_parameter_range(enforce_training_data_parameter_range=True)
By default the surrogate will error if called with a parameter set outside of the parameter ranges used in the training data set. To call the surrogate for parameters outside of the training data range, call this method with the argument set to False. Adherence to the training data range can be reactivated by calling this method with the argument set to True.
- Parameters:
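enforce_training_data_parameter_range (bool) – Flag that controls whether the training data parameter range is enforced. Pass False to allow calls outside of the range.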
- property independent_field
The name of the independent field used in the surrogate prediction
- property max_errors
The test and train max errors for the surrogate in the given field’s units.
- property parameter_order
A list of strings that describe the correct order to input parameters into the surrogate prediction.
- property prediction_locations
The array of locations that the surrogate predicts at
- property scores
The test and train R2 scores for the surrogate.
- set_parameter_ranges(*args, **param_ranges)
Update the admissible parameter ranges that the user can call the surrogate to evaluate.
The surrogate stores, for each input parameter, a lower and upper bound that define the region of parameter space where the surrogate is considered valid. When the surrogate is called, values that fall outside of these ranges trigger a RuntimeError unless enforce_training_data_parameter_range() has been disabled.
Only keyword arguments are accepted; each keyword corresponds to a parameter name and must map to a two-element sequence (lower, upper) describing the allowed range for that parameter.
- Parameters:
param_ranges (dict or OrderedDict where each value is an iterable of two float/int values) – Mapping of parameter names to (lower, upper) bounds.
- Raises:
RuntimeError – If any positional arguments are supplied, or if a required parameter is missing from param_ranges.
RuntimeError – If a supplied parameter name is not part of the surrogate’s parameter_order (i.e., it was not present in the training data).
ValueError – If the lower bound is greater than the upper bound for any parameter.
TypeError – If either bound is not a real number (i.e., not an instance of numbers.Real).
Example
>>> surrogate.set_parameter_ranges(
...     temperature=(300.0, 800.0),
...     pressure=(1e5, 5e5)
... )
- class matcal.core.surrogates.MatCalMultiModalPCASurrogate(latent_scores, fields_to_log_scale, interp_field, interp_locs, parameter_scaler, regressors, decomposers, data_scalers, latent_scalers, param_ranges)[source]
This class takes the results of generate() and creates a callable object that can generate predictions.
- Parameters:
surrogate_information – The file path to, or the lists of information generated by, generate().
Surrogate abstract base class from which all surrogates should be derived in MatCal.
- enforce_training_data_parameter_range(enforce_training_data_parameter_range=True)
By default the surrogate will error if called with a parameter set outside of the parameter ranges used in the training data set. To call the surrogate for parameters outside of the training data range, call this method with the argument set to False. Adherence to the training data range can be reactivated by calling this method with the argument set to True.
- Parameters:
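enforce_training_data_parameter_range (bool) – Flag that controls whether the training data parameter range is enforced. Pass False to allow calls outside of the range.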
- property independent_field
The name of the independent field used in the surrogate prediction
- property max_errors
The test and train max errors for the surrogate in the given field’s units.
- property parameter_order
A list of strings that describe the correct order to input parameters into the surrogate prediction.
- property prediction_locations
The array of locations that the surrogate predicts at
- property scores
The test and train R2 scores for the surrogate.
- set_parameter_ranges(*args, **param_ranges)
Update the admissible parameter ranges that the user can call the surrogate to evaluate.
The surrogate stores, for each input parameter, a lower and upper bound that define the region of parameter space where the surrogate is considered valid. When the surrogate is called, values that fall outside of these ranges trigger a RuntimeError unless enforce_training_data_parameter_range() has been disabled.
Only keyword arguments are accepted; each keyword corresponds to a parameter name and must map to a two-element sequence (lower, upper) describing the allowed range for that parameter.
- Parameters:
param_ranges (dict or OrderedDict where each value is an iterable of two float/int values) – Mapping of parameter names to (lower, upper) bounds.
- Raises:
RuntimeError – If any positional arguments are supplied, or if a required parameter is missing from param_ranges.
RuntimeError – If a supplied parameter name is not part of the surrogate’s parameter_order (i.e., it was not present in the training data).
ValueError – If the lower bound is greater than the upper bound for any parameter.
TypeError – If either bound is not a real number (i.e., not an instance of numbers.Real).
Example
>>> surrogate.set_parameter_ranges(
...     temperature=(300.0, 800.0),
...     pressure=(1e5, 5e5)
... )