matcal.core.adaptive_surrogates

This module contains adaptive surrogates.

Classes

`AdaptiveSurrogate`(target_field_name, ...)	Stores the surrogate along with its training and test information and its training progress.
`AdaptiveSurrogateStudyBase`(*parameters)
`KFoldCrossValidation`(nsplits, group_kfold, ...)	Initialize the K-Fold Cross-Validation with a given surrogate model.
`LeaveOneOutCrossValidation`(scale, metric, ...)	Initialize the LOOCV.
`SparseGridAdaptiveSurrogate`(...)	Create an `AdaptiveSurrogate` instance.
`SparseGridAdaptiveSurrogateStudy`(*parameters)	The SparseGridAdaptiveSurrogateStudy builds a Sparse Grid adaptive surrogate using the PyApprox library.
`VoronoiAdaptiveSurrogateStudy`(*parameters)	Initialize the VoronoiAdaptiveSurrogateStudy
`VoronoiTessellation`(points, bounds, finite_only)	Initialize the VoronoiBatchSamplingStudy

class matcal.core.adaptive_surrogates.AdaptiveSurrogate(target_field_name, indep_variable_name, indep_variable_values, variable_transformer, test_params, test_responses, param_names, bounds)[source]

Stores the surrogate along with its training and test information and its training progress.

The instance can also be called to make predictions with the surrogate models. Because every iteration of the surrogate is stored, any version of the surrogate can be called.

Create an AdaptiveSurrogate instance.

Parameters:

target_field_name (str) – Name of the model field that the surrogate will approximate (e.g., "temperature").
indep_variable_name (str) – Name of the auxiliary independent variable (e.g., "time" or "x_position") that will be attached to the surrogate output.
indep_variable_values (array‑like of real numbers) – The values of the independent variable at which the surrogate should be evaluated.
variable_transformer (object with map_to_canonical and map_from_canonical methods) – Object that maps model parameters to the canonical space required by the surrogate library.
test_params (numpy.ndarray of shape (n_parameters, n_test)) – Parameter samples used for testing the surrogate.
test_responses (numpy.ndarray of shape (n_test, n_qois)) – Corresponding model responses for the test parameter samples.
param_names (list[str]) – Ordered list of parameter names that define the mapping between positional arguments and model parameters.

The constructor stores the supplied information and prepares internal containers that will hold the surrogate objects, error histories and sample counts as the adaptive training proceeds.

enforce_training_data_parameter_range(enforce_training_data_parameter_range=True)[source]

By default the surrogate will error if called with a parameter set outside of the parameter ranges used in the training data set. To call the surrogate for parameters outside of the training data range, call this method with the argument set to False. Adherence to the training data range can be reactivated by calling this method with the argument set to True.

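Parameters:: enforce_training_data_parameter_range (bool) – if True (the default), enforce the training data parameter range; if False, allow the surrogate to be called outside of it.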
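The toggle described above can be pictured with a toy stand-in. This is only a sketch of the behavior, not the MatCal class: the class name `RangeCheckedSurrogate`, the dictionary-based bounds, and the choice of `ValueError` are all assumptions made for illustration.

```python
class RangeCheckedSurrogate:
    """Toy stand-in (hypothetical) that mimics the range-enforcement toggle."""

    def __init__(self, training_bounds, predictor):
        self._bounds = training_bounds  # e.g. {"k": (low, high)}
        self._predictor = predictor
        self._enforce = True            # enforcement is on by default

    def enforce_training_data_parameter_range(self, enforce=True):
        self._enforce = enforce

    def __call__(self, params):
        if self._enforce:
            for name, value in params.items():
                low, high = self._bounds[name]
                if not low <= value <= high:
                    # The exception type is an assumption for this sketch.
                    raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        return self._predictor(params)


surrogate = RangeCheckedSurrogate({"k": (0.0, 1.0)}, lambda p: 2.0 * p["k"])

try:
    surrogate({"k": 1.5})
    raised = False
except ValueError:
    raised = True

# Turn enforcement off to allow extrapolation beyond the training range.
surrogate.enforce_training_data_parameter_range(False)
extrapolated = surrogate({"k": 1.5})
```

Predictions outside the training range are extrapolations, so their accuracy is not covered by the test errors reported for the surrogate.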

property current_surrogate: Return the most recent surrogate (or None if no iteration yet).

property average_error_history

Returns the list of errors for the average error history. The average error is calculated using

$E_{avg} = \frac{\lVert \mathbf{R}_{\text{test}} - \hat{\mathbf{R}} \rVert_{2}} {N}$

where $N$ is the number of QoIs in the response, $\mathbf{R}_{\text{test}}$ are the test responses, and $\hat{\mathbf{R}}$ are the surrogate responses.
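As a rough illustration of the formula above, the following pure-Python sketch computes $E_{avg}$ for a single test response. The helper name `average_l2_error` and the flat-list inputs are assumptions made for this example; how MatCal aggregates the error over multiple test samples is not stated here.

```python
import math


def average_l2_error(test_response, surrogate_response):
    """L2 norm of the response difference divided by the number of QoIs, N."""
    if len(test_response) != len(surrogate_response):
        raise ValueError("responses must have the same length")
    n_qois = len(test_response)
    squared = sum((r - r_hat) ** 2
                  for r, r_hat in zip(test_response, surrogate_response))
    return math.sqrt(squared) / n_qois


# The differences are (3, 4), so the L2 norm is 5 and N is 2.
e_avg = average_l2_error([3.0, 4.0], [0.0, 0.0])
```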

property max_error_history

Returns the list of errors for the max error history. The max error is calculated using

$E_{max} = \lVert \mathbf{R}_{\text{test}} - \hat{\mathbf{R}} \rVert_{\infty}$

where $\mathbf{R}_{\text{test}}$ are the test responses and $\hat{\mathbf{R}}$ are the surrogate responses.
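The infinity norm in the formula above is the largest absolute difference between corresponding entries. A minimal sketch, assuming flat lists for a single response and a hypothetical helper name `max_abs_error`:

```python
def max_abs_error(test_response, surrogate_response):
    """Infinity norm of the difference between test and surrogate responses."""
    if len(test_response) != len(surrogate_response):
        raise ValueError("responses must have the same length")
    return max(abs(r - r_hat)
               for r, r_hat in zip(test_response, surrogate_response))


# The absolute differences are 0.5, 0.0, and 2.0.
e_max = max_abs_error([1.0, 2.0, 3.0], [1.5, 2.0, 1.0])
```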

score(surrogate_index=-1)[source]

Returns the $R^2$ test score for the surrogate.

Parameters:: surrogate_index (int) – optionally pick which surrogate to return the score for.
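For orientation, the sketch below computes the standard coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$. Whether MatCal evaluates the score in exactly this form (for example, how multiple QoIs are combined) is an assumption; the helper name `r_squared` is hypothetical.

```python
def r_squared(observed, predicted):
    """Standard coefficient of determination for paired scalar values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# A perfect prediction scores 1; predicting the mean everywhere scores 0.
perfect = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```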

property sample_count_history: Returns a list containing the number of samples used by each surrogate training step.

class matcal.core.adaptive_surrogates.SparseGridAdaptiveSurrogate(target_field_name, indep_variable_name, indep_variable_values, variable_transformer, test_params, test_responses, param_names, bounds)[source]

Create an AdaptiveSurrogate instance.

Parameters:

target_field_name (str) – Name of the model field that the surrogate will approximate (e.g., "temperature").
indep_variable_name (str) – Name of the auxiliary independent variable (e.g., "time" or "x_position") that will be attached to the surrogate output.
indep_variable_values (array‑like of real numbers) – The values of the independent variable at which the surrogate should be evaluated.
variable_transformer (object with map_to_canonical and map_from_canonical methods) – Object that maps model parameters to the canonical space required by the surrogate library.
test_params (numpy.ndarray of shape (n_parameters, n_test)) – Parameter samples used for testing the surrogate.
test_responses (numpy.ndarray of shape (n_test, n_qois)) – Corresponding model responses for the test parameter samples.
param_names (list[str]) – Ordered list of parameter names that define the mapping between positional arguments and model parameters.

The constructor stores the supplied information and prepares internal containers that will hold the surrogate objects, error histories and sample counts as the adaptive training proceeds.

property average_error_history

Returns the list of errors for the average error history. The average error is calculated using

$E_{avg} = \frac{\lVert \mathbf{R}_{\text{test}} - \hat{\mathbf{R}} \rVert_{2}} {N}$

where $N$ is the number of QoIs in the response, $\mathbf{R}_{\text{test}}$ are the test responses, and $\hat{\mathbf{R}}$ are the surrogate responses.

property current_surrogate: Return the most recent surrogate (or None if no iteration yet).

enforce_training_data_parameter_range(enforce_training_data_parameter_range=True)

By default the surrogate will error if called with a parameter set outside of the parameter ranges used in the training data set. To call the surrogate for parameters outside of the training data range, call this method with the argument set to False. Adherence to the training data range can be reactivated by calling this method with the argument set to True.

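Parameters:: enforce_training_data_parameter_range (bool) – if True (the default), enforce the training data parameter range; if False, allow the surrogate to be called outside of it.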

property max_error_history

Returns the list of errors for the max error history. The max error is calculated using

$E_{max} = \lVert \mathbf{R}_{\text{test}} - \hat{\mathbf{R}} \rVert_{\infty}$

where $\mathbf{R}_{\text{test}}$ are the test responses and $\hat{\mathbf{R}}$ are the surrogate responses.

property sample_count_history: Returns a list containing the number of samples used by each surrogate training step.

score(surrogate_index=-1)

Returns the $R^2$ test score for the surrogate.

Parameters:: surrogate_index (int) – optionally pick which surrogate to return the score for.

class matcal.core.adaptive_surrogates.AdaptiveSurrogateStudyBase(*parameters)[source]

Parameters:: parameters (list(Parameter) or ParameterCollection) – The parameters of interest for the study.
Raises:: StudyTypeError – if parameters is of incorrect type.

set_error_stopping_criteria(average_l2_error_goal: float = 0.01, max_abs_error_goal: float = 0.1)[source]

Set the error thresholds that determine when the adaptive surrogate training stops.

When the average L2 error falls below average_l2_error_goal or the maximum absolute error falls below max_abs_error_goal the training loop terminates (provided at least two batches have been evaluated).

Parameters:

average_l2_error_goal (float, optional) – Desired upper bound for the average L2 error. Must be a positive number.
max_abs_error_goal (float, optional) – Desired upper bound for the maximum absolute error. Must be a positive number.
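The stopping rule described above can be sketched as a small predicate. This is an illustration only: the function name `should_stop` and the assumption that each history list holds one entry per evaluated batch are not taken from MatCal.

```python
def should_stop(average_error_history, max_error_history,
                average_l2_error_goal=0.01, max_abs_error_goal=0.1):
    """Return True once either goal is met and at least two batches exist."""
    n_batches = len(average_error_history)  # assumed: one entry per batch
    if n_batches < 2:
        return False
    return (average_error_history[-1] < average_l2_error_goal
            or max_error_history[-1] < max_abs_error_goal)


too_early = should_stop([0.001], [0.5])             # only one batch so far
keep_going = should_stop([0.5, 0.2], [1.0, 0.4])    # neither goal met
converged = should_stop([0.5, 0.005], [1.0, 0.4])   # average goal met
```

Because the two criteria are combined with "or", meeting either goal is enough to end training; tighten both goals if both errors need to be small.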

set_independent_variable(independent_variable, independent_variable_values)[source]

Specify an independent (auxiliary) variable and the values at which the surrogate will be evaluated.

This variable is not a model input; it is a field that will be used later (for example, a spatial coordinate, a time step, or any other scalar quantity that the surrogate should be conditioned on). The surrogate will be trained on the parameter samples generated by the study and then provide a response at each value supplied in independent_variable_values.

Parameters:

independent_variable (str) – Name of the independent variable (e.g. "time", "x_position", …) that will be attached to the surrogate output.
independent_variable_values (array‑like of real numbers) – A 1‑D array‑like collection of real numbers indicating the points at which the surrogate should be queried.

set_number_of_test_samples(number_of_test_samples)[source]

Set the number of samples that will be used for testing. By default we test against max_training_samples/20 or the number of parameters*10, whichever is greater.

Parameters:: number_of_test_samples (int) – desired number of test samples
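The default described above amounts to taking the larger of two counts. A minimal sketch, with a hypothetical helper name and integer division assumed for max_training_samples/20:

```python
def default_number_of_test_samples(max_training_samples, n_parameters):
    """Larger of max_training_samples // 20 and n_parameters * 10."""
    return max(max_training_samples // 20, n_parameters * 10)


few_params = default_number_of_test_samples(1000, 3)   # compares 50 and 30
many_params = default_number_of_test_samples(1000, 8)  # compares 50 and 80
```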

set_max_training_samples(max_training_samples=1000)[source]

Set the maximum number of training samples to run for Sparse Grid surrogate generation. If the convergence criteria are not met, training for the surrogate will stop once max_training_samples has been reached.

Parameters:: max_training_samples (int) – desired maximum number of samples

set_test_data(study_results)[source]

Provide an external test‑data set for the adaptive surrogate study. The test data must contain the model name and field names necessary for the surrogate. This method should only be used when re-running surrogate generation with a test set from a previous run in which surrogate training was attempted. The independent variable data must also match what is specified for surrogate training.

If this method is not called, the adaptive study will automatically generate a test data set using a Halton sampling design. The number of test samples is taken from the value set via set_number_of_test_samples() (default is max_training_samples // 20 or n_parameters * 10, whichever is larger). Supplying an explicit test data set overrides that behavior.

Parameters:

study_results (StudyResults or str) – The test data to be used for surrogate evaluation:
* StudyResults – a StudyResults instance containing the desired parameter history and simulation results.
* str – a path to a serialized .joblib file that, when loaded, returns a StudyResults object.

Raises:

TypeError – If study_results is neither a StudyResults instance nor a string.
FileNotFoundError – If study_results is a string but the file cannot be located or loaded.
RuntimeError – If the loaded object is not a StudyResults instance.

Notes:

* The supplied test set is only used for validation of the surrogate; it is never incorporated into the training data.
* Calling this method multiple times replaces any previously stored test data with the most recent value.
* The test data must be compatible with the study’s parameter space (same parameter names and bounds as the training data).

set_target_field_name(target_field_name)[source]

Specify the field name for the response that the surrogate model will seek to replicate. This is generally a model response such as temperature, load, etc.

Parameters:: target_field_name (str) – the name of the field that the surrogate will replicate

set_test_group_random_seed(seed)[source]

Set the random seed for the random generator that the study uses to generate the test samples only. The method should be called before launch() to guarantee reproducibility.

Parameters:: seed (int) – Integer seed for the pseudo‑random number generator.

add_evaluation_set(model, state=None, qoi_extractor=None)[source]

Add an evaluation set that uses a SimulationResultsSynchronizer generated from the study’s independent variable, independent‑variable values, and target field name.

Warning

For adaptive surrogates, this can only be called once as the training points are adaptively chosen based on the response of interest.

This method is a thin wrapper around StudyBase.add_evaluation_set(). It accepts only the model (required) and an optional state argument. state must be a single State instance; a collection of states is not supported. If state is None MatCal’s default state will be used.

The synchronizer is built automatically from the attributes that were defined via set_independent_variable() and set_target_field_name().

Parameters:

model (ModelBase) – The model that will generate the simulation data.
state (State, optional) – The single state to which the evaluation set should be applied. If None the model’s default state is used.
qoi_extractor (UserDefinedExtractor) – Provide a UserDefinedExtractor that will act on the simulation results to provide a quantity of interest for the surrogate. It must return target field values with the same length as the independent variable values.

Raises:

RuntimeError – If the required attributes for the synchronizer (independent variable, its values, or target field name) have not been set.

launch()[source]

Run the initial test‑sampling study in a dedicated sub‑directory, then continue with the adaptive Sparse‑Grid workflow.

The test‑sampling phase is performed by a standard HaltonStudy to generate the required test points. If the user called StudyBase.set_working_directory() before launching the study, the test‑sampling directory is created by appending the suffix "_test_samples" to the user‑provided path. Otherwise, the test samples are run in a local directory named "test_samples". After the test sample study finishes, the original working directory is restored and the surrogate‑building routine is started.
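The directory naming rule described above can be sketched as follows. The helper name `resolve_test_sample_directory` is hypothetical, and the sketch only reproduces the naming, not the directory creation or the study launch.

```python
def resolve_test_sample_directory(working_directory=None):
    """Name of the directory in which the test samples would be run."""
    if working_directory is None:
        # No working directory was set by the user.
        return "test_samples"
    # Append the suffix to the user-provided path.
    return working_directory + "_test_samples"


default_dir = resolve_test_sample_directory()
custom_dir = resolve_test_sample_directory("surrogate_run")
```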

property surrogate

Return the AdaptiveSurrogate instance that holds the surrogate models and their training history.

Returns:: The surrogate object, or None if the study has not yet created one (i.e., before launch() is called).
Return type:: AdaptiveSurrogate | None

set_surrogate_save_filename(filename)[source]

Set the path used to save the surrogate object after each training batch.

The surrogate (an AdaptiveSurrogate instance) is periodically saved to disk with matcal.core.serializer_wrapper.matcal_save(). The filename must be a non‑empty string that ends with the .joblib extension. The directory component of the path is not created automatically; it must already exist or be created by the user prior to calling this method.

Parameters:

filename (str) – Full path (absolute or relative) to the file that will store the surrogate. The filename must end with .joblib. Example: "my_model_sparse_grid_surrogate.joblib" or "/tmp/surrogate.joblib".

Raises:

ValueError – If filename does not contain the required .joblib suffix or is empty.
TypeError – If filename is not a string.
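The filename checks listed above can be approximated with the sketch below. It is not the MatCal implementation: the helper names are hypothetical, and the order in which the checks run is an assumption.

```python
def validate_surrogate_filename(filename):
    """Mimic the documented checks for a surrogate save filename."""
    if not isinstance(filename, str):
        raise TypeError("filename must be a string")
    if not filename or not filename.endswith(".joblib"):
        raise ValueError("filename must be non-empty and end with .joblib")
    return filename


def raised_error(candidate):
    """Return the name of the exception raised for candidate, or None."""
    try:
        validate_surrogate_filename(candidate)
    except (TypeError, ValueError) as error:
        return type(error).__name__
    return None


ok = raised_error("/tmp/surrogate.joblib")
bad_suffix = raised_error("surrogate.pkl")
empty = raised_error("")
not_a_string = raised_error(42)
```

Note that a check of this kind says nothing about whether the target directory exists, which remains the user's responsibility.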

property surrogate_save_filename

Retrieve the filename (including the .joblib extension) that will be used to save the surrogate object after each training batch.

Returns:: The absolute or relative path supplied via set_surrogate_save_filename(), or None if no filename has been set.
Return type:: str | None

property results_synchronizer

Return the SimulationResultsSynchronizer that was created for this adaptive surrogate study.

The synchronizer is responsible for evaluating the model at the user‑provided independent‑variable locations and extracting the target field (the quantity of interest) from the simulation output. It is constructed the first time add_evaluation_set() is called and stored internally as self._results_synchronizer. As a result, this should be called after an evaluation set is added to the study.

Returns:: The SimulationResultsSynchronizer instance associated with the study, or None if the synchronizer has not yet been created (i.e. add_evaluation_set has not been called).
Return type:: SimulationResultsSynchronizer | None

add_parameter_preprocessor(parameter_preprocessor)

Add a parameter preprocessor to the study that will operate on the parameters before they are sent to the models. See UserDefinedParameterPreprocessor.

Parameters:: parameter_preprocessor (UserDefinedParameterPreprocessor) – the parameter preprocessor that will modify and update the given model parameters

property final_results_filename

Returns the filename for the final results file for the current study.

Returns:: final results filename as an absolute path
Return type:: str

make_residuals_study(): This changes the stored total objectives to be the L2 norm of one long concatenated residual from all objectives added using add_evaluation_set()

make_total_objective_study(): This changes the stored total objectives to be a summation of all metric function results.

plot_progress(): Calling this method will cause matcal to generate automatic plots after each batch of parameter evaluations. These plots are made using the standard plotter and will show things such as objective value evolution.

restart()

Sets the study to launch in restart mode. The study will use existing results from previous launches to populate the results instead of running the simulations again. Note that this feature requires that no changes be made to the study in order for it to produce correct results.

Files from previous runs are read into this study, so they should not be deleted. Missing files may cause errors in the restart.

If any random number generation is used in the calculation, it is important to set the same seed value as was used previously.

property results: Return access to the study’s results. Will return None, if study has not been run.

run_in_serial()

Tell MatCal to run evaluations in serial. This is only recommended if the study is serial, like an MCMC Bayes Study, and the model evaluations are fast, like a Python model.

Running in serial avoids the overhead of reloading large data sets that are necessary in async studies.

set_Halton_scramble(scramble=True)

Parameters:: scramble (bool) – set the scramble keyword for the numpy Halton object.

set_cleanup_mode(new_pruner: DirectoryPrunerBase): Changes the pruner to the object passed as an argument

set_core_limit(core_limit, override_max_limit=False)

Sets the total number of cores that the study may use.

Parameters:

core_limit (int) – The max number of cores that the study can use at any time.
override_max_limit – Override the default max cores that can be specified for a given study. The current limit of 500 is recommended by the MatCal team but might not be best for all cases.

Raises:

StudyTypeError – if the passed value is not an int.

set_number_of_samples(nsamples=20, skip=None)

Set the number of samples for the study.

Parameters:

nsamples (int) – Number of parameter samples to generate from Halton sequence.
skip (int) – When continuing an existing design, the user may optionally skip ahead in the Halton sequence by an amount determined by ‘skip’.

set_parameters(*parameters)

Parameters:: parameters (Parameter or ParameterCollection) – The parameters of interest for the study.
Raises:: StudyTypeError – if the parameters are of incorrect type.

set_results_storage_options(data: bool = True, qois: bool = True, residuals: bool = True, objectives: bool = True, weighted_conditioned: bool = False, results_save_frequency: int = 1)

Set which history information to save and return with the study results. You can also downsample which evaluations to save using results_save_frequency. This is particularly useful if you do not wish to store finite difference evaluations for gradient-based studies. The total objective is always stored.

Parameters:

data (bool) – Store the raw data for each simulation and the raw experimental data for each objective for each desired evaluation.
qois (bool) – Store the QoIs for each objective for each desired evaluation. This includes both experiment and simulation QoIs
residuals (bool) – Store the residuals for each objective for each desired evaluation.
objectives (bool) – Store the objective by state and evaluation set for each desired evaluation.
weighted_conditioned (bool) – Store the weighted and conditioned values for each desired evaluation. This will save the weighted and conditioned, residuals, simulation qois and experiment qois.
results_save_frequency (int) – Set the results save interval. For studies where finite difference derivatives are used, an interval of $n+1$ will exclude finite difference results from the saved results history.
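To see why an interval of $n+1$ can drop finite difference results, consider the sketch below. It assumes that $n$ is the number of parameters, that each gradient step evaluates one base point followed by $n$ finite difference points, and that saving starts at the first evaluation; none of these details are confirmed here, and the helper name is hypothetical.

```python
def saved_evaluation_indices(total_evaluations, results_save_frequency):
    """Indices kept when every results_save_frequency-th evaluation is saved."""
    return list(range(0, total_evaluations, results_save_frequency))


n_parameters = 3
# Two gradient steps: one base evaluation followed by n finite difference ones.
labels = (["base"] + ["fd"] * n_parameters) * 2
kept = saved_evaluation_indices(len(labels), n_parameters + 1)
kept_labels = [labels[i] for i in kept]
```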

set_seed(seed)

Set the random seed for the random generator that the study uses to generate the samples. The method should be called before launch() to guarantee reproducibility.

Parameters:: seed (int) – Integer seed for the pseudo‑random number generator.

set_use_threads(always_use_threads=False)

By default, MatCal assumes that the model being run is CPU intensive. As a result, it runs each model in a subprocess, which can add some overhead. If running studies with cheaper Python models, it may be beneficial to use threading instead of a subprocess. Using this method will run the study with threading if only one model can be evaluated at a time. You can optionally run with threads even with concurrent model evaluations by using the “always_use_threads” option; however, this can be less reliable. For large memory calibrations, we always recommend using subprocess.

Finally, any external executable is always run using subprocess, but threading can be used to manage that job and return its results.

Parameters:: always_use_threads (bool) – if true, MatCal will use threads over subprocess for concurrent modeling jobs. Defaults to False.

set_working_directory(working_directory, remove_existing=False)

By default, MatCal runs in the current working directory. This method allows the user to specify a subdirectory in the current directory for the study to be run in. This method will create only the last directory in the path, so if the desired subdirectory is nested under multiple folders from the current directory, MatCal will error if the head of the path does not exist. See os.path.split() for a definition of the path “head”.

Parameters:

working_directory (str) – The desired working directory for the current study. MatCal will only create the last folder if the path is a nested path.
remove_existing – If True, then the directory will be removed if pre-existing at study launch.
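The "last directory only" behavior resembles that of `os.mkdir`, which creates a single directory and fails when the head of the path is missing. The sketch below demonstrates this with the standard library in a temporary directory; whether MatCal calls `os.mkdir` internally is an assumption, and the directory names are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # A single new folder under an existing directory is created without issue.
    single = os.path.join(root, "study")
    os.mkdir(single)
    single_created = os.path.isdir(single)

    # A nested path whose head does not exist cannot be created in one step.
    nested = os.path.join(root, "missing_parent", "study")
    head, tail = os.path.split(nested)  # head ends in "missing_parent"
    try:
        os.mkdir(nested)
        nested_failed = False
    except FileNotFoundError:
        nested_failed = True
```

Creating the head of the path beforehand, for example with `os.makedirs(head)`, avoids this failure.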

class matcal.core.adaptive_surrogates.SparseGridAdaptiveSurrogateStudy(*parameters)[source]

The SparseGridAdaptiveSurrogateStudy builds a Sparse Grid adaptive surrogate using the PyApprox library. Sparse grid surrogates generally behave well for larger parameter spaces and problems with discontinuities in the response of interest. One downside is that a surrogate must be trained independently for each response of interest. As a result, this study requires that only a single model and state be passed to it. It also requires a target field name that identifies the response of interest for the surrogate.

Parameters:: parameters (list(Parameter) or ParameterCollection) – The parameters of interest for the study.
Raises:: StudyTypeError – if parameters is of incorrect type.

add_evaluation_set(model, state=None, qoi_extractor=None)

Add an evaluation set that uses a SimulationResultsSynchronizer generated from the study’s independent variable, independent‑variable values, and target field name.

Warning

For adaptive surrogates, this can only be called once as the training points are adaptively chosen based on the response of interest.

This method is a thin wrapper around StudyBase.add_evaluation_set(). It accepts only the model (required) and an optional state argument. state must be a single State instance; a collection of states is not supported. If state is None MatCal’s default state will be used.

The synchronizer is built automatically from the attributes that were defined via set_independent_variable() and set_target_field_name().

Parameters:

model (ModelBase) – The model that will generate the simulation data.
state (State, optional) – The single state to which the evaluation set should be applied. If None the model’s default state is used.
qoi_extractor (UserDefinedExtractor) – Provide a UserDefinedExtractor that will act on the simulation results to provide a quantity of interest for the surrogate. It must return target field values with the same length as the independent variable values.

Raises:

RuntimeError – If the required attributes for the synchronizer (independent variable, its values, or target field name) have not been set.

add_parameter_preprocessor(parameter_preprocessor)

Add a parameter preprocessor to the study that will operate on the parameters before they are sent to the models. See UserDefinedParameterPreprocessor.

Parameters:: parameter_preprocessor (UserDefinedParameterPreprocessor) – the parameter preprocessor that will modify and update the given model parameters

property final_results_filename

Returns the filename for the final results file for the current study.

Returns:: final results filename as an absolute path
Return type:: str

launch()

Run the initial test‑sampling study in a dedicated sub‑directory, then continue with the adaptive Sparse‑Grid workflow.

The test‑sampling phase is performed by a standard HaltonStudy to generate the required test points. If the user called StudyBase.set_working_directory() before launching the study, the test‑sampling directory is created by appending the suffix "_test_samples" to the user‑provided path. Otherwise, the test samples are run in a local directory named "test_samples". After the test sample study finishes, the original working directory is restored and the surrogate‑building routine is started.

make_residuals_study(): This changes the stored total objectives to be the L2 norm of one long concatenated residual from all objectives added using add_evaluation_set()

make_total_objective_study(): This changes the stored total objectives to be a summation of all metric function results.

plot_progress(): Calling this method will cause matcal to generate automatic plots after each batch of parameter evaluations. These plots are made using the standard plotter and will show things such as objective value evolution.

restart()

Sets the study to launch in restart mode. The study will use existing results from previous launches to populate the results instead of running the simulations again. Note that this feature requires that no changes be made to the study in order for it to produce correct results.

Files from previous runs are read into this study, so they should not be deleted. Missing files may cause errors in the restart.

If any random number generation is used in the calculation, it is important to set the same seed value as was used previously.

property results: Return access to the study’s results. Will return None, if study has not been run.

property results_synchronizer

Return the SimulationResultsSynchronizer that was created for this adaptive surrogate study.

The synchronizer is responsible for evaluating the model at the user‑provided independent‑variable locations and extracting the target field (the quantity of interest) from the simulation output. It is constructed the first time add_evaluation_set() is called and stored internally as self._results_synchronizer. As a result, this should be called after an evaluation set is added to the study.

Returns:: The SimulationResultsSynchronizer instance associated with the study, or None if the synchronizer has not yet been created (i.e. add_evaluation_set has not been called).
Return type:: SimulationResultsSynchronizer | None

run_in_serial()

Tell MatCal to run evaluations in serial. This is only recommended if the study is serial, like an MCMC Bayes Study, and the model evaluations are fast, like a Python model.

Running in serial avoids the overhead of reloading large data sets that are necessary in async studies.

set_Halton_scramble(scramble=True)

Parameters:: scramble (bool) – set the scramble keyword for the numpy Halton object.

set_cleanup_mode(new_pruner: DirectoryPrunerBase): Changes the pruner to the object passed as an argument

set_core_limit(core_limit, override_max_limit=False)

Sets the total number of cores that the study may use.

Parameters:

core_limit (int) – The max number of cores that the study can use at any time.
override_max_limit – Override the default max cores that can be specified for a given study. The current limit of 500 is recommended by the MatCal team but might not be best for all cases.

Raises:

StudyTypeError – if the passed value is not an int.

set_error_stopping_criteria(average_l2_error_goal: float = 0.01, max_abs_error_goal: float = 0.1)

Set the error thresholds that determine when the adaptive surrogate training stops.

When the average L2 error falls below average_l2_error_goal or the maximum absolute error falls below max_abs_error_goal the training loop terminates (provided at least two batches have been evaluated).

Parameters:

average_l2_error_goal (float, optional) – Desired upper bound for the average L2 error. Must be a positive number.
max_abs_error_goal (float, optional) – Desired upper bound for the maximum absolute error. Must be a positive number.

set_independent_variable(independent_variable, independent_variable_values)

Specify an independent (auxiliary) variable and the values at which the surrogate will be evaluated.

This variable is not a model input; it is a field that will be used later (for example, a spatial coordinate, a time step, or any other scalar quantity that the surrogate should be conditioned on). The surrogate will be trained on the parameter samples generated by the study and then provide a response at each value supplied in independent_variable_values.

Parameters:

independent_variable (str) – Name of the independent variable (e.g. "time", "x_position", …) that will be attached to the surrogate output.
independent_variable_values (array‑like of real numbers) – A 1‑D array‑like collection of real numbers indicating the points at which the surrogate should be queried.

set_max_training_samples(max_training_samples=1000)

Set the maximum number of training samples to run for Sparse Grid surrogate generation. If the convergence criteria are not met, training for the surrogate will stop once max_training_samples has been reached.

Parameters:: max_training_samples (int) – desired maximum number of samples

set_number_of_samples(nsamples=20, skip=None)

Set the number of samples for the study.

Parameters:

nsamples (int) – Number of parameter samples to generate from Halton sequence.
skip (int) – When continuing an existing design, the user may optionally skip ahead in the Halton sequence by an amount determined by ‘skip’.

set_number_of_test_samples(number_of_test_samples)

Set the number of samples that will be used for testing. By default we test against max_training_samples/20 or the number of parameters*10, whichever is greater.

Parameters:: number_of_test_samples (int) – desired number of test samples

set_parameters(*parameters)

Parameters:: parameters (Parameter or ParameterCollection) – The parameters of interest for the study.
Raises:: StudyTypeError – if the parameters are of incorrect type.

set_results_storage_options(data: bool = True, qois: bool = True, residuals: bool = True, objectives: bool = True, weighted_conditioned: bool = False, results_save_frequency: int = 1)

Set which history information to save and return with the study results. You can also downsample which evaluations to save using results_save_frequency. This is particularly useful if you do not wish to store finite difference evaluations for gradient-based studies. The total objective is always stored.

Parameters:

data (bool) – Store the raw data for each simulation and the raw experimental data for each objective for each desired evaluation.
qois (bool) – Store the QoIs for each objective for each desired evaluation. This includes both experiment and simulation QoIs
residuals (bool) – Store the residuals for each objective for each desired evaluation.
objectives (bool) – Store the objective by state and evaluation set for each desired evaluation.
weighted_conditioned (bool) – Store the weighted and conditioned values for each desired evaluation. This will save the weighted and conditioned residuals, simulation QoIs, and experiment QoIs.
results_save_frequency (int) – Set the interval at which results are saved. For studies where finite difference derivatives are used, an interval of $n+1$ will exclude finite difference results from the saved results history.
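
To see why an interval of $n+1$ filters out finite difference evaluations, consider the sketch below. It assumes, purely for illustration, that $n$ is the number of parameters and that each iteration of a gradient-based study records one base evaluation followed by $n$ finite difference evaluations. The actual evaluation ordering in MatCal may differ.

```python
n_parameters = 3
n_iterations = 4

# Label each evaluation: one base point followed by n finite difference points.
evaluations = []
for iteration in range(n_iterations):
    evaluations.append(("base", iteration))
    evaluations.extend(
        ("finite_difference", iteration) for _ in range(n_parameters)
    )

# Keeping every (n + 1)-th evaluation retains only the base points.
results_save_frequency = n_parameters + 1
saved = evaluations[::results_save_frequency]
```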

set_seed(seed)

Set the random seed for the random generator that the study uses to generate the samples. The method should be called before launch() to guarantee reproducibility.

Parameters:: seed (int) – Integer seed for the pseudo‑random number generator.

set_surrogate_save_filename(filename)

Set the path used to save the surrogate object after each training batch.

The surrogate (an AdaptiveSurrogate instance) is periodically saved to disk with matcal.core.serializer_wrapper.matcal_save(). The filename must be a non‑empty string that ends with the .joblib extension. The directory component of the path is not created automatically; it must already exist or be created by the user prior to calling this method.

Parameters:

filename (str) – Full path (absolute or relative) to the file that will store the surrogate. The filename must end with .joblib. Example: "my_model_sparse_grid_surrogate.joblib" or "/tmp/surrogate.joblib".

Raises:

ValueError – If filename does not contain the required .joblib suffix or is empty.
TypeError – If filename is not a string.
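
The filename rules above can be checked before the study is configured. The following is a hedged sketch of such a check; check_surrogate_filename is a hypothetical helper that mirrors the documented exceptions and is not MatCal's own implementation.

```python
def check_surrogate_filename(filename):
    """Validate a surrogate save filename against the documented rules."""
    if not isinstance(filename, str):
        raise TypeError("filename must be a string")
    if not filename or not filename.endswith(".joblib"):
        raise ValueError("filename must be non-empty and end with .joblib")
    return filename


checked = check_surrogate_filename("my_model_sparse_grid_surrogate.joblib")
```

Because the directory component is not created automatically, a caller would still need to create any parent directory before passing the path to the study.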

set_target_field_name(target_field_name)

Specify the field name for the response that the surrogate model will seek to replicate. This is generally a model response such as temperature, load, etc.

Parameters:: target_field_name (str) – the name of the field that the surrogate will replicate

set_test_data(study_results)

Provide an external test‑data set for the adaptive surrogate study. This must contain the model name and field names necessary for the surrogate. This should only be used when re-running surrogate generation with a previously existing test set from a previous run where surrogate training was attempted. The independent variable data must also match what is specified for surrogate training.

If this method is not called, the adaptive study will automatically generate a test data set using a Halton sampling design. The number of test samples is taken from the value set via set_number_of_test_samples() (default is max_training_samples // 20 or n_parameters * 10, whichever is larger). Supplying an explicit test data set overrides that behavior.

Parameters:

study_results (StudyResults or str) – The test data to be used for surrogate evaluation. Accepted forms are:
- StudyResults: a StudyResults instance containing the desired parameter history and simulation results.
- str: a path to a serialized .joblib file that, when loaded, returns a StudyResults object.

Raises:

TypeError – If study_results is neither a StudyResults instance nor a string.
FileNotFoundError – If study_results is a string but the file cannot be located or loaded.
RuntimeError – If the loaded object is not a StudyResults instance.

Notes:

- The supplied test set is only used for validation of the surrogate; it is never incorporated into the training data.
- Calling this method multiple times replaces any previously stored test data with the most recent value.
- The test data must be compatible with the study’s parameter space (same parameter names and bounds as the training data).

set_test_group_random_seed(seed)

Set the random seed for the random generator that the study uses to generate the test samples only. The method should be called before launch() to guarantee reproducibility.

Parameters:: seed (int) – Integer seed for the pseudo‑random number generator.

set_use_threads(always_use_threads=False)

By default, MatCal assumes that the model being run is CPU intensive. As a result, it runs each model in a subprocess, which can add some overhead. If running studies with cheaper Python models, it may be beneficial to use threading instead of a subprocess. Using this method will run the study with threading if only one model can be evaluated at a time. You can optionally run with threads even with concurrent model evaluations with the “always_use_threads” option; however, this can be less reliable. For large memory calibrations, we always recommend using subprocess.

Finally, any external executable is always run using subprocess, but threading can be used to manage that job and return its results.

Parameters:: always_use_threads (bool) – if true, MatCal will use threads over subprocess for concurrent modeling jobs. Defaults to False.

set_working_directory(working_directory, remove_existing=False)

By default, MatCal runs in the current working directory. This method allows the user to specify a subdirectory in the current directory for the study to be run in. This method will create only the last directory in the path, so if the desired subdirectory is nested under multiple folders from the current directory, MatCal will error if the head of the path does not exist. See os.path.split() for a definition of the path “head”.

Parameters:

working_directory (str) – The desired working directory for the current study. MatCal will only create the last folder if the path is a nested path.
remove_existing (bool) – If True, the directory will be removed at study launch if it already exists.
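
The "only the last directory is created" behavior can be reproduced with the standard library. The sketch below uses os.mkdir to mimic it; whether MatCal calls os.mkdir internally is an assumption, and the directory names are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    nested = os.path.join(root, "studies", "run_1")
    head, tail = os.path.split(nested)  # head ends in "studies", tail is "run_1"

    try:
        os.mkdir(nested)  # fails: the head ("studies") does not exist yet
        created_without_head = True
    except FileNotFoundError:
        created_without_head = False

    # Creating the head first makes the same call succeed.
    os.makedirs(head, exist_ok=True)
    os.mkdir(nested)
    created_with_head = os.path.isdir(nested)
```

In practice this means creating the head of a nested path yourself before pointing the study at it.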

property surrogate

Return the AdaptiveSurrogate instance that holds the surrogate models and their training history.

Returns:: The surrogate object, or None if the study has not yet created one (i.e., before launch() is called).
Return type:: AdaptiveSurrogate | None

property surrogate_save_filename

Retrieve the filename (including the .joblib extension) that will be used to save the surrogate object after each training batch.

Returns:: The absolute or relative path supplied via set_surrogate_save_filename(), or None if no filename has been set.
Return type:: str | None

class matcal.core.adaptive_surrogates.VoronoiAdaptiveSurrogateStudy(*parameters)[source]

Initialize the VoronoiAdaptiveSurrogateStudy

set_number_of_initial_samples(num_initial_samples=None)[source]

Parameters:: num_initial_samples (None or int) – The number of samples to initiate the algorithm with. The initial samples are used to train the initial surrogate and build the initial Voronoi tessellation. Defaults to 10*ndim.

set_voronoi_sampling_options(voronoi_type='full', finite_only=False, iterative_updates=True, thin=None, random_selection=None)[source]

Set options pertaining to the voronoi sampling algorithm. Properties that can be altered are listed below.

Parameters:

voronoi_type (str) –
Defines which Voronoi-based sampling strategy to use. Supported options are:
- ’full’: Constructs the full Voronoi tessellation over all points (Default)
- ’local’: Constructs a local Voronoi tessellation using only nearby points determined by k-nearest neighbors. This can reduce computational cost in high dimensions.
finite_only (bool) – If True, only Voronoi vertices that lie inside the convex hull defined by the boundary points are considered as candidate sample locations. If False, all vertices are considered, and those lying outside the parameter bounds are clipped back to the convex hull. This is more flexible but can be more computationally expensive, especially in high dimensions.
iterative_updates (bool) – If True, the Voronoi tessellation is recomputed after each new sample is added, promoting a more space-filling design. If False, the tessellation is updated once per batch after all samples in the batch are selected. This can be faster but may result in sample clustering.
thin (int or None) – If specified, every nth candidate sample location is selected as a new sample location. This can significantly reduce computational cost in high-dimensional spaces.
random_selection (int or None) – If specified, this defines the number of candidate sample locations that are randomly selected as new samples. This provides an alternative way to reduce computational cost in high-dimensional problems.
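
The two candidate-reduction options, thin and random_selection, can be pictured with plain list operations. This is only a sketch under stated assumptions: the candidate points are made up, thinning is taken to mean keeping every nth candidate in list order, and random selection is taken to mean sampling without replacement. MatCal's actual ordering and sampling may differ.

```python
import random

# Hypothetical candidate sample locations in a 2-D unit square.
candidates = [(i / 10.0, (i * 7 % 10) / 10.0) for i in range(10)]

# thin = 3: keep every third candidate (indices 0, 3, 6, 9).
thin = 3
thinned = candidates[::thin]

# random_selection = 4: keep four candidates chosen at random.
rng = random.Random(0)  # seeded so the choice is repeatable
random_selection = 4
randomly_selected = rng.sample(candidates, random_selection)
```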

set_surrogate_options(**kwargs)[source]

Parameters:: regressor_kwargs – Keyword arguments to pass to the predictor used. Refer to the sklearn documentation for the options that each predictor accepts.

set_convergence_criteria(eps=1e-12, convergence_metric='nlpd')[source]

Convergence is determined by comparing the RMSE or NLPD of the surrogate between two successive batches.

Parameters:

convergence_metric – Choose from root mean squared error (‘rmse’) or negative log posterior density (‘nlpd’) to track surrogate performance at each batch iteration. This metric is used to determine if the surrogate has converged according to eps.
eps (float) – Tolerance for surrogate convergence.
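
A convergence test of this kind can be sketched as a comparison of the last two metric values. The absolute-difference form used here is an assumption, since the text does not say whether the change is measured in absolute or relative terms, and has_converged is a hypothetical helper rather than a MatCal function.

```python
def has_converged(metric_history, eps=1e-12):
    """Return True when the metric changed by less than eps between batches."""
    if len(metric_history) < 2:
        return False  # two successive batches are needed for a comparison
    return abs(metric_history[-1] - metric_history[-2]) < eps


# Hypothetical RMSE values recorded after each training batch.
rmse_per_batch = [0.50, 0.20, 0.11, 0.10]
converged_tight = has_converged(rmse_per_batch)            # default eps
converged_loose = has_converged(rmse_per_batch, eps=0.05)  # looser tolerance
```

With the very small default tolerance, a study is likely to run until it reaches its sample cap unless the metric stops changing almost entirely, so a looser eps may be appropriate in practice.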

set_cross_validation_options(nsplits=5, nmax_folds=3, nmax_loo=10, cv_scale=1.0, cv_metric='rmse', group_kfold=False)[source]

Set options for cross validation. Properties that can be altered are listed below.

Parameters:

nsplits (int) – The number of folds to use in k-fold cross validation. If nsplits = 0, k-fold cross-validation is skipped entirely and new samples are instead selected from every region of the Voronoi tessellation defined by the current set of training samples.
nmax_folds (int) – Points in the folds with the highest k-fold error (the top nmax_folds) define the Voronoi regions from which new samples will be drawn.
nmax_loo (int or 'all') – Points with the largest leave-one-out cross-validation (LOOCV) errors (the top nmax_loo). These define the Voronoi regions from which new samples will be drawn. If nmax_loo = ‘all’, then new samples are drawn from all Voronoi regions defined by nmax_folds, and leave-one-out cross-validation is not performed.
cv_scale – Optional scaling applied to output before calculating errors in cross-validation and leave-one-out cross-validation. This can be used to balance error magnitude across dimensions or outputs.
cv_metric (str) –
Determines which metric is used when computing errors during cross-validation. Supported options are:
- rmse – root mean squared error (Default)
- nlpd – negative log posterior density
group_kfold (bool) – If True, samples are grouped using k-means clustering prior to k-fold cross-validation so that nearby points are always assigned to the same fold. This prevents spatially correlated points from being split across training and validation sets. If False, folds are assigned randomly by the standard KFold algorithm.
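
The fold-selection step described for nmax_folds can be sketched as follows. The per-point validation errors are made up, and rmse and worst_folds are hypothetical helpers rather than MatCal functions. The sketch only shows ranking folds by RMSE and keeping the top nmax_folds.

```python
import math


def rmse(errors):
    """Root mean squared error of a list of per-point errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def worst_folds(fold_errors, nmax_folds):
    """Return indices of the nmax_folds folds with the largest RMSE."""
    scores = [rmse(errors) for errors in fold_errors]
    ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return ranked[:nmax_folds]


# Hypothetical validation errors for nsplits = 5 folds, two points each.
fold_errors = [
    [0.1, 0.2],
    [0.9, 1.1],
    [0.05, 0.02],
    [0.5, 0.4],
    [0.3, 0.3],
]
selected = worst_folds(fold_errors, nmax_folds=3)
```

The points in the selected folds would then be the ones considered for leave-one-out ranking under nmax_loo.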

add_evaluation_set(model, state=None, qoi_extractor=None)

Add an evaluation set that uses a SimulationResultsSynchronizer generated from the study’s independent variable, independent‑variable values, and target field name.

Warning

For adaptive surrogates, this can only be called once as the training points are adaptively chosen based on the response of interest.

This method is a thin wrapper around StudyBase.add_evaluation_set(). It accepts only the model (required) and an optional state argument. state must be a single State instance; a collection of states is not supported. If state is None MatCal’s default state will be used.

The synchronizer is built automatically from the attributes that were defined via set_independent_variable() and set_target_field_name().

Parameters:

model (ModelBase) – The model that will generate the simulation data.
state (State, optional) – The single state to which the evaluation set should be applied. If None the model’s default state is used.
qoi_extractor (UserDefinedExtractor) – Provide a UserDefinedExtractor that will act on the simulation results to provide a quantity of interest for the surrogate. It must return target field values of the same length as the independent variable values.

Raises:

RuntimeError – If the required attributes for the synchronizer (independent variable, its values, or target field name) have not been set.

add_parameter_preprocessor(parameter_preprocessor)

Add a parameter preprocessor to the study that will operate on the parameters before they are sent to the models. See UserDefinedParameterPreprocessor.

Parameters:: parameter_preprocessor (UserDefinedParameterPreprocessor) – the parameter preprocessor that will modify and update the given model parameters

property final_results_filename

Returns the filename for the final results file for the current study.

Returns:: final results filename as an absolute path
Return type:: str

launch()

Run the initial test‑sampling study in a dedicated sub‑directory, then continue with the adaptive Sparse‑Grid workflow.

The test‑sampling phase is performed by a standard HaltonStudy to generate the required test points. If the user called StudyBase.set_working_directory() before launching the study, the test‑sampling directory is created by appending the suffix "_test_samples" to the user‑provided path. Otherwise, the test samples are run in a local directory named "test_samples". After the test sample study finishes, the original working directory is restored and the surrogate‑building routine is started.
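
The directory naming rule above can be summarized in a few lines. This is a sketch of the described behavior only; directory_for_test_samples is a hypothetical helper and not part of MatCal.

```python
def directory_for_test_samples(working_directory=None):
    """Return the directory name used for the test-sampling phase."""
    if working_directory is None:
        # No working directory was set: use a local default directory.
        return "test_samples"
    # Otherwise append the suffix to the user-provided path.
    return working_directory + "_test_samples"


default_dir = directory_for_test_samples()
custom_dir = directory_for_test_samples("my_study")
```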

make_residuals_study(): This changes the stored total objectives to be the L2 norm of one long concatenated residual from all objectives added using add_evaluation_set()

make_total_objective_study(): This changes the stored total objectives to be a summation of all metric function results.

plot_progress(): Calling this method will cause matcal to generate automatic plots after each batch of parameter evaluations. These plots are made using the standard plotter and will show things such as objective value evolution.

restart()

Sets the study to launch in restart mode. The study will use existing results from previous launches to populate the results instead of running the simulations again. Note that this feature requires that no changes be made to the study in order for it to produce correct results.

Files from previous runs are read in to this study, so they should not be deleted. Missing files may cause errors in the restart.

If any random number generation is used in the calculation, it is important to set the same seed value as was used previously.

property results: Return access to the study’s results. Will return None, if study has not been run.

property results_synchronizer

Return the SimulationResultsSynchronizer that was created for this adaptive surrogate study.

The synchronizer is responsible for evaluating the model at the user‑provided independent‑variable locations and extracting the target field (the quantity of interest) from the simulation output. It is constructed the first time add_evaluation_set() is called and stored internally as self._results_synchronizer. As a result, this should be called after an evaluation set is added to the study.

Returns:: The SimulationResultsSynchronizer instance associated with the study, or None if the synchronizer has not yet been created (i.e. add_evaluation_set has not been called).
Return type:: SimulationResultsSynchronizer | None

run_in_serial()

Tell MatCal to run evaluations in serial. This is only recommended if the study is serial, like an MCMC Bayes Study, and the model evaluations are fast, like a Python model.

Running in serial avoids the overhead of reloading large data sets that are necessary in async studies.

set_Halton_scramble(scramble=True)

Parameters:: scramble (bool) – set the scramble keyword for the numpy Halton object.

set_cleanup_mode(new_pruner: DirectoryPrunerBase): Changes the pruner to the object passed as an argument

set_core_limit(core_limit, override_max_limit=False)

Sets the total number of cores that the study may use.

Parameters:

core_limit (int) – The max number of cores that the study can use at any time.
override_max_limit – Override the default max cores that can be specified for a given study. The current limit of 500 is recommended by the MatCal team but might not be best for all cases.

Raises:

StudyTypeError – if the passed value is not an int.

set_error_stopping_criteria(average_l2_error_goal: float = 0.01, max_abs_error_goal: float = 0.1)

Set the error thresholds that determine when the adaptive surrogate training stops.

When the average L2 error falls below average_l2_error_goal or the maximum absolute error falls below max_abs_error_goal the training loop terminates (provided at least two batches have been evaluated).

Parameters:

average_l2_error_goal (float, optional) – Desired upper bound for the average L2 error. Must be a positive number.
max_abs_error_goal (float, optional) – Desired upper bound for the maximum absolute error. Must be a positive number.
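
The stopping rule can be expressed as a small predicate. The sketch below follows the description above (either goal being met is enough, but only after at least two batches); should_stop and its batches_evaluated argument are hypothetical names used for illustration.

```python
def should_stop(average_l2_error, max_abs_error, batches_evaluated,
                average_l2_error_goal=0.01, max_abs_error_goal=0.1):
    """Return True when either error goal is met after at least two batches."""
    if batches_evaluated < 2:
        return False
    return (average_l2_error < average_l2_error_goal
            or max_abs_error < max_abs_error_goal)


# The average L2 goal is met, but only one batch has run: keep training.
after_first_batch = should_stop(0.005, 0.5, batches_evaluated=1)

# Same errors after a second batch: training stops.
after_second_batch = should_stop(0.005, 0.5, batches_evaluated=2)
```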

set_independent_variable(independent_variable, independent_variable_values)

Specify an independent (auxiliary) variable and the values at which the surrogate will be evaluated.

This variable is not a model input; it is a field that will be used later (for example, a spatial coordinate, a time step, or any other scalar quantity that the surrogate should be conditioned on). The surrogate will be trained on the parameter samples generated by the study and then provide a response at each value supplied in independent_variable_values.

Parameters:

independent_variable (str) – Name of the independent variable (e.g. "time", "x_position", …) that will be attached to the surrogate output.
independent_variable_values (array‑like of real numbers) – A 1‑D array‑like collection of real numbers indicating the points at which the surrogate should be queried.

set_max_training_samples(max_training_samples=1000)

Set the maximum number of training samples to run for Sparse Grid surrogate generation. If the convergence criteria are not met, surrogate training stops once max_training_samples has been reached.

Parameters:: max_training_samples (int) – desired maximum number of samples

set_number_of_samples(nsamples=20, skip=None)

Set the number of samples for the study.

Parameters:

nsamples (int) – Number of parameter samples to generate from Halton sequence.
skip (int) – When continuing an existing design, the user may optionally skip ahead in the Halton sequence by an amount determined by ‘skip’.

set_number_of_test_samples(number_of_test_samples)

Set the number of samples that will be used for testing. By default we test against max_training_samples/20 or the number of parameters*10, whichever is greater.

Parameters:: number_of_test_samples (int) – desired number of test samples

set_parameters(*parameters)

Parameters:: parameters (Parameter or ParameterCollection) – The parameters of interest for the study.
Raises:: StudyTypeError – if the parameters are of incorrect type.

set_results_storage_options(data: bool = True, qois: bool = True, residuals: bool = True, objectives: bool = True, weighted_conditioned: bool = False, results_save_frequency: int = 1)

Set which history information to save and return with the study results. You can also down sample which evaluations to save using results_save_frequency. This is particularly useful if you wish to not store finite difference evaluations for gradient based studies. The total objective is always stored.

Parameters:

data (bool) – Store the raw data for each simulation and the raw experimental data for each objective for each desired evaluation.
qois (bool) – Store the QoIs for each objective for each desired evaluation. This includes both experiment and simulation QoIs
residuals (bool) – Store the residuals for each objective for each desired evaluation.
objectives (bool) – Store the objective by state and evaluation set for each desired evaluation.
weighted_conditioned (bool) – Store the weighted and conditioned values for each desired evaluation. This will save the weighted and conditioned residuals, simulation QoIs, and experiment QoIs.
results_save_frequency (int) – Set the interval at which results are saved. For studies where finite difference derivatives are used, an interval of $n+1$ will exclude finite difference results from the saved results history.

set_seed(seed)

Set the random seed for the random generator that the study uses to generate the samples. The method should be called before launch() to guarantee reproducibility.

Parameters:: seed (int) – Integer seed for the pseudo‑random number generator.

set_surrogate_save_filename(filename)

Set the path used to save the surrogate object after each training batch.

The surrogate (an AdaptiveSurrogate instance) is periodically saved to disk with matcal.core.serializer_wrapper.matcal_save(). The filename must be a non‑empty string that ends with the .joblib extension. The directory component of the path is not created automatically; it must already exist or be created by the user prior to calling this method.

Parameters:

filename (str) – Full path (absolute or relative) to the file that will store the surrogate. The filename must end with .joblib. Example: "my_model_sparse_grid_surrogate.joblib" or "/tmp/surrogate.joblib".

Raises:

ValueError – If filename does not contain the required .joblib suffix or is empty.
TypeError – If filename is not a string.

set_target_field_name(target_field_name)

Specify the field name for the response that the surrogate model will seek to replicate. This is generally a model response such as temperature, load, etc.

Parameters:: target_field_name (str) – the name of the field that the surrogate will replicate

set_test_data(study_results)

Provide an external test‑data set for the adaptive surrogate study. This must contain the model name and field names necessary for the surrogate. This should only be used when re-running surrogate generation with a previously existing test set from a previous run where surrogate training was attempted. The independent variable data must also match what is specified for surrogate training.

If this method is not called, the adaptive study will automatically generate a test data set using a Halton sampling design. The number of test samples is taken from the value set via set_number_of_test_samples() (default is max_training_samples // 20 or n_parameters * 10, whichever is larger). Supplying an explicit test data set overrides that behavior.

Parameters:

study_results (StudyResults or str) – The test data to be used for surrogate evaluation. Accepted forms are:
- StudyResults: a StudyResults instance containing the desired parameter history and simulation results.
- str: a path to a serialized .joblib file that, when loaded, returns a StudyResults object.

Raises:

TypeError – If study_results is neither a StudyResults instance nor a string.
FileNotFoundError – If study_results is a string but the file cannot be located or loaded.
RuntimeError – If the loaded object is not a StudyResults instance.

Notes:

- The supplied test set is only used for validation of the surrogate; it is never incorporated into the training data.
- Calling this method multiple times replaces any previously stored test data with the most recent value.
- The test data must be compatible with the study’s parameter space (same parameter names and bounds as the training data).

set_test_group_random_seed(seed)

Set the random seed for the random generator that the study uses to generate the test samples only. The method should be called before launch() to guarantee reproducibility.

Parameters:: seed (int) – Integer seed for the pseudo‑random number generator.

set_use_threads(always_use_threads=False)

By default, MatCal assumes that the model being run is CPU intensive. As a result, it runs each model in a subprocess, which can add some overhead. If running studies with cheaper Python models, it may be beneficial to use threading instead of a subprocess. Using this method will run the study with threading if only one model can be evaluated at a time. You can optionally run with threads even with concurrent model evaluations with the “always_use_threads” option; however, this can be less reliable. For large memory calibrations, we always recommend using subprocess.

Finally, any external executable is always run using subprocess, but threading can be used to manage that job and return its results.

Parameters:: always_use_threads (bool) – if true, MatCal will use threads over subprocess for concurrent modeling jobs. Defaults to False.

set_working_directory(working_directory, remove_existing=False)

By default, MatCal runs in the current working directory. This method allows the user to specify a subdirectory in the current directory for the study to be run in. This method will create only the last directory in the path, so if the desired subdirectory is nested under multiple folders from the current directory, MatCal will error if the head of the path does not exist. See os.path.split() for a definition of the path “head”.

Parameters:

working_directory (str) – The desired working directory for the current study. MatCal will only create the last folder if the path is a nested path.
remove_existing (bool) – If True, the directory will be removed at study launch if it already exists.

property surrogate

Return the AdaptiveSurrogate instance that holds the surrogate models and their training history.

Returns:: The surrogate object, or None if the study has not yet created one (i.e., before launch() is called).
Return type:: AdaptiveSurrogate | None

property surrogate_save_filename

Retrieve the filename (including the .joblib extension) that will be used to save the surrogate object after each training batch.

Returns:: The absolute or relative path supplied via set_surrogate_save_filename(), or None if no filename has been set.
Return type:: str | None