.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "introduction_examples/surrogate_studies/plot_generate_surrogate_a_standard.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_introduction_examples_surrogate_studies_plot_generate_surrogate_a_standard.py: Surrogate Generation Example ============================== This example demonstrates how to generate a basic surrogate from a MatCal study. This example will cover: * Generating a base data set for surrogate generation * Generating a surrogate from a MatCal Study * Obtaining predictions from a surrogate * How to load a saved surrogate * How to launch an interactive window for interrogating a surrogate The problem of interest is the uncertainty surrounding the boundary conditions for a foam and metal component in a high temperature environment. In this problem, a layer of foam separates two layers of steel. The top layer of steel is heated from a far field radiative source and convective heating from the heated gas surrounding it. We are concerned with the temperature rise immediately behind both metal layers. The temperature of the gas, the temperature of the far field source, and the speed of the ambient gas are uncertain. This problem can be solved directly using a finite element simulation. However for large UQ analysis or complicated calibrations, the evaluation time of the finite element simulation can cause studies that require many evaluations of the model to be prohibitively long. Using surrogates as a replacement for higher-fidelity simulations offers an approach to reduce the severity of these challenges. Surrogates can enable advanced analysis techniques such as Bayesian calibration or Multi-fidelity modeling. Surrogates in MatCal are data-based curve fits to predictions from higher fidelity simulations. As such they require an initial body (the word 'corpus' is often used as well) of data to be constructed, but then allow for near instant evaluation for future predictions. To generate this body of data, a large initial battery of simulations must be run. To do this in MatCal, we will run an LHS sampling study. An LHS study will allow us to efficiently sample our prediction space. The need to run a large battery of simulations before one gets a surrogate begs the question 'Why should I spend the time to generate a surrogate when I could just run my study with my high fidelity model?' This is an appropriate question, for simple small analyses generating a surrogate model is not worth the upfront cost but, as the analysis gets more complicated and model evaluations more numerous, having a surrogate for relevant quantities of interest can save a lot of time. In addition, generating the body of data necessary for building a surrogate is often extremely parallelizable, thus with sufficient computing resources all of the necessary simulations can be run in a few simultaneous batches. To generate a surrogate that predicts the heating behavior for a given set of boundary conditions, we start this example by importing MatCal and numpy. .. GENERATED FROM PYTHON SOURCE LINES 51-55 .. code-block:: Python # sphinx_gallery_thumbnail_number = 3 import matcal as mc import numpy as np .. GENERATED FROM PYTHON SOURCE LINES 56-68 For this example there are three parameters that define our boundary conditions. the convective heat transfer coefficient, H, relates how rapidly energy is exchanged between the ambient gas and the solid components, it is closely related to the speed of the ambient gasses. We will use it as an abstraction for how fast the gasses are moving around our component. Air values of H near 1 are characteristic of low flow environments, values near 10 are for conditions of moderate flow, and values near 100 are for conditions of strong flow. The other two parameters of interest are the temperatures of the air and a far field heat source. The ranges of these values were chosen to be on the order of temperatures seen near a fire. .. GENERATED FROM PYTHON SOURCE LINES 68-72 .. code-block:: Python conv_heat_transfer_coeff = mc.Parameter("H", 1, 100) # W / (m^2 K) far_field_temperature = mc.Parameter("T_inf", 500, 1000) # K air_temperature = mc.Parameter("T_air", 400, 800) # K .. GENERATED FROM PYTHON SOURCE LINES 73-76 We then load these parameters into a Latin Hypercube Sensitivity Study. This is the study that will be used to generate our body of training data for the surrogate. For more details see :class:`~matcal.dakota.sensitivity_studies.LhsSensitivityStudy`. .. GENERATED FROM PYTHON SOURCE LINES 76-79 .. code-block:: Python sampling_study = mc.LhsSensitivityStudy(conv_heat_transfer_coeff, far_field_temperature, air_temperature) .. GENERATED FROM PYTHON SOURCE LINES 80-90 Through defining an objective for the LHS, we define what our independent and dependent fields of interest are. In this case, we want to use 'time' as our independent field. Since we do not need to compare to experimental data for this study, we will use a :class:`~matcal.core.objective.SimulationResultsSynchronizer` in place of the objective. It needs the independent field, the values of interest for the independent field and any dependent fields of interest for the study and resulting surrogate. When determining the independent field values of interest, it is important to select an appropriate number of prediction points. For more complicated physical evolutions, selecting too few points will generate poor surrogates. .. GENERATED FROM PYTHON SOURCE LINES 90-97 .. code-block:: Python n_prediction_points = 200 time_start = 0 time_end = 60 * 60 * 2 indep_field_vals = np.linspace(time_start, time_end, n_prediction_points) my_objective = mc.SimulationResultsSynchronizer('time', indep_field_vals, "TC_top", "TC_bottom") .. GENERATED FROM PYTHON SOURCE LINES 98-100 Next, we need to inform MatCal about our high fidelity model. Our model is a SIERRA/aria model that we define in a local subdirectory 'aria_model'. .. GENERATED FROM PYTHON SOURCE LINES 100-111 .. code-block:: Python my_hifi_model = mc.UserDefinedSierraModel('aria', "aria_model/metal_foam_layers.i", "aria_model/test_block.g", "aria_model/include") my_hifi_model.set_results_filename("results/results.csv") my_hifi_model.set_number_of_cores(2) from site_matcal.sandia.tests.utilities import MATCAL_WCID from site_matcal.sandia.computing_platforms import is_sandia_cluster if is_sandia_cluster(): my_hifi_model.run_in_queue(MATCAL_WCID, 0.25) my_hifi_model.continue_when_simulation_fails() my_hifi_model.set_number_of_cores(12) .. GENERATED FROM PYTHON SOURCE LINES 112-118 Now we have all of our necessary components for a LHS study. We pass our model and objective into the study. We then tell our study how many cores its can use and the number of samples it needs to run. We chose 500 samples for this example because it has a decent performance floor and runs in a reasonable amount of time. Depending on the complexity of your problem, a larger sample set may be required (1000-10000). .. GENERATED FROM PYTHON SOURCE LINES 118-127 .. code-block:: Python sampling_study.add_evaluation_set(my_hifi_model, my_objective) if is_sandia_cluster(): sampling_study.set_core_limit(250) else: sampling_study.set_core_limit(112) sampling_study.set_number_of_samples(500) sampling_study.set_seed(12345) sampling_study.set_working_directory("standard_surrogate_training", remove_existing=True) .. GENERATED FROM PYTHON SOURCE LINES 128-132 With our study defined, we run it and wait for it to complete. While it will generate information with regards to the sensitivity of the quantities of interest to the parameters, we are mostly interested in the model results the study produced. .. GENERATED FROM PYTHON SOURCE LINES 132-134 .. code-block:: Python study_results = sampling_study.launch() .. GENERATED FROM PYTHON SOURCE LINES 135-147 Now that the study is done running, we will generate a surrogate for the model using information stored in the study and its results. To generate a surrogate we use MatCal's :class:`~matcal.core.surrogates.SurrogateGenerator`. We construct a generator by passing in the study we just completed. If we wanted to we could alter some of the surrogate generator's settings by evoking :meth:`~matcal.core.surrogates.SurrogateGenerator.set_surrogate_details`, but we pass arguments for the surrogate generator directly through its initialization. We then generate our surrogate by calling :meth:`~matcal.core.surrogates.SurrogateGenerator.generate` with a filename we would like to save our surrogate to. The method then returns the surrogate, and saves a copy of it to the filename we passed with a ".joblib" file extension. .. GENERATED FROM PYTHON SOURCE LINES 147-155 .. code-block:: Python surrogate_generator = mc.SurrogateGenerator(sampling_study, interpolation_field='time', regressor_type="Gaussian Process", n_restarts_optimizer=20, alpha=1e-5, normalize_y=True) surrogate_generator.set_PCA_details(decomp_var=4) surrogate = surrogate_generator.generate("layered_metal_bc_surrogate") .. GENERATED FROM PYTHON SOURCE LINES 156-212 To avoid rerunning a sampling study when debugging the surrogate generator, it is recommended that one pass a :class:`~matcal.core.study_base.StudyResults` with the relevant information from the sampling study rather than rerun the whole study when that is not required. This information is stored in the "final_results.joblib" file generated by the sampling study. This information can be loaded by calling :func:`~matcal.core.serializer_wrapper.matcal_load`. While the surrogate is being trained, the generator will report the testing and training scores for each QOI the surrogate was requested to predict. The best score for any test is 1, with poorer scores less than 1. The training score represents how well the surrogate performs on the data it was trained with, and the test score indicates how well the surrogate performs on data it was not trained on. Ideally both of these scores should be greater than .95. If either score is much below that then the surrogate will likely have poor applicability. .. warning:: These scores represent how well the surrogates predict the PCA mode amplitudes not the actual curves. Therefore, adequate test scores may not be a direct indication of accuracy for predicting the response in the original space. If there are too many modes, the score may be low, but the predictions may be adequate. If there are too few modes, the score may be high, but the predictions may be poor. Always verify surrogate quality as we do below. Even with relatively high scores, the result will likely be a decent approximation of the desired response. This can still be useful if the actual models are very expensive and you need a less expensive model to determine areas in the parameter space the produce desired results. A focused study can then be performed with the full model after using the surrogate model to identify regions of interest in the parameter space. One important case is when the training score is much higher than the testing score. This is an indication that the surrogate is overfitting to its training data. This means that predictions outside of the training data set are likely to be very inaccurate. If this is the case there can be a couple of common causes: #. The source data is poor, try increasing the number of prediction points and the number of samples run. #. There is insufficient data for the underlying predictor. Increase the number of samples used during sampling and/or reduce the complexity level of the predictor. #. There is a poor corelation between the QOIs and the parameters. Examine the results of the sensitivity study to gain a better understanding of how the QOIs and the parameters relate to each other and then try again. #. Trying to predict QOI that change by several orders of magnitude (even going from 1 to near 0). In these cases it is better to calibrate to the natural log of these values. This can be done using the :meth:`~matcal.core.surrogates.SurrogateGenerator.set_fields_to_log_scale` method of the surrogate generator. The scores are output in the log files and standard output, but can also be accessed as properties under the surrogate after it has been produced. We print the scores below for this surrogate. .. GENERATED FROM PYTHON SOURCE LINES 212-215 .. code-block:: Python print('Train scores:\n', surrogate.scores['train']) print('Test scores:\n', surrogate.scores['test']) .. rst-class:: sphx-glr-script-out .. code-block:: none Train scores: OrderedDict({'TC_top': 0.9999770041086106, 'TC_bottom': 0.9987124543658804}) Test scores: OrderedDict({'TC_top': 0.9999501576521572, 'TC_bottom': 0.9961329082917857}) .. GENERATED FROM PYTHON SOURCE LINES 216-229 Both the test scores and the training scores indicate the surrogates are well trained and can be used to predict our responses. Now we use the surrogate to make predictions of the model responses. To do so, we pass in an argument list or keyword argument list for the parameters that we want evaluated. The surrogate will return a dictionary of predictions. The order of the parameters is the same order that they were passed into the the parameter collection or study, but this can be verified by calling :meth:`~matcal.core.surrogates.MatCalMultiModalPCASurrogate.parameter_order`. If keyword arguments are used, the keywords must match the parameter names assigned in the :class:`~matcal.core.parameters.Parameter` object inits. .. GENERATED FROM PYTHON SOURCE LINES 229-233 .. code-block:: Python H = 10 T_inf = 600 T_air = 400 .. GENERATED FROM PYTHON SOURCE LINES 234-243 By default, the surrogates will not allow evaluations outside of the parameter space ranges provided in the combined test and training data sets. Usually, the specified parameter bounds will not be included in the samples generated from a parameter sampling study as done above. To permit evaluations at the bounds, you can update the permissible parameter ranges using :meth:`~matcal.core.surrogates.MatCalMultiModalPCASurrogate.set_parameter_ranges` as we do below. Since we modified the surrogate, we also have to save the surrogate so that the changes are in the stored surrogate just in case we attempt to use it later. .. GENERATED FROM PYTHON SOURCE LINES 243-249 .. code-block:: Python surrogate.set_parameter_ranges(H=[conv_heat_transfer_coeff.get_lower_bound(), conv_heat_transfer_coeff.get_upper_bound()], T_inf=[far_field_temperature.get_lower_bound(), far_field_temperature.get_upper_bound()], T_air=[air_temperature.get_lower_bound(), air_temperature.get_upper_bound()]) .. GENERATED FROM PYTHON SOURCE LINES 250-255 Since we modified the surrogate, we must save the surrogate so that the changes are in the stored surrogate just in case we attempt to use it later. It is saved using the :func:`~matcal.core.serializer_wrapper.matcal_save` function. .. GENERATED FROM PYTHON SOURCE LINES 255-256 .. code-block:: Python mc.matcal_save("layered_metal_bc_surrogate.joblib", surrogate) .. GENERATED FROM PYTHON SOURCE LINES 257-261 With the parameter ranges updated, we can no evaluate the surrogate with our desired parameters. This evaluation includes T_air = 400, which was the user specified bound for the training data study. .. GENERATED FROM PYTHON SOURCE LINES 261-274 .. code-block:: Python prediction = surrogate(H, T_inf, T_air) import matplotlib.pyplot as plt plt.close('all') plt.figure(constrained_layout=True) plt.plot(prediction['time'], prediction['TC_top'].flatten(), label="top") plt.plot(prediction['time'], prediction['TC_bottom'].flatten(), label="bottom") plt.xlabel("time (s)") plt.ylabel("temperature (K)") plt.legend() plt.title("Single Surrogate Prediction") plt.show() .. image-sg:: /introduction_examples/surrogate_studies/images/sphx_glr_plot_generate_surrogate_a_standard_001.png :alt: Single Surrogate Prediction :srcset: /introduction_examples/surrogate_studies/images/sphx_glr_plot_generate_surrogate_a_standard_001.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 275-279 Multiple sets of parameters can be evaluated simultaneously. Each field in the returned prediction will have a number of rows equal to the number of passed parameter sets. To evaluate multiple parameter sets, pass the `batch_evaluate=True` keyword argument .. GENERATED FROM PYTHON SOURCE LINES 279-285 .. code-block:: Python H2 = 20 T_inf2 = 815 T_air2 = 634 prediction2 = surrogate([[H, T_inf, T_air], [H2, T_inf2, T_air2]], batch_evaluate=True) .. GENERATED FROM PYTHON SOURCE LINES 286-291 We can also run the actual model for these parameters for comparison to the surrogate. Doing this step is recommended when determining if a surrogate is adequate for use in calibration or other studies. We do so using the :class:`~matcal.core.parameter_studies.ParameterStudy`. .. GENERATED FROM PYTHON SOURCE LINES 291-299 .. code-block:: Python param_study = mc.ParameterStudy(conv_heat_transfer_coeff, far_field_temperature, air_temperature) param_study.add_evaluation_set(my_hifi_model, my_objective) param_study.set_core_limit(16) param_study.add_parameter_evaluation(H=H, T_inf=T_inf, T_air=T_air) param_study.add_parameter_evaluation(H=H2, T_inf=T_inf2, T_air=T_air2) results = param_study.launch() .. GENERATED FROM PYTHON SOURCE LINES 300-303 With both the finite element model results and the surrogate model results obtained, we can plot them together for comparison. .. GENERATED FROM PYTHON SOURCE LINES 303-333 .. code-block:: Python fe_data1 = results.simulation_history[my_hifi_model.name]["matcal_default_state"][0] fe_data2 = results.simulation_history[my_hifi_model.name]["matcal_default_state"][1] plt.figure(constrained_layout=True) plt.plot(prediction2['time'], prediction2['TC_top'][0,:], '.', label="top prediction 1", color='tab:blue') plt.plot(prediction2['time'], prediction2['TC_top'][1,:], '.', label="top prediction 2", color='tab:orange') plt.plot(prediction2['time'], prediction2['TC_bottom'][0,:], '.', label="bottom prediction 1", color='tab:green') plt.plot(prediction2['time'], prediction2['TC_bottom'][1,:], '.', label="bottom prediction 2", color='tab:red') plt.plot(fe_data1['time'], fe_data1['TC_top'], label="top FE results 1", color='cornflowerblue') plt.plot(fe_data2['time'], fe_data2['TC_top'], label="top FE results 2", color='orange') plt.plot(fe_data1['time'], fe_data1['TC_bottom'], label="bottom FE results 1", color='lightgreen') plt.plot(fe_data2['time'], fe_data2['TC_bottom'], label="bottom FE results 2", color='orangered') plt.xlabel("time (s)") plt.ylabel("temperature (K)") plt.legend(ncols=2) plt.title("Multiple Surrogate Predictions") plt.show() .. image-sg:: /introduction_examples/surrogate_studies/images/sphx_glr_plot_generate_surrogate_a_standard_002.png :alt: Multiple Surrogate Predictions :srcset: /introduction_examples/surrogate_studies/images/sphx_glr_plot_generate_surrogate_a_standard_002.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 334-338 Similarly, we can plot the surrogate model error. First, we interpolate the surrogate results to the finite element model times. Next, we calculate and plot the absolute error for each prediction. .. GENERATED FROM PYTHON SOURCE LINES 338-370 .. code-block:: Python interp_prediction_top1 = np.interp(fe_data1['time'], prediction2['time'], prediction2['TC_top'][0,:]) interp_prediction_top2 = np.interp(fe_data2['time'], prediction2['time'], prediction2['TC_top'][1,:]) interp_prediction_bot1 = np.interp(fe_data1['time'], prediction2['time'], prediction2['TC_bottom'][0,:]) interp_prediction_bot2 = np.interp(fe_data2['time'], prediction2['time'], prediction2['TC_bottom'][1,:]) plt.figure(constrained_layout=True) plt.plot(fe_data1['time'], interp_prediction_top1-fe_data1['TC_top'], label="top TC error 1", color='tab:blue') plt.plot(fe_data2['time'], interp_prediction_top2-fe_data2['TC_top'], label="top TC error 2", color='tab:orange') plt.plot(fe_data1['time'], interp_prediction_bot1-fe_data1['TC_bottom'], label="bottom TC error 1", color='tab:green') plt.plot(fe_data2['time'], interp_prediction_bot2-fe_data2['TC_bottom'], label="bottom TC error 2", color='tab:red') plt.xlabel("time (s)") plt.ylabel("temperature error (K)") plt.legend(ncols=2) plt.title("Multiple Surrogate Predictions") plt.show() .. image-sg:: /introduction_examples/surrogate_studies/images/sphx_glr_plot_generate_surrogate_a_standard_003.png :alt: Multiple Surrogate Predictions :srcset: /introduction_examples/surrogate_studies/images/sphx_glr_plot_generate_surrogate_a_standard_003.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 371-380 These results show that the surrogates predict the response fairly well. Most of the error is below 10 K throughout the entire history which is just a few percent for the curves. The second prediction for the bottom thermal couple has the worst surrogate prediction late in the time history. This could potentially be improved with more modes and more training samples. If needed, we can load this surrogate again for future use by constructing a :class:`~matcal.core.surrogates.MatCalMultiModalPCASurrogate`, with the saved filename created during the surrogate's generation. .. GENERATED FROM PYTHON SOURCE LINES 380-384 .. code-block:: Python loaded_surrogate = mc.matcal_load("layered_metal_bc_surrogate.joblib") loaded_prediction2 = loaded_surrogate([[H, T_inf, T_air], [H2, T_inf2, T_air2]], batch_evaluate=True) .. GENERATED FROM PYTHON SOURCE LINES 385-390 In the follow-on examples, we compare this surrogate to adaptive surrogates. To do so, we evaluate the surrogate over a Halton sampling of the parameter space with a given seed. We then compare the maximum average error in the surrogate give 500 non-adaptive training samples to the adaptive surrogates in the other examples. .. GENERATED FROM PYTHON SOURCE LINES 390-399 .. code-block:: Python test_study = mc.HaltonStudy(conv_heat_transfer_coeff, far_field_temperature, air_temperature) test_study.add_evaluation_set(my_hifi_model, my_objective) test_study.set_core_limit(250) test_study.set_number_of_samples(30) test_study.set_seed(12345) test_study.set_working_directory("standard_surrogate_test", remove_existing=True) test_results = test_study.launch() .. GENERATED FROM PYTHON SOURCE LINES 401-405 .. code-block:: Python surrogate_generator.set_surrogate_details(training_fraction=1.0, test_eval_info=test_results) surrogate = surrogate_generator.generate("layered_metal_bc_surrogate") print(surrogate.max_errors) .. rst-class:: sphx-glr-script-out .. code-block:: none OrderedDict({'train': OrderedDict({'TC_top': 5.005222870508078, 'TC_bottom': 17.061705757217737}), 'test': OrderedDict({'TC_top': 4.585223974808059, 'TC_bottom': 116.01490147862319})}) .. GENERATED FROM PYTHON SOURCE LINES 406-413 Lastly, the surrogate can be investigated in an interactive manner using MatCal's interactive tools. To do so, use the command line call: .. code-block:: python interactive_matcal -s This command will launch a browser window in which you can investigate your surrogate. .. rst-class:: sphx-glr-timing **Total running time of the script:** (7 minutes 48.017 seconds) .. _sphx_glr_download_introduction_examples_surrogate_studies_plot_generate_surrogate_a_standard.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_generate_surrogate_a_standard.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_generate_surrogate_a_standard.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: plot_generate_surrogate_a_standard.zip ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_