.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "introduction_examples/surrogate_studies/plot_generate_surrogate_a_standard.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_introduction_examples_surrogate_studies_plot_generate_surrogate_a_standard.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_introduction_examples_surrogate_studies_plot_generate_surrogate_a_standard.py:


Surrogate Generation Example
==============================

This example demonstrates how to generate a basic surrogate from a MatCal study.
This example will cover:

* Generating a base data set for surrogate generation
* Generating a surrogate from a MatCal Study
* Obtaining predictions from a surrogate 
* How to load a saved surrogate
* How to launch an interactive window for interrogating a surrogate 

The problem of interest is the uncertainty surrounding the boundary 
conditions for a foam and metal component in a high temperature environment.
In this problem, a layer of foam separates two layers of steel. The top layer of 
steel is heated from a far field radiative source and convective heating from the heated
gas surrounding it. We are concerned with the temperature rise immediately behind both metal layers.
The temperature of the gas, the temperature
of the far field source, and the speed of the ambient gas are uncertain. 

This problem can be solved directly using a finite element simulation.
However for large UQ analysis or complicated calibrations, the evaluation time of the 
finite element simulation can cause studies that require many evaluations of the model 
to be prohibitively long. Using surrogates
as a replacement for higher-fidelity simulations offers an approach to reduce the
severity of these challenges. Surrogates can enable advanced analysis techniques such as 
Bayesian calibration or Multi-fidelity modeling. 

Surrogates in MatCal are data-based curve fits to predictions from higher fidelity 
simulations. As such they require an initial body (the word 'corpus' is often used as well)
of data to be constructed, but then allow for near instant evaluation for future predictions. 
To generate this body of data, a large initial battery of simulations must be run. 
To do this in MatCal, we will run an LHS sampling study. An LHS study will allow us
to efficiently sample our prediction space. 

The need to run a large battery of simulations before one gets a surrogate begs the question
'Why should I spend the time to generate a surrogate when I could just run my study with 
my high fidelity model?' This is an appropriate question, for simple small analyses 
generating a surrogate model is not worth the upfront cost but, as the analysis gets 
more complicated and model evaluations more numerous,
having a surrogate for relevant quantities of 
interest can save a lot of time. In addition, generating the body of data necessary 
for building a surrogate is often extremely parallelizable, thus with sufficient
computing resources all of the necessary simulations can be run in a few 
simultaneous batches.

To generate a surrogate that predicts the heating behavior for a given set of boundary conditions,
we start this example by importing MatCal and numpy.

.. GENERATED FROM PYTHON SOURCE LINES 51-55

.. code-block:: Python

    # sphinx_gallery_thumbnail_number = 3
    import matcal as mc
    import numpy as np


.. GENERATED FROM PYTHON SOURCE LINES 56-68

For this example there are three parameters that define our boundary conditions. 
the convective heat transfer coefficient, H, relates how rapidly energy is exchanged
between the ambient gas and the solid components, it is closely related to the speed of 
the ambient gasses. We will use it as an abstraction for how fast the gasses 
are moving around our component.
Air values of H near 1 are characteristic of low flow environments, 
values near 10 are for conditions of moderate
flow, and values near 100 are for conditions of strong flow. 
The other two parameters of interest are 
the temperatures of the air and a far field heat source. The ranges of 
these values were chosen to 
be on the order of temperatures seen near a fire. 

.. GENERATED FROM PYTHON SOURCE LINES 68-72

.. code-block:: Python

    conv_heat_transfer_coeff = mc.Parameter("H", 1, 100) # W / (m^2 K)
    far_field_temperature = mc.Parameter("T_inf", 500, 1000) # K
    air_temperature = mc.Parameter("T_air", 400, 800) # K


.. GENERATED FROM PYTHON SOURCE LINES 73-76

We then load these parameters into a Latin Hypercube Sensitivity Study. This is the 
study that will be used to generate our body of training data for the surrogate. 
For more details see :class:`~matcal.dakota.sensitivity_studies.LhsSensitivityStudy`.

.. GENERATED FROM PYTHON SOURCE LINES 76-79

.. code-block:: Python

    sampling_study = mc.LhsSensitivityStudy(conv_heat_transfer_coeff, far_field_temperature, 
                                            air_temperature)


.. GENERATED FROM PYTHON SOURCE LINES 80-90

Through defining an objective for the LHS, we define what our independent and dependent fields of 
interest are. In this case, we want to use 'time' as our independent field. Since
we do not need to compare to experimental data for this study, 
we will use a :class:`~matcal.core.objective.SimulationResultsSynchronizer`
in place of the objective. It needs the independent field, 
the values of interest for the independent field and any dependent fields
of interest for the study and resulting surrogate.
When determining the independent field values of interest, it is important
to select an appropriate number of prediction points. For more complicated 
physical evolutions, selecting too few points will generate poor surrogates. 

.. GENERATED FROM PYTHON SOURCE LINES 90-97

.. code-block:: Python

    n_prediction_points = 200
    time_start = 0
    time_end = 60 * 60 * 2
    indep_field_vals = np.linspace(time_start, time_end, n_prediction_points)
    my_objective = mc.SimulationResultsSynchronizer('time', indep_field_vals,
                                                     "TC_top", "TC_bottom")


.. GENERATED FROM PYTHON SOURCE LINES 98-100

Next, we need to inform MatCal about our high fidelity model. Our model 
is a SIERRA/aria model that we define in a local subdirectory 'aria_model'. 

.. GENERATED FROM PYTHON SOURCE LINES 100-111

.. code-block:: Python

    my_hifi_model = mc.UserDefinedSierraModel('aria', "aria_model/metal_foam_layers.i", 
                                              "aria_model/test_block.g", "aria_model/include")
    my_hifi_model.set_results_filename("results/results.csv")
    my_hifi_model.set_number_of_cores(2)
    from site_matcal.sandia.tests.utilities import MATCAL_WCID
    from site_matcal.sandia.computing_platforms import is_sandia_cluster
    if is_sandia_cluster():
        my_hifi_model.run_in_queue(MATCAL_WCID, 0.25)
        my_hifi_model.continue_when_simulation_fails()
        my_hifi_model.set_number_of_cores(12)


.. GENERATED FROM PYTHON SOURCE LINES 112-118

Now we have all of our necessary components for a LHS study. We pass our 
model and objective into the study. We then tell our study 
how many cores its can use and the number of samples it needs to run. 
We chose 500 samples for this example because it has a decent performance floor 
and runs in a reasonable amount of time. Depending on the complexity of your problem, 
a larger sample set may be required (1000-10000). 

.. GENERATED FROM PYTHON SOURCE LINES 118-127

.. code-block:: Python

    sampling_study.add_evaluation_set(my_hifi_model, my_objective)
    if is_sandia_cluster():
        sampling_study.set_core_limit(250)
    else:
        sampling_study.set_core_limit(112)
    sampling_study.set_number_of_samples(500)
    sampling_study.set_seed(12345)
    sampling_study.set_working_directory("standard_surrogate_training", remove_existing=True)


.. GENERATED FROM PYTHON SOURCE LINES 128-132

With our study defined, we run it and wait for it to complete. 
While it will generate information with regards to the sensitivity of the 
quantities of interest to the parameters, we are mostly interested in the model
results the study produced. 

.. GENERATED FROM PYTHON SOURCE LINES 132-134

.. code-block:: Python

    study_results = sampling_study.launch()


.. GENERATED FROM PYTHON SOURCE LINES 135-147

Now that the study is done running, we will generate a surrogate for the model
using information stored in the study and its results. 
To generate a surrogate we use MatCal's :class:`~matcal.core.surrogates.SurrogateGenerator`.
We construct a generator by passing in the study we just completed.
If we wanted to we could alter some of the surrogate generator's settings
by evoking :meth:`~matcal.core.surrogates.SurrogateGenerator.set_surrogate_details`, 
but we pass arguments for the surrogate generator directly through its initialization.
We then generate our surrogate by 
calling :meth:`~matcal.core.surrogates.SurrogateGenerator.generate` with 
a filename we would like to save our surrogate to. 
The method then returns the surrogate, and saves a copy of it to 
the filename we passed with a ".joblib" file extension. 

.. GENERATED FROM PYTHON SOURCE LINES 147-155

.. code-block:: Python

    surrogate_generator = mc.SurrogateGenerator(sampling_study, interpolation_field='time',
                                                regressor_type="Gaussian Process", 
                                                n_restarts_optimizer=20, 
                                                alpha=1e-5, 
                                                normalize_y=True)
    surrogate_generator.set_PCA_details(decomp_var=4)
    surrogate = surrogate_generator.generate("layered_metal_bc_surrogate")


.. GENERATED FROM PYTHON SOURCE LINES 156-212

To avoid rerunning a sampling study when debugging the surrogate generator, 
it is recommended that one pass a :class:`~matcal.core.study_base.StudyResults`
with the relevant information from the sampling study rather than rerun the whole 
study when that is not required. This information is stored in the "final_results.joblib"
file generated by the sampling study. This information can be loaded by calling 
:func:`~matcal.core.serializer_wrapper.matcal_load`.

While the surrogate is being trained, 
the generator will report the testing and training scores for each QOI 
the surrogate was requested to predict. The best score for any test is 1, 
with poorer scores less than 1. The training score represents how well the 
surrogate performs on the data it was trained with, and the test score
indicates how well the surrogate performs on data it was not trained on. 
Ideally both of these scores should be greater than .95. If either score is 
much below that then the surrogate will likely have poor applicability. 

.. warning::
   These scores represent how well the surrogates predict the PCA mode amplitudes
   not the actual curves. Therefore, adequate test scores may not be a direct 
   indication of accuracy for predicting the response in the original space.
   If there are too many modes, the score may be low, but the predictions may be 
   adequate. If there are too few modes, the score may be high, but the predictions
   may be poor.
   Always verify surrogate quality as we do below.

Even with relatively high scores, the result will likely be a decent approximation 
of the desired response. This can still be useful if the actual models are very expensive
and you need a less expensive model to determine areas in the parameter space 
the produce desired results. A focused study can then be performed 
with the full model after using the surrogate model to identify regions of interest
in the parameter space.

One important case is when the training score is much higher than the testing score. 
This is an indication that the surrogate is overfitting to its training data. 
This means that predictions outside of the training data set are likely to be very 
inaccurate. If this is the case there can be a couple of common causes:

#. The source data is poor, try increasing the number of prediction points and the 
   number of samples run. 

#. There is insufficient data for the underlying predictor. Increase the number of
   samples used during sampling and/or reduce the complexity level of the predictor. 

#. There is a poor corelation between the QOIs and the parameters. Examine the
   results of the sensitivity study to gain a better understanding of how the QOIs 
   and the parameters relate to each other and then try again. 

#. Trying to predict QOI that change by several orders of magnitude (even going from 1 to near 0).
   In these cases it is better to calibrate to the natural log of these values. This can be 
   done using the :meth:`~matcal.core.surrogates.SurrogateGenerator.set_fields_to_log_scale`
   method of the surrogate generator. 

The scores are output in the log files and standard output, but can 
also be accessed as properties under the surrogate after 
it has been produced. We print the scores below 
for this surrogate.

.. GENERATED FROM PYTHON SOURCE LINES 212-215

.. code-block:: Python

    print('Train scores:\n', surrogate.scores['train'])
    print('Test scores:\n', surrogate.scores['test'])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Train scores:
     OrderedDict({'TC_top': 0.9999770041086106, 'TC_bottom': 0.9987124543658804})
    Test scores:
     OrderedDict({'TC_top': 0.9999501576521572, 'TC_bottom': 0.9961329082917857})


.. GENERATED FROM PYTHON SOURCE LINES 216-229

Both the test scores and the training scores indicate the surrogates are well
trained and can be used to predict our responses. 

Now we use the surrogate to make predictions of the model 
responses. 
To do so, we pass in an argument list or 
keyword argument list for the parameters that we want evaluated.
The surrogate will return a dictionary of predictions.  
The order of the parameters is the same order that they were 
passed into the the parameter collection or study, but this can be verified by 
calling :meth:`~matcal.core.surrogates.MatCalMultiModalPCASurrogate.parameter_order`.
If keyword arguments are used, the keywords must match the parameter names 
assigned in the :class:`~matcal.core.parameters.Parameter` object inits.

.. GENERATED FROM PYTHON SOURCE LINES 229-233

.. code-block:: Python

    H = 10
    T_inf = 600
    T_air = 400


.. GENERATED FROM PYTHON SOURCE LINES 234-243

By default, the surrogates will not allow evaluations outside of the 
parameter space ranges provided in the combined test and training data sets.
Usually, the specified parameter bounds will not be included in the samples
generated from a parameter sampling study as done above. To permit evaluations at the bounds, 
you can update the permissible parameter ranges using 
:meth:`~matcal.core.surrogates.MatCalMultiModalPCASurrogate.set_parameter_ranges`
as we do below. Since we modified the surrogate, we also have to 
save the surrogate so that the changes are in the stored surrogate
just in case we attempt to use it later.

.. GENERATED FROM PYTHON SOURCE LINES 243-249

.. code-block:: Python

    surrogate.set_parameter_ranges(H=[conv_heat_transfer_coeff.get_lower_bound(), 
                                      conv_heat_transfer_coeff.get_upper_bound()], 
                                   T_inf=[far_field_temperature.get_lower_bound(), 
                                          far_field_temperature.get_upper_bound()],
                                   T_air=[air_temperature.get_lower_bound(), 
                                          air_temperature.get_upper_bound()])


.. GENERATED FROM PYTHON SOURCE LINES 250-255

Since we modified the surrogate, we must 
save the surrogate so that the changes are in the stored surrogate
just in case we attempt to use it later.
It is saved using the :func:`~matcal.core.serializer_wrapper.matcal_save`
function.

.. GENERATED FROM PYTHON SOURCE LINES 255-256

.. code-block:: Python

    mc.matcal_save("layered_metal_bc_surrogate.joblib", surrogate)


.. GENERATED FROM PYTHON SOURCE LINES 257-261

With the parameter ranges updated, we can no evaluate 
the surrogate with our desired parameters. 
This evaluation includes T_air = 400, which 
was the user specified bound for the training data study.

.. GENERATED FROM PYTHON SOURCE LINES 261-274

.. code-block:: Python

    prediction = surrogate(H, T_inf, T_air)

    import matplotlib.pyplot as plt
    plt.close('all')
    plt.figure(constrained_layout=True)
    plt.plot(prediction['time'], prediction['TC_top'].flatten(), label="top")
    plt.plot(prediction['time'], prediction['TC_bottom'].flatten(), label="bottom")
    plt.xlabel("time (s)")
    plt.ylabel("temperature (K)")
    plt.legend()
    plt.title("Single Surrogate Prediction")
    plt.show()


.. image-sg:: /introduction_examples/surrogate_studies/images/sphx_glr_plot_generate_surrogate_a_standard_001.png
   :alt: Single Surrogate Prediction
   :srcset: /introduction_examples/surrogate_studies/images/sphx_glr_plot_generate_surrogate_a_standard_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 275-279

Multiple sets of parameters can be evaluated simultaneously. 
Each field in the returned prediction will have a number of rows equal to 
the number of passed parameter sets. To evaluate multiple parameter sets, 
pass the `batch_evaluate=True` keyword argument

.. GENERATED FROM PYTHON SOURCE LINES 279-285

.. code-block:: Python

    H2 = 20
    T_inf2 = 815
    T_air2 = 634

    prediction2 = surrogate([[H, T_inf, T_air], [H2, T_inf2, T_air2]], batch_evaluate=True)


.. GENERATED FROM PYTHON SOURCE LINES 286-291

We can also run the actual model for these parameters for comparison 
to the surrogate. Doing this step is recommended 
when determining if a surrogate is adequate for use in calibration or 
other studies. We do so using the 
:class:`~matcal.core.parameter_studies.ParameterStudy`. 

.. GENERATED FROM PYTHON SOURCE LINES 291-299

.. code-block:: Python

    param_study = mc.ParameterStudy(conv_heat_transfer_coeff, far_field_temperature,
                                     air_temperature)
    param_study.add_evaluation_set(my_hifi_model, my_objective)
    param_study.set_core_limit(16)
    param_study.add_parameter_evaluation(H=H, T_inf=T_inf, T_air=T_air)
    param_study.add_parameter_evaluation(H=H2, T_inf=T_inf2, T_air=T_air2)
    results = param_study.launch()


.. GENERATED FROM PYTHON SOURCE LINES 300-303

With both the finite element model results 
and the surrogate model results obtained, we can 
plot them together for comparison.

.. GENERATED FROM PYTHON SOURCE LINES 303-333

.. code-block:: Python

    fe_data1 = results.simulation_history[my_hifi_model.name]["matcal_default_state"][0]
    fe_data2 = results.simulation_history[my_hifi_model.name]["matcal_default_state"][1]


    plt.figure(constrained_layout=True)
    plt.plot(prediction2['time'], prediction2['TC_top'][0,:], '.', label="top prediction 1", 
             color='tab:blue')
    plt.plot(prediction2['time'], prediction2['TC_top'][1,:], '.', label="top prediction 2", 
             color='tab:orange')
    plt.plot(prediction2['time'], prediction2['TC_bottom'][0,:], '.', label="bottom prediction 1", 
             color='tab:green')
    plt.plot(prediction2['time'], prediction2['TC_bottom'][1,:], '.', label="bottom prediction 2", 
             color='tab:red')

    plt.plot(fe_data1['time'], fe_data1['TC_top'], label="top FE results 1", 
             color='cornflowerblue')
    plt.plot(fe_data2['time'], fe_data2['TC_top'], label="top FE results 2", 
             color='orange')
    plt.plot(fe_data1['time'], fe_data1['TC_bottom'], label="bottom FE results 1", 
             color='lightgreen')
    plt.plot(fe_data2['time'], fe_data2['TC_bottom'], label="bottom FE results 2", 
             color='orangered')
    plt.xlabel("time (s)")
    plt.ylabel("temperature (K)")

    plt.legend(ncols=2)
    plt.title("Multiple Surrogate Predictions")

    plt.show()


.. image-sg:: /introduction_examples/surrogate_studies/images/sphx_glr_plot_generate_surrogate_a_standard_002.png
   :alt: Multiple Surrogate Predictions
   :srcset: /introduction_examples/surrogate_studies/images/sphx_glr_plot_generate_surrogate_a_standard_002.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 334-338

Similarly, we can plot the surrogate model error. First, 
we interpolate the surrogate results to the finite element model 
times. Next, we calculate and plot the absolute error 
for each prediction.

.. GENERATED FROM PYTHON SOURCE LINES 338-370

.. code-block:: Python

    interp_prediction_top1 = np.interp(fe_data1['time'], prediction2['time'], 
                                         prediction2['TC_top'][0,:])
    interp_prediction_top2 = np.interp(fe_data2['time'], prediction2['time'], 
                                         prediction2['TC_top'][1,:])

    interp_prediction_bot1 = np.interp(fe_data1['time'], prediction2['time'], 
                                         prediction2['TC_bottom'][0,:])
    interp_prediction_bot2 = np.interp(fe_data2['time'], prediction2['time'], 
                                         prediction2['TC_bottom'][1,:])

    plt.figure(constrained_layout=True)
    plt.plot(fe_data1['time'], interp_prediction_top1-fe_data1['TC_top'], 
             label="top TC error 1", 
             color='tab:blue')
    plt.plot(fe_data2['time'], interp_prediction_top2-fe_data2['TC_top'], 
             label="top TC error 2", 
             color='tab:orange')
    plt.plot(fe_data1['time'], interp_prediction_bot1-fe_data1['TC_bottom'], 
             label="bottom TC error 1", 
             color='tab:green')
    plt.plot(fe_data2['time'], interp_prediction_bot2-fe_data2['TC_bottom'], 
             label="bottom TC error 2", 
             color='tab:red')
    plt.xlabel("time (s)")
    plt.ylabel("temperature error (K)")

    plt.legend(ncols=2)
    plt.title("Multiple Surrogate Predictions")

    plt.show()


.. image-sg:: /introduction_examples/surrogate_studies/images/sphx_glr_plot_generate_surrogate_a_standard_003.png
   :alt: Multiple Surrogate Predictions
   :srcset: /introduction_examples/surrogate_studies/images/sphx_glr_plot_generate_surrogate_a_standard_003.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 371-380

These results show that the surrogates predict the response fairly well. 
Most of the error is below 10 K throughout the entire history which is just a few 
percent for the curves.  The second prediction for the bottom thermal couple
has the worst surrogate prediction late in the time history. This could potentially
be improved with more modes and more training samples.

If needed, we can load this surrogate again for future use by constructing a 
:class:`~matcal.core.surrogates.MatCalMultiModalPCASurrogate`, with the saved filename
created during the surrogate's generation. 

.. GENERATED FROM PYTHON SOURCE LINES 380-384

.. code-block:: Python

    loaded_surrogate = mc.matcal_load("layered_metal_bc_surrogate.joblib")
    loaded_prediction2 = loaded_surrogate([[H, T_inf, T_air], [H2, T_inf2, T_air2]], 
                                          batch_evaluate=True)


.. GENERATED FROM PYTHON SOURCE LINES 385-390

In the follow-on examples, we compare this surrogate to adaptive surrogates. 
To do so, we evaluate the surrogate over a Halton sampling of the 
parameter space with a given seed. We then compare the maximum average error
in the surrogate give 500 non-adaptive training samples to the adaptive surrogates
in the other examples.

.. GENERATED FROM PYTHON SOURCE LINES 390-399

.. code-block:: Python

    test_study = mc.HaltonStudy(conv_heat_transfer_coeff, far_field_temperature, 
                                            air_temperature)
    test_study.add_evaluation_set(my_hifi_model, my_objective)
    test_study.set_core_limit(250)
    test_study.set_number_of_samples(30)
    test_study.set_seed(12345)
    test_study.set_working_directory("standard_surrogate_test", remove_existing=True)
    test_results = test_study.launch()


.. GENERATED FROM PYTHON SOURCE LINES 401-405

.. code-block:: Python

    surrogate_generator.set_surrogate_details(training_fraction=1.0, test_eval_info=test_results)
    surrogate = surrogate_generator.generate("layered_metal_bc_surrogate")
    print(surrogate.max_errors)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    OrderedDict({'train': OrderedDict({'TC_top': 5.005222870508078, 'TC_bottom': 17.061705757217737}), 'test': OrderedDict({'TC_top': 4.585223974808059, 'TC_bottom': 116.01490147862319})})


.. GENERATED FROM PYTHON SOURCE LINES 406-413

Lastly, the surrogate can be investigated in an interactive manner using 
MatCal's interactive tools. To do so, use the command line call:
.. code-block:: python

   interactive_matcal -s <path_to_surrogate_save_file>

This command will launch a browser window in which you can investigate your surrogate.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (7 minutes 48.017 seconds)


.. _sphx_glr_download_introduction_examples_surrogate_studies_plot_generate_surrogate_a_standard.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_generate_surrogate_a_standard.ipynb <plot_generate_surrogate_a_standard.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_generate_surrogate_a_standard.py <plot_generate_surrogate_a_standard.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_generate_surrogate_a_standard.zip <plot_generate_surrogate_a_standard.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_