matcal.core.data

The data module contains classes and functions for converting data into the structure that MatCal requires for studies.

Functions

`combine_data_sets_in_data_list`(data_list)	Given a list of `Data` objects, this function returns a dictionary in which each key is a field name and each value holds all values of that field from all data sets.
`convert_data_to_dictionary`(data)	Converts a MatCal `Data` class into a dictionary of np.arrays.
`convert_dictionary_to_data`(dict_data)	Takes a dictionary and attempts to create a MatCal `Data` object.
`scale_data_collection`(data_collection, ...)	Scales all data sets in a data collection that have the requested field.

Classes

`AverageAbsDataConditioner`()	This data conditioner will condition data such that each field from the initializing data list is on the order of -1 to 1.
`Data`(data[, state, name])	Data is the base data structure for all MatCal data.
`DataCollection`(name, *data_sets)	A collection of `Data` objects to be used for a study.
`DataCollectionStatistics`([...])	This class can be used to calculate basic statistics on the data in a data collection by state and field.
`DataConditionerBase`()	This is the base class for MatCal data conditioners.
`MaxAbsDataConditioner`()	This data conditioner will condition data such that each field from the initializing data list is in the range of -1 to 1.
`RangeDataConditioner`()	This data conditioner will condition data such that each field from the initializing data list is in the range of 0 to 1.
`ReturnPassedDataConditioner`()	This data conditioner will make no changes to the data sets included in the evaluation set.
`Scaling`(field[, scalar, offset])	This class is used to apply a scaling multiplier and an offset to a specific field of a `Data` class.
`ScalingCollection`(name, *scalings)	A collection of `Scaling` objects.

class matcal.core.data.Data(data, state=<matcal.core.state.SolitaryState object>, name=None)[source]

Data is the base data structure for all MatCal data. This data structure is an interface to data that are used for MatCal studies. It is derived from the NumPy ndarray but adds a name and a state so that the data can be uniquely identified.

Construction / initialization

Data may only be constructed from:

A NumPy structured/record array (i.e., an np.ndarray whose dtype.names is not None, or an np.record), or
A dict/OrderedDict mapping field names to array-like values. If a dictionary is passed, it is converted using convert_dictionary_to_data().

Passing anything else (including a plain/unstructured np.ndarray) raises a built-in TypeError.
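The difference between a structured and a plain array can be checked with NumPy alone. The following is a minimal sketch: the field names time and load and the numeric values are assumptions chosen for illustration, and passing these objects to Data is not shown.

```python
import numpy as np

# A structured array: each row has named fields, so dtype.names is set.
structured = np.array(
    [(0.0, 0.0), (1.0, 12.5), (2.0, 24.0)],
    dtype=[("time", float), ("load", float)],
)

# A plain (unstructured) array of the same numbers has no field names.
plain = np.array([[0.0, 0.0], [1.0, 12.5], [2.0, 24.0]])

# The equivalent dictionary form maps field names to array-like values.
as_dict = {name: structured[name].tolist() for name in structured.dtype.names}

print(structured.dtype.names)  # ('time', 'load')
print(plain.dtype.names)       # None
```

Only the first and third objects match the accepted input forms described above; the plain array is the case that is rejected.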

Accessing fields through field names returns the data for that field as either a 1D or a 2D array. If the data is ‘global’, such as time or load, it is reported as a 1D [n_times] array. If the data is field based, it is reported as a 2D [n_times, n_points] array.

Parameters:

data (numpy.ndarray | numpy.record | dict | OrderedDict) –
data to be added to the MatCal data object. Must be either: (1) a NumPy structured/record array, or (2) a dict/OrderedDict of field_name -> array-like, which will be converted using convert_dictionary_to_data().
state (State) – the state associated with the data. If none is passed it will be assigned the default state.
name (str) – the name for the data. By default the name is “data_set_#”, where # is a unique id number. If FileData() is used to import data, then its name is set to the filename from which the data was imported.

set_state(state)[source]

Sets the optional state value for the data.

Parameters:: state (State) – The state for this particular data set.

set_name(name)[source]

Sets the optional name value for the data. If the data is imported using FileData(), the name is set to the filename from which the data was imported. If no name is passed and the data was created from the constructor or another function, an arbitrary name will be given to the data.

Parameters:: name (str) – The name for this particular data set.

add_field(field_name, data)[source]

Adds a new 1D field to the data and returns the updated data. The original data object is not modified. The added field must have the same length as the existing fields.

Parameters:

field_name (str) – The name of the new field to be added.
data (ArrayLike) – the data to be added.

Returns:

the data with newly added field

Return type:

~matcal.core.data.Data

property length

Returns:: The length of the data for each field.
Return type:: int

property state

Returns:: The physical state of the data corresponding to the experimental conditions.
Return type:: State

property field_names

Returns:: list of strings of all field names.
Return type:: list

property name

Returns the name for the data. If the data is imported using FileData(), the name is set to the filename from which the data was imported. If no name is passed and the data was created from the constructor or another function, an arbitrary name will be given to the data.

Return type:: str

remove_field(field)[source]

Returns a copy of the Data class with the desired field removed. The original data object is not modified.

Return type:: Data

rename_field(old_name, new_name)[source]

Returns the Data class with the desired field name changed. Note that the old name is overwritten and not saved.

Parameters:

old_name (str) – the old field name that is to be updated
new_name (str) – the replacement field name for the field name that is being changed.

class matcal.core.data.DataCollection(name, *data_sets)[source]

A collection of Data objects to be used for a study. No restrictions are enforced on the type or content of Data objects added to the collection. However, they are meant to hold data that is related by experiment and should generally have the same, or at least similar, fields.

Exceptions to this rule may be when two different types of data are taken from the same experiment using different data acquisition hardware. In this case it may make sense to store Data objects in a data collection with different fields.

Warning

Not all MatCal objects or methods support data collections with Data objects that contain different field names. Appropriate errors should be raised if such data collections are passed to them.

Parameters:

name (str) – The name of this data collection.
data_sets (list(Data) or Data) – The Data sets to be added to the collection.

Raises:

CollectionValueError – If name is an empty string.
CollectionTypeError – If name is not a string or the data objects to be added to the collection are not of the correct type.

property field_names

Returns:: a list of field names that exist in the data collection. These may not exist in all data objects or states and may only be in one data object in the collection.

property state_names

Returns:: the names of the State objects in the data collection.
Return type:: list(str)

property states

Returns:: The State objects in the data collection.
Return type:: StateCollection

state_field_names(state)[source]

Return all the field names in all Data objects for the given state. Note that not all Data objects need to have all field names. This is just a comprehensive list of field names that exist across all Data objects in the DataCollection for this state.

Parameters:: state (str or State) – the state of interest to get all field names for
Returns:: a list of all field names
Return type:: list(str)

state_common_field_names(state)[source]

Return all the field names common to all Data objects for the given state.

Parameters:: state (str or State) – the state of interest to get all field names for
Returns:: a list of all field names that are common to all data sets for that state
Return type:: list(str)
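The difference between state_field_names() and state_common_field_names() is the difference between a set union and a set intersection. The sketch below illustrates this in plain Python, independent of MatCal; the field names and the helper functions all_field_names and common_field_names are hypothetical, and the ordering MatCal uses for the returned lists is not specified here.

```python
# Hypothetical field names for three data sets that share one state.
field_names_per_data_set = [
    ["time", "load", "displacement"],
    ["time", "load"],
    ["time", "load", "temperature"],
]


def all_field_names(name_lists):
    """Union: every field name that appears in at least one data set."""
    names = set()
    for name_list in name_lists:
        names.update(name_list)
    return sorted(names)


def common_field_names(name_lists):
    """Intersection: only the field names present in every data set."""
    if not name_lists:
        return []
    common = set(name_lists[0])
    for name_list in name_lists[1:]:
        common &= set(name_list)
    return sorted(common)


all_names = all_field_names(field_names_per_data_set)
common_names = common_field_names(field_names_per_data_set)
print(all_names)     # ['displacement', 'load', 'temperature', 'time']
print(common_names)  # ['load', 'time']
```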

add(item)[source]

Add a Data object to a data collection.

Parameters:: item (Data) – Data object to be added to the data collection.

remove_field(field_name)[source]

Removes the field from all data sets stored in the data collection that have the passed field name. If the data collection does not have any data sets with the specified field name, a warning will be sent to MatCal output.

Parameters:: field_name (str) – the name of the field to remove

plot(independent_field: str, dependent_field: str, plot_function=None, figure=None, show: bool = True, labels: str = None, state: State = None, block: bool = True, **kwargs) → None[source]

Plots the data with the independent field on the horizontal axis and dependent field on the vertical axis. It plots each state on a separate figure.

Parameters:

independent_field (str) – field name to use as horizontal axis variable.
dependent_field (str) – field name to use as vertical axis variable.
plot_function (matplotlib plot function) – a valid matplotlib plot function such as plot, semilogx, etc
figure (matplotlib Figure) – a valid matplotlib Figure for the data collection to be plotted on.
show (bool) – option to show or not show plot
labels (str) – provide a label for each data set other than the data set name. This can take the form of “suppress”, “{user_provided_label}” or “{user_provided_label} (#)”. If “suppress” is passed, none of the data will be labeled. If “{user_provided_label}” is passed, only the first data set will be labeled, as “{user_provided_label}”, where “user_provided_label” can be any user provided string. The rest will not be labeled. If the last option is used, where labels=“{user_provided_label} (#)”, each data set will be labeled with “{user_provided_label}” and a number based on the order in which it is pulled from the data collection. For example, if a data collection with three data sets is plotted with labels=“experiment (#)”, the labels will be “experiment 1”, “experiment 2”, “experiment 3”.
state (State or str) – specify a specific state to plot using the state name or state object
block (bool) – stops Python from executing code after the plot figure is created. Follow-on code will not execute until the figure is closed. Default is to block (i.e., block=True).
kwargs (dict(str, str)) – a set of valid keyword argument pairs for the Matplotlib plotting function
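The three forms of the labels option can be summarized with a small stand-alone function. This is only a sketch of the rules as described for the labels parameter, not MatCal's plotting code; the function build_labels is hypothetical, and falling back to the data set names when labels is None is an assumption based on the wording "other than the data set name".

```python
def build_labels(labels, data_set_names):
    """Return one label (or None for "no label") per data set."""
    count = len(data_set_names)
    if labels is None:
        # Assumed default: each data set is labeled with its own name.
        return list(data_set_names)
    if labels == "suppress":
        # No data set is labeled.
        return [None] * count
    if labels.endswith("(#)"):
        # Numbered form: "experiment (#)" -> "experiment 1", "experiment 2", ...
        base = labels[: -len("(#)")].rstrip()
        return [f"{base} {index}" for index in range(1, count + 1)]
    # Plain label: only the first data set is labeled.
    return [labels if index == 0 else None for index in range(count)]


names = ["data_set_0", "data_set_1", "data_set_2"]
print(build_labels("experiment (#)", names))
print(build_labels("experiment", names))
print(build_labels("suppress", names))
```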

get_data_by_state_values(**kwargs)[source]

Get a DataCollection containing data that has the state variables with values passed into this method.

Parameters:: kwargs (dict(str, str or float)) – keyword/value pairs of the desired state variables
Returns:: all data in the data collection that have states with the state variable and values specified in kwargs.
Return type:: DataCollection

get_states_by_state_values(**kwargs)[source]

Get a StateCollection containing the states with the state variable values passed into this method.

Parameters:: kwargs (dict(str, str or float)) – keyword/value pairs of the desired state variables
Returns:: a state collection that has all states with the state variables and values specified in kwargs.
Return type:: StateCollection
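Both selection methods above filter on keyword/value pairs of state variables. The plain Python sketch below illustrates that kind of filtering on ordinary dictionaries. The state names, the variables temperature and rate, and the helper select_states are hypothetical, and the sketch assumes that values are compared for exact equality and that multiple keywords must all match, neither of which is stated in this reference.

```python
# Hypothetical states: each maps state variable names to values.
states = {
    "room_temp_slow": {"temperature": 298.0, "rate": 1e-3},
    "room_temp_fast": {"temperature": 298.0, "rate": 1.0},
    "hot_slow": {"temperature": 500.0, "rate": 1e-3},
}


def select_states(states, **kwargs):
    """Return the names of states matching every keyword/value pair."""
    return [
        name
        for name, variables in states.items()
        if all(
            key in variables and variables[key] == value
            for key, value in kwargs.items()
        )
    ]


room_temp = select_states(states, temperature=298.0)
room_temp_slow = select_states(states, temperature=298.0, rate=1e-3)
print(room_temp)       # ['room_temp_slow', 'room_temp_fast']
print(room_temp_slow)  # ['room_temp_slow']
```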

report_statistics(independent_field: str) → dict[source]

Get a summary of the statistics information. The method reports the mean and standard deviation for all dependent fields across the independent field within each state. The data are collocated to a common set of locations within the independent field. Statistics near the limits of the independent field range may be less accurate than those in the interior because of extrapolation errors that may occur in the collocation process.

Parameters:: independent_field (str) – The string to designate which field should be interpreted as the independent field.
Returns:: a dictionary that contains the statistical measurements of the data fields. The data is organized by [field_name][state_name][stat_name].
Return type:: dict

dict()

Returns:: the collection as a dictionary of items with name/value pairs.

classmethod get_collection_type()

Returns:: the data type the collection stores

get_item_names()

Returns:: a list of the names of all items added to the collection.

get_number_of_items()

Returns:: the number of items in the collection

items()

Returns:: a list of tuples of key, value pairs contained in the collection.

keys()

Returns:: a list of all available keys in the collection.

property name

Returns:: the name of the collection
Return type:: str

set_name(name)

Sets the name of the collection.

Parameters:: name (str) – the new collection name

values()

Returns:: a list of all values in the collection.

class matcal.core.data.Scaling(field, scalar=1, offset=0)[source]

This class is used to apply a scaling multiplier and an offset to a specific field of a Data class. The offset is applied first, followed by the scale factor.

Parameters:

field (str) – The name of the field to be scaled.
scalar (float) – The magnitude of the scaling to be applied to the specified field.
offset (float) – The magnitude of the offset to be applied to the specified field.

Raises:

TypeError – If the scaling object name and the field names are not strings.
TypeError – If the scalar value passed in is not a number.
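Because the offset is applied before the scale factor, the order of operations changes the result. The sketch below shows the arithmetic in plain Python. It assumes the offset is added to the field values, as described for scale_data_collection(); the function apply_scaling is hypothetical and is not the Scaling class itself.

```python
def apply_scaling(values, scalar=1, offset=0):
    """Add the offset to each value first, then multiply by the scalar."""
    return [(value + offset) * scalar for value in values]


field_values = [0.0, 1.0, 2.5]

# Offset first, then scale: (x + 1) * 2
offset_then_scale = apply_scaling(field_values, scalar=2, offset=1)

# For comparison only: scaling first and then offsetting gives x * 2 + 1.
scale_then_offset = [value * 2 + 1 for value in field_values]

print(offset_then_scale)  # [2.0, 4.0, 7.0]
print(scale_then_offset)  # [1.0, 3.0, 6.0]
```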

property field

Returns:: The name of the field to be scaled by the scaling object.
Return type:: str

apply_to_data(data)[source]

Parameters:: data (Data) – the data object with the desired field to be scaled.
Returns:: The data object with the appropriately scaled field
Return type:: Data

set_scalar(value)[source]

Sets the scalar value to a different value if needed.

Parameters:: value (float) – the new scalar value for the scaling object.

property scalar

Returns:: the scaling value for the scaling object.

property offset

Returns:: the offset value for the scaling object.

class matcal.core.data.ScalingCollection(name, *scalings)[source]

A collection of Scaling objects. This is used to combine multiple scaling objects so that more than one scaling function or value can be applied to a data set. This class is used when applying different scaling functions or values to different fields within a data set.

Parameters:

name (str) – the name for the scaling collection used for identification for error catching.
scalings (list(Scaling)) – The scaling items to be added to the collection. They can be passed in as comma separated list or an unpacked list. Unpack a list using *list_name.

Raises:

CollectionValueError – If name is an empty string.
CollectionTypeError – If name is not a string or the scalings to be added to the collection are not of the correct type.

add(scaling)[source]

Adds a Scaling object to the collection.

Parameters:: scaling (Scaling) – scaling object to be added to the collection

dict()

Returns:: the collection as a dictionary of items with name/value pairs.

classmethod get_collection_type()

Returns:: the data type the collection stores

get_item_names()

Returns:: a list of the names of all items added to the collection.

get_number_of_items()

Returns:: the number of items in the collection

items()

Returns:: a list of tuples of key, value pairs contained in the collection.

keys()

Returns:: a list of all available keys in the collection.

property name

Returns:: the name of the collection
Return type:: str

set_name(name)

Sets the name of the collection.

Parameters:: name (str) – the new collection name

values()

Returns:: a list of all values in the collection.

class matcal.core.data.DataConditionerBase[source]

This is the base class for MatCal data conditioners. The data conditioners attempt to modify all data sets for a state in a single evaluation set such that the experimental data is on the order of -1 to 1. The data is modified according to:

$\mathbf{d}_c = \frac{\mathbf{d}-o}{s}$

where $\mathbf{d}$ is a vector created from all data sets included in a single state, $o$ is a scalar data offset calculated from $\mathbf{d}$, and $s$ is a scalar scale factor calculated from $\mathbf{d}$. If $s=0$ after it is calculated, the base conditioner class will change the scale factor such that $s=mean\left(\left|\mathbf{d}\right|\right)$, the average of the absolute value of the relevant data. If $s$ is still near zero, then the vector is full of zero or near zero values and the base conditioner sets the scale factor to $s=1$.

The calculation of $o$ and $s$ is specific to the derived conditioner class. The abstract methods get_offset_for_data_field() and get_scale_for_data_field() define the calculations for $o$ and $s$, respectively. A custom user class can be defined to implement conditioning of the user’s choice by including only the implementation of these methods.
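The conditioning formula and its zero-scale fallbacks can be sketched in plain Python as follows. This is an illustration of the arithmetic described above, not MatCal's implementation; the functions resolve_scale and condition are hypothetical, and the tolerance used to decide "near zero" is an assumed value.

```python
def resolve_scale(values, scale, tolerance=1e-14):
    """Apply the fallbacks for a zero or near-zero scale factor."""
    if scale == 0:
        # First fallback: the mean of the absolute values of the data.
        scale = sum(abs(value) for value in values) / len(values)
    if abs(scale) < tolerance:
        # Second fallback: the data are all (near) zero, so use s = 1.
        scale = 1.0
    return scale


def condition(values, offset, scale):
    """Return (d - o) / s for each value in d."""
    s = resolve_scale(values, scale)
    return [(value - offset) / s for value in values]


# Ordinary case: o = 2 and s = 4 map the data onto 0 to 1.
print(condition([2.0, 4.0, 6.0], offset=2.0, scale=4.0))  # [0.0, 0.5, 1.0]

# Constant data: a range-based scale would be 0, so s falls back to 5.
print(condition([5.0, 5.0, 5.0], offset=5.0, scale=0.0))  # [0.0, 0.0, 0.0]
```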

apply_to_data(passed_data)[source]

Apply the conditioner to a data set. This can be any data set and does not need to be the one that was used to initialize the conditioner.

If a field name in the data set passed to this method was not in the data set used to initialize the conditioner, the passed data field is returned unchanged.

Parameters:: passed_data (Data) – a data set to be conditioned using an initialized conditioner.

initialize_data_conditioning_values(data_list)[source]

Initialize the conditioner for a given list of data sets from a single state of a data collection.

Parameters:: data_list (list(Data)) – list of data sets to be used for conditioning. Generally obtained through __getitem__ on a DataCollection for a single state.

abstractmethod get_scale_for_data_field(field_data)[source]

Calculates the scale factor $s$ for the data conditioner given all values for a specific field name from the data collection for a single state. This scale factor will be used to condition all data with this state and field name when compared using an evaluation set.

Parameters:: field_data (ArrayLike) – all data for a specific field from a single state of a data collection used to calculate an objective in an evaluation set.

abstractmethod get_offset_for_data_field(field_data)[source]

Calculates the offset $o$ for the data conditioner given all values for a specific field name from the data collection for a single state. This offset will be used to condition all data with this state and field name when compared using an evaluation set.

Parameters:: field_data (ArrayLike) – all data for a specific field from a single state of a data collection used to calculate an objective in an evaluation set.

class matcal.core.data.ReturnPassedDataConditioner[source]

This data conditioner will make no changes to the data sets included in the evaluation set. Its scale and offset values are given by $s=1$ and $o=0$.

get_scale_for_data_field(field_data)[source]

Calculates the scale factor $s$ for the data conditioner given all values for a specific field name from the data collection for a single state. This scale factor will be used to condition all data with this state and field name when compared using an evaluation set.

Parameters:: field_data (ArrayLike) – all data for a specific field from a single state of a data collection used to calculate an objective in an evaluation set.

get_offset_for_data_field(field_data)[source]

Calculates the offset $o$ for the data conditioner given all values for a specific field name from the data collection for a single state. This offset will be used to condition all data with this state and field name when compared using an evaluation set.

Parameters:: field_data (ArrayLike) – all data for a specific field from a single state of a data collection used to calculate an objective in an evaluation set.

apply_to_data(passed_data)

Apply the conditioner to a data set. This can be any data set and does not need to be the one that was used to initialize the conditioner.

If a field name in the data set passed to this method was not in the data set used to initialize the conditioner, the passed data field is returned unchanged.

Parameters:: passed_data (Data) – a data set to be conditioned using an initialized conditioner.

initialize_data_conditioning_values(data_list)

Initialize the conditioner for a given list of data sets from a single state of a data collection.

Parameters:: data_list (list(Data)) – list of data sets to be used for conditioning. Generally obtained through __getitem__ on a DataCollection for a single state.

class matcal.core.data.RangeDataConditioner[source]

This data conditioner will condition data such that each field from the initializing data list is in the range of 0 to 1. To do so the scale and offset values are calculated as $s=max\left(\mathbf{d}\right)-min\left(\mathbf{d}\right)$ and $o=min\left(\mathbf{d}\right)$ .
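As a worked example of these two formulas, the plain Python sketch below conditions a small list of made-up values. The helper range_scale_and_offset is hypothetical and only mirrors the arithmetic; it is not the MatCal class.

```python
def range_scale_and_offset(values):
    """Return (s, o) with s = max(d) - min(d) and o = min(d)."""
    return max(values) - min(values), min(values)


d = [10.0, 12.0, 18.0]
s, o = range_scale_and_offset(d)
conditioned = [(value - o) / s for value in d]

print(s, o)         # 8.0 10.0
print(conditioned)  # [0.0, 0.25, 1.0]
```

The smallest value maps to 0 and the largest to 1, so the initializing data always spans the full range.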

get_scale_for_data_field(field_data)[source]

Calculates the scale factor $s$ for the data conditioner given all values for a specific field name from the data collection for a single state. This scale factor will be used to condition all data with this state and field name when compared using an evaluation set.

Parameters:: field_data (ArrayLike) – all data for a specific field from a single state of a data collection used to calculate an objective in an evaluation set.

get_offset_for_data_field(field_data)[source]

Calculates the offset $o$ for the data conditioner given all values for a specific field name from the data collection for a single state. This offset will be used to condition all data with this state and field name when compared using an evaluation set.

Parameters:: field_data (ArrayLike) – all data for a specific field from a single state of a data collection used to calculate an objective in an evaluation set.

apply_to_data(passed_data)

Apply the conditioner to a data set. This can be any data set and does not need to be the one that was used to initialize the conditioner.

If a field name in the data set passed to this method was not in the data set used to initialize the conditioner, the passed data field is returned unchanged.

Parameters:: passed_data (Data) – a data set to be conditioned using an initialized conditioner.

initialize_data_conditioning_values(data_list)

Initialize the conditioner for a given list of data sets from a single state of a data collection.

Parameters:: data_list (list(Data)) – list of data sets to be used for conditioning. Generally obtained through __getitem__ on a DataCollection for a single state.

class matcal.core.data.MaxAbsDataConditioner[source]

This data conditioner will condition data such that each field from the initializing data list is in the range of -1 to 1. To do so, the scale and offset values are calculated as $s=max\left(\left|\mathbf{d}\right|\right)$ and $o=0$. Note that this only guarantees the data will be in the range of -1 to 1; it does not enforce that the data spans the entirety of -1 to 1.
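A short worked example in plain Python, using made-up values, shows why the conditioned data need not span the whole interval. Only the arithmetic is reproduced here, not the MatCal class.

```python
d = [-4.0, 1.0, 2.0]

# s = max(|d|), o = 0
s = max(abs(value) for value in d)
conditioned = [value / s for value in d]

print(s)            # 4.0
print(conditioned)  # [-1.0, 0.25, 0.5]
```

The value with the largest magnitude maps to -1, while the largest positive value only reaches 0.5.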

get_scale_for_data_field(field_data)[source]

Calculates the scale factor $s$ for the data conditioner given all values for a specific field name from the data collection for a single state. This scale factor will be used to condition all data with this state and field name when compared using an evaluation set.

Parameters:: field_data (ArrayLike) – all data for a specific field from a single state of a data collection used to calculate an objective in an evaluation set.

get_offset_for_data_field(field_data)[source]

Calculates the offset $o$ for the data conditioner given all values for a specific field name from the data collection for a single state. This offset will be used to condition all data with this state and field name when compared using an evaluation set.

Parameters:: field_data (ArrayLike) – all data for a specific field from a single state of a data collection used to calculate an objective in an evaluation set.

apply_to_data(passed_data)

Apply the conditioner to a data set. This can be any data set and does not need to be the one that was used to initialize the conditioner.

If a field name in the data set passed to this method was not in the data set used to initialize the conditioner, the passed data field is returned unchanged.

Parameters:: passed_data (Data) – a data set to be conditioned using an initialized conditioner.

initialize_data_conditioning_values(data_list)

Initialize the conditioner for a given list of data sets from a single state of a data collection.

Parameters:: data_list (list(Data)) – list of data sets to be used for conditioning. Generally obtained through __getitem__ on a DataCollection for a single state.

class matcal.core.data.AverageAbsDataConditioner[source]

This data conditioner will condition data such that each field from the initializing data list is on the order of -1 to 1. To do so, the scale and offset values are calculated as $s=mean\left(\left|\mathbf{d}\right|\right)$ and $o=0$. Note that this likely puts all the data in the field on the order of -1 to 1, but the data could be well outside of this range depending on the values in the data.
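The following plain Python sketch, with made-up values, illustrates how a single large value ends up outside -1 to 1 under this scaling. It reproduces only the arithmetic, not the MatCal class.

```python
d = [-2.0, 2.0, 2.0, 10.0]

# s = mean(|d|), o = 0
s = sum(abs(value) for value in d) / len(d)
conditioned = [value / s for value in d]

print(s)            # 4.0
print(conditioned)  # [-0.5, 0.5, 0.5, 2.5]
```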

get_scale_for_data_field(field_data)[source]

Calculates the scale factor $s$ for the data conditioner given all values for a specific field name from the data collection for a single state. This scale factor will be used to condition all data with this state and field name when compared using an evaluation set.

Parameters:: field_data (ArrayLike) – all data for a specific field from a single state of a data collection used to calculate an objective in an evaluation set.

get_offset_for_data_field(field_data)[source]

Calculates the offset $o$ for the data conditioner given all values for a specific field name from the data collection for a single state. This offset will be used to condition all data with this state and field name when compared using an evaluation set.

Parameters:: field_data (ArrayLike) – all data for a specific field from a single state of a data collection used to calculate an objective in an evaluation set.

apply_to_data(passed_data)

Apply the conditioner to a data set. This can be any data set and does not need to be the one that was used to initialize the conditioner.

If a field name in the data set passed to this method was not in the data set used to initialize the conditioner, the passed data field is returned unchanged.

Parameters:: passed_data (Data) – a data set to be conditioned using an initialized conditioner.

initialize_data_conditioning_values(data_list)

Initialize the conditioner for a given list of data sets from a single state of a data collection.

Parameters:: data_list (list(Data)) – list of data sets to be used for conditioning. Generally obtained through __getitem__ on a DataCollection for a single state.

matcal.core.data.combine_data_sets_in_data_list(data_list)[source]

Given a list of Data objects, this function returns a dictionary in which each key is a field name and each value holds all values of that field from all data sets.

Parameters:: data_list (list(Data)) – list of data sets that will be combined.
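The shape of the returned dictionary can be illustrated with ordinary dictionaries standing in for Data objects. This is a sketch of the described result only: the helper combine_fields is hypothetical, the values are made up, and how MatCal stores the combined values or treats fields missing from some data sets is not specified here.

```python
def combine_fields(data_sets):
    """Gather all values for each field name across the data sets."""
    combined = {}
    for data_set in data_sets:
        for field_name, values in data_set.items():
            combined.setdefault(field_name, []).extend(values)
    return combined


# Two stand-in data sets with the same fields.
data_sets = [
    {"time": [0.0, 1.0], "load": [0.0, 5.0]},
    {"time": [0.0, 2.0], "load": [0.0, 9.0]},
]

combined = combine_fields(data_sets)
print(combined["time"])  # [0.0, 1.0, 0.0, 2.0]
print(combined["load"])  # [0.0, 5.0, 0.0, 9.0]
```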

matcal.core.data.scale_data_collection(data_collection, field_name, scale, offset=0)[source]

Scales all data sets in a data collection that have the requested field. It will apply the correct scale factor and offset to each data set and return a new data collection that is scaled. Note that if both are used, the offset is applied first and then the results are scaled. A new scaled data collection is returned and the old one is unmodified.

Parameters:

data_collection (DataCollection) – the data collection to be scaled
field_name (str) – the name of the field to be modified
scale (float) – a linear scale factor to scale the field
offset (float) – a constant offset to be added to the field

Returns:

new scaled data collection

Return type:

DataCollection
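Two behaviors described above are easy to overlook: only data sets that contain the requested field are changed, and the input collection is left unmodified. The plain Python sketch below mimics both with dictionaries standing in for data sets. The helper scale_field and the field names are hypothetical, and this is not the MatCal function.

```python
def scale_field(data_sets, field_name, scale, offset=0):
    """Return scaled copies of the data sets; the inputs are not modified."""
    scaled_sets = []
    for data_set in data_sets:
        # Copy every field so the original lists are never touched.
        new_set = {name: list(values) for name, values in data_set.items()}
        if field_name in new_set:
            # The offset is added first, then the result is scaled.
            new_set[field_name] = [
                (value + offset) * scale for value in new_set[field_name]
            ]
        scaled_sets.append(new_set)
    return scaled_sets


original = [
    {"time": [0.0, 1.0], "load": [1.0, 3.0]},
    {"time": [0.0, 1.0]},  # no "load" field, so it is copied unchanged
]

scaled = scale_field(original, "load", scale=10, offset=1)
print(scaled[0]["load"])    # [20.0, 40.0]
print(original[0]["load"])  # [1.0, 3.0]
```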

matcal.core.data.convert_data_to_dictionary(data)[source]

Converts a MatCal Data class into a dictionary of np.arrays.

Parameters:: data (Data) – a MatCal data set
Returns:: dictionary conversion of the data object
Return type:: OrderedDict

matcal.core.data.convert_dictionary_to_data(dict_data)[source]

Takes a dictionary and attempts to create a MatCal Data object. The keys for the dictionary are expected to be strings for the field names and the values are expected to be valid numeric or string data.

Parameters:: dict_data (dict or OrderedDict) – a dictionary with field names as keys and the data values as the dictionary values.
Returns:: a Data object with the default state SolitaryState.
Return type:: Data