dispel.data.collections module#

A module for collections of measure values.

class dispel.data.collections.MeasureCollection[source]#

Bases: object

A measure collection from one or multiple readings.

The measure collection structure provides a common object to handle the basic transformations needed to perform analyses across multiple subjects and measures. The data is stored in a pandas data frame and can be retrieved through the data property. The returned data frame contains the measure values as well as some automatically computed properties, such as the trial number, which reflects the number of times a test was performed. A comprehensive list of properties can be found in the table below.

Column                 Description
subject_id             A unique identifier of the subject
evaluation_uuid        A unique identifier of the evaluation
evaluation_code        The code identifying the type of evaluation
session_uuid           A unique identifier of a session of multiple evaluations
session_code           The code identifying the type of session
start_date             The start date and time of the evaluation
end_date               The end date and time of the evaluation
is_finished            If the evaluation was completed or not
algo_version           Version of the analysis library
measure_id             The id of the measure
measure_name           The human readable name of the measure
measure_value          The actual measure value
measure_unit           The unit of the measure, if applicable
measure_type           The numpy type of the value
trial                  The number of times the evaluation was performed by the subject
relative_start_date    The relative start date based on the first evaluation for each subject

The data frame might contain additional columns if the collection was constructed using from_data_frame() with only_required_columns set to False.
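To make the two derived columns concrete, here is a minimal sketch in plain pandas of how a trial counter and a relative start date could be computed from the columns listed above. It is not the library's implementation: the grouping keys, the use of a time delta for relative_start_date, and the sample values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample: two subjects repeating the same evaluation type.
df = pd.DataFrame(
    {
        "subject_id": ["s1", "s1", "s1", "s2"],
        "evaluation_uuid": ["e1", "e2", "e3", "e4"],
        "evaluation_code": ["walk", "walk", "walk", "walk"],
        "measure_id": ["speed", "speed", "speed", "speed"],
        "measure_value": [1.0, 1.2, 1.1, 0.9],
        "start_date": pd.to_datetime(
            ["2023-01-01", "2023-01-03", "2023-01-08", "2023-02-01"]
        ),
    }
)

# Assumed rule: number each subject's evaluations of one type by start date.
# A dense rank gives all measures of the same evaluation the same trial.
df["trial"] = (
    df.groupby(["subject_id", "evaluation_code"])["start_date"]
    .rank(method="dense")
    .astype(int)
)

# Assumed rule: time elapsed since the subject's first evaluation.
first_start = df.groupby("subject_id")["start_date"].transform("min")
df["relative_start_date"] = df["start_date"] - first_start
```

In this sketch the first subject ends up with trials 1 to 3 and the second subject starts again at trial 1, which matches the per-subject meaning given in the table.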

__init__()[source]#
append(measure_value, evaluation, session, _ignore_consistency=False)[source]#

Add a measure value to the measure collection.

Parameters:
  • measure_value (MeasureValue) – The measure value to be added to the collection.

  • evaluation (Evaluation) – The evaluation corresponding to the given measure value.

  • session (Session) – The session corresponding to the given evaluation.

  • _ignore_consistency (bool) – If True, methods for ensuring consistency of the data will be skipped.

property data: DataFrame#

Get measure collection data frame.

property evaluation_count: int#

Get the number of different evaluations.

property evaluation_ids: ndarray#

Get the evaluation ids in the measure collection.

extend(other, overwrite=True, _ignore_consistency=False)[source]#

Extend measure collection by another.

Parameters:
  • other (MeasureCollection) – The object with which the measure collection is to be expanded.

  • overwrite (bool) – If True, existing measure information is replaced with the new information from other. If False, the existing information is kept.

  • _ignore_consistency (bool) – If True, methods for ensuring consistency of the data will be skipped.

Raises:

TypeError – If other is not a measure collection.
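The effect of overwrite can be pictured with two plain data frames. The sketch below is an assumption about the semantics rather than the library's code: it treats the pair (evaluation_uuid, measure_id) as the identity of a measure value and lets rows from other win when overwrite is True. The helper name extend_frame is hypothetical.

```python
import pandas as pd

# Assumed identity of a measure value within a collection.
KEY = ["evaluation_uuid", "measure_id"]


def extend_frame(
    current: pd.DataFrame, other: pd.DataFrame, overwrite: bool = True
) -> pd.DataFrame:
    """Combine two measure frames, resolving rows that share the same key."""
    combined = pd.concat([current, other], ignore_index=True)
    # keep="last" lets rows from `other` win; keep="first" preserves `current`.
    keep = "last" if overwrite else "first"
    return combined.drop_duplicates(subset=KEY, keep=keep).reset_index(drop=True)


current = pd.DataFrame(
    {
        "evaluation_uuid": ["e1", "e2"],
        "measure_id": ["speed", "speed"],
        "measure_value": [1.0, 2.0],
    }
)
other = pd.DataFrame(
    {
        "evaluation_uuid": ["e2", "e3"],
        "measure_id": ["speed", "speed"],
        "measure_value": [9.0, 3.0],
    }
)

replaced = extend_frame(current, other, overwrite=True)
preserved = extend_frame(current, other, overwrite=False)
```

Both results contain three rows; they differ only in which value survives for the shared evaluation e2.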

classmethod from_csv(path)[source]#

Create a class instance from a csv file.

Parameters:

path (str) – The path to a csv file from which measures are to be collected.

Returns:

A measure collection from the CSV file specified in path. See also from_data_frame().

Return type:

MeasureCollection

classmethod from_data_frame(data, only_required_columns=False)[source]#

Create a class instance from a DataFrame.

Parameters:
  • data (DataFrame) – A data frame containing the information relative to measures. The data frame should contain the following columns (subject_id or user_id, evaluation_uuid, evaluation_code, session_uuid, session_code, start_date, end_date, is_finished, measure_id, measure_name, measure_value, measure_unit, measure_type).

  • only_required_columns (bool) – True if only the required columns are to be preserved in the measure collection. False otherwise.

Returns:

A measure collection from a pandas data frame.

Return type:

MeasureCollection

Raises:
  • ValueError – If duplicate measures for same evaluations exist in the initializing data frame.

  • MissingColumnError – If required columns are missing from the data frame.
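The checks described above can be sketched with pandas alone. The code below is illustrative and makes several assumptions: MissingRequiredColumns is a local stand-in for the library's MissingColumnError, the helper check_measure_frame is hypothetical, and duplicates are detected on the pair (evaluation_uuid, measure_id).

```python
import pandas as pd

# Required columns as listed in the parameter description above.
REQUIRED_COLUMNS = [
    "subject_id", "evaluation_uuid", "evaluation_code", "session_uuid",
    "session_code", "start_date", "end_date", "is_finished", "measure_id",
    "measure_name", "measure_value", "measure_unit", "measure_type",
]


class MissingRequiredColumns(Exception):
    """Local stand-in for the library's MissingColumnError."""


def check_measure_frame(data: pd.DataFrame) -> pd.DataFrame:
    """Validate a frame before building a collection (illustrative only)."""
    # Accept user_id in place of subject_id, as the parameter docs allow.
    if "subject_id" not in data.columns and "user_id" in data.columns:
        data = data.rename(columns={"user_id": "subject_id"})
    missing = [c for c in REQUIRED_COLUMNS if c not in data.columns]
    if missing:
        raise MissingRequiredColumns(missing)
    # Assumed duplicate rule: one value per measure and evaluation.
    if data.duplicated(subset=["evaluation_uuid", "measure_id"]).any():
        raise ValueError("Duplicate measures for the same evaluation.")
    return data


# A single placeholder row that carries every required column.
frame = pd.DataFrame([{column: "x" for column in REQUIRED_COLUMNS}])
checked = check_measure_frame(frame.rename(columns={"subject_id": "user_id"}))
```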

classmethod from_measure_set(measure_set, evaluation, session, _ignore_consistency=False)[source]#

Create a class instance from a measure set.

Parameters:
  • measure_set (MeasureSet) – The measure set whose measures are to be collected.

  • evaluation (Evaluation) – The evaluation corresponding to the given measure set.

  • session (Session) – The session corresponding to the given evaluation.

  • _ignore_consistency (bool) – If True, methods for ensuring consistency of the data will be skipped.

Returns:

A measure collection containing all measures from the measure_set using the evaluation and session to complement the necessary information.

Return type:

MeasureCollection

classmethod from_reading(reading, _ignore_consistency=False)[source]#

Create a class instance from a reading.

Parameters:
  • reading (Reading) – The reading from which the measure collection is to be initialized.

  • _ignore_consistency (bool) – If True, methods for ensuring consistency of the data will be skipped.

Returns:

A measure collection containing all measures from the measure sets of each level of the reading. See also from_measure_set().

Return type:

MeasureCollection

Raises:

ValueError – If the reading does not provide session information.

classmethod from_readings(readings)[source]#

Create a class instance from readings.

Parameters:

readings (Iterable[Reading]) – The readings from which the measure collection is to be initialized.

Returns:

A measure collection from all measure sets of all readings. See also from_reading().

Return type:

MeasureCollection

get_aggregated_measures_over_period(measure_id, period, aggregation)[source]#

Get aggregated measure values over a given period.

Parameters:
  • measure_id (str) – The identifier of the measure for which the data is being computed.

  • period (str) – The period over which the measure is to be aggregated.

  • aggregation (str | Callable) – The aggregation method to be used.

Returns:

A pandas data frame containing the measure values aggregated over the given period. The resulting data frame contains subjects as rows, aggregation periods as columns, and values based on the provided aggregation method.

Return type:

pandas.DataFrame
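The shape of the result (subjects as rows, periods as columns) can be reproduced with a group-by followed by an unstack. This is a sketch of the idea, not the library's implementation; in particular, it assumes that period is a pandas offset alias such as "W" and that periods are derived from start_date. The helper name aggregate_over_period is hypothetical.

```python
import pandas as pd

# Hypothetical sample data with the columns the sketch relies on.
df = pd.DataFrame(
    {
        "subject_id": ["s1", "s1", "s1", "s2"],
        "measure_id": ["speed"] * 4,
        "measure_value": [1.0, 3.0, 5.0, 2.0],
        "start_date": pd.to_datetime(
            ["2023-01-02", "2023-01-04", "2023-01-10", "2023-01-03"]
        ),
    }
)


def aggregate_over_period(data, measure_id, period, aggregation):
    """Aggregate one measure per subject and period (illustrative only)."""
    subset = data[data["measure_id"] == measure_id]
    grouped = subset.groupby(
        ["subject_id", pd.Grouper(key="start_date", freq=period)]
    )["measure_value"].agg(aggregation)
    # Move the period level into the columns: subjects x periods.
    return grouped.unstack("start_date")


weekly = aggregate_over_period(df, "speed", "W", "mean")
```

With weekly bins ending on Sunday, the first subject has one column for the week ending 2023-01-08 and one for the week ending 2023-01-15; the second subject has no value in the later week, so that cell is missing.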

get_data(measure_id=None, subject_id=None)[source]#

Retrieve data from measure collection.

Parameters:
  • measure_id (str | None) – The identifier of the measure for which the data is being retrieved.

  • subject_id (str | None) – The identifier of the subject for which the data is being retrieved.

Returns:

A pandas data frame filtered w.r.t. the given arguments.

Return type:

pandas.DataFrame
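As a rough picture of the filtering, the sketch below applies the two optional identifiers to a plain data frame. It assumes that an argument left as None means "do not filter on this column"; whether the library behaves exactly this way is not stated here. The helper name filter_measures is hypothetical.

```python
from typing import Optional

import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {
        "subject_id": ["s1", "s1", "s2"],
        "measure_id": ["speed", "cadence", "speed"],
        "measure_value": [1.0, 110.0, 0.9],
    }
)


def filter_measures(
    data: pd.DataFrame,
    measure_id: Optional[str] = None,
    subject_id: Optional[str] = None,
) -> pd.DataFrame:
    """Return the rows matching the given identifiers (illustrative only)."""
    mask = pd.Series(True, index=data.index)
    if measure_id is not None:
        mask &= data["measure_id"] == measure_id
    if subject_id is not None:
        mask &= data["subject_id"] == subject_id
    return data[mask]


speed_only = filter_measures(df, measure_id="speed")
s1_speed = filter_measures(df, measure_id="speed", subject_id="s1")
```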

get_evaluation_ids_for_subject(subject_id)[source]#

Get evaluations related to a subject.

Parameters:

subject_id (str) – The subject identifier.

Returns:

The list of evaluation ids.

Return type:

List[str]

get_measure_definition(measure_id)[source]#

Get the measure definition for a specific measure id.

Parameters:

measure_id (DefinitionId | str) – The measure identifier.

Returns:

The corresponding measure definition.

Return type:

ValueDefinition

Raises:

MeasureNotFound – If the measure id does not correspond to any known measure definition.

get_measure_values_by_trials(measure_id)[source]#

Retrieve measure values over all trials by subject.

Parameters:

measure_id (str) – The identifier of the measure for which the data is being retrieved.

Returns:

A pandas data frame with subjects as the index, trials as columns, and measure values as values.

Return type:

pandas.DataFrame
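The wide layout described above corresponds to a pivot on the trial column. The following sketch uses plain pandas and hypothetical sample data; it illustrates the resulting shape and is not taken from the library's source.

```python
import pandas as pd

# Hypothetical long-format data with a precomputed trial column.
df = pd.DataFrame(
    {
        "subject_id": ["s1", "s1", "s1", "s2"],
        "measure_id": ["speed"] * 4,
        "trial": [1, 2, 3, 1],
        "measure_value": [1.0, 1.2, 1.1, 0.9],
    }
)

subset = df[df["measure_id"] == "speed"]

# One row per subject, one column per trial.
by_trial = subset.pivot(
    index="subject_id", columns="trial", values="measure_value"
)
```

Subjects with fewer trials than others get missing values in the columns of the trials they did not perform.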

get_measure_values_over_time(measure_id, subject_id, index='start_date')[source]#

Retrieve data as time indexed measure value series.

Parameters:
  • measure_id (str) – The identifier of the measure for which the data is being retrieved.

  • subject_id (str) – The identifier of the subject for which the data is being retrieved.

  • index (str | List[str]) – The index of the measure values pandas series.

Returns:

A pandas series with the given index (the start date by default) and measure values as values.

Return type:

pandas.Series
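A time-indexed series of this kind can be obtained from a data frame by filtering and setting the index. The sketch below assumes that index names one or more columns of the collection's data frame; the helper name values_over_time and the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {
        "subject_id": ["s1", "s1", "s2"],
        "measure_id": ["speed", "speed", "speed"],
        "measure_value": [1.0, 1.2, 0.9],
        "start_date": pd.to_datetime(
            ["2023-01-01", "2023-01-03", "2023-02-01"]
        ),
    }
)


def values_over_time(data, measure_id, subject_id, index="start_date"):
    """Return one subject's values for one measure, indexed by `index`."""
    mask = (data["measure_id"] == measure_id) & (
        data["subject_id"] == subject_id
    )
    # `index` may be a single column name or a list of column names.
    return data.loc[mask].set_index(index)["measure_value"]


series = values_over_time(df, "speed", "s1")
```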

property measure_count: int#

Get the number of different measures.

property measure_definitions: ValuesView[ValueDefinition]#

Get measure definitions from measure collection.

property measure_ids: ndarray#

Get the measure ids in the measure collection.

property session_count: int#

Get the number of different sessions.

property session_ids: ndarray#

Get the session ids in the measure collection.

property size: int#

Get size of measure collection data frame.

property subject_count: int#

Get the number of different subjects.

property subject_ids: ndarray#

Get the subject ids in the measure collection.

to_csv(path=None)[source]#

Write object to a comma-separated values (csv) file.

Parameters:

path (str | None) – File path or object. If None is provided, the result is returned as a string. If a file object is passed, it should be opened with newline='', disabling universal newlines.

Returns:

If path is None, returns the resulting csv format as a string. Otherwise, returns None.

Return type:

Optional[str]
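The return convention mirrors that of pandas.DataFrame.to_csv, which the sketch below uses directly on a plain data frame. Whether the collection forwards additional options such as index=False is an assumption made here for a clean round trip.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {
        "subject_id": ["s1", "s2"],
        "measure_id": ["speed", "speed"],
        "measure_value": [1.0, 0.9],
    }
)

# Without a path, the CSV content is returned as a string.
csv_text = df.to_csv(None, index=False)

# With a path, the content is written to disk and None is returned.
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "measures.csv")
    file_result = df.to_csv(path, index=False)

# The string can be parsed back into a data frame.
restored = pd.read_csv(io.StringIO(csv_text))
```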

to_dict()[source]#

Convert the measure collection to a dictionary.

Return type:

Dict[str, Any]

to_json(path=None)[source]#

Convert the measure collection to a JSON string.

Parameters:

path (str | None) – File path or object. If not specified, the result is returned as a string.

Returns:

If path is None, returns the resulting json format as a string. Otherwise, returns None.

Return type:

Optional[str]

exception dispel.data.collections.MeasureNotFound[source]#

Bases: Exception

Exception raised when a measure is not found in a measure collection.

__init__(measure_id, measures)[source]#
Parameters:
exception dispel.data.collections.SubjectNotFound[source]#

Bases: Exception

Exception raised when a subject is not found in a measure collection.

__init__(subject_id)[source]#
Parameters:

subject_id (str) – The identifier of the subject that was not found.