Measure processing#

The library provides a basic framework to extract measures from raw data captured in dispel.data.core.Readings. All functionality can be found under the dispel.processing module. This section provides a brief introduction to using the framework.

For a comprehensive list of the measures that are produced, please see here.

Measure definitions#

In order to standardize how measures are represented, the library comes with a few base classes that handle the definition of measures.

The following example shows how to create a basic measure definition:

from dispel.data.validators import RangeValidator
from dispel.data.values import ValueDefinition

definition = ValueDefinition(
    id_='reaction-time',
    name='Reaction time',
    unit='s',
    description='The time it takes the subject to respond to the stimulus',
    data_type='float64',
    validator=RangeValidator(lower_bound=0)
)

Later on, this definition is used to tie the definition and values together using MeasureValue.
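
As a minimal sketch, a concrete observation can be attached to the definition created above (the value of 0.31 seconds is purely illustrative):

from dispel.data.measures import MeasureValue

# Tie the definition from above to an observed value (illustrative number)
value = MeasureValue(definition, 0.31)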

A common use case is to have a group of related measures, e.g. the same metric aggregated using different descriptive statistics. For this, the library offers a prototype definition:

>>> from dispel.data.validators import RangeValidator
>>> from dispel.data.values import ValueDefinitionPrototype
>>> prototype = ValueDefinitionPrototype(
...     id_='{method}-reaction-time',
...     name='{method} reaction time',
...     unit='s',
...     description='The {method} time it takes the subject to respond '
...                 'to all stimuli',
...     data_type='float64',
...     validator=RangeValidator(lower_bound=0)
... )

Given this prototype, one can quickly create definitions:

>>> prototype.create_definition(method='median')
<ValueDefinition: median-reaction-time (median reaction time, s)>

The prototypes can consume as many placeholders as needed and use Python's str.format() method to create the actual definition.
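
For instance, the same prototype can be reused to create one definition per aggregation method (the expected outputs follow the repr pattern shown above):

>>> prototype.create_definition(method='mean')
<ValueDefinition: mean-reaction-time (mean reaction time, s)>
>>> prototype.create_definition(method='maximum')
<ValueDefinition: maximum-reaction-time (maximum reaction time, s)>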

The measure’s id is represented using the DefinitionId class, which allows measure ids to be standardized. In the above examples, the definition simply creates an instance of DefinitionId using from_str(). One can provide a custom standard or use one of the more elaborate ones, such as dispel.data.measures.MeasureId:

>>> from dispel.data.measures import MeasureId
>>> from dispel.data.values import AbbreviatedValue as AV, ValueDefinition
>>> measure_name = AV('reaction time', 'rt')
>>> definition = ValueDefinition(
...     id_=MeasureId(
...         task_name=AV('test', 'tst'),
...         measure_name=measure_name
...     ),
...     name=measure_name
... )
>>> definition
<ValueDefinition: tst-rt (reaction time)>

Since this is a common use case, the library provides two additional classes, MeasureValueDefinition and MeasureValueDefinitionPrototype. These two classes allow structuring the definitions into tasks, modalities/variants of the task, measure names, aggregation methods, and running ids:

>>> from dispel.data.measures import MeasureValueDefinition
>>> from dispel.data.values import AbbreviatedValue as AV
>>> definition = MeasureValueDefinition(
...     task_name=AV('Cognitive Processing Speed test', 'CPS'),
...     measure_name=AV('correct responses', 'cr'),
...     modalities=[
...         AV('digit-to-digit', 'dtd'),
...         AV('predefined key 1', 'key1')
...     ],
...     aggregation=AV('standard deviation', 'std')
... )
>>> definition
<MeasureValueDefinition: cps-dtd_key1-cr-std (CPS digit-to-digit ...>

Measure extraction#

Measure extraction methods are organized in one module per test, e.g. the measure extraction for the Cognitive Processing Speed (CPS) test is available in dispel.providers.generic.tasks.cps.steps. See also here for details on how to contribute new processing modules for tests.

Measure extraction typically comprises two generic tasks: (1) transforming raw signals (e.g. computing the magnitude of a signal); and (2) extracting a measure (e.g. the maximum magnitude value of the signal). To ensure the re-usability of generic building blocks, the library provides a framework around handling these steps.

The base class ProcessingStep represents a single step that consumes a Reading and yields a ProcessingResult that wraps one of RawDataSet, MeasureValue, or Level.

Measure extractions can be defined by providing a list of ProcessingSteps to the function process():

import pandas as pd
from dispel.data.core import Reading
from dispel.data.levels import Level
from dispel.data.measures import MeasureValue
from dispel.data.raw import (RawDataSetSource, RawDataValueDefinition,
                             RawDataSetDefinition, RawDataSet)
from dispel.data.values import ValueDefinition

from dispel.processing import ErrorHandling, ProcessingStep
from dispel.processing.data_set import RawDataSetProcessingResult
from dispel.processing.level import LevelProcessingResult
from dispel.signal import euclidean_norm

class EuclideanNorm(ProcessingStep):
    def __init__(self, data_set_id, level_id):
        self.data_set_id = data_set_id
        self.level_id = level_id

    def process_reading(self, reading: Reading, **kwargs):
        input = reading.get_level(self.level_id).get_raw_data_set(
            self.data_set_id
        )
        res = euclidean_norm(input.data)
        yield RawDataSetProcessingResult(
            step=self,
            sources=input,
            level=reading.get_level(self.level_id),
            result=RawDataSet(
                RawDataSetDefinition(
                    f'{self.data_set_id}-euclidean-norm',
                    RawDataSetSource('konectom'),
                    [RawDataValueDefinition('mag', 'magnitude')],
                    True  # is computed!
                ),
                pd.DataFrame(res.rename('mag'))
            )
        )

class MaxValue(ProcessingStep):
    def __init__(self, data_set_id, level_id, measure_value_definition):
        self.data_set_id = data_set_id
        self.level_id = level_id
        self.measure_value_definition = measure_value_definition

    def process_reading(self, reading: Reading, **kwargs):
        input = reading.get_level(self.level_id).get_raw_data_set(
            self.data_set_id
        )
        yield LevelProcessingResult(
            step=self,
            sources=input,
            level=reading.get_level(self.level_id),
            result=MeasureValue(
                self.measure_value_definition,
                input.data.max().max()
            )
        )

# `level_id` is assumed to be defined and to refer to the level of interest
steps = [
    EuclideanNorm('accelerometer_ts', level_id),
    MaxValue(
        'accelerometer_ts-euclidean-norm',
        level_id,
        ValueDefinition(
            'max-acc',
            'Maximum magnitude of acceleration',
            'm/s^2'
        )
    )
]

The actual processing is done by calling process() on a reading. The following example assumes you have a Reading in the variable example. For details on reading data sets, see here.

>>> from dispel.processing import process
>>> res = process(example, steps)
>>> res
<DataTrace of <Reading: 2 levels (0 flags)>: (11 entities, 2 ...
>>> reading = res.get_reading()
>>> reading.get_measure_set(level_id).get_raw_value('max-acc')
0.012348961

The results will then be available in the measure_set attribute of the returned Reading or in the corresponding attribute of the Level obtained with get_level().
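
For instance, assuming the Level obtained via get_level() exposes its measures through a measure_set attribute, the same value can be retrieved directly from the level:

>>> reading.get_level(level_id).measure_set.get_raw_value('max-acc')
0.012348961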

Transformation & Extraction#

The two examples above represent two common scenarios: consuming one or more raw data sets to transform them, and consuming one or more raw data sets to extract one or more measures. For these scenarios, the convenience classes TransformStep, ExtractStep, and ExtractMultipleStep exist. They simplify the definition of the above examples as follows:

>>> from dispel.processing.extract import ExtractStep
>>> from dispel.processing.transform import TransformStep
>>> transform_step = TransformStep(
...     'accelerometer_ts',
...     euclidean_norm,
...     'accelerometer-euclidean-norm',
...     [RawDataValueDefinition('mag', 'magnitude')]
... )
>>> extract_step = ExtractStep(
...     'accelerometer-euclidean-norm',
...     lambda data: data.max().max(),
...     ValueDefinition(
...         'max-acc',
...         'Maximum magnitude of acceleration',
...         'm/s^2'
...     )
... )
>>> steps = [
...     transform_step,
...     extract_step
... ]
>>> res = process(example, steps).get_reading()

One can also use supplementary information in addition to the automatically passed data frame inside transformation functions. To do so, declare level and/or reading as parameters of the transformation function and they will be provided automatically.

>>> from dispel.processing.extract import ExtractStep
>>> from dispel.processing.transform import TransformStep
>>> def reaction_time(data, level):
...     return (
...         data['ts'].min() - level.start
...     ).total_seconds()
>>> extract_step = ExtractStep(
...     'accelerometer',
...     reaction_time,
...     ValueDefinition(
...         'rt',
...         'Reaction time',
...         's'
...     )
... )
>>> steps = [extract_step]
>>> res = process(example, steps).get_reading()

Often transform and extract steps are defined as classes to ensure steps can be reused:

>>> from dispel.processing.data_set import transformation
>>> class MyExtractStep(ExtractStep):
...     data_set_ids = 'accelerometer'
...     definition = ValueDefinition(
...         'rt',
...         'Reaction time',
...         's'
...     )
...
...     @transformation
...     def reaction_time(self, data, level):
...         return (
...             data['ts'].min() - level.start
...         ).total_seconds()
>>> steps = [MyExtractStep()]
>>> res = process(example, steps).get_reading()

The above example shows some additional concepts that allow specifying arguments, such as the data set ids, via class variables. Furthermore, class methods can be decorated with @transformation to specify the transformation applied to the data sets. Further details and more advanced use cases can be found in the documentation of TransformStep and ExtractStep.
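
A TransformStep can be defined in the same class-based fashion. The following is a minimal sketch; the class variables new_data_set_id and definitions, used here to declare the created data set and its columns, are assumptions:

>>> from dispel.data.raw import RawDataValueDefinition
>>> from dispel.signal import euclidean_norm
>>> class MyTransformStep(TransformStep):
...     data_set_ids = 'accelerometer_ts'
...     # assumed class variables declaring the resulting data set
...     new_data_set_id = 'accelerometer_ts-euclidean-norm'
...     definitions = [RawDataValueDefinition('mag', 'magnitude')]
...
...     @transformation
...     def _norm(self, data):
...         return euclidean_norm(data)
>>> steps = [MyTransformStep()]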

Grouping#

Another common scenario is to extract measures for a specific task and sub-task. ExtractStep allows passing a ValueDefinitionPrototype instead of a concrete definition. The helper class ProcessingStepGroup can be used to provide additional arguments to the prototype:

>>> from dispel.data.measures import MeasureValueDefinitionPrototype
>>> from dispel.data.values import AbbreviatedValue as AV
>>> from dispel.processing.level import ProcessingStepGroup
>>> steps = [
...     ProcessingStepGroup([
...         transform_step,
...         ExtractStep(
...             'accelerometer-euclidean-norm',
...             lambda data: data.max().max(),
...             MeasureValueDefinitionPrototype(
...                 measure_name=AV('measure 1', 'f'),
...                 description='{task_name} measure 1 description',
...                 unit='s'
...             )
...         )],
...         task_name=AV('U-turn test', 'UTT')
...     )
... ]
>>> res = process(example, steps).get_reading()

This is achieved by passing all named parameters from ProcessingStepGroup to the process function of each step.

Filtering#

Often one wants to process only specific levels of a reading. Each level-based processing step allows specifying a LevelFilter that determines which levels are considered during processing.

The supported processing step classes include, among others, TransformStep and ExtractStep.
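
As a sketch, a filter could be passed to a step as follows; the LevelIdFilter helper, the level_filter argument, and the level id 'utt' are assumptions for illustration:

>>> from dispel.processing.level import LevelIdFilter
>>> extract_step = ExtractStep(
...     'accelerometer-euclidean-norm',
...     lambda data: data.max().max(),
...     ValueDefinition(
...         'max-acc',
...         'Maximum magnitude of acceleration',
...         'm/s^2'
...     ),
...     level_filter=LevelIdFilter('utt')  # only consider the level with id 'utt'
... )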

Parameters#

Some processing might be contingent on the context in which it is used. Parameter allows specifying configurable values that control the behavior of processing steps and are linked to the extracted measures. This is important to keep track of the lineage of any dimension affecting a measure.

Parameters automatically create a unique id based on the location of their specification and the provided name. To link a parameter to a processing step, it has to be either defined directly in the processing step or assigned to an attribute. dispel.processing.core.ProcessingStep.get_parameters() automatically determines all parameters of a step through inspection.

Assume we have a module called example.module that defines a parameter at the module level and at the processing step level. The typical usage pattern would be as follows:

>>> # example.module
>>> from dispel.data.core import Reading
>>> from dispel.data.validators import GREATER_THAN_ZERO
>>> from dispel.processing.core import Parameter
>>> from dispel.processing.data_set import transformation
>>> from dispel.processing.transform import TransformStep
>>> PARAM_A = Parameter(
...     id_='param_a',
...     default_value=10,
...     validator=GREATER_THAN_ZERO,
...     description='A description explaining the influence of the param.'
... )
>>> def transform(data, param_a, param_b):
...     return ...
>>> class MyTransformStep(TransformStep):
...     param_a = PARAM_A
...     param_b = Parameter('param_b')
...     @transformation
...     def _transform(self, data):
...         return transform(data, self.param_a, self.param_b)

The above specification will lead to two parameters called example.module.param_a and example.module.MyTransformStep.param_b. The values can be modified using either their reference or their id, e.g., PARAM_A.value = 5 or Parameter.set_value('example.module.param_a', 5).

Data trace graph#

The data trace constitutes a DAG-like representation of the main data entities of each evaluation, i.e. the Reading, its Levels, RawDataSets, and MeasureValues.

The links between entities are the processing steps that were applied to the source entity and led to the target entity.

The goal of the data trace graph is to keep track of transformation and extraction steps in order to trace which raw data has led to the creation of which measure.

Every entity is wrapped in a Node class that links both parent and child nodes related to it. All nodes are then stored in the DataTrace class.

In order to populate the data trace graph, one can use the populate() dispatch method.
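
Note that process() already returns such a populated DataTrace, as shown in the earlier examples:

>>> from dispel.processing import process
>>> trace = process(example, steps)  # a populated DataTrace
>>> reading = trace.get_reading()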