Measure processing#
The library provides a basic framework to extract measures from raw data
captured in dispel.data.core.Reading objects. All functionality can be found
under the dispel.processing module. This section provides a brief
introduction to using the framework.
For a comprehensive list of measures that are produced please see here.
Measure definitions#
In order to standardize how measures are represented, the library comes with a few base classes that handle the definition of measures.
The following example shows how to create a basic measure definition:
from dispel.data.validators import RangeValidator
from dispel.data.values import ValueDefinition
definition = ValueDefinition(
id_='reaction-time',
name='Reaction time',
unit='s',
description='The time it takes the subject to respond to the stimulus',
data_type='float64',
validator=RangeValidator(lower_bound=0)
)
Later on, this definition is used to tie definition and value together using the MeasureValue class.
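For illustration, a minimal sketch tying the definition above to a concrete value (the constructor usage mirrors the extraction example later in this section; the value 0.2 is made up):
from dispel.data.measures import MeasureValue

# wrap a concrete observation in the definition from above
value = MeasureValue(definition, 0.2)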
A common use case is to have a group of related measures, e.g. the same metric aggregated using different descriptive statistics. For this, the library offers a prototype definition:
>>> from dispel.data.validators import RangeValidator
>>> from dispel.data.values import ValueDefinitionPrototype
>>> prototype = ValueDefinitionPrototype(
...     id_='{method}-reaction-time',
...     name='{method} reaction time',
...     unit='s',
...     description='The {method} time it takes the subject to respond '
...                 'to all stimuli',
...     data_type='float64',
...     validator=RangeValidator(lower_bound=0)
... )
Given this prototype one can quickly create definitions:
>>> prototype.create_definition(method='median')
<ValueDefinition: median-reaction-time (median reaction time, s)>
The prototypes can consume as many placeholders as needed and use Python's str.format() method to create the actual definition.
The measure's id is represented using the DefinitionId class. This allows standardizing measure ids. In the above examples, the definition simply creates an instance of DefinitionId using from_str().
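A minimal sketch (assuming DefinitionId is importable from dispel.data.values, alongside ValueDefinition):
from dispel.data.values import DefinitionId

# create an id from its string representation
measure_id = DefinitionId.from_str('reaction-time')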
One can provide their own standard or use one of the more complex ones like dispel.data.measures.MeasureId:
>>> from dispel.data.measures import MeasureId
>>> from dispel.data.values import AbbreviatedValue as AV, ValueDefinition
>>> measure_name = AV('reaction time', 'rt')
>>> definition = ValueDefinition(
...     id_=MeasureId(
...         task_name=AV('test', 'tst'),
...         measure_name=measure_name
...     ),
...     name=measure_name
... )
>>> definition
<ValueDefinition: tst-rt (reaction time)>
Since this is a common use case, the library provides two additional classes: MeasureValueDefinition and MeasureValueDefinitionPrototype. These two classes allow structuring the definitions into tasks, modalities/variants of the task, measure name, aggregation method, and running ids:
>>> from dispel.data.measures import MeasureValueDefinition
>>> from dispel.data.values import AbbreviatedValue as AV
>>> definition = MeasureValueDefinition(
...     task_name=AV('Cognitive Processing Speed test', 'CPS'),
...     measure_name=AV('correct responses', 'cr'),
...     modalities=[
...         AV('digit-to-digit', 'dtd'),
...         AV('predefined key 1', 'key1')
...     ],
...     aggregation=AV('standard deviation', 'std')
... )
>>> definition
<MeasureValueDefinition: cps-dtd_key1-cr-std (CPS digit-to-digit ...>
Measure extraction#
Measure extraction methods are organized in modules per test, e.g. measure extraction for the Cognitive Processing Speed (CPS) test is available in the dispel.providers.generic.tasks.cps.steps module. See also here for details on how to contribute new processing modules for tests.
Measure extraction typically comprises two generic tasks: (1) transforming raw signals (e.g. computing the magnitude of a signal); and (2) extracting a measure (e.g. the maximum magnitude value of the signal). To ensure re-usability of generic building blocks, the library provides a framework around handling these steps.
The basic class ProcessingStep represents one step that consumes a Reading and yields a ProcessingResult that wraps one of RawDataSet, MeasureValue, or Level.
Measure extractions can be defined by providing a list of ProcessingStep instances to the function process():
import pandas as pd

from dispel.data.core import Reading
from dispel.data.measures import MeasureValue
from dispel.data.raw import (RawDataSetSource, RawDataValueDefinition,
                             RawDataSetDefinition, RawDataSet)
from dispel.data.values import ValueDefinition
from dispel.processing import ProcessingStep
from dispel.processing.data_set import RawDataSetProcessingResult
from dispel.processing.level import LevelProcessingResult
from dispel.signal import euclidean_norm


class EuclideanNorm(ProcessingStep):
    def __init__(self, data_set_id, level_id):
        self.data_set_id = data_set_id
        self.level_id = level_id

    def process_reading(self, reading: Reading, **kwargs):
        # fetch the raw data set to be transformed
        data_set = reading.get_level(self.level_id).get_raw_data_set(
            self.data_set_id
        )
        res = euclidean_norm(data_set.data)
        # yield a new, computed raw data set containing the magnitude
        yield RawDataSetProcessingResult(
            step=self,
            sources=data_set,
            level=reading.get_level(self.level_id),
            result=RawDataSet(
                RawDataSetDefinition(
                    f'{self.data_set_id}-euclidean-norm',
                    RawDataSetSource('konectom'),
                    [RawDataValueDefinition('mag', 'magnitude')],
                    True  # is computed!
                ),
                pd.DataFrame(res.rename('mag'))
            )
        )


class MaxValue(ProcessingStep):
    def __init__(self, data_set_id, level_id, measure_value_definition):
        self.data_set_id = data_set_id
        self.level_id = level_id
        self.measure_value_definition = measure_value_definition

    def process_reading(self, reading: Reading, **kwargs):
        data_set = reading.get_level(self.level_id).get_raw_data_set(
            self.data_set_id
        )
        # extract the largest value across all columns as a measure
        yield LevelProcessingResult(
            step=self,
            sources=data_set,
            level=reading.get_level(self.level_id),
            result=MeasureValue(
                self.measure_value_definition,
                data_set.data.max().max()
            )
        )


# `level_id` is assumed to identify the level containing the data
steps = [
    EuclideanNorm('accelerometer_ts', level_id),
    MaxValue(
        'accelerometer_ts-euclidean-norm',
        level_id,
        ValueDefinition(
            'max-acc',
            'Maximum magnitude of acceleration',
            'm/s^2'
        )
    )
]
The actual processing is done by calling process() on a reading. The following example assumes you have a Reading in the variable example. For details on reading data sets see here.
>>> from dispel.processing import process
>>> res = process(example, steps)
>>> res
<DataTrace of <Reading: 2 levels (0 flags)>: (11 entities, 2 ...
>>> reading = res.get_reading()
>>> reading.get_measure_set(level_id).get_raw_value('max-acc')
0.012348961
The results will then be available in the measure_set attribute of the returned Reading, or in the corresponding attribute of each Level retrieved with get_level().
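Both access paths as a short sketch (attribute names as described above; level_id as in the earlier example):
# measures attached to the reading as a whole
reading.measure_set
# measures attached to a specific level
reading.get_level(level_id).measure_set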
Transformation & Extraction#
Since the two examples above represent two common scenarios, consuming one or more raw data sets to transform them, and consuming one or more raw data sets to extract one or more measures, the following convenience classes exist: TransformStep, ExtractStep, and ExtractMultipleStep. They simplify the definition of the above examples as follows:
>>> from dispel.processing.extract import ExtractStep
>>> from dispel.processing.transform import TransformStep
>>> transform_step = TransformStep(
...     'accelerometer_ts',
...     euclidean_norm,
...     'accelerometer-euclidean-norm',
...     [RawDataValueDefinition('mag', 'magnitude')]
... )
>>> extract_step = ExtractStep(
...     'accelerometer-euclidean-norm',
...     lambda data: data.max().max(),
...     ValueDefinition(
...         'max-acc',
...         'Maximum magnitude of acceleration',
...         'm/s^2'
...     )
... )
>>> steps = [
...     transform_step,
...     extract_step
... ]
>>> res = process(example, steps).get_reading()
One can also use supplementary information on top of the automatically passed data frame inside the transformation functions. To do so, declare level and/or reading as parameters of the transformation function and they will be provided automatically:
>>> from dispel.processing.extract import ExtractStep
>>> from dispel.processing.transform import TransformStep
>>> def reaction_time(data, level):
...     return (
...         data['ts'].min() - level.start
...     ).total_seconds()
>>> extract_step = ExtractStep(
...     'accelerometer',
...     reaction_time,
...     ValueDefinition(
...         'rt',
...         'Reaction time',
...         's'
...     )
... )
>>> steps = [extract_step]
>>> res = process(example, steps).get_reading()
Often, transform and extract steps are defined as classes so that steps can be reused:
>>> from dispel.processing.data_set import transformation
>>> class MyExtractStep(ExtractStep):
...     data_set_ids = 'accelerometer'
...     definition = ValueDefinition(
...         'rt',
...         'Reaction time',
...         's'
...     )
...
...     @transformation
...     def reaction_time(self, data, level):
...         return (
...             data['ts'].min() - level.start
...         ).total_seconds()
>>> steps = [MyExtractStep()]
>>> res = process(example, steps).get_reading()
The above example shows some additional concepts that allow specifying arguments, such as the data set ids, via class variables. Furthermore, class routines can be decorated with @transformation to specify the transformation applied to the data sets. Further details and more advanced use cases can be found in the documentation of TransformStep and ExtractStep.
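By analogy, transform steps can be declared as classes as well. A minimal sketch (assuming the class variables new_data_set_id and definitions mirror the positional arguments of the earlier TransformStep example):
>>> class MyTransformStep(TransformStep):
...     data_set_ids = 'accelerometer_ts'
...     new_data_set_id = 'accelerometer_ts-euclidean-norm'  # assumed class variable
...     definitions = [RawDataValueDefinition('mag', 'magnitude')]  # assumed class variable
...
...     @transformation
...     def _norm(self, data):
...         return euclidean_norm(data)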
Grouping#
Another common scenario is to extract measures for a specific task and sub-task. ExtractStep allows passing a ValueDefinitionPrototype instead of the concrete definition. The helper class ProcessingStepGroup can be used to provide additional arguments to the prototype:
>>> from dispel.data.measures import MeasureValueDefinitionPrototype
>>> from dispel.data.values import AbbreviatedValue as AV
>>> from dispel.processing.level import ProcessingStepGroup
>>> steps = [
...     ProcessingStepGroup(
...         [
...             transform_step,
...             ExtractStep(
...                 'accelerometer-euclidean-norm',
...                 lambda data: data.max().max(),
...                 MeasureValueDefinitionPrototype(
...                     measure_name=AV('measure 1', 'f'),
...                     description='{task_name} measure 1 description',
...                     unit='s'
...                 )
...             )
...         ],
...         task_name=AV('U-turn test', 'UTT')
...     )
... ]
>>> res = process(example, steps).get_reading()
This is achieved by passing all named parameters of the ProcessingStepGroup to the process function of each contained step.
Filtering#
Often one wants to process only specific levels of a reading. Each level-based processing step accepts a LevelFilter that determines which levels are considered during processing. Any processing step inheriting from LevelFilterProcessingStepMixin supports such filters.
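A hedged sketch of restricting a step to one level (this assumes a LevelIdFilter class in dispel.processing.level and a level_filter keyword argument contributed by LevelFilterProcessingStepMixin; names may differ):
from dispel.processing.level import LevelIdFilter  # assumed filter class

# restrict the earlier extract step to a single level
filtered_step = ExtractStep(
    'accelerometer-euclidean-norm',
    lambda data: data.max().max(),
    ValueDefinition('max-acc', 'Maximum magnitude of acceleration', 'm/s^2'),
    level_filter=LevelIdFilter(level_id)  # assumed keyword argument
)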
Parameters#
Some processing might be contingent on the context it is used in. Parameter allows specifying configurable values that can be used to configure the behavior of processing steps and are linked to extracted measures. This is important to keep the lineage of any dimension affecting a measure.
Parameters automatically create a unique id based on where they are specified and the provided name. To link a parameter to a processing step, it has to be either defined directly in the processing step or assigned to an attribute. dispel.processing.core.ProcessingStep.get_parameters() automatically determines all parameters of a step through inspection.
Assume we have a module called example.module that defines a parameter at module level and at processing step level. The typical usage pattern would be as follows:
>>> # example.module
>>> from dispel.data.core import Reading
>>> from dispel.data.validators import GREATER_THAN_ZERO
>>> from dispel.processing.core import Parameter
>>> from dispel.processing.data_set import transformation
>>> from dispel.processing.transform import TransformStep
>>> PARAM_A = Parameter(
...     id_='param_a',
...     default_value=10,
...     validator=GREATER_THAN_ZERO,
...     description='A description explaining the influence of the param.'
... )
>>> def transform(data, param_a, param_b):
...     return ...
>>> class MyTransformStep(TransformStep):
...     param_a = PARAM_A
...     param_b = Parameter('param_b')
...
...     @transformation
...     def _transform(self, data):
...         return transform(data, self.param_a, self.param_b)
The above specification will lead to two parameters called example.module.param_a and example.module.MyTransformStep.param_b. The values can be modified by using either their reference or their id, e.g., PARAM_A.value = 5 or Parameter.set_value('example.module.param_a', 5).
Data trace graph#
The data trace constitutes a DAG-like representation of the main data entities of each evaluation, i.e. the Reading, its Levels, RawDataSets, and MeasureValues. The links between entities are the processing steps that were applied to the source and led to the target entity. The goal of the data trace graph is to keep tabs on transformation and extraction steps in order to trace which raw data has led to the creation of which measure.
Every entity is wrapped in a Node class that links both parent and child nodes related to it. All nodes are then stored in the DataTrace class. In order to populate the data trace graph one can use the populate() dispatch method.
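In practice, the data trace does not have to be populated manually: as shown in the processing example above, process() already returns a populated DataTrace:
>>> trace = process(example, steps)
>>> reading = trace.get_reading()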