dispel.processing.transform module#

Transformation functionality for the processing module.

class dispel.processing.transform.Add[source]#

Bases: TransformStep

Add the results of a method onto the columns of a raw data set.

Parameters:
  • data_set_id – The id of the data set to which the new column is added.

  • method – The method in question. It should output a pandas series with the same length as the pandas data frame that it is fed.

  • method_kwargs – Optional arguments required for the methods.

  • columns – The columns on which the method is to be applied.

  • level_filter – An optional LevelFilter to determine the levels to be transformed. If no filter is provided, all levels will be transformed. The level_filter also accepts str, LevelIds and lists of either and passes them to a LevelIdFilter for convenience.

  • new_column – The name of the new column.

Examples

Assuming you want to apply a Euclidean norm to accelerometer data, you can achieve this with the following step:

>>> from dispel.processing import process
>>> from dispel.processing.transform import Add
>>> from dispel.signal.core import euclidean_norm
>>> step = Add(
...     'accelerometer',
...     euclidean_norm,
...     columns=list('xyz')
... )

This step will compute the Euclidean (L2) norm of the columns x, y, and z and add the result as a new column xyz to the transformed data set.
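
The step can then be applied to a reading using process() from the imports above (a minimal usage sketch; reading is assumed to be an existing Reading instance):

>>> result = process(reading, step)  # `reading` is an assumed Reading instance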

__init__(data_set_id, method, method_kwargs=None, columns=None, level_filter=None, new_column=None)[source]#
class dispel.processing.transform.Apply[source]#

Bases: TransformStep

Apply a method onto columns of a raw data set.

Parameters:
  • data_set_id – The data set id of the data set on which the method is to be applied.

  • method – The method in question. This can be any method that accepts a pandas series and returns an array of same length. See also pandas.DataFrame.apply().

  • method_kwargs – Optional arguments required for the methods.

  • columns – The columns to be considered during the method application.

  • drop_nan – True if NaN values are to be dropped after transformation.

  • level_filter – An optional LevelFilter to determine the levels to be transformed. If no filter is provided, all levels will be transformed. The level_filter also accepts str, LevelIds and lists of either and passes them to a LevelIdFilter for convenience.

  • new_data_set_id – The id used for the RawDataSetDefinition.

Examples

Assuming you want to low-pass filter the gyroscope data of a reading, you can create the following step (note that the filtering expects a data frame with a timestamp index sampled at a constant frequency, so you might have to leverage SetTimestampIndex and Resample first):

>>> from dispel.processing.transform import Apply
>>> from dispel.signal.filter import butterworth_low_pass_filter
>>> step = Apply(
...     'gyroscope_ts_resampled',
...     butterworth_low_pass_filter,
...     dict(cutoff=1.5, order=2),
...     list('xyz'),
... )

This step will apply a 2nd-order Butterworth low-pass filter to the columns x, y, and z with a cut-off frequency of 1.5 Hz.
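
If the applied method introduces NaN values (for instance at the edges of the filtered signal), they can be dropped via drop_nan. A minimal sketch extending the step above (the new data set id is a hypothetical name):

>>> step = Apply(
...     'gyroscope_ts_resampled',
...     butterworth_low_pass_filter,
...     dict(cutoff=1.5, order=2),
...     list('xyz'),
...     new_data_set_id='gyroscope_ts_resampled_lpf',
...     drop_nan=True,
... )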

__init__(data_set_id, method, method_kwargs=None, columns=None, new_data_set_id=None, drop_nan=False, level_filter=None)[source]#
class dispel.processing.transform.ConcatenateLevels[source]#

Bases: LevelFilterProcessingStepMixin, ProcessingStep

A processing step that creates a meta level.

The meta level is created by concatenating the data and merging the contexts. The contexts are merged by concatenating them with an extra _{k} suffix in the name, where k increments from 0. The effective time frame spans from the start of the first level to the end of the last one.

Parameters:
  • new_level_id – The new level id that will be set inside the reading.

  • data_set_id – The ids of the data sets that will be concatenated.

  • level_filter (dispel.processing.level.LevelFilter) – An optional LevelFilter to determine the levels to be concatenated. If no filter is provided, all levels will be concatenated. The level_filter also accepts str, LevelIds and lists of either and passes them to a LevelIdFilter for convenience.
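
A minimal construction sketch based on the signature below (the level id and data set id are hypothetical):

>>> from dispel.processing.transform import ConcatenateLevels
>>> step = ConcatenateLevels(
...     new_level_id='all_attempts',
...     data_set_id='accelerometer',
... )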

__init__(new_level_id, data_set_id, level_filter=None)[source]#
get_levels(reading)[source]#

Retrieve the levels used for level concatenation.

Parameters:

reading (Reading) – The reading used for processing.

Returns:

The levels used for concatenation after filtering.

Return type:

Iterable[Level]

process_reading(reading, **kwargs)[source]#

Create the meta level from reading.

Parameters:

reading (Reading) –

Return type:

Generator[ProcessingResult | ProcessingControlResult, None, None]

class dispel.processing.transform.SuffixBasedNewDataSetIdMixin[source]#

Bases: DataSetProcessingStepProtocol

A mixin that names the new data set based on the input data set ids and a suffix.

In some scenarios it is desirable to simply name the new data set based on the input of the transformation step with a suffix. This can be achieved by adding the mixin SuffixBasedNewDataSetIdMixin and using the & operator between the steps to be chained.

Parameters:

suffix (str) – The suffix to be added to the previous data set ids separated with an underscore. Alternatively, one can overwrite get_suffix() to provide a dynamic suffix.

Examples

Assuming you have two transform steps and an extract step:

steps = [
    InitialTransformStep(new_data_set_id='a'),
    SecondTransformStep(data_set_ids='a', new_data_set_id='a_b'),
    ExtractStep(data_set_ids='a_b')
]

If the transform steps leverage the SuffixBasedNewDataSetIdMixin, the same can be achieved by chaining the steps in the following way:

steps = [
    InitialTransformStep(new_data_set_id='a') &
    SecondTransformStep(suffix='b') &
    ExtractStep()
]
__init__(*args, **kwargs)[source]#
get_new_data_set_id()[source]#

Get the new data set id based on the chained step’s ids and suffix.

Returns:

The data set ids of the previous step are concatenated with underscores (_) and combined with another underscore and the specified suffix obtained from get_suffix().

Return type:

str
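
For instance, a chained step consuming the data set 'a' with suffix 'b' yields the new data set id 'a_b', matching the chained example above.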

get_suffix()[source]#

Get the suffix to be added to the previous data set id.

suffix: str#
class dispel.processing.transform.TransformStep[source]#

Bases: TransformStepChainMixIn, MutateDataSetProcessingStepBase

A raw data set transformation processing step.

This class provides a convenient way to transform one or more data sets by specifying their ids, their level_ids or a level filter, a transformation function and specifications of a new data set to be returned as result of the processing step.

Parameters:
  • data_set_ids – An optional list of data set ids to be used for the transformation. See DataSetProcessingStepMixin.

  • transform_function – An optional function to be applied to the data sets. See MutateDataSetProcessingStepBase. The transform function is expected to produce one or more columns of a data set according to the specification in definitions. The function can return NumPy unidimensional arrays, Pandas series and data frames.

  • new_data_set_id (str) – An optional id used for the RawDataSetDefinition. If no id was provided, the new_data_set_id class variable will be used. Alternatively, one can overwrite get_new_data_set_id() to provide the new data set id.

  • definitions (List[dispel.data.raw.RawDataValueDefinition]) – An optional list of RawDataValueDefinition that has to match the number of columns returned by the transform_function. If no definitions were provided, the definitions class variable will be used. Alternatively, one can overwrite get_definitions() to provide the list of definitions.

  • level_filter – An optional filter to limit the levels being processed. See LevelProcessingStep.

  • storage_error (dispel.processing.data_set.StorageError) –

    This argument is only relevant when the given new data set id already exists, in which case the following options are available (see the sketch after this list):

    • 'ignore': the computation of the transformation step for the concerned level will be ignored.

    • 'overwrite': the existing data set id will be overwritten by the result of transform step computation.

    • 'concatenate': the existing data set id will be concatenated with the result of transform step computation.

    • 'raise': an error will be raised when attempting to overwrite an existing data set id.
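
For instance, to overwrite an existing data set rather than raising an error, the option can be passed directly (a minimal sketch mirroring the example below):

>>> from dispel.data.raw import RawDataValueDefinition
>>> from dispel.processing.transform import TransformStep
>>> from dispel.signal.core import euclidean_norm
>>> step = TransformStep(
...     'accelerometer',
...     euclidean_norm,
...     'accelerometer-norm',
...     [RawDataValueDefinition('norm', 'Accelerometer norm', 'm/s^2')],
...     storage_error='overwrite',
... )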

Examples

Assuming you want to calculate the Euclidean norm of a data set 'accelerometer' and name the new data set 'accelerometer-norm', you can create the following step:

>>> from dispel.data.raw import RawDataValueDefinition
>>> from dispel.processing.transform import TransformStep
>>> from dispel.signal.core import euclidean_norm
>>> step = TransformStep(
...     'accelerometer',
...     euclidean_norm,
...     'accelerometer-norm',
...     [RawDataValueDefinition('norm', 'Accelerometer norm', 'm/s^2')]
... )

The transformation function will be called with the specified data sets as arguments. If the function has named parameters matching level or reading, the respective level and reading will be passed to the transformation function.

Another common scenario is to define a class that can be reused.

>>> from dispel.data.raw import RawDataValueDefinition
>>> from dispel.processing.transform import TransformStep
>>> from dispel.signal.core import euclidean_norm
>>> class MyTransformStep(TransformStep):
...     data_set_ids = 'accelerometer'
...     transform_function = euclidean_norm
...     new_data_set_id = 'accelerometer-norm'
...     definitions = [
...         RawDataValueDefinition('norm', 'Accelerometer norm', 'm/s^2')
...     ]

Another convenient way to provide the transformation function is to use the @transformation decorator:

>>> import pandas as pd
>>> import numpy as np
>>> from dispel.data.raw import RawDataValueDefinition
>>> from dispel.processing.data_set import transformation
>>> from dispel.processing.transform import TransformStep
>>> class MyTransformStep(TransformStep):
...     data_set_ids = 'accelerometer'
...     new_data_set_id = 'accelerometer-norm'
...     definitions = [
...         RawDataValueDefinition('norm', 'Accelerometer norm', 'm/s^2')
...     ]
...
...     @transformation
...     def _euclidean_norm(self, data: pd.DataFrame) -> pd.Series:
...         return data.pow(2).sum(axis=1).apply(np.sqrt)

Note that the decorated functions can also use level and reading as parameters to gain access to the respective level and reading being processed.
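
A minimal sketch of such a level-aware transformation (the step and function names are hypothetical):

>>> class MyLevelAwareStep(TransformStep):
...     data_set_ids = 'accelerometer'
...     new_data_set_id = 'accelerometer-norm'
...     definitions = [
...         RawDataValueDefinition('norm', 'Accelerometer norm', 'm/s^2')
...     ]
...
...     @transformation
...     def _norm(self, data: pd.DataFrame, level) -> pd.Series:
...         # `level` is injected because the parameter is named `level`
...         return data.pow(2).sum(axis=1).pow(0.5)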

__init__(data_set_ids=None, transform_function=None, new_data_set_id=None, definitions=None, level_filter=None, storage_error=None)[source]#
definitions: List[RawDataValueDefinition]#
get_definitions()[source]#

Get the definitions of the raw data set values.

Return type:

List[RawDataValueDefinition]

get_new_data_set_id()[source]#

Get the id of the new data set to be created.

Return type:

str

get_raw_data_set_definition()[source]#

Get the raw data set definition.

new_data_set_id: str#
process_level(level, reading, **kwargs)[source]#

Process the provided Level.

Parameters:

level (Level) –

reading (Reading) –

Return type:

Generator[ProcessingResult | ProcessingControlResult, None, None]

storage_error: StorageError = 'raise'#
wrap_result(res, level, reading, **kwargs)[source]#

Wrap the result from the processing function into a class.

Parameters:

res – The result from the processing function.

level (Level) –

reading (Reading) –

Return type:

Generator[LevelProcessingResult | RawDataSetProcessingResult, None, None]

class dispel.processing.transform.TransformStepChainMixIn[source]#

Bases: DataSetProcessingStepProtocol

A mixin class that allows chaining transformation steps.

The basic idea is to leverage the new data set ids from the previous transform step as the required data set ids for the current step. This avoids having to define the data_set_ids attribute.
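
A minimal sketch of two chained steps (the ids, definitions, and second transform function are hypothetical); the second step omits data_set_ids and instead consumes the 'accelerometer-norm' data set produced by the first:

>>> from dispel.data.raw import RawDataValueDefinition
>>> from dispel.processing.transform import TransformStep
>>> from dispel.signal.core import euclidean_norm
>>> first = TransformStep(
...     'accelerometer',
...     euclidean_norm,
...     'accelerometer-norm',
...     [RawDataValueDefinition('norm', 'Accelerometer norm', 'm/s^2')]
... )
>>> second = TransformStep(
...     transform_function=lambda data: data.abs(),
...     new_data_set_id='accelerometer-norm-abs',
...     definitions=[RawDataValueDefinition('norm', 'Absolute norm', 'm/s^2')]
... )
>>> chained = first & second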

get_data_set_ids()[source]#

Get the data set ids to be processed.

This uses the new data set ids from a previous transform step, if set. Otherwise, it falls back to the default behavior of returning the data set ids set via the constructor or class variable.

Returns:

An iterable of data set ids.

Return type:

Iterable[str]