dispel.processing.transform module#
Transformation functionality for the processing module.
- class dispel.processing.transform.Add[source]#
Bases: TransformStep
Add the results of a method onto the columns of a raw data set.
- Parameters:
data_set_id – The id of the data set to which the new column is added.
method – The method in question. It should output a pandas Series with the same length as the pandas data frame it is fed.
method_kwargs – Optional keyword arguments required for the method.
columns – The columns on which the method is to be applied.
level_filter – An optional LevelFilter to determine the levels to be transformed. If no filter is provided, all levels will be transformed. The level_filter also accepts str, LevelId, and lists of either, and passes them to a LevelIdFilter for convenience.
new_column – The name of the new column.
Examples
Assuming you want to apply a Euclidean norm onto accelerometer data, you can achieve this with the following step:
>>> from dispel.processing import process
>>> from dispel.processing.transform import Add
>>> from dispel.signal.core import euclidean_norm
>>> step = Add(
...     'accelerometer',
...     euclidean_norm,
...     columns=list('xyz')
... )
This step will apply the Euclidean norm (2-norm) to the columns x, y, and z and add the column xyz to the transformed data set.
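Any function that maps the selected columns to a series of matching length can serve as the method. As a minimal sketch, assuming a hypothetical magnitude function (not part of dispel):
>>> import pandas as pd
>>> def magnitude(data: pd.DataFrame) -> pd.Series:
...     # one value per row: the largest absolute value across the columns
...     return data.abs().max(axis=1)
>>> step = Add('accelerometer', magnitude, columns=list('xyz'))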
- class dispel.processing.transform.Apply[source]#
Bases: TransformStep
Apply a method onto columns of a raw data set.
- Parameters:
data_set_id – The id of the data set on which the method is to be applied.
method – The method in question. This can be any method that accepts a pandas Series and returns an array of the same length. See also pandas.DataFrame.apply().
method_kwargs – Optional keyword arguments required for the method.
columns – The columns to be considered during the method application.
drop_nan – True if NaN values are to be dropped after the transformation.
level_filter – An optional LevelFilter to determine the levels to be transformed. If no filter is provided, all levels will be transformed. The level_filter also accepts str, LevelId, and lists of either, and passes them to a LevelIdFilter for convenience.
new_data_set_id – The id used for the RawDataSetDefinition.
Examples
Assuming you want to low-pass filter the gyroscope data of a reading, you can create the following step to do so (note that the filtering expects a time-indexed data frame with a constant sampling frequency, so you might have to leverage SetTimestampIndex and Resample first):
>>> from dispel.processing.transform import Apply
>>> from dispel.signal.filter import butterworth_low_pass_filter
>>> step = Apply(
...     'gyroscope_ts_resampled',
...     butterworth_low_pass_filter,
...     dict(cutoff=1.5, order=2),
...     list('xyz'),
... )
This step will apply a 2nd-order Butterworth low-pass filter to the columns x, y, and z with a cut-off frequency of 1.5 Hz.
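For reference, the filter expects a data frame indexed by timestamps at a constant sampling frequency. A plain-pandas sketch of such a frame (values are random and purely illustrative):
>>> import numpy as np
>>> import pandas as pd
>>> index = pd.date_range('2021-01-01', periods=100, freq='20ms')  # 50 Hz
>>> frame = pd.DataFrame(
...     np.random.randn(100, 3), index=index, columns=list('xyz')
... )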
- class dispel.processing.transform.ConcatenateLevels[source]#
Bases: LevelFilterProcessingStepMixin, ProcessingStep
A processing step that creates a meta level.
The meta level is created by concatenating the data and merging the contexts. The contexts are merged by concatenating them with an extra _{k} suffix in the name, with k incrementing from 0. The effective time frame is created by taking the start of the first level and the end of the last one.
- Parameters:
new_level_id – The new level id that will be set inside the reading.
data_set_id – The ids of the data sets that will be concatenated.
level_filter (dispel.processing.level.LevelFilter) – An optional LevelFilter to determine the levels to be concatenated. If no filter is provided, all levels will be concatenated. The level_filter also accepts str, LevelId, and lists of either, and passes them to a LevelIdFilter for convenience.
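Conceptually, the data concatenation behaves like pandas.concat over the filtered levels' data sets; the following is only an illustration, not the actual implementation:
>>> import pandas as pd
>>> level_0 = pd.DataFrame({'x': [1, 2]})
>>> level_1 = pd.DataFrame({'x': [3, 4]})
>>> pd.concat([level_0, level_1], ignore_index=True)['x'].tolist()
[1, 2, 3, 4]
A context key such as duration present in both levels would appear as duration_0 and duration_1 in the merged context.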
- process_reading(reading, **kwargs)[source]#
Create the meta level from reading.
- Parameters:
reading (Reading) –
- Return type:
Generator[ProcessingResult | ProcessingControlResult, None, None]
- class dispel.processing.transform.SuffixBasedNewDataSetIdMixin[source]#
Bases: DataSetProcessingStepProtocol
A transformation step that can be chained to a previous step.
In some scenarios it is desirable to simply name the new data set based on the input of the transformation step with a suffix. This can be achieved by adding the mixin SuffixBasedNewDataSetIdMixin and using the & operator between the steps to be chained.
- Parameters:
suffix (str) – The suffix to be added to the previous data set ids, separated with an underscore. Alternatively, one can overwrite get_suffix() to provide a dynamic suffix.
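As a minimal sketch of a dynamic suffix, one could derive it from an attribute of the step (the cutoff attribute below is hypothetical):
>>> from dispel.processing.transform import (
...     SuffixBasedNewDataSetIdMixin, TransformStep
... )
>>> class FilterStep(SuffixBasedNewDataSetIdMixin, TransformStep):
...     cutoff = 1.5  # hypothetical attribute used to build the suffix
...     def get_suffix(self):
...         return f'lp_{self.cutoff}hz'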
Examples
Assuming you have two transform steps and an extract step:
steps = [
    InitialTransformStep(new_data_set_id='a'),
    SecondTransformStep(data_set_ids='a', new_data_set_id='a_b'),
    ExtractStep(data_set_ids='a_b')
]
If the transform steps leverage the SuffixBasedNewDataSetIdMixin, the same can be achieved by chaining the steps in the following way:
steps = [
    InitialTransformStep(new_data_set_id='a') &
    SecondTransformStep(suffix='b') &
    ExtractStep()
]
- get_new_data_set_id()[source]#
Get the new data set id based on the chained step’s ids and suffix.
- Returns:
The data set ids of the previous step are concatenated with underscores (_) and combined with another underscore and the specified suffix obtained from get_suffix().
- Return type:
str
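For illustration, a previous step with data set ids 'accelerometer' and 'gyroscope' combined with the suffix 'resampled' yields an id following this scheme:
>>> '_'.join(['accelerometer', 'gyroscope']) + '_' + 'resampled'
'accelerometer_gyroscope_resampled'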
- class dispel.processing.transform.TransformStep[source]#
Bases: TransformStepChainMixIn, MutateDataSetProcessingStepBase
A raw data set transformation processing step.
This class provides a convenient way to transform one or more data sets by specifying their ids, their level ids or a level filter, a transformation function, and the specification of a new data set to be returned as the result of the processing step.
- Parameters:
data_set_ids – An optional list of data set ids to be used for the transformation. See DataSetProcessingStepMixin.
transform_function – An optional function to be applied to the data sets. See MutateDataSetProcessingStepBase. The transform function is expected to produce one or more columns of a data set according to the specification in definitions. The function can return NumPy unidimensional arrays, pandas Series, and data frames.
new_data_set_id (str) – An optional id used for the RawDataSetDefinition. If no id was provided, the new_data_set_id class variable will be used. Alternatively, one can overwrite get_new_data_set_id() to provide the new data set id.
definitions (List[dispel.data.raw.RawDataValueDefinition]) – An optional list of RawDataValueDefinition that has to match the number of columns returned by the transform_function. If no definitions were provided, the definitions class variable will be used. Alternatively, one can overwrite get_definitions() to provide the list of definitions.
level_filter – An optional filter to limit the levels being processed. See LevelProcessingStep.
storage_error (dispel.processing.data_set.StorageError) – This argument is only relevant when the given new data set id already exists, in which case the following options are available (see the sketch after this list):
'ignore': the computation of the transformation step for the concerned level will be ignored.
'overwrite': the existing data set will be overwritten with the result of the transform step computation.
'concatenate': the existing data set will be concatenated with the result of the transform step computation.
'raise': an error will be raised when attempting to write to an existing data set id.
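As a sketch of the storage_error options, the following step would silently overwrite a previously stored 'accelerometer-norm' data set (ids and definitions mirror the example below):
>>> from dispel.data.raw import RawDataValueDefinition
>>> from dispel.processing.transform import TransformStep
>>> from dispel.signal.core import euclidean_norm
>>> step = TransformStep(
...     'accelerometer',
...     euclidean_norm,
...     'accelerometer-norm',
...     [RawDataValueDefinition('norm', 'Accelerometer norm', 'm/s^2')],
...     storage_error='overwrite'
... )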
Examples
Assuming you want to calculate the Euclidean norm of a data set 'accelerometer' for a specific level 'left-small' and then name the new data set 'accelerometer-norm', you can create the following step:
>>> from dispel.data.raw import RawDataValueDefinition
>>> from dispel.processing.transform import TransformStep
>>> from dispel.signal.core import euclidean_norm
>>> step = TransformStep(
...     'accelerometer',
...     euclidean_norm,
...     'accelerometer-norm',
...     [RawDataValueDefinition('norm', 'Accelerometer norm', 'm/s^2')]
... )
The transformation function will be called with the specified data sets as arguments. If the function has named parameters matching level or reading, the respective level and reading will be passed to the transformation function.
Another common scenario is to define a class that can be reused.
>>> from dispel.data.raw import RawDataValueDefinition
>>> from dispel.processing.transform import TransformStep
>>> from dispel.signal.core import euclidean_norm
>>> class MyTransformStep(TransformStep):
...     data_set_ids = 'accelerometer'
...     transform_function = euclidean_norm
...     new_data_set_id = 'accelerometer-norm'
...     definitions = [
...         RawDataValueDefinition('norm', 'Accelerometer norm', 'm/s^2')
...     ]
Another convenient way to provide the transformation function is to use the @transformation decorator:
>>> import pandas as pd
>>> import numpy as np
>>> from dispel.data.raw import RawDataValueDefinition
>>> from dispel.processing.data_set import transformation
>>> from dispel.processing.transform import TransformStep
>>> class MyTransformStep(TransformStep):
...     data_set_ids = 'accelerometer'
...     new_data_set_id = 'accelerometer-norm'
...     definitions = [
...         RawDataValueDefinition('norm', 'Accelerometer norm', 'm/s^2')
...     ]
...
...     @transformation
...     def _euclidean_norm(self, data: pd.DataFrame) -> pd.Series:
...         return data.pow(2).sum(axis=1).apply(np.sqrt)
Note that the decorated functions can also use level and reading as parameters to gain access to the respective level and reading being processed.
- __init__(data_set_ids=None, transform_function=None, new_data_set_id=None, definitions=None, level_filter=None, storage_error=None)[source]#
- Parameters:
new_data_set_id (str | None) –
definitions (List[RawDataValueDefinition] | None) –
level_filter (str | LevelId | List[str] | List[LevelId] | LevelFilter | None) –
storage_error (StorageError | Literal['raise', 'ignore', 'overwrite', 'concatenate'] | None) –
- definitions: List[RawDataValueDefinition]#
- process_level(level, reading, **kwargs)[source]#
Process the provided Level.
- Parameters:
level (Level) –
reading (Reading) –
- Return type:
Generator[ProcessingResult | ProcessingControlResult, None, None]
- storage_error: StorageError = 'raise'#
- wrap_result(res, level, reading, **kwargs)[source]#
Wrap the result from the processing function into a class.
- Parameters:
res –
level (Level) –
reading (Reading) –
- Return type:
Generator[LevelProcessingResult | RawDataSetProcessingResult, None, None]
- class dispel.processing.transform.TransformStepChainMixIn[source]#
Bases: DataSetProcessingStepProtocol
A mixin class that allows chaining transformation steps.
The basic idea is to leverage the new data set ids from the previous transform step as the required data set ids for the current step. This avoids having to define the data_set_ids attribute.
- get_data_set_ids()[source]#
Get the data set ids to be processed.
This uses the new data set ids from a previous transform step if set. Otherwise, it falls back to the default behavior of returning the data set ids set via the constructor or class variable.
- Returns:
An iterable of data set ids.
- Return type:
Iterable[str]
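As an illustrative sketch, a chained step can omit data_set_ids entirely and pick up its input from the preceding step (SecondStep below is a hypothetical TransformStep subclass):
>>> from dispel.data.raw import RawDataValueDefinition
>>> from dispel.processing.transform import TransformStep
>>> from dispel.signal.core import euclidean_norm
>>> norm_step = TransformStep(
...     'accelerometer',
...     euclidean_norm,
...     'accelerometer-norm',
...     [RawDataValueDefinition('norm', 'Accelerometer norm', 'm/s^2')]
... )
>>> chained = norm_step & SecondStep()  # SecondStep reads 'accelerometer-norm'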