dispel.stats.reliability module

Reliability analyses module.

Intraclass correlation coefficients

This module contains functions to compute intraclass correlation coefficient (ICC) models[1][2].

These functions compute single-score or average-score ICCs as an index of inter-rater reliability for quantitative data. An F-test and a confidence interval are computed as well.

Implemented ICCs

========  =======  ===========  =======  =======================================
ICC       Model    Description  Unit     Function
========  =======  ===========  =======  =======================================
ICC(1,1)  ONE_WAY  AGREEMENT    SINGLE   icc_oneway_random_absolute_single()
ICC(2,1)  TWO_WAY  AGREEMENT    SINGLE   icc_two_way_random_absolute_single()
ICC(3,1)  TWO_WAY  CONSISTENCY  SINGLE   icc_two_way_mixed_consistency_single()
ICC(1,k)  ONE_WAY  AGREEMENT    AVERAGE  icc_oneway_random_absolute_average()
ICC(2,k)  TWO_WAY  AGREEMENT    AVERAGE  icc_two_way_random_absolute_average()
ICC(3,k)  TWO_WAY  CONSISTENCY  AVERAGE  icc_two_way_mixed_consistency_average()
========  =======  ===========  =======  =======================================
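As an illustration of how the six forms listed above differ, the sketch below computes them from the ANOVA mean squares using the formulas of Shrout & Fleiss (1979). It is a plain-Python sketch that is independent of this module's implementation: the helper names `mean_squares` and `shrout_fleiss_iccs` are hypothetical, plain lists stand in for the DataFrame, and no F-test or confidence interval is computed. The ratings are the 6 subjects by 4 judges example from that paper.

```python
from typing import Dict, List, Tuple


def mean_squares(ratings: List[List[float]]) -> Tuple[float, float, float, float]:
    """Return (BMS, WMS, JMS, EMS) for an n_subjects x n_raters matrix."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_within = ss_total - ss_subjects
    ss_error = ss_within - ss_raters

    bms = ss_subjects / (n - 1)            # between-subjects mean square
    wms = ss_within / (n * (k - 1))        # within-subjects mean square
    jms = ss_raters / (k - 1)              # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return bms, wms, jms, ems


def shrout_fleiss_iccs(ratings: List[List[float]]) -> Dict[str, float]:
    """Compute the six Shrout & Fleiss ICC forms (point estimates only)."""
    n, k = len(ratings), len(ratings[0])
    bms, wms, jms, ems = mean_squares(ratings)
    return {
        "ICC(1,1)": (bms - wms) / (bms + (k - 1) * wms),
        "ICC(2,1)": (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n),
        "ICC(3,1)": (bms - ems) / (bms + (k - 1) * ems),
        "ICC(1,k)": (bms - wms) / bms,
        "ICC(2,k)": (bms - ems) / (bms + (jms - ems) / n),
        "ICC(3,k)": (bms - ems) / bms,
    }


# Rows are subjects, columns are judges.
RATINGS = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
ICCS = shrout_fleiss_iccs(RATINGS)
for name, value in ICCS.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, the values are 0.17, 0.29, 0.71, 0.44, 0.62 and 0.91. The consistency forms are much higher than the agreement forms here because the judges differ strongly in their mean level.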

When considering which form of ICC is appropriate for an actual set of data, one has to make several decisions (Shrout & Fleiss, 1979)[3]:

The implementations of the ICCs are based on:

References

class dispel.stats.reliability.ICCDesc[source]#

Bases: StringEnum

The ICC description.

AGREEMENT = 'agreement'#

Agreement between raters

CONSISTENCY = 'consistency'#

Consistency between raters

class dispel.stats.reliability.ICCKind[source]#

Bases: StringEnum

The kind of ICC analysis.

OTHER = 'other'#

PARALLEL_FORM = 'parallel form'#

TEST_RETEST = 'test retest'#

class dispel.stats.reliability.ICCModel[source]#

Bases: StringEnum

The model type used to perform the ICC analysis.

ONE_WAY = 'oneway'#

Random subject effects

TWO_WAY = 'two-way'#

Random subject and repetition effects

class dispel.stats.reliability.ICCResult[source]#

Bases: object

ICC reliability analysis results.

__init__(model, desc, unit, kind, value, l_bound, u_bound, p_value, sample_size, sessions, power=0.8)#
Parameters:
Return type:

None

desc: ICCDesc#

The description of ICC

kind: ICCKind#

The kind of the ICC

l_bound: float#

The lower bound of the 95% confidence interval

model: ICCModel#

The model of ICC

p_value: float#

The p-value of the test, which should be below 0.05 for the result to be considered significant

power: float = 0.8#

The power of the study regarding the sample size and ICC

sample_size: int#

The subject sample size associated with the current study and ICC

sessions: int#

The number of sessions considered in the ICC analysis

u_bound: float#

The upper bound of the 95% confidence interval

unit: ICCUnit#

The unit of ICC

value: float#

The ICC value

class dispel.stats.reliability.ICCResultSet[source]#

Bases: object

A collection of ICC scores for multiple measures.

__init__(study, p0_icc, iccs=<factory>)#
Parameters:
Return type:

None

iccs: Dict[str, ICCResult]#

The ICC scores for each measure, keyed by their measure_id

p0_icc: float#

The null hypothesis reference for measure ICC scores

study: ICCResultSetStudy#

The type of study concerned, either control or clinical

to_data_frame()[source]#

Export ICC result set to a pandas data frame format.

Return type:

DataFrame

class dispel.stats.reliability.ICCResultSetStudy[source]#

Bases: str, Enum

The type of the study.

STUDY_CLINICAL = 'clinical'#

STUDY_CONTROL = 'control'#

class dispel.stats.reliability.ICCUnit[source]#

Bases: StringEnum

The ICC unit.

AVERAGE = 'average'#

Values analyzed are aggregates of multiple values

SINGLE = 'single'#

Values analyzed are single values

class dispel.stats.reliability.StringEnum[source]#

Bases: Enum

String enumerator.

dispel.stats.reliability.ensure_session_standards(data, session_min=8, null_ratio=0.1)[source]#

Ensure sessions have sufficient support.

This transformation ensures that sessions have sufficient support, i.e. a subject is only considered if it has contributed at least session_min sessions. A session is dropped if its ratio of null values is higher than null_ratio.

Parameters:
  • data (DataFrame) – A data frame with subjects as rows, sessions as columns, and cells containing the measure values.

  • session_min (int) – The minimum number of required sessions for each subject to be considered in the analysis.

  • null_ratio (float) – The threshold on the ratio of null values across subjects for a session. A session is only taken into account if its ratio of null values is below this threshold.

Returns:

The filtered data containing only subjects that have contributed at least session_min sessions, sessions that have a lower ratio of null values than null_ratio, and subjects that have no null value for the latter sessions.

Return type:

pandas.DataFrame
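The filtering rules described above can be sketched as follows. This is a minimal illustration, not the module's implementation: the function name `filter_sessions` is hypothetical, nested lists with `None` for missing values stand in for the data frame, and the order of the three steps (subjects first, then sessions, then remaining nulls) is an assumption, as is the strict `<` comparison for the null ratio.

```python
from typing import List, Optional

Row = List[Optional[float]]


def filter_sessions(
    data: List[Row], session_min: int = 8, null_ratio: float = 0.1
) -> List[List[float]]:
    """Filter a subjects x sessions matrix (None marks a missing value)."""
    # 1. Keep subjects that contributed at least session_min sessions.
    subjects = [
        row for row in data
        if sum(value is not None for value in row) >= session_min
    ]
    if not subjects:
        return []

    # 2. Keep sessions whose ratio of nulls across those subjects is
    #    below null_ratio (assumed strict comparison).
    n_sessions = len(subjects[0])
    kept = [
        j for j in range(n_sessions)
        if sum(row[j] is None for row in subjects) / len(subjects) < null_ratio
    ]

    # 3. Keep subjects without any null in the retained sessions.
    return [
        [row[j] for j in kept]
        for row in subjects
        if all(row[j] is not None for j in kept)
    ]


DATA = [
    [1.0, 2.0, 3.0],
    [1.5, None, 2.5],
    [None, None, 4.0],  # only one session, dropped in step 1
    [2.0, None, 3.5],
]
FILTERED = filter_sessions(DATA, session_min=2, null_ratio=0.5)
print(FILTERED)
```

In this toy input the second session is missing for two of the three remaining subjects, so it is dropped and every remaining subject keeps two complete sessions.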

dispel.stats.reliability.icc_oneway_random_absolute_average(ratings, confidence_level=0.95)[source]#

Compute the ICC(1,k) score.

ICC(1,k): One-way random effects, absolute agreement, multiple raters/measurements.

Parameters:
  • ratings (DataFrame) – Matrix of n subjects by m raters, i.e. array-like of shape (n_subjects, n_raters).

  • confidence_level (float) – Confidence level of the interval.

Returns:

The ICC with its associated test information.

Return type:

ICCResult
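The average-score form is related to the single-score form through the Spearman-Brown formula, ICC(1,k) = k·ICC(1,1) / (1 + (k − 1)·ICC(1,1)), which follows directly from the Shrout & Fleiss definitions. A minimal sketch; the function name `spearman_brown` is hypothetical and not part of this module:

```python
def spearman_brown(icc_single: float, k: int) -> float:
    """Step a single-rating ICC up to the reliability of the mean of k ratings."""
    return k * icc_single / (1 + (k - 1) * icc_single)


# Averaging more ratings per subject increases the reliability of the mean.
TWO_RATINGS = spearman_brown(0.5, 2)
FOUR_RATINGS = spearman_brown(0.5, 4)
print(round(TWO_RATINGS, 3), round(FOUR_RATINGS, 3))
```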

dispel.stats.reliability.icc_oneway_random_absolute_single(ratings, confidence_level=0.95)[source]#

Compute the ICC(1,1) score.

ICC(1,1): One-way random effects, absolute agreement, single rater/measurement.

Parameters:
  • ratings (DataFrame) – Matrix of n subjects by m raters, i.e. array-like of shape (n_subjects, n_raters).

  • confidence_level (float) – Confidence level of the interval.

Returns:

The ICC with its associated test information.

Return type:

ICCResult

dispel.stats.reliability.icc_parallel_form(form1, form2)[source]#

Compute the ICC score from parallel form measures.

This score allows the comparison of two measures of the same nature and definition but obtained under different conditions (for example, CPS mean RT on predefinedKey1 compared to predefinedKey2).

Parameters:
  • form1 (pandas.DataFrame) – A data frame containing the M measure form 1 values for all users

  • form2 (pandas.DataFrame) – A data frame containing the M’ measure form 2 values for all users

Returns:

The parallel form ICC score containing the value with its definition and its 95% confidence interval

Return type:

ICCResult

dispel.stats.reliability.icc_power(icc, p0_icc, n_ratings, n_subjects, p_value=0.05, tails=2)[source]#

Compute the power of an ICC score obtained during a study.

Parameters:
  • icc (float) – The ICC obtained during the study

  • p0_icc (float) – The null hypothesis value of the obtained ICC

  • n_ratings (int) – The number of ratings for each subject

  • n_subjects (int) – Number of subjects during the study

  • p_value (float) – The significance level used in the study, typically 0.05.

  • tails (Literal[1, 2]) – Unilateral (1) or Bilateral (2) test.

Returns:

The power associated with the ICC study

Return type:

float

dispel.stats.reliability.icc_sample_size(icc, p0_icc, n_ratings, p_value=0.05, tails=2, power=0.8)[source]#

Compute the sample size for an ICC reliability scoring. See [4].

Parameters:
  • icc (float) – The ICC score expected during the study

  • p0_icc (float) – The null hypothesis value of the expected ICC

  • n_ratings (int) – The number of ratings for each subject

  • p_value (float) – The desired significance level of the test, 0.05 by default.

  • tails (Literal[1, 2]) – One-tailed (1) or two-tailed (2) test.

  • power (float) – The desired statistical power of the test; 0.8 is the conventional choice for clinical studies.

Returns:

The sample size of the study

Return type:

int
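To make the relationship between sample size, power, and the two ICC values concrete, the sketch below uses the normal-approximation formula of Walter, Eliasziw & Donner (1998), N = 1 + 2(z_α + z_β)² k / ((ln C₀)² (k − 1)) with C₀ = (1 + k·θ₀) / (1 + k·θ₁) and θ = ρ / (1 − ρ). Whether this is the formula behind icc_sample_size() and icc_power() is an assumption: the text above only points to [4]. The function names `approx_sample_size` and `approx_power` are hypothetical, and the parameter names merely mirror the signatures documented here.

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()


def _log_c0(icc: float, p0_icc: float, n_ratings: int) -> float:
    """ln C0, with theta = rho / (1 - rho) for the null and expected ICC."""
    theta0 = p0_icc / (1 - p0_icc)
    theta1 = icc / (1 - icc)
    return math.log((1 + n_ratings * theta0) / (1 + n_ratings * theta1))


def approx_sample_size(icc, p0_icc, n_ratings, p_value=0.05, tails=2, power=0.8) -> int:
    """Approximate number of subjects needed (rounded up)."""
    z_alpha = _NORMAL.inv_cdf(1 - p_value / tails)
    z_beta = _NORMAL.inv_cdf(power)
    k = n_ratings
    n = 1 + 2 * (z_alpha + z_beta) ** 2 * k / (_log_c0(icc, p0_icc, k) ** 2 * (k - 1))
    return math.ceil(n)


def approx_power(icc, p0_icc, n_ratings, n_subjects, p_value=0.05, tails=2) -> float:
    """Approximate power, obtained by solving the same formula for z_beta."""
    z_alpha = _NORMAL.inv_cdf(1 - p_value / tails)
    k = n_ratings
    scale = (n_subjects - 1) * _log_c0(icc, p0_icc, k) ** 2 * (k - 1) / (2 * k)
    return _NORMAL.cdf(math.sqrt(scale) - z_alpha)


# Expected ICC of 0.8 against a null value of 0.6, two ratings per subject.
N_ONE_TAILED = approx_sample_size(0.8, 0.6, 2, tails=1)
N_TWO_TAILED = approx_sample_size(0.8, 0.6, 2, tails=2)
POWER = approx_power(0.8, 0.6, 2, N_ONE_TAILED, tails=1)
print(N_ONE_TAILED, N_TWO_TAILED, round(POWER, 3))
```

Because the sample size is rounded up, feeding it back into the power function gives a value slightly above the requested 0.8. The formula is undefined for an ICC of exactly 1 or for a single rating per subject.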

References

dispel.stats.reliability.icc_set_test_retest(measure_collection, study=ICCResultSetStudy.STUDY_CONTROL, session_min=8, null_ratio=0.1, errors='raise')[source]#

Compute the ICC test retest score for all measures.

It takes into consideration the study type, which could be either “control” or “clinical”. It also ensures sessions have sufficient support. See icc_test_retest_session_safe() for details.

Parameters:
  • measure_collection (MeasureCollection) – A collection of measures

  • study (ICCResultSetStudy) – Type of the study used to determine the \(p_0^{icc}\). 0 is used for control studies and 0.6 for clinical ones.

  • session_min (int) – See ensure_session_standards().

  • null_ratio (float) – See ensure_session_standards().

  • errors (Literal['raise', 'ignore']) – How to handle errors occurring during the computation of ICC scores. If ‘raise’, errors are raised. If ‘ignore’, the affected measure is skipped.

Returns:

The set of ICC results, one per measure

Return type:

ICCResultSet

Raises:

ValueError – If errors is set to ‘raise’, a ValueError that occurred in icc_test_retest_session_safe() is re-raised.
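The errors policy described above follows a common pattern: compute each measure in turn, then either re-raise or skip on failure. A generic sketch of that pattern, assuming nothing about the module's internals; `compute_for_all` and `fake_icc` are hypothetical names, and plain floats stand in for ICCResult objects:

```python
from typing import Callable, Dict, Iterable


def compute_for_all(
    measure_ids: Iterable[str],
    compute_one: Callable[[str], float],
    errors: str = "raise",
) -> Dict[str, float]:
    """Apply compute_one to every measure id, honouring the errors policy."""
    if errors not in ("raise", "ignore"):
        raise ValueError(f"Unknown errors policy: {errors!r}")
    results: Dict[str, float] = {}
    for measure_id in measure_ids:
        try:
            results[measure_id] = compute_one(measure_id)
        except ValueError:
            if errors == "raise":
                raise  # propagate the original ValueError unchanged
            # errors == "ignore": skip this measure and continue
    return results


def fake_icc(measure_id: str) -> float:
    """Hypothetical stand-in that fails for one measure."""
    if measure_id == "bad":
        raise ValueError("not enough sessions")
    return 0.9


RESULTS = compute_for_all(["a", "bad", "b"], fake_icc, errors="ignore")
print(RESULTS)
```

With errors='ignore' the failing measure is simply absent from the result, so callers should not assume that every requested measure has an entry.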

dispel.stats.reliability.icc_test_retest(data, study=ICCResultSetStudy.STUDY_CONTROL)[source]#

Compute the test-retest ICC of a data frame.

Based on the implementation in Mind-the-Pineapple/ICC, which is licensed under the MIT License. See [5] and [6].

Parameters:
  • data (DataFrame) – An N×M pandas DataFrame containing N subjects (rows) and M ratings (columns)

  • study (ICCResultSetStudy) – Status of the study

Returns:

The test retest ICC score containing the value with its definition and its 95% confidence interval

Return type:

ICCResult

References

dispel.stats.reliability.icc_test_retest_session_safe(measure_collection, measure_id, study=ICCResultSetStudy.STUDY_CONTROL, session_min=8, null_ratio=0.1)[source]#

Compute the ICC test retest score for one measure.

Parameters:
Returns:

The ICC score results for the provided measure.

Return type:

ICCResult

dispel.stats.reliability.icc_two_way_mixed_consistency_average(ratings, confidence_level=0.95)[source]#

Compute the ICC(3,k) score.

ICC(3,k): Two-way mixed effects, consistency, average raters/measurements.

Parameters:
  • ratings (DataFrame) – Matrix of n subjects by m raters, i.e. array-like of shape (n_subjects, n_raters).

  • confidence_level (float) – Confidence level of the interval.

Returns:

The ICC with its associated test information.

Return type:

ICCResult

dispel.stats.reliability.icc_two_way_mixed_consistency_single(ratings, confidence_level=0.95)[source]#

Compute the ICC(3,1) score.

ICC(3,1): Two-way mixed effects, consistency, single rater/measurement.

Parameters:
  • ratings (DataFrame) – Matrix of n subjects by m raters, i.e. array-like of shape (n_subjects, n_raters).

  • confidence_level (float) – Confidence level of the interval.

Returns:

The ICC with its associated test information.

Return type:

ICCResult
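The difference between consistency and absolute agreement is easiest to see when one rater is systematically offset from another. The sketch below is a plain-Python illustration based on the Shrout & Fleiss (1979) mean-square formulas, not this module's code, and the function name `icc_2_1_and_3_1` is hypothetical. It computes ICC(2,1) and ICC(3,1) for two raters whose scores differ by a constant.

```python
from typing import List, Tuple


def icc_2_1_and_3_1(ratings: List[List[float]]) -> Tuple[float, float]:
    """Return (ICC(2,1), ICC(3,1)) for an n_subjects x n_raters matrix."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    bms = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    jms = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    # Residual after removing subject and rater effects.
    ss_error = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ems = ss_error / ((n - 1) * (k - 1))

    agreement = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    consistency = (bms - ems) / (bms + (k - 1) * ems)
    return agreement, consistency


# The second rater scores every subject exactly 5 points higher than the first.
RATINGS = [[1, 6], [2, 7], [3, 8], [4, 9]]
AGREEMENT, CONSISTENCY = icc_2_1_and_3_1(RATINGS)
print(round(AGREEMENT, 3), round(CONSISTENCY, 3))
```

The raters rank the subjects identically, so consistency is perfect, while the constant offset between them is penalised heavily by the agreement form.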

dispel.stats.reliability.icc_two_way_random_absolute_average(ratings, confidence_level=0.95)[source]#

Compute the ICC(2,k) score.

ICC(2,k): Two-way random effects, absolute agreement, average raters/measurements.

Parameters:
  • ratings (DataFrame) – Matrix of n subjects by m raters, i.e. array-like of shape (n_subjects, n_raters).

  • confidence_level (float) – Confidence level of the interval.

Returns:

The ICC with its associated test information.

Return type:

ICCResult

dispel.stats.reliability.icc_two_way_random_absolute_single(ratings, confidence_level=0.95)[source]#

Compute the ICC(2,1) score.

ICC(2,1): Two-way random effects, absolute agreement, single rater/measurement.

Parameters:
  • ratings (DataFrame) – Matrix of n subjects by m raters, i.e. array-like of shape (n_subjects, n_raters).

  • confidence_level (float) – Confidence level of the interval.

Returns:

The ICC with its associated test information.

Return type:

ICCResult