Statistics transformations

Statistics mixin


class simba.mixins.statistics_mixin.Statistics

Statistics methods used for feature extraction, drift assessment, distance computations, and distribution comparisons in sliding and static windows.

Note

Most methods are implemented using numba parallelization for improved run-times. See the line graph below for expected run-times of a few methods included in this class.

Most methods have numba typed signatures, which decrease compilation time by reducing type inference. Make sure to pass the dtypes indicated by the signature decorators. If the dtype is not specified at array creation, it will typically be float64 or int64. As most methods here use float32 for the input data argument, make sure to downcast.
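The downcasting step can be sketched with plain NumPy, as below. The array names and values are made up for illustration, and no method from this class is called. Check the signature decorator of the method you intend to use for the dtypes it expects.

```python
import numpy as np

# Arrays created without an explicit dtype are float64 for floating-point
# input (and typically int64 for integer input).
raw = np.array([1.5, 2.25, 3.0, 4.75])
counts = np.array([1, 2, 3, 4])

# Downcast before passing the data to a method whose typed signature
# expects float32.
data = raw.astype(np.float32)
counts_f32 = counts.astype(np.float32)

# Alternatively, set the dtype at array creation to avoid the extra copy.
direct = np.array([1.5, 2.25, 3.0, 4.75], dtype=np.float32)

print(raw.dtype, data.dtype, counts_f32.dtype, direct.dtype)
```

Setting the dtype at creation avoids holding a float64 copy and a float32 copy of the same data at once, which can matter for large feature arrays.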

This class contains a few probability distribution comparison methods. These are being moved to simba.sandbox.distances (05.24).

Statistics runtimes

References

[1] Bernard Desgraupes - https://cran.r-project.org/web/packages/clusterCrit/vignettes/clusterCrit.pdf

[2] Ikotun, A. M., Habyarimana, F., & Ezugwu, A. E. (2025). Cluster validity indices for automatic clustering: A comprehensive review. Heliyon, 11(2), e41953. https://doi.org/10.1016/j.heliyon.2025.e41953

[3] Hassan, B. A., Tayfor, N. B., Hassan, A. A., Ahmed, A. M., Rashid, T. A., & Abdalla, N. N. (2024). From A-to-Z review of clustering validation indices. arXiv. https://doi.org/10.48550/arXiv.2407.20246

[4] Leland McInnes - pynndescent.


Statistics GPU methods

