# User-defined 'feature extraction' script in SimBA

SimBA extracts ['features' from pose-estimation data](https://github.com/sgoldenlab/simba/blob/master/docs/tutorial.md#step-5-extract-features), and uses these features together with [behavior annotations](https://github.com/sgoldenlab/simba/blob/master/docs/tutorial.md#step-6-label-behavior) to [build predictive classifiers](https://github.com/sgoldenlab/simba/blob/master/docs/tutorial.md#step-7-train-machine-model) of the behavior. In SimBA (when using pre-defined pose-estimation body-part configurations), these features consist of curated, explainable metrics describing movements, angles, paths, velocities, distances, and sizes, computed within individual frames and as rolling time-window aggregates. The exact features, and the number of features, are hard-coded in SimBA and are determined by the number of body-part key-points tracked during pose-estimation. For examples of the features calculated by SimBA when using pose-estimation from 2 animals and 16 body-parts, see [this file](https://github.com/sgoldenlab/simba/blob/master/misc/Feature_description.csv).

In some scenarios, however, these built-in feature-extraction scripts are not suitable, and the user may want to extract other features that are of particular relevance to their behavior of interest and experimental protocol. We address this with the `user-defined feature extraction` function in SimBA.

The `user-defined feature extraction` function in SimBA gives users significant flexibility, e.g.:

*1. Advanced users can write their own, brand-new, feature extraction scripts and use them within the SimBA GUI environment. These can be shared independently of the SimBA program, and may improve classifier accuracy and computational time in experimental settings for which SimBA was not originally targeted (i.e., non-social behavior).*

*2. 
As SimBA develops, the default hard-coded feature extraction files may at times be updated: (i) additional features that we have found powerful for classifying particular social behaviors may be added, or (ii) originally calculated features that we have found to lack predictive power in social settings may be removed to reduce computation time. The ability to deploy user-defined 'feature extraction' scripts makes newer versions of SimBA backward-compatible: users can choose to use feature extraction scripts that came with any prior version of SimBA within the latest SimBA GUI environment.*

*3. By default, when using user-defined pose-estimation body-part configurations, SimBA calculates a generic battery of features which encompasses the distances between all body-parts and their velocities in rolling windows. This may not be optimal, as this generic feature set could include many features that are not relevant for the behavior of interest, while also missing some key features that could increase predictive accuracy if they were included. If the user is tracking a large number of body-parts using a user-defined pose-estimation configuration, the generically defined feature set can also become very large. The ability to deploy user-generated feature extraction scripts in SimBA overcomes these hurdles.*

*4. Many features that SimBA calculates (e.g., distances between individual body-parts within animals, or the size of the animal represented as a convex hull) are only really relevant for classifying behaviors if the animals show variability in these features across sequential frames. Shape-shifting animals, like rodents, do show variability in these features, while non-shape-shifting animals, like fish, do **not** show variability in these features.
If we are working with non-shape-shifting animals like fish, we may therefore want to calculate alternative features instead, such as angular features, dispersion time-series decomposition, rotation, etc., to build more accurate classifiers.*

*5. SimBA has a large battery of feature calculators only accessible through the API (as of 12/23). These feature calculators tap into [frequentist](https://simba-uw-tf-dev.readthedocs.io/en/latest/simba.mixins.html#module-simba.mixins.statistics_mixin), [circular](https://simba-uw-tf-dev.readthedocs.io/en/latest/simba.mixins.html#module-simba.mixins.circular_statistics), and [time-series](https://simba-uw-tf-dev.readthedocs.io/en/latest/simba.mixins.html#module-simba.mixins.timeseries_features_mixin) statistics. They also compute [geometric](https://simba-uw-tf-dev.readthedocs.io/en/latest/simba.mixins.html#module-simba.mixins.geometry_mixin) manipulations, time-dependent [network (graph)](https://simba-uw-tf-dev.readthedocs.io/en/latest/simba.mixins.html#module-simba.mixins.network_mixin) based measures, and other ML-related distribution measures. To take advantage of these, users currently have to write standalone classes calling these methods.*

### Use a user-defined feature extraction script in SimBA

1. Before using a user-defined feature extraction script in SimBA, load your project, import the pose-estimation tracking files, and correct outliers. For instructions on how to load your project, import pose-estimation tracking files, and correct outliers, read the walk-through tutorial for [Scenario 1](https://github.com/sgoldenlab/simba/blob/master/docs/Scenario1.md) and/or [Part I](https://github.com/sgoldenlab/simba/blob/master/docs/tutorial.md#step-1-generate-project-config%5D) of the generic SimBA tutorial.

2. Navigate to the `Extract features` tab in SimBA. You should see the following window, with the "User-defined feature extraction" menu marked in red in the image.