SimBA argument checks
- simba.utils.checks.check_all_dfs_in_list_has_same_cols(dfs: List[DataFrame], raise_error: bool = True, source: str = '') -> bool
Check that all DataFrames in a list have the same column names.
This function validates that all DataFrames in the provided list contain identical column headers. It finds the intersection of all column names and identifies any missing headers that are not present in all DataFrames.
- Parameters
- Returns
True if all DataFrames have the same column names, False if they don't match and raise_error=False.
- Return type
bool
- Raises
MissingColumnsError – If DataFrames have different column names and raise_error=True.
- Example
>>> df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
>>> df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
>>> check_all_dfs_in_list_has_same_cols(dfs=[df1, df2])
True
>>> df3 = pd.DataFrame({'A': [1, 2], 'C': [3, 4]})
>>> check_all_dfs_in_list_has_same_cols(dfs=[df1, df3], raise_error=False)
False
- simba.utils.checks.check_all_file_names_are_represented_in_video_log(video_info_df: DataFrame, data_paths: List[Union[str, PathLike]]) -> None
Helper to check that all files are represented in a dataframe of the SimBA project_folder/logs/video_info.csv file.
- Parameters
video_info_df (pd.DataFrame) – DataFrame representation of the project_folder/logs/video_info.csv file.
data_paths (List[Union[str, os.PathLike]]) – List of file-paths.
- Raises
ParametersFileError – One or more files in data_paths are not represented in video_info_df.
- simba.utils.checks.check_ffmpeg_available(raise_error: Optional[bool] = False) -> Optional[bool]
Helper to check if FFmpeg is available via a subprocess call to ffmpeg.
See also
To check which encoders are available in the FFmpeg installation, see simba.utils.lookups.get_ffmpeg_encoders().
- Parameters
raise_error (Optional[bool]) – If True, raises FFMPEGNotFoundError if FFmpeg can't be found. Else returns False. Default False.
- Returns bool
True if ffmpeg returns not None and raise_error is False. Else False.
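The availability check described above can be approximated with the standard library alone. The following is a minimal sketch, not the SimBA implementation: `binary_available` is a hypothetical helper name, and it assumes that a zero exit code from `<name> -version` is a sufficient signal that the tool is usable.

```python
import shutil
import subprocess


def binary_available(name: str = "ffmpeg") -> bool:
    """Hypothetical helper: True if `name` is on PATH and `name -version` exits with code 0."""
    # shutil.which returns None when the executable cannot be found on PATH.
    if shutil.which(name) is None:
        return False
    try:
        result = subprocess.run(
            [name, "-version"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


ffmpeg_found = binary_available("ffmpeg")
missing_found = binary_available("definitely-not-a-real-binary-xyz")
```

A caller that wants the raise_error=True behaviour could raise its own exception when the helper returns False.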
- simba.utils.checks.check_file_exist_and_readable(file_path: Union[str, PathLike], raise_error: bool = True) -> bool
Checks if a path points to a readable file.
- Parameters
file_path (str) – Path to file on disk.
- Raises
NoFilesFoundError – The file does not exist.
CorruptedFileError – The file cannot be read or is zero bytes in size.
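The three conditions named above (the file exists, is readable, and is not zero bytes) can be sketched with `os` alone. This is an illustrative stand-in under those assumptions, not the SimBA code; `file_exists_and_is_readable` is a hypothetical name, and it returns a bool instead of raising.

```python
import os
import tempfile


def file_exists_and_is_readable(file_path: str) -> bool:
    """Hypothetical stand-in: True only for an existing, readable, non-empty file."""
    return (
        os.path.isfile(file_path)
        and os.access(file_path, os.R_OK)
        and os.path.getsize(file_path) > 0
    )


with tempfile.TemporaryDirectory() as tmp_dir:
    empty_path = os.path.join(tmp_dir, "empty.csv")
    data_path = os.path.join(tmp_dir, "data.csv")
    open(empty_path, "w").close()  # zero-byte file
    with open(data_path, "w") as handle:
        handle.write("a,b\n1,2\n")
    results = {
        "missing": file_exists_and_is_readable(os.path.join(tmp_dir, "missing.csv")),
        "empty": file_exists_and_is_readable(empty_path),
        "data": file_exists_and_is_readable(data_path),
    }
```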
- simba.utils.checks.check_filepaths_in_iterable_exist(file_paths: Iterable[str], name: Optional[str] = None)
- simba.utils.checks.check_float(name: str, value: Any, max_value: Optional[float] = None, min_value: Optional[float] = None, raise_error: bool = True, allow_zero: bool = True, allow_negative: bool = True) -> Tuple[bool, str]
Check if variable is a valid float.
- Parameters
name (str) – Name of variable
value (Any) – Value of variable
max_value (Optional[float]) – Maximum allowed value of the float. If None, then no maximum. Default: None.
min_value (Optional[float]) – Minimum allowed value of the float. If None, then no minimum. Default: None.
allow_zero (Optional[bool]) – If False, do not allow the float to be zero. Default: True (zero is allowed).
allow_negative (Optional[bool]) – If False, do not allow the float to be below zero. Default: True (negative values are allowed).
raise_error (Optional[bool]) – If True, then raise error if invalid float. Default: True.
- Returns
If raise_error is False, then returns size-2 tuple, with first value being a bool representing if valid float, and second value a string representing error (if valid is False, else empty string)
- Return type
Tuple[bool, str]
- Examples
>>> check_float(name='My_float', value=0.5, max_value=1.0, min_value=0.0)
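To illustrate the (bool, str) return convention used when raise_error is False, here is a minimal stand-alone sketch. `validate_float` is a hypothetical function, the error strings are invented for illustration, and the order of the checks is an assumption rather than SimBA's actual logic.

```python
from typing import Any, Optional, Tuple


def validate_float(
    name: str,
    value: Any,
    min_value: Optional[float] = None,
    max_value: Optional[float] = None,
    allow_zero: bool = True,
    allow_negative: bool = True,
) -> Tuple[bool, str]:
    """Hypothetical validator returning (is_valid, error_message)."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return False, f"{name} should be a float, but got {value!r}"
    if min_value is not None and number < min_value:
        return False, f"{name} should be at least {min_value}, but got {number}"
    if max_value is not None and number > max_value:
        return False, f"{name} should be at most {max_value}, but got {number}"
    if not allow_zero and number == 0:
        return False, f"{name} cannot be zero"
    if not allow_negative and number < 0:
        return False, f"{name} cannot be negative"
    return True, ""


ok = validate_float("My_float", 0.5, min_value=0.0, max_value=1.0)
not_a_float = validate_float("My_float", "abc")
zero_rejected = validate_float("My_float", 0.0, allow_zero=False)
negative_rejected = validate_float("My_float", -1.0, allow_negative=False)
```

Returning the message alongside the flag lets the caller decide whether to log it, show it to the user, or raise.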
- simba.utils.checks.check_if_2d_array_has_min_unique_values(data: ndarray, min: int) -> bool
Check if a 2D NumPy array has at least a minimum number of unique rows.
For example, use when creating shapely Polygons or Linestrings, which typically require at least two or three unique body-part coordinates.
- Parameters
data (np.ndarray) – Input 2D array to be checked.
min (int) – Minimum number of unique rows required.
- Return bool
True if the input array has at least the specified minimum number of unique rows, False otherwise.
- Example
>>> data = np.array([[0, 0], [0, 0], [0, 0], [0, 1]])
>>> check_if_2d_array_has_min_unique_values(data=data, min=2)
True
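The idea of counting unique rows can be shown without NumPy. The sketch below is a pure-Python stand-in that works on nested lists; `has_min_unique_rows` is a hypothetical name and is not part of SimBA.

```python
from typing import Sequence


def has_min_unique_rows(rows: Sequence[Sequence[float]], minimum: int) -> bool:
    """Hypothetical stand-in: count distinct rows and compare against `minimum`."""
    # Rows are converted to tuples so they can be stored in a set.
    unique_rows = {tuple(row) for row in rows}
    return len(unique_rows) >= minimum


coords = [[0, 0], [0, 0], [0, 0], [0, 1]]
two_unique = has_min_unique_rows(coords, minimum=2)
three_unique = has_min_unique_rows(coords, minimum=3)
```

With the coordinates above there are only two distinct points, which is enough for a line but not for a polygon.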
- simba.utils.checks.check_if_df_field_is_boolean(df: DataFrame, field: Union[str, List[str]], raise_error: bool = True, bool_values: Optional[Tuple[Any]] = (0, 1), df_name: Optional[str] = '')
Validate that one or more DataFrame columns only contain accepted boolean labels.
Accepted values are defined by bool_values (defaults to (0, 1)), so this utility supports both numeric and custom binary encodings.
- Parameters
df (pd.DataFrame) – DataFrame to validate.
field (Union[str, List[str]]) – Column name or list of column names to check.
raise_error (bool) – If True, raise CountError on invalid values. If False, return False when invalid values are detected.
bool_values (Optional[Tuple[Any]]) – Accepted values representing boolean labels.
df_name (Optional[str]) – Optional DataFrame name included in error text.
- Returns
True when validation succeeds, else False if raise_error=False and invalid values are found.
- Return type
bool
- Raises
InvalidInputError – If field is neither str nor List[str].
CountError – If invalid values are found and raise_error=True.
- Example
>>> df = pd.DataFrame({'binary_col': [0, 1, 0, 1], 'mixed_col': [0, 1, 2, 0], 'flag': [1, 0, 1, 0]})
>>> check_if_df_field_is_boolean(df=df, field='binary_col', bool_values=(0, 1))
True
>>> check_if_df_field_is_boolean(df=df, field='mixed_col', raise_error=False)
False
>>> check_if_df_field_is_boolean(df=df, field=['binary_col', 'flag'], bool_values=(0, 1))
True
- simba.utils.checks.check_if_dir_exists(in_dir: Union[str, PathLike], source: Optional[str] = None, create_if_not_exist: Optional[bool] = False, raise_error: bool = True) -> Union[None, bool]
Check if a directory path exists.
- Parameters
in_dir (Union[str, os.PathLike]) – Putative directory path.
source (Optional[str]) – String source for interpretable error messaging.
create_if_not_exist (Optional[bool]) – If directory does not exist, then create it. Default False.
raise_error (Optional[bool]) – If True, raise error if dir does not exist. If False return None. Default True.
- Raises
NotDirectoryError – The directory does not exist.
- simba.utils.checks.check_if_filepath_list_is_empty(filepaths: List[str], error_msg: str) -> None
Check if a list is empty.
- Parameters
filepaths (List[str]) – List of file-paths.
- Raises
NoFilesFoundError – The list is empty.
- simba.utils.checks.check_if_headers_in_dfs_are_unique(dfs: List[DataFrame]) -> List[str]
Helper to check that headers in multiple dataframes are unique.
- Parameters
dfs (List[pd.DataFrame]) – List of dataframes.
- Return List[str]
List of column headers seen in more than one dataframe. Empty if none.
- Examples
>>> df_1, df_2 = pd.DataFrame([[1, 2]], columns=['My_column_1', 'My_column_2']), pd.DataFrame([[4, 2]], columns=['My_column_3', 'My_column_1'])
>>> check_if_headers_in_dfs_are_unique(dfs=[df_1, df_2])
['My_column_1']
- simba.utils.checks.check_if_keys_exist_in_dict(data: dict, key: Union[str, int, tuple, List], name: Optional[str] = '', raise_error: Optional[bool] = True) -> bool
Check if one or more keys exist in a dictionary.
This function validates that all specified keys are present in the given dictionary. It can check for a single key or multiple keys at once.
See also
- Parameters
data (dict) – The dictionary to check for key existence.
key (Union[str, int, tuple, List]) – The key(s) to check for in the dictionary. Can be a single key or a list/tuple of keys.
name (Optional[str]) – A string identifying the source or context of the data for informative error messaging. Default: ''.
raise_error (Optional[bool]) – If True, raises InvalidInputError if any key is missing. If False, returns False instead of raising an error. Default: True.
- Return bool
True if all keys exist in the dictionary, False if any key is missing (when raise_error=False).
- Raises
InvalidInputError – If any of the specified keys do not exist in the dictionary and raise_error=True.
- Example
>>> data = {'a': 1, 'b': 2, 'c': 3}
>>> check_if_keys_exist_in_dict(data=data, key='a')
True
>>> check_if_keys_exist_in_dict(data=data, key=['a', 'b'])
True
>>> check_if_keys_exist_in_dict(data=data, key='d', raise_error=False)
False
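The single-key versus list-of-keys handling described above can be sketched as follows. `find_missing_keys` is a hypothetical helper, and treating every list or tuple as a collection of keys (never as one tuple-valued key) is an assumption made for illustration.

```python
from typing import Any, Dict, List, Union


def find_missing_keys(data: Dict[Any, Any], key: Union[str, int, tuple, List[Any]]) -> List[Any]:
    """Hypothetical helper: return the requested keys that are absent from `data`."""
    # Normalise a single key to a one-element list so both cases share one code path.
    keys = list(key) if isinstance(key, (list, tuple)) else [key]
    return [k for k in keys if k not in data]


data = {"a": 1, "b": 2, "c": 3}
single_present = find_missing_keys(data, "a")
several_present = find_missing_keys(data, ["a", "b"])
single_missing = find_missing_keys(data, "d")
```

An empty result corresponds to a True return from the check. A non-empty result gives the names to put in an error message.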
- simba.utils.checks.check_if_list_contains_values(data: List[Union[str, int, float]], values: List[Union[str, int, float]], name: str, raise_error: bool = True) -> None
Helper to check if values are represented in a list. E.g., make sure annotations of behavior absent and present are represented in an annotation column.
- Parameters
data (List[Union[float, int, str]]) – List of values. E.g., annotation column represented as list.
values (List[Union[float, int, str]]) – Values to confirm are present. E.g., [0, 1].
name (str) – Arbitrary name of the data for more useful error msg.
raise_error (bool) – If True, raise error if not all values can be found in data. Else, print warning.
- Example
>>> check_if_list_contains_values(data=[1,2, 3, 4, 0], values=[0, 1, 6], name='My_data')
- simba.utils.checks.check_if_module_has_import(parsed_file: Module, import_name: str) -> bool
Check if a Python module has a specific import statement. For example, check if module imports argparse or circular statistics mixin.
Used for e.g., user custom feature extraction classes in simba.utils.custom_feature_extractor.CustomFeatureExtractor.
- Parameters
parsed_file (ast.Module) – The abstract syntax tree (AST) of the Python module.
import_name (str) – The name of the module or package to check for in the import statements.
- Return bool
True if the specified import is found in the module, False otherwise.
- Example
>>> parsed_file = ast.parse(Path('/simba/misc/piotr.py').read_text())
>>> check_if_module_has_import(parsed_file=parsed_file, import_name='argparse')
True
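Scanning a parsed module for an import can be done with `ast.walk`. The sketch below is one plausible approach and is not taken from the SimBA source. `module_imports` is a hypothetical name, and matching against both module names and imported names is an assumption.

```python
import ast


def module_imports(parsed_file: ast.Module, import_name: str) -> bool:
    """Hypothetical helper: True if `import_name` appears in any import statement."""
    for node in ast.walk(parsed_file):
        if isinstance(node, ast.Import):
            # Handles `import argparse` and `import a, b`.
            if any(alias.name == import_name for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            # Handles `from collections import OrderedDict`.
            if node.module == import_name:
                return True
            if any(alias.name == import_name for alias in node.names):
                return True
    return False


source = "import argparse\nfrom collections import OrderedDict\n"
parsed = ast.parse(source)
has_argparse = module_imports(parsed, "argparse")
has_ordered_dict = module_imports(parsed, "OrderedDict")
has_numpy = module_imports(parsed, "numpy")
```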
- simba.utils.checks.check_if_string_value_is_valid_video_timestamp(value: str, name: str, raise_error: bool = True) -> bool
Helper to check if a string is in a valid HH:MM:SS format.
- Parameters
- Raises
InvalidInputError – If the timestamp is in an invalid format.
- Example
>>> check_if_string_value_is_valid_video_timestamp(value='00:0b:10', name='My time stamp')
"InvalidInputError: My time stamp is should be in the format XX:XX:XX where X is an integer between 0-9"
>>> check_if_string_value_is_valid_video_timestamp(value='00:00:10', name='My time stamp')
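The XX:XX:XX format rule quoted in the error message above maps naturally onto a regular expression. This is a minimal sketch assuming exactly two digits per field and no range checks on minutes or seconds; `is_valid_timestamp` is a hypothetical name.

```python
import re

# Exactly two digits, colon, two digits, colon, two digits.
TIMESTAMP_PATTERN = re.compile(r"\d{2}:\d{2}:\d{2}")


def is_valid_timestamp(value: str) -> bool:
    """Hypothetical helper: True if `value` looks like HH:MM:SS with digit-only fields."""
    return TIMESTAMP_PATTERN.fullmatch(value) is not None


bad_timestamp = is_valid_timestamp("00:0b:10")
good_timestamp = is_valid_timestamp("00:00:10")
```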
- simba.utils.checks.check_if_valid_img(data: ndarray, source: str = '', raise_error: bool = True, greyscale: bool = False, size: Optional[Tuple[int, int]] = None, color: bool = False) -> Optional[bool]
Check if a variable is a valid image.
- Parameters
source (str) – Name of the variable and/or class origin for informative error messaging and logging.
data (np.ndarray) – Data variable to check if a valid image representation.
greyscale (bool) – Checks that the image is greyscale. Default False.
color (bool) – Checks that the image is color. Default False.
raise_error (bool) – If True, raise InvalidInputError if invalid image representation. Else, return bool.
- simba.utils.checks.check_if_valid_input(name: str, input: str, options: List[str], raise_error: bool = True) -> (bool, str)
Check if a string variable is a valid option.
See also
Consider simba.utils.checks.check_str().
- Parameters
- Return bool
False if invalid. True if valid.
- Return str
If invalid, then error msg. Else, empty str.
- Example
>>> check_if_valid_input(name='split_eval', input='gini', options=['entropy', 'gini'])
(True, '')
- simba.utils.checks.check_if_valid_rgb_str(input: str, delimiter: str = ',', return_cleaned_rgb_tuple: bool = True, reverse_returned: bool = True)
Helper to check if a string is a valid representation of an RGB color.
- Parameters
input (str) – Value to check as string. E.g., '(166, 29, 12)' or '22,32,999'
delimiter (str) – The delimiter between subsequent values in the rgb input string.
return_cleaned_rgb_tuple (bool) – If True, and input is a valid rgb, then returns a 'clean' rgb tuple: E.g., '166, 29, 12' -> (166, 29, 12). Else, returns None.
reverse_returned (bool) – If True and return_cleaned_rgb_tuple is True, reverses the returned cleaned rgb tuple (e.g., RGB becomes BGR) before returning it.
- Example
>>> check_if_valid_rgb_str(input='(50, 25, 100)', return_cleaned_rgb_tuple=True, reverse_returned=True)
(100, 25, 50)
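Parsing and optionally reversing an RGB string can be sketched in a few lines. The function below is a hypothetical stand-in (`parse_rgb_str`), and it returns None for invalid input, which is an assumption; the SimBA function may raise an error instead.

```python
from typing import Optional, Tuple


def parse_rgb_str(text: str, delimiter: str = ",", reverse: bool = False) -> Optional[Tuple[int, ...]]:
    """Hypothetical helper: parse 'R,G,B' or '(R, G, B)' into a tuple, or None if invalid."""
    parts = text.strip().strip("()").split(delimiter)
    if len(parts) != 3:
        return None
    try:
        values = tuple(int(part.strip()) for part in parts)
    except ValueError:
        return None
    # Each channel must fit in 8 bits.
    if any(value < 0 or value > 255 for value in values):
        return None
    return values[::-1] if reverse else values


bgr = parse_rgb_str("(50, 25, 100)", reverse=True)
rgb = parse_rgb_str("166, 29, 12")
out_of_range = parse_rgb_str("22,32,999")
```

Reversing matters when the tuple is passed to OpenCV drawing functions, which expect BGR channel order.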
- simba.utils.checks.check_if_valid_rgb_tuple(data: Tuple[int, int, int], raise_error: bool = True, source: Optional[str] = None) -> bool
- simba.utils.checks.check_if_video_corrupted(video: Union[str, PathLike, VideoCapture], frame_interval: Optional[int] = None, frame_n: Optional[int] = 20, raise_error: Optional[bool] = True) -> None
Check if a video file is corrupted by inspecting a set of its frames.
Note
For decent run-time regardless of video length, pass a smaller frame_n (<100).
- Parameters
video (Union[str, os.PathLike, cv2.VideoCapture]) – Path to the video file or cv2.VideoCapture OpenCV object.
frame_interval (Optional[int]) – Interval between frames to be checked. If None, frame_n will be used.
frame_n (Optional[int]) – Number of frames to be checked, sampled at the largest allowed interval. If None, frame_interval will be used.
raise_error (Optional[bool]) – Whether to raise an error if corruption is found. If False, prints warning.
- Return None
- Example
>>> check_if_video_corrupted(video='/Users/simon/Downloads/NOR ENCODING FExMP8.mp4')
- simba.utils.checks.check_instance(source: str, instance: object, accepted_types: Union[Tuple[Any], Any], raise_error: bool = True, warning: bool = True) -> bool
Check if an instance is an acceptable type.
- Parameters
source (str) – Arbitrary name of instance used for interpretable error msg. Can also be the name of the method.
instance (object) – A data object.
accepted_types (Union[Tuple[object], object]) – Accepted instance types. E.g., (Polygon, pd.DataFrame) or Polygon.
raise_error (Optional[bool]) – If True, raises error if instance is not of valid type, else returns bool.
warning (Optional[bool]) – If True, prints warning if instance is not of valid type, else returns bool.
- simba.utils.checks.check_int(name: str, value: Any, max_value: Optional[int] = None, min_value: Optional[int] = None, unaccepted_vals: Optional[List[int]] = None, accepted_vals: Optional[List[int]] = None, allow_negative: bool = True, allow_zero: bool = True, raise_error: Optional[bool] = True) -> Tuple[bool, str]
Check if variable is a valid integer.
Validates that a value is an integer and optionally checks it against constraints such as minimum/maximum values, accepted/unaccepted value lists, and negative/zero number restrictions.
- Parameters
name (str) – Name of the variable being checked (used in error messages).
value (Any) – The value to validate as an integer.
max_value (Optional[int]) – Maximum allowed value. If None, no maximum constraint. Default None.
min_value (Optional[int]) – Minimum allowed value. If None, no minimum constraint. Default None.
unaccepted_vals (Optional[List[int]]) – List of integer values that are not accepted. If value is in this list, validation fails. Default None.
accepted_vals (Optional[List[int]]) – List of integer values that are accepted. If value is not in this list, validation fails. Default None.
allow_negative (bool) – If False, negative values will cause validation to fail. Default True.
allow_zero (bool) – If False, zero values will cause validation to fail. Default True.
raise_error (Optional[bool]) – If True, raises IntegerError when validation fails. If False, returns (False, error_message) tuple. Default True.
- Returns
If raise_error is False, returns a tuple (bool, str) where bool indicates if value is valid, and str contains error message (empty string if valid). If raise_error is True and validation passes, returns (True, ''). If raise_error is True and validation fails, raises IntegerError.
- Return type
Tuple[bool, str]
- Raises
IntegerError – If validation fails and raise_error is True.
- Example
>>> check_int(name='My_fps', value=25, min_value=1)
>>> check_int(name='Quality', value=50, min_value=0, max_value=100, raise_error=False)
>>> check_int(name='Mode', value=2, accepted_vals=[1, 2, 3])
>>> check_int(name='Count', value=-5, allow_negative=False)
>>> check_int(name='Divisor', value=0, allow_zero=False)
- simba.utils.checks.check_iterable_length(source: str, val: int, exact_accepted_length: Optional[int] = None, max: Optional[int] = inf, min: int = 1, raise_error: bool = True) -> bool
- simba.utils.checks.check_minimum_roll_windows(roll_windows_values: List[int], minimum_fps: float) -> List[int]
Remove any rolling temporal windows that are shorter than a single frame in any of the videos within the project.
- simba.utils.checks.check_nvidea_gpu_available(raise_error: bool = False) -> bool
Helper to check if an NVIDIA GPU is available via nvidia-smi.
- Return bool
True if nvidia-smi returns not None. Else False.
- simba.utils.checks.check_same_files_exist_in_all_directories(dirs: List[Union[str, PathLike]], raise_error: bool = False, file_type: str = 'csv') -> bool
Check if the same files of a given type exist in all specified directories.
- Parameters
dirs (List[Union[str, os.PathLike]]) – List of directory paths to check.
raise_error (bool) – If True, raises an error when file names do not match across directories. Defaults to False.
file_type (str) – File extension (without the dot) to check for (e.g., 'csv', 'txt'). Defaults to 'csv'.
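Comparing file names across directories comes down to comparing sets of names. The sketch below is a stand-alone illustration using temporary directories. `same_file_names_in_all_dirs` is a hypothetical name, and comparing names without their extension is an assumption.

```python
import os
import tempfile
from typing import List


def same_file_names_in_all_dirs(dirs: List[str], file_type: str = "csv") -> bool:
    """Hypothetical helper: True if every directory holds the same set of `file_type` file names."""
    name_sets = []
    for directory in dirs:
        names = {
            os.path.splitext(file_name)[0]
            for file_name in os.listdir(directory)
            if file_name.lower().endswith(f".{file_type}")
        }
        name_sets.append(names)
    # Every set must equal the first one (trivially True for an empty list of dirs).
    return all(names == name_sets[0] for names in name_sets)


def _touch(directory: str, file_name: str) -> None:
    open(os.path.join(directory, file_name), "w").close()


with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b, tempfile.TemporaryDirectory() as dir_c:
    for file_name in ("video_1.csv", "video_2.csv"):
        _touch(dir_a, file_name)
        _touch(dir_b, file_name)
    _touch(dir_c, "video_1.csv")
    matching = same_file_names_in_all_dirs([dir_a, dir_b])
    mismatching = same_file_names_in_all_dirs([dir_a, dir_b, dir_c])
```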
- simba.utils.checks.check_same_number_of_rows_in_dfs(dfs: List[DataFrame]) -> bool
Helper to check that each dataframe in a list contains an equal number of rows.
- Parameters
dfs (List[pd.DataFrame]) – List of dataframes.
- Return bool
True if the dataframes have an equal number of rows. Else False.
- Example
>>> df_1, df_2 = pd.DataFrame([[1, 2], [1, 2]]), pd.DataFrame([[4, 2], [9, 3], [1, 5]])
>>> check_same_number_of_rows_in_dfs(dfs=[df_1, df_2])
False
>>> df_1, df_2 = pd.DataFrame([[1, 2], [1, 2]]), pd.DataFrame([[4, 2], [9, 3]])
>>> check_same_number_of_rows_in_dfs(dfs=[df_1, df_2])
True
- simba.utils.checks.check_str(name: str, value: Any, options: Optional[Union[Tuple[Any], List[Any], Iterable[Any]]] = (), allow_blank: bool = False, invalid_options: Optional[Union[List[str], Tuple[str]]] = None, raise_error: bool = True, invalid_substrs: Optional[Union[List[str], Tuple[str]]] = None) -> Tuple[bool, str]
Check if variable is a valid string.
- Parameters
name (str) – Name of variable
value (Any) – Value of variable
options (Optional[Tuple[Any]]) – Tuple of allowed strings. If empty tuple, then any string allowed. Default: ().
allow_blank (Optional[bool]) – If True, allow empty string. Default: False.
raise_error (Optional[bool]) – If True, then raise error if invalid string. Default: True.
invalid_options (Optional[List[str]]) – If not None, then a list of strings that are invalid.
invalid_substrs (Optional[List[str]]) – If not None, then a list of characters or substrings that are not allowed in the string.
- Returns
If raise_error is False, then returns size-2 Tuple, with first value being a bool representing if valid string, and second value a string representing error reason (if valid is False, else empty string).
- Return type
Tuple[bool, str]
- Examples
>>> check_str(name='split_eval', value='gini', options=['entropy', 'gini'])
- simba.utils.checks.check_that_column_exist(df: DataFrame, column_name: Union[str, PathLike, List[str]], file_name: str, raise_error: bool = True) -> Union[None, bool]
Check if single named field or a list of fields exist within a dataframe.
See also
Consider simba.utils.checks.check_valid_dataframe() instead.
- Parameters
df (pd.DataFrame) – The DataFrame to check for column existence.
column_name (Union[str, os.PathLike, List[str]]) – Name or names of field(s) to check for existence.
file_name (str) – Path of df on disk (used for error messages).
raise_error (bool) – If True, raises ColumnNotFoundError if column doesn't exist. If False, returns bool. Default: True.
- Returns
True if all columns exist, False if any column is missing (when raise_error=False), None if raise_error=True and all columns exist.
- Return type
Union[None, bool]
- Raises
ColumnNotFoundError – The column_name does not exist within df.
- Example
>>> df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
>>> check_that_column_exist(df=df, column_name='A', file_name='test.csv')
True
>>> check_that_column_exist(df=df, column_name=['A', 'B'], file_name='test.csv')
True
>>> check_that_column_exist(df=df, column_name='C', file_name='test.csv', raise_error=False)
False
- simba.utils.checks.check_that_dir_has_list_of_filenames(dir: Union[str, PathLike], file_name_lst: List[str], file_type: Optional[str] = 'csv')
Check that all file names in a list have an equivalent file in a specified directory. E.g., check if all files in the outlier corrected folder have an equivalent file in the features_extracted directory.
- Example
>>> file_name_lst = glob.glob('/Users/simon/Desktop/envs/troubleshooting/two_black_animals_14bp/project_folder/csv/outlier_corrected_movement' + '/*.csv')
>>> check_that_dir_has_list_of_filenames(dir='/Users/simon/Desktop/envs/troubleshooting/two_black_animals_14bp/project_folder/csv/features_extracted', file_name_lst=file_name_lst)
- simba.utils.checks.check_that_directory_is_empty(directory: Union[str, PathLike], raise_error: Optional[bool] = True) -> None
Checks if a directory is empty. If the directory has content, then returns False or raises DirectoryNotEmptyError.
- Parameters
directory (str) – Directory to check.
- Raises
DirectoryNotEmptyError – If directory contains files.
- simba.utils.checks.check_that_hhmmss_start_is_before_end(start_time: str, end_time: str, name: str, raise_error: bool = True) -> bool
Helper to check that a start time in HH:MM:SS or HH:MM:SS:MS format is before an end time in HH:MM:SS or HH:MM:SS:MS format.
- Parameters
- Raises
InvalidInputError – If end time is before the start time.
- Example
>>> check_that_hhmmss_start_is_before_end(start_time='00:00:05', end_time='00:00:01', name='My time period')
"InvalidInputError: My time period has an end-time which is before the start-time"
>>> check_that_hhmmss_start_is_before_end(start_time='00:00:01', end_time='00:00:05', name='My time period')
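One way to compare two timestamps is to convert each to a number of seconds first. The sketch below covers only the HH:MM:SS form (the HH:MM:SS:MS variant is not handled), and both function names are hypothetical.

```python
def hhmmss_to_seconds(timestamp: str) -> int:
    """Hypothetical helper: convert 'HH:MM:SS' to a total number of seconds."""
    hours, minutes, seconds = (int(part) for part in timestamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def start_is_before_end(start_time: str, end_time: str) -> bool:
    """Hypothetical helper: True if `start_time` is strictly earlier than `end_time`."""
    return hhmmss_to_seconds(start_time) < hhmmss_to_seconds(end_time)


reversed_period = start_is_before_end("00:00:05", "00:00:01")
valid_period = start_is_before_end("00:00:01", "00:00:05")
```

Whether equal start and end times should count as valid is a design choice. This sketch treats them as invalid.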
- simba.utils.checks.check_umap_hyperparameters(hyper_parameters: Dict[str, Any]) -> None
Checks if a dictionary of parameters (umap, scaling, etc.) is valid for grid-search umap dimensionality reduction.
- Parameters
hyper_parameters (dict) – Dictionary holding umap hyperparameters.
- Raises
InvalidInputError – If any input is invalid
- Example
>>> check_umap_hyperparameters(hyper_parameters={'n_neighbors': [2], 'min_distance': [0.1], 'spread': [1], 'scaler': 'MIN-MAX', 'variance': 0.2})
- simba.utils.checks.check_valid_array(data: ndarray, source: Optional[str] = '', accepted_ndims: Optional[Union[Tuple[int], Any]] = None, accepted_sizes: Optional[List[int]] = None, accepted_axis_0_shape: Optional[Union[List[int], Tuple[int]]] = None, accepted_axis_1_shape: Optional[Union[List[int], Tuple[int]]] = None, accepted_dtypes: Optional[Union[List[Union[str, Type]], Tuple[Union[str, Type]], Iterable[Any]]] = None, accepted_values: Optional[List[Any]] = None, accepted_shapes: Optional[List[Tuple[int]]] = None, min_axis_0: Optional[int] = None, max_axis_1: Optional[int] = None, min_axis_1: Optional[int] = None, min_value: Optional[Union[float, int]] = None, max_value: Optional[Union[float, int]] = None, raise_error: bool = True) -> Union[None, bool]
Check if the given array satisfies specified criteria regarding its dimensions, shape, and data type.
- Parameters
data (np.ndarray) – The numpy array to be checked.
source (Optional[str]) – A string identifying the source, name, or purpose of the array for interpretable error messaging.
accepted_ndims (Optional[Union[Tuple[int], Any]]) – Tuple of acceptable numbers of dimensions. If provided, checks whether the array's number of dimensions matches any value in the tuple.
accepted_sizes (Optional[List[int]]) – List of acceptable sizes for the array's shape. If provided, checks whether the length of the array's shape matches any value in the list.
accepted_axis_0_shape (Optional[Union[List[int], Tuple[int]]]) – List of accepted number of rows of 2-dimensional array. Will also raise error if value passed and input is not a 2-dimensional array.
accepted_axis_1_shape (Optional[Union[List[int], Tuple[int]]]) – List of accepted number of columns or fields of 2-dimensional array. Will also raise error if value passed and input is not a 2-dimensional array.
accepted_dtypes (Optional[Union[List[Union[str, Type]], Tuple[Union[str, Type]], Iterable[Any]]]) – List of acceptable data types for the array. If provided, checks whether the array's data type matches any entry in the list.
accepted_values (Optional[List[Any]]) – List of acceptable values that can be present in the array.
accepted_shapes (Optional[List[Tuple[int]]]) – List of acceptable shapes for the array. If provided, checks whether the array's shape matches any tuple in the list.
min_axis_0 (Optional[int]) – Minimum number of rows required for the array.
max_axis_1 (Optional[int]) – Maximum number of columns allowed for the array.
min_axis_1 (Optional[int]) – Minimum number of columns required for the array.
min_value (Optional[Union[float, int]]) – Minimum value allowed in the array.
max_value (Optional[Union[float, int]]) – Maximum value allowed in the array.
raise_error (bool) – If True, raises ArrayError if validation fails. If False, returns bool. Default: True.
- Returns
True if array passes all validation checks, False if validation fails (when raise_error=False), None if raise_error=True and validation passes.
- Return type
Union[None, bool]
- Example
>>> data = np.array([[1, 2], [3, 4]])
>>> check_valid_array(data, source="Example", accepted_ndims=(2,), accepted_sizes=[2], accepted_dtypes=[np.int64])
True
>>> check_valid_array(data, source="Example", min_axis_0=3, raise_error=False)
False
- simba.utils.checks.check_valid_boolean(value: Union[Any, List[Any]], source: Optional[str] = '', raise_error: Optional[bool] = True)
Check if a value or list of values contains only valid boolean values.
This function validates that the input value(s) are valid Python boolean values (True or False). It can handle single values or lists of values, and provides flexible error handling options.
- Parameters
value (Union[Any, List[Any]]) – Single value or list of values to validate for boolean type.
source (Optional[str]) – Source identifier for error messages. Default: ''.
raise_error (Optional[bool]) – If True, raises InvalidInputError when non-boolean values are found. If False, returns False. Default: True.
- Returns
True if all values are valid booleans, False if any non-boolean values found and raise_error=False.
- Return type
bool
- Raises
InvalidInputError – If non-boolean values are found and raise_error=True.
- Example
>>> check_valid_boolean(True)
True
>>> check_valid_boolean([True, False, True])
True
>>> check_valid_boolean([True, 1, False], raise_error=False)
False
>>> check_valid_boolean('not_bool', raise_error=False)
False
- simba.utils.checks.check_valid_codec(codec: str, raise_error: bool = True, source: str = '')
Validate that a codec string is available in the current FFmpeg installation.
Checks if the provided codec name exists in the list of available FFmpeg encoders by querying FFmpeg directly. This ensures the codec can be used for video encoding/decoding.
Note
This function requires FFmpeg to be installed and available in the system PATH. The function queries FFmpeg for available encoders at runtime, so it will reflect the actual encoders available in your FFmpeg installation.
See also
To get a list of all available encoders, see get_ffmpeg_encoders(). To check if FFmpeg is available, see check_ffmpeg_available().
- Parameters
codec (str) – The codec name to validate (e.g., 'libx264', 'h264_nvenc', 'libvpx-vp9').
raise_error (bool) – If True, raises InvalidInputError when codec is invalid. If False, returns False. Default: True.
source (str) – Source identifier for error messages. Used when raising exceptions. Default: ''.
- Returns
True if codec is valid, False if invalid and raise_error=False.
- Return type
bool
- Raises
InvalidInputError – If codec is not valid and raise_error=True.
- Example
>>> check_valid_codec(codec='libx264')
>>> check_valid_codec(codec='h264_nvenc', source='my_function')
>>> is_valid = check_valid_codec(codec='invalid_codec', raise_error=False)
- simba.utils.checks.check_valid_cpu_pool(value: Any, source: str = '', max_cores: Optional[int] = None, min_cores: Optional[int] = None, accepted_cores: Optional[Union[List[int], Tuple[int, ...], int]] = None, raise_error: bool = True) -> bool
Validates that a value is a valid multiprocessing.Pool instance and optionally checks core count constraints.
- Parameters
value (Any) – The value to validate. Must be an instance of multiprocessing.pool.Pool.
source (str) – Optional source identifier for error messages. Default is empty string.
max_cores (Optional[int]) – Optional maximum number of processes allowed in the pool. If provided, validates that pool._processes <= max_cores.
min_cores (Optional[int]) – Optional minimum number of processes required in the pool. If provided, validates that pool._processes >= min_cores.
accepted_cores (Optional[Union[List[int], Tuple[int, ...], int]]) – Optional exact or list of acceptable process counts. If an int, validates that pool._processes == accepted_cores. If a list/tuple of ints, validates that pool._processes is in accepted_cores. All values must be positive integers.
raise_error (bool) – If True, raises InvalidInputError on validation failure. If False, returns False on failure. Default is True.
- Return bool
True if validation passes, False if validation fails and raise_error is False.
- Raises
InvalidInputError – If raise_error is True and value is not a valid Pool instance, core count constraints are violated, or accepted_cores contains invalid types.
- Example
>>> import multiprocessing
>>> pool = multiprocessing.Pool(processes=4)
>>> check_valid_cpu_pool(value=pool, source='test', max_cores=8, min_cores=2)
True
>>> check_valid_cpu_pool(value=pool, source='test', accepted_cores=[4, 8, 16])
True
>>> check_valid_cpu_pool(value=pool, source='test', accepted_cores=4)
True
- simba.utils.checks.check_valid_dataframe(df: DataFrame, source: Optional[str] = '', valid_dtypes: Optional[Tuple[Any]] = None, required_fields: Optional[List[str]] = None, min_axis_0: Optional[int] = None, min_axis_1: Optional[int] = None, max_axis_0: Optional[int] = None, max_axis_1: Optional[int] = None, allow_duplicate_col_names=True, accepted_rows: Optional[Union[int, Tuple[int]]] = None)
Validate a DataFrame against various criteria.
This function performs comprehensive validation of a pandas DataFrame including data types, dimensions, required columns, and duplicate column names. It raises exceptions for any validation failures.
- Parameters
df (pd.DataFrame) - The DataFrame to validate.
source (Optional[str]) - Source identifier for error messages. Default: ''.
valid_dtypes (Optional[Tuple[Any]]) - Tuple of allowed data types. If None, no dtype validation. Default: None.
required_fields (Optional[List[str]]) - List of required column names. If None, no field validation. Default: None.
min_axis_0 (Optional[int]) - Minimum number of rows required. If None, no minimum row validation. Default: None.
min_axis_1 (Optional[int]) - Minimum number of columns required. If None, no minimum column validation. Default: None.
max_axis_0 (Optional[int]) - Maximum number of rows allowed. If None, no maximum row validation. Default: None.
max_axis_1 (Optional[int]) - Maximum number of columns allowed. If None, no maximum column validation. Default: None.
allow_duplicate_col_names (bool) - If False, raises error for duplicate column names. Default: True.
- Returns
None if validation passes.
- Return type
None
- Raises
InvalidInputError - If any validation criteria are not met.
- Example
>>> df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
>>> check_valid_dataframe(df=df, required_fields=['A', 'B'], min_axis_0=1)
>>> check_valid_dataframe(df=df, valid_dtypes=(int,), max_axis_1=2)
>>> check_valid_dataframe(df=df, allow_duplicate_col_names=False)
- simba.utils.checks.check_valid_device(device: Union[typing_extensions.Literal['cpu'], int], raise_error: bool = True) bool
Validate a compute device specification, ensuring it is either 'cpu' or a valid GPU index.
This function validates that a device specification is valid for use with PyTorch/CUDA operations. It checks if the device is either 'cpu' for CPU usage or a valid integer representing a CUDA device index.
- Parameters
device (Union[Literal['cpu'], int]) - The device to validate. Should be the string 'cpu' for CPU usage, or an integer representing a CUDA device index (e.g., 0 for 'cuda:0').
raise_error (bool) - If True, raises InvalidInputError or SimBAGPUError when the device is invalid. If False, returns False instead of raising errors. Default: True.
- Returns
True if the device is valid, False if it's invalid and raise_error=False.
- Return type
bool
- Raises
InvalidInputError - If the device format is invalid and raise_error=True.
SimBAGPUError - If the GPU device is not available or not valid and raise_error=True.
- Example
>>> check_valid_device('cpu')
True
>>> check_valid_device(0)  # GPU 0
True
>>> check_valid_device(5, raise_error=False)  # Non-existent GPU
False
>>> check_valid_device('gpu', raise_error=False)  # Invalid format
False
- simba.utils.checks.check_valid_dict(x: dict, valid_key_dtypes: Optional[Tuple[Any]] = None, valid_values_dtypes: Optional[Tuple[Any, ...]] = None, valid_keys: Optional[Union[Tuple[Any], List[Any]]] = None, max_len_keys: Optional[int] = None, min_len_keys: Optional[int] = None, required_keys: Optional[Tuple[Any, ...]] = None, max_value: Optional[Union[float, int]] = None, min_value: Optional[Union[float, int]] = None, source: Optional[str] = None)
Validate a dictionary against various criteria.
This function performs comprehensive validation of a dictionary including key/value data types, key constraints, required keys, and numeric value ranges. It raises exceptions for any validation failures.
- Parameters
x (dict) - The dictionary to validate.
valid_key_dtypes (Optional[Tuple[Any]]) - Tuple of allowed data types for dictionary keys. If None, no key type validation. Default: None.
valid_values_dtypes (Optional[Tuple[Any, ...]]) - Tuple of allowed data types for dictionary values. If None, no value type validation. Default: None.
valid_keys (Optional[Union[Tuple[Any], List[Any]]]) - Tuple or list of valid key names. If None, no key name validation. Default: None.
max_len_keys (Optional[int]) - Maximum number of keys allowed. If None, no maximum key count validation. Default: None.
min_len_keys (Optional[int]) - Minimum number of keys required. If None, no minimum key count validation. Default: None.
required_keys (Optional[Tuple[Any, ...]]) - Tuple of required key names. If None, no required key validation. Default: None.
max_value (Optional[Union[float, int]]) - Maximum value allowed for numeric values. If None, no maximum value validation. Default: None.
min_value (Optional[Union[float, int]]) - Minimum value allowed for numeric values. If None, no minimum value validation. Default: None.
source (Optional[str]) - Source identifier for error messages. If None, uses function name. Default: None.
- Returns
None if validation passes.
- Return type
None
- Raises
InvalidInputError - If any validation criteria are not met.
- Example
>>> check_valid_dict(x={'a': 1, 'b': 2}, valid_key_dtypes=(str,), valid_values_dtypes=(int,))
>>> check_valid_dict(x={'key1': 10, 'key2': 20}, required_keys=('key1',), min_value=5, max_value=25)
>>> check_valid_dict(x={'x': 1, 'y': 2}, valid_keys=('x', 'y', 'z'), min_len_keys=2)
- simba.utils.checks.check_valid_extension(path: Union[str, PathLike], accepted_extensions: Union[List[str], str])
Checks if the file extension of the provided path is in the list of accepted extensions.
- Parameters
path (Union[str, os.PathLike]) - The path to the file whose extension needs to be checked.
accepted_extensions (Union[List[str], str]) - A list of accepted file extensions, or a single accepted extension. E.g., ['pickle', 'csv'].
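The extension check described above can be sketched with the standard library alone. This is a minimal illustration, not SimBA's implementation: the helper name `has_accepted_extension` is hypothetical, and the sketch assumes extensions are given without a leading dot (as in the `['pickle', 'csv']` example) and compared case-insensitively.

```python
from pathlib import Path
from typing import List, Union


def has_accepted_extension(path: str, accepted_extensions: Union[List[str], str]) -> bool:
    """Hypothetical helper: True if the extension of `path` is in the accepted set."""
    # Allow a single extension to be passed as a plain string.
    if isinstance(accepted_extensions, str):
        accepted_extensions = [accepted_extensions]
    # Normalise both sides: drop any leading dot and ignore case (an assumption).
    accepted = {ext.lower().lstrip(".") for ext in accepted_extensions}
    return Path(path).suffix.lower().lstrip(".") in accepted


print(has_accepted_extension("data/model.pickle", ["pickle", "csv"]))
print(has_accepted_extension("data/notes.txt", ["pickle", "csv"]))
```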
- simba.utils.checks.check_valid_hex_color(color_hex: str, raise_error: Optional[bool] = True) bool
Check if given string represents a valid hexadecimal color code.
- Parameters
- Return bool
True if the color_hex is a valid hexadecimal color code; False otherwise (if raise_error is False).
- Raises
IntegerError - If the color_hex is an invalid hexadecimal color code and raise_error is True.
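As a rough illustration of hexadecimal color validation, the sketch below uses a regular expression. The helper `looks_like_hex_color` is hypothetical, and the accepted format ('#' followed by 3 or 6 hex digits) is an assumption; the SimBA function may accept a different set of forms.

```python
import re

# Assumed format: '#' followed by exactly 3 or 6 hexadecimal digits.
HEX_COLOR_PATTERN = re.compile(r"#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})")


def looks_like_hex_color(color_hex: str) -> bool:
    """Hypothetical helper: True if `color_hex` matches the assumed hex color format."""
    return isinstance(color_hex, str) and HEX_COLOR_PATTERN.fullmatch(color_hex) is not None


print(looks_like_hex_color("#ffd400"))
print(looks_like_hex_color("ffd400"))
```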
- simba.utils.checks.check_valid_img_path(path: Union[str, PathLike], raise_error: bool = True)
Check if a file path is a valid image file.
This function validates that a file path exists, is readable, and can be opened as an image file using OpenCV. It performs basic image file validation by attempting to read the file with cv2.imread.
- Parameters
path (Union[str, os.PathLike]) - Path to the image file to validate.
raise_error (bool) - If True, raises InvalidInputError when file is not a valid image. If False, returns False. Default: True.
- Returns
True if the file is a valid image file, False if it's not valid and raise_error=False.
- Return type
- Raises
InvalidInputError - If the file is not a valid image file and raise_error=True.
- Example
>>> check_valid_img_path('/path/to/image.jpg')
True
>>> check_valid_img_path('/path/to/invalid.txt', raise_error=False)
False
>>> check_valid_img_path('/path/to/corrupted.png', raise_error=False)
False
- simba.utils.checks.check_valid_lst(data: list, source: Optional[str] = '', valid_dtypes: Optional[Union[Tuple[Any], List[Any], Any]] = None, valid_values: Optional[List[Any]] = None, min_len: Optional[int] = 1, max_len: Optional[int] = None, min_value: Optional[float] = None, exact_len: Optional[int] = None, raise_error: Optional[bool] = True) bool
Check the validity of a list based on passed criteria.
- Parameters
data (list) - The input list to be validated.
source (Optional[str]) - A string indicating the source or context of the data for informative error messaging.
valid_dtypes (Optional[Union[Tuple[Any], List[Any], Any]]) - A tuple, list, or single type of accepted data types. If provided, check if all elements in the list have data types in this collection.
valid_values (Optional[List[Any]]) - A list of accepted list values. If provided, check if all elements in the list have matching values in this list.
min_len (Optional[int]) - The minimum allowed length of the list. Default: 1.
max_len (Optional[int]) - The maximum allowed length of the list.
min_value (Optional[float]) - The minimum value allowed for numeric elements in the list.
exact_len (Optional[int]) - The exact length required for the list. If provided, overrides min_len and max_len.
raise_error (Optional[bool]) - If True, raise an InvalidInputError if any validation fails. If False, return False instead of raising an error. Default: True.
- Return bool
True if all validation criteria are met, False otherwise.
- Example
>>> check_valid_lst(data=[1, 2, 'three'], valid_dtypes=(int, str), min_len=2, max_len=5)
True
>>> check_valid_lst(data=[1, 2, 3], valid_dtypes=(int,), exact_len=3)
True
>>> check_valid_lst(data=[1, 2, 3], min_value=0, raise_error=False)
True
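To make the interaction between the length and type criteria concrete, here is a simplified sketch of this style of list validation. The function `list_passes_checks` is hypothetical and covers only a subset of the parameters above; it returns a bool rather than raising, and it assumes (as documented) that exact_len takes precedence over min_len and max_len.

```python
from typing import Any, List, Optional, Tuple


def list_passes_checks(data: List[Any],
                       valid_dtypes: Optional[Tuple[type, ...]] = None,
                       min_len: int = 1,
                       max_len: Optional[int] = None,
                       exact_len: Optional[int] = None,
                       min_value: Optional[float] = None) -> bool:
    """Hypothetical helper: True if `data` satisfies every supplied criterion."""
    if not isinstance(data, list):
        return False
    # exact_len, when given, replaces the min_len / max_len checks.
    if exact_len is not None:
        if len(data) != exact_len:
            return False
    else:
        if len(data) < min_len:
            return False
        if max_len is not None and len(data) > max_len:
            return False
    # Every element must be an instance of one of the accepted types.
    if valid_dtypes is not None and not all(isinstance(v, valid_dtypes) for v in data):
        return False
    # Only numeric elements are compared against min_value (bools are skipped).
    if min_value is not None:
        numeric = [v for v in data if isinstance(v, (int, float)) and not isinstance(v, bool)]
        if any(v < min_value for v in numeric):
            return False
    return True


print(list_passes_checks([1, 2, "three"], valid_dtypes=(int, str), min_len=2, max_len=5))
```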
- simba.utils.checks.check_valid_polygon(polygon: Union[ndarray, Polygon], raise_error: bool = True, name: Optional[str] = None) Optional[bool]
Validates whether the given polygon is a valid geometric shape.
- Parameters
polygon (Union[np.ndarray, Polygon]) - The polygon to validate, either as a NumPy array of shape (N, 2) or a shapely Polygon object.
raise_error (bool) - If True, raises an InvalidInputError if the polygon is invalid; otherwise, returns False.
name (Optional[str]) - An optional name for the polygon to include in error messages.
- Returns
True if the polygon is valid, False if invalid (and raise_error is False), or None if an error is raised.
- simba.utils.checks.check_valid_tuple(x: tuple, source: Optional[str] = '', accepted_lengths: Optional[Tuple[int]] = None, valid_dtypes: Optional[Tuple[Any]] = None, minimum_length: Optional[int] = None, accepted_values: Optional[Iterable[Any]] = None, min_integer: Optional[int] = None, raise_error: bool = True) bool
Validate a tuple against various criteria.
This function performs comprehensive validation of a tuple including length constraints, data types, minimum values, and accepted values. It raises exceptions for any validation failures.
- Parameters
x (tuple) - The tuple to validate.
source (Optional[str]) - Source identifier for error messages. Default: ''.
accepted_lengths (Optional[Tuple[int]]) - Tuple of accepted lengths. If None, no length validation. Default: None.
valid_dtypes (Optional[Tuple[Any]]) - Tuple of allowed data types for tuple elements. If None, no dtype validation. Default: None.
minimum_length (Optional[int]) - Minimum length required. If None, no minimum length validation. Default: None.
accepted_values (Optional[Iterable[Any]]) - Iterable of accepted values for tuple elements. If None, no value validation. Default: None.
min_integer (Optional[int]) - Minimum value for integer elements. If None, no integer validation. Default: None.
- Returns
None if validation passes.
- Return type
None
- Raises
InvalidInputError - If any validation criteria are not met.
- Example
>>> check_valid_tuple(x=(1, 2, 3), accepted_lengths=(2, 3), valid_dtypes=(int,))
>>> check_valid_tuple(x=('a', 'b'), minimum_length=2, accepted_values=['a', 'b', 'c'])
>>> check_valid_tuple(x=(5, 10, 15), min_integer=5)
- simba.utils.checks.check_valid_url(url: str, raise_error: bool = False, source: str = '') bool
Check if a string is a valid URL (http, https, or ftp).
- Parameters
- Returns
True if the string is a valid URL, False otherwise.
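One way to approximate such a URL check with the standard library is shown below. This is a sketch under stated assumptions, not the SimBA implementation: the helper `looks_like_url` is hypothetical, and "valid" is taken to mean an http, https, or ftp scheme plus a non-empty network location.

```python
from urllib.parse import urlparse

# Schemes named in the description above.
ACCEPTED_SCHEMES = ("http", "https", "ftp")


def looks_like_url(url: str) -> bool:
    """Hypothetical helper: True if `url` has an accepted scheme and a host part."""
    if not isinstance(url, str):
        return False
    parsed = urlparse(url)
    # Both a recognised scheme and a network location are required.
    return parsed.scheme in ACCEPTED_SCHEMES and bool(parsed.netloc)


print(looks_like_url("https://github.com/sgoldenlab/simba"))
print(looks_like_url("github.com/sgoldenlab/simba"))
```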
- simba.utils.checks.check_video_and_data_frm_count_align(video: Union[str, PathLike, VideoCapture], data: Union[str, PathLike, DataFrame], name: Optional[str] = '', raise_error: Optional[bool] = True) Union[None, bool]
Check if the frame count of a video matches the row count of a data file.
- Parameters
video (Union[str, os.PathLike, cv2.VideoCapture]) - Path to the video file or cv2.VideoCapture object.
data (Union[str, os.PathLike, pd.DataFrame]) - Path to the data file or DataFrame containing the data.
name (Optional[str]) - Name of the video (optional, used for interpretable error messages).
raise_error (Optional[bool]) - Whether to raise an error if the counts don't align. If False, prints a warning instead. Default: True.
- Return None
- Example
>>> data_1 = '/Users/simon/Desktop/envs/simba/troubleshooting/mouse_open_field/project_folder/csv/outlier_corrected_movement_location/SI_DAY3_308_CD1_PRESENT.csv'
>>> video_1 = '/Users/simon/Desktop/envs/simba/troubleshooting/mouse_open_field/project_folder/frames/output/ROI_analysis/SI_DAY3_308_CD1_PRESENT.mp4'
>>> check_video_and_data_frm_count_align(video=video_1, data=data_1, raise_error=True)
- simba.utils.checks.check_video_has_rois(roi_dict: Dict[str, DataFrame], roi_names: Optional[List[str]] = None, video_names: Optional[List[str]] = None, source: str = 'roi dict', raise_error: bool = True)
Check that specified videos all have user-defined ROIs with specified names.
This function validates that all specified videos contain the required ROIs (Regions of Interest) with the specified names. It checks across all ROI types: rectangles, circles, and polygons.
Note
To get roi dictionary, see simba.mixins.config_reader.ConfigReader.read_roi_data().
- Parameters
roi_dict (Dict[str, pd.DataFrame]) - Dictionary containing ROI dataframes with keys for rectangles, circles, and polygons.
roi_names (Optional[List[str]]) - List of ROI names to check for. If None, uses all unique ROI names from the data. Default: None.
video_names (Optional[List[str]]) - List of video names to check. If None, uses all unique video names from the data. Default: None.
source (str) - A string identifying the source or context for informative error messaging. Default: 'roi dict'.
raise_error (bool) - If True, raises NoROIDataError if any videos are missing required ROIs. If False, returns tuple with validation result and missing ROIs. Default: True.
- Returns
If raise_error=True: None if all validations pass, raises exception if validation fails. If raise_error=False: Tuple of (bool, dict) where bool indicates success and dict contains missing ROIs by video.
- Return type
- Raises
NoROIDataError - If any videos are missing required ROIs and raise_error=True.
- Example
>>> roi_dict = {
...     'rectangles': pd.DataFrame({'Video': ['video1'], 'Name': ['ROI1']}),
...     'circles': pd.DataFrame({'Video': ['video1'], 'Name': ['ROI2']}),
...     'polygons': pd.DataFrame({'Video': ['video1'], 'Name': ['ROI3']})
... }
>>> check_video_has_rois(roi_dict=roi_dict, roi_names=['ROI1', 'ROI2'], video_names=['video1'])
True
>>> check_video_has_rois(roi_dict=roi_dict, roi_names=['ROI1', 'ROI4'], video_names=['video1'], raise_error=False)
(False, {'video1': ['ROI4']})
- simba.utils.checks.get_fn_ext(filepath: Union[PathLike, str]) Tuple[str, str, str]
Split file path into three components: (i) directory, (ii) file name, and (iii) file extension.
- Parameters
filepath (str) - Path to file.
- Return str
File directory name
- Return str
File name
- Return str
File extension
- Example
>>> get_fn_ext(filepath='C:/My_videos/MyVideo.mp4')
('My_videos', 'MyVideo', '.mp4')
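The three-way split can be illustrated with `os.path` from the standard library. The helper `split_path` below is a hypothetical stand-in, not the SimBA function; with this sketch the first element is the full directory path, which may differ from how the documented function reports the directory.

```python
import os
from typing import Tuple


def split_path(filepath: str) -> Tuple[str, str, str]:
    """Hypothetical helper: split a path into (directory, file name, extension)."""
    directory = os.path.dirname(filepath)
    # splitext keeps the leading dot on the extension, e.g. '.mp4'.
    file_name, extension = os.path.splitext(os.path.basename(filepath))
    return directory, file_name, extension


directory, file_name, extension = split_path("C:/My_videos/MyVideo.mp4")
print(directory, file_name, extension)
```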
- simba.utils.checks.is_img_bw(img: ndarray, raise_error: bool = True, source: Optional[str] = '') bool
Check if an image is binary black and white.
This function validates that an image contains only two pixel values: 0 (black) and 255 (white). It checks all unique pixel values in the image and ensures they are exactly these two values.
- Parameters
img (np.ndarray) - The image array to validate for binary black and white format.
raise_error (bool) - If True, raises InvalidInputError when image is not binary black and white. If False, returns False. Default: True.
source (Optional[str]) - Source identifier for error messages. Default: ''.
- Returns
True if the image is binary black and white, False if it's not and raise_error=False.
- Return type
bool
- Raises
InvalidInputError - If the image is not binary black and white and raise_error=True.
- Example
>>> bw_img = np.array([[0, 255], [255, 0]], dtype=np.uint8)
>>> is_img_bw(bw_img)
True
>>> gray_img = np.array([[128, 200], [50, 100]], dtype=np.uint8)
>>> is_img_bw(gray_img, raise_error=False)
False
- simba.utils.checks.is_img_greyscale(img: ndarray, raise_error: bool = True, source: Optional[str] = '') bool
Check if an image is greyscale.
This function validates that an image is in greyscale format by checking that it has exactly 2 dimensions (height and width). Greyscale images have a single channel and are represented as 2D arrays.
- Parameters
- Returns
True if the image is greyscale, False if it's not and raise_error=False.
- Return type
bool
- Raises
InvalidInputError - If the image is not greyscale and raise_error=True.
- Example
>>> gray_img = np.array([[128, 200], [50, 100]], dtype=np.uint8)
>>> is_img_greyscale(gray_img)
True
>>> color_img = np.array([[[128, 200, 50], [100, 150, 75]]], dtype=np.uint8)
>>> is_img_greyscale(color_img, raise_error=False)
False
- simba.utils.checks.is_lxc_container() bool
Helper to check if the current environment is inside an LXC Linux container.
Note
See GitHub issue 457 for the origin of this check: https://github.com/sgoldenlab/simba/issues/457#issuecomment-3052631284. Thanks to Heinrich2818 (https://github.com/Heinrich2818).
- Returns
True if the current environment is an LXC Linux container, False if not.
- Return type
bool
- simba.utils.checks.is_valid_video_file(file_path: Union[str, PathLike], raise_error: bool = True)
Check if a file path is a valid video file.
This function validates that a file path exists, is readable, and can be opened as a video file using OpenCV. It performs basic video file validation by attempting to open the file with cv2.VideoCapture.
- Parameters
file_path (Union[str, os.PathLike]) - Path to the video file to validate.
raise_error (bool) - If True, raises InvalidFilepathError when file is not a valid video. If False, returns False. Default: True.
- Returns
True if the file is a valid video file, False if it's not valid and raise_error=False.
- Return type
- Raises
InvalidFilepathError - If the file is not a valid video file and raise_error=True.
- Example
>>> is_valid_video_file('/path/to/video.mp4')
True
>>> is_valid_video_file('/path/to/invalid.txt', raise_error=False)
False
>>> is_valid_video_file('/path/to/corrupted.mp4', raise_error=False)
False
- simba.utils.checks.is_video_color(video: Union[str, PathLike, VideoCapture]) bool
Determines whether a video is in color or greyscale.
- Parameters
video (Union[str, os.PathLike, cv2.VideoCapture]) - The video source, either a cv2.VideoCapture object or a path to a file on disk.
- Returns
True if the video is in color (has more than one channel), and False if the video is greyscale (single channel).
- Return type
bool
- simba.utils.checks.is_windows_path(value)
Check if the value is a valid Windows path format.
This function validates that a string follows the Windows path format by checking that it starts with a drive letter followed by a colon (e.g., 'C:', 'D:', etc.). It performs basic format validation without checking if the path actually exists on the filesystem.
- Parameters
value - The value to check for Windows path format.
- Returns
True if the value is a valid Windows path format, False otherwise.
- Return type
- Example
>>> is_windows_path(r"C:\Users\username\file.txt")
True
>>> is_windows_path(r"D:\data\folder")
True
>>> is_windows_path("/home/user/file.txt")
False
>>> is_windows_path("relative/path")
False
>>> is_windows_path("")
False
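The drive-letter test described above can be sketched with a short regular expression. The helper `has_windows_drive_prefix` is hypothetical and assumes the check is exactly "one ASCII letter followed by a colon at the start of the string"; the SimBA function may apply additional rules.

```python
import re

# A single ASCII drive letter followed by a colon, anchored at the start.
DRIVE_PREFIX = re.compile(r"^[A-Za-z]:")


def has_windows_drive_prefix(value) -> bool:
    """Hypothetical helper: True if `value` is a string starting with e.g. 'C:'."""
    return isinstance(value, str) and DRIVE_PREFIX.match(value) is not None


# Raw strings avoid backslash escape problems such as '\U' in 'C:\Users'.
print(has_windows_drive_prefix(r"C:\Users\username\file.txt"))
print(has_windows_drive_prefix("/home/user/file.txt"))
```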
- simba.utils.checks.is_wsl() bool
Check if SimBA is running in Microsoft WSL (Windows Subsystem for Linux).
This function detects whether the current environment is running inside Microsoft WSL by checking the contents of /proc/version for the presence of the string 'microsoft', which indicates a WSL environment.
- Returns
True if running in WSL, False otherwise.
- Return type
bool
- Example
>>> is_wsl()  # When running on native Linux
False
>>> is_wsl()  # When running in WSL
True
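The /proc/version check described above might look roughly like the following. The helper `version_file_mentions_microsoft` and its `version_path` parameter are hypothetical (the parameter exists only so the sketch can be tried against any file), and the case-insensitive match and the handling of a missing file are assumptions.

```python
def version_file_mentions_microsoft(version_path: str = "/proc/version") -> bool:
    """Hypothetical helper: True if the kernel version file mentions 'microsoft'."""
    try:
        with open(version_path, "r") as f:
            # Case-insensitive match, since kernel strings vary in capitalisation.
            return "microsoft" in f.read().lower()
    except OSError:
        # No readable version file (e.g. on Windows or macOS): treat as not WSL.
        return False


print(version_file_mentions_microsoft())
```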
SimBA project config creator
- class simba.utils.config_creator.ProjectConfigCreator(project_path: str, project_name: str, target_list: List[str], pose_estimation_bp_cnt: str, body_part_config_idx: int, animal_cnt: int, file_type: str = 'csv')
Create SimBA project directory tree and associated project_config.ini config file.
- Parameters
project_path (str) - Path to the directory where the SimBA project directory tree is saved.
project_name (str) - Name of the SimBA project.
target_list (List[str]) - Classifier names in the SimBA project.
pose_estimation_bp_cnt (str) - String representing the number of body-parts in the pose-estimation data used in the SimBA project. E.g., '4', '7', '8', '9', '14', '16' or 'user_defined', '3D_user_defined'.
body_part_config_idx (int) - The index of the SimBA GUI dropdown pose-estimation selection. E.g., 1. I.e., the row representing your pose-estimated body-parts in this file.
animal_cnt (int) - Number of animals tracked in the input pose-estimation data.
file_type (str) - The SimBA project file type. OPTIONS: csv or parquet.
Note
For example project_config.ini files, see https://github.com/sgoldenlab/simba/tree/master/tests/data/test_projects.
- Example
>>> _ = ProjectConfigCreator(project_path='project/path', project_name='project_name', target_list=['Attack'], pose_estimation_bp_cnt='16', body_part_config_idx=9, animal_cnt=2, file_type='csv')
Data utilities
- simba.utils.data.add_missing_ROI_cols(shape_df: DataFrame) DataFrame
Add missing ROI definitions in ROI info dataframes created by the first version of the SimBA ROI user-interface but analyzed using newer versions of SimBA.
- Parameters
shape_df (pd.DataFrame) - Dataframe holding ROI definitions.
- Return type
DataFrame
- simba.utils.data.align_target_warpaffine_vectors(centers: ndarray, target: ndarray)
Create WarpAffine vectors for placing the original center at a new target position. These are used for egocentric alignment of video.
Note
centers are returned by simba.utils.data.egocentrically_align_pose() or simba.utils.data.egocentrically_align_pose_numba(). target is the location in the image where the anchor body-part should be placed. Results are used within e.g., simba.video_processors.egocentric_video_rotator.EgocentricVideoRotator.
- simba.utils.data.animal_interpolator(df: DataFrame, animal_bp_dict: Dict[str, Any], source: Optional[str] = '', method: Optional[typing_extensions.Literal['nearest', 'linear', 'quadratic']] = 'nearest', verbose: Optional[bool] = True) DataFrame
Interpolate missing values for frames where entire animals are missing.
Note
Animals are inferred to be 'missing' when all their body-parts have exactly the same value on both the x and y plane (or None).
- Parameters
df (pd.DataFrame) - The input DataFrame containing animal body part positions.
animal_bp_dict (Dict[str, Any]) - A dictionary where keys are animal names and values are dictionaries with keys 'X_bps' and 'Y_bps', which are lists of column names for the x and y coordinates of the animal body parts.
source (Optional[str]) - An optional string indicating the source of the DataFrame, used for logging and informative error messages.
method (Optional[Literal['nearest', 'linear', 'quadratic']]) - The interpolation method to use. Options are 'nearest', 'linear', and 'quadratic'. Defaults to 'nearest'.
verbose (Optional[bool]) - If True, prints the number of missing body parts being interpolated for each animal.
- Return pd.DataFrame
The DataFrame with interpolated values for the specified animal body parts.
- Example
>>> animal_bp_dict = {'Animal_1': {'X_bps': ['Ear_left_1_x', 'Ear_right_1_x', 'Nose_1_x', 'Center_1_x', 'Lat_left_1_x', 'Lat_right_1_x', 'Tail_base_1_x'], 'Y_bps': ['Ear_left_1_y', 'Ear_right_1_y', 'Nose_1_y', 'Center_1_y', 'Lat_left_1_y', 'Lat_right_1_y', 'Tail_base_1_y']}, 'Animal_2': {'X_bps': ['Ear_left_2_x', 'Ear_right_2_x', 'Nose_2_x', 'Center_2_x', 'Lat_left_2_x', 'Lat_right_2_x', 'Tail_base_2_x'], 'Y_bps': ['Ear_left_2_y', 'Ear_right_2_y', 'Nose_2_y', 'Center_2_y', 'Lat_left_2_y', 'Lat_right_2_y', 'Tail_base_2_y']}}
>>> df = pd.read_csv('/Users/simon/Desktop/envs/simba/troubleshooting/two_black_animals_14bp/project_folder/csv/machine_results/Together_1.csv', index_col=0)
>>> interpolated_df = animal_interpolator(df=df, animal_bp_dict=animal_bp_dict, source='test')
- simba.utils.data.body_part_interpolator(df: DataFrame, animal_bp_dict: Dict[str, Any], source: Optional[str] = '', method: Optional[typing_extensions.Literal['nearest', 'linear', 'quadratic']] = 'nearest', verbose: Optional[bool] = True) DataFrame
Interpolate missing body-parts in pose-estimation data.
Note
Data is inferred to be 'missing' when data for the body-part is either 'None' on both the x- and y-plane or located at (0, 0).
- Parameters
df (pd.DataFrame) - The input DataFrame containing animal body part positions.
animal_bp_dict (Dict[str, Any]) - A dictionary where keys are animal names and values are dictionaries with keys 'X_bps' and 'Y_bps', which are lists of column names for the x and y coordinates of the animal body parts.
source (Optional[str]) - An optional string indicating the source of the DataFrame, used for logging and informative error messages.
method (Optional[Literal['nearest', 'linear', 'quadratic']]) - The interpolation method to use. Options are 'nearest', 'linear', and 'quadratic'. Defaults to 'nearest'.
verbose (Optional[bool]) - If True, prints the number of missing body parts being interpolated for each animal.
- Return pd.DataFrame
The DataFrame with interpolated values for the specified animal body parts.
- Example
>>> animal_bp_dict = {'Animal_1': {'X_bps': ['Ear_left_1_x', 'Ear_right_1_x', 'Nose_1_x', 'Center_1_x', 'Lat_left_1_x', 'Lat_right_1_x', 'Tail_base_1_x'], 'Y_bps': ['Ear_left_1_y', 'Ear_right_1_y', 'Nose_1_y', 'Center_1_y', 'Lat_left_1_y', 'Lat_right_1_y', 'Tail_base_1_y']}, 'Animal_2': {'X_bps': ['Ear_left_2_x', 'Ear_right_2_x', 'Nose_2_x', 'Center_2_x', 'Lat_left_2_x', 'Lat_right_2_x', 'Tail_base_2_x'], 'Y_bps': ['Ear_left_2_y', 'Ear_right_2_y', 'Nose_2_y', 'Center_2_y', 'Lat_left_2_y', 'Lat_right_2_y', 'Tail_base_2_y']}}
>>> df = pd.read_csv('/Users/simon/Desktop/envs/simba/troubleshooting/two_black_animals_14bp/project_folder/csv/machine_results/Together_1.csv', index_col=0)
>>> interpolated_df = body_part_interpolator(df=df, animal_bp_dict=animal_bp_dict, source='test')
- simba.utils.data.bucket_data(data: ndarray, method: typing_extensions.Literal['fd', 'doane', 'auto', 'scott', 'stone', 'rice', 'sturges', 'sqrt'] = 'auto') Tuple[float, int]
Computes the optimal bin count and bin width non-heuristically using the specified method.
- Parameters
data (np.ndarray) - 1D array of numerical data.
method (str) - The method to compute optimal bin count and bin width. These methods differ in how they estimate the optimal bin count and width. Defaults to 'auto', which represents the maximum of the Sturges and Freedman-Diaconis estimators. Available methods are 'fd', 'doane', 'auto', 'scott', 'stone', 'rice', 'sturges', 'sqrt'.
- Returns
A tuple containing the optimal bin width and bin count.
- Return type
Tuple[float, int]
- Example
>>> data = np.random.randint(low=1, high=1000, size=(1, 100))
>>> bucket_data(data=data, method='fd')
(190.8, 6)
>>> bucket_data(data=data, method='doane')
(106.0, 10)
- simba.utils.data.bucket_data_mp(data: ndarray, method: typing_extensions.Literal['fd', 'doane', 'auto', 'scott', 'stone', 'rice', 'sturges', 'sqrt'] = 'auto', n_jobs: Optional[int] = -1) Tuple[ndarray, ndarray]
Compute histogram bin edges for many inputs in parallel using CPU with Joblib.
- Parameters
data - 2D input arrays for which to calculate histogram bin edges.
method (str) - The method to compute optimal bin count and bin width. These methods differ in how they estimate the optimal bin count and width. Defaults to 'auto', which represents the maximum of the Sturges and Freedman-Diaconis estimators. Available methods are 'fd', 'doane', 'auto', 'scott', 'stone', 'rice', 'sturges', 'sqrt'.
n_jobs - Number of CPU cores to use for parallelism (-1 uses all available cores).
- Returns Tuple[float, int]
A tuple containing the optimal bin width and bin count.
- simba.utils.data.center_rotation_warpaffine_vectors(rotation_vectors: ndarray, centers: ndarray)
Create WarpAffine vectors for rotating a video around the center. These are used for egocentric alignment of video.
Note
rotation_vectors and centers are returned by simba.utils.data.egocentrically_align_pose() or simba.utils.data.egocentrically_align_pose_numba(). Results are used within e.g., simba.video_processors.egocentric_video_rotator.EgocentricVideoRotator.
- simba.utils.data.convert_roi_definitions(roi_definitions_path: Union[str, PathLike], save_dir: Union[str, PathLike]) None
Helper to convert SimBA ROI_definitions.h5 file into human-readable CSV format.
- Parameters
roi_definitions_path (Union[str, os.PathLike]) - Path to SimBA ROI_definitions.h5 on disk.
save_dir (Union[str, os.PathLike]) - Directory location where the output data should be stored.
- simba.utils.data.create_color_palette(pallete_name: str, increments: int, as_rgb_ratio: Optional[bool] = False, as_hex: Optional[bool] = False, as_int: Optional[bool] = False) List[Union[str, float]]
Create a list of colors in RGB from specified color palette.
- Parameters
pallete_name (str) - Palette name (e.g., jet).
increments (int) - Number of colors in the color palette to create.
as_rgb_ratio (Optional[bool]) - Return RGB values as ratios. Default: False
as_hex (Optional[bool]) - Return values as HEX. Default: False
as_int (Optional[bool]) - Return RGB values as integers rather than float if possible. Default: False
Note
If both as_rgb_ratio and as_hex are True, HEX values will be returned.
- Example
>>> create_color_palette(pallete_name='jet', increments=3)
[[127.5, 0.0, 0.0], [255.0, 212.5, 0.0], [0.0, 229.81481481481478, 255.0], [0.0, 0.0, 127.5]]
>>> create_color_palette(pallete_name='jet', increments=3, as_rgb_ratio=True)
[[0.5, 0.0, 0.0], [1.0, 0.8333333333333334, 0.0], [0.0, 0.9012345679012345, 1.0], [0.0, 0.0, 0.5]]
>>> create_color_palette(pallete_name='jet', increments=3, as_hex=True)
['#800000', '#ffd400', '#00e6ff', '#000080']
- simba.utils.data.create_color_palettes(no_animals: int, map_size: int, cmaps: Optional[List[str]] = None) List[List[int]]
Create a list of lists of BGR colors, one for each animal. Each list is pulled from a different matplotlib color map.
- Parameters
- Returns
BGR colors
- Return type
List[List[int]]
- Example
>>> create_color_palettes(no_animals=2, map_size=2)
[[[255.0, 0.0, 255.0], [0.0, 255.0, 255.0]], [[102.0, 127.5, 0.0], [102.0, 255.0, 255.0]]]
- simba.utils.data.detect_bouts(data_df: DataFrame, target_lst: Union[List[str], str], fps: Union[int, float]) DataFrame
Detect behavior 'bouts' (i.e., continuous sequences of classified behavior-present frames) for specified classifiers.
Note
target_lst can name any field of Boolean type. E.g., target_lst = ['Inside_ROI_1'] also works for detecting bouts inside an ROI shape.
See also
For multi-class Boolean classifiers, see simba.utils.data.detect_bouts_multiclass().
- Parameters
- Returns
Dataframe where bouts are represented by rows and fields are represented by 'Event type', 'Start time', 'End time', 'Start frame', 'End frame', 'Bout time'
- Return type
pd.DataFrame
- Example
>>> data_df = read_df(file_path='tests/data/test_projects/two_c57/project_folder/csv/machine_results/Together_1.csv', file_type='csv')
>>> detect_bouts(data_df=data_df, target_lst=['Attack', 'Sniffing'], fps=30)
   'Event'     'Start_time'  'End Time'  'Start_frame'  'End_frame'  'Bout_time'
0  'Attack'    5.03          5.33        151            159          0.30
1  'Attack'    5.87          6.23        176            186          0.37
2  'Sniffing'  3.47          3.83        104            114          0.37
- simba.utils.data.detect_bouts_multiclass(data: DataFrame, target: str, fps: int = 1, classifier_map: Optional[Dict[int, str]] = None) DataFrame
Detect bouts in a multiclass time series dataset and return the bout event types, their start times, end times and duration.
See also
For single class Boolean classifiers, see simba.utils.data.detect_bouts().
- Parameters
data (pd.DataFrame) - A Pandas DataFrame containing multiclass time series data.
target (str) - Name of the target column in data.
fps (int) - Frames per second of the video used to collect data. Default is 1.
classifier_map (Dict[int, str]) - A dictionary mapping class labels to their names. Used to replace numeric labels with descriptive names. If None, then numeric event labels are kept.
- Returns
Dataframe where bouts are represented by rows and fields are represented by 'Event type', 'Start time', 'End time', 'Start frame', 'End frame', 'Bout time'
- Return type
pd.DataFrame
- Example
>>> df = pd.DataFrame({'value': [0, 0, 0, 2, 2, 1, 1, 1, 3, 3]})
>>> detect_bouts_multiclass(data=df, target='value', fps=3, classifier_map={0: 'None', 1: 'sharp', 2: 'track', 3: 'sync'})
   'Event'  'Start_time'  'End_time'  'Start_frame'  'End_frame'  'Bout_time'
0  'None'   0.000000      1.000000    0.0            2.0          1.000000
1  'sharp'  1.666667      2.666667    5.0            7.0          1.000000
2  'track'  1.000000      1.666667    3.0            4.0          0.666667
3  'sync'   2.666667      3.333333    8.0            9.0          0.666667
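The idea of a bout (a run of consecutive frames sharing one label) can be sketched in plain Python with `itertools.groupby`. The function `find_bouts` is a hypothetical illustration, not the SimBA implementation. It assumes start time is start frame / fps and end time is (end frame + 1) / fps, which is consistent with the numbers in the example above, and it returns bouts in chronological order rather than grouped by event.

```python
from itertools import groupby
from typing import Dict, List, Optional, Sequence


def find_bouts(labels: Sequence[int],
               fps: float = 1,
               classifier_map: Optional[Dict[int, str]] = None) -> List[dict]:
    """Hypothetical helper: one dict per run of identical consecutive labels."""
    bouts = []
    frame = 0
    for label, run in groupby(labels):
        run_length = len(list(run))
        start_frame, end_frame = frame, frame + run_length - 1
        bouts.append({
            # Replace numeric labels with names when a mapping is supplied.
            "Event": classifier_map.get(label, label) if classifier_map else label,
            "Start_time": start_frame / fps,
            # Assumption: a bout ends when its last frame has fully elapsed.
            "End_time": (end_frame + 1) / fps,
            "Start_frame": start_frame,
            "End_frame": end_frame,
            "Bout_time": run_length / fps,
        })
        frame += run_length
    return bouts


bouts = find_bouts([0, 0, 0, 2, 2, 1, 1, 1, 3, 3], fps=3,
                   classifier_map={0: "None", 1: "sharp", 2: "track", 3: "sync"})
for bout in bouts:
    print(bout)
```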
- simba.utils.data.df_smoother(data: DataFrame, fps: float, time_window: int, source: Optional[str] = '', std: Optional[int] = 5, method: Optional[typing_extensions.Literal['bartlett', 'blackman', 'boxcar', 'cosine', 'gaussian', 'hamming', 'exponential']] = 'gaussian') -> DataFrame
Smooth the data in a DataFrame using a specified window function.
This function applies a rolling window smoothing operation to the data in the DataFrame. The type of window function and the standard deviation for the smoothing can be specified. The window size is determined based on the frame rate per second (fps) and the time window.
See also
For low-pass Fourier smoothing, see simba.utils.data.fft_lowpass_filter(). For Savitzky-Golay smoothing, see simba.utils.data.savgol_smoother().
- Parameters
data (pd.DataFrame) – The input data to be smoothed.
fps (float) – The frame rate per second of the data.
time_window (int) – The time window in milliseconds over which to apply the smoothing.
source (Optional[str]) – An optional string indicating the source of the data, used for logging and informative error messages.
std (Optional[int]) – The standard deviation for the window function, used when the method is 'gaussian'.
method (Optional[Literal['bartlett', 'blackman', 'boxcar', 'cosine', 'gaussian', 'hamming', 'exponential']]) – The type of window function to use for smoothing. Default 'gaussian'.
- Return pd.DataFrame
The smoothed DataFrame.
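To illustrate how a time window in milliseconds can translate into a rolling Gaussian window, here is a minimal NumPy sketch for a single 1D series. It is not SimBA's implementation: the function name, the frame-count formula `int(time_window / (1000 / fps))`, and the edge padding are all assumptions made for the example.

```python
import numpy as np


def gaussian_smooth_sketch(x, fps, time_window, std=5.0):
    """Gaussian-weighted moving average of a 1D series (illustrative only)."""
    # Assumption: convert the time window (ms) to a whole number of frames.
    window = max(1, int(time_window / (1000.0 / fps)))
    # Gaussian weights centred on the middle of the window, summing to 1.
    offsets = np.arange(window) - (window - 1) / 2.0
    weights = np.exp(-0.5 * (offsets / std) ** 2)
    weights /= weights.sum()
    # Pad with edge values so the output has the same length as the input.
    pad_left = window // 2
    pad_right = window - 1 - pad_left
    padded = np.pad(np.asarray(x, dtype=float), (pad_left, pad_right), mode="edge")
    return np.convolve(padded, weights, mode="valid")


spike = np.zeros(30)
spike[15] = 1.0
smoothed = gaussian_smooth_sketch(spike, fps=10, time_window=500, std=5.0)
```

With fps=10 and a 500 ms window the sketch averages over 5 frames, so the single-frame spike is spread across its neighbours while the total signal is preserved.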
- simba.utils.data.egocentric_frm_rotator(frames: ndarray, rotation_matrices: ndarray, interpolate: Optional[bool] = True) -> ndarray
Rotates a sequence of frames egocentrically using the provided rotation matrices, with acceleration through numba JIT.
Applies a geometric transformation to each frame in the input sequence based on its corresponding rotation matrix. The transformation includes rotation and translation, followed by bilinear interpolation to map pixel values from the source frame to the output frame.
Note
To create rotation matrices, see simba.utils.data.center_rotation_warpaffine_vectors() and simba.utils.data.align_target_warpaffine_vectors().
- Parameters
frames (np.ndarray) – A 4D array of shape (N, H, W, C)
rotation_matrices (np.ndarray) – A 3D array of shape (N, 3, 3), where each 3x3 matrix represents an affine transformation for a corresponding frame. The matrix should include rotation and translation components.
- Returns
A 4D array of shape (N, H, W, C), representing the warped frames after applying the transformations. The shape matches the input frames.
- Return type
np.ndarray
- Example
>>> DATA_PATH = r"/mnt/c/Users/sroni/OneDrive/Desktop/rotate_ex/data/501_MA142_Gi_Saline_0513.csv"
>>> VIDEO_PATH = r"/mnt/c/Users/sroni/OneDrive/Desktop/rotate_ex/videos/501_MA142_Gi_Saline_0513.mp4"
>>> SAVE_PATH = r"/mnt/c/Users/sroni/OneDrive/Desktop/rotate_ex/videos/501_MA142_Gi_Saline_0513_rotated.mp4"
>>> ANCHOR_LOC = np.array([300, 300])
>>>
>>> df = read_df(file_path=DATA_PATH, file_type='csv')
>>> bp_cols = [x for x in df.columns if not x.endswith('_p')]
>>> data = df[bp_cols].values.reshape(len(df), int(len(bp_cols)/2), 2).astype(np.int64)
>>> data, centers, rotation_matrices = egocentrically_align_pose(data=data, anchor_1_idx=6, anchor_2_idx=2, anchor_location=ANCHOR_LOC, direction=180)
>>> imgs = read_img_batch_from_video_gpu(video_path=VIDEO_PATH, start_frm=0, end_frm=100)
>>> imgs = np.stack(list(imgs.values()), axis=0)
>>>
>>> rot_matrices_center = center_rotation_warpaffine_vectors(rotation_vectors=rotation_matrices, centers=centers)
>>> rot_matrices_align = align_target_warpaffine_vectors(centers=centers, target=ANCHOR_LOC)
>>>
>>> imgs_centered = egocentric_frm_rotator(frames=imgs, rotation_matrices=rot_matrices_center)
>>> imgs_out = egocentric_frm_rotator(frames=imgs_centered, rotation_matrices=rot_matrices_align)
- simba.utils.data.egocentrically_align_pose(data: ndarray, anchor_1_idx: int, anchor_2_idx: int, anchor_location: ndarray, direction: int) -> Tuple[ndarray, ndarray, ndarray]
Aligns a set of 2D points egocentrically based on two anchor points and a target direction.
Rotates and translates a 3D array of 2D points (e.g., time-series of frame-wise data) such that one anchor point is aligned to a specified location, and the direction between the two anchors is aligned to a target angle.
See also
For numba acceleration, see simba.utils.data.egocentrically_align_pose_numba(). To align both pose and video, see simba.data_processors.egocentric_aligner.EgocentricalAligner(). To egocentrically rotate video, see simba.video_processors.egocentric_video_rotator.EgocentricVideoRotator().
- Parameters
data (np.ndarray) – A 3D array of shape (num_frames, num_points, 2) containing 2D points for each frame. Each frame is represented as a 2D array of shape (num_points, 2), where each row corresponds to a point's (x, y) coordinates.
anchor_1_idx (int) – The index of the first anchor point in data used as the center of alignment. This body-part will be placed in the center of the image.
anchor_2_idx (int) – The index of the second anchor point in data used to calculate the direction vector. This body-part will be located direction degrees from the anchor_1 body-part.
direction (int) – The target direction in degrees to which the vector between the two anchors will be aligned.
anchor_location (np.ndarray) – A 1D array of shape (2,) specifying the target (x, y) location for anchor_1_idx after alignment.
- Returns
A tuple containing the rotated data, and variables required for also rotating the video using the same rules:
- aligned_data: A 3D array of shape (num_frames, num_points, 2) with the aligned 2D points.
- centers: A 2D array of shape (num_frames, 2) containing the original locations of anchor_1_idx in each frame before alignment.
- rotation_vectors: A 3D array of shape (num_frames, 2, 2) containing the rotation matrices applied to each frame.
- Return type
Tuple[np.ndarray, np.ndarray, np.ndarray]
- Example
>>> data = np.random.randint(0, 500, (100, 7, 2))
>>> anchor_1_idx = 5 # E.g., the animal tail-base is the body-part at index 5
>>> anchor_2_idx = 6 # E.g., the animal nose is the body-part at index 6 (the last of the 7 body-parts)
>>> anchor_location = np.array([250, 250]) # the tail-base (index 5) is placed at x=250, y=250 in the image.
>>> direction = 90 # The nose (index 6) will be placed in direction 90 degrees (S) relative to the tailbase.
>>> results, centers, rotation_vectors = egocentrically_align_pose(data=data, anchor_1_idx=anchor_1_idx, anchor_2_idx=anchor_2_idx, anchor_location=anchor_location, direction=direction)
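The rotate-then-translate idea can be sketched in plain NumPy as below. This is an illustrative sketch rather than the SimBA implementation: the function name `align_pose_sketch` is hypothetical, and the angle convention (degrees measured with `arctan2(dy, dx)`, so that 90 degrees points towards increasing y, i.e. downwards in image coordinates) is an assumption.

```python
import numpy as np


def align_pose_sketch(data, anchor_1_idx, anchor_2_idx, anchor_location, direction):
    """Rotate and translate each frame so anchor_1 sits at anchor_location and
    the anchor_1 -> anchor_2 vector points at `direction` degrees."""
    data = np.asarray(data, dtype=float)
    target = np.deg2rad(direction)
    aligned = np.empty_like(data)
    centers = data[:, anchor_1_idx, :].copy()  # anchor_1 positions before alignment
    rotations = np.empty((data.shape[0], 2, 2))
    for i, frame in enumerate(data):
        dx, dy = frame[anchor_2_idx] - frame[anchor_1_idx]
        # Rotation needed to move the current anchor vector onto the target angle.
        theta = target - np.arctan2(dy, dx)
        c, s = np.cos(theta), np.sin(theta)
        rotations[i] = np.array([[c, -s], [s, c]])
        # Centre on anchor_1, rotate, then translate to the requested location.
        aligned[i] = (frame - centers[i]) @ rotations[i].T + anchor_location
    return aligned, centers, rotations


rng = np.random.default_rng(0)
data = rng.uniform(0, 500, (100, 7, 2))
aligned, centers, rotations = align_pose_sketch(
    data, anchor_1_idx=5, anchor_2_idx=6,
    anchor_location=np.array([250.0, 250.0]), direction=90)
```

Because the transform is a pure rotation plus translation, distances between body-parts are unchanged; only position and heading differ from the input.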
- simba.utils.data.egocentrically_align_pose_numba(data: ndarray, anchor_1_idx: int, anchor_2_idx: int, direction: int, anchor_location: ndarray) -> Tuple[ndarray, ndarray, ndarray]
Aligns a set of 2D points egocentrically based on two anchor points and a target direction.
Rotates and translates a 3D array of 2D points (e.g., time-series of frame-wise data) such that one anchor point is aligned to a specified location, and the direction between the two anchors is aligned to a target angle.
EXPECTED RUNTIMES
FRAMES (MILLIONS)   NUMBA TIME (S)   NUMBA TIME (STDEV)   NUMPY TIME (S)   NUMPY TIME (STDEV)
1                   0.733            0.006                10.138           0.459
2                   1.474            0.004                16.894           0.264
4                   2.969            0.032                33.813           0.371
8                   5.991            0.061                73.434           0.526
16                  12.123           0.215                134.028          0.858
32                  23.844           0.105                270.435          1.379
64                  48.296           0.034                540.896          1.781
7 BODY-PARTS PER FRAME
3 ITERATIONS
See also
For numpy function, see simba.utils.data.egocentrically_align_pose(). To align both pose and video, see simba.data_processors.egocentric_aligner.EgocentricalAligner(). To egocentrically rotate video, see simba.video_processors.egocentric_video_rotator.EgocentricVideoRotator().
- Parameters
data (np.ndarray) – A 3D array of shape (num_frames, num_points, 2) containing 2D points for each frame. Each frame is represented as a 2D array of shape (num_points, 2), where each row corresponds to a point's (x, y) coordinates.
anchor_1_idx (int) – The index of the first anchor point in data used as the center of alignment. This body-part will be placed in the center of the image.
anchor_2_idx (int) – The index of the second anchor point in data used to calculate the direction vector. This body-part will be located direction degrees from the anchor_1 body-part.
direction (int) – The target direction in degrees to which the vector between the two anchors will be aligned.
anchor_location (np.ndarray) – A 1D array of shape (2,) specifying the target (x, y) location for anchor_1_idx after alignment.
- Returns
A tuple containing the rotated data, and variables required for also rotating the video using the same rules:
- aligned_data: A 3D array of shape (num_frames, num_points, 2) with the aligned 2D points.
- centers: A 2D array of shape (num_frames, 2) containing the original locations of anchor_1_idx in each frame before alignment.
- rotation_vectors: A 3D array of shape (num_frames, 2, 2) containing the rotation matrices applied to each frame.
- Return type
Tuple[np.ndarray, np.ndarray, np.ndarray]
- Example
>>> data = np.random.randint(0, 500, (100, 7, 2))
>>> anchor_1_idx = 5 # E.g., the animal tail-base is the body-part at index 5
>>> anchor_2_idx = 6 # E.g., the animal nose is the body-part at index 6 (the last of the 7 body-parts)
>>> anchor_location = np.array([250, 250]) # the tail-base (index 5) is placed at x=250, y=250 in the image.
>>> direction = 90 # The nose (index 6) will be placed in direction 90 degrees (S) relative to the tailbase.
>>> results, centers, rotation_vectors = egocentrically_align_pose_numba(data=data, anchor_1_idx=anchor_1_idx, anchor_2_idx=anchor_2_idx, anchor_location=anchor_location, direction=direction)
- simba.utils.data.fast_mean_rank(data: ndarray, descending: Optional[bool] = True) -> ndarray
Jitted helper to rank values in 1D array using mean method.
See also
- Parameters
data (np.ndarray) – 1D array of feature values.
descending (bool) – If True, ranks returned where low values get a high rank. If False, low values get a low rank. Default: True.
- Returns
1D array with the ranked indices of the data values.
- Return type
np.ndarray
- References
- Example
>>> data = np.array([1, 1, 3, 4, 5, 6, 7, 8, 9, 10])
>>> fast_mean_rank(data=data, descending=True)
[9.5, 9.5, 8. , 7. , 6. , 5. , 4. , 3. , 2. , 1. ]
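Mean ranking gives tied values the average of the rank positions they occupy. The sketch below shows that rule in plain NumPy; the function name `mean_rank_sketch` is hypothetical and the code is not the jitted SimBA helper.

```python
import numpy as np


def mean_rank_sketch(data, descending=True):
    """Rank a 1D array; ties share the mean of their 1-based rank positions."""
    data = np.asarray(data, dtype=float)
    # Negate so that the largest value sorts first when descending.
    keys = -data if descending else data
    order = np.argsort(keys, kind="stable")
    sorted_keys = keys[order]
    ranks = np.empty(len(data), dtype=float)
    start = 0
    while start < len(data):
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(data) and sorted_keys[end + 1] == sorted_keys[start]:
            end += 1
        # 0-based positions start..end hold 1-based ranks start+1..end+1.
        ranks[order[start:end + 1]] = (start + end + 2) / 2.0
        start = end + 1
    return ranks


ranks = mean_rank_sketch(np.array([1, 1, 3, 4, 5, 6, 7, 8, 9, 10]), descending=True)
```

For the documented input, the two tied values of 1 occupy ranks 9 and 10 and therefore both receive 9.5.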
- simba.utils.data.fast_minimum_rank(data: ndarray, descending: Optional[bool] = True) -> ndarray
Jitted helper to rank values in 1D array using minimum method.
See also
- Parameters
data (np.ndarray) – 1D array of feature values.
descending (bool) – If True, ranks returned where low values get a high rank. If False, low values get a low rank. Default: True.
- Returns
1D array with the ranked indices of the data values.
- Return type
np.ndarray
- References
- Example
>>> data = np.array([1, 1, 3, 4, 5, 6, 7, 8, 9, 10])
>>> fast_minimum_rank(data=data, descending=True)
[9, 9, 8, 7, 6, 5, 4, 3, 2, 1]
>>> fast_minimum_rank(data=data, descending=False)
[ 1, 1, 3, 4, 5, 6, 7, 8, 9, 10]
- simba.utils.data.fft_lowpass_filter(data: ndarray, cut_off: float = 0.1) -> ndarray
Apply FFT-based lowpass filter to 1D or 2D data.
See also
For Savitzky-Golay smoothing, see simba.utils.data.savgol_smoother(). For 'bartlett', 'blackman', 'boxcar', 'cosine', 'gaussian', 'hamming', 'exponential' smoothing, see simba.utils.data.df_smoother().
- Parameters
data (np.ndarray) – Input data array (1D or 2D)
cut_off (float) – Cutoff frequency as fraction of Nyquist frequency (0 < cut_off < 1)
- Return np.ndarray
Filtered data with same shape and dtype as input
- Example
>>> from simba.utils.read_write import read_df
>>> IN_PATH = r"C:/troubleshooting/RAT_NOR/project_folder/csv/outlier_corrected_movement_location/2022-06-20_NOB_DOT_4.csv"
>>> OUT_PATH = r"C:/troubleshooting/RAT_NOR/project_folder/csv/outlier_corrected_movement_location/2022-06-20_NOB_DOT_4_filtered.csv"
>>> df = read_df(file_path=IN_PATH)
>>> data = df.values
>>> x = fft_lowpass_filter(data=data, cut_off=0.1)
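A cutoff expressed as a fraction of the Nyquist frequency can be applied by zeroing FFT bins above it. The sketch below is a minimal NumPy version under stated assumptions: the function name is hypothetical, 2D input is filtered column by column along axis 0, and a hard (brick-wall) cutoff is used; SimBA's implementation may differ.

```python
import numpy as np


def fft_lowpass_sketch(data, cut_off=0.1):
    """Zero all frequency bins above cut_off * Nyquist (illustrative only)."""
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    spectrum = np.fft.rfft(data, axis=0)
    # rfftfreq is in cycles/sample with Nyquist at 0.5; rescale to [0, 1].
    freqs = np.fft.rfftfreq(n) / 0.5
    spectrum[freqs > cut_off] = 0
    return np.fft.irfft(spectrum, n=n, axis=0)


t = np.arange(200)
slow = np.sin(2 * np.pi * 2 * t / 200)    # 0.02 of Nyquist, should survive
fast = np.sin(2 * np.pi * 40 * t / 200)   # 0.40 of Nyquist, should be removed
filtered = fft_lowpass_sketch(slow + fast, cut_off=0.1)
```

Here the fast component lies above the cutoff and is removed, leaving the slow sine.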
- simba.utils.data.find_bins(data: Dict[str, List[int]], bracket_type: typing_extensions.Literal['QUANTILE', 'QUANTIZE'], bracket_cnt: int, normalization_method: typing_extensions.Literal['ALL VIDEOS', 'BY VIDEO']) -> Dict[str, ndarray]
Helper to find bin cut-off points.
- Parameters
data (dict) – Dictionary with video names as keys and list of values of size len(frames).
bracket_type (Literal[str]) – 'QUANTILE' or 'QUANTIZE'
bracket_cnt (int) – Number of bins.
normalization_method (str) – Create bins based on data in all videos ('ALL VIDEOS') or create different bins per video ('BY VIDEO')
- Returns dict
The videos as keys and bin cut-off points as array of size bracket_cnt x 2.
- simba.utils.data.find_frame_numbers_from_time_stamp(start_time: str, end_time: str, fps: int) -> List[int]
Given start and end timestamps in HH:MM:SS format and the fps, return the frame numbers representing the time period.
Note
For the converse (find start and end timestamps in HH:MM:SS format from frame numbers), use simba.utils.read_write.find_time_stamp_from_frame_numbers().
- Parameters
- Returns
Frame numbers within the period.
- Return type
List[int]
- Example
>>> find_frame_numbers_from_time_stamp(start_time='00:00:00', end_time='00:00:01', fps=10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
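The conversion amounts to turning each timestamp into seconds, multiplying by the fps, and listing every frame in between. A minimal sketch follows; the function name is hypothetical, and the inclusive end frame is an assumption taken from the example output above.

```python
def frames_from_timestamps_sketch(start_time, end_time, fps):
    """Return the frame numbers between two HH:MM:SS timestamps (inclusive)."""

    def to_seconds(stamp):
        hours, minutes, seconds = (int(part) for part in stamp.split(":"))
        return hours * 3600 + minutes * 60 + seconds

    start_frame = int(to_seconds(start_time) * fps)
    end_frame = int(to_seconds(end_time) * fps)
    # The end frame is included, matching the documented example.
    return list(range(start_frame, end_frame + 1))


frames = frames_from_timestamps_sketch("00:00:00", "00:00:01", fps=10)
```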
- simba.utils.data.find_ranked_colors(data: Dict[Any, float], palette: str, as_hex: bool = False, as_rgb_ratio: bool = False, reverse: bool = True) -> Dict[str, Union[Tuple[int], str]]
Find ranked colors for a given data dictionary values based on a specified color palette.
The key with the highest value in the data dictionary is assigned the most intense palette color, while the key with the lowest value in the data dictionary is assigned the least intense palette color.
- Parameters
data – A dictionary where keys are labels and values are numerical scores.
palette – A string representing the name of the color palette to use (e.g., 'magma').
as_hex – If True, return colors in hexadecimal format; if False, return as RGB tuples. Default is False.
- Returns
A dictionary where keys are labels and values are corresponding colors based on ranking.
- Return type
- Examples
>>> data = {'Animal_1': 0.34786870380536705, 'Animal_2': 0.4307923198152757, 'Animal_3': 0.221338976379357}
>>> find_ranked_colors(data=data, palette='magma', as_hex=True)
{'Animal_2': '#040000', 'Animal_1': '#7937b7', 'Animal_3': '#bffdfc'}
- simba.utils.data.freedman_diaconis(data: ndarray) -> Tuple[float, int]
Use Freedman-Diaconis rule to compute optimal count of histogram bins and their width.
Note
Can also use simba.utils.data.bucket_data passing method fd.
- Parameters
data (np.ndarray) – 1d array with values to compute optimal bins for.
- Returns
Tuple representing the optimal count of histogram bins and their width.
- Return type
- References
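For orientation, the Freedman-Diaconis rule sets the bin width to 2 * IQR / n^(1/3) and derives the bin count from the data range. The NumPy sketch below illustrates the rule only; the function name is hypothetical and the (width, count) return order is an assumption based on the `Tuple[float, int]` annotation.

```python
import numpy as np


def freedman_diaconis_sketch(data):
    """Bin width and bin count from the Freedman-Diaconis rule."""
    data = np.asarray(data, dtype=float)
    q75, q25 = np.percentile(data, [75, 25])
    iqr = q75 - q25
    # Width shrinks with the cube root of the sample size.
    # Note: an IQR of zero would give a zero width; not handled in this sketch.
    bin_width = 2.0 * iqr / np.cbrt(data.size)
    bin_count = int(np.ceil((data.max() - data.min()) / bin_width))
    return bin_width, bin_count


bin_width, bin_count = freedman_diaconis_sketch(np.arange(100))
```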
- simba.utils.data.get_confusion_matrix(x: ndarray, y: ndarray) -> ndarray
Compute a confusion matrix.
Note
Adapted from mucunwuxian's Stack Overflow answer: https://stackoverflow.com/a/67747070
- Parameters
x (np.ndarray) – Predicted cluster labels (1D array of integers).
y (np.ndarray) – Ground truth class labels (1D array of integers, same length as x).
- Returns
A 2D confusion matrix of shape (n_labels, n_labels), where entry (i, j) is the number of times label i in x coincided with label j in y.
- Return type
np.ndarray
- Example
>>> x = np.random.randint(0, 5, (100000,))
>>> y = np.random.randint(0, 5, (100000,))
>>> c = get_confusion_matrix(x=x, y=y)
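The counting described under Returns can be sketched with an unbuffered NumPy accumulation. This is an illustration, not the adapted Stack Overflow code: the function name is hypothetical, and labels are assumed to be non-negative integers starting at 0.

```python
import numpy as np


def confusion_matrix_sketch(x, y):
    """Entry (i, j) counts how often label i in x coincides with label j in y."""
    x = np.asarray(x)
    y = np.asarray(y)
    n_labels = int(max(x.max(), y.max())) + 1
    matrix = np.zeros((n_labels, n_labels), dtype=np.int64)
    # np.add.at accumulates correctly when the same (i, j) pair repeats.
    np.add.at(matrix, (x, y), 1)
    return matrix


c = confusion_matrix_sketch(x=[0, 0, 1, 2, 2], y=[0, 1, 1, 2, 0])
```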
- simba.utils.data.get_cpu_pool(core_cnt: int = -1, maxtasksperchild: int = 8000, context: Optional[typing_extensions.Literal['fork', 'spawn', 'forkserver']] = None, verbose: bool = True, source: Optional[str] = None) -> Pool
Creates and returns a multiprocessing.Pool instance with platform-appropriate defaults and validation.
- Parameters
core_cnt (int) – Number of worker processes. -1 uses all available cores. Default: -1.
maxtasksperchild (int) – Maximum number of tasks a worker process can complete before being replaced. Default: 8000 (from Defaults.MAXIMUM_MAX_TASK_PER_CHILD).
context (Optional[Literal['fork', 'spawn', 'forkserver']]) – Multiprocessing start method. None uses platform default. Default: None.
verbose (bool) – If True, prints pool creation message with timestamp. Default: True.
source (Optional[str]) – Optional identifier string for logging purposes (e.g., 'VideoProcessor'). Default: None.
- Returns
Configured multiprocessing.Pool instance.
- Return type
multiprocessing.Pool
- Example
>>> pool = get_cpu_pool(core_cnt=4, source='FeatureExtractor')
>>> pool = get_cpu_pool(core_cnt=-1, context='spawn', verbose=True)
>>> pool = get_cpu_pool(core_cnt=8, maxtasksperchild=100, source='VideoProcessor')
- simba.utils.data.get_library_version(library_name: str, raise_error: bool = False) -> Union[str, bool]
Get the installed version of a package in the Python environment.
- Parameters
library_name (str) – Name of library.
- Return str
Library version name, if installed
- Example
>>> get_library_version(library_name='sklearn')
0.22.2
- simba.utils.data.get_mode(x: ndarray) -> Union[float, int]
Get the mode (most frequent value) within an array.
- Parameters
x (np.ndarray) – 1d array of numerics.
- Returns
The mode of x.
- Rtype Union[float, int]
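As a point of reference, the mode of a 1D array can be found by counting unique values. The sketch below uses a hypothetical function name; returning the smallest value on ties is an assumption, since the tie-breaking rule is not documented here.

```python
import numpy as np


def mode_sketch(x):
    """Most frequent value in a 1D array; ties resolve to the smallest value."""
    values, counts = np.unique(np.asarray(x), return_counts=True)
    # np.unique returns sorted values and argmax picks the first maximum.
    return values[np.argmax(counts)]


most_common = mode_sketch(np.array([1, 2, 2, 3, 3, 3]))
```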
- simba.utils.data.hist_1d_mp(data: ndarray, bin_counts: ndarray, bin_widths: ndarray, normalize: Optional[bool] = False) -> List
Jitted helper to compute 1D histograms with counts or ratios (if normalize is True) for a 2D dataset.
Note
For non-heuristic rules for bin counts and bin ranges, see simba.utils.data.freedman_diaconis or simba.utils.data.bucket_data. For computing a single 1D histogram from 1d data, use hist_1d.
- Parameters
data (np.ndarray) – 2d array containing feature values. The data in each row will be binned separately.
bin_count (int) – The number of bins.
range (np.ndarray) – 1d array with two values representing minimum and maximum value to bin.
normalize (Optional[bool]) – If True, then the counts are returned as a ratio of all values. If False, then the raw counts. Pass normalize as True if the datasets have unequal counts. Default: False.
- Returns
A numba list of list of same size as data.shape[0]
- Return type
typed.List
- Example
>>> data = np.random.randint(0, 100, (900, 300))
>>> bin_counts, bin_widths = bucket_data_mp(data=data)
>>> r = hist_1d_mp(data=data, bin_counts=bin_counts, bin_widths=bin_widths, normalize=True)
- simba.utils.data.interpolate_color_palette(start_color: Tuple[int, int, int], end_color: Tuple[int, int, int], n: Optional[int] = 10) -> List[Tuple[int, int, int]]
Generate a list of colors interpolated between two passed RGB colors.
- Parameters
start_color – Tuple of RGB values for the start color.
end_color – Tuple of RGB values for the end color.
n – Number of colors to generate.
- Returns
List of interpolated RGB colors.
- Return type
- Example
>>> red, black = (255, 0, 0), (0, 0, 0)
>>> colors = interpolate_color_palette(start_color=red, end_color=black, n = 10)
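Interpolating between two RGB colors can be sketched as a per-channel linear blend. The function name below is hypothetical, and including both endpoints and rounding to the nearest integer are assumptions, so the values may differ slightly from SimBA's output.

```python
def interpolate_palette_sketch(start_color, end_color, n=10):
    """Linearly blend two RGB tuples into n colors, endpoints included."""
    if n < 2:
        return [tuple(start_color)]
    colors = []
    for i in range(n):
        t = i / (n - 1)  # 0.0 at the start color, 1.0 at the end color
        colors.append(tuple(int(round(s + (e - s) * t))
                            for s, e in zip(start_color, end_color)))
    return colors


red, black = (255, 0, 0), (0, 0, 0)
colors = interpolate_palette_sketch(start_color=red, end_color=black, n=6)
```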
- simba.utils.data.plug_holes_shortest_bout(data_df: DataFrame, clf_name: str, fps: float, shortest_bout: int) -> DataFrame
Removes behavior 'bouts' that are shorter than the minimum user-specified length within a dataframe.
Note
In the initial step the function looks for behavior 'interruptions' that are the length of the shortest_bout or shorter. I.e., these are 0 sequences that are the length of the shortest_bout or shorter with trailing and leading 1s. These interruptions are filled with 1s. Next, the behavioral bouts shorter than the shortest_bout are removed. These operations are performed as they help preserve longer sequences of the desired behavior, ensuring they aren't fragmented by brief interruptions.
- Parameters
- Returns
Dataframe where behavior bouts with invalid lengths have been removed (< shortest_bout)
- Return type
pd.DataFrame
- Example
>>> data_df = pd.DataFrame(data=[1, 0, 1, 1, 1], columns=['target'])
>>> plug_holes_shortest_bout(data_df=data_df, clf_name='target', fps=10, shortest_bout=2000)
   target
0       1
1       1
2       1
3       1
4       1
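The two-step procedure in the note can be sketched on a plain list of 0/1 values. This is an illustration under assumptions, not SimBA's code: the function name is hypothetical, `shortest_bout` is taken to be in milliseconds, and runs touching the start or end of the series are left untouched (which is what makes the documented example return all 1s).

```python
from itertools import groupby


def plug_holes_sketch(values, fps, shortest_bout):
    """Fill short 0-gaps between bouts, then drop bouts that are still short."""
    min_frames = int(fps * shortest_bout / 1000)

    def runs(seq):
        # (value, start index, length) for each run of identical values.
        out, start = [], 0
        for value, group in groupby(seq):
            length = len(list(group))
            out.append((value, start, length))
            start += length
        return out

    result = list(values)
    n = len(result)
    # Step 1: fill interruptions (0-runs flanked by 1s) of min_frames or fewer.
    for value, start, length in runs(result):
        interior = start > 0 and start + length < n
        if value == 0 and interior and length <= min_frames:
            result[start:start + length] = [1] * length
    # Step 2: remove bouts (1-runs flanked by 0s) shorter than min_frames.
    for value, start, length in runs(result):
        interior = start > 0 and start + length < n
        if value == 1 and interior and length < min_frames:
            result[start:start + length] = [0] * length
    return result


plugged = plug_holes_sketch([1, 0, 1, 1, 1], fps=10, shortest_bout=2000)
cleaned = plug_holes_sketch([0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0], fps=10, shortest_bout=300)
```

In the second call the threshold is 3 frames: the 4-frame gap is too long to fill, the 1-frame bout is removed, and the 4-frame bout is kept.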
- simba.utils.data.resample_geometry_vertices(vertices: Union[List[ndarray], ndarray], vertice_cnt: int) -> ndarray
Resample geometry vertices to a specified number of vertices in each polygon.
This function takes a list or a single array of 2D coordinates representing the vertices of polygons and resamples each polygon to have exactly vertice_cnt vertices. The resampling is done by interpolating the distances between consecutive vertices and then uniformly distributing the requested number of vertices along the perimeter of each polygon.
- Parameters
vertices (Union[List[np.ndarray], np.ndarray]) – A list of 2D coordinate arrays or a single 3D array representing the vertices of polygons. Each 2D array should have shape (n, 2), where n is the number of vertices.
vertice_cnt (int) – The target number of vertices for resampling in each polygon. This value should be at least 3.
- Returns
A 3D array of shape (len(vertices), vertice_cnt, 2), where each 2D array in the result contains the resampled vertices of the corresponding polygon.
- Return type
np.ndarray
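Distributing vertices uniformly along a perimeter can be sketched for a single polygon with cumulative edge lengths and linear interpolation. The function name is hypothetical, and treating the polygon as closed (last vertex connects back to the first) is an assumption; SimBA's handling may differ.

```python
import numpy as np


def resample_polygon_sketch(vertices, vertice_cnt):
    """Place vertice_cnt points at equal arc-length steps around one polygon."""
    pts = np.asarray(vertices, dtype=float)
    closed = np.vstack([pts, pts[:1]])  # append first vertex to close the ring
    seg_lengths = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    cumulative = np.concatenate([[0.0], np.cumsum(seg_lengths)])
    # Equally spaced distances along the perimeter, excluding the repeated end.
    targets = np.linspace(0.0, cumulative[-1], vertice_cnt, endpoint=False)
    x = np.interp(targets, cumulative, closed[:, 0])
    y = np.interp(targets, cumulative, closed[:, 1])
    return np.column_stack([x, y])


square = np.array([[0, 0], [4, 0], [4, 4], [0, 4]])
resampled = resample_polygon_sketch(square, vertice_cnt=8)
```

For the 4 x 4 square, the 8 resampled vertices fall on the corners and edge midpoints.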
- simba.utils.data.run_user_defined_feature_extraction_class(file_path: Union[str, PathLike], config_path: Union[str, PathLike]) -> None
Loads and executes user-defined feature extraction class within .py file.
- Parameters
file_path – Path to .py file holding user-defined feature extraction class.
config_path (str) – Path to SimBA project config file.
Warning
Legacy function. The GUI since 12/23 uses simba.utils.custom_feature_extractor.UserDefinedFeatureExtractor().
Note
If the file_path contains multiple classes, then the first class will be used.
The user defined class needs to contain a config_path init argument.
If the feature extraction class contains an if __name__ == "__main__": entry point and uses argparse, then the custom feature extraction module will be executed through python subprocess.
Otherwise, it will be executed using sys.
I recommend using the if __name__ == "__main__": and subprocess alternative, as the feature extraction class will be executed in a separate process and any multicore parallel processes within the user feature extraction class will not be throttled by the graphical interface mainloop.
- Example
>>> run_user_defined_feature_extraction_class(config_path='/Users/simon/Desktop/envs/troubleshooting/circular_features_zebrafish/project_folder/project_config.ini', file_path='/Users/simon/Desktop/fish_feature_extractor_2023_version_5.py')
>>> run_user_defined_feature_extraction_class(config_path='/Users/simon/Desktop/envs/troubleshooting/piotr/project_folder/train-20231108-sh9-frames-with-p-lt-2_plus3-&3_best-f1.ini', file_path='/simba/misc/piotr.py')
- simba.utils.data.sample_df_n_by_unique(df: DataFrame, field: str, n: int) -> DataFrame
Randomly sample at most N rows per unique value in specified field of a dataframe.
For example, sample 100 observations from each inferred cluster assignment.
- Parameters
- Returns
A dataframe containing randomly sampled rows.
- Return type
pd.DataFrame
- simba.utils.data.savgol_smoother(data: Union[DataFrame, ndarray], fps: float, time_window: int, source: Optional[str] = '', mode: Optional[typing_extensions.Literal['mirror', 'constant', 'nearest', 'wrap', 'interp']] = 'nearest', polyorder: Optional[int] = 3) -> Union[DataFrame, ndarray]
Apply Savitzky-Golay smoothing to the input pose-estimation data.
Applies the Savitzky-Golay filter to smooth the data in a DataFrame or a NumPy array. The filter smooths the data using a polynomial of the specified order and a window size based on the frame rate per second (fps) and the time window.
See also
For 'bartlett', 'blackman', 'boxcar', 'cosine', 'gaussian', 'hamming', 'exponential' smoothing, see simba.utils.data.df_smoother(). For low-pass Fourier smoothing, see simba.utils.data.fft_lowpass_filter().
- Parameters
data (Union[pd.DataFrame, np.ndarray]) – The input data to be smoothed. Can be a pandas DataFrame or a 2D NumPy array.
fps (float) – The frame rate per second of the data.
time_window (int) – The time window in milliseconds over which to apply the smoothing.
source (Optional[str]) – An optional string indicating the source of the data, used for logging and informative error messages.
mode (Optional[Literal['mirror', 'constant', 'nearest', 'wrap', 'interp']]) – The mode parameter determines the behavior at the edges of the data. Options are: 'mirror', 'constant', 'nearest', 'wrap', 'interp'. Default: 'nearest'.
polyorder (Optional[int]) – The order of the polynomial used to fit the samples.
- Return Union[pd.DataFrame, np.ndarray]
The smoothed data, returned as a DataFrame if the input was a DataFrame, or a NumPy array if the input was an array.
- Example
>>> data = pd.read_csv('/Users/simon/Desktop/envs/simba/troubleshooting/two_black_animals_14bp/project_folder/csv/machine_results/Together_1.csv', index_col=0)
>>> savgol_smoother(data=data.values, fps=15, time_window=1000)
- simba.utils.data.scale_pose_keypoints(keypoints: ndarray, original_size: Tuple[int, int], new_size: Tuple[int, int]) -> ndarray
Scale pose keypoints from original image dimensions to new image dimensions.
- Parameters
- Returns
Array of scaled (x, y) coordinates. Same shape as input (1D if input was 1D, else Nx2).
- Example
>>> kp = np.array([[100, 200], [300, 400]])
>>> scale_pose_keypoints(kp, original_size=(640, 480), new_size=(320, 240))
>>> scale_pose_keypoints(np.array([100, 200]), original_size=(640, 480), new_size=(320, 240))
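Scaling keypoints between image sizes reduces to multiplying x and y by the width and height ratios. In the sketch below the function name is hypothetical, and reading the size tuples as (width, height) is an assumption, since the parameter descriptions are missing above.

```python
import numpy as np


def scale_keypoints_sketch(keypoints, original_size, new_size):
    """Scale (x, y) keypoints by the ratio of new to original image size."""
    kp = np.asarray(keypoints, dtype=float)
    # Assumption: sizes are (width, height), matching the (x, y) column order.
    scale = np.array([new_size[0] / original_size[0],
                      new_size[1] / original_size[1]])
    # Broadcasting handles both a single (2,) point and an (N, 2) array.
    return kp * scale


scaled = scale_keypoints_sketch(np.array([[100, 200], [300, 400]]),
                                original_size=(640, 480), new_size=(320, 240))
single = scale_keypoints_sketch(np.array([100, 200]),
                                original_size=(640, 480), new_size=(320, 240))
```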
- simba.utils.data.slice_roi_dict_for_video(data: Dict[str, DataFrame], video_name: str) -> Tuple[Dict[str, DataFrame], List[str]]
Given a dictionary of dataframes representing different ROIs (created by simba.mixins.config_reader.ConfigReader.read_roi_data), retain only the ROIs belonging to the specified video.
- Parameters
- Returns
Tuple with (i) a dictionary of the same shape as input data, and (ii) a list of the ROI names for the sliced video.
- Return type
- simba.utils.data.slice_roi_dict_from_attribute(data: Dict[str, DataFrame], shape_names: Optional[List[str]] = None, video_names: Optional[List[str]] = None) -> Tuple[Dict[str, DataFrame], List[str], int]
Filters ROI (Region of Interest) shape data based on provided shape names and/or video names.
- Parameters
data (Dict[str, pd.DataFrame]) – A dictionary where keys are shape type strings (e.g., 'Rectangles', 'Circles', 'Polygons'), and values are pandas DataFrames containing at least 'Name' and 'Video' columns. Obtained from ConfigReader.read_roi_data.
shape_names (Union[str, List[str]]) – A string or list of strings specifying ROI names to retain. If None, all names are kept.
video_names (Union[str, List[str]]) – A string or list of strings specifying video names to retain. If None, all videos are kept.
- Returns
A dictionary of filtered DataFrames, one per shape type, with the index reset, the names of the ROIs, and the number of shapes returned.
- Return type
- simba.utils.data.slp_to_df_convert(file_path: Union[str, PathLike], headers: List[str], joined_tracks: Optional[bool] = False, multi_index: Optional[bool] = True, drop_body_parts: Optional[List[str]] = None) -> DataFrame
Helper to convert .slp pose-estimation data in h5 format to pandas dataframe.
- Parameters
file_path (Union[str, os.PathLike]) – Path to SLEAP H5 file on disk.
headers (List[str]) – List of strings representing output dataframe headers.
joined_tracks (bool) – If True, the h5 file has been created by joining multiple .slp files.
multi_index (bool) – If True, inserts multi-index place-holders in the output dataframe (used in SimBA data import).
drop_body_parts (Optional[List[str]]) – Body-parts that should be removed from the SLEAP H5 dataset before import into SimBA. Use the body-part names as defined in SLEAP. Default: None.
- Raises
InvalidFileTypeError – If file_path is not a valid SLEAP H5 pose-estimation file.
DataHeaderError – If the SLEAP file contains more or fewer body-parts than suggested by len(headers).
- Return pd.DataFrame
With animal ID, Track ID and body-part names as columns.
- Example
>>> headers = ['d_nose_1', 'd_neck_1', 'd_back_1', 'd_tail_1', 'nest_s_2', 'nest_cc_2', 'nest_cv_2', 'nest_cc_2', 'nest_csc_2', 'nest_cscd_2']
>>> new_headers = []
>>> for h in headers: new_headers.append(h + '_x'); new_headers.append(h + '_y'); new_headers.append(h + '_p')
>>> df = slp_to_df_convert(file_path='/Users/simon/Desktop/envs/troubleshooting/ryan/LBN4a_Ctrl_P05_1_2022-01-15_08-16-20c.h5', headers=new_headers, joined_tracks=True)
- simba.utils.data.smooth_data_gaussian(config: ConfigParser, file_path: str, time_window_parameter: int) -> None
Perform Gaussian smoothing of pose-estimation data.
Important
Overwrites the input data with smoothened data.
- Parameters
config (configparser.ConfigParser) – Parsed SimBA project_config.ini file.
file_path (str) – Path to pose estimation data.
time_window_parameter (int) – Gaussian rolling window size in milliseconds.
- Example
>>> config = read_config_file(ini_path='/Users/simon/Desktop/envs/troubleshooting/Tests_022023/project_folder/project_config.ini')
>>> smooth_data_gaussian(config=config, file_path='/Users/simon/Desktop/envs/troubleshooting/Tests_022023/project_folder/csv/input_csv/Together_1.csv', time_window_parameter=500)
- simba.utils.data.smooth_data_savitzky_golay(config: ConfigParser, file_path: Union[str, PathLike], time_window_parameter: int, overwrite: Optional[bool] = True) -> None
Perform Savitzky-Golay smoothing of pose-estimation data within a file.
Important
LEGACY: USE simba.utils.data.savgol_smoother instead.
Overwrites the input data with smoothened data.
- Parameters
config (configparser.ConfigParser) – Parsed SimBA project_config.ini file.
file_path (str) – Path to pose estimation data.
time_window_parameter (int) – Savitzky-Golay rolling window size in milliseconds.
overwrite (bool) – If True, overwrites the input data. If False, returns the smoothened dataframe.
- Example
>>> config = read_config_file(config_path='Tests_022023/project_folder/project_config.ini')
>>> smooth_data_savitzky_golay(config=config, file_path='Tests_022023/project_folder/csv/input_csv/Together_1.csv', time_window_parameter=500)
- simba.utils.data.terminate_cpu_pool(pool: Pool, force: bool = False, verbose: bool = True, source: Optional[str] = None) -> None
Safely terminates a multiprocessing.Pool instance with optional graceful shutdown.
Note
If pool is None or invalid, function returns without action. Exceptions during termination are silently caught.
- Parameters
pool (multiprocessing.pool.Pool) – The multiprocessing pool to terminate. If None, function returns without action.
force (bool) – If True, skips graceful shutdown (close/join) and immediately terminates. Default: False.
verbose (bool) – If True, prints termination message with timestamp. Default: True.
source (Optional[str]) – Optional identifier string for logging purposes (e.g., 'VideoProcessor'). Default: None.
- Example
>>> import multiprocessing
>>> pool = multiprocessing.Pool(4)
>>> terminate_cpu_pool(pool=pool, force=False, verbose=True, source='FeatureExtractor')
SimBA Enumeralsο
- class simba.utils.enums.ConfigKey(value)
Bases: Enum
An enumeration.
- ANIMAL_CNT = 'animal_no'
- BODYPART_DIRECTION_VALUE = 'bodypart_direction'
- CREATE_ENSEMBLE_SETTINGS = 'create ensemble settings'
- DIRECTIONALITY_SETTINGS = 'Directionality settings'
- DISPLAY_SETTINGS = 'DISPLAY SETTINGS'
- DISTANCE_MM = 'distance_mm'
- DISTANCE_PLOT_SETTINGS = 'Distance plot'
- FILE_TYPE = 'workflow_file_type'
- FOLDER_PATH = 'folder_path'
- FRAME_SETTINGS = 'Frame settings'
- GENERAL_SETTINGS = 'General settings'
- HEATMAP_SETTINGS = 'Heatmap settings'
- LINE_PLOT_SETTINGS = 'Line plot settings'
- LOCATION_CRITERION = 'location_criterion'
- MAX_ROI_DISPLAY_HEIGHT = 'max_roi_draw_display_ratio_height'
- MAX_ROI_DISPLAY_WIDTH = 'max_roi_draw_display_ratio_width'
- MIN_BOUT_LENGTH = 'Minimum_bout_lengths'
- MIN_ROI_DISPLAY_HEIGHT = 'min_roi_draw_display_ratio_height'
- MIN_ROI_DISPLAY_WIDTH = 'min_roi_draw_display_ratio_width'
- MODEL_DIR = 'model_dir'
- MOVEMENT_CRITERION = 'movement_criterion'
- MULTI_ANIMAL_IDS = 'ID_list'
- MULTI_ANIMAL_ID_SETTING = 'Multi animal IDs'
- OS = 'OS_system'
- OUTLIER_SETTINGS = 'Outlier settings'
- PATH_PLOT_SETTINGS = 'Path plot settings'
- POSE_SETTING = 'pose_estimation_body_parts'
- PROBABILITY_THRESHOLD = 'probability_threshold'
- PROCESS_MOVEMENT_SETTINGS = 'process movements'
- PROJECT_NAME = 'project_name'
- PROJECT_PATH = 'project_path'
- RF_JOBS = 'RF_n_jobs'
- ROI_ANIMAL_CNT = 'no_of_animals'
- ROI_SETTINGS = 'ROI settings'
- SKLEARN_BP_PROB_THRESH = 'bp_threshold_sklearn'
- SML_SETTINGS = 'SML settings'
- TARGET_CNT = 'no_targets'
- THRESHOLD_SETTINGS = 'threshold_settings'
- VALIDATION_SETTINGS = 'validation/run model'
- VALIDATION_VIDEO = 'generate_validation_video'
- VIDEO_INFO_CSV = 'video_info.csv'
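Several of the values above look like section and option names of a project configuration file. The sketch below shows one way such an enum could be paired with the standard-library `configparser`. It is illustrative only: the local `ConfigKey` class mirrors three members from the list rather than importing SimBA, the INI text is made up, and the assumption that `'General settings'` is a section holding `'project_path'` and `'animal_no'` is not confirmed by this reference.

```python
from configparser import ConfigParser
from enum import Enum


class ConfigKey(Enum):
    # Local mirror of three members listed above; not imported from SimBA.
    GENERAL_SETTINGS = 'General settings'
    PROJECT_PATH = 'project_path'
    ANIMAL_CNT = 'animal_no'


# Hypothetical, minimal config content used only for this illustration.
EXAMPLE_INI = """
[General settings]
project_path = /data/my_project/project_folder
animal_no = 2
"""

config = ConfigParser()
config.read_string(EXAMPLE_INI)

# Use the enum values instead of repeating string literals at call sites.
section = ConfigKey.GENERAL_SETTINGS.value
project_path = config.get(section, ConfigKey.PROJECT_PATH.value)
animal_cnt = config.getint(section, ConfigKey.ANIMAL_CNT.value)
print(project_path, animal_cnt)
```

Keeping the strings in one enum means a renamed option only has to be changed in a single place.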
- class simba.utils.enums.Defaults(value)
Bases: Enum
An enumeration.
- BROWSE_FILE_BTN_TEXT = 'Browse File'
- BROWSE_FOLDER_BTN_TEXT = 'Browse Folder'
- CHUNK_SIZE = 1
- LARGE_MAX_TASK_PER_CHILD = 1000
- MAXIMUM_MAX_TASK_PER_CHILD = 8000
- MAX_TASK_PER_CHILD = 10
- NO_FILE_SELECTED_TEXT = 'No file selected'
- SPLASH_TIME = 2500
- STR_SPLIT_DELIMITER = '\t'
- THREADSAFE_CORE_COUNT = 61
- WELCOME_MSG = 'Welcome fellow scientists! \n SimBA v.5.3.8 \n '
- class simba.utils.enums.DirNames(value)
Bases: Enum
An enumeration.
- BP_NAMES = 'bp_names'
- CONFIGS = 'configs'
- CSV = 'csv'
- FEATURES_EXTRACTED = 'features_extracted'
- FRAMES = 'frames'
- INPUT = 'input'
- INPUT_CSV = 'input_csv'
- LOGS = 'logs'
- MACHINE_RESULTS = 'machine_results'
- MEASURES = 'measures'
- MODEL = 'models'
- OUTLIER_MOVEMENT = 'outlier_corrected_movement'
- OUTLIER_MOVEMENT_LOCATION = 'outlier_corrected_movement_location'
- OUTPUT = 'output'
- POSE_CONFIGS = 'pose_configs'
- PROJECT = 'project_folder'
- TARGETS_INSERTED = 'targets_inserted'
- VIDEOS = 'videos'
- class simba.utils.enums.Dtypes(value)
Bases: Enum
An enumeration.
- ENTROPY = 'entropy'
- FLOAT = 'float'
- FOLDER = 'folder_path'
- INT = 'int'
- NAN = 'NaN'
- NONE = 'None'
- SQRT = 'sqrt'
- STR = 'str'
- class simba.utils.enums.ENV_VARS(value)
Bases: Enum
An enumeration.
- CUML = 'CUML'
- NUMBA_PRECOMPILE = 'NUMBA_PRECOMPILE'
- PRINT_EMOJIS = 'PRINT_EMOJIS'
- UNSUPERVISED_INTERFACE = 'UNSUPERVISED_INTERFACE'
- class simba.utils.enums.FontPaths(value)
Bases: Enum
An enumeration.
- PLAYWRIGHT = PosixPath('assets/fonts/Playwrite ES Deco.ttf')
- POPPINS_BOLD = PosixPath('assets/fonts/Poppins Bold.ttf')
- POPPINS_REGULAR = PosixPath('assets/fonts/Poppins Regular.ttf')
- class simba.utils.enums.Formats(value)
Bases: Enum
An enumeration.
- AREA = 'area'
- AVI_CODEC = 'XVID'
- BATCH_CODEC = 'libx264'
- BTN_HOVER_CLR = '#d1e0e0'
- BUTTON_WIDTH_L = 310
- BUTTON_WIDTH_S = 135
- BUTTON_WIDTH_XL = 340
- BUTTON_WIDTH_XS = 105
- BUTTON_WIDTH_XXL = 360
- CSV = 'csv'
- DLC_FILETYPES = {'box': ['bx.h5', 'bx_filtered.h5'], 'ellipse': ['el.h5', 'el_filtered.h5'], 'skeleton': ['sk.h5', 'sk_filtered.h5']}
- DLC_NETWORK_FILE_NAMES = ['dlc_resnet50', 'dlc_resnet_50', 'dlc_dlcrnetms5', 'dlc_effnet_b0', 'dlc_resnet101']
- EXPECTED_VIDEO_INFO_COLS = ['Video', 'fps', 'Resolution_width', 'Resolution_height', 'Distance_in_mm', 'pixels/mm']
- FONT = 4
- FONT_HEADER = ('DejaVu Sans', 10, 'bold')
- FONT_LARGE = ('DejaVu Sans', 13, 'bold')
- FONT_LARGE_BOLD = ('DejaVu Sans', 13, 'bold')
- FONT_LARGE_ITALICS = ('DejaVu Sans', 13, 'italic')
- FONT_REGULAR = ('DejaVu Sans', 8)
- FONT_REGULAR_BOLD = ('DejaVu Sans', 8, 'bold')
- FONT_REGULAR_ITALICS = ('DejaVu Sans', 8, 'italic')
- FONT_SMALL = ('DejaVu Sans', 6)
- H5 = 'h5'
- INTEGER_DTYPES = (<class 'numpy.int64'>, <class 'numpy.int32'>, <class 'numpy.int8'>, <class 'numpy.uint8'>, <class 'int'>, <class 'numpy.integer'>)
- LABELFRAME_GREY = '#DCDCDC'
- LABELFRAME_HEADER_CLICKABLE_COLOR = '#0563c1'
- LABELFRAME_HEADER_CLICKABLE_FORMAT = ('Helvetica', 12, 'bold', 'underline')
- LABELFRAME_HEADER_FORMAT = ('Helvetica', 12, 'bold')
- MP4_CODEC = 'mp4v'
- NUMERIC_DTYPES = (<class 'numpy.float32'>, <class 'numpy.float64'>, <class 'numpy.int64'>, <class 'numpy.int32'>, <class 'numpy.int8'>, <class 'numpy.uint8'>, <class 'int'>, <class 'float'>, <class 'numpy.integer'>)
- PARQUET = 'parquet'
- PERIMETER = 'perimeter'
- PICKLE = 'pickle'
- ROOT_WINDOW_SIZE = '750x750'
- SUPERANIMAL_TOPVIEW_BP_NAMES = ['nose', 'left_ear', 'right_ear', 'left_ear_tip', 'right_ear_tip', 'left_eye', 'right_eye', 'neck', 'mid_back', 'mouse_center', 'mid_backend', 'mid_backend2', 'mid_backend3', 'tail_base', 'tail1', 'tail2', 'tail3', 'tail4', 'tail5', 'left_shoulder', 'left_midside', 'left_hip', 'right_shoulder', 'right_midside', 'right_hip', 'tail_end', 'head_midpoint']
- TXT_LOCATIONS = ('top_left', 'top_middle', 'top_right', 'bottom_left', 'bottom_middle', 'bottom_right')
- VALID_TABLEFMT = ('plain', 'simple', 'github', 'grid', 'simple_grid', 'rounded_grid', 'heavy_grid', 'mixed_grid', 'double_grid', 'fancy_grid', 'outline', 'simple_outline', 'rounded_outline', 'heavy_outline', 'mixed_outline', 'double_outline', 'fancy_outline', 'pipe', 'orgtbl', 'jira', 'presto', 'pretty', 'psql', 'rst', 'mediawiki', 'moinmoin', 'youtrack', 'html', 'unsafehtml', 'latex', 'latex_raw', 'latex_booktabs', 'latex_longtable', 'textile', 'tsv')
- XLXS = 'xlsx'
- class simba.utils.enums.GeometryEnum(value)
Bases: Enum
An enumeration.
- CAP_STYLE_MAP = {'flat': 3, 'round': 1, 'square': 2}
- CONTOURS_MODE_MAP = {'all': 1, 'exterior': 0, 'interior': 3}
- CONTOURS_RETRIEVAL_MAP = {'kcos': 4, 'l1': 3, 'none': 0, 'simple': 2}
- HISTOGRAM_COMPARISON_MAP = {'bhattacharyya': 3, 'chi_square': 1, 'chi_square_alternative': 5, 'correlation': 0, 'hellinger': 4, 'intersection': 2}
- RANKING_METHODS = ['area', 'min_distance', 'max_distance', 'mean_distance', 'left_to_right', 'top_to_bottom']
- class simba.utils.enums.Keys(value)
Bases: Enum
An enumeration.
- DOCUMENTATION = 'documentation'
- EAR_LEFT = 'Ear_left'
- EAR_RIGHT = 'Ear_right'
- FRAME_COUNT = 'frame_count'
- NOSE = 'Nose'
- ROI_CIRCLES = 'circleDf'
- ROI_POLYGONS = 'polygons'
- ROI_RECTANGLES = 'rectangles'
- X_BPS = 'X_bps'
- Y_BPS = 'Y_bps'
- class simba.utils.enums.Labelling(value)
Bases: Enum
An enumeration.
- MAX_FRM_SIZE = (1280, 650)
- PADDING = 5
- PLAY_VIDEO_SCRIPT_PATH = '/home/docs/checkouts/readthedocs.org/user_builds/simba-uw-tf-dev/checkouts/latest/simba/labelling/play_annotation_video.py'
- VALID_ANNOTATIONS_ADVANCED = [0, 1, 2]
- VIDEO_FRAME_SIZE = (700, 500)
- class simba.utils.enums.Links(value)
Bases: Enum
An enumeration.
- ADDITIONAL_IMPORTS = 'https://github.com/sgoldenlab/simba/blob/master/docs/Scenario1.md#step-2-optional-step--import-more-dlc-tracking-data-or-videos'
- ADVANCED_LBL = 'https://github.com/sgoldenlab/simba/blob/master/docs/advanced_labelling.md'
- AGGREGATE_BOOL_STATS = 'https://github.com/sgoldenlab/simba/blob/master/docs/ROI_tutorial.md#compute-aggregate-conditional-statistics-from-boolean-fields'
- ANALYZE_ML_RESULTS = 'https://github.com/sgoldenlab/simba/blob/master/docs/Scenario2.md#part-4--analyze-machine-results'
- ANALYZE_ROI = 'https://github.com/sgoldenlab/simba/blob/master/docs/ROI_tutorial.md#part-2-analyzing-roi-data'
- APPEND_ROI_FEATURES = 'https://github.com/sgoldenlab/simba/blob/master/docs/ROI_tutorial.md#part-3-generating-features-from-roi-data'
- BATCH_PREPROCESS = 'https://github.com/sgoldenlab/simba/blob/master/docs/tutorial_process_videos.md'
- BBOXES = 'https://github.com/sgoldenlab/simba/blob/master/docs/anchored_rois.md'
- BLOB_TRACKING = 'https://github.com/sgoldenlab/simba/blob/master/docs/blob_track.md'
- CIRCLE_CROP = 'https://github.com/sgoldenlab/simba/blob/master/docs/Tutorial_tools.md#circle-crop'
- CLF_VALIDATION = 'https://github.com/sgoldenlab/simba/blob/master/docs/classifier_validation.md'
- CONCAT_VIDEOS = 'https://github.com/sgoldenlab/simba/blob/master/docs/Scenario2.md#merging-concatenating-videos'
- COUNT_ANNOTATIONS_IN_PROJECT = 'https://github.com/sgoldenlab/simba/blob/master/docs/label_behavior.md#count-annotations-in-simba-project'
- COUNT_ANNOTATIONS_OUTSIDE_PROJECT = 'https://github.com/sgoldenlab/simba/blob/master/docs/Tutorial_tools.md#extract-project-annotation-counts'
- CREATE_PROJECT = 'https://github.com/sgoldenlab/simba/blob/master/docs/Scenario1.md#step-1-generate-project-config'
- CUE_LIGHTS = 'https://github.com/sgoldenlab/simba/blob/master/docs/cue_light_tutorial.md'
- DATA_ANALYSIS = 'https://github.com/sgoldenlab/simba/blob/master/docs/Scenario2.md#part-4--analyze-machine-results'
- DATA_TABLES = 'https://github.com/sgoldenlab/simba/blob/master/docs/Scenario2.md#visualizing-data-tables'
- DIRECTING_ANIMALS_PLOTS = 'https://github.com/sgoldenlab/simba/blob/master/docs/directionality_between_animals.md'
- DISTANCE_PLOTS = 'https://github.com/sgoldenlab/simba/blob/master/docs/Scenario2.md#visualizing-distance-plots'
- DOWNSAMPLE = 'https://github.com/sgoldenlab/simba/blob/master/docs/Tutorial_tools.md#downsample-video'
- EXTRACT_FEATURES = 'https://github.com/sgoldenlab/simba/blob/master/docs/Scenario1.md#step-5-extract-features'
- FEATURE_SUBSETS = 'https://github.com/sgoldenlab/simba/blob/master/docs/feature_subsets.md'
- FSTTC = 'https://github.com/sgoldenlab/simba/blob/master/docs/FSTTC.md'
- GANTT_PLOTS = 'https://github.com/sgoldenlab/simba/blob/master/docs/Scenario2.md#visualizing-gantt-charts'
- GITHUB_REPO = 'https://github.com/sgoldenlab/simba'
- GITTER = 'https://gitter.im/SimBA-Resource/community'
- HEATMAP_CLF = 'https://github.com/sgoldenlab/simba/blob/master/docs/Scenario2.md#visualizing-classification-heatmaps'
- HEATMAP_LOCATION = 'https://github.com/sgoldenlab/simba/blob/master/docs/ROI_tutorial.md#heatmaps'
- KLEINBERG = 'https://github.com/sgoldenlab/simba/blob/master/docs/kleinberg_filter.md'
- LABEL_BEHAVIOR = 'https://github.com/sgoldenlab/simba/blob/master/docs/label_behavior.md'
- LOAD_PROJECT = 'https://github.com/sgoldenlab/simba/blob/master/docs/Scenario1.md#part-2-load-project-1'
- OSF_REPO = 'https://osf.io/tmu6y/'
- OULIERS = 'https://github.com/sgoldenlab/simba/blob/master/misc/Outlier_settings.pdf'
- OUTLIERS_DOC = 'https://github.com/sgoldenlab/simba/blob/master/docs/Scenario1.md#step-4-outlier-correction'
- OUT_OF_SAMPLE_VALIDATION = 'https://github.com/sgoldenlab/simba/blob/master/docs/Scenario1.md#step-8-evaluating-the-model-on-new-out-of-sample-data'
- PATH_PLOTS = 'https://github.com/sgoldenlab/simba/blob/master/docs/Scenario2.md#visualizing-path-plots'
- PLOTLY = 'https://github.com/sgoldenlab/simba/blob/master/docs/plotly_dash.md'
- PSEUDO_LBL = 'https://github.com/sgoldenlab/simba/blob/master/docs/pseudoLabel.md'
- REMOVE_CLF = 'https://github.com/sgoldenlab/simba/blob/master/docs/Scenario1.md#step-2-optional-step--import-more-dlc-tracking-data-or-videos'
- ROI = 'https://github.com/sgoldenlab/simba/blob/master/docs/ROI_tutorial_new.md'
- ROI_DATA_ANALYSIS = 'https://github.com/sgoldenlab/simba/blob/master/docs/ROI_tutorial.md#part-2-analyzing-roi-data'
- ROI_DATA_PLOT = 'https://github.com/sgoldenlab/simba/blob/master/docs/ROI_tutorial.md#part-4-visualizing-roi-data'
- ROI_FEATURES = 'https://github.com/sgoldenlab/simba/blob/master/docs/ROI_tutorial.md#part-3-generating-features-from-roi-data'
- ROI_FEATURES_PLOT = 'https://github.com/sgoldenlab/simba/blob/master/docs/ROI_tutorial.md#part-5-visualizing-roi-features'
- SCENARIO_2 = 'https://github.com/sgoldenlab/simba/blob/master/docs/Scenario2.md'
- SCENARIO_4 = 'https://github.com/sgoldenlab/simba/blob/master/docs/Scenario4_new.md'
- SET_RUN_ML_PARAMETERS = 'https://github.com/sgoldenlab/simba/blob/master/docs/Scenario2.md#part-3-run-the-classifier-on-new-data'
- SIMBA_PIP_URL = 'https://pypi.org/pypi/simba-uw-tf-dev/json'
- SIMON_WEBSITE = 'https://sronilsson.netlify.app/'
- SKLEARN_PLOTS = 'https://github.com/sgoldenlab/simba/blob/master/docs/Scenario2.md#visualizing-classifications'
- THIRD_PARTY_ANNOTATION = 'https://github.com/sgoldenlab/simba/blob/master/docs/third_party_annot.md'
- THIRD_PARTY_ANNOTATION_NEW = 'https://github.com/sgoldenlab/simba/blob/master/docs/third_party_annot_new.md'
- TRAIN_ML_MODEL = 'https://github.com/sgoldenlab/simba/blob/master/docs/Scenario1.md#step-7-train-machine-model'
- USER_DEFINED_FEATURE_EXTRACTION = 'https://github.com/sgoldenlab/simba/blob/master/docs/extractFeatures.md'
- VIDEO_PARAMETERS = 'https://github.com/sgoldenlab/simba/blob/master/docs/Scenario1.md#step-3-set-video-parameters'
- VIDEO_TOOLS = 'https://github.com/sgoldenlab/simba/blob/master/docs/Tutorial_tools.md'
- VISUALIZATION = 'https://github.com/sgoldenlab/simba/blob/master/docs/Scenario2.md#part-5--visualizing-results'
- VISUALIZE_CLF_PROBABILITIES = 'https://github.com/sgoldenlab/simba/blob/master/docs/Scenario2.md#visualizing-classification-probabilities'
- YOLO_11_WEIGHTS = {'yolo11l-pose': 'https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11l-pose.pt', 'yolo11m-pose': 'https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11s-pose.pt', 'yolo11n-pose': 'https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n-pose.pt', 'yolo11s-pose': 'https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11s-pose.pt', 'yolo11x-pose': 'https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11x-pose.pt'}
- class simba.utils.enums.MLParamKeys(value)
Bases: Enum
An enumeration.
- CLASSIFIER = 'classifier'
- CLASSIFIER_MAP = 'classifier_map'
- CLASSIFIER_NAME = 'classifier_name'
- CLASS_CUSTOM_WEIGHTS = 'class_custom_weights'
- CLASS_WEIGHTS = 'class_weights'
- CLF_REPORT = 'generate_classification_report'
- CUDA = 'cuda'
- EX_DECISION_TREE = 'generate_example_decision_tree'
- EX_DECISION_TREE_FANCY = 'generate_example_decision_tree_fancy'
- IMPORTANCE_BARS_N = 'N_feature_importance_bars'
- IMPORTANCE_BAR_CHART = 'generate_features_importance_bar_graph'
- IMPORTANCE_LOG = 'generate_features_importance_log'
- LEARNING_CURVE = 'generate_sklearn_learning_curves'
- LEARNING_CURVE_DATA_SPLITS = 'learning_curve_data_splits'
- LEARNING_CURVE_K_SPLITS = 'learning_curve_k_splits'
- LEARNING_DATA_SPLITS = 'LearningCurve_shuffle_data_splits'
- MIN_LEAF = 'rf_min_sample_leaf'
- MODEL_TO_RUN = 'model_to_run'
- N_FEATURE_IMPORTANCE_BARS = 'n_feature_importance_bars'
- OVERSAMPLE_RATIO = 'over_sample_ratio'
- OVERSAMPLE_SETTING = 'over_sample_setting'
- PARTIAL_DEPENDENCY = 'partial_dependency'
- PERMUTATION_IMPORTANCE = 'compute_feature_permutation_importance'
- PRECISION_RECALL = 'generate_precision_recall_curves'
- RF_CRITERION = 'rf_criterion'
- RF_ESTIMATORS = 'rf_n_estimators'
- RF_MAX_DEPTH = 'rf_max_depth'
- RF_MAX_FEATURES = 'rf_max_features'
- RF_METADATA = 'generate_rf_model_meta_data_file'
- RF_META_DATA = 'RF_meta_data'
- SAVE_TRAIN_TEST_FRM_IDX = 'save_train_test_frm_idx'
- SHAP_ABSENT = 'shap_target_absent_no'
- SHAP_MULTIPROCESS = 'shap_multiprocess'
- SHAP_PRESENT = 'shap_target_present_no'
- SHAP_SAVE_ITERATION = 'shap_save_iteration'
- SHAP_SCORES = 'generate_shap_scores'
- TRAIN_TEST_SPLIT_TYPE = 'train_test_split_type'
- TT_SIZE = 'train_test_size'
- UNDERSAMPLE_RATIO = 'under_sample_ratio'
- UNDERSAMPLE_SETTING = 'under_sample_setting'
- class simba.utils.enums.Methods(value)
Bases: Enum
An enumeration.
- ADDITIONAL_THIRD_PARTY_CLFS = 'ADDITIONAL third-party behavior detected'
- AGG_METHODS = ('mean', 'median')
- ANOVA = 'ANOVA'
- BORIS = 'BORIS'
- CLASSIC_TRACKING = 'Classic tracking'
- CREATE_POSE_CONFIG = 'Create pose config...'
- ERROR = 'ERROR'
- FACEMAP = 'facemap'
- GAUSSIAN = 'Gaussian'
- INVALID_THIRD_PARTY_APPENDER_FILE = 'INVALID annotations file data format'
- MULTI_TRACKING = 'Multi tracking'
- RANDOM_UNDERSAMPLE = 'random undersample'
- SAVITZKY_GOLAY = 'Savitzky Golay'
- SIMBA_BLOB = 'simba_blob'
- SMOTE = 'SMOTE'
- SMOTEENN = 'SMOTEENN'
- SPLIT_TYPE_BOUTS = 'BOUTS'
- SPLIT_TYPE_FRAMES = 'FRAMES'
- SUPER_ANIMAL_TOPVIEW = 'superanimal_topview'
- THIRD_PARTY_ANNOTATION_FILE_NOT_FOUND = 'Annotations data file NOT FOUND'
- THIRD_PARTY_EVENT_COUNT_CONFLICT = 'Annotations EVENT COUNT conflict'
- THIRD_PARTY_EVENT_OVERLAP = 'Annotations OVERLAP inaccuracy'
- THIRD_PARTY_FPS_CONFLICT = 'Annotations and pose FPS conflict'
- THIRD_PARTY_FRAME_COUNT_CONFLICT = 'Annotations and pose FRAME COUNT conflict'
- THREE_D_TRACKING = '3D tracking'
- USER_DEFINED = 'user_defined'
- WARNING = 'WARNING'
- ZERO_THIRD_PARTY_VIDEO_ANNOTATIONS = 'ZERO third-party video annotations found'
- ZERO_THIRD_PARTY_VIDEO_BEHAVIOR_ANNOTATIONS = 'ZERO third-party video behavior annotations found'
- class simba.utils.enums.OS(value)
Bases: Enum
An enumeration.
- FORK = 'fork'
- LINUX = 'Linux'
- MAC = 'Darwin'
- PYTHON_VER = '3.6'
- SIMBA_VERSION = '5.3.8'
- SPAWN = 'spawn'
- WINDOWS = 'Windows'
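The `LINUX`, `MAC` and `WINDOWS` values match the strings returned by the standard-library `platform.system()`, so a platform check could be written against the enum as sketched below. The local `OS` class mirrors only three members from the list, and the helper `current_os` is a hypothetical name introduced for this example; how SimBA itself performs the comparison is not shown in this reference.

```python
import platform
from enum import Enum
from typing import Optional


class OS(Enum):
    # Local mirror of three members listed above; not imported from SimBA.
    WINDOWS = 'Windows'
    MAC = 'Darwin'
    LINUX = 'Linux'


def current_os() -> Optional[OS]:
    """Hypothetical helper: return the matching member, or None if unlisted."""
    name = platform.system()
    for member in OS:
        if member.value == name:
            return member
    return None


detected = current_os()
print(detected.name if detected is not None else 'unlisted platform')
```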
- class simba.utils.enums.Options(value)
Bases: Enum
An enumeration.
- ALL_IMAGE_FORMAT_OPTIONS = ('.bmp', '.png', '.jpeg', '.jpg', '.webp')
- ALL_IMAGE_FORMAT_STR_OPTIONS = '.bmp .png .jpeg .jpg'
- ALL_VIDEO_FORMAT_OPTIONS = ('.avi', '.mp4', '.mov', '.flv', '.m4v', '.webm', '.h264')
- ALL_VIDEO_FORMAT_OPTIONS_2 = ('avi', 'mp4', 'mov', 'flv', 'm4v', 'webm', 'h264')
- ALL_VIDEO_FORMAT_STR_OPTIONS = '.avi .mp4 .mov .flv .m4v .webm .h264'
- ALL_YOLO_MODEL_FORMAT_STR_OPTIONS = '.onnx .engine .jit .onnx .mlmodel .xml .pb .pb .tflite .pt'
- ANIMAL_ALIGNED = 'animal-aligned'
- AXIS_ALIGNED = 'axis-aligned'
- BBOX_OPTIONS = ['axis-aligned', 'animal-aligned']
- BOOL_STR_OPTIONS = ['TRUE', 'FALSE']
- BUCKET_METHODS = ['fd', 'doane', 'auto', 'scott', 'stone', 'rice', 'sturges', 'sqrt']
- CLASSICAL_TRACKING_OPTIONS = ['1 animal; 4 body-parts', '1 animal; 7 body-parts', '1 animal; 8 body-parts', '1 animal; 9 body-parts', '2 animals; 8 body-parts', '2 animals; 14 body-parts', '2 animals; 16 body-parts', 'MARS', 'SimBA BLOB Tracking', 'FaceMap']
- CLASS_WEIGHT_OPTIONS = ['None', 'balanced', 'balanced_subsample', 'custom']
- CLF_CRITERION = ['gini', 'entropy']
- CLF_DESCRIPTIVES_OPTIONS = ['Bout count', 'Total event duration (s)', 'Mean event bout duration (s)', 'Median event bout duration (s)', 'First event occurrence (s)', 'Mean event bout interval duration (s)', 'Median event bout interval duration (s)']
- CLF_MAX_FEATURES = ['sqrt', 'log2', 'None']
- CLF_MODELS = ['RF', 'GBC', 'XGBoost']
- CLF_TEST_SIZE_OPTIONS = ['0.1', '0.2', '0.3', '0.4', '0.5', '0.6', '0.7', '0.8', '0.9']
- CV2_FONTS = [0, 1, 2, 3, 4, 5, 6, 7]
- DPI_OPTIONS = [100, 200, 400, 800, 1600, 3200]
- FEATURE_SUBSET_OPTIONS = ['Two-point body-part distances (mm)', 'Within-animal three-point body-part angles (degrees)', 'Within-animal three-point convex hull perimeters (mm)', 'Within-animal four-point convex hull perimeters (mm)', 'Entire animal convex hull perimeters (mm)', 'Entire animal convex hull area (mm2)', 'Frame-by-frame body-part movements (mm)', 'Frame-by-frame body-part distances to ROI centers (mm)', 'Frame-by-frame body-parts inside ROIs (Boolean)']
- GANTT_VALIDATION_OPTIONS = ['None', 'Gantt chart: final frame only (slightly faster)', 'Gantt chart: video']
- HEATMAP_BIN_SIZE_OPTIONS = ['10×10', '20×20', '40×40', '80×80', '100×100', '160×160', '320×320', '640×640', '1280×1280']
- HEATMAP_SHADING_OPTIONS = ['gouraud', 'flat']
- HHMMSSSSSS = 'HH:MM:SS.SSSS'
- IMPORT_TYPE_OPTIONS = ['CSV (DLC/DeepPoseKit)', 'CSV (SimBA BLOB)', 'CSV (SimBA YOLO)', 'CSV (SLEAP)', 'H5 (FaceMap)', 'H5 (multi-animal DLC)', 'H5 (SLEAP)', 'H5 (SuperAnimal-TopView)', 'JSON (BENTO)', 'MAT (DANNCE 3D)', 'SLP (SLEAP)', 'TRK (multi-animal APT)']
- INTERPOLATION_OPTIONS = ['Animal(s): Nearest', 'Animal(s): Linear', 'Animal(s): Quadratic', 'Body-parts: Nearest', 'Body-parts: Linear', 'Body-parts: Quadratic']
- INTERPOLATION_OPTIONS_W_NONE = ['None', 'Animal(s): Nearest', 'Animal(s): Linear', 'Animal(s): Quadratic', 'Body-parts: Nearest', 'Body-parts: Linear', 'Body-parts: Quadratic']
- MIN_MAX_SCALER = 'MIN-MAX'
- MULTI_ANIMAL_TRACKING_OPTIONS = ['Multi-animals; 4 body-parts', 'Multi-animals; 7 body-parts', 'Multi-animals; 8 body-parts', 'AMBER', 'SuperAnimal-TopView']
- MULTI_DLC_TYPE_IMPORT_OPTION = ['skeleton', 'box', 'ellipse']
- OVERSAMPLE_OPTIONS = ['None', 'SMOTE', 'SMOTEENN']
- PALETTE_OPTIONS = ['magma', 'jet', 'inferno', 'plasma', 'viridis', 'gnuplot2', 'RdBu', 'winter', 'coolwarm']
- PALETTE_OPTIONS_CATEGORICAL = ['Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'Set1', 'Set2', 'Set3', 'tab10', 'tab20']
- PERFORM_FLAGS = ['yes', True, 'True']
- QUANTILE_SCALER = 'QUANTILE'
- RESOLUTION_OPTIONS = ['320×240', '640×480', '720×480', '800×640', '960×800', '1120×960', '1280×720', '1980×1080']
- RESOLUTION_OPTIONS_2 = ['AUTO', 240, 320, 480, 640, 720, 800, 960, 1120, 1080, 1980, 2560, 3024, 5120, 6400, 7680, 8192]
- ROLLING_WINDOW_DIVISORS = [2, 5, 6, 7.5, 15]
- RUN_OPTIONS_FLAGS = ['yes', True, 'True', 'False', 'no', False, 'true', 'false']
- SCALER_NAMES = ['MIN-MAX', 'STANDARD', 'QUANTILE']
- SCALER_OPTIONS = ['MIN-MAX', 'STANDARD', 'QUANTILE']
- SECONDS = 'SECONDS'
- SMOOTHING_OPTIONS = ['Gaussian', 'Savitzky Golay']
- SMOOTHING_OPTIONS_W_NONE = ['None', 'Gaussian', 'Savitzky Golay']
- SPEED_OPTIONS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0]
- STANDARD_SCALER = 'STANDARD'
- THIRD_PARTY_ANNOTATION_APPS_OPTIONS = ['BORIS', 'ETHOVISION', 'OBSERVER', 'SOLOMON', 'DEEPETHOGRAM', 'BENTO']
- THIRD_PARTY_ANNOTATION_ERROR_OPTIONS = ['INVALID annotations file data format', 'ADDITIONAL third-party behavior detected', 'Annotations OVERLAP conflict', 'ZERO third-party video behavior annotations found', 'Annotations and pose FRAME COUNT conflict', 'Annotations EVENT COUNT conflict', 'Annotations data file NOT FOUND']
- THREE_DIM_TRACKING_OPTIONS = ['3D tracking']
- TIMEBINS_MEASURMENT_OPTIONS = ['First occurrence (s)', 'Event count', 'Total event duration (s)', 'Mean event duration (s)', 'Median event duration (s)', 'Mean event interval (s)', 'Median event interval (s)']
- TIMER_OPTIONS = ['HH:MM:SS.SSSS', 'seconds']
- TRACKING_TYPE_OPTIONS = ['Classic tracking', 'Multi tracking', '3D tracking']
- TRAIN_TEST_SPLIT = ['FRAMES', 'BOUTS']
- UNDERSAMPLE_OPTIONS = ['None', 'random undersample']
- UNSUPERVISED_FEATURE_OPTIONS = ['INCLUDE FEATURE DATA (ORIGINAL)', 'INCLUDE FEATURES (SCALED)', 'EXCLUDE FEATURE DATA']
- VALID_YOLO_FORMATS = ['onnx', 'engine', 'torchscript', 'onnxsimplify', 'coreml', 'openvino', 'pb', 'tf', 'tflite', 'torch']
- VIDEO_FORMAT_OPTIONS = ['mp4', 'avi']
- WORKFLOW_FILE_TYPE_OPTIONS = ['csv', 'parquet']
- WORKFLOW_FILE_TYPE_STR_OPTIONS = '.csv .parquet'
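`PERFORM_FLAGS` and `RUN_OPTIONS_FLAGS` read like allow-lists for yes/no style settings. The sketch below shows one plausible way to use them: validate a flag against the wider list, then test it against the affirmative list. The function `should_run` is a hypothetical name, the two lists are copied from the members above rather than imported, and this reference does not confirm that SimBA combines them this way.

```python
from typing import Union

# Values copied from the Options members listed above.
PERFORM_FLAGS = ['yes', True, 'True']
RUN_OPTIONS_FLAGS = ['yes', True, 'True', 'False', 'no', False, 'true', 'false']


def should_run(flag: Union[str, bool]) -> bool:
    """Hypothetical helper: True only for a recognised, affirmative flag."""
    if flag not in RUN_OPTIONS_FLAGS:
        raise ValueError(f'Unrecognised flag: {flag!r}')
    return flag in PERFORM_FLAGS


for candidate in ('yes', True, 'no', 'true'):
    print(repr(candidate), should_run(candidate))
```

Under this reading, lowercase `'true'` is accepted as a valid flag but is not treated as affirmative, because it appears in `RUN_OPTIONS_FLAGS` and not in `PERFORM_FLAGS`.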
- class simba.utils.enums.PackageNames(value)
Bases: Enum
An enumeration.
- ULTRALYTICS = 'ultralytics'
- class simba.utils.enums.Paths(value)
Bases: Enum
An enumeration.
- ABOUT_ME = PosixPath('assets/img/about_me.png')
- ANNOTATED_FRAMES_DIR = PosixPath('frames/output/annotated_frames')
- BG_IMG_PATH = PosixPath('assets/img/bg_2024.png')
- BLOB_EXECUTOR_PATH = PosixPath('video_processors/blob_tracking_executor.py')
- BLOB_POSITION_PATH = PosixPath('csv/output/blob_positions')
- BODY_PART_DIRECTIONALITY_DF_DIR = PosixPath('logs/body_part_directionality_dataframes')
- BP_NAMES = PosixPath('logs/measures/pose_configs/bp_names/project_bp_names.csv')
- CLF_DATA_VALIDATION_DIR = PosixPath('csv/validation')
- CLF_VALIDATION_DIR = PosixPath('frames/output/classifier_validation')
- CLUSTER_EXAMPLES = PosixPath('frames/output/cluster_examples')
- CONCAT_VIDEOS_DIR = PosixPath('frames/output/merged')
- CRITICAL_VALUES = PosixPath('simba/assets/lookups/critical_values_05.pickle')
- CUE_LIGHTS_PATH = PosixPath('csv/cue_lights')
- DATA_TABLE = PosixPath('frames/output/live_data_table')
- DETAILED_ROI_DATA_DIR = PosixPath('logs/Detailed_ROI_data')
- DIRECTING_ANIMALS_OUTPUT_PATH = PosixPath('frames/output/ROI_directionality_visualize')
- DIRECTING_BETWEEN_ANIMALS_OUTPUT_PATH = PosixPath('frames/output/Directing_animals')
- DIRECTING_BETWEEN_ANIMAL_BODY_PART_OUTPUT_PATH = PosixPath('frames/output/Body_part_directing_animals')
- DIRECTIONALITY_DF_DIR = PosixPath('logs/directionality_dataframes')
- ENV_PATH = PosixPath('assets/.env')
- FEATURES_EXTRACTED_DIR = PosixPath('csv/features_extracted')
- FRAMES_OUTPUT_DIR = PosixPath('frames/output')
- GANTT_PLOT_DIR = PosixPath('frames/output/gantt_plots')
- HEATMAP_CLF_LOCATION_DIR = PosixPath('frames/output/heatmaps_classifier_locations')
- HEATMAP_LOCATION_DIR = PosixPath('frames/output/heatmaps_locations')
- ICON_ASSETS = PosixPath('assets/icons')
- INPUT_CSV = PosixPath('csv/input_csv')
- INPUT_FRAMES_DIR = PosixPath('frames/input')
- KALEIDO_PATH = '/home/docs/checkouts/readthedocs.org/user_builds/simba-uw-tf-dev/checkouts/latest/simba/kaleido/executable/bin/kaleido.exe'
- LANDING_MOVIE = PosixPath('assets/img/landing.mp4')
- LINE_PLOT_DIR = PosixPath('frames/output/line_plot')
- LOGO_ICON_DARWIN_PATH = PosixPath('assets/icons/SimBA_logo_3.png')
- LOGO_ICON_WINDOWS_PATH = PosixPath('assets/icons/SimBA_logo_3.ico')
- MACHINE_RESULTS_DIR = PosixPath('csv/machine_results')
- OUTLIER_CORRECTED = PosixPath('csv/outlier_corrected_movement_location')
- OUTLIER_CORRECTED_MOVEMENT = PosixPath('csv/outlier_corrected_movement')
- PATH_PLOT_DIR = PosixPath('frames/output/path_plots')
- PROBABILITY_PLOTS_DIR = PosixPath('frames/output/probability_plots')
- PROJECT_POSE_CONFIG_NAMES = PosixPath('pose_configurations/configuration_names/pose_config_names.csv')
- RECENT_PROJECTS_PATHS = PosixPath('assets/.recent_projects.txt')
- ROI_ANALYSIS = PosixPath('frames/output/ROI_analysis')
- ROI_DEFINITIONS = PosixPath('measures/ROI_definitions.h5')
- ROI_FEATURES = PosixPath('frames/output/ROI_features')
- SCHEMATICS = PosixPath('pose_configurations/schematics')
- SHAP_LOGS = PosixPath('logs/shap')
- SIMBA_BP_CONFIG_PATH = PosixPath('pose_configurations/bp_names/bp_names.csv')
- SIMBA_FEATURE_EXTRACTION_COL_NAMES_PATH = PosixPath('assets/lookups/feature_extraction_headers.csv')
- SIMBA_NO_ANIMALS_PATH = PosixPath('pose_configurations/no_animals/no_animals.csv')
- SIMBA_SHAP_CATEGORIES_PATH = PosixPath('assets/shap/feature_categories/shap_feature_categories.csv')
- SIMBA_SHAP_IMG_PATH = PosixPath('assets/shap')
- SIMON_SMALL_IMG = PosixPath('assets/img/simon_n.webp')
- SINGLE_CLF_VALIDATION = PosixPath('frames/output/validation')
- SKLEARN_RESULTS = PosixPath('frames/output/sklearn_results')
- SPLASH_PATH_LINUX = PosixPath('assets/img/splash.PNG')
- SPLASH_PATH_MOVIE = PosixPath('assets/img/splash_2024.mp4')
- SPLASH_PATH_WINDOWS = PosixPath('assets/img/splash.png')
- SPONTANEOUS_ALTERNATION_VIDEOS_DIR = PosixPath('frames/output/spontanous_alternation')
- TARGETS_INSERTED_DIR = PosixPath('csv/targets_inserted')
- TEST_PATH = '/Users/simon/Desktop/envs/simba_dev/simba/'
- TOOLTIPS = PosixPath('assets/lookups/tooptips.json')
- UNSUPERVISED_MODEL_NAMES = PosixPath('assets/lookups/model_names.parquet')
- VIDEO_INFO = PosixPath('logs/video_info.csv')
- YOLO_SCHEMATICS_DIR = PosixPath('assets/lookups/yolo_schematics')
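Most members above are relative paths, so they need a base directory before they can be opened. The sketch below joins two of them onto a project directory with `pathlib`. It assumes that project-level entries such as `VIDEO_INFO` are relative to the `project_folder` directory (consistent with the `project_folder/logs/video_info.csv` path mentioned elsewhere in this reference); entries under `assets/` are presumably relative to the installed package instead. The `Paths` class here is a local mirror, the helper `resolve` is a hypothetical name, and the project path is made up.

```python
from enum import Enum
from pathlib import PurePosixPath


class Paths(Enum):
    # Local mirror of two members listed above; not imported from SimBA.
    VIDEO_INFO = PurePosixPath('logs/video_info.csv')
    MACHINE_RESULTS_DIR = PurePosixPath('csv/machine_results')


def resolve(project_path: str, member: Paths) -> PurePosixPath:
    """Hypothetical helper: join a relative enum path onto a project directory."""
    return PurePosixPath(project_path) / member.value


# '/data/my_project/project_folder' is an illustrative location only.
video_info = resolve('/data/my_project/project_folder', Paths.VIDEO_INFO)
results_dir = resolve('/data/my_project/project_folder', Paths.MACHINE_RESULTS_DIR)
print(video_info)
print(results_dir)
```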
- class simba.utils.enums.ROI_SETTINGS(value)
Bases: Enum
An enumeration.
- CIRCLE = 'circle'
- CLICK_SENSITIVITY = 10
- DUPLICATION_JUMP_SIZE = 20
- EAR_TAG_SIZE_OPTIONS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
- FONT = 0
- GREY_CLR = (128, 128, 128)
- KEYBOARD_SENSITIVITY = 3
- LINE_TYPE = -1
- LINE_TYPE_OPTIONS = [4, 8, 16, -1]
- OUTSIDE_ROI = 'OUTSIDE REGIONS OF INTEREST'
- OVERLAY_GRID_COLOR = (192, 192, 192)
- POLYGON = 'polygon'
- POLYGON_TOLERANCE = 2
- RECTANGLE = 'rectangle'
- ROI_SELECT_CLR = (105, 105, 105)
- ROI_TRACKING_STYLE = 'FALSE'
- SELECT_COLOR = 'red'
- SHAPE_THICKNESS_OPTIONS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
- SHOW_GRID_OVERLAY = 'FALSE'
- TEXT_THICKNESS = 2
- UNSELECT_COLOR = 'black'
- class simba.utils.enums.TagNames(value)
Bases: Enum
An enumeration.
- CLASS_INIT = 'CLASS_INIT'
- COMPLETE = 'complete'
- ERROR = 'error'
- GREETING = 'greeting'
- INFORMATION = 'information'
- STANDARD = 'standard'
- TRASH = 'trash'
- WARNING = 'warning'
- class simba.utils.enums.TestPaths(value)
Bases: Enum
An enumeration.
- CRITICAL_VALUES = '../simba/assets/lookups/critical_values_05.pickle'
- class simba.utils.enums.TextOptions(value)
Bases: Enum
An enumeration.
- BORDER_BUFFER_X = 5
- BORDER_BUFFER_Y = 10
- COLOR = (147, 20, 255)
- FIRST_LINE_SPACING = 2
- FLAMINGO = (172, 142, 252)
- FONT = 0
- FONT_SCALER = 0.8
- LINE_SPACING = 1
- LINE_THICKNESS = 2
- RADIUS_SCALER = 10
- RESOLUTION_SCALER = 1500
- SPACE_SCALER = 25
- TEXT_THICKNESS = 1
- WHITE = (255, 255, 255)
- class simba.utils.enums.TkBinds(value)
Bases: Enum
An enumeration.
- B1_MOTION = '<B1-Motion>'
- B1_PRESS = '<ButtonPress-1>'
- B1_RELEASE = '<ButtonRelease-1>'
- CTRL_LEFT_PRESS = '<KeyPress-Control_L>'
- CTRL_LEFT_RELEASE = '<KeyRelease-Control_L>'
- CTRL_RIGHT_PRESS = '<KeyPress-Control_R>'
- CTRL_RIGHT_RELEASE = '<KeyPress-Control_R>'
- DOWN = '<Down>'
- ENTER = '<Enter>'
- ESCAPE = '<Escape>'
- LEAVE = '<Leave>'
- LEFT = '<Left>'
- RIGHT = '<Right>'
- SHIFT_LEFT_PRESS = '<KeyPress-Shift_L>'
- SHIFT_LEFT_RELEASE = '<KeyRelease-Shift_L>'
- SHIFT_RIGHT_PRESS = '<KeyPress-Shift_R>'
- SHIFT_RIGHT_RELEASE = '<KeyRelease-Shift_R>'
- UP = '<Up>'
- class simba.utils.enums.UMAPParam(value)
Bases: Enum
An enumeration.
- HYPERPARAMETERS = ['n_neighbors', 'min_distance', 'spread', 'scaler', 'variance']
- MIN_DISTANCE = 'min_distance'
- N_NEIGHBORS = 'n_neighbors'
- SCALER = 'scaler'
- SPREAD = 'spread'
- VARIANCE = 'variance'
- class simba.utils.enums.UML(value)
Bases: Enum
An enumeration.
- ALL_FEATURES_EXCLUDING_POSE = 'ALL FEATURES (EXCLUDING POSE)'
- ALL_FEATURES_EX_POSE = 'ALL FEATURES (EXCLUDING POSE)'
- ALL_FEATURES_INCLUDING_POSE = 'ALL FEATURES (INCLUDING POSE)'
- ALPHA = 'alpha'
- BOUTS_FEATURES = 'BOUTS_FEATURES'
- BOUTS_TARGETS = 'BOUTS_TARGETS'
- BOUT_AGGREGATION_TYPE = 'bout_aggregation_type'
- CLASSIFIER = 'CLASSIFIER'
- CLF_SLICE_SELECTION = 'clf_slice'
- CLUSTER_MODEL = 'CLUSTER_MODEL'
- COLLINEAR_FIELDS = 'COLLINEAR_FIELDS'
- CSV = 'CSV'
- DATA = 'DATA'
- DATASET_DATA_FIELDS = ['FRAME_FEATURES', 'FRAME_POSE', 'FRAME_TARGETS', 'BOUTS_FEATURES', 'BOUTS_TARGETS']
- DATA_SLICE_SELECTION = 'data_slice'
- DR_MODEL = 'DR_MODEL'
- END_FRAME = 'END_FRAME'
- EPSILON = 'cluster_selection_epsilon'
- EUCLIDEAN = 'euclidean'
- FEATURES = 'FEATURES'
- FEATURE_NAMES = 'FEATURE_NAMES'
- FEATURE_PATH = 'feature_path'
- FIT_KEYS = ('n_neighbors', 'min_distance', 'spread')
- FORMAT = 'format'
- FRAME = 'FRAME'
- FRAME_FEATURES = 'FRAME_FEATURES'
- FRAME_POSE = 'FRAME_POSE'
- FRAME_TARGETS = 'FRAME_TARGETS'
- HASHED_NAME = 'HASH'
- HDBSCAN = 'HDBSCAN'
- HYPERPARAMETERS = ['n_neighbors', 'min_distance', 'spread', 'scaler', 'variance']
- LOW_VARIANCE_FIELDS = 'LOW_VARIANCE_FIELDS'
- METHODS = 'METHODS'
- MIN_BOUT_LENGTH = 'min_bout_length'
- MIN_CLUSTER_SIZE = 'min_cluster_size'
- MIN_DISTANCE = 'min_distance'
- MIN_MAX = 'MIN-MAX'
- MIN_SAMPLES = 'min_samples'
- MODEL = 'MODEL'
- MULTICOLLINEARITY = 'multicollinearity'
- MULTICOLLINEARITY_THRESHOLD = 'MULTICOLLINEARITY_THRESHOLD'
- NAMES = 'NAMES'
- N_NEIGHBORS = 'n_neighbors'
- PARAMETERS = 'PARAMETERS'
- PROBABILITY = 'PROBABILITY'
- QUANTILE = 'QUANTILE'
- RAW = 'RAW'
- SCALED = 'scaled'
- SCALED_DATA = 'SCALED_DATA'
- SCALED_TRAIN_DATA = 'SCALED_TRAIN_DATA'
- SCALER = 'scaler'
- SCALER_TYPE = 'SCALER_TYPE'
- SPREAD = 'spread'
- STANDARD = 'STANDARD'
- START_FRAME = 'START_FRAME'
- TRAIN_DATA = 'TRAIN_DATA'
- TSNE = 'TSNE'
- UMAP = 'UMAP'
- UNSCALED_TRAIN_DATA = 'UNSCALED_TRAIN_DATA'
- USER_DEFINED_SET = 'USER-DEFINED FEATURE SET'
- VARIANCE = 'variance'
- VARIANCE_THRESHOLD = 'VARIANCE_THRESHOLD'
- VIDEO = 'VIDEO'
SimBA Errors
- exception simba.utils.errors.AdvancedLabellingError(frame: str, lbl_lst: list, unlabel_lst: list, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.AnimalNumberError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.AnnotationFileNotFoundError(video_name: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.ArrayError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.BodypartColumnNotFoundError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.ClassifierInferenceError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.ColumnNotFoundError(column_name: str, file_name: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.CorruptedFileError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.CountError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.CropError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.DataHeaderError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.DirectoryExistError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.DirectoryNotEmptyError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.DuplicationError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.FFMPEGCodecGPUError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.FFMPEGNotFoundError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.FaultyTrainingSetError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.FeatureNumberMismatchError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.FileExistError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.FloatError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.FrameRangeError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.IntegerError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.InvalidFileTypeError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.InvalidFilepathError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.InvalidHyperparametersFileError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.InvalidInputError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.InvalidVideoFileError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.MissingColumnsError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.MissingProjectConfigEntryError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.MixedMosaicError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.NoChoosenClassifierError(source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.NoChoosenMeasurementError(source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.NoChoosenROIError(source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.NoDataError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.NoFilesFoundError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.NoROIDataError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.NoSpecifiedOutputError(msg: str, source: str = '', show_window: bool = True)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.NotDirectoryError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.ParametersFileError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.PermissionError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.ROICoordinatesNotFoundError(expected_file_path: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.ResolutionError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.SamplingError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.SimBAGPUError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.SimBAModuleNotFoundError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.SimBAPAckageVersionError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.SimbaError(msg: str, source: str = ' ', show_window: bool = False)[source]ο
Bases:
Exception
- exception simba.utils.errors.StringError(msg: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.ThirdPartyAnnotationEventCountError(video_name: str, clf_name: str, start_event_cnt: int, stop_event_cnt: int, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.ThirdPartyAnnotationFileNotFoundError(video_name: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.ThirdPartyAnnotationOverlapError(video_name: str, clf_name: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.ThirdPartyAnnotationsAdditionalClfError(video_name: str, clf_names: list, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.ThirdPartyAnnotationsClfMissingError(video_name: str, clf_name: str, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.ThirdPartyAnnotationsFpsConflictError(video_name: str, annotation_fps: int, video_fps: int, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.ThirdPartyAnnotationsMissingAnnotationsError(video_name: str, clf_names: list, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
- exception simba.utils.errors.ThirdPartyAnnotationsOutsidePoseEstimationDataError(video_name: str, frm_cnt: int, clf_name: Optional[str] = None, annotation_frms: Optional[int] = None, first_error_frm: Optional[int] = None, ambiguous_cnt: Optional[int] = None, source: str = '', show_window: bool = False)[source]ο
Bases:
SimbaError
Lookups
Bases:
object
Counter that can be shared across processes on different cores.
- simba.utils.lookups.cardinality_to_integer_lookup() Dict[str, int][source]ο
Create dictionary that maps cardinal compass directions to integers.
- Example
>>> data = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
>>> [cardinality_to_integer_lookup()[d] for d in data]
[0, 1, 2, 3, 4, 5, 6, 7]
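A lookup of this kind can be sketched as a dictionary comprehension over the eight compass points. The function names below are stand-ins rather than SimBA's implementation, and the integer order is assumed from the example above; the reverse mapping is the same dictionary inverted.

```python
from typing import Dict

# Compass points in clockwise order starting at north (order assumed from the example).
DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def direction_to_int() -> Dict[str, int]:
    """Map each compass direction to its position in DIRECTIONS (hypothetical helper)."""
    return {name: idx for idx, name in enumerate(DIRECTIONS)}


def int_to_direction() -> Dict[int, str]:
    """Invert the mapping so integers map back to compass directions."""
    return {idx: name for name, idx in direction_to_int().items()}
```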
- simba.utils.lookups.check_for_updates(time_out: int = 2)[source]ο
Check for SimBA package updates by querying PyPI and comparing with the installed version.
Fetches the latest SimBA version from PyPI and compares it with the currently installed version. Prints an informational message indicating whether an update is available or if the installation is up-to-date. Requires an active internet connection to query PyPI.
- Parameters
time_out (int) – Timeout in seconds for the PyPI API request. Default is 2 seconds. Must be at least 1 second.
- Returns
None. Prints update information to stdout via stdout_information.
- Raises
SimBAPAckageVersionError – If the latest version cannot be fetched from PyPI, or if the local SimBA version cannot be determined.
- Example
>>> check_for_updates()
>>> # Prints: "UP-TO-DATE. You have the latest SimBA version (1.0.0)."
>>> # or: "NEW SimBA VERSION AVAILABLE. You have SimBA version 1.0.0. The latest version is 1.1.0..."
- simba.utils.lookups.create_color_palettes(no_animals: int, map_size: int) List[List[int]][source]ο
Create a list of lists of BGR colors, one list per animal. Each list is drawn from a different matplotlib color map.
- Parameters
- Return List[List[int]]
BGR colors
- Example
>>> create_color_palettes(no_animals=2, map_size=2)
[[[255.0, 0.0, 255.0], [0.0, 255.0, 255.0]], [[102.0, 127.5, 0.0], [102.0, 255.0, 255.0]]]
- simba.utils.lookups.create_directionality_cords(bp_dict: dict, left_ear_name: str, nose_name: str, right_ear_name: str) dict[source]ο
Helper to create a dictionary mapping animal body-parts (nose, left ear, right ear) to their X and Y coordinate column names for directionality analysis.
- Parameters
bp_dict (dict) – Dictionary with animal names as keys and body-part coordinate information as values. Expected to contain 'X_bps' and 'Y_bps' keys with lists of column names.
left_ear_name (str) – Name of the left ear body-part to search for in coordinate column names.
nose_name (str) – Name of the nose body-part to search for in coordinate column names.
right_ear_name (str) – Name of the right ear body-part to search for in coordinate column names.
- Returns
Nested dictionary with animal names as keys, body-part types (nose, ear_left, ear_right) as second-level keys, and coordinate types (X_bps, Y_bps) as third-level keys with corresponding column names as values.
- Return type
- Raises
InvalidInputError – If any required body-part or coordinate cannot be found in the input dictionary.
- Example
>>> bp_dict = {'Animal_1': {'X_bps': ['Animal_1_Nose_x', 'Animal_1_Ear_left_x', 'Animal_1_Ear_right_x'], 'Y_bps': ['Animal_1_Nose_y', 'Animal_1_Ear_left_y', 'Animal_1_Ear_right_y']}}
>>> create_directionality_cords(bp_dict=bp_dict, left_ear_name='Ear_left', nose_name='Nose', right_ear_name='Ear_right')
{'Animal_1': {'nose': {'X_bps': 'Animal_1_Nose_x', 'Y_bps': 'Animal_1_Nose_y'}, 'ear_left': {'X_bps': 'Animal_1_Ear_left_x', 'Y_bps': 'Animal_1_Ear_left_y'}, 'ear_right': {'X_bps': 'Animal_1_Ear_right_x', 'Y_bps': 'Animal_1_Ear_right_y'}}}
- simba.utils.lookups.find_best_multi_animal_assignment_frame(h5_path: Union[str, PathLike], expected_animals: int, strategy: typing_extensions.Literal['longest_run_middle', 'first'] = 'longest_run_middle', min_bodyparts_per_animal: int = 1) Optional[int][source]ο
Find a frame index suitable for the SimBA multi-animal identity-assignment UI.
Scans a DeepLabCut multi-animal H5 (e.g. _el.h5 / _full.h5) and returns a frame index where all expected_animals individuals have at least min_bodyparts_per_animal non-NaN body-part detections. Useful for jumping the multi-animal assignment UI straight to a frame where every animal is clearly tracked, skipping the manual 'x'-stepping loop in simba.mixins.pose_importer_mixin.PoseImporterMixin.multianimal_identification().
The recommendation can be used as the initial_frame_no argument to simba.pose_importers.superanimal_import.SuperAnimalTopViewImporter (or any other multi-animal importer that exposes the same parameter).
Note
The function expects a modern DLC PyTorch / multi-animal pandas H5 layout with at least an individuals column level. Single-animal files and legacy DLC TF files without individuals cannot be analysed this way and return None with a warning.
- Parameters
h5_path (Union[str, os.PathLike]) – Path to a DLC multi-animal H5 file with an individuals column level (typically modern DLC PyTorch backend output).
expected_animals (int) – Number of animals the SimBA project is configured for, i.e. the number of distinct individuals that must all be simultaneously detected on the returned frame. Must be >= 1.
strategy (Literal['longest_run_middle', 'first']) – How to pick among candidate frames. 'longest_run_middle' (default) returns the midpoint of the longest consecutive run of frames where all animals meet the body-part threshold (most robust for the assignment UI). 'first' returns the first qualifying frame.
min_bodyparts_per_animal (int) – Minimum number of non-NaN body-parts that each animal must have on a candidate frame. Default 1 reproduces the original "at least one body-part visible per animal" behaviour. Higher values yield frames where animals are more completely tracked, which makes click-based identity assignment more reliable (e.g. for SuperAnimal-TopView with 27 body parts per animal, min_bodyparts_per_animal=14 requires that more than half of every animal's body-parts are tracked on the returned frame).
- Returns
Frame index recommended for the assignment UI, or None if no frame in the file satisfies the constraint, or if the file does not contain a multi-animal layout.
- Return type
Optional[int]
- Example
>>> frame = find_best_multi_animal_assignment_frame(
...     h5_path=r'G:\projects\edmayelle\raw_data\HCS17_..._el.h5',
...     expected_animals=5,
... )
>>> # frame == 3313 (middle of the longest 5-mice run)
- Example: require >= 10 body-parts per animal for higher-quality assignment frames
>>> frame = find_best_multi_animal_assignment_frame(
...     h5_path=..., expected_animals=5, min_bodyparts_per_animal=10)
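The two selection strategies can be illustrated on a plain NumPy array. This is a minimal sketch, not SimBA's code: it assumes the per-frame, per-animal counts of non-NaN body-parts have already been extracted into a (frames x animals) array, and the helper name pick_frame is hypothetical.

```python
from typing import Optional

import numpy as np


def pick_frame(valid: np.ndarray, strategy: str = "longest_run_middle") -> Optional[int]:
    """Pick a frame index from a boolean per-frame mask (hypothetical helper)."""
    valid = np.asarray(valid, dtype=bool)
    if not valid.any():
        return None  # no frame satisfies the constraint
    if strategy == "first":
        return int(np.argmax(valid))  # index of the first True
    # Pad with False on both sides so every run has a rising and a falling edge.
    padded = np.concatenate(([False], valid, [False])).astype(np.int8)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # first frame of each run
    ends = np.flatnonzero(edges == -1)    # one past the last frame of each run
    longest = int(np.argmax(ends - starts))
    return int((starts[longest] + ends[longest] - 1) // 2)


# Toy data: 7 frames, 2 animals, values are tracked body-part counts (made up).
counts = np.array([[3, 3], [0, 3], [2, 2], [2, 5], [4, 4], [0, 0], [1, 1]])
min_bodyparts_per_animal = 2
valid = (counts >= min_bodyparts_per_animal).all(axis=1)
```

Here frames 2 to 4 form the longest qualifying run, so the midpoint strategy picks frame 3 while 'first' picks frame 0.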
- simba.utils.lookups.find_closest_string(target: str, string_list: List[str], case_sensitive: bool = False, token_based: bool = True) Optional[Tuple[str, Union[int, float]]][source]ο
Find the closest string in a list to a target string using hybrid similarity matching.
This function uses a combination of token-based matching and Levenshtein distance to find the best match. Token-based matching is particularly useful for strings like body part names where word order may vary (e.g., 'Left_ear' vs 'Ear_left').
- Parameters
target (str) – The target string to match against.
string_list (List[str]) – List of strings to search through.
case_sensitive (bool) – If True, comparison is case-sensitive. If False (default), comparison is case-insensitive.
token_based (bool) – If True (default), uses hybrid token-based and Levenshtein matching which handles word reordering better. If False, uses pure Levenshtein distance only.
- Returns
Tuple of (closest_string, distance) or None if string_list is empty. When token_based=True, distance is a float score (lower is better). When token_based=False, distance is integer edit distance.
- Return type
- Example
>>> find_closest_string("cat", ["dog", "car", "bat"])
('car', 0.33)
>>> find_closest_string("Left_ear", ["Ear_left", "Right_ear", "Nose"])
('Ear_left', 0.0)
>>> find_closest_string("CAT", ["dog", "car", "bat"], case_sensitive=False)
('car', 0.33)
>>> find_closest_string("CAT", ["dog", "car", "bat"], case_sensitive=True, token_based=False)
('car', 3)
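To make the two ingredients concrete, the sketch below pairs a textbook Levenshtein distance with a simple token normalisation that sorts underscore-separated words. It is a simplified illustration under stated assumptions, not SimBA's scoring: the helper names are hypothetical, distances are plain integers rather than the float score described above, and ties go to the earliest candidate.

```python
from typing import List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def sort_tokens(s: str) -> str:
    """Lower-case and sort underscore-separated tokens so word order is ignored."""
    return "_".join(sorted(s.lower().split("_")))


def closest(target: str, candidates: List[str], token_based: bool = True) -> Optional[Tuple[str, int]]:
    """Return (best_candidate, distance), or None for an empty candidate list."""
    if not candidates:
        return None
    norm = sort_tokens if token_based else str.lower
    scored = [(c, levenshtein(norm(target), norm(c))) for c in candidates]
    return min(scored, key=lambda pair: pair[1])  # first minimum wins on ties
```

With token sorting, 'Left_ear' and 'Ear_left' normalise to the same string, so their distance is zero regardless of word order.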
- simba.utils.lookups.get_body_part_configurations() Dict[str, Union[str, PathLike]][source]ο
Return dict with named body-part schematics of pose-estimation schemas in SimBA installation as keys, and paths to the images representing those body-part schematics as values.
- simba.utils.lookups.get_bp_config_code_class_pairs() Dict[str, object][source]ο
Helper to match SimBA project_config.ini [create ensemble settings][pose_estimation_body_parts] setting to feature extraction module class.
- simba.utils.lookups.get_bp_config_codes() Dict[str, str][source]ο
Helper to match SimBA project_config.ini [create ensemble settings][pose_estimation_body_parts] to string names.
- simba.utils.lookups.get_color_dict() Dict[str, Tuple[int, int, int]][source]ο
Get dict of color names as keys and RGB tuples as values
- simba.utils.lookups.get_display_resolution() Tuple[int, int][source]ο
Helper to get main monitor / display resolution.
Note
May return the virtual geometry in multi-display setups. To return the resolution of each available monitor in mosaic, see simba.utils.lookups.get_monitor_info().
- simba.utils.lookups.get_emojis() Dict[str, str][source]ο
Helper to get dictionary of emojis with names as keys and emojis as values. Note, the same emojis are represented differently in different python versions.
- simba.utils.lookups.get_ext_codec_map() Dict[str, str][source]ο
Get a dictionary mapping video file extensions to their recommended FFmpeg codecs. Automatically falls back to alternative codecs if the preferred codec is not available.
- Returns
Dictionary mapping file extensions (without leading dot) to codec names.
- Return type
- Example
>>> codec_map = get_ext_codec_map()
>>> codec = codec_map.get('webm', 'libx264')  # Returns 'libvpx-vp9' or fallback
- simba.utils.lookups.get_ffmpeg_codec(file_name: Union[str, PathLike], fallback: str = 'mpeg4') str[source]ο
Get the recommended FFmpeg codec for a video file based on its extension.
- Parameters
file_name (Union[str, os.PathLike]) – Path to video file or file extension.
fallback (str) – Codec to return if file extension is not recognized. Default: 'mpeg4'.
- Returns
Recommended FFmpeg codec name for the video file.
- Return type
- Example
>>> codec = get_ffmpeg_codec(file_name='video.mp4')
>>> codec = get_ffmpeg_codec(file_name='video.webm', fallback='libx264')
>>> codec = get_ffmpeg_codec(file_name=r'C:/videos/my_video.avi')
- simba.utils.lookups.get_ffmpeg_encoders(raise_error: bool = True, alphabetically_sorted: bool = False) List[str][source]ο
Get a list of all available FFmpeg encoders.
- Parameters
raise_error (bool) – If True, raises an exception when FFmpeg is not available or the command fails. If False, returns an empty list on error. Default: True.
- Returns
List of encoder names (e.g., ['libx264', 'aac', 'libvpx', ...]). Returns empty list if FFmpeg is unavailable and raise_error=False.
- Return type
List[str]
- Example
>>> codecs = get_ffmpeg_encoders()
>>> print(Formats.BATCH_CODEC.value in codecs)
- simba.utils.lookups.get_fonts(sort_alphabetically: bool = False)[source]ο
Returns a dictionary of all fonts available on the OS, with font names as keys and font paths as values.
- simba.utils.lookups.get_icons_paths() Dict[str, Union[str, PathLike]][source]ο
Helper to get dictionary with icons with the icon names as keys (grabbed from file-name) and their file paths as values.
- simba.utils.lookups.get_img_resize_info(img_size: Tuple[int, int], display_resolution: Optional[Tuple[int, int]] = None, max_height_ratio: float = 0.5, max_width_ratio: float = 0.5, min_height_ratio: float = 0.0, min_width_ratio: float = 0.0) Tuple[int, int, float, float][source]ο
Calculates the new dimensions and scaling factors needed to resize an image while preserving its aspect ratio so that it fits within a given portion of the display resolution.
- Parameters
img_size (Tuple[int, int]) – The original size of the image as (width, height).
display_resolution (Optional[Tuple[int, int]]) – Optional resolution of the display as (width, height). If None, the resolution of the main monitor is used.
max_height_ratio (float) – The maximum allowed height of the image as a fraction of the display height (default is 0.5).
max_width_ratio (float) – The maximum allowed width of the image as a fraction of the display width (default is 0.5).
- Returns
Length 4 tuple with resized width, resized height, downscale factor, and upscale factor.
- Return type
Tuple[int, int, float, float]
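The aspect-preserving fit can be sketched as picking the single scale factor that satisfies both the width and height limits. This is an illustrative simplification, not the SimBA implementation: the name fit_within is hypothetical, only the maximum ratios are handled (the minimum ratios are ignored), and one scale factor is returned instead of separate downscale and upscale factors.

```python
from typing import Tuple


def fit_within(img_size: Tuple[int, int],
               display_resolution: Tuple[int, int],
               max_width_ratio: float = 0.5,
               max_height_ratio: float = 0.5) -> Tuple[int, int, float]:
    """Shrink (width, height) to fit a fraction of the display, keeping aspect ratio."""
    width, height = img_size
    disp_w, disp_h = display_resolution
    max_w = disp_w * max_width_ratio
    max_h = disp_h * max_height_ratio
    # The tighter of the two constraints decides; never enlarge in this sketch.
    scale = min(max_w / width, max_h / height, 1.0)
    return int(round(width * scale)), int(round(height * scale)), scale
```

For a 3840x2160 image on a 1920x1080 display with the default ratios, both constraints give a scale of 0.25, so the result is 960x540.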
- simba.utils.lookups.get_labelling_img_kbd_bindings() dict[source]ο
Returns dictionary of tkinter keyboard bindings.
Note
Change kbd values to change keyboard shortcuts.
- Some possible examples:
<Key>, <KeyPress>, <KeyRelease>: Binds to any key press or release.
<KeyPress-A>, <Key-a>: Binds to the 'a' key press (case sensitive).
<Up>, <Down>, <Left>, <Right>: Binds to the arrow keys.
<Control-KeyPress-A>, <Control-a>: Binds to Ctrl + A or Ctrl + a.
- simba.utils.lookups.get_labelling_video_kbd_bindings() dict[source]ο
Returns a dictionary of OpenCV-compatible keyboard bindings for video labeling.
Notes
Change the kbd values to customize keyboard shortcuts.
OpenCV key codes differ from Tkinter bindings (see get_labelling_img_kbd_bindings).
Use either single-character strings (e.g. 'p') or integer ASCII codes (e.g. 32 for space bar).
Examples
- Remap space bar to Pause/Play:
{'Pause/Play': {'label': 'Space = Pause/Play', 'kbd': 32}}
- simba.utils.lookups.get_meta_data_file_headers() List[str][source]ο
Get List of headers for SimBA classifier metadata output.
- Return List[str]
- simba.utils.lookups.get_monitor_info() Tuple[Dict[int, Dict[str, int]], Tuple[int, int]][source]ο
Helper to get main monitor / display resolution.
Note
Returns a dict containing the resolution of each available monitor, and a tuple with the main monitor width and height. To get the virtual geometry, see simba.utils.lookups.get_display_resolution().
- simba.utils.lookups.get_nvdec_count(gpu_name: Optional[str] = None) int[source]ο
Return the number of concurrent NVDEC (hardware video decode) sessions typical for the GPU model.
Note
When gpu_name is None, the first GPU name reported by nvidia-smi is used. Matching is done by substring: the longest dictionary key contained in gpu_name wins, so shorter names do not shadow longer ones (e.g. RTX 4070 Ti Super before RTX 4070 Ti). Unknown or unmatched GPUs return 1.
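The longest-key-wins rule can be shown with a small standalone helper. The sketch below is an assumption-laden illustration: longest_key_match is a hypothetical name, and the table values are placeholders chosen only to tell the two keys apart, not real NVDEC session counts.

```python
from typing import Dict

# Placeholder table: the numbers are made up for illustration only.
EXAMPLE_TABLE: Dict[str, int] = {
    "RTX 4070 Ti": 10,
    "RTX 4070 Ti Super": 20,
}


def longest_key_match(gpu_name: str, table: Dict[str, int], default: int = 1) -> int:
    """Return the value of the longest key that is a substring of gpu_name."""
    matches = [key for key in table if key in gpu_name]
    if not matches:
        return default  # unknown or unmatched name
    return table[max(matches, key=len)]
```

Both keys are substrings of 'NVIDIA GeForce RTX 4070 Ti Super', but the longer key is selected, so the shorter model name cannot shadow it.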
- simba.utils.lookups.get_random_color_palette(n_colors: int)[source]ο
Get a random color palette with N random colors.
- simba.utils.lookups.get_table(data: Dict[str, Any], headers: Optional[Tuple[str, str]] = ('SETTING', 'VALUE'), tablefmt: str = 'grid') str[source]ο
Create a formatted table string from dictionary data using the tabulate library.
Converts a dictionary into a formatted table string suitable for display or printing. Each key-value pair in the dictionary becomes a row in the table.
- Parameters
data (Dict[str, Any]) – Dictionary containing the data to be formatted as a table. Keys become the first column, values become the second column.
headers (Optional[Tuple[str, str]]) – Tuple of two strings representing the column headers. Default is ('SETTING', 'VALUE').
tablefmt (Literal["grid"]) – Table format style. For options, see simba.utils.enums.Formats.VALID_TABLEFMT
- Return str
Formatted table string ready for display or printing.
- Example
>>> data = {"fps": 30, "width": 1920, "height": 1080, "frame_count": 3000}
>>> table = get_table(data=data, headers=("PARAMETER", "VALUE"))
- simba.utils.lookups.get_third_party_appender_file_formats() Dict[str, str][source]ο
Helper to get dictionary that maps different third-party annotation tools with different file formats.
- simba.utils.lookups.integer_to_cardinality_lookup()[source]ο
Create dictionary that maps integers to cardinal compass directions.
- simba.utils.lookups.intermittent_palette(n: int = 10, base_light: float = 0.55, contrast_delta: float = 0.18, seed_hue: Optional[float] = None, output: typing_extensions.Literal['rgb', 'rgb255', 'hex'] = 'rgb', rng: Optional[Random] = None) Union[List[Tuple[float, float, float]], List[Tuple[int, int, int]], List[str]][source]ο
Generate a categorical colour palette with evenly spaced hues and alternating lightness.
Note
Use to get a colour palette where adjacent colours are distinct.
- Parameters
n (int) – Number of colours to generate. Must be greater than or equal to 1.
base_light (float) – Midpoint HSV value (0-1) used as the baseline lightness. Default 0.55.
contrast_delta (float) – Lightness offset added/subtracted per colour to improve visual separation. Default 0.18.
seed_hue (Optional[float]) – Initial hue (0-1). If None, a random hue is sampled. Default None.
output (str) – Output colour format. One of {"rgb", "rgb255", "hex"}. Default "rgb".
rng (Optional[random.Random]) – Optional pre-seeded RNG for reproducible random starts.
- Returns
Colour palette in the requested format (RGB floats, RGB 0-255 integers, or hexadecimal strings).
- Return type
Union[List[Tuple[float, float, float]], List[Tuple[int, int, int]], List[str]]
- Example
>>> palette = intermittent_palette(n=6, output="hex")
>>> palette
['#a33f46', '#51a5df', '#b36824', '#4dbd9f', '#c749b4', '#7a9a3e']
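The idea of evenly spaced hues with alternating lightness can be sketched with the standard-library colorsys module. This is not the SimBA implementation: alternating_palette is a hypothetical name, the fixed saturation of 0.65 is an assumption, and only hex output is produced, so the colours will not match the example above.

```python
import colorsys
import random
from typing import List, Optional


def alternating_palette(n: int = 10,
                        base_light: float = 0.55,
                        contrast_delta: float = 0.18,
                        seed_hue: Optional[float] = None,
                        saturation: float = 0.65) -> List[str]:
    """Evenly spaced hues whose HSV value alternates around base_light."""
    if n < 1:
        raise ValueError("n must be >= 1")
    start = random.random() if seed_hue is None else seed_hue
    colours = []
    for i in range(n):
        hue = (start + i / n) % 1.0  # step around the hue circle
        offset = contrast_delta if i % 2 == 0 else -contrast_delta
        value = min(1.0, max(0.0, base_light + offset))  # clamp to [0, 1]
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
        colours.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colours


palette = alternating_palette(n=6, seed_hue=0.0)
```

Passing seed_hue makes the result deterministic; leaving it as None starts from a random hue on each call.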
- simba.utils.lookups.load_simba_fonts()[source]ο
Load fonts defined in simba.utils.enums.FontPaths into memory
- simba.utils.lookups.percent_to_crf_lookup() Dict[str, int][source]ο
Create dictionary that matches human-readable percent values to FFmpeg Constant Rate Factor (CRF) values that regulate video quality in CPU codecs. Higher CRF values translate to lower video quality and reduced file sizes.
- simba.utils.lookups.percent_to_qv_lk()[source]ο
Create dictionary that matches human-readable percent values to the FFmpeg quality scores that regulate video quality in CPU codecs. Higher FFmpeg quality scores map to smaller, lower-quality videos. Used in some AVI codecs such as 'divx' and 'mjpeg'.
- simba.utils.lookups.print_video_meta_data(data_path: Union[str, PathLike]) None[source]ο
Print video metadata as formatted tables to the console.
This function reads video metadata from either a single video file or all video files in a directory, then prints the metadata as formatted tables.
See also
To get video metadata as a dictionary without printing, use simba.utils.read_write.get_video_meta_data(). To get video metadata as a table without printing, use simba.utils.lookups.get_table().
- Parameters
data_path (Union[str, os.PathLike]) – Path to video file or directory containing videos.
- Returns
None. Video metadata is printed as formatted tables in the main console.
SimBA Printing
- class simba.utils.printing.SimbaTimer(start: bool = False, perf_counter: bool = False)[source]ο
Bases:
object
Timer class for keeping track of start and end-times of calls.
- simba.utils.printing.log_event(logger_name: str, log_type: typing_extensions.Literal['CLASS_INIT', 'error', 'warning'], msg: str)[source]ο
- simba.utils.printing.stdout_information(msg: str, source: Optional[str] = '', elapsed_time: Optional[str] = None) None[source]ο
Helper to parse information msg to SimBA main interface. E.g., how many monitors are available and their resolutions.
- simba.utils.printing.stdout_success(msg: str, source: Optional[str] = '', elapsed_time: Optional[str] = None) None[source]ο
Helper to parse msg of completed operation to SimBA main interface.
- simba.utils.printing.stdout_trash(msg: str, source: Optional[str] = '', elapsed_time: Optional[str] = None) None[source]ο
Helper to parse msg of delete operation to SimBA main interface.
Reading and writing
- simba.utils.read_write.archive_processed_files(config_path: Union[str, PathLike], archive_name: str) None[source]ο
Archive files within a SimBA project.
- Parameters
See also
- Example
>>> archive_processed_files(config_path='project_folder/project_config.ini', archive_name='my_archive')
- simba.utils.read_write.bento_file_reader(file_path: Union[str, PathLike], fps: Optional[float] = None, orient: Optional[typing_extensions.Literal['index', 'columns']] = 'index', save_path: Optional[Union[str, PathLike]] = None, raise_error: Optional[bool] = False, log_setting: Optional[bool] = False) Union[None, Dict[str, DataFrame]][source]ο
Reads a BENTO annotation file and processes it into a dictionary of DataFrames, each representing a classified behavior. Optionally, the results can be saved to a specified path.
The function handles both frame-based and second-based annotations, converting the latter to frame-based annotations if the frames-per-second (FPS) is provided or can be inferred from the file.
- Parameters
file_path (Union[str, os.PathLike]) – Path to the BENTO annotation file.
fps (Optional[float]) – Frames per second (FPS) for converting second-based annotations to frames. If not provided, the function will attempt to infer FPS from the file. If FPS is required and cannot be inferred, an error is raised.
save_path (Optional[Union[str, os.PathLike]]) – Path to save the processed results as a pickle file. If None, results are returned instead of saved.
- Returns
A dictionary where the keys are classifier names and the values are DataFrames with 'START' and 'STOP' columns representing the start and stop frames of each behavior.
- Return type
Dict[str, pd.DataFrame]
- Example
>>> bento_file_reader(file_path=r"C:/troubleshooting/bento_test/bento_files/20240812_crumpling3.annot")
- simba.utils.read_write.bgr_to_rgb_tuple(value: Tuple[int, int, int]) Tuple[int, int, int][source]ο
Convert a BGR tuple to an RGB tuple.
- simba.utils.read_write.check_if_hhmmss_timestamp_is_valid_part_of_video(timestamp: str, video_path: Union[str, PathLike]) None[source]ο
Helper to check that a timestamp in HH:MM:SS format is a valid timestamp in a video file.
- Parameters
- Raises
FrameRangeError – If timestamp is not in the video file. E.g., timestamp 00:01:00 will raise FrameRangeError if the video is 59s long.
- Example
>>> check_if_hhmmss_timestamp_is_valid_part_of_video(timestamp='01:00:05', video_path='/Users/simon/Desktop/video_tests/Together_1.avi')
"FrameRangeError: The timestamp '01:00:05' does not occur in video Together_1.avi, the video has length 10s"
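The underlying comparison amounts to converting HH:MM:SS to seconds and checking it against the video length. The sketch below works on a known length in seconds instead of reading a video file; both helper names are hypothetical, and treating a timestamp equal to the video length as valid is an assumption.

```python
def hhmmss_to_seconds(timestamp: str) -> int:
    """Convert an 'HH:MM:SS' string to a total number of seconds."""
    hours, minutes, seconds = (int(part) for part in timestamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def timestamp_in_video(timestamp: str, video_length_s: float) -> bool:
    """True if the timestamp falls within a video of the given length (assumed inclusive)."""
    return hhmmss_to_seconds(timestamp) <= video_length_s
```

A caller that wants the documented behaviour would raise an error when timestamp_in_video returns False, for example for '00:01:00' against a 59 s video.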
- simba.utils.read_write.clean_sleap_file_name(filename: str) str[source]ο
Clean a SLEAP input filename by removing the '.analysis' suffix, the video number, and the project name prefix, to match the original video name.
Note
Modified from vtsai881.
- Parameters
filename (str) – The original filename to be cleaned to match video name.
- Returns str
The cleaned filename.
- Example
>>> clean_sleap_file_name("projectname.v00x.00x_videoname.analysis.csv")
'videoname.csv'
>>> clean_sleap_file_name("projectname.v00x.00x_videoname.analysis.h5")
'videoname.h5'
- simba.utils.read_write.clean_sleap_filenames_in_directory(dir: Union[str, PathLike], verbose: bool = False) None[source]ο
Clean up SLEAP input filenames in the specified directory by removing a prefix and a suffix, and renaming the files to match the names of the original video files.
Note
Modified from vtsai881.
- Parameters
dir (Union[str, os.PathLike]) – The directory path where the SLEAP CSV or H5 files are located.
- Example
>>> clean_sleap_filenames_in_directory(dir='/Users/simon/Desktop/envs/troubleshooting/Hornet_SLEAP/import/')
- simba.utils.read_write.concatenate_videos_in_folder(in_folder: Union[str, PathLike, bytes], save_path: Union[str, PathLike], file_paths: Optional[List[Union[str, PathLike]]] = None, video_format: Optional[str] = 'mp4', substring: Optional[str] = None, remove_splits: Optional[bool] = True, gpu: Optional[bool] = False, fps: Optional[Union[int, str]] = None, verbose: bool = True) → None
Concatenate (temporally) all video files in a folder into a single video.
Important
Input video parts are joined in alphanumeric order, so the files should ideally have sequentially numbered file names, e.g., 1.mp4, 2.mp4, ….
Note
If substring and file_paths are both not None, then file_paths will be filtered and only the file paths containing substring will be retained.
- Parameters
in_folder (Union[str, os.PathLike]) – Path to folder holding un-concatenated video files.
save_path (Union[str, os.PathLike]) – Path where the output file is saved. Note: if the path exists, it will be overwritten.
file_paths (Optional[List[Union[str, os.PathLike]]]) – If not None, the files that should be joined. If None, all files are joined. Default: None.
video_format (Optional[str]) – Format of the input video clips in in_folder that should be concatenated. Default: mp4.
substring (Optional[str]) – If a string, only videos in in_folder with a filename that contains substring will be joined. If None, all are joined. Default: None.
remove_splits (Optional[bool]) – If True, the input splits in in_folder will be removed following concatenation. Default: True.
- Return type
None
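The ordering note matters because alphanumeric sorting of strings is not the same as numeric sorting. The sketch below assumes plain lexicographic sorting (how SimBA orders the parts internally is not stated here) and uses a hypothetical helper, `order_video_parts`, to show why zero-padded file names are the safer choice and how a substring filter narrows the list.

```python
from typing import List, Optional


def order_video_parts(file_names: List[str], substring: Optional[str] = None) -> List[str]:
    """Filter by substring (if given) and sort lexicographically."""
    if substring is not None:
        file_names = [name for name in file_names if substring in name]
    return sorted(file_names)


print(order_video_parts(["2.mp4", "10.mp4", "1.mp4"]))    # ['1.mp4', '10.mp4', '2.mp4']
print(order_video_parts(["02.mp4", "10.mp4", "01.mp4"]))  # ['01.mp4', '02.mp4', '10.mp4']
print(order_video_parts(["a_01.mp4", "b_01.mp4", "a_02.mp4"], substring="a_"))  # ['a_01.mp4', 'a_02.mp4']
```

Under this assumption, the unpadded names place part 10 before part 2, while the zero-padded names keep the intended sequence.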
- simba.utils.read_write.convert_csv_to_parquet(directory: Union[str, PathLike]) → None
Convert all csv files in a folder to parquet format.
- Parameters
directory (Union[str, os.PathLike]) – Path to directory holding csv files.
- Raises
NoFilesFoundError – The directory has no csv files.
- Examples
>>> convert_csv_to_parquet(directory='project_folder/csv/input_csv')
- simba.utils.read_write.convert_parquet_to_csv(directory: str) → None
Convert all parquet files in a directory to csv format.
- Parameters
directory (str) – Path to directory holding parquet files.
- Raises
NoFilesFoundError – The directory has no parquet files.
- Examples
>>> convert_parquet_to_csv(directory='project_folder/csv/input_csv')
- simba.utils.read_write.copy_files_in_directory(in_dir: Union[str, PathLike], out_dir: Union[str, PathLike], raise_error: bool = True, filetype: Optional[str] = None, prefix: Optional[str] = None, verbose: Optional[bool] = False, skip_truncated_img: Optional[bool] = False) → None
Copy files from the specified input directory to the output directory.
- Parameters
in_dir (Union[str, os.PathLike]) – The input directory from which files will be copied.
out_dir (Union[str, os.PathLike]) – The output directory where files will be copied to.
raise_error (bool) – If True, raise an error if no files are found in the input directory. Default is True.
filetype (Optional[str]) – If specified, only copy files with the given file extension. Default is None, meaning all files will be copied.
prefix (Optional[str]) – If specified, the given prefix will be added to the copied files' names.
- Example
>>> copy_files_in_directory('/input_dir', '/output_dir', raise_error=True, filetype='txt')
- simba.utils.read_write.copy_files_to_directory(file_paths: Union[List[Union[str, PathLike]], str, PathLike], dir: Union[str, PathLike], verbose: Optional[bool] = True, overwrite: bool = True, check_validity: bool = True, integer_save_names: Optional[bool] = False) → List[Union[str, PathLike]]
Copy a list of files to a specified directory.
- Parameters
file_paths (Union[List[Union[str, os.PathLike]], str, os.PathLike]) – List of paths to the files to be copied, or a single file path.
dir (Union[str, os.PathLike]) – Path to the directory where files will be copied.
verbose (Optional[bool]) – If True, prints progress information. Default True.
integer_save_names (Optional[bool]) – If True, saves files with integer names. E.g., the first file in file_paths will be saved as dir/0.
- Return List[Union[str, os.PathLike]]
List of paths to the copied files.
- simba.utils.read_write.copy_multiple_videos_to_project(config_path: Union[str, PathLike], source: Union[str, PathLike], file_type: str, symlink: Optional[bool] = False, recursive_search: Optional[bool] = False, allowed_video_formats: Optional[Tuple[str]] = ('avi', 'mp4')) → None
Import a directory of videos to a SimBA project.
- Parameters
config_path (Union[str, os.PathLike]) – Path to SimBA project config file in Configparser format.
source (Union[str, os.PathLike]) – Path to directory with video files outside the SimBA project.
file_type (str) – Video format of the imported videos (e.g., mp4 or avi).
symlink (Optional[bool]) – If True, creates soft copies rather than hard copies. Default: False.
recursive_search (Optional[bool]) – If True, copies all video files found immediately in source and in its subdirectories. If False, only files immediately in source. Default: False.
allowed_video_formats (Optional[Tuple[str]]) – Allowed video formats. Default: avi or mp4.
- simba.utils.read_write.copy_single_video_to_project(simba_ini_path: Union[str, PathLike], source_path: Union[str, PathLike], symlink: bool = False, allowed_video_formats: Optional[Tuple[str]] = ('avi', 'mp4'), overwrite: Optional[bool] = False) → None
Import a single video file to a SimBA project.
- Parameters
simba_ini_path (Union[str, os.PathLike]) – Path to SimBA project config file in Configparser format.
source_path (Union[str, os.PathLike]) – Path to video file outside the SimBA project.
symlink (Optional[bool]) – If True, creates a soft copy rather than a hard copy. Default: False.
allowed_video_formats (Optional[Tuple[str]]) – Allowed video formats. Default: avi or mp4.
overwrite (Optional[bool]) – If True, overwrites the existing video if it exists in the SimBA project. Else, raises FileExistError.
- simba.utils.read_write.create_directory(paths: Union[str, PathLike, bytes, List[str], Tuple[str]], overwrite: bool = False, verbose: bool = False) → None
Create one or multiple directories.
- Parameters
paths (Union[str, os.PathLike, bytes, List[str], Tuple[str]]) – A single path or a list/tuple of paths to create. Each path must be a non-empty string.
overwrite (bool) – If True and the directory already exists, it will be deleted and recreated. If False, the existing directory will be preserved.
- Returns
None
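The effect of the overwrite flag can be shown with a small standard-library sketch. This is an illustration of the documented behaviour, not the SimBA code: `create_directories_sketch` is a hypothetical name, and input validation and the verbose output are left out.

```python
import os
import shutil
import tempfile
from typing import Iterable, Union


def create_directories_sketch(paths: Union[str, Iterable[str]], overwrite: bool = False) -> None:
    """Create one or several directories; optionally wipe existing ones first."""
    if isinstance(paths, (str, bytes, os.PathLike)):
        paths = [paths]
    for path in paths:
        if overwrite and os.path.isdir(path):
            shutil.rmtree(path)  # delete the existing directory and its contents
        os.makedirs(path, exist_ok=True)  # no error if the directory already exists


root = tempfile.mkdtemp()
target = os.path.join(root, "results")
create_directories_sketch(target)
open(os.path.join(target, "old.txt"), "w").close()

create_directories_sketch(target, overwrite=False)
print(os.listdir(target))  # ['old.txt'] (existing directory preserved)

create_directories_sketch(target, overwrite=True)
print(os.listdir(target))  # [] (directory deleted and recreated)
```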
- simba.utils.read_write.create_empty_xlsx_file(xlsx_path: Union[str, PathLike])
Create an empty MS Excel file.
- Parameters
xlsx_path (Union[str, os.PathLike]) – Path where to save the MS Excel file on disk.
- simba.utils.read_write.df_to_xlsx_sheet(xlsx_path: Union[str, PathLike], df: DataFrame, sheet_name: str, create_file: bool = True) → None
Append a DataFrame as a new worksheet in an Excel workbook.
If xlsx_path does not exist and create_file is True, an empty workbook is created first. The function then appends df as a new sheet. If a sheet with sheet_name already exists, a DuplicationError is raised.
Note
The DataFrame index is written to Excel because DataFrame.to_excel is called with default settings.
- Parameters
xlsx_path (Union[str, os.PathLike]) – Path to the target .xlsx file.
df (pd.DataFrame) – DataFrame to write into the new worksheet.
sheet_name (str) – Name of the worksheet to create.
create_file (bool) – If True, create a new workbook when xlsx_path is missing. If False, raise NoFilesFoundError when the file does not exist.
- Returns
None.
- Return type
None
- Raises
NoFilesFoundError – If xlsx_path does not exist and create_file is False.
DuplicationError – If sheet_name already exists in the workbook.
InvalidInputError – If inputs fail validation.
- simba.utils.read_write.drop_df_fields(data: DataFrame, fields: List[str], raise_error: Optional[bool] = False) → DataFrame
Drops specified fields in dataframe.
- Parameters
data (pd.DataFrame) – Data in pandas format.
fields (List[str]) – Columns to drop.
- Return type
pd.DataFrame
- simba.utils.read_write.extract_audio_from_video(video_path: Union[str, PathLike], save_path: Union[str, PathLike], bitrate: str = '192k', sample_rate: int = 44100) → None
Extract audio track from video file and save as MP3.
- Parameters
video_path (Union[str, os.PathLike]) – Path to input video file.
save_path (Union[str, os.PathLike]) – Path where the MP3 file will be saved.
bitrate (str) – Audio bitrate (e.g., '128k', '192k', '320k'). Default: '192k'.
sample_rate (int) – Audio sample rate in Hz. Default: 44100.
- Raises
InvalidInputError – If video has no audio track or ffmpeg is not available.
FFMPEGCodecGPUError – If ffmpeg extraction fails.
- Example
>>> extract_audio_from_video(video_path='my_video.mp4', save_path='audio.mp3')
>>> extract_audio_from_video(video_path='my_video.mp4', save_path='audio.mp3', bitrate='320k')
- simba.utils.read_write.fetch_pip_data(pip_url: str = 'https://pypi.org/pypi/simba-uw-tf-dev/json', time_out: int = 2) → Union[Tuple[Dict[str, Any], str], Tuple[None, None]]
Fetch PyPI package metadata from a PyPI JSON API URL.
Retrieves package information from the PyPI JSON API endpoint and extracts the latest version. Used primarily for checking if newer versions of SimBA are available. Returns the full JSON response data and the latest version string, or (None, None) if the request fails.
- Parameters
pip_url (str) – URL to the PyPI JSON API endpoint for the package. Defaults to SimBA's PyPI URL.
- Returns
Tuple containing (JSON data dictionary, latest version string) on success, or (None, None) on failure.
- Return type
Union[Tuple[Dict[str, Any], str], Tuple[None, None]]
- Example
>>> json_data, version = fetch_pip_data()
>>> if version:
...     print(f"Latest version: {version}")
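A request of this kind can be sketched with the standard library alone. The code below is an illustration under assumptions, not the SimBA implementation: `parse_latest_version` and `fetch_pip_data_sketch` are hypothetical names, and the only API detail relied on is that the PyPI JSON payload carries the latest release under `info` → `version`.

```python
import json
import urllib.request
from typing import Any, Dict, Optional, Tuple


def parse_latest_version(data: Dict[str, Any]) -> Optional[str]:
    """Pull the latest release string out of a PyPI JSON API payload."""
    return data.get("info", {}).get("version")


def fetch_pip_data_sketch(url: str, time_out: int = 2) -> Tuple[Optional[Dict[str, Any]], Optional[str]]:
    """Return (payload, latest version), or (None, None) if anything fails."""
    try:
        with urllib.request.urlopen(url, timeout=time_out) as response:
            data = json.loads(response.read().decode("utf-8"))
        return data, parse_latest_version(data)
    except Exception:
        # Network errors, timeouts, and malformed responses all map to (None, None).
        return None, None


print(parse_latest_version({"info": {"name": "example-package", "version": "1.2.3"}}))  # 1.2.3
```

Swallowing every exception keeps a version check from interrupting the caller when the machine is offline, at the cost of hiding the reason for the failure.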
- simba.utils.read_write.find_all_videos_in_directory(directory: Union[str, PathLike], as_dict: bool = False, raise_error: bool = False, video_formats: Tuple[str] = ('.avi', '.mp4', '.mov', '.flv', '.m4v', '.webm'), sort_alphabetically: bool = False) → Union[dict, list]
Get all video file paths within a provided directory.
- Parameters
directory (str) – Directory to search for video files.
as_dict (bool) – If True, returns a dictionary with the video name as key and file path as value.
raise_error (bool) – If True, raise an error if no videos are found. Else, NoFileFoundWarning.
video_formats (Tuple[str]) – Acceptable video formats. Default: '.avi', '.mp4', '.mov', '.flv', '.m4v', '.webm'.
- Returns
Either a list or dictionary of all available video files in the directory.
- Return type
Union[dict, list]
- Raises
NoFilesFoundError – If raise_error is True and directory has no files in formats video_formats.
- Examples
>>> find_all_videos_in_directory(directory='project_folder/videos')
- simba.utils.read_write.find_all_videos_in_project(videos_dir: Union[str, PathLike], basename: Optional[bool] = False, raise_error: bool = True) → List[str]
Get filenames of .avi and .mp4 files within a directory.
- Parameters
- Example
>>> find_all_videos_in_project(videos_dir='project_folder/videos')
['project_folder/videos/Together_2.avi', 'project_folder/videos/Together_3.avi', 'project_folder/videos/Together_1.avi']
- simba.utils.read_write.find_closest_readable_frame(video_path: Union[str, PathLike], target_frame: int, max_search_range: int = 50) → Tuple[Optional[ndarray], Optional[int]]
Finds the closest readable frame to a target frame index.
This function attempts to read the target frame from a video. If the target frame cannot be read (e.g., due to corruption or encoding issues), it searches nearby frames in both directions to find the closest readable frame.
- Parameters
video_path (Union[str, os.PathLike]) – Path to video file.
target_frame (int) – Target frame index to read (0-based).
max_search_range (int) – Maximum number of frames to search in each direction from target. Default: 50.
- Returns
Tuple of (frame array, actual frame index) or (None, None) if no readable frame found.
- Return type
Tuple[Optional[np.ndarray], Optional[int]]
- Example
>>> frame, actual_idx = find_closest_readable_frame(video_path='video.mp4', target_frame=10810)
>>> if frame is not None:
...     print(f"Read frame {actual_idx} (target was 10810, offset: {actual_idx - 10810})")
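The bidirectional search described above can be sketched independently of any video library. Everything here is an assumption made for illustration: the names `candidate_indices` and `closest_readable` are hypothetical, the frame reader is passed in as a callable, and trying the earlier neighbour before the later one at each distance is an arbitrary choice, since the order SimBA uses is not documented here.

```python
from typing import Callable, Iterator, Optional, Tuple


def candidate_indices(target: int, max_search_range: int, frame_count: int) -> Iterator[int]:
    """Yield the target first, then neighbours at growing distance (earlier frame first)."""
    if 0 <= target < frame_count:
        yield target
    for offset in range(1, max_search_range + 1):
        for idx in (target - offset, target + offset):
            if 0 <= idx < frame_count:
                yield idx


def closest_readable(read_frame: Callable[[int], Optional[str]], target: int,
                     frame_count: int, max_search_range: int = 50) -> Tuple[Optional[str], Optional[int]]:
    """Return (frame, index) for the nearest index where read_frame succeeds."""
    for idx in candidate_indices(target, max_search_range, frame_count):
        frame = read_frame(idx)
        if frame is not None:
            return frame, idx
    return None, None


def fake_reader(idx: int) -> Optional[str]:
    """Stand-in for a video reader: frames 9-11 are 'corrupt' and return None."""
    return None if idx in (9, 10, 11) else f"frame-{idx}"


print(closest_readable(fake_reader, target=10, frame_count=100))  # ('frame-8', 8)
```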
- simba.utils.read_write.find_core_cnt() → Tuple[int, int]
Find the local cpu count and a quarter of the cpu count.
- Returns
2-part tuple with the local cpu count and the local cpu count // 4.
- Return type
Tuple[int, int]
- Example
>>> find_core_cnt()
(8, 2)
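The two values follow directly from integer division. A minimal sketch, assuming the count comes from `multiprocessing.cpu_count()` (the function name `core_counts_sketch` is hypothetical):

```python
import multiprocessing
from typing import Tuple


def core_counts_sketch() -> Tuple[int, int]:
    """Return (cpu count, cpu count // 4)."""
    cpu_cnt = multiprocessing.cpu_count()
    return cpu_cnt, cpu_cnt // 4


total, quarter = core_counts_sketch()
print(total, quarter)  # e.g. 8 2 on an 8-core machine
```

With floor division, a machine with fewer than four cores yields 0 for the second value, so callers of a helper like this may want to enforce a minimum of 1.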
- simba.utils.read_write.find_files_of_filetypes_in_directory(directory: Union[str, PathLike], extensions: Union[List[str], Tuple[str], str], raise_warning: bool = True, as_dict: bool = False, raise_error: bool = False, sort_alphabetically: bool = False) → Union[List[str], Dict[str, str]]
Find all files in a directory of specified extensions/types.
- Parameters
directory (str) – Directory holding files.
extensions (Union[List[str], Tuple[str], str]) – Accepted file extension(s), as a string or as a list or tuple of strings.
raise_warning (bool) – If True, raise warning if no files are found. Default True.
raise_error (bool) – If True, raise error if no files are found. Default False.
as_dict (bool) – If True, returns a dictionary with all filenames as keys and filepaths as values. If False, then a list of all filepaths. Default False.
- Returns
All files in directory with the specified extension(s).
- Return type
Union[List[str], Dict[str, str]]
- Example
>>> find_files_of_filetypes_in_directory(directory='project_folder/videos', extensions=['mp4', 'avi', 'png'], raise_warning=False)
- simba.utils.read_write.find_largest_blob_location(imgs: Dict[int, ndarray], verbose: bool = False, video_name: Optional[str] = None, inclusion_zone: Optional[Union[Polygon, MultiPolygon]] = None) → Dict[int, ndarray]
Helper to find the largest connected component in a binary image. E.g., use to find a 'blob' (i.e., animal) within a background-subtracted image.
- Parameters
imgs (Dict[int, np.ndarray]) – Dictionary of images where the key is the frame id and the value is an image in np.ndarray format.
verbose (bool) – If True, prints progress. Default: False.
video_name (Optional[str]) – The name of the video being processed, used for interpretable progress messages if verbose is True.
inclusion_zone (Optional[Union[Polygon, MultiPolygon]]) – If not None, a geometry describing the ROI / shape vertices. The largest blob will then be searched for only within the ROI.
- Returns
Dictionary where the key is the frame id and the value is a 2D array with x and y coordinates.
- Return type
Dict[int, np.ndarray]
- simba.utils.read_write.find_max_vertices_coordinates(shapes: List[Union[Polygon, LineString, MultiPolygon, Point]], buffer: Optional[int] = None) → Tuple[int, int]
Find the maximum x and y coordinates among the vertices of a list of geometries.
Can be useful for plotting purposes, to determine the required size of the canvas to fit all geometries.
- Parameters
shapes (List[Union[Polygon, LineString, MultiPolygon, Point]]) – A list of Shapely geometries including Polygons, LineStrings, MultiPolygons, and Points.
buffer (Optional[int]) – If int, adds to maximum x and y.
- Returns
A two-part tuple containing the maximum x and y coordinates found among the vertices.
- Return type
Tuple[int, int]
- Example
>>> polygon = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
>>> line = LineString([(1, 1), (2, 2), (3, 1), (4, 0)])
>>> multi_polygon = MultiPolygon([Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]), Polygon([(1, 1), (2, 1), (2, 2), (1, 2)])])
>>> point = Point(3, 4)
>>> find_max_vertices_coordinates([polygon, line, multi_polygon, point])
(4, 4)
- simba.utils.read_write.find_time_stamp_from_frame_numbers(start_frame: int, end_frame: int, fps: float) → List[str]
Given start and end frame numbers and frames per second (fps), return a list of formatted time stamps corresponding to the start and end time of the frame range.
- Parameters
- Returns
A list of time stamps in the format 'HH:MM:SS:MS'.
- Return type
List[str]
- Example
>>> find_time_stamp_from_frame_numbers(start_frame=11, end_frame=20, fps=3.4)
['00:00:03:235', '00:00:05:882']
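The conversion amounts to dividing the frame number by the fps and splitting the resulting time into hours, minutes, seconds, and milliseconds. The sketch below reproduces the documented example; the helper names are hypothetical, and truncating (rather than rounding) the milliseconds is an assumption that happens to agree with the example output.

```python
from typing import List


def frame_to_timestamp(frame: int, fps: float) -> str:
    """Format the time of a frame as 'HH:MM:SS:MS' (milliseconds truncated)."""
    total_ms = int(frame / fps * 1000)
    seconds, ms = divmod(total_ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{ms:03d}"


def timestamps_from_frames(start_frame: int, end_frame: int, fps: float) -> List[str]:
    """Return the time stamps of the first and last frame in the range."""
    return [frame_to_timestamp(start_frame, fps), frame_to_timestamp(end_frame, fps)]


print(timestamps_from_frames(start_frame=11, end_frame=20, fps=3.4))
# ['00:00:03:235', '00:00:05:882']
```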
- simba.utils.read_write.find_video_of_file(video_dir: Union[str, PathLike], filename: str, raise_error: Optional[bool] = False, warning: Optional[bool] = True, recursive: bool = False) → Optional[Union[str, PathLike]]
Helper to find the video file within the SimBA project that corresponds to a known data file path.
- Parameters
video_dir (str) – Directory holding putative video file.
filename (str) – Data file name (stem only, e.g. Video_1). Path separators are stripped to the basename.
raise_error (Optional[bool]) – If True, raise error if no file can be found. If False, returns None if no file can be found. Default: False.
warning (Optional[bool]) – If True, print warning if no file can be found. If False, no warning is printed if the file cannot be found. Default: True.
recursive (bool) – If True, search subdirectories of video_dir for the video file. If False, only the top-level of video_dir is searched. Default: False. If several files are found as a match, the first one is returned.
- Returns
Video file path, or None if not found.
- Return type
Union[str, os.PathLike, None]
- Examples
>>> find_video_of_file(video_dir='project_folder/videos', filename='Together_1')
'project_folder/videos/Together_1.avi'
- simba.utils.read_write.get_all_clf_names(config: ConfigParser, target_cnt: int) → List[str]
Get all classifier names in a SimBA project.
- Parameters
config (configparser.ConfigParser) – Parsed SimBA project_config.ini.
target_cnt (int) – Count of models in SimBA project.
- Returns
Classifier model names.
- Return type
List[str]
- Example
>>> get_all_clf_names(config=config, target_cnt=2)
['Attack', 'Sniffing']
- simba.utils.read_write.get_audio_duration(audio_path: Union[str, PathLike]) → float
Get duration of audio file in seconds using ffprobe.
- Parameters
audio_path (Union[str, os.PathLike]) – Path to audio file.
- Return float
Duration in seconds.
- simba.utils.read_write.get_bp_headers(body_parts_lst: List[str]) → list
Helper to create an ordered list of all column header fields from body-part names for SimBA project dataframes.
- Parameters
body_parts_lst (List[str]) – Body-part names in the SimBA project.
- Returns
Body-part headers.
- Return type
List[str]
- Example
>>> get_bp_headers(body_parts_lst=['Nose'])
['Nose_x', 'Nose_y', 'Nose_p']
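As the example suggests, each body-part name expands into three columns with the suffixes `_x`, `_y`, and `_p`. A minimal sketch of that expansion, with the hypothetical name `bp_headers_sketch` and a second body part (`Tail_base`) added purely for illustration:

```python
from typing import List


def bp_headers_sketch(body_parts: List[str]) -> List[str]:
    """Expand each body-part name into its x, y and probability column names."""
    return [f"{bp}_{suffix}" for bp in body_parts for suffix in ("x", "y", "p")]


print(bp_headers_sketch(["Nose", "Tail_base"]))
# ['Nose_x', 'Nose_y', 'Nose_p', 'Tail_base_x', 'Tail_base_y', 'Tail_base_p']
```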
- simba.utils.read_write.get_cpu_pool(core_cnt: int = -1, maxtasksperchild: int = 8000, context: Optional[typing_extensions.Literal['fork', 'spawn', 'forkserver']] = None, verbose: bool = True, source: Optional[str] = None) → Pool
Creates and returns a multiprocessing.Pool instance with platform-appropriate defaults and validation.
- Parameters
core_cnt (int) – Number of worker processes. -1 uses all available cores. Default: -1.
maxtasksperchild (int) – Maximum number of tasks a worker process can complete before being replaced. Default: 8000 (from Defaults.MAXIMUM_MAX_TASK_PER_CHILD).
context (Optional[Literal['fork', 'spawn', 'forkserver']]) – Multiprocessing start method. None uses platform default. Default: None.
verbose (bool) – If True, prints pool creation message with timestamp. Default: True.
source (Optional[str]) – Optional identifier string for logging purposes (e.g., 'VideoProcessor'). Default: None.
- Returns
Configured multiprocessing.Pool instance.
- Return type
multiprocessing.Pool
- Example
>>> pool = get_cpu_pool(core_cnt=4, source='FeatureExtractor')
>>> pool = get_cpu_pool(core_cnt=-1, context='spawn', verbose=True)
>>> pool = get_cpu_pool(core_cnt=8, maxtasksperchild=100, source='VideoProcessor')
- simba.utils.read_write.get_desktop_path(raise_error: bool = False)
Get the path to the user desktop directory.
- simba.utils.read_write.get_downloads_path(raise_error: bool = False)
Get the path to the user downloads directory.
- simba.utils.read_write.get_env_pose_config_dir(raise_error: Optional[bool] = True)
Locate and validate the pose_configurations directory in the active SimBA installation.
- simba.utils.read_write.get_file_name_info_in_directory(directory: Union[str, PathLike], file_type: str) → Dict[str, str]
Get a dict of all file paths in a directory with the specified extension as values and file base names as keys.
- Parameters
- Return dict
All found files as values and file base names as keys.
- Example
>>> get_file_name_info_in_directory(directory='C:/project_folder/csv/machine_results', file_type='csv')
{'Video_1': 'C:/project_folder/csv/machine_results/Video_1'}
- simba.utils.read_write.get_fn_ext(filepath: Union[PathLike, str], raise_error: bool = True) → Union[Tuple[str, str, str], Tuple[None, None, None]]
Split file path into three components: (i) directory, (ii) file name, and (iii) file extension.
- Parameters
filepath (Union[os.PathLike, str]) – Path to file.
raise_error (bool) – If True, raises InvalidFilepathError for invalid paths. If False, returns (None, None, None) for invalid paths. Default: True.
- Returns
3-part tuple with file directory name, file name (w/o extension), and file extension. Returns (None, None, None) if invalid path and raise_error=False.
- Return type
Union[Tuple[str, str, str], Tuple[None, None, None]]
- Example
>>> get_fn_ext(filepath='C:/My_videos/MyVideo.mp4')
('C:/My_videos', 'MyVideo', '.mp4')
>>> get_fn_ext(filepath='invalid_path', raise_error=False)
(None, None, None)
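The three-way split can be reproduced with `os.path`. The sketch below covers only the splitting step; `split_path_sketch` is a hypothetical name, and the path validation behind raise_error is not modelled.

```python
import os
from typing import Tuple


def split_path_sketch(filepath: str) -> Tuple[str, str, str]:
    """Split a path into (directory, file name without extension, extension)."""
    directory, base_name = os.path.split(filepath)
    file_name, extension = os.path.splitext(base_name)
    return directory, file_name, extension


print(split_path_sketch("C:/My_videos/MyVideo.mp4"))  # ('C:/My_videos', 'MyVideo', '.mp4')
```

Note that `os.path.splitext` only separates the last suffix, so a name such as `video.analysis.h5` yields the file name `video.analysis` and the extension `.h5`.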
- simba.utils.read_write.get_h5_frame_count(path: Union[str, PathLike]) → Optional[int]
Return the number of frames (rows) in a DLC H5 file without loading the full data.
Inspects the H5 file's structural metadata to read the row dimension cheaply, handling both common pandas-on-HDF storage modes:
format='table' (legacy DLC TF backend): <key>/table shape[0].
format='fixed' (modern DLC PyTorch backend): <key>/axis1 shape[0] or <key>/block0_values shape[0].
If the structural shortcut fails for any reason, falls back to a full pandas.read_hdf() read.
- Parameters
path (Union[str, os.PathLike]) – Path to a DLC H5 file.
- Returns
Number of frames in the file, or None if no row count could be determined.
- Return type
Optional[int]
- Example
>>> n = get_h5_frame_count(r'video_DLC_HrnetW32_..._el.h5')
>>> # 5400
- simba.utils.read_write.get_memory_usage_array(x: ndarray) → Dict[str, float]
Calculates the memory usage of a NumPy array in bytes, megabytes, and gigabytes.
- Parameters
x (np.ndarray) – A NumPy array for which memory usage will be calculated. It should be a valid NumPy array with a defined size and dtype.
- Returns
A dictionary with memory usage information, containing the following keys:
'bytes': Memory usage in bytes.
'megabytes': Memory usage in megabytes.
'gigabytes': Memory usage in gigabytes.
- simba.utils.read_write.get_memory_usage_of_df(df: DataFrame) → Dict[str, float]
Get the RAM memory usage of a dataframe.
- Parameters
df (pd.DataFrame) – Parsed dataframe.
- Returns
Dict holding the memory usage of the dataframe in bytes, megabytes, and gigabytes.
- Return type
Dict[str, float]
- Example
>>> df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
>>> get_memory_usage_of_df(df=df)
{'bytes': 3328, 'megabytes': 0.003328, 'gigabytes': 3e-06}
- simba.utils.read_write.get_number_of_header_columns_in_df(df: DataFrame) → int
Returns the count of non-numerical header rows in a dataframe. E.g., can be helpful to determine if the dataframe has multi-index columns.
- Parameters
df (pd.DataFrame) – Dataframe to check the count of non-numerical header rows for.
- Example
>>> get_number_of_header_columns_in_df(df='project_folder/csv/input_csv/Video_1.csv')
3
- simba.utils.read_write.get_pkg_version(pkg: str, raise_error: Optional[bool] = False)
Helper to get the version of a package in the current python environment.
- Example
>>> get_pkg_version(pkg='simba-uw-tf-dev')
1.82.7
>>> get_pkg_version(pkg='bla-bla')
None
- simba.utils.read_write.get_recent_projects_paths(max: int = 15, sort_alphabetically: bool = True) → List[str]
- simba.utils.read_write.get_site_packages_path(raise_error: Optional[bool] = True) → Union[None, PathLike, str]
Retrieve the path to the current Python environment's site-packages directory.
- simba.utils.read_write.get_unique_values_in_iterable(data: Iterable, name: Optional[str] = '', min: Optional[int] = 1, max: Optional[int] = None) → int
Helper to get and check the number of unique variables in an iterable. E.g., check the number of unique identified clusters.
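A count-and-check helper of this shape can be sketched as follows. The parameter names mirror the signature above (which is why `min` and `max` shadow the built-ins inside the function), but the behaviour on a violated bound is an assumption: this sketch raises `ValueError`, while the exception SimBA raises is not documented here. The name `count_unique_sketch` is hypothetical.

```python
from typing import Hashable, Iterable, Optional


def count_unique_sketch(data: Iterable[Hashable], name: str = "", min: int = 1, max: Optional[int] = None) -> int:
    """Count distinct values and check the count against optional bounds."""
    cnt = len(set(data))
    if cnt < min:
        raise ValueError(f"{name}: found {cnt} unique value(s), expected at least {min}")
    if max is not None and cnt > max:
        raise ValueError(f"{name}: found {cnt} unique value(s), expected at most {max}")
    return cnt


print(count_unique_sketch([0, 1, 1, 2, 2, 2], name="cluster labels", min=2))  # 3
```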
- simba.utils.read_write.get_video_info_ffmpeg(video_path: Union[str, PathLike]) → Dict[str, Any]
Extracts metadata information from a video file using FFmpeg's ffprobe.
Note
FFmpeg-based metadata extraction seems preferable over OpenCV with data in .h264 format.
See also
To use OpenCV instead of FFmpeg, see simba.utils.read_write.get_video_meta_data().
- Parameters
video_path (Union[str, os.PathLike]) – The file path to the video for which metadata is to be extracted.
- Returns
A dictionary containing video metadata.
- Return type
Dict[str, Any]
- simba.utils.read_write.get_video_meta_data(video_path: Union[str, PathLike, VideoCapture], fps_as_int: bool = True, raise_error: bool = True) → Optional[Dict[str, Any]]
Read video metadata (fps, resolution, frame count, etc.) from a video file (e.g., mp4).
See also
To use FFmpeg instead of OpenCV, see simba.utils.read_write.get_video_info_ffmpeg().
- Parameters
- Returns
The video metadata in dict format with parameter names (e.g., fps) as keys.
- Return type
Dict[str, Any].
- Example
>>> get_video_meta_data('test_data/video_tests/Video_1.avi')
{'video_name': 'Video_1', 'fps': 30, 'width': 400, 'height': 600, 'frame_count': 300, 'resolution_str': '400 x 600', 'video_length_s': 10}
- simba.utils.read_write.img_array_to_clahe(img: ndarray, clip_limit: int = 2, tile_grid_size: Tuple[int, int] = (16, 16)) → ndarray
- simba.utils.read_write.img_stack_to_bw(imgs: ndarray)
Jitted conversion of a 4D stack of color images (RGB format) to black and white.
- Parameters
imgs (np.ndarray) – A 4D array representing color images. It should have the shape (num_images, height, width, 3) where the last dimension represents the color channels (R, G, B).
- Returns np.ndarray
A 3D array containing the black and white versions of the input images. The shape of the output array is (num_images, height, width).
- Example
>>> imgs = ImageMixin().read_img_batch_from_video(video_path='/Users/simon/Desktop/envs/troubleshooting/two_black_animals_14bp/videos/Together_1.avi', start_frm=0, end_frm=100)
>>> imgs = np.stack(list(imgs.values()))
>>> imgs_bw = img_stack_to_bw(imgs=imgs)
- simba.utils.read_write.img_stack_to_greyscale(imgs: ndarray)
Jitted conversion of a 4D stack of color images (RGB format) to grayscale.
- Parameters
imgs (np.ndarray) – A 4D array representing color images. It should have the shape (num_images, height, width, 3) where the last dimension represents the color channels (R, G, B).
- Returns np.ndarray
A 3D array containing the grayscale versions of the input images. The shape of the output array is (num_images, height, width).
- Example
>>> imgs = ImageMixin().read_img_batch_from_video(video_path='/Users/simon/Desktop/envs/troubleshooting/two_black_animals_14bp/videos/Together_1.avi', start_frm=0, end_frm=100)
>>> imgs = np.stack(list(imgs.values()))
>>> imgs_gray = ImageMixin.img_stack_to_greyscale(imgs=imgs)
- simba.utils.read_write.img_stack_to_video(x: ndarray, save_path: Union[str, PathLike], fps: float, gpu: Optional[bool] = False, bitrate: Optional[int] = 5000) → None
Converts a NumPy image stack to a video file, with optional GPU acceleration and configurable bitrate.
- Parameters
x (np.ndarray) – A NumPy array representing the image stack. The array should have shape (N, H, W) for greyscale or (N, H, W, 3) for RGB images, where N is the number of frames, H is the height, and W is the width.
save_path (Union[str, os.PathLike]) – Path to the output video file where the video will be saved.
fps (float) – Frames per second for the output video. Should be a positive floating-point number.
gpu (Optional[bool]) – Whether to use GPU acceleration for encoding. If True, the video encoding will use NVIDIA's NVENC encoder. Defaults to False.
bitrate (Optional[int]) – Bitrate for the video encoding in kilobits per second (kbps). Should be an integer between 1000 and 35000. Defaults to 5000.
- Returns
None
- simba.utils.read_write.img_to_bw(img: ndarray) → ndarray
Jitted conversion of a single image (grayscale or RGB) to black and white.
- Parameters
img (np.ndarray) – A 2D grayscale image (H, W) or 3D RGB image (H, W, 3), dtype uint8.
- Returns
A 2D binary black and white image with values 0 or 255.
- simba.utils.read_write.labelme_to_dlc(labelme_dir: Union[str, PathLike], scorer: Optional[str] = 'SN', save_dir: Optional[Union[str, PathLike]] = None) → None
Convert labels from labelme format to DLC format.
- Parameters
labelme_dir (Union[str, os.PathLike]) – Directory with labelme json files.
scorer (Optional[str]) – Name of the scorer (anticipated by DLC as header).
save_dir (Optional[Union[str, os.PathLike]]) – Directory where to save the DLC annotations. If None, then same directory as labelme_dir with _dlc_annotations suffix.
- Returns
None
- Example
>>> labelme_dir = r'D:/ts_annotations'
>>> labelme_to_dlc(labelme_dir=labelme_dir)
- simba.utils.read_write.osf_download(project_id: str, save_dir: Union[str, PathLike], storage: str = 'osfstorage', overwrite: bool = False)
Download all files from an OSF (Open Science Framework) project to a local directory.
This function connects to the OSF API, accesses the specified project and storage location, and downloads all files to the local save directory. Files can be skipped if they already exist locally and overwrite is disabled.
- Parameters
project_id (str) – OSF project identifier (e.g., 'abc123' from osf.io/abc123).
save_dir (Union[str, os.PathLike]) – Local directory path where files will be downloaded.
storage (str) – OSF storage location name (default: 'osfstorage').
overwrite (bool) – If True, overwrite existing files. If False, skip existing files (default: False).
- Example
>>> osf_download(project_id="7fgwn", save_dir=r'E:/rgb_white_vs_black_imgs')
>>> osf_download(project_id="kym42", save_dir=r'E:/crim13_imgs', overwrite=True)
- simba.utils.read_write.read_boris_file(file_path: Union[str, PathLike], fps: Optional[Union[float, int]] = None, orient: Optional[typing_extensions.Literal['index', 'columns']] = 'index', save_path: Optional[Union[str, PathLike]] = None, raise_error: Optional[bool] = False, log_setting: Optional[bool] = False) → Union[None, Dict[str, Dict[str, DataFrame]]]
Reads a BORIS behavioral annotation file, processes the data, and optionally saves the results to a file.
- Parameters
file_path (Union[str, os.PathLike]) – The path to the BORIS file to be read. The file should be a CSV containing behavioral annotations.
fps (Optional[Union[int, float]]) – Frames per second (FPS) to convert time annotations into frame numbers. If not provided, it will be extracted from the BORIS file if available.
orient (Optional[Literal['index', 'columns']]) – Determines the orientation of the results. 'index' will organize data with start and stop times as indices, while 'columns' will store data in columns.
save_path (Optional[Union[str, os.PathLike]]) – The path where the processed results should be saved as a pickle file. If not provided, the results will be returned instead.
raise_error (Optional[bool]) – Whether to raise errors if the file format or content is invalid. If False, warnings will be logged instead of raising exceptions.
log_setting (Optional[bool]) – Whether to log warnings and errors. This is relevant when raise_error is set to False.
- Returns
If save_path is None, returns a dictionary where keys are behaviors and values are dataframes containing start and stop frames for each behavior. If save_path is provided, the results are saved and nothing is returned.
- simba.utils.read_write.read_config_entry(config: ConfigParser, section: str, option: str, data_type: str, default_value: Optional[Any] = None, options: Optional[List] = None) Union[float, int, str][source]ο
Helper to read entry in SimBA project_config.ini parsed by configparser.ConfigParser.
- Parameters
config (configparser.ConfigParser) – Parsed SimBA project_config.ini. Use simba.utils.read_write.read_config_file() to parse the file.
section (str) – Section name of entry to parse.
option (str) – Option name of entry to parse.
data_type (str) – Type of data to parse. E.g., str, int, float.
default_value (Optional[Any]) – If no matching entry can be found in the project_config.ini, use this as default.
options (Optional[List]) – List of valid options. If not None, checks that the returned entry value exists in this list.
- Return type
Any
- Example
>>> config = read_config_file(config_path='project_folder/project_config.ini')
>>> read_config_entry(config=config, section='General settings', option='project_name', data_type='str')
'two_animals_14_bps'
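The typed read with a fallback default and an allowed-options check can be illustrated with the standard library alone. The function read_entry below is a hypothetical stand-in written for this sketch; it is not the SimBA implementation and raises plain ValueError rather than SimBA's own error classes.

```python
import configparser
from typing import Any, List, Optional


def read_entry(config: configparser.ConfigParser, section: str, option: str,
               data_type: str, default_value: Optional[Any] = None,
               options: Optional[List] = None) -> Any:
    """Read one entry, cast it to data_type, and fall back to default_value if it is missing."""
    try:
        if data_type == "int":
            value = config.getint(section, option)
        elif data_type == "float":
            value = config.getfloat(section, option)
        else:
            value = config.get(section, option)
    except (configparser.NoSectionError, configparser.NoOptionError, ValueError):
        if default_value is None:
            raise
        return default_value
    # Optional validation against a list of accepted values.
    if options is not None and value not in options:
        raise ValueError(f"{option} is {value!r}; expected one of {options}")
    return value


config = configparser.ConfigParser()
config.read_string("[General settings]\nproject_name = two_animals_14_bps\nanimal_no = 2\n")

name = read_entry(config, "General settings", "project_name", "str")
animal_cnt = read_entry(config, "General settings", "animal_no", "int")
file_type = read_entry(config, "General settings", "workflow_file_type", "str", default_value="csv")
```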
- simba.utils.read_write.read_config_file(config_path: Union[str, PathLike]) ConfigParser[source]ο
Helper to parse the SimBA project_config.ini file.
- Parameters
config_path (Union[str, os.PathLike]) – Path to project_config.ini file.
- Returns
Parsed project_config.ini file.
- Return type
configparser.ConfigParser
- Raises
MissingProjectConfigEntryError – Invalid file format.
- Example
>>> read_config_file(config_path='project_folder/project_config.ini')
- simba.utils.read_write.read_data_paths(path: Optional[Union[str, PathLike]], default: List[Union[str, PathLike]], default_name: Optional[str] = '', file_type: Optional[str] = 'csv') List[str][source]ο
Helper to flexibly read in a set of file-paths.
- Parameters
path (Union[str, os.PathLike]) – None, or a path to a file, a path to a folder, or a list of paths to files.
default (List[Union[str, os.PathLike]]) – If path is None, use this list of file paths.
default_name (Optional[str]) – A readable name representing the default, used for interpretable error messages. Defaults to an empty string.
file_type (Optional[str]) – If path is a directory, read in all files in the directory with this file extension. Default: csv.
- Return List[str]
List of file paths.
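The branching described above (None, a single file, a folder, or a list) can be sketched as follows. The name resolve_data_paths is hypothetical, and the sketch skips the validation and error messages that the real helper presumably performs.

```python
import glob
import os
import tempfile


def resolve_data_paths(path, default, file_type="csv"):
    """Return a list of file paths from None, a file path, a directory, or a list of paths."""
    if path is None:
        # Nothing passed: fall back to the default list.
        return [str(p) for p in default]
    if isinstance(path, (list, tuple)):
        return [str(p) for p in path]
    if os.path.isdir(path):
        # Directory: collect every file with the requested extension.
        return sorted(glob.glob(os.path.join(path, f"*.{file_type}")))
    return [str(path)]


with tempfile.TemporaryDirectory() as tmp:
    for name in ("Video_1.csv", "Video_2.csv", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    from_dir = [os.path.basename(p) for p in resolve_data_paths(tmp, default=[])]
    from_default = resolve_data_paths(None, default=["a.csv", "b.csv"])
    from_file = resolve_data_paths("Video_9.csv", default=[])
```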
- simba.utils.read_write.read_df(file_path: Union[str, PathLike], file_type: Union[str, PathLike] = 'csv', has_index: Optional[bool] = True, remove_columns: Optional[List[str]] = None, usecols: Optional[List[str]] = None, anipose_data: Optional[bool] = False, check_multiindex: Optional[bool] = False, multi_index_headers_to_keep: Optional[int] = None, verbose: Optional[bool] = False) Union[DataFrame, dict][source]ο
Read a single tabular data file or pickle.
Note
For improved runtime, defaults to pyarrow.csv.read_csv() if file type is csv.
EXPECTED RUNTIMES
CSV DISK SIZE (MB)    TIME (S)       STDEV TIME (S)
257                   0.682866667    0.063618891
643                   1.551066667    0.048732057
6435                  22.01703333    0.612539014
7722                  39.37053333    8.153716055
- Parameters
file_path (str) – Path to data file.
file_type (str) – Type of data. OPTIONS: 'parquet', 'csv', 'pickle'.
has_index (Optional[bool]) – If the input file has an initial index column. Default: True.
remove_columns (Optional[List[str]]) – If not None, then remove the columns in the list.
usecols (Optional[List[str]]) – If not None, then keep only the columns in the list.
check_multiindex (bool) – Check whether the file has multi-index headers. Default: False.
multi_index_headers_to_keep (int) – If reading a multi-index file and one of the dropped multi-index levels should be kept as the header in the output, specify the index of that multi-index header as an int.
- Returns
Table data in pd.DataFrame format.
- Return type
pd.DataFrame
- Example
>>> read_df(file_path='project_folder/csv/input_csv/Video_1.csv', file_type='csv', check_multiindex=True)
- simba.utils.read_write.read_df_array(df: DataFrame, column: str)[source]ο
Convert string representations of 2D arrays in a DataFrame column to actual numpy arrays.
- Parameters
df (pd.DataFrame) – The DataFrame containing the column.
column (str) – The name of the column with string representations of 2D arrays.
- Returns
A list of numpy arrays, each corresponding to an entry in the specified column.
- Return type
List[np.ndarray]
- simba.utils.read_write.read_dlc_superanimal_h5(path: Union[str, PathLike], col_names: List[str]) DataFrame[source]ο
Read and parse DeepLabCut SuperAnimal-TopView pose estimation data from H5 format.
Supports both DLC H5 layouts that the SuperAnimal-TopView workflow can produce:
Legacy DLC TensorFlow backend – H5 contains a df_with_missing group with a nested PyTables table (DLC <= 2.x export written with df.to_hdf(..., key='df_with_missing', format='table')).
Modern DLC 3.0+ PyTorch backend (HRNet / RTMPose, including multi-animal _el.h5 / _full.h5 outputs) – H5 stores a pandas DataFrame with a multi-index column header (typical levels: scorer / [individuals] / bodyparts / coords). Readable directly with pandas.read_hdf().
Regardless of the source format, the returned DataFrame has its columns assigned to col_names positionally. The H5 column order is therefore assumed to follow the SimBA project's body-part order (i.e. SuperAnimal-TopView 27 body parts per animal, each as an x, y, likelihood triplet, animals in the order specified by id_lst in simba.pose_importers.superanimal_import.SuperAnimalTopViewImporter).
- Parameters
path (Union[str, os.PathLike]) – Path to the SuperAnimal DLC H5 file.
col_names (List[str]) – List of column names to assign to the DataFrame. Must match the expected number of columns based on the SimBA project configuration (typically body-part coordinates: x, y, p).
- Returns
DataFrame containing pose estimation data with columns named according to col_names.
- Return type
pd.DataFrame
- Raises
InvalidInputError – If the file cannot be read by any supported strategy, or if the number of columns in the file is less than the number of expected column names.
- Example
>>> col_names = ['Animal_1_Nose_x', 'Animal_1_Nose_y', 'Animal_1_Nose_p', 'Animal_1_Ear_left_x', ...]
>>> df = read_dlc_superanimal_h5(path='project_folder/videos/Video_1.h5', col_names=col_names)
- simba.utils.read_write.read_facemap_h5(file_path: Union[str, PathLike]) DataFrame[source]ο
Convert FaceMap pose-estimation data to pandas Dataframe format.
See also
See FaceMap GitHub repository for expected H5 file format: https://github.com/MouseLand/facemap
- Parameters
file_path (Union[str, os.PathLike]) – Path to FaceMap data file in .h5 format.
- Returns
FaceMap pose-estimation data in DataFrame format.
- Return type
pd.DataFrame
- simba.utils.read_write.read_frm_of_video(video_path: Union[str, PathLike, VideoCapture], frame_index: Optional[int] = 0, opacity: Optional[float] = None, size: Optional[Tuple[int, int]] = None, keep_aspect_ratio: bool = False, greyscale: Optional[bool] = False, black_and_white: Optional[bool] = False, clahe: Optional[Union[Tuple[int, int, int], bool]] = False, use_ffmpeg: Optional[bool] = False, raise_error: Optional[bool] = True) Optional[ndarray][source]ο
Reads a single frame from a video file.
See also
To read a batch of images with GPU acceleration, see simba.utils.read_write.read_img_batch_from_video_gpu(). To read a batch of frames using multicore CPU acceleration, see simba.utils.read_write.read_img_batch_from_video(). To read frame batches asynchronously, see simba.video_processors.async_frame_reader.AsyncVideoFrameReader().
- Parameters
video_path (Union[str, os.PathLike, cv2.VideoCapture]) – Path to video file, or cv2.VideoCapture object.
frame_index (Optional[int]) – The frame index to return (0-based). Default: 0. If -1 is passed, the last frame of the video is read.
opacity (Optional[float]) – Value between 0 and 100, or None. If a float, returns the image with that opacity: 100 is fully opaque, 0 is fully transparent.
size (Optional[Tuple[int, int]]) – If tuple (width, height), resizes the image. If None, returns original image size. When used with keep_aspect_ratio=True, the image is resized to fit within the target size while maintaining aspect ratio.
keep_aspect_ratio (bool) – If True and size is provided, resizes the image to fit within the target size while maintaining aspect ratio. If False, resizes to exact size (may distort aspect ratio). Default False.
greyscale (Optional[bool]) – If True, returns the greyscale image. Default False.
black_and_white (Optional[bool]) – If True, returns black and white image at threshold 127. Default False.
clahe (Optional[Union[Tuple[int, int, int], bool]]) – CLAHE settings. If True, uses default CLAHE (clipLimit=2, tileGridSize=(16, 16)). If a 3-tuple, interpreted as (clip_limit, tile_x, tile_y). If False/None, CLAHE is not applied.
use_ffmpeg (Optional[bool]) – If True, uses FFmpeg for frame extraction instead of OpenCV. Default False.
raise_error (Optional[bool]) – If True, raises error on failure. If False, returns None on failure. Default True.
- Returns
Image as numpy array, or None if raise_error=False and an error occurs.
- Return type
Union[np.ndarray, None]
- Example
>>> img = read_frm_of_video(video_path='/Users/simon/Desktop/envs/platea_featurizer/data/video/3D_Mouse_5-choice_MouseTouchBasic_s9_a4_grayscale.mp4')
>>> cv2.imshow('img', img)
>>> cv2.waitKey(5000)
- simba.utils.read_write.read_img(img_path: Union[str, PathLike], greyscale: bool = False, clahe: bool = False, opacity: Optional[float] = None) ndarray[source]ο
- simba.utils.read_write.read_img_batch_from_video(video_path: Union[str, PathLike], start_frm: Optional[int] = None, end_frm: Optional[int] = None, greyscale: bool = False, black_and_white: bool = False, clahe: bool = False, core_cnt: int = -1, size: Optional[Tuple[int, int]] = None, pool: Optional[Pool] = None, verbose: bool = False) Dict[int, ndarray][source]ο
Read a batch of frames from a video file. This method reads frames from a specified range of frames within a video file using multiprocessing.
EXPECTED RUNTIMES
READ FRAME COUNT    TIME (S)       STDEV (S)
1000                7.149766667    1.001209181
2000                8.874533333    0.258467219
REPEATS = 3
RESOLUTION: 670 x 530
CORES: 24
See also
For GPU acceleration, see simba.utils.read_write.read_img_batch_from_video_gpu().
Note
When black-and-white videos are saved as MP4, compression can introduce small errors in pixel values. A video with only (0, 255) pixel values therefore gets other pixel values, close to 0 and 255, when read in again. If you expect the video you are reading to be black and white, set black_and_white to True to round any of these wonky values to 0 and 255.
- Parameters
video_path (Union[str, os.PathLike]) – Path to the video file.
start_frm (int) – Starting frame index.
end_frm (int) – Ending frame index.
core_cnt (Optional[int]) – Number of CPU cores to use for parallel processing. Default is -1, which uses all available cores.
greyscale (Optional[bool]) – If True, reads the images as greyscale. If False, then as original color scale. Default: False.
black_and_white (bool) – If True, returns the images in black and white. Default False.
clahe (bool) – If True, returns CLAHE-enhanced images.
- Returns
A dictionary containing frame indices as keys and corresponding frame arrays as values.
- Return type
Dict[int, np.ndarray]
- Example
>>> read_img_batch_from_video(video_path='/Users/simon/Desktop/envs/troubleshooting/two_black_animals_14bp/videos/Together_1.avi', start_frm=0, end_frm=50)
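The rounding that black_and_white performs on near-black and near-white pixels can be pictured as a simple threshold. The sketch below is illustrative only: it uses nested lists in place of NumPy arrays to stay dependency-free, the helper name snap_to_black_and_white is hypothetical, and the cut-off of 127 is borrowed from the read_frm_of_video description and assumed to apply here.

```python
from typing import List


def snap_to_black_and_white(frame: List[List[int]], threshold: int = 127) -> List[List[int]]:
    """Map every pixel above the threshold to 255 and every other pixel to 0."""
    return [[255 if px > threshold else 0 for px in row] for row in frame]


# A frame that was pure black and white before MP4 compression added small errors.
noisy_frame = [
    [0, 3, 252, 255],
    [1, 0, 254, 249],
]
clean_frame = snap_to_black_and_white(noisy_frame)
```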
- simba.utils.read_write.read_img_batch_from_video_gpu(video_path: Union[str, PathLike], start_frm: Optional[int] = None, end_frm: Optional[int] = None, verbose: bool = False, greyscale: bool = False, black_and_white: bool = False, out_format: typing_extensions.Literal['dict', 'array'] = 'dict') Union[Dict[int, ndarray], ndarray][source]ο
Reads a batch of frames from a video file using GPU acceleration.
EXPECTED RUNTIMES
READ FRAME COUNT    TIME (S)       STDEV (S)
1000                0.679366667    0.006305817
2000                1.269433333    0.133388543
4000                2.8926         0.343663338
8000                5.2628         0.293268546
16000               14.2577        1.20444887
REPEATS = 3
RESOLUTION: 670 x 530
This function uses FFmpeg with CUDA acceleration to read frames from a specified range in a video file. It supports both RGB and greyscale video formats. Frames are returned as a dictionary where the keys are frame indices and the values are NumPy arrays representing the image data.
Note
When black-and-white videos are saved as MP4, compression can introduce small errors in pixel values. A video with only (0, 255) pixel values therefore gets other pixel values, close to 0 and 255, when read in again. If you expect the video you are reading to be black and white, set black_and_white to True to round any of these wonky values to 0 and 255.
See also
For CPU multicore acceleration, see simba.mixins.image_mixin.ImageMixin.read_img_batch_from_video() or simba.utils.read_write.read_img_batch_from_video().
- Parameters
video_path – Path to the video file. Can be a string or an os.PathLike object.
start_frm – The starting frame index to read. If None, starts from the beginning of the video.
end_frm – The ending frame index to read. If None, reads until the end of the video.
verbose – If True, prints progress information to the console.
greyscale – If True, returns the images in greyscale. Default False.
black_and_white – If True, returns the images in black and white. Default False.
- Returns
A dictionary where keys are frame indices (integers) and values are NumPy arrays containing the image data of each frame.
- simba.utils.read_write.read_json(x: Union[str, PathLike, List[Union[str, PathLike]]], encoding: str = 'utf-8', raise_error: bool = True) dict[source]ο
Reads one or multiple JSON files from disk and returns their contents as a dictionary.
- Parameters
x (Union[Union[str, os.PathLike], List[Union[str, os.PathLike]]]) – A path or list of paths to JSON files on disk.
- Returns
A dictionary with JSON data. If multiple files are provided, keys are derived from filenames.
- Return type
dict
- simba.utils.read_write.read_meta_file(meta_file_path: Union[str, PathLike]) dict[source]ο
Read a single SimBA model config meta CSV file into a Python dictionary.
- Parameters
meta_file_path (str) – Path to SimBA config meta file.
- Return dict
Dictionary holding model parameters.
- Example
>>> read_meta_file('project_folder/configs/Attack_meta_0.csv')
{'Classifier_name': 'Attack', 'RF_n_estimators': 2000, 'RF_max_features': 'sqrt', 'RF_criterion': 'gini', ...}
- simba.utils.read_write.read_pickle(data_path: Union[str, PathLike], verbose: Optional[bool] = False) Dict[Any, Any][source]ο
Read a single pickled object or a directory of pickled objects. If a directory, returns a dict with sequential integer keys, one for each object.
- Parameters
- Returns
Dictionary representation of the pickle.
- Return type
Dict[Any, Any]
- Example
>>> data = read_pickle(data_path='/test/unsupervised/cluster_models')
- simba.utils.read_write.read_project_path_and_file_type(config: ConfigParser) Tuple[str, str][source]ο
Helper to read the path and file type of the SimBA project from the project_config.ini.
- Parameters
config (configparser.ConfigParser) – Parsed SimBA config in configparser.ConfigParser format.
- Returns
The path of the project project_folder and the set file type of the project (i.e., csv or parquet) as a two-part tuple.
- Return type
Tuple[str, str]
- simba.utils.read_write.read_roi_data(roi_path: Union[str, PathLike]) Tuple[DataFrame, DataFrame, DataFrame][source]ο
Method to read in ROI definitions from SimBA project.
- Parameters
roi_path (Union[str, os.PathLike]) – Path to ROI_definitions.h5 on disk.
- Returns
3-part Tuple of dataframes representing rectangles, circles, polygons.
- Return type
Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]
- simba.utils.read_write.read_shap_feature_categories_csv() Tuple[DataFrame, List[str], List[str], List[str]][source]ο
Helper to read feature names and their categories used for binning and visualizing Shapley values.
- simba.utils.read_write.read_shap_img_paths()[source]ο
Helper to read in the images used to create the SHAP visualization
- simba.utils.read_write.read_simba_meta_files(folder_path: str, raise_error: bool = False) List[str][source]ο
Read in the paths of the SimBA model config files in a directory (project_folder/configs). Considers only files that have the meta suffix.
- Parameters
- Returns
List of paths to SimBA model config meta files.
- Return type
List[str]
- Example
>>> read_simba_meta_files(folder_path='/project_folder/configs')
['project_folder/configs/Attack_meta_1.csv', 'project_folder/configs/Attack_meta_0.csv']
- simba.utils.read_write.read_sleap_csv(file_path: Union[str, PathLike]) Tuple[DataFrame, list, list][source]ο
Reads and validates a SLEAP-exported CSV file containing tracking data.
- Parameters
file_path (Union[str, os.PathLike]) – Path to the SLEAP CSV file.
- Returns
Tuple with (i) the validated and cleaned DataFrame, (ii) a list of unique body part names, (iii) a flattened list of coordinate column names for each body part (e.g., ['nose.x', 'nose.y', …]) excluding probability scores.
- Return type
Tuple[pd.DataFrame, list, list]
- simba.utils.read_write.read_sleap_h5(file_path: Union[str, PathLike]) DataFrame[source]ο
Helper to read in SLEAP H5 file in format expected by SimBA
- simba.utils.read_write.read_video_info(video_name: str, video_info_df: Optional[DataFrame] = None, vid_info_df: Optional[DataFrame] = None, raise_error: Optional[bool] = True) Union[Tuple[DataFrame, float, float], Tuple[None, None, None]][source]ο
Helper to read the metadata (pixels per mm, resolution, fps etc) from the video_info.csv for a single input file/video
- Parameters
vid_info_df (pd.DataFrame) – Parsed project_folder/logs/video_info.csv file. This file can be parsed by simba.utils.read_write.read_video_info_csv().
video_info_df (pd.DataFrame) – Alias for vid_info_df. If both are provided, vid_info_df is used.
video_name (str) – Name of the video as represented in the Video column of the project_folder/logs/video_info.csv file.
raise_error (Optional[bool]) – If True, raises an error if the video cannot be found in vid_info_df. If False, returns None if the video cannot be found.
- Returns
3-part tuple: a one-row DataFrame representing the video in the project_folder/logs/video_info.csv file, the frame rate of the video, and the pixels per millimeter of the video.
- Return type
Union[Tuple[pd.DataFrame, float, float], Tuple[None, None, None]]
- Example
>>> video_info_df = read_video_info_csv(file_path='project_folder/logs/video_info.csv')
>>> read_video_info(vid_info_df=video_info_df, video_name='Together_1')
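The lookup of a single video in the parsed log can be sketched with the csv module. Everything here is a simplified stand-in: lookup_video and VideoInfo are hypothetical names, rows are plain dicts rather than a DataFrame, and the fps and pixels/mm column headers are assumptions (only the Video column is named in the description above). Results are read by field name, so the sketch does not depend on the tuple order.

```python
import csv
import io
from typing import List, NamedTuple, Optional


class VideoInfo(NamedTuple):
    row: Optional[dict]
    fps: Optional[float]
    px_per_mm: Optional[float]


def lookup_video(rows: List[dict], video_name: str, raise_error: bool = True) -> VideoInfo:
    """Find the single row whose Video column equals video_name."""
    matches = [r for r in rows if r["Video"] == video_name]
    if len(matches) != 1:
        if raise_error:
            raise ValueError(f"Expected exactly one row for {video_name}, found {len(matches)}")
        return VideoInfo(None, None, None)
    row = matches[0]
    return VideoInfo(row=row, fps=float(row["fps"]), px_per_mm=float(row["pixels/mm"]))


# In-memory stand-in for project_folder/logs/video_info.csv (assumed column names).
log = io.StringIO("Video,fps,pixels/mm\nTogether_1,30,4.5\nTogether_2,25,3.9\n")
rows = list(csv.DictReader(log))

info = lookup_video(rows, "Together_1")
missing = lookup_video(rows, "Together_9", raise_error=False)
```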
- simba.utils.read_write.read_video_info_csv(file_path: Union[str, PathLike], raise_error: bool = True) DataFrame[source]ο
Helper to read the project_folder/logs/video_info.csv of the SimBA project in as a pd.DataFrame
- Parameters
file_path (Union[str, os.PathLike]) – Path to the project_folder/logs/video_info.csv file.
raise_error (bool) – If True, raises an error if the entries in the file are not of the expected format. Default True.
- Returns
Dataframe representation of the file.
- Return type
pd.DataFrame
- simba.utils.read_write.read_yolo_bp_names_file(file_path: Union[str, PathLike]) Tuple[str][source]ο
Helper to read CSV with single column listing body-part names.
- simba.utils.read_write.recursive_file_search(directory: Union[str, PathLike], extensions: Union[str, List[str]], case_sensitive: bool = False, substrings: Optional[Union[str, List[str]]] = None, skip_substrings: Optional[Union[str, List[str]]] = None, raise_error: bool = True, as_dict: bool = False) Union[List[str], Dict[str, str]][source]ο
Recursively search a directory and all its subdirectories for files that (i) contain any of the given substrings in their filename, and (ii) have one of the specified file extensions.
- Parameters
directory – Directory to start the search from.
substrings – A substring or list of substrings to match in filenames. If None, all files with the specified extensions will be returned.
skip_substrings – A substring or list of substrings to exclude. If a filename contains any of these substrings, the file is removed from the results. If None, no files are excluded.
extensions – A file extension or list of allowed extensions (with or without dot).
case_sensitive – If True, substring match is case-sensitive. Default False.
raise_error – If True, raise an error if no matches are found.
as_dict – If True, return a dictionary where the file names are the keys and the file paths are the values.
- Returns
List of matching file paths.
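The filtering rules above can be sketched with os.walk. The function find_files is a hypothetical, simplified version written for illustration: it leaves out raise_error and as_dict, and it treats extensions as case-insensitive, which is an assumption.

```python
import os
import tempfile


def find_files(directory, extensions, substrings=None, skip_substrings=None, case_sensitive=False):
    """Walk directory recursively and return the sorted paths that pass all filters."""
    def as_list(x):
        if x is None:
            return []
        return [x] if isinstance(x, str) else list(x)

    def norm(s):
        return s if case_sensitive else s.lower()

    # Accept extensions with or without the leading dot.
    exts = {"." + e.lower().lstrip(".") for e in as_list(extensions)}
    wanted = [norm(s) for s in as_list(substrings)]
    skipped = [norm(s) for s in as_list(skip_substrings)]

    matches = []
    for root, _, files in os.walk(directory):
        for name in files:
            if os.path.splitext(name)[1].lower() not in exts:
                continue
            key = norm(name)
            if wanted and not any(s in key for s in wanted):
                continue
            if any(s in key for s in skipped):
                continue
            matches.append(os.path.join(root, name))
    return sorted(matches)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a"))
    os.makedirs(os.path.join(tmp, "b"))
    for rel in ("a/Video_1.csv", "a/video_2_temp.csv", "b/notes.txt", "Video_3.CSV"):
        open(os.path.join(tmp, *rel.split("/")), "w").close()
    found = sorted(os.path.basename(p) for p in
                   find_files(tmp, "csv", substrings="video", skip_substrings="temp"))
    found_strict = find_files(tmp, ".csv", substrings="video", case_sensitive=True)
    found_strict = [os.path.basename(p) for p in found_strict]
```

With case_sensitive=True only the lowercase name video_2_temp.csv contains "video", which shows why the default case-insensitive match is the more forgiving choice.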
- simba.utils.read_write.remove_a_folder(folder_dir: Union[str, PathLike], ignore_errors: Optional[bool] = True, verbose: bool = False) None[source]ο
Helper to remove a directory.
- simba.utils.read_write.remove_files(file_paths: List[Union[str, PathLike]], raise_error: Optional[bool] = False) None[source]ο
Delete (remove) the files specified within a list of filepaths.
- Parameters
file_paths (List[Union[str, os.PathLike]]) – A list of file paths to be removed.
raise_error (Optional[bool]) – If True, raise exceptions for errors during file deletion. Else, pass. Defaults to False.
- Examples
>>> file_paths = ['/path/to/file1.txt', '/path/to/file2.txt']
>>> remove_files(file_paths, raise_error=True)
- simba.utils.read_write.remove_multiple_folders(folders: List[Union[str, PathLike]], raise_error: Optional[bool] = False) None[source]ο
Helper to remove multiple directories.
- Parameters
folders (List[os.PathLike]) – List of directory paths.
raise_error (bool) – If True, raise NotDirectoryError if a folder does not exist. If False, then pass. Default False.
- Raises
NotDirectoryError – If raise_error is True and a directory does not exist.
- Example
>>> remove_multiple_folders(folders=['gerbil/gerbil_data/featurized_data/temp'])
- simba.utils.read_write.save_json(data: dict, filepath: Union[str, PathLike], encoding: str = 'utf-8') None[source]ο
Saves a dictionary as a JSON file to the specified filepath.
- Parameters
data (dict) – Dictionary containing data to save.
filepath (Union[str, os.PathLike]) – Path where the JSON file should be saved.
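A round trip through save_json and read_json can be pictured with the json module. The two functions below are hypothetical stand-ins for illustration; in particular, using the file name without its extension as the key when several files are read is an assumption about how keys are derived from filenames.

```python
import json
import os
import tempfile


def save_json_sketch(data: dict, filepath, encoding: str = "utf-8") -> None:
    """Write a dictionary to disk as JSON."""
    with open(filepath, "w", encoding=encoding) as f:
        json.dump(data, f, indent=4)


def read_json_sketch(x, encoding: str = "utf-8") -> dict:
    """Read one JSON file, or several files keyed by file name without extension."""
    if isinstance(x, (str, os.PathLike)):
        with open(x, encoding=encoding) as f:
            return json.load(f)
    out = {}
    for path in x:
        key = os.path.splitext(os.path.basename(path))[0]
        with open(path, encoding=encoding) as f:
            out[key] = json.load(f)
    return out


with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "settings_a.json")
    path_b = os.path.join(tmp, "settings_b.json")
    save_json_sketch({"fps": 30}, path_a)
    save_json_sketch({"fps": 25}, path_b)
    single = read_json_sketch(path_a)
    combined = read_json_sketch([path_a, path_b])
```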
- simba.utils.read_write.seconds_to_timestamp(seconds: Union[int, float, List[Union[int, float]]], hh_mm_ss_sss: bool = False) Union[str, List[str]][source]ο
Convert an integer/float number of seconds, or a list of seconds, to a timestamp string.
- simba.utils.read_write.str_2_bool(input_str: str) bool[source]ο
Helper to convert string representation of bool to bool.
- Example
>>> str_2_bool(input_str='yes')
True
- simba.utils.read_write.tabulate_clf_info(clf_path: Union[str, PathLike]) None[source]ο
Print the hyperparameters and creation date of a pickled classifier.
- Parameters
clf_path (str) – Path to classifier.
- Raises
InvalidFilepathError – The file is not a pickle or not a scikit-learn RF classifier.
- simba.utils.read_write.terminate_cpu_pool(pool: Pool, force: bool = False, verbose: bool = True, source: Optional[str] = None) None[source]ο
Safely terminates a multiprocessing.Pool instance with optional graceful shutdown.
Note
If pool is None or invalid, function returns without action. Exceptions during termination are silently caught.
- Parameters
pool (multiprocessing.pool.Pool) – The multiprocessing pool to terminate. If None, function returns without action.
force (bool) – If True, skips graceful shutdown (close/join) and immediately terminates. Default: False.
verbose (bool) – If True, prints termination message with timestamp. Default: True.
source (Optional[str]) – Optional identifier string for logging purposes (e.g., 'VideoProcessor'). Default: None.
- Example
>>> import multiprocessing
>>> pool = multiprocessing.Pool(4)
>>> terminate_cpu_pool(pool=pool, force=False, verbose=True, source='FeatureExtractor')
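The difference between the graceful path (close, then join) and the forced path (terminate only) can be sketched as below. The helper shutdown_pool is hypothetical and returns a bool purely so the outcome is visible. A ThreadPool is used in place of a process pool so that the snippet runs without a __main__ guard; it exposes the same close/join/terminate methods.

```python
from multiprocessing.pool import ThreadPool


def shutdown_pool(pool, force: bool = False) -> bool:
    """Shut a pool down and swallow errors. Returns True if a pool was shut down."""
    if pool is None:
        return False
    try:
        if not force:
            pool.close()   # stop accepting new work
            pool.join()    # wait for submitted work to finish
        pool.terminate()   # release the workers
        return True
    except Exception:
        return False


pool = ThreadPool(2)
squares = pool.map(lambda x: x * x, range(4))
stopped = shutdown_pool(pool, force=False)
nothing_to_stop = shutdown_pool(None)
```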
- simba.utils.read_write.timestamp_to_seconds(timestamp: str) int[source]ο
Returns the number of seconds into the video given a timestamp in HH:MM:SS format.
- Parameters
timestamp (str) – Timestamp in HH:MM:SS format.
- Returns
The timestamp as seconds.
- Return type
int
- Raises
FrameRangeError – If timestamp is not a valid format.
- Example
>>> timestamp_to_seconds(timestamp='00:00:05')
5
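The arithmetic behind the HH:MM:SS conversion, and its inverse, is short enough to write out. The functions hms_to_seconds and seconds_to_hms below are illustrative stand-ins, not the SimBA implementations: they raise ValueError rather than FrameRangeError and ignore the hh_mm_ss_sss option.

```python
def hms_to_seconds(timestamp: str) -> int:
    """Convert an HH:MM:SS string to a whole number of seconds."""
    parts = timestamp.split(":")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"{timestamp!r} is not in HH:MM:SS format")
    hours, minutes, seconds = (int(p) for p in parts)
    if minutes > 59 or seconds > 59:
        raise ValueError(f"{timestamp!r} has minutes or seconds above 59")
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(seconds: float) -> str:
    """Convert a number of seconds to an HH:MM:SS string, dropping any fraction."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


five = hms_to_seconds("00:00:05")
stamp = seconds_to_hms(3725)
```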
- simba.utils.read_write.write_df(df: DataFrame, file_type: str, save_path: Union[str, PathLike], multi_idx_header: bool = False, verbose: bool = False) None[source]ο
Write single tabular data file.
Note
For improved runtime, defaults to pyarrow.csv if file_type == csv.
EXPECTED RUNTIMES
DATAFRAME SIZE (RAM GB)    TIME (S)       STDEV TIME (S)
0.1                        1.311          0.057529731
0.25                       3.247433333    0.017068782
0.5                        6.403333333    0.12338887
1                          12.627         0.040894009
1.5                        18.83206667    0.138718576
2                          25.7713        0.348281366
2.5                        31.81306667    0.604449711
3                          38.13923333    1.063170773
- Parameters
df (pd.DataFrame) – Pandas dataframe to save to disk.
file_type (str) – Type of data. OPTIONS: parquet, csv, pickle.
save_path (str) – Location where to store the data.
multi_idx_header (bool) – Whether the data has multi-index headers. Default: False.
verbose (bool) – Prints message on completion. Default: False.
- Example
>>> write_df(df=df, file_type='csv', save_path='project_folder/csv/input_csv/Video_1.csv')
SimBA Warnings
- simba.utils.warnings.ThirdPartyAnnotationEventCountWarning(video_name: str, clf_name: str, start_event_cnt: int, stop_event_cnt: int, source: str = '', log_status: bool = False)[source]ο
- simba.utils.warnings.ThirdPartyAnnotationFileNotFoundWarning(video_name: str, source: str = '', log_status: bool = False)[source]ο
- simba.utils.warnings.ThirdPartyAnnotationOverlapWarning(video_name: str, clf_name: str, source: str = '', log_status: bool = False)[source]ο
- simba.utils.warnings.ThirdPartyAnnotationsAdditionalClfWarning(video_name: str, clf_names: list, source: str = '', log_status: bool = False)[source]ο
- simba.utils.warnings.ThirdPartyAnnotationsClfMissingWarning(video_name: str, clf_name: str)[source]ο
- simba.utils.warnings.ThirdPartyAnnotationsFpsConflictWarning(video_name: str, annotation_fps: int, video_fps: int, source: str = '')[source]ο
- simba.utils.warnings.ThirdPartyAnnotationsInvalidFileFormatWarning(annotation_app: str, file_path: str, source: str = '', log_status: bool = False)[source]ο
- simba.utils.warnings.ThirdPartyAnnotationsMissingAnnotationsWarning(video_name: str, clf_names: list, source: str = '', log_status: bool = False)[source]ο
SimBA CLI tools
- simba.utils.cli.cli_tools.blob_tracker(config_path: Union[str, PathLike]) None[source]ο
Method to access blob detection through CLI or notebook
Note
For an example blob detection config file, see https://github.com/sgoldenlab/simba/blob/master/misc/blob_definitions_ex.json.
- Parameters
config_path (Union[str, os.PathLike]) – Path to JSON file holding blob detection settings.
- Returns
None. The blob detection data is saved at the location specified in the config_path.
- Return type
None
- Example
>>> blob_tracker('/Users/simon/Downloads/result_bg/blob_definitions.json')
- simba.utils.cli.cli_tools.feature_extraction_runner(config_path: Union[str, PathLike]) None[source]ο
Helper to run feature extraction from CLI.
- Parameters
config_path – Path to SimBA project config file in ini format.
- simba.utils.cli.cli_tools.set_outlier_correction_criteria_cli(config_path: Union[str, PathLike], movement_criterion: float, location_criterion: float, aggregation: typing_extensions.Literal['mean', 'median'], body_parts: dict)[source]ο
Helper to set outlier settings in a SimBA project_config.ini from command line