hoops_ai.dataset.dataset_loader
Classes
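CADDataset | A framework-agnostic dataset object that contains training, validation, or testing data.
DatasetLoader | A framework-agnostic dataset class that splits data into train/validation/test subsets and returns them as CADDataset objects.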
- class hoops_ai.dataset.dataset_loader.CADDataset(parent_dataset, indices)
Bases: object
A framework-agnostic dataset object that contains training, validation, or testing data. It can be converted to framework-specific formats as needed.
- property data_files
Return the list of .bin file paths for this subset only.
- get_item(i)
Framework-agnostic item access
- get_raw_data(i)
Get raw file paths for an item without loading
- property label_datas
Return the list of label arrays for this subset only.
- remove_indices(local_indices_to_remove)
Remove items by local subset index. The items are also removed from the parent dataset, so the parent’s data_files and label_datas arrays and its indexing are updated. self.indices is then adjusted to match the parent’s new layout.
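The relationship between local subset indices and the parent's global indices can be sketched with two toy classes. ToyParent and ToySubset are hypothetical stand-ins written for this illustration, not the hoops_ai implementation; only the described behaviour (local removal propagates to the parent, then the subset's indices are renumbered) is taken from the text above.

```python
class ToyParent:
    """Hypothetical stand-in for the parent dataset (not the hoops_ai class)."""

    def __init__(self, data_files):
        self.data_files = list(data_files)

    def remove_indices(self, global_indices):
        """Drop items and return {old_index: new_index} for the survivors."""
        doomed = set(global_indices)
        mapping, kept = {}, []
        for old, path in enumerate(self.data_files):
            if old not in doomed:
                mapping[old] = len(kept)
                kept.append(path)
        self.data_files = kept
        return mapping


class ToySubset:
    """Hypothetical subset view that stores indices into its parent."""

    def __init__(self, parent, indices):
        self.parent = parent
        self.indices = list(indices)

    @property
    def data_files(self):
        # Only the files that belong to this subset.
        return [self.parent.data_files[g] for g in self.indices]

    def remove_indices(self, local_indices_to_remove):
        # Translate local positions into the parent's global indices.
        doomed_global = {self.indices[i] for i in local_indices_to_remove}
        mapping = self.parent.remove_indices(doomed_global)
        # Keep the survivors and renumber them to the parent's new layout.
        self.indices = [mapping[g] for g in self.indices if g not in doomed_global]


parent = ToyParent(["a.bin", "b.bin", "c.bin", "d.bin", "e.bin"])
subset = ToySubset(parent, [1, 3, 4])   # views b.bin, d.bin, e.bin
subset.remove_indices([1])              # local item 1 is d.bin
```

After the call, the parent holds four files and the subset points at b.bin and e.bin through the renumbered indices 1 and 3.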
- to_torch(collate_fn=None)
Convert to PyTorch Dataset using lazy import
- Parameters:
collate_fn – Optional collate function for batching
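The lazy-import idea behind to_torch can be illustrated as follows. This is a minimal sketch under stated assumptions: ToySubset is a hypothetical stand-in, and how the real CADDataset stores or uses collate_fn is not documented here, so the attribute assignment below is an assumption. The only real library name used is torch.utils.data.Dataset.

```python
class ToySubset:
    """Hypothetical stand-in; the real CADDataset internals are not shown."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def get_item(self, i):
        return self._items[i]

    def to_torch(self, collate_fn=None):
        # Lazy import: PyTorch is only required when this method is called.
        try:
            from torch.utils.data import Dataset
        except ImportError as exc:
            raise ImportError("to_torch() needs PyTorch to be installed") from exc

        source = self

        class _TorchView(Dataset):
            def __len__(self):
                return len(source)

            def __getitem__(self, i):
                return source.get_item(i)

        view = _TorchView()
        # Assumption: the collate function is carried along for a DataLoader.
        view.collate_fn = collate_fn
        return view


subset = ToySubset(["item-0", "item-1"])
first = subset.get_item(0)   # works even when PyTorch is not installed
```

With this design, code that only needs framework-agnostic access never pays the cost of importing PyTorch, and the import error surfaces only at the point of conversion.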
- class hoops_ai.dataset.dataset_loader.DatasetLoader(merged_store_path=None, parquet_file_path=None, item_loader_func=None)
Bases: object
A framework-agnostic dataset class that:
Internally creates a DatasetExplorer from the .dataset and infoset files, based on any available group/key
Builds a membership matrix for multi-label stratification
Splits data into train/validation/test subsets
Provides get_dataset(…) to get a CADDataset object
Offers remove_indices(…) returning a map of old -> new indices
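A membership matrix of the kind mentioned in the list above can be pictured as one row per item and one column per category. The sketch below is an illustration only: build_membership_matrix and the sample labels are made up for this example, and the library's actual construction may differ.

```python
def build_membership_matrix(item_labels, categories=None):
    """Return (categories, matrix); matrix[i][j] is 1 if item i has category j."""
    if categories is None:
        # Infer categories from the data, sorted for a stable column order.
        categories = sorted({label for labels in item_labels for label in labels})
    column = {cat: j for j, cat in enumerate(categories)}
    matrix = []
    for labels in item_labels:
        row = [0] * len(categories)
        for label in labels:
            if label in column:
                row[column[label]] = 1
        matrix.append(row)
    return list(categories), matrix


# Made-up labels: each item may carry several categories, or none.
item_labels = [{"hole", "pocket"}, {"slot"}, {"hole"}, set()]
categories, matrix = build_membership_matrix(item_labels)
```

Because an item can belong to several columns at once, a splitter can use the column sums to balance every category across the subsets rather than a single label per item.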
- Parameters:
- available_arrays(group_name)
Get all available arrays for a specific group.
- close_resources(clear_split_history=True)
Close and cleanup resources, particularly the DatasetExplorer instance.
- Parameters:
clear_split_history (bool) – If True, also clears the split history
- diagnose_file_codes_mismatch(file_codes=None)
Diagnostic method to help understand file code and ID mismatches. Call this method when experiencing issues with file code mapping.
- find_group_for_array(array_name)
Find which group contains a specific array. Returns None if the array is not found in any group.
- get_available_stratification_keys()
Get all available keys that can be used for stratification, grouped by their containing group.
- Return type:
- get_dataset(subset, key=None)
Return a framework-agnostic CADDataset for ‘train’, ‘validation’, or ‘test’.
- Parameters:
- Returns:
A framework-agnostic dataset containing the requested subset
- Return type:
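A typical call sequence might look like the sketch below. It uses only the signatures listed on this page, but it assumes that split() has to run before get_dataset() and that both path arguments are supplied; the function name load_training_subset and the paths passed to it are hypothetical.

```python
def load_training_subset(merged_store_path, parquet_file_path):
    """Split a dataset and return the loader, the training subset, and the counts."""
    # Imported here so the sketch can be loaded without hoops_ai installed.
    from hoops_ai.dataset.dataset_loader import DatasetLoader

    loader = DatasetLoader(
        merged_store_path=merged_store_path,
        parquet_file_path=parquet_file_path,
    )
    # The values below are the documented defaults of split().
    counts = loader.split(
        key="label", group="machining",
        train=0.8, validation=0.1, test=0.1, random_state=42,
    )
    train_set = loader.get_dataset("train")
    return loader, train_set, counts
```

The caller is expected to invoke loader.close_resources() once the subsets are no longer needed.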
- remove_indices(indices_to_remove)
Removes the given global indices from data_files/label_datas. Returns a dict mapping old_index -> new_index for items that remain.
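The returned old -> new map is useful for updating any indices held elsewhere, such as split lists. The following is a minimal sketch of that idea; remove_and_remap and remap_indices are hypothetical helpers, not functions of the library.

```python
def remove_and_remap(items, indices_to_remove):
    """Return (kept_items, old_to_new) after dropping the given indices."""
    doomed = set(indices_to_remove)
    kept, old_to_new = [], {}
    for old_index, item in enumerate(items):
        if old_index in doomed:
            continue
        old_to_new[old_index] = len(kept)
        kept.append(item)
    return kept, old_to_new


def remap_indices(indices, old_to_new):
    """Translate stored indices, dropping those that were removed."""
    return [old_to_new[i] for i in indices if i in old_to_new]


files = ["a.bin", "b.bin", "c.bin", "d.bin"]
splits = {"train": [0, 1, 3], "test": [2]}

kept, old_to_new = remove_and_remap(files, [1])
splits = {name: remap_indices(idx, old_to_new) for name, idx in splits.items()}
```

Removed indices have no entry in the map, so a membership test is enough to filter them out of any stored index list.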
- reset_split_state()
Reset the split state to allow for a new split with different parameters. This preserves the previous split results in _split_history.
- split(key='label', group='machining', categories=None, train=0.8, validation=0.1, test=0.1, random_state=42, force_reset=False)
Perform stratified split of dataset based on a key.
- Parameters:
key (str) – The key to stratify by (e.g. ‘label’, ‘machining_category’)
group (str) – The group where the key is stored
categories (List[Any] | None) – Optional list of categories. If None, they will be inferred.
train (float) – The fraction of data to use for training.
validation (float) – The fraction of data to use for validation.
test (float) – The fraction of data to use for testing.
random_state (int) – Random seed for reproducibility.
force_reset (bool) – If True, force reset the split state even if the key is the same.
- Returns:
(train_count, val_count, test_count)
- Return type:
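The effect of the train, validation, test, and random_state parameters can be illustrated with a simplified single-label stratified split. This is a sketch of the general technique, not the library's algorithm (which, per the class description, works from a multi-label membership matrix); stratified_split and the sample labels are made up for this example.

```python
import random
from collections import defaultdict


def stratified_split(labels, train=0.8, validation=0.1, test=0.1, random_state=42):
    """Split item indices per class so each subset keeps similar class ratios."""
    if abs(train + validation + test - 1.0) > 1e-9:
        raise ValueError("train, validation and test must sum to 1")
    rng = random.Random(random_state)

    # Group item indices by their label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    subsets = {"train": [], "validation": [], "test": []}
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        n_train = round(len(members) * train)
        n_val = round(len(members) * validation)
        subsets["train"] += members[:n_train]
        subsets["validation"] += members[n_train:n_train + n_val]
        # Whatever remains after train and validation goes to test.
        subsets["test"] += members[n_train + n_val:]
    return subsets


labels = ["hole"] * 10 + ["slot"] * 20
subsets = stratified_split(labels)
counts = tuple(len(subsets[name]) for name in ("train", "validation", "test"))
```

Fixing random_state makes the shuffle, and therefore the split, reproducible across runs.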