hoops_ai.storage.DatasetInfo
- class hoops_ai.storage.DatasetInfo(info_files, merged_store_path, attribute_file_path, schema=None)
Bases: object
Handles:
- Extraction of metadata from .json info_files
- Building file-ID -> code mappings
Internally manages two Parquet files:
1) One for the core info data (merged_store_path)
2) Another for attributes (attribute_file_path)
- Parameters:
- build_code_mappings(zip_files)
Given a list of .zip file paths, generates a mapping from file-ID (the file name stem) -> integer code, stores it internally, and returns it.
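The stem-to-code idea can be illustrated with a minimal standalone sketch. This is not the library's implementation: the helper name `build_code_mappings_sketch`, the sorted ordering, and the codes starting at 0 are all assumptions made for illustration, and the real method may assign codes differently.

```python
from pathlib import Path


def build_code_mappings_sketch(zip_files):
    """Hypothetical helper: map each .zip file's stem to an integer code.

    Assumptions (not confirmed by the documentation): stems are
    de-duplicated, sorted alphabetically, and numbered from 0.
    """
    # Path(...).stem drops the directory and the final extension,
    # so "data/part_a.zip" becomes "part_a".
    stems = sorted({Path(p).stem for p in zip_files})
    return {stem: code for code, stem in enumerate(stems)}


mapping = build_code_mappings_sketch(["data/part_b.zip", "data/part_a.zip"])
print(mapping)
```

Sorting before numbering keeps the codes stable across runs regardless of the order in which the paths are supplied, which is one reasonable design choice for a mapping that is later persisted.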
- parse_info_files()
Loads all info_files into an internal dictionary and routes the metadata according to the schema's routing rules. After successful loading, the processed JSON files are deleted; the parent folder is also deleted if every file was processed successfully.
- Return type:
None
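The load-then-clean-up behavior described for parse_info_files() can be sketched with the standard library alone. The function name `load_and_cleanup_sketch`, the use of the file stem as the dictionary key, and the error handling are assumptions for illustration; the schema-driven routing step is omitted entirely, and the real method returns None rather than the loaded data.

```python
import json
from pathlib import Path


def load_and_cleanup_sketch(info_files):
    """Hypothetical helper: load JSON files, then delete what was processed.

    Returns (loaded, failed) so the caller can inspect the outcome.
    """
    loaded = {}
    failed = []
    paths = [Path(p) for p in info_files]

    for path in paths:
        try:
            with path.open(encoding="utf-8") as fh:
                loaded[path.stem] = json.load(fh)
        except (OSError, json.JSONDecodeError):
            # Leave unreadable or malformed files on disk for inspection.
            failed.append(path)
            continue
        # The file was read successfully, so it can be removed.
        path.unlink()

    # Remove parent folders only when every file was processed.
    if not failed:
        for parent in {p.parent for p in paths}:
            try:
                parent.rmdir()  # succeeds only if the folder is empty
            except OSError:
                pass  # folder still holds unrelated files; keep it

    return loaded, failed
```

Because the source files are deleted, a call like this is destructive: anything not captured in the returned dictionary (or, in the real class, in the Parquet stores) is gone afterwards.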
- set_attributes(metadata, table_name)
Stores structured metadata (e.g. label, face type, or custom categorical attributes) in the separate attributes Parquet file. The method is generic and handles any categorical metadata type.