hoops_ai.storage.DatasetInfo

class hoops_ai.storage.DatasetInfo(info_files, merged_store_path, attribute_file_path, schema=None)

Bases: object

Handles:
  • Extraction of metadata from .json info_files

  • Building file-ID -> code mappings

  • Internal management of two Parquet files: one for the core info data (merged_store_path) and one for attributes (attribute_file_path)

Parameters:

info_files

merged_store_path

attribute_file_path

schema (default: None)

build_code_mappings(zip_files)

Given a list of .zip file paths, generates a mapping from each file ID (the file-name stem) to an integer code, stores it internally, and returns it.

Returns:

A mapping such as {"cadfile1": 0, "cadfile2": 1, …}

Return type:

Dict[str, int]

Parameters:

zip_files (List[str])
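
The stem-to-code mapping described above can be illustrated with a minimal, standalone sketch. It does not call hoops_ai; the function name build_code_mappings_sketch is hypothetical, and the choice to assign codes in input order starting at 0 (with repeated stems keeping their first code) is an assumption that merely agrees with the example return value shown above.

```python
from pathlib import Path
from typing import Dict, List


def build_code_mappings_sketch(zip_files: List[str]) -> Dict[str, int]:
    """Hypothetical stand-in: map each file stem to an integer code."""
    mapping: Dict[str, int] = {}
    for path in zip_files:
        stem = Path(path).stem  # "data/cadfile1.zip" -> "cadfile1"
        if stem not in mapping:
            # Assumption: codes follow input order, starting at 0.
            mapping[stem] = len(mapping)
    return mapping


codes = build_code_mappings_sketch(["data/cadfile1.zip", "data/cadfile2.zip"])
print(codes)
```

Whether the real method orders codes by input position, by sorted stem, or by some other rule is not stated in this reference, so check the library before relying on specific code values.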

parse_info_files()

Loads all info_files into an internal dictionary and routes the metadata according to the schema-driven routing rules. Once the files have loaded successfully, the processed JSON files are deleted; the parent folder is also removed if every file was processed.

Return type:

None
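
The load-then-clean-up behaviour described for parse_info_files() can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's implementation: the function name load_and_cleanup_sketch is hypothetical, the schema-driven routing step is not modelled, and keying the dictionary by file stem is an assumption.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def load_and_cleanup_sketch(info_files: List[str]) -> Dict[str, Any]:
    """Hypothetical stand-in: load JSON files, then delete the ones that loaded."""
    loaded: Dict[str, Any] = {}
    processed: List[Path] = []
    for name in info_files:
        path = Path(name)
        try:
            with path.open("r", encoding="utf-8") as handle:
                loaded[path.stem] = json.load(handle)  # assumption: keyed by stem
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or malformed files are left in place
        processed.append(path)

    # Delete only the files that were loaded successfully.
    for path in processed:
        path.unlink()

    # Remove parent folders only when every file was processed and the folder is empty.
    if len(processed) == len(info_files):
        for parent in {p.parent for p in processed}:
            if not any(parent.iterdir()):
                parent.rmdir()
    return loaded


# Demo on a throwaway directory.
root = Path(tempfile.mkdtemp())
info_dir = root / "info"
info_dir.mkdir()
(info_dir / "cadfile1.json").write_text(json.dumps({"label": 3}), encoding="utf-8")
(info_dir / "cadfile2.json").write_text(json.dumps({"label": 7}), encoding="utf-8")

result = load_and_cleanup_sketch([str(p) for p in sorted(info_dir.glob("*.json"))])
folder_still_exists = info_dir.exists()
root.rmdir()  # clean up the temporary root
print(result, folder_still_exists)
```

Because the source files are deleted after loading, the practical consequence is that parse_info_files() should be treated as a one-shot step: keep a copy of the .json files if they may be needed again.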

set_attributes(metadata, table_name)

Stores structured metadata (e.g. label, face type, or custom categorical attributes) in the separate attributes Parquet file. The method is generic and handles any categorical metadata type.

Parameters:

metadata

table_name

Return type:

None

store_info_to_parquet(table_name='file_info')

Writes the loaded info data to the main Parquet file (self.merged_store_path) via self._info_parquet_manager. The metadata written is dynamic: it reflects whatever was actually extracted.

Parameters:

table_name (str)