hoops_ai.storage.DatasetMerger
- class hoops_ai.storage.DatasetMerger(zip_files=None, merged_store_path=None, file_id_codes=None, dask_client_params=None, delete_source_files=True)
Bases: object
Handles the merging of multiple .zip Zarr files into a single store (plus optional partial batching logic).
- Parameters:
- close_dask_resources(close_global=False)
Close the Dask client and cluster that this class manages. If close_global is True, also close any globally active Dask client.
- Parameters:
close_global (bool)
- Return type:
None
- discover_groups_from_files(max_files_to_scan=5)
Dynamically discover the group and array structure from the input files. Uses stored schemas if available; otherwise falls back to naming heuristics for backward compatibility.
Parameters:
- max_files_to_scan (int)
Maximum number of files to scan for structure discovery
Returns:
- Dict[str, Dict[str, List[str]]]
Discovered group structure in DATA_GROUPS format
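The naming-heuristic fallback can be pictured with a minimal sketch. This is not the library's implementation: it assumes the archives follow the Zarr v2 layout, in which every array directory contains a `.zarray` metadata file. The function name `sketch_discover_groups` and its simplified return shape (group name mapped to array names, rather than the DATA_GROUPS format) are hypothetical.

```python
import zipfile
from collections import defaultdict
from typing import Dict, List, Sequence


def sketch_discover_groups(
    zip_paths: Sequence[str], max_files_to_scan: int = 5
) -> Dict[str, List[str]]:
    """Hypothetical helper: map each top-level group to its array names.

    Assumes a Zarr v2 layout, where every array directory holds a
    '.zarray' metadata file.
    """
    found = defaultdict(set)
    # Only the first few archives are inspected, mirroring max_files_to_scan.
    for path in list(zip_paths)[:max_files_to_scan]:
        with zipfile.ZipFile(path) as archive:
            for member in archive.namelist():
                parts = member.split("/")
                # 'group/array/.zarray' marks an array inside a top-level group.
                if len(parts) >= 3 and parts[-1] == ".zarray":
                    found[parts[0]].add(parts[-2])
    return {group: sorted(arrays) for group, arrays in found.items()}
```

Scanning only a handful of files keeps discovery cheap, on the assumption that all inputs share the same structure.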
- merge_data(face_chunk=500000, edge_chunk=500000, faceface_flat_chunk=100000, batch_size=None, consolidate_metadata=True, force_compression=True, use_streaming=True)
Merges all zip_files into a single Zarr store.
If force_compression is True, ensures the output uses the ZipStore format with a .dataset extension. If batch_size is None, merges all files in one shot. Otherwise, merges them in partial batches and then merges those partial stores into a final store.
Parameters:
- face_chunk (int)
Chunk size for face arrays
- edge_chunk (int)
Chunk size for edge arrays
- faceface_flat_chunk (int)
Chunk size for faceface arrays
- batch_size (Optional[int])
If provided, process files in batches of this size
- consolidate_metadata (bool)
Whether to consolidate metadata after merging
- force_compression (bool)
If True, ensures output is a compressed .dataset file instead of a directory
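The batch_size behaviour described for merge_data (one-shot merge versus partial batches that are merged again) can be illustrated with a small, library-independent sketch. The names `sketch_batched_merge` and `merge_fn` are hypothetical, and plain Python lists stand in for Zarr stores; the real method's internals may differ.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

Store = TypeVar("Store")


def sketch_batched_merge(
    stores: Sequence[Store],
    merge_fn: Callable[[List[Store]], Store],
    batch_size: Optional[int] = None,
) -> Store:
    """Hypothetical two-stage merge; merge_fn combines a list of stores."""
    if batch_size is None:
        # One shot: hand every input to the merge function at once.
        return merge_fn(list(stores))
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    # Stage 1: merge each slice of inputs into a partial store.
    partials = [
        merge_fn(list(stores[i:i + batch_size]))
        for i in range(0, len(stores), batch_size)
    ]
    # Stage 2: merge the partial stores into the final store.
    return merge_fn(partials)


def concat(stores: List[List[int]]) -> List[int]:
    """Toy merge function: concatenate list-based stand-in stores."""
    return [record for store in stores for record in store]


files = [[1], [2, 3], [4], [5]]
one_shot = sketch_batched_merge(files, concat)
batched = sketch_batched_merge(files, concat, batch_size=2)
```

In this sketch both paths yield the same result; batching only bounds how many inputs are handled in a single merge step.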