hoops_ai.storage

Quick Overview

Modules

hoops_ai.storage.datasetstorage

Classes

DataStorage()

MemoryStorage()

MLStorage()

DGLGraphStoreHandler()

LabelStorage(path_for_storing[, ...])

Class for encoding and decoding labels.

MetricStorage(store)

Abstract class defining the interface for storing machine learning metrics according to their data type and visualization style.

CADFileRetriever(storage_provider[, ...])

LocalStorageProvider(directory_path)

Functions

convert_storage(source_handler, dest_handler)

Generic converter that works with ANY DataStorage implementation.

Data Storage Module

The Storage module provides persistent storage solutions for CAD data, ML models, and analysis results. It offers a unified interface for various storage backends, optimized for the unique requirements of CAD data processing and machine learning workflows.

This module handles the efficient storage and retrieval of large-scale CAD datasets, encoded geometric data, trained ML models, and experimental results. It provides both high-performance options for production use and convenient formats for development and prototyping.

For storage architecture details and usage patterns, see the Data Storage Programming Guide.

class hoops_ai.storage.CADFileRetriever(storage_provider, formats=None, filter_pattern=None, use_regex=False)

Bases: object

Parameters:
  • storage_provider (StorageProvider)

  • formats (List[str] | None)

  • filter_pattern (str | None)

  • use_regex (bool)

get_file_list()

Returns the filtered list of CAD file paths.

Return type:

List[str]
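The filtering that CADFileRetriever describes (restrict by format, then by an optional glob or regex pattern) can be sketched in plain Python. The helper below is hypothetical and mirrors only the documented behavior, not the actual hoops_ai source:

```python
# Illustrative sketch of CADFileRetriever-style filtering: list files under a
# directory, keep matching formats, then apply an optional filter pattern
# (glob by default, regex when use_regex=True). filter_cad_files is a
# hypothetical helper, not part of hoops_ai.storage.
import fnmatch
import re
from pathlib import Path
from typing import List, Optional

def filter_cad_files(directory: str,
                     formats: Optional[List[str]] = None,
                     filter_pattern: Optional[str] = None,
                     use_regex: bool = False) -> List[str]:
    paths = [p for p in Path(directory).rglob("*") if p.is_file()]
    if formats:
        exts = {f.lower().lstrip(".") for f in formats}
        paths = [p for p in paths if p.suffix.lower().lstrip(".") in exts]
    if filter_pattern:
        if use_regex:
            rx = re.compile(filter_pattern)
            paths = [p for p in paths if rx.search(p.name)]
        else:
            paths = [p for p in paths if fnmatch.fnmatch(p.name, filter_pattern)]
    return sorted(str(p) for p in paths)
```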

class hoops_ai.storage.DataStorage

Bases: ABC

abstract close()

Handles any necessary cleanup or resource deallocation.

Return type:

None

compress_store()
Return type:

int

abstract format()

A string specifying the concrete format for this storage.

Return type:

str

abstract get_file_path(data_key)

Retrieves the file path for a given data key.

Parameters:

data_key (str) – The data key.

Returns:

The file path corresponding to the data key.

Return type:

str

get_group_for_array(array_name)

Determines which group an array belongs to based on the schema.

Parameters:

array_name (str) – Name of the array

Returns:

Group name for the array, or None if not found in schema

Return type:

str

abstract get_keys()

Retrieves a list of all keys in the storage.

Returns:

A list of all keys in the storage.

Return type:

list

get_schema()

Retrieves the schema definition for this storage instance.

Returns:

The schema definition, or empty dict if no schema is set

Return type:

dict

abstract load_data(data_key)

Loads data associated with a specific key.

Parameters:

data_key (str) – The key of the data to load.

Returns:

The loaded data.

Return type:

Any

abstract load_metadata(key)

Loads metadata associated with a specific key.

Parameters:

key (str) – The metadata key.

Returns:

The loaded metadata value.

Return type:

Any

abstract save_data(data_key, data)

Saves data associated with a specific key.

Parameters:
  • data_key (str) – The key under which to store the data.

  • data (Any) – The data to store.

Return type:

None

abstract save_metadata(key, value)

Saves metadata as a key-value pair into the metadata JSON file. If the file doesn’t exist, it will be created.

Parameters:
  • key (str) – The metadata key.

  • value (Any) – The metadata value (bool, int, float, string, list, or array).

Return type:

None

set_schema(schema)

Sets a schema definition for this storage instance. The schema defines how arrays should be organized into groups for merging.

Parameters:

schema (dict) – Schema definition containing group and array specifications

Return type:

None

validate_data_against_schema(data_key, data)

Validates data against the stored schema if present.

Parameters:
  • data_key (str) – The key under which the data will be stored

  • data (Any) – The data to validate

Returns:

True if valid or no schema present, False if validation fails

Return type:

bool
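A concrete backend must implement the abstract methods above. The dict-backed class below sketches that interface shape without importing hoops_ai (it does not inherit from the real DataStorage ABC), so it is an illustration of the contract, not the library's API:

```python
# A minimal dict-backed sketch of what a concrete DataStorage backend must
# provide: save/load for data and metadata, a format identifier, key
# discovery, and cleanup. Hypothetical illustration, not hoops_ai code.
from typing import Any, List

class DictBackedStorage:
    def __init__(self) -> None:
        self._data: dict = {}
        self._metadata: dict = {}

    def format(self) -> str:
        # A string identifying the concrete format for this storage.
        return "dict"

    def save_data(self, data_key: str, data: Any) -> None:
        self._data[data_key] = data

    def load_data(self, data_key: str) -> Any:
        return self._data[data_key]

    def save_metadata(self, key: str, value: Any) -> None:
        self._metadata[key] = value

    def load_metadata(self, key: str) -> Any:
        return self._metadata[key]

    def get_keys(self) -> List[str]:
        return list(self._data.keys())

    def close(self) -> None:
        # Cleanup: drop all in-memory state.
        self._data.clear()
        self._metadata.clear()
```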

class hoops_ai.storage.LabelStorage(path_for_storing, total_faces=0, total_edges=0)

Bases: object

Class for encoding and decoding labels.

Parameters:
  • path_for_storing (str)

  • total_faces (int)

  • total_edges (int)

EDGE_CADENTITY = 'edge_labels'

EDGE_ENTITY = 'GRAPH_EDGE'

GRAPH_CADENTITY = 'file_label'

GRAPH_ENTITY = 'GRAPH_ENTITY'

NODE_CADENTITY = 'face_labels'

NODE_ENTITY = 'GRAPH_NODE'
load_cadentity(mlTask)
Parameters:

mlTask (str)

Return type:

str

load_entity(mlTask)
Parameters:

mlTask (str)

Return type:

str

load_graph_edges_labels(mlTask)

Loads the label codes and descriptions for each CAD edge.

Parameters:

mlTask (str)

Return type:

Tuple[List[int], List[str]]

load_graph_label(mlTask)

Loads the label code and description for the entire graph.

Parameters:

mlTask (str)

Return type:

Tuple[int, str]

load_graph_nodes_labels(mlTask)

Loads the label codes and descriptions for each CAD face.

Parameters:

mlTask (str)

Return type:

Tuple[List[int], List[str]]

load_sparse_graph_edge_label(mlTask)

Loads sparse edge label data.

Parameters:

mlTask (str)

Return type:

Tuple[List[int], List[int], List[str] | None]

save_graph_edge_label(mlTask, edgeLabels, edgeLabelDescriptions)

Saves the label codes and descriptions for each CAD edge.

Parameters:
  • mlTask (str)

  • edgeLabels (List[int])

  • edgeLabelDescriptions (List[str])

save_graph_label(mlTask, graphLabel, graphLabelDescription)

Saves the label code and description for the entire graph.

Parameters:
  • mlTask (str)

  • graphLabel (int)

  • graphLabelDescription (str)

save_graph_node_label(mlTask, faceLabels, faceLabelDescriptions)

Saves the label codes and descriptions for each CAD face.

Parameters:
  • mlTask (str)

  • faceLabels (List[int])

  • faceLabelDescriptions (List[str])

save_sparse_graph_edge_label(mlTask, edgeIndices, edgeLabels, defaultLabel=0, edgeLabelDescriptions=None)

Saves sparse edge label codes while assigning a default label to all other edges.

Parameters:
  • mlTask (str)

  • edgeIndices (List[int])

  • edgeLabels (List[int])

  • defaultLabel (int)

  • edgeLabelDescriptions (List[str] | None)

save_sparse_graph_node_label(mlTask, faceIndices, faceLabels, defaultLabel=0, faceLabelDescriptions=None)

Saves sparse face label codes and descriptions while assigning a default label to all other faces.

Parameters:
  • mlTask (str)

  • faceIndices (List[int])

  • faceLabels (List[int])

  • defaultLabel (int)

  • faceLabelDescriptions (List[str] | None)

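The sparse-label methods above store only the explicitly labeled faces or edges and assign defaultLabel to everything else. The helper below sketches that sparse-to-dense expansion; it is a hypothetical illustration, not part of hoops_ai.storage:

```python
# Sketch of the sparse-labeling idea behind save_sparse_graph_node_label:
# a few faces carry explicit labels, and every other face receives the
# default label. expand_sparse_labels is a hypothetical helper.
from typing import List

def expand_sparse_labels(total_faces: int,
                         face_indices: List[int],
                         face_labels: List[int],
                         default_label: int = 0) -> List[int]:
    dense = [default_label] * total_faces
    for idx, label in zip(face_indices, face_labels):
        dense[idx] = label
    return dense
```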
class hoops_ai.storage.LocalStorageProvider(directory_path)

Bases: StorageProvider

Parameters:

directory_path (str)

list_files(extensions)

Retrieves CAD file paths from a local directory or a text file using pathlib.

Parameters:

extensions (List[str])

Return type:

List[str]

class hoops_ai.storage.MemoryStorage

Bases: DataStorage

close()

Clears the stored data and metadata from memory.

Return type:

None

compress_store()

This method is a placeholder, as in-memory storage does not require compression.

Returns:

Always returns 0 as no compression is performed.

Return type:

int

create_store_in_group(store_path='')

Creates a new MemoryStorage instance and adds it to the store group.

Parameters:

store_path (str)

Return type:

MemoryStorage

format()

Returns the format of this storage.

Returns:

A string specifying that this is in-memory storage.

Return type:

str

get_file_path(data_key)

Since this is in-memory storage, file paths do not exist. For compatibility with code that expects a directory path for the root (“”), we return a temporary directory path. For specific keys, we return a descriptive message.

Parameters:

data_key (str) – The data key.

Returns:

A temporary directory path for root key, or descriptive message for specific keys.

Return type:

str

get_group_for_array(array_name)

Determines which group an array belongs to based on the schema.

Parameters:

array_name (str) – Name of the array

Returns:

Group name for the array, or None if not found in schema

Return type:

str

get_keys()

Retrieves a list of all stored data keys.

Returns:

A list of keys in the storage.

Return type:

list

get_schema()

Retrieves the schema definition for this storage instance.

Returns:

The schema definition, or empty dict if no schema is set

Return type:

dict

get_store_group()
Return type:

StoreGroup

load_data(data_key)

Loads data from memory by key.

Parameters:

data_key (str) – The key of the data to load.

Returns:

The loaded data.

Return type:

Any

Raises:

KeyError – If the key does not exist.

load_metadata(key)

Loads metadata by key from memory storage. Supports nested keys using ‘/’ as a separator.

Parameters:

key (str) – The metadata key, which can be a nested key using ‘/’ as a separator.

Returns:

The loaded metadata value.

Return type:

Any

Raises:

KeyError – If the key does not exist.

save_data(data_key, data)

Stores the data in memory and tracks file size.

Parameters:
  • data_key (str) – The key under which to store the data.

  • data (Any) – The data to store.

Return type:

None

save_metadata(key, value)

Stores metadata as a key-value pair in memory. Supports nested keys using ‘/’ as a separator.

Parameters:
  • key (str) – The metadata key, which can be a nested key using ‘/’ as a separator.

  • value (Any) – The metadata value.

Return type:

None

set_schema(schema)

Sets a schema definition for this storage instance. The schema defines how arrays should be organized into groups for merging.

Parameters:

schema (dict) – Schema definition containing group and array specifications

Return type:

None

validate_data_against_schema(data_key, data)

Validates data against the stored schema if present.

Parameters:
  • data_key (str) – The key under which the data will be stored

  • data (Any) – The data to validate

Returns:

True if valid or no schema present, False if validation fails

Return type:

bool
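MemoryStorage documents nested metadata keys with ‘/’ as a separator and a KeyError when a key is missing. The helpers below sketch that behavior with plain nested dicts; they are illustrative stand-ins, not hoops_ai code:

```python
# Sketch of nested metadata keys as documented for MemoryStorage:
# "model/counts/faces" walks nested dicts level by level. save_nested and
# load_nested are hypothetical helpers for illustration only.
from typing import Any

def save_nested(metadata: dict, key: str, value: Any) -> None:
    parts = key.split("/")
    node = metadata
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value

def load_nested(metadata: dict, key: str) -> Any:
    node = metadata
    for part in key.split("/"):
        node = node[part]  # raises KeyError if the key does not exist
    return node
```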

class hoops_ai.storage.MetricStorage(store)

Bases: object

Abstract class defining the interface for storing machine learning metrics according to their data type and visualization style.

Parameters:

store (DataStorage)

get_storage()

Returns the storage handler for this metric storage.

Returns:

The storage handler instance.

list_data_ids(name)

Returns all data_ids pushed under ‘name’ by examining the Zarr store keys.

Parameters:

name (str)

Return type:

List[int]

pull_category_metric(name, epoch)

Pulls category-based metric data for a specific epoch from storage.

Parameters:
  • name (str) – The metric name (e.g., “per_class_accuracy”).

  • epoch (int) – The epoch for which to retrieve the metric.

Returns:

A tuple containing a list of categories and the corresponding metric values.

Raises:

ValueError – If the specified epoch is not found.

Return type:

Tuple[List[int], List[float]]

pull_data(name, data_id)

Loads prediction data from storage.

Parameters:
  • name (str) – The name of the prediction metric.

  • data_id (int) – The identifier for the data.

Returns:

A NumPy array containing the prediction results.

Return type:

ndarray

pull_matrix_metric(name, epoch)

Pulls a matrix-based metric for a specific epoch from storage.

Parameters:
  • name (str) – The metric name (e.g., “confusion_matrix”, “feature_correlation”).

  • epoch (int) – The epoch for which to retrieve the matrix metric.

Returns:

A 2D NumPy array representing the matrix metric.

Raises:

ValueError – If the specified epoch is not found.

Return type:

ndarray

pull_trend_metric(name)

Pulls trend metric data from storage.

Parameters:

name (str) – The metric name (e.g., “train_loss”, “val_accuracy”).

Returns:

A tuple containing a list of epochs and corresponding metric values.

Return type:

Tuple[List[int], List[float]]

push_category_metric(name, epoch, categories, values)

Pushes category-based metrics incrementally in memory before writing them to storage.

Reason: Compares performance across different categories (e.g., classes, features) for each epoch.

Metrics Included:

  • Per-Class Accuracy (one bar per class)

  • Per-Class IoU (one bar per class)

  • Feature Importance (e.g., in random forests, SHAP values)

Parameters:
  • name (str) – The metric name (e.g., “per_class_accuracy”).

  • epoch (int) – The current epoch index.

  • categories (List[int]) – List of category labels (e.g., class indices, feature indices).

  • values (List[float] | List[int]) – List of metric values corresponding to each category.

push_matrix_metric(name, epoch, matrix)

Pushes matrix-based metrics incrementally in memory before writing them to storage.

Reason: Stores structured relationships between multiple variables in matrix form, per epoch.

Metrics Included:

  • Confusion Matrix (classification tasks)

  • Correlation Matrix (e.g., feature correlations)

Parameters:
  • name (str) – The metric name (e.g., “confusion_matrix”, “feature_correlation”).

  • epoch (int) – The current epoch index.

  • matrix (ndarray) – A 2D NumPy array representing the metric (e.g., confusion matrix).

push_predictions(name, data_id, result)

Saves prediction data to storage.

Parameters:
  • name (str) – The name of the prediction metric.

  • data_id (int) – The identifier for the data.

  • result (ndarray) – A NumPy array containing the prediction results.

push_trend_metric(name, epoch, value)

Pushes trend metrics incrementally in memory before writing to a file.

Reason: Tracks values over time (epochs) to analyze learning progress.

Metrics Included:

  • Loss (training, validation, test)

  • Accuracy over epochs

  • Precision/Recall over epochs

  • F1-score over epochs

  • IoU (mean IoU, per-class IoU over time)

  • RMSE/MSE/MAE for regression tasks over epochs

  • Learning Rate schedules

Parameters:
  • name (str) – The metric name (e.g., “train_loss”, “val_accuracy”).

  • epoch (int) – The current epoch index.

  • value (float) – The metric value at the given epoch.

store(compress=False)

Writes all stored metrics in memory to the storage handler and clears memory storage.

Parameters:

compress (bool) – If True, compresses the store after saving the data.
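The push/store/pull cycle above (accumulate metrics in memory per epoch, flush in one batch, then read back epochs and values) can be sketched with an in-memory stand-in. The class below is hypothetical illustration of the pattern, not the hoops_ai implementation:

```python
# Sketch of the MetricStorage trend-metric pattern: push_trend_metric
# buffers (epoch, value) pairs in memory, store() flushes them to the
# backing store and clears the buffer, pull_trend_metric returns parallel
# lists of epochs and values. TrendMetricBuffer is a hypothetical stand-in.
from typing import Dict, List, Tuple

class TrendMetricBuffer:
    def __init__(self) -> None:
        self._pending: Dict[str, List[Tuple[int, float]]] = {}
        self._stored: Dict[str, List[Tuple[int, float]]] = {}

    def push_trend_metric(self, name: str, epoch: int, value: float) -> None:
        self._pending.setdefault(name, []).append((epoch, value))

    def store(self) -> None:
        # Flush pending metrics to the "backing store" and clear memory.
        for name, points in self._pending.items():
            self._stored.setdefault(name, []).extend(points)
        self._pending.clear()

    def pull_trend_metric(self, name: str) -> Tuple[List[int], List[float]]:
        points = sorted(self._stored.get(name, []))
        epochs = [e for e, _ in points]
        values = [v for _, v in points]
        return epochs, values
```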

hoops_ai.storage.convert_storage(source_handler, dest_handler, verbose=False)

Generic converter that works with ANY DataStorage implementation.

This universal converter supports all storage types, including:

  • OptStorage / ZarrStorage – Compressed binary format

  • JsonStorageHandler – Human-readable JSON format

  • MemoryStorage – In-memory storage (no files)

  • Custom implementations – Any class inheriting from DataStorage

The converter uses only the base DataStorage interface methods:

  • get_keys() or load_metadata() to discover data keys

  • load_data() to read data arrays

  • save_data() to write data arrays

  • save_metadata() to copy metadata

This makes it completely storage-agnostic and extensible.

Common Use Cases:

  • OptStorage → JSON: Decompress for inspection/debugging

  • JSON → OptStorage: Compress for performance/storage

  • MemoryStorage → OptStorage: Persist in-memory data to disk

  • OptStorage → MemoryStorage: Load encoded data for ML training

Parameters:
  • source_handler (DataStorage) – Source storage to read from (any type).

  • dest_handler (DataStorage) – Destination storage to write to (any type).

  • verbose (bool) – Print progress messages. Default False.

Returns:

None

Raises:

RuntimeError – If no data keys found or conversion fails.

Return type:

None

Examples

>>> # OptStorage → JSON (decompression)
>>> opt = OptStorage("flows/data_mining/abc123")
>>> json_storage = JsonStorageHandler("flows/json/abc123")
>>> convert_storage(opt, json_storage)

>>> # MemoryStorage → OptStorage (persist to disk)
>>> mem = MemoryStorage()
>>> mem.save_data("faces/positions", np.array([[0, 0, 0], [1, 1, 1]]))
>>> opt = OptStorage("output/saved_data")
>>> convert_storage(mem, opt)

>>> # JSON → MemoryStorage (load for processing)
>>> json_storage = JsonStorageHandler("data/abc123")
>>> mem = MemoryStorage()
>>> convert_storage(json_storage, mem)