hoops_ai.ml.embeddings.HOOPSEmbeddings

class hoops_ai.ml.embeddings.HOOPSEmbeddings(cad_loader=None, model='', device='cpu')

Bases: ShapeEmbeddingsModel

HOOPS AI shape embeddings using graph neural networks on B-Rep data.

Model registry: Maps model names to their FlowModel class and checkpoint file.

To add a new model:

  1. Add an entry to MODEL_REGISTRY with the format:

       "model_name": (FlowModelClass, "checkpoint_filename.ckpt")

  2. Place the checkpoint file in packages/trained_ml_models/
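The two steps above can be sketched as follows. This is a minimal illustration rather than the library's own code: FlowModel here is a local stand-in for the real base class, and MyFlowModel, "my_model_v1", CHECKPOINT_DIR, and resolve_model are hypothetical names introduced only for the example.

```python
import posixpath
from typing import Dict, Tuple, Type


class FlowModel:
    """Local stand-in for the library's FlowModel base class."""


class MyFlowModel(FlowModel):
    """Hypothetical FlowModel subclass, for illustration only."""


# Same shape as the documented registry: name -> (FlowModel class, checkpoint file).
MODEL_REGISTRY: Dict[str, Tuple[Type[FlowModel], str]] = {}

# Step 1: add an entry for the new model.
MODEL_REGISTRY["my_model_v1"] = (MyFlowModel, "my_model_v1.ckpt")

# Step 2: the checkpoint file is expected in this directory.
CHECKPOINT_DIR = "packages/trained_ml_models"


def resolve_model(model_name: str) -> Tuple[Type[FlowModel], str]:
    """Look up a model name and return its class and checkpoint path."""
    if model_name not in MODEL_REGISTRY:
        raise KeyError(f"Unknown model: {model_name!r}")
    flow_cls, checkpoint_file = MODEL_REGISTRY[model_name]
    return flow_cls, posixpath.join(CHECKPOINT_DIR, checkpoint_file)
```

Keeping the class and the checkpoint filename together in one tuple means a single lookup by name yields everything needed to load the model.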

Parameters:

embed_shape(cad_path, storage=None)

Embed a CAD file into a vector representation.

Parameters:
  • cad_path (str) – Path to the CAD file to embed

  • storage (DataStorage) – Optional DataStorage instance. If provided, the embedding is saved automatically, together with its schema, under the key "EMBEDDINGS/HOOPS_EMBEDDINGS"

Returns:

Embedding object containing the vector representation

Return type:

Embedding

embed_shape_batch(cad_path_list, num_workers=None, show_progress=True, specifications=None)

Compute embeddings for multiple CAD files in parallel using process pools.

This method uses the ParallelExecutor infrastructure to distribute the embedding computation across multiple worker processes. Each worker initializes its own HOOPSEmbeddings model instance once and reuses it for all files assigned to that worker.

Parameters:
  • cad_path_list (List[str]) – List of CAD file paths to process

  • num_workers (int | None) – Number of parallel workers (None = auto-detect based on CPU count)

  • show_progress (bool) – Show progress bar during processing (default: True)

  • specifications (Dict | None) – Dictionary with execution specifications. Default values:

      - 'file_size_bucketing': True
      - 'min_available_ram_gb': None
      - 'min_available_ram_percent': 10
      - 'ram_check_interval_s': 0.25
      - 'time_limit_overall': 120.0
      - 'time_limit_small': 120.0
      - 'time_limit_medium': 120.0
      - 'time_limit_large': 120.0
      - 'start_method': 'spawn'
      - 'log_dir': '.'
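As a rough sketch of how a partial specifications dictionary might be combined with the defaults listed above: the default values are copied from the parameter description, but the merge behavior, the DEFAULT_SPECIFICATIONS constant, and the merge_specifications helper are assumptions made for illustration, not the library's confirmed implementation.

```python
from typing import Any, Dict, Optional

# Default values as listed in the parameter description.
DEFAULT_SPECIFICATIONS: Dict[str, Any] = {
    "file_size_bucketing": True,
    "min_available_ram_gb": None,
    "min_available_ram_percent": 10,
    "ram_check_interval_s": 0.25,
    "time_limit_overall": 120.0,
    "time_limit_small": 120.0,
    "time_limit_medium": 120.0,
    "time_limit_large": 120.0,
    "start_method": "spawn",
    "log_dir": ".",
}


def merge_specifications(
    specifications: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Overlay user-supplied keys on a copy of the defaults (assumed behavior)."""
    merged = dict(DEFAULT_SPECIFICATIONS)
    merged.update(specifications or {})
    return merged


# Override only the keys that matter; everything else keeps its default.
merged = merge_specifications({"time_limit_large": 600.0, "log_dir": "logs"})
```

Under this assumption, a caller would pass only the keys to change, for example embed_shape_batch(files, specifications={"time_limit_large": 600.0}).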

Returns:

EmbeddingBatch containing:

  • values: numpy array of shape (n_successful, dim) with embeddings

  • model: Model identifier string

  • dim: Embedding dimensionality

  • ids: List of successfully processed file paths

  • metadata: Dict with 'failed_count', 'total_processed', 'num_workers'

Return type:

EmbeddingBatch
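Because values holds only the successful rows, ids is what ties each row back to its file. The sketch below illustrates that alignment with a stand-in container: BatchResult and collect are hypothetical, plain lists replace the numpy array, and counting every input file in 'total_processed' is an assumption about that field's meaning.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Sequence


@dataclass
class BatchResult:
    """Stand-in mirroring the documented EmbeddingBatch fields."""
    values: List[List[float]]
    model: str
    dim: int
    ids: List[str]
    metadata: Dict[str, Any] = field(default_factory=dict)


def collect(
    paths: Sequence[str],
    vectors: Sequence[Optional[Sequence[float]]],
    model: str,
    dim: int,
    num_workers: int,
) -> BatchResult:
    """Keep only successful files; a failed file is represented by None."""
    values: List[List[float]] = []
    ids: List[str] = []
    for path, vector in zip(paths, vectors):
        if vector is not None:
            values.append(list(vector))
            ids.append(path)  # row i of values belongs to ids[i]
    metadata = {
        "failed_count": len(paths) - len(ids),
        "total_processed": len(paths),  # assumed to count all input files
        "num_workers": num_workers,
    }
    return BatchResult(values, model, dim, ids, metadata)


paths = ["part1.step", "part2.step", "part3.step"]
batch = collect(paths, [[0.1, 0.2], None, [0.3, 0.4]], "demo", 2, 1)

# Map each successful file to its embedding, then list the failures.
by_path = dict(zip(batch.ids, batch.values))
failed = [p for p in paths if p not in by_path]
```

Looking rows up through ids rather than by position in the input list avoids misattributing embeddings when some files fail.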

Example

>>> embedder = HOOPSEmbeddings(model="ts3d_scl_dual_v1")
>>> files = ["part1.step", "part2.step", "part3.step"]
>>> # Use 4 parallel workers
>>> batch = embedder.embed_shape_batch(files, num_workers=4)
>>> print(batch.values.shape)  # (3, 256) if all succeeded
>>> print(batch.ids)  # ["part1.step", "part2.step", "part3.step"]
>>> # Auto-detect workers based on CPU count
>>> batch = embedder.embed_shape_batch(files)

Note

For small batches (< 4 files), sequential processing may be used automatically for efficiency. For single files, use embed_shape() instead.

classmethod list_available_models()

Get list of available pre-trained models.

Returns:

List of model names that can be used with this class

Return type:

list[str]

classmethod register_model(model_name, checkpoint_path)

Register a custom embeddings model for use with HOOPSEmbeddings.

This allows users to register their own trained models and use them just like built-in models. The checkpoint file will be loaded using the EmbeddingFlowModel architecture.

Parameters:
  • model_name (str) – Unique name for the custom trained model

  • checkpoint_path (str) – Absolute path to the checkpoint file (.ckpt)

Raises:

schema()

Return the default schema for HOOPS shape embeddings.

Returns:

Schema object with EMBEDDINGS/HOOPS_EMBEDDINGS array defined

MODEL_REGISTRY: Dict[str, Tuple[Type[FlowModel], str]] = {}

property embedding_dim: int

Return the dimensionality of embeddings produced by this model.

Returns:

Embedding dimension (proj_dim if projection enabled, else emb_dim)

Return type:

int
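The rule stated above (proj_dim if projection is enabled, else emb_dim) could look like the following minimal sketch. The EmbedderSketch class and the use_projection flag are hypothetical; only the names emb_dim and proj_dim come from the description, and the numeric values are arbitrary.

```python
from typing import Optional


class EmbedderSketch:
    """Hypothetical holder for the two dimensions named in the description."""

    def __init__(
        self,
        emb_dim: int,
        proj_dim: Optional[int] = None,
        use_projection: bool = False,
    ) -> None:
        if use_projection and proj_dim is None:
            raise ValueError("proj_dim is required when projection is enabled")
        self.emb_dim = emb_dim
        self.proj_dim = proj_dim
        self.use_projection = use_projection

    @property
    def embedding_dim(self) -> int:
        """Return proj_dim if projection is enabled, else emb_dim."""
        if self.use_projection:
            assert self.proj_dim is not None  # guaranteed by __init__
            return self.proj_dim
        return self.emb_dim


# Arbitrary example sizes, not the dimensions of any real model.
with_projection = EmbedderSketch(emb_dim=512, proj_dim=128, use_projection=True)
without_projection = EmbedderSketch(emb_dim=512)
```

Reading embedding_dim before allocating downstream storage, such as a vector index, avoids hard-coding a size that depends on the chosen model.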

property model_id: str

property model_name: str