hoops_ai.ml.embeddings.HOOPSEmbeddings
- class hoops_ai.ml.embeddings.HOOPSEmbeddings(cad_loader=None, model='', device='cpu')
Bases: ShapeEmbeddingsModel
HOOPS AI shape embeddings using graph neural networks on B-Rep data.
Model registry: maps model names to their FlowModel class and checkpoint file.
To add a new model:
1. Add an entry to MODEL_REGISTRY with the format “model_name”: (FlowModelClass, “checkpoint_filename.ckpt”).
2. Place the checkpoint file in packages/trained_ml_models/.
- Parameters:
cad_loader (HOOPSLoader)
model (str)
device (str)
- embed_shape(cad_path, storage=None)
Embed a CAD file into a vector representation.
- Parameters:
cad_path (str) – Path to the CAD file to embed
storage (DataStorage) – Optional DataStorage instance. If provided, embeddings will be automatically saved with schema and key “EMBEDDINGS/HOOPS_EMBEDDINGS”
- Returns:
Embedding object containing the vector representation
- Return type:
Embedding
- embed_shape_batch(cad_path_list, num_workers=None, show_progress=True, specifications=None)
Compute embeddings for multiple CAD files in parallel using process pools.
This method uses the ParallelExecutor infrastructure to distribute the embedding computation across multiple worker processes. Each worker initializes its own HOOPSEmbeddings model instance once and reuses it for all files assigned to that worker.
- Parameters:
cad_path_list (List[str]) – List of CAD file paths to process
num_workers (int | None) – Number of parallel workers (None = auto-detect based on CPU count)
show_progress (bool) – Show progress bar during processing (default: True)
specifications (Dict | None) – Dictionary with execution specifications. Default values:
- ‘file_size_bucketing’: True
- ‘min_available_ram_gb’: None
- ‘min_available_ram_percent’: 10
- ‘ram_check_interval_s’: 0.25
- ‘time_limit_overall’: 120.0
- ‘time_limit_small’: 120.0
- ‘time_limit_medium’: 120.0
- ‘time_limit_large’: 120.0
- ‘start_method’: ‘spawn’
- ‘log_dir’: ‘.’
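One way to prepare a specifications dictionary is to start from the documented defaults and override only the keys that need to change. The sketch below is an illustration under assumptions: `DEFAULT_SPECIFICATIONS` and `build_specifications` are hypothetical helper names, not part of the library, and whether `embed_shape_batch` accepts a partial dictionary is not stated here, so the helper always returns a complete one.

```python
# Default values copied from the parameter description above.
DEFAULT_SPECIFICATIONS = {
    "file_size_bucketing": True,
    "min_available_ram_gb": None,
    "min_available_ram_percent": 10,
    "ram_check_interval_s": 0.25,
    "time_limit_overall": 120.0,
    "time_limit_small": 120.0,
    "time_limit_medium": 120.0,
    "time_limit_large": 120.0,
    "start_method": "spawn",
    "log_dir": ".",
}


def build_specifications(overrides=None):
    """Return a full specifications dict with the given overrides applied."""
    overrides = overrides or {}
    # Reject misspelled keys early rather than passing them along silently.
    unknown = set(overrides) - set(DEFAULT_SPECIFICATIONS)
    if unknown:
        raise KeyError(f"Unknown specification keys: {sorted(unknown)}")
    return {**DEFAULT_SPECIFICATIONS, **overrides}


# Example: allow more time for large files and write logs to a subdirectory.
specs = build_specifications({"time_limit_large": 600.0, "log_dir": "logs"})
```

The resulting `specs` dictionary could then be passed as `specifications=specs`.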
- Returns:
EmbeddingBatch containing:
values: numpy array of shape (n_successful, dim) with embeddings
model: Model identifier string
dim: Embedding dimensionality
ids: List of successfully processed file paths
metadata: Dict with ‘failed_count’, ‘total_processed’, ‘num_workers’
- Return type:
EmbeddingBatch
Example
>>> embedder = HOOPSEmbeddings(model="ts3d_scl_dual_v1")
>>> files = ["part1.step", "part2.step", "part3.step"]
>>> # Use 4 parallel workers
>>> batch = embedder.embed_shape_batch(files, num_workers=4)
>>> print(batch.values.shape)  # (3, 256) if all succeeded
>>> print(batch.ids)  # ["part1.step", "part2.step", "part3.step"]
>>> # Auto-detect workers based on CPU count
>>> batch = embedder.embed_shape_batch(files)
Note
For small batches (< 4 files), sequential processing may be used automatically for efficiency. For single files, use embed_shape() instead.
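Because `values` and `ids` cover only the files that were processed successfully, it can be useful to work out which inputs failed. The following is a minimal sketch: `failed_paths` is a hypothetical helper, and a `SimpleNamespace` stands in for a real EmbeddingBatch so the snippet runs without the library. It assumes only the `ids` and `metadata` fields described above.

```python
from types import SimpleNamespace


def failed_paths(requested, batch):
    """Return the requested paths that are absent from batch.ids, in order."""
    succeeded = set(batch.ids)
    return [path for path in requested if path not in succeeded]


files = ["part1.step", "part2.step", "part3.step"]

# Stand-in for the object returned by embed_shape_batch (illustrative values).
batch = SimpleNamespace(
    ids=["part1.step", "part3.step"],
    metadata={"failed_count": 1, "total_processed": 3, "num_workers": 2},
)

missing = failed_paths(files, batch)
print(missing)
```

With a real batch, comparing `len(missing)` against `batch.metadata['failed_count']` is a quick consistency check.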
- classmethod list_available_models()
Get list of available pre-trained models.
- classmethod register_model(model_name, checkpoint_path)
Register a custom embeddings model for use with HOOPSEmbeddings.
This allows users to register their own trained models and use them just like built-in models. The checkpoint file will be loaded using the EmbeddingFlowModel architecture.
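- Parameters:
model_name – Name under which the model is registered
checkpoint_path – Path to the model checkpoint file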
- Raises:
ValueError – If model_name already exists in registry
FileNotFoundError – If checkpoint file doesn’t exist
- schema()
Return the default schema for HOOPS shape embeddings.
- Returns:
Schema object with EMBEDDINGS/HOOPS_EMBEDDINGS array defined