hoops_ai.ml.CADSearch

class hoops_ai.ml.CADSearch(shape_model=None, text_model=None)

Bases: object

Provides similarity search over CAD datasets using pre-computed embeddings.

Supports:
  • Indexing embeddings from EmbeddingBatch objects

  • Shape-based retrieval (query by CAD file)

  • Text-based retrieval (query by description) (FUTURE WORK)

  • Saving/loading of vector indices

  • Metadata filtering

Example

    # Create embedding batch from arrays
    batch = EmbeddingBatch.from_arrays(embeddings, model="my_model", ids=part_ids)

    # Index and search
    searcher = CADSearch(shape_model=model)
    searcher.index_shape(batch)
    results = searcher.search_by_shape("query.step")

Parameters:
  • shape_model – Optional; defaults to None.

  • text_model – Optional; defaults to None.

close()

Close the search instance and release resources.

Return type:

None

index_shape(embedding_batch, vector_store=None)

Index shape embeddings from an EmbeddingBatch into a vector store.

Parameters:
  • embedding_batch (EmbeddingBatch) – Pre-constructed EmbeddingBatch with embeddings and optional IDs

  • vector_store (VectorStore | None) – Custom vector store (if None, creates FAISS store)

Raises:

ValueError – If embedding batch is invalid

Return type:

None

index_text(embedding_batch, vector_store=None)

(FUTURE WORK) Index text embeddings from an EmbeddingBatch into a vector store.

Parameters:
  • embedding_batch (EmbeddingBatch) – Pre-constructed EmbeddingBatch with text embeddings and optional IDs

  • vector_store (VectorStore | None) – Custom vector store (if None, creates FAISS store)

Raises:

ValueError – If embedding batch is invalid

Return type:

None

list_shape_ids()

Return shape vector-store record IDs in index order.

The returned IDs are exactly the record IDs that partners use as keys in a hoops_ai.ml.context_layer.ContextProvider. This method is read-only and does not inspect or interpret per-record metadata.

Return type:

List[str]
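
To illustrate how the returned IDs can act as join keys, here is a minimal sketch that pairs each ID with externally held context while preserving index order. The ContextProvider interface is not described here, so a plain dict is used as a hypothetical stand-in, and the ID values are invented for the example.

```python
from typing import Any, Dict, List, Optional, Tuple


def attach_context(
    record_ids: List[str],
    context_by_id: Dict[str, Any],
) -> List[Tuple[str, Optional[Any]]]:
    """Pair each record ID with its context entry, preserving index order.

    context_by_id is a plain dict used as a hypothetical stand-in for a
    ContextProvider; IDs without an entry are paired with None.
    """
    return [(rid, context_by_id.get(rid)) for rid in record_ids]


# In real code record_ids would come from searcher.list_shape_ids();
# the values below are made up for illustration.
record_ids = ["part-001", "part-002", "part-003"]
context_by_id = {
    "part-001": {"material": "steel"},
    "part-003": {"material": "aluminium"},
}
paired = attach_context(record_ids, context_by_id)

# IDs that are indexed but have no context entry yet.
missing = [rid for rid, ctx in paired if ctx is None]
```

Listing the missing IDs this way is a cheap consistency check between the index and whatever context is stored alongside it.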

load_shape_index(path, vector_store_cls=None)

Load a previously saved shape vector index from disk.

This allows you to skip re-computing shape embeddings and directly use a pre-built index for queries.

Parameters:
  • path (str) – File path to load index from (e.g., "parts_index.faiss")

  • vector_store_cls (type | None) – Vector store class to use for loading. If None, defaults to FaissVectorStore.

Returns:

EmbeddingBatch containing all stored (L2-normalized) vectors, their IDs, the model identifier, and the embedding dimension.

Return type:

EmbeddingBatch

Example

    # Load pre-built index
    searcher = CADSearch(shape_model=model)
    batch = searcher.load_shape_index("parts_index.faiss")
    print(batch.values.shape)  # (n_parts, dim)

    # Query immediately without indexing
    results = searcher.search_by_shape("query.step", top_k=10)
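
The returned vectors are described as L2-normalized, meaning each has unit Euclidean length. One practical consequence is that the inner product of two such vectors equals their cosine similarity. The sketch below shows this in plain Python; it is independent of hoops_ai and says nothing about how the library performs normalization internally.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale vec to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


a = l2_normalize([3.0, 4.0])
b = l2_normalize([4.0, 3.0])

# For unit vectors the inner product is the cosine of the angle between them.
cosine = dot(a, b)
```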

load_text_index(path, vector_store_cls=None)

(FUTURE WORK) Load a previously saved text vector index from disk.

This allows you to skip re-computing text embeddings and directly use a pre-built index for queries.

Parameters:
  • path (str) – File path to load index from (e.g., "text_index.faiss")

  • vector_store_cls (type | None) – Vector store class to use for loading. If None, defaults to FaissVectorStore.

Return type:

None

Example

    # Load pre-built index
    searcher = CADSearch(text_model=model)
    searcher.load_text_index("text_index.faiss")

    # Query immediately without indexing
    results = searcher.search_by_text("steel bracket", top_k=10)

save_shape_index(path)

Save the shape vector index to disk for later reuse.

This allows you to persist computed shape embeddings and avoid re-indexing when restarting scripts or machines.

Parameters:

path (str) – File path to save index (e.g., "parts_index.faiss")

Raises:

ValueError – If shape index hasn’t been built yet

Return type:

None

Example

    # Build and save index
    searcher.index_shape(embedding_batch)
    searcher.save_shape_index("parts_index.faiss")

    # Later, load instead of re-indexing
    searcher.load_shape_index("parts_index.faiss")
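
The save/load pair lends itself to a "load if present, otherwise build and save" helper. The sketch below relies only on index_shape, save_shape_index, and load_shape_index as documented on this page. The helper name, the build_batch callable, and the _StubSearch class are hypothetical and exist only so the snippet runs without hoops_ai.

```python
import os
import tempfile


def load_or_build_shape_index(searcher, path, build_batch):
    """Load the shape index at path if it exists; otherwise build and save it.

    build_batch is a hypothetical zero-argument callable that returns the
    EmbeddingBatch to index. Returns "loaded" or "built".
    """
    if os.path.exists(path):
        searcher.load_shape_index(path)
        return "loaded"
    searcher.index_shape(build_batch())
    searcher.save_shape_index(path)
    return "built"


class _StubSearch:
    """Hypothetical stand-in for CADSearch that records calls."""

    def __init__(self):
        self.calls = []

    def index_shape(self, batch):
        self.calls.append("index_shape")

    def save_shape_index(self, path):
        self.calls.append("save_shape_index")
        with open(path, "w") as fh:  # write a marker file in place of a real index
            fh.write("stub")

    def load_shape_index(self, path):
        self.calls.append("load_shape_index")


with tempfile.TemporaryDirectory() as tmp:
    index_path = os.path.join(tmp, "parts_index.faiss")
    stub = _StubSearch()
    first = load_or_build_shape_index(stub, index_path, lambda: object())
    second = load_or_build_shape_index(stub, index_path, lambda: object())
```

Indexing before saving also avoids the ValueError that save_shape_index raises when no shape index has been built.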

save_text_index(path)

(FUTURE WORK) Save the text vector index to disk for later reuse.

This allows you to persist computed text embeddings and avoid re-indexing when restarting scripts or machines.

Parameters:

path (str) – File path to save index (e.g., "text_index.faiss")

Raises:

ValueError – If text index hasn’t been built yet

Return type:

None

Example

    # Build and save index
    searcher.index_text(embedding_batch)
    searcher.save_text_index("text_index.faiss")

    # Later, load instead of re-indexing
    searcher.load_text_index("text_index.faiss")

search_by_embedding(query_embedding, search_space='shape', top_k=10, filters=None, include_metadata=True, alpha=0.5, num_candidates=50)

Search using a pre-computed embedding.

Parameters:
  • query_embedding (Embedding) – Pre-computed embedding

  • search_space (str) – Which index to search ("shape" or "text")

  • top_k (int) – Number of results to return

  • filters (Dict[str, Any] | None) – Optional metadata filters

  • include_metadata (bool) – Whether to include metadata in results

  • alpha (float)

  • num_candidates (int)

Returns:

List of VectorHit objects sorted by similarity

Return type:

List[VectorHit]
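
The filters argument is a Dict[str, Any], but its matching rules are not spelled out here. As a rough mental model only, the sketch below assumes exact-match semantics: a record passes when every filter key is present in its metadata with an equal value. The function name and sample records are hypothetical, and the library's actual behavior may differ (for example, it might support ranges or lists).

```python
from typing import Any, Dict, Optional


def matches_filters(
    metadata: Dict[str, Any],
    filters: Optional[Dict[str, Any]],
) -> bool:
    """Return True when every filter key is in metadata with an equal value.

    ASSUMPTION: exact-match semantics, chosen for illustration only.
    An empty or missing filter dict matches everything.
    """
    if not filters:
        return True
    return all(
        key in metadata and metadata[key] == value
        for key, value in filters.items()
    )


# Made-up metadata records for illustration.
records = [
    {"id": "part-001", "material": "steel", "category": "bracket"},
    {"id": "part-002", "material": "aluminium", "category": "bracket"},
    {"id": "part-003", "material": "steel", "category": "gear"},
]
wanted = {"material": "steel", "category": "bracket"}
steel_brackets = [r["id"] for r in records if matches_filters(r, wanted)]
```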

search_by_shape(cad_path, top_k=10, filters=None, include_metadata=True, query_body_dedupe_eps=0.0001, alpha=0.5, num_candidates=50, scale_match_threshold=0.99)

Search for similar parts from a query CAD file.

The query is embedded body by body, near-duplicate query bodies can be merged, and the resulting matches may be reranked with geometry signals.
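
One way to picture the merging of near-duplicate query bodies is a greedy pass that drops any body whose embedding lies within a small distance of one already kept. The sketch below assumes cosine distance and treats query_body_dedupe_eps as the maximum distance at which two bodies count as duplicates. Both choices are assumptions made for illustration, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; returns 1.0 if either vector has zero length."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (na * nb)


def dedupe_bodies(embeddings: List[Sequence[float]], eps: float = 1e-4) -> List[int]:
    """Return indices of bodies kept after greedy near-duplicate removal.

    ASSUMPTION: a body is a duplicate when its cosine distance to an
    already-kept body is <= eps. The first occurrence is always kept.
    """
    kept: List[int] = []
    for i, vec in enumerate(embeddings):
        if all(cosine_distance(vec, embeddings[j]) > eps for j in kept):
            kept.append(i)
    return kept


body_embeddings = [
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],  # identical to body 0, so it is dropped
    [0.0, 1.0, 0.0],
]
unique_bodies = dedupe_bodies(body_embeddings)
```

Under this model, search_by_shape would return one result list per kept body, which is consistent with its List[List[VectorHit]] return type.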

After geometric reranking, hits whose score is >= scale_match_threshold are additionally sorted by oriented bounding box similarity (scale-sensitive raw PCA half-extents). Hits below the threshold keep their reranked order. hit.score is never modified; _size_similarity is added to high-confidence hit metadata for inspection.
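
The size-sorting step described above can be sketched as a partition followed by a sort of the high-confidence group only. The Hit dataclass is a hypothetical stand-in for VectorHit, the size-similarity values are supplied as inputs rather than computed from bounding boxes, and sorting in descending order of size similarity is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Hit:
    """Hypothetical stand-in for VectorHit with only the fields used here."""
    id: str
    score: float
    metadata: Dict[str, Any] = field(default_factory=dict)


def size_sort(hits: List[Hit], size_similarity: Dict[str, float],
              threshold: float = 0.99) -> List[Hit]:
    """Re-sort hits scoring >= threshold by size similarity; append the rest.

    hits must already be in reranked order. Scores are left untouched.
    ASSUMPTION: higher size similarity ranks first.
    """
    high = [h for h in hits if h.score >= threshold]
    low = [h for h in hits if not (h.score >= threshold)]
    for h in high:
        h.metadata["_size_similarity"] = size_similarity[h.id]
    high.sort(key=lambda h: h.metadata["_size_similarity"], reverse=True)
    return high + low


hits = [Hit("a", 0.995), Hit("b", 0.999), Hit("c", 0.95), Hit("d", 0.90)]
ordered = size_sort(hits, {"a": 0.4, "b": 0.9})
ordered_ids = [h.id for h in ordered]
```

With a threshold above 1.0 the high-confidence group is empty, so the input order comes back unchanged, which matches the documented way of disabling size sorting.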

Parameters:
  • cad_path (str) – Path to the query CAD file.

  • top_k (int) – Number of results to return to the user per unique query body.

  • filters (Dict[str, Any] | None) – Optional metadata filters.

  • include_metadata (bool) – Whether to include metadata in results.

  • query_body_dedupe_eps (float) – Similarity threshold for deduplicating query bodies.

  • alpha (float) – Geometry blend factor used during reranking. Must be in [0.0, 1.0].

  • num_candidates (int) – FAISS fetch pool size (search breadth). Defaults to 50. The vector store returns fetch_k = max(num_candidates, top_k) vectors; after reranking, the best top_k are returned to the user.

  • scale_match_threshold (float) – Minimum reranked score for a hit to be included in the size-sorted group. Hits scoring >= this value are re-sorted by raw bounding-box size similarity; hits below are appended unchanged. Set to any value > 1.0 to disable size sorting entirely. Default is 0.99.

Return type:

List[List[VectorHit]]
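
The interaction between num_candidates, top_k, and alpha can be sketched as follows. The fetch_k formula is taken from the parameter description above. The linear blend between embedding score and geometry score is an assumption, since the exact reranking formula is not given here, and all function names and sample values are hypothetical.

```python
from typing import Dict, List, Tuple


def fetch_pool_size(num_candidates: int, top_k: int) -> int:
    """Number of vectors to fetch before reranking, as documented."""
    return max(num_candidates, top_k)


def blend(embedding_score: float, geometry_score: float, alpha: float) -> float:
    """ASSUMPTION: linear interpolation; alpha=0.0 ignores geometry entirely."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0.0, 1.0]")
    return (1.0 - alpha) * embedding_score + alpha * geometry_score


def rerank_and_truncate(
    candidates: List[Tuple[str, float]],
    geometry_scores: Dict[str, float],
    alpha: float,
    top_k: int,
) -> List[Tuple[str, float]]:
    """Rescore (id, embedding_score) pairs, sort descending, keep top_k."""
    rescored = [
        (rid, blend(score, geometry_scores[rid], alpha))
        for rid, score in candidates
    ]
    rescored.sort(key=lambda item: item[1], reverse=True)
    return rescored[:top_k]


# Made-up scores: "a" wins on embedding similarity but loses on geometry.
candidates = [("a", 0.90), ("b", 0.80), ("c", 0.70)]
geometry_scores = {"a": 0.10, "b": 0.90, "c": 0.70}
top = rerank_and_truncate(candidates, geometry_scores, alpha=0.5, top_k=2)
top_ids = [rid for rid, _ in top]
```

A pool larger than top_k gives the reranker room to promote candidates that the raw embedding search ranked lower, at the cost of scoring more items.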

search_by_text(query_text, top_k=10, filters=None, include_metadata=True)

(FUTURE WORK) Search for parts by text description.

Parameters:
  • query_text (str) – Text query

  • top_k (int) – Number of results to return

  • filters (Dict[str, Any] | None) – Optional metadata filters

  • include_metadata (bool) – Whether to include metadata in results

Returns:

List of VectorHit objects sorted by similarity

Return type:

List[VectorHit]