hoops_ai.ml.CADSearch
- class hoops_ai.ml.CADSearch(shape_model=None, text_model=None)
Bases: object
Provides similarity search over CAD datasets using pre-computed embeddings.
Supports:
- Indexing embeddings from EmbeddingBatch objects
- Shape-based retrieval (query by CAD file)
- Text-based retrieval (query by description) (FUTURE WORK)
- Saving/loading of vector indices
- Metadata filtering
Example
# Create embedding batch from arrays
batch = EmbeddingBatch.from_arrays(embeddings, model="my_model", ids=part_ids)
# Index and search
searcher = CADSearch(shape_model=model)
searcher.index_shape(batch)
results = searcher.search_by_shape("query.step")
- Parameters:
shape_model (Optional[ShapeEmbeddingsModel])
text_model (Optional[TextEmbeddingsModel])
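To make "similarity search over pre-computed embeddings" concrete, the following is a minimal pure-Python sketch of brute-force top-k retrieval by cosine similarity. It does not use hoops_ai and is not how CADSearch is implemented (the documentation only states that a FAISS store is created by default); the names `cosine_similarity` and `top_k_by_cosine` and the toy data are hypothetical.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k_by_cosine(query, vectors, ids, top_k=10):
    """Return up to top_k (id, score) pairs, best match first."""
    scored = ((rid, cosine_similarity(query, vec)) for rid, vec in zip(ids, vectors))
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])


# Toy 2-D "embeddings"; real shape embeddings have many more dimensions.
part_ids = ["bracket", "flange", "gear"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]

hits = top_k_by_cosine([1.0, 0.0], embeddings, part_ids, top_k=2)
for rid, score in hits:
    print(rid, round(score, 4))
```

A vector store such as FAISS serves the same purpose at scale; the brute-force loop is shown only because it is easy to read.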
- close()
Close the search instance and release resources.
- Return type:
None
- index_shape(embedding_batch, vector_store=None)
Index shape embeddings from an EmbeddingBatch into a vector store.
- Parameters:
embedding_batch (EmbeddingBatch) – Pre-constructed EmbeddingBatch with embeddings and optional IDs
vector_store (VectorStore | None) – Custom vector store (if None, creates FAISS store)
- Raises:
ValueError – If embedding batch is invalid
- Return type:
None
- index_text(embedding_batch, vector_store=None)
(FUTURE WORK) Index text embeddings from an EmbeddingBatch into a vector store.
- Parameters:
embedding_batch (EmbeddingBatch) – Pre-constructed EmbeddingBatch with text embeddings and optional IDs
vector_store (VectorStore | None) – Custom vector store (if None, creates FAISS store)
- Raises:
ValueError – If embedding batch is invalid
- Return type:
None
- list_shape_ids()
Return shape vector-store record IDs in index order.
The returned IDs are the exact record IDs partners use as keys in a
hoops_ai.ml.context_layer.ContextProvider. This is read-only and does not inspect or interpret per-record metadata.
- load_shape_index(path, vector_store_cls=None)
Load a previously saved shape vector index from disk.
This allows you to skip re-computing shape embeddings and directly use a pre-built index for queries.
- Parameters:
- Returns:
EmbeddingBatch containing all stored (L2-normalized) vectors, their IDs, the model identifier, and the embedding dimension.
- Return type:
EmbeddingBatch
Example
# Load pre-built index
searcher = CADSearch(shape_model=model)
batch = searcher.load_shape_index("parts_index.faiss")
print(batch.values.shape)  # (n_parts, dim)
# Query immediately without indexing
results = searcher.search_by_shape("query.step", top_k=10)
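The returned vectors are described as L2-normalized. As a short illustration of what that implies, the sketch below normalizes two vectors and checks that their plain inner product then equals their cosine similarity. It is pure Python and independent of hoops_ai; the helper names are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale vec so that its Euclidean (L2) norm is 1."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity computed from the raw (unnormalized) vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]
a_unit = l2_normalize(a)
b_unit = l2_normalize(b)

# For unit vectors the inner product and the cosine similarity coincide.
inner_product_of_units = dot(a_unit, b_unit)
cosine_of_raw = cosine(a, b)
print(inner_product_of_units, cosine_of_raw)
```

This is why an inner-product index over normalized vectors can serve cosine-similarity queries.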
- load_text_index(path, vector_store_cls=None)
(FUTURE WORK) Load a previously saved text vector index from disk.
This allows you to skip re-computing text embeddings and directly use a pre-built index for queries.
- Parameters:
- Return type:
None
Example
# Load pre-built index
searcher = CADSearch(text_model=model)
searcher.load_text_index("text_index.faiss")
# Query immediately without indexing
results = searcher.search_by_text("steel bracket", top_k=10)
- save_shape_index(path)
Save the shape vector index to disk for later reuse.
This allows you to persist computed shape embeddings and avoid re-indexing when restarting scripts or machines.
- Parameters:
path (str) – File path to save index (e.g., “parts_index.faiss”)
- Raises:
ValueError – If shape index hasn’t been built yet
- Return type:
None
Example
# Build and save index
searcher.index_shape(embedding_batch)
searcher.save_shape_index("parts_index.faiss")
# Later, load instead of re-indexing
searcher.load_shape_index("parts_index.faiss")
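The save/load pair lends itself to a "load if present, otherwise build and save" pattern. The sketch below shows that control flow against a stand-in object. `StubSearch` is hypothetical and only mimics the three method names documented here (`index_shape`, `save_shape_index`, `load_shape_index`); it stores plain lists as JSON rather than a FAISS index, and `load_or_build` is likewise an illustrative helper, not part of hoops_ai.

```python
import json
import os
import tempfile


class StubSearch:
    """Hypothetical stand-in exposing the CADSearch method names used below."""

    def __init__(self):
        self.vectors = None

    def index_shape(self, embedding_batch):
        self.vectors = list(embedding_batch)

    def save_shape_index(self, path):
        # Mirrors the documented behavior: saving before indexing is an error.
        if self.vectors is None:
            raise ValueError("shape index has not been built yet")
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(self.vectors, fh)

    def load_shape_index(self, path):
        with open(path, "r", encoding="utf-8") as fh:
            self.vectors = json.load(fh)
        return self.vectors


def load_or_build(searcher, path, build_batch):
    """Load the index at path if it exists; otherwise build it and save it."""
    if os.path.exists(path):
        searcher.load_shape_index(path)
        return "loaded"
    searcher.index_shape(build_batch())
    searcher.save_shape_index(path)
    return "built"


with tempfile.TemporaryDirectory() as tmp:
    index_path = os.path.join(tmp, "parts_index.json")
    # First run: nothing on disk yet, so the index is built and saved.
    first = load_or_build(StubSearch(), index_path, lambda: [[1.0, 0.0], [0.0, 1.0]])
    # Second run: the saved file is found and loaded; build_batch is not called.
    second_searcher = StubSearch()
    second = load_or_build(second_searcher, index_path, lambda: [])
    reloaded = second_searcher.vectors

print(first, second, reloaded)
```

Passing `build_batch` as a callable keeps the expensive embedding step from running when a saved index is already available.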
- save_text_index(path)
(FUTURE WORK) Save the text vector index to disk for later reuse.
This allows you to persist computed text embeddings and avoid re-indexing when restarting scripts or machines.
- Parameters:
path (str) – File path to save index (e.g., “text_index.faiss”)
- Raises:
ValueError – If text index hasn’t been built yet
- Return type:
None
Example
# Build and save index
searcher.index_text(embedding_batch)
searcher.save_text_index("text_index.faiss")
# Later, load instead of re-indexing
searcher.load_text_index("text_index.faiss")
- search_by_embedding(query_embedding, search_space='shape', top_k=10, filters=None, include_metadata=True, alpha=0.5, num_candidates=50)
Search using a pre-computed embedding.
- Parameters:
- Returns:
List of VectorHit objects sorted by similarity
- Return type:
List[VectorHit]
- search_by_shape(cad_path, top_k=10, filters=None, include_metadata=True, query_body_dedupe_eps=0.0001, alpha=0.5, num_candidates=50, scale_match_threshold=0.99)
Search for similar parts from a query CAD file.
The query is embedded body by body, near-duplicate query bodies can be merged, and the resulting matches may be reranked with geometry signals.
After geometric reranking, hits whose score is >= scale_match_threshold are additionally sorted by oriented bounding box similarity (scale-sensitive raw PCA half-extents). Hits below the threshold keep their reranked order. hit.score is never modified; _size_similarity is added to high-confidence hit metadata for inspection.
- Parameters:
cad_path (str) – Path to the query CAD file.
top_k (int) – Number of results to return to the user per unique query body.
filters (Dict[str, Any] | None) – Optional metadata filters.
include_metadata (bool) – Whether to include metadata in results.
query_body_dedupe_eps (float) – Similarity threshold for deduplicating query bodies.
alpha (float) – Geometry blend factor used during reranking. Must be in [0.0, 1.0].
num_candidates (int) – FAISS fetch pool size (search breadth). Defaults to 50. fetch_k = max(num_candidates, top_k) vectors are retrieved from the vector store; after reranking the top_k best are returned to the user.
scale_match_threshold (float) – Minimum reranked score for a hit to be included in the size-sorted group. Hits scoring >= this value are re-sorted by raw bounding-box size similarity; hits below are appended unchanged. Set to any value > 1.0 to disable size sorting entirely. Default is 0.99.
- Return type:
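The interaction of num_candidates, alpha, and scale_match_threshold can be sketched in plain Python. Only `fetch_k = max(num_candidates, top_k)` and the "size-sort hits at or above the threshold, leave the rest in reranked order" rule come from the description above. The linear blend used for reranking, the point at which results are truncated to top_k, and the dictionary keys `embed_sim`, `geom_sim`, and `size_sim` are assumptions made for illustration and may differ from the library's behavior.

```python
def fetch_pool_size(top_k, num_candidates=50):
    """Number of vectors retrieved from the store, per the documented formula."""
    return max(num_candidates, top_k)


def rerank_and_size_sort(candidates, top_k=10, num_candidates=50,
                         alpha=0.5, scale_match_threshold=0.99):
    """Rerank candidates, keep top_k, then size-sort the high-confidence group."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0.0, 1.0]")
    pool = candidates[:fetch_pool_size(top_k, num_candidates)]
    # ASSUMPTION: reranked score is a linear blend of embedding and geometry similarity.
    hits = [dict(c, score=(1.0 - alpha) * c["embed_sim"] + alpha * c["geom_sim"])
            for c in pool]
    hits.sort(key=lambda h: h["score"], reverse=True)
    # ASSUMPTION: truncation to top_k happens before the size-based sort.
    hits = hits[:top_k]
    high = [h for h in hits if h["score"] >= scale_match_threshold]
    low = [h for h in hits if h["score"] < scale_match_threshold]
    # Only the order of the high-confidence group changes; scores are untouched.
    high.sort(key=lambda h: h["size_sim"], reverse=True)
    return high + low


candidates = [
    {"id": "a", "embed_sim": 1.00, "geom_sim": 1.0, "size_sim": 0.40},
    {"id": "b", "embed_sim": 0.99, "geom_sim": 1.0, "size_sim": 0.95},
    {"id": "c", "embed_sim": 0.90, "geom_sim": 0.8, "size_sim": 0.99},
]

ranked = [h["id"] for h in rerank_and_size_sort(candidates)]
# A threshold above 1.0 disables size sorting, as described for scale_match_threshold.
no_size_sort = [h["id"] for h in rerank_and_size_sort(candidates, scale_match_threshold=1.5)]
print(ranked, no_size_sort)
```

In this toy data, "a" and "b" both clear the 0.99 threshold, so they are reordered by size similarity, while "c" stays below the threshold and keeps its reranked position despite having the highest size similarity.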
- search_by_text(query_text, top_k=10, filters=None, include_metadata=True)
(FUTURE WORK) Search for parts by text description.