===================
CAD Data Encoding
===================

.. sidebar:: Table of Contents

   .. contents::
      :depth: 1
      :local:

Overview
========

**What is CAD Encoding?**

The **CAD Encoding** module (:autolink:`hoops_ai.cadencoder`) is the bridge between CAD geometry and machine learning. It transforms **symbolic CAD representations** (surfaces, curves, topological relationships) into **numeric** :term:`feature vectors ` that neural networks can process.

.. tip::

   **New to CAD or ML?** This guide assumes familiarity with:

   - **CAD concepts**: :term:`B-rep `, :term:`Topology`, faces, edges → See :doc:`/programming_guide/cad-fundamentals`
   - **ML concepts**: :term:`Feature vectors `, :term:`graph neural networks `, :term:`node features ` → See :doc:`/programming_guide/ml-fundamentals`
   - **Key terms**: Check the :doc:`/resources/glossary` for quick definitions

**The Core Challenge**:

CAD files store geometry in a **semantic, mathematical form** (symbolic representations with precise meaning):

- A planar face is stored as: "Plane equation: :math:`0.707x + 0.707y + 0z = 10`"
- A cylindrical face is: "Cylinder with axis :math:`[0, 0, 1]`, radius :math:`5\,\text{mm}`"
- Face adjacency is: "Face #5 and Face #12 share edge #23"

Machine learning models need **fixed-size numeric vectors** (:term:`feature vectors ` - arrays of numbers representing object properties):

- Face features: ``[area, perimeter, centroid_x, centroid_y, centroid_z, ...]``
- Edge features: ``[length, angle, curvature, ...]``
- Graph structure: ``edges = [[src_nodes], [dst_nodes]]``

.. image:: /_assets/images/cad-encoding.jpg
   :alt: CAD Encoding Architecture
   :align: center
   :width: 80%

**What This Module Does**:

:autolink:`hoops_ai.cadencoder` provides the :autolink:`BrepEncoder ` class that:

1. **Queries CAD data** using :autolink:`HOOPSBrep ` interface (from :autolink:`cadaccess ` module)
2. **Computes numeric features** (areas, lengths, angles, surface types)
3. **Structures data for ML** (:term:`face adjacency graphs `, feature arrays, :term:`UV grids `)
4. **Persists to storage** using :autolink:`DataStorage ` interface (from :autolink:`storage ` module)

**Why "Push" Methods?**

The :autolink:`BrepEncoder` class computes and persists geometric and topological features from BREP data. It follows a **push-based architecture** where each method:

1. Checks if data already exists in storage
2. Ensures the appropriate schema definition exists for the data
3. Computes the feature if needed
4. Saves to storage with schema management
5. Returns ``None`` (if storage is used) or the computed data (if no storage)

The encoder automatically manages schemas for data organization, creating groups and arrays as needed during the encoding process.

.. seealso::

   - :doc:`/programming_guide/cad-data-access` - How to query CAD geometry using HOOPSBrep
   - :doc:`/programming_guide/storage` - How DataStorage works and when to use different backends
   - :doc:`/programming_guide/cad-fundamentals` - Understanding B-rep topology and geometry

Architecture - How Encoding Works
==================================

**The Data Flow**:

.. code-block:: text

   CAD File (part.step)
        ↓
   HOOPSLoader (loads file)
        ↓
   HOOPSModel (in-memory representation)
        ↓
   HOOPSBrep (query interface)
        ↓
   BrepEncoder ←--------→ DataStorage
   (feature extraction)   (persistence)
        ↓
   Encoded Dataset (.data file or memory dict)

**Component Interaction**:

1. **HOOPSBrep**: Provides query methods (:autolink:`get_face_attributes() `, :autolink:`get_edge_attributes() `, :autolink:`build_face_adjacency_graph() `, etc.)
2. **BrepEncoder**: Orchestrates feature extraction by calling :autolink:`HOOPSBrep ` methods
3. **DataStorage**: Receives extracted features and saves them (Zarr arrays, JSON, etc.)
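
The interaction between the three components, and the push flow (check storage, compute if needed, save, return), can be illustrated with a small self-contained sketch. Everything in it is hypothetical: ``FakeBrep``, ``DictStorage``, ``SketchEncoder`` and ``push_face_areas`` are stand-ins invented for this illustration and are not part of the HOOPS AI API. The sketch also assumes that a push method returns the storage key when a storage handler is present.

```python
from typing import Dict, List, Optional, Union


class FakeBrep:
    """Hypothetical stand-in for a B-rep query interface (not the HOOPS AI API)."""

    def face_areas(self) -> List[float]:
        return [4.0, 2.5, 4.0]


class DictStorage:
    """Hypothetical in-memory storage backend (not the HOOPS AI API)."""

    def __init__(self) -> None:
        self._data: Dict[str, List[float]] = {}

    def has(self, key: str) -> bool:
        return key in self._data

    def save(self, key: str, value: List[float]) -> None:
        self._data[key] = value

    def load(self, key: str) -> List[float]:
        return self._data[key]


class SketchEncoder:
    """Illustrates the push flow: check storage, compute, save, return."""

    def __init__(self, brep: FakeBrep, storage: Optional[DictStorage] = None) -> None:
        self.brep = brep
        self.storage = storage
        self.compute_calls = 0  # counts how often the feature was actually computed

    def push_face_areas(self) -> Union[str, List[float]]:
        key = "face_areas"
        # 1. Skip the work if the data is already in storage.
        if self.storage is not None and self.storage.has(key):
            return key
        # 2. Query the B-rep interface and compute the feature.
        self.compute_calls += 1
        areas = self.brep.face_areas()
        # 3. Without storage, hand the data back directly.
        if self.storage is None:
            return areas
        # 4. With storage, persist and return the key (an assumption of this sketch).
        self.storage.save(key, areas)
        return key


storage = DictStorage()
encoder = SketchEncoder(FakeBrep(), storage)
first = encoder.push_face_areas()
second = encoder.push_face_areas()  # served from storage, nothing is recomputed
direct = SketchEncoder(FakeBrep()).push_face_areas()
```

Because the encoder only talks to the two interfaces, both can be replaced by lightweight fakes like these in unit tests.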
**Why This Architecture?**:

- **Separation of Concerns**: :autolink:`HOOPSBrep ` focuses on queries, :autolink:`BrepEncoder ` on feature engineering, :autolink:`DataStorage ` on persistence
- **Testability**: Mock :autolink:`HOOPSBrep ` and :autolink:`DataStorage ` for unit testing encoders
- **Flexibility**: Swap :term:`storage backends ` without changing encoding logic

.. tip::

   **Getting Started?** If you're new to this workflow:

   1. Start with the simple example in the next section
   2. See :doc:`/tutorials/index` for hands-on encoding walkthroughs
   3. Understand what gets encoded by reading :doc:`/programming_guide/cad-fundamentals` first

The BrepEncoder Class
=====================

**What is BrepEncoder?**

:autolink:`BrepEncoder ` is the **main feature extraction engine** in HOOPS AI. It systematically processes a B-rep model and generates all the numeric features needed for ML training.

**Initialization**:

.. code-block:: python

   from hoops_ai.cadencoder import BrepEncoder
   from hoops_ai.storage import DataStorage

   # 'brep' is the HOOPSBrep interface obtained from a loaded CAD model

   # With storage
   storage = DataStorage(...)
   encoder = BrepEncoder(brep_access=brep, storage_handler=storage)

   # Without storage (returns raw data)
   encoder = BrepEncoder(brep_access=brep)

**Parameters**:

- **brep_access** (HOOPSBrep): BREP interface from a loaded CAD model
- **storage_handler** (DataStorage, optional): Storage backend for persistence

.. **Constructor and Dependencies**:
.. This code snippet loads a CAD file, creates a BrepEncoder, and prepares for encoding:
.. .. code-block:: python
..    import pathlib
..    import os
..    nb_dir = pathlib.Path.cwd()
..    output_dir = nb_dir.joinpath("out")
..    # Create output directory if it does not exist
..    if not output_dir.exists():
..        os.makedirs(output_dir)
..    from hoops_ai.cadencoder import BrepEncoder
..    from hoops_ai.cadaccess import HOOPSLoader, HOOPSModel, HOOPSTools
..    from hoops_ai.storage import MemoryStorage
..    # 1. Load CAD file (cadaccess layer)
..    cad_loader = HOOPSLoader()
..    model = cad_loader.create_from_file("part.step")
..    # Specify the brep Options and modify the model
..    hoopstools = HOOPSTools()
..    brep_options = hoopstools.brep_options()
..    brep_options["force_compute_uv"] = True
..    brep_options["force_compute_3d"] = True
..    hoopstools.adapt_brep(model, brep_options)
..    brep = model.get_brep()  # HOOPSBrep interface
..    # 2. Create storage handler (storage layer)
..    storage = MemoryStorage()  # Or OptStorage("output.data") for Zarr
..    # 3. Create encoder (bridges CAD and Storage)
..    encoder = BrepEncoder(brep, storage)

**Constructor Signature**:

.. code-block:: python

   def __init__(self, brep_access: HOOPSBrep, storage_handler: DataStorage = None):
       """
       Args:
           brep_access: The B-Rep geometry data source interface (from cadaccess module)
           storage_handler: Optional, object to load/save data from disk or memory
       """

**What the Encoder Needs**:

- A :autolink:`hoops_ai.cadaccess.hoops_exchange.hoops_brep.HOOPSBrep` object (from loaded CAD model) - this is the "source" of geometric queries
- Optional :autolink:`hoops_ai.storage.DataStorage` object - this is the "sink" where features are saved (if None, methods return data directly)

.. **Design Pattern - The Push Architecture**:
.. The encoder follows a **push-based** design:
.. .. code-block:: text
..    Traditional "Pull" Pattern:          HOOPS AI "Push" Pattern:
..    features = encoder.extract_all()     encoder.push_face_attributes()
..    storage.save(features)                 → Queries brep
..                                           → Computes features
..                                           → Pushes to storage immediately
..    # Pull: Extract everything,          # Push: Extract and save incrementally
..    # then save all at once              # Better for large datasets (streaming)
.. **Why Push Instead of Pull?**:
.. 1. **Memory Efficiency**: Don't need to hold all features in memory before saving
.. 2. **Incremental Processing**: Extract just what you need (skip unused features)
.. 3. **Streaming**: Process large datasets that don't fit in RAM
.. 4. **Error Recovery**: If encoding fails mid-way, partial results are already saved
.. 5. **Parallel Processing**: Different workers can push to same storage (with locking)
.. **How to Use the Encoder**:
.. .. code-block:: python
..    # Create encoder
..    encoder = BrepEncoder(brep, storage)
..    # Option 1: Push everything (typical for ML pipelines)
..    key, num_faces, num_edges = encoder.push_face_adjacency_graph()  # Topology
..    face_keys, face_type_desc = encoder.push_face_attributes()  # Face geometry
..    edge_keys, edge_type_desc = encoder.push_edge_attributes()  # Edge geometry
..    grid_key = encoder.push_facegrid(32, 32)  # Surface sampling (UV grids)
..    # Option 2: Push selectively (for specific ML tasks)
..    key, num_faces, num_edges = encoder.push_face_adjacency_graph()  # GNNs need graph
..    face_keys, face_type_desc = encoder.push_face_attributes()  # + face features
..    # Option 3: Custom encoding (extend BrepEncoder)
..    class CustomEncoder(BrepEncoder):
..        def push_custom_features(self):
..            # Compute domain-specific features
..            ...

.. note::

   **Prerequisite**: Understand what face adjacency graphs are and why they matter. See:

   - :doc:`/programming_guide/cad-fundamentals` - B-rep Topology section explains faces, edges, adjacency
   - :doc:`/programming_guide/cad-data-access` - Topological Queries section shows how to build adjacency graphs

Topology Encoding Methods
==========================

**What is Topology Encoding?**

Topology encoding extracts the **connectivity structure** of the :term:`B-rep ` - which entities connect to which. This is distinct from **geometry encoding** (sizes, shapes, positions). See :doc:`/programming_guide/cad-fundamentals` for the difference between :term:`Topology` and geometry.
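
To make the topology/geometry distinction concrete, here is a small sketch that needs no CAD kernel. It describes a rectangular box twice: once by its connectivity (which is the same for every box) and once by its face areas (which change with the dimensions). The face numbering and the helper names ``box_topology`` and ``box_face_areas`` are assumptions made only for this illustration.

```python
from itertools import combinations
from typing import List, Tuple

# Face numbering assumed for this sketch: 0/1 = +x/-x, 2/3 = +y/-y, 4/5 = +z/-z.
OPPOSITE_FACES = {(0, 1), (2, 3), (4, 5)}


def box_topology() -> Tuple[List[int], List[int]]:
    """Edge list [[src], [dst]]: two box faces are adjacent unless they are opposite."""
    pairs = [p for p in combinations(range(6), 2) if p not in OPPOSITE_FACES]
    return [i for i, _ in pairs], [j for _, j in pairs]


def box_face_areas(dx: float, dy: float, dz: float) -> List[float]:
    """Per-face areas, listed in the same face order as the topology."""
    return [dy * dz, dy * dz, dx * dz, dx * dz, dx * dy, dx * dy]


src, dst = box_topology()                    # identical for every box
cube_areas = box_face_areas(1.0, 1.0, 1.0)   # geometry of a unit cube
slab_areas = box_face_areas(4.0, 2.0, 0.5)   # geometry of a flat slab
```

Both parts share the same 12 adjacency pairs, so a model that only sees the graph cannot tell them apart; the per-face areas carry the difference.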
**Why Topology Matters for ML**:

- **Graph Neural Networks**: :term:`Topology` defines the graph edges (:term:`Message Passing` paths) - see :doc:`/programming_guide/ml-fundamentals` for how GNNs use graph structure
- **Feature Recognition**: Machining features are **subgraphs** with specific topology (e.g., "pocket = 6 connected planar faces forming a box")
- **Manufacturing Constraints**: Adjacent faces must have compatible machining directions
- **Segmentation**: Group faces that are topologically connected

BrepEncoder.push_face_adjacency_graph()
----------------------------------------

The method :autolink:`BrepEncoder.push_face_adjacency_graph <hoops_ai.cadencoder.brep_encoder.BrepEncoder.push_face_adjacency_graph>` builds a face adjacency graph from the B-rep model. This graph represents the topology of the model where nodes are faces and edges connect adjacent faces.

**What It Does**: Build a graph representation of face connectivity where faces are nodes and edges represent shared boundaries.

**Mathematical Formulation**:

Define an undirected graph :math:`G=(V,E)` where:

.. math::

   V = \mathcal{F} = \{f_0, f_1, \ldots, f_{N_f-1}\}

   E = \{(f_i, f_j) : f_i \text{ and } f_j \text{ share an edge}\}

The graph is represented by:

- Node count: :math:`|V| = N_f`
- Edge list: :math:`\{(s_k, d_k)\}_{k=0}^{|E|-1}` where :math:`s_k, d_k \in V`

**Method Signature**:

.. code-block:: python

   def push_face_adjacency_graph(self) -> Union[Tuple[str, int, int], nx.Graph]:
       """
       Returns:
           If storage_handler is not None: Tuple[str, int, int] - (storage_key, num_faces, num_edges)
           If storage_handler is None: nx.Graph - the face adjacency graph directly
       """

**Usage**:

.. code-block:: python

   # we assume 'cad_model' is your loaded CADModel instance
   import matplotlib.pyplot as plt
   import networkx as nx

   from hoops_ai.cadencoder import BrepEncoder

   brep_encoder = BrepEncoder(cad_model.get_brep())
   adj_graph = brep_encoder.push_face_adjacency_graph()
   print(adj_graph)

   pos = nx.spring_layout(adj_graph)  # compute layout once
   nx.draw_networkx(adj_graph, pos, arrows=False)  # draw nodes, edges, labels
   plt.axis('off')  # turn off axes for clarity
   plt.show()

**Example Output**:

This example shows a ``DiGraph with 21 nodes and 46 edges`` (``nx.DiGraph`` is a subclass of ``nx.Graph``, and ``arrows=False`` in the snippet above draws the edges without arrowheads). The images below show the 3D CAD model and its corresponding face adjacency graph:

.. list-table::
   :widths: 50 50
   :class: borderless

   * - .. figure:: /_assets/images/part.png
          :alt: 3D CAD model
          :width: 100%

          3D CAD model with 21 faces

     - .. figure:: /_assets/images/example-adjacency-graph.png
          :alt: Face adjacency graph
          :width: 100%

          Face adjacency graph representation

The graph visualization shows nodes (faces) numbered 0-20 and edges connecting adjacent faces. The spring layout algorithm positions the nodes for clarity. Each node in the graph corresponds to a face in the 3D model, and edges represent shared boundaries between faces.

**Storage Format Details**:

The encoder stores graph data in **two formats simultaneously** for compatibility:

1. **Flat arrays**:

   - ``num_nodes``: scalar count of nodes in the graph
   - ``edges_source``: source node indices for each edge
   - ``edges_destination``: destination node indices for each edge

2. **Nested dictionary**:

   - ``graph``: nested structure containing the edges dict and ``num_nodes`` (for backward compatibility)

Dtypes: ``int32``

**Returns**:

- With storage: Returns ``None`` (data is stored with keys: ``"num_nodes"``, ``"edges_source"``, ``"edges_destination"``, and ``"graph"``)
- Without storage: Returns ``nx.Graph`` - NetworkX graph object with edge attributes

.. note::

   **Understanding the format**: If you're unfamiliar with graph representations for ML, see :doc:`/programming_guide/ml-fundamentals` - the "Graph Representation of CAD Models" section explains nodes, edges, and :term:`node features `.

BrepEncoder.push_extended_adjacency()
--------------------------------------

The method :autolink:`BrepEncoder.push_extended_adjacency ` computes the extended adjacency matrix representing shortest path distances between all pairs of faces and ensures ``extended_adjacency`` is in storage or returns the extended adjacency data directly.

**What It Does**: Computes shortest path distances between **all pairs of faces** on the face adjacency graph. This provides global topological context.

**Method Signature**:

.. code-block:: python

   def push_extended_adjacency(self) -> Union[str, np.ndarray]:
       """
       Returns:
           If storage_handler is not None: str - storage key ("extended_adjacency")
           If storage_handler is None: np.ndarray - shape (num_faces, num_faces)
       """

**Usage**:

.. code-block:: python

   # Compute all-pairs shortest paths
   key = encoder.push_extended_adjacency()

   # Later: check topological distance
   distances = storage.load_data("extended_adjacency")
   # distances[i, j] = shortest path length from face i to face j
   # distances[i, i] = 0 (same face)
   # distances[i, j] = 1 (directly adjacent)
   # distances[i, j] = 2 (connected through one intermediate face)

**Mathematical Formulation**:

Compute the graph distance matrix :math:`\mathbf{D}_G \in \mathbb{R}^{N_f \times N_f}`:

.. math::

   D_G[i,j] = \begin{cases}
   0 & \text{if } i = j \\
   \min\{|p| : p \text{ is a path from } f_i \text{ to } f_j\} & \text{if a path exists} \\
   \infty & \text{otherwise}
   \end{cases}

where :math:`|p|` is the number of edges in path :math:`p`. This is computed via NetworkX's ``all_pairs_shortest_path_length``, which runs a breadth-first search from every node; on an unweighted graph this yields the same distances as a Floyd-Warshall all-pairs computation.
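
As a worked illustration of the distance matrix defined above, the following sketch computes the same hop distances with a plain breadth-first search from every face. It is a standard-library stand-in written for this guide, not the encoder's implementation; the function name ``all_pairs_hop_distance`` and the adjacency-dictionary input format are assumptions of the sketch.

```python
from collections import deque
from math import inf
from typing import Dict, List, Set


def all_pairs_hop_distance(adjacency: Dict[int, Set[int]]) -> List[List[float]]:
    """Return D where D[i][j] is the number of edges on a shortest path from i to j.

    `adjacency` maps every face index 0..N-1 to the set of its neighbours.
    Unreachable pairs are reported as math.inf.
    """
    n = len(adjacency)
    dist = [[inf] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            face = queue.popleft()
            for neighbour in adjacency[face]:
                if dist[source][neighbour] == inf:  # not visited yet
                    dist[source][neighbour] = dist[source][face] + 1
                    queue.append(neighbour)
    return dist


# Faces 0-1-2 form a chain; face 3 is not connected to anything.
chain = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}
D = all_pairs_hop_distance(chain)
```

Here ``D[0][2]`` is 2 (one intermediate face) and every entry involving the isolated face 3, other than ``D[3][3]``, is infinite, matching the three cases of the formula.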
**Storage**:

- **Array:** ``extended_adjacency``
- **Shape:** ``[node_i, node_j]``
- **Dtype:** ``float32``

**Returns**:

- With storage: Returns ``None`` (data is stored with key ``"extended_adjacency"``)
- Without storage: ``np.ndarray`` of shape ``(N_f, N_f)``

BrepEncoder.push_face_neighbors_count()
----------------------------------------

The method :autolink:`BrepEncoder.push_face_neighbors_count ` counts the number of adjacent faces for each face and ensures ``face_neighborscount`` is in storage or returns the neighbor counts directly.

**What It Does**: Count the number of adjacent faces for each face (node degree in the graph).

**Usage**:

.. code-block:: python

   key = encoder.push_face_neighbors_count()
   neighbor_counts = storage.load_data("face_neighborscount")
   # neighbor_counts[i] = number of faces adjacent to face i

**Mathematical Formulation**:

For each face :math:`f_i`, compute the degree:

.. math::

   \deg(f_i) = |\{f_j \in \mathcal{F} : (f_i, f_j) \in E\}|

**Storage:**

- **Array:** ``face_neighborscount``
- **Shape:** ``[face]``
- **Dtype:** ``int32``

**Returns:**

- With storage: Returns ``None`` (data is stored with key ``"face_neighborscount"``)
- Without storage: ``np.ndarray`` of shape ``(N_f,)``

BrepEncoder.push_face_pair_edges_path(max_allow_edge_length=16)
-----------------------------------------------------------------

The method :autolink:`BrepEncoder.push_face_pair_edges_path ` computes the sequence of edges along the shortest path between all pairs of faces and ensures ``face_pair_edges_path`` is in storage or returns the edge paths directly.

**What It Does**: Store the sequence of shared edges along the shortest path between every pair of faces.

**Usage**:

.. code-block:: python

   key = encoder.push_face_pair_edges_path(max_allow_edge_length=16)
   edge_paths = storage.load_data("face_pair_edges_path")
   # Shape: (num_faces, num_faces, 16)
   # edge_paths[i, j, :] = edge indices from face i to face j (-1 for padding)

**Mathematical Formulation:**

For each face pair :math:`(f_i, f_j)`, find the shortest path:

.. math::

   p_{ij} = [f_i = v_0, v_1, \ldots, v_k = f_j]

Then extract the edge sequence:

.. math::

   \mathbf{e}_{ij} = [e(v_0, v_1), e(v_1, v_2), \ldots, e(v_{k-1}, v_k)]

where :math:`e(u,v)` is the edge index connecting faces :math:`u` and :math:`v`. If :math:`|\mathbf{e}_{ij}| > M` (``max_allow_edge_length``), truncate to the first :math:`M` edges. Pad with :math:`-1` if the path is shorter.

**Storage:**

- **Array:** ``face_pair_edges_path``
- **Shape:** ``[face_i, face_j, path_idx]``
- **Dtype:** ``int32``

**Parameters:**

- ``max_allow_edge_length`` (int): Maximum path length to store (default: 16)

**Returns:**

- With storage: Returns ``None`` (data is stored with key ``"face_pair_edges_path"``)
- Without storage: ``np.ndarray`` of shape ``(N_f, N_f, M)``

Geometry Encoding Methods
==========================

**What is Geometry Encoding?**

Geometry encoding extracts **numeric measurements** of CAD entities - sizes, shapes, positions, and curvatures. While :term:`Topology` tells us which faces are connected, geometry tells us their actual physical properties. See :doc:`/programming_guide/cad-fundamentals` for the topology vs. geometry distinction.

BrepEncoder.push_face_attributes()
----------------------------------

The method :autolink:`BrepEncoder.push_face_attributes ` extracts and stores various face attributes, including face types, areas, and loop counts.

**What It Does**: Compute geometric and topological properties of each face.

**Method Signature**:

.. code-block:: python

   def push_face_attributes(self) -> Union[Tuple[List[str], Dict], Tuple[List[np.ndarray], Dict]]:
       """
       Returns:
           If storage_handler is not None:
               Tuple[List[str], Dict] - (list_of_stored_keys, face_type_descriptions)
               Example: (['face_types', 'face_areas', 'face_loops'], {9: 'Plane', 10: 'Cylinder'})
           If storage_handler is None:
               Tuple[List[np.ndarray], Dict] - (list_of_arrays, face_type_descriptions)
       """

**Usage**:

**With Storage Handler**:

.. code-block:: python

   # Extract face attributes (with storage handler)
   keys, face_type_desc = encoder.push_face_attributes()
   print(f"Stored face data at keys: {keys}")
   # Output: ['face_types', 'face_areas', 'face_loops']

   print("Face type descriptions:")
   for type_id, description in face_type_desc.items():
       print(f"  {type_id}: {description}")
   # Output:
   #   9: Plane
   #   10: Cylinder
   #   11: Cone
   #   ...

   # Later: retrieve from storage
   face_types = storage.load_data("face_types")  # int32 array[num_faces]
   face_areas = storage.load_data("face_areas")  # float32 array[num_faces]
   face_loops = storage.load_data("face_loops")  # int32 array[num_faces]

**Without Storage Handler**:

.. code-block:: python

   [face_types, face_areas, face_loops], face_types_descr = encoder.push_face_attributes()
   print("face_types", face_types)
   print("face_areas", face_areas)
   print("face_loops", face_loops)
   print("face_types_descr", face_types_descr)

**Example Output**:

.. code-block:: text

   face_types [0 1 0 1 0 0 0 1 0 1 1 1 0 0 0 0 0 1 1 2 2]
   face_areas [ 43.911655 141.3149 75.277115 51.831074 24.732485 57.030937
    19.963306 12.871587 28.228918 39.265965 39.265965 12.871587
    57.030937 57.030933 57.030937 57.030937 57.030933 51.831074
    141.3149 10.575602 10.575602]
   face_loops [2 1 2 1 2 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1]
   face_types_descr {0: 'Plane', 1: 'Cylinder', 2: 'Cone'}

**Mathematical Formulation**:

For each face :math:`f_i \in \mathcal{F}`:

1. **Surface Type** :math:`\tau(f_i)`: Categorical classification (plane, cylinder, sphere, etc.)

   .. math::

      \tau: \mathcal{F} \rightarrow \mathbb{Z}^+

2. **Face Area** :math:`A(f_i)`: Surface integral over the face

   .. math::

      A(f_i) = \iint_{S_i} dS

3. **Loop Count** :math:`L(f_i)`: Number of boundary loops (including holes)

   .. math::

      L(f_i) = |\{\text{loops in } f_i\}|

**Storage:**

- **Arrays:** ``face_types``, ``face_areas``, ``face_loops``
- **Shapes:** All ``[face]``
- **Dtypes:** ``int32``, ``float32``, ``int32``

**Returns:**

- With storage: Returns ``None`` (data is stored with keys: ``"face_types"``, ``"face_areas"``, ``"face_loops"``, and metadata ``"descriptions/face_types"``)
- Without storage: ``Tuple[List[np.ndarray], Dict]`` - (list of numpy arrays [face_types, face_areas, face_loops], face_types_descr dictionary mapping face type IDs to descriptions)

BrepEncoder.push_edge_attributes()
----------------------------------

The method :autolink:`BrepEncoder.push_edge_attributes ` extracts and stores various edge attributes, including edge types, lengths, dihedral angles, and convexities.

**What It Does**: Compute geometric and topological properties of each edge.

**Method Signature**:

.. code-block:: python

   def push_edge_attributes(self) -> Union[Tuple[List[str], Dict], Tuple[List[np.ndarray], Dict]]:
       """
       Returns:
           If storage_handler is not None:
               Tuple[List[str], Dict] - (list_of_stored_keys, edge_type_descriptions)
           If storage_handler is None:
               Tuple[List[np.ndarray], Dict] - (list_of_arrays, edge_type_descriptions)
       """

**Mathematical Formulation**:

For each edge :math:`e_i \in \mathcal{E}`:

1. **Curve Type** :math:`\kappa(e_i)`: Categorical classification (line, circle, spline, etc.)

   .. math::

      \kappa: \mathcal{E} \rightarrow \mathbb{Z}^+

2. **Edge Length** :math:`\ell(e_i)`: Arc length of the curve

   .. math::

      \ell(e_i) = \int_{0}^{1} \left\| \frac{d\mathbf{C}(t)}{dt} \right\| dt

   where :math:`\mathbf{C}(t)` is the curve parameterization.

3. **Dihedral Angle** :math:`\theta(e_i)`: Angle between adjacent face normals

   .. math::

      \theta(e_i) = \arccos(\mathbf{n}_1 \cdot \mathbf{n}_2)

   where :math:`\mathbf{n}_1, \mathbf{n}_2` are unit normals of adjacent faces.

4. **Convexity** :math:`\chi(e_i) \in \{-1, 0, 1\}`:

   .. math::

      \chi(e_i) = \begin{cases}
      1 & \text{if convex} \\
      0 & \text{if smooth} \\
      -1 & \text{if concave}
      \end{cases}

**Storage:**

- **Arrays:** ``edge_types``, ``edge_lengths``, ``edge_dihedral_angles``, ``edge_convexities``
- **Shapes:** All ``[edge]``
- **Dtypes:** ``int32``, ``float32``, ``float32``, ``int32``

**Returns:**

- With storage: Returns ``None`` (data is stored with keys: ``"edge_types"``, ``"edge_lengths"``, ``"edge_dihedral_angles"``, ``"edge_convexities"``, and metadata ``"descriptions/edge_types"``)
- Without storage: ``Tuple[List[np.ndarray], Dict]`` - (list of numpy arrays [edge_types, edge_lengths, edge_dihedrals, edge_convexities], edge_type_descrip dictionary mapping edge type IDs to descriptions)

**Usage**:

**With Storage Handler**:

.. code-block:: python

   # Extract edge attributes (with storage handler)
   keys, edge_type_desc = encoder.push_edge_attributes()
   print(f"Stored edge data at keys: {keys}")
   # Output: ['edge_types', 'edge_lengths', 'edge_dihedral_angles', 'edge_convexities']

   # Later: retrieve from storage
   edge_types = storage.load_data("edge_types")  # int32 array[num_edges]
   edge_lengths = storage.load_data("edge_lengths")  # float32 array[num_edges]
   edge_dihedrals = storage.load_data("edge_dihedral_angles")  # float32 array[num_edges]
   edge_convexities = storage.load_data("edge_convexities")  # int32 array[num_edges]

**Without Storage Handler**:

.. code-block:: python

   [edge_types_np, edge_lengths_np, edge_dihedrals_np, edge_convexities_np], edge_type_descrip = brep_encoder.push_edge_attributes()
   print("edge_types_np", edge_types_np)
   print("edge_lengths_np", edge_lengths_np)
   print("edge_dihedrals_np", edge_dihedrals_np)
   print("edge_convexities_np", edge_convexities_np)
   print("edge_type_descrip", edge_type_descrip)

**Example Output**:

.. code-block:: text

   edge_types_np [1 1 1 1 0 1 0 1 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 1 1 0
    0 1 1 0 0 0 0 0 0]
   edge_lengths_np [14.137167 14.137167 7.8539815 7.8539815 18. 7.8539815
    18. 17.278759 17.278759 7.8539815 3. 3.
    17.278759 17.278759 5.196152 5.196152 5.196152 5.196152
    5.196152 5.196152 10.97561 5.196152 10.97561 5.196152
    5.196152 5.196152 5.196152 5.196152 12.566371 12.566371
    1.0243902 12.566371 1.0243902 15.707963 15.707963 12.566371
    2.5 2.5 15.707963 15.707963 10.97561 10.97561
    10.97561 10.97561 0.70710677 0.70710677]
   edge_dihedrals_np [ 7.8539819e-01 7.8539819e-01 -1.5707964e+00 -1.5707964e+00
    2.4492937e-16 1.5707964e+00 0.0000000e+00 -1.5707964e+00
    -1.5707964e+00 1.5707964e+00 0.0000000e+00 2.4492937e-16
    1.5707964e+00 1.5707964e+00 -1.5707964e+00 -1.5707964e+00
    -1.5707964e+00 -1.5707964e+00 -1.5707964e+00 -1.5707964e+00
    1.0471976e+00 1.5707964e+00 1.0471976e+00 1.5707964e+00
    1.5707964e+00 1.5707964e+00 1.5707964e+00 1.5707964e+00
    -1.5707964e+00 -1.5707964e+00 2.4492937e-16 1.5707964e+00
    0.0000000e+00 -1.5707964e+00 -1.5707964e+00 1.5707964e+00
    0.0000000e+00 2.4492937e-16 7.8539819e-01 7.8539819e-01
    1.0471976e+00 1.0471976e+00 1.0471976e+00 1.0471976e+00
    0.0000000e+00 0.0000000e+00]
   edge_convexities_np [1 1 2 2 3 1 3 2 2 1 3 3 1 1 2 2 2 2 2 2 1 1 1 1 1 1 1 1 2 2 3 1 3 2 2 1 3
    3 1 1 1 1 1 1 3 3]
   edge_type_descrip {1: 'Circle', 0: 'Line'}

In this sample the convexity codes take the values 1, 2 and 3, paired with positive, negative and zero dihedral angles respectively, rather than the :math:`\{-1, 0, 1\}` convention used in the formulation above. The dihedral angles are also signed.

**What Gets Stored**:

.. list-table::
   :header-rows: 1
   :widths: 25 15 20 40

   * - Storage Key
     - Type
     - Shape
     - Description
   * - ``edge_types``
     - int32
     - ``(num_edges,)``
     - Curve type IDs (for example 0=Line, 1=Circle in the sample output above; the full mapping is stored under ``descriptions/edge_types``)
   * - ``edge_lengths``
     - float32
     - ``(num_edges,)``
     - Curve length in model units
   * - ``edge_dihedral_angles``
     - float32
     - ``(num_edges,)``
     - Angle between adjacent faces (radians)
   * - ``edge_convexities``
     - int32
     - ``(num_edges,)``
     - Convexity: 1=convex, -1=concave, 0=smooth/tangent
   * - ``descriptions/edge_types``
     - metadata
     - dict
     - Human-readable curve type names, e.g. ``{0: 'Line', 1: 'Circle'}`` for the sample output above

BrepEncoder.push_curvegrid(ugrid=5)
---------------------------------------

The method :autolink:`BrepEncoder.push_curvegrid ` samples points along edges at regular intervals.

**What It Does**: Sample points and tangents along edge curves.

**Method Signature**:

.. code-block:: python

   def push_curvegrid(self, ugrid: int = 5) -> Union[str, np.ndarray]:
       """
       Args:
           ugrid: Number of samples along each edge (default: 5)

       Returns:
           If storage_handler is not None: str - storage key ("edge_u_grids")
           If storage_handler is None: np.ndarray - shape (num_edges, ugrid, 6)
       """

**Usage**:

.. code-block:: python

   edge_grids = brep_encoder.push_curvegrid(ugrid=3)
   print("edge_grids\n", edge_grids[0])

**Example Output**:

.. code-block:: text

   edge_grids
    [[ 2.2000000e+01  4.5000000e+00  0.0000000e+00 -0.0000000e+00  0.0000000e+00 -1.0000000e+00]
    [ 2.2000000e+01  2.2500000e+00 -3.8971143e+00  0.0000000e+00 -8.6602539e-01 -5.0000000e-01]
    [ 2.2000000e+01 -4.5000000e+00 -1.5436120e-14  0.0000000e+00 -3.4302490e-15  1.0000000e+00]]

**Mathematical Formulation**:

For each edge :math:`e_i`, sample along the curve parameter:

.. math::

   \mathbf{C}_i = \left[\mathbf{C}(t_j), \mathbf{T}(t_j)\right]_{j=0}^{U-1}

where:

- :math:`\mathbf{C}: [0,1] \rightarrow \mathbb{R}^3` is the curve
- :math:`\mathbf{T}(t) = \frac{d\mathbf{C}(t)}{dt}` is the tangent vector
- :math:`t_j = \frac{j}{U-1}` for :math:`j = 0, \ldots, U-1`

**Storage:**

- **Array:** ``edge_u_grids``
- **Shape:** ``[edge, u, component]`` where component includes (x,y,z) + (tx,ty,tz)
- **Dtype:** ``float32``

**Parameters:**

- ``ugrid`` (int): Number of samples along edge (default: 5)

**Returns:**

- With storage: ``str`` - key name ``"edge_u_grids"``
- Without storage: ``np.ndarray`` of shape ``(N_e, ugrid, 6)``

BrepEncoder.push_face_indices()
--------------------------------

**What it does:** Extract and store unique identifiers for all faces in the model.

**Mathematical Formulation:**

.. math::

   \mathcal{F} = \{f_0, f_1, \ldots, f_{N_f-1}\}

where :math:`\mathcal{F}` is the set of face indices and :math:`N_f` is the total number of faces.

**Storage:**

- **Group:** ``faces``
- **Array:** ``face_indices``
- **Shape:** ``[face]``
- **Dtype:** ``int32``

**Returns:**

- With storage: ``str`` - key name ``"face_indices"``
- Without storage: ``np.ndarray`` of shape ``(N_f,)``

BrepEncoder.push_edge_indices()
--------------------------------

**What it does:** Extract and store unique identifiers for all edges in the model.

**Mathematical Formulation:**

.. math::

   \mathcal{E} = \{e_0, e_1, \ldots, e_{N_e-1}\}

where :math:`\mathcal{E}` is the set of edge indices and :math:`N_e` is the total number of edges.
**Storage:** - **Group:** ``edges`` - **Array:** ``edge_indices`` - **Shape:** ``[edge]`` - **Dtype:** ``int32`` **Returns:** - With storage: ``str`` - key name ``"edge_indices"`` - Without storage: ``np.ndarray`` of shape ``(N_e,)`` BrepEncoder.push_face_discretization(pointsamples=25) ----------------------------------------------------- **What it does:** Sample points and normals on face surfaces using uniform point sampling (rather than structured UV grids). **Mathematical Formulation:** For each face :math:`f_i`, sample :math:`P` points uniformly across the surface: .. math:: \mathbf{P}_{i} = \left[\mathbf{S}(\mathbf{u}_j), \mathbf{N}(\mathbf{u}_j), V(\mathbf{u}_j)\right]_{j=1}^{P} where: - :math:`\mathbf{S}: \Omega \rightarrow \mathbb{R}^3` is the surface parameterization - :math:`\mathbf{N}: \Omega \rightarrow \mathbb{S}^2` is the normal field - :math:`V: \Omega \rightarrow \{0,1\}` is visibility status (inside/outside) - :math:`\mathbf{u}_j` are uniformly sampled parameter points across the face - :math:`P` is the number of sample points (default: 25) The sampling uses three methods concatenated along the component axis: 1. **Point samples**: :math:`(x, y, z)` coordinates 2. **Normal samples**: :math:`(n_x, n_y, n_z)` unit normals 3. 
**Inside/outside flags**: visibility indicators **Storage:** - **Array:** ``face_discretization`` - **Shape:** ``[face, sample, component]`` where component includes (x,y,z) + (nx,ny,nz) + (visibility) - **Dtype:** ``float32`` **Parameters:** - ``pointsamples`` (int): Number of points to sample per face (default: 25) **Returns:** - With storage: ``str`` - key name ``"face_discretization"`` - Without storage: ``np.ndarray`` of shape ``(N_f, pointsamples, 7)`` Histogram-Based Features ========================= BrepEncoder.push_average_face_pair_distance_histograms(grid=5, num_bins=64) ------------------------------------------------------------------------------ The method :autolink:`BrepEncoder.push_average_face_pair_distance_histograms ` computes histograms of point-to-point distances between all pairs of faces and ensures 'd2_distance' is in storage or returns the distance histograms directly. **What It Does**: Compute normalized histograms of pairwise point-to-point distances between all face pairs (D2 shape descriptor). **Usage**: .. code-block:: python key = encoder.push_average_face_pair_distance_histograms(grid=5, num_bins=64) distance_histograms = storage.load_data("d2_distance") # Shape: (num_faces, num_faces, 64) # Histogram of distances between sample points from face i and face j **Implementation Notes**: - Uses optimized sampling: maximum 25 points per face (or fewer if face has less than 25 points) - Employs 2-thread parallel processing for improved performance - Processes faces in two chunks to balance memory and computation **Mathematical Formulation:** 1. **Sample Points:** For each face :math:`f_i`, sample :math:`P` points uniformly: .. math:: \mathcal{P}_i = \{\mathbf{p}_1^i, \mathbf{p}_2^i, \ldots, \mathbf{p}_P^i\} \subset \mathbb{R}^3 2. **Compute Distances:** For faces :math:`f_i` and :math:`f_j`, compute all pairwise distances: .. math:: d_{ij}^{mn} = \|\mathbf{p}_m^i - \mathbf{p}_n^j\|_2, \quad m,n = 1,\ldots,P 3. 
   **Normalize by Diagonal:** Let :math:`D` be the bounding box diagonal:

   .. math::

      D = \|\mathbf{b}_{\max} - \mathbf{b}_{\min}\|_2

   Normalized distances:

   .. math::

      \tilde{d}_{ij}^{mn} = \frac{d_{ij}^{mn}}{D}

4. **Build Histogram:** Bin the normalized distances into :math:`B` bins over :math:`[0,1]`, indexed :math:`b = 0, \ldots, B-1`:

   .. math::

      H_{ij}[b] = \frac{1}{P^2} \sum_{m=1}^P \sum_{n=1}^P \mathbb{1}\left[\frac{b}{B} \leq \tilde{d}_{ij}^{mn} < \frac{b+1}{B}\right]

Result: :math:`\mathbf{H} \in \mathbb{R}^{N_f \times N_f \times B}` where :math:`H_{ij}` is the distance histogram between faces :math:`i` and :math:`j`.

**Storage:**

- **Group:** ``histograms``
- **Array:** ``d2_distance``
- **Shape:** ``[face_i, face_j, bin]``
- **Dtype:** ``float32``

**Parameters:**

- ``grid`` (int): Grid density for sampling (default: 5)
- ``num_bins`` (int): Number of histogram bins (default: 64)

**Returns:**

- With storage: Returns ``None`` (data is stored with key ``"d2_distance"``)
- Without storage: ``np.ndarray`` of shape ``(N_f, N_f, num_bins)``

BrepEncoder.push_average_face_pair_angle_histograms(grid=5, num_bins=64)
------------------------------------------------------------------------

The method :autolink:`BrepEncoder.push_average_face_pair_angle_histograms` computes histograms of angles between normals for all pairs of faces. With a storage handler it ensures ``a3_distance`` is in storage; without one it returns the angle histograms directly.

**What It Does**: Compute normalized histograms of pairwise normal-to-normal angles between all face pairs (A3 shape descriptor).

**Implementation Notes**:

- Uses optimized sampling: at most 25 normals per face (fewer if the face has fewer than 25 normals)
- Employs 2-thread parallel processing for improved performance
- Processes faces in two chunks to balance memory and computation

**Usage**:

.. code-block:: python

   # With a storage handler the result is written to storage under "a3_distance"
   encoder.push_average_face_pair_angle_histograms(grid=5, num_bins=64)
   angle_histograms = storage.load_data("a3_distance")
   # Shape: (num_faces, num_faces, 64)
   # Histogram of angles between normal vectors from face i and face j

**Mathematical Formulation:**

1. **Sample Normals:** For each face :math:`f_i`, sample :math:`P` normal vectors:

   .. math::

      \mathcal{N}_i = \{\mathbf{n}_1^i, \mathbf{n}_2^i, \ldots, \mathbf{n}_P^i\} \subset \mathbb{S}^2

2. **Compute Angles:** For faces :math:`f_i` and :math:`f_j`, compute all pairwise angles:

   .. math::

      \theta_{ij}^{mn} = \arccos(\mathbf{n}_m^i \cdot \mathbf{n}_n^j), \quad m,n = 1,\ldots,P

   The dot product :math:`\mathbf{n}_m^i \cdot \mathbf{n}_n^j` is clamped to :math:`[-1, 1]` before applying :math:`\arccos` to avoid numerical issues.

3. **Normalize to [0,1]:**

   .. math::

      \tilde{\theta}_{ij}^{mn} = \frac{\theta_{ij}^{mn}}{\pi}

4. **Build Histogram:** Bin the normalized angles into :math:`B` bins, indexed :math:`b = 0, \ldots, B-1`:

   .. math::

      H_{ij}^{\theta}[b] = \frac{1}{P^2} \sum_{m=1}^P \sum_{n=1}^P \mathbb{1}\left[\frac{b}{B} \leq \tilde{\theta}_{ij}^{mn} < \frac{b+1}{B}\right]

Result: :math:`\mathbf{H}^{\theta} \in \mathbb{R}^{N_f \times N_f \times B}` where :math:`H_{ij}^{\theta}` is the angle histogram between faces :math:`i` and :math:`j`.

**Storage:**

- **Group:** ``histograms``
- **Array:** ``a3_distance``
- **Shape:** ``[face_i, face_j, bin]``
- **Dtype:** ``float32``

**Parameters:**

- ``grid`` (int): Grid density for sampling normals (default: 5)
- ``num_bins`` (int): Number of histogram bins (default: 64)

**Returns:**

- With storage: Returns ``None`` (data is stored with key ``"a3_distance"``)
- Without storage: ``np.ndarray`` of shape ``(N_f, N_f, num_bins)``

.. _complete_encoding_example:

Complete Encoding Example
==========================

The following encoding workflow follows the usage pattern from the tutorials:

.. code-block:: python

   from hoops_ai.cadaccess import HOOPSLoader
   from hoops_ai.cadencoder import BrepEncoder
   from hoops_ai.storage import OptStorage

   # 1. Load CAD file
   loader = HOOPSLoader()
   model = loader.create_from_file("part.step")

   # 2. Extract BREP
   brep = model.get_brep()

   # 3. Initialize storage and encoder
   storage = OptStorage(output_path="./encoded_data")
   encoder = BrepEncoder(brep_access=brep, storage_handler=storage)

   # 4. Extract geometric features
   encoder.push_face_indices()
   encoder.push_edge_indices()
   encoder.push_face_attributes()
   encoder.push_edge_attributes()

   # 5. Extract parameterized grids
   encoder.push_face_discretization(pointsamples=100)
   encoder.push_curvegrid(ugrid=20)

   # 6. Extract topology
   encoder.push_face_adjacency_graph()
   encoder.push_extended_adjacency()
   encoder.push_face_neighbors_count()

   # 7. Extract shape descriptors
   encoder.push_average_face_pair_distance_histograms(grid=7, num_bins=64)
   encoder.push_average_face_pair_angle_histograms(grid=7, num_bins=64)

   print("Encoding complete!")

.. Understanding the Encoder
.. ==========================

.. Dual-Mode Behavior: With vs Without Storage
.. --------------------------------------------

.. **Critical Design**: ALL `push_*` methods have **dual behavior** depending on whether a storage handler is provided:

.. .. code-block:: python

..     # Mode 1: WITH storage handler (production)
..     storage = OptStorage("output.data")
..     encoder = BrepEncoder(brep, storage)
..     key, num_faces, num_edges = encoder.push_face_adjacency_graph()
..     # Returns: ("graph", 42, 84)
..     # Side effect: Data saved to storage["graph"]

..     # Mode 2: WITHOUT storage handler (testing/debugging)
..     encoder = BrepEncoder(brep)  # storage_handler defaults to None
..     graph = encoder.push_face_adjacency_graph()
..     # Returns: networkx.Graph object directly
..     # Side effect: None (data in memory only)

.. **Why This Design?**

.. - **Production**: Automatic persistence to disk (Zarr format)
..
.. - **Testing**: Fast in-memory operations without I/O
.. - **Debugging**: Inspect data structures directly without loading from storage

.. Method Dependencies
.. -------------------

.. Some `push_*` methods **require** other methods to be called first:

.. .. code-block:: python

..     # Dependencies enforced by runtime checks:
..     encoder.push_face_attributes()
..     # ❌ ERROR: Requires push_face_indices() first

..     encoder.push_extended_adjacency()
..     # ❌ ERROR: Requires push_face_adjacency_graph() first

..     # Correct order:
..     encoder.push_face_adjacency_graph()  # Creates face_indices internally
..     encoder.push_face_attributes()       # ✓ Works now
..     encoder.push_extended_adjacency()    # ✓ Works now

.. **Dependency Chain**:

.. .. code-block:: text

..     push_face_adjacency_graph()
..     ├── (calls internally) push_face_indices()
..     ├── (calls internally) push_edge_indices()
..     └── enables → push_extended_adjacency()
..     └── enables → push_face_neighbors_count()
..     └── enables → push_face_pair_edges_path()

..     push_face_indices()
..     └── enables → push_face_attributes()
..     └── enables → push_facegrid()
..     └── enables → push_average_face_pair_*()

..     push_edge_indices()
..     └── enables → push_edge_attributes()
..     └── enables → push_curvegrid()

.. Schema System (Advanced)
.. -------------------------

.. The encoder automatically manages data structure via schemas:

.. .. code-block:: python

..     # Automatic schema creation (default behavior)
..     encoder = BrepEncoder(brep, storage)
..     encoder.push_face_attributes()
..     # Creates schema group "faces" with arrays:
..     # - face_types: (face,) int32
..     # - face_areas: (face,) float32
..     # - face_loops: (face,) int32

..     # Custom schema (advanced)
..     from hoops_ai.storage.datasetstorage.schema_builder import SchemaBuilder

..     builder = SchemaBuilder(domain="Custom_CAD", version="2.0")
..     faces_group = builder.create_group("faces", "face", "Face data")
..     faces_group.add_array("types", ["face"], "int32", "Surface types")
..
..     faces_group.add_array("areas", ["face"], "float32", "Surface areas")
..     schema = builder.build()
..     storage.set_schema(schema)
..     encoder = BrepEncoder(brep, storage)

.. **When to Use Custom Schemas**:

.. - Merging datasets from multiple sources (consistent structure required)
.. - Adding domain-specific metadata
.. - Enforcing data validation rules
.. - Integration with production ML pipelines

.. Feature Engineering Patterns
.. =============================

.. Common patterns for deriving ML-ready features from raw CAD data:

.. Normalization
.. -------------

.. .. code-block:: python

..     import numpy as np

..     def normalize_geometric_features(storage):
..         """Normalize geometric features to [0, 1] range"""
..         # Load face data
..         face_areas = storage.load_data("face_areas")
..         edge_lengths = storage.load_data("edge_lengths")

..         # Normalize by the largest value in each array
..         # Note: normalizing by model scale (diagonal length) would require
..         # computing the diagonal separately
..         max_area = np.max(face_areas)
..         max_length = np.max(edge_lengths)

..         normalized_areas = face_areas / max_area
..         normalized_lengths = edge_lengths / max_length

..         # Store normalized values
..         storage.save_data("face_areas_normalized", normalized_areas)
..         storage.save_data("edge_lengths_normalized", normalized_lengths)

.. Categorical Encoding
.. --------------------

.. .. code-block:: python

..     from sklearn.preprocessing import OneHotEncoder

..     def encode_surface_types(storage):
..         """Convert surface type IDs to one-hot vectors"""
..         # Load face types
..         face_types = storage.load_data("face_types")

..         # One-hot encoding (scikit-learn >= 1.2 uses sparse_output instead of sparse)
..         encoder = OneHotEncoder(sparse_output=False)
..         face_types_onehot = encoder.fit_transform(face_types.reshape(-1, 1))

..         # Store
..         storage.save_data("face_types_onehot", face_types_onehot)

..         # Also store mapping for later interpretation
..         type_descriptions = storage.load_metadata("descriptions/face_types")
..         storage.save_metadata("surface_type_classes", type_descriptions)
..
.. Aggregated Features
.. -------------------

.. .. code-block:: python

..     import numpy as np

..     def compute_part_level_features(storage):
..         """Compute global part-level statistics from face/edge data"""
..         # Load data
..         face_areas = storage.load_data("face_areas")
..         face_types = storage.load_data("face_types")
..         edge_lengths = storage.load_data("edge_lengths")

..         # Compute statistics
..         part_features = {
..             'total_surface_area': np.sum(face_areas),
..             'mean_face_area': np.mean(face_areas),
..             'std_face_area': np.std(face_areas),
..             'num_faces': len(face_areas),
..             'num_edges': len(edge_lengths),
..             'num_unique_surface_types': len(np.unique(face_types)),
..             'total_edge_length': np.sum(edge_lengths)
..         }

..         # Store as metadata
..         storage.save_metadata("part_statistics", [part_features])

Performance Considerations
==========================

Memory Management
-----------------

- The encoder uses a **push-and-discard** pattern: data is computed, saved, and not kept in memory
- Large arrays (histograms) use chunked processing with ThreadPoolExecutor
- UV grids and curve grids are stacked only temporarily

Parallelization
------------------

- Face pair histograms use 2-thread parallel processing
- Sampling operations are vectorized with NumPy
- Graph algorithms leverage NetworkX's optimized implementations

Storage Efficiency
------------------

- Float32 is used throughout for memory/disk efficiency
- Zarr format provides compression and chunked access
- Schema management ensures consistent data organization

Next Steps
==========

- Read :doc:`/programming_guide/storage` to understand DataStorage backends (OptStorage, MemoryStorage)
- Study :doc:`/programming_guide/datasets` for batch processing and dataset management
- Review :doc:`/programming_guide/flow` for automating encoding across many CAD files
- Try hands-on tutorials in :doc:`/tutorials/index` for practical examples
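
As a rough illustration of the chunked, two-thread histogram computation mentioned under Performance Considerations, the sketch below builds D2-style distance histograms with NumPy and ``ThreadPoolExecutor``. It is not the ``BrepEncoder`` implementation: the helper names ``pair_histogram`` and ``d2_histograms`` are hypothetical, the input is assumed to be a list of per-face point arrays, and the bounding box is taken over the sampled points rather than the CAD model.

.. code-block:: python

   from concurrent.futures import ThreadPoolExecutor

   import numpy as np


   def pair_histogram(points_i, points_j, diagonal, num_bins):
       """Normalized histogram of distances between two point sets (hypothetical helper)."""
       # All pairwise Euclidean distances, shape (P_i, P_j)
       diffs = points_i[:, None, :] - points_j[None, :, :]
       dists = np.linalg.norm(diffs, axis=-1) / diagonal
       # Guard against rounding pushing a value slightly outside [0, 1]
       dists = np.clip(dists, 0.0, 1.0)
       counts, _ = np.histogram(dists, bins=num_bins, range=(0.0, 1.0))
       # Divide by the number of point pairs so each histogram sums to 1
       return counts.astype(np.float32) / dists.size


   def d2_histograms(face_points, num_bins=64, workers=2):
       """face_points: list of (P_i, 3) arrays, one per face (assumed input format)."""
       num_faces = len(face_points)
       all_points = np.concatenate(face_points, axis=0)
       # Assumption: bounding box of the sampled points stands in for the model's
       diagonal = np.linalg.norm(all_points.max(axis=0) - all_points.min(axis=0))
       diagonal = max(float(diagonal), 1e-12)  # avoid division by zero
       result = np.zeros((num_faces, num_faces, num_bins), dtype=np.float32)

       def process_rows(rows):
           # Each worker fills a disjoint set of rows, so no locking is needed
           for i in rows:
               for j in range(num_faces):
                   result[i, j] = pair_histogram(
                       face_points[i], face_points[j], diagonal, num_bins
                   )

       # Split the face indices into one chunk per worker
       chunks = np.array_split(np.arange(num_faces), workers)
       with ThreadPoolExecutor(max_workers=workers) as pool:
           list(pool.map(process_rows, chunks))  # list() surfaces worker exceptions
       return result

Calling ``d2_histograms(face_points, num_bins=64)`` on such a list yields an array of shape ``(N_f, N_f, 64)``. Because the set of distances between faces :math:`i` and :math:`j` is the same in either order, the result is symmetric in its first two axes, so a production implementation could compute only half of the pairs.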