CAD Feature Recognition Model

Introduction

CAD feature recognition is a fundamental task in manufacturing and design automation, enabling automatic identification of machining features on individual faces of mechanical components. This capability supports various downstream applications including manufacturing process planning, design rule checking, feature-based similarity search, and automated design analysis.

HOOPS AI provides the GraphNodeClassification model, a node-level (per-face) classifier specifically designed for feature recognition tasks. This model operates directly on Boundary Representation (B-rep) data from 3D CAD models, leveraging both geometric features and topological relationships to produce accurate face-level classifications.

Use Cases

The feature recognition capability addresses several practical scenarios:

Machining Feature Recognition

Automatically identify machining features such as holes, pockets, slots, and chamfers on individual faces. This supports automated manufacturing process planning and CNC programming.

Face Semantic Segmentation

Classify each face based on its functional or manufacturing characteristics, enabling automated design analysis and quality checking.

Manufacturing Process Planning

Determine appropriate machining operations for each feature based on face geometry and topology, supporting automated process selection.

Design Rule Checking

Validate that design features conform to manufacturing constraints and design standards.

Feature-Based Similarity Search

Enable intelligent search and retrieval based on specific feature types and configurations.

Model Overview

The GraphNodeClassification model implements a node-level classification architecture that processes CAD models through the following pipeline:

  1. Graph Representation: Convert B-rep models into graph representations where nodes are faces and edges are adjacency relationships

  2. Rich Feature Encoding: Extract geometric and topological features including face discretization, neighbor relationships, angle histograms, and distance histograms

  3. Feature Learning: Apply Transformer-based encoders and Graph Attention Networks to capture both local and global context

  4. Per-Face Classification: Produce a classification label for each individual face in the CAD model

This approach captures both local geometric details and global topological context, providing robust feature recognition even for complex mechanical parts.

Note

Attribution Notice

This implementation is based on a third-party architecture. For complete attribution and citation information, see Acknowledgments.

Model Architecture

Overview

The GraphNodeClassification model takes B-rep (Boundary Representation) models from CAD files and converts them into graph-based representations suitable for machine learning. This conversion enables per-face classification using a combined Transformer and Graph Neural Network (GNN) approach.

Graph Representation

The model uses a graph-based representation of CAD data (a minimal construction sketch follows the list):

  • Nodes: Represent the individual faces of the CAD model

  • Edges: Capture adjacency relationships between faces where they share common edges

  • Node Features: Encode rich geometric and topological information for each face

  • Edge Features: Represent curve geometry and relationships between connected faces
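For intuition, the following is a minimal sketch of such a face-adjacency graph built directly with DGL and PyTorch. It is purely illustrative: HOOPS AI constructs this graph internally (via BrepEncoder and DGLGraphStoreHandler), and the feature names and sizes used here are placeholders, not the library's actual keys.

# Illustrative only: HOOPS AI builds this graph internally from the B-rep.
# Feature keys and dimensions below are placeholders, not the library's actual names.
import dgl
import torch

# Three faces; edges connect faces that share a B-rep edge (stored in both directions)
src = torch.tensor([0, 0, 1, 1, 2, 2])
dst = torch.tensor([1, 2, 0, 2, 0, 1])
graph = dgl.graph((src, dst), num_nodes=3)

graph.ndata["face_feat"] = torch.randn(3, 64)  # one feature vector per face (node)
graph.edata["edge_feat"] = torch.randn(6, 32)  # one feature vector per adjacency (edge)
print(graph)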

Neural Network Components

The model employs several neural network components:

  • Transformer-based Encoder: Implements a multi-layer attention mechanism for feature processing

  • Graph Attention Networks: Aggregate and propagate information from neighboring nodes

  • MLP Classifier: Produces per-node classification predictions for each face

Key Features

The architecture includes several important capabilities:

  • Local Geometric Encoding: Captures surface shape through face-level discretization samples

  • Global Topological Encoding: Leverages the graph structure to capture part-level contextual information

  • Transfer Learning: Supports two-step training approach from synthetic to real CAD models

  • Attention Mechanisms: Enables the model to focus on relevant geometric relationships

┌─────────────────────────────────────────────────────────────────┐
│                        CAD B-rep Model                          │
│                   (Faces + Adjacency Edges)                     │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Graph Representation                         │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ Nodes: Individual faces                                  │   │
│  │ Edges: Face adjacency (shared edges)                     │   │
│  └──────────────────────────────────────────────────────────┘   │
└───────────────────────────────┬─────────────────────────────────┘
                                │
                   ┌────────────┴────────────┐
                   ▼                         ▼
   ┌───────────────────────────┐ ┌───────────────────────────┐
   │   Node Features (Rich)    │ │   Edge Features           │
   │                           │ │                           │
   │ • Face discretization     │ │ • U-grids (10, 6)         │
   │   (pointsamples, 7)       │ │ • Edge attributes         │
   │ • Face attributes         │ │ • Path information        │
   │ • Neighbor count          │ │                           │
   │ • Angle histograms        │ │                           │
   │ • Distance histograms     │ │                           │
   └─────────────┬─────────────┘ └─────────────┬─────────────┘
                 │                             │
                 └──────────────┬──────────────┘
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                 GraphNodeClassification                         │
└─────────────────────────────────────────────────────────────────┘

Output

The model produces a classification label for each face in the CAD model.
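As a minimal illustration (shapes only; the actual prediction API is shown in the Inference Workflow section), the per-face output can be pictured as one class index per face, obtained by a row-wise argmax over the logits:

# Shapes are illustrative; see the Inference Workflow section for the actual API.
import torch

num_faces, num_classes = 42, 25
logits = torch.randn(num_faces, num_classes)  # one score vector per face
face_labels = logits.argmax(dim=1)            # one class index per face
assert face_labels.shape == (num_faces,)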

Technology Details

This implementation is based on a state-of-the-art architecture for machining feature recognition in B-rep models. The architecture employs:

  • Rich geometric and topological feature encoding

  • Transformer-based attention mechanisms for capturing long-range dependencies

  • Graph neural networks for local neighborhood aggregation

  • Domain adaptation capabilities for transferring from synthetic to real data

Model Initialization

Before using the GraphNodeClassification model, you must initialize it with configuration parameters.

Basic Usage

The simplest initialization requires only the number of feature classes:

from hoops_ai.ml.EXPERIMENTAL import GraphNodeClassification

# Create model with default parameters
flow_model = GraphNodeClassification(
    num_classes=25,
    result_dir="./results"
)

This creates a GraphNodeClassification instance configured for a 25-class feature recognition task, with results saved to the ./results directory.

Parameters

The GraphNodeClassification initialization accepts several parameter categories:

Model Architecture Parameters

Parameter           Type    Default  Description
------------------  ------  -------  ----------------------------------------------------
num_classes         int     25       Number of feature classes (e.g., hole, pocket, slot)
n_layers_encode     int     8        Number of Transformer encoder layers
dim_node            int     256      Node embedding dimension
d_model             int     512      Transformer model dimension
n_heads             int     32       Number of attention heads
dropout             float   0.3      Classifier dropout rate
attention_dropout   float   0.3      Attention mechanism dropout
act_dropout         float   0.3      Activation layer dropout

Training Hyperparameters

Parameter            Type                  Default        Description
-------------------  --------------------  -------------  ----------------------------------
learning_rate        float                 0.002          Initial learning rate
optimizer_betas      Tuple[float, float]   (0.99, 0.999)  AdamW optimizer betas
scheduler_factor     float                 0.5            LR reduction factor
scheduler_patience   int                   5              Patience for LR scheduler
scheduler_threshold  float                 1e-4           Threshold for scheduler
scheduler_min_lr     float                 1e-6           Minimum learning rate
scheduler_cooldown   int                   2              Cooldown period after LR reduction
max_warmup_steps     int                   5000           Warmup steps for learning rate

Logging and Output

Parameter                       Type   Default                 Description
------------------------------  -----  ----------------------  -----------------------------
log_file                        str    'training_errors.log'   Error logging file path
result_dir                      str    None                    Results output directory
generate_stream_cache_for_visu  bool   False                   Generate visualization cache

Advanced Configuration

For production workflows, provide explicit configuration:

flow_model = GraphNodeClassification(
    # Architecture
    num_classes=25,
    n_layers_encode=8,
    dim_node=256,
    d_model=512,
    n_heads=32,
    dropout=0.3,

    # Training
    learning_rate=0.002,

    # Logging
    result_dir="./experiments/feature_recognition",
    log_file="training_errors.log",
    generate_stream_cache_for_visu=True  # Enable for debugging
)

CAD Encoding Strategy

The GraphNodeClassification model uses a richer encoding strategy than GraphClassification. While GraphClassification relies on standard face discretization and edge features, GraphNodeClassification adds extended topological features: neighbor counts, histograms of angles between adjacent faces, and histograms of distances to neighbor centroids. These extra features provide the per-face context needed for node-level classification.

Encoding Pipeline

The encode_cad_data() method follows this pipeline:

def encode_cad_data(self, cad_file: str, cad_loader: CADLoader, storage: DataStorage):
    # 1. Configure CAD loading
    general_options = cad_loader.get_general_options()
    general_options["read_feature"] = True
    general_options["read_solid"] = True

    # 2. Load model
    model = cad_loader.create_from_file(cad_file)

    # 3. Configure BREP with UV computation
    hoopstools = HOOPSTools()
    brep_options = hoopstools.brep_options()
    brep_options["force_compute_uv"] = True
    brep_options["force_compute_3d"] = True
    hoopstools.adapt_brep(model, brep_options)

    # 4. Encode features (RICHER than GraphClassification)
    brep_encoder = BrepEncoder(model.get_brep(body_index=0), storage)

    # Standard features
    brep_encoder.push_face_adjacency_graph()
    brep_encoder.push_face_attributes()
    brep_encoder.push_face_discretization(pointsamples=25)
    brep_encoder.push_edge_attributes()
    brep_encoder.push_curvegrid(10)
    brep_encoder.push_face_pair_edges_path(16)

    # Extended features (specific to GraphNodeClassification)
    brep_encoder.push_extended_adjacency()
    brep_encoder.push_face_neighbors_count()
    brep_encoder.push_average_face_pair_angle_histograms(5, 64)
    brep_encoder.push_average_face_pair_distance_histograms(5, 64)

Feature Specifications

The encoding process extracts different types of features from the CAD model’s B-rep structure. These features are organized into three categories: node features (attached to each face), edge features (attached to connections between faces), and the overall graph structure that represents how faces connect to each other.

Node Features (Per Face):

  1. Face discretization: (pointsamples, 7) - Surface geometry sampled points

  2. Face attributes: Surface type, area, loop count

  3. Neighbor count: Number of adjacent faces

  4. Angle histograms: Distribution of dihedral angles with neighbors

  5. Distance histograms: Distribution of distances to neighbor centroids

Edge Features (Per Face-Face Connection):

  1. U-grids: (10, 6) - Shared edge curve geometry

  2. Edge attributes: Curve type, length, dihedral angle

  3. Path information: Shortest path metrics

Graph Structure:

  • Nodes: All faces in the CAD model

  • Edges: Face adjacency (two faces share an edge)

  • Extended Adjacency: Multi-hop neighbor relationships
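For intuition, the angle and distance histograms listed above condense a face's variable-size neighborhood into fixed-length descriptors. A conceptual NumPy sketch follows; the bin count and normalization are illustrative only, since the encoder's push_average_face_pair_angle_histograms() handles this internally:

# Conceptual sketch only; the BrepEncoder computes these histograms internally.
import numpy as np

dihedral_angles = np.array([0.52, 1.57, 1.60, 3.05])   # angles (radians) to one face's neighbors
hist, _ = np.histogram(dihedral_angles, bins=8, range=(0.0, np.pi))
hist = hist / max(hist.sum(), 1)                        # normalize to a distribution
print(hist)                                             # fixed-length descriptor for that face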

Mathematical Representation

For each face \(f_i\) with neighbor set \(\mathcal{N}(i)\):

Node Embedding:

\[\mathbf{h}_i^{(0)} = \text{Concat}\left( \text{CNN}_{2D}(\mathbf{S}_i),\, \text{MLP}(\mathbf{a}_i),\, \text{Hist}(\{\theta_{ij}\}_{j \in \mathcal{N}(i)}),\, \text{Hist}(\{d_{ij}\}_{j \in \mathcal{N}(i)}) \right)\]

where:

  • \(\mathbf{S}_i\) is the face discretization sample points

  • \(\mathbf{a}_i\) are face attributes (type, area, etc.)

  • \(\theta_{ij}\) is the dihedral angle between faces \(i\) and \(j\)

  • \(d_{ij}\) is the distance between face centroids

Transformer-based Message Passing:

\[\mathbf{h}_i^{(\ell+1)} = \text{TransformerLayer}^{(\ell)}\left( \mathbf{h}_i^{(\ell)},\, \{\mathbf{h}_j^{(\ell)}\}_{j \in \mathcal{N}(i)},\, \{\mathbf{e}_{ij}\}_{j \in \mathcal{N}(i)} \right)\]

Per-Face Classification:

\[\hat{y}_i = \text{softmax}\left(\text{MLP}\left(\mathbf{h}_i^{(L)}\right)\right)\]
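The following PyTorch sketch mirrors these equations with illustrative tensor sizes. The stand-in feature dimensions, the classifier width, and the omission of the message-passing layers are assumptions, not the model's actual architecture:

# Illustrative sketch of the per-face embedding and classification head above.
import torch
import torch.nn as nn

num_faces, num_classes = 42, 25
surf_feat  = torch.randn(num_faces, 32)   # stand-in for CNN_2D(S_i)
attr_feat  = torch.randn(num_faces, 16)   # stand-in for MLP(a_i)
angle_hist = torch.randn(num_faces, 64)   # Hist({theta_ij})
dist_hist  = torch.randn(num_faces, 64)   # Hist({d_ij})

h0 = torch.cat([surf_feat, attr_feat, angle_hist, dist_hist], dim=1)   # h_i^(0)

# ... L Transformer / graph-attention layers would refine h0 into h_L ...
h_L = h0

classifier = nn.Sequential(nn.Linear(h_L.shape[1], 128), nn.ReLU(), nn.Linear(128, num_classes))
probs = classifier(h_L).softmax(dim=1)    # y_hat_i: one class distribution per face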

Integration with Flow Tasks

Overview

Like GraphClassification, the GraphNodeClassification model integrates with HOOPS AI's Flow framework. The key difference is that labels are assigned per node (per face) rather than per file, which changes how labels are prepared and stored.

Pattern: Wrapping FlowModel Methods

The key insight is to instantiate the FlowModel once at the module level, then call its methods inside decorated Flow tasks:

from hoops_ai.flowmanager import flowtask

# 1. Create FlowModel instance
flow_model = GraphNodeClassification(num_classes=25, result_dir="./results")

# 2. Wrap encode_cad_data() in a Flow task
@flowtask.transform(
    name="advanced_cad_encoder",
    inputs=["cad_file", "cad_loader", "storage"],
    outputs=["face_count", "edge_count"]
)
def my_encoder(cad_file: str, cad_loader, storage):
    # Call the FlowModel's encoding method
    face_count, edge_count = flow_model.encode_cad_data(cad_file, cad_loader, storage)

    # Optional: Add custom label processing
    # ... your label code here ...

    # Optional: Convert to graph
    flow_model.convert_encoded_data_to_graph(storage, graph_handler, filename)

    return face_count, edge_count

Key Difference: Node-Level Labels

Unlike graph classification (one label per file), node classification requires one label per face:

# Graph classification (GraphClassification)
storage.save_data("file_label", np.array([3]))  # Single label

# Node classification (GraphNodeClassification)
storage.save_data("face_labels", np.array([0, 1, 1, 2, 0, ...]))  # Label per face

Complete Example: CADSynth-AAG Dataset

This example demonstrates processing the CADSynth-AAG segmentation dataset (162k models) using GraphNodeClassification integrated with HOOPS AI Flows.

Complete Implementation

"""
CADSynth-AAG Segmentation Preprocessing using GraphNodeClassification FlowModel
"""

import pathlib
from typing import Tuple, List

# Flow framework imports
from hoops_ai.flowmanager import flowtask
import hoops_ai
from hoops_ai.cadaccess import HOOPSLoader, CADLoader
from hoops_ai.storage import (
    DataStorage,
    MLStorage,
    CADFileRetriever,
    LocalStorageProvider,
    DGLGraphStoreHandler
)
from hoops_ai.dataset import DatasetExplorer

# FlowModel import
from hoops_ai.ml.EXPERIMENTAL import GraphNodeClassification
from hoops_ai.storage.label_storage import LabelStorage

# ==================================================================================
# CONFIGURATION
# ==================================================================================

flows_inputdir = r"C:\Temp\Cadsynth_aag\step"
flows_outputdir = str(pathlib.Path(flows_inputdir))
datasources_dir = str(flows_inputdir)

# ==================================================================================
# FLOWMODEL INSTANTIATION
# ==================================================================================

flowName = "cadsynth_aag_162k_flowtask"
flow_model = GraphNodeClassification(
    num_classes=25,  # Will be updated when labels are available
    result_dir=str(pathlib.Path(flows_inputdir).joinpath("flows").joinpath(flowName))
)

# ==================================================================================
# FLOW TASK DEFINITIONS
# ==================================================================================

@flowtask.extract(
    name="gather_cad_files_to_be_treated",
    inputs=["cad_datasources"],
    outputs=["cad_dataset"]
)
def my_demo_gatherer(source: str) -> List[str]:
    """
    Gather all CAD files from the dataset directory.
    """
    cad_formats = [".stp", ".step"]
    local_provider = LocalStorageProvider(directory_path=source)
    retriever = CADFileRetriever(
        storage_provider=local_provider,
        formats=cad_formats
    )
    return retriever.get_file_list()


@flowtask.transform(
    name="advanced_cad_encoder",
    inputs=["cad_file", "cad_loader", "storage"],
    outputs=["face_count", "edge_count"]
)
def my_demo_encoder(cad_file: str, cad_loader: HOOPSLoader, storage: DataStorage) -> Tuple[int, int]:
    """
    Encode CAD data using GraphNodeClassification FlowModel.
    """
    # ===== CALL FLOWMODEL METHOD =====
    face_count, edge_count = flow_model.encode_cad_data(cad_file, cad_loader, storage)
    # =================================

    # Convert encoded data to DGL graph file
    location = pathlib.Path(storage.get_file_path("."))
    dgl_output_path = pathlib.Path(location.parent.parent / "dgl" / f"{location.stem}.ml")
    dgl_output_path.parent.mkdir(parents=True, exist_ok=True)

    # ===== CALL FLOWMODEL METHOD =====
    flow_model.convert_encoded_data_to_graph(storage, DGLGraphStoreHandler(), str(dgl_output_path))
    # =================================

    return face_count, edge_count

# ==================================================================================
# FLOW ORCHESTRATION
# ==================================================================================

def main():
    """
    Execute the CADSynth-AAG preprocessing flow.
    """
    cad_flow = hoops_ai.create_flow(
        name=flowName,
        tasks=[
            my_demo_gatherer,
            my_demo_encoder
        ],
        max_workers=40,
        flows_outputdir=str(flows_outputdir),
        ml_task="Feature Recognition with GraphNodeClassification",
    )

    output, dict_data, flow_file = cad_flow.process(
        inputs={'cad_datasources': [datasources_dir]}
    )

    print(output.summary())

    explorer = DatasetExplorer(flow_output_file=str(flow_file))
    explorer.print_table_of_contents()


if __name__ == "__main__":
    main()

Key Integration Points

FlowModel Instantiation

Create the GraphNodeClassification model instance once at the module level to ensure consistency across all Flow tasks:

# Instantiate ONCE at module level
flow_model = GraphNodeClassification(num_classes=25, result_dir="./results")

Encoding Task Wrapper

Wrap the FlowModel’s encode_cad_data() method inside a Flow task to enable batch processing while maintaining access to custom logic:

@flowtask.transform(...)
def my_demo_encoder(cad_file, cad_loader, storage):
    # Call FlowModel method directly
    face_count, edge_count = flow_model.encode_cad_data(cad_file, cad_loader, storage)

    # Add custom logic (labels, schema, etc.)
    # ...

    # Call another FlowModel method
    flow_model.convert_encoded_data_to_graph(storage, graph_handler, filename)

    return face_count, edge_count

Rich Feature Encoding

This demonstrates the extended features that GraphNodeClassification encodes beyond the standard GraphClassification approach:

# Inside flow_model.encode_cad_data():
brep_encoder.push_face_adjacency_graph()           # Standard graph structure
brep_encoder.push_face_discretization(pointsamples=25)  # Standard face sampling
brep_encoder.push_extended_adjacency()             # Extended topological features
brep_encoder.push_face_neighbors_count()           # Neighbor counting
brep_encoder.push_average_face_pair_angle_histograms(5, 64)  # Angle relationships
brep_encoder.push_average_face_pair_distance_histograms(5, 64)  # Distance relationships

Output Structure

The Flow generates a structured output directory containing encoded CAD data, graph files ready for machine learning, and execution metadata:

Cadsynth_aag/
└── flows/
    └── cadsynth_aag_162k_flowtask/
        ├── encoded/           # Individual .data files
        ├── dgl/              # DGL graph files for ML training
        └── flow_output.json  # Flow execution summary
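Once the flow completes, a quick way to sanity-check the generated graph files is to count them. The path below is an assumption based on the layout above; adjust it to your flows output directory:

# Count the DGL graph files produced by the flow (path assumed from the tree above).
import pathlib

dgl_dir = pathlib.Path("Cadsynth_aag/flows/cadsynth_aag_162k_flowtask/dgl")
graph_files = sorted(dgl_dir.glob("*.ml"))
print(f"{len(graph_files)} graph files ready for training")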

Training Workflow

Step 1: Preprocess Dataset

Use the Flow example above to create ML-ready graph files.

Step 2: Prepare Face-Level Labels

Important: Node classification requires labels for each face in each model.

# Example: Load face labels from annotation file
import json
import numpy as np

def load_face_labels(cad_file, annotation_file):
    """
    Load per-face labels from annotation.
    Returns array of shape (num_faces,) with class labels.
    """
    with open(annotation_file) as f:
        annotations = json.load(f)

    # Extract face labels for this CAD file
    face_labels = annotations[cad_file]["face_labels"]
    return np.array(face_labels)

Step 3: Load Dataset

from hoops_ai.dataset import DatasetLoader

# Load preprocessed graphs
# Note: For node classification, labels are stored IN the graph files
dataset_loader = DatasetLoader(
    graph_files=["./flows/cadsynth_aag_162k_flowtask/dgl/*.ml"]
    # No separate label files - labels are per-face in graph
)

# Split into train/val/test
dataset_loader.split_data(train_ratio=0.7, val_ratio=0.15, test_ratio=0.15)

Step 4: Create Trainer

from hoops_ai.ml import FlowTrainer

trainer = FlowTrainer(
    flowmodel=flow_model,  # Same instance used in Flow
    datasetLoader=dataset_loader,
    batch_size=32,
    num_workers=4,
    experiment_name="machining_feature_recognition",
    accelerator='gpu',
    devices=1,
    max_epochs=100,
    result_dir="./experiments"
)

Step 5: Train Model

# Train and get best checkpoint
best_checkpoint = trainer.train()
print(f"Training complete! Best model: {best_checkpoint}")

# Evaluate on test set
trainer.test(trained_model_path=best_checkpoint)

# Access metrics
metrics = trainer.metrics_storage()
train_loss = metrics.get("train_loss")
val_accuracy = metrics.get("val_node_accuracy")  # Note: node-level accuracy

Step 6: Monitor Training

# Launch TensorBoard (shell command, run from a terminal)
tensorboard --logdir=./experiments/ml_output/machining_feature_recognition/

Metrics to Monitor (a plotting sketch follows the list):

  • train_loss: Per-face cross-entropy loss

  • val_node_accuracy: Percentage of correctly classified faces

  • val_per_class_accuracy: Accuracy for each feature class
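A minimal plotting sketch for these metrics, assuming metrics.get() returns a per-epoch sequence of values (that return type is an assumption, not a documented guarantee):

# Plot recorded metrics; assumes metrics.get(name) returns a per-epoch sequence.
import matplotlib.pyplot as plt

metrics = trainer.metrics_storage()
plt.plot(metrics.get("train_loss"), label="train_loss")
plt.plot(metrics.get("val_node_accuracy"), label="val_node_accuracy")
plt.xlabel("epoch")
plt.legend()
plt.show()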

Inference Workflow

Single File Inference

from hoops_ai.ml import FlowInference
from hoops_ai.cadaccess import HOOPSLoader

# Setup
cad_loader = HOOPSLoader()
inference = FlowInference(
    cad_loader=cad_loader,
    flowmodel=flow_model,  # Same instance used in training
    log_file='inference_errors.log'
)

# Load trained model
inference.load_from_checkpoint("./experiments/best.ckpt")

# Predict on new CAD file
batch = inference.preprocess("new_part.step")
predictions = inference.predict_and_postprocess(batch)

# Interpret results (PER-FACE predictions)
face_predictions = predictions['node_predictions']  # [N_faces]
face_confidences = predictions['node_probabilities']  # [N_faces, N_classes]
num_faces = predictions['num_faces']

print(f"Model has {num_faces} faces")
for i in range(num_faces):
    pred_class = face_predictions[i]
    confidence = face_confidences[i][pred_class]
    print(f"Face {i}: Class {pred_class} (confidence: {confidence:.2%})")

Visualizing Face Predictions

# Map predictions to CAD model for visualization
feature_names = {
    0: "Base",
    1: "Hole",
    2: "Pocket",
    3: "Slot",
    # ... etc
}

# Create color-coded face map
face_colors = []
for pred in face_predictions:
    feature = feature_names[pred]
    face_colors.append(get_color_for_feature(feature))

# Export for visualization
# (Use HOOPS Communicator or other CAD viewer)
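The get_color_for_feature() helper above is not part of HOOPS AI; a minimal hypothetical implementation could be a fixed RGB palette keyed by feature name:

# Hypothetical helper used in the snippet above: a fixed RGB palette with a fallback color.
_PALETTE = {
    "Base":   (200, 200, 200),
    "Hole":   (220, 50, 50),
    "Pocket": (50, 120, 220),
    "Slot":   (50, 180, 90),
}

def get_color_for_feature(feature_name):
    return _PALETTE.get(feature_name, (120, 120, 120))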

Hyperparameter Tuning

Architecture Hyperparameters

# Baseline (default)
flow_model = GraphNodeClassification(
    num_classes=25,
    n_layers_encode=8,
    dim_node=256,
    d_model=512,
    n_heads=32,
)

# Larger model for complex features
flow_model = GraphNodeClassification(
    num_classes=25,
    n_layers_encode=12,       # More layers
    dim_node=512,              # Larger embedding
    d_model=1024,              # Larger transformer
    n_heads=16,                # Fewer, wider attention heads
)

Training Hyperparameters

trainer = FlowTrainer(
    flowmodel=flow_model,
    datasetLoader=dataset_loader,

    # Batch size
    batch_size=64,  # Try: 16, 32, 64, 128

    # Epochs
    max_epochs=200,  # Try: 50, 100, 200

    # Device
    accelerator='gpu',
    devices=1,
)

Face Discretization Resolution

# In your encoding task (illustrative sketch):
def my_encoder(cad_file, cad_loader, storage):
    face_count, edge_count = flow_model.encode_cad_data(cad_file, cad_loader, storage)

    # Override the default encoding with a custom resolution.
    # BrepEncoder is the same encoder used internally by encode_cad_data().
    model = cad_loader.create_from_file(cad_file)
    brep_encoder = BrepEncoder(model.get_brep(body_index=0), storage)
    brep_encoder.push_face_discretization(pointsamples=50)  # Higher resolution (default: 25)
    brep_encoder.push_curvegrid(20)                         # Higher resolution (default: 10)

    return face_count, edge_count

Troubleshooting

Issue: Shape Mismatch During Training

Symptom:

RuntimeError: expected shape [B, 25, 7], got [B, 40, 7]

Cause:

Inconsistent face discretization resolution between files.

Solution:

# Ensure all files use same number of sample points
brep_encoder.push_face_discretization(pointsamples=25)  # Always use 25 points

Issue: Missing Face Labels

Symptom:

KeyError: 'face_labels'

Cause:

Node classification requires per-face labels, but they are not present in the graph files.

Solution:

# Ensure face labels are saved during encoding
storage.save_data("face_labels", face_label_array)  # Array of shape (num_faces,)

Issue: Low Accuracy

Possible Causes:

  1. Insufficient training data: Collect more annotated samples

  2. Class imbalance: Use weighted loss or data augmentation

  3. Poor feature encoding: Verify extended features are being extracted

  4. Hyperparameters: Try different model sizes or learning rates

Solutions:

# Check feature encoding
explorer = DatasetExplorer(flow_output_file="...")
# Verify extended features are present

# Use class weights during training (requires model modification; see the sketch below)
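A minimal sketch for deriving inverse-frequency class weights from the face-label distribution. Here label_arrays is an assumed list of per-file label arrays; feeding the resulting weights into the loss still requires modifying the model, as noted above:

# Derive inverse-frequency class weights from per-face labels.
# label_arrays is an assumed list of per-file label arrays of shape (num_faces,).
import numpy as np

all_face_labels = np.concatenate(label_arrays)
counts = np.bincount(all_face_labels, minlength=25)      # 25 = num_classes
class_weights = counts.sum() / np.maximum(counts, 1)     # rarer classes get larger weights
class_weights = class_weights / class_weights.mean()     # normalize around 1.0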

Conclusion

GraphNodeClassification provides a production-ready implementation of a node-level classifier for CAD feature recognition. By following the FlowModel interface, it integrates with HOOPS AI's Flow framework for batch preprocessing and supports training and inference workflows that share the same encoding, keeping preprocessing consistent end to end.

Key Takeaways:

  1. Instantiate GraphNodeClassification once at module level

  2. Wrap its methods in @flowtask decorated tasks

  3. Use the same instance for training (FlowTrainer) and inference (FlowInference)

  4. Ensure per-face labels are provided for node classification

  5. Leverage rich feature encoding for better performance

Attribution: This implementation is based on a third-party architecture. When publishing research using this model, please refer to Acknowledgments for proper citation.