Part Classification Model
Introduction
Part classification is a fundamental task in CAD data analysis, enabling automatic categorization of mechanical components based on their geometric and topological properties. This capability supports various downstream applications including automated design retrieval, manufacturing process selection, smart CAD tools, and design recommendation systems.
HOOPS AI provides the GraphClassification model, a graph-level classifier specifically designed for part classification tasks. This model operates directly on Boundary Representation (B-rep) data from 3D CAD models, leveraging both geometric features and topological relationships to produce accurate classifications.
Use Cases
The part classification capability addresses several practical scenarios:
- Part Type Recognition
Automatically identify component types such as bearings, bolts, brackets, gears, and housings. This supports automated parts library organization and intelligent search systems.
- Shape Categorization
Classify parts based on their overall shape characteristics, enabling design pattern recognition and similarity-based retrieval.
- Manufacturing Process Selection
Determine appropriate manufacturing processes (e.g., casting, machining, additive manufacturing) based on part geometry, supporting automated process planning.
- Design Style Recognition
Identify design styles and families, enabling consistency checking and design standard enforcement across large CAD databases.
Model Overview
The GraphClassification model implements a graph-level classification architecture that processes CAD models through the following pipeline:
Geometric Encoding: Extract geometric features from faces (surfaces) and edges (curves) using discretization techniques
Topological Representation: Construct a face-adjacency graph representing the topological structure of the CAD model
Feature Learning: Apply Convolutional Neural Networks (CNNs) to geometric features and Graph Neural Networks (GNNs) to topological relationships
Classification: Produce a single classification label for the entire CAD model
This approach captures both the local geometric details and global topological structure, providing robust classification even for complex mechanical parts.
Note
Attribution Notice
The GraphClassification implementation is based on a state-of-the-art third-party architecture for learning from boundary representations. For complete attribution information, original paper citation, and licensing details, please refer to Acknowledgments.
Model Architecture
Overview
The GraphClassification model operates directly on Boundary Representation (B-rep) data from 3D CAD models using a CNN+GNN approach.
Geometric Encoding
The model processes geometric information from CAD models:
Face Geometry: Discretized sample points on face surfaces
Edge Geometry: 1D U-grids along edge curves
Neural Network Components
The model employs several neural network components:
2D CNNs: Applied to face discretization samples to extract surface features
1D CNNs: Applied to edge U-grids to extract curve features
Graph Neural Networks: Aggregate topological information via face-adjacency graph
Topology Representation
The model captures topological relationships through a face-adjacency graph:
Nodes: Individual faces of the CAD model
Edges: Adjacency relationships between faces
Node Features: Encoded face discretization samples
Edge Features: Encoded edge U-grids
Output
The model produces a single classification label for the entire CAD model.
Model Initialization
Before using the GraphClassification model, you must initialize it with configuration parameters.
Basic Usage
The simplest initialization requires only the number of classification categories:
from hoops_ai.ml.EXPERIMENTAL import GraphClassification
# Create model with default parameters
flow_model = GraphClassification(
    num_classes=10,
    result_dir="./results"
)
This creates a GraphClassification instance configured for a 10-class classification task, with results saved to the ./results directory.
Parameters
The GraphClassification initialization accepts several parameters:
- num_classes (int)
Number of classification categories.
- result_dir (str, optional)
Directory for saving results and metrics.
- log_file (str, optional, default: ‘cnn_graph_training_errors.log’)
Path to error logging file.
- generate_stream_cache_for_visu (bool, optional, default: False)
Generate visualization cache for debugging.
Advanced Configuration
For production workflows, provide explicit configuration:
flow_model = GraphClassification(
    num_classes=45,  # FABWAVE dataset has 45 classes
    result_dir="./experiments/part_classification",
    log_file="training_errors.log",
    generate_stream_cache_for_visu=False
)
CAD Encoding Strategy
The GraphClassification model uses a specific encoding strategy in its encode_cad_data() method.
Encoding Pipeline
The encode_cad_data() method follows this pipeline:
def encode_cad_data(self, cad_file: str, cad_loader: CADLoader, storage: DataStorage):
    # 1. Configure CAD loading
    general_options = cad_loader.get_general_options()
    general_options["read_feature"] = True
    general_options["read_solid"] = True

    # 2. Load model
    model = cad_loader.create_from_file(cad_file)

    # 3. Configure BREP with UV computation
    hoopstools = HOOPSTools()
    brep_options = hoopstools.brep_options()
    brep_options["force_compute_uv"] = True
    brep_options["force_compute_3d"] = True
    hoopstools.adapt_brep(model, brep_options)

    # 4. Encode features
    brep_encoder = BrepEncoder(model.get_brep(body_index=0), storage)

    # Graph structure
    brep_encoder.push_face_adjacency_graph()

    # Node features (faces)
    brep_encoder.push_face_attributes()
    brep_encoder.push_face_discretization(pointsamples=25)  # Sample points per face

    # Edge features
    brep_encoder.push_edge_attributes()
    brep_encoder.push_curvegrid(10)  # 10 points along edge

    # Additional topological features
    brep_encoder.push_face_pair_edges_path(16)
Feature Specifications
Node Features (Face Discretization):
Shape: \((n_{\text{samples}}, 7)\) where \(n_{\text{samples}}\) is typically 25
Components: (x, y, z, nx, ny, nz, visibility)
Encoding: 2D CNN processes each face's discretized sample points
Edge Features (Edge U-grids):
Shape: \((10, 6)\)
Components: 3D points and tangent vectors along edge curve
Encoding: 1D CNN processes each edge’s U-grid
Graph Structure:
Nodes: Faces of the CAD model
Edges: Face-face adjacency (shared edges)
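As a quick sanity check, the following numpy sketch shows the expected tensor shapes. The shapes come from the specification above; the array contents are random placeholders and the face/edge counts are arbitrary:

import numpy as np

n_faces, n_samples = 8, 25    # faces in the model, sample points per face
n_edges, n_u_points = 14, 10  # adjacency edges, points along each edge curve

# Node features: per-face discretization samples (x, y, z, nx, ny, nz, visibility)
node_features = np.random.rand(n_faces, n_samples, 7).astype(np.float32)

# Edge features: per-edge U-grid (3D point + tangent vector at each sample)
edge_features = np.random.rand(n_edges, n_u_points, 6).astype(np.float32)

assert node_features.shape[-1] == 7          # 7 components per sample point
assert edge_features.shape[1:] == (n_u_points, 6)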
Mathematical Representation
For each face \(f_i\), a 2D CNN encodes the discretized surface samples into a node feature vector:

\[\mathbf{h}_i = \mathrm{CNN}_{2D}(G_i), \qquad G_i \in \mathbb{R}^{n_{\text{samples}} \times 7}\]

For each edge \(e_{ij}\) between faces \(f_i\) and \(f_j\), a 1D CNN encodes the U-grid into an edge feature vector:

\[\mathbf{h}_{ij} = \mathrm{CNN}_{1D}(U_{ij}), \qquad U_{ij} \in \mathbb{R}^{10 \times 6}\]

Graph classification via message passing: node features are iteratively refined over the face-adjacency graph, then pooled into a single graph-level prediction:

\[\mathbf{h}_i^{(k+1)} = \phi\Big(\mathbf{h}_i^{(k)},\; \sum_{j \in \mathcal{N}(i)} \psi\big(\mathbf{h}_j^{(k)}, \mathbf{h}_{ij}\big)\Big), \qquad \hat{y} = \operatorname{softmax}\big(\mathrm{MLP}(\mathrm{READOUT}(\{\mathbf{h}_i\}))\big)\]
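The sketch below illustrates this CNN+GNN pattern in PyTorch. It is a minimal, self-contained illustration, not the HOOPS AI implementation: the layer sizes, the 1D convolution over the per-face sample sequence (the real model applies 2D CNNs), and the mean-aggregation message passing are simplifying assumptions:

import torch
import torch.nn as nn

class TinyGraphClassifier(nn.Module):
    def __init__(self, num_classes: int, hidden: int = 64):
        super().__init__()
        # CNN over each face's sample points (7 channels per point)
        self.face_cnn = nn.Sequential(
            nn.Conv1d(7, hidden, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.update = nn.Linear(2 * hidden, hidden)    # phi: combine self + messages
        self.readout = nn.Linear(hidden, num_classes)  # graph-level classification head

    def forward(self, face_grids, adjacency):
        # face_grids: (n_faces, 25, 7); adjacency: (n_faces, n_faces) 0/1 matrix
        h = self.face_cnn(face_grids.transpose(1, 2)).squeeze(-1)  # (n_faces, hidden)
        deg = adjacency.sum(dim=1, keepdim=True).clamp(min=1)
        messages = adjacency @ h / deg                             # mean over neighbors
        h = torch.relu(self.update(torch.cat([h, messages], dim=1)))
        return self.readout(h.mean(dim=0))                         # readout -> class logits

# Toy forward pass: 8 faces, self-adjacency only
logits = TinyGraphClassifier(num_classes=10)(torch.rand(8, 25, 7), torch.eye(8))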
Integration with Flow Tasks
Overview
The GraphClassification model integrates seamlessly with HOOPS AI’s Flow framework via the @flowtask decorator pattern. This allows you to wrap FlowModel methods inside Flow tasks for batch processing of CAD datasets.
Pattern: Wrapping FlowModel Methods
The key insight is to instantiate the FlowModel once at the module level, then call its methods inside decorated Flow tasks:
from hoops_ai.flowmanager import flowtask

# 1. Create FlowModel instance
flow_model = GraphClassification(num_classes=45, result_dir="./results")

# 2. Wrap encode_cad_data() in a Flow task
@flowtask.transform(
    name="advanced_cad_encoder",
    inputs=["cad_file", "cad_loader", "storage"],
    outputs=["face_count", "edge_count"]
)
def my_encoder(cad_file: str, cad_loader, storage):
    # Call the FlowModel's encoding method
    face_count, edge_count = flow_model.encode_cad_data(cad_file, cad_loader, storage)

    # Optional: Add custom label processing
    # ... your label code here ...

    # Optional: Convert to graph (graph_handler and filename come from your
    # pipeline; see the complete FABWAVE example below)
    flow_model.convert_encoded_data_to_graph(storage, graph_handler, filename)

    return face_count, edge_count
This pattern provides several benefits:
- Consistency
Encoding logic defined once in FlowModel, reused in Flow
- Maintainability
Changes to encoding strategy only need to update FlowModel
- Reusability
Same FlowModel used for both training (Flow) and inference
- Type Safety
Flow decorators provide clear input/output contracts
Complete Example: FABWAVE Dataset Processing
This section demonstrates a complete end-to-end workflow for processing the FABWAVE dataset, a benchmark dataset containing 45 distinct mechanical part categories.
Dataset Structure
The FABWAVE dataset organizes parts into folders by category:
fabwave/
├── Bearings/
│ ├── bearing_001.step
│ ├── bearing_002.step
│ └── ...
├── Bolts/
│ ├── bolt_001.step
│ ├── bolt_002.step
│ └── ...
├── Brackets/
│ ├── bracket_001.step
│ └── ...
├── Gears/
└── ... (45 categories total)
Each folder represents a part classification category, and the folder name serves as the label for all contained CAD files.
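Before encoding, it can be useful to verify the layout and per-category file counts. This optional sketch assumes the flat layout shown above; if each part sits in its own subfolder (as the parent.parent lookup in the encoding task below suggests), adjust the parent level accordingly:

import pathlib
from collections import Counter

# Count STEP files per category folder to sanity-check the dataset layout
dataset_root = pathlib.Path(r"C:\path\to\fabwave")
counts = Counter(
    p.parent.name
    for p in dataset_root.rglob("*")
    if p.suffix.lower() in (".stp", ".step")
)
for category, n in sorted(counts.items()):
    print(f"{category}: {n} files")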
Configuration and Setup
The first step is to define the dataset configuration, including file paths, label mappings, and the data schema:
"""
FABWAVE Part Classification Pipeline
This script demonstrates complete processing of the FABWAVE dataset
using the GraphClassification model integrated with HOOPS AI Flows.
"""
import pathlib
import numpy as np
from typing import Tuple, List
# Flow framework imports
from hoops_ai.flowmanager import flowtask
import hoops_ai
from hoops_ai.cadaccess import HOOPSLoader, CADLoader
from hoops_ai.storage import (
    DataStorage,
    MLStorage,
    CADFileRetriever,
    LocalStorageProvider,
    DGLGraphStoreHandler
)
from hoops_ai.storage.datastorage.schema_builder import SchemaBuilder
from hoops_ai.dataset import DatasetExplorer
# FlowModel import
from hoops_ai.ml.EXPERIMENTAL import GraphClassification
# ===== CONFIGURATION =====
# Define input/output directories
flows_inputdir = pathlib.Path(r"C:\path\to\fabwave")
flows_outputdir = pathlib.Path(r"C:\path\to\output")
datasources_dir = str(flows_inputdir)
# Define label mapping: class_id -> {name, description}
labels_description = {
    0: {"name": "Bearings", "description": "FABWAVE bearing samples"},
    1: {"name": "Bolts", "description": "FABWAVE bolt samples"},
    2: {"name": "Brackets", "description": "FABWAVE bracket samples"},
    3: {"name": "Bushings", "description": "FABWAVE bushing samples"},
    # ... (continue for all 45 classes)
    44: {"name": "Wide Grip External Retaining Ring", "description": "FABWAVE retaining ring samples"},
}
# Create reverse mapping: folder_name -> class_id
description_to_code = {v["name"]: k for k, v in labels_description.items()}
The label mapping serves two purposes:

Forward Mapping (labels_description): Converts numeric class IDs to human-readable names for visualization and reporting

Reverse Mapping (description_to_code): Converts folder names to numeric class IDs during encoding
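For large taxonomies it is often more robust to derive the mapping from the folder names rather than typing all 45 entries by hand. This optional sketch assumes one top-level folder per class; note that class IDs then follow sorted folder order, so freeze the mapping before training:

# Derive the mapping from the dataset's category folders (sketch)
category_names = sorted(p.name for p in flows_inputdir.iterdir() if p.is_dir())
labels_description = {
    i: {"name": name, "description": f"FABWAVE {name.lower()} samples"}
    for i, name in enumerate(category_names)
}
description_to_code = {v["name"]: k for k, v in labels_description.items()}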
Schema Definition
The schema defines how data is organized in the storage system and how metadata is routed:
# Define schema for part classification
builder = SchemaBuilder(
    domain="Part_classification",
    version="1.0",
    description="Schema for part classification"
)

file_group = builder.create_group("file", "file", "Information related to the cad file")
file_group.create_array("file_label", ["file"], "int32", "FABWAVE part label as integer (0-44)")

builder.define_categorical_metadata('file_label_description', 'str', 'Part classification')
builder.set_metadata_routing_rules(
    categorical_patterns=['file_label_description', 'category', 'type']
)
cad_schema = builder.build()
This schema ensures that:

Part labels are stored in the file group as integer arrays

Label descriptions are routed to the .attribset file for human-readable lookup

All encoded files follow a consistent data organization
Model Instantiation
Create the GraphClassification model instance that will be used throughout the pipeline:
# Define flow name
flowname = "FABWAVE_v2_45classes"
# Create GraphClassification model
flow_model = GraphClassification(
    num_classes=45,  # FABWAVE has 45 part categories
    result_dir=str(pathlib.Path(flows_outputdir).joinpath("flows").joinpath(flowname))
)
This single instance will be used for all encoding operations and later for training and inference, ensuring complete consistency.
Flow Task Definitions
Define the Flow tasks that will gather CAD files and encode them using the GraphClassification model:
File Gathering Task
The first task collects all CAD files from the dataset directory:
@flowtask.extract(
    name="gather_cad_files_to_be_treated",
    inputs=["cad_datasources"],
    outputs=["cad_dataset"]
)
def my_demo_gatherer(source: str) -> List[str]:
    """
    Gather all CAD files from the FABWAVE dataset directory.
    """
    cad_formats = [".stp", ".step"]
    local_provider = LocalStorageProvider(directory_path=source)
    retriever = CADFileRetriever(
        storage_provider=local_provider,
        formats=cad_formats
    )
    return retriever.get_file_list()
This task uses the CADFileRetriever to automatically discover all STEP files, regardless of their location within the directory tree. The returned list becomes the input for the encoding task.
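For reference, a plain-Python equivalent of what the retriever does (a sketch using pathlib, not the library's implementation) looks like this:

import pathlib
from typing import List

def gather_step_files(source: str) -> List[str]:
    # Recursively collect STEP files, matching the formats used above
    root = pathlib.Path(source)
    return [str(p) for p in root.rglob("*") if p.suffix.lower() in (".stp", ".step")]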
Encoding Task
The encoding task wraps the GraphClassification model’s encoding method and adds dataset-specific label extraction:
@flowtask.transform(
    name="advanced_cad_encoder",
    inputs=["cad_file", "cad_loader", "storage"],
    outputs=["face_count", "edge_count"]
)
def my_demo_encoder(cad_file: str, cad_loader: HOOPSLoader, storage: DataStorage) -> Tuple[int, int]:
    """
    Encode CAD data using the GraphClassification FlowModel.

    This task wraps the FlowModel's encode_cad_data() method and adds:
    1. Schema configuration
    2. Label extraction from folder name
    3. Graph conversion for ML training
    """
    # Set schema for storage
    storage.set_schema(cad_schema)

    # ===== CALL FLOWMODEL METHOD =====
    face_count, edge_count = flow_model.encode_cad_data(cad_file, cad_loader, storage)
    # =================================

    # Extract label from folder structure (FABWAVE-specific logic)
    folder_with_name = str(pathlib.Path(cad_file).parent.parent.stem)
    label_code = description_to_code.get(folder_with_name, -1)

    # Save label to storage
    storage.save_data("file_label", np.array([label_code]).astype(np.int64))
    storage.save_metadata("file_label_description", [
        {str(label_code): labels_description[label_code]["name"]}
    ])

    # Convert encoded data to DGL graph file
    location = pathlib.Path(storage.get_file_path("."))
    dgl_output_path = pathlib.Path(location.parent.parent / "dgl" / f"{location.stem}.ml")
    dgl_output_path.parent.mkdir(parents=True, exist_ok=True)

    # ===== CALL FLOWMODEL METHOD =====
    flow_model.convert_encoded_data_to_graph(storage, DGLGraphStoreHandler(), str(dgl_output_path))
    # =================================

    return face_count, edge_count
This task demonstrates the complete integration pattern:
Schema Configuration: Ensures consistent data organization across all files
FlowModel Encoding: Delegates geometric/topological encoding to the model
Label Extraction: Implements dataset-specific logic to determine the class label
Metadata Storage: Saves both numeric labels and human-readable descriptions
Graph Conversion: Transforms encoded data into DGL graph format ready for training
Flow Orchestration and Execution
With the tasks defined, create and execute the Flow pipeline:
def main():
    """
    Execute the FABWAVE preprocessing pipeline.

    This function:
    1. Creates a Flow with the defined tasks
    2. Executes the Flow with parallel processing
    3. Prints summary statistics
    4. Explores the resulting dataset
    """
    # Create Flow with tasks
    cad_flow = hoops_ai.create_flow(
        name=flowname,
        tasks=[
            my_demo_gatherer,  # Gather CAD files
            my_demo_encoder    # Encode using FlowModel
        ],
        max_workers=40,  # 40 parallel workers
        flows_outputdir=str(flows_outputdir),
        ml_task="Part Classification with GraphClassification",
    )

    # Execute Flow
    output, dict_data, flow_file = cad_flow.process(
        inputs={'cad_datasources': [datasources_dir]}
    )

    # Print summary
    print(output.summary())

    # Explore dataset
    explorer = DatasetExplorer(flow_output_file=str(flow_file))
    explorer.print_table_of_contents()

    # Filter files with medium face count
    facecount_is_medium = lambda ds: ds['num_nodes'] > 40
    filelist = explorer.get_file_list(group="graph", where=facecount_is_medium)
    print(f"Files with num_nodes > 40: {len(filelist)}")

if __name__ == "__main__":
    main()
Understanding the Output
After successful execution, the output directory contains:
output/
└── flows/
└── FABWAVE_v2_45classes/
├── encoded/ # Individual .data files (Zarr format)
│ ├── bearing_001.data
│ ├── bolt_001.data
│ └── ...
├── dgl/ # DGL graph files for ML training
│ ├── bearing_001.ml
│ ├── bolt_001.ml
│ └── ...
├── info/ # Metadata (.infoset/.attribset)
└── flow_output.json # Flow execution summary
Key Output Files:

.data files: Individual CAD files encoded in Zarr format containing geometric and topological features

.ml files: DGL graph files ready for loading by DatasetLoader for training

.dataset file: Merged dataset combining all individual files (created by auto_dataset_export=True)

.infoset file: Parquet file containing file-level metadata (labels, processing time, etc.)

.attribset file: Parquet file containing categorical metadata (label descriptions, type names, etc.)

.flow file: JSON specification documenting the Flow execution for reproducibility
This structured output enables seamless transition to the training phase using DatasetLoader and FlowTrainer.
Training Workflow
After preprocessing the dataset using the Flow pipeline, the next step is to train the GraphClassification model to learn part classification from the encoded CAD data.
Dataset Loading and Splitting
The first step in training is to load the preprocessed dataset and perform stratified splitting into training, validation, and test subsets:
from hoops_ai.dataset import DatasetLoader
# Load preprocessed graphs and labels
dataset_loader = DatasetLoader(
    graph_files=["./output/flows/FABWAVE_v2_45classes/dgl/*.ml"],
    label_files=["./output/flows/FABWAVE_v2_45classes/info/*.attribset"]
)

# Split into train/val/test
dataset_loader.split_data(train_ratio=0.7, val_ratio=0.15, test_ratio=0.15)
Creating the Trainer
With the dataset split, create a FlowTrainer instance to manage the training process:
from hoops_ai.ml import FlowTrainer
# Create trainer with training configuration
trainer = FlowTrainer(
    flowmodel=flow_model,          # Same GraphClassification instance used for encoding
    datasetLoader=dataset_loader,
    batch_size=32,
    num_workers=4,
    experiment_name="fabwave_classification",
    accelerator='gpu',             # Use GPU if available, otherwise 'cpu'
    devices=1,                     # Number of GPUs to use
    max_epochs=100,
    result_dir="./experiments"
)
Trainer Configuration Parameters:
- flowmodel (GraphClassification)
The same GraphClassification instance created for encoding. Using the same instance ensures encoding consistency between training and inference.
- datasetLoader (DatasetLoader)
The dataset loader with completed stratified split.
- batch_size (int)
Number of graphs processed per training iteration. Larger batch sizes enable faster training but require more GPU memory. Typical values: 16, 32, 64.
- num_workers (int)
Number of parallel workers for data loading. Increase for faster data loading, but be aware that on Windows, values >0 may decrease performance.
- experiment_name (str)
Name for this training experiment. Used for organizing TensorBoard logs and saved checkpoints.
- accelerator (str)
Hardware accelerator: 'gpu' for GPU training (much faster), 'cpu' for CPU-only training.
- devices (int or list)
Number of devices to use, or specific device IDs. For single-GPU training, use devices=1.
- max_epochs (int)
Maximum number of training epochs. Training may stop earlier if validation loss stops improving (early stopping).
- result_dir (str)
Directory for saving training results, checkpoints, and logs.
Training Execution
Start the training process with a single method call:
# Train and get best checkpoint
best_checkpoint = trainer.train()
print(f"Training complete! Best model: {best_checkpoint}")
Model Evaluation
After training, evaluate the model’s performance on the held-out test set:
# Evaluate on test set
trainer.test(trained_model_path=best_checkpoint)
The test set provides an unbiased estimate of model performance on unseen data, representing how well the model will generalize to real-world part classification tasks.
Accessing Training Metrics
The trainer stores comprehensive metrics throughout training:
# Get metrics storage
metrics = trainer.metrics_storage()
Monitoring with TensorBoard
For real-time training monitoring, launch TensorBoard:
# Launch TensorBoard
tensorboard --logdir=./experiments/ml_output/fabwave_classification/
Then open your browser to http://localhost:6006 to visualize:
Loss curves (training and validation)
Accuracy metrics
Learning rate schedule
Gradient histograms
Model graph structure
TensorBoard provides interactive visualization and is especially useful for comparing multiple training runs with different hyperparameters.
Inference Workflow
After training, use the trained model to classify new, unseen CAD parts.
Setting Up Inference
Initialize the FlowInference component with the trained model:
from hoops_ai.ml import FlowInference
from hoops_ai.cadaccess import HOOPSLoader
# Create CAD loader
cad_loader = HOOPSLoader()
# Create inference instance
inference = FlowInference(
    cad_loader=cad_loader,
    flowmodel=flow_model,  # Same GraphClassification instance
    log_file='inference_errors.log'
)

# Load trained model
inference.load_from_checkpoint("./experiments/best.ckpt")
Key Points:

Use the same FlowModel instance (flow_model) that was used for encoding and training

This ensures encoding consistency: the same geometric features are extracted

The checkpoint file contains the trained neural network weights
Single File Prediction
Classify a single CAD file:
# Path to new CAD file
new_part = "path/to/unknown_part.step"
# Preprocess: encode the CAD file using the same encoding pipeline
batch = inference.preprocess(new_part)
# Predict: run the neural network to get classification
predictions = inference.predict_and_postprocess(batch)
# Interpret results
predicted_class = predictions['predictions'][0]
confidence = predictions['probabilities'][0][predicted_class]
class_name = labels_description[predicted_class]["name"]
print(f"Predicted: {class_name} (confidence: {confidence:.2%})")
Understanding Predictions
The prediction output contains:
- predictions (array)
Predicted class indices (integers from 0 to num_classes-1)
- probabilities (array)
Probability distribution over all classes (sums to 1.0)
The probability distribution indicates the model’s confidence. High confidence in a single class (e.g., 94%) suggests a clear classification. Similar probabilities across multiple classes (e.g., 40%, 35%, 25%) indicate uncertainty.
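A simple way to act on this is to inspect the top-ranked classes and flag low-margin predictions. The sketch below assumes the predictions dictionary from the previous example; the 0.15 margin threshold is an arbitrary illustration:

import numpy as np

# Rank classes by probability and report the top three
probs = np.asarray(predictions['probabilities'][0])
top3 = np.argsort(probs)[::-1][:3]
for rank, cls in enumerate(top3, start=1):
    print(f"{rank}. {labels_description[int(cls)]['name']}: {probs[cls]:.2%}")

# A small gap between the top two classes suggests an ambiguous part
if probs[top3[0]] - probs[top3[1]] < 0.15:
    print("Low-margin prediction; consider manual review.")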
Batch Inference
For processing multiple CAD files, use batch inference:
import os
# Get all STEP files from directory
test_dir = "./test_parts"
cad_files = [os.path.join(test_dir, f) for f in os.listdir(test_dir) if f.endswith('.step')]
# Predict for each file
results = []
for cad_file in cad_files:
    batch = inference.preprocess(cad_file)
    pred = inference.predict_and_postprocess(batch)
    results.append({
        'file': cad_file,
        'prediction': pred['predictions'][0],
        'confidence': pred['probabilities'][0].max()
    })

# Print summary
for result in results:
    print(f"{result['file']}: {result['prediction']} ({result['confidence']:.2%})")
This batch processing approach enables high-throughput classification of entire CAD libraries, supporting applications like automated parts library organization and design search.
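For example, the batch results can be persisted for downstream tooling. This sketch writes a hypothetical classification_results.csv using only the results list built above:

import csv

# Persist batch results with human-readable class names
with open("classification_results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["file", "prediction", "confidence"])
    writer.writeheader()
    for result in results:
        writer.writerow({
            "file": result["file"],
            "prediction": labels_description[int(result["prediction"])]["name"],
            "confidence": f"{result['confidence']:.4f}",
        })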
Hyperparameter Tuning
Optimizing model performance often requires tuning hyperparameters. The GraphClassification model provides several tuning opportunities at different levels.
The GraphClassification model uses default hyperparameters which are embedded in the underlying architecture.
Tuning Strategy
Since the model uses a pre-defined architecture, hyperparameter tuning focuses on:

Training Hyperparameters (via FlowTrainer)

Face Discretization Resolution (in encoding)

Data Augmentation (custom preprocessing)

For architecture-level modifications, you would need to extend the GraphClassification class.
Training Hyperparameters
The most accessible hyperparameters are controlled through the FlowTrainer:
trainer = FlowTrainer(
    flowmodel=flow_model,
    datasetLoader=dataset_loader,

    # Batch size (larger = faster, more memory)
    batch_size=64,           # Try: 16, 32, 64, 128

    # Learning rate
    learning_rate=0.001,     # Try: 0.0001, 0.001, 0.01

    # Epochs
    max_epochs=200,          # Try: 50, 100, 200

    # Gradient clipping
    gradient_clip_val=1.0,   # Try: 0.5, 1.0, 2.0

    # Device
    accelerator='gpu',
    devices=1,
)
Batch Size
The batch size controls how many graphs are processed together in each training iteration:
# Small batch size: Lower memory, slower training, more gradient noise
trainer = FlowTrainer(batch_size=16, ...)
# Medium batch size: Balanced (recommended starting point)
trainer = FlowTrainer(batch_size=32, ...)
# Large batch size: Higher memory, faster training, less gradient noise
trainer = FlowTrainer(batch_size=64, ...)
- Tuning Guidance:
Start with 32 and adjust based on GPU memory
If out-of-memory errors occur, reduce batch size
If training is slow and GPU is underutilized, increase batch size
Learning Rate
The learning rate controls how quickly the model adapts to the training data:
# Low learning rate: Slow but stable training
trainer = FlowTrainer(learning_rate=0.0001, ...)
# Medium learning rate: Balanced (default)
trainer = FlowTrainer(learning_rate=0.001, ...)
# High learning rate: Fast but potentially unstable
trainer = FlowTrainer(learning_rate=0.01, ...)
- Tuning Guidance:
Start with 0.001 (default)
If training loss is not decreasing, try higher learning rate
If training is unstable (loss jumping), reduce learning rate
Use learning rate scheduling for best results (automatic in FlowTrainer)
Maximum Epochs
The number of training epochs determines how long training runs:
# Short training: Fast experimentation
trainer = FlowTrainer(max_epochs=50, ...)
# Medium training: Balanced
trainer = FlowTrainer(max_epochs=100, ...)
# Long training: Maximum performance
trainer = FlowTrainer(max_epochs=200, ...)
- Tuning Guidance:
Early stopping prevents overfitting, so setting a high max_epochs is safe
Monitor validation loss curves to see if more epochs help
For large datasets, fewer epochs may be sufficient
Gradient Clipping
Gradient clipping prevents exploding gradients during training:
# Moderate clipping (default)
trainer = FlowTrainer(gradient_clip_val=1.0, ...)
# Stronger clipping: More stability, slower learning
trainer = FlowTrainer(gradient_clip_val=0.5, ...)
# Weaker clipping: Faster learning, less stability
trainer = FlowTrainer(gradient_clip_val=2.0, ...)
- Tuning Guidance:
Use gradient clipping if you observe NaN losses
Default of 1.0 works well for most cases
Encoding Resolution
The geometric encoding resolution affects model accuracy and computational cost:
Face Discretization Resolution
Higher sample point density captures more geometric detail:
# In your encoding task:
def my_encoder(cad_file, cad_loader, storage):
    face_count, edge_count = flow_model.encode_cad_data(cad_file, cad_loader, storage)

    # Re-encode with a custom resolution (overrides the defaults written above)
    model = cad_loader.create_from_file(cad_file)  # reload the model, as in the encoding pipeline
    brep_encoder = BrepEncoder(model.get_brep(body_index=0), storage)
    brep_encoder.push_face_discretization(pointsamples=50)  # Higher resolution (default: 25)
    brep_encoder.push_curvegrid(20)                         # Higher resolution (default: 10)
- Trade-offs:
Low resolution (10 points): Fast encoding, low memory, may miss fine geometric details
Medium resolution (25 points): Good balance between detail and efficiency (recommended)
High resolution (50+ points): Best geometric detail capture, higher memory and processing time
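A back-of-envelope calculation makes the memory trade-off concrete (each sample point stores 7 float32 components, 4 bytes each):

# Raw node-feature size per face at different resolutions (float32 = 4 bytes)
for n_samples in (10, 25, 50):
    print(f"{n_samples} samples: {n_samples * 7 * 4} bytes per face")
# -> 280, 700, and 1400 bytes per face, respectively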
Model Architecture Modifications
To modify the underlying architecture, you would need to:

Extend the GraphClassification class

Override the retrieve_model() method

Customize the model initialization in the _thirdparty/ directory

Example:

from hoops_ai.ml.EXPERIMENTAL import GraphClassification

class CustomGraphClassification(GraphClassification):
    def __init__(self, num_classes, **kwargs):
        super().__init__(num_classes, **kwargs)
        # Override with custom model parameters if needed
        # See the underlying architecture documentation for available options
Performance Optimization
Optimizing performance ensures efficient use of computational resources during encoding, training, and inference.
GPU Acceleration
GPU acceleration dramatically speeds up training:
# GPU training (recommended if available)
trainer = FlowTrainer(
    accelerator='gpu',
    devices=1,      # Single GPU
    precision=16,   # Mixed precision for 2x speedup
)

# Multi-GPU training (for very large datasets)
trainer = FlowTrainer(
    accelerator='gpu',
    devices=[0, 1],  # Use GPU 0 and GPU 1
)

# CPU training (fallback if no GPU available)
trainer = FlowTrainer(
    accelerator='cpu',
    devices=1,
)
Parallel Preprocessing
Flow-based encoding supports parallel processing for fast dataset preparation:
# Increase workers for Flow preprocessing
cad_flow = hoops_ai.create_flow(
    name="my_flow",
    tasks=[my_gatherer, my_encoder],
    max_workers=50,  # More workers = faster preprocessing
    flows_outputdir="./output"
)
- Tuning Guidance:
max_workers should stay close to the number of CPU cores
For I/O-bound encoding (reading files), modest oversubscription helps (e.g., cores × 1.5)
For CPU-bound encoding (geometric computation), keep workers at or below the core count
Monitor CPU usage to find the optimal worker count
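A small sketch for deriving worker counts from the machine's core count, following the guidance above:

import os

cpu_cores = os.cpu_count() or 1
io_bound_workers = int(cpu_cores * 1.5)  # oversubscribe for I/O-heavy encoding
cpu_bound_workers = cpu_cores            # stay at the core count for CPU-heavy encoding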
Memory Management
Large datasets and high-resolution encoding can exhaust memory. Strategies for managing memory:
Reduce Batch Size
If you encounter out-of-memory errors during training:
# Reduce batch size
trainer = FlowTrainer(batch_size=16, ...) # Down from 32
# Use gradient accumulation to maintain effective batch size
trainer = FlowTrainer(
    batch_size=16,
    accumulate_grad_batches=2  # Effective batch size: 32
)
Troubleshooting
Common issues and their solutions:
Shape Mismatch During Training
Symptom:
RuntimeError: expected shape [B, 25, 7], got [B, 40, 7]
Cause:
Inconsistent face discretization resolution between files. Some files were encoded with 25 sample points, others with 40.
Solution:
Ensure all files use the same encoding parameters:
# Ensure all files use same number of sample points
brep_encoder.push_face_discretization(pointsamples=25) # Always use 25 points
Re-encode any files that used different parameters.
Label Not Found Error
Symptom:
KeyError: 'Unknown_Folder'
Cause:
A CAD file is in a folder that doesn’t exist in the description_to_code mapping.
Solution:
Add a default label for unknown categories:
# Add default label for unknown classes
label_code = description_to_code.get(folder_name, -1)
# Filter out unknown classes during dataset loading
dataset_loader = DatasetLoader(...)
dataset_loader.filter(lambda x: x['file_label'] != -1)
Low Classification Accuracy
Possible Causes and Solutions:
Insufficient Training Data
Solution: Collect more samples per category
Class Imbalance
Solution: Use weighted loss function or data augmentation
Poor UV Parameterization
Cause: Some CAD files may have degenerate UV coordinates
Solution: Re-encode with force_compute_uv and force_compute_3d enabled (see the BREP options in the encoding pipeline)
Suboptimal Hyperparameters
Solution: Try different learning rates, batch sizes, or encoding resolutions
Solutions:
# Check class distribution
explorer = DatasetExplorer(flow_output_file="...")
labels = explorer.get_column_data("file", "file_label")
print(np.bincount(labels))
# Use class weights during training (requires model modification)
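As a starting point for class weighting, inverse-frequency weights can be computed from the label histogram above. This is a sketch; how the weights reach the loss function depends on your model modification:

import numpy as np

# Inverse-frequency class weights from the label counts
counts = np.bincount(labels, minlength=45).astype(np.float64)
weights = counts.sum() / np.maximum(counts, 1)  # avoid division by zero for empty classes
weights = weights / weights.mean()              # normalize around 1.0
print(weights)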
Conclusion
GraphClassification provides a production-ready implementation of a graph-level classifier for CAD part classification. By following the FlowModel interface, it seamlessly integrates with HOOPS AI’s Flow framework for batch preprocessing and supports both training and inference workflows with guaranteed encoding consistency.
Key Takeaways:
Instantiate GraphClassification once at module level

Wrap its methods in @flowtask decorated tasks

Use the same instance for training (FlowTrainer) and inference (FlowInference)

Customize encoding by modifying the Flow task, not the FlowModel
Attribution: This implementation is based on a third-party architecture. When publishing research using this model, please refer to Acknowledgments for proper citation.