Part Classification Model
Introduction
Part classification is a fundamental task in CAD data analysis, enabling automatic categorization of mechanical components based on their geometric and topological properties. This capability supports various downstream applications including automated design retrieval, manufacturing process selection, smart CAD tools, and design recommendation systems.
HOOPS AI provides the GraphClassification model, a graph-level classifier specifically designed for part classification tasks. This model operates directly on Boundary Representation (B-rep) data from 3D CAD models, leveraging both geometric features and topological relationships to produce accurate classifications.
Use Cases
The part classification capability addresses several practical scenarios:
- Part Type Recognition
Automatically identify component types such as bearings, bolts, brackets, gears, and housings. This supports automated parts library organization and intelligent search systems.
- Shape Categorization
Classify parts based on their overall shape characteristics, enabling design pattern recognition and similarity-based retrieval.
- Manufacturing Process Selection
Determine appropriate manufacturing processes (e.g., casting, machining, additive manufacturing) based on part geometry, supporting automated process planning.
- Design Style Recognition
Identify design styles and families, enabling consistency checking and design standard enforcement across large CAD databases.
Model Overview
The GraphClassification model implements a graph-level classification architecture that processes CAD models through the following pipeline:
Geometric Encoding: Extract geometric features from faces (surfaces) and edges (curves) using discretization techniques
Topological Representation: Construct a face-adjacency graph representing the topological structure of the CAD model
Feature Learning: Apply Convolutional Neural Networks (CNNs) to geometric features and Graph Neural Networks (GNNs) to topological relationships
Classification: Produce a single classification label for the entire CAD model
This approach captures both the local geometric details and global topological structure, providing robust classification even for complex mechanical parts.
Note
Attribution Notice
The GraphClassification implementation is based on a state-of-the-art third-party architecture for learning from boundary representations. For complete attribution information, original paper citation, and licensing details, please refer to Acknowledgments.
Model Architecture
Overview
The GraphClassification model operates directly on Boundary Representation (B-rep) data from 3D CAD models using a CNN+GNN approach.
Geometric Encoding
The model processes geometric information from CAD models:
Face Geometry: Discretized sample points on face surfaces
Edge Geometry: 1D U-grids along edge curves
Neural Network Components
The model employs several neural network components:
2D CNNs: Applied to face discretization samples to extract surface features
1D CNNs: Applied to edge U-grids to extract curve features
Graph Neural Networks: Aggregate topological information via face-adjacency graph
Topology Representation
The model captures topological relationships through a face-adjacency graph:
Nodes: Individual faces of the CAD model
Edges: Adjacency relationships between faces
Node Features: Encoded face discretization samples
Edge Features: Encoded edge U-grids
Output
The model produces a single classification label for the entire CAD model.
Model Initialization
Before using the GraphClassification model, you must initialize it with configuration parameters.
Basic Usage
The simplest initialization requires only the number of classification categories:
from hoops_ai.ml.EXPERIMENTAL import GraphClassification
# Create model with default parameters
flow_model = GraphClassification(
    num_classes=10,
    result_dir="./results"
)
This creates a GraphClassification instance configured for a 10-class classification task, with results saved to the ./results directory.
Parameters
The GraphClassification initialization accepts several parameters:
- num_classes (int)
Number of classification categories.
- result_dir (str, optional)
Directory for saving results and metrics.
- log_file (str, optional, default: ‘cnn_graph_training_errors.log’)
Path to error logging file.
- generate_stream_cache_for_visu (bool, optional, default: False)
Generate visualization cache for debugging.
Advanced Configuration
For production workflows, provide explicit configuration:
flow_model = GraphClassification(
    num_classes=45,  # FABWAVE dataset has 45 classes
    result_dir="./experiments/part_classification",
    log_file="training_errors.log",
    generate_stream_cache_for_visu=False
)
CAD Encoding Strategy
The GraphClassification model uses a specific encoding strategy in its encode_cad_data() method.
Encoding Pipeline
The encode_cad_data() method follows this pipeline:
def encode_cad_data(self, cad_file: str, cad_loader: CADLoader, storage: DataStorage):
    # 1. Configure CAD loading
    general_options = cad_loader.get_general_options()
    general_options["read_feature"] = True
    general_options["read_solid"] = True

    # 2. Load model
    model = cad_loader.create_from_file(cad_file)

    # 3. Configure BREP with UV computation
    hoopstools = HOOPSTools()
    brep_options = hoopstools.brep_options()
    brep_options["force_compute_uv"] = True
    brep_options["force_compute_3d"] = True
    hoopstools.adapt_brep(model, brep_options)

    # 4. Encode features
    brep_encoder = BrepEncoder(model.get_brep(body_index=0), storage)

    # Graph structure
    brep_encoder.push_face_adjacency_graph()

    # Node features (faces)
    brep_encoder.push_face_attributes()
    brep_encoder.push_face_discretization(pointsamples=25)  # Sample points per face

    # Edge features
    brep_encoder.push_edge_attributes()
    brep_encoder.push_curvegrid(10)  # 10 points along edge

    # Additional topological features
    brep_encoder.push_face_pair_edges_path(16)
Feature Specifications
Node Features (Face Discretization):
Shape: \((n_{\text{samples}}, 7)\) where \(n_{\text{samples}}\) is typically 25
Components: (x, y, z, nx, ny, nz, visibility)
Encoding: 2D CNN processes each face's discretized sample points
Edge Features (Edge U-grids):
Shape: \((10, 6)\)
Components: 3D points and tangent vectors along edge curve
Encoding: 1D CNN processes each edge’s U-grid
Graph Structure:
Nodes: Faces of the CAD model
Edges: Face-face adjacency (shared edges)
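As a quick sanity check, the following numpy sketch shows the expected tensor shapes. The shapes come from the specification above; the array contents are random placeholders and the face/edge counts are arbitrary:

import numpy as np

n_faces, n_samples = 8, 25    # faces in the model, sample points per face
n_edges, n_u_points = 14, 10  # adjacency edges, points along each edge curve

# Node features: per-face discretization samples (x, y, z, nx, ny, nz, visibility)
node_features = np.random.rand(n_faces, n_samples, 7).astype(np.float32)

# Edge features: per-edge U-grid (3D point + tangent vector at each sample)
edge_features = np.random.rand(n_edges, n_u_points, 6).astype(np.float32)

assert node_features.shape[-1] == 7          # 7 components per sample point
assert edge_features.shape[1:] == (n_u_points, 6)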
Mathematical Representation
For each face \(f_i\), a 2D CNN encodes the discretized surface samples into a node feature vector:

\[\mathbf{h}_i = \mathrm{CNN}_{2D}(G_i), \qquad G_i \in \mathbb{R}^{n_{\text{samples}} \times 7}\]

For each edge \(e_{ij}\) between faces \(f_i\) and \(f_j\), a 1D CNN encodes the U-grid into an edge feature vector:

\[\mathbf{h}_{ij} = \mathrm{CNN}_{1D}(U_{ij}), \qquad U_{ij} \in \mathbb{R}^{10 \times 6}\]

Graph classification via message passing: node features are iteratively refined over the face-adjacency graph, then pooled into a single graph-level prediction:

\[\mathbf{h}_i^{(k+1)} = \phi\Big(\mathbf{h}_i^{(k)},\; \sum_{j \in \mathcal{N}(i)} \psi\big(\mathbf{h}_j^{(k)}, \mathbf{h}_{ij}\big)\Big), \qquad \hat{y} = \operatorname{softmax}\big(\mathrm{MLP}(\mathrm{READOUT}(\{\mathbf{h}_i\}))\big)\]
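The sketch below illustrates this CNN+GNN pattern in PyTorch. It is a minimal, self-contained illustration, not the HOOPS AI implementation: the layer sizes, the 1D convolution over the per-face sample sequence (the real model applies 2D CNNs), and the mean-aggregation message passing are simplifying assumptions:

import torch
import torch.nn as nn

class TinyGraphClassifier(nn.Module):
    def __init__(self, num_classes: int, hidden: int = 64):
        super().__init__()
        # CNN over each face's sample points (7 channels per point)
        self.face_cnn = nn.Sequential(
            nn.Conv1d(7, hidden, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.update = nn.Linear(2 * hidden, hidden)    # phi: combine self + messages
        self.readout = nn.Linear(hidden, num_classes)  # graph-level classification head

    def forward(self, face_grids, adjacency):
        # face_grids: (n_faces, 25, 7); adjacency: (n_faces, n_faces) 0/1 matrix
        h = self.face_cnn(face_grids.transpose(1, 2)).squeeze(-1)  # (n_faces, hidden)
        deg = adjacency.sum(dim=1, keepdim=True).clamp(min=1)
        messages = adjacency @ h / deg                             # mean over neighbors
        h = torch.relu(self.update(torch.cat([h, messages], dim=1)))
        return self.readout(h.mean(dim=0))                         # readout -> class logits

# Toy forward pass: 8 faces, self-adjacency only
logits = TinyGraphClassifier(num_classes=10)(torch.rand(8, 25, 7), torch.eye(8))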
Integration with Flow Tasks
Overview
The GraphClassification model integrates seamlessly with HOOPS AI’s Flow framework via the @flowtask decorator pattern. This allows you to wrap FlowModel methods inside Flow tasks for batch processing of CAD datasets.
Pattern: Wrapping FlowModel Methods
The key insight is to instantiate the FlowModel once at the module level, then call its methods inside decorated Flow tasks:
from hoops_ai.flowmanager import flowtask

# 1. Create FlowModel instance
flow_model = GraphClassification(num_classes=45, result_dir="./results")

# 2. Wrap encode_cad_data() in a Flow task
@flowtask.transform(
    name="advanced_cad_encoder",
    inputs=["cad_file", "cad_loader", "storage"],
    outputs=["face_count", "edge_count"]
)
def my_encoder(cad_file: str, cad_loader, storage):
    # Call the FlowModel's encoding method
    face_count, edge_count = flow_model.encode_cad_data(cad_file, cad_loader, storage)

    # Optional: Add custom label processing
    # ... your label code here ...

    # Optional: Convert to graph (graph_handler and filename come from your
    # pipeline; see the complete FABWAVE example below)
    flow_model.convert_encoded_data_to_graph(storage, graph_handler, filename)

    return face_count, edge_count
This pattern provides several benefits:
- Consistency
Encoding logic defined once in FlowModel, reused in Flow
- Maintainability
Changes to encoding strategy only need to update FlowModel
- Reusability
Same FlowModel used for both training (Flow) and inference
- Type Safety
Flow decorators provide clear input/output contracts
Complete Example: FABWAVE Dataset Processing
This section demonstrates a complete end-to-end workflow for processing the FABWAVE dataset, a benchmark dataset containing 45 distinct mechanical part categories.
Dataset Structure
The FABWAVE dataset organizes parts into folders by category:
fabwave/
├── Bearings/
│ ├── bearing_001.step
│ ├── bearing_002.step
│ └── ...
├── Bolts/
│ ├── bolt_001.step
│ ├── bolt_002.step
│ └── ...
├── Brackets/
│ ├── bracket_001.step
│ └── ...
├── Gears/
└── ... (45 categories total)
Each folder represents a part classification category, and the folder name serves as the label for all contained CAD files.
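Before encoding, it can be useful to verify the layout and per-category file counts. This optional sketch assumes the flat layout shown above; if each part sits in its own subfolder (as the parent.parent lookup in the encoding task below suggests), adjust the parent level accordingly:

import pathlib
from collections import Counter

# Count STEP files per category folder to sanity-check the dataset layout
dataset_root = pathlib.Path(r"C:\path\to\fabwave")
counts = Counter(
    p.parent.name
    for p in dataset_root.rglob("*")
    if p.suffix.lower() in (".stp", ".step")
)
for category, n in sorted(counts.items()):
    print(f"{category}: {n} files")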
Configuration and Setup
The first step is to define the dataset configuration, including file paths, label mappings, and the data schema:
"""
FABWAVE Part Classification Pipeline
This script demonstrates complete processing of the FABWAVE dataset
using the GraphClassification model integrated with HOOPS AI Flows.
"""
import pathlib
import numpy as np
from typing import Tuple, List
# Flow framework imports
from hoops_ai.flowmanager import flowtask
import hoops_ai
from hoops_ai.cadaccess import HOOPSLoader, CADLoader
from hoops_ai.storage import (
    DataStorage,
    MLStorage,
    CADFileRetriever,
    LocalStorageProvider,
    DGLGraphStoreHandler
)
from hoops_ai.storage.datastorage.schema_builder import SchemaBuilder
from hoops_ai.dataset import DatasetExplorer
# FlowModel import
from hoops_ai.ml.EXPERIMENTAL import GraphClassification
# ===== CONFIGURATION =====
# Define input/output directories
flows_inputdir = pathlib.Path(r"C:\path\to\fabwave")
flows_outputdir = pathlib.Path(r"C:\path\to\output")
datasources_dir = str(flows_inputdir)
# Define label mapping: class_id -> {name, description}
labels_description = {
    0: {"name": "Bearings", "description": "FABWAVE bearing samples"},
    1: {"name": "Bolts", "description": "FABWAVE bolt samples"},
    2: {"name": "Brackets", "description": "FABWAVE bracket samples"},
    3: {"name": "Bushings", "description": "FABWAVE bushing samples"},
    # ... (continue for all 45 classes)
    44: {"name": "Wide Grip External Retaining Ring", "description": "FABWAVE retaining ring samples"},
}
# Create reverse mapping: folder_name -> class_id
description_to_code = {v["name"]: k for k, v in labels_description.items()}
The label mapping serves two purposes:

Forward Mapping (labels_description): Converts numeric class IDs to human-readable names for visualization and reporting

Reverse Mapping (description_to_code): Converts folder names to numeric class IDs during encoding
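For large taxonomies it is often more robust to derive the mapping from the folder names rather than typing all 45 entries by hand. This optional sketch assumes one top-level folder per class; note that class IDs then follow sorted folder order, so freeze the mapping before training:

# Derive the mapping from the dataset's category folders (sketch)
category_names = sorted(p.name for p in flows_inputdir.iterdir() if p.is_dir())
labels_description = {
    i: {"name": name, "description": f"FABWAVE {name.lower()} samples"}
    for i, name in enumerate(category_names)
}
description_to_code = {v["name"]: k for k, v in labels_description.items()}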
Schema Definition
The schema defines how data is organized in the storage system and how metadata is routed:
# Define schema for part classification
builder = SchemaBuilder(
    domain="Part_classification",
    version="1.0",
    description="Schema for part classification"
)

file_group = builder.create_group("file", "file", "Information related to the cad file")
file_group.create_array("file_label", ["file"], "int32", "FABWAVE part label as integer (0-44)")

builder.define_categorical_metadata('file_label_description', 'str', 'Part classification')
builder.set_metadata_routing_rules(
    categorical_patterns=['file_label_description', 'category', 'type']
)
cad_schema = builder.build()
This schema ensures that:

Part labels are stored in the file group as integer arrays

Label descriptions are routed to the .attribset file for human-readable lookup

All encoded files follow a consistent data organization
Model Instantiation
Create the GraphClassification model instance that will be used throughout the pipeline:
# Define flow name
flowname = "FABWAVE_v2_45classes"
# Create GraphClassification model
flow_model = GraphClassification(
    num_classes=45,  # FABWAVE has 45 part categories
    result_dir=str(pathlib.Path(flows_outputdir).joinpath("flows").joinpath(flowname))
)
This single instance will be used for all encoding operations and later for training and inference, ensuring complete consistency.
Flow Task Definitions
Define the Flow tasks that will gather CAD files and encode them using the GraphClassification model:
File Gathering Task
The first task collects all CAD files from the dataset directory:
@flowtask.extract(
    name="gather_cad_files_to_be_treated",
    inputs=["cad_datasources"],
    outputs=["cad_dataset"]
)
def my_demo_gatherer(source: str) -> List[str]:
    """
    Gather all CAD files from the FABWAVE dataset directory.
    """
    cad_formats = [".stp", ".step"]
    local_provider = LocalStorageProvider(directory_path=source)
    retriever = CADFileRetriever(
        storage_provider=local_provider,
        formats=cad_formats
    )
    return retriever.get_file_list()
This task uses the CADFileRetriever to automatically discover all STEP files, regardless of their location within the directory tree. The returned list becomes the input for the encoding task.
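For reference, a plain-Python equivalent of what the retriever does (a sketch using pathlib, not the library's implementation) looks like this:

import pathlib
from typing import List

def gather_step_files(source: str) -> List[str]:
    # Recursively collect STEP files, matching the formats used above
    root = pathlib.Path(source)
    return [str(p) for p in root.rglob("*") if p.suffix.lower() in (".stp", ".step")]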
Encoding Task
The encoding task wraps the GraphClassification model’s encoding method and adds dataset-specific label extraction:
@flowtask.transform(
    name="advanced_cad_encoder",
    inputs=["cad_file", "cad_loader", "storage"],
    outputs=["face_count", "edge_count"]
)
def my_demo_encoder(cad_file: str, cad_loader: HOOPSLoader, storage: DataStorage) -> Tuple[int, int]:
    """
    Encode CAD data using the GraphClassification FlowModel.

    This task wraps the FlowModel's encode_cad_data() method and adds:
    1. Schema configuration
    2. Label extraction from folder name
    3. Graph conversion for ML training
    """
    # Set schema for storage
    storage.set_schema(cad_schema)

    # ===== CALL FLOWMODEL METHOD =====
    face_count, edge_count = flow_model.encode_cad_data(cad_file, cad_loader, storage)
    # =================================

    # Extract label from folder structure (FABWAVE-specific logic)
    folder_with_name = str(pathlib.Path(cad_file).parent.parent.stem)
    label_code = description_to_code.get(folder_with_name, -1)

    # Save label to storage
    storage.save_data("file_label", np.array([label_code]).astype(np.int64))
    storage.save_metadata("file_label_description", [
        {str(label_code): labels_description[label_code]["name"]}
    ])

    # Convert encoded data to DGL graph file
    location = pathlib.Path(storage.get_file_path("."))
    dgl_output_path = pathlib.Path(location.parent.parent / "dgl" / f"{location.stem}.ml")
    dgl_output_path.parent.mkdir(parents=True, exist_ok=True)

    # ===== CALL FLOWMODEL METHOD =====
    flow_model.convert_encoded_data_to_graph(storage, DGLGraphStoreHandler(), str(dgl_output_path))
    # =================================

    return face_count, edge_count
This task demonstrates the complete integration pattern:
Schema Configuration: Ensures consistent data organization across all files
FlowModel Encoding: Delegates geometric/topological encoding to the model
Label Extraction: Implements dataset-specific logic to determine the class label
Metadata Storage: Saves both numeric labels and human-readable descriptions
Graph Conversion: Transforms encoded data into DGL graph format ready for training
Flow Orchestration and Execution
With the tasks defined, create and execute the Flow pipeline:
def main():
    """
    Execute the FABWAVE preprocessing pipeline.

    This function:
    1. Creates a Flow with the defined tasks
    2. Executes the Flow with parallel processing
    3. Prints summary statistics
    4. Explores the resulting dataset
    """
    # Create Flow with tasks
    cad_flow = hoops_ai.create_flow(
        name=flowname,
        tasks=[
            my_demo_gatherer,  # Gather CAD files
            my_demo_encoder    # Encode using FlowModel
        ],
        max_workers=40,  # 40 parallel workers
        flows_outputdir=str(flows_outputdir),
        ml_task="Part Classification with GraphClassification",
    )

    # Execute Flow
    output, dict_data, flow_file = cad_flow.process(
        inputs={'cad_datasources': [datasources_dir]}
    )

    # Print summary
    print(output.summary())

    # Explore dataset
    explorer = DatasetExplorer(flow_output_file=str(flow_file))
    explorer.print_table_of_contents()

    # Filter files with medium face count
    facecount_is_medium = lambda ds: ds['num_nodes'] > 40
    filelist = explorer.get_file_list(group="graph", where=facecount_is_medium)
    print(f"Files with num_nodes > 40: {len(filelist)}")

if __name__ == "__main__":
    main()
Understanding the Output
After successful execution, the output directory contains:
output/
└── flows/
└── FABWAVE_v2_45classes/
├── encoded/ # Individual .data files (Zarr format)
│ ├── bearing_001.data
│ ├── bolt_001.data
│ └── ...
├── dgl/ # DGL graph files for ML training
│ ├── bearing_001.ml
│ ├── bolt_001.ml
│ └── ...
├── info/ # Metadata (.infoset/.attribset)
└── flow_output.json # Flow execution summary
Key Output Files:

.data files: Individual CAD files encoded in Zarr format containing geometric and topological features

.ml files: DGL graph files ready for loading by DatasetLoader for training

.dataset file: Merged dataset combining all individual files (created by auto_dataset_export=True)

.infoset file: Parquet file containing file-level metadata (labels, processing time, etc.)

.attribset file: Parquet file containing categorical metadata (label descriptions, type names, etc.)

.flow file: JSON specification documenting the Flow execution for reproducibility
This structured output enables seamless transition to the training phase using DatasetLoader and FlowTrainer.
Training Workflow
After preprocessing the dataset using the Flow pipeline, the next step is to train the GraphClassification model to learn part classification from the encoded CAD data.
Dataset Loading and Splitting
The first step in training is to load the preprocessed dataset and perform stratified splitting into training, validation, and test subsets:
from hoops_ai.dataset import DatasetLoader
# Load preprocessed graphs and labels
dataset_loader = DatasetLoader(
    graph_files=["./output/flows/FABWAVE_v2_45classes/dgl/*.ml"],
    label_files=["./output/flows/FABWAVE_v2_45classes/info/*.attribset"]
)

# Split into train/val/test
dataset_loader.split_data(train_ratio=0.7, val_ratio=0.15, test_ratio=0.15)
Creating the Trainer
With the dataset split, create a FlowTrainer instance to manage the training process:
from hoops_ai.ml import FlowTrainer
# Create trainer with training configuration
trainer = FlowTrainer(
    flowmodel=flow_model,          # Same GraphClassification instance used for encoding
    datasetLoader=dataset_loader,
    batch_size=32,
    num_workers=4,
    experiment_name="fabwave_classification",
    accelerator='gpu',             # Use GPU if available, otherwise 'cpu'
    devices=1,                     # Number of GPUs to use
    max_epochs=100,
    result_dir="./experiments"
)
Trainer Configuration Parameters:
- flowmodel (GraphClassification)
The same GraphClassification instance created for encoding. Using the same instance ensures encoding consistency between training and inference.
- datasetLoader (DatasetLoader)
The dataset loader with completed stratified split.
- batch_size (int)
Number of graphs processed per training iteration. Larger batch sizes enable faster training but require more GPU memory. Typical values: 16, 32, 64.
- num_workers (int)
Number of parallel workers for data loading. Increase for faster data loading, but be aware that on Windows, values >0 may decrease performance.
- experiment_name (str)
Name for this training experiment. Used for organizing TensorBoard logs and saved checkpoints.
- accelerator (str)
Hardware accelerator: 'gpu' for GPU training (much faster), 'cpu' for CPU-only training.
- devices (int or list)
Number of devices to use, or specific device IDs. For single-GPU training, use devices=1.
- max_epochs (int)
Maximum number of training epochs. Training may stop earlier if validation loss stops improving (early stopping).
- result_dir (str)
Directory for saving training results, checkpoints, and logs.
Training Execution
Start the training process with a single method call:
# Train and get best checkpoint
best_checkpoint = trainer.train()
print(f"Training complete! Best model: {best_checkpoint}")
Model Evaluation
After training, evaluate the model’s performance on the held-out test set:
# Evaluate on test set
trainer.test(trained_model_path=best_checkpoint)
The test set provides an unbiased estimate of model performance on unseen data, representing how well the model will generalize to real-world part classification tasks.
Accessing Training Metrics
The trainer stores comprehensive metrics throughout training:
# Get metrics storage
metrics = trainer.metrics_storage()
Monitoring with TensorBoard
For real-time training monitoring, launch TensorBoard:
# Launch TensorBoard
tensorboard --logdir=./experiments/ml_output/fabwave_classification/
Then open your browser to http://localhost:6006 to visualize:
Loss curves (training and validation)
Accuracy metrics
Learning rate schedule
Gradient histograms
Model graph structure
TensorBoard provides interactive visualization and is especially useful for comparing multiple training runs with different hyperparameters.
Inference Workflow
After training, use the trained model to classify new, unseen CAD parts.
Setting Up Inference
Initialize the FlowInference component with the trained model:
from hoops_ai.ml import FlowInference
from hoops_ai.cadaccess import HOOPSLoader
# Create CAD loader
cad_loader = HOOPSLoader()
# Create inference instance
inference = FlowInference(
    cad_loader=cad_loader,
    flowmodel=flow_model,  # Same GraphClassification instance
    log_file='inference_errors.log'
)

# Load trained model
inference.load_from_checkpoint("./experiments/best.ckpt")
Key Points:

Use the same FlowModel instance (flow_model) that was used for encoding and training

This ensures encoding consistency: the same geometric features are extracted

The checkpoint file contains the trained neural network weights
Single File Prediction
Classify a single CAD file:
# Path to new CAD file
new_part = "path/to/unknown_part.step"
# Preprocess: encode the CAD file using the same encoding pipeline
batch = inference.preprocess(new_part)
# Predict: run the neural network to get classification
predictions = inference.predict_and_postprocess(batch)
# Interpret results
predicted_class = predictions['predictions'][0]
confidence = predictions['probabilities'][0][predicted_class]
class_name = labels_description[predicted_class]["name"]
print(f"Predicted: {class_name} (confidence: {confidence:.2%})")
Understanding Predictions
The prediction output contains:
- predictions (array)
Predicted class indices (integers from 0 to num_classes-1)
- probabilities (array)
Probability distribution over all classes (sums to 1.0)
The probability distribution indicates the model’s confidence. High confidence in a single class (e.g., 94%) suggests a clear classification. Similar probabilities across multiple classes (e.g., 40%, 35%, 25%) indicate uncertainty.
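A simple way to act on this is to inspect the top-ranked classes and flag low-margin predictions. The sketch below assumes the predictions dictionary from the previous example; the 0.15 margin threshold is an arbitrary illustration:

import numpy as np

# Rank classes by probability and report the top three
probs = np.asarray(predictions['probabilities'][0])
top3 = np.argsort(probs)[::-1][:3]
for rank, cls in enumerate(top3, start=1):
    print(f"{rank}. {labels_description[int(cls)]['name']}: {probs[cls]:.2%}")

# A small gap between the top two classes suggests an ambiguous part
if probs[top3[0]] - probs[top3[1]] < 0.15:
    print("Low-margin prediction; consider manual review.")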
Batch Inference
For processing multiple CAD files, use batch inference:
import os
# Get all STEP files from directory
test_dir = "./test_parts"
cad_files = [os.path.join(test_dir, f) for f in os.listdir(test_dir) if f.endswith('.step')]
# Predict for each file
results = []
for cad_file in cad_files:
    batch = inference.preprocess(cad_file)
    pred = inference.predict_and_postprocess(batch)
    results.append({
        'file': cad_file,
        'prediction': pred['predictions'][0],
        'confidence': pred['probabilities'][0].max()
    })

# Print summary
for result in results:
    print(f"{result['file']}: {result['prediction']} ({result['confidence']:.2%})")
This batch processing approach enables high-throughput classification of entire CAD libraries, supporting applications like automated parts library organization and design search.
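For example, the batch results can be persisted for downstream tooling. This sketch writes a hypothetical classification_results.csv using only the results list built above:

import csv

# Persist batch results with human-readable class names
with open("classification_results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["file", "prediction", "confidence"])
    writer.writeheader()
    for result in results:
        writer.writerow({
            "file": result["file"],
            "prediction": labels_description[int(result["prediction"])]["name"],
            "confidence": f"{result['confidence']:.4f}",
        })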
Hyperparameter Tuning
Optimizing model performance often requires tuning hyperparameters. The GraphClassification model provides several tuning opportunities at different levels.
The GraphClassification model uses default hyperparameters which are embedded in the underlying architecture.
Tuning Strategy
Since the model uses a pre-defined architecture, hyperparameter tuning focuses on:

Training Hyperparameters (via FlowTrainer)

Face Discretization Resolution (in encoding)

Data Augmentation (custom preprocessing)

For architecture-level modifications, you would need to extend the GraphClassification class.
Training Hyperparameters
The most accessible hyperparameters are controlled through the FlowTrainer:
trainer = FlowTrainer(
    flowmodel=flow_model,
    datasetLoader=dataset_loader,

    # Batch size (larger = faster, more memory)
    batch_size=64,           # Try: 16, 32, 64, 128

    # Learning rate
    learning_rate=0.001,     # Try: 0.0001, 0.001, 0.01

    # Epochs
    max_epochs=200,          # Try: 50, 100, 200

    # Gradient clipping
    gradient_clip_val=1.0,   # Try: 0.5, 1.0, 2.0

    # Device
    accelerator='gpu',
    devices=1,
)
Batch Size
The batch size controls how many graphs are processed together in each training iteration:
# Small batch size: Lower memory, slower training, more gradient noise
trainer = FlowTrainer(batch_size=16, ...)
# Medium batch size: Balanced (recommended starting point)
trainer = FlowTrainer(batch_size=32, ...)
# Large batch size: Higher memory, faster training, less gradient noise
trainer = FlowTrainer(batch_size=64, ...)
- Tuning Guidance:
Start with 32 and adjust based on GPU memory
If out-of-memory errors occur, reduce batch size
If training is slow and GPU is underutilized, increase batch size
Learning Rate
The learning rate controls how quickly the model adapts to the training data:
# Low learning rate: Slow but stable training
trainer = FlowTrainer(learning_rate=0.0001, ...)
# Medium learning rate: Balanced (default)
trainer = FlowTrainer(learning_rate=0.001, ...)
# High learning rate: Fast but potentially unstable
trainer = FlowTrainer(learning_rate=0.01, ...)
- Tuning Guidance:
Start with 0.001 (default)
If training loss is not decreasing, try higher learning rate
If training is unstable (loss jumping), reduce learning rate
Use learning rate scheduling for best results (automatic in FlowTrainer)
Maximum Epochs
The number of training epochs determines how long training runs:
# Short training: Fast experimentation
trainer = FlowTrainer(max_epochs=50, ...)
# Medium training: Balanced
trainer = FlowTrainer(max_epochs=100, ...)
# Long training: Maximum performance
trainer = FlowTrainer(max_epochs=200, ...)
- Tuning Guidance:
Early stopping prevents overfitting, so setting a high max_epochs is safe
Monitor validation loss curves to see if more epochs help
For large datasets, fewer epochs may be sufficient
Gradient Clipping
Gradient clipping prevents exploding gradients during training:
# Moderate clipping (default)
trainer = FlowTrainer(gradient_clip_val=1.0, ...)
# Stronger clipping: More stability, slower learning
trainer = FlowTrainer(gradient_clip_val=0.5, ...)
# Weaker clipping: Faster learning, less stability
trainer = FlowTrainer(gradient_clip_val=2.0, ...)
- Tuning Guidance:
Use gradient clipping if you observe NaN losses
Default of 1.0 works well for most cases
Encoding Resolution
The geometric encoding resolution affects model accuracy and computational cost:
Face Discretization Resolution
Higher sample point density captures more geometric detail:
# In your encoding task:
def my_encoder(cad_file, cad_loader, storage):
    face_count, edge_count = flow_model.encode_cad_data(cad_file, cad_loader, storage)

    # Re-encode with a custom resolution (overrides the defaults written above)
    model = cad_loader.create_from_file(cad_file)  # reload the model, as in the encoding pipeline
    brep_encoder = BrepEncoder(model.get_brep(body_index=0), storage)
    brep_encoder.push_face_discretization(pointsamples=50)  # Higher resolution (default: 25)
    brep_encoder.push_curvegrid(20)                         # Higher resolution (default: 10)
- Trade-offs:
Low resolution (10 points): Fast encoding, low memory, may miss fine geometric details
Medium resolution (25 points): Good balance between detail and efficiency (recommended)
High resolution (50+ points): Best geometric detail capture, higher memory and processing time
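A back-of-envelope calculation makes the memory trade-off concrete (each sample point stores 7 float32 components, 4 bytes each):

# Raw node-feature size per face at different resolutions (float32 = 4 bytes)
for n_samples in (10, 25, 50):
    print(f"{n_samples} samples: {n_samples * 7 * 4} bytes per face")
# -> 280, 700, and 1400 bytes per face, respectively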
Model Architecture Modifications
To modify the underlying architecture, you would need to:

Extend the GraphClassification class

Override the retrieve_model() method

Customize the model initialization in the _thirdparty/ directory

Example:

from hoops_ai.ml.EXPERIMENTAL import GraphClassification

class CustomGraphClassification(GraphClassification):
    def __init__(self, num_classes, **kwargs):
        super().__init__(num_classes, **kwargs)
        # Override with custom model parameters if needed
        # See the underlying architecture documentation for available options
Performance Optimization
Optimizing performance ensures efficient use of computational resources during encoding, training, and inference.
GPU Acceleration
GPU acceleration dramatically speeds up training:
# GPU training (recommended if available)
trainer = FlowTrainer(
    accelerator='gpu',
    devices=1,      # Single GPU
    precision=16,   # Mixed precision for 2x speedup
)

# Multi-GPU training (for very large datasets)
trainer = FlowTrainer(
    accelerator='gpu',
    devices=[0, 1],  # Use GPU 0 and GPU 1
)

# CPU training (fallback if no GPU available)
trainer = FlowTrainer(
    accelerator='cpu',
    devices=1,
)
Parallel Preprocessing
Flow-based encoding supports parallel processing for fast dataset preparation:
# Increase workers for Flow preprocessing
cad_flow = hoops_ai.create_flow(
    name="my_flow",
    tasks=[my_gatherer, my_encoder],
    max_workers=50,  # More workers = faster preprocessing
    flows_outputdir="./output"
)
- Tuning Guidance:
max_workers should stay close to the number of CPU cores
For I/O-bound encoding (reading files), modest oversubscription helps (e.g., cores × 1.5)
For CPU-bound encoding (geometric computation), keep workers at or below the core count
Monitor CPU usage to find the optimal worker count
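A small sketch for deriving worker counts from the machine's core count, following the guidance above:

import os

cpu_cores = os.cpu_count() or 1
io_bound_workers = int(cpu_cores * 1.5)  # oversubscribe for I/O-heavy encoding
cpu_bound_workers = cpu_cores            # stay at the core count for CPU-heavy encoding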
Memory Management
Large datasets and high-resolution encoding can exhaust memory. Strategies for managing memory:
Reduce Batch Size
If you encounter out-of-memory errors during training:
# Reduce batch size
trainer = FlowTrainer(batch_size=16, ...) # Down from 32
# Use gradient accumulation to maintain effective batch size
trainer = FlowTrainer(
    batch_size=16,
    accumulate_grad_batches=2  # Effective batch size: 32
)
Troubleshooting
Common issues and their solutions:
Shape Mismatch During Training
Symptom:
RuntimeError: expected shape [B, 25, 7], got [B, 40, 7]
Cause:
Inconsistent face discretization resolution between files. Some files were encoded with 25 sample points, others with 40.
Solution:
Ensure all files use the same encoding parameters:
# Ensure all files use same number of sample points
brep_encoder.push_face_discretization(pointsamples=25) # Always use 25 points
Re-encode any files that used different parameters.
Label Not Found Error
Symptom:
KeyError: 'Unknown_Folder'
Cause:
A CAD file is in a folder that doesn’t exist in the description_to_code mapping.
Solution:
Add a default label for unknown categories:
# Add default label for unknown classes
label_code = description_to_code.get(folder_name, -1)
# Filter out unknown classes during dataset loading
dataset_loader = DatasetLoader(...)
dataset_loader.filter(lambda x: x['file_label'] != -1)
Low Classification Accuracy
Possible Causes and Solutions:
Insufficient Training Data
Solution: Collect more samples per category
Class Imbalance
Solution: Use weighted loss function or data augmentation
Poor UV Parameterization
Cause: Some CAD files may have degenerate UV coordinates
Solution: Re-encode with force_compute_uv and force_compute_3d enabled (see the BREP options in the encoding pipeline)
Suboptimal Hyperparameters
Solution: Try different learning rates, batch sizes, or encoding resolutions
Solutions:
# Check class distribution
explorer = DatasetExplorer(flow_output_file="...")
labels = explorer.get_column_data("file", "file_label")
print(np.bincount(labels))
# Use class weights during training (requires model modification)
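As a starting point for class weighting, inverse-frequency weights can be computed from the label histogram above. This is a sketch; how the weights reach the loss function depends on your model modification:

import numpy as np

# Inverse-frequency class weights from the label counts
counts = np.bincount(labels, minlength=45).astype(np.float64)
weights = counts.sum() / np.maximum(counts, 1)  # avoid division by zero for empty classes
weights = weights / weights.mean()              # normalize around 1.0
print(weights)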
Conclusion
GraphClassification provides a production-ready implementation of a graph-level classifier for CAD part classification. By following the FlowModel interface, it seamlessly integrates with HOOPS AI’s Flow framework for batch preprocessing and supports both training and inference workflows with guaranteed encoding consistency.
Key Takeaways:
Instantiate GraphClassification once at module level

Wrap its methods in @flowtask decorated tasks

Use the same instance for training (FlowTrainer) and inference (FlowInference)

Customize encoding by modifying the Flow task, not the FlowModel
Attribution: This implementation is based on a third-party architecture. When publishing research using this model, please refer to Acknowledgments for proper citation.