#############################
Machine Learning Fundamentals
#############################

.. important::

   **New to Machine Learning?** This guide is designed for CAD engineers without an AI background.

   - **Quick definitions**: :doc:`/resources/glossary` has all ML and CAD terms
   - **Need CAD basics?**: :doc:`/programming_guide/cad-fundamentals` explains :term:`B-rep`, :term:`Topology`, and geometric concepts
   - **Learn by doing**: :doc:`/tutorials/index` for hands-on examples

.. sidebar:: Table of Contents

   .. contents::
      :depth: 1
      :local:

Overview
========

This guide introduces the core machine learning concepts that underpin HOOPS AI workflows. It is written for CAD engineers and 3D modeling experts who want to leverage machine learning but may not have a background in artificial intelligence. We'll explain everything from basic neural network concepts to the advanced graph neural networks (GNNs) used for CAD analysis.

.. tip::

   **CAD Concepts**: Terms like :term:`B-rep`, :term:`UV parameterization`, and :term:`surface normals` are explained in :doc:`/programming_guide/cad-fundamentals`. We'll reference that guide whenever CAD concepts appear.

What is Machine Learning?
=========================

Machine learning (ML) is a subset of artificial intelligence that enables computers to learn patterns from data without being explicitly programmed. Instead of writing rules manually (like "if the part has 6 faces and they're all rectangular, it's a box"), we show the computer thousands of examples and let it discover the patterns automatically.

Types of Machine Learning Tasks
-------------------------------

HOOPS AI primarily focuses on four types of ML tasks:

**Classification**
   Assigning a category to an entire CAD model. For example:

   - Part type classification (bracket, gear, housing, etc.)
   - Manufacturing process classification (casting, machining, forging)
   - Complexity level classification (simple, moderate, complex)

**Node Classification (Segmentation)**
   Assigning a category to each element within a CAD model. For example:

   - Machining feature detection (identifying holes, slots, pockets on specific faces)
   - Surface type classification (planar, cylindrical, freeform per face)
   - Manufacturing region segmentation (regions requiring different tools)

**Feature Recognition**
   A specialized form of node classification focused on identifying and classifying machining features in CAD models. For example:

   - Detecting and classifying 24 machining feature types (holes, pockets, slots, chamfers, fillets, etc.)
   - Identifying feature boundaries (which faces belong to each feature)
   - Recognizing feature hierarchies (features that contain other features)

   .. note::

      Feature recognition is critical for CAM (Computer-Aided Manufacturing) as it enables automatic toolpath generation and machining process planning.

**Regression**
   Predicting continuous numerical values. For example:

   - Manufacturing cost estimation
   - Processing time prediction
   - Quality metrics prediction

Neural Networks Basics
======================

What is a Neural Network?
-------------------------

.. important::

   **Don't be intimidated by the equations and terminology!** This guide provides background on the ML concepts behind HOOPS AI, but you **don't need to understand all the mathematical details** to use the library effectively. The HOOPS AI package is designed to simplify these complex concepts - you can build, train, and deploy CAD ML models using high-level interfaces without directly implementing neural networks or working with the underlying mathematics. Think of this guide as a "what's happening under the hood" reference rather than required reading.
A neural network is a computational model inspired by biological brains. It consists of layers of interconnected "neurons" (mathematical functions) that transform input data into predictions.

.. code-block:: text

   Input  →  Hidden Layer 1  →  Hidden Layer 2  →  ...  →  Output
   [Data]     [Transform]        [Transform]              [Prediction]

.. figure:: /_assets/images/neural_network.png
   :alt: Simple Neural Network Diagram

Key Components
--------------

**Neurons (Nodes)**
   Each neuron receives inputs, applies a weighted sum, adds a bias, and passes the result through an activation function:

   .. math::

      y = \sigma(w_1x_1 + w_2x_2 + ... + w_nx_n + b)

   Where:

   - :math:`x_i` are inputs
   - :math:`w_i` are learned weights
   - :math:`b` is a learned bias
   - :math:`\sigma` is an activation function (ReLU, sigmoid, etc.)

**Layers**
   Neural networks are organized into layers that progressively transform input data. The **input layer** receives the raw features (geometry, topology), **hidden layers** extract meaningful patterns through learned weights and non-linear activations, and the **output layer** produces the final prediction (class probabilities for classification or continuous values for regression).

**Activation Functions**
   Activation functions introduce **non-linearity** into neural networks, enabling them to learn curves and complex patterns rather than just straight-line relationships. The most commonly used activation is **ReLU** (Rectified Linear Unit), defined as :math:`f(x) = \max(0, x)`, which is both simple and effective. The **sigmoid** function :math:`f(x) = \frac{1}{1 + e^{-x}}` outputs values between 0 and 1, making it suitable for binary probabilities. The **tanh** function :math:`f(x) = \tanh(x)` outputs between -1 and 1, providing zero-centered activations. Finally, **softmax** converts raw scores (**logits** - the raw numbers output by the network before they are converted to probabilities) into a probability distribution over multiple classes for classification tasks.

Training Process
----------------

Training is the process of adjusting network weights to minimize errors:

1. **Forward Pass**: Input data flows through the network to produce predictions
2. **Loss Calculation**: Compare predictions to ground truth using a loss function
3. **Backward Pass** (also called **backpropagation**): Calculate **gradients** - mathematical measures of how much each weight contributed to the error - by working backwards through the network
4. **Optimization**: Update weights to reduce loss using an **optimizer** (an algorithm like Adam or SGD that decides how to adjust weights based on gradients)

This cycle repeats for many iterations (epochs) until the model converges.
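To make the four steps concrete, here is a minimal training loop in plain PyTorch. The network, data, and loss choice are hypothetical stand-ins, not HOOPS AI APIs - the library's ``FlowTrainer`` runs this cycle for you.

.. code-block:: python

   import torch
   import torch.nn as nn

   # Hypothetical setup: a tiny network, toy data, and an optimizer
   model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
   optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
   loss_fn = nn.CrossEntropyLoss()

   # Toy data: 64 samples, 16 features, 4 classes (stand-ins for real CAD features)
   features_all = torch.randn(64, 16)
   labels_all = torch.randint(0, 4, (64,))
   train_loader = torch.utils.data.DataLoader(
       torch.utils.data.TensorDataset(features_all, labels_all), batch_size=32)

   for epoch in range(10):                      # one epoch = one pass over the data
       for features, labels in train_loader:
           predictions = model(features)        # 1. forward pass
           loss = loss_fn(predictions, labels)  # 2. loss calculation
           optimizer.zero_grad()                # reset gradients from the previous step
           loss.backward()                      # 3. backward pass (backpropagation)
           optimizer.step()                     # 4. optimization (weight update)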
**Key Terminology**:

- **Epoch**: One complete pass through the entire training dataset
- **Batch**: A subset of the dataset processed together (batch size = 32 means 32 samples at once)
- **Learning Rate**: How big the weight update steps are (typical: 0.001 - 0.0001)
- **Overfitting**: Model memorizes training data but fails on new data
- **Validation Set**: Data held out during training to check for overfitting

Graph Neural Networks (GNNs)
============================

Why GNNs for CAD Data?
----------------------

CAD models have an inherent **graph structure**: faces connected by shared edges, edges meeting at vertices. Traditional neural networks (CNNs - Convolutional Neural Networks - and fully-connected networks) expect fixed-size, grid-like inputs (images, vectors), but CAD models have:

- **Variable topology**: A simple box has 6 faces; a complex engine part may have thousands
- **Relational structure**: Which faces are adjacent matters for understanding geometry (see :doc:`/programming_guide/cad-fundamentals` for :term:`Topology` details)
- :term:`Permutation invariance` (order-independence): The order we list faces shouldn't affect the result - faces [A, B, C] should give the same answer as [C, A, B]

:term:`Graph Neural Networks` are designed exactly for this type of structured, relational data.

Graph Representation of CAD Models
----------------------------------

In HOOPS AI, a CAD :term:`B-rep` model is represented as a graph where **nodes** correspond to CAD entities (faces, edges, or vertices depending on the task), and **edges** represent relationships between entities (such as :term:`face-adjacency` or edge-face incidence). Each node carries :term:`node features` - geometric properties like area, curvature, surface type, or sampled points - while :term:`edge features` encode relationship properties such as shared edge length or the :term:`dihedral angle` between adjacent faces.

.. seealso::

   For detailed examples of face-adjacency graphs and how CAD topology maps to graph structure, see :doc:`/programming_guide/cad-fundamentals` - "Face Adjacency Graph" section.

.. figure:: /_assets/images/brep_graph.png
   :alt: Graph Neural Network Diagram
   :align: center
   :width: 50%

   A boundary representation (B-rep) converted to a graph: the faces and edges (curves) of the B-rep correspond to the nodes and edges of the graph.

How GNNs Work
-------------

GNNs operate through :term:`message passing` (a process where each node collects and combines information from its neighboring nodes): each node aggregates information from its neighbors, updates its representation, and repeats for multiple layers.

**Basic GNN Layer**:

.. math::

   h_i^{(l+1)} = \sigma \left( W^{(l)} h_i^{(l)} + \text{AGG}\left(\{h_j^{(l)} : j \in \mathcal{N}(i)\}\right) \right)

Where:

- :math:`h_i^{(l)}` is the :term:`feature vector` of node :math:`i` at layer :math:`l`
- :math:`\mathcal{N}(i)` are the neighbors of node :math:`i`
- :math:`\text{AGG}` is an :term:`aggregation` **function** (a way to combine multiple values into one, like taking the sum, mean, max, or attention-weighted average)
- :math:`W^{(l)}` are learnable weight matrices
- :math:`\sigma` is an :term:`activation function`

.. note::

   **Skip the equation?** No problem! Think of it this way: each face learns about its neighbors, then their neighbors, and so on. After 3 layers, a face "knows" about faces up to 3 edges away.
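The update rule is easier to see in code. Below is a toy, framework-free sketch of one message-passing layer with mean aggregation - the adjacency list and feature sizes are made up for illustration, and real layers (in DGL or PyTorch Geometric) implement the same idea efficiently.

.. code-block:: python

   import torch
   import torch.nn.functional as F

   num_nodes, dim = 4, 8
   h = torch.randn(num_nodes, dim)          # h^(l): one feature vector per face
   neighbors = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}  # toy face-adjacency

   W_self = torch.nn.Linear(dim, dim)       # W^(l), applied to the node itself
   W_agg = torch.nn.Linear(dim, dim)        # transform for the aggregated neighbors

   h_next = torch.empty_like(h)
   for i in range(num_nodes):
       agg = torch.stack([h[j] for j in neighbors[i]]).mean(dim=0)  # AGG = mean
       h_next[i] = F.relu(W_self(h[i]) + W_agg(agg))                # σ = ReLU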
**Intuition**: After :math:`L` GNN layers, each node's representation incorporates information from nodes up to :math:`L` **hops** (steps through connections) away. For a 3-layer GNN on a :term:`face-adjacency graph`, each face "knows about" the geometry of faces up to 3 edges away.

.. tip::

   **For Advanced Readers**: The following research papers demonstrate these concepts in practice. You don't need to read them to use HOOPS AI!

This message passing principle has been applied successfully across CAD ML research. **BRepNet** (`Lambourne et al., 2021 `_) introduced one of the first topological message passing systems specifically designed for B-rep models, using face-to-face and edge-to-face message passing to learn machining features. More recent work like **GC-CAD** (`Quan et al., 2024 `_) combines GNN message passing with UV-Net-style feature extraction and :term:`contrastive learning` (a training technique that teaches the model to distinguish similar parts from dissimilar ones), achieving a 100× speedup over classical shape descriptors while processing 360k CAD models.

Common GNN Architectures
------------------------

**Graph Convolutional Network (GCN)**

Simple and effective. **Aggregates** (combines) neighbor features using **mean pooling** (averaging all neighbor values together). **BRepNet** uses a variant of GCN for B-rep topology, where face nodes aggregate information from adjacent faces through shared edges.

.. code-block:: python

   # Conceptual (DGL handles the details)
   new_features = activation(
       self_transform(node_features) +
       neighbor_transform(mean(neighbor_features))
   )
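For a runnable counterpart to the conceptual snippet, DGL ships a ready-made GCN layer. This is a minimal sketch with a random toy graph - the layer sizes are arbitrary, and HOOPS AI wires such layers up for you inside its FlowModels.

.. code-block:: python

   import dgl
   import torch
   from dgl.nn import GraphConv

   # Toy face-adjacency graph: 4 faces, 3 adjacency edges
   g = dgl.graph(([0, 1, 2], [1, 2, 3]))
   g = dgl.add_self_loop(g)               # GraphConv expects no zero-in-degree nodes
   h = torch.randn(4, 8)                  # 8-dim feature per face

   layer = GraphConv(8, 16, activation=torch.relu)
   h_new = layer(g, h)                    # (4, 16) updated face features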
**Graph Attention Network (GAT)**

Graph Attention Networks use an **attention mechanism** (a learned weighting system that determines which inputs are most important) to weight neighbor contributions differently, learning which neighboring nodes matter most for each prediction. For example, when classifying a cylindrical hole feature, GAT might assign higher **attention weights** to adjacent planar faces than to distant curved surfaces. **BrepMFR** (`2024 `_) applies GAT with transformers for enhanced machining feature recognition, where the learned attention weights provide **interpretability** (the ability to understand why the model made a decision) - showing which adjacent faces the model considers most relevant when classifying each feature.

**Graph Transformer**

Graph Transformers apply the **transformer architecture** (a modern neural network design originally created for language processing that uses attention to weigh the importance of different inputs) to graph-structured data, using attention over all nodes rather than just immediate neighbors. This allows the model to capture **long-range dependencies** (relationships between distant parts of the model) across the entire part topology, though at a higher computational cost. **BRep-BERT** (`Lou et al., CIKM 2023 `_) adapts **BERT's masked prediction** approach (a training method where the model learns by trying to fill in deliberately hidden parts of the input) to B-rep graphs, **pre-training** (initial training on a large dataset before fine-tuning on a specific task) on Fusion 360 Gallery models through **self-supervised learning** (learning patterns from unlabeled data without human annotations), where the model learns to predict masked entities from context. This pre-training enables effective **few-shot feature recognition** (learning to recognize features from very few examples) even with limited labeled data.

**Message Passing Neural Networks (MPNN)**

A general framework in which you can customize the message, aggregation, and update functions.

Point Cloud Networks
--------------------

Point cloud networks process 3D geometry as unordered sets of points, making them ideal for CAD surface sampling.

**PointNet**

PointNet is the foundational architecture for **point cloud** processing (working with collections of 3D points in space), introducing a :term:`permutation-invariant` design (meaning the output doesn't change regardless of the order in which points are listed - [A, B, C] gives the same result as [C, A, B]) that can handle unordered sets of 3D points. The architecture processes each point individually through a shared Multi-Layer Perceptron (MLP), then **aggregates** (combines) all point features using :term:`max pooling` (taking the maximum value across all points for each feature dimension) to produce a global feature vector. This max pooling operation is the key to permutation invariance: regardless of the order in which points are presented, the maximum value in each dimension remains the same. In CAD applications, PointNet processes :term:`UV grid` **samples** (points sampled from the parametric surface representation of a face - see :doc:`/programming_guide/cad-fundamentals` for :term:`UV parameterization` details) as unordered point clouds, where each point includes both position (x, y, z) and :term:`surface normal` (a vector perpendicular to the surface indicating its orientation) information (nx, ny, nz).

.. code-block:: python

   # Simplified PointNet flow
   # Input: (N, 6) - N points with [x, y, z, nx, ny, nz]
   point_features = shared_mlp(points)         # (N, 64)
   global_feature = max_pool(point_features)   # (64,) - single vector for entire face
   classification = classifier(global_feature)

**Mathematical Formulation**:

.. math::

   f(\{x_1, ..., x_n\}) = \gamma \circ \text{MAX}_{i=1,...,n} \{\phi(x_i)\}

Where:

- :math:`\phi`: Per-point feature extraction (shared MLP)
- :math:`\text{MAX}`: Max pooling (symmetric aggregation)
- :math:`\gamma`: Classification network

.. note::

   **Math intimidating?** Just remember: PointNet processes each point separately, then takes the maximum value across all points. HOOPS AI handles the implementation!
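The permutation-invariance claim is easy to verify numerically. In this sketch (random data, arbitrary sizes), shuffling the point order changes the order of the per-point features but not the max-pooled global feature:

.. code-block:: python

   import torch
   import torch.nn as nn

   shared_mlp = nn.Sequential(nn.Linear(6, 64), nn.ReLU())  # φ, applied per point

   points = torch.randn(100, 6)               # 100 points: [x, y, z, nx, ny, nz]
   shuffled = points[torch.randperm(100)]     # same points, different order

   global_a = shared_mlp(points).max(dim=0).values    # MAX over points
   global_b = shared_mlp(shuffled).max(dim=0).values

   print(torch.allclose(global_a, global_b))  # True - order doesn't matter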
**PointNet++**

PointNet++ extends the original PointNet with hierarchical processing to capture local geometric structures. The architecture uses multi-scale grouping of points through a series of set abstraction layers, where each layer samples a subset of points, groups nearby neighbors, and applies PointNet to extract local features. This hierarchical approach allows the network to capture fine geometric details that the original PointNet's global pooling would miss, making it particularly effective for tasks requiring local context, such as segmentation and feature detection.

.. code-block:: python

   # Hierarchical processing
   # Level 1: Sample 512 points, group neighbors, extract local features
   centroids_1, features_1 = set_abstraction(points, num_points=512, radius=0.2)

   # Level 2: Sample 128 points, group at larger radius
   centroids_2, features_2 = set_abstraction(centroids_1, num_points=128, radius=0.4)

   # Global feature
   global_feature = max_pool(features_2)

**Use Cases in CAD**: While pure point cloud methods are common for mesh data, **UV-Net** (`Jayaraman et al., CVPR 2021 `_) showed that structured :term:`UV parameterization` (a mapping from 2D parameter space (u, v) to 3D surface coordinates - see :doc:`/programming_guide/cad-fundamentals` for details) with 2D CNNs outperforms unordered point clouds for :term:`B-rep` CAD data. The paper demonstrated that sampling a regular 10×10 :term:`UV grid` per face captures small features more reliably than random point sampling, as UV grids preserve spatial relationships. However, PointNet-style architectures remain useful for:

- **UV Grid Sampling**: Flatten UV grids into point clouds with normals when spatial structure is less critical
- **Surface Characterization**: Extract geometric features from face samples in a permutation-invariant manner
- **Multi-Face Aggregation**: Process multiple faces independently, then combine using max pooling or attention
- **Hybrid Approaches**: **Self-Supervised Representation Learning for CAD** (`Jones et al., CVPR 2023 `_) combines UV parametric sampling with **implicit SDF reconstruction** (learning a continuous function that represents distance to the surface - SDF stands for Signed Distance Field), using point cloud techniques to approximate per-face signed distance fields from UV-sampled points.

.. tip::

   **Advanced Research Note**: The papers referenced above are for readers interested in the academic background. You don't need to understand them to use HOOPS AI's point cloud and UV grid features!

Convolutional Neural Networks (CNNs)
------------------------------------

:term:`CNNs` process grid-structured data (images, :term:`UV grids`) using learnable filters. The breakthrough application of CNNs to CAD came with **UV-Net** (`Jayaraman et al., CVPR 2021 `_, `GitHub `_), which demonstrated that 2D convolutions on **UV parametric grids** (2D grids of points sampled from the CAD surface parameterization - see :doc:`/programming_guide/cad-fundamentals`) could effectively learn :term:`B-rep` geometry while preserving topological structure through :term:`face adjacency graphs`.

**Key Concepts**: CNNs operate through **convolution** - sliding learnable filters over the input to detect local patterns like edges and textures. :term:`Pooling` operations downsample the spatial dimensions (reduce the grid size), cutting computational cost while adding **translation invariance** (the network recognizes patterns regardless of their position in the grid). The architecture learns **hierarchical features**: early convolutional layers detect simple patterns (edges, corners), while deeper layers combine these to recognize complex shapes and geometric structures.

.. seealso::

   New to CNNs? See :doc:`/tutorials/getting-started` for hands-on examples, or check the :doc:`glossary </resources/glossary>` for quick definitions!
**2D CNN for UV Grids**:

.. code-block:: python

   import torch
   import torch.nn as nn

   class UVGridCNN(nn.Module):
       def __init__(self):
           super().__init__()
           # Input: (batch, 6, height, width) - 6 channels: x, y, z, nx, ny, nz
           self.conv1 = nn.Conv2d(6, 32, kernel_size=3, padding=1)
           self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
           self.pool = nn.MaxPool2d(2, 2)
           self.fc = nn.Linear(64 * 8 * 8, 128)  # Assuming 32x32 input -> 8x8 after pooling

       def forward(self, uv_grid):
           x = torch.relu(self.conv1(uv_grid))
           x = self.pool(x)
           x = torch.relu(self.conv2(x))
           x = self.pool(x)
           x = x.view(x.size(0), -1)  # Flatten
           features = self.fc(x)
           return features
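Continuing the snippet above, a quick shape check (assuming the 32×32 UV grid resolution from the comment):

.. code-block:: python

   model = UVGridCNN()
   uv_grids = torch.randn(4, 6, 32, 32)  # batch of 4 faces, 6 channels each
   features = model(uv_grids)
   print(features.shape)                 # torch.Size([4, 128])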
**Why CNNs for UV Grids?**

The effectiveness of CNNs on :term:`UV grids` was rigorously demonstrated in **UV-Net** (`Jayaraman et al., CVPR 2021 `_), which showed that 2D convolutions outperform point cloud methods (PointNet) on the SolidLetters classification benchmark. The key insight is that nearby points in UV parameter space correspond to nearby points on the physical surface, enabling CNNs to learn local geometric patterns just as they detect edges in images. Convolutional filters naturally detect curvature changes, ridges, and holes, producing consistent fixed-size :term:`feature vectors` regardless of grid resolution (whether 10×10 or 32×32). The architecture handles UV seam discontinuities on cylindrical and toroidal surfaces through periodic padding (discussed in the paper's supplementary material), and :term:`pooling` layers capture patterns at multiple scales - from fine details like small fillets to global shape characteristics like overall surface curvature. Subsequent work, **Self-Supervised Representation Learning for CAD** (`Jones et al., CVPR 2023 `_), extended this approach by combining UV-grid CNNs with implicit SDF reconstruction (SDF = Signed Distance Field - a continuous function representing distance to the surface), pre-training on 1M unlabeled ABC models to achieve state-of-the-art few-shot learning.

.. tip::

   **For Advanced Readers**: The research papers above provide academic context for HOOPS AI's CNN implementation, but you can use the UV grid features without reading them!

Hybrid Architectures for CAD
----------------------------

HOOPS AI supports hybrid architectures that combine multiple neural network types. These architectures are accessed through the ``FlowModel`` interface with specific architecture names. Modern CAD ML research has shown that combining different network types leverages complementary strengths: :term:`CNNs` capture local geometry, :term:`GNNs` model :term:`topology`, and transformers (attention-based networks that can process sequential data and learn relationships between distant elements) handle long-range dependencies.

.. seealso::

   Not familiar with these architectures? See :term:`Graph Neural Network (GNN)` and :term:`Convolutional Neural Network (CNN)` in the glossary, or review :doc:`/programming_guide/cad-fundamentals` for CAD-specific concepts!

**UV-Net: Hybrid CNN-GNN Architecture (cnn_gnn)**
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The pioneering hybrid architecture from **Autodesk Research** (`Jayaraman et al., CVPR 2021 `_, `GitHub `_) combines image processing with graph reasoning for comprehensive CAD understanding. Tested on five datasets including SolidLetters, Fusion 360 Gallery, and ModelNet, UV-Net demonstrated superior classification and segmentation compared to point cloud and voxel methods.

**Architecture**: UV-Net consists of three main components working in parallel before their outputs are fused. The :term:`CNN` **component** uses 2D convolutional layers to process :term:`UV grids` (sampled face geometry), extracting local geometric features such as curvature patterns and surface characteristics. Simultaneously, the :term:`GNN` **component** applies graph convolutional layers to the :term:`face-adjacency graph`, capturing topological relationships between faces and propagating information across the model structure. These two feature streams are then combined in a **fusion layer** that concatenates the geometric features from the CNN with the topological context from the GNN. Finally, a **classifier** processes the combined features to produce the final prediction, leveraging both local geometric detail and global structural understanding.

**When to Use**: UV-Net is most effective for tasks requiring both local geometric understanding and global topological context, such as face-level classification (e.g., surface type prediction) or part segmentation. It works best with datasets where UV parameterization is available for all faces.

**HOOPS AI Implementation**: UV-Net is implemented in HOOPS AI as the **GraphClassification** FlowModel, which wraps the Classification model (the UV-Net architecture implementation). The model uses the exact hybrid CNN-GNN architecture described in the UV-Net paper.

.. code-block:: python

   from hoops_ai.flowmanager._flows import GraphClassification

   # UV-Net architecture for part classification
   model = GraphClassification(
       num_classes=10  # Number of part categories
   )

See :doc:`/programming_guide/train` for complete GraphClassification usage examples and :doc:`/tutorials/classification` for classification workflows.

**BrepMFR: Transformer-based GNN Architecture with Domain Adaptation (transformer_gnn)**
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

An advanced architecture that uses transformer attention (a mechanism that allows models to focus on relevant parts of the input) for machining feature recognition with transfer learning (applying knowledge learned from one dataset to improve performance on another).

**Architecture**:

- **Graph Transformer**: Applies the transformer attention mechanism over graph nodes (faces)
- **Geometric Encoding**: Encodes local geometric shapes using learned embeddings
- **Topological Encoding**: Captures global relationships through multi-head attention
- **Domain Adaptation**: Two-step training strategy to transfer from synthetic to real CAD data

**Domain Adaptation Strategy**:

1. **Pre-training**: Train on a large synthetic CAD dataset
2. **Fine-tuning**: Adapt to real CAD data with a domain adaptation loss
3. **Transfer**: Leverages synthetic data to overcome limited real-world labels

**When to Use**:

- Machining feature recognition tasks
- Limited real-world labeled data (use synthetic pre-training)
- Tasks requiring long-range dependencies (transformer attention sees the entire model)

**HOOPS AI Implementation**: BrepMFR is implemented in HOOPS AI as the **GraphNodeClassification** FlowModel, which wraps the BrepSeg model (the BrepMFR architecture implementation). The model class is called ``BrepSeg``, but it implements the BrepMFR paper's transformer-based graph neural network with domain adaptation capabilities.
.. code-block:: python

   from hoops_ai.flowmanager._flows import GraphNodeClassification

   # BrepMFR architecture for machining feature recognition
   model = GraphNodeClassification(
       num_classes=24,       # 24 machining feature types
       n_layers_encode=8,    # Transformer layers
       dim_node=256,         # Node embedding dimension
       d_model=512,          # Model dimension
       n_heads=32,           # Attention heads
       dropout=0.3,
       attention_dropout=0.3
   )

See :doc:`/programming_guide/train` for complete GraphNodeClassification usage examples and :doc:`/tutorials/feature-recognition` for machining feature recognition workflows.

PyTorch and PyTorch Lightning
=============================

HOOPS AI uses **PyTorch** as its deep learning framework, with **PyTorch Lightning** for training orchestration. PyTorch is an open-source machine learning library widely used in research and production, while Lightning provides a structured framework that simplifies training workflows.

.. seealso::

   New to PyTorch? Check the `official PyTorch tutorials `_ for a hands-on introduction, or see :doc:`/tutorials/getting-started` for HOOPS AI-specific examples!

PyTorch Basics
--------------

**Tensors**

Multi-dimensional arrays (similar to NumPy arrays) that can run on GPUs (Graphics Processing Units - hardware accelerators for parallel computation):

.. code-block:: python

   import torch

   # Create tensors
   features = torch.tensor([[1.0, 2.0], [3.0, 4.0]])  # 2D tensor
   labels = torch.tensor([0, 1])                      # 1D tensor

   # Move to GPU
   features_gpu = features.cuda()

**Modules (Models)**

Neural networks are defined as classes inheriting from ``torch.nn.Module``:

.. code-block:: python

   import torch
   import torch.nn as nn

   class SimpleClassifier(nn.Module):
       def __init__(self, input_dim, num_classes):
           super().__init__()
           self.fc1 = nn.Linear(input_dim, 128)
           self.fc2 = nn.Linear(128, num_classes)

       def forward(self, x):
           x = torch.relu(self.fc1(x))
           x = self.fc2(x)
           return x

**Optimizers**

Algorithms that update model weights during training using the computed gradients (the direction and magnitude by which to adjust each weight):

- **Adam** (Adaptive Moment Estimation): Automatically adjusts learning rates for each parameter; works well in most cases (HOOPS AI default)
- **SGD** (Stochastic Gradient Descent): Updates weights using random batches of data; simpler, but requires careful tuning of the learning rate
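Both optimizers share the same PyTorch interface; a minimal sketch using the ``SimpleClassifier`` defined above (the hyperparameter values are just common starting points):

.. code-block:: python

   model = SimpleClassifier(input_dim=64, num_classes=10)

   # Adam: adaptive learning rates, a good default
   optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

   # SGD: simpler update rule, often paired with momentum
   optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)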
PyTorch Lightning
-----------------

PyTorch Lightning is a high-level wrapper that removes boilerplate code (repetitive setup) and adds best practices automatically.

**Why Lightning?**

- Handles GPU/CPU switching automatically
- Manages training/validation loops (no need to write ``for epoch in range(num_epochs)`` yourself)
- Integrates logging and checkpointing (saves model progress during training)
- Scales to multi-GPU training easily

**LightningModule Structure**:

.. code-block:: python

   import torch
   import pytorch_lightning as pl

   class MyModel(pl.LightningModule):
       def __init__(self):
           super().__init__()
           self.model = build_network()  # placeholder for your network definition

       def forward(self, x):
           """Forward pass - how data flows through the model"""
           return self.model(x)

       def training_step(self, batch, batch_idx):
           """What happens for each training batch"""
           x, y = batch
           predictions = self(x)
           loss = compute_loss(predictions, y)  # placeholder loss function
           self.log('train_loss', loss)
           return loss

       def validation_step(self, batch, batch_idx):
           """What happens for each validation batch"""
           x, y = batch
           predictions = self(x)
           accuracy = compute_accuracy(predictions, y)  # placeholder metric
           self.log('val_accuracy', accuracy)

       def configure_optimizers(self):
           """Define optimizer and learning rate schedule"""
           return torch.optim.Adam(self.parameters(), lr=0.001)

**Trainer**:

The Trainer handles the entire training loop:

.. code-block:: python

   from pytorch_lightning import Trainer

   trainer = Trainer(
       max_epochs=100,         # Train for 100 epochs
       accelerator='gpu',      # Use GPU if available
       devices=1,              # Use 1 GPU
       log_every_n_steps=10,   # Log metrics every 10 batches
   )
   trainer.fit(model, train_dataloader, val_dataloader)

DGL: Deep Graph Library
=======================

HOOPS AI uses **DGL** (Deep Graph Library) for efficient graph neural network operations. DGL is an open-source Python library optimized for :term:`GNN` training on both CPUs and GPUs.

.. seealso::

   New to graph libraries? See the `DGL tutorials `_ for a hands-on introduction, or start with :doc:`/tutorials/index` for HOOPS AI-specific examples!

DGL Graph Structure
-------------------

.. code-block:: python

   import dgl
   import torch

   # Create graph from edge list
   # Face-adjacency: face 0 connects to faces 1, 2, 3
   src = [0, 0, 0, 1, 1, 2]
   dst = [1, 2, 3, 2, 3, 3]
   graph = dgl.graph((src, dst))

   # Add node features (face geometric properties)
   graph.ndata['features'] = torch.randn(4, 64)   # 4 faces, 64-dim features

   # Add edge features (relationship properties)
   graph.edata['edge_attr'] = torch.randn(6, 16)  # 6 edges, 16-dim features

**Batching Multiple Graphs**:

When training on multiple CAD models, DGL batches graphs into a single large graph (it combines multiple small graphs into one big graph for efficient parallel processing):

.. code-block:: python

   # Batch 3 CAD models (graphs) together
   graphs = [graph1, graph2, graph3]
   batched = dgl.batch(graphs)  # Creates one large graph with disconnected components

   # Process entire batch through GNN in parallel
   output = gnn_model(batched, batched.ndata['features'])

PyTorch Geometric: Alternative Graph Library
============================================

While HOOPS AI primarily uses DGL, **PyTorch Geometric (PyG)** is another popular graph deep learning library. PyG provides similar functionality to DGL with a slightly different API design.

.. seealso::

   Want to learn PyG? See the `PyTorch Geometric documentation `_ for comprehensive tutorials!

When You Might See PyTorch Geometric
------------------------------------

PyTorch Geometric may appear in:

- **Research code**: Many :term:`GNN` papers provide PyG implementations
- **External tutorials**: CAD ML tutorials often use PyG for examples
- **Custom implementations**: Users extending HOOPS AI with custom architectures
PyG Graph Structure
-------------------

.. code-block:: python

   import torch
   from torch_geometric.data import Data

   # Create graph from edge list
   # Edge index: [source_nodes, destination_nodes]
   edge_index = torch.tensor([[0, 0, 0, 1, 1, 2],   # Source nodes
                              [1, 2, 3, 2, 3, 3]],  # Destination nodes
                             dtype=torch.long)

   # Node features
   x = torch.randn(4, 64)          # 4 faces, 64-dim features

   # Edge features
   edge_attr = torch.randn(6, 16)  # 6 edges, 16-dim features

   # Create PyG Data object
   graph = Data(x=x, edge_index=edge_index, edge_attr=edge_attr)

**Converting Between DGL and PyTorch Geometric**:

.. code-block:: python

   import dgl
   import torch
   from torch_geometric.data import Data

   # DGL to PyG
   def dgl_to_pyg(dgl_graph):
       edge_index = torch.stack(dgl_graph.edges())
       x = dgl_graph.ndata['features']
       edge_attr = dgl_graph.edata.get('edge_attr', None)
       return Data(x=x, edge_index=edge_index, edge_attr=edge_attr)

   # PyG to DGL
   def pyg_to_dgl(pyg_data):
       src, dst = pyg_data.edge_index
       g = dgl.graph((src, dst))
       g.ndata['features'] = pyg_data.x
       if pyg_data.edge_attr is not None:
           g.edata['edge_attr'] = pyg_data.edge_attr
       return g

**Key Differences**:

.. list-table::
   :header-rows: 1
   :widths: 30 35 35

   * - Aspect
     - DGL
     - PyTorch Geometric
   * - Graph Storage
     - Graph object with ndata/edata
     - Data object with x/edge_index
   * - Batching
     - ``dgl.batch()`` creates large graph
     - ``Batch.from_data_list()`` creates batch
   * - Message Passing
     - Functional API (``g.update_all()``)
     - Layer-based (``GCNConv()``)
   * - HOOPS AI Usage
     - Primary library (recommended)
     - Not directly used (but compatible)

**Why HOOPS AI Uses DGL**:

1. **Efficient batching**: Better performance for variable-size CAD graphs (different models have different numbers of faces)
2. **Flexible message passing**: Easier to customize for CAD-specific operations, like handling :term:`B-rep` topology (see the sketch below)
3. **Integration**: Better integration with the existing HOOPS AI pipeline
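As an example of DGL's functional message-passing API (the "Message Passing" row of the table), here is a minimal mean-aggregation step using DGL's built-in functions - equivalent in spirit to the GCN snippet earlier:

.. code-block:: python

   import dgl
   import dgl.function as fn
   import torch

   g = dgl.graph(([0, 0, 1], [1, 2, 2]))  # toy 3-node graph
   g.ndata['h'] = torch.randn(3, 8)

   # For each node: gather neighbor features into mailbox 'm', then average them
   g.update_all(fn.copy_u('h', 'm'), fn.mean('m', 'h_agg'))
   print(g.ndata['h_agg'].shape)           # torch.Size([3, 8])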
..
   Machine Learning Workflow in HOOPS AI .. ====================================== .. The typical ML workflow in HOOPS AI follows these stages. This section shows how all the concepts we've discussed (graphs, :term:`feature vectors `, neural networks) come together in practice. .. .. seealso:: .. Want a hands-on walkthrough? See :doc:`/tutorials/getting-started` for step-by-step examples, or :doc:`/programming_guide/cad-data-encoding` for detailed data extraction patterns!
   .. 1. Data Extraction and Encoding .. -------------------------------- .. Convert CAD files into ML-friendly representations (from :term:`B-rep ` geometry to numeric :term:`feature vectors `): .. .. code-block:: python .. from hoops_ai.cadaccess import HOOPSLoader .. from hoops_ai.cadencoder import BrepEncoder .. from hoops_ai.storage.datastorage import MemoryStorage .. # Load CAD file .. cad_loader = HOOPSLoader() .. cad_model = cad_loader.create_from_file("part.step") .. # Extract features .. brep = cad_model.get_brep() .. storage = MemoryStorage() .. encoder = BrepEncoder(brep, storage) .. # Extract graph topology .. encoder.push_face_adjacency_graph() .. # Extract geometric features .. encoder.push_face_attributes() .. encoder.push_face_uv_grid(ugrid=10, vgrid=10) .. **What gets extracted?** .. - :term:`Topology `: Graph structure (which faces are adjacent) - see :doc:`/programming_guide/cad-fundamentals` for details .. - **Geometry**: Numeric features (area, curvature, surface type, point clouds) .. - **Metadata**: File information, labels, custom attributes
   .. 2. Dataset Creation and Management .. ----------------------------------- .. Organize encoded data for training (combine all your CAD models into a structured dataset): .. code-block:: python .. from hoops_ai.dataset import DatasetLoader .. # Load merged dataset (created by Flow pipeline) .. dataset_loader = DatasetLoader( .. merged_store_path="dataset.dataset", .. parquet_file_path="dataset.infoset" .. ) .. # Explore dataset .. explorer = dataset_loader.explorer .. print(f"Total files: {len(explorer.file_ids())}") .. print(f"Available groups: {explorer.available_groups()}") .. # Split into train/validation/test sets .. dataset_loader.split( .. key="label", # Stratify by label (meaning maintain the same class distribution in each split) .. group="file", # Split at file level (not individual faces) .. train=0.7, # 70% training .. validation=0.15, # 15% validation .. test=0.15 # 15% test .. ) .. **Why split the data?** .. - **Training set**: Used to update model weights through backpropagation (the algorithm that adjusts weights to minimize error) .. - **Validation set**: Checks performance during training to detect **overfitting** (when the model memorizes training data instead of learning general patterns) .. - **Test set**: Final evaluation on completely unseen data to measure real-world performance
   .. 3. Model Definition .. ------------------- .. In HOOPS AI, models are defined through the ``FlowModel`` interface: .. .. code-block:: python .. from hoops_ai.ml import FlowModel .. import pytorch_lightning as pl .. import torch.nn as nn .. class MyFlowModel(FlowModel): .. def __init__(self, num_classes): .. self.num_classes = num_classes .. def encode_cad_data(self, cad_file, cad_access, storage): .. """Define what features to extract from CAD""" .. brep = cad_access.get_brep() .. encoder = BrepEncoder(brep, storage) .. encoder.push_face_adjacency_graph() .. encoder.push_face_attributes() .. def convert_encoded_data_to_graph(self, storage, graph, filename): .. """Convert encoded features to DGL graph""" .. # Load topology .. graph_data = storage.load('graph') .. g = dgl.graph((graph_data['edges']['source'], .. graph_data['edges']['destination'])) .. # Load features .. face_attrs = storage.load('faces') .. g.ndata['features'] = torch.tensor(face_attrs['area']) .. graph.save(g, filename) .. def retrieve_model(self) -> pl.LightningModule: .. """Define the neural network architecture""" .. return MyGNNModel(num_classes=self.num_classes) .. The ``FlowModel`` separates concerns: .. - **Data extraction**: What features to compute (geometry, topology) .. - **Graph conversion**: How to structure data for the GNN .. - **Model architecture**: The neural network design
   .. 4. Training .. ----------- .. Training uses the ``FlowTrainer`` class: .. .. code-block:: python .. from hoops_ai.ml import FlowTrainer .. # Create trainer .. trainer = FlowTrainer( .. flowmodel=my_flow_model, .. dataset_loader=dataset_loader, .. max_epochs=100, .. batch_size=32, .. learning_rate=0.001, .. accelerator='gpu' # or 'cpu' .. ) .. # Start training .. trainer.train() .. **What happens during training?** .. 1. For each **epoch** (one complete pass through all training data): .. a. Load batches of CAD :term:`graphs ` .. b. **Forward pass**: Compute predictions by passing data through the network .. c. Calculate loss: How wrong are the predictions? .. d. **Backward pass**: Compute gradients using backpropagation (calculate how to adjust each weight) .. e. Update weights: Improve the model using the optimizer .. f. Validate: Check performance on validation set .. 2. Track metrics: Loss, accuracy, F1-score
..
   3. Save **checkpoints** (saved model weights at specific training stages): Best model based on validation performance .. 4. Log to TensorBoard: Visualize training progress .. **Hyperparameters** (settings you choose before training, not learned from data) **to tune**: .. - **Learning rate**: Too high = unstable, too low = slow training .. - **Batch size**: Larger = faster but needs more GPU memory .. - **Number of epochs**: More epochs may improve performance but risk overfitting .. - :term:`GNN ` **depth** (number of layers): More layers = larger **receptive field** (how much of the graph each node can "see") but harder to train .. .. seealso:: .. Need help tuning hyperparameters? See :doc:`/tutorials/hyperparameter-tuning` for practical guidance!
   .. 5. Evaluation and Inference .. ---------------------------- .. After training, evaluate on the test set and run inference on new CAD files: .. .. code-block:: python .. from hoops_ai.ml import FlowInference .. # Load trained model .. inference = FlowInference( .. cad_loader=cad_loader, .. flowmodel=my_flow_model, .. ) .. # Predict on new CAD file .. prediction = inference.infer_from_file( .. cad_file="new_part.step", .. checkpoint_path="best_model.ckpt" .. ) .. print(f"Predicted class: {prediction['class']}") .. print(f"Confidence: {prediction['probability']:.2%}") .. **Common Evaluation Metrics**: .. - **Accuracy**: Percentage of correct predictions (total correct / total predictions) .. - **Precision**: Of all "positive" predictions, how many were correct? (true positives / (true positives + false positives)) .. - **Recall**: Of all actual positives, how many did we find? (true positives / (true positives + false negatives)) .. - **F1-Score**: **Harmonic mean** (a type of average that penalizes extreme values) of precision and recall .. - **Confusion Matrix**: A table showing which classes get confused with each other (rows = actual class, columns = predicted class)

Loss Functions
==============

Loss functions measure how wrong the model's predictions are. HOOPS AI uses different losses for different tasks.

.. seealso::

   For practical loss function usage, see the :doc:`/tutorials/classification` and :doc:`/tutorials/segmentation` examples!

Cross-Entropy Loss (Classification)
-----------------------------------

For predicting discrete classes (part type, feature type):

.. math::

   \mathcal{L}_{CE} = -\sum_{i=1}^{C} y_i \log(\hat{y}_i)

Where:

- :math:`y_i` is the true label (**one-hot encoded**, meaning a vector where only the correct class is 1 and all others are 0, like [0, 1, 0, 0] for class 2)
- :math:`\hat{y}_i` is the predicted probability distribution (after applying **softmax**, which converts raw scores to probabilities that sum to 1)
- :math:`C` is the number of classes

.. code-block:: python

   import torch.nn.functional as F

   # Model outputs logits
   logits = model(graph, features)  # Shape: [batch_size, num_classes]

   # Compute loss
   loss = F.cross_entropy(logits, labels)

**Intuition**: Penalizes confident wrong predictions more than uncertain ones (predicting the wrong class with 90% confidence is penalized more than with 60% confidence).

Binary Cross-Entropy (Node Classification)
------------------------------------------

For binary classification on each node (e.g., is this face a hole?):

.. math::

   \mathcal{L}_{BCE} = -\frac{1}{N}\sum_{i=1}^{N} [y_i \log(\hat{y}_i) + (1-y_i)\log(1-\hat{y}_i)]

Where :math:`N` is the number of nodes (faces), :math:`y_i` is 0 or 1 (the binary label), and :math:`\hat{y}_i` is the predicted probability.
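In PyTorch this is a single call. A minimal sketch assuming the model produces one raw score (logit) per face - ``binary_cross_entropy_with_logits`` applies the sigmoid internally, which is more numerically stable than applying it yourself:

.. code-block:: python

   import torch
   import torch.nn.functional as F

   face_logits = torch.randn(7)  # one raw score per face (7 faces, made-up values)
   face_labels = torch.tensor([0., 1., 0., 0., 1., 0., 0.])  # 1 = "hole" face

   loss = F.binary_cross_entropy_with_logits(face_logits, face_labels)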
Mean Squared Error (Regression)
-------------------------------

For predicting continuous values (like surface area or curvature):

.. math::

   \mathcal{L}_{MSE} = \frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2

Where :math:`y_i` is the true value and :math:`\hat{y}_i` is the predicted value. MSE penalizes large errors quadratically (an error of 2 is penalized 4 times more than an error of 1).

Advanced Concepts
=================

This section covers advanced ML techniques used in CAD research. These concepts are **optional** - you can build effective HOOPS AI models without them!

.. important::

   **Advanced Section!** The following topics are for readers interested in cutting-edge research. You can skip to :doc:`/tutorials/getting-started` to start building models immediately!

Transfer Learning and Domain Adaptation
---------------------------------------

**Transfer Learning**: Use a model pre-trained on one dataset (e.g., synthetic CAD) and **fine-tune** it (continue training with a smaller learning rate) on another (e.g., real CAD).

.. code-block:: python

   # Load pre-trained weights (saved model parameters)
   model = MyGNNModel.load_from_checkpoint("pretrained.ckpt")

   # Fine-tune on new dataset with smaller learning rate
   trainer = FlowTrainer(
       flowmodel=flow_model,
       dataset_loader=new_dataset,
       learning_rate=0.0001,  # 10x smaller for fine-tuning
       max_epochs=50
   )

**Domain Adaptation**: Techniques to reduce the gap between training data (synthetic) and real-world data:

- **Adversarial training**: Train the model to make features **domain-invariant** (features that look the same whether they come from synthetic or real data)
- **Data augmentation** (creating variations of existing data): Add noise, rotations, and variations to synthetic data to make the model more robust
- **Progressive fine-tuning**: Train on synthetic data → fine-tune on a small real dataset

.. tip::

   **For Advanced Readers**: Transfer learning and domain adaptation are research topics explored in papers like **Self-Supervised Representation Learning for CAD** (Jones et al., CVPR 2023). HOOPS AI supports these workflows through the ``FlowTrainer`` interface!

Regularization Techniques
-------------------------

Methods to prevent overfitting (when the model memorizes training data instead of learning general patterns):

**Dropout**

Randomly "drop" (set to zero) neurons during training to prevent co-adaptation (where neurons become too dependent on each other):

.. code-block:: python

   self.dropout = nn.Dropout(p=0.5)  # Drop 50% of neurons randomly
   x = self.dropout(x)

**Weight Decay**

Add a penalty for large weights (**L2 regularization**, meaning the sum of squared weights is added to the loss) to prevent overfitting:

.. code-block:: python

   optimizer = torch.optim.Adam(model.parameters(), lr=0.001,
                                weight_decay=1e-5)  # Penalty coefficient

**Early Stopping**

Stop training when validation performance stops improving (prevents wasted time and overfitting):

.. code-block:: python

   from pytorch_lightning.callbacks import EarlyStopping

   early_stop = EarlyStopping(
       monitor='val_accuracy',
       patience=10,  # Stop if no improvement for 10 epochs
       mode='max'
   )
   trainer = Trainer(callbacks=[early_stop])

.. seealso::

   For practical regularization examples, see :doc:`/tutorials/preventing-overfitting`!

Attention Mechanisms
--------------------

Attention allows models to focus on the important parts of the input (e.g., which faces are most relevant for predicting a machining feature).

.. math::

   \text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V

Where Q (query), K (key), and V (value) are learned transformations of the input. The softmax operation produces weights that determine how much to "attend to" each input element.
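The formula maps directly to a few tensor operations. A minimal sketch with made-up dimensions (in a real transformer, Q, K, and V would come from learned linear projections of the node features):

.. code-block:: python

   import math
   import torch

   n, d_k = 5, 16                           # 5 elements (e.g., faces), key dimension 16
   Q = torch.randn(n, d_k)
   K = torch.randn(n, d_k)
   V = torch.randn(n, d_k)

   scores = Q @ K.T / math.sqrt(d_k)        # similarity of each query to each key
   weights = torch.softmax(scores, dim=-1)  # rows sum to 1: "how much to attend"
   attended = weights @ V                   # weighted mix of values, shape (5, 16)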
In a CAD context:

- **Graph Attention**: Learns which neighboring faces are most relevant (e.g., holes matter more than outer shells for certain features)
- **Transformer Attention**: Captures long-range dependencies across the entire model (e.g., detecting symmetry patterns)

Self-Supervised Learning
------------------------

Learning useful representations without manual labels (useful when labeled CAD data is expensive to obtain):

**Contrastive Learning**

Learn to distinguish between similar and dissimilar samples:

- **Positive pairs**: Different views of the same CAD model (e.g., different rotations or UV samplings)
- **Negative pairs**: Different CAD models entirely
- Goal: Make the **representations** (the :term:`feature vectors` produced by the encoder) of positive pairs similar and those of negative pairs different

**Masked Prediction** (BERT-style)

Hide parts of the input and predict them (like filling in blanks):

- Mask certain faces in a CAD model (hide their features)
- Train the model to predict their properties from the surrounding context
- Used in the **BRep-BERT architecture** (BERT stands for Bidirectional Encoder Representations from Transformers - a self-supervised learning approach)

Best Practices for CAD Machine Learning
=======================================

This section provides practical guidelines for building effective CAD ML models with HOOPS AI.

.. seealso::

   For hands-on examples demonstrating these practices, see :doc:`/tutorials/best-practices`!

Data Quality
------------

1. **Balanced datasets**: Ensure each class has sufficient examples (aim for 100+ per class to avoid **class imbalance**, where the model ignores rare classes)
2. **Clean labels**: Verify ground truth labels are accurate (incorrect labels = model learns wrong patterns)
3. **Representative data**: Include variation (simple/complex parts, different manufacturers) to ensure **generalization** (the ability to work on unseen data)
4. **Consistent preprocessing**: Use the same encoding settings for train/test data (e.g., the same :term:`UV grid` resolution)

Feature Engineering
-------------------

1. **Normalization** (scaling features to similar ranges, typically [0, 1] or mean=0, std=1): Scale features to prevent some from dominating others

   .. code-block:: python

      # Area in [0.01, 1000] m² → normalize to [0, 1] using min-max scaling
      normalized_area = (area - area_min) / (area_max - area_min)

2. **Domain knowledge**: Include CAD-specific features (surface type, curvature, convexity - see :doc:`/programming_guide/cad-fundamentals` for CAD concepts)
3. **Multi-scale features**: Combine local (face-level) and global (model-level) information for richer representations

Model Training
--------------

1. **Start simple**: Try a basic GCN before complex architectures (simpler models are easier to debug and may work just as well)
2. **Monitor overfitting**: Track train vs. validation metrics (if train keeps improving but validation plateaus or worsens, you're overfitting)
3. **Use checkpoints**: Save the best model based on validation performance, not the final epoch (see the sketch after this list)
4. **Visualize training**: Use TensorBoard to spot issues early (loss spikes, overfitting trends)
5. **Ablation studies** (systematically removing features/components to see what's actually needed): Test which features matter most for your task
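For point 3, PyTorch Lightning's ``ModelCheckpoint`` callback handles this automatically. A minimal sketch (the monitored metric name must match what your model logs):

.. code-block:: python

   from pytorch_lightning import Trainer
   from pytorch_lightning.callbacks import ModelCheckpoint

   checkpoint = ModelCheckpoint(
       monitor='val_accuracy',  # metric logged in validation_step
       mode='max',              # keep the checkpoint with the highest value
       save_top_k=1             # only keep the single best model
   )
   trainer = Trainer(max_epochs=100, callbacks=[checkpoint])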
.. seealso::

   Need help visualizing training? See :doc:`/tutorials/tensorboard-guide` for TensorBoard setup and usage!

Debugging ML Models
-------------------

**Common Issues**:

1. **Loss not decreasing**:

   - Check the learning rate (try 1e-3, 1e-4, 1e-5)
   - Verify data preprocessing (normalized? correct shapes?)
   - Inspect the first batch manually (print values, check for NaN or extreme outliers)

2. **Overfitting** (train accuracy high, validation low):

   - Add dropout (start with p=0.3)
   - Reduce model complexity (fewer layers, smaller hidden dimensions)
   - Get more training data (or use data augmentation)
   - Use regularization (weight decay, early stopping)

3. **Underfitting** (both train and validation accuracy low):

   - Increase model capacity (more layers, wider layers - increase hidden_dim)
   - Train longer (more epochs)
   - Add more relevant features (see :doc:`/programming_guide/cad-data-encoding` for feature extraction options)

4. **Unstable training** (loss oscillates or explodes):

   - Reduce the learning rate (divide by 10)
   - Use **gradient clipping** (limiting gradient magnitudes to prevent extreme updates)
   - Check for data issues (NaN values, extreme outliers, incorrect normalization)

Resources and Further Reading
=============================

**For Beginners**:

- :doc:`/tutorials/getting-started`: Hands-on HOOPS AI introduction
- :doc:`/programming_guide/cad-fundamentals`: CAD concepts for ML practitioners
- :doc:`/resources/glossary`: Quick reference for ML and CAD terms

**Concepts**:

- :term:`Graph Neural Networks`: `Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges `_
- PyTorch: `Official PyTorch Tutorials `_
- PyTorch Lightning: `Lightning Documentation `_
- DGL: `DGL User Guide `_

**CAD ML Research**:

- **UV-Net**: `Learning from Boundary Representations (Jayaraman et al., CVPR 2021) `_
- **Self-Supervised CAD**: `Representation Learning for CAD (Jones et al., CVPR 2023) `_
- **BrepMFR**: `Machining Feature Recognition with Graph Transformers `_

Next Steps
==========

1. Read :doc:`CAD Fundamentals </programming_guide/cad-fundamentals>` to understand the :term:`B-rep` representation
2. Follow the :doc:`Tutorials </tutorials/index>` for hands-on examples
3. Explore the :doc:`API Reference ` for detailed class documentation

**Additional Resources**:

- :doc:`/resources/glossary` for quick term definitions
- :doc:`/programming_guide/cad-data-encoding` for practical feature extraction examples
- :doc:`/tutorials/getting-started` for step-by-step walkthroughs