######################################
Develop Your Own ML Model
######################################

.. warning::
   **EXPERIMENTAL STATUS**

   **Important**: This architecture is currently **EXPERIMENTAL**. It primarily focuses on fitting PyTorch Lightning model wrappers into a standardized interface, and the schema and API may change in future releases based on:

   - Evolving requirements for different ML architectures
   - Performance optimization needs
   - Integration with additional ML frameworks beyond PyTorch Lightning
   - Community feedback and use cases

   Users should expect potential breaking changes in upcoming versions as the architecture matures.

.. sidebar:: Quick Navigation

    .. contents:: 
       :local:
       :depth: 1


.. _train-what-is-ml:

What is Machine Learning for CAD?
==================================

This guide assumes you understand basic machine learning concepts like training, inference, neural networks, and graph neural networks. If you're new to ML or need a refresher:

.. seealso::
   **New to Machine Learning?** Start with :doc:`/programming_guide/ml-fundamentals`:

   - :ref:`programming_guide/ml-fundamentals:What is Machine Learning?` - Classification, regression, node classification
   - :ref:`programming_guide/ml-fundamentals:Neural Networks Basics` - Training process, epochs, batches, overfitting
   - :ref:`programming_guide/ml-fundamentals:Graph Neural Networks (GNNs)` - Why GNNs for CAD, message passing, graph representation

**Training vs. Inference in Brief**:

    - **Training**: Feed thousands of labeled CAD models to a neural network, which adjusts its parameters to minimize prediction errors. Output: a trained model checkpoint file.
    - **Inference**: Load a trained model and use it to predict labels for new, unlabeled CAD files.

.. _train-understanding-flow:

Components Overview
===================

HOOPS AI's ML system has three main components that work together:

FlowModel - Defines the ML Architecture
----------------------------------------

:class:`FlowModel<hoops_ai.ml.flow_model.FlowModel>` is an abstract base class (from :autolink:`hoops_ai.ml.flow_model`) that defines the contract that all Flow Models must implement.

**Location**: ``src/hoops_ai/ml/EXPERIMENTAL/flow_model.py``

A FlowModel encapsulates **how** to:

    1. Transform a CAD file into encoded features
    2. Process labels for supervised learning
    3. Convert encoded features into graph structures
    4. Load model inputs from persisted files
    5. Batch multiple inputs together (collation)
    6. Retrieve the underlying PyTorch Lightning model
    7. Post-process predictions into interpretable results
    8. Access training metrics

You instantiate a FlowModel implementation (:class:`GraphClassification<hoops_ai.flowmanager._flows.flow_model_graph_classification>`, :class:`GraphNodeClassification<hoops_ai.flowmanager._flows.flow_model_graphnode_classification>`, or :class:`CustomFlowModel<hoops_ai.flowmanager._flows.flexible_flow_model>`) and pass it to both FlowTrainer and FlowInference.

FlowTrainer - Trains Neural Networks
-------------------------------------

:class:`FlowTrainer<hoops_ai.ml.flow_trainer.FlowTrainer>` (from :autolink:`hoops_ai.ml.flow_trainer`) orchestrates the complete training workflow for Flow Models.

**Location**: ``src/hoops_ai/ml/EXPERIMENTAL/flow_trainer.py``

**Purpose**: FlowTrainer handles:

    - Dataset loading and splitting (train/validation/test)
    - Model initialization and checkpointing
    - Training loop with PyTorch Lightning
    - Metric logging and visualization
    - Data quality validation (purify method)

**Key Advantage**: By consuming the FlowModel interface, FlowTrainer **automatically knows** how to:

    1. Load encoded graph files using ``load_model_input_from_files()``
    2. Batch multiple samples using ``collate_function()``
    3. Initialize the correct model architecture via ``retrieve_model()``
    4. Access training metrics through ``metrics()``

**Result**: Write the encoding logic once in your FlowModel implementation, and both training and inference use it consistently.

FlowInference - Makes Predictions
----------------------------------

:class:`FlowInference<hoops_ai.ml.flow_inference.FlowInference>` (from :autolink:`hoops_ai.ml.flow_inference`) handles single-file CAD inference using trained Flow Models.

**Location**: ``src/hoops_ai/ml/EXPERIMENTAL/flow_inference.py``

**Purpose**: FlowInference provides:

    - On-the-fly CAD encoding (identical to training encoding)
    - Model checkpoint loading
    - Single-file prediction pipeline
    - Clean separation from batch training infrastructure

**Key Advantage**: By consuming the same FlowModel interface used during training, FlowInference **guarantees** that:

    1. CAD files are encoded using the exact same logic as training data
    2. Graph construction follows the same schema
    3. Feature dimensions match model expectations
    4. No code duplication between training and inference

**Result**: Train once, deploy confidently knowing the encoding pipeline is consistent.

Workflow Summary
----------------

    1. **Define Your ML Pipeline**: Implement a :class:`FlowModel<hoops_ai.ml.flow_model.FlowModel>` or use one of the provided implementations (GraphClassification or GraphNodeClassification)

    2. **Prepare Your Data**: Use :class:`DatasetLoader<hoops_ai.dataset.dataset_loader.DatasetLoader>` to load and split your dataset

    3. **Train Your Model**: Use :class:`FlowTrainer<hoops_ai.ml.flow_trainer.FlowTrainer>` to train and validate your model

    4. **Perform Inference**: Use :class:`FlowInference<hoops_ai.ml.flow_inference.FlowInference>` to make predictions on new CAD models

Quick Start Example
====================

Here's a complete workflow for training and deploying a machine learning model on CAD data.

**Training Example:**

.. code-block:: python

   from hoops_ai.ml import FlowTrainer
   from hoops_ai.ml import GraphNodeClassification
   from hoops_ai.dataset import DatasetLoader
   
   # Create your flow model
   flowmodel = GraphNodeClassification(
       num_classes=25,
       n_layers_encode=8,
       result_dir="./results"
   )
   
   # Create dataset loader
   dataset_loader = DatasetLoader(
       graph_files=["path/to/graphs/*.bin"],
       label_files=["path/to/labels/*.json"]
   )
   dataset_loader.split_data(train_ratio=0.7, val_ratio=0.15, test_ratio=0.15)
   
   # Create trainer
   trainer = FlowTrainer(
       flowmodel=flowmodel,
       datasetLoader=dataset_loader,
       batch_size=32,
       num_workers=4,
       experiment_name="machining_feature_recognition",
       accelerator='gpu',
       devices=1,
       gradient_clip_val=1.0,
       max_epochs=100,
       learning_rate=0.002,
       result_dir="./results"
   )
   
   # Train the model
   best_checkpoint_path = trainer.train()
   print(f"Training complete! Best model: {best_checkpoint_path}")

**Inference Example:**

.. code-block:: python

   from hoops_ai.ml import FlowInference, GraphNodeClassification
   from hoops_ai.cadaccess import HOOPSLoader
   import time
   
   # Setup
   cad_loader = HOOPSLoader()
   # Use the same FlowModel settings as in the training example
   flowmodel = GraphNodeClassification(num_classes=25, n_layers_encode=8)
   inference = FlowInference(cad_loader, flowmodel)
   inference.load_from_checkpoint("./trained_models/best.ckpt")
   
   # Inference on new CAD file
   start_time = time.time()
   batch = inference.preprocess("new_part.step")
   predictions = inference.predict_and_postprocess(batch)
   total_time = time.time() - start_time
   
   # Results
   print(f"Inference completed in {total_time:.2f} seconds")
   print(f"Face predictions: {predictions['node_predictions']}")

.. seealso::
   **Using these components in multi-step workflows?**
   
   See :doc:`/programming_guide/flow` to learn how the Flow orchestration system 
   combines encoding, training, and inference into automated pipelines.


.. _train-choosing-task:

FlowModel Interface
===================

The :class:`FlowModel<hoops_ai.ml.flow_model.FlowModel>` abstract base class defines the contract between the Flow Management system and the machine learning pipeline. It specifies how CAD data is encoded, how graph structures are built, and which model architecture is used for training, so it determines which operations your flow performs on each CAD file in your dataset.

Implement this interface to create a custom ML solution, or use one of the provided implementations.


Understanding FlowModel
--------------------------

What Problem Does FlowModel Solve?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Machine learning workflows for CAD data involve two distinct phases with very different requirements:

  1. **Training Phase (Dataset Processing)**:

      - Process large CAD datasets into ML-ready input files
      - Work with encoded datasets (no longer raw CAD files) for data science experimentation
      - Manage training, validation, and test splits carefully
      - Run experimentation cycles exclusively with preprocessed ML files for efficiency

  2. **Inference Phase (Single File Processing)**:

      - Receive a new CAD file as input to the trained model
      - Must encode the CAD data **exactly** as done during training
      - Operate on single files without dataset infrastructure
      - Require "memory" of the encoding process used during training

**The Challenge**: How do we ensure that the encoding logic used to prepare training data is identical to the encoding used during inference? How do we avoid messy code duplication when handling batch datasets versus single files?

**The Solution**: 

FlowModel provides a unified abstraction that encapsulates:

    - CAD data encoding strategies (what features to extract)
    - Label processing logic (how to structure labels)
    - Graph conversion methods (how to build neural network inputs)
    - Model input preparation (how to batch and format data)
    - Model architecture references (which neural network to use)

You configure FlowModel once, then:

    - Pass it to :class:`FlowTrainer<hoops_ai.ml.flow_trainer.FlowTrainer>` → it knows how to train on batched datasets
    - Pass it to :class:`FlowInference<hoops_ai.ml.flow_inference.FlowInference>` → it knows how to predict on single files (using identical preprocessing)

This guarantees encoding consistency across the entire ML lifecycle - preventing the common bug where training and inference process data differently.
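
The idea can be illustrated without any HOOPS AI code. The following is a minimal sketch in which ``ToyFlowModel``, ``ToyTrainer``, and ``ToyInference`` are hypothetical stand-ins (they are not part of the library): because both consumers hold the same object, the encoding rule exists in exactly one place.

.. code-block:: python

    from typing import Dict, List


    class ToyFlowModel:
        """Hypothetical stand-in: the single owner of the encoding logic."""

        def __init__(self, scale: float) -> None:
            self.scale = scale

        def encode(self, face_areas: List[float]) -> List[float]:
            # The one encoding rule shared by every consumer.
            return [round(area / self.scale, 6) for area in face_areas]


    class ToyTrainer:
        """Hypothetical batch consumer: encodes a whole dataset."""

        def __init__(self, flowmodel: ToyFlowModel) -> None:
            self.flowmodel = flowmodel

        def build_dataset(self, parts: Dict[str, List[float]]) -> Dict[str, List[float]]:
            return {name: self.flowmodel.encode(areas) for name, areas in parts.items()}


    class ToyInference:
        """Hypothetical single-file consumer: encodes one part at a time."""

        def __init__(self, flowmodel: ToyFlowModel) -> None:
            self.flowmodel = flowmodel

        def preprocess(self, face_areas: List[float]) -> List[float]:
            return self.flowmodel.encode(face_areas)


    flowmodel = ToyFlowModel(scale=10.0)
    dataset = ToyTrainer(flowmodel).build_dataset({"bracket": [5.0, 20.0]})
    single = ToyInference(flowmodel).preprocess([5.0, 20.0])
    print(dataset["bracket"] == single)

Changing ``scale`` (or the encoding rule itself) affects training and inference together, which is the property the real FlowModel is designed to provide.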

The FlowModel Abstract Interface
---------------------------------

FlowModel is an abstract base class (from :autolink:`hoops_ai.ml.flow_model`) that declares methods for each pipeline stage. The architecture follows this relationship:

  .. code-block:: text

      ┌───────────────────────────────────────────────────────────┐
      │                  FlowModel (Abstract)                     │
      │  - encode_cad_data()                                      │
      │  - encode_label_data()                                    │
      │  - convert_encoded_data_to_graph()                        │
      │  - load_model_input_from_files()                          │
      │  - collate_function()                                     │
      │  - retrieve_model()                                       │
      │  - predict_and_postprocess()                              │
      │  - metrics()                                              │
      └───────────────────────────────────────────────────────────┘
                                   ▲
                                   │
                             Implemented by
                                   │
              ┌────────────────────┴──────────────────┐
              │                                       │
        ┌──────────────────────┐        ┌─────────────────────────┐
        │ GraphClassification  │        │ GraphNodeClassification │
        │ (Graph classifier)   │        │ (Graph node classifier) │
        └──────────────────────┘        └─────────────────────────┘
              │                                       │
              └────────────────────┬──────────────────┘
                                   │
                               Consumed by
                                   │
              ┌────────────────────┴──────────────────┐
              │                                       │
      ┌──────────────────────┐    ┌────────────────────────────┐
      │    FlowTrainer       │    │     FlowInference          │
      │ - Batch processing   │    │ - Single file processing   │
      │ - Train/Val/Test     │    │ - Real-time encoding       │
      │ - Checkpointing      │    │ - Model deployment         │
      └──────────────────────┘    └────────────────────────────┘
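
The contract in the diagram can be illustrated with Python's standard ``abc`` module. The following is a minimal sketch, not the HOOPS AI source: ``MiniFlowModel``, ``Incomplete``, and ``EchoModel`` are hypothetical classes that declare only two of the hooks, to show that a subclass cannot be instantiated until every abstract method is implemented.

.. code-block:: python

    from abc import ABC, abstractmethod
    from typing import Any, List


    class MiniFlowModel(ABC):
        """Hypothetical, reduced stand-in for the FlowModel contract."""

        @abstractmethod
        def collate_function(self, batch: List[Any]) -> Any:
            """Combine several samples into one model input."""

        @abstractmethod
        def model_name(self) -> str:
            """Return a human-readable model name."""


    class Incomplete(MiniFlowModel):
        # Implements only one of the two hooks, so it stays abstract.
        def model_name(self) -> str:
            return "incomplete"


    class EchoModel(MiniFlowModel):
        def collate_function(self, batch: List[Any]) -> Any:
            return list(batch)

        def model_name(self) -> str:
            return "echo"


    try:
        Incomplete()
        instantiated = True
    except TypeError:
        # ABC refuses to build a class with missing abstract methods.
        instantiated = False

    model = EchoModel()
    print(instantiated, model.model_name(), model.collate_function([1, 2]))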

Each method handles a specific part of the ML pipeline, as described below.

Abstract Methods - Data Processing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**encode_cad_data(cad_file: str, cad_access: CADLoader, storage: DataStorage) -> Tuple[int, int]**

    **Purpose**: Opens a CAD file and encodes its geometric/topological data into a format suitable for machine learning.

    **Parameters**:

        - ``cad_file`` (str): Path to the CAD file
        - ``cad_access`` (CADLoader): CAD file loading interface
        - ``storage`` (DataStorage): Storage handler for persisting encoded data

    **Returns**: Tuple containing face count and edge count

    **Workflow**:

        1. Configure CAD loading options (features, solids, BREP settings)
        2. Load the CAD model
        3. Extract BREP representation
        4. Use BrepEncoder to compute geometric features
        5. Push features to storage

**encode_label_data(label_storage: LabelStorage, storage: DataStorage) -> Tuple[str, int]**

    **Purpose**: Retrieves labeling information and stores it according to the ML task requirements.

    **Parameters**:

        - ``label_storage`` (LabelStorage): Interface to label data
        - ``storage`` (DataStorage): Storage handler for persisting labels

    **Returns**: Tuple containing label key and label count

    **Key Considerations**:

        - Handles different label granularities (graph-level vs. node-level)
        - Validates label-entity compatibility (e.g., graph labels for graph classification)
        - Stores both label codes and descriptions

**convert_encoded_data_to_graph(storage: DataStorage, graph: MLStorage, filename: str) -> Dict[str, Any]**

    **Purpose**: Converts encoded features from storage into a graph representation suitable as ML model input.

    **Parameters**:

        - ``storage`` (DataStorage): Source of encoded features
        - ``graph`` (MLStorage): Graph storage handler (e.g., DGL, PyTorch Geometric)
        - ``filename`` (str): Output filename for the serialized graph

    **Returns**: Dictionary with graph metadata (file size, node/edge counts, etc.)

    **Workflow**:

        1. Load graph structure (edges, nodes)
        2. Attach node features (e.g., face discretization samples)
        3. Attach edge features (e.g., edge curve grids)
        4. Attach labels (if available)
        5. Save graph to file

    **Mathematical Representation**:

    For a CAD model with faces :math:`\mathcal{F} = \{f_0, \ldots, f_{N_f-1}\}` and edges :math:`\mathcal{E} = \{e_0, \ldots, e_{N_e-1}\}`:

        - **Graph**: :math:`G = (V, E)` where :math:`V = \mathcal{F}` and :math:`E \subseteq V \times V`
        - **Node Features**: :math:`\mathbf{X}_v \in \mathbb{R}^{d_v}` for each :math:`v \in V`
        - **Edge Features**: :math:`\mathbf{X}_e \in \mathbb{R}^{d_e}` for each :math:`e \in E`
        - **Labels**: :math:`y \in \{0, \ldots, C-1\}` (graph-level) or :math:`\mathbf{y} \in \{0, \ldots, C-1\}^{|V|}` (node-level)
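
    As a concrete illustration of this notation, the sketch below builds :math:`G = (V, E)` for a toy solid in plain Python. The face names, the two-value feature vectors, and the edge-to-face table are made-up illustration data, not output of the HOOPS AI encoders.

    .. code-block:: python

        from typing import Dict, List, Tuple

        # Toy B-rep: each CAD edge is shared by two faces (hypothetical data).
        faces: List[str] = ["top", "side", "bottom"]
        edge_to_faces: Dict[str, Tuple[str, str]] = {
            "e0": ("top", "side"),
            "e1": ("side", "bottom"),
        }

        # V = F: one graph node per face, indexed 0..N_f-1.
        node_index = {name: i for i, name in enumerate(faces)}

        # E is a subset of V x V: one pair of node ids per shared CAD edge.
        graph_edges = sorted(
            (node_index[a], node_index[b]) for a, b in edge_to_faces.values()
        )

        # X_v in R^{d_v} with d_v = 2 (for example: area, curvature flag).
        node_features = [[4.0, 0.0], [6.0, 1.0], [4.0, 0.0]]

        # Node-level labels in {0, ..., C-1}^{|V|} with C = 2.
        node_labels = [0, 1, 0]

        print(graph_edges)
        print(len(node_features) == len(node_labels) == len(faces))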

**load_model_input_from_files(graph_file: str, data_id: int, label_file: str = None) -> Any**

    **Purpose**: Loads a persisted graph and prepares it as model input. Used by DataLoader during training and inference.

    **Parameters**:

        - ``graph_file`` (str): Path to serialized graph file
        - ``data_id`` (int): Unique identifier for this data sample
        - ``label_file`` (str, optional): Path to label file (None during inference)

    **Returns**: Model-specific input format (e.g., DGL graph, PyTorch Geometric Data object)

    **Key Design Point**: This method is called multiple times by DatasetLoader, both during training (with labels) and inference (without labels). It must handle both cases gracefully.
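
    One way to handle both cases is to treat the label file as strictly optional. The following sketch assumes JSON files as a stand-in for serialized graphs; ``load_model_input`` and the dictionary layout are hypothetical and only mirror the shape of the method signature above.

    .. code-block:: python

        import json
        import os
        import tempfile
        from typing import Any, Dict, Optional


        def load_model_input(graph_file: str, data_id: int,
                             label_file: Optional[str] = None) -> Dict[str, Any]:
            """Hypothetical loader: JSON files stand in for serialized graphs."""
            with open(graph_file, "r", encoding="utf-8") as handle:
                sample: Dict[str, Any] = {"graph": json.load(handle), "data_id": data_id}
            # Labels exist during training only; inference passes None.
            if label_file is not None:
                with open(label_file, "r", encoding="utf-8") as handle:
                    sample["labels"] = json.load(handle)
            return sample


        with tempfile.TemporaryDirectory() as tmp:
            graph_path = os.path.join(tmp, "part.json")
            label_path = os.path.join(tmp, "part_labels.json")
            with open(graph_path, "w", encoding="utf-8") as handle:
                json.dump({"edges": [[0, 1]]}, handle)
            with open(label_path, "w", encoding="utf-8") as handle:
                json.dump([1, 0], handle)

            train_sample = load_model_input(graph_path, 7, label_path)
            infer_sample = load_model_input(graph_path, 7)

        print("labels" in train_sample, "labels" in infer_sample)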

**collate_function(batch) -> Any**

    **Purpose**: Combines multiple graph samples into a single batched input for the model.

    **Parameters**:

        - ``batch``: List of samples returned by ``load_model_input_from_files``

    **Returns**: Batched model input (framework-specific format)

    **Framework Examples**:

        - **DGL**: Use ``dgl.batch(graphs)`` to create a batched graph
        - **PyTorch Geometric**: Use ``Batch.from_data_list(data_list)``
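
    Conceptually, graph batching merges several graphs into one larger graph with disconnected components. The plain-Python sketch below shows that idea by offsetting node ids; the ``collate`` helper and the dictionary layout are hypothetical, and a real implementation would normally call the framework function listed above.

    .. code-block:: python

        from typing import Dict, List, Tuple

        Graph = Dict[str, list]


        def collate(batch: List[Graph]) -> Graph:
            """Merge graphs into one disjoint graph by offsetting node ids."""
            edges: List[Tuple[int, int]] = []
            features: List[List[float]] = []
            graph_ids: List[int] = []
            offset = 0
            for graph_id, graph in enumerate(batch):
                edges.extend((src + offset, dst + offset) for src, dst in graph["edges"])
                features.extend(graph["features"])
                # Remember which graph each node came from.
                graph_ids.extend([graph_id] * len(graph["features"]))
                offset += len(graph["features"])
            return {"edges": edges, "features": features, "graph_ids": graph_ids}


        g0 = {"edges": [(0, 1)], "features": [[1.0], [2.0]]}
        g1 = {"edges": [(0, 1), (1, 2)], "features": [[3.0], [4.0], [5.0]]}
        batched = collate([g0, g1])
        print(batched["edges"])
        print(batched["graph_ids"])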

Abstract Methods - Model Interface
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**retrieve_model(check_point: str = None) -> pl.LightningModule**

    **Purpose**: Returns the PyTorch Lightning model instance, optionally loaded from a checkpoint.

    **Parameters**:

        - ``check_point`` (str, optional): Path to saved model checkpoint

    **Returns**: PyTorch Lightning module ready for training or inference

**predict_and_postprocess(batch) -> Any**

    **Purpose**: Runs model inference on a batch and formats the output into interpretable predictions.

    **Parameters**:

        - ``batch``: Batched model input from ``collate_function``

    **Returns**: Post-processed predictions (e.g., class labels, probabilities, segmentation masks)

    **Typical Workflow**:

        1. Set model to eval mode
        2. Disable gradient computation
        3. Forward pass through model
        4. Apply softmax/argmax for classification
        5. Convert to numpy/lists for downstream use
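
    The softmax/argmax step can be sketched in plain Python. This is an illustration only: the ``postprocess`` helper, the made-up logits, and the ``predictions``/``confidences`` keys are assumptions, and the real method operates on the model's tensors.

    .. code-block:: python

        import math
        from typing import Dict, List


        def softmax(logits: List[float]) -> List[float]:
            # Subtract the maximum first for numerical stability.
            peak = max(logits)
            exps = [math.exp(value - peak) for value in logits]
            total = sum(exps)
            return [value / total for value in exps]


        def postprocess(batch_logits: List[List[float]]) -> Dict[str, list]:
            """Hypothetical post-processing: class ids plus their probabilities."""
            probabilities = [softmax(row) for row in batch_logits]
            predictions = [row.index(max(row)) for row in probabilities]
            confidences = [max(row) for row in probabilities]
            return {"predictions": predictions, "confidences": confidences}


        # Two faces, three classes (made-up logits).
        result = postprocess([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
        print(result["predictions"])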

**model_name() -> str**

    **Purpose**: Returns a human-readable name for the model.

**get_citation_info() -> Dict[str, Any]**

    **Purpose**: Provides citation information for the underlying ML architecture.

    **Returns**: Dictionary with keys: ``author``, ``paper``, ``year``, ``url``, ``architecture``, ``applications``

**metrics() -> MetricStorage**

    **Purpose**: Returns the metric storage object containing training/validation metrics.

    **Returns**: MetricStorage instance with logged metrics (loss, accuracy, etc.)

FlowModel Method Summary
~~~~~~~~~~~~~~~~~~~~~~~~~

FlowModel declares methods for each pipeline stage:

    **Data Processing Methods:**

        - :meth:`encode_cad_data(cad_file, cad_access, storage)<hoops_ai.ml.flow_model.FlowModel.encode_cad_data>` - Extracts geometric features from CAD files
        - :meth:`encode_label_data(label_storage, storage)<hoops_ai.ml.flow_model.FlowModel.encode_label_data>` - Processes label information for supervised learning
        - :meth:`convert_encoded_data_to_graph(storage, graph, filename)<hoops_ai.ml.flow_model.FlowModel.convert_encoded_data_to_graph>` - Builds graph structures from features
        - :meth:`load_model_input_from_files(graph_file, data_id, label_file)<hoops_ai.ml.flow_model.FlowModel.load_model_input_from_files>` - Loads graph data for batching
        - :meth:`collate_function(batch) <hoops_ai.ml.flow_model.FlowModel.collate_function>` - Combines multiple samples into batches

    **Model Interface Methods:**

        - :meth:`retrieve_model(check_point)<hoops_ai.ml.flow_model.FlowModel.retrieve_model>` - Returns the PyTorch Lightning model for training or inference
        - :meth:`predict_and_postprocess(batch) <hoops_ai.ml.flow_model.FlowModel.predict_and_postprocess>` - Runs the model and formats its predictions
        - :meth:`metrics() <hoops_ai.ml.flow_model.FlowModel.metrics>` - Returns metric storage for tracking training progress

    **Utility Methods:**

        - :meth:`model_name() <hoops_ai.ml.flow_model.FlowModel.model_name>` - Returns the model's name
        - :meth:`get_citation_info() <hoops_ai.ml.flow_model.FlowModel.get_citation_info>` - Provides citation information for the model

.. note::
   **Experimental Status**: The FlowModel architecture is currently **EXPERIMENTAL**. It primarily focuses on fitting PyTorch Lightning model wrappers into a standardized interface. The API may change in future releases based on evolving requirements, performance optimization needs, and community feedback.

Available FlowModel Implementations
------------------------------------

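HOOPS AI provides three concrete FlowModel implementations. The two task-specific ones are described below; CustomFlowModel is covered in :doc:`/programming_guide/cad-data-encoding`.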
   
**1. GraphClassification - Graph-Level Classifier**

    **Use Case**: Graph-level classification (e.g., part classification, shape categorization)

    **Key Features**:

        - Whole-model classification
        - 2D CNNs on face discretization samples
        - 1D CNNs on edge U-grids
        - Ideal for: Part type identification, shape similarity

    **Technology**: Based on a CNN+GNN architecture for learning from Boundary Representations

**2. GraphNodeClassification - Graph Node Classifier**

    **Use Case**: Node-level classification (e.g., machining feature recognition, face segmentation)

    **Key Features**:

        - Per-face classification
        - Transformer-based GNN architecture
        - Rich topological feature encoding
        - Ideal for: Feature recognition, semantic segmentation

    **Technology**: Based on a Transformer+GNN architecture with enhanced topological encoding

.. list-table:: FlowModel Comparison
   :header-rows: 1
   :widths: 25 25 25 25
   :align: center

   * - Implementation
     - Task Type
     - Output
     - Use When
   * - GraphClassification
     - Whole-model classification
     - 1 label per CAD model
     - Categorizing parts into classes
   * - GraphNodeClassification
     - Face-level segmentation
     - 1 label per face
     - Recognizing manufacturing features

.. seealso::
   **Implementation Details**
   
   For detailed information about each implementation including feature extraction, 
   architecture specifics, and complete code examples:
   
   - :doc:`/programming_guide/part-class` - GraphClassification for parts classification
   - :doc:`/programming_guide/feature-rec` - GraphNodeClassification for feature recognition
   - :doc:`/programming_guide/cad-data-encoding` - CustomFlowModel and custom feature extraction

.. _train-training-workflow:

FlowTrainer Interface
=========================

Understanding FlowTrainer
--------------------------

After you've chosen a FlowModel implementation (GraphClassification or GraphNodeClassification), the next step is training it on your data. :class:`FlowTrainer<hoops_ai.ml.flow_trainer.FlowTrainer>` orchestrates the entire training process, wrapping PyTorch Lightning to provide a high-level, easy-to-use interface.

What Does FlowTrainer Do?
~~~~~~~~~~~~~~~~~~~~~~~~~

Think of FlowTrainer as your training conductor. It handles all the technical complexity:

    1. **Batch Management**: Loads your preprocessed data in batches (groups of samples processed together)
    2. **Optimization Loop**: Runs the forward pass (prediction) → loss calculation → backward pass (gradient computation) → parameter update cycle
    3. **Validation**: Periodically evaluates the model on validation data to track generalization
    4. **Checkpointing**: Automatically saves model snapshots, including the best-performing version
    5. **Logging**: Records metrics (losses, accuracies, etc.) for later analysis via TensorBoard
    6. **Hardware Management**: Handles GPU/CPU placement, multi-GPU training, and mixed precision automatically

You don't need to write a PyTorch training loop or manage these details yourself; FlowTrainer handles them for you.

Basic usage looks like this:

.. code-block:: python

    from hoops_ai.ml import FlowTrainer
    from hoops_ai.ml import GraphNodeClassification
    from hoops_ai.dataset import DatasetLoader
    
    # Create your flow model
    flowmodel = GraphNodeClassification(
        num_classes=25,
        n_layers_encode=8,
        result_dir="./results"
    )
    
    # Create dataset loader
    dataset_loader = DatasetLoader(
        graph_files=["path/to/graphs/*.bin"],
        label_files=["path/to/labels/*.json"]
    )
    dataset_loader.split_data(train_ratio=0.7, val_ratio=0.15, test_ratio=0.15)
    
    # Create trainer
    trainer = FlowTrainer(
        flowmodel=flowmodel,
        datasetLoader=dataset_loader,
        batch_size=32,
        num_workers=4,
        experiment_name="machining_feature_recognition",
        accelerator='gpu',
        devices=1,
        gradient_clip_val=1.0,
        max_epochs=100,
        learning_rate=0.002,
        result_dir="./results"
    )
    
    # Start training
    best_checkpoint_path = trainer.train()
    print(f"Training complete! Best model: {best_checkpoint_path}")

Key Configuration Options
-------------------------

When creating a FlowTrainer, you configure how the training process will run. The trainer requires two essential components and offers many optional parameters for fine-tuning. For complete API details, see :class:`hoops_ai.ml.flow_trainer.FlowTrainer`.

Required Components
~~~~~~~~~~~~~~~~~~~~~~~~~

**Core Components**:

    - ``flowmodel`` (FlowModel): Initialized FlowModel implementation
    - ``datasetLoader`` (DatasetLoader): Dataset with train/val/test splits

Training Hyperparameters
~~~~~~~~~~~~~~~~~~~~~~~~~

**Training Configuration**:

    - ``batch_size`` (int): Samples per training batch (default: 64)
    - ``num_workers`` (int): DataLoader worker processes (default: 0)
    - ``experiment_name`` (str): Name for logging and checkpoints
    - ``accelerator`` (str): 'cpu', 'gpu', or 'tpu' (default: 'cpu')
    - ``devices`` (int or 'auto'): Number of devices to use (default: 'auto')
    - ``gradient_clip_val`` (float): Gradient clipping threshold (default: 1.0)
    - ``max_epochs`` (int): Maximum training epochs (default: 100)
    - ``learning_rate`` (float): Initial learning rate (default: 0.002)
    - ``result_dir`` (str): Output directory for results
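
To make ``gradient_clip_val`` more concrete, the sketch below shows norm-based gradient clipping in plain Python. It assumes FlowTrainer forwards the value to PyTorch Lightning, whose ``Trainer`` clips by gradient norm by default; the ``clip_by_norm`` helper is hypothetical and works on a flat list of numbers rather than on model parameters.

.. code-block:: python

    import math
    from typing import List


    def clip_by_norm(gradients: List[float], max_norm: float) -> List[float]:
        """Rescale gradients so their L2 norm does not exceed max_norm."""
        norm = math.sqrt(sum(g * g for g in gradients))
        if norm <= max_norm:
            return list(gradients)
        scale = max_norm / norm
        return [g * scale for g in gradients]


    clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)    # original norm is 5.0
    untouched = clip_by_norm([0.3, 0.4], max_norm=1.0)  # norm 0.5, left as-is
    print(clipped, untouched)

Clipping keeps the gradient direction and only limits its length, which helps prevent a single bad batch from destabilizing training.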

Core Training Methods
-------------------------

train() -> str
~~~~~~~~~~~~~~~~

Executes the full training loop and returns the path to the best checkpoint.

**Returns**: ``str`` - Path to best model checkpoint

**Workflow**:

    1. Initialize model from ``flowmodel.retrieve_model()``
    2. Create DataLoaders for train/val/test datasets
    3. Configure PyTorch Lightning Trainer with callbacks and loggers
    4. Execute training loop with automatic validation
    5. Save best checkpoint based on validation loss
    6. Log metrics via TensorBoard
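
Step 5 can be pictured as keeping the checkpoint from the epoch with the lowest validation loss. The following sketch uses made-up loss values and a hypothetical ``epoch=N.ckpt`` naming scheme; in practice the checkpoint callback configured by the trainer does this bookkeeping.

.. code-block:: python

    from typing import List, Optional, Tuple


    def pick_best_checkpoint(val_losses: List[float]) -> Tuple[Optional[str], float]:
        """Return the (hypothetical) checkpoint name with the lowest validation loss."""
        best_path: Optional[str] = None
        best_loss = float("inf")
        for epoch, loss in enumerate(val_losses):
            # Keep a checkpoint only when validation loss improves.
            if loss < best_loss:
                best_loss = loss
                best_path = f"epoch={epoch}.ckpt"
        return best_path, best_loss


    # Made-up validation losses: the model starts overfitting after epoch 2.
    best_path, best_loss = pick_best_checkpoint([0.90, 0.55, 0.41, 0.47, 0.52])
    print(best_path, best_loss)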

**Example**:

    .. code-block:: python

        best_checkpoint_path = trainer.train()
        print(f"Training complete! Best model: {best_checkpoint_path}")

test(trained_model_path: str)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Evaluates a trained model on the test set.

**Example**:

    .. code-block:: python

        trainer.test(trained_model_path=best_checkpoint_path)

purify(num_processes: int = 1, chunks_per_process: int = 1)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Validates dataset quality before training. It loads every sample and runs forward and backward passes on it to detect corrupted or problematic data and numerical errors (NaNs, infinite values, or crashes).

**Purpose**: ML training can fail due to corrupted graph files, mismatched dimensions, or encoding errors. This method proactively identifies problematic samples.
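
A much simpler version of this idea is a scan for non-finite values. The sketch below is not the ``purify`` implementation (it does not run forward or backward passes); ``find_bad_samples`` and the sample ids are hypothetical and only show the kind of problem being detected.

.. code-block:: python

    import math
    from typing import Dict, List


    def find_bad_samples(samples: Dict[str, List[float]]) -> List[str]:
        """Return ids of samples containing NaN or infinite feature values."""
        bad = []
        for sample_id, features in samples.items():
            if not all(math.isfinite(value) for value in features):
                bad.append(sample_id)
        return sorted(bad)


    # Made-up feature vectors keyed by a hypothetical sample id.
    samples = {
        "part_001": [0.1, 0.7, 1.3],
        "part_002": [0.4, float("nan"), 2.0],
        "part_003": [float("inf"), 0.0, 0.5],
    }
    bad_ids = find_bad_samples(samples)
    print(bad_ids)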

metrics_storage() -> MetricStorage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Returns the metric storage for this training run.

    **Returns**: ``MetricStorage`` - Storage object holding the metrics logged during the run (persisted as a ``.metrics`` file)

.. _train-inference-workflow:

FlowInference Interface
=======================

After training a model, you use it to make predictions on new CAD files that the model has not seen before. :class:`FlowInference<hoops_ai.ml.flow_inference.FlowInference>` handles this entire pipeline, ensuring that preprocessing, feature extraction, and prediction are consistent with what the model was trained on.

What Does FlowInference Do?
----------------------------

FlowInference is your inference engine. It loads a trained model checkpoint and uses it to predict labels for new CAD files. The process is simple:

    1. **Create** a FlowInference object with the same FlowModel used during training
    2. **Load** the trained model checkpoint from FlowTrainer
    3. **Preprocess** new CAD files (extract features, build graphs)
    4. **Predict** using the trained model

The critical guarantee: **preprocessing in inference must exactly match preprocessing in training**. FlowInference ensures this automatically by using the same FlowModel configuration.

Key Constructor Parameters
---------------------------

FlowInference has a minimal interface - only what's essential for inference.

**Required Parameters**:

    **cad_loader**: Must be the same loader type used during dataset creation. For example, if you used :class:`HOOPSLoader()<hoops_ai.cadaccess.hoops_exchange.hoops_access.HOOPSLoader>` during training, use it here too.

    **flowmodel**: Must be the **exact same FlowModel configuration** used during training:

        - Same task type (GraphClassification vs GraphNodeClassification)
        - Same ``num_classes``
        - Same preprocessing settings (if you customized any)

**Optional Parameters**:

    **log_file**: Errors during inference (e.g., corrupted CAD files) are logged here instead of crashing.
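
The log-and-continue behaviour can be sketched with the standard ``logging`` module. Everything here is illustrative: ``fake_preprocess`` is a hypothetical stand-in for encoding a CAD file, and an in-memory stream replaces the log file so the example has no side effects.

.. code-block:: python

    import io
    import logging
    from typing import Dict, List


    def fake_preprocess(path: str) -> List[float]:
        """Hypothetical stand-in for encoding one CAD file."""
        if path.endswith(".broken"):
            raise ValueError(f"cannot parse {path}")
        return [1.0, 2.0]


    log_stream = io.StringIO()  # stands in for the log file
    logger = logging.getLogger("inference_sketch")
    logger.setLevel(logging.ERROR)
    logger.propagate = False
    logger.addHandler(logging.StreamHandler(log_stream))

    results: Dict[str, List[float]] = {}
    for cad_path in ["a.step", "b.broken", "c.step"]:
        try:
            results[cad_path] = fake_preprocess(cad_path)
        except Exception as error:
            # Record the failure and continue with the next file.
            logger.error("failed on %s: %s", cad_path, error)

    print(sorted(results))
    print(log_stream.getvalue().strip())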

**Initialization**:

    .. code-block:: python

        from hoops_ai.ml import FlowInference
        from hoops_ai.ml import GraphNodeClassification
        from hoops_ai.cadaccess import HOOPSLoader
        
        # Initialize CAD loader
        cad_loader = HOOPSLoader()
        
        # Create the same flow model used during training
        flowmodel = GraphNodeClassification(
            num_classes=25,
            n_layers_encode=8,
            result_dir="./inference_results"
        )
        
        # Create inference pipeline
        inference = FlowInference(
            cad_loader=cad_loader,
            flowmodel=flowmodel,
            log_file='inference_errors.log'
        )
        
        # Load trained model
        inference.load_from_checkpoint("path/to/best.ckpt")

Core Inference Methods
-----------------------

load_from_checkpoint(checkpoint_path: str)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Loads a trained model from a checkpoint file.

    .. code-block:: python

        inference.load_from_checkpoint("path/to/best.ckpt")

    **Parameters**:

        - ``checkpoint_path`` (str): Path to the ``.ckpt`` file from training

    **What it does**:

        1. Loads the model architecture
        2. Restores trained weights from the checkpoint
        3. Sets the model to evaluation mode (disables dropout, batch norm updates)
        4. Moves model to CPU by default (you can modify to use GPU)

    **Important**: Must be called before ``preprocess()`` or ``predict_and_postprocess()``.

    **Example**:

        .. code-block:: python

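            from hoops_ai.cadaccess import HOOPSLoader
            # Assumes GraphClassification is exported from hoops_ai.ml,
            # like GraphNodeClassification in the other examples
            from hoops_ai.ml import FlowInference, GraphClassification

            inference = FlowInference(
                cad_loader=HOOPSLoader(),
                flowmodel=GraphClassification(num_classes=26)
            )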
            
            # Load the trained model
            inference.load_from_checkpoint("results/ml_output/exp/0128/143052/best.ckpt")
            
            # Now ready for inference
            batch = inference.preprocess("new_part.step")
            prediction = inference.predict_and_postprocess(batch)

preprocess(file_path: str) -> Dict[str, torch.Tensor]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Encodes a single CAD file into a model-ready input batch.

    **Parameters**:

        - ``file_path`` (str): Path to the CAD file (STEP, IGES, etc.)

    **Returns**: ``Dict[str, torch.Tensor]`` - Batched model input

    **What it does**:

        1. Parses CAD file using ``cad_loader``
        2. Extracts features using ``flowmodel.encode_cad_data()``
        3. Builds graph using ``flowmodel.convert_encoded_data_to_graph()``
        4. Batches the graph using ``flowmodel.collate_function()``
        5. Moves tensors to CPU
        6. Cleans up temporary files (graph is written to disk briefly)

    **Performance**: This is the slow step in inference (seconds per file). The model forward pass is fast (milliseconds).
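
    The temporary-file handling in step 6 follows a common pattern: write, read back, and always clean up. The sketch below shows that pattern with the standard library only; ``encode_via_temp_file`` and the JSON payload are hypothetical and do not reflect the file format FlowInference uses.

    .. code-block:: python

        import json
        import os
        import tempfile
        from typing import List, Tuple


        def encode_via_temp_file(features: List[float]) -> Tuple[List[float], str]:
            """Write features to a temporary file, read them back, then clean up."""
            file_descriptor, temp_path = tempfile.mkstemp(suffix=".json")
            try:
                with os.fdopen(file_descriptor, "w", encoding="utf-8") as handle:
                    json.dump(features, handle)
                with open(temp_path, "r", encoding="utf-8") as handle:
                    loaded = json.load(handle)
            finally:
                # Runs even if reading or writing fails, so no stale file remains.
                os.remove(temp_path)
            return loaded, temp_path


        loaded, temp_path = encode_via_temp_file([0.25, 0.5])
        print(loaded, os.path.exists(temp_path))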

    **Example**:

        .. code-block:: python

            batch = inference.preprocess("path/to/new_part.step")

predict_and_postprocess(batch: Dict[str, torch.Tensor]) -> np.ndarray
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Runs the model on a preprocessed batch and returns predictions.

    **Parameters**:

        - ``batch`` (Dict[str, torch.Tensor]): Output from ``preprocess()``

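    **Returns**: Post-processed predictions. The examples in this guide access the result by key, for instance ``predictions['predictions']`` or ``predictions['node_predictions']``, so the exact structure depends on the FlowModel in use.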

    **Performance**: Typically fast (milliseconds) compared with ``preprocess()``, because only the model forward pass and post-processing run here.

    **Example**:

        .. code-block:: python

            predictions = inference.predict_and_postprocess(batch)
            print(f"Predicted class: {predictions['predictions'][0]}")

Complete Inference Example
--------------------------------

Here's a complete example of using FlowInference to predict on a new CAD file:

.. code-block:: python

    from hoops_ai.ml import FlowInference, GraphNodeClassification
    from hoops_ai.cadaccess import HOOPSLoader
    import time

    # Setup
    cad_loader = HOOPSLoader()
    # Use the same FlowModel settings as during training
    flowmodel = GraphNodeClassification(num_classes=25, n_layers_encode=8)
    inference = FlowInference(cad_loader, flowmodel)
    inference.load_from_checkpoint("./trained_models/best.ckpt")

    # Inference on new CAD file
    start_time = time.time()
    batch = inference.preprocess("new_part.step")
    predictions = inference.predict_and_postprocess(batch)
    total_time = time.time() - start_time

    # Results
    print(f"Inference completed in {total_time:.2f} seconds")
    print(f"Face predictions: {predictions['node_predictions']}")
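
Per-face predictions are often easier to read as a summary. The following sketch counts faces per class with ``collections.Counter``; the ``CLASS_NAMES`` table and the input list are made up, and it assumes the per-face predictions can be converted to a plain list of integer class ids.

.. code-block:: python

    from collections import Counter
    from typing import Dict, List

    # Hypothetical class-id-to-name table; real ids depend on your training labels.
    CLASS_NAMES: Dict[int, str] = {0: "stock", 1: "hole", 2: "pocket"}


    def summarize_faces(node_predictions: List[int]) -> Dict[str, int]:
        """Count how many faces were assigned to each class."""
        counts = Counter(node_predictions)
        return {
            CLASS_NAMES.get(class_id, f"class_{class_id}"): count
            for class_id, count in sorted(counts.items())
        }


    summary = summarize_faces([0, 0, 1, 2, 2, 2, 1])
    print(summary)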


Next Steps
==========

Related Documentation
---------------------

- :doc:`/programming_guide/cad-data-encoding` - Module Access & Encoder Documentation
- :doc:`/programming_guide/datasets` - DataStorage Documentation
- :doc:`/programming_guide/cad-fundamentals` - CAD fundamentals and B-rep concepts
- :doc:`/api_ref/hoops_ai.ml` - Full API reference for ML module
- :doc:`/tutorials/index` - Complete examples and walkthroughs