Data Visualization Experience

Overview

After extracting features from CAD files and training models, you need to see what you’re working with. Which parts have complex geometry? Are your model’s predictions reasonable? How do different classes look visually? The Insights module answers these questions by bringing your CAD data to life.

The hoops_ai.insights module provides visualization tools that work throughout your entire workflow. Whether you’re exploring raw CAD files, inspecting encoded datasets, validating filtered results, or analyzing model predictions, these tools let you see what’s happening at every step.

Visualize at any stage:

During data exploration: Browse CAD files before encoding to understand your dataset

After encoding: Inspect generated features and verify data quality

While filtering: Visualize query results to confirm you’ve selected the right subset

During training: Check training/validation splits to ensure balanced distributions

After inference: Color-code predictions on 3D geometry to validate model behavior

The Insights module transforms your CAD analysis from abstract data pipelines into visual, interactive workflows.

Think of Insights as your visual dashboard for CAD analysis. Instead of staring at numbers in spreadsheets, you can:

Display grids of CAD previews to quickly scan through datasets

Open interactive 3D models directly in Jupyter notebooks

Visualize which parts match specific criteria (e.g., “show me all gears”)

Color-code model predictions on the actual 3D geometry

The module centers around three core tools:

DatasetViewer helps you explore datasets in bulk. After filtering with DatasetExplorer (e.g., “find all parts with >50 faces labeled ‘bracket’”), DatasetViewer shows you the results as image grids or interactive 3D viewers. You see what your queries return, not just file IDs.

CADViewer focuses on individual models. Load a single CAD file and interact with it in 3D - rotate, zoom, inspect. This is perfect for debugging: “Why did my model misclassify this part? Let me look at its geometry.”

ColorPalette utilities solve a common visualization problem: how to assign meaningful colors to predictions. If your model classifies 26 part types, ColorPalette generates 26 distinct colors automatically. If you want to group predictions (e.g., “color all fasteners blue, all housings red”), it handles that too.

Together, these tools transform abstract data pipelines into visual workflows you can understand at a glance.

DatasetViewer

Purpose

DatasetViewer is your window into dataset exploration. After running queries with DatasetExplorer (e.g., “find all parts labeled as ‘gear’ with more than 30 faces”), you get back a list of file IDs. But IDs don’t tell you much - you need to see the parts.

That’s where DatasetViewer comes in. It takes those query results and shows them to you as:

Image grids: Quick visual scans of 10, 25, or 100 parts at once

Interactive 3D viewers: Rotate and inspect specific models inline in Jupyter

Filtered comparisons: “Show me the 10 most complex gears vs. the 10 simplest”

DatasetViewer enables visualization of multiple CAD files from dataset queries. It’s designed to work seamlessly with DatasetExplorer, allowing you to:

Filter files based on criteria (labels, complexity, features)

Visualize the filtered results as image grids or interactive 3D views

Compare multiple parts side-by-side

Creating a DatasetViewer

The easiest way is to create it directly from an existing DatasetExplorer:

from hoops_ai.dataset import DatasetExplorer
from hoops_ai.insights import DatasetViewer

# Initialize explorer
explorer = DatasetExplorer(flow_output_file="fabwave.flow")

# Create viewer from explorer (automatically extracts visualization paths)
viewer = DatasetViewer.from_explorer(explorer)

# Check availability
viewer.print_statistics()

The from_explorer() method reads your .flow file and extracts three critical pieces of information:

PNG preview paths: Pre-rendered images of each CAD file (for grid displays)
3D cache paths: Optimized 3D geometry files (for interactive viewing)
File IDs and names: Mapping between your data and the visualization assets

When you call print_statistics(), you see coverage metrics:

==================================================
Dataset Visualization Statistics
==================================================
Total files:              234
Files with PNG preview:   234 (100.0%)
Files with 3D cache:      234 (100.0%)
Overall coverage:         100.0%
==================================================

This tells you that all 234 files in your dataset have both PNG previews and 3D cache files ready. If coverage is less than 100%, some visualizations might fail for files without cached assets.
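As a rough illustration of how coverage figures like these could be derived, the sketch below counts the non-missing entries in two path lists. The function name coverage_stats, the use of None for a missing asset, and the definition of overall coverage as "files that have both assets" are assumptions made for this example; they are not taken from the hoops_ai implementation.

```python
from typing import Optional, Sequence


def coverage_stats(
    png_paths: Sequence[Optional[str]],
    scs_paths: Sequence[Optional[str]],
) -> dict:
    """Compute illustrative coverage percentages for two asset lists.

    Assumption: a missing asset is represented by None, and "overall"
    means the share of files that have BOTH a PNG and a 3D cache.
    """
    if len(png_paths) != len(scs_paths):
        raise ValueError("path lists must have the same length")

    total = len(png_paths)
    with_png = sum(p is not None for p in png_paths)
    with_3d = sum(s is not None for s in scs_paths)
    with_both = sum(
        p is not None and s is not None for p, s in zip(png_paths, scs_paths)
    )

    def pct(count: int) -> float:
        # Avoid division by zero for an empty dataset.
        return 100.0 * count / total if total else 0.0

    return {
        "total_files": total,
        "png_percentage": pct(with_png),
        "3d_percentage": pct(with_3d),
        "overall_percentage": pct(with_both),  # hypothetical key
    }


# Four files: one has no PNG, two have no 3D cache.
stats = coverage_stats(
    ["a.png", "b.png", None, "d.png"],
    ["a.scs", None, None, "d.scs"],
)
print(f"PNG coverage: {stats['png_percentage']:.1f}%")
print(f"3D coverage:  {stats['3d_percentage']:.1f}%")
```

If a check like this reports less than 100%, it is worth finding out which files are missing assets before requesting grids or 3D viewers for them.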

You can also create a viewer manually if you already have the paths:

from hoops_ai.insights import DatasetViewer

# Extract data from explorer manually
cache_df = explorer.get_stream_cache_paths()

file_ids = cache_df['id'].astype(int).tolist()
png_paths = cache_df['stream_cache_png'].tolist()
scs_paths = cache_df['stream_cache_3d'].tolist()
file_names = cache_df['name'].tolist()

# Create viewer with explicit data
viewer = DatasetViewer(file_ids, png_paths, scs_paths, file_names)

This is useful when you want to visualize a subset of files or when working with custom data sources.

Visualizing Image Grids

Once you have a viewer, the most common task is displaying query results as image grids. Say you filtered your dataset and got back a list of file IDs - now you want to see what those parts actually look like.

Start by getting the list of available files:

# Get all file IDs
all_ids = viewer.get_available_file_ids()

# Display first 25 files as image grid
fig = viewer.show_preview_as_image(all_ids, k=25)

This creates a 5×5 grid showing PNG previews of 25 CAD files. The k parameter limits how many files to show (useful when you have thousands of results). The layout is automatic - DatasetViewer calculates the optimal grid dimensions based on how many files you’re displaying.
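To get a feel for what "automatic" grid dimensions can mean, here is a minimal sketch of one common heuristic: use the smallest near-square grid that fits all items. The function auto_grid is hypothetical, and DatasetViewer may choose its layout differently; the sketch only shows why 25 files naturally land in a 5×5 grid.

```python
import math


def auto_grid(n_items: int) -> tuple:
    """Return (rows, cols) for a near-square grid that fits n_items.

    Assumed heuristic: columns = ceil(sqrt(n)), rows = ceil(n / columns).
    Integer arithmetic is used to avoid floating-point rounding surprises.
    """
    if n_items <= 0:
        return (0, 0)
    cols = math.isqrt(n_items - 1) + 1   # ceil(sqrt(n)) for n >= 1
    rows = (n_items + cols - 1) // cols  # ceil(n / cols)
    return (rows, cols)


for n in (10, 25, 100):
    rows, cols = auto_grid(n)
    print(f"{n} files -> {rows} rows x {cols} columns")
```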

Want more control over the layout? Customize the grid:

# Custom 4-column grid with file names
fig = viewer.show_preview_as_image(
   all_ids,
   k=20,
   grid_cols=4,
   title="Dataset Preview",
   label_format='name',
   show_labels=True
)

Here’s what each parameter does:

file_ids: Which files to show (from your query results)

k: Maximum number of files (if you have 1000 results but only want to see 20)

grid_cols: Force a specific number of columns (otherwise auto-calculated)

figsize: Figure size in inches as (width, height) (otherwise auto-calculated)

show_labels: Whether to overlay file information on each image

label_format: Show file 'id', 'name', or 'both'

title: Overall title for the entire grid

missing_color: RGB color tuple for files without PNG previews (default: gray)

save_path: Automatically save the figure to a file path

The label format controls what text appears on each image. You have three options:

# Show only file IDs
viewer.show_preview_as_image(file_ids, label_format='id')
# Labels: "ID: 42", "ID: 87", ...

# Show only file names
viewer.show_preview_as_image(file_ids, label_format='name')
# Labels: "bracket.step", "housing.step", ...

# Show both ID and name
viewer.show_preview_as_image(file_ids, label_format='both')
# Labels: "ID:42\nbracket.step"
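The three label formats can be pictured as a small formatting function. The sketch below is an assumption-based illustration: format_label is a hypothetical helper, and the exact strings simply mirror the sample labels shown in the comments above.

```python
def format_label(file_id: int, name: str, label_format: str = "id") -> str:
    """Build the overlay text for one grid cell (illustrative only)."""
    if label_format == "id":
        return f"ID: {file_id}"
    if label_format == "name":
        return name
    if label_format == "both":
        # ID on the first line, file name on the second.
        return f"ID:{file_id}\n{name}"
    raise ValueError(f"unknown label_format: {label_format!r}")


for fmt in ("id", "name", "both"):
    print(repr(format_label(42, "bracket.step", fmt)))
```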

The returned fig is a Matplotlib figure object, so you can save it or customize it further:

# Create and save grid
fig = viewer.show_preview_as_image(
    file_ids,
    k=100,
    save_path='results/dataset_preview.png'
)

Interactive 3D Viewing

Image grids are great for quick overviews, but sometimes you need to inspect geometry in detail. That’s where 3D viewing comes in. The show_preview_as_3d() method creates interactive 3D viewers inline in your Jupyter notebook:

# Open 3 interactive 3D viewers (inline in notebook)
viewers_3d = viewer.show_preview_as_3d(file_ids, k=3)

# Each viewer is a CADViewer instance
print(f"Created {len(viewers_3d)} 3D viewers")

This displays 3 separate 3D viewers in your notebook, one for each file. You can rotate, zoom, and pan each model independently. The k parameter works the same way as in image grids - it limits how many viewers to create (you don’t want 100 3D viewers clogging your notebook).

Want bigger viewers or different layouts? Customize the display:

# Larger inline viewers
viewers_3d = viewer.show_preview_as_3d(
    file_ids,
    k=3,
    width=600,
    height=500
)

# Sidecar layout (opens in side panel) [AVAILABLE IN FUTURE RELEASES]
viewers_3d = viewer.show_preview_as_3d(
   file_ids,
   k=5,
   display_mode='sidecar'
)

Here’s what you can control:

file_ids: Which files to show in 3D

k: Maximum number of 3D viewers (default: 5 - be conservative, each viewer uses resources)

display_mode: 'inline' (in notebook), 'sidecar' (side panel), or 'none' (headless)

layout: 'sequential' (one per cell) or 'grid' (arranged in grid)

host: Server host address (default: '127.0.0.1')

start_port: Starting port for web servers (default: 8000, increments for each viewer)

silent: Suppress server output logs (default: True)

width: Viewer width in pixels (default: 400)

height: Viewer height in pixels (default: 400)

The method returns a list of CADViewer instances (one per file).

You can interact with these viewers programmatically:

# Get selected faces from user interaction
selected_faces = viewers_3d[0].get_selected_faces()
print(f"Selected faces: {selected_faces}")

# Color selected faces red
viewers_3d[0].set_face_color(selected_faces, [255, 0, 0])

# Clear all face colors
viewers_3d[0].clear_face_colors()

# Clean up when done
for v in viewers_3d:
    v.terminate()

When you’re done viewing, call terminate() on each viewer to shut down its web server and free up ports.

Side-By-Side Comparison (AVAILABLE IN FUTURE RELEASES)

You’ll be able to load two CAD files and view them in a split-screen layout, making it easy to spot differences and similarities.

# Compare two specific files
viewer_a, viewer_b = viewer.create_comparison_view(
   file_id_a=42,
   file_id_b=87,
   display_mode='sidecar'
)

# Highlight same features in both
viewer_a.set_face_color([1, 2, 3], [255, 0, 0])
viewer_b.set_face_color([1, 2, 3], [255, 0, 0])

Working with DatasetExplorer

The real power of DatasetViewer comes from combining it with DatasetExplorer filtering capabilities. You filter your dataset based on criteria (labels, complexity, geometry), then immediately visualize the results.

Here’s the typical workflow:

Filter -> Visualize

from hoops_ai.dataset import DatasetExplorer
from hoops_ai.insights import DatasetViewer

# Step 1: Initialize explorer and viewer
explorer = DatasetExplorer(flow_output_file="fabwave.flow")
viewer = DatasetViewer.from_explorer(explorer)

# Step 2: Define filter condition
high_complexity = lambda ds: ds['num_nodes'] > 30

# Step 3: Get file IDs matching condition
complex_file_ids = explorer.get_file_list(
    group="graph",
    where=high_complexity
)

print(f"Found {len(complex_file_ids)} complex files")

# Step 4: Visualize filtered results
fig = viewer.show_preview_as_image(
    complex_file_ids,
    title="High Complexity Parts",
    grid_cols=5
)

Let’s break this down. First, you create both an explorer and a viewer from the same .flow file. The explorer handles querying (find files matching criteria), while the viewer handles visualization (show me what they look like).

Then you define a filter condition using a lambda function. Here, high_complexity says “keep only files where the number of graph nodes exceeds 30.” The explorer applies this filter and returns matching file IDs.

Finally, you pass those IDs to the viewer, which displays them as an image grid. The result: you immediately see what “high complexity” parts look like in your dataset.
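If the role of the where callable is unclear, the following self-contained sketch mimics the idea with plain Python dictionaries. It is a simplification under stated assumptions: get_matching_ids and the records list are hypothetical, and the real explorer presumably evaluates the condition on array-like columns (the multi-criteria example combines conditions with &) rather than record by record.

```python
from typing import Callable, Dict, List

Record = Dict[str, int]


def get_matching_ids(
    records: List[Record], where: Callable[[Record], bool]
) -> List[int]:
    """Return the 'id' of every record for which the predicate is true."""
    return [rec["id"] for rec in records if where(rec)]


# Hypothetical per-file metadata standing in for the "graph" group.
records = [
    {"id": 1, "num_nodes": 12},
    {"id": 2, "num_nodes": 45},
    {"id": 3, "num_nodes": 31},
    {"id": 4, "num_nodes": 30},
]

# Same condition as in the workflow: strictly more than 30 nodes.
high_complexity = lambda rec: rec["num_nodes"] > 30

complex_ids = get_matching_ids(records, high_complexity)
print(f"Found {len(complex_ids)} complex files: {complex_ids}")
```

Note that the comparison is strict, so a file with exactly 30 nodes is not selected.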

Example: Filter by Label and Visualize

You can filter by any criterion available in your dataset. For example, filter by label:

# Filter files with specific label
pipe_fittings = lambda ds: ds['file_label'] == 15

pipe_file_ids = explorer.get_file_list(
   group="file",
   where=pipe_fittings
)

print(f"Found {len(pipe_file_ids)} pipe fittings")

# Show image grid
viewer.show_preview_as_image(
   pipe_file_ids,
   k=16,
   title='Pipe Fittings (Label 15)',
   label_format='name'
)

# Show 3D views of first 4
viewers_3d = viewer.show_preview_as_3d(pipe_file_ids, k=4)

Or combine multiple criteria:

Example: Multi-Criteria Filtering

# Complex query: high face count AND specific label
def complex_brackets(ds):
   return (ds['num_nodes'] > 25) & (ds['file_label'] == 3)

bracket_ids = explorer.get_file_list(group="graph", where=complex_brackets)

# Visualize results
viewer.show_preview_as_image(
   bracket_ids,
   k=20,
   title=f'Complex Brackets ({len(bracket_ids)} files)',
   grid_cols=4,
   save_path='results/complex_brackets.png'
)

This filter-then-visualize pattern is powerful. You’re not just looking at random parts; you’re seeing exactly the subset you care about, filtered by any combination of attributes in your dataset.

Helper Methods

DatasetViewer provides several utility methods for inspecting your data:

Get File Information

# Get info for specific file
file_info = viewer.get_file_info(42)
print(f"Name: {file_info['name']}")
print(f"PNG available: {file_info['png_path'] is not None}")
print(f"3D available: {file_info['stream_cache_path'] is not None}")

Get Available Files

# Get all file IDs with visualization data
available_ids = viewer.get_available_file_ids()
print(f"Total files with visualization: {len(available_ids)}")

Statistics

# Get detailed statistics
stats = viewer.get_statistics()
print(f"Total files: {stats['total_files']}")
print(f"PNG coverage: {stats['png_percentage']:.1f}%")
print(f"3D coverage: {stats['3d_percentage']:.1f}%")

# Pretty print
viewer.print_statistics()

The get_file_info() method returns a dictionary with file name and paths to visualization assets (PNG and 3D cache). Use it to check if a specific file has visualization data before trying to display it.

The get_available_file_ids() method gives you all file IDs that have at least some visualization data. This is useful for iterating over visualizable files.

The get_statistics() method returns coverage metrics (how many files have PNGs, how many have 3D caches). The print_statistics() method shows the same info in a nicely formatted table.

CADViewer

Purpose

While DatasetViewer handles bulk visualization, CADViewer focuses on individual models. It provides interactive 3D viewing with face coloring, selection, and manipulation. This is perfect for:

Detailed inspection of single CAD models (rotate, zoom, inspect geometry)

Visualizing ML predictions on 3D geometry (color each face by its predicted class)

Interactive feature highlighting (select faces, highlight regions)

Educational demonstrations (show specific geometric features)

The key difference: CADViewer gives you programmatic control. You can color specific faces, retrieve user selections, and update the view dynamically based on predictions or analysis results.

Loading a CAD File

The simplest way to view a CAD file is to create a viewer, load a file, and display it:

from hoops_ai.insights import CADViewer

# Create viewer (auto-finds free port)
viewer = CADViewer()

# Load CAD file
viewer.load_cad_file("bracket.step")

# Display in notebook
viewer.show()

The viewer starts a local web server (automatically finding an available port between 8000 and 8099), loads your CAD file, and embeds an interactive 3D viewer in your Jupyter notebook. You can now rotate, zoom, and pan the model.

Quick View (One-Liner)

For even simpler one-off viewing, use the quick_view() convenience function:

from hoops_ai.insights import quick_view

# Load and display in one call
viewer = quick_view("bracket.step")

This creates the viewer and loads the file in a single line. The returned viewer object is still fully functional: you can color faces, get selections, and so on.

Display Modes

CADViewer supports three display modes depending on your workflow:

# Inline display (embedded in notebook)
viewer = CADViewer(display_mode='inline')
viewer.load_cad_file("bracket.step")
viewer.show(width=600, height=500)

# Sidecar display (side panel)
viewer = CADViewer(display_mode='sidecar')
viewer.load_cad_file("bracket.step")
viewer.show()

# No display (server only)
viewer = CADViewer(display_mode='none')
viewer.load_cad_file("bracket.step")
print(f"Viewer URL: {viewer.get_viewer_url()}")

Inline mode (default) embeds the 3D viewer directly in your notebook output. This is great for documentation and sharing notebooks.
Sidecar mode opens the viewer in JupyterLab’s side panel, giving you a split-screen view. You can write code on one side while seeing the model on the other. This is ideal for interactive development. (Note: In classic Jupyter, this falls back to inline mode.)
None mode runs the server without displaying anything. You get the viewer URL and can open it in a separate browser tab. This is useful for debugging or when you want manual control over display.

Managing Ports

By default, CADViewer automatically finds an available port:

# Auto-find free port (default, recommended)
viewer = CADViewer()  # Finds port 8000-8099

The viewer scans ports 8000 through 8099 and picks the first available one. This prevents conflicts when running multiple viewers.
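A port scan of this kind can be sketched with the standard socket module. This is not CADViewer's actual code: find_free_port, port_is_free, and the injectable is_free parameter are names invented for this example, and the "first port that can be bound" strategy is an assumption based on the description above.

```python
import socket
from typing import Callable


def port_is_free(host: str, port: int) -> bool:
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def find_free_port(
    start: int = 8000,
    end: int = 8099,
    host: str = "127.0.0.1",
    is_free: Callable[[str, int], bool] = port_is_free,
) -> int:
    """Return the first port in [start, end] that passes the is_free check."""
    for port in range(start, end + 1):
        if is_free(host, port):
            return port
    raise RuntimeError(f"no free port in range {start}-{end}")


# Simulate ports 8000-8002 being taken, so the scan should land on 8003.
busy = {8000, 8001, 8002}
port = find_free_port(is_free=lambda host, p: p not in busy)
print(f"Selected port: {port}")
```

A check like port_is_free() is only a snapshot: another process could grab the port between the check and the server start, so real code should still handle a bind failure.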

If you need a specific port (e.g., for firewall rules), you can specify it:

# Use specific port (strict mode)
viewer = CADViewer(port=9000)  # Must be available or fails

In strict mode, if port 9000 is already in use, the viewer will fail rather than picking a different port. This ensures you always know which port is being used.

Automatic Cleanup

Viewers create web servers that need to be shut down when you’re done. The easiest way is using a context manager:

# Automatic resource cleanup
with CADViewer() as viewer:
    viewer.load_cad_file("model.step")
    viewer.show()
    # ... interact with viewer ...
# Automatically terminates on exit

When the with block exits, the viewer’s server is automatically terminated and the port is freed. This is the recommended approach for scripts and notebooks to avoid port leaks.
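The mechanism behind this is Python's context manager protocol. The sketch below uses a hypothetical DemoViewer class (not part of hoops_ai) to show how __exit__ can call terminate() so cleanup runs even if the block raises an exception.

```python
class DemoViewer:
    """Hypothetical stand-in for a viewer that owns a server and a port."""

    def __init__(self) -> None:
        self.is_active = True  # pretend the server has started

    def terminate(self) -> None:
        # A real viewer would stop its server and release the port here.
        self.is_active = False

    def __enter__(self) -> "DemoViewer":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.terminate()
        return False  # do not swallow exceptions raised inside the block


with DemoViewer() as v:
    print(f"Inside block: active={v.is_active}")
print(f"After block:  active={v.is_active}")
```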

Alternatively, you can manually terminate:

# Terminate viewer and release resources
viewer.terminate()

# Check if still active
print(f"Active: {viewer.is_active}")  # False

Coloring and Selecting Faces

The most powerful feature of CADViewer is the ability to color individual faces. This is essential for visualizing ML predictions, highlighting features, or showing analysis results.

Start by interacting with the viewer to select faces:

# User: Click faces in 3D viewer (Ctrl+Click for multiple)
# Get selected face indices
selected = viewer.get_selected_faces()
print(f"Selected {len(selected)} faces: {selected}")

# Color selected faces
viewer.set_face_color(selected, [255, 0, 0])  # Red
viewer.set_face_color(selected, [0, 255, 0])  # Green
viewer.set_face_color(selected, [0, 0, 255])  # Blue

# Default highlight color
viewer.set_face_color(selected)  # Light blue

The workflow: you click faces in the 3D viewer (hold Ctrl/Cmd to select multiple), then call get_selected_faces() to retrieve their indices. With those indices, you can color them using set_face_color().

Colors are specified as RGB lists: [red, green, blue] where each value is 0-255. If you don’t specify a color, it defaults to light blue for highlighting.

You can also color faces directly by index without user interaction:

# Color faces by index
hole_faces = [1, 2, 5, 7]
viewer.set_face_color(hole_faces, [255, 100, 0])

This is useful when you have predictions from a model or results from an analysis and already know which faces to color.

To reset and remove all coloring:

# Remove all face coloring
viewer.clear_face_colors()

This returns all faces to their default appearance.

Coloring Groups with Visual Feedback

When you have multiple groups of faces to color (like different feature types from a classifier), you can color them all at once with visual feedback:

# Define feature groups
groups = [
    ([1, 2, 6], (255, 0, 0), 'through hole'),
    ([3, 4], (0, 0, 255), 'blind hole'),
    ([8, 9, 10, 11], (0, 255, 0), 'pocket')
]

# Color with progress feedback
viewer.color_faces_by_groups(
    groups,
    delay=0.5,      # Delay between groups (seconds)
    verbose=True    # Show colored terminal output
)

Each group is a tuple of (face_indices, color, description). The color_faces_by_groups() method colors each group sequentially, with an optional delay between them so you can watch the colors appear. When verbose=True, you get colored terminal output:

🟥 through hole (3 faces)
🟦 blind hole (2 faces)
🟩 pocket (4 faces)

The delay is useful for presentations or debugging: you can watch each feature type get colored and verify the predictions make sense. For production use, set delay=0 to color everything instantly.

Loading Different File Formats

CAD Files (Auto-Convert to SCS)

# Automatically converts STEP/IGES to SCS format
viewer.load_cad_file("model.step", auto_convert=True)
viewer.load_cad_file("model.iges", auto_convert=True)

When you call load_cad_file() with auto_convert=True, HOOPS Exchange converts your STEP or IGES file to SCS format (HOOPS’ optimized streaming format) before loading. This conversion happens once and is cached.

SCS Files (Direct Loading)

If you already have an SCS file, load it directly for faster performance:

# Load pre-converted SCS file directly (faster)
viewer.load_scs_file("model.scs")

SCS files load almost instantly because they’re already in the viewer’s native format. If you’re loading the same model repeatedly, convert it once and reuse the SCS file.

Background Options

You can also control the background color:

# White background (default, good for presentations)
viewer.load_cad_file("model.step", white_background=True)

# Black background (optional)
viewer.load_cad_file("model.step", white_background=False)

White backgrounds work better for presentations and documentation (the default). Black backgrounds can reduce eye strain during long analysis sessions.

Checking Viewer Status

You can query the viewer’s current state at any time:

status = viewer.get_status()
print(f"Active: {status['active']}")
print(f"Model loaded: {status['model_loaded']}")
print(f"Viewer URL: {status['viewer_url']}")
print(f"Port: {status['port']}")

The status dictionary tells you whether the viewer’s server is running (active), whether a model is currently loaded (model_loaded), the URL to access the viewer (viewer_url), and which port it’s using (port).

This is useful for debugging connection issues or verifying the viewer is ready before proceeding with operations.

Validating Colors

Before passing colors to the viewer, you can validate they’re properly formatted RGB tuples:

from hoops_ai.insights import CADViewer

# Check if color is valid RGB
CADViewer.validate_color([255, 0, 0])   # True
CADViewer.validate_color([255, 0, 256]) # False (out of range)
CADViewer.validate_color([255, 0])      # False (wrong length)

Valid RGB colors are lists or tuples of exactly 3 integers, each in the range 0-255. The validate_color() method returns True if valid, False otherwise. This helps catch errors before they cause viewer issues.
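The rule stated above is easy to express in plain Python. The following sketch is an independent illustration, not the body of CADViewer.validate_color(); the name is_valid_rgb is hypothetical, and rejecting booleans is a choice made here, since the source does not say how the real method treats them.

```python
def is_valid_rgb(color) -> bool:
    """Check that color is a list/tuple of exactly 3 ints in 0-255."""
    if not isinstance(color, (list, tuple)) or len(color) != 3:
        return False
    return all(
        # bool is a subclass of int, so exclude it explicitly (assumption).
        isinstance(c, int) and not isinstance(c, bool) and 0 <= c <= 255
        for c in color
    )


for candidate in ([255, 0, 0], [255, 0, 256], [255, 0], (0, 128, 255)):
    print(candidate, "->", is_valid_rgb(candidate))
```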

Quick View Function

Convenience function for one-line visualization:

from hoops_ai.insights import quick_view

# Basic usage (auto-finds port)
viewer = quick_view("model.step")

# Inline display
viewer = quick_view("model.step", display_mode='inline')

# Sidecar display
viewer = quick_view("model.step", display_mode='sidecar')

# Specific port (strict mode)
viewer = quick_view("model.step", port=9000)

Visualization Utils

ColorPalette: Managing Label Colors

When you’re visualizing classification results, you need consistent colors for each class. ColorPalette solves this by managing the mapping between label IDs and their colors.

Create a palette from your label definitions:

from hoops_ai.insights.utils import ColorPalette

# Define label descriptions
labels = {
   0: "background",
   1: "through hole",
   2: "blind hole",
   3: "pocket",
   4: "slot"
}

# Create palette with automatic colors
palette = ColorPalette.from_labels(
   labels,
   cmap_name='hsv',  # Matplotlib colormap
   reserved_colors={
      0: (200, 200, 200),  # Gray for background
      1: (255, 0, 0)        # Red for through holes
   }
)
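To picture how reserved and automatically assigned colors can coexist in a call like this, here is a dependency-free sketch. It is not the ColorPalette implementation: the call above samples a Matplotlib colormap, whereas this example spaces hues evenly with the standard colorsys module, and build_color_map is a hypothetical helper.

```python
import colorsys
from typing import Dict, Optional, Tuple

RGB = Tuple[int, int, int]


def build_color_map(
    labels: Dict[int, str],
    reserved: Optional[Dict[int, RGB]] = None,
) -> Dict[int, RGB]:
    """Keep reserved colors and give every other label an evenly spaced hue."""
    reserved = reserved or {}
    colors = {lid: reserved[lid] for lid in labels if lid in reserved}
    free_ids = [lid for lid in sorted(labels) if lid not in reserved]
    for i, lid in enumerate(free_ids):
        # Sample hues at the midpoints of equal slices of the color wheel.
        hue = (i + 0.5) / len(free_ids)
        r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
        colors[lid] = (round(r * 255), round(g * 255), round(b * 255))
    return colors


labels = {0: "background", 1: "through hole", 2: "blind hole", 3: "pocket", 4: "slot"}
color_map = build_color_map(
    labels,
    reserved={0: (200, 200, 200), 1: (255, 0, 0)},
)
for lid in sorted(color_map):
    print(f"{lid}: {labels[lid]} -> {color_map[lid]}")
```

A simple scheme like this does not guarantee that an automatic color stays visually distinct from a reserved one, so it is worth checking the result when you reserve many colors.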

The from_labels() method creates a palette from your label dictionary. You pass reserved_colors for the classes whose colors you want to fix, and the remaining classes are automatically assigned distinct colors drawn from the Matplotlib colormap named by cmap_name.

In this example, you explicitly set colors for background (gray) and through holes (red). The “blind hole”, “pocket”, and “slot” classes (labels 2, 3, and 4) get auto-generated colors since you didn’t specify them.

Access Colors and Descriptions

Once you have a palette, use it to look up colors and descriptions:

# Get color for label
color = palette.get_color(1)  # (255, 0, 0)

# Get description
desc = palette.get_description(1)  # "through hole"
# Or use alias
label = palette.get_label(1)  # "through hole"

# Get all mappings
all_colors = palette.get_all_colors()
# {0: (200, 200, 200), 1: (255, 0, 0), ...}

all_descs = palette.get_all_descriptions()
# {0: "background", 1: "through hole", ...}

The palette acts as a lookup table: given a label ID, you get its color and description. The get_all_colors() and get_all_descriptions() methods return dictionaries with all mappings.

Palette Operations

You can also iterate over the palette:

# Check membership
1 in palette  # True

# Get size
len(palette)  # 5

# Iterate
for label_id in palette:
    color = palette.get_color(label_id)
    desc = palette.get_description(label_id)
    print(f"Label {label_id}: {desc} = {color}")

# Iterate with items
for label_id, (color, desc) in palette.items():
    print(f"{label_id}: {desc} -> {color}")

This makes it easy to generate legends, reports, or validation summaries showing all your label-color associations.
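As a small example of the legend use case, the sketch below turns label-to-color and label-to-description dictionaries (shaped like the ones returned by get_all_colors() and get_all_descriptions() above) into text lines with hex color codes. The helpers rgb_to_hex and legend_lines are hypothetical and not part of hoops_ai.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]


def rgb_to_hex(color: RGB) -> str:
    """Convert an (r, g, b) tuple with 0-255 components to '#rrggbb'."""
    r, g, b = color
    return f"#{r:02x}{g:02x}{b:02x}"


def legend_lines(colors: Dict[int, RGB], descriptions: Dict[int, str]) -> List[str]:
    """Build one 'id: description #hex' line per label, sorted by label ID."""
    return [
        f"{lid}: {descriptions[lid]} {rgb_to_hex(colors[lid])}"
        for lid in sorted(colors)
    ]


colors = {0: (200, 200, 200), 1: (255, 0, 0)}
descriptions = {0: "background", 1: "through hole"}
for line in legend_lines(colors, descriptions):
    print(line)
```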

Grouping Predictions for Visualization

After running a classifier on CAD faces, you get an array of predictions (one label per face). To visualize these predictions, you need to group faces by their predicted label and assign colors. The group_predictions_by_label() function does exactly this:

from hoops_ai.insights.utils import group_predictions_by_label
import numpy as np

# Predictions array (one per face)
predictions = np.array([0, 1, 1, 2, 0, 2, 1, 3, 3])
# Faces 0, 4: background; faces 1, 2, 6: through hole; faces 3, 5: blind hole; faces 7, 8: pocket

# Group by label with colors
groups = group_predictions_by_label(
   predictions,
   palette,
   exclude_labels={0}  # Skip background
)

# Result format: [(face_indices, color, description), ...]
# (colors for labels 2 and 3 are illustrative; auto-assigned colors depend on the colormap)
# [
#     ([1, 2, 6], (255, 0, 0), 'through hole'),
#     ([3, 5], (0, 0, 255), 'blind hole'),
#     ([7, 8], (0, 255, 0), 'pocket')
# ]

# Use directly with CADViewer
viewer.color_faces_by_groups(groups, verbose=True)

The function takes your prediction array and groups faces that have the same predicted label. For each group, it looks up the color and description from the palette. The result is a list of tuples ready to pass to color_faces_by_groups().

The exclude_labels parameter lets you skip certain labels (like background) that you don’t want to color. In this example, faces predicted as “background” (label 0) are excluded, so only features are colored.

This is the standard workflow for visualizing ML predictions:

1. Run inference to get predictions (array of label IDs)
2. Create a ColorPalette with your label descriptions and colors
3. Group predictions using group_predictions_by_label()
4. Color the CAD model using viewer.color_faces_by_groups()
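The grouping step at the heart of this workflow can be sketched in a few lines of plain Python. This is an illustration of the idea, not the source of group_predictions_by_label(): the function group_by_label is hypothetical, and it takes plain dictionaries instead of a ColorPalette so that the example runs on its own.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

RGB = Tuple[int, int, int]
Group = Tuple[List[int], RGB, str]


def group_by_label(
    predictions: Sequence[int],
    colors: Dict[int, RGB],
    descriptions: Dict[int, str],
    exclude_labels: Iterable[int] = (),
) -> List[Group]:
    """Group face indices by predicted label and attach color and description."""
    excluded = set(exclude_labels)
    faces_by_label = defaultdict(list)
    for face_index, label in enumerate(predictions):
        if label not in excluded:
            faces_by_label[label].append(face_index)
    # One (face_indices, color, description) tuple per label, in label order.
    return [
        (faces, colors[label], descriptions[label])
        for label, faces in sorted(faces_by_label.items())
    ]


predictions = [0, 1, 1, 2, 0, 2, 1, 3, 3]
colors = {0: (200, 200, 200), 1: (255, 0, 0), 2: (0, 0, 255), 3: (0, 255, 0)}
descriptions = {0: "background", 1: "through hole", 2: "blind hole", 3: "pocket"}

groups = group_by_label(predictions, colors, descriptions, exclude_labels={0})
for faces, color, desc in groups:
    print(f"{desc}: faces {faces} -> {color}")
```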

Complete Workflow Examples

Example 1: Dataset Exploration

This example shows the complete workflow for exploring a dataset: load it, check coverage, visualize random samples, filter by complexity, and view results in both 2D grids and 3D viewers.

from hoops_ai.dataset import DatasetExplorer
from hoops_ai.insights import DatasetViewer

# Initialize
explorer = DatasetExplorer(flow_output_file="fabwave.flow")
viewer = DatasetViewer.from_explorer(explorer)

# Print statistics
viewer.print_statistics()

# Get all files
all_ids = viewer.get_available_file_ids()

# Visualize random sample
import random
sample_ids = random.sample(all_ids, min(25, len(all_ids)))  # guard against datasets with fewer than 25 files
viewer.show_preview_as_image(sample_ids, title='Random Sample')

# Filter by complexity
complex_parts = lambda ds: ds['num_nodes'] > 40
complex_ids = explorer.get_file_list(group="graph", where=complex_parts)

# Visualize complex parts
viewer.show_preview_as_image(
   complex_ids,
   k=16,
   title=f'High Complexity Parts ({len(complex_ids)} files)',
   save_path='results/complex_parts.png'
)

# Interactive 3D view of first 3
viewers_3d = viewer.show_preview_as_3d(complex_ids, k=3, width=500, height=400)

# Cleanup
for v in viewers_3d:
   v.terminate()
explorer.close()

The workflow: First, print statistics to verify visualization coverage (are PNGs and 3D files available?). Then grab a random sample to get a feel for the dataset. Next, apply a complexity filter (more than 40 graph nodes) and visualize those results both as an image grid (saved to disk) and as interactive 3D viewers (for detailed inspection). Finally, clean up resources.

Example 2: Label-Based Filtering

This example demonstrates filtering by specific labels and creating high-resolution visualizations:

from hoops_ai.dataset import DatasetExplorer
from hoops_ai.insights import DatasetViewer

# Setup
explorer = DatasetExplorer(flow_output_file="fabwave.flow")
viewer = DatasetViewer.from_explorer(explorer)

# Get label descriptions
label_df = explorer.get_descriptions("file_label")
print(label_df)

# Filter by specific label
pipe_fittings = lambda ds: ds['file_label'] == 15
pipe_ids = explorer.get_file_list(group="file", where=pipe_fittings)

print(f"\nFound {len(pipe_ids)} pipe fittings")

# Create visualization
fig = viewer.show_preview_as_image(
   pipe_ids,
   k=25,
   grid_cols=5,
   title='Pipe Fittings (Label 15)',
   label_format='name',
   figsize=(15, 8)
)

# Save high-resolution version
fig.savefig('results/pipe_fittings_overview.png', dpi=300, bbox_inches='tight')

# Cleanup
explorer.close()

The workflow: First, examine available labels using get_descriptions() to understand what label 15 represents. Filter for that specific label. Create a visualization labeled with file names (label_format='name'), arranged in 5 columns. Save the result at high resolution (300 DPI) for presentation or publication.

Example 3: ML Prediction Visualization

This example shows the complete pipeline from model predictions to colored 3D visualization:

from hoops_ai.insights import CADViewer
from hoops_ai.insights.utils import ColorPalette, group_predictions_by_label
import numpy as np

# Load model predictions (example)
predictions = np.load('predictions.npy')  # Shape: (n_faces,)

# Define label palette
labels = {
   0: "no feature",
   17: "through hole",
   18: "blind hole",
   23: "pocket",
   24: "slot"
}

palette = ColorPalette.from_labels(
   labels,
   cmap_name='Set3',
   reserved_colors={
      0: (220, 220, 220),  # Light gray for no feature
      17: (255, 0, 0),      # Red for through holes
      18: (255, 165, 0)     # Orange for blind holes
   }
)

# Group predictions by label
groups = group_predictions_by_label(
   predictions,
   palette,
   exclude_labels={0}
)

# Visualize on 3D model
viewer = CADViewer(display_mode='sidecar')  # display mode is set on the constructor, as in Display Modes
viewer.load_cad_file("test_part.step")
viewer.show()

# Color faces by prediction
viewer.color_faces_by_groups(groups, delay=0.3, verbose=True)

# Get statistics
print("\nPrediction Distribution:")
for indices, color, desc in groups:
   print(f"  {desc}: {len(indices)} faces")

# Cleanup
viewer.terminate()

The workflow: Load predictions from your trained model (a NumPy array with one label per face). Create a ColorPalette mapping each label ID to a color and description. Use group_predictions_by_label() to organize faces by their predicted label, excluding background (label 0). Load the CAD model in a viewer, then color it using the grouped predictions. The verbose=True option shows colored terminal output as each feature type is colored. Finally, print statistics showing how many faces were predicted for each class.
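
To make the grouping step concrete, here is a minimal, library-free sketch of how per-face predictions could be organized by label. The function name group_faces_by_label and the plain-dict palette are illustrative assumptions, not the actual hoops_ai implementation; the only detail taken from the example above is the (indices, color, description) tuple shape.

```python
from collections import defaultdict


def group_faces_by_label(predictions, palette, exclude_labels=frozenset()):
    """Hypothetical stand-in for the grouping step.

    predictions: sequence with one integer label per face.
    palette: dict mapping label -> (rgb_tuple, description).
    Returns a list of (face_indices, rgb_tuple, description) tuples.
    """
    faces_by_label = defaultdict(list)
    for face_index, label in enumerate(predictions):
        if label not in exclude_labels:
            faces_by_label[label].append(face_index)

    groups = []
    for label in sorted(faces_by_label):
        color, description = palette[label]
        groups.append((faces_by_label[label], color, description))
    return groups


# Toy data: six faces, background (label 0) excluded from coloring
toy_predictions = [0, 17, 17, 0, 18, 17]
toy_palette = {
    0: ((220, 220, 220), "no feature"),
    17: ((255, 0, 0), "through hole"),
    18: ((255, 165, 0), "blind hole"),
}
toy_groups = group_faces_by_label(toy_predictions, toy_palette, exclude_labels={0})
for indices, color, desc in toy_groups:
    print(f"{desc}: faces {indices} -> RGB {color}")
```

Each tuple carries everything a viewer needs to color one feature type in a single pass, which is why the statistics loop in the example can simply call len(indices).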

Example 4: Side-by-Side Comparison

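from hoops_ai.dataset import DatasetExplorer
from hoops_ai.insights import DatasetViewer

# Setup explorer and viewer
explorer = DatasetExplorer(flow_output_file="dataset.flow")
viewer = DatasetViewer.from_explorer(explorer)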

# Compare original vs optimized design
viewer_original, viewer_optimized = viewer.create_comparison_view(
   file_id_a=100,  # Original design
   file_id_b=150,  # Optimized design
   display_mode='sidecar'
)

# Highlight same features in both
critical_faces = [5, 7, 12, 18]

viewer_original.set_face_color(critical_faces, [255, 0, 0])
viewer_optimized.set_face_color(critical_faces, [255, 0, 0])

# User can interact with both viewers simultaneously
# Compare geometry, analyze changes, etc.

# Cleanup
viewer_original.terminate()
viewer_optimized.terminate()

Example 5: Batch Processing with Visualization

import os

from hoops_ai.dataset import DatasetExplorer
from hoops_ai.insights import DatasetViewer
import matplotlib.pyplot as plt

# Initialize
explorer = DatasetExplorer(flow_output_file="dataset.flow")
viewer = DatasetViewer.from_explorer(explorer)

# Make sure the output directory exists before saving figures
os.makedirs('results', exist_ok=True)

# Get distribution of face counts
dist = explorer.create_distribution(
   key="num_nodes",
   bins=10,
   group="graph"
)

# Visualize distribution
bin_centers = 0.5 * (dist['bin_edges'][1:] + dist['bin_edges'][:-1])
plt.figure(figsize=(10, 5))
plt.bar(bin_centers, dist['hist'], width=(dist['bin_edges'][1] - dist['bin_edges'][0]))
plt.xlabel('Number of Faces')
plt.ylabel('Count')
plt.title('Face Count Distribution')
plt.savefig('results/face_count_distribution.png', dpi=300)
plt.show()

# Visualize samples from each bin
for i, bin_files in enumerate(dist['file_id_codes_in_bins']):
   if len(bin_files) > 0:
      # Take up to 9 samples from this bin
      sample_ids = bin_files[:9]

      # Create visualization
      fig = viewer.show_preview_as_image(
            sample_ids,
            k=9,
            grid_cols=3,
            title=f'Bin {i+1}: {int(dist["bin_edges"][i])}-{int(dist["bin_edges"][i+1])} faces',
            figsize=(9, 9)
      )

      # Save
      fig.savefig(f'results/bin_{i+1}_samples.png', dpi=150)
      plt.close(fig)

print("Batch visualization complete!")
explorer.close()

Best Practices

Performance Tips

When working with large datasets or many CAD files, performance becomes critical. Here’s how to keep your visualization workflows fast and responsive.

Limit 3D Viewers: Opening many 3D viewers consumes significant system resources because each viewer runs a separate server process and maintains an active web session. Keep the number reasonable:

# Good: Limit to 3-5 viewers
viewers = viewer.show_preview_as_3d(file_ids, k=3)

# Avoid: Too many simultaneous 3D viewers
viewers = viewer.show_preview_as_3d(file_ids, k=50)  # May crash!

Why this matters: Each CADViewer instance spawns a hoops-viewer server process. With 50 viewers, you’d have 50 server processes competing for CPU and memory. Your system will slow to a crawl or run out of resources. Stick to 3-5 simultaneous viewers for responsive performance.

Use Image Grids for Overview: When you need to see many parts at once, image grids are vastly more efficient than 3D viewers:

# Efficiently preview 100 files
viewer.show_preview_as_image(file_ids, k=100)

Image grids load pre-rendered PNGs, which are lightweight and fast. You can display 100+ parts in a single grid without the overhead of running server processes. Use this for quick overviews, then open 3D viewers for the specific parts you want to inspect in detail.
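
When a query returns far more IDs than fit comfortably in one grid, one option is to split the list into pages and render one grid per page. The sketch below uses only plain Python; paginate is a hypothetical helper name, not part of hoops_ai.

```python
def paginate(file_ids, page_size):
    """Yield consecutive slices of file_ids, each at most page_size long."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start in range(0, len(file_ids), page_size):
        yield file_ids[start:start + page_size]


all_ids = list(range(230))
pages = list(paginate(all_ids, page_size=100))
print([len(page) for page in pages])
```

Each page could then be passed to show_preview_as_image(page, k=len(page)), so no single figure grows beyond the chosen page size.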

Clean Up Resources: Always terminate 3D viewers when done. Each viewer holds system resources (ports, memory, CPU) until explicitly closed:

# Manual cleanup
for v in viewers_3d:
    v.terminate()

# Or use context manager
with CADViewer() as viewer:
    # ... use viewer ...
    pass  # Auto-cleanup

The context manager (with statement) is safer because it guarantees cleanup even if your code raises an exception. Manual cleanup with terminate() works but requires discipline: it’s easy to forget, especially during interactive experimentation in Jupyter notebooks.
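
When several viewers are open at once, the same guarantee can be obtained with contextlib.ExitStack from the standard library. The sketch below uses a placeholder class, FakeViewer, that only mimics terminate(); it is an assumption for illustration, and real viewer objects would take its place.

```python
from contextlib import ExitStack


class FakeViewer:
    """Placeholder for a viewer object; only mimics terminate()."""

    def __init__(self, name):
        self.name = name
        self.terminated = False

    def terminate(self):
        self.terminated = True


fake_viewers = [FakeViewer(f"viewer-{i}") for i in range(3)]

try:
    with ExitStack() as stack:
        for v in fake_viewers:
            # Register terminate() so it runs when the block exits
            stack.callback(v.terminate)
        raise RuntimeError("simulated failure mid-session")
except RuntimeError:
    pass

print(all(v.terminated for v in fake_viewers))
```

The registered callbacks run in reverse order when the with block exits, whether it ends normally or through an exception.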

Filter Before Visualizing: Reduce your data before creating visualizations. Don’t visualize files that don’t have the required resources:

# Filter first
filtered_ids = viewer.filter_by_availability(
    all_ids,
    require_png=True
)

# Then visualize
viewer.show_preview_as_image(filtered_ids, k=25)

This prevents errors from missing files and avoids wasting time trying to display files that don’t exist. The filter_by_availability() method checks which files actually have PNG/SCS files available, so you only work with valid data.
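
Conceptually, an availability filter is an existence check on the preview files. The following sketch shows the idea with pathlib and a temporary directory; the ID-to-path mapping and the helper name filter_existing are assumptions for illustration and say nothing about how filter_by_availability() is implemented internally.

```python
import tempfile
from pathlib import Path


def filter_existing(file_ids, png_paths):
    """Keep only the IDs whose mapped PNG path exists on disk.

    png_paths: dict mapping file ID -> path (hypothetical layout).
    """
    return [fid for fid in file_ids
            if fid in png_paths and Path(png_paths[fid]).is_file()]


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    # Create previews for IDs 1 and 3 only; ID 2 is "missing"
    demo_paths = {fid: tmp_dir / f"{fid}.png" for fid in (1, 2, 3)}
    for fid in (1, 3):
        demo_paths[fid].write_bytes(b"")
    requested = [1, 2, 3, 4]
    available_ids = filter_existing(requested, demo_paths)
    coverage = 100.0 * len(available_ids) / len(requested)

print(available_ids, f"{coverage:.1f}%")
```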

Color Scheme Guidelines

Choosing the right colors for your visualizations isn’t just aesthetic; it affects how quickly you can interpret results and spot patterns in your data.

Use Reserved Colors for Important Labels: Certain labels deserve specific colors for consistency and clarity:

palette = ColorPalette.from_labels(
    labels,
    reserved_colors={
        0: (200, 200, 200),  # Gray for background/no-label
        1: (255, 0, 0)        # Red for critical features
    }
)

The reserved colors ensure that label 0 (typically background or “no feature”) always appears gray, and label 1 (perhaps a critical feature type) always appears red. This consistency helps you quickly scan visualizations: you immediately know “gray = background, red = important thing to check.”
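
The idea of reserving some colors and generating the rest can be sketched with the standard-library colorsys module. This is an illustration of the concept only: build_palette is a hypothetical function, and the evenly spaced hues are an assumption rather than the scheme ColorPalette actually uses.

```python
import colorsys


def build_palette(labels, reserved_colors=None):
    """Assign an RGB color (0-255 ints) to every label ID.

    Reserved labels keep their fixed color; the remaining labels get
    evenly spaced hues. Illustration only, not ColorPalette itself.
    """
    reserved_colors = reserved_colors or {}
    # Only keep reserved entries that correspond to a known label
    palette = {lab: col for lab, col in reserved_colors.items() if lab in labels}
    free_labels = [lab for lab in sorted(labels) if lab not in palette]
    for i, lab in enumerate(free_labels):
        hue = i / max(len(free_labels), 1)
        r, g, b = colorsys.hsv_to_rgb(hue, 0.8, 0.9)
        palette[lab] = (round(r * 255), round(g * 255), round(b * 255))
    return palette


demo_labels = {0: "no feature", 1: "critical", 2: "pocket", 3: "slot"}
demo_palette = build_palette(
    demo_labels,
    reserved_colors={0: (200, 200, 200), 1: (255, 0, 0)},
)
for label_id in sorted(demo_palette):
    print(label_id, demo_labels[label_id], demo_palette[label_id])
```

Because reserved entries are copied first, adding or removing other labels never changes the color of the labels you care about most.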

Choose Appropriate Colormaps: Different colormaps work better for different types of data:

# Discrete labels (feature types): Use distinct colors
palette = ColorPalette.from_labels(labels, cmap_name='tab20')  # 20 distinct colors
# or 'Set3', 'Paired' for smaller label sets

# Sequential data (continuous values): Use gradients
palette = ColorPalette.from_labels(labels, cmap_name='viridis')  # Dark to bright
# or 'plasma', 'cividis' for perceptually uniform gradients

# Diverging data (values around a center): Use opposing colors
palette = ColorPalette.from_labels(labels, cmap_name='RdBu')  # Red-white-blue
# or 'coolwarm' for warm-to-cool transition

# Many labels (10+): Use full spectrum
palette = ColorPalette.from_labels(labels, cmap_name='hsv')  # Full hue range
# Warning: Colors may be hard to distinguish with many labels

The colormap choice affects how easily you can distinguish different labels. ‘tab20’ and ‘Paired’ give maximally distinct colors, making it easy to tell labels apart. ‘viridis’ and ‘plasma’ show smooth progressions (useful for representing continuous values discretized into bins). ‘RdBu’ and ‘coolwarm’ emphasize differences from a midpoint (useful for showing deviations). ‘hsv’ covers the full color spectrum but can be confusing with many labels.

Discrete labels: ‘tab20’, ‘Set3’, ‘Paired’

Sequential data: ‘viridis’, ‘plasma’, ‘cividis’

Diverging data: ‘RdBu’, ‘coolwarm’

Many labels: ‘hsv’ (but can be hard to distinguish)

Exclude Background from Visualization: Don’t waste colors on background faces. They clutter the visualization and make actual features harder to see:

groups = group_predictions_by_label(
    predictions,
    palette,
    exclude_labels={0}  # Don't color background
)

By excluding label 0 (background), those faces stay their default color (typically light gray from the base geometry). This keeps the focus on actual features. You immediately see which faces the model classified as features, while background faces fade into… well, the background.

Integration Patterns

The real power of the Insights module comes from combining its components in systematic workflows. Here are the three most common patterns you’ll use.

Pattern 1: Explore → Filter → Visualize

This is the fundamental pattern for dataset exploration. You start by understanding what’s in your dataset, filter down to interesting subsets, then visualize those subsets:

# 1. Explore dataset
explorer = DatasetExplorer(flow_output_file="dataset.flow")
explorer.print_table_of_contents()

# 2. Filter files
interesting_files = explorer.get_file_list(
   group="graph",
   where=lambda ds: ds['num_nodes'] > 30
)

# 3. Visualize
viewer = DatasetViewer.from_explorer(explorer)
viewer.show_preview_as_image(interesting_files, k=25)

The workflow: First, call print_table_of_contents() to see what data is available in your dataset: what groups exist, what attributes you can filter on, and how many files you have. This gives you the lay of the land.

Next, define a filter condition (here: parts with more than 30 graph nodes, indicating higher geometric complexity). The get_file_list() method applies this filter and returns matching file IDs.

Finally, create a viewer from the same explorer (this automatically extracts visualization paths) and display the filtered results as an image grid. You’ve gone from “I have 10,000 parts” to “here are the 150 complex ones, and this is what they look like.”

Pattern 2: Analyze → Sample → Inspect

When you want to understand distributions and examine representative samples from different ranges:

# 1. Analyze distribution
dist = explorer.create_distribution(key="num_nodes", bins=5)

# 2. Sample from specific bin
high_complexity_bin = dist['file_id_codes_in_bins'][-1]  # Last bin
sample = high_complexity_bin[:10]

# 3. Inspect in 3D
viewers = viewer.show_preview_as_3d(sample, k=3)

The workflow: Use create_distribution() to bin your data by some attribute (here: number of graph nodes). This tells you how your parts are distributed: do you have mostly simple parts with a few complex ones, or is the distribution even?

The distribution returns bins with the file IDs in each bin, so you can sample from specific bins. For example, the last bin contains the most complex parts. Take a sample from that bin (here: 10 files).

Then create 3D viewers for a few of those samples (here: 3 viewers) to visually inspect what “high complexity” actually looks like in your dataset. This helps you understand whether your filters are capturing what you want.
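
To see why the last bin holds the most complex parts, it can help to look at a toy version of the binning itself. The sketch below assumes equal-width bins and reuses the dictionary keys seen in the examples (hist, bin_edges, file_id_codes_in_bins); bin_files_by_value is a hypothetical helper, and the real create_distribution() may bin differently.

```python
def bin_files_by_value(values_by_file, bins):
    """Equal-width binning that also records which file IDs land in each bin.

    values_by_file: dict mapping file ID -> numeric value (e.g. node count).
    """
    lo = min(values_by_file.values())
    hi = max(values_by_file.values())
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values match
    edges = [lo + i * width for i in range(bins + 1)]
    ids_in_bins = [[] for _ in range(bins)]
    for file_id, value in values_by_file.items():
        # The maximum value belongs to the last bin, not a new one
        index = min(int((value - lo) / width), bins - 1)
        ids_in_bins[index].append(file_id)
    return {
        "hist": [len(ids) for ids in ids_in_bins],
        "bin_edges": edges,
        "file_id_codes_in_bins": ids_in_bins,
    }


node_counts = {101: 4, 102: 10, 103: 12, 104: 35, 105: 40, 106: 44}
demo_dist = bin_files_by_value(node_counts, bins=4)
most_complex = demo_dist["file_id_codes_in_bins"][-1]
print(demo_dist["hist"], most_complex)
```

Empty bins in the middle, as in this toy data, are a hint that the dataset splits into distinct complexity clusters worth inspecting separately.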

Pattern 3: Predict → Visualize → Validate

This is the ML prediction workflow: run your model, visualize results on 3D geometry, and validate predictions visually:

# 1. Run predictions (from ML model)
predictions = model.predict(test_data)

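# 2. Build a palette and group faces by predicted label
# (labels: dict mapping label ID -> description, as in Example 3)
palette = ColorPalette.from_labels(labels)
groups = group_predictions_by_label(predictions, palette)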

# 3. Visualize on geometry
cad_viewer = CADViewer()
cad_viewer.load_cad_file("test_part.step")
cad_viewer.show()
cad_viewer.color_faces_by_groups(groups)

# 4. Validate visually and correct if needed

The workflow: After training your model, run inference on test data to get predictions (one label per face). Create a ColorPalette mapping labels to colors, and use group_predictions_by_label() to organize faces by their predicted labels.

Now comes validation: visually inspect the colored model. Do the predictions make sense? Are holes colored red as expected? If you see something wrong, you can click faces to get their predictions and understand what the model is doing. This visual feedback is crucial for debugging ML models: you immediately see when the model confuses similar features or misses edge cases.
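
If ground-truth labels are available for the test part (an assumption; the workflow above relies on visual inspection alone), the faces worth looking at first can be computed before opening the viewer. This sketch uses only the standard library, and find_mismatches is a hypothetical helper name.

```python
from collections import Counter


def find_mismatches(predicted, expected):
    """Return mismatched face indices and a count per (expected, predicted) pair."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    wrong = [i for i, (p, e) in enumerate(zip(predicted, expected)) if p != e]
    pair_counts = Counter((expected[i], predicted[i]) for i in wrong)
    return wrong, pair_counts


predicted_labels = [0, 17, 18, 17, 0, 23]
expected_labels = [0, 17, 17, 17, 0, 24]
wrong_faces, confusions = find_mismatches(predicted_labels, expected_labels)
print(wrong_faces)
print(confusions.most_common())
```

The resulting index list could be highlighted in a single color, for instance with a call like the set_face_color() shown in Example 4, so that disagreements stand out against the rest of the model.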

Troubleshooting Common Issues

Port already in use:

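If you see errors about the port being unavailable, another process is already using that port. The simplest fix is to let CADViewer pick a free port automatically; alternatively, find and kill the process that holds the port (the commands below are for Windows):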

# Problem: Specified port is busy
viewer = CADViewer(port=8000)  # Error if port 8000 is busy

# Solution 1: Use auto port selection (recommended)
viewer = CADViewer()  # Auto-finds free port

# Solution 2: Find and kill process using port
# Windows PowerShell:
# netstat -ano | findstr :8000
# taskkill /F /PID <pid>

3D viewer not displaying:

If the viewer window doesn’t appear or you see a blank iframe, check if hoops-viewer is available in your environment:

# Check if hoops-viewer is installed
from hoops_ai.insights.hoops_viewer_interface import is_viewer_available

if not is_viewer_available():
   print("Install hoops-viewer: pip install hoops-viewer")
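
If the helper above cannot be imported in your environment, a generic way to check whether a package is installed is importlib.util.find_spec from the standard library. The module name "hoops_viewer" used below is a guess at the import name and should be replaced with the real one.

```python
import importlib.util


def is_package_installed(module_name):
    """Return True if module_name can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# "hoops_viewer" is a guessed import name; substitute the real one
for name in ("json", "hoops_viewer"):
    status = "found" if is_package_installed(name) else "missing"
    print(f"{name}: {status}")
```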

Missing PNG or SCS files:

When using DatasetViewer with files from a dataset, you might encounter missing files if the dataset creation didn’t generate all expected outputs. Filter your file list to only include files with available visualizations:

# Check availability
stats = viewer.get_statistics()
print(f"PNG coverage: {stats['png_percentage']:.1f}%")
print(f"3D coverage: {stats['3d_percentage']:.1f}%")

# Filter to available files only
available = viewer.filter_by_availability(
   all_ids,
   require_png=True,
   require_3d=True
)

This prevents errors from trying to display non-existent files. If most of your files are missing, check your dataset creation flow: the PNG and SCS conversion tasks might have failed or been skipped.

Image grid not displaying in Jupyter:

# Ensure the inline matplotlib backend is active (IPython magic, Jupyter only)
%matplotlib inline

import matplotlib.pyplot as plt
plt.ion()  # Interactive mode

# Then create visualization
fig = viewer.show_preview_as_image(file_ids)
plt.show()  # Explicitly show if needed

Summary

The Insights module provides a complete visualization solution for CAD datasets:

DatasetViewer

✅ Batch visualization of query results

✅ Image grids for quick overview

✅ Interactive 3D for detailed inspection

✅ Seamless DatasetExplorer integration

✅ Side-by-side comparison

CADViewer

✅ Interactive 3D viewing in notebooks

✅ Face coloring and selection

✅ ML prediction visualization

✅ Multiple display modes

✅ Automatic resource management

Visualization Utils

✅ ColorPalette for label-color management

✅ Automatic color generation

✅ Prediction grouping utilities

✅ Matplotlib colormap integration

Typical Workflows

DatasetExplorer → Filter Files → DatasetViewer → Image Grid / 3D Views
                                       ↓
                                 CADViewer → Face Coloring → Visual Analysis

The Insights module transforms data analysis into visual understanding, making it easy to explore large CAD datasets, validate ML predictions, and communicate findings effectively.

Next Steps

You now understand how to visualize CAD data using the Insights module. Here’s what to explore next:

Explore datasets - The Dataset Exploration and Mining guide shows you how to structure and query your CAD datasets. You’ll learn about the .flow format, how DatasetExplorer works, and what filtering operations are available. This is essential for understanding what data you can visualize.

Build ML workflows - Combine visualization with model training using the Develop Your own ML Model guide. You’ll see how to train classifiers on CAD features, run inference to generate predictions, and use the visualization tools from this guide to display results on 3D geometry.

Interactive analysis - The CAD Data Encoding guide explains how features are extracted from CAD files. Understanding the encoding process helps you interpret what your visualizations are showing and why certain faces get classified as specific feature types.

Hands-on examples - Try the complete tutorials in Tutorials to see end-to-end workflows combining data loading, model training, and result visualization.
