hoops_ai.insights.dataset_viewer

Dataset Viewer Module for HOOPS AI Insights.

This module provides the DatasetViewer class for visualizing collections of CAD files based on dataset queries. It bridges the gap between DatasetExplorer queries and visual analysis by creating collages of images or interactive 3D views.

Classes

DatasetViewer(file_ids, png_paths, scs_paths)

Powerful visualization tool for exploring CAD datasets.

class hoops_ai.insights.dataset_viewer.DatasetViewer(file_ids, png_paths, scs_paths, file_names=None)

Bases: object

Powerful visualization tool for exploring CAD datasets.

This class accepts lists of file IDs and their corresponding visualization paths so that CAD files can be visualized as either:

1. Image collages/grids using PNG previews
2. Interactive 3D views using stream cache files

The DatasetViewer is designed to work with data from DatasetExplorer but remains decoupled, accepting only the necessary lists for maximum flexibility.
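Conceptually, the constructor has to join the parallel lists into a single lookup keyed by file ID. The following is a minimal, hypothetical sketch of that step, not the library's actual code. `build_file_mapping` is an invented name, and both the length checks and the fallback to the stringified ID when `file_names` is omitted are assumptions. The per-file keys mirror the ones documented for `get_file_info`.

```python
from typing import Any, Dict, List, Optional


def build_file_mapping(
    file_ids: List[int],
    png_paths: List[Optional[str]],
    scs_paths: List[Optional[str]],
    file_names: Optional[List[str]] = None,
) -> Dict[int, Dict[str, Any]]:
    """Hypothetical sketch: zip parallel lists into a file_id -> info mapping."""
    # Assumption: mismatched list lengths are treated as a caller error.
    if not (len(file_ids) == len(png_paths) == len(scs_paths)):
        raise ValueError("file_ids, png_paths and scs_paths must have the same length")
    if file_names is not None and len(file_names) != len(file_ids):
        raise ValueError("file_names must match file_ids in length")
    # Assumption: fall back to the stringified ID when no names are given.
    names = file_names if file_names is not None else [str(i) for i in file_ids]
    mapping: Dict[int, Dict[str, Any]] = {}
    for fid, png, scs, name in zip(file_ids, png_paths, scs_paths, names):
        # None means "no preview of this kind is available" for that file.
        mapping[int(fid)] = {"name": name, "png_path": png, "stream_cache_path": scs}
    return mapping
```

Keying the mapping by `int` lets later lookups accept any ID that converts cleanly to an integer.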

Examples

>>> # Get data from explorer
>>> explorer = DatasetExplorer(flow_output_file="flow.json")
>>> cache_df = explorer.get_stream_cache_paths()
>>>
>>> # Extract lists
>>> file_ids = cache_df['id'].tolist()
>>> png_paths = cache_df['stream_cache_png'].tolist()
>>> scs_paths = cache_df['stream_cache_3d'].tolist()
>>> file_names = cache_df['name'].tolist()
>>>
>>> # Create viewer
>>> viewer = DatasetViewer(file_ids, png_paths, scs_paths, file_names)
>>>
>>> # Or use convenience method
>>> viewer = DatasetViewer.from_explorer(explorer)
>>>
>>> # Query files and visualize
>>> query_ids = explorer.get_file_list(group="graph", where=lambda ds: ds['num_nodes'] > 30)
>>> viewer.show_preview_as_image(query_ids, k=25)

Parameters:

file_ids (List[int])
png_paths (List[str | None])
scs_paths (List[str | None])
file_names (List[str] | None)

create_comparison_view(file_id_a, file_id_b, display_mode='sidecar', host='127.0.0.1', silent=True)

Create side-by-side comparison of two files.

This is useful for comparing similar parts, analyzing variations, or examining before/after scenarios.

Parameters:

file_id_a (int) – First file ID to compare
file_id_b (int) – Second file ID to compare
display_mode (str) – Display mode for viewers (default: ‘sidecar’)
host (str) – Host address for viewer servers (default: ‘127.0.0.1’)
silent (bool) – Whether to suppress viewer server output (default: True)

Returns:

Tuple of (viewer_a, viewer_b), or (None, None) if the viewers could not be created

Return type:

Tuple[CADViewer | None, CADViewer | None]

Examples

>>> # Compare two files side by side
>>> viewer_a, viewer_b = viewer.create_comparison_view(42, 87)
>>>
>>> # Highlight same features in both
>>> viewer_a.set_face_color([1, 2, 3], [255, 0, 0])
>>> viewer_b.set_face_color([1, 2, 3], [255, 0, 0])
>>>
>>> # Clean up
>>> viewer_a.terminate()
>>> viewer_b.terminate()
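Because `create_comparison_view` can return (None, None) and each viewer runs its own server, cleanup should skip missing viewers and still reach every viewer even if one `terminate()` call fails. The following is a minimal sketch using `contextlib.ExitStack`. `terminate_all` is a hypothetical helper, not part of hoops_ai, and it assumes only the `terminate()` method used in the example above.

```python
from contextlib import ExitStack
from typing import Iterable, Optional, Protocol


class Terminable(Protocol):
    """Anything exposing terminate(), such as the viewers in the example above."""

    def terminate(self) -> None: ...


def terminate_all(viewers: Iterable[Optional[Terminable]]) -> int:
    """Hypothetical helper: terminate every non-None viewer.

    ExitStack runs all registered callbacks on exit even if one raises,
    then re-raises the exception. Returns how many viewers were terminated.
    """
    count = 0
    with ExitStack() as stack:
        for v in viewers:
            if v is not None:  # a failed comparison yields (None, None)
                stack.callback(v.terminate)
                count += 1
    return count
```

For example, `terminate_all([viewer_a, viewer_b])` replaces the two explicit `terminate()` calls and is safe to run even when creation failed.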

filter_by_availability(file_ids, require_png=False, require_3d=False)

Filter file IDs based on visualization data availability.

This is useful to ensure you only try to visualize files that have the necessary visualization data available.

Parameters:

file_ids (List[int]) – List of file IDs to filter
require_png (bool) – If True, only return IDs with PNG previews (default: False)
require_3d (bool) – If True, only return IDs with 3D stream cache (default: False)

Returns:

Filtered list of file IDs

Return type:

List[int]

Examples

>>> # Get files that have PNG previews
>>> files_with_images = viewer.filter_by_availability(
...     all_file_ids,
...     require_png=True
... )
>>>
>>> # Get files that have both PNG and 3D
>>> fully_visualizable = viewer.filter_by_availability(
...     all_file_ids,
...     require_png=True,
...     require_3d=True
... )
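The filtering rule can be sketched as a pass over a file_id -> info mapping. This is a hypothetical re-implementation for illustration only. The mapping shape reuses the `png_path` and `stream_cache_path` keys documented for `get_file_info`, and dropping IDs that are not in the mapping at all is an assumption.

```python
from typing import Any, Dict, List, Optional


def filter_by_availability(
    mapping: Dict[int, Dict[str, Optional[str]]],
    file_ids: List[Any],
    require_png: bool = False,
    require_3d: bool = False,
) -> List[int]:
    """Hypothetical sketch: keep IDs whose required previews exist."""
    kept: List[int] = []
    for raw_id in file_ids:
        fid = int(raw_id)
        info = mapping.get(fid)
        if info is None:  # assumption: unknown IDs are dropped
            continue
        if require_png and info.get("png_path") is None:
            continue
        if require_3d and info.get("stream_cache_path") is None:
            continue
        kept.append(fid)
    return kept
```

With both flags False, the function only removes IDs that have no entry in the mapping, which matches the documented defaults doing no availability filtering.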

classmethod from_explorer(explorer)

Convenience constructor to create DatasetViewer from a DatasetExplorer.

This method queries the explorer for stream cache paths and creates a DatasetViewer with the extracted data.

Parameters:

explorer (DatasetExplorer) – DatasetExplorer instance

Returns:

DatasetViewer instance

Return type:

DatasetViewer

Examples

>>> explorer = DatasetExplorer(flow_output_file="flow.json")
>>> viewer = DatasetViewer.from_explorer(explorer)
>>> viewer.print_statistics()

get_available_file_ids()

Get list of all file IDs that have visualization data available.

Returns:

List of file IDs with PNG or stream cache paths

Return type:

List[int]

Examples

>>> available_ids = viewer.get_available_file_ids()
>>> print(f"Files with visualization: {len(available_ids)}")

get_file_info(file_id)

Get visualization information for a specific file ID.

Parameters:

file_id (int) – File ID to query (int or convertible to int)

Returns:

Dictionary with ‘name’, ‘png_path’, ‘stream_cache_path’ or None if not found

Return type:

Dict[str, Any] | None

Examples

>>> info = viewer.get_file_info(42)
>>> print(f"File name: {info['name']}")
>>> print(f"PNG available: {info['png_path'] is not None}")
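Since `get_file_info` returns None for unknown IDs, callers should check the result before indexing it. The following is a minimal sketch of the lookup side under stated assumptions. `get_file_info` here is a standalone hypothetical function over a plain dict, and returning a copy (so callers cannot mutate internal state) is a design choice of this sketch, not documented behaviour.

```python
from typing import Any, Dict, Optional


def get_file_info(
    mapping: Dict[int, Dict[str, Any]], file_id: Any
) -> Optional[Dict[str, Any]]:
    """Hypothetical lookup: coerce file_id to int, return None when missing."""
    key = int(file_id)  # accepts ints, numpy integers, or numeric strings
    info = mapping.get(key)
    # Assumption: hand back a copy so the caller cannot edit the mapping.
    return dict(info) if info is not None else None


mapping = {42: {"name": "bracket", "png_path": "42.png", "stream_cache_path": None}}
info = get_file_info(mapping, 42)
if info is not None:  # guard against unknown IDs before indexing
    print(f"File name: {info['name']}")
```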

get_statistics()

Get statistics about available visualization data.

Returns:

Dictionary containing statistics about the dataset visualization data

Return type:

Dict[str, Any]

Examples

>>> stats = viewer.get_statistics()
>>> print(f"Total files: {stats['total_files']}")
>>> print(f"PNG available: {stats['files_with_png']}")
>>> print(f"3D cache available: {stats['files_with_3d']}")
>>> print(f"Coverage: {stats['coverage_percentage']:.1f}%")
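The statistics shown above can be derived from a file_id -> info mapping in a single pass. The following is a hypothetical sketch that produces the same keys as the example. The definition of coverage (files with a PNG *or* a stream cache, as a percentage of all files) is an assumption borrowed from the description of `get_available_file_ids`.

```python
from typing import Any, Dict, Optional


def compute_statistics(mapping: Dict[int, Dict[str, Optional[str]]]) -> Dict[str, Any]:
    """Hypothetical sketch of the statistics dictionary."""
    total = len(mapping)
    with_png = sum(1 for i in mapping.values() if i.get("png_path") is not None)
    with_3d = sum(1 for i in mapping.values() if i.get("stream_cache_path") is not None)
    # Assumption: a file counts as covered if it has either kind of preview.
    covered = sum(
        1
        for i in mapping.values()
        if i.get("png_path") is not None or i.get("stream_cache_path") is not None
    )
    coverage = 100.0 * covered / total if total else 0.0  # avoid division by zero
    return {
        "total_files": total,
        "files_with_png": with_png,
        "files_with_3d": with_3d,
        "coverage_percentage": coverage,
    }
```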

print_statistics()

Print formatted statistics about visualization data availability.

Examples

>>> viewer.print_statistics()
Dataset Visualization Statistics
═══════════════════════════════════════════════
Total files: 234
Files with PNG preview: 234 (100.0%)
Files with 3D cache: 234 (100.0%)
Overall coverage: 100.0%

Return type:

None
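The report can be rebuilt from the `get_statistics` dictionary. The following is a hypothetical formatter that reproduces the line layout of the sample output above. The rule width is an arbitrary choice here, and computing the per-kind percentages relative to `total_files` is an assumption.

```python
from typing import Any, Dict


def format_statistics(stats: Dict[str, Any], rule_width: int = 47) -> str:
    """Hypothetical sketch of the print_statistics() report text."""
    total = stats["total_files"]

    def pct(n: int) -> float:
        # Assumption: percentages are relative to the total file count.
        return 100.0 * n / total if total else 0.0

    png, scs = stats["files_with_png"], stats["files_with_3d"]
    lines = [
        "Dataset Visualization Statistics",
        "═" * rule_width,  # width chosen arbitrarily for this sketch
        f"Total files: {total}",
        f"Files with PNG preview: {png} ({pct(png):.1f}%)",
        f"Files with 3D cache: {scs} ({pct(scs):.1f}%)",
        f"Overall coverage: {stats['coverage_percentage']:.1f}%",
    ]
    return "\n".join(lines)
```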

refresh_mapping(file_ids, png_paths, scs_paths, file_names=None)

Refresh the internal file mapping with new data.

Use this method if the dataset has been updated or you want to update the visualization paths.

Parameters:

file_ids (List[int]) – List of file IDs
png_paths (List[str | None]) – List of PNG paths
scs_paths (List[str | None]) – List of SCS paths
file_names (List[str] | None) – Optional list of file names

Return type:

None

Examples

>>> # Update with new data
>>> viewer.refresh_mapping(new_ids, new_pngs, new_scs, new_names)
>>> print(f"Refreshed mapping contains {len(viewer._file_mapping)} files")

show_preview_as_3d(file_ids, k=5, display_mode='inline', layout='sequential', host='127.0.0.1', start_port=8000, silent=True, width=400, height=400)

Open interactive 3D viewers for file IDs using stream cache files.

This method creates CADViewer instances for each file, loading their 3D stream cache representations. Users can interact with the 3D models directly in the notebook.

Parameters:

file_ids (List[int]) – List of file IDs to visualize (ints, numpy array, or convertible to int)
k (int) – Maximum number of 3D viewers to open (default: 5)
display_mode (str) – Display mode - ‘inline’, ‘sidecar’, or ‘none’ (default: ‘inline’)
layout (str) – Layout strategy - ‘sequential’ or ‘grid’ (default: ‘sequential’). Note: ‘grid’ is not yet implemented and falls back to ‘sequential’.
host (str) – Host address for viewer servers (default: ‘127.0.0.1’)
start_port (int) – Starting port for viewer servers (default: 8000)
silent (bool) – Whether to suppress viewer server output (default: True)
width (int) – Width of inline viewer in pixels (default: 400)
height (int) – Height of inline viewer in pixels (default: 400)

Returns:

List of CADViewer instances (one per displayed file)

Return type:

List[CADViewer]

Examples

>>> # Open 3 compact inline 3D viewers (forms a grid-like layout)
>>> viewers = viewer.show_preview_as_3d(file_ids, k=3)
>>>
>>> # Open larger inline viewers
>>> viewers = viewer.show_preview_as_3d(
...     file_ids,
...     k=3,
...     width=600,
...     height=500
... )
>>>
>>> # Open viewers in sidecar layout (full size)
>>> viewers = viewer.show_preview_as_3d(
...     file_ids,
...     k=5,
...     display_mode='sidecar'
... )
>>>
>>> # Interact with specific viewer
>>> selected_faces = viewers[0].get_selected_faces()
>>> viewers[0].set_face_color(selected_faces, [255, 0, 0])
>>>
>>> # Clean up viewers when done
>>> for v in viewers:
...     v.terminate()
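Both `file_ids` (ints, a numpy array, or anything convertible to int) and the `k` cap are documented above. The following is a minimal sketch of that input handling. `normalize_ids` is a hypothetical helper, and keeping the first `k` IDs in input order is an assumption about which files get a viewer.

```python
from typing import Any, Iterable, List


def normalize_ids(file_ids: Iterable[Any], k: int) -> List[int]:
    """Hypothetical helper: coerce IDs to int and keep at most k, in order."""
    if k < 0:
        raise ValueError("k must be non-negative")
    ids: List[int] = []
    for raw in file_ids:
        if len(ids) >= k:  # stop early once k viewers' worth of IDs are collected
            break
        ids.append(int(raw))  # works for int, numpy integer, or numeric str
    return ids
```

Because each 3D viewer starts its own server, capping the list before any viewer is created keeps resource use bounded. Running `filter_by_availability(..., require_3d=True)` first ensures that the capped IDs all have stream cache data.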

show_preview_as_image(file_ids, k=25, grid_cols=6, figsize=(15, 5), show_labels=True, label_format='id', title=None, missing_color=(200, 200, 200), save_path=None)

Generate an image grid visualization from file IDs.

This method creates a matplotlib figure displaying PNG previews of CAD files in a grid layout, which makes it well suited to quickly inspecting query results.

Parameters:

file_ids (List[int]) – List of file IDs to visualize (ints, numpy array, or convertible to int)
k (int) – Maximum number of files to display (default: 25)
grid_cols (int | None) – Number of columns in grid. If None, auto-calculated (default: 6)
figsize (Tuple[int, int] | None) – Figure size as (width, height). If None, auto-calculated (default: (15, 5))
show_labels (bool) – Whether to show file labels on images (default: True)
label_format (str) – Label format - ‘id’, ‘name’, or ‘both’ (default: ‘id’)
title (str | None) – Overall figure title (default: None)
missing_color (Tuple[int, int, int]) – RGB color for files without PNG preview (default: gray)
save_path (str | None) – If provided, save the figure to this path (default: None)

Returns:

matplotlib Figure object

Return type:

matplotlib.pyplot.Figure

Examples

>>> # Simple grid visualization
>>> fig = viewer.show_preview_as_image(file_ids, k=16)
>>>
>>> # Custom 4-column grid with names
>>> fig = viewer.show_preview_as_image(
...     file_ids,
...     k=20,
...     grid_cols=4,
...     label_format='name',
...     title='High Complexity Parts'
... )
>>>
>>> # Save to file
>>> fig = viewer.show_preview_as_image(
...     file_ids,
...     k=100,
...     save_path='results/query_visualization.png'
... )
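Before drawing, the grid shape and per-cell labels have to be worked out from `k`, `grid_cols` and `label_format`. The following is a hypothetical sketch of that planning step using only the standard library. The near-square layout used when `grid_cols` is None, the shrinking of columns for short lists, and the "id: name" text for ‘both’ are all assumptions, not documented behaviour.

```python
import math
from typing import Dict, List, Optional, Tuple


def plan_grid(
    file_ids: List[int],
    names: Dict[int, str],
    k: int = 25,
    grid_cols: Optional[int] = 6,
    label_format: str = "id",
) -> Tuple[int, int, List[str]]:
    """Hypothetical helper: return (rows, cols, labels) for the first k IDs."""
    shown = list(file_ids)[:k]
    if not shown:
        return 0, 0, []
    # Assumption: auto layout picks a near-square grid when grid_cols is None.
    cols = grid_cols if grid_cols else math.ceil(math.sqrt(len(shown)))
    cols = min(cols, len(shown))  # assumption: no empty trailing columns
    rows = math.ceil(len(shown) / cols)
    labels: List[str] = []
    for fid in shown:
        name = names.get(fid, str(fid))  # fall back to the ID if unnamed
        if label_format == "id":
            labels.append(str(fid))
        elif label_format == "name":
            labels.append(name)
        elif label_format == "both":
            labels.append(f"{fid}: {name}")  # assumed format for 'both'
        else:
            raise ValueError(f"unknown label_format: {label_format!r}")
    return rows, cols, labels
```

For the 4-column example above with k=20, this plan yields a 5 × 4 grid. Cells whose file has no PNG would then be filled with `missing_color`.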
