hoops_ai.insights.DatasetViewer

class hoops_ai.insights.DatasetViewer(file_ids, png_paths, scs_paths, file_names=None, reference_dir=None)

Bases: object

Powerful visualization tool for exploring CAD datasets.

This class accepts lists of file IDs and their corresponding visualization paths to enable visualization of CAD files as either:

  1. Image collages/grids using PNG previews

  2. Interactive 3D views using stream cache files

The DatasetViewer is designed to work with data from DatasetExplorer but remains decoupled, accepting only the necessary lists for maximum flexibility.

Performance Features:
  • Maintains a persistent process pool (4 workers) for parallel PNG generation

  • Workers are pre-initialized with HOOPS licenses on creation

  • Significantly faster for multiple search result visualizations

  • Automatic cleanup via context manager or destructor
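The persistent-pool pattern described above can be illustrated with a minimal sketch. It is not the DatasetViewer implementation: `PooledRenderer` and `render_many` are hypothetical names, a thread pool stands in for the process pool, and no HOOPS license initialization is performed. It only shows how one pool can be reused across calls and released through `close()` or a `with` block.

```python
from concurrent.futures import ThreadPoolExecutor


class PooledRenderer:
    """Hypothetical stand-in that keeps one worker pool alive across calls."""

    def __init__(self, max_workers=4):
        # The pool is created once and reused by every render_many() call.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def render_many(self, names):
        if self._pool is None:
            raise RuntimeError("renderer is closed")
        # Placeholder work: a real worker would write a PNG for each file.
        return list(self._pool.map(lambda name: f"{name}.png", names))

    def close(self):
        # Idempotent: calling close() a second time does nothing.
        if self._pool is not None:
            self._pool.shutdown(wait=True)
            self._pool = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


with PooledRenderer() as renderer:
    first = renderer.render_many(["a", "b"])
    second = renderer.render_many(["c"])  # same pool, no restart cost
```

After the `with` block exits, the pool has been shut down and further `render_many()` calls raise, which mirrors the documented behavior that parallel PNG generation stops working after `close()`.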

Examples:

from hoops_ai.insights import DatasetViewer

# Get data from explorer (assumes DatasetExplorer is already imported)
explorer = DatasetExplorer(flow_output_file="flow.json")
cache_df = explorer.get_stream_cache_paths()

# Extract lists
file_ids = cache_df['id'].tolist()
png_paths = cache_df['stream_cache_png'].tolist()
scs_paths = cache_df['stream_cache_3d'].tolist()
file_names = cache_df['name'].tolist()

# Create viewer
viewer = DatasetViewer(file_ids, png_paths, scs_paths, file_names)

# Or use convenience method
viewer = DatasetViewer.from_explorer(explorer)

# Query files and visualize
query_ids = explorer.get_file_list(group="graph", where=lambda ds: ds['num_nodes'] > 30)
viewer.show_preview_as_image(query_ids, k=25)

# Use as context manager for automatic cleanup
with DatasetViewer.from_explorer(explorer) as viewer:
    viewer.show_search_results(hits, query_file=query)
# Process pool is automatically cleaned up

# Or manually clean up when done
viewer.close()
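
Parameters:
  • file_ids (List[int]) – List of file IDs

  • png_paths (List[str | None]) – List of PNG paths

  • scs_paths (List[str | None]) – List of SCS paths

  • file_names (List[str] | None) – Optional list of file names

  • reference_dir – Optional reference directory used to resolve relative paths (default: None)
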
close()

Clean up resources, including the process pool.

Call this method when done with the DatasetViewer to free resources. After calling close(), parallel PNG generation will no longer work.

Examples:

viewer = DatasetViewer.from_explorer(explorer)
# … use viewer …
viewer.close()  # Clean up when done

Return type:

None

filter_by_availability(file_ids, require_png=False, require_3d=False)

Filter file IDs based on visualization data availability.

This is useful to ensure you only try to visualize files that have the necessary visualization data available.

Parameters:
  • file_ids (List[int]) – List of file IDs to filter

  • require_png (bool) – If True, only return IDs with PNG previews (default: False)

  • require_3d (bool) – If True, only return IDs with 3D stream cache (default: False)

Returns:

Filtered list of file IDs

Return type:

List[int]

Examples:

# Get files that have PNG previews
files_with_images = viewer.filter_by_availability(
    all_file_ids,
    require_png=True
)

# Get files that have both PNG and 3D
fully_visualizable = viewer.filter_by_availability(
    all_file_ids,
    require_png=True,
    require_3d=True
)
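As an illustration of the filtering rule, here is a minimal sketch over a plain dictionary. It is not the library code: `filter_ids` and the sample `mapping` are hypothetical, the entry keys follow the ones returned by get_file_info(), and dropping IDs that are missing from the mapping is an assumption.

```python
def filter_ids(mapping, file_ids, require_png=False, require_3d=False):
    """Keep IDs whose entry has the requested visualization paths."""
    kept = []
    for file_id in file_ids:
        entry = mapping.get(int(file_id))
        if entry is None:
            continue  # assumption: IDs missing from the mapping are dropped
        if require_png and entry["png_path"] is None:
            continue  # PNG preview required but not recorded
        if require_3d and entry["stream_cache_path"] is None:
            continue  # 3D stream cache required but not recorded
        kept.append(int(file_id))
    return kept


# Hypothetical sample data: ID 4 is deliberately absent.
mapping = {
    1: {"png_path": "a.png", "stream_cache_path": "a.scs"},
    2: {"png_path": "b.png", "stream_cache_path": None},
    3: {"png_path": None, "stream_cache_path": "c.scs"},
}
png_only = filter_ids(mapping, [1, 2, 3, 4], require_png=True)
both = filter_ids(mapping, [1, 2, 3, 4], require_png=True, require_3d=True)
```

With both flags left at False, this sketch returns every ID found in the mapping; whether the real method behaves the same way for unknown IDs should be checked against the library.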
classmethod from_explorer(explorer)

Convenience constructor to create DatasetViewer from a DatasetExplorer.

This method queries the explorer for stream cache paths and creates a DatasetViewer with the extracted data.

Parameters:

explorer (DatasetExplorer) – DatasetExplorer instance

Returns:

DatasetViewer instance

Return type:

DatasetViewer

Examples:

explorer = DatasetExplorer(flow_output_file="flow.json")
viewer = DatasetViewer.from_explorer(explorer)
viewer.print_statistics()
get_available_file_ids()

Get list of all file IDs that have visualization data available.

Returns:

List of file IDs with PNG or stream cache paths

Return type:

List[int]

Examples:

available_ids = viewer.get_available_file_ids()
print(f"Files with visualization: {len(available_ids)}")

get_file_info(file_id, resolve_paths=True)

Get visualization information for a specific file ID.

Parameters:
  • file_id (int) – File ID to query (int or convertible to int)

  • resolve_paths (bool) – If True (default), resolve relative paths to absolute paths using the reference directory. This enables portability.

Returns:

Dictionary with the keys ‘name’, ‘png_path’, and ‘stream_cache_path’, or None if the file ID is not found

Return type:

Dict[str, Any] | None

Examples:

info = viewer.get_file_info(42)
if info is not None:  # None means the file ID was not found
    print(f"File name: {info['name']}")
    print(f"PNG available: {info['png_path'] is not None}")
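The effect of resolve_paths can be pictured with a short sketch. It assumes that resolution simply joins a relative path onto the reference directory; `resolve_path` and the `dataset_root` location are hypothetical and not part of the documented API.

```python
import os


def resolve_path(path, reference_dir):
    """Return an absolute path, joining relative paths onto reference_dir."""
    if path is None:
        return None  # no visualization file recorded for this entry
    if os.path.isabs(path):
        return path  # already absolute, leave untouched
    return os.path.normpath(os.path.join(reference_dir, path))


reference_dir = os.path.abspath("dataset_root")  # hypothetical location
relative_png = os.path.join("previews", "part_42.png")
resolved = resolve_path(relative_png, reference_dir)
```

Storing relative paths and resolving them at lookup time is what lets a dataset folder be moved or shared without rewriting the stored paths.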
get_statistics()

Get statistics about available visualization data.

Returns:

Dictionary containing statistics about the dataset visualization data

Return type:

Dict[str, Any]

Examples:

stats = viewer.get_statistics()
print(f"Total files: {stats['total_files']}")
print(f"PNG available: {stats['files_with_png']}")
print(f"3D cache available: {stats['files_with_3d']}")
print(f"Coverage: {stats['coverage_percentage']:.1f}%")
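For orientation, the statistics keys used above could be derived from a per-file mapping as in the sketch below. This is a guess at the semantics, not the library code: `compute_statistics` is hypothetical, and `coverage_percentage` is assumed to be the share of files that have at least one of PNG or 3D data.

```python
def compute_statistics(mapping):
    """Count entries with PNG and/or 3D paths in a {file_id: entry} mapping."""
    entries = list(mapping.values())
    total = len(entries)
    with_png = sum(1 for e in entries if e["png_path"] is not None)
    with_3d = sum(1 for e in entries if e["stream_cache_path"] is not None)
    # Assumption: a file counts as covered if it has a PNG or a 3D cache.
    with_any = sum(
        1
        for e in entries
        if e["png_path"] is not None or e["stream_cache_path"] is not None
    )
    coverage = 100.0 * with_any / total if total else 0.0
    return {
        "total_files": total,
        "files_with_png": with_png,
        "files_with_3d": with_3d,
        "coverage_percentage": coverage,
    }


# Hypothetical sample data.
sample = {
    1: {"png_path": "a.png", "stream_cache_path": "a.scs"},
    2: {"png_path": "b.png", "stream_cache_path": None},
    3: {"png_path": None, "stream_cache_path": None},
    4: {"png_path": None, "stream_cache_path": "d.scs"},
}
stats_sketch = compute_statistics(sample)
```

The empty-mapping guard avoids a division by zero, which matters when a viewer is created with empty lists.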
print_statistics()

Print formatted statistics about visualization data availability.

Examples:

>>> viewer.print_statistics()
Dataset Visualization Statistics
═══════════════════════════════════════════════
Total files:              234
Files with PNG preview:   234 (100.0%)
Files with 3D cache:      234 (100.0%)
Overall coverage:         100.0%
Return type:

None

refresh_mapping(file_ids, png_paths, scs_paths, file_names=None)

Refresh the internal file mapping with new data.

Use this method if the dataset has been updated or you want to update the visualization paths.

Parameters:
  • file_ids (List[int]) – List of file IDs

  • png_paths (List[str | None]) – List of PNG paths

  • scs_paths (List[str | None]) – List of SCS paths

  • file_names (List[str] | None) – Optional list of file names

Return type:

None

Examples:

# Update with new data
viewer.refresh_mapping(new_ids, new_pngs, new_scs, new_names)
print(f"Refreshed mapping contains {len(viewer._file_mapping)} files")
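The four arguments are parallel lists, so position i in each list describes the same file. A minimal sketch of how such lists might be combined into a per-ID mapping follows; `build_mapping` is hypothetical, and the length checks are an assumption about sensible validation rather than documented behavior.

```python
def build_mapping(file_ids, png_paths, scs_paths, file_names=None):
    """Combine parallel lists into {file_id: {name, png_path, stream_cache_path}}."""
    if not (len(file_ids) == len(png_paths) == len(scs_paths)):
        raise ValueError("file_ids, png_paths and scs_paths must have equal length")
    if file_names is None:
        file_names = [None] * len(file_ids)  # names are optional
    elif len(file_names) != len(file_ids):
        raise ValueError("file_names must match file_ids in length")
    return {
        int(file_id): {"name": name, "png_path": png, "stream_cache_path": scs}
        for file_id, png, scs, name in zip(file_ids, png_paths, scs_paths, file_names)
    }


# Hypothetical sample data: file 11 has no PNG preview.
mapping_sketch = build_mapping(
    [10, 11],
    ["a.png", None],
    ["a.scs", "b.scs"],
    ["bracket.step", "gear.step"],
)
```

Checking lengths up front turns a silent misalignment between IDs and paths into an immediate error.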
show_preview_as_3d(file_ids, k=5, display_mode='inline', layout='grid', host='127.0.0.1', start_port=8000, silent=True, width=400, height=400)

Open interactive 3D viewers for file IDs using stream cache files.

This method creates CADViewer instances for each file, loading their 3D stream cache representations. Users can interact with the 3D models directly in the notebook.

Parameters:
  • file_ids (List[int]) – List of file IDs to visualize (ints, numpy array, or convertible to int)

  • k (int) – Maximum number of 3D viewers to open (default: 5)

  • display_mode (str) – Display mode - ‘inline’, ‘sidecar’, or ‘none’ (default: ‘inline’)

  • layout (str) – Layout strategy - ‘sequential’ or ‘grid’ (default: ‘grid’). ‘grid’ displays viewers in a horizontal row; ‘sequential’ displays viewers one after another.

  • host (str) – Host address for viewer servers (default: ‘127.0.0.1’)

  • start_port (int) – Starting port for viewer servers (default: 8000)

  • silent (bool) – Whether to suppress viewer server output (default: True)

  • width (int) – Width of inline viewer in pixels (default: 400)

  • height (int) – Height of inline viewer in pixels (default: 400)

Returns:

List of CADViewer instances (one per displayed file)

Return type:

List[CADViewer]

Examples:

# Open 3 compact inline 3D viewers in a grid
viewers = viewer.show_preview_as_3d(file_ids, k=3)

# Open larger inline viewers
viewers = viewer.show_preview_as_3d(
    file_ids,
    k=3,
    width=600,
    height=500
)

# Open viewers sequentially (one after another)
viewers = viewer.show_preview_as_3d(
    file_ids,
    k=5,
    layout='sequential'
)

# Open viewers in sidecar layout (full size)
viewers = viewer.show_preview_as_3d(
    file_ids,
    k=5,
    display_mode='sidecar'
)

# Interact with specific viewer
selected_faces = viewers[0].get_selected_faces()
viewers[0].set_face_color(selected_faces, [255, 0, 0])

# Clean up viewers when done
for v in viewers:
    v.terminate()
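How file_ids, k, host, and start_port interact can be sketched as follows. This is purely illustrative: `plan_viewers` is hypothetical, and assigning one consecutive port per viewer starting at start_port is an assumption, since the documentation only states that start_port is the starting port.

```python
def plan_viewers(file_ids, k=5, host="127.0.0.1", start_port=8000):
    """Normalize IDs to int, keep the first k, and pair each with a server URL."""
    ids = [int(file_id) for file_id in file_ids][:k]
    # Assumption: viewer i listens on start_port + i.
    return [
        (file_id, f"http://{host}:{start_port + offset}")
        for offset, file_id in enumerate(ids)
    ]


# Mixed input types are converted to int before truncation to k entries.
plan = plan_viewers(["3", 7.0, 9], k=2)
```

Keeping k small matters in practice because each displayed file gets its own CADViewer instance, which should be terminated when no longer needed.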
show_preview_as_image(file_ids, k=25, grid_cols=6, figsize=(15, 5), show_labels=True, label_format='id', title=None, missing_color=(200, 200, 200), save_path=None)

Generate an image grid visualization from file IDs.

This method creates a matplotlib figure displaying PNG previews of CAD files in a grid layout. It is well suited to quickly inspecting query results.

Parameters:
  • file_ids (List[int]) – List of file IDs to visualize (ints, numpy array, or convertible to int)

  • k (int) – Maximum number of files to display (default: 25)

  • grid_cols (int | None) – Number of columns in grid. If None, auto-calculated (default: 6)

  • figsize (Tuple[int, int] | None) – Figure size as (width, height). If None, auto-calculated (default: (15, 5))

  • show_labels (bool) – Whether to show file labels on images (default: True)

  • label_format (str) – Label format - ‘id’, ‘name’, or ‘both’ (default: ‘id’)

  • title (str | None) – Overall figure title (default: None)

  • missing_color (Tuple[int, int, int]) – RGB color for files without PNG preview (default: gray)

  • save_path (str | None) – If provided, save the figure to this path (default: None)

Returns:

matplotlib Figure object

Return type:

matplotlib.pyplot.Figure

Examples:

# Simple grid visualization
fig = viewer.show_preview_as_image(file_ids, k=16)

# Custom 4-column grid with names
fig = viewer.show_preview_as_image(
    file_ids,
    k=20,
    grid_cols=4,
    label_format='name',
    title='High Complexity Parts'
)

# Save to file
fig = viewer.show_preview_as_image(
    file_ids,
    k=100,
    save_path='results/query_visualization.png'
)
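The grid and label parameters can be illustrated with a small sketch. Both helpers are hypothetical: the near-square fallback for grid_cols=None and the "id: name" layout for label_format='both' are assumptions, since the documentation does not specify either.

```python
import math


def grid_shape(n_items, grid_cols=None):
    """Return (rows, cols) for n_items images laid out row by row."""
    if n_items <= 0:
        return (0, 0)
    if grid_cols is None:
        # Assumption: fall back to a near-square grid.
        grid_cols = math.ceil(math.sqrt(n_items))
    if grid_cols < 1:
        raise ValueError("grid_cols must be at least 1")
    return (math.ceil(n_items / grid_cols), grid_cols)


def format_label(file_id, name, label_format="id"):
    """Build the text shown for one image."""
    if label_format == "id":
        return str(file_id)
    if label_format == "name":
        return name
    if label_format == "both":
        return f"{file_id}: {name}"  # assumption: exact layout is unspecified
    raise ValueError(f"unknown label_format: {label_format!r}")


shape = grid_shape(20, grid_cols=4)
label = format_label(42, "gear.step", label_format="both")
```

For example, 20 previews in 4 columns need 5 rows, and the last row of a grid is only partially filled whenever the item count is not a multiple of grid_cols.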
show_search_results(hits, query_file=None, output_dir=None, k=None, grid_cols=4, figsize=None, show_scores=True, show_filenames=True, title='CAD Similarity Search Results', missing_color=(200, 200, 200), save_path=None, is_white_background=True, overwrite=True)

Visualize CAD similarity search results from vector search hits.

This method takes VectorHit objects (from CADSearch.search_by_shape()), generates PNG previews on-the-fly from CAD files, and displays them in a grid with similarity scores.

Unlike show_preview_as_image(), which uses pre-existing PNGs, this method:
  • Loads CAD files from paths stored in hit.id

  • Generates stream cache PNGs on-the-fly

  • Displays similarity scores alongside images

This makes it well suited to visualizing search results.

Parameters:
  • hits (List[Any]) – List of VectorHit objects from CADSearch (each has .id, .score, .metadata)

  • query_file (str | None) – Optional path to query CAD file to display on the left (default: None)

  • output_dir (str | None) – Directory to save generated PNG files (default: current_dir/out)

  • k (int | None) – Maximum number of hits to display. If None, shows all hits (default: None)

  • grid_cols (int | None) – Number of columns in grid for hits (default: 4)

  • figsize (Tuple[int, int] | None) – Figure size as (width, height). If None, auto-calculated (default: None)

  • show_scores (bool) – Whether to show similarity scores on images (default: True)

  • show_filenames (bool) – Whether to show filename labels (default: True)

  • title (str | None) – Overall figure title (default: “CAD Similarity Search Results”)

  • missing_color (Tuple[int, int, int]) – RGB color for files that fail to load (default: gray)

  • save_path (str | None) – If provided, save the figure to this path (default: None)

  • is_white_background (bool) – Use white background for PNG export (default: True)

  • overwrite (bool) – Overwrite existing PNGs if they exist (default: True)

Returns:

matplotlib Figure object

Return type:

matplotlib.pyplot.Figure

Examples:

# Basic usage with CADSearch results
from hoops_ai.ml import CADSearch

searcher = CADSearch(shape_model=embedder)
searcher.load_shape_index("my_index.faiss")
hits = searcher.search_by_shape("query.step", top_k=10)

# Visualize results
viewer = DatasetViewer([], [], [])  # Empty initialization for search results
fig = viewer.show_search_results(hits)

# Customize display
fig = viewer.show_search_results(
    hits,
    k=16,
    grid_cols=4,
    title="Top Similar Gears",
    output_dir="search_results"
)

# Show more results without scores
fig = viewer.show_search_results(
    hits,
    k=25,
    grid_cols=5,
    show_scores=False,
    save_path="results.png"
)

Note

This method requires CAD files to be accessible at the paths stored in hit.id. It will load each file and generate a PNG preview, which may take time for large datasets.
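To make the roles of k, show_scores, and show_filenames concrete, here is a minimal sketch using a stand-in for VectorHit. The `Hit` dataclass and `build_captions` are hypothetical, and so is the caption format; only the .id, .score, and .metadata attributes come from the parameter description above.

```python
import os
from dataclasses import dataclass, field


@dataclass
class Hit:
    """Hypothetical stand-in exposing the attributes a VectorHit is said to have."""
    id: str  # path to the CAD file
    score: float  # similarity score
    metadata: dict = field(default_factory=dict)


def build_captions(hits, k=None, show_scores=True, show_filenames=True):
    """Return one caption per displayed hit, limited to the first k hits."""
    selected = hits if k is None else hits[:k]
    captions = []
    for hit in selected:
        parts = []
        if show_filenames:
            parts.append(os.path.basename(hit.id))  # file name from the path in hit.id
        if show_scores:
            parts.append(f"score={hit.score:.3f}")
        captions.append(" | ".join(parts))
    return captions


sample_hits = [Hit("parts/gear_a.step", 0.9712), Hit("parts/gear_b.step", 0.8)]
top_caption = build_captions(sample_hits, k=1)
names_only = build_captions(sample_hits, show_scores=False)
```

Because hit.id doubles as the file path in this workflow, any hit whose path is not reachable from the current machine cannot be rendered and would fall back to the missing_color placeholder.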