Memory Usage

This page provides details on memory usage inside HOOPS Luminate and reviews the parameters you should be aware of when using it. Because HOOPS Luminate is a hybrid engine, capable of CPU and/or GPU rendering, its memory management must be described on both sides to get a global understanding of the engine’s behavior.

Running HOOPS Luminate Hardware, Hybrid or Software

The value of the RED::OPTIONS_RAY_ENABLE_SOFT_TRACER option governs the overall behavior of HOOPS Luminate. It is set on startup, as detailed here: Hardware, Hybrid or Software Start, and it fixes how HOOPS Luminate manages memory:

  • Hardware only applications load their resources on the GPU: geometries are stored on the CPU and uploaded to the GPU. Images are stored on the GPU only (hence the specific management workflow for images: Manipulating Images). Buffers and offscreen buffers live in GPU memory only.

  • Software only applications will load their resources on the CPU: Geometries are stored on the CPU, images are stored on the CPU. No graphic driver is involved.

  • Hybrid applications do both: they store their data on both the CPU and the GPU, which makes them the most demanding applications in terms of memory.
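As a rough mental model, the sketch below tallies where geometry and image data live in each mode, following the three rules above. RenderMode, Footprint and EstimateFootprint are hypothetical names for illustration, not HOOPS Luminate API, and the estimate deliberately ignores driver copies and intermediate buffers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the three start-up modes; not part of the HOOPS Luminate API.
enum class RenderMode { Hardware, Hybrid, Software };

struct Footprint {
    std::size_t cpu_bytes;  // system memory held by the engine
    std::size_t gpu_bytes;  // video memory held by the engine
};

// geometry_bytes: total size of loaded geometries
// image_bytes:    total size of loaded images
Footprint EstimateFootprint(RenderMode mode, std::size_t geometry_bytes,
                            std::size_t image_bytes)
{
    switch (mode) {
    case RenderMode::Hardware:
        // Geometries stored on the CPU and uploaded to the GPU; images on the GPU only.
        return Footprint{geometry_bytes, geometry_bytes + image_bytes};
    case RenderMode::Software:
        // Everything stays on the CPU; no graphic driver is involved.
        return Footprint{geometry_bytes + image_bytes, 0};
    case RenderMode::Hybrid:
        break;
    }
    // Hybrid: data is kept on both sides, the most demanding configuration.
    return Footprint{geometry_bytes + image_bytes, geometry_bytes + image_bytes};
}
```

For the same data set, only the hybrid mode pays for images twice, which is why it has the highest total footprint.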

Hardware Image Size Limits

Keep this in mind: some hardware systems have limited image dimensions. For instance, it may not be possible to create a 32,768 x 16,384 pixel image on a given GPU, simply because such an image consumes too much memory for the GPU to handle. In that case, HOOPS Luminate silently reduces the size of the image before uploading it to the GPU, so that it has a chance to fit in video memory. Then:

  • If the application is running in pure hardware mode, a query on the image size returns the reduced GPU size of that image. If that image is saved into a .red file, for instance, it is saved at the reduced resolution: the original image no longer exists.

  • If the application is running in hybrid mode, a query on the image size returns the original CPU image size. The CPU side of HOOPS Luminate keeps the original image while a downgraded version of it is loaded on the GPU. When the image is saved to a .red file, the original image is saved.

Recent GPUs rarely suffer from this problem, as the maximum image dimension they can handle is generally 16,384 pixels or more.
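To put the 32,768 x 16,384 example into numbers: assuming an uncompressed RGBA8 image (4 bytes per pixel), it would need 2 GiB of memory. The sketch below computes that figure and shows one plausible way of shrinking an image to a maximum dimension. ImageBytes and FitToMaxDimension are hypothetical helpers, and the halve-until-it-fits policy is an assumption: the exact reduction HOOPS Luminate applies is not described here.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed by an uncompressed image (e.g. 4 bytes per pixel for RGBA8).
std::uint64_t ImageBytes(std::uint32_t width, std::uint32_t height,
                         std::uint32_t bytes_per_pixel)
{
    return std::uint64_t(width) * height * bytes_per_pixel;
}

// Hypothetical downsizing policy: halve both dimensions (keeping the aspect
// ratio) until the image fits within max_dimension. This is an illustration,
// not the documented HOOPS Luminate behavior.
void FitToMaxDimension(std::uint32_t& width, std::uint32_t& height,
                       std::uint32_t max_dimension)
{
    assert(max_dimension > 0);
    while (width > max_dimension || height > max_dimension) {
        width = width > 1 ? width / 2 : 1;
        height = height > 1 ? height / 2 : 1;
    }
}
```

With a 16,384 limit, for example, the 32,768 x 16,384 image would come out as 16,384 x 8,192 under this policy.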

Exceeding the Available Video Memory

Hardware or hybrid applications may, for one reason or another, exceed the available video memory. First, on some systems there’s no real video memory: video memory is shared with the main CPU memory, so no amount of memory is dedicated solely to graphics. Second, exceeding the available video memory causes slowdowns, but some applications can bear them, as the graphic driver does a good paging job here: it regularly uploads data from system memory to video memory.

HOOPS Luminate offers you several tricks to reduce the memory consumption. We review them below:

Handling the RED_DRV_ALLOC_FAILURE Return Code

A call to RED::IWindow::FrameDrawing may return RED_DRV_ALLOC_FAILURE. If this happens, the first thing to be aware of is that it is not fatal: the application keeps running normally, but it could not complete the rendering of the frame because of insufficient resources on the GPU. We therefore need to reduce memory usage on the GPU and render again.

To do this we can change several parameters:

  • If the rendering uses shadow maps (Shadow Mapping Detailed), we can reduce the largest shadow map resolution. This saves memory used by intermediate buffers.

  • If the rendering uses composite images (Composite Images), the screen resolution matters, as each composite image is screen-sized. HDR composite images rendered in large windows can take a significant amount of memory, so savings may be possible here.

  • If there are too many geometries to render, turning on immediate mode rendering can relieve GPU memory. See details below.

  • If some geometries are loaded but not currently needed, they can be flushed out of the GPU. See details below as well.
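The recovery strategy above can be organized as an ordered list of reduction steps, retried after each allocation failure. This is a hypothetical sketch: FrameStatus, ReductionStep and DrawWithFallback are invented names, and the draw callback stands in for a call to RED::IWindow::FrameDrawing (whose exact signature is not shown here) mapped to "ok" or "RED_DRV_ALLOC_FAILURE".

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for the return codes of interest; in a real
// application, AllocFailure would correspond to RED_DRV_ALLOC_FAILURE.
enum class FrameStatus { Ok, AllocFailure, OtherError };

// Each step frees some GPU memory (e.g. halving the largest shadow map,
// enabling immediate mode, flushing unused shapes) and returns false once
// it has nothing left to give.
using ReductionStep = std::function<bool()>;

// Draws a frame, applying reduction steps in order after each allocation
// failure, and returns the final status.
FrameStatus DrawWithFallback(const std::function<FrameStatus()>& draw_frame,
                             const std::vector<ReductionStep>& steps)
{
    FrameStatus status = draw_frame();
    std::size_t next = 0;
    while (status == FrameStatus::AllocFailure && next < steps.size()) {
        // The failure is not fatal: free some memory and try again.
        if (steps[next]())
            status = draw_frame();
        else
            ++next;  // this step is exhausted, move on to the next one
    }
    return status;
}
```

Ordering the steps from least to most visible (shadow map resolution first, immediate mode last) keeps the quality loss as small as possible; the right order for a given application is a design choice.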

Turning on Immediate Mode Geometry Rendering

Setting RED::OPTIONS_IMMEDIATE_MODE forces HOOPS Luminate to render geometries with old-style OpenGL immediate mode calls: every geometry is processed from the CPU using glBegin(), glVertex(…), glEnd() sequences. This slows rendering down considerably, but no video memory is involved in rendering the geometries.

HOOPS Luminate can decide on a per-material basis whether to use immediate mode rendering. This is controlled by both the option and the RED::IMaterial::SetImmediateMode call.

Flushing Geometries out of the GPU

This mechanism reduces the amount of video memory used by geometries on the GPU. An application may need to keep geometries outside of any scene graph: think of geometries that must be rendered on a certain event, or geometries that have been preloaded but not yet displayed. By default, HOOPS Luminate loads all these geometries on the GPU so that they are ready to draw once linked to a camera through a scene graph. As a result, loaded geometries use video memory even if they are not rendered yet.

Geometries can be removed from the GPU by calling RED::IShape::RemoveFromGPU. Removed geometries don’t consume any video memory until they are rendered again, after being linked back to a displayed scene graph.
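An application that preloads shapes can track which ones are currently displayed and flush the others. In this hypothetical sketch, PreloadedShape and FlushUndisplayedShapes are invented names, and the remove_from_gpu callback is a placeholder for a call to RED::IShape::RemoveFromGPU on the real shape (its arguments are not shown here).

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical application-side record of a preloaded shape.
struct PreloadedShape {
    bool on_gpu = true;                     // HOOPS Luminate loads shapes on the GPU by default
    std::function<void()> remove_from_gpu;  // placeholder for RED::IShape::RemoveFromGPU
};

// Flushes every GPU-resident shape that is not linked to a displayed scene
// graph. Returns the number of shapes flushed.
int FlushUndisplayedShapes(std::unordered_map<std::string, PreloadedShape>& shapes,
                           const std::unordered_set<std::string>& displayed)
{
    int flushed = 0;
    for (auto& entry : shapes) {
        PreloadedShape& shape = entry.second;
        if (shape.on_gpu && displayed.count(entry.first) == 0) {
            if (shape.remove_from_gpu)
                shape.remove_from_gpu();
            // The application sets this back to true once the shape is
            // linked to a displayed scene graph and rendered again.
            shape.on_gpu = false;
            ++flushed;
        }
    }
    return flushed;
}
```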

Memory Analysis Tools

The RED::MemoryAllocator and RED::MemoryLeakTracker classes can be used to hunt unwanted memory allocations, or to get global statistics on the memory being used by HOOPS Luminate at some point during the life of the application.

Hardware Driver Overhead

The OpenGL driver provided by hardware vendors uses memory too, which can be a concern for hardware or hybrid HOOPS Luminate applications. The OpenGL specification requires the driver to keep a copy of the data it manages, so that it can page data to and from the GPU whenever needed and answer queries from the calling application. This memory should be taken into account when estimating the global memory footprint of an application: while not part of the application itself, it does affect it.

Reducing Data Memory on the GPU

The amount of data loaded in the engine may become critical for large datasets that need millions of triangles to render a single frame. For a simple dataset with vertices, normals and one set of UV coordinates, the basic GPU memory footprint follows, more or less, this breakdown:

Attribute    Byte Size
Vertex       12 bytes (vertex coordinates) + 12 bytes (normal coordinates) + 8 bytes (UV values) = 32 bytes
Triangle     12 bytes (indices for P0, P1 and P2)
Sum          44 bytes

So, if we assume that our data structure has roughly 1 vertex per triangle (models with solids, such as CAD models, use fewer vertices and models with many surfaces use more, but on average we believe this ratio is a good starting point), we can store the following datasets in the given amounts of video memory (1 MB taken as 10^6 bytes):

Video Memory    Number of Triangles
128 MB          2.9 million
256 MB          5.8 million
512 MB          11.6 million
1 GB            23 million
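The table follows from a single division. With 1 MB taken as 10^6 bytes, the 10^6 factors cancel, so the capacity in millions of triangles is simply the memory size in MB divided by the per-triangle cost. A small C++ sketch of that arithmetic (the constant and function names are illustrative):

```cpp
#include <cassert>
#include <cstdint>

// Full-precision layout from the table above (sizes assume 4-byte floats
// and 32-bit indices, as on mainstream platforms).
constexpr std::uint64_t kPositionBytes = 3 * sizeof(float);      // 12
constexpr std::uint64_t kNormalBytes = 3 * sizeof(float);        // 12
constexpr std::uint64_t kUVBytes = 2 * sizeof(float);            // 8
constexpr std::uint64_t kIndexBytes = 3 * sizeof(std::int32_t);  // 12: P0, P1, P2

// One vertex per triangle, as assumed in the text.
constexpr std::uint64_t kBytesPerTriangle =
    kPositionBytes + kNormalBytes + kUVBytes + kIndexBytes;      // 44

// With 1 MB = 10^6 bytes, millions of triangles = MB / bytes per triangle.
double TrianglesInMillions(std::uint64_t megabytes)
{
    return double(megabytes) / double(kBytesPerTriangle);
}
```

For example, 128 / 44 gives about 2.91, the 2.9 million of the first row.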

This may not be enough for many applications, or the video memory requirements may become high enough to force the use of expensive GPUs.

Note that we can exceed the total video memory and still maintain acceptable performance, but the further we exceed it, the more the frame rate drops.

HOOPS Luminate can be fine-tuned to reduce the memory footprint of the loaded data in two ways:

  • Change the accuracy of several data channels

  • Use limited size meshes

Reducing Geometry Channels Accuracy

Many applications do not need full-accuracy channels, as they visualize large datasets in real time and do not use high-quality shaders. We can therefore save a lot of memory by reducing the accuracy of the input channels. Take normals as an example: they are usually loaded as 3 floats, but they can be loaded as unsigned bytes instead (see the hardware support list for channel formats below), after remapping their values:

// Assuming our initial normal array is 'fnor' with 3 floats per vertex, and 'nb_vertices' vertices:
RC_TEST( imesh->SetArray( RED::MCL_NORMAL, NULL, nb_vertices, 3, RED::MFT_UBYTE, iresmgr->GetState() ) );

// Accessing our new normal array, re-encoding it:
unsigned char* unor;
RC_TEST( imesh->GetArray( (void*&)unor, RED::MCL_NORMAL, iresmgr->GetState() ) );

for( int i = 0; i < nb_vertices; i++ )
{
  // Map each component from [-1, 1] to [0, 255], rounding to the nearest value:
  unor[ 3 * i ]     = (unsigned char)( 255.0f * ( fnor[ 3 * i ] + 1.0f ) / 2.0f + 0.5f );
  unor[ 3 * i + 1 ] = (unsigned char)( 255.0f * ( fnor[ 3 * i + 1 ] + 1.0f ) / 2.0f + 0.5f );
  unor[ 3 * i + 2 ] = (unsigned char)( 255.0f * ( fnor[ 3 * i + 2 ] + 1.0f ) / 2.0f + 0.5f );
}

Then, a pair of vertex and pixel programs can decode these normals:

// Transmit input normals (RED_VSH_NORMAL = 2) to the pixel shader stage:
vsh.Add( "MOV result.texcoord[0], vertex.attrib[2];\n" );

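// Decode and renormalize normals for pixel shader use: subtracting 127.5 recenters the
// [0, 255] byte values around zero; the 1/127.5 scale is not needed since the vector
// is renormalized afterwards: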
psh.Temp( "normal" );
psh.Add( "ADD normal, fragment.texcoord[0], { -127.5 }.x;\n" );
psh.Normalize( "normal", "normal" );
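The encoding and the shader’s decoding can be mirrored on the CPU to measure the angular error they introduce. This plain C++ sketch (EncodeComponent, DecodeNormal and RoundTripErrorDegrees are illustrative names) rounds to the nearest byte and assumes, as the shader code implies, that the bytes reach the pixel program as raw 0 to 255 values.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Encodes one normal component from [-1, 1] to [0, 255], rounding to nearest.
unsigned char EncodeComponent(float n)
{
    const float clamped = std::min(1.0f, std::max(-1.0f, n));
    return (unsigned char)(255.0f * (clamped + 1.0f) / 2.0f + 0.5f);
}

// Mirrors the pixel shader: subtract 127.5 from each byte, then renormalize.
// The length can never be zero, since each recentered component is at least 0.5 away from 0.
void DecodeNormal(const unsigned char in[3], float out[3])
{
    float len2 = 0.0f;
    for (int k = 0; k < 3; ++k) {
        out[k] = float(in[k]) - 127.5f;
        len2 += out[k] * out[k];
    }
    const float inv_len = 1.0f / std::sqrt(len2);
    for (int k = 0; k < 3; ++k)
        out[k] *= inv_len;
}

// Angle, in degrees, between a unit normal and its encoded/decoded version.
double RoundTripErrorDegrees(const float n[3])
{
    unsigned char packed[3];
    for (int k = 0; k < 3; ++k)
        packed[k] = EncodeComponent(n[k]);
    float decoded[3];
    DecodeNormal(packed, decoded);
    double dot = 0.0;
    for (int k = 0; k < 3; ++k)
        dot += double(n[k]) * decoded[k];
    dot = std::min(1.0, std::max(-1.0, dot));
    return std::acos(dot) * 180.0 / 3.14159265358979323846;
}
```

For the normal (0, 0, 1), for instance, the bytes are (128, 128, 255) and the decoded direction is off by roughly a third of a degree.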

The quality loss resulting from this compression is hardly visible on most models. It brings our per-vertex figure (one vertex plus one triangle, as in the table above) down from 44 to 36 bytes, saving 8 bytes per vertex: the normal takes 4 bytes instead of 12, because 4 unsigned bytes per vertex must be kept for memory alignment (otherwise, performance drops!).

A side effect of this optimization is that, with less memory used on the GPU, the frame rate may slightly improve, since the GPU has less memory to move to render a frame.

Most of the time, the same kind of technique can be applied to UVs. If UVs are bounded, which is very often the case, we can consider using short values or even unsigned bytes to reduce the memory footprint of a single vertex.
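For UVs known to lie in a bounded range, a 16-bit fixed-point encoding is one option. A minimal sketch, assuming the bounds are known per mesh and that the decoding offset and scale are passed to the shader as program parameters; QuantizeUV and DequantizeUV are illustrative names, not HOOPS Luminate API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical bounded-UV packing: maps u in [u_min, u_max] to [0, 65535].
std::uint16_t QuantizeUV(float u, float u_min, float u_max)
{
    assert(u_max > u_min);
    const float t = std::min(1.0f, std::max(0.0f, (u - u_min) / (u_max - u_min)));
    return std::uint16_t(t * 65535.0f + 0.5f);  // round to nearest step
}

// Inverse mapping; a shader would apply the same offset and scale.
float DequantizeUV(std::uint16_t q, float u_min, float u_max)
{
    return u_min + float(q) * (u_max - u_min) / 65535.0f;
}
```

The round-trip error is at most half a step, (u_max - u_min) / 131070, which is why tight bounds matter more than the choice of encoding.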

Reducing Triangle Index Space

HOOPS Luminate implicitly optimizes all meshes that have fewer than 65,536 vertices: their index arrays are loaded on the GPU as (unsigned short) values rather than (int) values. This halves the memory used by indices.
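From the application’s point of view, this means that keeping meshes below 65,536 vertices halves their index memory. A small sketch of that arithmetic (IndexBufferBytes is an illustrative name; HOOPS Luminate applies the optimization internally, so no application code is needed for it):

```cpp
#include <cassert>
#include <cstdint>

// Bytes taken by a triangle index buffer: meshes with fewer than 65536
// vertices can use 16-bit indices, larger meshes need 32-bit indices.
std::uint64_t IndexBufferBytes(std::uint64_t vertex_count, std::uint64_t triangle_count)
{
    const std::uint64_t bytes_per_index = vertex_count < 65536 ? 2 : 4;
    return triangle_count * 3 * bytes_per_index;
}
```

One vertex past the limit doubles the index cost, so splitting a large mesh into chunks of at most 65,535 vertices is what "limited size meshes" buys.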

Summary After Reduction

If we apply all these optimizations, then our numbers become:

Attribute    Byte Size
Vertex       12 bytes (float vertex coordinates) + 4 bytes (unsigned byte normal coordinates) + 4 bytes (unsigned short UV values) = 20 bytes
Triangle     6 bytes (unsigned short indices for P0, P1 and P2)
Sum          26 bytes
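The byte counts can be double-checked by writing both vertex layouts as C++ structs. This only illustrates the arithmetic (FullVertex, PackedVertex and kPackedBytesPerTriangle are hypothetical names): the listing earlier on this page sets each channel as its own array, and the static_assert sizes assume the usual 4-byte float alignment of mainstream compilers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Full-precision layout: 3 floats position, 3 floats normal, 2 floats UV.
struct FullVertex {
    float position[3];
    float normal[3];
    float uv[2];
};

// Reduced layout: float position, 4 unsigned bytes for the normal
// (3 components + 1 padding byte for alignment), 2 unsigned shorts for UV.
struct PackedVertex {
    float position[3];
    std::uint8_t normal[4];
    std::uint16_t uv[2];
};

static_assert(sizeof(FullVertex) == 32, "12 + 12 + 8 bytes");
static_assert(sizeof(PackedVertex) == 20, "12 + 4 + 4 bytes");

// One vertex per triangle plus three 16-bit indices.
constexpr std::size_t kPackedBytesPerTriangle =
    sizeof(PackedVertex) + 3 * sizeof(std::uint16_t);  // 26
```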

And our average capacity rises to the following (1 MB taken as 10^6 bytes, so each figure is the memory size in MB divided by 26):

Video Memory       Number of Triangles
128 MB             4.9 million
256 MB             9.8 million
512 MB             19.7 million
1 GB (1,024 MB)    39.4 million
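Both capacity tables come from the same division. The sketch below rebuilds them side by side, taking 1 MB as 10^6 bytes and 1 GB as 1,024 MB, with one vertex per triangle; CapacityRow and CapacityTable are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct CapacityRow {
    std::uint64_t megabytes;
    double original_millions;  // 44-byte layout
    double reduced_millions;   // 26-byte layout
};

// With 1 MB = 10^6 bytes, millions of triangles = MB / bytes per triangle.
std::vector<CapacityRow> CapacityTable()
{
    const std::uint64_t sizes_mb[] = {128, 256, 512, 1024};
    std::vector<CapacityRow> rows;
    for (std::uint64_t mb : sizes_mb)
        rows.push_back({mb, double(mb) / 44.0, double(mb) / 26.0});
    return rows;
}
```

For 128 MB, for instance, 128 / 44 gives about 2.9 million triangles and 128 / 26 about 4.9 million: the reduced layout fits roughly 1.7 times more triangles in the same memory.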