Friday, February 6, 2015

New demo showcasing dynamic lighting, soft shadows and post-processing effects

I have compiled a little demo using an updated version of my deferred renderer from 2008.

I updated the shaders to conform to the latest version of OpenGL and GLSL and optimized the rendering pipeline. Performance is still terrible at higher resolutions, though. This is partly a general problem of the deferred rendering approach and partly due to the engine's overall design: OpenGL calls are scattered everywhere, making it very hard to eliminate redundant draw calls. Also, a fullscreen quad is rendered for every light, even those that only cover a fraction of the screen. I did, however, implement some optimizations in the shaders, e.g. normal compression using stereographic projection.
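To illustrate the kind of normal compression mentioned above, here is a minimal CPU-side sketch of a stereographic encoding in C++. It is not the demo's actual shader code: the function names, the Vec2/Vec3 structs and the choice of projection pole are assumptions made for illustration. The idea is that a unit normal can be stored in two components instead of three and rebuilt in the lighting pass.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper types; a real engine would use its own vector classes.
struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

// Project a unit normal from the pole (0, 0, -1) onto the plane z = 0.
// The mapping is undefined for n.z == -1, so callers must avoid that pole.
Vec2 encodeNormal(const Vec3 &n)
{
    assert(n.z > -1.0f);
    const float inv = 1.0f / (1.0f + n.z);
    return { n.x * inv, n.y * inv };
}

// Inverse mapping: rebuild the unit normal from the two stored components.
Vec3 decodeNormal(const Vec2 &e)
{
    const float d = 2.0f / (1.0f + e.x * e.x + e.y * e.y);
    return { d * e.x, d * e.y, d - 1.0f };
}
```

The mapping breaks down for normals pointing exactly along (0, 0, -1). With view-space normals this is usually acceptable, because such surfaces face away from the camera.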

The demo presents the main features of the engine: dynamic lighting and soft shadows, integration of the physics engine Bullet Dynamics and some post-processing effects such as light bloom, depth of field, dynamic reflections and screen space ambient occlusion.

I am particularly proud of the soft shadows algorithm, which produces good results even on low-resolution shadow maps. It is a combination of variance shadow mapping and two subsequent Gaussian blur passes applied to the shadow map.
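For readers unfamiliar with variance shadow mapping, the core of the technique can be sketched in a few lines of C++. This is the textbook formulation based on Chebyshev's inequality, not necessarily what the demo's shaders do; the function name and the minVariance parameter are assumptions. The shadow map stores the depth and the squared depth, both of which can be blurred, and the lookup turns the two blurred moments into an upper bound on how much light reaches the receiver.

```cpp
#include <algorithm>
#include <cassert>

// mean   = blurred depth stored in the shadow map (first moment)
// meanSq = blurred squared depth stored in the shadow map (second moment)
// Returns an upper bound in [0, 1] on the fraction of light reaching the receiver.
float chebyshevUpperBound(float mean, float meanSq, float receiverDepth,
                          float minVariance = 1e-5f)
{
    assert(minVariance > 0.0f);

    // A receiver in front of the average occluder depth is fully lit.
    if (receiverDepth <= mean)
        return 1.0f;

    // Clamp the variance to avoid numerical problems on flat surfaces.
    const float variance = std::max(meanSq - mean * mean, minVariance);
    const float d = receiverDepth - mean;

    return variance / (variance + d * d);
}
```

Because the two moments are ordinary filterable values, blurring the shadow map softens the shadow edge instead of corrupting the depth comparison, which is why the approach tolerates low resolutions.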

I created the scene using the no longer supported Milkshape3D, which started crashing every few minutes once I was halfway done. I'm glad I switched to Blender for my later frameworks. All textures are licensed from gametextures.com. The skybox textures I found here.

You can download the demo here: Backyard Demo (117 MB, Windows only). Please make sure you have the latest drivers for your graphics card installed and let me know if you have any problems running the demo! Also, you can check the file startup.log for shader compilation errors.

Here are some screenshots with medium settings (which is all my laptop with integrated Intel graphics can handle):








Wednesday, November 28, 2012

TypeScript/JavaScript: Minimizing garbage collection when using a math library

I've spent the past few weeks porting the engine from C++/OpenGL to Microsoft's new JavaScript superset TypeScript, using WebGL to do the rendering. This led to the development of a set of vector and matrix classes which I've released separately on GitHub.

The port is now mostly complete and I've started working on performance improvements. One important aspect to remember is that JavaScript objects are always allocated on the heap (using new) and will be taken care of by the garbage collector once they're no longer needed.

In the original C++ project, I used the GLM vector and matrix library. The following is a (simplified) method from my Plane class which computes the intersection point of the plane with a ray:

class Plane
{  

   public:  

     vec3 normal;  
     float distance;  

   ...  

   vec3 intersectRay(const vec3 &rayStart, const vec3 &rayEnd)
   {  
     vec3 ray = rayStart - rayEnd; 
 
     float d = dot(normal, ray);  // GLM's dot() is a free function
     float t = signedDistance(rayStart) / d;  

     return (rayStart - t * ray);  
   }  

 };  

And here's the equivalent in TypeScript, using TSM's vec3 class. It uses methods instead of operators for arithmetic operations because JavaScript does not support operator overloading:

class Plane { 
 
   public normal: vec3;  
   public distance: number;
 
   ...  

   intersectRay(rayStart: vec3, rayEnd: vec3): vec3 {  
     var ray = vec3.difference(rayStart, rayEnd);
  
     var d = vec3.dot(this.normal, ray);  
     var t = this.signedDistance(rayStart) / d; 
 
     return vec3.difference(rayStart, ray.copy().scale(t)); 
   }  

 }  

In both versions, three instances of vec3 are created: the minus operator as well as vec3.difference both return a new vec3 instance. The same is true for the asterisk operator in the return statement of the C++ version, which scales a vector by a scalar. In the TypeScript version, the non-static method scale is used on a copy of the original vector.

This isn't much of a problem in C++ because the vectors are allocated on the stack; in JavaScript, however, object allocation is expensive and keeps the garbage collector busy. In a simple test scene performing some basic collision detection and frustum culling, the engine created over 400 instances of vec3 per frame! A ten-second snapshot of the memory consumption inside Chrome illustrates how frequently the garbage collector had to be invoked:


Ideally, no new objects should be allocated in a static scene. Instead, the results of arithmetic operations should be stored in dedicated variables. To allow for this, I added an optional dest parameter to all methods. If it's not specified, the method creates a new vector instance, just like before. If a valid argument is provided, the result will be written into that instead:

class vec3 { 
 
    static difference(vector: vec3, vector2: vec3, dest: vec3 = null): vec3 {
        if (!dest) dest = new vec3();

        dest.x = vector.x - vector2.x;
        dest.y = vector.y - vector2.y;
        dest.z = vector.z - vector2.z;

        return dest;
    }  

 }  

We can now create a dedicated member variable or a static member variable to permanently or temporarily hold the result of an operation:

class Plane { 
 
   public normal: vec3;  
   public distance: number;  

   private m_ray = new vec3();  
 
   ...  

   intersectRay(rayStart: vec3, rayEnd: vec3, dest: vec3 = null): vec3 {  
     if (!dest) dest = new vec3(); 
 
     vec3.difference(rayStart, rayEnd, this.m_ray);
  
     var d = vec3.dot(this.normal, this.m_ray);  
     var t = this.signedDistance(rayStart) / d; 
 
     vec3.difference(rayStart, this.m_ray.scale(t), dest); 
 
     return dest;  
   }  

 }  

Using this simple fix in math-heavy routines (frustum culling, collision detection, etc.) I managed to drastically reduce the number of vector instances created each frame.

And although the overall memory consumption is somewhat higher, the garbage collector has to be called less frequently, resulting in a higher, steady framerate:


Applying this fix to other frequently-instantiated classes should increase performance even more. I have updated all of TSM's vector and matrix classes accordingly.

Saturday, May 26, 2012

Detect left and right Shift, Ctrl and Alt key presses, disable system commands (Win32)

For a game prototype I'm working on, I needed to detect whether the left or right Ctrl, Shift and Alt keys were pressed or held down. I expected the implementation to be straightforward, but handling these "special" keys in Windows is not particularly intuitive. Not only do some of them arrive through a different set of messages (Alt key presses are reported via WM_SYSKEYUP and WM_SYSKEYDOWN, as opposed to WM_KEYUP and WM_KEYDOWN for "regular" keys), but one also has to explicitly disable system commands (such as Alt + F4) by intercepting SC_KEYMENU in the WM_SYSCOMMAND handler and setting the WS_POPUP window style.

I've posted my message handler below. I hope it's useful to some.

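 // Maps Win32 virtual-key codes (as delivered in wParam) to engine key codes.  
 // The key type is int rather than char so that codes above 127 are not truncated.  
 unordered_map<int, int> Window::initializeKeytable()  
 {  
      unordered_map<int, int> virtualKeys;  
   
      virtualKeys['0'] = Input::KEY_0;  
      virtualKeys['1'] = Input::KEY_1;  
      // etc.  
   
      virtualKeys[VK_SPACE] = Input::KEY_SPACE;  
      virtualKeys[VK_RETURN] = Input::KEY_RETURN;  
      // etc.  
   
      return virtualKeys;  
 }  
   
 LRESULT CALLBACK Win32Window::processMessages( UINT uMsg, WPARAM wParam, LPARAM lParam )  
 {  
      // Built once on the first call and reused for every later message.  
      static unordered_map<int, int> virtualKeys = initializeKeytable();  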
   
      switch( uMsg )  
      {  
           case WM_KILLFOCUS:  
   
                resetKeyStates();  
   
                break;  
   
           case WM_CHAR:  
   
                if( (int)wParam >= 32 )  
                {  
                     m_keystrokes += (int)wParam;  
                }  
   
                break;  
   
           case WM_KEYUP:  
           case WM_SYSKEYUP:  
   
                if( wParam == VK_MENU ) // alt keys  
                {  
                     if( lParam & (1 << 24) )  
                     {  
                          m_keyStates[Input::KEY_RALT] = false;  
                     }  
                     else  
                     {  
                          m_keyStates[Input::KEY_LALT] = false;  
                     }  
                }  
                else if( wParam == VK_CONTROL ) // ctrl keys  
                {  
                     if( lParam & (1 << 24) )  
                     {  
                          m_keyStates[Input::KEY_RCTRL] = false;  
                     }  
                     else  
                     {  
                          m_keyStates[Input::KEY_LCTRL] = false;  
                     }  
                }  
                else if( wParam == VK_SHIFT ) // shift keys  
                {  
                     auto rShiftState = GetKeyState( VK_RSHIFT );  
                     auto lShiftState = GetKeyState( VK_LSHIFT );  
   
                     if( (((unsigned short)rShiftState) >> 15) != 1 )  
                     {  
                          m_keyStates[Input::KEY_RSHIFT] = false;  
                     }  
   
                     if( (((unsigned short)lShiftState) >> 15) != 1 )  
                     {  
                          m_keyStates[Input::KEY_LSHIFT] = false;  
                     }  
                }  
                else // non-system keys  
                {  
                     m_keyStates[virtualKeys[(int)wParam]] = false;  
                }  
   
                break;  
   
           case WM_KEYDOWN:  
           case WM_SYSKEYDOWN:  
                  
                if( wParam == VK_MENU ) // alt keys  
                {  
                     if( lParam & (1 << 24) )  
                     {  
                          m_keyStates[Input::KEY_RALT] = true;  
                     }  
                     else  
                     {  
                          m_keyStates[Input::KEY_LALT] = true;  
                     }  
                }  
                else if( wParam == VK_CONTROL ) // ctrl keys  
                {  
                     if( lParam & (1 << 24) )  
                     {  
                          m_keyStates[Input::KEY_RCTRL] = true;  
                     }  
                     else  
                     {  
                          m_keyStates[Input::KEY_LCTRL] = true;  
                     }  
                }  
                else if( wParam == VK_SHIFT ) // shift keys  
                {  
                     auto rShiftState = GetKeyState( VK_RSHIFT );  
                     auto lShiftState = GetKeyState( VK_LSHIFT );  
   
                     if( (((unsigned short)rShiftState) >> 15) == 1 )  
                     {  
                          m_keyStates[Input::KEY_RSHIFT] = true;  
                     }  
   
                     if( (((unsigned short)lShiftState) >> 15) == 1 )  
                     {  
                          m_keyStates[Input::KEY_LSHIFT] = true;  
                     }  
                }  
                else // non-system keys  
                {  
                     m_keyStates[virtualKeys[(int)wParam]] = true;  
                }  
   
                break;  
   
           case WM_LBUTTONUP:  
   
                m_keyStates[Input::MOUSE_LEFT] = false;  
   
                break;  
   
           case WM_RBUTTONUP:  
   
                m_keyStates[Input::MOUSE_RIGHT] = false;  
   
                break;  
   
           case WM_MBUTTONUP:  
   
                m_keyStates[Input::MOUSE_MIDDLE] = false;  
   
                break;  
   
           case WM_LBUTTONDOWN:  
   
                m_keyStates[Input::MOUSE_LEFT] = true;  
   
                break;  
   
           case WM_RBUTTONDOWN:  
   
                m_keyStates[Input::MOUSE_RIGHT] = true;  
   
                break;  
   
           case WM_MBUTTONDOWN:  
   
                m_keyStates[Input::MOUSE_MIDDLE] = true;  
   
                break;  
   
           case WM_MOUSEWHEEL:  
   
                if( (short)HIWORD(wParam) > 0 )  
                {  
                     m_keyStates[Input::MOUSE_WHEEL_UP] = true;  
                }  
                else if( (short)HIWORD(wParam) < 0 )  
                {  
                     m_keyStates[Input::MOUSE_WHEEL_DOWN] = true;  
                }  
   
                break;  
   
           case WM_PAINT:  
   
                PAINTSTRUCT ps;  
                BeginPaint( m_handle, &ps );  
                EndPaint( m_handle, &ps );  
   
                break;  
   
           case WM_CLOSE:  
   
                PostQuitMessage( 0 );  
                m_isActive = false;  
   
                break;  
   
           case WM_SYSCOMMAND:  
   
                switch( wParam & 0xFFF0 )  
                {  
                     case SC_KEYMENU:  
                     case SC_SCREENSAVE:  
   
                     return 0;  
                }  
   
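                // Let Windows handle every other system command (move, close, ...).  
                return DefWindowProc( m_handle, uMsg, wParam, lParam );  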
   
           default:  
   
                return DefWindowProc( m_handle, uMsg, wParam, lParam );  
      }  
   
      m_keyStates[Input::KEY_ALT]  = m_keyStates[Input::KEY_LALT]  || m_keyStates[Input::KEY_RALT];  
      m_keyStates[Input::KEY_CTRL] = m_keyStates[Input::KEY_LCTRL] || m_keyStates[Input::KEY_RCTRL];  
      m_keyStates[Input::KEY_SHIFT] = m_keyStates[Input::KEY_LSHIFT] || m_keyStates[Input::KEY_RSHIFT];  
   
      return 0;  
 }  
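
The bit tests on lParam in the handler above rely on the documented layout of keystroke messages: bits 0-15 hold the repeat count, bits 16-23 the scan code, bit 24 the extended-key flag, bit 30 the previous key state and bit 31 the transition state. The following sketch decodes those fields without depending on windows.h, so it takes a plain integer instead of an LPARAM. The struct and function names are hypothetical, and the Shift scan codes (0x2A left, 0x36 right) assume a standard PC keyboard; they offer an alternative to calling GetKeyState for telling the two Shift keys apart.

```cpp
#include <cstdint>

// Fields packed into the lParam of WM_KEYDOWN / WM_KEYUP / WM_SYSKEY* messages.
struct KeyMessageInfo
{
    unsigned repeatCount; // bits 0-15
    unsigned scanCode;    // bits 16-23
    bool     extended;    // bit 24: set for right Alt and right Ctrl
    bool     wasDown;     // bit 30: key was already down before this message
    bool     released;    // bit 31: set for key-up messages
};

KeyMessageInfo decodeKeyLParam(std::intptr_t lParam)
{
    const std::uint32_t bits = static_cast<std::uint32_t>(lParam);

    KeyMessageInfo info;
    info.repeatCount = bits & 0xFFFFu;
    info.scanCode    = (bits >> 16) & 0xFFu;
    info.extended    = ((bits >> 24) & 1u) != 0;
    info.wasDown     = ((bits >> 30) & 1u) != 0;
    info.released    = ((bits >> 31) & 1u) != 0;
    return info;
}

// Assumption: standard PC scan codes, where right Shift reports 0x36.
bool isRightShift(const KeyMessageInfo &info)
{
    return info.scanCode == 0x36;
}
```

Checking wasDown is also a cheap way to ignore auto-repeat: a WM_KEYDOWN with that flag set means the key is being held rather than newly pressed.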

Monday, January 23, 2012

Material system

I've been working on the material system lately. A material handles the textures and shaders used when rendering a mesh. Every material has its own shading program, which consists of a vertex shader, a fragment shader and an optional geometry shader. Some of the material's properties, like its number of texture layers, are compiled directly into the shaders, while others, like material colors, can be updated at run-time using uniform variables.

Materials can be made up of several layers which are blended on top of each other with a variety of blend modes, very similar to Photoshop. A texture layer can contain a diffuse texture with an optional alpha channel (for transparency effects), a normal map with a height component stored in the alpha channel (for parallax mapping and similar effects) and a specular map. Internally, texture layers are represented by array textures. This brings the advantage that only one texture object has to be bound per layer, effectively reducing the number of texture binds by two thirds (diffuse, normal and specular maps are all addressed by the same sampler in the shader).
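
As a rough idea of what Photoshop-style layer blending boils down to, here is a small C++ sketch of a few common blend modes applied to a single color channel. The post does not list which modes the engine supports, so the BlendMode enum, the function names and the selection of modes are assumptions; the formulas themselves are the commonly used definitions of multiply, screen and overlay. In the engine this logic would live in the generated fragment shader rather than on the CPU.

```cpp
#include <cassert>

// Hypothetical set of blend modes; the engine's actual list may differ.
enum class BlendMode { Normal, Multiply, Screen, Overlay };

// Blend one color channel of a layer over the result of the layers below it.
// Both inputs are expected to be in the range [0, 1].
float blendChannel(BlendMode mode, float base, float layer)
{
    assert(base >= 0.0f && base <= 1.0f);
    assert(layer >= 0.0f && layer <= 1.0f);

    switch (mode)
    {
        case BlendMode::Multiply:
            return base * layer;
        case BlendMode::Screen:
            return 1.0f - (1.0f - base) * (1.0f - layer);
        case BlendMode::Overlay:
            // Multiply in dark areas, screen in bright areas.
            return base < 0.5f ? 2.0f * base * layer
                               : 1.0f - 2.0f * (1.0f - base) * (1.0f - layer);
        case BlendMode::Normal:
            break;
    }
    return layer;
}

// Fade the blended result in by the layer's opacity (e.g. its alpha channel).
float compositeChannel(BlendMode mode, float base, float layer, float opacity)
{
    return base + (blendChannel(mode, base, layer) - base) * opacity;
}
```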

In addition, texture layers can be scrolled, stretched or rotated for effects like animated computer screens etc. This was heavily used in Quake 3, for example. I'll show an example of that later (if I don't forget as usual).
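
Scrolling, stretching and rotating a layer amounts to transforming its texture coordinates before the lookup. The sketch below shows one way to do that in C++; the LayerTransform struct, its field names and the order of operations (stretch, then rotate about the texture center, then scroll) are assumptions for illustration, not the engine's actual material format.

```cpp
#include <cmath>

struct UV { float u, v; };

// Hypothetical per-layer animation parameters.
struct LayerTransform
{
    float scrollU  = 0.0f; // texture units per second
    float scrollV  = 0.0f;
    float scaleU   = 1.0f; // stretch factors
    float scaleV   = 1.0f;
    float rotation = 0.0f; // radians
};

UV transformUV(UV uv, const LayerTransform &t, float timeSeconds)
{
    // Stretch and rotate about the texture center so the layer pivots in place.
    const float x = (uv.u - 0.5f) * t.scaleU;
    const float y = (uv.v - 0.5f) * t.scaleV;

    const float c = std::cos(t.rotation);
    const float s = std::sin(t.rotation);
    const float rx = c * x - s * y;
    const float ry = s * x + c * y;

    // Scrolling is a time-dependent offset; it relies on repeat wrapping.
    return { rx + 0.5f + t.scrollU * timeSeconds,
             ry + 0.5f + t.scrollV * timeSeconds };
}
```

For an animated computer screen, a layer with a nonzero scrollV would be enough to make scan lines or text crawl across the surface.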

The wall in the screenshot is composed of four images (as seen on the left). You can see the material definition in the upper left corner.