Wednesday, November 28, 2012

TypeScript/JavaScript: Minimizing garbage collection when using a math library

I've spent the past few weeks porting the engine from C++/OpenGL to Microsoft's new JavaScript superset TypeScript, using WebGL to do the rendering. This lead to the development of a set of vector and matrix classes which I've released separately on Github.

The port is now mostly complete and I've started working on  performance improvements. One important aspect to remember is that JavaScript objects are always allocated on the heap (using new) and will be taken care of by the garbage collector once they're no longer needed.

In the original C++ project, I used the GLM vector and matrix library. The following is a (simplified) method from my Plane class which computes the intersection point of the plane with a ray:

class Plane
{  

   public:  

     vec3 normal;  
     float distance;  

   ...  

   vec3 intersectRay(const vec3 &rayStart, const vec3 &rayEnd)
   {  
     vec3 ray = rayStart - rayEnd; 
 
     float d = vec3.dot(this.normal, ray);  
     float t = signedDistance(rayStart) / d;  

     return (rayStart - t * ray);  
   }  

 }  

And here's the equivalent in TypeScript, using TSM's vec3 class. It uses methods instead of operators for arithmetic operations because JavaScript does not support operator overloading:

class Plane { 
 
   public normal: vec3;  
   public distance: number;
 
   ...  

   intersectRay(rayStart: vec3, rayEnd: vec3): vec3 {  
     var ray = vec3.difference(rayStart, rayEnd);
  
     var d = vec3.dot(this.normal, ray);  
     var t = this.signedDistance(rayStart) / d; 
 
     return vec3.difference(rayStart, ray.copy().scale(t)); 
   }  

 }  

In both versions, three instances of vec3 have are created: the static minus operator as well as vec3.difference both return a new vec3 instance. The same is true for the asterisk operator in the return statement of the C++ version, which scales a vector by a scalar. In the TypeScript version, the non-static method scale is used on a copy of the original vector.

This isn't much of a problem in C++ because the vectors are allocated on the stack; in JavaScript, however, object allocation is expensive and keeps the garbage collector busy. In a simple test scene performing some basic collision detection and frustum culling, the engine created over 400 instances of vec3 per frame! A ten second snapshot of the memory consumption inside Chrome illustrates how frequently the garbage collector had to be invoked:


Ideally, no new objects should be allocated in a static scene. Instead, the results of arithmetic operations should be stored in dedicated variables. To allow for this, I added an optional dest parameter to all methods. If it's not specified, the method creates a new vector instance, just like before. If a valid argument is provided, the result will be written into that instead:

class vec3 { 
 
    static difference(vector: vec3, vector2: vec3, dest: vec3 = null): vec3 {
        if (!dest) dest = new vec3();

        dest.x = vector.x - vector2.x;
        dest.y = vector.y - vector2.y;
        dest.z = vector.z - vector2.z;

        return dest;
    }  

 }  

We can now create a dedicated member variable or a static member variable to permanentely or temporarily hold the result of an operation:

class Plane { 
 
   public normal: vec3;  
   public distance: number;  

   private m_ray = new vec3();  
 
   ...  

   intersectRay(rayStart: vec3, rayEnd: vec3, dest: vec3 = null): vec3 {  
     if (!dest) dest = new vec3(); 
 
     vec3.difference(rayStart, rayEnd, this.m_ray);
  
     var d = vec3.dot(this.normal, this.m_ray);  
     var t = this.signedDistance(rayStart) / d; 
 
     vec3.difference(rayStart, this.m_ray.scale(t), dest); 
 
     return dest;  
   }  

 }  

Using this simple fix in math-heavy routines (frustum culling, collision detection, etc.) I managed to drastically reduce the number of vector instances created each frame.

And although the overall memory consumption is somewhat higher, the garbage collector has to be called less frequently, resulting in a higher, steady framerate:


Applying this fix to other frequently-instantiated classes should increase performance even more. I have updated all of TSM's vector and matrix classes accordingly.

Saturday, May 26, 2012

Detect left and right Shift, Ctrl and Alt key presses, disable system commands (Win32)

For a game prototype I'm working on (I'll publish some early screenshots soon) I needed to detect whether the left or right Ctrl, Shift and Alt keys were pressed or held down. I expected implementation to be straightforward but in fact, handling of these "special" keys in Windows is not particularily intuitive. Not only are these keys handled by a different set of messages (WM_SYSKEYUP and WM_SYSKEYDOWN as opposed to WM_KEYUP and WM_KEYDOWN for "regular" keys) but one also has to explicitly disable system commands (such as Alt + F4) by overriding the SC_KEYMENU message and setting the  WS_POPUP window style.

I've posted my message loop below. I hope it's useful to some.

 unordered_map<char, int> Window::initializeKeytable()  
 {  
      unordered_map<char, int> virtualKeys;  
   
      virtualKeys['0'] = Input::KEY_0;  
      virtualKeys['1'] = Input::KEY_1;  
      etc.  
   
      virtualKeys[VK_SPACE] = Input::KEY_SPACE;  
      virtualKeys[VK_RETURN] = Input::KEY_RETURN;  
      etc.  
   
      return virtualKeys;  
 }  
   
 LRESULT CALLBACK Win32Window::processMessages( UINT uMsg, WPARAM wParam, LPARAM lParam )  
 {  
      static unordered_map<char, int> virtualKeys = initializeKeytable();  
   
      switch( uMsg )  
      {  
           case WM_KILLFOCUS:  
   
                resetKeyStates();  
   
                break;  
   
           case WM_CHAR:  
   
                if( (int)wParam >= 32 )  
                {  
                     m_keystrokes += (int)wParam;  
                }  
   
                break;  
   
           case WM_KEYUP:  
           case WM_SYSKEYUP:  
   
                if( wParam == VK_MENU ) // alt keys  
                {  
                     if( lParam & (1 << 24) )  
                     {  
                          m_keyStates[Input::KEY_RALT] = false;  
                     }  
                     else  
                     {  
                          m_keyStates[Input::KEY_LALT] = false;  
                     }  
                }  
                else if( wParam == VK_CONTROL ) // ctrl keys  
                {  
                     if( lParam & (1 << 24) )  
                     {  
                          m_keyStates[Input::KEY_RCTRL] = false;  
                     }  
                     else  
                     {  
                          m_keyStates[Input::KEY_LCTRL] = false;  
                     }  
                }  
                else if( wParam == VK_SHIFT ) // shift keys  
                {  
                     auto rShiftState = GetKeyState( VK_RSHIFT );  
                     auto lShiftState = GetKeyState( VK_LSHIFT );  
   
                     if( (((unsigned short)rShiftState) >> 15) != 1 )  
                     {  
                          m_keyStates[Input::KEY_RSHIFT] = false;  
                     }  
   
                     if( (((unsigned short)lShiftState) >> 15) != 1 )  
                     {  
                          m_keyStates[Input::KEY_LSHIFT] = false;  
                     }  
                }  
                else // non-system keys  
                {  
                     m_keyStates[virtualKeys[(int)wParam]] = false;  
                }  
   
                break;  
   
           case WM_KEYDOWN:  
           case WM_SYSKEYDOWN:  
                  
                if( wParam == VK_MENU ) // alt keys  
                {  
                     if( lParam & (1 << 24) )  
                     {  
                          m_keyStates[Input::KEY_RALT] = true;  
                     }  
                     else  
                     {  
                          m_keyStates[Input::KEY_LALT] = true;  
                     }  
                }  
                else if( wParam == VK_CONTROL ) // ctrl keys  
                {  
                     if( lParam & (1 << 24) )  
                     {  
                          m_keyStates[Input::KEY_RCTRL] = true;  
                     }  
                     else  
                     {  
                          m_keyStates[Input::KEY_LCTRL] = true;  
                     }  
                }  
                else if( wParam == VK_SHIFT ) // shift keys  
                {  
                     auto rShiftState = GetKeyState( VK_RSHIFT );  
                     auto lShiftState = GetKeyState( VK_LSHIFT );  
   
                     if( (((unsigned short)rShiftState) >> 15) == 1 )  
                     {  
                          m_keyStates[Input::KEY_RSHIFT] = true;  
                     }  
   
                     if( (((unsigned short)lShiftState) >> 15) == 1 )  
                     {  
                          m_keyStates[Input::KEY_LSHIFT] = true;  
                     }  
                }  
                else // non-system keys  
                {  
                     m_keyStates[virtualKeys[(int)wParam]] = true;  
                }  
   
                break;  
   
           case WM_LBUTTONUP:  
   
                m_keyStates[Input::MOUSE_LEFT] = false;  
   
                break;  
   
           case WM_RBUTTONUP:  
   
                m_keyStates[Input::MOUSE_RIGHT] = false;  
   
                break;  
   
           case WM_MBUTTONUP:  
   
                m_keyStates[Input::MOUSE_MIDDLE] = false;  
   
                break;  
   
           case WM_LBUTTONDOWN:  
   
                m_keyStates[Input::MOUSE_LEFT] = true;  
   
                break;  
   
           case WM_RBUTTONDOWN:  
   
                m_keyStates[Input::MOUSE_RIGHT] = true;  
   
                break;  
   
           case WM_MBUTTONDOWN:  
   
                m_keyStates[Input::MOUSE_MIDDLE] = true;  
   
                break;  
   
           case WM_MOUSEWHEEL:  
   
                if( (short)HIWORD(wParam) > 0 )  
                {  
                     m_keyStates[Input::MOUSE_WHEEL_UP] = true;  
                }  
                else if( (short)HIWORD(wParam) < 0 )  
                {  
                     m_keyStates[Input::MOUSE_WHEEL_DOWN] = true;  
                }  
   
                break;  
   
           case WM_PAINT:  
   
                PAINTSTRUCT ps;  
                BeginPaint( m_handle, &ps );  
                EndPaint( m_handle, &ps );  
   
                break;  
   
           case WM_CLOSE:  
   
                PostQuitMessage( 0 );  
                m_isActive = false;  
   
                break;  
   
           case WM_SYSCOMMAND:  
   
                switch( wParam & 0xFFF0 )  
                {  
                     case SC_KEYMENU:  
                     case SC_SCREENSAVE:  
   
                     return 0;  
                }  
   
                break;  
   
           default:  
   
                return DefWindowProc( m_handle, uMsg, wParam, lParam );  
      }  
   
      m_keyStates[Input::KEY_ALT]  = m_keyStates[Input::KEY_LALT]  || m_keyStates[Input::KEY_RALT];  
      m_keyStates[Input::KEY_CTRL] = m_keyStates[Input::KEY_LCTRL] || m_keyStates[Input::KEY_RCTRL];  
      m_keyStates[Input::KEY_SHIFT] = m_keyStates[Input::KEY_LSHIFT] || m_keyStates[Input::KEY_RSHIFT];  
   
      return 0;  
 }  

Monday, January 23, 2012

Material system

I've been working on the material system lately. A material handles the textures and shaders used when rendering a mesh. Every material has its own shading program, which consists of a vertex shader, a fragment shader and an optional geometry shader. Some of the material's properties, like its number of texture layers, are compiled directly into the shaders, while others, like material colors, can be updated at run-time using uniform variables.

Materials can be made up of several layers which are blended on top of each other with a variety of blend modes, very similar to Photoshop. A texture layer can contain a diffuse texture with an optional alpha channel (for transparency effects), a normal map with a height component stored in the alpha channel (for parallax mapping and similar effects) and a specular map. Internally, texture layers are represented by array textures. This brings the advantage that only one texture object has to be bound per layer, effectively reducing the number of texture binds by two thirds (diffuse, normal and specular maps are all adressed by the same sampler in the shader).

In addition, texture layers can be scrolled, stretched or rotated for effects like animated computer screens etc. This was heavily used in Quake 3, for example. I'll show an example of that later (if I don't forget as usual).

The wall on the screenshot is composed of four images (as seen on the left). You can see the material definition in the upper left corner.


Monday, January 9, 2012

Level export and portal rendering

I've been a busy bee these past months even though I failed, as usual, to report on my progress.. so here's a short recap.

I designed the engine's level format and wrote a Blender script which creates levels from prepared .blend files. Levels use portals and bounding volume hierarchies for fast visibility detection. This way, both indoor and outdoor scenes can be rendered reasonably fast and, most importantly, mixed. For now, portals have to be set manually in Blender. It works like this:
  1. The artist creates an indoor level as a series of connected rooms. Every room must be a separate object.
  2. Portal polygons are created for every door, window, etc. which connects two rooms or a room with the outside. Every portal polygon must be kept as a separate object, its name must begin with "p_" to tell the export script it's a portal.
  3. The export script creates a list of rooms and portals and their connections. In addition, a bounding volume hierarchy is created from the rooms and free objects (landscapes, detail meshes, etc.)
This way, portals are merely an optional optimization; if the scene you're working with it rather simple or you simply want to preview a complex level and don't care to much about speed, you can simply skip portal creation; potential rooms will then be placed in the bounding volume hierarchy.

A portal's definition in the level file may look like this:

 portal02                       // portal's name
 room02                         // target room's name  
 4                              // number of vertices
 -2.947355 0.001296 -4.529554   // vertex #1   
 -2.947347 3.001296 -4.529551   // vertex #2    
 -4.947346 3.001301 -4.529550   // vertex #3    
 -4.947354 0.001301 -4.529554   // vertex #4   

The next step was to put the new information to use. When a camera's render routine is called, it first determines the smallest sector it is in. Then it traverses the portals connected with this sector and recursively performs the following steps:
  1. If it is not completely visible, the portal is clipped to the screen.
  2. If the clipped portal has more or less than four vertices, it is replaced by its projected bounding rectangle.
  3. A new view frustum is computed from the clipped portal, using the portal's plane as the new near clipping plane.
  4. The sector is rendered using the new frustum.
  5. The entity components (meshes, sprites, etc.) contained in the sector are rendered.
  6. Sub-sectors are rendered.
Here's a screenshot with debug visualizations of the portals created during the recursive process:


The player's camera sees what is displayed in the top-right viewport. It is positioned in the first room. The doorway acts as a portal to the adjacent corridor, which in turn acts as a portal to the next room, and so on. This means that only parts of all rooms following the first are visible from the camera's current position, and only these parts and the entities they contain have to be rendered.The white frustum visualizes the camera's perspective from the first room, the turquoise frustum its perspective from the adjacent corridor, etc.