A typical process tends to have these steps:
- Theorize – Review possible causes based on contextual knowledge of the project as a whole.
- Establish Repro Case – Check whether the issue is reproducible on another machine and by another person. If necessary, create isolated test cases based on the possible theories.
- Narrow Down the Possibilities – Validate the nature of the frame rate issue by using the available debug tools (e.g. an FPS monitor, draw call data, etc.) to do a sweep of the level and identify whether the issue is related to tech, art or design changes. If it's art or design related, check the file history. Assess whether new tools are needed to identify the causes.
- Compare and Contrast – What is different with this map/build/script/code? What has changed? Establish whether this problem has always existed by rolling back to an alternate build and comparing.
- Isolate Probable Causes – Turn off various features and subsystems to isolate and identify the cause of the problem (e.g. vegetation, AI, scripts, physics, sound, etc.).
- Resolution – Once the cause is established and understood, communicate the nature of the issue to the relevant members of the team, along with any new constraints that need to be defined to prevent future problems. As much as possible, try to make sure the solution addresses the cause and not the symptoms. Short-term fixes always bite you in the ass later.
Usually each problem gets isolated and diagnosed in a pretty unique manner, depending on the engine, the project, the tools that might or might not initially be available, and of course the current stage of project development. Here are a few brief examples of a variety of issues and the solutions used to resolve them:
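As a rough illustration of the "Isolate Probable Causes" step, the sketch below disables one subsystem at a time and ranks them by how much frame time each one gives back. Everything in it is hypothetical: `measure_frame_ms` stands in for whatever capture hook a given engine exposes, and `FAKE_COSTS` is made-up data so the example can run on its own.

```python
from typing import Callable, Dict, FrozenSet, Iterable


def rank_subsystems_by_cost(
    measure_frame_ms: Callable[[FrozenSet[str]], float],
    subsystems: Iterable[str],
) -> Dict[str, float]:
    """Return the frame time saved (ms) by disabling each subsystem on its own."""
    all_on = frozenset(subsystems)
    baseline = measure_frame_ms(all_on)
    savings = {}
    for name in all_on:
        # Re-measure with exactly one subsystem switched off.
        savings[name] = baseline - measure_frame_ms(all_on - {name})
    # Largest saving first, so the most likely culprit heads the list.
    return dict(sorted(savings.items(), key=lambda kv: kv[1], reverse=True))


# Made-up per-subsystem costs in milliseconds (assumption, not real data).
FAKE_COSTS = {"vegetation": 2.0, "ai": 3.5, "scripts": 1.0, "physics": 14.0, "sound": 0.5}


def fake_measure(enabled: FrozenSet[str]) -> float:
    """Stand-in for a real capture: fixed base cost plus enabled subsystems."""
    return 8.0 + sum(FAKE_COSTS[name] for name in enabled)


if __name__ == "__main__":
    for name, saved in rank_subsystems_by_cost(fake_measure, FAKE_COSTS).items():
        print(f"{name:<12} saves {saved:5.1f} ms when disabled")
```

Toggling one subsystem at a time assumes the costs are roughly independent. When two systems interact (physics debris feeding the renderer, for example), disabling them in pairs or bisecting the set may be needed.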
- Tech / engine / tools pipeline related level performance issues:
In a previous project that contained a destruction system, as you progressed and destroyed buildings through any of the levels, the performance deteriorated even if you were looking at nothing. The initial theory pointed to the physics system where it was speculated that objects were not properly coming to rest.
I ended up creating several test cases using the engine debugger as well as external physics debuggers to help isolate the cause. One test case was simply a performance check of a single building before and after destruction. I was able to deduce that the primary slowdown was caused by a performance optimization which collapsed all of the destruction debris into a single draw call. Once the physics objects came to rest they were being “hidden” from the scene rather than completely cleared. This also had the side effect of drawing an entire building even if only a single piece of debris was in the camera frustum. Imagine debris scattering all over a level and then being hidden, resulting in every object in the game being drawn all the time.
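To make the side effect concrete, here is a minimal sketch comparing per-piece culling against culling a merged batch by its combined bounds. It is not the engine's actual code: the 2D `Box` stands in for real 3D bounds and a camera frustum, and the piece counts and triangle numbers are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned 2D bounds; a simplified stand-in for 3D bounds or a frustum."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def overlaps(self, other: "Box") -> bool:
        return (self.min_x <= other.max_x and self.max_x >= other.min_x
                and self.min_y <= other.max_y and self.max_y >= other.min_y)


Piece = Tuple[Box, int]  # (bounds, triangle count)


def union(boxes: List[Box]) -> Box:
    """Smallest box that contains every input box."""
    return Box(min(b.min_x for b in boxes), min(b.min_y for b in boxes),
               max(b.max_x for b in boxes), max(b.max_y for b in boxes))


def triangles_drawn_per_piece(view: Box, pieces: List[Piece]) -> int:
    """Each piece is tested on its own, so only visible pieces are drawn."""
    return sum(tris for bounds, tris in pieces if bounds.overlaps(view))


def triangles_drawn_merged(view: Box, pieces: List[Piece]) -> int:
    """All pieces share one draw call, so the batch is drawn all-or-nothing."""
    if not pieces:
        return 0
    merged_bounds = union([bounds for bounds, _ in pieces])
    if not merged_bounds.overlaps(view):
        return 0
    return sum(tris for _, tris in pieces)


def scattered_debris(count: int, tris_each: int) -> List[Piece]:
    """Hypothetical debris field: 5x5 pieces spaced 10 units apart along x."""
    return [(Box(i * 10.0, 0.0, i * 10.0 + 5.0, 5.0), tris_each) for i in range(count)]


if __name__ == "__main__":
    pieces = scattered_debris(100, 500)
    view = Box(0.0, 0.0, 12.0, 5.0)  # only the first two pieces are in view
    print("per-piece:", triangles_drawn_per_piece(view, pieces))
    print("merged:   ", triangles_drawn_merged(view, pieces))
```

The wider the debris scatters, the larger the merged bounds grow, so the batch is almost never culled even when very little of it is on screen.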
In addition, physics objects were failing to come to rest for several reasons, ranging from debris penetrating the world and falling forever to process issues stemming from human error, a poor authoring process, and a lack of ownership for the material system. Having artists hand tag materials with complex physics variables was not the best way to get results.
I ended up working with the engineers and artists to redesign and implement the entire physics system and content authoring process. We changed the destruction system to spawn instanced objects instead of custom unique debris, which in turn led to changes in how art and design authored the levels and assets with instanced geometry, while keeping the unique look when walls were destroyed. This also led to changes in how the material system was architected and how data was managed, by separating physics effects from gameplay materials. Global materials were also defined to cover 90% of the possible cases from a centralized location, and we added a new default/undefined physics material that was easy to use for debugging purposes.
- Various art and design related issues:
- Overzealous use of physics / destructible objects – This caused problems when too many of the 3000 poly debris pieces spawned. Worked with artists on making sure we spawned fewer pieces and lowered the poly count. Devised a “rule of mass conservation” for destruction art where all of the mass should be accounted for, but a lot of it can become dust and smaller particles. Also added a vaporization parameter for designers to control the amount of debris released.
- Overuse of screen post-processing effects that blew the draw call limits – This problem only revealed itself when finalized art with all of the different materials came into the scene, which in conjunction with the post-processing caused the total draw call count to skyrocket. This was resolved by redesigning the post-processing effects to require fewer rendering passes.
- Too many textures – Too many different AI archetypes in use at the same time blew the texture budget, causing the system to page to disk to load the new textures. Reduced the variety of AI archetypes that could be active at once.
- Too many ray casts – Caused by too many shotgun-wielding AI all firing at the same time. A little tricky to isolate, as the types of AI and weapon loadouts that spawned were random. Reduced the number of shotgun AI that could spawn and the number of ray casts each shotgun fired.
- In addition to the cases described above, I have seen a variety of other causes for frame rate issues such as overblown draw call count, unnecessarily high poly count, improperly placed occlusion planes, poor use of BSP portals, bad level flow for a sectorization system, overloaded particle systems, overly detailed sound volumes, fill rate issues due to large transparent particles, excessive use of dynamic lights, sounds being played that couldn’t be heard, etc.
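The “rule of mass conservation” and the vaporization parameter from the destruction item above can be sketched roughly as follows. This is only one plausible reading of the idea, not the shipped implementation: `plan_debris`, `DebrisPlan`, the chunk cap and all of the numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DebrisPlan:
    chunk_count: int   # rigid-body debris pieces to spawn
    chunk_mass: float  # total mass carried by those pieces
    dust_mass: float   # remaining mass handed to dust and small particles


def plan_debris(total_mass: float, mass_per_chunk: float,
                vaporization: float, max_chunks: int) -> DebrisPlan:
    """Split destroyed mass between physics chunks and cheap particle effects.

    vaporization is a designer-facing value in [0, 1]: 0 keeps as much mass
    as possible in chunks, 1 turns everything into dust.
    """
    if not 0.0 <= vaporization <= 1.0:
        raise ValueError("vaporization must be between 0 and 1")
    if total_mass < 0 or mass_per_chunk <= 0 or max_chunks < 0:
        raise ValueError("masses must be positive and max_chunks non-negative")
    solid_mass = total_mass * (1.0 - vaporization)
    # Cap the number of chunks so physics cost stays within budget.
    chunk_count = min(max_chunks, math.floor(solid_mass / mass_per_chunk))
    chunk_mass = chunk_count * mass_per_chunk
    # Whatever is not spawned as chunks is still accounted for as dust.
    return DebrisPlan(chunk_count, chunk_mass, total_mass - chunk_mass)


if __name__ == "__main__":
    print(plan_debris(total_mass=1000.0, mass_per_chunk=50.0,
                      vaporization=0.5, max_chunks=8))
```

The point of the design is that lowering the chunk count never makes mass disappear; it only moves it into effects that are far cheaper than simulated rigid bodies.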