This article originally appeared in the “Inner Product” column in Game Developer Magazine, January 2006

Last month I described an optimization that needs to be done early on in a project, if it is to be done at all. This month I expand on the theme of early optimizations, and present a few examples of what I call “Mature Optimization”.

CONVENTIONAL WISDOM

Every year a fresh crop of young programmers enters the games industry with the wisdom of books and lectures swirling around their heads. One pearl of wisdom is drilled deep into their tender programming muscles, a pearl sometimes known as “Hoare’s Dictum”:

“Premature optimization is the root of all evil” – C.A.R. Hoare.

Unfortunately, in the mind of the freshly minted programmer entering the games industry, this quote seems to turn into:

“Early optimization is evil” – Junior Game Programmer, 2005

The legendary Donald Knuth most famously weighed in on this subject when he wrote these often quoted (and often misleadingly punctuated) sentences:

“We should forget about small efficiencies, say about 97% of the time. Premature optimization is the root of all evil” – Knuth, Literate Programming, 1992, p28.

Nobody wants to argue with Knuth – it is the equivalent of arguing against the second law of thermodynamics. However, the problem here is that too often people misunderstand what Knuth was getting at. Ask a game programmer when optimization should be done, and they will tell you “after the code is written, we can profile it, see where the bottlenecks are, and eliminate them”.

CONVENTIONAL GAME OPTIMIZATION

Often that is exactly what happens. Towards the end of the project we profile the code, and find that 20% of our CPU time is taken up by a function like HandleMessageEvent(), so we track down why that’s taking so much time, then we fix it.

Then we tackle our next function on the profiler’s list; maybe one that takes 10% of the CPU time. We fix that, and then tackle a few more. Our game now runs 30% faster, and our fattest function takes just 1% of our CPU time.

However, if the code is still too slow after that, you will find that the cause is not a few weighty functions, but rather a general inefficiency spread across all of the code. By ignoring optimization until the end of the project you have ended up with “sick code” that contains multiple inefficiencies too deeply ingrained within the fabric of the code to be removed in a timely manner.

WHAT KNUTH REALLY SAID

When Knuth said, “we should forget about small efficiencies, say about 97% of the time”, he was actually defending something he had just done on the previous page, which was to optimize a loop to make it 12% faster, using “goto”.

Knuth then states my main point here – something contrary to what most programmers initially seem to take from Hoare’s Dictum:

“The conventional wisdom shared by many of today’s software engineers calls for ignoring efficiency in the small; but I believe this is simply an overreaction to the abuses they see being practiced by penny-wise-and-pound-foolish programmers, who can’t debug or maintain their ‘optimized’ programs” (Knuth, 1992, p28)

There you have it. Conventional wisdom was wrong in 1992. Sadly, conventional wisdom continues to be wrong 14 years later. Programmers, even game programmers, are still starting out with the misconception that optimization is evil, and so they ignore optimization until the end of the project.

MATURE OPTIMIZATION

What is evil here is a very specific type of optimization: “premature optimization”, which is optimization done early in development without a real understanding of whether it is necessary, and which may have an overall negative effect on the project.

Given that we have an evil type of optimization called “premature optimization”, it follows that there is a contrasting type of optimization that is not evil, which we naturally call “Mature Optimization”.

Mature optimization is any optimization done early in development that you know in advance will provide a significant performance boost without unmanageable side effects. Mature optimizations are often well-known techniques that were successfully used before. Mature optimizations include small local modifications, coding standards, and architecture-level design decisions.

GAME OPTIMIZATION

Game programming is complicated, and getting more complicated every year. I think it’s now more complicated than either Hoare or Knuth could really appreciate. A game engine is composed of a large number of systems, each handling a part of the game, and each system is itself often vastly more complex than an entire game of 20 years ago. Since each system is so complex, and each contributes only a small slice of CPU usage, individual optimizations are not often going to have much effect on overall performance. To ensure our game runs as fast as possible, we need to optimize our code as it is developed, using every mature optimization available to us.

Developing a game is more and more about developing the content for that game (rather than developing the code). When developing that content, you cannot really get a good idea of how the game plays unless the code is running at the speed you hope it will run at when you ship the game. This is especially true of action and sports games, where the “feel” of the game is a vital part of the gameplay, and very important in establishing a difficulty curve. Thus, it is beneficial to do as much optimization as possible in the early stages of a project.

Most games are real-time applications. A very large chunk of the code is running thirty (or hopefully sixty) times per second. In developing business applications, a response time of a large fraction of a second for a single transaction is quite acceptable. In the games industry we need to advance the state of the entire game in just 0.0166 seconds. It is very rare for a single transaction or operation (say total collision detection) to be budgeted for more than one millisecond (0.001 seconds).

Processing of triangles and other graphics primitives has mostly been off-loaded to the GPU. The CPU is mostly concerned with iterating over game objects, and doing things with them. Optimization of games even a few years ago was often heavily weighted towards optimizing the rendering loop. The balance here has been shifting over the years to the bulk of the CPU time being used by game logic, AI and physics. The result is that there is less “low hanging fruit” to optimize.

The relatively high power of modern CPUs has also contributed to a shift toward physics calculations taking up more and more of the CPU’s power. Here again it is important that optimization be completed early, so that the designers can design their game scenarios within well-defined limits on the complexity of the physics simulations they can use, specifically the number of objects in a scene. If you leave the optimization until too late, then you are either going to have a scene that runs too slowly (and may require some last-minute butchering), or a scene that is much lower in complexity than the engine can support, and which looks weak compared to the competition.

Given that overview of the issues relating to optimizing games, I will now give a few representative examples of mature optimizations.

AVOID ALLOCATIONS

Memory allocations are expensive. If you have some system that is doing a large number of small allocations from the main heap, on a continuous basis, then consider making some kind of custom allocator. A good example of this is a particle system. If you know a particle system has a maximum of 1024 particles, then you should use a high-speed pool rather than a heap.
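To make this concrete, here is a minimal sketch of what such a pool might look like, assuming a hard cap of 1024 particles; the Particle layout and the names are placeholders rather than code from any particular engine:

    #include <cstddef>

    struct Particle {
        float pos[3];
        float vel[3];
        float life;
        Particle* nextFree;   // intrusive free-list link, only meaningful while the slot is unused
    };

    class ParticlePool {
    public:
        static const std::size_t kMaxParticles = 1024;

        ParticlePool() : freeList(0) {
            // Thread every slot onto the free list up front; no heap work after this.
            for (std::size_t i = 0; i < kMaxParticles; ++i) {
                particles[i].nextFree = freeList;
                freeList = &particles[i];
            }
        }

        Particle* Alloc() {
            if (!freeList) return 0;     // pool exhausted; the caller decides what to drop
            Particle* p = freeList;
            freeList = p->nextFree;
            return p;
        }

        void Free(Particle* p) {
            p->nextFree = freeList;
            freeList = p;
        }

    private:
        Particle  particles[kMaxParticles];   // one contiguous block, allocated exactly once
        Particle* freeList;
    };

Alloc() and Free() each just move one pointer, so per-frame particle churn never touches the heap, and running out of particles is an explicit, handled case rather than a hidden allocation spike.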

This kind of memory system optimization is well known in game development. However, it does fall into the category of a mature optimization, since it can be tricky and dangerous to replace one method of memory allocation with another toward the end of a project.

AVOID RTTI

RTTI, or Run-Time Type Identification, is what allows you to use dynamic_cast on a base class pointer to see what type of derived class it actually points to. The problem with RTTI is that it is very slow. It depends on the specific compiler implementation, but on many compilers a dynamic_cast is implemented using the strcmp function on the name of the class. If your profiling shows that strcmp is registering above 0.1% of CPU time, then suspect RTTI.

Instead of using dynamic_cast you should first identify those classes that require some kind of run-time type checking, consider whether you really need it, and then incorporate a type member variable into the base class, with an enum providing the type numbers. You then do your own RTTI by checking the type variable to verify the type (if needed), and then use static_cast to do the actual cast.
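A minimal sketch of that pattern might look like this; the class names and enum values are purely illustrative:

    enum ObjectType {
        OBJECT_GENERIC,
        OBJECT_ENEMY,
        OBJECT_PROJECTILE
    };

    class GameObject {
    public:
        explicit GameObject(ObjectType t) : type(t) {}
        virtual ~GameObject() {}
        ObjectType GetType() const { return type; }
    private:
        ObjectType type;      // set once by each derived class's constructor
    };

    class Enemy : public GameObject {
    public:
        Enemy() : GameObject(OBJECT_ENEMY) {}
        void TakeDamage(float amount);
    };

    // Instead of: Enemy* e = dynamic_cast<Enemy*>(obj);
    inline Enemy* AsEnemy(GameObject* obj) {
        if (obj && obj->GetType() == OBJECT_ENEMY)
            return static_cast<Enemy*>(obj);   // type already verified, so the cast is free
        return 0;
    }

The check is a single integer compare, rather than a walk through the class hierarchy comparing type names.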

The reason avoiding RTTI qualifies as a mature optimization is that it can be almost impossible to apply towards the end of the project. If a programmer starts implementing a complex system with many usages of RTTI, then it can be a significant undertaking to remove it. Adding your own optimized version of RTTI is best done early.

AVOID ALIASING PROBLEMS

Aliasing is an insidious problem with compiled code. Aliasing itself is a straightforward concept – the same piece of memory can be pointed at by two different pointers. The performance problem occurs when the compiler assumes that the contents of memory can be changed by something other than the current code, and hence continually re-fetches a value from memory when you, the programmer, know that it is not going to change.

How you handle aliasing problems depends on your compiler. You could simply tell the compiler to assume there is no aliasing at all. You might be able to use the “__restrict” keyword (the exact spelling varies by compiler) selectively to tell the compiler that a pointer cannot be aliased. Both of these solutions have the characteristics of a mature optimization, in that they are best applied early in the project if they are to take effect safely. It is also a mature optimization in that it often takes a mature programmer to spot it as a problem. Detecting aliasing as an issue can require that you look at the disassembly to see exactly what instructions are being generated.
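A tight loop over separate input and output arrays is the classic case. The keyword is spelled __restrict on MSVC and __restrict or __restrict__ on GCC, so treat this as a sketch to adapt to your toolchain:

    // Without the qualifier the compiler must assume that a write through 'out'
    // could change 'positions' or 'velocities', and re-load them every iteration.
    void IntegratePositions(float* __restrict out,
                            const float* __restrict positions,
                            const float* __restrict velocities,
                            float dt, int count)
    {
        for (int i = 0; i < count; ++i)
            out[i] = positions[i] + velocities[i] * dt;
    }

With the qualifier in place, the compiler is free to keep the inputs in registers and to unroll or vectorize the loop.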

AVOID PER-FRAME CODE

Not everything needs to be done 60 times a second. A huge saver of CPU time is to make certain aspects of the game logic run at a frame rate slower than the screen refresh rate. Most physics will look quite reasonable running at 30 frames per second. Some types of logic, such as path-finding, can be spread over a large number of frames.
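One simple shape this can take is a frame counter that gates the slower subsystems; the update functions here are placeholders for whatever entry points your engine actually has:

    // Placeholder subsystem hooks.
    void UpdateInput(float dt);
    void UpdateCamera(float dt);
    void UpdatePhysics(float dt);
    void UpdatePathfinding();

    // Called once per rendered frame, e.g. at 60 Hz.
    void GameUpdate(float dt)
    {
        static unsigned frame = 0;
        ++frame;

        UpdateInput(dt);               // cheap and latency-sensitive: every frame
        UpdateCamera(dt);              // must stay smooth: every frame

        if ((frame & 1) == 0)
            UpdatePhysics(dt * 2.0f);  // 30 Hz, with a timestep to match

        if ((frame & 7) == 0)
            UpdatePathfinding();       // expensive AI work spread over many frames
    }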

This is a good example of a mature optimization. Many games use this form of optimization. It can be a massive time saver, cutting as much as 20% to 50% off your CPU time. It is also something that is very difficult to add late in the project: the change from logic running synchronized with the rendering frame to logic running independently can introduce all kinds of obscure timing bugs.

AVOID ITERATIONS

Doing any single thing on a modern CPU takes no time at all. Where we fall over is when we need to do that single thing multiple times. Whenever you are iterating over some list of things in your game, consider how you might be iterating over fewer things, or not iterating at all.

For example, let us say we want to automatically target enemies that are within range. The simple way to do this is to iterate over all objects in the game, see which can be targeted, and of those see which are within range.

The problem here is that the list of all objects might contain only a few that are actually targetable. You could optimize this by maintaining a separate list of all the objects that can be targeted. When targetable objects are created and destroyed, they are added to, and removed from, this list. Then when you want to target something, you just iterate over the shorter list.
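A sketch of that bookkeeping, using made-up names and a plain std::vector, might look like this:

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    struct Targetable {
        float x, y, z;    // world position, kept up to date by the object itself
    };

    static std::vector<Targetable*> s_targetables;   // much shorter than "all objects"

    void RegisterTargetable(Targetable* obj)     // call when a targetable object is created
    {
        s_targetables.push_back(obj);
    }

    void UnregisterTargetable(Targetable* obj)   // call when it is destroyed
    {
        s_targetables.erase(
            std::remove(s_targetables.begin(), s_targetables.end(), obj),
            s_targetables.end());
    }

    Targetable* FindNearestTarget(float x, float y, float z, float range)
    {
        Targetable* best = 0;
        float bestDistSq = range * range;
        for (std::size_t i = 0; i < s_targetables.size(); ++i) {   // walks the short list only
            Targetable* t = s_targetables[i];
            float dx = t->x - x, dy = t->y - y, dz = t->z - z;
            float distSq = dx * dx + dy * dy + dz * dz;
            if (distSq <= bestDistSq) {
                best = t;
                bestDistSq = distSq;
            }
        }
        return best;
    }

The price is two trivial list operations at create and destroy time; in exchange, the targeting code never walks the full object list during gameplay.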

PROFILE INLINE FUNCTIONS

When profiling your code at a function level, most profilers will not be able to give you any kind of reading for inline functions. This is because the optimizing compiler will interleave the inline function’s instructions with the instructions of the calling function, making it impossible to tell when the inline function is being executed.

While this is expected behavior, it can often hide problems with weighty inline functions that are either too slow, or called too often. The culprit will often show up as some large higher level function, which calls several inline functions.

To get a clearer picture of exactly what is going on here, you can turn off the expansion of inline functions just for the compilation unit that contains the high-level function. This should reveal the inline functions to the profiler. Recognize that it is not an entirely accurate picture of what is going on, but it will give you an indication of the balance of CPU usage between the high-level function and the inline functions, and hence indicate where you need to focus your optimization efforts.
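The mechanics are compiler-specific, so check your toolchain’s documentation, but the usual options look something like this (the DotProduct function is just a stand-in for whichever inline function you suspect):

    // Option 1: disable inline expansion for this one translation unit only.
    //   MSVC:      compile the file with /Ob0 (or use #pragma inline_depth(0))
    //   GCC/Clang: add -fno-inline to the flags for this one .cpp file
    //
    // Option 2: temporarily mark a suspect function so it is never expanded,
    // which makes it show up in the profile under its own name:
    #if defined(_MSC_VER)
        #define PROFILE_NOINLINE __declspec(noinline)
    #else
        #define PROFILE_NOINLINE __attribute__((noinline))
    #endif

    PROFILE_NOINLINE float DotProduct(const float* a, const float* b)
    {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

Remember to remove the change afterwards, since the extra call overhead itself distorts the numbers slightly.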

PROCESS OFFLINE

Load times for games need to be short. Yet too often we get to the end of a project and find the load times are over a minute. Some advance optimization here can save you a lot of trouble later.

The most obvious optimization is to load all your assets as one big file (a ZIP, PAK or WAD file), rather than a large number of individual files. This is a very mature optimization – all developers should be using this technique, especially when developing for consoles – and yet it is surprising just how often the problem comes up. Developers often start off by loading individual files because, at the start of development, there are simply not many files to load and the load times are bearable.

Towards the end of a project, there is a rapid increase in the number of assets in the game, and load times soar. Patching in a fast-loading system at this point can be very difficult, especially with limited memory available on a console to act as a buffer. You will be forced to do some kind of optimization at the end of a project. Do it early. Do it maturely.
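The pack file format itself does not need to be elaborate. A minimal layout (purely illustrative, not any shipping format) is just a header, a table of contents, and the raw asset data back to back:

    #include <cstdint>

    // Fixed-size header at the start of the pack file.
    struct PackHeader {
        char     magic[4];    // e.g. "PAK1", to sanity-check the file
        uint32_t fileCount;   // number of PackEntry records that follow
    };

    // One table-of-contents entry per packed asset.
    struct PackEntry {
        char     name[56];    // asset path, zero-padded
        uint32_t offset;      // byte offset of the asset data within the pack
        uint32_t size;        // size of the asset data in bytes
    };

    // An offline tool writes: PackHeader, then fileCount PackEntry records, then
    // all the asset data. At runtime the game reads the table of contents once,
    // keeps it in memory, and loading an asset becomes a single seek plus one
    // large sequential read from a file that is already open.

That keeps the expensive work of gathering and laying out files offline, where it belongs.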

FURTHER READING

– Hyde, Randall, “Embedded-system programmers must learn the fundamentals”, EDN, 5/26/2005 http://www.edn.com/article/CA601846.html
– Kirmse, Andrew, “Optimization for C++ Games”, Game Programming Gems 3, pp 5-15
– Grinke, Sebastian, “Minimizing Agent Processing in Conflict: Desert Storm”. AI Game Programming Wisdom 2, pp 373-378

Follow Up: Notes on “Mature Optimization”

More Follow Up:

It turns out Knuth actually wrote that back in 1974, in this paper: “Structured Programming with go to Statements”, Computing Surveys, Vol. 6, No. 4, December 1974, page 268 (page 8 of the PDF).

It’s amusing how many people who quote Knuth on premature optimization are also firm believers in “goto considered harmful”, not realizing that the paper they are quoting is basically a paean both to goto and to mature optimization.

Update: Similar article by Randall Hyde
http://www.acm.org/ubiquity/views/v7i24_fallacy.html