Cowboy Programming

Game Development and General Hacking by the Old West

June 3rd, 2010

1995 Programming on the Sega Saturn

This is a document I wrote in 1995, while working on Neversoft’s first game: Skeleton Warriors.  It was the first game I’d worked on where I did not use 68K Assembly.

The photo shows me at work around that time. The Saturn dev kit (the “Small Box” and the ICE) is on the right.

The State of the game

The following document briefly describes the state of the code for Skeleton Warriors on the Sega Saturn, and also points at some of the many things still to be done…..

The primary purpose of this document is to effectively bring Dan, Ken and James up to speed with the current code, explaining what each module does, and how they interact. It will also give Mick an overview of the sad and sorry state of his code, hopefully prompting him to pull his socks up.

I will also maybe go into a little detail regarding the incorporation of data (the .GOB and .GOB files) into the program, and what we will be doing in the future.

The Development Hardware

The target machine is the Sega Saturn, which has two SH2 Risc microprocessors and one 68000. Currently we only use the Master SH2, the slave SH2 will be used when we get around to figuring out how. The 68000 is used to drive the sound chip, we should not have to write any code for that as we will be using the Sega supplied sound library.

The program is nearly all written in plain C. We use the GNU SH2 compiler to produce the SH2 assembly output. There are a few SH2 modules, mostly limited to pure data. I have not, as yet written anything at all of significance in SH2.

The development system we use is PsyQ. This is not the Sega standard development system, however, it is considered to be the best by everyone who has tried it. The alternative system is SNASM, by Cross Products, who are owned by Sega. Most of the sample code supplied by Sega is intended to run on the SNASM development system, however it is easily convertible to the PsyQ.

The PsyQ system consists of a SCSI interface board that you install on the PC, and a cartridge that plugs into the Saturn, and a cable to connect the two. Source is compiled on the PC and downloaded to the Saturn where it is run. The code can be debugged from the PC.

The link is controlled by a TSR program (PSYBIOS) that handles communications between the machines. It also allows the Saturn to load files from the PC in much the same way it would load them from CD. We use this feature to load the files for each level.

Mick also has a couple of large, and loud boxes in his room. He also has two PCs. The smaller of the two boxes is a E7000PC, which is an SH2 In Circuit Emulator, useful for telling where you program has crashed if the PsyQ debugger does not stop it. Also useful for watching memory writes, but I've not used it very much yet.

The Second of the loud boxes is what is known as a "Small Box" (The original "Large Box" was about the same size as a small fridge). This is essentially a Saturn, with extra interfaces for the E7000 and for the CD emulator. It also has switches on the front to change the country ID, and to switch between PAL and NTSC.

The CD emulator is what the other computer has inside it. A big board which uses the hard drive of the computer to pretend to be a CD drive. You can build a CD image on this and emulate in real time mode to see what the game will look like when it actually gets to CD. This seems to nearly more or less work to some extent, though there are a few problems which I am working with Sega to try and resolve.


Compilation and linking

The overall construction of the final program is controlled by a single makefile: MAKEFILE.MAK. This contains the dependencies and targets for the entire project, including the compilation of the .GOB and .GOV files.

Individual C source (.C) modules are compiled into SH2 (.OBJ) object modules by the program CCSH. This first calls the GNU C preprocessor CPPSH (in the C:\GNUSH2\BIN) then calling CC1SH on this output to produce SH2 assembly code, then finally calling ASSH (in C:\PSYQ) to assemble this into the final object format.

We do not use C++, as I was told it produced prohibitively large object files. However I have not experimented with this, and those // comment would be useful. Feel free to experiment.

The few SH2 assembly language files we us (with .S extension) as simply assembled directly to .OBJ files using ASMSH (not the same as ASSH, it is a more complex macro assembler). At the moment, they are really only used for the simple incorporation of data, and have no machine specific code in there.

The RAM on the Saturn that you can put code in is split into two one megabyte chunks. One at address $06000000 and the other at $00200000. The chunk at $00200000 is used exclusively to hold the graphics for the main character. The program code actually goes at $06010000 (the first $10000 bytes are used for system space, the stack and suchlike.)

The code is position dependent and is complied to run at that specific address ($06010000) and no other.

The .OBJ files are linked together with the PSYLINK program to produce a file MAIN.CPE which is an executable program with a small header that can be downloaded to the Saturn with the RUN program. PSYLINK uses the file TEST.LNK to specify which .OBJ files to include and where to put them.

The Data

The game is split into a number of levels, many of the levels use some of the same data, but mostly it is different from level to level. Each level has all the data for that level brought together in two huge files .GOV and .GOB. (for the mine it's MINE.GOV and MINE.GOB). The GOV file contains a short header, then all the data that need to be in video memory. The .GOB file contains all the data that need to be in RAM.

A level consist of some of the following data files.

.SSQ – A sprite sequencer file

.SBM – A bitmap file, used for the bitmap backgrounds

.MAP- Both maps for the character mapped backgrounds.

.TIL – The tilesets and palettes for the character mapped backgrounds.

.PTH – The data for the path and trigger points.

.TEX – Textures for the path.

.SSQ and .SBM files are produced by Mick's increasingly irrelevant sequencer "SEQ".

.MAP, .TIL, .PTH and .TEX files are produced by Dan's increasing amazing map editor "TULE".

These files are assembled into the appropriate .GOV and .GOB files, using the ASMSH assembler. See files LEVEL.S and LEVEL1.S to see how it is done. Some level specific data is also included in the .GOV file.

The Modules

TEST.S – Nothing really, sets up a few labels.

MAIN.C – The top level of the program. Contains the hardware initialization, the level setup, the level playing code and various other little bits and pieces that should really be put in more appropriate modules. Has a fair amount of garbage in as it is the module easiest to put things in for quick testing. Contains the code for loading from CD or PC fileserver. Contains the Flag for switching the TIMING color bars on and off.

GFXLIB.C – Various routines for accessing the hardware and performing various graphics functions. These were all written pretty much from scratch by Dan and are often very inefficient. If you find yourself using a routine from here a lot, it might be a good idea to take a look at what it is doing and write a faster version incorporated with your code.

However, all the functions work and provide an excellent framework for the initial implementation and testing of things. Pat on the back for Dan, without whom, none of this would be possible.

SMP_PAD.C – Various routines for reading the Saturn joypad, very hardware dependent.

GLOBALS.C – All the global variables and a few general function. The use of global variables is appropriate programming practice. However, it is rather slow to implement global variables in SH2 for various reasons, so I may eventually convert some to these to global structures, if needs be.

Contain the variables describing the state of the MAN and the PATH.

MAN.C – Handles the movement and display of the man (Prince Lightstar, Talyn, Guardian or Grimskull – the character that you control). At present this is mostly logic for moving him around and colliding with the path. Also providing the appropriate animation for each action. A lot of work still remains to be done here.

OB.C – Handles movement and display of the objects in the game, specifically the enemy objects, like Skeleton Warriors and little alien things. This is were the bulk of the gameplay will be programmed in terms of enemy AI and basic movement and triggering. the data structure is still not finalized, specifically the issues of collision and animation are not fully worked out. Lots of work here.

DATA.S – Various tables, at the moment, mainly the animations of the main man characters.

LAYER.C – The scrolling of the parallax backgrounds. Updates the character mapped backgrounds and scroll the bitmaps. Also does the line scroll (the wavy effect) on the fog layer. At the moment the maps for the character mapped layers are stored uncompressed. They need to be compressed in the RLE format I was using on the genesis version. This task may fall to Ken if we get a Saturn dev system before a Sony.

PAL.C – The palette. There are 2048 colors to choose from, any pixel on screen can be any on these colors. I have logically divided the palettes into eight 256 color palettes. PAL.C has code for initialize these, setting them up and some provision for cycling them. They will also need fading and more complex cycling, as well as brightness flashes etc.

BUL.C – Primitive setup for handling bullets (sword blast, hand blast, wrist rockets etc.) as separate objects. Still needs a fair bit for work for more complex use of bullets. Still needs proper collision and animation code.

PAD.C – Simple module to remember the state of the joypad in  a more usable format. Remembers if a button has recently been pressed, and if it is pressed now.

START.C – One line to say what the first level is, for ease of changing it with a batch file.

PANEL.C – Some simple routines for putting up a power bar.

PATH.C – Monstrous routines for drawing the path and also doing collision detection with the path.

MATH.C – Simple Sine, Cosine and Rotate a point by an angle.

[Update] Here’s some sample code from MAN.C. Everything is very hard coded, referring to a global “Man” data structure. Lots of hard coded numbers.

/**************************************************************/
/* Trigger jumping if needed, also variable height jump logic */

Man_JumpTrigger()
{
  if ( Man.JumpFudge )
  {
    Man.JumpFudge--;
  }

  if ( Man.Mode != M_Crouch || Man_StandingRoom() )    // ok if not crouched, or there is headroom
  {
    if (Pad_Jump->Pressed)               /* jump button pressed */
    {
      if ((Man.Contact || (Man.Mode == M_Hang) || Man.JumpFudge) && Pad_Jump->Triggered && !Man.Blocking) /* and not already jumping */
      {
        if (Man.Mode == M_Hang && Pad1.Down.Pressed)
        {
          Man.Contact=0;
          Man.Mode=M_Jump;
          Man.AnimBase = LS_Jumping;    /* Change base anim to jumping */
          Man_TriggerSeq(LS_Jump);    /* start the jumping start anim */
          Man.YV.f = 0x10000;           /* and have no YV */
          Man.Y.i += 4;           /* and have no YV */
        }
        else
        {
          Pad_Jump->Triggered = 0;
          if ( !JetPacCheat )
            Man.YV.f = -0x00080000;     /* Initial jump speed */
          else
            Man.YV.f = -0x00008000;     // Initial speed in Jetpac mode
          Man.Contact = 0;          /* not on the ground any more */
          Man.JumpTime = 0;         /* just started jumping */
          Man.AnimBase = LS_Jumping;    /* Change base anim to jumping */
          Man_TriggerSeq(LS_Jump);    /* start the jumping start anim */
          Man.XV.f+=Man.FlyVel;

          if (Man.HangEnd && Man.Mode == M_Hang)  // if hanging
          {                   // and on the end of a path
            Man.HangEnd = 0;
            Man.X.i += 12*Man.Facing; // the move past end of path
            Man.JumpTime = -3;      // bit more fixed v jump time
          }
          Man.Mode = M_Jump;    /* change mode to jumping */

        }
      }
      else                        /* Already jumping */
      {
        if (Man.JumpTime++ < MaxJumpTime) /* Still in initial jump period */
          Man.YV.f -= 0x0005000;        /* So can maintain jump YV */
      }
    }
    else                      /* jump button not pressed */
    {
      Man.JumpTime = MaxJumpTime+1;     /* so can't alter YV again until landed */
    }

  }

}

The file OB.C grew to a 9000 line monster file, which included all the behavior patterns of the individual objects in the game. Also with a vast amount of hard-coded numbers, like this:

Drop_Arac(S_Ob *pOb)
{
  int t;
  if (pOb->Jump==1)
  {
    pOb->yv.f+=0x7fff;
    pOb->y.f+=pOb->yv.f;
    t=Path_GetYZ(pOb->x.i,pOb->y.i,pOb)-15;
    if ((t>pOb->y.i)&&(t
y.i+20))
    {
      pOb->Jump=0;
      pOb->y.i+=15;
      Turn_Around(pOb);
      pOb->SeqFile=Sprites[SpriteMap[34]];
      Object_TriggerSeq(Arac_JumpLand,pOb);
    }
  }
  else
  {
    if (pOb->Frame==16)
      pOb->Jump=1;
    if (pOb->AnimStat==AnimDone)
    {
      pOb->t1=0;
      pOb->Mode=&Pattern_Arac;
    }
  }
  Command_Arac(pOb);
}

Nasty stuff. This style of code was appropriate when games were very small, and grew out of the 68K days.

March 20th, 2009

iPhone OpenAL Linking Problem

If you are having linker problems like:
ld warning: in /Library/Frameworks//OpenAL.framework/OpenAL, file is not of required architecture
Undefined symbols:
“_alSourcePlay”, referenced from:
SoundEngineEffect::Start()      in SoundEngine.o
SoundEngineEffect::PlaybackProc(void*)   in SoundEngine.o

Then the problem is that you have an OpenAL framework in /Library/Frameworks and XCode is looking there first.

The simple solution is to delete or rename it.  But that’s not very useful if you need it for something else, or you want you code to compile for other people.   So the correct solution is to change the Framework Search Path.

So, in Project -> Edit Project Settings  select “All Configurations” and “All Settings”, then under Search Paths -> Framework Search Paths,  add:
$SDKROOT/System/Library/Frameworks/
This will show up in the options as something like iphoneos2.2/System/Library/Frameworks/ (depending on the project setting, and at compile time will automatically switch to the correct framework for the target architecture (simulator or device) and OS version.

December 29th, 2008

Some Uses of SQL Databases in Game Development

RELATIONAL DATABASES

Relational Databases are sometimes viewed as being in the domain of business applications and web development. You would use a relational database for boring applications such as inventory, accounting, or implementing a shopping cart system for a commercial web site. Databases can be viewed by game-programmers as old-fashioned, large, slow, and replete with imponderable terminology such as “inner joins”, “foreign key” and “tuple”. However, modern databases are actually fast, easy to use, and can be quite useful in various stages of game development. While relational databases can be immensely complex and powerful, their robustness and ubiquity means they can also be useful for relatively simple tasks. In this article I’ll attempt to de-mystify databases a little, show how they can easily be incorporated into development code, and discuss a few potential usages.

WHAT IS IT?

A relational database is simply a set of tables with named columns where each row is an individual record. Game programmers essentially use many forms of relation databases at run-time to store and organize things like game objects and various resources. Those databases are generally rather ad-hoc, and tuned for a specific purpose and operating environment, and are implemented using custom code. While technically this is a database, the programmer would probably not refer to it as such. The usages of relational databases I discuss below are not intended to replace these custom in-game “databases”, but rather to add new functionality to be used during the development process.

While the examples I give are all for Windows based development, the nature of communication with a database server is basically text based, so a minimum amount of work would be required to implement similar functionality on console platforms.

SETUP

Databases are run by a server, so the first thing you need to do is set up a server. The server can be local, in that it’s part of your code, and you access it directly via a relatively low level API. Alternatively the server can be remote, and you access it via a network connection. This distinction can be blurred a little as you can have a server on your local machine, accessed via the network (using localhost) – but that’s still essentially a remote server, just somewhat quicker. Here we’ll be discussing remote servers.

There are several ways of setting up a database, and what you settle on will vary with your needs and situation. If you have developers in various locations, then you might benefit from having your database hosted by a third party, as this should ensure everyone has sufficiently fast access. If your developers are all on the same network, then you’d more typically have the databases hosted on the network server. If you are a lone developer, then you’d be more likely to have the database on your local machine, to take advantage of the additional speed.

Setting up a database is very easy. If you have remote hosting you are often supplied with a web interface such as phpMyAdmin that allows you to create databases and users. On a local network your network server will often already have some database server software installed, and you can just add a database to that. Lacking this, you can very simply install a database by downloading and installing the MySQL software, which takes only a few minutes to get up and running.

Once you have a database up and running, it’s very important that you have some way of testing your connection and the database, so you can more easily debug problems with your code. A useful tool here is HeidiSQL, a free program that let’s you connect to your database server, and setup, examine and modify databases in a visual manner. There is also more fully featured software such as PremiumSoft’s NaviCat, which performs similar functions. In the examples below I give the SQL query definitions for the database tables. While it’s quite possible to set these up using a command line tool or web interface, it’s generally easier to use a tool like NaviCat, as it allows you to more easily adjust individual parameters in your tables.

A database server can have various users. If you are just doing some initial experimentation, then you can just log in as the “root” user that you set up when you installed the server. However, as you expand the usage of the databases, then you will want to add additional users with fewer privileges to prevent inadvertent modifications to the database.

Once this is set up, you can now connect to your database from your code. The simplest way (from game code) is to use the C API. The code to connect is shown in listing 1. The SERVER_NAME would be the URL or ip address of your server, or “localhost” if it’s on your local machine. This setup needs only be done once when your program runs, and the other examples assume this has already been done, and there is a valid value in the “handle” variable. Error checking is omitted for clarity, but is something you will need to add, especially if connecting over the internet.

LISTING 1 – Code to connect to a database

#define SERVER_NAME “localhost”
#define DB_USER “user_name”
#define DB_USERPASS “password”
#define DB_NAME “db1″
MYSQL *handle=NULL;
handle = mysql_init(NULL);
mysql_real_connect(handle,SERVER_NAME,
DB_USER,DB_USERPASS,DB_NAME,0,NULL,0);

UNIQUE ASSET IDS

A useful example of using a database during development is for the generation of asset IDs. I touched on this briefly in my article “Practical Hash IDs” (Game Developer, December 2005). The idea is that for the purpose of efficiency (both speed and space) it’s best to refer to assets using a unique 32 bit ID. In the previous article, I suggested using 32 bit CRCs for the ID. That approach has a number of advantages, but there is still the problem with collisions, and if you are going to use databases in a broader manner it makes more sense to generate the IDs using the database.

The simplest way to do this is to have a table that consists of an ID and an asset name (as a string). The ID field will be an integer, and set to autoincrement. (See Listing 2) Then whenever you add an asset name to the table, a unique ID will automatically be generated. To find the 32-bit ID number of any string, we simply look to see if it’s in the database, and if not, we add it. Then we just query the database for this string. See Listing 3 for a function that implements this.

Listing 2 – SQL that defines the simple table of IDs

CREATE TABLE `table1` (
`id` int(11) NOT NULL auto_increment,
`name` text,
PRIMARY KEY (`id`)
)

Listing 3 – Get a unique ID for a string identifier

uint32 GetID(const char *name)
{
char select_query[1024];
char add_query[1024];
MYSQL_RES *result=NULL;
MYSQL_ROW row;
uint32 id = -1;
sprintf(select_query,“SELECT * FROM table1 WHERE name=’%s’”,name);
if (!mysql_query(handle,select_query)) {
result = mysql_use_result(handle);
row = mysql_fetch_row(result);
if (!row) {
sprintf(add_query,“INSERT INTO `table1` (`id`,`name`) VALUES (NULL,’%s’)”,name);
mysql_query(handle,add_query);
mysql_query(handle,select_query);
result = mysql_use_result(handle);
row = mysql_fetch_row(result);
}
id = atoi(row[0]);
mysql_free_result(result);
}
return id;
}

These IDs are typically baked into the data as part of the build process, but can also be used directly in the code in the exact same way as was outlined in “Practical Hash IDs”. Note here that what we are doing is not a run-time process. The database is only intended to be used during game development, for the initial creations of ID by the team. If you are using CRCs, then it’s quite easy to modify this code to check for collisions.

TRACKING ASSERTS

A common issue during game development is what to do about asserts and warnings in the code. Warnings are often ignored by non-technical staff, and manual solutions such as “when you see this warning, come and tell me” are not very reliable. The line between asserts and warnings is often blurred in order to facilitate uninterrupted development. Asserts (which should indicate some fatal error which requires immediate attention) sometime have an “ignore” option, which gets switched on by the creative staff, who don’t care so much about tracking down your bugs. In a development environment with a large number of people, there could be a lot of asserts or warnings being fired off by your code, and it swiftly becomes very difficult to separate signal from noise, and to get the information to the correct person. Clearly some automated system would be useful.

Here we can easily use a simple database table to track these things. Listing 4 and 5 show a simple implementation of this.

Listing 4 – SQL Table for storing asserts

CREATE TABLE `asserts` (
`assert` text,
`message` text,
`file` text,
`line` int(11) default NULL,
`machine` text,
`time` timestamp NULL default CURRENT_TIMESTAMP
)

<<LISTING END>>

Listing 5 – Assert replacements that log asserts to a database table

char assert_buffer[1024];

void assert_printf( const char* text, … )

{

va_list a;

va_start(a, text );

vsprintf( assert_buffer,text,a);

va_end(a);

}

void SQLAssert(const char *assert, const char *file, int line)

{

printf (“%s, %s, %d\n”,assert_buffer, file, line);

char query[2048];

sprintf(query,”INSERT INTO `asserts` (`assert`,`message`,`file`,`line`,`machine`) VALUES (‘%s’,'%s’,'%s’,'%d’,'%s’)”,

assert,assert_buffer,file,line,GetMachineName());

mysql_query(handle,query);

if (0 /*don’t ignore*/ ) __asm int 3

}

#define NewAssert( test)\

if( !(test)) { \

assert_buffer[0]=0 ; \

SQLAssert(#test,__FILE__,__LINE__); \

}

#define NewAssertM( test, params )\

if( !(test)) { \

assert_printf params ; \

SQLAssert(#test,__FILE__,__LINE__); \

}

// Usage, note extra parentheses:

// NewAssertM(p==NULL,(“p not NULL (%p)”,p));

// NewAssert(p==NULL);

<<LISTING END>>

The macro NewAssert is a drop-in replacement for the standard assert() macro. The only parameter is a test that must return true. If it returns false, then the macro calls SQLAssert with #test (a string containing the actual test code), and the standard file and line numbers. SQLAssert then formats a string that will add these to the database.

In addition there is a more sophisticated assert macro, NewAssertM, that takes an additional parameter which is actually a list of parameters enclosed in parentheses. These are passed to the assert_printf() function which treats it as a sprintf into the assert_buffer. The assert_buffer is then passed to the database. This allows you to add an arbitrary string to the assert info in the database, usually this would contain the values of various variables involved in whatever you are testing. See the example usage at the end of listing 5.

So what we have now is an assert macro (or a warning macro) that you can track every single instance where it fires, and no longer have to rely on the artists and level designers (or even the testers) to accurately report what is going on. You can even leave it in for beta versions, and gather a large amount of data from a geographically diverse set.

The example shown includes a field for “machine”, which is intended to hold the machine name. Using this you can identify if a particular warning is going off a lot for one particular user. You could quite easily extend the assert logging to hold any additional information that might be useful, such as the IP address, or the current level name. Since this is a standard database, it’s very easy to query, extract reports into spreadsheets, and even generate graphs and web-pages from the information in the database. Interesting metrics can be generated, such as which source files trigger the most assertions, or even what day of the week has the highest rate of problems.

The 'time’ field in the table is set to “default CURRENT_TIMESTAMP”, which means that whenever the assert fires, this field is set to the server’s current date and time. This can be very useful in tracking down bugs, as you can see when an error or warning first occurred, and attempt to correlate that with whatever was changed around that time. This can be useful for prioritizing things. If a particular assert has been triggering for several days (or weeks) and nothing is being done about it, then it might be that it needs to be downgraded to a warning, or an informational message (or you might need to fire someone.) This kind of high level overview of issues can be useful when there are a large number of developers on a team.

OTHER USES

The two examples above make very little usage of the vast power of a relational database server. The data structures are essentially flat, and there is nothing “relational” about them. Normally a database would have multiple tables, cross indexed with each other to avoid data duplication. However, there is nothing wrong with using a database in this simple manner. It might seem like overkill to use it for logging – which you might do to a CSV text file – but it does not cost us anything to do it like this, and you immediately get the benefits of multiple remote connections to a robust repository, and very sophisticated filtering and report generation. The fact that we are barely using any of the features of the database is beside the point.

With that in mind, there might be other obvious areas where a database could be used instead of plain logging. User input could be logged if your database is fast enough. Many gameplay metrics could be logged in early test versions, such as how long it takes to complete each goal in each level. This data could be collated over thousands of runs, and used to fine-tune the difficulty level of a game. Remember that once you have the basic database set up and the connection software in place, it is very easy to add arbitrary new tables, and to start recording whatever you like.

Resources:

MySQL – A powerful open source database server. dev.mysql.com

Paul DuBois – Writing MySQL Programs Using C, MySQL, Third Edition, Sample Chapter: http://www.kitebird.com/mysql-book/ch06-3ed.pdf

November 15th, 2008

My coding practices in 1991

I wrote this in 1991, when I was writing Amiga and Atari ST games for Ocean Software in Manchester, UK. I think at the time I was working on Parasol Stars. It’s an interesting look at a simpler time in games programming.

An explanation of my 68000 development system - by Mick West
------------------------------------------------------------

The purpose of this document is twofold:

	1 - To make the meaning of my code a touch more accesible
            to those that may come after me.

	2 - To remind me of the meaning of my code.

Fundamentals
------------

	A 68000 development system, at present only encompassing
the Atari ST and the Commodore Amiga. My present System is PC
based, using the SNASM assembler, though this could change.

	I try to be modular and to this end I have lots of small
INCLUDE files that contain short modules of code to do things
like random numbers, heap management, sprite printing, etc.
	These modules are (hopefully) written to use the minimum
of extrenal references and ideally to incorporate the functions
of an Abstract Data Type - ie a 'Black Box' of code that can
only be accessed by calling a number of specified routines. The
ADT module can do whatever function it is required to do in
whatever way it wants so long as it fullfills the specifications.
	A good example is the scroll routines I wrote at the
start of the 'Darkman' project. I actually wrote three diffent
scroll routines and they all did the same thing, but in
different ways, some quicker, some using less memory. All three
had the same external interface with the same named routines
(SETUP_MAP_BUFFER, MOVE_LEFT_4, etc) to facilitated the screen
scrolling. I could simply include one instead of the other and
the program would assemble exactly the same.

	There are a set of fundamental routines that practically
all games require, You need to read the keyboard, you need a
Vertical Blank Interrupt, you need a double buffered screen, you
need some way of reserving areas of memory. These basic routines
are provide by the file KERNAL.S.
	For all my projects, the first part of the source code
will INCLUDE \DEV\KERNAL.S. I should point out at this point
that all my modules that are not entirely specific to a
particular project are kept in a folder called \dev, which
should also include this file you are reading now.

	Let's have a look at the very simplest program you can
write using this method:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; SIMPLE.S - A very simple program

	org	$1000
	regs	pc=$1000,sr=2700
	include	c:\dev\kernal.s
main
	bra	main
my_vbl
	trap	#0
	rts   

; data areas
		rsset	free_memory
something	rs.b	10000

; end of SIMPLE.S
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

	Ignoring the comments and blank lines, the first two
lines are to tell SNASM where to assemble the program to and to
turn the interrupts off before running.
	The next line includes the kernal routines, as it is at
the start of the program then the first thing that will be
executed will be a small routine at the start of KERNAL.S that
kills the interrupt vectors, sets up the double buffered screen,
installs a keyboard interrupt and sets up a VBL.
	The kernal needs two external labels: MAIN and MY_VBL.
The first, MAIN (memories of C) is just the start of your
program. The second, MY_VBL is a routine that is called once per
Vertical Blank Interrupt.
	In the example above, all MAIN does is go into a loop.
All MY_VBL does is a TRAP #0 to give SNASM access to the
machine. This is not done in the kernal because you might want
some control over when you let SNASM get it's claws in, maybe
one per game cycle or whatever.

	The last couple of lines are rather fundamental in that
they illustrate how I do my memory organisation. The Kernal
allows a fixed amount (configurable, the default is 100K) for
your raw code (ie - the assembled code, including all modules).
All memory above that is assumed to be free and is reserved
using the RS directive (which I use extensivly, so you have to
understand it to understand me).
	The kernal will reserve about 70k itself for the two
screens and the stack. Then the lable FREE_MEMORY will be SET to
the end of this reserved memory. Any modules that you include
may also reserve memory, if so they will set FREE_MEMORY to the
end of the area they have just reserved. So in your main
program, if you want to reserve areas of memory then you will
simply use 'RSSET free_memory' to set the __RS variable
correctly and the just use RS to Reserve Space as desired.
	If you are writing a module that needs memory then in
addition to the above you will also need to SET the FREE_MEMORY
variable so other modules and your main code recognises what you
have done.
	This may sound complicated (it does to me!) but if you
just look back at the example program you can see that in
practice it is really very simple.

	More about the Kernal - since this is common to every
program I write, I shall describe it's functions here. Modules
should (if I have the time/inclination/energy) have a full
specification at the start of the source. So if you want to know
what it does, read the comments not the code!

Kernal Configuartion variables (compile time)
if these are undefined then they will be given a default
	ST		true if target is an ST (true)
	THE_RPF		rasters per frame (1)
	MAX_CODE	max code size (100*1024)
	RASTER_COUNT	if true then will display a raster count (false)

Kernal Variables for external usage.
	TIME.L		starts as zero, increments every VBL
	SCREEN.L	address of the virtual screen
	REAL.L		address of the real screen

Major kernal routines
	FLIP_SCREENS	flips screen and real (waits for RPF)

Major macros
	PUSHEM		save registers (always long)
	POPEM
	PUSH.s		save a register (sized)
	POP.s
	PAUSE jiffies	hangs for a certain time
	CLS screen	fills 32000 bytes with 0

This is not everything in the kernal, you should look at the
source comments for that.

Mick's Code Style Conventions
-----------------------------

	I am trying to develop a consistent style of writing
code, this will cut down on having to write unique routines
every time I write a new program. The style should not just be
about code, but about data structures and how the code accesses
them. Eventually I hope to have macros for the most common code
structures as soon as I have them fully formalised.

Mick's Maxims (not in order!!) (!)
-------------

1 - Use Meaningful Labels!
2 - Use Local Labels!
3 - Save Registers!
4 - Don't repeat code, put it in a routine or a macro!
5 - Don't use numbers, use equates!
6 - Use lower case!
7 - Use data structures!
8 - Use consistent program structures!
9 - Use Comments!
10- Make Backups! 

1 - Use Meaningful Labels! By meaningful I mean use full words,
not abbriviations, use underscores to separate words, use the
proper tense for the word and the correct plurality. EG:
	setup_heap
	flip_screens
	move_baddies
	game_over
	file_not_found

2 - Use local labels! Only use global labels for routines that
can be accessed externally. ALL labels within a routine
(including any local variables it uses) MUST be local. EG:

main_routine
	clr.w	d0
.loop
	add.w	#1,.counter
	bsr	another_routine
	bne	.loop
	rts
.counter	dc.w	0

next_routine
	...

3 - Save Registers! Unless you formally specify otherwise, all
registers not used to return values should be UNCHANGED when the
routine returns to caller. If you want to write faster versions
then do so, but SPECIFY that they may corrupt registers.

4 - Don't repeat code! Put it in a routine or a macro!

5 - Don't use numbers! Use equates! EG:

	add.w	#4,man_x		; is WRONG
	add.w	#walking_speed,man_x	; is RIGHT

6 - Use lower case! Upper case programs are a useless reminder
of the ancient days of computing. They make programs difficult
to read, and your comments either have to be in upper case, or
you have to mess around with the CAPS LOCK key. It's just a
waste of time.

7 - Use data structures! This is a big subject, addressed later.

8 - Use consistent program structures! Like mine for example.

9 - Use Comments! At the very minimum each routine should have
one comment, on  the line before the label. This should give a
brief description of what the routine does and it's input and
output requirements. A full specification of a routine should
include the following.
	Exact input/output requirements
	Register corruption
	The external references it needs
	Brief explanation (one line is ok) of the algorithm
	Possible areas for improvement
	Known bugs still to be fixed
As well as the comments at the start of a routine it is usually
a good idea to have comments in the routine itself. For
espescially complex routines it is a great help to development
to comment EVERY line to provide information on: Flow of
control, meaning of decisions, contents of registers and just
general elucidation of exactly what you think you are doing.

10 - Make Backups! Not strictly a coding requirement, but still
a vital part of development. I use an Archive program and have a
batch file to archive everything in the \DEV folder and
everything in the folder containing the project that I am
currently working on. They are archived to hard disk and then
copied to floppy, which is then taken home and copied to my hard
disk there.
	Besides the obvious frustration of having to rewrite
code that you have already slaved hours over, there is the fact
that there is a lot of money at stake here. Think money, make backup!

	Right, that's enough maxims for one day (ten was good
enough for Moses so it'll do for me) Now, let's expand on the
dual topics of Data Structures and Program structures.
	At university they taught me somthing called 'Data
Driven Design', the idea was that you built your program around
the structure of what you were doing things too, the data.
	They also waffled a lot about Abstract Data Types which
I described briefly earlier and involve keeping the data
structure seperate from the program code.
  	Data driven design is really a very simple concept. Put
at it's most basic level all you do is list everything in the
program and then write routines to handle each of them.

	This document is not supposed to be a beginners guide to
structured programming, so I'll just describe what I do, and
what I think I might do if I get around to it.

	What came first, the program structure or the data
structure? I usually start with a loose outline of the program
structure and then write some specific data structures and then
the program and the data structures evolve together as they
respond to changing needs and new idea. This is not the way it
should be, but it has happend like that because of the usual
lack of time and energy. 

	Anyway, enough waffling, what is a data structure? Well,
it's like a RECORD in pascal or a STRUCT in C, which probably
leaves you none the wiser. Essentially it is an array or a list
of groups of bytes. It is best illustrated by an example.
	Say you are writing galaxians or something similar,
obviously you need to store the positions of all the aliens. To
do this you will just have a list of say 20 bytes for each
alien, in those 20 bytes you will store information such as the
x and y position, the speed, the sprite number, the energy, what
it will score, how long before it next does something and things
like that, depending on your game.
	To get information on a specifc alien you point an
address register to the base of the alien and then use offsets
to access the various bits fo data in that alien. EG:

	lea	alien,a0		; address of alien
	move.w	(a0),d0			; get x
	move.w	2(a0),d1		; get y
	move.w	4(a0),d2		; get sprite number
	bsr	draw_alien		; draw it

	In line with my maxim "don't use numbers", I dont use
numbers as the offsets. Instead I define my structures using the
RS command. This let's you set up a data structure as if you
were defining variables using DS, but instead of the labels
being absolute addresses they will be offsets relative to the
start of the code. For example:

		rsreset			; new data structure
alien_flag	rs.w	1
alien_x		rs.w	1
alien_y		rs.w	1
alien_xv	rs.w	1
alien_yv	rs.w	1
alien_sprite	rs.w	1
alien_energy	rs.w	1
alien_len	rs.w	0

	Having set up a data structure I will define an area of
memory at the end of the program (using RS and explained a page
or so ago) like:

alien_table	rs.w	max_alien*alien_len+2

	The table will be initialised by:

		fill	#alien_table,#max_alien*alien_len,#0
		move.w	#-2,alien_table+max_alien*alien_len

	So the extra word at the end of the table will be in
effect a dummy entry in the list with a flag value of -2. I
usually end my lists with a negative value in the first field.

	There will be a routine called NEW_ALIEN that will
return a0 as the first free baddy in the list. My usual way of
going through all the objects in a list is as follows:

	lea	alien_table-alien_len,a0
.next_alien
	lea	alien_len(a0),a0
	tst.w	alien_flag(a0)
	beq	.next_alien
	bmi	.done

; just an example of the sort of thing I might do with it.
	move.w	alien_x(A0),d0
	move.w	alien_y(a0),d1
	move.w	alien_sprite(a0),d2
	bsr	draw_sprite

	bra	.next_alien
.done

	You can kill a object using

		clr.w	alien_flag(A0)

	This sort of structure is the fundamental basis of my
programs at the moment.

-------------------------------------------------------------
	I have written this in a brief attempt to slightly
formalise what I am doing, mostly to get a few things clear in
my own mind. So if you are not me and this does not seem
entirely sensible to you then hard luck...
September 9th, 2008

Debugging Memory Corruption in Game Development

Definition:  Memory corruption is an unexpected change in the contents of a memory location.

The symptoms of memory corruption can range from hard crashes, all the way through minor glitches, to no symptoms at all. The causes of memory corruption are many and varied, and include memory corruption itself.   In this article I attempt to classify the various ways in which memory corruption can manifest itself, the various causes, and some ideas for identifying the root causes of various types of memory corruption.  I’ll cover:

  • Symptoms of Memory Corruption
  • Investigating Corruption
  • Identifying Hex Droppings
  • Causes and Effects of Corruptio

 

Symptoms of Memory Corruption

Given that memory corruption can manifest in almost any way, it seems redundant to list all the symptoms. However, different symptoms of memory corruption are inticative of different causes of corruption. Sometimes we can also gather valuable clues from the type of symptom, which might lead us closer to the cause of the corruption.

 

Crashes

Crash bugs come in all flavors, but memory corruption can cause just about all of them. The way in which the game crashes can give you valuable clues as to what type of memory corruption is occurring. These clues can indicate where you need to start looking for the cause of the crash.

Address Error

An address error indicates that a pointer has been modified to point to an illegal address. This could be an address that is: not word aligned, NULL, or an address that points to unmapped or protected memory.

Address errors are quite helpful, since program execution stops when an address error is encountered, and it is quite easy to enter the debugger, and determine the address and corrupted contents of the pointer variable that is being used.

Infinite Loop

Corruption of data can make a loop fail to terminate. Take for example code that traverses a linked list. Memory may be corrupted in such a way that the list contains a loop. Since the code expects the list to terminate with a NULL value at some point, it simply carries on around the loop forever.

This behavior is a lot more likely with a list that uses indexing instead of pointers, but it is still possible with pointer. Consider  the implication of memory being corrupted in such a way that a list gets a pointer corrupted so that the list is now circular. It is very unlikely that some random corruption, or some unrelated code would happen to stick a semi-valid pointer in the right place. Hence, it is more likely that the corruption was something in the list code itself.

 

Illegal Instruction

An illegal instruction could mean one of several kinds of memory corruption.

 

Stack Corruption – If the stack has been corrupted in some way, this can lead to an incorrect return address, which ends up pointing to illegal code. This is the most common way buffer overruns are exploited by hackers.

 

Jump Table Corruption – if a v-table (or any kind of table of jump-addresses) is corrupted, then the PC can end up pointing at an illegal instruction.

 

Code Corruption – The code itself can be corrupted by a bad pointer corrupting sections of the code. This type of corruption can be very hard to detect if the code that has been corrupted is not executed very often.

 

Stack Overwriting Code – A special kind of code corruption. Runaway recursion can sometimes run unchecked until the stack overwrites the routine that it is executing. This shows up nicely in the debugger in a hex window.

 

Function Pointers – Since function pointers are sometimes stored in data structures, and passed around like regular variables, then they can be corrupted just like any other variable. This can eventually lead to the program executing incorrect code.

Unexpected Values

If you have a variable of some kind that normal has a value in a certain range, and you unexpectedly find that the variable contains some ridiculous value, then this may be due to memory corruption.

Wildly unusual values often have noticeable effects, such as the player teleporting to the end of the universe, or a model being scaled infinitely large.

Less severe corruptions can occur, for example, a counter might simply be reset to zero or even just changed slightly. This type of corruption can be difficult to track down, as it may not produce especially noticeable effects.

Here a good testing department is invaluable. If the testers can notice little inconsistencies like this, then you will catch potentially harmful bugs at a much earlier stage.

Since the location of the corruption of memory is often somewhat random, then the problem may go undetected for some time. This may give the false impression that the existing code is solid. Upon adding new code or data, the bug may reveal itself, causing you to think that the new code has caused the bug, when in fact the new code has only cause memory to be slightly re-ordered into a configuration that reveals a pre-existing bug.

 

Glitches in the Graphics

Since memory often contains graphical data, then if memory is being corrupted, it may show us as some corruption in graphics. The way this is manifest will depend on the nature of the graphics, and the nature of the corruptions.

Textures

Changes in color of a single pixel, or a very short row or column of pixels, indicates that a pointer to a variable has acquired a wrong value, perhaps the result of earlier memory corruption.

Changes to large swathes of a texture indicate either an incorrect pointer, or some kind of buffer overrun. Corruption that looks like a regular patter, often containing vertical or diagonal stripes indicates some kind of array exceeding its bounds, or one that is now at an incorrect address due to a corrupt pointer.

Corruption in a texture that resembles a squashed or discolored version of another texture indicates that you might be overwriting the texture with another one of different dimensions or different bit depth.

If the corruption is static (unchanging), then it indicates a one-time event, where a pointer was misused just once. The corruption happened, and the game went along on its way. In this case, you need to try to track down what triggered that event. Testers need to try to find a way of duplicating the circumstances that lead to the visual corruption. Video of the game is very useful in this case.

If the corruption appears to be animating, if the corrupt section is flickering, or the banded area is flashing on and off, then you have some ongoing corruption. If the game remains in this state, it should make it easier to debug.

 

Meshes

Corrupt meshes usually result in some vertices being displaced a considerable distance from the model. If the corruption region is small (a word or so), then you may just see one vertex displaced, this will appear as a thin triangle or line that extends off screen and swings wildly about as the model animates.

Corruption of a large amount of the model’s mesh can result in the model “exploding”, covering the entire screen with random looking triangles that flicker and swing around.

 

Skeletons and Animation

Corruptions of the underlying skeleton data, or associated animation, can result in the model still looking somewhat recognizable, but with the various body parts being displaced to unusual locations. Corrupt animation will result in body parts flickering and jumping around wildly. The exact manifestation of the symptoms of corruption depends upon the method used to store the animations.

 

Investigating Corruption:

If you suspect that memory corruption is occurring, then your first step is to try to determine if this is actually some form of corruption, and what type it is.

 

Is it actually corruption?

 

Just because a value in memory looks rather unusual, does not mean that it was not generated by the code that owns that memory. The unusual value might simply be the result of an error in your logic. It could have been quite legally copied from somewhere else. It could be the result of computations involving incorrect data, perhaps data that was already corrupt.

To determine this, you need to determine if the code that you might think is writing to that location actually is writing to that location, and see what values it is writing. Ideally, you would add assertions at all location that you think might legally be writing to that location, and check the range of values that are being written (make sure the “corrupt” value is outside this range.)

 

Who owns that memory location?

Memory corruption usually occurs when some piece of code is using an area of memory that it should not. The corrupt memory then causes problems in some code

There are two primary ways in which this can happen: corrupting a legal area, and using an area illegally.

Consider a piece of code A, that uses and area of memory A(m). If another piece of code, B, also happens to have a pointer to A(m), and writes some data to that, then code B is corrupting memory A(m). This is the normal form of memory corruption.

Now consider if the code A is legally using memory location A(m). Code B is illegally also using some location within (or overlapping) A(m). Code B appears to work correctly, but then code A makes a legal update to A(m), causing code B to manifest a bug. It appears that code A is corrupting memory B(m). However, the fault here is with code B. It has the appearance of corruption, yet may mislead you to thinking that the problem is with code A.

It is important to determine who actually owns the memory location that is being corrupted. Is the “legal” use actually legal? Can you demonstrate that code B actually owns those memory locations? If you can quickly determine that code B does not actually own that memory, then the tracking down of code A is irrelevant, which can save you substantial time.

 

Repeatable, Fixed Location

If the corruption is consistent, meaning it happens in the same location and under the same conditions then you are (relatively speaking) in luck. Debugging in this case is a matter of somehow watching that location, and tracking down the cause of the corruption. Since the corruption happens under the same conditions, you should be able either to trap it immediately, or quickly narrow down the possibilities.

Intermittent, Fixed Location

If the corruption happens in the same memory location, yet is intermittent, then this makes tracking down the corruption more difficult. Since you do not know when the corruption occurs, you cannot be as focused in your search, and must rely more on general observation as to the nature of the corruption when tracking down the cause.

Intermittent, Variable Location

If the corruption happens in varying location, and at unpredictable times, then your debugging options are often limited to making observations about the corruption after it has occurred.

 

Determine the location of the corruption

If the memory corruption is the immediate cause of the bug, such as with an address error due to a corrupt pointer, then you may be able to immediately determine the effect of the corruption simply by seeing what address was being accessed at the time of the bug.

If the memory corruption is an intermediate cause, then you will track down the address of the corruption in the process of analyzing the immediate cause of the bug, and any intermediate causes that lie between the root cause and the symptoms.

Hardware Breakpoints

If your target platform has some kind of break-on-access breakpoint, then use this as your first line of investigation when debugging memory corruption with a know address. Simply set the debugger to execute a breakpoint when a memory location changes, then when the location happens, see what code is executing.

This technique can work very well if the location that is being corrupted contains data that is relatively static. However, if the location contains some dynamic variable that changes hundreds of times per frame, then you may have some difficult in finding the single write access that is causing the problem.

In that case, you may be able to augment your write-access breakpoint with a conditional check that verifies that the data being written to the corrupt location is in the valid range.

Sometimes memory is corrupted with vales that are within the valid range, but nonetheless are wrong. Your options here are more limited:

 

- Repeatedly run the code, and each time the breakpoint trips, look at the call stack until you see something that you do not recognize as code that can legally write to this location.

- If the legal places that write to this location are known and relatively limited, then update them to first write to some separate location. First ensue the corruption does not also affect that separate location, and then update the breakpoint condition to check the value written matches the stored value.

Try tracing through the code, stepping over functions, when you find one that does the corruption, and then next time around, dig into that function.

If you don’t have this debugger functionality, then you can still roll your own by writing a little function that checks that memory location.  You can then sprinkle calls to this function through your code, narrowing down the region of code that causes the error.   If the meory location’s address is dependent on the code, you may need to compile the code, note the new address, and then re-compile with the new address wired into the code. 

Another manual method is to keep track of allocations that include that particular location.   Memory corruption is often due to a dangling pointer that was once legal.  So if you know the location of corruption in advance, then having a list of the callstacks of all the allocations that once owned that location can help quickly identify the culprit.

Identifying Hex Droppings

A memory location has been corrupted. Assuming you cannot quickly find what bit of code is responsible for the corruption, you can learn a lot about what that piece of code might be by examining the nature of the corruption.

Once you have identified the location that has been corrupted, then look at a hex dump of it in the debugger (or print out your own if a debugger is not available). A hex dump looks something like this.

 

0x00322B90  fd fd fd fd ab ab ab ab  ýýýý««««
0x00322B98  ab ab ab ab ee fe ee fe  ««««îþîþ
0x00322BA0  00 00 00 00 00 00 00 00  ........
0x00322BA8  12 00 0d 00 22 07 18 00  ...."...
0x00322BB0  48 2b 32 00 40 2c 32 00  H+2.@,2.

 

The memory address is on the left, then comes the contents of memory, here listed eight bytes to a row, and then those eight bytes are repeated as ASCII characters

Single Bit Corruption

Few pieces of code will cause only a single bit to be flipped. The most likely candidate is a bit-field of flags.

Single Byte Corruption

If only one byte was modified, then that can narrow down the fields considerably. If the corrupt value is 0 or 1, then perhaps it is a byte flag.

Single 32-bit Word Corruption

A 32 bit word is often the fastest and most convenient way of storing data. It is the only way for certain data types such as floats or pointers (depending on your platform). Looking at the contents of the 32 bits will tell you something about the code that inserted that value there.

If you know that a 32 bit value is being corrupted, then you should view the memory location as a single 32 bit word, rather than as a sequence of four bytes. This removes any confusion with endianness, and makes the type of data much easier to recognize.

That said, it is also a useful skill to be able to recognize certain types of data as a byte stream, since the data may be intermingled (in a class) with other data of varying types. In the examples below we give the values both as a 32-bit integer, and as a four byte little-endian format, which harder to recognize than big-endian, since that is just the word with the bytes spread out.

Zero

Example:

00000000   or   00 00 00 00

Zero is easy to recognize. At first, you might not think there is much information in a zero, but consider the limited number of reasons a piece of code could be writing a single zero to a location in memory, and it may give some clue as to what piece of code might be responsible.

 
Zero is:

 

NULL – Perhaps the errant code is clearing a pointer? Some programmers make a habit of cleaning any pointer that is a member variable after they have deleted whatever it was pointing to (a reasonable practice to help prevent dangling pointers). However they might be doing it at the wrong time.

 

Zero. Both as an integer (0) and as a floating point (0.0f). Where in the code are individual values set to zero?

 

FALSE. Perhaps the code is treating the location as a flag, and simply setting it to FALSE.

 

The first value in an enum – Perhaps a type field, of a status field. What kinds of enumerations do you have in your suspect code? What does the first entry mean? What causes the code to write out the first value?

Clear and empty - Often data structures are initilized to zero.  Does this happen anywhere in the code?  Does the size of the data being cleared match the zeros in the corruption?

One

Example

00000001 or 01 00 00 00

 

One is also easy to recognize. Less common that zero, it can still tell you something about the code that wrote it there.

 

One is

 

TRUE – Perhaps it is being used as a flag. What could be set to TRUE?

 

An integer – Hence it’s not a floating point number. You can discount code that stores floats.

 

Not a pointer – Odds are that the code causing the corruption is not thinking that it is storing a pointer, unless it is a secondary bug.

 

The first value of an enum – like any small number, it’s possible it is an enumerated value, possibly a type number.

 

Floating Point Numbers

 

Example

3F800000 or 00 00 00 80 3F

 
Many floating point numbers have an easily recognizable format. A very common floating point value is the one shown above, 3F800000 is the hex representation of the 32-bit floating point value of 1.0. See Table 24.X for additional values.

 

Table 24.X

Float			Int

0.00000000		00000000
0.50000000		3F000000
1.00000000		3F800000
-1.0000000		BF800000
2.00000000		40000000
100.000000		42C80000
0.33333334		3EAAAAAB
3.14159274		40490FDB

 

Notice how the small values start with a 3. A floating point number has the first bit being the sign, the next eights bit being the exponent, and the following 23 bits being the fractional part. Since numbers in the same range tend to have similar exponents, you can often recognize a group of floating point numbers of similar magnitude.

In games, a very common range for floating point values is from -1.0 to +1.0. These numbers are used extensively in unit vectors, transformation matrices, UV coordinates and scaling factors. Numbers in this range usually start with a 3 (for positive numbers), or a B (for negative numbers).

If you suspect it is a floating point number, you can then sometimes tell if it is an original (hard wired in the code) value, or a value arrived at by calculation. Consider the numbers above. The values 1.0, 2.0, 0.5, 100.0 all have trailing zeros in their hex representation. The value 3.3333334 also a sequence of AAAAAA in it.

By contrast, the less rational number 3.14159274 has what seems to be a random string of hex digits. We can see the degree of entropy in the hex number matches that in the floating point number.

So, a floating point number that has been the subject of some computations is much more likely to have random looking hex digits. Hence, uou can tell if you are looking for code more like from an update function:
 

p->m_speed = sqrtf(p->m_speed*p_m_speed - 2.0 * g * h);

or from an initialization function

p->m_speed = 2.5f;

 

Small Integers

 

Small integers (in the range 0 to 10000) are usually counters or enums. If you see the value incrementing or decrementing evenly, then that indicates a counter.

If you see it oscillate between a few fixed values, then it is probably some kind of state variable.

Does this small integer seem to match anything in the game at the time of corruption? Some possibilities:

Score
Health
Lives
Level number
Weapon number
Button Pressed

Try to find some correlation between what is going on the game, and the value of corruption.

Large Integers

 
As numbers get larger, the number of uses for them decrease. It’s unlikely that you will be managing groups of over 100,000 items. If you have a large integers that look like thye are counting, then you should consider what it might be counting.

Consider then if it might actually be a pointer, or a code address, and not an integer value at all.

Negative Integers

Example:

FFFFF3A2 of A2 F3 FF FF

Negative integers start with ‘F’s rather than ‘0’s.

Integers are generally used for counting things. If you have a negative integer, then that greatly narrows down the range of things it might be used for.

Some code uses the negative form of an integer as a single kind of flag to change the behavior of the code, avoiding the need to have an additional flag.

Negative numbers are also sometimes used as error codes. Some functions take a pointer as a parameter, and then return the error code in the location pointed to by the pointer. If the pointer is incorrect, that will lead to memory corruption with a negative number.

 

Magic Hex Numbers

 

Example

DEADBEEF or EF BE AD DE

 

A magic hex number in the context of debugging is a hex value that has been specifically chosen by the programmer to be visible in the debugger.

The numbers are also chosen so that using it inadvertently will maximize the chances of that use causing an error, and hence alerting the programmer to the illegal usage.

The most common use is in initializing a block of memory to certain values both when it is allocated and when it is freed. This both makes the block visible in the debugger (in the memory window), and also fills it with values that the programmer should notice if they are used either before the memory has been initialized correctly, or if the memory continues to be used after it has been freed.

 

Common Magic Hex Numbers are:

CCCCCCCC
CDCDCDCD
DEADBEEF
DEADDEAD
DDDDDDDD
FDFDFDFD

 

Use of magic numbers varies by platform. Often developers use their own magic numbers, and they tend to prefer those that can be read aloud, such as DEADBEEF.

 

Magic ASCII

 

Example:

474E5089 or 89 50 4E 47 or ‰PNG

 

Frequently asset files are identified by a four byte (partially) ASCII string that indicates the file type in some human readable way. It’s quite unlikely that this will find its way into a single word corruption, but it’s worth looking in the ASCII column in the memory window, just to check if this is the case, since if you recognize this, it should hopefully point you directly at the culprit.

 

Pointers

 

Example

00434150 or 50 41 43 00

 
Your program usually occupies a relatively small amount of the available four gigabyte address range of a 32-bit pointer. Hence, pointers usually fall within a recognizable range.

Under Win32, your executable starts at address 00400000 (4MB from the start of it’s virtual address space) so function pointers, and pointers to static data will often start with 004 (and 005, 006 etc as your program increases in size).

On the PS2, your executable start at 00100000 (1MB), so pointers will start with 001, 002, etc.

Function pointers are an unlikely candidate for corruption data, so if you see a pointer like this, it’s more likely a pointer to some static data.

The most common type of pointer to static data that is passed around is a pointer to a string. If it looks like you have a pointer in your corruption data, then try following it and see if it points to a recognizable string.

Depending on your platform, pointers may be more likely to be word aligned. On the PS2, pointers to code or any word sized data must be word aligned. The PC allows all data referencing at the byte level.

 

Random Numbers

 

Example

9D29F113 or 13 F1 29 9D

 

When you look through the memory occupied by your game, you will find surprising little data that looks random. There are usually lots of zeros, and where the data is more closely packed, certain bytes or patterns predominate.

So when you find a number that looks random, it almost certainly has some meaning. Here are some of the things it could be.

 
A floating point number – as mentioned previously, a floating point number with several significant digits will look kind of random. The constant pi (3.141592654) comes out as 40490FDB – which looks random.

 
A checksum – if your code uses a checksum, such as CRC32, for some reason, such as identifying assets, then this could be a stray one. If you have the capability, then try seeing what string generates this checksum.

 
Compressed data – well compressed data should look random. It’s unlikely that it would end up in a single word of corruption, but possible.

 
Text – It looks random at first sight, but if the bytes are mostly in the range 0×30 to 0×7F, then it is quite possible that it is a fragment of a string. See what it says in the ASCII column of the memory window.

 

Block Corruption

 

Block corruption is where a group of words in memory are corrupted more or less together. The block can be any size, but we are generally talking anything from four bytes to 1024 bytes.

The corruption data in the block may contain any combination of the types of corruption data found in a single word, as discussed previously. There are a few situations specific to block corruption.

 

Partial corruption

 

When the data in the block of memory covered is not entirely corrupt, just say every few bytes or words has been changed, then this is a good indication that we are dealing with a pointer to a data structure (a structure or class) that has gone astray.

The most likely explanation is a dangling pointer. The code is continuing to update some data structure that has already been freed.

Full corruption

 

If the block of corruption is contiguous and no byte within it remains unchanged (except for a few common bytes, like zero, that might exist frequently in both corrupt and correct data), then it seems like the data structure has either been initialized, reset, or copied from somewhere else.

 

Unit Vectors

 

A common arrangement of three floats is in a vector, and a common sub-group of vectors is the unit vector. Unit vectors are quite recognizable in memory, since they consist of three small floating point numbers (in the range -1.0 to +1.0), and so they frequently start with the hex digit 3 or B.

Here’s an example of a unit vector sitting incongruously in the middle of a string.

 

5c6b6369 73636f64 6d61675c 6e697365  ick\docs\gamesin
3e6fdb1a bd0ee1b0 3f7909cd 6f635c6b  .Ûo>°á.½ Í.y?k\co
655c6564 706d6178 5c73656c 6d617865  de\examples\exam

 

Looking at the hexadecimal, it is not immediately obvious that anything is wrong. We can see however from the text display column that there seems to be some garbage bytes in the middle of the path name.

Looking more closely at the garbage bytes, viewed as words, we can see that two of then start with 3, and one starts with B – a very good indication that we are dealing with a vector of small numbers, possibly a unit vector.

 

We can then switch to a floating point view, which gives us:

 

2.6502369e+017  1.8019267e+031  4.3599426e+027  1.8062378e+028
0.23423424      -0.034883201    0.97280580      7.0364824e+028
6.5049435e+022  2.9386312e+029  2.7403974e+017  4.3612297e+027

 

This confirms the nature of the corruption. We have three floating point numbers in the range -1.0 to +1.0, we can do a quick calculation to confirm that if we square the numbers and add them it comes out at about 1.0, so the length of the vector is 1.0, a unit vector.

 

Causes and Effects of Corruption

Once you have determined the likely nature of the corruption, you need to identify the piece of code that caused the corruption. If you are not able to directly observe the corruption taking place, you may have to selectively instrument suspicious pieces of code.

To narrow down the field of pieces of code that might be considered, we should have a look at the most common direct causes of corruption, and examine how each cause manifests itself.

Buffer Overruns

A Buffer overrun is perhaps the most common type of bug. You often hear about “buffer exploits” in the hacking world. Here a programmer has neglected to check that the size of the input data fits into the destination space. The data overruns the buffer, and possibly overwrites some space used for code. By adding some appropriate code to the end of the data, an industrious hacker can inject some of his own code into an application and take control of it.

Buffer exploits are less of a security problem for game developers, unless they are accepting data over the internet. However buffer overflows are still a very significant cause of bugs.

 

 

Bad Pointers

 

If the value of a pointer is incorrect, then it can corrupt memory (as well as providing bad data to whoever uses that pointer). The value in a pointer can become “bad” in a number of ways.

 

Dangling Pointer – If a memory block is de-allocated or freed, yet some pointer still references that block (or an object within that block), then that pointer is said to be a “Dangling Pointer”. The value of the pointer has not changed, however the pointer has become bad since it no longer points to valid data.

 

Incorrect Pointer Calculation – The pointer could be generated using incorrect pointer arithmetic, or using other values that are themselves incorrect, causing the value of the pointer to be calculated incorrectly. Pointer arithmetic might also return a pointer out of range of the target buffer – a form of buffer overflow.

 

Corrupt Pointers – The memory in which the pointer is stored may itself have become corrupted due to some unrelated cause. Thus corruption can cause corruption, extending the chain of causes.

 

Bad Local Pointers

If a pointer is created to an object that has local scope, then that pointer will only be valid while that object is in scope. See Listing 1

 

Listing 1

void CheckThing(CThing *p_x)
{
  CThing p_thing;
  p_thing = *p_x;
  if (ThingCheck(p_thing))
 {
    AddToList(p_thing));
 }
}

 

Here a local variable p_thing is being used for some temporary purpose. However, during the course of the function the variable is added to some global list, then the function returns.

The result is that there is now a pointer in some list somewhere that points to memory that is used by the stack. This will not be an immediate problem, since when the function returns, then the stack pointer will recede higher in memory, leaving the instance of p_thing safely below the stack. Then one of two things might happen.

Object gets corrupted – the object pointer to by p_thing now no longer legally exists, however its binary image is still in memory, and code can continue to use it without problems until the stack once more descends below that location in memory. At that point the object may get corrupted. This, in a sense, is not a memory corruption bug, since the writes are legal, and in the correct place. But it behaves very like a corruption bug.

Stack gets corrupted – the object is in a list, and presumably some operations are going to carried out with it. When the stack descends past this point in memory, then if that object is updated via the list, then updating the object will corrupt some memory that is legally being used by the stack. This could be a return address, it could be a saved register value, or it could be local variables in some routine higher up the call stack. Whichever it is, the effects will be deferred until the function call stack returns to that point, which could be quite distant from the cause of the problems.

 

Stack overflow

The stack overflowing can cause memory corruption in a number of ways. A stack is of a fixed size, and immediately it overflows that size it has begun to corrupt memory. What happens next depends on the size of the stack frame, the position of the stack in memory and what lays beneath it in memory.

Not all platforms are equally vulnerable to corruption for stack overflow. Win32 will simply raise a stack overflow exception if the stack pointer writes beyond the bounds of the stack. On platforms that do this, debugging is a relatively simply matter of looking at the call stack, which should have one or two functions repeated over and over, pointing you directly at the culprit.

Other platforms are less fortunate. The PS2 has not special protection for the stack pointer. The stack is frequently placed at the top of the 32MB of memory, which means that if it grows downward past the area reserved for it, it can corrupt data and possibly even code.

Code Corruption

 

As already mentioned, if the stack is allowed to proceed apace through memory, it will eventually overwrite the code that is currently being executed, causing a crash.

Code corruption due to stack overflow can often be seen in the disassembly view of the debugger. If the code before the crash location looks reasonable, and the code after looks repetitive or contains illegal instructions, then overflowing stack corruption is the most likely cause.

 

Sparse Corruption

 

The common cause of stack overflow is runaway recursion. A less common cause is moderate recursion combined with a very large stack frame. This occurs when the programmer has a local variable in the recursive routine that takes up a large amount of space. Example: See listing 2

 

Listing 2

class	CBuffer
{
int x[2048];
}

int DigTree(CTree *p_tree)
{
 CBuffer local_buffer;
 Dig(p_tree,local_buffer);
 if (NotFinished(p_tree))
   DigTree(p_tree);
 Finish(p_tree,local_buffer);
}

 

Here DigTree is a recursive function that has a local variable local_buffer. A new instance of local_buffer must be created on the stack. Since the CBuffer class takes 8K of memory, it takes relatively few recursions to overflow the stack, especially on consoles such as the Gamecube where the stack size is kept as low as possible, often down to 64K or less.

If the huge CBuffer object is not cleared every time the function is entered, then this can have the effect of corruption being evenly spaced every 8K through memory. This can be quite a red herring in a number of ways. Firstly, the first time you see the corruption it might seem to be just a single instance of corruption, making you not think of a stack overflow.

Secondly, it can overstep any tests you do to check for stack overflow. Often you would place some magic numbers in the bottom of the stack, and check to see if they are still there, as a way of detecting the stack has overflowed. If the game does not immediately crash during the recursion, the stack pointer will return to a normal address, and your code will run along merrily until the corruption causes some later problem.

Thirdly, if it is runaway recursion, then the large stack frame might overstep the code that is being executed, causing widespread code corruption, yet not actually crashing in the code that caused the corruption. While this should still point you to stack overflow, the culprit will be less obvious. A stack analysis on the corrupting stack frame will indicate the location of the corrupting code – providing you can detect the write that causes the corruption.