November 15, 2008

My coding practices in 1991

Filed under: Game Development — Mick West @ 9:55 am

I wrote this in 1991, when I was writing Amiga and Atari ST games for Ocean Software in Manchester, UK. I think at the time I was working on Parasol Stars. It’s an interesting look at a simpler time in games programming.

An explanation of my 68000 development system - by Mick West
------------------------------------------------------------

The purpose of this document is twofold:

	1 - To make the meaning of my code a touch more accessible
            to those that may come after me.

	2 - To remind me of the meaning of my code.


Fundamentals
------------

	A 68000 development system, at present only encompassing
the Atari ST and the Commodore Amiga. My present System is PC
based, using the SNASM assembler, though this could change.

	I try to be modular and to this end I have lots of small
INCLUDE files that contain short modules of code to do things
like random numbers, heap management, sprite printing, etc.
	These modules are (hopefully) written to use the minimum
of external references and ideally to incorporate the functions
of an Abstract Data Type - ie a 'Black Box' of code that can
only be accessed by calling a number of specified routines. The
ADT module can do whatever function it is required to do in
whatever way it wants so long as it fulfills the specifications.
	A good example is the scroll routines I wrote at the
start of the 'Darkman' project. I actually wrote three different
scroll routines and they all did the same thing, but in
different ways, some quicker, some using less memory. All three
had the same external interface with the same named routines
(SETUP_MAP_BUFFER, MOVE_LEFT_4, etc) to facilitate the screen
scrolling. I could simply include one instead of the other and
the program would assemble exactly the same.

	There are a set of fundamental routines that practically
all games require. You need to read the keyboard, you need a
Vertical Blank Interrupt, you need a double buffered screen, you
need some way of reserving areas of memory. These basic routines
are provided by the file KERNAL.S.
	For all my projects, the first part of the source code
will INCLUDE \DEV\KERNAL.S. I should point out at this point
that all my modules that are not entirely specific to a
particular project are kept in a folder called \dev, which
should also include this file you are reading now.
	
	Let's have a look at the very simplest program you can
write using this method:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; SIMPLE.S - A very simple program

	org	$1000
	regs	pc=$1000,sr=2700
	include	c:\dev\kernal.s	
main
	bra	main
my_vbl
	trap	#0
	rts   

; data areas
		rsset	free_memory
something	rs.b	10000

; end of SIMPLE.S
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

	Ignoring the comments and blank lines, the first two
lines are to tell SNASM where to assemble the program to and to
turn the interrupts off before running.
	The next line includes the kernal routines, as it is at
the start of the program then the first thing that will be
executed will be a small routine at the start of KERNAL.S that
kills the interrupt vectors, sets up the double buffered screen,
installs a keyboard interrupt and sets up a VBL. 
	The kernal needs two external labels: MAIN and MY_VBL.
The first, MAIN (memories of C) is just the start of your
program. The second, MY_VBL is a routine that is called once per
Vertical Blank Interrupt.
	In the example above, all MAIN does is go into a loop.
All MY_VBL does is a TRAP #0 to give SNASM access to the
machine. This is not done in the kernal because you might want
some control over when you let SNASM get its claws in, maybe
once per game cycle or whatever.

	The last couple of lines are rather fundamental in that
they illustrate how I do my memory organisation. The Kernal
allows a fixed amount (configurable, the default is 100K) for
your raw code (ie - the assembled code, including all modules).
All memory above that is assumed to be free and is reserved
using the RS directive (which I use extensively, so you have to
understand it to understand me).
	The kernal will reserve about 70k itself for the two
screens and the stack. Then the label FREE_MEMORY will be SET to
the end of this reserved memory. Any modules that you include
may also reserve memory, if so they will set FREE_MEMORY to the
end of the area they have just reserved. So in your main
program, if you want to reserve areas of memory then you will
simply use 'RSSET free_memory' to set the __RS variable
correctly and then just use RS to Reserve Space as desired.
	If you are writing a module that needs memory then in
addition to the above you will also need to SET the FREE_MEMORY
variable so other modules and your main code recognise what you
have done.
	This may sound complicated (it does to me!) but if you
just look back at the example program you can see that in
practice it is really very simple.
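
	For readers more at home in C, the FREE_MEMORY convention can be
sketched as a bump allocator. This is only an analogy: the original
does its bookkeeping at assembly time with RSSET, RS and SET, whereas
this sketch does it at run time. The names memory, reserve and
MEMORY_SIZE are hypothetical; only the 100K code default and the
rough 70K kernal reservation come from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes are illustrative assumptions, apart from the 100K code default
   and the roughly 70K kernal reservation mentioned in the text. */
enum {
    MEMORY_SIZE     = 512 * 1024,  /* hypothetical total memory */
    MAX_CODE        = 100 * 1024,  /* default allowance for assembled code */
    KERNAL_RESERVED = 70 * 1024    /* two screens plus stack, approximately */
};

static unsigned char memory[MEMORY_SIZE];

/* Plays the role of the FREE_MEMORY label: the first unclaimed byte. */
static size_t free_memory = MAX_CODE + KERNAL_RESERVED;

/* Roughly "RSSET free_memory", "RS" and "SET free_memory" in one call. */
static void *reserve(size_t bytes)
{
    assert(bytes <= MEMORY_SIZE - free_memory);
    void *block = &memory[free_memory];
    free_memory += bytes;   /* later callers start after this block */
    return block;
}
```

	Each module that calls reserve leaves free_memory pointing past
its own block, which is the same contract the text asks of modules
that SET FREE_MEMORY after reserving space.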

	More about the Kernal - since this is common to every
program I write, I shall describe its functions here. Modules
should (if I have the time/inclination/energy) have a full
specification at the start of the source. So if you want to know
what it does, read the comments not the code!

Kernal Configuration variables (compile time)
if these are undefined then they will be given a default
	ST		true if target is an ST (true) 
	THE_RPF		rasters per frame (1)
	MAX_CODE	max code size (100*1024)
	RASTER_COUNT	if true then will display a raster count (false)

Kernal Variables for external usage.
	TIME.L		starts as zero, increments every VBL
	SCREEN.L	address of the virtual screen
	REAL.L		address of the real screen

Major kernal routines
	FLIP_SCREENS	flips screen and real (waits for RPF)
	
Major macros
	PUSHEM		save registers (always long)
	POPEM
	PUSH.s		save a register (sized)
	POP.s
	PAUSE jiffies	hangs for a certain time
	CLS screen	fills 32000 bytes with 0

This is not everything in the kernal, you should look at the
source comments for that.


Mick's Code Style Conventions
-----------------------------
	
	I am trying to develop a consistent style of writing
code, this will cut down on having to write unique routines
every time I write a new program. The style should not just be
about code, but about data structures and how the code accesses
them. Eventually I hope to have macros for the most common code
structures as soon as I have them fully formalised.

Mick's Maxims (not in order!!) (!)
-------------

1 - Use Meaningful Labels!
2 - Use Local Labels!
3 - Save Registers! 
4 - Don't repeat code, put it in a routine or a macro!
5 - Don't use numbers, use equates!
6 - Use lower case!
7 - Use data structures!
8 - Use consistent program structures!
9 - Use Comments! 
10- Make Backups! 



1 - Use Meaningful Labels! By meaningful I mean use full words,
not abbreviations, use underscores to separate words, use the
proper tense for the word and the correct plurality. EG:
	setup_heap
	flip_screens
	move_baddies
	game_over
	file_not_found
	
2 - Use local labels! Only use global labels for routines that
can be accessed externally. ALL labels within a routine
(including any local variables it uses) MUST be local. EG:

main_routine
	clr.w	d0
.loop
	add.w	#1,.counter
	bsr	another_routine
	bne	.loop
	rts
.counter	dc.w	0

next_routine
	...
	
3 - Save Registers! Unless you formally specify otherwise, all
registers not used to return values should be UNCHANGED when the
routine returns to caller. If you want to write faster versions
then do so, but SPECIFY that they may corrupt registers.

4 - Don't repeat code! Put it in a routine or a macro!

5 - Don't use numbers! Use equates! EG:

	add.w	#4,man_x		; is WRONG
	add.w	#walking_speed,man_x	; is RIGHT

6 - Use lower case! Upper case programs are a useless reminder
of the ancient days of computing. They make programs difficult
to read, and your comments either have to be in upper case, or
you have to mess around with the CAPS LOCK key. It's just a
waste of time.

7 - Use data structures! This is a big subject, addressed later.

8 - Use consistent program structures! Like mine for example.

9 - Use Comments! At the very minimum each routine should have
one comment, on  the line before the label. This should give a
brief description of what the routine does and its input and
output requirements. A full specification of a routine should
include the following.
	Exact input/output requirements
	Register corruption
	The external references it needs
	Brief explanation (one line is ok) of the algorithm
	Possible areas for improvement
	Known bugs still to be fixed
As well as the comments at the start of a routine it is usually
a good idea to have comments in the routine itself. For
especially complex routines it is a great help to development
to comment EVERY line to provide information on: Flow of
control, meaning of decisions, contents of registers and just
general elucidation of exactly what you think you are doing.

10 - Make Backups! Not strictly a coding requirement, but still
a vital part of development. I use an Archive program and have a
batch file to archive everything in the \DEV folder and
everything in the folder containing the project that I am
currently working on. They are archived to hard disk and then
copied to floppy, which is then taken home and copied to my hard
disk there.
	Besides the obvious frustration of having to rewrite
code that you have already slaved hours over, there is the fact
that there is a lot of money at stake here. Think money, make backup!

	Right, that's enough maxims for one day (ten was good
enough for Moses so it'll do for me) Now, let's expand on the
dual topics of Data Structures and Program structures. 
	At university they taught me something called 'Data
Driven Design', the idea was that you built your program around
the structure of what you were doing things to, the data.
	They also waffled a lot about Abstract Data Types which
I described briefly earlier and involve keeping the data
structure separate from the program code.
  	Data driven design is really a very simple concept. Put
at its most basic level all you do is list everything in the
program and then write routines to handle each of them.

	This document is not supposed to be a beginners guide to
structured programming, so I'll just describe what I do, and
what I think I might do if I get around to it.

	What came first, the program structure or the data
structure? I usually start with a loose outline of the program
structure and then write some specific data structures and then
the program and the data structures evolve together as they
respond to changing needs and new ideas. This is not the way it
should be, but it has happened like that because of the usual
lack of time and energy. 

	Anyway, enough waffling, what is a data structure? Well,
it's like a RECORD in pascal or a STRUCT in C, which probably
leaves you none the wiser. Essentially it is an array or a list
of groups of bytes. It is best illustrated by an example.
	Say you are writing galaxians or something similar,
obviously you need to store the positions of all the aliens. To
do this you will just have a list of say 20 bytes for each
alien, in those 20 bytes you will store information such as the
x and y position, the speed, the sprite number, the energy, what
it will score, how long before it next does something and things
like that, depending on your game.
	To get information on a specific alien you point an
address register to the base of the alien and then use offsets
to access the various bits of data in that alien. EG:

	lea	alien,a0		; address of alien
	move.w	(a0),d0			; get x
	move.w	2(a0),d1		; get y
	move.w	4(a0),d2		; get sprite number
	bsr	draw_alien		; draw it

	In line with my maxim "don't use numbers", I don't use
numbers as the offsets. Instead I define my structures using the
RS command. This lets you set up a data structure as if you
were defining variables using DS, but instead of the labels
being absolute addresses they will be offsets relative to the
start of the structure. For example:

		rsreset			; new data structure
alien_flag	rs.w	1
alien_x		rs.w	1
alien_y		rs.w	1
alien_xv	rs.w	1
alien_yv	rs.w	1
alien_sprite	rs.w	1
alien_energy	rs.w	1
alien_len	rs.w	0

	Having set up a data structure I will define an area of
memory at the end of the program (using RS and explained a page
or so ago) like:

alien_table	rs.b	max_alien*alien_len+2

	The table will be initialised by:

		fill	#alien_table,#max_alien*alien_len,#0
		move.w	#-2,alien_table+max_alien*alien_len

	So the extra word at the end of the table will be in
effect a dummy entry in the list with a flag value of -2. I
usually end my lists with a negative value in the first field.

	There will be a routine called NEW_ALIEN that will
return a0 as the first free baddy in the list. My usual way of
going through all the objects in a list is as follows:

	lea	alien_table-alien_len,a0
.next_alien
	lea	alien_len(a0),a0
	tst.w	alien_flag(a0)
	beq	.next_alien
	bmi	.done

; just an example of the sort of thing I might do with it.
	move.w	alien_x(a0),d0
	move.w	alien_y(a0),d1
	move.w	alien_sprite(a0),d2
	bsr	draw_sprite
	
	bra	.next_alien
.done

	
	You can kill an object using

		clr.w	alien_flag(a0)

	This sort of structure is the fundamental basis of my
programs at the moment.
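
	As a rough C equivalent of the structure above, the sketch below
uses a struct in place of the RS offsets and keeps the same flag
convention: zero marks a free slot, a negative value ends the list.
The names Alien, init_aliens, count_live_aliens and kill_alien are
hypothetical, as is the table size of 20. One difference from the
original: the terminator here is a whole extra entry rather than a
single extra word.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { MAX_ALIEN = 20, FLAG_FREE = 0, FLAG_END = -2 };

typedef struct {
    short flag;     /* 0 = free slot, negative = end of list, else live */
    short x, y;
    short xv, yv;
    short sprite;
    short energy;
} Alien;

/* One extra entry holds the negative end-of-list flag. */
static Alien alien_table[MAX_ALIEN + 1];

static void init_aliens(void)
{
    memset(alien_table, 0, sizeof alien_table);
    alien_table[MAX_ALIEN].flag = FLAG_END;
}

/* Return the first free slot marked as live, or NULL if the table is full. */
static Alien *new_alien(void)
{
    for (Alien *a = alien_table; a->flag >= 0; ++a) {
        if (a->flag == FLAG_FREE) {
            a->flag = 1;
            return a;
        }
    }
    return NULL;
}

/* Same walk as the .next_alien loop: skip free slots, stop on a negative flag. */
static int count_live_aliens(void)
{
    int live = 0;
    for (const Alien *a = alien_table; a->flag >= 0; ++a) {
        if (a->flag != FLAG_FREE)
            ++live;
    }
    return live;
}

static void kill_alien(Alien *a)
{
    assert(a != NULL);
    a->flag = FLAG_FREE;
}
```

	Killing an alien only clears its flag, so the slot is picked up
again by the next call to new_alien.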




	
-------------------------------------------------------------
	I have written this in a brief attempt to slightly
formalise what I am doing, mostly to get a few things clear in
my own mind. So if you are not me and this does not seem
entirely sensible to you then hard luck...


September 9, 2008

Debugging Memory Corruption in Game Development

Filed under: Game Development — Mick West @ 7:42 am

Definition: Memory corruption is an unexpected change in the contents of a memory location.

The symptoms of memory corruption can range from hard crashes, all the way through minor glitches, to no symptoms at all. The causes of memory corruption are many and varied, and include memory corruption itself. In this article I attempt to classify the various ways in which memory corruption can manifest itself, the various causes, and some ideas for identifying the root causes of various types of memory corruption. I’ll cover:

Symptoms of Memory Corruption

Investigating Corruption

Identifying Hex Droppings

Causes and Effects of Corruption

Symptoms of Memory Corruption

Given that memory corruption can manifest in almost any way, it seems redundant to list all the symptoms. However, different symptoms of memory corruption are indicative of different causes of corruption. Sometimes we can also gather valuable clues from the type of symptom, which might lead us closer to the cause of the corruption.

Crashes

Crash bugs come in all flavors, but memory corruption can cause just about all of them. The way in which the game crashes can give you valuable clues as to what type of memory corruption is occurring. These clues can indicate where you need to start looking for the cause of the crash.

Address Error

An address error indicates that a pointer has been modified to point to an illegal address. This could be an address that is: not word aligned, NULL, or an address that points to unmapped or protected memory.

Address errors are quite helpful, since program execution stops when an address error is encountered, and it is quite easy to enter the debugger, and determine the address and corrupted contents of the pointer variable that is being used.

Infinite Loop

Corruption of data can make a loop fail to terminate. Take for example code that traverses a linked list. Memory may be corrupted in such a way that the list contains a loop. Since the code expects the list to terminate with a NULL value at some point, it simply carries on around the loop forever.

This behavior is a lot more likely with a list that uses indexing instead of pointers, but it is still possible with pointers. Consider the implication of memory being corrupted in such a way that a list gets a pointer corrupted so that the list is now circular. It is very unlikely that some random corruption, or some unrelated code, would happen to stick a semi-valid pointer in the right place. Hence, it is more likely that the corruption was caused by something in the list code itself.
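
One way to confirm that a list has become circular is a debug-only check using Floyd's tortoise-and-hare algorithm. The sketch below is an illustration rather than code from any particular engine; the Node type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Node {
    struct Node *next;
    int value;
} Node;

/* Floyd's cycle check: true if following next pointers never reaches NULL. */
static bool list_has_cycle(const Node *head)
{
    const Node *slow = head;
    const Node *fast = head;
    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;          /* one step */
        fast = fast->next->next;    /* two steps */
        if (slow == fast)
            return true;            /* the hare lapped the tortoise */
    }
    return false;
}

/* Debug-build guard to call before traversing a suspect list. */
static void assert_list_terminates(const Node *head)
{
    assert(!list_has_cycle(head));
}
```

Note that this only catches a list that loops back on itself. A pointer corrupted to point at unrelated memory will still be followed, and is more likely to end in an address error.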

Illegal Instruction

An illegal instruction could mean one of several kinds of memory corruption.

Stack Corruption – If the stack has been corrupted in some way, this can lead to an incorrect return address, which ends up pointing to illegal code. This is the most common way buffer overruns are exploited by hackers.

Jump Table Corruption – if a v-table (or any kind of table of jump-addresses) is corrupted, then the PC can end up pointing at an illegal instruction.

Code Corruption – The code itself can be corrupted by a bad pointer corrupting sections of the code. This type of corruption can be very hard to detect if the code that has been corrupted is not executed very often.

Stack Overwriting Code – A special kind of code corruption. Runaway recursion can sometimes run unchecked until the stack overwrites the routine that it is executing. This shows up nicely in the debugger in a hex window.

Function Pointers – Since function pointers are sometimes stored in data structures, and passed around like regular variables, then they can be corrupted just like any other variable. This can eventually lead to the program executing incorrect code.

Unexpected Values

If you have a variable of some kind that normally has a value in a certain range, and you unexpectedly find that the variable contains some ridiculous value, then this may be due to memory corruption.

Wildly unusual values often have noticeable effects, such as the player teleporting to the end of the universe, or a model being scaled infinitely large.

Less severe corruptions can occur, for example, a counter might simply be reset to zero or even just changed slightly. This type of corruption can be difficult to track down, as it may not produce especially noticeable effects.

Here a good testing department is invaluable. If the testers can notice little inconsistencies like this, then you will catch potentially harmful bugs at a much earlier stage.

Since the location of the corruption of memory is often somewhat random, the problem may go undetected for some time. This may give the false impression that the existing code is solid. Upon adding new code or data, the bug may reveal itself, causing you to think that the new code has caused the bug, when in fact the new code has only caused memory to be slightly re-ordered into a configuration that reveals a pre-existing bug.

Glitches in the Graphics

Since memory often contains graphical data, if memory is being corrupted, it may show up as some corruption in graphics. The way this manifests will depend on the nature of the graphics, and the nature of the corruption.

Textures

Changes in color of a single pixel, or a very short row or column of pixels, indicates that a pointer to a variable has acquired a wrong value, perhaps the result of earlier memory corruption.

Changes to large swathes of a texture indicate either an incorrect pointer, or some kind of buffer overrun. Corruption that looks like a regular pattern, often containing vertical or diagonal stripes, indicates some kind of array exceeding its bounds, or one that is now at an incorrect address due to a corrupt pointer.

Corruption in a texture that resembles a squashed or discolored version of another texture indicates that you might be overwriting the texture with another one of different dimensions or different bit depth.

If the corruption is static (unchanging), then it indicates a one-time event, where a pointer was misused just once. The corruption happened, and the game went along on its way. In this case, you need to try to track down what triggered that event. Testers need to try to find a way of duplicating the circumstances that led to the visual corruption. Video of the game is very useful in this case.

If the corruption appears to be animating, if the corrupt section is flickering, or the banded area is flashing on and off, then you have some ongoing corruption. If the game remains in this state, it should make it easier to debug.

Meshes

Corrupt meshes usually result in some vertices being displaced a considerable distance from the model. If the corrupted region is small (a word or so), then you may just see one vertex displaced. This will appear as a thin triangle or line that extends off screen and swings wildly about as the model animates.

Corruption of a large amount of the model’s mesh can result in the model “exploding”, covering the entire screen with random looking triangles that flicker and swing around.

Skeletons and Animation

Corruptions of the underlying skeleton data, or associated animation, can result in the model still looking somewhat recognizable, but with the various body parts being displaced to unusual locations. Corrupt animation will result in body parts flickering and jumping around wildly. The exact manifestation of the symptoms of corruption depends upon the method used to store the animations.

Investigating Corruption

If you suspect that memory corruption is occurring, then your first step is to try to determine if this is actually some form of corruption, and what type it is.

Is it actually corruption?

Just because a value in memory looks rather unusual, does not mean that it was not generated by the code that owns that memory. The unusual value might simply be the result of an error in your logic. It could have been quite legally copied from somewhere else. It could be the result of computations involving incorrect data, perhaps data that was already corrupt.

To determine this, you need to check whether the code that you think is writing to that location actually is writing to it, and see what values it is writing. Ideally, you would add assertions at all locations that you think might legally be writing to that location, and check the range of values that are being written (make sure the “corrupt” value is outside this range).
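
A minimal sketch of that idea, assuming a hypothetical Player structure whose health field is the location under suspicion and a valid range of 0 to 100: every legal writer is routed through one function that asserts the range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical valid range for the suspect variable. */
enum { HEALTH_MIN = 0, HEALTH_MAX = 100 };

typedef struct {
    int health;
} Player;

static bool health_is_valid(int value)
{
    return value >= HEALTH_MIN && value <= HEALTH_MAX;
}

/* All legal writes go through here, so an out-of-range value found in
   p->health later cannot have come from this code path. */
static void player_set_health(Player *p, int value)
{
    assert(p != NULL);
    assert(health_is_valid(value));
    p->health = value;
}
```

If the assertions never fire, yet the field is later found holding something like 0x3F800000, the value was written by code that does not own the location.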

Who owns that memory location?

Memory corruption usually occurs when some piece of code is using an area of memory that it should not. The corrupt memory then causes problems in some code.

There are two primary ways in which this can happen: corrupting a legal area, and using an area illegally.

Consider a piece of code A, that uses an area of memory A(m). If another piece of code, B, also happens to have a pointer to A(m), and writes some data to that, then code B is corrupting memory A(m). This is the normal form of memory corruption.

Now consider the case where code A is legally using memory location A(m). Code B is illegally also using some location within (or overlapping) A(m). Code B appears to work correctly, but then code A makes a legal update to A(m), causing code B to manifest a bug. It appears that code A is corrupting memory B(m). However, the fault here is with code B. This has the appearance of corruption, and may mislead you into thinking that the problem is with code A.

It is important to determine who actually owns the memory location that is being corrupted. Is the “legal” use actually legal? Can you demonstrate that code B actually owns those memory locations? If you can quickly determine that code B does not actually own that memory, then the tracking down of code A is irrelevant, which can save you substantial time.

Repeatable, Fixed Location

If the corruption is consistent, meaning it happens in the same location and under the same conditions then you are (relatively speaking) in luck. Debugging in this case is a matter of somehow watching that location, and tracking down the cause of the corruption. Since the corruption happens under the same conditions, you should be able either to trap it immediately, or quickly narrow down the possibilities.

Intermittent, Fixed Location

If the corruption happens in the same memory location, yet is intermittent, then this makes tracking down the corruption more difficult. Since you do not know when the corruption occurs, you cannot be as focused in your search, and must rely more on general observation as to the nature of the corruption when tracking down the cause.

Intermittent, Variable Location

If the corruption happens in varying locations, and at unpredictable times, then your debugging options are often limited to making observations about the corruption after it has occurred.

Determine the location of the corruption

If the memory corruption is the immediate cause of the bug, such as with an address error due to a corrupt pointer, then you may be able to immediately determine the location of the corruption simply by seeing what address was being accessed at the time of the bug.

If the memory corruption is an intermediate cause, then you will track down the address of the corruption in the process of analyzing the immediate cause of the bug, and any intermediate causes that lie between the root cause and the symptoms.

Hardware Breakpoints

If your target platform has some kind of break-on-access breakpoint, then use this as your first line of investigation when debugging memory corruption with a known address. Simply set the debugger to break when the memory location changes; then, when the corruption happens, see what code is executing.

This technique can work very well if the location that is being corrupted contains data that is relatively static. However, if the location contains some dynamic variable that changes hundreds of times per frame, then you may have some difficulty in finding the single write access that is causing the problem.

In that case, you may be able to augment your write-access breakpoint with a conditional check that verifies that the data being written to the corrupt location is in the valid range.

Sometimes memory is corrupted with values that are within the valid range, but nonetheless are wrong. Your options here are more limited:

– Repeatedly run the code, and each time the breakpoint trips, look at the call stack until you see something that you do not recognize as code that can legally write to this location.

– If the legal places that write to this location are known and relatively limited, then update them to first write to some separate location. First ensure the corruption does not also affect that separate location, and then update the breakpoint condition to check that the value written matches the stored value.

– Try tracing through the code, stepping over functions. When you find a function that causes the corruption, dig into that function the next time around.

If you don’t have this debugger functionality, then you can still roll your own by writing a little function that checks that memory location. You can then sprinkle calls to this function through your code, narrowing down the region of code that causes the error. If the memory location’s address is dependent on the code, you may need to compile the code, note the new address, and then re-compile with the new address wired into the code.
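
A hand-rolled watch of this kind might look like the sketch below. All of the names are hypothetical, and it assumes the watched location is a 32-bit value whose legal writers can call watch_accept_current after each legitimate update.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static const volatile uint32_t *watch_address;  /* location being watched */
static uint32_t watch_expected;                 /* last known good value */
static const char *watch_first_culprit;         /* first tag that saw a change */

static void watch_begin(const volatile uint32_t *address)
{
    assert(address != NULL);
    watch_address = address;
    watch_expected = *address;
    watch_first_culprit = NULL;
}

/* Call after a legal write so the new value becomes the baseline. */
static void watch_accept_current(void)
{
    watch_expected = *watch_address;
}

/* Sprinkle through the code; tag names the region that has just run. */
static void watch_check(const char *tag)
{
    if (watch_first_culprit == NULL && *watch_address != watch_expected)
        watch_first_culprit = tag;
}
```

The first tag recorded brackets the offending code between two calls to watch_check. Adding more calls inside that region narrows the search further.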

Another manual method is to keep track of allocations that include that particular location. Memory corruption is often due to a dangling pointer that was once legal. So if you know the location of corruption in advance, then having a list of the callstacks of all the allocations that once owned that location can help quickly identify the culprit.
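
As a sketch of this bookkeeping, the code below keeps a fixed-size log of allocations and reports every logged block that once covered a given address. It records a short owner tag where the text suggests a full callstack, purely to keep the example small. The names and the 256-entry limit are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_RECORDS = 256 };   /* hypothetical log capacity */

typedef struct {
    uintptr_t start;     /* first byte of the allocation */
    size_t size;         /* length in bytes */
    const char *owner;   /* stand-in for a captured callstack */
} AllocRecord;

static AllocRecord alloc_log[MAX_RECORDS];
static size_t alloc_log_count;

/* Call from the allocator wrapper. Records are kept after the block is
   freed, since a dangling pointer belongs to a past owner. */
static void log_allocation(const void *ptr, size_t size, const char *owner)
{
    assert(ptr != NULL);
    if (alloc_log_count < MAX_RECORDS) {
        alloc_log[alloc_log_count].start = (uintptr_t)ptr;
        alloc_log[alloc_log_count].size = size;
        alloc_log[alloc_log_count].owner = owner;
        ++alloc_log_count;
    }
}

/* Write the owners of every logged block covering address into out and
   return how many were found, which may exceed out_capacity. */
static size_t owners_of(const void *address, const char **out, size_t out_capacity)
{
    uintptr_t a = (uintptr_t)address;
    size_t found = 0;
    for (size_t i = 0; i < alloc_log_count; ++i) {
        const AllocRecord *r = &alloc_log[i];
        if (a >= r->start && a - r->start < r->size) {
            if (found < out_capacity)
                out[found] = r->owner;
            ++found;
        }
    }
    return found;
}
```

Any owner in the result that has already freed its block is a candidate for holding the dangling pointer.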

Identifying Hex Droppings

A memory location has been corrupted. Assuming you cannot quickly find what bit of code is responsible for the corruption, you can learn a lot about what that piece of code might be by examining the nature of the corruption.

Once you have identified the location that has been corrupted, then look at a hex dump of it in the debugger (or print out your own if a debugger is not available). A hex dump looks something like this.

0x00322B90  fd fd fd fd ab ab ab ab  ýýýý««««
0x00322B98  ab ab ab ab ee fe ee fe  ««««îþîþ
0x00322BA0  00 00 00 00 00 00 00 00  ........
0x00322BA8  12 00 0d 00 22 07 18 00  ...."...
0x00322BB0  48 2b 32 00 40 2c 32 00  H+2.@,2.

The memory address is on the left, then comes the contents of memory, here listed eight bytes to a row, and then those eight bytes are repeated as ASCII characters.
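
If no debugger is available, a row in that layout can be produced with a few lines of C. This is a minimal sketch with a hypothetical function name. It prints a dot for every byte outside printable ASCII, so high bytes such as fd or ab come out as dots instead of accented characters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format up to 8 bytes as "0xADDRESS  hh hh ...  ascii" into out.
   Returns the number of characters written, excluding the terminator. */
static int format_hex_row(char *out, size_t out_size, unsigned long address,
                          const unsigned char *bytes, size_t count)
{
    assert(out != NULL && bytes != NULL);
    assert(count <= 8 && out_size >= 48);

    int n = snprintf(out, out_size, "0x%08lX ", address);
    for (size_t i = 0; i < count; ++i)
        n += snprintf(out + n, out_size - (size_t)n, " %02x", (unsigned)bytes[i]);
    n += snprintf(out + n, out_size - (size_t)n, "  ");
    for (size_t i = 0; i < count; ++i) {
        int printable = bytes[i] >= 0x20 && bytes[i] < 0x7f;
        n += snprintf(out + n, out_size - (size_t)n, "%c",
                      printable ? bytes[i] : '.');
    }
    return n;
}
```

Calling it once per eight bytes, with the address advanced by eight each time, builds a dump like the one shown above.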

Single Bit Corruption

Few pieces of code will cause only a single bit to be flipped. The most likely candidate is a bit-field of flags.
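
If you have a known good copy of the value, XOR makes a single-bit flip easy to confirm. The helper below is a small sketch with a hypothetical name. It reports which bit differs, which you can then match against your flag definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the index (0 to 31) of the only bit that differs between the
   two values, or -1 if they are equal or differ in more than one bit. */
static int single_flipped_bit(uint32_t expected, uint32_t actual)
{
    uint32_t diff = expected ^ actual;
    if (diff == 0 || (diff & (diff - 1)) != 0)
        return -1;              /* no change, or several bits changed */
    int bit = 0;
    while ((diff >>= 1) != 0)
        ++bit;
    return bit;
}
```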

Single Byte Corruption

If only one byte was modified, then that can narrow down the fields considerably. If the corrupt value is 0 or 1, then perhaps it is a byte flag.

Single 32-bit Word Corruption

A 32 bit word is often the fastest and most convenient way of storing data. It is the only way for certain data types such as floats or pointers (depending on your platform). Looking at the contents of the 32 bits will tell you something about the code that inserted that value there.

If you know that a 32 bit value is being corrupted, then you should view the memory location as a single 32 bit word, rather than as a sequence of four bytes. This removes any confusion with endianness, and makes the type of data much easier to recognize.

That said, it is also a useful skill to be able to recognize certain types of data as a byte stream, since the data may be intermingled (in a class) with other data of varying types. In the examples below we give the values both as a 32-bit integer, and as four bytes in little-endian format, which is harder to recognize than big-endian, since big-endian is just the word with the bytes spread out.
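
To make the byte order concrete, the sketch below rebuilds a 32-bit word from four bytes in each order. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian: the first byte in memory is the least significant. */
static uint32_t read_u32_le(const unsigned char *bytes)
{
    assert(bytes != 0);
    return (uint32_t)bytes[0]
         | (uint32_t)bytes[1] << 8
         | (uint32_t)bytes[2] << 16
         | (uint32_t)bytes[3] << 24;
}

/* Big-endian: the bytes appear in the same order as the written word. */
static uint32_t read_u32_be(const unsigned char *bytes)
{
    assert(bytes != 0);
    return (uint32_t)bytes[0] << 24
         | (uint32_t)bytes[1] << 16
         | (uint32_t)bytes[2] << 8
         | (uint32_t)bytes[3];
}
```

Applied to the bytes 48 2b 32 00 from the last row of the dump above, the little-endian reading gives 00322B48, which is an address inside the dumped region and therefore probably a pointer.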

Zero

Example:

00000000   or   00 00 00 00

Zero is easy to recognize. At first, you might not think there is much information in a zero, but consider the limited number of reasons a piece of code could be writing a single zero to a location in memory, and it may give some clue as to what piece of code might be responsible.

Zero is:

NULL – Perhaps the errant code is clearing a pointer? Some programmers make a habit of clearing any pointer that is a member variable after they have deleted whatever it was pointing to (a reasonable practice to help prevent dangling pointers). However, they might be doing it at the wrong time.

Zero. Both as an integer (0) and as a floating point (0.0f). Where in the code are individual values set to zero?

FALSE. Perhaps the code is treating the location as a flag, and simply setting it to FALSE.

The first value in an enum – Perhaps a type field, or a status field. What kinds of enumerations do you have in your suspect code? What does the first entry mean? What causes the code to write out the first value?

Clear and empty – Often data structures are initialized to zero. Does this happen anywhere in the code? Does the size of the data being cleared match the zeros in the corruption?

One

Example:

00000001 or 01 00 00 00

One is also easy to recognize. Less common than zero, it can still tell you something about the code that wrote it there.

One is:

TRUE – Perhaps it is being used as a flag. What could be set to TRUE?

An integer – Hence it’s not a floating point number. You can discount code that stores floats.

Not a pointer – Odds are that the code causing the corruption is not thinking that it is storing a pointer, unless it is a secondary bug.

The first value of an enum – like any small number, it’s possible it is an enumerated value, possibly a type number.

Floating Point Numbers

Example:

3F800000 or 00 00 80 3F

Many floating point numbers have an easily recognizable format. A very common floating point value is the one shown above, 3F800000 is the hex representation of the 32-bit floating point value of 1.0. See Table 24.X for additional values.

Table 24.X

Float			Int

0.00000000		00000000
0.50000000		3F000000
1.00000000		3F800000
-1.0000000		BF800000
2.00000000		40000000
100.000000		42C80000
0.33333334		3EAAAAAB
3.14159274		40490FDB

Notice how the small values start with a 3. A floating point number has the first bit as the sign, the next eight bits as the exponent, and the following 23 bits as the fractional part. Since numbers in the same range tend to have similar exponents, you can often recognize a group of floating point numbers of similar magnitude.
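To make the bit layout concrete, here is a minimal sketch that splits a raw word into its three fields and reinterprets it as a float. It assumes the platform uses 32-bit IEEE 754 floats; the names FloatFields, SplitFloatBits and WordToFloat are made up for this illustration.

```cpp
#include <cstdint>
#include <cstring>

// Fields of a single-precision value, split out of a raw 32-bit word.
// Assumes the IEEE 754 layout described in the text.
struct FloatFields
{
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, biased by 127
    std::uint32_t fraction;  // 23 bits
};

// Split a raw word into sign, exponent and fraction.
FloatFields SplitFloatBits(std::uint32_t bits)
{
    FloatFields f;
    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x007FFFFFu;
    return f;
}

// Reinterpret a raw word as a float. memcpy avoids pointer aliasing problems.
float WordToFloat(std::uint32_t bits)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

For 3F800000 the exponent field comes out as 127 (a biased exponent of zero) with an empty fraction, which is 1.0. Values a little below 1.0 have exponent fields just under 127, which is why their top hex digit is so often a 3.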

In games, a very common range for floating point values is from -1.0 to +1.0. These numbers are used extensively in unit vectors, transformation matrices, UV coordinates and scaling factors. Numbers in this range usually start with a 3 (for positive numbers), or a B (for negative numbers).

If you suspect it is a floating point number, you can then sometimes tell if it is an original (hard wired in the code) value, or a value arrived at by calculation. Consider the numbers above. The values 1.0, 2.0, 0.5 and 100.0 all have trailing zeros in their hex representation. The value 0.33333334 also has a run of repeated A digits in it.

By contrast, the less rational number 3.14159274 has what seems to be a random string of hex digits. We can see the degree of entropy in the hex number matches that in the floating point number.

So, a floating point number that has been the subject of some computations is much more likely to have random looking hex digits. Hence, you can tell whether you are looking for code that is more like an update function:

p->m_speed = sqrtf(p->m_speed * p->m_speed - 2.0f * g * h);

or more like an initialization function:

p->m_speed = 2.5f;
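One rough way to automate that judgment is to count the trailing zero bits in the fraction. This is only a heuristic sketch: the function names and the threshold of 12 bits are assumptions of mine, not an established rule, and constants such as 0.1f have no trailing zeros at all, so they will be misclassified.

```cpp
#include <cstdint>

// Count how many low-order bits of the 23-bit fraction are zero.
int TrailingZeroFractionBits(std::uint32_t bits)
{
    std::uint32_t fraction = bits & 0x007FFFFFu;
    if (fraction == 0)
        return 23;              // empty fraction: an exact power of two (or zero)
    int count = 0;
    while ((fraction & 1u) == 0)
    {
        fraction >>= 1;
        ++count;
    }
    return count;
}

// Hypothetical heuristic: many trailing zeros suggests a constant typed into the code.
// The default threshold is an arbitrary guess.
bool LooksHardWired(std::uint32_t bits, int threshold = 12)
{
    return TrailingZeroFractionBits(bits) >= threshold;
}
```

With this, 42C80000 (100.0) has 19 trailing zero bits and is flagged as hard wired, while 40490FDB (pi) has none.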

Small Integers

Small integers (in the range 0 to 10000) are usually counters or enums. If you see the value incrementing or decrementing evenly, then that indicates a counter.

If you see it oscillate between a few fixed values, then it is probably some kind of state variable.

Does this small integer seem to match anything in the game at the time of corruption? Some possibilities:

Score
Health
Lives
Level number
Weapon number
Button Pressed

Try to find some correlation between what is going on in the game, and the value of the corruption.

Large Integers

As numbers get larger, the number of uses for them decreases. It’s unlikely that you will be managing groups of over 100,000 items. If you have large integers that look like they are counting, then you should consider what they might be counting.

Consider then if it might actually be a pointer, or a code address, and not an integer value at all.

Negative Integers

Example:

FFFFF3A2 or A2 F3 FF FF

Negative integers start with ‘F’s rather than ‘0’s.

Integers are generally used for counting things. If you have a negative integer, then that greatly narrows down the range of things it might be used for.

Some code uses the negative form of an integer as a single kind of flag to change the behavior of the code, avoiding the need to have an additional flag.

Negative numbers are also sometimes used as error codes. Some functions take a pointer as a parameter, and then return the error code in the location pointed to by the pointer. If the pointer is incorrect, that will lead to memory corruption with a negative number.

Magic Hex Numbers

Example

DEADBEEF or EF BE AD DE

A magic hex number in the context of debugging is a hex value that has been specifically chosen by the programmer to be visible in the debugger.

The numbers are also chosen so that using them inadvertently will maximize the chances of that use causing an error, and hence alerting the programmer to the illegal usage.

The most common use is in initializing a block of memory to certain values both when it is allocated and when it is freed. This both makes the block visible in the debugger (in the memory window), and also fills it with values that the programmer should notice if they are used either before the memory has been initialized correctly, or if the memory continues to be used after it has been freed.

Common Magic Hex Numbers are:

CCCCCCCC
CDCDCDCD
DEADBEEF
DEADDEAD
DDDDDDDD
FDFDFDFD

Use of magic numbers varies by platform. Often developers use their own magic numbers, and they tend to prefer those that can be read aloud, such as DEADBEEF.
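As a minimal sketch of the fill-on-allocate and fill-on-free idea, assuming a plain malloc based allocator: DebugAlloc, DebugFree and the two fill constants are hypothetical names, and the byte values are illustrative picks from the list above rather than what any particular platform uses.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Fill patterns; the specific byte values are illustrative choices.
const unsigned char kAllocFill = 0xCD;  // allocated but not yet initialized
const unsigned char kFreeFill  = 0xDD;  // already freed

// Allocate a block and stamp it so uninitialized reads stand out in the debugger.
void *DebugAlloc(std::size_t size)
{
    void *p = std::malloc(size);
    if (p)
        std::memset(p, kAllocFill, size);
    return p;
}

// Stamp a block before freeing it so stale reads through dangling pointers stand out.
// The caller passes the size because this sketch keeps no allocation header.
void DebugFree(void *p, std::size_t size)
{
    if (p)
    {
        std::memset(p, kFreeFill, size);
        std::free(p);
    }
}
```

An optimizing compiler may drop the fill that precedes free, since nothing legally reads it afterwards, so this kind of stamping is normally something you rely on in debug builds only.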

Magic ASCII

Example:

474E5089 or 89 50 4E 47 or ‰PNG

Frequently asset files are identified by a four byte (partially) ASCII string that indicates the file type in some human readable way. It’s quite unlikely that this will find its way into a single word corruption, but it’s worth looking in the ASCII column in the memory window, just to check if this is the case, since if you recognize this, it should hopefully point you directly at the culprit.

Pointers

Example

00434150 or 50 41 43 00

Your program usually occupies a relatively small amount of the available four gigabyte address range of a 32-bit pointer. Hence, pointers usually fall within a recognizable range.

Under Win32, your executable starts at address 00400000 (4MB from the start of its virtual address space) so function pointers, and pointers to static data will often start with 004 (and 005, 006 etc as your program increases in size).

On the PS2, your executable starts at 00100000 (1MB), so pointers will start with 001, 002, etc.

Function pointers are an unlikely candidate for corruption data, so if you see a pointer like this, it’s more likely a pointer to some static data.

The most common type of pointer to static data that is passed around is a pointer to a string. If it looks like you have a pointer in your corruption data, then try following it and see if it points to a recognizable string.

Depending on your platform, pointers may be more likely to be word aligned. On the PS2, pointers to code or any word sized data must be word aligned. The PC allows all data referencing at the byte level.
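The range and alignment tests can be combined into a quick filter. This is a sketch only: AddressRange and LooksLikePointer are hypothetical names, and the bounds you pass in are assumptions that should really come from your linker map file.

```cpp
#include <cstdint>

// Address range occupied by the executable image: begin is inclusive, end is exclusive.
// Both bounds are placeholders to be filled in from the map file.
struct AddressRange
{
    std::uint32_t begin;
    std::uint32_t end;
};

// True when the word falls inside the image and has the required alignment.
// Pass alignment 1 where byte-aligned pointers are legal; it must not be zero.
bool LooksLikePointer(std::uint32_t word, AddressRange image, std::uint32_t alignment)
{
    bool in_range = word >= image.begin && word < image.end;
    bool aligned  = (word % alignment) == 0;
    return in_range && aligned;
}
```

For a Win32 build you might try a range starting at 00400000 with an alignment of 1, and for a PS2 build a range starting at 00100000 with an alignment of 4. A float such as 3F800000 falls far outside either range and is rejected immediately.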

Random Numbers

Example

9D29F113 or 13 F1 29 9D

When you look through the memory occupied by your game, you will find surprisingly little data that looks random. There are usually lots of zeros, and where the data is more closely packed, certain bytes or patterns predominate.

So when you find a number that looks random, it almost certainly has some meaning. Here are some of the things it could be.

A floating point number – as mentioned previously, a floating point number with several significant digits will look kind of random. The constant pi (3.141592654) comes out as 40490FDB – which looks random.

A checksum – if your code uses a checksum, such as CRC32, for some reason, such as identifying assets, then this could be a stray one. If you have the capability, then try seeing what string generates this checksum.

Compressed data – well compressed data should look random. It’s unlikely that it would end up in a single word of corruption, but possible.

Text – It looks random at first sight, but if the bytes are mostly in the range 0x30 to 0x7F, then it is quite possible that it is a fragment of a string. See what it says in the ASCII column of the memory window.
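If your debugger has no ASCII column, the text check is easy to do by hand. The sketch below assumes a little-endian machine, so the lowest byte of the word is the first character in memory; WordToAscii and CountTextLikeBytes are names invented for the example.

```cpp
#include <cstdint>
#include <string>

// Return the four bytes of a word in memory order (little-endian assumed),
// with anything outside printable ASCII shown as '.'.
std::string WordToAscii(std::uint32_t word)
{
    std::string text;
    for (int i = 0; i < 4; ++i)
    {
        unsigned char byte = static_cast<unsigned char>((word >> (8 * i)) & 0xFFu);
        text += (byte >= 0x20 && byte <= 0x7E) ? static_cast<char>(byte) : '.';
    }
    return text;
}

// Count the bytes that fall in the 0x30 to 0x7F range mentioned above.
int CountTextLikeBytes(std::uint32_t word)
{
    int count = 0;
    for (int i = 0; i < 4; ++i)
    {
        std::uint32_t byte = (word >> (8 * i)) & 0xFFu;
        if (byte >= 0x30 && byte <= 0x7F)
            ++count;
    }
    return count;
}
```

A word with three or four text-like bytes is worth reading as a string. The random looking example 9D29F113 scores zero, while a path fragment such as 73636F64 ("docs") scores four.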

Block Corruption

Block corruption is where a group of words in memory are corrupted more or less together. The block can be any size, but we are generally talking anything from four bytes to 1024 bytes.

The corruption data in the block may contain any combination of the types of corruption data found in a single word, as discussed previously. There are a few situations specific to block corruption.

Partial corruption

When the block of memory is not entirely corrupt, and only every few bytes or words have been changed, then this is a good indication that we are dealing with a pointer to a data structure (a structure or class) that has gone astray.

The most likely explanation is a dangling pointer. The code is continuing to update some data structure that has already been freed.

Full corruption

If the block of corruption is contiguous and no byte within it remains unchanged (except for a few common bytes, like zero, that might exist frequently in both corrupt and correct data), then it seems like the data structure has either been initialized, reset, or copied from somewhere else.

Unit Vectors

A common arrangement of three floats is in a vector, and a common sub-group of vectors is the unit vector. Unit vectors are quite recognizable in memory, since they consist of three small floating point numbers (in the range -1.0 to +1.0), and so they frequently start with the hex digit 3 or B.

Here’s an example of a unit vector sitting incongruously in the middle of a string.

5c6b6369 73636f64 6d61675c 6e697365  ick\docs\gamesin
3e6fdb1a bd0ee1b0 3f7909cd 6f635c6b  .Ûo>°á.½Í.y?k\co
655c6564 706d6178 5c73656c 6d617865  de\examples\exam

Looking at the hexadecimal, it is not immediately obvious that anything is wrong. We can see however from the text display column that there seem to be some garbage bytes in the middle of the path name.

Looking more closely at the garbage bytes, viewed as words, we can see that two of them start with 3, and one starts with B – a very good indication that we are dealing with a vector of small numbers, possibly a unit vector.

We can then switch to a floating point view, which gives us:

2.6502369e+017  1.8019267e+031  4.3599426e+027  1.8062378e+028
0.23423424      -0.034883201    0.97280580      7.0364824e+028
6.5049435e+022  2.9386312e+029  2.7403974e+017  4.3612297e+027

This confirms the nature of the corruption. We have three floating point numbers in the range -1.0 to +1.0. A quick calculation confirms that if we square the numbers and add them, the result comes out at about 1.0, so the length of the vector is 1.0: a unit vector.
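The quick calculation can be sketched in code as well. This assumes 32-bit IEEE 754 floats; BitsToFloat, VectorLength and LooksLikeUnitVector are hypothetical helper names, and the tolerance of 0.01 is a guess at what "about 1.0" should mean.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a raw 32-bit word as a float (IEEE 754 assumed).
float BitsToFloat(std::uint32_t bits)
{
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Length of the vector formed by three consecutive words read as floats.
// Summed in double so that huge non-vector values cannot overflow a float.
double VectorLength(const std::uint32_t words[3])
{
    double sum = 0.0;
    for (int i = 0; i < 3; ++i)
    {
        double component = BitsToFloat(words[i]);
        sum += component * component;
    }
    return std::sqrt(sum);
}

// True when the length is within tolerance of 1.0. The tolerance is an assumption.
bool LooksLikeUnitVector(const std::uint32_t words[3], double tolerance = 0.01)
{
    return std::fabs(VectorLength(words) - 1.0) < tolerance;
}
```

Feeding in the three garbage words 3e6fdb1a, bd0ee1b0 and 3f7909cd gives a length of roughly 1.001, which passes, while the neighbouring path name words produce enormous lengths and fail.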

Causes and Effects of Corruption

Once you have determined the likely nature of the corruption, you need to identify the piece of code that caused the corruption. If you are not able to directly observe the corruption taking place, you may have to selectively instrument suspicious pieces of code.

To narrow down the field of pieces of code that might be considered, we should have a look at the most common direct causes of corruption, and examine how each cause manifests itself.

Buffer Overruns

A buffer overrun is perhaps the most common type of bug. You often hear about “buffer exploits” in the hacking world. Here a programmer has neglected to check that the size of the input data fits into the destination space. The data overruns the buffer, and possibly overwrites some space used for code. By adding some appropriate code to the end of the data, an industrious hacker can inject some of his own code into an application and take control of it.

Buffer exploits are less of a security problem for game developers, unless they are accepting data over the internet. However buffer overflows are still a very significant cause of bugs.
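The missing check is usually a one-liner. As a minimal sketch, with CopyChecked being a hypothetical helper rather than any standard function, a copy can refuse to run when the source does not fit:

```cpp
#include <cstddef>
#include <cstring>

// Copy src into dest only if it fits; report whether the copy happened.
bool CopyChecked(void *dest, std::size_t dest_size, const void *src, std::size_t src_size)
{
    if (src_size > dest_size)
        return false;           // would overrun: refuse rather than corrupt the neighbours
    std::memcpy(dest, src, src_size);
    return true;
}
```

Whether to refuse, truncate or assert on an oversized copy is a design choice. Refusing keeps the destination untouched, which makes the failure easier to spot than a silently truncated buffer.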

Bad Pointers

If the value of a pointer is incorrect, then it can corrupt memory (as well as providing bad data to whoever uses that pointer). The value in a pointer can become “bad” in a number of ways.

Dangling Pointer – If a memory block is de-allocated or freed, yet some pointer still references that block (or an object within that block), then that pointer is said to be a “Dangling Pointer”. The value of the pointer has not changed, however the pointer has become bad since it no longer points to valid data.

Incorrect Pointer Calculation – The pointer could be generated using incorrect pointer arithmetic, or using other values that are themselves incorrect, causing the value of the pointer to be calculated incorrectly. Pointer arithmetic might also return a pointer out of range of the target buffer – a form of buffer overflow.

Corrupt Pointers – The memory in which the pointer is stored may itself have become corrupted due to some unrelated cause. Thus corruption can cause corruption, extending the chain of causes.

Bad Local Pointers

If a pointer is created to an object that has local scope, then that pointer will only be valid while that object is in scope. See Listing 1.

Listing 1

void CheckThing(CThing *p_x)
{
  CThing p_thing;              // local object, lives on the stack
  p_thing = *p_x;
  if (ThingCheck(p_thing))
  {
    AddToList(&p_thing);       // BUG: stores the address of a stack object
  }
}

Here a local variable p_thing is being used for some temporary purpose. However, during the course of the function the address of the variable is added to some global list, and then the function returns.

The result is that there is now a pointer in some list somewhere that points to memory that is used by the stack. This will not be an immediate problem, since when the function returns, then the stack pointer will recede higher in memory, leaving the instance of p_thing safely below the stack. Then one of two things might happen.

Object gets corrupted – the p_thing object now no longer legally exists, however its binary image is still in memory, and code can continue to use it without problems until the stack once more descends below that location in memory. At that point the object may get corrupted. This, in a sense, is not a memory corruption bug, since the writes are legal, and in the correct place. But it behaves very like a corruption bug.

Stack gets corrupted – the object is in a list, and presumably some operations are going to be carried out with it. When the stack descends past this point in memory, then if that object is updated via the list, then updating the object will corrupt some memory that is legally being used by the stack. This could be a return address, it could be a saved register value, or it could be local variables in some routine higher up the call stack. Whichever it is, the effects will be deferred until the function call stack returns to that point, which could be quite distant from the cause of the problems.
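One way to avoid the problem is to make the list own a heap copy instead of borrowing the address of a stack object. The sketch below is just one possible fix, written for C++14 or later: the contents of CThing, the body of ThingCheck and the g_things list are stand-ins invented for the example, not the code the listing came from.

```cpp
#include <memory>
#include <vector>

// Stand-in for the CThing class; its contents are assumed.
struct CThing
{
    int value = 0;
};

// Hypothetical global list that owns heap copies rather than stack addresses.
std::vector<std::unique_ptr<CThing>> g_things;

// Hypothetical stand-in for the check performed on each thing.
bool ThingCheck(const CThing &thing)
{
    return thing.value >= 0;
}

void CheckThing(const CThing *p_x)
{
    CThing thing = *p_x;        // temporary working copy on the stack
    if (ThingCheck(thing))
    {
        // Store a heap copy; it stays valid after this function returns.
        g_things.push_back(std::make_unique<CThing>(thing));
    }
}
```

The working copy still lives on the stack, but nothing outside the function ever sees its address, so the stack can grow back over it without harm.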

Stack overflow

The stack overflowing can cause memory corruption in a number of ways. A stack is of a fixed size, and as soon as it overflows that size it has begun to corrupt memory. What happens next depends on the size of the stack frame, the position of the stack in memory and what lies beneath it in memory.

Not all platforms are equally vulnerable to corruption from stack overflow. Win32 will simply raise a stack overflow exception if the stack pointer writes beyond the bounds of the stack. On platforms that do this, debugging is a relatively simple matter of looking at the call stack, which should have one or two functions repeated over and over, pointing you directly at the culprit.

Other platforms are less fortunate. The PS2 has no special protection for the stack pointer. The stack is frequently placed at the top of the 32MB of memory, which means that if it grows downward past the area reserved for it, it can corrupt data and possibly even code.

Code Corruption

As already mentioned, if the stack is allowed to proceed apace through memory, it will eventually overwrite the code that is currently being executed, causing a crash.

Code corruption due to stack overflow can often be seen in the disassembly view of the debugger. If the code before the crash location looks reasonable, and the code after looks repetitive or contains illegal instructions, then overflowing stack corruption is the most likely cause.

Sparse Corruption

The common cause of stack overflow is runaway recursion. A less common cause is moderate recursion combined with a very large stack frame. This occurs when the programmer has a local variable in the recursive routine that takes up a large amount of space. See Listing 2 for an example.

Listing 2

class	CBuffer
{
public:
  int x[2048];                   // 2048 * 4 bytes = 8K
};

void DigTree(CTree *p_tree)
{
  CBuffer local_buffer;          // 8K of stack per level of recursion
  Dig(p_tree,local_buffer);
  if (NotFinished(p_tree))
    DigTree(p_tree);
  Finish(p_tree,local_buffer);
}

Here DigTree is a recursive function that has a local variable local_buffer. A new instance of local_buffer must be created on the stack for each level of recursion. Since the CBuffer class takes 8K of memory, it takes relatively few recursions to overflow the stack, especially on consoles such as the GameCube where the stack size is kept as low as possible, often down to 64K or less.

If the huge CBuffer object is not cleared every time the function is entered, then this can have the effect of corruption being evenly spaced every 8K through memory. This can be quite a red herring in a number of ways. Firstly, the first time you see the corruption it might seem to be just a single instance of corruption, making you not think of a stack overflow.

Secondly, it can overstep any tests you do to check for stack overflow. Often you would place some magic numbers in the bottom of the stack, and check to see if they are still there, as a way of detecting the stack has overflowed. If the game does not immediately crash during the recursion, the stack pointer will return to a normal address, and your code will run along merrily until the corruption causes some later problem.

Thirdly, if it is runaway recursion, then the large stack frame might overstep the code that is being executed, causing widespread code corruption, yet not actually crashing in the code that caused the corruption. While this should still point you to stack overflow, the culprit will be less obvious. A stack analysis on the corrupting stack frame will indicate the location of the corrupting code – providing you can detect the write that causes the corruption.
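For reference, the magic number check mentioned above can be sketched as follows. The address of the bottom of the stack is platform specific, so this example takes it as a parameter; kStackMagic, kGuardWords and the two function names are assumptions made for the illustration.

```cpp
#include <cstddef>
#include <cstdint>

const std::uint32_t kStackMagic = 0xDEADBEEF;  // illustrative magic value
const std::size_t   kGuardWords = 4;           // how many words to stamp (a guess)

// Write the magic value into the lowest words of the stack region.
void PlantStackGuard(std::uint32_t *stack_bottom)
{
    for (std::size_t i = 0; i < kGuardWords; ++i)
        stack_bottom[i] = kStackMagic;
}

// True if every guard word still holds the magic value.
bool StackGuardIntact(const std::uint32_t *stack_bottom)
{
    for (std::size_t i = 0; i < kGuardWords; ++i)
    {
        if (stack_bottom[i] != kStackMagic)
            return false;
    }
    return true;
}
```

You would plant the guard once at startup and test it every frame. As described above, a sparsely written 8K stack frame can step right over a small guard like this without touching it, so an intact guard does not prove the stack never overflowed.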


August 18, 2008

CPanel Hotlink Protection Breaks WP Permalinks

Filed under: Game Development — Mick West @ 9:08 pm

I was messing with the hotlink protection in CPanel, and just toggled it on and off. Unfortunately this broke the permalinks in WordPress. Specifically it removed the line:

RewriteEngine On

In .htaccess

Took me a while to track down.

With wp-supercache, the mod_rewrite part of your .htaccess should look like:

<IfModule mod_rewrite.c>
# BEGIN WordPress

RewriteEngine On
RewriteBase /
#RewriteCond %{QUERY_STRING} !.*s=.*
#RewriteCond %{HTTP_COOKIE} !^.*comment_author_.*$
#RewriteCond %{HTTP_COOKIE} !^.*wordpress.*$
#RewriteCond %{HTTP_COOKIE} !^.*wp-postpass_.*$
#RewriteCond %{HTTP:Accept-Encoding} gzip
#RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html.gz -f
#RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html.gz [L]

#RewriteCond %{QUERY_STRING} !.*s=.*
#RewriteCond %{HTTP_COOKIE} !^.*comment_author_.*$
#RewriteCond %{HTTP_COOKIE} !^.*wordpress.*$
#RewriteCond %{HTTP_COOKIE} !^.*wp-postpass_.*$
#RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html -f
#RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html [L]
#
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]

# END WordPress
</IfModule>

In my case the “RewriteEngine On” was missing, causing individual pages not to work, but the main page worked fine, so I did not notice.


August 5, 2008

The CheckWord Pricing Experiment

Filed under: Game Development — Mick West @ 11:46 am

I released my Scrabble word checker CheckWord about a month ago, for free, on the iTunes App Store. I later released a significant update, which added a lot of new features – specifically word generation and anagrams. I kept this free, even though I’d put a lot of work into it, and several people told me they would pay for it – one guy even sent me $10.

A week ago, I started to get daily “sales” figures, and saw that I was selling (at $0.00) around 2,000 copies a day. Meaning 2000 people EVERY DAY were finding my app interesting enough to download. I wondered how many of those would actually pay 99 cents for it. Given the vast numbers of people who seemingly would pay 99 cents for a units converter, I figured maybe quite a few. So I decided to experiment and see what would happen if I increased the price to 99 cents. I could always put it back later.

So I made the change in the evening of August 4th, and I’m not sure how long it took to filter through to the app store. But in the morning the change was made. I’d also got my first feedback in the form of a new review that said:

was free, now costs $$, super dumb
by k)smith
I hate devs that relase an app for free, then once it becomes slightly popular they start charging for it. If you wanted to make money off it, charge for it in the first place idiots

Hmm, not quite what I was hoping for. Still, interesting reaction. It annoyed me at first, but then I thought that hey, that’s just one out of a few people who would have downloaded it for free. So k)smith is annoyed it’s not free any more, but what proportion of my audience does he represent?

I then noticed something else interesting. I was at position #30 in the “Top Paid Apps”, right up there with Scrabble at #27. Now I’m assuming this is because it ranks you based on your number of “sales”, regardless of if they were at $0.99 or $0.00. When I changed the price, I got my previous 30,000 (estimated) free “sales” as part of my ranking for paid sales. This is a bit misleading, and quite possibly something Apple will change. But it means I’ll be in the top 100 paid apps for several weeks if I keep the price there.

So now I wait, and see how much money I make in a day. If it drops down to ten copies or so, then I’ll probably make it free again. If it’s $100 a day, then I’ll probably keep charging for it, at least until demand dries up. Economics. Maybe I’ll make enough to cover the cost of my Mac Mini and spare iPhone, and enough to justify continuing to make improvements to CheckWord, and to release other utilities.

This actually does not seem like a terrible marketing strategy. You release an app for free, so thousands of people download it. They give it good reviews, and you get lots of word-of-mouth. When you’ve got sufficient momentum, you up the price as much as the market will bear. This is rather a novel scheme as it relies on having zero distribution costs, something that really only came about with the advent of the Apple App Store.

I suspect this “Introductory free pricing” scheme will show up more and more.

But now I’ve got to wait and see what “the market” for Scrabble word checkers thinks about my $0.99.

Day 1 (8/5/2008):

Sold 98 copies. That’s $68.60, probably in a bit less than 24 hours, as a few sales were reported at $0.00. If that keeps up, it’s $25,000 a year. Okay, seems like it’s worth letting this experiment continue! The first day might be an anomaly, so let’s see how it pans out over the course of the week.

The breakdown was:

US – 65
UK – 18
Canada – 9
Australia – 3
Korea – 2
New Zealand – 1

Another change was I dropped from #30 to #39 on the “Top Paid Apps” list. Perhaps that’s a rolling average of the last seven days or so. Scrabble is now at #30. I’ve also got a few more reviews, some criticizing my move to $0.99, and some defending it. Luckily these reviews don’t actually show up at the top of the list, as it defaults to “most helpful”.

Day 2 (8/6)

Sold 75 copies, down quite a bit, but still worth keeping up there. Let’s give this a week, and see what the actual trend is. First two days may have unusual variance.

Day 3 (8/7)

58 – Down more, not looking too good.

Day 4 (8/8) Friday

61 – Up a bit, maybe it’s stabilized. Heck if I sell 60 a day, then that’s still, erm, 60*.70*365 = $15,330 a year. See how it does over the weekend – maybe all those weekend Scrabble players will give it a boost. I’ve now dropped off the top 100 paid apps, which is as it should be.

Day 5 (8/9) Saturday

Day 6 (8/10) Sunday

Week 1 (8/4 to 8/10)

US 286, UK 64, Canada 29, Australia 18, Rest of World: 31

Total for week = 428, average of 61 per day. And that’s with about a day as a free app.

So, what I decided to do was put out a free version of CheckWord that just has the Scrabble word checking function in there. The full version will remain at $0.99, and hopefully enough people will continue to download the free version, and a portion of them will upgrade to the full version with word generation, anagrams and pattern matching.

The free version is now in the app store, so we’ll see how that works out.


July 23, 2008

CheckWord Reviews

Filed under: Game Development — Mick West @ 9:49 am

My Scrabble word checker “CheckWord” for the iPhone has got a surprisingly large number of reviews and comments for such a simple application. I think this is probably because it was a launch app, and the early adopters are just going through every single app they can get their hands on. Also early-adopters tend to be a bit nerdy, and hence more likely to play Scrabble.

Currently there are three video reviews, one of which is in Spanish, which almost makes me want to do a Spanish version of CheckWord.

And the Spanish one:

The third video review is on appstoreapps.com:
http://www.appstoreapps.com/2008/08/11/checkword/

There are also lots of web posts, most of which seem to be cut and paste from each other – probably mostly automatically in some kind of click mining operations. I’ve tried to get the original sources below:

I make the top ten free utilities for the iPhone on this list, coming in at #8

http://www.extremetech.com/article2/0,2845,2325378,00.asp

Find out if a word is acceptable in games like Scrabble. CheckWord makes it fast and easy to find out if the word somebody is trying to use is real or if they made it all up. Don’t be cheated when playing Scrabble put CheckWord on your iPhone today.

http://www.appleiphoneschool.com/2008/07/14/checkword-10/

CheckWord is a simple application that allows you to check whether or not words are good or bad in games like Scrabble. I love the developer’s description, “Quicker than using a dictionary, easier than using a computer.” See, now that is why we love our iPhones! […]This application is great for those of you who like to play Scrabble the correct way

http://macenstein.com/default/archives/1519

CheckWord is a Scrabble player’s dream app

http://theapplife.com/2008/07/20/a-new-favorite-app-checkword-is-good/

A new favorite app. CHECKWORD is GOOD

http://inc.ongruo.us/2008/08/04/my-new-iphone-25-apps-in-10-days/

CheckWord (free): Very simple, very cool little app that simply tells you whether a word is good or bad in Scrabble. It uses the TWL (Tournament Word List) as opposed to the consumer dictionary, so “FUCK is GOOD”, which is just as it should be.

http://icali.tv/checkword-app-review

Overall, CheckWord is a valuable resource while playing games like Scrabble, whether you’re on your iPhone/iPod Touch, or you’re playing actual board games.

I also got some nice reviews on the App store, although several people there seemed to miss the point that this was a Scrabble word checker, and not a dictionary.


Cowboy Programming Game Development and General Hacking by the Old West
