Stack Allocation

This is another gem from Game Programming Gems 8, describes a method of allocation that shave valuable cycles off, to optimize our code for speed and space.

The typical way of doing a rapid allocation is to use a linked list. Here we will use a simpler stack allocator. This system uses a pre-allocated list of items, with arbitrary-size indices for the pointers, INTs, WORDs, BYTEs or even BITs. But unlike a normal list, we use a simple stack concept and position the stack pointer at the end. Then we can pop the next free item off the stack, and return it with less code, while we push items onto the list to free them.

1601519543482_.pic_hd.jpg

The ability to easily use varying sizes of indices, bits, pointers, or even POD types directly without having to really change the implementation is very powerful. Here is a example with basic pointer allocation.

Pointer-Based Implementation

First, we need to create and initialize the array.

#define      MAX_OBJECT    10000

Particle*    pStack[MAX_OBJECT];          // Our object stack
int          SP;                          // The Stack Pointer
Particle     ParticleArray[MAX_OBJECT];   // The object pool

// Initialise the stack with object indexes from 0 to MAX_OBJECT-1
void InitStack (void)
{
    // Create our object list
    // Pre-Fill stack with indexes (or pointers)
    for (int i = 0; i < MAX_OBJECT; i++) {
        Stack[i] = &ParticleArray[i];
    }
    // Initialise the stack pointer
    SP = MAX_OBJECT;
}

Creation is simply a matter of filling the array with object pointers and then setting up a stack pointer. Then allocation and releasing of a particle object:

// Pop a particle from the free pool.
// -1 is returned if there are no free particles left.
Particle* Pop (void)
{
    assert(SP >= 0, "Error: Stack pointer has gone negative!");
    if (SP == 0) return -1;
    return pStack[--SP];
} 

// Push a used particle back onto the free pool
void Push (Particle* _pParticle) {
    assert (SP >= 0, "Error: Stack pointer has gone negative!");
    assert (SP != MAX_OBJECT, "Error: Not enough space in stack. Object freed twice?");
    pStack[SP++] = _pParticle;
}

Allocating a particle is simply a matter of calling Pop(), while releasing only requires to call Push(). To clarify when dealing with a stack method, we will stick with Push() and Pop().

Index-Based Implementation

Now let’s try to allocate an index as handle to an object rather than a pointer to the object. In this example, we have 256 sprites that we wish to allocate from, and rather than allocating an object and returning a whole pointer, we will simply allocate and return the BYTE index, thereby saving memory.

#define          MAX_SPRITES    256
unsigned char    Stack[MAX_SPRITES];
Sprite           SpritePool[MAX_SPRITES];

// Initialize the stack with object indexes from 0 to MAX_OBJECT-1
void InitStack (void)
{
    // Pre-Fill stack with ready to use particles.
    for (int i = 0; i < MAX_SPRITES; i++) {
        Stack[i] = (unsigned char) i;
    }

    // Initialize the stack pointer
    SP = MAX_SPRITES;
}

Here we changed the allocation type by simply replacing the Particle* with a sprite index. Because the total index number is 256, so we can use an 8 bit unsigned char to store them. The calling class can then store the index and use it as a handle directly, or at least take the address of &SpritePool[index] just before using it.

Taking an extreme case of 16 million objects (a full 24-bit number), normally we would choose a full INT index or a linked list. However, using the stack method, we can easily store 3 bytes per entry by storing two tables of a byte and a short. Thereby saving 1 byte, that is 16 MB for total. Taking this further, for example, a 18 bits of information, we can deal with multiple tables of one holding a short and another holding the leftover bits.


const int         MAX_OBJ = 0x40000;
unsigned short    Stack_short[MAX_OBJ];
unsigned int      Stack_bits[MAX_OBJ/16];
int               SP;
// Initialize the free list
void Init (void)
{
    // Create and fill our 18 bit table
    // Use Push to build the table as it is easier in this case
    for (int i = 0; i < MAX_OBJ; i++) {
        Push(i);
    }
}

There are some tricky bit manipulations in Push command for dividing 18 bits into a short and 2 bits, then push them onto two stacks.

// Push an 18-bit index onto the object stack
void Push (int _value)
{
    assert(_value >= 0);
    assert(_value < 0x400000);
    assert(SP < MAX_OBJ);

    Stack_short[SP] = (unsigned short) _value & 0xffff;

    // Clear the bits we're about to OR into.
    int shift = (SP & 0xf) << 1; // 0, 2, 4, 6, 8, ...30
    int index = SP >> 4; // every 16 entries have the same most significant 2 bits
    Stack_bits[index] &= ~(3 << shift) // keep the rest bits in the entry
    Stack_bits[index] |= ((_value >> 16) & 0x3) << shift;
    SP++;
}

It is written for storage efficiency, although much slower than a linked list or a pointer array. Then for the allocation:

// Return the next free index (an 18-bit number), or -1 for an error.
int Pop (void)
{
    if (SP == 0) return -1;

    // Get the main 16 bits
    int val = Stack_short[--SP];

    // OR in the extra bits we need to make up the 18bit index
    val |= ((Stack_bits[SP>>4]>>((SP&0xf)<<1))&0x3)<<16;

    return val;
}

The real strength is the ease and speed at which we can adapt the stack allocation system to our needs. Not only pointers, BYTEs, SHORTs, and INTs, even groups of BITs.

We can also use POD(Plain Old Data) type, which store a whole structure but can also fit inside a standard native type. For example, we need a POD to hold 64×64 tile coordinate inside multiple 4096×4096 texture pages.

struct STileCoordinate
{
    unsigned char    X;    // 64x64 X tile coordinate
    unsigned char    Y;    // 64x64 Y tile coordinate
    short            page; // Texture page to use. (also padding)
};

Since this data fits inside 4 bytes, we can actually allocate and return it directly without the need of pointers. We can also make an array of these that is the same amount of memory as an array of INTs. In fact this is copied around at the same speed as a standard integer number, as now modern compilers will recognize this and allow us to pass around this data inside a register.

// Return a free tile structure
STileCoordinate Pop(void)
{
    // If none left, then return an error
    if (SP == 0) return EmptyTile

    // Get the main 16 bits
    return TileStack[--SP];
}
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s