There are two major ways of using lookup tables (LUTs) in C/C++ code:
- build them at runtime,
- embed them in the code.
One major advantage of runtime initialisation is the choice between static initialisation (at program startup), or lazy initialisation (on demand) to save memory. Also, the generating code can be complex, or use information that is only available at runtime.
In the case of an embedded table, the generation cost is only at compile time, which can be very useful. Also, the compiler may take advantage of its early knowledge of the table contents to optimise code. However, quite often the content of embedded tables is abstruse and hardly useful to someone viewing the code. Usually this is due to the use of an external program for generation, sometimes in a completely different language. But the generation can also often be done using the C/C++ preprocessor.
Practical example
Consider the bit interleaving routine at Sean Eron Anderson's Bit Twiddling Hacks page (which, by the way, I recommend you read and bookmark). It uses the following LUT (shortened for brevity):
static const unsigned short MortonTable256[256] =
{
0x0000, 0x0001, 0x0004, 0x0005, 0x0010, 0x0011, 0x0014, 0x0015,
0x0040, 0x0041, 0x0044, 0x0045, 0x0050, 0x0051, 0x0054, 0x0055,
0x0100, 0x0101, 0x0104, 0x0105, 0x0110, 0x0111, 0x0114, 0x0115,
... 32 lines in total ...
0x5540, 0x5541, 0x5544, 0x5545, 0x5550, 0x5551, 0x5554, 0x5555
};
The MortonTable256 table has, as its name suggests, 256 elements. It was pregenerated by some external piece of code which probably looked like this:
for (int i = 0; i < 256; i++)
MortonTable256[i] = (i & 1) | ((i & 2) << 1) | ((i & 4) << 2) | ((i & 8) << 3);
The problem with that external piece of code is that it is external. You cannot write it in this form and have it fill the table at compile time.
If you only take the output of this table, the information on how the table was created is lost. It makes it impractical to build another table that has, for instance, all values shifted one bit left. Even if such a table was created using a modified version of the above code, switching between the two tables would be a hassle unless both versions were kept between preprocessor tests.
Preprocessor iterator
Here is one way to get the best of both worlds. First, declare the following iterator macros. They can be declared somewhere in a global .h, maybe with more descriptive names:
#define S4(i) S1((i)), S1((i)+1), S1((i)+2), S1((i)+3)
#define S16(i) S4((i)), S4((i)+4), S4((i)+8), S4((i)+12)
#define S64(i) S16((i)), S16((i)+16), S16((i)+32), S16((i)+48)
#define S256(i) S64((i)), S64((i)+64), S64((i)+128), S64((i)+192)
#define S1024(i) S256((i)), S256((i)+256), S256((i)+512), S256((i)+768)
Their purpose is simple: calling eg. S16(i) will expand to S1(i), S1(i+1), …, S1(i+15). Similarly, S256(i) will call S1 with values from i to i + 255 times.
And this is how to use them in our example:
static const unsigned short MortonTable256[256] =
{
#define S1(i) ((i & 1) | ((i & 2) << 1) | ((i & 4) << 2) | ((i & 8) << 3))
S256(0)
#undef S1
};
That's it! The table will be built at compile time, and you get to keep the logic behind it.
A more complex example
Jeroen van der Zijp's fast half float conversions paper describes table-based methods to convert between 16-bit and 32-bit floating point values. The construction of one of the LUTs is as follows:
void generatetables(){
for(unsigned int i=0; i<256; ++i){
int e=i-127;
if(e<-24){ // Very small numbers map to zero
basetable[i|0x000]=0x0000;
basetable[i|0x100]=0x8000;
} else if(e<-14){ // Small numbers map to denorms
basetable[i|0x000]=(0x0400>>(-e-14));
basetable[i|0x100]=(0x0400>>(-e-14)) | 0x8000;
} else if(e<=15){ // Normal numbers just lose precision
basetable[i|0x000]=((e+15)<<10);
basetable[i|0x100]=((e+15)<<10) | 0x8000;
} else if(e<128){ // Large numbers map to Infinity
basetable[i|0x000]=0x7C00;
basetable[i|0x100]=0xFC00;
} else{ // Infinity and NaN's stay Infinity and NaN's
basetable[i|0x000]=0x7C00;
basetable[i|0x100]=0xFC00;
}
}
}
And this is the compile-time version :
static uint16_t const basetable[512] =
{
#define S1(i) (((i) < 103) ? 0x0000 : \
((i) < 113) ? 0x0400 >> (0x1f & (113 - (i))) : \
((i) < 143) ? ((i) - 112) << 10 : 0x7c00)
S256(0),
#undef S1
#define S1(i) (0x8000 | basetable[i])
S256(0),
#undef S1
};
In this case the macro code is slightly bigger and was slightly rewritten, but is no more complicated than the original code. Note also the elegant reuse of previous values in the second half of the table.
This trick is certainly not new, but since I have found practical uses for it, I thought you may find it useful, too.