Fuck you, Microsoft: the sorry state of Visual Studio syntax highlighting

TL;DR: you can’t even choose a different colour for “return” and “float” in Visual Studio.

I mostly use Vim as my everyday editor. This is not the place to go through the details of why, but basically it does exactly what I want. One thing Vim does well for me is syntax highlighting. Here is a bit of Lol Engine C++ using Vim’s default colour scheme:

The colour scheme is simple here:

  • yellow for control flow
  • green for types and qualifiers
  • magenta for constants
  • light grey for everything else

You may notice that the vec3 or quat types, which are not built-in C++ types but which behave exactly as if they were, are coloured just like float. This is simply done by adding the following line to my main config file or to a separate configuration file:

au Syntax cpp syn keyword cType vec2 vec3 vec4 quat
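If you prefer the separate file, something like ~/.vim/after/syntax/cpp.vim (Vim's standard "after" directory, sourced once the built-in C++ syntax has loaded) does the job too, and there the autocommand wrapper is not even needed:

syn keyword cType vec2 vec3 vec4 quat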

Okay now I would like you to read that again.

  • yellow for control flow
  • green for types (including custom types) and qualifiers
  • I get all that shit by adding one single configuration line

And I am going to show you the pain it is to do the same in Visual Studio.

First Visual Studio attempt: usertype.dat

Since at least Visual Studio 2003, the usertype.dat file can be used to add a list of user types. This is still true in Visual Studio 2010. So let’s add those new types to usertype.dat:

vec2
vec3
vec4
quat

Not quite there.

M_PI is not coloured, and if I add it to the list it becomes green instead of magenta, but let’s forget about it for now.

float and const are still yellow and there is no way to fix that! Of course I thought about adding them to the list of user types, but the scanner decides that they’re C++ keywords way before it checks for user types.

Second Visual Studio attempt: a Language Service add-in

Visual Studio add-ins are very powerful. Almost any part of the editor can be modified and augmented. There are hundreds of add-ins available and you can write your own.

This looks promising: not only can you write your own language handler, but surely you can modify an existing one quite easily (or so I thought).

To handle a new language, you need to subclass the LanguageService class and you register that new class in Visual Studio. One of the methods you need to implement in the language service is GetScanner:

class MyLanguageService : LanguageService
{
    [...]
    public override IScanner GetScanner(IVsTextLines buffer)
    {
        return new MyScanner(buffer);
    }
}

And the IScanner interface has a ScanTokenAndProvideInfoAboutIt method that is responsible for choosing the colour of tokens:

class MyScanner : IScanner
{
    [...]
    bool IScanner.ScanTokenAndProvideInfoAboutIt(TokenInfo tokeninfo, ref int state)
    {
        [...]
        if (FoundKeyword)
        {
            tokeninfo.Color = TokenColor.Keyword;
            return true;
        }

        [...]
    }
}

This is brilliant. I mean, with such an architecture, you can implement your own sublanguage, add colour schemes, modify whatever you want. Honestly, this is perfect for my needs.

So here is the plan:

  1. find the LanguageService class responsible for parsing C++
  2. inherit from it and reimplement GetScanner() to use our own IScanner
  3. in our version of ScanTokenAndProvideInfoAboutIt, just call the C++ scanner’s version of ScanTokenAndProvideInfoAboutIt and inspect tokeninfo
  4. if the token info says this is a keyword, match that keyword with our list of types, and change its colour if necessary
  5. register our new language service with a higher priority

This sounds pretty simple and elegant. It’s some kind of two-level proxy pattern.

Except it has absolutely no chance to work. Because there is no language service class for C++.

That’s right. Visual Studio does not use its own advertised architecture to handle the C++ language. If you want to slightly change the behaviour of the C++ language service, you need to fully reimplement it. And by fully, I mean fully: even the completion stuff for IntelliSense.

Third Visual Studio attempt: a Classifier add-in

A classifier add-in differs from a regular language add-in in that it only affects the text that is being displayed. It has no knowledge of the language syntax or structure or what the underlying parser has analysed, but it does know about what the underlying classifier did. For instance, it doesn't know whether a given chunk of text is a C-style or a C++-style comment, but it does know that it was classified as "comment".
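Concretely, a classifier plugs into the editor through a provider and hands back extra classification spans for the text it recognises. Here is a simplified sketch of what that skeleton looks like with the Visual Studio 2010 editor API (this is not the actual add-in code: the class names are made up and the content type string may need adjusting):

[Export(typeof(IClassifierProvider))]
[ContentType("C/C++")] // content type of the C++ editor; may need adjusting
internal class LolClassifierProvider : IClassifierProvider
{
    [Import]
    internal IClassificationTypeRegistryService m_registry = null;

    public IClassifier GetClassifier(ITextBuffer buffer)
    {
        return buffer.Properties.GetOrCreateSingletonProperty(
                   () => new LolClassifier(buffer, m_registry));
    }
}

internal class LolClassifier : IClassifier
{
    private ITextBuffer m_buffer;
    private IClassificationTypeRegistryService m_registry;

    public LolClassifier(ITextBuffer buffer, IClassificationTypeRegistryService registry)
    {
        m_buffer = buffer;
        m_registry = registry;
    }

    // Called by the editor for every span of text it is about to render;
    // this is where the add-in would return extra classification spans
    // for custom type names such as vec2, vec3, vec4 or quat.
    public IList<ClassificationSpan> GetClassificationSpans(SnapshotSpan span)
    {
        return new List<ClassificationSpan>();
    }

    public event EventHandler<ClassificationChangedEventArgs> ClassificationChanged;
}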

This proved to be the correct thing to use! My Visual Studio colour scheme now looks a lot more like my Vim setup:

There are still limitations, but it's a good start. When another plugin with a higher priority comes in, it undoes everything my add-in did. That is arguably the fault of those other plugins, but I believe the real issue is the lack of a properly pluggable architecture.

Further thoughts

I know this is a rant, but I will nonetheless add my own constructive information here, as well as anything readers may wish to contribute.

There are other paths I have not explored yet:

  • disassemble the Visual Studio DLLs

I am pretty sure people will suggest that I use VAX (Visual Assist X). I am already using it. I am even a paying customer. In fact I asked for that feature more than three years ago and was more or less ignored (the only answer I got was about a minor point where the person thought I was wrong — I wasn’t). While most of the bugs I reported against VAX were fixed, I have a problem with their stance on accessibility, illustrated by their attitude on this bug. My general feeling is that VAX is a pathetic, slow and annoying piece of crap. The only reason I do not rant more about it is that I know how painful it is to write Visual Studio extensions.

I asked for advice on StackOverflow but since my problem is very specific and probably has no solution, it’s not surprising that I haven’t got any answers yet.

Someone wanted to extend the syntax colouring but was told that apparently “this can't be accomplished with an add-in”, “you may be looking at implementing a full language service to provide this feature” and “Todays language services are currently not architected to be extensible”. One suggestion was to replace a whole COM object using only its CLSID.

Another person wanted to leverage existing language services from within Visual Studio in order to use it for his own language, and was told it was not possible. The workaround mentioned in that thread involves creating a whole new virtual project that would mirror files, hide them, rename them to .c or .h, and analyse the result.

Conclusion

Honestly, the only reasons I still use Visual Studio are:

  • I use it at work
  • a lot of people use it and I need to provide them with a usable environment
  • there’s no other acceptable way to develop for the Xbox 360

But given how it sucks, and has sucked for years, and made my life miserable, and how some of the bugs I reported back in 1997 are still present, I can only hope that this pathetic piece of crap either becomes open source (wishful thinking) or just dies and we get something really extensible instead.

The stolen bytes: Visual Studio, virtual methods and data alignment

This article describes a design choice in the C++ ABI of the Visual Studio compiler that I believe should be considered a bug. I propose a trivial workaround at the end.

TL;DR — if the topmost polymorphic class in a hierarchy has members with alignment requirement N where N > sizeof(void *), the Visual Studio compiler may add up to N bytes of useless padding to your objects.

Update: be sure to read the explanation by Jan Gray, who designed the relevant part of the MS C++ ABI some 22 years ago, in the comments section below.

My colleague Benlitz first hit the problem when trying to squeeze memory out of some of our game’s most often instantiated classes. I think it is best illustrated with the following minimal example:

class Foo
{
    virtual void Hello() {}
    float f;     /* 4 bytes */
};
class Bar
{
    virtual void Hello() {}
    float f;     /* 4 bytes */
    double d;    /* 8 bytes */
};
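A minimal test program is enough to reproduce the measurements below:

#include <cstdio>

int main(void)
{
    /* Cast to int: %zu is not reliably available on older MSVC runtimes */
    std::printf("sizeof(Foo) = %d, sizeof(Bar) = %d\n",
                (int)sizeof(Foo), (int)sizeof(Bar));
    return 0;
}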

This is the size of Foo and Bar on various 32-bit platforms:

\begin{tabular}{|l|c|c|c|}
\hline
Platform & \texttt{sizeof(Foo)} & \texttt{sizeof(Bar)} & Madness? \\
\hline
Linux x86 (gcc) & 8 & 16 & no \\
Linux ARMv9 (gcc) & 8 & 16 & no \\
Win32 (gcc) & 8 & 16 & no \\
Win32 (Visual Studio 2010) & 8 & 24 & yes \\
Xbox 360 (Visual Studio 2010) & 8 & 24 & yes \\
PlayStation 3 (gcc) & 8 & 16 & no \\
PlayStation 3 (SNC) & 8 & 16 & no \\
Mac OS X x86 (gcc) & 8 & 16 & no \\
\hline
\end{tabular}

There is no trick. This is by design. The Visual Studio compiler is literally stealing 8 bytes from us!

What the fuck is happening?

This is the memory layout of Foo on all observed platforms:

\begin{tabular}{|r|llll|llll|}
\hline
byte & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\
\hline
field & \multicolumn{4}{|c|}{\textit{vfptr}}
      & \multicolumn{4}{|c|}{\texttt{float f;}} \\
\hline
\end{tabular}

The vfptr field is a special pointer to the vtable. The vtable is probably the most widespread compiler-specific way to implement virtual methods. Since all the platforms studied here are 32-bit, this pointer requires 4 bytes. A float requires 4 bytes, too. The total size of the class is therefore 8 bytes.

This is the memory layout of Bar on Linux using GCC, for instance:

\begin{tabular}{|r|llll|llll|llllllll|}
\hline
byte & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7
     & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 \\
\hline
field & \multicolumn{4}{|c|}{\textit{vfptr}}
      & \multicolumn{4}{|c|}{\texttt{float f;}}
      & \multicolumn{8}{|c|}{\texttt{double d;}} \\
\hline
\end{tabular}

The double type has an alignment requirement of 8 bytes, which makes it fit perfectly at byte offset 8.

And finally, this is the memory layout of Bar on Win32 using Visual Studio 2010:

\begin{tabular}{|r|llll|llll|llll|llll|llllllll|}
\hline
byte & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7
     & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15
     & 16 & 17 & 18 & 19 & 20 & 21 & 22 & 23 \\
\hline
field & \multicolumn{4}{|c|}{\textit{vfptr}}
      & \multicolumn{4}{|c|}{\textit{padding}}
      & \multicolumn{4}{|c|}{\texttt{float f;}}
      & \multicolumn{4}{|c|}{\textit{padding}}
      & \multicolumn{8}{|c|}{\texttt{double d;}} \\
\hline
\end{tabular}

This is madness! The requirement for the class to be 8-byte aligned causes the first element of the class to be 8-byte aligned, too! I demand a rational explanation for this design choice.

The problem is that the compiler decides to add the vtable pointer after it has aligned the class data, resulting in excessive realignment.
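If you want to check such layouts yourself, the Visual Studio compiler has an undocumented but well-known family of switches that dump the layout of a given class at compile time, and GCC's -fdump-class-hierarchy option provides similar information. For instance (an illustrative invocation; the output format varies between compiler versions):

cl /c /d1reportSingleClassLayoutBar bar.cpp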

Compilers affected

The Visual Studio compilers for Win32, x64 and Xbox 360 all appear to create spurious padding in classes.

Though this article focuses on 32-bit platforms for the sake of simplicity, 64-bit Windows is affected, too.

The problem becomes even worse with larger alignment requirements, for instance with SSE3 or AltiVec types that require 16-byte storage alignment such as _FP128:

class Quux
{
    virtual void Hello() {}
    float f;     /* 4 bytes */
    _FP128 dd;   /* 16 bytes */
};

This is the GCC memory layout on both 32-bit and 64-bit platforms:

\begin{tabular}{|r|c|c|c|c|c|c|c|c|}
\hline
byte & 0--3 & 4--7 & 8--11 & 12--15
     & 16--19 & 20--23 & 24--27 & 28--31 \\
\hline
\hline
field (32-bit) & \textit{vfptr}
                & \texttt{float f;}
                & \multicolumn{2}{|c|}{\textit{padding}}
                & \multicolumn{4}{|c|}{\texttt{\_FP128 dd;}} \\
\hline
field (64-bit) & \multicolumn{2}{|c|}{\textit{vfptr}}
               & \texttt{float f;}
               & \textit{padding}
               & \multicolumn{4}{|c|}{\texttt{\_FP128 dd;}} \\
\hline
\end{tabular}

The padding there is perfectly normal and expected, because of the alignment requirements for dd.

But this is how Visual Studio decides to lay it out:

\begin{tabular}{|r|c|c|c|c|c|c|c|c|c|c|c|c|}
\hline
byte & 0--3 & 4--7 & 8--11 & 12--15
     & 16--19 & 20--23 & 24--27 & 28--31
     & 32--35 & 36--39 & 40--43 & 44--47 \\
\hline
\hline
field (32-bit) & \textit{vfptr}
               & \multicolumn{3}{|c|}{\textit{padding}}
               & \texttt{float f;}
               & \multicolumn{3}{|c|}{\textit{padding}}
               & \multicolumn{4}{|c|}{\texttt{\_FP128 dd;}} \\
\hline
field (64-bit) & \multicolumn{2}{|c|}{\textit{vfptr}}
               & \multicolumn{2}{|c|}{\textit{padding}}
               & \texttt{float f;}
               & \multicolumn{3}{|c|}{\textit{padding}}
               & \multicolumn{4}{|c|}{\texttt{\_FP128 dd;}} \\
\hline
\end{tabular}

That is 16 lost bytes, both on 32-bit and 64-bit versions of Windows.

Workaround

There is fortunately a workaround if you want to get rid of the useless padding. It is so trivial that it actually makes me angry that the problem exists in the first place.

This will get you your bytes back:

class EmptyBase
{
protected:
    virtual ~EmptyBase() {}
};
class Bar : public EmptyBase
{
    virtual void Hello() {}
    float f;     /* 4 bytes */
    double d;    /* 8 bytes */
};

And this is the size of Bar on the same 32-bit platforms:

\begin{tabular}{|l|c|}
\hline
Platform & \texttt{sizeof(Bar)} \\
\hline
Linux x86 (gcc) & 16 \\
Linux ARMv9 (gcc) & 16 \\
Win32 (gcc) & 16 \\
Win32 (Visual Studio 2010) & 16 \\
Xbox 360 (Visual Studio 2010) & 16 \\
PlayStation 3 (gcc) & 16 \\
PlayStation 3 (SNC) & 16 \\
Mac OS X x86 (gcc) & 16 \\
\hline
\end{tabular}

Phew. Sanity restored. This is the resulting memory layout of Bar:

\begin{tabular}{|r|cccc|cccc|cccccccc|}
\hline
byte & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7
     & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 \\
\hline
\texttt{EmptyBase} fields & \multicolumn{4}{|c|}{\textit{vfptr}}
                          & \multicolumn{12}{|c|}{} \\
\hline
\texttt{Bar} fields & \multicolumn{4}{c}{\texttt{EmptyBase}}
                    & \multicolumn{4}{|c|}{\texttt{float f;}}
                    & \multicolumn{8}{|c|}{\texttt{double d;}} \\
\hline
\end{tabular}

The compiler is a lot less confused now: it no longer has to create space for a vfptr in Bar since it is technically already part of EmptyBase.

Conclusion

Lessons learned:

  • The pointer to the vtable isn’t just like any other pointer.
  • Various C++ ABIs have different stances on padding and alignment.
  • Inheriting from an empty abstract class can make your objects smaller on Windows and Xbox 360!
  • Design decisions can haunt you for decades!

The workaround is so simple that it sounds like a good idea to always use it, preemptively.
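If you do apply it preemptively, a one-line compile-time check will catch the day the padding silently comes back (a sketch; static_assert needs C++11 support, which Visual Studio 2010 and g++ with -std=c++0x both provide):

static_assert(sizeof(Bar) == 16, "unexpected padding in Bar");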

Setting up a real Compose key on Mac OS X

The Compose key is my method of choice for international character input. It lets me type characters as diverse as é Â ẃ ṗ § … « ¿ ¥ ¹ ½ © × using simple and intuitive key combinations.

How intuitive exactly? Let’s see:

  • Compose + C + , gives me Ç.
  • Compose + l + / gives me ł.
  • Compose + < + < gives me «.
  • Compose + - + > gives me →.
  • Compose + < + 3 gives me ♥.
  • Compose + C + C + C + P gives me ☭ (no kidding).

You can of course set up your own rules. This shit is so powerful that I cannot imagine I could ever use any other input method.

So, one would think that with all its glorious Unix heritage, Mac OS X would let you get the most out of your keyboard like the good old X11 system does. Well it turns out it’s possible, but not straightforward.

Fortunately, other people already did all the work. I will just indicate how to put their stuff together.

Step 1: choose a Compose key

Choose the Compose key so that it is easily accessible but does not prevent you from doing anything you ordinarily do. Fortunately, modern keyboards come with more and more idiotic and useless keys.

I use the Right Alt key as my Compose key. I already have a Left Alt key so the right one is a bit useless to me. And it somehow matches the position of the Compose key on old Sun keyboards.

That would be Right Option on a Mac keyboard. I recommend that.

Step 2: remap the Compose key

The problem is that the Mac OS X keyboard preferences:

  • do not let you differentiate between Left and Right Option keys
  • only let you remap modifier keys to another modifier key (or to nothing)

Fortunately, there is KeyRemap4MacBook, which lets you do very low-level things with your keyboard. Install it.

We will now remap our compose key to something that the next layer will understand. I chose Shift-Control-F13 for that. It is very unlikely you will need that key combination.

In the file ~/Library/Application Support/KeyRemap4MacBook/private.xml put the following:

<?xml version="1.0"?>
<root>
  <item>
    <name>Send Shift-Ctrl-F13 for Right Option</name>
    <identifier>private.send_shift_ctrl_f13_for_ropt</identifier>
    <autogen>--KeyToKey-- KeyCode::OPTION_R,
                          KeyCode::F13, ModifierFlag::SHIFT_L
                                      | ModifierFlag::CONTROL_L
    </autogen>
  </item>
</root>

Finally, from the System Preferences, open the KeyRemap4MacBook settings and click on the ReloadXML button:

The new option should appear. Activate it:

Step 3: create compose bindings

The last step is the creation of the actual bindings. I chose to import the rules from /usr/share/X11/locale/en_US.UTF-8/Compose on my Debian system.

Bob Kåres wrote a script that lets you convert X11 compose rules into Cocoa key bindings.

Either convert a Compose file of your own using Bob’s script, or download my DefaultKeyBinding.dict. Save it in ~/Library/KeyBindings/DefaultKeyBinding.dict.

Be careful: by default Bob’s script uses F13 instead of Shift-Ctrl-F13 so in DefaultKeyBinding.dict you need to change:

    "\UF710"

into:

    "^$\UF710"

If for some reason you decided to go for another combination, check out this article by Xah Lee to find out the proper syntax.
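For reference, the bindings in DefaultKeyBinding.dict are nested old-style plist dictionaries. A hand-written excerpt implementing the Ç and « rules from the beginning of this article would look roughly like this (an illustration only; the file generated by Bob's script is of course much larger):

{
    "^$\UF710" = {
        "C" = { "," = ("insertText:", "Ç"); };
        "<" = { "<" = ("insertText:", "«"); };
    };
}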

Step 4: restart all applications

And that’s it! Your Mac OS X system is now slightly more usable.

Official IRC channel

The official IRC channel for Lol Engine developers and users is now #LOL on irc.oftc.net.

Anything related to Lol Engine usage or development is on-topic. Game development, rendering, maths and physics are welcome too, though of course there are better channels for those topics.

You can access #LOL directly using the following link: irc://irc.oftc.net/#LOL.

LINK : fatal error LNK1104: cannot open file ‘XAPID.lib’

Ever got a link error for a library that was referenced nowhere in your Visual Studio project or even in the final link.exe command line? Here's a hint: check the contents of static libraries, too. They may be pulling unexpected dependencies behind your back!
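One way to check those contents is the dumpbin tool that ships with Visual Studio: its /directives option lists the linker directives embedded in each object file of a library, so an unexpected /DEFAULTLIB entry naming XAPID in there would explain exactly this error (mylib.lib is a placeholder name):

dumpbin /directives mylib.lib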

If the static library is part of your solution, here is another hint: check that the [Configuration Properties] >> [C/C++] >> [Code Generation] >> [Runtime Library] configuration values match across projects.

Beyond De Bruijn: fast binary logarithm of a 10-bit number

Recently I needed a method for retrieving the binary logarithm of a 10-bit number (for the curious, it was for the purpose of converting between 32-bit and 16-bit floating point numbers).

Computing the binary logarithm is equivalent to knowing the position of the highest order set bit. For instance, log2(0x1) is 0 and log2(0x100) is 8.

One well known method for fast binary logarithm is presented at Bit Twiddling Hacks. It is a two-step method where first all lower bits are set to 1 and then a De Bruijn-like sequence is used to perform a table lookup:

int fastlog2(uint32_t v)
{
    static const int MultiplyDeBruijnBitPosition[32] =
    {
        0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
        8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
    };
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    return MultiplyDeBruijnBitPosition[(uint32_t)(v * 0x07C4ACDDU) >> 27];
}

That is 12 integer operations and a table lookup.

Optimising

It should be obvious what the sequence of operations on v does: fill the integer with ones starting from the highest order bit. Here are a few examples of what happens to v at each step:

\begin{tabular}{|l|l|l|l|l|l|}
\hline
\texttt{v} & \texttt{v |= v >> 1} & \texttt{v |= v >> 2} & \texttt{v |= v >> 4} & \texttt{v |= v >> 8} & \texttt{v |= v >> 16} \\
\hline
\texttt{0x0001} & \texttt{0x0001} & \texttt{0x0001} & \texttt{0x0001} & \texttt{0x0001} & \texttt{0x0001} \\
\texttt{0x0002} & \texttt{0x0003} & \texttt{0x0003} & \texttt{0x0003} & \texttt{0x0003} & \texttt{0x0003} \\
\texttt{0x0003} & \texttt{0x0003} & \texttt{0x0003} & \texttt{0x0003} & \texttt{0x0003} & \texttt{0x0003} \\
\texttt{0x0004} & \texttt{0x0006} & \texttt{0x0007} & \texttt{0x0007} & \texttt{0x0007} & \texttt{0x0007} \\
\texttt{0x0100} & \texttt{0x0180} & \texttt{0x01e0} & \texttt{0x01fe} & \texttt{0x01ff} & \texttt{0x01ff} \\
\texttt{0x80000000} & \texttt{0xc0000000} & \texttt{0xf0000000} & \texttt{0xff000000} & \texttt{0xffff0000} & \texttt{0xffffffff} \\
\hline
\end{tabular}

There is one obvious optimisation available: since the input is only 10-bit, the last shift operation v |= v >> 16 can be omitted because the final value was already reached.

int fastlog2(uint32_t v)
{
    static const int MultiplyDeBruijnBitPosition[32] =
    {
        0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
        8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
    };
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    return MultiplyDeBruijnBitPosition[(uint32_t)(v * 0x07C4ACDDU) >> 27];
}

10 instructions instead of 12. Not really amazing, but worth mentioning.

Optimising more?

Could we do better? Now the last line is v |= v >> 8; and it is only useful to propagate the 9th and 10th bits to positions 1 and 2. What happens if we omit that line? Let’s see:

  • For most values of v, the expected value is obtained.
  • For values of v with a highest order bit at 9th position, 0x1fe could be obtained instead of 0x1ff.
  • For values of v with a highest order bit at 10th position, one of 0x3fc, 0x3fd or 0x3fe could be obtained instead of 0x3ff.

The list of possible output values would therefore be 0x1, 0x3, 0x7, 0xf, 0x1f, 0x3f, 0x7f, 0xff, 0x1fe, 0x1ff, 0x3fc, 0x3fd, 0x3fe, 0x3ff. What happens to these values when multiplying them with the De Bruijn sequence? Let's see:

\begin{tabular}{|l|l|}
\hline
\texttt{v} & \texttt{(v * 0x07C4ACDDU) >> 27} \\
\hline
\texttt{0x1} & 0 \\
\texttt{0x3} & 2 \\
\texttt{0x7} & 6 \\
\texttt{0xf} & 14 \\
\texttt{0x1f} & 30 \\
\texttt{0x3f} & 29 \\
\texttt{0x7f} & 27 \\
\texttt{0xff} & 23 \\
\texttt{0x1fe} & 15 \\
\texttt{0x1ff} & 16 \\
\texttt{0x3fc} & 30 \\
\texttt{0x3fd} & 31 \\
\texttt{0x3fe} & 0 \\
\texttt{0x3ff} & 1 \\
\hline
\end{tabular}

Damn! Two of the new values collide with existing ones: 0x3fc maps to 30 just like 0x1f, and 0x3fe maps to 0 just like 0x1. It looks like we cannot omit the last line after all.

Beyond De Bruijn

Let’s give another try at the problem. Usually De Bruijn sequences are built using either nontrivial algorithms, or brute force. Maybe we could find another sequence that has no collision? Or a sequence that is not a De Bruijn sequence but that works for our problem?

Well, let’s just brute force!

(2 seconds later)

int fastlog2(uint32_t v)
{
    static const int MagicTable[16] =
    {
        0, 1, 2, 8, -1, 3, 5, 9, 9, 7, 4, -1, 6, -1, -1, -1
    };
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    return MagicTable[(uint32_t)(v * 0x5a1a1a2u) >> 28];
}

Down to 8 instructions instead of 12. And the lookup table is now half the size!
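For the curious, the search itself does not need anything clever. A brute-force sketch along these lines (not the exact program used for this article, and not particularly optimised) is enough to find suitable multipliers for a 4-bit table index:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* The 14 values that can reach the lookup, and their expected logarithm */
    static uint32_t const values[14] =
    {
        0x1, 0x3, 0x7, 0xf, 0x1f, 0x3f, 0x7f, 0xff,
        0x1fe, 0x1ff, 0x3fc, 0x3fd, 0x3fe, 0x3ff
    };
    static int const log2s[14] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 8, 9, 9, 9, 9 };

    for (uint64_t mul = 1; mul < (1ULL << 32); ++mul)
    {
        int table[16], ok = 1;
        for (int i = 0; i < 16; ++i)
            table[i] = -1;
        for (int i = 0; i < 14 && ok; ++i)
        {
            uint32_t index = (uint32_t)(values[i] * mul) >> 28;
            if (table[index] == -1 || table[index] == log2s[i])
                table[index] = log2s[i];   /* no collision so far */
            else
                ok = 0;                    /* collision: try the next multiplier */
        }
        if (ok)
        {
            printf("found multiplier 0x%08x\n", (unsigned int)mul);
            return 0;
        }
    }
    return 1;
}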

Conclusion

It is possible for multiply-and-shift techniques similar to the De Bruijn sequence algorithm to exist for a larger set of problems. Brute-forcing the search for the multiplier is a perfectly valid method when 32-bit multiplications are involved.

The code used for this article is included in the attached file.

Maths trick: doing fewer comparisons

Note: this is not an optimisation. It is just one more tool you should have in your toolbox when looking for optimisations. It may be useful.

This is the trick:

\[\min(x,y) = \dfrac{x + y - |x - y|}{2}\]
\[\max(x,y) = \dfrac{x + y + |x - y|}{2}\]

You can check for yourself that it is always true: when x > y, |x - y| is x - y, so (x + y - (x - y)) / 2 reduces to y, which is indeed min(x,y); the other cases work out the same way.

What good is it for? There is often an implicit comparison in min or max. It might be interesting to replace it with a call to the branchless fabs.
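As a quick illustration, here is what the identities look like in code (a sketch; it deliberately ignores overflow and NaN subtleties):

#include <math.h>

/* Branchless min/max using the identities above */
static inline float branchless_min(float x, float y)
{
    return (x + y - fabsf(x - y)) * 0.5f;
}

static inline float branchless_max(float x, float y)
{
    return (x + y + fabsf(x - y)) * 0.5f;
}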

Example usage

Consider the following code:

float a, b, c, d;
/* ... */
return (a > b) && (c > d);

That kind of code is often used eg. in collision checks, where a lot of tests can be done. This code does two comparisons. On some architectures, this means two branches. Not always something you want.

The test condition is equivalent to:

(a - b > 0) && (c - d > 0)

Now, when are two numbers both strictly positive? Exactly when the smaller of the two is strictly positive:

min(a - b, c - d) > 0

We may now use our trick:

(a - b) + (c - d) - |(a - b) - (c - d)| > 0

And so the code could be rewritten as such:

float a, b, c, d;
/* ... */
return (a - b) + (c - d) > fabsf((a - b) - (c - d));

We basically replaced the second comparison with a call to fabsf and a few additions and subtractions. It may be possible to reorganise the input data so that this second version performs better.

C++ trick: selectively restrict implicit conversions

TL;DR: given a class Foo with an implicit constructor from int, how to allow the implicit conversion in f(42); but not in g(42); where both f and g take a Foo const & argument?

Background

So I have this real class that performs numeric operations and that I want to use just like any other C++ numeric type. For instance, I can write the following:

float f = 15, g = 3.5;
int x = f / g;

If I decide that I need double precision, I can write:

double f = 15, g = 3.5;
int x = f / g;

And of course, using my real class for even higher precision:

real f = 15, g = 3.5;
int x = f / g;

I like that. I can just write code as usual, and when I need higher precision, I use real instead of double. It's transparent and convenient.

Implementation example

Here is a highly simplified example of a real class:

struct real
{
    inline real(double d) : m_value(d) {}
    inline operator int() const { return (int)m_value; }
    /* ... */
    long double m_value;
};

It is possible to write real f = 15 because of the implicit constructor. Indeed, C++ constructors allow implicit conversions unless they are declared explicit.

It is possible to write int x = f / g because of the conversion operator.

So far, so good.

The problem with implicit promotion

Here is how fabs could be implemented:

real fabs(real const &r)
{
    return real(r.m_value < 0 ? -r.m_value : r.m_value);
}

But now we have a problem. A subtle problem. Consider the following code:

double x = fabs(-5.0);

What does this do? Well, it depends. It depends whether <cmath> was included or not. Because if <cmath> wasn’t included, then that code is going to automatically promote -5.0 to a real and call our custom function instead of the one provided by the math library! With no compile-time warning!

This is confusing. It should not happen. But it is a well known problem and there are several obvious workarounds:

  1. What most professional C++ programmers will tell you: use namespaces
  2. Mark the real(double) constructor explicit

The problem with 1. is that I am not a professional C++ programmer. I am a C programmer who uses C++. I use preprocessor macros and printf and memalign and goto. Try and stop me!

The problem with 2. is that I can no longer write real f = 15; I would need real f(15) or real f = real(15) instead. This is not acceptable: I want real to behave exactly like float and friends, to the fullest extent of what the language allows.

Another solution

Fortunately, the C++ standard has a solution for us: “Implicit conversions will be performed [...] if the parameter type contains no template-parameters that participate in template argument deduction” (ISO/IEC 14882:1998, section 14.8.1.4). You cannot have implicit conversion and template argument deduction at the same time.

It means we just have to make fabs a template function! Which means making real a template class, too.

A quick way to fix real would be:

/* N is unused */
template<int N> struct real_base
{
    inline real_base(double d) : m_value(d) {}
    inline operator int() const { return (int)m_value; }
    /* ... */
    long double m_value;
};
typedef real_base<0> real;

The template argument is useless, unfortunately. It will just have to be here, forever. But who knows, you might find a use for it one day.

And to fix fabs:

/* A generic template declaration is needed */
template<int N> real_base<N> fabs(real_base<N> const &r);
/* Here we just add template<> to the previous version */
template<>
real fabs(real const &r)
{
    return real(r.m_value < 0 ? -r.m_value : r.m_value);
}

So, what happens with double x = fabs(-5.0); when we forget to include <cmath> now? Well, here is what GCC says:

In function ‘int main()’:
error: no matching function for call to ‘fabs(double)’
note: candidate is:
note: template<int N> real_base<N> fabs(const real_base<N>&)

It seems we’ve successfully managed to avoid the problematic implicit conversion, yet still allow it in places where it was useful!

So what is the rule? It’s simple: where implicit conversion should not be allowed, make the function a specialised template function.
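Applied to the f() and g() question from the TL;DR, the whole thing fits in a few lines (a sketch; FooBase is a made-up name following the same pattern as real_base above):

/* The class becomes a template with an unused parameter */
template<int N> struct FooBase
{
    FooBase(int x) : m_value(x) {}
    int m_value;
};
typedef FooBase<0> Foo;

/* Plain function: implicit conversion is performed, f(42) compiles */
void f(Foo const &foo) {}

/* Template plus specialisation: template argument deduction fails for
 * g(42), so only g(Foo(42)) compiles */
template<int N> void g(FooBase<N> const &foo);
template<> void g(Foo const &foo) {}

int main()
{
    f(42);        /* OK: 42 is implicitly converted to Foo */
    g(Foo(42));   /* OK: explicit construction */
    /* g(42); */  /* error: no matching function for call to g(int) */
    return 0;
}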

C/C++ trick: static string hash generation

I am always interested in having the compiler do more things for me, without sacrificing code clarity or performance for the convenience. Today a colleague linked me to Pope Kim's Compile-Time Hash String Generation article, which is a perfect example of the things I like: hidden syntactic sugar that does useful things.

Inline hash function

The goal: for a given hash function, write something like HASH_STRING("funny_bone") in the code, and have the compiler directly replace it with the result, 0xf1c6fd7f.

The solution: inline the function and hope that the compiler will be clever enough.

#include <string.h>
#define HASH(str) generateHash(str, strlen(str))

inline unsigned int generateHash(const char *string, size_t len)
{
    unsigned int hash = 0;
    for(size_t i = 0; i < len; ++i)
        hash = 65599 * hash + string[i];
    return hash ^ (hash >> 16);
}

Unfortunately Pope ran into several very problematic issues:

  • requires heavy optimisation flags (/O2 with Visual Studio, -O3 with g++)
  • limited to 10-character strings with Visual Studio
  • limited to 17-character strings with g++

I could personally reproduce the g++ limitations. I believe they are more related to loop unrolling limits than to the actual string size, but they indeed make the technique unusable in practice.

Macro-based hash function

If you read my previous article about C/C++ preprocessor LUT generation, you may remember that it used preprocessor tricks to do loop unrolling.

Hence the following implementation:

#include <string.h>
#include <stdint.h>
#include <stdio.h>

#define H1(s,i,x)   (x*65599u+(uint8_t)s[(i)<strlen(s)?strlen(s)-1-(i):strlen(s)])
#define H4(s,i,x)   H1(s,i,H1(s,i+1,H1(s,i+2,H1(s,i+3,x))))
#define H16(s,i,x)  H4(s,i,H4(s,i+4,H4(s,i+8,H4(s,i+12,x))))
#define H64(s,i,x)  H16(s,i,H16(s,i+16,H16(s,i+32,H16(s,i+48,x))))
#define H256(s,i,x) H64(s,i,H64(s,i+64,H64(s,i+128,H64(s,i+192,x))))

#define HASH(s)    ((uint32_t)(H256(s,0,0)^(H256(s,0,0)>>16)))

It has the following properties:

  • works in C in addition to C++
  • strings are always optimised away by gcc or g++ (but not always the computation itself)
  • hash computation is optimised away by gcc or g++ even with -O, -O1 or -Os
  • string size limit is 256 characters (probably more than enough for most uses) and can be manually increased or decreased

The following code:

int main(void)
{
    printf("%08x\n", HASH("funny_bone"));
    printf("%08x\n", HASH("incredibly_large_string_that_gcc_groks_easily"));
}

Is (correctly) optimised to this with gcc -Os:

  ...
  movl    $-238617217, %esi
  movl    $.LC0, %edi
  xorl    %eax, %eax
  call    printf
  movl    $-453669173, %esi
  movl    $.LC0, %edi
  xorl    %eax, %eax
  call    printf
  ...

I haven't tested it with Visual Studio. Feedback from that compiler would be much appreciated!