The thorny issue of naming conventions (part 2): scope and type hints

This is the second part of this series of posts on naming conventions in the source code. In the previous post I discussed the use of small and capital letters in type and variable names, and today I’m going to write about the convenience (or inconvenience) of using prefixes to indicate the scope and the type of variables. Since I started the Nubaria Software project, the languages I’ve had to use on a regular basis are C++, PHP and JavaScript, so this discussion, like the previous one, is mainly based on my experience with these languages. In any case, many of these ideas about naming conventions are equally valid for other programming languages with a similar C-like syntax, like C itself, Java and C#.

I will first discuss the controversy surrounding the Hungarian notation, which consists in using prefixes that indicate the type of the variables. This style of notation has often been frowned upon in C and C++ circles, but I think that it is useful in weakly-typed languages like JavaScript. A closely-related notation practice consists in using prefixes that indicate the scope of variables, like an initial ‘m’ for class member variables. As I will try to argue, that is a convention that can make the code more readable, so it’s part of our in-house naming rules.

1. Hungarian notation for types

Hungarian notation must be one of the most derided naming conventions in the history of programming. It takes its name from the nationality of Charles Simonyi, the Microsoft developer who recommended using prefixes in variable and function names to identify the way those values or operations were used. Simonyi’s original article is actually quite good; Hungarian owes its bad reputation to the way the system was later abused by other developers at Microsoft, who began to name all DWORD variables dwWhatever, all WORD variables wWhatever, strings lpszWhatever (long pointer to a zero-terminated string), and so on. Since virtually anything can be stored in a DWORD, the ‘dw’ prefix does not help at all, and starting all string names with ‘lpsz’ is just a nuisance too, at least in typed languages like C, where the compiler can detect invalid operations or assignments from the declared type of the variable.

Things got even worse with the Win32 API, which adopted a 32-bit memory model, put to rest concepts like near and far pointers, and reduced the use of 16-bit variables. This led to inconsistencies such as the wParam parameter of window procedures turning into a 32-bit quantity while keeping its original name, as it would have been a hassle to search and replace every occurrence of wParam (and to change all the documentation on Windows messages, where the two message-specific parameters are always referred to as wParam and lParam). Microsoft patched this up somewhat by adding two new typedefs, WPARAM and LPARAM, but the whole thing seems very messy. As the Windows code evolved, such inconsistencies and the evident uselessness of prefixes like ‘dw’ or ‘lpsz’ made Windows programmers very critical of this naming convention. The classic book Win32 Programming by Brent E. Rector and Joseph M. Newcomer includes a very negative view of Hungarian notation, and today anyone defending some flavour of Hungarian notation is in for a heated and passionate debate.

However, the abuse of Hungarian notation in the Win32 API doesn’t mean that it can be dismissed out of hand as a bad idea. First, we need to keep in mind that when it was devised, programmers were writing C code in plain text editors, and it is likely that a lot of global variables were in use and that compilation times were long. With modern compilers and IDEs, as well as coding practices that encourage encapsulation, tracking down the declaration of a variable is much simpler than it was in the past. Secondly, it must be pointed out that Simonyi advocated the use of prefixes to make some subtler distinctions than the types recognised by the C compiler. A brilliant defence of this approach has been written by Joel Spolsky in his blog. A reply to it can be found in this article in the Lambda the Ultimate blog.
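To give a rough idea of the kind of distinction meant here, the following C++ sketch adapts the safe/unsafe string example from Joel Spolsky’s article: the ‘us’ prefix marks a string that comes straight from the user, and the ‘s’ prefix marks one that has been encoded for output. The functions EncodeHtml and Greeting are made up for this illustration; the point is that the compiler sees both kinds of value as plain std::string, so only the names carry the distinction.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: turns an unsafe (raw) string into a safe one
// by escaping the characters that are special in HTML.
std::string EncodeHtml(const std::string& usText)
{
    std::string sText;
    for (std::string::size_type i = 0; i < usText.size(); ++i)
    {
        switch (usText[i])
        {
            case '&': sText += "&amp;";  break;
            case '<': sText += "&lt;";   break;
            case '>': sText += "&gt;";   break;
            case '"': sText += "&quot;"; break;
            default:  sText += usText[i]; break;
        }
    }
    // Postcondition: no raw angle brackets survive the encoding.
    assert(sText.find('<') == std::string::npos);
    assert(sText.find('>') == std::string::npos);
    return sText;
}

// Hypothetical caller: 'us' on the right of an assignment to 's'
// without a call to EncodeHtml would look wrong at a glance.
std::string Greeting(const std::string& usName)
{
    const std::string sName = EncodeHtml(usName);
    return "Hello, " + sName;
}
```

Both prefixes describe the same declared type, which is exactly the kind of information that the compiler cannot check in this form.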

I think Hungarian notation in its crudest form, which replicates the declared type in the name of the variable (sometimes referred to as ‘Systems Hungarian’, see the Wikipedia article), is unnecessary in typed languages (at least with modern compilers and IDEs), and I prefer the simplicity of readable variable names like previousName and newName. Even the more sophisticated version (‘Apps Hungarian’), which uses prefixes to add a sort of manual extension to the type system, is unnecessary in such languages. If we use integers to store colour values we may add a typedef so that all colour values are declared as ColourValue and not as plain integers. Similarly, buffer sizes are traditionally declared as size_t in C and C++ rather than as plain int or unsigned int.
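As a minimal sketch of the typedef approach, assuming that a colour is packed as 0xRRGGBB in a 32-bit integer (the alias name ColourValue and the two helper functions are illustrative, not taken from any real code base):

```cpp
#include <cassert>
#include <cstddef>   // std::size_t
#include <cstdint>   // std::uint32_t

// Hypothetical alias: the declaration itself says "this is a colour".
typedef std::uint32_t ColourValue;

// Extracts the red channel from a colour packed as 0xRRGGBB.
inline unsigned RedChannel(ColourValue colour)
{
    return (colour >> 16) & 0xFFu;
}

// The buffer size is declared as std::size_t rather than plain int.
inline void FillWithColour(ColourValue* buffer, std::size_t bufferSize,
                           ColourValue colour)
{
    assert(buffer != nullptr || bufferSize == 0);
    for (std::size_t i = 0; i < bufferSize; ++i)
        buffer[i] = colour;
}
```

Note that a typedef is only an alias, so the compiler will still accept a plain integer where a ColourValue is expected; what it buys is documentation at the point of declaration, without any prefix in the variable names.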

However, I think Hungarian notation is indeed useful in weakly-typed languages like JavaScript. This is also argued very convincingly by Ian Alderson, of Caplin Systems, in a blog post that describes the JavaScript variable naming convention that they follow at Caplin. JavaScript was originally designed for small scripts, but nowadays it is also used for very complex web applications. In a large-scale JavaScript project it is essential to supplement the raw syntax of the language with some sort of type safety, even if this is just in the simple guise of a naming convention, as this can go a long way in making it easier to know what variables stand for and to debug and find programming errors.

Based on this reasoning, here at Nubaria Software we use Hungarian notation in JavaScript and avoid it altogether in C++ code. There is, however, a related use of prefixes to identify scope that we use in C++ too. I will explain this in the next section.

2. Prefixes for scope

A convention somewhat related to Hungarian notation is the use of prefixes to indicate scope, like ‘m’ or ‘_’ for member variables. Unlike Hungarian notation for types, I think this habit is actually a very good practice in C++ code. Basically, if I am reading some code written by another member of the team (or by myself a long time ago) and I am in the middle of a loop, it is easier for me to understand what’s going on if the names of the variables indicate whether they have local scope, class scope or global scope. Even if we keep global and static member variables to a minimum, in my experience such a practice is useful. In fact, I now find it hard to read code that doesn’t differentiate between local variables and member variables, which seems to back my view that such a visual separation of scope is indeed a good practice. This has been expressed very convincingly by Matt Arnold in a post in the comp.lang.c++ Usenet newsgroup.

There are of course other opinions. This blog post by Peter Ritchie argues against the need for such prefixes (although he’s talking about C#, and he accepts that they are useful for C++ initialiser lists, where you can’t use the this pointer). Another argument in favour of scope prefixes is that members appear together in the visual environment tools provided by the IDE. Dave Donaldson explains this in his blog. It is a valid argument, although it has more to do with IDE capabilities than with the sheer readability of the code, which is, in my opinion, the main issue here. Another article that argues the case for a member variable prefix, in this case for C#, is Prefix for Class Members by Shahar Y, which has some interesting remarks and alternative views in the Comments section.

Matt Arnold in that comp.lang.c++ post also argues against a related practice, which consists in explicitly referring to member variables using the this pointer, as in the following example:

void Foo::SetColor(ColorType color)
{
    this->color = color;
}

The problem with this is that such a use of the this pointer is not enforced by the compiler, and we can always leave it out if there is no clash with other local variables. I don’t think it is a good idea to impose a coding style that demands the programmer’s attention every time a member is used, rather than just once, when the member is named in the class definition. And if we only type the this pointer when there is a name clash, then we lose the visual feedback of scope. As the void Nish(char* szBlog); blog puts it, ‘the onus is on the coder to remember to do this – there’s nothing stopping him from forgetting to prefix this for member fields’. I completely agree.
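The difference between the two styles can be sketched as follows. The class names UnprefixedFoo and PrefixedFoo, the Reset and GetColor methods, and the ColorType typedef are all invented for this comparison:

```cpp
#include <cassert>

typedef unsigned int ColorType;  // hypothetical stand-in for a real colour type

// Style 1: no prefix, members are qualified with this-> by convention only.
class UnprefixedFoo
{
public:
    UnprefixedFoo() : color(0) {}

    // Here this-> is needed because the parameter hides the member.
    void SetColor(ColorType color) { this->color = color; }

    // Here this-> has been left out and the compiler does not complain,
    // so the reader gets no hint that 'color' is a member.
    void Reset() { color = 0; }

    ColorType GetColor() const { return this->color; }

private:
    ColorType color;
};

// Style 2: the scope is part of the name, fixed once in the class definition.
class PrefixedFoo
{
public:
    PrefixedFoo() : mColor(0) {}

    void SetColor(ColorType color) { mColor = color; }
    void Reset() { mColor = 0; }
    ColorType GetColor() const { return mColor; }

private:
    ColorType mColor;
};

// Both classes behave identically; only the readability differs.
inline void CheckBothStyles()
{
    UnprefixedFoo a;
    PrefixedFoo b;
    a.SetColor(5);
    b.SetColor(5);
    assert(a.GetColor() == b.GetColor());
}
```

In the second class it is impossible to forget the scope marker, because a name like color simply does not refer to the member.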

So, once we decide to use a prefix to indicate class members, which one should we use? Some people use a plain underscore, so that a class would have members like _name and _value. My personal preference is for an initial ‘m’: mName and mValue in camelCase fashion. There are two reasons why I think a plain underscore is not a good idea.

First, the C++ standard says that names containing a double underscore or beginning with one underscore followed by an uppercase letter are reserved for the language implementation for any use, while any names that start with an underscore are reserved for the implementation for names in the global namespace (see Global names in the freely-available February 2011 draft of the C++11 standard). This means that by choosing names that start with an underscore we’re treading perilously close to the limits of legal names. A new programmer might get the convention wrong and type _Value, which is a reserved identifier according to the C++ standard. Because of that, I think it is better to leave the use of initial underscores to the implementation of the standard libraries.

The second reason for my preference for the ‘m’ prefix is that this style can be extended naturally to other scopes. Static member variables of a class are shared by all instances and live for the whole run of the program, much like globals, so they should be labelled in a different way: we can use ‘s’ as a prefix. This use of ‘s’ is common in the C++ literature (for example, it is used in the lesson on static member variables). In this way, mValue would be a non-static member variable of a class, whereas sValue would be a static member variable. Similarly, global variables can be prefixed by ‘g’. In C++ projects, it should be possible to get rid of them completely, so the presence of one should indicate some temporary code (like a test or a quick hack to patch a severe bug).

The fourth prefix that we use is ‘k’ for constants. Note that in C++ there is no need to define constants as macros (as in C #define PI 3.1416), and you can define such constants as global const objects: const double kPi = 3.1416;. Some C++ programmers adopt the use of C++ constants but keep the macro-style names. I think it is better to treat them as any other variable. The ‘k’ prefix is good enough to identify them as constants. We could use ‘c’, but since ‘c’ is used in some styles of Hungarian notation for classes or characters, it is better to use a different letter. The use of ‘k’, perhaps of German origin, is part of the naming conventions of Apple and Google, so it is quite widespread and recognisable.
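Putting the four prefixes together, a small class might look like the sketch below. Only kPi comes from the text above; the Circle class, its methods and the gAreaCalls counter are hypothetical names chosen for the example.

```cpp
#include <cassert>

const double kPi = 3.1416;   // 'k': a constant, defined as a const object

int gAreaCalls = 0;          // 'g': a global, the kind of temporary hack
                             // that should disappear before release

class Circle
{
public:
    explicit Circle(double radius) : mRadius(radius)
    {
        assert(radius >= 0.0);
        ++sInstanceCount;
    }

    double Area() const
    {
        ++gAreaCalls;                    // clearly not a member
        return kPi * mRadius * mRadius;  // constant times member
    }

    static int InstanceCount() { return sInstanceCount; }

private:
    double mRadius;             // 'm': one copy per object
    static int sInstanceCount;  // 's': one copy shared by all objects
};

int Circle::sInstanceCount = 0;
```

Inside Area, each name says where the value lives without any need to scroll back to a declaration.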

The values in enumerated types in C++ can be regarded as constants, but since they are declared in a special way, we’ve chosen the ‘e’ prefix for those values, so that they can be readily identified as part of an enumeration. So, in C++ we write enum Season {eSpring, eSummer, eAutumn, eWinter};.
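A short sketch of how the ‘e’ prefix reads at the point of use; the SeasonName function is a made-up helper:

```cpp
#include <cassert>
#include <string>

enum Season {eSpring, eSummer, eAutumn, eWinter};

// Hypothetical helper: maps each enumerated value to a printable name.
std::string SeasonName(Season season)
{
    switch (season)
    {
        case eSpring: return "spring";
        case eSummer: return "summer";
        case eAutumn: return "autumn";
        case eWinter: return "winter";
    }
    assert(false);  // not reached for valid Season values
    return std::string();
}
```

In the case labels the prefix makes it obvious that the names are enumerators rather than variables or kPi-style constants.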

3. References and further reading

  1. Hungarian notation in Wikipedia.
  2. Hungarian Notation by Charles Simonyi. A 1999 reprint of the original internal Microsoft article.
  3. Win32 Programming, by Brent E. Rector and Joseph M. Newcomer.
  4. Making Wrong Code Look Wrong, an article by Joel Spolsky.
  5. Hungarian Notation vs The Right Thing. A reply to the previous article in the Lambda the Ultimate blog.
  6. Javascript Variable Naming Convention. A very good article on the use of Hungarian notation in JavaScript at Caplin Systems.
  7. The Case against Hungarian Notation in Javascript. A blog post by Nicholas Zakas, the author of several books on JavaScript, who has changed his mind regarding Hungarian notation.
  8. Underscore-prepended Variable Names. A discussion about initial underscores on the Usenet comp.std.c++ group.
  9. Google C++ Style Guide.
  10. The Religion of Class Member Prefixing. An article on Peter Ritchie’s MVP Blog.
  11. Coding Guideline: DO Use an Underscore for Class Member Variables. An article by Dave Donaldson.
  12. Prefix for Class Members. An article by Shahar Y in favour of using prefixes for class member variables.
  13. Using underscore to prefix private member fields. An article in the void Nish(char* szBlog); blog.
  14. Hungarian Notation. An article in an MSDN blog.
