The thorny issue of naming conventions (part 1): lowercase v. uppercase

It’s been a long time since my previous post. As I should have expected, I’ve found that it’s actually pretty hard to maintain a blog and I’ve been busier than usual during the last few months. I hope I can now resume my blogging activity and continue posting articles more regularly.

Different naming schemes

Today’s post is the first in a two-part series on naming standards. When I started my current professional project, Nubaria Software, I reflected on which naming conventions to follow, and adopted some guidelines that I now try to apply in all the code I write, which is mostly C++, PHP and JavaScript these days. These naming guidelines have become part of our in-house coding standards. In this article and the next one I’ll try to sum up the rationale behind these naming conventions and the pros and cons of the alternative approaches I’ve also used or considered in the past.

This first post addresses the use of lower and upper case in names. In the next post, I will discuss the use of prefixes to indicate types and scope.

1. Introduction

Few issues can raise such strong opinions among programmers as the apparently harmless question of how one should name functions, classes and variables. On the web one can find many heated debates about whether we should have class_names, classNames or ClassNames, or whether class member variables should be ‘m_members’, ‘mMembers’, ‘_members’ or plain ‘members’. Even though such debates can become very passionate, most people will agree that in the end it’s just a matter of personal preference, and that the most important thing is consistency. If you decide that your classes will have lowercase_names_with_underscores, then stick to that decision and don’t mix such names with PascalCaseNames or camelCaseNames. Consistency can save a lot of time and prevent errors when writing code (‘did that variable start with a capital A?’) or when searching the existing code (‘was that function “ReleaseResource” or “release_resource”?’).

2. The big debate: lowercase v. uppercase

The main naming issue when writing source code is how to mark word separation, since programming languages don’t allow spaces in identifiers such as the names of variables and types. A closely related issue is when to use capital letters. A common programming style consists in using lowercase identifiers, typically separated by underscores (as in ‘read_file’). Another very common style relies on capital letters to delimit the different words in an identifier (as in ‘ReadFile’). This second style is sometimes called ‘intercapitalisation’, ‘mixed case’ or ‘CamelCase’. The C and C++ languages have always favoured the first approach, while Microsoft have always used capital letters for the functions and classes in their various Windows libraries. Other well-known libraries, like OpenGL, also use capital letters to delimit the words in the names of functions.

Note that there are two possible styles of intercapitalisation. The first, often referred to as ‘PascalCase’ (because of its origins in the Pascal language, which doesn’t allow underscores in identifiers), has an initial capital letter, as in ReadFile. The name ‘camelCase’ is sometimes reserved for the second style, which uses capital letters to delimit words but not for the initial letter, as in readFile. In the software projects I have worked on in the past, it was common to use camelCase for variables and PascalCase for type names, so that we would have declarations like ‘ClassName instanceName;’. As I explain below, this is the style I still follow at present.
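To make the three styles concrete, here is a minimal C++11 sketch that converts a lowercase name with underscores into its PascalCase and camelCase equivalents. The helper names ToPascalCase and ToCamelCase are hypothetical, invented for this illustration, and the code assumes plain ASCII identifiers.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: turns "read_file" into "ReadFile" (PascalCase).
// Assumes an ASCII identifier whose words are separated by underscores.
std::string ToPascalCase(const std::string& lowercaseName)
{
    std::string result;
    bool startOfWord = true;
    for (char c : lowercaseName)
    {
        if (c == '_')
        {
            // The underscore is dropped; the next letter starts a new word.
            startOfWord = true;
        }
        else if (startOfWord)
        {
            result += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
            startOfWord = false;
        }
        else
        {
            result += c;
        }
    }
    return result;
}

// Hypothetical helper: turns "read_file" into "readFile" (camelCase),
// that is, PascalCase with a lowercase initial.
std::string ToCamelCase(const std::string& lowercaseName)
{
    std::string result = ToPascalCase(lowercaseName);
    if (!result.empty())
    {
        result[0] = static_cast<char>(std::tolower(static_cast<unsigned char>(result[0])));
    }
    return result;
}
```

With these helpers, ToPascalCase("read_file") gives ‘ReadFile’ and ToCamelCase("read_file") gives ‘readFile’.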

The main debate about the use of capitalisation is whether one form has better readability than the other. A lot of people, including myself, feel that text_separated_by_underscores is actually more easily read than TextSeparatedByCapitalisation. However, it has also been pointed out that the boundaries of types and variable names are clearer with the compact name format produced by capitalisation. For example, in the following expression

LargeIntType newBigNumber = bigNumber2 - bigNumber1;

it is clear that there is a declaration of a new variable which is initialised with the result of a subtraction. On the other hand, the corresponding lowercase expression

large_int_type new_big_number = big_number_2 - big_number_1;

can be misread as if it contains more than one subtraction because the underscores and the subtraction sign can be mixed up, and the limits of the variable names are not so clear. In this regard, the fact that underscores look like spaces can also be seen as a drawback when reading expressions such as statements and declarations. This again is very subjective, and some people will say that the second expression is more readable.

A possible argument in favour of lowercase for C and C++ code is that in these languages, such use is in line with the standard libraries, where we only find lowercase names such as basic_string or strcat. But note that this argument works both ways. It can also be argued that if I come across a name like ReadFile, it will be clear that it is not a C++ standard name, whereas read_file could be a standard class or function (even though in C++ the std namespace helps to tell apart what is standard from what is not). And in any case, a lot of books on C++ actually use intercapitalisation quite often. In fact, PascalCase names for classes are probably much more common in recent books on C++, and it is very difficult to find C++ code that uses only lowercase and underscores outside the standard library and the Boost library.

Because of my past experience, I am slightly biased towards intercapitalisation, but I have tried to think about the whole thing once again from scratch and read some of the discussions on the web. My impression is that the case for lowercase and underscores is based on the false premise that reading code is like reading prose. I think most people agree that if we modify one paragraph of a novel in two alternative ways, first by joining all the words with underscores and then by removing the spaces and capitalising the initial of each word, the first form will have much better readability, which makes sense since an underscore looks like an underlined space after all. However, I feel that the LargeIntType v. large_int_type example above shows that reading programming statements is different from reading plain text because we need to identify discrete units such as the class name or the variable name, which look much more compact when written in camelCase or PascalCase. This is probably the psychological reason why I feel that a lot of underscores make source code ugly. And it seems that a lot of people feel the same, since names with underscores, which used to prevail in older C and C++ programming and are still used in the C++ standard library, have become much less common during the last two decades. Most code in languages like Java and C# uses intercapitalisation, and many modern books on C++ do so too. Even Brian Kernighan, co-author with the late Dennis Ritchie of The C Programming Language, uses intercapitalisation in his 1999 book The Practice of Programming.

There are, however, two good arguments in favour of lowercase. One is the problem with acronyms, which can get messy when using mixed case. Should we write HtmlUrlEncoder or HTMLURLEncoder? Such spelling hesitation does not affect the lowercase option, where it can only be html_url_encoder. Another reason to favour lowercase names is the consequence that the names of source files will also be lowercase. For example, if a header file defines a class called text_file_reader, the file will be called text_file_reader.hpp. But what if the class is called TextFileReader? The more obvious name is TextFileReader.hpp, but capital letters in file names can be error-prone if we program in a system where file names are case-insensitive like Windows and then port the code to a system like Linux, where file names are case-sensitive and a line like ‘#include <textfilereader.hpp>’ would yield an error. So, basically the presence of capital letters in class names means that we have to choose between accepting capital letters in file names too or naming files differently from the classes.
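The acronym problem can be illustrated with a small C++11 sketch. The function below, ToLowercaseWithUnderscores, is a hypothetical helper written for this example: it naively starts a new word at every capital letter, which works for one of the two spellings and falls apart for the other.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: converts a mixed-case name to lowercase with
// underscores by starting a new word at every capital letter.
// Assumes an ASCII identifier.
std::string ToLowercaseWithUnderscores(const std::string& mixedCaseName)
{
    std::string result;
    for (char c : mixedCaseName)
    {
        const unsigned char u = static_cast<unsigned char>(c);
        if (std::isupper(u))
        {
            // Every capital is taken as a word boundary (except at the start).
            if (!result.empty())
            {
                result += '_';
            }
            result += static_cast<char>(std::tolower(u));
        }
        else
        {
            result += c;
        }
    }
    return result;
}
```

With the acronyms spelt as ordinary words, ‘HtmlUrlEncoder’ maps back cleanly to ‘html_url_encoder’. With fully capitalised acronyms, ‘HTMLURLEncoder’ becomes ‘h_t_m_l_u_r_l_encoder’, because the capitals alone no longer say where one word ends and the next begins.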

Last year, while I was pondering these issues, the readability factor and the two reasons I mention in the previous paragraph nearly made me reconsider my original position, and I seriously contemplated ditching my coding habits of the previous four years and going for the ‘text_file_reader a_file_reader;’ syntax. However, I finally decided to stick to the camelCase and PascalCase style that I was used to because:

  1. As I have argued above, I think the readability argument in favour of underscores is flawed. In a line of code, spaces mark the separation between different elements and tokens and I think compact names are clearer.
  2. The trend during the last two decades or so seems to be in favour of intercapitalisation. The fact that this general trend matches my aesthetic gut feeling may indicate that there is some Darwinian process (maybe related to point 1) favouring this.
  3. I want to adopt a consistent style for the various programming languages I use. Right now, these languages are C++, JavaScript and PHP, but I don’t rule out writing some code in C# or Java in the future. While lowercase identifiers are common in C/C++ and PHP, the other languages have clearly joined the trend for mixed case, so most people reading my code or collaborating with me are likely to feel more comfortable with this style. Using a common style for different languages also makes it easier to translate some code from one language to another (like from C++ to C#), as I will just have to modify the language-dependent syntax and worry less about refactoring variable names.
  4. Intercapitalisation is also the naming style I am used to, so it will definitely be easier and less error-prone for me to continue using that style rather than readjusting my habits.

So my final decision was to use mixed case in C, C++, JavaScript and PHP, and also for CSS classes and identifiers. The only exception is C and C++ macros, where I use fully capitalised names, which is one of the few universal naming conventions in C/C++. The fact that such capitalised names are ugly is actually a good thing because the use of macros in C++ should be kept to a minimum (mainly conditional compilation and header file guards), so it is good if the macros clearly stand out.
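As a rough sketch of the two macro uses mentioned above, the following header combines a header guard with a piece of conditional compilation. The guard name NUBARIA_PATHUTILS_HPP, the namespace PathUtils and the function GetPathSeparator are hypothetical names chosen for this example; _WIN32 is the macro that Windows compilers predefine.

```cpp
// Header guard: a fully capitalised macro name (hypothetical).
#ifndef NUBARIA_PATHUTILS_HPP
#define NUBARIA_PATHUTILS_HPP

namespace PathUtils
{
    // Returns the directory separator of the platform we are compiling for.
    inline char GetPathSeparator()
    {
// Conditional compilation: only one of the two branches is compiled.
#ifdef _WIN32
        return '\\';
#else
        return '/';
#endif
    }
}

#endif // NUBARIA_PATHUTILS_HPP
```

The only capitalised names are the macros, so they are easy to spot among the mixed-case identifiers of the actual code.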

Regarding the two issues about acronyms and file names, I have decided to treat acronyms as full words (‘HtmlParser’, ‘XmlNode’) and I keep file names completely lowercase to avoid nasty issues when moving files from a Windows server to a Linux server, so in my naming conventions the header file that defines a class called ‘FileReader’ is ‘filereader.hpp’.
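The file naming rule is simple enough to be written down as code. The C++11 function below, GetHeaderFileName, is a hypothetical helper (it is not part of any library) that derives the header file name from a PascalCase class name, assuming ASCII names and the ‘.hpp’ extension used in the examples above.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: the header that defines class 'FileReader'
// is called 'filereader.hpp' (all lowercase, no separators).
std::string GetHeaderFileName(const std::string& className)
{
    std::string fileName;
    for (char c : className)
    {
        fileName += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return fileName + ".hpp";
}
```

Because acronyms are treated as full words, a class like ‘HtmlParser’ simply ends up in ‘htmlparser.hpp’, and the same include line works on both case-insensitive and case-sensitive file systems.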

As for the decision whether to use lower camelCase or PascalCase, I’ve decided to take advantage of both possibilities: camelCase for variables and constants, and PascalCase for namespaces, types and functions. The next section explains the details.

3. What distinctions to make?

The common use of various styles of capitalisation and word separation can be exploited to differentiate between different kinds of entities. For example, one can use a different naming convention for a type and a variable, or a private member and a public member. But making too many distinctions through naming can be messy and confusing. Just imagine having different naming rules for public, protected and private methods, classes, enums and typedefs, local variables and parameters, and so on. I prefer to adhere to a simple style and not allow too many naming schemes. In fact, such differences can be left to the colouring and highlighting styles of the IDE, so even if we went for the all_lowercase style, like the C++ standard library, it is unlikely that there would be many errors due to mixing up variables with types, or classes with functions or whatever.

In fact, even if we decided to adopt the same naming style for namespaces, types, functions, variables and constants, all these categories are very different and their use within the code should make it clear which is which. For example, I’m used to typing ‘Dog dog;’ to instantiate a dog object, where I rely on the capitalisation of the initial to tell the type and the variable apart. But the word order also makes it clear that the first one must be the type and the second one must be the variable. A possible advantage of using the same naming style for both is that it would encourage coming up with different names for the variables, so that one would have to type ‘dog the_dog;’. This would make a clearer distinction between the class name and the instance. There’s another good argument against ‘Dog dog;’, mentioned by Patrick Doyle in a post in a classic discussion thread about naming conventions in the Usenet newsgroup comp.object, which is that the use of capitalisation in such declarations runs counter to English-language use, where it is the name of the individual dog which would be capitalised rather than the common name of its species. In that same discussion, Alf P. Steinbach argues in a post in favour of making just function names stand out. However, I would find it very inconsistent to have lowercase for variables and types and then uppercase for function names. Unless you use function names as function pointer parameters to other functions, something that I always avoid in C++, function names stand out because of their use. Others, like Dave Harris in another post in the same thread, argue that it is class names which should stand out.

All these arguments and counterarguments show how difficult it is to reach any clear conclusion as to which naming scheme is better. In the end, I decided to accept the ‘Dog dog;’ syntax, and this is how I now declare my local variables (or ‘Dog theDogThatGuardsTheHouse;’, if I need to be more explicit about the intent of the variable). My rationale for this decision is that I am used to having different naming styles for types (typedefs, structs, classes, enums) and variables (including constants), and I find that such a difference can help me understand the code better, like for example spotting that there are only two local variables within a function definition. Similarly, I also stick to the nearly universal practice of using fully capitalised names for C macros.

To sum up, the current naming convention we follow at Nubaria Software comprises the following three cases:

  1. Use camelCase for variables and constants.
  2. Use PascalCase for namespaces, types (including template parameters in C++), and functions.
  3. Use fully-capitalised names for macros in C and C++.
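The three cases can be seen together in a short C++ sketch. All the names here (KENNEL_HPP, Kennel, Dog, GetName, HasRoomFor, maxDogs) are made up for the illustration, and the member variable is left without any prefix, which is only an assumption for this example since prefixes are the subject of the next post.

```cpp
// Case 3: fully capitalised macro, used here as a header guard.
#ifndef KENNEL_HPP
#define KENNEL_HPP

#include <cstddef>
#include <string>
#include <vector>

namespace Kennel   // Case 2: PascalCase namespace
{
    const std::size_t maxDogs = 3;   // Case 1: camelCase constant

    class Dog   // Case 2: PascalCase type
    {
    public:
        explicit Dog(const std::string& dogName) : name(dogName) {}

        // Case 2: PascalCase function
        const std::string& GetName() const { return name; }

    private:
        std::string name;   // Case 1: camelCase variable
    };

    // Case 2: PascalCase template parameter and function
    template <typename ItemType>
    bool HasRoomFor(const std::vector<ItemType>& items)
    {
        return items.size() < maxDogs;
    }
}

#endif // KENNEL_HPP
```

A declaration such as ‘Kennel::Dog dog("Rex");’ then follows the ‘Dog dog;’ pattern discussed above, with the capitalised initial marking the type and the lowercase initial marking the variable.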

Feel free to use the Comments section below if you want to point out any other interesting arguments in favour of a particular naming practice regarding the use of capitalisation. In the next post I will discuss the use of prefixes to indicate type and scope.

4. References and further reading

  1. Naming Convention in Wikipedia.
  2. Variable Naming Conventions. A thread in the comp.lang.c newsgroup.
  3. ISO Studies of underscores vs MixedCase in Ada or C++. A thread in the comp.programming newsgroup.
  4. RANT: Stop Indicating Type in Variable Name Please. A thread in the comp.object newsgroup.
  5. New Coding Standards. A thread in the comp.lang.misc newsgroup.
  6. C++ Variable Names Poll. A question about camel case v. underscores. Most replies go for camel case.
  7. Underscore or camelcase? A Stack Overflow discussion with a lot of comments.
  8. Poll Results: Hyphens, Underscores or camelCase?. A poll with comments about style preferences in CSS.