Setting up a central Git repository on a Windows server

In one of the first posts I wrote when I started this blog I tried to give a general overview of Git, which has quickly become probably the most successful version control system in current use, especially in new and open-source projects. I’ve been a regular Git user ever since I wrote that post, and recently I found myself having to set up a Git repository for a client on a computer running Windows Server 2003, which I would have to access through the Internet. What I initially thought would be a pretty straightforward task turned out to be much harder than I expected. To begin with, Git repositories with remote access are mostly set up on computers running Linux, where things are considerably simpler, because Linux offers much better support for the SSH protocol that Git typically uses for remote access. When one searches the web for information, it’s hard to find good documentation about how to set Git up for remote access on Windows, and the resources I did find were often not completely reliable. Spurred by the challenge and all the troubles I bumped into, I’ve decided to write down a quick guide to all the steps for my future reference and share it here in case others find it useful too. If you find anything inaccurate or just feel like commenting or sharing any additional information on this topic, please don’t hesitate to use the comments area below.

Posted in Git, Version control software, Windows | 22 Comments

The thorny issue of naming conventions (part 2): scope and type hints

This is the second part of this series of posts on naming conventions in the source code. In the previous post I discussed the use of small and capital letters in type and variable names, and today I’m going to write about the convenience (or inconvenience) of using prefixes to indicate the scope and the type of variables. Since I started the Nubaria Software project, the languages I’ve had to use on a regular basis are C++, PHP and JavaScript, so this discussion, like the previous one, is mainly based on my experience with these languages. In any case, many of these ideas about naming conventions are equally valid for other programming languages with a similar C-like syntax, like C itself, Java and C#.

I will first discuss the controversy surrounding the Hungarian notation, which consists in using prefixes that indicate the type of the variables. This style of notation has often been frowned upon in C and C++ circles, but I think that it is useful in weakly-typed languages like JavaScript. A closely-related notation practice consists in using prefixes that indicate the scope of variables, like an initial ‘m’ for class member variables. As I will try to argue, that is a convention that can make the code more readable, so it’s part of our in-house naming rules.
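As a rough illustration of the scope prefix mentioned above, here is a minimal C++ sketch. The class `Account` and its members are hypothetical names made up for this example, not code from our projects; the only point is that the initial ‘m’ makes member variables easy to tell apart from parameters and locals.

```cpp
#include <string>

// Hypothetical class, used only to illustrate the 'm' prefix for members.
class Account
{
public:
    explicit Account(const std::string& owner) : mOwner(owner), mBalance(0) {}

    // 'amount' is a parameter, 'mBalance' is a member: the prefix makes
    // the difference visible without looking at the class declaration.
    void deposit(int amount) { mBalance += amount; }

    int balance() const { return mBalance; }
    const std::string& owner() const { return mOwner; }

private:
    std::string mOwner;  // member variables carry the 'm' prefix
    int mBalance;
};
```

With this convention a constructor parameter can be called `owner` without clashing with the member `mOwner`.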

Posted in Coding standards | Leave a comment

The thorny issue of naming conventions (part 1): lowercase v. uppercase

It’s been a long time since my previous post. As I should have expected, I’ve found that it’s actually pretty hard to maintain a blog and I’ve been busier than usual during the last few months. I hope I can now resume my blogging activity and continue posting articles more regularly.

Different naming schemes

Today’s post is the first one in a two-part series on naming standards. When I started my current professional project, Nubaria Software, I reflected on which naming conventions to follow, and adopted some guidelines that I now try to apply in all the code I write, which is mostly C++, PHP and JavaScript these days. These naming guidelines have become part of our in-house coding standards. In this article and the next one I’ll try to sum up the rationale behind these naming conventions and the pros and cons of the alternative approaches I’ve used or considered in the past.

This first post addresses the use of lower and upper case in names. In the next post, I will discuss the use of prefixes to indicate types and scope.

Posted in Coding standards | Leave a comment

Bitcoin. The money of the future?

In today’s post, I’m going to write about a recent software project that has taken the geek world by storm: Bitcoin, the cryptographic cybercurrency. I’m going to discuss the general aspects of Bitcoin today, and I’ll leave some of the technical aspects for a future post. If you haven’t heard about Bitcoin yet, it is a form of electronic currency that can be used for online payments. This video from www.weusecoins.com gives a very basic introduction:

And this is another interesting video on YouTube, where Jerry Brito explains some of the key points about Bitcoin:

Posted in Bitcoin | Leave a comment

A code point iterator adapter for C++ strings in UTF-8

As the last post in this series I’ve been writing on Unicode and UTF-8, I thought I would elaborate on an interesting idea I mentioned in my previous post. When discussing how a std::string object that stores UTF-8 text is just a sequence of raw bytes rather than Unicode code points, I hinted that it wouldn’t be difficult to write a special iterator class for those situations where we may need to traverse the code point values rather than the bytes. In this post I explain how to write such an iterator class.
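To give a rough idea of what traversing code points rather than bytes involves, here is a minimal sketch of the decoding step that such an iterator has to perform. It is not the iterator class developed in the full post: the function names `nextCodePoint` and `codePoints` are hypothetical, and the code assumes that the input is valid UTF-8 and performs no error checking.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decodes the code point that starts at s[pos] and
// advances pos past it. Assumes valid UTF-8; no validation is done.
char32_t nextCodePoint(const std::string& s, std::size_t& pos)
{
    assert(pos < s.size());
    const unsigned char lead = static_cast<unsigned char>(s[pos++]);
    if (lead < 0x80)
        return lead;  // single-byte (ASCII) sequence

    int extra = 0;    // number of continuation bytes that follow
    char32_t cp = 0;  // payload bits collected so far
    if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
    else                            { extra = 3; cp = lead & 0x07; }

    // Each continuation byte contributes its low six bits.
    for (; extra > 0 && pos < s.size(); --extra)
        cp = (cp << 6) | (static_cast<unsigned char>(s[pos++]) & 0x3F);
    return cp;
}

// Hypothetical convenience function: collects all the code points of a
// UTF-8 string.
std::vector<char32_t> codePoints(const std::string& s)
{
    std::vector<char32_t> result;
    std::size_t pos = 0;
    while (pos < s.size())
        result.push_back(nextCodePoint(s, pos));
    return result;
}
```

An iterator adapter would wrap the same logic behind `operator*` and `operator++`, so that the string can be traversed lazily instead of being copied into a vector.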

Posted in C/C++, Character encoding, Unicode | 10 Comments

Using UTF-8 as the internal representation for strings in C and C++ with Visual Studio

In today’s long post, I’m going to explain the guidelines we follow at Retibus Software in order to handle Unicode text in Windows programs written in C and C++ with Microsoft Visual Studio. Our approach is based on using the types char and std::string and imposing the rule that text must always be encoded in UTF-8. Any other encodings or character types are only allowed as temporary variables to interact with other libraries, like the Win32 API.

Note that a lot of books on Windows programming recommend using wide characters for internationalised text, but I find that using char-based strings encoded in UTF-8 as the internal representation is a much more powerful and elegant approach. The reason for this is that it is easier to use char-based functions in standard C and C++. Developers are usually much more familiar with functions like strcpy in C or the C++ std::string class than with the wide-character equivalents wcscpy and std::wstring, and the support for wide characters is not completely consistent in either standard. For example, the standard C++ exception classes, such as std::runtime_error, only accept narrow-string descriptions (std::string or const char*) in their constructors. In addition, using the char and std::string types makes the code much more portable across platforms, as the char type is always, by definition, one byte long, whereas sizeof(wchar_t) is typically 2 or 4 depending on the platform.
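The following sketch illustrates two of the points above: the size of char is fixed by the language, and the standard exception classes take their description as a char-based string, so UTF-8 text passes through them untouched. The function `describeFailure` and the file name used with it are hypothetical, chosen only for this example.

```cpp
#include <stdexcept>
#include <string>

// sizeof(char) is 1 by definition on every platform, whereas
// sizeof(wchar_t) is implementation-defined (typically 2 or 4).
static_assert(sizeof(char) == 1, "guaranteed by the C++ standard");

// Hypothetical function: throws and catches a standard exception whose
// description contains a UTF-8 file name. The bytes are stored and
// returned by what() without any conversion.
std::string describeFailure(const std::string& utf8FileName)
{
    try
    {
        throw std::runtime_error("Cannot open " + utf8FileName);
    }
    catch (const std::exception& e)
    {
        return e.what();
    }
}
```

Calling `describeFailure("caf\xC3\xA9.txt")` should return the message with the two bytes that encode ‘é’ intact. A conversion to wide characters would only be needed at the point where the text is handed over to a library that requires them, such as the Win32 API.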

Even if we are developing a Windows-only application, it is good practice to isolate the Windows-dependent parts as much as possible, and using UTF-8-encoded strings is a solid way of providing full Unicode support with a highly portable and readable coding style.

Posted in C/C++, Character encoding, Unicode | 73 Comments

Character encodings and the beauty of UTF-8

我愛 UTF-8 أحِبّ

In my previous blog post, I discussed what was needed to ensure that a web site uses the UTF-8 character encoding consistently. I thought I should write a post on why I think UTF-8 is superior to any other encoding, and why outdated and limited schemes such as ISO-8859-1 should be ditched once and for all. In my experience, a lot of developers don’t pay much attention to character encoding issues, and there aren’t that many good introductions on the web (see the references below for a couple of good ones). In this post, I will write a primer on the main concepts behind character encoding and the advantages of using UTF-8. Even if it appears very basic to seasoned programmers, I hope this post may be useful for those who feel their understanding of encodings is on shaky ground.

Posted in Character encoding, Unicode | 3 Comments

Thanks for signing up, Mr. González – Welcome back, Mr. GonzÃ¡lez!

In my previous post, I mentioned my frustration with programs that surprisingly fail on Unicode support and encoding issues. I thought I should write a post about this because it never ceases to amaze me how, more than ten years into the 21st century, there’s still a lot of software around that can’t cope with accented letters or non-Latin characters. And this happens quite often on the web too. I experience this constantly because my name has some accented letters and it is displayed incorrectly in a lot of e-mails I receive. The situation exemplified by the title of this post will be very familiar to those who have a name with any of the accents and other diacritics common in most European languages.

These problems with accented letters typically happen because an inconsistency in character encodings creeps in as the information flows through different systems. Just think about the process involved in signing up for a website. In order to register as a user you typically have to fill in an HTML form, which may use some JavaScript, and the information you enter will be sent to a server, where it may be processed by some scripting code and stored in a database. When your user name is retrieved, for example in order to send you a notification by e-mail, it has to be read from a table in a database, and then handled by some scripting code again that will generate the text of the e-mail that finally appears in your web-based inbox as an HTML document. Problems with accented letters and non-Latin characters happen when there is some mismatch of encodings between how the text is written in one place and how it is read somewhere else. For example, the conversion from ‘González’ to ‘GonzÃ¡lez’ in the title of this post is a typical case, which happens when text that was originally encoded in UTF-8 is later misinterpreted as ISO-8859-1: the two bytes that encode ‘á’ in UTF-8 (0xC3 and 0xA1) are read as the two separate characters ‘Ã’ and ‘¡’. This is quite sloppy. In my view, just as scientists in a laboratory should never mix up metric and imperial units, people who write software should devote some time to thinking about these issues and ensure that they consistently stick to one character encoding in the application they’re developing. Unfortunately, there are still a lot of programs and websites that run into such problems, and this is a shame, especially because nearly 20 years have elapsed since the universal encoding UTF-8, one of the encoding forms of Unicode, was first proposed, and all these problems just go away if you use UTF-8 consistently and avoid region-specific encodings.
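The kind of mismatch described above can be reproduced in a few lines of C++. This is only a sketch of what a misconfigured system effectively does, with a hypothetical function name, `latin1ToUtf8`: it treats every input byte as an ISO-8859-1 character, whose code point equals the byte value, and writes that character back out in UTF-8.

```cpp
#include <string>

// Hypothetical function: interprets each byte of the input as an
// ISO-8859-1 character and re-encodes it in UTF-8. Applying it to text
// that is already UTF-8 produces the familiar garbled accents.
std::string latin1ToUtf8(const std::string& bytes)
{
    std::string out;
    for (unsigned char b : bytes)
    {
        if (b < 0x80)
        {
            out += static_cast<char>(b);  // ASCII is unchanged
        }
        else
        {
            // Code points 0x80-0xFF need two bytes in UTF-8.
            out += static_cast<char>(0xC0 | (b >> 6));
            out += static_cast<char>(0x80 | (b & 0x3F));
        }
    }
    return out;
}
```

If the input is the UTF-8 string ‘González’, where ‘á’ is stored as the bytes 0xC3 0xA1, the function expands those two bytes into four (0xC3 0x83 0xC2 0xA1), which a UTF-8 reader then displays as ‘Ã¡’.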

Posted in Character encoding, Unicode, Web development | Leave a comment

Gittin’ the job done: the choice of a version control system

One of the good things about starting a software project from scratch is that you can think about the best way to organise your source code and documentation without the constraints of a system that has already been in place for a long time, which may be flawed or old-fashioned (often both), and which your managers and colleagues refuse to modify. This applies to things such as the way you set up the files and directories, the coding guidelines and, in particular, the decision of which version control system (VCS) to use. In the companies I’ve worked for before, I never had to think about these issues since somebody had already made a decision and it was simply a matter of adapting to whatever the company had chosen. In this post I’m going to discuss the choice of a version control system. If you’re new to version control software, also referred to as ‘revision control software’, or ‘source code management’ (SCM) when used for source code, I recommend you read the Wikipedia articles Revision control and Comparison of revision control software.

Working out which VCS to use is a bit of a daunting task because there are a lot of competing systems. The browser wars may be the ones that get the most headlines, but the fights have been fiercer, and there have been many more victims, on the VCS battleground. After reading a bit about the various systems, I’ve finally decided to use Git, which is quickly becoming the most successful of the new kids on the block. While researching this, I’ve found that the VCS landscape has changed dramatically over the last few years, and three new systems (Git, Mercurial and Bazaar) have largely taken over from the older systems, like CVS and Subversion. The success of these newer systems stems from a fundamental change in paradigm: whereas the older systems follow a centralised approach, where all the versioning history is stored on a server and the clients communicate with that server in order to commit changes or check the history, the newer systems adhere to a distributed approach, where there is no privileged server and all the working copies of a repository are fully-fledged repositories by themselves. The initial gut feeling is that the distributed approach sounds like overkill, but it has actually turned out to be much more powerful.

Git was originally conceived by Linus Torvalds for use in the Linux kernel after the software they were using, BitKeeper, stopped being free. There is a video on YouTube of a talk Torvalds gave at the Google headquarters, in which he explains the advantages of Git.

Posted in Version control software | 5 Comments

Triple OS boot on a Mac Mini: Mac OS X, Windows 7 and Linux

Until recently my only computer was a five-year-old Sony VAIO laptop running Windows XP. Since I intend to develop applications for the Apple operating systems, I knew I’d have to buy a Mac at some point. As far as I know, right now there are no reliable emulators for Mac OS X and iOS that run under Windows, and there are also legal issues about running Apple operating systems on a Windows machine. So, trying to run some sort of emulator on my Windows laptop didn’t sound like a good idea at all. If you want to develop software for any of the Apple platforms, the only reasonable (and legal) option is to get a Mac.

And buying a Mac is a good idea for its own sake. While I’m no Apple fanboy, in my experience Macs are slick and elegant, and I actually felt like buying one. But I didn’t want to spend a lot of money and, knowing that Apple products tend to come with a hefty price tag, I decided to buy the cheapest Mac currently on sale: the Mac Mini.

My Mac Mini

Posted in Mac, Operating systems | 2 Comments