For Windows programmers, Linus Torvalds' work has suddenly become relevant. No, we don't mean Linux, but Git. This distributed source control system now works as sweet as a nut on Windows. We contacted Linus for a second interview; this time to talk mainly about Git, but also to catch up with his thoughts about computer languages.
"Thinking that file locking is
a feature is just a sign that
you haven't been taking
your medication lately."
From 1991 until 2002, the Linux kernel source was passed around as patches or archived files, with no formal source control system, mainly because of the difficulties caused by the wide geographic distribution of the developers. Then, in 2002, the Linux kernel project started to use a proprietary DVCS (Distributed Version Control System) called BitKeeper. This was a controversial move, criticised at the time by Richard Stallman amongst others, for developing an open-source project by means of a proprietary tool. In 2005, the critics were proved right when the Linux team decided to give up using the 'community' version of BitKeeper. The copyright holder, Larry McVoy (also a Linux developer), had withdrawn free use of his product from some of the key members of the Linux team, after claiming that an Australian Linux programmer on the team had violated the terms of the license by reverse-engineering the BitKeeper protocols for another project.
Like any other good red-blooded hacker, Linus Torvalds sat down and wrote his own source control system, called ‘git’ - which, he says, is named after himself. Though git was initially nothing more than a rough temporary fill-in for BitKeeper, other kernel hackers, drawing on their experience of using BitKeeper, immediately started sending in patches to improve git - just as they had with the fledgling Linux - and in less than a month git was close to matching the core capabilities of BitKeeper.
Git was written because of the non-linear development of Linux: there were thousands of parallel branches. It had to be reliable, fast and simple too, because of the huge size of the project. It had to be distributed and allow disconnected working. Torvalds was the right person to set the project going. He is a true software architect, able to spot the deeper issues, formulate a framework for addressing them, and knock up a rough solution that others can perfect. He also has the charisma to lead a large force of volunteers to turn his vision into reality.
It was this that helped to make Linux so successful against all the odds, and it is this that has propelled Git into prominence as a source control system. Linus Torvalds paused on his way to world domination, via a diving trip in Fiji, to talk to Simple Talk about code management with git, share his views on programming and his trenchant opinions of Subversion, and explain why he believes the software industry is a glorious mess of innovation.
- Linus, it’s two decades since Linux was first launched. Can you identify any big changes between the way you think about programming now and the way you did twenty years ago?
- Oh, sure. The biggest is actually that I don't do nearly as much programming at all these days. I spend my time looking at and merging other people’s code, rather than writing any myself. I do some programming (like the dive application I'm playing with), but most of my Linux programming is actually writing small example snippets of pseudo-code in email when I want to illustrate something.
That said, when I do write code, the code I write looks somewhat different too. Especially in the kernel, I'm just much more careful these days - and that is one reason it can be relaxing to write user-level (application) code, because you can take a lot of shortcuts and don't have to be as careful as in the kernel.
- Would you consider developing a new language from scratch?
- I have, but at the same time I'm very pragmatic, and I think people who design things from scratch often forget that for things to be actually useful you need to have a lot of infrastructure, and you have to have people who know it.
Designing a new language is painful. Even if it were to be much better than some old language, you're just losing a lot of the existing knowledge base of the old one. And that "if it were to be much better" is pretty doubtful - people often have things that they dislike about old languages, but they don't think so much about all the small details that work very well - so when you design a new one you may fix the things you don't like, but you probably won't think about - and will screw up - all the things that made the old language successful.
I loathe the C pre-processor - the C language is wonderful, but the C pre-processor is pretty annoying. I'd love to have a pre-processor that could be aware of semantics and types - but also able to break them when necessary. That said, every attempt at that has always been a total failure, so I'm kind of resigned to the status quo.
- I’m intrigued to hear about the language. Would it have been a pure functional language?
- Oh, no. I've been interested in C-like languages; I just wish types in C were more of a first-class citizen: untyped inline functions (generics), types as arguments. Sparse does some of that, and was influenced by the kind of static checking that I think a C-type language could do.
- If you were starting out at university again, would you be drawn to programming the way programming is today? Where did you get the idea that given a problem to solve, the first thing you need is an interactive programming environment?
- Hmm. Starting out today, I suspect things have just changed so much that I'd probably not do things the way I used to.
For example, I really love doing low-level stuff, and work close to the hardware. It's why I did a kernel, of course. And that made much more sense back when I started with computers, because the computers were simpler, and you really could know them very intimately. So programming close to the hardware was how you did things, and the documentation existed, and there was a lot of support for that.
These days? It's hard to get very close to the machine, because machines have gotten so much more complex. You need to know about SMP, you need to have device drivers for infinitely more hardware, it's just intimidating. And the bar is just so much higher - to decide to do an OS today sounds like a very big undertaking.
Of course, that was true to some degree twenty years ago when I started Linux too - people who actually knew what was involved would never have started their own OS, because they knew what a huge undertaking it would be. I was just naive enough and ignorant enough that I didn't know how much work it would be.
- Has programming - and therefore the kind of people who can succeed as programmers - changed? Can you be a great programmer operating at a certain level without ever learning assembly or C?
- I don't think you need to learn assembly, but I do think knowing C is useful if you want to be a great programmer.
Why? Not because C is some magical language, but because C is in many ways closer to the machine than most other modern and still used languages. And to be a great programmer, you really do need to know how the machine works, I believe. You need to really understand what can be done efficiently, and what needs a lot of work from the CPU or the system.
Of course, many programming projects aren't about efficiency or being close to the hardware. Many projects are about being close to the user, and it's almost irrelevant what you ask the machine to do, because it's all about the user interfaces. And that requires a different kind of understanding.
So I think in the end there is simply just a need for different kinds of programmers. Different projects need different skills and often even the same project - you might have different people working on different parts.
I do think that programming has changed, but it's not because the ‘old kind’ of programming has gone away, it's because there's just a wider variety. A couple of decades ago you pretty much had to be technical to do programming. These days there are environments where you need other skills.
- Do you think languages are getting better? It's obviously easier to write software now because of the advances that have been made but what are the things that are making it more difficult?
- I don't think it's about languages getting better - I still think C is a wonderful language, and unmatched for what it does.
But that ‘for what it does’ is about the same thing as in the previous question: the same way you'd want people with different interests for different parts of a project, those new programming languages are not necessarily better than the old ones, but they have a particular bent. They may be better at handling multithreading, they may have easier memory management with automatic garbage collection, they may have native interfaces for user interfaces etc. That doesn't make them any better than C - it just means that for some particular use they may be more convenient.
And I think a lot of the programming language debate is kind of pointless - it's not so much about the programming language, as it's about the programmer. People will always prefer - and be more productive with - a language that they know, and that matches their interest and expertise. So I like C, and I'll happily prototype things in C and play around with it - while others find the whole notion of using C for some simple prototype to be abhorrent and crazy. They'd prefer to use some scripting language that is ‘easier’ - but it's easier because of what they are used to, and what they want to do with it.
One reason I still think that C is so important is that you can do pretty much anything with it. It's a supremely flexible language that is limited only by your skills and effort. Yes, it may take more effort, but you know you can do anything: you really can do anything from an OS kernel to a library to a nice pretty GUI and artificial intelligence. It may not help you all that much, but it never ever says ‘you can't do that’.
Other languages tend to be more about the particular niche that they are really useful for.
- Is programming getting easier, and should everyone learn to program, at least a little? There's an argument that programming teaches a way of thinking that's important, that it gives you order and structure in life.
Opponents of this say that programmers misunderstand the world in exactly the same way everybody else does, and that it doesn't make you intellectually superior. Which camp are you in?
- I'm absolutely not a believer in the ‘everybody should know it’. That's just crazy talk. It's the same mindset as ‘everybody should be able to fix their car’, or ‘everybody should be able to build their own house’.
No, not everybody should. Specialization is what makes us human. We can afford to specialize in things and be expert at something, because other people do other things. Software is important in modern society, yes - but so is growing food. You shouldn't expect everybody to know how to be a farmer or to milk a cow.
And the thing is, some people will be better at it. Either because of some natural inclination, or just because they spent a lot of time at it. Or both. Expecting everybody to learn to program when you know that the majority won't care, or ever get all that great at it - why even bother?
- Is the software industry a brilliant engine of innovation or a horrible mess? And if it's in a horrible mess what could we do to drag ourselves out of it? Review the way we create standards and specify less?
- I think it's a mess, but it's not a horrible mess. It's a glorious, crazy mess. Software engineering is about complex systems and few people really understand the whole thing - I'd argue that nobody does. You can't plan or even really engineer things in that kind of complex environment - I think you have to make things grow more or less organically.
And that's why I think open source works so well. It's this crazy free-for-all where people who are interested and have a viewpoint can all participate and you get a kind of odd Darwinian thing where successful approaches work better and propagate. There isn't much ‘intelligent design’, but there are lots of changes and lots of very active fitness testing.
- Okay, a change of subject. There's a lot of noise from people fresh to Git who say that the documentation is still less than brilliant, that git's wiki still has a long way to go, and that they have to consult the web to find out exactly how to format revision specifiers. Are these criticisms valid, or should they do something else with their lives, such as use Mercurial?
- I don't think they are really valid any more. There is tons of documentation, and the problem most people tend to have is that getting into a new SCM takes effort and may require you to change how you think about what you're doing. And git is a very powerful tool that can be used in many different ways. The fact that it can take a while to get used to it isn't because the documentation is lacking or bad, it's simply because it may take some effort to get proficient.
Getting started is easy. Doing some of the complicated things can be hard. I suspect people may get to the documentation for the complex things before they are really ready for it.
- Windows users say that the official way of running git under Windows (using cygwin) is far from ideal: although it is completely functional, it's a little sluggish, and in some ways Windows users remain second-class citizens in the world of git. You once said that 'If Microsoft ever does applications for Linux it means I've won'. Does this apply the other way round?
- Actually, you really shouldn't use the cygwin port of git. It's gotten lots better, but it's a very non-native interface for Windows, and I really suspect that people should use the more ‘native’ git port (msysgit).
That said, a lot of people want to use cygwin just in general, because it obviously makes Windows look more like a UNIX environment – and that part has nothing what-so-ever to do with git, and everything to do with people wanting full shell access and all the normal UNIX tools. If you're that kind of person, you probably want the cygwin git port too.
- As to the ‘who won’ part - I just think that git won. It's a good design, and it's the most robust (and fastest) SCM around. And people are slowly learning what the point of a distributed SCM is; git has been instrumental in that, and is the best tool around for it. The fact that people use it on Windows too is just a side effect of that - once you ‘get’ the whole distributed model, I don't think you can ever really go back.
- In 2007 you were quoted as saying that Subversion was 'the most pointless project ever started', though it's the most popular version control system on the planet. Why do you think so many people use it, and can you see the day when Veracity will be as pointless as Subversion?
- Oh, I see why people use Subversion. They wanted CVS, they're used to it, and SVN is the CVS of today. In other words, they just don't want a new SCM system; they just want to go on with their life and not really think about it.
I just never really liked CVS, and I think it did several things fundamentally wrong. And many of the SVN ‘fixes’ are in my opinion even worse - the branch and tag handling of SVN is just insane, taking the CVS insanity to a whole other level. The file rename handling is also really stupid and wrong, although I can't call it worse than CVS, where the approach was ‘hack it by hand by changing history’.
So sure, SVN has improvements too, and hey, I have my biases and hang-ups. As to Veracity, I really don't know enough about it to say much. Some of the features they tout sound just stupid - anybody who wants to do explicit rename tracking is a moron, in my not-so-humble opinion, and thinking that file locking is a feature is just a sign that you haven't been taking your medication lately.
So looking at the Veracity site, I'm not hugely impressed, but at the same time I think it's good that people try things out. I have very strong opinions, and I'm outspoken, but hey, sometimes people prove me wrong too - and that's all to the good.