08 November 2011

Michael Pilato: Geek of the Week

For a large number of .NET developers, Subversion is Source Control. The book they turn to when they want to learn how to use it is O'Reilly's 'Version Control with Subversion'. Both Subversion and the book owe a great deal to the Subversion open source development team, including Michael Pilato of CollabNet, who has worked on the project almost since CollabNet founded it in 2000.

When the Dutch computer scientist Dick Grune invented and developed the Concurrent Versions System (CVS) in the 1980s, it was seen as the programming equivalent of inventing the steam engine. Suddenly, instead of checking out code to gain the sole right to update it, programmers could work on the same code together and merge their changes afterwards.


CVS was written as a set of scripts around the RCS executable, and only later was it packaged into a single executable. There were several attempts to rewrite CVS from scratch, as it had reached the point where it was impossible to modernize or extend its functionality. Subversion was started by CollabNet, together with some of the CVS developers, with the objective of replacing CVS rather than rewriting it. An embedded database (Berkeley DB) was initially used as the basis for the version storage system, and the design greatly improved performance over a network. It was designed to be similar enough to CVS that any CVS user could make the switch with little effort. It borrowed the version control model employed by CVS but lacked the irritating shortcomings of the original. It became a top-level Apache project in 2010.

Karl Fogel and Jim Blandy did a great deal of the initial work but were soon joined by a number of developers who were keen to replace CVS and were only too aware of its shortcomings. They coalesced into an open-source team that was given the responsibility of designing and building the new version control system. One team member, employed by CollabNet, joined eight months after the project started. His name was Michael Pilato, and he soon became a core Subversion developer. Michael spends his days (and many nights) improving Subversion and the other tools with which it integrates.

A co-author, with Ben Collins-Sussman and Brian W. Fitzpatrick, of ‘Version Control with Subversion’, a book that also covers the basic concepts behind version control, he enjoys composing and performing music, freelance graphic design work, hiking, and spending time with his family. Mike has a degree in computer science and mathematics from the University of North Carolina at Charlotte.

When did you start programming? Can you remember the first interesting code you wrote; and the first interesting code you read?
I started toying with programming when I was eight years old. My best friend’s dad worked for IBM and owned an old IBM PS/2 machine. We’d spend our days playing games on it, and then sit up late at night hand-copying BASIC programs from computer gaming magazines into it. Inevitably, we’d learn hours later that we’d fat-fingered something and the program wouldn’t run. But I remember being fascinated that the programs that did run (the ones somebody else typed all the code for!) existed because some person or group of persons thought through all that game logic, AI, UI, and so on, and then encapsulated all that thought into a language that a mere machine could act upon.

My opportunities to command a personal computer were scattered and decreased in regularity over the next several years of my life. It wasn’t really until the latter years of high school and then my studies at university that I was able to buy a computer of my own and learn to program in earnest. Initially, most of my programming energy went toward simply keeping up with my computer science assignments. But it didn’t take long to realize how handy software development could be in terms of making my own life easier. In fact, it was out of a need to solve a real problem, the organization of the tech-support work done by myself and other employees of a local computer store, that I wrote my first piece of interesting software: a Visual Basic monstrosity we called the “Tech Bench Assistant”.

You got a Bachelor of Science in Computer Science and then went into industry. What were the things you had to learn about programming, beyond the ability to write code?
Programming for education and programming in the industry had very little in common in my experience. My assignments in school were all solo tasks, and all relatively small pieces of standalone functionality. The industry introduced me to many new things that today I think I’d be lost without, including this magical thing called “version control”. I learned how to think through solid API design and write modular, librarized code. I learned the importance of testing. And of course, as I was now working on much larger pieces of software not written by myself, I learned two other important things. The first was the value of writing good code comments – a lesson that you generally learn while muttering unkind things about the uncommented code someone else wrote that you now have to read. The second was the value of reading and writing code side-by-side with my peers, talking as much about the Why of a particular bit of code as about the What and How.

How do you like to work on a team as a programmer? Is it better to take a problem and split it up so everybody gets their piece? Or do you like the XP model, where you pair-program everything and everybody owns all the code collectively?
As a remote employee, XP-style programming is not such a ready option for me these days. I still work in teams – the greater Subversion development community, my CollabNet Engineering group, and so on – but as advanced as the state of the real-time collaboration art is, it’s still not quite the same experience as sharing an office space with others.

When I worked in a traditional office, I enjoyed a good mix of pair-programming and division-of-labor activity. I don’t have a particularly strong preference for one approach or the other; I say “Do what works.” Certainly, each member of a development team will come bearing their own individual skills. Pair programming lets those skills overlap and blend, ultimately allowing the whole team to grow together toward a more complete understanding of the software. I learned so much in my early days at CollabNet by pair programming with Karl Fogel and Ben Collins-Sussman as we sliced new trails through the version control landscape. And yes, many days it was exactly that adventureful! But we programmers aren’t programs ourselves, so we have days or seasons when we just want or need to work alone. I appreciate the fact that none of my employers (past or present) have been so process-bound that I couldn’t enjoy the freedom to work in whatever style worked best, mixing and matching as required to be productive.

Regardless of the approach to development taken, though, I definitely favor group ownership of a codebase over something more territorial. Work alone or in pairs as you see fit, but don’t fragment the ownership of and responsibility for the whole product.

What was the Eureka moment for you with Subversion? What problem were you trying to solve? Did you use tests to drive the design, or do you see them more as a way of correcting errors?
The Subversion project was started by CollabNet about eight months before I joined the project (and company). The goal was pretty straightforward: to design and build a version control system which borrowed from the version control model employed by CVS (which was the de facto standard in open source version control systems) but which lacked CVS’s many well-documented-yet-still-quite-annoying shortcomings. 

Subversion’s design came about in a much more academic fashion, long before the first test was ever written. The Subversion test suite today is a mixture of various types of tests, from low-level functional tests of specific APIs to much higher-level behavior tests of, say, an entire Subversion command-line client subcommand. I think it’s fair to say, though, the Subversion developers haven’t really embraced test-driven development. Our tests are geared more toward error correction and guarding against functional regressions.

Subversion was developed to stop headaches caused by concurrent and collaborative work. Do you think it’s a silver bullet for concurrent programming?
Oh, goodness, no! Collaborative success is more about communication than about merely avoiding merge conflicts! Subversion is just another tool for communication. In this case, it allows developers to communicate via their coding efforts. Its design is such that it encourages this sort of communication to happen on a regular basis while not serializing those interactions unnecessarily. This all works to the benefit of the larger collaborative effort. Subversion is great software that does its job well. But “silver bullet” might be a step too far. You can toss Subversion into any dysfunctional collection of self-serving programmers and find that it doesn’t improve their collaboration effort in the least. Version control allows you to speak using code, but there’s just no substitute for the fundamental ability of programmers to speak about code with each other using methods of communication more directly related to the spoken word. Teams need to first learn to be teams.

Knuth has an essay about developing TeX where he talks about going over to a pure destructive QA personality and trying his hardest to break his own code. Do you think most developers are good at doing that?
May I simply refer you to the size of the defect trackers of the world’s most mature pieces of software?

Programmers probably aren’t always the most thorough analysts of their own code’s robustness. I’d venture to guess that every programmer who has written software for someone else to use has had at least one opportunity to stare blankly at a bug report and wonder, perhaps aloud, “Well of course it failed when they did that, but … who would do that?!” That said, I think the developer mindset is capable of that sort of “pure destructive QA personality”. But it’s far easier to code for good input than to consider every possible failure scenario, and sometimes we get lazy about that stuff or assume that someone else has already checked the constraints elsewhere in the codebase.

How do you avoid over-generalization, building more than you need and consequently wasting resources?
There’s always a temptation to over-generalize when developing. It’s a side-effect of the programmer gene, right?  Software is entirely about convenience. Software developers (and engineering-minded folk in general) are a paradox: we work so hard just to do so little! Every line of code exists because there’s some task or portion of a task that we simply don’t want to repeat in the future. I’ve written countless little custom scripts merely because I can’t be bothered to manually repeat some multi-step activity. So it’s so very natural for a programmer to think not just, “How can I do X?” but “How can I do X, Y, and everything tangentially related to both once and never again?”

I’ve found it helpful to work in a community with like-minded folks who keep those tendencies in check. I also notice that I’m more likely to over-generalize when writing a piece of code for my own use only as opposed to something that will be publicly released. Maybe those two boil down to the same basic idea – working transparently. Strict deadlines can also help to keep you grounded in the immediate needs of the moment. But deadlines tend to be less forgiving than human peers when you foul up!

What has changed the most in the way you think about programming now versus when you started?
Scope. When I first started programming, I was doing stuff by me, for me. I was scratching my own creative and organizational itches and I selfishly thought about programming in that context: my creativity; my organization; my approach.

Programming is a gift that’s meant to be shared. Sure, I still write stuff for my own use only, but I realize now more than ever before that the code I write or help to write has the power to make other people’s lives easier, too. In the past decade, I’ve had so many chance meetings with random Subversion users, and while many of them seem captivated by the fact that they are speaking with one of its developers, I’m likewise captivated by the idea that some text I bang out in Emacs on my laptop gets shared with and refined by 20 or 30 other developers, compiled on countless other computers, and then used and enjoyed by millions of people. Oh, and that Emacs software, and the operating system it’s running on, and the compiler, and who knows how many other pieces of software that bind it all together were all written in the same way, by teams of programmers sharing their gift with others. Suddenly, I find that it’s no longer just about me.

Along the way, my opinions about software licenses and copyright and intellectual property and such have also matured. It’s yet another scope change. As a starting programmer, I was really picky about maintaining ownership and control of my creative efforts. This was the case not just in software development, but in other creative outlets in my life, too. Now, my default stance is one of freedom and community benefit: Apache licenses for code, non-restrictive Creative Commons licenses for music and graphic design, etc. My friend Karl Fogel, founder of QuestionCopyright.org, has played a large role in this change of thought for me over the past decade (though I doubt he even realizes it). And some of this thought development stems from my own maturing religious beliefs, too – but that’s an essay I’ve yet to write!

You’ve written or co-written books. Do you write with the same speed and ease every day? How do you feel about parting with books when you finish them?
I enjoy writing, and find that it comes fairly easily for me. I struggle with it for a little while after making the initial context-switch from something else (such as programming) after long periods of not writing prose. For good or for ill, I’m involved in so many different things – at work, with my family and church, and so on – that I rarely have the luxury of doing any one thing for extended periods of time. Fortunately, I can flip between these activities and roles pretty easily. But prose authorship tends to come in occasional bursts of activity, usually to address some immediate need. I find that it takes me a bit longer to get my head into the correct state for that sort of communication. But once I’m there and mentally acclimated, I can plough through a writing task pretty easily.

Unfortunately, I have no experience in truly parting with books. Or in finishing them, for that matter. ‘Version Control With Subversion’ is an open-source book that I continue to maintain as the Apache Subversion software continues to mature. I suppose the closest I could come to relating to that experience is the pair of times that O’Reilly chose to publish hard copies of the book, complete with real authorial deadlines and professional copyediting and such. When those experiences came to a close, I was usually so burned out and blurry-eyed from flipping through 400 pages of red-ink-marked editorial feedback that my only feeling when parting with the book was of relief. Those weren’t even true partings in my situation, because the book text still lives on in a public Subversion repository, always ready for still more corrections and additions.

Were there books that were important to you when you were learning to program?
I was always more of an “… in a Nutshell” sort of learner than a “The Art of …” sort. I’ve learned far more from my peers and the code reviews they provided than from books. Books are great, and I don’t want to be misunderstood as stating otherwise. But my advice to new programmers is the same that I would give to a new guitarist, a new chef, or a blacksmith’s apprentice (are those still around?): refine your craft beside attentive mentors who value your mastery of the craft, too. Get involved in an open source community. Ask for bite-sized tasks you can help with. Solicit code review for your patches. Don’t work in secret, waiting for some future opportunity to spring your fully-formed, time-perfected and inimitable Self on an unsuspecting world in a dazzling “reveal” of your talent. You’ll have more time to make your mark on society if you spend less of it re-experiencing and recovering from the same mistakes that a good mentor could have easily prepared you for.

There’s another kind of reading which is obviously important – reading code. How do you find your way into a big pile of code you didn’t write?
If I’m fortunate enough to be looking at software whose version control history is available to me and well-documented, I find it very helpful to read commit logs and such for a bit of additional context beyond whatever code comments exist. Another useful tool is the debugger; sometimes the best way to understand how a piece of code works is just to observe it working (or not working, as the case may be).

Do you feel there are times in your life where your passion for programming runs amok to the detriment of other parts of your life?
Perhaps this question should have been asked of my wife!

Certainly there have been seasons where my sheer love of this craft has caused me to make some … less-than-ideal decisions regarding time management. I’m getting too old to do the all-nighter thing without physical repercussions. Besides, I can’t seem to find a pizza in my hometown that I consistently enjoy!

I’m fortunate, though, to have many passions. Programming is one, with music and soccer helping to disqualify me from most geek stereotypes.  I have the great privilege of waking up next to a beautiful woman and great friend every day, of being greeted before I’m fully awake by two amazing sons, and of existing as just the faintest reflection of the original Creator. So long as I remain cognizant of all of those amazing blessings, keeping my other passions properly prioritized is no sweat.


Richard Morris
