Source control is a conservative technology. Even so, there has long been a general willingness to use internet-based servers for the task, especially for open-source projects. Will the trend toward cloud-based services for version control continue, or is the question made irrelevant by the growing acceptance of distributed source control? Is cloud-based version control still too risky for a service that must be secure and almost risk-free?
When I visited the Code Spaces website recently, I discovered the headline, “Code Spaces : Is Down!” This was followed by a brief explanation of how the company had been targeted by a distributed denial-of-service (DDoS) attack on June 17. Apparently the attacker had gained access to the service’s Amazon EC2 control panel and left messages that led to attempts at extortion and the subsequent deletion of most of the company’s data, backups, and machine configurations. All this took place within a 12-hour period. Code Spaces, their services, and most of their clients’ data had ceased to exist.
Prior to the attack, Code Spaces provided what they described as ‘rock solid’, secure source code and project management services to application developers, one of a growing number of companies to offer version control in the cloud. By this, I mean hosting a source code repository as a service, where much of the computing is offloaded to virtual machines running on remote servers. Data, too, is stored remotely, and users access their repositories via the Internet. Assembla, CloudForge, GitHub, and Bitbucket are just a few of the services offering cloud-based version control, just like Code Spaces used to do. They share a confidence in their resilience that Code Spaces once had, boasting ‘With Data Centers in 3 continents we can guarantee 99% uptime. If our servers are not up for 99% of the time, we will give you that month’s hosting for free - now that's a guarantee!’
So, what are we to conclude from the collapse of Code Spaces? Should we still consider cloud-based source control as an option, and if so, what precautions should we take?
Version control in the cloud
The problem suffered by Code Spaces, criminal sabotage, was probably made easier by their reliance on a single cloud provider. Before all the cloud hoopla took hold, source code hosting was often offered as a dedicated hosting service, or managed hosting service, in which customers leased an entire server not shared with anyone. This arrangement made any wide-scale malicious attack far less likely, but it was also more difficult to scale with fluctuations in demand. The components necessary to support the version control system ran on that server, with perhaps the necessary attached storage. An Internet connection was still required, but such a system lacked other features often associated with cloud services, such as virtualization and a remote storage infrastructure housed in a large data center (cloud storage).
Cloud-based version control brings with it virtualization and remote data storage, typical of cloud services. That said, cloud services are still often referred to as hosted services, which might confuse those who were around a decade ago when such things were provided in a more traditional way.
But times are changing, and many “hosted” systems have moved to the cloud. In such a system, the vendor makes available the infrastructure necessary to run the version control software, most often one of the more popular products such as Subversion (SVN), Mercurial, or Git, and hosts the subscribers’ repositories in cloud data centers. The service provider ensures that all implementation and maintenance tasks are handled and provides customers with an interface for setting up the repositories and managing users.
Developers then connect to hosted repositories (via the remote virtualized servers) from their computers, usually through a client application supported for the particular version control system. For example, if a team has set up a Git repository with a cloud service, the developers might use the Git command-line client or the TortoiseGit GUI to connect to the remote repository. Or they might interface with the hosted repository from within Visual Studio. Whichever approach a team takes, users are ultimately connecting via the Internet.
Rarely does a service provider offer only source code hosting. More often than not, version control is available alongside other services, such as issue tracking or project management, as we saw with Code Spaces. In such cases, the provider is often adhering to a software-as-a-service (SaaS) model in which subscribers access, via the Internet, one or more cloud-based applications running in virtual environments on remote servers. The provider facilitates the hosting and management of each application and makes it available as an online service. SaaS vendors generally differ from one another in the version control systems they support, the additional tools they offer, the level of integration with other systems and their pricing models. For example, GitHub provides integrated issue tracking and collaborative code review, but supports only the Git version control system. Assembla, on the other hand, supports SVN, Git, and Perforce, along with providing integrated issue tracking and project management.
In some cases, providers adhere more to a platform-as-a-service (PaaS) model in which they offer an entire cloud-based development environment that supports multiple integrated services, with version control being only one of them. Heroku, for example, provides a cloud application platform for building, deploying, and managing applications. Part of that platform is the Git version control system, which is integrated into the entire project management cycle. Then there’s CloudForge, self-dubbed as a development platform-as-a-service (dPaaS). CloudForge provides the infrastructure and tools necessary to develop, deploy, and scale application services. As part of that platform, you get hosted version control based on either SVN or Git.
When it comes to cloud-based version control, no two service providers are the same, but they all share one trait. They offer some form of cloud-based version control that requires Internet accessibility and a willingness to offload at least part of your operations and data to an outside vendor.
Why bother with hosted version control?
A version control system protects code, maintains a history of changes, and enables collaboration. These days, few developers would consider building an application without it. Some teams see hosted version control in the same light, believing a hosted cloud service to be as essential as the version control system itself. And they have good reason for feeling this way.
As the number and size of the version control repositories grow, so too do the implementation and maintenance costs associated with housing an on-premises system. Even with distributed version control (which we’ll discuss in more detail in a bit), a team will still often designate a single repository as the one source of truth that can integrate with other development tools and serve as the primary source for deployments and backups. Having a “central” server also ensures a credible copy of the code is stored offsite, should disaster hit a development center.
A cloud-based version control service can eliminate many of the implementation and maintenance costs associated with in-house hosting. Because service providers have access to massive infrastructures and data centers, such as those available through Amazon EC2, they can also better support geographically-dispersed development and high-availability requirements. In addition, these services ensure that regular maintenance tasks are performed, such as backing up the repositories at regular intervals and implementing tested recovery strategies. Cloud services also have the advantage of on-demand scalability beyond the capabilities of most in-house environments. Organizations can expand and contract their operations as necessary, without costly long-term investments in equipment, software, and other resources.
Cloud-based version control services also make it easier to set up a new project and get started. Plus, most providers offer additional services that are integrated with the source control system, helping to streamline the entire development process. For example, source control providers usually offer integrated issue tracking so that repository commit operations can be linked to specific tasks or issues. They might also offer project management services that are integrated with the version control system. Most vendors provide much more than just a “repository in the cloud,” and it’s all those extras that can make hosted version control shine.
That’s not to say a hosted cloud service doesn’t have its downside. If you’re building an open source solution, public exposure is more a benefit than a risk. However, many organizations using hosted services want to keep their development efforts private, and uploading proprietary code via the Internet and storing that code in data centers in the cloud carry their own set of risks. What happened at Code Spaces is just one example. For those subscribers whose data was wiped out, the degree to which they’ll be able to recover that data will depend on whether any of their users have an up-to-date workspace (though chances are, someone will). But other tracking and project management data could be lost.
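One precaution that does not depend on a developer happening to have a current workspace is to keep your own mirror of each hosted repository. What follows is a minimal sketch, assuming a Git repository and a Git client on the machine that runs it; the function names are hypothetical, and it covers only the repository itself, not issue-tracking or project data.

```python
import subprocess
from pathlib import Path


def mirror_command(remote_url: str, mirror_dir: Path) -> list[str]:
    """Return the git command that creates or refreshes a local mirror."""
    if mirror_dir.exists():
        # Refresh an existing mirror. --prune is deliberately left out so
        # that refs deleted on the remote are not also deleted locally.
        return ["git", "-C", str(mirror_dir), "fetch", "--all"]
    # First run: copy every branch, tag, and other ref into a bare repository.
    return ["git", "clone", "--mirror", remote_url, str(mirror_dir)]


def backup(remote_url: str, mirror_dir: Path) -> None:
    """Run the mirror command; raises CalledProcessError if git fails."""
    subprocess.run(mirror_command(remote_url, mirror_dir), check=True)
```

Run on a schedule from a machine outside the provider's infrastructure, something like `backup(url, Path("backups/app.git"))` keeps a copy that survives even if the hosted repository disappears.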
Yet it’s not just deleted data that’s a concern. If a provider’s security protections are compromised, the integrity of the code could be compromised. In addition, code files and other sensitive data that end up in the wrong hands could make a company even more vulnerable. What happens to that data if the provider goes out of business? Which of the provider’s employees have access to your data? What safeguards are in place to protect against malware?
In addition, the very nature of a cloud-based service makes it vulnerable to disruptions in Internet access and system-wide outages. Disruptions can range from overloaded ISPs to regional power failures to internal system failures to cyber attacks. True, an on-premises system is also vulnerable to an assortment of risks, but at least with an in-house solution, you have some level of control over data access and what you can do if your systems are compromised.
On the other hand, if you do go with a cloud service, you can take steps to help protect your code, such as digitally signing your files or implementing automated vulnerability tests as a part of the build process. In some cases, you might even find that the service provider has put into place a more robust security model than what you have available in-house. And for those development projects open to the public, it might be especially beneficial to give the hosted service a try, especially if that service is free.
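In practice, signing usually means something like GPG-signed commits or tags. As a simpler stand-in, the sketch below records a keyed SHA-256 tag (an HMAC) for each file so that later changes can be detected. It is an illustration only: the function names are hypothetical, and it assumes the secret key is kept somewhere the hosting provider cannot reach.

```python
import hashlib
import hmac


def sign_manifest(files: dict[str, bytes], key: bytes) -> dict[str, str]:
    """Map each file path to an HMAC-SHA256 tag of its contents."""
    return {
        path: hmac.new(key, data, hashlib.sha256).hexdigest()
        for path, data in files.items()
    }


def tampered_files(
    files: dict[str, bytes], manifest: dict[str, str], key: bytes
) -> list[str]:
    """Return paths whose contents no longer match the manifest, plus any
    paths added or removed since the manifest was created."""
    current = sign_manifest(files, key)
    return [
        path
        for path in sorted(set(current) | set(manifest))
        # compare_digest avoids leaking information through timing.
        if not hmac.compare_digest(current.get(path, ""), manifest.get(path, ""))
    ]
```

A build step could create the manifest when code is committed and call `tampered_files` on whatever it later pulls from the hosted repository, failing the build if the list is not empty.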
Centralized and distributed version control
A version control service hosts the repositories and application components necessary to manage those repositories. Users connect to the repositories from their computers, usually through a client component specific to the version control system. However, they don’t directly modify the files in the repository. Instead, they edit working copies within their own environments and then commit or push those changes to the repository. The exact way in which this works depends on the version control system and whether that system is a centralized or distributed one.
A centralized system is built around a primary repository that acts as the one source of truth for all code files stored to that system. A central server manages the files, maintains version histories, and controls all operations that affect the repository. Users connect to the repository via the server in order to commit changes and retrieve updates. Products such as SVN and Team Foundation Version Control (part of Team Foundation Server) are commonly used centralized source control systems. A service provider offering centralized source control manages the product’s server and repositories and provides the necessary interfaces for users to connect to the system.
In a distributed system, there is no central server and no one repository that is considered the main store. Each user has a clone of the repository on his or her computer, creating a peer-to-peer relationship between all the repositories. Users still edit working copies, but they commit changes to the repositories on their own systems. Only then do they push their changes out to other copies of that repository, or pull changes from those systems. Commonly implemented distributed version control systems include Git, Mercurial and Bazaar.
A solution provider that offers distributed source control as a service is essentially housing one of the clones in the cloud. In a sense, that clone acts as a central server to which all users push their changes (and from which they subsequently pull updated files). It also provides an integration point for other services, such as issue tracking, project management, and deployment, and facilitates backup operations that rely on having the most updated files. A central server doesn’t prevent users from pushing files to or pulling files from other peer repositories, but it does represent the one source of truth that can sometimes be missing from a distributed system.
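The commit, push, and pull flow just described can be illustrated with a toy model. This is a deliberately simplified sketch, not how Git or Mercurial work internally: a real system tracks a graph of hashed commits and handles merges, while this hypothetical `Repo` class keeps only a list of messages.

```python
class Repo:
    """A toy repository: just an ordered list of commit messages."""

    def __init__(self, name: str):
        self.name = name
        self.commits: list[str] = []

    def clone(self, name: str) -> "Repo":
        """Create a full copy, history included."""
        copy = Repo(name)
        copy.commits = list(self.commits)
        return copy

    def commit(self, message: str) -> None:
        # Commits are local: nothing leaves this repository yet.
        self.commits.append(message)

    def push(self, other: "Repo") -> None:
        # Send the commits the other repository lacks (no conflict handling).
        for message in self.commits:
            if message not in other.commits:
                other.commits.append(message)

    def pull(self, other: "Repo") -> None:
        # Pulling is the same transfer in the opposite direction.
        other.push(self)


hosted = Repo("hosted")          # the clone housed by the service provider
alice = hosted.clone("alice")
bob = hosted.clone("bob")

alice.commit("Add login form")   # visible only in Alice's clone
alice.push(hosted)               # now the hosted clone has it
bob.pull(hosted)                 # and Bob picks it up from there
```

Nothing in the model stops Alice from pushing straight to Bob; the hosted clone is simply the copy everyone agrees to treat as the hub.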
There are advantages and disadvantages to both centralized and distributed version control systems (and much debate about choosing one over the other). A full discussion of each system and their differences is beyond the scope of this article. However, before choosing a service provider, you should know whether you want to implement a centralized or distributed system. In some cases, your organization might already have a version control solution in place and you simply want to move to a hosted service. As a result, you can quickly eliminate some providers from consideration. For example, if you’re using SVN in your organization (and you have no plans to switch), you’ll likely bypass GitHub and Bitbucket and look toward such services as Assembla or Beanstalk.
It’s also worth noting that not all service providers implement one of the industry standards for their source control. For example, PowerVCS offers web-based source control that requires no client components and uses MySQL in the backend. User interaction is primarily through a browser. This keeps things simple in many respects, but such a solution is not always as robust as some of the tried-and-true products implemented as a service.
Public and private development strategies
Another consideration when choosing a service provider for your source control hosting is whether you plan to develop public solutions (open source), private ones, or both. What this essentially comes down to is whether your code repositories are made public or are limited to authorized users. Providers such as Launchpad, GitHub and Bitbucket support both public and private repositories, while Beanstalk and Unfuddle support only private projects, and CodePlex and Google Code support only public ones.
Private repositories usually come at a steeper price, but also serve to protect your code, at least in theory. Open source development is, of course, open to the public, but you can find free hosting for your projects. Yet price alone should not be your only consideration when it comes to open source development. The reason you put your code out there is to encourage community involvement. As a result, you want to take into account not only the number of people who use the service, but also the type of projects being hosted. If you’re developing software related to the Linux kernel, for example, you’ll likely want to choose GitHub over CodePlex.
Integration with other systems
Earlier, we touched on the topic of integration, an important consideration when deciding upon a service provider. Most providers offer at least some level of integration between the version control system and an issue tracking solution. Integration between the code and issues can be an important aspect of maintaining your codebase, and when committing code to the repository, users should be able to relate that operation to specific issues. That way, the code history is always linked to the issues history. Bitbucket, for example, lets users link their commit operations to issues within JIRA tracking software simply by including an issue key in the commit comment.
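To give a sense of how little machinery this kind of linking needs, here is a minimal sketch of pulling issue keys out of a commit comment. The key format (an uppercase project prefix, a hyphen, and a number, such as `WEB-42`) is an assumption made for illustration, and the function name is hypothetical; a provider's own matching rules may differ.

```python
import re

# Assumed key format: uppercase project prefix, hyphen, number ("WEB-42").
# Other uppercase tokens of the same shape, such as "UTF-8", also match.
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def issue_keys(commit_message: str) -> list[str]:
    """Return the distinct issue keys in a commit message, in order."""
    seen: list[str] = []
    for key in ISSUE_KEY.findall(commit_message):
        if key not in seen:
            seen.append(key)
    return seen
```

A commit comment such as "WEB-42 Fix null check; see also OPS-7" would yield the keys `WEB-42` and `OPS-7`, which a tracker could then use to attach the commit to both issues.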
The degree to which the version control service is integrated with other services varies greatly from one provider to the next. In addition to issue tracking, many providers offer services such as code review, wikis, and project management. For instance, in Beanstalk you can link your commit comments to JIRA, just like you can in Bitbucket, but you can also link them to FogBugz, Lighthouse, Zendesk, and Sifter.
When considering a provider, you should also take into account the level of integration with systems outside of the provider’s domain. For example, CloudForge provides Web hooks (call-back APIs) to run scripts that can update other tools in response to a source-code commit. The service also supports WebDAV in order to facilitate document sharing and includes the capacity to connect your TeamForge project to a Box storage account.
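As a rough idea of what the receiving end of such a hook might do, the sketch below turns a commit notification into one-line messages for another tool. The payload layout is entirely hypothetical (a `repository` name plus a list of `commits`, each with `author` and `message`); the real field names would come from the provider's documentation.

```python
import json


def handle_commit_hook(body: str) -> list[str]:
    """Turn a commit notification (JSON text) into one-line summaries.

    The field names used here are assumptions for illustration only.
    """
    payload = json.loads(body)
    repo = payload.get("repository", "unknown")
    notes = []
    for commit in payload.get("commits", []):
        lines = commit.get("message", "").splitlines()
        # Use only the first line of the commit comment as the summary.
        first_line = lines[0] if lines else "(no message)"
        notes.append(f"[{repo}] {commit.get('author', 'someone')}: {first_line}")
    return notes
```

The returned lines could be posted to a chat room, appended to a build log, or used to trigger a continuous integration run.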
When deciding upon the level of integration you need, you should also look at the tools you’re already using and determine how well they’ll work with the version control service. The last thing you want is for your continuous integration system to be incompatible with your version control service.
Protecting your assets
If the Code Spaces attack shows us nothing else, it demonstrates the importance of security and what happens when that security fails. In the face of lost data, we’re often left with nothing but questions. In the case of Code Spaces, for example, we might wonder why there was no mechanism to warn Amazon to disconnect the console before administrators tried to wrest control back from the blackmailer. And why didn’t an alarm sound on Amazon’s end as the result of such uncharacteristic behavior?
When it comes to your private development projects, you want to be sure your code is safe both at rest and in motion. The only way to do that is to vet each potential provider to determine the level of security it supports. Considerations include where the repositories are hosted, the types of facilities used, what data is encrypted, types of encryption available, whether brute-force protection has been implemented, what steps have been taken to prevent DDoS attacks, the types of auditing being done, and any other considerations relevant to protecting your organization. You need to know up-front what your security needs are and then determine whether the provider meets those needs.
You should also determine how easy it will be to manage groups and users and the degree to which you can control access. Assembla, for example, provides a simple user interface for granting access to two types of users: team members and watchers (typically, clients). For each group, you can control the type of permissions users have to each tool, or you can deny access to a particular tool.
Another consideration to take into account is authentication. For example, does the service support two-factor authentication? Beanstalk does. So does GitHub. And it appears Bitbucket is working on it, but it hasn’t been implemented yet. On the other hand, you might want your users to be able to log in to the service using an account created through another provider or system. For instance, you can sign into Visual Studio Online by using your Microsoft account, and you can sign into Bitbucket by using your Google, GitHub, Twitter, or Facebook account. But for GitHub you can use only your GitHub account, and for Launchpad you can use only your Launchpad account.
What’s this going to cost?
Here’s the general breakdown. Public repositories are often free. Private repositories usually are not. If you’re going the open source route, and not doing private repositories at all, you might consider CodePlex, GitHub, Google Code, or Launchpad. If you need privacy, then you should take into account the service provider’s pricing model, which can be based on the number of users, number of repositories, resources used, or other factors. Each provider has its own system.
For example, GitHub offers personal plans and organizational plans. All plans permit an unlimited number of users and public repositories, but charge for private repositories, starting at $7/month for five (personal) or $25/month for 10 (organizational). Bitbucket, on the other hand, offers plans that start at $10/month for up to 10 users, with unlimited public and private repositories. (It’s free up to five users.) Then there’s Heroku, which charges for the resources and services used, starting at 5 cents a dyno-hour. A dyno refers to a unit of computing power.
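To see how these pricing models compare for a given team, the arithmetic can be sketched in a few lines. The figures are only the entry-level prices quoted above, which will date quickly, and the function names are hypothetical; anything beyond those tiers returns `None` rather than guessing at a price.

```python
DYNO_HOUR_CENTS = 5  # "5 cents a dyno-hour", as quoted above


def dyno_cost_dollars(dynos: int, hours: int) -> float:
    """Cost of running `dynos` dynos for `hours` hours each."""
    return dynos * hours * DYNO_HOUR_CENTS / 100


def per_user_entry_cost(users: int):
    """Monthly dollars under the per-user entry tiers quoted for Bitbucket,
    or None if the team is larger than those tiers cover."""
    if users <= 5:
        return 0
    if users <= 10:
        return 10
    return None


def per_repo_entry_cost(private_repos: int, organizational: bool = False):
    """Monthly dollars under the per-repository entry tier quoted for GitHub,
    or None if the repository count falls outside that tier."""
    limit, price = (10, 25) if organizational else (5, 7)
    if 1 <= private_repos <= limit:
        return price
    return None
```

For a team of eight with four private repositories, the per-user model comes to $10 a month and the per-repository model to $7, while a single dyno running around the clock for a 30-day month (720 hours) comes to $36 at the quoted rate.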
You’ll have to determine how you’ll be using a service and your estimated number of users and repositories to estimate how much it’s going to cost you. Keep in mind, however, you don’t have to limit yourself to one service. You can host your open source projects with one provider and your private ones with another provider, determining which one is best for each situation. Also be aware that pricing structures change, so make sure you have the most current information when you’re ready to finalize your decision.
Lots more to think about
Not surprisingly, you’ll want to take into account a number of other considerations when trying to decide on where to host your source control, such as how backups and risk management are handled and what type of recovery strategies are in place. You’ll also want to consider issues related to availability and reliability. What sort of down times has the service experienced? What about scheduled maintenance? If data centers are geographically dispersed, what happens if one of those centers goes offline?
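It helps to translate an uptime percentage into hours before comparing providers. The sketch below does that arithmetic for figures such as the 99% guarantee Code Spaces advertised; the function name and the 30-day month are assumptions made for illustration.

```python
def allowed_downtime_hours(uptime_percent: float, days: int = 30) -> float:
    """Hours of downtime a provider can have in `days` days and still meet
    the stated uptime percentage."""
    return days * 24 * (100 - uptime_percent) / 100
```

By this arithmetic, a 99% guarantee still allows 7.2 hours of downtime in a 30-day month, while a 99.9% guarantee allows about 43 minutes.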
You’ll also want to look at issues related to management and general usability, for administrators and developers. For some operations, this might not be much of a factor, particularly with regard to version control. After you set up the repository, most interaction will be between developers and the version control system, via the clients on their computers. Even so, you should have a sense of what to expect and how to perform administrative tasks. Also consider other features that have not been at the top of your priority list, such as wikis and web pages, or even whether you’ll have to contend with any ads on your web pages.
Another factor to take into account is how well the service is known and used. Services such as GitHub and Bitbucket come with large user bases (with GitHub easily leading the pack). These numbers translate into extensive communities looking for similar solutions and addressing similar issues. You should also verify what sort of support the provider offers, should you run into any issues, and how much you might have to pay for that support. In addition, be sure to verify how well processes and systems are documented, paying close attention to the usefulness and quality of that information.
Keeping your head out of the clouds
Clearly, you must take into account a number of issues when choosing a service provider that hosts version control in the cloud. Perhaps the best place to start is to look at your current version control system and decide whether you want to continue to use that one or switch to a different system. For example, a team running SVN, a centralized system, might decide that moving to a cloud-based version control service offers a good opportunity to switch to a distributed system such as Git. However, if you want to stick with your current system, then you need to find a provider that supports that system. And if you’re starting anew, then you must decide whether you want to go with a centralized system or a distributed one, and from there, decide upon a product.
You’ll also need to determine whether you will require private repositories, public repositories, or both. If private repositories must be part of the package, then carefully vet the security mechanisms that the provider has in place. In addition, your decision should take into account what other services you need in order to augment or replace systems you’ve already implemented within your organization. Be sure to determine what services the provider offers and how well those services integrate with other systems. And don’t forget to carefully calculate what all this is going to cost you.
Of course, you might not yet be convinced you need hosted version control. If you already have the resources to deliver the necessary services in-house, you might want to stick with what you’ve got, especially if you’re not convinced the cloud is secure enough to meet your needs. After what happened at Code Spaces, no one could blame you. The company has essentially been put out of business, and we have no way of knowing what the full impact has been on their clients. That said, if you’re developing open source solutions, on-premises hosting misses out on the benefits of community exposure and support. Even with private repositories, hosted source control can be a good solution for some teams. But do your homework first. The investment you make now will help to ensure the integrity of your code for a long time to come.
SQL Server Source Control Basics
Robert Sheldon has co-authored a free eBook on SQL Server Source Control, along with Rob Richardson and Tony Davis. The book gives a detailed walkthrough of the concepts, complete with code samples.