Using Version Control
Introduction
This paper is a short description of what version control is and how it works. It is not meant to be a complete, in-depth study of Version Control Systems (VCS), but an introductory view for readers who have little or no background in version control. Nor should it be taken as a paper about Software Configuration Management (SCM). A VCS is one of the critical components of an SCM system, but it is by no means the only one.
Overview
One of the strengths of the Unified Process (UP) is that it looks at deployment even at the beginning of the development cycle. There are many advantages to keeping the end goal in sight from the very beginning of a project. This long-term view of source code control leads to a very deployment-centric development process. The advantage of a deployment-centric development process is that both the customer and system maintenance become the focal point of system development. This tends to improve responsiveness to customer needs and helps produce a more successful system. As a side effect, customers can be internal, external, or both, and all receive the same level of service.
It is within this context that version control should be viewed. There is no reasonable way to do version control without some sort of tool, and a version control tool is only as good as the process that uses it. If development follows a sound process, the code produced will tend to be of higher quality. One author’s opinion of how important process is can be seen in this comment:
We have seen a number of projects work for considerable periods of time (months if not years) using a single mainline with no branches. The fact that they could is a testament to the soundness of their practices. [1]
In no way should these statements be taken to mean that good version control and process alone will create better systems and system quality. Many other factors contribute to creating a better system. These include good decomposition of the system into very small components, small changes to the system at any one time, and Test Driven Development (TDD). As can be seen, these additional factors are not really process factors; they are implementation and management factors. The conclusion they should lead to is that the goal is achieved through an overall more mature development process, not just a good SCM process.
While on the subject of SCM, there are two types of SCM tools on the market today: Version Control and Process Management. [2] It is very important to keep these categories in mind when looking at different SCM tools. If a good process already exists, then a Version Control tool is all that is needed. However, if process is lacking, then a Process Management tool will be required. In selecting a version control tool there is one basic question that must be answered first: does the tool have to fit the current process, or should the process be improved along with the introduction of a new tool? This question needs to be answered before tool selection can be done adequately.
File Locking
There are two distinct approaches to file locking in a version control system: Optimistic and Pessimistic. Each has its own set of advantages and problems. A version control tool is said to have Optimistic locking when it allows more than one user to edit a file at the same time. A version control tool is said to have Pessimistic locking when a file can be edited by only one user at a time.
Pessimistic locks run certain risks, which include:
If a developer has a file checked out and becomes sick, or goes on vacation, the file might be locked for an extended time and this can cause problems for other developers needing the file.
Even though a file is locked so no other modification can be made to it, another file that it depends on may be changed in such a way that the checked out file breaks anyway.
Optimistic locks run certain risks, which include:
Code may have to be merged many times.
Some file types, such as graphics or binary document files, cannot be merged.
Pessimistic locks have certain advantages, which include:
Inherent work order process.
Forced smaller modules.
Optimistic locks have certain advantages, which include:
Faster development times.
No waiting for other developers.
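The difference between the two locking approaches can be illustrated with a minimal sketch. The class and method names below (PessimisticRepo, OptimisticRepo, checkout, checkin) are hypothetical and do not correspond to any particular VCS. The sketch only assumes that a pessimistic tool refuses a second checkout, while an optimistic tool accepts concurrent edits and detects the conflict at check-in time.

```python
class PessimisticRepo:
    """Only one user may hold a given file at a time (exclusive checkout)."""

    def __init__(self):
        self.locks = {}  # file path -> user currently holding the lock

    def checkout(self, path, user):
        holder = self.locks.get(path)
        if holder is not None and holder != user:
            # The second developer has to wait until the lock is released.
            raise RuntimeError(f"{path} is locked by {holder}")
        self.locks[path] = user

    def checkin(self, path, user):
        if self.locks.get(path) != user:
            raise RuntimeError(f"{user} does not hold the lock on {path}")
        del self.locks[path]


class OptimisticRepo:
    """Anyone may edit; a conflict is only detected at check-in time."""

    def __init__(self):
        self.revision = {}  # file path -> latest revision number

    def checkout(self, path):
        # Return the revision that the working copy is based on.
        return self.revision.get(path, 0)

    def checkin(self, path, base_revision):
        current = self.revision.get(path, 0)
        if base_revision != current:
            # Someone else checked in first: merge, then try again.
            raise RuntimeError(
                f"{path} changed since revision {base_revision}; merge required"
            )
        self.revision[path] = current + 1
        return self.revision[path]
```

In the pessimistic model the cost is paid up front as waiting time; in the optimistic model it is paid later as a merge, which is why file types that cannot be merged are a risk there.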
Repository
There are two types of repositories found in Version Control Systems: Centralized and Distributed. For a little more information please refer to http://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/. The oldest type of store in a VCS is the centralized repository. As its name suggests, all data is stored in a central location. This type of repository is good for backup, undo, and synchronization. In some ways it falls short for merging and branching, but in many ways these two issues can be addressed by process.
With a Distributed repository, each individual user has a copy of the repository. The focus of a distributed repository is sharing changes. Everyone has a local sandbox to work in and essentially works offline from everyone else. Also, in a distributed model there is no real latest version. Depending on the development life cycle being used, this may or may not be a problem. If the life cycle goal is to always have a shippable version, there are issues that will need to be addressed with a distributed model. This may include promoting all current changes up to a common central repository.
Basic Implementation
The goal of an ideal Version Control Process is to have a single mainline of code from which the entire system is developed. If the VCS being used includes labeling functionality, this single mainline can be achieved by using labels. This type of mainline is described as a Virtual Codeline. [3] For all this to come together, all projects must consist of very small modules, and it must be possible to change a module in a very short period of time before the rest of the application has to be changed. The virtual codeline is created by the way the labels are used. In reality each label represents a branch and merge of a set of changes. This only works if design implementation and project management are both working towards the same goal of a single code base.
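As a rough illustration of how labels can stand in for a codeline, the sketch below treats a label as a named record of which revision of each file belongs together. The LabeledRepo class and its methods are hypothetical and are not the API of any real VCS; the sequence of labels plays the role of the virtual codeline.

```python
class LabeledRepo:
    """Toy repository: per-file history plus named labels over it."""

    def __init__(self):
        self.history = {}  # path -> list of contents; index is revision - 1
        self.labels = {}   # label name -> {path: revision number}

    def commit(self, path, content):
        """Store a new revision of one file and return its revision number."""
        self.history.setdefault(path, []).append(content)
        return len(self.history[path])

    def label(self, name):
        """Record the current revision of every file under a label."""
        if name in self.labels:
            raise ValueError(f"label {name!r} already exists")
        self.labels[name] = {
            path: len(revisions) for path, revisions in self.history.items()
        }

    def snapshot(self, name):
        """Return the file contents exactly as they were when labeled."""
        return {
            path: self.history[path][revision - 1]
            for path, revision in self.labels[name].items()
        }


repo = LabeledRepo()
repo.commit("main.c", "version 1")
repo.label("build-1")
repo.commit("main.c", "version 2")
repo.commit("util.c", "helper")
repo.label("build-2")
```

Here repo.snapshot("build-1") still returns only the first revision of main.c even after later commits, which is what lets a label identify a consistent set of changes without a physical branch.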
Branching Concepts
The basic design of the SCM system also assumes that there are three types of codelines that may exist at any time: Release, Service Pack, and Hot Fix. Release and Service Pack code always goes through a complete Quality Assurance (QA) cycle, whereas a Hot Fix is only superficially reviewed by QA and should only be released to a limited number of sites. To make this process work there is an assumption that code is merged from one level to the next. A level is never skipped. [4] The following diagram shows how this is done.
Forcing merges to go only one level at a time has the advantage of helping to prevent a branch that goes on forever. By eliminating the possibility of a branch going on forever we also remove the illusion of gains that do not exist.
There is little lasting value in branching if the branches will diverge indefinitely without propagating any change … But the gains of such persistent variants are very short-term, and the resulting long-term costs of proliferating code and effort across the same codebase can quickly overwhelm whatever short-term gain is to be had. So whenever we branch we almost always will need to merge back to another branch. [4]
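The rule that merges move only one level at a time can be expressed as a small check. The ordering of the levels in the list below (Hot Fix next to Service Pack, Service Pack next to Release) is an assumption made for illustration, and the function names are hypothetical.

```python
# Assumed ordering of codeline levels; adjacent entries are "one level apart".
LEVELS = ["Hot Fix", "Service Pack", "Release"]


def can_merge(source, target):
    """A direct merge is allowed only between adjacent levels."""
    return abs(LEVELS.index(source) - LEVELS.index(target)) == 1


def merge_path(source, target):
    """List the single-level merges needed to carry a change to the target."""
    start, end = LEVELS.index(source), LEVELS.index(target)
    step = 1 if end > start else -1
    return [(LEVELS[i], LEVELS[i + step]) for i in range(start, end, step)]
```

Under this assumption a hot fix cannot be merged straight into the Release codeline; merge_path("Hot Fix", "Release") yields two merges, the first of which passes through the Service Pack.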
A side effect of this design is a very delicate balance between Early and Late branching. Each of these has its own risks. Although there are many strengths and weaknesses to both, a couple of key points need to be considered here. Early branching is better suited to larger or more formal efforts that require a high degree of fine-grained isolation and control; you assume fewer safety risks but pay the price of additional merging and propagation. Late branching is good for projects that can afford to risk losing a bit of safety in order to gain more productivity; less branching and integration means less overhead, but also less isolation and verification. [5]
Complex Branching
The current system design handles almost all situations in systems development. There is, however, one case that complicates the process. It is not too unusual and has to be addressed: a new version of the system has to be started before the current service pack is completed. The following diagram demonstrates the case.
In this case the Service Pack is branched from the Mainline. When the New Release is started, it is also branched from the Mainline. Once the Service Pack is completed, it is merged back into the Mainline. At this point the Mainline is “rebased” back into the New Release branch. It should be noted that, since the New Release branch is not ready to be deployed, all Hot Fixes must occur on the Service Pack branch. There are two reasons. First, the Service Pack branch will be the shortest-lived branch after the New Release is started, which should be the normal situation. Second, the fixes will be merged into the Mainline before they are merged into the New Release branch, which holds to the rule of merging one level at a time.
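The order of operations in this case can be traced with a deliberately simplified model in which each codeline is just a set of change identifiers and a merge is a set union. Real merges work on file contents and can conflict, so this is only a sketch of the sequencing; the change names and the function simulate_complex_branching are hypothetical.

```python
def branch(source):
    """Create a new codeline that starts with every change in the source."""
    return set(source)


def merge(source, target):
    """Bring every change from the source codeline into the target."""
    target |= source


def simulate_complex_branching():
    mainline = {"release-1"}

    service_pack = branch(mainline)  # Service Pack is branched from Mainline
    new_release = branch(mainline)   # New Release starts before the SP is done

    service_pack.add("sp-change")
    new_release.add("new-feature")
    service_pack.add("hot-fix")      # Hot Fixes go on the Service Pack branch

    merge(service_pack, mainline)    # completed Service Pack merges to Mainline
    merge(mainline, new_release)     # Mainline is "rebased" into New Release

    return {
        "mainline": mainline,
        "service_pack": service_pack,
        "new_release": new_release,
    }


codelines = simulate_complex_branching()
```

After the run, the New Release codeline holds the hot fix and the service pack change in addition to its own feature, while the Mainline does not yet contain the unreleased feature.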
Build System Influence
The build system can have a great deal of influence on the system design. One of the key factors that a build system may be based on is the idea that “The basic rule of thumb is that you should be able to walk up to the project with a virgin machine, do a checkout, and be able to fully build the system.” [6]
This “rule of thumb” that Martin Fowler talks about is a cornerstone of the system design. If the build system is capable of working on any machine, then it is very easy to create a “sandbox” for a developer to work in on his or her own system. This way a developer can work on code with no effect on the main codeline. [7] The other advantage is that the same system that is used to build the final executables is used on the developer's system, so all code builds are consistent.
One of the major aspects of an SCM system is the ability to tell exactly which code is currently deployed. One way to achieve this is to ensure that executables can report their current and correct version. This requirement on the build system will also have a major influence on the VCS. To automate this functionality there must be predictability within the VCS. Using labels instead of file version numbers adds flexibility in automating how shipped code reports its version number. It also allows the version number to be tied to the exact set of source code files used to create the executable.
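One possible way to tie a reported version to an exact set of source files is for the build step to combine the label with a digest of the labeled files. This is a minimal sketch under that assumption, not a description of any specific build system; the function names, the label "REL-2.1", and the file contents are all hypothetical.

```python
import hashlib


def build_version_info(label, files):
    """Combine a VCS label with a digest of the source files it covers.

    `files` maps each path to the file's bytes as retrieved for the label.
    """
    digest = hashlib.sha256()
    for path in sorted(files):  # sort so the result is order-independent
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
        digest.update(files[path])
        digest.update(b"\0")
    return {"label": label, "source_digest": digest.hexdigest()}


def version_string(info):
    """Format the value an executable would report as its version."""
    return f"{info['label']} ({info['source_digest'][:12]})"


info = build_version_info(
    "REL-2.1",
    {"main.c": b"int main(void) { return 0; }\n", "util.c": b"/* helpers */\n"},
)
reported = version_string(info)
```

Because the digest changes whenever any labeled file changes, two executables reporting the same string were built from the same sources, which is the predictability the build system needs from the VCS.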
Terms
One of the most important parts of communication is ensuring that there is an understanding of what is being said. At the heart of this is a common definition of terms. This section gives commonly used terms and their definitions.
Mainline
All components must compile and link, and pass regression tests; completed, tested new features may be checked in. [2]
Configuration Item
Component identified as an item to be controlled. [5]
Release
Any code being worked on for the next released version of the system.
Service Pack
Any code being worked on for the current service pack (update) of the system.
Hot Fix
Any code that requires a fast-turnaround fix for a customer.
Late Branching
A branch that is created at the time of a codeline conflict. [7]
Early Branching
A branch that is created before any codeline conflict exists. [7]
Rebase
Merging a codeline from the Mainline into another branch.
SCM
Software Configuration Management
VCS
Version Control System
Test Driven Development, TDD
Development process where tests are created before code.
References
1 – “Branching and Merging – An Agile perspective” by R. Cowham found at http://www.cmcrossroads.com
2 – “Configuration Management Frequently Asked Questions” found at http://www.daveeaton.com/scm/CMFAQ.html#GQwhatisCM
3 – “Streamed Lines: Branching Patterns for Parallel Software Development” by B. Appleton, S. Berczuk, R. Cabrera, R. Orenstein found at http://www.cmcrossroads.com/bradapp/acme/branching/
4 – “Lean Branching” by R. Cowham, B. Appleton, and S. Berczuk found at http://www.cmcrossroads.com
5 – “Streamed Lines: Branching Patterns for Parallel Software Development” by B. Appleton, S. Berczuk, R. Cabrera, R. Orenstein found at http://www.cmcrossroads.com/bradapp/acme/branching/
6 – “Continuous Integration” found at http://martinfowler.com/articles/continuousIntegration.html
7 – “Making Your Source Control System Work for You” found at http://www.wrox.com/WileyCDA/Section/id-110057.html