Introduction
This paper briefly describes what version control is and how
it works. It is not a complete, in-depth study of Version
Control Systems (VCS), but an introduction for readers who have
little or no background in version control. Nor should it be
taken as a paper about Software Configuration Management (SCM).
A VCS is one of the critical components of an SCM system, but it
is only a small part of such a system and in no way the only
component.
Overview
One of the strengths of the Unified Process (UP) is that it
looks at deployment even at the beginning of the development
cycle. There are many advantages to keeping the end goal in
sight from the very beginning of the project. This long-term view
of source code control leads to a very deployment-centric
development process. The advantage of a deployment-centric development
process is that both the customer and system maintenance become the
focal point of system development. This tends to improve responsiveness
to customer needs and helps produce a more successful system. As a
side effect, customers can be internal, external, or both,
and all receive the same level of service.
It is within this context that version control should be viewed.
There is no reasonable way to do version control without
some sort of tool, and a version control tool is
only as good as the process that uses it. If development
follows a sound process, the resulting code will generally
be a higher quality product. One author’s opinion of how
important process is can be seen in their comment:
We have seen a number of projects work for considerable periods
of time (months if not years) using a single mainline with no branches.
The fact that they could is a testament to the soundness of their
practices. [1]
These statements should in no way be taken to mean that good SCVC and process
alone will create better systems and system quality. Many other
factors contribute to creating a better system. Some of these
include good decomposition of the system into very small
components, small changes to the system at any one time, and Test Driven
Development (TDD). As can be seen, these additional factors are not really
process factors; they are implementation and management factors.
The conclusion to draw from them is that the goal is achieved
through a more mature overall development process, not just a good
SCM process.
On the subject of SCM, there are two types of SCM tools on the market
today: Version Control and Process Management. [2] It is very
important to keep these categories in mind when looking at different
SCM tools. If a good process already exists, then a Version Control tool
is all that is needed. However, if process is lacking, then a Process Management
tool will be required. In selecting a version control tool there is one
basic question that must be answered first: Does the tool have to fit the current
process, or should the process be improved along with the adoption of a new tool?
This question needs to be answered before tool selection can be done adequately.
File Locking
There are two distinct approaches to file locking in a version control
system: Optimistic and Pessimistic. Each has its own set of
advantages and problems. A version control tool is said to have Optimistic
locking when it allows more than one user to edit a file at the same time. A
version control tool is said to have Pessimistic locking when a file can be
edited by only one user at a time.
Pessimistic locks run certain risks, which include:
If a developer has a file checked out and becomes sick, or goes on vacation,
the file might be locked for an extended time and this can cause problems for
other developers needing the file.
Even though a file is locked so no other modification can be made to it,
another file that it depends on may be changed in such a way that the
checked out file breaks anyway.
Optimistic locks run certain risks, which include:
Code may have to be merged many different times.
Some file types, such as graphics or binary document files, cannot be merged.
Pessimistic locks have certain advantages, which include:
Inherent work order process.
Forced smaller modules.
Optimistic locks have certain advantages, which include:
Faster development times.
No waiting for other developers.
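The two locking approaches can be illustrated with a minimal sketch.
The class and method names below (PessimisticFile, OptimisticFile,
checkout, checkin) are hypothetical and do not correspond to any
particular version control tool; the sketch only shows where each
approach makes a second developer wait or merge.

```python
class LockError(Exception):
    """Raised when a pessimistic checkout or check-in is refused."""


class MergeRequired(Exception):
    """Raised when an optimistic check-in is based on a stale revision."""


class PessimisticFile:
    """Hypothetical model: only one user at a time may edit the file."""

    def __init__(self, content=""):
        self.content = content
        self.locked_by = None

    def checkout(self, user):
        # A second user is blocked until the first user checks in.
        if self.locked_by is not None and self.locked_by != user:
            raise LockError(f"locked by {self.locked_by}")
        self.locked_by = user
        return self.content

    def checkin(self, user, new_content):
        if self.locked_by != user:
            raise LockError(f"{user} does not hold the lock")
        self.content = new_content
        self.locked_by = None


class OptimisticFile:
    """Hypothetical model: anyone may edit; conflicts surface at check-in."""

    def __init__(self, content=""):
        self.content = content
        self.revision = 0

    def checkout(self, user):
        # Nobody is blocked; each user remembers the revision they started from.
        return self.content, self.revision

    def checkin(self, user, new_content, base_revision):
        if base_revision != self.revision:
            # Someone else checked in first, so this user must merge.
            raise MergeRequired(f"{user} must merge with revision {self.revision}")
        self.content = new_content
        self.revision += 1
```

In the pessimistic model the cost appears at checkout time, as waiting.
In the optimistic model the cost appears at check-in time, as a merge.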
Repository
There are two types of repositories found in Version Control
Systems: Centralized and Distributed. For a little
more information please refer to
http://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/.
The oldest kind of store in a VCS is the centralized repository. As
its name suggests, all data is stored in a central location.
This type of repository is good for backup, undo, and synchronization.
In some ways it falls short for merging and branching, but in many
ways these two issues can be addressed by process.
With a Distributed repository, each individual user has their own repository.
The focus of a distributed repository is sharing changes. Everyone
has a local sandbox to work in, and they essentially work offline from
each other. In a distributed model there is also no real latest version.
Depending on the development life cycle being used, this may or may
not be a problem. If the life cycle goal is to always have a
shippable version, there are issues that will need to be addressed
with a distributed model. This may include promoting all current
changes up to a common central repository.
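The promotion step can be sketched as follows. This is a simplified
model under stated assumptions: the Repository class, the change
identifiers, and the developer names are all hypothetical, changes are
treated as opaque identifiers, and merge conflicts are not modeled.

```python
class Repository:
    """Hypothetical repository holding an ordered list of change ids."""

    def __init__(self, name):
        self.name = name
        self.changes = []

    def commit(self, change_id):
        self.changes.append(change_id)

    def pull_from(self, other):
        """Copy any changes this repository does not yet have."""
        for change_id in other.changes:
            if change_id not in self.changes:
                self.changes.append(change_id)


central = Repository("central")
alice = Repository("alice")
bob = Repository("bob")

# Developers work offline from each other in their own repositories.
alice.commit("A1")
bob.commit("B1")

# Sharing is peer to peer, so no single repository is "the latest".
bob.pull_from(alice)

# Promotion: gather all current changes into one common repository
# so that a shippable version can be built from a single place.
for developer in (alice, bob):
    central.pull_from(developer)
```

Before the promotion step, alice and bob hold different sets of
changes. After it, the central repository holds every change and can
serve as the agreed latest version.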
Basic Implementation
The goal of an ideal Version Control Process is to have a
single mainline of code from which the whole system is developed.
If the VCS being used includes labeling functionality, this
single mainline can be achieved by using labels. This type of
mainline is described as a Virtual Codeline. [3] For all of
this to come together, all projects must consist of very small
modules, and it must be possible to change a module in a very short
period of time before the rest of the application has to be changed.
The virtual codeline is created by the way the labels
are used. In reality each label represents a branch
and merge of a set of changes. This only works if both the design
implementation and project management are working towards the
same goal of a single code base.
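One way to picture a label-based virtual codeline is as a mapping from
each label to the exact file revisions it selects. The sketch below is
only an illustration: the file names, label names, and helper
functions are hypothetical and are not taken from any specific VCS.

```python
# Hypothetical revision history: path -> contents of revision 1, 2, 3, ...
history = {
    "parser.c": ["parser v1", "parser v2", "parser v3"],
    "lexer.c": ["lexer v1", "lexer v2"],
}

# Hypothetical labels: label -> {path: revision number}.
# Read in order, the labels form the virtual codeline.
labels = {
    "BUILD_1": {"parser.c": 1, "lexer.c": 1},
    "BUILD_2": {"parser.c": 3, "lexer.c": 1},
}


def snapshot(label):
    """Return the file contents selected by a label."""
    return {path: history[path][rev - 1] for path, rev in labels[label].items()}


def changed_between(old_label, new_label):
    """List the files whose labeled revision differs between two labels."""
    old, new = labels[old_label], labels[new_label]
    return sorted(path for path in new if old.get(path) != new[path])
```

Here the step from BUILD_1 to BUILD_2 stands for one small set of
changes (to parser.c) that was made in isolation and then folded back
in, which is the branch and merge that each label represents.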
Branching Concepts
The basic design of the SCM system also assumes that there are
three types of code lines that may exist at any time: Release,
Service Pack, and Hot Fix. Release and Service Pack
code always goes through a complete Quality Assurance (QA)
cycle, whereas a Hot Fix is only superficially reviewed by
QA and should only be released to limited sites. For
this process to work, it is assumed that code is merged
from one level to the next and never skips a level in
between. [4] The following diagram shows how this is done.
Forcing merges to go only one level at a time has the
advantage of helping to prevent branches that
go on forever. By eliminating the possibility of a branch
going on forever, we also remove the illusion of non-existent gains.
There is little lasting value in branching if the branches
will diverge indefinitely without propagating any change …
But the gains of such persistent variants are very short-term,
and the resulting long-term costs of proliferating code and
effort across the same codebase can quickly overwhelm whatever
short-term gain is to be had. So whenever we branch we almost
always will need to merge back to another branch. [4]
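The one-level-at-a-time rule can be expressed as a small check. The
ordering of levels used here (Hot Fix, then Service Pack, then Release)
is an assumption made for illustration, and the function names are
hypothetical.

```python
# Assumed ordering of code line levels, for illustration only.
LEVELS = ["Hot Fix", "Service Pack", "Release"]


def can_merge(source, target):
    """A merge is allowed only between adjacent levels."""
    return abs(LEVELS.index(source) - LEVELS.index(target)) == 1


def merge_path(source, target):
    """Return the sequence of single-level merges from source to target."""
    i, j = LEVELS.index(source), LEVELS.index(target)
    step = 1 if j >= i else -1
    names = [LEVELS[k] for k in range(i, j + step, step)]
    return list(zip(names, names[1:]))
```

Under this assumed ordering, a Hot Fix cannot be merged directly into
a Release; merge_path("Hot Fix", "Release") returns the two merges that
must be performed instead, first into the Service Pack and then into
the Release.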
The side effect of this design is that there is a very
delicate balance between Early and Late branching. Each of
these has its own risks. Although both have many strengths
and weaknesses, there are a couple of key points that
need to be considered at this point. Early branching is better
suited to larger or more formal efforts that require a high
degree of fine-grained isolation and control; you assume less
safety risks but pay the price of additional merging and propagation.
Late branching is good for projects that can afford to risk losing
a bit of safety in order to gain more productivity; less branching
and integration means less overhead, but also less isolation and
verification. [5]
Complex Branching
The current system design handles almost all situations in
systems development. There is one case, however, that
complicates the process. This case is not too unusual and has to
be addressed: a new version of the system has to be
started before the current service pack is completed. The following
diagram demonstrates the case.
In this case the Service Pack is branched from the Mainline. When
the New Release is started, it is also branched from the Mainline. Once
the Service Pack is completed, it is merged back into the Mainline.
At this point the Mainline is “rebased” into the New Release
branch. It should be noted that since the New Release branch is not
ready to be deployed, all Hot Fixes must occur on the Service Pack
branch. There are two reasons for this. First, the Service Pack will
normally be the shortest-lived branch after the New Release is started.
Second, the fixes will be merged into the Mainline before
they are merged into the New Release branch, which holds to the rule of
merging one level at a time.
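The sequence of operations in this case can be traced with a minimal
sketch that treats each branch as a set of change identifiers. The
helper functions and the change names ("base", "feature-1",
"hotfix-1") are hypothetical, and conflicts are not modeled.

```python
# Each branch is modeled as the set of changes it contains.
branches = {"Mainline": {"base"}}


def branch(source, name):
    """Create a new branch as a copy of the source branch."""
    branches[name] = set(branches[source])


def change(name, change_id):
    """Record a change made directly on a branch."""
    branches[name].add(change_id)


def merge(source, target):
    """Bring every change on source into target."""
    branches[target] |= branches[source]


branch("Mainline", "Service Pack")
branch("Mainline", "New Release")       # started before the Service Pack is done

change("New Release", "feature-1")
change("Service Pack", "hotfix-1")      # Hot Fixes go on the Service Pack

merge("Service Pack", "Mainline")       # Service Pack completed
merge("Mainline", "New Release")        # the "rebase" into the New Release
```

After the last step the New Release contains the hot fix, and it got
there by way of the Mainline rather than directly from the Service
Pack. The Mainline itself still does not contain the unfinished
New Release work.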
Build System Influence
The build system can have a great deal of influence on the
system design. One of the key ideas that a build system may
be based on is that “The basic rule of thumb is that
you should be able to walk up to the project with a virgin machine,
do a checkout, and be able to fully build the system.” [6]
This “rule of thumb” that Martin Fowler talks about is a cornerstone
of the system design. If the build system is capable of working on
any machine, then it is very easy to create a “sandbox” for a developer
to work in on his or her own machine. This way a developer can work on
code with no effect on the main code line. [7] The other advantage is
that the same build system that produces the final executables
is also used on the developer’s machine, so all code builds are consistent.
One of the major aspects of an SCM system is the ability to tell exactly
which code is currently deployed. One way this can be achieved is by ensuring
that executables can report their current and correct version. This
requirement on the build system will also have a major influence on the
VCS. To automate this functionality there must be predictability within the
VCS. Using labels, instead of file version numbers, adds
flexibility in automating how shipped code reports its version number.
It also allows the version number to be related to the exact set of
source code files that was used to create the executable.
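As a rough illustration of tying a reported version to an exact set of
source files, a build step could derive a version stamp from the label
and the file revisions it selects. The function name, the label format,
and the use of a short SHA-256 digest are all assumptions made for this
sketch and are not part of any particular build system.

```python
import hashlib


def build_stamp(label, manifest):
    """Return a version string for an executable.

    label    -- the VCS label the build was made from (hypothetical format)
    manifest -- {path: revision} for every source file selected by the label
    """
    # Sort so that the same set of files always produces the same stamp.
    lines = [f"{path}@{rev}" for path, rev in sorted(manifest.items())]
    digest = hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()[:12]
    return f"{label}+{digest}"
```

Two builds from the same label and the same file revisions report the
same stamp, while a change to any file revision changes the stamp, so
a deployed executable can be traced back to the sources that built it.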
Copyright © 2008 David Demland. All Rights Reserved
Terms
One of the most important parts of communication is ensuring that
there is an understanding of what is being said. At the heart of this
is a common definition of terms. This section gives commonly
used terms and their definitions.
Mainline
All components must compile and link, and pass regression tests;
completed, tested new features may be checked in. [2]
Configuration Item
Component identified as an item to be controlled. [5]
Release
Any code being worked on for the next released version of the system.
Service Pack
Any code being worked on for the current service pack (update) of the system.
Hot Fix
Any code that requires a fast-turnaround fix for a customer.
Late Branching
A branch that is created at the time of a codeline conflict. [7]
Early Branching
A codeline that is created before any codeline conflicts exist. [7]
Rebase
Merging a codeline from the Mainline into another branch.
SCM
Software Configuration Management
VCS
Version Control System
Test Driven Development, TDD
Development process where tests are created before code.