2

Git, the concept

2.1

What is version control?

Let’s start with the basics—“what is version control?”

Version control is a mechanism for recording changes made to any files within a software project. It records all the changes, what files were affected by each change and a reason explaining why those changes were made. It also records who made the change and the time and date of the change.

It keeps a record of every change made within the project and allows any modified file to be reverted to a previous state. If you change an image on a website, for example, you can always go back and find the original version.

Version control systems do other things too. They can show the differences between two versions of the software (even down to lines within a file); they allow multiple people to work on the project at the same time, even on the same file; and they provide mechanisms for resolving conflicts (where two people have modified the same section of a file, for example).
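To give a feel for what "differences down to lines within a file" means, here is a minimal sketch using Python's standard difflib module. The two file versions and the helper name line_changes are invented for illustration; this is not how Git computes differences internally, just the same idea on a small scale.

```python
import difflib

# Two hypothetical versions of the same settings file, one line per entry.
old_version = [
    "title = Home",
    "colour = blue",
    "footer = Copyright 2023",
]
new_version = [
    "title = Home",
    "colour = green",
    "footer = Copyright 2023",
]


def line_changes(old, new):
    """Return only the removed ('- ') and added ('+ ') lines between two versions."""
    return [
        line
        for line in difflib.ndiff(old, new)
        if line.startswith(("- ", "+ "))
    ]


changes = line_changes(old_version, new_version)
print("\n".join(changes))
```

Only the line that actually changed is reported; the unchanged lines around it are left out.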

Version control systems can be applied to any kind of project: a website, a documentation project, a software application, an engineering control system, anything at all, as long as it’s a collection of files that can be stored on a computer.

The version control system does not itself edit or modify any of the files within the project; it just records changes and, where it recognises a file type, is able to display the changes that have occurred to it.

The version control system does not care what software application is used to modify files within the project; it can be anything: a text editor, word processor, file manager, graphics editor, specialist programming application &c. All it cares about is knowing that a change has taken place and why.

Version control systems simply record any change made within a collection of files (the project), who made it, when it was made and the reason why. That’s all.
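The "who, when, what and why" of a change can be pictured as a small record. The sketch below is an assumption of mine about how such a record might look; the names ChangeRecord and record_change are hypothetical and do not correspond to anything inside Git.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeRecord:
    """One entry in a hypothetical change log."""
    author: str              # who made the change
    timestamp: datetime      # when it was made
    reason: str              # why it was made
    files: tuple[str, ...]   # which files were affected


def record_change(log, author, reason, files):
    """Append a change record to the log; refuse a change with no reason."""
    if not reason.strip():
        raise ValueError("a reason is required for every change")
    entry = ChangeRecord(author, datetime.now(timezone.utc), reason.strip(), tuple(files))
    log.append(entry)
    return entry


log = []
record_change(log, "alice", "Replace the home page banner image", ["img/banner.png"])
```

Calling record_change with a blank reason raises an error, which mirrors the idea that a change is not stored without an explanation.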

2.1.1

Why use version control?

Well, mainly for the reasons I illustrated at the start of this chapter: if you don’t have a VCS, things get out of hand very quickly.

The proper reasons are:

  1. Collaboration: the VCS allows a team of people to work on the same project at the same time

  2. Version storage: the VCS manages all the versions of all the files, stores them, names them and can recover them

  3. Tracking the changes: the VCS records precisely what was changed and a reason must be given for the change

  4. Restoring changes and regression paths: the VCS allows individual files or groups of files to be restored to a previous version.

Looking at these in turn:

Collaboration

If your idea of working in collaboration is shouting across the office that you’re working on Function Block 12 and no one else should touch it until you’re finished, then you are probably doing it wrong.

With a version control system, anybody within the project team should be able to work on any file at any time, even the same file†1.

The VCS should be clever enough to work out what has changed and allow all these changes to be merged back into a common version.

The VCS should allow any member of the team to take a copy of the project, work on it locally on their machine and then recombine all the changes back into the main project.

†1 I’m not sure I recommend this though—sometimes it’s inevitable, but try to avoid it as much as possible.

Version storage

Storing a version of software isn’t as easy as you think; look at my example at the start of this section.

When you store a version, it raises the question of what you actually store. Do you store the whole project and every file within it (even though most of these files won’t have changed), or do you just store the files you’ve modified (hoping that you’ve remembered to include all the files that you’ve changed)?

If it’s the first case, you will have, as I did, lots of unnecessary information and files stored on your system.

In the second case, it is very difficult to have a complete picture of the whole project at any given point.

This is what the VCS does for a living; it keeps track of the changes and is able to reconstruct the whole project at any given revision. It does it for you and provides the complete set of project files (or indeed just individual files) from any particular revision.
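The idea of storing only what changed, yet still being able to rebuild the whole project at any revision, can be sketched in a few lines. This is a toy model under my own assumptions (the class TinyStore and its methods are hypothetical), and it is not a description of how Git stores data internally.

```python
class TinyStore:
    """Toy revision store: each revision holds only the files that changed."""

    def __init__(self):
        # One dict per revision, mapping path -> new content (None = deleted).
        self._revisions = []

    def commit(self, changed_files):
        """Store one revision's changes and return its 1-based revision number."""
        self._revisions.append(dict(changed_files))
        return len(self._revisions)

    def checkout(self, revision):
        """Rebuild the complete project as it stood at the given revision."""
        if not 1 <= revision <= len(self._revisions):
            raise ValueError(f"no such revision: {revision}")
        project = {}
        # Replay every stored change, oldest first, up to the requested revision.
        for changes in self._revisions[:revision]:
            for path, content in changes.items():
                if content is None:
                    project.pop(path, None)
                else:
                    project[path] = content
        return project


store = TinyStore()
r1 = store.commit({"index.html": "<h1>Hello</h1>", "logo.png": "original image"})
r2 = store.commit({"logo.png": "new image"})
r3 = store.commit({"about.html": "<p>About us</p>", "logo.png": None})
```

Here store.checkout(r1) returns both original files, store.checkout(r2) returns the same project with the new logo, and store.checkout(r3) returns index.html and about.html with the logo removed, even though revisions 2 and 3 each stored only the files that changed.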

Tracking changes

This is important; it comes down to how revisions are numbered, and what is in each revision.

You would think numbering would be straightforward: start at V01.00 and work up. Unfortunately, this generally only works when you have one person working on a project in a linear fashion.

What version does a file get if two people work on it at the same time? Perhaps files should be revised with the date and time? Again, this falls down if you need two different versions of the same file (perhaps one for internal use and one for customers &c.).

It’s fairly certain that unless it is a single person project, things will get out of hand quickly.

The next thing is knowing what changes are contained within each revision. In my example at the start, I recorded the changes made to each file in a header at the start of each file. That is OK when there are only a small number of files, but by the end I had thousands of files, and it meant I had to open each file to see when it had changed and what those changes were.

Even trying to keep a separate log (in, say, a spreadsheet) doesn’t work; you eventually forget to record something.

A VCS keeps a log of all the changes and ensures that no change is stored without a corresponding record. It won’t accept the change without a statement from the person making it. That doesn’t stop the person entering complete bollocks, but it’s a start.

This log is visible and easy to navigate. It shows the changes made, when they were made, what was changed, who made the change and, most importantly, a statement explaining why the changes were made.

Restoring changes

Being able to restore older versions of files or even the whole project makes it difficult to screw things up. It’s always possible to go back to an earlier version and start again.

A regression path shows the changes that have been made and the order in which they were made. It allows a user to track the changes backwards and, if a bug has been introduced, regress to an earlier version.
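Tracking backwards along a regression path can be sketched as a walk from the newest revision to the oldest, stopping at the first one that still behaves correctly. Everything in this example is made up for illustration: the history list, the three versions of the calculation and the helper last_good_revision are assumptions, not part of any real VCS.

```python
def total_v1(price, quantity):
    return price * quantity


def total_v2(price, quantity):
    return round(price * quantity, 2)


def total_v3(price, quantity):
    return price + quantity  # the bug creeps in here


# Hypothetical history, oldest first: (log message, version of the code).
history = [
    ("Add total calculation", total_v1),
    ("Round total to two decimal places", total_v2),
    ("Tidy up arithmetic", total_v3),
]


def passes_check(revision):
    """A simple check that a revision still calculates the total correctly."""
    _message, total = revision
    return total(2.5, 4) == 10.0


def last_good_revision(history, is_good):
    """Walk from newest to oldest and return the index of the first good revision."""
    for index in range(len(history) - 1, -1, -1):
        if is_good(history[index]):
            return index
    return None


good = last_good_revision(history, passes_check)
print(history[good][0])
```

The walk skips the newest revision, which fails the check, and stops at the one before it, which is the version to regress to.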

There is one final point:

Project backup

A VCS doesn’t necessarily provide backup facilities. If it exists on a single machine and the VCS repository is deleted, then there will be no backup (unless the operator made one). However, if the VCS is distributed (like Git, with the project hosted on a service such as GitHub), or indeed if more than one person is working on the project, then each team member will have a local copy of the project that can be restored.

With Git and GitHub it is possible to keep an online copy of the project that is up to date as of the last push and can be copied to another PC.

Well, two final points:

VCS the downside

Yes, there is a downside: it all takes time. You have to think about when to add a new version. I also spend some time composing the messages that get stored with the changes; I like them to be right.

I find that managing the requirements of the VCS spoils the flow of the work; it interrupts the creative juices, as it were.

In short, managing the VCS is time consuming and disruptive.


