Wednesday, March 12, 2008

To Fork or Not to Fork

... that is the question. You're working on a big project and you start a new version, but you have to keep the old version alive. Do you fork?

I have to say that I hate this question. Neither answer is right, and it's a bigger problem today with internet services than it was with monolithic apps shipped on CD, because you might deploy a new rev of the old version at any time to fix a security hole.

If you fork, maintaining the old version can be a royal pain. Bug fixes don't automatically roll back (or forward), and maintaining two independent copies of the source code on dev machines is always confusing.

If you don't fork, you end up with lots of places where you're checking which version it is. There are lots of places where it's easy to screw up, like config files and data files. You'll probably end up forking individual files and classes. Do your new classes inherit from the old version of the equivalent classes? However you answer that, it can play havoc with inheritance. And you need a way to build or deploy the old version without it being corrupted by the new version.

Most of us use source code control systems. Why don't they help? Source code control systems are really pretty trivial systems. I've written two myself and neither was brilliant software. Just to note: both systems were built more than ten years ago and, each time, we had needs that couldn't be met by off-the-shelf systems, of which there were far fewer then. Both systems are long dead.

It seems to me that source code control systems are built as if the biggest problem is the database. Well, certainly that's a big problem, but it's pretty much a solved problem at this point. The real problem is the interface to the database (and by that I mean any GUI as well as any shell commands).

The Missing Features

There are at least two large features that source code control systems seem to be lacking.

Semi-fork. Why can't I start a fork of a codebase and decide, on a file-by-file basis, which files get forked? Whenever I edit a file, I can choose whether that file gets forked or not. Even better, take it below the file level. If the source code control system is integrated into the development environment, why can't I choose on a class-by-class or method-by-method basis? Instead of having if (v2) or #if V2 sprinkled throughout my code, why not build this ability deep into the dev environment? Allowing the user to switch between showing the different versions isn't rocket science -- even Microsoft Word can toggle between showing different versions of a document.
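To make the semi-fork idea a little more concrete, here is a minimal sketch in Python of how the bookkeeping might work. It is only an illustration under stated assumptions: the SemiFork class and its fork_file, read, and write methods are hypothetical names, not the API of any real source code control system, and the model works at whole-file granularity with just two versions, v1 and v2. The point it shows is that files you never fork stay shared, so a security fix made in one of them lands in both versions automatically.

```python
from dataclasses import dataclass, field


@dataclass
class SemiFork:
    """Hypothetical model of a semi-fork: only explicitly forked files get
    a v2-only copy; every other path is shared by v1 and v2."""
    base: dict[str, str]                                   # path -> contents shared by both versions
    forked: dict[str, str] = field(default_factory=dict)  # path -> v2-only contents

    def fork_file(self, path: str) -> None:
        # Fork on demand: v2 gets its own copy of this one file.
        self.forked[path] = self.base[path]

    def read(self, path: str, version: str) -> str:
        # v2 sees its private copy if the file was forked, else the shared one.
        if version == "v2" and path in self.forked:
            return self.forked[path]
        return self.base[path]

    def write(self, path: str, contents: str, version: str) -> None:
        if version == "v2" and path in self.forked:
            self.forked[path] = contents  # change stays in v2 only
        else:
            self.base[path] = contents    # unforked (or v1) edit updates the shared copy


repo = SemiFork(base={"net.py": "old net", "ui.py": "old ui"})
repo.fork_file("ui.py")                             # only ui.py diverges
repo.write("ui.py", "new ui", version="v2")         # visible in v2 only
repo.write("net.py", "security fix", version="v2")  # unforked, so v1 gets it too
```

A real implementation would also need per-class or per-method granularity and merge handling, which this sketch deliberately leaves out.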

Lightweight fork. I haul a laptop back and forth from work and I have all my current code on it, including stuff that I haven't checked in yet. I've hauled hard disks and flash drives as well, but syncing is a pain and, of course, it's "cheating" as far as source code control goes, which means that I can't check in when I'm using my home machine because the records on my work machine will get screwed up. The alternative is to enlist twice and check out code at work and at home independently. But, if I do that, I can't easily take work home that I haven't checked in. And I really don't want to check in code I'm in the middle of working on. It might not be tested -- hey, it might not even compile at the moment. A fork would solve the problem but forks are heavyweight and, generally, can only be done by an administrator.

Why can't I have a lightweight fork that I can create at any time? In fact, whenever I start making changes, it could instantly be a lightweight fork. I can commit that fork and I have a backup that I can also get on any other machine. When I commit the changes into the main branch of the codebase, the fork is automatically removed (though it's remembered for historical purposes). The underlying systems probably support this already. We just need the UI to catch up.
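The lightweight-fork lifecycle described above can be sketched in a few lines of Python. This is an illustrative model only: Repo, start_fork, commit_to_fork, checkout_fork, and merge_fork are hypothetical names rather than commands from any real system, and the sketch ignores conflicts and concurrent changes to the main branch. Work in progress lives in a named fork on the server, any machine can check it out, and merging retires the fork while keeping its name in the history.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Hypothetical server-side model of lightweight forks (not a real VCS API)."""
    main: dict[str, str] = field(default_factory=dict)              # path -> contents on main
    forks: dict[str, dict[str, str]] = field(default_factory=dict)  # live fork name -> pending changes
    history: list[str] = field(default_factory=list)                # names of forks already merged

    def start_fork(self, name: str) -> None:
        # Cheap to create: a fork starts as an empty set of pending changes.
        self.forks[name] = {}

    def commit_to_fork(self, name: str, path: str, contents: str) -> None:
        # Backs up work in progress without touching main, so untested
        # (or even non-compiling) code is safe to commit here.
        self.forks[name][path] = contents

    def checkout_fork(self, name: str) -> dict[str, str]:
        # Any machine sees main with the fork's changes layered on top.
        return {**self.main, **self.forks[name]}

    def merge_fork(self, name: str) -> None:
        # Committing to main applies the changes and removes the fork,
        # but its name is remembered for historical purposes.
        self.main.update(self.forks.pop(name))
        self.history.append(name)


repo = Repo(main={"app.py": "v1"})
repo.start_fork("laptop-wip")                             # at work, before heading home
repo.commit_to_fork("laptop-wip", "app.py", "half-done")  # untested work, backed up
at_home = repo.checkout_fork("laptop-wip")                # home machine sees the work in progress
repo.merge_fork("laptop-wip")                             # fork retired, name kept in history
```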

Moving Forward

It seems to me that the feature set of source code control systems hasn't changed much in a long time. What's your favorite missing feature in source code control systems? What features do constantly updated, always-live services need? And how do we move source code control systems forward so they incorporate features like these?

3 comments:

tante said...

To get simple forking, try one of the distributed VCSs like git, Mercurial, or Bazaar. You just create a new branch when you are ready to leave work; at home, you get that branch, work on it, and merge it with your branch at work.
When you get to work, you commit to the central repo without anyone seeing your branches and without breaking the "master" branch.

The distributed model gives you many of the features you want, and the distributed VCS implementations can usually interface with the big old-style VCS implementations like svn.

Roy Leban said...

Thanks for the pointers. I confess I'm not familiar with any of them. In the last couple of years, I've used Subversion, CVS, and (ack!) SourceSafe.

At a quick glance, Bazaar certainly looks like it's worth checking out.

tante said...

Did you check any DVCS out yet? I'd be interested in hearing your opinion on them.
