
As far as I know, a common solution is to add a ?<version> to the script's src link.

I assume at this point that there isn't a better way than find-and-replace to increment these "version numbers" in all of the script tags?

You might have your version control system do that for you. Most version control systems have a way to automatically inject the revision number on check-in, for instance.

<script type="text/javascript" src="myfile.js?$$REVISION$$"></script>
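
If your version control system doesn't do this substitution for you, a small deploy step can. Here is a minimal sketch in Python, assuming a $$REVISION$$ placeholder in your HTML templates and a revision string obtained elsewhere (the templates directory and the hard-coded "1234" are purely illustrative):

from pathlib import Path

def stamp_revision(html_path, revision):
    # Replace every $$REVISION$$ token in the file with the given revision string.
    path = Path(html_path)
    text = path.read_text(encoding="utf-8")
    path.write_text(text.replace("$$REVISION$$", revision), encoding="utf-8")

# Example: stamp all templates with revision "1234" at deploy time.
for page in Path("templates").glob("*.html"):
    stamp_revision(page, "1234")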

Of course, there are always better solutions like this one.

Does anyone know if IE7 ignores this? It seems to ignore the appended data and use the cached file when I test in IE8 compatibility view.

I had always thought query strings had to be key-value pairs, as in ?ver=123. Thanks! :)

So when you add a higher JavaScript version, it will automatically be downloaded to the client's cache? Or am I understanding this the wrong way?

I think it's not about a higher or lower version number, but about changing the appended variable's value to something the browser couldn't have cached yet.

this will not work in chrome

We recently encountered the same issue, and the best I could come up with was a simple function that appended "?mod=123456", where 123456 was the Unix timestamp of the file's modification date. That seems to fix the issue while still allowing caching where appropriate. However, I have still seen browsers flat out ignore this and use the old JS anyway, but I don't know that there's an elegant "full fix."
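
For reference, a minimal sketch of that approach in Python (the paths and the script_tag helper are illustrative, not from the original comment): the file's modification time is appended as ?mod=..., so the URL only changes when the file itself changes.

import os

def script_tag(js_file, url_path):
    # Build a script tag whose URL changes whenever the file on disk changes.
    mtime = int(os.path.getmtime(js_file))  # Unix timestamp of last modification
    return '<script type="text/javascript" src="%s?mod=%d"></script>' % (url_path, mtime)

# script_tag("static/js/myfile.js", "/static/js/myfile.js")
# -> '<script type="text/javascript" src="/static/js/myfile.js?mod=1700000000"></script>'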

For awareness: this is considered to be a hack. This method tricks the browser into thinking that a new file is being specified, as it simply looks at the full file name without interpreting it. foo.js?1 is not the same name as foo.js?2, so the browser will think they are two different files. One downside is that both files will simultaneously exist in the users' cache, taking up unnecessary space.

caching - How can I force clients to refresh JavaScript files? - Stack...

javascript caching versioning

  • Made a change to code, realised it was a mistake and wanted to revert back?
  • Lost code or had a backup that was too old?
  • Had to maintain multiple versions of a product?
  • Wanted to see the difference between two (or more) versions of your code?
  • Wanted to prove that a particular change broke or fixed a piece of code?
  • Wanted to see how much work is being done, and where, when and by whom?
  • Wanted to experiment with a new feature without interfering with working code?

In these cases, and no doubt others, a version control system should make your life easier.

To misquote a friend: A civilised tool for a civilised age.

Sounds useful... until I have to learn and master it. Heh.

Good points. However, note that version control is not a backup! A backup is stored on a separate system/media, and keeps old backups for a while (just in case your repository gets screwed up somehow).

Couldn't agree more sleske. That's why along with our standard VM backup and nightly repository verification, I keep a mirror repository which is synced hourly and is also backed up and verified :) We use Subversion and have found svnedge to be a good product.

Hi Tim, how do you track your change history? How do you link your change history to an issue tracker or release notes? How do you manage merging different branches of your code? How do you find the changes you made in your last 100 versions? If you code alone, or never worry about why you changed code, then maybe just having a backup is enough, but I bet once you've used a decent VCS you will understand why so many people use them.

svn - Why should I use version control? - Stack Overflow

svn git version-control cvs

and Jeff's article

I feel your pain, and I wish there were a better answer. This might be closer to what you were looking for.

Generally, I feel there is no adequate, accepted solution to this, and I roll my own in this area.

As you can tell from my question, I am aware of the concept of deltas. My question is about conventions for creating those, preferably automatically.

Have you tried out DBDiff: github.com/DBDiff/DBDiff ? It's a good fit for what you're looking for @EranGalperin, as it does automated migrations for both schema and data in SQL. Disclosure: I'm the developer behind it!

sql - How do you version your database schema? - Stack Overflow

sql mysql schema versioning

The two common reasons you may want to store binaries in a Version Control System are (written in 2009):

  • store external third-party libraries. Usually one stores them in a Maven repository, but storing them in SVN allows you to have one and only one referential for all your needs: get your sources, and get the libraries you need to compile those sources. Everything comes from one repository.

(As noted by ivorujavaboy in 2017: "The only good reason to do this at present day is if you have STATIC libraries that will never change, which is a really rare case")

  • store deliveries for quicker deployment. Usually deliveries (the executables you build to deploy into production) are built on demand. But if you have many pre-production environments, and if you have many deliveries, the cost of building them for the assembly, integration, homologation and pre-production platforms can be high. A solution is to build them once, store them in a deliveries section of your SVN, and use them directly in your different environments. Note: this applies to development elements too: if you have a JAXB process which generates 900 POJO files (through XML binding), and you need to download that development set into multiple environments, you may want one compressed-file copy transaction rather than 900.

So yes, it is "acceptable/good to store runtime binaries in the SVN"... for the right reasons.

  • Wim Coenen rightfully mentions the disadvantages (bad practice, slow, mismatch between sources and stored delivery)

Also, if you can't use Maven or prefer ant, then storing the libraries in your repository where they can easily be checked out for an ant build makes sense.

A third common reason is storing bitmapped graphics, audio and video. These need to be version controlled too. Certain versions of the same video may only be appropriate for a certain version of the source code that uses it.

@Rob: a/ if they changed often and b/ if you need to retrieve an old version, then yes. If not, other repositories (not VCS-based) like Nexus will ensure some history for those elements, and you will be able to remove them from said repository much more easily than in a VCS.

+1 for the Nexus introduction, an idea I should explore myself.

@ivoruJavaBoy I agree. This was written in another time, 8 years ago ;) Let me edit the answer.

version control - Is it acceptable/good to store binaries in SVN? - St...

svn version-control binary

Martin Fowler wrote my favorite article on the subject, http://martinfowler.com/articles/evodb.html. I choose not to put schema dumps under version control, as alumb and others suggest, because I want an easy way to upgrade my production database.

For a web application where I'll have a single production database instance, I use two techniques:

A sequence of database upgrade scripts that contain the DDL necessary to move the schema from version N to N+1. (These go in your version control system.) And a version-history table, something like

create table VersionHistory (
    Version int primary key,
    UpgradeStart datetime not null,
    UpgradeEnd datetime
    );

gets a new entry every time an upgrade script runs, corresponding to the new version.

This ensures that it's easy to see what version of the database schema exists and that database upgrade scripts are run only once. Again, these are not database dumps. Rather, each script represents the changes necessary to move from one version to the next. They're the script that you apply to your production database to "upgrade" it.
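
A minimal sketch of such an upgrade runner, using Python and sqlite3 purely for illustration (the script naming scheme and the use of sqlite are my assumptions, not part of the answer): it reads the highest applied version from VersionHistory and runs only the numbered scripts that haven't been applied yet.

import datetime
import glob
import os
import sqlite3

def migrate(db_path, scripts_dir):
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS VersionHistory (
                        Version INTEGER PRIMARY KEY,
                        UpgradeStart TEXT NOT NULL,
                        UpgradeEnd TEXT)""")
    current = conn.execute("SELECT MAX(Version) FROM VersionHistory").fetchone()[0] or 0
    # Upgrade scripts are assumed to be named 0001.sql, 0002.sql, ... in scripts_dir.
    for path in sorted(glob.glob(os.path.join(scripts_dir, "*.sql"))):
        version = int(os.path.splitext(os.path.basename(path))[0])
        if version <= current:
            continue                                  # already applied; each script runs only once
        start = datetime.datetime.now().isoformat()
        conn.executescript(open(path).read())         # the DDL for moving from N to N+1
        conn.execute("INSERT INTO VersionHistory VALUES (?, ?, ?)",
                     (version, start, datetime.datetime.now().isoformat()))
        conn.commit()
    conn.close()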

  • A script to backup, sanitize, and shrink a production database. Run this after each upgrade to the production DB.
  • A script to restore (and tweak, if necessary) the backup on a developer's workstation. Each developer runs this script after each upgrade to the production DB.

Dumping (and versioning) the full DB schema after running new upgrade scripts is a good way to make information available to other tools in your build/deploy process as well. Also, having the full schema in a script means being able to "spin up" a fresh database without going through all the migration steps. It also makes it possible to diff the current version against accumulated previous versions.

Are you saying that you put upgrade scripts in source control, but do not put rollback ones there?

I have a habit of maintaining a full create-and-drop script, as well as delta scripts for bringing existing db instances up to date. Both go into version control. The delta scripts are named according to revision numbers. That way it's easy to automate db patching with an update script.

Versioning SQL Server database - Stack Overflow

sql-server database svn version-control

The use of delta storage in the pack file is just an implementation detail. At that level, Git doesn't know why or how something changed from one revision to the next, rather it just knows that blob B is pretty similar to blob A except for these changes C. So it will only store blob A and changes C (if it chooses to do so - it could also choose to store blob A and blob B).

When retrieving objects from the pack file, the delta storage is not exposed to the caller. The caller still sees complete blobs. So, Git works the same way it always has without the delta storage optimisation.
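
A toy illustration of the idea in Python (not Git's actual pack format): store a base blob plus a list of copy/insert operations, and reconstruct the second blob on demand, so the caller only ever sees complete blobs.

from difflib import SequenceMatcher

def make_delta(base, target):
    # Record how to rebuild `target` from `base` as copy/insert operations.
    ops = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, base, target).get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))           # reuse a slice of the base blob
        else:
            ops.append(("insert", target[j1:j2]))  # store only the new content
    return ops

def apply_delta(base, ops):
    # Reconstruct the full blob; the delta never leaks to the caller.
    return "".join(base[op[1]:op[2]] if op[0] == "copy" else op[1] for op in ops)

blob_a = "print('hello')\nprint('world')\n"
blob_b = "print('hello')\nprint('there')\nprint('world')\n"
assert apply_delta(blob_a, make_delta(blob_a, blob_b)) == blob_b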

All of this is equally true of deltas stored by most other version control systems though...

The relationships between the blobs when compressed may have nothing to do with the relationships suggested/implied by the revision history.

I understand that this is not exposed to the caller, but still, to retrieve a given hash, Git may have to take a blob and apply a changeset; it doesn't have a true snapshot, right? So what sets this apart from SVN at that point - is it that it never has to stack up a large number of deltas to get to a particular point in history, because it limits how many deltas it stacks on a blob?

I'm not sure how it's different from SVN, because I'm not really sure how SVN works inside. But I can point out that it's different from Darcs (and the Darcs theory of patches) in that the pack files have nothing to do with how git merges branches. Git reconstructs the snapshot of the revisions it's merging, it also reconstructs snapshots of any older revisions it needs to merge, and then figures out what to do. Darcs, on the other hand, stores patches, and combines them when you need a working directory, and it can create different working directories based on different sets of patches.

Git does limit how many deltas may need to be applied to regenerate an object (and btw delta representation is just as useful for trees as blobs) which is the --depth parameter to git pack-objects. Some looking at what actually gets packed also suggests Git requires a delta to be really quite small for it to be worth using instead of the compressed blob.

version control - Are Git's pack files deltas rather than snapshots? -...

git version-control internals

pip was originally written to improve on easy_install in the following ways:

  • All packages are downloaded before installation. Partially-completed installation doesn't occur as a result.
  • Care is taken to present useful output on the console.
  • The reasons for actions are kept track of. For instance, if a package is being installed, pip keeps track of why that package was required.
  • The code is relatively concise and cohesive, making it easier to use programmatically.
  • Packages don't have to be installed as egg archives; they can be installed flat (while keeping the egg metadata).
  • Native support for other version control systems (Git, Mercurial and Bazaar)
  • Simple to define fixed sets of requirements and reliably reproduce a set of packages.

The "error messages" advantage is huge, especially for newer users. Easy-install is famous for spitting out dozens of what look like fatal errors, only to have wound up doing the install successfully anyway, which makes it difficult to use until you learn to ignore most everything it says. Pip simply omits saying those things in the first place.

I find it really silly that pip is not installable via easy_install pip. Also to make the transition easier, the hidden instruction of downloading the pip installer is faulty because the web server certificate cannot be verified.

easy_install pip

I install pip via easy_install pip all the time, and in fact I did so well before the timestamp on that comment. I'm not sure what @sorin is referring to.

sudo apt-get install python-pip

Do not use easy_install outside of a virtualenv on package-based distributions: workaround.org/easy-install-debian

@Dennis: When using sudo apt-get, Ubuntu/Debian will install Python packages in /usr/lib/python/dist-packages, whereas sudo pip or sudo easy_install will install in /local/lib/python/site-packages, and unfortunately the Debian/Ubuntu packages often have different names that pip isn't familiar with. The best solution IMHO is to use virtualenv and pip install your packages there.

This needs an update.

python - Why use pip over easy_install? - Stack Overflow

python pip setuptools easy-install pypi

And oh, by the way, Subversion still sucks

The claim that merging is better in a DVCS than in Subversion was largely based on how branching and merging worked in Subversion a while ago. Subversion prior to 1.5.0 didn't store any information about when branches were merged, so when you wanted to merge you had to specify which range of revisions had to be merged.

      1   2   4     6     8
trunk o-->o-->o---->o---->o
       \
        \   3     5     7
b1       +->o---->o---->o

When we want to merge b1's changes into the trunk we'd issue the following command, while standing in a folder that has trunk checked out:

svn merge -r 2:7 {link to branch b1}

which will attempt to merge the changes from b1 into your local working directory. Then you commit the changes after you have resolved any conflicts and tested the result. When you commit, the revision tree will look like this:

      1   2   4     6     8   9
trunk o-->o-->o---->o---->o-->o      "the merge commit is at r9"
       \
        \   3     5     7
b1       +->o---->o---->o

However, this way of specifying revision ranges quickly gets out of hand as the version tree grows, since Subversion didn't have any metadata about when and what revisions had been merged together. Ponder what happens later:

          12        14
trunk  -->o-------->o
                                     "Okay, so when did we merge last time?"
              13        15
b1     ----->o-------->o

This is largely an issue with Subversion's repository design: in order to create a branch you create a new virtual directory in the repository which houses a copy of the trunk, but it doesn't store any information about when and what got merged back in. That will lead to nasty merge conflicts at times. What was even worse is that Subversion used two-way merging by default, which has some crippling limitations in automatic merging, because the two branch heads are not compared with their common ancestor.

To mitigate this, Subversion now stores metadata for branching and merging. That would solve all problems, right?

On a centralized system, like Subversion, virtual directories suck. Why? Because everyone has access to view them, even the garbage experimental ones. Branching is good if you want to experiment, but you don't want to see everyone's (and their aunt's) experimentation. This is serious cognitive noise. The more branches you add, the more crap you'll get to see.

The more public branches you have in a repository, the harder it will be to keep track of all the different branches. So the question you'll have is whether a branch is still in development or really dead, which is hard to tell in any centralized version control system.

Most of the time, from what I've seen, an organization will default to using one big branch anyway. Which is a shame, because that in turn makes it difficult to keep track of testing and release versions, and whatever else good comes from branching.

There is a very simple reason why: branching is a first-class concept. There are no virtual directories by design, and branches are hard objects in a DVCS, which they need to be in order for synchronization of repositories (i.e. push and pull) to work simply.

The first thing you do when you work with a DVCS is to clone repositories (git's clone, hg's clone and bzr's branch). Cloning is conceptually the same thing as creating a branch in version control. Some call this forking or branching (although the latter is often also used to refer to co-located branches), but it's just the same thing. Every user runs their own repository, which means you have per-user branching going on.

The version structure is not a tree, but rather a graph. More specifically a directed acyclic graph (DAG, meaning a graph that doesn't have any cycles). You really don't need to dwell on the specifics of a DAG other than that each commit has one or more parent references (which is what the commit was based on). So the following graphs will show the arrows between revisions in reverse because of this.
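
To make the DAG idea concrete, here is a small Python sketch (purely illustrative, not how any of these tools are implemented), using the commit names from the diagrams below: commits point at their parents, and the common ancestor of two heads is what a three-way merge compares both heads against.

# Each commit maps to the list of its parents (a merge commit would have two).
history = {
    "a": [],
    "b": ["a"],
    "c": ["b"],
    "d": ["c"],   # d, e: commits on origin's master
    "e": ["d"],
    "f": ["c"],   # f: Bob's local commit
}

def ancestors(history, commit):
    # All commits reachable from `commit` by following parent references.
    seen, stack = set(), [commit]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(history[node])
    return seen

def merge_base(history, head1, head2):
    # A common ancestor of both heads; pick the one with the most history of its own.
    common = ancestors(history, head1) & ancestors(history, head2)
    return max(common, key=lambda c: len(ancestors(history, c)))

print(merge_base(history, "e", "f"))  # -> "c", the point where the graph diverged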

A very simple example of merging would be this; imagine a central repository called origin and a user, Alice, cloning the repository to her machine.

         a   b   c
origin   o<---o<---o
                   ^master
         |
         | clone
         v

         a   b   c
alice    o<---o<---o
                   ^master
                   ^origin/master

What happens during a clone is that every revision is copied to Alice exactly as they were (which is validated by the uniquely identifiable hash-id's), and marks where the origin's branches are at.

Alice then works on her repo, committing in her own repository and decides to push her changes:

         a   b   c
origin   o<---o<---o
                   ^ master

              "what'll happen after a push?"


         a   b   c   d   e
alice    o<---o<---o<---o<---o
                             ^master
                   ^origin/master

The solution is rather simple: the only thing the origin repository needs to do is take in all the new revisions and move its branch to the newest revision (which git calls "fast-forward"):

         a   b   c   d   e
origin   o<---o<---o<---o<---o
                             ^ master

         a   b   c   d   e
alice    o<---o<---o<---o<---o
                             ^master
                             ^origin/master

The use case which I illustrated above doesn't even need to merge anything. So the issue really isn't with merging algorithms, since the three-way merge algorithm is pretty much the same between all version control systems. The issue is more about structure than anything.

Admittedly the above example is a very simple use case, so let's do a much more twisted, albeit more common, one. Remember that origin started out with three revisions? Well, the guy who did them, let's call him Bob, has been working on his own and made a commit in his own repository:

         a   b   c   f
bob      o<---o<---o<---o
                        ^ master
                   ^ origin/master

                   "can Bob push his changes?" 

         a   b   c   d   e
origin   o<---o<---o<---o<---o
                             ^ master

Now Bob can't push his changes directly to the origin repository. The system detects this by checking whether Bob's revisions directly descend from origin's, which in this case they don't. Any attempt to push will result in the system saying something akin to "Uh... I'm afraid I can't let you do that, Bob."

So Bob has to pull in and then merge the changes (with git's pull; or hg's pull and merge; or bzr's merge). This is a two-step process. First Bob has to fetch the new revisions, which will copy them as they are from the origin repository. We can now see that the graph diverges:

                        v master
         a   b   c   f
bob      o<---o<---o<---o
                   ^
                   |    d   e
                   +----o<---o
                             ^ origin/master

         a   b   c   d   e
origin   o<---o<---o<---o<---o
                             ^ master

The second step of the pull process is to merge the diverging tips and make a commit of the result:

Hopefully the merge won't run into conflicts (if you anticipate them you can do the two steps manually in git with fetch and merge). What then needs to be done is to push those changes again to origin, which will result in a fast-forward merge since the merge commit is a direct descendant of the latest commit in the origin repository:

                                 v origin/master
                                 v master
         a   b   c   f       1
bob      o<---o<---o<---o<-------o
                   ^             |
                   |    d   e  |
                   +----o<---o<--+

                                 v master
         a   b   c   f       1
origin   o<---o<---o<---o<-------o
                   ^             |
                   |    d   e  |
                   +----o<---o<--+

There is another option to merge in git and hg, called rebase, which'll move Bob's changes to after the newest changes. Since I don't want this answer to be any more verbose I'll let you read the git, mercurial or bazaar docs about that instead.

As an exercise for the reader, try drawing out how it would work with another user involved. It is done similarly to the example above with Bob. Merging between repositories is easier than you'd think, because all the revisions/commits are uniquely identifiable.

There is also the issue of sending patches between developers; that was a huge problem in Subversion which is mitigated in git, hg and bzr by uniquely identifiable revisions. Once someone has merged their changes (i.e. made a merge commit) and sent it out for everyone else in the team to consume, by either pushing to a central repository or sending patches, they don't have to worry about the merge, because it already happened. Martin Fowler calls this way of working promiscuous integration.

Because the structure is different from Subversion's, by instead employing a DAG, it enables branching and merging to be done more easily, not only for the system but for the user as well.

I don't agree with your branches==noise argument. Lots of branches doesn't confuse people because the lead dev should tell people which branch to use for big features... so two devs might work on branch X to add "flying dinosaurs", 3 might work on Y to "let you throw cars at people"

John: Yes, for a small number of branches there is little noise and it is manageable. But come back after you've witnessed 50+ branches and tags in Subversion or ClearCase, where for most of them you can't tell whether they're active or not. Usability issues with the tools aside, why have all that litter around in your repository? At least in p4 (since a user's "workspace" is essentially a per-user branch), git or hg, you've got the option to not let everyone know about the changes you make until you push them upstream, which is a safeguard for when the changes are relevant to others.

I don't get your "too many experimental branches are noise" argument either, @Spoike. We have a "Users" folder where every user has his own folder. There he can branch as often as he wishes. Branches are inexpensive in Subversion, and if you ignore the folders of the other users (why should you care about them anyway), then you don't see noise. But for me merging in SVN does not suck (and I do it often, and no, it's not a small project). So maybe I do something wrong ;) Nevertheless the merging of Git and Mercurial is superior, and you pointed it out nicely.

In svn it's easy to kill inactive branches, you just delete them. The fact that people don't remove unused branches therefore creating clutter is just a matter of housekeeping. You could just as easily wind up with lots of temporary branches in Git as well. In my workplace we use a "temp-branches" top-level directory in addition to the standard ones - personal branches and experimental branches go in there instead of cluttering the branches directory where "official" lines of code are kept (we don't use feature branches).

Does this mean then, that from v1.5 subversion can at least merge as well as git can?

How and/or why is merging in Git better than in SVN? - Stack Overflow

svn git version-control mercurial merge

As far as I know there is no easy way to remove an added file from version control in svn once it is committed.

You will have to save the file somewhere else and delete it from version control. Then copy the backup back again.

It's a version control system after all... ;)

How to "unversion" a file in either svn and/or git - Stack Overflow

svn git version-control versioning

In Ruby on Rails, there's a concept of a migration -- a quick script to change the database.

You generate a migration file, which has rules to increase the db version (such as adding a column) and rules to downgrade the version (such as removing a column). Each migration is numbered, and a table keeps track of your current db version.

To migrate up, you run a command called "db:migrate" which looks at your version and applies the needed scripts. You can migrate down in a similar way.

The migration scripts themselves are kept in a version control system -- whenever you change the database you check in a new script, and any developer can apply it to bring their local db to the latest version.
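
The same idea, sketched outside Rails in Python purely for illustration (the migration contents, table name and runner are made up, not Rails' actual API): each numbered migration knows how to go up and how to go down, and a runner walks the database from its current version to the target version.

import sqlite3

# Numbered migrations; each has an "up" and a "down" statement.
MIGRATIONS = {
    1: {"up": "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
        "down": "DROP TABLE users"},
    2: {"up": "CREATE INDEX idx_users_name ON users (name)",
        "down": "DROP INDEX idx_users_name"},
}

def migrate(conn, target):
    # A single-row table keeps track of the current schema version.
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = conn.execute("SELECT version FROM schema_version").fetchone()
    current = row[0] if row else 0
    while current < target:                      # migrate up
        conn.execute(MIGRATIONS[current + 1]["up"])
        current += 1
    while current > target:                      # migrate down
        conn.execute(MIGRATIONS[current]["down"])
        current -= 1
    conn.execute("DELETE FROM schema_version")
    conn.execute("INSERT INTO schema_version VALUES (?)", (current,))
    conn.commit()

conn = sqlite3.connect(":memory:")
migrate(conn, 2)   # bring a fresh database up to version 2
migrate(conn, 1)   # roll back one step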

This is the choice for Ruby projects. The nearest equivalent to this design in java is mybatis schema migrations. For .NET the equivalent is code.google.com/p/migratordotnet. They're all excellent tools for this job IMO.

sql - Is there a version control system for database structure changes...

sql database oracle version-control

I feel the answer to your question is a resounding yes- the benefits of managing your files with a version control system far outweigh the costs of implementing such a system.

  • Backup: I have a backup system already in place.

Yes, and so do I. However, there are some questions to consider regarding the appropriateness of relying on a general purpose backup system to adequately track important and active files relating to your work. On the performance side:

  • At what interval does your backup system take snapshots?
  • Does it have to image your entire hard drive when taking a snapshot, or could it be easily told to just back up two files that just received critical updates?
  • Can your backup system show you, with pinpoint accuracy, what changed in your text files from one backup to the next?
  • How many locations are the backups saved in? Are they in the same physical location as your computer?
  • How easy is it to restore a given version of a single file from your backup system?

For example, I have a Mac and use Time Machine to back up to another hard drive in my computer. Time Machine is great for recovering the odd file or restoring my system if things get messed up. However it simply doesn't have what it takes to be trusted with my important work:

With a version control system like Git, I can initiate a backup of specific files with no more effort than requesting a save in a text editor - and the file is imaged and stored instantaneously. Furthermore, Git is distributed, so each computer that I work at has a full copy of the repository.

  • Forking and rewinding: I've never felt the need to do this, but I can see how it could be useful (e.g., you are preparing multiple journal articles based on the same dataset; you are preparing a report that is updated monthly, etc)

As a soloist, I don't fork that much either. However, the time I have saved by having the option to rewind has single-handedly paid back my investment in learning a version control system many, many times. You say you have never felt the need to do this- but has rewinding any file under your current backup system really been a painless, feasible option?

Sometimes the report just looked better 45 minutes, an hour or two days ago.

  • Collaboration: Most of the time I am analysing data myself, thus, I wouldn't get the collaboration benefits of version control.

Yes, but you would learn a tool that may prove to be indispensable if you do end up collaborating with others on a project.

  • Time to evaluate and learn a version control system

Don't worry too much about this. Version control systems are like programming languages- they have a few key concepts that need to be learned and the rest is just syntactic sugar. Basically, the first version control system you learn will require investing the most time- switching to another one just requires learning how the new system expresses key concepts.

Pick a popular system and go for it!

Do you have one folder, say Projects that contains all the folders and files related to your data analysis activities? If so then slapping version control on it is going to increase the complexity of your file system by exactly 0. If your projects are strewn about your computer- then you should centralize them before applying version control and this will end up decreasing the complexity of managing your files- that's why we have a Documents folder after all.

  • Is version control worth the effort?

Yes! It gives you a huge undo button and allows you to easily transfer work from machine to machine without worrying about things like losing your USB drive.

2. What are the main pros and cons of adopting version control?

The only con I can think of is a slight increase in file size- but modern version control systems can do absolutely amazing things with compression and selective saving so this is pretty much a moot point.

3. What is a good strategy for getting started with version control for data analysis with R (e.g., examples, workflow ideas, software, links to guides)?

Keep files that generate data or reports under version control, be selective. If you are using something like Sweave, store your .Rnw files and not the .tex files that get produced from them. Store raw data if it would be a pain to re-acquire. If possible, write and store a script that acquires your data and another that cleans or modifies it rather than storing changes to raw data.

As for learning a version control system, I highly recommend Git and this guide to it.

These websites also have some nice tips and tricks related to performing specific actions with Git:

+1 for the reply to "A possible increase in complexity over my current file management system". Version control will potentially reduce the complexity of the items placed under it, along with giving a granular level of control over snapshot points in your backup process (you give some description of this too, which is very helpful for recovering not by date but by feature or change).

git - R and version control for the solo data analyst - Stack Overflow

git version-control r