Git versus Subversion

Subversion runs on a central server, adds changes to its repository of data and 
can give you a snapshot after every change. That snapshot has a revision number; 
the revision number is very important to SVN and the people who use it. If your 
change goes in after mine, you're guaranteed to have a higher revision number.

Git does not require a central server. This difference is crucial. Where 
Subversion is centralized, Git is distributed; therefore, Git has no way to 
provide an increasing revision number, because there is no single "latest 
revision." Every commit still has a unique ID (a SHA-1 hash), but these IDs 
carry no ordering, so on their own they are not as useful as Subversion's 
revision numbers.

With Git, the crucial action is no longer the commit; it is the merge. Anyone 
can clone a repository and commit to the clone. The owner of the repository is 
given the choice of merging changes back. Alternatively, developers can push 
changes back to the repository.

git push origin master
git push remoteName branchNameOnRemoteServer

Here is the most crucial difference between Subversion and Git.  With Subversion,
when we commit, we send the changes to the centralized server.  With Git, when 
we commit, the change is committed to the local repository (it is still local, 
and there is no network involved).  With Git, if we want to share our changes 
with other people in the team, we have to use 'git push' to push our changes 
to our topic branch on the remote repository where it is accessible to other 
people, and other people have to pull our changes into their repositories.
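
The commit-then-push sequence can be tried out locally. The following is only a sketch under stated assumptions: the directory names (shared.git, alice, bob), the file name, and the "Demo" identity are all hypothetical, and a bare repository in a temporary directory stands in for the remote server.

```shell
set -eu
# Throwaway identity (hypothetical) so commits work without any Git configuration.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

git init -q --bare "$tmp/shared.git"                  # stands in for the remote repository
git clone -q "$tmp/shared.git" "$tmp/alice" 2>/dev/null   # cloning an empty repo prints a warning
git clone -q "$tmp/shared.git" "$tmp/bob" 2>/dev/null

cd "$tmp/alice"
branch=$(git symbolic-ref --short HEAD)               # master or main, depending on Git defaults
echo "hello" > notes.txt
git add notes.txt
git commit -q -m "Add notes"                          # local only, no network involved
local_head=$(git rev-parse HEAD)

remote_before=$(git ls-remote --heads origin)         # empty: nothing has been pushed yet
git push -q origin "$branch"                          # now the commit is shared
remote_after=$(git ls-remote --heads origin | cut -f1)

cd "$tmp/bob"
git pull -q --ff-only origin "$branch"                # Bob brings the change into his repository
bob_head=$(git rev-parse HEAD)

echo "before push: '${remote_before}'"
echo "after push:  ${remote_after}"
echo "bob has:     ${bob_head}"
```

Until the push, the shared repository has no branches at all; after the pull, Bob's HEAD is the same commit Alice created.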

// Creating a repository and importing files (Subversion):
svnadmin create /path/to/repo 
svn import localPath REPO_URL
svn co REPO_URL localPath
svn add ...
svn commit -m "..."

// Creating a repository and importing files (Git):
git init
git add .
git commit -m "..."

// Everyday Subversion commands:
svn up      (get the latest changes)
svn st      (view status)
svn delete  (remove a file)
svn rename  (a delete plus an add that keeps history)
svn mkdir   (create a directory)
svn add     (add files)
svn commit  (send changes to the server)

With Git, there is no 'resolved' command.  We just resolve the conflict, and 
run "git add" and commit.
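
As a rough illustration of that flow, the sketch below provokes a conflict in a scratch repository and then resolves it with "git add" and a commit. The branch name (topic), the file name, and the "Demo" identity are hypothetical.

```shell
set -eu
# Throwaway identity (hypothetical) so commits work without any Git configuration.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

git init -q .
main=$(git symbolic-ref --short HEAD)     # master or main, depending on Git defaults
echo "first draft" > story.txt
git add story.txt
git commit -q -m "Base version"

git checkout -q -b topic
echo "topic wording" > story.txt
git commit -q -am "Reword on topic"

git checkout -q "$main"
echo "mainline wording" > story.txt
git commit -q -am "Reword on mainline"

# Both branches changed the same line, so the merge stops and exits non-zero;
# '|| true' keeps 'set -e' from aborting the script here.
git merge topic >/dev/null 2>&1 || true
unmerged=$(git ls-files -u | cut -f2 | sort -u)   # paths still in conflict

# Resolve by editing the file, then mark it resolved with 'git add' and commit.
echo "agreed wording" > story.txt
git add story.txt
git commit -q -m "Merge topic, resolving the wording conflict"

second_parent=$(git rev-parse HEAD^2)     # the merge commit's second parent
topic_tip=$(git rev-parse topic)
echo "conflicted file: $unmerged"
echo "merge recorded:  $second_parent"
```

The commit that concludes the merge has two parents, and its second parent is the tip of the topic branch.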

What are the differences between Git and Subversion?

  1. Git is much faster than Subversion for most operations, because they run against the local repository rather than over the network.
  2. Subversion allows you to check out just a subtree of a repository; Git requires you to clone the entire repository (including history) and create a working copy that mirrors at least a subset of the items under version control.
  3. Git's repositories are much smaller than Subversion's (for the Mozilla project, about 30x smaller).
  4. Git was designed to be fully distributed from the start, allowing each developer to have full local control.
    1. Because there is no centralized server, you can commit code without having a network connection. This is similar to having your own Subversion repository server using the file:/// protocol.
  5. With Git, every file and commit is checksummed, so corruption is detected rather than silently propagated.
  6. Git has a staging area (the index), which gives it a two-step commit. First, "git add" copies your changes into the staging area; they are not committed yet. Then "git commit" records the staged changes in your local repository. Publishing them to another repository is a separate step (git push). This is a bit different from Subversion, whose .svn administrative area is not a stage between the working copy and the commit.
  7. Git tracks content not files. Many revision control systems provide an add command that tells the system to start tracking changes to a new file. Git’s add command does something simpler and more powerful: git add is used both for new and newly modified files, and in both cases it takes a snapshot of the given files and stages that content in the index, ready for inclusion in the next commit.
  8. Git branches are simpler and less resource heavy than Subversion's
  9. Git branches carry their entire history
  10. Git provides better auditing of branch and merge events
  11. Git's repo file formats are simple, so repair is easy and corruption is rare.
  12. Backing up a Subversion repository centrally is potentially simpler, since everything lives in one place; with Git, a project's content may be spread across several repositories.
  13. Git repository clones act as full repository backups
  14. Subversion's UI is more mature than Git's
  15. Walking through versions is simpler in Subversion because it uses sequential revision numbers (1,2,3,..); Git uses unpredictable SHA-1 hashes. Walking backwards in Git is easy using the "^" syntax, but there is no easy way to walk forward.
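
Items 5 to 7 can be observed directly. This is a minimal sketch, assuming a scratch repository with a hypothetical file name and "Demo" identity: it stages a file, edits it again, commits, and then checks which content was actually recorded.

```shell
set -eu
# Throwaway identity (hypothetical) so commits work without any Git configuration.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q .

printf 'draft one\n' > chapter.txt
git add chapter.txt                        # stages a snapshot of "draft one"
printf 'draft two, edited after staging\n' > chapter.txt
git commit -q -m "Add chapter"             # records only the staged snapshot

committed=$(git show HEAD:chapter.txt)     # content stored in the commit
pending=$(git status --porcelain)          # the later edit is still uncommitted

# The stored object is named by a checksum of its content, so hashing the
# same bytes independently yields the same ID.
blob_id=$(git rev-parse HEAD:chapter.txt)
expected_id=$(printf 'draft one\n' | git hash-object --stdin)

echo "committed content: $committed"
echo "still pending:    $pending"
echo "blob id:           $blob_id"
```

The commit contains "draft one" even though the working file had already moved on, which is what "tracks content, not files" means in practice.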

What does it mean to say that Git was designed to be fully distributed from the start?

Git was designed from the ground up as a distributed version control system. Being a distributed version control system means that multiple redundant repositories and branching are first class concepts of the tool.

In a distributed VCS like Git every user has a complete copy of the repository data stored locally, thereby making access to file history extremely fast, as well as allowing full functionality when disconnected from the network. It also means every user has a complete backup of the repository. Have 20 users? You probably have more than 20 complete backups of the repository as some users tend to keep more than one repository for the same project. If any repository is lost due to system failure only the changes which were unique to that repository are lost. If users frequently push and fetch changes with each other this tends to be a small amount of loss, if any.
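
The "every clone is a backup" claim can be sketched as follows. The directory names (original, copy), the file name, and the "Demo" identity are hypothetical, and deleting a directory stands in for a system failure.

```shell
set -eu
# Throwaway identity (hypothetical) so commits work without any Git configuration.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

git init -q "$tmp/original"
cd "$tmp/original"
for n in 1 2 3; do
    echo "change $n" >> log.txt
    git add log.txt
    git commit -q -m "Change $n"
done
original_head=$(git rev-parse HEAD)

git clone -q "$tmp/original" "$tmp/copy"

# Simulate losing the original repository entirely.
cd "$tmp"
rm -rf "$tmp/original"

# The clone still answers history queries with no network and no original.
cd "$tmp/copy"
copy_head=$(git rev-parse HEAD)
copy_count=$(git rev-list --count HEAD)            # number of commits reachable from HEAD
previous=$(git log -n 1 --format=%s HEAD^)         # one step back
oldest=$(git log -n 1 --format=%s HEAD~2)          # two steps back

echo "commits in clone: $copy_count"
echo "previous commit:  $previous"
echo "oldest commit:    $oldest"
```

The last two queries also show the "^" and "~" syntax for walking backwards through history.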

In a centralized VCS like Subversion only the central repository has the complete history. This means that users must communicate over the network with the central repository to obtain history about a file. Backups must be maintained independently of the VCS. If the central repository is lost due to system failure it must be restored from backup and changes since that last backup are likely to be lost. Depending on the backup policies in place this could be several human-weeks worth of work.

(Note that even SVK doesn't do quite the same thing as git. SVK downloads a complete history and allows disconnected commits, but there is still a unique "upstream" repository. Two SVK users can't merge with each other and then push the changes to the upstream.)

How does Git handle access control differently compared to Subversion?

Due to being distributed, you inherently do not have to give commit access to other people in order for them to use the versioning features. Instead, you decide when to merge what from whom.

That is, because Subversion controls access centrally, a user needs commit access in order to make, for example, daily check-ins. In Git, users can keep their own work under version control while the main source remains under the control of the repository owner.

Since Subversion has a single central repository it is possible to specify read and write access controls in a single location and have them be enforced across the entire project. Git can operate with a central repository workflow. Read and write access can be specified at the central repository.

How does Git handle branches?

Branches in Git are a core concept used every day by every user. In Subversion they are more cumbersome and often used on an as-needed basis. The reason branches are so central in Git is that every developer's working directory is itself a branch. Even if two developers are modifying two different unrelated files at the same time, it is easy to view these two working directories as different branches stemming from the same common base revision of the project.

Consequently Git:

  1. Tracks the project revision the branch started from - this information is necessary to merge the branch back to trunk
  2. Records branch merge events including:
    1. author, time and date
    2. branch and revision information
    3. Changes made on the branch(es) remain attributed to the original authors and the original timestamps of those changes
    4. What changes were made to complete the merge. These are attributed to the merging user
    5. Why the merge was done (optional; can be supplied by the user).
  3. Automatically starts the next merge at the last merge.
    1. Knowing what revision was last merged is necessary in order to successfully merge the same branches together again in the future.
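
The bookkeeping listed above can be inspected after a merge. The following sketch assumes a scratch repository; the branch name (feature), the file names, and the author names are hypothetical.

```shell
set -eu
# Throwaway identity (hypothetical) so commits work without any Git configuration.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

git init -q .
main=$(git symbolic-ref --short HEAD)      # master or main, depending on Git defaults
echo base > base.txt
git add base.txt
git commit -q -m "Base"
fork_point=$(git rev-parse HEAD)

git checkout -q -b feature
echo one > feature.txt
git add feature.txt
GIT_AUTHOR_NAME="Feature Author" git commit -q -m "Feature work"

git checkout -q "$main"
echo other > main.txt
git add main.txt
git commit -q -m "Mainline work"

# Git knows where the branch started without being told.
base_before=$(git merge-base "$main" feature)

# -m supplies the optional "why" for the merge.
git merge -q -m "Merge feature: why this merge was done" feature

merge_parents=$(git log -n 1 --format=%P HEAD)       # two parent hashes
merge_subject=$(git log -n 1 --format=%s HEAD)
feature_author=$(git log -n 1 --format=%an feature)  # original author is preserved
feature_tip=$(git rev-parse feature)

# After the merge, the common base is the merged feature tip, so a later
# merge of the same branches starts from here.
base_after=$(git merge-base "$main" feature)

echo "parents:        $merge_parents"
echo "feature author: $feature_author"
echo "next merge base: $base_after"
```

Before the merge the common base is the fork point; afterwards it is the tip of the merged branch, which is how the next merge "starts at the last merge."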

This is different to Subversion's handling of branches. As of Subversion 1.5:

  1. Automatically tracks the project revision the branch started from.
    1. Like Git, Subversion remembers where a branch originated.
  2. In Subversion, branches and tags are all copies. This is sometimes inconvenient: it is easy to check out the whole repository by mistake. Branch paths and file paths lie in the same namespace but have different semantics, which can be confusing.

How does Git manage to take less space than Subversion?

Git's repository and working directory sizes are extremely small when compared to SVN. For example, the Mozilla repository is reported to be almost 12 GB when stored in SVN using the fsfs backend. Previously, the fsfs backend also required over 240,000 files in one directory to record all 240,000 commits made over the 10-year project history. This was fixed in SVN 1.5, where every 1000 revisions are placed in a separate directory. The exact same history is stored in Git by only two files totaling just over 420 MB. This means that SVN requires roughly 30x the disk space to store the same history.

One of the reasons for the smaller repo size is that an SVN working directory always contains two copies of each file: one for the user to actually work with and another hidden in .svn/ to aid operations such as status, diff and commit. In contrast a Git working directory requires only one small index file that stores about 100 bytes of data per tracked file. On projects with a large number of files this can be a substantial difference in the disk space required per working copy.

As a full Git clone is often smaller than a full checkout, Git working directories (including the repositories) are typically smaller than the corresponding SVN working directories. There are even ways in Git to share one repository across many working directories, but in contrast to SVN, this requires the working directories to be colocated.

How does Git handle line endings?

Subversion can be easily configured to convert line endings automatically to CRLF or LF, depending on the native line ending used by the client's operating system. This conversion feature is useful when Windows and UNIX users are collaborating on the same set of source code. It is also possible to configure a fixed line ending independent of the native operating system; files such as a Makefile need to use only LFs, even when they are accessed from Windows. This can be adjusted in a global config and overridden in user configs. Binary files are checked in with a binary flag (as with CVS, except that SVN almost always does this automatically) and as such are never converted or keyword-substituted. Subversion also allows the user to specify line ending conversion on a file-by-file basis. However, if the user does not check the binary flag when adding a file (Subversion prints, for every added file, whether it recognized it as binary), binary content might get corrupted.

Git versions prior to 1.5.1 never convert files; they assume that every file is opaque and should not be modified. Git 1.5.1 and onwards make line ending conversion configurable. Git's advantage over Subversion is that you do not have to specify manually which files this conversion should be applied to; it happens automatically (hence autocrlf).
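
As a small sketch of the automatic conversion, the script below sets core.autocrlf to "input" (convert CRLF to LF when content is committed) in a scratch repository and compares the size of the working file with the size of what Git stored. The file name and the "Demo" identity are hypothetical, and the setting is applied to this one repository only.

```shell
set -eu
# Throwaway identity (hypothetical) so commits work without any Git configuration.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

git init -q .
git config core.autocrlf input             # repository-local: CRLF becomes LF on commit

printf 'first\r\nsecond\r\n' > windows.txt # a text file with Windows line endings
git add windows.txt 2>/dev/null            # Git warns that CRLF will be replaced by LF
git commit -q -m "Add file with CRLF line endings"

# Two CR bytes are dropped in the stored copy; the working file is untouched.
worktree_bytes=$(( $(wc -c < windows.txt) ))
stored_bytes=$(( $(git cat-file -p HEAD:windows.txt | wc -c) ))

echo "working file: $worktree_bytes bytes"   # 15
echo "stored blob:  $stored_bytes bytes"     # 13
```

No per-file property was set here; Git decided on its own that the file was text.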

What are some advantages of a centralized repository versus multiple distributed repositories?

Since Subversion only supports a single repository there is little doubt about where something is stored. Once a user knows the repository URL they can reasonably assume that all materials and all branches related to that project are always available at that location. Backup to tape/CD/DVD is also simple as there is exactly one location that needs to be backed up regularly.

Since Git is distributed by nature not everything related to a project may be stored in the same location. Therefore there may be some degree of confusion about where to obtain a particular branch, unless repository location is always explicitly specified. There may also be some confusion about which repositories are backed up to tape/CD/DVD regularly, and which aren't.

What are you trying to argue here? That a central repository is easier to locate/backup? Git can work with a central repository workflow. Git wins here - speed, size (less backup media needed).

Should I use Git or should I use Subversion?

If it were up to me to decide, it looks like I would continue to use Subversion.

  1. To me, the fact that they make branching and merging easier just means that your coworkers are more likely to branch and merge, and you’re more likely to be confused… I did struggle along for a while by memorizing a few key commands, imagining that they were working just like Subversion, but when something didn’t go the way it would have with Subversion, I got confused, and would pretty much just have to run down the hall to get Benjamin or Jacob to help.
  2. Or I may give it a try, considering that distributed version control makes merging between long-lived branches easier.

Unless otherwise stated, the content of this page is licensed under the Creative Commons Attribution-ShareAlike 3.0 License.