The Wave/Particle Duality of Git Commits

Jan 20, 2020


A lot of my friends get confused when they encounter more advanced Git topics such as git rebase. The confusion typically surfaces during the first merge conflict that interrupts a rebase. The sticking point is usually how Git can reapply one commit onto another: if a commit is a snapshot, how could it ever have a merge conflict? Shouldn’t it be obvious, they think, what the final state should be?

In order to understand what’s going on, it’s first necessary to take a quick detour into quantum physics—tongue planted firmly in cheek.

Wave/particle duality

If you’re not already passingly familiar with this particular flavor of apparent quantum nonsense, I am afraid you’re about to get a lot more confused. It turns out that particles such as photons (and electrons, etc.) have properties of both classical particles and waves. This shows up in experiments such as the well-known double-slit experiment. If photons are fired through a single slit onto a piece of film, they create a band where they passed through the slit, as if photons are particles. However, photons that pass through two side-by-side slits interfere with themselves to create an interference pattern on the film, as if photons are waves.

An example of the double-slit experiment — The small extra ripples are the interference from the second slit. (photo CC BY-SA 3.0, by Jordgette)

The upshot is that photons are neither particles nor waves, but rather something strange that behaves like both.

What do photons have to do with Git?

I’m glad you asked.

It’s easy to see that many Git operations treat commits as snapshots. Every Git user knows how to use checkout:

$ git checkout 42bd1ef5 && ls -lh
HEAD is now at 42bd1ef Adjust padding for social links
total 48K
drwxr-xr-x 2 georgev georgev 4.0K Jun  1  2019 archetypes
drwxr-xr-x 3 georgev georgev 4.0K Jun  1  2019 assets
drwxr-xr-x 2 georgev georgev 4.0K Jun  1  2019 images
drwxr-xr-x 7 georgev georgev 4.0K Aug 25 13:27 layouts
-rw-r--r-- 1 georgev georgev 9.3K Jan  4 12:21 README.md
drwxr-xr-x 3 georgev georgev 4.0K Aug 15 01:01 static
-rw-r--r-- 1 georgev georgev  712 Jun  1  2019 theme.toml

But other commands, such as show, output a unified diff—that is, a patch:

$ git show 42bd1ef5
commit 42bd1ef5c5541815ad0ae4eba2c695387228e719
Author: George Hilliard <thirtythreeforty@gmail.com>
Date:   Tue Dec 31 10:43:45 2019 -0600

    Adjust padding for social links

diff --git a/assets/scss/hyde-hyde/_sidebar.scss b/assets/scss/hyde-hyde/_sidebar.scss
index 0686b42..6b34856 100644
--- a/assets/scss/hyde-hyde/_sidebar.scss
+++ b/assets/scss/hyde-hyde/_sidebar.scss
@@ -54,7 +54,7 @@
 .social {
   text-align: center;
   a {
-    padding: 0 4px;
+    padding: 0 7px;
     @include link-no-decoration();
   }
 }

The same commit hash appears in both commands. There are no “--as-patch” or “--as-snapshot” flags telling Git which view I want. I am in fact referring to the same object. What is going on?

Commits, it seems, are neither patches nor snapshots until they are observed. Rather, they are something strange¹ that can be treated as either a patch or a snapshot, just as a photon has properties of both a particle and a wave.

When Git treats a commit as a patch, it appears as the difference between its snapshot and that of its parent. When Git treats a commit as a snapshot, it appears as the tree of files that were in the staging area when the commit was made.
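To make the two views concrete, here is a minimal toy model in Python. It is not how Git stores anything: it assumes a commit is just a dictionary of file paths to contents plus a link to its parent, and the names `Commit`, `as_snapshot`, and `as_patch` are hypothetical. The snapshot view simply returns the files; the patch view is computed on demand by diffing against the parent, here using the standard library’s `difflib.unified_diff`. The file contents are an abbreviated, made-up version of the file from the `git show` output above.

```python
from __future__ import annotations

import difflib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    """Toy commit: a complete snapshot of files plus a link to its parent."""
    message: str
    snapshot: dict[str, str]          # path -> complete file contents
    parent: Optional[Commit] = None   # None for a root commit


def as_snapshot(commit: Commit) -> dict[str, str]:
    """Snapshot view: every file exactly as it existed at this commit."""
    return dict(commit.snapshot)


def as_patch(commit: Commit) -> str:
    """Patch view: a unified diff of this snapshot against the parent's."""
    before = commit.parent.snapshot if commit.parent else {}  # root: diff vs. nothing
    after = commit.snapshot
    lines: list[str] = []
    for path in sorted(before.keys() | after.keys()):
        lines.extend(difflib.unified_diff(
            before.get(path, "").splitlines(keepends=True),
            after.get(path, "").splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        ))
    return "".join(lines)


# Hypothetical, abbreviated contents for the file touched in the example above.
PATH = "assets/scss/hyde-hyde/_sidebar.scss"
parent = Commit("Earlier commit", {PATH: ".social {\n  a {\n    padding: 0 4px;\n  }\n}\n"})
child = Commit(
    "Adjust padding for social links",
    {PATH: ".social {\n  a {\n    padding: 0 7px;\n  }\n}\n"},
    parent,
)

print(as_snapshot(child)[PATH])  # the whole file
print(as_patch(child))           # only what changed relative to the parent
```

Nothing in `Commit` is stored as a diff; the patch exists only when you ask for it, which is the sense in which one object can answer both questions.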

This duality of Git commits is what enables operations such as rebase. Commands such as rebase, cherry-pick, and format-patch all treat a commit as a patch rather than as a snapshot. Viewed this way, Git’s rebase operation suddenly makes a whole lot more sense: Git takes a string of commits, interprets each one as a patch, and applies those patches, one after another, on top of the commit you specify.

Rebase diagram — A simple rebase, where `C4`’s diff has been applied to `C3`. (from Git docs)
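The replay loop can be sketched in a few lines of Python. This is a deliberately simplified model under stated assumptions, not Git’s actual implementation: a snapshot is a single file’s list of lines, a patch is a list of hunks that each carry one line of surrounding context, and `make_patch`, `apply_patch`, and `rebase` are hypothetical names. The example mirrors the diagram above: `C4`’s change is replayed onto a `C3` that edited a different part of the file.

```python
import difflib

Hunk = tuple[list[str], list[str]]  # (lines expected in target, lines to put in their place)


class PatchConflict(Exception):
    """Raised when a hunk's expected lines are not found in the target."""


def make_patch(old: list[str], new: list[str], context: int = 1) -> list[Hunk]:
    """View a commit as a patch: hunks of changed lines plus nearby context."""
    matcher = difflib.SequenceMatcher(a=old, b=new)
    hunks = []
    for group in matcher.get_grouped_opcodes(context):
        i1, i2 = group[0][1], group[-1][2]   # span in the old snapshot
        j1, j2 = group[0][3], group[-1][4]   # span in the new snapshot
        hunks.append((old[i1:i2], new[j1:j2]))
    return hunks


def apply_patch(target: list[str], hunks: list[Hunk]) -> list[str]:
    """Apply hunks to a *different* snapshot by locating each hunk's context."""
    result = list(target)
    for expected, replacement in hunks:
        n = len(expected)
        for start in range(len(result) - n + 1):
            if result[start:start + n] == expected:   # first match wins
                result[start:start + n] = replacement
                break
        else:
            raise PatchConflict(f"could not find {expected!r}")
    return result


def rebase(branch: list[list[str]], onto: list[str]) -> list[list[str]]:
    """Replay each commit of `branch` (snapshots, oldest first, starting at
    the old base) as a patch on top of `onto`; return the new snapshots."""
    tip, rewritten = onto, []
    for parent, child in zip(branch, branch[1:]):
        tip = apply_patch(tip, make_patch(parent, child))
        rewritten.append(tip)
    return rewritten


c2 = ["title: Blog", "author: George", "theme: hyde", "---", "padding: 4px"]
c3 = ["title: My Blog", "author: George", "theme: hyde", "---", "padding: 4px"]  # upstream
c4 = ["title: Blog", "author: George", "theme: hyde", "---", "padding: 7px"]     # your branch

(c4_prime,) = rebase([c2, c4], onto=c3)
print(c4_prime)  # C3's title change plus C4's padding change
```

The new `C4'` is a brand-new snapshot that neither branch ever contained; it exists only because `C4` was reinterpreted as a patch.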

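If this process goes smoothly, the result is a new string of commits: the destination snapshot, modified step by step according to the original commits-as-patches. This is also why rebase can produce “merge conflicts”. A patch records not only the lines it changes but also the surrounding context it expects to find; if the destination snapshot no longer contains that context, Git cannot work out on its own where or how the change belongs. It’s the patches, not the snapshots, that won’t apply cleanly.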
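Here is that failure mode as a hypothetical Python sketch, again a toy rather than Git’s code. The hunk is hard-coded for brevity: it stands for `C4`’s change viewed as a patch, one line of context plus the edited line. Applied to `C4`’s original parent it fits; applied to an upstream snapshot that rewrote the same line, the expected lines are simply not there.

```python
class PatchConflict(Exception):
    """The hunk's expected lines are missing from the target snapshot."""


def apply_hunk(target: list[str], expected: list[str], replacement: list[str]) -> list[str]:
    """Replace the first run of `expected` lines in `target` with `replacement`."""
    n = len(expected)
    for start in range(len(target) - n + 1):
        if target[start:start + n] == expected:
            return target[:start] + replacement + target[start + n:]
    raise PatchConflict(f"could not find {expected!r}")


# C4's commit, viewed as a patch: one context line plus the changed line.
expected = ["---", "padding: 4px"]
replacement = ["---", "padding: 7px"]

c2 = ["title: Blog", "---", "padding: 4px"]   # C4's original parent
c3 = ["title: Blog", "---", "padding: 5px"]   # upstream changed the same line

print(apply_hunk(c2, expected, replacement))  # applies cleanly

conflict = None
try:
    apply_hunk(c3, expected, replacement)
except PatchConflict as err:
    conflict = err  # both snapshots are well defined; the patch no longer fits
print("conflict:", conflict)
```

Every snapshot involved is perfectly unambiguous; the conflict belongs to the patch, which is why a human has to decide between `5px` and `7px`.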

Is this a silly way to think of commits? Probably. But it has been helpful to at least one other person when I explained it this way. I hope it helps you too.


¹ This duality extends as far as the underlying storage format. Individual commits are stored as something very similar to a snapshot: a “commit” object references a “tree”, which in turn references “blobs” (file contents) and other trees (subdirectories). Pack files, on the other hand, are stored more like patches: an object may optionally be stored as a “delta” against another object occurring earlier in the pack. This is an internal detail that most users of Git need not be aware of.
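The snapshot side of this footnote can be illustrated with content addressing. The blob naming below follows Git’s SHA-1 object format (the SHA-1 of a `blob <size>` header, a NUL byte, and the raw contents, which is what `git hash-object` reports), but the dictionary “tree” and the in-memory `store` are hypothetical simplifications, not Git’s binary tree format. Because an unchanged file hashes to the same blob, a new snapshot only adds objects for content that actually changed.

```python
from __future__ import annotations

import hashlib

store: dict[str, bytes] = {}  # toy object database: object id -> raw content


def blob_id(content: bytes) -> str:
    """Name a blob the way Git's SHA-1 object format does."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def snapshot(files: dict[str, bytes]) -> dict[str, str]:
    """Store every file as a blob; return a toy tree of path -> blob id."""
    tree = {}
    for path, content in files.items():
        oid = blob_id(content)
        store.setdefault(oid, content)  # identical content is stored only once
        tree[path] = oid
    return tree


first = snapshot({"README.md": b"# Blog\n", "theme.toml": b"name = 'hyde'\n"})
second = snapshot({"README.md": b"# Blog\n", "theme.toml": b"name = 'hyde-hyde'\n"})

print(first["README.md"] == second["README.md"])  # True: unchanged file, same blob
print(len(store))                                 # 3: three blobs, not four
```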