Object Model Operations

Notes

Welcome back to our tour of the Git object model. In this video, we'll go beyond the base objects and look at more of the structure with tags, branches, and remotes, as well as reviewing how the various Git commands act on this collection of objects.

Note - If you haven't watched the first part of this review of the Git object model, we highly recommend going back and watching it now, as this video builds largely on that foundation.

Git Object Review

Before adding more to our growing picture of the Git object model, let's quickly review the base objects we covered in the first video:

Blobs - Our base objects, storing the contents of a single version of a file.
Trees - Trees store directory listings, pointing at blobs and other trees to define a full directory structure.
Commits - Commits lock in a version of our code, pointing at a single tree (the root tree of the project, which these notes call the "working tree"), as well as holding the commit message and author info. Commits also point at parent commits to capture the history of our code.
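
As a quick refresher, here is a minimal sketch that builds a throwaway repo and asks Git for the type of each kind of object. The repo location, the file name app/hello.txt, and the placeholder identity are assumptions made up for illustration.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"           # throwaway sandbox repo
git init -q
git symbolic-ref HEAD refs/heads/master   # match the branch name used in these notes
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
git config commit.gpgsign false

mkdir app
echo "hello" > app/hello.txt
git add app/hello.txt
git commit -q -m "Add hello"

commit_type=$(git cat-file -t HEAD)               # the commit itself
tree_type=$(git cat-file -t 'HEAD^{tree}')        # the tree the commit points at
subtree_type=$(git cat-file -t HEAD:app)          # a nested directory is another tree
blob_type=$(git cat-file -t HEAD:app/hello.txt)   # file contents are a blob
echo "$commit_type $tree_type $subtree_type $blob_type"
# prints: commit tree tree blob
```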

Refs

Returning to our peek into the .git directory, we can first review the layout:

$ tree .git -L 1
.git
├── COMMIT_EDITMSG
├── HEAD
├── config
├── description
├── hooks/
├── index
├── info/
├── logs/
├── objects/
└── refs/

In the previous video, we focused primarily on the objects directory, which acted as a database of the blob, tree, and commit objects we created as we worked with our repo.

In this video, we'll instead focus on the refs directory. Peeking inside, we'll see:

$ tree .git/refs
.git/refs
├── heads/
│   └── master
└── tags/

Heads

The first directory we'll encounter within the refs directory is heads. These are our local branches. The directory is called heads because our local branches are the collection of things that HEAD can point at. HEAD is the ultimate ref, defining what we currently have checked out.

Currently, our heads directory only contains a single file, master. We call this master file a "file", rather than some more complex Git object, because that is what it is. We can test this by running cat on it:

$ cat .git/refs/heads/master
f95b2fe3b64c6351e7eec4011921b4469098b9ba

Here we can see that the file contains a string which looks very much like a Git object hash. We can then turn around and ask Git about the object:

$ git cat-file -t f95b2fe3b64c6351e7eec4011921b4469098b9ba
commit

$ git cat-file -p f95b2fe3b64c6351e7eec4011921b4469098b9ba
tree 0cae7dc167b255c0123c7c396fc48ce40fc35cfa
parent ef34a153025fffb8a498fff540f7c93963937291
author Chris Toomey <chris@ctoomey.com> 1441311544 -0400
committer Chris Toomey <chris@ctoomey.com> 1441311544 -0400

Another file in app dir

Now we have a full picture of what exactly our master branch is: a file, stored in .git/refs/heads. Its contents are the hash of a single commit. We know that commits contain a pointer to the working tree, as well as parent commits, and now we can add branches to the list of pointers in our view of the Git world.

Branches are just pointers; nothing more!
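
To make the point concrete, here is a hedged sketch that creates a branch without ever running git branch, simply by writing a commit hash into a new file under .git/refs/heads. The branch name by-hand is hypothetical, and the trick assumes Git's default file-based ref storage; in day-to-day work, git branch or git update-ref is the supported way to do this.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"           # throwaway sandbox repo
git init -q
git symbolic-ref HEAD refs/heads/master   # match the branch name used in these notes
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
git config commit.gpgsign false

echo "hello" > file.txt
git add file.txt
git commit -q -m "First commit"

hash=$(git rev-parse HEAD)
echo "$hash" > .git/refs/heads/by-hand    # a branch is just a file holding a hash

resolved=$(git rev-parse by-hand)         # Git resolves the hand-made branch
listed=$(git for-each-ref --format='%(refname:short)' refs/heads/by-hand)
echo "$listed -> $resolved"
```

Git treats the hand-written file exactly like a branch it created itself, because there is nothing more to a branch than that.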

Difference Between Tags and Branches

While branches and tags are very similar in that they both simply contain a reference to a commit, they differ in that branches can change what they point at, but tags cannot.

Tags exist to lock down and name ("tag", if you will) a specific version of the code. Branches exist to track the changes in our codebase over time, and will therefore update whenever we commit or merge.
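
The difference shows up as soon as we commit again. The sketch below assumes a throwaway repo and a lightweight tag named v0.1 (the name is borrowed from the listing style used in these notes, not from a real release): the tag keeps pointing at the first commit while the branch moves on.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"           # throwaway sandbox repo
git init -q
git symbolic-ref HEAD refs/heads/master   # match the branch name used in these notes
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
git config commit.gpgsign false

echo "one" > file.txt
git add file.txt
git commit -q -m "First commit"
first=$(git rev-parse HEAD)
git tag v0.1                              # lightweight tag on the first commit

echo "two" > file.txt
git commit -q -am "Second commit"         # the branch advances, the tag does not

tag_target=$(git rev-parse v0.1)
branch_target=$(git rev-parse master)
echo "tag:    $tag_target"
echo "branch: $branch_target"
```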

Remote Branches

For the small local sample repo we've been working with so far there are no remotes, but we can hop over to the local checkout of the Upcase repo to see an example that contains remotes:

$ tree .git/refs
.git/refs
├── heads/
│   ├── deck-last-attempt
│   ├── master
│   ├── ... (truncated)
│   └── welcome-trail
├── remotes/
│   ├── origin
│   │   ├── HEAD
│   │   ├── cjt-north-star-metric
│   │   ├── master
│   │   ├── mg-button-colors
│   │   └── ... (truncated)
│   ├── production
│   │   └── master
│   └── staging
│       ├── dashboard-staging
│       ├── ... (truncated)
│       └── master
└── tags/
    └── v0.1

With the more real-world example of the Upcase Git repo, we can see that there is now a third subdirectory alongside heads and tags in the .git/refs directory.

Within this remotes directory, there is a directory for each of our remotes, namely origin, staging, and production. This adds a bit more structure, but otherwise these objects are the same as our branches. We can confirm this by investigating the contents of one of these remote branch files:

$ cat .git/refs/remotes/origin/cjt-north-star-endpoint
3891a7bc21e5e0c69e71e8153bb8b4a67b80bff5

$ git cat-file -t 3891a7bc21e5e0c69e71e8153bb8b4a67b80bff5
commit

$ git cat-file -p 3891a7bc21e5e0c69e71e8153bb8b4a67b80bff5
tree 32022b6465ebf9f9e37b7e1caccb3c9e620dd465
parent 7262141ae317f56b567ed2f95505e6ca9bbe1605
author Chris Toomey <chris@ctoomey.com> 1433384047 -0400
committer Chris Toomey <chris@ctoomey.com> 1435239388 -0400

WIP analytics JSON endpoint

Again, we see more of the same. Remote branches are simply pointers to a commit. It's pointers all the way down, friends!
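
We can reproduce this without access to the Upcase repo. The sketch below assumes two throwaway directories, upstream and checkout (both names are made up), and uses a local clone to stand in for a real remote. Note that right after a clone Git may keep remote branches in .git/packed-refs rather than as individual files, so the example asks Git to resolve the ref instead of reading the file directly.

```shell
set -eu
work=$(mktemp -d)                                   # sandbox holding both repos
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master             # match the branch name used in these notes
git config user.name "Example User"                 # placeholder identity
git config user.email "user@example.com"
git config commit.gpgsign false
echo "one" > file.txt
git add file.txt
git commit -q -m "First commit"
upstream_tip=$(git rev-parse master)

cd "$work"
git clone -q upstream checkout                      # "upstream" becomes the remote named origin
cd checkout

remote_ref=$(git rev-parse refs/remotes/origin/master)
remote_type=$(git cat-file -t "$remote_ref")        # what kind of object does it point at?
git for-each-ref refs/remotes                       # list every remote branch with its hash
```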

HEAD Object

HEAD is the final object we need to be aware of to understand Git. HEAD, unlike the other objects we've discussed, is a singleton, meaning that there is only ever one HEAD.

HEAD identifies the currently checked out object. Typically, this is a branch (with that branch pointing to a commit), but it is possible to check out a commit directly, in which case HEAD would be pointing at that commit.

HEAD is a file just like our branch objects. It lives at the root of the .git/ directory and its contents are similarly simple:

$ cat .git/HEAD
ref: refs/heads/master

This is the normal mode for Git, where HEAD points to a branch, in this case the master branch. If we were to check out a commit directly, then HEAD would simply point at that commit:

$ git checkout 833c1ea

$ cat .git/HEAD
833c1ea55d76adcf48b5f7e933271fcc3e36f123

So once again we find ourselves with a pointer. HEAD points to a branch, that branch points to a commit, and that commit points to a working tree and parent commit. Pointers. All. The. Way. Down.
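
Both modes of HEAD can be observed in a sandbox. This is a minimal sketch assuming a throwaway repo and Git's default file-based ref storage, where .git/HEAD is a plain text file.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"           # throwaway sandbox repo
git init -q
git symbolic-ref HEAD refs/heads/master   # match the branch name used in these notes
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
git config commit.gpgsign false

echo "hello" > file.txt
git add file.txt
git commit -q -m "First commit"

attached=$(git symbolic-ref HEAD)         # normal mode: HEAD names a branch
tip=$(git rev-parse master)

git checkout -q --detach                  # check out the commit itself
detached=$(cat .git/HEAD)                 # HEAD now holds a raw commit hash

echo "attached: $attached"
echo "detached: $detached"
```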

Final Object Model Diagram

And, with the addition of HEAD, we have a complete picture of the Git object model.

Objects - blobs, trees, and commits.
Refs - branches, tags, and remote branches.
HEAD - The single pointer to rule them all.

[Diagram: Git Object Model]

Git Operations and Objects

Now that we understand the objects that are used throughout Git, we're going to zoom out a bit and focus primarily on commits and refs. Nearly all operations in Git involve commits, although typically these commits are referenced through refs like branches and remotes.

Checkout

Checking out a new branch is just the act of creating a ref file, specifically a "head", and populating it with the relevant commit hash.

$ git checkout -b new-branch

First, Git follows HEAD to the current branch to determine which commit hash that branch points at. With that info, Git creates a new file in .git/refs/heads with our new branch name as the file name, and the commit hash as the contents. Lastly, it updates HEAD to point at this new ref.
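
The steps above can be checked by reading the files before and after. This sketch assumes a throwaway repo and Git's default file-based ref storage, so that the new branch shows up as a file we can cat.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"           # throwaway sandbox repo
git init -q
git symbolic-ref HEAD refs/heads/master   # match the branch name used in these notes
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
git config commit.gpgsign false

echo "hello" > file.txt
git add file.txt
git commit -q -m "First commit"

before=$(git rev-parse HEAD)              # the commit we are currently on
git checkout -q -b new-branch

ref_contents=$(cat .git/refs/heads/new-branch)   # new ref file holds the same hash
head_contents=$(cat .git/HEAD)                   # HEAD now points at the new ref
echo "$ref_contents"
echo "$head_contents"
```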

Verbose Checkout

Similarly, we can use the verbose form of checkout, where we explicitly specify the base branch. For instance:

$ git checkout -b other-branch master

is largely the same as the last checkout, but instead of starting from HEAD, we start from the specified branch to determine the commit to point at, and use that to populate our new ref file.
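
A small sketch of the difference, assuming a throwaway repo where a hypothetical feature branch is one commit ahead of master: the new branch takes its starting commit from master even though HEAD is on feature.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"           # throwaway sandbox repo
git init -q
git symbolic-ref HEAD refs/heads/master   # match the branch name used in these notes
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
git config commit.gpgsign false

echo "base" > file.txt
git add file.txt
git commit -q -m "Base commit"

git checkout -q -b feature                # HEAD moves to feature
echo "more" >> file.txt
git commit -q -am "Feature work"          # feature is now ahead of master

git checkout -q -b other-branch master    # start from master, not from HEAD

other_tip=$(git rev-parse other-branch)
master_tip=$(git rev-parse master)
feature_tip=$(git rev-parse feature)
echo "other-branch: $other_tip"
echo "master:       $master_tip"
```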

Checking Out a File

There's an alternative form of checkout in which we check out a single file from a specific ref. Technically, we need a tree to get to a specific version of a file, but Git's pointer system also allows for something to be "tree-ish". When something is tree-ish, it will eventually lead to a single tree by dereferencing the pointers.

A commit is tree-ish because commits point at a single tree for the working directory.

Refs are tree-ish because they point at commits, which point at a tree.

Even HEAD is tree-ish by the same logic.

So if we use the following form of the checkout command:

$ git checkout master -- app/assets/javascripts/application.js

Git will begin by looking up the commit that master points at, then the working tree of that commit, and then walk down through the intermediate trees until it reaches the blob for app/assets/javascripts/application.js, and restore that version of the file.
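
Here is a hedged sketch of that walk, using a shortened hypothetical path (app/application.js) in a throwaway repo. The master:path syntax asks Git to do the same commit-to-tree-to-blob lookup and hand back the blob's hash.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"           # throwaway sandbox repo
git init -q
git symbolic-ref HEAD refs/heads/master   # match the branch name used in these notes
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
git config commit.gpgsign false

mkdir app
echo "v1" > app/application.js
git add app/application.js
git commit -q -m "Add application.js"

echo "local edit" > app/application.js          # uncommitted change we want to discard

blob=$(git rev-parse master:app/application.js) # commit -> tree -> app tree -> blob
stored=$(git cat-file -p "$blob")               # contents recorded in that blob

git checkout master -- app/application.js       # restore the file from master
restored=$(cat app/application.js)
echo "$restored"
# prints: v1
```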

Committing

Committing takes all of the staged objects and stores them as needed. This typically involves at least one new blob, and a new tree for the current version of the working directory.

It then builds a commit object that points at our new tree, as well as the commit we are currently on.

Lastly, it updates our checked out branch to point at this newly created commit.

$ git commit -m "Add new file"
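
We can watch the pointers move in a sandbox. This minimal sketch assumes a throwaway repo and a made-up file name, new.txt: the new commit lists the previous tip as its parent, and the branch ends up pointing at the new commit.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"           # throwaway sandbox repo
git init -q
git symbolic-ref HEAD refs/heads/master   # match the branch name used in these notes
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
git config commit.gpgsign false

echo "hello" > file.txt
git add file.txt
git commit -q -m "First commit"
old_tip=$(git rev-parse master)

echo "new" > new.txt
git add new.txt
git commit -q -m "Add new file"

new_tip=$(git rev-parse master)           # the branch now points at the new commit
parent=$(git rev-parse 'HEAD^')           # the new commit points back at the old tip
tree_type=$(git cat-file -t 'HEAD^{tree}')
git cat-file -p HEAD                      # shows the tree, parent, author, and message
```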

Fast-Forward Merge

A fast-forward merge is about the simplest operation we can perform. It creates no new objects, instead simply updating the current branch to reference a different commit.

$ git merge --ff-only feature
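
A rough way to confirm that nothing new is created is to count the files in the object database before and after the merge. The sketch assumes a throwaway repo with a hypothetical feature branch one commit ahead of master, and counts loose object files with find.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"           # throwaway sandbox repo
git init -q
git symbolic-ref HEAD refs/heads/master   # match the branch name used in these notes
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
git config commit.gpgsign false

echo "base" > file.txt
git add file.txt
git commit -q -m "Base commit"

git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Feature work"
git checkout -q master

objects_before=$(find .git/objects -type f | wc -l)
git merge -q --ff-only feature            # only the master pointer moves
objects_after=$(find .git/objects -type f | wc -l)

master_tip=$(git rev-parse master)
feature_tip=$(git rev-parse feature)
echo "objects: $objects_before before, $objects_after after"
```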

Merge

A traditional merge is much more interesting. We start with two diverging histories, and Git creates a new tree for us from the two existing trees.

Once it has the new tree, Git will create a new commit that points at this tree. Lastly, the branch ref will be updated to point at this new commit.

Comparing these two merge strategies, it becomes clear why we prefer the fast-forward only merges. In a fast-forward merge we are just updating a pointer, but the code is not changed. In a traditional merge, Git does its best to bring together two different versions of the code, creating a new commit and tree that we have not interacted with.

$ git merge feature
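
To see the new commit and tree, here is a minimal sketch with two diverging branches in a throwaway repo (file and branch names are made up, and the changes touch different files so there is no conflict). The resulting merge commit has two parents, and its tree contains the work from both sides.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"           # throwaway sandbox repo
git init -q
git symbolic-ref HEAD refs/heads/master   # match the branch name used in these notes
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
git config commit.gpgsign false

echo "base" > file.txt
git add file.txt
git commit -q -m "Base commit"

git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Feature work"
feature_tip=$(git rev-parse feature)

git checkout -q master
echo "master" > master.txt
git add master.txt
git commit -q -m "Master work"            # histories have now diverged
master_before=$(git rev-parse master)

git merge -q --no-edit feature            # creates a merge commit with a new tree

first_parent=$(git rev-parse 'HEAD^1')    # where master was
second_parent=$(git rev-parse 'HEAD^2')   # where feature is
git cat-file -e HEAD:feature.txt          # both files exist in the merged tree
git cat-file -e HEAD:master.txt
git cat-file -p HEAD
```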

Rebasing

So with this comparison of traditional and fast-forward merges in mind, we can talk about our good friend rebase. Rebase can be performed when we have new commits on both our feature branch and our "upstream" branch (typically master). We want to update the commits on our branch so they include the changes on master.

When we rebase, we essentially replay our work on the current version of the upstream branch. Git does this by calculating each of the diffs for the commits unique to our branch, then applies them onto the upstream branch one by one. Each application of a diff creates a new commit, reusing the associated commit message and author details.

Note that the old commits still exist, but they are now orphaned. No refs point to them any longer and so they are essentially unreachable, although we know from the discussion of the reflog in the first video that we could easily restore them by checking the reflog.

Once all the new commits have been created, our branch is updated to point at the tip commit of our rebased group.

From here, we could fast-forward merge our branch into master, as our branch now sits directly on top of master's history. The key difference between this and a traditional merge is that all of the commits here were created by us, and we get to interact with them and test them as needed before merging them into master.

$ git rebase master
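
The sketch below replays a single commit in a throwaway repo (branch and file names are made up). It checks three things described above: the rebased commit is a new object with a new hash, its parent is the tip of master, and the old commit is still in the object database.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"           # throwaway sandbox repo
git init -q
git symbolic-ref HEAD refs/heads/master   # match the branch name used in these notes
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
git config commit.gpgsign false

echo "base" > file.txt
git add file.txt
git commit -q -m "Base commit"

git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Feature work"
old_tip=$(git rev-parse feature)

git checkout -q master
echo "master" > master.txt
git add master.txt
git commit -q -m "Master work"            # upstream has moved on
master_tip=$(git rev-parse master)

git checkout -q feature
git rebase -q master                      # replay our commit on top of master

new_tip=$(git rev-parse feature)
new_parent=$(git rev-parse 'feature^')
new_subject=$(git log -1 --format=%s feature)   # message carried over to the new commit
old_type=$(git cat-file -t "$old_tip")          # the orphaned commit still exists
echo "old: $old_tip"
echo "new: $new_tip"
```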

Interactive Rebase

Interactive rebase is very similar. We begin with a set of commits, typically on a feature branch and ahead of master, and we perform our interactive rebase. When we squash them down, we create a new commit using the tree of our former tip commit, and compose a new commit message.

Once again, we can see that the old commits live on despite being orphaned, and we can therefore get back to them as needed.

$ git rebase --interactive master
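
An interactive rebase normally opens an editor, so the sketch below scripts the edit instead. It is only a stand-in for what we would do by hand: it assumes a throwaway repo with two made-up commits, uses GIT_SEQUENCE_EDITOR with sed to change the second "pick" in the todo list to "squash", and sets GIT_EDITOR=true to accept the combined commit message as offered.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"           # throwaway sandbox repo
git init -q
git symbolic-ref HEAD refs/heads/master   # match the branch name used in these notes
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
git config commit.gpgsign false

echo "base" > file.txt
git add file.txt
git commit -q -m "Base commit"

git checkout -q -b feature
echo "a" > a.txt
git add a.txt
git commit -q -m "First feature commit"
echo "b" > b.txt
git add b.txt
git commit -q -m "Second feature commit"
old_tip=$(git rev-parse HEAD)
old_tree=$(git rev-parse 'HEAD^{tree}')

# Scripted todo edit: squash the second commit into the first.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2s/^pick/squash/'" GIT_EDITOR=true \
  git rebase -q -i master

new_tip=$(git rev-parse HEAD)
new_tree=$(git rev-parse 'HEAD^{tree}')           # same tree as the former tip
ahead=$(git rev-list --count master..HEAD)        # commits on feature beyond master
old_type=$(git cat-file -t "$old_tip")            # the old tip is orphaned, not deleted
echo "commits ahead of master: $ahead"
```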
