One of the Git commands I use the most is:

git log --oneline -10

It allows me to have a quick look at the history of a repo. In a previous post, I showed how to inflate a commit object. Now, I want to explore how to get the Git history log.

You can find the source code of the script here.

Creating a new Git repo

First, let’s create a new folder and initialize a Git repo with a few dummy commits:

mkdir git_log_elixir
cd git_log_elixir
git init

git commit --allow-empty -m foo
git commit --allow-empty -m bar
git commit --allow-empty -m baz

git log --oneline

and you should get something like this:

938814a (HEAD -> main) baz
53c187c bar
ace12fe foo

Reading the HEAD reference

The .git/HEAD file contains the path to the reference of the checked-out branch. In my case:

ref: refs/heads/main

We need to read this path and then retrieve the hash from the reference file.

.git
├── HEAD ← this file contains the path to main
└── refs
    ├── heads
    │   └── main ← this file contains the commit hash
    └── tags

Create a new file git_log.exs and add the following:

read_head = fn ->
  head_path = Path.join([".git", "HEAD"])

  <<"ref: " :: binary, ref_path :: binary>> = File.read!(head_path)

  [".git", ref_path]
  |> Path.join()
  |> String.trim()
  |> File.read!()
  |> String.trim()
end

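Note how pattern matching on the binary extracts the path from the HEAD file: the literal "ref: " prefix is matched and the remainder is bound to ref_path. The remainder still ends with a newline, which is why the joined path is trimmed before reading. Let’s test our progress so far by adding a test function and calling it immediately: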

_test = fn ->
  read_head.()
  |> IO.puts()
end.()

Then, in the terminal, we run the script:

> elixir git_log.exs

938814ada758886d0840b31ce3f791619cd65a43
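One caveat: read_head assumes that HEAD points at a branch. With a detached HEAD, the file holds a commit hash directly and the "ref: " match would fail. The following is only a sketch of how both cases could be told apart; parse_head is a hypothetical helper that is not part of the script, and it works on the file content rather than on the file itself.

```elixir
# Hypothetical helper (not part of the original script):
# classify the content of .git/HEAD.
parse_head = fn content ->
  case String.trim(content) do
    # Symbolic ref: HEAD points at a branch reference file.
    "ref: " <> ref_path -> {:ref, ref_path}
    # Assumption: anything else is a raw commit hash (detached HEAD).
    hash -> {:detached, hash}
  end
end

IO.inspect(parse_head.("ref: refs/heads/main\n"))
IO.inspect(parse_head.("938814ada758886d0840b31ce3f791619cd65a43\n"))
```

Returning a tagged tuple lets the caller decide whether it still has to read a reference file or already has the hash.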

Reading the commit object

As we saw in a previous post, Git uses the first two characters of the hash as the folder name, and the rest as the file name of the “commit object”.

We’ll copy the inflate function from the previous post and place it after read_head:

inflate = fn binary ->
  z = :zlib.open()
  :zlib.inflateInit(z)
  # inflate/2 returns iodata that may come in several chunks, so join them
  uncompressed = IO.iodata_to_binary(:zlib.inflate(z, binary))
  :zlib.close(z)

  uncompressed
end

Next, we’ll create a function that reads the object file for a given hash. After the inflate definition, add:

read_file_from_hash = fn <<dir :: binary-size(2), filename :: binary>> ->
  path = Path.join([".git", "objects", dir, filename])

  path
  |> File.read!()
  |> inflate.()
end

Remember that the content of the file is zlib-compressed, so we need to inflate it.

We can test it by replacing the test function with:

_test = fn ->
  read_head.()
  |> read_file_from_hash.()
  |> IO.puts()
end.()

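If you run it, you should see the content of the last commit object. The leading "commit 220" is the object header (type and size), which is separated from the content by a NUL byte that the terminal does not display: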

commit 220tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
parent 53c187cd284f1d3ccbcfefb16a671d776a3fc97d
author Luis Ferreira <test@example.com> 1650306402 -0300
committer Luis Ferreira <test@example.com> 1650306402 -0300

baz
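The inflated object follows the layout "<type> <size>", a NUL byte, then the content. As a rough sketch, the header could be split off as shown here; parse_object is a hypothetical helper, and the object is built in memory (with made-up content) so the example runs without a repository.

```elixir
# Hypothetical helper (not part of the original script):
# split "<type> <size>\0<content>" into its parts.
parse_object = fn raw ->
  [header, content] = :binary.split(raw, <<0>>)
  [type, size] = String.split(header, " ")
  {type, String.to_integer(size), content}
end

# Build a synthetic object the way Git stores it:
# header, NUL byte, content, all zlib-compressed.
content = "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n\nfoo\n"
stored = :zlib.compress("commit #{byte_size(content)}\0" <> content)

{type, size, body} = stored |> :zlib.uncompress() |> parse_object.()
IO.inspect({type, size, body == content})
```

The helpers in this post work without stripping the header because they only look at lines that start with specific keywords.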

Parsing the commit

Now we need to extract some information out of the commit. So let’s write a few helper functions to do that. Add them after the read_file_from_hash function.

get_message = fn content ->
  content
  |> String.split("\n\n")
  |> Enum.at(1)
  |> String.trim()
end

get_date = fn content ->
  content
  |> String.split("\n")
  |> Enum.filter(fn "author" <> _rest -> true; _ -> false end)
  |> List.first()
  |> String.reverse()
  |> (fn <<_offset :: binary-size(6), date :: binary-size(10), _ :: binary>> -> String.reverse(date) end).()
end

get_short_hash = fn <<commit :: binary-size(7), _ :: binary>> ->
  commit
end

get_parents = fn content ->
  # Only the header block (before the first blank line) can hold parent lines,
  # so a message line that starts with "parent" is never picked up.
  [headers | _message] = String.split(content, "\n\n", parts: 2)

  for "parent " <> commit <- String.split(headers, "\n"), do: String.trim(commit)
end

We now have helper functions to get the commit message, the date, the short hash and the parent commits.

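To get the date of the commit, we look at the author line, which ends with a 10-digit Unix timestamp followed by the time zone offset. Since those are the last fields of the line, we reverse the string, pattern match the offset (plus its leading space) and the timestamp from the front, and then reverse the timestamp back.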
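For comparison, here is a sketch of another way to read the same fields, splitting the author line on spaces instead of reversing it. get_timestamp is a hypothetical helper and the sample content is abridged from the commit shown earlier; it also converts the timestamp into a DateTime to show what the number stands for.

```elixir
# Hypothetical helper (not part of the original script):
# return {unix_timestamp, utc_offset} taken from the author line.
get_timestamp = fn content ->
  author_line =
    content
    |> String.split("\n")
    |> Enum.find(&String.starts_with?(&1, "author "))

  # The last two space-separated fields are the timestamp and the offset.
  [offset, timestamp | _] = author_line |> String.split(" ") |> Enum.reverse()
  {String.to_integer(timestamp), offset}
end

# Abridged sample content, based on the commit object printed above.
sample =
  "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n" <>
    "author Luis Ferreira <test@example.com> 1650306402 -0300\n\nbaz\n"

{timestamp, offset} = get_timestamp.(sample)
IO.inspect({DateTime.from_unix!(timestamp), offset})
```

Working with an integer rather than a string also makes the value safe to compare numerically, whatever its number of digits.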

Let’s test our progress by updating the test function.

_test = fn ->
  commit_hash = read_head.()
  commit = read_file_from_hash.(commit_hash)

  short_hash = get_short_hash.(commit_hash)
  date = get_date.(commit)
  msg = get_message.(commit)
  parents = get_parents.(commit) |> Enum.join(" ")

  IO.puts "#{short_hash} #{date} #{msg} #{parents}"
end.()

When we run the script, we get:

> elixir git_log.exs
938814a 1650306402 baz 53c187cd284f1d3ccbcfefb16a671d776a3fc97d

We are getting close!

Building the graph

Git stores the history as a graph. Every commit references its parent commit (a merge commit has more than one), until we reach the first commit, which doesn’t reference any other commit.

To get a list of all commits, we need to follow the parents recursively until we reach the first commit. At that point, we can return the list of all commits. The following is a naive implementation of a function that traverses the graph, but it’ll do the job. Put it after the get_parents function.

get_commits = fn
  [], _fun ->
    []

  commit_hashes, fun ->
    for commit_hash <- commit_hashes do
      ancestors =
        commit_hash
        |> read_file_from_hash.()
        |> get_parents.()
        |> fun.(fun)

      [commit_hash | ancestors]
    end
    |> List.flatten()
end

This self-referencing function traverses the graph depth-first, accumulating the commits in a single list.

Let’s test it. Since an anonymous function cannot call itself by name, get_commits takes a reference to itself as its second argument (a trick in the spirit of the Y combinator) so that it can make the recursive call.

_test = fn ->
  read_head.()
  |> List.wrap()
  |> get_commits.(get_commits)
  |> IO.inspect()
end.()

and we get

> elixir git_log.exs
["938814ada758886d0840b31ce3f791619cd65a43",
 "53c187cd284f1d3ccbcfefb16a671d776a3fc97d",
 "ace12fe245e00410f3462f9eba3c9862cfcb59bb"]

Those are the three commits of the repo!
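With a linear history every commit is visited once, but when the history contains merges the naive traversal reaches shared ancestors through every path, which is why duplicates have to be removed afterwards. As a sketch of one alternative, the traversal can carry a set of visited commits. The graph below is a hypothetical in-memory map (commit to list of parents) standing in for the object files, and walk is an illustrative name; it visits commits breadth-first.

```elixir
# Hypothetical in-memory history: commit => list of parents ("d" is a merge).
graph = %{
  "d" => ["b", "c"],
  "b" => ["a"],
  "c" => ["a"],
  "a" => []
}

# Same self-passing trick as get_commits, plus a set of visited commits.
walk = fn
  [], _seen, acc, _fun ->
    Enum.reverse(acc)

  [hash | rest], seen, acc, fun ->
    if MapSet.member?(seen, hash) do
      # Already listed through another path, so skip it.
      fun.(rest, seen, acc, fun)
    else
      parents = Map.fetch!(graph, hash)
      fun.(rest ++ parents, MapSet.put(seen, hash), [hash | acc], fun)
    end
end

history = walk.(["d"], MapSet.new(), [], walk)
IO.inspect(history)
```

In the script itself, the Map.fetch!/2 lookup would presumably be replaced by reading the object and calling get_parents; that part is left out here to keep the example self-contained.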

Printing the Git history

Now that we have the hashes, we can load each commit and print the data we want. Let’s add a few more helper functions before the _test function.

load_commits = fn commits ->
  for commit <- commits do
    {commit, read_file_from_hash.(commit)}
  end
end

extract_commit_data = fn commits ->
  for {commit, content} <- commits do
    {
      get_short_hash.(commit),
      get_date.(content),
      get_message.(content)
    }
  end
end

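sort = fn commits ->
  # Newest first. Unlike sorting ascending and reversing, :desc (Elixir 1.10+)
  # keeps the traversal order for commits that share a timestamp.
  Enum.sort_by(commits, fn {_, date, _} -> date end, :desc)
end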

pretty_print = fn commits ->
  for {short_hash, _, msg} <- commits do
    "#{short_hash} #{msg}"
  end
  |> Enum.join("\n")
end

Pretty straightforward: we load each commit, extract the information we need, sort the commits by date (newest first), and format the fields we want to print.
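One detail worth checking when sorting by date is what happens to commits that share a timestamp, which is easy to hit when several commits are created within the same second. Sorting ascending and then reversing also reverses those ties, whereas Enum.sort_by/3 with :desc (available since Elixir 1.10) is stable and keeps them in their incoming order. A small sketch, in which two of the three timestamps are made up for illustration:

```elixir
# Tuples shaped like the output of extract_commit_data.
# Only the first timestamp comes from the post; the other two are made up.
commits = [
  {"938814a", "1650306402", "baz"},
  {"53c187c", "1650306402", "bar"},
  {"ace12fe", "1650306300", "foo"}
]

by_date = fn {_, date, _} -> String.to_integer(date) end

# Ascending sort followed by a reverse: the two tied commits swap places.
reversed = commits |> Enum.sort_by(by_date) |> Enum.reverse()

# Stable descending sort: ties keep the order they arrived in.
stable = Enum.sort_by(commits, by_date, :desc)

IO.inspect(Enum.map(reversed, &elem(&1, 0)))
IO.inspect(Enum.map(stable, &elem(&1, 0)))
```

Because the traversal already lists a commit before its parents, keeping ties in their incoming order gives the expected newest-first output.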

Let’s put it all together:

_test = fn ->
  read_head.()
  |> List.wrap()
  |> get_commits.(get_commits)
  |> Enum.uniq()
  |> load_commits.()
  |> extract_commit_data.()
  |> sort.()
  |> pretty_print.()
  |> IO.puts()
end.()

Then, we can run it:

> elixir git_log.exs

938814a baz
53c187c bar
ace12fe foo

Perfect! It worked!

Conclusion

By starting with the HEAD file, we’re able to reconstruct the history of the repository. We read the hash of the currently checked-out branch and, from there, follow the parent commits until we have the full history. Then it is just a matter of extracting the information we want to print from every commit object.

It was a fun ride! I hope you enjoyed it.