Git history with Elixir
One of the Git commands I use the most is
git log --oneline -10
It allows me to have a quick look at the history of a repo. In a previous post, I showed how to inflate a commit object. Now, I want to explore how to get the Git history log.
You can find the source code of the script here.
Creating a new Git repo
First, let’s create a new folder and initialize a Git repo with a few dummy commits:
mkdir git_log_elixir
cd git_log_elixir
git init
git commit --allow-empty -m foo
git commit --allow-empty -m bar
git commit --allow-empty -m baz
git log --oneline
and you should get something like this:
938814a (HEAD -> main) baz
53c187c bar
ace12fe foo
Reading the HEAD reference
The .git/HEAD file contains the path to the reference of the checked-out branch, in my case
ref: refs/heads/main
We need to read this path and then retrieve the hash from the reference file.
.git
├── HEAD          ← this file contains the path to main
└── refs
    ├── heads
    │   └── main  ← this file contains the commit hash
    └── tags
Create a new file git_log.exs and add the following:
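read_head = fn ->
  # HEAD holds a symbolic ref such as "ref: refs/heads/main\n".
  # This match fails on a detached HEAD, where the file holds a raw hash.
  "ref: " <> ref_path = File.read!(Path.join(".git", "HEAD"))

  # The branch file holds the hash of the commit at the tip of the branch
  [".git", String.trim(ref_path)]
  |> Path.join()
  |> File.read!()
  |> String.trim()
end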
Note how pattern matching on the "ref: " prefix extracts the path from the HEAD file. Let’s test our progress so far by adding a test function and calling it immediately:
_test = fn ->
  read_head.()
  |> IO.puts()
end.()
and in the terminal, we call the script
> elixir git_log.exs
938814ada758886d0840b31ce3f791619cd65a43
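The pattern match in read_head assumes that HEAD contains a symbolic ref. When a commit is checked out directly (a detached HEAD), the file holds the hash itself and the match fails. Here is a minimal sketch that handles both cases. The name parse_head is hypothetical, it is not part of the original script, and it assumes 40-character SHA-1 hashes. It works on a plain string, so you can try it without a repository:

```elixir
# Hypothetical helper: classifies the raw content of .git/HEAD.
parse_head = fn content ->
  case String.trim(content) do
    # Symbolic ref: HEAD points at a branch file under .git
    "ref: " <> ref_path -> {:ref, ref_path}
    # Detached HEAD: the file holds a 40-character SHA-1 hash directly
    <<hash::binary-size(40)>> -> {:detached, hash}
    other -> {:error, other}
  end
end

IO.inspect(parse_head.("ref: refs/heads/main\n"))
IO.inspect(parse_head.("938814ada758886d0840b31ce3f791619cd65a43\n"))
```

Returning a tagged tuple lets the caller decide whether it still needs to read the branch file or already has the commit hash.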
Reading the commit object
As we saw in a previous post, Git uses the first two characters of the hash as the folder name, and the rest as the file name of the “commit object”.
We’ll copy the inflate script from the previous post after read_head
inflate = fn binary ->
  z = :zlib.open()
  :zlib.inflateInit(z)
  # inflate/2 returns iodata that may span several chunks, so join them
  uncompressed = z |> :zlib.inflate(binary) |> IO.iodata_to_binary()
  :zlib.close(z)
  uncompressed
end
and we’ll create a function to read the file from the hash. After the inflate definition, add:
read_file_from_hash = fn <<dir :: binary-size(2), filename :: binary>> ->
  # The first two characters of the hash are the folder, the rest is the file name
  [".git", "objects", dir, filename]
  |> Path.join()
  |> File.read!()
  |> inflate.()
end
Remember that the content of the file is compressed with zlib (not gzip), so we need to inflate it.
We can test it by replacing the test function with:
_test = fn ->
  read_head.()
  |> read_file_from_hash.()
  |> IO.puts()
end.()
If you run it, you should see the information of the last commit
commit 220tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
parent 53c187cd284f1d3ccbcfefb16a671d776a3fc97d
author Luis Ferreira <test@example.com> 1650306402 -0300
committer Luis Ferreira <test@example.com> 1650306402 -0300

baz
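The first line reads "commit 220tree ..." because every Git object starts with a header of the form "<type> <size>" followed by a NUL byte, which the terminal does not render. Here is a small sketch of separating that header from the body. The name split_object is hypothetical, and the sample object is built in memory rather than read from a repository:

```elixir
# Hypothetical helper: separates the "<type> <size>\0" header from the body.
split_object = fn raw ->
  # :binary.split/2 splits at the first NUL byte only
  [header, body] = :binary.split(raw, <<0>>)
  [type, size] = String.split(header, " ")
  {type, String.to_integer(size), body}
end

# Build a sample object in memory and zlib-compress it, as Git does on disk.
body = "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n\nfoo\n"
raw = "commit #{byte_size(body)}" <> <<0>> <> body
compressed = :zlib.compress(raw)

{type, size, content} = compressed |> :zlib.uncompress() |> split_object.()
IO.puts("#{type} (#{size} bytes)")
IO.puts(content)
```

The helpers in this post work on the raw content, header included, so this split is optional. It mostly helps when you want to check the object type before parsing it.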
Parsing the commit
Now we need to extract some information out of the commit, so let’s write a few helper functions to do that. Add them after the read_file_from_hash function.
get_message = fn content ->
  content
  # Headers and message are separated by the first blank line;
  # parts: 2 keeps multi-paragraph messages intact
  |> String.split("\n\n", parts: 2)
  |> Enum.at(1)
  |> String.trim()
end

get_date = fn content ->
  content
  |> String.split("\n")
  |> Enum.find(&String.starts_with?(&1, "author "))
  |> String.reverse()
  |> (fn <<_offset :: binary-size(6), date :: binary-size(10), _ :: binary>> -> String.reverse(date) end).()
end

get_short_hash = fn <<commit :: binary-size(7), _ :: binary>> ->
  commit
end

get_parents = fn content ->
  # Parent lines live in the headers, before the first blank line
  [headers | _message] = String.split(content, "\n\n", parts: 2)

  for "parent " <> hash <- String.split(headers, "\n"), do: String.trim(hash)
end
We now have helper functions to get the commit message, the date, the short hash and the parent commits.
To get the date, we look at the author line. It ends with a 10-digit Unix timestamp followed by a space and a 5-character timezone offset. So we reverse the string, skip the first 6 bytes (the offset and the space), take the next 10 bytes, and reverse them back.
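If the double reverse feels too clever, the same fields can be read by splitting the author line on spaces and taking the last two. This is only a sketch of an alternative. The name parse_author_line is hypothetical and it is not used in the rest of the script:

```elixir
# Hypothetical alternative to get_date: the last two space-separated fields
# of an author line are the Unix timestamp and the timezone offset.
parse_author_line = fn "author " <> rest ->
  [offset, timestamp | _name_and_email] =
    rest |> String.split(" ") |> Enum.reverse()

  {String.to_integer(timestamp), offset}
end

line = "author Luis Ferreira <test@example.com> 1650306402 -0300"
{timestamp, offset} = parse_author_line.(line)
IO.puts("#{timestamp} #{offset}")
```

Returning the timestamp as an integer also makes any later sorting numeric instead of relying on string comparison.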
Let’s test our progress by updating the test function.
_test = fn ->
  commit_hash = read_head.()
  commit = read_file_from_hash.(commit_hash)
  short_hash = get_short_hash.(commit_hash)
  date = get_date.(commit)
  msg = get_message.(commit)
  parents = get_parents.(commit) |> Enum.join(" ")
  IO.puts("#{short_hash} #{date} #{msg} #{parents}")
end.()
When we run the script, we get:
> elixir git_log.exs
938814a 1650306402 baz 53c187cd284f1d3ccbcfefb16a671d776a3fc97d
We are getting close!
Building the graph
Git stores the history as a graph. Every commit references its parent commits (a merge commit has more than one), until we reach the first commit, which doesn’t reference any other commit.
To get a list of all commits, we need to follow the parents recursively until we reach the first commit. At that point we can return the list of all commits. The following is a naive implementation of a function to traverse the graph, but it’ll do the job. Put it after the get_parents function.
get_commits = fn
  [], _fun ->
    []

  commit_hashes, fun ->
    for commit_hash <- commit_hashes do
      parents =
        commit_hash
        |> read_file_from_hash.()
        |> get_parents.()
        |> fun.(fun)

      [commit_hash | parents]
    end
    |> List.flatten()
end
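An anonymous function can’t call itself by name, so get_commits receives a reference to itself as its second argument and uses it for the recursive call, a trick in the spirit of the Y combinator. It traverses the graph depth-first, accumulating the commits in a single list.
Let’s test it. Note that the caller also has to pass the function to itself: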
_test = fn ->
  read_head.()
  |> List.wrap()
  |> get_commits.(get_commits)
  |> IO.inspect()
end.()
and we get
> elixir git_log.exs
["938814ada758886d0840b31ce3f791619cd65a43",
"53c187cd284f1d3ccbcfefb16a671d776a3fc97d",
"ace12fe245e00410f3462f9eba3c9862cfcb59bb"]
Those are the three commits of the repo!
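In a history with merges, get_commits reaches shared ancestors once per path, so the same commit can be read and listed several times. Here is a minimal sketch of a traversal that remembers which hashes it has already seen. The Walker module and the in-memory graph are hypothetical stand-ins, not part of the original script:

```elixir
defmodule Walker do
  # Depth-first walk that records each commit the first time it is reached.
  # `parents_of` is any function that returns the parent hashes of a commit.
  def walk(hashes, parents_of) do
    {commits, _seen} = visit(hashes, parents_of, {[], MapSet.new()})
    Enum.reverse(commits)
  end

  defp visit([], _parents_of, acc), do: acc

  defp visit([hash | rest], parents_of, {commits, seen} = acc) do
    if MapSet.member?(seen, hash) do
      visit(rest, parents_of, acc)
    else
      acc = {[hash | commits], MapSet.put(seen, hash)}
      # Walk this commit's ancestors first, then the remaining hashes
      acc = visit(parents_of.(hash), parents_of, acc)
      visit(rest, parents_of, acc)
    end
  end
end

# Toy history: "d" is a merge of "b" and "c", which both descend from "a".
graph = %{"d" => ["b", "c"], "b" => ["a"], "c" => ["a"], "a" => []}

IO.inspect(Walker.walk(["d"], &Map.fetch!(graph, &1)))
```

With the helpers from this post, parents_of could be a function that pipes the hash through read_file_from_hash and get_parents. A named module function is used here because it can call itself directly, so no self-reference has to be passed around.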
Printing the Git history
Now that we have the hashes, we can load each commit and print the data we want. Let’s add a few more helper functions before the _test function.
load_commits = fn commits ->
  for commit <- commits do
    {commit, read_file_from_hash.(commit)}
  end
end

extract_commit_data = fn commits ->
  for {commit, content} <- commits do
    {
      get_short_hash.(commit),
      get_date.(content),
      get_message.(content)
    }
  end
end

sort = fn commits ->
  # Newest first; :desc keeps the traversal order for commits that share a timestamp
  Enum.sort_by(commits, fn {_, date, _} -> date end, :desc)
end

pretty_print = fn commits ->
  for {short_hash, _, msg} <- commits do
    "#{short_hash} #{msg}"
  end
  |> Enum.join("\n")
end
Pretty straightforward: we load each commit, extract the information we need, sort the commits by date (newest first) and format the lines we want to print.
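pretty_print drops the timestamp, which is the seconds since the Unix epoch. If you wanted to show it, a sketch along these lines would turn it into a readable date. The name format_date is hypothetical, and for simplicity the result is in UTC and ignores the timezone offset stored in the commit:

```elixir
# Hypothetical helper: turns the 10-digit timestamp string into an
# ISO 8601 date in UTC. The commit's own timezone offset is ignored.
format_date = fn timestamp ->
  timestamp
  |> String.to_integer()
  |> DateTime.from_unix!()
  |> DateTime.to_iso8601()
end

IO.puts(format_date.("1650306402"))
```

For the sample commit above this prints 2022-04-18T18:26:42Z.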
Let’s put it all together:
_test = fn ->
  read_head.()
  |> List.wrap()
  |> get_commits.(get_commits)
  |> Enum.uniq()
  |> load_commits.()
  |> extract_commit_data.()
  |> sort.()
  |> pretty_print.()
  |> IO.puts()
end.()
Then, we can run it:
> elixir git_log.exs
938814a baz
53c187c bar
ace12fe foo
Perfect! It worked!
Conclusion
By starting with the HEAD file, we’re able to reconstruct the history of the repository. We get the hash of the currently checked-out branch and, from there, follow the parent commits until we have the full history. Then it is just a matter of extracting the information we want to print from every commit object.
It was a fun ride! I hope you enjoyed it.