Git and large repos

As you may have heard, at Shopify we prefer a monolithic application to microservices. Dealing with a monolith repo can become tricky.

Today the Shopify project directory takes up more than 2 GB on my SSD, plus 2.5 GB of files that aren’t tracked by version control, such as Ruby and JavaScript dependencies and caches.

I imagine it’s not the largest Git repo in the world, but Git still sometimes breaks on a repo this large. Last week I ran into the following symptoms:

There were 12 background git fetch processes consuming 360% of CPU and a ton of memory. Since Git had launched these processes automatically in the background, something had definitely gone wrong.

Killing them didn’t help: Git launched them again. I also tried disabling auto GC (git config --global gc.auto 0), but that didn’t help either.
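
For reference, the setting that controls automatic garbage collection is gc.auto, and a value of 0 turns it off. Here is a minimal sketch of setting it, checking it, and undoing it later. Keep in mind that it only governs automatic garbage collection, so it may well have no effect on other background activity:

```shell
# Turn off automatic garbage collection for all repositories of the current user.
git config --global gc.auto 0

# Print the value stored in the global config to confirm the change.
git config --global --get gc.auto

# To go back to Git's default later, remove the override:
# git config --global --unset gc.auto
```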

As I found out, Git has a command to verify the integrity of the repo database: git fsck. In a healthy repo the output of git fsck should be empty, but in my case it printed a huge list of invalid objects.
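
Here is a rough sketch of how such a check can be scripted. The helper name check_repo and the path in the usage line are made up for illustration. I pass --no-dangling so that harmless "dangling" notices don’t clutter the output; git fsck exits with a non-zero status when it finds real problems:

```shell
# check_repo is a hypothetical helper name; pass it the path to a repository.
check_repo() {
  repo="$1"
  # --full also checks packed objects and alternate object stores;
  # --no-dangling hides notices about unreferenced but intact objects.
  if git -C "$repo" fsck --full --no-dangling; then
    echo "no problems found in $repo"
  else
    echo "git fsck reported problems in $repo" >&2
    return 1
  fi
}

# Usage (hypothetical path):
# check_repo ~/src/my-large-repo
```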

Afterwards, I ran garbage collection manually with git gc --prune=now, and that finally solved the issue.
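
If you want to repeat this, a sketch along these lines runs the cleanup and then re-checks the repo. The helper name prune_and_recheck is made up, and I can’t promise it will fix every kind of corruption. Note that --prune=now deletes unreachable objects immediately instead of after the default two-week grace period, so anything only those objects referenced can’t be recovered afterwards:

```shell
# prune_and_recheck is a hypothetical helper name; pass it the path to a repository.
prune_and_recheck() {
  repo="$1"
  # Repack the object database and drop unreachable objects right away.
  git -C "$repo" gc --prune=now || return 1
  # Re-run the integrity check to see whether the earlier errors are gone.
  git -C "$repo" fsck --full --no-dangling
}

# Usage (hypothetical path):
# prune_and_recheck ~/src/my-large-repo
```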

I hope my story helps someone with similar Git symptoms.

