Should You Squash Merge or Merge Commit?
Should you squash commits or merge them as they are?
There are certainly strong proponents on both sides of this debate. It becomes more complex when you consider the third option - rebasing. I never rebase, so I’m not going to discuss it here. The choice between merging commits as they are and “squashing” them is not always a clear one - both sides of the argument will fiercely extol the advantages of their preferred approach and the disadvantages of the opposite approach. It soon becomes a philosophical debate.
Merging commits as they are could be considered the default behaviour of Git. The source control history is preserved exactly as it happened. The argument for this approach is that the commits form a history of the codebase and are therefore valuable because of the story they tell. The commits can be used to track down bugs, for example. The idea is that it is easier to find when a bug was introduced if there are lots of small commits rather than many commits compressed (squashed) into one. The argument against this is that the extra noise in the history makes that kind of investigation harder in practice.
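To make the bug-hunting argument concrete, here is a minimal sketch of a binary search over a commit history, which is the idea behind `git bisect`. The commit names, the `first_bad_commit` helper and the `is_bad` check are all hypothetical, and the sketch assumes that every commit after the first bad one is also bad.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search an oldest-to-newest history for the first bad commit.

    Assumes the newest commit is bad and that every commit after the
    first bad one is also bad.
    """
    low, high = 0, len(commits) - 1
    while low < high:
        mid = (low + high) // 2
        if is_bad(commits[mid]):
            high = mid      # the bug is at mid or earlier
        else:
            low = mid + 1   # the bug was introduced after mid
    return commits[low]


# Hypothetical feature branch: the bug arrives with the "add cache" commit.
full_history = ["add model", "add view", "add cache", "fix typo", "add tests"]
bad_from = full_history.index("add cache")
culprit = first_bad_commit(
    full_history, lambda commit: full_history.index(commit) >= bad_from
)

# The same work squashed into one commit: the search can only point at
# the whole feature, not at the change that introduced the bug.
squashed_history = ["feature: user profiles (squashed)"]
squashed_culprit = first_bad_commit(squashed_history, lambda commit: True)
```

With the full history the search lands on the one small commit that introduced the bug. With the squashed history it can only return the single large commit, leaving the rest of the investigation to be done by hand.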
Squash merges, as their proponents argue, are more valuable than merge commits because an entire new feature or bug fix can be compressed into a single commit, which is therefore easier to code review and to read at some point in the future. As previously discussed, another benefit of squash merges is that they prevent a noisy source control history, which can include typo fixes, files that were accidentally missed earlier, etc. It keeps the history “clean”, as they say. However, proponents of merge commits vehemently dislike this concept, feeling that a lot of history and context is lost in the process.
So, what’s the best choice?
I can honestly see the advantages and disadvantages of both sides of the debate. I don’t have a good answer for which should be the default option for a codebase. I like the ability to see every commit, but I also like a cleaner looking history. I think this puts me in a minority simply because I haven’t become enthusiastically attached to one way or the other.
I think that factors such as team size (including teams of one developer) and how the source history is used (if at all) should be considered before picking an approach. For example, I am much less inclined to use squash merges on the repository for this site, as I am the only developer. However, at times I have squash merged when I’ve had many commits with messages I didn’t want in the source control history. On the other hand, I am more inclined to use squash merges on team projects where other developers may read the source history.
Another important consideration is the use of automated tooling and processes that read the commit messages as part of the CI/CD build pipeline. It is common for such tooling to add new features and breaking changes to the CHANGELOG.md file. How those tools are able to do this can depend on whether all the commits are present or have been squashed.
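As a rough illustration of why this matters, the sketch below groups commit subject lines into changelog sections. It is not any particular tool’s behaviour: the `feat:` / `fix:` prefixes, the `!` marker for breaking changes, the `build_changelog` helper and the sample messages are all assumptions made for the example, and it assumes the tool reads subject lines only.

```python
from collections import defaultdict

# Assumed prefix convention (not from the article): "feat:" and "fix:",
# with "!" before the colon marking a breaking change.
SECTIONS = {"feat": "Features", "fix": "Bug Fixes"}


def build_changelog(messages):
    """Group commit subject lines into changelog sections by prefix."""
    sections = defaultdict(list)
    for message in messages:
        prefix, sep, summary = message.partition(":")
        if not sep:
            continue  # no prefix at all, so this sketch ignores the commit
        breaking = prefix.endswith("!")
        kind = prefix.rstrip("!")
        if breaking:
            sections["Breaking Changes"].append(summary.strip())
        elif kind in SECTIONS:
            sections[SECTIONS[kind]].append(summary.strip())
    return dict(sections)


# Hypothetical branch history, merged as it is.
unsquashed = [
    "feat: add export button",
    "fix: handle empty export",
    "feat!: rename export endpoint",
    "typo",
]
# The same branch squashed under a single subject line.
squashed = ["feat: add export"]

full = build_changelog(unsquashed)
short = build_changelog(squashed)
```

Here `full` contains a feature, a bug fix and a breaking change, while `short` contains only the one feature. Under these assumptions the breaking change never reaches the changelog unless whoever writes the squash commit message carries it over.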
What can I agree on then?
Whichever side of the debate you, the reader, are on, I certainly hope that as professionals we all agree that keeping the build green (build pipelines running successfully and all tests passing) should be considered not just a default but an expectation for any software project.
Wait, so why don’t you like rebasing?
I don’t plan on going to great lengths to explain my dislike for rebasing, but I knew there would be a lot of questions, so I’ll write a small note on it. Fundamentally, I dislike rebasing because of its possibly destructive nature. Sure, there are also a number of proponents of rebasing - usually spouting pithy advice such as “It’s not destructive, if you use Git right”.
Git is already hard enough to use, with a very poor and inconsistent developer experience that leads to too many mistakes. Telling developers to just “use Git right” when rebasing could result in them overwriting a team member’s work feels somewhat thoughtless. Not to mention, there are many explanations and guides with elaborate diagrams showing fictional repositories being rebased and “rewriting history” (that’s the destructive “oops I deleted your code” part) that I find tedious.
Finally, merge conflicts during a rebase result in some of the worst developer experience imaginable. Conflicts are presented one commit at a time, ad nauseam. This is confusing and makes it easy to pick the wrong side of a conflict. Reversing a rebase is difficult. Merge conflicts during a merge are presented all at once - I’d much rather pair with the developer my commits are conflicting with and work together through a single commit than potentially dozens.