Can We Use Trunk-Based Development for Legacy Software?

Not right away! Trunk-based development requires that the software builds and passes enough tests before we integrate our changes into the main branch (a.k.a. trunk). We have enough tests if breaking the software is highly unlikely. By definition, legacy code has no tests or not enough tests. Hence, we cannot apply trunk-based development right away, but should evolve our development process towards it.

Trunk-Based Development

In trunk-based development, a developer performs the following steps to change and integrate code.

Step 1. She synchronises (pulls) the latest changes of her team members from the remote main branch into her local copy of main.
Step 2. She changes the code using Test-Driven Development (TDD) and commits the changes to her local main branch.
Step 3. Once done, she runs the test suite (all unit tests and fast acceptance and integration tests). If some tests fail, she goes back to Step 2 and fixes the problem. If all tests pass, she proceeds to Step 4.
Step 4. She integrates (pushes) her changes into the remote main branch for all team members to see. She continues with Step 1 on the next change.
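The four steps can be sketched as a loop. This is a minimal Python sketch, not a real Git integration: `pull`, `change_with_tdd`, `run_test_suite` and `push` are hypothetical callables standing in for pulling main, a TDD session with local commits, the local test run and pushing to main. `max_attempts` is an assumed safety limit that the procedure itself does not have.

```python
from typing import Callable


def integrate_one_change(
    pull: Callable[[], None],
    change_with_tdd: Callable[[], None],
    run_test_suite: Callable[[], bool],
    push: Callable[[], None],
    max_attempts: int = 10,  # hypothetical safety limit, not part of the procedure
) -> int:
    """Run Steps 1-4 for one change and return how many test runs it took."""
    pull()  # Step 1: sync local main with remote main
    for attempt in range(1, max_attempts + 1):
        change_with_tdd()  # Step 2: change (or fix) the code with TDD, commit locally
        if run_test_suite():  # Step 3: unit tests plus fast acceptance/integration tests
            push()  # Step 4: integrate into remote main
            return attempt
    # Never push a change whose tests still fail.
    raise RuntimeError(f"tests still failing after {max_attempts} attempts")
```

Calling the function again for the next change corresponds to continuing with Step 1.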

Trunk-based development only works if developers integrate small, well-tested changes frequently. Integrating multiple times per day is great, once per day is good, once every other day is OK, and anything slower is bad. Frequent integrations of small changes minimise the number of conflicts with changes from other developers and make finding and fixing bugs faster and easier. They are a major time saver.
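The rating scale translates into thresholds on integrations per working day. A minimal Python sketch, assuming we count one developer's pushes to main over a number of working days; the function name and the exact boundaries (for example, treating exactly one integration per day as "good") are our own reading of the scale.

```python
def rate_integration_frequency(integrations: int, working_days: float) -> str:
    """Rate how often a developer integrates into main."""
    if working_days <= 0:
        raise ValueError("working_days must be positive")
    per_day = integrations / working_days
    if per_day > 1:
        return "great"  # multiple integrations per day
    if per_day >= 1:
        return "good"  # about once per day
    if per_day >= 0.5:
        return "OK"  # about once every other day
    return "bad"  # anything slower
```

For example, 12 integrations in a 5-day week rate as "great", while a single integration in the same week rates as "bad".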

Frequent integrations require very fast build and test runs. Ideally, Step 3 (building and running the tests) takes less than 3 minutes on developers' computers. Five to ten minutes is OK. Longer than 10 minutes is not acceptable. The longer Step 3 takes, the less often developers integrate their changes and the more changes they accumulate between integrations.
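A team can time Step 3 and rate it against these limits. A minimal Python sketch: `build_and_test` is a hypothetical callable wrapping the team's real build and test command. The text leaves the range between 3 and 5 minutes open; this sketch assumes it counts as OK.

```python
import time
from typing import Callable, Tuple


def rate_step3_minutes(minutes: float) -> str:
    """Rate the duration of Step 3 (build plus test run)."""
    if minutes < 3:
        return "ideal"
    if minutes <= 10:
        return "OK"  # assumption: 3-5 minutes also counts as OK
    return "not acceptable"


def timed_step3(build_and_test: Callable[[], bool]) -> Tuple[bool, float, str]:
    """Run a hypothetical build-and-test callable and rate its duration."""
    start = time.perf_counter()
    passed = build_and_test()
    minutes = (time.perf_counter() - start) / 60
    return passed, minutes, rate_step3_minutes(minutes)
```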

The test suite that developers run in Step 3 must give the team high confidence that the code change doesn't break anything on main. High confidence means that less than 1% of the integrations fail. The lower the failure rate, the less often developers must roll back a change or fix a bug. Low failure rates save a lot of time.
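The 1% criterion is a simple ratio over recorded integrations. A minimal Python sketch, assuming the team records one boolean per integration (True if main stayed green, False if the integration broke it); how those outcomes are collected is left open.

```python
from typing import Sequence


def integration_failure_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of integrations that broke main (False = broke main)."""
    if not outcomes:
        raise ValueError("need at least one integration")
    failures = sum(1 for passed in outcomes if not passed)
    return failures / len(outcomes)


def gives_high_confidence(outcomes: Sequence[bool], threshold: float = 0.01) -> bool:
    """High confidence: less than 1% of integrations fail."""
    return integration_failure_rate(outcomes) < threshold
```

One broken integration out of 200 gives a rate of 0.5% and meets the bar; one out of 100 is exactly 1% and does not.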

An integration can only fail if developers can't run all the tests on their own computers. The more powerful build server may run more tests, especially more integration, system and performance tests. When one of these tests fails on the build server, we should try hard to write a unit test that reproduces the failure, so that developers catch it in Step 3 next time. Failing faster is better.

By now, it should be clear that trunk-based development doesn't work without a good enough test suite. As legacy code is defined as code with no tests or not enough tests, trunk-based development cannot be applied to legacy code, at least not right away. With legacy code, integrations will fail often, and due to the lack of tests, we will notice these failures only days, weeks or even months later.

The 4-step procedure above mentions only a single developer. And that's on purpose! Each developer working on her own local copy of main counters the bad practice of several developers working on a shared branch (e.g., a feature or release branch). This bad practice leads to infrequent big-bang integrations (e.g., every one or two weeks). Developers waste time resolving conflicts and fixing bugs caused by conflict resolutions gone wrong. Overwriting changes of other developers is a major morale killer.

Mitigating the Risks of Trunk-Based Development

We cannot apply trunk-based development to legacy software right away. However, we should strive to reach it sooner rather than later. We can adapt trunk-based development to teams working on legacy software with three practices: TDD for every code change, short-lived branches for code reviews, and small user stories.

TDD for Every Code Change

We use TDD for every code change. As TDD requires us to write unit tests first, we are guaranteed to have unit tests for every code change. We also use TDD to understand code relevant to our changes. So, we end up with more code covered by unit tests than strictly necessary. That’s good! The test suite required for trunk-based development grows faster.

Getting legacy classes under test can be tricky and takes some time. James Grenning's crash-to-pass algorithm turns this into a mechanical exercise. With a bit of practice, we get most classes under test in two to four hours. Classes that depend on many other classes with many functions take a bit longer. I find it easier to work my way from the leaves of the dependency tree (fewer dependencies) up to the root (more dependencies).
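Working from the leaves up to the root amounts to processing classes in a topological order of their dependency graph. A minimal Python sketch using the standard library's `graphlib` (Python 3.9+); the class names and the dependency map are hypothetical and would in practice come from a dependency analysis of the code base. A topological order only guarantees that every class comes after the classes it depends on; it does not rank classes by their number of dependencies.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def leaves_first(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Order classes so that each one comes after the classes it depends on.

    `dependencies` maps a class name to the names of the classes it uses.
    TopologicalSorter treats those as predecessors, so classes without
    dependencies (the leaves) come out first and the root comes out last.
    Raises graphlib.CycleError if classes depend on each other in a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical legacy classes.
deps = {
    "InvoiceController": {"TaxCalculator", "InvoiceRepository"},
    "TaxCalculator": {"Logger"},
    "InvoiceRepository": {"Logger"},
    "Logger": set(),
}
print(leaves_first(deps))

# Cyclic dependencies have no leaves-first order; the error names the cycle.
try:
    leaves_first({"Order": {"Customer"}, "Customer": {"Order"}})
except CycleError as err:
    print("cycle:", err.args[1])
```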

Enabling TDD with the crash-to-pass algorithm is worth the effort. It quickly grows the test suite of our legacy code base. Refactoring code (improving code without changing its behaviour) is an integral part of TDD. TDD produces FIRST tests: fast, isolated, reliable, self-verifying and timely tests. Diligent use of TDD increases the testability of code, which in turn makes it easier to write high-quality acceptance, integration and system tests.
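A small illustration of what such a test can look like with Python's `unittest`. `PriceService`, its `vat_percent` lookup and `FakeRates` are hypothetical: the fake replaces a slow lookup (say, a database) so that the test is fast and isolated, and the assertion makes it self-verifying. This is only a sketch of a FIRST-style test, not Grenning's crash-to-pass procedure itself.

```python
import unittest


class PriceService:
    """Hypothetical legacy class; the rates source is injected so tests can replace it."""

    def __init__(self, rates):
        self._rates = rates

    def gross_cents(self, net_cents: int, country: str) -> int:
        # Integer cents avoid floating-point surprises; rounds down to whole cents.
        return net_cents * (100 + self._rates.vat_percent(country)) // 100


class FakeRates:
    """In-memory stand-in for a slow database or network lookup."""

    def __init__(self, percents):
        self._percents = percents

    def vat_percent(self, country: str) -> int:
        return self._percents[country]


class PriceServiceTest(unittest.TestCase):
    def test_gross_adds_vat_to_net_price(self):
        service = PriceService(FakeRates({"DE": 19}))  # no database, no network
        self.assertEqual(service.gross_cents(10_000, "DE"), 11_900)
```

Saved in a test module, it runs with `python -m unittest` in milliseconds, which keeps Step 3 fast.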

Short-Lived Branches for Code Reviews

Although the practice TDD for Every Code Change ensures unit tests for every code change, we should not push our changes directly to main. The danger of breaking code in a seemingly unrelated part of the software is too high. Legacy code has many, often surprising dependencies between distant parts of the code.

We mitigate this danger by introducing code reviews. The developer creates a short-lived branch for the sole purpose of a code review and opens a merge request for it.

Paul Hammant, the author of the excellent website Trunk-Based Development, calls these branches short-lived feature branches. I prefer the name short-lived review branch. It makes clear that developers must not abuse the branch for lengthy feature development shared by several developers.

The reviewers should focus on the following questions:

Is there a test for the code change?
Do the tests conform to the FIRST principles?
Did the author use the tests to refactor the code?
Are the tests easy to understand?

If reviewers answer any of these questions with no, they reject the merge request, because otherwise the code remains too difficult for others to change. Reviewer and author should work together to turn all answers into yes.
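The review rule is an all-or-nothing check. A minimal Python sketch; the field names are our own shorthand for the four questions, and the returned list of "no" answers is meant as the starting point for author and reviewer working on them together.

```python
from dataclasses import dataclass, fields
from typing import List, Tuple


@dataclass(frozen=True)
class ReviewAnswers:
    """One yes/no answer per review question."""

    has_test_for_change: bool
    tests_follow_first: bool
    tests_used_to_refactor: bool
    tests_easy_to_understand: bool


def review_decision(answers: ReviewAnswers) -> Tuple[str, List[str]]:
    """Approve only if every answer is yes; otherwise reject and list the no's."""
    open_points = [f.name for f in fields(answers) if not getattr(answers, f.name)]
    return ("reject", open_points) if open_points else ("approve", [])
```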

Changing the code, reviewing it and integrating it into main should not take longer than two days. Shorter is better. Hence, code reviews must happen in a timely manner. Good times for reviews are first thing in the morning, first thing after lunch, or just before leaving for the day. Code reviews are an essential part of developers' jobs. Under these constraints, pair programming, where the review happens while the code is written, becomes more and more attractive, especially for complicated or complex pieces of code.
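A team could flag review branches that exceed the two-day limit. A minimal Python sketch: the branch names and timestamps are hypothetical, the creation times are assumed to come from the team's Git hosting tool, and the sketch assumes two calendar days, since the text does not say whether calendar or working days are meant.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

MAX_CYCLE_TIME = timedelta(days=2)  # change + review + integration


def overdue_review_branches(created_at: Dict[str, datetime], now: datetime) -> List[str]:
    """Return review branches that have been open longer than two days."""
    return sorted(name for name, created in created_at.items() if now - created > MAX_CYCLE_TIME)


now = datetime(2024, 3, 7, 9, 0, tzinfo=timezone.utc)
branches = {
    "review/fix-rounding": now - timedelta(hours=20),
    "review/split-invoice": now - timedelta(days=3),
}
print(overdue_review_branches(branches, now))  # ['review/split-invoice']
```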

Trunk-based development with short-lived review branches is called scaled trunk-based development. It is better suited for teams working on legacy code and for bigger teams.

Small User Stories

Big user stories lead to infrequent integrations of big changes. Guided by the Scrum Master, the Product Owner and the developers should try hard to break down user stories into smaller stories that take at most one ideal working day to finish. With stories this small, integration times of less than two days are achievable.

Many teams have serious problems breaking down stories into such small pieces. The number one reason is that developers don't understand the code, which isn't surprising for legacy code.

Developers can increase their understanding by doing some experiments on the code with TDD. Writing a test codifies their understanding. If the test fails, their understanding was wrong and they must adapt it. Otherwise, they go on with the next test. I have found that doing this for 15-30 minutes helps me split up big stories into several smaller ones and come up with better estimates for the smaller stories.
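Such learning tests might look like this with Python's `unittest`. The function `legacy_discount` and its rules (amounts in euro cents) are made up for illustration; in practice it would be existing legacy code that we call without changing it. Each test states one hypothesis about the current behaviour; when a test fails, we correct the expectation, not the code.

```python
import unittest


def legacy_discount(total_cents: int, customer_type: str) -> int:
    """Hypothetical legacy function whose behaviour we want to understand."""
    if customer_type == "gold" and total_cents > 10_000:
        return total_cents - total_cents // 10
    if customer_type == "gold":
        return total_cents
    return total_cents if total_cents < 50_000 else total_cents - 2_500


class LegacyDiscountLearningTest(unittest.TestCase):
    """Each test records one piece of our current understanding."""

    def test_gold_customers_get_ten_percent_above_100_euros(self):
        self.assertEqual(legacy_discount(20_000, "gold"), 18_000)

    def test_gold_customers_get_nothing_at_exactly_100_euros(self):
        # A first guess of 9_000 would fail here: the threshold is exclusive.
        self.assertEqual(legacy_discount(10_000, "gold"), 10_000)

    def test_other_customers_get_flat_25_euros_off_from_500_euros(self):
        self.assertEqual(legacy_discount(50_000, "silver"), 47_500)
```

Three such tests already suggest a split along customer types and price thresholds, which is the kind of insight that helps cut a big story into smaller ones.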

A complicated story may require a spike of up to a day. Longer spikes return less and less value. A spike should give the team enough information to come up with separable steps for an implementation and hence a reliable estimate. A spike is not a substitute for a partial or even full implementation.

Product Owners can help a lot, too. They can split up a story into normal, special and error scenarios. They can reduce the complexity of a story by making assumptions that work, say, for 60% of the scenarios.

Driver of Continuous Delivery

Trunk-based development is a major driver of continuous delivery. Hence, we want development teams to use it, even for legacy software.

Our research also found that developing off trunk/master rather than on long-lived feature branches was correlated with higher delivery performance. Teams that did well had fewer than three active branches at any time, their branches had very short lifetimes (less than a day) […] and never had “code freezes” or stabilization periods. […] these results are independent of team size, organization size, or industry.
Nicole Forsgren et al., Accelerate: Building and Scaling High Performing Technology Organisations, 2019. Page 55.
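A team can compare its own branches with the pattern from the quote. A minimal Python sketch; it is a self-check, not a measurement from the study. The thresholds are taken from the quote, and the caller decides which branches count as active. This sketch assumes trunk is left out of the mapping, because the quote does not say whether it counts and its age is irrelevant to the lifetime check.

```python
from datetime import timedelta
from typing import Dict, List

MAX_ACTIVE_BRANCHES = 2  # "fewer than three active branches"
MAX_BRANCH_LIFETIME = timedelta(days=1)  # "less than a day"


def branch_findings(active_branch_ages: Dict[str, timedelta]) -> List[str]:
    """Compare active branches with the pattern reported in Accelerate.

    `active_branch_ages` maps each active branch to how long it has existed.
    Returns human-readable findings; an empty list means the pattern holds.
    """
    findings = []
    if len(active_branch_ages) > MAX_ACTIVE_BRANCHES:
        findings.append(f"{len(active_branch_ages)} active branches, expected fewer than three")
    for name, age in sorted(active_branch_ages.items()):
        if age >= MAX_BRANCH_LIFETIME:
            findings.append(f"{name} has lived {age}, expected less than a day")
    return findings
```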

Trunk-based development positively affects continuous integration and continuous deployment, and is positively affected by automated tests. All four are major drivers of continuous delivery. Trunk-based development epitomises how and why continuous delivery works: get feedback quickly and frequently.