With Team Topologies, Matthew Skelton and Manuel Pais have written a brilliant book on how to create a high-performance software development organisation with the best team structure and interaction. Organisations are complex adaptive systems that change as the result of the interactions between their components (e.g., the teams) and the interactions with other systems (e.g., the rest of the organisation, competitors, suppliers, technology and market trends). The resulting changes are hard to predict and hard to control.
Chapter 2: Conway’s Law and Why It Matters
Understanding Conway’s law is the cornerstone for good team design.
“Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations.” (p. 10)
In short: The software architecture always reflects the organisation structure. We can use Conway’s law to our advantage by shaping the organisation structure according to the desired software architecture. This is called the reverse Conway manoeuvre.
A loosely coupled architecture with cohesive components translates into an organisation with cross-functional teams working in a self-dependent way. What makes for a good architecture also makes for a good organisation structure. In the words of Michael Nygard: “Team assignments are the first draft of the architecture.”
Ruth Malan points out the implications of Conway’s law:
“If we have managers deciding which services will be built, by which teams, we implicitly have managers deciding on the system architecture.” (p. 23)
So, these managers had better have the required technical expertise. This is why Apple requires its leaders to “know the details of their organizations three levels down” (see my post How Apple is Organized For Innovation). An architect is required to have both technical and social skills, as “organization design and software design are, in practice, two sides of the same coin”.
As with software components, “not all communication and collaboration [between teams] is good”. The team structure should “minimize the number of communication paths between teams” and “encourage teams to communicate who wouldn’t otherwise do so”. Unexpectedly intensive communication between two teams often points to problems in the software design (e.g., a bad interface).
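To make the idea of minimising communication paths more concrete, here is a minimal Python sketch. It compares the number of paths when every team talks to every other team with the number of paths implied by a deliberately designed structure. The team names and the dependency list are hypothetical and not taken from the book.

```python
from itertools import combinations


def all_pairs_paths(teams):
    """Number of communication paths if every team talks to every other team."""
    return len(list(combinations(teams, 2)))


def designed_paths(dependencies):
    """Number of distinct team-to-team paths implied by a list of (team, team) pairs.

    The direction of a dependency is ignored: (a, b) and (b, a) count as one path.
    """
    return len({frozenset(pair) for pair in dependencies})


# Hypothetical teams of an embedded-systems organisation.
teams = ["hmi", "machine-control", "fleet", "platform"]

# Hypothetical design: every team talks only to the platform team.
dependencies = [
    ("hmi", "platform"),
    ("machine-control", "platform"),
    ("fleet", "platform"),
    ("platform", "hmi"),  # same path as ("hmi", "platform")
]

unstructured = all_pairs_paths(teams)
structured = designed_paths(dependencies)
print(f"all-to-all: {unstructured} paths, designed: {structured} paths")
```

A count like this is only a rough indicator. If the paths actually observed in day-to-day work differ a lot from the designed ones, that may hint at a problematic interface between the corresponding software components.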
Chapter 5: The Four Fundamental Team Topologies
The authors identify four fundamental team topologies: the stream-aligned team, the enabling team, the complicated-subsystem team and the platform team.
“The stream-aligned team is the primary team in an organization, and the purpose of the other fundamental team topologies is to reduce the burden on the stream-aligned teams.” (p. 81)
Feature or product teams are the prototypical stream-aligned teams, where product teams may be made up of one or more feature teams. Streams can be defined by many criteria like customer types, business areas, geography and user personas. The streams of different teams must be clearly separated to minimise communication. A stream-aligned team has the following characteristics.
- It gets valuable features in the hands of customers at a steady pace – measured by Continuous Delivery metrics.
- It adapts quickly to stakeholder feedback, that is, it is agile.
- It has minimal communication with other teams.
- It has enough time and the necessary skills to reduce technical debt or to avoid it in the first place.
- It “proactively and regularly reaches out to […] complicated-subsystem, enabling and platform [teams]” to ensure that these teams know its most pressing needs.
- It feels in charge of its own fate.
The job of stream-aligned teams is to get valuable features into customers’ hands as quickly as possible. Hence, they don’t have the time to make components reusable for other teams, to improve the CI/CD pipeline or to get lots of legacy code under test. This is the job of enabling teams, which are composed of technical specialists. These specialists fill the knowledge gaps of the stream-aligned teams.
“The end goal of an enabling team is to increase the autonomy of stream-aligned teams by growing their capabilities […] If an enabling team does its job well, the team that it is helping should no longer need the help from the enabling team after a few weeks or months; there should not be a permanent dependency on an enabling team.” (p. 87)
Enabling teams don’t help with the execution but provide technical guidance. Members of an enabling team will eventually move to teams of other types. The enabling team shows the following behaviours among others.
- It anticipates the needs of stream-aligned teams and regularly checks “when more collaboration is needed”.
- It facilitates learning across all stream-aligned teams.
Imaging systems for recognising traffic on roads or the ripeness of fruit are examples of complicated subsystems. Members of complicated-subsystem teams must be experts “to understand and make changes to the subsystem”. Embedding such experts in the stream-aligned teams would be a waste of time and money. The complicated-subsystem team synchronises regularly with the stream-aligned teams to work on the right requirements.
Platform teams build services, libraries, Linux systems or CI/CD pipelines so that stream-aligned teams can “deliver product features at a higher pace, with reduced coordination”. In larger organisations, a platform may itself be made up of several teams: stream-aligned teams, complicated-subsystem teams and further platform teams. A platform team
- collaborates closely with stream-aligned teams to anticipate their needs,
- produces “services” that are reliable and can be used with little to no communication, and
- understands that stream-aligned teams may adopt its “services” with some delay.
Often, a good platform is itself built on a platform. For example, the libraries and services of an embedded system could be built on Qt. A platform is managed like a product. It is “not simply driven by feature requests from Dev teams; instead, it is created and carefully shaped to meet their needs in the longer term”.
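The four team types from this chapter can be written down as a small data model. The following Python sketch is only an illustration: the class names, the `supports` field and the check are my assumptions, not something the book prescribes. The check flags every enabling, complicated-subsystem or platform team that does not directly support at least one stream-aligned team. It is deliberately strict and would, for example, also flag an inner platform team that only serves another platform team.

```python
from dataclasses import dataclass, field
from enum import Enum


class TeamType(Enum):
    STREAM_ALIGNED = "stream-aligned"
    ENABLING = "enabling"
    COMPLICATED_SUBSYSTEM = "complicated-subsystem"
    PLATFORM = "platform"


@dataclass
class Team:
    name: str
    kind: TeamType
    # Names of the teams this team helps (hypothetical field).
    supports: list = field(default_factory=list)


def unsupported_helpers(teams):
    """Return the names of non-stream-aligned teams that support no stream-aligned team."""
    stream_names = {t.name for t in teams if t.kind is TeamType.STREAM_ALIGNED}
    return [
        t.name
        for t in teams
        if t.kind is not TeamType.STREAM_ALIGNED
        and not stream_names.intersection(t.supports)
    ]


# Hypothetical organisation of a harvester manufacturer.
org = [
    Team("harvester-hmi", TeamType.STREAM_ALIGNED),
    Team("ci-platform", TeamType.PLATFORM, supports=["harvester-hmi"]),
    Team("test-coaching", TeamType.ENABLING, supports=["harvester-hmi"]),
    Team("fruit-vision", TeamType.COMPLICATED_SUBSYSTEM),  # supports nobody yet
]

flagged = unsupported_helpers(org)
print(flagged)
```

A flagged team is not necessarily wrong, but it is a prompt to ask whose burden that team is actually reducing.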
Chapter 6: Choose Team-First Boundaries
Software Boundaries or “Fracture Planes”
“A fracture plane is a natural seam in the software system that allows the system to be split easily into two or more parts. This splitting of software is particularly useful with monolithic software.” (p. 115)
The reverse Conway manoeuvre tells us that these fracture planes are candidate seams along which to split teams. The authors list the following fracture planes (examples mine).
- Business domain bounded context. “Bounded contexts give team members a clear and shared understanding of what to be consistent and what can develop independently” (from the book Domain-Driven Design by Eric Evans). A domain or system model is partitioned into multiple bounded contexts. Field navigation, automatic adaptation of the cutting length, customer accounting, fleet management, remote support, optimisation of diesel consumption and media players are among the bounded contexts of harvesters.
- Regulatory compliance. Metal-sheet bending machines are protected by an infrared grid. If the operator breaks the grid, the machine stops to avoid serious injuries. This protection mechanism must never fail. Hence, it must comply with safety regulations. Safety-critical software will typically run on a bare-metal system or a special RTOS, because safety compliance is much easier to achieve for these simple systems than for a complex Linux operating system.
- Change cadence. Operating systems and platform libraries change much less frequently than business logic or HMI. They are deployed at different frequencies.
- Team location. Different office buildings, locations and time zones are natural candidates for fracture planes.
- Risk. A combine-harvester manufacturer may release a threshing assistant – a complicated subsystem – early, although it only works better than a human driver 70% of the time. The manufacturer may win many new customers, while losing a few existing customers. Or, the manufacturer takes the less risky path and integrates its accounting system with the accounting systems of other manufacturers. It will keep its existing customers happy, but hardly win any new customers.
- Performance isolation. Machines sorting ripe from unripe fruits must work in real time. The operator terminal (HMI) for these machines doesn’t have real-time constraints.
- Technology. The HMI part of the system is written with Qt and C++ (frontend). The machine control is written in C (backend). Good technology splits are the exception and not the rule.
- User personas. Hired drivers, expert drivers, support technicians, agency owners (renting harvesters with drivers) and farmers are typical user personas for harvesters.
How do we know which fracture planes are best suited for our organisation?
The litmus test for the applicability of a fracture plane: Does the resulting architecture support more autonomous teams (less dependent teams) with reduced cognitive load (less disparate responsibilities)?
“Of course, achieving such results often requires some initial experimentation and fine tuning. […] A simple heuristic that can help guide assessment of your system and team boundaries is simply to ask: Could we, as a team, effectively consume or provide this subsystem as a service? If the answer is yes, then the subsystem is a good candidate for splitting off and assigning to a team to own and evolve.” (p. 121, emphasis mine)
In short: “Choose software boundaries to match team cognitive load.” We must often combine multiple fracture planes to find the right split into teams.
Identifying fracture planes helps us come up with the context and container diagrams for Qt embedded systems.
Chapter 7: Team Interaction Modes
The authors identify three essential team interaction modes:
- Collaboration: working closely together with another team
- X-as-a-Service: consuming or providing something with minimal collaboration
- Facilitating: helping (or being helped by) another team to clear impediments
“Poorly defined team interactions and responsibilities are a source of friction and ineffectiveness.”
Collaboration: Driver of Innovation and Rapid Discovery but Boundary Blurring
Stream-aligned teams use collaboration mode in their interaction with platform and complicated-subsystem teams to align on the needs. Once the platform team has delivered a reusable component, the stream-aligned teams use it as a service. Heavy collaboration with many teams slows down a team. Therefore, a team should not use this mode with more than one team at a time.
Collaboration mode works best between two teams with high mutual respect. Teams located in different countries or time zones are typically ill-suited for collaboration mode. Collaboration could be improved by “rewarding one team for the work of the other team”.
X-as-a-Service: Clear Responsibilities with Predictable Delivery but Needs Good Product Management
Stream-aligned and complicated-subsystem teams use X-as-a-Service (XaaS) mode to consume services from platform and complicated-subsystem teams. One team can use XaaS mode with many other teams, as the communication overhead is minimal.
Ownership of a service always lies with the team producing the service and never with the team consuming it. There is never shared code ownership. In XaaS mode, the speed of innovation (e.g., adding new value to a system) is slower than in collaboration mode, because it happens across team and software boundaries.
The team providing X-as-a-Service should “emphasise the user [or developer] experience”. The value of X for the consuming team should be high compared to the effort to use X.
Facilitating: Sense and Reduce Gaps in Capabilities
Enabling teams use facilitating mode to help other teams do their work better.
“A team with a facilitating remit does not take part in building the main software systems, supporting components or platform, but, instead, focuses on the quality of interactions between other teams building and running the software.” (p. 140)
A team uses facilitating mode with only a few other teams at the same time.
“People in the stream-aligned team need to be open to being helped by the enabling team; they need to have an open mind to new approaches and be aware that the enabling team has probably seen some better approaches.” (p. 143)
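The rules of thumb from this chapter (collaboration with at most one team at a time, facilitating with a few teams, X-as-a-Service with many) can be checked mechanically. The following Python sketch assumes a hypothetical list of interactions per team. The limit of three for facilitating mode is my own assumption for "a few"; the book does not give a number.

```python
from collections import Counter
from enum import Enum


class Mode(Enum):
    COLLABORATION = "collaboration"
    XAAS = "x-as-a-service"
    FACILITATING = "facilitating"


# Maximum number of simultaneous interactions per mode.
# COLLABORATION: 1 follows the text above; FACILITATING: 3 is an assumed value.
# XAAS has no entry because its communication overhead is minimal.
LIMITS = {Mode.COLLABORATION: 1, Mode.FACILITATING: 3}


def overloaded_modes(interactions):
    """Return the modes in which one team interacts with too many other teams.

    `interactions` is a list of (other_team_name, Mode) pairs for a single team.
    """
    counts = Counter(mode for _, mode in interactions)
    return [mode for mode, limit in LIMITS.items() if counts[mode] > limit]


# Hypothetical interactions of one stream-aligned team.
hmi_team = [
    ("platform", Mode.COLLABORATION),
    ("fruit-vision", Mode.COLLABORATION),  # second collaboration: too many
    ("ci-pipeline", Mode.XAAS),
    ("os-image", Mode.XAAS),
    ("test-coaching", Mode.FACILITATING),
]

too_many = overloaded_modes(hmi_team)
print(too_many)
```

In this made-up example, the team collaborates with two teams at once. One way out is to finish the collaboration with the platform team first and consume its result as a service afterwards.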