Extracting Microservices from a Modular Monolith

When developing the operator terminals for their machines, OEMs must reinvent the wheel over and over again. Each OEM implements home-grown solutions for standard features like OTA updates, user authentication, factory installation, machine gateways and IoT gateways. None of these features belong to the OEM’s core business. OEMs could save a lot of time and money if they could buy these features as ready-made solutions from third-party vendors.

Microservices could be these ready-made solutions. They have technology-agnostic interfaces, which the terminal application would call to perform standard jobs like OTA updates and user authentication. Microservices are independently deployable and run in their own process (see the section Definition of a Microservice for more details). Third-party vendors can build the microservices once for many OEMs to use on their devices.

As terminal applications are typically monoliths, they have all the standard features built-in. This has several disadvantages.

OTA updates require root privileges. Hence, the whole application runs with root privileges, which is a serious security risk. We can mitigate the risk by extracting OTA updates into a microservice. The application runs with normal privileges and only the update service with root privileges. See the section Extracting the Microservice for Updating the System for a detailed discussion.
The GUI is divided into spaces used by different user types. The space for harvesting beets is mostly used by drivers, whereas the space for diagnosing machine failures may only be used by technicians. Drivers use their space almost 24/7 during the harvest, whereas technicians use theirs only rarely, hopefully never during the harvest. In a monolith, memory leaks, crashes or deadlocks in the technicians’ space would stop the harvest. This wouldn’t happen if the feature for diagnosing machine failures were extracted into its own microservice. See the section Extracting the Microservice for Diagnosing the Machine for a detailed discussion.

We have already hinted at several good reasons – not the OEM’s core business, reduced time to market, different user privileges, different user types and resilience – for extracting microservices from a monolith. The section Reasons for Introducing Microservices gives a detailed discussion of these and other reasons.

The best starting point for extracting microservices is a Modular Monolith Based on the Ports-and-Adapters Architecture. Ports and adapters share the main traits of microservices: single purpose, clear business value, technology-agnostic interfaces and independent deployability. However, adapters don’t run in their own process.

If our monolith leans more towards a big ball of mud (BBoM), it is worth the effort to transform the BBoM into a modular monolith before extracting any microservices. If we work on a greenfield project, we don’t have to extract any microservices; we can use them from the start. We should, however, ensure that the ports closely resemble the interfaces of the microservices we want to use. This makes integrating third-party microservices easier.

A Modular Monolith Based on the Ports-and-Adapters Architecture

Many operator terminals run a single HMI application for controlling a machine or device like a harvester, excavator, vending machine, X-ray frequency analyser, multi-head weigher or UV cleaning robot. Such an application is a monolith. Monoliths range from unstructured big balls of mud to well-structured modular monoliths.

A modular monolith based on the ports-and-adapters architecture is well-suited for the extraction of microservices. The adapters tend to be good candidates for microservices. If an existing application is more on the unstructured side, it’s a good idea to make the application more modular first. If we can start application development from scratch, we can introduce the microservices right away without the detour via the modular monolith.

Let us quickly recap the ports-and-adapters architecture. Our running example is the driver terminal of, say, a sugar beet harvester.

Figure 1: Modular monolith *Harvest App* based on ports-and-adapters architecture

The hexagon denotes the Core, which contains the business logic of the application. The Core is technology-agnostic. It doesn’t have a clue whether system updates are performed over 5G or WLAN or whether they use Mender, SwUpdate or RAUC as the update client and Mender, Memfault or QBee as the update server.

In the Core, we implement the business rules for the system update. As the harvester must run nearly 24/7 during the 8-week harvesting period, any problem must be fixed as quickly as possible. Hence, the business rule could look like this:

The Fleet Management server sends a push notification to the Harvest App.
The Harvest App tells the Driver that an update is available.
Depending on the importance, the Driver installs the update right away or schedules it for the next break.
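
To make the rule concrete, here is a minimal C++ sketch of how the Core might express it. The names `Importance`, `InstallRequest` and `proposeInstallation` are hypothetical. The sketch assumes that the Core only proposes a default, which the Driver can confirm or change in the GUI. Nothing in it refers to 5G, Mender or Memfault, and that is the point of keeping the rule in the Core.

```cpp
#include <chrono>
#include <functional>

using Clock = std::chrono::system_clock;

// How urgent an update is (hypothetical; the text only says "depending on the importance").
enum class Importance { Critical, Normal };

// What the Core proposes to pass on to the port For Updating System (hypothetical type).
struct InstallRequest {
    bool now;                       // true: install right away
    Clock::time_point scheduledAt;  // when to install; the call time if now == true
};

// Harvester rule: tell the Driver about the update, then propose to install
// critical updates right away and all others at the next break.
InstallRequest proposeInstallation(Importance importance,
                                   Clock::time_point nextBreak,
                                   const std::function<void(Importance)>& notifyDriver) {
    notifyDriver(importance);
    if (importance == Importance::Critical) {
        return InstallRequest{true, Clock::now()};
    }
    return InstallRequest{false, nextBreak};
}
```

A fully automatic production line would use a different rule, for example one that only proposes scheduled downtimes. The port For Updating System would stay the same.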

The business rule for a fully automatic production line would look different. The update can only happen during a scheduled downtime, and it must be very fast, like switching over to a spare.

The sides of the hexagon are the interfaces or ports between the Core (the inside world) and the outside world. Ports shield the Core from specific technologies used in the outside world.

We organise interactions between the application and the external actors by the reason why they are interacting […] In this model, each set of interactions with a given purpose or intention is a port.

Alistair Cockburn, Juan Manuel Garrido de Paz: Hexagonal Architecture Explained. Page 25.

The Harvest App has four ports: For Harvesting Beets, For Providing Support, For Controlling Machine and For Updating System. The port names follow the convention For Doing Something (for + verb + noun) and say “for” what purpose they are used.

The Harvest App has four external actors. The Driver and the Technician are driving or primary actors, whereas the Machine and Fleet Management are driven or secondary actors.

A primary actor is any entity, whether human or electronic, that kicks the [Core] into action. It makes a service request of the [Core through a Port], initiating what may be a complex set of back-and-forth interactions.
A secondary actor is any entity, human or electronic, that the [Core] kicks into action, requesting a service from it.

Alistair Cockburn, Juan Manuel Garrido de Paz: Hexagonal Architecture Explained. Page 27.

An adapter converts a port into the interface of an external actor. For example, the Update Adapter converts the port For Updating System into the interface of a specific Fleet Management server. The port provides technology-agnostic functions (slots in Qt speak) like checkAvailability, installAtScheduledTime, switchToNewSystem and publishes events (signals in Qt speak) like updateAvailable and updateProgress. The adapter converts the function calls into HTTPS requests to the Memfault server via an intermediary SwUpdate daemon. This conversion obviously depends on specific technology.
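
Here is a minimal sketch of the port For Updating System in plain C++. It assumes that we do without Qt: the pure virtual functions stand in for slots and the `std::function` members for signals. The function names come from the text. The parameter types (a version string, a progress percentage and a `std::chrono` time point) are assumptions.

```cpp
#include <chrono>
#include <functional>
#include <string>

using TimePoint = std::chrono::system_clock::time_point;

// Port For Updating System in plain C++. Pure virtual functions play the
// role of Qt slots; the std::function members play the role of Qt signals.
class ForUpdatingSystem {
public:
    virtual ~ForUpdatingSystem() = default;

    // Requests from the Core ("slots").
    virtual void checkAvailability() = 0;
    virtual void installAtScheduledTime(TimePoint when) = 0;
    virtual void switchToNewSystem() = 0;

    // Events for the Core ("signals"); the Core assigns the handlers.
    std::function<void(const std::string& version)> updateAvailable;
    std::function<void(int percent)> updateProgress;

protected:
    // Adapters emit events through these helpers, which tolerate missing handlers.
    void emitUpdateAvailable(const std::string& version) {
        if (updateAvailable) updateAvailable(version);
    }
    void emitUpdateProgress(int percent) {
        if (updateProgress) updateProgress(percent);
    }
};
```

An Update Adapter for SwUpdate and Memfault and another one for Mender would both derive from `ForUpdatingSystem`. The Core only ever holds a reference to the port.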

Each port has at least two adapters: a production and a test adapter. Driving ports have a test driver running Test Cases, whereas driven ports have Test Doubles standing in for the production adapter. Testability of the Core is built in. Test adapters provide an early-warning system for business logic leaking from the Core into an adapter or vice versa.
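
Here is a sketch of what a Test Double for the driven port For Updating System, and a Core slice using it, could look like. `UpdateAdapterDouble`, `UpdateCore` and `simulateUpdateAvailable` are hypothetical names. The double only records the Core’s requests and lets a test case play the part of the Fleet Management server.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <vector>

using TimePoint = std::chrono::system_clock::time_point;

// Reduced copy of the port For Updating System, enough for this sketch.
class ForUpdatingSystem {
public:
    virtual ~ForUpdatingSystem() = default;
    virtual void checkAvailability() = 0;
    virtual void installAtScheduledTime(TimePoint when) = 0;
    std::function<void(const std::string& version)> updateAvailable;
};

// Test double replacing the production Update Adapter: it records the
// Core's requests and lets a test case play the Fleet Management server.
class UpdateAdapterDouble : public ForUpdatingSystem {
public:
    std::vector<std::string> calls;
    void checkAvailability() override { calls.push_back("checkAvailability"); }
    void installAtScheduledTime(TimePoint) override { calls.push_back("installAtScheduledTime"); }
    void simulateUpdateAvailable(const std::string& version) {
        if (updateAvailable) updateAvailable(version);
    }
};

// Hypothetical slice of the Core: remember the offered update and install it
// when the Driver decides to do so right away.
class UpdateCore {
public:
    explicit UpdateCore(ForUpdatingSystem& port) : port_(port) {
        port_.updateAvailable = [this](const std::string& version) { pendingVersion = version; };
    }
    void driverInstallsNow() {
        if (!pendingVersion.empty()) {
            port_.installAtScheduledTime(std::chrono::system_clock::now());
        }
    }
    std::string pendingVersion;

private:
    ForUpdatingSystem& port_;
};
```

The double never touches SwUpdate or Memfault. A failing test therefore points at the business logic in the Core rather than at the update technology.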

Adapters do a good job if they lift the Port to a significantly higher abstraction level than the interface to the external actor. They do a bad job if they leave the port at nearly the same abstraction level as the external interface.

“The best [adapters] are those whose [ports] are much simpler than their implementations.” A port provides “a simplified view [of an adapter], which omits unimportant details”. This simplified view is called an abstraction. “The key to designing abstractions is to understand what is important, and to look for designs that minimise the amount of [important] information.”

From my review of John Ousterhout’s book A Philosophy of Software Design

The port For Updating System presents an excellent abstraction, because it hides a lot of unimportant implementation details. We could implement the Update Adapter with an SwUpdate client and Memfault server, a RAUC client and QBee server, or with a Mender client and server. The port For Updating System would stay the same.

The port For Controlling Machine to the Machine Adapter typically presents a poor abstraction. It provides a class for each ECU of the machine. Each class provides getters and setters for all the parameters of its corresponding ECU. Some classes have hundreds of parameters, some only a dozen. The ECU classes have no behaviour. The only abstraction these classes provide is the conversion of CAN messages into parameter values.

If we move a parameter into a different ECU or if we merge five ECUs into one ECU, the port will change. Changes in the external actor Machine imply changes in the port For Controlling Machine and hence in the Core. We just created a maintenance nightmare. There are two ways out of this mess.

We can define the port For Controlling Machine from the perspective of the primary actors Driver and Technician. Parameters that are used together by a primary actor are grouped together in the same class. The classes still have no behaviour, but the port protects the Core from changes in the Machine. We raised the abstraction level a bit, but we still have the same total number of getters and setters over all ECU classes.
We must bring in behaviour to raise the abstraction level in a meaningful way. When harvesting sugar beets, the Driver must – among other things – ensure that the roots of the sugar beets are not cut off to minimise the loss of sugar. Currently, the Driver changes several parameters to lower the shovels and to dig out the beets intact.
In a behaviour-driven approach, the Driver would tell the Machine Adapter via the port For Controlling Machine by how much – a bit, a lot or somewhere in-between – the root is cut off. The adapter could translate the Driver’s observation into concrete settings of the parameters. It sends the parameter changes over the CAN buses to the involved ECUs. Now, the port is on a good abstraction level. It also has a much smaller interface, because we moved most getters and setters into the adapter.
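
The behaviour-driven variant might be sketched as follows. The ECU and parameter names (`LifterEcu`, `shovelDepthMm`, `shovelPitchDeg`) and the mapping values are placeholders made up for illustration. A callback replaces the CAN transport.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// The Driver's observation, as passed through the behaviour-driven port.
enum class RootCutOff { None, ABit, ALot };

// One parameter change for one ECU. ECU and parameter names are hypothetical.
struct ParameterChange {
    std::string ecu;
    std::string parameter;
    int delta;
};

// Behaviour-driven port For Controlling Machine, reduced to one task.
class ForControllingMachine {
public:
    virtual ~ForControllingMachine() = default;
    virtual void reportRootCutOff(RootCutOff observation) = 0;
};

// Machine Adapter: translates the observation into parameter changes and
// hands them to a sender, which would write them to the CAN buses.
class MachineAdapter : public ForControllingMachine {
public:
    explicit MachineAdapter(std::function<void(const ParameterChange&)> sendToEcu)
        : sendToEcu_(std::move(sendToEcu)) {}

    void reportRootCutOff(RootCutOff observation) override {
        for (const auto& change : changesFor(observation)) {
            sendToEcu_(change);
        }
    }

private:
    // Placeholder mapping; real values would come from the machine experts.
    static std::vector<ParameterChange> changesFor(RootCutOff observation) {
        switch (observation) {
        case RootCutOff::None:
            return {};
        case RootCutOff::ABit:
            return {{"LifterEcu", "shovelDepthMm", 10}};
        case RootCutOff::ALot:
            return {{"LifterEcu", "shovelDepthMm", 25}, {"LifterEcu", "shovelPitchDeg", 2}};
        }
        return {};
    }

    std::function<void(const ParameterChange&)> sendToEcu_;
};
```

The port exposes one function instead of hundreds of getters and setters. Moving `shovelDepthMm` to another ECU only changes `changesFor` inside the adapter.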

We’ll soon see that ports and adapters with high abstraction levels are good candidates for microservices. In contrast, ports and adapters with low abstraction levels are bad candidates and need some massaging.

Definition of a Microservice

In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms […]. These services are built around business capabilities and independently deployable by fully automated deployment machinery.

James Lewis and Martin Fowler, Microservices – a definition of this new architectural term, 2014 (emphasis mine)

A service is a small component with a single clearly-defined business responsibility (see the single-responsibility principle). The monolithic Harvest App is certainly not a service, as it has multiple responsibilities and is far too big. The adapters of the Harvest App, however, could qualify as such services.

The Update Adapter with its port For Updating System has a single responsibility, a very small interface and clear business value: a secure and reliable OTA update. This adapter would be a good candidate.
In contrast, the Machine Adapter with its port For Controlling Machine would be a bad candidate as a whole. It has multiple responsibilities like harvesting sugar beets, diagnosing failures in the harvester electronics, calibrating mechanical parts and collecting harvest data for billing. Each responsibility, however, could be a good candidate for a microservice.

Even suitable adapters cannot be used directly as microservices, because, as libraries, they don’t run in their own process. The technology-agnostic ports come to the rescue. We expose the port through a lightweight inter-process communication (IPC) mechanism like D-Bus, Qt Remote Objects or gRPC.

Sending a message from one process to another takes longer than an in-process function call. When moving from a monolith to microservices, we must change the communication pattern: from many in-process function calls to fewer inter-process messages. We must move the service interface from implementation steps to behaviour-driven tasks, from “How to do a task?” to “What task to do?” with the service figuring out the “how”, or from a low abstraction level to a higher abstraction level.

As John Ousterhout put it (see the full quote above): The service interface must provide “a simplified view of a module, which omits unimportant details” – a.k.a. information hiding. A well-designed port is a good candidate for the interface of a microservice and the corresponding adapter for the implementation of the microservice.

A well-designed interface on the right abstraction level enables the independent deployment of a microservice. If we modify an interface, we must also redeploy all the applications and other services that depend on the service with the modified interface. Interface-breaking changes should therefore be very rare events.

If we must change an interface, we should apply the open-closed principle introduced by Bertrand Meyer: “software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification”. We can add another function to a class or another class to a module, but we must never modify the interface or behaviour of an existing function or class.
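
Applied to ports, the open-closed principle might look like the following sketch. The released interface stays untouched, and a hypothetical rollback feature goes into a derived interface. `ForUpdatingSystemWithRollback` and `rollBackToPreviousSystem` are invented for illustration. With D-Bus, the analogous move would be to add new methods or a new, versioned interface name instead of changing existing method signatures.

```cpp
#include <string>

// Version 1 of the port, already released: closed for modification.
class ForUpdatingSystem {
public:
    virtual ~ForUpdatingSystem() = default;
    virtual void checkAvailability() = 0;
    virtual void switchToNewSystem() = 0;
};

// Hypothetical extension: new functionality goes into a derived interface
// instead of changing the released functions.
class ForUpdatingSystemWithRollback : public ForUpdatingSystem {
public:
    virtual void rollBackToPreviousSystem() = 0;
};

// An adapter that supports the extension (bodies reduced to bookkeeping).
class UpdateAdapter : public ForUpdatingSystemWithRollback {
public:
    std::string lastCall;
    void checkAvailability() override { lastCall = "checkAvailability"; }
    void switchToNewSystem() override { lastCall = "switchToNewSystem"; }
    void rollBackToPreviousSystem() override { lastCall = "rollBackToPreviousSystem"; }
};

// Newer clients probe for the extension; older clients never see it.
bool tryRollBack(ForUpdatingSystem& port) {
    if (auto* extended = dynamic_cast<ForUpdatingSystemWithRollback*>(&port)) {
        extended->rollBackToPreviousSystem();
        return true;
    }
    return false;
}
```

Older clients keep calling `checkAvailability` and `switchToNewSystem` through the unchanged interface. Newer clients discover the extension at run time.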

By now, it should be pretty clear why an adapter with its port is often a good candidate for a microservice. A well-designed adapter

has a single purpose,
provides clear business value,
provides a small, behaviour-driven and technology-agnostic interface, and
enables independent deployment.

Adapters only lack one feature: They don’t run in their own process and hence don’t need inter-process communication. Although adapters and microservices share so many traits, we should always have a good reason to convert an adapter into a microservice.

Reasons for Introducing Microservices

Should we convert every adapter into a microservice? No, absolutely not. We should have a good reason for such a conversion. We should also ensure that the adapter is well-suited for such a conversion. We might have to split big adapters with multiple responsibilities into multiple smaller adapters with a single responsibility before we extract them as microservices. But we shouldn’t overdo the splitting. If we converted our modular monolith into a distributed big ball of mud with too many interdependent microservices, we would have lost quite a lot.

In the end, we will have a reduced Harvest App performing its job with the help of some microservices. We will keep the ports, replace their adapters with remote proxies and move the adapters into microservices. As such an extraction takes considerable time, we should have a good reason for it. Here is a non-exhaustive list of good reasons.

Not core business. The core business of a harvester manufacturer is to develop an application that enables hired drivers to achieve a good yield. It is not to develop the 1,000th version of an OTA update for Linux systems. Ideally, the manufacturer could buy a microservice for OTA updates from different vendors. All vendors implement the same standard interface. They can use different combinations of update client and server in their implementations: e.g., Mender client and server, RAUC client and Memfault or QBee server, Torizon client and server.
Reducing time to market. If manufacturers can buy microservices for system updates, user authentication, machine health monitoring or collecting billing info, they save the development, maintenance and support costs. They can focus on their core business and release their products to the market earlier.
Scaling the team. Thanks to the high-level ports, the core and the adapters of our modular monolith are loosely coupled. Applying the reverse Conway manoeuvre, separate, loosely coupled teams could build the core and each adapter. This manoeuvre ensures that the team structure maps to the system architecture (Conway’s law). Whether we apply it depends – as usual – on the context.
The teams can be internal or external. Microservices that are not the core business for manufacturers are good candidates for outsourcing. Microservices for updating a device or for authenticating users come to mind. But even a microservice like diagnosing failures of construction and agricultural machines could be implemented by a third-party vendor.
User authentication. Given user credentials such as username/password, smart card, fingerprint or face ID, a microservice could check whether a user is allowed to control the harvester and to which domains the user has access (see different user types below). The manufacturer could even buy this microservice off-the-shelf from a vendor.
Different user privileges. System updates require root privileges to write into disk partitions. Hence, the monolithic Harvest App runs as root. If we move the Update Adapter into a microservice, we can run the Harvest App as a normal user and the microservice as root. Additionally, only applications that are explicitly authorised to use the microservice are allowed to perform a system update. Running applications and services with the lowest possible privileges is also demanded by the requirement Mitigation of Incidents of the EU CRA.
Different user types. The architecture diagram of the Harvest App above shows two user types: Drivers and Technicians. There may be more types like expert drivers (“power users”) and developers. Each user type has access to certain business domains provided by the Harvest App. Naturally, Drivers have access to the core domain: harvesting sugar beets. In addition to the Drivers’ domains, Technicians have access to subdomains like diagnosing failures, calibrating the machine and commissioning the machine. Expert drivers can get temporary access to some of the Technicians’ domains. Developers have access to all domains. Access to the different business domains is controlled by the Harvest App and not by the Linux system.
Subdomains that are used only by certain user types for special purposes are good candidates for microservices. They are typically extracted together with their driving and driven adapters. The microservice for diagnosing the machine is an example.
Performance. The Harvest App must process 1500 CAN messages per second in normal operation. Passing this flood on to the GUI would freeze it. Moreover, it wouldn’t make sense, because the Driver can’t keep track of 1500 changes per second. In a monolith, we move the processing into threads and ignore changes that arrive too frequently. CAN message processing is a natural candidate for a microservice running in its own process. We could give this process a higher priority than the HMI process.
We could go a step further and run this microservice on the microcontroller that is available on many SoCs these days but rarely used. This takes load off the microprocessor cores. The application and the microservice typically communicate over RPMsg (remote processor messaging).
If our device controls a machine directly in real time or in near real time, we should run these control tasks on the microcontroller. The embedded industry has been doing this for years, without calling these tasks microservices. However, these control tasks are microservices, especially when they have a well-designed interface.
Safety. As with control tasks, we can treat safety-critical tasks as microservices and run them on the microcontroller. A small program with a single purpose running bare-metal on a microcontroller can be certified according to the relevant standards much more easily than the same program running standalone or as part of a monolith on a full-blown Linux system. Again, nobody in the embedded industry would call safety-critical tasks microservices, but they normally are.
Resilience. If a monolith has a memory leak, accesses a dangling pointer or waits on a mutex forever, the whole application goes down and becomes unusable. Such a showstopper would be especially annoying if it happened in a part of the Harvest App that isn’t required for harvesting. We could move such parts into a microservice. As the microservice runs in its own process, only this non-vital service would become unavailable, and the harvest could continue. Good examples are services for diagnosing machine failures, calibrating the machine or collecting billing info.
If resilience were the only reason for introducing a microservice, we would probably end up with dozens of microservices, all introduced for technical rather than business reasons. Not a good idea! We should pair resilience with another reason.
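
The reason Different user types implies an access check inside the Harvest App. Here is a minimal sketch that assumes the user types and domains named above. It also assumes a single, app-wide temporary grant for expert drivers, whereas a real implementation would tie grants to individual users. `DomainAccess` and its functions are hypothetical.

```cpp
#include <chrono>
#include <map>

using Clock = std::chrono::steady_clock;

enum class UserType { Driver, ExpertDriver, Technician, Developer };
enum class Domain { HarvestingBeets, DiagnosingFailures, CalibratingMachine, CommissioningMachine };

// Hypothetical access rules for the Harvest App: every user type may harvest,
// Technicians and Developers may use the support subdomains listed here, and
// expert drivers may use them only with an unexpired temporary grant.
class DomainAccess {
public:
    void grantTemporary(Domain domain, Clock::time_point until) {
        temporaryGrants_[domain] = until;
    }

    bool mayAccess(UserType user, Domain domain, Clock::time_point now) const {
        if (domain == Domain::HarvestingBeets) {
            return true; // core domain: every user type
        }
        switch (user) {
        case UserType::Technician:
        case UserType::Developer:
            return true;
        case UserType::ExpertDriver: {
            auto it = temporaryGrants_.find(domain);
            return it != temporaryGrants_.end() && now < it->second;
        }
        case UserType::Driver:
            return false;
        }
        return false;
    }

private:
    std::map<Domain, Clock::time_point> temporaryGrants_;
};
```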

Extracting the Microservice for Updating the System

The Update Adapter is an obvious candidate for a microservice. The only working feature any device needs for its initial release is an OTA update. Then, we can incrementally add all the other features. We might even buy an Update Service from a third-party vendor instead of reinventing the wheel. The Update Adapter has a single purpose: updating the embedded system. And not to forget: The port For Updating System is a clean, minimal and high-level interface enabling us to implement different update strategies in the Core.

We have two good reasons to extract the Update Adapter as a microservice.

System updates are not the core business of machine manufacturers. They are without doubt extremely important, but manufacturers should buy a ready-made update solution instead of building the 1000th half-baked “solution”.
System updates require root privileges. Hence, the Harvest App must run as root. By moving system updates into a microservice, the Harvest App can run as a normal user. This would also be in line with the EU CRA requiring applications and services to run with the lowest possible privileges (see requirement Mitigation of Incidents of the EU CRA).

Figure 2: Modular monolith *Harvest App* after extracting the *Update Adapter* into the microservice *Update Service*

So, what are we waiting for? Let us tackle the extraction of the Update Adapter from the Harvest App into the new microservice Update Service. We transform the modular monolith without microservices from Figure 1 into the modular monolith with the microservice Update Service from Figure 2 above.

We start by defining the D-Bus interface, typically in XML, for the asynchronous communication between Harvest App and Update Service. We use a generator (e.g., qdbusxml2cpp for Qt, xml2cpp-codegen or dbus-cxx-xml2cpp for pure C++) to create the boilerplate code for Update Proxy and Update Remote. At a minimum, Update Proxy sends the messages checkAvailability, installAtScheduledTime and switchToNewSystem to Update Remote. It receives the messages updateAvailable and updateProgress from Update Remote. For Update Remote, the roles for sending and receiving are reversed. In short, Update Proxy and Update Remote implement the remote proxy pattern.
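
The proxy half of the remote proxy pattern can be sketched without D-Bus. In this sketch, a hypothetical `Channel` interface and a `Message` struct carrying a member name and string arguments replace the generated D-Bus code. Real code generated by qdbusxml2cpp or a similar tool looks different. Passing the scheduled time as an ISO 8601 string is also an assumption.

```cpp
#include <functional>
#include <string>
#include <vector>

// Stand-in for a D-Bus message: member name plus string arguments (hypothetical).
struct Message {
    std::string member;
    std::vector<std::string> args;
};

// Stand-in for the D-Bus connection to Update Remote (hypothetical).
class Channel {
public:
    virtual ~Channel() = default;
    virtual void send(const Message& message) = 0;
};

// Port For Updating System, reduced to what this sketch needs.
class ForUpdatingSystem {
public:
    virtual ~ForUpdatingSystem() = default;
    virtual void checkAvailability() = 0;
    virtual void installAtScheduledTime(const std::string& isoTime) = 0;
    virtual void switchToNewSystem() = 0;
    std::function<void(const std::string& version)> updateAvailable;
    std::function<void(int percent)> updateProgress;
};

// Update Proxy: turns port calls into outgoing messages and incoming
// messages into the port's signals.
class UpdateProxy : public ForUpdatingSystem {
public:
    explicit UpdateProxy(Channel& channel) : channel_(channel) {}

    void checkAvailability() override { channel_.send({"checkAvailability", {}}); }
    void installAtScheduledTime(const std::string& isoTime) override {
        channel_.send({"installAtScheduledTime", {isoTime}});
    }
    void switchToNewSystem() override { channel_.send({"switchToNewSystem", {}}); }

    // Called by the transport when Update Remote sends a message.
    void onMessage(const Message& message) {
        if (message.member == "updateAvailable" && updateAvailable && !message.args.empty()) {
            updateAvailable(message.args[0]);
        } else if (message.member == "updateProgress" && updateProgress && !message.args.empty()) {
            updateProgress(std::stoi(message.args[0]));
        }
    }

private:
    Channel& channel_;
};
```

In a unit test, a recording `Channel` lets us check two things: that `installAtScheduledTime` produces the right message, and that an incoming `updateProgress` message reaches the Core as a signal.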

Update Proxy is another adapter implementing the port For Updating System. It will eventually replace Update Adapter. For the time being, we keep both adapters. We build and run the Harvest App with both adapters. The Harvest App can switch between the two adapters for production and development.
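
Keeping both adapters side by side is the Branch by Abstraction pattern in code. Both adapters implement the same port, and the Harvest App picks one at startup. The sketch below uses a hypothetical `makeUpdatePort` factory and hypothetical backend names. The adapter bodies are elided, and `name()` exists only so that the choice can be observed.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Port For Updating System, reduced to one function plus a sketch-only name().
class ForUpdatingSystem {
public:
    virtual ~ForUpdatingSystem() = default;
    virtual std::string name() const = 0;
    virtual void checkAvailability() = 0;
};

// Existing in-process adapter (body elided).
class UpdateAdapter : public ForUpdatingSystem {
public:
    std::string name() const override { return "UpdateAdapter"; }
    void checkAvailability() override { /* talk to the update client in-process */ }
};

// New adapter forwarding to the Update Service (body elided).
class UpdateProxy : public ForUpdatingSystem {
public:
    std::string name() const override { return "UpdateProxy"; }
    void checkAvailability() override { /* send a D-Bus message to Update Remote */ }
};

// Both branches implement the same port; the Core never learns which one it got.
std::unique_ptr<ForUpdatingSystem> makeUpdatePort(const std::string& backend) {
    if (backend == "in-process") return std::make_unique<UpdateAdapter>();
    if (backend == "service") return std::make_unique<UpdateProxy>();
    throw std::invalid_argument("unknown update backend: " + backend);
}
```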

Now that we have Harvest App running with the new adapter Update Proxy, we must assemble the new microservice Update Service in an executable of its own. We have most parts of this executable working already: the test driver running the test cases on the Update Adapter. Replacing the test driver with Update Remote gives us the desired executable. The Update Service executable is made up of the generated Update Remote, the port For Updating System (a header in our case) and a copy of the Update Adapter.

This leaves us with some wiring in Update Remote. When the remote receives a message, it calls the corresponding slot on the port For Updating System. If the port forwards a signal from Update Adapter to Update Remote, the remote sends the corresponding message to Update Proxy.
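
The wiring in Update Remote might look like the following sketch. A hypothetical `Message` type and a send callback stand in for the generated D-Bus code. Incoming messages become calls on the port, and the port’s signals become outgoing messages for Update Proxy.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a D-Bus message: member name plus string arguments (hypothetical).
struct Message {
    std::string member;
    std::vector<std::string> args;
};

// Port For Updating System, reduced to what this sketch needs.
class ForUpdatingSystem {
public:
    virtual ~ForUpdatingSystem() = default;
    virtual void checkAvailability() = 0;
    virtual void installAtScheduledTime(const std::string& isoTime) = 0;
    virtual void switchToNewSystem() = 0;
    std::function<void(const std::string& version)> updateAvailable;
    std::function<void(int percent)> updateProgress;
};

// Update Remote: dispatches incoming messages to the port and turns the
// port's signals into messages for Update Proxy.
class UpdateRemote {
public:
    UpdateRemote(ForUpdatingSystem& port, std::function<void(const Message&)> sendToProxy)
        : port_(port), sendToProxy_(std::move(sendToProxy)) {
        port_.updateAvailable = [this](const std::string& version) {
            sendToProxy_({"updateAvailable", {version}});
        };
        port_.updateProgress = [this](int percent) {
            sendToProxy_({"updateProgress", {std::to_string(percent)}});
        };
    }
    // The signal handlers capture this, so copying would leave them dangling.
    UpdateRemote(const UpdateRemote&) = delete;
    UpdateRemote& operator=(const UpdateRemote&) = delete;

    // Called by the transport when Update Proxy sends a message.
    void onMessage(const Message& message) {
        if (message.member == "checkAvailability") {
            port_.checkAvailability();
        } else if (message.member == "installAtScheduledTime" && !message.args.empty()) {
            port_.installAtScheduledTime(message.args[0]);
        } else if (message.member == "switchToNewSystem") {
            port_.switchToNewSystem();
        }
    }

private:
    ForUpdatingSystem& port_;
    std::function<void(const Message&)> sendToProxy_;
};
```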

Figure 3: Message sequence triggered by *Core* calling *installNow* in *Harvest App* with *Update Service*

The above figure illustrates the message sequence triggered by the Core calling the function installNow, which is a convenience function for installAtScheduledTime(Now). Straight-line arrows are synchronous function calls. Squiggly-line arrows are asynchronous function calls. The D-Bus communication between Harvest App and Update Service (messages 2 and 7) is asynchronous, and so are the progress updates (messages 5, 6 and 7).

The next figure shows the same message sequence in the modular monolith without the Update Service. The important observation is that nothing changes for the Core, Harvest GUI Adapter, Support GUI Adapter and Machine Adapter in the Harvest App. The port For Updating System hides all the additional steps (messages 2, 3, 6 and 7) from the rest of the Harvest App. Hiding complexity and changes is exactly what a good abstraction should do.

Figure 4: Message sequence triggered by *Core* calling *installNow* in *Harvest App* without any services

We are finally ready to run the Harvest App together with the Update Service, each in its own process. We can try out the system update with the monolith cooperating with the new microservice.

We should expect system updates to succeed, as we, being TDD and BDD aficionados, updated the tests along with the transformation:

The tests of the business rules for system updates in the Core of the Harvest App don’t change.
The tests of the adapter Update Proxy in the Harvest App are new. They check that calling a function of the port results in sending the correct message, and that receiving a message results in the port emitting the corresponding signal.
The tests of Update Remote in the Update Service are new, too. They work like those of Update Proxy, just in reverse.
We need not test the D-Bus communication between Update Proxy and Update Remote, because it is generated.
Testing the Update Adapter stays the same but now happens as part of deploying the Update Service.

Once we are confident that our new combo works properly, we can remove the original Update Adapter from the Harvest App. We have successfully extracted our first microservice from a modular monolith. Our extraction procedure followed the Branch by Abstraction pattern (see also Sam Newman, Monolith to Microservices: Evolutionary Patterns to Transform Your Monolith, 2020, pp. 104–113). The port For Updating System corresponds to the abstraction and the adapter Update Proxy to the branch. An abstraction (port) has several branches (adapters) as its implementations.

Extracting the Microservice for Diagnosing the Machine

The GUI of the Harvest App is split into the Harvest GUI and the Support GUI. Drivers use the Harvest GUI for harvesting sugar beets. Technicians use the Support GUI for providing support to the Drivers when the machine doesn’t function properly. Drivers are not allowed to use the Support GUI, because they might damage the machine. Technicians are allowed to use the Harvest GUI, but they will only do so sporadically.

We have two different user types, Drivers and Technicians, doing their jobs in different parts of the GUI. Drivers are in the field 14 to 18 hours each day. The GUI must make their work efficient. If the harvested beets have too much foliage or if the top of the beets is cut off, Drivers know immediately which action to take. In contrast, Technicians are ideally not needed during the harvest. If they are, they must find low-level causes like open circuits, short circuits and stuck valves. Hence, the Support GUI looks very different from the Harvest GUI.

Different user types and different usage patterns are very good reasons to extract a Diagnosis App, a diagnosis microservice with a GUI, from the Harvest App. But we can’t extract the Diagnosis App right away, because the Machine Adapter is not prepared for this yet.

Although the two applications are mostly interested in different sets of CAN messages, they share an interest in some CAN messages. If the CAN driver stores the received messages in a single message queue per CAN bus, this queue is a shared resource for all applications on the same driver terminal. Then, if the Harvest App reads a message from the queue, the Diagnosis App will never see it, and vice versa. Hence, we need a component – the CAN Message Router – that reads all the messages from the CAN buses and publishes them to the subscribed applications (see the publish-subscribe pattern).
In the Harvest App, the subdomains For Harvesting Beets and For Providing Support both use the Machine Adapter to get their job done. We must extract the diagnosis functionality from the Machine Adapter into the new Diagnosis Adapter. The subdomain For Providing Support will use the Diagnosis Adapter through the port For Diagnosing Machine. The subdomain For Harvesting Beets will use the reduced Machine Adapter through the reduced port For Controlling Machine. This restructuring will enable us to extract the separate Diagnosis App later.
Typical implementations of the Machine Adapter provide a class for each ECU. The class has getters and setters for each ECU parameter. Bigger ECUs easily have 500-1000 parameters. These ECU classes provide a very poor abstraction of the Machine. They suffer at least from the code smells Data Class (a.k.a. data dump), Large Class and Primitive Obsession (see Chapter 3 of Martin Fowler’s book Refactoring, 2nd edition). In short, they stink!
Moving a parameter from one ECU to another changes the corresponding ECU classes – and hence the port For Controlling Machine and the code in the Core. The same happens if we replace, say, the MAN engine by a Liebherr engine or if we merge several ECUs into one ECU. If we think about the tasks performed by users and about the machine parameters used together by these tasks, we will end up with a much higher abstraction level of the port. The port will hide changes in the machine. We can always extend the port for new tasks made possible by the machine.
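
The publish-subscribe core of the CAN Message Router from the first preparation step could be sketched like this. `CanFrame` mimics a classic CAN frame, and `CanMessageRouter` with its functions is hypothetical. Reading from the actual buses (for example, via SocketCAN) is left out.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Stand-in for a classic CAN frame (SocketCAN calls it struct can_frame).
struct CanFrame {
    std::uint32_t id = 0;
    std::uint8_t length = 0;
    std::array<std::uint8_t, 8> data{};
};

using FrameHandler = std::function<void(const CanFrame&)>;

// Publish-subscribe core of the CAN Message Router: the router is the only
// reader of the CAN buses and hands each frame to every subscriber of its ID.
class CanMessageRouter {
public:
    void subscribe(std::uint32_t canId, FrameHandler handler) {
        subscribers_[canId].push_back(std::move(handler));
    }

    // Called for each frame read from a CAN bus; returns the number of deliveries.
    std::size_t route(const CanFrame& frame) const {
        auto it = subscribers_.find(frame.id);
        if (it == subscribers_.end()) {
            return 0; // nobody is interested: drop the frame
        }
        for (const auto& handler : it->second) {
            handler(frame);
        }
        return it->second.size();
    }

private:
    std::map<std::uint32_t, std::vector<FrameHandler>> subscribers_;
};
```

Frames that nobody subscribed to are dropped. A frame that both applications care about reaches both.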

Once we have applied these three preparation steps, the architecture of the Harvest App looks as follows.

Figure 5: *Harvest App* after extracting *Diagnosis Adapter* from *Machine Adapter* and after introducing the *CAN Message Router*

The CAN Message Router decodes the CAN messages into parameter values and sends the parameter values to the interested adapters. It ignores messages that no adapter is interested in. ECUs send important messages up to 100 times per second. Humans can’t deal with 100 changes of a parameter per second. They reach their limit at 5 changes per second. Even the GUI cannot cope with 1500 changes per second, which is the load for the harvesting page in normal operation. It would quickly freeze. The router slows down the flood to at most 5 changes per second per parameter. This filtering reduces the load of the harvesting page to roughly 150 changes per second.
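
The rate limiting could be sketched as a per-parameter throttle: a 200 ms minimum interval yields at most 5 changes per second per parameter. The sketch simplifies by dropping the changes in between. A production router would probably also deliver the latest value once the interval has elapsed, so that the GUI doesn’t keep showing a stale value. `ParameterThrottle` and the parameter names are hypothetical.

```cpp
#include <chrono>
#include <map>
#include <string>

using Clock = std::chrono::steady_clock;

// Per-parameter rate limiter: forwards a change only if at least minInterval
// has passed since the last forwarded change of the same parameter.
class ParameterThrottle {
public:
    explicit ParameterThrottle(Clock::duration minInterval) : minInterval_(minInterval) {}

    bool shouldForward(const std::string& parameter, Clock::time_point now) {
        auto it = lastForwarded_.find(parameter);
        if (it != lastForwarded_.end() && now - it->second < minInterval_) {
            return false; // too soon: drop this change
        }
        lastForwarded_[parameter] = now;
        return true;
    }

private:
    Clock::duration minInterval_;
    std::map<std::string, Clock::time_point> lastForwarded_;
};
```

Suppose a parameter changes every 10 ms (100 times per second) for one second. The throttle then forwards 5 of those changes, at 0, 200, 400, 600 and 800 ms.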

The Harvest App becomes even more responsive if it processes the messages from each CAN bus in a thread of its own. The threads take load off the main GUI thread. Aggressively filtering the messages and processing them in multiple threads improves the performance and responsiveness of the GUI considerably. Unfortunately, multi-threading comes with the risk of strange crashes and deadlocks, which reduces the resilience of the Harvest App. We will easily regain the slight loss in resilience once we introduce the Diagnosis App.

The CAN Message Router is begging to be extracted into a microservice for performance and resilience reasons. Moreover, this new Machine Service will enable us to extract the Diagnosis App so that Drivers and Technicians have separate work spaces. Here is the architecture diagram after extracting the CAN Message Router into the new Machine Service.

Figure 6: *Harvester App* with the newly extracted *Machine Service*

We set up the D-Bus communication between the proxies in the Harvester App and the remotes in the Machine Service in the same way as for the Update Service. For the Update Service, we made the calls to the high-level interface of the Update Adapter available via a D-Bus channel in the remote service. In contrast, for the Machine Service, we make the calls to the low-level interfaces of the Diagnosis Adapter and the Machine Adapter available via D-Bus channels in the remote service. Low-level interfaces have more functions, which are called much more often than those of their high-level counterparts. Hence, we must ensure that the higher frequency of function calls doesn’t overload the D-Bus. The aggressive filtering of the CAN Message Router certainly helps.

The CAN Message Router extracts the parameter values from the diagnostic CAN messages and sends them to the Diagnosis Remote. The remote forwards them via D-Bus to the Diagnosis Proxy. The proxy feeds them into the Diagnosis Adapter, which prepares the diagnostic information for the Core. The Core can send diagnostic commands in the other direction. The Diagnosis Adapter decomposes them into parameter values. The CAN Message Router translates the parameter values into CAN messages and writes the messages to the right CAN bus. The communication between the Machine Service and the Harvest App works similarly for the “machine” messages.
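
The round trip through remote and proxy can be sketched in-process. In the sketch below, a hypothetical `Channel` struct with two callbacks stands in for the D-Bus connection; a real implementation would marshal the values into D-Bus signals and method calls, for example with QtDBus or sd-bus. The class names follow the text, but their members are assumptions for illustration only.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using ValueHandler = std::function<void(const std::string&, double)>;

// Stand-in for a D-Bus connection: one direction per callback.
struct Channel {
    ValueHandler deliverToApp;      // Machine Service -> Harvest App
    ValueHandler deliverToService;  // Harvest App -> Machine Service
};

// Lives in the Machine Service: forwards values decoded by the CAN Message
// Router to the app and hands the app's commands back to the router.
class DiagnosisRemote {
public:
    DiagnosisRemote(Channel& ch, ValueHandler writeToCan) : ch_{ch} {
        ch_.deliverToService = std::move(writeToCan);
    }
    void onParameter(const std::string& name, double value) { ch_.deliverToApp(name, value); }
private:
    Channel& ch_;
};

// Lives in the Harvest App: feeds incoming values into the Diagnosis Adapter
// and sends the adapter's commands back through the channel.
class DiagnosisProxy {
public:
    DiagnosisProxy(Channel& ch, ValueHandler toAdapter) : ch_{ch} {
        ch_.deliverToApp = std::move(toAdapter);
    }
    void sendCommand(const std::string& name, double value) { ch_.deliverToService(name, value); }
private:
    Channel& ch_;
};
```

A value travels router, remote, channel, proxy, adapter; a command travels the same path backwards. Neither side knows anything about the other except the channel.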

Finally, we can extract the Diagnosis App as a microservice. The result is shown in the next diagram.

Figure 7: *Harvest App* with the newly extracted *Diagnosis App* in addition to the already existing *Machine Service* and *Update Service*

We roughly follow these steps for extracting the Diagnosis App.

We create a new home for the Diagnosis App.
We move the code and tests of For Diagnosing Machine, Diagnosis Adapter and Diagnosis Proxy to their new home. We get the tests running and passing. We can take the test executable as a template for the application executable.
As the port For Diagnosing Machine is missing in the Harvester App, the compiler will tell us which functionality in the Core of the Harvester App is related to diagnosis. We move this functionality to its new home in the Core of the Diagnosis App. We keep the Diagnosis App compiling.
We move For Providing Support to its new home. We get the tests for the Core of the Diagnosis App running and passing.
As the port For Providing Support is missing in the Harvester App, the compiler will tell us which parts of the GUI belong to the Support GUI Adapter. Guided by the compiler errors, we move the Support GUI Adapter step by step to the Diagnosis App. We get the tests for the Support GUI Adapter running and passing.
When users press the Diagnose button, the Harvest App starts the Diagnosis App. The window manager makes sure that the Diagnosis App is shown at exactly the same place as before. Users don’t notice that the diagnosis GUI now runs in a new process.
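
Starting the Diagnosis App in its own process could look roughly like this. A Qt application would probably use `QProcess`; the sketch below swaps in plain POSIX `posix_spawnp` and `waitpid` so that it stays self-contained. The helpers `startApp` and `waitForApp` are hypothetical. The test shows the crucial property: a child killed by a segmentation fault is merely reported to the caller, which keeps running.

```cpp
#include <cassert>
#include <csignal>
#include <spawn.h>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <vector>

extern char** environ;  // the caller's environment, passed on to the child

// Returned by waitForApp if waiting fails; outside the range of exit codes
// and negated signal numbers.
constexpr int kWaitFailed = -1000;

// Starts an application in its own process and returns the child's pid, or -1
// on failure. The child has its own address space, so a crash, memory leak or
// deadlock in it cannot take down the caller.
pid_t startApp(const std::vector<std::string>& argv) {
    std::vector<char*> args;
    for (const auto& a : argv) args.push_back(const_cast<char*>(a.c_str()));
    args.push_back(nullptr);
    pid_t pid = -1;
    if (posix_spawnp(&pid, args[0], nullptr, nullptr, args.data(), environ) != 0) return -1;
    return pid;
}

// Blocks until the application ends. Returns its exit code, or the negated
// signal number if a signal killed it (e.g. -SIGSEGV after a crash).
int waitForApp(pid_t pid) {
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return kWaitFailed;
    if (WIFEXITED(status)) return WEXITSTATUS(status);
    if (WIFSIGNALED(status)) return -WTERMSIG(status);
    return kWaitFailed;
}
```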

The Diagnosis App is a microservice with a GUI. Its primary actor or main user is a human. In contrast, the primary actor of microservices without a GUI, like the Update Service or the Machine Service, is another microservice or application.

The introduction of the Diagnosis App increases the resilience of the Harvest App. If the Diagnosis App gets overloaded with messages, leaks memory or crashes, the Harvest App and the other services are unaffected. The user can simply shut down the Diagnosis App. In an improved version, the Harvest App or an application manager could monitor the microservices and shut them down automatically if it notices irregularities. The Driver can still continue harvesting beets, update to a new version or fall back to an old version.
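
One common way for an application manager to notice irregularities is a heartbeat watchdog: every supervised app or service reports in regularly, and anything that stays silent longer than a timeout is flagged for shutdown or restart. The sketch below is a hypothetical `Watchdog` class, not part of the architecture described above. Heartbeats catch hangs and deadlocks; detecting memory leaks would need an additional check, for example reading `VmRSS` from `/proc/<pid>/status`.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical watchdog: every supervised app must send a heartbeat within
// `timeout`; otherwise it is reported as unhealthy.
class Watchdog {
public:
    explicit Watchdog(Clock::duration timeout) : timeout_{timeout} {}

    // Called whenever an app reports that it is alive.
    void heartbeat(const std::string& app, Clock::time_point now) { lastSeen_[app] = now; }

    // Returns all apps that have been silent for longer than the timeout.
    std::vector<std::string> unhealthy(Clock::time_point now) const {
        std::vector<std::string> result;
        for (const auto& [app, seen] : lastSeen_) {
            if (now - seen > timeout_) result.push_back(app);
        }
        return result;
    }

private:
    Clock::duration timeout_;
    std::map<std::string, Clock::time_point> lastSeen_;
};
```

In the test below, the Harvest App keeps sending heartbeats while the Diagnosis App falls silent, so only the Diagnosis App is reported; the manager would shut it down and leave the Harvest App alone.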

The Ultimate Goal: SoM = Solutions on Module

In my newsletter Solutions on Modules, I explain why we should move from “System on Modules” to “Solutions on Modules” and how microservices could help us to achieve this goal. The “S” in SoM or SoC should stand for “Solutions” – not “System”. The embedded Linux systems provided by the SoM, SoC and SBC vendors as BSPs (board support packages) have one crucial deficiency.

What most OEMs overlook [when selecting a SoC, SoM or SBC] is that they just bought into a custom Linux system of unknown provenance. Building custom embedded Linux systems […] is a big black hole for time and money. The root problem is the lack of well-defined interfaces between the application and the operating system layer.

Section Embedded Linux: A Mess of Epic Proportion in my newsletter “Solutions on Modules”

In other words, BSPs are lousy platforms because they lack well-defined APIs (application programming interfaces) separating the BSP from the application layer. Embedded hardware and software vendors force OEMs to reinvent the wheel over and over again. Each OEM must implement their own “solutions” for OTA updates, user authentication, factory installation, machine gateways, IoT gateways, secure boot and integrity checks of system images. As OEMs rarely have the expertise in all these areas, they inevitably come up with bad solutions. Instead of wasting lots of time and money on solved problems, they should focus on their core business.

Microservices cannot solve all the problems listed in the previous paragraph, but they can play an important role in solving many of them in ways good enough for most OEMs. Microservices must have clean, minimal and technology-agnostic interfaces, through which applications and other services communicate with them. Such APIs are exactly what embedded Linux systems lack. The independent deployability of microservices would make it easy for third-party vendors to sell them as ready-made products.
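
What a “clean, minimal and technology-agnostic” interface might look like from the application’s side can be sketched in C++. The interfaces `UpdateService` and `AuthenticationService` below are hypothetical; in a deployed system they would be exposed over an IPC mechanism like D-Bus rather than as C++ classes. Nothing in them reveals whether updates use A/B partitions, RAUC or Mender, or whether users log in with a password, a smart card or a fingerprint, so a vendor’s implementation (or a test double) can be swapped in without touching the application.

```cpp
#include <cassert>
#include <string>

// Hypothetical high-level update interface as the application sees it.
class UpdateService {
public:
    virtual ~UpdateService() = default;
    virtual bool isUpdateAvailable() = 0;
    virtual bool installUpdate() = 0;  // true if the new version will run after reboot
    virtual bool rollBack() = 0;       // fall back to the previous version
};

// Hypothetical authentication interface: one question, no technology details.
class AuthenticationService {
public:
    virtual ~AuthenticationService() = default;
    virtual bool isAccessGranted(const std::string& userId, const std::string& feature) = 0;
};

// Core business code depends on the interfaces only.
bool tryUpdate(UpdateService& updates, AuthenticationService& auth, const std::string& user) {
    if (!auth.isAccessGranted(user, "system-update")) return false;
    return updates.isUpdateAvailable() && updates.installUpdate();
}

// Test doubles standing in for vendor implementations.
struct FakeUpdates : UpdateService {
    bool available = true;
    bool isUpdateAvailable() override { return available; }
    bool installUpdate() override { return true; }
    bool rollBack() override { return true; }
};
struct OnlyTechnicians : AuthenticationService {
    bool isAccessGranted(const std::string& user, const std::string&) override {
        return user == "technician";
    }
};
```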

Let us have a quick look at how microservices could help us with some of the problems mentioned.

OTA updates. The core business application (the Harvest App in our running example) uses the high-level interface of the Update Service. It doesn’t know which update strategy (A/B, A/recovery, artefactory, etc.) or which update client/server (RAUC, OSTree, Mender, Memfault, etc.) the microservice uses. The vendor is responsible for preparing the embedded Linux system for their implementation of the microservice. Depending on the microservice vendor, the OEM may be able to choose the update strategy and the update client and server.
People from Mender and Memfault have told me that my idea of an update microservice is unrealistic, because OTA update solutions vary too much. Well, I disagree. I am pretty sure that the 80/20 rule applies to OTA updates, too. I think that 80% of the OEMs – especially small and medium OEMs – would be happy with a standard update solution. The remaining 20% can have their special solutions. With Torizon Cloud as the fleet management system, Toradex already provides an OTA update solution that is good enough for most of its customers. If Toradex provided its update solution as a microservice, OEMs would have hardly any reason to buy from Mender or Memfault and one reason less to look at other SoM vendors.
Factory installation. When the device is powered up for the very first time in the factory, it starts a small rescue Linux system over the network, from a USB drive or from an SD card. The rescue system runs the factory installation as a microservice. Factory installation partitions the disks, installs the bootloader, Linux kernel and root file system on the disks, installs the super-root keys in the e-fuses and closes the device. From then on, users can run normal OTA updates.
As with OTA updates, the 80/20 rule applies. 80% of the OEMs can be served with the same solution. The remaining 20% may need some customisations.
User authentication. As user authentication is needed in every embedded device but is hardly ever the core business of OEMs, it is well-suited for deployment as a microservice. The API is super simple: check whether a user should be granted access. The API hides whether authentication happens by username/password, smart card, fingerprint or face ID.
Machine Service. We could generate the Machine Service from two specifications: one of the CAN messages sent or received by each ECU, and one of the messages that subscribers like the Harvest App and the Diagnosis App are interested in. The former specification is used to generate the CAN Message Router, the latter to generate the remotes. Similarly, we could introduce a remote for routing a small set of relevant messages to an IoT cloud.
The vendor of this microservice would provide the code generator and define the format of the specifications. Such an auto-generated Machine Service would save OEMs of agricultural and construction machines a good amount of time and money. I should know, because I have implemented code generators for three different customers.
ECUs. Run-of-the-mill ECUs are not microservices at all. The typical ECU interface provides getters and setters for 20, 100, 500 or even more parameters. It is a shallow interface without any abstraction. However, users would benefit from ECUs as proper microservices. Their interfaces would be more in line with how users see the world than with how the machine sees it. In our running example, Drivers could simply tell the Harvest App what they observe: far too much foliage, a little bit of the root cut off, beets too dirty, etc. The Harvest App forwards these observations via the Machine Adapter and the Machine Service to the responsible ECU. The ECU translates the observations into the right machine commands.
We see this trend towards ECUs as microservices in the assistant systems of cars and, with some delay, in machines. Parking, emergency-braking, lane and blind-spot assistants illustrate this trend. Some maize harvesters automatically control the cutting length by the colour of the plants. These assistant systems are currently mushrooming. One architectural trade-off is: Shall we just add another box in the harvester or shall we implement the functionality as a microservice on a more powerful ECU?
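
To give an idea of what a generated Machine Service could contain, here is a table-driven router in C++. The two tables play the role of the two specifications: which parameters each CAN message carries, and which subscriber wants which parameter. The CAN IDs, the signal layout (little-endian 16-bit fields with a scale factor), the subscriber names and the `route` function are all hypothetical and do not follow any real ECU or J1939 definition. For brevity, the sketch merges router and remotes into one function, whereas the text generates them separately.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Specification 1 (hypothetical): the parameters carried by each CAN message.
struct SignalSpec {
    std::string name;
    int byteOffset;  // start of a little-endian 16-bit raw value
    double scale;    // physical value = raw * scale
};
const std::map<std::uint32_t, std::vector<SignalSpec>> kMessageSpec = {
    {0x101, {{"CoolantTemp", 0, 0.1}}},
    {0x102, {{"EngineSpeed", 0, 0.5}, {"EngineLoad", 2, 1.0}}},
};

// Specification 2 (hypothetical): which subscriber wants which parameter.
const std::map<std::string, std::vector<std::string>> kSubscriptions = {
    {"EngineSpeed", {"HarvestApp"}},
    {"CoolantTemp", {"HarvestApp", "DiagnosisApp"}},
};

struct Delivery {
    std::string subscriber;
    std::string parameter;
    double value;
};

// Decodes one CAN frame and fans the values out to the interested subscribers.
// Unknown messages and parameters without subscribers are dropped.
std::vector<Delivery> route(std::uint32_t canId, const std::uint8_t (&data)[8]) {
    std::vector<Delivery> out;
    auto msg = kMessageSpec.find(canId);
    if (msg == kMessageSpec.end()) return out;
    for (const auto& sig : msg->second) {
        auto subs = kSubscriptions.find(sig.name);
        if (subs == kSubscriptions.end()) continue;
        const auto raw = static_cast<std::uint16_t>(
            data[sig.byteOffset] | (data[sig.byteOffset + 1] << 8));
        for (const auto& s : subs->second) out.push_back({s, sig.name, raw * sig.scale});
    }
    return out;
}
```

A generator would only have to emit the two tables from the specifications; the routing logic itself stays the same for every machine.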

Microservices will help commoditise business capabilities that are outside the core business of OEMs. They will help OEMs avoid reinventing the wheel and save them good money. They are a powerful tool to move from “System on Modules” to “Solutions on Modules”. However, this will require the embedded industry to re-think their business propositions. It will take time!