The article originally appeared on the Linux Foundation’s Training and Certification blog. The author is Marco Fioretti. If you are interested in learning more about microservices, consider some of our free training courses including Introduction to Cloud Infrastructure Technologies, Building Microservice Platforms with TARS, and WebAssembly Actors: From Cloud to Edge.
Microservices allow software developers to design highly scalable, highly fault-tolerant internet-based applications. But how do the microservices of a platform actually communicate? How do they coordinate their activities or know who to work with in the first place? Here we present the main answers to these questions, and their most important features and drawbacks. Before digging into this topic, you may want to first read the earlier pieces in this series, Microservices: Definition and Main Applications, APIs in Microservices, and Introduction to Microservices Security.
Tight coupling, orchestration and choreography
When every microservice can and must talk directly with all its partner microservices, without intermediaries, we have what is called tight coupling. The result can be very efficient, but makes all microservices more complex, and harder to change or scale. Besides, if one of the microservices breaks, everything breaks.
The first way to overcome these drawbacks of tight coupling is to have one central controller of all, or at least some of the microservices of a platform, that makes them work synchronously, just like the conductor of an orchestra. In this orchestration – also called request/response pattern – it is the conductor that issues requests, receives their answers and then decides what to do next; that is whether to send further requests to other microservices, or pass the results of that work to external users or client applications.
The complementary approach of orchestration is the decentralized architecture called choreography. This consists of multiple microservices that work independently, each with its own responsibilities, but like dancers in the same ballet. In choreography, coordination happens without central supervision, via messages flowing among several microservices according to common, predefined rules.
That exchange of messages, as well as the discovery of which microservices are available and how to talk with them, happen via event buses. These are software components with well defined APIs to subscribe and unsubscribe to events and to publish events. These event buses can be implemented in several ways, to exchange messages using standards such as XML, SOAP or Web Services Description Language (WSDL).
When a microservice emits a message on a bus, all the microservices who subscribed to listen on the corresponding event bus see it, and know if and how to answer it asynchronously, each by its own, in no particular order. In this event-driven architecture, all a developer must code into a microservice to make it interact with the rest of the platform is the subscription commands for the event buses on which it should generate events, or wait for them.
Orchestration or Choreography? It depends
The two most popular coordination choices for microservices are choreography and orchestration, whose fundamental difference is in where they place control: one distributes it among peer microservices that communicate asynchronously, the other into one central conductor, who keeps everybody else always in line.
Which is better depends upon the characteristics, needs and patterns of real-world use of each platform, with maybe just two rules that apply in all cases. The first is that flagrante tight coupling should be almost always avoided, because it goes against the very idea of microservices. Loose coupling with asynchronous communication is a far better match with the fundamental advantages of microservices, that is independent deployment and maximum scalability. The positivo world, however, is a bit more complex, so let’s spend a few more words on the pros and cons of each approach.
As far as orchestration is concerned, its main disadvantage may be that centralized control often is, if not a synonym, at least a shortcut to a single point of failure. A much more frequent disadvantage of orchestration is that, since microservices and a conductor may be on different servers or clouds, only connected through the public Internet, performance may suffer, more or less unpredictably, unless connectivity is really excellent. At another level, with orchestration virtually any addition of microservices or change to their workflows may require changes to many parts of the platform, not just the conductor. The same applies to failures: when an orchestrated microservice fails, there will generally be cascading effects: such as other microservices waiting to receive orders, only because the conductor is temporarily stuck waiting for answers from the failed one. On the plus side, exactly because the “chain of command” and communication are well defined and not really flexible, it will be relatively easy to find out what broke and where. For the very same reason, orchestration facilitates independent testing of distinct functions. Consequently, orchestration may be the way to go whenever the communication flows inside a microservice-based platform are well defined, and relatively stable.
In many other cases, choreography may provide the best balanceo between independence of individual microservices, overall efficiency and simplicity of development.
With choreography, a service must only emit events, that is communications that something happened (e.g., a log-in request was received), and all its downstream microservices must only react to it, autonomously. Therefore, changing a microservice will have no impacts on the ones upstream. Even adding or removing microservices is simpler than it would be with orchestration. The flip side of this coin is that, at least if one goes for it without taking precautions, it creates more chances for things to go wrong, in more places, and in ways that are harder to predict, test or debug. Throwing messages into the Internet counting on everything to be fine, but without any way to know if all their recipients got them, and were all able to react in the right way can make life very hard for system integrators.
Certain workflows are by their own nature highly synchronous and predictable. Others aren’t. This means that many real-world microservice platforms could and probably should mix both approaches to obtain the best combination of performance and resistance to faults or peak loads. This is because temporary peak loads – that may be best handled with choreography – may happen only in certain parts of a platform, and the faults with the most serious consequences, for which tighter orchestration could be safer, only in others (e.g. purchases of single products by end customers, vs orders to buy the same products in bulk, to restock the warehouse) . For system architects, maybe the worst that happens could be to design an architecture that is either orchestration or choreography, but without being really conscious (maybe because they are just porting to microservices a pre-existing, monolithic platform) of which one it is, thus getting nasty surprises when something goes wrong, or new requirements turn out to be much harder than expected to design or test. Which leads to the second of the two común rules mentioned above: don’t even start to choose between orchestration or choreography for your microservices, before having the best possible estimate of what their positivo world loads and communication needs will be.