Modern distributed systems, especially microservices, are hard to design, develop, and maintain,
as we've already seen in one of the previous posts, where we discussed some key principles that
should be followed when working on a microservice-based architecture.
We have seen that, at the moment, the most complete set of ideas about building this kind of system is given by the Reactive Manifesto.
Among these principles, the most important one is the responsiveness of our system, which is achieved by building an elastic
and resilient system on top of message-driven integrations between its components.
In a previous post, we also discussed some message-driven integration patterns between microservices.
In this post, we will talk a little more about the integration parts of our system, focusing on the real cost
of the latency produced by communication between components.
A little talk about traditions …
Traditionally, when talking about microservices, most people think of synchronous invocations over HTTP via REST APIs.
This is possible, and some large organizations are doing it very well; Monzo is a good example.
When talking about reactive systems, the Reactive Manifesto encourages us to use a message-driven approach and asynchronous
integrations, but we can still use other kinds of integration patterns and have a responsive system (remember
that responsiveness is our final goal, according to the manifesto).
The mistakes of the past
It's not the first time we've heard that integration between software components is hard, especially when the components
are not running in the same process.
When designing a microservices-based architecture with synchronous RPC integrations between microservices,
we cannot figure out at the very beginning how responsive our system will be or what latencies each microservice
will add. At most, we can set SLAs for the response time of each microservice, but this forces us to build a
complex system model from the very beginning and respect it no matter what happens in the future (I think this is
what organizations like Monzo did).
A frustrating moment comes when we already have part of our system running in production (even if it is still an MVP, or
perhaps version 1 of the system) and we want to add functionality that requires very low-latency processing
that we can't provide right now. At that specific moment, we realize the mistakes of the past (using synchronous RPC all over
the place).
Good-mannered integrations
Because it's never too late, we can start to redesign our system to accommodate the performance requirements of
the new functionality. But how can we know that the path we are following now is a good one?
Well… this is a hard question, but there are at least a few key points to take into consideration.
One of the world's top performance specialists, Martin Thompson, considers that integration between software components
"is all about good manners", and he advocates avoiding poorly performing
protocols, such as HTTP with JSON payloads, in favor of alternatives like binary protocols.
If we want to do RPC, we can replace the plain old HTTP-and-JSON combination with something like gRPC and
Protocol Buffers, as sketched below.
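As a rough illustration, here is a minimal Java sketch of such an RPC call. The PaymentServiceGrpc stub, ChargeRequest, and ChargeReply types are hypothetical, standing in for the classes the protoc gRPC plugin would generate from a payments.proto file:

```java
// Minimal sketch: calling another microservice over gRPC instead of HTTP+JSON.
// PaymentServiceGrpc, ChargeRequest, and ChargeReply are hypothetical classes
// that the protoc gRPC plugin would generate from a payments.proto file.
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

public class GrpcCallSketch {
    public static void main(String[] args) {
        ManagedChannel channel = ManagedChannelBuilder
                .forAddress("payments.internal", 50051)
                .usePlaintext() // TLS omitted to keep the sketch short
                .build();

        // Generated blocking stub; the payload travels as a compact
        // Protocol Buffers binary instead of verbose JSON text.
        PaymentServiceGrpc.PaymentServiceBlockingStub payments =
                PaymentServiceGrpc.newBlockingStub(channel);

        ChargeReply reply = payments.charge(ChargeRequest.newBuilder()
                .setAccountId("acc-42")
                .setAmountCents(1999)
                .build());

        System.out.println("status: " + reply.getStatus());
        channel.shutdown();
    }
}
```

The wire format is binary and the schema is shared ahead of time, so both serialization cost and payload size drop compared to HTTP with JSON.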
Another aspect that Martin takes into consideration, and that is also discussed in the Reactive Manifesto, is the use of
asynchronous integrations. We can integrate our system's components asynchronously by doing RPC or by leveraging
a message broker such as Apache Kafka. Even in the case of a message broker, there are plenty of serialization frameworks
that can be used to gain performance and keep the latency of our system to a minimum; Apache Avro, Protocol Buffers, and
Apache Thrift are just a few of them (see the sketch after this paragraph).
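As a hedged example, the sketch below publishes a Protocol Buffers payload to a Kafka topic as raw bytes; the ChargeRequest message is the same hypothetical one as above, and the "charges" topic name is illustrative:

```java
// Minimal sketch: publishing a compact binary payload to Kafka instead of JSON.
// ChargeRequest is the same hypothetical protobuf message as in the gRPC sketch;
// the "charges" topic name is illustrative.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaBinaryProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            // Protocol Buffers wire format: far smaller than the equivalent JSON text.
            byte[] payload = ChargeRequest.newBuilder()
                    .setAccountId("acc-42")
                    .setAmountCents(1999)
                    .build()
                    .toByteArray();

            // send() is asynchronous: it returns immediately and the record
            // is delivered in the background, so the caller is never blocked.
            producer.send(new ProducerRecord<>("charges", "acc-42", payload));
        }
    }
}
```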
Latency numbers
The responsiveness of our system is directly related to the latency seen by a customer when performing a
specific business request.
One of the best books about system performance,
Systems Performance: Enterprise and the Cloud
by Brendan Gregg, presents the latencies of some very common operations. A subset of those values, together with
the in-data-center round trip discussed below, can be seen in the following table.

| Operation | Latency | Scaled (1 CPU cycle = 1 s) |
| --- | --- | --- |
| 1 CPU cycle | 0.3 ns | 1 s |
| Level 1 cache access | 0.9 ns | 3 s |
| Main memory access (DRAM) | 120 ns | 6 min |
| Solid-state disk I/O | 50–150 µs | 2–6 days |
| Round trip within the same data center | ~0.5 ms | ~2 weeks |
| Rotational disk I/O | 1–10 ms | 1–12 months |
| Internet: San Francisco to New York | 40 ms | 4 years |
An important aspect is that the values in the last column of the table are scaled to more "human-friendly" values,
obtained by considering 1 CPU cycle to be equivalent to 1 second!
Let's have a quick look at some values from the table. In most microservices-based architectures,
the most important latency value is the round trip within the same data center, which the table puts
at about 0.5 milliseconds. For us as humans, this is almost instantaneous, so
it may sound reasonable to access other services at a cost of 0.5 milliseconds per access. But (!)
look at the scaled latency of this operation in the last column: if we consider 1 CPU
cycle to be equivalent to 1 second, a round trip within the same data center takes approximately 14 days. That's 2 weeks!
Are we still considering the cost of performing round trips within the same data center a reasonable one? I don't think so!
Moreover, if we design and develop business flows that include sequential blocking invocations between multiple services,
then the total latency can be approximated by the formula NI × 0.5 ms, where NI stands for the number of invocations.
If we take a concrete value for NI, say 3, we get a total latency of 1.5 milliseconds. Remember
that half a millisecond is equivalent to roughly 14 days in human time, so for a business flow that implies 3
round trips within the same data center, a customer waits 1.5 milliseconds, blocked in front of our application,
while the servers are blocked for the equivalent of about 42 days serving that request. Are we still considering
our system responsive?!
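Just to make the arithmetic concrete, here is a tiny sketch; the per-call figures are taken straight from the table above:

```java
// Back-of-the-envelope cost of NI sequential blocking round trips:
// 0.5 ms per in-data-center round trip, ~14 human-scaled days per round trip.
public class SequentialLatency {
    public static void main(String[] args) {
        int ni = 3;                      // NI: number of sequential invocations
        double perCallMillis = 0.5;      // round trip within the same data center
        double perCallScaledDays = 14.0; // scaled value: 1 CPU cycle = 1 second

        System.out.printf("wall clock:  %.1f ms%n", ni * perCallMillis);       // 1.5 ms
        System.out.printf("human scale: %.0f days%n", ni * perCallScaledDays); // 42 days
    }
}
```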
What if non-blocking?
Obviously, performing tasks in distributed systems can't avoid invocations between remote components, whether they are deployed
in the same data center or not. And as we have seen above, these invocations come at an expensive price. We have two problems:
the customer is blocked for a while in front of his or her computer or mobile device, and our resources are also blocked
performing the customer's requests. What can we do in this situation?
From the customer's point of view, it may be somewhat irritating that performing a task takes some time, but remember,
the wait is just 1.5 milliseconds (which is not so bad for a human). Still, this can be heavily improved by designing
a non-blocking system that leverages asynchronous integrations via message passing. Basically, after the customer sends
the initial request, the first service captures it and notifies the customer that it was received (so in the happiest
scenario, the customer only waits for this acknowledgment). After that, all the services cooperate asynchronously in
performing the request. In the end, some service notifies the customer about the response. A sketch of such an edge
service follows.
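Here is a minimal sketch of that first "capture and acknowledge" step, using the JDK's built-in HTTP server; the /orders path and the in-memory queue (standing in for a real broker topic) are illustrative assumptions:

```java
// Minimal sketch of the "accept and acknowledge" edge service described above.
// The in-memory queue stands in for a real broker (e.g. a Kafka topic);
// the /orders path and port are illustrative.
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AcceptingEdgeService {
    public static void main(String[] args) throws Exception {
        BlockingQueue<byte[]> outbound = new LinkedBlockingQueue<>();

        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/orders", exchange -> {
            // Capture the request and hand it off for asynchronous processing.
            byte[] body = exchange.getRequestBody().readAllBytes();
            outbound.offer(body); // in a real system: publish to a broker topic

            // Acknowledge immediately; the customer is not blocked while
            // downstream services cooperate on the request.
            byte[] ack = "accepted".getBytes();
            exchange.sendResponseHeaders(202, ack.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(ack);
            }
        });
        server.start();
    }
}
```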
What about our servers? Well… by using asynchronous message-driven integrations, we no longer keep our services blocked.
When one of them receives a message, it performs its task and then sends a message to another service, without
waiting for a response, as in the sketch below. This allows much better management of our resources.
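And here is a hedged sketch of one of the cooperating services: it consumes a message from Kafka, performs its step, and forwards a message onward without waiting for any downstream response. The topic names, consumer group, and doBusinessStep helper are all illustrative:

```java
// Minimal sketch of a fire-and-forget worker: consume a message, do the work,
// forward a message to the next service, and never block on a response.
// Topic names, the group id, and doBusinessStep are illustrative assumptions.
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ForwardingWorker {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "payment-workers");
        consumerProps.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, byte[]> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, byte[]> record : records) {
                    byte[] result = doBusinessStep(record.value());
                    // Forward to the next service and move on; there is no
                    // blocking wait for the downstream response.
                    producer.send(new ProducerRecord<>("payments", record.key(), result));
                }
            }
        }
    }

    // Hypothetical placeholder for this service's actual business logic.
    private static byte[] doBusinessStep(byte[] payload) {
        return payload;
    }
}
```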
So the solution to our problem is an asynchronous message-driven style of integration between the system's components.
This allows our microservices to avoid blocking while performing business flows.
Conclusions
- When working with microservices, integrations between them are hard to design. Most importantly, do not use an
integration pattern just because other big organizations did; they very probably took into consideration some very
precise characteristics of their own systems and designed and developed accordingly.
- Building large distributed systems using HTTP and JSON is possible, but there are plenty of arguments that better
solutions exist.
- We should design "good-mannered" integrations.
- We must always know the latencies of our system.
- Both a bad customer experience and servers blocked performing requests can be avoided by using asynchronous
message-driven integrations.