By Mike Cohen
Imagine you are ready to deploy a new microservice application. You’ve been given carte blanche by management to do it “right”.
- You spin up your Kubernetes cluster in the public cloud of your choice.
- You let the cloud provider handle the Kubernetes control plane and spread your nodes out over a few availability zones for redundancy.
- You fire up your services and you are off and running.
This is how it’s supposed to work! That was super easy.
Then you start getting billing alarms...
Apparently, your new cluster is generating a ton of cross-availability-zone traffic. You knew that was possible, but it wasn’t top of mind. Your Kubernetes nodes are by design distributed across zones, and your services are by design distributed across nodes.
And this is by design going to run up your traffic bill.
Think for a moment about the microservice architecture you thought you were deploying.
Because your services are load balanced across all of the available containers, what you actually get is service-to-service traffic that routinely crosses availability zones.
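To get a feel for how much traffic that can be, here is a rough back-of-the-envelope sketch. It assumes every client pod sends the same volume and every request lands on a uniformly random server pod regardless of zone. That is a simplification for illustration, not a description of how any particular load balancer behaves, and the zone names and pod counts are made up.

```python
from fractions import Fraction


def cross_zone_fraction(client_pods, server_pods):
    """Expected share of requests that cross a zone boundary.

    Both arguments map zone name -> pod count. Assumes uniform client
    volume and uniformly random choice of server pod (illustrative only).
    """
    total_clients = sum(client_pods.values())
    total_servers = sum(server_pods.values())
    same_zone = Fraction(0)
    for zone, clients in client_pods.items():
        servers = server_pods.get(zone, 0)
        # Probability the client is in this zone AND the chosen server is too.
        same_zone += Fraction(clients, total_clients) * Fraction(servers, total_servers)
    return 1 - same_zone


# Hypothetical layout: two pods of each service in each of three zones.
even = {"zone-a": 2, "zone-b": 2, "zone-c": 2}
print(cross_zone_fraction(even, even))  # 2/3
```

Under these assumptions, an evenly spread three-zone cluster sends about two thirds of its service-to-service requests across a zone boundary.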
For anyone building service-based applications on public cloud, this is both an important and often overlooked observability problem. Which pairs of services are generating the most cross-zone traffic? Have there been any recent spikes in cross-zone traffic that you should be aware of?
This is also the kind of problem flows are excellent at solving if the data are gathered in the right way. A typical cloud provider VPC flow log would capture IP address to IP address communication. Unfortunately, in your Kubernetes environment, you would have to do a lot of post-processing to map IP addresses to your services, you would miss some of the NAT done around service VIPs, and none of it could be done anywhere near real time.
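To make that post-processing concrete, here is a minimal sketch of what it might look like. The record layout, the IP addresses, and the `POD_INFO` lookup table are all hypothetical. A real VPC flow log has many more fields, and the IP-to-service mapping would have to be rebuilt constantly as pods come and go.

```python
from collections import Counter

# Hypothetical lookup table: pod IP -> (service, zone).
POD_INFO = {
    "10.0.1.5": ("frontend", "zone-a"),
    "10.0.2.7": ("cart", "zone-b"),
    "10.0.1.9": ("cart", "zone-a"),
    "10.0.3.4": ("payments", "zone-c"),
}

# Hypothetical flow records: (source IP, destination IP, bytes).
FLOWS = [
    ("10.0.1.5", "10.0.2.7", 6_000),  # frontend -> cart, crosses zones
    ("10.0.1.5", "10.0.1.9", 9_000),  # frontend -> cart, same zone
    ("10.0.2.7", "10.0.3.4", 2_500),  # cart -> payments, crosses zones
    ("10.0.1.9", "10.0.3.4", 1_500),  # cart -> payments, crosses zones
    ("10.0.1.5", "10.9.9.9", 700),    # destination not in the table
]


def cross_zone_bytes(flows, pod_info):
    """Sum cross-zone bytes per (source service, destination service) pair."""
    totals = Counter()
    unmapped = 0
    for src, dst, nbytes in flows:
        if src not in pod_info or dst not in pod_info:
            # Traffic we cannot attribute, for example to a service VIP.
            unmapped += nbytes
            continue
        src_service, src_zone = pod_info[src]
        dst_service, dst_zone = pod_info[dst]
        if src_zone != dst_zone:
            totals[(src_service, dst_service)] += nbytes
    return totals, unmapped


totals, unmapped = cross_zone_bytes(FLOWS, POD_INFO)
for pair, nbytes in totals.most_common():
    print(pair, nbytes)
print("unmapped bytes:", unmapped)
```

Even in this toy version, the traffic that cannot be mapped back to a pod simply falls out of the report, which is the gap described above.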
By gathering flow information from the operating system, Flowmill can break down service-to-service traffic patterns by availability zone as they happen, so you know which interactions are most expensive and can optimize accordingly.