ISE Blog

Observability in Distributed Systems

Whether you’re working with a single-instance application or a complex deployment of dozens of orchestrated microservices, it is important to know that the code is working the way it should, and how people and outside systems are interacting with it. I’ve written before about instrumenting applications and even showed a toy example using AWS X-Ray, but I thought I should devote some space to observability itself and why it is important.

What is Observability?

In any given week I have a handful of emails in my inbox from cloud vendors selling “Observability”.  But what is it?  There’s a technical definition from control theory: 

Observability: n. A measure of how well internal states of a system can be inferred from knowledge of its external outputs.
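
For linear systems, control theory makes this definition precise with the Kalman rank condition. It is stated here only to make the borrowed term concrete. A system with n internal state variables x, inputs u, and measured outputs y is observable exactly when its observability matrix has full rank:

```latex
\dot{x}(t) = A\,x(t) + B\,u(t), \qquad y(t) = C\,x(t), \qquad x(t) \in \mathbb{R}^{n}

\mathcal{O} = \begin{bmatrix} C \\ CA \\ CA^{2} \\ \vdots \\ CA^{n-1} \end{bmatrix},
\qquad \text{the system is observable} \iff \operatorname{rank}\mathcal{O} = n
```

Informally, the stacked rows are everything the outputs can reveal about the internal state. If their rank is below n, some distinct internal states produce identical outputs and cannot be told apart from the outside.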

In practice, I like to ask the questions: “How will I know if it’s working the way it’s intended?  How will I know if it is not working?”  In other words, as engineers, our practice with respect to observability is to make sure that the code we deploy is providing sufficient outputs to determine if it is working and how it is working. 

[Image: vintage car dashboard. Source: Public Domain Pictures]

Monitoring and Instrumentation

Monitoring solutions are often marketed as offering observability. Monitoring is a way to make your observability accessible, but it is not observability on its own. A monitoring solution aims to provide insight into how your application code is working and to present that information in a meaningful way. Out of the box, however, monitoring can only show you how the underlying infrastructure is working, along with the externally visible portions of the application, such as aggregated logs, built-in health metrics, and perhaps basic statistics on API calls.

Instrumentation is the ability to monitor or measure the level of a product's performance, to diagnose errors, and to write trace information. Which arguments are most frequently passed to a given API call? How long does an internal function run on average before returning a result? Instrumentation is the art of adding these outputs in a way that will help answer important questions when something goes wrong.
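
As a rough illustration of those two questions, here is a minimal Python sketch of hand-rolled instrumentation: a decorator that counts argument combinations and records call durations. The names `instrumented`, `ARG_COUNTS`, `DURATIONS`, `average_duration`, and the `lookup_price` handler are hypothetical, and the metrics live in process memory; a real deployment would export them to a monitoring or tracing backend rather than keep them in dictionaries.

```python
import functools
import time
from collections import Counter, defaultdict

# Hypothetical in-process metric stores, keyed by function name.
# A real system would ship these to a monitoring backend instead.
ARG_COUNTS: dict[str, Counter] = defaultdict(Counter)
DURATIONS: dict[str, list[float]] = defaultdict(list)


def instrumented(func):
    """Record which arguments a function receives and how long each call takes."""
    name = func.__qualname__

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Use repr() so any argument combination can be counted, hashable or not.
        key = repr((args, sorted(kwargs.items())))
        ARG_COUNTS[name][key] += 1
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Record elapsed time even when the call raises.
            DURATIONS[name].append(time.perf_counter() - start)

    return wrapper


def average_duration(name: str) -> float:
    """Mean call duration in seconds, or 0.0 if the function was never called."""
    samples = DURATIONS[name]
    return sum(samples) / len(samples) if samples else 0.0


@instrumented
def lookup_price(sku: str, currency: str = "USD") -> float:
    # Hypothetical API handler, used only for illustration.
    return 9.99


for sku in ["A1", "A1", "B2"]:
    lookup_price(sku)

print(ARG_COUNTS["lookup_price"].most_common(1))
print(f"average duration: {average_duration('lookup_price'):.6f}s")
```

Wrapping the call in `try`/`finally` means calls that end in an exception are still timed, which is often exactly the case you want to see when something goes wrong.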

[Image: dashboard. Source: Flickr]

Level of Detail

An important consideration is capturing the right level of detail. The instrumentation should give you insight, and that takes a careful balance. A dashboard full of dozens of metrics makes it hard to identify which pieces are valuable, and too much information makes it hard to draw correlations between different metrics.

In a large distributed system, different levels of detail are appropriate for different tasks. The view you need when debugging a single microservice is very different from the view of holistic application health. The best solutions allow for multiple dashboard views, each specific to a task or component.
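
One way to picture this is the same raw data rolled up at two levels of detail. The sketch below is a hypothetical Python illustration, assuming each request produces a sample tagged with its service and endpoint: `holistic_view` reduces everything to an application-wide error rate and mean latency, while `component_view` breaks a single service down per endpoint. Neither function corresponds to any particular monitoring product.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Sample:
    service: str       # which microservice handled the request
    endpoint: str      # component within that service
    latency_ms: float
    ok: bool


def holistic_view(samples):
    """Application-wide health summary (assumes at least one sample)."""
    return {
        "error_rate": sum(not s.ok for s in samples) / len(samples),
        "mean_latency_ms": mean(s.latency_ms for s in samples),
    }


def component_view(samples, service):
    """Per-endpoint detail for one service, e.g. while debugging it."""
    by_endpoint = defaultdict(list)
    for s in samples:
        if s.service == service:
            by_endpoint[s.endpoint].append(s)
    return {
        endpoint: {
            "calls": len(group),
            "errors": sum(not s.ok for s in group),
            "mean_latency_ms": mean(s.latency_ms for s in group),
        }
        for endpoint, group in by_endpoint.items()
    }


# Hypothetical sample data for two services.
samples = [
    Sample("checkout", "/pay", 120.0, True),
    Sample("checkout", "/pay", 480.0, False),
    Sample("checkout", "/cart", 40.0, True),
    Sample("catalog", "/items", 20.0, True),
]

print(holistic_view(samples))
print(component_view(samples, "checkout"))
```

The holistic view hides that the checkout service's `/pay` endpoint accounts for the only error and most of the latency; the component view surfaces it, but that level of detail would be noise on an application-health dashboard.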

What’s the Difference?

Charity Majors, former Facebook engineer and CEO at Honeycomb.io, describes the difference like so:

As we work to make our code more observable, we seek understanding of our systems and how they operate.  Doing so deliberately in your development process will lead to systems that are inherently more understandable, and that leads to them being more reliable.


Do you track your application's performance?  If so, how do you incorporate it into your delivery process? Comment below and join the conversation!

Samuel Thurston, Software Engineer

Samuel Thurston is a Software Engineer and Cloud Practice Lead for ISE, architecting and implementing cloud solutions for enterprise clients. He enjoys running, yoga, and cooking, and is frequently found on the disc golf course.

Latest posts by Samuel Thurston, Software Engineer

FaaS: Function as a Service Sep 28, 2018

Networking is Hard Aug 23, 2018

Observability in Distributed Systems Jul 19, 2018

Hands-On With Amazon AWS DeepLens Jun 21, 2018

Using Amazon Athena to Query Large Datasets May 24, 2018