Your business as an observable system

Your organization is a complex adaptive system that processes information and learns. To understand it in terms of systems thinking, as a bounded object, you need to be able to see it.

Surprisingly, very few companies I have seen over the years are interested in studying their own organization. At first they are too busy getting their business off the ground, and only a few blinks later they are already too deep into their daily grind to step back.

For most of Codegram's life, we have been no different. You know the drill: you get busy, there is no time for that, we need to finish this project first. Soon enough, you realize that answering the most basic questions about your business is tedious work: diving through spreadsheets, sending emails back and forth, debugging custom scripts that no one remembers how to run anymore. For slightly more complex questions, it might simply be impossible.

In this article I describe the steps we took to change this, and why you might want to do it too.

Seeing your business quantitatively

Instrumenting your production deployments is essential to monitor their health, and your organization is no different. Like any complex system, it offers a wealth of metrics that give you different perspectives. However, borrowing some ideas from the recent observability trends in the distributed systems literature, we favor capturing data as close to the source as possible, in its raw form, rather than as aggregations.
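To make the raw-versus-aggregated point concrete, here is a minimal sketch in Python. It uses an in-memory SQLite table as a stand-in for the warehouse, and the table layout, the "invoice_sent" event kind and the function names are all hypothetical. The point is only that one set of raw events can answer several different questions later, which a pre-computed total could not.

```python
import sqlite3


def build_store(events):
    """Load raw (occurred_at, kind, project, amount) events into an in-memory table."""
    conn = sqlite3.connect(":memory:")  # stand-in for the real warehouse
    conn.execute(
        "CREATE TABLE events (occurred_at TEXT, kind TEXT, project TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", events)
    return conn


def invoiced_by_project(conn):
    """One view on the raw events: total invoiced per project."""
    return conn.execute(
        "SELECT project, SUM(amount) FROM events "
        "WHERE kind = 'invoice_sent' GROUP BY project ORDER BY project"
    ).fetchall()


def invoiced_by_month(conn):
    """Another view on the same events: total invoiced per calendar month."""
    return conn.execute(
        "SELECT substr(occurred_at, 1, 7), SUM(amount) FROM events "
        "WHERE kind = 'invoice_sent' "
        "GROUP BY substr(occurred_at, 1, 7) ORDER BY substr(occurred_at, 1, 7)"
    ).fetchall()
```

Had we stored only a monthly total, the per-project breakdown would be lost for good. Keeping the events lets us decide on the aggregation when the question comes up.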

Data storage

All this data needs to end up somewhere, permanently. For that, we chose a db.m4.large Amazon RDS instance. Depending on your queries and the size of your data, you might have different requirements.

Data capture

There are a number of systems that you want to capture data from: invoicing data, page views, timesheets, etc.

We started by developing a custom data pipeline based on AWS Lambda, but even though it was relatively extensible and lightweight, it was designed around streams of events such as webhooks, and did not support backfilling large datasets.

That's when we found Stitch, a sort of ETL software as a service. It fit our data volumes perfectly, as it's free for replicating up to 5 million rows monthly. So we signed up, connected all our sources (Typeform for rough project timesheets, Harvest for invoicing, GitHub for all our projects, etc.) and configured it to replicate periodically to our data warehouse.

Most of the sources we use support key-based replication, meaning that we only replicate changes incrementally. That enables us to have a full history of everything that happened while saving on storage. This becomes more and more important as we try to answer questions not only about how things look at every point in time, but also about how things change over time.
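As an illustration of the idea, and not of how Stitch implements it, key-based replication can be sketched in a few lines of Python. The Replicator class, the "updated_at" key and the in-memory list acting as the warehouse are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Replicator:
    """Copies only rows whose replication key is newer than the last bookmark."""

    key: str                      # replication key column, e.g. "updated_at" (assumed)
    bookmark: Any = None          # highest key value replicated so far
    warehouse: List[dict] = field(default_factory=list)

    def sync(self, source_rows):
        # Pick the rows that changed since the previous run.
        fresh = [
            row for row in source_rows
            if self.bookmark is None or row[self.key] > self.bookmark
        ]
        # Append instead of overwriting, so earlier versions of a row survive.
        self.warehouse.extend(dict(row) for row in fresh)
        if fresh:
            self.bookmark = max(row[self.key] for row in fresh)
        return len(fresh)
```

Because every changed version is appended rather than updated in place, the warehouse ends up holding the history of each record, which is what makes questions about change over time answerable.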

Asking questions about the data

Once everything is in our data warehouse, it is available for querying. Hooking up to your RDS instance and issuing a SQL query would be fine if you were the only person in the organization, but that is unlikely.

Here is where Metabase comes into the picture. Being an open-source solution, with very convenient one-click deployments on Heroku and AWS, trying it out was a breeze, and we've been using it ever since.

It is a platform where you can connect different data storage instances and query them. Not only that, but its focus is on making it easy to ask questions, save them and share them with your team. It also overlays a metadata schema on top of all your tables, where you can document and type your columns for easier querying and define custom segments.

Is data all we need?

It might be tempting to think that, now that your whole organization is reduced to events, the answers to all your questions are just a few queries away.

The truth is that good decisions also require understanding people's mental models, and that kind of qualitative understanding can never be replaced. No dashboard can replace 1-on-1s, and team dynamics remain a tricky matter that only humans can assess; tweaking them is still almost an art form.

By all means, the more quantitative insight you get into your organization, the better. Just don't forget that every view on a system is biased and incomplete by definition.

What to measure?

If we want to have a better picture of our organization, there are a number of things we ought to measure and track over time.

Team productivity

A common mistake in organizations is to try to measure productivity at the individual level. Gathering metrics on every individual's contribution is as pointless as capturing the state of every single cell in a body organ. To understand health, and even to predict disease, one needs to look at systems and feedback loops.

For example, sometimes to increase the team's productivity you need to reduce the output of a particular individual. If they spend more time helping others, or improving infrastructure and tooling, the multiplier effects on the team can be much greater than if they write features all day. This unintuitive notion cannot be grasped from an individual productivity point of view.

Delivery cycle time and time from customer inquiry to successful ticket closing are perfectly valid metrics to track, but always track them at the team level. There is no use in people spending resources on competition rather than collaboration, and it is extremely toxic for the culture.
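A team-level cycle time metric could be computed along these lines. This is a rough Python sketch: the ticket fields ("team", "opened_at", "closed_at") are hypothetical, and the median is just one reasonable choice of summary. Note that the grouping key is the team, never the person.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def team_cycle_times(tickets):
    """Return the median days from opening to closing, grouped by team."""
    per_team = defaultdict(list)
    for ticket in tickets:
        if ticket["closed_at"] is None:   # still open, so no cycle time yet
            continue
        opened = datetime.fromisoformat(ticket["opened_at"])
        closed = datetime.fromisoformat(ticket["closed_at"])
        per_team[ticket["team"]].append((closed - opened).total_seconds() / 86400)
    return {team: median(days) for team, days in per_team.items()}
```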

Profitability

If you are an agency, measuring the profitability of each project is relatively straightforward, as long as you have a way to roughly track time spent on a project and you have invoicing data.

Tracking time is tricky. The act of tracking it alone can change individual behavior immensely. People will routinely under-report time (even behind their team's back) if there is any suspicion that it will be equated to their individual productivity.

We solved this problem at Codegram by tracking rough time blocks with two questions a day. At lunchtime, a bot asks "What project did you mostly focus on this morning?", and before leaving the bot asks the same about the afternoon. The answers are anonymous.

The collected data is enough to extrapolate a rough estimate of hours spent on each project, while avoiding the incentive to under-report, both thanks to the vagueness of the question and to answers being anonymous.
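The extrapolation can be as simple as counting half-day blocks. The Python sketch below assumes that each block is worth about four hours and that cost is modelled as a single blended hourly rate; both numbers, and the function names, are illustrative rather than the ones we actually use.

```python
from collections import Counter

HOURS_PER_BLOCK = 4  # assumption: a morning or an afternoon is roughly four hours


def estimate_hours(answers, hours_per_block=HOURS_PER_BLOCK):
    """Turn anonymous half-day answers (project names) into rough hours per project."""
    blocks = Counter(answers)
    return {project: count * hours_per_block for project, count in blocks.items()}


def project_margin(invoiced, hours, hourly_cost):
    """Rough margin: invoiced amount minus estimated hours times a blended hourly cost."""
    return invoiced - hours * hourly_cost
```

For instance, three answers naming the same project translate into roughly twelve hours, which can then be set against what was invoiced for it. The result is an estimate, not an audit, and that is by design.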

Red flags

People committing code in the evenings or on weekends, longer delivery cycles whenever technology X is involved, clients that tend to pay late: all have one thing in common. They are a single query away. And they can be reified into dashboards for everyone to see, to keep an eye on, and even to trigger alerts.
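As an example of such a red flag, the share of commits made outside working hours could be computed like this. It is a sketch in Python that assumes timestamps are already in local time and that working hours run from 9:00 to 19:00 on weekdays; adjust both to your own context. It deliberately returns a team-wide share rather than a per-person figure.

```python
from datetime import datetime


def is_off_hours(timestamp, start_hour=9, end_hour=19):
    """True when a commit falls on a weekend or outside the assumed working hours."""
    when = datetime.fromisoformat(timestamp)
    return when.weekday() >= 5 or not (start_hour <= when.hour < end_hour)


def off_hours_share(commit_timestamps):
    """Fraction of commits made off hours, across the whole team."""
    if not commit_timestamps:
        return 0.0
    flagged = sum(1 for ts in commit_timestamps if is_off_hours(ts))
    return flagged / len(commit_timestamps)
```

A share that keeps creeping up from week to week is the kind of signal worth putting on a dashboard or wiring to an alert.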

Learning to identify red flags and act upon them on time is invaluable, especially for management.

A note on incentives and culture

Everyone in an organization is in a two-way relationship with a cultural macro-system, simultaneously shaping it (what is a norm and what is a deviation) and adjusting their own behavior to it.

In terms of control, management can set examples, incentives and rules to shape those macro-systems, which can eventually change the team dynamics entirely. This is a tricky thing. Whichever metrics or indicators are displayed, discussed, praised or dismissed will affect the culture.

As an example: if you have a leaderboard where people are ranked by number of lines of code shipped in a week, expect all sorts of unintended emerging behavior.

Towards becoming a learning organization

By making the organization's datasets available to the whole company and reducing friction to ask and recall questions, over time it becomes easier to align everyone with the organization's objectives, and thus develop a shared vision.

Over time, these questions and answers solidify into organizational learning. The company learns which goals it cares about, and how what it did in the past helped achieve them.

Becoming a learning organization is a continuous endeavor, but I feel we are on the right path. All it takes is taking a step back.
