By James Mayfield
Do you know that feeling of presenting an important data insight and realizing midway through your slide deck that your numbers don't line up with some other numbers that were shared last week? Or that feeling when you realize only after pressing send on an important email that your metrics definitions differ, and the first response will be from your boss asking why your KPI has a different value than what your colleague reported? This lack of tie-back to a single, human-readable metric definition and single code-reviewed definition is felt across data teams worldwide.
This is partly due to a data ecosystem and suite of tools that promote the easy creation of lots of new ingestion pipelines, ETL jobs, data tables, and visualizations. The modern data stack has conspired, perhaps unintentionally, to make it challenging for data teams to enforce a single set of metrics that are known and trusted across all consumption points. Analytics tools are designed to create shiny new data widgets that demonstrate value as quickly as possible to close sales deals. Simply put, it is easier to get a contract signed by building a novel data asset that shows value during a trial window. But the cost of making new things so quick to create is that establishing even a modicum of metric governance becomes a process-laden policing slog.
The burden of figuring out why metrics are misaligned falls heavily on the shoulders of analysts, who must manually debug technical problems and communicate nuances to internal data consumers. There is a human cost associated with this: technical data contributors spend precious time sorting out why definitions do not line up across dashboards, reports, spreadsheets, and presentations.
Ask any analyst who is accountable for reports that span multiple teams and they will share the same fears about metric misalignment. These misalignments have real business impact too, like when someone on the marketing team has been making business decisions on data that differ meaningfully from the finance team's data. Or when an analyst on the growth team reports user metrics that are way higher than the product team's count of new users, which can affect what gets reported to the board of directors.
The good news is that data people are a resourceful bunch, and we can help to cover the tool deficiencies that exist in the market today with some forward-looking work on alignment. In this blog post, I'll share a couple of personal stories as well as a guide with ideas on how to set your teams up for success with metrics governance.
In early 2014, I joined Airbnb as the first technical Product Manager at the company. My role was focused on data infrastructure and building tools to help support the data science team as well as all data consumers using metrics to make decisions. Riley Newman was the leader of the Data Science team, and he was just starting to put together an ambitious project plan for creating the "core data" to align the whole company.
Airbnb is a two-sided marketplace, with hosts providing the supply of listings and guests providing the demand. Everything we did was meant to help facilitate great travel experiences and to increase commerce in that two-sided marketplace. Because the marketplace was so interconnected, we needed to ensure that all employees could speak the same data language and share the same ground truth about core metrics like "how many active hosts do we have" or "how many bookings happened last month in this region".
Before starting the "core data" project, there were some concerning disconnects between teams about fundamental metrics. One illustrative example was that the product team working on our host and guest experiences did not report numbers the same way our internal finance team did. There were subtle differences in how we approached the definitions of metrics. Even something as fundamental as bookings had nuance, such as how to treat cancellations that occurred after reservation but before travel. The problem was so acute that at times, the numbers we reported to the company's leadership advisors were different from the numbers we reported to internal executives. We needed to get on the same page, and we needed to get there fast.
WHERE TO START
The data science team put their heads together and identified 20 metrics that we believed were the most critical data representation of the Airbnb marketplace. We then set out to define them in plain English, before taking them to the relevant stakeholders to build alignment and agreement upon the definitions.
Example: "an active host is someone with at least one property listing that is available to be booked by a guest for at least 3 days in the next 30 days".
Example: "a booking is considered complete at the moment that a guest reservation request is accepted by the host, regardless of whether either party cancels the reservation before the travel occurs".
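To show how a plain-English definition can map onto code, here is a minimal Python sketch of the two example definitions above. It is an illustration, not the logic Airbnb actually ran: the Listing and Reservation shapes, the field names, and the reading of "the next 30 days" as the 30 calendar days after today are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional


@dataclass
class Listing:
    # Hypothetical shape: the calendar dates on which a guest could book.
    available_dates: set = field(default_factory=set)


@dataclass
class Reservation:
    # Hypothetical fields; a real schema will differ.
    accepted_on: Optional[date] = None   # None until the host accepts
    cancelled_on: Optional[date] = None  # set if either party cancels


def is_active_host(listings, today, min_days=3, window_days=30):
    """True if any listing is bookable for at least min_days of the
    window_days calendar days after today (an assumed reading of
    "the next 30 days")."""
    window = {today + timedelta(days=i) for i in range(1, window_days + 1)}
    return any(len(l.available_dates & window) >= min_days for l in listings)


def count_bookings(reservations):
    """A booking counts from the moment the host accepts the request,
    whether or not it is cancelled afterwards."""
    return sum(1 for r in reservations if r.accepted_on is not None)
```

Writing the definition as a small function forces the ambiguous parts into the open, for example whether today counts as part of "the next 30 days", and those are the questions worth taking back to stakeholders.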
As data people, we often presume that people have the same working definitions of metrics and KPIs that we have in our heads, but doing this exercise will open many folks' eyes to the amount of confusion and misalignment happening. One of my hypotheses as to why this happens is that only a few people at an organization understand the details of the data ingestion pipeline and ETL logic. And those folks have very weak tooling for articulating what’s happening in a logical way without diving deep into hundreds of lines of SQL, let alone for showing metadata about metrics in plain English. There is a clear gap in the modern data stack around metrics-based tools that start with governance, and then work toward data viz dashboards.
For our “core data” project, we started by writing definitions in simple human-readable text. Once these definitions were written, it was relatively easy to take the definitions to various stakeholders on data, eng, product, and finance to get their sign-off that this logic is the right way to count things. My strong suggestion is that the data analysts and data scientists assert their best-understood metric definitions with buy-in from executives and existing board members. As data people it's our responsibility to make sure that everyone understands the role they have when it comes to those previously defined metrics and the power and impact they have in every future decision the company makes.
When there is confusion about a metric definition, most of the time it can be handled with a quick meeting or email thread to suss out the source of misalignment. At times there is good reason to create multiple metrics that differ from others in small, but meaningful ways. For example, an operations team that runs a warehouse may need to count actual products on the warehouse floor, whereas a finance team doing amortization accounting may need to count all units stored in the warehouse by the end of quarter for tax filings. When this happens, the right solution is to create two different metrics with distinct titles like "warehouse stock count" and "finance stock count" and clarify how they are differentiated in your metric catalog.
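As a rough sketch of that idea, the catalog below refuses to register two metrics under the same name, so the two stock counts have to live side by side with distinct titles and descriptions. The Unit record, its on_floor flag, and the MetricCatalog class are hypothetical names invented for this example, and the two computations are one assumed reading of the difference between the operations and finance counts.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Unit:
    # Hypothetical record for one unit stored in the warehouse.
    sku: str
    on_floor: bool  # physically present on the warehouse floor


@dataclass(frozen=True)
class Metric:
    name: str
    description: str
    compute: Callable[[list], int]


class MetricCatalog:
    """Holds metric definitions and enforces unique, distinct titles."""

    def __init__(self):
        self._metrics = {}

    def register(self, metric):
        if metric.name in self._metrics:
            raise ValueError(f"metric name already taken: {metric.name!r}")
        self._metrics[metric.name] = metric

    def get(self, name):
        return self._metrics[name]


catalog = MetricCatalog()
catalog.register(Metric(
    "warehouse stock count",
    "Units physically on the warehouse floor (operations view).",
    lambda units: sum(1 for u in units if u.on_floor),
))
catalog.register(Metric(
    "finance stock count",
    "All units stored in the warehouse at quarter end (finance view).",
    lambda units: len(units),
))
```

The description field is where the catalog explains how the two metrics differ, so a reader who sees two different numbers can tell they are answering two different questions.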
If two definitions are in fundamental disagreement, it is worthwhile to set a meeting with the appropriate stakeholders and escalate the issue to leaders in finance and product. It is critical that the numbers reported by the finance and product teams align, so that the metrics a company uses to judge its progress are unambiguous.
HOSTING A KPI DEVELOPMENT DAY
At Airbnb, after alignment was reached on human-readable metric definitions, these critical metric definitions were put into a long Google Doc (which is far from the ideal solution, but was the best tool we had at our disposal). We made sure to put the names of all consulted stakeholders next to the metric definitions themselves to demonstrate proper sign-off and approval.
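If a shared doc is all you have, the same information can still be kept in a structured form. The sketch below is one hypothetical way to record a definition together with its sign-offs and to spot which teams have not yet approved it. The MetricDefinition class and the REQUIRED_TEAMS list are assumptions for the example (the teams mirror the data, eng, product, and finance stakeholders mentioned earlier), and the approver names in any usage are placeholders.

```python
from dataclasses import dataclass, field

# Assumed list of teams whose approval is required for every core metric.
REQUIRED_TEAMS = ("data", "eng", "product", "finance")


@dataclass
class MetricDefinition:
    name: str
    definition: str                                # plain-English text
    sign_offs: dict = field(default_factory=dict)  # team -> approver name

    def missing_sign_offs(self):
        """Teams that still need to approve this definition."""
        return [team for team in REQUIRED_TEAMS if team not in self.sign_offs]

    def render(self):
        """Plain-text entry suitable for pasting into a shared doc."""
        approvers = ", ".join(
            f"{person} ({team})" for team, person in sorted(self.sign_offs.items())
        ) or "none yet"
        return f"{self.name}: {self.definition}\nSigned off by: {approvers}"
```

Keeping the approver names next to the definition, as in the doc described above, makes it obvious at a glance which definitions are agreed and which are still drafts.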
Then we set about creating the "core data" repo, where we would codify all these definitions with corresponding data pipeline scripts. First came identifying existing tables or building new tables to power these metrics. Second came making sure the exact SQL calculation for each metric was clear. Better still was including examples of how to correctly join these metrics to dimensional slices, though this got complicated very quickly.
A KPI development day is a way to do this codifying work as a group. A workable agenda looks like this:
• Get everyone together for the day, in person or virtually (buy a catered lunch, or give folks an Uber Eats code to order something if remote)
• Assign small groups of people to review the human-readable definitions and find places where the ingestion, pipeline, or SQL code differs from the agreed-upon human-readable definition
• Write new pipelines where necessary, annotating both new and existing pipelines to link to the glossary of human-readable definitions
• In places where new tables are needed, create a new staging/dev table and backfill correct numbers
• Go through the most popular data dashboards that report on these metrics and add new widgets to show the new numbers if there is a change
• Talk to everyone generating reports, asking them to show the new numbers alongside the old numbers for 2-3 weeks
• Set a clear deprecation date after which the old numbers will no longer be shown and the new definitions are the only correct ones
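The last few steps, showing new numbers next to old ones until a deprecation date, can be sketched in a few lines of Python. This is a hedged illustration only: the function names, the dict-of-numbers input, and the tolerance parameter are all hypothetical, and a real report would read its values from the warehouse rather than from literals.

```python
from datetime import date


def side_by_side(old, new, tolerance=0.0):
    """Pair old and new metric values and label each row.

    old and new map metric name -> value. Returns a list of
    (name, old_value, new_value, status) tuples sorted by name.
    """
    rows = []
    for name in sorted(set(old) | set(new)):
        o, n = old.get(name), new.get(name)
        if o is None or n is None:
            status = "missing"   # present in only one of the two reports
        elif abs(o - n) <= tolerance:
            status = "match"
        else:
            status = "differs"
        rows.append((name, o, n, status))
    return rows


def show_old_numbers(today, deprecation_date):
    """Old numbers stay visible only until the deprecation date."""
    return today < deprecation_date
```

Rows labeled "differs" are the ones to walk through with report owners during the overlap period, so nobody is surprised when the old numbers disappear.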
When you walk out of your KPI development day with a metrics catalog, people aligned on the 20 most important metrics at the company, and all the reporting consistent, the data team will feel a huge amount of relief. Similar structure, formatting, and common definitions ensure a clean API, supporting a data mesh approach, so that dashboards, reports, spreadsheets, and visualizations can be sourced consistently by everyone.
A KPI day is about getting the right set of people together in a room, having them review each other's code and metric queries, and defining the ground truth once and for all. More metric-centric tools should help with this cataloging of definitions, metadata, annotations, and more, and such tools are becoming more popular these days.
The next challenge to tackle is figuring out the right join paths so these metrics can be safely sliced by dimensions in a consistent way. That is a much more complicated issue, because it involves mapping a semantic layer on top of your data warehouse. The topic is a bit outside the scope of this post, but it is something I always love chatting about.
Reach out to us on social media with any questions about how to host your very own KPI development day.