Editor's note: This interview mentions the Transform metrics framework. On April 6, 2022 we open sourced our metrics framework, now called MetricFlow.
Please click on this link to review the one-hour audio directly on the "Data Engineering Podcast" website. To improve readability, we have broken the hour-long conversation into smaller chapters. A big thanks to the Data Engineering Podcast team for this very engaging conversation with Nick Handel, our CEO and Co-Founder.
Table of Contents:
- Section 1: Introductions
- Section 2: Experiences with Airbnb’s early metric store
- Section 3: Defining ‘a metric’
- Section 4: Defining ‘a metric store’
- Section 5: Impact of metrics inconsistencies on the business
- Section 6: Who would benefit from a metrics store?
- Section 7: Transform’s platform architecture
- Section 8: Defining Metrics in Code
- Section 9: The four stages of metrics governance
- Section 10: Differences between ‘a feature store’ and ‘a metrics store’
- Section 11: Breaking down ‘denormalization’ and ‘normalization’
- Section 12: Explaining interfaces through Transform’s API: MQL (Metrics Query Language)
- Section 13: Missed opportunity in the modern data stack
- Section 14: Why is it critical to build institutional knowledge around metrics?
- Section 15: Cross-functional collaboration with governed metrics
- Section 16: Best practices from our early adopters
- Section 17: Limitations of the summary table approach
- Section 18: When Transform makes sense and when it does not
---------------------------------------------------------------------------------------------
Section 1: Introductions
Tobias Macey: Your host is Tobias Macey, and today I'm interviewing Nick Handel about Transform, a platform providing a dedicated metrics layer for your data stack. Nick, can you start by introducing yourself?
Nick Handel: Yes, thanks for having me, Tobias. I'm a big fan of the show. I am the co-founder and CEO of Transform.
Tobias: Do you remember how you first got involved in data management?
Nick: Yes. For me, I originally studied math and then joined BlackRock out of college. I was working on a bunch of different technologies that I think now would be considered legacy tooling, but I learned a lot about how BlackRock was using various macroeconomic datasets to build models and do analysis on some of their portfolios. From there, I progressed toward wanting to work with, I would say, more modern tooling, so I started exploring different opportunities and moved over to Airbnb in 2014, originally as a data scientist.
This was a golden era of Airbnb's data team. There was a bunch of investment in tooling, like Airflow and then Superset, the experimentation platform, their knowledge repo, just a bunch of great tools.
Section 2: Experiences with Airbnb’s early metric store
Tobias: My understanding is that your work at Airbnb, and experiencing the work that they were doing with their metrics layer was some of the inspiration for what you're building at Transform now. I'm wondering if you can just give a bit of the backstory of how you ended up where you are now, and what you're building at Transform.
Nick: I actually joined a few weeks before Airbnb released the very first version of its metrics store. It was called Metrics Repo, and it actually lived within the experimentation tool that the company was building. Everybody was going through this shift from being a very design-led company to being both a design- and data-led company. As a part of that, Airbnb was really investing in tooling around product experimentation. I had joined the growth team, and the primary job I had as a data scientist was to run experiments.
When I first joined, I was really just a bottleneck to the product team that I was on, because it took me so long to run analysis on each of these individual experiments. Then this tool came around that basically made it easy to define the various metrics that I wanted to use in my experiment analysis, and it built out the pipelines to serve those metrics to an experimentation readout. I very quickly went from running an individual experiment a week to running tens of experiments at the same time, and I actually got to dive a lot deeper into the interesting parts of them, because all of the metrics and all of the basic analysis, the stats testing and whatnot, were served to me in this nice clean readout.
Over time, Airbnb invested more and more in that framework. Originally, it really served the use case of experimentation, but data scientists started to see that there were different applications. My very naive approach was to start running fake experiments, and generating metrics out of this automated data pipelining tool, to then pull into analysis. Then later on, it evolved into the tool that is now Minerva, which Airbnb talks a lot about.
Section 3: Defining ‘a metric’
Tobias: In the context of analytics and data platforms, I'm wondering if you can just share your definition of what a metric actually is, and some of the ways that they manifest throughout the data life cycle?
Nick: Yes. A metric is a bit of an abstract concept. To make it a bit more concrete I might dive into a specific example from Airbnb. One of our key metrics was nights booked. It was the North Star metric for the whole company. Every team tracked it, every experiment run at the company was either trying to impact it, or make sure that it didn't impact it and get something else done. That metric actually makes sense in a bunch of different contexts. It makes sense as how many nights booked were there by country, by listing type, by Superhost status, by whether it was a tree house or not. These things are called dimensions, and they bring context to numerical data.
Being able to aggregate that metric to many different dimensions is really powerful. There's a clear relation here to OLAP cubes, and data engineers, and today, more and more analytics engineers, are responsible for building these nice clean interfaces into the data warehouse for broader business consumption. By capturing these definitions for metrics in a somewhat abstract way, and then being able to flexibly build them to various different dimensional levels, we can serve these nice clean data sets to the company that then allow less technical users to consume them.
We've seen a bunch of different solutions here around summary tables, or just queries existing in a bunch of different downstream tools, from BI tools, to really a wide range of different places where people want to consume metrics. The point of this definition of a metric in our framework is to then be able to both build those datasets in the warehouse, and also build them in downstream tools consistently.
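Editor's note: To make the "metric by dimension" idea concrete, here is a minimal SQL sketch of the kind of query a metrics framework generates for each dimensional cut. The table and column names are hypothetical, chosen only for illustration.

```sql
-- Hypothetical fact table: one row per booking, with dimension columns.
-- Without a metrics layer, each dimensional cut of the same metric is
-- another hand-written query like these.
SELECT
  country,                       -- the dimension that brings context
  SUM(nights) AS nights_booked   -- the metric
FROM fct_bookings
GROUP BY country;

-- Same metric, different dimension: listing type (or Superhost status,
-- tree house or not, and so on).
SELECT
  listing_type,
  SUM(nights) AS nights_booked
FROM fct_bookings
GROUP BY listing_type;
```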
Tobias: One of the other pieces of terminology that I've encountered, and that is reminiscent of what we're discussing here with the idea of metrics, is the concept of master data management, where you have this one golden table: if you need to query against, say, nights booked, to use your example, then you query this table, because we did the calculation ahead of time for you.
I'm wondering if you can just draw some parallels between some of the ways that master data management has been done historically, and some of the challenges that it poses, and what you are working towards with Transform to enable this more flexible category of metrics that can be calculated at query time?
Nick: Yes. There's this really interesting history of semantic layers in general. There are a wide range of takes historically, whether they existed inside of business intelligence tools, or they existed within data warehousing type solutions. The point of this tool is really to pull that out, and separate it from the various pieces of infrastructure that are either storing or applying compute to data, and then all of the different places where people want to consume metrics.
Section 4: Defining ‘a metric store’
Tobias: As you said, there've been a few different generational shifts with the idea of the metric store being the most recent one, and one that's been gaining a lot of attention, at least in the past few months that I've been seeing it popping up. I'm wondering if you can just talk through some of the ways that those different semantic layers have been managed, and some of the challenges and complexities that teams face when trying to create, and manage the context, and the semantic meaning around data, and what you see as driving the shift towards this dedicated metrics layer.
Nick: Yes. It might help to back up and define what a metric store is, and then dive into the various takes. I see a metric store as really four pieces. The first is the semantics for how you capture the information. It seems relatively simple: various tables in the data warehouse with connections or relationships to each other. Actually, it's probably one of the most important pieces, and it's something that Airbnb iterated on for years. It's also quite hard to change once you start capturing information, so moving between different ways of capturing the semantic information is a challenging evolution.
The second piece is really around performance, and that gets at the question I think you're asking about static versus dynamic. Static: are you building the datasets in the data warehouse ahead of time, to some kind of location that can serve them really quickly? Dynamic: are you asking on the fly for some denormalized metric dataset to get constructed? The next two are really about how you expose that data to the rest of the company. The third piece is governance: how do you apply lifecycle management, and how are you managing the definitions of these metrics? The last one is interfaces: how are you exposing these metrics to all the different places where they're getting consumed?
When I look across the various tools that exist, I think the techniques they're applying can largely be bucketed into those four categories, with varying levels of investment in each. There are quite a number of tools out there that solve problems in each of those spaces, but the metrics store, in my mind, is a holistic solution to all four pieces, and it allows for a connection between them: all the way from how I'm consuming data off of the data warehouse, to how I'm making sure it's right and getting it into the various tools where it needs to be consumed.
Section 5: Impact of metrics inconsistencies on the business
Tobias: As you're saying, historically, there have been a few different approaches to solving different pieces of the problem where a lot of it will live maybe in the business intelligence tool, where there's a way to add context to a particular calculation, but then if you need to be able to use that same calculation in a Spark job, for instance, then there's no clean way to be able to access that because it doesn't live in a place that Spark can easily get to without reaching into the metadata database for the business intelligence tool.
I'm wondering, what are some of the potential negative impacts of having slight differences or inconsistencies in how these metrics are calculated and maintained, and differences in lifecycle that can come about. If you think that you have replicated a metric view accurately in two different places, but then later find out that maybe you flipped an operation or changed an order of operations somewhere, and all of a sudden, you're wildly divergent?
Nick: Yes, exactly. The challenge here is: in this process of denormalization, once you have these nice, clean, normalized models sitting in your data warehouse, how are you then going and consistently building the datasets that you want to consume in all the different places where you want to consume them? There are a lot of different negative consequences, but I think it all boils down to lost trust in data and a lack of productivity among data consumers. Can they easily access the metric they're trying to consume, and do they trust others when they say they have some insight?
When I joined Airbnb, there were three definitions of the company's North Star metric, bookings. The big challenge was that different teams would come to a meeting and say they saw this thing happening in the business, and then there would be some disagreement. Ultimately, it would boil down to two data analysts staying after that meeting and hashing out specific nuances of the SQL that they had written. It was an incredibly inefficient process, but worse, it led to the higher-ups coming to these meetings and saying, "I'm just going to use intuition here, let these data analysts figure this out, and next time we'll come back and look at the data."
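Editor's note: As a hedged illustration of how definitions drift, here are two hypothetical versions of a "bookings" metric, of the sort two analysts might each defend after such a meeting. The tables, columns, and filters are invented for the example.

```sql
-- Analyst A: counts every booking row for the day.
SELECT COUNT(*) AS bookings
FROM fct_bookings
WHERE ds = '2021-06-01';

-- Analyst B: dedupes by reservation and excludes cancellations,
-- a subtly different definition of the "same" metric that will
-- diverge from Analyst A's number.
SELECT COUNT(DISTINCT reservation_id) AS bookings
FROM fct_bookings
WHERE ds = '2021-06-01'
  AND NOT is_cancelled;
```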
Section 6: Who would benefit from a metrics store?
Tobias: In terms of what you're building at Transform, what are the primary goals that you have for the platform and the target users that you have in mind as you're building out the overall system, and the user experience design, and the integration points?
Nick: The company's mission is to make data accessible. The philosophy behind how we're going to do that is that there need to be better interfaces for data producers and data consumers, broadly bucketed, to communicate with each other. Our hypothesis is that a metric is a really great interface, because in some ways it's the language that non-technical users use to communicate around data.
This all starts with establishing a definition in our metrics framework, and then exposing that broadly to be both computed, but also to share that definition and that metadata with a wide range of tools. That's where our APIs and our metrics catalog come in. Then on top of that, there are a bunch of different ideas for how we can use those metric data sets to do interesting things. There are ideas around forecasting and anomaly detection, and applying annotations to metrics, and building data sets for experimentation, and really just pushing metrics into the various places where people can then make use of them.
The two users of the tool in my mind are these broad buckets of data producers and consumers. I think to get a little bit more granular on data producers, it's some combination of data engineers, analytics engineers, data analysts. They're the people who build the normalized data sets in the warehouse, and have a hypothesis around how they should be consumed by the broader company.
On the consumer side, probably about 97% of most companies are not data workers; they're not data analysts or data engineers. Really, I think these metrics should be consumable much more broadly. That means building nice interfaces that allow those users to consume these data sets, or to pull them into the interfaces that they already know and like. In order to accomplish that, the metrics framework, which is really aimed at the data producer, is built around YAML and SQL. Definitions are contributed to Git for version control, and then published through our catalog, which is the first demonstration of the power of some of our APIs.
Hopefully, that catalog makes it easy for this data consumer to then ask basic questions, "Show me this metric sliced by this dimension." Then beyond that, there are a bunch of different interfaces that data producers also want to expose their data sets in. We publish a number of different APIs that can then connect to anything from business intelligence tools, and Jupyter notebooks, to GraphQL and React, which allow front end developers to build on top of Transform.
Section 7: Transform’s platform architecture
Tobias: Can you dig a bit deeper into the way that the platform is architected and some of the system design considerations that you had to deal with as you were building out the initial versions of the platform, and some of the ways that it has grown and evolved since those initial prototypes?
Nick: The core of this platform is really the semantic layer, where the data producer defines these YAML files. The YAML files have some amount of SQL expressions in them, but really the most important part is the abstractions that we've chosen for capturing this information, and what those abstractions enable. Those files then get parsed by the semantic layer, and we have a server which then basically builds SQL against the customer's underlying data warehouse.
Everything that we do is built on top of the customer's data warehouse; we use their existing storage and compute, and because of that we can do two kinds of deployments. One is where we're actually deploying on their virtual premise: they connect their data warehouse to Transform, it all stays in their ecosystem, and they get all of the security guarantees that they want.
The other option is a hosted version where we're basically just issuing SQL to their data warehouse, and not actually passing any data back to our ecosystem. As for the specifics of what's built out: the metrics framework that we use is written in Python. The front end is TypeScript, with GraphQL and React. The APIs are written around a GraphQL core, but really there are any number of interfaces that we can build on top of that, in whatever language they're being consumed from.
Our command line interface and our Python client are both built in Python. Our JDBC interface is built in Java. Our front end is built with React components on top of the same GraphQL interface that we expose to our customers, so they can build on top of those APIs too.
Section 8: Defining Metrics in Code
Tobias: In terms of the actual workflow of building a set of metrics and then consuming it downstream, what's involved in actually defining a metric, populating that into Transform, validating that in terms of any sort of organizational discussion that needs to happen around that, and then being able to consume that from a downstream system, whether that's business intelligence or a Jupyter notebook, or a Spark pipeline for instance?
Nick: The actual definition workflow is typically done locally, and we have a command-line interface that makes it easy to iterate on these config files, test them, run variations of metrics that already exist, or define new metrics. Then it follows the standard code-commit practices that the company is using. The files get contributed to Git; once merged, they go to our MQL server and get parsed into the current active semantic layer, and any API requests coming in are made against that current semantic layer. That means our front end is building on top of the current definitions.
Another really cool thing about this is that if a metric definition changes in that semantic layer, then all of the different places where the company is referencing that metric, whether through our JDBC over SQL, through some notebook, or really any of the interfaces they're consuming it through, would stay consistent, because they're getting the current definition of that metric. The nice thing is that we're building on top of the same interface that we expose to our customers, which means that once a metric is defined in this framework, it should be consistent across all of the different places where it's consumed.
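Editor's note: For a sense of what these config files look like, here is a minimal sketch in the spirit of the YAML-plus-SQL format described above (and since open sourced as MetricFlow). The field names are illustrative assumptions, not the exact spec.

```yaml
# Illustrative only; field names are assumptions, not the exact
# Transform/MetricFlow schema. A data source maps a warehouse table to
# measures and dimensions; a metric is then defined on top of a measure.
data_source:
  name: bookings
  sql_table: analytics.fct_bookings
  measures:
    - name: nights
      agg: sum
  dimensions:
    - name: country
      type: categorical
    - name: listing_type
      type: categorical

metric:
  name: nights_booked
  type: measure_proxy   # expose the aggregated measure directly as a metric
  measure: nights
```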
Tobias: As far as the integration with the customer's data systems and data platform, you mentioned that Transform sits on top of the data warehouse layer. I'm wondering what types of validation and introspection you need to do to provide useful feedback to the engineers who are building the metrics definitions as they iterate on a definition and create the code representation that they're then going to commit and populate into Transform?
Nick: Yes. The core of this dev workflow is to basically be able to run the semantic layer against whatever set of configs you're using. The objective here is to really be able to iterate off of the current version of this kind of semantic mapping of the data warehouse, and then to be able to use those configs in the same way that you would use the configs that are currently in production. It effectively gives the end-user the same experience as if they were just querying the production MQL server.
Tobias: Because of the fact that you are targeting the data warehouse, I'm wondering if there are any challenges in being able to extend this layer, or if it even makes sense to try and extend this layer to account for more semi-structured or unstructured data storage locations, or if it's purely something that only really makes sense on a data warehouse that already has some measure of structure applied to it?
Nick: Yes. Right now, we're really focused on the data analytics use case, and because of that we're primarily building on top of the data warehouse as it exists and the structured data sets that are already there. I think that that probably satisfies the large majority of applications for metrics. I think it would be probably good to understand what kinds of metrics are getting built off of unstructured or semi-structured data sets, to really be able to answer that question.
Section 9: The four stages of metrics governance
Tobias: In terms of the actual life cycle of a metrics definition, I'm wondering what are some of the interesting stages that it progresses through from when it's first instantiated and somebody determines that they need to create this calculation, through to many years down the road where the business shifts, and maybe the underlying meaning of the metrics change, or do you need to be able to incorporate additional factors into how the metric is calculated or what the overall value should be?
Nick: This is a really interesting and important evolution of the framework that we saw at Airbnb. For the first two years of this framework, there was really very little governance. Aside from the fact that it was being committed to Git, there was very little oversight of what these metrics were, who was consuming them, how they were consuming them, and which ones were old. There was a big push to think through the stages of a metric's life cycle.
I think there's been a lot of iteration, and Airbnb published some great blog posts about this, but we have our own definition: there are four stages. It starts with definition. I have an idea of how I want to measure something. How do I define this? Is it different from the other metrics that exist in this framework? How do I compare it to existing metrics? How do I test it? Who do I want to consume this, and how do I want them to consume it?
That leads into the second stage, consumption. If I want to consume this, am I using it right? Does it mean what I think it means? Is it up to date? Is it still accurate? Is the data good? Generally, am I able to pull it into the tools that I want to consume it from?
The third stage is iteration. I think that this metric needs to change. Who needs to know? Why is it changing? How is it different from before? What has actually changed about this metric? How do I compare it to the old version? How do I then, in the UI or in these APIs, generate the old version if, for some reason, I still need to?
The final stage is archival. If this metric is old, how can I stop others from consuming it? Where does it go? Do I still want to maybe calculate it at some point in the future, but I want to make sure that nobody else is calculating it? How do I retain the knowledge that's been built around this? I don't want people necessarily to consume this, but I still probably learned some valuable things around this metric over time, and we used it to make decisions. There's some lasting institutional knowledge that's been created that needs to be tracked over time.
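Editor's note: One way to picture these lifecycle stages landing in the definition itself is as metadata on the metric. The fields below are hypothetical, sketched for illustration rather than taken from the framework's actual schema.

```yaml
# Hypothetical lifecycle metadata on a metric definition; these fields
# are illustrative assumptions, not the framework's actual schema.
metric:
  name: bookings_v1
  owner: growth-data-team    # who to notify when the definition changes
  tier: deprecated           # archival: steer consumers to bookings_v2
  description: >
    Superseded by bookings_v2, which excludes cancellations. Kept so old
    analyses can still be reproduced and the institutional knowledge
    attached to this metric is retained.
```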
Tobias: There are a few interesting points from that that are largely based on the organizational aspects of the metric, particularly in terms of who needs to know about this metric changing, and who needs to be brought in to help with the definition of the metric, or to validate that the way I'm calculating it is accurate. I'm wondering if you can talk through some of the collaboration aspects of what you're building with Transform, and how you think about enabling these organizational workflows beyond just the technical implementation.
Nick: In our minds, the biggest challenge around helping an organization to define these metrics is really creating that interface between the data consumer and the data producer. I said this previously, but we really do believe that the metric is the ideal interface because it is currently the language that data consumers around a company are using to describe data and to understand it. By enabling the data producer to then go out and define these metrics, and follow some process, at the very least, it establishes a standard for where it's located and how it connects to these various systems. That enables an organization, I think, to build some of their own process around metric definition.
Hopefully, on the other side of this, there is a product that can then support the process that they're trying to build. I think that is probably one of the biggest things that we will be working on in the future, as we continue to expand our customer base, is just understanding all of the differences between how these organizations are consuming metrics, and what that means for the actual process that they want to follow, to make sure that those metrics are agreed upon and trusted across the organization.
Section 10: Differences between ‘a feature store’ and ‘a metrics store’
Tobias: Your point about the metrics layer being the interface between data producers and data consumers puts me in mind of the feature store, which is another layer that's been gaining a lot of ground recently that acts as that same interface point with the difference being that that's primarily for the machine learning workflow versus the analytics workflow that the metrics store empowers.
I'm wondering if you have any thoughts on the juxtaposition of the metric store versus the feature store, and the relative utility of metrics versus features, and maybe some of the overlap that might exist where you might want to have some level of communication between your metrics and your feature stores, and how those different calculations are defined and performed.
Nick: That's certainly a great question, and something that I glossed over in my background is that for a while I was working as a product manager at Airbnb. The team I was working on was building out Airbnb's feature store, Zipline. At the core, I think these two things are very similar, but there are some really significant differences that I think put them a long way off from being the same piece of infrastructure. At the core, what both are doing is creating derived data and then serving that derived data to a specific application.
The really hard part around the feature store is that there are much stricter requirements around the way a feature is defined, and it tends to be a lot more granular. That means it doesn't serve the analytical application nearly as well, where you want to be able to slice and dice and ask different questions. There are some other complicated differences around timeliness: feature stores require some melding of real-time and batch data construction. Machine learning models also tend to require something called point-in-time correctness, or time travel, and it's a complicated subject, but it's also something that is fairly different between analysis and feature construction.
The last really big difference between the two is consumption and reuse. There are really strong forces within organizations that push metric consumption to be consistent.
At the core, a metric is really just a way of compressing the information that comes from collecting a bunch of data into something that's useful for decision-making or analysis.
What that means is that broadly, you have companies that are trying to push for a consistent definition across teams, across individual data analysts. It just makes the world simpler if everything is clean and consistent. That's a really big difference compared to features because features can perform better in certain models, and sometimes you want many different iterations of the same features.
The way that I saw feature stores being adopted was primarily taking a feature, iterating on it, and ending up with another variant of that feature. That's not really something you do with a metric; if you do that kind of analysis, it's through some dimension, and it's not actually changing the core definition of the metric. You're just aggregating it to some different granularity.
Section 11: Breaking down ‘denormalization’ and ‘normalization’
Tobias: In terms of the granularity and dimensionality of the metric, I'm wondering if you can dig a bit deeper into some of the complexities that come up, and some of the ways that somebody who's trying to build a metric definition can shoot themselves in the foot when they're trying to figure out, "How do I calculate this metric?", and then actually explore it at different levels of granularity and dimensionality, and just some of the technical and cognitive complexity that arises from that.
Nick: When I think about the most complicated part of this tool, the most complicated technical challenge, I really think it's denormalization. To back up and quickly define normalization and then denormalization: normalization is about reducing data redundancy and improving data integrity. The goal is to define nice clean data sets that don't replicate data around the warehouse, because then they're much easier to manage.
There are a bunch of great tools that have come out more recently that have enabled companies to build cleaner, normalized datasets. There's been a ton of research in this space and a ton of discussion of different techniques of normalization, like Kimball, Inmon, et cetera. When I think about what you do with the data from there: it's really great that the data is clean, but then you need to go and make it useful. In order to make it useful, you need to start merging data sets, applying filters, and doing all of the different things that happen in SQL or in Python to transform data.
That's really where this framework is aimed at supporting our end users technically. The input into our framework is typically these nice, clean, normalized data sets. You can put in raw data sets and partially denormalized data sets, but you get a lot more out of the framework if you've done the work of building nice, clean, normalized datasets. From there, denormalization is happening across so many different tools today. It's happening in the data warehouse, where we're building summary tables; it's happening in the BI tool, where we're asking some question; it's happening in dashboards, where we've asked a bunch of specific questions. What we really want to be able to do is build those metrics to a wide variety of granularities, consistently, across all of those tools. One of the biggest challenges there is what you do ahead of time versus what you do on the fly. Ideally, you want those data sets to be really snappy. You want your BI dashboards to load quickly, but the more you've baked into your tables in the data warehouse, the fewer questions you can ask.
The power of these data modeling frameworks is that they enable you to ask a wide variety of questions while also consuming those datasets in all of the different places where you want to consume them, and hopefully it's making them much faster. The kind of core technical challenge of our framework then is enabling denormalization to happen in all of these different places, efficiently and consistently.
In order to solve for that, we've worked on a bunch of different approaches to caching data sets, and trying to make that end result, whatever the question is, whether it's something that the end user has pre-specified as a question that they ask frequently, or if it's a question that is new, trying to make that dataset as fast as possible.
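Editor's note: A hedged sketch of the denormalization being described: joining normalized models on the fly to answer a metric-by-dimension question, instead of maintaining a pre-built summary table for every combination. The table and column names are hypothetical.

```sql
-- Denormalization on the fly: join normalized models to answer
-- "nights_booked by superhost_status" without a dedicated summary table.
SELECT
  h.superhost_status,
  SUM(b.nights) AS nights_booked
FROM fct_bookings AS b
JOIN dim_listings AS l ON b.listing_id = l.listing_id
JOIN dim_hosts    AS h ON l.host_id = h.host_id
GROUP BY h.superhost_status;
```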
Section 12: Explaining interfaces through Transform’s API: MQL (Metrics Query Language)
Tobias: Then as far as the actual platform integration, as far as the data source, that's fairly obvious that you connect up to the customer's data warehouse and use either ODBC or JDBC for that connection. Then on the other side, you mentioned that you have these JDBC interfaces or you have GraphQL APIs, but for somebody who maybe connects it up to their business intelligence dashboard and then wants to run a query that uses data from their data warehouse and also factors in this metric, is that something where they would just pass everything through the Transform layer and then you would pull in your metrics definition and then also push down a query into the data warehouse and then join those two on their return flight back?
What's the story of being able to query against the existing database tables and the calculated metrics?
Nick: We basically have an API, and we call it MQL, metrics query language. It allows the end user to ask questions in the format of metric by dimension: you're asking for some metric aggregated to some dimension. You can also apply filters, ordering, and whatnot, but that API request can basically be expressed within a SQL query. I can say FROM MQL, metric by dimensions, and that will return to me a data set that comes in as the metric and the various dimensions that I've aggregated that metric to.
I can then express that API request within some broader SQL query where I'm using the full power of the customer's underlying data warehouse. Really, what this is doing is building a denormalized data set on the fly, then querying that data set, joining it, and applying aggregations or transformations in whatever SQL the end user has expressed.
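Editor's note: Paraphrasing the "FROM MQL, metric by dimensions" pattern Nick describes, a request might look roughly like the sketch below. This is illustrative pseudo-syntax, not the exact MQL grammar, and the metric and dimension names are invented.

```sql
-- Illustrative pseudo-syntax: the MQL request builds the denormalized
-- metric dataset on the fly, and the surrounding SQL runs with the full
-- power of the customer's own warehouse.
SELECT
  country,
  nights_booked / NULLIF(active_listings, 0) AS nights_per_listing
FROM MQL(nights_booked, active_listings BY country)
WHERE country <> 'unknown'
ORDER BY nights_per_listing DESC;
```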
Tobias: In some ways, it's kind of the inverse of a stored procedure or user defined function, in that instead of you pushing a function definition into the database, you're pushing the database into the function definition.
Nick: Yes, that's exactly right.
Tobias: You mentioned that the interface for the data producers is this code-first, combined YAML-and-SQL format. I'm wondering what your process was for deciding to go with a code-first, code-native approach versus more of a low-code or no-code UI-driven framework for somebody coming from the business side who wants to define these dimensions, and what you see as the trade-offs of this text-based flat-file definition versus a more UI-driven approach?
Nick: I think there's probably a future where those files get pushed into a UI or an IDE kind of experience. We just wanted to start with an interface that gave us the maximum flexibility and ability to iterate. In the early days, when we thought about it, we asked: what are the tools that our end users are using right now? Well, SQL and YAML are pretty widely adopted in the data engineering and analytics engineering world, and we wanted to meet them where they were.
Section 13: Missed opportunity in the modern data stack
Tobias: Another interesting element of this emerging space is how much support there is in downstream tools, thinking particularly of business intelligence dashboards and other analytics frameworks, for being able to introspect and understand the additional context that can be defined and exposed by the metrics layer: a prose definition of what this metric is for and how you might want to use it, some of the metadata about who owns it and who created it, that kind of thing.
What are some of the missing pieces of the overall data ecosystem that you hope to see filled in the coming months and years as the metrics layer becomes a more established architectural quantum?
Nick: The challenge here for us is that the entire data ecosystem is really built around tables today. It's not necessarily a significant challenge, but it is a missed opportunity. We can build tables off of our API requests, and by exposing this JDBC, we can build data sets that make sense and share the metadata that we want over whatever connection is coming in, but really what's missing here is that you're not necessarily getting that rich experience that you get when you connect to an underlying database, where you can browse the various tables, and you can look at all of the different columns, and get some summary information around it.
Ultimately, I think that it's not necessarily a challenge for our end users to get that information because we can expose it as tables to them. If they want to look at a metric and look at the various dimensions that they could aggregate that metric to, we can share that with them, but it's coming in the form of a table, obviously to conform to the world as it exists today. It's more about a missed opportunity to share that information, and the interesting information that can come with a semantic layer.
Section 14: Why is it critical to build institutional knowledge around metrics?
Tobias: In terms of having this semantic layer and this more holistically defined and uniformly exposed method of creating and managing these metrics, what are some of the capabilities, or projects, or organizational capacities that are unlocked by adding this to the data platform that are either impractical or intractable otherwise?
Nick: Just to start with the core value proposition: consistent consumption of metrics in various tools. It sounds obvious, but it really just doesn't exist at the majority of companies that we've talked to. It seems like one of the most universal challenges in the data stack right now. Then, looking out to the future, I think there are a number of different applications that are enabled if you have this information.
Just thinking about the first one that really got me hooked on this type of tooling, product experimentation: when I was at Airbnb, I ran 150 experiments in something like two years, and I was looking at 100+ metrics on every single one of those. That is just not possible today; people don't have that tooling broadly. This is one of the core things that this enables. Beyond that, I have a lot of ideas for our product around the connection between forecasting, anomaly detection, annotations, and then notifications and context that can be pushed out to a company more broadly.
A forecast is where you think the metric is going to go; an anomaly is when it lands outside of wherever you thought it was going to go; and an annotation is the addition of some context for when that metric moved outside of what you expected. That's an important piece of structured information that can then be pushed out to an organization. I think that is a very significant paradigm shift, because today we're creating a lot of data objects where we expect data consumers, so business users, to come to a dashboard and pull some insight out of it, and that's a really, really tall ask.
It's not just a tall ask because it's hard to get the data; it's a tall ask because having all of the context necessary to pull some interesting and valuable insight out of that data typically takes somebody who has seen the data go end to end to that place. I think we can create really interesting interfaces, beyond just the APIs and pushing the data out, to actually add context to these metrics.
That takes me to this last point, which is that a metric is an incredible vehicle for information. They're one of the most consistent objects in a company over time. They don't switch teams, they don't quit. They are consistent and long-lasting, especially if they're well-managed. By actually tying knowledge to them over time, you have the potential to add a lot of context that I think people don't have in many of the organizations that they're working in.
Just to tie that down to a concrete example, it just happens so frequently in just about every organization that I was in, that somebody asked me, "What happened on this specific date? I know you were at this company three years ago, help me understand that." Oftentimes, that information just gets lost. I think that a metric is a really interesting unit to carry that information forward.
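Editor's note: The forecast-anomaly-annotation chain can be pictured with a small, hypothetical SQL sketch: flag the days where the actual metric fell outside the forecast band, as candidates for a human annotation. Table and column names are invented.

```sql
-- Hypothetical tables: daily actuals plus a forecast band per day.
-- Days outside the band are anomaly candidates to annotate with context.
SELECT
  a.ds,
  a.nights_booked,
  f.lower_bound,
  f.upper_bound
FROM metric_actuals AS a
JOIN metric_forecasts AS f USING (ds)
WHERE a.nights_booked NOT BETWEEN f.lower_bound AND f.upper_bound;
```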
Section 15: Cross-functional collaboration with governed metrics
Tobias: There's definitely a huge risk of loss of context and of value in an organization when somebody who has that useful understanding and experience either changes roles or responsibilities, or leaves the company entirely and doesn't actively document it. Being able to have this as the long-term artifact of somebody's experience, I can definitely see a lot of potential value from that. In terms of the users of the platform and customers who are starting to onboard with Transform, what are some of the most interesting, or innovative, or unexpected ways that you're seeing it used?
Nick: I think the most interesting thing is probably defining interfaces between teams. I took this for granted when I was at Airbnb; I just assumed it was normal. But we've seen a lot of teams adopt this tool and then define various metrics in different parts of the company that historically have not been consumed across team boundaries. We've gotten some really fun feedback from our customers along the lines of, "I've never sliced this metric by this dimension before, because this one existed in a data set that this team relied on, and this one existed in a data set that my team relied on."
That's really exciting, and I think it demonstrates a lot of the potential of this framework. I think back now to my time at Airbnb, where I was on the growth team. I consumed metrics from a wide variety of teams because oftentimes a growth team's work impacts some other team. I was consuming the customer service contact rate, or the account takeovers related to the signup and login flow work that I was doing.
I, to this day, don't know the definitions of those metrics. I could not have written the SQL to calculate them, but I know that I consumed them. I know that the teams that reviewed my analysis trusted the analysis because they had defined the SQL. It's this incredible unlock to basically just be able to communicate with another team reliably. I think that this actually touches on one of the core principles of data mesh, that's an exciting future that we are moving towards.
Tobias: The data mesh aspect is definitely an interesting element to pull out because it's been gaining a lot of ground over the past couple of years, and has a lot of utility in terms of how you think about building out the technical underlayment of the organizational capacity for data. I can definitely see the metric as being a useful exposed artifact for a given data team to be able to propagate and let other teams consume and combine them without necessarily having to understand the underlying calculations and computation that happens. That's an interesting point worth noting.
Section 16: Best practices from our early adopters
Tobias: Then in terms of your experience creating the Transform product and building the business around it, what are some of the most interesting, or unexpected, or challenging lessons that you've learned in the process?
Nick: I think the majority of these come from generalization. We saw this tool work within one company, and we went out and talked to maybe the 10 or 15 companies that have built similar tooling, but that's a very narrow picture of how people build and consume metrics. There are a lot of really complicated factors in there that require us to generalize the way the tool is built, so that it will be more useful broadly.
Some of these include different data modeling techniques. Airbnb had a good mixture, I think, of nice, clean, normalized data sets, semi-denormalized data sets, and then raw data sets that were finding their way into metrics, but it wasn't even close to representative of all of the different data modeling and engineering techniques that companies are using. A lot of lessons there. Different scale also puts different requirements on this framework. When I think about this, I think about that denormalization challenge: what are we building statically, and what are we building in the data warehouse ahead of time, so that even if there are 100 billion rows in a fact table, we can still answer the question of how many rows there were per dimension that I'm trying to aggregate to?
What that takes is basically pre-aggregating data sets. That's something Airbnb got really good at because it had large amounts of data, but a lot of the companies we're working with really just want to do these things dynamically and on the fly, and they still don't want to wait very long. It's some combination of building data sets and then storing intermediate representations of them so that incremental questions can be answered quickly, because they don't necessarily have the time to go out and build a bunch of nice, clean, denormalized summary tables to expose to their organization.
That's been a really big challenge, but also a really big learning. I think it pushed us toward making our APIs dynamic, so that you can ask for any metric-dimension combination, and there's a bunch of interesting work that we're doing around caching to make it so that those results get returned quickly.
The last one I think is just organizational challenges associated with metric definition, the whole life cycle management process that I mentioned. It's tough and just about every company has a different idea of how this works. There is a big challenge around productizing that. What that means is that there needs to be a lot of configurability because this catalog really needs to work in the ways that companies expect it to work for that process that they want.
Section 17: Limitations of the summary table approach
Tobias: Your point about pre-calculating summary tables is interesting, because I've had a lot of conversations with people where the general guidance is that you should have one, or a small set of, tables that can answer 80% of the questions in your business. With the introduction of metrics, and the amount of information that you have about what data is being used, how, and by whom, there's the potential for an interesting feature where you can recommend a set of summary tables that would be useful to precompute, to increase the speed at which you're able to generate these other metrics views of the underlying data.
Nick: That's right. That's why we have two primary layers of caching. The first one is where the company can say, "I know that I want to compute this metric and this dimension together, and I want the queries on top of that to be really, really quick." That's something they know ahead of time, and on a really fast data warehouse we can get that query down to under one second.
On the other end of that, there are times when people just ask new questions, but if they find something interesting, they're going to keep asking. We have a layer that we call dynamic caching, which basically allows you to ask a question, and then, if you go and ask that same question again, it's going to be really fast, because we're saving that data set in a similar way to the way we save the materialized datasets.
This really enables people to ask these metric questions really quickly, but also enables them to ask a wide variety of them. I've definitely heard that 80% of questions can be answered by core summary tables. I think I would push back on that. I would say that it might be that the people who are consuming data at your company have just given up, and so you're not discovering the rest of the data questions that they have because you're just seeing the ones where they ask a question and it's not answerable, and then they give up.
I think that what we're seeing is that as more and more people are adopting this tool, and there are more combinations of metrics and dimensions that people can ask questions about, they will just ask more questions. Hopefully, that leads to more interesting and valuable insights getting pulled out of the data.
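Editor's note: A rough sketch of what the first, pre-specified caching layer amounts to, with hypothetical names: a known metric-dimension combination is rolled up ahead of time, and later queries read the small rollup instead of rescanning the raw fact table.

```sql
-- Pre-specified caching: materialize a metric-by-dimension combination
-- the team has said it will query often.
CREATE TABLE cache_nights_booked_by_country AS
SELECT country, ds, SUM(nights) AS nights_booked
FROM fct_bookings
GROUP BY country, ds;

-- Later queries for this combination hit the small rollup, not the
-- (possibly 100-billion-row) fact table.
SELECT country, SUM(nights_booked) AS nights_booked
FROM cache_nights_booked_by_country
WHERE ds BETWEEN '2021-01-01' AND '2021-03-31'
GROUP BY country;
```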
Section 18: When Transform makes sense and when it does not
Tobias: For people who are interested in the idea of a metrics layer, and they want to be able to add some uniformity to how the metrics are defined across their different tools, and they want to be able to enable their business users to explore more of the dimensionality of their data, what are the cases where Transform is the wrong choice and they might be better suited with some in-house tool or something purpose-built for their particular use case?
Nick: We've talked to a lot of fairly small companies, because I think they have productivity challenges, but they don't yet have the trust challenges that our framework and the rest of our product are really aimed at solving. The reason is just that if you have one or two data analysts on a team, you already have metrics consistency: it's already in the heads of those data analysts. They know the definitions and they are the interface to data for the rest of the company. There are some productivity challenges associated with that, because if it's a data-hungry organization, there's going to be a lot of consumption of metrics, and that's a significant thing to support.
Then what inevitably happens is they add more data analysts to that team, and you start to have some of those trust challenges. I would say that fairly small companies should probably just focus on the core of getting good, clean data into their warehouse, normalized and ready for consumption. Then they need to start thinking about the different applications where they want to consume metrics, because Transform is really valuable once you have more than one application: if you're consuming a metric in multiple places, that's where this framework adds a lot of value.
The second one I would say is that there's a whole set of companies that consume metrics off of Salesforce, or Zendesk, or any number of other tools. Because we're built on top of the company's centralized data warehouse, we just can't serve those customers yet. Generally, I would say that just about every medium to large company has metrics problems. That's the set of companies we're working with in the early days.
Tobias: As you continue to build out the product and build out the business, what are some of the things that you have planned for the near to medium term that you're excited for?
Nick: There's just so much foundational work. The reason is that if you are going to define a single source of truth for metrics, there's a core product philosophy that I think you have to have. One part is that you have to be able to consume metrics wherever the underlying data is located, and you need to be able to build whatever metric types the customer wants, and we're still working on that. There are a lot of different types of metrics that companies want to consume. I would put us at about 90% at this point: we can support all of the core types of metrics, but we're still working to support some of the edge cases that specific companies are interested in tracking.
Then on the other side of this, you have to be able to connect to every single tool that a company wants to consume those metrics in, because in order for this truly to be a single source of truth, it has to be consumable in all of them. The moment it's not consumable in one of them, they will go around this tool, and it is no longer a single source of truth. There's just a lot of foundational work to enable that vision.
Beyond that, I think that once you have consistent metric definitions, there are a bunch of really interesting applications. These are the things that I already called out around forecasting, anomaly detection, interesting correlation analysis between them, building metrics for different applications like experimentation, executive reporting. There are just so many different applications. I think some of those are well-served today.
BI is an example where there are many different takes on how it should work, and many people building the future there, but I think there are a lot of applications for metrics where people are still starting from home base and trying to figure out, "How am I going to build this application?" It all starts with building metrics out in the data warehouse and then figuring out how to productionize that, so I think we can help with some of those, what I'd call long-tail applications of metrics.
Tobias: Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. As a final question, I'd just like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today?
Nick: Well, I think the obvious answer is probably the one that I'm putting my career behind, which is making metrics a first-class citizen of the data ecosystem and generally making data more accessible. But maybe more broadly, one that I'm passionate about: I think in order for data to truly be accessible, we need to make a lot of progress with the data tools that we've built out.
I think in order to do that, there needs to be much broader cooperation between the various companies working in this industry. I'm excited about projects like OpenLineage. I'm excited that we are pushing the specs of how our semantic layer works out into the open. I think this is something that will hopefully allow more companies to build on top of Transform.
Tobias: Thank you very much for taking the time today to join me and share the work that you're doing at Transform. It's a very interesting product and an interesting problem space. I'm definitely excited to see more energy behind it and the wider availability of metrics across the overall data ecosystem. Thank you for all of the time and energy you're putting into that, and I hope you enjoy the rest of your day.
Nick: Thanks for having me, Tobias, this was a lot of fun.