MetricFlow: how we got here, and where we're headed

Victoria Marinoff

We spoke with Tom Lento to learn about MetricFlow and the impact it will have in the future. He talked about how MetricFlow allows people to build shared knowledge and extend each other’s work on metrics and data. We also discussed MetricFlow’s future as a core building block of tools that allow people to tell stories and have conversations with their data.

Can you tell me about yourself, your background, your work history and how you ended up at Transform?

Tom: I was doing a PhD in sociology, studying what motivates people to participate in online communities, and once I left graduate school I ended up joining Facebook. Theoretically, I was to continue the work I had been doing, but I joined at a time when they had a very primitive data stack. They needed people who could write code to run queries in order to answer fairly simple metric-related questions, so I quickly became something of a measurement specialist on how users behave within the product. Over the years, I moved into software and started building frameworks to make it easier for people to work with data at Facebook.

After 13 years, I decided it was time for a new challenge, and James called me up and pitched Transform. I had worked on frameworks with goals similar to Transform’s, and I thought it would be fun to take a shot at creating a new product in a smaller company, so here I am.

Can you tell us a bit about the idea behind creating something like MetricFlow?

Tom: I think there are two essential parts to it: there’s MetricFlow itself, and then there’s the open source nature of it.

MetricFlow grew out of Transform’s core framework. After I joined, Paul had some great ideas on how to revamp the original framework and make it more elegant, more streamlined, and more easily extensible to different metric scenarios that we would want to support. The thing that really got me excited about working on it was this notion of centralizing metric definitions in a way that humans can understand. What I worked on at Facebook, and what I saw Transform doing, was not so much solving a technical problem around metric computation, but rather a more fundamental organizational problem of communication around data.

What I’d seen a lot at Facebook was people who have an idea of what a given metric is and what it means, and they create that in SQL somewhere. Then when somebody else wants to create that same metric, they end up repeating that process and ultimately define it differently in SQL.

So we’d end up with two different metrics that share the same name. That’s a communication problem, a social problem within the organization. MetricFlow allows people to communicate via configuration and metadata specifications to say "hey, this is what this metric is, this is what it means, this is how it should be computed," and then we compute it in a very consistent way for them so that anyone querying it knows that they're getting the same answer. That’s what’s so exciting to me about it: it allows people to build shared knowledge and extend each other’s work in a way that has historically been very difficult in the data and metrics space.
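Editor's illustration: the idea Tom describes here (define a metric once, then render every query from that single definition) can be sketched in a few lines of Python. This is a toy under stated assumptions, not MetricFlow's configuration format or API. The names `MetricDefinition`, `REGISTRY`, `register`, and `render_sql`, as well as the `app_events` table and its columns, are all hypothetical.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-in for a metric definition.
# This is NOT MetricFlow's real config format or API.
@dataclass(frozen=True)
class MetricDefinition:
    name: str          # the one shared name for the metric
    description: str   # what the metric means, in plain language
    table: str         # assumed source table in the warehouse
    expression: str    # SQL aggregation that defines the metric


# One central place for definitions, keyed by metric name.
REGISTRY = {}


def register(metric):
    """Add a definition, refusing a second metric with the same name."""
    if metric.name in REGISTRY:
        raise ValueError(f"metric {metric.name!r} is already defined")
    REGISTRY[metric.name] = metric


def render_sql(metric_name, group_by=()):
    """Build SQL from the shared definition so every caller gets the same logic."""
    metric = REGISTRY[metric_name]
    select_items = list(group_by) + [f"{metric.expression} AS {metric.name}"]
    sql = f"SELECT {', '.join(select_items)} FROM {metric.table}"
    if group_by:
        sql += f" GROUP BY {', '.join(group_by)}"
    return sql


register(
    MetricDefinition(
        name="active_users",
        description="Distinct users who opened the application",
        table="app_events",
        expression="COUNT(DISTINCT user_id)",
    )
)

print(render_sql("active_users", group_by=["country"]))
```

In this sketch, a second attempt to register `active_users` fails loudly instead of quietly producing a competing definition, and every query for the metric goes through the same rendering function. A real metrics layer also has to handle joins, filters, and time granularity, which this toy leaves out.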

When it comes to open sourcing MetricFlow, what I found most interesting was the community engagement: getting people who could contribute to the project in ways that would bring it to areas that we, as a company, would probably not get around to supporting for a long time.

I think the other part of it that was really exciting was just getting more eyes on the project. A lot of what we’re doing with our query rendering requires a high degree of precision in terms of how we’re managing the SQL. For example, things like dimension joins with cumulative metrics or percentiles (coming soon!) can be tricky to get right in terms of balancing the need for optimization with the strict requirement that we always produce the right answer. I think we've done a good job with quality but just having more people looking at it gives us more confidence that they're using the framework in a more open way, that we're building the right set of tools, and that we're building them in a way that people agree is good.

What impact do you believe MetricFlow will have on future projects?

Tom: When people use metrics they're not sitting there thinking, "I want to run this SQL query so I can build this dashboard," but that is kind of how the tool chain works today. Typically, they're thinking something along the lines of "I want to know how many people from Germany used our application yesterday." That's the way we typically phrase concrete data questions in a business context. In an ideal world, we’d be able to say things like "I want to see all of our revenue streams broken down by sales region and customer size" or "I want to know how much of our revenue comes from the 5 biggest customers in each of our top 10 markets." We’re still a long way away from being able to phrase questions like that and get sensible answers from our data tools, but MetricFlow provides a key first step to asking better questions of our data. As the first building block on top of the base level warehouse where all the data is kept, it's what allows us to specify "hey, this is where customer size is stored and this is how it maps to country and to sales region" and "this is what revenue means and this is how it's computed." Having that information allows us to compose these more complicated scenarios in a way that's still fairly easy for everyone to understand, and therefore it enables broader access to reliable data.

It also becomes easier for us to drop down to different levels in the data stack when something goes wrong, or we’re having trouble finding something. We can go to someone who's an expert in that one next level, or learn what we need to know about that one single thing in order to get unstuck. That alone is a big shift over what I’ve seen, where we have to search the organization for the one person who happens to know how the data infrastructure works and how the data warehouse is laid out and how that specific metric was defined, which gets to be an impossible challenge once an organization gets beyond a few hundred people.

Where do you think MetricFlow will be in 5 years?

Tom: I hope that in 5 years MetricFlow will be the standard way of specifying how metrics and dimensions interact in the warehouse. I also hope that MetricFlow itself is stable and the community has shifted its focus to building other layers on top of it. Someone can then say, "hey, let me figure out how to parse a natural language query into something that MetricFlow can understand" or "let me figure out a way to streamline switching between different kinds of dimensional contexts and make it more efficient."

I think that kind of stuff is going to be much harder to build into MetricFlow itself because it's a building block, it's not the whole answer. So I would really like to see our community getting to the point where we're actually figuring out ways for people to have conversations and tell stories with data.

Do you have any advice you would like to give to new MetricFlow community members?

Tom: I think it depends on what kind of community member we're talking about.

If you're a developer, reach out to us, connect with us, look at the GitHub issues and read the contribution guide. I think there are a lot of things we can do to make the onboarding experience easier. We've worked pretty hard on making it as straightforward as we can, but there's nothing like having an outsider go through the flow for the first time to help us improve that process.

We’ve also developed a MetricFlow example repo that essentially consists of an end-to-end deployment of MetricFlow. A user can download this repository, follow all of the steps laid out there from start to finish, and begin exploring MetricFlow more independently.

For somebody who is on the user end of MetricFlow (where they're installing MetricFlow and trying to do things), the #how-do-I channel in Slack is a really great resource and we're pretty active there. The other thing that I would say is just try and play around with it. Try and get a model working. That will give you a good feel for what MetricFlow can do. I would even suggest not trying to make the perfect model; just do something simple where you can play around with some metrics and dimensions, run some queries, and get a feel for how it works first. Then step up from there.

MetricFlow GitHub Repo

Join the MetricFlow Community on Slack