Build org-level alignment around metrics and publish metrics definitions to the rest of the org with confidence
Transform is a centralized metrics store in your data stack that enables you to define all your important metrics in a single place using MetricFlow, our Metrics Framework, explore and collaborate on these metrics using the Metrics Catalog, and perform further analysis by querying for metrics within your favorite tools using our Metrics API.
MetricFlow provides many benefits, including providing a centralized source of truth and the ability to query for metrics and dimensions while simplifying complex SQL. Most importantly, you can easily publish metrics definitions to the rest of the company with confidence and scale governance as your organization evolves and grows.
This blog post gives a quick overview of how the framework can be set up as a foundational basis to deliver accurate insights.
Configuration language
To use MetricFlow, the first step is to model the data in your warehouse through our configuration language. Let's start with a simple example from a hypothetical web store. A table that might be in the data warehouse is an orders table that includes information about the orders that were placed.
The given table can be represented in MetricFlow as a data source using the following YAML configuration:
There are many fields in the configuration, and to keep this post short, we'll focus on the essentials.
The first one to note is the sql_table field. This specifies the name of the SQL table that this configuration applies to. For something more complex, a SQL query (as sql_query) can be used as well.
You can also note that various columns in the table have been categorized into three main types: identifiers, measures, and dimensions.
- The name field refers to the name of the column in the SQL table.
- Identifiers are generally the keys that are used in the joins between normalized tables.
- Measures are quantitative, numeric values of something that you want to measure and aggregate. All of the measures include an agg field that can be used to specify how those fields should be aggregated.
- Dimensions are qualitative values that can be used to categorize measures.
- For all these elements, a SQL expression (expr) can be specified that tells the MetricFlow framework how to extract the associated values from the SQL table or SQL query.
Using the measures defined in the data source, we can define a simple metric. In this example, the metric is almost the same as a measure, so the type of metric that we are creating is known as a measure proxy. More complex metrics like ratio and expression metrics are possible as well. For this case, you can see that we are using the order_total measure to create the revenue_usd metric.
These files can be stored in source control (e.g. git), allowing for reviews and auditing of changes.
Power of the API
Once the above definitions are created, you can answer questions like the following through the API:
- Revenue for last year
- Year-over-year revenue growth
- Revenue for last year from orders without coupons
- Year-over-year revenue growth of orders with coupons
If another data source were added that described the users table, you could quickly answer questions like
- Revenue by user's country
- Revenue by user's age
- Year-over-year revenue growth from US
- and more...
You can do all that without having to write SQL! MetricFlow will generate the SQL to realize the joins and produce the de-normalized datasets for you. While these are simple examples, the queries can get complex when there are numerous metrics, dimensions, and joins involved.
The configuration above can serve as the single definition of how metrics should be computed in your data ecosystem. Using the definition, the data returned through the API can become the single source of truth for metric values as well.