What metrics do ML product teams look at to define success? Which do you find to be the most important?

Savita Kini · Accepted Answer

Metrics are an interesting question. It really depends on the type of product we are building with ML. ML can be used in, for example, electronic records, sales workflows, computer vision, or speech/audio use cases, some of which I am familiar with. We can break it down into the product's use itself, then the algorithm or model used, how often it is used, and which business or customer experience metrics it provided or influenced. So the long and short of it is that there is no "generic metric": we are still building products and features.

When it comes to the model itself, there is the accuracy of the model. Here too, there is no 100% accuracy, whether the model is classifying, predicting, running a regression, and so on. There can also be domain-specific metrics, for example for computer vision models. If the model predicts system failures, for example, there would be model accuracy metrics to track. At this point we are really getting into "model performance metrics".

Savita Kini · Answer

There is no one "metric" that ML product teams use to define success. It depends entirely on how ML is used in the context of a feature or product. Typical metrics might include:

* Quality improvements achieved via ML implementation vs traditional algorithmic implementations

* Usage and adoption of the AI/ML-based feature

* Business or user perception metrics of improvements, such as CSAT and NPS scores

* Time savings, cost savings, and other typical business metrics

* Benchmarking vs peer products

* Accuracy and responsible AI metrics

Suhas Manangi · Answer

In addition to the core business metrics that are important for a product's success, below are the additional ones AI PMs obsess over to ensure that success at launch doesn't regress over time.

1. Precision and recall
2. False positives
3. Model quality monitoring metrics based on where the risk to the business is (feature quality, score shift, re-training frequency, etc.)
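One way to make "score shift" concrete is the Population Stability Index (PSI), which compares the distribution of model scores at launch with the distribution seen today. The sketch below is only an illustration under stated assumptions, not necessarily what any team in this thread uses: the function name `psi`, the bin count, the sample scores, and the assumption that scores lie in [0, 1] are all hypothetical.

```python
import math

def psi(baseline, current, n_bins=5):
    """Population Stability Index between two samples of scores in [0, 1].

    0 means the two binned distributions are identical; larger values
    mean the current scores have drifted further from the baseline.
    """
    def bin_fractions(scores):
        counts = [0] * n_bins
        for s in scores:
            # Equal-width bins; a score of exactly 1.0 goes in the last bin.
            idx = min(int(s * n_bins), n_bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(scores), 1e-6) for c in counts]

    base_frac = bin_fractions(baseline)
    curr_frac = bin_fractions(current)
    return sum((c - b) * math.log(c / b) for b, c in zip(base_frac, curr_frac))

# Hypothetical model scores captured at launch and a few months later.
launch_scores = [0.05, 0.12, 0.18, 0.25, 0.33, 0.41, 0.52, 0.64, 0.77, 0.91]
recent_scores = [0.45, 0.52, 0.58, 0.63, 0.69, 0.74, 0.81, 0.86, 0.92, 0.97]

print(f"PSI vs. launch: {psi(launch_scores, recent_scores):.3f}")
```

The value at which a shift should trigger an investigation or a re-train is a team decision tied to the business risk, so no threshold is hard-coded here.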

Deepak Mukunthu · Answer

When working on an ML product or feature, there are two sets of metrics:

1. Product success metrics that product managers define. The purpose of these is to measure the business/product outcome you are trying to optimize for. Your standard metrics like customer adoption, usage, retention, and satisfaction fall in this bucket. Product managers choose what's best for the situation at hand.

2. ML metrics that data scientists define. These metrics measure how good the ML approach/solution/model is. For example, good metrics for a regression task are Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE).
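For readers who have not computed these before, here is a minimal sketch of both regression metrics in plain Python. The delivery-time data and the function names `mae` and `rmse` are made up for illustration.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error: average size of the errors, in the target's units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error: like MAE, but large errors weigh more."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))

# Hypothetical data: actual vs. predicted delivery times in minutes.
actual = [30.0, 45.0, 25.0, 60.0]
predicted = [28.0, 50.0, 25.0, 40.0]

print(f"MAE:  {mae(actual, predicted):.2f}")   # 6.75
print(f"RMSE: {rmse(actual, predicted):.2f}")  # 10.36
```

The single 20-minute miss pulls RMSE well above MAE, which is why the choice between the two depends on how costly large errors are for the product.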

Product managers and data scientists collaborate to ensure that the business/product scenario is mapped to the ML problem correctly and that there is a clear line of sight to achieving the product success metrics.

Rapha Danilo · Answer

The same first principles actually still apply to AI PMs, but with an added dimension of complexity, which is that a generational paradigm/platform shift like AI requires a PM to re-think the benchmarks for what good looks like, consider the new types of outputs/inputs required, and the additional internal communication paths needed to achieve goals.

It is most similar to being an early PM in mobile 10 years ago, or on the early web/internet even before that.

What doesn't change:

* (+) Focus on key outputs:
   
   * Focus on customers' jobs to be done
   
   * How well does our product solve them? What user behaviors should we see?
   
   * Measure the utilization and product KPIs/metrics that back up that these behaviors are really happening
   
   * Which business metrics (e.g. revenue, retention) will this impact?

* (-) Budget the required inputs:
   
   * How much time, resources and investment are required to achieve this?

What's unique about AI product management:

1. Benchmarks move a lot faster: At the current pace of change, what was best-in-class a few months ago can be below-market today. Shifts like this in what "good" looks like used to happen over years, not weeks or months, which makes success harder to define even if the success metric you are measuring against doesn't change.

2. Need to manage more complex inputs and outputs: You need to add more dimensions to your ROI analysis to account for the ML component compared to a traditional software product. See the examples below.

3. Bigger need to break through the noise: With the ongoing AI frenzy, PMs need to be very careful not to put too much weight on the PR value of a new AI feature. You will get pressure to rush an AI feature, maybe from your leadership or the GTM teams, mostly to make noise around it and capture part of the hype. It's not necessarily a bad idea btw, but remember that ultimately you will need to build, maintain, and live with the outcomes of that AI feature.

Examples of new inputs/outputs to consider:

* Accuracy, reliability and speed of your ML models
   
   * e.g. What investment would it take to improve accuracy by X%? How well do our models perform in different languages? Are there labeling costs or MLOps/infrastructure investments we need to make? etc
   
   * Impact on users: How would this impact the ability for our customer to complete their jobs to be done in our product?
   
   * Impact on business: How will this show up in our product and business metrics?

* Bias, trust and safety
   
   * How are we measuring for it?
   
   * What are the impact and risks for customers and their adoption of our product?

Mike Flouton · Answer

Let me preface this by defining a product team as PM, UX and Engineering. I'd suggest there are at least two sets of metrics you should be looking at.

First and foremost, don't forget you're here to solve a customer problem. Judge success according to how the capability is driving that specific outcome, just like you would for any other product. That could be the time it takes a customer to do a task, the number of phishing attacks detected, the sales volume of your sellers on a marketplace, or rides taken by customers.

To supplement those outcome metrics, you want to look at some supporting metrics as well. These may be specific to AI. Classic examples are:

* Precision and recall

* F-score

* False positives and false negatives

* True positives and true negatives
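As a quick illustration of how the metrics in this list relate to each other, here is a small sketch in plain Python. The labels are a made-up phishing-detection sample (1 = phishing, 0 = legitimate) and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Confusion-matrix counts plus precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # caught phishing
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legit mail flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # phishing missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legit mail passed

    # Precision: of everything flagged, how much was really phishing?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of all real phishing, how much did we catch?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1}

# Hypothetical ground truth and model predictions for eight emails.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this sample there are 3 true positives, 1 false positive, 1 false negative and 3 true negatives, so precision, recall and F1 all come out to 0.75.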

There's a slew of others that I'm not going to list or explain, but if you're an AI PM you need to know them to be able to talk to your ML engineers. I'd recommend taking a course or working your way through a book. Don't be afraid to get your hands dirty and write some Python in a Jupyter notebook!

Shruti Tiwari · Answer

Exact metrics would vary by product and application, but I would think about metrics in three categories:

1. Model performance metrics like accuracy, precision, recall, AUC-ROC, latency etc. The key metric depends on the application: in an email filtering app, for example, you would minimize the error type that causes important emails to be missed (false negatives, if "important" is treated as the positive class).

2. Product adoption/engagement metrics like adoption rate, user override rate, CTR etc.

3. Business impact metrics like cost savings, revenue impact, CSAT impact, time savings etc.
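To show how the second category might be computed, here is a rough sketch of adoption rate and user override rate from a suggestion log. The definitions used are assumptions for illustration (adoption = share of active users who saw at least one ML suggestion; override = share of shown suggestions the user changed or rejected), and the log format, user IDs and function names are hypothetical.

```python
def adoption_rate(suggestion_log, active_users):
    """Share of active users who received at least one ML suggestion."""
    users_with_suggestions = {user for user, _ in suggestion_log}
    return len(users_with_suggestions & active_users) / len(active_users)

def override_rate(suggestion_log):
    """Share of shown suggestions that the user changed or rejected."""
    if not suggestion_log:
        return 0.0
    overridden = sum(1 for _, was_overridden in suggestion_log if was_overridden)
    return overridden / len(suggestion_log)

# Hypothetical log: one (user_id, was_overridden) entry per suggestion shown.
suggestion_log = [
    ("u1", False),
    ("u1", True),
    ("u2", False),
    ("u3", False),
    ("u3", True),
]
active_users = {"u1", "u2", "u3", "u4", "u5"}

print(f"Adoption rate: {adoption_rate(suggestion_log, active_users):.0%}")
print(f"Override rate: {override_rate(suggestion_log):.0%}")
```

A rising override rate can be an early sign that model quality is slipping even when adoption still looks healthy, which is why it is worth tracking the two together.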
