What metrics do ML product teams look at to define success? Which do you find to be the most important?
Metrics are an interesting question, and the answer really depends on the type of product we are building with ML. ML can be used in electronic records, sales workflows, computer vision, or speech/audio use cases, some of which I am familiar with. For any of these, we can break measurement down into usage of the product itself, then the algorithm or model used, how often it is used, and which business or customer experience metrics it provided or influenced. So the long and short answer is that there is no "generic metric": we are still building products and features.
When it comes to the model itself, there is the accuracy of the model. Here too, there is no such thing as 100% accuracy, whether the model is classifying, predicting, running a regression, and so on. There can also be domain-specific metrics, for example for computer vision models. If the model predicts system failures, for example, there would be model accuracy metrics to track. At this point we are really getting into "model performance metrics".
There is no one "metric" that ML product teams use to define success. It depends entirely on how ML is used in the context of a feature or product. Typical metrics might include:
- Quality improvements achieved via the ML implementation vs. traditional algorithmic implementations
- Usage and adoption of the AI/ML-based feature
- Business or user perception metrics of improvements, such as CSAT and NPS scores
- Time savings, cost savings, and other typical business metrics
- Benchmarking vs. peer products
- Accuracy and responsible AI metrics
In addition to the core business metrics that are important for a product's success, below are the additional ones AI PMs obsess over to ensure that success at launch doesn't regress over time.
- Precision and Recall
- False Positive
- Model quality monitoring metrics, chosen based on where the risk to the business lies (feature quality, score shift, re-training frequency, etc.)
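To make the first two items concrete, here is a minimal sketch of how precision, recall, and the false positive rate could be computed for a binary classifier. The function names and the sample labels are made up for illustration; a real team would more likely use a metrics library and its own evaluation data.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Return precision, recall and false positive rate (0.0 when undefined)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of predicted positives, how many were right
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of actual positives, how many were found
    fpr = fp / (fp + tn) if (fp + tn) else 0.0        # of actual negatives, how many were flagged
    return {"precision": precision, "recall": recall, "false_positive_rate": fpr}


# Hypothetical labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Which of the three matters most depends on where the business risk sits: a feature that interrupts users on every alert may care most about false positives, while one that must not miss a failure may care most about recall.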
When working on an ML product or feature, there are two sets of metrics:
1. Product success metrics that product managers define. The purpose of these is to measure the business/product outcome you are trying to optimize for. Your standard metrics like customer adoption, usage, retention, satisfaction, etc. fall in this bucket. So, product managers choose what's best for the situation at hand.
2. ML metrics that data scientists define. These metrics measure how good the ML approach/solution/model is. For example, good metrics to measure a regression task are Root Mean Squared Error (RMSE) or Mean Absolute Error (MAE).
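As a rough illustration of the two regression metrics just named, the sketch below computes MAE and RMSE in plain Python. The function names and the sample values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: average size of the errors, ignoring direction."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the average squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical actual vs. predicted values, for illustration only.
actual = [10.0, 12.0, 15.0, 20.0]
predicted = [11.0, 12.0, 13.0, 24.0]

mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)
print(f"MAE={mae_value:.2f} RMSE={rmse_value:.2f}")
```

Because RMSE squares each error before averaging, a single large miss raises it more than it raises MAE, so the choice between them is partly a product decision about how costly large errors are.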
Product managers and data scientists collaborate to ensure that the business/product scenario is mapped to the ML problem correctly and that there is a clear line of sight to achieving the product success metrics.
The same first principles actually still apply to AI PMs, but with an added dimension of complexity, which is that a generational paradigm/platform shift like AI requires a PM to re-think the benchmarks for what good looks like, consider the new types of outputs/inputs required, and the additional internal communication paths needed to achieve goals.
It is most similar to being an early PM in mobile 10 years ago, or in the early web/internet even before that.
What doesn't change:
(+) Focus on key outputs:
Focus on customers' jobs to be done
How well does our product solve them? What user behaviors should we see?
Measure the utilization and product KPIs/metrics that back up that these behaviors are really happening
Which business metrics (e.g. revenue, retention) will this impact
(-) Budget the required inputs:
How much time, resources and investment is required to achieve this?
What's unique about AI product management:
Benchmarks move a lot faster: At the pace of change we're in, what was best-in-class a few months ago can be below-market today. These types of shifts in what "good" looks like usually happen over years, not weeks/months, making it harder to define success even if the success metric you are measuring against may not change.
Need to manage more complex inputs and outputs: You need to add more dimensions to your ROI analysis to account for the ML component compared to a traditional software product. See below examples.
Bigger need to break through the noise: With the ongoing AI frenzy, PMs need to be very careful not to put too much weight on the PR value of a new AI feature. You will get pressure to rush an AI feature, maybe from your leadership or the GTM teams, mostly to make noise around it and capture part of the hype. It's not necessarily a bad idea btw, but remember that ultimately you will need to build, maintain, and live with the outcomes of that AI feature.
Examples of new inputs/outputs to consider:
Accuracy, reliability and speed of your ML models
e.g. What investment would it take to improve accuracy by X%? How well do our models perform in different languages? Are there labeling costs or MLOps/infrastructure investments we need to make? etc
Impact on users: How would this impact the ability for our customer to complete their jobs to be done in our product?
Impact on business: How will this show up in our product and business metrics?
Bias, trust and safety
How are we measuring for it?
What are the impact and risks for customers and their adoption of our product?
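One possible starting point for questions like "how well do our models perform in different languages?" and "how are we measuring for bias?" is to break a single accuracy number down by segment. The sketch below assumes each evaluation record carries a segment label (here a language code); the function name, record layout, and data are hypothetical.

```python
from collections import defaultdict


def accuracy_by_segment(records):
    """Compute accuracy per segment from (segment, y_true, y_pred) records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for segment, y_true, y_pred in records:
        total[segment] += 1
        if y_true == y_pred:
            correct[segment] += 1
    return {segment: correct[segment] / total[segment] for segment in total}


# Hypothetical evaluation records, for illustration only.
records = [
    ("en", 1, 1), ("en", 0, 0), ("en", 1, 1), ("en", 0, 1),
    ("de", 1, 0), ("de", 0, 0), ("de", 1, 0), ("de", 1, 1),
]

by_segment = accuracy_by_segment(records)
# Gap between the best and worst segment: one simple signal to track over time.
gap = max(by_segment.values()) - min(by_segment.values())
print(by_segment, gap)
```

An overall accuracy figure can hide a segment that performs much worse than the rest, so tracking the per-segment numbers and the gap between them gives a PM something specific to weigh against the investment questions above.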