Sharebird

What strategies are you using for testing and validating the AI components of your product?

3 Answers
  1. Natalie Chung

    Atlassian Director | Senior Principal PM, Teamwork Collection • 6mo

    Examples of strategies to test and validate AI components:

    Evals: Because AI output is probabilistic, manual testing at scale is impractical, so evals are crucial for validation. This includes setting up:

    Golden output datasets: Create a static set of inputs (100, say) with "perfect" human-written answers.

    Automated judging: When updating your model, run those 100 inputs through it. Use a separate "Judge" model to compare the new output against the golden dataset.

    Pass/fail metrics: This provides a percentage score (e ...
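The golden-dataset loop described in this answer can be sketched roughly as follows. This is a minimal illustration, not the author's actual setup: `GoldenExample`, `token_overlap_judge`, `run_eval`, and `toy_model` are hypothetical names, and the token-overlap function is only a stand-in for the separate "Judge" model, which in practice would be a call to another model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GoldenExample:
    prompt: str          # static input
    golden_answer: str   # "perfect" human-written answer


def token_overlap_judge(candidate: str, golden: str, threshold: float = 0.6) -> bool:
    """Stand-in judge (assumption): pass when enough golden tokens appear in the candidate."""
    golden_tokens = set(golden.lower().split())
    if not golden_tokens:
        return False
    candidate_tokens = set(candidate.lower().split())
    recall = len(golden_tokens & candidate_tokens) / len(golden_tokens)
    return recall >= threshold


def run_eval(
    model: Callable[[str], str],
    golden_set: List[GoldenExample],
    judge: Callable[[str, str], bool],
) -> float:
    """Run every golden input through the model and return the pass rate in percent."""
    if not golden_set:
        raise ValueError("golden_set must not be empty")
    passed = sum(judge(model(ex.prompt), ex.golden_answer) for ex in golden_set)
    return 100.0 * passed / len(golden_set)


# Toy usage: a fake "model" that only knows one answer.
golden_set = [
    GoldenExample("capital of France?", "Paris is the capital of France"),
    GoldenExample("2+2?", "The answer is 4"),
]


def toy_model(prompt: str) -> str:
    known = {"capital of France?": "The capital of France is Paris"}
    return known.get(prompt, "I do not know")


score = run_eval(toy_model, golden_set, token_overlap_judge)
print(f"pass rate: {score:.1f}%")
```

Keeping the judge as a plain callable means the overlap heuristic can be swapped for a model-based judge without touching the scoring loop.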

  2. Tanu Mutreja

    Salesforce Senior Director, Product Management | Formerly Microsoft, AWS, Salesforce, New Relic, Sun Microsystems, Netscape • 5mo

    To test and validate the AI components of our product, we use a multi-layered strategy that combines rigorous pre-deployment evaluation with continuous post-deployment monitoring, because AI systems are non-deterministic and evolve with data and usage.

    We validate models through offline evaluation, using curated test sets, golden datasets, and regression benchmarks to check accuracy, robustness, and failure modes before anything reaches users. Note that we do not rely on public benchma ...
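One way to picture the regression-benchmark step before release is a simple gate that compares a candidate model's offline scores with the current baseline. This is a sketch under assumptions: the metric names, the scores, and the `find_regressions` helper are all hypothetical and are not taken from the answer.

```python
from typing import Dict


def find_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.01,
) -> Dict[str, float]:
    """Return the metrics where the candidate scores worse than the baseline
    by more than `tolerance`, mapped to the size of the drop."""
    regressions = {}
    for metric, base_score in baseline.items():
        if metric not in candidate:
            raise KeyError(f"candidate is missing metric: {metric}")
        drop = base_score - candidate[metric]
        if drop > tolerance:
            regressions[metric] = drop
    return regressions


# Hypothetical offline scores on a curated test set (higher is better).
baseline_scores = {"accuracy": 0.91, "robustness": 0.85}
candidate_scores = {"accuracy": 0.92, "robustness": 0.80}

regressions = find_regressions(baseline_scores, candidate_scores)
release_blocked = bool(regressions)
print("blocked by:", sorted(regressions) if release_blocked else "nothing")
```

The same comparison could be rerun on a schedule after deployment, which is one possible reading of the continuous monitoring the answer mentions.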

  3. Deepak Mukunthu

    Salesforce Senior Director of Product, Agentforce AI Platform • 2y

    Testing and validating the AI components of a product is crucial to ensure accuracy, reliability, and effectiveness. In addition to regular software testing, here are some strategies commonly used for AI components:

    Data Quality Assessment: Assess the quality, completeness, and relevance of the data used to train AI models. Verify that the data is representative of real-world scenarios and free from biases or inaccuracies that could affect model performance.

    Cross-Vali ...
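As a rough illustration of the data quality assessment described here, the sketch below checks a small labeled dataset for missing values, exact duplicates, and label balance. The field names, the sample rows, and the `data_quality_report` helper are assumptions made for the example, and label share is only a crude proxy for representativeness or bias.

```python
from collections import Counter
from typing import Any, Dict, List


def data_quality_report(
    rows: List[Dict[str, Any]],
    required_fields: List[str],
    label_field: str,
) -> Dict[str, Any]:
    """Summarize completeness, duplication, and label balance for a dataset."""
    # Completeness: count rows where a required field is missing or empty.
    missing = Counter()
    for row in rows:
        for field in required_fields:
            if row.get(field) in (None, ""):
                missing[field] += 1

    # Duplication: count rows that exactly repeat an earlier row.
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(sorted((k, str(v)) for k, v in row.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    # Balance: share of each label among rows that have a label.
    labels = Counter(
        row[label_field] for row in rows if row.get(label_field) not in (None, "")
    )
    labeled = sum(labels.values())
    label_share = {label: count / labeled for label, count in labels.items()}

    return {
        "rows": len(rows),
        "missing": dict(missing),
        "duplicates": duplicates,
        "label_share": label_share,
    }


# Hypothetical training rows for a sentiment model.
sample_rows = [
    {"text": "good product", "label": "pos"},
    {"text": "good product", "label": "pos"},
    {"text": "", "label": "neg"},
    {"text": "bad", "label": "neg"},
]

report = data_quality_report(sample_rows, ["text", "label"], "label")
print(report)
```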

