Sharebird

AMA: Atlassian Director | Senior Principal PM, Teamwork Collection, Natalie Chung on AI Product Management


December 9 @ 9:00AM PT


  1. What key skills should a Product Manager target to grow in Trust & Risk?

    Natalie Chung

    Atlassian Director | Senior Principal PM, Teamwork Collection • 6mo

    For a Product Manager, Trust & Risk is no longer just about compliance; for those building AI experiences, it is also about validating viability and quality. One key skill PMs should develop in this area is setting up evals: a systematic way to score AI outputs against a standard. It is also important to create a scoring rubric, build reference datasets (curated examples of ideal inputs and outputs that serve as ground truth), and create judge prompts for one AI ...

    3,727 Views
    1 request
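
A minimal sketch of the scoring rubric and judge prompt mentioned in the answer above, assuming a three-criterion rubric with integer weights and a 1 to 5 rating scale. The criterion names, the weights, and the `rubric_score` and `build_judge_prompt` helpers are illustrative assumptions; the answer does not specify any of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: int  # relative importance of this criterion


# Hypothetical rubric: names and weights are assumptions for illustration.
RUBRIC = (
    Criterion("accuracy", 5),
    Criterion("completeness", 3),
    Criterion("tone", 2),
)


def rubric_score(ratings, rubric=RUBRIC, max_rating=5):
    """Combine per-criterion ratings (1..max_rating) into one score, at most 1.0."""
    total_weight = sum(c.weight for c in rubric)
    weighted = sum(c.weight * ratings[c.name] for c in rubric)
    return weighted / (total_weight * max_rating)


def build_judge_prompt(question, reference, candidate, rubric=RUBRIC, max_rating=5):
    """Build the text a separate judge model would receive for one example."""
    criteria = "\n".join(f"- {c.name} (weight {c.weight})" for c in rubric)
    return (
        "You are grading an AI answer against a reference answer.\n"
        f"Rate each criterion from 1 to {max_rating}:\n{criteria}\n\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Candidate answer: {candidate}\n"
    )
```

In practice the ratings would come from a human reviewer or from the judge model's reply to the prompt; parsing that reply is left out here.
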
  2. How do you handle scenarios where the AI may not perform as expected?

    Natalie Chung

    Atlassian Director | Senior Principal PM, Teamwork Collection • 6mo

    Unexpected AI behaviour should be treated as a quality and risk problem rather than a traditional bug. Start by looking for patterns in the failures and sorting them into buckets such as retrieval gaps, instruction-following errors, and hallucinations. Then add representative cases to your evaluation set, so those scenarios are automatically tested every time your team refines data, prompts, or models. On the UX side, there are resilient designs to help ...

    1,662 Views
    1 request
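
The bucketing step described above can be sketched as follows. This is a rough illustration, assuming failures have already been labelled by hand; the `Failure` record, the bucket labels, and the helper names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    prompt: str
    bucket: str  # e.g. "retrieval_gap", "instruction_following", "hallucination"


def bucket_counts(failures):
    """Count how many observed failures fall into each bucket."""
    return Counter(f.bucket for f in failures)


def pick_representatives(failures, per_bucket=1):
    """Keep the first `per_bucket` failures seen in each bucket."""
    taken = Counter()
    chosen = []
    for failure in failures:
        if taken[failure.bucket] < per_bucket:
            chosen.append(failure)
            taken[failure.bucket] += 1
    return chosen


def extend_eval_set(eval_prompts, failures, per_bucket=1):
    """Return a new eval set with representative failing prompts appended once."""
    seen = set(eval_prompts)
    extended = list(eval_prompts)
    for failure in pick_representatives(failures, per_bucket):
        if failure.prompt not in seen:
            extended.append(failure.prompt)
            seen.add(failure.prompt)
    return extended
```

The extended set is then what gets re-run whenever data, prompts, or models change, so each known failure pattern stays covered.
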
  3. What strategies are you using for testing and validating the AI components of your product?

    Natalie Chung

    Atlassian Director | Senior Principal PM, Teamwork Collection • 6mo

    Examples of strategies to test and validate:

    Evals: Given the probabilistic nature of AI, manual testing at scale is impossible, so evals are crucial for validation. This includes setting up:

    - Golden output datasets: Create a static set of inputs with "perfect" human-written answers.
    - Automated judging: When updating your model, run those 100 inputs through it. Use a separate "Judge" model to compare the new output against the golden dataset.
    - Pass/fail metrics: This provides a percentage score (e ...

    924 Views
    1 request
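
The golden dataset, judge, and pass/fail loop above could look roughly like this. It is a sketch under stated assumptions: `generate` stands in for the model under test, `judge` stands in for the separate judge model, and `exact_match_judge` is a deliberately simple placeholder that compares normalized text rather than calling a model.

```python
def exact_match_judge(candidate, reference):
    """Placeholder judge: pass when the texts match after normalizing case and spacing."""
    def normalize(text):
        return " ".join(text.lower().split())
    return normalize(candidate) == normalize(reference)


def pass_rate(golden, generate, judge=exact_match_judge):
    """Run every golden input through `generate` and return the percentage that pass.

    `golden` is a list of (input, reference_answer) pairs.
    """
    if not golden:
        raise ValueError("golden dataset is empty")
    passed = sum(1 for prompt, reference in golden if judge(generate(prompt), reference))
    return 100.0 * passed / len(golden)


# Hypothetical golden set and a stubbed model, for illustration only.
GOLDEN = [
    ("capital of France?", "Paris"),
    ("2 + 2?", "4"),
    ("opposite of hot?", "cold"),
    ("first month of the year?", "January"),
]
STUB_ANSWERS = {
    "capital of France?": "paris",
    "2 + 2?": "4",
    "opposite of hot?": "Cold",
    "first month of the year?": "March",
}


def stub_model(prompt):
    return STUB_ANSWERS[prompt]


score = pass_rate(GOLDEN, stub_model)  # three of four answers match, so 75.0
```

Swapping `exact_match_judge` for a call to a real judge model keeps the loop unchanged, which makes it easy to compare scores before and after a model or prompt update.
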