Atlassian Director | Senior Principal PM, Teamwork Collection • 6mo
Examples of strategies to test and validate:

Evals: Given the probabilistic nature of AI, manual testing at scale is impossible, so evals are crucial for validation. This includes setting up:
Golden output datasets: Create a static set of inputs with "perfect" human-written answers.
Automated judging: When updating your model, run those 100 golden inputs through it, and use a separate "judge" model to compare each new output against the golden dataset.
Pass/fail metrics: This provides a percentage score (e ...
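The golden-dataset and automated-judging loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's actual setup: `GoldenExample`, `run_eval`, `toy_model`, and `toy_judge` are hypothetical names, and the toy judge is an exact-match placeholder standing in for a separate judge model (in practice, a call to another LLM with a grading prompt).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class GoldenExample:
    """One fixed input paired with its human-written 'perfect' answer."""
    prompt: str
    golden_answer: str


# Hypothetical signatures: in a real setup these would wrap calls to the
# model under test and to a separate judge model.
ModelFn = Callable[[str], str]
JudgeFn = Callable[[str, str, str], bool]  # (prompt, candidate, golden) -> passed?


@dataclass
class EvalReport:
    passed: int
    total: int
    failures: List[GoldenExample]

    @property
    def pass_rate(self) -> float:
        """Percentage of golden examples the judge marked as passing."""
        return 100.0 * self.passed / self.total if self.total else 0.0


def run_eval(dataset: List[GoldenExample], model: ModelFn, judge: JudgeFn) -> EvalReport:
    """Run every golden input through the model and let the judge grade each output."""
    passed = 0
    failures: List[GoldenExample] = []
    for ex in dataset:
        candidate = model(ex.prompt)
        if judge(ex.prompt, candidate, ex.golden_answer):
            passed += 1
        else:
            failures.append(ex)
    return EvalReport(passed=passed, total=len(dataset), failures=failures)


# --- Toy stand-ins (assumptions) so the sketch runs without any LLM API ---
def toy_model(prompt: str) -> str:
    # Hypothetical "updated model": answers only the prompts it knows.
    answers = {"Capital of France?": "Paris", "2 + 2?": "4"}
    return answers.get(prompt, "I don't know")


def toy_judge(prompt: str, candidate: str, golden: str) -> bool:
    # Placeholder for a judge model: case-insensitive exact match.
    return candidate.strip().lower() == golden.strip().lower()


if __name__ == "__main__":
    golden_set = [
        GoldenExample("Capital of France?", "Paris"),
        GoldenExample("2 + 2?", "4"),
        GoldenExample("Largest planet?", "Jupiter"),
    ]
    report = run_eval(golden_set, toy_model, toy_judge)
    print(f"Pass rate: {report.pass_rate:.1f}% ({report.passed}/{report.total})")
```

Run as a script, this prints `Pass rate: 66.7% (2/3)`. Keeping the model and judge as plain callables means the same static golden set can be rerun unchanged each time the model under test is updated, and the `failures` list shows which inputs regressed.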