🕵️‍♀️ How to set up Evaluations?
@Dara.network / Gooey.AI / support@gooey.ai
In this example scenario, we compare and evaluate the answer quality of several AI Copilots that share the same settings and functionality and differ only in the LLM they use.
Choose the “SAVED” run from Gooey.AI Workflows that you would like to use.
Prepare your golden QnA set:
Create a list of the most frequently asked questions for your AI Copilot (we recommend around 25 for good observability and regression testing; you can add more if you prefer)
Make sure the Excel sheet/Google Sheets table has a “header” row
Add all your questions and golden answers in the columns below it
You must provide the Golden Answers. Golden answers are the most suitable and accurate answers provided by humans with expertise on the subject.
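As a rough illustration of the sheet layout described above, the following Python sketch writes a small golden QnA file with a header row and one question/answer pair per row. The `golden_answer` column name, the `golden_qna.csv` file name, the helper `build_golden_csv`, and the sample rows are all assumptions made for this example; they are not names required by Gooey.AI.

```python
import csv
import io

# Hypothetical placeholder rows; replace them with your own questions
# and the answers written by your subject-matter experts.
GOLDEN_SET = [
    ("What are your support hours?", "Support is available on weekdays."),
    ("How do I reset my password?", "Use the reset link on the login page."),
]


def build_golden_csv(rows, header=("questions", "golden_answer")):
    """Return CSV text with a header row followed by one QnA pair per row."""
    for question, answer in rows:
        # Every question needs a human-written golden answer.
        if not question.strip() or not answer.strip():
            raise ValueError("every row needs both a question and a golden answer")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Write the file so it can be uploaded or imported into a spreadsheet.
    with open("golden_qna.csv", "w", newline="", encoding="utf-8") as f:
        f.write(build_golden_csv(GOLDEN_SET))
```

The resulting file can be uploaded directly or imported into Google Sheets before you paste the sheet link.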
Paste the link to your Google Sheet or upload your data
In the current scenario, we want to use the Gooey Copilot to answer all the questions in the Google Sheet, so they are essentially the “input” for the Bulk Workflow.
Select the “questions” column in the “Input Prompt” section.
Because this is a “Bulk and Eval” scenario, you can select the Copilot Evaluator option in the section. After that, hit the “Submit” button.
Note: We recommend using the “Copilot Evaluator” if you are evaluating Copilot Runs.
The workflow will create a new CSV with a few added columns based on the run, including “Output Text”, “Run URL”, and “Run Time”.
With the evaluation option, you will also get output for “Rationale”, “Compare Run Score”, etc. You will also get a Compare Chart that shows the aggregate scores.
Your output will be on the right side of the page.
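If you download the output CSV, a quick local summary can complement the Compare Chart. The sketch below averages one numeric column and skips blank or non-numeric cells. The helper `average_score`, the `Score` header, and the sample rows are hypothetical; check your own output CSV for the actual column name and whether its values are numeric.

```python
import csv
import io
from statistics import mean


def average_score(csv_text, score_column):
    """Average the numeric cells of score_column, skipping blank or non-numeric ones."""
    scores = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        cell = (row.get(score_column) or "").strip()
        try:
            scores.append(float(cell))
        except ValueError:
            # Blank or non-numeric cell: leave it out of the average.
            continue
    return mean(scores) if scores else None


# Hypothetical sample in the shape of an output CSV; the "Score" header is an assumption.
SAMPLE = """questions,Output Text,Score
Q1,A1,0.8
Q2,A2,0.6
Q3,A3,
"""

if __name__ == "__main__":
    print(average_score(SAMPLE, "Score"))
```

Returning `None` when no numeric cells are found keeps an empty or mislabeled column from being mistaken for a score of zero.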
Keep it simple - try to use an input spreadsheet with limited columns
Don't leave any empty cells in the second row - there is a bug, and a column with an empty cell there is not read
Make sure to name your “Saved” workflows with relatable titles so that it is easy to set up the run
We recommend collecting user messages from your saved copilot's “Analytics” section. Head to your copilot link > Integrations tab > View Analytics, scroll to the bottom, and export the “Messages” tab as a CSV.
The Bulk Runner reads only the first sheet of your Excel file or Google Sheet
For Google Sheets, you can move the relevant sheet to the first position and then re-enter the link in the Input section. IT WILL NOT REFRESH ON ITS OWN.
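Some of the tips above can be checked before uploading. The following sketch, assuming your data is exported as CSV text, reports columns that have no header and columns whose second-row cell is empty. The function name `preflight_check` and the message wording are illustrative only and are not part of Gooey.AI.

```python
import csv
import io


def preflight_check(csv_text):
    """Return a list of problems found in the header row and the second row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if len(rows) < 2:
        return ["sheet needs a header row and at least one data row"]

    header, second_row = rows[0], rows[1]
    problems = []
    for index, name in enumerate(header):
        label = name.strip() or f"#{index + 1}"
        if not name.strip():
            problems.append(f"column {label!r} has no header")
        # A missing trailing cell counts as empty too.
        cell = second_row[index] if index < len(second_row) else ""
        if not cell.strip():
            problems.append(f"column {label!r} is empty in the second row")
    return problems


if __name__ == "__main__":
    sample = "questions,golden_answer\nWhat are your support hours?,\n"
    for problem in preflight_check(sample):
        print(problem)
```

An empty list means the header and second row passed these two checks; it does not verify anything else about the sheet.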