@Dara.network / Gooey.AI / support@gooey.ai

How to set up Evaluations?

Last updated 4 months ago

In this example, we compare the quality of answers from several AI Copilots that share identical settings and functionality and differ only in the LLM they use.

Step 1: Select Gooey Workflows to evaluate

Choose the “SAVED” runs of the Gooey.AI Workflows you would like to evaluate.

Step 2: Input Data Spreadsheet

Prepare your golden QnA set:

  1. Create a list of the most frequently asked questions for your AI Copilot. We recommend about 25 questions for good observability and regression testing; you can add more if you prefer.

  2. Make sure the Excel sheet or Google Sheets table has a header row.

  3. Add all your questions and golden answers in the columns below the header. You must provide the golden answers: the most suitable and accurate answers, written by people with expertise in the subject.

  4. Paste the link to your Google Sheet or upload your data file.
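If you keep your golden set in code or another system, a minimal sketch like the one below can write it out as a CSV to upload in this step. It assumes two columns: `questions` (matching the column you select under “Input Prompt” in Step 3) and `golden_answer`, a purely illustrative name. The function `build_golden_qna_csv` is hypothetical, not part of Gooey.AI, and the sample rows are placeholders.

```python
import csv
import io

# Assumed header: "questions" matches the column selected in Step 3;
# "golden_answer" is an illustrative name, not a required one.
HEADER = ["questions", "golden_answer"]


def build_golden_qna_csv(pairs: list[tuple[str, str]]) -> str:
    """Return CSV text with a header row followed by one row per QnA pair."""
    for question, answer in pairs:
        # Every question needs a human-written golden answer, and no cell may be empty.
        if not question.strip() or not answer.strip():
            raise ValueError(f"Empty question or golden answer: {(question, answer)!r}")
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(HEADER)
    writer.writerows(pairs)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder rows; replace them with your own questions and expert answers.
    golden = [
        ("What are your opening hours?", "<expert-written answer>"),
        ("How do I reset my password?", "<expert-written answer>"),
    ]
    print(build_golden_qna_csv(golden), end="")
```

Redirecting the printed output to a file (for example `python build_golden.py > golden_qna.csv`) gives a single-sheet CSV with the header in the first row.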

Step 3: Select your input columns

In this scenario, we want the Gooey Copilot to answer every question in the Google Sheet, so the questions are the “input” for the Bulk Workflow.

Select the “questions” column in the “Input Prompt” section.

Step 4: Hit Submit

Because this is a “Bulk and Eval” scenario, select the “Copilot Evaluator” option, then hit the “Submit” button.

Note: We recommend the “Copilot Evaluator” whenever you are evaluating Copilot runs.

Output

The workflow creates a new CSV that adds a few columns for each run, including “Output Text”, “Run URL”, and “Run Time”.

With the evaluation option enabled, the output also includes columns such as “Rationale” and “Compare Run Score”, plus a Compare Chart that shows the aggregate scores.

Your output appears on the right side of the page.
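The Compare Chart already shows aggregate scores. If you download the output CSV and want to recompute an average yourself, a sketch like the following could work. It assumes the score column header is exactly “Compare Run Score”, as named above; the real header in your export may differ (for example, one score column per compared run), so check it and pass the right name. `average_score` is a hypothetical helper, not a Gooey.AI API.

```python
import csv
import io
from statistics import mean
from typing import Optional


def average_score(csv_text: str, score_column: str = "Compare Run Score") -> Optional[float]:
    """Average the numeric values in score_column, skipping blank or non-numeric cells.

    The default column name is an assumption taken from the guide's description
    of the output; verify it against the header of your own export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or score_column not in reader.fieldnames:
        raise KeyError(f"Column {score_column!r} not found")
    scores = []
    for row in reader:
        try:
            scores.append(float(row[score_column]))
        except (TypeError, ValueError):
            continue  # blank, missing, or non-numeric cell
    return mean(scores) if scores else None
```

If your export has one score column per compared copilot, call the function once per column name to compare the LLMs side by side.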

Best Practices

  • Keep it simple: use an input spreadsheet with only the columns you need

  • Don’t leave any empty cells in the second row (the first row below the header). Due to a known bug, a column with an empty cell there is not read

  • Give your saved workflows recognizable titles so that they are easy to find when setting up the run

  • To build your question set, we recommend collecting real user messages from your saved copilot's “Analytics” section. Open your copilot link > Integrations tab > View Analytics, scroll to the bottom, and export the CSV from the “Messages” tab.
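Once you have exported the “Messages” CSV, a small script can surface the most frequently asked questions for your golden set. This is a sketch under assumptions: the name of the column holding the user's text is not documented here, so it is a parameter (the `"User Message"` value in the usage line is hypothetical), and the normalization (lowercasing, collapsing whitespace) is one simple choice among many. `top_questions` is not a Gooey.AI function.

```python
import csv
import io
from collections import Counter


def top_questions(messages_csv: str, message_column: str, limit: int = 25) -> list[tuple[str, int]]:
    """Return the `limit` most frequent user messages with their counts.

    message_column is whatever header your "Messages" export uses for the
    user's text; it is an assumption here, so check your file.
    """
    reader = csv.DictReader(io.StringIO(messages_csv))
    if reader.fieldnames is None or message_column not in reader.fieldnames:
        raise KeyError(f"Column {message_column!r} not found")
    counts = Counter()
    for row in reader:
        text = (row.get(message_column) or "").strip()
        if text:
            # Lowercase and collapse whitespace so trivial variants are counted together.
            counts[" ".join(text.lower().split())] += 1
    return counts.most_common(limit)
```

For example, `top_questions(open("messages.csv", encoding="utf-8").read(), "User Message")` would return up to 25 `(question, count)` pairs. You still need subject-matter experts to write the golden answers for them.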

Note:

  • Bulk Runner only reads the first sheet of your Excel file or Google Sheet

  • For Google Sheets, move the relevant sheet to the first position, then re-enter the link in the Input section. The input WILL NOT REFRESH ON ITS OWN.
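Before uploading, you can catch the issues listed in the best practices above with a quick local check on a CSV export of your first sheet. The sketch below flags a missing data row, empty cells in the second row, and more columns than an arbitrary limit. The `max_columns=3` default is an illustrative threshold, not a Gooey.AI setting, and `check_bulk_input` is a hypothetical helper.

```python
import csv
import io


def check_bulk_input(csv_text: str, max_columns: int = 3) -> list[str]:
    """Return warnings for a Bulk Runner input CSV; an empty list means no issues found.

    max_columns is an arbitrary illustrative limit, not a Gooey.AI setting.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if len(rows) < 2:
        return ["Need a header row plus at least one data row."]
    header, first_data_row = rows[0], rows[1]
    warnings = []
    if len(header) > max_columns:
        warnings.append(f"{len(header)} columns; consider keeping only the ones you need.")
    # Pad short rows so a missing trailing cell is reported as empty.
    padded = first_data_row + [""] * (len(header) - len(first_data_row))
    for name, value in zip(header, padded):
        if not value.strip():
            warnings.append(f"Column {name!r} is empty in row 2 and may not be read.")
    return warnings


if __name__ == "__main__":
    sample = "questions,golden_answer\nWhat are your hours?,\n"
    for warning in check_bulk_input(sample):
        print(warning)
```

Because Bulk Runner reads only the first sheet, run the check on an export of that sheet specifically.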


Check out the example run here: Evaluation only