@Dara.network / Gooey.AI / support@gooey.ai

Prepare Synthetic Data

Learn how to add AI-synthesized data to improve your AI Copilot results

Last updated 4 months ago


Why do you need synthetic data for AI Copilot?

AI Copilots often need to respond succinctly and answer in an FAQ style.

When you use direct transcriptions of video tutorials and recordings as your knowledge base documents, you will need some synthetic data to ensure that the user's query is recognized and answered correctly. Transcripts of video or audio tutorials often contain very long sentences and have a casual, conversational tone. This can hinder the AI Copilot's ability to search and summarize. For this, we recommend using our Synthetic Document Extractor workflow.

Features of Synthetic Document Extractor

  1. Extract information from videos for various purposes.

  2. Collect lists of YouTube videos or PDFs for data crunching.

  3. Update Google Sheets as the workflow progresses.

LINK TO WORKFLOW: https://gooey.ai/doc-extract

Step 1: Create a New Google Sheet

Create a new, empty Google Sheet to store your extracted data. Set the access permissions to "Anyone with the link can edit."

Step 2: Enter Raw Data links

What will work:

  • Hosted video and audio links

  • YouTube links

  • PDFs (scanned PDFs that need OCR and PDFs with tabulated data will work)

PRO TIP: If you paste the link to a Google Drive folder containing your docs or PDFs, you should immediately see all the files in the folder.
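Before pasting a long list of links, it can help to sort them by type. The sketch below is a rough pre-check and not part of the Synthetic Document Extractor: the function name `classify_link` and the list of media file extensions are assumptions made for illustration, and the workflow itself may accept links that this check labels "unknown".

```python
from urllib.parse import urlparse

# Assumed set of common audio/video extensions; adjust to your own files.
MEDIA_EXTENSIONS = (".mp4", ".mov", ".webm", ".mp3", ".wav", ".m4a")
YOUTUBE_HOSTS = ("youtu.be", "youtube.com", "www.youtube.com", "m.youtube.com")


def classify_link(url: str) -> str:
    """Return 'youtube', 'pdf', 'media', or 'unknown' for an input link."""
    parsed = urlparse(url.strip())
    host = parsed.netloc.lower()
    path = parsed.path.lower()
    if host in YOUTUBE_HOSTS:
        return "youtube"
    if path.endswith(".pdf"):
        return "pdf"
    if path.endswith(MEDIA_EXTENSIONS):
        return "media"
    return "unknown"
```

Links reported as "unknown" are worth opening in a browser to confirm that they point to a publicly reachable file.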

Step 3: Add instructions

Open the settings tabs and add the relevant instructions for the synthetic data conversion. Example below:

You are a JavaScript tutor. Read the video training transcripts and create properly formatted output with sections under the following headings:

Provide a short and succinct title, with an additional delimiter at the title's end.

Description: provide a short summary of the video as a description.

Facts: succinctly and accurately list all the facts from the transcription that will be useful to the students; don't self-reference the video.

FAQs: think about the questions that students would ask for this JavaScript course based on the transcripts.

Remember the notes below:

  • make a comprehensive set of questions and answers based on the transcript

  • avoid repetitions

  • avoid self referencing the course

  • don't make up questions and answers beyond contents of the transcript
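A prompt like the one above can also be assembled programmatically, which helps when the same structure is reused across several courses. The sketch below is only illustrative: `build_instructions` and its parameters are hypothetical names, not part of Gooey.AI, and the resulting text would still be pasted into the instructions field by hand.

```python
def build_instructions(role, sections, notes):
    """Assemble an instruction prompt from a role, section guidance, and notes.

    role     -- e.g. "JavaScript tutor"
    sections -- list of (heading, guidance) pairs, e.g. ("Facts", "list all the facts")
    notes    -- list of short reminders appended as a dashed list
    """
    lines = [
        f"You are a {role}. Read the video training transcripts and "
        "create output with sections under the following headings:"
    ]
    lines += [f"{heading}: {guidance}" for heading, guidance in sections]
    lines.append("Remember the notes below:")
    lines += [f"- {note}" for note in notes]
    return "\n".join(lines)


prompt = build_instructions(
    "JavaScript tutor",
    [
        ("Description", "provide a short summary of the video"),
        ("Facts", "list all the facts that will be useful to the students"),
        ("FAQs", "think about the questions that students would ask"),
    ],
    ["avoid repetitions", "avoid self-referencing the course"],
)
```

Keeping the headings and notes in separate lists makes it easy to swap the role or add a section without retyping the whole prompt.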

Step 4: Select a Model

Choose a language model for the synthetic data extraction from the available options.

Step 5: Select ASR Model

Choose the ASR (automatic speech recognition) model that works best for your audio.

Step 6: Hit SUBMIT

Hit "Submit." The tool will prepare the sheet and update it in real time. It will then auto-populate all the needed information, along with a transcription.
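The six steps can be summarized as a checklist to run through before hitting "Submit." The sketch below is a minimal illustration, assuming the five inputs described in Steps 1 to 5; the class `ExtractorRun` and its field names are hypothetical and do not correspond to Gooey.AI's own settings or API.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractorRun:
    """Hypothetical container for the inputs gathered in Steps 1 to 5."""
    sheet_url: str = ""
    links: list[str] = field(default_factory=list)
    instructions: str = ""
    language_model: str = ""
    asr_model: str = ""

    def missing_steps(self) -> list[str]:
        """Return a reminder for every step that still looks incomplete."""
        problems = []
        if "docs.google.com/spreadsheets" not in self.sheet_url:
            problems.append("Step 1: add a link to an editable Google Sheet")
        if not self.links:
            problems.append("Step 2: enter at least one raw data link")
        if not self.instructions.strip():
            problems.append("Step 3: add instructions")
        if not self.language_model:
            problems.append("Step 4: select a language model")
        if not self.asr_model:
            problems.append("Step 5: select an ASR model")
        return problems
```

An empty list from `missing_steps()` means every input has been filled in; it does not check that the sheet is actually shared with edit access.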

Harnessing Additional Functions

The Synthetic Document Extractor workflow lets you process every video in a YouTube playlist through a single playlist link, producing transcriptions for the whole playlist. Alternatively, you can manually enter a list of videos for specific transcription tasks.
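To tell a playlist link apart from a single-video link before submitting, one option is to look for the `list` query parameter that YouTube playlist URLs carry. This is a small sketch for checking your own inputs; the helper name `is_youtube_playlist` is hypothetical, and how the workflow itself detects playlists is not documented here.

```python
from urllib.parse import urlparse, parse_qs

YOUTUBE_HOSTS = ("youtu.be", "youtube.com", "www.youtube.com", "m.youtube.com")


def is_youtube_playlist(url: str) -> bool:
    """Return True if the link is a YouTube URL with a 'list' query parameter."""
    parsed = urlparse(url.strip())
    if parsed.netloc.lower() not in YOUTUBE_HOSTS:
        return False
    # parse_qs maps each query parameter name to a list of its values.
    return "list" in parse_qs(parsed.query)
```

A link to a video that was opened from within a playlist also carries a `list` parameter, so this check treats it as a playlist link too.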

Note: Adding new data on the same sheet may overwrite the saved information.

The tool works best for content shorter than 30 to 40 minutes, due to text length limits in Google Sheets.
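A back-of-the-envelope estimate shows where a limit in this range could come from. Google Sheets allows at most 50,000 characters in a single cell. The speaking rate (150 words per minute) and average word length (6 characters including the space) used below are assumptions, as is the idea that a full transcript lands in one cell, so treat the result as a rough guide only.

```python
SHEETS_CELL_CHAR_LIMIT = 50_000  # Google Sheets limit per cell
WORDS_PER_MINUTE = 150           # assumed average speaking rate
CHARS_PER_WORD = 6               # assumed average, including the trailing space


def estimated_transcript_chars(minutes: float) -> int:
    """Estimate the number of characters in a transcript of the given length."""
    return int(minutes * WORDS_PER_MINUTE * CHARS_PER_WORD)


def fits_in_one_cell(minutes: float) -> bool:
    """Return True if the estimated transcript fits in a single Sheets cell."""
    return estimated_transcript_chars(minutes) <= SHEETS_CELL_CHAR_LIMIT
```

Under these assumptions a 40-minute recording comes to about 36,000 characters and fits, while a 60-minute recording comes to about 54,000 characters and does not. Faster speakers reach the limit sooner, so splitting long recordings is the safer choice.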

Transcription Bonus: Extract Data from PDFs

This tool also supports extracting data from PDFs. Simply paste the link to a publicly accessible PDF in the input and hit "Submit." As with videos, it will extract the important data from your document while updating a Google Sheet in real time.

Tutorial available here:

Synthetic Document Extractor workflow: https://gooey.ai/doc-extract