
Knowledge Ingestion & RAG

Learn how to upload documents, crawl websites recursively, and configure retrieval settings.

Chatzora uses Retrieval-Augmented Generation (RAG) to provide accurate, context-bound responses. The chatbot is designed not to speculate: it answers user queries using only the materials you provide.
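The retrieve-then-answer flow behind RAG can be sketched in a few lines of Python. This is a minimal illustration, not Chatzora's implementation: it assumes your content has already been split into chunks and embedded, and the hand-made 2-D vectors and the helper names `cosine`, `top_k`, and `build_prompt` are hypothetical. A production system would get its vectors from an embedding model.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 2):
    """Score every (text, vector) chunk against the query and keep the k best."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

def build_prompt(question: str, retrieved: list[tuple[float, str]]) -> str:
    """Ground the model: only the retrieved chunk texts are offered as context."""
    context = "\n\n".join(text for _, text in retrieved)
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Toy corpus with hypothetical hand-made "embeddings"; real vectors come from a model.
chunks = [
    ("Refunds are processed within 5 days.", [1.0, 0.0]),
    ("We ship worldwide.", [0.0, 1.0]),
    ("Refunds require a receipt.", [0.9, 0.1]),
]
retrieved = top_k([1.0, 0.0], chunks, k=2)
print(build_prompt("How long do refunds take?", retrieved))
```

Only the two refund chunks reach the prompt; the unrelated shipping chunk is left out, which is what keeps the answer bound to your material.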

Supported Content Formats

You can feed your bot a variety of data types:

  1. PDF Documents: Ideal for product manuals, white papers, and long-form FAQs.
  2. Microsoft Word (.docx): Fully supported. Text is extracted from the document's underlying XML by a custom pipeline built on the mammoth library.
  3. Plain Text (.txt): Great for copy-pasting unstructured logs, instructions, or notes.
  4. Website URLs: Add single URLs or crawl entire domains.
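A .docx file is a zip archive whose body text sits in `word/document.xml`. The sketch below uses only the Python standard library to show that idea; it is not Chatzora's mammoth-based pipeline, and the function name `docx_paragraphs` is hypothetical. Mammoth also handles styles, lists, and tables, which this sketch ignores.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of WordprocessingML, the XML vocabulary inside a .docx archive.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_paragraphs(data: bytes) -> list[str]:
    """Return the non-empty paragraph texts from an uploaded .docx file's bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # The main body text lives in word/document.xml inside the zip container.
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W_NS}p"):  # w:p = paragraph
        # A paragraph's text is split across w:t runs; concatenate them in order.
        text = "".join(node.text or "" for node in para.iter(f"{W_NS}t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs
```

Joining the runs matters because Word often splits a single sentence across several `w:t` elements, for example wherever formatting changes mid-sentence.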

Recursive Website Scraper

Our scraper automatically strips navigation headers, sidebars, cookie notices, and footers to ensure only meaningful main-article text is indexed.
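One simple way to strip page chrome is to drop all text nested inside layout elements. The sketch below uses Python's standard `html.parser`; the `SKIP_TAGS` set and the names `MainTextExtractor` and `extract_main_text` are hypothetical, and Chatzora's actual heuristics are not documented here. Cookie notices usually have no dedicated tag, so a real scraper would also need class or id heuristics, which this sketch omits.

```python
from html.parser import HTMLParser

# Hypothetical skip list: elements treated as page chrome rather than article text.
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style"}

class MainTextExtractor(HTMLParser):
    """Collects text nodes, ignoring everything nested inside a SKIP_TAGS element."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # > 0 while inside one or more skipped elements
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def extract_main_text(html: str) -> str:
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

page = (
    "<html><body><header>Logo Menu</header>"
    "<nav><a href='/'>Home</a> <a href='/pricing'>Pricing</a></nav>"
    "<main><h1>Setup Guide</h1><p>Install the widget.</p></main>"
    "<footer>Contact us</footer></body></html>"
)
print(extract_main_text(page))  # Setup Guide Install the widget.
```

A depth counter, rather than a simple on/off flag, keeps nested skipped elements (such as a `nav` inside a `header`) from ending the skip too early.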

Crawl Depth Levels

When you input a URL to crawl, you can specify the depth limit:

  • Depth 1: Scrapes only the exact URL entered.
  • Depth 2: Scrapes the entered URL plus every same-origin page linked from it.
  • Depth 3: Scrapes everything covered by Depth 2, plus every same-origin page linked from those sub-pages.

[!TIP] The crawler only follows links that share the origin (scheme, host, and port) of the URL you enter, e.g. https://yoursite.com. Links to external sites such as Twitter or Google are ignored automatically, so unrelated domains are never indexed.
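The depth levels and the same-origin rule above amount to a breadth-first crawl that stops expanding links once the depth limit is reached. Here is a minimal Python sketch under that reading; `get_links` stands in for a hypothetical fetcher that downloads a page and returns its raw `href` values, and the small in-memory `site` dictionary replaces real HTTP requests.

```python
from collections import deque
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlsplit

def origin(url: str) -> tuple[str, str]:
    """A URL's origin: its scheme plus host[:port]."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc.lower()

def crawl(start_url: str, get_links: Callable[[str], list[str]], max_depth: int) -> list[str]:
    """Breadth-first, depth-limited, same-origin crawl.

    Depth 1 visits only start_url; each extra level follows links one hop further.
    get_links is a hypothetical fetcher returning the raw href values found on a page.
    """
    start_origin = origin(start_url)
    seen = {start_url}
    visited: list[str] = []
    queue = deque([(start_url, 1)])
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # depth limit reached: index this page but do not follow its links
        for href in get_links(url):
            link = urldefrag(urljoin(url, href)).url  # resolve relative hrefs, drop #fragments
            if origin(link) != start_origin or link in seen:
                continue  # ignore external domains and pages already queued
            seen.add(link)
            queue.append((link, depth + 1))
    return visited

# Hypothetical three-level site; the Twitter link is external and gets skipped.
site = {
    "https://yoursite.com/": ["/docs", "https://twitter.com/yoursite", "/about#team"],
    "https://yoursite.com/docs": ["/docs/setup", "/"],
    "https://yoursite.com/docs/setup": ["/docs/setup/advanced"],
}
fetch = lambda url: site.get(url, [])
print(crawl("https://yoursite.com/", fetch, max_depth=2))
```

With `max_depth=2` this visits the home page, `/docs`, and `/about`; raising it to 3 adds `/docs/setup`. The `seen` set stops back-links such as `/` from being crawled twice.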


Similarity Threshold & Fallbacks

To keep the bot from answering questions your content does not cover, configure these options in your bot's LLM Settings:

  1. RAG Similarity Threshold Slider: Set a minimum similarity score between 0.00 and 1.00.
    • If the similarity score of the closest document chunk falls below this threshold (e.g., 0.75), the bot treats the question as one it cannot answer and replies with your custom fallback response.
  2. Custom Fallback Response: Define the exact text to display if the threshold is not met (e.g., "I'm sorry, I don't have that information in my knowledge base. Would you like me to connect you to a human support agent?").
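The threshold and fallback settings work together as a gate in front of generation. This Python sketch assumes retrieval has already produced `(similarity, text)` pairs; `respond` and the `generate` callback (standing in for the LLM call) are hypothetical names. Two design choices are also assumptions: a score exactly equal to the threshold counts as a match, since the setting triggers when the score "falls below" it, and only chunks that clear the threshold are passed on as context.

```python
from typing import Callable

FALLBACK = (
    "I'm sorry, I don't have that information in my knowledge base. "
    "Would you like me to connect you to a human support agent?"
)

def respond(
    scored_chunks: list[tuple[float, str]],
    generate: Callable[[list[str]], str],
    threshold: float = 0.75,
    fallback: str = FALLBACK,
) -> str:
    """Gate generation on the best chunk's similarity score.

    scored_chunks holds (similarity, text) pairs from retrieval; generate is a
    hypothetical LLM call that receives only the chunks that clear the threshold.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.00 and 1.00")
    best = max((score for score, _ in scored_chunks), default=0.0)
    if not scored_chunks or best < threshold:
        return fallback  # closest chunk is not similar enough: do not guess
    context = [text for score, text in scored_chunks if score >= threshold]
    return generate(context)
```

Because the fallback is returned before `generate` is ever called, a below-threshold query never reaches the model.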

Triage & Review Queue

When a visitor's question is blocked by the threshold or receives a low confidence score, the interaction is flagged for review.

  1. Go to Review Queue in the dashboard.
  2. Review unanswered or low-confidence visitor queries.
  3. Type the correct answer directly into the triage panel and click Approve & Index.
  4. The system immediately generates embeddings for the approved answer and adds them to the knowledge database, so the chatbot can retrieve that answer for similar future questions.
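The review loop can be modeled as a queue of flagged questions plus an index that embeds approved answers on the spot. This Python sketch is a hypothetical in-memory model: `FlaggedQuery`, `KnowledgeIndex`, `flag_if_needed`, and `approve_and_index` are illustrative names, `embed` stands in for a real embedding model, and only the threshold trigger is modeled, since how the low confidence score is computed is not specified here.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class FlaggedQuery:
    question: str
    best_score: float  # similarity of the closest chunk when the query was flagged

@dataclass
class KnowledgeIndex:
    embed: Callable[[str], list[float]]  # hypothetical embedding function
    entries: list[tuple[list[float], str]] = field(default_factory=list)

    def add(self, text: str) -> None:
        # Embed the new text right away so it is searchable on the next query.
        self.entries.append((self.embed(text), text))

def flag_if_needed(queue: list[FlaggedQuery], question: str, best_score: float, threshold: float) -> None:
    """Queue a visitor question for review when retrieval could not answer it."""
    if best_score < threshold:
        queue.append(FlaggedQuery(question, best_score))

def approve_and_index(queue: list[FlaggedQuery], item: FlaggedQuery, answer: str, index: KnowledgeIndex) -> None:
    """Mimic 'Approve & Index': drop the item from the queue and index the Q&A pair."""
    queue.remove(item)
    index.add(f"Q: {item.question}\nA: {answer}")

# Toy run: the length-based "embedding" is a stand-in for a real model.
index = KnowledgeIndex(embed=lambda text: [float(len(text))])
queue: list[FlaggedQuery] = []
flag_if_needed(queue, "Do you ship to Canada?", best_score=0.41, threshold=0.75)
flag_if_needed(queue, "What are your hours?", best_score=0.92, threshold=0.75)
approve_and_index(queue, queue[0], "Yes, we ship to Canada.", index)
```

Storing the question together with the approved answer gives the new chunk wording close to how visitors actually ask, which tends to help it score above the threshold next time.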