Knowledge Ingestion & RAG
Learn how to upload documents, crawl websites recursively, and configure retrieval settings.
Chatzora uses Retrieval-Augmented Generation (RAG) to provide accurate, context-bound responses. Rather than speculating, the chatbot answers user queries from the materials you provide.
Supported Content Formats
You can feed your bot a variety of data types:
- PDF Documents: Ideal for product manuals, white papers, and long-form FAQs.
- Microsoft Word (.docx): Fully supported via our custom mammoth XML text extraction pipeline.
- Plain Text (.txt): Great for copy-pasting unstructured logs, instructions, or notes.
- Website URLs: Add single URLs or crawl entire domains.
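As a rough illustration of how an upload might be routed to the right extractor, the sketch below classifies a source by URL scheme or file extension. The function name `classify_source` and the route labels are hypothetical, chosen for this example only; they are not Chatzora's actual API.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical mapping from file extension to an ingestion route.
EXTENSION_ROUTES = {".pdf": "pdf", ".docx": "docx", ".txt": "text"}


def classify_source(source: str) -> str:
    """Return the ingestion route for a file name or URL, or raise ValueError."""
    parsed = urlparse(source)
    # Anything with an http(s) scheme and a host is treated as a website URL.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    # Otherwise fall back to a case-insensitive extension lookup.
    suffix = PurePosixPath(source).suffix.lower()
    try:
        return EXTENSION_ROUTES[suffix]
    except KeyError:
        raise ValueError(f"Unsupported source: {source!r}") from None
```

Under these assumptions, `classify_source("manual.PDF")` returns `"pdf"`, while an unsupported file such as `image.png` raises `ValueError`.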
Recursive Website Scraper
Our scraper automatically strips navigation headers, sidebars, cookie notices, and footers to ensure only meaningful main-article text is indexed.
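The idea of stripping page chrome can be sketched with Python's standard `html.parser`. This is a minimal illustration, not the scraper's real implementation: the set of skipped tags is an assumption, and cookie notices (which are usually plain `div` elements identified by class or id) would need extra rules that are not shown here.

```python
from html.parser import HTMLParser

# Assumption: these tags hold page chrome rather than article content.
SKIPPED_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collects text that is not nested inside any skipped tag."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # how many skipped tags currently enclose us
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped regions.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def extract_main_text(html: str) -> str:
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Feeding it a page with a `nav`, a `footer`, and a `main` section returns only the text inside `main`.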
Crawl Depth Levels
When you input a URL to crawl, you can specify the depth limit:
- Depth 1: Scrapes only the exact URL entered.
- Depth 2: Scrapes the entered URL and any link found on that page that matches the same origin.
- Depth 3: Scrapes the entered URL, its same-origin links, and the same-origin links found on those pages.
[!TIP] Ensure your website's origin matches the target URL (e.g. https://yoursite.com). Links to external sites like Twitter or Google are ignored automatically to prevent indexing unrelated domains.
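The depth levels and the same-origin rule can be sketched as a breadth-first crawl. This is an illustrative sketch only: `crawl`, `same_origin`, and the `get_links` callback (which stands in for fetching a page and extracting its links) are hypothetical names, and comparing scheme plus host is an assumed definition of "same origin".

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urljoin, urlparse


def same_origin(a: str, b: str) -> bool:
    """Assumed rule: two URLs share an origin if scheme and host (with port) match."""
    pa, pb = urlparse(a), urlparse(b)
    return (pa.scheme, pa.netloc) == (pb.scheme, pb.netloc)


def crawl(
    start_url: str,
    max_depth: int,
    get_links: Callable[[str], Iterable[str]],
) -> list[str]:
    """Breadth-first crawl. Depth 1 visits only the start URL."""
    seen = {start_url}
    visited: list[str] = []
    queue = deque([(start_url, 1)])
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # do not follow links from pages at the depth limit
        for href in get_links(url):
            # Resolve relative links and drop "#fragment" suffixes.
            link, _ = urldefrag(urljoin(url, href))
            if link not in seen and same_origin(start_url, link):
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

With a start page that links to `/docs`, `/pricing`, and an external Twitter profile, a depth of 2 would visit the start page, `/docs`, and `/pricing`, and skip the Twitter link.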
Similarity Threshold & Fallbacks
To minimize hallucinated answers, go to your bot's LLM Settings:
- RAG Similarity Threshold Slider: Configure a minimum similarity score (from 0.00 to 1.00).
  - If the similarity of the closest document chunk falls below this threshold (e.g., 0.75), the bot will determine it does not have the answer.
- Custom Fallback Response: Define the exact text to display if the threshold is not met (e.g., "I'm sorry, I don't have that information in my knowledge base. Would you like me to connect you to a human support agent?").
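The threshold check can be sketched as follows. This is a minimal illustration under stated assumptions: the source does not say which similarity metric is used, so cosine similarity is assumed here, and `retrieve_or_fallback`, the `(text, vector)` chunk layout, and the tiny two-dimensional vectors are all hypothetical.

```python
import math
from typing import Sequence

FALLBACK = "I'm sorry, I don't have that information in my knowledge base."


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_or_fallback(
    query_vec: Sequence[float],
    chunks: Sequence[tuple[str, Sequence[float]]],
    threshold: float = 0.75,
    fallback: str = FALLBACK,
) -> tuple[str, float]:
    """Return (best chunk text, score), or (fallback, score) below the threshold."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    if not scored:
        return fallback, 0.0
    best_score, best_text = max(scored, key=lambda pair: pair[0])
    if best_score < threshold:
        # The closest chunk is not close enough: answer with the fallback text.
        return fallback, best_score
    return best_text, best_score
```

A higher threshold makes the bot fall back more often, while a lower one lets it answer from looser matches, so the value is a trade-off between coverage and precision.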
Triage & Review Queue
When a visitor's question is blocked by the threshold or receives a low confidence score, the interaction is flagged.
- Go to Review Queue in the dashboard.
- Review unanswered or low-confidence visitor queries.
- Type the correct answer directly into the triage panel and click Approve & Index.
- The system immediately creates new embeddings and updates the knowledge database, so the chatbot can use the approved answer for future questions.
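The flag, review, and index flow above can be sketched with an in-memory store. Everything here is a hypothetical illustration: the `KnowledgeBase` and `ReviewQueue` classes, the pluggable `embed` function, and the choice to index the question and answer together as one entry are assumptions, not a description of Chatzora's internals.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class KnowledgeBase:
    # Assumption: any function that turns text into a vector can be plugged in.
    embed: Callable[[str], list[float]]
    entries: list[tuple[str, list[float]]] = field(default_factory=list)

    def index(self, text: str) -> None:
        """Embed the text and store it alongside its vector."""
        self.entries.append((text, self.embed(text)))


@dataclass
class ReviewQueue:
    kb: KnowledgeBase
    pending: dict[int, str] = field(default_factory=dict)
    _next_id: int = 1

    def flag(self, question: str) -> int:
        """Queue an unanswered or low-confidence question for review."""
        item_id = self._next_id
        self._next_id += 1
        self.pending[item_id] = question
        return item_id

    def approve_and_index(self, item_id: int, answer: str) -> None:
        """Remove the item from the queue and index the reviewed answer."""
        question = self.pending.pop(item_id)
        self.kb.index(f"Q: {question}\nA: {answer}")
```

Because the new entry is embedded at approval time, it is available to retrieval as soon as `approve_and_index` returns, with no separate retraining step in this sketch.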