HyDE — Hypothetical Document Embeddings
An LLM generates a hypothetical answer to the user query, and the embedding of that answer is used for vector retrieval. Applied specifically to article-type entries (item_type = article) to improve semantic matching for long-form content.
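A minimal sketch of the HyDE step, assuming the LLM and the embedding model are passed in as plain callables. The names `hyde_query_vector`, `generate`, and `embed`, the prompt wording, and the stub functions are all hypothetical and only make the sketch runnable offline.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def hyde_query_vector(
    query: str,
    generate: Callable[[str], str],  # hypothetical LLM call
    embed: Callable[[str], Vector],  # hypothetical embedding call
) -> Vector:
    """Embed an LLM-written hypothetical answer instead of the raw query."""
    prompt = f"Write a short passage that answers: {query}"
    hypothetical_answer = generate(prompt)
    return embed(hypothetical_answer)


# Stub callables standing in for real model calls (assumptions).
def fake_generate(prompt: str) -> str:
    return "The filter cartridge should be replaced every six months."


def fake_embed(text: str) -> Vector:
    return [float(len(text)), float(text.count(" "))]


vector = hyde_query_vector("How often do I change the filter?", fake_generate, fake_embed)
```

The returned vector would then be used in place of the query embedding for the article-type search.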
Hybrid Search + RRF
Parallel pgvector cosine similarity search and PostgreSQL ILIKE keyword search, merged via Reciprocal Rank Fusion (k=60). The default hybrid_weight=0.3 favors semantic search and is tunable per bot.
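A sketch of weighted Reciprocal Rank Fusion with k=60. How hybrid_weight enters the formula is not stated above; this sketch assumes it is the weight of the keyword list and that the semantic list gets 1 - hybrid_weight, which is consistent with 0.3 favoring semantic search. The function name `rrf_merge` is hypothetical.

```python
def rrf_merge(semantic_ids, keyword_ids, k=60, hybrid_weight=0.3):
    """Fuse two ranked ID lists; each hit contributes weight / (k + rank)."""
    scores = {}
    # Assumption: semantic results carry weight (1 - hybrid_weight).
    for rank, item_id in enumerate(semantic_ids, start=1):
        scores[item_id] = scores.get(item_id, 0.0) + (1 - hybrid_weight) / (k + rank)
    # Assumption: keyword results carry weight hybrid_weight.
    for rank, item_id in enumerate(keyword_ids, start=1):
        scores[item_id] = scores.get(item_id, 0.0) + hybrid_weight / (k + rank)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


merged = rrf_merge(["a", "b", "c"], ["c", "a"])
```

Items found by both searches accumulate two contributions, so they tend to outrank items found by only one.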
Multi-Query Retrieval
The rewritten query and the original user query both run searches; results are merged by taking the highest similarity score per item. Used in the fixed-reply decision phase to catch edge cases the rewrite would miss.
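The merge rule can be sketched as follows, assuming each search returns (item_id, similarity) pairs. The function name `merge_by_max_similarity` is hypothetical.

```python
def merge_by_max_similarity(*result_sets):
    """Keep the highest similarity seen for each item across all result sets."""
    best = {}
    for results in result_sets:
        for item_id, similarity in results:
            if similarity > best.get(item_id, float("-inf")):
                best[item_id] = similarity
    return sorted(best.items(), key=lambda pair: pair[1], reverse=True)


rewritten_hits = [("a", 0.82), ("b", 0.91)]
original_hits = [("a", 0.93), ("c", 0.50)]
merged_hits = merge_by_max_similarity(rewritten_hits, original_hits)
```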
Query Rewriting + Compound Detection
An LLM rewrites follow-up questions into self-contained queries with full pronoun resolution and context (handling cases like "How much is it?" or "What is its size?" that rely on prior context), and simultaneously detects whether the message is a compound question. On detection, the system auto-splits the message into multiple independent sub-questions for downstream handling. Rewrite, detect, and split happen in one pass, which avoids false handoff triggers from misclassified compound queries.
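One way to consume a single-pass rewrite result is to have the LLM return one JSON object and parse it into a typed record. The field names `rewritten_query`, `is_compound`, and `sub_questions` are assumptions for illustration, as is the rule that a compound flag needs at least two sub-questions.

```python
import json
from dataclasses import dataclass, field


@dataclass
class RewriteResult:
    rewritten_query: str
    is_compound: bool
    sub_questions: list = field(default_factory=list)


def parse_rewrite_output(raw: str) -> RewriteResult:
    """Parse the (hypothetical) one-pass JSON output of the rewrite step."""
    data = json.loads(raw)
    subs = [s.strip() for s in data.get("sub_questions", []) if s.strip()]
    # Assumption: only treat the message as compound if it really splits in two or more.
    is_compound = bool(data.get("is_compound")) and len(subs) > 1
    return RewriteResult(
        rewritten_query=data["rewritten_query"].strip(),
        is_compound=is_compound,
        sub_questions=subs if is_compound else [],
    )


raw_output = json.dumps({
    "rewritten_query": "How much is the under-sink water filter?",
    "is_compound": False,
    "sub_questions": [],
})
result = parse_rewrite_output(raw_output)
```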
Topic Summary Auto-Compilation
When entries reach 5 or more, an LLM clusters same-topic items and generates summary entries (item_type = summary). Summary entries receive a 1.4× RRF boost, so they rank higher for overarching questions.
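The boost can be sketched as a multiplier applied to fused scores before the final sort. The tuple layout (item_id, item_type, score) and the function name `apply_summary_boost` are assumptions.

```python
SUMMARY_BOOST = 1.4


def apply_summary_boost(scored_items, boost=SUMMARY_BOOST):
    """Multiply the RRF score of summary entries by the boost, then re-rank."""
    boosted = [
        (item_id, item_type, score * boost if item_type == "summary" else score)
        for item_id, item_type, score in scored_items
    ]
    return sorted(boosted, key=lambda row: row[2], reverse=True)


ranked = apply_summary_boost([
    ("article-1", "article", 0.016),
    ("summary-1", "summary", 0.012),
])
```

Here the summary entry starts below the article but ends up first, since 0.012 × 1.4 exceeds 0.016.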
Contradiction Detection
For each new entry, existing entries with vector similarity ≥ 0.6 are selected first, then sent to an LLM to check for numerical or policy conflicts. Graceful degradation: if the LLM call fails, the check returns has_contradiction=False without blocking the write.
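A sketch of the filter-then-compare flow with graceful degradation. The comparison model is passed in as a callable; `check_contradiction`, `llm_compare`, and the `conflicts` field are hypothetical names, while `has_contradiction` and the 0.6 threshold come from the description above.

```python
def check_contradiction(new_text, neighbours, llm_compare, threshold=0.6):
    """neighbours: (existing_text, similarity) pairs from the vector search."""
    candidates = [text for text, similarity in neighbours if similarity >= threshold]
    if not candidates:
        # Nothing similar enough, so the LLM is never called.
        return {"has_contradiction": False, "conflicts": []}
    try:
        conflicts = llm_compare(new_text, candidates)
    except Exception:
        # Graceful degradation: an LLM failure must not block the write.
        return {"has_contradiction": False, "conflicts": []}
    return {"has_contradiction": bool(conflicts), "conflicts": list(conflicts)}


def failing_llm(new_text, candidates):
    raise RuntimeError("simulated timeout")


outcome = check_contradiction("Warranty is 2 years.", [("Warranty is 1 year.", 0.8)], failing_llm)
```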
Three-Layer Reply Architecture
① Fixed Reply: similarity ≥ 0.90 + length > 4 + non-compound query; ② RAG: top-3 entries form context for LLM generation; ③ Fallback: below min_similarity (default 0.4) returns safe message. Each layer has independent thresholds for traceable replies.
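The layer selection can be sketched as a single routing function. Two points are assumptions: "length" is taken to mean the character length of the query, and the layers are checked in the order fixed reply, fallback, RAG. The function name `choose_reply_layer` is hypothetical.

```python
def choose_reply_layer(query, ranked_hits, is_compound,
                       fixed_threshold=0.90, min_similarity=0.4,
                       min_length=4, top_k=3):
    """ranked_hits: (item_id, similarity) pairs sorted best-first."""
    best = ranked_hits[0][1] if ranked_hits else 0.0
    # Layer 1: a near-exact match on a long enough, non-compound query.
    if best >= fixed_threshold and len(query) > min_length and not is_compound:
        return "fixed", ranked_hits[:1]
    # Layer 3: nothing relevant enough, return the safe message.
    if best < min_similarity:
        return "fallback", []
    # Layer 2: the top entries become context for LLM generation.
    return "rag", ranked_hits[:top_k]


hits = [("a", 0.95), ("b", 0.60), ("c", 0.55), ("d", 0.41)]
layer, context = choose_reply_layer("What are your opening hours?", hits, is_compound=False)
```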
Multi-KB Architecture
Every search hits both client-private KB and system-shared KB, merged via RRF. System KB hits can independently pass the fallback threshold, providing rescue when client KB misses.
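The rescue behavior can be sketched as a threshold check run separately on each KB's hits. The function name `fallback_rescue` and the "client" / "system" labels are hypothetical; min_similarity=0.4 is the default fallback threshold mentioned earlier.

```python
def fallback_rescue(client_hits, system_hits, min_similarity=0.4):
    """Return which KB clears the fallback threshold, or None if neither does."""
    best_client = max((score for _, score in client_hits), default=0.0)
    best_system = max((score for _, score in system_hits), default=0.0)
    if best_client >= min_similarity:
        return "client"
    if best_system >= min_similarity:
        return "system"  # the shared KB rescues a client-KB miss
    return None


rescued_by = fallback_rescue([("c1", 0.22)], [("s1", 0.57)])
```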
Rich Media Reply with Per-Card Threshold
AI responses can deliver text, images, and URLs simultaneously, with each platform using its native multi-message format. Each referenced knowledge card is independently checked for semantic precision: only cards crossing the threshold contribute media, which prevents borderline-topic entries from dragging in unrelated images.
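A sketch of the per-card check, assuming each card is a dict with a `similarity` score and an optional `media` list. The threshold value is not given above, so it is a required argument here; the 0.75 in the usage line is purely illustrative.

```python
def select_media(cards, media_threshold):
    """Collect media only from cards whose own similarity clears the threshold."""
    media = []
    for card in cards:
        if card["similarity"] >= media_threshold:
            media.extend(card.get("media", []))
    return media


cards = [
    {"id": "install-guide", "similarity": 0.88, "media": ["install.jpg"]},
    {"id": "price-list", "similarity": 0.52, "media": ["prices.png"]},
    {"id": "faq", "similarity": 0.91},
]
attached = select_media(cards, media_threshold=0.75)  # illustrative threshold
```

All three cards may still feed the text answer; only the first contributes an image.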
pgvector + IVFFlat
The PostgreSQL pgvector extension stores 1536-dim embeddings (OpenAI text-embedding-3-small), with an ivfflat index (vector_cosine_ops, lists=10) for fast approximate nearest-neighbor retrieval.
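A sketch of the corresponding DDL and query, held as Python string constants so they can be run through any PostgreSQL driver. The table name `kb_items`, its columns, and the index name are hypothetical. The named placeholders follow psycopg's `%(name)s` style, and `<=>` is pgvector's cosine distance operator, so `1 - distance` gives cosine similarity.

```python
EMBEDDING_DIM = 1536  # text-embedding-3-small output size

CREATE_TABLE_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS kb_items (
    id bigserial PRIMARY KEY,
    content text NOT NULL,
    embedding vector({EMBEDDING_DIM})
);
"""

CREATE_INDEX_SQL = """
CREATE INDEX IF NOT EXISTS kb_items_embedding_idx
ON kb_items USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 10);
"""

SEARCH_SQL = """
SELECT id, content, 1 - (embedding <=> %(query)s::vector) AS similarity
FROM kb_items
ORDER BY embedding <=> %(query)s::vector
LIMIT %(limit)s;
"""


def to_vector_literal(values):
    """Format floats as pgvector's text form, e.g. '[0.5,1.0]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"
```

A query would pass `{"query": to_vector_literal(embedding), "limit": 3}` as parameters to `SEARCH_SQL`.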
Auto Human Handoff System
Four trigger types (intent / ai_struggling / fallback / code) + Pause Timer (Redis ai_disabled key with TTL) + LLM auto-generated handoff summary (six writing principles). Fallback counter uses Redis 24h rolling counter, triggers at configured threshold (default 2, range 1–10).
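The fallback trigger can be sketched with an in-memory stand-in for the Redis counter. This version keeps timestamps in a sliding 24-hour window. Whether the real counter is a sliding window or a key with a 24-hour TTL is not stated, so treat that as an assumption, along with the class name `RollingFallbackCounter`.

```python
import time
from collections import defaultdict, deque


class RollingFallbackCounter:
    """In-memory stand-in for a per-user 24h rolling fallback counter."""

    def __init__(self, threshold=2, window_seconds=24 * 3600, clock=time.time):
        if not 1 <= threshold <= 10:
            raise ValueError("threshold must be between 1 and 10")
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.clock = clock
        self._events = defaultdict(deque)

    def record_fallback(self, user_id):
        """Record one fallback; return True when handoff should trigger."""
        now = self.clock()
        events = self._events[user_id]
        events.append(now)
        # Drop fallbacks that are older than the rolling window.
        while events and events[0] <= now - self.window_seconds:
            events.popleft()
        return len(events) >= self.threshold


fake_now = [0.0]
counter = RollingFallbackCounter(threshold=2, clock=lambda: fake_now[0])
first = counter.record_fallback("user-1")
fake_now[0] = 3600.0
second = counter.record_fallback("user-1")
```

Injecting the clock keeps the window logic testable without waiting for real time to pass.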
Smart Handoff Structured Output
When smart handoff is enabled, the LLM produces output via an OpenAI strict JSON schema (reasoning / needs_handoff / intent / reply fields). The intent enum is locked to "AI cannot process" plus merchant-defined scenarios, preventing parse failures from free-form LLM output.
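A sketch of how such a schema could be assembled per merchant. The four field names and the "AI cannot process" label come from the description above; the field types, the schema name `smart_handoff`, and the function name are assumptions. Strict mode expects every property to be listed as required and `additionalProperties` to be false.

```python
def build_handoff_response_format(merchant_scenarios):
    """Build a strict JSON-schema response format with a locked intent enum."""
    intents = ["AI cannot process", *merchant_scenarios]
    properties = {
        "reasoning": {"type": "string"},
        "needs_handoff": {"type": "boolean"},
        "intent": {"type": "string", "enum": intents},
        "reply": {"type": "string"},
    }
    return {
        "type": "json_schema",
        "json_schema": {
            "name": "smart_handoff",  # hypothetical schema name
            "strict": True,
            "schema": {
                "type": "object",
                "properties": properties,
                "required": list(properties),
                "additionalProperties": False,
            },
        },
    }


response_format = build_handoff_response_format(["Refund request", "Booking change"])
```

Because the enum is rebuilt from the merchant's scenario list, the model cannot emit an intent label the downstream parser does not know.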
Multi-Platform OAuth & Webhook
Unified webhook endpoints for LINE / Facebook Messenger / Instagram (e.g., /webhook/line/redis_id/bot_id). OAuth 2.0 flows support LINE Login / Google / Facebook / IG Graph API with CSRF tokens cached in Redis with TTL.
Compound Question Handling
When customers ask multiple things in one message (e.g. "Where is the water source? Can you install in Kaohsiung?"), the system auto-splits the message into independent sub-questions, runs complete RAG retrieval and media matching for each, and finally sends multiple separate messages, each replying only to its own sub-question with its corresponding media. This eliminates image-text mismatches and the information compression caused by forced merged answers.
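The per-sub-question flow can be sketched as a loop that answers each part on its own. The `answer_one` callable stands in for the full retrieval and media-matching pipeline, and the message dict layout is hypothetical.

```python
def reply_to_compound(sub_questions, answer_one):
    """Answer each sub-question independently and return one message per part."""
    messages = []
    for question in sub_questions:
        text, media = answer_one(question)  # full retrieval runs per sub-question
        messages.append({"question": question, "text": text, "media": list(media)})
    return messages


def stub_answer(question):
    # Stand-in for RAG retrieval plus media matching (assumption).
    if "water source" in question:
        return "The inlet is under the sink.", ["inlet.jpg"]
    return "Yes, installation is available in Kaohsiung.", []


outgoing = reply_to_compound(
    ["Where is the water source?", "Can you install in Kaohsiung?"],
    stub_answer,
)
```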
Knowledge Gap Auto-Tracking
Any scenario where AI cannot answer — fallback safe-response, AI self-evaluation failure, sub-question miss in a compound query, smart handoff trigger — is automatically logged as a Knowledge Gap and ranked by frequency for administrators. Owners can identify the next KB entries to add at a glance, without manually reviewing conversation logs.
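The frequency ranking can be sketched with a counter over logged events. The (question, source) event layout, the source labels, and the normalization by lowercasing and trimming are assumptions. A real system would likely group near-duplicate questions more robustly.

```python
from collections import Counter


def rank_knowledge_gaps(gap_events):
    """gap_events: (question, source) pairs; returns questions by frequency."""
    counts = Counter(question.strip().lower() for question, _source in gap_events)
    return counts.most_common()


events = [
    ("Do you ship overseas?", "fallback"),
    ("do you ship overseas? ", "smart_handoff"),
    ("Is there a student discount?", "compound_miss"),
    ("Do you ship overseas?", "self_eval_failure"),
]
ranking = rank_knowledge_gaps(events)
```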
Document Direct-Read Import
When customers drag PDF / Word / Excel files into the admin panel, the system uses GPT-5-class models to read entire documents in batches, capturing document structure, tables, and policy sections in one pass and auto-splitting them into knowledge base entries. This bypasses the traditional lossy multi-stage pipeline, in which OCR or text heuristics first extract paragraphs and small models then stitch them together.
Inbound Image Strategy
When customers send images, each merchant can choose a handling policy per scenario: ① Auto-handoff to live agent (portraits, on-site photos, product shots); ② Text prompt asking customers to describe in text; ③ Silent ignore (avoid noise interference). Prevents AI from hallucinating image content, and gives each industry flexibility.
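The three policies can be sketched as an enum plus a small dispatcher. The enum member names, the action labels, and the prompt text are hypothetical. The per-scenario selection logic is omitted, so the sketch takes the chosen policy as input.

```python
from enum import Enum


class ImagePolicy(Enum):
    HANDOFF = "handoff"            # route to a live agent
    ASK_FOR_TEXT = "ask_for_text"  # ask the customer to describe in text
    IGNORE = "ignore"              # silently drop the image


def handle_inbound_image(policy):
    """Return the action for an inbound image without interpreting its content."""
    if policy is ImagePolicy.HANDOFF:
        return {"action": "handoff", "reply": None}
    if policy is ImagePolicy.ASK_FOR_TEXT:
        return {"action": "reply", "reply": "Could you describe what you need in text?"}
    return {"action": "ignore", "reply": None}


decision = handle_inbound_image(ImagePolicy.ASK_FOR_TEXT)
```

None of the branches passes the image to the model, which is what keeps the AI from guessing at image content.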