Domain Knowledge
Domain knowledge lets you build a searchable corpus from your own documents. The agent uses local vector embeddings to find relevant chunks and synthesise answers — no data leaves your machine.
Domain knowledge is disabled by default. Enable it explicitly:
{
"knowledge": { "enabled": true }
}
How It Works
Your documents
│
▼
Chunking (512 tokens, 64 overlap)
│
▼
Embedding (nomic-embed-text-v1.5, local ONNX)
│
▼
sqlite-vec (vec0 virtual table in reeboot.db)
│
▼
FTS5 full-text index (in reeboot.db)
│
▼
Agent tools: knowledge_search, knowledge_ingest, knowledge_lint
The embedding model downloads once on first use and is cached locally. No API key or internet connection is required after that.
Document Tiers
| Tier | Path | Purpose |
|---|---|---|
template | ~/.reeboot/knowledge/raw/template/ | Pre-packaged knowledge shipped with agent profiles |
owner | ~/.reeboot/knowledge/raw/owner/ | Documents you add (PDFs, Markdown, text files) |
Both tiers feed the same vector index. The source_tier field on every chunk tells the agent where a piece of knowledge came from.
Ingesting Documents
Drop files into ~/.reeboot/knowledge/raw/owner/ — reeboot watches this directory and ingests new files automatically.
Or ask the agent directly:
Ingest this document: [paste content or file path]
Supported formats: plain text, Markdown, PDF.
Wiki Synthesis Mode
Wiki synthesis is an optional layer on top of RAG. When enabled, the agent maintains a set of Markdown pages in ~/.reeboot/knowledge/wiki/ that synthesise knowledge across documents — cross-referencing, resolving contradictions, and building a structured knowledge base.
Enable it:
{
"knowledge": {
"enabled": true,
"wiki": { "enabled": true }
}
}
Tradeoffs: wiki synthesis adds LLM processing at ingest time, carries a risk of hallucination contamination (synthesised cross-references can look authoritative), and requires more storage. For most use cases, pure RAG mode (wiki disabled) is recommended.
Agent Tools
| Tool | What it does |
|---|---|
knowledge_search | Vector + FTS5 search over the corpus |
knowledge_ingest | Ingest a document into the corpus |
knowledge_lint | Check for orphaned pages, stale claims, contradictions |
Configuration Reference
| Field | Type | Default | Description |
|---|---|---|---|
knowledge.enabled | boolean | false | Enable domain knowledge. When false, no vector index or FTS5 table is created. |
knowledge.embeddingModel | string | "nomic-ai/nomic-embed-text-v1.5" | Local ONNX model for embeddings. 8192-token context, no API key needed. |
knowledge.dimensions | number | 768 | Embedding vector dimensions. Must match the model. |
knowledge.chunkSize | number | 512 | Token chunk size for document splitting. |
knowledge.chunkOverlap | number | 64 | Overlap in tokens between consecutive chunks. |
knowledge.wiki.enabled | boolean | false | Enable wiki synthesis mode. |
knowledge.wiki.lint.schedule | string | "0 9 * * 1" | Cron schedule for wiki lint. Default: 9 AM every Monday. |
Developer Notes
- Vector storage uses sqlite-vec (
vec0virtual table). sqlite-vec is pre-v1 but Mozilla-backed and production-tested. - Auxiliary columns in sqlite-vec are declared as
TEXT, notINTEGER, due to a known type-binding limitation in better-sqlite3. Integer values are stored as strings and parsed on retrieval. - Wiki content lives as Markdown files (filesystem); metadata is mirrored in the
wiki_pagesSQLite table. Files are the source of truth.