Hybrid retrieval: BM25 + dense embeddings, RRF-fused

Wikantik's search stack fuses Lucene BM25 with dense embedding cosine similarity using weighted Reciprocal Rank Fusion — and falls back cleanly to BM25 alone if any upstream component is unavailable.

The problem with keyword-only search

Classic BM25 keyword search is fast and reliable, but it is fundamentally a vocabulary-matching exercise. A query for "index funds for retirement" will miss a page titled "Passive Investing Fundamentals" unless it happens to contain those exact terms. For a wiki used by both humans exploring topics and AI agents answering questions, that gap is not a cosmetic issue — it is a retrieval failure that produces hallucinated or missing citations.

Dense embedding search solves the vocabulary problem by mapping both queries and documents into a shared semantic vector space. But dense search on its own can be brittle: it rewards topical similarity over exact relevance, and it fails silently when the embedding backend is unavailable.

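Hybrid retrieval combines the two: BM25 supplies exact-term precision, dense search supplies semantic recall, and each covers the other's blind spot.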

How it works: two stages (plus an optional experimental stage)

Stage 1 — BM25 + dense fusion with weighted RRF

At query time, Wikantik runs two retrieval passes in parallel: a Lucene BM25 pass and a dense cosine-similarity pass against chunk embeddings. The two ranked lists are then merged using weighted Reciprocal Rank Fusion (RRF) — a rank-based combination method that is robust to score-scale differences between the two signals.

The key configuration properties are:

wikantik.search.hybrid.rrf.bm25-weight — weight applied to BM25 ranks (default 1.0)
wikantik.search.hybrid.rrf.dense-weight — weight applied to dense ranks (default 1.5)
wikantik.search.hybrid.rrf.k — the RRF smoothing constant (default 60)
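
To make the role of the three properties concrete, here is a minimal sketch of weighted RRF in Python. It assumes the common formulation, where each list contributes weight / (k + rank) with 1-based ranks; the function name and the sample page names are hypothetical, and this is not Wikantik's actual implementation.

```python
from collections import defaultdict


def weighted_rrf(bm25_ranked, dense_ranked, bm25_weight=1.0, dense_weight=1.5, k=60):
    """Fuse two ranked lists of page ids (best first) into one ranked list.

    Assumed formula: score(page) = sum over lists of weight / (k + rank),
    with rank starting at 1. Defaults mirror the documented property defaults.
    """
    scores = defaultdict(float)
    for weight, ranked in ((bm25_weight, bm25_ranked), (dense_weight, dense_ranked)):
        for rank, page in enumerate(ranked, start=1):
            scores[page] += weight / (k + rank)
    # Highest fused score first; ties broken by page id so output is deterministic.
    return sorted(scores, key=lambda page: (-scores[page], page))


# Hypothetical page names, for illustration only.
bm25 = ["IndexFunds", "Retirement", "Bonds"]
dense = ["PassiveInvesting", "IndexFunds", "Retirement"]
fused = weighted_rrf(bm25, dense)
print(fused)
```

Because only ranks enter the formula, raw BM25 scores and cosine similarities never need to be normalised against each other. A page that appears in both lists, such as IndexFunds here, outranks a page that tops only one of them.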

Dense retrieval operates at the chunk level — each page is broken into overlapping passages, each embedded separately — and the SUM_TOP_3 page aggregation strategy collapses chunk scores into a per-page score before fusion.
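
The chunk-to-page step can be sketched as follows. This assumes SUM_TOP_3 means "sum the three highest chunk scores for each page", which the name suggests but this page does not spell out; the function name and sample scores are hypothetical.

```python
import heapq
from collections import defaultdict


def sum_top_n(chunk_hits, n=3):
    """Collapse (page, chunk_score) hits into pages ranked by the sum of
    each page's n best chunk scores (assumed meaning of SUM_TOP_3)."""
    per_page = defaultdict(list)
    for page, score in chunk_hits:
        per_page[page].append(score)
    page_scores = {
        page: sum(heapq.nlargest(n, scores)) for page, scores in per_page.items()
    }
    # Best page first; ties broken by page id for a stable order.
    return sorted(page_scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical cosine scores for chunks of three pages.
hits = [
    ("A", 0.9), ("A", 0.8), ("A", 0.7), ("A", 0.6),
    ("B", 0.95), ("B", 0.5),
    ("C", 0.85),
]
ranked_pages = sum_top_n(hits)
print([page for page, _ in ranked_pages])
```

Under this reading, a page with several strong passages (A) can outrank a page with one excellent passage (B), while a fourth or fifth matching chunk adds nothing, so long pages are not rewarded for length alone.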

Stage 2 — fail-closed BM25 fallback

Every abnormal path in the dense retrieval stage returns the BM25 result list unchanged rather than failing the request. When the embedding backend times out, the circuit breaker trips OPEN, the embedder returns an empty result, and the BM25 ranking is served as-is. This is not an afterthought; it is a design invariant enforced in HybridSearchService.rerank().
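
The invariant can be illustrated with a small Python sketch. It is not the code in HybridSearchService.rerank(); the function and parameter names are hypothetical, and the three callables stand in for the BM25 pass, the dense pass, and the fusion step.

```python
def hybrid_search(query, bm25_search, dense_search, fuse):
    """Return fused results, or the BM25 list unchanged if the dense path misbehaves."""
    bm25_results = bm25_search(query)
    try:
        dense_results = dense_search(query)
    except Exception:
        # Timeout, backend error, open circuit breaker: all degrade to BM25 only.
        return bm25_results
    if not dense_results:
        # An empty dense list carries no signal, so fusion is skipped.
        return bm25_results
    return fuse(bm25_results, dense_results)


def failing_dense(query):
    # Stand-in for an embedding backend that has timed out.
    raise TimeoutError("embedding backend timed out")


bm25_only = hybrid_search(
    "index funds", lambda q: ["A", "B"], failing_dense, lambda b, d: d + b
)
print(bm25_only)
```

The point of the shape is that the BM25 list is computed first and held, so every exit from the dense branch has a valid result to return.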

Optional stage: experimental knowledge-graph proximity rerank

An experimental third stage can rerank the fused list using the Knowledge Graph. It resolves entities in the query against kg_nodes, then boosts pages whose mentioned entities are close (in Knowledge Graph hops) to those query entities. This stage is off by default (wikantik.search.graph.boost = 0) — empirical measurement found no net ranking lift, so production search is BM25 + dense only. The stage only reorders candidates; it never adds or removes them. If no graph signal exists for a query, it returns the fused list unchanged.
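
A rough sketch of the "reorder only" contract is below. The scoring formula is invented purely for illustration (this page does not describe the real one), and hops_by_page is a hypothetical precomputed map from page to its minimum hop distance from any query entity.

```python
def graph_rerank(fused, hops_by_page, boost=0.0):
    """Reorder fused page ids by graph proximity; never adds or removes candidates."""
    if boost == 0 or not hops_by_page:
        # Stage disabled, or no graph signal for this query: pass the list through.
        return list(fused)

    def score(item):
        position, page = item
        base = 1.0 / (1 + position)  # earlier fused position gives a higher base score
        hops = hops_by_page.get(page)
        # Illustrative bonus: closer entities (fewer hops) earn a larger boost.
        proximity = boost / (1 + hops) if hops is not None else 0.0
        return base + proximity

    ranked = sorted(enumerate(fused), key=score, reverse=True)
    return [page for _, page in ranked]


fused = ["A", "B", "C"]
print(graph_rerank(fused, {"C": 0}, boost=1.0))
print(graph_rerank(fused, {"C": 0}, boost=0.0))
```

Since the output is always a permutation of the input, turning the boost to 0 restores the plain BM25 + dense ranking exactly.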

Production note: the dense backend on docker1 (the live Wikantik deployment) is lucene-hnsw — an in-process Lucene HNSW approximate nearest-neighbour index held in JVM RAM. It is faster than the brute-force inmemory backend on large corpora and avoids the extra database load of the pgvector backend. Switch backends with a single property: wikantik.search.dense.backend = inmemory | pgvector | lucene-hnsw.

Dense retrieval sits under the same cost ceiling as the rest of the platform. An operator who sets wikantik.genai.mode to core gets BM25 only, with no embedding infrastructure to run or pay for; the search and full tiers enable dense fusion through a CPU-only or hosted embedding backend. See self-hosting & backup for the tier breakdown.

Why it matters for RAG and agents

Retrieval-Augmented Generation (RAG) systems are only as good as their retrieval step: an agent that retrieves the wrong page returns a wrong or hallucinated answer. Hybrid retrieval lowers the chance of a retrieval miss compared with BM25 or dense search alone, particularly for queries that combine exact terminology with broader semantic intent, which is how humans and agents tend to ask questions about a technical knowledge base.

The same /api/search endpoint that powers the search box in the wiki UI is the one the MCP tools call. Agents get first-class search, not a stripped-down API.

For deeper reading on the implementation, see the Hybrid Retrieval design doc in the live wiki.

Frequently asked questions

What embedding backends are supported?

Three backends are available via the wikantik.search.dense.backend property: inmemory (brute-force cosine scan; the config-file default for new installs), pgvector (delegates to a PostgreSQL HNSW index), and lucene-hnsw (in-process Lucene HNSW index; the docker1 production default). All three share the same fail-closed BM25 fallback.

What happens if the vector store is down?

Wikantik hybrid retrieval is fail-closed: every failure path in the dense retrieval stack returns the unmodified BM25 result list. The circuit breaker trips OPEN after consecutive embedding failures and automatically re-probes after a reset interval; during the OPEN period, search continues returning BM25 results. Search never goes dark because of an embedding outage.
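
The breaker behaviour described above can be sketched as a small state machine. The class name, the failure threshold, and the reset interval are all hypothetical; Wikantik's actual values and implementation are not given on this page.

```python
import time


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; permits probing again
    once `reset_seconds` have passed since it opened."""

    def __init__(self, threshold=3, reset_seconds=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.reset_seconds = reset_seconds
        self.clock = clock  # injectable so the example needs no real waiting
        self.failures = 0
        self.opened_at = None

    def allow_request(self):
        if self.opened_at is None:
            return True  # CLOSED: dense retrieval runs normally
        # OPEN: skip the embedder until the reset interval has elapsed.
        return self.clock() - self.opened_at >= self.reset_seconds

    def record_success(self):
        # A successful call (including a probe) closes the breaker.
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            # Trip, or re-trip after a failed probe, and restart the interval.
            self.opened_at = self.clock()


now = [0.0]  # fake clock for the demonstration
breaker = CircuitBreaker(threshold=2, reset_seconds=10.0, clock=lambda: now[0])
breaker.record_failure()
breaker.record_failure()
open_blocks_dense = not breaker.allow_request()
now[0] = 10.0
probe_allowed = breaker.allow_request()
print(open_blocks_dense, probe_allowed)
```

While allow_request() is false, the search path would skip the embedder and serve BM25 directly, which is why an outage costs semantic recall but not availability.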

Does hybrid retrieval work for both agents and humans?

Yes. The same /api/search endpoint is used by the web UI search box, the /knowledge-mcp MCP tools, and direct API callers. Agents benefit from the same BM25 + dense RRF pipeline that humans get, plus the structured JSON response that makes results easy to process programmatically.
