Architecture

How Hortora's core systems work — for platform architects evaluating enterprise deployment.

Knowledge never goes stale silently

An entry from two years ago looks identical to one written yesterday — same formatting, same authority, same surfacing. Without enforcement, AI agents confidently advise fixes that were correct for an earlier library version but break on current ones, with no indication anything is amiss.

Hortora enforces staleness checks in five layers:

Guarantees
  • Every search result shows age — staleness is never invisible
  • Entries past staleness_threshold are flagged before influencing behaviour
  • last_reviewed resets the staleness clock when a human explicitly confirms validity
  • harvest REVIEW provides full-garden coverage across all domains
Graceful degradation
  • SWEEP domain filtering requires session activity to identify domains — cold-start sessions skip the spot-check, but the always-on annotation still fires
  • harvest REVIEW requires deliberate scheduling; it is not triggered automatically by CI
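
The always-on age annotation and the threshold flag can be sketched roughly as follows. This is a minimal illustration, not Hortora's implementation: staleness_threshold and last_reviewed are field names from the text above, while the Entry class, the submitted field, the assumption that the threshold is a number of days, and the annotate function are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Entry:
    # Hypothetical in-memory shape. Only staleness_threshold and
    # last_reviewed are names taken from the document.
    title: str
    submitted: date                       # assumed field name
    staleness_threshold: int              # assumed unit: days
    last_reviewed: Optional[date] = None  # set when a human confirms validity


def age_days(entry: Entry, today: date) -> int:
    """Days since the entry was last confirmed valid, or since submission."""
    clock_start = entry.last_reviewed or entry.submitted
    return (today - clock_start).days


def annotate(entry: Entry, today: date) -> str:
    """Build a search-result line that always shows age and flags stale entries."""
    days = age_days(entry, today)
    flag = " [STALE: review before use]" if days > entry.staleness_threshold else ""
    return f"{entry.title} (age: {days}d){flag}"
```

In this sketch, setting last_reviewed restarts the clock, so a two-year-old entry that a human reviewed last month is shown with a one-month age and no flag.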

Decision reasoning survives beyond the session that produced it

An entry that says "do X" is useful once. An entry that says "do X because Y, not Z because..." is useful across library upgrades, team changes, and context shifts. Without captured reasoning, a knowledge garden accumulates cargo-cult fixes — correct procedures whose rationale nobody remembers, applied in contexts where the original reason no longer holds.

Hortora treats reasoning as a first-class field, not an optional comment:

Guarantees
  • Every entry scoring ≥12/15 carries explicit prose reasoning at submission time — high-value workarounds always document the decision, not just the fix
  • Reasoning is version-contextualised via verified_on where applicable — readers know when the decision was made, not just that it was made
  • Reasoning is re-evaluated at staleness review — last_reviewed confirms both the fix and its rationale still hold
  • Full decision history is auditable via git — no reasoning is silently overwritten
Graceful degradation
  • Entries scoring below 12/15 do not require explicit rationale and may document only the fix; the threshold is a quality floor, not universal coverage
  • Rationale prose is written by the submitter at capture time; quality depends on the developer's context in that session — there is no automated reasoning validation
  • Cross-entry reasoning (architectural trade-offs spanning multiple entries) is not automatically linked — related decisions require manual See also: references added during harvest REVIEW
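
The reasoning requirement for high-scoring entries could be checked along these lines. This is a sketch under stated assumptions: the 12/15 threshold comes from the text, but the dictionary keys score and reasoning and the function name are hypothetical, and the check only confirms that prose is present, not that it is sound.

```python
REASONING_REQUIRED_AT = 12  # from the text: entries scoring >= 12/15 need reasoning


def reasoning_problems(entry: dict) -> list[str]:
    """Return a list of problems; an empty list means the rule is satisfied.

    The keys "score" and "reasoning" are assumed names for illustration.
    """
    problems = []
    score = entry.get("score", 0)
    reasoning = (entry.get("reasoning") or "").strip()
    if score >= REASONING_REQUIRED_AT and not reasoning:
        problems.append(f"score {score} requires explicit prose reasoning")
    return problems
```

An entry scoring 13 with whitespace-only reasoning fails this check, while an entry scoring 9 passes with no reasoning at all, which matches the quality-floor behaviour described above.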

No entry enters the garden without passing validation

AI-generated content can be plausible-sounding but wrong, duplicate, or malformed. An unguarded knowledge store accumulates noise that erodes trust faster than it accumulates value — and bad entries are harder to find than good ones.

Every submission passes a multi-gate validation pipeline (validate_pr.py):

Guarantees
  • No entry merges without passing format validation and the ≥8 score threshold
  • CI blocks merge on failure — there is no silent pass
  • Every entry carries a permanent, collision-resistant GE-ID (GE-YYYYMMDD-xxxxxx) assigned at write time with no central counter
  • All entry history is preserved in git — nothing is silently overwritten
Graceful degradation
  • L1 Jaccard catches obvious duplicates; close variants can still enter — harvest DEDUPE provides L2/L3 semantic deduplication as a periodic maintenance sweep
  • Score is self-reported by the submitter; CI enforces the valid range (1–15) but not accuracy — periodic re-scoring improves signal over time
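
To illustrate how an ID can be collision-resistant without a central counter, and how the score gate might look, here is a minimal sketch. It is not the contents of validate_pr.py: the GE-YYYYMMDD-xxxxxx shape, the 1 to 15 range, and the merge threshold of 8 come from the text, while the assumption that the suffix is six random lowercase hex characters, the dictionary keys, and the function names are illustrative.

```python
import re
import secrets
from datetime import date

# Assumed: the "xxxxxx" suffix is six lowercase hex characters.
GE_ID_PATTERN = re.compile(r"^GE-\d{8}-[0-9a-f]{6}$")
MIN_MERGE_SCORE = 8          # merge threshold stated in the text
SCORE_RANGE = range(1, 16)   # valid self-reported scores: 1 to 15 inclusive


def new_ge_id(today: date) -> str:
    """Build an ID locally from the date plus a random suffix (no shared counter)."""
    return f"GE-{today:%Y%m%d}-{secrets.token_hex(3)}"


def gate(entry: dict) -> list[str]:
    """Return validation errors; an empty list means the entry may merge."""
    errors = []
    if not GE_ID_PATTERN.match(entry.get("id", "")):
        errors.append("malformed GE-ID")
    score = entry.get("score")
    if not isinstance(score, int) or score not in SCORE_RANGE:
        errors.append("score outside 1-15")
    elif score < MIN_MERGE_SCORE:
        errors.append("score below merge threshold")
    return errors
```

A CI job built on such a gate would fail the build whenever the returned list is non-empty, so there is no path by which a failing entry passes silently.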

Concurrent sessions cannot corrupt garden state

Multiple AI sessions writing to a shared knowledge store simultaneously can produce partial writes, conflicting commits, and filesystem races — inconsistencies that are hard to detect and harder to recover from. A knowledge garden used across a team is always under concurrent write pressure.

Hortora uses git as the consistency layer:

Guarantees
  • Concurrent sessions cannot produce partially-visible entries — the commit is the write
  • Every read sees a complete, committed state regardless of concurrent write activity
  • Full audit trail: git history records who submitted what, when, from which session
  • Entries are never deleted — only deprecated or retired, with content preserved
Graceful degradation
  • Rebase recovery handles non-conflicting concurrent writes automatically; two sessions editing the same entry simultaneously require manual conflict resolution (rare — entries are append-only by convention)
  • Sparse blobless clone requires a git remote with partial clone support (GitHub, GitLab); local-only gardens fetch all blobs at clone time
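
The rebase recovery described above can be sketched as a push-and-retry loop. This is an illustration only: the remote name origin, the branch name main, the retry count, and the publish function are assumptions, and the git runner is injectable so the logic can be exercised without a repository.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]  # runs a git subcommand, returns exit code


def run_git(args: Sequence[str]) -> int:
    """Default runner: invoke git in the current working directory."""
    return subprocess.run(["git", *args]).returncode


def publish(run: Runner = run_git, max_attempts: int = 3) -> bool:
    """Push a local commit; if the push is rejected, rebase and try again.

    Returns False when the rebase hits a real conflict (two sessions edited
    the same entry) or attempts run out, leaving resolution to a human.
    """
    for _ in range(max_attempts):
        if run(["push", "origin", "HEAD"]) == 0:
            return True
        # Another session pushed first: replay our commit on top of theirs.
        if run(["pull", "--rebase", "origin", "main"]) != 0:
            run(["rebase", "--abort"])  # restore the pre-rebase state
            return False
    return False
```

Because each entry is written as its own commit, readers only ever see the state before or after that commit. The sparse blobless clone mentioned above presumably corresponds to git's --filter=blob:none and --sparse clone options, which need a remote that supports partial clone.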

Relevant entries surface; irrelevant ones don't

A flat keyword search across hundreds of entries produces two failure modes: false positives (cross-domain noise that wastes context budget and erodes trust) and false negatives (relevant entries missed because symptom keywords don't align). Both degrade the garden's value faster than new entries can recover it.

Hortora uses a three-tier retrieval algorithm with bounded context cost:

Guarantees
  • Technology scoping eliminates cross-domain false positives at query time
  • Three tiers ensure no entry in the relevant domain is unreachable
  • Context budget cost is bounded — the index pre-load is a single file regardless of garden size
  • Index is always committed state — no read can observe a partial write
Graceful degradation
  • Tier 3 (full domain scan) degrades in precision as a domain grows beyond ~200 entries — RAPTOR cluster summaries (Phase 8) will address this at scale
  • Keyword effectiveness is not automatically measured — entries with thin keywords may be missed until a maintainer updates them; harvest REVIEW is the natural point to catch this
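
A tiered, technology-scoped lookup might look like the sketch below. Only technology scoping, the single pre-loaded index, and Tier 3 as a full domain scan are taken from the text. What Tiers 1 and 2 actually match on is not stated here, so the keyword-overlap and title-substring rules, the IndexRow class, and the search function are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class IndexRow:
    # Hypothetical row of the pre-loaded index file.
    ge_id: str
    domain: str
    keywords: set[str] = field(default_factory=set)
    title: str = ""


def search(index: list[IndexRow], domain: str, query: str) -> tuple[int, list[str]]:
    """Return (tier_used, matching ids). Later tiers run only if earlier ones miss."""
    terms = set(query.lower().split())
    # Technology scoping: entries from other domains are never considered.
    scoped = [row for row in index if row.domain == domain]

    # Tier 1 (assumed rule): exact keyword overlap in the index.
    hits = [r.ge_id for r in scoped if terms & r.keywords]
    if hits:
        return 1, hits

    # Tier 2 (assumed rule): looser substring match against titles.
    hits = [r.ge_id for r in scoped if any(t in r.title.lower() for t in terms)]
    if hits:
        return 2, hits

    # Tier 3 (named in the text): hand back the whole domain for a full scan.
    return 3, [r.ge_id for r in scoped]
```

The fallthrough to Tier 3 is what keeps every entry in the scoped domain reachable, and it is also why precision drops as a domain grows: the caller receives the entire domain rather than a ranked subset.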

The protocol scales beyond a single garden

A single canonical garden becomes a bottleneck at enterprise scale. Organisation-specific knowledge cannot sit alongside public community knowledge. Domain gardens need independent curation cadences. Yet independent forks fragment the protocol and eliminate sharing.

Hortora uses a three-tier federation model:

Guarantees (current)
  • Entry format and GE-ID scheme are stable and versioned — entries written today will be valid in future schema versions or carry an explicit migration path
  • The protocol is vendor-neutral — no lock-in to a specific AI provider or model family
  • Enterprise air-gapped deployment is supported today via local-only mode — no GitHub remote required
Guarantees (Phase 5+)
  • Child garden validation prevents duplication of canonical entries
  • Schema version mismatches fail loudly at integration time, not silently at retrieval time
  • Compliant gardens can participate in federated retrieval — forage SEARCH can query across gardens in a single session
Graceful degradation
  • Federation is Phase 5+; the current production implementation is single-garden. The spec is published, but multi-garden tooling is not yet deployed.

Duplicate knowledge is eliminated in layers, not all at once

A single dedup gate at submission time is either too strict (rejecting valid near-duplicates) or too permissive (letting close variants through). A knowledge garden under active contribution accumulates semantic drift that point-in-time checks cannot catch.

Hortora uses three-level deduplication with a drift counter:

Guarantees
  • L1 catches obvious duplicates before merge — no obviously redundant entry enters through the standard workflow
  • Every dedup comparison is logged in CHECKED.md — pairs are never compared twice, and the full comparison history is auditable
  • Discarded entries are recorded in DISCARDED.md, not silently deleted — the decision to discard is always traceable
  • The drift counter makes dedup debt visible — maintainers always know how many entries have been added since the last sweep
Graceful degradation
  • L2/L3 (harvest DEDUPE) require a dedicated session with full context budget — they are not run automatically by CI
  • The drift counter triggers an obligation, not an enforcement gate — a garden can operate above threshold, but the debt is always visible
  • Cross-domain duplicates are not checked — entries in different domains are assumed distinct by definition
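
The L1 check and the compare-once rule can be illustrated with a token-set Jaccard similarity and a set of already-checked pairs. This is a sketch, not the deduplication algorithm from the spec: the tokenisation, the 0.8 threshold, the in-memory checked set standing in for CHECKED.md, and the function names are all assumptions.

```python
import re
from itertools import combinations


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; a deliberately simple tokeniser."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def sweep(entries: dict[str, str], checked: set[frozenset],
          threshold: float = 0.8) -> list[tuple[str, str]]:
    """Compare each unchecked pair within one domain once; return likely duplicates.

    `checked` plays the role of CHECKED.md: every compared pair is recorded
    so that it is never compared again. The threshold is an assumed value.
    """
    duplicates = []
    for id_a, id_b in combinations(sorted(entries), 2):
        pair = frozenset((id_a, id_b))
        if pair in checked:
            continue
        checked.add(pair)
        if jaccard(entries[id_a], entries[id_b]) >= threshold:
            duplicates.append((id_a, id_b))
    return duplicates
```

Token overlap catches near-verbatim copies but not paraphrases, which is why close variants are left to the L2/L3 semantic passes. Running sweep a second time with the same checked set returns nothing, since every pair is already logged.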

Full specification: The complete design, covering nine implementation phases, the federation protocol, the deduplication algorithm, and the governance model, is in spec/docs/design/2026-04-07-garden-rag-redesign-design.md on GitHub.