Architecture

How Hortora's core systems work — for platform architects evaluating enterprise deployment.

Knowledge never goes stale silently

An entry from two years ago looks identical to one written yesterday — same formatting, same authority, same surfacing. Without enforcement, AI agents confidently advise fixes that were correct for an earlier library version but break on current ones, with no indication anything is amiss.

Hortora enforces staleness in five layers:

Always-on annotation: Every SEARCH result shows the entry's age. Past-threshold entries get a ⚠️ stale warning with verification date and version (if captured). Approaching-threshold entries get an ℹ️ advisory. This fires even if no maintenance has ever been run.
Version capture at write time: Library and tool-specific entries prompt for verified_on (e.g. quarkus: 3.34.2) at CAPTURE time. SEARCH uses this to produce a concrete version-gap flag rather than a generic age warning.
Rationale capture for high-scoring entries: Entries scoring ≥12/15 prompt for "why this fix over the obvious alternative" — preserving reasoning that would otherwise be lost with the session.
Domain-filtered session review: At the end of a SWEEP, forage checks entries in the domains touched during that session against their staleness_threshold. Expired entries surface for Confirm/Revise/Retire while the developer has fresh context.
Systematic backstop: validate_garden.py --freshness scans all entries and reports the overdue count. harvest REVIEW processes all overdue entries across all domains — the guarantee that no entry escapes review indefinitely.
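The always-on annotation layer can be pictured with a small sketch. It is illustrative only: it assumes staleness_threshold is expressed in days and that the advisory band begins at 80% of the threshold. Neither detail is stated above, and the function name annotate is hypothetical.

```python
from datetime import date
from typing import Optional

# Assumption: the advisory band starts at 80% of the threshold.
ADVISORY_FRACTION = 0.8


def annotate(last_verified: date, threshold_days: int, today: date,
             verified_on: Optional[str] = None) -> str:
    """Return the age annotation shown next to one search result."""
    age = (today - last_verified).days
    if age > threshold_days:
        # Past threshold: warn, and include the captured version if there is one.
        version = f", verified on {verified_on}" if verified_on else ""
        return (f"⚠️ stale: {age}d old, last verified "
                f"{last_verified.isoformat()}{version}")
    if age >= ADVISORY_FRACTION * threshold_days:
        return f"ℹ️ approaching threshold: {age}d of {threshold_days}d"
    return f"age: {age}d"
```

Because the annotation depends only on dates already stored in the entry, it can fire on every search without any maintenance run having taken place.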

Guarantees

Every search result shows age — staleness is never invisible
Entries past staleness_threshold are flagged before influencing behaviour
last_reviewed resets the staleness clock when a human explicitly confirms validity
harvest REVIEW provides full-garden coverage across all domains

Graceful degradation

SWEEP domain filtering requires session activity to identify domains — cold-start sessions skip the spot-check, but the always-on annotation still fires
harvest REVIEW requires deliberate scheduling; it is not triggered automatically by CI

Decision reasoning survives beyond the session that produced it

An entry that says "do X" is useful once. An entry that says "do X because Y, not Z because..." is useful across library upgrades, team changes, and context shifts. Without captured reasoning, a knowledge garden accumulates cargo-cult fixes — correct procedures whose rationale nobody remembers, applied in contexts where the original reason no longer holds.

Hortora treats reasoning as a first-class field, not an optional comment:

Mandatory rationale for high-value entries: Entries scoring ≥12/15 trigger an explicit prompt at CAPTURE time: "Why this fix over the obvious alternative?" The answer is written into a ### Why this fix prose section in the entry body — not a metadata tag, but a readable explanation that travels with the entry permanently.
Version-contextualised decisions: Library and tool-specific entries capture verified_on (e.g. quarkus: 3.34.2) at write time. This tells the next reader exactly when and where the reasoning was valid — not just what the fix was, but under what conditions it was the right call.
Reasoning is re-evaluated at review time: harvest REVIEW treats the rationale section as part of what is confirmed, not just the fix. A reviewer does not reset the staleness clock without affirming that both the workaround and its reasoning still apply. The last_reviewed timestamp records this explicit human confirmation.
Full decision history in git: Entry bodies are committed as plain markdown. If a rationale is revised during a staleness review, the original reasoning is preserved in git history — the evolution of the decision is always auditable, not just its current state.
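As a rough illustration of the rationale rule, the check below flags a high-scoring entry whose body has no non-empty "### Why this fix" section. The function name and the heading-parsing approach are assumptions made for this sketch, not the documented implementation.

```python
import re

RATIONALE_SCORE = 12  # from the text: entries scoring >=12/15 need a rationale
RATIONALE_HEADING = re.compile(r"^### Why this fix\b", re.MULTILINE)


def missing_rationale(score: int, body: str) -> bool:
    """True when a high-scoring entry lacks a non-empty rationale section."""
    if score < RATIONALE_SCORE:
        return False
    match = RATIONALE_HEADING.search(body)
    if match is None:
        return True
    # Keep only the text after the heading line, up to the next heading.
    rest = body[match.end():]
    rest = rest.split("\n", 1)[1] if "\n" in rest else ""
    section = re.split(r"^#{1,6} ", rest, maxsplit=1, flags=re.MULTILINE)[0]
    return not section.strip()
```

A check like this can confirm that a rationale exists, but it cannot judge whether the reasoning is sound. That remains a human responsibility at review time.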

Guarantees

Every entry scoring ≥12/15 carries explicit prose reasoning at submission time — high-value workarounds always document the decision, not just the fix
Reasoning is version-contextualised via verified_on where applicable — readers know when the decision was made, not just that it was made
Reasoning is re-evaluated at staleness review — last_reviewed confirms both the fix and its rationale still hold
Full decision history is auditable via git — no reasoning is silently overwritten

Graceful degradation

Entries scoring below 12/15 do not require explicit rationale: lower-scoring entries may document only the fix. The threshold is a quality floor, not universal coverage.
Rationale prose is written by the submitter at capture time; quality depends on the developer's context in that session — there is no automated reasoning validation
Cross-entry reasoning (architectural trade-offs spanning multiple entries) is not automatically linked — related decisions require manual See also: references added during harvest REVIEW

No entry enters the garden without passing validation

AI-generated content can be plausible-sounding but wrong, duplicate, or malformed. An unguarded knowledge store accumulates noise that erodes trust faster than it accumulates value — and bad entries are harder to find than good ones.

Every submission passes a multi-gate validation pipeline (validate_pr.py):

Format validation: YAML frontmatter must contain all required fields (id, title, type, domain, stack, tags, score, verified, staleness_threshold, submitted). Malformed entries fail immediately.
Score threshold: Entries are rated across five dimensions — non-obviousness, discoverability, breadth, pain/impact, longevity — and must reach ≥8/15. The dimensions are defined in the protocol spec and enforced by CI.
Injection detection: Entry content is scanned for prompt-injection patterns before it enters the garden. An entry that tries to modify Claude's behaviour at retrieval time is rejected.
L1 deduplication: Jaccard similarity is checked against existing entries in the same domain at submission time. Obvious duplicates are rejected before a human reviewer sees them.
CI enforcement: GitHub Actions runs validate_pr.py on every PR against the garden repo. Merge is blocked on any failure. No bypass path exists in the standard workflow.
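The first two gates can be sketched as follows. This is a minimal illustration that assumes the YAML frontmatter has already been parsed into a dict; validate_frontmatter is a hypothetical name and does not reflect the actual structure of validate_pr.py.

```python
REQUIRED_FIELDS = (
    "id", "title", "type", "domain", "stack", "tags",
    "score", "verified", "staleness_threshold", "submitted",
)
MIN_SCORE, MAX_SCORE = 8, 15  # threshold and maximum taken from the text


def validate_frontmatter(meta: dict) -> list[str]:
    """Return gate failures; an empty list means the entry passes both gates."""
    errors = [f"missing field: {name}"
              for name in REQUIRED_FIELDS if name not in meta]
    if "score" in meta:
        score = meta["score"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(score, bool) or not isinstance(score, int):
            errors.append("score must be an integer")
        elif score > MAX_SCORE:
            errors.append(f"score {score} above maximum {MAX_SCORE}")
        elif score < MIN_SCORE:
            errors.append(f"score {score} below threshold {MIN_SCORE}")
    return errors
```

Returning every failure at once, rather than stopping at the first, gives a submitter the full list to fix in a single CI round trip.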

Guarantees

No entry merges without passing format validation and the ≥8 score threshold
CI blocks merge on failure — there is no silent pass
Every entry carries a permanent, collision-resistant GE-ID (GE-YYYYMMDD-xxxxxx) assigned at write time with no central counter
All entry history is preserved in git — nothing is silently overwritten
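One way to produce an ID of the GE-YYYYMMDD-xxxxxx shape without a central counter is to combine the date with a random suffix. The sketch assumes the suffix is six lowercase hex characters; the text specifies only the overall pattern, so the suffix alphabet and the name new_ge_id are illustrative.

```python
import re
import secrets
from datetime import date

# Assumption: "xxxxxx" is six lowercase hexadecimal characters.
GE_ID_PATTERN = re.compile(r"^GE-\d{8}-[0-9a-f]{6}$")


def new_ge_id(today: date) -> str:
    """Build an ID from the date plus 3 random bytes rendered as 6 hex chars."""
    return f"GE-{today:%Y%m%d}-{secrets.token_hex(3)}"
```

Under this assumption there are 16^6 (about 16.7 million) possible suffixes per day, so independent sessions can mint IDs without coordination and collisions are unlikely at typical submission volumes.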

Graceful degradation

L1 Jaccard catches obvious duplicates; close variants can still enter — harvest DEDUPE provides L2/L3 semantic deduplication as a periodic maintenance sweep
Score is self-reported by the submitter; CI enforces the valid range (1–15) but not accuracy — periodic re-scoring improves signal over time

Concurrent sessions cannot corrupt garden state

Multiple AI sessions writing to a shared knowledge store simultaneously can produce partial writes, conflicting commits, and filesystem races — inconsistencies that are hard to detect and harder to recover from. A knowledge garden used across a team is always under concurrent write pressure.

Hortora uses git as the consistency layer:

Read discipline: All reads use git show HEAD:<path> — never the filesystem directly. A session mid-write to a file does not affect what another session reads; they always see the last committed state.
Write discipline: Every file write is followed immediately by git add and git commit. No uncommitted garden state persists between skill operations.
Conflict recovery: When two sessions commit simultaneously, the second receives a rejected push. Rebasing the local commit onto the updated remote branch (for example with git pull --rebase) resolves non-conflicting concurrent writes automatically, without data loss.
Human-readable format: YAML frontmatter + markdown — diffs are readable, history is auditable with git log and git blame, no binary formats.
Sparse blobless clone: Large gardens use --filter=blob:none --sparse at clone time. Clone size stays bounded as the garden grows to thousands of entries.
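The read and write disciplines above map onto a handful of git invocations. The helpers below are a sketch with hypothetical names; they build the argument lists separately from running them, so the commands can be inspected or logged before execution.

```python
import subprocess
from typing import List


def show_cmd(path: str) -> List[str]:
    """Command that reads a file as of the last commit, not the working tree.

    The path is interpreted relative to the repository root.
    """
    return ["git", "show", f"HEAD:{path}"]


def commit_cmds(path: str, message: str) -> List[List[str]]:
    """Commands that follow a file write: stage the file, then commit."""
    return [["git", "add", "--", path], ["git", "commit", "-m", message]]


def sparse_clone_cmd(url: str, dest: str) -> List[str]:
    """Command for a sparse, blobless clone of a large garden."""
    return ["git", "clone", "--filter=blob:none", "--sparse", url, dest]


def read_committed(repo: str, path: str) -> str:
    """Return the committed content of path inside the repository at repo."""
    result = subprocess.run(show_cmd(path), cwd=repo, check=True,
                            capture_output=True, text=True)
    return result.stdout
```

Reading through git show means a file that another session is halfway through writing is never observed; only content that has reached a commit is visible.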

Guarantees

Concurrent sessions cannot produce partially-visible entries — the commit is the write
Every read sees a complete, committed state regardless of concurrent write activity
Full audit trail: git history records who submitted what, when, from which session
Entries are never deleted — only deprecated or retired, with content preserved

Graceful degradation

Rebase recovery handles non-conflicting concurrent writes automatically; two sessions editing the same entry simultaneously require manual conflict resolution (rare — entries are append-only by convention)
Sparse blobless clone requires a git remote with partial clone support (GitHub, GitLab); local-only gardens fetch all blobs at clone time

Relevant entries surface; irrelevant ones don't

A flat keyword search across hundreds of entries produces two failure modes: false positives (cross-domain noise that wastes context budget and erodes trust) and false negatives (relevant entries missed because symptom keywords don't align). Both degrade the garden's value faster than new entries can recover it.

Hortora uses a three-tier retrieval algorithm with bounded context cost:

Tier 1 — Technology filter: Every entry is assigned a domain at write time (e.g. quarkus, java, tools). Retrieval scopes to the relevant domain first, so entries from other domains are never surfaced for an unrelated query.
Tier 2 — Symptom and label match: Within the domain, the index is searched by symptom type and label across three dimensions — By Technology, By Symptom/Type, By Label.
Tier 3 — Full domain scan: If tiers 1 and 2 produce no match, a full scan of the domain is performed. No entry in the relevant domain is unreachable regardless of keyword alignment.
Index pre-load, bodies on demand: The GARDEN.md index is loaded at session start. Entry bodies are only fetched when an entry is selected — context budget cost is proportional to entries read, not garden size.
Git-only reads: Index and entry bodies are always read from git show HEAD: — no partial-write races, no stale filesystem cache.
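The three tiers can be sketched over an in-memory index. The IndexRow shape and the retrieve function are assumptions made for illustration; the real GARDEN.md index layout is not described here, and the fallback is simplified to returning every row in the domain as a candidate.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class IndexRow:
    """Hypothetical index row: one line of the pre-loaded index."""
    ge_id: str
    domain: str
    symptoms: Set[str] = field(default_factory=set)
    labels: Set[str] = field(default_factory=set)


def retrieve(index: List[IndexRow], domain: str, terms: Set[str]) -> List[IndexRow]:
    """Return candidate rows; entry bodies would be fetched only once selected."""
    # Tier 1: scope to the requested domain.
    scoped = [row for row in index if row.domain == domain]
    # Tier 2: match query terms against symptoms and labels.
    matched = [row for row in scoped if terms & (row.symptoms | row.labels)]
    # Tier 3: if nothing matched, fall back to every row in the domain.
    return matched or scoped
```

Only index rows are touched during retrieval, which is why the context cost tracks the number of entries actually opened rather than the size of the garden.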

Guarantees

Technology scoping eliminates cross-domain false positives at query time
Three tiers ensure no entry in the relevant domain is unreachable
Context budget cost is bounded — the index pre-load is a single file regardless of garden size
Index is always committed state — no read can observe a partial write

Graceful degradation

Tier 3 (full domain scan) degrades in precision as a domain grows beyond ~200 entries — RAPTOR cluster summaries (Phase 8) will address this at scale
Keyword effectiveness is not automatically measured — entries with thin keywords may be missed until a maintainer updates them; harvest REVIEW is the natural point to catch this

The protocol scales beyond a single garden

A single canonical garden becomes a bottleneck at enterprise scale. Organisation-specific knowledge cannot sit alongside public community knowledge. Domain gardens need independent curation cadences. Yet independent forks fragment the protocol and eliminate sharing.

Hortora uses a three-tier federation model:

Canonical garden: Curated, CI-enforced, publicly reviewed. Source of truth for cross-domain entries. Maintained under the Hortora GitHub organisation.
Child garden: Inherits from a canonical garden and extends it with domain-specific or organisation-specific entries. Validation prevents duplication of canonical entries — a child garden adds without repeating.
Peer gardens: Independent gardens sharing the Hortora protocol. No inheritance — entries are entirely distinct — but the entry format, GE-ID scheme, and retrieval algorithm are compatible.
SchemaVer: Each entry declares its schema version. Mismatches between gardens are detectable at integration time rather than silent.
Open protocol: Entry format, validation pipeline, and retrieval algorithm are not Claude-specific. Any AI assistant implementing the forage/harvest skill convention can participate. The spec is published and versioned independently of any model or vendor.
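A schema mismatch check might look like the sketch below. It assumes entries use the common SchemaVer string form MODEL-REVISION-ADDITION (for example 1-0-2) and that a differing MODEL number is what counts as incompatible. Both points are assumptions; the text states only that each entry declares a schema version.

```python
from typing import NamedTuple


class SchemaVer(NamedTuple):
    model: int
    revision: int
    addition: int


def parse_schemaver(text: str) -> SchemaVer:
    """Parse a MODEL-REVISION-ADDITION string such as '1-0-2'."""
    parts = text.split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a SchemaVer string: {text!r}")
    return SchemaVer(*(int(p) for p in parts))


def check_compatible(entry_version: str, garden_version: str) -> None:
    """Raise at integration time if the MODEL numbers differ (assumed rule)."""
    entry = parse_schemaver(entry_version)
    garden = parse_schemaver(garden_version)
    if entry.model != garden.model:
        raise ValueError(
            f"schema mismatch: entry {entry_version} vs garden {garden_version}"
        )
```

Raising an exception instead of returning a flag reflects the stated goal that mismatches are detected at integration time rather than passing silently.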

Guarantees (current)

Entry format and GE-ID scheme are stable and versioned — entries written today will be valid in future schema versions or carry an explicit migration path
The protocol is vendor-neutral — no lock-in to a specific AI provider or model family
Enterprise air-gapped deployment is supported today via local-only mode — no GitHub remote required

Guarantees (Phase 5+)

Child garden validation prevents duplication of canonical entries
Schema version mismatches fail loudly at integration time, not silently at retrieval time
Compliant gardens can participate in federated retrieval — forage SEARCH can query across gardens in a single session

Graceful degradation

Federation is Phase 5+ — current production implementation is single-garden. The spec is published; multi-garden tooling is not yet deployed.

Duplicate knowledge is eliminated in layers, not all at once

A single dedup gate at submission time is either too strict (rejecting valid near-duplicates) or too permissive (letting close variants through). A knowledge garden under active contribution accumulates semantic drift that point-in-time checks cannot catch.

Hortora uses three-level deduplication with a drift counter:

L1 — Jaccard at PR time (validate_pr.py): Runs automatically on every submission. Compares the incoming entry against all existing entries in the same domain. Catches obvious duplicates and rejects them before any human review. Fires in CI — no operator action required.
L2 — Related entry detection (harvest DEDUPE): Periodic maintenance operation that compares all within-domain entry pairs not yet checked. Related entries — similar but legitimately distinct — are cross-referenced: See also: GE-XXXX is added to both. This improves retrieval coherence without discarding valid knowledge.
L3 — Duplicate consolidation (harvest DEDUPE): Within the same sweep, true duplicates are surfaced to the maintainer for consolidation. The less complete entry is retired; the more complete entry is preserved. Discarded entries are recorded in DISCARDED.md for audit.
Drift counter: GARDEN.md tracks an "Entries merged since last sweep" count. When it reaches the configured threshold (default: 10), harvest DEDUPE should be run. The counter is incremented by the CI integration step on every merge, so dedup debt is always visible.
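The L1 check and the drift counter are both simple to sketch. The tokenisation and the 0.8 similarity cut-off below are assumptions, since neither is given above; the sweep threshold of 10 is the stated default.

```python
import re
from typing import Set

DUPLICATE_THRESHOLD = 0.8  # hypothetical cut-off for "obvious duplicate"
SWEEP_THRESHOLD = 10       # default from the text


def tokens(text: str) -> Set[str]:
    """Lower-cased alphanumeric word set (assumed tokenisation)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity: |A intersect B| / |A union B| over token sets."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def is_l1_duplicate(incoming: str, existing: str) -> bool:
    return jaccard(incoming, existing) >= DUPLICATE_THRESHOLD


def sweep_due(merged_since_last_sweep: int) -> bool:
    """True once the drift counter reaches the configured threshold."""
    return merged_since_last_sweep >= SWEEP_THRESHOLD
```

Token-set Jaccard is cheap enough to run on every submission, but it sees only shared vocabulary. Entries that describe the same problem in different words pass through, which is the gap the L2 and L3 sweeps are there to cover.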

Guarantees

L1 catches obvious duplicates before merge — no obviously redundant entry enters through the standard workflow
Every dedup comparison is logged in CHECKED.md — pairs are never compared twice, and the full comparison history is auditable
Discarded entries are recorded in DISCARDED.md, not silently deleted — the decision to discard is always traceable
The drift counter makes dedup debt visible — maintainers always know how many entries have been added since the last sweep

Graceful degradation

L2/L3 (harvest DEDUPE) require a dedicated session with full context budget — they are not run automatically by CI
The drift counter triggers an obligation, not an enforcement gate — a garden can operate above threshold, but the debt is always visible
Cross-domain duplicates are not checked — entries in different domains are assumed distinct by definition

Full specification: The complete design — nine implementation phases, federation protocol, deduplication algorithm, and governance model — is in spec/docs/design/2026-04-07-garden-rag-redesign-design.md on GitHub.