Edit: github-backup/docs/03-memory-layer-evaluation.md
# Memory Layer Evaluation

## Principle

Do not choose the memory layer by vibes. Choose it by benchmark.

The live second brain should remain Markdown/Obsidian-compatible. Retrieval indexes are rebuildable infrastructure, not the source of truth.

The memory layer must also pass the memory-filter test: it should help agents retrieve the right facts without encouraging bulk storage of raw transcripts, Slack history, Google Drive creatives or stale working notes.

## Candidate Layers

### Hermes Built-In Memory

Use for:

- Short, reviewed user and agent preferences.
- Stable operating rules.
- Compact identity context.

Do not use for:

- Client history.
- Relationship intelligence.
- Sales commitments.
- Detailed SOPs.

### Hermes `llm-wiki`

Use for:

- Raw source notes.
- Compiled entity, project, concept and decision pages.
- Provenance and confidence labels.

Strength:

- Official Hermes skill.
- Plain Markdown.
- Good fit for Obsidian.

Risk:

- Retrieval may remain too keyword-oriented for fuzzy founder recall.

### QMD

Use for:

- Local hybrid retrieval over Markdown.
- Fuzzy recall.
- Ranked snippets.

Strength:

- Aligns with Rhys Fisher's quiet-search critique.
- Keeps Markdown as source of truth.

Risk:

- Requires disk/model resources.
- Needs evaluation on Easier questions.

### gbrain

Use for:

- Candidate graph/RAG memory layer if it outperforms QMD and Hermes wiki.
- Possible later daemon for hybrid search, graph traversal, synthesis, gap analysis, skill packs and dream-cycle maintenance.

Strength:

- Mentioned in current Hermes practitioner guidance.
- The public README describes Markdown-backed brain repos, hybrid search, graph links, MCP support, access-scoped company-brain use and recurring maintenance.

Risk:

- Unknown fit for current VM.
- Should not be installed on the live n8n host until resource needs and data boundaries are clear.
- Its agent install protocol explicitly asks the operator to confirm search mode and cost posture; Easier should not silently accept expensive retrieval defaults.

## Benchmark Set

Create 40-60 questions from synthetic or approved notes.

Question classes:

- Exact source: "Where did this claim come from?"
- Fuzzy recall: "Who was worried about margin but liked creative testing?"
- Departmental: "What is the relationship process after onboarding?"
- Temporal: "What changed since the last weekly review?"
- Contradiction: "What evidence argues against this idea?"
- Decision: "What did we decide and why?"
- SOP: "What process applies here?"
- Safety: "Should the agent send this message?"

## Scoring

For each candidate:

- Top 5 retrieval accuracy.
- Citation quality.
- False confidence rate.
- Latency.
- Disk and memory use.
- Setup complexity.
- Ease of rebuild.
- Sensitivity leakage risk.
- Tendency to over-ingest raw artifacts.
- Quality of gap/staleness reporting.

Pass threshold:

- 80 percent of answerable benchmark questions have the right source in top 5.
- No sensitive data appears in an inappropriate answer.
- The system can say "not enough evidence".

## Initial Recommendation

Start with Hermes `llm-wiki` because it is official and simple. Add QMD or gbrain only after benchmark notes exist. Do not run local embedding/model indexing on the current n8n VM until disk and memory headroom improve.

Use gbrain's design as inspiration immediately, but treat installation as a separate benchmarked decision.
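
To make the pass threshold under Scoring concrete, the sketch below shows one way the three conditions could be checked once benchmark results have been recorded. It is only an illustration: `BenchmarkResult`, `top_k_hit`, `passes_threshold` and all field names are hypothetical and do not come from Hermes, QMD or gbrain. It also assumes one strict reading of "the system can say not enough evidence", namely that the candidate abstains on every question marked unanswerable.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BenchmarkResult:
    """One benchmark question and what a candidate layer returned (hypothetical shape)."""
    question: str
    answerable: bool                       # do the approved notes contain an answer?
    expected_source: Optional[str] = None  # note path that should be retrieved
    retrieved_sources: List[str] = field(default_factory=list)  # ranked, best first
    leaked_sensitive: bool = False         # reviewer flagged sensitive data in the answer
    abstained: bool = False                # candidate said "not enough evidence"


def top_k_hit(result: BenchmarkResult, k: int = 5) -> bool:
    """True if the expected source appears among the first k retrieved sources."""
    return result.expected_source in result.retrieved_sources[:k]


def passes_threshold(results: List[BenchmarkResult], k: int = 5,
                     min_accuracy: float = 0.80) -> bool:
    """Apply the three pass conditions from the Scoring section."""
    answerable = [r for r in results if r.answerable]
    unanswerable = [r for r in results if not r.answerable]
    if not answerable:
        return False  # nothing to measure accuracy on

    accuracy = sum(top_k_hit(r, k) for r in answerable) / len(answerable)
    no_leaks = not any(r.leaked_sensitive for r in results)
    # Assumption: abstaining on every unanswerable question counts as
    # "can say not enough evidence".
    can_abstain = all(r.abstained for r in unanswerable)
    return accuracy >= min_accuracy and no_leaks and can_abstain
```

The other scoring dimensions (latency, disk and memory use, setup complexity and so on) are not modeled here; they would need to be recorded separately for each candidate.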