github-backup/docs/15-operational-setup-2026-05-28.md


Operational Setup — 2026-05-28 Live Session

This document captures everything established during the first live Hermes session on 2026-05-28. It is the authoritative record of decisions made in Slack that were not committed to GitHub at the time.


Deployment State


Model Modes (Permanent)

Three modes. Switch in Slack using plain language: "use free mode", "switch to quality", etc.

| Mode | Model | Cost | Use for |
| --- | --- | --- | --- |
| Free (default) | openrouter/free | $0 | Everything. Auto-cycles free providers. |
| Budget | openrouter:deepseek/deepseek-v4-flash | ~$0.10/1M tok | Daily pulse, routine ops |
| Quality | openrouter:anthropic/claude-sonnet-4.6 | ~$3–5/day | Hard reasoning, strategic, client-facing |

Rules:

- Free mode is the permanent default. Never switch to a paid model automatically.
- If all free providers are rate-limited: fail loudly, notify Anthony, offer to temporarily use budget.
- For any one-off paid capability (web search, image gen, image analysis): quote estimated cost from live OpenRouter pricing and wait for explicit approval before proceeding.
- Switch back to free automatically once the paid task is done.
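The rules above can be read as a small decision function. The following is a minimal sketch, not the Hermes implementation: `Mode`, `Decision`, `choose_mode`, and `FreeProvidersExhausted` are hypothetical names introduced for illustration, and the model strings are copied from the mode table.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    FREE = "openrouter/free"
    BUDGET = "openrouter:deepseek/deepseek-v4-flash"
    QUALITY = "openrouter:anthropic/claude-sonnet-4.6"


class FreeProvidersExhausted(RuntimeError):
    """Raised when every free provider is rate-limited (fail loudly)."""


@dataclass
class Decision:
    mode: Mode
    message: str


def choose_mode(free_available: bool, paid_approved: bool = False,
                requested: Mode = Mode.FREE) -> Decision:
    """Pick the mode for one task under the rules above (hypothetical helper)."""
    if requested is not Mode.FREE:
        if not paid_approved:
            # Paid work needs a quoted cost and explicit approval first.
            return Decision(Mode.FREE, "Quote cost and wait for approval.")
        # Approved one-off task: the caller reverts to free once it is done.
        return Decision(requested, "Approved paid task; revert to free afterwards.")
    if not free_available:
        # Never fall through to a paid model silently.
        raise FreeProvidersExhausted(
            "All free providers rate-limited. Notify Anthony; offer budget mode.")
    return Decision(Mode.FREE, "Free mode (default).")
```

Raising an exception when free providers are exhausted, rather than returning a paid mode, keeps the "never switch automatically" rule structural: a paid mode can only come back when `paid_approved` is true.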


Free Mode Configuration

Applied to /srv/easier-hermes/data/config.yaml on 2026-05-28:

model:
  provider: openrouter
  model: "openrouter/free"
  fallback: ""  # No automatic paid fallback

auxiliary:
  default: { provider: openrouter, model: "openrouter/free" }
  title_generation: { provider: openrouter, model: "openrouter/free" }
  vision: { provider: openrouter, model: "openrouter/free" }
  compression: { provider: openrouter, model: "openrouter/free" }
  session_search: { provider: openrouter, model: "openrouter/free" }

tools:
  web: false
  browser: false
  image_gen: false

Note on openrouter/free: The auto-router cycles through all available free models (currently GPT-OSS-120B, LFM2.5, etc.). The :free suffix variants (e.g. deepseek-v4-flash:free) are rate-limited and unreliable — use the router, not named free variants.
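A check along the following lines could guard the configuration above against an accidental paid setting. It is a sketch under assumptions: `find_paid_settings` is a hypothetical helper, the config is passed as an already-parsed dictionary mirroring the YAML keys shown, and treating any enabled tool as a violation is an interpretation of the `tools` block rather than documented Hermes behaviour.

```python
FREE_MODEL = "openrouter/free"


def find_paid_settings(config: dict) -> list[str]:
    """Return dotted paths of settings that break the free-mode rules."""
    problems = []
    model = config.get("model", {})
    if model.get("model") != FREE_MODEL:
        problems.append("model.model")
    if model.get("fallback"):
        # Any non-empty fallback could route to a paid model automatically.
        problems.append("model.fallback")
    for name, entry in config.get("auxiliary", {}).items():
        if entry.get("model") != FREE_MODEL:
            problems.append(f"auxiliary.{name}.model")
    for tool, enabled in config.get("tools", {}).items():
        if enabled:
            # Assumption: enabled tools count as paid capabilities.
            problems.append(f"tools.{tool}")
    return problems


# Dictionary form of the config shown above.
config = {
    "model": {"provider": "openrouter", "model": FREE_MODEL, "fallback": ""},
    "auxiliary": {
        name: {"provider": "openrouter", "model": FREE_MODEL}
        for name in ("default", "title_generation", "vision",
                     "compression", "session_search")
    },
    "tools": {"web": False, "browser": False, "image_gen": False},
}
print(find_paid_settings(config))  # prints []
```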


Image Analysis in Free Mode


Context Window Management

Problem: OpenRouter logs showed 100k+ token context per call (~$0.05 each on DeepSeek).

Decisions made:

  1. Caveman memory: Store key facts in ultra-minimal language in long-term memory (GitHub vault).
     - Bad: "Sure! I'd be happy to help. The issue is most likely caused by..."
     - Good: "Bug: auth middleware. Token expiry: < not <="
  2. Short-term memory: Extract only essential context for active task; don't send full docs.
  3. Auto-compression: When context nears model limit, summarise older turns before next call.
  4. Session search: For recurring queries, retrieve from memory rather than re-derive.
  5. Delegate: Split large jobs into parallel sub-agents via delegate_task.
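The auto-compression decision can be sketched as follows. This is an illustration only, not how Hermes implements it: `estimate_tokens` and `compress_history` are hypothetical names, the four-characters-per-token estimate and the 0.8 threshold are assumptions, and the default summariser is a truncating placeholder where a real setup would call a model.

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic (assumption): roughly 4 characters per token.
    return max(1, len(text) // 4)


def compress_history(turns, limit_tokens, keep_recent=4, threshold=0.8,
                     summarise=None):
    """Summarise older turns once estimated context passes threshold * limit."""
    total = sum(estimate_tokens(t) for t in turns)
    if total < threshold * limit_tokens or len(turns) <= keep_recent:
        return list(turns)  # Under budget, or nothing old enough to fold away.
    split = len(turns) - keep_recent
    older, recent = turns[:split], turns[split:]
    if summarise is None:
        # Placeholder summariser: keep the first 80 characters of each turn.
        def summarise(ts):
            return " | ".join(t[:80] for t in ts)
    return ["[summary] " + summarise(older)] + list(recent)
```

Keeping the most recent turns verbatim and folding only older ones into a single summary entry preserves the active task's context while shrinking what is sent on the next call.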

Skills Created (2026-05-28)


Vault Structure (2026-05-28)

Created at /srv/easier-hermes/vault/:

vault/
  index.md                          # Navigation map
  log.md                            # Session log (first entry: 2026-05-28 handoff)
  raw/synthetic/
    marketing-overview-may-2026.md
    sales-pipeline-may-2026.md
    client-relationship-acme-2026-05.md
    fulfilment-monthly-report-may-2026.md
    operations-weekly-2026-05.md
    rd-research-log-may-2026.md
  briefs/coo-ai-ops-manager/
    dry-run-pulse-2026-05-28.md     # First COO daily pulse (format approved)
  evals/
    coo-eval-benchmark.md           # 15 eval questions
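Because these files live only on the server, a quick presence check may be useful before relying on them. The sketch below assumes nothing beyond the tree above; `EXPECTED` and `missing_vault_files` are hypothetical names, and the vault root is passed in by the caller.

```python
from pathlib import Path

# Relative paths taken from the vault tree above.
EXPECTED = [
    "index.md",
    "log.md",
    "raw/synthetic/marketing-overview-may-2026.md",
    "raw/synthetic/sales-pipeline-may-2026.md",
    "raw/synthetic/client-relationship-acme-2026-05.md",
    "raw/synthetic/fulfilment-monthly-report-may-2026.md",
    "raw/synthetic/operations-weekly-2026-05.md",
    "raw/synthetic/rd-research-log-may-2026.md",
    "briefs/coo-ai-ops-manager/dry-run-pulse-2026-05-28.md",
    "evals/coo-eval-benchmark.md",
]


def missing_vault_files(root) -> list[str]:
    """Return the expected vault files that are absent under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]
```

For example, `missing_vault_files("/srv/easier-hermes/vault")` returns an empty list when every file in the tree is present.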

Cron Jobs

| Job | Schedule | Status |
| --- | --- | --- |
| coo-daily-pulse | 08:00 UTC daily | Paused (format approved; resume when ready) |

To resume: hermes cron resume coo-daily-pulse
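To see when the pulse would next fire after resuming, the "08:00 UTC daily" schedule can be computed directly. This is a sketch independent of Hermes: `next_pulse` is a hypothetical helper, and whether Hermes stores the schedule as the standard five-field cron expression `0 8 * * *` is an assumption.

```python
from datetime import datetime, time, timedelta, timezone


def next_pulse(now: datetime) -> datetime:
    """Return the next 08:00 UTC strictly after `now` (must be timezone-aware)."""
    now = now.astimezone(timezone.utc)
    candidate = datetime.combine(now.date(), time(8, 0), tzinfo=timezone.utc)
    if candidate <= now:
        # Today's slot has passed (or is exactly now): use tomorrow's.
        candidate += timedelta(days=1)
    return candidate
```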


Decisions Made

| Decision | Choice |
| --- | --- |
| Free mode default | Permanent; never auto-switch to paid |
| Easier Now | Not a current concern; in-dev, hands off |
| Outreach | Not a current concern right now |
| Content pipeline attribution | Backlog; not yet |
| Vault sync to Obsidian | Not yet decided |
| Daily pulse delivery | #int-agentops, 08:00 UTC |
| Pulse format | Approved as-is |
| Model switching UI | Plain language in Slack ("use free mode" etc.) |
| Paid task approval | Must quote cost + get explicit approval |
| Context rot prevention | Caveman memory + short-term extraction |
| Slack tables | Use Block Kit JSON, not Markdown tables |
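The "Slack tables" decision can be illustrated with a small helper that turns rows into Block Kit JSON. This is one possible layout, not the format Hermes uses: `table_blocks` is a hypothetical name, and the choice of a header block followed by section blocks with two-column `fields` is an assumption. The limit of 10 fields per section block comes from Slack's Block Kit.

```python
import json

MAX_FIELDS = 10  # Slack allows at most 10 fields per section block.


def table_blocks(title, rows):
    """Render (label, value) rows as a header plus two-column section blocks."""
    blocks = [{"type": "header",
               "text": {"type": "plain_text", "text": title}}]
    fields = []
    for label, value in rows:
        # Fields render side by side, so each row is a label/value pair.
        fields.append({"type": "mrkdwn", "text": f"*{label}*"})
        fields.append({"type": "mrkdwn", "text": str(value)})
    # MAX_FIELDS is even, so a pair is never split across two sections.
    for i in range(0, len(fields), MAX_FIELDS):
        blocks.append({"type": "section", "fields": fields[i:i + MAX_FIELDS]})
    return blocks


rows = [("coo-daily-pulse", "Paused"), ("Schedule", "08:00 UTC daily")]
payload = json.dumps({"blocks": table_blocks("Cron Jobs", rows)}, indent=2)
```

The resulting `payload` string is the JSON body that would accompany a Slack message; sending it is left out of the sketch.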

What Was NOT Committed at the Time

The following were applied live on the server but not pushed to GitHub:

- Live config.yaml changes (now reflected in config.yaml.template)
- Skills (free-mode-automation, caveman-memory)
- Vault files (index.md, log.md, synthetic notes, dry-run pulse, evals)
- Cron job configuration

These remain on the GCE server. They are not version-controlled. See docs/16-hermes-git-workflow.md for how Hermes should commit going forward.