# Operational Setup — 2026-05-28 Live Session

This document records the configuration and decisions established during the first live Hermes session on 2026-05-28.
It is the authoritative record of decisions made in Slack that were not committed at the time.

---

## Deployment State

- **Gateway**: Running on GCE instance `instance-20250306-165614` (us-central1-c)
- **Host**: `admin@34.27.189.109` via `~/.ssh/easier_validation_gce_ed25519`
- **Install path**: `/srv/easier-hermes`
- **Hermes version**: v2026.5.16
- **Slack channel**: #int-agentops (`C0B7JE5QYDN`)
- **Slack app**: @Easier
- **Config**: `/srv/easier-hermes/data/config.yaml`

---

## Model Modes (Permanent)

Three modes. Switch in Slack using plain language: "use free mode", "switch to quality", etc.

| Mode | Model | Cost | Use for |
|------|-------|------|---------|
| **Free (default)** | `openrouter/free` | $0 | Everything. Auto-cycles free providers. |
| **Budget** | `openrouter:deepseek/deepseek-v4-flash` | ~$0.10/1M tok | Daily pulse, routine ops |
| **Quality** | `openrouter:anthropic/claude-sonnet-4.6` | ~$3–5/day | Hard reasoning, strategic, client-facing |

**Rules:**
- Free mode is the permanent default. Never switch to a paid model automatically.
- If all free providers are rate-limited: fail loudly, notify Anthony, and offer to switch temporarily to budget mode.
- For any one-off paid capability (web search, image gen, image analysis): quote estimated cost from live OpenRouter pricing and wait for explicit approval before proceeding.
- Switch back to free automatically once the paid task is done.
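
The rules above can be read as a small decision procedure. The sketch below is illustrative only and is not Hermes's actual implementation: `Mode`, `parse_mode_request`, and `on_free_exhausted` are hypothetical names, and the keyword matching is deliberately naive compared with real Slack phrasing.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    FREE = "openrouter/free"
    BUDGET = "openrouter:deepseek/deepseek-v4-flash"
    QUALITY = "openrouter:anthropic/claude-sonnet-4.6"


DEFAULT_MODE = Mode.FREE  # Free mode is the permanent default.

# Hypothetical keyword table; an assumption made for this sketch.
_KEYWORDS = {"free": Mode.FREE, "budget": Mode.BUDGET, "quality": Mode.QUALITY}


def parse_mode_request(message: str) -> Optional[Mode]:
    """Return the mode named in a plain-language request, or None."""
    words = message.lower().split()
    # Only treat the message as a switch request if it contains a switch verb.
    if not any(verb in words for verb in ("use", "switch")):
        return None
    for word in words:
        mode = _KEYWORDS.get(word.strip(".,!?\"'"))
        if mode is not None:
            return mode
    return None


def on_free_exhausted() -> dict:
    """All free providers are rate-limited: fail loudly, never auto-upgrade."""
    return {
        "mode": Mode.FREE,     # stay on free; no automatic paid switch
        "notify": "Anthony",
        "offer": Mode.BUDGET,  # an offer only; a human must accept it
        "message": "All free providers are rate-limited. Temporarily use budget mode?",
    }
```

Returning an offer rather than a new mode keeps the "never switch to a paid model automatically" rule enforceable in one place.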

---

## Free Mode Configuration

Applied to `/srv/easier-hermes/data/config.yaml` on 2026-05-28:

```yaml
model:
  provider: openrouter
  model: "openrouter/free"
  fallback: ""  # No automatic paid fallback

auxiliary:
  default: { provider: openrouter, model: "openrouter/free" }
  title_generation: { provider: openrouter, model: "openrouter/free" }
  vision: { provider: openrouter, model: "openrouter/free" }
  compression: { provider: openrouter, model: "openrouter/free" }
  session_search: { provider: openrouter, model: "openrouter/free" }

tools:
  web: false
  browser: false
  image_gen: false
```

**Note on `openrouter/free`**: The auto-router cycles through all available free models
(currently GPT-OSS-120B, LFM2.5, etc.). The `:free` suffix variants (e.g. `deepseek-v4-flash:free`)
are rate-limited and unreliable, so use the router rather than named free variants.
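
A check along the following lines could catch drift away from the free-mode configuration above. It is a sketch under assumptions: `free_mode_violations` is a hypothetical helper, and it takes the config as an already-parsed dict (loading the YAML file itself would need a third-party parser and is left out).

```python
FREE_ROUTER = "openrouter/free"
PAID_TOOLS = ("web", "browser", "image_gen")  # tools disabled in free mode


def free_mode_violations(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means free mode holds."""
    problems = []

    model = config.get("model", {})
    if model.get("model") != FREE_ROUTER:
        problems.append("model.model is not openrouter/free")
    if model.get("fallback"):
        problems.append("model.fallback is set; no automatic paid fallback allowed")

    for task, entry in config.get("auxiliary", {}).items():
        if entry.get("model") != FREE_ROUTER:
            problems.append(f"auxiliary.{task} is not openrouter/free")

    tools = config.get("tools", {})
    for tool in PAID_TOOLS:
        # A missing key is not flagged here; only an explicitly enabled tool is.
        if tools.get(tool):
            problems.append(f"tools.{tool} is enabled")

    return problems


# The configuration from this section, expressed as a dict.
FREE_CONFIG = {
    "model": {"provider": "openrouter", "model": FREE_ROUTER, "fallback": ""},
    "auxiliary": {
        name: {"provider": "openrouter", "model": FREE_ROUTER}
        for name in ("default", "title_generation", "vision",
                     "compression", "session_search")
    },
    "tools": {"web": False, "browser": False, "image_gen": False},
}
```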

---

## Image Analysis in Free Mode

- `image_gen` is disabled (no image generation).
- Vision/image **analysis** still works via `auxiliary.vision`, which uses `openrouter/free`.
- If a free model can't handle an image, Hermes must quote cost and request approval before using a paid vision model.
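
The quote-then-approve step might look like the sketch below. All names (`Quote`, `estimate_cost_usd`, `format_quote`, `is_approved`) are hypothetical, the per-token price is a parameter that would have to come from live OpenRouter pricing, and the accepted approval words are an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    task: str
    model: str
    estimated_usd: float


def estimate_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Linear estimate: tokens times the per-million-token price."""
    return tokens / 1_000_000 * usd_per_million_tokens


def format_quote(quote: Quote) -> str:
    """Message posted to Slack before any paid call is made."""
    return (
        f"{quote.task} needs paid model {quote.model}. "
        f"Estimated cost: ${quote.estimated_usd:.4f}. "
        "Reply 'approve' to proceed."
    )


def is_approved(reply: str) -> bool:
    """Only an explicit approval counts; anything else means stay on free."""
    return reply.strip().lower() in {"approve", "approved"}
```

Treating every reply other than an explicit approval as a refusal matches the rule that paid capabilities wait for explicit approval.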

---

## Context Window Management

**Problem**: OpenRouter logs showed 100k+ token context per call (~$0.05 each on DeepSeek).

**Decisions made:**

1. **Caveman memory**: Store key facts in ultra-minimal language in long-term memory (GitHub vault).
   - Bad: "Sure! I'd be happy to help. The issue is most likely caused by..."
   - Good: "Bug: auth middleware. Token expiry: < not <="
2. **Short-term memory**: Extract only essential context for active task; don't send full docs.
3. **Auto-compression**: When context nears model limit, summarise older turns before next call.
4. **Session search**: For recurring queries, retrieve from memory rather than re-derive.
5. **Delegate**: Split large jobs into parallel sub-agents via `delegate_task`.
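
Decision 3 (auto-compression) can be sketched as follows. This is not how Hermes implements it: `compress_history` and `estimate_tokens` are hypothetical, the 4-characters-per-token heuristic is a rough assumption rather than a real tokenizer, and the `summarise` callback stands in for whatever model call produces the summary.

```python
from typing import Callable, List, Sequence


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def compress_history(
    turns: Sequence[str],
    limit: int,
    summarise: Callable[[Sequence[str]], str],
    keep_recent: int = 4,
    threshold: float = 0.8,
) -> List[str]:
    """Summarise older turns once the context nears the model limit.

    The most recent `keep_recent` turns are kept verbatim; everything
    before them is replaced by a single summary turn.
    """
    total = sum(estimate_tokens(turn) for turn in turns)
    if total < threshold * limit or len(turns) <= keep_recent:
        return list(turns)  # still comfortably under the limit

    split = len(turns) - keep_recent
    older, recent = turns[:split], turns[split:]
    return [summarise(older)] + list(recent)
```

Combining this with caveman-style summaries keeps the summary turn itself small.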

---

## Skills Created (2026-05-28)

- **`free-mode-automation`** — Manages model switching, cost quoting, Slack workflows.
  Location: `/srv/easier-hermes/data/skills/free-mode-automation/`
- **`caveman-memory`** — Stores critical info in ultra-minimal format.
  Location: `/srv/easier-hermes/data/skills/caveman-memory/`
- **`project-onboarding`** — Created via self-improvement review.

---

## Vault Structure (2026-05-28)

Created at `/srv/easier-hermes/vault/`:

```
vault/
  index.md                          # Navigation map
  log.md                            # Session log (first entry: 2026-05-28 handoff)
  raw/synthetic/
    marketing-overview-may-2026.md
    sales-pipeline-may-2026.md
    client-relationship-acme-2026-05.md
    fulfilment-monthly-report-may-2026.md
    operations-weekly-2026-05.md
    rd-research-log-may-2026.md
  briefs/coo-ai-ops-manager/
    dry-run-pulse-2026-05-28.md     # First COO daily pulse (format approved)
  evals/
    coo-eval-benchmark.md           # 15 eval questions
```

---

## Cron Jobs

| Job | Schedule | Status |
|-----|----------|--------|
| `coo-daily-pulse` | 08:00 UTC daily | Paused (format approved; resume when ready) |

To resume: `hermes cron resume coo-daily-pulse`
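
For reference, "08:00 UTC daily" corresponds to the standard five-field cron expression `0 8 * * *` when the scheduler runs in UTC; how Hermes stores the schedule internally is not recorded here. The hypothetical helper below sketches how the next run time could be computed when checking the schedule after resuming.

```python
from datetime import datetime, timedelta, timezone


def next_pulse(now: datetime) -> datetime:
    """Return the next 08:00 UTC strictly after `now`.

    `now` should be timezone-aware; a naive datetime would be
    interpreted as local time by astimezone().
    """
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=8, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate
```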

---

## Decisions Made

| Decision | Choice |
|----------|--------|
| Free mode default | Permanent — never auto-switch to paid |
| Easier Now | Not a current concern; in-dev, hands off |
| Outreach | Not a current concern |
| Content pipeline attribution | Backlog — not yet |
| Vault sync to Obsidian | Not yet decided |
| Daily pulse delivery | #int-agentops, 8am UTC |
| Pulse format | Approved as-is |
| Model switching UI | Plain language in Slack ("use free mode" etc.) |
| Paid task approval | Must quote cost + get explicit approval |
| Context rot prevention | Caveman memory + short-term extraction |
| Slack tables | Use Block Kit JSON, not Markdown tables |
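
The "Slack tables" decision could be implemented roughly as below for two-column tables, using Block Kit `section` blocks with `fields`. This is a sketch, not the format Hermes was observed to emit: `table_to_blocks` is a hypothetical helper, and restricting it to two columns is a simplification that matches how Slack lays out section fields side by side.

```python
from typing import Dict, List, Sequence


def table_to_blocks(headers: Sequence[str],
                    rows: Sequence[Sequence[str]]) -> List[Dict]:
    """Render a two-column table as Block Kit section blocks."""
    if len(headers) != 2 or any(len(row) != 2 for row in rows):
        raise ValueError("this sketch only supports two-column tables")

    def section(cells: Sequence[str], bold: bool = False) -> Dict:
        return {
            "type": "section",
            "fields": [
                {"type": "mrkdwn", "text": f"*{cell}*" if bold else str(cell)}
                for cell in cells
            ],
        }

    # Header row in bold, followed by a divider, then one section per row.
    blocks = [section(headers, bold=True), {"type": "divider"}]
    blocks.extend(section(row) for row in rows)
    return blocks
```

The resulting list can be serialised with `json.dumps` and sent as the `blocks` argument of Slack's `chat.postMessage`.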

---

## What Was NOT Committed at the Time

The following were applied live on the server but not pushed to GitHub:
- Live `config.yaml` changes (now reflected in `config.yaml.template`)
- Skills (`free-mode-automation`, `caveman-memory`)
- Vault files (index.md, log.md, synthetic notes, dry-run pulse, evals)
- Cron job configuration

These remain only on the GCE server and are not version-controlled.
See `docs/16-hermes-git-workflow.md` for how Hermes should commit going forward.