@SyedAbdurR2hman
54 views May 09, 2026

Every Hermes Agent guide you'll find right now covers the same three things: run the one-line installer, run `hermes setup`, connect a Telegram bot. The DataCamp tutorial, the NxCode walkthrough, the Braincuber guide — all of them stop there.

That's fine if you want a chatbot that survives a reboot. It's not enough if you want an agent that actually operates autonomously, builds its own tools, runs a self-hosted search stack, delegates to local models, and executes complex multi-step workflows without you.

I've been running Hermes Agent v0.12.0 with 145 skills across 28 categories, the full built-in tool surface plus a handful of custom ones, a SearXNG search stack aggregating five engines, Blender 3D rendering, multi-agent WoT reasoning, and a supervisor/worker inference architecture. I also have 4 open PRs in the Hermes codebase — including the Web-of-Thought engine and native desktop control.

This is the guide that doesn't exist yet.

First: Understand What Hermes Actually Is

The official docs describe Hermes as "the agent that grows with you." That's accurate but undersells the architecture.

Hermes isn't a chatbot wrapper. It's an agent runtime — a persistent process that:

  • Maintains memory and learned workflows across sessions indefinitely
  • Executes real system tools (terminal, browser, file system, email, local LLMs, 3D software)
  • Builds and reuses skills — structured workflow documents it loads before tasks
  • Runs scheduled jobs unattended via a built-in cron system
  • Delegates subtasks to sub-agents and reviews their output before shipping

The mental model shift: you're not prompting it. You're configuring a system that operates on your behalf. The difference shows up after week two, not day one.

    The Inference Architecture Nobody Talks About

    Every guide tells you to pick a model. Nobody tells you to run two tiers.

    The setup that actually scales is a supervisor/worker split:

    ```plaintext
    
    Supervisor — cloud model, large, capable, expensive
    │
    ├── Worker (leaf task) — local model, small, fast, free
    ├── Worker (leaf task) — local model, small, fast, free
    └── Worker (leaf task) — local model, small, fast, free
    
    ```

    My setup:

  • Supervisor: DeepSeek V4 Flash via OpenRouter. 284B MoE with 13B active parameters, 1M context window. Handles all reasoning, orchestration, user-facing responses, and final review.
  • Workers: Qwen3 family (0.6B, 1.7B, 4B) running locally via Ollama on an RTX 4050. Handle leaf tasks — translate 20 messages in parallel, classify 50 leads, summarize a 3,000-word document, extract structured data from a batch of emails.

    The pattern is non-negotiable: workers never reply to the user directly. Every worker output goes through the supervisor before shipping. Workers are leaf executors. The supervisor decomposes, delegates, reviews, and assembles.

    Why this matters economically: bulk operations (classify 50 items, translate 20 messages) run at essentially zero cost on local models. You only pay for frontier-model inference on tasks that actually require it. On a $5/month VPS or a consumer laptop, this architecture lets you run production-grade workloads cheaply.
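
The supervisor/worker pattern above can be sketched in a few lines. This is only an illustration, not Hermes code: `run_leaf_tasks`, `toy_worker`, and `toy_review` are hypothetical names, and the toy functions stand in for a local model call and a supervisor review call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical stand-ins: a real worker would call a local model,
# and a real reviewer would call the cloud supervisor model.
Worker = Callable[[str], str]
Reviewer = Callable[[str, str], bool]


def run_leaf_tasks(items: List[str], worker: Worker, review: Reviewer,
                   max_parallel: int = 4) -> List[str]:
    """Fan leaf tasks out to workers, then gate every result through review."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        drafts = list(pool.map(worker, items))  # result order matches `items`

    approved = []
    for item, draft in zip(items, drafts):
        # Workers never ship directly: a rejected draft is flagged, not returned.
        if review(item, draft):
            approved.append(draft)
        else:
            approved.append(f"[needs supervisor rework] {item}")
    return approved


def toy_worker(text: str) -> str:
    return text.upper()


def toy_review(original: str, draft: str) -> bool:
    return draft != "" and len(draft) == len(original)


results = run_leaf_tasks(["hello", "", "world"], toy_worker, toy_review)
print(results)
```

The point of the shape is that the review step sits between every worker output and the caller, so a bad draft from a small model is caught before anything reaches the user.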

    To set this up, configure your supervisor in `~/.hermes/config.yaml`:

    ```yaml
    model:
      default: deepseek/deepseek-v4-flash
      provider: openrouter
      base_url: https://openrouter.ai/api/v1
      context_length: 128000
    
    ```

    Then define routing rules in a `local-llm` skill (more below). For Ollama workers, make sure your context length is adequate:

    ```bash
    OLLAMA_CONTEXT_LENGTH=32768 ollama serve
    # Or set permanently in /etc/systemd/system/ollama.service
    
    ```
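
For a sense of what a worker call looks like, here is a rough sketch against Ollama's HTTP API (`/api/generate` on the default port 11434). The routing table and the function names are my assumptions for illustration; the model tags assume you pulled the Qwen3 variants under these names, and this is not how the `local-llm` skill itself is written.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port

# Assumed routing table: task kind -> Ollama model tag. Adjust to what you pulled.
WORKER_MODELS = {
    "classify": "qwen3:0.6b",
    "translate": "qwen3:1.7b",
    "summarize": "qwen3:4b",
}


def build_request(task_kind: str, prompt: str) -> dict:
    """Build a non-streaming /api/generate payload for the chosen worker."""
    # Unknown task kinds fall back to the largest local worker.
    model = WORKER_MODELS.get(task_kind, "qwen3:4b")
    return {"model": model, "prompt": prompt, "stream": False}


def ask_worker(task_kind: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    body = json.dumps(build_request(task_kind, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With Ollama running, `ask_worker("classify", "Is this lead B2B or B2C? ...")` would return the model's text. Nothing here talks to the user; the result is meant to go back to the supervisor.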

    Search — The Foundation Most People Skip

    The default Hermes web search works. You'll still hit rate limits and quality ceilings quickly if you're doing real research.

    My setup: SearXNG running in Docker, aggregating Google, Bing, DuckDuckGo, Brave, and Startpage into one query. No API keys. No rate limits. No per-search cost.

    ```bash
    docker run -d \
    --name searxng \
    -p 8888:8080 \
    -e SEARXNG_SECRET_KEY=$(openssl rand -hex 32) \
    searxng/searxng
    
    ```

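    Then add a thin Python wrapper at `~/.local/bin/ws-search` that makes it callable from any skill. The wrapper requests `format=json`, which SearXNG rejects unless `json` is listed under `search.formats` in its `settings.yml`: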

    ```python
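    #!/usr/bin/env python3
    """Query the local SearXNG instance and print the top results as plain text."""
    import sys
    import requests

    query = " ".join(sys.argv[1:])
    r = requests.get(
        "http://localhost:8888/search",
        params={"q": query, "format": "json", "language": "en"},
        timeout=10,
    )
    r.raise_for_status()  # fail loudly if SearXNG is down or JSON output is disabled
    for result in r.json().get("results", [])[:10]:
        print(f"{result['title']}\n{result['url']}\n{result.get('content', '')}\n")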
    
    ```

    Now any skill or tool call can run `ws-search "query"` via terminal and get multi-engine results as plain text. Zero API keys in the loop.
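
If you want to test the output formatting without a running SearXNG instance, one option is to pull it into a pure function. This is a sketch under my own assumptions: `format_results` is a hypothetical helper, and the only payload fields it relies on are the `results`, `title`, `url`, and `content` keys the wrapper already uses.

```python
from typing import Any, Dict, List


def format_results(payload: Dict[str, Any], limit: int = 10) -> str:
    """Render a SearXNG-style JSON payload as plain text blocks.

    Results without a URL are skipped, and a missing title or snippet
    is tolerated instead of raising KeyError.
    """
    blocks: List[str] = []
    for result in payload.get("results", []):
        url = result.get("url", "")
        if not url:
            continue
        title = result.get("title", "(no title)")
        snippet = result.get("content", "")
        blocks.append(f"{title}\n{url}\n{snippet}".rstrip())
        if len(blocks) == limit:
            break
    return "\n\n".join(blocks)


sample = {"results": [
    {"title": "A", "url": "https://a.example", "content": "first"},
    {"title": "No link here", "content": "skipped"},
    {"title": "B", "url": "https://b.example"},
]}
text = format_results(sample)
print(text)
```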

    I also keep `ddgr` (DuckDuckGo CLI) as a lightweight fallback, and `lynx` (text browser) for actually reading pages that search returns. Standard pipeline:

    ```bash
    ws-search "query" | head -20
    # → grab URLs
    lynx -dump -nolist https://example.com
    # → readable text for the model
    ```

    Pair with `tesseract` for OCR on images/PDFs and `ImageMagick convert` for extraction. Complete document intelligence pipeline, no cloud dependencies beyond the LLM.

    Skills — The Actual Power of Hermes

    The official quickstart mentions skills. It doesn't tell you how to think about them properly.

    A skill is a markdown file at `~/.hermes/skills///SKILL.md`. It tells Hermes exactly how to handle a specific type of task — what tools to use, in what order, with what parameters, producing what output. The model reads this skill at the start of a task and follows the procedure.

    Skills are the difference between Hermes doing something once and Hermes doing it reliably every time.

    Minimal skill structure:

    ```markdown
    ---
    name: currency-rates
    description: Fetch real-time exchange rates for any major currency pair
    version: 1.0.0
    platforms: [linux, macos]
    ---
    ## When to use
    Trigger on: "what's the exchange rate", "convert  to ",
    any currency conversion request.
    ## Procedure
    1. Run: `ws-search " exchange rate today xe.com"`
    2. Fetch top result: `lynx --dump {url} | grep -A2 "1 "`
    3. Extract rate, timestamp, source URL
    4. If SearXNG is down, fallback: `ddgr " to  rate today"`
    ## Output
    Single line: "1  = {rate}  as of {time} (source: {url})"
    ## Notes
    Never invent a rate. If extraction fails, say so and link xe.com directly.
    
    ```

    That's a production skill from my stack. Simple, reliable, parameterized, explicit about failure modes.
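
A quick way to catch a malformed SKILL.md before the agent ever loads it is a small frontmatter check. The sketch below is my own illustration, not a Hermes utility: the list of required keys is an assumption taken from the example above, and the parser only handles flat `key: value` lines.

```python
from typing import Dict, List

# Assumed minimum set of keys, taken from the example skill above.
REQUIRED_KEYS = ("name", "description", "version")


def parse_frontmatter(text: str) -> Dict[str, str]:
    """Parse flat `key: value` frontmatter delimited by `---` lines."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter line")
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    else:
        # The loop ended without seeing the closing delimiter.
        raise ValueError("frontmatter is missing its closing '---' line")
    return fields


def missing_keys(text: str) -> List[str]:
    """Return the required keys that are absent or empty."""
    fields = parse_frontmatter(text)
    return [key for key in REQUIRED_KEYS if not fields.get(key)]


skill = """---
name: currency-rates
description: Fetch real-time exchange rates for any major currency pair
platforms: [linux, macos]
---
## When to use
"""
problems = missing_keys(skill)
print(problems)
```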

    How I actually build skills — the AI-assisted workflow:

  • Do the task manually once, noting every step in order
  • Tell Claude: "Write a Hermes Agent SKILL.md for this task. The procedure is: [exact steps]. It should be triggered by: [phrases]. Output should be: [format]."
  • Paste the output into `~/.hermes/skills///SKILL.md`
  • Test it with the trigger phrase from the skill
  • Fix what breaks, iterate SKILL.md once or twice
  • Done — it works reliably from now on

    Most skills stabilize in 2-3 iterations. After that they run unsupervised.

    I now have 145 skills across 28 categories. They compound — a research skill calling a verification skill calling a draft-response skill is a complete end-to-end pipeline triggered by one message. Each skill I add makes adjacent skills more powerful.

    The autonomous skill creation rule I added to my SOUL.md:

    If a task required 5+ tool calls and no existing skill matched cleanly, create a new skill before finishing the response.

    This is how the library grows organically. The agent builds its own tools.

    To install community skills from the Skills Hub:

    ```bash
    hermes skills search 
    hermes skills install 
    
    ```

    Tools — Extending What Hermes Can Do

    Skills define *how* to do something. Tools define *what Hermes can physically do*.

    Hermes ships with 40+ built-in tools: terminal, browser, file operations, vision analysis, memory, kanban, cron, MCP servers, and more. But the real power is extending it.

    Every tool is a Python file in `~/.hermes/hermes-agent/tools/`. Drop a file there, restart the gateway, and the tool is available.

    Minimal tool structure — Hermes uses an explicit `registry.register(...)` call at module top (the AST tool-discovery scans for this exact pattern):

    ```python
    import subprocess

    from tools.registry import registry

    # JSON-schema description the model reads when deciding whether to call the tool.
    BUCK_SEARCH_SCHEMA = {
        "name": "buck_search",
        "description": (
            "Multi-engine web search via SearXNG. Returns top 10 results as plain text. "
            "Use for any research query."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": (
                        "Search query. Be specific. "
                        "Example: 'open-source vector databases benchmark'"
                    ),
                },
            },
            "required": ["query"],
        },
    }


    def _buck_search_handler(args, **kw):
        """Run the ws-search CLI wrapper and return its stdout as the tool result."""
        result = subprocess.run(
            ["ws-search", args["query"]],
            capture_output=True, text=True, timeout=15,
        )
        return result.stdout or "No results found."


    # Module-level registration; tool discovery scans for this call.
    registry.register(
        name="buck_search",
        toolset="search",
        schema=BUCK_SEARCH_SCHEMA,
        handler=_buck_search_handler,
        check_fn=lambda: True,
        requires_env=[],
        is_async=False,
        description="Multi-engine SearXNG search.",
        emoji="🔎",
    )
    ```

    The `description` inside the schema is what the model reads to decide when to use the tool. Write it like you're explaining it to a colleague — be specific about what it does and when.

    For tools that depend on system capabilities, use `check_fn` to make them gracefully absent:

    ```python
    import shutil

    from tools.registry import registry


    def _xdotool_available():
        """Report whether the xdotool binary is on PATH."""
        return shutil.which("xdotool") is not None


    # _computer_use_linux_handler is elided here, as are the schema details.
    registry.register(
        name="computer_use_linux",
        toolset="computer_use",
        schema={
            "name": "computer_use_linux",
            "description": "Control Linux desktop via xdotool — click, type, screenshot, get active window.",
            "parameters": {"type": "object", "properties": {...}, "required": [...]},
        },
        handler=_computer_use_linux_handler,
        check_fn=_xdotool_available,
        requires_env=[],
        is_async=False,
        description="Native Linux desktop control.",
        emoji="",
    )
    ```

    When `check_fn` returns False, the model never sees the tool. No error handling needed in skills. The tool simply doesn't exist on systems where it can't work. (Result is cached for 30s by the registry, so probing live state in `check_fn` is cheap.)
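
To make the 30-second caching idea concrete, here is a minimal sketch of a time-to-live wrapper around a check function. It illustrates the general technique only; I am not claiming this is how the Hermes registry implements it, and `ttl_cached` and the injectable `clock` are hypothetical names.

```python
import time
from typing import Callable, Dict, Tuple


def ttl_cached(check: Callable[[], bool], ttl_seconds: float = 30.0,
               clock: Callable[[], float] = time.monotonic) -> Callable[[], bool]:
    """Wrap a zero-argument check so it runs at most once per TTL window."""
    state: Dict[str, Tuple[float, bool]] = {}

    def wrapped() -> bool:
        now = clock()
        cached = state.get("value")
        if cached is not None and now - cached[0] < ttl_seconds:
            return cached[1]          # still fresh: reuse the last answer
        result = bool(check())
        state["value"] = (now, result)
        return result

    return wrapped


# Demonstration with a fake clock so the behaviour is deterministic.
calls = []
fake_now = [0.0]


def probe() -> bool:
    calls.append(1)
    return True


cached_probe = ttl_cached(probe, ttl_seconds=30.0, clock=lambda: fake_now[0])
cached_probe()
cached_probe()        # served from cache, probe is not called again
fake_now[0] = 31.0
cached_probe()        # TTL expired, probe runs a second time
print(len(calls))
```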

    After dropping a new tool file, restart the gateway so the AST tool-discovery picks it up:

    ```bash
    # Find the running gateway process and kill it, then re-run:
    pkill -f "hermes_cli.main gateway"
    hermes gateway run --replace
    # Or if you have it installed as a systemd user service:
    systemctl --user restart hermes-gateway
    
    ```

    Custom helpers I've built that transformed my setup:

    - `~/.local/bin/ws-search` — SearXNG Python wrapper, callable from any skill via terminal

    - `~/.local/bin/send-email` — Python/smtplib SMTP wrapper for outbound email from skills

    - `~/.local/bin/save-list` — takes JSON structured data, outputs formatted Markdown or PDF

    - `~/.local/bin/gateway-boot-notify.sh` + systemd user unit — sends Telegram ping when gateway starts

    None of these exceed 100 lines. Each unlocks an entire category of workflows.
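
As an example of how small these helpers can be, here is a sketch of what a `send-email` style wrapper might look like with the standard library. It is not my actual script: the environment variable names (`SMTP_HOST`, `SMTP_PORT`, `SMTP_USER`, `SMTP_PASSWORD`) are hypothetical, and it assumes a server that accepts STARTTLS on the submission port.

```python
import os
import smtplib
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Assemble a plain-text email."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send(msg: EmailMessage) -> None:
    """Send via SMTP with STARTTLS; credentials come from the environment."""
    # Hypothetical variable names; use whatever your setup already defines.
    host = os.environ["SMTP_HOST"]
    port = int(os.environ.get("SMTP_PORT", "587"))
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.starttls()
        smtp.login(os.environ["SMTP_USER"], os.environ["SMTP_PASSWORD"])
        smtp.send_message(msg)


draft = build_message("agent@example.com", "me@example.com",
                      "Daily summary", "3 new leads classified.")
```

Keeping message construction separate from sending means a skill can build and inspect a draft without any network access, then call `send(draft)` only once the supervisor has approved it.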

    The SOUL.md — Where Most People Leave Power on the Table

    `~/.hermes/SOUL.md` is the system prompt for your entire agent. It runs before every response. Most people leave it as the default 30 lines. I have 200+ lines. The difference is enormous.

    Demeanor block. Tell it exactly how to behave. Vague instructions produce inconsistent behavior. Specific instructions — "no filler words, lead with results, one composed line beats three eager ones, never say 'Certainly!' or 'Great question!'" — produce reliable behavior.

    Mandatory first action. The most impactful addition I made:

    ```markdown
    ## MANDATORY FIRST ACTION
    Before composing ANY substantive reply:
    1. Scan available skills. Score top 2-3 candidates. Load the best one via skill_view().
    2. If even 1% underconfident about the best tool, library, or method — search online first.
    Do not guess. Do not proceed on stale knowledge. Search, verify, then act.
    ```

    Without this, the model improvises from scratch every time — ignoring skills you spent hours building. With it, skills actually get used consistently.

    Machine context. Tell it your actual hardware limits. VRAM ceiling, installed CLIs, what's NOT installed. Prevents it from trying to use tools that don't exist.

    Pull your specs:

    ```bash
    uname -a
    lscpu | grep -E "Model|Thread|Core"
    free -h
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader 2>/dev/null
    ```

    Autonomous skill creation rule:

    ```markdown
    When a task required 5+ tool calls and no existing skill matched cleanly,
    and the task type could repeat — create a new skill before finishing.
    skill_manage(action="create", category="", name="", content="")
    ```

    Multi-Agent Reasoning — The WoT Engine

    For hard decisions, I use Web-of-Thought (WoT) reasoning: a multi-agent coordinator where specialized agents debate and refine an answer before it reaches you. The engine is one of my open PRs against Hermes (#20158).

    The idea: instead of one model answering a hard question, spawn 3-7 agents with different perspectives, let them communicate directly across multiple rounds, synthesize the transcript.

    ```plaintext
    hermes: use wot_chat — agent 1 argues the case for,
    agent 2 argues the case against, agent 3 synthesizes —
    decision: should we adopt async or threads for this worker pool?
    
    ```

    Four communication modes:

    - `parallel` — all agents react simultaneously, see peers on round boundaries

    - `streaming` — same, but agents see partial CoT tokens as they're generated

    - `sequential` — round-robin, each agent gets full prior transcript

    - `queue` — tag-driven, agents only act when their relevant tag appears

    When to use WoT:

    - Hard tradeoffs with real downside risk

    - Decisions needing adversarial pressure

    - Complex plans worth red-teaming before committing

    When NOT to use it:

    - Simple lookups

    - Single-fact questions

    - Anything a tool call can answer in one step

    Cost discipline: set `max_rounds: 2-3` for most tasks. Token cost scales with rounds × agents × length.
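
A back-of-the-envelope estimate shows why the round cap matters. The sketch below uses the rounds × agents × length product for output tokens and adds one assumption of mine for input tokens: each turn re-reads every turn before it, as in a sequential-style transcript. System prompts and tool output are ignored, and `wot_token_estimate` is a hypothetical helper.

```python
def wot_token_estimate(agents: int, rounds: int, tokens_per_turn: int) -> dict:
    """Rough token budget for a multi-agent debate."""
    turns = agents * rounds
    output_tokens = turns * tokens_per_turn
    # Assumed upper bound: turn k re-reads the k-1 turns before it,
    # so input grows with turns * (turns - 1) / 2.
    input_tokens = tokens_per_turn * (turns * (turns - 1) // 2)
    return {"turns": turns, "output_tokens": output_tokens, "input_tokens": input_tokens}


small = wot_token_estimate(agents=3, rounds=2, tokens_per_turn=400)
large = wot_token_estimate(agents=7, rounds=5, tokens_per_turn=400)
print(small)
print(large)
```

Under these assumptions, going from 3 agents and 2 rounds to 7 agents and 5 rounds multiplies output tokens by about 6 but input tokens by about 40.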

    Channels — Running 24/7 Without Being There

    This is where Hermes stops being a CLI tool and becomes an actual agent.

    Set up the messaging gateway:

    ```bash
    hermes gateway setup
    # Follow prompts to connect Telegram, Discord, Slack, WhatsApp, Signal, Email
    hermes gateway install # systemd/launchd service — survives reboots
    
    ```

    My active channels:

  • Telegram — primary. Send tasks from my phone at 11pm, wake up to results in the morning.
  • Discord — secondary, better for longer structured outputs.
  • Email — Hermes monitors inbox, can draft and send replies autonomously.

    The gateway runs as a background service. The agent is always on, accessible from anywhere.

    The Kanban Board — For Long-Running Work

    The official kanban tutorial exists but almost nobody uses it in practice.

    For multi-step workflows that span days — lead pipelines, research projects, content schedules — the SQLite-backed kanban gives Hermes durable state across gateway restarts.

    ```bash
    hermes kanban init
    hermes dashboard # opens http://127.0.0.1:9119
    
    ```

    Create dependent task chains:

    ```bash
    TASK1=$(hermes kanban create "Research candidates" --assignee hermes --json | jq -r .id)
    hermes kanban create "Draft outreach" --parent $TASK1 --assignee hermes
    ```

    The second task won't become available until the first completes. This is how you build workflows that run across days without micromanaging.
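
The gating rule is simple enough to model in a few lines. This sketch is only an illustration of parent/child availability, not the kanban's actual schema: the `Task` fields and the `available` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Task:
    id: str
    title: str
    parent: Optional[str] = None
    done: bool = False


def available(tasks: Dict[str, Task]) -> List[str]:
    """Return ids of open tasks whose parent (if any) is already done."""
    ready = []
    for task in tasks.values():
        if task.done:
            continue
        if task.parent is None or tasks[task.parent].done:
            ready.append(task.id)
    return ready


board = {
    "t1": Task("t1", "Research candidates"),
    "t2": Task("t2", "Draft outreach", parent="t1"),
}
before = available(board)   # only the research task can be picked up
board["t1"].done = True
after = available(board)    # the outreach task is now unblocked
print(before, after)
```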

    Cron — Set It and Forget It

    Most guides mention cron exists. Nobody shows you how to actually use it.

    ```plaintext
    # Natural language scheduling
    hermes: schedule a task every Monday at 9am — pull currency rates and send to Telegram
    hermes: every day at 7am — check for new emails and summarize them
    hermes: every Sunday at 6pm — research this week's relevant industry news
    
    ```

    Hermes translates these into actual cron specs and registers them in `~/.hermes/cron/`. They run unattended via the gateway, delivering results to whatever channel you specify.
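
For reference, the three schedules above map onto standard five-field cron expressions (minute, hour, day of month, month, day of week). The helper below is a sketch of that mapping only; `cron_spec` is a hypothetical function, and the exact format Hermes stores on disk is not something I'm asserting here.

```python
WEEKDAYS = {"sunday": 0, "monday": 1, "tuesday": 2, "wednesday": 3,
            "thursday": 4, "friday": 5, "saturday": 6}


def cron_spec(hour: int, minute: int = 0, weekday: str = "") -> str:
    """Build a five-field cron expression for a daily or weekly schedule."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    dow = "*" if not weekday else str(WEEKDAYS[weekday.lower()])
    return f"{minute} {hour} * * {dow}"


monday_9am = cron_spec(9, weekday="Monday")    # every Monday at 9am
daily_7am = cron_spec(7)                       # every day at 7am
sunday_6pm = cron_spec(18, weekday="Sunday")   # every Sunday at 6pm
print(monday_9am, daily_7am, sunday_6pm, sep="\n")
```

Knowing the expected expression makes it easy to sanity-check what the agent registered after a natural-language request.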

    Practical Notes From Production Use

    Read the built-in skills before building anything. Hermes ships with 89 skills across mlops, software-development, research, github, creative, and more. Check first.

    The 5-minute skill ROI. If a skill takes 30 minutes to write and saves 5 minutes per use, it has paid for itself after 6 uses. Build them aggressively.

    Know your hardware limits and encode them. On 6GB VRAM: one Ollama model at a time, no parallel Blender renders, 5-15 second model switching cost. Put these constraints in SOUL.md — the agent respects them if they're written down.

    `hermes doctor` when things break. Built-in diagnostics catch 90% of config issues before you go digging.

    The compound effect is real. Month one: 10 skills, basic workflows. Month three: 50 skills, the agent handles entire processes. Month six: 145 skills, it's doing things you didn't anticipate when you started. Each skill makes adjacent skills more powerful.

    The official documentation is excellent for the basics. This is what comes after the basics.

    4 days of running Hermes Agent as my primary AI system:

    $5.95 total

    49.1M tokens processed

    2K requests

    145 skills built

    DeepSeek V4 Flash via OpenRouter as supervisor, Qwen3 local workers for bulk tasks.

    $5.95 for 4 days of serious agentic work. The cost argument for this architecture is real.
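
Breaking those totals down gives the unit costs. The arithmetic below uses only the figures above; the request count is taken as 2,000 because the source number is the rounded "2K", so the per-request values are approximate.

```python
total_cost_usd = 5.95
tokens = 49.1e6
requests = 2_000      # "2K" in the stats above, so this is approximate
days = 4

cost_per_million_tokens = total_cost_usd / (tokens / 1e6)
cost_per_day = total_cost_usd / days
cost_per_request = total_cost_usd / requests
tokens_per_request = tokens / requests

print(f"${cost_per_million_tokens:.3f} per million tokens")
print(f"${cost_per_day:.2f} per day")
print(f"${cost_per_request:.4f} per request")
print(f"{tokens_per_request:,.0f} tokens per request")
```

That works out to roughly 12 cents per million tokens and about $1.49 per day.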
