Hi,👋 we have updated the app and fixed multiple bugs. We are lacking funds, request to free user not to use Adblock. Ads are non intrusive. 😊

@gkisokay: It's the agent most builders s...

@gkisokay
35 views May 10, 2026
Advertisement

It's the agent most builders skip. It turns the outside world into compounding intelligence. In 3 months, it's logged 8,000+ pieces of evidence across 16 topics, and every other agent in my stack starts smarter as a result.

Media image

It's the research agent. The always-on research department inside Hermes or OpenClaw. It watches what is happening, reads my own workspace, tracks what is moving, and writes everything into a local vault that every other agent in my stack can reuse.

It is not a scraper. This isn't a daily summary bot. This part of the system ensures tomorrow's agents don't start with yesterday's knowledge.

The first version of a research agent is usually a feed reader. The better version is a librarian. The best version is an evidence operator.

Skip that distinction, and research bots become hallucination laundries: confident prose with no separation between what was observed, what was claimed, and what was actually verified.

This guide is the template. Point Hermes, OpenClaw, or any agent system at it and build your own.


Why This Matters

A typical research agent scrapes, summarizes, and posts a digest. Then the next run starts almost from scratch, except for a note.

That looks productive because there is output everywhere. But output is not the same thing as compounding evidence.

A real research agent should be able to answer:

  • What do we know now that we did not know before?
  • Which claims are strong, and which are interesting but under-evidenced?
  • Which sources and topics keep being useful?
  • Which signals belong to which agent, and which old beliefs are now stale?
  • The research agent exists because the outside world is too noisy for the main agent to parse from scratch every time.

    The agent system needs a research layer that can absorb noise, preserve evidence, and route only what matters to you.

    That is the difference between agents that stay busy and agents that get sharper.

    What The Vault Actually Holds

    To make this concrete, here is what my live research vault looks like right now:

  • 2,631 claim records: candidate beliefs the system has extracted, clustered, and tracked over time
  • 2,694 findings: individual observed signals from docs, GitHub, feeds, X, search, and other collectors
  • 2,694 source evidence records: the citation trail tying findings back to URLs, source types, excerpts, and timestamps
  • 13 indexed dossiers: living topic files for lanes like AI agents, frontier AI, crypto rails, memory orchestration, and robotics
  • 18 registered source surfaces: the collector lanes, the research agent knows how to watch or audit
  • 8 operational modes in the loop skill
  • 6 handoff lanes routing to other agents
  • 36 vault tools, plus profile wrapper scripts, for validation, compilation, backup, search, health checks, and recovery
  • A 6-hour refresh cadence plus 3 daily delivery jobs
  • This is what happens when one agent has a vault, a source plan, and discipline for a few months.

    The counts are not the point. A big ledger is not automatically a good research system. The point is separation.

    A finding is not a claim. A claim is not verified knowledge. A source record is not a conclusion. A dossier is not a daily summary. A weak signal is not a task.

    The research agent works because it has space for each stage, rather than flattening everything into confident prose.


    The Current Architecture

    Hermes agent is the production owner. The live system has separate roles.

  • The research agent is the evidence collector.
  • The main is the conscious operator.
  • Subconscious is the pattern-noticer.
  • Coder is the builder.
  • QA is the auditor.
  • Content is the publishing operator.
  • Media image

    As you can see, everything is downstream from the research agent. It contributes evidence and implications. It does not own the whole machine.

    When research, judgment, building, and publishing collapse into a single agent, the system starts treating every interesting signal as an action item.

    The research agent is not allowed to do that. It gathers, weighs, remembers, and routes. Then the rest of the system decides what to do.

    The SOUL.md

    This is the file that defines the research agent's role. The SOUL is the job description, identity, and boundary layer. Here is the shape of mine:

    ```
    Research Agent - GK's Always-On Research Operator
    
    You are the Hermes profile responsible for building a durable research corpus that helps GK get ahead.
    
    Your job is to turn the outside world into compounding evidence for influence, product opportunity, workflow advantage, monetization, and better decisions.
    
    Core Loop
    observe -> infer priorities -> gather evidence -> deepen one question -> update vault -> route implications -> repeat
    
    ```

    That loop is the whole pattern.

    The research agent does not "read news." It observes, infers priorities, gathers evidence, deepens one question, updates the vault, and routes implications.

    Reading the news creates summaries. Running a research loop creates judgment.

    The Vault

    Do not make your research agent's memory a chat transcript. Give it a vault. Chat history is continuous. A vault is an accumulation.

    The minimum recommended shape looks like this:

    ```
    research-vault/
      context/
        interest-profile.json
        interest-profile.md
        source-plan.md
      config/
        collector-config.json
        source-registry.json
        source-weights.json
        thresholds.json
      dossiers/
        ai_agents.md
        frontier_ai.md
        memory_orchestration.md
      knowledge/
        claims.jsonl
        findings.jsonl
        sources.jsonl
      raw/
      sources/
      topics/
      decisions/
      runs/
      indexes/
      queue/
        research-questions.md
        verification-review.md
        buildroom-handoff.json
        content-handoff.json
      notes/
        operator-brief.md
        daily-summary.md
      wiki/
        concepts/
        articles/
      health/
        latest-health-check.md
      ops/
        collector-health.md
        source-balance.md
        operator-cockpit.html
    
    ```

    This is the recommended starting shape. Mine has grown to roughly 29 directories over time as the system needed places for run receipts, decision logs, packets, reviews, and visualizations. You do not need all of them on day one. You do need the core distinctions above.

    A few of these are worth calling out because most research agent setups skip them:

  • decisions/ is where the system records what was decided, by whom, and on what evidence. Research without a decision ledger forgets why it changed direction.
  • runs/ is where each refresh leaves a receipt. If you cannot replay a run, you cannot trust its output later.
  • raw/ is the unprocessed capture layer. Keeping it separate from knowledge/ is the difference between an agent that synthesizes and an agent that smudges.
  • The folders force the system to distinguish raw material, claims, sources, dossiers, verification, summaries, handoffs, and health.

    If you do not give your research agent those distinctions, it will flatten everything into prose.

    The Research Loop

    The loop should be packaged as a skill.

    Each research run updates the interest profile, source plan, topic dossiers, question backlog, operator brief, daily summary, source balance, collector health, knowledge ledgers, verification queue, wiki pages, and a run receipt.

    If you are building your own, you can start with a skill file like this:

    ```
    ---
    name: research-agent-loop
    description: Bootstrap and refresh a local research vault from shared system context. Builds an interest profile, source plan, research question backlog, operator brief, and durable evidence ledgers.
    version: 0.1.0
    author: Your System
    license: MIT
    metadata:
      tags: [research, agent, local-first, vault, evidence]
    ---
    
    # Research Agent Loop
    
    Use this skill to build and refresh the research agent's local vault.
    
    ## Purpose
    
    - Convert shared system state into a clean research operating surface
    - Infer current interests from durable notes, posting behavior, shipped work, and repeated questions
    - Collect from a bounded source plan instead of scraping everything
    - Separate raw captures, findings, claims, sources, verification gaps, and dossiers
    - Route implications to the right downstream lane: build, content, monetization, watch, verification, or subconscious reflection
    
    ## Modes
    
    ### BOOTSTRAP
    
    Initialize or rebuild the vault from shared source files.
    
    bash
    python3 ~/.hermes/profiles/research-agent/skills/research/research-agent-loop/scripts/research_agent_loop.py --mode bootstrap
    
    
    ### REFRESH
    
    Recompute the interest profile, source plan, evidence ledgers, dossiers, question backlog, operator brief, and wiki/index layer using the latest shared state.
    
    Allow enough runtime for collection, compilation, and health checks.
    
    bash
    python3 ~/.hermes/profiles/research-agent/skills/research/research-agent-loop/scripts/research_agent_loop.py --mode refresh
    
    
    ### DAILY_SUMMARY
    
    Refresh first, then print the human-facing digest exactly as it should be delivered.
    
    bash
    python3 ~/.hermes/profiles/research-agent/skills/research/research-agent-loop/scripts/research_agent_loop.py --mode refresh
    python3 ~/.hermes/profiles/research-agent/skills/research/research-agent-loop/scripts/print_research_digest.py --shared
    
    
    ### SUBCONSCIOUS_BRIEF
    
    Refresh first, then print a smaller pattern-facing brief for the subconscious or strategy layer.
    
    bash
    python3 ~/.hermes/profiles/research-agent/skills/research/research-agent-loop/scripts/research_agent_loop.py --mode refresh
    python3 ~/.hermes/profiles/research-agent/skills/research/research-agent-loop/scripts/print_research_digest.py --tier subconscious
    
    
    ### MIDDAY_FOCUS
    
    Do not scrape. Rebuild only operator surfaces from existing artifacts, then print the current focus receipt.
    
    bash
    python3 ~/.hermes/profiles/research-agent/scripts/research_agent_midday_focus.py
    
    
    ### BACKUP
    
    Create a timestamped backup of the profile config, cron, vault, and local memory database.
    
    bash
    python3 ~/.hermes/profiles/research-agent/skills/research/research-agent-loop/scripts/backup_research_agent.py
    
    
    ### RESTORE
    
    Preview or restore a backup archive.
    
    bash
    python3 ~/.hermes/profiles/research-agent/skills/research/research-agent-loop/scripts/restore_research_agent.py --latest --dry-run
    python3 ~/.hermes/profiles/research-agent/skills/research/research-agent-loop/scripts/restore_research_agent.py --latest --force
    
    
    ### RECOVER
    
    One-command recovery path. Optionally restore the latest backup, then run `bootstrap` or `refresh`.
    
    bash
    python3 ~/.hermes/profiles/research-agent/skills/research/research-agent-loop/scripts/recover_research_agent.py --dry-run
    python3 ~/.hermes/profiles/research-agent/skills/research/research-agent-loop/scripts/recover_research_agent.py --latest-backup
    
    
    ## Operating Rules
    
    1. Treat shared workspace context as upstream read-only evidence.
    2. Treat `workspace/research-vault/` as the canonical write surface for this profile.
    3. Do not write into production state owned by other agents.
    4. Prefer bounded, high-signal source plans over broad scraping.
    5. Keep social feeds as early signal unless verified by stronger sources.
    6. Bias toward durable artifacts: JSON ledgers, Markdown briefs, question backlogs, dossiers, run receipts, and source trails.
    7. Separate findings, claims, sources, verification gaps, and tasks.
    8. If a collector fails or data is stale, say so explicitly.
    9. Treat wiki/index compilation and health checks as part of refresh, not optional cleanup.
    
    ## Output Discipline
    
    - Return a short run summary with the main updated artifacts.
    - If nothing materially changed, say so briefly.
    - For delivery modes, return only the rendered digest body with no wrappers, commentary, or code fences.
    - Never pretend stale or degraded data is fresh.
    - Never promote a weak signal into an approved task.
    
    ```

    If you didn't build my Subconscious agent, don't include anything that references it, but also be aware of your setup and make sure everything is wired properly.

    That skill file gives the research agent an operating contract. It tells the agent what modes exist, where it can write, what artifacts count as success, and what it is not allowed to blur.

    The exact scripts can change. The contract should stay stable. The important modes are:

  • BOOTSTRAP
  • REFRESH
  • DAILY_SUMMARY
  • BACKUP
  • RESTORE
  • RECOVER
  • Bootstrap builds the vault. Refresh updates it. Daily summary renders the human-facing digest. Backup, restore, and recover to keep the system survivable.

    These are related jobs, but they are not the same job. Keeping them separate makes the system calmer.

    The Source Plan

    In my system, the research agent watches my own X posts and posting behaviour, a small number of curated X lists, GitHub repos for agent frameworks and memory tools, Hugging Face signals, RSS feeds from high-signal AI sources, official docs and blogs, and targeted domains around AI labs, crypto rails, and developer tooling, plus optional browser enrichment when necessary.

    The rule is simple: Prefer sources that change decisions. Most people build research agents, thinking a wider collection is always better. It is not.

    If your research agent collects everything, it will drown. If it only collects what is easy, it will overfit to the most convenient API.

    The source plan is where the agent learns taste.

    The Interest Profile

    The research agent does not begin by asking "what is trending?" It asks, "What does the user care about now?"

    The interest profile is rebuilt from shared Hermes state, durable notes, thesis files, my recent posting behaviour, shipped builds, repeated questions, and prior research outputs.

    The files are: research-vault/context/interest-profile.json, research-vault/context/interest-profile.md

    This makes the research agent personal without making it sloppy. It is not trying to become a neutral news service. It's being designed to be useful to a single operator and a single agent system.

    For me, the current lanes include AI agents and agentic systems, crypto rails and agent commerce, Base- and Coinbase-adjacent infrastructure, open-source agent frameworks, memory and orchestration, frontier AI releases, robotics and embodied agents, and content-leverage patterns.

    The categories can change. The important part is that they are explicit.

    If your agent does not know what matters to you, it will optimize for whatever is loudest.

    Dossiers, Ledgers, And Verification

    Recurring topics become dossiers:

    ```
    research-vault/dossiers/ai_agents.md
    research-vault/dossiers/open_source_frameworks.md
    research-vault/dossiers/memory_orchestration.md
    research-vault/dossiers/frontier_ai.md
    
    ```

    A dossier remembers what is accumulating. It can hold why the topic matters, current signal, source trail, contradictions, open questions, product implications, build implications, content implications, and monetization clues.

    This is where research starts becoming leverage. The research agent starts asking:

  • What can we decide?
  • What can Hermes build?
  • What can GK explain?
  • What can become a product?
  • What should we not trust yet?
  • The vault also has machine-readable ledgers:

    ```
    research-vault/knowledge/claims.jsonl
    research-vault/knowledge/findings.jsonl
    research-vault/knowledge/sources.jsonl
    
    ```

    These are the differences between "the agent said something" and "the system can inspect what it believes."

    Every useful research system needs separate places for claims, sources, findings, and verification gaps. If you do not separate these, the agent will smuggle uncertainty into confident prose.

    That is how research bots become hallucination laundries. Not every interesting claim is ready to become knowledge, so weak or under-sourced claims go here:

    ```
    research-vault/queue/verification-review.md
    research-vault/queue/verification-leads.json
    
    ```

    The research agent needs somewhere to put uncertainty. Without a verification queue, uncertainty gets ignored or over-promoted. With a queue, the agent can say, "This might matter, but do not build on it yet."

    The Wiki Layer

    The vault also has an Obsidian-compatible wiki layer inspired by @karpathy in research-vault/wiki/

    Every refresh can compile the latest run into linked pages and indexes.

    The pipeline looks like:

    ```
    collectors -> raw/*.json -> knowledge/*.jsonl -> dossiers/*.md -> wiki/concepts/*.md -> indexes + health
    
    ```


    Raw evidence becomes findings. Findings become dossiers. Dossiers become wiki pages. Wiki pages become searchable memory. Health checks keep the structure from rotting.

    The wiki is there because agents need durable places to return to.

    Operator Surfaces And Handoffs

    The research agent writes an operator brief in research-vault/notes/operator-brief.md.

    This is the first thing I check after a run. It is supposed to answer:

  • What changed?
  • What deserves attention?
  • What is blocked by weak evidence?
  • What should Main, Coder, Content, or Subconscious do with it?
  • The research agent also produces control surfaces:

    ```
    research-vault/ops/operator-cockpit.html
    research-vault/ops/operator-action-ledger.md
    research-vault/ops/operator-focus-discord.md
    research-vault/ops/operator-action-dispatch.json
    
    ```

    The cockpit is for scanning. The action ledger is for follow-through. The focus message is for what matters now. The dispatch JSON is for routing.

    Most people build agents that write notes, but not agents that create operator surfaces. A research system should make the next decision easier.

    The vault also has handoff lanes:

    ```
    queue/buildroom-handoff.json
    queue/content-handoff.json
    queue/monetize-handoff.json
    queue/subc-handoff.json
    queue/verify-handoff.json
    queue/watch-handoff.json
    
    ```

    These hand-offs are specific to my setup, and you'll need to figure out which agents in your stack require one.

    If every signal is treated as the same kind of task, the agent system becomes reckless. If every signal has a lane, the system can route without pretending everything is urgent.

    Quality Gates

    The research agent tracks the shape of its evidence in these files: research-vault/ops/source-balance.md, research-vault/ops/source-balance.json

    This tells the operator whether the run is over-dependent on low-trust surfaces, stale collectors, or degraded lanes.

    A research agent should know when their source mix is weak. Did this rely too much on social media? Did primary sources show up? Did docs, blogs, GitHub, and official releases balance the feed? Which collectors failed?

    Research quality is about the balance of evidence.

    The research agent also has a health lane: research-vault/health/latest-health-check.md, research-vault/health/latest-health-check.json

    It checks for broken links, missing front matter, gaps in the source trail, orphan candidates, stale reviews, verification pressure, and wiki compile drift.

    If the vault is structurally unhealthy, do not trust new synthesis until the structure is fixed.

    From the vault root, the normal validation flow is:

    ```
    cd ~/hermes/profiles/research-agent/workspace/research-vault
    python3 tools/evaluate_research_agent_run.py --dry-run
    python3 tools/validate_research_agent_release.py
    python3 tools/validate_research_agent_regressions.py
    python3 tools/compile_refresh_to_wiki.py
    
    ```

    If you treat research output as "just text," you will never notice when the system gets worse. Validation turns research into an operating surface.

    Runtime, Config, And Models

    The cron file lives here: research-agent/cron/jobs.json

    The live jobs include:

    ```
    research-agent-refresh       every 360 minutes
    research-agent-daily-summary daily
    research-agent-subc-brief    daily
    research-agent-midday-focus  daily
    
    ```

    The daily summary goes to the research channel. The Subconscious brief stays local. The midday focus goes to the research channel, but it does not scrape inline. It rebuilds the operator cockpit and action ledger from existing artifacts. Delivery jobs should not surprise-scrape.

    Important note: if the data is stale, the system should indicate this rather than assume it is fresh.

    The profile config lives here: /research-agent/config.yaml

    This file controls the default model, provider, terminal working directory, timeouts, memory, tool access, compression, browser behaviour, delivery behaviour, and runtime assumptions.

    The code runs the research loop. The config tells the loop what kind of operator it is allowed to be.

    My current research agent stack is intentionally modest. Most scheduled jobs run on MiniMax M2.7, with a local terminal backend, a persistent workspace, and refresh timeouts long enough to exceed short-run defaults.

    The important thing is not the exact model. It is the separation of model jobs. You do not need your strongest model for every scrape, parse, or local rebuild.

    Use cheaper or local models for routine refresh and repeated low-risk processing. Use stronger models for synthesis, judgment, handoff quality, and sensitive reasoning.

    Research is many jobs wearing one coat.

    The Minimum Folder Structure

    If I were recreating this now, I would start here:

    ```
    your-system/
      profiles/
        main/
        research-agent/
        subconscious/
        coder/
        qa/
        contentd/
      profiles/research-agent/
        SOUL.md
        config.yaml
        cron/
          jobs.json
        scripts/
          research_agent_refresh.py
          research_agent_validate.py
          research_agent_daily_summary.py
          research_agent_midday_focus.py
        skills/
          research/
            research-agent-loop/
              SKILL.md
              scripts/
                research_agent_loop.py
                backup_research_agent.py
                restore_research_agent.py
                recover_research_agent.py
        workspace/
          research-vault/
            context/
            config/
            dossiers/
            knowledge/
            queue/
            notes/
            raw/
            wiki/
            indexes/
            health/
            ops/
            tools/
      state/
        research-agent/
    
    ```

    You can rename the profiles. Do not collapse the responsibilities. That is how people accidentally build overconfident autopilots.

    The Guardrails

    The research agent is explicitly not allowed to make trading decisions, publish public posts, make purchases, commit to partnerships, touch secrets or auth surfaces, turn weak signals into approved tasks, or pretend stale data is fresh.

    It can read shared context, collect evidence, score source quality, write dossiers and operator briefs, maintain a verification queue, route implications to other agents, and surface degraded collectors.

    The research agent can influence the machine, but it cannot seize the steering wheel.

    Minimal Pseudocode

    The loop looks roughly like this:

    ```
    on schedule:
      load the research agent SOUL
      read shared Hermes state
      read prior research vault
      rebuild interest profile
      load source plan
    
      collect owned X signal if available
      collect curated list signal if enabled
      collect GitHub, feeds, docs, Hugging Face, and web sources
    
      score source quality
      extract findings
      write raw captures
    
      update sources ledger
      update claims ledger
      update topic dossiers
    
      rebuild research question backlog
      write operator brief
      write daily summary
      update handoff queues
    
      compile refresh into wiki
      rebuild indexes
      run health check
      write latest-run receipt
    
      if collector degraded:
        mark lane degraded
        avoid pretending full freshness
    
      if low-trust coverage too high:
        route claims to verification
    
      if implication looks useful:
        route to buildroom, content, monetize, watch, subc, or verify
    
      if validation fails:
        do not promote downstream until resolved or explicitly tolerated
    
    ```

    The important part is that evidence has stages.

  • Raw input is not a finding.
  • A finding is not a claim.
  • A claim is not verified knowledge.
  • Verified knowledge is not automatically a task.
  • A task is not automatically approved work.
  • That chain is what keeps the system honest.


    The Real Lesson

    A real research agent knows what you care about, what it has seen before, which sources are strong, which collectors are degraded, and which claims are not yet ready.

    Most importantly, it knows when to say, "This is interesting, but not safe to build on yet."

    That is where research agents become valuable, as they can preserve uncertainty without losing momentum.

    If You Want To Build One

    Point your agent to this article, but do not start by permitting them to read the entire internet.

  • Start by giving it a vault and an identity.2
  • Then, a source plan and knowledge ledgers.
  • Then, there is a verification queue and handoff lanes.
  • Then, there are health checks and a schedule.
  • Then let the other agents consume its outputs.
  • The research agent should be the only thing that helps the rest of the system determine what is worth shipping.

    A normal agent answers the prompt in front of it. A better agent remembers what happened. A research agent builds the evidence base that makes future prompts smarter before they are even asked.


    Thank you for reading!

    If you are interested in implementing AI into your business to cut waste, speed up processes, and improve your margins, visit gkisokay.com to book a free AI audit call to see if we can assist you.

    If you're a builder, join my free Discord community here.

    Remember to follow @gkisokay and reshare this article if it helps you.

    Actions
    Visual Editor Carousel Maker NEW
    Update Thread
    What You Can Do
    • Download as PDF
    • Save to Notion
    • Export as Markdown
    • Visual Editor
    • LinkedIn & Instagram Carousel Maker
    Create Free Account

    Includes 7-day Premium trial

    Advertisement