Skip to main content

Agent Tools

ToolDefinition is the stable callable contract for Agent tools. The current runtime treats tools as first-class runtime objects rather than prompt-only suggestions.

What tools do not own

Not every Agent capability should become a tool. Two important capabilities are only partially tool-shaped:
  • proactive
    • mainly lives in the loop directive plus wake queue
  • learning
    • mainly lives in the loop directive plus learn store
Tools support those loops, but they do not define them. This matters because otherwise the runtime becomes a pile of ad hoc tool names instead of a coherent Agent model.

Current shape

Each tool definition carries:
  • name
  • description
  • inputSchema
  • outputSchema
  • ownership
  • permissionPolicy
  • effects
  • readOnly
  • destructive
  • interruptBehavior

Ownership

Current ownership values:
  • managed
  • mcp
  • custom
Meaning:
  • managed
    • runtime-owned capability
  • mcp
    • capability exposed through MCP
  • custom
    • capability that requires user-side or app-side fulfillment
This ownership split matters because not all tools are executed the same way.

Permission policies

Current permission policies:
  • always_allow
  • always_ask
These follow the Anthropic naming directly. They should be read as runtime policy, not UI phrasing. Current managed always_ask frontier:
  • resources_promote_to_substrate
That tool now pauses the session with requires_action until a matching user.tool_confirmation event arrives. permissions_check is the preflight seam for that policy surface, so the model can inspect whether a named managed tool will require explicit confirmation before it calls it. For shared mutation tools, permissions_check now also surfaces whether the tool is evaluator-gated through outcome_evaluate, plus the current evaluator verdict and the next recommended verification step when promotion is still unsafe. If evaluator posture is staying stable or drifting regressing, that next step now prefers outcome_history before another broad shared mutation. For shell mutation tools such as shell_run and shell_exec, permissions_check also now surfaces the current session shell mutation posture (shellMutationPosture) and the current bounded outcomeEvaluation, so shell preflight can see execution-hand posture and task posture on one surface. If the caller includes a bounded toolArgs preview for shell_run or shell_exec, permissions_check can also project a conservative readOnlyAlternative when the planned command fits the low-risk bash hand. In that shell posture, nextStep may therefore change to a read-first or recovery-first shell move such as bash, shell_read_last_output, shell_describe, or shell_open, while nextOutcomeStep still exposes the current evaluator-directed follow-up when an active outcome exists. For shell mutation tools, permissions_check now also returns shellReadFirstAlternatives, a small bounded menu of safer exploratory moves such as shell_describe, shell_read_last_output, bash pwd, and session_describe_context when context pressure is already elevated. When the planned shell command is clearly a direct file read, bounded file preview, directory listing, or plain workspace search, that same preflight can now push the next step all the way down to first-class built-ins such as read, glob, or grep instead of stopping at bash. That now also covers common head -n / tail -n previews, conservative sed -n '10,20p' style line-range previews, simple wc -l <file> line-count checks, ls -la / ls -al style listing variants, conservative find <path> -name <pattern> searches and current-cwd find -name <pattern> searches, and bounded find <path> -type d|f -name <pattern> searches by passing the optional glob.kind filter, plus conservative grep -n / grep -i / grep -r style search variants and bounded rg -n / rg -i search variants when the command still behaves like an ordinary inspection step. Query-only grep, rg, and find -name forms now fall back to the current session cwd instead of forcing a shell call just to search the current tree. Those read previews now report both totalLineCount and selectedLineCount so the model can decide whether another narrower reread is needed without guessing file size. permissions_describe now provides the broad posture view:
  • pending confirmation state
  • a read-only shell alternative when the pending blocked shell request can safely run through bash
  • all always_ask managed tools
  • all evaluator-gated shared mutation tools
  • the current session’s bounded evaluator verdict
  • the current session’s nextOutcomeStep when evaluator-gated promotion is still unsafe
  • the current session’s contextPressure when the current assembled prompt is already dropping history or running low on headroom
  • the current session’s shellMutationPosture and nextShellStep when the live session shell is busy or missing
  • the current session’s shellReadFirstAlternatives, so shell preflight is not only prohibitive but also points at cheaper read-first seams
That same bounded decision surface is also materialized into .openboa-runtime/permission-posture.json and .openboa-runtime/permission-posture.md, so filesystem-first agents can reread current permission, evaluator, context-pressure, and shell recovery posture without depending on prompt-local summaries.

Custom tools

Custom tools are special because they do not complete inside the same bounded harness run. Instead:
  1. the harness emits agent.custom_tool_use
  2. the session pauses with requires_action
  3. a later user.custom_tool_result event resumes the session
That keeps custom-tool pausing inside the session model rather than inventing a separate out-of-band workflow.

Managed tool confirmation

Managed tools can now use the same requires_action pause seam. Current flow:
  1. a managed tool with permissionPolicy: "always_ask" is invoked
  2. execution pauses before the tool side effect happens
  3. the session stores a pending tool-confirmation request
  4. the runtime emits a blocking agent.tool_use
  5. session.status_idle records blockingEventIds
  6. a later user.tool_confirmation event resumes the session
This keeps permission policy inside the session event model rather than hiding it in UI state. If the user changes direction before the next wake, a user.interrupt event can clear the pending blocked state so the session is redirected instead of remaining pinned to the old confirmation request.

Why proactive is not a tool family

There is no dedicated proactive.* tool family. That is intentional. Proactive continuation is currently expressed through:
  • loop directive queuedWakes
  • session wake queue
  • orchestration consuming due wakes
The runtime keeps this outside the tool catalog because it is session-control behavior, not a callable side-effect surface. If it were flattened into ordinary tools, the distinction between:
  • “do work now”
  • “schedule the next revisit”
would become harder to reason about.

Managed navigation and recall tools

The current managed runtime exposes more than one kind of read surface. Session navigation:
  • environment_describe
  • agent_describe_setup
  • agent_compare_setup
  • vault_list
  • permissions_describe
  • permissions_check
  • session_list
  • session_list_children
  • session_get_snapshot
  • session_describe_context
  • session_get_events
  • session_get_trace
  • session_search_traces
  • outcome_read
  • outcome_grade
  • outcome_evaluate
  • outcome_history
  • outcome_define
environment_describe now returns the environment fingerprint, the current session resource-contract fingerprint, the current agent-setup fingerprint, plus the materialized environment.json and agent-setup.json artifact paths, so the model can verify the exact execution contract it is standing on before it mutates or promotes anything. agent_describe_setup is the setup-introspection seam. It reads the materialized agent-setup.json and agent-setup.md contract for the current or another same-agent session, including the exact provider/model pairing, prompt-section fingerprints, bootstrap file fingerprints, managed tool catalog, skill catalog, environment contract, mounted resource contract, permission posture, and vault catalog that produced that session’s runtime. agent_compare_setup is the bounded setup-drift seam. It compares the current session’s materialized setup contract against another same-agent session and reports whether the two fingerprints match plus which setup sections changed, so cross-session reuse can stay explicit about setup compatibility before reopening prior work. session_get_trace is now the canonical bounded reread seam for one wake. By default it returns the full wake-scoped trace for the selected wakeId, including span.started and span.completed records for wake-level and tool-level execution, unless the caller explicitly narrows it with types or limit. session_get_snapshot now also returns the materialized runtime artifact map for that session, the same agentSetupFingerprint, setupMatchesCurrent, the current evaluator posture (outcomeTrend, nextOutcomeStep), and a lifted requiresAction summary (pendingActionKind, pendingActionToolName) so the model can see whether the target session came from the same current setup, whether its latest outcome loop is actually improving, and whether it is already blocked on a bounded follow-up tool before reusing its work. For the current session, that snapshot posture now uses the same live-shell-aware evaluator guard as outcome_evaluate, so a busy persistent shell forces outcomeStatus="not_ready" and points the next step at shell_wait instead of presenting stale promotion-safe posture. That includes the current .openboa-runtime paths for context budget, outcome status/grade/evaluation, shell state/history/last-output, permissions, environment, tools, skills, vaults, traces, and event feed, so cross-session navigation can continue from a snapshot without guessing the next reread surface. It also returns relationToCurrent and childCount, so multi-agent navigation can stay explicit about lineage and fanout instead of inferring it from raw metadata. For navigation ergonomics, the snapshot summary also lifts outcomeStatus and promotionReady to the top level instead of forcing every caller to inspect the nested evaluator object first. session_list, session_list_children, and session_run_child now include agentSetupFingerprint alongside resourceContractFingerprint, and they surface the latest evaluator posture (outcomeTrend, promotionReady, nextOutcomeStep) plus lifted requiresAction posture so the model can spot setup-compatible, still-improving, already-blocked sessions before opening a fuller snapshot. When the current session itself is included in that list, its summary uses the same live-shell-aware evaluator guard, so navigation posture does not drift away from permissions_check or outcome_evaluate while a persistent shell command is still running. The list tools also expose setupMatchesCurrent and outcomeMatchesCurrent, and they now accept bounded filters such as hasOutcome, outcomeStatus, promotionReady, status, and activeMinutes. session_list also accepts lineage=related|parent|children|siblings and returns relationToCurrent plus childCount, so the model can stay inside the nearest parent/child/sibling cluster and see parent fanout before widening to all same-agent sessions. It now also accepts outcomeTrend=first_iteration|improving|stable|regressing, and list ordering mildly prefers improving sessions over stalled or regressing ones when other signals are tied. Its summaries also expose top-level outcomeStatus and promotionReady, so parent and sibling navigation can filter or sort on evaluator posture without extra nesting. That lets a parent or same-agent session narrow navigation toward sessions that are already working on the same objective, already blocked on the same evaluator posture, or already promotion-safe before it opens a fuller snapshot. session_search_traces is the wake-unit search seam. It searches same-agent wake traces across sessions so the model can find one bounded prior execution run before calling session_get_trace to reread it in detail. session_search_context, session_search_traces, memory_search, and retrieval_search now also accept a bounded lineage scope:
  • related
  • parent
  • children
  • siblings
That lets multi-agent or delegated same-agent work stay inside the most relevant parent/child/sibling session cluster instead of searching all same-agent history. The current bounded multi-agent seam is:
  • session_delegate
  • session_list_children
  • session_run_child
session_delegate creates a direct child session for the same agent and seeds it with a bounded task. session_run_child then lets the parent advance that direct child for a few bounded cycles without collapsing both threads into one context window. The delegated child summaries now also carry relationToCurrent="child", top-level outcomeStatus, promotionReady, and childCount, so parent sessions can inspect child posture without digging through nested evaluator state. Cross-session recall:
  • memory_list
  • session_search_context
  • retrieval_search
  • memory_search
  • memory_read
  • memory_list_versions
  • memory_read_version
  • memory_write
  • memory_promote_note
  • learning_list
Learning is partly tool-shaped and partly harness-shaped. The harness captures learnings from loop directives. The managed tool surface then makes those learnings inspectable and promotable. Today the most relevant learning-adjacent seams are:
  • learning_list
    • inspect captured durable learnings
  • memory_search
    • retrieve prior durable memory and learnings
  • memory_read
    • reopen specific managed memory surfaces
  • memory_promote_note
    • promote bounded durable notes into shared MEMORY.md
The important design rule is:
  • learning capture happens in the harness
  • learning inspection and promotion happen through tools
That split keeps improvement durable without pretending that every lesson should be an immediate tool side effect. memory_search is now store-aware. Instead of collapsing all prior session memory into one coarse checkpoint hit, it can return separate candidates for:
  • shared workspace_memory
  • managed workspace_memory_notes
  • session_checkpoint
  • session_evaluation
  • session_outcome
  • session_state
  • working_buffer
  • shell_state
Each hit carries a narrower expansion recommendation such as agent_compare_setup, session_get_snapshot, outcome_evaluate, outcome_read, or memory_read(target=...). When a matched prior session was produced by a different agentSetupFingerprint, retrieval now recommends agent_compare_setup before broader rereads like session_get_events or session_get_trace. When the matched store is shell_state and a durable last command exists, the expansion recommendation now prefers shell_read_last_output before broader shell inspection. When the current session has a materialized agentSetupFingerprint, memory hits from setup-compatible prior sessions receive a small deterministic ranking boost and report that setup affinity in the candidate metadata. Retrieval candidates now also carry the matched session’s bounded evaluator posture (outcomeStatus, promotionReady, outcomeTrend) so the model can tell whether a prior session is still improving, stalled, or already promotion-safe before it rereads anything broader. If the caller passes lineage, that same store-aware recall can be limited to parent/child/sibling session neighborhoods before scoring. outcome_grade is the first bounded evaluation seam. It does not replace a future separate evaluator context, but it gives the runtime a deterministic rubric for whether a session is missing an outcome, blocked, sleeping, in progress, or a done candidate before the model decides the next bounded move. That same grade is also materialized into .openboa-runtime/outcome-grade.json, .openboa-runtime/outcome-grade.md, and .openboa-runtime/outcome-repair.md, and the harness may surface a bounded [outcome-repair] runtime note when the grade implies that the next tool choice should be more explicit than free-form continuation. When the blocked managed tool is a shell mutation request whose stored command is actually read-only, outcome_grade now prefers bash as the next bounded move instead of sending the model back through another confirmation-oriented preflight. outcome_evaluate is the bounded promotion-safety seam. It inspects the current durable outcome, recent wake trace, and idle result to decide whether shared promotion is actually safe yet. For the current session, that bounded evaluator now also respects live persistent-shell posture: if a persistent shell command is still running, the evaluator is forced back to not_ready and its next suggested tool becomes shell_wait instead of allowing promotion-safe conclusions while the execution hand is still unsettled. It now also reports a bounded evaluator trend:
  • first_iteration
  • improving
  • stable
  • regressing
plus a short trendSummary, so self-improvement loops can tell whether the latest bounded revision actually improved evaluator posture. That evaluator verdict is materialized into .openboa-runtime/outcome-evaluation.json and .openboa-runtime/outcome-evaluation.md. The same runtime now also keeps a bounded durable evaluation history, exposed through outcome_history and materialized into .openboa-runtime/outcome-evaluations.json and .openboa-runtime/outcome-evaluations.md. Each record carries an iteration, the wake that produced it, the grade posture that led into it, and the evaluator verdict, so the agent can inspect evaluator drift across repeated bounded revisions instead of only reading the latest pass/fail posture. When a durable outcome exists but the evaluator still says promotion is unsafe, the harness may also surface a [promotion-gate] runtime note so the model sees the current blocker before it tries to mutate shared memory or shared substrate. If that evaluator posture stays stable or turns regressing across repeated bounded passes, the harness may also surface a bounded [outcome-trend] runtime note that points back to outcome_history before more mutation. session_describe_context is the bounded context-introspection seam. It exposes the current wake’s assembled context budget, and it can also read the latest materialized context budget for another same-agent session. That lets the model inspect prompt footprint, selected-vs-dropped history pressure, and top schema contributors without treating any single summary as truth. When the current wake is obviously crowded, the harness may also inject a bounded [context-pressure] runtime note that points back to session_describe_context before the model keeps widening context blindly. That same footprint is materialized into .openboa-runtime/context-budget.json and .openboa-runtime/context-budget.md. memory_write is intentionally bounded. Today it can:
  • replace or append session-state.md
  • replace or append working-buffer.md
  • create an immutable version on each write
  • enforce an optional expectedVersionId precondition for safe concurrent updates
memory_promote_note is the shared-memory writeback seam. Today it can:
  • append or replace the managed notes section inside shared MEMORY.md
  • require explicit confirmation before mutating that shared agent-level memory surface
  • create an immutable version on each promoted note write
  • enforce an optional expectedVersionId precondition before shared note promotion
  • require outcome_evaluate to report status=pass before mutating shared notes when a durable outcome exists, unless the caller explicitly overrides that gate
Neither tool rewrites the promoted runtime learnings section directly. memory_list_versions and memory_read_version expose the audit trail for writable managed memory stores. Today that includes:
  • session_state
  • working_buffer
  • workspace_memory_notes
memory_list exposes the current attached managed memory-store contract. Today that includes:
  • checkpoint
  • shell_state
  • session_state
  • working_buffer
  • workspace_memory
  • workspace_memory_notes
retrieval_search is intentionally backend-agnostic. The current deterministic ranking policy also treats setup-compatible prior sessions as a mild prior, so same-agent history from the same agent setup rises before otherwise similar hits from older or drifted setups. When the current session has an active durable outcome, deterministic retrieval also gives a mild boost to prior sessions that were working against the same outcome title or overlapping success criteria. When sessions are in a parent/child/sibling relationship, deterministic retrieval also carries a mild relation prior and can be explicitly scoped by lineage so multi-agent work stays inside the nearest delegated cluster before widening to all same-agent history. Automatic same-agent recall also mixes the current outcome grade and evaluator posture into its query cue, so blocked, sleeping, or promotion-unsafe sessions can rediscover more relevant prior repair loops before they widen context again. When the current evaluator verdict is still promotion-unsafe, retrieval_search also biases its expansion plan toward bounded verification seams such as outcome_read, outcome_evaluate, session_get_snapshot, and session_get_trace before it recommends broader event rereads. If the current evaluator trend is stable or regressing, that same expansion plan now prefers outcome_history before another broad reread so repeated churn is inspected explicitly. Within the shell primitive, shell_read_last_output is the bounded reread seam for the most recent command result. It now also returns the current busyPlan, recoveryPlan, and nextStep, so a live-running or missing persistent shell can keep the next bounded move inside the shell primitive instead of forcing the model to switch surfaces just to decide what to do next. It exposes the latest durable stdout/stderr summary plus the .openboa-runtime/shell-last-output.* artifact paths, without requiring the model to jump straight to broader shell history. When the current session’s persistent shell is still busy, it also returns a liveCommand block with the running command and partial stdout/stderr preview so read-first inspection does not fall back to stale last-command memory. shell_read_command is the bounded reread seam for one specific recent command, and it now mirrors busyPlan, recoveryPlan, and nextStep so command-specific rereads still stay inside the shell primitive when the live shell is unresolved. It uses a durable commandId from shell_history so the model can reopen an earlier shell step without depending on unstable list position or broad shell summaries. Today the built-in deterministic backends are memory, session-context, and session-trace search. The interface is also shaped so optional backends such as vector search can be added later without redefining the Agent core. Procedural guidance:
  • skills_list
  • skills_search
  • skills_read
  • shell_describe
  • shell_history
  • shell_wait
    • wait briefly on the live persistent shell and return bounded running/completed status plus the same busy/recovery next-step posture when the shell is still unresolved
  • shell_read_command
  • shell_set_cwd
  • shell_set_env
  • shell_unset_env
  • shell_open
  • shell_run
  • shell_exec
  • shell_close
session_describe_context now also returns a bounded pressure summary derived from the current context budget. That summary exposes a level, a short list of reasons, and recommendedTools such as retrieval_search, session_search_context, session_get_snapshot, session_get_trace, or shell_describe, so the model can pivot toward narrower reread seams before it keeps widening prompt-local context. skills_list and skills_search return concise metadata plus a small preview, and they now also carry a bounded nextStep that points at skills_read(name) so the model can move from discovery to full procedure read without inventing its own follow-up shape. skills_read loads the full skill body only when the model has decided the skill is relevant. Execution hand introspection:
  • read
  • write
  • edit
  • glob
  • grep
  • bash
  • sandbox_describe
  • sandbox_execute
  • resources_stage_from_substrate
  • resources_list_versions
  • resources_read_version
  • resources_restore_version
  • resources_compare_with_substrate
  • resources_promote_to_substrate
The first six are the preferred built-in tool surface for ordinary workspace work. They map to the same bounded session hand as sandbox_execute, but they expose the common file and command loop directly instead of forcing the model to manually specify sandbox action names. The runtime also mirrors the live managed tool and permission contract into the session hand itself under /workspace/.openboa-runtime/managed-tools.json and /workspace/.openboa-runtime/permissions.json. That lets filesystem-first agents re-check the current tool surface and confirmation posture without depending only on prompt text. Current local semantics:
  • read
    • bounded text read from mounted resources
  • write
    • overwrite or create a file under a writable mounted root
  • edit
    • exact text replacement inside a writable file
  • glob
    • glob-style file or directory matching under a mounted root
  • grep
    • bounded text search under a mounted root
  • bash
    • bounded read-only non-shell command execution rooted in a mounted path
  • shell_describe
    • inspect the current durable shell state, mounted hand constraints, command policy, recent bounded commands, current context pressure, read-first shell alternatives, and the runtime artifact paths for shell rereads
  • shell_history
    • reread the recent bounded shell history, including small stdout/stderr previews, live busy/recovery posture, and the latest shell-output artifact paths, before continuing shell-driven work
  • shell_wait
    • wait briefly on the current session’s live persistent shell command and return either bounded running status or the completed live result with the same shell artifact paths
    • when the command completes, it also syncs the durable shell-state and shell-last-output artifacts so later rereads do not depend on stale pre-completion memory
  • shell_read_command
    • reread one specific recent shell command by durable commandId, including bounded stdout/stderr and the same shell artifact paths
  • shell_set_cwd
    • update the durable session-scoped working directory for future bounded commands
  • shell_set_env
    • persist one session-scoped shell environment variable for future bash and shell_run calls
  • shell_unset_env
    • remove one session-scoped shell environment variable from the durable shell hand
  • shell_open
    • open or reuse the current session’s persistent shell process before multi-step shell work
  • shell_restart
    • restart the current session’s persistent shell process when shell_describe reports that the live shell is closed or stale
  • shell_run
    • permission-gated one-shot writable shell execution inside the session execution hand
  • shell_exec
    • permission-gated execution through the current session’s persistent shell so cwd and exported env survive across steps
  • shell_close
    • close the current session’s persistent shell process when the multi-step shell loop is done
bash is intentionally the low-risk read-only command hand. shell_run is the writable one-shot shell seam and requires explicit confirmation. shell_exec is the writable persistent-shell seam and also requires explicit confirmation. This is still not a persistent PTY-backed shell session, but together these tools now give the runtime:
  • durable session-scoped cwd continuity
  • durable session-scoped shell environment continuity
  • optional durable session-scoped shell-process continuity
  • recent command history with bounded output previews
  • bounded read-only inspection
  • permission-gated writable shell composition inside /workspace
shell_describe also prefers live sandbox introspection for the current session’s persistent shell status, so it does not rely only on durable runtime memory when a shell process has already exited. That live view now includes busy and currentCommand, so the model can see whether a long-running shell step is still active before it issues another shell mutation. When the live shell is busy, shell_describe now also returns a busyPlan that explicitly recommends shell_wait before another shell mutation, but it also exposes an evidencePlan for shell_read_last_output plus an allowlistedReadTools list (bash, read, glob, grep, session_get_snapshot, retrieval_search, and the shell read/status tools) so the runtime can keep gathering bounded evidence without treating the busy shell as a full stop. That busy plan also includes a small live stdout/stderr preview when the running command has already emitted output. When the live shell is closed or missing, it now returns a recoveryPlan pointing at shell_restart or shell_open so the model can repair shell continuity without guessing. Outside those busy/recovery cases, shell_describe now also returns the same bounded contextPressure summary and shellReadFirstAlternatives menu that the permission surface uses, except it omits the recursive shell_describe self-reference. That lets the shell primitive itself point at bash, shell_read_last_output, or session_describe_context without requiring a separate permission preflight. Both shell_describe, shell_history, and shell_read_command now return the .openboa-runtime/shell-state.json, shell-history.*, and shell-last-output.* paths so the model can reread recent shell evidence from the filesystem instead of trusting only prompt-local summaries. Current sandbox actions are filesystem-like within mounted resources:
  • list_dir
  • read_text
  • write_text
  • append_text
  • replace_text
  • mkdir
  • stat
  • find_entries
  • grep_text
  • run_command
  • run_shell
  • inspect_persistent_shell
  • open_persistent_shell
  • exec_persistent_shell
  • wait_persistent_shell
  • close_persistent_shell
This is intentionally bounded. The Agent can work freely under mounted workspace paths, while unmounted or read-only paths stay protected by sandbox policy. sandbox_describe is not just a friendly summary. It returns the mounted resource map plus:
  • constraints
  • action-level access hints
  • commandPolicy
The model should read that contract before using sandbox_execute, especially before run_command. resources_stage_from_substrate, resources_compare_with_substrate, and resources_promote_to_substrate exist because the shared substrate mount is intentionally not writable through normal sandbox actions. The model can stage a durable file into /workspace, compare it with the current substrate, revise it there, then explicitly promote the chosen result back into /workspace/agent through the managed tool surface. Shared substrate writeback is now versioned.
  • resources_list_versions
    • lists immutable versions for one promoted substrate path
  • resources_read_version
    • rereads one immutable promoted substrate version by versionId
  • resources_restore_version
    • restores one immutable substrate version back into the shared substrate as a new promoted version
    • can also enforce an optional expectedVersionId or expectedContentHash precondition before rollback writeback
    • now also defaults to the same outcome_evaluate gate when a durable outcome exists
  • resources_promote_to_substrate
    • can enforce an optional expectedVersionId or expectedContentHash precondition before replacing shared substrate
    • now also defaults to the same outcome_evaluate gate when a durable outcome exists
resources_compare_with_substrate also returns the current content hashes and the latest recorded substrate version metadata. That gives the model a safe optimistic contract: compare first, then pass the returned latestVersionId back as expectedVersionId when version history exists, or fall back to the returned expectedContentHash when the substrate file exists but has not been versioned yet. The important rule is:
  • search and summaries provide hints
  • exact reread tools provide verification
outcome_read and outcome_define sit between those two layers. They make the session’s current success target durable without pretending that the outcome itself replaces the event log. The outcome is meant to guide the bounded run, while the session log still records how the run actually evolved. That is why retrieval_search should be read as candidate generation, not as canonical truth.

What does not belong here

Tool definitions are not:
  • session scheduling
  • application-specific routing
  • external publication semantics
  • environment configuration
Those are separate seams.