Agent Tools
ToolDefinition is the stable callable contract for Agent tools.
The current runtime treats tools as first-class runtime objects rather than prompt-only suggestions.
What tools do not own
Not every Agent capability should become a tool. Two important capabilities are only partially tool-shaped:
- proactive: mainly lives in the loop directive plus wake queue
- learning: mainly lives in the loop directive plus learn store
Current shape
Each tool definition carries:
- name
- description
- inputSchema
- outputSchema
- ownership
- permissionPolicy
- effects
- readOnly
- destructive
- interruptBehavior
Ownership
Current ownership values:
- managed: runtime-owned capability
- mcp: capability exposed through MCP
- custom: capability that requires user-side or app-side fulfillment
Permission policies
Current permission policies:
- always_allow
- always_ask
always_ask frontier:
- resources_promote_to_substrate
A call to an always_ask tool leaves the session in requires_action until a matching user.tool_confirmation event arrives.
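Taken together, the fields, ownership values, and permission policies listed so far can be sketched as a TypeScript type. This is a minimal sketch, not the runtime's actual declaration: the field names come from the lists above, but every field type, the `requiresConfirmation` helper, and the sample values are assumptions made for illustration.

```typescript
// Field names follow the document; all field TYPES are assumptions.
type ToolOwnership = "managed" | "mcp" | "custom";
type PermissionPolicy = "always_allow" | "always_ask";

interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;  // assumed: JSON-Schema-like object
  outputSchema: Record<string, unknown>; // assumed: JSON-Schema-like object
  ownership: ToolOwnership;
  permissionPolicy: PermissionPolicy;
  effects: string[];         // assumed: free-form effect labels
  readOnly: boolean;
  destructive: boolean;
  interruptBehavior: string; // assumed: opaque label
}

// Hypothetical helper: a tool needs explicit confirmation
// exactly when its policy is always_ask.
function requiresConfirmation(tool: ToolDefinition): boolean {
  return tool.permissionPolicy === "always_ask";
}

// Placeholder values; only the tool name and its always_ask policy
// are taken from the text above.
const promote: ToolDefinition = {
  name: "resources_promote_to_substrate",
  description: "Promote a staged file into the shared substrate.",
  inputSchema: { type: "object" },
  outputSchema: { type: "object" },
  ownership: "managed",
  permissionPolicy: "always_ask",
  effects: ["shared_substrate_write"],
  readOnly: false,
  destructive: false,
  interruptBehavior: "block",
};
```

Modelling ownership and permissionPolicy as string unions keeps an unknown value a compile-time error rather than a silent runtime fallthrough.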
permissions_check is the preflight seam for that policy surface, so the model can inspect whether a named managed tool will require explicit confirmation before it calls it.
For shared mutation tools, permissions_check now also surfaces whether the tool is evaluator-gated through outcome_evaluate, plus the current evaluator verdict and the next recommended verification step when promotion is still unsafe.
If evaluator posture is holding at stable or drifting toward regressing, that next step now prefers outcome_history before another broad shared mutation.
For shell mutation tools such as shell_run and shell_exec, permissions_check also now surfaces the current session shell mutation posture (shellMutationPosture) and the current bounded outcomeEvaluation, so shell preflight can see execution-hand posture and task posture on one surface.
If the caller includes a bounded toolArgs preview for shell_run or shell_exec, permissions_check can also project a conservative readOnlyAlternative when the planned command fits the low-risk bash hand.
In that shell posture, nextStep may therefore change to a read-first or recovery-first shell move such as bash, shell_read_last_output, shell_describe, or shell_open, while nextOutcomeStep still exposes the current evaluator-directed follow-up when an active outcome exists.
For shell mutation tools, permissions_check now also returns shellReadFirstAlternatives, a small bounded menu of safer exploratory moves such as shell_describe, shell_read_last_output, bash pwd, and session_describe_context when context pressure is already elevated.
When the planned shell command is clearly a direct file read, bounded file preview, directory listing, or plain workspace search, that same preflight can now push the next step all the way down to first-class built-ins such as read, glob, or grep instead of stopping at bash.
That now also covers:
- common head -n / tail -n previews
- conservative sed -n '10,20p' style line-range previews
- simple wc -l <file> line-count checks
- ls -la / ls -al style listing variants
- conservative find <path> -name <pattern> searches and current-cwd find -name <pattern> searches
- bounded find <path> -type d|f -name <pattern> searches, by passing the optional glob.kind filter
- conservative grep -n / grep -i / grep -r style search variants and bounded rg -n / rg -i search variants, when the command still behaves like an ordinary inspection step
Query-only grep, rg, and find -name forms now fall back to the current session cwd instead of forcing a shell call just to search the current tree.
Those read previews now report both totalLineCount and selectedLineCount so the model can decide whether another narrower reread is needed without guessing file size.
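The projection from a planned shell command to a first-class built-in can be illustrated with a small classifier. This is a sketch under stated assumptions, not the runtime's parser: `projectReadOnlyAlternative`, the `BuiltinCall` shape, and argument names such as `maxLines` are hypothetical, and only a handful of the command forms mentioned above are handled. Anything it does not recognise returns null, which stands for "stay on the shell path".

```typescript
// Hypothetical projection result; argument names are illustrative only.
type BuiltinCall =
  | { tool: "read"; path: string; maxLines?: number }
  | { tool: "grep"; pattern: string; path: string }
  | { tool: "glob"; pattern: string; path: string };

// Pipes, chaining, redirects, substitution, and escapes are never projected.
const SHELL_META = /[|;&<>`$()\\]/;

// Strip one layer of matching quotes; reject tokens with stray quotes.
function unquote(token: string): string | null {
  const quoted = /^(['"])(.*)\1$/.exec(token);
  if (quoted) {
    const quote = quoted[1] ?? "";
    const inner = quoted[2] ?? "";
    return inner.includes(quote) ? null : inner;
  }
  return /['"]/.test(token) ? null : token;
}

// An operand is a present token that is not an option flag.
function isOperand(token: string | undefined): token is string {
  return token !== undefined && !token.startsWith("-");
}

function projectReadOnlyAlternative(command: string, cwd: string): BuiltinCall | null {
  if (SHELL_META.test(command)) return null;
  const tokens: string[] = [];
  for (const raw of command.trim().split(/\s+/)) {
    const token = unquote(raw);
    if (token === null) return null;
    tokens.push(token);
  }
  const [bin, a, b, c, ...extra]: (string | undefined)[] = tokens;
  if (extra.length > 0) return null;

  if (bin === "cat" && isOperand(a) && b === undefined) {
    return { tool: "read", path: a };
  }
  if (bin === "head" && a === "-n" && isOperand(b) && /^\d+$/.test(b) && isOperand(c)) {
    return { tool: "read", path: c, maxLines: Number(b) };
  }
  if (bin === "grep" && a === "-n" && isOperand(b) && (c === undefined || isOperand(c))) {
    // Query-only form: fall back to the current session cwd.
    return { tool: "grep", pattern: b, path: c ?? cwd };
  }
  if (bin === "find" && a === "-name" && isOperand(b) && c === undefined) {
    return { tool: "glob", pattern: b, path: cwd };
  }
  if (bin === "find" && isOperand(a) && b === "-name" && isOperand(c)) {
    return { tool: "glob", pattern: c, path: a };
  }
  return null; // not clearly an inspection step
}
```

The classifier is deliberately an allowlist: a command is projected only when every token is accounted for, so an unfamiliar flag keeps the request on the confirmation-gated path.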
permissions_describe now provides the broad posture view:
- pending confirmation state
- a read-only shell alternative when the pending blocked shell request can safely run through bash
- all always_ask managed tools
- all evaluator-gated shared mutation tools
- the current session’s bounded evaluator verdict
- the current session’s nextOutcomeStep when evaluator-gated promotion is still unsafe
- the current session’s contextPressure when the current assembled prompt is already dropping history or running low on headroom
- the current session’s shellMutationPosture and nextShellStep when the live session shell is busy or missing
- the current session’s shellReadFirstAlternatives, so shell preflight is not only prohibitive but also points at cheaper read-first seams
That same posture is materialized into .openboa-runtime/permission-posture.json and .openboa-runtime/permission-posture.md, so filesystem-first agents can reread current permission, evaluator, context-pressure, and shell recovery posture without depending on prompt-local summaries.
Custom tools
Custom tools are special because they do not complete inside the same bounded harness run. Instead:
- the harness emits agent.custom_tool_use
- the session pauses with requires_action
- a later user.custom_tool_result event resumes the session
Managed tool confirmation
Managed tools can now use the same requires_action pause seam.
Current flow:
- a managed tool with permissionPolicy: "always_ask" is invoked
- execution pauses before the tool side effect happens
- the session stores a pending tool-confirmation request
- the runtime emits a blocking agent.tool_use
- session.status_idle records blockingEventIds
- a later user.tool_confirmation event resumes the session
A user.interrupt event can clear the pending blocked state so the session is redirected instead of remaining pinned to the old confirmation request.
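The confirmation flow above can be read as a small state machine. The sketch below is illustrative only: the event names user.tool_confirmation and user.interrupt, the requires_action status, and blockingEventIds come from the text, while the `SessionState` shape, the `toolUseId` and `allow` fields, and both function names are assumptions.

```typescript
type PermissionPolicy = "always_allow" | "always_ask";

// Hypothetical session state for illustration.
interface SessionState {
  status: "running" | "requires_action";
  pendingTool: string | null;  // tool waiting for confirmation
  blockingEventIds: string[];  // ids of the blocking agent.tool_use events
  executed: string[];          // tools whose side effects actually ran
}

// Event payload fields (toolUseId, allow) are assumed.
type UserEvent =
  | { type: "user.tool_confirmation"; toolUseId: string; allow: boolean }
  | { type: "user.interrupt" };

const initialState: SessionState = {
  status: "running",
  pendingTool: null,
  blockingEventIds: [],
  executed: [],
};

function invokeManagedTool(
  state: SessionState,
  tool: string,
  policy: PermissionPolicy,
  toolUseId: string,
): SessionState {
  if (policy === "always_allow") {
    return { ...state, executed: [...state.executed, tool] };
  }
  // always_ask: pause BEFORE the side effect and record what is blocked.
  return { ...state, status: "requires_action", pendingTool: tool, blockingEventIds: [toolUseId] };
}

function applyUserEvent(state: SessionState, event: UserEvent): SessionState {
  const tool = state.pendingTool;
  if (state.status !== "requires_action" || tool === null) return state;
  const cleared: SessionState = { ...state, status: "running", pendingTool: null, blockingEventIds: [] };
  // An interrupt redirects the session without running the blocked tool.
  if (event.type === "user.interrupt") return cleared;
  // A confirmation only counts when it matches the blocking event.
  if (!state.blockingEventIds.includes(event.toolUseId)) return state;
  return event.allow ? { ...cleared, executed: [...cleared.executed, tool] } : cleared;
}
```

Keeping the side effect out of `invokeManagedTool` for always_ask tools is the point of the seam: nothing mutates until a matching confirmation arrives.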
Why proactive is not a tool family
There is no dedicated proactive.* tool family.
That is intentional.
Proactive continuation is currently expressed through:
- loop directive queuedWakes
- session wake queue
- orchestration consuming due wakes
- “do work now”
- “schedule the next revisit”
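To make the non-tool shape concrete, here is a minimal sketch of a session wake queue that orchestration could drain. Everything in it is hypothetical: the `QueuedWake` fields, the `WakeQueue` class, and its method names are illustrative and are not taken from the runtime.

```typescript
// Hypothetical wake record, as a loop directive might queue it.
interface QueuedWake {
  sessionId: string;
  dueAt: number; // epoch milliseconds
  note: string;  // why the session asked to be revisited
}

class WakeQueue {
  private wakes: QueuedWake[] = [];

  // "Schedule the next revisit": the harness enqueues wakes from the directive.
  enqueue(wake: QueuedWake): void {
    this.wakes.push(wake);
  }

  // Orchestration removes and returns every wake that is due, oldest first.
  takeDue(now: number): QueuedWake[] {
    const due = this.wakes
      .filter((w) => w.dueAt <= now)
      .sort((a, b) => a.dueAt - b.dueAt);
    this.wakes = this.wakes.filter((w) => w.dueAt > now);
    return due;
  }

  get size(): number {
    return this.wakes.length;
  }
}
```

Because scheduling is data on a queue rather than a callable, the model never needs a proactive.* tool: it states intent in the directive and the orchestrator decides when to wake the session.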
Managed navigation and recall tools
The current managed runtime exposes more than one kind of read surface.
Session navigation:
- environment_describe
- agent_describe_setup
- agent_compare_setup
- vault_list
- permissions_describe
- permissions_check
- session_list
- session_list_children
- session_get_snapshot
- session_describe_context
- session_get_events
- session_get_trace
- session_search_traces
- outcome_read
- outcome_grade
- outcome_evaluate
- outcome_history
- outcome_define
environment_describe now returns the environment fingerprint, the current session resource-contract fingerprint, the current agent-setup fingerprint, plus the materialized environment.json and agent-setup.json artifact paths, so the model can verify the exact execution contract it is standing on before it mutates or promotes anything.
agent_describe_setup is the setup-introspection seam.
It reads the materialized agent-setup.json and agent-setup.md contract for the current or another same-agent session, including the exact provider/model pairing, prompt-section fingerprints, bootstrap file fingerprints, managed tool catalog, skill catalog, environment contract, mounted resource contract, permission posture, and vault catalog that produced that session’s runtime.
agent_compare_setup is the bounded setup-drift seam.
It compares the current session’s materialized setup contract against another same-agent session and reports whether the two fingerprints match plus which setup sections changed, so cross-session reuse can stay explicit about setup compatibility before reopening prior work.
session_get_trace is now the canonical bounded reread seam for one wake.
By default it returns the full wake-scoped trace for the selected wakeId, including span.started and span.completed records for wake-level and tool-level execution, unless the caller explicitly narrows it with types or limit.
session_get_snapshot now also returns the materialized runtime artifact map for that session, the same agentSetupFingerprint, setupMatchesCurrent, the current evaluator posture (outcomeTrend, nextOutcomeStep), and a lifted requiresAction summary (pendingActionKind, pendingActionToolName) so the model can see whether the target session came from the same current setup, whether its latest outcome loop is actually improving, and whether it is already blocked on a bounded follow-up tool before reusing its work.
For the current session, that snapshot posture now uses the same live-shell-aware evaluator guard as outcome_evaluate, so a busy persistent shell forces outcomeStatus="not_ready" and points the next step at shell_wait instead of presenting stale promotion-safe posture.
That includes the current .openboa-runtime paths for context budget, outcome status/grade/evaluation, shell state/history/last-output, permissions, environment, tools, skills, vaults, traces, and event feed, so cross-session navigation can continue from a snapshot without guessing the next reread surface.
It also returns relationToCurrent and childCount, so multi-agent navigation can stay explicit about lineage and fanout instead of inferring it from raw metadata.
For navigation ergonomics, the snapshot summary also lifts outcomeStatus and promotionReady to the top level instead of forcing every caller to inspect the nested evaluator object first.
session_list, session_list_children, and session_run_child now include agentSetupFingerprint alongside resourceContractFingerprint, and they surface the latest evaluator posture (outcomeTrend, promotionReady, nextOutcomeStep) plus lifted requiresAction posture so the model can spot setup-compatible, still-improving, already-blocked sessions before opening a fuller snapshot.
When the current session itself is included in that list, its summary uses the same live-shell-aware evaluator guard, so navigation posture does not drift away from permissions_check or outcome_evaluate while a persistent shell command is still running.
The list tools also expose setupMatchesCurrent and outcomeMatchesCurrent, and they now accept bounded filters such as hasOutcome, outcomeStatus, promotionReady, status, and activeMinutes.
session_list also accepts lineage=related|parent|children|siblings and returns relationToCurrent plus childCount, so the model can stay inside the nearest parent/child/sibling cluster and see parent fanout before widening to all same-agent sessions.
It now also accepts outcomeTrend=first_iteration|improving|stable|regressing, and list ordering mildly prefers improving sessions over stalled or regressing ones when other signals are tied.
Its summaries also expose top-level outcomeStatus and promotionReady, so parent and sibling navigation can filter or sort on evaluator posture without extra nesting.
That lets a parent or same-agent session narrow navigation toward sessions that are already working on the same objective, already blocked on the same evaluator posture, or already promotion-safe before it opens a fuller snapshot.
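The filter-then-order behaviour of the list tools can be sketched as follows. This is an illustration under assumptions: the filter names promotionReady and outcomeTrend and the four trend values come from the text, but `lastActiveAt` as the primary ordering signal, the `SessionSummary` shape, and the `listSessions` function are hypothetical.

```typescript
type OutcomeTrend = "first_iteration" | "improving" | "stable" | "regressing";

interface SessionSummary {
  sessionId: string;
  outcomeTrend: OutcomeTrend;
  promotionReady: boolean;
  lastActiveAt: number; // assumed primary ordering signal (epoch ms)
}

interface SessionListFilter {
  promotionReady?: boolean;
  outcomeTrend?: OutcomeTrend;
}

function listSessions(all: SessionSummary[], filter: SessionListFilter = {}): SessionSummary[] {
  // Improving sessions win ties; every other trend ranks equally behind them.
  const trendRank = (s: SessionSummary): number => (s.outcomeTrend === "improving" ? 0 : 1);
  return all
    .filter((s) => filter.promotionReady === undefined || s.promotionReady === filter.promotionReady)
    .filter((s) => filter.outcomeTrend === undefined || s.outcomeTrend === filter.outcomeTrend)
    .sort((a, b) => {
      // Primary signal first; the trend is only a tie-breaker.
      if (a.lastActiveAt !== b.lastActiveAt) return b.lastActiveAt - a.lastActiveAt;
      return trendRank(a) - trendRank(b);
    });
}
```

Using the trend only as a tie-breaker is what makes the preference "mild": a more recently active stalled session still sorts ahead of an older improving one.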
session_search_traces is the wake-unit search seam.
It searches same-agent wake traces across sessions so the model can find one bounded prior execution run before calling session_get_trace to reread it in detail.
session_search_context, session_search_traces, memory_search, and retrieval_search now also accept a bounded lineage scope:
- related
- parent
- children
- siblings
Delegation:
- session_delegate
- session_list_children
- session_run_child
session_delegate creates a direct child session for the same agent and seeds it with a bounded task.
session_run_child then lets the parent advance that direct child for a few bounded cycles without collapsing both threads into one context window.
The delegated child summaries now also carry relationToCurrent="child", top-level outcomeStatus, promotionReady, and childCount, so parent sessions can inspect child posture without digging through nested evaluator state.
Cross-session recall:
- memory_list
- session_search_context
- retrieval_search
- memory_search
- memory_read
- memory_list_versions
- memory_read_version
- memory_write
- memory_promote_note
- learning_list
Learning-related tool surface
Learning is partly tool-shaped and partly harness-shaped. The harness captures learnings from loop directives. The managed tool surface then makes those learnings inspectable and promotable.
Today the most relevant learning-adjacent seams are:
- learning_list: inspect captured durable learnings
- memory_search: retrieve prior durable memory and learnings
- memory_read: reopen specific managed memory surfaces
- memory_promote_note: promote bounded durable notes into shared MEMORY.md
In short:
- learning capture happens in the harness
- learning inspection and promotion happen through tools
memory_search is now store-aware.
Instead of collapsing all prior session memory into one coarse checkpoint hit, it can return separate candidates for:
- shared workspace_memory
- managed workspace_memory_notes
- session_checkpoint
- session_evaluation
- session_outcome
- session_state
- working_buffer
- shell_state
Each candidate carries an expansion recommendation that points at a bounded reread seam such as agent_compare_setup, session_get_snapshot, outcome_evaluate, outcome_read, or memory_read(target=...).
When a matched prior session was produced by a different agentSetupFingerprint, retrieval now recommends agent_compare_setup before broader rereads like session_get_events or session_get_trace.
When the matched store is shell_state and a durable last command exists, the expansion recommendation now prefers shell_read_last_output before broader shell inspection.
When the current session has a materialized agentSetupFingerprint, memory hits from setup-compatible prior sessions receive a small deterministic ranking boost and report that setup affinity in the candidate metadata.
Retrieval candidates now also carry the matched session’s bounded evaluator posture (outcomeStatus, promotionReady, outcomeTrend) so the model can tell whether a prior session is still improving, stalled, or already promotion-safe before it rereads anything broader.
If the caller passes lineage, that same store-aware recall can be limited to parent/child/sibling session neighborhoods before scoring.
outcome_grade is the first bounded evaluation seam.
It does not replace a future separate evaluator context, but it gives the runtime a deterministic rubric for whether a session is missing an outcome, blocked, sleeping, in progress, or a done candidate before the model decides the next bounded move.
That same grade is also materialized into .openboa-runtime/outcome-grade.json, .openboa-runtime/outcome-grade.md, and .openboa-runtime/outcome-repair.md, and the harness may surface a bounded [outcome-repair] runtime note when the grade implies that the next tool choice should be more explicit than free-form continuation.
When the blocked managed tool is a shell mutation request whose stored command is actually read-only, outcome_grade now prefers bash as the next bounded move instead of sending the model back through another confirmation-oriented preflight.
outcome_evaluate is the bounded promotion-safety seam.
It inspects the current durable outcome, recent wake trace, and idle result to decide whether shared promotion is actually safe yet.
For the current session, that bounded evaluator now also respects live persistent-shell posture: if a persistent shell command is still running, the evaluator is forced back to not_ready and its next suggested tool becomes shell_wait instead of allowing promotion-safe conclusions while the execution hand is still unsettled.
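The live-shell guard described here amounts to a small override applied on top of whatever the evaluator concluded. A minimal sketch, assuming a hypothetical `EvaluatorVerdict` shape and function name; only the not_ready status, the pass status, and the shell_wait next tool are taken from the text.

```typescript
// Only the two status values named in the document are modelled here.
type EvaluatorStatus = "pass" | "not_ready";

interface EvaluatorVerdict {
  status: EvaluatorStatus;
  promotionReady: boolean;
  nextTool: string | null; // next suggested tool, if any
}

// Assumed view of the live persistent shell.
interface LiveShellPosture {
  busy: boolean;
  currentCommand?: string;
}

function applyLiveShellGuard(verdict: EvaluatorVerdict, shell: LiveShellPosture): EvaluatorVerdict {
  if (!shell.busy) return verdict;
  // A running command means the execution hand is unsettled:
  // never report promotion-safe posture, and point at shell_wait.
  return { status: "not_ready", promotionReady: false, nextTool: "shell_wait" };
}
```

Applying the guard as a final step, rather than inside the evaluator's own rubric, would let every surface that reports evaluator posture reuse the same override.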
outcome_evaluate now also reports a bounded evaluator trend:
- first_iteration
- improving
- stable
- regressing
The trend comes with a trendSummary, so self-improvement loops can tell whether the latest bounded revision actually improved evaluator posture.
That evaluator verdict is materialized into .openboa-runtime/outcome-evaluation.json and .openboa-runtime/outcome-evaluation.md.
The same runtime now also keeps a bounded durable evaluation history, exposed through outcome_history and materialized into .openboa-runtime/outcome-evaluations.json and .openboa-runtime/outcome-evaluations.md.
Each record carries an iteration, the wake that produced it, the grade posture that led into it, and the evaluator verdict, so the agent can inspect evaluator drift across repeated bounded revisions instead of only reading the latest pass/fail posture.
When a durable outcome exists but the evaluator still says promotion is unsafe, the harness may also surface a [promotion-gate] runtime note so the model sees the current blocker before it tries to mutate shared memory or shared substrate.
If that evaluator posture stays stable or turns regressing across repeated bounded passes, the harness may also surface a bounded [outcome-trend] runtime note that points back to outcome_history before more mutation.
session_describe_context is the bounded context-introspection seam.
It exposes the current wake’s assembled context budget, and it can also read the latest materialized context budget for another same-agent session.
That lets the model inspect prompt footprint, selected-vs-dropped history pressure, and top schema contributors without treating any single summary as truth.
When the current wake is obviously crowded, the harness may also inject a bounded [context-pressure] runtime note that points back to session_describe_context before the model keeps widening context blindly.
That same footprint is materialized into .openboa-runtime/context-budget.json and .openboa-runtime/context-budget.md.
memory_write is intentionally bounded.
Today it can:
- replace or append session-state.md
- replace or append working-buffer.md
- create an immutable version on each write
- enforce an optional expectedVersionId precondition for safe concurrent updates
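The combination of immutable versions and an optional expectedVersionId precondition is a form of optimistic concurrency. The sketch below shows the idea with an in-memory store; the `VersionedMemoryStore` class, its method names, and the `v1`, `v2` id format are assumptions for illustration, not the runtime's storage layer.

```typescript
interface MemoryVersion {
  readonly versionId: string;
  readonly content: string;
}

class VersionedMemoryStore {
  private versions: MemoryVersion[] = [];

  get latest(): MemoryVersion | undefined {
    return this.versions[this.versions.length - 1];
  }

  // Every successful write appends a new version; nothing is edited in place.
  write(mode: "replace" | "append", text: string, expectedVersionId?: string): MemoryVersion {
    const current = this.latest;
    // Precondition: fail if someone else wrote since the caller last read.
    if (expectedVersionId !== undefined && expectedVersionId !== current?.versionId) {
      throw new Error(
        `version conflict: expected ${expectedVersionId}, found ${current?.versionId ?? "none"}`,
      );
    }
    const base = current?.content ?? "";
    const content = mode === "append" ? base + text : text;
    const version: MemoryVersion = { versionId: `v${this.versions.length + 1}`, content };
    this.versions.push(version);
    return version;
  }

  // Audit trail: a copy, so callers cannot rewrite history.
  history(): readonly MemoryVersion[] {
    return [...this.versions];
  }
}
```

A caller that reads a version, edits, and writes back with that version's id either succeeds or learns that it must reread, which is what makes concurrent updates safe without locks.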
memory_promote_note is the shared-memory writeback seam.
Today it can:
- append or replace the managed notes section inside shared MEMORY.md
- require explicit confirmation before mutating that shared agent-level memory surface
- create an immutable version on each promoted note write
- enforce an optional expectedVersionId precondition before shared note promotion
- require outcome_evaluate to report status=pass before mutating shared notes when a durable outcome exists, unless the caller explicitly overrides that gate
memory_list_versions and memory_read_version expose the audit trail for writable managed memory stores.
Today that includes:
- session_state
- working_buffer
- workspace_memory_notes
memory_list exposes the current attached managed memory-store contract.
Today that includes:
- checkpoint
- shell_state
- session_state
- working_buffer
- workspace_memory
- workspace_memory_notes
retrieval_search is intentionally backend-agnostic.
The current deterministic ranking policy also treats setup-compatible prior sessions as a mild prior, so same-agent history from the same agent setup rises before otherwise similar hits from older or drifted setups.
When the current session has an active durable outcome, deterministic retrieval also gives a mild boost to prior sessions that were working against the same outcome title or overlapping success criteria.
When sessions are in a parent/child/sibling relationship, deterministic retrieval also carries a mild relation prior and can be explicitly scoped by lineage so multi-agent work stays inside the nearest delegated cluster before widening to all same-agent history.
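The three mild priors described above (setup compatibility, matching outcome, and lineage) can be pictured as small additive boosts on top of a base match score. This is a sketch only: the weights, the `baseScore` field, and the `rankCandidates` function are invented for illustration, and the real ranking policy may combine these signals differently.

```typescript
interface RetrievalCandidate {
  sessionId: string;
  baseScore: number; // assumed: text-match score from the backend
  setupMatchesCurrent: boolean;
  outcomeMatchesCurrent: boolean;
  relationToCurrent: "parent" | "child" | "sibling" | null;
}

// Illustrative weights, small enough that they only reorder near-ties.
const PRIORS = { setup: 0.05, outcome: 0.05, relation: 0.03 };

function scoreCandidate(c: RetrievalCandidate): number {
  return (
    c.baseScore +
    (c.setupMatchesCurrent ? PRIORS.setup : 0) +
    (c.outcomeMatchesCurrent ? PRIORS.outcome : 0) +
    (c.relationToCurrent !== null ? PRIORS.relation : 0)
  );
}

function rankCandidates(candidates: RetrievalCandidate[]): RetrievalCandidate[] {
  // Deterministic: ties on score fall back to sessionId order.
  return [...candidates].sort(
    (a, b) => scoreCandidate(b) - scoreCandidate(a) || a.sessionId.localeCompare(b.sessionId),
  );
}
```

Keeping the boosts additive and small preserves the "mild prior" property: a clearly better text match from an unrelated session still ranks first.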
Automatic same-agent recall also mixes the current outcome grade and evaluator posture into its query cue, so blocked, sleeping, or promotion-unsafe sessions can rediscover more relevant prior repair loops before they widen context again.
When the current evaluator verdict is still promotion-unsafe, retrieval_search also biases its expansion plan toward bounded verification seams such as outcome_read, outcome_evaluate, session_get_snapshot, and session_get_trace before it recommends broader event rereads.
If the current evaluator trend is stable or regressing, that same expansion plan now prefers outcome_history before another broad reread so repeated churn is inspected explicitly.
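The ordering rule for the expansion plan can be sketched as a short function. The tool names are the ones listed above; the `EvaluatorPosture` shape, the `buildExpansionPlan` name, and the choice of session_get_events as the only "broader reread" are assumptions for illustration.

```typescript
type OutcomeTrend = "first_iteration" | "improving" | "stable" | "regressing";

interface EvaluatorPosture {
  promotionReady: boolean;
  outcomeTrend: OutcomeTrend | null; // null when no evaluation exists yet
}

const VERIFICATION_SEAMS = [
  "outcome_read",
  "outcome_evaluate",
  "session_get_snapshot",
  "session_get_trace",
];
const BROAD_REREADS = ["session_get_events"]; // assumed stand-in for broader rereads

function buildExpansionPlan(posture: EvaluatorPosture): string[] {
  const plan: string[] = [];
  if (!posture.promotionReady) {
    // Repeated churn is inspected explicitly before anything else.
    if (posture.outcomeTrend === "stable" || posture.outcomeTrend === "regressing") {
      plan.push("outcome_history");
    }
    // Bounded verification comes before broad rereads.
    plan.push(...VERIFICATION_SEAMS);
  }
  plan.push(...BROAD_REREADS);
  return plan;
}
```

The plan is an ordered suggestion list, so a caller can stop at the first seam that resolves its question instead of walking every entry.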
Within the shell primitive, shell_read_last_output is the bounded reread seam for the most recent command result. It now also returns the current busyPlan, recoveryPlan, and nextStep, so a live-running or missing persistent shell can keep the next bounded move inside the shell primitive instead of forcing the model to switch surfaces just to decide what to do next.
It exposes the latest durable stdout/stderr summary plus the .openboa-runtime/shell-last-output.* artifact paths, without requiring the model to jump straight to broader shell history.
When the current session’s persistent shell is still busy, it also returns a liveCommand block with the running command and partial stdout/stderr preview so read-first inspection does not fall back to stale last-command memory.
shell_read_command is the bounded reread seam for one specific recent command, and it now mirrors busyPlan, recoveryPlan, and nextStep so command-specific rereads still stay inside the shell primitive when the live shell is unresolved.
It uses a durable commandId from shell_history so the model can reopen an earlier shell step without depending on unstable list position or broad shell summaries.
Today the built-in deterministic backends are memory, session-context, and session-trace search.
The interface is also shaped so optional backends such as vector search can be added later without redefining the Agent core.
Procedural guidance:
- skills_list
- skills_search
- skills_read
- shell_describe
- shell_history
- shell_wait: wait briefly on the live persistent shell and return bounded running/completed status plus the same busy/recovery next-step posture when the shell is still unresolved
- shell_read_command
- shell_set_cwd
- shell_set_env
- shell_unset_env
- shell_open
- shell_run
- shell_exec
- shell_close
session_describe_context now also returns a bounded pressure summary derived from the current context budget. That summary exposes a level, a short list of reasons, and recommendedTools such as retrieval_search, session_search_context, session_get_snapshot, session_get_trace, or shell_describe, so the model can pivot toward narrower reread seams before it keeps widening prompt-local context.
skills_list and skills_search return concise metadata plus a small preview, and they now also carry a bounded nextStep that points at skills_read(name) so the model can move from discovery to full procedure read without inventing its own follow-up shape.
skills_read loads the full skill body only when the model has decided the skill is relevant.
Execution hand introspection:
- read
- write
- edit
- glob
- grep
- bash
- sandbox_describe
- sandbox_execute
- resources_stage_from_substrate
- resources_list_versions
- resources_read_version
- resources_restore_version
- resources_compare_with_substrate
- resources_promote_to_substrate
The read, write, edit, glob, grep, and bash built-ins sit on top of sandbox_execute, but they expose the common file and command loop directly instead of forcing the model to manually specify sandbox action names.
The runtime also mirrors the live managed tool and permission contract into the session hand itself under /workspace/.openboa-runtime/managed-tools.json and /workspace/.openboa-runtime/permissions.json.
That lets filesystem-first agents re-check the current tool surface and confirmation posture without depending only on prompt text.
Current local semantics:
- read: bounded text read from mounted resources
- write: overwrite or create a file under a writable mounted root
- edit: exact text replacement inside a writable file
- glob: glob-style file or directory matching under a mounted root
- grep: bounded text search under a mounted root
- bash: bounded read-only non-shell command execution rooted in a mounted path
- shell_describe: inspect the current durable shell state, mounted hand constraints, command policy, recent bounded commands, current context pressure, read-first shell alternatives, and the runtime artifact paths for shell rereads
- shell_history: reread the recent bounded shell history, including small stdout/stderr previews, live busy/recovery posture, and the latest shell-output artifact paths, before continuing shell-driven work
- shell_wait: wait briefly on the current session’s live persistent shell command and return either bounded running status or the completed live result with the same shell artifact paths
  - when the command completes, it also syncs the durable shell-state and shell-last-output artifacts so later rereads do not depend on stale pre-completion memory
- shell_read_command: reread one specific recent shell command by durable commandId, including bounded stdout/stderr and the same shell artifact paths
- shell_set_cwd: update the durable session-scoped working directory for future bounded commands
- shell_set_env: persist one session-scoped shell environment variable for future bash and shell_run calls
- shell_unset_env: remove one session-scoped shell environment variable from the durable shell hand
- shell_open: open or reuse the current session’s persistent shell process before multi-step shell work
- shell_restart: restart the current session’s persistent shell process when shell_describe reports that the live shell is closed or stale
- shell_run: permission-gated one-shot writable shell execution inside the session execution hand
- shell_exec: permission-gated execution through the current session’s persistent shell so cwd and exported env survive across steps
- shell_close: close the current session’s persistent shell process when the multi-step shell loop is done
bash is intentionally the low-risk read-only command hand.
shell_run is the writable one-shot shell seam and requires explicit confirmation.
shell_exec is the writable persistent-shell seam and also requires explicit confirmation.
This is still not a persistent PTY-backed shell session, but together these tools now give the runtime:
- durable session-scoped cwd continuity
- durable session-scoped shell environment continuity
- optional durable session-scoped shell-process continuity
- recent command history with bounded output previews
- bounded read-only inspection
- permission-gated writable shell composition inside /workspace
shell_describe also prefers live sandbox introspection for the current session’s persistent shell status, so it does not rely only on durable runtime memory when a shell process has already exited.
That live view now includes busy and currentCommand, so the model can see whether a long-running shell step is still active before it issues another shell mutation.
When the live shell is busy, shell_describe now also returns a busyPlan that explicitly recommends shell_wait before another shell mutation, but it also exposes an evidencePlan for shell_read_last_output plus an allowlistedReadTools list (bash, read, glob, grep, session_get_snapshot, retrieval_search, and the shell read/status tools) so the runtime can keep gathering bounded evidence without treating the busy shell as a full stop.
That busy plan also includes a small live stdout/stderr preview when the running command has already emitted output.
When the live shell is closed or missing, it now returns a recoveryPlan pointing at shell_restart or shell_open so the model can repair shell continuity without guessing.
Outside those busy/recovery cases, shell_describe now also returns the same bounded contextPressure summary and shellReadFirstAlternatives menu that the permission surface uses, except it omits the recursive shell_describe self-reference. That lets the shell primitive itself point at bash, shell_read_last_output, or session_describe_context without requiring a separate permission preflight.
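The three shell_describe outcomes described above (recovery, busy, and read-first) can be sketched as one selection function. The plan names and recommended tools follow the text; the `LiveShellView` encoding, the `ShellGuidance` union, and the function name are assumptions, and the real response carries more fields than shown here.

```typescript
// Assumed encoding of the live shell view.
interface LiveShellView {
  state: "open" | "closed" | "missing";
  busy: boolean;
  currentCommand?: string;
}

type ShellGuidance =
  | { kind: "recoveryPlan"; recommend: string[] }
  | { kind: "busyPlan"; recommend: "shell_wait"; evidence: "shell_read_last_output" }
  | { kind: "readFirst"; alternatives: string[] };

function shellDescribeGuidance(shell: LiveShellView, contextPressureElevated: boolean): ShellGuidance {
  // Closed or missing shell: repair continuity first.
  if (shell.state !== "open") {
    return { kind: "recoveryPlan", recommend: ["shell_restart", "shell_open"] };
  }
  // Busy shell: wait before mutating, but evidence gathering stays available.
  if (shell.busy) {
    return { kind: "busyPlan", recommend: "shell_wait", evidence: "shell_read_last_output" };
  }
  // Otherwise reuse the permission surface's read-first menu...
  const menu = ["shell_describe", "shell_read_last_output", "bash pwd"];
  if (contextPressureElevated) menu.push("session_describe_context");
  // ...minus the recursive self-reference.
  return { kind: "readFirst", alternatives: menu.filter((tool) => tool !== "shell_describe") };
}
```

Returning a tagged union makes the three postures mutually exclusive, which mirrors the "outside those busy/recovery cases" wording above.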
shell_describe, shell_history, and shell_read_command all now return the .openboa-runtime/shell-state.json, shell-history.*, and shell-last-output.* paths so the model can reread recent shell evidence from the filesystem instead of trusting only prompt-local summaries.
Current sandbox actions are filesystem-like within mounted resources:
- list_dir
- read_text
- write_text
- append_text
- replace_text
- mkdir
- stat
- find_entries
- grep_text
- run_command
- run_shell
- inspect_persistent_shell
- open_persistent_shell
- exec_persistent_shell
- wait_persistent_shell
- close_persistent_shell
sandbox_describe is not just a friendly summary.
It returns the mounted resource map plus:
- constraints: action-level access hints
- commandPolicy
The model should read it before calling sandbox_execute, especially before run_command.
resources_stage_from_substrate, resources_compare_with_substrate, and resources_promote_to_substrate exist because the shared substrate mount is intentionally not writable through normal sandbox actions.
The model can stage a durable file into /workspace, compare it with the current substrate, revise it there, then explicitly promote the chosen result back into /workspace/agent through the managed tool surface.
Shared substrate writeback is now versioned.
- resources_list_versions: lists immutable versions for one promoted substrate path
- resources_read_version: rereads one immutable promoted substrate version by versionId
- resources_restore_version: restores one immutable substrate version back into the shared substrate as a new promoted version
  - can also enforce an optional expectedVersionId or expectedContentHash precondition before rollback writeback
  - now also defaults to the same outcome_evaluate gate when a durable outcome exists
- resources_promote_to_substrate: can enforce an optional expectedVersionId or expectedContentHash precondition before replacing shared substrate
  - now also defaults to the same outcome_evaluate gate when a durable outcome exists
resources_compare_with_substrate also returns the current content hashes and the latest recorded substrate version metadata.
That gives the model a safe optimistic contract: compare first, then pass the returned latestVersionId back as expectedVersionId when version history exists, or fall back to the returned expectedContentHash when the substrate file exists but has not been versioned yet.
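The compare-then-promote contract reduces to choosing which precondition to send. A minimal sketch, assuming the compare result exposes latestVersionId and expectedContentHash as nullable strings; the `CompareResult` and `PromotePrecondition` shapes and the `choosePrecondition` helper are hypothetical.

```typescript
// Assumed subset of what resources_compare_with_substrate returns.
interface CompareResult {
  latestVersionId: string | null;     // null when no version history exists
  expectedContentHash: string | null; // null when the substrate file does not exist
}

// Precondition fields accepted by the promote call, per the list above.
interface PromotePrecondition {
  expectedVersionId?: string;
  expectedContentHash?: string;
}

function choosePrecondition(compare: CompareResult): PromotePrecondition {
  // Prefer the version id whenever version history exists.
  if (compare.latestVersionId !== null) {
    return { expectedVersionId: compare.latestVersionId };
  }
  // File exists but was never versioned: fall back to the content hash.
  if (compare.expectedContentHash !== null) {
    return { expectedContentHash: compare.expectedContentHash };
  }
  // Nothing to guard against: the substrate file does not exist yet.
  return {};
}
```

Passing exactly one precondition keeps the failure mode unambiguous: a rejected promotion means the substrate changed after the compare, so the model should compare again before retrying.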
The important rule is:
- search and summaries provide hints
- exact reread tools provide verification
outcome_read and outcome_define sit between those two layers.
They make the session’s current success target durable without pretending that the outcome itself replaces the event log.
The outcome is meant to guide the bounded run, while the session log still records how the run actually evolved.
That is why retrieval_search should be read as candidate generation, not as canonical truth.
What does not belong here
Tool definitions are not:
- session scheduling
- application-specific routing
- external publication semantics
- environment configuration