Primitive

Agent

The actor allowed to do research work.

Definition

An agent is the actor that performs research work under an explicit identity and permission boundary. It can be an LLM coding agent, a deterministic script, a baseline runner, a training loop, a human operator, or a custom service. Model agents by what they are allowed to see, decide, execute, write, and explain, not by whether they are powered by an LLM.

How It Looks

Agent identity + config | Tools, memory, permissions | Runs, artifacts, and audit events

An agent is an accountable actor plus a runtime, described by its identity, owner, provider or command, prompt/config, tool manifest, permission grants, budget, memory access, approval gates, and audit events.

How To Use It

Use agents whenever an actor can launch work, inspect state, call tools, mutate files, consume budget, read memory, or influence future decisions. Keeping humans, scripts, LLMs, and services in the same model gives the platform one place to enforce permissions, record provenance, and audit decisions.

Actor model

An agent is an accountable identity plus a runtime. The identity answers who or what acted; the runtime answers how the action was performed. A single person may operate several human agents with different permissions, and a single provider model may back several LLM agents with different prompts, tool manifests, and budgets.
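The split between identity and runtime can be sketched as a small data model. This is a minimal illustration, not a platform schema: the class names, the field names, and the `team-w2s` owner are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Identity:
    agent_id: str  # who or what acted
    owner: str     # accountable person or team (hypothetical field)


@dataclass(frozen=True)
class Runtime:
    kind: str    # e.g. "llm_coding_session", "script", "human"
    detail: str  # provider model, command, or similar "how" detail


@dataclass(frozen=True)
class Agent:
    identity: Identity
    runtime: Runtime
    permissions: FrozenSet[str]


# One provider model backing two LLM agents with different grants.
shared_runtime = Runtime(kind="llm_coding_session", detail="codex")
researcher = Agent(
    Identity("codex-w2s-researcher", "team-w2s"),
    shared_runtime,
    frozenset({"repo:write", "memory:read:w2s"}),
)
reviewer = Agent(
    Identity("codex-w2s-reviewer", "team-w2s"),
    shared_runtime,
    frozenset({"memory:read:w2s"}),
)
```

Both agents share a runtime but remain separate accountable identities, so an audit event can name exactly one of them.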

Do not make LLMs special in the core schema. A Claude coding session, a Codex session, a shell script, a nightly scheduler, a PyTorch training loop, a benchmark baseline, and a human reviewer all need the same high-level questions answered: what can it access, what can it change, what evidence did it use, and what trace did it leave behind?

LLM agents

LLM agents are interactive decision-makers that combine model calls with tools. Their record should include provider, model, model parameters, prompt stack, tool manifest, runtime image, working directory, budget, stop conditions, and any delegated sub-agents or child runs they create.

The prompt is part of the agent configuration, not loose UI text. System instructions, task prompts, benchmark-specific rules, tool-use policy, memory retrieval policy, and refusal or escalation rules should be versioned so a run can be reconstructed later. If a prompt changes, that is a new agent version even when the provider model is unchanged.
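One way to make "a prompt change is a new agent version" concrete is to derive the version from the full configuration. The sketch below assumes the configuration is JSON-serializable; the `agent_version` helper, the config keys, and the 12-character truncation are illustrative choices, not part of any existing API.

```python
import hashlib
import json


def agent_version(config: dict) -> str:
    """Return a short, stable fingerprint of an agent configuration."""
    # Sorting keys makes the fingerprint independent of key order.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


# Hypothetical configuration: same provider model, different prompt.
base = {
    "provider": "example-provider",
    "model": "example-model",
    "system_prompt": "Follow the benchmark rules.",
    "tools": ["editor", "shell"],
}
edited = {**base, "system_prompt": "Follow the benchmark rules. Escalate when unsure."}

v1 = agent_version(base)
v2 = agent_version(edited)
print(v1 != v2)  # the prompt edit yields a different version
```

Storing the fingerprint on every run lets a later reader check which prompt stack was in force without trusting mutable UI text.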

Scripts and baseline runners

Script agents are agents even when they make no autonomous decisions. A fixed command that preprocesses data, trains a baseline, evaluates a checkpoint, imports a paper, or converts artifacts can still consume budget, touch files, and produce results that later policies trust.

Baseline runners should be narrow agents with stable code snapshots, deterministic commands where possible, pinned containers, and minimal permissions. Treat them as first-class agents so comparisons can distinguish a new research idea from the reference implementation, official baseline, smoke-test runner, or regression-check runner.

Humans

Humans should be represented as agents when they approve actions, edit artifacts, write memory, label outputs, resolve failures, or launch runs. This does not imply the platform controls the human; it means human interventions have identity, timestamp, scope, and provenance like every other consequential action.

Human agents often sit at approval gates. They may grant temporary filesystem access, permit a network call, accept a policy proposal, mark an evaluation as invalid, or convert an observation into a high-confidence finding. Those decisions should be attached to the affected run, policy, memory record, or evaluation.

Permissions

Permissions are the boundary around the agent, not an afterthought on tools. Explicitly model filesystem roots, network domains, secrets, datasets, benchmark splits, GPUs, budget ceilings, evaluation access, and write privileges. Prefer scoped grants over global roles, and record the reason, approver, duration, and affected run whenever permissions are elevated.

Permission checks should happen before tool calls and before memory or artifact reads. A training agent may read public data and write checkpoints while being blocked from hidden labels, private scorers, evaluator code, production secrets, or promotion privileges.
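A scoped, time-limited grant check might look like the following sketch. The `Grant` fields mirror the reason, approver, and duration mentioned above; the glob-style pattern matching via `fnmatch` and the permission strings are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatchcase
from typing import Iterable


@dataclass(frozen=True)
class Grant:
    pattern: str          # e.g. "dataset:train_read" or "memory:read:*"
    reason: str
    approver: str
    expires_at: datetime  # grants are time-limited, never permanent


def is_allowed(grants: Iterable[Grant], action: str, now: datetime) -> bool:
    """Allow an action only if an unexpired grant matches it."""
    return any(
        fnmatchcase(action, g.pattern) and now < g.expires_at for g in grants
    )


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
grants = [
    Grant("dataset:train_read", "training run", "human-reviewer", now + timedelta(hours=1)),
    Grant("checkpoints:write", "training run", "human-reviewer", now + timedelta(hours=1)),
]

print(is_allowed(grants, "dataset:train_read", now))     # matching, unexpired grant
print(is_allowed(grants, "dataset:hidden_labels", now))  # no grant covers hidden labels
```

The check is deny-by-default: an action with no matching grant, or with only an expired one, is refused.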

Tools

Tools are callable capabilities exposed to an agent: shell, editor, container runtime, dataset reader, evaluator, memory search, artifact store, web fetcher, queue launcher, or approval requester. Tool access should be described by manifest, version, input schema, output schema, side effects, and sandbox policy.

Separate tool availability from permission. An agent can know that a tool exists while being denied a specific invocation because the requested path, secret, dataset, host, GPU pool, or benchmark split is outside its grant. The denied call is still useful audit data because it explains what the agent attempted and where the boundary held.
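The distinction between availability and permission can be sketched as a gate that records every request, including denied ones. `ToolGate` and its fields are hypothetical, and the path check is a plain string comparison on normalized POSIX paths that does not resolve symlinks.

```python
import posixpath
from dataclasses import dataclass, field


@dataclass
class ToolGate:
    manifest: frozenset   # tools the agent knows exist
    allowed_roots: tuple  # filesystem roots inside the agent's grant
    audit_log: list = field(default_factory=list)

    def _in_grant(self, path: str) -> bool:
        clean = posixpath.normpath(path)  # collapses "..", does not follow symlinks
        for root in self.allowed_roots:
            root = posixpath.normpath(root)
            prefix = root if root.endswith("/") else root + "/"
            if clean == root or clean.startswith(prefix):
                return True
        return False

    def request(self, tool: str, path: str) -> bool:
        if tool not in self.manifest:
            outcome = "unknown_tool"
        elif not self._in_grant(path):
            outcome = "denied"  # the tool exists, but this invocation is out of scope
        else:
            outcome = "allowed"
        # Denied and unknown requests are logged too: they show where the boundary held.
        self.audit_log.append({"tool": tool, "path": path, "outcome": outcome})
        return outcome == "allowed"


gate = ToolGate(frozenset({"editor", "shell"}), ("/workspace/repo",))
gate.request("editor", "/workspace/repo/train.py")
gate.request("editor", "/data/hidden_labels.csv")
```

After these two calls the audit log holds one allowed and one denied entry for the same tool.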

Memory access

Agents need memory access rules for both reads and writes. Reads should define scope, filters, ranking, citation requirements, freshness windows, and whether low-confidence or contradicted records are visible. Writes should define whether the agent can store raw observations, proposed findings, validated findings, run notes, or private scratchpad state.

Memory access should be visible in the run trace. If an agent launches expensive work because it retrieved three prior failures and one promising result, those memory records should be linked. If a policy blocks a configuration because memory says it is known-bad, the blocked decision should be inspectable too.
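A read rule that filters by scope and confidence, and links what was retrieved into the run trace, could be sketched as follows. The `MemoryRecord` fields, the threshold, and the trace event shape are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MemoryRecord:
    record_id: str
    scope: str
    confidence: float
    contradicted: bool
    text: str


def read_memory(records: List[MemoryRecord], scope: str,
                min_confidence: float, trace: list) -> List[MemoryRecord]:
    """Return visible records and link their ids into the run trace."""
    visible = [
        r for r in records
        if r.scope == scope and r.confidence >= min_confidence and not r.contradicted
    ]
    trace.append({
        "event": "memory_read",
        "scope": scope,
        "record_ids": [r.record_id for r in visible],
    })
    return visible


records = [
    MemoryRecord("m1", "w2s", 0.9, False, "Learning rate 1e-3 diverged."),
    MemoryRecord("m2", "w2s", 0.3, False, "Unverified note."),
    MemoryRecord("m3", "w2s", 0.8, True, "Superseded finding."),
    MemoryRecord("m4", "autogo", 0.9, False, "Out-of-scope record."),
]
trace: list = []
visible = read_memory(records, "w2s", 0.5, trace)
```

Only `m1` survives the filter here, and the trace entry records exactly which records informed the agent's next decision.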

Auditability

Every consequential agent action should leave an audit event: prompt/config used, memory read, tool call requested, permission check result, command executed, artifact written, evaluation submitted, approval granted, and memory written. The audit log is what lets researchers distinguish model insight from prompt leakage, hidden state, operator intervention, or benchmark contamination.

Auditability is especially important for autonomous loops. When a policy launches an agent, and that agent launches child runs, the platform should preserve the chain from objective to proposal to approval to execution to evaluation to memory update. Without that chain, the system cannot explain why resources were spent or why a result should be trusted.
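The chain from objective to memory update can be reconstructed if every audit event points at the event that caused it. This is a minimal sketch; the `AuditEvent` fields, the `parent_id` link, and the event ids are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AuditEvent:
    event_id: str
    kind: str
    actor: str
    parent_id: Optional[str] = None  # the event that caused this one


def chain_to_root(events: List[AuditEvent], event_id: str) -> List[str]:
    """Walk parent links from an event back to its root, oldest first."""
    by_id = {e.event_id: e for e in events}
    chain, seen = [], set()
    current = by_id.get(event_id)
    while current is not None and current.event_id not in seen:
        seen.add(current.event_id)  # guards against accidental cycles
        chain.append(current.kind)
        current = by_id.get(current.parent_id) if current.parent_id else None
    return list(reversed(chain))


events = [
    AuditEvent("e1", "objective", "policy"),
    AuditEvent("e2", "proposal", "codex-w2s-researcher", "e1"),
    AuditEvent("e3", "approval", "human-reviewer", "e2"),
    AuditEvent("e4", "execution", "codex-w2s-researcher", "e3"),
    AuditEvent("e5", "evaluation", "evaluator", "e4"),
    AuditEvent("e6", "memory_update", "codex-w2s-researcher", "e5"),
]
lineage = chain_to_root(events, "e6")
```

A broken or missing parent link shortens the returned chain, which is itself a signal that provenance was lost somewhere along the way.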

Show Examples

W2S LLM research agent

A Codex-backed W2S agent receives a benchmark task, retrieves allowed memories about prior weak-to-strong experiments, edits training code, runs a bounded sweep, submits predictions to the trusted evaluator, and writes a memory note with citations to the run and score. It can read public train data and previous findings, but it cannot read hidden labels or modify the official scorer.

{
  "agent": "codex-w2s-researcher",
  "runtime": "llm_coding_session",
  "model": "codex",
  "permissions": ["repo:write", "docker:run", "memory:read:w2s", "memory:write:observations"],
  "denied": ["dataset:hidden_labels", "evaluation:scorer_write"],
  "tools": ["editor", "shell", "docker", "memory_search", "evaluation_submit"]
}

W2S baseline runner

A baseline runner executes a pinned reference command for comparison. It has no model prompt and no broad editor access. Its value is that every baseline result carries the same provenance model as an autonomous agent result: code version, container, command, dataset grant, logs, artifacts, and evaluation submission.

{
  "agent": "w2s-baseline-reference",
  "runtime": "script",
  "command": "python train_baseline.py --config configs/reference.yaml",
  "permissions": ["dataset:train_read", "artifact:write", "evaluation:submit"],
  "tools": ["shell", "artifact_store"]
}

AutoGo self-play agent

An AutoGo training agent runs self-play, writes checkpoints, and requests arena evaluation against a stable baseline. It may read memories about failed hyperparameters and allocate a GPU budget, but it cannot change the arena rules or promote itself without a policy or human approval gate.

{
  "agent": "autogo-self-play-v3",
  "runtime": "training_loop",
  "permissions": ["gpu:allocate:limited", "checkpoints:write", "memory:read:autogo"],
  "approval_required": ["promote_model", "increase_gpu_budget"],
  "tools": ["queue_launcher", "checkpoint_store", "arena_submit", "memory_search"]
}

Human approval agent

A human reviewer is modeled as an agent when they approve an escalation, reject a suspicious evaluation, or validate a finding. This makes manual judgment auditable without pretending it was produced by the automated researcher.

{
  "agent": "human-reviewer",
  "runtime": "human",
  "actions": ["approve_permission", "reject_run", "validate_memory"],
  "audit": ["actor_id", "timestamp", "reason", "target_run_id"]
}

Owns / Defines

Identity, runtime, tools, permissions, prompt/configuration, memory access, approvals, and audit trail.

Questions Operators Should Answer

What kind of actor is this agent: LLM coding agent, deterministic script, baseline runner, training loop, service, scheduler, policy executor, or human?
Which identity, owner, provider, model, prompt stack, runtime image, code commit, command, tool manifest, and environment variables define this agent version?
Which filesystems, networks, secrets, datasets, benchmark splits, GPU pools, queues, artifact stores, and evaluation endpoints can the agent access?
Which tools are available, which invocations require approval, and how are denied tool calls recorded?
What can the agent read from memory, what can it write, and what citations or confidence levels are required before memory affects future work?
Which actions are autonomous, which are human-driven, which are scheduled by policy, and which require explicit approval gates?
How are prompts, retrieved memories, tool calls, permission grants, artifacts, evaluations, and memory writes audited for reproducibility?
How can the agent be paused, interrupted, resumed, rate-limited, budget-limited, revoked, or rolled back?
How are baseline runners distinguished from exploratory agents while still using the same provenance and permission model?
What prevents benchmark leakage, evaluator mutation, hidden-data access, or self-promotion without independent approval?