Primitive

Run

The thing that happened.

Definition

A run is a concrete execution of work by an agent or system. It records configuration, status, logs, artifacts, resources, metrics, lineage, and an optional parent run. Do not split Trial into a separate primitive; seed attempts, sweeps, game batches, retries, and pipeline phases can all be represented as runs or child runs.

How It Looks

Parent run -> Child runs / phases / seeds -> Artifacts + signals

A run looks like a timestamped execution record with a parent id, agent id, benchmark id, dataset versions, command, resource lease, logs, artifacts, metrics, and child runs for repeated phases or seeds.
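The record described above could be sketched as a small data structure. This is only an illustration: the field names, types, and the `add_child` helper are assumptions chosen to mirror the prose, not a schema that Picidae defines.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Run:
    """Hypothetical run record; field names are illustrative assumptions."""
    run_id: str
    agent_id: str
    benchmark_id: str
    command: str
    parent_id: Optional[str] = None                       # None for a top-level run
    dataset_versions: dict = field(default_factory=dict)  # dataset name -> version
    resource_lease: Optional[str] = None
    logs: list = field(default_factory=list)
    artifacts: list = field(default_factory=list)
    metrics: dict = field(default_factory=dict)
    children: list = field(default_factory=list)          # child Run objects
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def add_child(self, child: "Run") -> "Run":
        """Attach a child run (a seed, phase, retry, or batch) to this run."""
        child.parent_id = self.run_id
        self.children.append(child)
        return child
```

A seed attempt is then just `parent.add_child(Run(...))`; no separate Trial type is needed.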

How To Use It

Use one run for anything that actually happened. If the work has repeated seeds, phases, retries, or batches, attach them as child runs instead of inventing a separate Trial primitive.

Run Replaces Trial

Picidae should avoid treating trial as a separate primitive. A trial is usually just a run with a parent: one seed, one sweep point, one game batch, one retry, one collect phase, or one evaluation phase. This keeps the backend simple while still supporting nested execution trees.

Lifecycle

A run moves through lifecycle states such as queued, provisioning, syncing data, running, evaluating, completed, failed, canceled, and archived. Each transition should be timestamped and explainable because the scheduler, UI, billing system, and policy layer all depend on run state.
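One way to make transitions timestamped and explainable is a small state machine that refuses illegal moves and records a reason for each change. The sketch below uses the state names listed above; the table of legal transitions is an assumption, since the text names the states but not the edges between them.

```python
from datetime import datetime, timezone
from enum import Enum


class RunState(Enum):
    QUEUED = "queued"
    PROVISIONING = "provisioning"
    SYNCING_DATA = "syncing data"
    RUNNING = "running"
    EVALUATING = "evaluating"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELED = "canceled"
    ARCHIVED = "archived"


# Assumed happy path; a real scheduler may allow other edges.
LEGAL = {
    RunState.QUEUED: {RunState.PROVISIONING},
    RunState.PROVISIONING: {RunState.SYNCING_DATA},
    RunState.SYNCING_DATA: {RunState.RUNNING},
    RunState.RUNNING: {RunState.EVALUATING},
    RunState.EVALUATING: {RunState.COMPLETED},
    RunState.COMPLETED: {RunState.ARCHIVED},
    RunState.FAILED: {RunState.ARCHIVED},
    RunState.CANCELED: {RunState.ARCHIVED},
    RunState.ARCHIVED: set(),
}
# Assumption: any state before completion can also fail or be canceled.
for _state in (RunState.QUEUED, RunState.PROVISIONING, RunState.SYNCING_DATA,
               RunState.RUNNING, RunState.EVALUATING):
    LEGAL[_state] |= {RunState.FAILED, RunState.CANCELED}


class RunLifecycle:
    """Tracks the current state plus a timestamped, explained history."""

    def __init__(self):
        self.state = RunState.QUEUED
        self.history = [(RunState.QUEUED, datetime.now(timezone.utc), "created")]

    def transition(self, new_state, reason):
        if new_state not in LEGAL[self.state]:
            raise ValueError(
                f"illegal transition {self.state.value} -> {new_state.value}"
            )
        self.state = new_state
        self.history.append((new_state, datetime.now(timezone.utc), reason))
```

Keeping the reason next to the timestamp means the scheduler, UI, billing system, and policy layer can all read the same history instead of inferring why a run changed state.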

What A Run Captures

A useful run record captures the agent, benchmark, datasets, evaluation targets, compute target, image digest, command, environment, config, seed, parent run, logs, artifacts, metrics, resource usage, cost, and final status. This is the minimum needed to reproduce or audit what happened.
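A simple audit can flag records that fall short of this minimum. The sketch below assumes a run record is a plain dictionary and that the key names match the list above; both are illustrative choices. The parent run is left out of the required set because it is optional.

```python
# Assumed key names, taken from the capture list in the text.
REQUIRED_FIELDS = (
    "agent", "benchmark", "datasets", "evaluation_targets", "compute_target",
    "image_digest", "command", "environment", "config", "seed", "logs",
    "artifacts", "metrics", "resource_usage", "cost", "final_status",
)


def missing_fields(record: dict) -> list:
    """Return required fields that are absent or None, in a stable order.

    Falsy values such as seed 0 or an empty log list count as present.
    """
    return [name for name in REQUIRED_FIELDS if record.get(name) is None]
```

A record for which `missing_fields` returns an empty list has everything the text calls the minimum for reproduction or audit.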

Nested Runs

Nested runs let one high-level run represent a research attempt while children represent repeated work. A parent can contain one child per seed, one child per hyperparameter point, one child per pipeline phase, or one child per evaluation batch. The primitive stays the same; lineage carries the structure.
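Because lineage carries the structure, a flat list of runs with a parent pointer is enough to rebuild the whole tree. The sketch below assumes each run is stored as a dictionary with `id` and `parent_run_id` keys; the run names and the `kind` labels are made up for illustration.

```python
from collections import defaultdict

# Hypothetical flat storage: every row is a run, nesting is only a pointer.
runs = [
    {"id": "attempt-1", "parent_run_id": None, "kind": "research-attempt"},
    {"id": "seed-42", "parent_run_id": "attempt-1", "kind": "seed"},
    {"id": "seed-43", "parent_run_id": "attempt-1", "kind": "seed"},
    {"id": "seed-42-eval", "parent_run_id": "seed-42", "kind": "evaluation-batch"},
]


def build_children_index(runs):
    """Map each parent id (or None for roots) to its child ids."""
    index = defaultdict(list)
    for run in runs:
        index[run["parent_run_id"]].append(run["id"])
    return index


def descendants(run_id, index):
    """All runs below run_id, in depth-first pre-order."""
    out = []
    stack = list(reversed(index.get(run_id, [])))
    while stack:
        current = stack.pop()
        out.append(current)
        stack.extend(reversed(index.get(current, [])))
    return out


def lineage(run_id, runs):
    """Path from the root run down to run_id."""
    by_id = {run["id"]: run for run in runs}
    path = []
    current = run_id
    while current is not None:
        path.append(current)
        current = by_id[current]["parent_run_id"]
    return list(reversed(path))
```

The same two queries work whether a child is a seed, a hyperparameter point, a pipeline phase, or an evaluation batch, which is the point of keeping a single primitive.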

Data And Artifact Flow

Runs are where the data plane becomes visible. Before execution, datasets may be mounted, copied, or streamed. During execution, logs and metrics stream back. After execution, artifacts such as checkpoints, predictions, reports, game records, and traces are persisted and linked to memory.

Examples

W2S idea across seeds

A parent run tests one weak-to-strong idea. It creates five child runs for seeds 42 through 46. Each child emits predictions and metrics. The parent aggregates the signals and writes memory if the result is stable enough.

run: confidence-reweighting
children:
  - seed: 42
  - seed: 43
  - seed: 44
  - seed: 45
  - seed: 46
evaluation: pgr-hidden-labels
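The aggregation step for this example might look like the sketch below. The metric name `pgr`, the standard-deviation threshold, and the idea of using spread across seeds as the stability test are all assumptions; the text only says the parent writes memory if the result is stable enough.

```python
from statistics import mean, stdev


def aggregate_seeds(child_metrics, metric="pgr", max_std=0.05):
    """Summarize one metric across child runs keyed by seed.

    child_metrics maps seed -> metrics dict; at least two seeds are needed
    because the sample standard deviation is undefined for a single value.
    """
    values = [metrics[metric] for metrics in child_metrics.values()]
    summary = {"mean": mean(values), "std": stdev(values), "n": len(values)}
    # Assumed stability rule: low spread across seeds.
    summary["stable"] = summary["std"] <= max_std
    return summary


def maybe_write_memory(memory, run_name, summary):
    """Append the aggregated signal to memory only when it is stable."""
    if summary["stable"]:
        memory.append({"run": run_name, **summary})
    return summary["stable"]
```

A parent run would call `aggregate_seeds` once all five children have finished, then `maybe_write_memory` with the resulting summary.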

AutoGo pipeline

An AutoGo run can represent a full iteration. Child runs collect self-play data, train from the new data, and evaluate the checkpoint in an arena. Failures in any phase are attached to the same parent lineage.

run: autogo-iteration-17
children:
  - collect-games
  - train-checkpoint
  - arena-evaluation
artifacts:
  - game_records/
  - checkpoints/iter17.pt
  - arena_report.json
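The phase ordering and failure handling for such an iteration can be sketched as follows. The dictionary layout, the `parent/phase` id scheme, and the choice to stop at the first failed phase are assumptions made for illustration; the phase callables stand in for real collect, train, and evaluate jobs.

```python
def run_pipeline(parent_id, phases):
    """Run phases in order as child runs of one parent.

    phases is a list of (name, callable) pairs. A failure is recorded on the
    child and marks the parent failed, so it stays in the same lineage.
    """
    parent = {"id": parent_id, "status": "running", "children": []}
    for name, phase_fn in phases:
        child = {
            "id": f"{parent_id}/{name}",
            "parent_run_id": parent_id,
            "status": "running",
            "failure_reason": None,
        }
        parent["children"].append(child)
        try:
            phase_fn()
            child["status"] = "completed"
        except Exception as exc:
            child["status"] = "failed"
            child["failure_reason"] = f"{type(exc).__name__}: {exc}"
            parent["status"] = "failed"
            break  # assumed policy: later phases do not run after a failure
    else:
        parent["status"] = "completed"
    return parent
```

Calling `run_pipeline("autogo-iteration-17", [...])` with the three phases above yields one parent record whose children show exactly which phase ran, which failed, and why.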

Owns / Defines

Action, status, logs, artifacts, metrics, resources, config, and an optional parent run.

Questions Operators Should Answer

What starts the run: user action, policy decision, schedule, webhook, retry, or child workflow?
Which state transitions are legal among queued, provisioning, syncing data, running, evaluating, completed, failed, canceled, and archived?
What must be captured for reproducibility: config, environment, code snapshot, dataset versions, seeds, and resource allocation?
How should nested work be modeled: parent_run_id, phases, attempts, batches, or linked evaluations?
What artifacts, logs, metrics, and failure reasons are queryable during execution versus after completion?