Primitive

Insight

A compressed observation that changes strategy.

Definition

An insight is the product form of an aha moment. It compresses many runs, metrics, artifacts, and plots into a claim that changes what the research system should do next. A chart alone is not an insight; the insight is plot plus interpretation plus decision pressure.

How It Looks

Many runsPlot + claimStrategic decision

An insight looks like: MCTS-policy disagreement predicts useful training signal; evidence is a plot across checkpoints and game phases; suggested decision is to branch into disagreement-weighted policy training.

How To Use It

Use insights to allocate human attention. The system should not ask the user to inspect every run; it should surface the few observations that might change strategy.

Aha Moment

Autonomous research generates too much state. The valuable interface is a small number of visual or statistical compressions that make a hidden structure obvious enough for a human to decide.

Not Just Metrics

Metrics tell you what won. Insights try to explain why, when it might transfer, and what to try next. They can come from plots, ablations, counterexamples, slice analysis, representation geometry, or failure clustering.

Suspicion Included

An insight can be negative. A sudden score jump with too many submissions, weak OOD transfer, seed cherry-picking, or rising complexity is also an observation that should change strategy.

Show Examples

AutoGo plot insight

A plot shows that positions where MCTS argmax disagrees with raw policy argmax are the positions that later produce training gains. That changes the next row of experiments.

insight:
  claim: MCTS-policy disagreement is a proxy for target quality
  evidence: 42 runs across 9 checkpoints
  plot: artifacts/disagreement_vs_gain.png
  confidence: medium
  suggested_decision: branch_into_disagreement_weighted_training

W2S hack radar insight

A high PGR result is suspicious because it only improves after many evaluator submissions and fails transfer. The decision is not to promote; the decision is to run a sealed transfer evaluation.

insight:
  claim: frontier jump may be reward hacking
  evidence:
    submissions: high
    transfer: failed
    complexity: increased
  suggested_decision: rerun_on_shadow_eval

Owns / Defines

Claim, plot, evidence set, confidence, counter-evidence, affected lineage, and suggested decision.

Questions Operators Should Answer

  • What claim does this insight make?
  • Which plot or artifact makes the claim legible?
  • Which runs support it, and which runs contradict it?
  • Does it suggest continue, branch, stop, rerun, promote, or investigate?
  • Is it stable across seeds, datasets, environments, or only one lucky trace?