Primitive

Card

One research move.

Definition

A card is one move in the research game: a hypothesis, code change, diagnostic, ablation, rerun, or lateral idea that can gather evidence. Runs are executions. Cards are the ideas those executions test. This lets Picidae show research as a decision tree instead of a flat pile of logs.
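The split between cards and runs can be sketched as a small data model. This is a minimal illustration, not Picidae's actual schema: the class names `Run` and `Card`, the `card_id` field, the default status "open", and the `render_tree` helper are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Run:
    """One execution (hypothetical shape): a seed, ablation, phase, or rerun."""
    run_id: str
    score: Optional[float] = None  # None if the run produced no metric


@dataclass
class Card:
    """One research move: the idea that its linked runs test."""
    card_id: str
    title: str
    hypothesis: str
    parent: Optional[str] = None  # card_id of the parent card; None for a root
    runs: list[Run] = field(default_factory=list)
    status: str = "open"  # assumed default, not taken from the source


def render_tree(cards: list[Card], parent: Optional[str] = None, depth: int = 0) -> list[str]:
    """Return indented lines showing cards as a tree instead of a flat list."""
    lines = []
    for card in cards:
        if card.parent == parent:
            lines.append("  " * depth + f"{card.title} [{card.status}, {len(card.runs)} runs]")
            # Recurse into cards that name this card as their parent.
            lines.extend(render_tree(cards, card.card_id, depth + 1))
    return lines


if __name__ == "__main__":
    demo = [
        Card("baseline", "baseline", "reference configuration"),
        Card("local-attn", "switch to local-only attention",
             "shorter windows improve throughput without hurting val_bpb",
             parent="baseline", runs=[Run("run_122"), Run("run_123")],
             status="discarded"),
    ]
    print("\n".join(render_tree(demo)))
```

Storing the parent as an id rather than a nested object keeps each card flat and serializable, and the tree is rebuilt only when it needs to be displayed.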

How It Looks

[Diagram: Question, Card / hypothesis, Runs as evidence]

A card looks like: try disagreement-weighted labels because MCTS-policy disagreement may indicate target quality; run three seeds; compare score, variance, and diagnostic plots; then continue, branch, stop, or promote.

How To Use It

Use cards to preserve what the agent was trying to learn. A smart agent may run many commands, but the human needs the strategic unit: what idea was tested, what happened, and what decision it implies.

Not Just A Run

A run says something happened. A card says why it happened. One card may have many runs for seeds, ablations, phases, or reruns. A failed run can still support a useful card if it falsifies a hypothesis or reveals a boundary.
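One way to roll several runs up into card-level evidence is sketched below. It assumes a hypothetical representation in which each run maps to a score, with `None` marking a failed run; the function name `summarize_runs` and the returned keys are illustrative, not part of Picidae.

```python
from statistics import mean, pstdev
from typing import Optional


def summarize_runs(scores: dict[str, Optional[float]]) -> dict:
    """Summarize the runs linked to one card.

    scores maps run_id -> score, with None for a run that failed.
    Failed runs are counted rather than dropped, because a failure
    can itself be evidence (for example, it may reveal a boundary).
    """
    finished = [s for s in scores.values() if s is not None]
    return {
        "n_runs": len(scores),
        "n_failed": len(scores) - len(finished),
        "mean": mean(finished) if finished else None,
        # Population standard deviation across finished runs (e.g. seeds).
        "spread": pstdev(finished) if len(finished) > 1 else None,
    }
```

Whether a given failure falsifies the hypothesis or is only an infrastructure problem remains a judgment the card should record; the counts above only make sure the failure stays visible.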

Phone-Sized Unit

Cards are designed for supervision from a phone. The user should be able to see title, claim, score movement, evidence quality, suspicion flags, and suggested action without opening terminal logs.
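A phone-sized view could be produced by a small formatter like the one below. This is a sketch under stated assumptions: the dictionary keys (`claim`, `score_delta`, `evidence_quality`, `suspicion_flags`, `suggested_action`) and the 40-character width are hypothetical choices for illustration, not a documented Picidae format.

```python
import textwrap


def phone_summary(card: dict, width: int = 40) -> str:
    """Render the six fields a supervisor needs, one short line each."""
    flags = ", ".join(card.get("suspicion_flags", [])) or "none"
    delta = card.get("score_delta")
    movement = "n/a" if delta is None else f"{delta:+.3f}"
    lines = [
        card["title"],
        f"claim: {card['claim']}",
        f"score: {movement}",
        f"evidence: {card.get('evidence_quality', 'unknown')}",
        f"flags: {flags}",
        f"next: {card.get('suggested_action', 'undecided')}",
    ]
    # Truncate each line so the whole card fits a narrow screen.
    return "\n".join(
        textwrap.shorten(line, width=width, placeholder="...") for line in lines
    )
```

Truncating per line rather than per card keeps every field present on screen, even when the claim is long.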

Agent Freedom

Picidae should not force the agent into a rigid workflow. The agent can invent experiments freely, but every meaningful attempt should eventually be compressed into a card so the research state is inspectable.

Examples

Autoresearch card

The agent changes the attention window pattern. The card links the diff, training runs, metric movement, VRAM cost, and whether the change should be kept.

card:
  title: switch to local-only attention
  hypothesis: shorter windows improve throughput without hurting val_bpb
  parent: baseline
  runs: [run_122, run_123, run_124]
  status: discarded
  decision_reason: worse val_bpb at similar memory

Ablation card

A high-performing idea can spawn cards for ablations. These cards check whether each component actually contributes to the result or only adds complexity.

card:
  title: remove posterior tempering
  parent: em-posterior-v4
  type: ablation
  suggested_action: keep parent, reject ablation

Owns / Defines

Hypothesis, proposed change, parent card, linked runs, status, evidence summary, and next action.

Questions Operators Should Answer

  • What hypothesis or research move does this card represent?
  • Which runs and artifacts support or contradict it?
  • Is the card a continuation, branch, ablation, diagnostic, rerun, or promotion candidate?
  • What should happen next: continue, branch, stop, rerun, investigate, or promote?
  • Can the card be understood without reading raw logs?
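The questions above can be turned into a simple completeness check. The sketch below assumes cards are plain dictionaries; the key names `type`, `next_action`, and `evidence_summary`, and the function `open_questions`, are hypothetical. The allowed values are taken from the lists in the questions themselves.

```python
# Values taken from the operator questions; key names are assumptions.
CARD_TYPES = {"continuation", "branch", "ablation", "diagnostic", "rerun", "promotion"}
NEXT_ACTIONS = {"continue", "branch", "stop", "rerun", "investigate", "promote"}


def open_questions(card: dict) -> list[str]:
    """List the operator questions this card cannot yet answer."""
    problems = []
    if not card.get("hypothesis"):
        problems.append("missing hypothesis")
    if not card.get("runs"):
        problems.append("no linked runs")
    if card.get("type") not in CARD_TYPES:
        problems.append("unknown card type")
    if card.get("next_action") not in NEXT_ACTIONS:
        problems.append("no valid next action")
    if not card.get("evidence_summary"):
        problems.append("no evidence summary")
    return problems
```

An empty result means the card answers every question without anyone opening raw logs. A non-empty result is a prompt for the agent to finish compressing the attempt, not a reason to block it.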