# Cross-builder paired-run pilot - protocol (v0.1, draft)

The v1.0 → v1.1 gate. The standard's own Q3 falsifier. Three builders, three pairs, one rolling four-week window.

This document is the pilot's contract. It is not the paired-run rubric (`rubric.md`) and not the acceptance instrument (`acceptance-instrument.md`); both live alongside this file and are versioned with it.

**Status:** Proposed. Recruitment opens 2026-06-15. First run window: 2026-07-01 to 2026-07-28.

**Last verified:** not yet · **Re-verify by:** empty (status is not yet Proven)

---

## 1. The question

Does a builder who articulates a VALUE.md before building reach a result the named recipient accepts faster than a builder working from a plain description of the same problem?

The hypothesis the standard carries is **yes**. The pilot exists to either generate the first directional signal or to surface that the recruitment shape itself is the hard part (which is also data).

## 2. Eligibility

A pilot builder is someone who, at intake:

- **Has shipped at least three projects** that aimed at a specific recipient and either landed or didn't. Need enough builder-judgment to recognize "this didn't land" as a category.
- **Is not the author of Keep a Value** and has not co-authored or co-edited the standard's prose. (Self-application disclosure named in Standard.md §Status applies - this pilot is the cure.)
- **Has read the standard once** within the 72 hours before the pilot run. Cold-reading-the-standard-during-the-run conflates two different effects.
- **Owns the problem they bring** - not a contrived problem, not a problem assigned to them for the pilot. The build trap doesn't happen on assigned problems.

Three builders, recruited as a convenience sample. The pilot is directional, not inferential. The recruitment list lives in `recruitment.md` (gitignored - names are private).

## 3. Design

Each builder runs **two passes on the same problem they own**, in a randomized order with a four-week gap between passes:

- **Pass A - VALUE.md condition:** the builder writes a conforming VALUE.md against the standard, passes the six-part gate, then builds against it. Time-to-recipient-accepted-result is logged from the moment Q3 is signed off as Proposed to the moment the named recipient gives the accept verdict.
- **Pass B - description-only condition:** the builder writes a plain-language project description (no Q1/Q2/Q3 structure, no gate) of comparable length, then builds against it. Same time-logging shape.

The same recipient evaluates both passes for the same builder. Recipient is blind to which pass is VALUE.md and which is description-only - they see only the artifact at delivery, not the upstream contract.

**Order randomization:** flip a coin per builder. Three builders × two passes = six runs over the four-week window. The two passes for the same builder are on **different sub-problems within the same project** to prevent the second pass from inheriting the first pass's learning. The sub-problems are chosen by the builder at intake before randomization, locked in writing, and frozen.

## 4. Scoring (placeholder - see `rubric.md`)

The full rubric lives in `rubric.md` and ships when the first run ships (committed together, not in advance - pre-committing a rubric to imagined runs is proxy theater; the recurring-skip ledger in WRAPUP.md names this).

Two headline measurements:

1. **Time-to-acceptance** (clock minutes, from contract sign-off to recipient accept verdict).
2. **Acceptance verdict** (yes / partial / no) from the named recipient against the recipient's own words: "did this do the change for you?"

The acceptance instrument lives in `acceptance-instrument.md`.

## 5. Consent + privacy

- All builders sign a consent form (`consent-template.md`) before intake.
- Builder identity is private - pilot writeups use Builder A, B, C.
- The problem each builder brings may be public (their project, their choice) or anonymized; they choose at intake. Pilot writeup defaults to anonymized unless the builder opts in to attribution.
- Recipient identity is private by default. Recipient quotes are paraphrased unless the recipient signs an attribution waiver.
- Raw timing logs and verdict records live in private storage; aggregated results live in `results.md` in this repo.

## 6. Stop conditions

The pilot is **paused immediately** if:

- A builder reports the protocol is contaminating their actual work (e.g. the description-only pass is producing real harm to their recipient because the description-only condition is structurally weaker).
- The recipient withdraws consent at any point - their pass is excluded from results.
- Both pairs from the same builder cannot complete within the four-week window (deferred to next window, not abandoned).

The pilot is **abandoned and re-designed** if:

- Recruitment cannot fill three slots within six weeks of opening (signal that the recruitment shape is wrong, not that the experiment is wrong).
- The first completed pair shows that the two conditions are not actually distinguishable from the recipient's seat (signal that the contract shape, not the time-to-acceptance, is what differs).

## 7. Reporting

Within two weeks of pilot close, this directory carries:

- `results.md` - three pairs of timing + verdict + recipient quote, aggregate signal.
- `recipient-feedback.md` - verbatim recipient feedback per pass, paraphrased per consent.
- `builder-debrief.md` - each builder's debrief, paraphrased per consent.
- `lessons.md` - what the pilot taught about the standard, the rubric, the protocol itself, and what to change for the next cycle.

The standard's Q3 (Standard.md) gets re-verified against the pilot results. The Status field moves Proposed → Proven or stays Proposed with a sharpened hypothesis.

## 8. What this pilot does NOT do

- It does not establish that VALUE.md is the cause of any observed time difference. n=3 is too small for causal claims; the pilot reports a directional signal.
- It does not test the standard with builders outside the recruitment pool's selection bias (people one degree of separation from the author).
- It does not test whether the standard works across cultures or languages - that is the translation policy's job.
- It does not validate the standard's claim for the stopping-power half of the build trap. Both conditions are clarity-deficit interventions.