# Cross-builder paired-run pilot - protocol (v0.1, draft) The v1.0 → v1.1 gate. The standard's own Q3 falsifier. Three builders, three pairs, one rolling four-week window. This document is the pilot's contract. It is not the paired-run rubric (`rubric.md`) and not the acceptance instrument (`acceptance-instrument.md`); both live alongside this file and are versioned with it. **Status:** Proposed. Recruitment opens 2026-06-15. First run window: 2026-07-01 to 2026-07-28. **Last verified:** not yet · **Re-verify by:** empty (status is not yet Proven) --- ## 1. The question Does a builder who articulates a VALUE.md before building reach a result the named recipient accepts faster than a builder working from a plain description of the same problem? The hypothesis the standard carries is **yes**. The pilot exists to either generate the first directional signal or to surface that the recruitment shape itself is the hard part (which is also data). ## 2. Eligibility A pilot builder is someone who, at intake: - **Has shipped at least three projects** that aimed at a specific recipient and either landed or didn't. Need enough builder-judgment to recognize "this didn't land" as a category. - **Is not the author of Keep a Value** and has not co-authored or co-edited the standard's prose. (Self-application disclosure named in Standard.md §Status applies - this pilot is the cure.) - **Has read the standard once** within the 72 hours before the pilot run. Cold-reading-the-standard-during-the-run conflates two different effects. - **Owns the problem they bring** - not a contrived problem, not a problem assigned to them for the pilot. The build trap doesn't happen on assigned problems. Three builders, recruited as a convenience sample. The pilot is directional, not inferential. The recruitment list lives in `recruitment.md` (gitignored - names are private). ## 3. Design Each builder runs **two passes on the same problem they own**, in a randomized order with a four-week gap between passes: - **Pass A - VALUE.md condition:** the builder writes a conforming VALUE.md against the standard, passes the six-part gate, then builds against it. Time-to-recipient-accepted-result is logged from the moment Q3 is signed off as Proposed to the moment the named recipient gives the accept verdict. - **Pass B - description-only condition:** the builder writes a plain-language project description (no Q1/Q2/Q3 structure, no gate) of comparable length, then builds against it. Same time-logging shape. The same recipient evaluates both passes for the same builder. Recipient is blind to which pass is VALUE.md and which is description-only - they see only the artifact at delivery, not the upstream contract. **Order randomization:** flip a coin per builder. Three builders × two passes = six runs over the four-week window. The two passes for the same builder are on **different sub-problems within the same project** to prevent the second pass from inheriting the first pass's learning. The sub-problems are chosen by the builder at intake before randomization, locked in writing, and frozen. ## 4. Scoring (placeholder - see `rubric.md`) The full rubric lives in `rubric.md` and ships when the first run ships (committed together, not in advance - pre-committing a rubric to imagined runs is proxy theater; the recurring-skip ledger in WRAPUP.md names this). Two headline measurements: 1. **Time-to-acceptance** (clock minutes, from contract sign-off to recipient accept verdict). 2. **Acceptance verdict** (yes / partial / no) from the named recipient against the recipient's own words: "did this do the change for you?" The acceptance instrument lives in `acceptance-instrument.md`. ## 5. Consent + privacy - All builders sign a consent form (`consent-template.md`) before intake. - Builder identity is private - pilot writeups use Builder A, B, C. - The problem each builder brings may be public (their project, their choice) or anonymized; they choose at intake. Pilot writeup defaults to anonymized unless the builder opts in to attribution. - Recipient identity is private by default. Recipient quotes are paraphrased unless the recipient signs an attribution waiver. - Raw timing logs and verdict records live in private storage; aggregated results live in `results.md` in this repo. ## 6. Stop conditions The pilot is **paused immediately** if: - A builder reports the protocol is contaminating their actual work (e.g. the description-only pass is producing real harm to their recipient because the description-only condition is structurally weaker). - The recipient withdraws consent at any point - their pass is excluded from results. - Both pairs from the same builder cannot complete within the four-week window (deferred to next window, not abandoned). The pilot is **abandoned and re-designed** if: - Recruitment cannot fill three slots within six weeks of opening (signal that the recruitment shape is wrong, not that the experiment is wrong). - The first completed pair shows that the two conditions are not actually distinguishable from the recipient's seat (signal that the contract shape, not the time-to-acceptance, is what differs). ## 7. Reporting Within two weeks of pilot close, this directory carries: - `results.md` - three pairs of timing + verdict + recipient quote, aggregate signal. - `recipient-feedback.md` - verbatim recipient feedback per pass, paraphrased per consent. - `builder-debrief.md` - each builder's debrief, paraphrased per consent. - `lessons.md` - what the pilot taught about the standard, the rubric, the protocol itself, and what to change for the next cycle. The standard's Q3 (Standard.md) gets re-verified against the pilot results. The Status field moves Proposed → Proven or stays Proposed with a sharpened hypothesis. ## 8. What this pilot does NOT do - It does not establish that VALUE.md is the cause of any observed time difference. n=3 is too small for causal claims; the pilot reports a directional signal. - It does not test the standard with builders outside the recruitment pool's selection bias (people one degree of separation from the author). - It does not test whether the standard works across cultures or languages - that is the translation policy's job. - It does not validate the standard's claim for the stopping-power half of the build trap. Both conditions are clarity-deficit interventions.