The Keep a Value Runbook

You're here because something is off. Either you've been building for a while and the work isn't landing, or you're about to start something new and you're not sure you're ready. Both are the same threshold. This runbook tells you which side of it you're on, and what to do.

A note for non-software builders. This page uses words like "ship," "build," and "VALUE.md at the project root." Translate freely: "ship" means deliver; "build" means make, write, plan, campaign, teach; "VALUE.md at the project root" means a one-page contract you put where the work lives - a Google Doc, a Notion page, the top of your campaign brief. The standard applies to fundraising speeches, program rollouts, grant proposals, and conference talks as much as to software. Two of the worked examples are not software (see standard/examples/FundraisingSpeech-VALUE.md and standard/examples/IndustrySubmission-VALUE.md).

A note for builders working at LLM cadence. The discipline does not require a file. If you are about to ask an LLM to build something, or you are mid-conversation and you can feel the build trap closing, the conversational deployment is in scope: walk Q1 (who is this for?), Q2 (what changes for them?), Q3 (how would a stranger know it landed?) before the next thing gets generated. No file required. The questions are the discipline; the gate still applies; the output is sometimes a committed VALUE.md, sometimes a sharper prompt that names the recipient and the change. The cadence argument and the author's 10-day self-observation that motivate this are at Standard.md "Why this discipline now" and research/audits/2026-06-17-author-self-observation.md.

It has one verdict: ready to build, or not ready to build. No grades, no scores, no "how foggy are you." Two states. You leave the runbook knowing which one you're in.

If you're ready, the rest is short: pass the Value Statement Test (§0), write the three-sentence lede into the skeleton, fill in the elaboration the skeleton asks for, and go build. If you're not ready, the rest of this page is the walk back.

About "three sentences." The three answers are the lede of your VALUE.md - the journalistic top, the Commander's Intent, the part that has to stand alone if everything else is cut. The skeleton elaborates the lede with the contract's other fields (the falsifier, the check, the status, optional promises, what you won't promise, what you won't break). Three sentences sit at the top; the rest serves them. (Heath & Heath's Made to Stick Ch. 1 calls this concentric circles; journalism calls it the inverted pyramid.)

Read this before you start: you will almost certainly fail the test on first try. That is not the runbook failing. That is the runbook working. The diagnosis is the gift; the rewrite is the work. Plan on two or three honest passes - that's the shape.


0 · The Value Statement Test (the diagnostic you carry)

The Value Statement Test is not just something you do when you're about to write a VALUE.md. It's the diagnostic you carry around with you - the thing you reach for the moment something feels off.

Use it when:

The test, in any of those moments, is two moves. First, the grounding question - the discovery gate that comes before articulation:

"What is the most recent specific moment I observed my recipient doing the thing my project is meant to change?"

A specific moment with a date, a place, and behavior you actually watched. Not a persona you imagine. Not a category of user. If you cannot name an observed moment - not what the recipient said they wanted, not what you assume they need, but what you saw them do - your Value Statement is fiction. Clarity without grounding is still fog, just fog with better sentences. Go observe before you write.

A grounding observation counts when the recipient hits friction, makes a workaround, or abandons the task without prompting. A stated preference, a survey response, or an interview comment does not count. The grounding is "what they did when no one was watching," not "what they said when they were."

Second move, only if the first lands cleanly: try to say your project's Value Statement out loud, in one sentence.

About the bi-directional shape, before you try. The Value Statement names what each party gives the other. Most builders skip the second half. If you can name what you deliver to the recipient but not what they deliver back, your project is half a transaction - and the missing half is usually where the value claim collapses. The "gives back" is not a promise about the recipient's behavior; it names the shape of the exchange (money, attention, data, repeat use) that you observe. Promise Theory still holds.

The shape:

"[My project] gives [the recipient] [the value]; [the recipient] gives back [what they exchange in return]."

If you stall on the second half, pick from this list. Most projects have exactly one of these as the cleanest fit:

If none of these fits, the project may genuinely be one-directional (charity, public good, internal infrastructure) - name that explicitly rather than inventing a give-back.

Say it to an empty room. Say it to a friend. Say it to a rubber duck. The point is to hear yourself say it. (Do not use an LLM to judge whether your Value Statement lands; LLMs generate plausible-sounding feedback by default, and you need genuine incomprehension or genuine recognition from a real listener. Reserve LLMs for structure critique - see standard/stranger-test-prompt.md - not for this articulation test.)

Most builders cannot complete this sentence cleanly the first time. That is not a failure of the builder. That is the test working. The stumble is the diagnosis; the three questions are the cure.

The test works because articulating changes understanding. Subjects who explain to themselves while studying retain and apply material in ways silent readers do not (Chi, Bassok, Lewis, Reimann, Glaser 1989). The folk engineering version is rubber-ducking (Hunt & Thomas 1999). Geoffrey Moore named the same move the Elevator Test (Crossing the Chasm, 1991): if you can't say it in the time an elevator takes, you don't have a thinking defect about communication; you have a communication defect about thinking.

Two worked examples of clean Value Statements:

If you got a clean Value Statement, continue to §1 to write or refresh your VALUE.md. If you stumbled, continue to §1 anyway - the three questions will diagnose where you got stuck.


1 · The check (60 seconds)

Get the discipline into the moment. Three paths:

Open it (or open your mouth). The skeleton has the three questions already in place with placeholders to replace - no comments, no noise. Without checking notes, fill in:

  1. Q1 - Who it's for? The one specific person this is for. Not "users," not "the team." Someone you could describe in one more sentence of past behavior.
  2. Q2 - What changes for them? The change in their behavior or state - what they can do or feel that they couldn't before. Not what you ship.
  3. Q3 - How will you know? What proof would tell you the change landed, and how a stranger could check it without you?

Write each section in plain sentences directly into the skeleton, replacing each <placeholder>. Don't free-style in a separate scratch file. Writing inside the skeleton means the structure is enforcing itself as you write - every section already names its job.

If you want the why behind each section - the failure modes, the named cognitive traps each question prevents - read Template.md once. It's the annotated version, with the rules in the margins. The skeleton is your working file; the template is the teacher.

Once the three are filled in, run the six-part test on what you wrote.

You're ready to build when the Q0 grounding precondition holds and all six gate checks pass. The gate has two tiers: format checks you administer on yourself, and one external check no one can game alone.

If more than one person is building. Each builder fills in Q1-Q3 independently before comparing. Divergence in Q1 (different recipients named) is a team alignment problem this gate does not solve - resolve it before proceeding.

The Q0 precondition plus the six-part gate, in order

Q0 is the grounding precondition - check it first, before the gate. Then run the six gate checks below. Five of the six are self-administered. One - check 2 - requires a real stranger. They all sit in the same numbered list because they all have to hold; the "external" label on 2 tells you which one cannot be faked. We number Q0 alongside checks 1-6 below so the order on the page matches the order in your head.

Any one of the six fails → you're not ready to build. Stop. Read the rest of this page. All six pass → you're ready. Save your VALUE.md at your project root. Close this page. Go build.

Stop condition. If you cannot resolve the failure mode with the walk-back (after the three honest passes described in §3), the gate result is: do not build this yet. This is a legitimate outcome, not a failure of the runbook. The standard exists to make "do not build this yet" a defensible answer.

That's the test. You administer it on yourself; you live with the result.

What a passing file looks like (30-second preview)

Before you walk through what "not ready" looks like, anchor on the shape of a clean one. This is the Pinpoint example - a fictional incident-cause-ranking tool for on-call engineers. Full annotated version at the bottom of this page; the compact preview is here so you can see the shape now, not in 10 minutes.

Value Statement: Pinpoint gives the on-call engineer ranked likely causes within 90 seconds of the page; the engineer gives back the resolution data that sharpens the next page's ranking.

Q1 - Who. The on-call engineer who gets paged at 3am and has to find the cause fast. Not their VP. Not "users."

Q2 - What changes. Before: she opens the alert, navigates to whichever dashboard she most recently used, starts searching. Median time-to-cause: 22 minutes (n=14 observed pages). After: the alert arrives carrying three ranked likely causes, each linked to the specific dashboard or log slice where the evidence lives. Median time-to-cause drops below 5 minutes.

Q3 - How you'll know. Each resolved incident records whether the real cause was in the top three at page time. The named recipient (the engineer who took the page) attests yes or no. Breaks if: the real cause is outside the top three on more than 10% of incidents over any rolling 30-day window. Check: Field - per-incident yes/no, logged in audits/q3-acceptance/YYYY-Q.md.

What we don't break: if engineers stop reading the dashboard themselves and trust the ranking, their ability to debug without the tool on the rare wrong day degrades. Counter-metric: ability to debug without Pinpoint, sampled quarterly.

That's the whole thing. One sentence per question, a measurable falsifier, a counter-metric. The shape is what you're aiming at; the rest of this page is how to get yours into that shape.


2 · What "not ready" looks like, and how to walk back

You're not ready for one of six reasons - one for each check that broke. Each has a different fix. Find which check broke and read that section.

If sentence 1 broke (Who)

You wrote "users," "the team," "developers," "everyone," "the market" - a category, not a person. Or you wrote a specific role but couldn't describe their last frustrating thirty seconds.

The failure. Your brain swapped a hard question ("who is the one person whose life this changes?") for an easier one ("what kind of person might find this useful?"). The swap is invisible to you while it's happening - Daniel Kahneman calls this attribute substitution (Thinking, Fast and Slow, Ch. 9). You won't notice you did it.

The walk back. Write one sentence of past behavior about the recipient. Not who they are - what they were doing the last time they hit the pain your project solves. "An engineer who, last Tuesday at 2am, retried npm test four times hoping the failure would go away." That's a person you could draw as a stick figure doing a specific thing at a specific moment. Chip and Dan Heath call this the concreteness test (Made to Stick, Ch. 3) - "specific people doing specific things."

You'll know you crossed it when you could describe the recipient to a stranger in two sentences and they could draw the stick figure.

If sentence 2 broke (What changes)

You wrote a thing you'll ship - "we'll add export," "the API supports Y," "the page is live." Or you wrote a feeling - "they'll love it." Either way, sentence 2 is not a change in the recipient.

The failure. Output dressed as outcome. If your sentence is true the moment you finish writing the code - regardless of whether anyone uses it differently afterward - it's a feature, not a change. Melissa Perri named the systemic version of this (Escaping the Build Trap) - and Richard Rumelt named the strategic version: "If you fail to identify and analyze the obstacles, you don't have a strategy. Instead, you have either a stretch goal, a budget, or a list of things you wish would happen" (Good Strategy / Bad Strategy, Ch. 3).

The walk back. Write "After this lands, [the person from sentence 1] does or measures ___ differently." Fill the blank with a verb on the recipient's side - an observable behavior or a measurable state.

About "feels." Feelings are not banned, but they're a trap. If your verb is feels, you have a feeling - and a feeling needs an observable proxy: a survey score with a threshold, a stop-doing-X behavior, a self-report on a fixed instrument. Without the proxy, you have the weather, and the standard can't promise the weather. The proxy is what makes the feeling falsifiable. (This is Avedis Donabedian's structure / process / outcome distinction in miniature - Milbank Quarterly 44(3), 1966. Process and outcome are observable; "feeling supported" is neither unless you name how it's measured.)

A test that works: imagine the project gets cancelled mid-build and your team ships something completely different. Does your sentence 2 still describe a real change for the recipient? If yes, it's a change. If no, you wrote a feature.

You'll know you crossed it when sentence 2 starts with the recipient as the subject, not you or your code - and the verb is either observable behavior or measurement against a named proxy.

If sentence 3 broke (How you'd know)

You wrote "we'll know it's working" - vague. Or "it'll feel right" - your judgment, not a stranger's. Or "tests pass" - proves the code, not the value.

The failure. No falsifier means no learning. Stanislas Dehaene's third pillar of learning is error feedback - "whenever we are surprised because the world violates our expectations, error signals spread throughout our brain. They correct our mental models" (How We Learn). No expectation = no surprise possible = no learning. Your project becomes a feedback circuit with the wire cut.

The walk back. Specify the proof so concretely that a coworker with zero context could check it in sixty seconds and tell you yes or no. The recipient decides whether the change happened, not you. Not green tests. Not "looks done." Not your gut.

Three quick tests for sentence 3:

You'll know you crossed it when a stranger could check sentence 3 alone, in sixty seconds, without asking you anything.

Warning - a well-formed proxy is necessary but not sufficient. Sentence 3 can pass the stranger check while measuring something that does not equal real value (the dashboard turns green, the user still doesn't come back). This is proxy theater - see the subject-test walkback below for the full diagnosis. The "what we don't break" section (gate check 6) is the brake that catches it.

If the subject test broke (you promised the weather)

You wrote a promise or a Breaks if whose subject is the recipient's behavior, a third party, or the world. Examples: "the client pays in five days," "investors fund the round," "users adopt the feature." All concrete. All falsifiable. All outside your control.

The failure. You can only promise your own behavior. Promise Theory's autonomy axiom (Mark Burgess, IFIP/IEEE DSOM 2005): an agent can only make promises about what it will do. A promise about another agent's behavior is malformed by construction - the formalism rejects it. The standard's own rule restates this: "don't promise the weather."

The walk back. Move the recipient's behavior into a proxy you observe and ship. Instead of "the client pays," write "the invoice file is delivered with the correct fields by the fifth business day." Instead of "users adopt," write "the onboarding flow takes a new user from sign-up to first action in under three minutes." The first is the weather; the second is something you do.

Warning - proxies can drift, and proxies can become theater. A clean proxy can pass the subject test and still drift from the real outcome it's standing in for. The onboarding flow runs in under three minutes - and the user still doesn't come back tomorrow. The invoice ships on time - and the client still doesn't pay. The proxy passing is necessary but not sufficient. Worse, the more elegantly instrumented the proxy is, the easier it is to mistake the green dashboard for the real win - this is proxy theater: the check passes, the recipient still does not change, and the team congratulates itself because the doc said the threshold was honest. The "what we don't break" check (next section) is what catches both drift and theater: name the counter-metric that would tell you the proxy lied, and choose that counter-metric so it cannot be gamed by the same instrumentation that produced the proxy.

You'll know you crossed it when every promise and every Breaks if has a subject you ship - not a behavior you hope for - AND you have named what would have to be true outside the proxy for the proxy to still be measuring real value.

If the side-effect check broke (you didn't name what you'd break)

You named a headline outcome and stopped there. Example: "the PM saves 2 hours/week" (named outcome) while "the auto-generated docs become hallucinated garbage" (unstated side effect).

The failure. When a measure becomes a target, it ceases to be a good measure (Charles Goodhart 1975; Marilyn Strathern 1997). A project can hit its named outcome while making the broader system worse. Robert K. Merton called this "imperious immediacy of interest" (American Sociological Review 1(6), 1936): the named goal overrides the system-level costs you didn't measure because no one made you. The fix is to make yourself.

The walk back. Add one bullet to your What we don't break section: the single thing that gets worse, or could get worse, if you optimize hard for the headline outcome - and the threshold that would prove it. Examples:

You'll know you crossed it when a sympathetic skeptic, reading both the outcome and the don't break line, could not name a third side-effect you missed.


3 · After "not ready"

You walked back, you rewrote. Now run the check again. Hedges, stranger, read-aloud, one recipient, subject test, what-you-don't-break. Same bar.

If all six pass - you're ready. Save your VALUE.md and go build.

If something still fails - you found a real problem. That is not a failing of the runbook. That is the runbook working. Most projects in fog stay in fog because nobody made them pass six honest checks. You just did. The wall isn't the runbook. The wall is the part of the project that wasn't ready, and now you can see it.

You get three honest passes. If the file still fails on the fourth pass, the project itself is not ready - that is the real diagnosis. Stop. Pick a different problem, or come back to this one when you can answer the failing check without hedging. Polishing a draft past pass three is the runbook's own version of the build trap.

Two honest moves before pass three:


4 · Once you've shipped your VALUE.md

The check at the top of this page is for before you build. The discipline keeps working while you build.


4.5 · The Status lifecycle

Every Q3 check and every promise carries one of four states. They are not labels you pick by feel; each has a concrete trigger that moves you to the next.

The bar to move from Proposed to Promised is the most common stuck point. Builders write Proposed claims and never wire the check, then ship anyway. The runbook's discipline: if a promise has been Proposed for more than two cycles (two weeks of weekly re-reading, two months of monthly re-reading - pick your cadence and hold it), either wire the check or rewrite the claim down to something you can check today.

A Proven claim that has not been re-verified for an agreed window (typically one to four quarters depending on volatility) drops back to Promised, not Broken. The check still exists; the evidence is just stale.


5 · A worked example

Pinpoint (clean, fictional)

Three sentences. No hedges. A stranger gets it. Read aloud without footnotes. Ready.


6 · Next time

You don't need to re-read this page on your next project. You ran the check; you know the shape now. Open the skeleton. Three sentences for the lede, the rest as the skeleton asks. Six checks. Ready or not ready.

If a year from now you still can't run the check unaided - the runbook failed you. Tell us. That's how we know.


This runbook is part of Keep a Value. Standard at the root.