The Keep a Value Runbook

You're here because something is off. Either you've been building for a while and the work isn't landing, or you're about to start something new and you're not sure you're ready. Both are the same threshold. This runbook tells you which side of it you're on, and what to do.

A note for non-software builders. This page uses words like "ship," "build," and "VALUE.md at the project root." Translate freely: "ship" means deliver; "build" means make, write, plan, campaign, teach; "VALUE.md at the project root" means a one-page contract you put where the work lives - a Google Doc, a Notion page, the top of your campaign brief. The standard applies to fundraising speeches, program rollouts, grant proposals, and conference talks as much as to software. Two of the worked examples are not software (see standard/examples/FundraisingSpeech-VALUE.md and standard/examples/IndustrySubmission-VALUE.md).

A note for builders working at LLM cadence. The discipline does not require a file. If you are about to ask an LLM to build something, or you are mid-conversation and you can feel the build trap closing, the conversational deployment is in scope: walk Q1 (who is this for?), Q2 (what changes for them?), Q3 (how would a stranger know it landed?) before the next thing gets generated. No file required. The questions are the discipline; the gate still applies; the output is sometimes a committed VALUE.md, sometimes a sharper prompt that names the recipient and the change. The cadence argument and the author's 10-day self-observation that motivate this are at Standard.md "Why this discipline now" and research/audits/2026-06-17-author-self-observation.md.

The runbook has one verdict: ready to build, or not ready to build. No grades, no scores, no "how foggy are you." Two states. You leave the runbook knowing which one you're in.

If you're ready, the rest is short: pass the Value Statement Test (§0), write the three-sentence lede into the skeleton, fill in the elaboration the skeleton asks for, and go build. If you're not ready, the rest of this page is the walk back.

About "three sentences." The three answers are the lede of your VALUE.md - the journalistic top, the Commander's Intent, the part that has to stand alone if everything else is cut. The skeleton elaborates the lede with the contract's other fields (the falsifier, the check, the status, optional promises, what you won't promise, what you won't break). Three sentences sit at the top; the rest serves them. (Heath & Heath's Made to Stick Ch. 1 calls this concentric circles; journalism calls it the inverted pyramid.)

Read this before you start: you will almost certainly fail the test on first try. That is not the runbook failing. That is the runbook working. The diagnosis is the gift; the rewrite is the work. Plan on two or three honest passes - that's the shape.

0 · The Value Statement Test (the diagnostic you carry)

The Value Statement Test is not just something you do when you're about to write a VALUE.md. It's the diagnostic you carry around with you - the thing you reach for the moment something feels off.

Use it when:

You feel build fatigue and you can't name why.
You're about to start something new and you're not sure how to state it.
A meeting just ended and the team agreed on a plan you couldn't repeat back if asked.
An LLM is producing output for you and you can't tell if any of it matters.
You're three weeks into a project and the work has stopped landing.

The test, in any of those moments, is two moves. First, the grounding question - the discovery gate that comes before articulation:

"What is the most recent specific moment I observed my recipient doing the thing my project is meant to change?"

A specific moment with a date, a place, and behavior you actually watched. Not a persona you imagine. Not a category of user. If you cannot name an observed moment - not what the recipient said they wanted, not what you assume they need, but what you saw them do - your Value Statement is fiction. Clarity without grounding is still fog, just fog with better sentences. Go observe before you write.

A grounding observation counts when the recipient hits friction, makes a workaround, or abandons the task without prompting. A stated preference, a survey response, or an interview comment does not count. The grounding is "what they did when no one was watching," not "what they said when they were."

Second move, only if the first lands cleanly: try to say your project's Value Statement out loud, in one sentence.

About the bi-directional shape, before you try. The Value Statement names what each party gives the other. Most builders skip the second half. If you can name what you deliver to the recipient but not what they deliver back, your project is half a transaction - and the missing half is usually where the value claim collapses. The "gives back" is not a promise about the recipient's behavior; it names the shape of the exchange (money, attention, data, repeat use) that you observe. Promise Theory still holds.

The shape:

"[My project] gives [the recipient] [the value]; [the recipient] gives back [what they exchange in return]."

If you stall on the second half, pick from this list. Most projects have exactly one of these as the cleanest fit:

Money - payment, subscription, contract value, donation.
Attention - time spent, return visits, engagement.
Feedback - issue reports, survey responses, written critique.
Repeat use - the act of coming back tomorrow.
Advocacy - referrals, public mentions, recommendations.
Resolution data - the artifact of their use that improves the next iteration (the on-call engineer's incident notes; the donor's pledge-card response).
Permission - consent to keep working, renewed budget, extended runway, board sign-off.

If none of these fits, the project may genuinely be one-directional (charity, public good, internal infrastructure) - name that explicitly rather than inventing a give-back.

Say it to an empty room. Say it to a friend. Say it to a rubber duck. The point is to hear yourself say it. (Do not use an LLM to judge whether your Value Statement lands; LLMs generate plausible-sounding feedback by default, and you need genuine incomprehension or genuine recognition from a real listener. Reserve LLMs for structure critique - see standard/stranger-test-prompt.md - not for this articulation test.)

If it comes out cleanly: you are closer to ready than you think. If you haven't written a VALUE.md yet, open the skeleton and write the three questions; they will go fast. If you're already mid-project, the fatigue you felt was not about the work - it was about something else (energy, scope creep, a stuck stakeholder), and the work itself is on track.
If you stumble - hedging, restarting, "well, kind of," running out of breath halfway: the stumble is the diagnosis. You are not in the build trap because you lack discipline; you are in the build trap because you cannot say what you are building. The three questions below are how you walk back to the sentence you couldn't say. If you already have a VALUE.md, this is the signal to re-read it - the project has drifted from what you wrote down.

Most builders cannot complete this sentence cleanly the first time. That is not a failure of the builder. That is the test working. The stumble is the diagnosis; the three questions are the cure.

The test works because articulating changes understanding. Subjects who explain to themselves while studying retain and apply material in ways silent readers do not (Chi, Bassok, Lewis, Reimann, Glaser 1989). The folk engineering version is rubber-ducking (Hunt & Thomas 1999). Geoffrey Moore named the same move the Elevator Test (Crossing the Chasm, 1991): if you can't say it in the time an elevator ride takes, the defect is in the thinking, not in the delivery.

Two worked examples of clean Value Statements:

"Pinpoint gives the on-call engineer ranked likely causes within 90 seconds of the page; the engineer gives back the resolution data that sharpens the next page's ranking."
"This runbook gives the builder a 30-second test plus six walk-back sections; the builder gives back honest answers to questions they didn't think to ask themselves."

If you got a clean Value Statement, continue to §1 to write or refresh your VALUE.md. If you stumbled, continue to §1 anyway - the three questions will diagnose where you got stuck.

1 · The check (60 seconds)

Get the discipline into the moment. Three paths:

Reading this on the web: copy the skeleton from standard/Skeleton.md (or use the "Copy skeleton" button on the index page) and paste it into a new file called VALUE.md wherever your project lives - the root of your repo, the top of your campaign brief, a Google Doc, a Notion page, the head of your speech outline.
Reading this with the repo cloned: cp standard/Skeleton.md /path/to/your/project/VALUE.md.
Running at LLM cadence (no file): answer the three questions out loud, in the chat, or silently to yourself, before the next thing gets generated. The gate items still apply. Skip to the gate enumeration below; come back to the file paths if and when an artifact warrants a committed contract.

Open it (or open your mouth). The skeleton has the three questions already in place with placeholders to replace - no comments, no noise. Without checking notes, fill in:

Q1 - Who is it for? The one specific person this is for. Not "users," not "the team." Someone you could describe in one more sentence of past behavior.
Q2 - What changes for them? The change in their behavior or state - what they can do or feel that they couldn't before. Not what you ship.
Q3 - How will you know? What proof would tell you the change landed, and how could a stranger check it without you?

Write each section in plain sentences directly into the skeleton, replacing each <placeholder>. Don't free-style in a separate scratch file. Writing inside the skeleton means the structure is enforcing itself as you write - every section already names its job.

If you want the why behind each section - the failure modes, the named cognitive traps each question prevents - read Template.md once. It's the annotated version, with the rules in the margins. The skeleton is your working file; the template is the teacher.

Once the three are filled in, run the six-part test on what you wrote.

You're ready to build when the Q0 grounding precondition holds and all six gate checks pass. The gate has two tiers: format checks you administer on yourself, and one external check no one can game alone.

If more than one person is building. Each builder fills in Q1-Q3 independently before comparing. Divergence in Q1 (different recipients named) is a team alignment problem this gate does not solve - resolve it before proceeding.

The Q0 precondition plus the six-part gate, in order

Q0 is the grounding precondition - check it first, before the gate. Then run the six gate checks below. Five of the six are self-administered. One - check 2 - requires a real stranger. They all sit in the same numbered list because they all have to hold; the "external" label on 2 tells you which one cannot be faked. We number Q0 alongside checks 1-6 below so the order on the page matches the order in your head.

0. Q0 grounded (self - precondition). You can name when (date), where (setting), and how long (duration) you observed your recipient doing the thing your project is meant to change. If you cannot, the gate auto-fails: the three sentences below will be a fiction you can polish past every other check. Go observe before you write.
1. No hedge-words (self). No "maybe," "I think," "kind of," "somewhat," "possibly," "ish." A friend reading your three sentences couldn't circle a single one.
2. A stranger gets it (external - cannot be self-administered). Get the three sentences in front of someone who isn't on the project. The other checks can all pass on a sentence that means nothing outside your head; this is the one that catches that.

Open Slack, iMessage, or email. Paste this to one real person who is not on the project:

"Please read these three sentences with no context and answer in your own words: (1) who is this for? (2) what changes for them? (3) how would you know it happened? If you can't answer, say what was unclear."

The ladder, in order of fidelity:
- Best: text or hand them to one real person, no context, and ask: "who is this for, what changes, how would you know?" If they can't answer in their own words, the file isn't ready. (Kahneman & Klein 2009 call this adversarial collaboration.) Then ask: "what would you tell this builder to do next, and what would you tell them to stop doing?" - if they cannot answer that, the check failed even if they nodded along. Polite "yes, I get it" is the most common false positive on this check; asking for a directive is the cheapest way to surface whether the three sentences produced a clear next move for someone outside the project.
- Acceptable: a trusted colleague who is not on the project. A real human who can genuinely fail to understand. (Hunt & Thomas 1999, The Pragmatic Programmer: rubber-ducking - but with a person, not an LLM. LLMs trained to be helpful will generate plausible-sounding feedback rather than the genuine incomprehension a real stranger provides.)
- Last resort, no real stranger available right now: use the stranger-test prompt at standard/stranger-test-prompt.md. Paste your VALUE.md after the prompt, send it to Claude or ChatGPT or Gemini, and read the three findings it returns. This is structure critique, not the articulation test §0 warned against - they are different jobs. The §0 warning is about whether you can say your Value Statement out loud and have a real listener react to it; that requires genuine incomprehension or genuine recognition, which an LLM cannot fake. The stranger-test prompt here asks the LLM to scan a finished document for specific structural failures (subject-test, recipient drift, unfalsifiable Q3) - things an LLM is good at - and produce findings in a fixed format you can act on. A cold-read LLM is still a weaker stranger than a real human, and it will sometimes produce plausible-sounding feedback, but it catches structural problems more reliably than your own walkthrough. Log the result, then find a real stranger before the next cycle.
- Truly last resort (auto-fail flag): walk through the sentences as if you were the named recipient. At each sentence, ask: what would I do next? where would I get stuck? (Wharton et al. 1994: cognitive walkthrough.) Using this option means the check auto-fails - log the cognitive walkthrough result, then find a real stranger or run the stranger-test prompt before building.
3. You can read it out loud without adding footnotes (self). No "well, technically…" No silent caveats. The sentences stand alone.
4. One recipient across all three sentences (self). Highlight every name or role in your three sentences. If the who shifts (Natan in S1, the donors in S2, the audience in S3), you have two projects in one file. Either pick one as the headline recipient and move the others into separate promises, or split into two VALUE.md files.
5. The subject test (self). Highlight the grammatical subject of every promise and every Breaks if: line. The subject must be you (the builder), or something you ship. If the subject is the recipient's behavior ("the client pays"), a third party ("investors agree"), or the world ("the market opens up") - the promise is malformed. You can only promise your own behavior. (Mark Burgess, Promise Theory, 2005: the autonomy axiom.) Rewrite until the subject is yours, or move the recipient's behavior into a proxy you observe - "the invoice file is delivered with X fields by Y date" (yours) instead of "the client pays" (not yours).
6. You named what you don't break (self). For your headline outcome, name one thing in the system that gets worse, or could get worse, if you optimize for this outcome. Write it in the What we don't break section. If you can't name one, you haven't finished thinking. (Robert K. Merton 1936, American Sociological Review 1(6): "the imperious immediacy of interest" - the named outcome overrides system-level costs you didn't measure. Charles Goodhart 1975, Marilyn Strathern 1997: when a measure becomes a target, it ceases to be a good measure.)
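Check 1 is the only gate check with a mechanical first pass. A minimal sketch in Python, assuming your three sentences are available as plain text: the function name is made up for illustration, and the word list is simply the one check 1 names. A clean scan does not replace the friend with the pen; it only catches the obvious hedges before they read it.

```python
import re

# The hedge-words listed in gate check 1; extend the tuple with your own habits.
HEDGE_WORDS = ("maybe", "i think", "kind of", "somewhat", "possibly", "ish")


def find_hedges(text: str) -> list[str]:
    """Return every hedge-word found in text, in order of appearance."""
    lowered = text.lower()
    hits = []
    for word in HEDGE_WORDS:
        # Whole-word match, so "ish" does not fire on "finish" or "publish".
        for match in re.finditer(rf"\b{re.escape(word)}\b", lowered):
            hits.append((match.start(), word))
    return [word for _, word in sorted(hits)]


if __name__ == "__main__":
    draft = "Pinpoint maybe gives the on-call engineer, I think, ranked likely causes."
    print(find_hedges(draft))
```

Because the match is whole-word, a suffix glued to another word ("fiveish") slips through. That is a deliberate trade against false alarms; the read-aloud check is where those get caught.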

Any one of the six fails → you're not ready to build. Stop. Read the rest of this page. All six pass → you're ready. Save your VALUE.md at your project root. Close this page. Go build.
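If it helps to see the verdict rule as logic, here is a small Python sketch. It is an illustration, not part of the standard: the class and field names are invented here, and every value is a judgment that you (or, for check 2, your stranger) supply. Nothing in it automates a check.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class GateResult:
    """Q0 precondition plus the six gate checks, in the runbook's order."""
    q0_grounded: bool                    # 0: you observed the recipient (date, place, duration)
    no_hedge_words: bool                 # 1: self
    stranger_gets_it: bool               # 2: external - needs a real stranger
    reads_aloud_without_footnotes: bool  # 3: self
    one_recipient: bool                  # 4: self
    subject_test: bool                   # 5: self
    named_what_you_dont_break: bool      # 6: self


def failed_checks(result: GateResult) -> list[str]:
    """Names of every check that did not hold, in gate order."""
    return [f.name for f in fields(result) if not getattr(result, f.name)]


def verdict(result: GateResult) -> str:
    """Two states only: any single failure means not ready."""
    return "not ready to build" if failed_checks(result) else "ready to build"


if __name__ == "__main__":
    mine = GateResult(True, True, False, True, True, True, True)
    print(verdict(mine), failed_checks(mine))
```

The list of failed checks is the useful part: it tells you which walk-back section to read.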

Stop condition. If you cannot resolve the failure mode with the walk-back (after the three honest passes described in §3), the gate result is: do not build this yet. This is a legitimate outcome, not a failure of the runbook. The standard exists to make "do not build this yet" a defensible answer.

That's the test. You administer it on yourself; you live with the result.

What a passing file looks like (30-second preview)

Before you walk through what "not ready" looks like, anchor on the shape of a clean one. This is the Pinpoint example - a fictional incident-cause-ranking tool for on-call engineers. Full annotated version at the bottom of this page; the compact preview is here so you can see the shape now, not in 10 minutes.

Value Statement: Pinpoint gives the on-call engineer ranked likely causes within 90 seconds of the page; the engineer gives back the resolution data that sharpens the next page's ranking.

Q1 - Who. The on-call engineer who gets paged at 3am and has to find the cause fast. Not their VP. Not "users."

Q2 - What changes. Before: she opens the alert, navigates to whichever dashboard she most recently used, starts searching. Median time-to-cause: 22 minutes (n=14 observed pages). After: the alert arrives carrying three ranked likely causes, each linked to the specific dashboard or log slice where the evidence lives. Median time-to-cause drops below 5 minutes.

Q3 - How you'll know. Each resolved incident records whether the real cause was in the top three at page time. The named recipient (the engineer who took the page) attests yes or no. Breaks if: the real cause is outside the top three on more than 10% of incidents over any rolling 30-day window. Check: Field - per-incident yes/no, logged in audits/q3-acceptance/YYYY-Q.md.

What we don't break: if engineers stop reading the dashboard themselves and trust the ranking, their ability to debug without the tool on the rare wrong day degrades. Counter-metric: ability to debug without Pinpoint, sampled quarterly.

That's the whole thing. One sentence per question, a measurable falsifier, a counter-metric. The shape is what you're aiming at; the rest of this page is how to get yours into that shape.
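To make "measurable falsifier" concrete, the Pinpoint Breaks if line can be sketched in Python. Pinpoint is fictional, so everything here is hypothetical: the function names, the record shape (resolution date plus the engineer's yes/no attestation), and the choice to evaluate the trailing window as of a given day. Run on a schedule, that is one reasonable reading of "any rolling 30-day window," not the only one.

```python
from datetime import date, timedelta


def miss_rate(incidents, as_of, window_days=30):
    """Share of incidents in the trailing window whose real cause was NOT in the top three.

    incidents: iterable of (resolved_on: date, cause_in_top_three: bool).
    Returns None when the window holds no incidents (nothing to judge yet).
    """
    start = as_of - timedelta(days=window_days)
    in_window = [hit for day, hit in incidents if start < day <= as_of]
    if not in_window:
        return None
    misses = sum(1 for hit in in_window if not hit)
    return misses / len(in_window)


def q3_broken(incidents, as_of, window_days=30, threshold=0.10):
    """True when the miss rate in the trailing window exceeds the threshold."""
    rate = miss_rate(incidents, as_of, window_days)
    return rate is not None and rate > threshold


if __name__ == "__main__":
    # Hypothetical log: 20 incidents in June, the attested cause missed on three of them.
    log = [(date(2025, 6, d), d not in (4, 11, 17)) for d in range(1, 21)]
    print(q3_broken(log, as_of=date(2025, 6, 30)))
```

Note what the sketch makes visible: in a quiet month, one miss among a handful of incidents crosses 10%. Whether that counts as Broken is a decision to write into the file, not one to leave to the script.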

2 · What "not ready" looks like, and how to walk back

You're not ready because at least one check broke. Each break has a different fix. Find the section below that matches what broke and read it.

If sentence 1 broke (Who)

You wrote "users," "the team," "developers," "everyone," "the market" - a category, not a person. Or you wrote a specific role but couldn't describe their last frustrating thirty seconds.

The failure. Your brain swapped a hard question ("who is the one person whose life this changes?") for an easier one ("what kind of person might find this useful?"). The swap is invisible to you while it's happening - Daniel Kahneman calls this attribute substitution (Thinking, Fast and Slow, Ch. 9). You won't notice you did it.

The walk back. Write one sentence of past behavior about the recipient. Not who they are - what they were doing the last time they hit the pain your project solves. "An engineer who, last Tuesday at 2am, retried npm test four times hoping the failure would go away." That's a person you could draw as a stick figure doing a specific thing at a specific moment. Chip and Dan Heath call this the concreteness test (Made to Stick, Ch. 3) - "specific people doing specific things."

You'll know you crossed it when you could describe the recipient to a stranger in two sentences and they could draw the stick figure.

If sentence 2 broke (What changes)

You wrote a thing you'll ship - "we'll add export," "the API supports Y," "the page is live." Or you wrote a feeling - "they'll love it." Either way, sentence 2 is not a change in the recipient.

The failure. Output dressed as outcome. If your sentence is true the moment you finish writing the code - regardless of whether anyone uses it differently afterward - it's a feature, not a change. Melissa Perri named the systemic version of this (Escaping the Build Trap) - and Richard Rumelt named the strategic version: "If you fail to identify and analyze the obstacles, you don't have a strategy. Instead, you have either a stretch goal, a budget, or a list of things you wish would happen" (Good Strategy / Bad Strategy, Ch. 3).

The walk back. Write "After this lands, [the person from sentence 1] does or measures ___ differently." Fill the blank with a verb on the recipient's side - an observable behavior or a measurable state.

About "feels." Feelings are not banned, but they're a trap. If your verb is feels, you have a feeling - and a feeling needs an observable proxy: a survey score with a threshold, a stop-doing-X behavior, a self-report on a fixed instrument. Without the proxy, you have the weather, and the standard can't promise the weather. The proxy is what makes the feeling falsifiable. (This is Avedis Donabedian's structure / process / outcome distinction in miniature - Milbank Quarterly 44(3), 1966. Process and outcome are observable; "feeling supported" is neither unless you name how it's measured.)

A test that works: imagine the project gets cancelled mid-build and your team ships something completely different. Does your sentence 2 still describe a real change for the recipient? If yes, it's a change. If no, you wrote a feature.

You'll know you crossed it when sentence 2 starts with the recipient as the subject, not you or your code - and the verb is either observable behavior or measurement against a named proxy.

If sentence 3 broke (How you'd know)

You wrote "we'll know it's working" - vague. Or "it'll feel right" - your judgment, not a stranger's. Or "tests pass" - proves the code, not the value.

The failure. No falsifier means no learning. Stanislas Dehaene's third pillar of learning is error feedback - "whenever we are surprised because the world violates our expectations, error signals spread throughout our brain. They correct our mental models" (How We Learn). No expectation = no surprise possible = no learning. Your project becomes a feedback circuit with the wire cut.

The walk back. Specify the proof so concretely that a coworker with zero context could check it in sixty seconds and tell you yes or no. The recipient decides whether the change happened, not you. Not green tests. Not "looks done." Not your gut.

Three quick tests for sentence 3:

Pointing-and-Calling. Could you point at the thing? "Conversion rate" - point at the chart. "User happiness" - fail. (James Clear, Atomic Habits Ch. 4, citing the Japanese rail safety system.)
Number over a window. Not "fast," but "under five minutes, median, over thirty days." A number and a clock make a promise breakable and checkable.
Question form. Rephrase as a question. "Engineers retry less" becomes "Will engineers retry npm test less than 2× per failure?" If you can't pose it as a question with a yes/no answer, sentence 3 isn't proof yet.
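The second and third tests combine naturally: a number over a window, posed as a yes/no question. A minimal Python sketch, assuming you have dated measurements in minutes; the function name and the sample shape are illustrative, and the defaults are the example figures from the test above.

```python
from datetime import date, timedelta
from statistics import median


def median_under(samples, as_of, limit_minutes=5.0, window_days=30):
    """Answer: "Is the median under the limit over the trailing window?"

    samples: iterable of (measured_on: date, minutes: float).
    Returns True or False, or None when the window is empty
    (no observations means the question has no answer yet).
    """
    start = as_of - timedelta(days=window_days)
    values = [minutes for day, minutes in samples if start < day <= as_of]
    if not values:
        return None
    return median(values) < limit_minutes


if __name__ == "__main__":
    # Hypothetical measurements for illustration only.
    observed = [(date(2025, 6, 3), 3.0), (date(2025, 6, 9), 4.0), (date(2025, 6, 21), 9.0)]
    print(median_under(observed, as_of=date(2025, 6, 30)))
```

The None case matters: an empty window is not a pass. It is the "proof has been unobservable" condition, and it should be logged as such.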

You'll know you crossed it when a stranger could check sentence 3 alone, in sixty seconds, without asking you anything.

Warning - a well-formed proxy is necessary but not sufficient. Sentence 3 can pass the stranger check while measuring something that does not equal real value (the dashboard turns green, the user still doesn't come back). This is proxy theater - see the subject-test walkback below for the full diagnosis. The "what we don't break" section (gate check 6) is the brake that catches it.

If the subject test broke (you promised the weather)

You wrote a promise or a Breaks if whose subject is the recipient's behavior, a third party, or the world. Examples: "the client pays in five days," "investors fund the round," "users adopt the feature." All concrete. All falsifiable. All outside your control.

The failure. You can only promise your own behavior. Promise Theory's autonomy axiom (Mark Burgess, IFIP/IEEE DSOM 2005): an agent can only make promises about what it will do. A promise about another agent's behavior is malformed by construction - the formalism rejects it. The standard's own rule restates this: "don't promise the weather."

The walk back. Move the recipient's behavior into a proxy you observe and ship. Instead of "the client pays," write "the invoice file is delivered with the correct fields by the fifth business day." Instead of "users adopt," write "the onboarding flow takes a new user from sign-up to first action in under three minutes." The first is the weather; the second is something you do.

Warning - proxies can drift, and proxies can become theater. A clean proxy can pass the subject test and still drift from the real outcome it's standing in for. The onboarding flow runs in under three minutes - and the user still doesn't come back tomorrow. The invoice ships on time - and the client still doesn't pay. The proxy passing is necessary but not sufficient. Worse, the more elegantly instrumented the proxy is, the easier it is to mistake the green dashboard for the real win - this is proxy theater: the check passes, the recipient still does not change, and the team congratulates itself because the doc said the threshold was honest. The "what we don't break" check (next section) is what catches both drift and theater: name the counter-metric that would tell you the proxy lied, and choose that counter-metric so it cannot be gamed by the same instrumentation that produced the proxy.

You'll know you crossed it when every promise and every Breaks if has a subject you ship - not a behavior you hope for - AND you have named what would have to be true outside the proxy for the proxy to still be measuring real value.

If the side-effect check broke (you didn't name what you'd break)

You named a headline outcome and stopped there. Example: "the PM saves 2 hours/week" (named outcome) while "the auto-generated docs become hallucinated garbage" (unstated side effect).

The failure. When a measure becomes a target, it ceases to be a good measure (Charles Goodhart 1975; Marilyn Strathern 1997). A project can hit its named outcome while making the broader system worse. Robert K. Merton called this "imperious immediacy of interest" (American Sociological Review 1(6), 1936): the named goal overrides the system-level costs you didn't measure because no one made you. The fix is to make yourself measure them.

The walk back. Add one bullet to your What we don't break section: the single thing that gets worse, or could get worse, if you optimize hard for the headline outcome - and the threshold that would prove it. Examples:

Outcome: "PM saves 2 hours/week." Don't break: "if the auto-generated wiki pages fail a manual accuracy spot-check on a 10-page sample, the project's value is negative; we re-evaluate."
Outcome: "engineer ships PRs in half the time." Don't break: "if the rolling 30-day defect rate doubles, we're not faster - we're moving the cost downstream."

You'll know you crossed it when a sympathetic skeptic, reading both the outcome and the don't break line, could not name a third side-effect you missed.

3 · After "not ready"

You walked back, you rewrote. Now run the check again. Hedges, stranger, read-aloud, one recipient, subject test, what-you-don't-break. Same bar.

If all six pass - you're ready. Save your VALUE.md and go build.

If something still fails - you found a real problem. That is not a failing of the runbook. That is the runbook working. Most projects in fog stay in fog because nobody made them pass six honest checks. You just did. The wall isn't the runbook. The wall is the part of the project that wasn't ready, and now you can see it.

You get three honest passes. If the file still fails after the third pass, the project itself is not ready - that is the real diagnosis. Stop. Pick a different problem, or come back to this one when you can answer the failing check without hedging. Polishing a draft past pass three is the runbook's own version of the build trap.

Two honest moves before pass three:

Rewrite once more. Sometimes the third pass is the one that lands. The act of writing the sentences is the act of finding what you actually meant. Karl Weick: "How can I know what I think until I see what I say?"
Look at a working example. Pattern-match against the Keep a Value standard's own contract or the Pinpoint example below. See the shape of a passing file, then ask: where does mine break that shape?

4 · Once you've shipped your VALUE.md

The check at the top of this page is for before you build. The discipline keeps working while you build.

Start every work week by re-reading your VALUE.md. Ask: "Is the work I'm about to do this week still pointed at the recipient, the change, and the proof I wrote down?" If yes - build. If something shifted - edit the file before you write code. The file is not sacred. It's a tool you keep sharp.
If you read it back and acted on autopilot anyway, that is also data. The document worked as a mirror and you looked away. This is the failure mode the runbook cannot prevent by itself; naming it is part of using the standard honestly.
End every work week with one honest question. "What did I do this week that moved sentence 3? What did I do that didn't?" The honest answer is the input to next week. (James Clear's three reflection questions in Atomic Habits Ch. 20 are the form; one is enough.)
Kill the project (or shrink scope until a smaller proof becomes observable) when the proof has been unobservable for two cycles AND you can't write a smaller, sooner proof. That's the kill criterion. Editing the file is normal. Killing the project is when the edit would require lying about what's happening. Do NOT "kill the file but keep building" - that is the build trap with a paperwork waiver. If the rewritten Q3 no longer describes the Q2 change for the Q1 recipient, it is not a smaller proof - it is a different project; start a new VALUE.md. This closes the meta-loophole where a builder shrinks Q3 indefinitely while still calling it the same project.

4.5 · The Status lifecycle

Every Q3 check and every promise carries one of four states. They are not labels you pick by feel; each has a concrete trigger that moves you to the next.

Proposed. The default. You have written the claim. You have not yet wired the check that would catch it failing. Most claims start here and stay here until you build the check. "We will know X by measuring Y" is Proposed if Y is not yet being measured.
Promised. Move here when the check is in place and would fire if the claim failed. The script runs. The dashboard exists. The survey is scheduled. The audit cadence is on the calendar. You are now exposed - a failure can be seen, by you and by a stranger.
Proven. Move here when the check has fired positive at least once on real conditions (not a dry run, not a synthetic test). The change happened, the recipient said so, the proxy crossed the threshold. Proven decays. Treat it as "verified at date X under conditions Y" - a re-verification cadence is part of staying Proven.
Broken. Move here the moment the check fires negative on real conditions. Loud. Visible. Not soft-pedaled. A Broken claim is not a failure of the standard; it is the standard doing its job. The next move is either repair (with a new check) or honest withdrawal.

The bar to move from Proposed to Promised is the most common stuck point. Builders write Proposed claims and never wire the check, then ship anyway. The runbook's discipline: if a promise has been Proposed for more than two cycles (two weeks of weekly re-reading, two months of monthly re-reading - pick your cadence and hold it), either wire the check or rewrite the claim down to something you can check today.

A Proven claim that has not been re-verified for an agreed window (typically one to four quarters depending on volatility) drops back to Promised, not Broken. The check still exists; the evidence is just stale.
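The lifecycle is a small state machine, and writing it out shows which moves exist and which do not. A sketch in Python under stated assumptions: the event names are invented here to stand for the triggers described above, and the sketch leaves repair of a Broken claim unmodelled, because the runbook says repair needs a new check but does not name the state the repaired claim lands in.

```python
from enum import Enum


class Status(Enum):
    PROPOSED = "Proposed"
    PROMISED = "Promised"
    PROVEN = "Proven"
    BROKEN = "Broken"


class Event(Enum):
    CHECK_WIRED = "check is in place and would fire on failure"
    CHECK_FIRED_POSITIVE = "check fired positive on real conditions"
    CHECK_FIRED_NEGATIVE = "check fired negative on real conditions"
    EVIDENCE_STALE = "re-verification window lapsed"


# Only the moves the lifecycle describes. Anything else is rejected.
TRANSITIONS = {
    (Status.PROPOSED, Event.CHECK_WIRED): Status.PROMISED,
    (Status.PROMISED, Event.CHECK_FIRED_POSITIVE): Status.PROVEN,
    (Status.PROMISED, Event.CHECK_FIRED_NEGATIVE): Status.BROKEN,
    (Status.PROVEN, Event.CHECK_FIRED_POSITIVE): Status.PROVEN,   # re-verified
    (Status.PROVEN, Event.CHECK_FIRED_NEGATIVE): Status.BROKEN,
    (Status.PROVEN, Event.EVIDENCE_STALE): Status.PROMISED,       # stale, not Broken
}


def next_status(current: Status, event: Event) -> Status:
    """Return the state the claim moves to, or raise if the move is not defined."""
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(f"no transition from {current.value} on {event.name}") from None


if __name__ == "__main__":
    state = Status.PROPOSED
    for event in (Event.CHECK_WIRED, Event.CHECK_FIRED_POSITIVE, Event.EVIDENCE_STALE):
        state = next_status(state, event)
        print(event.name, "->", state.value)
```

The missing rows carry the point: a Proposed claim has no check, so it can never jump straight to Proven or Broken. The only way out of Proposed is to wire the check.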

5 · A worked example

Pinpoint (clean, fictional)

Who. The on-call engineer who gets paged at 3am and has to find the cause fast. Not their VP. Not "users."
What changes. Median time-to-cause drops from 22 minutes to under 5 - because the alert arrives with three ranked likely causes, each linked to evidence.
How you'd know. Each resolved incident records whether the real cause was in the top three. Breaks if the real cause is outside the top three on more than 10% of incidents over any rolling thirty-day window.

Three sentences. No hedges. A stranger gets it. Read aloud without footnotes. Ready.

6 · Next time

You don't need to re-read this page on your next project. You ran the check; you know the shape now. Open the skeleton. Three sentences for the lede, the rest as the skeleton asks. Six checks. Ready or not ready.

If a year from now you still can't run the check unaided - the runbook failed you. Tell us. That's how we know.

This runbook is part of Keep a Value. Standard at the root.