Default to the end-state

A platform team I work with is moving our compute from x86_64 to arm64. The migration plan, roughly:

  1. Multi-arch the container builds.
  2. Add graviton = true to the service's Terraform module call (touch every repository).
  3. Once every service has flipped the flag, remove it (touch every repository again).

Step 3 is where I want to stop.

Step 3 is a second sweep across every repository that participated in step 2 — re-opening each service, deleting the flag, re-deploying. It is the most expensive part of the migration and the easiest to defer indefinitely. Most migrations never finish step 3.

There is a cheaper version of the same plan. Change the flag's polarity.

Instead of graviton = true, expose legacy_x86 = true. The default — the result of doing nothing — is now the migration target. Step 2 becomes "services that aren't ready opt out". Step 3 becomes "the platform team deletes the flag handling in a single PR". No second sweep. No long tail.

The same trap shows up outside of infrastructure. When we migrated our protobuf TypeScript generation from one toolchain to another, every package depended on a build script called proto:gen to produce its TypeScript bindings. The natural instinct was:

  1. Introduce a second build script proto:gen:v2 next to the existing proto:gen.
  2. Migrate every package to call proto:gen:v2 (touch every repository).
  3. Once every package is on v2, delete the old script and rename proto:gen:v2 back to proto:gen (touch every repository again).

Same shape. Two sweeps.

The polarity-flipped version:

  1. Rename the existing script to proto:gen:classic (touch every repository, but no behavior change).
  2. Introduce the new generator under the canonical proto:gen name. New code lands on it by default.
  3. Migrate packages off proto:gen:classic opportunistically, and delete it once nothing depends on it.

proto:gen always means the current canonical generator, so no package has to touch its build script just to keep up with the migration.

The underlying idea has a name borrowed from API design: the pit of success. Doing nothing should land you in the state you want. A flag — or a script name, or a default config — that points away from the target inverts that. Doing nothing keeps you in the past.

So when I find myself proposing a migration flag, I ask:

If every team ignores my Slack message, where do they end up?

If the answer is the old behavior, I flip the flag around.