TL;DR — Changing something other things depend on, in one irreversible step, is how you get paged. The fix is expand → contract: add the new shape, run the old and new side by side, migrate in small reversible steps, and remove the old shape last — only once you can prove nothing needs it. A database column is the clearest example, but the same pattern fixes API changes, risky rollouts, and redirects.
It’s 11:32 PM on a Friday and my phone is buzzing like crazy, about to vibrate off the nightstand. That afternoon I’d merged what looked like a harmless bit of cleanup. One line:
ALTER TABLE users DROP COLUMN legacy_email;
CI was green. The app didn’t use legacy_email anymore — I’d checked. What I hadn’t checked was the nightly export job, written two years ago by a team that doesn’t even exist anymore, that still SELECTed it. At 11 PM it ran, failed, and started paging whoever was on call. And here’s what turned a bug into an incident: there was no undo. Reverting the deploy brought the code back, but the column — and every byte in it 🥲 — was gone. The real fix was a restore from backup at midnight.
The mistake wasn’t dropping the column. It was dropping it all at once, in a single irreversible step, while I was still guessing about who depended on it.
Why this is scarier than it looks
A schema change isn’t just another change. Three parties have to agree on it at the same instant: the database, the code writing to it, and everything reading from it, including other services, jobs, replicas, and that export nobody owns. And it’s not only other systems: during a rolling deploy your own app runs the old and new code side by side for a few minutes, so both versions have to be happy with the same schema at once. As for “just deploy the migration and the code together”: the two never land at the same instant. There’s always a window of at least a few seconds between them, and that window is where the danger lives: some code is reading a column that’s already gone, or writing to one that doesn’t exist yet.
The pattern: expand, then contract
The fix is a pattern old enough to have a few names — expand/contract, or parallel change. The whole idea is to never let the old shape and the new shape be mutually exclusive at any single moment. Instead of flipping everything at once, you keep both running side by side and cross over in small steps, each deployable on its own:
flowchart LR
    A[1. Add new column] --> B[2. Backfill]
    B --> C[3. Write to both]
    C --> D[4. Read from new]
    D --> E[5. Stop writing old]
    E --> F[6. Drop old column]
Walk through it, and watch how at every step both shapes are valid — so nothing is ever forced to be in sync:
1. Expand: add the new column. Purely additive. Nullable, no default that rewrites the whole table. Nothing reads it, nothing breaks — a safe, boring deploy.
ALTER TABLE users ADD COLUMN email_verified boolean;
2. Backfill, in batches. Fill in the existing rows from the old data. Two traps here. First, a single unbounded UPDATE takes a lock on millions of rows and your “quick migration” becomes a 40-minute outage — loop over id ranges so each transaction stays small. Second, make the backfill resumable: track the last id you handled, so if it dies halfway you restart without redoing work or skipping rows.
UPDATE users SET email_verified = (legacy_verified_at IS NOT NULL)
WHERE id BETWEEN $1 AND $2; -- one chunk at a time
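Here is a minimal sketch of that loop, using SQLite through Python’s sqlite3 module as a stand-in for the real database. The backfill_progress checkpoint table and the function name are hypothetical, invented for this illustration; the users columns are the ones from the SQL above. Each chunk and its checkpoint commit in the same short transaction, so a crash loses at most one chunk.

```python
import sqlite3


def backfill_email_verified(conn, batch_size=1000):
    """Fill email_verified from legacy_verified_at in small, resumable chunks.

    Returns the last id handled.
    """
    # Hypothetical checkpoint table: remembers the last id already handled.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS backfill_progress "
        "(job TEXT PRIMARY KEY, last_id INTEGER NOT NULL)"
    )
    conn.commit()
    row = conn.execute(
        "SELECT last_id FROM backfill_progress WHERE job = 'email_verified'"
    ).fetchone()
    last_id = row[0] if row else 0

    while True:
        # Pick the next chunk of ids after the checkpoint.
        ids = conn.execute(
            "SELECT id FROM users WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not ids:
            return last_id
        upper = ids[-1][0]
        # One short transaction per chunk: the update and the checkpoint
        # commit together, or roll back together.
        with conn:
            conn.execute(
                "UPDATE users "
                "SET email_verified = (legacy_verified_at IS NOT NULL) "
                "WHERE id > ? AND id <= ?",
                (last_id, upper),
            )
            conn.execute(
                "INSERT OR REPLACE INTO backfill_progress (job, last_id) "
                "VALUES ('email_verified', ?)",
                (upper,),
            )
        last_id = upper
```

Running it a second time is a no-op: it starts from the stored checkpoint, finds no ids beyond it, and returns.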
3. Write to both. Deploy code that writes the new column and the old one. This is the step people skip, and it’s the one that makes the whole thing safe: from here on, every write keeps both columns correct, so a reader of either shape sees the truth. One catch: old code keeps creating rows while the deploy rolls out, and those rows only fill the old column. Re-run the backfill once the deploy has finished to catch them.
4. Read from the new column. Flip the readers over — ideally behind a flag, so this step alone reverts in seconds. The old column is still written and still holds its data, so if the new path is wrong, you flip straight back.
5. Stop writing the old column. Nothing reads or writes the old column now, but its data is still sitting there. This is your last safe checkpoint.
6. Contract: drop the old column — last. The only irreversible step in the whole sequence, so earn it. Don’t guess that the column is dead; prove it. Add a log line or a counter that fires whenever anything reads it, watch your database’s query stats for stray SELECTs, and let it sit for a grace period. When that counter has been flat at zero long enough that dropping it feels boring, drop it. Boring is the goal.
The single DROP had exactly one moment, and it was a cliff. Expand/contract has six moments, five of which you can walk back from with a git revert or a flag flip — and it quarantines the single point of no return to the very end, when you have the most information and the least doubt. No 11 PM page 😴.
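On the application side, steps 3 and 4 might look roughly like the sketch below. It assumes the same users table, with SQLite standing in for the real database; the FLAGS dict, the METRICS counter, and both function names are hypothetical placeholders for whatever flag service and metrics client you already run. The counter on the old read path is the evidence step 6 asks for.

```python
import sqlite3
from collections import Counter

# Hypothetical in-process flag store and metrics; a real system would use
# its flag service and metrics client instead.
FLAGS = {"read_email_verified_from_new": False}
METRICS = Counter()


def mark_verified(conn, user_id, verified_at):
    """Step 3: write the old and the new column in one transaction."""
    with conn:
        conn.execute(
            "UPDATE users SET legacy_verified_at = ?, email_verified = ? "
            "WHERE id = ?",
            (verified_at, verified_at is not None, user_id),
        )


def is_verified(conn, user_id):
    """Step 4: read the new column behind a flag; the old path stays intact."""
    if FLAGS["read_email_verified_from_new"]:
        row = conn.execute(
            "SELECT email_verified FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return bool(row and row[0])
    # Count every read of the old column, so the final DROP can wait until
    # this number has stayed flat for the whole grace period.
    METRICS["legacy_verified_at.read"] += 1
    row = conn.execute(
        "SELECT legacy_verified_at FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row is not None and row[0] is not None
```

Flipping the flag back is the whole rollback: the old column was written all along, so the old read path still returns correct answers.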
From one column to a whole subsystem
The column is about as simple as it gets: one column, one DROP. Now watch the same pattern carry real weight. On a product I worked on, the navigation menus lived in the database (we had our reasons 😏) — each menu row had its label hardcoded, in a single language. Until we needed those menus translated into several languages. The tempting move is the big bang: introduce a translation layer and rewire every read in one release. It’s the Friday-night DROP all over again, just at a much bigger scale. Expand/contract instead — and the old hardcoded label still does one last useful job before it’s removed.
flowchart LR
    A[1. Add translate_key column] --> B[2. Backfill keys from labels]
    B --> C[3. Seed source locale, auto-translate the rest]
    C --> D[4. Write label and key]
    D --> E[5. Read by key, label as fallback]
    E --> F[6. Drop the old label columns]
1. Expand: add a nullable translate_key column to the menu table. Additive and idempotent — nobody reads it yet, nothing breaks.
2. Backfill the keys. A migration walks every existing menu and derives a stable key from its current label — slugified into something like billing.invoices — and writes it into the new column. Re-runnable, so a half-finished run just picks up where it left off.
3. Seed, then fan out. The label that was already sitting in the row becomes the source-language value for that key in the translation service. A background job takes it from there, translating each key into the other locales and writing every result back. The old hardcoded text wasn’t thrown away — it was the seed for its own replacement.
4. Write to both. The menu editor now saves the fixed label and the key, side by side — exactly like the two columns in the earlier example.
5. Read by the key. Rendering resolves each label by key plus the user’s locale, with the original hardcoded label as the fallback whenever a translation is still missing. Behind a flag, so the whole switch reverts in seconds.
6. Contract. Once every menu has a key and nothing reads the fixed columns anymore, drop them. Last, and only then.
Notice what changed and what didn’t. The “new shape” here isn’t a column — it’s a key backed by an entire translation subsystem, jobs and locales and all. The pattern didn’t care: add the new thing, keep both alive while you migrate, remove the old thing last. And the detail I love most — the thing being replaced seeded its own replacement. That hardcoded label was the seed every translation grew from, right up until the moment it was safe to delete.
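To make steps 2 and 5 concrete, here is a rough sketch of the key derivation and the fallback lookup. Everything in it is an assumption for illustration: the section prefix, the dict-shaped menu rows, and the translations mapping keyed by (key, locale) are not the real product’s code, only one plausible shape for it.

```python
import re


def derive_key(section, label):
    """Step 2: derive a stable key such as 'billing.invoices' from a label.

    The section prefix is a hypothetical input; the slug comes from the
    existing hardcoded label.
    """
    slug = re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")
    return f"{section}.{slug}"


def resolve_label(menu, locale, translations, use_keys=True):
    """Step 5: look up by key and locale, falling back to the old label.

    `use_keys` plays the role of the feature flag: turning it off restores
    the old behavior without touching any data.
    """
    if use_keys and menu.get("translate_key"):
        translated = translations.get((menu["translate_key"], locale))
        if translated:
            return translated
    # Missing key or missing translation: the hardcoded label still works.
    return menu["label"]
```

For a menu row with label "Invoices" in a "billing" section, derive_key gives "billing.invoices", and resolve_label returns the translated text when one exists for the locale, or "Invoices" when it doesn’t.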
The column was just the clearest example
Underneath the SQL, the pattern has nothing to do with databases: add the new thing, keep both alive while you migrate, remove the old thing last. It shows up anywhere a change has dependents you don’t fully control.
Renaming a field in an API. Expand: return name next to the old full_name. Both are valid, so every client keeps working and migrates on its own schedule, with no flag day. The “backfill” is just patience: you track how many clients still read full_name. Contract: delete full_name once that count has stayed at zero.
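A small sketch of that expand phase, under one loud assumption: the API lets clients say which fields they want (a fields parameter, a GraphQL selection, or similar), which is what makes usage countable. The serialize_user function and the FIELD_READS counter are hypothetical names, not part of any real framework.

```python
from collections import Counter

# Hypothetical usage metric: how often each field was explicitly requested.
FIELD_READS = Counter()


def serialize_user(user, requested_fields=None):
    """Expand phase: return `name` alongside the old `full_name`."""
    payload = {
        "id": user["id"],
        "name": user["name"],
        "full_name": user["name"],  # old field, kept until usage hits zero
    }
    if requested_fields is None:
        # No field selection: the client gets both shapes.
        return payload
    for field in requested_fields:
        FIELD_READS[field] += 1
    return {k: v for k, v in payload.items() if k in requested_fields}
```

Without field selection you can’t see usage this directly, and the evidence has to come from somewhere else, such as client versions or the client owners themselves.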
Rolling out risky logic. Expand: ship the new code path behind a flag, defaulted off — deployed but dormant. Run both: ramp it 1% → 100% (or run new and old in parallel and compare), watching the graphs. Contract: delete the flag and the old branch once the new one has earned it. The flag is your reversible cutover; deleting it is the one-way step, saved for last.
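Both halves of “run both” fit in a few lines. The sketch below is not any particular flag library’s API: in_rollout and run_with_shadow are hypothetical helpers. The first hashes the flag name and user id into a stable bucket, so a user who is in at 1% stays in at 10%. The second serves the old result while recording every case where the new path disagrees or fails.

```python
import hashlib


def in_rollout(flag_name, user_id, percent):
    """Stable bucketing: the same user always lands in the same bucket."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # 0..99
    return bucket < percent


def run_with_shadow(value, old_impl, new_impl, mismatches):
    """Parallel run: serve the old result, record where the new one differs."""
    old_result = old_impl(value)
    try:
        new_result = new_impl(value)
        if new_result != old_result:
            mismatches.append((value, old_result, new_result))
    except Exception as exc:
        # The new path must never break the old one while it is on trial.
        mismatches.append((value, old_result, repr(exc)))
    return old_result
```

Ramping up is then just raising percent, and rolling back is setting it to 0. Neither needs a deploy.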
Moving a URL or a domain. Expand: serve the new location with a 302 — a redirect that works now but stays reversible, because nothing caches it forever. Contract: only once the traffic has drained and you’re sure, promote it to a 301, the permanent version you can’t take back.
The same holds for evolving an event payload, a config format, anything with consumers you don’t own. What’s constant is the discipline: never make the old and new worlds mutually exclusive, and keep the single irreversible act — the DROP, the deleted flag, the 301 — for last, when doubt is lowest.
When following every step is overkill
Six careful steps for a twelve-row config table nobody reads in production? No need. The caution should match the cost of being wrong: a heavily-read column on a huge table, with readers you can’t even list, earns every step; a tiny table only you touch is fine with plain judgment. The pattern is a tool, not a ritual — use only as much of it as the blast radius warrants, and no more.
Expand, run both, contract — and make the irreversible step the last one. That’s the pattern. The column was just where it paged me. Your weekends will thank you.
Going deeper
- Parallel Change — Danilo Sato, on martinfowler.com. The canonical write-up of the pattern (expand → migrate → contract), where the name comes from.
- Backward compatible database changes — PlanetScale. A practical, step-by-step walkthrough of expand/contract for a column change, in both the schema and the app code.
- Online migrations at scale — Jacqueline Xu, Stripe. The same idea on a big table: the four-step dual-write / backfill / read-switch / cleanup migration.
- Evolutionary Database Design — Pramod Sadalage & Martin Fowler. The wider context: migrations as first-class, version-controlled artifacts that let a schema evolve continuously.
- Refactoring Databases: Evolutionary Database Design — Scott Ambler & Pramod Sadalage. The book-length treatment, with a catalog of small, safe schema refactorings.