Startups love saying they’re “leaving the cloud.” Most of them aren’t. They’re renegotiating commitments, moving one or two expensive services, and keeping the rest right where it is.
That’s not a cop-out; it’s maturity. The cloud exit narrative has been hijacked by two camps: people who treat AWS/Azure/GCP bills as moral failure, and cloud vendors who treat every repatriation story as a rounding error. Both sides miss the operational truth: repatriation is a workload-by-workload supply-chain decision, not a personality trait.
We’re in 2026. If you run a real product, you’re already spread across multiple vendors in some form: SaaS dependencies (Stripe, Twilio, Snowflake, Datadog), CI/CD (GitHub Actions), a model API (OpenAI, Anthropic, Google), and at least one hyperscaler for core compute. The question isn’t “cloud or not.” The question is whether your current architecture is priced like an experiment while your company is priced like a business.
The mistake: treating cloud bills like “waste” instead of a contract + architecture problem
Cloud invoices feel personal because they look like receipts. But the bill is only partly about runtime. It’s also about contracts (commitments), defaults (managed services you didn’t re-evaluate), and organizational design (who is allowed to change what).
People point to Basecamp/HEY publicly moving parts of their stack off the cloud, and to 37signals’ years-long argument that cloud costs are frequently mismanaged. That story resonates because it’s concrete: they ran the math, bought hardware, and did the work. But it’s easy to extract the wrong lesson: “cloud is a scam.” The right lesson is duller and more useful: your architecture and purchasing model must match your steady-state usage.
On the other side, hyperscalers respond with an equally incomplete truth: “customers choose what’s right; most stay.” Sure. The cloud is objectively convenient. It also has a specific failure mode: once you’ve assembled a stack of managed services, your unit economics can become a function of vendor pricing and data gravity rather than engineering choices. That’s not evil. It’s just how the incentives line up.
Repatriation isn’t “cloud vs on-prem.” It’s a spectrum of control
The interesting shift isn’t that companies discovered colocation again. It’s that “cloud” has become a bundle of distinct products with radically different economics: commodity compute, proprietary PaaS primitives, managed databases, data warehouses, observability pipelines, AI accelerators, and edge delivery. You can repatriate one slice while doubling down on another.
Three repatriation archetypes that keep showing up
- Compute repatriation: Move steady, predictable CPU workloads off EC2/GCE/Azure VMs onto owned hardware or long-term leased capacity. Keep burst in the cloud.
- Data repatriation: Keep compute near users, but pull bulk storage, cold data, and certain analytics pipelines into cheaper, controllable environments. Sometimes this is “cloud-to-cloud” (e.g., out of one hyperscaler into another) rather than to physical metal.
- Control-plane repatriation: Keep workloads in the cloud but remove proprietary glue. Examples: moving from a hyperscaler’s managed Kubernetes add-ons to upstream Kubernetes patterns; using Postgres on VMs instead of a managed database for specific profiles; using OpenTelemetry instrumentation rather than vendor-specific agents where feasible.
Founders like the compute story because it’s easiest to explain to a board. Operators should care more about the control-plane story. The fastest way to get trapped isn’t EC2 pricing; it’s the accumulation of “small” proprietary dependencies that become untouchable.
Key Takeaway
Repatriation that starts with ideology ends in a rewrite. Repatriation that starts with a workload inventory ends in a procurement change.
Table 1: Practical comparison of infrastructure options founders actually choose (not ideology)
| Option | Best for | Tradeoffs | Real examples |
|---|---|---|---|
| Hyperscaler IaaS (EC2/GCE/Azure VMs) | Fast iteration, mixed workloads, global footprint | Cost variance; egress friction; easy to sprawl | AWS EC2, Google Compute Engine, Azure Virtual Machines |
| Managed PaaS databases | Small teams that need uptime fast | Higher steady-state cost; limited deep tuning; version constraints | Amazon RDS/Aurora, Cloud SQL, Azure Database for PostgreSQL |
| Colocation + owned hardware | Predictable load; strong cost control; long-lived services | Upfront planning; staffing; slower capacity changes | Equinix, Digital Realty (colo facilities) |
| Dedicated bare metal (rented) | Quick escape from cloud pricing without buying servers | Capacity planning still needed; fewer managed features | OVHcloud, Hetzner, Scaleway, Equinix Metal (historically; service evolved) |
| “Cloud exit lite” (contract + architecture tuning) | Teams that over-bought managed services or under-used commitments | Requires discipline; savings can evaporate if governance is weak | Savings Plans / Reserved Instances (AWS), committed use discounts (GCP), Azure Reservations |
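To make the last row concrete: a commitment only beats on-demand pricing once you actually use enough of it. The sketch below is a deliberately simplified model, not any provider's billing logic. It assumes committed hours are billed at a flat discount whether or not they are used, and that usage above the commitment is billed on demand. The rate, discount, and function names are made up for illustration.

```python
def break_even_utilization(discount: float) -> float:
    """Fraction of a commitment you must use before it beats on-demand."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount


def monthly_cost(on_demand_rate: float, used_hours: float,
                 committed_hours: float = 0.0, discount: float = 0.0) -> float:
    """Committed hours are paid for at a discount even if idle;
    usage above the commitment is billed at the on-demand rate."""
    committed_cost = committed_hours * on_demand_rate * (1.0 - discount)
    overflow_hours = max(0.0, used_hours - committed_hours)
    return committed_cost + overflow_hours * on_demand_rate


if __name__ == "__main__":
    # Hypothetical numbers: $1.00/hour on demand, 30% commitment discount.
    rate, discount, committed = 1.00, 0.30, 1000
    print(f"break-even utilization: {break_even_utilization(discount):.0%}")
    for used in (500, 700, 900):
        on_demand = monthly_cost(rate, used)
        with_commit = monthly_cost(rate, used, committed, discount)
        print(f"used={used}h on_demand=${on_demand:.2f} committed=${with_commit:.2f}")
```

Under these assumptions the break-even point is simply one minus the discount. Below it, the commitment is papering over sprawl; above it, it is a rational purchase.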
The parts everyone forgets: egress, managed-service gravity, and organizational drag
If you ask an engineer why repatriation is hard, you’ll hear “databases” and “networking.” True, but incomplete. The real blockers are economic and human.
Egress is the tax you only notice after you’ve architected your data flows
Cloud egress fees are public, and the pattern is consistent: pulling data out of a provider costs money; moving it around inside one is cheaper and easier. That shapes architecture over time. Your system becomes a set of assumptions about where bytes live. Repatriation breaks assumptions first, systems second.
The contrarian move: treat egress like an architectural constraint from day one, even if you never leave. That means fewer cross-region data dependencies, explicit data contracts between services, and a bias toward data formats you can move without rewriting half your pipeline.
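One way to treat egress as a constraint is to write your data flows down and price the ones that cross a provider boundary. The following is a rough sketch under stated assumptions: the `provider:location` naming scheme, the flow list, and the flat per-GB rate are all hypothetical, and real pricing is tiered and varies by destination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFlow:
    name: str
    source: str         # where the bytes live, e.g. "aws:us-east-1" (hypothetical scheme)
    destination: str    # where they are consumed
    gb_per_month: float


def provider(location: str) -> str:
    """Return the part before the first colon, e.g. 'aws' for 'aws:us-east-1'."""
    return location.split(":", 1)[0]


def egress_report(flows, rate_per_gb: float):
    """List (flow name, estimated monthly cost) for flows that leave
    their source provider, most expensive first."""
    crossing = [
        (flow.name, flow.gb_per_month * rate_per_gb)
        for flow in flows
        if provider(flow.source) != provider(flow.destination)
    ]
    return sorted(crossing, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    flows = [
        DataFlow("api -> primary db", "aws:us-east-1", "aws:us-east-1", 4000),
        DataFlow("nightly export to warehouse", "aws:us-east-1", "gcp:us-central1", 1500),
        DataFlow("media downloads", "aws:us-east-1", "internet:customers", 9000),
    ]
    # Placeholder rate; substitute the figures from your own invoice.
    for name, cost in egress_report(flows, rate_per_gb=0.05):
        print(f"{name}: ~${cost:,.2f}/month")
```

The point is less the dollar figure than the list itself: every flow that shows up here is an assumption about where bytes live that a migration would break.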
Managed services are sticky because they are genuinely good
Aurora, BigQuery, DynamoDB, Cloudflare’s edge network, managed Kafka offerings—these products solve real problems. The trap is that they solve problems in a proprietary way. If your team has never operated Postgres backups, never tuned a Kafka cluster, and never handled incident response for storage failures, you haven’t “outsourced undifferentiated heavy lifting.” You’ve deleted the skill from your company.
That’s fine until your priorities change. Then “leaving” becomes a hiring plan, an on-call redesign, and a multi-quarter migration project—before you touch a single server.
Your biggest infra risk is not cost. It’s governance
Cloud sprawl is usually a permissioning problem disguised as a cost problem. If every team can provision anything, they will. If no one owns lifecycle management, nothing gets deleted. If the FinOps function is advisory-only, it becomes a newsletter.
Most cloud cost “optimization” work is just rewriting the company’s rules about who is allowed to create infrastructure—and what happens when they do.
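Rules about who may create infrastructure only matter if something checks them. Below is a minimal sketch of such a check, assuming a hypothetical policy where every resource must carry `owner`, `workload`, and `expires_on` tags. The resource dictionaries stand in for whatever your provider's inventory export returns; none of this is a real cloud API.

```python
from datetime import date

# Hypothetical policy: every resource names an owner, a workload, and an expiry.
REQUIRED_TAGS = ("owner", "workload", "expires_on")


def policy_violations(resource: dict, today: date) -> list[str]:
    """Return human-readable policy problems for one resource."""
    tags = resource.get("tags", {})
    problems = [f"missing tag: {tag}" for tag in REQUIRED_TAGS if not tags.get(tag)]

    expires = tags.get("expires_on")
    if expires:
        try:
            if date.fromisoformat(expires) < today:
                problems.append(f"expired on {expires}")
        except ValueError:
            problems.append(f"unparseable expires_on: {expires!r}")
    return problems


if __name__ == "__main__":
    inventory = [
        {"id": "vm-staging-7", "tags": {"owner": "payments", "workload": "api",
                                        "expires_on": "2026-01-31"}},
        {"id": "bucket-tmp-export", "tags": {"owner": "data"}},
    ]
    for resource in inventory:
        for problem in policy_violations(resource, today=date(2026, 6, 1)):
            print(f"{resource['id']}: {problem}")
```

A report like this is still advisory. It becomes governance when the provisioning workflow refuses untagged resources and something actually deletes expired ones.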
A contrarian rule: don’t repatriate your database first
“Our database is expensive” is the classic trigger. It’s also how migrations die: you start with the most critical system, discover ten years of implicit behavior, and stall. Databases are the crown jewels; treat them that way.
If you want repatriation to succeed, start with the boring stuff that drains money quietly and doesn’t require a company-wide freeze:
- Batch compute that runs on a schedule and doesn’t need instant scale.
- Stateless services behind a well-defined API and good observability.
- CI runners or build farms, where costs can be dominated by always-on machines and artifact transfer.
- Non-production environments with clear shutdown policies (and enforcement), not polite reminders.
- Log retention and cold storage where you can change policies without changing application behavior.
Once you can move these, you’ve proven three things that matter more than the database itself: you can ship infra change safely, you can observe it, and you can run it with your existing team.
The tooling reality in 2026: Kubernetes won, but “Kubernetes everywhere” is still a bad plan
Kubernetes is the default substrate for portable orchestration. That does not mean your company should run it in every environment you touch. “We’ll just run K8s on-prem” is the new “we’ll just build our own database.” It can be right, but it’s rarely free.
Here’s the posture that holds up under pressure: use managed Kubernetes (EKS, GKE, AKS) where it saves operational load, and keep your manifests and platform assumptions close to upstream Kubernetes so you can move if you must. Avoid provider-specific ingress, identity, and storage plugins unless you have a clear exit plan.
What “portable” looks like in practice
Portable doesn’t mean “no cloud services.” It means you can swap critical layers without rewriting everything above them. A simple sanity test: if you had to move one environment from AWS to a colo facility, could you keep your deployment workflow, secrets management model, and observability pipeline mostly intact?
Open standards help here. OpenTelemetry became the default instrumentation layer across vendors precisely because teams got tired of being locked into one observability agent. That’s not ideology; it’s operational freedom.
```yaml
# Minimal OpenTelemetry Collector example (conceptual)
# Vendor-neutral pipeline so you can route telemetry to Datadog, Grafana, or others
receivers:
  otlp:
    protocols:
      grpc:
      http:
processors:
  batch: {}
exporters:
  otlphttp:
    endpoint: https://your-observability-endpoint.example
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
```
Table 2: A workload triage checklist you can use in a single meeting
| Workload signal | What it implies | Best target | Red flags |
|---|---|---|---|
| Highly predictable CPU usage | You pay an on-demand premium for elasticity you rarely use | Owned hardware, colo, or dedicated bare metal | Frequent traffic spikes; unclear SLOs; no capacity planning muscle |
| Heavy data egress to customers/partners | Network costs and contracts matter as much as compute | Edge/CDN focus; consider data locality redesign | Data formats coupled to vendor services; cross-region chatty services |
| Deep dependency on proprietary managed services | Migration cost is mostly engineering time, not hardware | Stay put; carve out new portability layer for future services | “We’ll rewrite later” as the only plan; missing runbooks and ownership |
| Strict latency needs, global users | Placement and edge strategy dominate | Hybrid: cloud regions + CDN/edge (e.g., Cloudflare) | Single-region database; synchronous cross-region writes |
| Security/compliance constraints (data residency, audits) | Controls and evidence matter as much as architecture | Whichever environment your team can prove and operate safely | Shadow infra; unclear access controls; weak key management |
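If you want the triage to survive past the meeting, the table can be kept as data and applied per workload. This is only a sketch: the signal keys and the `triage` function are invented here, and the targets and red flags are copied from Table 2 rather than derived from anything smarter.

```python
# Table 2 as data. Keys are hypothetical short names for each workload signal.
TRIAGE_TABLE = {
    "predictable_cpu": {
        "target": "Owned hardware, colo, or dedicated bare metal",
        "red_flags": ["frequent traffic spikes", "unclear SLOs",
                      "no capacity planning muscle"],
    },
    "heavy_egress": {
        "target": "Edge/CDN focus; consider data locality redesign",
        "red_flags": ["data formats coupled to vendor services",
                      "cross-region chatty services"],
    },
    "proprietary_dependencies": {
        "target": "Stay put; carve out a portability layer for future services",
        "red_flags": ["'we'll rewrite later' as the only plan",
                      "missing runbooks and ownership"],
    },
    "strict_latency_global_users": {
        "target": "Hybrid: cloud regions + CDN/edge",
        "red_flags": ["single-region database", "synchronous cross-region writes"],
    },
    "compliance_constraints": {
        "target": "Whichever environment your team can prove and operate safely",
        "red_flags": ["shadow infra", "unclear access controls", "weak key management"],
    },
}


def triage(workload: str, signals: list[str]) -> list[str]:
    """Return one recommendation line per signal observed on a workload."""
    lines = []
    for signal in signals:
        row = TRIAGE_TABLE.get(signal)
        if row is None:
            lines.append(f"{workload}: unknown signal {signal!r}, triage by hand")
            continue
        checks = "; ".join(row["red_flags"])
        lines.append(f"{workload}: {signal} -> {row['target']} (check for: {checks})")
    return lines


if __name__ == "__main__":
    for line in triage("nightly-batch", ["predictable_cpu", "heavy_egress"]):
        print(line)
```

A workload with several signals gets several, possibly conflicting, recommendations. That conflict is the conversation the meeting is for, so the sketch does not try to resolve it.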
How the best operators run a cloud exit without turning it into a religion
There’s a clean way to do this that doesn’t require theatrics or a rewrite. It looks less like a manifesto and more like procurement + platform engineering working as one team.
- Inventory reality: list the top cost centers by service and by workload. Not “AWS,” but “Aurora + read replicas,” “NAT gateway traffic,” “observability ingest,” “S3 + egress,” “GPU hours,” “CI minutes.”
- Pick one migration with low blast radius: something stateless, observable, and easy to roll back. Prove you can move safely.
- Lock governance before you migrate: budget alerts are not governance. Put guardrails in IAM, tagging policy, and provisioning workflows.
- Renegotiate like an adult: use the fact that you have options. Commitments can be rational if they match usage; they’re poison if they’re used to paper over sprawl.
- Write down your “never again” rules: which proprietary services are allowed, under what conditions, and what the exit plan is.
Notice what’s missing: a grand “we are leaving” announcement. Mature companies don’t announce that they fixed procurement. They just stop bleeding margin.
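The first step in the list, inventory by service and by workload, can be sketched in a few lines. The example assumes you can export billing line items with `service`, `workload`, and `cost` fields (in practice the workload usually comes from a tag); the field names and sample rows are hypothetical.

```python
from collections import defaultdict


def top_cost_centers(line_items, limit: int = 5):
    """Sum cost per (service, workload) pair and return the largest first."""
    totals = defaultdict(float)
    for item in line_items:
        totals[(item["service"], item["workload"])] += item["cost"]
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)[:limit]


if __name__ == "__main__":
    # Hypothetical export; real data would come from your billing report.
    line_items = [
        {"service": "Aurora", "workload": "orders-db", "cost": 8200.0},
        {"service": "NAT gateway", "workload": "shared-network", "cost": 3100.0},
        {"service": "Aurora", "workload": "orders-db", "cost": 2400.0},
        {"service": "S3", "workload": "log-archive", "cost": 1900.0},
        {"service": "CI minutes", "workload": "monorepo-builds", "cost": 2700.0},
    ]
    for (service, workload), cost in top_cost_centers(line_items, limit=3):
        print(f"{service} / {workload}: ${cost:,.2f}")
```

Grouping by the pair, rather than by provider or by service alone, is what turns "AWS is expensive" into a line item someone can own.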
Key Takeaway
If your cloud exit plan doesn’t reduce organizational entropy—who can provision what, who owns it, and how it gets deleted—you’ll recreate the same cost problem in a different building.
A sharp prediction worth planning around
By the end of 2026, “cloud repatriation” will stop being a headline and become a normal finance-and-platform cycle: workloads will move back and forth as pricing changes, AI accelerator capacity shifts between regions and vendors, and regulatory requirements tighten. The winners won’t be the teams with the most ideological architecture. They’ll be the teams that can change their mind quickly without breaking production.
Here’s the question to sit with this week: Which part of your stack would you be unable to move in under a year, no matter how badly you needed to? Name it. Then pick the smallest adjacent system you can redesign to make that answer less scary.
That’s repatriation as a capability, not a campaign.