EC2 Spot vs On-Demand: Real Savings, Interruption Risks & When to Use Each
AWS EC2 Spot Instances offer some of the most aggressive discounts in cloud computing — typically 60–90% below On-Demand rates. The catch: AWS can reclaim Spot capacity with 2 minutes' notice when they need it back.
For the right workloads, that tradeoff is a no-brainer. For the wrong ones, it's a production outage waiting to happen. Here's how to tell the difference.
What Is Spot Pricing?
AWS maintains large pools of EC2 capacity. At any moment, some of that capacity sits idle, held in reserve for On-Demand spikes that haven't materialized yet. Rather than let it sit empty, AWS sells it at a deep discount as Spot Instances.
The Spot price is set per instance type in each Availability Zone and moves with supply and demand. When AWS needs the capacity back for On-Demand customers, it reclaims Spot Instances after a 2-minute interruption notice.
Spot pricing is no longer a bidding auction. Since late 2017, you pay the current Spot price for the instance type rather than your bid, and prices adjust gradually instead of spiking. You can set a maximum price (default: the On-Demand rate), and your instance stays eligible to run as long as the Spot price remains at or below that maximum and capacity is available.
Real Savings Numbers
Let's look at actual prices. All data from CloudBench, us-east-1; monthly figures assume 720 hours (a 30-day month):
| Instance | On-Demand/hr | Spot/hr | Savings | Monthly OD | Monthly Spot |
|---|---|---|---|---|---|
| t3.micro | $0.0104 | ~$0.0031 | 70% | $7.49 | ~$2.23 |
| t3.small | $0.0208 | ~$0.0063 | 70% | $14.98 | ~$4.54 |
| m5.large | $0.096 | ~$0.029 | 70% | $69.12 | ~$20.88 |
| c6i.4xlarge | $0.680 | ~$0.204 | 70% | $489.60 | ~$146.88 |
| r5.2xlarge | $0.504 | ~$0.151 | 70% | $362.88 | ~$108.72 |
A cluster of 10× c6i.4xlarge instances on Spot vs On-Demand saves roughly $3,427/month — over $41,000/year. For ML training jobs or data pipelines that can tolerate interruption, this is transformative.
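The cluster figure can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes a 720-hour month (which is what the monthly columns in the table work out to) and uses the approximate c6i.4xlarge rates listed above; the function names are illustrative.

```python
HOURS_PER_MONTH = 720  # assumption: 30-day month, matching the table's monthly columns


def monthly_cost(hourly_rate: float, instances: int = 1) -> float:
    """Cost of running `instances` machines around the clock for one month."""
    return hourly_rate * HOURS_PER_MONTH * instances


def monthly_savings(on_demand_rate: float, spot_rate: float, instances: int = 1) -> float:
    """Difference between the On-Demand and Spot monthly bills."""
    return monthly_cost(on_demand_rate, instances) - monthly_cost(spot_rate, instances)


# c6i.4xlarge rates from the table above (the Spot rate is approximate)
savings = monthly_savings(0.680, 0.204, instances=10)
print(f"Monthly: ${savings:,.2f}  Yearly: ${savings * 12:,.2f}")
```

Swapping in the rates for another row gives the equivalent figure for that instance type. Real Spot bills will drift from this estimate because the Spot rate changes over the month.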
Use CloudBench to see live Spot prices for any instance type across all regions. Spot prices in eu-north-1 are often even lower than us-east-1 for many instance types.
Interruption Risk: The Real Numbers
The interruption risk is the reason Spot is priced so low. But how often does it actually happen?
| Interruption Frequency | Share of Spot Instances |
|---|---|
| <5% of the time | ~95% of instances |
| 5–10% of the time | ~3% of instances |
| >10% of the time | ~2% of instances |
For popular instance types (t3, m5, c6i) in high-capacity regions (us-east-1, eu-west-1), real-world interruption rates are often below 1%. The higher-risk scenarios are niche instance types in small AZs during capacity crunches.
When AWS reclaims a Spot Instance, they send an EC2 Spot Instance interruption notice 2 minutes before termination via the instance metadata service and EventBridge. A well-architected system catches this notice and begins graceful shutdown — draining connections, checkpointing state, and deregistering from load balancers — within that window.
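On the EventBridge side, the notice arrives as an event with source `aws.ec2` and detail type `EC2 Spot Instance Interruption Warning`. The sketch below shows one way a Lambda-style handler might recognize that event and pull out the affected instance. The handler name, the return shape, and the sample instance ID are illustrative assumptions, and the actual drain/checkpoint/deregister step is left as a placeholder comment.

```python
from typing import Optional

WARNING_DETAIL_TYPE = "EC2 Spot Instance Interruption Warning"


def spot_interruption_target(event: dict) -> Optional[str]:
    """Return the instance ID if `event` is a Spot interruption warning, else None."""
    if event.get("source") != "aws.ec2":
        return None
    if event.get("detail-type") != WARNING_DETAIL_TYPE:
        return None
    return event.get("detail", {}).get("instance-id")


def handler(event: dict, context: object = None) -> dict:
    """Lambda-style entry point (hypothetical name and return shape)."""
    instance_id = spot_interruption_target(event)
    if instance_id is None:
        return {"handled": False}
    # Placeholder: start draining, checkpointing and deregistering here.
    return {
        "handled": True,
        "instance_id": instance_id,
        "action": event["detail"].get("instance-action"),
    }


# Illustrative event; the instance ID is made up.
sample_event = {
    "source": "aws.ec2",
    "detail-type": WARNING_DETAIL_TYPE,
    "detail": {"instance-id": "i-0123456789abcdef0", "instance-action": "terminate"},
}
result = handler(sample_event)
```

Handling the event centrally is useful when the instance itself cannot be trusted to react in time, for example to deregister it from a load balancer from outside.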
Cheapest Spot Regions
Spot prices vary dramatically by region. eu-north-1 (Stockholm) and us-east-1 (N. Virginia) are consistently among the cheapest, largely because they have high capacity relative to demand.
| Region | t3.micro Spot | m5.large Spot | Notes |
|---|---|---|---|
| eu-north-1 | ~$0.0022/hr | ~$0.016/hr | Often cheapest overall |
| us-east-1 | ~$0.0031/hr | ~$0.029/hr | High capacity, low interruption |
| eu-west-1 | ~$0.0035/hr | ~$0.031/hr | Good alternative to us-east-1 |
| us-west-2 | ~$0.0037/hr | ~$0.032/hr | ML workloads common here |
| ap-southeast-1 | ~$0.0040/hr | ~$0.038/hr | Higher demand, higher prices |
Stockholm has excellent infrastructure, GDPR-compliant data residency for EU workloads, and consistently low Spot prices. If your users are in Europe and your workload is flexible, eu-north-1 Spot is one of the best cost-performance combinations in AWS.
When to Use Spot Instances
✅ Batch processing and data pipelines
ETL jobs, log processing, data transformation — anything that processes chunks of data and can be restarted from a checkpoint. Spot is ideal here. If a node is interrupted mid-job, restart the chunk from the last checkpoint. The 60–90% savings justify the extra engineering.
✅ CI/CD build runners
Your build queue is not time-critical to the second. A build runner that gets interrupted simply requeues the job. GitHub Actions, GitLab CI, Jenkins, and Buildkite can all run self-hosted runners or agents on Spot. This is one of the highest-ROI Spot use cases, since build minutes can be your largest EC2 cost at scale.
✅ Machine learning training
ML training jobs can checkpoint model weights to S3 at regular intervals. If a Spot GPU instance is interrupted, resume from the last checkpoint on a new Spot instance. This is how many ML teams run billion-parameter model training at a fraction of On-Demand cost.
✅ Development and staging environments
Dev environments don't need 99.99% uptime. A developer whose Spot dev server gets interrupted just re-launches it — usually with less disruption than a coffee break. At $2.23/month for a t3.micro Spot, equipping 10 developers costs less than $25/month.
✅ Stateless web tier with Auto Scaling
Stateless web servers behind a load balancer are perfect Spot candidates. When a node is interrupted, the load balancer stops sending it traffic and the Auto Scaling group (ASG) replaces it. As long as in-flight requests drain within the notice window, users see nothing. Configure your ASG with a mix of instance types and the capacity-optimized allocation strategy to minimize interruptions.
✅ Video transcoding and media processing
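Encoding jobs are inherently parallelizable and restartable, which makes self-managed transcoding workers on EC2 a good fit for Spot. Managed services such as AWS Elemental MediaConvert do not expose a Spot option, so the savings apply only to fleets you run yourself.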
When NOT to Use Spot Instances
❌ Primary databases
Never run your primary PostgreSQL, MySQL, or any write-leader on Spot. An abrupt 2-minute shutdown with active write transactions is a recipe for data loss or corruption. Use On-Demand or Reserved Instances for database primaries, always.
❌ Stateful services that can't checkpoint
If your service holds in-memory state that can't be recovered from an external store, Spot is the wrong choice. This includes legacy monoliths that load data into memory at startup, or services with long-running WebSocket connections.
❌ Third-party software with license servers
Some commercial software licenses are node-locked or have limited concurrent activations. An interrupted Spot instance may leave a license "stuck" until it times out. Check your license terms before running on Spot.
❌ Anything that can't tolerate even 2 minutes of downtime
Payment processors, real-time trading systems, or any service with SLA penalties for >1-minute outages should run On-Demand with Multi-AZ redundancy. Spot's interruption risk is probabilistic, not guaranteed — but "probabilistic" doesn't mean "acceptable" for all workloads.
Architecting for Spot: 5 Patterns That Work
1. Diversify instance types and AZs
Never rely on a single instance type. Configure your ASG with 5–10 instance types of similar size (e.g., m5.large, m5a.large, m4.large, m5d.large). When capacity for one type gets tight, the ASG uses another. More instance types = lower interruption probability.
2. Use capacity-optimized allocation strategy
In your ASG's Spot allocation strategy, use capacity-optimized rather than lowest-price. This prioritizes the Spot pools with the most available capacity, which directly correlates with lower interruption rates. The price difference is usually small.
3. Listen for interruption notices
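Poll the instance metadata endpoint every 5 seconds. With IMDSv2, request a session token first:
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/spot/instance-action
The endpoint returns HTTP 404 until an interruption is scheduled. Once it returns a JSON document with an action (terminate, stop, or hibernate) and a time, begin graceful shutdown immediately: drain connections, deregister from the load balancer, flush writes to S3/RDS, and notify your monitoring system.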
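The polling loop could look roughly like the following. It is a sketch under a few assumptions: it uses the documented IMDS path `spot/instance-action` with an IMDSv2 session token, treats HTTP 404 as "no interruption scheduled", and leaves the shutdown work to the caller. The function names and the injectable `fetch`, `sleep`, and `max_polls` parameters are illustrative choices that make the loop testable off-instance.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

IMDS = "http://169.254.169.254/latest"


def fetch_instance_action() -> Optional[str]:
    """Return the raw instance-action body, or None when nothing is scheduled (404)."""
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
    )
    with urllib.request.urlopen(token_req, timeout=2) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(req, timeout=2) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def parse_action(body: Optional[str]) -> Optional[str]:
    """Extract the 'action' field (for example 'terminate') from the JSON body."""
    if not body:
        return None
    return json.loads(body).get("action")


def wait_for_interruption(
    fetch: Callable[[], Optional[str]] = fetch_instance_action,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    max_polls: Optional[int] = None,
) -> Optional[str]:
    """Poll until an action appears; return it, or None if max_polls is exhausted."""
    polls = 0
    while max_polls is None or polls < max_polls:
        action = parse_action(fetch())
        if action is not None:
            return action
        polls += 1
        sleep(interval)
    return None
```

On an instance, calling `wait_for_interruption()` in a background thread and triggering the drain when it returns is one way to wire this in. For simplicity the sketch requests a fresh token on every poll; caching the token until it nears expiry would cut the request count in half.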
4. Checkpoint frequently
For batch jobs: checkpoint to S3 or DynamoDB every 5–15 minutes. Store the checkpoint key in a queue message. On restart, pull the message, resume from the checkpoint. The cost of checkpointing is negligible; the cost of reprocessing from scratch is not.
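As a rough illustration of the resume-from-checkpoint loop, the sketch below records progress after each chunk. It makes two simplifying assumptions that differ from the text: a local JSON file stands in for S3 or DynamoDB, and the checkpoint is written per chunk rather than on a 5–15 minute timer. All function names are hypothetical.

```python
import json
import os
import tempfile
from typing import Any, Callable, Sequence


def load_checkpoint(path: str) -> int:
    """Index of the next unprocessed chunk (0 when no checkpoint exists yet)."""
    try:
        with open(path) as fh:
            return json.load(fh)["next_chunk"]
    except FileNotFoundError:
        return 0


def save_checkpoint(path: str, next_chunk: int) -> None:
    """Write the checkpoint to a temp file, then rename it into place."""
    # The rename keeps an interruption mid-write from leaving a torn checkpoint.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump({"next_chunk": next_chunk}, fh)
    os.replace(tmp_path, path)


def run_job(chunks: Sequence[Any], process: Callable[[Any], None], checkpoint_path: str) -> None:
    """Process chunks in order, skipping any already recorded as done."""
    start = load_checkpoint(checkpoint_path)
    for index in range(start, len(chunks)):
        process(chunks[index])
        save_checkpoint(checkpoint_path, index + 1)
```

If the instance is interrupted while a chunk is being processed, that chunk runs again on restart, so `process` should be idempotent or write its output atomically.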
5. Maintain a small On-Demand baseline
For web tiers: run 20–30% of your baseline capacity On-Demand, and fill the rest with Spot. This ensures you always have some capacity even during a broad Spot reclamation event. An ASG mixed instances policy handles this through its On-Demand base capacity and On-Demand percentage above base capacity settings.
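Patterns 1, 2, and 5 all come together in the ASG's mixed instances policy. The sketch below builds that structure as a plain dictionary in the shape the Auto Scaling API expects for `MixedInstancesPolicy`. The launch template name `web-tier`, the base capacity of 2, and the 25% split are illustrative assumptions, not values from this article.

```python
from typing import Dict, List


def mixed_instances_policy(
    launch_template_name: str,
    instance_types: List[str],
    on_demand_base: int,
    on_demand_pct_above_base: int,
) -> Dict:
    """Build a MixedInstancesPolicy dict: diversified types, On-Demand floor, Spot on top."""
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            # Pattern 1: one override per acceptable instance type.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            # Pattern 5: fixed On-Demand floor plus a percentage of everything above it.
            "OnDemandBaseCapacity": on_demand_base,
            "OnDemandPercentageAboveBaseCapacity": on_demand_pct_above_base,
            # Pattern 2: prefer the Spot pools with the most spare capacity.
            "SpotAllocationStrategy": "capacity-optimized",
        },
    }


policy = mixed_instances_policy(
    launch_template_name="web-tier",  # hypothetical template name
    instance_types=["m5.large", "m5a.large", "m4.large", "m5d.large"],
    on_demand_base=2,
    on_demand_pct_above_base=25,
)
```

With boto3, a dictionary like this would be passed as the `MixedInstancesPolicy` argument to the Auto Scaling client's `create_auto_scaling_group` call, alongside the group name, size limits, and subnets.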
Decision Framework: Spot or On-Demand?
| Criteria | Spot | On-Demand |
|---|---|---|
| Can the workload checkpoint and restart? | ✓ → Use Spot | — |
| Stateless (no local state)? | ✓ → Use Spot | — |
| Is this a primary database? | — | ✓ → Use OD |
| Does 2-min downtime violate your SLA? | — | ✓ → Use OD |
| Dev/staging/CI environment? | ✓ → Use Spot | — |
| Batch, ML training, media processing? | ✓ → Use Spot | — |
| Holds in-memory state that can't be recovered? | — | ✓ → Use OD |
| Long-running stateful connections (WebSocket)? | — | ✓ → Use OD |
Most production systems benefit from a mixed fleet: On-Demand for stateful, critical, or license-constrained workloads — Spot for stateless, batch, and dev layers. Savings Plans cover your On-Demand baseline. Spot handles the variable layer. See our Savings Plans guide for how to layer these discounts.
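The decision table can also be read as a small rule set: any On-Demand row that applies wins, and otherwise any Spot row that applies selects Spot. The sketch below encodes that reading. The field names are hypothetical, and falling back to On-Demand when no row applies is an assumption added here, not something the table states.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Rows that point to On-Demand
    primary_database: bool = False
    sla_breaks_on_2min_outage: bool = False
    unrecoverable_memory_state: bool = False
    long_lived_connections: bool = False
    # Rows that point to Spot
    can_checkpoint: bool = False
    stateless: bool = False
    non_production: bool = False  # dev, staging, CI
    batch_like: bool = False      # batch, ML training, media processing


def recommend(w: Workload) -> str:
    """Return 'on-demand' or 'spot' following the decision table."""
    blockers = (
        w.primary_database,
        w.sla_breaks_on_2min_outage,
        w.unrecoverable_memory_state,
        w.long_lived_connections,
    )
    if any(blockers):
        return "on-demand"
    spot_signals = (w.can_checkpoint, w.stateless, w.non_production, w.batch_like)
    if any(spot_signals):
        return "spot"
    # Assumption: with no signal either way, take the conservative option.
    return "on-demand"
```

For example, `recommend(Workload(stateless=True))` yields `"spot"`, while adding `primary_database=True` to the same workload flips the answer to `"on-demand"`.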
Find the Best Spot Prices
CloudBench tracks live Spot and On-Demand prices for 600+ instance types across all AWS regions. See exactly how much you can save before you commit.