@davidgu: we had an incident because we ...
@davidgu
Jun 03, 2026
2
s3 partitions are AWS’s dirty little hack
Last year, S3’s partitioning system took down our production service.
S3 partitions put a hard limit on the amount of traffic your bucket can serve.
They are totally opaque, can change unexpectedly, and are impossible to configure (unless you call AWS support!).
3
S3 feels like magic. Bottomless storage, instant retrieval and infinite scalability.
Turns out that it’s just a massive fleet of servers with real capacity limits, and those servers return 5xx errors when overloaded.
We run millions of EC2 instances a day and regularly see S3 errors in our logs.
4
Last year we had downtime because we overwhelmed S3 server capacity.
It was triggered during a migration to a new S3 bucket used for production data.
During the post-mortem we discovered the obscure S3 quirk that brought down our service.
5
Straight from the AWS docs: “Your application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per partitioned Amazon S3 prefix”.
We shard our data across extremely granular S3 prefixes, so it should have been impossible for us to hit these limits.
Confused by this apparent contradiction, we dug deeper.
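To put numbers on the quoted figures, here is a small sketch of the arithmetic. The per-partition rates come from the docs quote above; the function name and the partition counts are made up for illustration. The docs say "at least", so read the result as a documented baseline, not an exact ceiling.

```python
# Baseline request rates per partitioned S3 prefix, taken from the AWS docs quote.
WRITE_RPS_PER_PARTITION = 3_500  # PUT/COPY/POST/DELETE
READ_RPS_PER_PARTITION = 5_500   # GET/HEAD


def baseline_rps(partitions: int) -> dict[str, int]:
    """Return the documented baseline throughput for a given partition count."""
    if partitions < 1:
        raise ValueError("partitions must be >= 1")
    return {
        "write": partitions * WRITE_RPS_PER_PARTITION,
        "read": partitions * READ_RPS_PER_PARTITION,
    }


print(baseline_rps(1))   # {'write': 3500, 'read': 5500}
print(baseline_rps(10))  # {'write': 35000, 'read': 55000}
```

The number that matters is how many partitions S3 has actually created, not how many prefixes you have named.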
6
We found that S3 partitioned prefixes are NOT the same as S3 prefixes.
If you create an A/B/C prefix structure, the S3 service can dynamically choose to partition your data by A/B, or even just A!
And their load balancer will distribute load evenly across S3 partitions (not prefixes!).
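Here is a toy model of why that matters. It is not S3's actual algorithm: it simply assumes a partition is identified by the first `depth` segments of a key. The key names, request rates, and both helper functions are hypothetical.

```python
from collections import Counter

READ_RPS_LIMIT = 5_500  # documented GET/HEAD baseline per partitioned prefix


def partition_of(key: str, depth: int) -> str:
    """Toy assumption: the partition is the first `depth` segments of the key."""
    return "/".join(key.split("/")[:depth])


def hot_partitions(rps_by_key: dict[str, int], depth: int) -> dict[str, int]:
    """Sum per-key request rates by partition and return those over the limit."""
    load = Counter()
    for key, rps in rps_by_key.items():
        load[partition_of(key, depth)] += rps
    return {p: rps for p, rps in load.items() if rps > READ_RPS_LIMIT}


# Hypothetical traffic: four fine-grained prefixes at 2,000 GET/s each.
traffic = {
    "A/B1/C1/obj": 2_000,
    "A/B1/C2/obj": 2_000,
    "A/B2/C1/obj": 2_000,
    "A/B2/C2/obj": 2_000,
}

print(hot_partitions(traffic, depth=3))  # {} (partitioned by A/B/C)
print(hot_partitions(traffic, depth=2))  # {} (A/B1 and A/B2 carry 4,000 each)
print(hot_partitions(traffic, depth=1))  # {'A': 8000} (partitioned by A only)
```

Same keys, same traffic: only the partition boundary moved, and the bucket went from comfortable to throttled.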
7
S3 has an invisible background job that analyzes the structure of your prefixes and your access patterns, then dynamically selects a partitioning strategy on a per-bucket basis.
Even crazier is that you can call AWS support and tell them to “pre-partition” your bucket!
8
The previous bucket had intelligently adapted its partitioning to our specific traffic pattern.
The new bucket had zero history and its default partitioning was a terrible fit for our workload.
Fun fact: when an S3 server is overloaded, it returns a 503 Slow Down.
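The usual client-side response to a 503 Slow Down is to retry with exponential backoff and jitter. Below is a minimal sketch under assumptions: `SlowDown` is a stand-in exception, and `with_backoff` and its parameters are illustrative names, not part of any AWS SDK. The AWS SDKs ship their own retry handling, which is likely the better choice in production.

```python
import random
import time


class SlowDown(Exception):
    """Stand-in for an S3 '503 Slow Down' response (hypothetical type)."""


def with_backoff(call, max_attempts=5, base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Call `call()`, retrying on SlowDown with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except SlowDown:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Cap the exponential delay, then sleep a random time below the cap.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))


# Usage: a fake request that is throttled twice, then succeeds.
responses = iter([SlowDown(), SlowDown(), "ok"])


def fake_get():
    result = next(responses)
    if isinstance(result, Exception):
        raise result
    return result


print(with_backoff(fake_get, sleep=lambda s: None))  # ok
```

Backoff only smooths over short bursts. It would not rescue a new bucket whose partitioning is a poor fit for sustained production load.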

