
Cache Layers vs Storage Classes for Performance


Improving S3 Performance Through Caching vs. Storage Classes

In November 2023, Amazon S3 Express One Zone became generally available: a new storage class for AWS customers whose most demanding workloads need more performance than S3 Standard can deliver. Although S3 Express One Zone provides persistent storage, its use cases overlap significantly with traditional S3 caching scenarios, which raises an important question: does this development make S3 caching obsolete?

The S3 Express One Zone Storage Class

  • Speed: Up to 10x faster data access than S3 Standard, with consistent single-digit millisecond latency
  • Scale: Supports up to 2 million reads and 200,000 writes per second
  • Transaction Costs: Read and write operations cost only 25% of what they do with S3 Standard
  • S3 API Interface: Uses the same familiar interface as any standard S3 bucket

These improvements make clear that S3 Express One Zone targets workloads with extremely high transactions per second (TPS). That performance comes at a cost, both financially and in durability, which you should weigh carefully before choosing it for a high-performance application.

  • Storage Costs: Storage is 3-4x more expensive than S3 Standard
  • Redundancy: Data is stored only within one AZ (hence the name "One Zone"), making it less durable than S3 Standard
  • Network Dependency: Performance benefits are most significant when your compute resources are in the same AZ as your storage; these advantages diminish considerably when they're not co-located
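Whether the storage premium pays for itself depends on how many requests each stored gigabyte serves. The sketch below is a minimal model, not a pricing calculator: it prices the premium tier as multiples of S3 Standard using the ratios quoted in this article (storage at 3.5x, the midpoint of 3-4x, and requests at 25%). The function names are hypothetical, the unit prices are whatever you pass in, and data transfer and any other charges are ignored, so check current AWS pricing before relying on the result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    stored_gb: float          # average data stored over the month, in GB
    requests_millions: float  # requests issued during the month, in millions


def monthly_cost(w: Workload, storage_price: float, request_price: float) -> float:
    """Storage plus request charges for one month at the given unit prices."""
    return w.stored_gb * storage_price + w.requests_millions * request_price


def compare_tiers(w: Workload, std_storage_price: float, std_request_price: float,
                  storage_multiplier: float = 3.5,
                  request_multiplier: float = 0.25) -> dict:
    """Compare S3 Standard with a premium tier priced as multiples of Standard.

    Defaults follow the ratios quoted in this article (3-4x storage, 25% requests);
    substitute current published prices before drawing conclusions.
    """
    standard = monthly_cost(w, std_storage_price, std_request_price)
    premium = monthly_cost(w, std_storage_price * storage_multiplier,
                           std_request_price * request_multiplier)
    return {"standard": standard, "premium": premium,
            "premium_cheaper": premium < standard}


def break_even_requests_per_gb(std_storage_price: float, std_request_price: float,
                               storage_multiplier: float = 3.5,
                               request_multiplier: float = 0.25) -> float:
    """Millions of requests per stored GB per month above which the premium tier is cheaper.

    premium < standard  <=>  G*s*(m_s - 1) < R*r*(1 - m_r)
                        <=>  R/G > s*(m_s - 1) / (r*(1 - m_r))
    """
    extra_storage_per_gb = (storage_multiplier - 1.0) * std_storage_price
    savings_per_million = (1.0 - request_multiplier) * std_request_price
    return extra_storage_per_gb / savings_per_million
```

Calling `break_even_requests_per_gb` with your Region's current S3 Standard prices gives the request density a workload needs before the request discount outweighs the storage premium; below that density, the premium tier costs more.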

In short, the performance gains come with tradeoffs and are greatest when compute runs in the same AZ as the bucket. If your compute is distributed globally, network latency erodes most of Express One Zone's advantage. If the tradeoffs are acceptable, the official AWS documentation explains how to get started with Express One Zone.
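Because the benefit hinges on co-location, it can be worth checking that compute and bucket share an AZ. Express One Zone stores data in directory buckets whose names embed the Availability Zone ID in the form `base-name--zone-id--x-s3`. The hypothetical helpers below parse that suffix and compare it with the compute's AZ ID, which on EC2 you could read from the instance metadata path `placement/availability-zone-id`. The regular expression is an assumption about the shape of zone IDs, not an official validator.

```python
import re
from typing import Optional

# Directory bucket names end in "--<zone-id>--x-s3", e.g. "logs--use1-az4--x-s3".
# Assumed zone-ID shape: accepts IDs like "use1-az4" and Local Zone style
# IDs like "usw2-lax1-az1".
_DIRECTORY_BUCKET = re.compile(
    r"^(?P<base>[a-z0-9][a-z0-9-]*?)--(?P<zone>[a-z0-9]+(?:-[a-z0-9]+)*-az\d+)--x-s3$"
)


def bucket_zone_id(bucket_name: str) -> Optional[str]:
    """Return the AZ ID embedded in a directory bucket name, or None if absent."""
    match = _DIRECTORY_BUCKET.match(bucket_name)
    return match.group("zone") if match else None


def is_colocated(bucket_name: str, compute_zone_id: str) -> bool:
    """True if the compute runs in the same AZ ID as the directory bucket."""
    zone = bucket_zone_id(bucket_name)
    return zone is not None and zone == compute_zone_id
```

AZ IDs are used rather than AZ names because names such as `us-east-1a` can map to different physical zones in different accounts, while zone IDs identify the same zone everywhere.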

S3 Cache Layers

The type of cache layer you choose depends on your specific use case. You might need a high-performance cache for a particular workload or a global cache to reduce latency for geographically distant requests. Fortunately, several options are available.

CloudFront CDN

CloudFront CDN is Amazon's content delivery network service that can be used to cache S3 objects at edge locations around the world.

  • Functions as a standard CDN to accelerate content delivery: CloudFront caches content at edge locations closer to end users, minimizing data travel distance.
  • Reduces latency to single-digit milliseconds: For cache hits, serving content from nearby edge locations is significantly faster than fetching it directly from S3.
  • Particularly valuable for high-traffic, globally distributed users: CloudFront's distributed architecture excels with global audiences, handling large request volumes efficiently while maintaining performance.
  • Offers limited advantages when compute resources and users share location: When your servers and users are in the same region or availability zone, CloudFront's benefits are reduced since network distance is already minimal.
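Why the CDN helps distant users far more than nearby ones falls out of a simple expected-latency model: hits are served from the edge, and misses pay the trip back to the S3 origin. The function names and every latency figure below are hypothetical, and the model ignores tail latency, TLS setup, and cache-fill behavior.

```python
def expected_latency_ms(hit_ratio: float, edge_ms: float, miss_ms: float) -> float:
    """Mean latency when hits are served at the edge and misses go back to the S3 origin."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * edge_ms + (1.0 - hit_ratio) * miss_ms


def cdn_speedup(hit_ratio: float, edge_ms: float, miss_ms: float, direct_ms: float) -> float:
    """How many times faster the CDN path is than direct S3 access (>1 means faster)."""
    return direct_ms / expected_latency_ms(hit_ratio, edge_ms, miss_ms)


# Hypothetical figures: a user far from the bucket's Region vs. one next to it.
distant = cdn_speedup(hit_ratio=0.5, edge_ms=10, miss_ms=160, direct_ms=150)
nearby = cdn_speedup(hit_ratio=0.5, edge_ms=10, miss_ms=30, direct_ms=20)
```

With these made-up numbers the distant user sees roughly a 1.8x improvement (150 ms direct vs. an 85 ms average), while the nearby user sees none (20 ms either way), matching the point that CloudFront adds little when users and servers are already close.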

While Amazon S3 Express One Zone and CloudFront CDN can both act as an S3 cache, the use cases where each is effective overlap very little. A CDN generally helps scale your infrastructure globally while improving security, whereas S3 Express One Zone is designed for high-performance workloads running in a single AZ.

Archil

Archil is a third-party solution designed specifically for high-performance S3 caching, with performance comparable to Amazon S3 Express One Zone. It acts as a POSIX-compliant S3 cache for applications that need both performance and flexibility. Archil mounts as file storage on your EC2 instance, caches data from S3, and operates as a write-back cache, consolidating multiple writes into one to reduce costs.
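Write-back caching is what lets several writes collapse into one upload: writes land in the local cache and are marked dirty, and only the latest version of each object is sent to S3 when the cache flushes. The class below is a generic, minimal sketch of that technique, not Archil's implementation; `upload` is a caller-supplied stand-in for an S3 PUT, and all names are hypothetical.

```python
from typing import Callable, Dict, Optional, Set


class WriteBackCache:
    """Generic write-back cache: writes stay local until flush, then each dirty object is uploaded once."""

    def __init__(self, upload: Callable[[str, bytes], None]) -> None:
        self._upload = upload              # stand-in for an S3 PUT
        self._data: Dict[str, bytes] = {}  # latest local copy of each object
        self._dirty: Set[str] = set()      # keys changed since the last flush

    def write(self, key: str, data: bytes) -> None:
        """Overwrite the local copy; repeated writes to one key coalesce."""
        self._data[key] = data
        self._dirty.add(key)

    def read(self, key: str) -> Optional[bytes]:
        """Return the local copy, including writes not yet flushed."""
        return self._data.get(key)

    def flush(self) -> int:
        """Upload each dirty object once and return the number of uploads."""
        uploads = 0
        for key in sorted(self._dirty):    # sorted() copies, so discarding below is safe
            self._upload(key, self._data[key])
            self._dirty.discard(key)       # clear only after the upload returns
            uploads += 1
        return uploads
```

Three writes to the same key followed by one flush cost a single PUT instead of three. Because each key leaves the dirty set only after its upload returns, a failed flush leaves the remaining objects dirty for a retry; a real write-back cache must also protect buffered writes, since unflushed data is lost if the host fails.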

Just like Amazon S3 Express One Zone, Archil specializes in demanding workloads that require high TPS and extremely low latency. However, Archil offers several distinct advantages:

  • POSIX Compliance: Allows developers to interact with S3 as a POSIX-compliant file system, reducing code changes needed for the cache
  • Pay-As-You-Use Model: S3 Express One Zone is persistent storage, so you pay its storage premium continuously. Archil, as a true cache, evicts data after its TTL expires, saving the cost of keeping rarely accessed data in a premium tier

Archil offers a more dynamic performance solution, letting you move S3 objects in and out of the cache quickly. Archil caches data only when it's accessed, whereas S3 Express One Zone is a persistent storage layer that requires data to be stored in that tier in advance to gain any performance benefit. If you don't know which S3 objects will need fast access, or can't justify the cost of moving your entire bucket to S3 Express One Zone, Archil is a great alternative.
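Caching on access, combined with TTL eviction, is the read-through pattern: the first read of an object fetches it from S3, later reads are served locally until the entry's TTL expires, and expired entries can be evicted to free space. The class below is a minimal, hypothetical sketch of that pattern with an injectable clock for testing; `fetch` stands in for an S3 GET and is not a real API.

```python
import time
from typing import Callable, Dict, Tuple


class ReadThroughTTLCache:
    """Fetches an object on first access and serves it locally until its TTL expires."""

    def __init__(self, fetch: Callable[[str], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch                # stand-in for an S3 GET
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[bytes, float]] = {}  # key -> (data, expires_at)
        self.misses = 0                    # number of reads that went to S3

    def get(self, key: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                # fresh hit: no S3 request
        data = self._fetch(key)            # miss or expired: read from S3
        self.misses += 1
        self._entries[key] = (data, now + self._ttl)
        return data

    def evict_expired(self) -> int:
        """Drop expired entries so only recently used data occupies cache space."""
        now = self._clock()
        expired = [k for k, (_, expires_at) in self._entries.items() if expires_at <= now]
        for k in expired:
            del self._entries[k]
        return len(expired)

    def __len__(self) -> int:
        return len(self._entries)
```

Nothing is staged ahead of time: only objects that are actually read ever occupy cache space, and they leave once their TTL passes, which is the cost behavior contrasted above with a persistent storage tier.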

Other S3 Caching Solutions

Beyond S3 Express One Zone and Archil, few managed solutions effectively accelerate S3 for demanding workloads. Other caching options exist, but they have significant limitations:

  • ElastiCache/Redis: Functions effectively as a cache, but performance deteriorates and costs rise as object sizes increase, typically becoming impractical for objects larger than 128 MB
  • MinIO: An S3-compatible object storage system that can be self-hosted for performance comparable to Amazon S3 Express One Zone, but it requires manual infrastructure management and lacks seamless AWS service integration. The performance gains also materialize only when your compute runs alongside MinIO (for example, both on-premises); otherwise, network latency erodes them
  • Compute-adjacent EBS volumes: While not technically a cache, high-performance instances with attached EBS volumes can function as an improvised cache for frequently accessed data, though this approach demands custom implementation
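The custom implementation an improvised EBS cache demands might start as small as the sketch below: a byte-bounded, least-recently-used (LRU) cache that stores fetched objects as files in a local directory (which could sit on an EBS volume) and evicts the least recently used entries when the byte budget is exceeded. All names are hypothetical, `fetch` stands in for an S3 GET, and objects larger than the budget are served without being cached.

```python
import hashlib
from collections import OrderedDict
from pathlib import Path
from typing import Callable


class LocalDiskLRU:
    """Byte-bounded LRU cache of S3 objects stored as files in a local directory."""

    def __init__(self, root: str, max_bytes: int, fetch: Callable[[str], bytes]) -> None:
        self._root = Path(root)
        self._root.mkdir(parents=True, exist_ok=True)
        self._max = max_bytes
        self._fetch = fetch                               # stand-in for an S3 GET
        self._sizes: "OrderedDict[str, int]" = OrderedDict()  # key -> size, oldest first
        self.used = 0                                     # bytes currently cached

    def _path(self, key: str) -> Path:
        # Hash the key so arbitrary S3 keys map to safe, flat file names.
        return self._root / hashlib.sha256(key.encode()).hexdigest()

    def get(self, key: str) -> bytes:
        if key in self._sizes:
            self._sizes.move_to_end(key)                  # mark as most recently used
            return self._path(key).read_bytes()
        data = self._fetch(key)
        if len(data) > self._max:
            return data                                   # too large to cache; serve directly
        while self.used + len(data) > self._max:
            oldest, size = self._sizes.popitem(last=False)  # evict least recently used
            self._path(oldest).unlink()
            self.used -= size
        self._path(key).write_bytes(data)
        self._sizes[key] = len(data)
        self.used += len(data)
        return data
```

This keeps its index in memory and is not thread-safe, so a production version would need to rebuild the index from disk on restart, add locking, and decide how to invalidate entries when the S3 object changes.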

S3 Cache or Storage Class, Who Wins?

Choosing between S3 Express One Zone and a dedicated cache layer depends on your specific requirements. S3 Express One Zone works best for workloads that need consistently warm data access. Archil, in contrast, excels when access patterns are unpredictable, because it can evict less frequently accessed items instead of paying to keep them in a premium tier. While the market offers few managed solutions for boosting S3 performance, both S3 Express One Zone and Archil provide reliable, resilient services for handling the most demanding workloads.
