S3 Intelligent Tiering
What is S3 Intelligent Tiering
Navigating S3's storage classes can be challenging, with eight-plus options currently available and potentially more to come. Tiering provides cost benefits by aligning storage with your data access patterns: some tiers pair higher storage costs with lower per-request API costs (GET, PUT, POST, etc.), while others invert that pricing structure. Unless your access patterns are highly predictable (which is rare), determining the optimal tier for each object is complex and resource-intensive. S3 Intelligent Tiering addresses this challenge by managing tier placement automatically, enabling you to realize storage cost savings:
- Frequent Access tier: This is the default tier for S3 Intelligent Tiering. Your objects start here and there's no additional fee.
- Infrequent Access tier: Objects that haven't been accessed for 30 consecutive days automatically move to this tier, saving about 40% on storage costs.
- Archive Instant Access tier: Objects not accessed for 90 consecutive days move here, saving about 50% on storage costs while maintaining millisecond access.
- Archive Access tier: With optional activation, objects not accessed for 90+ days move here, saving up to 68% on storage with retrieval times of minutes to hours.
- Deep Archive Access tier: Also optional, objects not accessed for 180+ days move to this tier, saving up to 95% with retrieval times of hours.
All of these tier transitions happen automatically, with no operational overhead and no performance impact (until the optional Archive Access tiers), and there are no retrieval charges when accessing objects in any tier. When an object is accessed, it moves back to the Frequent Access tier and the lifecycle begins again. Up-to-date pricing can be found on the AWS S3 pricing page.
Very Important Note: S3 Intelligent Tiering charges a monitoring and automation fee of $0.0025 per 1,000 objects monthly to cover the costs of tracking access patterns and automatically moving objects between storage tiers.
Why S3 Intelligent Tiering Usually Saves Money
Most of the time, S3 Intelligent Tiering will save you money compared to using a single storage class. Here's why:
- Automatic cost optimization without latency impact: For the Infrequent Access and Archive Instant Access tiers, you maintain millisecond access times while reducing storage costs by up to 50%.
- Better returns with larger files: The benefits of Intelligent Tiering increase with larger file sizes because the fixed monitoring cost per object becomes proportionally smaller compared to the storage savings.
- Set-and-forget approach: Once enabled, you don't need to actively manage or predict access patterns, reducing operational overhead while still optimizing costs.
The monitoring fee becomes negligible as your average object size grows, especially when your data's access patterns vary over time. In many situations, automatic tiering yields net savings within the first few billing cycles.
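To get a feel for where the break-even point sits, here is a rough back-of-the-envelope sketch in Python. The per-GB rates are illustrative (roughly us-east-1 at the time of writing) and the helper function is hypothetical; check the AWS S3 pricing page before relying on the output.

```python
# Back-of-the-envelope break-even check: does Intelligent Tiering's
# monitoring fee pay for itself for a given average object size?
# Rates are illustrative and will drift; consult the AWS pricing page.

MONITORING_FEE_PER_OBJECT = 0.0025 / 1000  # $ per object per month
STANDARD_PER_GB = 0.023                    # S3 Standard, $ per GB-month
INFREQUENT_PER_GB = 0.0125                 # Infrequent Access tier, $ per GB-month

def monthly_net_savings(object_size_mb: float) -> float:
    """Net $/object/month once an object settles in the Infrequent Access tier."""
    size_gb = object_size_mb / 1024
    storage_savings = size_gb * (STANDARD_PER_GB - INFREQUENT_PER_GB)
    return storage_savings - MONITORING_FEE_PER_OBJECT

for size_mb in (0.128, 1, 16, 128):
    print(f"{size_mb:>8} MB -> net {monthly_net_savings(size_mb):+.7f} $/object/month")
```

At these illustrative rates, the fee outweighs the savings for objects below roughly 250KB, which lines up with the 128KB monitoring cutoff discussed in the next section.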
When not to use S3 Intelligent Tiering
S3 Intelligent Tiering is great: you get near-automatic savings, so why would you ever not use it? There are a few cases where enabling it can actually cost you money:
- You know your access pattern: If your data has a predictable access pattern, you may be better off manually assigning the appropriate storage tier rather than paying the monitoring fee for intelligent tiering.
- Small objects: For very small objects, the monitoring fee ($0.0025 per 1,000 objects per month) might exceed the storage savings you'd get from tiering. S3 Intelligent Tiering automatically excludes objects smaller than 128KB from monitoring fees and automatic tiering, so the fee only applies to larger objects; even so, objects just above that threshold may not be large enough for the tiering savings to cover the fee.
- Short-lived objects: If your objects are only stored for a short period (less than 30 days), they won't benefit from automatic tiering to lower-cost tiers.
- Consistently frequent access: If you access your objects regularly and consistently (never going 30+ days without access), they'll always stay in the Frequent Access tier, meaning you pay the monitoring fee without gaining tiering benefits.
When deciding whether to use S3 Intelligent Tiering, consider your workload's access patterns, object sizes, and retention periods. For unpredictable access patterns with longer-term storage needs, the automatic cost optimization typically outweighs the monitoring fee.
Optimizing for Intelligent Tiering
Even though Intelligent Tiering handles most of the overhead on its own, there are a few ways to save some extra money on your AWS bill.
Consolidate into Larger Files
Combining smaller files into larger ones is one of the most effective strategies for maximizing S3 Intelligent Tiering benefits. By consolidating files, you significantly reduce the impact of the per-object monitoring fees, which becomes negligible relative to storage costs for larger objects.
For AI and data analytics workloads, consider bundling datasets into larger archives or using formats like Parquet that support efficient columnar storage. This approach not only optimizes costs but also improves performance by reducing the number of API calls and incurring Time to First Byte (TTFB) latency less often.
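As a concrete illustration, here is a minimal sketch that bundles a directory of small files into one compressed archive before upload. The bucket name, key, and paths are placeholders, and it assumes a Unix-like system with boto3 credentials already configured.

```python
# Minimal sketch: bundle many small local files into one compressed tar
# archive before upload, so Intelligent Tiering monitors one large object
# instead of thousands of tiny ones. Bucket, key, and paths are placeholders.
import tarfile
from pathlib import Path

import boto3

def upload_bundle(src_dir: str, bucket: str, key: str) -> None:
    archive = Path("/tmp/bundle.tar.gz")  # assumes a Unix-like temp dir
    with tarfile.open(archive, "w:gz") as tar:
        for path in Path(src_dir).rglob("*"):
            if path.is_file():
                tar.add(path, arcname=str(path.relative_to(src_dir)))
    boto3.client("s3").upload_file(
        str(archive), bucket, key,
        ExtraArgs={"StorageClass": "INTELLIGENT_TIERING"},
    )

upload_bundle("./daily-logs", "example-bucket", "bundles/2024-01-01.tar.gz")
```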
Activate Optional Archive Tiers
For comprehensive cost optimization, enable both optional archive tiers within your S3 Intelligent Tiering configuration. By default, S3 Intelligent Tiering only moves objects between the Frequent, Infrequent, and Archive Instant Access tiers.
The Archive Access tier provides up to 68% savings with retrieval times of minutes to hours for data that hasn't been accessed for 90+ days. Meanwhile, the Deep Archive Access tier offers up to 95% savings for data you're willing to wait hours to access after 180+ days of no activity.
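If you manage buckets with code, both archive tiers can be switched on via a bucket-level Intelligent Tiering configuration. A minimal boto3 sketch, with placeholder bucket and configuration names:

```python
# Enable the optional archive tiers for a bucket. Names are placeholders.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_intelligent_tiering_configuration(
    Bucket="example-bucket",
    Id="archive-everything",
    IntelligentTieringConfiguration={
        "Id": "archive-everything",
        "Status": "Enabled",
        "Tierings": [
            # Objects untouched for 90 days move to Archive Access...
            {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},
            # ...and after 180 days to Deep Archive Access.
            {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"},
        ],
    },
)
```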
These tiers are particularly valuable for backup data, compliance archives, and historical datasets that rarely need retrieval but must be retained for regulatory or business purposes. However, if your use case cannot handle the performance drawbacks that come with the archive tiers, there are other options available.
Implement Strategic S3 Tagging
A well-planned tagging strategy allows you to selectively apply intelligent tiering where it makes the most financial sense. Create lifecycle configurations that target specific tag combinations to control which objects use intelligent tiering versus standard storage classes.
Consider tagging by data importance, project, or expected access patterns. For example, you might tag datasets with `access-pattern:variable` for intelligent tiering and `access-pattern:archival` for direct placement in Glacier Deep Archive.
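Those tags can then drive lifecycle rules. A hypothetical boto3 sketch, with placeholder bucket and tag values matching the example above:

```python
# Lifecycle rules keyed off object tags. Names and tag values are placeholders.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {   # Variable access patterns: hand off to Intelligent Tiering.
                "ID": "variable-to-intelligent-tiering",
                "Status": "Enabled",
                "Filter": {"Tag": {"Key": "access-pattern", "Value": "variable"}},
                "Transitions": [{"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}],
            },
            {   # Known-archival data: go straight to Deep Archive.
                "ID": "archival-to-deep-archive",
                "Status": "Enabled",
                "Filter": {"Tag": {"Key": "access-pattern", "Value": "archival"}},
                "Transitions": [{"Days": 0, "StorageClass": "DEEP_ARCHIVE"}],
            },
        ]
    },
)
```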
Tags can also help with cost allocation and tracking savings from intelligent tiering across different business units, providing visibility into storage optimization efforts.
Leverage Storage Class Analysis
Before fully implementing intelligent tiering across your storage, use AWS Storage Class Analysis to gain insights into your actual usage patterns. This tool helps identify access patterns across your S3 buckets and provides data-driven recommendations for optimal storage classes.
Run analysis for 30-60 days to capture typical usage patterns before making configuration decisions. The findings will help you determine which buckets or prefixes would benefit most from intelligent tiering versus static storage classes.
Storage Class Analysis can also help identify unexpected access patterns that might make certain datasets poor candidates for intelligent tiering.
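Storage Class Analysis is configured per bucket, optionally scoped to a prefix, with findings exported as CSV for inspection. A possible boto3 setup, with placeholder bucket names, IDs, and prefixes:

```python
# Set up Storage Class Analysis with a CSV export. Names are placeholders.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_analytics_configuration(
    Bucket="example-bucket",
    Id="access-pattern-study",
    AnalyticsConfiguration={
        "Id": "access-pattern-study",
        "Filter": {"Prefix": "datasets/"},  # analyze one prefix at a time
        "StorageClassAnalysis": {
            "DataExport": {
                "OutputSchemaVersion": "V_1",
                "Destination": {
                    "S3BucketDestination": {
                        "Format": "CSV",
                        "Bucket": "arn:aws:s3:::example-analytics-bucket",
                        "Prefix": "storage-class-analysis/",
                    }
                },
            }
        },
    },
)
```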
Combine with Object Lifecycle Policies
For a comprehensive approach to storage optimization, combine intelligent tiering with targeted lifecycle policies. Use lifecycle rules to expire or delete objects that are no longer needed, rather than paying for long-term storage in any tier.
Create transition rules for objects with predictable access patterns while using intelligent tiering for those with variable patterns. This hybrid approach ensures you're using the most cost-effective storage option for each type of data.
For example, you might use lifecycle policies to move log files directly to Glacier after 90 days, while keeping customer data in intelligent tiering to accommodate unpredictable access needs.
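A sketch of that hybrid setup in boto3 (names are placeholders); note that `put_bucket_lifecycle_configuration` replaces the bucket's entire lifecycle configuration, so all rules must go in a single call:

```python
# Hybrid lifecycle: static transitions for predictable data, Intelligent
# Tiering for variable data. Bucket name and prefixes are placeholders.
import boto3

boto3.client("s3").put_bucket_lifecycle_configuration(
    Bucket="example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {   # Predictable pattern: logs straight to Glacier after 90 days...
                "ID": "logs-to-glacier",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                # ...and expired once they're no longer needed.
                "Expiration": {"Days": 365},
            },
            {   # Variable pattern: customer data managed by Intelligent Tiering.
                "ID": "customer-data-intelligent",
                "Status": "Enabled",
                "Filter": {"Prefix": "customer-data/"},
                "Transitions": [{"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}],
            },
        ]
    },
)
```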
Burst Workload Challenges: Why AI/ML Patterns Don't Fit S3 Tiering Models
A common access pattern for S3 is burst usage, where data remains idle for a while and then is accessed very frequently for a short period of time. This burst pattern is particularly common in AI/ML workloads where data is trained on intensively, then left idle. Unfortunately, no current storage class effectively optimizes for this use case.
In the case of burst usage patterns like AI/ML workloads, traditional storage classes struggle to provide optimal cost efficiency:
- Standard S3: While providing immediate access, keeping all data in Standard S3 means paying premium rates even during long idle periods, resulting in unnecessary costs.
- S3 Infrequent Access: This tier has lower storage costs but charges higher retrieval fees. During burst access periods, those retrieval costs can quickly accumulate and may exceed what you would have paid with Standard storage.
- S3 Intelligent Tiering: While better than static tiers, even Intelligent Tiering has limitations. When objects move to lower-cost tiers and then experience sudden burst access, they must first transition back to the Frequent Access tier, which takes time and can result in a delay before optimal performance is restored.
The ideal solution for burst workloads would be a storage class that combines low storage costs during idle periods with no retrieval fees and instantaneous performance when access patterns surge. Unfortunately, this specific optimization remains a gap in the current S3 storage class offerings.
To mitigate this issue, organizations often implement custom solutions like maintaining a "hot" copy of data in Standard S3 during anticipated burst periods while keeping a "cold" copy in lower-cost tiers for long-term retention, effectively creating their own hybrid approach to handle these cyclical access patterns.
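A hypothetical warm-up step for such a hybrid might look like the following; bucket names and keys are placeholders, and note that objects sitting in the opt-in archive tiers must be restored before they can be copied.

```python
# Hypothetical "warm-up" for an anticipated burst: stage a Standard-class
# copy of each object in a hot bucket, then point the workload at it.
# Names are placeholders; copy_object caps out at 5GB per object,
# beyond which a multipart copy is needed.
import boto3

s3 = boto3.client("s3")

def warm_up(cold_bucket: str, hot_bucket: str, keys: list[str]) -> None:
    for key in keys:
        s3.copy_object(
            Bucket=hot_bucket,
            Key=key,
            CopySource={"Bucket": cold_bucket, "Key": key},
            StorageClass="STANDARD",
        )

warm_up("example-cold-archive", "example-hot-cache", ["data/shard-0001.parquet"])
```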
However, managed solutions are often a better choice than building this yourself, since the S3 savings will likely not outweigh the developer costs. Companies like Archil implement a managed S3 cache that pulls an object into the cache when it is accessed during a workload, then serves reads and writes from the cache to reduce S3 API fees. At the end of the workload, the cached data is written back to S3, so the customer pays for only one read and one write per object rather than the many updates typical of these workloads.
Smart S3 Storage Optimization: Key Takeaways
S3 Intelligent Tiering automatically moves data between storage tiers based on access patterns, eliminating manual management while saving up to 95% on storage costs. Though not ideal for small objects or consistently accessed data, it benefits most workloads with variable access patterns.
To maximize savings: consolidate small files, enable archive tiers, implement strategic tagging, and complement with lifecycle policies. For specialized burst workloads like AI/ML training, the current S3 tiering system has inherent limitations.
Third-party caching solutions like Archil have emerged to address these limitations by intelligently buffering data during intensive processing periods, reducing API costs, and optimizing storage tier transitions. These specialized services can provide significant cost advantages for cyclical workloads without requiring complex custom implementations, making them worth evaluating as part of a comprehensive S3 optimization strategy.