Stop Syncing and Start Mounting: A Smarter Alternative to aws s3 sync
If you’re still relying on aws s3 sync to move data between your local directory and Amazon S3, you’re not alone. It’s one of the first tools engineers reach for in the AWS Command Line Interface: simple, familiar, and effective for basic file transfers. But as your datasets grow, the limitations of syncing start to show. Repetitive transfers, growing latency, and fragile CLI workarounds turn what should be a solved problem into a persistent bottleneck.
Why Everyone Starts with aws s3 sync
For most engineers, aws s3 sync is the go-to command when you need to move files to or from Amazon S3. It's part of the AWS CLI toolkit and feels intuitive, especially for anyone familiar with Unix-like sync or recursive copy patterns. The command does exactly what it promises: it syncs a local directory with an S3 bucket (or vice versa), copying new and updated files and skipping the rest.
Here’s a simple example:
bash
aws s3 sync ./data s3://my-bucket/data
This command syncs the contents of the local ./data directory to the specified bucket. You can also use options to control behavior, like --delete to remove files in the destination that no longer exist locally, or --exclude to skip specific files. In theory, this gives you a lightweight, reproducible way to manage object storage.
For small projects, creating backups, or one-off transfers, aws s3 sync gets the job done. It supports flags like --exact-timestamps and --acl bucket-owner-full-control, and it lets you override the command’s default endpoint URL or configure output formats via the cli_binary_format setting. If your workflow only deals with a handful of files and minimal metadata, syncing may feel like the simplest solution.
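Here’s what a fuller invocation tends to look like once a few of those options come into play (an illustrative sketch only; the bucket name, paths, and patterns are placeholders):
bash
aws s3 sync ./data s3://my-bucket/data \
  --exclude "*.tmp" \
  --delete \
  --exact-timestamps \
  --acl bucket-owner-full-control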
But as we’ll see in the next section, simplicity comes at a cost.
But Syncing Has Serious Limitations
At first glance, aws s3 sync seems like a powerful tool, but the deeper your data infrastructure goes, the more cracks start to show. What works for simple file uploads or basic backups doesn’t scale well when you’re dealing with constantly changing datasets, large directories, or data-intensive workflows.
Performance Bottlenecks for Large Datasets
Every time you run the sync command, the AWS CLI compares your local directory to the contents of the target bucket. This works fine for a dozen files, but for thousands (or millions) the overhead compounds quickly. Since the sync operation is batch-based, there’s no concept of continuous access or low-latency interaction. It’s always check, then copy.
And because the sync command decides what to copy by comparing file metadata, edge cases involving object metadata, server-side encryption, or default metadata settings can introduce inconsistencies or false positives.
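One partial mitigation, if you stay on sync, is to spell these settings out on every run instead of relying on defaults; a minimal sketch (the encryption choice here is just an example):
bash
aws s3 sync ./data s3://my-bucket/data \
  --sse AES256 \
  --exact-timestamps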
“Syncing can quietly become a performance tax—especially when it starts competing with compute jobs for bandwidth or time.”
— Archil Engineering Lead, Harrison Leath
Operational Complexity Grows Quickly
The moment your project outgrows its first bucket, you’re juggling flags. You’re specifying file types to exclude (--exclude '*.tmp'), adding ACLs (--acl bucket-owner-full-control), managing delete behavior (--delete), and ensuring the cli_binary_format setting is properly configured. That’s before you even get into global options like --no-paginate or --no-cli-pager, which are often required for automation.
In larger teams, someone inevitably runs a sync that deletes the wrong files or fails silently, especially when using tools that don’t surface the command output clearly. The result: files vanish, backups overwrite production data, or permissions are mishandled.
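In practice, teams end up pinning several of these options just to get predictable, quiet output in automation; an illustrative CI-style invocation (the bucket and paths are placeholders):
bash
# Terse, non-interactive output for CI logs
aws s3 sync ./artifacts s3://my-bucket/artifacts \
  --only-show-errors \
  --no-progress \
  --no-cli-pager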
Risk of Data Loss or Duplication
One of the most dangerous pitfalls of sync is the --delete flag. Used improperly, it can remove files in the destination that don’t exist in the source, even if those files were placed there intentionally by another service or team. A --dryrun pass can preview what a run would do, but it’s easy to skip, and beyond that you’re left scripting your own guardrails around flags like --exact-timestamps and other CLI input parameters.
And since aws s3 sync doesn’t offer transactional guarantees, sync interruptions (due to timeouts, SSL verification errors, or socket connect timeouts) can leave your storage layer in a half-complete state.
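A common guardrail is to preview a destructive sync before running it for real; a small sketch of that habit (paths and bucket are placeholders):
bash
# Preview what would be copied or deleted, without touching anything
aws s3 sync ./data s3://my-bucket/data --delete --dryrun

# If the plan looks right, run it for real
aws s3 sync ./data s3://my-bucket/data --delete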
Common Sync Pain Points Devs Deal With
Beyond performance and reliability, aws s3 sync introduces a surprising amount of low-level friction, especially once your usage goes beyond basic file uploads. These aren’t edge cases. They’re the daily reality for developers managing data pipelines and multi-environment deployments.
Here are some common frustrations that come up again and again:
- Lack of visibility before execution: Unless you write wrapper scripts or simulate actions with careful filters, it’s hard to preview what a sync operation will do. The sync command moves objects in bulk, often with minimal feedback in the command output.
- Fragile filtering and exclusions: Syncing only the files you need can require a patchwork of --exclude, --include, and --exact-timestamps (see the example after this list). Miss one, and you might accidentally upload files that shouldn’t be there, or delete ones that should.
- Inconsistent metadata handling: Sync doesn’t always preserve object metadata, caching behavior, or server-side encryption settings unless they’re explicitly specified with flags like --metadata-directive or --sse. This can lead to subtle bugs or downstream failures.
- Error-prone CLI ergonomics: Misconfigured defaults like the cli_binary_format setting, or automatic behaviors like the CLI pager and auto-prompt, can trip up automation scripts. Many engineers run into vague errors or overly verbose logs unless they fine-tune flags like --only-show-errors.
- Limited symbolic link support: If your local file structure uses symlinks, a common pattern in ML projects, aws s3 sync dereferences them rather than preserving them as links, since S3 has no native symlink concept. This can silently break expectations about file presence or path dependencies.
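The filtering patchwork is trickier than it looks because filters are evaluated in order, with later rules taking precedence; an illustrative example of the classic idiom (paths are placeholders):
bash
# Upload only CSV files: exclude everything, then re-include the pattern
aws s3 sync ./reports s3://my-bucket/reports --exclude "*" --include "*.csv"

# Reverse the order and the trailing exclude wins, so nothing is uploaded
aws s3 sync ./reports s3://my-bucket/reports --include "*.csv" --exclude "*"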
If your sync operation involves repeated overrides or excessive debug logging, you're probably using the wrong abstraction for the job.
When Syncing Makes Sense
Despite its drawbacks, aws s3 sync still has a place in modern cloud workflows, especially when simplicity and portability matter more than speed or scale. For some use cases, syncing is exactly the right tool.
Here’s where the sync command shines:
- One-time file transfers: Migrating static assets, such as .txt files or archived logs, from a local directory to a bucket.
- Basic backups: Creating point-in-time copies of the current local directory, especially in CI pipelines or developer environments.
- Cold storage pushes: Sending infrequently accessed files to S3 storage tiers using the AWS Command Line Interface, without needing real-time access.
- Air-gapped workflows: When the source and target systems can’t remain continuously connected, sync provides a batch transfer model that’s easy to reason about.
- Team-wide scripting: When you want a single, portable command that just works across machines with the AWS CLI installed.
If you’re running a small-scale workload, managing a few thousand files, or just need a way to push files matching certain patterns to a specified bucket, syncing may be all you need.
However, once you step into dynamic workflows, especially ones where you’re repeatedly syncing files that already exist, overriding metadata, or working with large datasets, the operational tax adds up quickly.
What Mounting Offers That Syncing Doesn’t
If syncing feels increasingly brittle as your workloads scale, it's because it was never meant to behave like a file system. It's a copy tool—nothing more. Mounting, on the other hand, fundamentally changes how you interact with cloud storage by treating S3 like a live, navigable, and responsive file system.
Here’s what mounting unlocks:
Real-Time Access to S3 Files
Mounted volumes let you access files on demand, without having to re-transfer, re-sync, or manage version state. No more sync-or-recursive-copy loops, and no scripting around which new and updated files changed. Files are simply... there, just like in a traditional local filesystem.
This approach eliminates sync drift entirely. If a file exists in the bucket, it’s accessible immediately from your current local directory, without triggering a background transfer job or reconciling timestamps.
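With a mount in place, ordinary shell commands are all you need; a hypothetical session (assuming the bucket is mounted at ~/data, as in the Archil example later in this article, and with made-up file names):
bash
ls ~/data/models/                 # browse objects like directories
cat ~/data/configs/prod.yaml      # read an object as if it were a local file
grep -r "batch_size" ~/data/configs/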
POSIX Compatibility
Archil’s mounted volumes behave like a standard Linux-compatible file system. That means:
- You can use standard Unix commands (cat, cp, ls) without worrying about sync flags
- Your code and pipelines don’t need to change
- You get proper file semantics, including support for symbolic links, permission flags, and predictable behavior with large file trees
This is especially powerful for teams working with legacy tools or batch-processing systems that assume files are on-disk—not in a remote object store.
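For example, a batch job that assumes its inputs are on local disk can be pointed at the mount directly; a hypothetical sketch (the script, paths, and file types are placeholders):
bash
# Hypothetical training job that expects an on-disk directory
python train.py --input-dir ~/data/training-set

# Standard file tooling works against the same mount
find ~/data -name "*.parquet" | head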
Zero Data Movement
Unlike sync, which always copies data between endpoints, a mounted volume streams files directly from S3. This means:
- No duplicate storage usage from keeping copies of files in both source and destination
- No re-uploading the same files because of minor metadata changes
- No confusion around delete behavior, sync state, or whether the last command used the correct flags
The mount behaves like a window into the bucket, not a mirror of it.
“Once you stop thinking in terms of file transfer and start thinking in terms of file presence, everything gets faster and simpler.”
— Archil Founder, Hunter Leath
Archil in Action: Mounting S3 the Right Way
Archil replaces the repetitive, error-prone nature of aws s3 sync with a single, declarative step: mount your S3 bucket as if it were a local file system. No data gets copied unless it’s accessed. No scripting gymnastics. No sync loops.
Here’s what it looks like in practice:
bash
archil mount s3://my-bucket ~/data
This command creates a live, POSIX-compliant view of your bucket at ~/data. Files appear in your local directory structure instantly. There’s no sync command constantly rechecking for new and updated files, no concern about the cli_binary_format setting, and no risk of conflicting states caused by multiple tools trying to “fix” the same folder.
Archil also respects S3’s native configurations, whether you’re using access point aliases, server-side encryption, or bucket-owner-full-control ACLs. Under the hood, it uses a custom caching and streaming protocol to deliver sub-second access to S3 objects, even in large-scale environments.
And because Archil is built for Unix-style environments, it works seamlessly with common shell tools, data pipelines, and AI/ML workflows, without requiring you to rewire everything just to avoid syncing.
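In practice, the change usually amounts to replacing a sync step with a mount step; a hedged before-and-after sketch (the bucket, paths, and processing script are placeholders, and the mount syntax follows the example above):
bash
# Before: copy everything down, then work on the copy
aws s3 sync s3://my-bucket/data ./data
./process.sh ./data

# After: mount once and work against the live view
archil mount s3://my-bucket ~/data
./process.sh ~/data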
Better Performance, Lower Costs
Every time you run aws s3 sync, you’re not just moving data; you’re paying for it. Syncing large directories leads to increased storage usage, more API calls, and duplicated data transfer, especially when working with S3 across multiple environments. Even small mistakes, like resyncing files due to changed object metadata or forgotten flags, can rack up real costs.
Archil avoids this entirely. By mounting the bucket as a live file system, it fetches only what you need, when you need it, without copying files unnecessarily. There’s no storage duplication, no wasted transfer bandwidth, and no extra compute spent scanning the current local directory for differences.
No More Paying to Re-Sync the Same Files
The savings go beyond just dollars:
- You avoid delays caused by excessive sync runs and S3 rate limits
- You don’t need to script around socket connect timeouts or retry logic
- You reduce the risk of incurring charges from re-uploading the same files, or of deleting files with the wrong flag
“One of our customers reduced their monthly S3 I/O costs by 78% just by swapping out their daily sync jobs for a single Archil mount.”
— Archil Founder, Hunter Leath
For teams working with large models, real-time analytics, or high-churn datasets, sync becomes an unnecessary tax. Mounting replaces all of that with something faster—and cheaper.
When to Still Use aws s3 sync
Despite its limitations, aws s3 sync isn’t going away anytime soon, and it still makes sense in a few specific scenarios.
The Right Tool for the Right Moment
Here’s when sync might be the better option:
- One-time data pushes from a source directory during setup or migration
- Backup jobs that run overnight and require only simple diffs of uploaded files
- Air-gapped workflows where mounting isn’t possible or supported
- Static archiving of folders using --delete, exclusion filters, or simple inclusion patterns
If your use case doesn’t require low latency or live access, and if you’re comfortable parsing the AWS CLI user guide and fine-tuning settings like cli_binary_format, sync can still serve you well.
In these cases, the following sync command might still be all you need:
bash
aws s3 sync ./backup s3://my-archive-bucket --delete --exclude "*.tmp"
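Because that command includes --delete, it’s worth previewing the run before letting it loose; the same command with --dryrun shows what would change without touching anything:
bash
aws s3 sync ./backup s3://my-archive-bucket --delete --exclude "*.tmp" --dryrun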
But It Doesn’t Scale
The more often you run syncs, the more you’re exposed to fragility: inconsistent metadata handling, unpredictable file states, and CLI flags that break silently when a default value is misapplied. Sync wasn’t built to deliver fast iteration, real-time access, or high availability.
For teams dealing with fast-moving data or pipelines that need to function properly at scale, mounting simply removes the guesswork.
Final Thoughts: Don’t Sync When You Can Mount
The aws s3 sync command served its purpose: it gave developers a simple way to push files to S3 from a local machine. But cloud workloads have evolved. Teams now need real-time file access, predictable performance, and fewer brittle scripts. Mounting S3 directly offers a better foundation.
With Archil, you avoid the complexity, fragility, and cost of traditional sync-based workflows. Here’s what you no longer have to worry about:
- Overriding the command’s default endpoint URL or tweaking the aws configure command
- Troubleshooting unclear file transfer progress or broken automation flows
- Managing default output formats, credential files, access key IDs, and secret access keys across environments
- Handling permissions and ownership via canonical account IDs and granted permissions
- Remembering to verify SSL certificates or disable automatic pagination in CI/CD pipelines
- Accidentally breaking encryption settings like server-side encryption or customer-provided encryption keys
- Losing or duplicating metadata from a misused metadata directive or caching-behavior setting
- Breaking pipelines due to mishandled symbolic links or Unix-style quotation rules
- Digging through the AWS Console to debug a silent failure in your sync job
“We built Archil so you can interact with S3 like it’s part of your machine—not a remote object you have to constantly chase down.”
— Archil Founder, Hunter Leath
The takeaway: if your infrastructure depends on syncing, it's probably time to switch to something built for scale, speed, and sanity.
Try It Yourself
Ready to stop syncing and start mounting?
Archil lets you turn any S3 bucket into a high-performance, POSIX-compliant file system, without changing your application code, cloud environment, or folder structure.
- Get instant access to your S3 data with a single mount command
- Reduce AWS costs by eliminating unnecessary sync jobs
- Streamline dev, ML, and analytics workflows with real-time, on-demand access
👉 **Get started with Archil now** and make your data infrastructure simpler, faster, and future-proof.