Archil: from a file system to a data company
June 2, 2025 • 6 min read

Hey there, I’m Hunter, the founder of Archil (you might remember us from our HN post as Regatta Storage last fall). Archil is building the next foundational primitive for cloud storage. Archil replaces services like EBS or Hyperdisk with volume storage that’s infinite, shareable between instances, and automatically connected to massive data sets in S3. Sign up for the waitlist here.
We’ve spent the last 6 months, heads down, making this vision a reality. Today, we’re thrilled to announce that we’ve raised $6.7M in seed funding led by Felicis, with participation from great investors like Y Combinator, Peak XV, Wayfinder, General Catalyst, Lombardstreet Ventures, and Twenty Two Ventures, as well as angels including Theo Browne (T3 chat), Wayne Duso (formerly Amazon), Erik Bernhardsson (Modal Labs), Ryan Worl (Warpstream), and Amit Gupta (formerly Benchling). Building foundational infrastructure is really difficult, and we’re grateful to have investors who share our vision for the journey.
We’re excited about how this funding will improve our ability to deliver a meaningful improvement for cloud applications. But, instead of focusing on the round, I want to talk about what we’re building, why it matters, and where we’re headed.
What’s wrong with cloud storage?
There’s a very simple problem with every cloud provider out there today, and it looks like this:

When you go to start a new server in AWS, Azure, or GCP, every cloud provider hits you with the same question. How much storage do you want to attach to this server? Frankly, it doesn’t make sense that we still have to answer this question in the era of cloud computing, nearly 20 years after this text box first appeared inside of EC2.
Why are we asked to guess how much storage we need before launching an instance? Why should anyone have to learn what an IOPS is, or pay for unused space, or manually figure out how to transfer data to the new instance?
Things get even harder if you’re using containers with something like Kubernetes, because now you (by default) lose your data on every process restart. How do you deploy a 200 GiB model to a Kubernetes pod? Do you slow down starts by baking it into the image, or by pulling it in from S3?
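To make that trade-off concrete, here’s a rough sketch of the usual workaround (the bucket, paths, and serve command are hypothetical): pull the weights from S3 in a startup step, which keeps the image small but makes every cold start pay the full download.

#!/usr/bin/env bash
# Hypothetical pod startup step for a 200 GiB model: every restart re-downloads
# the weights into ephemeral storage before the server can come up.
set -euo pipefail
MODEL_DIR=/models/llm                              # ephemeral, lost on restart
MODEL_S3=s3://example-bucket/weights/llm-200g/     # hypothetical bucket and prefix
mkdir -p "$MODEL_DIR"
aws s3 cp "$MODEL_S3" "$MODEL_DIR" --recursive     # minutes of cold start, every time
exec ./serve --model-dir "$MODEL_DIR"              # placeholder for the real entrypoint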
I’ve been obsessed with solving this problem for nearly 10 years. I spent 8 years trying to fix it from inside the cloud, at Amazon’s Elastic File System, and then I spent 1 year outside the cloud, at Netflix, trying to understand how teams actually approach these questions.
Today, developers have to take time away from building features that matter for their users to worry about low-level data placement questions. They have to reinvent the same brittle synchronization stack — chunking, checkpointing, and pushing data to S3, while dodging cold start penalties. Building this well is so hard that entire companies, like Databricks, Snowflake, Motherduck, and Clickhouse are built around this core technology as a moat. But in the era of GPUs, model weights, and training data sets, the data is only getting bigger, and the synchronization pipelines more complex.
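To give a flavor of that hand-rolled stack, here’s a hedged sketch (the paths, bucket, and interval are made up) of the kind of checkpoint-sync loop teams end up writing and babysitting themselves:

#!/usr/bin/env bash
# Hypothetical DIY checkpoint pipeline: write locally, periodically push to S3,
# and accept that anything written after the last sync is lost on a crash.
set -euo pipefail
CKPT_DIR=/scratch/checkpoints                       # local scratch space
CKPT_S3=s3://example-bucket/training-run/ckpts      # hypothetical destination
while true; do
  aws s3 sync "$CKPT_DIR" "$CKPT_S3" --only-show-errors   # upload only changed files
  sleep 300                                               # five-minute exposure window
done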
If you ask application developers what they really want, they’ll say they don’t want to spend time worrying about this complexity at all. Let me show you what we ask when you create an Archil volume, to give you a sense of the difference.

We ask you for a name, a cloud region, and the location where the data should come from. That’s it. When you start your instance and mount the Archil volume,
sudo archil mount $VOLUME_NAME
you get a POSIX-compatible file system (so you can write to it!), mountable on more than one instance, that’s instantly populated with the data from your bucket. To ensure that our users get great performance, we’ve written a blazing-fast custom data protocol (not NFS!), and we run a ton of NVMe SSDs for durably caching reads and writes before they are propagated back to your S3 bucket.
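As a rough usage sketch (the mount point and file names below are assumptions for illustration, not documented paths), working with the volume afterwards is just ordinary file I/O:

# After `sudo archil mount $VOLUME_NAME`, the volume behaves like a local disk.
cd /mnt/$VOLUME_NAME                       # assumed mount point, for illustration only
ls datasets/                               # objects from the linked S3 bucket appear as files
cp datasets/train/shard-0001.bin /tmp/     # reads are served through the NVMe caching layer
echo "run complete" > logs/latest.txt      # writes land on the volume, then propagate to S3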
With Archil volumes, you don’t have to worry about data infrastructure because all of your data is always on all of your instances, instantly — all for 90% lower costs than just using EBS and 30x lower latencies than using S3 directly.
We think this is an exciting step forward for developers who no longer need to worry about how they move data around, for large organizations who are running older POSIX-based software, and for teams who want to reduce their cloud costs. However, since it’s 2025, I also need to talk about AI. In the past few months, Replit, Supabase, and Neon have shown us a future where agents drive the majority of new application development. These agents prefer to use infrastructure that’s cloud-agnostic, easy to launch, and scales to zero. Until now, though, the only kind of infrastructure that you could buy this way was databases. Fly has found that these agents really want to use POSIX file systems, and Archil’s cloud-agnostic, infinite, pay-as-you-go volumes finally fill this gap. If you’re a platform for building agentic applications or a developer struggling with this problem, we’d love to chat about how we can make things easier for you.
We aren’t a storage company.
Over the next year, we’ll accomplish our goal of becoming the highest-performance storage in the cloud by combining our local-like custom storage protocol with Lustre-like scale-out performance for HPC and AI training workloads. However, we think that describing ourselves as “storage” is limiting. Traditionally, “storage” products do two jobs: they take bytes from the user, and then give them back when asked. This means that the only way storage products differ is in performance: how can we get those bytes faster?
At Archil, we think storage is an amazing point in the stack for leverage. We can unlock a 10x better cloud developer experience by taking a different approach: pushing as much work into the underlying “storage” layer as possible, including:
Connectivity. Storage doesn’t exist without data, and it’s not useful to have a great storage system if you can’t even access the data you need. This is why Archil volumes connect directly to S3 buckets to provide local-like, instant access to massive data sets, but data lives in more places than S3. Hugging Face models, GitHub repositories, internal data lakes, SFTP servers, external data products, and data warehouses all contain critical application data. But today, we expect developers to do all of the work of moving the data around when they need it. Archil will become the clearinghouse for data, and Archil volumes will connect to any data source — instantly manifesting it on the instance, as if the data were local.
Compute. Generally, once you’re connected to your data, you’ll want to do something with it. Sometimes this is interactive — you’re returning data to a user or performing some kind of edit. Frequently, though, you’re just transforming the data. Whether it’s reformatting, aggregating, or indexing, these low-level transformations could run far more efficiently on the same compute where the data already lives. So we’re making that possible — exposing serverless, raw data transformations, right on our storage layer.
Intelligence. As data sets grow and applications multiply, managing them — tracking which version was used where, and deciding whether to move compute to data or data to compute — is becoming a full-time job. Today’s tools paper over the problem, but these sit outside the storage layer. Archil is different: we’re building versioning, locality-awareness, and access controls into the volume, so the right data shows up in the right place, with the right guarantees — no pipeline glue required. We believe storage should collaborate with your applications, not just serve them.
As a result, Archil isn’t a storage company; it’s a data company. Our mission isn’t just to store your data better and faster (though we’ll do that too). We want to blur the line between storing, managing, and using data — and make the process radically simpler. We want to become the data company.
What’s next?
We’ve spent the past 6 months hardening, optimizing, and readying the service for General Availability, and we’re excited to work with you. If you think that Archil volumes could be a good fit for your application, we’d love for you to sign up for our waitlist here before we release them generally. Our vision is enormous, and if you’re interested in helping to build the world’s first “data company”, then let us know here — we’re hiring right now in SF. Finally, we’re planning some more interesting technical content for the blog, including more details on how the product works, how we avoid monopolistic egress fees, and what our initial customers are able to achieve by switching to Archil volumes. Watch this space; we’ll be sharing more technical details soon.
Authors
Hunter Leath
Founder and CEO
