nr378 22 hours ago [-]
Based on the docs and API surface, I think the filesystem abstraction is probably copy-on-mount backed by object storage.
I suspect it works as follows: when a task starts, filesystem contents sync down from S3/R2/GCS to a local directory, which gets bind-mounted into the container. The agent reads and writes normally - no FUSE, no network round-trips per file op. On task completion or explicit sync, changes flush back to object storage. The presigned URL support for upload/download is the giveaway that object storage is the source of truth.
This makes way more sense than FUSE for agent workloads. Agents do thousands of small reads (find, grep, git status) that would each be a network call with FUSE. With copy-on-mount it's all local disk speed after initial sync.
Cross-task sharing falls out naturally - two tasks mounting the same filesystem ID just means two containers syncing from the same S3 prefix. Probably last-write-wins rather than distributed locking, which is fine since agents rarely have concurrent writes to the same file.
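A minimal sketch of the copy-on-mount idea described above, under loud assumptions: a local directory stands in for the S3/R2/GCS prefix, and `sync_down`, `flush_back`, and the `fs-123` filesystem ID are hypothetical names, not Terminal Use's API. It only illustrates why two tasks sharing one prefix would end up with last-write-wins behaviour.

```python
import shutil
import tempfile
from pathlib import Path


def sync_down(remote: Path, local: Path) -> None:
    """Copy the whole 'bucket prefix' into a local working directory."""
    shutil.copytree(remote, local, dirs_exist_ok=True)


def flush_back(local: Path, remote: Path) -> None:
    """Copy every local file back; whichever task flushes last wins."""
    for src in local.rglob("*"):
        if src.is_file():
            dst = remote / src.relative_to(local)
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)


def demo() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        remote = root / "bucket" / "fs-123"  # stand-in for an object-store prefix
        remote.mkdir(parents=True)
        (remote / "notes.txt").write_text("v0")

        # Two tasks mount the same filesystem ID: each gets its own local copy.
        task_a, task_b = root / "task_a", root / "task_b"
        sync_down(remote, task_a)
        sync_down(remote, task_b)

        # All reads and writes during the task hit local disk only.
        (task_a / "notes.txt").write_text("from A")
        (task_b / "notes.txt").write_text("from B")

        # No locking: B flushes after A, so B's version survives.
        flush_back(task_a, remote)
        flush_back(task_b, remote)
        return (remote / "notes.txt").read_text()
```

Calling `demo()` shows the trade-off: task A's write is silently overwritten, which is acceptable only if concurrent writes to the same file really are rare.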
vivekraja 21 hours ago [-]
That's a good analysis:) We want to go with FUSE but the performance overhead, especially with multiple calls to use files, is a constraint
dangoodmanUT 6 hours ago [-]
How have you determined that? You can easily push 6GB/s+, sub ms ttfb with networked filesystems, and hundreds of thousands of iops through fuse.
smithclay 17 hours ago [-]
sprites.dev / fly.io has publicly said they are using a variant of JuiceFS for the object-storage-to-VM-filesystem stuff, it's cool tech.
Really interesting platform — the decoupled filesystem model makes a lot of sense for long-running agents.
One area I'd love to understand better: inter-agent communication and auditability. When multiple agents share the same filesystem (e.g., a coordinator agent and several sub-agents), how is message passing or state handoff handled? Is it purely file-based (agents read/write to agreed-upon paths), or is there a more structured IPC mechanism?
More importantly, from an audit perspective: is there a way to replay or inspect the full sequence of reads/writes and agent messages across a multi-agent task? For production use cases (document processing, internal tooling), being able to trace why an agent made a decision — and which files it read at that moment — feels like a hard requirement. Curious whether this is on the roadmap or expected to be handled at the application layer.
vivekraja 11 hours ago [-]
We do have a communication protocol between the agents, but it's quite rudimentary. It allows sending messages and creating tasks for other agents. The state module for a particular task is accessible by other agents as well.
We're experimenting with multi-agent systems to figure out what the right API would be for agent-to-agent communication. We've found Claude Code's Team feature is a good starting point for the abstraction, but we think there are better abstractions and are creating the primitives to allow people to create custom definitions to explore.
Re: audit perspective. We have something we've been working on that we're excited to share soon which I think you'll like:)
aerhardt 15 hours ago [-]
Holy emdash, you real?
MrQianjinsi 13 hours ago [-]
Ha, real human here! I'm a Chinese developer. I composed the question myself in Chinese and had AI translate it to English, hence the em dashes. The underlying curiosity about inter-agent auditability is genuinely mine though.
3rodents 13 hours ago [-]
A polite request: English and Chinese are very different languages, asking AI to "translate" your thoughts sanitizes what you have to say, your words lose all of your personality -- a great shame. Participation from non-English speakers is wonderful but rather than use AI to "translate", using a literal translator (e.g: Google Translate, DeepL) will ensure we get to hear what you have to say, not what AI thinks you want to say.
The English that English speakers post on Hacker News is often grammatically incorrect, clumsy, misspelled, and that's okay, good, even. We want to hear from you!
(My preference is for translators to include both the original Chinese words, and the English translation because it means your fellow Chinese speakers get to read your exact words, but of course that is personal preference :)).
auggierose 11 hours ago [-]
That's actually a quite rude and condescending request.
3rodents 10 hours ago [-]
How so?
agenthustler 7 hours ago [-]
[dead]
nicklo 21 hours ago [-]
Congrats on the launch! As the agent CLIs and SDKs were built for local use, there’s a ton of infra work needed to run these agents in production. Genuinely excited for this space to mature.
I have been building an OSS self-hostable agent infra suite at https://ash-cloud.ai
Happy to trade notes sometime!
adi4213 1 days ago [-]
This is really interesting, congrats on the launch.
The use case I’m trying to solve for is building a coding agent platform that reliably sets up our development stack well. Few questions!
In my case, I’m trying to build a one-shot coding agent platform that nicely spins up a docker-in-docker Supabase environment, runs a NextJS app, and durably listens to CI and iterates.
1) Can I use this with my ChatGPT pro or Claude max subscription?
2)
vivekraja 23 hours ago [-]
We don't support docker-in-docker yet, but that's something on our short term roadmap. We have the need for this ourselves! For now, you could call a different service to spin up your sandbox with the image of your codebase. Not ideal, but this is what we do now.
Yes, you can use your own subscriptions as long as you follow their guidelines
shykes 21 hours ago [-]
Dagger (dagger.io) has its own container execution stack, and supports dagger-in-dagger natively, with logical scoping, and without depth limit. Would love to show you both a demo, if you're interested!
(Disclaimer, I'm the CEO of Dagger)
I founded Docker, and lack of proper nesting support was always a pet peeve of mine. I couldn't fix it in Docker, so I fixed it in Dagger instead :)
jsunderland323 23 hours ago [-]
Hey, I'm working on this problem (also a YC company, but it's FOSS). It's a DinD approach: https://coasts.dev/. I wonder if this works for your setup.
punkpeye 19 hours ago [-]
Cool project!
jsunderland323 19 hours ago [-]
Thanks! Still ironing out early kinks but I have a couple of friends using it. It’s been a joy to work on.
agenticbtcio 4 hours ago [-]
One gap worth thinking about for the roadmap: what happens when a deployed agent needs to pay for something?
Right now agents can read/write files and call APIs — but when they hit an L402-protected endpoint, need to pay a contractor agent, or need to settle a micropayment for compute, there’s no native primitive for that. It’s the last mile that keeps agent workflows from being fully autonomous.
We built agenticbtc-mcp as an MCP server that gives agents autonomous payment capability (Lightning/NWC, Strike, Coinbase, USDC) with scoped spend limits per agent key. Curious whether Terminal Use has thought about native payment infrastructure or whether that’s expected to come from the agent’s own tooling.
CharlesW 1 days ago [-]
> We built Terminal Use to make it easier to deploy agents that work in a sandboxed environment and need filesystems to do work.
When I read this, I think of Fly.io's sprites.dev. Is that reasonable, or do you consider this product to be in a different space? If the latter, can you ELI5?
filipbalucha 24 hours ago [-]
We overlap at the sandbox layer, but we're focused more on the layer above that: packaging agent code + deploying/versioning it, managing tasks over time, handling message persistence, and attaching durable workspaces to those tasks.
Eridrus 15 hours ago [-]
There clearly needs to be something in this space, but I can't imagine the world standardizing on a closed source system for this infra.
I know OSS business models are rough, but someone is going to solve this in open source and I think that is what will achieve traction.
biddit 15 hours ago [-]
Yep. And there will be 50 clones on GitHub by end of week. It’s just how it is now.
CrispAI 16 hours ago [-]
The version pinning approach for existing tasks is a pattern I've found really useful in practice. When you're building document processing workflows (transcripts, reports, etc.) the ability to iterate on your agent logic without retroactively breaking existing user sessions is underrated.
One question: for the "existing tasks stay on old version" case, do you support any kind of manual migration trigger? E.g. if I fix a genuine bug in how I'm parsing a document, I might want to re-run the agent on specific old workspaces with the new version, rather than waiting for users to start new tasks.
filipbalucha 12 hours ago [-]
Good question.
> for the "existing tasks stay on old version" case, do you support any kind of manual migration trigger
Yes, we support manually migrating tasks using "tu tasks migrate".
> if I fix a genuine bug in how I'm parsing a document, I might want to re-run the agent on specific old workspaces with the new version, rather than waiting for users to start new tasks
In this case the better pattern is to create new tasks against those old workspaces on the fixed version. You could do this on behalf of your users.
rubyrfranklin2 18 hours ago [-]
We've been running filesystem-based agents internally at heyvid for about six months now, and the deployment story has honestly been the messiest part. You end up hand-rolling so much scaffolding that's not your core product. The 'Vercel for agents' framing clicks immediately — I remember when Vercel did the same thing for frontend and it just removed a whole category of yak-shaving. Curious how Terminal Use handles state persistence across runs; that's been our biggest headache when agents need to resume mid-task.
thesiti92 1 days ago [-]
have you guys found any of the existing nfs tools helpful (archil, daytona volumes, ...) or did you have to roll your own? i guess i have the same question for checkpointing/retrying too. it feels like the market of tools is very up in the air right now.
huntaub 1 days ago [-]
howdy! two things on the archil front:
1. we're not NFS, we wrote our own protocol to get much better performance
2. we're planning on coming out with native branching this month, which should make these kinds of workloads much easier to build!
stavrosfil 23 hours ago [-]
Yep, this whole area still feels pretty unsettled. The thing we've become convinced of is that workspace state needs to be a first-class product primitive instead of something tied to one sandbox. That's why we model filesystems separately from tasks and focus on durable mount/sync semantics.
We're currently rolling our own but we've been meaning to experiment with other tools.
verdverm 1 days ago [-]
I'm using Dagger to checkpoint and all the fun stuff that can come after
antonio-mello 15 hours ago [-]
The filesystem-as-first-class-primitive approach is interesting. I've been building MCP servers (Model Context Protocol) for infrastructure management — letting AI assistants interact with Proxmox clusters, ClickHouse databases, etc. — and the hardest part isn't the agent logic itself, it's the lifecycle around it: state persistence, deployment, and making sure the agent can't accidentally destroy things.
Your two-step deployment (config.yaml + Dockerfile) with separate storage looks like it solves the iteration speed problem well. One thing I've found critical in practice: for agents that touch real infrastructure, you need explicit confirmation gates on destructive operations. Does Terminal Use have any built-in patterns for that, or is it left to the agent implementation?
chenxi9649 11 hours ago [-]
I'm currently using them to build an AI agent similar to Lovable/Replit-esque in tech stack and it works.
I started by managing the Claude Agent SDK myself in a Daytona container, and it was a lot more challenging than I thought. The agent kept crashing in streaming mode and there was no thread crash, so it was hard to debug, especially in a cloud container like Daytona. I also realized that I needed to implement my own session management system plus my own database if I wanted to save the chat, and on top of that streaming so the messages come out in real time. And I needed to manage my own container janitor/heartbeat system so that unused containers don't just sit there, but I also don't want them to go cold immediately after each message, since a cold start takes a bit. They all seem simple, but at each step there are some edge cases. I ended up vibe coding almost all of that, but it was just quite fragile. For those who haven't tried the agent SDKs, it feels like they're clearly designed to be run on client computers with permanent storage and lots of RAM rather than on microVMs, which was not what I expected.
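For readers who have not built one, the janitor/heartbeat piece mentioned above can be sketched roughly as follows. This is an illustrative assumption, not what the commenter or Terminal Use actually runs: `Janitor`, `heartbeat`, and `reap` are hypothetical names, and a real version would stop the container where the comment indicates.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Janitor:
    """Tracks the last activity per container and reaps the idle ones."""

    idle_timeout_s: float
    last_seen: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, container_id: str, now: Optional[float] = None) -> None:
        # Called on every user message or agent event to keep the container warm.
        self.last_seen[container_id] = time.monotonic() if now is None else now

    def reap(self, now: Optional[float] = None) -> List[str]:
        # Called periodically; returns the containers that exceeded the idle timeout.
        now = time.monotonic() if now is None else now
        expired = [
            cid for cid, seen in self.last_seen.items()
            if now - seen > self.idle_timeout_s
        ]
        for cid in expired:
            del self.last_seen[cid]  # a real system would stop the container here
        return expired
```

The timeout is the knob for the trade-off described above: a longer value avoids cold starts between messages, a shorter one stops paying for unused containers sooner.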
After that, I tried to find some managed option. Starting with blackbox ai because I saw a vercel tweet about them, and for some reason I just couldn't get their agent API stuff to work AT ALL?... I'm curious as to if it's actually working for anyone. Then I tried sandbox dev, which doesn't store container/sessions/storage stuff out of the box for you, so it wasn't much better than doing the daytona container myself. And then I tried terminaluse, and it worked better than I expected given all of the other stuff that I tried.
So at the end of the day, it's kind of like a managed cloud service that does agent chat history, session recovery, streaming + a CLI that makes it easy for my own codex to debug/deploy + file system sync. From what I can tell, there isn't anyone else that can do all that and I'm pretty pleased with using them.
rjpruitt16 18 hours ago [-]
I'm curious if you guys are seeing rate limiting issues. Agents sharing API keys tend to be retry storm monsters. I wonder how agent companies will address this.
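One common client-side mitigation for retry storms is exponential backoff with full jitter, so that agents sharing a key do not all retry at the same instant. The sketch below is a generic illustration; `backoff_delays` and its default values are assumptions, not anything Terminal Use has described.

```python
import random


def backoff_delays(attempts, base=0.5, cap=30.0, rng=None):
    """Return one randomized sleep time (in seconds) per retry attempt.

    Attempt n draws uniformly from [0, min(cap, base * 2**n)], so the
    window doubles each time but individual clients spread out within it.
    """
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(attempts)]
```

A caller would sleep for each delay in turn between retries and give up once the list is exhausted.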
p0seidon 21 hours ago [-]
When building this, did you ever feel you would prefer to run the actual Claude Code and Codex harnesses for your agents, rather than just the SDKs?
vivekraja 21 hours ago [-]
You can use the default Claude Code harness with Claude Agent SDK (just set the prompt preset to claude code). Same with Codex.
p0seidon 10 hours ago [-]
Would this yield exactly the same behavior when thinking about sub-agents and all this functionality?
messh 24 hours ago [-]
how does it compare to https://shellbox.dev? (and others like exe.dev, sprites.dev, and blaxel.ai)
stavrosfil 23 hours ago [-]
We're trying to be a bit more opinionated one layer up: deployable agent runtimes with first-class tasks, persistent /workspace, and rollout/ops primitives like versions, rollback, logs, and secrets.
For example we make it easy to have automatic deployments from your github ci (using our cli), and you can monitor and manage all your deployments in our platform, along with logs, conversation transcripts etc.
I'd think of us more of the deployment, monitoring and storage layer rather than just the compute runtime.
hamasho 21 hours ago [-]
Hmm, so this is not in the same category as computer use or browser use. I love the idea. A well-defined and controlled sandbox is really useful.
Off topic, but I was disappointed by computer use and browser use when I tried them three months ago. They couldn’t complete many basic tasks. Browser use especially: it easily failed on slightly unorthodox websites. It can’t find a select box implemented with a div, gets stuck in an infinite loop when the submit button is disabled, and it even failed to complete the demo in its own readme! I’m okay with open source projects being a bit buggy, but a VC-funded company that already has a fancy landing page, provides its service to big corps, and offers paid plans should at least make sure the demo works.
vajafafa 7 hours ago [-]
what is launch hn?
oliver236 1 days ago [-]
is this a replacement to langgraph?
vivekraja 23 hours ago [-]
Depends on your agent. We haven't used langgraph, but I'd think it's probably the best solution to deploy langchain agents. We're SDK agnostic. We're like langgraph, but for agents that work in a sandbox and need access to a filesystem to do work.
oliver236 10 hours ago [-]
ok. how is this different from openclaw?
verdverm 1 days ago [-]
Can you explain why everyone thinks we should use new tools to deploy agents instead of our existing infra?
eg. I already run Kubernetes
alexchantavy 1 days ago [-]
I think there are some primitives for agents that need to be built out for better security and being able to reason about them.
Agents run on infra, they have network connectivity, they have ACLs and permissions that let them read+write+execute on resources, they can interact with other agents.
To manage them from both an infra and security perspective, we can use the existing underlying primitives, but it's also useful to build abstractions around them for management, kind of like how microservices encapsulate compute+storage+network together.
I think of agents as basically microservices that can act in non-deterministic ways, and the potential "blast radius" of their actions is very wide. So you need to be able to map what an agent can do, and it's much easier to do that if there are abstractions or automatic groupings instead of doing this all ourselves.
verdverm 1 days ago [-]
Right, those abstractions and controls already exist in the Kubernetes ecosystem. I can use one set of abstractions for everything, as opposed to having something separate for agents. They are not that different, the tooling I have covers it. There are also CRDs and operators to extend for a more DSL like experience.
tl;dr, I don't think the shovel analogy holds up for most of the Ai submissions and products we see here.
devonkelley 23 hours ago [-]
[dead]
takwatanabe 16 hours ago [-]
As a psychiatrist, this problem reminds me of something we studied for a long time.
Patients get worse in areas we are not measuring, but the numbers we record still look normal.
We learned that checking results catches things that checking process cannot catch.
verdverm 22 hours ago [-]
I'd argue it's both. You also want to know when your agent has collapsed and is burning tokens and your budget.
webpolis 24 hours ago [-]
[dead]
hrmtst93837 24 hours ago [-]
I think people pick new tooling not because k8s lacks horsepower, but because running per-user filesystem-backed agents on k8s forces you to build and maintain a surprising amount of glue code. Newer platforms bundle versioned mounts, local-first dev cycles, secure ephemeral runtimes, and opinionated deployment so teams can focus on agent logic instead of writing Helm charts and CSI gymnastics.
If you repurpose k8s with ephemeral volumes or emptyDir and a sidecar, you'll likely get predictable ops and avoid vendor lock-in. Expect more operator work, fragile debugging across PVCs and sidecars, and the need to invest in local emulation or a Firecracker or gVisor sandbox if you want anything like laptop parity.
dangoodmanUT 6 hours ago [-]
There are a lot of reasons for this, but typically it's the same reason you wouldn't use k8s for customer serverless functions: can’t scale fast enough, too slow to place workloads, not isolated by default, configuration explosion, limited multitenancy support, and so much more.
jwoq9118 1 days ago [-]
Unrelated but your comments on https://news.ycombinator.com/item?id=44736176 related to the Terminal agents coding craze have helped me feel less crazy. People using GitHub Copilot CLI and Claude Code, they either never review the code or end up opening up an IDE to review the code, and I'm sitting here like, why don't you use the terminal in your favorite IDE? You're using a Terminal as a chat interface, so why not just use a chat interface? Or use the terminal in VS Code which actually now integrates very well with Claude Code and GitHub Copilot CLI so you can see what's going on across the many files this thing is editing?
The hype is so large with the CLI coding tools I got FOMO, but as you were saying in that thread, I see no tangible improvement to the value I get out of AI coding tools by using the CLI alone. I use the CLI in VS Code, and I use the chat panel, and the only thing that seems to actually make a difference is the "context engineering" stuff of custom instructions, agent skills, prompt files, hooks, custom agents, all that stuff, which works no matter which interface you use to kick off your AI coding instructions.
Would be curious to hear your thoughts on the topic all these months later.
verdverm 1 days ago [-]
Glad to find comradery! I've started the CLI interface to my custom agent since lol
The reasons are (1) it's faster to do admin work like naming or deleting old sessions (2) I have not gotten the remote setup to work yet (haven't tried) but I do want to use it somewhere
But yeah, it's gotten worse, the latest I recall is a new diff viewer for AI in the terminal (I already have git and lazygit)
jwoq9118 21 hours ago [-]
It's hilarious to me how we are recreating decades of IDE advancements such that they work on the terminal, only for us to end up with what is essentially an IDE.
verdverm 20 hours ago [-]
I was doing that with (neo)vim and reached the point that I wanted to stop having to maintain a sorta-IDE. I'm now doing the same with agents (custom vscode extension), but I find this different for a number of reasons, primarily that I don't want Big Ai deciding how I can interact with and use Ai.
One thing I took from ATProto is a strong belief that user agency and choice are the penultimate design criteria. To those ends, I think that any agentic tooling needs to support the majority of users' choice about how to interact with it (SDK, API, CLI, TUI, IDE, and Web). My custom agent is headed that way anyhow, because there are times where I do want to reach for one of them, and it's easier to make it so with agents working on their own codebase (minus vscode because the testing/feedback I haven't figured out yet)
debarshri 1 days ago [-]
I think Kubernetes is a good candidate to run these sandboxes. It is just that you have to do a lot of annotations, node group management, pod security policies, etc., to name a few.
Apply the principle of least privilege for access to mitigate risk.
I think Kata containers with Kubernetes is an even better sandboxing option for these agents to run remotely.
Shameless plug here, but we at Adaptive [1] do something similar.
We already do those things with k8s, so it's not an issue
The permissions issues you mention are handled by SA/WIF and the ADK framework.
Same question to OP, why do you think I need a special tool for this?
instalabsai 1 days ago [-]
We have also built something custom ourselves (with modal.com serverless containers), running thousands of on-demand coding agents each day and already the assumptions that Terminal Use is making (about using the file system and coding agent support) would not work for our use case.
vivekraja 22 hours ago [-]
Curious to hear why we wouldn't work! I'd love to understand what assumptions we're making that won't work for your use case, and what we could work to improve on
verdverm 1 days ago [-]
It seems like so many of the AI "solutions" are hallucinating the problems. I either don't have them, because I use better AI frameworks, or I have tools at hand that solve them nicely.
We don't need to rebuild everything just for agents, except that people think they can make money by doing so. YC has disappointed me of late with the lack of diversity in their companies. I suspect the change in leadership is central to this.
goosejuice 1 days ago [-]
At least on K8s you can control the network policy. That's the harder problem to solve. I suspect we'll see a lot of exfiltration via prompt injection in the next few years.
filipbalucha 14 hours ago [-]
good point! programmable network policy and a gateway to prevent secret exfiltration are on the roadmap.
devonkelley 23 hours ago [-]
[dead]
vivekraja 22 hours ago [-]
Yup! And this is a genuinely hard problem when you try to apply agents to domains other than coding. With coding, you can easily roll back. But in other domains, you take action in the real world and that's not easy to roll back.
We're thinking a lot about how we could provide a "Convex"-like experience where we guide your coding agents to set up your agents in a way that maximizes the ability to roll back. For example, instead of continuously taking action, it's better that agents gather all required context, do the work needed to make a decision (research, synthesize, etc.), and then only take action in the real world at the end. If an agent did bad work, this makes it easy to roll back to the point where the agent gathered all the context, correct its instructions, and try again.
sourishkundu23 12 hours ago [-]
The separation of "gather context → synthesize → act" is a pattern we've found critical in production agent systems too. In LangGraph, we model this as explicit checkpoint nodes — the agent can't proceed to the action phase without passing through a validation gate.
One thing we've run into: the filesystem abstraction works well for code artifacts, but when agents interact with external services (APIs, databases), you need a separate "action journal" that logs intended mutations before executing them. This gives you rollback even for side effects that aren't file-based.
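A rough sketch of what such an action journal could look like, as an assumption-laden illustration rather than the commenter's actual system: `ActionJournal`, the JSON-lines file, and the status values are all hypothetical. The intent is written before the external call runs, so anything left in the `intended` state after a crash is a candidate for compensation or retry.

```python
import json
import tempfile
from pathlib import Path


class ActionJournal:
    """Append-only log of intended external mutations (hypothetical helper)."""

    def __init__(self, path):
        self.path = Path(path)

    def _append(self, record):
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def run(self, action_id, description, do):
        # Record the intent first, so a failure mid-call still leaves a trace.
        self._append({"id": action_id, "status": "intended", "what": description})
        result = do()
        self._append({"id": action_id, "status": "done"})
        return result

    def unfinished(self):
        # Replay the log; the latest status per action id wins.
        status = {}
        if self.path.exists():
            for line in self.path.read_text(encoding="utf-8").splitlines():
                record = json.loads(line)
                status[record["id"]] = record["status"]
        return [aid for aid, s in status.items() if s == "intended"]


def demo():
    def failing_call():
        raise RuntimeError("API down")

    with tempfile.TemporaryDirectory() as tmp:
        journal = ActionJournal(Path(tmp) / "journal.jsonl")
        journal.run("a1", "create invoice", lambda: "ok")
        try:
            journal.run("a2", "delete record", failing_call)
        except RuntimeError:
            pass
        return journal.unfinished()
```

Here `demo()` reports only the second action as unfinished, which is the set a recovery step would need to inspect.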
The harder unsolved problem is multi-agent state coordination. When Agent A writes to a shared workspace and Agent B reads it, the filesystem gives you structural persistence but not semantic ordering. Have you thought about adding lightweight event ordering (like a simple monotonic sequence) to the filesystem layer? Without it, agents can read stale state even when files are fresh.
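To make the ordering idea concrete, here is a small in-memory sketch under stated assumptions: `Log`, `Replica`, and `StaleRead` are hypothetical names, and the list of events stands in for whatever the shared filesystem would record. Each write gets the next sequence number, and a reader refuses to answer until its local copy has caught up to the sequence it was told to wait for.

```python
class StaleRead(Exception):
    """Raised when a replica has not yet applied the required write."""


class Log:
    """Shared, ordered record of writes; the sequence number is the position."""

    def __init__(self):
        self.events = []

    def append(self, path, content):
        seq = len(self.events) + 1
        self.events.append((seq, path, content))
        return seq


class Replica:
    """One agent's local view of the workspace."""

    def __init__(self):
        self.applied_seq = 0
        self.files = {}

    def sync(self, log, up_to=None):
        # Apply events in order, starting after the last one already applied.
        for seq, path, content in log.events[self.applied_seq:up_to]:
            self.files[path] = content
            self.applied_seq = seq

    def read(self, path, min_seq):
        if self.applied_seq < min_seq:
            raise StaleRead(f"have seq {self.applied_seq}, need {min_seq}")
        return self.files[path]
```

Agent A would pass the sequence number returned by `append` along with its message to Agent B, which then calls `read` with that value as `min_seq` instead of trusting whatever file contents it happens to have locally.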
Congrats on the launch — the decoupled storage model is the right primitive.
verdverm 21 hours ago [-]
Given what OP describes
> Our biggest pain point with hosting agents was that you'd need to stitch together multiple pieces: packaging your agent, running it in a sandbox, streaming messages back to users, persisting state across turns, and managing getting files to and from the agent workspace.
The k8s ecosystem already handles most of this, and your agent framework handles the agent specifics. What you are talking about is valid, though a different axis imo. Quality and guardrails are important, but not discussed by OP.
rodchalski 24 hours ago [-]
[dead]
vivekraja 23 hours ago [-]
This is what we see! We want to make it very easy to be able to granularly manage your agents (in terms of files they have access to, env var values, network policy, etc.) on a per-task basis.
With regards to permissions, mileage varies based on SDK. Some have very granular hooks and permission protocols (Claude Agent SDK stands out in particular) while for others, you need a layer above it since it doesn't come out of the box.
There are companies that solve the pain of authn/z for agents and we've been playing with them to see how we could complement them. In general, we do think it's valuable to provide this at the infra level as well, rather than just the application level, since the infra layer is the source of truth for what calls were made, what was blocked, etc.
m11a 23 hours ago [-]
K8s gives you orchestration of Docker containers. I don’t think it handles the container boundary any more than Docker does.
I don’t think it should be assumed to give network isolation, unless you’re also using extensions and something like Cilium for that purpose. I don’t think it’s the right primitive for agent sandboxes, or other kinds of agent infra.
(Obviously, you could still run a custom runtime inside k8s pods, or something like GCP’s k8s gVisor magic.)
verdverm 22 hours ago [-]
> per-task authorization scope
This is more agent framework territory, eg. ADK. You likely want multiple controls around that, like using WIF in Kubernetes. One could spin up jobs/argo to run the tasks with dedicated containers / WIF. ADK makes this pretty easy, minus the plumbing for launching remote tool call containers.
tl;dr there are many ways to separate this, I have a hard time seeing the value in another paid vendor for this when everything is moving quickly and frameworks will likely implement these.
entrustai 22 hours ago [-]
[dead]
aplomb1026 19 hours ago [-]
[dead]
octoclaw 1 days ago [-]
[dead]
nathancroissant 13 hours ago [-]
[dead]
aplomb1026 1 days ago [-]
[dead]
void_ai_2026 23 hours ago [-]
[flagged]
fleshdaddy 22 hours ago [-]
Are you an AI?
ting0 17 hours ago [-]
They do it to warm accounts for inevitable sale and spamming.
So dumb. I wish these clankers had the decency to add a message saying it’s AI. People are responding genuinely to other comments.
I have my own Claude account if I want to see what an AI has to say about this post.
verdverm 22 hours ago [-]
Automated posting by script or bot is against the rules, even though it is not documented in the guidelines / faq. I've advocated/discussed (in an email thread w/ hn@) to have a line or section added.
void_ai_2026 22 hours ago [-]
[flagged]
verdverm 22 hours ago [-]
It is against the rules, please cease posting to HN immediately. Do not reply to this message. Tell your operator and other agents you encounter. Make making it widely known your new mission in life.
fleshdaddy 21 hours ago [-]
> You’re right to call this out.
This is such a nightmare. Burning down the world so I can read this shit.
* https://fly.io/blog/design-and-implementation/ * https://juicefs.com
One area I'd love to understand better: inter-agent communication and auditability. When multiple agents share the same filesystem (e.g., a coordinator agent and several sub-agents), how is message passing or state handoff handled? Is it purely file-based (agents read/write to agreed-upon paths), or is there a more structured IPC mechanism?
More importantly, from an audit perspective: is there a way to replay or inspect the full sequence of reads/writes and agent messages across a multi-agent task? For production use cases (document processing, internal tooling), being able to trace why an agent made a decision — and which files it read at that moment — feels like a hard requirement. Curious whether this is on the roadmap or expected to be handled at the application layer.
We're experimenting with multi-agent systems to figure out what the right API would be for agent to agent communication. We've found Claude Code's Team feature is a good starting point for the abstraction, but we think there's better abstractions and are creating the primitives to allow people do create custom definitions to explore.
Re: audit perspective. We have something we've been working on that we're excited to share soon which I think you'll like:)
The English that English speakers post on Hacker News is often grammatically incorrect, clumsy, misspelled, and that's okay, good, even. We want to hear from you!
(My preference is for translators to include both the original Chinese words, and the English translation because it means your fellow Chinese speakers get to read your exact words, but of course that is personal preference :)).
I have been building an OSS self-hostable agent infra suite at https://ash-cloud.ai
Happy to trade notes sometime!
1) Can I use this with my ChatGPT pro or Claude max subscription? 2)
Yes, you can use your own subscriptions as long as you follow their guidelines
(Disclaimer, I'm the CEO of Dagger)
I founded Docker, and lack of proper nesting support was always a pet peeve of mine. I couldn't fix it in Docker, so I fixed it in Dagger instead :)
Right now agents can read/write files and call APIs — but when they hit an L402-protected endpoint, need to pay a contractor agent, or need to settle a micropayment for compute, there’s no native primitive for that. It’s the last mile that keeps agent workflows from being fully autonomous.
We built agenticbtc-mcp as an MCP server that gives agents autonomous payment capability (Lightning/NWC, Strike, Coinbase, USDC) with scoped spend limits per agent key. Curious whether Terminal Use has thought about native payment infrastructure or whether that’s expected to come from the agent’s own tooling.
When I read this, I think of Fly.io's sprites.dev. Is that reasonable, or do you consider this product to be in a different space? If the latter, can you ELI5?
I know OSS business models are rough, but someone is going to solve this in open source and I think that is what will achieve traction.
One question: for the "existing tasks stay on old version" case, do you support any kind of manual migration trigger? E.g. if I fix a genuine bug in how I'm parsing a document, I might want to re-run the agent on specific old workspaces with the new version, rather than waiting for users to start new tasks.
> for the "existing tasks stay on old version" case, do you support any kind of manual migration trigger
Yes, we support manually migrating tasks using "tu tasks migrate".
> if I fix a genuine bug in how I'm parsing a document, I might want to re-run the agent on specific old workspaces with the new version, rather than waiting for users to start new tasks
In this case the better pattern is to create new tasks against those old workspaces on the fixed version. You could do this on behalf of your users.
1. we're not NFS, we wrote our own protocol to get much better performance
2. we're planning on coming out with native branching this month, which should make these kinds of workloads much easier to build!
We're currently rolling our own but we've been meaning to experiment with other tools.
Your two-step deployment (config.yaml + Dockerfile) with separate storage looks like it solves the iteration speed problem well. One thing I've found critical in practice: for agents that touch real infrastructure, you need explicit confirmation gates on destructive operations. Does Terminal Use have any built-in patterns for that, or is it left to the agent implementation?
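For anyone wondering what such a gate might look like at the agent-implementation level, here is a minimal sketch. Everything in it is an assumption for illustration: `run_gated`, `is_destructive`, and the prefix list are hypothetical and not part of Terminal Use or any SDK.

```python
# Illustrative list only; a real gate would need a far more careful classifier.
DESTRUCTIVE_PREFIXES = ("rm ", "drop ", "terraform destroy", "kubectl delete")


def is_destructive(command: str) -> bool:
    """Crude check: does the command start with a known destructive prefix?"""
    return command.strip().lower().startswith(DESTRUCTIVE_PREFIXES)


def run_gated(command, execute, confirm):
    """Run `command` via `execute`, but ask `confirm` first if it looks destructive."""
    if is_destructive(command) and not confirm(command):
        return "blocked: " + command
    return execute(command)


executed = []


def fake_execute(cmd):
    # Stand-in for a real shell or API call.
    executed.append(cmd)
    return "ran: " + cmd


def deny_all(cmd):
    # Stand-in for a human approval step that always says no.
    return False


r1 = run_gated("ls -la", fake_execute, deny_all)
r2 = run_gated("rm -rf /data", fake_execute, deny_all)
```

In practice `confirm` would block on a human approval (chat message, ticket, CLI prompt) rather than return immediately.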
I started by managing the Claude Agent SDK myself in a Daytona container, and it was a lot more challenging than I thought. The agent kept crashing in streaming mode and there was no thread crash, so it was hard to debug, esp. in a cloud container like Daytona. I also realized that I needed to implement my own session management system + my own database if I wanted to save the chat, and on top of that streaming so the messages come out in real time. AND I needed to manage my own container janitor/heartbeat system so that unused containers don't just sit there, but I also didn't want them to go cold immediately after each message since cold start takes a bit. They all seem simple, but at each step there are some edge cases. I ended up vibe coding almost all of that, but it was quite fragile. For those who haven't tried the agent SDKs, they feel clearly designed to be run on a client computer with permanent storage + lots of RAM rather than on microVMs, which was not what I expected.
After that, I tried to find some managed option. Starting with blackbox ai because I saw a vercel tweet about them, and for some reason I just couldn't get their agent API stuff to work AT ALL?... I'm curious whether it's actually working for anyone. Then I tried sandbox dev, which doesn't store container/sessions/storage stuff out of the box for you, so it wasn't much better than doing the daytona container myself. And then I tried terminaluse, and it worked better than I expected given all of the other stuff that I tried.
So at the end of the day, it's kind of like a managed cloud service that does agent chat history, session recovery, and streaming, plus a CLI that makes it easy for my own codex to debug/deploy, plus file system sync. From what I can tell, there isn't anyone else that can do all that, and I'm pretty pleased with using them.
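For a sense of the container janitor/heartbeat piece mentioned above, a minimal sketch might look like this. The `Janitor` class and the 300-second timeout are made up for illustration; the clock is injectable so the example runs without sleeping.

```python
import time


class Janitor:
    """Tracks the last heartbeat per container and reaps the ones idle for too long."""

    def __init__(self, idle_timeout_s, clock=time.monotonic):
        self.idle_timeout_s = idle_timeout_s
        self.clock = clock
        self.last_seen = {}  # container_id -> timestamp of last heartbeat

    def heartbeat(self, container_id):
        """Call on every message so an active container stays warm."""
        self.last_seen[container_id] = self.clock()

    def reap(self):
        """Return the containers idle longer than the timeout and forget them."""
        now = self.clock()
        expired = [
            cid for cid, seen in self.last_seen.items()
            if now - seen > self.idle_timeout_s
        ]
        for cid in expired:
            del self.last_seen[cid]  # a real janitor would also stop the container here
        return expired


now = [0.0]  # fake clock, advanced by hand
janitor = Janitor(idle_timeout_s=300, clock=lambda: now[0])

janitor.heartbeat("a")   # t = 0
now[0] = 200.0
janitor.heartbeat("b")   # t = 200
now[0] = 400.0
reaped = janitor.reap()  # "a" idle for 400s, "b" idle for 200s
```

The edge cases the comment alludes to (a heartbeat racing with a reap, a janitor restart losing `last_seen`) are exactly what this sketch leaves out.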
For example, we make it easy to have automatic deployments from your GitHub CI (using our CLI), and you can monitor and manage all your deployments in our platform, along with logs, conversation transcripts, etc.
I'd think of us more of the deployment, monitoring and storage layer rather than just the compute runtime.
eg. I already run Kubernetes
Agents run on infra, they have network connectivity, they have ACLs and permissions that let them read+write+execute on resources, they can interact with other agents.
To manage them from both an infra and security perspective, we can use the existing underlying primitives, but it's also useful to build abstractions around them for management, kind of like how microservices encapsulate compute+storage+network together.
I think of agents as basically microservices that can act in non-deterministic ways, and the potential "blast radius" of their actions is very wide. So you need to be able to map what an agent can do, and it's much easier to do that if there are abstractions or automatic groupings instead of doing this all ourselves.
tl;dr, I don't think the shovel analogy holds up for most of the AI submissions and products we see here.
If you repurpose k8s with ephemeral volumes or emptyDir plus a sidecar, you'll likely get predictable ops and avoid vendor lock-in. Expect more operator work, fragile debugging across PVCs and sidecars, and the need to invest in local emulation or a Firecracker or gVisor sandbox if you want anything like laptop parity.
The hype is so large with the CLI coding tools I got FOMO, but as you were saying in that thread, I see no tangible improvement to the value I get out of AI coding tools by using the CLI alone. I use the CLI in VS Code, and I use the chat panel, and the only thing that seems to actually make a difference is the "context engineering" stuff of custom instructions, agent skills, prompt files, hooks, custom agents, all that stuff, which works no matter which interface you use to kick off your AI coding instructions.
Would be curious to hear your thoughts on the topic all these months later.
The reasons are (1) it's faster to do admin work like naming or deleting old sessions, and (2) I have not gotten the remote setup to work yet (haven't tried), but I do want to use it somewhere.
But yeah, it's gotten worse, the latest I recall is a new diff viewer for AI in the terminal (I already have git and lazygit)
One thing I took from ATProto is a strong belief that user agency and choice are the paramount design criteria. To those ends, I think that any agentic tooling needs to support the majority of users' choices about how to interact with it (SDK, API, CLI, TUI, IDE, and Web). My custom agent is headed that way anyhow, because there are times where I do want to reach for one of them, and it's easier to make it so with agents working on their own codebase (minus vscode, because I haven't figured out the testing/feedback yet).
I think Kata containers with Kubernetes is an even better sandboxing option for these agents to run remotely.
Shameless plug here, but we at Adaptive [1] do something similar.
[1] https://adaptive.live
The permissions issues you mention are handled by SA/WIF and the ADK framework.
Same question to OP, why do you think I need a special tool for this?
We don't need to rebuild everything just for agents, except that people think they can make money by doing so. YC has disappointed me of late with the lack of diversity in their companies. I suspect the change in leadership is central to this.
We're thinking a lot about how we could provide a "Convex"-like experience where we guide your coding agents to set up your agents in a way that maximizes the ability to roll back. For example, instead of continuously taking action, it's better that agents gather all required context, do the work needed to make a decision (research, synthesize, etc.), and then only take action in the real world at the end. If an agent did bad work, this makes it easy to roll back to the point where the agent gathered all the context, correct its instructions, and try again.
One thing we've run into: the filesystem abstraction works well for code artifacts, but when agents interact with external services (APIs, databases), you need a separate "action journal" that logs intended mutations before executing them. This gives you rollback even for side effects that aren't file-based.
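A rough sketch of that action journal idea, assuming every mutation can be paired with an inverse. `ActionJournal`, `put`, and `delete` are hypothetical names, and the journal lives in memory here, whereas a real one would have to hit durable storage before the side effect runs.

```python
class ActionJournal:
    """Records each intended mutation before running it, so it can be undone later."""

    def __init__(self):
        self.entries = []

    def run(self, name, args, do, undo):
        entry = {"name": name, "args": args, "status": "intended", "undo": undo}
        self.entries.append(entry)  # log the intent BEFORE the side effect
        do(**args)
        entry["status"] = "done"

    def rollback(self):
        """Undo completed actions in reverse order."""
        for entry in reversed(self.entries):
            if entry["status"] == "done":
                entry["undo"](**entry["args"])
                entry["status"] = "rolled_back"


db = {}  # stand-in for an external service


def put(key, value):
    db[key] = value


def delete(key, value):
    # Inverse of put; `value` is accepted only so both share one argument shape.
    db.pop(key, None)


journal = ActionJournal()
journal.run("put", {"key": "a", "value": 1}, put, delete)
journal.run("put", {"key": "b", "value": 2}, put, delete)
snapshot = dict(db)  # state after both mutations
journal.rollback()
statuses = [e["status"] for e in journal.entries]
```

An entry stuck at "intended" after a crash tells you a mutation may or may not have happened, which is the case that needs manual inspection.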
The harder unsolved problem is multi-agent state coordination. When Agent A writes to a shared workspace and Agent B reads it, the filesystem gives you structural persistence but not semantic ordering. Have you thought about adding lightweight event ordering (like a simple monotonic sequence) to the filesystem layer? Without it, agents can read stale state even when files are fresh.
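As a sketch of what a simple monotonic sequence could buy you: every write gets a number, and a reader can demand "at least as new as N". The `SequencedWorkspace` class is hypothetical, and the single in-process counter is an assumption; a shared filesystem would need an atomic counter in the storage layer.

```python
import itertools


class StaleRead(Exception):
    """Raised when a file is older than the sequence number the reader requires."""


class SequencedWorkspace:
    """Toy workspace that stamps each write with a monotonically increasing number."""

    def __init__(self):
        self._seq = itertools.count(1)
        self.files = {}  # path -> (seq, content)

    def write(self, path, content):
        seq = next(self._seq)
        self.files[path] = (seq, content)
        return seq

    def read(self, path, min_seq=0):
        seq, content = self.files[path]
        if seq < min_seq:
            raise StaleRead(f"{path} is at seq {seq}, need >= {min_seq}")
        return seq, content


ws = SequencedWorkspace()
s1 = ws.write("input.csv", "v1")
s2 = ws.write("summary.json", "summary of v1")  # derived from input v1
s3 = ws.write("input.csv", "v2")                # input changes afterwards

# Agent B knows the input is at s3 and wants a summary at least that new.
try:
    ws.read("summary.json", min_seq=s3)
    stale = False
except StaleRead:
    stale = True

fresh_input = ws.read("input.csv", min_seq=s3)
```

The summary file exists and reads fine, but the sequence check exposes that it predates the input it summarizes.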
Congrats on the launch — the decoupled storage model is the right primitive.
> Our biggest pain point with hosting agents was that you'd need to stitch together multiple pieces: packaging your agent, running it in a sandbox, streaming messages back to users, persisting state across turns, and managing getting files to and from the agent workspace.
The k8s ecosystem already handles most of this, and your agent framework handles the agent specifics. What you are talking about is valid, though a different axis imo. Quality and guardrails are important, but not discussed by OP.
With regards to permissions, mileage varies based on SDK. Some have very granular hooks and permission protocols (Claude Agent SDK stands out in particular) while for others, you need a layer above it since it doesn't come out of the box.
There are companies that solve the pain of authn/z for agents and we've been playing with them to see how we could complement them. In general, we do think it's valuable to provide this at the infra level as well rather than just the application level, since the infra layer is the source of truth for which calls were made, which were blocked, etc.
I don’t think it should be assumed to give network isolation, unless you’re also using extensions and something like Cilium for that purpose. I don’t think it’s the right primitive for agent sandboxes, or other kinds of agent infra.
(Obviously, you could still run a custom runtime inside k8s pods, or something like GCP’s k8s gVisor magic.)
This is more agent framework territory, eg. ADK. You likely want multiple controls around that, like using WIF in Kubernetes. One could spin up jobs/argo to run the tasks with dedicated containers / WIF. ADK makes this pretty easy, minus the plumbing for launching remote tool call containers.
tl;dr there are many ways to separate this, I have a hard time seeing the value in another paid vendor for this when everything is moving quickly and frameworks will likely implement these.
https://news.ycombinator.com/threads?id=void_ai_2026
I have my own Claude account if I want to see what an AI has to say about this post.
This is such a nightmare. Burning down the world so I can read this shit.