ForeverVM: Run AI-generated code in stateful sandboxes that run forever
forevervm.com

Hey HN!
We started Jamsocket a few years ago as a way to run ephemeral servers that last for as long as a WebSocket connection. We sandboxed those servers, so with the rise of LLMs we started to see people use them for arbitrary code execution.
While this works, it was clunkier than we wanted from a first-principles code execution product. We built ForeverVM from scratch to be that product.
In particular, it felt clunky for app developers to have to think about sandboxes starting and stopping, so the core tenet of ForeverVM is using memory snapshotting to create the abstraction of a Python REPL that lives forever.
When you go to our site, you're given a live Python REPL; try it out!
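To make that concrete, the abstraction is just an ordinary Python session whose state survives arbitrarily long gaps. Here's a conceptual sketch (plain REPL input, not ForeverVM's actual API):

    # Conceptual sketch, not ForeverVM's API: ordinary REPL inputs. The point
    # is that state defined in one exchange is still there much later, even
    # though the VM was snapshotted to disk in between.
    >>> import random
    >>> dataset = [random.random() for _ in range(1_000_000)]  # built once
    >>> running_total = sum(dataset)

    # ...hours or days later, on the same machine, with no re-initialization:
    >>> running_total / len(dataset)  # dataset and running_total still exist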
Is it possible to run Cython code with this as well? Since you can run a setup.py script, could you compile Cython and run it?
Looking at the docs, it seems only suited for interpreted code, but I’d be interested to know if this is feasible, or almost feasible with a little work.
We are working now on support for arbitrary imports of public packages from PyPI, which will include Cython support. Soon after that we'll work on a way to provide proprietary packages (including Cython).
Where did you see mention of a setup.py script? I couldn't find that in their docs. From what I saw, they only support using a long-lived REPL.
Why would you want ever-growing memory usage for your Python environment?
Since LLM context is limited, at some point the LLM will forget what was defined at the beginning, so you will need to reset it or remind the LLM what's in memory.
Fun fact: this is very similar to how Smalltalk works. Instead of storing source code as text on disk, it only stores the compiled representation as a frozen VM. Using introspection, you can still find all of the live classes/methods/variables. Is this the best way to build applications? Almost assuredly not. But it does make for an interesting learning environment, which seems in line with what this project is, too.
You're right that LLM context is the limiting factor here, and we generally don't expect machines to be used across different LLM contexts (though there is nothing stopping you).
The utility here is mostly that you're not paying for compute/memory when you're not actively running a command. The "forever" aspect is a side effect of that architecture, but it also means you can freeze/resume a session later in time just as you can freeze/resume the LLM session that "owns" it.
It's the other way around: it swaps idle sessions to disk so that they don't consume memory. From what I read, "traditional" code interpreters apparently keep sessions in memory, and if a session is idle it expires. This one writes it to disk instead, so that if the user comes back after a month it's still there.
Why/when does someone want to use this?
It's probably nice to have whenever you're using an LLM that doesn't have a code interpreter, like Claude. It can probably use code execution as a reality check.
Yes, I've found that with just the MCP server installed, when I ask a question about Python, Claude becomes eager to check its work before answering (Claude does have a built-in analysis tool, but it only runs JavaScript).
Good question, we’ll add some info to the page for this.
LLMs are generally quite good at writing code, so attaching a Python REPL gives them extra abilities. For example, I was able to use a version with boto3 to answer questions about an AWS cluster that took multiple API calls.
LLMs are also good at using a code execution environment for data analysis.
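To make that concrete, here's a rough sketch of the kind of multi-call boto3 snippet the LLM ends up running in the REPL (it assumes AWS credentials are already available in the environment, and the summary format is just illustrative):

    # Sketch of a multi-step AWS query an LLM might run in the REPL.
    # Assumes AWS credentials are already configured in the environment.
    import boto3

    ec2 = boto3.client("ec2")
    reservations = ec2.describe_instances()["Reservations"]
    instances = [i for r in reservations for i in r["Instances"]]

    # Summarize instance counts by type across all reservations.
    by_type = {}
    for inst in instances:
        by_type[inst["InstanceType"]] = by_type.get(inst["InstanceType"], 0) + 1

    print(f"{len(instances)} instances total")
    for itype, count in sorted(by_type.items()):
        print(f"  {itype}: {count}")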
Is it possible to reuse the same paused VM multiple times from the same snapshot?
It's not exposed in the API yet, but it's very possible with the architecture, and it's something we plan to expose. I am curious if you have a use case for that, because I've been looking for use cases! Being able to fork the chat and try different things in parallel is the motivating use case in my mind, but I'm sure there are others.
The obvious use-case (to me) is to create an agent that relies on an interpreter with a bunch of pre-loaded state that's already been set up exactly a certain way — where that state would require a lot of initial CPU time (resulting in seconds/minutes of additional time-to-first-response latency), if it was something that had to run as an "on boot" step on each agent invocation.
Compare/contrast: the Smalltalk software distribution model, where rather than shipping a VM + a bunch of code that gets bootstrapped into that VM every time you run it, you ship an application (or more like, a virtual appliance) as a VM with a snapshot process-memory image wherein the VM has already preloaded that code [and its runtime!] and is "fully ready" to execute that code with no further work. (Or maybe — in the case of server software — it's already executing that code!)
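As a rough sketch of that pre-loaded-state idea (the library, file path, and lookup structure here are all illustrative; the point is the expensive bootstrap you only want to pay for once):

    # Hypothetical one-time bootstrap you would want to pay for once, snapshot,
    # and then never re-run on each agent invocation. Path and data are
    # illustrative.
    import pandas as pd

    df = pd.read_parquet("/data/events.parquet")            # minutes of load time
    index = {row.user_id: row for row in df.itertuples()}   # precomputed lookup

    # With a reusable snapshot, every forked agent session would start here,
    # with df and index already live in memory.
    def lookup_user(user_id):
        return index.get(user_id)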
Check out why Together.AI acquired CodeSandbox.
Disclosure: I’m an investor in Jamsocket, the company behind this… but I’d be remiss if I didn’t say that every time Paul and Taylor launch something they have been working on, I end up saying “woah.” In particular, using ForeverVM with Claude is so fun.
May I ask how you got the opportunity to invest in this company? If you are a VC, that makes sense; I'm just wondering how normies can get access to invest in companies they believe in. Thanks
If you're an accredited investor (make sure you meet the financial criteria) you can cold email seed/pre-seed stage companies. These companies typically raise on SAFEs and may have low minimum investments (say $5k or $10k).
YC lists all their companies here: https://www.ycombinator.com/companies.
Many companies are likely happy to take your small check if you are a nice person and can be even minimally helpful to them. Note that for YC companies you'll probably have to swallow the pill of a $20M valuation or so.
I do indeed work in VC. But as another reply mentions, any accredited investor can write small checks into startups, and most preseed/seed founders are happy to take angel checks.
It’s trivial to build something that does what this describes. I’m sure there’s more to it, but based on the description, the pieces are already there under permissive open-source licenses.
For a clean implementation I’d look at socket-activated rootless podman with a wasi-sdk build of Python.
Kind of how with Dropbox "you can already build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem"?
(cf. https://news.ycombinator.com/item?id=9224)
It was an afternoon to prototype, followed by a lot of work to make it scale to the point of giving everyone who lands from HN a live CPython process ;)
This is the sort of thing that would touch a lot of my data, so I'd much prefer to have it self-hosted. But you mention Claude rather than DeepSeek or Mistral, so know your audience, I guess.
Fair enough. Our audience is businesses rather than consumers, so our equivalent of self-hosting is that we can run it in a customer's cloud.
We mention Claude a lot because it is a good general coding model, but this works with any LLM trained for tool calling. Lately I've been using it just as much with Gemini 2.0 Flash, via Codename Goose.
What has AI got to do with this? It's in the headline but I don't see why.
The API could be used for non-AI use cases if you wanted to, but it’s built to be integrated with an LLM through tool calling. We provide an MCP (Model Context Protocol, for integration with Claude, Cursor, Windsurf, etc.) server.
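For anyone unfamiliar with tool calling, here's a minimal sketch of what exposing a code-execution tool to an LLM looks like (an OpenAI-style function schema; the tool name and fields are illustrative, not ForeverVM's actual MCP schema):

    # Minimal sketch of a code-execution tool definition for a tool-calling LLM.
    # The name and description are illustrative, not ForeverVM's MCP schema.
    run_python_tool = {
        "type": "function",
        "function": {
            "name": "run_python",
            "description": "Execute Python in a persistent REPL and return its output.",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {"type": "string", "description": "Python source to execute"},
                },
                "required": ["code"],
            },
        },
    }

    # The model emits a tool call containing `code`, the server runs it in the
    # REPL, and the output goes back into the conversation as the tool result.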
You might have noticed that ChatGPT (and others) will sometimes run Python code to do calculations. My understanding is that this will enable the same thing in other environments, like Cursor, Continue, or aider.
Also, those code interpreters usually can't make external network requests; being able to do so adds a lot of capabilities, like pulling in some data and then analyzing it.
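A tiny sketch of that pull-then-analyze pattern (the endpoint is a real public GitHub API URL, used here only as an example of an outbound request):

    # Sketch of "pull data over the network, then analyze it", which a sandbox
    # without egress can't do. The endpoint is just an example request.
    import json
    from urllib.request import urlopen

    with urlopen("https://api.github.com/repos/python/cpython") as resp:
        repo = json.load(resp)

    print(repo["full_name"])
    print("stars:", repo["stargazers_count"])
    print("open issues:", repo["open_issues_count"])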