Isn't this simply an acknowledgement that having an entire OS (at least in a traditional sense) is overkill? I remember a couple of attempts to make the JVM run on bare metal (JavaOS was one if I recall correctly) and I think that, to a lesser extent, ChromeOS, FirefoxOS and company are trying to eliminate some of the bloat that's occurred.
The other day I spun up a CentOS 6 server on a VM and it required a minimum of 512MB of RAM. I remember building a RedHat 5 system (not RHEL5) with 32MB or 64MB. What is the new server doing differently? Functionally I have the exact same machines (yes, the newer machine is more secure, but should that really require 8x the memory?).
So these micro-systems allow the user to go "back to DOS" if they think Windows is too overblown. And if you don't need a windowing system, and you have the language tools you need, why not treat the VMs (or hardware devices) as embedded systems?
Very similar to how most projects start with,"Let's design the database model..."
There are JVMs that run on bare metal, especially in the embedded space.
This was also the approach of the Lisp Machines and Smalltalk.
Native Oberon and AOS are also two systems where the language and OS follow the same principles, although the code is usually compiled to native code, or JITted on load.
ARM even implemented hardware java bytecode instruction execution. [0]
I don't think it ever got very far in the market, but I don't know all the reasons behind it.
[0]: http://www.arm.com/products/processors/technologies/jazelle....
For the same reason Lisp Machines had issues with their special processors.
It is a fallacy that those instructions help execution; in the end, general-purpose processors are quite capable, and you just need a native code compiler instead of trying to execute the bytecodes directly.
So most JITs on ARM just take advantage of the native instructions, or you just make use of a native code compiler for Java if required to do so. Although the general public seems unaware of it, many Java SDKs for the embedded market also offer native compilers, even Oracle does it.
Nifty.
I postulated a while back that the virtual machine would eventually be the target of web application design[1]. I wrote an honours project proposal for a blog engine based on reckoning that if you can control a (virtual) machine from the bottom up, why stay trapped in the architectural constraints of shared hosting?
At the time I referred to the Mirage project, which was an OCaml runtime ported to sit directly on Xen [2]; later HalVM came along and did the same for Haskell [3].
This one is particularly neat because they've built more packaging and framework around it to support fast creation and destruction of instances.
I think this is an exciting time. The sooner people stop targeting shared servers as their architectural baseline, the sooner we can start to see genuine architectural innovation again. Right now all the architectural innovation is hidden in one-shot apps and inside companies. It needs to trickle down.
[1] http://clubtroppo.com.au/2008/07/10/shared-hosting-is-doomed...
[2] http://openmirage.org/
[3] http://corp.galois.com/halvm
Yes, very soon we will start giving attention to writing applications on this platform. There is still some work to be done in the management stack, but it's a manageable amount of work.
HalVM looks dead, judging by the age of the last commits, but Mirage seems alive and well.
Not dead. Here's Adam's talk from the Xen Summit 6 months ago - http://www.xen.org/xensummit/xs12na_talks/M9b.html
Thanks, I'm glad to hear that!
I'll have a look. When I first heard of it (last year?) I thought it was brilliant.
"Over capacity"
"The demo is limited to 16 concurrent instances and 2 libvirt connections. Due to these limitations we were unable to spawn a new instance to service your request. Please try again later."
Um.
Irony knows no bounds, except when misconfigured.
I got the same message (02:35 PDT). This must be a joke, given the title. However http://erlangonxen.org works fine.
Sorry gentlemen, we had used libvirt and it set some limits. http://erlangonxen.org/blog/glimpse-truly-elastic
> Sorry gentlemen
And ladies.
Sure, and ladies, ladies never to be forgotten!
I got the same response. But then I refreshed and it worked. Magic, no?
A few questions:
"how do you get logs of everyone accessing the server?"
"how do you debug the server itself when it fails?"
"how do you audit the server if an attacker actually breaks in?"
It seems to me that the app has to implement all of that, as with an exokernel, except there's a single process running: yours.
To my understanding: compared to traditional VMs, what you gain is speed and simplicity; compared to jails and containers, what you gain is simplicity.
I wonder if it would not be better to use an actual OS written in a modern language, so that we retain some of the useful attributes (multiprocessing, i.e. each "vm" is really just a container with a messaging system; unshared filesystems, but with some shared areas; etc.)
Every instance can export its monitoring information as a 9p virtual filesystem, which can easily be mounted from outside.
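For the record, a Linux client can mount such a 9p export with the kernel's v9fs support. The address, mount point, and file names below are hypothetical, not taken from Erlang on Xen:

```shell
# Mount an instance's 9p monitoring export over TCP (Linux v9fs).
# 10.0.0.5 and /mnt/instance are made-up values for illustration.
sudo mount -t 9p -o trans=tcp,port=564 10.0.0.5 /mnt/instance

# Browse the exported monitoring data like ordinary files.
ls /mnt/instance
cat /mnt/instance/memory
```

This is only an ops sketch; it needs a live 9p server on the other end, and the actual export layout would be whatever the instance publishes.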
We debug server code in BEAM; Erlang on Xen is a deployment platform. If an instance crashes, we simply restart it.
An intruder has very few chances to find breaking in beneficial: there's no shell inside, which leaves only minimal chances to seize control; the instance will simply crash. Also, having no OS leaves no holes to dig deeper.
You're correct, it's an exokernel-like approach.
We gain simplicity, much better resource consumption characteristics, manageability at large scale, and much better instance mobility. And, well, security.
>every instance can export its monitoring information as 9p virtual filesystem
How do you handle authentication?
right now it looks like http://erlangonxen.org/more/mumble
It mainly means that instead of using a shell that runs /bin/sh and the associated control commands (ls, cat, whatever), you have to bring your own shell code and call the functions yourself. I assume one would write such a loader in Erlang that serves a webpage to query any content from the fs, database, etc.
Also, IPC seems to be mainly network based, which means latency. Some modern OS designs function with the same base ideas: managed runtime, small codebase, fully contained processes but use system-local IPC and thus, do have multi-processing (instead of multi-nodes, or in fact, in addition to multi-nodes).
Maybe some of those should be written in a web-friendly language and ship an httpd for adoption (so far they haven't been adopted because the cost of rewriting apps > using archaic OSes).
Ideally I'd see an OS with:
- above characteristics (Singularity, Plan 9 like)
- Simple, fast, efficient filesystem (i.e. with features and performance as good as popular databases) - so you don't need a database server
- clustered resources that are language-aware: filesystem (database), CPU, and memory are networked resources, but you get control from the code over what is executed on the same local instance (i.e. same physical system) and what can be shipped to "any instance" - this brings true, full elasticity. (All this is also a little Plan 9-ish, but not exactly.)
> Simple, fast, efficient filesystem (i.e. with features and performance as good as popular databases) - so you don't need a database server
File systems and databases (of any kind) are not 1:1 substitutes.
It sounds like they exchanged one container, an OS managed process, for another one.
This. If you consider KVM, where an OS container is a regular process, then you can see there'd be no point in doing this for KVM since you might as well just run a Linux process. It's only "interesting" for Xen because the Xen hypervisor has a strangely architected and relatively heavyweight "process".
Agreed, if you are in control of the host OS then you might as well just run some unprivileged processes in some lightweight containers (LXC, or just user/network namespaces + chroot).
There are two situations where running applications directly on the hypervisor might be beneficial though:
- If you have to run multiple untrusted (and possibly malicious) tasks on the same host. In this case virtualization provides better isolation/security than just containers or processes (which would be vulnerable to kernel exploits or other ways of gaining root inside a container, allowing an attacker to take control of the host)
- If the hypervisor is fixed and you can't touch it (e.g. Amazon EC2). In this case running applications directly on the hypervisor might eliminate some of the virtualization overhead.
For the first case, I'd rather use something like Google Native Client.
The attack surface of a VM is rather large, particularly when you consider the dozens of emulated devices and instructions. For example, there was a notable hole in SCSI emulation recently discovered which let any guest overwrite any part of a host's disk (by sending some obscure SCSI commands which weren't being filtered by the virtualized device properly).
> For the first case, I'd rather use something like Google Native Client.
Wait, you're saying Google NaCl could be going the way of Java -- being used for server side containers, rather than client side apps? (I don't really see why you'd want that over using a single process(tree)/binary compiled from Go...)
Oh, I see. Zerovm[1,2] is built on top of NaCl, and does indeed appear to be a "javaosification" of NaCl -- with no bytecode, no dedicated language, but a sandbox/lockdown of native code.
[1] https://news.ycombinator.com/item?id=3746222 [2] http://zerovm.org
The idea behind these systems is that there is no operating system any longer.
The programming language runtime is mapped to execute directly on top of the hypervisor.
There are already such systems for Erlang, OCaml, Haskell and Java runtimes.
If the programming language has a good library, there is no need for an operating system; hardware access can be taken care of by the hypervisor.
This leads to safer and faster servers running on virtual machines.
If hardware access is taken care of by the hypervisor, doesn't this imply the hypervisor supports the functionality of device drivers, i.e. the hypervisor has become the OS? Granted, it may delegate the device driver work to some VM, in which case we're dealing with a microkernel of sorts.
In a way yes, the hypervisor is the new OS, although a very thin one.
Some kind of microkernel, yes. With the set of required features for hardware abstraction and virtualization, with everything else being provided by the respective language runtime.
This is much faster than traditional operating system stacks, and may provide a way to bring microkernels into mainstream OSes.
When using programming languages that come with batteries included, meaning a good set of libraries for all the usual OS services, removing a few layers between hardware and application makes everything run faster and increases security due to a smaller amount of code.
The guys behind Microsoft's Singularity project are also researching something similar for Windows, a project named Drawbridge, where the full OS runs on a hypervisor as a set of libraries in user space.
Anyway, this is nothing new: the idea of a virtualized OS goes back to OS/360; it was just tucked away in the mainframe world and is now becoming mainstream.
When you sit in a container, you're limited to the libraries of the host OS, and there will still be some issues with migration and resource management (at large scale).
Yes. Everything old is new again, it seems. Many old OSes were VM hypervisors in today's parlance--- VM/370 comes to mind; several of the first Unix and other systems I got to play with were virtual machines running on IBM big iron--- and it was also reasonably common to run programs directly in the VM without a guest OS.
When I was in school, a common formalism for talking about operating system design was that they provided a virtual machine abstraction to each process. As time went on the "devices" looked less and less like actual hardware devices, so instead of having an emulated punchcard reader and emulated interrupts, you'd have a read(2) syscall. But that's a relatively unimportant detail.
And with paravirtualization the cycle continues: the abstraction the guest sees is no longer that of a bare metal machine, but one with an increasing number of syscall-like hypervisor calls...
I consider UNIX the C language runtime in a way. :)
If it's fast creation/destruction of VMs you want, for scale-out or other purposes, there's always ZeroVM. https://news.ycombinator.com/item?id=3746222
> 0.6 sec ago is when we received your request. Within this time we managed to create a new Xen instance, boot it, and run the application that rendered the page you are viewing. By the time you are done reading this, the instance will be gone.
So every time it receives a request, it launches a new Xen instance, processes the request, sends a response, and then kills the Xen instance?
And all that takes 600ms just to process a single request?
Why would anyone want to do that when I could just have a persistent JVM running on Linux to process requests in under 10ms?
What's the point of this?
> What's the point of this?
To show that it's possible.
There are certainly better use cases for that. Off the top of my head:
1) Per user VM. The first request for a logged in user will spawn a VM and then all requests for her will be routed there
2) Fine-grained elasticity control. Currently it's difficult to decide when to spawn new VMs, because it takes time to spawn a new one (and billing is computed hourly, but that's another issue), so it's complicated to decide whether to pump up the steam or wait a little bit longer. Having a sub-second startup time could help in creating better elasticity controls.
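The per-user VM idea (case 1) fits in a few lines of routing logic. This is a sketch under stated assumptions: the `spawn` callable and all names here are hypothetical stand-ins, not part of Erlang on Xen's actual API.

```python
class InstanceRouter:
    """Route each logged-in user to a dedicated instance, spawning it lazily."""

    def __init__(self, spawn):
        self.spawn = spawn        # callable that boots a fresh VM and returns a handle
        self.instances = {}       # user id -> live instance handle

    def route(self, user_id):
        # First request from this user: boot an instance just for them
        # (cheap if startup is sub-second, as claimed for Erlang on Xen).
        if user_id not in self.instances:
            self.instances[user_id] = self.spawn(user_id)
        # Every later request for this user goes to the same instance.
        return self.instances[user_id]


# Demo with a stand-in spawn function.
router = InstanceRouter(spawn=lambda uid: {"vm_for": uid})
first = router.route("alice")
second = router.route("alice")   # same handle, no second boot
```

In a real deployment the router would also reap idle instances; that lifecycle management is exactly the "work still needed in the management stack" mentioned above.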
How is Xen better in this context than a general-purpose OS kernel, be it Linux, BSD, or Illumos?
For a "truly elastic cloud", what one really needs is a homogeneous pool of host machines on which heterogeneous workloads can be quickly started and stopped. For this, I believe a general-purpose OS kernel is better, even if one dispenses with the usual accompanying userland, because such a kernel is already equipped to run more workloads directly.
Why should the pool of host machines be homogeneous? Just for ease of administration?
Because it follows a picokernel model: a hypervisor runs directly on top of the hardware to provide the minimal set of hardware integration features.
Everything else not required by your application just wastes hardware resources.
This allows more applications to run per physical machine.
" 0.6 sec
ago is when we received your request. Within this time we managed to create a new Xen instance, boot it, and run the application that rendered the page you are viewing. By the time you are done reading this, the instance will be gone. "
0.6 seconds for a basic page like that is a bit slow...
Presumably the idea is that you run a pool of them, so you're not spawning a new instance each time?
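One way such a pool could work: keep a few instances pre-booted, hand one to each request, and boot a replacement off the hot path. A rough sketch, with all names hypothetical:

```python
import queue

class WarmPool:
    """Keep a few instances pre-booted so requests never pay the cold-start cost."""

    def __init__(self, boot, size=4):
        self.boot = boot                    # callable that boots one instance
        self.ready = queue.SimpleQueue()    # FIFO of warm instances
        for _ in range(size):
            self.ready.put(boot())

    def handle(self, request):
        instance = self.ready.get()         # take a warm instance: no boot on the hot path
        self.ready.put(self.boot())         # refill; in practice this would run asynchronously
        return instance, request


# Demo with numbered stand-in instances.
counter = iter(range(1000))
pool = WarmPool(boot=lambda: f"vm-{next(counter)}", size=2)
instance, req = pool.handle("GET /")
```

With sub-second boots the pool can stay tiny, since refills finish before the next traffic spike arrives.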
I got the impression that you _are_ expected to spawn a new instance each time.
I think so too. But simple pages in this configuration just measure nginx performance (check the breakdown)—that's a good sign for the rest of the process.
My impression is that the fast loadup is for scaling upwards as needed. I am only expecting it to spawn a new piece if I am the first person whose request can't be answered as part of a wave of unexpected traffic... Why else would you want this?
Thanks for pointing it out, it gave me an excuse to look at the numbers:
It took 4.6 seconds when I loaded it. In the breakdown, ~4.3s of that was nginx dicking around (probably spending much of its time telling people it can't spawn an instance right now). The lightweight nature of the page is obviously exposing the nginx overhead, which is turning out to be a bottleneck. Based on all that: with more complex pages, a more realistic limit on instances, and similar resource utilization, I would expect it to be downright snappy.
fair enough. was just pointing it out as it is presented as if it is fast for a page load time. neat project though.
It is a good point: when the load time is measured in seconds, it doesn't leave one with the impression that it's fast. It would be better to see it serving with a more realistic configuration.
True, but the Erlang VM is not really written for fast startup; module loading is serial, not parallel. The key here is that you can scale up in 0.6 seconds, which is an awful lot faster than most other solutions, where scaling takes minutes.
BEAM (the "official" Erlang VM) takes much longer to start. Erlang on Xen has its own VM (written from scratch); it was designed specifically to run without an OS and to start very fast.
HalVM, meanwhile, boots in the 10ms range (like starting a GHC process). These can be very fine-grained and fast.
Previous discussion: https://news.ycombinator.com/item?id=5243360
Wow! Just Wow!