Isn't this simply an acknowledgement that having an entire OS (at least in a traditional sense) is overkill? I remember a couple of attempts to make the JVM run on bare metal (JavaOS was one if I recall correctly) and I think that, to a lesser extent, ChromeOS, FirefoxOS and company are trying to eliminate some of the bloat that's occurred.
The other day I spun up a CentOS 6 server on a VM and it required a minimum of 512MB of RAM. I remember building a RedHat 5 system (not RHEL5) with 32MB or 64MB. What is the new server doing differently? Functionally I have the exact same machines (yes, the newer machine is more secure, but should that really require 8x the memory?).
So these micro-systems allow the user to go "back to DOS" if they think Windows is too overblown. And if you don't need a windowing system, and you have the language tools you need, why not treat the VMs (or hardware devices) as embedded systems?
Very similar to how most projects start with,"Let's design the database model..."
There are JVMs that run on bare metal, especially in the embedded space.
This was also the approach of the Lisp Machines and Smalltalk.
Native Oberon and AOS are also two systems where the language and OS follow the same principles, although the code is usually compiled to native code, or JITted on load.
ARM even implemented hardware java bytecode instruction execution. [0]
I don't think it ever got very far in the market, but I don't know all the reasons behind it.
[0]: http://www.arm.com/products/processors/technologies/jazelle....
For the same reason Lisp Machines had issues with their special processors.
It is a fallacy that those instructions help execution; in the end, general-purpose processors are quite capable, and you just need a native code compiler instead of trying to execute the bytecodes directly.
So most JITs on ARM just take advantage of the native instructions, or you just make use of a native code compiler for Java if required to do so. Although the general public seems unaware of it, many Java SDKs for the embedded market also offer native compilers, even Oracle does it.
Nifty.
I postulated a while back that the virtual machine would eventually be the target of web application design[1]. I wrote an honours project proposal for a blog engine based on reckoning that if you can control a (virtual) machine from the bottom up, why stay trapped in the architectural constraints of shared hosting?
At the time I referred to the Mirage project, which was an OCaml runtime ported to sit directly on Xen [2]; later HalVM came along and did the same for Haskell [3].
This one is particularly neat because they've built more packaging and framework around it to support fast creation and destruction of instances.
I think this is an exciting time. The sooner people stop targeting shared servers as their architectural baseline, the sooner we can start to see genuine architectural innovation again. Right now all the architectural innovation is hidden in one-shot apps and inside companies. It needs to trickle down.
[1] http://clubtroppo.com.au/2008/07/10/shared-hosting-is-doomed...
[2] http://openmirage.org/
[3] http://corp.galois.com/halvm
Yes, very soon we will start giving attention to writing applications on this platform. There is still some work to be done in the management stack, but it's a manageable amount of work.
HalVM looks dead, judging by the age of the last commits, but Mirage seems alive and well.
Not dead. Here's Adam's talk from the Xen Summit 6 months ago - http://www.xen.org/xensummit/xs12na_talks/M9b.html
Thanks, I'm glad to hear that!
I'll have a look. When I first heard of it (last year?) I thought it was brilliant.
"Over capacity"
"The demo is limited to 16 concurrent instances and 2 libvirt connections. Due to these limitations we were unable to spawn a new instance to service your request. Please try again later."
Um.
Irony knows no bounds, except when misconfigured.
I got the same message (02:35 PDT). This must be a joke, given the title. However http://erlangonxen.org works fine.
Sorry gentlemen, we had used libvirt and it set some limits. http://erlangonxen.org/blog/glimpse-truly-elastic
> Sorry gentlemen
And ladies.
Sure, and ladies, ladies never to be forgotten!
I got the same response. But then I refreshed and it worked. Magic, no?
A few questions:
"how do you get logs of everyone accessing the server?"
"how do you debug the server itself when it fails?"
"how do you audit the server if an attacker actually breaks in?"
It seems to me that the app has to implement all of that, as with an exokernel, except there's a single process running: yours.
To my understanding: compared to traditional VMs, what you gain is speed and simplicity; compared to jails and containers, what you gain is simplicity.
I wonder if it would not be better to use an actual OS written in a modern language, so that we retain some of the useful attributes (multiprocessing, i.e. each "vm" is really just a container with a messaging system; unshared filesystems, but with some shared areas; etc.)
Every instance can export its monitoring information as a 9p virtual filesystem, which can easily be mounted from outside.
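For the record, a Linux client can mount such a 9p export with the kernel's v9fs support. The address, mount point, and file names below are hypothetical, not taken from Erlang on Xen:

```shell
# Mount an instance's 9p monitoring export over TCP (Linux v9fs).
# 10.0.0.5 and /mnt/instance are made-up values for illustration.
sudo mount -t 9p -o trans=tcp,port=564 10.0.0.5 /mnt/instance

# Browse the exported monitoring data like ordinary files.
ls /mnt/instance
cat /mnt/instance/memory
```

This is only an ops sketch; it needs a live 9p server on the other end, and the actual export layout would be whatever the instance publishes.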
We debug server code in BEAM; Erlang on Xen is a deployment platform. If an instance crashes, we simply restart it.
An intruder has very few chances to find breaking in beneficial: there's no shell inside, which leaves only minimal chances to seize control; the instance will simply crash. Also, having no OS leaves no holes to dig deeper.
You're correct, it's an exokernel-like approach.
We gain simplicity, much better resource consumption characteristics, manageability at large scale, and much better instance mobility. And, well, security.
>every instance can export its monitoring information as 9p virtual filesystem
How do you handle authentication?
right now it looks like http://erlangonxen.org/more/mumble
It mainly means that instead of using a shell that runs /bin/sh and the associated control commands (ls, cat, whatever), you have to bring your own shell code and call the functions yourself. I assume one would write such a loader in Erlang that serves a webpage to query any content from the fs, database, etc.
Also, IPC seems to be mainly network based, which means latency. Some modern OS designs function with the same base ideas: managed runtime, small codebase, fully contained processes but use system-local IPC and thus, do have multi-processing (instead of multi-nodes, or in fact, in addition to multi-nodes).
Maybe some of those should be written in a web-friendly language and ship an httpd for adoption (so far they haven't been adopted because the cost of rewriting apps > using archaic OSes).
Ideally I'd see an OS with:
- above characteristics (Singularity, Plan 9 like)
- Simple, fast, efficient filesystem (i.e. with features and performance as good as popular databases) - so you don't need a database server
- clustered resources that are language-aware: filesystem (database), CPU, and memory are networked resources, but you get control from the code over what is executed on the same local instance (i.e. same physical system) and what can be shipped to "any instance" - this brings true, full elasticity. (All this is also a little Plan 9-ish, but not exactly.)
> Simple, fast, efficient filesystem (i.e. with features and performance as good as popular databases) - so you don't need a database server
File systems and databases (of any kind) are not 1:1 substitutes.
It sounds like they exchanged one container, an OS managed process, for another one.
This. If you consider KVM, where an OS container is a regular process, then you can see there'd be no point in doing this for KVM since you might as well just run a Linux process. It's only "interesting" for Xen because the Xen hypervisor has a strangely architected and relatively heavyweight "process".
Agreed, if you are in control of the host OS then you might as well just run some unprivileged processes in some lightweight containers (LXC, or just user/network namespaces + chroot).
There are two situations where running applications directly on the hypervisor might be beneficial though:
- If you have to run multiple untrusted (and possibly malicious) tasks on the same host. In this case virtualization provides better isolation/security than just containers or processes (which would be vulnerable to kernel exploits or other ways of gaining root inside a container, allowing an attacker to take control of the host)
- If the hypervisor is fixed and you can't touch it (e.g. Amazon EC2). In this case running applications directly on the hypervisor might eliminate some of the virtualization overhead.
For the first case, I'd rather use something like Google Native Client.
The attack surface of a VM is rather large, particularly when you consider the dozens of emulated devices and instructions. For example, there was a notable hole in SCSI emulation recently discovered which let any guest overwrite any part of a host's disk (by sending some obscure SCSI commands which weren't being filtered by the virtualized device properly).
> For the first case, I'd rather use something like Google Native Client.
Wait, you're saying Google NaCl could be going the way of Java -- being used for server side containers, rather than client side apps? (I don't really see why you'd want that over using a single process(tree)/binary compiled from Go...)
Oh, I see. Zerovm[1,2] is built on top of NaCl, and does indeed appear to be a "javaosification" of NaCl -- with no bytecode, no dedicated language, but a sandbox/lockdown of native code.
[1] https://news.ycombinator.com/item?id=3746222 [2] http://zerovm.org
The idea behind these systems is that there is no operating system any longer.
The programming language runtime is mapped to execute directly on top of the hypervisor.
There are already such systems for Erlang, OCaml, Haskell and Java runtimes.
If the programming language has a good library, there is no need for an operating system; hardware access can be taken care of by the hypervisor.
This leads to safer and faster servers running on virtual machines.
If hardware access is taken care of by the hypervisor, doesn't this imply the hypervisor supports the functionality of device drivers, i.e. the hypervisor has become the OS? Granted, it may delegate the device driver work to some VM, in which case we're dealing with a microkernel of sorts.
In a way yes, the hypervisor is the new OS, although a very thin one.
Some kind of microkernel, yes. With the set of required features for hardware abstraction and virtualization, with everything else being provided by the respective language runtime.
This is much faster than traditional operating system stacks, and may provide a way to bring microkernels into mainstream OSes.
When using programming languages that come with batteries included, meaning a good set of libraries for all the usual OS services, removing a few layers between hardware and application makes everything run faster and increases security due to a smaller amount of code.
The guys behind Microsoft's Singularity project are also researching something similar for Windows, a project named Drawbridge, where the full OS runs on a hypervisor as a set of libraries in user space.
Anyway, this is nothing new: the idea of a virtualized OS goes back to OS/360; it was just tucked away in the mainframe world and is now becoming mainstream.
When you sit in a container, you're limited to the libraries of the host OS, and there will still be some issues with migration and resource management (at large scale).
Yes. Everything old is new again, it seems. Many old OSes were VM hypervisors in today's parlance--- VM/370 comes to mind; several of the first Unix and other systems I got to play with were virtual machines running on IBM big iron--- and it was also reasonably common to run programs directly in the VM without a guest OS.
When I was in school, a common formalism for talking about operating system design was that they provided a virtual machine abstraction to each process. As time went on the "devices" looked less and less like actual hardware devices, so instead of having an emulated punchcard reader and emulated interrupts, you'd have a read(2) syscall. But that's a relatively unimportant detail.
And with paravirtualization the cycle continues: the abstraction the guest sees is no longer that of a bare metal machine, but one with an increasing number of syscall-like hypervisor calls...
I consider UNIX the C language runtime in a way. :)
If it's fast creation/destruction of VMs you want, for scale-out or other purposes, there's always ZeroVM. https://news.ycombinator.com/item?id=3746222
> 0.6 sec ago is when we received your request. Within this time we managed to create a new Xen instance, boot it, and run the application that rendered the page you are viewing. By the time you are done reading this, the instance will be gone.
So every time it receives a request, it launches a new Xen instance, processes the request, sends a response, and then kills the Xen instance?
And all that takes 600ms just to process a single request?
Why would anyone want to do that when I could just have a persistent JVM running on Linux to process requests in under 10ms?
What's the point of this?
> What's the point of this?
To show that it's possible.
There are certainly better use cases for that. Off the top of my head:
1) Per user VM. The first request for a logged in user will spawn a VM and then all requests for her will be routed there
2) Fine-grained elasticity control. Currently it's difficult to decide when to spawn new VMs, because it takes time to spawn a new one (and billing is computed hourly, but that's another issue), so it's complicated to decide whether to pump up the steam or wait a little bit longer. Having a sub-second startup time could help in creating better elasticity controls.
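The per-user VM idea (case 1) fits in a few lines of routing logic. This is a sketch under stated assumptions: the `spawn` callable and all names here are hypothetical stand-ins, not part of Erlang on Xen's actual API.

```python
class InstanceRouter:
    """Route each logged-in user to a dedicated instance, spawning it lazily."""

    def __init__(self, spawn):
        self.spawn = spawn        # callable that boots a fresh VM and returns a handle
        self.instances = {}       # user id -> live instance handle

    def route(self, user_id):
        # First request from this user: boot an instance just for them
        # (cheap if startup is sub-second, as claimed for Erlang on Xen).
        if user_id not in self.instances:
            self.instances[user_id] = self.spawn(user_id)
        # Every later request for this user goes to the same instance.
        return self.instances[user_id]


# Demo with a stand-in spawn function.
router = InstanceRouter(spawn=lambda uid: {"vm_for": uid})
first = router.route("alice")
second = router.route("alice")   # same handle, no second boot
```

In a real deployment the router would also reap idle instances; that lifecycle management is exactly the "work still needed in the management stack" mentioned above.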
How is Xen better in this context than a general-purpose OS kernel, be it Linux, BSD, or Illumos?
For a "truly elastic cloud", what one really needs is a homogeneous pool of host machines on which heterogeneous workloads can be quickly started and stopped. For this, I believe a general-purpose OS kernel is better, even if one dispenses with the usual accompanying userland, because such a kernel is already equipped to run more workloads directly.
Why should the pool of host machines be homogeneous? Just for ease of administration?
Because it follows a picokernel model: a hypervisor runs directly on top of the hardware to provide the minimal set of hardware integration features.
Everything else not required by your application just wastes hardware resources.
This allows more applications to run per physical machine.
" 0.6 sec
ago is when we received your request. Within this time we managed to create a new Xen instance, boot it, and run the application that rendered the page you are viewing. By the time you are done reading this, the instance will be gone. "
0.6 seconds for a basic page like that is a bit slow...
Presumably the idea is that you run a pool of them, so you're not spawning a new instance each time?
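One way such a pool could work: keep a few instances pre-booted, hand one to each request, and boot a replacement off the hot path. A rough sketch, with all names hypothetical:

```python
import queue

class WarmPool:
    """Keep a few instances pre-booted so requests never pay the cold-start cost."""

    def __init__(self, boot, size=4):
        self.boot = boot                    # callable that boots one instance
        self.ready = queue.SimpleQueue()    # FIFO of warm instances
        for _ in range(size):
            self.ready.put(boot())

    def handle(self, request):
        instance = self.ready.get()         # take a warm instance: no boot on the hot path
        self.ready.put(self.boot())         # refill; in practice this would run asynchronously
        return instance, request


# Demo with numbered stand-in instances.
counter = iter(range(1000))
pool = WarmPool(boot=lambda: f"vm-{next(counter)}", size=2)
instance, req = pool.handle("GET /")
```

With sub-second boots the pool can stay tiny, since refills finish before the next traffic spike arrives.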
I got the impression that you _are_ expected to spawn a new instance each time.
I think so too. But simple pages in this configuration just measure nginx performance (check the breakdown)—that's a good sign for the rest of the process.
My impression is that the fast loadup is for scaling upwards as needed. I am only expecting it to spawn a new piece if I am the first person whose request can't be answered as part of a wave of unexpected traffic... Why else would you want this?
Thanks for pointing it out, it gave me an excuse to look at the numbers:
It took 4.6 seconds when I loaded it. In the breakdown, ~4.3s of that was nginx dicking around (probably spending much of its time telling people it can't spawn an instance right now). The lightweight nature of the page is obviously exposing the nginx overhead, which is turning out to be a bottleneck. Based on all that: with more complex pages, a more realistic limit on instances, and similar resource utilization, I would expect it to be downright snappy.
fair enough. was just pointing it out as it is presented as if it is fast for a page load time. neat project though.
It is a good point: when the load time is measured in seconds, it doesn't leave one with the impression that it's fast. It would be better to see it serving with a more realistic configuration.
True, but the Erlang VM is not really written for fast startup; module loading is serial, not parallel. The key here is that you can scale up in 0.6 seconds, which is an awful lot faster than most other solutions, where scaling takes minutes.
BEAM (the "official" Erlang VM) takes much longer to start. Erlang on Xen has its own VM (written from scratch); it was designed specifically to run without an OS and to start very fast.
HalVM, meanwhile, boots in the 10ms range (like starting a GHC process). These can be very fine-grained and fast.
Previous discussion: https://news.ycombinator.com/item?id=5243360
Wow! Just Wow!