I don't understand your argument, people should do things they're bad at to keep some other company from getting too big? And oh by the way they'll now probably be using more and less sustainable power, but at least they'll be less secure. That sounds like a very high price to pay for ideological purity...
I frequently use the Basthon notebook https://notebook.basthon.fr/ which is also a wasm-powered Jupyter (based on pyodide), and I quite like it. It's flying a bit under the radar as it hasn't been translated into any language other than French. How does it compare to this project?
Just to throw in my 2c. I'm a big fan of jupyter notebooks for data visualisation and prototyping pipelines. Being able to run all of your data analysis code then run your visualisation code without having to repipe your data is great.
For other projects such as bots, webapps or regular programming I'll use vscode or neovim.
Which brings me to a neat feature in vscode that has me starting to do my data vis there. The `# %%` marker in Python files lets me run code blocks the same way as in a notebook, but I can also run the file top to bottom. It is starting to change the way I prototype.
Not to mention it can open and edit .ipynb files
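For anyone who hasn't seen the `# %%` convention, here is a minimal sketch of what such a file might look like. The data and variable names are made up for illustration; the only thing taken from the comment above is the `# %%` marker itself.

```python
# %% Load some data (each "# %%" marker starts a new runnable cell)
import statistics

samples = [2.0, 4.0, 4.0, 5.0, 7.0]  # hypothetical measurements

# %% Analyse it; this cell can be re-run alone while `samples` stays in memory
mean = statistics.mean(samples)
spread = statistics.pstdev(samples)

# %% Report
print(f"mean={mean:.2f} spread={spread:.2f}")
```

Because the markers are ordinary comments, the same file still runs top to bottom with a plain `python` invocation.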
How far are we away from having _collaborative_ Jupyter in the browser? Would love a Google Docs experience of sharing a no-sign up required link to help remotely teach basic python.
Livebook does this! It only runs Elixir for now, but there is an issue to add other languages in future. It’s a really cool project IMO. https://github.com/elixir-nx/livebook
Not that far, someone just needs to make a JupyterLab plugin that uses Automerge or a similar OT/CRDT structure for collaborative editing documents in a workspace (perhaps using WebRTC data channels for P2P sync between clients, or stick with the tried and true server model like Google docs). The trouble is turning that into something as polished and secure as Google Docs collaborative editing experience--there's a _lot_ of work to get there with tons of little corner cases, security issues (you're potentially giving strangers over the internet access to remotely run code in your browser--that should raise big alarm bells), etc. to think through. But the basic stuff is all out there for someone motivated to pick up and go wild with.
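To give a feel for the CRDT idea mentioned above, here is a toy last-writer-wins map in Python. This is only a sketch under my own assumptions: it is not Automerge's API and not how JupyterLab does it, and the `LWWMap` class, the cell keys, and the integer timestamps are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LWWMap:
    """Toy last-writer-wins map: each key stores (timestamp, replica_id, value)."""

    replica_id: str
    entries: dict = field(default_factory=dict)

    def set(self, key, value, timestamp):
        # A local write only wins if it is newer than what we already hold.
        candidate = (timestamp, self.replica_id, value)
        current = self.entries.get(key)
        if current is None or candidate[:2] > current[:2]:
            self.entries[key] = candidate

    def merge(self, other):
        # Merging keeps, per key, the entry with the highest
        # (timestamp, replica_id) pair, so merge order does not matter.
        for key, theirs in other.entries.items():
            mine = self.entries.get(key)
            if mine is None or theirs[:2] > mine[:2]:
                self.entries[key] = theirs

    def value(self, key):
        return self.entries[key][2]


# Two people edit the same notebook cell concurrently, then sync both ways.
alice = LWWMap("alice")
bob = LWWMap("bob")
alice.set("cell-1", "print('hi')", timestamp=1)
bob.set("cell-1", "print('hello')", timestamp=2)
bob.set("cell-2", "x = 1", timestamp=1)

alice.merge(bob)
bob.merge(alice)
print(alice.entries == bob.entries)
```

A real collaborative editor needs character-level merging rather than whole-cell replacement, which is where most of the corner cases mentioned above come from.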
This reuses almost all the RTC work done upstream in JupyterLab itself.
And since this is implemented as a regular JupyterLab plugin, folks will then be able to swap it for something else and implement their own if they want to, as a federated extension.
We offer a collaborative notebook experience on https://iko.ai.
It's our internal machine learning platform that solves a bunch of our problems. We initially started building iko for internal use because our projects took a toll on us.
- No-setup, fresh, notebook environments with the most popular libraries pre-installed.
- Real-time collaborative notebooks to see your teammates' changes live. Pair program, troubleshoot, and prototype together.
- Deploys your Streamlit application (we use this for prototypes to show clients so that a data scientist does not have to spin up a VM on GCP, install dependencies and environment, get the model, build the application, etc).
- Multiple notebook versions
- Leverage GPUs and schedule long-running notebooks that survive closed browsers and network disruptions. Watch your notebook's output as it runs from multiple devices.
- Automatic experiment tracking that detects your models, parameters, and metrics and saves them without you having to remember to do so or pollute your notebook with tracking code. Know which parameters produced which model on which data.
- Easily deploy your model and get a "REST endpoint" so data scientists don't tap on anyone's shoulder to deploy their model, and developers can use the models without being dragged into the ML realm. You also can invoke it by entering data or uploading a CSV file.
- Build a Docker image for your model and push it to a registry (DockerHub or GitLab for now) to use it wherever you want
- Monitor your models' performance on a live dashboard and know if your model is losing its predictive power.
- Publish notebooks as AppBooks: automatically parametrize a notebook to enable clients to interact with it without being overwhelmed by code, or exporting as PDF, or building an application, or mutating the notebook.
More on our roadmap. We're only focusing on actual problems we have faced serving our clients, and problems we are facing now. This is not a "startup idea"; we're building what we need but we'd love to hear your thoughts and problems you have faced we may not be familiar with.
No, read more of the README. This uses pyodide, a WASM port of desktop Python that runs entirely in the browser (obviously stuff like file access is sandboxed). In addition it adds a web worker that runs a JavaScript kernel powered by your browser's JS engine. All of this runs 100% in your browser, there is no server component at all.
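If you want to see this for yourself, a small sketch like the following can be pasted into a cell. It relies on the fact that Python builds compiled with Emscripten (which is what Pyodide is) report `sys.platform` as `"emscripten"`; the function names are my own.

```python
import sys


def running_in_wasm() -> bool:
    # Emscripten builds of CPython, such as Pyodide, report this platform string.
    return sys.platform == "emscripten"


def describe_runtime() -> str:
    if running_in_wasm():
        return "Python compiled to WebAssembly (e.g. Pyodide in a browser)"
    return f"native Python on {sys.platform}"


print(describe_runtime())
```

The same cell run against an ordinary server-backed kernel should report the server's native platform instead.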
This is _really_ cool. I don't know why a lot of commenters here are going into the weeds to grouse about Java, flash, and general anger at computational notebooks.
What we have here is a complete client-side browser environment for development. Not some half-assed language or hyper restricted toy--this is real Python, and your browser's full JS engine all available in JupyterLab's IDE (basically a simpler VS Code at this point, it uses the same editing component).
We all freaked out a bit as Apple drove out IDEs from their app store and Google locked down Termux and similar developer tools on Android. Well, here's the answer to those situations. Something no app store owner can kill on a whim. I love stuff like this and hope it helps to enable and inspire the next generation of developers.
Although Pyolite has miserable load performance (20MB of downloads), the overall project direction is correct.
I said this already 10 years ago: We don't need more cloud computing but need to empower users' end devices again. Jupyter is typically operated on powerful notebooks and not on mobile devices.
The solution I use for local jupyter notebooks is nteract [0] which is like a standalone application that can edit/open .ipynb files.
It has a few quirks, but works quite well for daily use.
[0] https://nteract.io/
What do you like about nteract over Jupyter's frontend? The website is devoid of details.
It's convenient to directly open a notebook from a browser download or a slack message without having to fire up jupyter in a terminal and navigate it to that temp file.
I use vscode to open and run notebooks using the jupyter plugin. Works great. Don’t need to fire up the browser
Vscode is now my go-to for opening and running notebooks. The issue I had with the normal jupyter was that its support for normal python files was bad. With vscode I can work on both normal python files and notebook files in the same window without much overhead.
Pycharm is also excellent at this. My biggest problem with the normal jupyter interface is years of reflex that Ctrl+w is for deleting words. Quite annoying in a web browser.
Yeah, that's exactly the same reasoning. Because it's more convenient you run your notebooks in a dedicated tool. Nteract or Vscode or Pycharm..
I like that it is a standalone notepad.exe-like program that opens my notebook and just works. I usually use it more for looking at notebooks than writing them tho.
vscodium can also open ipynb files, to give an alternative.
I try to work mostly with "py:percent" script files now, in the same jupyter notebook style but without saving outputs.
Similar to jupyterlite: https://starboard.gg/jupystar (and https://starboard.gg)
Isn't that just Java all over again, but this time with JavaScript?
No, WASM gets compiled to native code
Java bytecode got translated to native too
I think we've just got better at runtimes over time (JIT, intermediate formats, etc.), and WASM was designed to be good for this rather than needing to work with an already existing bytecode.
Reflection and other language features preclude direct translation of Java bytecode to machine code, whereas WASM is designed to be a portable assembly language, closer to the IL of GCC or LLVM.
java -XX:+PrintCompilation prints all methods/loops generated to native code
The reflection API has two issues, a security check each time you call a method and the arguments being transformed to objects. The code is still generated to assembly code but the assembly code is slower because of that overhead.
Java and Flash, but now it is Good (TM), because the powers that be decided so.
"Everything Old is New Again: Binary Security of WebAssembly"
https://www.usenix.org/conference/usenixsecurity20/presentat...
So, enjoy the 2nd coming of applets/flash,
https://platform.uno/
https://dotnet.microsoft.com/apps/aspnet/web-apps/blazor
https://tinygo.org/
.... favourite stack compiled into WASM.
There are crucial differences between Java applets and JS.
- Applets tried to render their own GUI, Wasm doesn't and defers to the browser.
- Applets needed a big, slow-to-start, resource-hungry VM. Wasm runs in the same thread your JS is running in; it's light, and loads faster than JS
- Java and Flash were plugins, which needed to be installed and kept up to date separately. Wasm is baked into your browser's JS engine
- Wasm code is very fast and can achieve near native execution speeds. It can make use of advanced optimisations. SIMD has shipped in Chrome, and will soon in Firefox
- The wasm spec is very, very good, and really quite small. This means that implementing it is comparatively cheap, and this should make it easy to see it implemented by different vendors.
- Java was just Java. Wasm can serve as a platform for any language. See my earlier point about the spec
So it's apples and oranges. The need to have something besides JS hasn't gone away, so their use cases might be similar. The two technologies couldn't be more distinct, though.
You must view the browser with JS and WASM as a unit.
The browser renders its own GUI too, it's not OS native
The browser uses lots of resources too.
The browser is kind of a plugin to the OS and must be updated separately.
Java nowadays is pretty fast too.
The Java VM serves as a platform for multiple languages like Scala, Kotlin, Clojure.
Let's face it, the browser is the new JVM, and as soon as it gets the same permissions as the JVM to access the file system and such, we get the same problems.
> as soon as it gets the same permissions as the JVM to access the file system and such
Like... never?
We get better systems as we get more experience. That's why C# was better than Java, and why Java today is better than Java was when C# launched. That's why we now have amazing languages like Rust, and it's also why the problems will never be quite the same, given that we have a ton of experience with VMs, Docker, sandboxing in browsers, etc.
From https://news.ycombinator.com/item?id=24052393 re: Starboard:
> https://developer.mozilla.org/en-US/docs/Web/Security/Subres... : "Subresource Integrity (SRI) is a security feature that enables browsers to verify that resources they fetch (for example, from a CDN) are delivered without unexpected manipulation. It works by allowing you to provide a cryptographic hash that a fetched resource must match."
> There's a new Native Filesystem API: "The new Native File System API allows web apps to read or save changes directly to files and folders on the user's device." https://web.dev/native-file-system/
> We'll need a way to grant specific URLs specific, limited amounts of storage.
[...]
> https://github.com/deathbeds/jyve/issues/46 :
> Would [Micromamba] and conda-forge build a WASM architecture target?
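For the SRI part quoted above: the integrity value is just the algorithm name, a dash, and the base64 of the raw digest of the file. A rough Python sketch of computing and checking one (the function names and the sample payload are mine; browsers do the actual check natively):

```python
import base64
import hashlib

ALLOWED = ("sha256", "sha384", "sha512")  # hash functions SRI accepts


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Build an integrity string such as 'sha384-<base64 digest>'."""
    if algorithm not in ALLOWED:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def matches(content: bytes, integrity: str) -> bool:
    """Re-hash the fetched bytes and compare with the expected integrity string."""
    algorithm = integrity.partition("-")[0]
    return sri_hash(content, algorithm) == integrity


payload = b"console.log('hello from a CDN');"  # hypothetical fetched resource
expected = sri_hash(payload)
print(matches(payload, expected))
print(matches(payload + b" // tampered", expected))
```

The resulting string is what would go into the `integrity` attribute of a `<script>` or `<link>` tag.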
> You must view the browser with JS and WASM as a unit
“Web” assembly is a bit of a misnomer. It’s an IR at the end of the day and can be run without a browser[1]. But your other points could be true one day if the de facto WASM runtime becomes bloated or decides to ship with some GUI renderer.
[1] https://github.com/bytecodealliance/wasmtime
> I said this already 10 years ago: We don't need more cloud computing but need to empower users' end devices again. Jupyter is typically operated on powerful notebooks and not on mobile devices.
If you’re working with data of any significant size at all, then it doesn’t matter how fast your user device is: it’s so much cheaper (in time and network egress costs) to send the computations from a user device to the cloud than to pull tens to thousands of GB of data to your local machine. Moreover, I don’t know of many local machines with tens of CPU cores, hundreds or thousands of GB of RAM, or tens to hundreds of TB of SSD for handling that computation quickly.
User devices are great for very small data, but I don’t see the point for larger datasets.
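To put rough numbers on the transfer-time part, here is a back-of-the-envelope sketch. The 1 Gbit/s link speed is purely an assumption for illustration, and it ignores protocol overhead and egress pricing entirely.

```python
def transfer_hours(dataset_gb: float, link_gbit_per_s: float) -> float:
    """Hours needed to pull a dataset over a link at full, sustained speed."""
    bits = dataset_gb * 8e9  # decimal gigabytes to bits
    seconds = bits / (link_gbit_per_s * 1e9)
    return seconds / 3600


# Assumed sizes and an assumed 1 Gbit/s connection.
for size_gb in (1, 100, 10_000):
    print(f"{size_gb:>6} GB over 1 Gbit/s: {transfer_hours(size_gb, 1.0):.2f} h")
```

At that assumed speed, 10 TB works out to roughly 22 hours of sustained download before any computation can start.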
Most users don't have 'tens-thousands' of GB of data as part of their use case. You're describing a business case, not an end user consumer case.
I don’t believe that the majority of Jupyterhub users are “consumers” rather than professionals, but more importantly, that doesn’t change the fact that the professional use case exists and isn’t amenable to the fat client approach.
As an anecdote, I work on a Jupyterhub managed service offering with customers in both the private and public sectors and our data sizes are pretty much all in this range.
> I work on a Jupyterhub managed service offering ...
That sounds like selection bias.
It might be. Like I said: anecdote.
Isn't it, though? The people most likely to need a cloud-managed service like that probably have too much data to crunch on a laptop, as you described.
Ah, to be clear I’m not saying our users are a representative sample of Jupyterhub users. I am saying that there are a lot of people who use Jupyterhub for large datasets—it’s certainly not uncommon.
Most users will not run Jupyter notebooks locally; they expect a remote kernel in the cloud with GPU/multicore machines, and only the web app will run in the browser (think of BigQuery). Of course, users with small datasets will run locally.
This depends greatly on what you want to do. After all, this has incredibly low latency (compared to the cloud) but limited throughput. Zero configuration. Free.
You want to train the next ImageNet model? Analyse a 100 GB database? Probably not the correct tradeoff.
You want 20 students to have a perfectly consistent instant-start-up python environment? Definitely the correct tradeoff.
You want to try some python methods, write tests, ... Very short latency is going to help you more than throughput is.
To be fair, "we don't need more cloud computing" doesn't mean, "we don't need any cloud computing".
I don't agree that we don't need more cloud; IMO we do. But we need to focus on personal computing much more than we do now, which is the general theme of the GP comment.
Not everything is big data.
To be clear, I’m not talking about big data. More importantly though, I didn’t say there was no place for client compute, only that it isn’t economical for datasets in excess of a few GB.
You definitely shouldn't be running this stuff from a Jupyter notebook.
Why not? We make a lot of money offering it to people.
Yes, the market in enabling people to do the "you shouldn't be doing this" stupid things is huge.
Any substance on why we shouldn’t be doing it or why it’s stupid? What’s the alternative? Should researchers all learn Kubernetes and AWS and deploy their own environments?
The problem with Jupyter is that it impedes common-sense practices like version control, reproducibility, and automation.
If you're spending the time and effort to rent these big servers, why not spend the 5 percent of the effort and do it right?
Jupyter exists mostly because analytics/math guys are too lazy to spend a day learning software development practices. Must be some sort of us-vs-them point of misplaced pride.
> Jupyter exists mostly because analytics/math guys are too lazy to spend a day learning software development practices
Bahahahaha, yup those dang lazy mathematicians just shooting themselves in the foot and forcing us to deal with it! /s
Your preferences for software development are irrelevant. The value is in delivering the math to the end user. Using GCP, kubernetes or JavaScript to do that is an implementation detail. Sorry to tell you this, but you're a servant to those dang lazy analysts and without their insights, you're worthless.
You misunderstood.
The guy above correctly said that Jupyter is nice for ad-hoc analysis kind of work. The problem is that when you've reached terabytes of data and tens of cores you're not "ad-hoc" anymore.
Too often the math guys try to avoid responsibility by claiming they're doing "ad-hoc" work when they're clearly not anymore. It's convenient, yes, but leads to a bad place eventually.
I understood fine. The problem is trying to formalize "ad hoc" versus "production development" practices as if they're meaningful.
There isn't some golden truth of software development that analysts are too lazy to learn and implement. There's the problem and then there are solutions. Complaining that Jupyter-based development doesn't adequately accommodate version control or some other whistle commonly used in software development is some peak developer entitlement.
Your boss will eventually ask to make your "ad-hoc" stuff "production".
At which point you'll dump the hot mess in somebody else's lap, and the whole thing will be rewritten from scratch.
If that's your thing, then go for it.
That’s fine. Why waste time productionizing something that may or may not ever go to production? We do this all the time. Do you think Tesla built a whole fully-automated factory to build its first proof-of-concept car?
> The problem is that when you've reached terabytes of data and tens of cores you're not "ad-hoc" anymore.
Not in any meaningful way, the code to explore a 50MB data set on a little machine looks the same as the code to explore a 50GB data set on a big machine, so of course one doesn’t need version control more than the other.
> Too often the math guys try to avoid responsibility by claiming they're doing "ad-hoc" work when they're clearly not anymore. It's convenient, yes, but leads to a bad place eventually.
This is a moralistic argument. The economic argument is that the primary artifact of research is insight, not code, so getting to that novel insight as fast as possible is paramount. Putting version control, tests, or other ceremony into the exploration loop is a pointless cost. Productionizing that insight can happen later in a more traditional software development workflow. It’s similar to how we write proofs of concept without the intensive testing effort that we would go into if we were writing production software.
> The problem is that when you've reached terabytes of data and tens of cores you're not "ad-hoc" anymore.
Oh, yes you are. "Ad-hoc" doesn't imply small. It implies one-off exploration and creation. You have no less of a need to do that with terabytes of data than you do with megabytes.
I think that's a misguided take. I'm a software developer and data scientist (of sorts). Jupyter is an extremely convenient tool for ad hoc data analysis. By default it gives you easy visualizations and the ability to inspect state, so you can verify intermediate computations before rerunning.
It's easy to convert a notebook into a script with version control, reproducibility, and automation.
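To make the "easy to convert" point concrete, here is a minimal sketch using only the standard library. It relies on the fact that an .ipynb file is plain JSON with a `cells` list; the function name `notebook_to_script` and the tiny inline notebook are made up for illustration. In practice `jupyter nbconvert --to script notebook.ipynb` does this job and more.

```python
import json


def notebook_to_script(nb_json: str) -> str:
    """Return the code cells of an .ipynb document as one Python script."""
    nb = json.loads(nb_json)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():
            # Keep a "# %%" marker so editors can still see cell boundaries.
            chunks.append("# %%\n" + source.rstrip() + "\n")
    return "\n".join(chunks)


# Hypothetical hand-written notebook, just enough to exercise the function.
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": ["x = 1\n", "y = x + 1"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": "print(y)"},
    ],
})

script = notebook_to_script(example)
print(script)
```

The resulting file is ordinary text, so it diffs cleanly in git. Cells that use IPython magics (lines starting with `%` or `!`) would need handling that this sketch does not attempt.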
I am shocked at your unjustified and rather arrogant gatekeeping here.
Version-controlled Jupyter notebooks running in an automated environment (e.g. Kubernetes), with repeatable test data loading into a processing environment (e.g. MinIO or Spark), are quite commonplace. Making it even easier with WASM makes sense.
What about Jupyter impedes good practices? That it empowers ad hoc exploration at all? It is merely an IDE tailored to sharing interactive text and code. To me it is one of the most exciting ecosystems for modern software development (and I’ve been developing software for 30 years).
Not only do cloud services offer better compute capabilities (and GPUs/TPUs etc.), but they offer easier reproducibility and sharing. Even when I hack on stuff myself, Colab is quick and easy to set up, no worrying about Docker or virtualenvs.
Unfortunately we will need more cloud computing. If you're watching what's going on in the ransomware and cyber insurance space, small and many medium-sized companies that require E&O coverage for their contracts are not going to be able to afford to run on their own equipment.
Good. People who are bad at administering computers will stop doing it, and will focus on what they're good at.
Then, we can use the on-demand nature of cloud services to reduce their power consumption. Simultaneously we can move that consumption into renewable-powered datacenters. This is literally better for everyone.
Meanwhile we create tremendous concentration risk and the world pays rent to Amazon, Google and Microsoft? I wouldn't call that 'good'.
I don't understand your argument. People should do things they're bad at to keep some other company from getting too big? And oh by the way they'll now probably be using more power, from less sustainable sources, and they'll be less secure on top of it. That sounds like a very high price to pay for ideological purity...
I frequently use the Basthon notebook https://notebook.basthon.fr/ which is also a WASM-powered Jupyter (based on Pyodide), and I quite like it. It's flying a bit under the radar as it hasn't been translated into any language other than French. How does it compare to this project?
Just to throw in my 2c. I'm a big fan of jupyter notebooks for data visualisation and prototyping pipelines. Being able to run all of your data analysis code then run your visualisation code without having to repipe your data is great.
For other projects such as bots, webapps or regular programming I'll use vscode or neovim.
Which brings me to a neat feature in vscode that has me starting to do my data vis there. The `# %%` marker in Python files lets me run code blocks in the same way as a notebook, but then I can run the file top to bottom as well. It is starting to change the way I prototype. Not to mention it can open and edit .ipynb files.
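For anyone who hasn't seen the `# %%` convention, a small sketch of what such a file looks like. The data and the `summarize` helper are invented for the example; the only thing taken from the comment above is the marker itself. Each `# %%` line starts a cell that the editor can run on its own, and since the markers are just comments, `python file.py` still runs everything top to bottom.

```python
# %% Load (or fake) some data
import statistics

samples = [2.0, 4.0, 4.0, 5.0, 7.0]

# %% Compute summary statistics
def summarize(values):
    """Return a few descriptive numbers for a list of floats."""
    return {
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }

summary = summarize(samples)

# %% Inspect the result
print(summary)
```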
How far away are we from having _collaborative_ Jupyter in the browser? Would love a Google Docs experience of sharing a no-sign-up-required link to help remotely teach basic Python.
Livebook does this! It only runs Elixir for now, but there is an issue to add other languages in future. It’s a really cool project IMO. https://github.com/elixir-nx/livebook
Not that far. Someone just needs to make a JupyterLab plugin that uses Automerge or a similar OT/CRDT structure for collaboratively editing documents in a workspace (perhaps using WebRTC data channels for P2P sync between clients, or sticking with the tried and true server model like Google Docs). The trouble is turning that into something as polished and secure as Google Docs' collaborative editing experience--there's a _lot_ of work to get there, with tons of little corner cases and security issues to think through (you're potentially giving strangers over the internet access to remotely run code in your browser--that should raise big alarm bells). But the basic stuff is all out there for someone motivated to pick up and go wild with.
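To give a feel for what "a CRDT structure" means here, a toy sketch of about the simplest one: a last-writer-wins map from cell id to cell source. This is not how Automerge or JupyterLab's RTC work internally (those merge edits at a much finer grain), and the `LWWNotebook` class and its methods are invented for illustration. The point is only that each replica can edit locally and merge in any order, and both end up in the same state.

```python
from dataclasses import dataclass, field


@dataclass
class LWWNotebook:
    """Toy last-writer-wins map: cell_id -> (clock, replica_id, source)."""
    replica_id: str
    clock: int = 0
    cells: dict = field(default_factory=dict)

    def set_cell(self, cell_id: str, source: str) -> None:
        # Tag every local write with a logical timestamp.
        self.clock += 1
        self.cells[cell_id] = (self.clock, self.replica_id, source)

    def merge(self, other: "LWWNotebook") -> None:
        # Keep, per cell, the entry with the larger (clock, replica_id) pair.
        # Ties on the clock are broken by replica id so all replicas agree.
        for cell_id, entry in other.cells.items():
            mine = self.cells.get(cell_id)
            if mine is None or entry[:2] > mine[:2]:
                self.cells[cell_id] = entry
        self.clock = max(self.clock, other.clock)

    def sources(self) -> dict:
        return {cell_id: entry[2] for cell_id, entry in self.cells.items()}


alice = LWWNotebook("alice")
bob = LWWNotebook("bob")

alice.set_cell("c1", "x = 1")     # concurrent edit of the same cell
bob.set_cell("c1", "x = 2")
bob.set_cell("c2", "print(x)")    # a cell only bob has

alice.merge(bob)
bob.merge(alice)
print(alice.sources() == bob.sources())
```

Note the cost of this design: one of the two concurrent edits to `c1` is silently dropped. That is exactly the kind of corner case that makes a polished editing experience hard, and why real implementations use character-level or operation-level structures instead.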
There is a PR to add initial support for real time collaboration in JupyterLite: https://github.com/jtpio/jupyterlite/pull/109
This reuses almost all the RTC work done upstream in JupyterLab itself.
And since this is implemented as a regular JupyterLab plugin, folks will then be able to swap it for something else and implement their own if they want to, as a federated extension.
JupyterLab does support real-time collaborative editing in the browser, as of a few weeks ago.
https://github.com/jupyterlab/rtc
Doesn’t fulfill your wish for a no-sign-up link yet, but https://hex.tech supports live collaboration for Python notebooks.
I think as time goes on, live collaboration is becoming “table stakes” for a lot of tools.
We offer a collaborative notebook experience on https://iko.ai.
It's our internal machine learning platform that solves a bunch of our problems. We initially started building iko for internal use because our projects took a toll on us.
- No-setup, fresh, notebook environments with the most popular libraries pre-installed.
- Real-time collaborative notebooks to see your teammates' changes live. Pair program, troubleshoot, and prototype together.
- Deploys your Streamlit application (we use this for prototypes to show clients so that a data scientist does not have to spin up a VM on GCP, install dependencies and environment, get the model, build the application, etc).
- Multiple notebook versions
- Leverage GPUs and schedule long-running notebooks that survive closed browsers and network disruptions. Watch your notebook's output as it runs from multiple devices.
- Automatic experiment tracking that detects your models, parameters, and metrics and saves them without you having to remember to do so or pollute your notebook with tracking code. Know which parameters produced which model on which data.
- Easily deploy your model and get a "REST endpoint" so data scientists don't tap on anyone's shoulder to deploy their model, and developers can use the models without being dragged into the ML realm. You also can invoke it by entering data or uploading a CSV file.
- Build a Docker image for your model and push it to a registry (DockerHub or GitLab for now) to use it wherever you want
- Monitor your models' performance on a live dashboard and know if your model is losing its predictive power.
- Publish notebooks as AppBooks: automatically parametrize a notebook to enable clients to interact with it without being overwhelmed by code, or exporting as PDF, or building an application, or mutating the notebook.
More on our roadmap. We're only focusing on actual problems we have faced serving our clients, and problems we are facing now. This is not a "startup idea"; we're building what we need but we'd love to hear your thoughts and problems you have faced we may not be familiar with.
The kernels will still run in the cloud, I believe. It would be great if Jupyter worked better for larger programs and UI development.
No, read more of the README. This uses Pyodide, a WASM port of desktop Python that runs entirely in the browser (obviously stuff like file access is sandboxed). In addition, it adds a web worker that runs a JavaScript kernel powered by your browser's JS engine. All of this runs 100% in your browser; there is no server component at all.
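If you want to check this from inside a notebook cell, a small sketch: Pyodide is built with Emscripten, so the interpreter reports `emscripten` as its platform. The helper names `running_in_browser` and `describe_runtime` are made up for this example.

```python
import sys


def running_in_browser() -> bool:
    """True when this interpreter is an Emscripten build such as Pyodide."""
    return sys.platform == "emscripten"


def describe_runtime() -> str:
    where = (
        "in the browser via WebAssembly"
        if running_in_browser()
        else "on a regular desktop or server interpreter"
    )
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return f"Python {version} running {where}"


print(describe_runtime())
```

Run in a Pyodide-backed kernel this should take the browser branch; run the same cell against a normal Jupyter server and it takes the other one.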