narush 2 years ago

Hey everyone. Mito cofounder here. Thanks to whoever posted this - was a real surprise to find it here :-)

Mito (pronounced my-toe) was born out of our personal experience with spreadsheets, and a previous (failed) spreadsheet version control product.

Spreadsheets were the original killer app for computers, and are the most popular programming language used worldwide today. That being said, spreadsheets have some growing to do! They don’t handle large datasets well, they don’t lead to repeatable or auditable processes, and generally they disrespect many of the hard-won software engineering principles that we engineers fight for.

More than that, as spreadsheet users run into these problems and turn to Python to solve them, they struggle to use pandas to accomplish what would have been two clicks in a spreadsheet. Pandas is great, but the syntax is not always so obvious (nor is learning to program in the first place!)

Mito is our first step in addressing these problems. Take any dataframe, edit it like a spreadsheet, and generate code that corresponds to those edits. You can then take this Python code and use it in other scripts, send it to your colleagues, or just rerun it.
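
In a notebook, that flow looks roughly like this (a minimal sketch with a made-up dataframe):

```
import pandas as pd
import mitosheet

df = pd.DataFrame({"product": ["A", "B", "C"], "sales": [100, 250, 175]})

# opens the Mito spreadsheet on this dataframe; the equivalent pandas code
# for each edit is generated below the sheet as you work
mitosheet.sheet(df)
```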

We’ve been working on Mito for over a year now. Growth has really picked up in the past few months - and we’ve begun working with larger companies to help accelerate their transition to Python.

To any companies who are somewhere in that Python transition process - please do reach out - we would love to see if we can be helpful for all your spreadsheet users!

Feel free to browse my profile for other spreadsheet related thoughts, I’m a bit of a HN junkie. Of course, any and all feedback (positive or negative) is appreciated.

My cofounders and I will be trolling about in the comments. Say hey! :-)

  • aarondia 2 years ago

    Heyo! Another co-founder here. Excited to see Mito on HN :) Thanks @alefnula for posting!

    +1 to everything @narush said.

    It's important to us that the software we build is empowering to users and not restrictive. This plays out in two primary ways: 1) Since Mito is open source and generates Python code for every edit, Mito doesn't lock users into a 'Mito ecosystem'; instead, it helps users interact with the powerful & robust Python ecosystem. 2) Because Mito is an extension to Jupyter Notebooks + JupyterLab, Mito improves your existing workflows instead of completely altering your data analytics stack.

    Excited to interact with you all in the comments :)

    • kite_and_code 2 years ago

      Can you please clarify what you mean by "mito is open-source"?

      Last time I checked the code was under a proprietary license.

      Edit: I found in another comment below that mito is now available under GPL license here: https://github.com/mito-ds/monorepo/blob/dev/LICENSE

      Edit2: Just saw your answer now - thanks for the clarification and links!

      • aarondia 2 years ago

        Mito is licensed [1] under the AGPL license. The TL;DR of the license is that you can use, distribute, and modify Mito for free, but any modifications that you make need to be shared back with the Mito community.

        There is an additional version of Mito, Mito Pro, which is licensed under a different license and provides access to advanced functionality only if you are paying for a Mito Pro / Enterprise subscription.

        [1] https://github.com/mito-ds/monorepo/blob/dev/LICENSE [2] https://github.com/mito-ds/monorepo/blob/dev/mitosheet/src/p...

        • teruakohatu 2 years ago

          Does AGPL mean it can only be used in a notebook for which the notebook itself is open source?

          Or does it mean it can only be used with notebook software (eg. Jupyter) that is open source but in a closed source notebook?

  • kite_and_code 2 years ago

    If you are a large company trying to migrate to Python, you might also want to have a look at bamboolib.com which was acquired by Databricks.

    bamboolib is very similar to mito (hard to tell who was first).

    The advantage is that it runs within Databricks, which makes it easy to scale to any amount of data, and Databricks has many (and growing) security certifications, e.g. HIPAA compliance.

    bamboolib can be used in plain Jupyter. Also, the bamboolib private preview within Databricks is about to start within the next few days.

    Full disclosure: I am a co-founder of bamboolib and employed by Databricks

    • NoImmatureAdHom 2 years ago

      bamboolib appears to be closed-source. You're at their mercy.

      • kite_and_code 2 years ago

        bamboolib co-founder here:

        It's correct that bamboolib is (still) closed-source (which might be subject to change but I don't make promises).

        It's also correct that customers can extend the bamboolib UI in various ways via plugins that they can author themselves. That empowers them to build bamboolib into the kind of tool that they want.

        Also, all the code is always exported and thus, there is at least no "code lockin".

        Regarding being "at their mercy", I can say that there are many customers who are happy with the service that we provide.

        • NoImmatureAdHom 2 years ago

          I'm sure you have good intentions, but the fact of the matter is the company may be acquired or the people replaced, and those intentions might change.

          IMHO investing in a closed-source product like bamboolib as a tool for an important business function is very risky. Imagine you're a small company, and you start using bamboolib for some part of your data analysis pipeline. Bamboolib gets acquired (you have exited kite_and_code, congratulations), and the now very large company that controls it decides to stop supporting some feature critical to what you're doing, make an addition that messes everything up, go full-on SaaS somehow, or just shut the product down. What now? You've been growing, so you've got a small team of junior non-experts who were getting the hang of it...switching will be painful (or you could lock yourself in that walled garden and pay the SaaS price...).

          • kite_and_code 2 years ago

            Fair points.

            I guess in this specific case, companies can switch between bamboolib, mito, and dtale, and it is less likely that all of them will become unavailable at the same time. The switch is also not so hard because there are no underlying proprietary file formats involved (except for bamboolib plugins): the generated code is plain pandas, plotly, etc.

            Similarly, as described below/above: counter-intuitively, the availability of open-source LibreCalc makes it easier and safer to adopt closed-source Excel.

          • nojito 2 years ago

            Excel is closed source and it powers the world.

            • wanderingmind 2 years ago

              If MS shuts down, there are better FOSS tools that can process excel files (Librecalc), or in general the entire office ecosystem. Can't say the same for small startups.

          • kurupt213 2 years ago

            This is a flawed way of looking at things.

  • lcrmorin 2 years ago

    Hey, a bit late to the party (HN newsletter crowd). This really seems like something my BigCorp could use. I am on holiday right now, so I won't fire up my computer to try it. But I was wondering: does it allow easy copy-pasting of the table into standard MS documents (Word? Outlook mails?)

santiagobasulto 2 years ago

I like this. It's a "friendlier" way to browse data. That said, I have to add:

Exploring large datasets requires a COMPLETELY different mindset. When your data starts growing, it's impossible to keep it all in a visual format (for 2 reasons[0]) and you have to start thinking analytically. You have to start looking at the statistical values of your data to understand its shape. That's why the `.describe()` and `.info()` methods in Pandas are so useful. After many years doing this, I can "see" the shape of my data just by looking at the statistical information about it (mean, median, std, min, max, etc.).
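
Concretely, that first pass looks something like this (a minimal sketch, with a hypothetical data.csv):

```
import pandas as pd

df = pd.read_csv("data.csv")

df.info()       # dtypes, non-null counts, memory usage
df.describe()   # count, mean, std, min, quartiles, max for numeric columns
```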

After some time you don't need to rely on visual tools; you can just run a few methods, look at some numbers, and understand all your data. It kinda feels like being the operator in The Matrix who looks at the green numbers descending and knows what's going on behind the scenes.

[0] Your eyes are really inefficient at capturing information and there's only so much memory available: try loading a 15GB CSV in Excel.

  • wenc 2 years ago

    I would caution against this approach in general (unless you’re working with unusually uniform data from a deterministic source — in my world that is rarely the case). Summary statistics are useful, but taken in isolation they can mislead. One loses the ability to get a feel for interesting non-aggregated phenomena.

    I find it’s important to actually “touch” the raw data, even if only in a buffered, random-sampling sort of way, to get a feel for it. Sometimes with big datasets, looking through rows of data feels tedious and meaningless, but I’ve found that I’ve often picked up on things I wouldn’t have without actually looking at the raw data. Raw data is often flawed, but there’s often some signal in it that tells a story, hence it’s important not to overlook these things through a lens of aggregate statistics.

    The next step is to visualize the data multidimensionally in something like Tableau. Tableau works on very large datasets (it has an internal columnstore format called Hyper) and can dynamically disaggregate and drill down. Insights are usually obtained by looking at details, not aggregates.

    • kite_and_code 2 years ago

      If you want to use open-source Python-based visualizations instead of Tableau, the following tools allow the creation of custom plots - including the ability to export the underlying code.

      - bamboolib (proprietary license - acquired by Databricks in order to run within the Databricks notebooks)

      - mito (GPL license)

      - dtale (MIT license)

      • pea 2 years ago

        If you can write visualisations in Python itself, I am a big fan of Altair's syntax (https://github.com/altair-viz/altair), which is based on vega-lite. A while back, I wrote a brief guide and comparison of the main plotting libraries: https://datapane.com/reports/87NNEJ7/the-ultimate-guide-to-p...

        One benefit of having them in actual code is that you can programmatically automate the creation of things like dashboards and reports. For instance, schedule a script to share an interactive plot every Monday morning, or build a live dashboard that updates every 10 minutes. This opens up a lot of possibilities that would be impossible in a traditional drag-and-drop tool.
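
        As a minimal sketch (hypothetical dataframe and column names), an Altair chart you could schedule from a script might look like this:

        ```
        import altair as alt
        import pandas as pd

        df = pd.DataFrame({"date": pd.date_range("2022-01-01", periods=30),
                           "value": range(30)})

        chart = alt.Chart(df).mark_line().encode(x="date:T", y="value:Q")

        # save as a standalone interactive HTML file, e.g. from a Monday-morning job
        chart.save("report.html")
        ```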

        • aarondia 2 years ago

          > programmatically automate the creation of things like dashboards and reports.

          That's an awesome use case for Python, and that sort of script generation is one of the main reasons that we see people adopting Python/Mito. And specifically, graphing[1] is one of the most popular features in Mito.

          Mito generates Plotly [2] graphs, and of course generates the Plotly graph code too, so you can customize the graphs to your perfect liking (Plotly has great documentation and a lot of customizations) or schedule the script to run automatically.

          [1] https://docs.trymito.io/how-to/graphing [2] https://plotly.com/
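
          To give a sense of it, customizing a Plotly Express graph looks roughly like this (a sketch with hypothetical column names, not Mito's exact generated output):

          ```
          import plotly.express as px
          import pandas as pd

          df = pd.read_csv("sales.csv")

          fig = px.histogram(df, x="region", y="revenue", histfunc="sum")
          fig.update_layout(title="Revenue by region")  # tweak beyond what the UI offers
          fig.show()
          ```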

        • kite_and_code 2 years ago

          Thanks for mentioning Altair. I am personally also a big fan.

          I am one of the co-founders of bamboolib and we are actively thinking about adding support for altair to the Plot Creator (instead of just relying on Plotly).

          Since we are talking about other viz options in Python, there are of course also matplotlib, seaborn, plotly, and more.

    • santiagobasulto 2 years ago

      Of course `.head()`, `.tail()`, `.iloc` and other mechanisms to visualize subsets of the data are always important. But would you really caution AGAINST this? Like, literally telling someone NOT to use summary statistics to explore a dataset?

      • wenc 2 years ago

        No, I’m more cautioning against using summary statistics in isolation without looking at the raw data.

        I was more responding to the statement that one can “see” the shape of data through them and not need visual tools. The lens of summary statistics is a very narrow one — it’s a necessary but almost always insufficient one. Even .ilocs are insufficient — it’s hard to know what to .iloc for. One really needs to browse the data interactively to get a good sense of it.

        • santiagobasulto 2 years ago

          Ah, ok. Sorry, I misunderstood. Yes, we’re on the same page. As usual, a good balance is necessary.

  • aarondia 2 years ago

    This is a great point and something that we're actively working on improving in Mito. If you have millions of rows of data, it's not enough to just scroll through your data; you need tools to build your understanding.

    Some of the tools that you mentioned exist in Mito today. For example, Mito generates summary information about each column (all of the .describe() info along with a histogram of the data). And we're creating features for gaining a global understanding of the data too.

    In practice, one of the main ways that we see people use Mito is for that initial exploration of the data. Often the first thing that users do when they import data into Mito is to correct the column dtype, delete columns that are irrelevant to their analysis, and filter out/replace missing values.

    • pbronez 2 years ago

      It would be super fun to implement an intelligent head() function that shows a representative sample rather than the first X rows. Do the profiling & identify a collection of rows that represent the overall distribution.

      You could develop some IP around efficient and effective ways to do this. Probably would require an ensemble of unsupervised methods.
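
      One toy approximation (a sketch only, nowhere near the ensemble of unsupervised methods this would really take):

      ```
      import pandas as pd

      def representative_head(df, n=10, col=None):
          """Return n rows spread across the distribution of one numeric column,
          rather than just the first n rows."""
          col = col or df.select_dtypes("number").columns[0]
          quantiles = df[col].quantile([i / (n - 1) for i in range(n)])
          idx = [(df[col] - q).abs().idxmin() for q in quantiles]
          return df.loc[pd.unique(idx)]
      ```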

      • aarondia 2 years ago

        That's a cool idea! One helpful .head() variant could surface the rows with the most unusual data types. It could help you identify which columns have mixed dtypes: mostly numbers, plus some cells that are supposed to be numbers but are actually strings because of additional decimal points.
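
        A quick way to spot those mixed-dtype columns today (a sketch with a hypothetical column name):

        ```
        # counts how many cells of each Python type a column holds
        df["amount"].map(type).value_counts()
        # e.g. mostly float, plus a handful of str entries that should be numbers
        ```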

  • CJefferson 2 years ago

    I find the world is full of datasets with < 200 datapoints, and that is where Excel (in my experience) is great. With such datasets it often makes sense to look through the data at particular outliers.

    Also, even with huge datasets I tend to always look at a random sample, and the "most extreme" datapoints -- mainly because in my experience there is a good chance some parts of the data are malformed, and need to be recollected/fixed. Of course, if you trust your data collection you don't need this!

    • kite_and_code 2 years ago

      +1 - this is also how I operated as a Data Scientist myself

  • awild 2 years ago

    > try loading a 15GB CSV in Excel.

    Or visualising it in R or pandas without meaningful subsampling.

    • pea 2 years ago

      One cool library I saw recently for helping on the visualisation side is https://github.com/vegafusion/vegafusion

      It allows you to use Altair in Python for visualising data, but does the computation in the backend using Arrow DataFusion. Not for 15GB perhaps, but cool nonetheless.

    • kurupt213 2 years ago

      I have an Excel template for handling a relatively large amount of data. Nowhere near 15GB on one sheet. I use it for preprocessing experimental data from a single experiment. There are about 10 chart tabs built in so I can visually inspect the data looking for errors (and go back and inspect the raw instrument data when something looks off).

      The aggregate data is around 1.5 million experimental results. MiniTab is too unwieldy and requires too much manual reformatting of the data sheets.

      Is this something I should be looking at in R or project Jupyter? Does one make better visualizations than the other?

      • awild 2 years ago

        Ggplot is extremely powerful if you can grok its grammar, which takes some getting used to. But I'd assume that if you see a graph in a scientific paper it's made with ggplot.

        Having many data points you want to explore you are always going to be at the edges of what your hardware and software can produce.

        The last really big datasets I worked with were for my thesis, and I had to subsample to below 10% to get results within 10 minutes or so - and that was basically plotting MIDI recordings of piano performances, so nothing gigantic.

  • kurupt213 2 years ago

    In all seriousness, Excel can't be the right option for 15GB of alphanumeric data (on one sheet?)

  • mint2 2 years ago

    Do you as a rule look at a sample of the individual raw data, non aggregated?

    • santiagobasulto 2 years ago

      Usually aggregated... then I can start looking at "subsets". For example, step 1 is to look at the whole dataset. Then you identify that there are a lot of rows with a certain type of missing value, so you look at the statistical attributes of that subset (all the rows where column X is null).

      From time to time you can do a `.head()`/`.tail()` or an `.iloc[X:Y]` to check some things visually. But just as a "refresher".
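
      In pandas terms, that loop looks something like this (a sketch with a hypothetical column name):

      ```
      df.describe()                  # aggregate view of the whole dataset

      subset = df[df["X"].isna()]    # drill into the rows where column "X" is null
      subset.describe()

      subset.head()                  # quick visual "refresher" on a few raw rows
      ```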

      • aarondia 2 years ago

        This sort of bouncing back and forth between the aggregate and the raw data is something that Mito is really great at. To view aggregate info, users tend to either look at graphs or pivot tables of their data in Mito. They use that aggregate view to identify subsets that need some further investigation/cleaning/transforming. And then they filter down to that subset, make the correction, and use the aggregate view again to see the results.

        Practically, this just looks like moving between two tabs in the spreadsheet!

        Something that we don't support right now, but would love to support in the future is cross-filtering. It would be a powerful/easy way of supporting that back and forth workflow.

jpn 2 years ago

I played around with many of these before:

- https://github.com/quantopian/qgrid

- https://github.com/man-group/dtale

I find that I'm actually a lot faster using basic Pandas methods to get the data I want in exactly the form I want it.

If I really want to show everything, I just use:

```
with pd.option_context('display.max_rows', None):
    print(df)
```
  • Foivos 2 years ago

    I use a similar function when I want to see everything:

    ```
    def showAllRows(dataframeToShow):
        with pd.option_context('display.max_rows', None, 'display.max_columns', None):
            display(dataframeToShow)

    # calling it while limiting the number of returned rows
    showAllRows(df.head(1000))
    ```

    Be warned though! If you call this function without limiting the number of rows to be fetched, it is guaranteed to crash your machine. Always use head, sample, or slices.

    If you do get a crash, then your only option is to open the ipynb file with vi and manually delete the millions of lines this function created.

    Another function that I like is:

    ```
    def showColumns(df, substring):
        # return all column names that contain the substring
        return [x for x in df.columns if substring in x]

    # calling it
    showColumns(df, "year")
    ```

    This is useful in data frames with many columns, when you want to find all the columns that have a specific string in their name. It returns a list of column names, which you can then pass to the dataframe to print only those columns.

  • dekhn 2 years ago

    what irks me about dtale is if you scroll with the vertical slider, it can't update the view fast enough until you stop scrolling.

harabat 2 years ago

For those who are going through the thread finding new tools: pandas-profiling[0] is a library for automatic EDA (which bamboolib[1], mentioned elsewhere, also does).

[0]: https://github.com/pandas-profiling/pandas-profiling [1]: https://bamboolib.com/
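
Basic usage is roughly this (a sketch; package and API names as of the pandas-profiling docs):

```
import pandas as pd
from pandas_profiling import ProfileReport

df = pd.read_csv("data.csv")

profile = ProfileReport(df, title="EDA report")
profile.to_file("report.html")   # standalone HTML report
# or, inside a notebook: profile.to_notebook_iframe()
```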

  • kite_and_code 2 years ago

    Lux might also be interesting: https://github.com/lux-org/lux

    • narush 2 years ago

      Def check these all out! Lots of cool tools out there. For anyone who's tried a bunch of these... that's a great topic for a Medium post :)

  • MaxDPS 2 years ago

    I just found out about pandas-profiling a couple days ago and the examples blew my mind, it looks amazing (I’ve yet to actually try it out though).

rcarmo 2 years ago

The telemetry thing is... weird. So we can use it for free but have no way to turn it off other than upgrading to paid?

  • aarondia 2 years ago

    Thanks for that feedback. Mito's approach to telemetry is that we never log any of your data or metadata about your data. We don't track things like the size, shape, or content of your data.

    We do collect info about app usage, things like which buttons users click. This allows us to focus development time on improving the features that are used most often.

    That being said, it's important to us that there is a way to be totally telemetry-free if users don't want any information to leave their computer. Compared to most other cloud-based SaaS data science tools, where you pretty much have no hope of total privacy, we're proud of the flexibility that we offer.

    But of course, we're always open to feedback about how we can continue to improve our practices!

    • learndeeply 2 years ago

      I don't get it. What in the license prevents users from removing the telemetry? AGPL just means the user needs to open source that change, right?

      Edit: To remove telemetry, just call:

         from mitoinstaller.user_install import go_pro; go_pro();
      
      No licensing or payment required, and doesn't violate the license.
      • narush 2 years ago

        Mito is open source, but using Pro features does actually require a Pro or enterprise license. You can check out this callout in the license [1], as well as the restrictions on Mito Pro features here [2]. We're in the process of fixing up the upgrade to Pro process a bit... as you can tell... :)

        You can of course fork Mito and turn off telemetry as long as you open source your changes! Go for it - happy to hop on a call and help you get set up with the codebase, if you want. Yay open source!

        [1] https://github.com/mito-ds/monorepo/blob/974091b455950c6c50e... [2] https://github.com/mito-ds/monorepo/blob/dev/mitosheet/mitos...

        • imcoconut 2 years ago

          Mito looks awesome.

          Just want to say that I respect the fact that you built this library, are offering it open source, for free, and want the telemetry on. You are up front and open about it.

          Sometimes I think people can get a little entitled with all the work someone else puts in to a project they want to use (not accusing GP of this, speaking generally). As you said, under the license anyone is more than welcome to fork, modify and open source. Yay open source indeed :)

          • narush 2 years ago

            Thanks a ton!

        • teruakohatu 2 years ago

          You should consider using something other than pip to distribute an installer for the Pro version.

    • NoImmatureAdHom 2 years ago

      Just to be very clear, the way to be "totally telemetry-less" is to pay you?

  • MadameBanaan 2 years ago

    Yeah, it's a hard pass from me on anything that offers an "Open Source" version but is actually meant as "try it and be my guinea pig".

kite_and_code 2 years ago

To the founders of mito, regarding the mito GPL license:

What is your take on that regarding usage inside cloud provider's notebooks like AWS, GCP, Azure, Databricks?

Is it allowed or not allowed by the license? And who should/can control the usage, since users can install any kind of Python library in those environments?

And, separately from the maybe ambiguous legal answer: What is your personal intention with the license?

Disclosure: I am employed by Databricks.

  • narush 2 years ago

    Hiya kite_and_code - thanks for the question + good to see you here :)

    Our understanding of our license is evolving - we're first time open source devs, and as I'm sure you know it can be a tricky process. That being said: we totally support Mito users using Mito from notebooks hosted in the cloud!

    Currently, we have quite a few users using Mito in notebooks hosted through AWS, GCP, etc. We’re aiming to be good stewards of open source software, and want to see Mito exist wherever it is solving users’ problems!

    We’ve had lots of folks in lots of environments request Mito, and we are actively prioritizing support for those other environments. We added classic Notebook support last month (funnily, I thought it’d take weeks to support, and it took 2 days lol) - and we're looking into VS Code, Streamlit, Dash, and more!

    EDIT: due to comment below, I edited this comment for clarity that we 100% support users using Mito from notebooks in the cloud!

    • kite_and_code 2 years ago

      I can totally relate that finding a suitable open-source business model is a fuzzy journey.

      Nevertheless, from the user perspective I would love to hear a more clear answer - at least for e.g. the next 6-12 months.

      Currently, it seems like you are tolerating usage inside the cloud providers without taking a clear stance. I think this creates fear, uncertainty, and doubt, and slows down mito adoption within the cloud.

      I would appreciate a clear statement in the near future on your thinking about how mito should be made available in those environments. After all, the cloud is an environment to which more and more users are migrating - or at least using in parallel to their local setups.

      I can understand if you don't want to answer on the spot in case you don't have a clear stance yet. In that case, please take your time and let us know when you have made your decision.

      Really love what you're doing and the innovation that you are pushing for! <3

      • narush 2 years ago

        Oh, sorry I wasn't clear! We totally expect that users will use Mito in notebooks in the cloud, and we are in support of this usage!

        Ideally, we will continue to extend our support to these environments over time, as there are currently lots of environments where users want Mito but we don't support it yet (notebook API differences, etc.) - a good example being AWS SageMaker.

        I'll edit my answer above to be more clear about this as well. Thanks for the ask for clarification!

    • mbreese 2 years ago

      > Our understanding of our license is evolving

      As a potential user, this is pretty troubling. I can understand your intentions, but if the license doesn’t match your intentions (and if you don’t completely understand the license), how can we be sure our workflows will be supported/possible in the future?

boringg 2 years ago

Looks neat - pandas is very powerful and this makes it more approachable for non-programmers. However, with a paid product like this, I probably wouldn't want to make the switch and then have the company go belly up, leaving users stranded. Too much risk.

Hope for the best though - pandas is pretty fantastic.

  • okennedy 2 years ago

    You might want to check out a tool called Vizier: https://vizierdb.info (I'm one of the devs). Direct interaction with notebook state (e.g., dataframes as spreadsheets) is one of the central ideas, and it's fully open source.

    • aarondia 2 years ago

      This looks cool :)

  • aarondia 2 years ago

    One of the creators of Mito, here. Thanks for your feedback. I wanted to share a couple of nuggets about Mito that have been helpful in talking about this with other users.

    1. The core Mito product is open source. You can see our GitHub here [1]. We also have a pro version that has some additional, code-visible, but non-open-source features. The way that we think about which features belong in which version of the product is as follows: features that are needed to just get any average analysis done are open source features. On the other hand, features that are specifically useful in an organization -- connecting to company databases, formatting / styling data and graphs for a presentation, etc. -- are pro features. So if you are a team that is relying on our pro features, you're helping support the longevity & progress of Mito. If you are not one of those users and are using the open source version, then you will always have access to Mito (and can even help improve it!). Of course, the line between what features are specifically helpful in an organization and what features are needed for an average analysis is a bit blurry, and is a moving target as we continue to expand Mito's offering.

    2. Mito is designed specifically not to force users to make a big 'switch'. I've commented this elsewhere in this thread, but just to recap: because Mito is an extension to Jupyter and because we generate Python code for every edit you make, Mito is designed to improve your existing workflow instead of locking you into a new system. Many Mito users use Mito as a starting point! They do as much of their analysis as they can in the Mito spreadsheet and then continue writing more customized Python code to finish up their work.

    Not requiring a big switch is nice for the user, and it's nice for Mito too! Lots of large companies have been able to get up and running with Mito in 30 minutes because it fits into their data stack.

    Anyway, these aren't the only two reasons you might feel uneasy about adopting Mito, but I at least wanted to share why the switch to Mito might be less scary than switching to other tools.

    [1] https://github.com/mito-ds/monorepo

    • kite_and_code 2 years ago

      I love how mito enables companies to use the power of open-source!

      You might want to think about enabling companies to create company-specific extensions themselves, e.g. via a plugin API. You might still require them to pay for this version of Mito, but they would be able to extend it with their own engineering power instead of relying on you.

      We had good experiences with this at bamboolib (I am one of the co-founders), and in addition to recurring license revenue it also increased demand for consulting from our end, because the internal company devs started working on plugins and then wanted our direct guidance on how to get the trickier things to work.

      • narush 2 years ago

        Yeah, we've thought a bit about a plugin API - for the reasons you say, I think it would be an awesome feature to open up to teams!

        Any tips on going about it? No need to share the secret sauce, unless you want :P

        To be totally honest, we're not architected super well to support plugins currently. The big challenge would be allowing users to specify a plugin in pure Python (it seems like we want this) - but we think that hand-coded UIs outperform autogenerated ones for now. We've been thinking about how to do better though... maybe soon.

        Of course, if Mito is missing features, we're open source [1] -- all contributions are welcome! Also feel free to open an issue and we can discuss :)

        [1] https://github.com/mito-ds/monorepo

        • kite_and_code 2 years ago

          Cool to hear that!

          To be honest, we regularly refactor our architecture at bamboolib in order to make sure that there is almost no gap between what we would love to say in natural language and the code that we need to write.

          This resulted in a very stable and clear internal API surface (read: architecture). So, literally, all we had to do was add mount points where users could register their plugins and then include those at render time.

          The next day, customers could write plugins just as we did. And, as a matter of fact, all the bamboolib transformations, visualizations, views, etc are just sophisticated plugins that our customers could write themselves because they have access to the same API as we do.

          So, no secret sauce except for "good architecture", which is most easily achieved as an ongoing effort rather than a one-off project.
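
          In plain Python, the mount-point idea is roughly this (a hypothetical sketch, not bamboolib's actual API):

          ```
          # a registry that built-in and customer-authored plugins both register into
          TRANSFORMATIONS = {}

          def register_transformation(name):
              def decorator(cls):
                  TRANSFORMATIONS[name] = cls
                  return cls
              return decorator

          @register_transformation("Drop duplicates")
          class DropDuplicates:
              def render(self):
                  ...  # build the UI widgets for this transformation

              def get_code(self, df_name):
                  return f"{df_name} = {df_name}.drop_duplicates()"

          # at render time, the UI lists everything in TRANSFORMATIONS,
          # so customer plugins show up next to the built-in ones
          ```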

noobker 2 years ago

Mito looks cool. I'm hopeful a tool like it can create a bridge between Excel-based analysts/researchers and more mature application flows.

Another tool like Mito is bamboolib: https://bamboolib.8080labs.com/

  • aarondia 2 years ago

    Heyo, Mito cofounder here, bridging that gap is one of the main ways that enterprises are using Mito today! Helping business users become data self-sufficient in a world where Excel's data size limitations make it a non-option is where Mito shines :)

jpalomaki 2 years ago

In case others are interested: Mito does not work in VS Code or Google Colab. Only the classic Jupyter Notebook and JupyterLab are supported currently [1].

[1] https://docs.trymito.io/misc/faq

  • malshe 2 years ago

    Thanks for checking that! I use vscode so this is a no go...

    • aarondia 2 years ago

      Yeah, Mito is limited to the Jupyter ecosystem (for now). We want to expand to VS Code, Google Colab, and Streamlit!

      For the time being, because Mito generates pandas code for every edit you make, you can always use Mito in Jupyter to generate code, and then copy it over to VS Code. Admittedly, it's not as nice of a workflow, but it does work!

      • pen2l 2 years ago

        I see you guys provide convenient installers that can be obtained with pip. Me, I run JupyterLab, which I got with Conda, on a Windows setup. Can you comment on whether it's a thorn-free path to get Mito in such a setup? Or should I take this as another sign to completely migrate to a *nix system for all my dev needs... :)

sodimel 2 years ago

Looks like a Datasette[0] clone which runs on top of something (jupyter) which runs on top of Python (ipython). I think I would like to see how much time it takes to open a massive dataset in Mito & in Datasette :P

[0]: https://datasette.io/

  • aarondia 2 years ago

    Heyo, one of the Mito creators here. Thanks for sharing Datasette. I haven't seen that one before. It looks neat!

    You're right though, there are several tools that fit the general shape of: GUI on top of Jupyter on top of Python. There are a few general vectors along which to understand these tools:

    1. Excel-ness: Although most (if not all) of these tools incorporate some type of spreadsheet, the interface for interacting with the data in that spreadsheet differs greatly. Some tools, like Bamboolib [1] and Datasette [2], resemble Excel only in the spreadsheet. Other tools, like Mito [3], stick to a lot of the other Excel design decisions -- things like having a toolbar with buttons and menu items to access functionality, the ability to write spreadsheet formulas inside of the cell & formula bar, etc. In many ways, this Excel-ness design vector is a proxy for how easy it is to get started with the tool. What we see is that users are able to download Mito and get something useful out of their first analysis because the interface is one that they are used to!

    2. Ownership of your analysis / lack of lock-in: We believe that the most powerful low-code spreadsheet tools allow spreadsheet users to easily transition to full programming languages, if they want to. Instead of locking users into a limited and proprietary product, it's better if users can transition to a full programming language (like Python) very naturally. This transition is super natural in Mito because we generate Python code for every edit that a user makes. So if Mito doesn't support the exact transformation that you want, you can use Mito as a starting point for your analysis and customize the script that Mito generates.

    [1] https://bamboolib.8080labs.com/ [2] https://datasette.io/ [3] https://www.trymito.io/

    • kite_and_code 2 years ago

      bamboolib co-founder here. We are also thinking about adding Excel-type formulas to the UI and already have internal prototypes.

      However, please be aware that bamboolib might soon only be available within Databricks notebooks instead of local Jupyter notebooks like mito.

flakiness 2 years ago

Nice! The page looks more like a SaaS offering or something, which initially scared me away a bit. I hope the emphasis is more on the open-source library, with paid options shown as some premium thing.

I didn't realize that the "too nice" landing page makes me anxious for open source software :-/

  • narush 2 years ago

    To pull the curtains back a bit: we probably spend about 85% of our product and development time on open source code. Just this week, we developed copy and paste, NaN value filling, and splitting a text column on a delimiter - all of these are open source features.

    As we've begun to engage with larger teams, we often take features that we build out for their workflow and open source them as well - a few of the teams have been explicit proponents for the open source tool, which is awesome to see.

    I'm sure our thinking on this will evolve over time, but we are highly focused on developing just a _great_ piece of open source software. And for folks that need more power, we want to give them the chance to get it - while also supporting Mito's development :)

    P.S. Check out our Mito Pro roadmap here: https://www.trymito.io/plans#mito_pro_roadmap. Feedback appreciated!

  • aarondia 2 years ago

    Well, first off, thank you - I put a lot of effort into implementing that landing page :)

    We're super focused on the open source offering. The vast majority of our users are on the open source version and the vast majority of the features we release are open source! (You can check out our PR's if you're interested in verifying)

    The Mito Pro and Enterprise plans are designed for advanced users and teams. In those versions we provide features that make it easier to collaborate, create presentation-ready materials, and hook up to other company resources.

    But we're an open source tool through and through!

    • narush 2 years ago

      Fancy seeing you here, writing the same comment as me... :}

MaxDPS 2 years ago

This looks pretty interesting. I like how customizable JupyterLab is with these extensions. Not sure if this is the right place to ask, but does anyone have any recommendations for other extensions I might want to look at?

whoevercares 2 years ago

Tricky question - what do you think about Databricks, which acquired Bamboolib and says it will integrate a pandas GUI into its workspace?

kite_and_code 2 years ago

Another alternative is bamboolib.com which was acquired by Databricks last September to offer it within Databricks notebooks

  • filmor 2 years ago

    Are you affiliated? There are three comments in this comment page by you, and they all manage to mention bamboolib...

    • kite_and_code 2 years ago

      Yes, I am one of the co-founders of bamboolib and employed by Databricks.

      I already added my disclosure to the following answer [0] in this thread but I was hesitant to add it to every answer.

      Do you prefer if I explicitly add my affiliation in every comment that mentions bamboolib? If so, I will try to edit them (if the HN UI still allows me to - I observed that it stops allowing this after some time)

      [0] https://news.ycombinator.com/item?id=31450910

      • Closi 2 years ago

        > Do you prefer if I explicitly add my affiliation in every comment that mentions bamboolib?

          Personally, your original post's tone implied to me that you weren't affiliated.

        You don't have to add a formal 'disclosure', but you could just say "I built x which is..." rather than "Another product is x which is...".

        • kite_and_code 2 years ago

          Thank you for sharing that observation and the suggestion! I will keep that in mind :)

alefnula 2 years ago

I have no affiliation with the project. Just found it, tried it out, and it looks very promising...

  • aarondia 2 years ago

    Thanks for posting!

punk_ihaq 2 years ago

Wow love it! It would be cool to see a bidirectional Streamlit custom component for Mito!

  • narush 2 years ago

    This is on the roadmap! Would love to hear a bit more about how you would use this component...

    1. Would you want the component to generate code? Or would it just be the editing of a dataframe that is useful to you?

    2. What other components would be used in this dashboard? Would love to hear a bit more about the workflow around Mito here.

    The more detail you can provide - the more helpful in prioritizing this! I think Mito in streamlit would be ... awesome!

pipeline_peak 2 years ago

The web page needs to be heavier

  • narush 2 years ago

    Super fair, lol. We'll work on optimizing it - just a tiny team and lots of things on our plate rn. The main issue is our images / video, which I have tried compressing but can't do so while maintaining the quality. Any tips are greatly appreciated!

    Believe it or not, the last version of this website was even heavier...