This is cool, and kudos to you for writing it, and getting it out there. I've always found that more than half the "hard work" is actually finishing something and getting it out there, and you get all my respect for doing that.
Querying Excel spreadsheets with SQL, however, is something that can already be done with no additional tools, no custom plugins, just good old fashioned command line knowledge.
(first, download superstore.xls, some test data I found [0])
loxias@host:~$ sqlite3 -csv :memory: -cmd ".import '| ssconvert -T Gnumeric_stf:stf_csv superstore.xls fd://1' orders" -cmd '.mode column' 'SELECT State, AVG(Profit), COUNT(*) FROM orders GROUP BY State ORDER BY AVG(Profit) DESC LIMIT 5'
State AVG(Profit) COUNT(*)
------------ ---------------- --------
Vermont 204.088936363636 11
Rhode Island 130.100523214286 56
Indiana 123.375411409396 149
Montana 122.2219 15
Minnesota 121.608847191011 89
loxias@host:~$ sqlite3 -csv :memory: -cmd ".import '| ssconvert -T Gnumeric_stf:stf_csv superstore.xls fd://1' orders" -cmd '.mode column' 'SELECT State, AVG(Profit), COUNT(*) FROM orders GROUP BY State ORDER BY AVG(Profit) ASC LIMIT 5'
State AVG(Profit) COUNT(*)
-------------- ----------------- --------
Ohio -36.1863040511728 469
Colorado -35.8673510989011 182
North Carolina -30.0839847389558 249
Tennessee -29.1895825136612 183
Pennsylvania -26.5075984667803 587
And there you have the 5 top and 5 worst performing states by profit margin. Look ma! No code! No temp files even! :D
This is a cool project! But if you just want a simple CLI, you can query Excel and ODS files with dsq [0]. It's the same end result without needing to carry plugins around, and with support for many additional data formats. Plus there's a growing standard library of functions that don't come built into SQLite, such as best-effort date parsing, URL parsing/extraction, statistical aggregation functions, math functions, string and regex helpers, hashing functions, and so on [1].
An annoying thing about this extension-based style of file support is needing to create a new table for every new file if the schema is different. This is a limitation [2] of SQLite, unfortunately. dsq doesn't work this way, so it doesn't have that limit.
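For context, the dsq-style approach amounts to sniffing each file's columns and generating a fresh schema on the fly. A rough stdlib-only sketch of that idea (the helper name, table name, and sample data are all invented for illustration; real tools also infer column types rather than casting at query time):

```python
import csv
import io
import sqlite3

def load_csv(conn, table, text):
    """Sniff the header row and create a matching table on the fly
    (hypothetical helper; real tools also infer column types)."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    cols = ", ".join(f'"{c}"' for c in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)

conn = sqlite3.connect(":memory:")
load_csv(conn, "orders", "state,profit\nVermont,204\nOhio,-36\n")
print(conn.execute(
    "SELECT state FROM orders WHERE CAST(profit AS REAL) < 0"
).fetchone()[0])  # Ohio
```

Because the schema is derived per file at load time, two files with different columns just become two different tables, with no fixed virtual-table schema in the way.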
On the other hand, if you go this route you can more easily combine with other extensions. That's not really possible with dsq right now.
[0] https://github.com/multiprocessio/dsq
[1] https://github.com/multiprocessio/go-sqlite3-stdlib
[2] https://sqlite.org/forum/forumpost/ec944414fa
I think when Wasm gets a little more mature, it will enable the creation of more composable tools. One might be able to hot-create modules? Kinda ironic that this is how it is structured, given that arbitrary projections are what SQL does.
Is there a way to hook lower in the SQLite stack, and make it think that your DS is something it already understands?
What about a function that returns a view that it dynamically generates?
SQLite does not allow table-valued functions to return results with varying schemas. The schema must be defined once up front and cannot change afterwards.
That forum post is a request I made to them to allow dynamic columns, but they don't seem interested so far.
I've been contemplating a workaround for this: since registering a new function to a SQLite connection (and deregistering it later) is a pretty fast operation, it should be possible to register a custom function with the detected columns for a single use only and then remove it after running the query.
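For illustration, here's what that register-then-deregister pattern looks like with Python's built-in sqlite3 module. A scalar function stands in for the table-valued case, since the stdlib bindings don't expose TVFs, and the function name and body are made up for this sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical one-shot function; in the real workaround its body would be
# generated from the columns detected in the file being queried.
def parse_cell(value):
    return value.strip().upper()

# Register just before the query runs...
conn.create_function("parse_cell", 1, parse_cell)
result = conn.execute("SELECT parse_cell('  hello ')").fetchone()[0]
print(result)  # HELLO

# ...and deregister afterwards by re-registering the same name with None.
conn.create_function("parse_cell", 1, None)
```

Registration is connection-local and cheap, which is what makes per-query churn like this feasible.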
Yeah, I was thinking along the same lines, possibly running SQLite under Cling [1] via cppyy [2].
[1] https://www.infoq.com/news/2015/05/cling-cpp-interpreter/
[2] https://github.com/wlav/cppyy
The problem is if you have multiple function calls within one query.
You could give them all unique names though, like calling them the "importX family": import1, import2, import3, etc...
Still not ideal but at least possible.
Edit: Actually, if you just transform the query to add the suffix, then that's a pretty reasonable UX. This might give me a way forward to try out TVFs!
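A naive sketch of that query transform (the `import` function name is made up here, and a bare regex rewrite like this would also hit the name inside string literals, so it's purely illustrative):

```python
import re

def suffix_calls(sql, func="import"):
    """Rewrite each call of `func` to a uniquely numbered name, so that
    import1, import2, ... can each be registered with its own schema."""
    counter = 0
    def repl(match):
        nonlocal counter
        counter += 1
        return f"{func}{counter}("
    # Naive: matches the bare name followed by '(' anywhere in the query.
    return re.sub(rf"\b{re.escape(func)}\s*\(", repl, sql)

print(suffix_calls("SELECT * FROM import('a.csv') JOIN import('b.csv')"))
# SELECT * FROM import1('a.csv') JOIN import2('b.csv')
```

The rewritten names then map one-to-one onto the TVFs registered for that single query.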
I get that that's how it currently stands. But somewhere in the SQLite codebase, everything is a projection.
Still thinking and reading through sqlite source ...
https://dirtylittlesql.com and https://io10.dev also support this.
I'm looking for a way to have a searchable/filterable CSV file of 30k rows as a web app.
Some kind of fork of DirtyLittleSQL looks like it is the solution!
> I'm looking for a way to have a searchable/filterable CSV file of 30k rows as a web app.
Can you expand on this? How would people express the searches and filters? SQL? or some other way?
Assuming SQL, all the work's done for you already! :D A single static page that serves sqlite3-in-wasm to the client browser with, I dunno, a text box to enter SQL queries and a file picker.
Well this is the coolest thing I've seen all week.
One-liner for doing this for CSVs from a few days ago: https://news.ycombinator.com/item?id=31824030
[0] https://community.tableau.com/s/question/0D54T00000CWeX8SAL/...
You just went from requiring one bit of code to requiring another. I.e. Does every Linux distro and Mac and Windows come with ssconvert?
(heavily edited for tone and clarity)
> You just went from requiring one bit of code to requiring another
Not in the slightest. I went from requiring custom code, including a whole build framework for a specific programming language, to requiring only the tools one can assume are installed, and no code.
For the task of "querying spreadsheets with SQLite", there are only two programs you can assume the user already has: SQLite and spreadsheet software. :) I think that's a safe and reasonable assumption, no?
ssconvert is part of Gnumeric. One can do the same thing with LibreOffice, but I didn't know the syntax offhand. [0]
FreeXL is another alternative: https://www.gaia-gis.it/fossil/freexl/index
As another alternative, you can use in2csv[1] to convert excel documents to csv and pipe them into the other tools of csvkit.
[1]: https://csvkit.readthedocs.io/en/latest/tutorial/1_getting_s...
It is a crime against humanity how opaque office documents are to data extraction.
So much potential and efficiency lost: likely hundreds of billions of dollars.
Practically every single office in the world uses office suite products. There are probably tens to hundreds of billions of person-hours of collective work invested in office suite documents and spreadsheets, and getting at it programmatically is not easy.
"Not easy" is putting it mildly.
Intentionally walled, obfuscated, undocumented, and constantly changed to maintain monopolies in core Office software as well as other "back office" products.
And it's still willingly accepted by virtually all corporations and organizations worldwide.
Somewhat-kinda related, the textql extension for Asciidoctor is so dang useful it should be in core.
https://gist.github.com/mojavelinux/8856117
I use this as a "centralized parts repository" for big ol' maintenance manuals. Refresh from PDM/PLM/LSA/Whatever. Rebuild for new parts data.
Built on TextQL, natch
https://github.com/dinedal/textql
How do virtual tables compare with regular tables for performance?
I would assume SQLite has some optimisations for native tables (rather than reading the data from a virtual-table-backed file)?
It's implementation-dependent. You implement an interface, and it's up to the implementation to optimize the calls. The interface is also rather simplistic, so there's some optimization that can't be done at the implementation level, since SQLite doesn't provide all the information at the time of the query.
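To make the interface point concrete: SQLite's planner gives a native table index-backed lookups automatically, while a virtual table only gets an equivalent plan if its xBestIndex method reports back which constraints it can use. A minimal sketch of the native-table side with Python's built-in sqlite3 module (table and index names invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.execute("CREATE INDEX idx_a ON t(a)")

# For the native table, the planner picks the index; a virtual table would
# fall back to a full scan here unless its xBestIndex advertises the a=?
# constraint with a lower estimated cost.
plan = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM t WHERE a = 5").fetchall()
print(plan[0][-1])
```

The exact wording of the plan detail varies by SQLite version, but it names `idx_a` when the index is chosen.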
In addition to just loading and querying data, spreadsheets are great as database mocks in integration tests. I long ago used XLSQL (note: there is a new XLSQL project unrelated to this one) to mock an Oracle database: https://sourceforge.net/projects/xlsql/.
I built a desktop app for a similar purpose, and it can handle GBs of CSV files pretty quickly. It only handles CSV, though.
It is more convenient if you do this regularly for your job.
Please check it out: https://superintendent.app
Do any of these tools take into account hidden Excel rows? It’s a constant frustration.
Dunno! But if you send me a sample I can try it out for dsq.
Sweet. Here’s my open issue with VisiData about this: https://github.com/saulpw/visidata/issues/1398
There’s a sample file at the end of that thread.
Do you have a workaround workflow in the meantime? I could imagine a few non-Macro or Macro-light solutions to this. Use a pivot table, for example: its results, I believe, get 'magiced' into their destination cells instead of getting hidden. However, I don't recall whether they auto-inherit autofilters set on the source table and/or source cell range.
How much organizational (not technical, we'll get to that) control/authority/input/stakeholder role do you have on the source document templates?
Also, you might want to check out https://github.com/microsoft/advanced-formula-environment/
> Do you have a workaround workflow in the meantime?
Yeah in the VisiData ticket I link to a post[1] with some code to handle hidden rows and columns in Pandas+openpyxl.
[1] https://towardsdatascience.com/how-to-load-excel-files-with-...
rga (ripgrep-all) is basically rg (ripgrep) for binaries.