Libgsqlite: A SQLite extension which loads a Google Sheet as a virtual table github.com 264 points by x2bool 4 days ago
In a similar vein, Steampipe  has a Google Sheets plugin . It uses Postgres Foreign Data Wrappers instead of SQLite as the virtual table engine. (Disclaimer: I'm a lead on this open source project.)
1 - https://github.com/turbot/steampipe 2 - https://github.com/turbot/steampipe-plugin-googlesheets
In ClickHouse, querying Google Sheets works out of the box, no plugins required:
In case you don't read the whole README, note this in limitations:
> The extension will load the spreadsheet only once while creating a virtual table. If you want to pick up recent changes, drop the table and create it again.
Thanks I was wondering about this, didn't want to trigger usage limits.
That is along is already great! Could be a way to slowly migrate off google spread sheets.
This project actually started as a fork of https://github.com/x2bool/xlite - SQLite extension for querying .xls and .xlsx spreadsheets.
This is very cool and could be the basis for a huge set of interesting operations - but any time I read about DB <> Spreadsheet discussions they seem to be started by folks who much better understand DBs.
The principle use of the spreadsheet for both power users and casual users is a lot around justifying answers by being able to see the data. The goal is not speed or scale.
If you and sending a report to a superior - clear and visible rows, columns, and "math" makes it faster and easier to trust information for non-technical users.
This same logic is why the spreadsheet is best serving customized, one-off, or changing operations without a huge amount of data. This makes up a huge chunk of day-to-day business decisions, especially in finance where the math is much easier.
I don't think the spreadsheet lasts forever, but its successor will likely be something that presents an easy way to get answers from a set of data that is visible or at least able to be quickly understood, regardless of whether its fast, scalable, networked etc.
Yes, I would agree that technical and non-technical usability is one appealing spreadsheet use case. I think that another thing I find myself using spreadsheets for is simple data entry. By using a google sheet to enter data, you've already got yourself an authenticated, multi-user web form for easily entering data. It even integrates with google forms, which is a very handy tool for collecting data. It's certainly a lot easier than building it all yourself and hosting it.
Once the data is entered, there are much better tools for data visualization and aggregation (in my opinion). It's nice to have the "frontend" taken care of so that your time is freed up to analyze and extract meaning from the data. I'm speaking from the perspective of someone who does a lot of rapid prototyping and data analysis. There are likely use cases where this won't scale.
Spreadsheets are rather peculiar. I don't think anyone would come up with the concept of them today.
This "fill formula down/right" feature is so strange. A developer would be like: let's just set a formula for the entire column, and make it refer to another column. And let's name the columns something. Like Airtable.
But its surpringly flexible and hackable.
I wish there was something like GSheets + Airtable.
> let's just set a formula for the entire column, and make it refer to another column
Meet Lotus Improv (1991): https://en.wikipedia.org/wiki/Lotus_Improv
I think the patents have expired now.
Improv brought me joy. There isn’t anything like it.
I read this many times on HN. As far as I can tell Excel copied most of it about a decade later with pivot tables.
Sort of. Improv lets you work in the pivot though, like that's the main view, and the ramifications of that are huge.
- You don't have to make your sum/avg/analytic rows because Improv does that automatically. Yes, Excel does this on the pivot tables, but because you can't work there, a lot of people make a SUM(B2:B999) or something to make summary data more accessible.
- Your formulas look simple like ConversionRate=Total:Closes/Total:Leads instead of =SUM(A2:A99)/SUM(B2:B99) and sure I know there's a way to make Excel do something like that, but nobody does, which is one thing that makes it hard to mix stuff between multiple documents in Excel -- someone moves the lead count on the lead sheet and it blows everything up back at headquarters. In Improv if someone renamed Leads to Pipe or something it'd break and you'd see that right away with much less chaos.
And so on
Grist is it, in fact "GSheets + Airtable" is how people often see it. From https://blog.appsumo.com/airtable-alternatives/#3_Grist: "Grist has been described as if Google Sheets + Microsoft Excel + Airtable had a beautiful baby."
If when you say GSheets + Airtable, you're looking for a spreadsheet that is designed for data analysis +a let's you name columns and sets formulas to the entire column, then checkout Mito (https://www.trymito.io). Disclosure: I created Mito.
Mito doesn't have real time collaboration though. That is, not until Jupyter real time collaboration becomes more popular (https://jupyterlab.readthedocs.io/en/stable/user/rtc.html)
I think the real test is: can it quickly build an ad-hoc financial model. This is the bread and butter of spreadsheets.
Mito doesn't look like it keeps the flexibility of a spreadsheet in that way.
Sounds like you're describing Grist! And it's open source. https://github.com/gristlabs/grist-core
> let's just set a formula for the entire column, and make it refer to another column. And let's name the columns something
If you turn a worksheet section into a proper Table in excel, it does exactly this.
> let's just set a formula for the entire column, and make it refer to another column. And let's name the columns something.
I don’t understand because you obviously can do this with sheets/excel
Enter the formula on row 1, copy it, select the column and paste
> set a formula for the entire column
Spreadsheet power users on both excel and sheets do exactly that. It’s called array formula. Press cmd-shift-enter on the cell with the formula and it’s valid for the entire column and can be changed in one central place.
Hah, it's you again! I keep bumping into you on these threads and feeling like I'm reading my own words in your comments. Yes, spreadsheets are beautiful. Most developers are missing the vision (and I think they missed the point in your comment too)
> I wish there was something like GSheets + Airtable
It’s called coda.io I use it for small data and I love it.
You can just use curl to get the csv from the gayest and pipe into SQLite using the .import command.
Not sure what this has on top of that.
I wrote a little cli that is a bit easier than curl for getting data to and from google sheets as csv. I do think this sqlite extension would be even more convenient if it supported writing data with INSERT/UPDATE.
This needs its own post
from the _what_
typo of gsheet?
More likely to be autocorrect doing it’s usual.
Yes that’s correct ! Oops
From the gayest
That sounds like me, but I don’t issue csvs.
I only issue .STLs
Gayest what? Sheet? Account holder?
For one, there's no CSV standard and often csv based data exchange methods have lurking (and/or obvious) correctness problems. Maybe gsheets and sqlite are lucky there, maybe not.
This project is also well documented including the gsheets side (which is quite non-obvious).
> Maybe gsheets and sqlite are lucky there, maybe not
Seems weird to to pooh-pooh the idea without any reason to think it isn't good. Also, it's not really a question of luck, is it? It's not that mysterious how to validate an integration like this.
>>"(which is quite non-obvious)."
ELI5 what this means, please? (unpack this for the ignorant among us, plz)
I mean the gsheets api access setup is convoluted and here it's well documented. It has about two dozen steps.
> For one, there's no CSV standard
This is false: https://www.rfc-editor.org/rfc/rfc4180
Well yes, there are actually many many CSV standards, my mistake for imprecise language. What I meant to say that there's no single CSV format that programs agree on. Starting with Excel...
> INSERT, UPDATE and DELETE statements won't be implemented. Welcome PRs.
Adding support for modifying Google Sheets would be a game changer. I see many (SQL-like) extensions that will query GS, but none can change it.
I’ve been using google sheets as the backend database storage for a few recent projects.
Using SQLite as the interface instead of the sheets API will be so nice.
Btw - highly recommend sheets as a backend. For my recent projects it has been such a time saver.
Last project I was involved with deals with Google Sheets. They were using it as their multiuser read/write database. I offered to migrate it all to a proper database for FREE during my own personal time outside of work. Blank stares were all I got. I really hope AGI will soon be able to replace those humans that cannot be reasoned with logic. Millions were involved, hundreds of employees, thousands of sales, all being tracked in a couple of sheets. The faster data retrieval alone was worth it if we migrated it to a proper database.
Nothing you’ve described sounds like a valid reason to move “a couple of spreadsheets” to a database system which might require: budget planning, database administrator, system design, server, colocation, query optimization, report system, monitoring, user interface, backup/recovery plan, authentication system, change management logs, user training, business process analysis, etc.
The two planks of wood across the canyon work fine, there's no valid reason to build a bridge! That would require planning, architects, engineers, safety inspections, construction, etc. Let's just keep ferrying cars across the wood planks, we all know how wood planks work, and we've got plenty of duct tape for when they break!
This is a deeply flawed analogy. Hosted cloud spreadsheet has effectively perfect availability and durability compared to most database solutions, and never requires maintenance, and costs nothing.
Google Sheets, the product, may have excellent reliability, but business processes that rely on spreadsheets certainly do not. The point in my analogy was not that the planks of wood were bad products, it's that they were being misused by ignorant people.
Use wood planks as a temporary crossing, don't ferry cars over them. Run your bake sale with spreadsheets, don't run a multimillion business with them.
I don't need to link you to the resources on the astronomical number of problems that are caused by spreadsheet-oriented business processes and shadow IT. Using spreadsheets instead of proper systems shifts the operational burden onto humans, who certainly do not have perfect availability and durability.
> but business processes that rely on spreadsheets certainly do not
I wonder if this migration from spreadsheets to databases is what helps ossify business culture in older companies. With spreadsheets, change and iteration are easy with fast cycles. With a database, change is a pain and iteration cycles are very long since everything has to though multiple stages of review, approval, and implementation.
Interesting idea. I'd say probably yes, to some extent. Bringing back the human element, I'd say the businesses still stuck on spreadsheets are much less open to change than those who are willing to adopt database technology.
While long review cycles for schema changes are annoying, I find it more worrying that most businesses have zero review process at all for spreadsheets.
That's why your standard IT guy loves databases and hates democratized software like Sheets. You can run a 80% of business in a spreadsheet but IT Guy doesn't want to be disintermediated.
No, as an actual IT guy myself, our job is to support and improve business technical operations. Most of us want _less_ work. If we wanted to be intermediaries, we'd be middle managers.
I don't know how much time you've spent supporting spreadsheet users, but I work with them daily and I see first-hand the amount of energy being wasted. Stuff that takes them all day, they could do in a single SQL query. Probably wouldn't take them more than a day to learn how.
It's not about democracy, it's about doing things in a sensible way. I'd be happy if they used another "democratic" tool like Microsoft PowerBI and a fully-managed database.
You're just gatekeeping. You have tautologically defined "sensible" to be the same as your personal preferences.
Eh, maybe. I'll hazard that my preferences for business technical operations are more informed than those of my users. I seek only to help them.
Side note, I'm quite surprised that somebody in this form is so staunchly in favor of spreadsheets.
But could the hundreds of employees continue to make changes to the DB if it was moved out of sheets?
Actually there were only a handful of people updating it daily and it was my job to make all of those somehow give accurate results. The hundred of employees I was talking about were the ones making the sales, no direct access to those sheets, and their performance is going to be evaluated with the data that was recorded within those sheets. Data retrieval is so slow, a month of sales by 300 salesperson is around 30k rows, we have to connect that to their attendance, the day whether it is a weekday, weekend, holiday, rest day. The store they were assigned. The monthly quota they have, etc.
I lasted 3 months only, got tired of the sudden meetings discussing why can't we give fast and accurate results.
The point the person above was trying to make is that the reason you got "blank stares" was that you were proposing replacing a system where the users could read and write the data, to a system where they can't access it. Everyone saw this but you. Im sure AGI would've seen it too.
Also, the fact that you offered to do it for free makes me question your motivation. I think you were trying to make your own job easier at the cost of the users no longer having the thing that worked for them.
They were using the wrong tool for the job. The R in RDBMS for the existing sheets, they were using R as in not machines but humans doing the RELATIONAL work. Yes I was trying to make my job easier, that is why mankind invented RDBMS right? To make our work easier and accurate? No?
The user interface is usually more important than the backend. If the user interface for a spreadsheet is more powerful, then that can make it a better solution than a database that has a better backend but terrible UI.
It can be very difficult to replace spreadsheets because the UI has had 30+ years of iteration.
Can’t they though? Presumably there would be a form that validates inputs that they can use. Part of the problem that OP described is that they were writing directly to the “database” (and OP had to validate input afterwards).
Spreadsheets have their place, not every sheet needs to be migrated to a database application. But some certainly should.
Somewhere before "I had to procure a new VM just to open this Excel file", you should probably migrate to a database + Python
So it probably would have been better to give them an excel-like frontend connected to an sql server?
If you could match the same price, security, availability, mobile app/responsive UI support, audit history, comments, read-only sharing, export features, 3rd party add-ons, etc.
The "we will destroy all your accounts across our entire service catalog including any potential of future earnings if you cross our AI abuse detection thresholds" sword of Damocles is obviously unavailable and thus out of scope.
I hate to break it to you, but an equally likely outcome of the introduction of AGI would be to replace you instead, and give those users exactly what they asked for, even if it's not optimal (or improving it, but preserving the interface they are used to).
I am not a lawyer, and I do not know your jurisdiction, but where I come from, it would be illegal for me to work off-the-clock for my employer. It opens up many cans of worms regarding liability, accounting, etc. I have to wonder whether those sorts of things were going through the heads which gave you "blank stares". Did you attempt to compromise by offering to do it on the clock instead?
> I really hope AGI will soon be able to replace those humans that cannot be reasoned with logic.
Great use case
Tools such as steampipe, and Python pandas/gspread make it pretty trivial to get data and push it out into Google Sheets as a front end.
eg. I frequently use Google Sheets as the UI for simple and complex projects. The web front end is provided by Google, as is the identity/auth part. You only have to worry about important part - the data.
Those setup instructions are so frustrating: https://github.com/0x6b/libgsqlite#setup-google-cloud
Why does this have to be so hard!? I really wish Google Cloud (and likewise AWS and many other providers) didn't force users to jump through so many steps to use tools like this.
> Why does this have to be so hard!?
Well, in part because they are using the console rather driving it programmatically; all the cloud platforms are optimized for programmatic/IAC rather than console operations.
And in part because Google Cloud (as opposed to AWS) has a segregated-projects, and activate-options approach to manage the scope of management surface and costs, whereas AWS (unless you create separate accounts) is more of “lump everything together”, and within an account is everything-default-on.
If there was a similar thing for AWS, while there might be some configuration needed, the “create a project” and “activate the API” processes probably wouldn’t be part of the basic instructions.
I get it. Google Sheets costs you Zero money.
BUT you can get a proper SQL managed instance(with much faster query, larger data store, Geo-replication, point-in-time backup etc) for less than $5 a month.
Do your due diligence, identify your needs, compare the price of engineering hours with $5/month and go ahead with whatever suites you the best.
Sheets has a human-friendly, readily-understood, multi-user capable user interface.
You’re likely correct that $5/mo isn’t the barrier to switching.
100% agree, user-friendliness is better in Sheets.
Sheets are also geo-replicated and have infinite version history. I seriously doubt that anyone has ever lost data in Sheets.
But are they always aware when correct version isn't current version?
I could see this being very useful. I uploaded a 70MB xls file to Sheets and Chrome says that tab is 600MB to 800MB of RAM. I ran a Find operation and that spiked up to 1.2GB of RAM. How 70MB of data turns into over 1GB of RAM is beyond me.
This seems like the least sqlite-like possible thing. Immediately, it makes me ask if sqlite is the right tool for a job that involves Google Sheets. But I'm sure it fills a very particular need for some people.
Someone has had the idea to plug all that into GPT? Like "please take those data and tell me what's the average price of our customers' basquets"?
This is perfect!