rompic a year ago

My favourite random sqlite story: The company I once worked for used an outdated version of sqlite (3.8.6) in one of their products. The databases used got bigger and bigger and in a very big project one of the "already known to be slow"-queries took more than an hour on my laptop making the tool unusable.

On a quiet day, I was able to save the temporary table used as part of the process and run the problematic query against it in an isolated fashion.

The query returned an extremely high number of results and when I discovered this I questioned my SQL-fu, my sanity and my trust in computers.

I found that we were hit by a bug that was fixed 6 years before I discovered it (https://sqlite.org/src/info/6f2222d550f5b0ee7ed). Sqlite's query planner assumed that a field with a not null constraint can never be null, which isn't the case for the right hand table in a left join.

I fixed it by adding a not null check in the query and then later by updating the library. After that the 1 hour query ran in ~700 ms.

This faster run time also helped with smaller projects and in the end allowed extending our test suite considerably.

Tldr: Keep your dependencies up to date.

  • zoomablemind a year ago

    > ...Tldr: Keep your dependencies up to date.

    I'd rather TL;DR it as use query EXPLAIN to see what may be slowing any query down.

    • rompic a year ago

      I think it was due to EXPLAIN that I found out what's happening.

      At the same time there have been roughly 60 release notes that mention performance since 3.8.6, so these came for free with the update: https://www.sqlite.org/changes.html

putna a year ago

wow, amazing story.

> it’s the old joke of, you get 95% of the functionality with the first 95% of your budget, and the last 5% on the second 95% of your budget.

repeats every time

  • aidenn0 a year ago

    I think that's overly optimistic. It implies you can have a complete project with less than double your budget.

    • pflanze a year ago

      I thought that the implication was that the budget grew by a factor of ~20, not 2. As in, the first 95% finished in 95% of the original budget, the rest finished in 95% of the final budget, so the first 95% took 5% of the final budget.

      Hmm, maybe I didn't detect the sarcasm in your reply.

      • aidenn0 a year ago

        Oh that makes more sense. I thought both were in terms of the original budget

fijiaarone a year ago

I’m interested in looking at fossil SCM - not the built in web server, wiki, and issue tracking, but the auto sync and simplified interface appeal for personal projects, and maybe an embedded versioned file based CMS.

  • maxk42 a year ago

    I've used git since it was first released to the public and recently I feel like I know less about it than ever before. I switched to fossil for personal projects around a year ago and haven't regretted it one bit!

    • dognotdog a year ago

      I kind of went the other way as github/gitlab took off. I was using fossil for years because it was so self contained with issue tracking and everything, and it is great, but the lack of easy two-way interop with github had me gradually drop it for new projects, when I anticipate some level of other contributors. That being said, going from fossil->git via git-export is very simple and painless.

  • ilyt a year ago

    That reminds me of git-bug https://news.ycombinator.com/item?id=33730417

    Basically issue tracking added to git repo directly

    That being said my personal project are just push/pull/commit so I don't see the reason to change. Maybe some script to auto-push commits every hour or something but, well I have backups so that's not really required either

kjellsbells a year ago

This was a great story, nicely told.

Looking into it as an outsider, it seems that a key inflection point was their adoption of a really industrial-strength test discipline at just the right time. Not only was it impressive to have written the engine, but it seems to be doubly impressive to knuckle down for a year to get the test coverage - and it paid off in spades.

papito a year ago

The origins of SQLite are pretty wild. This was one of my favorite episodes.

fijiaarone a year ago

“Freedom means taking care of yourself.”

  • bch a year ago

    > “Freedom means taking care of yourself.”

    Others say "freedom" is just another word for "nothing left to lose."

    By the first definition though, I think one should keep in mind "With great [freedom] comes great responsibility."

panda_fish a year ago

Thanks for that amazing trip down memory lane…

> Wow, I’ve got an SQL database running on my Palm Pilot.

Weirdly, my thesis in 2000 was to write an SQL parser for Palm (Palm V - still have it). I remember there being such a massive gap in the market for a pervasive standards-compliant data storage solution. I used javacc, which is still around I think - I can’t imagine it covered 1% of the features of SQLite though. Bravo!

jacob019 a year ago

Should I be using file extension .db or .sqlite3 ?

  • breck a year ago

    I prefer .sqlite3. It's 10x clearer what it is.

  • simonw a year ago

    My experience is that it doesn't matter at all, but .db appears to me a lot more common than .sqlite3

    • ocdtrekkie a year ago

      I tend to use .sqlite so people know what the file is likely to open in. On Windows systems, I've found endless .dat files that could be from any number of database formats.

      I've noticed the tendency for .db but I consider it a bad practice, all things considered.

zoomablemind a year ago

Anybody figured out what's those formulations may be on the white board in the article's hero image?

Quantum mechanics?

  • sega_sai a year ago

    I could recognise the Schroedinger's equation and one Maxwell's equation.

throwaway29303 a year ago

  Shane Harrelson did this for us about 10 years ago. He came up with this huge corpus of SQL statements, and he ran them against every database engine that he could get his hands on. We wanted to make sure everybody got the same answer, and he managed to segfault every single database engine he tried, including SQLite, except for Postgres. Postgres always ran and gave the correct answer. We were never able to find a fault in that. The Postgres people tell me that we just weren’t trying hard enough. It is possible to fault Postgres, but we were very impressed.

  We crashed Oracle, including commercial versions of Oracle. We crashed DB2. Anything we could get our hands on, we tried it and we managed to crash it, but the point was that we wanted to make sure that SQLite got the same answers for all of these queries, or equivalent answers, because a lot of these queries, they’re indeterminate and the rows might come out in a different order because you [crosstalk 00:25:10] order by clause, so we wanted to make sure that all the database engines got equivalent answers. Mostly, we wanted to make sure that SQLite was getting the same answers everybody else is.

  That’s another test suite, and then we have lots of smaller ones, as well. Between them all, it’s a lot of testing code, and it takes a long time to run.
I hope some people are/were paying attention to this. ;)
  • marcosdumay a year ago

    You mean fixing their databases? I would expect Postgres dev to act on it, but not much from the other ones.

    • no_wizard a year ago

      How do mean act on it? Sounds like Postgres is the only one that didn’t fault

      • sbaiddn a year ago

        It was a funny. They'll jump on it because they're conscientious. Because they are conscientious they have nothing ti jump to

posharma a year ago

There’s an industry wide fad (like ChatGPT) and then there’s a hacker news wide fad which is SQLite :-). Not sure how it all started. Last few months (or my be even an year) there has been an unprecedented number of SQLite posts. What gives?

  • ravi-delia a year ago

    I feel like it's just that it's one of the coolest pieces of software that every developer- web, backend, systems, or otherwise, has at least seen before. It runs in the browser, it runs on your phone, and unlike other databases (all of which are also very cool) it is completely within the domain of even a single inexperienced developer. It's just a little file on disk, just a library for accessing it, there's no magic! Especially if you're used to taking a database for granted it's just an extremely cool thing to think about.

    • ethbr0 a year ago

      The brilliance of SQLite is the brilliance of interfaces.

      Value (in the abstract, not just $ sense) accrued around SQL.

      At some point, so much value accrued that people were using it for things it wasn't designed to do.

      SQLite provided a solution for "people who want to use a database, but don't look like traditional database operators." Turns out there's a lot of those.

      That this large userbase existed was a brilliant observation, combined with brilliant execution in shepherding and evolving SQLite since.

      And none of the above would've been possible if the SQL interface hadn't been standardized and adopted over the last few decades*.

      * Turns out, SQL's 50th anniversary will be 2024

  • lliamander a year ago

    I think most programmers would like to be remembered for having written some really cool, useful piece of software - ideally by themselves or with a small team of people, where they have creative control over the requirements and design.

    SQLite (and it's founder, Richard Hipp) are an inspirational example of such success.

    • int_19h a year ago

      Not just useful piece of software, but useful piece of software that actually gets widely used. Better yet, the one that has staying power, so it's not just a brief stint in the limelight.

      Which, of course, is very much the case with SQLite. It shows that a small team with a vision and an emphasis on quality can make a product that becomes pervasive in the industry for decades.

  • ok123456 a year ago

    Calling SQLite a fad is like calling Linux a fad.

    • nindalf a year ago

      He doesn’t mean using it. He means posting about SQLite on HN. The story of SQLite, how it came to be developed, how it’s tested, how it really works under the hood, funky ways to use it on the server for storage (fly.io, Cloudflare do this). All of these have become more common in the last year.

      • catach a year ago

        Yeah, my impression is that SQLite stories have been very common for many years on HN.

        EDIT: did some counting, certainly a huge increase.

        Threads with "sqlite" in the title:

        2022: 346

        2019: 142

        • ok123456 a year ago

          Stories about technologies that people actually use are better than the usual PR fluff pieces about web-doo-dads that you need to pay-per-query.

  • jskulski a year ago

    Haha yeah I feel like SQLite is a very _told_ story on HN. Great story and tech for real, inspirational but also very trendy right now!

    • posharma a year ago

      Yeah, exactly. May be I should’ve used the word trendy as some ppl take offense to the word fad for SQLite.

  • geysersam a year ago

    It might be more than a year actually. I'm pretty sure it was trendy 1.5 years ago. Remember I read a really cool blog post around then where someone used range requests to query SQLite statically. It used WASM + SQLite and hyped hard here.

  • sangnoir a year ago

    SQLite (like curl and libcurl) have been pervasive and awesome for a long time and you'd always find posts about both on HN since forever, because of their ubiquity.

    SQLite has been appearing a lot more often on HN because of a different more recent fad: edge computing.

  • jestarray a year ago

    how is sqlite a fad? its used pretty much everywhere, is simple, and yet surprisingly scalable to some

    • posharma a year ago

      SQLite is not a fad. It’s just that it never gained so much attention here before.

      • cdcarter a year ago

        Google Trends [0] indicates that sqlite is about as popular as usual. There's been some recent growth in the last year, but that's after a bit of a dip. A regression of the time series would be fairly flat.

        I suspect popularity here comes in waves through a Katamari Damacy effect. People start reading about a topic, and start posting, thus more people read, research, post, etc... until a saturation point, a cooling off period, and then a rebuild.

        [0]: obviously not the same as whats happening on HN, I'd love to see someone pull these numbers from the HN API!

        • TheRealPomax a year ago

          They were talking about it being a fad here on HN, in terms of submitted (and upvoted to FP) posts per week/month/year. It's always been a topic getting the incidental FP attention, but the number of times something relating to SQLite makes it to the front page here seems to have increased considerably in recent months.

          • cdcarter a year ago

            I believe I acknowledged that.

  • lukevp a year ago

    I think as single computers get more and more power, and engineers work in these super slow, eventually consistent cloud infrastructures, there’s a visceral appeal of an architecture that can run on my laptop and also scale to a reasonable user base. The only limiting factor for most applications at this point that would prevent a single machine from serving them is having HA/DR. So a lot of effort is being made at the replication side of SQLite so it can be part of this new stack. I am seeing it emerge as a reaction to the crazy amount of microservices and distributed computing just to solve small problems that could fit on a machine. And the inability to work locally as well.

  • jonas21 a year ago

    Last few months? I feel like SQLite has been popular on HN for at least the last 5 years (and the data seems to confirm this [1]). In which case, maybe it's not so much of a fad?

    [1] https://hn.algolia.com/?q=sqlite

  • mappu a year ago

    Just a theory: containerized backends gave developers much more freedom of choice to pick technologies that happen to work better with SQLite.

    Go and Rust work great with SQLite because all the concurrency is mediated within a single application process.

    During the PHP and Ruby years, SQLite did not work that well and still does not, because the different worker processes can't communicate and SQLite has an extremely poor-performing sleep loop around the write lock.

    This gave it a low-performance reputation compared to MySQL and Postgres that it has struggled to shake off.

  • vagrantJin a year ago

    I doubt ChatGPT is a fad. Feels like the second coming of the gun and it will have an outsized impact.

  • Seattle3503 a year ago

    For me, it is always the best place to start when I need a DB for personal projects. The serverless nature makes it very easy to use. It is easy to fall in love with the convience it provides, because lets you focus on the fun parts of your project.

elteto a year ago

“Shane Harrelson […] came up with this huge corpus of SQL statements, and he ran them against every database engine that he could get his hands on. We wanted to make sure everybody got the same answer, and he managed to segfault every single database engine he tried, including SQLite, except for Postgres. Postgres always ran and gave the correct answer. We were never able to find a fault in that. The Postgres people tell me that we just weren’t trying hard enough. It is possible to fault Postgres, but we were very impressed.”

Did not expect to find such a cool anecdote about Postgres here!

  • brongondwana a year ago

    I had some Perl at a previous job which would produce a multi-thousand line SQL statement that was pasted into a phppgadmin screen to get a data set out. Awful workaround but it was smooth and fast. Very nice db to work with

revskill a year ago

Interview question: Given a 2GB csv file, write a script to do calculation with it.

Me: Put all in SQLite and write a SQL query.

  • dfinninger a year ago

    Yeah, there are actually a number of scripts I have at my company that use Python’s baked in SQLite3 module to do something complicated very easily.

    There’s even a lib for Parquet making analysis on a small number of problematic files quite easy.

  • pletnes a year ago

    Have a go with duckdb next time - you can query csv files without loading them first.

    • simonw a year ago

      You can do that with SQLite too: https://til.simonwillison.net/sqlite/one-line-csv-operations

      (DuckDB is a lot more ergonomic for that kind of thing though - it's really fantastic tech)

      • pletnes a year ago

        Cool! Although I believe duckdb can do it on disk / out of memory, so querying huge files are possible. I also like its syntax, I tend to CREATE VIEW mycsv AS SELECT * FROM ‘my.csv’ (or similar). Then I think you can select or join even across files, although I haven’t gotten that far yet.

      • sam_lowry_ a year ago

        Unlike sqlite, DuckDB is a very complex and much less polished.

    • deaddodo a year ago

      The spirit/answer being sought of the question is to not have to load all of the data in memory first.

      • pletnes a year ago

        I believe duckdb does not load the whole csv file into memory. It will load a few rows to find column headers and guess data types.

        • deaddodo a year ago

          You’re just pettyfogging the situation. The spirit of the question is to find a solution that is acceptable/performant algorithmically.

          Certainly, there are hiring panels that appreciate these sorts of tricks to go around the solution, usually citing “out of the box” thinking, but the majority would probably just say “do it without that solution” or mark you as a fail.

  • mjcohen a year ago

    If it is a "nice" csv file, gawk will do quite well.

counttheforks a year ago

[flagged]

  • bobobob420 a year ago

    marked as duplicate https://news.ycombinator.com/item?id=34504085 https://news.ycombinator.com/item?id=34511831

    These were not taken down. Anyways who cares what the BBC has to say about India. They are not exactly a known unbiased source when it comes to coverage of India and never have been.

    I slightly agree with some of your thoughts on speaking your mind freely on this board however its probably for the best. Dang knows this place turning into a cesspool is not useful. I will say to other hacker news commentators please do not downvote if you disagree with a comment. Discourse is important but I do agree anonymous discourse is much more nuanced

    I do not understand how this post is number one. Are you saying Hacker News can promote certain posts to the top and it is not organically curated?

    • counttheforks a year ago

      A dupe of 3 and 4 day old posts, which got nearly 200 comments? No. It is not reasonable to just remove that. This is censorship. Especially when you consider that dang tends to instead just post a comment saying: "Previously discussed at ..."

      > I do not understand how this post is number one. Are you saying Hacker News can promote certain posts to the top and it is not organically curated?

      Absolutely, yes. The mods can influence the ranking directly and they have openly talked about this. If you search https://hn.algolia.com/ for "penalized", "demoted" or "promoted" you will find discussions regarding this.

      ---

      Since dang banned me from posting new replies:

      > Maybe you haven’t been here long enough to have read the FAQ or know that generally all dupes from 3-4 days ago are marked, and only allowed after at least several months?

      I've been here for many years. I just make a new account periodically. Maybe you haven't been paying enough attention to see the difference in behavior that dang exhibits when he wants to censor something? Notice how he's usually all over the comments explaining why something was removed as dupe, but now there is nothing except him linking to the older posts (https://news.ycombinator.com/item?id=34559614), waiting for an hour or two, and then silently deleting the new one.

      Not to mention that the subject of the post is completely different. The latest post is about the documentary being removed from archive.org at the request of the BBC. None of the earlier posts cover this.

      • dahart a year ago

        Maybe you haven’t been here long enough to have read the FAQ or know that generally all dupes from 3-4 days ago are marked, and only allowed after at least several months?

        https://news.ycombinator.com/newsfaq.html

        “If a story has not had significant attention in the last year or so, a small number of reposts is ok. Otherwise we bury reposts as duplicates.”

      • dahart a year ago

        I’m confused by your comment and edits. You both asked for and complained about being given a reason why the article is a dupe. Which is it? The comment explains why using those two links. Why has this particular article gotten you so riled up?

        I come here in part for the moderation, it increases the quality of discussions. It’s not surprising that if you keep making a lot of noise and trying to generate off-topic outrage, it might be met with some active moderation, that’s exactly what I would expect, and I’m sorry it’s bothering you so much, but FWIW I don’t mind that it’s happening, and it does seem reasonable and in-bounds from my perspective. BTW to me it seems like you might be making some incorrect assumptions about what’s happening and why; the news topic does seem to be covered adequately with the first one or two articles.

        • counttheforks a year ago

          > You both asked for and complained about being given a reason why the article is a dupe. Which is it?

          Dang posted the links to the previous stories, didn't mark the post as a dupe or say anything about it being a dupe. Because it's not. Then two hours later it silently got deleted.

          • dahart a year ago

            It is marked as dupe, and not deleted, we can still see it. Why does it matter if there was a time delay between the two things you saw? What does that have to do with anything?

            • bobobob420 a year ago

              I feel like were glossing over this comment about hacker news moderators being able to influence the rankings of posts.

              • dahart a year ago

                Are we? So I’m really curious, what are you expecting, and why? Do you think moderators should not be able to influence the rankings of posts? You believe all posts, regardless of content, should be allowed? Even if they’re duplicate, even if they’re offensive and only start flame-wars, even if they’re lying or contain graphic or illegal content?

                To me, that doesn’t seem like an entirely reasonable thing to expect given that 1) the site guidelines explicitly say that posts will be influenced, and it gives explicit reasons why, 2) there are no large high quality forums anywhere that have high quality discussion without moderators deleting low-quality posts and comments, and 3) site mods on a private forum can always do whatever they want, it’s their site.

                It’s often worth taking a minute to re-read the site guidelines and FAQ. I do it every few months.

                https://news.ycombinator.com/newsguidelines.html

                https://news.ycombinator.com/newsfaq.html

                Like I said above, I want and expect some posts to be influenced by moderators, it’s one of the reasons I visit HN. I’m glad that duplicates are pushed down, just like I’m glad that off-topic comments and excessive hyperbole get pushed down by moderators.

                Dan is a human being, doing the never-ending job of keeping this tree pruned enough to allow high quality discussion, while in reality still allowing a whole lot of silly argument and unexceptional articles and blog posts. He’s only pruning stuff that really truly goes over the line, a line defined by site rules and applied more or less consistently. @counttheforks has said a bunch of mean stuff, has clearly made wrong assumptions about Dan (the claim about being banned from commenting immediately above being one of those), and they called Dan a liar in another thread. Even when mods see people being crappy all day every day, I’m sure that still doesn’t feel great. To Dan’s credit, he didn’t remove any of those comments.

    • croes a year ago

      It's not about what the BBC says but what India does to prevent others seeing what the BBC says.

EGreg a year ago

Now what about the untold true story of SQLite?

  • dmitriid a year ago

    A Netflix documentary series

    • usefulcat a year ago

      It’ll be cancelled after the first season

breck a year ago

One thing not emphasized enough in this interview: SQLite is public domain. So was the web, which also won against its better funded licensed competitors at the time.

The lesson is clear: if you want to win in the long run, you need not just great skills and tech, but you have to go public domain. Or to put it bluntly: #LicensesAreForLosers.

https://breckyunits.com/how-the-public-domain-can-win.html

  • simonw a year ago

    I think I've heard SQLite creator Richard D Hipp say in the past that he thinks using the public domain declaration rather than a regular open source license was a mistake.

    Public domain raises all sorts of challenges for potential adopters of the software that aren't an issue with a more deliberately designed license.

    UPDATE: I misremembered this. He does talk about some of the surprise challenges in this interview, but does not go as far as saying that he regretted it: https://changelog.com/podcast/201#transcript-215

    • maphew a year ago

      Some countries don't have Public Domain, and that can be a problem. I didn't know that. SQLite's response (from that transcript):

      """ “We’ll sell you a license for SQLite.”

      We do our best to talk them out of it and explain they don’t need this, but for a lot of people it’s cheaper to pay the fee and get the license than it is to convince their lawyers that they don’t need one. """

      Priceless! Except that it's not. LoL.

  • buttspelunker a year ago

    Are you talking about CERN's decision to put their web client and server source code into the public domain in 1993? It seems like a stretch - to me - to attribute the success of the web to that decision.

    • andylynch a year ago

      It was probably a big factor. Gopher was a real competitor initially but the University of Minnesota which owned the IP started trying to charge license fees in Feb ‘93- I suspect CERN’s decision was a response to this as well as MOSAIC’s similar efforts around the same time.

      • buttspelunker a year ago

        Was there a single university that didn't have NCSA Mosaic installed and available to their X terminals after that?

        This still doesn't add up to me.

    • breck a year ago

      You couldn't predict that CERN's protocol would win, but you COULD predict that a public domain protocol would win.

      Proof:

      For every pair {protocolX,protocolY} where functionality(protocolX) = functionality(protocolY) && isPublicDomain(protocolX) == true && isPublicDomain(protocolY) == false, then speedAndUtility(protocolX) >> speedAndUtility(protocolY).

      https://breckyunits.com/how-the-public-domain-can-win.html