ashvardanian 6 hours ago

Five years ago I was absolutely frustrated with the state of graph databases and libraries, so I tried putting several non-graph DBMSs behind a NetworkX-like Python interface <https://github.com/unum-cloud/NetworkXum>.

When benchmarked, Neo4J crashed on every graph I tried <https://www.unum.cloud/blog/2020-11-12-graphs>, making SQLite and Postgres much more viable options even for network-processing workloads. So I wouldn’t be surprised to learn that people actually use pgRouting and Supabase in that setting.

With the rise of Postgres-compatible databases, I’m wondering if it’s worth refreshing the project. Similarly, there are now more graph DBs like MemGraph that are Cypher-compatible, which should work much better than Neo4J.

  • threeseed an hour ago

    You ran Neo4J with 512MB even though it has always recommended 2GB at a minimum.

    And MemGraph is nice, but it's memory-only, whereas Neo4J is designed for super large graphs that live on the filesystem. Not really that comparable.

  • henryfjordan 6 hours ago

    I had almost exactly the opposite experience, although my dataset was pretty small.

    We wanted to store a graph in Postgres, so we wrote some recursive queries to pull subgraphs and layered NetworkX on top for more complex graph operations. We did that for a short while, then switched to Neo4j because of how comparatively easy it was to write queries (although the Python support for Neo4j was severely lacking). We never really stressed it on dataset size, though.
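    A minimal sketch of that pattern - a recursive query to pull a subgraph, with a graph library layered on top - using SQLite's stdlib driver in place of Postgres (table, column names, and data are all invented for illustration):

```python
import sqlite3

# Toy edge table standing in for the Postgres schema (names are invented).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE edges (parent TEXT, child TEXT)")
con.executemany("INSERT INTO edges VALUES (?, ?)",
                [("root", "a"), ("a", "b"), ("b", "c"), ("root", "d")])

# Recursive CTE: pull every edge reachable from 'root'.
rows = con.execute("""
    WITH RECURSIVE sub(parent, child) AS (
        SELECT parent, child FROM edges WHERE parent = 'root'
        UNION
        SELECT e.parent, e.child FROM edges e
        JOIN sub s ON e.parent = s.child
    )
    SELECT parent, child FROM sub
""").fetchall()

# The original setup handed this edge list to NetworkX; a plain
# adjacency dict stands in for it here.
adj = {}
for parent, child in rows:
    adj.setdefault(parent, []).append(child)
print(adj)
```

    With Postgres the query is the same shape (`WITH RECURSIVE` is standard SQL), and the fetched edge list can be fed to something like `networkx.DiGraph(rows)` for the heavier graph operations.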

    I did manage to crash Redis' graph plugin pretty quickly when I was testing that.

    • SahAssar 6 hours ago

      Not sure what you consider "pretty small" and I don't know how NetworkX works, but PostgreSQL recursive queries have worked well for me for small graphs.

      Could you share what the data structure and scale was?

      • henryfjordan 5 hours ago

        We basically had a single table that we wanted to be able to nest on itself arbitrarily. Think categories and subcategories, maybe 100k nodes/rows

        Postgres worked fine, but Cypher is so much more expressive and handles things like loop detection for you; Neo4j was much easier to work with. Performance wasn't ever really an issue with either.

        • AlphaSite 2 hours ago

          We have something similar in Postgres, but IMO disconnectedness also plays a really big part in this whole calculation. We actually ended up just caching the transitive closure for fast operations (and simpler code).
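          A materialized transitive closure can be sketched like this (names and data invented): precompute all descendants of every node once, so "find everything under X" becomes a single lookup instead of a recursive query.

```python
# Toy category tree (invented). Materializing the transitive closure
# turns "find all descendants" into a dictionary lookup.
children = {"root": ["a", "b"], "a": ["c"], "c": ["d"]}

def descendants(node):
    """All nodes reachable from `node`, via iterative DFS."""
    out, stack = set(), list(children.get(node, []))
    while stack:
        n = stack.pop()
        if n not in out:
            out.add(n)
            stack.extend(children.get(n, []))
    return out

# Precompute the closure for every node once; queries become O(1).
closure = {n: descendants(n) for n in ["root", "a", "b", "c", "d"]}
print(sorted(closure["root"]))  # ['a', 'b', 'c', 'd']
```

          The usual trade-off applies: the closure table can be much larger than the edge table and has to be refreshed on writes.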

  • nemo44x 5 hours ago

    Neo4j is pretty bad and very dated. No idea what they’re doing. MemGraph is a much better tool.

    Really graph is a feature and not a product.

    • threeseed an hour ago

      Neo4J is mature, not dated, which is why it's so popular.

      And I couldn't disagree more that graph is a feature. You really want something optimised for it (query language, storage approach) because the data structure is so different in every way from a relational or document store.

kiwicopple 6 hours ago

My original goal in this article was to figure out if pgrouting would be a good tool to build a memory-layer (for AI/agents) but the article got a bit long. Early results are promising - I’ll follow up with another article soon
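For context on what pgrouting does under the hood: functions like pgr_dijkstra take an edge table of (source, target, cost) rows and run a classic shortest-path search over it. A self-contained sketch of that core algorithm (edge data invented):

```python
import heapq

# Weighted edges in the shape pgRouting expects: (source, target, cost).
edges = [(1, 2, 1.0), (2, 3, 2.0), (1, 3, 5.0), (3, 4, 1.0)]

def dijkstra(edges, start):
    """Cheapest cost from `start` to every reachable node."""
    adj = {}
    for s, t, c in edges:
        adj.setdefault(s, []).append((t, c))
    dist, heap = {start: 0.0}, [(0.0, start)]
    while heap:
        d, n = heapq.heappop(heap)
        if d > dist.get(n, float("inf")):
            continue  # stale heap entry, already found a cheaper path
        for t, c in adj.get(n, []):
            if d + c < dist.get(t, float("inf")):
                dist[t] = d + c
                heapq.heappush(heap, (d + c, t))
    return dist

print(dijkstra(edges, 1))  # {1: 0.0, 2: 1.0, 3: 3.0, 4: 4.0}
```

The appeal of doing it in the database is that the edge table is just another relation, so the "graph" stays consistent with the rest of the schema.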

there are some other interesting extensions in this space - onesparse[0] is early in development but pretty exciting as it builds on SuiteSparse which is very mature

[0] https://onesparse.com/docs.html

  • michelpp 5 hours ago

    Thanks Paul! OneSparse author here. We're still in the early dev stages (OneSparse requires some new features in Postgres that won't be available until pg18), but my plan is to do a benchmark shootout with various graph tools for Postgres. Initial results look good: on my 4-core 11th-gen Intel laptop we're getting some really good numbers!

                 LiveJournal                 Orkut
      Nodes:     3,997,962                   3,072,441
      Edges:     34,681,185                  117,185,037
      Triangles: 177,820,130                 627,583,972
    
                      Seconds  Edges/Second  Seconds  Edges/Second
      Tri Count LL:   2.69     12,892,634    32.03    3,658,602
      Tri Count LU:   1.78     19,483,812    16.38    7,156,338
      Tri Centrality: 1.45     23,918,059    12.22    9,589,610
      Page Rank:      7.12     4,870,953     23.14    5,064,176
    
    Orkut was as big as I could go due to limited RAM. One of my constraints is limited access to big enough hardware to do the kinds of Graphs Of Unusual Size (billions of edges, trillions of triangles) where we can really flex the scale that CUDA support gives us. Stay tuned!
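    For readers unfamiliar with the LL/LU naming: those presumably refer to the GraphBLAS-style triangle counts done as masked multiplies of the lower/upper triangular parts of the adjacency matrix. The underlying "count each triangle exactly once by ordering its vertices" idea, in pure Python on a toy graph (data invented):

```python
# Undirected toy graph: two triangles, {1,2,3} and {1,2,4}, sharing edge 1-2.
edges = [(1, 2), (2, 3), (1, 3), (2, 4), (1, 4)]

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# A triangle {u, v, w} with u < v < w is counted only at its edge (u, v),
# by requiring the common neighbor w to be larger than both endpoints.
# This is the same de-duplication the masked tril/triu multiplies achieve.
triangles = 0
for u, v in edges:
    u, v = min(u, v), max(u, v)
    triangles += sum(1 for w in adj[u] & adj[v] if w > v)
print(triangles)  # 2
```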

nrjames 2 hours ago

I’ve always wondered why there isn’t a “SQLite for graphs,” so to speak. Is there something about how they have to be stored that precludes an in-process solution with disk-based storage?

  • ryangs 2 hours ago

    https://www.hillelwayne.com/post/graph-types/ gives an interesting take on why we don't see a graph type as a primitive in more programming languages. It essentially boils down to graphs being very vague: depending on the topology of your graphs, you are going to want different implementations for reasonable efficiency.

    That said, there are graph databases.
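    The topology point is easy to make concrete: the same graph can live in an adjacency list (cheap storage for sparse graphs) or an adjacency matrix (O(1) edge lookups, O(V^2) space, better for dense graphs), and neither choice is right in general (toy data):

```python
# The same 4-node path graph in two representations (invented example).
edges = [(0, 1), (1, 2), (2, 3)]
n = 4

# Adjacency list: O(V + E) space, fast neighbor iteration when sparse.
adj_list = {i: [] for i in range(n)}
for u, v in edges:
    adj_list[u].append(v)

# Adjacency matrix: O(V^2) space, constant-time edge membership tests.
adj_matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    adj_matrix[u][v] = 1

print(adj_list[1], adj_matrix[1][2])  # [2] 1
```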

    • nrjames 2 hours ago

      Interesting! Thanks for the link. I suppose the graph databases just take an opinionated approach. NetworkX is great; I always wished it had a simple backend.

  • canadiantim 2 hours ago

    There is one now: Kuzu, and it's embedded as well.

szvsw 4 hours ago

Supabase consistently puts out such fantastic bite-sized gems - and for me some of the favorites have been related to PostGIS, whether it's serving tiles directly or this (ab)use of functionality typically used in a PG geospatial context. Nothing revolutionary or massive and complex - not like reading DDIA, of course - but fun and mentally activating, making me want to jump into something new. I really applaud them for frequently posting genuinely engaging content that gets you excited to work with databases. It sounds silly to say it like that, but I regularly get struck with a feeling of sadness when I realize how vanilla all of my daily development-related interactions with databases are.

rubenvanwyk an hour ago

You can always trust Postgres to have another extension that opens up great new data-modelling opportunities. This is great. I wonder how this stacks up against CedarDB's (Postgres-compatible) graph capabilities.

xnx 7 hours ago

Resourceful, but is there a reason to use this approach over pgvector?

  • nyrikki 6 hours ago

    There is an unfortunate overloading of the terms "relation" and "relationship" in relational databases.

    A relationship is an association between relations/tables: parent-child, node/edge, etc., depending on the model, extensions, and so on.

    There are three basic models of databases:

      model name   | basic data structure
      -------------|----------------------
      relational   | tables
      hierarchical | trees
      network      | graphs
    
    A "relational" in RDBMS and Codd's rules is just a table data structure with some additional rules.

    Part of those rules: a named table, with named and typed attributes (columns), holding data in the form of rows (tuples).

    pgvector is nearest-neighbor search over tuple values, often from a single table/relation, while pgRouting is graph traversal over relational data.

    There is a bit more to it: in the relational model the data is independent of the schema, and no RDBMS is pure.

    It is possibly helpful to realize that pgvector is about finding neighbors among the tuples in a relation/table, which is very different from graph traversal.
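    The contrast fits in a few lines of Python: nearest-neighbor search ranks the rows of one table by distance, while traversal follows edges between rows and ignores distance entirely (all data invented):

```python
import math
from collections import deque

# pgvector-style: rank rows of a single table by vector distance.
vectors = {"a": (1.0, 0.0), "b": (0.9, 0.1), "c": (0.0, 1.0)}
query = (0.92, 0.08)
nearest = min(vectors, key=lambda k: math.dist(vectors[k], query))

# pgRouting-style: follow edges between rows via BFS; no vectors involved.
edges = {"a": ["c"], "c": ["b"], "b": []}
order, queue, seen = [], deque(["a"]), {"a"}
while queue:
    node = queue.popleft()
    order.append(node)
    for nxt in edges[node]:
        if nxt not in seen:
            seen.add(nxt)
            queue.append(nxt)

print(nearest)  # 'b' - closest row by distance
print(order)    # ['a', 'c', 'b'] - rows reachable by edges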

whalesalad 6 hours ago

Has anyone done something like this with a very large dataset? Hundreds of millions of rows.